In the recent past, UTM clustering lost some of its luster because of multi-core processors such as Intel quad-core processors. Clustering continues to be used, but I doubt that 'session based UTM clustering' will survive in the long run. Having said that, I feel that load sharing based on virtual instances ('virtual instance based UTM clustering') will survive in the service provider market. With this background, let me get back to the topic: the challenges and solutions in session based UTM clustering.
At a very high level, one of the devices becomes the master and the other devices are participants. The master device receives traffic from all ports. On the first packet of any session, it decides whether to handle the session itself or to load balance it. When it load balances, it picks the device using a load balancing algorithm such as a hash on source IP/destination IP, least loaded, and so on. Further packets of the session are handed to the chosen device. I call this functionality the 'cluster plane'. Packet latency increases for sessions handled by participants, because each packet reaches the master device first and is then handed over to the participant device.
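The dispatch behaviour described above can be sketched roughly as follows. This is only an illustration under my own assumptions: the names `FiveTuple`, `ClusterPlane`, `session_key`, `pick_device` and `dispatch` are hypothetical, the first entry in `devices` is assumed to be the master, and a CRC32 hash on the source/destination IP pair stands in for whichever load balancing algorithm a real product uses.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FiveTuple:
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def session_key(pkt: FiveTuple) -> Tuple:
    """Direction-independent key, so both directions of a session match."""
    low, high = sorted([(pkt.src_ip, pkt.src_port), (pkt.dst_ip, pkt.dst_port)])
    return (pkt.protocol, low, high)


@dataclass
class ClusterPlane:
    # Assumption: devices[0] is the master; the rest are participants.
    devices: List[str]
    sessions: Dict[Tuple, str] = field(default_factory=dict)

    def pick_device(self, pkt: FiveTuple) -> str:
        # Hash on source IP/destination IP; sorting the pair makes the
        # result the same for forward and reverse packets.
        low, high = sorted([pkt.src_ip, pkt.dst_ip])
        digest = zlib.crc32(f"{low}|{high}".encode())
        return self.devices[digest % len(self.devices)]

    def dispatch(self, pkt: FiveTuple) -> str:
        key = session_key(pkt)
        device = self.sessions.get(key)
        if device is None:
            # First packet of the session: make the balancing decision once.
            device = self.pick_device(pkt)
            self.sessions[key] = device
        # Later packets follow the recorded decision.
        return device


plane = ClusterPlane(["master", "participant-1", "participant-2"])
outbound = FiveTuple("10.0.0.5", "192.0.2.10", 6, 40000, 80)
reply = FiveTuple("192.0.2.10", "10.0.0.5", 6, 80, 40000)
assert plane.dispatch(outbound) == plane.dispatch(reply)
```

The session table is what keeps an existing session on its device even if the set of devices changes later; only new sessions see the new hash result.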
I believe clustering solutions should not call for multiple sets of IP addresses, with one set assigned to each device. All devices in the cluster should share the same IP addresses.
Challenges & Solutions
The cluster plane should not require major modifications to existing UTM functions. It is understood that some changes are required to take advantage of clustering. As long as the changes are limited, vendors are more likely to adopt clustering.
Some security functions don't work when sessions are load balanced blindly based on load balancing algorithms. Special care should be taken to handle these scenarios. I have tried to describe these scenarios in enough detail to give an idea of the challenges, and to give possible solutions to the problems. In some cases, the solutions I describe are a balance between complexity of implementation and limitations.
Interdependency among sessions: Whenever there is a dependency among sessions, all of those sessions must be handed over to the same device. Some of the scenarios are:
- Firewall ALG control and data connections: Each application context has multiple 5-tuple sessions. These are called complex protocols/applications. They typically require special modules in the firewall called ALGs, which create pinholes and do NAT translation of IP addresses in the payload. Firewalls maintain state information across these sessions, so the sessions are dependent on each other. In these cases, clustering must ensure that the dependent sessions are assigned to the same device.
- NAPT functionality: Outbound sessions are typically NATted to one or a few IP addresses. A new source port is assigned to ensure that the 5-tuples remain unique across sessions after the NAT operation. When sessions are load balanced across devices, two devices might assign the same port. If the other four fields of the tuple also happen to be the same across these connections, you end up with two sessions that have the same 5-tuple, and these connections will not work. The clustering solution must divide the ports across devices. My recommendation: a one-time port division across devices works just fine. The division can be done with the maximum number of devices in the cluster in mind, even if fewer devices are present at that time.
- Bandwidth rate control / session rate control / session limits etc.: Some UTM functions detect application traffic (for example P2P, IM, DDLs) and control it. Measuring and controlling the traffic is done in terms of bytes/sec, connections/sec and simultaneous connections, based on rules, with each rule having a set of source IP addresses, destination IP addresses and services. Traffic rate detection happens across sessions. This is one big challenge in clustering environments. Sharing the dynamic state information across devices is not an elegant solution if there is no hardware based shared memory. For one, it is complex, and second, there can be a lot of errors in the calculations, because multiple packets can come in before the state synchronization happens. Also, if the frequency of synchronization is very high, the synchronization itself affects bandwidth and performance. There are three possible solutions.
- A. Let each device detect and control the traffic individually. It is up to the administrator to configure the values appropriately. (or)
- B. Ensure that all sessions belonging to these rules are processed by one device. (or)
- C. Ensure that all sessions belonging to each rule are processed by one device. That is, the rules are divided among the devices.
- Option C is elegant, but it is complex and may not work in some scenarios. The complexity arises from having different functions and the need for conflict resolution. That is, function 1 might have one set of rules and function 2 might have another set of rules with an intersecting selector space. If the function 1 rules are divided across devices, then the function 2 rules should be divided in such a way that rules with an intersecting selector space are not assigned to different devices. Sometimes there can be deadlocks, and this requires conflict resolution. Conflict resolution would prefer one function over another, so the function that got the lower priority may not work as expected. The complexity is compounded when more functions have these kinds of rules. Complexity also goes up when rules change dynamically. A rule change might require an existing rule to be assigned to a new device, which means moving sessions to the new device along with their state. This becomes really ugly. Because of these complexities, my observation is that many UTM cluster implementations go for either Option A or Option B. The choice between Option A and Option B typically depends on whether load sharing or granular control is more important.
- My recommendation: Option A. If some function is very important, then advise the administrator to disable clustering, or disable clustering internally and automatically when that function is enabled.
- Policy based IPsec VPN, IP-in-IP based VPN and encapsulation-less tunnel VPN: In IPsec VPN, security associations are created for each tunnel. The UTM device receives clear traffic from the protected network. This traffic gets encrypted and sent out. When the UTM device receives encrypted traffic from the remote gateway, it decrypts it and sends the clear traffic to the protected network. Security associations maintain quite a bit of state information, and because of this, both the clear and the encrypted traffic must be sent to the device that holds the security association. Clear traffic from the protected network corresponds to multiple sessions. For a given tunnel, the IKE session traffic, the clear traffic and the encrypted traffic should all go to one device.
- In the case of policy based VPN, the selectors of the clear traffic are known. IKE session traffic is known from the gateway IP addresses, and the selectors of the encrypted traffic are known from the IP protocol and the gateway IP addresses. In some cases, there can be multiple data tunnels between two gateways. In these cases, all data tunnels between those gateways should be on one device.
- In the case of route based VPNs, clear traffic information is not present in the IPsec policy rules. Here, the IPsec tunnel is chosen based on routing information. The clustering plane should have access to the routing table to determine the device for each session.
- The most challenging part is the handling of remote access VPN. Remote access VPN creates data tunnels dynamically, so the tunnel selector information is known only when the remote endpoint authenticates using IKE. If IKE is already load balanced, the clustering solution must ensure that the data traffic is also passed to the same device. This requires state communication between the devices and the clustering plane running in the master device. Remote access VPN is also used to assign IP addresses. If two remote IKE sessions are given to two different devices, the same IP address might be assigned to both remote endpoints. Either the IKE function should have a facility to divide the IP addresses among the devices, or the clustering plane should redirect all remote access VPN IKE sessions to one device.
- My recommendation: IPsec throughput can be improved with clustering. For a pure IPsec device, I recommend that the cluster plane take advantage of the cluster. As far as UTM devices are concerned, first and second generation UTM clusters would process all IPsec VPN traffic in the master device. The cluster plane should redirect IPsec traffic before it makes load balancing decisions.
- Routing protocols such as OSPF and RIPv1/v2, IGMP proxy: Routing protocol information from all neighboring routers must be handled by only one device so that the route table can be consolidated. My recommendation: the clustering plane must redirect all routing protocol traffic to the master device.
- DHCP server: The DHCP server assigns IP addresses to DHCP clients. It should ensure that a given IP address is not assigned to multiple DHCP clients. If the DHCP server runs on multiple devices and DHCP sessions are load balanced, there is a big chance that two different DHCP servers assign the same IP address to two different machines. Synchronizing the lease information periodically and ensuring that integrity is maintained is too complex. To reduce this complexity, only one copy of the DHCP server should be active at any time, and the cluster plane should ensure that all DHCP packets are given to the active DHCP server. My recommendation: make changes such that the DHCP server in the master is the active one, and have the cluster plane redirect all DHCP packets to the master device. In any case, the DHCP server is lightly loaded, and balancing the DHCP traffic does not provide any tangible benefit.
- Dynamic IP addressing (DHCP client and PPPoE) and dynamic DNS: I believe that only medium to big enterprises use clustering solutions. It is my assumption that these deployments will have static public IP addresses. Clustering solutions should not complicate themselves by supporting this function.
- Route updater: Since only one routing protocol instance should be active in a cluster, the clustering function should ensure that dynamic routes are propagated to all devices.
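Two of the recommendations in the list above are simple enough to sketch: the one-time NAPT port division, and redirecting IPsec, routing protocol and DHCP traffic to the master before any load balancing decision. The code below is a minimal illustration under my own assumptions. The function names `napt_port_range` and `must_stay_on_master`, the choice of 1024 to 65535 as the NAPT port pool, and the inclusion of AH and NAT-T (UDP 4500) alongside ESP and IKE are all mine; the protocol and port numbers themselves are the standard IANA assignments. It also does not cover clear traffic headed into a VPN tunnel, which needs a policy or route lookup rather than a protocol check.

```python
# Assumed NAPT source port pool (not from the text above).
NAPT_PORT_MIN = 1024
NAPT_PORT_MAX = 65535


def napt_port_range(device_index, max_devices):
    """Return the inclusive (first, last) NAPT port range of one device.

    The pool is split once, for the maximum cluster size, so that the
    ranges never change when devices are added later.
    """
    if not 0 <= device_index < max_devices:
        raise ValueError("device_index must be in [0, max_devices)")
    total = NAPT_PORT_MAX - NAPT_PORT_MIN + 1
    size = total // max_devices
    first = NAPT_PORT_MIN + device_index * size
    # The last device also takes any ports left over by the division.
    if device_index == max_devices - 1:
        last = NAPT_PORT_MAX
    else:
        last = first + size - 1
    return first, last


# IANA IP protocol numbers.
IPPROTO_IGMP, IPPROTO_UDP = 2, 17
IPPROTO_ESP, IPPROTO_AH, IPPROTO_OSPF = 50, 51, 89

MASTER_ONLY_PROTOCOLS = {IPPROTO_IGMP, IPPROTO_ESP, IPPROTO_AH, IPPROTO_OSPF}
# DHCP server/client (67/68), IKE (500), RIP (520), IPsec NAT-T (4500).
MASTER_ONLY_UDP_PORTS = {67, 68, 500, 520, 4500}


def must_stay_on_master(protocol, dst_port=None):
    """True if a packet should bypass load balancing and stay on the master."""
    if protocol in MASTER_ONLY_PROTOCOLS:
        return True
    return protocol == IPPROTO_UDP and dst_port in MASTER_ONLY_UDP_PORTS


# Division for a cluster that may grow to 4 devices.
for index in range(4):
    print(index, napt_port_range(index, 4))
```

With a maximum of 4 devices, the ranges come out as 1024-17151, 17152-33279, 33280-49407 and 49408-65535. A cluster that currently has only two devices simply leaves the last two ranges unused.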
Identical configuration among all devices:
Since sessions are load balanced, the configuration should be the same across all devices. If there is an external management system, this becomes simple. With embedded management, there will be moments when the configuration differs among devices, particularly in the time between a configuration change being made and the configuration being synchronized. My recommendation is to provide an external management system. When embedded management is used, this limitation must be made known to end users.
Statistics are updated by each device individually. Again, external management systems can do a good job of consolidating this information before presenting it to the user. It is my observation that embedded management still depends on administrators going to each device's GUI to view the statistics individually.
Stateful Layer 2/3 interfaces:
Ethernet interfaces are typically not stateful. So even if different devices output packets with the same link header, there are no issues. But with PPP and other stateful interfaces, packets are expected to be output by one device. In these cases, the clustering and association function must ensure that packets are sent through one device. I recommend that the master device be used for this purpose.
Interface level traffic shaping:
Many UTM devices support traffic shaping. Traffic shaping is typically done to prioritize one kind of traffic over another and to provide guaranteed bandwidth for certain traffic. Typically, these policies are set on a per-interface basis, hence the name 'interface level traffic shaping'. When traffic shaping is enabled, clustering should ensure that all packets are transferred to one device, even if the packets are destined to go out via an Ethernet interface. My recommendation is to use the master device for this purpose.
Since the master device is used for many purposes, it can become a bottleneck. To prevent that, I recommend assigning less work to the master as part of the load balancing decisions. For this reason, I suggest not using a 'hash based' load balancing algorithm, since a plain hash cannot take the master's extra work into account.
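One way to give the master less work is a least-loaded choice in which the master's load is inflated before comparison. The sketch below is my own illustration, not a description of any product: the function name `pick_least_loaded`, the `master_reserve` parameter and the use of active session counts as the load metric are all assumptions.

```python
def pick_least_loaded(active_sessions, master, master_reserve=0.5):
    """Pick the device for a new session.

    active_sessions: dict mapping device name -> current session count.
    master_reserve: fraction of the master's capacity kept back for
    cluster plane work, VPN, routing, DHCP and so on (0 <= value < 1).
    """
    if not 0 <= master_reserve < 1:
        raise ValueError("master_reserve must be in [0, 1)")

    def effective_load(device):
        # Load the device would have after accepting the new session.
        load = active_sessions[device] + 1
        if device == master:
            # Scale up the master's load so it wins less often.
            return load / (1 - master_reserve)
        return load

    # sorted() makes tie-breaking deterministic (alphabetical order).
    return min(sorted(active_sessions), key=effective_load)


# Simulate 100 new sessions on a three-device cluster.
counts = {"master": 0, "participant-1": 0, "participant-2": 0}
for _ in range(100):
    counts[pick_least_loaded(counts, master="master")] += 1
print(counts)
```

With `master_reserve=0.5`, the master ends up with roughly half as many sessions as each participant. A real implementation would more likely measure CPU or throughput than session counts.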
With all these limitations, what is it good for?
It is a good question. Even with all these limitations and the many riders attached, I feel that it is good for applications such as anti-virus. As I understand it, AV performance with 4K-sized emails is around 100 messages/sec on a typical Pentium 4 based system. With clustering, this performance can go up roughly linearly with the number of devices in the cluster.