Saturday, February 5, 2011

Clustering of Devices with Traffic Distribution by L2 Switch - One Limitation & Mitigation

In my last post, "Data Center/Enterprises - Clustering of Network Devices", I discussed how L2 switches are enabling network equipment vendors to provide cluster solutions to take up the increasing load on networks.  Many L2 switches are capable of parsing several types of Layer 2 headers to reach the inner IP packet and use the inner packet's source and destination IP address fields to distribute traffic across the devices in the cluster using hash distribution.  L2 switches typically understand Ethernet and MPLS related headers: Ethernet DIX, LLC/SNAP, 802.1Q VLAN tags and MPLS label headers.  That is, an L2 switch can get to the IP packet as long as it is carried over the above mentioned L2 headers.
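As a rough illustration of that header walk, here is a simplified sketch I put together; it is not any switch's actual microcode, it handles only stacked 802.1Q/802.1ad tags and MPLS labels, and it uses the IP version nibble as a heuristic for the MPLS payload:

```python
# Simplified sketch of how a switch might walk Ethernet/VLAN/MPLS headers to
# find the inner IP header.  Illustrative only: no length/error checking, and
# the "version nibble" check for MPLS payloads is a common heuristic.
from typing import Optional

def find_inner_ip_offset(frame: bytes) -> Optional[int]:
    ethertype = int.from_bytes(frame[12:14], "big")
    offset = 14                                   # end of the base Ethernet header
    # Peel off stacked 802.1Q / 802.1ad VLAN tags.
    while ethertype in (0x8100, 0x88A8):
        ethertype = int.from_bytes(frame[offset + 2:offset + 4], "big")
        offset += 4
    if ethertype in (0x0800, 0x86DD):             # IPv4 / IPv6 directly after L2
        return offset
    if ethertype in (0x8847, 0x8848):             # MPLS unicast / multicast
        while True:                               # pop labels to bottom of stack
            bottom_of_stack = frame[offset + 2] & 0x01
            offset += 4
            if bottom_of_stack:
                break
        if frame[offset] >> 4 in (4, 6):          # looks like an IP header
            return offset
    return None                                   # unknown encapsulation
```

Once the inner IP header is found, the switch can hash the source and destination addresses and pick a device port from the hash value.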

In some deployments, IP packets may be encapsulated in additional L2 headers or L3 tunnels.  Some examples: PW (pseudowire) headers, Ethernet over Ethernet using PW, GRE, GTP over UDP, IP-in-IP, Mobile IP and many more.  L2 switches in the market today are not capable of parsing these headers to reach the inner IP header, so distribution based on inner IP header fields is not possible.  In these deployments, L2 switches may have to resort to distribution based on L2 header fields, such as source and destination MAC addresses, or on the tunnel IP header fields.  Unfortunately, distribution based on these fields can be very poor.  If the cluster is placed between two routers, the MAC addresses of every packet traversing the cluster will be the same, so any distribution based on MAC addresses would send all traffic to a single device in the cluster.  The same is true if the distribution depends on the tunnel IP header fields.

Switches do have one useful capability, though: they can generate a hash from a CRC calculated over part of the packet, or from the CRC of the entire Ethernet frame.  The CRC of the Ethernet frame can be assumed to provide good distribution, since it is computed over the complete frame payload, including the inner IP packet payload.  Switches can take a few bits of this CRC to distribute packets across the cluster devices.  The problem is that packets belonging to the same connection would then go to different cluster devices.  As described in my earlier post, cluster devices assume that all packets belonging to a connection land on the same device, and that assumption no longer holds if the cluster solution is deployed in the environments mentioned above.  These kinds of encapsulations are rare in data center and enterprise environments, so this problem may not arise in many installations.  In service provider environments, however, multiple L2 and tunnel header combinations are not uncommon.
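To see why a CRC taken over the whole frame sprinkles a single connection's packets, consider this toy sketch (the 4-device cluster and CRC32 are my assumptions; real switches use a few bits of their own CRC logic):

```python
# Toy sketch: picking a device from a CRC over the whole frame.  Because the
# payload changes from packet to packet, two packets of the SAME TCP
# connection can easily map to different devices.
import zlib

NUM_DEVICES = 4   # assumed cluster size (power of two for the bit mask)

def device_from_frame_crc(frame: bytes) -> int:
    return zlib.crc32(frame) & (NUM_DEVICES - 1)   # take a few low bits of the CRC

# Two frames from one connection: identical headers, different payload bytes.
headers = b"\x00" * 54                              # stand-in for Eth+IP+TCP headers
print(device_from_frame_crc(headers + b"GET /index.html HTTP/1.1\r\n"))
print(device_from_frame_crc(headers + b"Host: example.com\r\n"))   # often a different index
```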

How do cluster solutions work in these environments?

In these environments, if CRC based distribution is used, the switch is effectively sprinkling packets across the devices.  The cluster devices therefore need additional intelligence:
  • Cluster devices should be able to parse past the outer headers to reach the inner IP header.
  • Cluster devices should agree among themselves on how sessions are distributed.  One simple method is to do what the switch would have done: generate a hash over the inner IP header fields (source and destination IP) and determine from the hash value whether the device that received the packet is the one that should serve it.  If it is, it continues processing the packet.  If it is not, it hands the packet over to the device that owns that hash value.
There could be a good amount of traffic among the cluster devices; they can use the switch itself as a backplane to send and receive it.  To avoid the receiving device repeating the same work, that is, parsing down to the inner IP header and hashing its fields again, the sending device can attach this information to the packet so that the receiving device can skip those operations.  A sketch of this redistribution logic follows below.
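The sketch below shows that redistribution step on a cluster device.  The tunnel parsing is stubbed out, and the device count, hash function and forwarding helpers are hypothetical names I chose for illustration:

```python
# Illustrative sketch of per-packet redistribution inside a cluster device.
import zlib
from typing import Optional

NUM_DEVICES = 4     # assumed cluster size
MY_INDEX = 1        # this device's position in the agreed ordering

def inner_ip_key(packet: bytes) -> bytes:
    """Walk PW/GRE/GTP/IP-in-IP headers and return the inner source+destination
    address bytes.  Stubbed out here; a real device walks the encapsulation chain."""
    raise NotImplementedError

def owner_of(key: bytes) -> int:
    # The hash every device agrees on, mirroring what the switch would have done.
    return zlib.crc32(key) % NUM_DEVICES

def handle_packet(packet: bytes, precomputed_owner: Optional[int] = None) -> None:
    # If a peer already attached the owner index, skip re-parsing and re-hashing.
    owner = precomputed_owner if precomputed_owner is not None else owner_of(inner_ip_key(packet))
    if owner == MY_INDEX:
        process_locally(packet)
    else:
        # Send to the owner over the switch (used as the cluster backplane),
        # tagging the packet so the peer does not repeat the parse/hash work.
        forward_to_peer(owner, packet, metadata={"owner": owner})

def process_locally(packet: bytes) -> None:
    ...   # normal firewall/IPS/ADC processing

def forward_to_peer(device_index: int, packet: bytes, metadata: dict) -> None:
    ...   # encapsulate and transmit via the device port facing the switch
```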

As indicated above, there are only a few deployment types where the L2 switch cannot do inner IP header based distribution.  If the same cluster solution is to serve all kinds of deployments, it is good for network equipment vendors to provide a configuration option on the cluster indicating whether the L2 switch is doing blind packet sprinkling or intelligent IP flow based distribution.  If it is packet sprinkling, the additional logic in the devices can kick in to figure out the real destination device.
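Such a knob could be as simple as the following sketch; the mode names and the flag are hypothetical, not any product's configuration model:

```python
# Hypothetical cluster-wide configuration knob describing how the front-end
# switch distributes traffic, so devices know whether to run the extra logic.
from enum import Enum

class DistributionMode(Enum):
    FLOW_BASED = "flow-based"           # switch hashes on inner IP flow fields
    PACKET_SPRINKLING = "sprinkling"    # switch sprinkles packets (e.g. CRC based)

cluster_distribution_mode = DistributionMode.PACKET_SPRINKLING   # configured once per cluster

def needs_redistribution() -> bool:
    return cluster_distribution_mode is DistributionMode.PACKET_SPRINKLING
```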


Thursday, February 3, 2011

Data Center/Enterprises - Clustering of Network Devices

Throughput requirements of data center/enterprise network equipment are going up with the increase in traffic in data centers and enterprises.  In addition, the computational requirements of network equipment are also going up.  Some examples of why more computation power is required:
  • Intrusion detection/prevention now requires almost 3 to 4 times the computation power per Mbps of traffic compared with a few years back.  I guess this is mainly due to the sophisticated nature of attacks and the evasion techniques adopted by attackers.  JavaScript analysis alone takes around 10 times the computational power of typical DPI based pattern matching: it requires proxy based functionality to get hold of the JavaScript and script analysis to detect attacks, and these two tasks need far more CPU cycles than plain pattern matching.
  • Traditional server load balancers (SLBs) used to select the internal server based on IP and UDP/TCP header values.  Next generation load balancers, called ADCs, do deep packet inspection, for example on HTTP and SIP URLs and HTTP request headers, to select the internal server to send the load to.  DPI requires more CPU cycles.
  • Application firewalls such as web application firewalls and SIP firewalls not only do deep packet inspection but also deep data inspection, and that requires more horsepower from the CPUs.
  • DDoS prevention requires real time analysis not only packet by packet but also at the session and application protocol level across sessions to identify attacks.  Many DDoS attacks look exactly like normal traffic on a per session basis, so analysis across sessions is required to detect the anomaly.  This capability requires not only a lot of memory but also a good amount of computational power.

Multicore processors help to some extent in solving these performance issues.  Clustering of multiple multicore SoCs is becoming necessary to meet the above requirements in the data center and large enterprise markets.  Typically, multiple blades, each built around multicore SoCs and running the same application, are clustered to take up the load.  L2 switches are increasingly used to front-end the cluster and can now be configured to balance the load across the cluster devices.  One might see the cluster and the L2 switch in a single enclosure, giving the impression of one big box providing tens of gigabits of performance.

What features of L2 switches are enabling clustering?
  • Distribution of sessions across multiple devices in the cluster:  The majority of L2 switches can distribute traffic arriving on incoming ports (data ports) across multiple outgoing ports (device ports).  By connecting the cluster devices to these ports, each device gets the traffic redirected to its port.  But many network devices expect all packets of a given session to go to the same device; for example, all packets belonging to one HTTP connection should go to one device.  If packets of a session are spread across multiple devices, the devices cannot perform their analysis, proxying and similar operations.  A given connection involves both client-to-server and server-to-client traffic.  Although L2 switches have no session intelligence, the hash based distribution they use generates the same hash value for both the C-S and S-C traffic of a connection, as long as the hash is computed symmetrically over the field pair.  Some cautions:
    • L2 switches don't do IP reassembly.  Because of this, the hash generated for the first fragment of a packet can differ from that of the non-initial fragments if the hash block is configured with L4 fields (TCP/UDP source and destination ports), since non-initial fragments carry no L4 header.  So it is advisable to configure the hash block with only the IP addresses and the IP protocol field.  This may give rise to less even distribution, but with the large number of sessions in a DC it may not be a big limitation.  A sketch of such a fragment-safe hash is shown after this list.
    • Some application sessions require multiple connections, for example SIP (Session Initiation Protocol).  A SIP voice call typically involves three connections: the SIP control connection, RTP for voice/video data and RTCP for control frames.  Many devices expect all three connections to land on the same device.  If all three connections have the same source, destination and protocol fields, the switch will send all packets of the SIP application session to the same device.  But the RTP and RTCP IP addresses may differ from those of the SIP control connection.  If your device needs to support this, the responsibility falls on the cluster devices: they need to track ownership of these kinds of application sessions, and if a device receives packets belonging to an application session owned by another device, it needs to redirect the traffic to the device that owns the SIP session.
  • As indicated implicitly above, L2 switch ports are divided into network ports (data ports), which connect to the DC/enterprise network, and device ports, to which the cluster devices are connected.  With the large port density of current generation switches, some ports can even be dedicated to inter-device communication, thereby avoiding any separate backplane such as InfiniBand or another L2 switch fabric.  L2 switches and devices supporting ETS (802.1Qaz) and 10G ports can use the same port for both inter-cluster communication and network traffic.
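Referring back to the fragmentation caution above, here is a sketch of a hash keyed only on the IP addresses and protocol field; the hash function and device count are my assumptions.  Sorting the address pair makes the result direction-symmetric, and leaving out the L4 ports keeps non-initial fragments (which carry no L4 header) on the same device as the first fragment:

```python
# Sketch of a fragment-safe, direction-symmetric distribution key.
import ipaddress
import zlib

NUM_DEVICES = 4   # assumed cluster size

def fragment_safe_device(src_ip: str, dst_ip: str, ip_proto: int) -> int:
    a = ipaddress.ip_address(src_ip).packed
    b = ipaddress.ip_address(dst_ip).packed
    lo, hi = sorted((a, b))               # same key for C-S and S-C directions
    key = lo + hi + bytes([ip_proto])     # no TCP/UDP ports, so all fragments match
    return zlib.crc32(key) % NUM_DEVICES

# Both directions of a TCP (protocol 6) connection land on the same device:
assert fragment_safe_device("10.0.0.5", "192.0.2.9", 6) == \
       fragment_safe_device("192.0.2.9", "10.0.0.5", 6)
```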
New Generation Configuration Framework

Even though there are multiple devices in the cluster, the admin user should need to configure the cluster only once.  Admin users should not be expected to configure each device in the cluster individually.  Fortunately, the new generation of configuration frameworks is designed to handle cluster configuration.

New generation configuration frameworks support mechanisms to ensure that the configuration is the same across the devices in the cluster.  Increasingly, the configuration architecture includes a central management system that takes care of synchronizing the configuration across devices on a per operation basis.

Network devices maintain many statistics.  With multiple devices in the cluster, each device maintains its own set of counters.  The admin user typically expects to see a consolidated view of the statistics counters across all devices in the cluster.  Again, the new configuration frameworks read the statistics from each device, consolidate them and show the consolidated output.
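Conceptually the consolidation is a simple merge; here is a sketch under the assumption that each device can be queried for a counter-name to value map (the helper and counter names are made up for illustration):

```python
# Sketch of statistics consolidation by the central configuration framework.
from collections import Counter
from typing import Dict, List

def consolidate_statistics(per_device_counters: List[Dict[str, int]]) -> Dict[str, int]:
    total: Counter = Counter()
    for counters in per_device_counters:    # one map per cluster device
        total.update(counters)              # sums values for matching counter names
    return dict(total)

# Example: counters reported by two devices in the cluster.
dev1 = {"sessions_active": 1200, "packets_dropped": 7}
dev2 = {"sessions_active": 950, "packets_dropped": 3}
print(consolidate_statistics([dev1, dev2]))
# -> {'sessions_active': 2150, 'packets_dropped': 10}
```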

On image upgrade: when a new image version is available, new configuration frameworks allow admin users to upgrade the image only once for the whole cluster.  All devices in the cluster then get the image from the central configuration framework.

With these advancements in L2 switches and configuration frameworks, clustering is making a comeback in networks.