Saturday, September 25, 2010

Link Aggregation and IPsec

Link aggregation is also called Ethernet trunking or bonding. The feature was originally specified in IEEE 802.3ad. In 2008 it was moved into IEEE 802.1AX.

What is LAG:

A LAG (link aggregation group) combines multiple Ethernet ports and exposes them as one link to the upper layers of the system.
It is a Layer 2 concept. Only the trunk port is assigned IP addresses; the links in the trunk carry no Layer 3 information. Only one MAC address is used for the trunk. The individual MAC addresses of the links do not appear in any communication other than the control protocol (Marker protocol).

How does it work?

A LAG contains two components: the Distributor and the Collector.
The Distributor spreads outgoing traffic across the links that make up the trunk. The Collector gathers inbound traffic arriving on the different links and passes it through the trunk port to the rest of the system.
LAG assumes that all links in the trunk are full duplex and point to point.
The simplest distribution scheme sends traffic packet by packet across the links according to a weight configured on each link. This can lead to packet mis-ordering.
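
To make the mis-ordering risk concrete, here is a minimal sketch of weighted per-packet distribution. The function names and the weights are made up for illustration; real distributors live in hardware or in the bonding driver. Packets of a single conversation end up on both links, so any difference in link latency can reorder them.

```python
from itertools import cycle

def weighted_schedule(weights):
    """Return a repeating link order in which link i appears weights[i] times."""
    order = [link for link, weight in enumerate(weights) for _ in range(weight)]
    return cycle(order)

def distribute_per_packet(packets, weights):
    """Assign each packet to a link purely by position, ignoring its flow."""
    schedule = weighted_schedule(weights)
    return [(next(schedule), pkt) for pkt in packets]

# Six packets of one conversation, two links, the first link weighted 2:1.
packets = [("flowA", seq) for seq in range(6)]
assignment = distribute_per_packet(packets, weights=[2, 1])
links_used_by_flow_a = {link for link, _ in assignment}
```

Here `links_used_by_flow_a` contains both links, which is exactly the situation a flow-aware distributor tries to avoid.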


What must the Distributor take care of?

Packet mis-ordering is one of the issues a distributor faces if it distributes traffic blindly on a per-packet basis. To avoid mis-ordering, distributors are expected to send all packets of a given flow (conversation) on the same link. First generation distributors hashed the source and destination IP addresses and selected the link based on the hash value. This ensures that traffic belonging to one conversation stays on the same link, but the distribution may not be even for some workloads. Second generation distributors go one step further and include the TCP/UDP ports in the hash. This gives better distribution, but it can cause mis-ordering if the outbound packets are fragments. Only the first fragment carries the transport header; the other fragments do not. In those cases, non-initial fragments may go out on a different link and arrive out of order. Since fragments are not very common, some deployments accept some level of mis-ordering in exchange for better utilization of the links.
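
A rough sketch of the two hashing generations follows. The packet layout (a dict) and the use of SHA-256 are assumptions made for readability; actual distributors use whatever hash the switch or driver implements. The point is only that a non-initial fragment has no ports to hash, so it can land on a different link than the first fragment when ports are included.

```python
import hashlib

def link_for_packet(pkt, num_links, use_ports=True):
    """Pick a link index from the packet's flow key.

    pkt is a dict with 'src_ip' and 'dst_ip' and, when a transport
    header is present, 'src_port' and 'dst_port'.
    """
    key = [pkt["src_ip"], pkt["dst_ip"]]
    if use_ports and "src_port" in pkt:
        key += [str(pkt["src_port"]), str(pkt["dst_port"])]
    digest = hashlib.sha256("|".join(key).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_links

first_fragment = {"src_ip": "10.0.0.1", "dst_ip": "10.0.1.1",
                  "src_port": 5000, "dst_port": 80}
later_fragment = {"src_ip": "10.0.0.1", "dst_ip": "10.0.1.1"}  # no transport header

# First generation: IP addresses only, both fragments always agree.
ip_only = (link_for_packet(first_fragment, 4, use_ports=False),
           link_for_packet(later_fragment, 4, use_ports=False))

# Second generation: ports included, the later fragment falls back to IP only
# and may therefore pick a different link than the first fragment.
with_ports = (link_for_packet(first_fragment, 4),
              link_for_packet(later_fragment, 4))
```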

Collector and Packet mis-ordering:  

To ensure that packets are delivered in order, the collector should pass packets up in the order in which it receives them on any given link. No ordering needs to be maintained between packets arriving on different links. The collector should also ensure that it does not starve any link while receiving packets.
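
One simple way to meet both requirements is to keep a FIFO per link and service the links round-robin. The sketch below is only an illustration of that idea, with hypothetical names, not a description of any particular implementation.

```python
from collections import deque

def collect(link_queues):
    """Drain per-link FIFO queues round-robin, one frame per link per pass.

    Order within each link is preserved, and no link waits more than
    one pass before it is serviced.
    """
    queues = [deque(q) for q in link_queues]
    delivered = []
    while any(queues):
        for queue in queues:
            if queue:
                delivered.append(queue.popleft())
    return delivered

# Link 0 has three frames pending, link 1 has one.
delivered = collect([["a1", "a2", "a3"], ["b1"]])
```

Frames from link 0 come out as a1, a2, a3 in that order, while b1 is not held back behind the busier link.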

IPsec and LAG:

In some deployments, traffic is always encrypted with IPsec and sent to the remote office. If one tunnel is used, all traffic going from the local network to the remote gateway carries the same source and destination IP addresses in the outer header. If UDP traversal (NAT-T) is applied, it also carries the same source and destination ports. Even if there are multiple links in the LAG, the distributor hash always selects the same link and the other links are not used.

The same is true of reverse traffic. Also note that the links in the aggregation group terminate at the local ISP. The remote gateway, even though it is under the same administrative control, does not know about the local link aggregation. That is, balancing of incoming traffic across the links is in the hands of the service provider. It is reasonable to assume that most 802.3ad distributors are configured to hash on IP addresses and, in some cases, on ports as well.

Since distributors only know the IP addresses and ports of the packet, the links are utilized well in both directions if there is a large number of flows with different IP addresses and ports. A single IPsec tunnel does not provide that variety.
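
The contrast can be sketched in a few lines. The addresses are documentation ranges, the hash is SHA-256 as a stand-in for whatever the distributor really uses, and `pick_link` is a hypothetical helper. UDP port 4500 is the port used for NAT-T.

```python
import hashlib
from collections import Counter

NUM_LINKS = 4

def pick_link(src_ip, dst_ip, src_port, dst_port, num_links):
    """Hash the outer header fields a distributor can see onto a link index."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % num_links

# 1000 packets inside one NAT-T tunnel: the outer header never changes.
tunnel_links = Counter(
    pick_link("198.51.100.1", "203.0.113.1", 4500, 4500, NUM_LINKS)
    for _ in range(1000)
)

# 1000 clear-text flows with varying source address and port.
plain_links = Counter(
    pick_link(f"10.0.{i // 250}.{i % 250 + 1}", "203.0.113.1", 1024 + i, 80, NUM_LINKS)
    for i in range(1000)
)
```

`tunnel_links` has a single key no matter how many inner conversations the tunnel carries, whereas `plain_links` should show the flows spread over the links.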


Solutions

There are two solutions I can think of.

Solution 1:  Using multiple IP addresses on the trunk link

Create as many tunnels with the remote gateway as there are IP addresses on the trunk link. As described in the link here, some software should distribute the flows across these IPsec tunnels. Since each tunnel now has a different source IP address in the outer IP header, the LAG distribution hash may select different links, making better use of the bandwidth in the outbound direction. One should ensure that there are enough local IP addresses for all links to be used, and used evenly.
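
How many addresses are "enough" depends on the distributor's hash, which is usually not known exactly. Assuming a hash over the outer source and destination IP only (plain ESP has no ports), a quick check of which links a given set of tunnel addresses would cover might look like this. All names and addresses here are hypothetical.

```python
import hashlib

def pick_link(src_ip, dst_ip, num_links):
    """Stand-in for an IP-only distributor hash."""
    digest = hashlib.sha256(f"{src_ip}|{dst_ip}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_links

def links_covered(local_ips, remote_ip, num_links):
    """Map each link index to the tunnel source addresses that hash onto it."""
    coverage = {link: [] for link in range(num_links)}
    for ip in local_ips:
        coverage[pick_link(ip, remote_ip, num_links)].append(ip)
    return coverage

# Eight local addresses, one tunnel each, over a two-link LAG.
local_ips = [f"198.51.100.{n}" for n in range(1, 9)]
coverage = links_covered(local_ips, "203.0.113.1", num_links=2)
unused = [link for link, ips in coverage.items() if not ips]
```

If `unused` is not empty, or one link carries most of the tunnels, more addresses (and tunnels) are needed.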

Reverse traffic is balanced as well, since the service provider's switch also sees different (destination) IP addresses.

Getting or assigning multiple public IP addresses for the trunk may not be possible. In that case, the second solution can be used, at the cost of some per-packet overhead.

Solution 2:  Forcing NAT-T

Even when no NAT is detected, there are ways to force UDP traversal, so that ESP packets are sent within a UDP payload. Create as many tunnels as necessary for good distribution at the LAG level. Each tunnel has a different UDP source port. Some software in the device is expected to balance the traffic across these tunnels. The LAG then distributes the tunnels across its links. Reverse traffic is also balanced across links, because the tunnels differ in destination port in that direction. Since the LAG distributor is expected to look at the transport header, it is necessary that there are no fragments. So it is mandatory that the tunnels are configured with red-side fragmentation, which ensures that fragmentation is done before IPsec encapsulation.
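
If the distributor's hash were known, source ports could even be chosen so that each tunnel lands on a different link. The sketch below assumes exactly that, which is rarely true for the provider's switch; in practice one simply creates a few more tunnels than links. `choose_source_ports` and the starting port are made up for illustration.

```python
import hashlib

NAT_T_PORT = 4500  # UDP port used for IPsec NAT traversal

def pick_link(src_ip, dst_ip, src_port, dst_port, num_links):
    """Stand-in for a distributor hash over IP addresses and UDP ports."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % num_links

def choose_source_ports(local_ip, remote_ip, num_links,
                        first_port=40000, max_tries=1000):
    """Search for one UDP source port per link, keeping the first hit for each."""
    port_for_link = {}
    for port in range(first_port, first_port + max_tries):
        link = pick_link(local_ip, remote_ip, port, NAT_T_PORT, num_links)
        port_for_link.setdefault(link, port)
        if len(port_for_link) == num_links:
            break
    return port_for_link

ports = choose_source_ports("198.51.100.1", "203.0.113.1", num_links=4)
```

Each entry in `ports` is a candidate source port for one tunnel. The same ports appear as destination ports in the reverse direction, which is what lets the provider's distributor spread the return traffic.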


In both solutions, the remote and local gateways need some logic to
  • know that multiple tunnels are created for the same selectors in order to distribute the flows.
  • know how to distribute different conversations to different tunnels.
This requires more tunnel capacity in the devices. That should not be a problem, as modern devices have enough horsepower and memory to create a few more tunnels with the peer gateway.
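
The second item can be as simple as hashing the inner (clear-text) 5-tuple to pick one of the equivalent tunnels, so that a conversation always stays in the same tunnel. This is only a sketch under that assumption; the function name and tunnel labels are hypothetical.

```python
import hashlib

def tunnel_for_flow(src_ip, dst_ip, proto, src_port, dst_port, tunnels):
    """Pick one of several equivalent tunnels for an inner flow."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    index = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(tunnels)
    return tunnels[index]

tunnels = ["tunnel-0", "tunnel-1", "tunnel-2"]

# The same conversation always maps to the same tunnel.
first = tunnel_for_flow("10.1.1.5", "10.2.2.9", "tcp", 51000, 443, tunnels)
again = tunnel_for_flow("10.1.1.5", "10.2.2.9", "tcp", 51000, 443, tunnels)
```

Because the choice is made on the clear-text packet before encapsulation, it also fits naturally with red-side fragmentation.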


Comments?
