Sunday, September 26, 2010

eNodeB and IPsec

The eNodeB secures traffic to the Serving Gateway (SGW) with IPsec tunnels over the backhaul network.  The eNB also creates many tunnels to peer eNBs for X2 and handover traffic.  Though all IPsec features apply in eNB scenarios too, some are worth mentioning specifically in the eNB context.

LAG and IPsec:

Please see the earlier article on LAG and IPsec to understand the issues and solutions in general. That scenario is very much valid for eNB to SGW communication. Note that, in the non-handoff case, traffic from all GTP tunnels travels between the eNB and SGW on one IPsec tunnel, or on a few when DSCP based tunnels are used.  When LAG is used between the eNB and SGW, the same issue of not utilizing more than one link arises.  Both solutions suggested in the earlier article are valid in this scenario too.  Where it is difficult to get multiple public IP addresses for the LAG link, scenario 2 (forceful NAT) is the only option I can think of.


Capabilities expected in eNB and SGW:

Using LAG effectively requires many tunnels.  It is good to have 1 + (number of links - 1) * 32 IPsec tunnels for good distribution across links.  User traffic, in this case GTP traffic, should be balanced across these IPsec tunnels.
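To get a feel for the numbers, the rule of thumb above can be worked out for a few link counts. This is only a sketch: the function name tunnels_for_lag and its per_extra_link parameter are names I made up for illustration, and 32 is simply the factor from the formula.

```python
def tunnels_for_lag(num_links: int, per_extra_link: int = 32) -> int:
    """Tunnel count from the rule of thumb 1 + (number of links - 1) * 32."""
    if num_links < 1:
        raise ValueError("num_links must be at least 1")
    return 1 + (num_links - 1) * per_extra_link


if __name__ == "__main__":
    for links in (1, 2, 4, 8):
        total = tunnels_for_lag(links)
        # Average tunnels per link, assuming the LAG hash spreads them evenly.
        print(f"{links} link(s): {total} tunnels, about {total / links:.2f} per link")
```

With a single link the formula collapses to one tunnel, which matches the non-LAG case. The per-link average only says something about tunnels, not about traffic: an even spread of tunnels gives an even spread of load only if the tunnels carry similar amounts of traffic.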

Typically there are two GTP tunnels for each cell phone user: one created for data traffic and another for voice traffic.  Without LAG, two IPsec tunnels are normally created: one for data traffic to and from all the cell users and another for voice traffic to and from all the cell users.  GTP traffic is distributed across these two IPsec tunnels based on DSCP value.

Now we have a lot more IPsec tunnels.  eNB and SGW need additional logic to distribute the GTP traffic across these multiple IPsec tunnels.  This logic should send all traffic from a given conversation on one IPsec tunnel.  The traffic of each GTP tunnel can be viewed as one conversation; that is, GTP tunnels are distributed across the IPsec tunnels.  One way is to hash the TEID (Tunnel Endpoint ID) and use the hash to pick the IPsec tunnel.
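The TEID idea can be sketched in a few lines of Python. Treat this as an illustration under assumptions, not as how any eNB or SGW actually does it: the function name pick_tunnel is made up, and CRC32 is just one convenient stable hash; any hash that spreads TEIDs well would do.

```python
import zlib
from collections import Counter


def pick_tunnel(teid: int, num_tunnels: int) -> int:
    """Map a 32-bit GTP TEID to an IPsec tunnel index in [0, num_tunnels)."""
    if num_tunnels < 1:
        raise ValueError("num_tunnels must be at least 1")
    # Hash the TEID bytes so that sequentially allocated TEIDs still spread out.
    # CRC32 is an assumption chosen for the sketch, not a requirement.
    digest = zlib.crc32((teid & 0xFFFFFFFF).to_bytes(4, "big"))
    return digest % num_tunnels


if __name__ == "__main__":
    # 10,000 hypothetical GTP tunnels spread over the 97 IPsec tunnels
    # that the rule of thumb gives for a 4-link LAG.
    spread = Counter(pick_tunnel(teid, 97) for teid in range(1, 10_001))
    print("tunnels used:", len(spread))
    print("busiest tunnel carries", max(spread.values()), "GTP tunnels")
    print("quietest tunnel carries", min(spread.values()), "GTP tunnels")
```

Because the mapping depends only on the TEID, every packet of a given GTP tunnel lands on the same IPsec tunnel, which keeps a conversation in order. Note that this balances GTP tunnels, not bytes; a few heavy users can still make the load uneven.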

The IPsec implementation on eNB and SGW should be able to create multiple tunnels, and the number of tunnels should be configurable.  eNB and SGW implementations should also be able to bring up the tunnels on demand. That is, they should ensure that the configured number of tunnels is up and running as long as there is traffic.  Note that all of these IPsec tunnel negotiations carry the same selectors, so IPsec implementations must not try to be intelligent and remove old tunnels that have the same selectors.

If the persistent feature is selected on SPD rules, then the implementation should ensure that all IPsec tunnels are up and running all the time.

As described in the earlier article, the IPsec implementation must be capable of 'Red side fragmentation' so that the LAG always sees a UDP header in every packet, which it requires for distribution.

DSCP based IPsec tunnels:

LTE uses a packet based network for voice, streaming, interactive and non-interactive data.  Hence IPsec tunnels must honor traffic priority so that voice and other real-time traffic is served first.  If both data and voice are sent on the same tunnel, traffic can be dropped by the sequence number checks that are part of anti-replay protection in the receiver.  Packets are marked with increasing sequence numbers as they are encapsulated in the IPsec tunnel, but local QoS and QoS in intermediate devices may send voice traffic ahead of data traffic; that is, traffic is reordered.  As you know, the right edge of the receiver window moves with each newer sequence number.  Data packets with lower sequence numbers are then dropped if they fall below the lower edge of the window.  There are two ways to avoid these unnecessary drops: increase the anti-replay window size, or use different SAs (tunnels) for different kinds of traffic. The second method is normally used.
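The drop behaviour is easy to reproduce with a toy anti-replay window. The sketch below is a simplified sliding-window check with a bitmap; the class ReplayWindow, the helper count_drops, the tiny window size of 4, and the traffic pattern (ten voice packets overtaking ten data packets) are all assumptions chosen to make the effect visible, not measurements from a real network.

```python
class ReplayWindow:
    """Simplified anti-replay window: bit i of the bitmap means (top - i) was seen."""

    def __init__(self, size: int = 64):
        self.size = size
        self.top = 0      # highest sequence number accepted so far (right edge)
        self.bitmap = 0

    def accept(self, seq: int) -> bool:
        if seq < 1:
            return False                      # sequence numbers start at 1 here
        if seq > self.top:
            # A newer packet moves the right edge and shifts the window.
            shift = seq - self.top
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.top = seq
            return True
        offset = self.top - seq
        if offset >= self.size:
            return False                      # below the lower edge: dropped
        if self.bitmap & (1 << offset):
            return False                      # already seen: replay
        self.bitmap |= 1 << offset
        return True


def count_drops(arrivals, window_size: int) -> int:
    """Count how many packets the receiver rejects for a given arrival order."""
    window = ReplayWindow(window_size)
    return sum(0 if window.accept(seq) else 1 for seq in arrivals)


# One shared SA: data was numbered 1..10 and voice 11..20, but QoS in the
# path delivers all the voice packets first.
shared_sa_arrivals = list(range(11, 21)) + list(range(1, 11))

# Separate SAs: each SA has its own sequence space and arrives in order.
data_sa_arrivals = list(range(1, 11))
voice_sa_arrivals = list(range(1, 11))

if __name__ == "__main__":
    print("shared SA, window 4: ", count_drops(shared_sa_arrivals, 4), "drops")
    print("shared SA, window 64:", count_drops(shared_sa_arrivals, 64), "drops")
    print("data SA, window 4:   ", count_drops(data_sa_arrivals, 4), "drops")
    print("voice SA, window 4:  ", count_drops(voice_sa_arrivals, 4), "drops")
```

In this made-up pattern the shared SA with the small window loses every delayed data packet, while either remedy (a bigger window or one SA per traffic class) loses none. A bigger window only helps as long as the reordering stays smaller than the window, which is one reason separate SAs are the more robust choice.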

Due to this feature and the LAG feature above, the number of tunnels that need to be created in eNB and SGW can go up significantly.  Hence both eNB and SGW should have enough memory and computation power to handle many tunnels.

Persistent tunnels

This feature is needed to reduce the latency of initial traffic. Tunnels are up and running all the time, even when there is no traffic.  It is a good fit if the links are always on and there is no cost based on the amount of traffic.


DSCP and ECN Copy settings:


IPsec implementations are expected to copy DSCP and ECN bits from the inner header to the outer header.  The inner header DSCP value is set by applications, and it should carry over when the traffic is tunneled.  This ensures that the nodes between the eNB and SGW also give the traffic QoS treatment, which is why copying the DSCP bits from the inner header to the outer header is necessary.

ECN bits indicate congestion to the peer so that the peer can inform the source to apply congestion control.  TCP has a way to inform the source when the receiver gets IP packets with the CE (congestion experienced) codepoint set in the ECN bits of the IP header.  Intermediate nodes, including eNB and SGW, should honor this by copying the bits from the inner header to the outer header on encapsulation and from the outer header to the inner header on decapsulation.
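Here is a small sketch of the copy logic on the 8-bit TOS / Traffic Class field, where the upper six bits are the DSCP and the lower two are the ECN codepoint. The function names are made up for illustration. On decapsulation I am assuming the behaviour I understand RFC 4301 to describe: the inner DSCP is left alone, and a CE mark on the outer header is carried into the inner header only when the inner packet is ECN-capable.

```python
# ECN codepoints in the two low-order bits of the TOS / Traffic Class byte.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def split_tos(tos: int):
    """Return (dscp, ecn) from an 8-bit TOS / Traffic Class value."""
    return (tos >> 2) & 0x3F, tos & 0b11


def make_tos(dscp: int, ecn: int) -> int:
    """Build an 8-bit TOS / Traffic Class value from DSCP and ECN."""
    return ((dscp & 0x3F) << 2) | (ecn & 0b11)


def outer_tos_on_encap(inner_tos: int) -> int:
    """Encapsulation: copy both DSCP and ECN from the inner to the outer header."""
    dscp, ecn = split_tos(inner_tos)
    return make_tos(dscp, ecn)


def inner_tos_on_decap(inner_tos: int, outer_tos: int) -> int:
    """Decapsulation: keep the inner DSCP, propagate CE if the inner is ECN-capable."""
    inner_dscp, inner_ecn = split_tos(inner_tos)
    _, outer_ecn = split_tos(outer_tos)
    if outer_ecn == CE and inner_ecn != NOT_ECT:
        inner_ecn = CE
    return make_tos(inner_dscp, inner_ecn)


if __name__ == "__main__":
    EF = 46                                   # DSCP commonly used for voice
    inner = make_tos(EF, ECT0)
    outer = outer_tos_on_encap(inner)         # backhaul sees the same DSCP/ECN
    outer_marked = make_tos(EF, CE)           # a congested router marks CE
    print(split_tos(inner_tos_on_decap(inner, outer_marked)))
```

The point of the decapsulation rule is that a congestion mark set by a router in the backhaul is not lost when the outer header is stripped, so the TCP receiver still sees it and can tell the sender to slow down.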

Peer IP address adoption 

The eNodeB gets its IP address from the backhaul provider dynamically. The provider may change the IP address while traffic is flowing.  IPsec tunnels are expected to stay up and running even if the IP address changes on the gateways.  This internet draft discusses the mechanism to adopt the peer gateway address change.  This feature is expected to be present so that voice traffic does not see too much jitter and latency.  Note that tunnel establishment takes hundreds of milliseconds, as it involves IKE negotiation of the keys, and this can introduce jitter and latency for voice traffic that is in progress at that time. Implementations must implement this draft to eliminate jitter and latency issues when the IP address changes on the remote gateway.

IP Fragmentation and Reassembly

Many eNB vendors quote performance numbers for UDP traffic that involve no IP fragmentation and reassembly. Even though this gives one data point, it may mislead customers. Most of the traffic on the Internet today is TCP, and the same is true in the LTE world.  The TCP MSS is chosen in such a way that a TCP data packet with its IP and TCP headers is MTU size.  When that traffic undergoes IPsec encapsulation, it is almost certain that packets need to be fragmented, as they exceed the MTU of the link.  Though the DF bit facility is available for end points to discover the Path MTU, this feature is not used in IPv4 end points today.  Since packets are fragmented, reassembly is required on the other side.
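The arithmetic behind "almost certain" can be sketched as follows. The numbers are assumptions for one common setup, not a statement about any particular product: ESP tunnel mode over IPv4, AES-CBC with a 16-byte IV and 16-byte block, a 12-byte ICV (as with HMAC-SHA1-96), and optionally 8 more bytes when ESP is carried in UDP. The function names are made up for illustration.

```python
def esp_tunnel_packet_size(inner_len: int, block: int = 16, iv_len: int = 16,
                           icv_len: int = 12, udp_encap: bool = False) -> int:
    """Size of the outer IPv4 packet after ESP tunnel-mode encapsulation."""
    outer_ip = 20          # outer IPv4 header without options
    esp_header = 8         # SPI + sequence number
    trailer_fixed = 2      # pad length + next header bytes
    # Inner packet plus the fixed trailer bytes, padded up to the cipher block size.
    padded = -(-(inner_len + trailer_fixed) // block) * block
    size = outer_ip + esp_header + iv_len + padded + icv_len
    if udp_encap:
        size += 8          # UDP header when ESP is UDP-encapsulated
    return size


def max_inner_without_fragmentation(link_mtu: int, **kwargs) -> int:
    """Largest inner packet whose encapsulated form still fits in link_mtu."""
    inner = link_mtu
    while inner > 0 and esp_tunnel_packet_size(inner, **kwargs) > link_mtu:
        inner -= 1
    return inner


if __name__ == "__main__":
    mtu = 1500
    print("full-size inner packet becomes", esp_tunnel_packet_size(mtu), "bytes")
    fits = max_inner_without_fragmentation(mtu)
    print("largest inner packet that fits:", fits, "bytes")
    print("matching TCP MSS (IPv4, no options):", fits - 40, "bytes")
```

Under these assumptions every full-size TCP segment grows past the link MTU after encapsulation, so each one turns into two fragments on the wire. That is why UDP benchmarks with small or carefully sized packets say little about how a box behaves with real TCP traffic.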

This feature is implemented in IPsec implementations, but I am afraid that many implementations with very good IPsec performance on non-fragmented packets are not optimized for the case where fragmentation and reassembly are required.  Customers need to watch out for this, as a significant amount of traffic would be fragmented and reassembled.

IPv6 Support

IPv6 is fast becoming the choice of service providers and the LTE core network.  Hence IPv6 support is expected in IPsec implementations.  eNB and SGW must support both IPv4 and IPv6 tunnels.  They should also be able to send IPv4 and IPv6 traffic on either IPv4 or IPv6 tunnels.

TFC

The Traffic Flow Confidentiality feature is normally given less importance.  I was told that LTE networks require this feature in IPsec tunnels so that anybody who gets hold of backhaul network traffic cannot guess the type of traffic going through the tunnels from traffic characteristics such as frequency of traffic, size of packets, distribution of packets, etc.


AES-GCM

AES-GCM combines encryption and integrity in one algorithm, hence it is called a combined-mode algorithm.  It is supposed to be 2x faster than AES-CBC and to have half its latency.  It is therefore good for both throughput and the low latency that voice traffic requires, and it is becoming a popular algorithm in eNB and SGW.

I believe validation engineers and customers should look for the above features in eNB and SGW.

Comments?

2 comments:

Kyle said...

"1 + (number of links - 1 ) * 32"

Why 32? This gives us 32 - 31n tunnels per link, where n is the number of links. I guess that seems reasonable, but wouldn't it depend highly on the distribution of traffic per tunnel? I.e. if some tunnels were highly lumpy, there's a chance that an even distribution of 24 tunnels per link (when n = 4) will not result in an even distribution of traffic.

Kyle said...

Whoops. I meant 32 - 31/n tunnels per link. :P