In many packet processing applications, there are three operations -
'Session Creation', 'Session Deletion' and 'Session Search'. Session creation and deletion performance affects the session rate, whereas the session search operation impacts throughput. That is, a 'session find' typically happens on a per-packet basis, while session creation/deletion happens once per connection. Both metrics - connection rate and throughput - are important, and any data structure selected should take care of both. It should also perform well in a multicore environment. The fields used to search for the session depend on the application: the 5-tuple of the packet (including the TCP/UDP ports) is used in the case of firewall/NAT; SIP, DIP and DSCP fields are used in the case of a forwarding application; and DIP and SPI are used in the case of IPsec.
Many implementations use hash tables for arranging the sessions. Hash algorithms vary from simple rotate-and-XOR operations, through Bob Jenkins' hash, to CRC-32/64. A few bits of the algorithm's output are used to index into the hash table. Collision entries are normally arranged in a doubly linked list. As part of the search operation, the collision entries are scanned for an exact match by comparing all fields that comprise the key.
Problems with hash table implementations have been known for some time. Some of the issues are:
- The number of collision entries has an impact on search performance. If the hash algorithm is bad, or if the sessions created at a given instant happen to be similar, many entries can end up in one hash bucket. That results in poor performance.
- Hash table attacks are not uncommon. There are cases where attackers have broken the hash algorithm. The attacker sends packets crafted such that the device under attack creates a large number of collision elements in one hash bucket. This leads to a CPU exhaustion attack whenever the attacker sends packets whose hash falls into a bucket with a large number of entries.
- Some hash implementations try to take care of the above problem by resizing the hash table dynamically. Even that can lead to a 'CPU exhaustion' attack, since the resize and rehash work itself consumes CPU.
- On the positive side, add/delete operations are as simple as adding and removing a node from a linked list. Session rate performance is therefore typically good - O(1).
- The search operation tends to be good in most normal cases. The number of buckets per hash table is typically chosen as 1/4 of the maximum number of sessions. That is, if 1M sessions are to be supported by a device, then the hash table size is typically chosen as 256K buckets. In the ideal case there would be no more than 4 collision entries per bucket; in practice it varies from 0 to 16 or so for good hash algorithms.
- Since hash tables use linked lists, RCU mechanisms can be used. With RCU, read or search operations do not take any locking CPU cycles. Due to this, there is no lock overhead on multicore processors.
Balanced binary trees also have one more disadvantage - they are not multicore friendly. RCU can't be used for the search operation, so read spin locks need to be taken, and that costs cycles on every search. My observation is that it takes around 200 CPU cycles even when there is no contention with writers. For packet processing applications such as IP forwarding, firewall/NAT and IPsec VPN, 200 cycles can be huge, as the rest of the packet processing might happen within 1000 cycles or so. 200 cycles adds 20% overhead and hence might not be acceptable.
A hybrid model of hash tables and balanced binary trees may be the right approach. This approach is simple:
- Have hash tables for storing the sessions.
- By default, each hash bucket is a linked list.
- If the number of entries in the linked list goes beyond X elements, rearrange them into an RB tree for that bucket. When the number of entries goes back below Y elements, and after some time XYZ, the bucket can move back to a linked list.
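The promotion step above might be sketched as follows. To keep the example short, an unbalanced binary search tree stands in for the RB tree (a real implementation would use red-black rotations), the demotion path (threshold Y plus the XYZ hold-down) is omitted, and the threshold value is an illustrative assumption.

```c
#include <stdint.h>
#include <stdlib.h>

#define X_PROMOTE 8  /* illustrative threshold, not from the text */

/* An entry carries both list and tree linkage so the bucket can be
 * rearranged in place without reallocating nodes. */
struct entry {
    uint64_t key;
    struct entry *next;   /* list linkage */
    struct entry *l, *r;  /* tree linkage */
};

struct bucket {
    int is_tree;          /* 0: linked list, 1: search tree */
    unsigned count;
    struct entry *head;   /* list head or tree root */
};

static struct entry *tree_insert(struct entry *root, struct entry *e)
{
    if (!root) {
        e->l = e->r = NULL;
        return e;
    }
    if (e->key < root->key)
        root->l = tree_insert(root->l, e);
    else
        root->r = tree_insert(root->r, e);
    return root;
}

/* Rebuild the bucket's chain as a tree once it passes X entries. */
static void promote(struct bucket *b)
{
    struct entry *e = b->head, *next;
    b->head = NULL;
    while (e) {
        next = e->next;
        b->head = tree_insert(b->head, e);
        e = next;
    }
    b->is_tree = 1;
}

static void bucket_insert(struct bucket *b, struct entry *e)
{
    b->count++;
    if (b->is_tree) {
        b->head = tree_insert(b->head, e);
        return;
    }
    e->next = b->head;
    b->head = e;
    if (b->count > X_PROMOTE)
        promote(b);
}

static struct entry *bucket_lookup(struct bucket *b, uint64_t key)
{
    struct entry *e = b->head;
    if (b->is_tree) {
        while (e && e->key != key)
            e = key < e->key ? e->l : e->r;
        return e;
    }
    while (e && e->key != key)
        e = e->next;
    return e;
}
```

The point of the hybrid is that a normal bucket keeps the O(1) insert and RCU-friendly list walk, while an attacked or unlucky bucket degrades to O(log n) search instead of O(n).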