Tuesday, January 12, 2010

Hash Tables versus Balanced Binary trees - Hybrid model

Hash tables are most commonly used in network infrastructure components.  Firewall/NAT sessions, IPsec SAs and routing cache entries are some of the examples where hash tables are used to store them.

In many of the packet processing applications,  there are three operations -
'Session Creation',  'Session Deletion' and 'Session Search'.  Session Creation and Deletion performance affects the session rate where as session search operation impacts the throughput. That is 'Session find' operation typically happens on per packet basis and 'Session creation/deletion' happens for each connection.  Both the metrics - connection rate and throughput - are important.  Any data structure selected should take care of these two important metrics.  Also, any data structure selected should also perform well in Multicore environment.  Fields used to search for the session depends on the application.  5-tuples of the packet is used in case of firewall/NAT/TCP/UDP,  SIP, DIP, DSCP fields are used in case of forwarding application and DIP and SPI are used in case of IPsec.

Many implementations used to use hash tables for arranging the sessions.  Hash algorithms vary from simple rotation & XOR operations, Bob-jenkins to CRC-32/64.  Few bits of the output of algorithm is used to index into the hash table.  Collision entries are normally arranged in double linked list form.  As part of search operation, collision entries are searched for exact match by comparing all fields that comprise the key.

Problem with the hash table implementations are known for some time. Some of the issues are:
  • Number of collision entries has impact on search performance. In some cases, if hash algorithm is bad' or if the type of sessions created at that instance are similar,  it is possible that there are many entries in one hash bucket.  That results to poor performance.
  • Hash table attacks are not uncommon.  There are cases where attackers broke the hash algorithm.  Attacker send packets such a way that the device under attack creates large number of collision elements in one hash bucket.  This leads to CPU exhaustion attack whenever packets are sent (by attackers)  whose hash falls to a bucket having large number of entries.
  • Some hash implementations try to take care of above problem by resizing the hash table dynamically. Even that can lead to 'CPU exhaustion' attack. 
Hash tables are also has advantages:
  • Add/Delete operations are as simple as adding and removing a node from linked list. Session rate performance is typically better - O(1).
  • Search operation tends to be good for most normal cases.  Number of buckets per hash table is typically chosen as 1/4 of the number of maximum of sessions. That is, if 1M sessions to be supported by a device, then the hash table size is typically chosen as 256K.  In ideal case, there would not be more than 4 collision entries.  In many cases, it varies from 0 to 16 or so for good hash algorithms.
  • Since hash tables use linked lists,  RCU mechanisms can be used.  As we know, read or search operations do not take any CPU cycles.  Due to this, there is no lock overhead in Multicore processors. 
Balanced binary trees (RB trees or AVL trees) are also increasingly used in place of hash tables. They provide consistent performance in both best and worst cases scenarios.  Every time the session is added or deleted, the tree gets balanced.  Search operation is also completed with O(LogN) performance where N being the number of sessions in the tree.  If there are 1M sessions,  at the most 3 nodes are checked.  Addition and deletion operations are also happen within O(LogN).  It would have some  performance degradation with addition and deletion operation, but there is no attack possible.

Balanced binary trees also have one more disadvantage -  It is not Multicore friendly.  RCU can't be used for search operation.  Read spin locks need to be used. This would take some cycles on per search operation.  My observation is that it takes around 200 CPU cycles if there is no contention with writers.  For some packet processing applications such as IP forwarding, Firewall/NAT, IPsec VPN  200 cycles can be huge as the rest of packet processing might happen within 1000 cycles or so.  200 cycles adds to 20% overhead and hence might not be acceptable.

Hybrid model of hash tables and balanced binary tree may be right approach.  This approach is simple.
  •  Have hash tables for storing the sessions.
  •  By default, each hash bucket is a linked list.
  •  if number of entries in the linked list go beyond X elements, then rearrange them in a RB tree for that bucket.  When the number of entries in the linked list go below Y elements and after some XYZ time it can move back to linked list.
This method would get best of both worlds.  In normal work loads, performance would be as good as hash table.  If there is any attack or any fringe cases,  performance degradation is managed with RB tree.



Anand said...

Hash tables have one more advantage of Key computation , which on a Multicore can be parallel .
If you look at the conntrack / netfilter implementation of hashtables , they do take care of mutiple collisions with configurable bucket size and the attacks ,by confirming the flows and then adding into the hashtable.

Srini said...

I am not familiar with conntrack module in Linux. How does it take care of reducing the collisions?