Sunday, May 11, 2008

Packet processing applications - Updating to Multicore processors

I described the need for session parallelization, along with some programming tips, here. Many network packet processing applications do not take advantage of SMP systems. This post describes the steps required to convert such applications to run on SMP systems with as few modifications as possible.

Target packet processing applications:

  • Packet processing applications that maintain sessions (flows), such as firewalls, IPsec VPN, intrusion prevention software, traffic management, etc.
  • Packet processing applications that spend a significant number of cycles on each packet. The session parallelization technique adds a few checks and queue operations (enqueue and dequeue) for the packets of a session, so it pays off only when the CPU cycles required to process a packet far exceed the cycles spent on those checks and queue operations. Otherwise, you are better off taking locks during packet processing. For example, a plain IP forwarding application does very little work per packet, so it is better served by locks than by session parallelization.
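
To make the "few checks and queue operations" concrete, here is a minimal sketch of how the per-session check and queue might look. It is only my illustration of the idea, not the exact scheme from the earlier post: the names (struct packet, struct session, session_submit, count_packet) are hypothetical, and a pthread mutex stands in for whatever small lock the platform provides.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical packet and session types, for illustration only. */
struct packet {
    struct packet *next;
    int id;
};

struct session {
    pthread_mutex_t qlock;      /* protects busy, head and tail only */
    int busy;                   /* non-zero while some CPU owns the session */
    struct packet *head;        /* packets waiting for the owner */
    struct packet *tail;
    unsigned long packets_seen; /* a Session Packet Variable: owner-only */
};

typedef void (*process_fn)(struct session *s, struct packet *p);

static void session_init(struct session *s)
{
    pthread_mutex_init(&s->qlock, NULL);
    s->busy = 0;
    s->head = s->tail = NULL;
    s->packets_seen = 0;
}

/* Example per-packet work: touches session state without taking a lock,
 * which is safe because only the current owner runs it. */
static void count_packet(struct session *s, struct packet *p)
{
    (void)p;
    s->packets_seen++;
}

/* Called on any CPU for each packet of the session. */
static void session_submit(struct session *s, struct packet *p, process_fn process)
{
    pthread_mutex_lock(&s->qlock);
    if (s->busy) {
        /* Another CPU owns the session: enqueue the packet and leave. */
        p->next = NULL;
        if (s->tail != NULL)
            s->tail->next = p;
        else
            s->head = p;
        s->tail = p;
        pthread_mutex_unlock(&s->qlock);
        return;
    }
    s->busy = 1;                /* this CPU becomes the owner */
    pthread_mutex_unlock(&s->qlock);

    while (p != NULL) {
        process(s, p);          /* the expensive part, done with no lock held */

        pthread_mutex_lock(&s->qlock);
        p = s->head;            /* dequeue the next waiting packet, if any */
        if (p != NULL) {
            s->head = p->next;
            if (s->head == NULL)
                s->tail = NULL;
        } else {
            s->busy = 0;        /* queue drained: give up ownership */
        }
        pthread_mutex_unlock(&s->qlock);
    }
}
```

The lock here is held only for the flag check and the enqueue/dequeue, never across the packet processing itself. That is why the technique is worthwhile only when process() costs much more than these few operations.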

Let us examine the typical packet processing application requirements.

  • Upon the first packet of a session, they create a context - the session context. These session contexts are kept in some kind of run-time data structure such as hash buckets, linked lists, arrays or some other container. I call this the home base structure.
  • Subsequent packets of the session get the session context from the home base data structure.
  • Session contexts are deleted either by timer or by user actions or due to some special packets.
  • Session contexts are typically 'C' structures with multiple members (states)
    • Some members are set when the session context is created and never change. I call them "Session Constants".
    • Some members are manipulated during packet processing, and their values are needed by subsequent packets. They are not required by any function other than packet processing. I call them "Session Packet Variables".
    • Some members are manipulated both by packet processing and in non-packet-processing contexts. I call them "Session Mutex Variables".
    • Some variables are used as containers by other modules. That is, storing and retrieving the values is the responsibility of those modules. I call them "Session Container Variables".
  • A packet processing application might contain multiple modules, each with its own sessions (control blocks). For performance and other reasons, modules at times store references to other modules' control blocks in their own sessions, and the current module's session might in turn be referenced by other modules. While other modules hold a pointer to a session, the session must not be freed until all of those references are released. At the same time, the delete operation cannot be postponed until every other module has released its reference. To satisfy both requirements, each control block typically contains two variables - "Reference Count" and "Delete flag". The Reference Count is incremented when an external module stores a reference to the session. When the session is being deleted (due to the inactivity timer, special packets, user actions, etc.) and it is still referenced, the module sets the session's "Delete flag". Any search ignores sessions that have the "Delete flag" set. The module is also expected to inform the external modules that the session is being deleted. Upon this notification, external modules are expected to remove their references and, as part of that, decrement the session's reference count. The session is freed only after all references are removed and the Delete flag is set.
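
As a rough illustration of the member categories above, a firewall-style session context might be laid out as follows. Every field name here (fw_session, flow_key, policy_id, ipsec_sa_ref and so on) is made up for the example, and locking is left out on purpose. The chain_find helper only shows the rule that a search ignores sessions whose Delete flag is set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 5-tuple identifying a flow. */
struct flow_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  proto;
};

struct fw_session {
    /* Session Constants: set at creation, never changed. */
    struct flow_key key;
    uint32_t policy_id;

    /* Session Packet Variables: touched only by packet processing. */
    uint32_t next_expected_seq;
    uint64_t bytes_seen;

    /* Session Mutex Variables: shared with non-packet contexts. */
    uint32_t idle_timeout_secs;

    /* Session Container Variables: owned by another module. */
    void *ipsec_sa_ref;

    /* Lifetime management. */
    uint32_t ref_count;
    bool     delete_flag;

    struct fw_session *next;    /* link in the home base chain */
};

/* Field-by-field comparison avoids comparing struct padding bytes. */
static bool key_equal(const struct flow_key *a, const struct flow_key *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->proto == b->proto;
}

/* Search one home base chain; sessions marked for deletion are skipped. */
static struct fw_session *chain_find(struct fw_session *head,
                                     const struct flow_key *key)
{
    for (struct fw_session *s = head; s != NULL; s = s->next) {
        if (!s->delete_flag && key_equal(&s->key, key))
            return s;
    }
    return NULL;
}
```

Grouping the members by category in the structure itself makes it easier to see later which ones need a lock and which ones do not.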

Process to follow to make applications SMP aware (Example: Applications running in Kernel space)

  • Identify the Session or Control block of the target module.
  • Define two additional variables in the control block - Reference Count and Delete flag.
  • Identify the home base structure. Since sessions can be created in multiple CPU contexts, define a lock for this structure, preferably a read/write lock. The read lock can be taken when a session is searched for in the home base data structure. The write lock must be taken while adding a control block to, or deleting it from, the home base data structure. Be sure to increment the reference count while adding to the home base structure. Remove the control block from the home base structure only if the reference count is 1 and the Delete flag is set to TRUE.
    • Sometimes the home base is some other module's control block. In this case, it is the responsibility of the other module (the container module) to store, retrieve and reset this control block reference atomically.
    • Always initialize the control block completely before adding it to the home base data structure.
    • Ensure that the control block's reference count is incremented within the home base lock for both add and search operations.
  • Identify Session Constants. Nothing needs to be done for them for SMP.
  • Identify Session Packet Variables. There is no need to lock these variables if the "Session Parallelization" technique is employed.
  • Identify Session Mutex Variables. Further identify the logical groupings of these variables. For each of these logical groupings:
    • Define a lock. This lock can be a global lock or a session lock. Session locks are better for performance, but they require more memory.
    • Define macros or inline functions to manipulate and access the variables of the group.
  • Identify Session Container Variables. If this control block is the home base structure for other modules' control blocks, then define macros or inline functions to add, retrieve and reset the stored reference. These macros will be used by the other modules.
    • Define this set of macros for each container variable.
    • Define a session lock for each container variable. The macros above use this lock to protect the integrity of the container value.
    • Since the ADD macro keeps a reference to the foreign module's control block, it must increment the reference count of that foreign control block. To allow this, the ADD macro should expect a function pointer to be passed to it by the foreign module, and it calls the function pointed to in order to increment the reference count.
    • Expect the "Increment Reference Count" function pointer to be passed to the RETRIEVE macro as well. Since the RETRIEVE macro returns a pointer to the foreign module's control block, it must increment the reference count to ensure that the control block is not freed prematurely.
    • Ensure that the complete functionality of the ADD/RETRIEVE/RESET macros runs under the lock defined for each container variable.
  • Some guidelines on the Reference Count and Delete flag variables:
    • Define "IncRefCount", "DecRefCount", "SetDelete" and "GetDelete" macros.
    • Define a lock to manipulate and access "Reference Count" and "Delete flag" variables.
    • Use this lock only in above macros/inline functions.
    • Use above macros for manipulating and accessing these variables.
    • Do the first level of cleanup of the control block when the "SetDelete" macro is called. This cleanup involves informing external modules that the session is going down and removing any foreign module references. Foreign references are the ones this module stored as part of its packet/session processing. Removing a foreign reference involves decrementing the reference count of the foreign control block and invalidating the reference in this control block.
    • Do the final cleanup, such as freeing the memory of the control block, only when the "Reference Count" is 1 and the "Delete flag" is set to TRUE. Before freeing the memory, remove the control block from the home base structure. This condition is typically detected as part of the "Decrement Reference Count" macro.
    • In non-SMP scenarios, the reference count is incremented only when external modules store a reference to the session. In SMP scenarios, one processor can delete the session while another processor is processing its packets. For this reason, the reference count must also be incremented during packet processing: whenever a search for the session is done, or whenever the session is retrieved from a container module. As indicated above, retrieving the session, including incrementing the reference count, must be done under the data structure lock or the container lock. Since incrementing the reference count happens under its own lock, you will see a lock within a lock in this scenario: the reference count lock is taken while the data structure or container lock is held. This is OK, because the data structure or container lock is never taken while holding the reference count lock, even when a session is added to or removed from the data structure/container.
