Tuesday, April 15, 2008

Network security computing with Session parallelization - Development tips

Until about 2005, network security performance was largely a function of processor power. Processors kept getting better and faster, and network security performance improved with them. More recently, individual processors have stopped getting much faster; instead, more processors (cores) are being added to each chip. I guess physics has limited how much faster a single processor can get. Manufacturers are concentrating on increasing the gate count in their chips rather than the clock rate, and chip vendors are putting more processing cores on one die. These chips are called 'Multi Core Processors' (MCPs).

Before any application can take advantage of MCPs, the operating system must support multiple cores. Linux and other operating systems have supported MCPs for the last few years. Linux calls this feature 'SMP' (Symmetric Multi Processing). In this mode all cores share the same memory, i.e. the code and data segments are the same across all cores. Since any core can access any part of the code and data, critical data must be protected by serializing access to the code that touches it.

Applications are expected to take advantage of the multiple cores in the system, so the mantra for applications now is parallelization. It goes without saying that parallelizing software is not easy. It is time consuming and takes significant investment.

One serialization technique is to stop other cores from entering a critical section of code while one core is executing it. In Linux kernel space this is typically done with spin locks. Too much serialization reduces performance, and the application then does not scale well with the number of cores. At the same time, critical data and data structures (such as binary trees, linked lists etc.) must be protected by serialization.
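As a rough illustration of this kind of serialization, here is a minimal userspace sketch in C. It is not kernel code: the kernel provides its own spin lock primitives, and the names used here (spinlock_sketch_t, spin_lock_sketch, locked_list, list_push) are made up for this example. It only shows the idea of busy-waiting on a flag so that a single core at a time modifies a shared linked list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a kernel spin lock. */
typedef struct { atomic_flag locked; } spinlock_sketch_t;

static void spin_lock_sketch(spinlock_sketch_t *l)
{
    /* Busy-wait until the flag was previously clear, i.e. we took the lock. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;
}

static void spin_unlock_sketch(spinlock_sketch_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

struct node {
    int value;
    struct node *next;
};

/* A linked list whose head and length are the "critical data". */
struct locked_list {
    spinlock_sketch_t lock;
    struct node *head;
    int length;
};

static void list_init(struct locked_list *list)
{
    atomic_flag_clear(&list->lock.locked);
    list->head = NULL;
    list->length = 0;
}

/* Every core that inserts goes through the same lock, so inserts are serialized. */
static void list_push(struct locked_list *list, struct node *n)
{
    assert(n != NULL);
    spin_lock_sketch(&list->lock);
    n->next = list->head;
    list->head = n;
    list->length++;
    spin_unlock_sketch(&list->lock);
}
```

The more often the packet path passes through a lock like this, the more time cores spend spinning instead of doing useful work.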

Other difficulties in parallelizing code are:
  • Parallel code is prone to errors such as deadlocks and race conditions and, more importantly, it is difficult to debug.
  • Problems are hard to reproduce, so identifying them lengthens the development cycle.
  • Maintainability of the code: the first version might have been written with all the parallelization problems in mind. As time passes, different developers work on and maintain the code. They may not be aware of the parallelization problems and can make mistakes.
  • Test coverage on a single CPU is not sufficient. The test suite must be updated so that it can find as many race conditions as possible.

Any parallelization approach should keep the code simple to develop and maintain, and yet efficient. The 'Session parallelization' approach is one method that makes this task simple for networking applications running in kernel space.

Network security primarily consists of firewall, IPsec VPN, Intrusion Prevention (IPS), URL filtering and Anti Virus/Spam/Spyware functions. To improve performance, the firewall, IPsec VPN and in some cases IPS functions are typically run in Linux kernel space. Each security function has a notion of a session. For firewall and IPS, sessions are simply 5-tuple flows; for IPsec VPN, the session is the 'SA Bundle'. Network security functions maintain state information within their sessions. That state sometimes changes on a per-packet basis, and the processing of each new packet depends on the current state. There are two ways to keep packet processing synchronized with respect to this state. One way is to serialize every code path that checks or updates the state information. If states are checked and modified at multiple points in the packet path, there are multiple serializations per packet, and the performance impact grows with their number.
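To make the notion of a session concrete, here is a small sketch of how a 5-tuple flow (source and destination address, source and destination port, protocol) could identify a firewall session. Everything here is an assumption made for illustration: the structure names, the 256-bucket table and the choice of an FNV-1a style hash are not taken from any particular firewall, and locking around the table is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 5-tuple key for an IPv4 flow. */
struct five_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t protocol;
};

/* Mix the low 'nbytes' bytes of 'value' into an FNV-1a style hash. */
static uint32_t fnv1a_step(uint32_t h, uint32_t value, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        h ^= (value >> (8 * i)) & 0xffu;
        h *= 16777619u;
    }
    return h;
}

/* Hash field by field so that struct padding never enters the hash. */
static uint32_t tuple_hash(const struct five_tuple *t)
{
    uint32_t h = 2166136261u;
    h = fnv1a_step(h, t->src_ip, 4);
    h = fnv1a_step(h, t->dst_ip, 4);
    h = fnv1a_step(h, t->src_port, 2);
    h = fnv1a_step(h, t->dst_port, 2);
    h = fnv1a_step(h, t->protocol, 1);
    return h;
}

static bool tuple_equal(const struct five_tuple *a, const struct five_tuple *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->protocol == b->protocol;
}

#define SESSION_BUCKETS 256

/* One session: the key plus per-session state that changes packet by packet. */
struct session_entry {
    struct five_tuple key;
    int state;
    struct session_entry *next;
};

struct session_table {
    struct session_entry *buckets[SESSION_BUCKETS];
};

static void session_insert(struct session_table *t, struct session_entry *e)
{
    assert(e != NULL);
    uint32_t b = tuple_hash(&e->key) % SESSION_BUCKETS;
    e->next = t->buckets[b];
    t->buckets[b] = e;
}

/* Returns the session for this flow, or NULL if none is established yet. */
static struct session_entry *session_lookup(struct session_table *t,
                                            const struct five_tuple *key)
{
    uint32_t b = tuple_hash(key) % SESSION_BUCKETS;
    for (struct session_entry *e = t->buckets[b]; e != NULL; e = e->next)
        if (tuple_equal(&e->key, key))
            return e;
    return NULL;
}
```

Every packet of a flow maps to the same session entry, which is where the per-packet state discussed above lives.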

The other way is 'Session parallelization'. In session parallelization, packet synchronization happens at the session level. At any time, only one core owns a session. If any other core receives a packet for that session, the packet is queued in the session for later processing by the core that owns it. If no core owns the session, the core that received the packet stamps its ownership on the session and starts processing the packet. Once the core has processed the packet, it checks whether any packets are pending. If so, it processes them; if not, it disowns the session. With this method, once the session has been identified, the rest of the packet processing does not need to take any locks for serialization. This not only improves performance but is also less error prone. Having said that, locking can't be avoided in some cases, such as session establishment and inter-session state maintenance.
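The ownership scheme described above can be sketched roughly as follows. This is a userspace illustration under stated assumptions, not kernel code: all names (struct session, session_receive, MAX_PENDING and so on) are hypothetical, the pending queue is a fixed-size ring that drops packets when full, and a very short guard lock protects only the owner field and the queue. The per-session state itself is touched without any lock, because only the owning core ever runs process_packet for that session.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NO_OWNER    (-1)
#define MAX_PENDING 64

struct packet { int id; };

struct session {
    atomic_flag guard;                   /* protects owner and the queue only */
    int owner;                           /* core id, or NO_OWNER */
    struct packet *pending[MAX_PENDING]; /* ring buffer of queued packets */
    size_t head, count;
    int processed;                       /* per-session state, owner-only */
    int last_id;
};

enum rx_result { RX_PROCESSED, RX_QUEUED, RX_DROPPED };

static void guard_lock(struct session *s)
{
    while (atomic_flag_test_and_set_explicit(&s->guard, memory_order_acquire))
        ;
}

static void guard_unlock(struct session *s)
{
    atomic_flag_clear_explicit(&s->guard, memory_order_release);
}

static void session_init(struct session *s)
{
    atomic_flag_clear(&s->guard);
    s->owner = NO_OWNER;
    s->head = s->count = 0;
    s->processed = 0;
    s->last_id = -1;
}

/* Runs without locks: only the owning core gets here for this session. */
static void process_packet(struct session *s, const struct packet *p)
{
    s->processed++;
    s->last_id = p->id;
}

static enum rx_result session_receive(struct session *s, int core,
                                      struct packet *p)
{
    assert(core != NO_OWNER);
    guard_lock(s);
    if (s->owner != NO_OWNER) {
        /* Another core owns the session: queue the packet for it. */
        enum rx_result r = RX_DROPPED;
        if (s->count < MAX_PENDING) {
            s->pending[(s->head + s->count) % MAX_PENDING] = p;
            s->count++;
            r = RX_QUEUED;
        }
        guard_unlock(s);
        return r;
    }
    s->owner = core;                     /* stamp ownership */
    guard_unlock(s);

    for (;;) {
        process_packet(s, p);
        guard_lock(s);
        if (s->count == 0) {
            s->owner = NO_OWNER;         /* nothing pending: disown */
            guard_unlock(s);
            return RX_PROCESSED;
        }
        p = s->pending[s->head];         /* take the next pending packet */
        s->head = (s->head + 1) % MAX_PENDING;
        s->count--;
        guard_unlock(s);
    }
}
```

Checking for pending packets and disowning the session happen under the same guard, so a packet queued by another core cannot be left stranded between the check and the release.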

During session establishment, access to the configuration information is required. A configuration update can happen in the context of one core while session establishment happens in the context of another. Data integrity must be maintained so that a core does not pick up wrong configuration information. The integrity of the data structures must be maintained as well. Think of the case where the configuration engine is removing a configuration record from a linked list while the session establishment code is traversing that list. If care is not taken, the session establishment code might dereference an invalid pointer and crash. Since the session establishment phase requires only read access to the configuration structures, the locks can be taken in read-only mode. In read-only mode, multiple cores can establish different sessions at the same time, so session establishment performance is not affected. Since configuration updates are far less common, using read locks does not degrade performance.
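Here is a minimal sketch of the read lock idea, again in userspace C with made-up names (rwlock_sketch, config_allows, config_remove). The Linux kernel has its own reader/writer lock type with read_lock() and write_lock(); the toy lock below only imitates the behaviour so the example is self-contained, and it makes no attempt at fairness towards writers. Session establishment would call something like config_allows, while the configuration engine would call config_add and config_remove.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reader/writer spin lock: -1 = writer, 0 = free, n > 0 = n readers. */
struct rwlock_sketch { atomic_int state; };

static void read_lock_sketch(struct rwlock_sketch *l)
{
    for (;;) {
        int v = atomic_load_explicit(&l->state, memory_order_relaxed);
        if (v >= 0 && atomic_compare_exchange_weak_explicit(
                          &l->state, &v, v + 1,
                          memory_order_acquire, memory_order_relaxed))
            return;
    }
}

static void read_unlock_sketch(struct rwlock_sketch *l)
{
    atomic_fetch_sub_explicit(&l->state, 1, memory_order_release);
}

static void write_lock_sketch(struct rwlock_sketch *l)
{
    int expected = 0;
    while (!atomic_compare_exchange_weak_explicit(
               &l->state, &expected, -1,
               memory_order_acquire, memory_order_relaxed))
        expected = 0;   /* a failed exchange overwrote it; retry from "free" */
}

static void write_unlock_sketch(struct rwlock_sketch *l)
{
    atomic_store_explicit(&l->state, 0, memory_order_release);
}

/* Hypothetical configuration record kept in a linked list. */
struct rule { int port; bool allow; struct rule *next; };
struct config { struct rwlock_sketch lock; struct rule *rules; };

static void config_init(struct config *c)
{
    atomic_init(&c->lock.state, 0);
    c->rules = NULL;
}

/* Configuration engine: exclusive access while the list is modified. */
static void config_add(struct config *c, struct rule *r)
{
    assert(r != NULL);
    write_lock_sketch(&c->lock);
    r->next = c->rules;
    c->rules = r;
    write_unlock_sketch(&c->lock);
}

/* Unlinks and returns the rule; the caller may free it after this returns. */
static struct rule *config_remove(struct config *c, int port)
{
    struct rule *removed = NULL;
    write_lock_sketch(&c->lock);
    for (struct rule **pp = &c->rules; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->port == port) {
            removed = *pp;
            *pp = removed->next;
            break;
        }
    }
    write_unlock_sketch(&c->lock);
    return removed;
}

/* Session establishment: many cores can traverse the list at the same time. */
static bool config_allows(struct config *c, int port)
{
    bool allow = false;
    read_lock_sketch(&c->lock);
    for (const struct rule *r = c->rules; r != NULL; r = r->next) {
        if (r->port == port) {
            allow = r->allow;
            break;
        }
    }
    read_unlock_sketch(&c->lock);
    return allow;
}
```

Because a writer can only get in when no reader is inside, a record is never unlinked while some core is still walking the list.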

Another area where locks have to be taken during packet processing is where inter-session state is maintained, such as statistics counter updates, session rate control state etc. Since many cores work on different sessions simultaneously, the inter-session state must be protected with write locks around the code that manipulates it. For better performance, it is very important that the amount of code executed under these locks is as small as possible.
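As a sketch of keeping the locked region small, the example below protects a packet counter and a simple per-second session rate limit. The structure, the one-second window and all names (shared_state, stats_add_packets, rate_allow_session) are assumptions made for illustration. Only a few assignments and comparisons run while the lock is held; anything expensive is meant to happen before or after.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical inter-session state shared by all cores. */
struct shared_state {
    atomic_flag lock;
    unsigned long packets;        /* statistics counter */
    unsigned window_start;        /* second in which the current window began */
    unsigned sessions_in_window;  /* sessions admitted in that second */
    unsigned session_limit;       /* maximum new sessions per second */
};

static void shared_lock(struct shared_state *st)
{
    while (atomic_flag_test_and_set_explicit(&st->lock, memory_order_acquire))
        ;
}

static void shared_unlock(struct shared_state *st)
{
    atomic_flag_clear_explicit(&st->lock, memory_order_release);
}

static void shared_init(struct shared_state *st, unsigned session_limit)
{
    assert(session_limit > 0);
    atomic_flag_clear(&st->lock);
    st->packets = 0;
    st->window_start = 0;
    st->sessions_in_window = 0;
    st->session_limit = session_limit;
}

/* Only the addition itself runs under the lock. */
static void stats_add_packets(struct shared_state *st, unsigned long n)
{
    shared_lock(st);
    st->packets += n;
    shared_unlock(st);
}

/* Returns true if a new session may be established in second 'now_sec'. */
static bool rate_allow_session(struct shared_state *st, unsigned now_sec)
{
    bool ok;
    shared_lock(st);
    if (st->window_start != now_sec) {   /* a new second: reset the window */
        st->window_start = now_sec;
        st->sessions_in_window = 0;
    }
    ok = st->sessions_in_window < st->session_limit;
    if (ok)
        st->sessions_in_window++;
    shared_unlock(st);
    return ok;                           /* the caller acts on it outside the lock */
}
```

A core can also accumulate a packet count locally while it owns a session and call stats_add_packets once, which cuts down how often the shared lock is taken.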

Though there are cases where locks are still used, such as inter-session state and configuration data access, these instances are few. Session state updates and accesses are far more common. As long as session parallelization is done, the SMP problem is manageable. Of course, one could argue that the performance of a given session is limited by the power of one core. But in a typical environment there are many sessions, and hence the system throughput will be roughly proportional to the number of cores.
