Wednesday, December 30, 2009

Huge-tlb-fs to improve Linux user space program performance

As I indicated before, many networking vendors are moving their heavy packet-processing applications from kernel space to user space for maintainability of the code, the richness of libraries available in user space, and GPL concerns about running applications in kernel space. The typical concern in moving applications to user space is performance. This write-up describes one technique for reducing the performance degradation that comes with moving programs from kernel space to user space.

The Linux kernel's text and data segments are generally fixed and contiguous. Hence few TLB entries are required, and they are assigned at initialization time. Unlike kernel code, user space applications are divided across multiple processes. Each process runs in its own virtual address space. TLB entries are used to translate virtual addresses to physical memory addresses.

The number of TLB entries in a processor is limited. Hence the TLB is used as a cache of page-table mappings. When there is no TLB entry corresponding to the virtual address being accessed, one of the TLB entries is freed, the page tables are searched for the mapping between the virtual address and the physical address, and this mapping is added to the freed entry. (Depending on the processor, this refill is done by hardware or by the kernel's TLB-miss handler.) Programs with sparse memory access patterns will have more TLB misses, and performance can suffer as a result. In addition, if the page tables are accessed more often, the L1 and L2 caches fill up with page-table entries more often, leaving less cache for the program itself and further decreasing application performance. This problem does not arise for kernel space code because of the one-time mapping of TLB entries for the kernel text and data segments. Reducing TLB misses is the key to improving performance. The "hugetlbfs" facility allows a TLB entry to map a bigger chunk of memory, i.e. a huge page.
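To put rough numbers on this, here is a small sketch of "TLB reach", i.e. how much memory the TLB can cover at once. The figures are assumptions chosen for illustration only: a 64-entry TLB, 4 KiB base pages, 2 MiB huge pages, and a 256 MiB working set. Real values depend on the processor and kernel configuration.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries, page_size):
    """Bytes of memory that can be mapped by the TLB at one time."""
    return entries * page_size


def pages_needed(working_set, page_size):
    """Number of pages (and so distinct mappings) covering a working set."""
    return -(-working_set // page_size)  # ceiling division


# Illustrative assumptions, not measurements from any particular CPU.
TLB_ENTRIES = 64
WORKING_SET = 256 * MIB

for name, page_size in (("4 KiB pages", 4 * KIB), ("2 MiB pages", 2 * MIB)):
    reach = tlb_reach(TLB_ENTRIES, page_size)
    pages = pages_needed(WORKING_SET, page_size)
    print(f"{name}: reach = {reach // KIB} KiB, mappings needed = {pages}")
```

Under these assumptions the same 64 entries cover 256 KiB with 4 KiB pages but 128 MiB with 2 MiB pages, and the working set needs 128 mappings instead of 65536. That is why a program touching memory sparsely misses far less often with huge pages.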

This facility can be used to map the text segment and also to do application memory management. Linux documentation on this topic, with some examples, can be found here.
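As a rough illustration of the application memory management case, the sketch below maps a file as a shared read/write buffer. When the file lives on a hugetlbfs mount, the kernel backs the mapping with huge pages; on any other file system the same code is just an ordinary mmap. The helper names, the 2 MiB default page size, and the mount point in the usage note are all assumptions for illustration. The real huge page size is reported as Hugepagesize in /proc/meminfo, and huge pages must be reserved by the administrator beforehand.

```python
import mmap
import os

# Assumed default; check Hugepagesize in /proc/meminfo on a real system.
HUGE_PAGE_SIZE = 2 * 1024 * 1024


def round_up(length, page_size=HUGE_PAGE_SIZE):
    """Round length up to a whole number of pages."""
    return -(-length // page_size) * page_size


def map_file(path, length, page_size=HUGE_PAGE_SIZE):
    """Create path, size it to whole pages, and map it shared read/write.

    hugetlbfs requires the mapping length to be a multiple of the huge
    page size, hence the rounding. Hypothetical helper, not a library API.
    """
    size = round_up(length, page_size)
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        os.ftruncate(fd, size)  # set the file size before mapping it
        return mmap.mmap(fd, size, mmap.MAP_SHARED,
                         mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        os.close(fd)  # the mapping remains valid after the fd is closed
```

With hugetlbfs mounted at a hypothetical /mnt/huge, a call such as `buf = map_file("/mnt/huge/pktbuf", 10 * 1024 * 1024)` would give a 10 MiB buffer that an application could carve up with its own allocator. If too few huge pages are reserved, the mapping or the first access to it is expected to fail, so real code needs error handling around it.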

The above documentation and usage is mainly limited to applications allocating huge pages from the kernel. Besides allocating huge pages for its data, an application also needs its text segment to use this facility in order to decrease TLB misses. Some work was done on this, but it is not yet available in the mainline kernel. A patch is available for developers to try, however. An explanation of this technique is provided here

A patch to the kernel, binutils, and glibc is provided here
