I came across this statement in a blog entry: "However, the stack of any OS cannot go under ~2000 cycles per packets due to its design." It is really true. Linux and other operating systems are generic, and they need to cater to many different types of applications. Generality always comes at a price. IP forwarding in Linux, for example, goes through many layers of software between receiving a packet and transmitting it, and each layer adds cycles. As excellently listed in the blogs above, the higher layers of the operating system don't have full control over the hardware (processor and accelerator) features. Because so many layers of software come into the picture, cache utilization is not very good. On top of that, context switching and locks (on multicore processors) in these generic layers use up even more core cycles.
Fastpath is nothing new. Networking vendors have used it for a long time. A fastpath is specific to the applications running in a networking device. For example, networking devices implement fastpath layers for firewall, IPsec VPN, QoS and forwarding. The fastpath concept is very simple and is described below.
- When a packet is received, the fastpath first checks for a matching context.
- If there is no matching context, the packet is given to the normal application modules.
- The normal path, based on its policy rules, may decide to handle the session by creating its own context in the application. In addition, it either creates a context in the FP, or handles the session itself by not populating a context in the FP.
- If there is a matching context, the packet is handled within the fastpath and routed without involving the normal path.
- The fastpath runs right above the hardware and has access to all hardware features.
- Since the FP is specific to a given hardware, it can take advantage of all its features without worrying about losing generality. Since the fastpath module for any given application is expected to have a very small footprint, not being general across all hardware devices is considered okay.
- Fastpath implementations follow the run-to-completion model.
- Due to its small footprint, most of the fastpath code may stay in the processor's L1 cache.
- Fastpath implementations can lock critical data blocks in the processor caches.
- Lock-free implementation: performance scales linearly with the number of cores.
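As a rough illustration of the packet flow in the first four points above, here is a minimal sketch in Python. Everything in it is hypothetical and chosen only for illustration: the `FlowKey` 5-tuple, the `FlowContext` fields, the `Fastpath` class and the toy policy in `normal_path` (offload TCP sessions, keep everything else in the normal path). A real fastpath would sit next to the hardware and be written in C or microcode, so this shows only the control flow, not the performance characteristics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, NamedTuple, Tuple


class FlowKey(NamedTuple):
    """Hypothetical 5-tuple used to identify a session."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int


@dataclass
class FlowContext:
    """What the fastpath needs to forward packets of a known session."""
    out_port: int
    packets: int = 0


class Fastpath:
    def __init__(self, normal_path: Callable) -> None:
        self.contexts: Dict[FlowKey, FlowContext] = {}
        self.normal_path = normal_path

    def receive(self, key: FlowKey, payload: bytes) -> Tuple[str, int]:
        ctx = self.contexts.get(key)
        if ctx is None:
            # No matching context: hand the packet to the normal path.
            return self.normal_path(self, key, payload)
        # Matching context: forward without involving the normal path.
        ctx.packets += 1
        return ("fastpath", ctx.out_port)


def normal_path(fp: Fastpath, key: FlowKey, payload: bytes) -> Tuple[str, int]:
    """Toy policy (an assumption): offload TCP sessions, keep the rest here."""
    out_port = 1  # placeholder for a real routing/policy decision
    if key.proto == 6:  # TCP
        # Populate a context in the fastpath so later packets bypass us.
        fp.contexts[key] = FlowContext(out_port)
    return ("normal", out_port)
```

With this sketch, the first packet of a TCP session goes through `normal_path`, which installs a context, and later packets of the same session are handled by `Fastpath.receive` alone. A UDP session never gets a context, so every one of its packets keeps going through the normal path.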
There are different types of fastpath implementations in the industry today.
- ASIC based fast path implementations.
- Network processor based fastpath implementations (remember NPF? They even have a set of API documents for fast path. You can still find them here).
- Control plane and data plane split, with some cores running the CP and other cores running the FP in the DP.
- Linux Ethernet driver based fastpath for devices that use Linux SMP.
I believe many network device vendors will need some kind of fast path if they want to support high volumes of traffic on cost-effective processors.