Sunday, November 15, 2015

DPDK based Accelerated vSwitch Performance analysis

DPDK-based OVS acceleration improves performance many-fold, by up to 17x over native OVS. Please see the link here: https://download.01.org/packet-processing/ONPS1.5/Intel_ONP_Server_Release_1.5_Performance_Test_Report_Rev1.2.pdf

It is interesting to look at the following:
  • Performance impact with an increasing number of flows.
  • Performance impact when additional functions are introduced.
  • Performance impact when VMs are involved.

The performance report on 01.org has measured values for many configurations. To simplify the analysis, I am taking the values measured with one core and hyper-threading disabled. Also, I am taking the numbers for 64-byte packets, as the PPS rate is what matters for many networking workloads.

Performance impact with an increasing number of flows:

Scenario: Phy-OVS-Phy (packets received from the Ethernet ports are processed by the host Linux OVS and then sent back out; there is no VM involvement). The following snippet is taken from Table 7-12.

One can observe from the above table that the performance numbers drop dramatically as the number of flows increases. Performance went down by 35.8% when the flows were increased from 4 to 4K, and by 45% when the flows were increased from 4 to 8K.
Cloud and telco deployments typically have a large number of flows due to SFC and per-port filtering functionality. It is quite possible to have 128K flows. One thing to note is that the degradation from 4K to 8K is not very high, which is understandable: the cache-thrashing effect levels off after some number of simultaneous flows. My guess is that the degradation with 128K flows would be around 50%.

Performance impact when additional functions are introduced

Many deployments are going to have not only VLAN-based switching but also other functions such as VxLAN, secured VxLAN (VxLAN-o-IPsec), and packet filtering (implemented using IPTables in the case of a KVM/QEMU-based hypervisor). The performance report has no numbers for VxLAN-o-IPsec or IPTables, but it does have numbers for VxLAN-based networking.

The following snippet was taken from Table 7-26. This table shows the PPS (packets per second) numbers when OVS DPDK is used with VxLAN. These numbers are also with one core and hyper-threading disabled.


Without VxLAN, the performance with 4 flows is 16,766,108 PPS. With VxLAN, the PPS number drops to 7,347,179, a 56% degradation.

There are no numbers in the report for 4K and 8K flows with VxLAN. Those numbers would have helped in understanding the performance degradation when both the number of flows and the number of functions increase.

Performance impact when VMs are involved

Table 7-12 of the performance report shows the PPS values with no VM involved. The following snippet shows the performance measured when a VM is involved; packets flow through the Phy-OVS-VM-OVS-Phy path. Note that the VM runs on a separate core, so VM-level packet processing should not affect these numbers.

Table 7-17 shows the PPS value for 64-byte packets with 4 flows: 4,796,202.
The PPS number with 4 flows and no VM involvement is 16,766,108 (as shown in Table 7-12).
Performance degradation is around 71%.

PPS value with 4K flows and a VM involved (from Table 7-21): 3,069,777
PPS value with 4K flows and no VM involved (from Table 7-12): 10,781,154
Performance degradation is around 71%.

Let us see the performance impact of combining VM involvement with an increasing number of flows:

PPS number with 4 flows and no VM involvement: 16,766,108 (from Table 7-12).
PPS number with 4K flows and VM involvement: 3,069,777 (from Table 7-21).
Performance degradation is around 81%.
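
As a quick sanity check, the degradation percentages quoted in this post can be recomputed from the cited PPS values. Here is a minimal C sketch that does just that; the PPS numbers are the ones quoted above from the report tables, and the percentages it prints come out close to the figures discussed in this post:

    #include <stdio.h>

    /* PPS values quoted in this post from the 01.org report tables
     * (64-byte packets, one core, hyper-threading disabled). */
    struct sample {
        const char *scenario;
        double baseline_pps;   /* reference measurement */
        double measured_pps;   /* measurement being compared */
    };

    int main(void)
    {
        const struct sample samples[] = {
            { "4 -> 4K flows, Phy-OVS-Phy (Table 7-12)", 16766108.0, 10781154.0 },
            { "VxLAN vs plain, 4 flows (7-26 vs 7-12)",  16766108.0,  7347179.0 },
            { "VM vs no VM, 4 flows (7-17 vs 7-12)",     16766108.0,  4796202.0 },
            { "VM vs no VM, 4K flows (7-21 vs 7-12)",    10781154.0,  3069777.0 },
            { "VM + 4K flows vs no VM + 4 flows",        16766108.0,  3069777.0 },
        };

        for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
            double degradation =
                100.0 * (1.0 - samples[i].measured_pps / samples[i].baseline_pps);
            printf("%-42s %5.1f%% degradation\n", samples[i].scenario, degradation);
        }
        return 0;
    }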

Observations:


  • Performance degradation is steep when packets are handed off to a VM. One big reason is that the packet traverses the DPDK-accelerated OVS twice, unlike the Phy-OVS-Phy case where OVS sees the packet only once. The second likely culprit is the virtio Ethernet (vhost-user) processing in the host.
  • Performance degradation is also steep when the number of flows is increased. My guess is that cache thrashing is the culprit here.
  • Performance also degrades when more functions, such as VxLAN, are enabled.

Having said that, the main thing to note is that DPDK-based OVS acceleration shows roughly a 7x to 17x improvement over native OVS in all cases. That is very impressive indeed.

Comments?





Thursday, November 12, 2015

OF Actions using eBPF programs

I was just thinking about this, searched the Internet, and found that it is already implemented and part of OVS. Please see the series of patches here:

http://openvswitch.org/pipermail/dev/2015-February/051002.html

Also, if you want to look at the patched OVS source code, check this Linux kernel tree:

https://github.com/azhou-nicira/net-next

You can check the net/openvswitch directory to understand the BPF changes.

Essentially, the above patches address one of the concerns of OF critics. The OF specifications today define the set of actions, and any new action has to go through the standardization process in the ONF, which can take a few months, if not years. Many companies, such as Nicira and Freescale, defined their own proprietary extensions, but their acceptance is low as they are not part of the OF specifications. Now, with BPF integration in the kernel OVS data path, one can define proprietary actions as a set of BPF programs. As long as a generic "run BPF program" action is standardized in the OF specifications, new actions can be implemented as BPF programs without having to go through the standardization process.
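
To make the idea concrete, the sketch below shows what a proprietary action written as a BPF-style program could look like. Note that the context structure, return codes and function name here are purely illustrative assumptions on my part; the actual patch series defines its own datapath hook and helpers.

    #include <stdint.h>

    /* Hypothetical per-packet context handed to the action program. */
    struct ovs_bpf_ctx {
        uint8_t  *data;       /* start of packet data (Ethernet header) */
        uint8_t  *data_end;   /* one byte past the end of the packet */
        uint32_t  in_port;    /* ingress port number */
    };

    enum { OVS_BPF_ACT_PASS = 0, OVS_BPF_ACT_DROP = 1 };

    /* Example proprietary action: drop IPv4 packets whose TTL is 1,
     * without waiting for ONF to standardize such an action. */
    int drop_expiring_ipv4(struct ovs_bpf_ctx *ctx)
    {
        uint8_t *p = ctx->data;

        if (p + 14 + 20 > ctx->data_end)       /* need Ethernet + IPv4 headers */
            return OVS_BPF_ACT_PASS;
        if (p[12] != 0x08 || p[13] != 0x00)    /* EtherType is not IPv4 */
            return OVS_BPF_ACT_PASS;
        if (p[14 + 8] == 1)                    /* TTL field of the IPv4 header */
            return OVS_BPF_ACT_DROP;
        return OVS_BPF_ACT_PASS;
    }

The flow table would then carry the generic "run BPF program" action pointing at this program, and the controller never needs a new standardized action type for it.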

As of now, Linux allows BPF programs to be hooked at three points: the system-call level, the CLS scheduling level, and the "dev" layer. Once the above patches are accepted in the Linux kernel, the OVS datapath becomes a fourth place where BPF programs can be attached.
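
For a flavour of what attaching a BPF program looks like today, here is a small sketch using the long-standing socket filter interface (SO_ATTACH_FILTER) with a classic BPF program that passes only IPv4 frames. Error handling is minimal and the program needs CAP_NET_RAW to run; it is just an illustration of the existing hook points, not of the OVS patches.

    #include <stdio.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>
    #include <linux/if_ether.h>
    #include <linux/filter.h>

    int main(void)
    {
        /* Classic BPF program: accept IPv4 frames (EtherType 0x0800), drop the rest. */
        struct sock_filter code[] = {
            BPF_STMT(BPF_LD | BPF_H | BPF_ABS, 12),            /* load EtherType */
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0x0800, 0, 1), /* IPv4? */
            BPF_STMT(BPF_RET | BPF_K, 0xFFFF),                 /* accept */
            BPF_STMT(BPF_RET | BPF_K, 0),                      /* drop */
        };
        struct sock_fprog prog = {
            .len = sizeof(code) / sizeof(code[0]),
            .filter = code,
        };

        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
        if (fd < 0 || setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER,
                                 &prog, sizeof(prog)) < 0) {
            perror("bpf attach");
            return 1;
        }
        printf("classic BPF filter attached; only IPv4 frames will be delivered\n");
        return 0;
    }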

I am hoping that BPF programs can be executed not only in the kernel, but also in DPDK-accelerated OVS as well as in smart-NICs (FPGAs, network processors, AIOP, etc.).

Thanks
Srini

virtio based flow API

Hi,

The OPNFV DPACC team is trying to standardize a flow API that would be called from VNFs to talk to flow-processing modules in the VMM or smart-NIC.

A discussion group has been started. You are welcome to join it here:

https://app.pivot-it.com/Public/Open-flow-based-distributed-data-path-for-virtualised-data-centers-openflowdatapath/enter/

Why a flow API from VNFs: NFV success depends on the performance VNFs can achieve. Many feel that NFV's flexibility is great, but that achieving performance similar to physical appliances requires 2x to 10x the hardware. Even with a lot more hardware, one can achieve the throughput, but issues remain in achieving low packet latency and low packet jitter. Since NFV uses generic hardware, the capital cost may not go up (in comparison with physical appliances), but the electricity charges for powering and cooling the extra hardware would. Space requirements also go up, which adds to the cost of operations.

OVS acceleration using DPDK and smart-NICs addresses this issue by accelerating or offloading the virtual switching functionality. That takes care of half of the problem; the other half is packet processing inside the VNFs. Many VNFs follow a CP-DP model where the DP can be offloaded, and many smart-NIC solutions want to take care of that too, but the lack of standards is dampening these efforts. In NFV environments, VNFs and smart-NICs come from different parties. VNFs can be placed on many servers with smart-NICs from various vendors. VNF vendors would not like to maintain drivers for various smart-NICs, and smart-NIC vendors would not like to provide drivers for various VNF operating systems and their versions. Operators who procure these also do not want to worry about such inter-dependencies. That is where standardization helps.

Another big issue in the NFV market is the type of VNFs that will be placed on a given server. The mix of VNFs on a server keeps changing with time. Since the types of VNFs are not known a priori, there are a few ways to handle this:


  • One way is to provide all kinds of fast-path functions in the smart-NIC. That is very complicated, not only with respect to the number of functions, but also with respect to keeping up with newer VNFs and their fast-path functions.
  • A second way is to let the VNFs install their own fast paths in the smart-NIC. That is also challenging, as there are multiple smart-NIC vendors. It is a complexity for VNF vendors to provide fast-path functions for multiple smart-NIC vendors, and a complexity for smart-NIC vendors to ensure that a malicious VNF fast path does not hinder their operations. A few people are exploring eBPF-based fast-path functions; since eBPF comes with some verification of the program to ensure it does not hinder other operations, many think this could be one approach.
  • A third way is to define a generic pipeline of packet stages with dynamic flow programming, similar to OpenFlow but with reduced complexity. Smart-NIC vendors only need to provide a flow-processing entity; all the intelligence in using it is up to the VNFs, and VNFs don't push any software entity into the smart-NIC. With this mechanism, the smart-NIC can be a network processor, AIOP, FPGA, etc.

The above discussion forum is trying to create a generic flow API that can be used by VNFs to program tables, flows with actions, objects, etc.
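
To give a flavour of what such an API could look like from the VNF side, here is a hypothetical sketch in C; every structure and function name below is an illustrative assumption of mine and not the DPACC proposal itself.

    #include <stdint.h>

    /* Hypothetical match fields for a flow entry. */
    struct flow_key {
        uint32_t in_port;
        uint8_t  dst_mac[6];
        uint16_t eth_type;      /* e.g. 0x0800 for IPv4 */
        uint32_t ipv4_dst;      /* network byte order */
    };

    enum flow_action_type {
        FLOW_ACT_OUTPUT,        /* forward to a port */
        FLOW_ACT_DROP,
        FLOW_ACT_SET_VLAN
    };

    struct flow_action {
        enum flow_action_type type;
        uint32_t arg;           /* e.g. output port number or VLAN id */
    };

    /* VNF-side calls, carried over a virtio transport to the flow-processing
     * entity in the VMM or smart-NIC. */
    int flow_table_create(uint32_t table_id, uint32_t max_entries);
    int flow_add(uint32_t table_id, const struct flow_key *key,
                 const struct flow_action *actions, int n_actions,
                 uint32_t priority);
    int flow_delete(uint32_t table_id, const struct flow_key *key);

The point of standardizing at this level is that the VNF only describes tables, flows, and actions; how they are realized (network processor, AIOP, FPGA, or software) is entirely up to the flow-processing entity.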

Please join the group and let us discuss the technical items there. Once we have some clarity, we intend to present this to the OPNFV DPACC forum.


Thanks
Srini