Wednesday, December 23, 2015

Nice tutorial presentation on Openstack Horizon with AngularJS

Good textual information here:

Pasting the relevant text here:

One of the main areas of focus for Horizon has been around better user experience. Horizon has been moving towards AngularJS as a client-side architecture in order to provide better, faster feedback to users, decrease the amount of data loading on each request, and push more of the UI onto the client side.
For the Kilo release, the team is working on a reference implementation to dictate how AngularJS will be implemented in Horizon.
Improved Table Experience
Currently, filtering and pagination within Horizon are inconsistent. For Kilo, the table experience will be improved by moving much of it to the client side, caching more data on the client and providing pagination and filtering in the Horizon code instead of relying on the APIs to do it.
David hopes this results in a more consistent user experience, and takes the guesswork out of understanding the paradigm on each page.
There is a wizard in Horizon currently, but it's primitive and it has remained largely stagnant through Havana and Icehouse. David said he plans to refocus on that in this release because it's one of the biggest usability issues Horizon faces -- the launch instance workflow is confusing and requires a lot of upfront knowledge.
The Horizon team has been working with the OpenStack UX team to design a better workflow that will be implemented on the client side. The team will use the network wizard as well to make sure they have an extensible format and an extensible widget.
Refine plugin support
It’s important that Horizon has a clean plug-in mechanism that doesn’t involve editing horizon source code.
Right now it requires a user to add files into the directory where Horizon gets deployed; this isn't optimal because it causes problems when a user wants to update a package. The Kilo version of Horizon will have refined plug-in support that allows users to point to where the plug-in mechanism is and provide better Angular support.
Better theming
One concern that operators have is that they don't want to ship Horizon with the same UI as everyone else, especially if they’re putting it in front of customers. David said there's a need to be able to theme it better without having to hack the code.
The team plans to continue development and provide an improved theming mechanism that's easy to work with.

Friday, December 4, 2015

Breaking Diffie Hellman security with massive computing

Following blogpost and paper talk one possible method NSA would have used to read IPsec and SSL encrypted data.

And detailed paper here:

Both IPsec and SSL/TLS use the DH algorithm to arrive at a shared secret without ever sending the actual secret. Please see my previous post on this.

I don't understand all the mathematical details presented in the paper. But the important thing to note is that in IPsec and SSL, the DH prime and base numbers are not secret. They are sent in the clear between the parties. The primes are even published in advance. For example, RFC 3526 defines the prime numbers for the various IPsec DH groups.

According to the paper:

"The current best technique for attacking Diffie-Hellman relies on compromising one of the private exponents (a, b) by computing the discrete log of the corresponding public value (g^a mod p, g^b mod p)."

Once a private exponent is recovered via the massive compute power, getting hold of the shared key is not a problem, since computing the shared secret from one private exponent and the other party's public value is the standard DH computation.

My initial reading was that a large amount of computational power is required to recover the private exponents for every DH operation. Apparently, that is not true. Once the heavy precomputation for a particular prime is done, any further DH exchanges using that prime can be broken with far less compute power. That is, a large part of the intermediate computational results can be reused. An attacker needs to invest in the massive computational power only once; the rest of the DH exchanges can then be broken with little compute power, as long as the same prime is used in the new DH exchanges.

Now that this news is out,  this may be done by malicious entities. That is actually more concerning.

What are alternative solutions?

- Use EC version of DH.
- Create new primes for each tunnel and exchange the prime number.

Any other solutions?


Wednesday, December 2, 2015

Diffie Hellman - So well Explained here

See this link:

Pasting relevant text here:


Every cipher we have worked with up to this point has been what is called a symmetric key cipher, in that the key with which you encipher a plaintext message is the same as the key with which you decipher a ciphertext message. As we have discussed from time to time, this leads to several problems. One of these is that, somehow, two people who want to use such a system must privately and secretly agree on a secret key. This is quite difficult if they are a long distance apart (it requires either a trusted courier or an expensive trip), and is wholly impractical if there is a whole network of people (for example, an army) who need to communicate. Even the sophisticated Enigma machine required secret keys. In fact, it was exactly the key distribution problem that led to the initial successful attacks on the Enigma machine.
However, in the late 1970's, several people came up with a remarkable new way to solve the key-distribution problem. This allows two people to publicly exchange information that leads to a shared secret without anyone else being able to figure out the secret. The Diffie-Hellman key exchange is based on some math that you may not have seen before. Thus, before we get to the code, we discuss the necessary mathematical background.

Prime Numbers and Modular Arithmetic

Recall that a prime number is an integer (a whole number) that has as its only factors 1 and itself (for example, 2, 17, 23, and 127 are prime). We'll be working a lot with prime numbers, since they have some special properties associated with them.
Modular arithmetic is basically doing addition (and other operations) not on a line, as you usually do, but on a circle -- the values "wrap around", always staying less than a fixed number called the modulus.
To find, for example, 39 modulo 7, you simply calculate 39/7 (= 5 4/7) and take the remainder. In this case, 7 divides into 39 with a remainder of 4. Thus, 39 modulo 7 = 4. Note that the remainder (when dividing by 7) is always less than 7. Thus, the values "wrap around," as you can see below:
0 mod 7 = 0     6 mod 7 = 6
1 mod 7 = 1     7 mod 7 = 0
2 mod 7 = 2     8 mod 7 = 1
3 mod 7 = 3     9 mod 7 = 2
4 mod 7 = 4    10 mod 7 = 3
5 mod 7 = 5
To do modular addition, you first add the two numbers normally, then divide by the modulus and take the remainder. Thus, (17+20) mod 7 = (37) mod 7 = 2.
Modular arithmetic is not unfamiliar to you; you've used it before when you want to calculate, for example, when you would have to get up in the morning if you want to get a certain number of hours of sleep. Say you're planning to go to bed at 10 PM and want to get 8 hours of sleep. To figure out when to set your alarm for, you count, starting at 10, the hours until midnight (in this case, two). At midnight (12), you reset to zero (you "wrap around" to 0) and keep counting until your total is 8. The result is 6 AM. What you just did is to solve (10+8) mod 12. As long as you don't want to sleep for more than 12 hours, you'll get the right answer using this technique. What happens if you slept more than 12 hours?
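The clock trick above is exactly what Python's `%` operator computes; a small sketch (the variable names are mine):

```python
# Modular arithmetic "wraps around" at the modulus, like hours on a clock.
bedtime = 10       # going to bed at 10 PM
sleep_hours = 8

wake_up = (bedtime + sleep_hours) % 12
print(wake_up)     # 6 -> set the alarm for 6 AM

# The division-remainder examples from above work the same way:
print(39 % 7)            # 4
print((17 + 20) % 7)     # 2
```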


Here are some exercises for you to practice modular arithmetic on.
  1. 12+18 (mod 9)
  2. 3*7 (mod 11)
  3. (103 (mod 17)) * (42 (mod 17)) (mod 17)
  4. 103*42 (mod 17)
  5. 7^2 (mod 13)
  6. 7^3 (mod 13)
  7. 7^4 (mod 13)
  8. 7^5 (mod 13)
  9. 7^6 (mod 13)
Did you notice something funny about the last 5 exercises? While, usually, when we take powers of numbers, the answer gets systematically bigger and bigger, using modular arithmetic has the effect of scrambling the answers. This is, as you may guess, useful for cryptography!
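The exercises can be checked with Python's `%` operator and the three-argument `pow()`, which computes modular powers directly:

```python
# Exercises 1-4: modular addition and multiplication.
answers = [
    (12 + 18) % 9,                      # exercise 1
    (3 * 7) % 11,                       # exercise 2
    ((103 % 17) * (42 % 17)) % 17,      # exercise 3
    (103 * 42) % 17,                    # exercise 4 (same answer as 3)
]
print(answers)                          # [3, 10, 8, 8]

# Exercises 5-9: successive powers of 7 modulo 13.
powers = [pow(7, e, 13) for e in range(2, 7)]
print(powers)                           # [10, 5, 9, 11, 12] -- scrambled, no obvious pattern
```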

Diffie-Hellman Key Exchange

The premise of the Diffie-Hellman key exchange is that two people, Alice and Bob, want to come up with a shared secret number. However, they're limited to using an insecure telephone line that their adversary, Eve (an eavesdropper), is sure to be listening to. Alice and Bob may use this secret number as their key to a Vigenere cipher, or as their key to some other cipher. If Eve gets the key, then she'll be able to read all of Alice and Bob's correspondence effortlessly. So, what are Alice and Bob to do? The amazing thing is that, using prime numbers and modular arithmetic, Alice and Bob can share their secret, right under Eve's nose! Here's how the key exchange works.
  1. Alice and Bob agree, publicly, on a prime number P, and a base number N. Eve will know these two numbers, and it won't matter!
  2. Alice chooses a number A, which we'll call her "secret exponent." She keeps A secret from everyone, including Bob. Bob, likewise, chooses his "secret exponent" B, which he keeps secret from everyone, including Alice (for subtle reasons, both A and B should be relatively prime to N; that is, A should have no common factors with N, and neither should B).
  3. Then, Alice computes the number
    J = N^A (mod P)
    and sends J to Bob. Similarly, Bob computes the number
    K = N^B (mod P)
    and sends K to Alice. Note that Eve now has both J and K in her possession.
  4. The final mathematical trick is that Alice now takes K, the number she got from Bob, and computes
    K^A (mod P).
    Bob does the same step in his own way, computing

    J^B (mod P).
    The number they get is the same! Why is this so? Well, remember that K = N^B (mod P) and Alice computed K^A (mod P) = (N^B)^A (mod P) = N^BA (mod P). Also, Bob used J = N^A (mod P), and computed J^B (mod P) = (N^A)^B (mod P) = N^AB (mod P).
    Thus, without ever knowing Bob's secret exponent, B, Alice was able to compute N^AB (mod P). With this number as a key, Alice and Bob can now start communicating privately using some other cipher.
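The whole exchange fits in a few lines of Python. The numbers below (P = 23, N = 5, and the two secret exponents) are my own toy choices, far too small for real security:

```python
P, N = 23, 5          # public prime and base -- Eve knows both

A = 6                 # Alice's secret exponent
B = 15                # Bob's secret exponent

J = pow(N, A, P)      # Alice sends J to Bob (Eve sees it)
K = pow(N, B, P)      # Bob sends K to Alice (Eve sees it)

alice_secret = pow(K, A, P)   # (N^B)^A mod P
bob_secret   = pow(J, B, P)   # (N^A)^B mod P

print(alice_secret, bob_secret)   # the same number, computed independently
```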

    Why Diffie-Hellman Works

    At this point, you may be asking, "Why can't Eve break this?" This is indeed, a good question. Eve knows N, P, J, and K. Why can't she find A, B, or, most importantly, N^AB (mod P)? Isn't there some sort of inverse process by which Eve can recover A from N^A (mod P)?
    Well, the thing Eve would most like to do, that is, take the logarithm (base N) of J, to get A, is confounded by the fact that Alice and Bob have done all of their math modulo P. The problem of finding A, given N, P, and N^A (mod P) is called the discrete logarithm problem. As of now, there is no fast way known to do this, especially as P gets really large. One way for Eve to solve this is to make a table of all of the powers of N modulo P. However, Eve's table will have (P-1) entries in it. Thus, if P is enormous (say 100 digits long), the table Eve would have to make would have more entries in it than the number of atoms in the universe! Simply storing that table would be impossible, not to mention searching through it to find a particular number. Thus, if P is sufficiently large, Eve doesn't have a good way to recover Alice and Bob's secret.
    "That's fine," you counter, "but if P is so huge, how in the world are Alice and Bob going to compute powers of numbers that big?" This is, again, a valid question. Certainly, raising a 100-digit-long number to the power 138239, for example, will produce a ridiculously large number. This is true, but since Alice and Bob are working modulo P, there is a shortcut called the repeated squaring method.
    To illustrate the method, we'll use small numbers (it works the same for larger numbers, but it requires more paper to print!). Say we want to compute 7^29 (mod 17). It's actually possible to do this on a simple four-function calculator. Certainly, 7^29 is too large for the calculator to handle by itself, so we need to break the problem down into more manageable chunks. First, break the exponent (29) into a sum of powers of two. That is,

    29 = 16 + 8 + 4 + 1 = 2^4 + 2^3 + 2^2 + 2^0

    (all we're doing here is writing 29 in binary: 11101). Now, make a list of the repeated squares of the base (7) modulo 17:
    7^1 (mod 17) = 7
    7^2 (mod 17) = 49 (mod 17) = 15
    7^4 (mod 17) = 7^2 * 7^2 (mod 17) = 15 * 15 (mod 17) = 4
    7^8 (mod 17) = 7^4 * 7^4 (mod 17) = 4*4 (mod 17) = 16
    7^16 (mod 17) = 7^8 * 7^8 (mod 17) = 16*16 (mod 17) = 1
    Then, 7^29 (mod 17) = 7^16 * 7^8 * 7^4 * 7^1 (mod 17) = 1 * 16 * 4 * 7 (mod 17) = 448 (mod 17) = 6.
    The neat thing is that the numbers in this whole process never got bigger than 16^2 = 256 (except for the last step; if those numbers get too big, you can reduce mod 17 again before multiplying). Thus, even though P may be huge, Alice's and Bob's computers don't need to deal with numbers bigger than (P-1)^2 at any point.
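The repeated squaring method is straightforward to code; here is a minimal sketch that reproduces the 7^29 (mod 17) example:

```python
def power_mod(base, exp, mod):
    """Square-and-multiply: scan the exponent's bits from the least
    significant end, squaring the base at each step and multiplying it
    into the result whenever the current bit is 1."""
    result = 1
    base %= mod
    while exp > 0:
        if exp & 1:                      # this bit of the exponent is set
            result = (result * base) % mod
        base = (base * base) % mod       # the next repeated square
        exp >>= 1
    return result

print(power_mod(7, 29, 17))   # 6, matching the worked example
print(pow(7, 29, 17))         # Python's built-in 3-argument pow() agrees
```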

Intel Memory Protection Technology - Some informational links

The definition is taken from Wikipedia:

"Intel MPX (Memory Protection Extensions) is a set of extensions to the x86 instruction set architecture. With compiler, runtime library and operating system support, Intel MPX brings increased security to software by checking pointer references whose normal compile-time intentions are maliciously exploited at runtime due to buffer overflows."

This is indeed great. Many complicated problems that developers spend a lot of time debugging are related to memory overwrites/underwrites, leaks, accessing freed memory, stack overwrites, etc.

Intel MPX, in its current form, solves the overwrite/underwrite issues. Any user-space application compiled/linked using -fmpx should detect these issues.

It is not only useful for debugging, but also helps in preventing some types of malware injections. 

Since the checks are done in hardware, there should not be much of a performance penalty, and hence MPX can be used even in production environments.

From the developer perspective too, most of the work is done by the compiler (GCC) and a library. Hence, there is almost no additional effort required from developers.

Some informational links are here: MPX integration in the Linux kernel.

Sunday, November 15, 2015

DPDK based Accelerated vSwitch Performance analysis

DPDK-based OVS acceleration improves performance manyfold, up to 17x over native OVS. Please see the link here:

It is interesting to know the following:
  • Performance impact with increasing number of flows.
  • Performance impact when additional functions introduced.
  • Performance impact when VMs are involved.

The performance report has measured values with many variations. To simplify the analysis, I am taking the values measured with one core and hyper-threading disabled. I am also taking the numbers for 64-byte packets, as the PPS number is what matters for many networking workloads.

Performance impact with increasing number of flows:

Scenario: Phy-OVS-Phy (packets received from Ethernet ports are processed by the host Linux OVS and then sent out; there is no VM involvement). The following snippet is taken from Table 7-12.

One can observe from the above table that the performance numbers go down dramatically with an increasing number of flows. Performance went down by 35.8% when the flows were increased from 4 to 4K, and by 45% when the flows were increased from 4 to 8K.
Cloud and telco deployments can typically have a large number of flows due to SFC and per-port filtering functionality. It is quite possible that one might have 128K flows. One thing to note is that the degradation from 4K to 8K is not very high; understandably, the cache-thrashing effect levels off after some number of simultaneous flows. I am guessing that the degradation with 128K flows would be around 50%.

Performance impact when additional functions introduced

Many deployments are going to have not only VLAN-based switching, but also other functions such as VxLAN, secured VxLAN (VxLAN-over-IPsec) and packet filtering (implemented using iptables in the case of a KVM/QEMU-based hypervisor). There are no numbers in the performance report for VxLAN-over-IPsec and iptables, but it does have numbers for VxLAN-based networking.

The following snippet was taken from Table 7-26. This table shows the PPS (packets per second) numbers when OVS-DPDK is used with VxLAN, again with one core and hyper-threading disabled.

Without VxLAN, the PPS number with 4 flows is 16,766,108. With VxLAN, the PPS number dropped to 7,347,179, a 56% degradation.

There are no numbers in the report for 4K and 8K flows with VxLAN. Those numbers would have helped in understanding the combined degradation from both an increasing number of flows and additional functions.

Performance impact when VMs are involved

Table 7-12 of the performance report shows the PPS values with no VM involved. The following snippet shows the performance measured when a VM is involved; packets flow through the Phy-OVS-VM-OVS-Phy components. Note that the VM runs on a separate core, and hence VM-level packet processing should not affect the measurement.

Table 7-17 shows the PPS value for 64-byte packets with 4 flows: 4,796,202.
The PPS number with 4 flows without VM involvement is 16,766,108 (as shown in Table 7-12).
Performance degradation is around 71%.

PPS value with 4K flows with a VM involved (Table 7-21): 3,069,777.
PPS value with 4K flows with no VM involved (Table 7-12): 10,781,154.
Performance degradation is around 71%.

Let us see the performance impact with the combination of VM involvement & increasing number of flows -

PPS number with 4 flows without VM involvement: 16,766,108 (Table 7-12).
PPS number with 4K flows with VM involvement: 3,069,777 (Table 7-21).
Performance degradation is around 81%.
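The degradation percentages follow directly from the PPS figures quoted from the report; a small Python check (my rounding may differ slightly from the report's own numbers):

```python
# PPS figures quoted earlier from the report (64-byte packets, one core,
# hyper-threading disabled).
pps_phy_4  = 16_766_108   # Phy-OVS-Phy, 4 flows (Table 7-12)
pps_phy_4k = 10_781_154   # Phy-OVS-Phy, 4K flows (Table 7-12)
pps_vm_4   =  4_796_202   # Phy-OVS-VM-OVS-Phy, 4 flows (Table 7-17)
pps_vm_4k  =  3_069_777   # Phy-OVS-VM-OVS-Phy, 4K flows (Table 7-21)

def degradation(baseline, measured):
    """Percentage drop of `measured` relative to `baseline`."""
    return round(100 * (1 - measured / baseline), 1)

print(degradation(pps_phy_4, pps_phy_4k))   # ~35.7%: 4 -> 4K flows, no VM
print(degradation(pps_phy_4, pps_vm_4))     # ~71.4%: VM involved, 4 flows
print(degradation(pps_phy_4k, pps_vm_4k))   # ~71.5%: VM involved, 4K flows
print(degradation(pps_phy_4, pps_vm_4k))    # ~81.7%: both effects combined
```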


  • Performance degradation is steep when packets are handed over to the VM. One big reason is that the packet traverses the DPDK-accelerated OVS twice, unlike the Phy-OVS-Phy case where OVS sees the packet only once. The second likely culprit is the virtio Ethernet (vhost-user) processing in the host.
  • Performance degradation is also steep when the number of flows is increased. I guess cache thrashing is the culprit here.
  • Performance also degrades when more functions are used.

Having said that, the main thing to note is that DPDK-based OVS acceleration in all cases shows an almost 7x to 17x improvement over native OVS. That is very impressive indeed.


Thursday, November 12, 2015

OF Actions using eBPF programs

I was just thinking about this, searched the Internet, and found that this is already implemented as part of OVS. Please see the series of patches here:

Also, if you want to look at the patched OVS source code,  check this Linux distribution :

You can check the net/openvswitch directory to understand BPF changes.

Essentially, the above patches address one of the concerns of OF critics. OF specifications today define a fixed set of actions. Any new action requirement has to go through the standardization process in ONF, which can take a few months, if not years. Many companies, like Nicira and Freescale, defined their own proprietary extensions, but their acceptability is low as they are not part of the OF specifications. Now, with BPF integration in the kernel OVS data path, one can define proprietary actions as a set of BPF programs. As long as this generic BPF-program action is standardized in the OF specifications, new actions can be implemented as BPF programs without having to go through the standardization process.

As of now, Linux allows BPF programs to be attached in three areas: at the system-call level, at the cls (traffic-classifier) scheduling level, and at the "dev" layer. Once the above patches are accepted into the Linux kernel, the OVS datapath becomes a fourth area where BPF programs can be attached.

I am hoping that BPF programs can be executed not only in the kernel, but also in DPDK-accelerated OVS as well as in smart-NICs (FPGA, network processors, AIOP, etc.).


virtio based flow API


The OPNFV DPACC team is trying to standardize a flow API that would be called by VNFs to talk to flow-processing modules in the VMM or smart-NIC.

A discussion group has been started. You are welcome to join the group here:

Why a flow API from VNFs: NFV success depends on the performance VNFs can achieve. Many feel that NFV flexibility is great, but that achieving performance similar to physical appliances requires 2x to 10x the hardware. Even with a lot more hardware, one can achieve throughput, but issues remain in achieving low packet latency and low packet jitter. Since NFV uses generic hardware, cost may not go up (in comparison with physical appliances), but the electricity charges for powering and cooling the extra hardware would. Space requirements also go up, which adds to the cost of operations.

OVS acceleration using DPDK and smart-NICs addresses this issue by accelerating or offloading virtual switching functionality. That takes care of half of the problem. The other half is packet processing by the VNFs themselves. Many VNFs follow a CP-DP model where the DP can be offloaded, and many smart-NIC solutions want to take care of that too. But the lack of standards is dampening these efforts. In NFV environments, VNFs and smart-NICs come from different parties. VNFs can be placed on many servers with smart-NICs from various vendors. VNF vendors would not like to maintain drivers for various smart-NICs, and smart-NIC vendors would not like to provide drivers for various VNF operating systems and their versions. Operators, who procure both, would not want to worry about any inter-dependencies. That is where standardization helps.

Another big issue in NFV markets is that the type of VNFs placed on a given server keeps changing over time. Since the types of VNFs are not known a priori:

  • One way to address this is to provide all kinds of fast-path functions in the smart-NIC. That is very complicated, not only with respect to the number of functions, but also with respect to keeping up with newer VNFs and their fast-path functions.
  • A second way is to let the VNFs install their own fast paths in the smart-NIC. That is also challenging, as there are multiple smart-NIC vendors: it is complex for VNF vendors to provide fast-path functions for each of them, and complex for smart-NIC vendors to ensure that a malicious VNF fast path does not hinder their operations. A few people are exploring eBPF-based fast-path functions; since eBPF comes with verification to ensure a program does not hinder other operations, many think this could be one approach.
  • A third way is to define a generic pipeline of packet stages with dynamic flow programming, similar to Openflow but with reduced complexity. Smart-NIC vendors only need to provide a flow-processing entity; all the intelligence in using it is up to the VNFs, which don't push any software into the smart-NIC. With this mechanism, the smart-NIC can be a network processor, AIOP, FPGA, etc.

The above discussion forum is trying to create a generic flow API that VNFs can use to program tables, flows with actions, objects, etc.
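To make the idea concrete, here is a hypothetical sketch of what such a table/flow/action API might look like from the VNF side. Every name here is my own illustration, not the actual DPACC proposal:

```python
class FlowTable:
    """One match/action table inside a flow-processing entity
    (VMM software switch or smart-NIC). Purely illustrative."""

    def __init__(self, name):
        self.name = name
        self.flows = []                  # ordered list of (match, actions)

    def add_flow(self, match, actions):
        self.flows.append((match, actions))

    def lookup(self, pkt):
        # The first flow whose match fields all agree with the packet wins.
        for match, actions in self.flows:
            if all(pkt.get(k) == v for k, v in match.items()):
                return actions
        return ["punt-to-vnf"]           # table miss: hand off to the slow path

# The VNF owns its instance and programs flows with actions at run time.
table = FlowTable("vnf0-fastpath")
table.add_flow({"dst_ip": "10.0.0.5", "proto": "esp"},
               ["decrypt:sa1", "forward:port2"])

pkt = {"src_ip": "192.0.2.1", "dst_ip": "10.0.0.5", "proto": "esp"}
print(table.lookup(pkt))                      # matches the programmed flow
print(table.lookup({"dst_ip": "10.0.0.9"}))   # miss -> slow path
```

The point of the sketch is that the VNF only expresses matches and actions; the entity executing the lookups can be software, a network processor, or an FPGA.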

Please join the group and let us discuss technical items there. Once we have some clarity, then we intend to present this to OPNFV DPACC forum.


Wednesday, October 21, 2015

Security in Hybrid Cloud sans Private Cloud

Check this link :

A few predictions made earlier are becoming reality. Pure private clouds are slowly disappearing. Enterprises are increasingly using public clouds for many workloads and keeping very small private clouds for critical workloads. The combination of public cloud hosting with a private cloud is called hybrid cloud.

I believe that the hybrid cloud market as defined today (private + public combination) will decline over time and become a niche market. But another kind of hybrid cloud, where enterprises use multiple public clouds, will grow in the future.

Security considerations: In my view, enterprises need to embed security in their workloads and not depend on the generic security solutions provided by cloud operators. A few reasons why this is required:

  • Enterprises may need to host their services in various countries where there may not be stringent laws on data protection and data security.
  • Enterprises may not like to depend on the integrity of the administrators of cloud operators.
  • Enterprises may not like cloud operators sharing their data and security keys with governments without their consent.
What it means is that:
  • Enterprises would need to consider the hypervisor domain insecure, at least for data.
What will enterprises do in the future:
  • Security will be built within the workloads (VMs)
    • Threat Security such as firewall, IPS, WAF.
    • Transport level data security such as SSL/TLS.
    • Network-level security such as IPsec, OpenVPN
  • Visibility would be built into the virtual machines for 
    • Performance visibility
    • Traffic visibility
    • Flow visibility
Essentially, virtual machines would have all the necessary security and visibility agents built into them. Centralized management systems, controlled by enterprises, will configure these agents from a central location to make configuration and management simpler.

There is a concern that if security is built into the VMs, an attacker exploiting the applications in a VM may be able to disable the built-in security functions, falsify data, or send wrong information to analytics engines.

That is a valid concern.  I believe that containers would help in mitigating those concerns.
  • Run all security functions in the root container of  the VM.
  • Run applications in non-root containers within the VM
Isolation provided by containers can mitigate the challenges associated with combining security with the applications.

Service chaining: Traditionally, multiple security services are applied by middle virtual appliances. If traffic is encrypted end-to-end, these middle virtual appliances cannot do a good job. Yes, that is true. This can be solved by Cloud-SFC (SFFs within the virtual machines), where the VM's SFF itself steers traffic to the various middle appliances or container services within the VM. More on this later...

I believe that with the increasing popularity, flexibility, scale-out, and performance provided by CSPs, it is just a matter of time before private clouds disappear or decline dramatically. Cloud users (enterprises) will build security into the VMs they host in public clouds, and security/visibility companies will have to address this trend. In my view, only those security/visibility companies will survive. Maybe dramatic? What do you think?

Monday, October 12, 2015

Hyperscale providers developing their own processors?

With increasing migration to public clouds, it appears that 70% of all enterprise workloads will run in public clouds, with 90% of those workloads expected to run in the Amazon, Google Compute and Microsoft Azure public clouds.

With so much of processing power centralized,  it may seem natural for these companies to look into developing their own processors tuned to their workloads.

Many articles have talked about rumors that these companies have a strategy to develop their own processors.

Some interesting articles I could find are :

The above two articles seem to suggest that Amazon, having hired a good number of engineers from the failed ARM server startup Calxeda, is designing its own ARM server chips.

This old article seems to suggest that Google is also thinking of designing their own server chips.

What do you think? When would this happen, in your view?

My take is that it will be a while before they decide to build their own chips. It is possible that they may design some specialized chips for specific workloads. But creating a general-purpose processor would be a challenging task.

Tuesday, October 6, 2015

Great and informative post about Micro-Service Architecture

Please see this post in techtarget -

It is logical to think that micro-service architecture requires a store-front service which hides all the complexities of the various micro-services. As the post says, it is also required that multiple store-front services be chained externally. By breaking the architecture into this two-dimensional solution, it provides flexibility and reduces complexity.

Since SFC-style solutions can be used to chain multiple store-front services, scale-out and reliability are taken care of by the SFC solutions.

But the services hidden behind the store-front services need to have similar scale-out and high-availability properties. This means a store-front service needs not only to provide a service, but also to work as some kind of orchestrator, working with a VIM (virtual infrastructure manager) such as Nova or Magnum to bring up/bring down the micro-services it fronts.

It had not crossed my mind that service VMs or service containers could be interacting with the schedulers. It certainly makes sense in this kind of architecture.

Great post from Tom Nolle on TechTarget.  Really enjoyed reading it.

I wonder what kind of security requirements would prop up in this kind of solution architecture. I am guessing that security within the service would be the simpler solution.


Monday, October 5, 2015

Openflow based Local fast path for distributed Security

The OPNFV DPACC working group is trying to standardize the accelerator interface that VNFs use to accelerate their workloads.

Accelerator standardization is required to ensure that virtual appliances can work on different types of compute nodes: compute nodes with no acceleration hardware, compute nodes with accelerator hardware from vendor 1, compute nodes with accelerator hardware from vendor 2, and so on.
This becomes critical in cloud environments, as compute-node hardware is procured independently of the virtual appliances. Virtual appliance vendors would like their images to work on multiple compute nodes, and operators would like to ensure that any virtual appliances they purchase continue to work with future compute-node hardware.

The OPNFV DPACC team's goal is to identify the various acceleration types and then define a software interface for each acceleration type.

The VMM, which is typically bound to the compute-node hardware, is expected to have a conversion module from the software interface to the local hardware interface.

OPNFV DPACC team is choosing virtio and vring transport to implement software interface.

Accelerations normally fall into two categories: the look-aside model and the inline/fast-path model. In the look-aside model, the virtual appliance software sends a command to the accelerator and expects a response for each command; the response can come back synchronously or asynchronously. The inline/fast-path model is typically for packet-processing VNFs. Any network function that does not need to inspect every packet can take advantage of the inline/fast-path acceleration model.

In the inline/fast-path acceleration model, the network function in the virtual appliance establishes session state (either proactively, or reactively to the packets) and then expects the fast-path accelerator to process further packets of that session.

Many smart-NIC vendors provide facilities for their customers to create fast-path functions in the smart-NIC. This works best in physical appliances: they are fixed-function devices, and the entire software and hardware stack comes from one vendor, who ensures that the normal network function and the fast-path functions work together.

In virtual world,  smart-NIC comes from one vendor and virtual appliances come from separate vendors.  Also, it is unknown at smart-NIC build time, the network functions that would be hosted on the compute node.  It may seem that smart-NIC vendors need to have all types of fast-path functions implemented in the smart-NIC.  It is obvious that it is not feasible and the smart-NIC may not have too much of code space to put all kinds of fast-path functions.  Note that, smart-NIC could be based on FPGA or constrained network processor.

Another model could be to populate the smart-NIC dynamically with fast-path functions based on the types of virtual appliances being brought up on that node. This could also be problematic, as there are multiple smart-NIC vendors using various processors/network-processors/FPGAs, so similar fast-path functions would need to be created for many smart-NIC types. There are also security & reliability issues, as these functions come from various vendors and may not co-exist well: some functions may misbehave, or might crash other functions. In addition, there is always a concern about the amount of time this adds to bringing up and bringing down the virtual appliance.

Hence, some in the OPNFV DPACC team believe that the smart-NIC must implement a flow processor such as Openflow. Openflow has very good advantages; some of them are:

  • Only one logic module.
  • One could create multiple instances.
  • Each instance can be controlled by separate entity.
  • Multiple tables in each instance.
  • Flows in the table can be pipelined.
  • Flows can be programmed with pre-defined actions.

If the smart-NIC implements OF, the orchestration module can create an instance for each vNF and assign ownership of that instance to the vNF. The vNF, at run time, can program flows with appropriate actions to create the fast path for those flows.
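A rough sketch of that ownership model follows. The names are illustrative, and a real smart-NIC would speak the OpenFlow wire protocol rather than hold Python dicts; the sketch only shows the per-vNF instance and the table-miss punt.

```python
# Illustrative sketch (not a real smart-NIC API): the orchestrator
# creates one OF instance per vNF; each vNF programs flows with
# pre-defined actions into its own instance only.

class OFInstance:
    def __init__(self, owner):
        self.owner = owner
        self.flows = []             # (priority, match, actions)

    def add_flow(self, match, actions, priority=0):
        self.flows.append((priority, match, actions))
        self.flows.sort(key=lambda f: -f[0])   # highest priority first

    def lookup(self, pkt):
        for _prio, match, actions in self.flows:
            if all(pkt.get(k) == v for k, v in match.items()):
                return actions
        return ["punt_to_vnf"]      # table miss: send to the owning vNF

class SmartNIC:
    def __init__(self):
        self.instances = {}

    def create_instance(self, vnf_id):         # called by the orchestrator
        self.instances[vnf_id] = OFInstance(vnf_id)
        return self.instances[vnf_id]
```

Because each vNF owns its instance, a misprogrammed flow affects only that vNF's traffic, which addresses the co-existence concern raised above.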

It is already proven in the industry that OF plus some proprietary extensions can be used to realize IPSec, firewall, NAT, forwarding and SLB fast paths.

Please see the presentation slides made to OPNFV DPACC :  This slide deck provides a graphical picture of the above description.

Friday, October 2, 2015

Configuration relay feature in Open-SFC - Even though the Open-SFC project is for "Service Function Chaining", it has one feature, called "Configuration relay", which is a very useful generic feature.

The Openstack neutron advanced services project provides configuration support for a few network services. VPN-as-a-Service, Firewall-as-a-Service and LB-as-a-Service are a few examples. These services provide RESTful APIs for IPSec VPN, stateful firewall and load balancers. These services also follow a similar "plugins" and "agents" paradigm. Plugins implement the RESTful API and store the configuration in the database. The plugins also send the configuration to the agents. Agents today run in the "network nodes"; they receive the configuration from the plugin and configure local services such as Strongswan, IP Tables and HAProxy. Here, network nodes are reachable from the Openstack controller, and hence plugin drivers and agents can communicate with each other (via AMQP).

In the recent past, many network services are being implemented as vNFs. With distributed security and end-to-end security becoming the norm, network security services (such as firewall and IPSec VPN) are embedded within the application VMs. In these cases, advanced-network-service agents need to run in these virtual machines. But there is a reachability issue between plugin drivers and agents: virtual machines are on the data network and controller nodes are on the management network. For isolation/security reasons, virtual machines are normally not allowed to send/receive traffic from the management network directly.

The configuration relay is expected to mitigate this issue. The configuration relay runs in the VMM/Hypervisor of each compute node. Since the VMM is reachable from the controllers, the relay becomes the conduit (just for configuration) between the network service plugin drivers and the agents in the local virtual machines.
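To make the idea concrete, here is a minimal sketch of how configuration could be framed between the relay and an in-VM agent over a virtio-serial channel. The 4-byte length prefix plus JSON body framing is my assumption for illustration; the actual Open-SFC relay defines its own protocol.

```python
# Hypothetical framing for relay <-> agent configuration messages:
# a 4-byte big-endian length prefix followed by a JSON body.
import json
import struct

def send_config(chan, config: dict) -> None:
    """Write one framed configuration message to a byte channel."""
    body = json.dumps(config).encode()
    chan.write(struct.pack(">I", len(body)) + body)

def recv_config(chan) -> dict:
    """Read one framed configuration message from a byte channel."""
    (length,) = struct.unpack(">I", chan.read(4))
    return json.loads(chan.read(length).decode())
```

Inside the VM, `chan` would be the virtio-serial device (for example `open("/dev/virtio-ports/<name>", "r+b")`, with the port name left as configured by the patch); the same functions can be exercised against an `io.BytesIO` for testing.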

I will post more information on how this relay works technically, but the following repositories/directories have the source code.

FSL_NOVA_SerialPort_Patch is a patch to the nova portion of the compute node – this patch allows creation of a virtio-serial port (to allow communication between the local virtual machine and the configuration relay) via libvirtd and QEMU. There is also a small service in the compute node that enables the configuration relay.

An example program in the vNF that communicates with the relay to get hold of the configuration :  (Based on a comment posted by Srikanth)



Friday, August 21, 2015

Some informational links on HSM

Some good information/links on HSM and related technologies

Safenet Luna, which is one of the popular HSMs in the market :

Cloud HSM (Based on Safenet Luna) :

A very good blog entry on mapping OpenSSL to HSMs using the OpenSSL PKCS11 engine interface :  More details on the actual steps, with an example, can be found here:

One more resource describing the OpenSSL integration :  The same document is also available here:

PKCS11 standard :

Openstack Barbican provides key management functionality and it can be enhanced to use an HSM internally. More information can be found at :

Saturday, April 18, 2015

AWS and Networking

This link (a presentation by an AWS Distinguished Engineer) provides insight into "networking" being the pain point.

Some interesting things from this post:

So, the answer is that AWS probably has somewhere between 2.8 million and 5.6 million servers across its infrastructure.
“Networking is a red alert situation for us right now,” explained Hamilton. “The cost of networking is escalating relative to the cost of all other equipment. It is Anti-Moore. All of our gear is going down in cost, and we are dropping prices, and networking is going the wrong way. That is a super-big problem, and I like to look out a few years, and I am seeing that the size of the networking problem is getting worse constantly. At the same time that networking is going Anti-Moore, the ratio of networking to compute is going up.”

More importantly, this seems to contradict one of my projections, "Death of SRIOV".

Now, let’s dive into a rack and drill down into a server and its virtualized network interface card. The network interface cards support Single Root I/O Virtualization (SR-IOV), which is an extension to the PCI-Express protocol that allows the resources on a physical network device to be virtualized. SR-IOV gets around the normal software stack running in the operating system and its network drivers and the hypervisor layer that they sit on. It takes milliseconds to wade down through this software from the application to the network card. It only takes microseconds to get through the network card itself, and it takes nanoseconds to traverse the light pipes out to another network interface in another server. “This is another way of saying that the only thing that matters is the software latency at either end,” explained Hamilton. SR-IOV is much lighter weight and gives each guest partition on a virtual machine its own virtual network interface card, which rides on the physical card.
It seems that AWS started to use SR-IOV based Ethernet cards.

I guess it is possible for AWS as it also provides AMIs with inbuilt drivers.

What I am not sure about is how they are doing some of the functions - IP filtering, overlays, and traffic probing and analytics - without having to see the traffic.

Currently, AWS seems to be using Intel IXGBE drivers, which means that Intel NICs are being used here.  See this :

Amazon bought Annapurna in 2015. I wonder what role that plays.  Could it be that AWS folks found issues as detailed here and hence they are going after programmable smart-NICs?

Food for thought...

Sunday, August 24, 2014

Death of SRIOV and Emergence of virtio

SRIOV functionality in PCIe devices was introduced to solve the problem of sharing a physical device across multiple virtual machines in a physical server. SRIOV functionality in PCIe devices enables the creation of multiple virtual functions (VFs), typically so that one virtual function can be assigned to one virtual machine. Since each VF can be memory mapped, virtual machines can have direct access to the device, thereby bypassing the VMM (hypervisor).
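On Linux, the PF/VF split is visible in sysfs, which makes it easy to inspect. A small sketch follows; the `sriov_totalvfs`/`sriov_numvfs` attribute names are standard kernel sysfs files, but the path is parameterized here so the logic can be shown (and tested) without SR-IOV hardware.

```python
# Hedged sketch: read a physical function's SR-IOV VF counts from its
# sysfs PCI device directory, e.g. /sys/bus/pci/devices/0000:03:00.0.
import os

def sriov_vf_counts(pf_dir):
    """Return (total_vfs, enabled_vfs) for a PF, or (0, 0) if no SR-IOV."""
    def read_int(name):
        path = os.path.join(pf_dir, name)
        if not os.path.exists(path):
            return 0                # attribute absent: device has no SR-IOV
        with open(path) as f:
            return int(f.read().strip())
    return read_int("sriov_totalvfs"), read_int("sriov_numvfs")
```

Writing a number into `sriov_numvfs` is how an administrator enables that many VFs; the sketch above only reads the state.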

NIC, Crypto, Compression and PCRE accelerator vendors have enhanced their PCIe devices to support Virtual Functions (VF).

It worked for some time. Soon, with the popularity of public and private clouds, the supply chain equation changed. It is no longer the case that one or a few vendors provide a complete solution to operators. When a few vendors got together to provide a complete solution to the end operators, it was possible for VM image vendors to support a small number of PCIe NICs and accelerators by adding drivers to their VM images. Later, operators started to procure the various components that make up a system themselves: physical servers, accelerators, NICs and virtual machine images, each from different vendors. Operators found this model working for them, as the cost savings are huge for a relatively small amount of integration work done by themselves.

In this new model, virtual machine image vendors don't know what kinds of accelerators and NIC cards the operators use. Operators might procure newer accelerator and NIC cards in the future from a new vendor. If a virtual machine image is built for certain SRIOV NIC cards, that image may become invalid if operators later procure some other NIC card from a new vendor. That kind of dependency is not good for any party - operators, NIC/accelerator companies, or virtual machine image companies.

One more major change happened in the industry. Operators are no longer happy with simple demux of incoming packets to destination VMs based on VLAN ID or MAC address. Operators wanted to have control over the packet flows for various reasons, such as:

  • Service Function Chaining.
  • Traffic Probe 
  • Traffic Control
  • Traffic Policing 
  • Filtering 
  • Overlays
  • Secure Overlays 
  • And many more...
Many NIC cards only have simpler demux mechanisms, such as VLAN- and MAC-level demux. Since operators need a lot of flexibility, they started to avoid assignment of VFs to VMs. SDN and OVS were born out of that need. OVS running in the VMM allowed operators to implement the above functions without having to depend on the NIC vendors.

These two reasons - A. removal of dependencies among VM images, NIC vendors and operators, and B. the operators' need to control traffic - are slowly killing SRIOV for NICs.

At least the first reason - removal of dependencies among VM images, PCI device vendors and operators - is also killing SRIOV-based accelerator cards.


virtio is a virtualization standard for Ethernet, disk and other devices. Virtio in the VMM emulates devices as PCIe devices (emulated PCIe devices) and assigns them to the guests. A guest, when it comes up, discovers these devices and enables them by loading the appropriate virtio frontend drivers. Since these devices are emulated by the VMM in software, in theory there are no hard limits on the number of devices that can be emulated and assigned to the guests.

In today's world, virtio is used for emulated Ethernet, SCSI and other devices. Since this emulation is the same irrespective of the physical devices, it naturally provides flexibility where VM images are independent of physical NICs. That became attractive for VM image vendors: as long as they support virtio frontend drivers, they can rest assured that their images will run on any physical server with any physical NIC or SCSI device.

Taking a NIC as an example: the VMM receives packets from the physical NIC, runs them through various functions such as OVS and Linux IPTables, and then sends each packet to the right guest using a virtio emulated Ethernet device. In this case, only the VMM needs drivers for the physical NIC; VM images only need to worry about virtio-Ethernet.
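The receive path just described can be sketched as a tiny pipeline (all names illustrative): the VMM owns the NIC, each stage may classify or drop the packet, and delivery happens on the destination guest's virtio-Ethernet queue.

```python
# Sketch of the VMM receive path: NIC -> OVS lookup -> filtering ->
# guest's virtio-Ethernet queue.  Stages are callables so the pipeline
# composition (the point of the paragraph) is explicit.

def vmm_rx(packet, pipeline, virtio_queues):
    """pipeline: list of fns (packet, guest) -> (packet or None, guest)."""
    guest = None
    for stage in pipeline:
        packet, guest = stage(packet, guest)
        if packet is None:
            return None                   # dropped by a filter stage
    virtio_queues[guest].append(packet)   # hand off via virtio-Ethernet
    return guest
```

Note that only the `pipeline` stages need to know anything about the physical NIC or the filtering rules; the guest-facing side is always the same virtio queue, which is exactly the decoupling the paragraph describes.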

Next step in Evolution 

Though this freed VM images from physical devices, operators soon started to find performance problems. Since packets traverse the VMM in both directions (ingress and egress), major performance drops were observed.

That gave birth to smart-NICs, where the smart-NIC does almost all of the VMM packet processing. To enable flexible enhancement of packet processing, most smart-NICs started to be built using programmable entities (FPGAs, network processors, etc.). In addition, smart-NIC vendors have also started to implement a virtio-Ethernet interface to connect to VMs without involving the VMM. With this, smart-NIC vendors are solving two issues - the performance issue, since the device is directly connected to the guests, and VM image independence from hardware. Since virtio-Ethernet is exposed by the smart-NIC, VMs need not know the difference between the VMM emulating virtio-Ethernet and the smart-NIC emulating virtio-Ethernet.

With this great success with Ethernet, virtio-based interfaces are being extended to other hardware devices such as iSCSI and crypto/compression/regex accelerators.

The death of SRIOV has just begun, and in a few years I won't be surprised if nobody talks about SRIOV.


Wednesday, March 5, 2014

Openflow switches - flow setup & termination rate

I attended ONS yesterday.  This is the first time I have seen many OF hardware switch vendors (based on NPUs, FPGAs or ASICs) advertising performance numbers.  Though the throughput numbers are impressive, the flow setup/termination rates are, in my view, disappointing.  I see flow setup rate claims anywhere between 100 and 2000/sec.

Software-based virtual switches support flow setup rates of up to 10K/sec per core. If a 4-core part is used, one can easily get 40K flow setups/sec.
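Spelling out the comparison with the figures above (the rates are this post's estimates, not measurements):

```python
# Flow setup rate comparison using the post's own figures.
per_core_rate = 10_000                    # software setups/sec per core
cores = 4
software_rate = per_core_rate * cores     # 40K setups/sec with a 4-core part
best_hw_claim = 2_000                     # top of the 100-2000/sec claims
speedup = software_rate // best_hw_claim  # software beats the best claim 20x
```

Even the best hardware claim is an order of magnitude behind a modest software switch, which is the point of the next paragraph.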

In my view, unless the flow setup rate is improved, it is very unlikely that hardware-based OF switches will become popular, as their market addressability is limited.

By talking to a few vendors, I understand that the poor flow setup rate is mainly due to the way TCAMs are used. As I understand it, every flow update (add/delete) requires rearranging the existing entries, and that leads to bad performance. I also understand from many of these vendors that they intend to support algorithmic search accelerators to improve the performance of flow insertions and deletions, and that this could improve the performance to hundreds of thousands of flow setups/sec.
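A toy model of the TCAM issue: entries match in physical order, so inserting a high-priority flow can shift every lower entry down one slot, much as `list.insert` does. This is only an illustration of the cost, not how any particular TCAM driver works.

```python
# Toy illustration of priority-ordered TCAM insertion cost: inserting
# near the top shifts all entries below it, so the work is O(n) in the
# number of existing entries - which caps the flow update rate.

def tcam_insert(tcam, index, entry):
    """Insert at `index`; return how many entries had to be shifted."""
    moves = len(tcam) - index
    tcam.insert(index, entry)
    return moves
```

With a table of a million flows, a single top-priority insert would move a million entries; the algorithmic search accelerators mentioned above exist precisely to avoid these shifts.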

Unless the following are taken care of, hardware-based OF switch adoption will be limited.
  • Higher flow setup rate. (Must be better than software based switches)
  • Ability to maintain millions of flows.
  • Support for multiple tables (All 256 tables)
  • VxLAN Support
  • VxLAN with IPSec
Throughput performance is very important, and hardware-based OF switches are very good at that. In my view, all of the above are also important for any deployment to seriously consider hardware-based OF switches in place of software-based switches.

Tuesday, March 4, 2014

OVS Acceleration - Are you reading between the lines?

OVS, part of Linux distributions, is an Openflow-based software switch implementation.  OVS has two components - a user space component and a kernel space component.  The user space component consists of the OVSDB (configuration) sub-component, the Openflow agent that communicates with Openflow controllers, and the OF normal path processing logic.  The kernel component implements the fastpath; OVS calls this kernel component the 'datapath'.  The OVS datapath maintains one table.  When a packet comes into the datapath logic, it determines the matching flow in the table.  If there is no entry, the packet is sent to the user space 'normal path processing' module.  OVS, in user space, finds out whether there are matching entries across all OF tables in that pipeline. If there are, it pushes an aggregate flow to the kernel datapath, and further packets of the aggregate flow get processed in the kernel itself.  Just to complete this discussion: if OVS does not find any matching flow in the OF pipeline, then, based on the miss entry, it sends the packet to the controller. The controller then pushes flow-mod entries to the OVS.
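The slow-path/fast-path split described above can be condensed into a small sketch (the structures are illustrative, not the actual OVS code): a miss in the single kernel table becomes an upcall to user space, which walks the full OF pipeline and installs the aggregate flow.

```python
# Condensed sketch of the OVS kernel/userspace split: a datapath miss
# triggers an upcall; userspace resolves the multi-table OF pipeline and
# caches the aggregate result back in the single kernel table.

class Userspace:
    def __init__(self, pipeline):       # pipeline: list of OF tables
        self.pipeline = pipeline
        self.upcalls = 0

    def upcall(self, key, packet):
        self.upcalls += 1
        actions = []
        for table in self.pipeline:     # walk every table in the pipeline
            actions += table.get(key, [])
        return actions or ["to_controller"]   # pipeline miss -> packet-in

class Datapath:
    def __init__(self, userspace):
        self.table = {}                 # the single kernel flow table
        self.userspace = userspace

    def rx(self, key, packet):
        actions = self.table.get(key)
        if actions is None:             # miss: upcall to userspace
            actions = self.userspace.upcall(key, packet)
            self.table[key] = actions   # cache the aggregate flow
        return actions
```

The second packet of a flow never reaches user space - that is the aggregate-flow caching the paragraph describes, and also why the first-packet path dominates flow setup cost.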

It is believed that the Linux kernel based datapath is not efficient, for the following reasons.
  • Linux kernel overhead before the packet is handed over to the datapath.  Before a packet is handed over to the OVS datapath, the following processing occurs on it:
    • Interrupt processing
    • Softirq processing
    • Dev layer processing
  • Interrupt overhead
  • Fast path to normal path overhead for the first few packets of a flow.

A few software/networking vendors have implemented a user space datapath using the OVS 'dpif' provider interface. Many of these vendors have implemented the user space datapath using Intel DPDK in poll mode.  Poll mode dedicates a few cores to receiving packets from the Ethernet controller directly, eliminating interrupts and thereby avoiding the overhead associated with interrupt processing. Also, since the Ethernet ports are owned by this custom user space process, there is no need for the complex hook processing needed in the Linux kernel.  This user space datapath can start OF processing on packets immediately upon reception from the Ethernet controllers, thereby avoiding any intermediate processing overhead. There is no doubt that these two features by themselves provide a good performance boost.

Many of the OVS acceleration implementations, as far as I know today, don't perform well in an Openstack environment.  Openstack orchestration with KVM not only uses OVS to implement the virtual switch, but also uses OVS to realize network virtualization (using VxLAN, NVGRE). In addition, it uses Linux IPTables to implement isolation among VMs based on security groups configuration.  Moreover, some deployments even use IPSec over VxLAN to protect the traffic from eavesdropping.  Based on some literature I have read, user space datapath packages also don't perform well if VMs are connected to OVS via tuntap interfaces.  They work well where there is no need for isolation among VMs and no need for overlay-based network virtualization.  That is, these acceleration packages work if VMs are trusted and VLAN interfaces are good enough.  But we know that Openstack-integrated OVS requires VxLANs, isolation among VMs, usage of tuntap interfaces and even VxLAN over IPSec.

Note that isolation using the firewall, tuntap interfaces & VxLAN-over-IPSec today use Linux kernel capabilities.  If the OVS datapath is in user space, then packets anyway have to traverse the kernel (sometimes twice) to avail these capabilities, which may cause far more performance issues than having the OVS datapath in the kernel.  I will not be surprised if the performance is a lot lower than the native OVS kernel datapath.

Hence, while evaluating OVS acceleration, one should ask the following questions:

  • What features of OVS were enabled while measuring the performance?
  • How many OF tables are hit in this performance test?  Make sure that at least 5 tables are used in the OVS, as this is a very realistic number.
  • Does it use VxLAN?
  • Does it implement the firewall?
  • Does it implement VxLAN over IPSec?
  • Does it assume that VMs are untrusted?
If there is a performance boost with all features enabled, then that implementation is good. Otherwise, one should be very careful.

SDN Controllers - Be aware of some critical interoperability & robustness issues

Are you evaluating the robustness of an SDN solution?  There are a few tests I believe one should run to ensure that the SDN solution is robust.  Since the SDN controller is the brain behind traffic orchestration, it must be robust enough to handle Openflow switches from various vendors.  Also, the SDN controller must not assume that all Openflow switches behave well.

Connectivity tests

Though initial connections to OF controllers are normally successful, my experience is that connectivity fails in very simple non-standard cases.  The following tests would bring out any issues associated with connectivity. Ensure that these tests are successful.

  • Controller restart :  Once the switch is connected successfully, restart the controller and ensure that the switch reconnects successfully to the controller. It has been observed a few times that either the switch does not initiate the connection, or the controller loses some important configuration during the restart and does not honor connections from some switches. In some cases, it has also been found that the controller gets a dynamic IP address but has a fixed domain name, while the switches are incapable of taking an FQDN as the controller address. Hence it is important to ensure that this test suite is successful.
  • Controller restart with pro-active flows in the switches :  Once the switch is connected successfully and traffic is flowing normally, restart the controller and ensure that the switch reconnects to the controller successfully and traffic continues to flow.  It has been observed that many switch implementations remove all flows when they lose the main connection to the master controller. When the controller comes back up, it is normally the controller's responsibility to re-establish the basic flows (pro-active flows).  Reactive flows are okay, as they get established again upon packet-in.  To ensure that the controller works properly across restarts, it is important to ensure that this test is successful.
  • Controller application restart, but not the entire controller :  This is very critical, as applications typically restart more often, due to upgrades and bug fixes.  One must ensure that there is no issue with traffic related to other applications on the controller. Also, one must ensure that there are no issues when the application is restarted, either with a new image or with the existing image, and that there are no memory leaks.
  • Switch restart :  In this test, one should ensure that the switch reconnects successfully to the controller once it is restarted.  Also, one must ensure that the pro-active flows are programmed successfully, by observing that traffic continues to flow even after the switch restarts.
  • Physical wire disconnection :  One must also ensure that a temporary discontinuity does not affect the traffic flow.  It has been observed in a few cases that the switch realizes the TCP connection has terminated but the controller does not. Some switches seem to remove their flows immediately after the main connection breaks; when connected again, the switch may not get the flow states immediately, as at times the controller itself has not realized the connection terminated, assumes the new main connection from the switch is a duplicate, and hence does not initiate the flow setup process. The controller must treat any new main connection from a switch as a reconnection after some blackout - that is, as two events: disconnection of the switch followed by a new connection.
  • Change the IP address of the switch after a successful connection with the controller :  Similar to the above.  One must ensure that connectivity is restored and traffic flows smoothly.
  • Run the above tests while traffic is flowing :  Just to ensure the system is stable even when the controller restarts while traffic is flowing through the switches.  Also ensure that the controller behaves well when the switch restarts while packet-ins are pending in the controller and while the controller is waiting for responses from the switch, especially during a multi-part message response.  One black-box way of doing this is to wait until the application creates a large number of flows in a switch (say 100K flows), then issue a "flow list" command (typically from the CLI - many controllers provide a mechanism to list the flows), immediately restart the switch, and observe the behavior of the controller. Once the switch has reconnected, let the flows be created, issue the "flow list" command again and ensure that the listing is successful.
  • Check for memory leaks :  Run 24-hour tests that restart the switch continuously.  Ensure that the script restarts the switch software only after it has connected to the controller successfully and traffic is flowing.  You would be surprised at the number of problems you can find with this simple test.  After 24 hours of testing, ensure that there are no memory or fd (file descriptor) leaks by observing the process statistics.
  • Check the robustness of the connectivity by running a 24-hour test with flow listing from the controller.  Let the controller application create a large number of flows in a switch (say 100K) and, in a loop, execute a script which issues various commands including "flow list".  Ensure that this test is successful and that there are no memory or fd leaks.
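The restart-loop tests above can be driven by a small generic harness. The restart action, connectivity check and leak probe are passed in as callables; on Linux, the leak probe could, for example, count entries under `/proc/<pid>/fd`. This is a sketch of the test idea, not a ready-made tool, and the 10% growth threshold is an arbitrary illustrative choice.

```python
# Sketch of a restart soak test: restart the switch repeatedly, verify
# connectivity each time, and compare a resource probe (fd count, RSS,
# ...) against the baseline to flag possible leaks.

def restart_soak_test(restart_switch, is_connected, probe_leaks,
                      iterations=100):
    baseline = probe_leaks()
    for _ in range(iterations):
        restart_switch()
        if not is_connected():
            return "connectivity lost"
    if probe_leaks() > baseline * 1.1:    # >10% growth is suspicious
        return "possible leak"
    return "ok"
```

Because the three probes are injected, the same loop covers the controller-restart, switch-restart and flow-listing variants by swapping in different callables.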

Keep Alive tests

Controllers share memory across all the Openflow switches they control.  At times, one misbehaving Openflow switch might consume a lot of OF message buffers or memory in the controller, leaving the controller unable to respond to keep-alive messages from other switches.  SDN controllers are expected to reserve message buffers and memory for receiving and responding to keep-alive messages - and this reservation is required per Openflow switch, not just globally.  Some tests that can be run to check that the controller is fair:
  • Overload the controller by running cbench, which sends a lot of packet-in messages and expects the controller to create flows.  While cbench is running, ensure that a normal switch's connectivity does not suffer.  Keep this test running for half an hour to ensure that the controller is robust enough.
  • The same test, but tweak cbench to generate and flood keep-alive messages towards the controller.  While cbench is running, ensure that a normal switch's connectivity to the controller does not suffer. Keep running this test continuously for half an hour to ensure that the controller is robust enough.

DoS Attack Checks

There are a few tests I believe need to be performed to ensure that the controller code is robust.  Buggy switches might send truncated packets, or badly corrupted packets with a wrong (very big) length value.
  • Check all the messages & fields in the OF specifications that have a length field.  Generate messages (using cbench?) with a wrong length value (higher than the actual message size) and ensure that the controller does not crash or start misbehaving.  It is expected that the controller eventually terminates the connection with the switch, with no memory leaks.
  • Run the above test continuously to ensure that there are no leaks.
  • Messages with the maximum length value in the header:  It is possible that controllers allocate a big chunk of memory based on the claimed length, leading to memory exhaustion.  The right thing for the controller to do is to enforce a maximum message length and drop (drain) any larger message without storing it in memory.
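As a sketch of the length-field check: the 8-byte OpenFlow header layout (version, type, 16-bit big-endian length, xid) is per the OF specification, while the 32 KB cap below is an illustrative policy choice, not a number from the spec.

```python
# Sanity-check an OpenFlow message header before buffering the body.
# A length smaller than the header or larger than a sane cap means the
# message should be drained/dropped, not allocated for.
import struct

OFP_HEADER = struct.Struct("!BBHI")   # version, type, length, xid
MAX_OF_MSG = 32 * 1024                # illustrative cap, not from the spec

def check_header(data: bytes):
    """Return (length, xid) if sane, or None if the message must be dropped."""
    if len(data) < OFP_HEADER.size:
        return None                   # truncated header
    _ver, _typ, length, xid = OFP_HEADER.unpack_from(data)
    if length < OFP_HEADER.size or length > MAX_OF_MSG:
        return None                   # bogus length: drain, don't buffer
    return length, xid
```

The key design point is that the check happens before any allocation sized by the attacker-controlled length field, which is exactly the exhaustion scenario described above.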
I would like to hear from others on what kinds of tests you believe must be performed to ensure the robustness of controllers.

Monday, March 3, 2014

SDN Controller - What to look for?


ONF is making the Openflow specification one of the standards enabling non-proprietary communication between a central control plane entity & distributed data plane entities. SDN controllers are the ones which implement the control plane for various data path entities.  OVS, being part of Linux distributions, is becoming a de facto virtual switch entity in the data center and service provider market segments.  The OVS virtual switch, sitting in the Linux host, acts as a switch (data path entity) between the virtual machines on the Linux host and the rest of the network.

As with virtualization of compute and storage, networks are also being virtualized. VLAN used to be one of the techniques to realize virtual networks. With the limit on the number of VLANs and the inability to extend virtual networks over L3 networks using VLANs, overlay-based virtual network technology is replacing VLAN technology.  The VxLAN overlay protocol is becoming the virtual network technology of choice.  Today, virtual switches (such as OVS) support VxLAN, and it is becoming the de facto overlay protocol in data center and service provider networks.

Another important technology that is becoming popular is Openstack.  Openstack is a virtual resource orchestration technology to manage virtualization of compute, storage and network resources.  The Neutron component of Openstack takes care of the configuration & management of virtual networks and of network services such as routers, DHCP, firewalls, IPSec VPNs and load balancers.  Neutron provides APIs to configure these network resources.  Horizon, the GUI of Openstack, provides the user interface for these services.

Network nodes (a term used by the Openstack community) are the ones which normally sit at the edge of data centers. They provide firewall capability between the Internet & data center networks, IPSec capability to terminate IPSec tunnels with peer networks, SSL offload capability, and load balancing capability to distribute incoming connections to various servers.  Network nodes also act as routers between external networks & internal virtual networks.  In today's networks, network nodes are self-contained devices: each node has both the control plane and the data plane.  Increasingly, it is being felt that SDN concepts can be used to separate out the control plane & normal path software from the data plane & fast path software.

Network nodes are also being used as routers across virtual networks within data centers for east-west traffic.  Some even use them as firewalls and load balancers for east-west traffic.  Increasingly, it is being realized that network nodes should not be burdened with east-west traffic; rather, the virtual switches within each compute node should do this job.  That is, virtual switches are being considered as distributed routers, firewalls and load balancers for east-west traffic.

Advanced network services, which do deep inspection of packets and data - such as intrusion prevention, web application firewalls and SSL offload - are being deployed in L2 transparent mode to avoid reconfiguration of networks and to enable vMotion easily.  When deployed as virtual appliances, this also provides agility and scale-out.  It requires traffic steering capability to steer the traffic across the required virtual appliances.  Though most network services are required for north-south traffic, some of them (such as IPS) are equally needed for east-west traffic.


As one can see from the above introduction, operators would like to see the following supported by the centralized control plane entity (SDN controllers):
  • Realization of virtual networks
  • Control plane for network nodes 
  • Normal path software for network nodes.
  • Traffic Steering capability to steer the traffic across advanced network services
  • Distributed routing, firewall & Load balancing capability for east-west traffic.
  • Integration with Openstack Neutron

The centralized entity should never become a bottleneck; hence the following additional requirements come into play.

  • Scale-out of control plane entity (Clustered Controllers) - Controller Manager.
  • Performance of each control plane entity.
  • Capacity of each control plane entity.
  • Security of control plane entity

Let us dig through each one of the above.

Realization of Virtual Networks:

The SDN controller is expected to provide the following:


  • Ability to program the virtual switches in compute nodes.
  • No special agent in compute nodes.
  • Ability to use OVS  using Openflow 1.3+ 
  • Ability to realize VxLAN based virtual networks using flow  based tunneling mechanism provided by OVS.
  • Ability to realize broadcast & unicast traffic using OF groups.
  • Ability to integrate with Openstack to learn VM MAC addresses and the compute nodes on which they reside.
  • Ability to use the above repository to program flow entries in virtual switches without resorting to broadcasting traffic to all peer compute nodes.
  • Ability to auto-learn VTEP entries.
  • Ability to avoid multiple data path entities in a compute node - one single data path per compute node.
  • Ability to honor security groups configured in Openstack Nova. That is, ability to program flows based on security group configuration without using "iptables" in the compute node.
  • Ability to use the "connection tracking" feature to enable stateful firewall functionality.
  • Ability to support IPSec in virtual networks across compute nodes.
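The MAC-repository idea above (programming unicast flows instead of broadcasting to all peer compute nodes) can be sketched as follows. This is a minimal illustration, not a real controller: the class and method names and the `ovs-ofctl`-style flow string are assumptions for illustration, though the `tun_id`/`tun_dst` fields do correspond to the flow-based tunneling mechanism OVS provides.

```python
# Minimal sketch of a VTEP repository: maps VM MAC addresses to the
# VXLAN tunnel endpoint (compute-node IP) hosting them, and emits an
# OVS-style flow entry so unicast traffic is tunneled directly to the
# right compute node instead of being broadcast to all peers.

class VtepRepository:
    def __init__(self):
        self.mac_to_vtep = {}   # VM MAC -> (VNI, remote compute-node IP)

    def learn(self, mac, vni, vtep_ip):
        """Record a VM's location (e.g. from Openstack port notifications)."""
        self.mac_to_vtep[mac] = (vni, vtep_ip)

    def unicast_flow(self, mac):
        """Build a flow-based-tunneling rule for a known VM MAC."""
        if mac not in self.mac_to_vtep:
            return None   # unknown MAC: fall back to the broadcast group
        vni, vtep_ip = self.mac_to_vtep[mac]
        return (f"table=1,dl_dst={mac},"
                f"actions=set_field:{vni}->tun_id,"
                f"set_field:{vtep_ip}->tun_dst,output:vxlan0")

repo = VtepRepository()
repo.learn("fa:16:3e:00:00:01", 5001, "10.0.0.12")
print(repo.unicast_flow("fa:16:3e:00:00:01"))
```

An unknown destination MAC returns `None`, which is where the controller would instead point the packet at an OF group realizing broadcast delivery.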


Capacity is entirely based on the deployment scenario.  To the best of my knowledge, these parameters are realistic from a deployment perspective and given the capability of hardware.
  • Ability to support 256 compute nodes per controller entity.  If there are more than 256 compute nodes, additional controllers in the cluster should take care of the rest.
  • Ability to support multiple controllers - ability to distribute virtual switches across controllers.
  • Support for 16K Virtual networks.
  • Support for 128K Virtual ports
  • Support for 256K VTEP entries.
  • Support for 16K IPSec transport mode tunnels


  • 100K connections/sec per SDN Controller node (since the firewall is taken care of in the controller).  With a new feature being considered in ONF, called connection templates, this requirement of 100K connections/sec can go down dramatically.  I think 50K connections/sec or connection templates/sec would be good enough.
  • 512 IPSec tunnels/sec.

Control Plane & Normal Path software for network nodes

Router control plane, firewall normal path, load balancer normal path and control plane for IPSec (IKE) functionality are required to implement the control plane for network nodes.


  • Ability to integrate with Neutron configuration of routers,  firewalls,  load balancers & IPSec.
  • Support for IPv4 & IPv6 unicast routing protocols - OSPF, BGP, RIP and IS-IS.
  • Support for IPv4 & IPv6 Multicast routing protocols - PIM-SM
  • Support for IP-tables kind of firewall normal path software.
  • Support for IKE with public key based authentication.
  • Support for LVS kind of L4 load balancing software.
  • Ability to support multiple router, firewall & load balancer instances.
  • Ability to support multiple Openflow switches that implement datapath/fastpath functionality of network nodes.
  • Ability to receive exception packets from Openflow switches, process them through control plane/normal-path software & program the resulting flows in Openflow switches.
  • Ability to support various extensions to Openflow specifications such as
    • Bind Objects 
      • To bind client-to-server & Server-to-client flows.
      • To realize IPSec SAs
      • To bind multiple flow together for easy revalidation in case of firewalls.
    • Multiple actions/instructions to support:
      • IPSec outbound/inbound SA processing.
      • Attack checks - Sequence number checks.
      • TCP sequence number NAT with delta history table.
      • Generation of ICMP error messages.
      • Big Metadata
      • LPM table support
      • IP Fragmentation
      • IP Reassembly on per table basis.
      • Ability to go back to the tables whose ID is less than the current table ID.
      • Ability to receive all pipeline fields via packet-in and send them back via packet-out.
      • Ability for controller to set the starting table ID along with the packet-out.
      • Ability to define actions when the flow is created or bind object is created.
      • Ability to define actions when the flow is  being deleted or bind object is being deleted.
      • Connection template support to auto-create the flows within the virtual switches.
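To make the exception-packet pattern above concrete, here is a minimal, hypothetical sketch: the first packet of a connection is punted to the controller, evaluated against firewall rules, and, if allowed, forward and reverse flows are installed under one bind identifier so both directions can be revalidated or torn down together (the bind-object idea). The rule format and function names are invented for illustration.

```python
# Sketch of the exception path: evaluate a punted first packet against
# firewall rules; on "allow", install client-to-server and
# server-to-client flows bound together by one bind-object id.

FIREWALL_RULES = [
    # (source prefix, destination port [0 = any], action)
    ("10.1.",  80, "allow"),
    ("10.1.", 443, "allow"),
    ("",        0, "deny"),      # default deny
]

def evaluate(src_ip, dst_port):
    for prefix, port, action in FIREWALL_RULES:
        if src_ip.startswith(prefix) and port in (0, dst_port):
            return action
    return "deny"

def handle_exception_packet(flow_table, pkt):
    """Process a punted first packet; install bound fwd/rev flows if allowed."""
    src, dst, sport, dport = pkt
    if evaluate(src, dport) != "allow":
        return False
    bind_id = len(flow_table) + 1              # bind object grouping both flows
    flow_table[(src, dst, sport, dport)] = bind_id   # client -> server
    flow_table[(dst, src, dport, sport)] = bind_id   # server -> client
    return True

flows = {}
handle_exception_packet(flows, ("10.1.2.3", "192.168.0.5", 40000, 80))
print(len(flows))   # both directions installed
```

A connection template would replace this round trip entirely: the switch itself auto-creates both flows from a template, which is why templates/sec can substitute for connections/sec in the performance figures below.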


  • Ability to support multiple network node switches - Minimum 32.
  • Ability to support multiple routers -  256 per controller node,  that is, 256 name spaces per controller node.
  • Ability to support 10K firewall rules per router.
  • Ability to support 256 IPSec policy rules per router.
  • Ability to support 1K LVS pools per router.
  • Ability to support 4M firewall/load balancer sessions.
  • Ability to support 100K IPSec SAs (if mobile users need to come in via IPSec).


  • 100K connections or connection templates/sec per controller node.
  • 10K IPSec SAs/sec per controller node.

Traffic Steering 


  • Ability to support network service chains
  • Ability to define multiple network services in a chain.
  • Ability to define bypass rules - to bypass some services for various traffic types.
  • Ability to associate multiple network service chains to a virtual network.
  • Ability to define service chain selection rules - select a service chain based on the type of traffic.
  • Ability to support multiple virtual networks.
  • Ability to establish rules in virtual switches that are part of the chain.
  • Ability to support scale-out of network services.


  • Support for 4K virtual networks.
  • Support for 8 network services in each chain.
  • Support for 4K chains.
  • Support for 32M flows.


  • 256K Connections Or connection templates/sec.

Distributed Routing/Firewall/Load balancing for East-West traffic

As indicated before, the virtual switches in compute nodes should be used as the data plane entity for these functions. The controller, in addition to programming flows to realize virtual networks and traffic steering, should also program flows to control traffic based on firewall rules and to direct east-west traffic based on routing information and load balancing decisions.


  • Ability to integrate with Openstack to learn the router, firewall & LB configurations.
  • Ability to act as control plane/normal-path entity for firewall & LB (Similar to network node except that it programs the virtual switches of compute nodes).
  • Ability to add routes in multiple virtual switches (Unlike in network node where the routes are added to only corresponding data plane switch).
  • Ability to support many extensions (as specified in network node section).
  • Ability to collect internal server load (For load balancing decisions).
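The last point, collecting internal server load for load-balancing decisions, reduces in the simplest case to picking the least-loaded backend and then programming the corresponding flow in the compute node's virtual switch. A toy sketch, with made-up backend addresses and load figures:

```python
# Sketch of a load-balancing decision using collected server load:
# the controller picks the least-loaded backend; the resulting choice
# would then be programmed as a flow in the compute node's vswitch.

def pick_backend(loads):
    """loads: backend IP -> reported load (lower is better)."""
    return min(loads, key=loads.get)

loads = {"10.2.0.11": 0.72, "10.2.0.12": 0.15, "10.2.0.13": 0.40}
print(pick_backend(loads))
```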


  •  Support for 512 virtual switches.
  •  8M+ firewall/SLB entries.


  • 100K Connections/sec by one SDN controller node.

SDN Controller Manager

When there are multiple controller nodes in a cluster, or multiple clusters of controllers, I believe there is a need for a manager to manage these controller nodes.


  • Ability to on-board new clusters 
  • Ability to on-board new controller nodes and assigning them to clusters.
  • Ability to recognize virtual switches - Automatically wherever possible.  Where not possible, via on-boarding.
  • Ability to associate virtual switches with controller nodes and to inform controller nodes of the virtual switches that will connect to them.
  • Ability to schedule virtual switches to controller nodes based on controller node capabilities to take in more virtual switches.
  • Ability to act as a bridge between Openstack Neutron & SDN controller nodes in synchronizing the resources & configuration of Neutron with all SDN controller nodes.  Configuration includes:
    • Ports & Networks.
    • Routers
    • Firewall, SLB & IPSec VPN configuration.
  • Ensuring that configuration is set in the appropriate controller node to avoid any race conditions.
  • Ability to set backup relations.
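The scheduling requirement above can be sketched as a simple capacity-based assignment: each newly recognized virtual switch goes to the controller node with the most remaining capacity. Node names and capacities here are illustrative; a real manager would also weigh load, locality and backup relations.

```python
# Sketch of the controller-manager scheduling step: assign each virtual
# switch to the controller node with the most remaining switch slots.

def schedule(switches, capacity):
    """capacity: controller node -> remaining switch slots (mutated)."""
    assignment = {}
    for sw in switches:
        node = max(capacity, key=capacity.get)   # most free capacity
        if capacity[node] == 0:
            raise RuntimeError("cluster full; on-board another controller node")
        assignment[sw] = node
        capacity[node] -= 1
    return assignment

caps = {"ctrl-1": 2, "ctrl-2": 1}
print(schedule(["vs-a", "vs-b", "vs-c"], caps))
```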

Securing the SDN Controller

Since the SDN Controller is the brain behind the realization of virtual networks and network services, it must be highly available and not prone to attacks. Some of the security features it should implement, in my view, are:
  • Support SSL/TLS based OF connections.
  • Accept connections only from authorized virtual switches.
  • Always work with  backup controller.
  • Synchronize state information with backup controller.
  • DDOS Prevention 
    • Enable Syn-Cookie mechanism.
    • Enable host based Firewall
    • Allow only traffic that is of interest to the SDN Controller; drop all other traffic.
    • Enable rate limiting of the traffic.
    • Enable rate  limiting on the exception packets from virtual switches.
    • Control number of flow setups/sec.
  • Constant vulnerability assessment.
  • Running tools such as fragroute and isic to ensure that no known vulnerabilities are present.
  • Always authenticate the configuration users.
  • Provide higher priority to the configuration traffic.
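Rate limiting the exception packets from virtual switches (two of the DDoS-prevention points above) is typically done with a token bucket per switch; excess punted packets are dropped before they reach the control plane. A minimal sketch with illustrative parameters:

```python
# Sketch of per-switch rate limiting of exception packets: a token
# bucket caps how many punted packets per second the controller will
# process; packets arriving without a token available are dropped.

class TokenBucket:
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst     # tokens/sec, max tokens
        self.tokens, self.last = burst, 0.0

    def allow(self, now):
        # refill proportionally to elapsed time, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=2.0, burst=2)   # 2 exception packets/sec, burst of 2
results = [bucket.allow(t) for t in (0.0, 0.1, 0.2, 1.2)]
print(results)   # burst absorbed, then throttled until tokens refill
```

The same mechanism, with a much larger rate, can cap total flow setups/sec across all switches.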
Note: If one SDN controller node implements all the functions listed above, all of the performance and capacity requirements must be combined.

Check out the SDN controller from Freescale, which provides a comprehensive feature set and takes advantage of multiple cores to deliver a very high performance system. Check the details here: