Saturday, July 6, 2013

VxLAN & Openflow Controller Role

 

Background

In one of my previous posts here, I argued that there is no need for an Openflow controller to realize virtual networks. In the same post, I also mentioned that an Openflow controller is required if intelligent traffic redirection is needed to realize features such as network service chaining and advanced traffic visualization. These advanced features require controlling the traffic path on a per-flow basis (5-tuple based connections) across physical/virtual appliances.

VxLAN, by default, tries to discover the remote endpoint using a discovery mechanism, which involves sending the packet with multicast VxLAN encapsulation. The compute node or endpoint that owns the DMAC address of the inner packet is expected to consume the packet. Using learning mechanisms, each VxLAN endpoint creates VTEP entries. Once the VTEPs are learnt, packets are sent only to the intended endpoint. Please see this VxLAN tutorial to understand the VxLAN functionality.
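
As a side note, the native Linux VxLAN device implements exactly this flood-and-learn model. A minimal sketch (the VNI 2000, multicast group 239.1.1.1 and uplink eth0 are illustrative values only):

  # Create a VxLAN device that floods unknown inner MACs to the multicast group
  ip link add vxlan2000 type vxlan id 2000 group 239.1.1.1 dev eth0

  # Learnt VTEP entries (inner MAC to remote endpoint IP) show up in the forwarding database
  bridge fdb show dev vxlan2000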

As discussed in the post about OVS & VxLAN based networks, OVS supports flow based overlay parameters such as remote endpoint address, VNI, etc.

Use cases requiring control from Openflow controller

One use case, as briefly touched upon above, is flow control (FC): a central entity across Openflow virtual switches that controls the traffic path, possibly on a per-connection basis. The FC functionality, running on top of the OF controller, gets the packet-in (miss packet) from the OVS, decides the next destination (an appliance implementing a network service) and then programs the flow in the OF switch with the appropriate overlay parameters.

Another use case for the OF controller, in realizing VxLAN based virtual networking support, is the send/receive of multicast & broadcast packets. The VxLAN specification supports multicast transport for carrying inner multicast and broadcast packets. As discussed above, the FC functionality is expected to get hold of all inner flows, whether they are unicast, multicast or broadcast based. Also, I hear that many network operators do not allow multicast packets in their network infrastructure, but at the same time they do not want to stop VMs from using multicast based protocols. The OF controller can provide a feature to duplicate the broadcast and multicast inner packets as many times as the number of possible destinations in the virtual network and send them over VxLAN using unicast VxLAN encapsulation.

Support available in Cloud Orchestration tools (Openstack)

Fortunately, in the cloud computing world, Openstack already maintains inventory information such as:
  • Physical servers and their IP addresses.
  • VMs on each physical server and MAC addresses of the VM Ethernet Ports.
  • Virtual networks (VLAN based, VxLAN based or other overlay based).
  • Physical servers that are participating in a given virtual network.
  • Ports on OVS that are connected to the VMs and which virtual network they belong to.
Openstack provides APIs to get hold of the above information. Also, Openstack has the ability to notify interested parties when the repository changes. The Openflow controller can have a service called 'Cloud Resource Discovery' (CRD) whose functionality is to keep the cloud resource repository in the OF controller and make it available to OF controller applications such as FC (a few illustrative CLI queries are shown below).
  • When it comes up, it can read the above information, keep the repository with it, and register for notifications.
  • Upon any notification, it updates the local repository information.
Note that there could be multiple OF controllers working in a cluster to share the load among them. Since Openstack is the single entity across the OF controllers, the responsibility for ensuring that the cloud repository is consistent across OF controllers lies with Openstack.
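
For illustration, the kind of inventory the CRD service needs is roughly what the standard Openstack CLI clients expose. A hedged sketch using the nova and neutron clients (the CRD would of course use the corresponding REST APIs and notifications rather than the CLI):

  # Virtual networks, their type (VLAN/VxLAN) and segmentation ID (admin view)
  neutron net-list
  neutron net-show <network-uuid>

  # Ports, their MAC addresses and the networks they belong to
  neutron port-list

  # VMs; 'nova show' reveals the compute host a given VM runs on
  nova list --all-tenants
  nova show <vm-uuid>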

Details of OF Controller Role

Now, let us discuss the capabilities required in OF controllers to realize VxLAN based virtual networks, taking care of two functions - enabling FC and handling multicast/broadcast packets. Here, I am assuming that the OF controller is used alongside Openstack. I am also assuming that VMs are the ones generating the traffic destined for other VMs. OVS is used to implement the OF virtual switches and the VxLAN overlays. Essentially, OVS provides a virtual switch with VMs connected on its north side and VxLAN ports connected on its south side.

The OF controller predominantly needs to have three main components:
  • Cloud Resource Discovery (CRD) component
  • Flow Control (FC) component
  • OF Controller Transport (OCT) component (such as Opendaylight, Freescale Openflow Controller Transport, NOX Controller, etc.).

Cloud Resource Discovery (CRD) component:

It discovers the following from Openstack (a sketch of how to cross-check some of this on a compute node follows the list):
  •  Virtual Networks configured
  •  Physical Servers (OF Capable switches)
  •  OF Logical switches within each OF Capable switch (for example, in the Openstack world, there are always two OF logical switches, br-int and br-tun).
  •  Names of ports attached to the OF logical switches
  • Qualification of ports (network port, port connected to a VM, or patch port, and the virtual network it corresponds to).
  • Local VLAN IDs used in br-int of each OF Capable switch and their mapping to VxLAN networks (VNIs) and vice versa.
  • VMs - MAC Addresses and the corresponding physical servers.
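
Much of the above can be cross-checked directly on a compute node. A minimal sketch, assuming the usual Openstack OVS agent layout (br-int/br-tun) and that the agent records a VM port's identity in the interface's external_ids:

  # OF capable switch and its OF logical switches
  ovs-vsctl list-br

  # Port names and Openflow port numbers of each logical switch
  ovs-ofctl show br-int
  ovs-ofctl show br-tun

  # Port qualification: VM ports carry Openstack metadata such as iface-id and attached-mac
  ovs-vsctl --columns=name,external_ids list Interface

  # Local VLAN ID (tag) assigned to each VM port on br-int
  ovs-vsctl --columns=name,tag list Port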

Repositories:

OF controller components typically maintain repositories, which are populated by the CRD. Many OCTs do have repositories. For example, the Freescale OCT has the capability to store the following via its DPRM module:
  • OF Capable switches
  • OF logical switches in each OF capable switch
  • Containers (Domains)
  • OF Logical switches that belong to domains.
  • Port names associated with OF logical switch.
  • Table names associated with each domain.
  • It also has the ability to attach various attributes to each of the above elements, for extensibility.
Additional repositories are required:
  • Virtual Networks.  For each virtual network
    • Fields:
      • Virtual Network Name
      • Virtual Network Description.
      • Type of virtual network
        • In case of VLAN:
          • VLAN ID
        • In case of VxLAN
          • VNI
      • Attributes
        • To store the Openstack UUID.
      • List of references to OF Capable switches.
        • In each reference  node
          • Array of OF Logical switch references. (Example:  br-int, br-tun)
          • For br-int logical switch
            • List of references of Ports that are connected towards VMs.
              • For each reference,  also  store the reference to VM.
            • List of references of ports that are connected to br-tun.
          • For  br-tun logical switch
            • List of references of ports that are connected to br-int
            • List of references of ports that are connected to network
      • List of references to VMs that belong to this virtual network.
      • List of references to ports that are part of this virtual network.
    • Views:
      • Based on name
      • Based on Openstack UUID
  •  VM Repository - Each record consists of
    • Fields:
      • VM Name
      • VM Description
      • Attributes:
        • Openstack VM UUID
        • Kind of VM - normal application VM or NS (network service) VM.
      • List of references of ports connected towards this VM on br-int (the MAC address is supposed to be part of the port repository in DPRM). For each reference:
        • VLAN associated with it (in br-int)
        • Reference to Virtual Network.
      • Reference to physical server (OF Capable switch)
    • Views
      • Based on VM name
      • Based on VM UUID
  •  Cell Repository
  •  Availability Zone repository
  •  Host Aggregate  repository
Existing repositories would be updated with the following:
  •  Port repository in each OF logical switch: Each port entry is updated using the following attributes:
    •  Port Qualification attribute
      • VM Port or Network Port or Patch Port.
    • MAC Address attribute 
      • MAC Address (in case this is connected to VM port)
    • VM Reference attribute:
      • Reference to VM record.
    •  Virtual Network Reference attribute
      • Reference to Virtual network record.
  • OF Switch Repository: I am not sure whether this is useful anywhere, but it is good to have the following attributes:
    • VM Reference attribute
      • List of references to VMs that are being hosted by this switch.
    • VN References attribute:
      • List of references to VNs for which this switch is part of.
    • Availability Zone Reference attribute
      • Reference to  availability Zone record
    • Host Aggregate Reference Attribute
      • Reference to Host Aggregate record.
    • Cell Reference Attribute
      • Reference to Cell record 

Flow Control

This component creates flows based on the information available in the repositories. Assuming the flow granularity is at the connection level, the first packet of any connection from a VM results in a packet-miss (packet-in) in br-int. The OF Controller Transport receives it from the OF switch and hands it over to the FC. The FC application knows the DPID of the switch and the in_port of the packet. From this information the port repository is checked, and from it the FC finds the virtual network information, such as the local VLAN ID used in br-int and the VxLAN VNI. It also finds the remote IP address of the overlay endpoint based on the VM MAC (the DMAC of the packet).

Assuming the communicating VMs are on different compute nodes, there are in total 8 flows the FC module would need to create to let the rest of the packets of the connection flow between the two VMs.
Any connection has a client-to-server side and a server-to-client side, and there are two OF logical switches in each of the two compute nodes. Hence the total number of flows is 2 (client-to-server and server-to-client) * 2 (two Openflow switches) * 2 (two compute nodes) = 8.

Example:
Let us assume that the compute nodes, Node1 & Node2, have IP addresses outIP1 and outIP2.
Let us also assume that VM1 on Node1 is making an HTTP connection (TCP, source port 30000, destination port 80) with VM2 on Node2. VM1's IP address is inIP1 and VM2's IP address is inIP2. Let us also assume that VxLAN uses UDP port number 5000.

When the VxLAN based network is created, let us assume that the OVS agent created local VLAN 100 on Node1 and local VLAN 101 on Node2 for the VxLAN network whose VNI is 2000.

FC establishes the following flows (an ovs-ofctl rendering of the Node1 entries is sketched after the list):
  • Node1
    • BR-INT:
      • Client to Server flow
        • Match fields
          • Input Port ID:  < Port ID to which VM is connected to>
          • Source IP: inIP1
          • Destination IP:  inIP2
          • Protocol:  TCP
          • source port : 30000
          • destination port: 80
        • Actions:
          • Add VLAN tag 100 to the packet.
          • Output port :  Patch Port that is connected between br-int and br-tun.
      • Server to Client flow
        • Match fields:
          • Input Port ID: <br-int end of patch port pair>
          • VLAN ID: 100
          • Source IP: inIP2
          • Destination IP: inIP1
          • Protocol: TCP
          • Source port: 80
          • Destination Port: 30000
        • Actions
          • Remove VLAN tag 100
          • Output port: <Port ID to which VM1 is connected>
    • BR-TUN
      • Client to Server flow
        • Match fields:
          • Input Port ID:  Patch Port 
          • VLAN ID:  100
          • Source IP: inIP1
          • Destination IP:  inIP2
          • Protocol:  TCP
          • source port : 30000
          • destination port: 80
        • Actions
          • Set Field:
            • Tunnel ID: 2000
            • Tunnel destination IP: outIP2
          • Remove VLAN tag 100
          • Output Port: <VxLAN port>
      • Server to Client flow:
        • Match fields:
          • Input Port ID: <VxLAN port>
          • Tunnel ID:  2000
          • Source IP: inIP2
          • Destination IP: inIP1
          • Protocol:  TCP
          • Source port: 80
          • Destination Port: 30000
        • Actions
          •  Add VLAN tag 100
          • Output port: < br-tun end of patch port pair>
  • Node 2:
    • BR-INT:
      • Server to Client flow:
        • Match fields
          • Input Port ID:  < Port ID to which VM is connected to>
          • Source IP: inIP2
          • Destination IP:  inIP1
          • Protocol:  TCP
          • source port :80
          • destination port: 30000
        • Actions:
          • Add VLAN tag 101 to the packet.
          • Output port: <br-int end of patch port pair>
      • Client to Server flow
        • Match fields
          • Input Port ID: <br-int end of patch port pair>
          • VLAN ID: 101
          • Source IP: inIP1
          • Destination IP: inIP2
          • Protocol:  TCP
          • source port: 30000
          • destination port: 80
        • Actions:
          • Remove VLAN tag 101
          • Output port: <Port ID to which VM2 is connected>
    • BR-TUN :
      • Client to Server flow:
        • Match fields:
          • Input Port ID: <VxLAN port>
          • Tunnel ID: 2000
          • Source IP: inIP1
          • Destination IP:  inIP2
          • Protocol:  TCP
          • source port : 30000
          • destination port: 80
        •  Actions
          • Add VLAN tag 101
          • Output port: <br-tun end of patch port pair>
      • Server to Client flow:
        • Match fields:
          • Input port ID: <br-tun end of patch port pair>
          • VLAN ID: 101
          • Source IP: inIP2
          • Destination IP:  inIP1
          • Protocol:  TCP
          • source port :80
          • destination port: 30000
        • Actions:
          • Set field:
            • Tunnel ID: 2000
            • Tunnel destination IP: outIP1
          • Remove VLAN tag 101
          • Output port: <VxLAN port>
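
For concreteness, here is how the Node1 flows above could be rendered with ovs-ofctl (names in angle brackets are placeholders for the actual Openflow port numbers, and the symbolic addresses inIP1/inIP2/outIP2 stand for real IP addresses; in practice FC would install these over the Openflow channel rather than via the CLI):

  # br-int, client to server: tag with local VLAN 100 and send towards br-tun
  ovs-ofctl add-flow br-int "in_port=<VM1 port>,tcp,nw_src=inIP1,nw_dst=inIP2,tp_src=30000,tp_dst=80 actions=mod_vlan_vid:100,output:<patch port>"

  # br-int, server to client: strip local VLAN 100 and deliver to VM1
  ovs-ofctl add-flow br-int "in_port=<patch port>,dl_vlan=100,tcp,nw_src=inIP2,nw_dst=inIP1,tp_src=80,tp_dst=30000 actions=strip_vlan,output:<VM1 port>"

  # br-tun, client to server: set VNI 2000 and tunnel destination outIP2, send out the VxLAN port
  ovs-ofctl add-flow br-tun "in_port=<patch port>,dl_vlan=100,tcp,nw_src=inIP1,nw_dst=inIP2,tp_src=30000,tp_dst=80 actions=set_tunnel:2000,set_field:outIP2->tun_dst,strip_vlan,output:<VxLAN port>"

  # br-tun, server to client: match on VNI 2000, re-add local VLAN 100 and send towards br-int
  ovs-ofctl add-flow br-tun "in_port=<VxLAN port>,tun_id=2000,tcp,nw_src=inIP2,nw_dst=inIP1,tp_src=80,tp_dst=30000 actions=mod_vlan_vid:100,output:<patch port>"
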
In the case of broadcast and multicast inner packets, FC could use OF 1.3 groups of type 'ALL' with as many buckets as the number of destinations, each bucket with its own set of 'set-field' actions.

Assume that FC sees a multicast flow from VM1 on Node1 (outIP1) with destination IP inMIP, protocol UDP, destination port 2222 and source port 40000. Assume also that it should go to compute nodes Node2 (outIP2), Node3 (outIP3) and Node4 (outIP4), as they host VMs interested in these multicast packets. FC would then generate the following flows in Node1 and Node2 (flows in Node3 and Node4 look similar); an ovs-ofctl group sketch for Node1's br-tun follows the list.

  • Node 1:
    • br-int
      • Match fields
        • Input port: <Port ID to which VM1 is connected>
        • Source IP :  inIP1
        • Destination IP:  inMIP
        • Protocol: UDP
        • Source port: 40000
        • Destination Port: 2222
      • Actions
        • Add VLAN tag 100
        • Output port: <br-int end of patch port pair>
    •  br-tun
      • Group object (G1):
        • Type: ALL
        • Bucket 1: 
          • Set-fields:
            • Tunnel ID: 2000
            • Remote IP: outIP2
          • Remove VLAN tag 100
          • Output port: <VxLAN port>
        • Bucket 2
          • Set-fields:
            • Tunnel ID: 2000
            • Remote IP: outIP3
          • Remove VLAN tag 100
          • Output port: <VxLAN port>
        • Bucket 3
          • Set-fields:
            • Tunnel ID: 2000
            • Remote IP: outIP4
          • Remove VLAN tag 100
          • Output port: <VxLAN port>
      • Match fields:
        • Input port: <br-tun end of patch port pair>
        • VLAN ID: 100
        • Source IP :  inIP1
        • Destination IP:  inMIP
        • Protocol: UDP
        • Source port: 40000
        • Destination Port: 2222
      • Actions
        • Group Object: G1
  •  Node 2
    • br-tun
      • Match fields:
        • Input port: <VxLAN port>
        • Tunnel ID: 2000
        • Source IP :  inIP1
        • Destination IP:  inMIP
        • Protocol: UDP
        • Source port: 40000
        • Destination Port:  2222
      • Actions:
        • Push VLAN tag 101
        • Output port: <br-tun end of patch port pair>
    • br-int
      • Match fields:
        • Input port: <br-int end of patch port pair>
        • VLAN ID: 101
        • Source IP :  inIP1
        • Destination IP:  inMIP
        • Protocol: UDP
        • Source port: 40000
        • Destination Port:  2222
      • Actions
        • Remove VLAN tag 101
        • Output port: <Ports of VMs on Node2 interested in this multicast group>
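
A hedged ovs-ofctl rendering of the Node1 br-tun part of this (the group plus the flow that points to it), assuming an OVS build with OpenFlow 1.3 group support; names in angle brackets and the symbolic addresses are placeholders:

  # Group G1: replicate to every remote endpoint, each bucket with its own tunnel destination
  ovs-ofctl -O OpenFlow13 add-group br-tun "group_id=1,type=all,bucket=set_tunnel:2000,set_field:outIP2->tun_dst,strip_vlan,output:<VxLAN port>,bucket=set_tunnel:2000,set_field:outIP3->tun_dst,strip_vlan,output:<VxLAN port>,bucket=set_tunnel:2000,set_field:outIP4->tun_dst,strip_vlan,output:<VxLAN port>"

  # Flow that steers the inner multicast connection into the group
  ovs-ofctl -O OpenFlow13 add-flow br-tun "in_port=<patch port>,dl_vlan=100,udp,nw_src=inIP1,nw_dst=inMIP,tp_src=40000,tp_dst=2222 actions=group:1"
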
FC uses the repository information to create the above flows. Hence it is important to have the repository information arranged in good data structures so that it can be looked up quickly upon a packet-miss.
Every flow that FC creates in the OF logical switches is also maintained locally by FC. This handles cases where OF switches evict flows: when a packet-miss occurs due to such eviction, FC pushes the locally available flow back into the OF switch. That is, whenever there is a packet-in, FC first checks its local run-time flow store before creating new flows by referring to the repository.

A few more considerations that the FC and CRD components need to take care of are:
  • VM movement: When a VM is moved, the flows created in the OVS OF switches should also be moved accordingly. The CRD component is expected to listen for VM movement events from Openstack and update the repositories internally. The FC component, in turn, should update the OF flows accordingly - removing flows from the old compute node and installing them on the new compute node (see the sketch below).
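
A minimal sketch of the per-connection clean-up on the old compute node, reusing the HTTP example above (in practice FC would remove the flows over the Openflow channel; ovs-ofctl is used here only to show the match granularity, and the symbolic addresses are placeholders):

  # On the old compute node (Node1): remove the connection's flows from both logical switches
  ovs-ofctl del-flows br-int "tcp,nw_src=inIP1,nw_dst=inIP2,tp_src=30000,tp_dst=80"
  ovs-ofctl del-flows br-int "tcp,nw_src=inIP2,nw_dst=inIP1,tp_src=80,tp_dst=30000"
  ovs-ofctl del-flows br-tun "tcp,nw_src=inIP1,nw_dst=inIP2,tp_src=30000,tp_dst=80"
  ovs-ofctl del-flows br-tun "tcp,nw_src=inIP2,nw_dst=inIP1,tp_src=80,tp_dst=30000"

The corresponding flows on the new compute node are then created from the repository, exactly as on a packet-miss.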

Thursday, July 4, 2013

Linux OVS & VxLAN based virtual networks


OVS (www.openvswitch.org) is the Openflow switch implementation in Linux. It implements various OF versions, including 1.3. OVS has had support for realizing virtual networks using VLAN and GRE for a long time. In the recent past, OVS was enhanced to support VxLAN overlay based virtual networks. In this post, I give some commands that can be used to realize virtual networks using VxLAN.

For more information about VxLAN,  please see this tutorial.

Recently, there was a very good development in OVS on overlays. It is no longer required to have as many 'vports' as the number of compute servers to realize a virtual network across multiple compute servers. OVS now implements flow based selection of the overlay protocol values. Due to this, one VxLAN port in the OVS OF switch is good enough, irrespective of the number of remote compute nodes and the number of virtual networks.

OVS introduced new extensions (an action and a set of OXM fields that can be set using the set_field action) to the Openflow protocol, with which the OF controller specifies tunnel/overlay specific information in the flow.

The VxLAN protocol layer adds the overlay header and needs the following information - the source and destination IP addresses of the outer IP header, the source and destination ports of the UDP header, and the VNI for the VxLAN header. OVS provides facilities for the Openflow controller to set the source IP, destination IP and VNI using the set_field action. OVS introduced the following NXM fields:

NXM_NX_TUN_ID :  To specify VNI (VxLAN Network Identifier).
NXM_NX_TUN_IPV4_SRC :  To specify source IP of the outer IP header.
NXM_NX_TUN_IPV4_DST :  To specify the destination IP of the outer IP header.

The VxLAN protocol layer knows the UDP destination port from the 'vport'. The ovs-vsctl command can be used to create VxLAN ports; it can create many VxLAN ports on the same OF switch, with a different UDP destination port on each one of them. The VxLAN protocol layer gets the rest of the information required to frame the outer IP and UDP headers by itself, with the help of the Linux TCP/IP stack.

Similarly, in the receive direction, the VxLAN protocol layer informs the OVS OF switches by filling in the above fields after decapsulating the packets. Due to this, the Openflow controller can use the above fields as match fields.

Essentially, OVS provides mechanisms to set the tunnel field values for outgoing packets in the Openflow flows, and also mechanisms to use these tunnel fields as match fields in OF tables for incoming packets.
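
For example, on the receive side a flow can match on the VNI carried by the decapsulated packet; a small sketch that mirrors the transmit example further below (the port numbers are illustrative):

  ovs-ofctl add-flow br-tun "in_port=1,tun_id=1 actions=output:LOCAL"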

The following commands can be used to create VxLAN ports using 'ovs-vsctl' without explicitly mentioning the tunnel destination and tunnel ID, letting the Openflow controller specify these field values in OF flows.

Creation of VxLAN port with default UDP service port:

  ovs-vsctl add-port br-tun vxlan0 -- set Interface vxlan0   type=vxlan  options:remote_ip=flow options:key=flow

The above command creates VxLAN port 'vxlan0' on OF switch 'br-tun' and specifies that this port gets the tunnel ID (VNI) and tunnel remote IP from the OF flow: "key=flow" means the tunnel ID is taken from the flow and "remote_ip=flow" means the tunnel destination IP address is taken from the flow.

A small variation of the above command creates the VxLAN port with a different UDP destination port, 5000:

ovs-vsctl add-port br-tun vxlan1 -- set Interface vxlan1 type=vxlan options:remote_ip=flow options:key=flow options:dst_port=5000

OVS provides a mechanism to create Openflow flows without having an external Openflow controller: 'ovs-ofctl' is the tool provided by OVS to do this.

The following commands can be used to create a flow that sets the tunnel fields (either form works):

ovs-ofctl add-flow br-tun "in_port=LOCAL actions=set_tunnel:1,set_field:172.16.2.3->tun_dst,output:1"   (OR)
ovs-ofctl add-flow br-tun "in_port=LOCAL actions=set_field:172.16.2.3->tun_dst,set_field:1->tun_id,output:1"

"set_tunnel" is used to specify the VNI and "set_field" with tun_dst to specify the tunnel destination.

Other commands of interest are:

To see the openflow port numbers:
        ovs-ofctl show br-tun
To dump flows:
        ovs-ofctl dump-flows br-tun