Background
In one of the previous posts here, I argued that there is no need for an Openflow controller to realize virtual networks. In the same post, I also mentioned that an Openflow controller is required if intelligent traffic redirection is needed to realize features such as network service chaining and advanced traffic visualization. These advanced features require controlling the traffic path on a per-flow basis (5-tuple based connections) across physical/virtual appliances.
VxLAN, by default, tries to discover the remote endpoint using a discovery mechanism. This mechanism involves sending the packet using multicast VxLAN encapsulation. The compute node or endpoint that owns the DMAC address of the inner packet is expected to consume the packet. Using learning mechanisms, each VxLAN endpoint creates VTEP entries. Once the VTEPs are learnt, packets are sent only to the intended endpoint. Please see this VxLAN tutorial to understand the VxLAN functionality.
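As a rough illustration of that learning step, here is a minimal Python sketch of a VTEP learning table: it maps inner source MACs to the outer source IP of the encapsulating packet, so later unicast traffic can be sent directly instead of being flooded over multicast. The structures and function names are mine, purely for illustration, not from any particular VxLAN implementation.

```python
# Minimal sketch of VxLAN endpoint learning (illustrative only, not a real datapath).
# Maps (VNI, inner SMAC) -> remote VTEP (outer source IP) learnt from received packets.
vtep_table = {}

def learn(inner_smac, outer_src_ip, vni):
    """Record which remote VTEP owns this inner MAC on this VNI."""
    vtep_table[(vni, inner_smac)] = outer_src_ip

def lookup(inner_dmac, vni, multicast_group):
    """Return the unicast VTEP IP if known; otherwise fall back to the multicast group."""
    return vtep_table.get((vni, inner_dmac), multicast_group)

# Example: a packet from MAC aa:bb:cc:dd:ee:01 on VNI 2000 arrived encapsulated from 192.0.2.10.
learn("aa:bb:cc:dd:ee:01", "192.0.2.10", 2000)
print(lookup("aa:bb:cc:dd:ee:01", 2000, "239.1.1.1"))   # -> 192.0.2.10 (unicast)
print(lookup("aa:bb:cc:dd:ee:02", 2000, "239.1.1.1"))   # -> 239.1.1.1 (still flooded)
```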
As discussed in the post about OVS & VxLAN based networks, OVS supports flow-based overlay parameters such as the remote endpoint address, the VNI and so on.
Use cases requiring control from Openflow controller
One use case, as briefly touched upon, is flow control (FC): a central entity across Openflow virtual switches that controls the traffic path, possibly on a per-connection basis. FC functionality running on top of the OF controller gets the packet-in (miss packet) from the OVS, decides the next destination (an appliance implementing a network service) and then programs the flow in the OF switch with appropriate overlay parameters.
Another use case of the OF controller, in realizing VxLAN based virtual networking support, is the handling of multicast & broadcast packets. The VxLAN specification supports multicast transport for carrying inner multicast and broadcast packets. As discussed above, the FC functionality is expected to get hold of all inner flows, whether they are unicast, multicast or broadcast based. Also, I hear that many network operators don't allow multicast packets in their network infrastructure, but at the same time, they don't like to stop VMs from using multicast based protocols. The OF controller can provide a feature to duplicate the broadcast and multicast inner packets as many times as there are possible destinations in the virtual network and send them over VxLAN using unicast VxLAN encapsulation (a small sketch of this replication decision follows).
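To make the replication idea concrete, here is a small Python sketch of how FC could derive the list of unicast VTEP destinations for a broadcast/multicast inner packet from its virtual-network membership data. The data model here is my own shorthand, not tied to any specific controller.

```python
# Illustrative only: compute unicast VxLAN destinations for a flooded inner packet.
# vn_membership would come from the controller's repositories (see later sections).
vn_membership = {
    2000: {"node1": "10.0.0.1", "node2": "10.0.0.2", "node3": "10.0.0.3"},
}

def replication_targets(vni, ingress_node):
    """All remote VTEP IPs of the virtual network, excluding the node the packet came from."""
    members = vn_membership.get(vni, {})
    return [ip for node, ip in members.items() if node != ingress_node]

# A broadcast from a VM on node1 in VNI 2000 gets unicast-replicated to node2 and node3.
print(replication_targets(2000, "node1"))   # ['10.0.0.2', '10.0.0.3']
```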
Support available in Cloud Orchestration tools (Openstack)
Fortunately, in the cloud computing world, Openstack already maintains inventory information such as:
- Physical servers and their IP addresses.
- VMs on each physical server and MAC addresses of the VM Ethernet Ports.
- Virtual networks (VLAN based, VxLAN based or other overlay based).
- Physical servers that are participating in a given virtual network.
- Ports on OVS that are connected to the VMs and which virtual network they belong to.
When the OF controller comes up, it can read the above information from Openstack and keep a local repository. It can also register for notifications and update the local repository upon any notification (a rough sketch of pulling this inventory follows).
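As an illustration, a CRD-style component could pull this inventory with the standard Openstack python clients. The snippet below is a rough sketch assuming python-novaclient and python-neutronclient; the auth URL and credentials are placeholders.

```python
# Rough sketch: pulling inventory from Openstack (placeholder credentials/URLs).
from keystoneauth1 import loading, session
from novaclient import client as nova_client
from neutronclient.v2_0 import client as neutron_client

loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(auth_url='http://controller:5000/v3',
                                username='admin', password='secret',
                                project_name='admin',
                                user_domain_name='Default',
                                project_domain_name='Default')
sess = session.Session(auth=auth)

nova = nova_client.Client('2', session=sess)
neutron = neutron_client.Client(session=sess)

hypervisors = nova.hypervisors.list()           # physical servers and their IP addresses
servers = nova.servers.list()                   # VMs and the hosts they run on
networks = neutron.list_networks()['networks']  # virtual networks (type, segmentation id)
ports = neutron.list_ports()['ports']           # VM ports: MAC address, host binding, network id
```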
Details of OF Controller Role
Now, let us discuss the capabilities required in OF controllers to realize VxLAN based virtual networks, taking care of two functions - enabling FC & working with multicast/broadcast packets. Here, I am assuming that the OF controller is being used alongside Openstack. I am also assuming that VMs are the ones generating the traffic destined for other VMs. OVS is used to implement the OF virtual switches and VxLAN overlays. Essentially, OVS provides a virtual switch with VMs connected on its north side and VxLAN ports connected on its south side.
The OF Controller predominantly needs to have three main components:
- Cloud Resource Discovery (CRD) component
- Flow Control (FC) component
- OF Controller Transport (OCT) component (such as Opendaylight, Freescale Openflow Controller Transport, NOX Controller etc.).
Cloud Resource Discovery (CRD) component:
It discovers the following from Openstack (a rough port-classification sketch follows this list):
- Virtual Networks configured
- Physical Servers (OF Capable switches)
- OF Logical switches within each OF Capable switch (for example, in the Openstack world, there are always two OF logical switches, br-int and br-tun).
- Names of ports attached to the OF logical switches
- Qualification of ports (network port, port connected to a VM, or patch port) and the virtual network each port corresponds to.
- Local VLAN IDs used in br-int of each OF Capable switch and their mapping to VxLAN networks (VNIs) and vice versa.
- VMs - MAC Addresses and the corresponding physical servers.
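Part of this discovery can also be cross-checked locally on each compute node via OVS itself. The sketch below is a rough illustration that relies on the Neutron OVS agent convention of storing the Neutron port UUID in external_ids:iface-id for VM ports; the exact conventions may differ in your deployment.

```python
# Rough sketch of port qualification via ovs-vsctl (conventions assumed, not guaranteed):
# VM ports carry external_ids:iface-id (Neutron port UUID), while patch and vxlan
# ports are identified by their interface type.
import subprocess

def _vsctl(*args):
    return subprocess.check_output(('ovs-vsctl',) + args).decode().strip()

def classify_ports(bridge='br-int'):
    ports = {}
    for name in _vsctl('list-ports', bridge).splitlines():
        iftype = _vsctl('get', 'Interface', name, 'type').strip('"')
        if iftype == 'patch':
            ports[name] = ('patch-port', None)
        elif iftype == 'vxlan':
            ports[name] = ('network-port', None)
        else:
            try:
                uuid = _vsctl('get', 'Interface', name,
                              'external_ids:iface-id').strip('"')
                ports[name] = ('vm-port', uuid)   # Neutron port UUID -> VM/port mapping
            except subprocess.CalledProcessError:
                ports[name] = ('unknown', None)
    return ports
```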
Repositories:
OF Controller components typically maintain repositories, which are populated by the CRD component. Many OCTs have built-in repositories. For example, the Freescale OCT has the capability to store the following via its DPRM module (a minimal Python sketch of such records appears at the end of this section):
- OF Capable switches
- OF logical switches in each OF capable switch
- Containers (Domains)
- OF Logical switches that belong to domains.
- Port names associated with OF logical switch.
- Table names associated with each domain.
- It also has the ability to attach various attributes to each of the above elements for extensibility.
- Virtual Networks. For each virtual network:
- Fields:
- Virtual Network Name
- Virtual Network Description.
- Type of virtual network
- In case of VLAN:
- VLAN ID
- In case of VxLAN
- VNI
- Attributes
- To store the Openstack UUID.
- List of references to OF Capable switches.
- In each reference node
- Array of OF Logical switch references. (Example: br-int, br-tun)
- For br-int logical switch
- List of references of Ports that are connected towards VMs.
- For each reference, also store the reference to VM.
- List of references of ports that are connected to br-tun.
- For br-tun logical switch
- List of references of ports that are connected to br-int
- List of references of ports that are connected to network
- List of references to OF VMs
- List of references to Ports that are
- Views:
- Based on name
- Based on Openstack UUID
- VM Repository - Each record consists of
- Fields:
- VM Name
- VM Description
- Attributes:
- Openstack VM UUID
- Kind of VM - Normal Application VM or NS (Network Service) VM.
- List of references of ports connected towards this VM on br-int (the MAC address is supposed to be part of the Port repository in DPRM). For each reference:
- VLAN associated with it (in br-int)
- Reference to Virtual Network.
- Reference to physical server (OF Capable switch)
- Views
- Based on VM name
- Based on VM UUID
- Cell Repository
- Availability Zone repository
- Host Aggregate repository
- Port repository in each OF logical switch: Each port entry is updated with the following attributes:
- Port Qualification attribute
- VM Port or Network Port or Patch Port.
- MAC Address attribute
- MAC Address (in case this is connected to VM port)
- VM Reference attribute:
- Reference to VM record.
- Virtual Network Reference attribute
- Reference to Virtual network record.
- OF Switch Repository: I am not sure whether this is useful anywhere, but it is good to have the following attributes:
- VM Reference attribute
- List of references to VMs that are being hosted by this switch.
- VN References attribute:
- List of references to VNs for which this switch is part of.
- Availability Zone Reference attribute
- Reference to availability Zone record
- Host Aggregate Reference attribute
- Reference to Host Aggregate record
- Cell Reference Attribute
- Reference to Cell record
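To make the repository layout above a bit more concrete, here is a minimal Python sketch of the virtual network, VM and port records along with the two views (by name and by Openstack UUID). The class and field names are my own shorthand for the structure described above, not from any particular controller.

```python
# Minimal sketch of the repositories described above (names are illustrative).
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class PortRecord:
    name: str
    qualification: str                  # 'vm-port' | 'network-port' | 'patch-port'
    mac: Optional[str] = None           # set for VM ports
    vm_ref: Optional[str] = None        # VM name
    vn_ref: Optional[str] = None        # virtual network name

@dataclass
class VirtualNetworkRecord:
    name: str
    description: str = ''
    vn_type: str = 'vxlan'              # 'vlan' or 'vxlan'
    vlan_id: Optional[int] = None
    vni: Optional[int] = None
    openstack_uuid: Optional[str] = None
    switch_refs: List[str] = field(default_factory=list)   # OF Capable switches

@dataclass
class VMRecord:
    name: str
    description: str = ''
    openstack_uuid: Optional[str] = None
    kind: str = 'application'           # 'application' or 'network-service'
    port_refs: List[str] = field(default_factory=list)

class Repository:
    """Holds records with the two views mentioned above: by name and by Openstack UUID."""
    def __init__(self):
        self.by_name: Dict[str, object] = {}
        self.by_uuid: Dict[str, object] = {}

    def add(self, record):
        self.by_name[record.name] = record
        if getattr(record, 'openstack_uuid', None):
            self.by_uuid[record.openstack_uuid] = record
```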
Flow Control
This component is the one that creates flows based on the information available in the repositories. Assuming the flow granularity is at the connection level, the first packet of any connection from a VM results in a packet-miss (packet-in) in br-int. The OF controller transport receives it from the OF switch and hands it over to FC. The FC application knows the DPID of the switch and the in_port of the packet. From this information, the port repository is checked, and from it FC finds the virtual network information such as the local VLAN ID used in br-int and the VxLAN VNI. It also finds the remote IP address of the overlay endpoint based on the VM MAC (the DMAC of the packet).
Assuming that the communicating VMs are on different compute nodes, there are a total of 8 flows the FC module would need to create to let the rest of the packets of the connection go through between the two VMs.
Any connection has a client-to-server side and a server-to-client side. There are two OF logical switches in each compute node, and there are two compute nodes. Hence the total number of flows is 2 (client-to-server and server-to-client) * 2 (two Openflow logical switches per node) * 2 (two compute nodes) = 8.
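As an illustration of where this logic hooks into the controller, here is a rough sketch of a packet-in handler on top of the Ryu framework. Ryu is just one possible OCT, and the repository lookups are placeholders for the structures described in the previous section.

```python
# Rough sketch of the FC packet-in path on top of Ryu (repository lookups are placeholders).
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, set_ev_cls
from ryu.lib.packet import ethernet, packet
from ryu.ofproto import ofproto_v1_3


class FlowControlApp(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def packet_in_handler(self, ev):
        msg = ev.msg
        datapath = msg.datapath
        dpid = datapath.id                      # identifies the OF logical switch
        in_port = msg.match['in_port']

        eth = packet.Packet(msg.data).get_protocol(ethernet.ethernet)
        dmac = eth.dst

        # Placeholder repository lookups (see the repository section above):
        # port_rec = port_repo.lookup(dpid, in_port)  -> local VLAN, virtual network (VNI)
        # vtep_ip  = vm_repo.lookup_by_mac(dmac)      -> remote overlay endpoint (compute node IP)
        # With these, FC programs the 8 flows shown in the example below.
```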
Example:
Let us assume that the compute nodes, Node1 & Node2, have IP addresses outIP1 and outIP2 respectively.
Let us also assume that VM1 on Node1 is making an HTTP connection (TCP, source port 30000, destination port 80) with VM2 on Node2. VM1's IP address is inIP1 and VM2's IP address is inIP2. Let us also assume that VxLAN uses port number 5000.
When the VxLAN based network is created, let us assume that the OVS agent created local VLAN 100 on Node1 and VLAN 101 on Node2 for the VxLAN network whose VNI is 2000.
FC establishes the following flows (a controller-side programming sketch for one of these entries follows the list):
- Node1
- BR-INT:
- Client to Server flow
- Match fields
- Input Port ID: <Port ID to which VM1 is connected>
- Source IP: inIP1
- Destination IP: inIP2
- Protocol: TCP
- source port : 30000
- destination port: 80
- Actions:
- Add VLAN tag 100 to the packet.
- Output port: <br-int end of the patch port pair between br-int and br-tun>
- Server to Client flow
- Match fields:
- Input Port ID: <br-int end of patch port pair>
- VLAN ID: 100
- Source IP: inIP2
- Destination IP: inIP1
- Protocol: TCP
- Source port: 80
- Destination Port: 30000
- Actions
- Remove VLAN tag 100
- Output port: <Port ID to which VM1 is connected>
- BR-TUN
- Client to Server flow
- Match fields:
- Input Port ID: <br-tun end of patch port pair>
- VLAN ID: 100
- Source IP: inIP1
- Destination IP: inIP2
- Protocol: TCP
- source port : 30000
- destination port: 80
- Actions
- Set Field:
- Tunnel ID: 2000
- Remote IP: outIP2
- Remove VLAN tag 100
- Output Port: <VxLAN tunnel port>
- Server to Client flow:
- Match fields:
- Input Port ID: <VxLAN tunnel port>
- Tunnel ID: 2000
- Source IP: inIP2
- Destination IP: inIP1
- Protocol: TCP
- Source port: 80
- Destination Port: 30000
- Actions
- Add VLAN tag 100
- Output port: <br-tun end of patch port pair>
- Node 2:
- BR-INT:
- Server to Client flow:
- Match fields
- Input Port ID: <Port ID to which VM2 is connected>
- Source IP: inIP2
- Destination IP: inIP1
- Protocol: TCP
- source port :80
- destination port: 30000
- Actions:
- Add VLAN tag 101 to the packet.
- Output port: <br-int end of patch port pair>
- Client to Server flow
- Match fields
- Input Port ID: <br-int end of patch port pair>
- VLAN ID: 101
- Source IP: inIP1
- Destination IP: inIP2
- Protocol: TCP
- source port: 30000
- destination port: 80
- Actions:
- Remove VLAN tag 101
- Output port: <Port ID to which VM2 is connected>
- BR-TUN :
- Client to Server flow:
- Match fields:
- Input Port ID: <VxLAN tunnel port>
- Tunnel ID: 2000
- Source IP: inIP1
- Destination IP: inIP2
- Protocol: TCP
- source port : 30000
- destination port: 80
- Actions
- Add VLAN tag 101
- Output port: <br-tun end of patch port pair>
- Server to Client flow:
- Match fields:
- Input port ID: <br-tun end of patch port pair>
- VLAN ID: 101
- Source IP: inIP2
- Destination IP: inIP1
- Protocol: TCP
- source port :80
- destination port: 30000
- Actions:
- Set field:
- Tunnel ID: 2000
- Remote IP: outIP1
- Remove VLAN tag 101
- Output port: <VxLAN tunnel port>
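To show what programming one of these entries looks like in practice, here is a rough Ryu sketch of the Node1 br-tun client-to-server flow from the list above. The port numbers and IP addresses are placeholders, and setting the per-flow tunnel destination (tun_ipv4_dst) assumes the OVS/Nicira extension is available through the controller library.

```python
# Rough sketch: Node1 br-tun client-to-server flow from the example above (Ryu, OpenFlow 1.3).
# patch_port and vxlan_port are placeholder OF port numbers; tun_ipv4_dst relies on the
# OVS/Nicira extension being supported by the controller library.
def install_node1_brtun_c2s(datapath, patch_port, vxlan_port):
    ofp = datapath.ofproto
    parser = datapath.ofproto_parser

    match = parser.OFPMatch(in_port=patch_port,
                            vlan_vid=(100 | ofp.OFPVID_PRESENT),
                            eth_type=0x0800, ip_proto=6,
                            ipv4_src='192.0.2.11',   # inIP1 (placeholder address)
                            ipv4_dst='192.0.2.12',   # inIP2 (placeholder address)
                            tcp_src=30000, tcp_dst=80)
    actions = [parser.OFPActionSetField(tunnel_id=2000),
               parser.OFPActionSetField(tun_ipv4_dst='203.0.113.2'),  # outIP2 (placeholder)
               parser.OFPActionPopVlan(),
               parser.OFPActionOutput(vxlan_port)]
    inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
    datapath.send_msg(parser.OFPFlowMod(datapath=datapath, priority=100,
                                        match=match, instructions=inst))
```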
Assume that FC sees a multicast flow from VM1 on Node1 (outIP1) with destination IP inMIP, protocol UDP, destination port 2222 and source port 40000. Assume also that it should go to compute nodes Node2 (outIP2), Node3 (outIP3) and Node4 (outIP4), as there are VMs on those nodes interested in these multicast packets. FC would then generate the following flows in Node1 and Node2 (a group-table programming sketch follows the flow list); flows in Node3 and Node4 look similar.
- Node 1:
- br-int
- Match fields
- Input port: <Port ID to which VM1 is connected>
- Source IP: inIP1
- Destination IP: inMIP
- Protocol: UDP
- Source port: 40000
- Destination Port: 2222
- Actions
- Add VLAN tag : 100
- Output port: <br-int end of patch port pair>
- br-tun
- Group object (G1):
- Type: ALL
- Bucket 1:
- Set-fields:
- Tunnel ID: 2000
- Remote IP: outIP2
- Remove VLAN tag 100
- Output port: <VxLAN tunnel port>
- Bucket 2
- Set-fields:
- Tunnel ID: 2000
- Remote IP: outIP3
- Remove VLAN tag 100
- Output port: <VxLAN tunnel port>
- Bucket 3
- Set-fields:
- Tunnel ID: 2000
- Remote IP: outIP4
- Remove VLAN tag 100
- Output port: <VxLAN tunnel port>
- Match fields:
- Input port: <br-tun end of patch port pair>
- VLAN ID: 100
- Source IP: inIP1
- Destination IP: inMIP
- Protocol: UDP
- Source port: 40000
- Destination Port: 2222
- Actions
- Group Object: G1
- Node 2
- br-tun
- Match fields:
- Input port: <VxLAN tunnel port>
- Tunnel ID: 2000
- Source IP: inIP1
- Destination IP: inMIP
- Protocol: UDP
- Source port: 40000
- Destination Port: 2222
- Actions:
- Push VLAN tag 101
- Output port: <br-tun end of patch port pair>
- br-int
- Match fields:
- Input port: <br-int end of patch port pair>
- VLAN ID: 101
- Source IP: inIP1
- Destination IP: inMIP
- Protocol: UDP
- Source port: 40000
- Destination Port: 2222
- Actions
- Remove VLAN tag 101
- Output port: <Port IDs of VMs on Node2 interested in this multicast>
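The Node1 br-tun group object above maps naturally onto an OpenFlow 1.3 group of type ALL. Here is a rough Ryu sketch of programming it; again, the port numbers and IP addresses are placeholders, and tun_ipv4_dst assumes Nicira extension support in the controller library.

```python
# Rough sketch: Node1 br-tun group (type ALL) replicating the inner multicast to three VTEPs.
def install_multicast_group(datapath, vxlan_port, group_id=1):
    ofp = datapath.ofproto
    parser = datapath.ofproto_parser

    remote_vteps = ['203.0.113.2', '203.0.113.3', '203.0.113.4']   # outIP2..outIP4 (placeholders)
    buckets = []
    for vtep in remote_vteps:
        actions = [parser.OFPActionSetField(tunnel_id=2000),
                   parser.OFPActionSetField(tun_ipv4_dst=vtep),
                   parser.OFPActionPopVlan(),
                   parser.OFPActionOutput(vxlan_port)]
        buckets.append(parser.OFPBucket(actions=actions))

    datapath.send_msg(parser.OFPGroupMod(datapath=datapath,
                                         command=ofp.OFPGC_ADD,
                                         type_=ofp.OFPGT_ALL,
                                         group_id=group_id,
                                         buckets=buckets))
    # The matching flow then uses parser.OFPActionGroup(group_id) as its action.
```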
FC maintains locally every flow that it creates in the OF logical switches. This is to handle cases where OF switches evict the flows. When there is a packet-miss due to such an eviction, FC pushes the locally stored flow back into the OF switch. That is, whenever there is a packet-in, FC first needs to check its local run-time flow store before creating new flows by referring to the repositories.
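A minimal sketch of that local run-time flow store (the structure and keying are my assumption) could look like this:

```python
# Illustrative local run-time flow store keyed by switch DPID and connection 5-tuple.
flow_store = {}

def on_packet_in(dpid, five_tuple):
    """Return cached flows for re-push if the switch evicted them; else signal 'compute new'."""
    key = (dpid, five_tuple)
    if key in flow_store:
        return flow_store[key]       # re-push these flow mods into the OF switch
    return None                      # fall back to repository lookups and create new flows

def remember(dpid, five_tuple, flow_mods):
    flow_store[(dpid, five_tuple)] = flow_mods
```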
A few more considerations that the FC and CRD components need to take care of:
- VM Movement: When a VM is moved, the flows created in the OVS OF switches should also be moved accordingly. The CRD component is expected to listen for VM movement events from Openstack and update the repositories internally. The FC component, in turn, should update the OF flows accordingly - removing the flows from the old compute node and installing them on the new compute node, as sketched below.
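A sketch of how the two components could cooperate on a migration event follows; the event fields and helper names here are assumptions for illustration, not an existing API.

```python
# Illustrative handling of a VM migration notification (field and helper names are assumed).
def on_vm_moved(event, crd, fc):
    vm = crd.update_vm_host(event['vm_uuid'], event['new_host'])   # refresh repositories first
    for flow in fc.flows_for_vm(event['vm_uuid']):
        fc.delete_flow(event['old_host'], flow)      # remove stale flows on the old compute node
        fc.install_flow(event['new_host'],
                        fc.rewrite_overlay(flow, vm)) # re-create them on the new compute node
```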