Sunday, December 4, 2011

Openflow 1.1 protocol tutorial

Openflow is a protocol between controllers and switches that runs over TCP, optionally secured with SSL. The switch is expected to initiate the connection to the controller, and it makes one connection per datapath instance: if the switch supports X datapaths (instances), it establishes X connections.
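Every Openflow message on such a connection begins with the same 8-byte header. The header holds the version, the message type, the total length, and a transaction ID (xid) that pairs replies with requests. TCP delivers a byte stream, so the length field is the only thing that separates one message from the next. The Python sketch below assumes the Openflow 1.1 wire version 0x02 and shows only a handful of the 1.1 message type codes. The helper names `encode_message` and `split_messages` are hypothetical and do not belong to any library.

```python
import struct

OFP_VERSION_1_1 = 0x02
# Common header: version, type, length, xid in network byte order (8 bytes).
OFP_HEADER = struct.Struct("!BBHI")

# A few Openflow 1.1 message type codes.
OFPT_HELLO = 0
OFPT_ECHO_REQUEST = 2
OFPT_ECHO_REPLY = 3
OFPT_FEATURES_REQUEST = 5


def encode_message(msg_type: int, xid: int, body: bytes = b"") -> bytes:
    """Prefix a message body with the common header."""
    length = OFP_HEADER.size + len(body)
    return OFP_HEADER.pack(OFP_VERSION_1_1, msg_type, length, xid) + body


def split_messages(stream: bytes):
    """Split bytes read from the connection into (type, xid, body) tuples.

    Returns the complete messages plus any incomplete tail, which the caller
    should prepend to the next chunk read from the socket.
    """
    messages = []
    offset = 0
    while len(stream) - offset >= OFP_HEADER.size:
        _version, msg_type, length, xid = OFP_HEADER.unpack_from(stream, offset)
        if length < OFP_HEADER.size:
            raise ValueError("corrupt length field")
        if len(stream) - offset < length:
            break  # the rest of this message has not arrived yet
        body = stream[offset + OFP_HEADER.size:offset + length]
        messages.append((msg_type, xid, body))
        offset += length
    return messages, stream[offset:]


# Usage: a HELLO, an ECHO_REQUEST and the first 3 bytes of another message.
chunk = (encode_message(OFPT_HELLO, xid=1)
         + encode_message(OFPT_ECHO_REQUEST, xid=2, body=b"ping")
         + b"\x02\x00\x00")
msgs, leftover = split_messages(chunk)
print(msgs)      # [(0, 1, b''), (2, 2, b'ping')]
print(leftover)  # b'\x02\x00\x00'
```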

As you might have guessed by now, Openflow is a protocol that separates the control plane from the data plane. The data plane is also called the "datapath".

The controller typically asks the datapath about its hardware or virtual switch configuration, such as the number of tables supported, the number of ports, port information and the QoS queues supported. The controller gathers this information every time it accepts a successful Openflow connection from a switch.

Based on the administrators' configuration and on the output of protocols, the controller knows what to program into the datapath. The datapath maintains the tables that store the flows. The controller has the flexibility to arrange the tables and to use different tables for different purposes, and it controls how a packet traverses the tables. Each flow the controller establishes in a table can have various actions, such as packet header modifications or the next table to jump to. I will talk about how controllers arrange the tables and manage flows for applications such as L2 switching, L3 switching and L4 switching in later posts. Essentially, tables contain flows, and each flow has match fields and action fields. The match fields are used to find the flow for a packet in the datapath. Based on the action fields of the matching flow, packet processing either continues to the next table or the packet is sent out. Some important items to note are:
  • Tables are ordered lists, that is, each flow has a priority.
  • Metadata can be collected across the matching flows in different tables.
  • A flow in the next table can use this metadata as one of its match fields.
  • Flows can point to a group of action buckets.
  • Groups can be set up to:
    • Duplicate the packets.
    • Load share the traffic across multiple next tables.
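Flows are prioritized and may use masks, so a table lookup is a priority-ordered ternary match rather than a hash lookup. The sketch below makes some assumptions. Each flow's match is a Python dictionary of `(value, mask)` pairs, and field names such as `ipv4_dst` are hypothetical. A field that the flow omits, or gives a zero mask, acts as a wildcard. A real datapath would use a much faster classifier than this linear scan.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# match field name -> (value, mask); only bits set in the mask are compared.
Match = Dict[str, Tuple[int, int]]


@dataclass
class FlowEntry:
    priority: int
    match: Match
    label: str  # hypothetical tag, used only to identify the entry

    def matches(self, pkt: Dict[str, int]) -> bool:
        for name, (value, mask) in self.match.items():
            if mask == 0:
                continue  # fully wildcarded field
            if name not in pkt:
                return False  # packet lacks a field this flow requires
            if (pkt[name] & mask) != (value & mask):
                return False
        return True


@dataclass
class FlowTable:
    entries: List[FlowEntry] = field(default_factory=list)

    def add(self, entry: FlowEntry) -> None:
        self.entries.append(entry)
        # Keep the list ordered so the highest-priority entry is tried first.
        self.entries.sort(key=lambda e: e.priority, reverse=True)

    def lookup(self, pkt: Dict[str, int]) -> Optional[FlowEntry]:
        # Masked entries cannot be found by hashing the packet's fields,
        # so this sketch simply scans in priority order.
        for entry in self.entries:
            if entry.matches(pkt):
                return entry
        return None  # table miss


table = FlowTable()
table.add(FlowEntry(0, {}, "default"))
table.add(FlowEntry(5, {"ipv4_dst": (0x0A010000, 0xFFFF0000)}, "10.1.0.0/16"))
table.add(FlowEntry(10, {"ipv4_dst": (0x0A010200, 0xFFFFFF00)}, "10.1.2.0/24"))
print(table.lookup({"ipv4_dst": 0x0A010203}).label)  # 10.1.2.0/24
print(table.lookup({"ipv4_dst": 0x0A01FF01}).label)  # 10.1.0.0/16
print(table.lookup({"eth_type": 0x0806}).label)      # default
```

Python's sort is stable, so two entries with equal priority keep their insertion order here. Treat that as a property of this sketch, not something to rely on in a real switch.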
In essence, the controller has complete control over the packet's path across multiple tables. The datapath need not have any intelligence about L2 switching or IP routing. As long as it blindly performs the operations specified by the flows in its tables, things work, because the controller takes responsibility for L2/L3 protocol-level processing.
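The multi-table behavior described above can be modelled as a loop. At each table, the loop does the following:

- Find the highest-priority matching flow.
- Apply its immediate actions and collect its written actions.
- Update the metadata under the flow's mask.
- Jump to the next table, stopping when a flow has no goto-table instruction.

This simplified sketch rests on several assumptions. Match fields are exact-match, and action names such as `"output:2"` and `"push_vlan"` are hypothetical strings. The collected action set is a plain list, whereas the specification keeps at most one action of each type and executes them in a defined order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Flow:
    priority: int
    match: Dict[str, int]  # exact-match fields; "metadata" is matchable too
    apply_actions: List[str] = field(default_factory=list)  # applied immediately
    write_actions: List[str] = field(default_factory=list)  # collected for the end
    write_metadata: Optional[Tuple[int, int]] = None        # (value, mask)
    goto_table: Optional[int] = None


def run_pipeline(tables: Dict[int, List[Flow]], pkt: Dict[str, int]
                 ) -> Tuple[str, int, List[str], List[str]]:
    """Walk the tables from table 0; return (outcome, table_id, applied, action_set)."""
    table_id, metadata = 0, 0
    applied: List[str] = []
    action_set: List[str] = []
    while True:
        fields = dict(pkt, metadata=metadata)
        hits = [f for f in tables.get(table_id, [])
                if all(fields.get(k) == v for k, v in f.match.items())]
        if not hits:
            # Table miss: what happens next depends on the table's miss setting.
            return "miss", table_id, applied, action_set
        flow = max(hits, key=lambda f: f.priority)
        applied.extend(flow.apply_actions)
        action_set.extend(flow.write_actions)
        if flow.write_metadata is not None:
            value, mask = flow.write_metadata
            metadata = (metadata & ~mask) | (value & mask)
        if flow.goto_table is None:
            # End of the pipeline: the collected action set would run now.
            return "done", table_id, applied, action_set
        if flow.goto_table <= table_id:
            raise ValueError("Openflow 1.1 only allows jumping to a later table")
        table_id = flow.goto_table


tables = {
    0: [Flow(10, {"in_port": 1}, apply_actions=["push_vlan"],
             write_metadata=(0x10, 0xFF), goto_table=1)],
    1: [Flow(10, {"metadata": 0x10}, write_actions=["output:2"])],
}
print(run_pipeline(tables, {"in_port": 1}))  # ('done', 1, ['push_vlan'], ['output:2'])
print(run_pipeline(tables, {"in_port": 3}))  # ('miss', 0, [], [])
```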

The control plane can program flows into the tables in two ways: proactively and reactively. Some flows may be set up proactively and others reactively. Proactive flows are set up before any packet arrives; reactive flows are set up only in response to a packet. Initially, there may be no flows in the tables. When the datapath finds no matching flow in a table and the table's miss setting says to send the packet to the controller, the datapath sends the packet to the controller. The controller is expected to push a flow and send the packet back to the datapath so that it is processed appropriately.
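The difference between reactive and proactive setup can be shown with a toy datapath whose table misses always go to the controller. Everything in this sketch is hypothetical:

- The `Datapath` class.
- The exact-match key `(in_port, eth_dst)`, chosen for brevity even though real Openflow flows can be masked.
- A plain callback that stands in for the PACKET_IN, FLOW_MOD and PACKET_OUT exchange over the connection.

```python
from typing import Callable, Dict, Tuple

# Hypothetical exact-match key (in_port, eth_dst), kept simple for the sketch.
FlowKey = Tuple[int, int]


class Datapath:
    def __init__(self, on_packet_in: Callable[["Datapath", FlowKey], int]):
        self.flows: Dict[FlowKey, int] = {}  # flow key -> output port
        self.on_packet_in = on_packet_in      # stands in for the controller
        self.packet_in_count = 0

    def install_flow(self, key: FlowKey, out_port: int) -> None:
        """Roughly what a FLOW_MOD add does in this toy model."""
        self.flows[key] = out_port

    def receive(self, in_port: int, eth_dst: int) -> int:
        """Return the port the packet leaves on."""
        key = (in_port, eth_dst)
        if key in self.flows:
            return self.flows[key]          # flow present: no controller involved
        self.packet_in_count += 1           # table miss -> PACKET_IN
        return self.on_packet_in(self, key)


def reactive_controller(dp: Datapath, key: FlowKey) -> int:
    in_port, _eth_dst = key
    out_port = 2 if in_port == 1 else 1     # hypothetical forwarding decision
    dp.install_flow(key, out_port)          # push the flow first...
    return out_port                         # ...then send the packet out (PACKET_OUT)


dp = Datapath(reactive_controller)
dp.install_flow((3, 0xBB), 1)                    # proactive: before any packet
print(dp.receive(3, 0xBB), dp.packet_in_count)  # 1 0
print(dp.receive(1, 0xAA), dp.packet_in_count)  # 2 1  (miss, flow installed)
print(dp.receive(1, 0xAA), dp.packet_in_count)  # 2 1  (hits the new flow)
```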

With that background, let us look at the various protocol messages between the controller and the datapath:
  • Symmetric messages: Messages that either party (switch or controller) can initiate.
    • Hello message: Exchanged right after the connection is set up, so that both sides can agree on the protocol version.
    • Echo message: A request/reply pair, mainly used to check the liveness of the connection and to measure its latency and bandwidth. Every echo request is answered with an echo reply.
    • Experimenter messages: For future extensibility.
  • Controller-to-switch messages: These messages are initiated by the controller. For requests, such as feature and statistics requests, the switch is expected to reply.
    • Feature request: The controller sends this message to ask the switch about its capabilities. The switch's reply carries the following information:
      • Datapath ID: A switch device may support multiple datapaths, that is, multiple logical switches. It is expected to make as many connections as the number of datapaths (switch instances) it supports. It replies with the datapath ID that corresponds to the connection on which the feature request arrived.
      • Maximum number of buffers the switch can store: Normally, the switch sends a packet that has no matching table entry to the controller in a PACKET_IN message. To save bandwidth between the switch and the controller, the switch can optionally keep the packet in its memory. It then sends the controller a reference to this buffer along with only part of the packet. As the specification describes, the controller returns this buffer reference in a PACKET_OUT message so that the switch can act on the stored packet. This parameter indicates how many packets the switch can buffer while it communicates with the controller.
      • Number of tables the switch supports for this instance: As the Openflow specification describes, the controller creates flows in various tables. This parameter tells the controller how many tables the switch supports for this datapath (instance).
      • Capabilities supported by the switch instance, such as:
        • Statistics collection on a per-flow, per-table, per-port, per-group and per-queue basis.
        • Whether the switch can reassemble IP fragments before it extracts the source and destination ports of UDP, TCP and SCTP packets.
        • Whether the switch can extract the source and target IP addresses from ARP payloads.
      • Information about the ports attached to this datapath (switch instance): For each port (physical or VLAN), the switch sends the following:
        • Port name (string) and port ID (integer).
        • Hardware address (in case the port is used at Layer 3).
        • Port configuration bitmap: administratively down, drop all packets received on the port, drop packets forwarded to the port, do not send PACKET-IN messages for packets received on the port.
        • Port state bitmap: link down, blocked (by STP, RSTP, etc.), live.
        • Port feature bitmaps covering three kinds of feature:
          • Speeds: 10Mb half/full duplex, 100Mb half/full duplex, 1Gb half/full duplex, 10Gb full duplex, 40Gb full duplex, 100Gb full duplex, 1Tb full duplex, and other speeds.
          • Link type: copper or fiber.
          • Link features: auto-negotiation, pause and asymmetric pause.
        • These features are reported in four bitmaps: current (the port's current features), advertised (features advertised by the port), supported (features supported by the port) and peer (features advertised by the other side of the link).
        • Current bit rate and maximum bit rate of the port.
    •  Set Config and Get Config messages: The SET_CONFIG message sets the switch configuration, and the GET_CONFIG_REQUEST message retrieves it. The controller can configure the following:
      • IP fragment handling (no special handling, drop fragments, or reassemble): The switch treats incoming IP fragments as set by the controller.
      • Action on invalid TTL: The controller can ask the switch to send packets with an invalid TTL to the controller. A TTL of 0 or 1 is treated as invalid.
      • Length of the packet data to send to the controller in a PACKET-IN message (miss_send_len).
    •  Table modification and Get Table Status messages: The modification message changes the properties of a specific table. The main table property is the action taken when an incoming packet has a "MISS" in the table. The miss action can be one of the following:
      • Send the packet to controller.
      • Continue with next table processing.
      • Drop the packet.
    • Flow entry management commands: Add, modify and delete flow entries in a given table. An Openflow flow is not the same as a traditional flow. In the Openflow context, a flow can use ternary comparisons (using masks), and each table is an ordered list in which every flow has a priority. Traditional flows are exact-match entries, which is why one tends to assume they can be arranged in a hash table. Since Openflow flows are not exact-match, a plain hash table cannot be used. Every entry that is added carries the following information from the controller:
      • Table ID: The table to which the entry is added.
      • Command: Add, Modify or Delete.
      • Cookie and cookie mask: The cookie is an opaque value chosen by the controller. For Modify and Delete commands, the cookie and its mask select the affected entries, so multiple entries can be updated or deleted at once.
      • Idle timeout: Inactivity timeout. If no packet matches the flow for this period, the flow is deleted.
      • Hard timeout: The flow is deleted after this period even if packets have been matching it.
      • Buffer ID: Typically sent by the controller with the Add command. As discussed above, the whole packet is not sent in the PACKET-IN message, to save bandwidth between the switch and the controller. Instead, a buffer reference is sent along with truncated packet content. While adding the flow, the controller can tell the switch to process the packet referenced by the buffer_id.
      • Out port and out group: These are meant for the Delete command, which then deletes only the flows that output to the given port or group.
      • Send flow removed: When the controller sets this flag on a flow, the switch informs the controller whenever the flow is removed, for example when it expires.
      • Check overlap: When this flag is set, the switch checks whether the new flow overlaps an existing entry with the same priority. If it does, the flow is not added and an error is returned.
      • Match fields: A flow is identified by a set of match fields:
        • Input port.
        • Ethernet source MAC address and mask; Ethernet destination MAC address and mask.
        • VLAN ID and VLAN priority (PCP).
        • Ether type.
        • IP ToS (DSCP field) and IP protocol.
        • Source IP address and mask; destination IP address and mask.
        • Transport source port and transport destination port.
        • MPLS label and MPLS TC.
        • Metadata and metadata mask.
        • That is 15 fields in total. Except for the metadata, all of them come from the packet. Of course, some packets lack some of these fields, and such fields are normally ignored during the flow match. The following fields can be wildcarded: input port, VLAN ID, VLAN priority, Ether type, IP ToS, IP protocol, source port, destination port, MPLS label and MPLS TC. The address fields and the metadata are wildcarded bit by bit through their masks.
      • Set of instructions to be applied. Some instructions carry a list of actions. The following instructions can be associated with a flow:
        • Go to next table: Tells the datapath to continue matching in the table whose ID is given with the instruction. The next table's ID must be greater than the current table's ID.
        • Write metadata: Sets the metadata (under a mask) for the packet in the datapath. A flow in the next table can match on this metadata.
        • Actions on the packet: Three instructions deal with actions:
          • "Apply Actions" applies the actions to the packet immediately.
          • "Write Actions" adds the actions to a collected set that is applied at the end, before the packet is sent out.
          • "Clear Actions" clears the actions collected so far.
        • Note that a flow can have both "Write Actions" and "Apply Actions". Many actions can be collected or applied; the actions defined by the specification are listed below.
          • Output to switch port: The port on which the packet is sent out. This can be a logical port. If the logical port is "CONTROLLER", a max_len field can be specified. The packet is truncated to this size when it is sent to the controller in a PACKET-IN message.
          • Set VLAN ID: Replace the existing VLAN ID. Applies only to packets that already have a VLAN tag. If there are multiple VLAN tags, the outermost VLAN header is modified.
          • Set VLAN priority: Replace the existing VLAN priority. If the packet has no VLAN tag, the datapath ignores this action. If there are multiple VLAN tags, the outermost tag's priority is replaced.
          • Set Ethernet source MAC address: Replace the existing Ethernet source MAC address. If there are multiple Ethernet headers, the outermost one is modified.
          • Set Ethernet Destination MAC Address:  Replace the existing outermost Ethernet Destination MAC address.
          • Set IPv4 source address, Set IPv4 destination address, Set IPv4 ToS bits, Set IPv4 ECN bits: Replace the corresponding field in the outermost IP header and update the IP checksum. For UDP, TCP and SCTP, the transport checksum is also updated.
          • Set transport source port, Set transport destination port: Replace the existing transport source or destination port with the value given in the action, and update the checksums.
          • Copy TTL outwards: Copy the TTL from the next-to-outermost header to the outermost header. The copy can be IP to IP, MPLS to MPLS, or IP to MPLS.
          • Copy TTL inwards: Copy the TTL from the outermost header to the next-to-outermost header. The copy can be IP to IP, MPLS to MPLS, or MPLS to IP.
          • Decrement IPv4 TTL :  Decrement the TTL of outermost IP header.
          • Set MPLS label: Replace the existing outermost MPLS label.
          • Set MPLS Traffic Class:  Replace the existing outermost MPLS TC.
          • Set MPLS TTL:  Set the TTL value of outermost MPLS header.
          • Decrement MPLS TTL:  Decrement the MPLS TTL.
          • Push and pop VLAN tag: Push a new VLAN tag or pop the outermost VLAN tag.
          • Push and pop MPLS header: Push a new MPLS header or pop the outermost MPLS header.
          • Apply Group Actions:  Group ID of the group is mentioned along with the action.  Group actions are also applied along with the explicit actions specified in the flow.
          • Set Queue: Used to apply QoS to the packet. The controller passes a queue_id reference along with the action, and the datapath is expected to enqueue the packet on this queue. Note that this action can be set not only in the last table but also in the first or an intermediate table. If this action is part of "Apply Actions", it is very important that packet processing resumes from where it left off once QoS has been applied.
    • Group entry management commands: There is one group table per datapath (switch instance). The group table contains multiple groups, each identified by a group ID that the controller sets when it creates the group. Each group is a collection of buckets, and each bucket has a set of actions to apply; these are the same kinds of actions that can be set on a flow. Since the group ID is referenced from a flow's instructions, the instruction that references it decides whether the group's actions are applied immediately or collected. Each group also has a type that determines which buckets are used. There are four group types: All, Select, Indirect and Fast Failover. At times, packets need to be both duplicated and load balanced. In that case two levels of groups are required. The first group has type "All", and the action of each of its buckets points to a separate group of type "Select".
      •  All: The packet is duplicated once per bucket, and each bucket's actions are applied to its copy. Each duplicated copy is processed like the original packet; that is, it jumps to the next table or is sent out just as the original would be.
      • Select: One bucket is selected for each packet. This is mainly used for load balancing, and different flows may use different buckets. The selection algorithm is left to the switch.
      • Indirect: The group's single bucket is used for all flows that refer to the group. This is equivalent to a group with exactly one bucket.
      • Fast Failover: Executes the first live bucket. Buckets are checked in order, and each bucket is associated with a watch port and/or watch group that tells whether the bucket is live.
    •  Port modification message: Used by the controller to modify the behavior of a port. The controller sends the port ID along with the modification information. Since this is a modification, the fields to be changed are indicated by a corresponding mask.
      • Port configuration bits and mask bits: administratively down, drop all packets received on the port, drop packets forwarded to the port, do not send PACKET-IN messages for the port.
      • Port features that the controller asks the port to advertise.
    • Queue configuration message: This is meant for QoS in the datapath, but the capabilities the controller gets from the switch are minimal. The queues themselves are configured outside Openflow. The controller can only query the queues configured on a port and their properties, which amount to a minimum-rate guarantee per queue. Classification is already covered, since flows can pick a queue with the Set Queue action. However, there is no flexibility to create groups of queues, set up scheduling algorithms, or set up queue-management algorithms.
    • Read state messages: Used by the controller to get the current state of the datapath, mainly statistics. The "type" field in the request indicates the type of information requested.
      • Description statistics: The datapath replies with the following information:
        • Manufacturer description, hardware description, software description, serial number and a human-readable description of the datapath.
      • Flow statistics: The controller requests statistics by giving a table ID, an out port, a cookie and its mask, and flow match fields. These act as a filter: if multiple flows match, the reply contains an entry for each of them. For each flow, the datapath returns statistics such as:
        • How long the flow has been alive, in seconds and nanoseconds.
        • Priority of the flow entry.
        • Idle and hard timeouts of the flow.
        • Packet count, Byte count
        • Match fields and instructions that are part of the flow.
      • Aggregate flow statistics: Similar to the above, but the reply carries statistics aggregated over all matching flows:
        • Packet Count, Byte Count.
        • Flow Count - Number of flows.
      • Table statistics: The controller requests the statistics of a table. The reply contains the following information:
        • Match fields supported by this table.
        • Wildcards supported by this table.
        • Instructions supported by this table.
        • Actions supported in Write Actions.
        • Actions supported in Apply Actions.
        • Miss action configuration.
        • Maximum number of entries supported by the table.
        • Number of active entries.
        • Number of packets looked up in the table.
        • Number of packets that hit an entry in the table.
      • Port statistics: The controller requests port statistics by giving a port number. The reply contains the following:
        • Number of Received packets,  Number of transmitted packets.
        • Number of received bytes, Number of transmitted bytes.
        • Number of packets dropped on receive, number of packets dropped on transmit.
        • Number of receive errors, Number of tx errors
        • Number of rx frame errors,  Number of overrun errors
        • Number of rx CRC errors,  Number of collisions.
      • Queue statistics: The controller requests statistics by giving a port number and a queue ID. The reply contains:
        • Transmit bytes,   Transmit packets and transmit errors.
      • Group statistics: The controller requests statistics by giving a group ID. The reply contains:
        • Reference count: The number of flow entries or other group entries that refer to this group ID.
        • Number of packets and bytes processed by this group.
        • Bucket statistics: For each bucket in the group, the following is returned:
          • Packet count and byte count of the packets processed by this bucket.
      • Group description: The controller can request the buckets of a group and their actions by giving the group ID. The reply contains:
        • Number of buckets and information on each bucket including actions.
    • PACKET-OUT message: Sent by the controller, typically after creating the flow in the datapath. I am puzzled by the description of the PACKET-OUT message, and I am not sure whether the problem lies in the specification or in my understanding. The PACKET-OUT message carries action headers, but I would expect processing of the packet to resume at the table where the miss occurred. Restarting from the first table is not an option in many cases, because "Apply Actions" in the matched flows of earlier tables may already have modified the packet. I would expect the following information to be sent as part of PACKET-OUT:
      • Table ID: Where to resume the search.
      • Buffer ID: In case the entire packet was not sent to the controller in the PACKET-IN message on the table miss.
      • Metadata: Note that the table miss would have occurred after some tables had already been processed, and the actions of the flows in those tables would have collected certain metadata. This metadata should be sent back so that processing stays consistent. I also think that a single 64-bit metadata value is not large enough; it should be significantly larger (up to 128 bytes). Again, to save bandwidth, the metadata need not travel to the controller in PACKET-IN and back in PACKET-OUT. It can be stored in the datapath along with the buffered packet, with the buffer ID as the reference. If the datapath does not buffer packets, the controller should expect the metadata in the PACKET-IN and send it back in the PACKET-OUT.
  • Asynchronous messages: Sent from the datapath to the controller without any request from the controller.
    • PACKET-IN message: Sent in two cases. The first is when a packet finds no matching flow in a table whose miss action is "send to controller". The second is when a flow's action explicitly outputs the packet to the controller. My comments in the PACKET-OUT section above apply here as well. I would expect the following information to go to the controller:
      • Table ID: The ID of the table where the miss occurred.
      • Reason: Whether the packet was sent because of a table miss or because of an explicit action to send it to the controller.
      • Buffer ID: If the datapath can store the packet, it sends a reference to this buffer, and the same buffer ID is expected in the PACKET-OUT. What happens to the stored buffer if no PACKET-OUT arrives? If there is no PACKET-OUT for a certain amount of time, the packet is dropped. I guess that a PACKET-OUT arriving after the packet has been dropped is ignored by the datapath.
      • Packet data: If a buffer ID is sent, the entire packet is not sent. Only enough bytes are sent for the controller to understand what kind of packet it is (typically up to the TCP/UDP header). The controller can configure the amount of data sent along with a buffer ID; by default, miss_send_len is 128 bytes.
      • Metadata: The specification does not talk about this, but I believe it should be sent.
    • Flow removed message: The datapath sends this message when a flow that has the flow-removal notification flag set is removed, whether by a timeout or by a delete. The message contains the following:
      • Flow-specific information: priority, match fields, etc.
      • Statistics: byte count and packet count.
      • Duration of the flow:  How much time the flow was alive.
      • Reason for flow removal :  Hard timeout, Idle timeout,  DELETE command,  Group Delete command.
      • Table ID: The table the flow was in.
    • Port status message: Sent to the controller whenever ports are added, deleted or modified. It typically contains:
      • Reason for this message:  ADD, DELETE, MODIFY.
      • Port specific information.
    • ERROR message: The datapath informs the controller whenever it observes an error.
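To make the feature-reply fields listed above concrete, here is a sketch that decodes the fixed part of an Openflow 1.1 FEATURES_REPLY with Python's `struct` module. It assumes the 1.1 layout: the header, a 64-bit datapath ID, n_buffers, n_tables, three pad bytes, capabilities and a reserved word, followed by 64-byte port descriptions. It also assumes the 1.1 capability bit values. `parse_features_reply` is a hypothetical helper, and the constants are worth checking against the specification before relying on them.

```python
import struct

# Fixed part of an Openflow 1.1 FEATURES_REPLY (32 bytes): header, datapath_id,
# n_buffers, n_tables, 3 pad bytes, capabilities, reserved.
FEATURES_REPLY = struct.Struct("!BBHIQIB3xII")
OFPT_FEATURES_REPLY = 6
OFP_PORT_SIZE = 64  # assumed size of each port description that follows

CAPABILITY_BITS = {
    "FLOW_STATS": 1 << 0,
    "TABLE_STATS": 1 << 1,
    "PORT_STATS": 1 << 2,
    "GROUP_STATS": 1 << 3,
    "IP_REASM": 1 << 5,
    "QUEUE_STATS": 1 << 6,
    "ARP_MATCH_IP": 1 << 7,
}


def parse_features_reply(data: bytes) -> dict:
    (_version, msg_type, length, xid, datapath_id, n_buffers,
     n_tables, capabilities, _reserved) = FEATURES_REPLY.unpack_from(data)
    if msg_type != OFPT_FEATURES_REPLY:
        raise ValueError(f"expected FEATURES_REPLY, got type {msg_type}")
    return {
        "xid": xid,
        "datapath_id": datapath_id,
        "n_buffers": n_buffers,
        "n_tables": n_tables,
        "capabilities": sorted(name for name, bit in CAPABILITY_BITS.items()
                               if capabilities & bit),
        "n_ports": (length - FEATURES_REPLY.size) // OFP_PORT_SIZE,
    }


# Usage with a hand-built reply that carries no port descriptions.
sample = FEATURES_REPLY.pack(0x02, OFPT_FEATURES_REPLY, FEATURES_REPLY.size, 7,
                             0x0000000000000001, 256, 4, (1 << 0) | (1 << 5), 0)
print(parse_features_reply(sample))
```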
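The SET_CONFIG settings described above map onto a small message. It consists of a 16-bit flags word, which holds the fragment-handling mode and the invalid-TTL bit, followed by miss_send_len. The sketch below assumes these Openflow 1.1 constant values:

- OFPT_SET_CONFIG = 9.
- Fragment modes 0 to 2 in the low two bits of the flags word.
- The invalid-TTL flag at bit 2.
- A default miss_send_len of 128.

`encode_set_config` is a hypothetical helper.

```python
import struct

OFPT_SET_CONFIG = 9
# Header + flags (16 bits) + miss_send_len (16 bits) = 12 bytes.
SWITCH_CONFIG = struct.Struct("!BBHIHH")

# Fragment handling occupies the two low bits of 'flags'.
OFPC_FRAG_NORMAL = 0   # no special handling
OFPC_FRAG_DROP = 1     # drop fragments
OFPC_FRAG_REASM = 2    # reassemble
OFPC_INVALID_TTL_TO_CONTROLLER = 1 << 2
OFP_DEFAULT_MISS_SEND_LEN = 128


def encode_set_config(xid: int, frag_mode: int, ttl_to_controller: bool,
                      miss_send_len: int = OFP_DEFAULT_MISS_SEND_LEN) -> bytes:
    if frag_mode not in (OFPC_FRAG_NORMAL, OFPC_FRAG_DROP, OFPC_FRAG_REASM):
        raise ValueError("unknown fragment mode")
    flags = frag_mode
    if ttl_to_controller:
        flags |= OFPC_INVALID_TTL_TO_CONTROLLER
    return SWITCH_CONFIG.pack(0x02, OFPT_SET_CONFIG, SWITCH_CONFIG.size,
                              xid, flags, miss_send_len)


# Drop IP fragments, punt invalid-TTL packets, keep the default PACKET-IN length.
msg = encode_set_config(xid=5, frag_mode=OFPC_FRAG_DROP, ttl_to_controller=True)
print(msg.hex())  # 0209000c0000000500050080
```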
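The idle and hard timeouts of a flow entry reduce to two comparisons against a clock. The sketch below makes the following assumptions:

- Times are in seconds.
- A timeout of 0 means the corresponding timer is disabled.
- The returned codes follow the Openflow 1.1 flow-removed reasons: idle timeout = 0, hard timeout = 1.
- Checking the hard timeout before the idle timeout is an arbitrary choice.

`FlowTimers` and `removal_reason` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

OFPRR_IDLE_TIMEOUT = 0
OFPRR_HARD_TIMEOUT = 1


@dataclass
class FlowTimers:
    installed_at: float     # when the flow was added (seconds)
    last_hit: float         # when a packet last matched it (seconds)
    idle_timeout: int = 0   # 0 disables the idle timeout
    hard_timeout: int = 0   # 0 disables the hard timeout


def removal_reason(flow: FlowTimers, now: float) -> Optional[int]:
    """Return the flow-removed reason if the flow has expired, else None."""
    if flow.hard_timeout and now - flow.installed_at >= flow.hard_timeout:
        return OFPRR_HARD_TIMEOUT   # expires regardless of traffic
    if flow.idle_timeout and now - flow.last_hit >= flow.idle_timeout:
        return OFPRR_IDLE_TIMEOUT   # no matching packet for idle_timeout seconds
    return None


flow = FlowTimers(installed_at=0, last_hit=0, idle_timeout=10, hard_timeout=60)
flow.last_hit = 8                  # a packet matched at t=8
print(removal_reason(flow, 15))    # None (only 7 s idle)
print(removal_reason(flow, 18))    # 0 (idle timeout)
```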
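The four group types described above differ only in which buckets a packet goes through. The sketch of that selection logic uses a hypothetical `Bucket` object with a weight, used by Select, and a watch port, used by Fast Failover. Treating a bucket with no watch port as always live is an assumption. The specification leaves the Select algorithm to the switch. Hashing a flow key with CRC-32 here is only one possible choice, made so that all packets of a flow use the same bucket.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional, Set

ALL, SELECT, INDIRECT, FAST_FAILOVER = "all", "select", "indirect", "fast_failover"


@dataclass
class Bucket:
    actions: List[str]
    weight: int = 1                   # only meaningful for SELECT
    watch_port: Optional[int] = None  # only meaningful for FAST_FAILOVER


def pick_buckets(group_type: str, buckets: List[Bucket], flow_key: bytes,
                 live_ports: Set[int]) -> List[Bucket]:
    """Return the buckets whose actions a packet of this flow goes through."""
    if group_type == ALL:
        return list(buckets)          # one copy of the packet per bucket
    if group_type == INDIRECT:
        return buckets[:1]            # the single bucket, shared by all flows
    if group_type == SELECT:
        total = sum(b.weight for b in buckets)
        if total <= 0:
            return []
        # Hash the flow so all of its packets land in the same bucket;
        # flows spread across buckets in proportion to their weights.
        point = zlib.crc32(flow_key) % total
        for b in buckets:
            if point < b.weight:
                return [b]
            point -= b.weight
        return []
    if group_type == FAST_FAILOVER:
        for b in buckets:             # the first live bucket in order wins
            if b.watch_port is None or b.watch_port in live_ports:
                return [b]
        return []                     # nothing live: the packet is dropped
    raise ValueError(f"unknown group type {group_type!r}")


primary = Bucket(["output:1"], watch_port=1)
backup = Bucket(["output:2"], watch_port=2)
print(pick_buckets(FAST_FAILOVER, [primary, backup], b"", {1, 2})[0].actions)  # ['output:1']
print(pick_buckets(FAST_FAILOVER, [primary, backup], b"", {2})[0].actions)     # ['output:2']
```

The duplicate-and-load-balance case from the text would be an "All" group whose buckets each reference a "Select" group. In this model, that means calling `pick_buckets` once per level.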
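The buffer handling described for PACKET-IN and PACKET-OUT can be sketched as a small switch-side store. The sketch includes my guess that a PACKET-OUT for an expired buffer is ignored. Everything in it is an assumption for illustration:

- The `PacketBuffer` class.
- The expiry timeout.
- The policy of sending the whole packet unbuffered when the store is full.

The all-ones buffer ID (0xFFFFFFFF) stands for "not buffered".

```python
from typing import Dict, Optional, Tuple

NO_BUFFER = 0xFFFFFFFF  # buffer_id value meaning "packet not buffered"


class PacketBuffer:
    """Hypothetical switch-side store for packets sent up in PACKET-IN."""

    def __init__(self, capacity: int, timeout: float, miss_send_len: int = 128):
        self.capacity = capacity
        self.timeout = timeout            # assumed expiry period, in seconds
        self.miss_send_len = miss_send_len
        self.slots: Dict[int, Tuple[bytes, float]] = {}
        self.next_id = 0

    def expire(self, now: float) -> None:
        stale = [bid for bid, (_, t) in self.slots.items() if now - t >= self.timeout]
        for bid in stale:
            del self.slots[bid]

    def packet_in(self, packet: bytes, now: float) -> Tuple[int, bytes]:
        """Return the (buffer_id, data) pair to place in a PACKET-IN."""
        self.expire(now)
        if len(self.slots) >= self.capacity:
            return NO_BUFFER, packet      # no room: send the whole packet up
        buffer_id = self.next_id
        self.next_id = (self.next_id + 1) % NO_BUFFER  # never hand out NO_BUFFER
        self.slots[buffer_id] = (packet, now)
        return buffer_id, packet[:self.miss_send_len]

    def packet_out(self, buffer_id: int, now: float) -> Optional[bytes]:
        """Return the buffered packet, or None if it is unknown or has expired."""
        self.expire(now)
        entry = self.slots.pop(buffer_id, None)
        return entry[0] if entry is not None else None


buf = PacketBuffer(capacity=8, timeout=1.0, miss_send_len=4)
bid, data = buf.packet_in(b"full-packet", now=0.0)
print(bid, data)                      # 0 b'full'
print(buf.packet_out(bid, now=0.5))   # b'full-packet'
bid, _ = buf.packet_in(b"late", now=1.0)
print(buf.packet_out(bid, now=2.5))   # None (expired, so the PACKET-OUT is ignored)
```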
