Saturday, September 27, 2008

Hardware Traffic Management Functionality - What system designers need to look for

Many chip vendors are coming out with built-in traffic management solutions, mainly for traffic shaping and scheduling. I happened to review some of them as part of my job at Intoto.

Traffic management in hardware is typically the last step in egress packet processing. Packets scheduled by the traffic manager go straight onto the wire. That is, once software submits a packet to the hardware traffic manager, software does not see that packet again.

In theory, anything done in hardware is good, as it frees precious CPU cycles for other work. In practice, the feature set of hardware traffic management is limited, and it may not be useful in Enterprise markets. As I understand it, these hardware traffic management solutions are designed for particular market segments such as Metro Ethernet.

If you are designing network equipment, you may want to look for the following functionality in hardware traffic management (HTM).

Traffic Classification: Many HTMs don't support classification in hardware. They expect software running on the cores to classify each packet and enqueue it to the right hardware queue. HTMs typically do only the shaping and scheduling portion of the traffic management function on those queues. I can understand the reasoning: there are many ways packets can be classified, so leaving it to software gives system designers good flexibility. But as I understand it, the number of cycles software spends on classification is the same as or more than the cycles for scheduling and shaping put together. At the least, I would expect HTMs to do some simple classification based on L2 and L3 header fields, thereby offloading that part of the classification task from the cores.
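To make "simple classification based on L2 and L3 header fields" concrete, here is a minimal Python sketch of a first-match rule table keyed on VLAN ID and DSCP. The Rule type, the classify function and the queue numbers are hypothetical names chosen for illustration; they do not describe any particular vendor's HTM.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """One classification rule; None in a field means 'match anything'."""
    vlan_id: Optional[int]
    dscp: Optional[int]
    queue: int


def classify(rules: List[Rule], vlan_id: int, dscp: int, default_queue: int = 0) -> int:
    """Return the queue of the first rule that matches, else default_queue."""
    for rule in rules:
        if rule.vlan_id is not None and rule.vlan_id != vlan_id:
            continue
        if rule.dscp is not None and rule.dscp != dscp:
            continue
        return rule.queue
    return default_queue


# Hypothetical rule table: most specific rules first.
RULES = [
    Rule(vlan_id=100, dscp=46, queue=3),    # VLAN 100 and DSCP 46
    Rule(vlan_id=100, dscp=None, queue=2),  # anything else on VLAN 100
    Rule(vlan_id=None, dscp=46, queue=1),   # DSCP 46 on any other VLAN
]
```

A lookup this small is the kind of work that could plausibly be moved into hardware, while richer classification stays in software.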

Traffic Shaping and Scheduling: Traffic shaping is the basic functionality of HTMs, so it is supported well. The token bucket algorithm is the common algorithm traffic managers use for shaping. Some systems require dual-rate traffic shaping (Committed Information Rate and Excess Information Rate), so system designers may need to look for a 'Dual Rate' feature. In addition, you need to know how the HTM uses the EIR. At least in the systems I am familiar with, EIR should be treated like CIR, but EIR shaping is expected to happen only once the CIR requirements of all queues are met. If there is bandwidth left after meeting the CIR requirements of the queues, then the EIRs of the queues are considered. If the EIRs of all the queues are met and there is still bandwidth available to send more packets, then round-robin or some other queue selection mechanism can be adopted for scheduling the remaining traffic. One should look for these features to ensure the link is not under-utilized.

Another feature to look for in HTMs is the flexibility to enable only CIR, only EIR, or both on a queue, along with a flag indicating whether the queue should participate in scheduling beyond EIR.
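The order described above (CIR first, then EIR, then whatever bandwidth is left over) can be sketched in Python as follows. This is only an illustration under my own assumptions: the class and function names are hypothetical, time is passed in explicitly, each queue holds packet lengths in bytes, and the final pass simply takes the first flagged queue where a real scheduler would rotate among them.

```python
from collections import deque


class TokenBucket:
    """Simple token bucket; rate in bits per second, burst in bytes."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0      # bytes per second
        self.burst = burst_bytes
        self.tokens = burst_bytes       # start with a full bucket
        self.last = 0.0

    def refill(self, now):
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, nbytes):
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False


class ShapedQueue:
    """Queue with optional CIR and EIR shapers and a 'beyond EIR' flag."""

    def __init__(self, name, cir_bps=None, eir_bps=None, burst=3000, beyond_eir=False):
        self.name = name
        self.cir = TokenBucket(cir_bps, burst) if cir_bps else None
        self.eir = TokenBucket(eir_bps, burst) if eir_bps else None
        self.beyond_eir = beyond_eir
        self.packets = deque()          # packet lengths in bytes


def select(queues, now):
    """Pick the next packet; return (queue name, band) or None."""
    backlog = [q for q in queues if q.packets]
    for q in backlog:
        for bucket in (q.cir, q.eir):
            if bucket:
                bucket.refill(now)
    # Pass 1: committed rate of every queue. Pass 2: excess rate.
    for band in ("cir", "eir"):
        for q in backlog:
            bucket = getattr(q, band)
            if bucket and bucket.try_consume(q.packets[0]):
                q.packets.popleft()
                return q.name, band
    # Pass 3: leftover bandwidth, only for queues that opted in.
    for q in backlog:
        if q.beyond_eir:
            q.packets.popleft()
            return q.name, "beyond"
    return None
```

The band returned by select is also what a marking stage would need in order to mark the packet according to the rate band used.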

From a scheduling perspective, different systems require different scheduling algorithms. Systems need to schedule from both strict priority queues and non-strict priority queues. Strict priority queues are served first, in priority order; for the non-strict priority queues, a scheduling algorithm applies. Common scheduling algorithms expected are: DRR, CRR, WFQ, RR, WRR.
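As an illustration of how strict priority queues and one of the listed algorithms can combine, here is a small Python sketch that serves strict priority queues first and runs one round of DRR (deficit round robin) over the remaining queues. The names DrrQueue, drr_round and next_batch are hypothetical, and packets are represented only by their length in bytes.

```python
from collections import deque


class DrrQueue:
    """Non-strict queue served by deficit round robin."""

    def __init__(self, quantum):
        self.quantum = quantum      # bytes credited per round
        self.deficit = 0
        self.packets = deque()      # packet lengths in bytes


def drr_round(queues):
    """Run one DRR round; return the (queue index, size) pairs sent."""
    sent = []
    for idx, q in enumerate(queues):
        if not q.packets:
            continue
        q.deficit += q.quantum
        while q.packets and q.packets[0] <= q.deficit:
            size = q.packets.popleft()
            q.deficit -= size
            sent.append((idx, size))
        if not q.packets:
            q.deficit = 0           # an empty queue keeps no credit
    return sent


def next_batch(strict_queues, drr_queues):
    """Serve the highest non-empty strict queue, else one DRR round."""
    for prio, q in enumerate(strict_queues):    # index 0 is highest priority
        if q:
            return [("strict", prio, q.popleft())]
    return [("drr", idx, size) for idx, size in drr_round(drr_queues)]
```

With quanta of 500 and 1000 bytes, the second queue gets roughly twice the bandwidth of the first over many rounds, regardless of packet sizes.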

Traffic Marking: Marking is one important feature of traffic management. Marking a packet lets the upstream router make allow/deny decisions if it observes congestion. Different markings need to be applied based on the classification criteria and on the rate band the packet was scheduled in (within CIR, between CIR and EIR, or beyond EIR). Marking based on classification criteria is normally expected to be done by software if classification is done in software. But marking based on the shaping rate has to be done by the HTM, as software does not get hold of the packets after traffic management. Typically the marking is limited to the DSCP value in the IP header or the CoS field in the 802.1Q header. I see some HTM systems expecting software to pass the DSCP location and CoS location along with the packet so that they can place the right values in those locations.

So, the features to look for on the marking side are: the ability of the HTM to mark packets, and the ability of software to configure marking values (DSCP, CoS or both) on a per-queue basis for each shaping band used to schedule the packet (CIR, EIR or beyond EIR).
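A rough Python sketch of per-band marking is shown below. It assumes, as some HTMs do, that software supplies the byte offsets of the IPv4 TOS byte and of the first byte of the 802.1Q TCI along with the packet. The MARKING table, its values and the function names are hypothetical, and the sketch does not recompute the IPv4 header checksum, which must be updated whenever the TOS byte changes.

```python
# Hypothetical per-queue marking table: one entry per shaping band.
MARKING = {
    "cir":    {"dscp": 10, "cos": 3},
    "eir":    {"dscp": 12, "cos": 2},
    "beyond": {"dscp": 14, "cos": 1},
}


def set_dscp(pkt, tos_offset, dscp):
    """Write DSCP into the top 6 bits of the TOS byte, keeping the low 2 bits."""
    pkt[tos_offset] = ((dscp & 0x3F) << 2) | (pkt[tos_offset] & 0x03)


def set_cos(pkt, tci_offset, cos):
    """Write CoS into the top 3 bits of the first TCI byte, keeping the rest."""
    pkt[tci_offset] = ((cos & 0x07) << 5) | (pkt[tci_offset] & 0x1F)


def mark(pkt, band, marking, tos_offset=None, tci_offset=None):
    """Mark a packet (bytearray) in place according to the band it was sent in.

    An offset of None means the packet has no such field to mark.
    The IPv4 header checksum is NOT updated here.
    """
    values = marking[band]
    if tos_offset is not None and "dscp" in values:
        set_dscp(pkt, tos_offset, values["dscp"])
    if tci_offset is not None and "cos" in values:
        set_cos(pkt, tci_offset, values["cos"])
    return pkt
```

Keeping the table per queue and per band matches the configurability asked for above: software sets the values once, and the hardware applies them at schedule time.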

Congestion Management: Shaping and scheduling always lead to queue management. Queue management is required to limit the queue size and to keep packet latency from growing, since in some cases it is better to drop packets than to send them late. Different traffic types require different congestion management. Typical congestion management algorithms expected are: Tail Drop, RED (Random Early Detection), WRED (Weighted Random Early Detection) and head-of-queue drop. In addition, the queue size, in terms of the number of packets it can hold, is expected to be configurable.

When there is congestion, there will be packet drops. How the packets are dropped, and how software is informed about them, can affect performance. Software needs to know which packets were dropped from the queues so that it can free their buffers. To reduce the number of interrupts raised for dropped packets, the HTM is expected to implement interrupt coalescing. It is also expected to maintain a list of the packets that were dropped so that software can read the whole batch in one go when the interrupt occurs.
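To show how tail drop and RED/WRED differ, here is a short Python sketch of the usual RED drop curve, with a hard queue limit acting as tail drop. The function names and the profile tuple layout are my own; WRED is simply the same curve with a different (min threshold, max threshold, max probability) profile per traffic class or drop precedence.

```python
def ewma(avg_len, current_len, weight=0.002):
    """Update the moving average of the queue length (RED's 'average queue')."""
    return (1.0 - weight) * avg_len + weight * current_len


def drop_probability(avg_len, min_th, max_th, max_p):
    """RED curve: 0 below min_th, linear ramp up to max_p, 1 at or above max_th."""
    if avg_len < min_th:
        return 0.0
    if avg_len >= max_th:
        return 1.0
    return max_p * (avg_len - min_th) / (max_th - min_th)


def should_drop(queue_len, limit, avg_len, profile, rand):
    """Decide whether to drop an arriving packet.

    queue_len and limit are in packets; profile is (min_th, max_th, max_p);
    rand is a uniform random number in [0, 1) supplied by the caller.
    """
    if queue_len >= limit:
        return True                     # tail drop: the queue is full
    return rand < drop_probability(avg_len, *profile)
```

Passing the random number in from the caller keeps the decision function deterministic, which makes it easy to test.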

Hierarchical Shaping and Scheduling: This feature is critical for many deployments. Shaping parameters are normally configured at the physical port or logical port level based on the effective bandwidth. On the port, there could be multiple subscribers (for example, server farms of different customers in a data center, different divisions in an Enterprise, or subscribers of a Metro Ethernet provider), with each subscriber having its own shaping (CIR, EIR). Different traffic flows within each subscriber might also have shaping. For example, MEF does not rule out shaping traffic based on a set of DSCP values in addition to shaping at the port and VLAN level. In the Enterprise, shaping might need to be done based on IP addresses or transport protocol services.

Scheduling is always associated with each shaper. That is, whenever the physical/logical port level shaper finds some bandwidth available to send traffic, it invokes the scheduler. In the above example, the port/logical port level scheduler tries to get hold of a packet from one of the subscribers. If the subscriber is itself another QoS instance (having its own shaping and scheduling), the selected subscriber's scheduler is called to get hold of the packet. The subscriber scheduler might in turn call another internal scheduler to select among the queues that have traffic. Since one scheduler calls another scheduler, it is called hierarchical. Typically, 8 hierarchical levels are expected to be supported. As a system designer, you need to check for this feature and make sure the number of levels supported by the HTM suits your requirements.

You also need to make sure that hierarchical shaping and scheduling does not involve queues at each level. If that were the case, the performance of the HTM would be bad. It is okay for the HTM to expect software to put packets in the innermost levels. Note that the queues may not always be at the innermost level. An intermediate or first-level QoS instance might contain either a further QoS instance or a queue itself. If the scheduler selects a QoS instance, then the next inner level scheduler is called; otherwise, it takes the packet from the selected queue. Classification in software is expected to put the packets in the appropriate queues as per the scheduler levels.
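The recursive "scheduler calls scheduler" structure can be sketched in Python as a tree in which only the leaves hold packets. This is a simplified illustration: the class names are hypothetical, every level uses plain round-robin, and the shapers are left out, so each dequeue call stands for a moment when the shaper at that level has found bandwidth to send.

```python
from collections import deque


class Leaf:
    """A real queue; only leaves hold packets."""

    def __init__(self, name):
        self.name = name
        self.packets = deque()

    def has_traffic(self):
        return bool(self.packets)

    def dequeue(self):
        return self.packets.popleft()


class QosInstance:
    """A scheduling level whose children are leaves or further QoS instances."""

    def __init__(self, name, children):
        self.name = name
        self.children = children
        self.next = 0                   # round-robin position

    def has_traffic(self):
        return any(child.has_traffic() for child in self.children)

    def dequeue(self):
        """Select a child round-robin; recurse if the child is a QoS instance."""
        count = len(self.children)
        for i in range(count):
            child = self.children[(self.next + i) % count]
            if child.has_traffic():
                self.next = (self.next + i + 1) % count
                return child.dequeue()
        return None                     # nothing to send at this level
```

Note that a level can mix both kinds of children, for example a port whose children are a subscriber QoS instance and a plain management queue, which is the case described above where queues are not all at the innermost level.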

Support for Multiple Ports: Some Enterprise edge devices have multiple interfaces (for example, multiple WAN interfaces). Each interface might require its own QoS traffic treatment. As a system designer, one thing to check in an HTM is how many ports/logical ports can be configured with QoS. Logical ports are also required because some systems use a built-in switch to expose multiple physical interfaces from one 10G interface connected to the CPU board. VLANs are used internally to communicate between the 10G interface and the switch. For all practical purposes, this scenario should be treated as if there were multiple interfaces on the CPU card itself.

Support for LAG: The LAG (link aggregation) feature bundles multiple links together, with each link having its own shaping parameters. As a system designer, you may want to ensure that traffic assigned to a link (port or logical port) by the software LAG or hardware LAG is scheduled on that same link. You may also want to ensure that the schedule operation is invoked by the shaper of the appropriate link. That is, the HTM should not have a shaper on the LAG instance; it should implement the shaper on each link.

At no time should the HTM drop packets because of a LAG operation. Some mis-ordering during certain LAG operations is acceptable, but no packet should be dropped. In particular, two LAG operations, rebalancing and adding/deleting a link, should not give rise to packet drops from the HTM queues. You may want to verify this.
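The "no drops on link deletion" requirement can be illustrated with a small Python sketch. It is a toy model under my own assumptions: the Lag class and its methods are hypothetical, member selection is a CRC32 hash of a flow key, and each member link has its own queue (a per-link shaper would attach to each of these queues and is not modelled). When a link is removed, its pending packets are re-queued on the remaining links, which may reorder them but never drops them.

```python
import zlib
from collections import deque


class Lag:
    """Toy LAG model: one queue per member link, no queue on the LAG itself."""

    def __init__(self, link_names):
        self.links = {name: deque() for name in link_names}

    def pick_link(self, flow_key):
        """Hash a flow key (bytes) onto one of the current member links."""
        names = sorted(self.links)
        return names[zlib.crc32(flow_key) % len(names)]

    def enqueue(self, flow_key, packet):
        """Queue the packet on the link selected for its flow."""
        link = self.pick_link(flow_key)
        self.links[link].append((flow_key, packet))
        return link

    def remove_link(self, name):
        """Delete a member link and move its pending packets to the others."""
        if len(self.links) <= 1:
            raise ValueError("cannot remove the last link of a LAG")
        pending = self.links.pop(name)
        for flow_key, packet in pending:
            self.enqueue(flow_key, packet)      # re-queued, not dropped
        return len(pending)

    def total_queued(self):
        return sum(len(queue) for queue in self.links.values())
```

Packets of a flow stay on one link as long as the membership does not change, which is what keeps ordering intact in the steady state.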

