Server Load Balancing (SLB):
Let us first revisit the "Server Load Balancing" concept and its features. Server load balancing was invented in the early days of the web, in the late 1990s and early 2000s. At that time the pages served were mostly static, so simple Layer 4 load balancing was good enough to distribute TCP connections across multiple servers in the server farm. The load balancing device is typically assigned a public IP address. When a client connection lands on this public IP address, the device figures out the best server to serve the connection and hands the connection over by doing "Destination NAT", that is, the destination IP address of packets generated by the remote client gets translated to the IP address of the selected server. To ensure that packets going from the server to the client are accepted by the client, response packets from the server undergo source IP address translation. This way, clients only ever see the public IP address in response packets. Now, let us discuss some SLB features.
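The two translations can be sketched as follows. The addresses are documentation examples, and the dictionaries stand in for packet headers that a real device rewrites in its datapath:

```python
# Sketch of the address translations an SLB performs (illustrative
# addresses; a real device rewrites IP headers per packet).

VIP = "203.0.113.10"          # public IP address configured on the SLB

def dnat(packet, server_ip):
    """Client -> server direction: rewrite the destination (the VIP)
    with the IP of the server chosen for this connection."""
    packet = dict(packet)
    packet["dst"] = server_ip
    return packet

def snat(packet):
    """Server -> client direction: rewrite the source (the server IP)
    back to the VIP so the client only ever sees the public address."""
    packet = dict(packet)
    packet["src"] = VIP
    return packet

# Client packet arriving on the VIP, handed to server 10.0.0.2.
inbound = {"src": "198.51.100.7", "dst": VIP}
print(dnat(inbound, "10.0.0.2"))   # dst becomes 10.0.0.2

# Server response, translated on the way back to the client.
outbound = {"src": "10.0.0.2", "dst": "198.51.100.7"}
print(snat(outbound))              # src becomes the VIP again
```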
- MAC Level Redirection: When servers in the server farm have public IP addresses, this option can be used instead of "Destination NAT". The Server Load Balancing (SLB) device must be placed in the traffic path for this mode to work. In this mode, the SLB device selects a server in the farm and rewrites the destination MAC address with the server's MAC address. Optionally, it can also change the source MAC to the SLB device's MAC.
- Additional Source NAT: This mode is useful in asymmetric routing topologies. When destination NAT is done on client packets, source IP translation must be done on the server-to-client traffic. For this to happen, the server-to-client traffic must pass through the same SLB device that did the DNAT on the client traffic. Source NAT is done on the client traffic with the SLB device's IP address to ensure that the return traffic comes to the SLB device even if the servers' routing tables are configured with some other default gateway.
- Multiple algorithms to select the server: Based on the SLB deployment and need, several different ways of scheduling connections to servers can be chosen. Following are some of the methods.
- Round Robin: Each server is treated the same from a capability perspective, and servers are chosen in round-robin fashion. If there are 2 servers in the server farm, then alternate connections are given to each server.
- Weighted Round Robin: Typically each server is given a weight from 1 to 10; a weight of 10 is assigned to a server that can serve 10 connections for every 1 connection served by a server with weight 1. A server is given that many consecutive connections before the next server is chosen.
- Least Connections: Both the Round Robin and Weighted Round Robin methods assume that all connections are almost equal with respect to duration, the amount of traffic sent on each connection, and the processing power each connection consumes on the server. In reality this is not true. In the 'Least Connections' method, a new connection is given to the server serving the least number of connections at that time. The SLB device maintains a count of active connections for each server to facilitate this scheduling.
- Weighted Least Connections: All servers can't be assumed to be equally powerful, and this feature allows deploying servers of different processing capabilities. Each server can be given a weight. If weights 2 and 5 are given to server1 and server2, it means that server2 is powerful enough to serve 5/2 times as many active connections simultaneously as server1. The SLB device keeps this weight in mind when deciding the best server for new connections.
- Fastest Response Time: Though 'Least Connections' and 'Weighted Least Connections' do their best to distribute connections, there can be situations where many high-processing connections are given to one server. Any connection that does a deep data mining function, for example, could overwhelm the server, and new connections going to that server may not be served well. To take care of these situations, it is better to monitor the processing power left in each server for new connections. What better way than monitoring the response time of the servers? Though response time is not a 100% reliable way of finding the least utilized server, it is close as long as the monitoring happens continuously. Some SLB devices monitor response time as part of health checking. That is not good enough, as health checks happen only once in a while. Response time measurements must be done on responses to real client traffic.
- Fill-and-go-next: Green is the buzzword now: save power and save on cooling costs. Round Robin and the other methods above distribute traffic to all servers. Even when there is little load, all servers are used, thereby not giving any server a chance to enter a power-saving mode. When this mode is chosen, the SLB device drives one server to full utilization before selecting the next. Typical configuration includes the number of connections a server can handle and when to start warming up the next server. For example, if the numbers 10000 and 8000 are given, then after 8000 active connections on the existing server, the next connection is given to the next server to bring it up to full power; both servers are used until 18000 active connections, then another server is warmed up, and so on. This mode can be combined with the Round Robin method, which is then used to balance the traffic across all warmed-up servers.
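Three of the scheduling methods above can be sketched in a few lines. The server names and weights are made up for illustration, and a real SLB would also decrement the active count when a connection closes:

```python
import itertools

servers = ["s1", "s2", "s3"]

# Round Robin: cycle through the farm, one connection per server.
_rr = itertools.cycle(servers)
def round_robin():
    return next(_rr)

# Least Connections: pick the server with the fewest active connections.
active = {"s1": 0, "s2": 0, "s3": 0}
def least_connections():
    return min(active, key=active.get)

# Weighted Least Connections: normalize the active count by the weight,
# so a weight-5 server absorbs 5x the connections of a weight-1 server
# before the two look equally loaded.
weights = {"s1": 2, "s2": 5, "s3": 1}
def weighted_least_connections():
    return min(active, key=lambda s: active[s] / weights[s])

# Assign 8 connections; the split follows the 2:5:1 weights.
for _ in range(8):
    active[weighted_least_connections()] += 1
print(active)   # {'s1': 2, 's2': 5, 's3': 1}
```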
- SYN Flood Detection and Prevention (DDoS Prevention): This feature has been common in servers for the last few years, but detection and prevention at a central place, the SLB, means administrators need not worry about the feature's availability in every server or about enabling it on each one. This feature prevents attackers from sending a large number of SYN packets without completing the TCP connection establishment phase. SLB devices with this feature detect SYN flood attacks without consuming their own resources by using the SYN-Cookie mechanism. Only when the three-way handshake is complete is the connection awarded to a server in the server farm.
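The SYN-Cookie idea can be sketched roughly as below. The secret, hash, and bit layout here are illustrative, not what any particular device uses: the connection 4-tuple and a coarse timestamp are folded into the SYN-ACK sequence number, so no state is kept until the final ACK echoes a verifiable cookie.

```python
import hashlib
import time

SECRET = b"per-device-secret"   # illustrative; a real device rotates this

def make_cookie(src, dst, sport, dport, t=None):
    """Encode the 4-tuple and a coarse time counter into a 32-bit ISN."""
    t = int(time.time() // 64) if t is None else t
    msg = f"{src}|{dst}|{sport}|{dport}|{t}".encode()
    h = int.from_bytes(hashlib.sha256(SECRET + msg).digest()[:3], "big")
    return ((t & 0xFF) << 24) | h   # 8 bits of time, 24 bits of hash

def check_cookie(cookie, src, dst, sport, dport):
    """Verify the ACK's echoed cookie; accept current or previous window."""
    t = cookie >> 24
    now = int(time.time() // 64)
    for cand in (now, now - 1):
        if cand & 0xFF == t and make_cookie(src, dst, sport, dport, cand) == cookie:
            return True
    return False

c = make_cookie("198.51.100.7", "203.0.113.10", 40000, 80)
print(check_cookie(c, "198.51.100.7", "203.0.113.10", 40000, 80))   # True
print(check_cookie(c, "198.51.100.8", "203.0.113.10", 40000, 80))   # False
```

Only after `check_cookie` succeeds does the SLB create connection state and hand the connection to a server.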
- Traffic Anomaly Detection and Traffic Policing: DDoS attacks such as SYN floods are really a thing of the past. Newer kinds of attacks complete TCP connections and issue requests that consume CPU power, thereby creating denial of service for genuine users. Since these attacks complete the connection, it is possible to track the users by their IP addresses. The traffic anomaly detection feature allows administrators to configure baseline traffic characteristics based on normal-day activity. Based on this configuration, SLB devices detect any anomaly and can take corrective actions such as:
- Throttling the traffic coming from identified IP addresses: throttling can be based on "simultaneous active connections", "connection rate", or "packet/byte rate".
- Blocking the traffic from identified IP addresses for a certain time.
- Health Checking: SLBs are expected to use only servers that are online. To ensure that, SLBs typically do health checks on the servers. Some types of health checks that can be configured are:
- ICMP Ping: Using this check, the SLB knows that the server machine is up.
- TCP Connection Checks: These checks let SLBs know that a given TCP service is running. Note that it is possible that the machine is alive, but the user process listening for TCP connections is dead. This check allows SLBs to verify TCP server liveness.
- Application Health Checks: These checks let the SLB know whether the application behind the TCP service is running well. Note that the TCP server may be running on the machine, but the application that acts on the application data could be a different user process from the one listening for TCP connections. Hence it is important to have these health checks enabled in SLB devices.
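The TCP and application-level checks can be sketched with plain sockets. The ICMP check needs raw sockets and is omitted, and the /health path is an assumption; a real deployment would probe whatever URL the application exposes:

```python
import socket

def tcp_check(host, port, timeout=2.0):
    """Liveness of the listener: can we complete a TCP handshake?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def http_check(host, port, path="/health", timeout=2.0):
    """Application health: the process behind the socket must answer a
    real request with a 200 status, not just accept the connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode())
            status = s.recv(1024).split(b"\r\n", 1)[0]
            return status.startswith(b"HTTP/") and b" 200" in status
    except OSError:
        return False

# Example: tcp_check("10.0.0.2", 80) and http_check("10.0.0.2", 80)
```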
- Persistence: For some end applications, it is necessary that all connections coming from one client go to the same server. This is needed when the application maintains state across connections from the client, and it requires temporary, dynamic binding state in SLB devices. When there is no matching binding state, a server is chosen for the connection, and the client IP address is immediately bound to the chosen server IP address. Any new connection from this client is given to the same server without going through the server selection process. To ensure that these bindings don't consume a large amount of memory on SLB devices, they are removed after a certain amount of inactivity. As we will see later, in the case of HTTP connections, bindings can be implemented using cookies without the need to store binding state in the SLB; the cookie received in each request indicates the bound server.
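A minimal sketch of the binding table just described, with an inactivity timeout; the timeout value and addresses are illustrative:

```python
import time

IDLE_TIMEOUT = 300.0   # seconds of inactivity before a binding is dropped

bindings = {}          # client_ip -> (server_ip, last_used)

def server_for(client_ip, pick_server, now=None):
    """Reuse the bound server if the binding is fresh, else select anew."""
    now = time.monotonic() if now is None else now
    entry = bindings.get(client_ip)
    if entry and now - entry[1] < IDLE_TIMEOUT:
        server = entry[0]        # sticky: reuse the bound server
    else:
        server = pick_server()   # normal selection (round robin etc.)
    bindings[client_ip] = (server, now)
    return server

# First connection picks a server; later ones stick to it.
s = server_for("198.51.100.7", lambda: "10.0.0.2", now=0.0)
assert server_for("198.51.100.7", lambda: "10.0.0.9", now=10.0) == s
# After the idle timeout, selection runs again.
print(server_for("198.51.100.7", lambda: "10.0.0.9", now=1000.0))  # 10.0.0.9
```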
Server load balancing based on L3-L4 information (source IP, destination IP and ports) does not solve all issues with the different kinds of applications running on the servers. L7 intelligence is required in the SLB devices to meet the load balancing requirements of newer kinds of applications. Since they use L7 intelligence in making load balancing decisions, I guess these devices are being called "Application Delivery Controllers", or ADCs for short.
The first generation of ADCs tried to solve challenges associated with e-commerce and session-persisting HTTP-based applications. One challenge is to ensure all connections go to the server that was chosen for the first connection from the client. This is necessary because the applications on the server maintain user state in memory, and this state is updated with the user's selections until the user logs out. This challenge has another twist when these connections are SSL encrypted.
First generation ADCs solve these challenges using HTTP protocol intelligence in the ADC devices.
- Persistence using HTTP Cookies: Though IP-based persistence works decently, it fails when a given user's IP address changes across connections. This can happen if the user's router gets a new IP address during the life of the user session. I believe another possibility is the use of different NAT addresses by some ISPs; that is, an ISP might use one NAT IP address for some connections and another NAT IP address for other connections of the same user. Due to these issues, IP address persistence does not work in all cases. In addition, IP address persistence occupies some memory in SLB devices. The alternative for HTTP-based applications is to use HTTP cookies. HTTP cookies allow servers to relate connections belonging to the same session, and the same mechanism is used by ADCs to relate user connections. ADCs define their own cookie, storing the selected server in encrypted/encoded cookie data. When a new HTTP connection is received, the ADC checks whether its cookie and value are present. If not, it assumes that this is the first connection of a session, selects a server, and adds a cookie with the encoded/encrypted server information to the response going to the client. Any further connections coming from the same user to this site will carry this cookie, and its value is used by the ADC to map the connection to the server. Since this cookie belongs to the ADC, it is removed from the request before the request is sent to the server.
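The cookie mechanics can be sketched as follows. The cookie name, key, and HMAC-based encoding are assumptions for illustration; real ADCs use their own proprietary formats:

```python
import base64
import hashlib
import hmac

KEY = b"slb-cookie-key"   # illustrative per-device secret
COOKIE_NAME = "SLBSRV"    # hypothetical ADC-owned cookie name

def encode_server(server_ip):
    """Produce the cookie value for Set-Cookie. Authenticating it stops
    a client from steering itself to an arbitrary backend."""
    mac = hmac.new(KEY, server_ip.encode(), hashlib.sha256).hexdigest()[:16]
    return base64.urlsafe_b64encode(f"{server_ip}|{mac}".encode()).decode()

def decode_server(value):
    """Recover the bound server from a request cookie, or None if the
    cookie is absent/invalid (fall back to normal server selection)."""
    try:
        server, mac = base64.urlsafe_b64decode(value.encode()).rsplit(b"|", 1)
    except Exception:
        return None
    good = hmac.new(KEY, server, hashlib.sha256).hexdigest()[:16].encode()
    if hmac.compare_digest(mac, good):
        return server.decode()
    return None

v = encode_server("10.0.0.2")
print(decode_server(v))                 # 10.0.0.2
print(decode_server(v[:-4] + "AAAA"))   # None (tampered)
```

Because the server identity travels in the cookie itself, the ADC needs no per-client binding table for HTTP traffic.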
- SSL Termination: E-commerce web sites require SSL/TLS security and also session persistence. To facilitate this, ADCs provide SSL termination capability. This capability also offloads CPU-intensive SSL processing to the ADC, thereby saving CPU cycles on the server for application processing.
HTTP Optimization features:
- HTTP Connection Pooling: The HTTP protocol allows multiple transactions in one TCP connection. This feature in ADCs multiplexes many client connections onto a few connections to the server. Without it, there are as many TCP connections to the server as the number of client TCP connections the ADC is terminating. With this feature, there are only a few TCP connections towards the server, thereby saving some CPU cycles and memory on the servers. Though the amount of saving one gets with this feature is debatable, it is one feature many ADCs support.
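The multiplexing idea can be modeled with a tiny pool; the "connections" here are plain dictionaries standing in for real keep-alive sockets, and the pool size is arbitrary:

```python
import queue

# Toy model of HTTP connection pooling: many client transactions share
# a small, fixed set of keep-alive connections to the server.

POOL_SIZE = 2
pool = queue.Queue()
for i in range(POOL_SIZE):
    pool.put({"id": i, "served": 0})

def forward_request(request):
    """Borrow a server-side connection, send the transaction on it
    (elided here), then return it for the next client."""
    conn = pool.get()
    conn["served"] += 1
    pool.put(conn)
    return conn["id"]

# 10 client requests ride on just 2 server-side connections.
used = [forward_request(f"GET /item/{n}") for n in range(10)]
print(sorted(set(used)))   # [0, 1]
```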
- HTTP Compression: HTTP compression using gzip and other algorithms can save bandwidth by compressing response data. ADC devices offload this capability from the servers, thereby making server CPU cycles available for application processing.
- HTTP Caching & Delta Encoding: The HTTP caching feature between clients and servers avoids duplicate downloads if content has not changed on the server. But there are many instances where the file/data content changes, yet not significantly. Delta encoding allows only the difference (delta) to be sent to clients rather than the complete content, thereby saving bandwidth. Since delta encoding requires significant CPU cycles, ADCs offload this functionality from HTTP servers.
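The reconstruction side of delta encoding can be demonstrated with the standard library's difflib: the receiver, holding the cached old version, rebuilds the new one from a diff via difflib.restore(). Real implementations use compact binary delta formats; ndiff carries context lines, so this illustrates the reconstruction, not the bandwidth saving:

```python
import difflib

# Cached (old) and current (new) versions of some served content.
old = ["price: 10\n", "stock: 5\n", "color: red\n", "size: M\n"]
new = ["price: 12\n", "stock: 5\n", "color: red\n", "size: M\n"]

# The sender computes a delta of old vs new...
delta = list(difflib.ndiff(old, new))

# ...and the receiver, which already holds `old` in its cache,
# reconstructs `new` from the delta (argument 2 = second sequence).
rebuilt = list(difflib.restore(delta, 2))
print(rebuilt == new)   # True
```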
SLBs traditionally don't look at application protocol data to make the server selection. ADCs augment the server selection criteria with protocol data such as HTTP data. Some examples where protocol intelligence is required:
- Multiple server farms, with each farm having different content.
- Different server farms for Mobile users and Normal users.
- Different responses on errors based on browser type.
- Prioritization/throttling/shaping of requests to specific URL resources upon congestion.
- Addition/modification of requests/responses based on the type of applications running on the server farm.
- Combinations of the above.
Since ADCs are becoming a central location for all requests going to data center servers, this is the place to enforce threat security, thereby offloading threat security functionality from the servers. Some of the threat security functions ADCs support are:
- Traffic Policing based on protocol intelligence: ADCs with protocol intelligence can do a better job of DDoS prevention than SLBs. The baseline traffic definition can go well beyond the typical 5 tuples; it can include URL patterns, User-Agent, cookie values and more.
- Web Application Firewall: There are many attacks exploited against web server applications, including SQL injection, XSS (Cross-Site Scripting), LFI (Local File Include) and RFI (Remote File Include). ADCs are increasingly offloading this protection from the servers. It not only improves server performance, but also lets administrators configure signatures in one place, i.e., the ADC.
Virtualization:
This feature is not common in ADCs today, but I believe it is going to become a critical feature moving forward, to ensure that multiple server farms belonging to multiple domains corresponding to different customers can be supported by one or a few ADCs. One ADC device should be able to support multiple customer server farms in public data centers. There are two types of virtual instances possible. In the first type, one executable image supports multiple instances. Here, configuration data and runtime data are stored separately, and the device can even provide UI/CLI with role-based access to the configuration belonging to each virtual instance. The second type of virtual instance is to have multiple images, with each image serving one virtual instance. In this case, even if there is an issue with one virtual instance, other instances are not affected, thereby providing good isolation. The first type can provide a large number of instances, whereas the second type is limited to a few tens. I personally think that the Linux container approach is better for the second type, as it is lean and uses a common operating system and TCP/IP stack image. Only the user processes are instantiated multiple times, as many as the number of containers.
Third generation ADCs:
What are the features one would expect from third generation ADCs? I can think of a few, based on where the data center market is heading.
- Cloud Computing: Traditionally, ADCs are configured with all the servers in a given server farm. That works well when manual intervention is required to add servers to or remove servers from the farm; administrators are used to configuring the ADCs accordingly. In cloud computing, servers are no longer physical; they are virtual. They can be added and removed dynamically and programmatically. They can also be disabled and enabled dynamically without manual intervention. Some use cases: virtual instances are brought up and down based on the load on the servers, the time of day, holidays, etc. Since servers come up and go down dynamically, it is expected that ADCs also get this information dynamically.
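The dynamic farm could look something like this sketch. The class and method names are hypothetical, standing in for whatever API an orchestrator would call on the ADC when it creates, destroys, or drains virtual servers:

```python
# Sketch of a dynamically managed server farm: the orchestration system
# registers/deregisters virtual servers through an API instead of an
# administrator editing a static server list.

class ServerFarm:
    def __init__(self):
        self.servers = {}                  # ip -> enabled flag

    def register(self, ip):
        self.servers[ip] = True            # VM spun up by the orchestrator

    def deregister(self, ip):
        self.servers.pop(ip, None)         # VM destroyed

    def set_enabled(self, ip, enabled):
        if ip in self.servers:
            self.servers[ip] = enabled     # drain without removing

    def eligible(self):
        """Servers the scheduler may pick for new connections."""
        return [ip for ip, on in self.servers.items() if on]

farm = ServerFarm()
farm.register("10.0.0.2")
farm.register("10.0.0.3")
farm.set_enabled("10.0.0.3", False)        # e.g. scale-in at night
print(farm.eligible())   # ['10.0.0.2']
```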
- Beyond HTTP Servers (SIP, CIFS, SMTP, IMAP etc.): HTTP has been and still is dominant in data centers today. That is changing with cloud computing. Many organizations are expected to host their internal servers too in the cloud, including mail servers, file servers, etc. Third generation ADCs are expected to balance traffic across email servers, file servers and so on. In addition, they are expected to have optimization features to save network bandwidth. File servers today predominantly support CIFS. ADCs are expected to have a CIFS proxy that can save bandwidth by doing delta encoding, caching, de-duplication, etc.