Saturday, September 18, 2010

Web Application firewalls, IPS & Network Anti Virus - Fixing the performance issues

Security professionals know that intrusion and malware detection now goes beyond looking at a stream of packets. Detection requires:
  • SSL decryption - Client-side attacks are increasingly hidden in HTTPS connections. (Check this out)
  • Extracting data from the packets (for example HTML, Javascript and different types of files, to detect attacks embedded in the data). (See this)
  • Decoding the data (UTF-8, UTF-16, decompression, etc.)
  • Emulating the data if it is a script (such as Javascript), to counter evasion techniques used by attackers.
  • Comparing against known signatures or codelets, or applying some kind of heuristics.
This kind of analysis is not possible with stream-based firewalls, IPS and AV. It requires collecting the data, which in turn requires proxies. If an IPS/AV vendor says that it does detection without reassembling and collecting the data, then as an end user you would not be wrong to say that it either misses a lot of intrusions or gives too many false positives.
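To make the point concrete, here is a minimal Python sketch comparing a per-packet check with a check done after collecting and decoding the whole body. The signature, the sample page and the function names are all made up for illustration; a real engine is far more involved.

```python
import gzip

# Hypothetical signature, for illustration only.
SIGNATURE = b"eval(unescape("

def scan_per_packet(segments):
    """Stream-style check: look at each segment in isolation."""
    return any(SIGNATURE in seg for seg in segments)

def scan_reassembled(segments, content_encoding=None):
    """Proxy-style check: collect the whole body, decode it, then match."""
    body = b"".join(segments)
    if content_encoding == "gzip":
        body = gzip.decompress(body)
    return SIGNATURE in body

def split(data, size):
    """Cut data into fixed-size pieces, standing in for TCP segments."""
    return [data[i:i + size] for i in range(0, len(data), size)]

page = b"<html><script>eval(unescape('%75%6e'))</script></html>"

# Case 1: plain body, but the signature straddles a segment boundary.
cut = page.index(SIGNATURE) + 5
plain_segments = [page[:cut], page[cut:]]

# Case 2: gzip-compressed body, delivered in small segments.
gzip_segments = split(gzip.compress(page), 16)

print(scan_per_packet(plain_segments), scan_reassembled(plain_segments))
print(scan_per_packet(gzip_segments), scan_reassembled(gzip_segments, "gzip"))
```

In both cases the per-packet check misses the signature, while the check on the collected and decoded body finds it.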

The computational power needed to do the above is very high. It is not surprising to see less than 10Mbps of combined IPS and AV performance in devices that deliver 1Gbps of firewall and IPsec throughput. I hear stories of customer disappointment when they turn on IPS and/or AV functionality in security devices.

Network security analysts are advising companies to enable full functionality even for traffic originating from trusted networks. This should not surprise anybody, as the trusted network boundary is shrinking due to the mobility of machines in the trusted network. That is, machines move from trusted to untrusted networks and vice versa. Examples: laptops, iPads, etc. These machines may get infected while they are in an untrusted network and may infect other machines in the trusted network when they are brought back into the corporate network. That is the reason full protection is now being enabled on security devices.

HTTP is the single protocol that occupies the majority of network bandwidth in many organizations. HTTP is also an interactive protocol, so any performance issue also impacts the user experience. Solving the HTTP performance problem not only improves the user experience but also increases the performance of the overall system.

Techniques that can be used to improve the performance of HTTP anti-malware and IPS analysis are given below. End users might look for the following features.

  • Avoid duplicate IPS and anti-malware checks: It is very common that the same resource is requested by the same or multiple users in the organization via HTTP. Once the AV and IPS check is done on a resource, the network device should avoid doing the check again. This requires caching the result of the AV and IPS analysis and using it when the same resource is requested at a later time. Of course, the cached result should have a lifetime so that the AV/IPS check is repeated if the content of the resource changes. The lifetime can be equal to the expiry time of the resource, which comes with the HTTP response headers. If possible, the system can also cache the response itself, which avoids even going to the origin server, thereby saving WAN bandwidth too. I believe that AV/IPS devices will have HTTP caching moving forward.
  • Auto-blacklisting of URIs: Malware may be served with dynamic content, in which case the caching mechanism above does not work. More often than not, malware is served from the same URI. If malware is detected multiple times in the data downloaded from a URI, that URI can be blacklisted. If a request comes for the same URI at a later time, it can be denied without even sending the request to the origin server. Always make sure that newer blacklist entries are honored by the device.
  • TCP and SSL offload: Proxies can benefit greatly if some other entity, such as an intelligent PCI-e card, takes care of the TCP/IP stack and SSL offload.
  • Implement proxies as per my earlier post.
  • Use multicore processors and distribute the load across multiple cores. Selection of a multicore processor depends on several factors such as cost, number of cores (performance), acceleration features, etc. Here I am only covering the features. Features that would help in processing are:
    • Processing power - The higher the processing power, the better the performance.
    • Cache size matters: Unlike typical firewall/IPsec processing, the amount of code that gets executed in AV/IPS analysis is a lot higher. Larger L1 and L2/L3 caches hold more instructions, so the processor goes to DDR less often. Cache for storing data is also important.
    • Acceleration hardware -
      • Compression/decompression accelerator: To take care of decompressing the compressed files coming in the HTTP response.
      • SIMD (Single instruction Multiple Data) based hardware to do acceleration of
        • Memory /String operations - Copy, Set
        • Checksum, CRC operations
        • HTML and URL decoding operations.
        • and many more...
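The first two techniques in the list above (caching the AV/IPS result for the lifetime of the resource, and blacklisting a URI after repeated detections) can be sketched roughly as follows. This is only an illustration: the class name ScanGate, the threshold of 3 detections and the scan callback are my assumptions, and max_age stands in for whatever lifetime the device derives from the HTTP response headers.

```python
import time

DETECTIONS_BEFORE_BLACKLIST = 3  # hypothetical threshold

class ScanGate:
    """Caches AV/IPS verdicts per URI and blacklists repeat offenders."""

    def __init__(self, scan, clock=time.monotonic):
        self.scan = scan        # expensive check: bytes -> True if malicious
        self.clock = clock      # injectable so the lifetime can be tested
        self.verdicts = {}      # uri -> (is_malicious, expires_at)
        self.detections = {}    # uri -> number of scans that found malware
        self.blacklist = set()

    def allow_request(self, uri):
        """Called before the request is sent to the origin server."""
        return uri not in self.blacklist

    def check_response(self, uri, body, max_age):
        """Return True if the response may be delivered to the client."""
        now = self.clock()
        cached = self.verdicts.get(uri)
        if cached is not None and cached[1] > now:
            return not cached[0]        # verdict still valid, skip the scan
        malicious = self.scan(body)
        if max_age > 0:
            self.verdicts[uri] = (malicious, now + max_age)
        else:
            self.verdicts.pop(uri, None)  # dynamic content: nothing to cache
        if malicious:
            count = self.detections.get(uri, 0) + 1
            self.detections[uri] = count
            if count >= DETECTIONS_BEFORE_BLACKLIST:
                self.blacklist.add(uri)
        return not malicious

# Usage with a stand-in scanner that flags a made-up marker string.
scanned = []

def fake_scan(body):
    scanned.append(body)
    return b"MALWARE-MARKER" in body

gate = ScanGate(fake_scan)
gate.check_response("/logo.png", b"image bytes", max_age=60)
gate.check_response("/logo.png", b"image bytes", max_age=60)  # served from the verdict cache
print(len(scanned))
```

The sketch keys the verdict on the URI alone, as the text describes. A device that wants to be stricter could also compare a digest of the body, at the cost of reading the whole response each time.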
Hope it helps..

4 comments:

Ravi said...

Good post.

Which Multicore processor is good for network services applications
such as HTTP Application firewall, WAN Optimization etc.?
Intel always comes out on top in my mind. Reasons being:

1. Matured Linux distributions.
2. Intel SSE/AVX - Fantastic Public Key acceleration and AES Encryption.
3. Good debugging tools.
4. High processing power.
5. 16Mbytes of L2 Cache.
6. Vast number of open source libraries.

I understand that Intel processors are expensive and power hungry. But
for my application, cost is not really that much of an issue.
Since Intel follows ACPI, power also may not be a big issue.

Are there any reasons why I should be looking at other Multicore
processors? I understand that non-x86 multicore processors are good
for IPsec, NAT and SLB kinds of applications, but are they good for
IPS, HTTP Application firewalls and WAN optimization? Thanks in
advance for your comments.

Srini said...

The cost differential between an x86 based system and an Embedded multicore processor based system could be significant. It would not be a surprise to see the cost, power and heat parameters of an Embedded multicore system be 1/3 of those of an x86 based system for a given performance. That is, performance/watt and performance/dollar are higher in Embedded Multicore processors.

Embedded Multicore processors combine PCI, Ethernet, SRIO controllers, Intelligent Crypto accelerator (SSL/IPSec accelerator), Regular expression search, compression accelerators into one chip.

If you are purely looking for raw performance, then x86 is certainly an option. But I believe that you require many accelerator functions too for your applications. If you are looking to create blades in addition to appliances, power and heat matter.

Other benefits of Embedded multicore processors also include less heat dissipation, real-estate savings and no noise pollution.

Ravi said...

I am not sure I agree that non-x86 Multicore processors give a performance/cost advantage over x86 based Multicore processors.

As per a Network World report, the SonicWALL E7500 uses a 16-core Octeon processor. 15 cores are used for security processing. I believe IPsec performance is around 3Gbps for 1280 byte packets and less than 1Gbps for 320 byte packets. Octeon also has a crypto accelerator. That is the performance of 15 cores running at 9GHz in total. Isn't that very low?

Srini said...

It is difficult to comment without knowing how many cores are used for IPsec processing. If it was with all 15 cores, I agree with you that the performance is low.

At the last RSA show and other shows, the VortiQa team of Freescale (I am part of it) demonstrated 10Gbps of firewall+IPsec performance (part of the full-fledged VortiQa UTM) with IMIX traffic (average packet size of 390 bytes) using 7 cores of the P4080 Multicore processor. I am not sure whether you can get that kind of performance on x86 Multicore processors. The P4080 can do this because of its intelligent Security Engine, which offloads not only crypto but also the majority of IPsec packet processing.

Having said that, I am not sure IPsec is a requirement for your application. I am guessing that SSL is relevant and that you can benefit from the SSL record layer offload capability provided by Embedded Multicore processors.

As I said before, if one is looking for raw core performance, x86 certainly provides that. If the target applications can benefit from accelerators, which I believe they should, then don't rule out non-Intel Multicore processors.