Saturday, August 30, 2008

Are multicore processors Linux-friendly?

Multicore processors have been the trend for the past three years. Intel, Cavium and RMI all ship multicore processors.

But are they Linux friendly?

Since Linux has SMP support, they are in theory Linux-friendly. And they are, to a large extent, as long as the kernel controls the hardware accelerators and Ethernet ports.

There are many reasons why network device vendors don't like to deal with the hardware from kernel space. Let us take the two accelerators in question - the regex accelerator and the crypto accelerator. Device vendors providing security functionality mostly implement it using Linux user-space daemons, for several reasons:
  • User space programming is easier to debug.
  • Some security functions require proxies which work on top of the socket interface - hence user space applications.
  • More memory can be accessed from user space.
  • User space processes can take advantage of swap.
An application is not limited to one user-space daemon. A typical network infrastructure device runs many user-space daemons, and processes may also be created dynamically to take on more load. Let us take UTM as an example:
  • IDS/IPS - One user space process, multiple threads.
  • Network AV - HTTP/S Proxy -  One user space process, multiple threads.
  • Network AV - SMTP/S proxy - One user space process, multiple threads.
  • Network AV - POP3/S Proxy - One user space process, multiple threads.
  • Network AV - IMAP/S proxy - One user space process, multiple threads.
  • ClamAV or an equivalent anti-virus package - multiple user processes created dynamically at run time.
  • SpamAssassin or an equivalent anti-spam package - multiple user processes.
  • IPsec - kernel-level function.
Crypto acceleration is required by 
  •  Network AV - HTTP/S, SMTP/S, POP3/S and IMAP/S proxies.
  •  IPsec in kernel space.
Regex acceleration is required by:
  • ClamAV daemons.
  • SpamAssassin daemon.
  • IDS/IPS daemon
  • Content Security daemon (HTTP Proxy).
To improve performance and also to get isolation, many vendors would like to deal with the hardware directly from user space, without the kernel de-multiplexing requests and responses. That is, the accelerator device needs to be shared by multiple daemons, with each daemon seeing its own copy of the accelerator. If one process dies, it should not affect the other processes.

Unfortunately, today's multicore processors don't have that capability. I hope newer multicore processors will.

Let us look at what software expects from hardware accelerator devices:
  • The accelerator device should be instantiable.
  • Each instance should be memory-mappable by the appropriate user space daemon.
  • Only the owning process/thread should be able to submit requests and retrieve responses.
  • Each instance of the hardware device should have its own interrupt, and this interrupt should wake up the appropriate thread.
  • When a user process dies, it should not affect other processes using different instances of the same device.
  • When a user process dies, software should be able to stop that instance of the device.
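As an illustration, here is a minimal sketch of how a user-space daemon might use such a per-process accelerator instance. The device node name, ring layout and instance semantics are my own assumptions for the sketch, not any real driver's API:

-----------------------------------------------------------------------
/* Hypothetical per-instance accelerator usage from a user-space daemon.
 * The device node, ring layout and instance semantics are illustrative
 * assumptions, not a real driver API. */
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define RING_SIZE 4096

int main(void)
{
    /* Each open() creates a private instance; another daemon opening
     * the same node gets its own rings and its own interrupt. */
    int fd = open("/dev/regex_accel", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    /* Map this instance's request/response rings into our address space. */
    void *ring = mmap(NULL, RING_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (ring == MAP_FAILED) { perror("mmap"); return 1; }

    /* ... submit requests by writing descriptors into the ring ... */

    /* The instance's interrupt wakes up only this process. */
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    poll(&pfd, 1, -1);

    /* ... harvest completions from the response ring ... */

    /* If this process dies before cleanup, the kernel tears down only
     * this instance; other daemons' instances are unaffected. */
    munmap(ring, RING_SIZE);
    close(fd);
    return 0;
}
-----------------------------------------------------------------------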
Intel and VIA implemented crypto as instructions, so they may not have the above issues. But many multicore processors implement acceleration asynchronously and will have issues if they don't support 'instances'.

Comments?

Thursday, August 28, 2008

Techniques to defend against DNS Cache Poisoning attacks

This subject is covered very well in many forums, IETF drafts and RFCs. My purpose in this article is to give some idea of how network security devices can play a role in defending against DNS cache poisoning attacks. Before that, let me give some background.

Many of the DNS attacks discussed recently target DNS caching servers and DNS resolvers; there are fewer attacks on zone-authority DNS servers. The attacker sends DNS response packets to a caching server before the authoritative DNS server responds to the query the caching server sent, thereby poisoning the cache with IP addresses for domain names of the attacker's choosing. DNS caching servers are typically located on company premises and at ISPs. These servers cache responses until the TTL expires. When a query reaches a caching server, it replies immediately if a corresponding entry is present in its cache. If not, it sends the query to a pre-defined uplink DNS server or to the authoritative DNS servers for the domain in question.

Attackers send the DNS response as if it came from the uplink DNS server or the authoritative DNS server. Since DNS works over UDP, it is quite easy for an attacker to spoof the DNS response. The difficulty for the attacker lies in making the response acceptable to the caching server. DNS caching servers typically accept a response only if:
  • they have sent a corresponding query.
  • the response contains the same transaction ID as the query they sent.
  • the response's destination port is the same as the source port used to send the query.
  • the response's source port is the same as the destination port used to send the query.
  • the response's source IP is the same as the destination IP used to send the query.
But attackers seem to be able to penetrate the above defenses. According to recent vulnerability reports, many DNS caching servers already randomize the transaction ID, but it seems that is not good enough. One of the suggestions to defend against this attack is to also make the source port of the query random. It makes the attacker's life more difficult, as the attack then requires more responses and hence more time. It appears that many DNS caching servers and DNS resolvers have patched their software and are randomizing the source port. That is good news. But it appears that even this defense is broken. See this article: http://tservice.net.ru/~s0mbre/blog/2008/08/08. As seen in the blog, it takes more time, but the attacker can still make the exploit successful.
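To make the above checks concrete, here is a minimal sketch of the correlation a resolver (or a stateful firewall) performs against an outstanding query; the structure and field names are my own assumptions, not code from any real resolver:

-----------------------------------------------------------------------
/* Illustrative response-validation logic; the struct layout is an
 * assumption for this sketch. */
#include <stdbool.h>
#include <stdint.h>

struct pending_query {
    uint32_t sip, dip;     /* source/destination IP of the query */
    uint16_t sport, dport; /* source/destination port of the query */
    uint16_t txid;         /* randomized DNS transaction ID */
};

/* Accept a response only if it exactly mirrors an outstanding query. */
bool response_matches(const struct pending_query *q,
                      uint32_t sip, uint32_t dip,
                      uint16_t sport, uint16_t dport, uint16_t txid)
{
    return sip == q->dip &&     /* response source = query destination */
           dip == q->sip &&     /* response destination = query source */
           sport == q->dport && /* response src port = query dst port */
           dport == q->sport && /* response dst port = query src port */
           txid == q->txid;     /* transaction IDs must match */
}
-----------------------------------------------------------------------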

Many DNS caching servers implement some additional defenses such as:
  • Additional correlation checks between query and response, such as checking that the "domain name" in the Query section of the query matches the Query and Answer sections of the response: Many attackers get around this defense by sending their own queries with arbitrary domain names and sending the same names in the spoofed DNS responses. Sending the queries themselves gives attackers even more control - they need not wait for subscribers of the DNS caching server to initiate a query, greatly increasing the chance of poisoning the cache. Attackers send queries with unknown subdomains, which certainly triggers a query from the caching server, and they can time their responses to land right after sending a query with an unresolvable, randomly created domain name. One might ask: how is the cache getting poisoned if attackers send random subdomain names? At best this looks like a DoS attack. From the recent attack exploit scripts (script1 and script2), one can understand that the cache poisoning happens not through the information in the Answer section of the DNS response, but through the NS resource records (Authority section) and Additional resource records of the response. Attackers ensure that the domain name in the Question section is the same in both query and response. To understand more about this kind of attack, read section 2.3 (Name Chaining) of RFC 3833.
  • Prevention of birthday attacks - again, attackers overcome this defense by sending a random domain name in each query.
  • Ensuring that the NS record entry contains some portion of the domain name in the Question section, so that an arbitrary NS record is not honored: This defense is also being broken. Attackers create a full domain name in the Question section consisting of a random string followed by the victim domain name, and craft a response with the victim domain name in the Authority section, thereby bypassing the defense. For example, if an attacker wanted to serve his/her own IP addresses for www.veryimportantsite.com, the queries and responses he/she sends to the victim caching server look like this:
-----------------------------------------------------------------------
Query:
QUESTION SECTION:
<randomstring>.www.veryimportantsite.com. IN A

Response:

Question section: same as in the query.

AUTHORITY SECTION:
www.veryimportantsite.com. 6000 IN NS attacker.veryimportantsite.com.

;; ADDITIONAL SECTION:
attacker.veryimportantsite.com. 6000 IN A 2.3.4.5

----------------------------------------------------------------------------

As you can see above, the defenses used by DNS caching servers are not going to work. In the above case, the DNS caching server will use 2.3.4.5 as the authoritative DNS server for the www.veryimportantsite.com domain. For any query to this caching server for www.veryimportantsite.com or any subdomain within it, the 2.3.4.5 server is contacted. Since this server is hosted by the attacker, he/she can provide arbitrary IP addresses for the victim domain names. Basically, the NS list is corrupted with a poisoned entry. Replace veryimportantsite.com with google.com and this has a devastating effect if the DNS caching server belongs to a popular ISP: many users behind that ISP will be directed to a site of the attacker's choosing when they visit www.google.com or any subdomain under it.
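To see why the suffix ("bailiwick") check fails here, consider this minimal sketch of the defense; the helper function and names are illustrative:

-----------------------------------------------------------------------
/* Illustrative bailiwick check: accept an NS record only if the queried
 * name ends with the NS record's owner domain. */
#include <stdbool.h>
#include <string.h>

static bool ends_with(const char *name, const char *suffix)
{
    size_t nlen = strlen(name), slen = strlen(suffix);
    return nlen >= slen && strcmp(name + nlen - slen, suffix) == 0;
}

bool ns_record_in_bailiwick(const char *qname, const char *ns_owner)
{
    /* qname    = "<randomstring>.www.veryimportantsite.com."
     * ns_owner = "www.veryimportantsite.com."
     * The check passes - but the attacker chose qname himself, so the
     * poisoned NS entry for the victim domain is accepted anyway. */
    return ends_with(qname, ns_owner);
}
-----------------------------------------------------------------------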

Many people agree that the right solution is to get rid of UDP and run DNS over TCP, or to use DNSSEC. It is going to take time for everybody to adopt these. Until then, DNS caching servers need to improve their randomization, and security administrators should install security devices to prevent the attack or to get early signs of attempts.

How can security devices help?

Due to the stateful nature of network security devices, a DNS response packet is not accepted by the firewall function if the 5 tuples of the response don't match the 5 tuples of the DNS query. That is, the DNS response packet must have its SIP equal to the DIP of the query, DIP equal to SIP, SP equal to DP and DP equal to SP. In addition, many firewall devices also check for a matching transaction ID. One might think the firewall adds no value here, since many DNS caching servers already do these checks. But the firewall goes one step further: it only allows a DNS response from the security zone to which the DNS query was sent. Also, it is good practice to set up rules that allow DNS traffic only as needed. Typically in Enterprises, only outbound DNS is allowed; if there are authoritative DNS servers inside, DNS requests are allowed only to those particular servers. This reduces the probability of successful poisoning attacks. Let us analyze.

Assume that a company has its DNS caching server in its "Intranet-DMZ" zone and the ISP DNS server, of course, is in the untrusted (External) zone. The Enterprise administrator creates a rule from 'Intranet DMZ Zone' to 'External Zone' on destination port 53, with the destination IP of the ISP DNS server and the "Allow" action. Due to this rule, any DNS queries from the attacker will be dropped. So the attacker can only depend on queries generated by the DNS caching server due to genuine queries from its local users. That dramatically reduces the success of the attack. If there are internal attackers, they need to send both queries and responses.

Some firewall devices come with local DNS caching server (DNS resolver) functionality. In these cases, the device should take the same precautions, such as randomizing the source port and transaction ID while sending DNS queries to uplink DNS servers.

I feel that security devices implementing firewall and IPS can add more defensive measures such as:

1. Many DNS resolvers and caching servers have already been enhanced so that they do not send DNS queries on persistent sessions; that is, each DNS query is sent with a different randomized source port. Security devices can remove the session once the corresponding DNS response is received, and then count the number of DNS responses received for which there is no session entry.

As you can observe from the attack scripts, the attacker sends a large number of responses for every query. Logic that counts the number of orphan responses within a particular quantum of time indicates whether an attacker is trying to exploit the DNS caching server. Security devices can then warn administrators for further analysis.
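A minimal sketch of such orphan-response counting follows; the window size and threshold are arbitrary assumptions and would need tuning per deployment:

-----------------------------------------------------------------------
/* Illustrative orphan-response counter. */
#include <stdio.h>
#include <time.h>

#define WINDOW_SECS      10   /* quantum of time to count over */
#define ORPHAN_THRESHOLD 100  /* alert when exceeded within the window */

static unsigned orphan_count;
static time_t   window_start;

/* Called whenever a DNS response arrives with no matching session. */
void on_orphan_response(void)
{
    time_t now = time(NULL);

    if (now - window_start >= WINDOW_SECS) {
        window_start = now;  /* start a new counting window */
        orphan_count = 0;
    }
    if (++orphan_count > ORPHAN_THRESHOLD)
        fprintf(stderr, "ALERT: possible DNS cache poisoning attempt "
                "(%u orphan responses in %d seconds)\n",
                orphan_count, WINDOW_SECS);
}
-----------------------------------------------------------------------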

2. Once an attack attempt is detected, the security device can send one more query with the same information as the original query and match the responses. If the contents look similar, the response can be cached and sent to the original querier. Of course, this requires some DNS resolution functionality in the security device. Note that this functionality could also be implemented within the DNS caching servers themselves.

3. Security devices implementing NAT should not make guessing the transaction ID and source port any easier. When the NAT port is selected, the device must ensure that it is a random port.
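A minimal sketch of random NAT source-port selection, assuming a Linux-style getrandom() entropy source; a real implementation must also skip ports already in use:

-----------------------------------------------------------------------
/* Illustrative random ephemeral-port selection for NAT. */
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>   /* getrandom(), Linux/glibc */

#define EPHEMERAL_LOW  1024
#define EPHEMERAL_HIGH 65535

uint16_t nat_pick_source_port(void)
{
    uint32_t r;

    /* Use a kernel CSPRNG, never a counter or a seeded rand(). */
    if (getrandom(&r, sizeof(r), 0) != sizeof(r))
        abort();

    return EPHEMERAL_LOW + (r % (EPHEMERAL_HIGH - EPHEMERAL_LOW + 1));
}
-----------------------------------------------------------------------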

Thursday, August 21, 2008

Data Center Firewall features

What makes a firewall a good data center firewall?

Before going further into the features data center IT/security professionals expect, it is good to revisit data centers. Data center providers are mainly hosting providers: they host their customers' applications and machines. Some customers share a machine's resources, some host their applications in a virtual system, and some host their applications on dedicated machine(s)/blades. To provide availability and share load, server applications are installed on multiple machines, with "load balancers" distributing the load across the server farm. As we all know, HTTP/HTTPS servers are by far the most common server applications in data centers. Most of the time, the services provided by hosted servers are meant for the general public.

Increasingly, there is a trend of Enterprises offloading the hosting of Intranet servers to external data center providers. Intranet servers typically provide access to employees and limited access to partners. For example, many email services, SharePoint sites and wikis are being offloaded to data center providers by small and medium Enterprises. Many of these services require user authentication. Enterprises don't like to duplicate user databases across multiple machines/applications, so you also see the trend of a 'central authentication database' shared across internal servers and servers hosted outside. Many web applications provide SAML-based authentication for federated identity. Since web services need to talk to outside identity providers, there can be outbound connections. Note that, traditionally, servers in data centers only see inbound connections.

Enterprise administrators also require facilities to upload content and perform other administrative activities on hosted servers. FTP and SSH are typical services required by administrators; some applications might have a web interface running on port 80/443 for administration. To provide added security beyond user authentication, data center providers like to restrict admin access to particular network(s), typically the Enterprise's own networks.

With more and more services (both Intranet and Extranet) being hosted in external data centers, the need to secure them is high. Collaborative servers/services such as wikis, SharePoint, CRMs and other workflow servers used to be part of Enterprise networks and accessible only to local users. They are being hosted in external data centers to provide access from anywhere for employees, partners, contractors etc., and to reduce the administration headache. Since they are exposed to access from anywhere, they are open to attack, so the need to detect and prevent exploits is much greater than what data centers are used to. A quick look at the vulnerabilities published by NIST (nvd.nist.gov) indicates that SQL/XSS/LFI/RFI injections are on the rise, and that many wikis, blogs and other collaborative applications are targets of attackers.

When Intranet servers are placed in an external hosting provider's network, Enterprises want the communication channels to be secure to protect data from eavesdropping. HTTP over SSL/TLS is one common method used to achieve data confidentiality on the wire. For security devices placed in front of these servers to do a better job of access control, intrusion detection and detection of malicious injections, they need to see the traffic in the clear. To achieve this, security devices should have the capability to decrypt SSL, do traffic/data analysis and, if required, redo the SSL. Since security devices are expected to sit right in front of the servers, there may be no need to redo the SSL; but the important takeaway is that the security device should be able to terminate SSL connections.

For the last few years, many web applications have been using SOA (Service Oriented Architecture), which is built on XML standards. Traditional plain POST requests, JSON and PHP objects are fast becoming a thing of the past. Any security device doing intrusion and data analysis needs to move beyond POST, JSON and PHP objects and start interpreting SOAP and XML.

Data center providers serve many customers, and each customer's security requirements are different: one generic security policy does not fit these environments. You could deploy as many firewalls as there are customers, but that does not scale from a cost, space and cooling perspective. Virtualization in firewall/security devices comes in handy. Virtualization with VMware/Xen does not scale well either; old-style traditional virtualization scales well and suits data center providers.

Since the security device sits in the path of traffic, its performance should be high enough to support the traffic rate the servers/services it is securing can process. Latency, stability, availability and failover capabilities are some more important factors data center providers consider when selecting security devices.

With the above background, it is easy to map out the features data center providers expect from a security device protecting their application and server infrastructure.
  • Access Control: As noted above, access control sometimes needs to go beyond IP addresses and TCP/UDP ports. Some web applications provide administrator and normal-user access via the same TCP/UDP port, so it is not possible to distinguish administrators from normal users by IP address and port alone. Since many data center providers don't want admin access allowed from any IP address, but only from specific networks, access control needs to extend to application-level information such as the URL, query parameters etc. (see the sketch after this list).
  • Intrusion Detection and Prevention at L3-L7: As explained above, traditional intrusion detection systems without web application intelligence will not detect intrusions all the time. Attackers employ many evasions - some at the IP and TCP level, and more at the HTTP protocol level - hence protocol intelligence is required. In addition, with SOA-based web services, intrusion detection systems need the intelligence to extract data from SOAP/XML messages. They also need intelligence about the other common services provided by hosting providers, such as DNS, FTP, SIP etc.
  • SSL Proxy: Network device should be able to terminate the SSL for further analysis on the protocol data.
  • Virtualization: One physical hardware box is expected to support multiple virtual instances to reduce the number of security devices in the deployment. Each virtual instance needs its own security policy configuration and should be as good as a separate physical firewall device. Personally, I don't prefer VMware/Xen/KVM-based virtualization for these environments; I prefer traditional virtualization, where only configuration data and run-time state are instantiated for each context.
  • DDoS attack detection and prevention.
  • Traffic anomaly detection and traffic control.
  • Performance: To achieve multi gigabit speeds, look for hardware architecture which is scalable.
  • Stateful failover and high availability
  • Logging & Auditing capabilities
  • Intuitive central Management system
Optional features: though not required, some data centers might find them useful.
  • Server-side NAC: Provides facilities for user-based access control. NAC performs user authentication and controls access to the different features of an application based on the URL and other fields in the protocol. It also helps in correlating user actions, which can be useful in auditing.
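Here is a minimal sketch of the application-level access control mentioned in the first bullet above (admin URLs allowed only from a management network); the structures, paths and addresses are hypothetical:

-----------------------------------------------------------------------
/* Illustrative URL-level ACL; structures, paths and networks are
 * assumptions for the sketch. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct http_request {
    uint32_t src_ip;      /* client IP, host byte order */
    const char *url_path; /* e.g. "/admin/config" */
};

static bool in_network(uint32_t ip, uint32_t net, uint32_t mask)
{
    return (ip & mask) == (net & mask);
}

bool allow_request(const struct http_request *req)
{
    /* The admin interface shares port 443 with normal users, so the
     * decision must look at the URL, not just the IP and port. */
    if (strncmp(req->url_path, "/admin/", 7) == 0)
        return in_network(req->src_ip,
                          0x0A010000,   /* 10.1.0.0, management network */
                          0xFFFF0000);  /* /16 */

    return true; /* non-admin URLs are open to the general public */
}
-----------------------------------------------------------------------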
My intent here is not to go into much detail, but to give some idea of the features security vendors need to think about when providing security device solutions for the data center market.

Guidelines for defining data models

TR-106 defines data model guidelines for creating new data models. For interoperability and better clarity, I have defined some additional guidelines.

Before you proceed further, please read this.


Guidelines:

Guidelines that are commonly followed today in defining data models:

  • Data type: such as string, base64, integer, unsigned integer, Boolean, char etc.
  • Range: in the case of integer and unsigned integer values.
  • Enumerations: the set of values the parameter can take. Valid for both integers and strings.
  • Min/max length of string: in the case of string and base64 types.
  • Read/RW attributes for parameters
  • Create/Delete for table nodes.

Additional guidelines:

General:

  1. The values of some parameters don't reflect changes made by the ACS. These are called action parameters. Each action parameter should have an associated result parameter. Action parameters are always strings and take the value "apply"; the ACS sets this value to perform the action. These parameters should be defined with an attribute called "ActionParameter", set to 1 for action parameters. The fully qualified name of the associated result parameter should also be given, via an attribute "AssociatedResultParameterName".
  2. For each action parameter, there should be a result parameter. Result parameters are always strings. "success" and "not available" are the two values defined by these guidelines; any other string value indicates an error in performing the action, with the possible error strings defined by the application defining the data model. Whenever the action parameter value is set, the associated result value should automatically be set to "not available". When the application processing completes successfully, the device sets this parameter to "success"; if the processing returns an error, an appropriate error string is set. (See the sketch after this list.)
  3. Always define default values for all parameters, except for the mandatory parameters of table objects.
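A minimal sketch of the device-side action/result flow described in items 1 and 2 above; the function names, and everything beyond the "apply", "success" and "not available" strings, are illustrative assumptions:

-----------------------------------------------------------------------
/* Illustrative handling of an action parameter and its associated
 * result parameter on the device. */
#include <stdio.h>
#include <string.h>

static char result_value[64] = "not available";

static const char *apply_configuration(void)
{
    /* Stub standing in for the real application action.
     * Returns NULL on success, or an application-defined error string. */
    return NULL;
}

/* Called when the ACS sets the action parameter. */
void on_action_parameter_set(const char *value)
{
    if (strcmp(value, "apply") != 0)
        return;  /* "apply" is the only meaningful value */

    /* Setting the action parameter resets the result parameter. */
    strcpy(result_value, "not available");

    const char *err = apply_configuration();
    if (err == NULL)
        strcpy(result_value, "success");
    else
        snprintf(result_value, sizeof(result_value), "%s", err);
}
-----------------------------------------------------------------------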

Table objects:

  1. Identify the parameter that uniquely identifies an instance of the table object, and indicate it with an attribute "KeyParameter". The ACS can check the uniqueness of this value whenever an instance is created.
  2. For each parameter that can't be changed after the instance is created, indicate this in the DM with a "DontChangeAfterCreate" attribute. The key parameter of the table object typically has this attribute set; there could be other parameters too, depending on the application.
  3. Identify all parameters that are mandatory during instance creation, using a "Mandatory" attribute. The ACS can then ensure it sends values for all mandatory parameters in the "SetParameterValues" method that follows the "AddObject" RPC method.
  4. Indicate the default value in the data model if the parameter is not mandatory. This information is needed when an ACS user asks to reset a specific optional parameter to its default value.
  5. For every table object, one "Number Of Entries" parameter must be present as per TR-106. I suggest adding one more parameter, "Number Of Entries Supported", to represent the number of rows the device can hold. Both parameters take integer values and both are read-only. Note that the names of these parameters can differ across data models (table nodes), so they need to be associated with the appropriate table nodes. To associate them with a table object (table node), each parameter should carry an attribute giving the fully qualified table object; I call this attribute "AssociatedTableNode". In addition, two more attributes are needed: "CurrentEntriesIndicator", set on the parameter indicating the current number of entries, and "MaxNumEntriesIndicator", set on the parameter indicating the maximum number of entries the device can hold for the corresponding table.
  6. If the table represents an ordered list, the table object should have a special attribute called "OrderList", along with the name of the associated parameter that indicates the priority within the ordered list; call this attribute "PriorityParameterName". This information is used by the ACS as described in a previous blog here. The priority parameter takes integer values; lower values indicate higher-priority instances.
  7. If the table is an ordered list, it should also have one pair of action and result parameters for revalidation purposes. In addition to the attributes defined earlier for these two parameters, they should carry an "AssociatedTableNode" attribute giving the table object name. The ACS uses these attributes to learn the parameter names with which to revalidate the state in the device.
  8. As described in the section "Nested table objects and special cases" of an earlier blog here, there is sometimes a need to create a table object instance together with at least one instance of a child table. The ACS needs to know about this special relationship so that it generates its screens in such a way that at least one child instance's parameters are taken as part of the parent instance creation screen. This relationship information is also useful for the ACS to validate the configuration before sending it to devices. Each child table of this type should have an attribute "OneInstanceNeededByParent" set to 1.

Wednesday, August 20, 2008

Jericho forum - Is the network security device market dead?

In one of the meetings I participated in a few weeks earlier, a person asked me a very interesting question: will there be any security device market in the future? When I asked him why that question, he referred me to the Jericho Forum. Though I had some idea about the Jericho Forum before, it got me interested in knowing more details.

When I first browsed the forum's publications, I thought the question was fair. At first glance, it appeared that the Jericho Forum was proposing to place security along with the application and data. But after spending a few hours on the position papers, brochure and FAQ, it appears the Forum is not advising people to throw away their firewalls and security devices, but to extend security down to the applications and data. Having said that, the position papers still confuse readers with some inconsistent statements. I think the Jericho Forum did not position its security concerns and the resulting architecture very well, hence the confusion and mischaracterization in the security industry.

The Jericho Forum describes two main challenges: business transactions that tunnel over HTTP/HTTPS, and exploits/malware escaping traditional firewalls/security devices. I would add one more challenge beyond HTTPS: the data itself may not be in the clear - it may be encoded, encrypted or compressed.

It is true that traditional network address/service-level firewalls are not good enough to protect resources from data-level attacks and data misuse. Many applications are being developed on top of port 80/443 (HTTP/HTTPS). Web Services (SOA) architecture is used to develop multiple applications on a single machine with HTTP/HTTPS as the transport. Application service-level filtering is possible only with devices that have HTTP/HTTPS and web services intelligence.

It is also true that much newer malware evades traditional signature detection - either by delivering executables via HTTPS or by constantly morphing to avoid detection. One of the countering techniques, behavioral analysis, requires gathering run-time information - registry modifications, listening ports, outbound connections, files being modified etc. - by running the executable on the appropriate operating system.

Given the challenges described and the positioning, my first impression was that the Jericho Forum was advocating placing all security along with each application on the same machine. It took me a while to get rid of this impression. I guess the term 'de-perimeterization' is confusing. I prefer to think the Jericho Forum is proposing that security at the Enterprise boundary is not good enough, and that security is additionally needed closer to the applications/resources. So there are multiple perimeters, with some perimeters containing a few machines, or even one machine or one application. By the way, traditional firewalls and IPsec VPN devices do a very good job of providing access control to desktop systems based on the type of user, and of providing secure connectivity to other branches of an organization.

Though adding all security functions along with the application on the same machine provides better security, there are complexities:

There could be multiple machines running the same application in cluster mode; in some deployments, hundreds of machines are used to share the load. There, it is wise to move security functions such as "L4-L7 access control" and "intrusion detection" to specialized security devices. It saves CPU cycles on the application servers, and it gives administrators a single point of control for the security functions of an application or set of servers, making management easier. Some security functions, such as terminating wireless connectivity and mobile device management, don't really belong to one specific LOB (Line of Business) application; they need to live outside the application servers.

Having said that, some security functions can't be done well outside the LOB machines - for example, behavioral detection of malware, or anything involving data encrypted or compressed with proprietary algorithms. These are better done on the end systems.

There is a cost to applying some security functions outside the LOB servers. For example, many LOB servers implement security protocols such as SSL, XML Security etc. Any access control device providing control at the XML field level must terminate the SSL connection, authenticate the user, and decrypt and validate the XML documents before doing access control. There is an inherent benefit too: the LOB servers save CPU cycles, since they see clear traffic. But CSOs may have concerns about a network element having access to the clear data. If it is a micro-perimeter, there may be no concern. I guess this is the point the Jericho Forum is driving at: the security perimeter should be as close as possible to the applications and data.

Security device vendors like to make their solutions as generic as possible; they don't like to tie device functionality to one or a few applications. That is where standardization helps. I am happy to see that the Jericho Forum, in its COA (Collaboration Oriented Architecture) position paper, chose SOA and XACML. Both architectures depend heavily on XML messaging. This provides a common understanding for network elements outside the LOB servers, thereby creating an ecosystem of security vendors and application vendors.

Having said that, I feel the LOB applications must still have their own application-level security - authentication, multiple roles, role-based access, auditing etc.

In summary, CSOs need to understand that Enterprise boundary security with traditional network-level firewalls is not good enough to protect data and resources. Application-specific security is a must. Some security functions can be done outside the LOB servers, but the security device must be as close to the LOB servers as possible. So, I don't see the network security device vendor market drying up.