Sunday, May 18, 2008

Proxy based Networking applications - Multicore and developer considerations

Many networking applications, such as Anti Virus, Anti Spam and IPS to name a few, are implemented as proxies in network infrastructure devices. A proxy terminates the connection from the external client and makes its own connection to the destination server. It receives data from the client and the server and sends it to the other end after application-specific processing. To ease development, proxies are implemented on top of sockets in user space. For each client-to-server connection, two socket descriptors are created: one when the connection from the client is accepted, and another when the connection to the server is made.

On unicore processors, it is typical practice to have one process per proxy. The process handles multiple connections using non-blocking sockets, typically with either the poll() or the epoll_wait() mechanism. This allows the process to work on multiple connections at the same time.

Many of the networking applications listed above use proxies in transparent mode. Transparent proxies avoid any changes to the end client and server applications, and also avoid any changes to the DNS server configuration. Systems with transparent proxies intercept the packets from both clients and servers. The forwarding layer of the system (of Linux) is expected to intercept the packets and redirect them to the proxies. Redirection typically happens by overwriting the IP addresses and TCP/UDP ports of the packets in such a way that they reach the proxies running in user space, without developers having to change the TCP, UDP or any other stack component.


The process skeleton looks something like this:

main()
{
     Initialization, daemonize & configuration load.
     Create a listening socket.
     while ( forever until termination )
     {
            epoll_wait();
            Do any timeout processing.
            for ( all ready socket descriptors )
            {
                 if ( listening socket )
                 {
                        accept();
                        Create application specific context.
                        Might initiate the server connection.
                        May add socket fds to epoll list.
                 }
                 if ( socket is ready with new data )
                 {
                        Application specific processing();
                        As part of this, the socket fd may get added to the epoll list again.
                 }
                 if ( socket has space to send more data )
                 {
                        Application specific processing() which sends the data.
                        If more data is to be sent, socket might be kept again in epoll list.
                 }
                 if ( socket has exception )
                 {
                        Application specific processing();
                        Connection may be closed as part of application processing.
                 }
            }
     }
     Do any graceful shutdown activities.
     exit(0);
}

Multicore processors are increasingly being used in network infrastructure devices to increase the performance of the solution. Linux SMP is one of the operating systems popular with developers. What needs to be considered while moving to Multicore processors?

Usage of POSIX compliant threads in the proxy processes:
Create as many threads as there are cores in the processor. Each thread can be given affinity to a core. At times one might create more threads than there are cores in order to take advantage of asynchronous accelerators, such as symmetric and public key crypto acceleration, regular expression search acceleration etc. In those cases a thread might wait for the response from the accelerator. To let the core process other connections in the meantime, multiple threads per core are required.

Mapping of thread to the application processing:  

Developers use different techniques. Some use a pipelining model. In this model, one thread gets hold of the packets for all connections and passes them to the next thread in the pipeline for further processing. The last thread in the pipeline sends the packet out on the other socket. Though this can use multiple cores of the processor, it may not be the ideal choice for all applications. I feel the run-to-completion model is a good choice for many applications. The run-to-completion model is simple. Each thread waits for connections. Once a connection is accepted, the thread does everything the proxy needs to do for it, including sending out the processed data. The structure is similar to the process model, but the loop is executed by each thread. That is, connections are shared across the threads, with each thread processing its own set of connections. Advantages of this approach are:
  • Better utilization of the dedicated caches in the cores.
  • Few or no mutex operations, as one thread does all the processing.
  • Fewer context switches.
  • Lower latency, as it avoids multiple enqueue/dequeue operations to pass packets from one pipeline stage to another.
Load balancing of the incoming connections to the threads: 

There are multiple ways of doing this. One way is to let a master thread accept the connections and hand each one to a worker thread, which does the rest of the processing on that connection. The master thread can use different load balancing techniques to assign the new connection to the least loaded thread. Though this approach is simple and confined to the process, it has some drawbacks. When a large number of short-lived connections are established and terminated in quick succession, the master thread can become the bottleneck. Cache utilization may also suffer, as the master and worker threads might be running on different cores. And since the master thread is not expected to do much other than accepting connections, it does not justify a dedicated core, that is, it is usually not affined to a core of its own.

Another technique is to let each thread listen on its own socket, accept the connections and process them. Traditionally, only one listening socket can be bound to a given IP address and port combination. So this technique uses as many ports as there are threads, and each thread listens on a socket created with a unique port. External clients should not need to know about this: client connections to the server are always made to the one standard port. Hence this technique requires an additional feature in the intercept layer. The intercept layer (typically in the kernel in the case of Linux) is already expected to do IP address translation to ensure the packets are redirected to the proxies. In addition, it can do port translation too. The port to translate to can be chosen based on the load on each port. For example, the port selection can be 'round-robin', or it can be based on the 'least' number of connections on the ports.
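The two port selection policies mentioned above can be sketched as follows. This is a minimal illustration only: the listener structure, the function names and the idea of keeping a per-port connection count in the intercept layer are my assumptions, not something taken from an existing implementation.

```c
#include <stddef.h>

/* Hypothetical per-thread listener record kept by the intercept layer. */
struct listener {
    unsigned short port;   /* private port the worker thread listens on */
    unsigned active;       /* connections currently mapped to this port */
};

/* Round-robin: hand out ports in turn; *cursor remembers the position. */
unsigned short pick_round_robin(const struct listener *l, size_t n, size_t *cursor)
{
    unsigned short port = l[*cursor % n].port;
    *cursor = (*cursor + 1) % n;
    return port;
}

/* Least connections: return the index of the listener with the fewest
 * active connections (the lowest index wins a tie). */
size_t pick_least_loaded(const struct listener *l, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (l[i].active < l[best].active)
            best = i;
    return best;
}
```

Round-robin needs no per-port state beyond the cursor, while least connections requires the intercept layer to be told when connections close so that the counts stay accurate.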

Do all relevant application processing in one proxy:

Network infrastructure devices are complex. At times, several kinds of application processing are required on the same connection. For example, on an HTTP connection the device may be expected to do 'HTTP Acceleration such as compression', TCP Acceleration and 'Attack checks'. If these are implemented as proxies in different processes, latency increases dramatically, as each proxy terminates the connection and makes a new connection to the next proxy. The performance of the system also goes down. Separate proxies certainly have one advantage, which is maintainability of the code. But performance wise, it is better to do all application processing in one single process/thread context.

Considerations for choosing Multicore processor

Cost is certainly a factor. Besides cost, other things to look for are:
  • Frequency of the core is very important: As discussed above, a connection is handled by one thread. Since a thread can execute on only one core at any time, the performance of a connection is proportional to the processor frequency (speed). For proxy based applications, fewer higher frequency cores are a better choice than many low powered cores.
  • Cache: Proxy based applications typically do a lot more processing than typical per-packet applications such as Firewall, IPSec VPN etc. If the cache is larger, more of the instruction memory can be cached. So, the larger the cache, the better the performance.
  • Division of cache across cores: Since threads can be affined to cores, it is good to ensure that the cores do not share the same data cache. Any facility to divide the shared cache into core specific caches would be preferable.
  • Memory mapping of accelerator devices into the process virtual memory: By accessing the hardware accelerator from user space, one can avoid memory copies between user space and kernel space.
  • Hardware based connection distribution across cores: This ensures that the traffic is distributed across cores. The intercepting software in the kernel forwarding layer then need not make any load balancing decisions to distribute the traffic across threads. The intercept layer only needs to translate the port so that the packets go to the right thread.
Other important considerations that are needed for any applications are:
  • Facility in hardware to prioritize the management traffic at ingress level: To ensure that the management application is always accessible, even when the device is under a flood attack.
  • Congestion management in hardware at ingress level: To ensure that buffers are not exhausted by applications that do a lot of processing.
  • Hardware acceleration for crypto, regular expressions and compression/decompression.
Programming considerations for performance
  • Each poll() or epoll_wait() call is expensive, so call it as rarely as possible: Once epoll_wait() returns, read as much data as possible from each ready socket. Similarly, write as much data as possible to each ready socket.
  • Avoid locking as much as possible.
  • Avoid pipelining - Adopt run to completion model.
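The first point, reading as much as possible from a ready socket before going back to epoll_wait(), can be sketched like this. It is a minimal illustration under stated assumptions: drain_fd and set_nonblocking are hypothetical helper names, the 4096-byte buffer is arbitrary, and the application-specific processing is left as a comment.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Read from a non-blocking fd until the kernel has nothing more to give.
 * Returns the total bytes consumed, or -1 on a real error.
 * *eof is set to 1 when the peer has closed its end. */
ssize_t drain_fd(int fd, int *eof)
{
    char buf[4096];
    ssize_t total = 0;

    *eof = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;   /* application-specific processing of buf[0..n) goes here */
            continue;
        }
        if (n == 0) {     /* orderly close by the peer */
            *eof = 1;
            return total;
        }
        if (errno == EINTR)
            continue;     /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total; /* drained: now it is worth calling epoll_wait() again */
        return -1;
    }
}
```

The same loop shape applies on the write side: keep calling write() on a ready socket until everything is sent or the call fails with EAGAIN.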
I hope this helps developers who are building proxy based applications.

Hardware timer block in Multicore processors for network infrastructure devices

Some use case scenarios of timers for different functions of network infrastructure devices are given here.



One of the main challenges with software timers is to ensure that the jitter and latency of packets don't go up while timer block operations are in progress. Packets are delayed, or even dropped, when the CPU takes too long to process some timer block function. Any timer block function that walks through timers in a tight loop affects packet processing if the number of timer elements checked or acted on in that loop is large. The number of elements at which the tight loop starts to disturb packet latency depends on the frequency of the CPU. Depending on the software timer block implementation, different operations require traversing some of the timers. Let us see some of the challenges/problems with software timer modules.
  • Software timers depend on the hardware timer interrupt. In Linux, the timer interrupt occurs every jiffy (typically 1 msec or 2 msec). Due to this, any software timer can have an error of up to one jiffy. If an application requires a smaller error, say in terms of microseconds, then the only method I can think of is to make the timer interrupt occur at microsecond intervals. This may not work on all processors. It adds too much interrupt processing overhead on the cores and reduces the performance of the system. Fortunately many applications tolerate millisecond error in firing the timers, but some applications, such as QoS scheduling on multi-gig links running on general purpose operating systems such as Linux, require finer grained and more accurate timers.
  • Many networking applications require a large number of software timers, as described in the earlier post. This leads to traversing many timers on a per jiffy basis. For example, if an application creates 500K timers/sec, then with a 1 millisecond jiffy there would be 500 timers per jiffy. Every millisecond, it needs to traverse 500 timers and may have to fire all 500 of them. This can take a significant amount of time, depending on how long the application timer callbacks take. If it takes too long, you get packet drops, increased packet latency, or both. Some software implementations maintain the timers on a per core basis. If there are 8 cores, each core may be processing 62 or 63 timers every millisecond. This is the ideal case, but what if the traffic workload causes only a few cores to start the timers? Then only those few cores are loaded with processing the expired timers. Basically, the load may not get balanced across the cores.
  • To reduce the number of timers to traverse on every hardware timer interrupt, software implementations normally use cascaded timer wheels. Such an implementation has different timer wheels for different timer granularities, and when timers are started they go to the appropriate wheel and bucket. Due to this, the bucket of the finest wheel that is processed on a given tick contains only the timers that are due to expire. Though this reduces the number of timers to traverse on every timer interrupt, it may involve moving a large number of timers from one timer wheel to another, as described in the earlier post. This movement of timers may take a significant amount of time and again could be the cause of packet drops and increased latency.
  • If there are periodic timers, or timers that need to be restarted based on activity, software timer implementations spend a good amount of time restarting them.
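To make the cascading cost concrete, here is a minimal sketch of a two-level timer wheel. It is an illustration under stated assumptions rather than the implementation of any particular kernel: the wheel size of 8 slots, the structure names and the 'moved' counter are all mine. Timers due within 8 ticks go straight into the fine wheel; longer timers wait in the coarse wheel and are moved (cascaded) into the fine wheel each time the fine wheel wraps around.

```c
#include <stddef.h>

#define SLOTS 8UL                     /* slots per wheel (assumed; tiny for clarity) */

struct timer {
    struct timer *next;
    unsigned long expires;            /* absolute expiry tick */
    void (*cb)(struct timer *);       /* callback run on expiry, may be NULL */
};

struct wheels {
    unsigned long now;                /* current tick */
    struct timer *fine[SLOTS];        /* one tick per slot */
    struct timer *coarse[SLOTS];      /* SLOTS ticks per slot */
    unsigned long moved;              /* timers cascaded so far */
};

static void push(struct timer **head, struct timer *t)
{
    t->next = *head;
    *head = t;
}

/* Start a timer 'delay' ticks from now (1 .. SLOTS*SLOTS-1). Returns 0 or -1. */
int wheel_start(struct wheels *w, struct timer *t, unsigned long delay)
{
    if (delay == 0 || delay >= SLOTS * SLOTS)
        return -1;
    t->expires = w->now + delay;
    if (delay < SLOTS)
        push(&w->fine[t->expires % SLOTS], t);
    else
        push(&w->coarse[(t->expires / SLOTS) % SLOTS], t);
    return 0;
}

/* Advance one tick. Returns the number of timers fired on this tick. */
unsigned wheel_tick(struct wheels *w)
{
    unsigned fired = 0;
    struct timer *t, *next;

    w->now++;
    if (w->now % SLOTS == 0) {        /* fine wheel wrapped: cascade one coarse slot */
        struct timer **slot = &w->coarse[(w->now / SLOTS) % SLOTS];
        t = *slot;
        *slot = NULL;
        for (; t != NULL; t = next) { /* this loop is the "movement of timers" cost */
            next = t->next;
            push(&w->fine[t->expires % SLOTS], t);
            w->moved++;
        }
    }
    t = w->fine[w->now % SLOTS];      /* everything in this slot is due now */
    w->fine[w->now % SLOTS] = NULL;
    for (; t != NULL; t = next) {
        next = t->next;
        fired++;
        if (t->cb != NULL)
            t->cb(t);
    }
    return fired;
}
```

With a million long-lived timers, the cascade loop in wheel_tick() can touch a very large number of entries in a single tick, which is exactly the kind of tight loop that shows up as packet latency.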
Do hardware timer blocks in Multi-core processors help?  

In my view a hardware timer block can help when your applications demand a large number of timers, periodic timers or very accurate timers. If your application requires 'Zero Loss Throughput', then a hardware block will certainly help, as it removes the CPU cycles that software implementations spend traversing timer lists and moving timers between wheels.

What are the features expected by network infrastructure applications from hardware timer block in Multi-core processors?
  • A large number of timers is expected to be supported, ranging into the millions.
  • A decent number (say 1K) of timer groups is expected to be supported. There are multiple applications running on the cores that require timers. An application that is being shut down, or that is being terminated due to some error condition, should be able to clear all the timers that it had started.
  • Accessibility of timer groups by applications running in different execution contexts. There should be good isolation among timer groups. There should be a provision to program the number of timers that can be added to a timer group, and a provision to read the number of timers currently in a timer group.
    • Applications running in Linux user space 
    • Applications running in Kernel space.
    • Applications running in virtual machines. 
  • Applications should be able to do the following operations. All operations are expected to complete synchronously.
    • Start a new timer: The application should be able to provide
      • Timer identification: Timer group and a unique timer identification within the group.
      • Timeout value (microsecond granularity)
      • One shot, periodic or inactivity timer
      • Priority of the timeout event (upon expiry): This helps in prioritizing timer events with respect to other events such as packets.
      • If there are multiple ways or queues to deliver the timer event to the cores upon expiry, then the application should be able to give its choice of way/queue as part of starting the timer. This helps in steering the timer event to a specific core or in distributing timer events across cores.
    • Stop an existing timer: Stopping the timer should free the timer as soon as possible. One existing hardware implementation of a timer block in a Multi-core processor today has this problem. If the application starts and stops timers continuously, it eventually runs out of memory, because the memory is freed only when the original timeout of each timer is reached. If the timeouts of these timers are in tens of minutes, then the memory is not released for many minutes. A good hardware implementation of a timer block should not have this runaway memory usage in any situation. Timer stop attributes typically involve
      • Timer identification
    • Restart the existing timer.
      • Timer identification
      • New timeout value
    • Get hold of remaining time out value at any time, synchronously by giving 'Timer identification'
    • Set the activity on the timer - This should be very fast, as applications might use it on a per packet basis.
Firewall/NAT/ADC appliances targeting the large enterprise and data center markets would greatly benefit from hardware based timer blocks. Not all hardware timer blocks are created equal. Hence check the functionality and efficacy of the hardware implementation.
  • Measure the latency, packet drop and jitter of the packets over a long time. One scenario that can be tested is given below.
    • Without timers, measure the throughput of 1M sessions by pumping traffic across all sessions using equipment such as IXIA or SmartBits. Let us say this throughput is B1.
    • Create 1M sessions, and hence 1M timers, with a 10 minute timeout value.
    • Pump the traffic from IXIA or SmartBits for 30 minutes.
    • Check whether the throughput stays almost the same as B1 across all 30 minutes. Also ensure that there is no packet drop or increase in packet latency, specifically at the 10, 20 and 30 minute marks.
  • Measure the memory usage:
    • Run a connection rate test with the inactivity timeout of each connection set to 10 minutes.
    • Ensure that upon a TCP Reset or TCP FIN sequence the session is removed, and hence the timer is stopped.
    • Continue this for 10 or more minutes.
    • Ensure that the memory usage did not go up beyond reason.
    • Ensure that timers could be started successfully during the test.



Tuesday, May 13, 2008

Assurance of firewall availability for critical resources : TR-069 support

I guess I have been harping on the fact that network security devices are stateful in nature. Let me say that again here :-) They create session entries for 5-tuple connections. DDoS attacks can consume these resources. There are several techniques used to maximize firewall availability. Some of the ones I discussed before are session inactivity timeouts, TCP SYN flood detection with the SYN cookie mechanism to prevent SYN floods, and connection rate limiting.

The above techniques do not guarantee that legitimate connections are not dropped. The rate throttling feature does not distinguish genuine connections from DDoS connections. But some resources are very important, and access to/from these resources must be available all the time. That is, some assurance of firewall availability for these critical resources is required.

During a DDoS attack or a worm outbreak, systems in the corporate network should have access to the central virus database server to get newer virus updates. Even if some systems in the corporate network are compromised and are participating in DDoS attacks, other systems should continue to have access to critical resources while the problem is being fixed. Similarly, access to corporate servers should be maximized during a DDoS outbreak.

Though all issues can't be solved, enough facilities should be there for assurance of firewall availability for these critical accesses.

Many firewalls today support a feature called 'Session Reservation and Session Limits'. Using this feature, a certain number of sessions can be reserved for individual machines/systems. The feature also limits the number of simultaneous sessions for some non-critical systems/machines.
One use case example: Let us say that a medium enterprise has 500 systems, and that this company bought a UTM firewall with 50000 session entries. The administrator can reserve 20 sessions and set a limit of 100 sessions for each PC. That is, 500 x 20 = 10000 entries are reserved. The remaining 40000 sessions are free for all. When all 40000 shared sessions are used up, the reserved sessions are still available to the PCs. Each PC can use its 20 reserved session entries. Thereby, when there is a DDoS attack, even after the 40000 shared session entries are used up, each of these PCs continues to have access to 20 more session entries. No other system can occupy these reserved sessions.
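The accounting in this use case can be sketched as an admission check. This is a simplified model and not how any particular firewall implements it: the structure and function names are hypothetical, and for brevity a host's first 20 sessions are counted against its reservation rather than against the shared pool. The guarantee is the same either way: a host below its reservation is always admitted, and a host at its limit never is.

```c
#include <stdbool.h>

/* Hypothetical per-host counters derived from the reservation and limit rules. */
struct host_quota {
    unsigned reserve;   /* sessions guaranteed to this host (20 in the example) */
    unsigned limit;     /* maximum simultaneous sessions (100 in the example) */
    unsigned active;    /* sessions currently open */
};

/* Shared part of the session table: total entries minus all reservations. */
struct session_pool {
    unsigned shared_total;   /* 50000 - 500 * 20 = 40000 in the example */
    unsigned shared_used;
};

/* Decide whether a new session from this host may be created. */
bool session_admit(struct session_pool *p, struct host_quota *h)
{
    if (h->active >= h->limit)
        return false;                       /* per-host limit reached */
    if (h->active >= h->reserve) {          /* beyond the guarantee: needs a shared entry */
        if (p->shared_used >= p->shared_total)
            return false;                   /* shared pool exhausted */
        p->shared_used++;
    }
    h->active++;
    return true;
}

/* Give back the entry when a session of this host is removed. */
void session_release(struct session_pool *p, struct host_quota *h)
{
    if (h->active == 0)
        return;
    if (h->active > h->reserve)
        p->shared_used--;                   /* this session was using a shared entry */
    h->active--;
}
```

A flooding host stops at its limit of 100 sessions, and even if the shared pool is completely consumed by other traffic, every PC can still open its 20 reserved sessions.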

The session reservation database is a set of rules. Each rule contains the following information:
  • Rule ID: Identification of the rule.
  • Description: Description of this record.
  • IP Address information: IP addresses to which this rule applies. All the action information in this rule is specific to each IP address.
  • Connection Direction: Outgoing or incoming. Indicates whether the sessions are reserved for connections made by the machines represented by 'IP addresses' or for connections terminated by these IP addresses. 'outgoing' indicates that this rule applies to connections originated by these machines and 'incoming' indicates that it applies to connections coming in to them.
  • Zone: Indicates the zone ID. If 'Connection Direction' is outgoing, then zone indicates the destination zone. If 'Connection Direction' is incoming, then zone indicates the source zone.
  • ReserveCount: Number of sessions reserved for this rule.
The session limits database also contains a set of rules. Each rule contains the following information:
  • Rule ID: Identification of the rule.
  • Description
  • IP address Information: IP addresses to which this rule applies.
  • Connection Direction: Outgoing or Incoming.
  • Zone: Zone ID
  • Limit Count: Maximum number of sessions for each of the IP addresses.

TR-069 data profile:
  • internetGatewayDevice.security.VirtualInstance.{i}.firewall.maxSessionReservationRules: R, Unsigned Int
  • internetGatewayDevice.security.VirtualInstance.{i}.firewall.maxSessionLimitRules: R, Unsigned Int
  • internetGatewayDevice.security.VirtualInstance.{i}.firewall.sessionReservations.{i} PC
    • ruleID: RW, Unsigned Int, Value between 1 and maxSessionReservationRules.
    • description: RW, String(128)
    • ipAddressType: RW, String(32). It takes values such as 'immediate', 'ipobject'. Immediate indicates that IP addresses are given as values and 'ipobject' indicates the IP address information points to one of the IPObjects.
    • ipAddresses: RW, String(64) - If the type is 'immediate', then it can be a single IP address in dotted decimal form, a subnet given as network IP address and numeric prefix length, or a range of IP addresses with '-' between the low and high values. If the type is 'ipobject', then it has one of the ipobject names from the internetGatewayDevice.security.VirtualInstance.{i}.NetworkObjects.IPValueObject.{i} table or the internetGatewayDevice.security.VirtualInstance.{i}.NetworkObjects.IPFQDNObject.{i} table. 'any' is a special value indicating all source IP values. Examples: 10.1.5.10 or 10.1.5.0/24 or 10.1.5.1-10.1.5.254
    • connectionDirection: RW, String(16). It takes values 'outgoing', 'incoming'.
    • zoneID: String(32), RW - One of the Zone IDs. It takes value of ZoneName from internetGatewayDevice.securityDomains.VirtualInstance.{i}.Zone.{i} table.
    • reserveCount: RW, Unsigned Int.
  • internetGatewayDevice.security.VirtualInstance.{i}.firewall.sessionLimits.{i} PC
    • ruleID: RW, Unsigned Int, Value between 1 and maxSessionLimitRules.
    • description: RW, String(128)
    • ipAddressType: RW, String(32). It takes values such as 'immediate', 'ipobject'. Immediate indicates that IP addresses are given as values and 'ipobject' indicates the IP address information points to one of the IPObjects.
    • ipAddresses: RW, String(64) - If the type is 'immediate', then it can be a single IP address in dotted decimal form, a subnet given as network IP address and numeric prefix length, or a range of IP addresses with '-' between the low and high values. If the type is 'ipobject', then it has one of the ipobject names from the internetGatewayDevice.security.VirtualInstance.{i}.NetworkObjects.IPValueObject.{i} table or the internetGatewayDevice.security.VirtualInstance.{i}.NetworkObjects.IPFQDNObject.{i} table. 'any' is a special value indicating all source IP values. Examples: 10.1.5.10 or 10.1.5.0/24 or 10.1.5.1-10.1.5.254
    • connectionDirection: RW, String(16). It takes values 'outgoing', 'incoming'.
    • zoneID: String(32), RW - One of the Zone IDs. It takes value of ZoneName from internetGatewayDevice.securityDomains.VirtualInstance.{i}.Zone.{i} table.
    • limitCount: RW, Unsigned Int.
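A CPE has to turn the 'immediate' form of ipAddresses into something it can match packets against. The following is a minimal sketch of one way to do that, assuming IPv4 only; the function name ip_spec_to_range and the choice of a low/high range as the internal representation are my assumptions and not part of the data profile.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parse a dotted decimal IPv4 address into host byte order. */
static bool parse_ip(const char *s, uint32_t *out)
{
    struct in_addr a;
    if (inet_pton(AF_INET, s, &a) != 1)
        return false;
    *out = ntohl(a.s_addr);
    return true;
}

/* Convert "any", "a.b.c.d", "a.b.c.d/len" or "low-high" into an inclusive
 * range [*lo, *hi] in host byte order. Returns false on malformed input. */
bool ip_spec_to_range(const char *spec, uint32_t *lo, uint32_t *hi)
{
    char buf[65];                            /* String(64) plus terminator */
    char *sep;

    if (strlen(spec) >= sizeof buf)
        return false;
    strcpy(buf, spec);

    if (strcmp(buf, "any") == 0) {
        *lo = 0;
        *hi = UINT32_MAX;
        return true;
    }
    if ((sep = strchr(buf, '/')) != NULL) {  /* subnet: address/prefix length */
        char *end;
        long len;
        uint32_t mask;

        *sep = '\0';
        len = strtol(sep + 1, &end, 10);
        if (end == sep + 1 || *end != '\0' || len < 0 || len > 32)
            return false;
        if (!parse_ip(buf, lo))
            return false;
        mask = (len == 0) ? 0 : UINT32_MAX << (32 - len);
        *lo &= mask;
        *hi = *lo | ~mask;
        return true;
    }
    if ((sep = strchr(buf, '-')) != NULL) {  /* explicit range: low-high */
        *sep = '\0';
        return parse_ip(buf, lo) && parse_ip(sep + 1, hi) && *lo <= *hi;
    }
    if (!parse_ip(buf, lo))                  /* single address */
        return false;
    *hi = *lo;
    return true;
}
```

The 'ipobject' form is not handled here, since it requires a lookup in the IPValueObject or IPFQDNObject tables.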

Common mistakes by TR-069 CPE developers - Tips for testers

There are some mistakes which can go undetected for a long time. Testers and certification agencies need to watch out for those. I have tried to list some of those mistakes in TR-069 based CPE devices.

Problems associated with Instance numbers: It is expected that instance numbers are persistent across reboots. But, this does not seem to happen always. Typical problems observed are:
  • Instance numbers are not stored in persistent memory: If instance numbers are not stored in and retrieved from persistent memory, then after a restart the CPE device may return an instance number it has already used when it handles an AddObject RPC method. The ACS may reject it, and new records will not get created. On a side note: Surprisingly, some ACS systems don't even care to check for duplicate instance numbers.
  • Instance numbers are stored and retrieved, but the relationship with the actual table records is not maintained: That is, the ACS has one view of which instance number maps to which row and the CPE device has a different mapping. This causes problems when values are modified. The ACS thinks that it is modifying parameters of a specific row, but the CPE modifies some other record.
  • Instance numbers are stored and retrieved for records which were created and populated with values, but not saved for unpopulated records: This scenario occurs when AddObject is successful, but the CPE restarts for some reason before the record is populated with values. If the CPE does not restore these instance numbers, the configuration of that record will fail (when the CPE comes back), as the ACS sends the configuration with the instance number returned by the CPE before.
Problems associated with ParameterKey: ParameterKey is also expected to be stored in persistent memory across reboots. Some ACS systems use it as a configuration checkpoint. The ACS learns the state of the CPE configuration by reading the ParameterKey value from the CPE. ACSes use this to figure out the 'configuration diff' between what they think the CPE has and the latest configuration they hold for the CPE. Typically only this difference is sent to the CPE. Due to this, it is expected that ParameterKey is stored in persistent memory along with the rest of the configuration. Typical problems observed are:
  • ParameterKey and configuration are not stored in persistent memory: The ACS reconfigures the device every time it restarts. This is a problem because the device is unavailable until the ACS has configured the box.
  • Configuration is saved and retrieved, but not ParameterKey: Once the device restarts, the ACS thinks that the CPE does not have any configuration and reconfigures the device from scratch. Since the configuration was actually retrieved, there will be many duplication errors. The ACS might get confused and might stop pushing the configuration until an administrator intervenes. In some cases, I observed that an intelligent ACS sets the device to factory defaults and then sends the entire configuration, without any manual intervention from the ACS administrator.
  • Configuration and ParameterKey are saved at different times: This is dangerous. Essentially the mapping between configuration and ParameterKey is broken. When the device comes back, the ACS view and the device view of the configuration are different.
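One way to avoid the last problem is to persist ParameterKey and the configuration as a single unit. Below is a minimal sketch, assuming the CPE keeps its configuration in a file on a POSIX filesystem; the function name, the file layout (ParameterKey on the first line, configuration after it) and the ".tmp" suffix are my assumptions. It writes both values to a temporary file and renames it over the old one, so after a restart the device finds either the old pair or the new pair, never a mix.

```c
#include <stdio.h>
#include <unistd.h>

/* Store ParameterKey and configuration together.
 * Returns 0 on success, -1 on failure (the previous file is left untouched). */
int save_checkpoint(const char *path, const char *param_key, const char *config)
{
    char tmp[512];
    FILE *f;

    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp)
        return -1;                            /* path too long for the buffer */
    if ((f = fopen(tmp, "w")) == NULL)
        return -1;

    /* First line is the ParameterKey, the rest is the configuration. */
    if (fprintf(f, "%s\n%s", param_key, config) < 0 ||
        fflush(f) != 0 ||
        fsync(fileno(f)) != 0) {              /* push the data to stable storage */
        fclose(f);
        remove(tmp);
        return -1;
    }
    if (fclose(f) != 0 || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;                                 /* rename() replaced the old file in one step */
}
```

A device that stores its configuration in a database or in flash partitions would need the equivalent transaction mechanism of that store instead.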
Problems associated with Access Control and Notifications: Access control is one feature CPE vendors forget to provide. Typical problems observed are:
  • No support provided for Access Control: This is one of the important features for managed service providers. If an end user changes important configuration and makes a mistake, debugging may take significant time for the service provider. Due to this, service providers would like to let subscribers change only specific parts of the configuration. Without this support in the device, that is difficult to achieve.
  • Support provided for Access Control and notifications only for some variables, but not all.
  • CPE sends notifications for variables that were not asked for.
Many of these problems, I believe, are the result of a mismatch in configuration parameters between the local configuration methods and the TR-069 method. For example, the web interface provided on the CPE might have been developed long before the TR-069 based remote configuration method was implemented. Some CPE developers forget to integrate their web interface backend functionality with TR-069. In some cases, they do a poor job of integration.

Cloud computing security is going to catch up.

Please see this article in InformationWeek.

Google and IBM are teaming up together to provide cloud services. Google is already providing email and storage services and they want to go beyond that.

One interesting thing that was mentioned in the article is
"With the exception of security requirements, "there's not that much difference between the enterprise cloud and the consumer cloud," Google CEO Eric Schmidt said earlier this month during an appearance in Los Angeles with IBM chief Sam Palmisano."

One more quote from the article:
"The cloud has higher value in business. That's the secret to our collaboration."

Another thing I observed in the article is their planned usage of Xen.

Putting all of this together:

  • Cloud computing requires security. Otherwise, Enterprises may not be able to offload their servers to cloud.
  • Cloud computing makes use of Virtualization.

I gave some choices in my earlier blog on *Cloud computing and Security*. The InformationWeek article does not give enough information on how the security services are going to be offered, but I expect they will start thinking about it soon.

I am beginning to think that both kinds of models which I suggested earlier would be used.

  • Flexibility for Enterprises to put their preferred vendor security products as virtual appliances.
  • Providing security using one mega security appliance.

My prediction is that a mega security appliance will be required to provide typical infrastructure security, and that virtual appliance flexibility will be provided for specialized security.

Sunday, May 11, 2008

Packet processing applications - Updating to Multicore processors

I described the need for session parallelization and some tips on programming here. There are many network packet processing applications which don't take advantage of SMP systems. I have tried to describe the steps required to convert these applications to run on SMP systems with as few modifications as possible.

Target packet processing applications:





  • Packet processing applications that maintain sessions (flows), such as firewall, IPsec VPN, Intrusion Prevention Software, Traffic Management etc.
  • Packet processing applications where a significant number of cycles is spent on each packet. The session parallelization technique uses some additional checks and queuing (enqueue and dequeue) operations on the packets in the session. It only works if the number of CPU cycles required for packet processing is much larger than the CPU cycles it takes for a few checks and queue operations. Otherwise, you are better off taking locks during packet processing. For example, an IP forwarding packet processing application can be parallelized using the session parallelization technique.

Let us examine the typical packet processing application requirements.





  • Upon the first packet of a session, they create a context, the session context. These session contexts are kept in some kind of run time data structure such as hash buckets, linked lists, arrays or some other container. I call it the Home base structure.
  • Subsequent packets of the session get the session context from the home base data structure.
  • Session contexts are deleted either by timer, by user actions or due to some special packets.
  • Session contexts are typically 'C' structures with multiple members (states):
    • Some members are set during session context creation time and never changed. I call them "Session Constants".
    • Some members are manipulated during packet processing and those values are needed by subsequent packets. These are not required by any function other than packet processing. I call them "Session Packet Variables".
    • Some members are manipulated both by packet processing and in non-packet processing contexts. I call those "Session Mutex Variables".
    • Some variables are used as containers by other modules. That is, storage and retrieval are the responsibility of those modules. I call them "Session Container Variables".
  • Any packet processing application might contain multiple modules, and each module has its own sessions (control blocks). To improve performance and for other reasons, modules at times store references to other modules' control blocks in their sessions. Likewise, the current module's session might be referred to by other modules. When the session is actively used by other modules through a pointer to it, the session must not be freed until all active references are released. At the same time, the delete operation can't be postponed until every other module has dropped its reference. To satisfy both requirements, each control block typically contains two variables - "Reference Count" and "Delete flag". Reference Count is incremented when external modules store a reference to the session. When the session is being deleted (due to inactivity timer, special packets, user actions etc.) and it is still referenced, the module sets the "Delete flag" of the session. Any search ignores sessions that have the "Delete flag" set. The module is also expected to inform the external modules that the session is being deleted. Upon this notification, external modules are expected to remove their reference and, as part of that, decrement the reference count of the session. The session is freed only after all references are removed and the Delete flag is set.
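The member classification above can be sketched as a 'C' structure. This is only an illustration: the firewall-style fields, the `ipsec_sa_ref` container member, and the helper names are assumptions made for the example, not taken from any particular product.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical firewall-style session context; all names are illustrative. */
struct session {
    /* Session Constants: set at creation time, never changed. */
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  protocol;

    /* Session Packet Variables: touched only by packet processing. */
    uint32_t next_expected_seq;
    uint64_t packet_count;

    /* Session Mutex Variables: shared with non-packet contexts (timers, CLI). */
    uint32_t inactivity_timeout_secs;

    /* Session Container Variable: stored and retrieved by another module. */
    void *ipsec_sa_ref;

    /* Lifetime management shared by all modules holding a reference. */
    uint32_t ref_count;
    bool     delete_flag;
};

/* Fill in the constants and zero everything else. */
static void session_init(struct session *s, uint32_t src_ip, uint32_t dst_ip,
                         uint16_t src_port, uint16_t dst_port, uint8_t protocol)
{
    memset(s, 0, sizeof(*s));
    s->src_ip = src_ip;
    s->dst_ip = dst_ip;
    s->src_port = src_port;
    s->dst_port = dst_port;
    s->protocol = protocol;
    s->inactivity_timeout_secs = 60; /* arbitrary default for the sketch */
}

/* A search must skip sessions that are already marked for deletion. */
static bool session_is_usable(const struct session *s)
{
    return !s->delete_flag;
}
```

Grouping the members by class in the structure definition makes it easier to see later which ones need a lock and which do not.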

Process to follow to make applications SMP aware (Example: Applications running in Kernel space)

  • Identify the Session or Control block of the target module.
  • Define two additional variables in the control block - Reference Count and Delete flag.
  • Identify the home base structure. Since the sessions can be created in multiple CPU contexts, define the lock for this structure. It is better to define Read/Write lock. Read lock can be taken when the session is searched in home base data structure. Write lock needs to be taken while adding or deleting the control block to/from the home base data structure. Ensure to increment the reference count while adding to home base structure. Remove it from the home base structure only if reference count is 1 and Delete flag is set to TRUE.
    • Sometimes the home base could be some other module's control block. In this case, it is the responsibility of the other module (Container module) to store, retrieve and reset this control block reference atomically.
    • Always ensure to initialize the control block completely before adding it to the home base data structure.
    • Ensure that control block reference count is incremented within the home base lock for both add and search operations.
  • Identify session constants. Nothing needs to be done for these for SMP.
  • Identify Session Packet Variables. There is no need to lock these variables if "Session Parallelization" technique is employed.
  • Identify Session Mutex variables. Further identify the logical groupings of these variables. For each of these logical groupings:
    • Define a lock. This lock can be a global lock or a session lock. Session locks are better for performance, but they require more memory.
    • Ensure to define MACROS/inline/functions to manipulate and access variables of the group.
  • Identify container variables. If this control block is home base structure for other module control blocks, then define MACRO/inline/functions to add/retrieve/reset. These macros would be used by other modules.
    • Define this set of MACROS for each container variable.
    • Define a session lock for each container variable. Above MACROS use this lock to protect the integrity of the container value.
    • Since the ADD macro keeps a reference to the foreign module's control block, ensure that the reference count of the foreign control block is incremented. To allow this, the ADD macro should expect a function pointer to be passed to it by the foreign module. The function pointed to by this function pointer is used to increment the reference count.
    • Expect the "Increment Reference Count" function pointer passed to the RETRIEVE macro. Since RETRIEVE macro returns the pointer of foreign module, it is necessary to increment the reference count to ensure that pointer is not freed prematurely.
    • Ensure that complete functionality of ADD/RETRIEVE/RESET MACROS work under the lock defined for each container variable.
  • Some guidelines on Reference Count and Delete Flag variables.
    • Define "IncRefCount", "DecRefCount", "SetDelete" and "GetDelete" macros.
    • Define a lock to manipulate and access "Reference Count" and "Delete flag" variables.
    • Use this lock only in above macros/inline functions.
    • Use above macros for manipulating and accessing these variables.
    • Do the first level of cleanup of the control block when the "SetDelete" macro is called. This cleanup involves informing external modules of its intention of going down and also removing any foreign module references. Foreign references are the ones this module stored as part of its packet/session processing. Removing a foreign reference involves decrementing the reference count of the foreign control block and invalidating the reference in this control block.
    • Do the final cleanup, such as freeing the memory of the control block, only when "Reference count" is 1 and "Delete flag" is set to TRUE. Before freeing the memory, remove it from the home base structure. This condition typically arises as part of the "Decrement Reference Count" macro.
    • In non-SMP scenarios, the reference counts are incremented when external modules store the reference of the session in their modules. In SMP scenarios, while one processor is processing the packets of a session, another processor can delete the session. Due to this, the reference count must be incremented even while packet processing is happening. That is, whenever a "Search" for the session is done, or when the session is being retrieved from a container module, the reference count must be incremented. As indicated above, retrieval of the session, including incrementing the reference count, must be done under either the data structure lock or the container lock. Since incrementing the reference count happens within its own lock, you would see a lock within a lock in this scenario. That is, the reference count lock is taken with the data structure or container lock held. This is OK, as the data structure or container lock is never taken under the reference count lock, even in cases where the session is added to or removed from the data structure/container.
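The reference count and delete flag rules above can be sketched in user-space C. This is a minimal illustration under stated assumptions, not kernel code: a small C11 spinlock stands in for the kernel locking primitives (and for the read/write lock recommended for the home base), the hash table is a fixed-size array, and all function names are hypothetical. The caller of `session_set_delete()` is assumed to hold a reference obtained from `session_find()`.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal spinlock standing in for kernel locking primitives. */
typedef struct { atomic_flag flag; } spinlock_t;

static void spin_lock(spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
        ; /* busy-wait */
}

static void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

#define TABLE_SIZE 64

struct session {
    uint32_t key;         /* session constant */
    spinlock_t ref_lock;  /* protects ref_count and delete_flag only */
    uint32_t ref_count;
    bool delete_flag;
    struct session *next; /* hash chain, protected by table_lock */
};

static struct session *table[TABLE_SIZE];            /* home base structure */
static spinlock_t table_lock = { ATOMIC_FLAG_INIT };
static int sessions_freed;                           /* demonstration only */

/* Initialize completely, then publish; the home base holds one reference. */
static struct session *session_add(uint32_t key)
{
    struct session *s = calloc(1, sizeof(*s));
    if (s == NULL)
        return NULL;
    s->key = key;
    atomic_flag_clear(&s->ref_lock.flag);
    spin_lock(&table_lock);
    spin_lock(&s->ref_lock);
    s->ref_count++;                       /* reference owned by the table */
    spin_unlock(&s->ref_lock);
    s->next = table[key % TABLE_SIZE];
    table[key % TABLE_SIZE] = s;
    spin_unlock(&table_lock);
    return s;
}

/* Search under the home base lock and take a reference before dropping it. */
static struct session *session_find(uint32_t key)
{
    struct session *s;
    spin_lock(&table_lock);
    for (s = table[key % TABLE_SIZE]; s != NULL; s = s->next) {
        bool taken = false;
        if (s->key != key)
            continue;
        spin_lock(&s->ref_lock);          /* lock within lock, this order only */
        if (!s->delete_flag) {
            s->ref_count++;
            taken = true;
        }
        spin_unlock(&s->ref_lock);
        if (taken)
            break;
    }
    spin_unlock(&table_lock);
    return s;
}

/* SetDelete: later searches ignore the session. */
static void session_set_delete(struct session *s)
{
    spin_lock(&s->ref_lock);
    s->delete_flag = true;
    spin_unlock(&s->ref_lock);
}

/* DecRefCount: final cleanup when only the table reference remains. */
static void session_put(struct session *s)
{
    struct session **pp;
    bool last;
    spin_lock(&s->ref_lock);
    s->ref_count--;
    last = (s->ref_count == 1 && s->delete_flag);
    spin_unlock(&s->ref_lock);            /* never hold it across table_lock */
    if (!last)
        return;
    spin_lock(&table_lock);
    for (pp = &table[s->key % TABLE_SIZE]; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == s) {
            *pp = s->next;
            break;
        }
    }
    spin_unlock(&table_lock);
    free(s);
    sessions_freed++;
}
```

Note the lock ordering: `session_find()` takes the reference count lock while holding the home base lock, whereas `session_put()` releases the reference count lock before it touches the home base, so the two locks are never taken in the opposite order.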

Saturday, May 10, 2008

DDOS Mitigation functionality - Tips for Admin

There is some discussion going on in focus-ids mailing list on DDOS attack mitigation. That discussion prompted me to write this article. ISIC, UDPSIC, TCPSIC, ICMPSIC are some tools used to measure the network security device effectiveness of detection and mitigation of DDOS attacks. As we all know one of the main intentions of DDOS attacks is to make the service, network or target unavailable. By looking at the packets, you can't see the difference between normal genuine packets and packets generated by DDOS attacks. This makes it difficult to stop these attacks based on signature based methods.

One property of many DDOS attacks is that they try to make the source of the attack difficult to discover. Two types of DDOS attacks are common.
  • Spoofing of source IP address in the packets: DDOS attacks are generated by spoofing the source IP address of the packet. ISIC, UDPSIC, TCPSIC and ICMPSIC tools simulate these kinds of attacks. Any packet that is sent back to the source does not reach the attacker. Due to this, TCP based sessions don't get established. Note that non-TCP sessions don't have connection establishment phase.
  • Botnets : The attacker instructs the agents which were installed on compromised hosts across the globe to bombard the target. Attacker keeps changing the hosts that attack the target. Thereby, in effect making the source discovery ineffective.
Now, in addition to these two, there is a third kind of DDOS attack. The recent DDOS attack on cnn.com by Chinese hackers is one example. Here too, sources are known, but there are many. Note that the attack on cnn.com was not generated by botnets, but supposedly by individuals acting out of patriotism. Some in China felt that CNN was biased in its reporting on the Olympic torch and its linkage to religious freedom in Tibet. As I understand it, it was a simple attack that connects to the cnn.com website and accesses some URL every 2 seconds. The attack executable was distributed, and many home users in China, out of patriotism, executed it.

DDOS attack incident detection may be easier, but mitigation is difficult. If the intention of the attack is to consume the bandwidth of target site, there is nothing much the target network administrator can do. Target company/organization needs to depend on its ISP to block the flood of packets. Gathering as much information as possible and providing that information to ISP is one of the things the administrators can do.

The current trend of DDOS attacks goes beyond consuming the link bandwidth. With fewer hosts participating in the DDOS attack, these attacks consume the CPU and memory bandwidth of target networks/servers. I feel that network security appliances providing DDOS attack mitigation functionality can help in this scenario. They can not only provide detection, but can also stop the bombardment of servers.

There are multiple products in the market, called *DDOS mitigators*, claiming to solve some of the above problems. Many IPS boxes also support this feature.

If you are hosting some servers, you can be a victim. As an administrator, I look for the following features in these appliances.

A DDOS attack can consume a 1Mbps link by making approximately 512 connections/sec. Ideally, any DDOS mitigator should therefore be able to process 512 connections every second for a 1Mbps link. If each connection is maintained for 20 seconds (which is typical), then the connection capacity needs to be about 10K (512 x 20 = 10,240). For a 100Mbps link, the DDOS attack mitigation appliance needs to support 51,200 connections/sec and should have a session capacity of about 1M (51,200 x 20 = 1,024,000). With this capacity and connection rate, it can do a better job of protecting internal networks/servers/other stateful security devices without itself getting bogged down.
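The sizing arithmetic above can be written down as a small helper. The figure of 512 connections/sec per 1Mbps and the 20-second session lifetime come from the text; the function names are made up for this sketch.

```c
#include <stdint.h>

/* From the text: roughly 512 new connections/sec can saturate 1 Mbps. */
#define CONNS_PER_SEC_PER_MBPS 512u

/* Connection rate the mitigator must sustain for a given link speed. */
static uint64_t required_conn_rate(uint32_t link_mbps)
{
    return (uint64_t)link_mbps * CONNS_PER_SEC_PER_MBPS;
}

/* Concurrent sessions = arrival rate x how long each session is kept. */
static uint64_t required_session_capacity(uint32_t link_mbps,
                                          uint32_t session_lifetime_secs)
{
    return required_conn_rate(link_mbps) * session_lifetime_secs;
}
```

For example, `required_session_capacity(100, 20)` gives 1,024,000 sessions, which is the "1M session capacity" quoted for a 100Mbps link.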

DDOS mitigators are expected to limit the amount of traffic that goes to the internal servers/machines/networks etc. Each resource in the network has some limitations on how much traffic, how many connections and how many connections/sec it can take. Admins, once they make a list of resources and their limitations, should be able to configure DDOS mitigators with them. DDOS mitigators must ensure that the resources are not flooded and should shape the traffic accordingly. DDOS mitigators need to provide features like:
  • Ability to configure
    • connections/sec
    • Packets/sec
    • Bytes/sec
      • On per resource basis - Server/machine basis, Network basis
      • From a given source with respect to IP address range, Subnet.
  • Ability to configure filtering of traffic on any combination of the 5-tuple fields.
As you have observed, admins would not only like to shape the traffic to internal resources with respect to connections/sec, maximum number of connections and throughput, but would also like to apply these limits to particular source(s). At times, I also see a requirement to limit the amount of traffic within each 5-tuple connection, or between any IP address combination. Mitigators need to provide this flexibility without expecting admins to create many rules. Many times, it is not possible to create rules with all combinations of IP addresses. DDOS mitigators need to provide the flexibility of creating rules with ranges and subnets, along with a provision to configure the granularity at which the specified traffic rates are applied. For example, an admin should be able to configure ANY to 'Internal HTTP Server IP addresses' with 10 connections/sec for every combination of source IP and destination IP. If 100 different sources are trying to access internal HTTP Servers, DDOS functionality should be able to rate limit the number of connections to 10/sec for each of the sources independently.
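A minimal sketch of the "one rule, per-pair granularity" idea: a single configured limit is enforced independently for every source IP and destination IP combination. The fixed one-second window, the open-addressing table, and all names are assumptions for the illustration; a real mitigator would also age out idle entries and would likely use a smoother algorithm such as a token bucket.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SLOTS 1024

/* One counter per (source IP, destination IP) combination. */
struct pair_counter {
    uint32_t src, dst;
    uint64_t window;   /* the second this count belongs to */
    uint32_t count;
    bool     used;
};

struct rate_limiter {
    uint32_t limit_per_sec;            /* taken from the single rule */
    struct pair_counter slots[SLOTS];
};

static void limiter_init(struct rate_limiter *rl, uint32_t limit_per_sec)
{
    memset(rl, 0, sizeof(*rl));
    rl->limit_per_sec = limit_per_sec;
}

/* Returns true if a new connection from src to dst may pass at now_sec. */
static bool limiter_allow(struct rate_limiter *rl, uint32_t src, uint32_t dst,
                          uint64_t now_sec)
{
    uint32_t h = (src * 2654435761u) ^ (dst * 40503u);

    for (uint32_t i = 0; i < SLOTS; i++) {
        struct pair_counter *c = &rl->slots[(h + i) % SLOTS];

        if (!c->used) {                /* first connection for this pair */
            c->used = true;
            c->src = src;
            c->dst = dst;
            c->window = now_sec;
            c->count = 0;
        }
        if (c->src != src || c->dst != dst)
            continue;                  /* collision: probe the next slot */
        if (c->window != now_sec) {    /* a new second starts a new budget */
            c->window = now_sec;
            c->count = 0;
        }
        if (c->count >= rl->limit_per_sec)
            return false;
        c->count++;
        return true;
    }
    return false;                      /* table full: fail closed */
}
```

With a limit of 10, the eleventh connection from one source within the same second is refused, while a different source still gets its own budget of 10.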

As with any security device, it must also support multiple zones and provide flexibility with respect to zones. In case of hosting environments, provider may be servicing multiple customers. So, virtual instance, with each instance belonging to a customer is needed. In case of Enterprise environments, normally only one virtual instance would be used.

Flexibility is expected to be provided to disable limiting of traffic for some source networks. These networks could be networks belonging to remote offices. This feature is called white listing.

Of course, DDOS mitigators are also expected to provide facilities to stop half-open connections through TCP SYN flood protection, UDP based session exhaustion protection, facilities to configure service inactivity timeouts for interactive protocols etc.

Thursday, May 8, 2008

UDP Broadcast Relay : TR-069 Support

UDP broadcast relay functionality became very popular due to NetBIOS. Broadcast packets are used by NetBIOS for name resolution. Windows Network neighborhood is one functionality that makes use of NetBIOS name service. Due to broadcast functionality, NetBIOS name service works within subnet. If there are multiple subnets, then WINS Server is required. Broadcast relay functionality in routers separating subnets eliminates the need for WINS Servers. Name resolution using UDP broadcast relay function can even be extended to networks in remote offices by relaying broadcast packets over VPN tunnels.

UDP broadcast relay functionality in routers receives broadcast packets and sends them to other subnets, replacing the destination IP of the original packet with the broadcast address of the destination subnet.
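The destination rewrite can be sketched as follows. The broadcast address of a subnet is the network address with all host bits set; the `udp_packet` structure and function names are simplifications invented for this example, and addresses are held in host byte order.

```c
#include <stdint.h>

/* Only the fields relevant to relaying; a simplification for the sketch. */
struct udp_packet {
    uint32_t dst_ip;    /* host byte order */
    uint16_t dst_port;
};

/* Netmask for a prefix length of 0..32. */
static uint32_t prefix_mask(unsigned prefix_len)
{
    return prefix_len == 0 ? 0u : 0xFFFFFFFFu << (32u - prefix_len);
}

/* Directed broadcast address: network bits kept, host bits all ones. */
static uint32_t subnet_broadcast(uint32_t addr, unsigned prefix_len)
{
    uint32_t mask = prefix_mask(prefix_len);
    return (addr & mask) | ~mask;
}

/* Copy of the packet with its destination replaced for the target subnet. */
static struct udp_packet relay_to_subnet(struct udp_packet in,
                                         uint32_t subnet, unsigned prefix_len)
{
    in.dst_ip = subnet_broadcast(subnet, prefix_len);
    return in;
}
```

A NetBIOS name query received as a broadcast on 192.168.1.0/24 and relayed to 192.168.2.0/24 would leave with destination 192.168.2.255 and its UDP port unchanged.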

Since firewall/VPN gateways are also routers, this functionality is implemented in many firewall/VPN gateways. These gateways provide control for administrators on type of broadcast packets to relay and destination subnets to relay to. Multiple of these rules can be created for different types of broadcast addresses.

Configuration consists of a set of rules. Each rule contains an incoming broadcast IP address/interface and the relay subnets/interfaces. Rules can be created and deleted by the administrator.

TR-069 profile:
  • internetGatewayDevice.security.VirtualInstance.{i}.UDPBroadcastRelay.{i} PC
    • name: String(32), RW, Mandatory - Identification of broadcast relay rule. Once rule is created, this value can't be changed.
    • description: String(128), RW, Optional
    • enable: Boolean, RW, Mandatory: Value 1 indicates the record is enabled and 0 is used to disable.
    • incomingBroadcastAddressType: String(16), RW, Mandatory - Indicates whether the broadcast address is represented as an IP address or Interface identifier. Takes one of the values "ipaddress", "interface".
    • incomingbroadcastAddress: String(128), RW, Mandatory - Either dotted IP address or Fully qualified TR-069 instance of VLAN, LANDevice, WANPPPConnection or WANIPConnection etc.
    • incomingbroadcastPort: Integer, RW, Mandatory - Destination Port of incoming broadcast packet.
    • internetGatewayDevice.security.VirtualInstance.{i}.UDPBroadcastRelay.{i}.relayTo.{i} PC
      • relayBroadcastAddressType: String(16), RW, Mandatory - Indicates whether the relayTo broadcast address is specified as an IP address or an interface. Takes one of the values "ipaddress", "interface".
      • relayBroadcastAddress: String(128), RW, Mandatory - Either dotted IP address or fully qualified instance of interfaces from VLAN, LANDevice, WANPPPConnection or WANIPConnection.
In case of remote subnets, relayBroadcast is specified as remote subnet broadcast IP address. If the subnets are directly attached to the router, then interface names can be used in relayBroadcastAddress field.

Monday, May 5, 2008

Packet Ordering requirements in network infrastructure devices

One of the goals of Internet is to maintain the packet ordering.  This goal requires that  infrastructure devices don't change the order of packets. That is, ingress packets from a port go in same order on egress ports after they go through the processing.

There are many types of infrastructure devices that sit in the path of packets.  Some infrastructure devices now do not only routing or switching, but also many other functions such as deep packet inspection, firewall, application detection, IPS, IPsec VPN etc.  So, it becomes difficult for network infrastructure devices to keep up with this requirement.  Based on my understanding of some deployments, this requirement is indeed relaxed.  My understanding of packet ordering requirements now is:

  •  If the infrastructure device is pure router or bridge (switch),  it is expected that all ingress packets from a port go out in the same order on egress ports.  Routers and switches might not be sending all ingress packets from a port to one egress port. That is, there is no one-to-one correspondence between ingress port to egress port.  Packet order is expected to be maintained across the ingress packets from a port which are going to an egress port.  That is, if set of packets from port1 are going to port2, then it is expected that the order of this set of packets is maintained. Routers and switches are not expected to maintain packet order across the packets which are going to different egress ports.  
  • It is difficult to find any router/switch without a traffic prioritization function on the egress port.  Routers classify packets into different priority bands based on DSCP value, and switches do this based on either the DSCP value or the COS value found in 802.1q headers.  The traffic prioritization function sends higher priority packets before the lower priority packets.  So, the packet ordering requirement does not extend to packets belonging to different priorities. But the packets from an ingress port belonging to the same priority and going to an egress port must go out in the order they were received.
  • Firewall, IPS, DPI and other stateful applications work on 5-tuple sessions.  Here the packet ordering is expected to be kept intact within a session.  There is no requirement to keep the ordering across sessions. This works fine for VOIP and other real-time traffic scenarios, where it is important to keep jitter low. Since jitter buffering is done on a per session basis by VOIP end points, ensuring that packet order is not changed within a session seems fine.
  • IPsec is tunneling protocol.  One tunnel may carry many sessions.  Since sessions are not visible in the tunnel, it is required that IPsec function maintains the packet order within each security association (tunnel).  
Based on some comments from some service providers, I got an understanding that 0.001% of packets going in different order is acceptable.
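One common way to keep order within a 5-tuple session on a multicore device, while letting different sessions proceed independently, is to hash the 5-tuple and always hand packets of the same session to the same FIFO work queue. The sketch below is an assumption about how this might be done, not something prescribed above; the hash is deliberately simple and symmetric so that both directions of a session land on the same queue.

```c
#include <stdint.h>

struct five_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  protocol;
};

/* Direction-independent hash: XOR of the endpoints, then a mixing step. */
static uint32_t flow_hash(const struct five_tuple *t)
{
    uint32_t h = t->src_ip ^ t->dst_ip;
    h ^= (uint32_t)(t->src_port ^ t->dst_port) << 16;
    h ^= t->protocol;
    h *= 2654435761u;                 /* multiplicative mixing */
    return h ^ (h >> 16);
}

/* Same session -> same queue, so a FIFO queue preserves its packet order. */
static unsigned pick_queue(const struct five_tuple *t, unsigned num_queues)
{
    return flow_hash(t) % num_queues;
}
```

Packets of different sessions may overtake each other across queues, which matches the relaxed requirement described for stateful applications.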

Any comments?

    Thursday, May 1, 2008

    Configuration Synchronization between TR-069 devices and ACS

    Most of the times, configuration for TR-069 based devices is done at the provider end (ACS end). ACS pushes the configuration when device connects to it. TR-069 also provides facility for subscribers to change the configuration locally, but providers have control over which functions of device can be changed by subscriber. 'SetParameterAttributes' RPC method defined by TR-069 specification carries the 'Access control' attributes of parameters.

    When subscribers are allowed to change the configuration and no caution is taken, the configuration view can become vastly different between device and ACS over time. That might lead the people administering the device to wrong conclusions. Fortunately, TR-069 provides a way for the ACS admin to have the ACS notified of changes made by local subscribers. As with 'Access Control', notification attributes are also sent using the SetParameterAttributes RPC method. If synchronization of configuration is required, the ACS must set notification attributes on the parameters which the local subscriber is allowed to change.

    It appears that notification of configuration changes works smoothly between devices and ACS for modifications of existing parameters. But it runs into rough weather when a local subscriber adds a record to a table object or deletes a record from a table object. Based on my limited experience with different devices and ACS systems, it seems that this case was not taken care of well, and TR-069 is also not helping in this regard.

    Let us examine different approaches that are possible.
    • Approach 1: Definition of one parameter 'Number Of Entries' for every table object. ACS can set the notification attribute on this parameter. Whenever local subscriber adds/deletes an instance from the table object, this parameter gets incremented/decremented. Since notification is set on this, ACS gets informed. It has performance disadvantage. When 'Number of Entries' parameter is changed, ACS gets notified with new value of this parameter. Now it is responsibility of ACS to find the difference between its configuration and configuration in the device. That might require walking through the table in device and making modifications to its database. Another disadvantage is that, this approach does not work if there is any existing data profile that has table objects without this special parameter 'Number of Entries'.
    • Approach 2: Setting the notification attributes on the 0th instance: It is observed that many devices don't use the 0th instance for records in table objects. This instance can be used to set the notification. If notification is set on the 0th instance, it can be treated as a request to report addition/deletion of records to/from the table. Whenever a record is added or deleted by the subscriber, the device needs to check the notification on the 0th instance. If it is set, it can send the instance number of the newly created or deleted record as the parameter, with value 1 for 'Add' and 2 for 'Delete', as part of the 'ParameterList' of the Inform method. By this, the ACS knows the instance number of the newly added or deleted record. In case of a newly added record, it can issue a 'Get' operation on the specific instance number to read the values of its parameters. In case of a deleted record, it can remove the record from its database.
    Approach 1 does not require any changes to the TR-069 specification, but has performance issues if the number of instances in a given table is high. Approach 2 is clean, but requires an addition to, or clarifications in, the TR-069 specification.
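Approach 2 could be sketched on the device side as below. This is one possible reading of the proposal, not something defined by TR-069: the parameter name is the table path followed by the instance number, the value is 1 for 'Add' and 2 for 'Delete', and the table path used in the usage note is hypothetical.

```c
#include <stdio.h>

/* Values proposed in the text for the notification entry. */
enum table_change { TABLE_CHANGE_ADD = 1, TABLE_CHANGE_DELETE = 2 };

/* One name/value pair destined for the Inform ParameterList. */
struct param_entry {
    char name[256];
    char value[16];
};

/*
 * Build the entry for a row added to or deleted from a table object.
 * table_path is assumed to end with '.'. Returns 0 on success, -1 if the
 * resulting name does not fit.
 */
static int build_table_change(struct param_entry *out, const char *table_path,
                              unsigned instance, enum table_change change)
{
    int n = snprintf(out->name, sizeof(out->name), "%s%u", table_path, instance);

    if (n < 0 || (size_t)n >= sizeof(out->name))
        return -1;
    snprintf(out->value, sizeof(out->value), "%d", (int)change);
    return 0;
}
```

For a hypothetical table `InternetGatewayDevice.X_Example.Acl.Rule.`, adding instance 5 would produce the name `InternetGatewayDevice.X_Example.Acl.Rule.5` with value `1`; the device would emit it only if notification is set on instance 0 of that table.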

    If anybody chooses to use TR-069 based device management for Enterprise devices, I strongly suggest going with Approach 2.

    TR-069 Protocol and applicability in Network security devices - Opinion

    Recently a security software developer asked me a question. He wanted to know whether TR-069 protocol is suitable for managing network security devices. Further he wanted to know what enhancements I would like to see in TR-069 protocol, if any.

    I am sure that quite a bit of thought had gone into defining the RPC methods and their usage. It is certainly easier to create new data models and software associated within device and ACS implementations when you compare with SNMP based management.

    TR-069 is certainly suitable for network security application configuration. But there are some points developers should be aware of.

    AddObject method:
    I have difficulty understanding the intention behind the definition of the 'AddObject' method in the TR-069 protocol. The 'AddObject' method does not take any parameter values. To add a row to a table, the ACS needs two methods - AddObject followed by SetParameterValues. This brings complications and additional logic into TR-069 implementations in security devices.

    Configuration of many security functions such as firewall, IPsec, AntiX and IPS consists of multiple rules (rows). Each row has its own identification, and the value of this identification parameter (let us call it the Instance Key) is typically configured by administrators. The identification parameter must be unique within the table. Some configuration databases identify the rows by integers and some by strings. Once the record is created, security software does not allow changes to the identification value. Of course, other parameters of the rule (row) are allowed to be changed.

    TR-069 defines its own identification for each row in the table called instance number. Instance number is of integer type. Instance number is returned in AddObject response. This can't be used to identify the record in a table as far as security applications are concerned. I guess this instance number is mostly for TR-069 protocol. Since record name/ID (Key) is not part of AddObject, record can't be created in the security applications upon reception of AddObject by the device. It needs to wait until SetParameterValues method is sent by ACS with the record identification value.

    Though this is not a major hurdle to using TR-069 for configuring security applications, it could have been avoided if AddObject allowed setting parameter values.

    I don't see any reason why a separate instance number is required as it is defined in TR-069. Avoiding the instance number and replacing it with the application Instance Key provides many advantages:

    • Device does not need to maintain the state between "Add Object" and "Set Parameter Values" RPC methods to create the instance (row) in the applications.
    • Device complexity is reduced, as it no longer needs to maintain the mapping between instance numbers and application instance key values, even across reboots, to maintain consistency between devices and ACS.
    • ACS does not need to maintain the mapping of instances with each device's instance numbers. Since ACS is expected to manage thousands of devices, maintaining this run time mapping information limits the ACS scalability.

    I would like to see an enhancement to the TR-069 protocol that avoids the instance number approach for rows. As a matter of fact, local management engines don't have special instance numbers for each row of tables. One of the parameters itself is used as the key to identify the instances in the table. With that in mind, I would like to see the following approach.

    • One of the parameters in each table object in data model can be identified as the instance key. It requires only one parameter to identify the instance at that level due to tree structure of the data model and nested table objects. Table object is already identified uniquely due to upper (parent) table objects. Due to this, one parameter is good enough to identify the instance within the table object. By having one of data model parameters as the key parameter, ACS also can identify the row by this parameter value.
    • Today configuration transaction is limited to one SetParameterValues method. Adding a row is outside the transaction today. To make the adding instance also as part of transaction, row creation along with its parameters and values should be made part of SetParameterValues RPC method and eliminate AddObject RPC method.
    • Any modifications to the instance at later time need to send the key value along with the parameter names. Existing instance number position in the parameter name can be replaced with the key value.

    Mandatory parameters in one RPC method:

    Security applications make some configuration parameters mandatory. Some of these configuration parameters can have default values, but not all. My experience shows that many mandatory parameters don't have any default values. Due to this, security application software typically expects the values of all mandatory parameters as part of record creation. As discussed before, the actual object creation in the security software module is done when the first 'SetParameterValues' RPC method is received (after AddObject). So, the ACS is expected to send a SetParameterValues method with all mandatory parameters and their values. The TR-069 protocol does not specify any rules in this regard. Due to this, ACS systems have the choice of sending these parameters in multiple RPC methods. That complicates the device implementation, which would need to wait until all mandatory parameters are received. This is not practical, as the device does not know when to give up waiting for the mandatory parameters. I believe there is an understanding that ACS systems always send most of the parameter values (including those of mandatory parameters) in one RPC method.

    Since this is not documented, you might often hear from your QA guys that it is not working as expected. I wish the TR-069 specification stated this clearly. By the way, if the approach indicated in the above section (under AddObject method) is implemented, then this is a moot point.
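A device that expects all mandatory parameters in the first SetParameterValues after AddObject might check the request as sketched below and reject it otherwise. The structure and function names, and the parameter names in the usage note, are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One name/value pair as received in a SetParameterValues request. */
struct param {
    const char *name;
    const char *value;
};

static bool has_param(const struct param *params, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(params[i].name, name) == 0)
            return true;
    return false;
}

/* Returns the first missing mandatory name, or NULL if none is missing. */
static const char *first_missing_mandatory(const struct param *params, size_t n,
                                           const char *const *mandatory,
                                           size_t m)
{
    for (size_t i = 0; i < m; i++)
        if (!has_param(params, n, mandatory[i]))
            return mandatory[i];
    return NULL;
}
```

If a rule requires, say, `name`, `enable` and `action`, a request carrying only `name` and `enable` would be rejected with `action` reported as missing, instead of the device waiting indefinitely for a later RPC.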

    Default Values for Parameters

    TR-069 data model guidelines don't clearly specify that all optional parameters of table objects and normal parameters must have default values.

    Many security devices and I am sure other devices also have requirement of resetting to factory defaults upon user action. In addition, some devices provide option for administrators/users to set all/some optional parameters to default values on existing configuration. To facilitate this, I think TR-069 data model definition must mandate setting up the default values for non-mandatory parameters.

    Factory Defaults

    Many device applications are shipped with default configuration. Many devices also support resetting the device to factory defaults via both hardware and software means. TR-069/TR-104 does not specify the ways to define the factory defaults. Due to this, when device resets its configuration to factory defaults, ACS does not know the configuration of device until it reads the configuration from the device by traversing through the data model template tree.

    An additional benefit of defining factory defaults in a standard fashion is that the device can perform the factory reset from a central place (TR-069 client in device) without letting each application do its own factory reset.

    Validations

    The TR-069/TR-104 data model definition facilitates parameter value validations such as data type, possible enumeration values, minimum and maximum length for string and base64 data types, and value ranges for integers. However, there are no validation definitions for the number of instances that can be created. Many applications limit the number of instances on a per table object basis. Making this number known to the ACS allows the validation to happen at the ACS itself. Note that this limit might differ from one device type to another. Some generic software applications tune this number based on the amount of memory available on the device on which they are installed. Sometimes this limit is also configurable by the local administrator.

    TR-104 mandates the definition of a "Number of Instances" parameter for each table object. In addition, mandating one more parameter, "Max Instances Supported", for each table object would solve the above problem.
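
    A minimal sketch of how an ACS could use such a parameter to validate before contacting the device. The class and function names are assumptions made for this example; only the two parameter concepts come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class TableObject:
    """ACS-side view of one table object (illustrative names)."""
    name: str
    max_instances: int                      # proposed "Max Instances Supported"
    instances: list = field(default_factory=list)

    @property
    def number_of_instances(self):
        # Corresponds to the existing "Number of Instances" parameter.
        return len(self.instances)

    def can_add(self, count=1):
        return self.number_of_instances + count <= self.max_instances


def acs_add_instance(table, instance):
    """Reject the request at the ACS if the device limit would be exceeded."""
    if not table.can_add():
        raise ValueError(
            f"{table.name}: limit of {table.max_instances} instances reached")
    table.instances.append(instance)
```

    Since the limit may differ per device type or be tuned locally, the ACS would read it from each device rather than hard-code it.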

    Nested Table Objects and Transactions

    The TR-069 protocol dictates that all parameters within a "Set Parameter Values" method are either set in their entirety or rejected altogether. For all practical purposes, a configuration transaction is limited to one instance of the RPC method. Device implementations ensure this by first calling the "test" function of each application for all configuration parameters and then setting the configuration in the applications only if all "test" functions succeed - basically tests followed by sets, as in the SNMP world. The "test" functions of applications have their own limitations, especially when parameters of a nested table object instance are checked while the parent table object instance is not yet available in the application, even though it is present in the same transaction. Since "set" functions are expected to be called only after all "test" functions succeed, the parent table instance would not yet have been created in the application. Hence the "test" function of the nested table object instance would return FAILURE. To avoid complicated "test" functions, I feel that TR-069/TR-104 should give a guideline for ACS developers to avoid setting parameters of nested table object instances along with parent table object instances in the same RPC method if it is followed by appropriate "AddObject" methods.
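
    The failure described above can be reproduced with a toy model. This is only a sketch under the assumption that an application's "test" function looks at committed state alone; the class, operation names and tuple layout are invented for the example.

```python
class FirewallApp:
    """Toy application that keeps committed state only (no staging)."""

    def __init__(self):
        self.rules = {}  # rule_id -> {"networks": {net_id: value}}

    def test(self, op):
        kind, rule_id = op[0], op[1]
        if kind == "add_rule":
            return rule_id not in self.rules
        if kind == "add_network":
            # The parent rule must already exist in the application.
            return rule_id in self.rules
        return False

    def set(self, op):
        kind, rule_id = op[0], op[1]
        if kind == "add_rule":
            self.rules[rule_id] = {"networks": {}}
        elif kind == "add_network":
            net_id, value = op[2], op[3]
            self.rules[rule_id]["networks"][net_id] = value


def apply_transaction(app, ops):
    """Run every test first, then every set; reject all if any test fails."""
    if not all(app.test(op) for op in ops):
        return False
    for op in ops:
        app.set(op)
    return True


app = FirewallApp()
combined = [("add_rule", 1), ("add_network", 1, 1, "10.0.0.0/8")]
combined_ok = apply_transaction(app, combined)          # parent not committed yet
parent_ok = apply_transaction(app, [("add_rule", 1)])   # split into two calls
child_ok = apply_transaction(app, [("add_network", 1, 1, "10.0.0.0/8")])
```

    In this model the combined transaction is rejected while the two separate transactions succeed, which is the behaviour the proposed guideline works around.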

    Nested Table Objects and Special cases

    Even though a nested table instance can never exist without its parent table object instance, in some cases instances of a table object must be created with at least one instance of a child table object. This is true specifically of security rules such as firewall ACL rules and IPsec SPD rules. ACL and IPsec SPD rules can be represented as table objects in the data model. These rules contain 5-tuple selectors in which source IP and destination IP are each represented by multiple sets of network objects. That is, the source IP (and destination IP) field of the ACL rule is a table object representing many networks. The administrator is allowed to add more networks to the "source IP" and "destination IP" tables later, once the ACL rule is created. But when the ACL rule is created, the security application expects at least one network in the "source IP" and "destination IP" fields of the ACL rule object.

    Today the data model does not allow this kind of relationship to be expressed. As a result, ACS developers don't know about this dependency. If the ACS does not take care of it, the device will reject the rule creation every time.

    The data model definition needs to allow this special relationship to be specified. One simple proposal is to indicate, for each table object, whether at least one instance of it is needed to create an instance of its immediate parent object.
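
    A small sketch of how an ACS might use such an indication. The flag name, the child table names and the helper function are all hypothetical; the data model has no such attribute today.

```python
from dataclasses import dataclass


@dataclass
class ChildTableSpec:
    name: str
    required_on_parent_create: bool   # the proposed per-table indication


# Hypothetical description of the child tables of an ACL rule object.
ACL_RULE_CHILDREN = [
    ChildTableSpec("SourceIP", True),
    ChildTableSpec("DestinationIP", True),
    ChildTableSpec("TimeWindow", False),
]


def missing_children(child_specs, requested_children):
    """Names of required child tables that have no instance in the request.

    requested_children maps a child table name to the list of instances
    the administrator supplied when creating the parent.
    """
    return [spec.name for spec in child_specs
            if spec.required_on_parent_create
            and not requested_children.get(spec.name)]
```

    The ACS could refuse to send the rule creation while this list is non-empty, instead of letting the device reject it.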

    Ordered Lists

    Rules in many security applications such as firewall, IPsec VPN, Web application firewall etc. are ordered. That is, the rules in the table object are searched from the top rule to the bottom rule for a match, and hence the rules must be ordered in the list with the top rule being the highest priority rule and the bottom rule being the lowest priority rule. In the course of administration, administrators normally need to add rules in the middle of the list, at the top of the list, or at the bottom of the list, or change the priority of rules in the list. TR-069 does not provide any RPC methods to change the priority of the rules. Because of this, data models defining ordered lists are expected to have a 'priority' parameter. The administrator changes the value of this parameter to change the priority of a rule. Devices are expected to form the ordered list based on the value of the priority parameter across the instances in the table object. Though this works fine where the configuration changes rarely, it imposes performance penalties on the device. Let us get into the details.

    • Security applications revalidate run time states every time there is a change in the rule list. Revalidation can be costly in systems where millions of states are created. For example, firewall applications in the Enterprise/Carrier market segments support more than a million sessions, where each session corresponds to a TCP/UDP/IP connection. If the rules are modified or their order is changed, the firewall application is expected to revalidate the sessions and remove sessions that are no longer valid with respect to the new rule base.

    If a rule needs to be added in the middle of the list, the priority value of every rule below the new rule needs to be lowered by at least 1 to accommodate it. That is, if there are 500 rules and one rule is added at position 250, the bottom 250 rules undergo a change. This means 250 additional parameters are modified, which results in 250 changes to the rule base in the device. So there would be 250 revalidations of millions of sessions. To avoid this performance penalty, each ordered list (table object) needs one parameter outside the table object, "Revalidate Now". The ACS sets this parameter after all the priority parameter values in the SetParameterValues method. When this parameter is set to 1, the device is expected to initiate the revalidation process. This avoids running the revalidation logic in the device for each rule change. The parameter value is not expected to be persistent. Its purpose is to initiate the revalidation action. Reading this parameter should always give the value 0, even after it is set.
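
    The cost difference can be sketched with a toy device-side table. The class and method names are invented for illustration, and for simplicity the sketch assumes the priority value is simply the position in the list with 1 at the top; the direction of numbering does not affect the number of rules that change.

```python
class RuleTable:
    """Device-side ordered table. Assumption: priority 1 is the top rule."""

    def __init__(self, rule_names, deferred):
        self.priority = {n: i for i, n in enumerate(rule_names, start=1)}
        self.deferred = deferred      # True if "Revalidate Now" is supported
        self.revalidations = 0        # number of full session sweeps

    def _revalidate(self):
        self.revalidations += 1       # stands in for checking every session

    def set_priority(self, name, value):
        self.priority[name] = value
        if not self.deferred:
            self._revalidate()        # today: one sweep per changed rule

    def set_revalidate_now(self, value):
        if self.deferred and value == 1:
            self._revalidate()        # proposal: one sweep per batch

    def get_revalidate_now(self):
        return 0                      # action parameter, never persistent

    def ordered(self):
        return sorted(self.priority, key=self.priority.get)


def insert_after(table, new_rule, position):
    """Insert new_rule below the rule at `position`; return shifted count."""
    to_shift = [n for n, p in table.priority.items() if p > position]
    for name in to_shift:
        table.set_priority(name, table.priority[name] + 1)
    table.set_priority(new_rule, position + 1)
    table.set_revalidate_now(1)       # sent last, after all priority values
    return len(to_shift)


names = [f"rule{i}" for i in range(1, 501)]
legacy = RuleTable(names, deferred=False)
proposed = RuleTable(names, deferred=True)
shifted = insert_after(legacy, "new", 250)
insert_after(proposed, "new", 250)
```

    With 500 rules and an insert below rule 250, `shifted` is 250. The legacy table performs 251 sweeps (250 shifted rules plus the new rule itself), while the table that waits for "Revalidate Now" performs one.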

    The ACS also needs to know which objects are ordered in nature, and the name of the parameter used to order the list. If the ACS does not have this information, the administrator is forced to change the priority value of all 250 rules in the above example manually. That is not something administrators (users) would enjoy. With this information, the ACS can change the priority values itself in response to simple user actions and communicate with the device appropriately. At the end of any action on the ordered list, the ACS should set the "Revalidate Now" parameter. To facilitate this, the data model definition should identify the ordered lists. My proposal is to introduce an attribute "Ordered list" for table objects. This attribute will not be present for non-ordered table objects.
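
    As a sketch of what the ACS could do once it knows a table is ordered and which parameter orders it, the function below turns one user action ("insert at this position") into the parameter list of a single SetParameterValues call. The paths, the `Priority` and `RuleRevalidateNow` names, and the convention that the ordering value equals the 1-based position are assumptions made for the example.

```python
def build_insert_request(table_path, order_param, revalidate_path,
                         instance_ids, new_id, position):
    """Build (name, value) pairs for one SetParameterValues call.

    instance_ids: current instance numbers in list order, top first.
    position: 1-based slot the new instance should occupy.
    Only instances whose slot actually changes are included.
    """
    old_slot = {inst: i for i, inst in enumerate(instance_ids, start=1)}
    new_order = (instance_ids[:position - 1] + [new_id]
                 + instance_ids[position - 1:])
    params = []
    for slot, inst in enumerate(new_order, start=1):
        if old_slot.get(inst) != slot:
            params.append((f"{table_path}.{inst}.{order_param}", slot))
    # "Revalidate Now" lives outside the table object and goes last.
    params.append((revalidate_path, 1))
    return params


request = build_insert_request(
    table_path="Device.Firewall.Rule",                 # hypothetical path
    order_param="Priority",                            # hypothetical name
    revalidate_path="Device.Firewall.RuleRevalidateNow",
    instance_ids=[1, 2, 3, 4],
    new_id=5,
    position=3,
)
```

    The administrator only picks a position; the renumbering and the trailing "Revalidate Now" are generated by the ACS.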

    The above data model guidelines work fine with the existing TR-069 protocol. Future TR-069 enhancements need to consider a different approach for ACS scaling by introducing new commands in the "SetParameterValues" RPC method. As explained above, the SetParameterValues RPC method also needs to take an 'Add' command in addition to the implicit 'Modify' command supported today. My proposal is to introduce a "Move" command for changing the order of instances in the table. The "Move" command takes the identification value of the target row and a command attribute such as "first", "last", "before" or "after". If the command attribute value is "before" or "after", it also takes the identification value of the relative row. "First" puts the target row at the beginning of the list. "Last" puts the target row at the end of the list. "Before" places the target row immediately before the relative row and "After" places it immediately after the relative row. This proposal eliminates the need for a "priority" parameter. It also reduces the need to send multiple parameters when a row is added in the middle of the list. Note that the "Revalidate Now" parameter still needs to be present if more than one row is added or more than one "move" is initiated by the administrator.
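
    The semantics of the proposed "Move" command can be sketched as a pure function over the list of row identifiers. This is only an illustration of the four command attributes described above; the function name and error handling are assumptions, and no such command exists in TR-069 today.

```python
def move(rows, target, where, relative=None):
    """Return a new list with `target` moved per the proposed "Move" command.

    where: "first", "last", "before" or "after" (case-insensitive).
    relative: the relative row, required for "before" and "after".
    """
    where = where.lower()
    if target not in rows:
        raise ValueError(f"unknown target row: {target!r}")
    if where in ("before", "after"):
        if relative is None or relative not in rows or relative == target:
            raise ValueError("'before'/'after' need a different, existing relative row")
    elif where not in ("first", "last"):
        raise ValueError(f"unknown command attribute: {where!r}")

    remaining = [r for r in rows if r != target]
    if where == "first":
        index = 0
    elif where == "last":
        index = len(remaining)
    else:
        index = remaining.index(relative) + (1 if where == "after" else 0)
    return remaining[:index] + [target] + remaining[index:]


rows = ["a", "b", "c", "d"]
moved_first = move(rows, "d", "first")          # ["d", "a", "b", "c"]
moved_before = move(rows, "d", "before", "b")   # ["a", "d", "b", "c"]
```

    A single "Move" replaces the whole run of priority updates that an insert in the middle of the list would otherwise need.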

    Log Export

    One of the main features of network security devices is logging. Detection/protection of internal resources is as important as notifying administrators of any violations. When central management is used to configure network security devices, the central management console is also expected to provide facilities to show alerts and notifications, analyze logs and generate reports. The TR-069 protocol does not specify any protocol elements for sending log messages to the ACS.

    You may need to devise your own protocol (such as syslog over IPsec) for sending logs. You also need to work with ACS vendors to correlate the logs with configuration.