Random technical bits and thoughts: nginx

Showing posts with label nginx. Show all posts

Friday, January 1, 2010

nginx - Hashing scheme

ngx_http_upstream module is using ngx_hash module for searching the HTTP headers specific function pointers to take appropriate action. Intially when I had gone through it ngx_hash.c, I was confused due to the differences between this and the way I have used hash lists before. I have used hash lists for dynamic insertion, deletion in addition to search. For supporting all these operations, I am used to develop hash lists that contain set of buckets, with each bucket pointing to one of different types of linked lists or binary trees. Any operation starts with finding out the 'hash' value (typical hash algorithm used in networking world is Bob-Jenkins algorithm), get hold of head of list/tree and then do operation such as addition, deletion or search. In case of addition, a node is created and add to the list/tree. In case of delete operation, node is deleted from the list/tree of the bucket. In case of search operation, it return the pointer to the node.

nginx hash, though in principle is same as typical hash lists, but it has significant differences. It appears that 'ngx hash' is not meant for applications that add and remove elements dynamically. They are specifically designed to hold set of 'init time' elements arranged in hash list. Main purpose seems to be speeding up the search of one time added elements. 'ngx_hash' utility module provides mainly two functions : ngx_hash_init() and ngx_hash_find(). I did not find any hash destroy function. It may be that it is never need to be destroyed at run time. As part of process exit, Linux process anyway reclaims all the memory.

Little bit deep into the hash list implementation, it is allocating memory for hash table of type 'ngx_hash_t' which has elements 'buckets' and 'size of hash table'. Bucket is double pointer. It first allocates memory for bucket array, that is, memory to hold pointers, that is first dimension of two dimensional array. Size of this array is equal to the hash size. Second dimension to this array needs to point to all collision elements. As I said before, all elements that are put in the hash list are known while creating the hash list itself. No dynamic addtion or deletion is possible here. So, it allocates memory to hold all the elements across hash lists at one time. Then it puts right offsets to appropriate places in the buckets array. Each element is of type 'ngx_hash_elt_t', which has 'name' (key), lengh of the key and pointer to application specific value information. ngx_hash_init() is relatively complicated function where it determines the hash list size needed to allocate buckets array and then it figures out the size of memory required to hold the hash list elements. ngx_hash_find is relatively simple where it finds the right bucket from the hash value and searches for the collision elements in array form.

ngx_hash module also has some logic to search for wild card based strings where wild card is present only at the head or tail of strings. I will do some writeup on that at later time.

Wednesday, December 30, 2009

Red black tree library - nginx

See wikipedia link on rb tree description. Red black tree provides improvement over binary trees that makes it faster to search and also faster to insert/delete nodes. It applies four principles as described in wiki link to balance the tree after every addition and deletion. These principles make it search performance predictable. These rebalancing principles are not expensive and hence insert and delete operations are also faster.

Nginx web server uses rb trees to cache SSL sessions across threads, to store the open file cache entries and for many other purposes.

Nginx rbtree implementation does not include all binary tree operations. It leaves the binary tree insertion to the applications. It does rebalancing of the tree and also it provides delete operation API function. It lets the application to do the 'search' on binary tree. Note that search operation on rbtree is same as search operation on binary tree.

Though implementation of 'rbtree' implementation looks complicated (mainly if you have not gone through the algorithm before), its usage by application is simple. Applications need to define the root and put the node in the structure elements which it wants to put in the rbtree.

Following type definition of the rbtree root is defined by 'rbtree' implementation:

struct ngx_rbtree_s {
    ngx_rbtree_node_t     *root;
    ngx_rbtree_node_t     *sentinel;
    ngx_rbtree_insert_pt   insert;
};

root : This is root of the binary tree.
setinel: This node is created to empty node for leaves. See wikilink to understand the usage of this. This is really dummy node memory.
insert: This is function pointer to application function which does insertion into the binary tree.

struct ngx_rbtree_node_s {
    ngx_rbtree_key_t       key;
    ngx_rbtree_node_t     *left;
    ngx_rbtree_node_t     *right;
    ngx_rbtree_node_t     *parent;
    u_char                 color;
    u_char                 data;
};

ngx_rbtree_node_s is expected to be defined in each element that application needs to make it part of rbtree.

Key: This is the key. Applications typically put the hash value of the actual application key in here. This is used to traverse to left and right based on key value. If two nodes have same key value, then the entire application key is used to make decision.   I guess this is done to improve performance. Application key can be long. Integer based comparisons are faster.

left, right, parent: Are used to maintain the node in the tree. left and right point to the children and parent point to the parent node.

color: Color of the node - Red or black.

data: Start of application data. It is not used by rbtree.

API functions:

ngx_rbtree_init(tree, s, i) : This macro initialize the rbtree instance. 'tree' is pointer to the 'rbtree' instance of type ngx_rbtree_t. 's' is pointer to sentinel node and 'i' is pointer to the insert function. This is the first macro that needs to be called by application to initialize the rbtree instance.
void ngx_rbtree_insert(ngx_thread_volatile ngx_rbtree_t *tree, ngx_rbtree_node_t *node) : This function is called by application to put a new node in the rbtree. This function internally calls the insert function provided to ngx_rbtree_init before to insert the node as per application key in the binary tree. Then ngx_rbtree_insert function continues to do the rebalancing of the tree.
void ngx_rbtree_delete(ngx_thread_volatile ngx_rbtree_t *tree, ngx_rbtree_node_t *node): This function is called to remove 'node' from the rbtree 'tree'. It removes the node from the binary tree and reblances it as per rbtree principles.

Tuesday, December 29, 2009

nginx - Module Documentation - Some notes

I was going through the nginx module development guide. While I was going through the example module code, I did not understand how the module registers itself to the nginx infrastructure. For example, the upstream module has following structure initializations.

static ngx_command_t ngx_http_upstream_commands[] = {

    { ngx_string("upstream"),
      NGX_HTTP_MAIN_CONF|NGX_CONF_BLOCK|NGX_CONF_TAKE1,
      ngx_http_upstream,
      0,
      0,
      NULL },

    { ngx_string("server"),
      NGX_HTTP_UPS_CONF|NGX_CONF_1MORE,
      ngx_http_upstream_server,
      NGX_HTTP_SRV_CONF_OFFSET,
      0,
      NULL },

      ngx_null_command
};

static ngx_http_module_t ngx_http_upstream_module_ctx = {
    ngx_http_upstream_add_variables,       /* preconfiguration */
    NULL,                                  /* postconfiguration */

    ngx_http_upstream_create_main_conf,    /* create main configuration */
    ngx_http_upstream_init_main_conf,      /* init main configuration */

    NULL,                                  /* create server configuration */
    NULL,                                  /* merge server configuration */

    NULL,                                  /* create location configuration */
    NULL                                   /* merge location configuration */
};

ngx_module_t ngx_http_upstream_module = {
    NGX_MODULE_V1,
    &ngx_http_upstream_module_ctx,         /* module context */
    ngx_http_upstream_commands,            /* module directives */
    NGX_HTTP_MODULE,                       /* module type */
    NULL,                                  /* init master */
    NULL,                                  /* init module */
    NULL,                                  /* init process */
    NULL,                                  /* init thread */
    NULL,                                  /* exit thread */
    NULL,                                  /* exit process */
    NULL,                                  /* exit master */
    NGX_MODULE_V1_PADDING
};

ngx_http_upstream_commands and ngx_http_upstream_module_ctx are referred in ngx_http_upstream_module.   So far so good. I expected ngx_http_upstream_module to be referred in some other structure or I expected this to be registered with infrastructure via some API function. I did not find it being referred anywhere.

Finally I figured out that ./configure shell script with the help of /auto/modules shell script is creating a C file called ngx_module.c.    See this shell script bit in /auto/modules file.

cat << END                                    > $NGX_MODULES_C

#include
#include

$NGX_PRAGMA

END

for mod in $modules
do
    echo "extern ngx_module_t $mod;"         >> $NGX_MODULES_C
done

echo                                          >> $NGX_MODULES_C
echo 'ngx_module_t *ngx_modules[] = {'        >> $NGX_MODULES_C

for mod in $modules
do
    echo "    &$mod,"                         >> $NGX_MODULES_C
done

cat << END                                    >> $NGX_MODULES_C
    NULL
};

END

Above shell script is creating C file with array of all module structure variables. When I executed ./configure, content of the file looked like this. Since this file is big, I have taken few lines out of this file to illustrate the output.

#include
#include
extern ngx_module_t ngx_core_module;
extern ngx_module_t ngx_errlog_module;
extern ngx_module_t ngx_conf_module;
extern ngx_module_t ngx_events_module;
extern ngx_module_t ngx_event_core_module;
extern ngx_module_t ngx_select_module;
extern ngx_module_t ngx_poll_module;
extern ngx_module_t ngx_http_module;
extern ngx_module_t ngx_http_core_module;
extern ngx_module_t ngx_http_upstream_module;
extern ngx_module_t ngx_http_static_module;
extern ngx_module_t ngx_http_autoindex_module;

ngx_module_t *ngx_modules[] = {
    &ngx_core_module,
    &ngx_errlog_module,
    &ngx_conf_module,
    &ngx_events_module,
    &ngx_event_core_module,
    &ngx_select_module,
    &ngx_poll_module,
    &ngx_http_module,
    &ngx_http_core_module,
    &ngx_http_upstream_module,

NULL

};

Core module seems to be using ngx_modules array to get hold of different function pointers of modules to set the configuration and other operations.

It appears from the source code that there is no dynamic addition of modules. ./configure script is only method available of module registration. I feel this is one drawback nginx has with respect to Apache in my view. Ofcourse, nginx has several advantages such as:

Even driven model: That is one thread can handle multiple connections simultaneously.
Multiple threads to take advantage of multiple processors.
Usage of 'epoll' mechanism.

Anyway, I thought I would write this notes for people who also may wonder while going through the documentation on how module gets registered with nginx infrastructure. Hope it helps.

Random technical bits and thoughts