Sunday, December 20, 2009

Futex - fast user space Mutex

Futexes from functionality perspective same as other Mutex mechanisms in Linux user space processes, except that these are faster. I am not aware of any performance reports. I would love to put reference to any performance reports in this blog. Let me know.

Futex can be used for both Mutex purposes as well as for synchronizing the processing across multiple threads/processes. They can be used in place of read/write mutexes and semaphores. Futex also can be used in place of 'pthreads conditional vairables' for synchronization. Like semaphores, Futexes also can be used across threads or across processes. Just for information, threads within a process can be run in different CPU contexts simultaneously. That is, thread1 in a process may be run under Core1 while thread2 is being executed in core2 context. But a given thread at any time runs only one core context.

Futex mechanism internally makes use of Kernel waitqueue mechanism. 'Wait Queues' are used to make thread/process wait and wake the thread/process.

Futex does not replace spinlock/spinunlock mechanism. Spinlock/spinunlock mechanism is good for locking very small portion of the code. In general my preference is use mutex mechanism which is fast and make the caller sleep when there is contention so that CPU can execute some other processing. As you all know spinlock/unlock mechanism spins the caller until contention is gone. It is waste of cycles. But, it is useful when you know that 99% of the time there is no contention and you need very fast mutex mechanism.

Please check the Futex man page on in depth description of Futex system call. You can google for it or use this link here. I am copying some text from this man page here.

Each futex works on a integer variable. Main operations are FUTEX_WAIT and FUTEX_WAKE. Futex call syntax is given below.

int futex(int *uaddr, int op, int val, struct timespec *timeout, int *uaddr2, int val3)

  • uaddr : Pointer to the integer variable.
  • val: value of the integer variable
  • op: Takes operation code such as FUTEX_WAIT and FUTEX_WAKE: Even though there are other operations supported by futex call, this discussions limits itself to FUTEX_WAIT and FUTEX_WAKE. My understanding other operations except for FUTEX_CMP_REQUEUE are buggy.
  • timeout: Maximum time to wait when futex is issued with FUTEX_WAIT. If 'wakeup' does not happen within this time given by timeout, call returns with error ETIMEDOUT.

One important and useful point to consider in FUTEX_WAIT operation is the check it does between *uaddr and val. If both of these values are not same, futex with FUTEX_WAIT operation returns immediately. This property can be used by external logic to speed up the Mutex operation.

Since contention is not common in real applications, Futex assume that kernel waitqueue mechanism is used when there is contention. That is, futex call is expected to be made only when necessary. The property of FUTEX_WAIT operation described above comes in handy.

Let us define two functions. Futex_Lock(int *FutexVal) and Futex_Unlock(int *FutexVal).

Lock and Unlock logic should meet following requirements:
  • Must ensure that critical code section is not entered by more than one thread.
  • Must ensure that FUTEX_WAIT operation uses Kernel Wait Queue mechanism only when the lock is taken by somebody before.
  • Should ensure that FUTEX_WAKE operation is issued only when there are waiters.
  • Ofcourse, lock/unlock operation is fast.
Article on "Futexes are tricky" by Redhat really gave me the perspective on how to work with Futexes. Three state concept was taken from this paper. Please see the full article here.

#define FLOCK_FREE 0

Futex_Lock(int *futexVal)

/** If there are no users (threads) accessing the critical section, proceed further **/
if (fatomic_cmp_and_exchange (futexVal, FLOCK_FREE, FLOCK_ONE_ACTIVE_USER) == FLOCK_ONE_ACTIVE_USER )

/** If control comes here means there is at least one active user **/
/** FutexVal has either FLOCK_ONE_ACTIVE_USER value or FLOCK_MULTIPLE_ACTIVE_USERS value **/
while ( fatomic_exchange(futexVal, FLOCK_MULTIPLE_ACTIVE_USERS) != FLOCK_FREE) /** If LOCK becomes free in the mean time, get out **/
if ( futex(futexVal, FUTEX_WAIT, FLOCK_MULTIPLE_ACTIVE_USERS, NULL, NULL,0) == 0 ) /** This call waits, it returns 0 when somebody wakes it up. In the mean time if lock free was called by somebody else, it returns with some other return value and goes to the loop again **/


/** control comes here, when the lock is free, but the value is set to FLOCK_MULTIPLE_ACTIVE_USERS. In this case, even though lock is free FLOCK_WAKE would be called which is un-necessary, but it is okay as it does not harm and this condition happens very rarely. **/


Futex_UnLock(int *futexVal)
If there is only one active user, there is no need to wake up as that thread already got the lock. Atomically set the futex val to FLOCK_FREE and get hold of previous value **/

if ( fatomic_exchange(futexVal, FLOCK_FREE) != FLOCK_ONE_ACTIVE_USER)
futex(futexVal, FUTEX_WAKE, 1, NULL, NULL, 0); Wake one waiter.

fatomic_compare_and_exchange(ptr, old, new) is expected to change the *ptr to new if *ptr matches with old. It returns the new *ptr value.
fatomic_exchange (ptr, val): It puts val in *ptr and returns the old *ptr value.

For completeness, I am copying some code piece from Linux source code for these atomic operations on PPC.

unsigned long fatomic_compare_and_exchange(void *addr, unsigned long old,
unsigned long new)

unsigned int old_val;

__asm__ __volatile__(

"1:\t" "lwarx %0,0,%1\n" /* load and reserve, copy current *addr to old_val */
"cmpd %0,%3\n" /* Compare old_val with old argument **/
"bne 2f\n" /* If both are not equal, get out **/
"stwcx. %2,0,%1\n" /* Else, put new value in *addr */
"bne- 1b\n" /* Go back and do above if reservation fails */
: "=&r"(old_val)
: "r"(addr), "r"((unsigned int)new),
"r"((unsigned int)old)
: "memory", "cc");

return old_val;

unsigned long fatomic_exchange(void *addr, unsigned long val)

unsigned int result;

__asm__ __volatile__(

"1:\t" "lwarx %0,0,%1\n" /* load and reserve, copy *addr into result */
"stwcx. %2,0,%1\n" /* Store val into *addr */
"bne- 1b\n" /* Go back and retry if reservation fails in between*/
: "=&r"(result)
: "r"(addr), "r"(val)
: "memory", "cc");

return result;


Whenever you require to protection some critical section, simply do following:

Declare one integer variable, say
int futexval = FLOCK_FREE;

.... critical section ...

No comments: