Sunday, December 27, 2009

Memory Barriers - Where are they required?

Memory barriers are also called memory fences. There are two basic types of memory fences: read and write (a full barrier combines both).
Memory barriers are required in multicore processing environments.

When a programmer writes code, he expects it to run sequentially. Modern CPUs have multiple execution engines inside them, and they try to parallelize serial code on their own, thereby executing instructions out of order. Also, while the CPU is waiting for data to arrive from DDR memory, it tries to execute the instructions that follow. If an instruction that was executed early turns out to depend on an older instruction that had not yet completed, the early instruction is executed again. That is, a core knows the dependencies within a single thread and can therefore revert or cancel out-of-order execution, thereby preserving the in-sequence behavior of the code.

When there are multiple cores, this out-of-order execution can lead to errors. Wikipedia has a very good example, and I am pasting that piece of code and its explanation here:


Initially, memory locations x and f both hold the value 0. The program running on processor #1 loops while the value of f is zero, then it prints the value of x. The program running on processor #2 stores the value 42 into x and then stores the value 1 into f. Pseudo-code for the two program fragments is shown below. The steps of the program correspond to individual processor instructions.
Processor #1:
 while f == 0
   ;
 // Memory fence required here
 print x;

Processor #2:
 x = 42;
 // Memory fence required here
 f = 1;
One might expect the print statement to always print the number "42"; however, if processor #2's store operations are executed out-of-order, it is possible for f to be updated before x, and the print statement might therefore print "0". Similarly, processor #1's load operations may be executed out-of-order and it is possible for x to be read before f is checked, and again the print statement might therefore print an unexpected value. For most programs neither of these situations is acceptable. A memory barrier can be inserted before processor #2's assignment to f to ensure that the new value of x is visible to other processors at or prior to the change in the value of f. Another can be inserted before processor #1's access to x to ensure the value of x is not read prior to seeing the change in the value of f.

As you can see, processor #1 requires a read memory fence and processor #2 requires a write memory fence.
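
The two-processor example can also be tried in user space. The following is only a rough sketch, assuming a C11 compiler with <stdatomic.h> and POSIX threads. The names producer, consumer and run_demo are made up for illustration. The C11 fences used here are a user-space analogue of the read and write fences described above, not the Linux kernel macros, and a C11 release fence is somewhat stronger than a pure write barrier.

```c
#include <pthread.h>
#include <stdatomic.h>

static int x;          /* payload: an ordinary, non-atomic variable */
static atomic_int f;   /* flag that announces "x is ready" */

/* Plays the role of processor #2: write x, fence, then set f. */
static void *producer(void *arg)
{
    (void)arg;
    x = 42;
    /* Write-side fence: the store to x is ordered before the store to f. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&f, 1, memory_order_relaxed);
    return NULL;
}

/* Plays the role of processor #1: spin on f, fence, then read x. */
static void *consumer(void *arg)
{
    int *out = arg;
    while (atomic_load_explicit(&f, memory_order_relaxed) == 0)
        ;  /* busy-wait until the flag is set */
    /* Read-side fence: the load of x is ordered after the load of f. */
    atomic_thread_fence(memory_order_acquire);
    *out = x;
    return NULL;
}

/* Hypothetical helper: runs both threads once and returns the value of x
 * that the consumer observed, or -1 if a thread could not be created. */
int run_demo(void)
{
    pthread_t reader, writer;
    int seen = -1;

    x = 0;
    atomic_store(&f, 0);

    if (pthread_create(&reader, NULL, consumer, &seen) != 0)
        return -1;
    if (pthread_create(&writer, NULL, producer, NULL) != 0) {
        atomic_store(&f, 1);        /* release the spinning reader */
        pthread_join(reader, NULL);
        return -1;
    }
    pthread_join(writer, NULL);
    pthread_join(reader, NULL);
    return seen;
}
```

With both fences in place, the consumer can only see x == 42 once it has seen f == 1. If either fence is removed, the program has a data race on x and the language no longer promises that result, even if a particular machine happens to print 42 anyway.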

In Linux, memory barriers are provided by the following macros:

smp_rmb():  Symmetric Multi Processing & Read Memory Barrier.
smp_wmb():  Symmetric Multi Processing & Write Memory Barrier.
smp_mb():   Symmetric Multi Processing & Memory Barrier - both read and write memory barrier.
