Java - Volatile & Barriers


From the JLS: "A field may be declared volatile, in which case the Java Memory Model (§17) ensures that all threads see a consistent value for the variable."

Paragraph from JCiP - Java Concurrency in Practice


The visibility effects of volatile variables extend beyond the value of the volatile variable itself. When thread A writes to a volatile variable and subsequently thread B reads that same variable, the values of all variables that were visible to A prior to writing to the volatile variable become visible to B after reading the volatile variable. So from a memory visibility perspective, writing a volatile variable is like exiting a synchronized block and reading a volatile variable is like entering a synchronized block.

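A minimal sketch of that guarantee (the class and field names are illustrative): thread A's plain write to payload happens-before its volatile write to ready, so once thread B reads ready as true, it is guaranteed to also see payload = 42.

class Publisher {
    int payload;               // plain field, no synchronization of its own
    volatile boolean ready;    // the volatile flag that carries the happens-before edge

    void writer() {            // runs on thread A
        payload = 42;          // ordinary write...
        ready = true;          // ...made visible to B by this volatile write
    }

    void reader() {            // runs on thread B
        if (ready) {           // volatile read
            int r = payload;   // guaranteed to observe 42
        }
    }
}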


Since volatile is overloaded with two meanings:
a) access atomicity (single reads and writes, including of long and double, are atomic); and
b) memory ordering
you cannot get one without getting the other as baggage.


Visibility Guarantee vs Mutual Exclusion vs Atomicity

volatile     - Provides VG | ME not applicable | Single reads/writes are atomic (even for long/double), compound actions are not
synchronized - Provides VG | Provides ME | Compound actions inside the block appear atomic to threads contending on the same lock
Lock         - Provides VG | Provides ME | Same as synchronized: the guarded section is atomic with respect to other lock holders
Semaphore    - Provides VG (release() happens-before a subsequent acquire()) | Provides ME with a single permit | Guarded section is atomic, as with a lock
Unsafe.CAS   - Provides VG (volatile read/write semantics) | ME not applicable | The compare-and-swap itself is a single atomic read-modify-write


Adding Atomicity to Volatiles

To add CAS-style operations (compareAndSet, getAndSet) to an existing volatile field, use AtomicReferenceFieldUpdater (or AtomicIntegerFieldUpdater / AtomicLongFieldUpdater for primitives).
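A minimal sketch (Node and its state field are made-up names): the updater gives CAS on a volatile field without allocating a full AtomicReference per instance.

import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

class Node {
    // The target field must be volatile and accessible to the updater.
    volatile String state = "NEW";

    private static final AtomicReferenceFieldUpdater<Node, String> STATE =
            AtomicReferenceFieldUpdater.newUpdater(Node.class, String.class, "state");

    boolean start() {
        // Atomic compare-and-set: succeeds for exactly one caller,
        // turning the racy check-then-act into a single atomic step.
        return STATE.compareAndSet(this, "NEW", "RUNNING");
    }
}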


Summary

Any data that is shared between threads needs a "memory barrier" to ensure its visibility.
Changes to any member that is declared volatile are visible to all threads. In effect, the write is "flushed" from any cache to main memory, where it can be seen by any thread that accesses main memory.
Now it gets a bit trickier. Any writes made by a thread before that thread writes to a volatile variable are also flushed. 
Likewise, when a thread reads a volatile variable, its cache is invalidated, and subsequent reads may repopulate it from main memory.

Volatile Accesses Showing Barriers and Instructions

<other ops>
[StoreStore]
[LoadStore]
x = 1; // volatile store
[StoreLoad] // Case (a): Guard after volatile stores

...

[StoreLoad] // Case (b): Guard before volatile loads
int t = x; // volatile load
[LoadLoad]
[LoadStore]
<other ops>

Since volatile loads clearly dominate in most programs, sane implementations choose case (a), emitting the StoreLoad barrier after each volatile store.


http://jeremymanson.blogspot.com/2008/11/what-volatile-means-in-java.html
http://jeremymanson.blogspot.com/2007/08/volatile-does-not-mean-atomic.html

http://mechanical-sympathy.blogspot.co.uk/2011/07/memory-barriersfences.html

volatile provides only a visibility guarantee, not atomicity

The check-then-act below could fail: another thread can flip volatileBoolean between the read in the condition and the write in the body, so the read-modify-write is not one atomic step.

if (volatileBoolean) {                   // volatile read
    volatileBoolean = !volatileBoolean;  // separate volatile read + write: not atomic
}
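A fix sketched with AtomicBoolean (the flag name is illustrative): compareAndSet folds the check and the act into one atomic operation, so only one thread can win the true -> false transition.

import java.util.concurrent.atomic.AtomicBoolean;

AtomicBoolean flag = new AtomicBoolean(true);

if (flag.compareAndSet(true, false)) {
    // exactly one thread ever gets here for this transition
}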

Add volatile to safely share a value between threads.

Using a plain volatile is simpler and more efficient if all you want to do is set or get the value (but not get and then conditionally set it).


From other users


lazySet on an atomic variable usually wins over a volatile write.
lazySet (a store-release; setRelease from Java 9 onwards) is faster because it omits the trailing StoreLoad barrier and therefore doesn't stall the CPU pipeline waiting for the store buffer to drain.
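A sketch of the single-writer pattern where lazySet pays off (the class and method names are assumptions): the lone writer publishes with a store-release instead of a full volatile write.

import java.util.concurrent.atomic.AtomicLong;

class Progress {
    private final AtomicLong published = new AtomicLong();

    void advance(long sequence) {
        // Store-release: ordered like a volatile write on the store side,
        // but without the trailing StoreLoad barrier, so no pipeline stall.
        published.lazySet(sequence);
    }

    long current() {
        return published.get(); // plain volatile read on the consumer side
    }
}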


Barriers

http://preshing.com/20120710/memory-barriers-are-like-source-control-operations/

Each type of memory barrier is named after the type of memory reordering it’s designed to prevent: for example, #StoreLoad is designed to prevent the reordering of a store followed by a load.

http://mechanical-sympathy.blogspot.co.uk/2011/07/memory-barriersfences.html


Memory barriers provide two properties. Firstly, they preserve externally visible program order by ensuring that all instructions on either side of the barrier appear in the correct program order when observed from another CPU; secondly, they make memory visible by ensuring the data is propagated to the cache sub-system.

Store Barrier

A store barrier, the "sfence" instruction on x86, forces all store instructions prior to the barrier to happen before the barrier and have the store buffers flushed to cache for the CPU on which it is issued. This makes the program state visible to other CPUs so they can act on it if necessary.

Load Barrier

A load barrier, “lfence” instruction on x86, forces all load instructions after the barrier to happen after the barrier and then wait on the load buffer to drain for that CPU. This makes program state exposed from other CPUs visible to this CPU before making further progress.

Full Barrier
A full barrier, "mfence" instruction on x86, is a composite of both load and store barriers happening on a CPU.


#StoreLoad

A StoreLoad barrier ensures that all stores performed before the barrier are visible to other processors, and that all loads performed after the barrier receive the latest value that is visible at the time of the barrier.

Java Memory Model

A volatile field has a store barrier inserted after a write to it and a load barrier inserted before a read of it. Qualified final fields of a class have a store barrier inserted after their initialisation to ensure these fields are visible once the constructor completes when a reference to the object is available.
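A small sketch of the final-field guarantee (Config is a made-up name): the freeze at the end of the constructor means any thread that later obtains a reference to the object sees timeoutMs fully initialised, without further synchronization on the field itself.

class Config {
    private final int timeoutMs;  // final: frozen at the end of the constructor

    Config(int timeoutMs) {
        this.timeoutMs = timeoutMs;
        // Conceptually, the store barrier sits here: readers that obtain a
        // reference to this Config are guaranteed to see timeoutMs set,
        // provided 'this' did not escape during construction.
    }

    int timeoutMs() { return timeoutMs; }
}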

Atomic Instructions and Software Locks

Atomic instructions, such as the “lock ...” instructions on x86, are effectively a full barrier as they lock the memory sub-system to perform an operation and have guaranteed total order, even across CPUs. Software locks usually employ memory barriers, or atomic instructions, to achieve visibility and preserve program order.
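For instance, AtomicInteger.incrementAndGet() is a single atomic read-modify-write; on x86 it typically compiles down to a lock-prefixed instruction (e.g. lock xadd), which acts as a full barrier. (The counter below is just an illustration.)

import java.util.concurrent.atomic.AtomicInteger;

AtomicInteger hits = new AtomicInteger();

// One atomic instruction: increment plus full-barrier semantics in one step.
int n = hits.incrementAndGet();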

Performance Impact of Memory Barriers

Memory barriers prevent a CPU from employing many of the techniques it uses to hide memory latency, and therefore carry a significant performance cost that must be considered. To achieve maximum performance it is best to model the problem so the processor can do units of work, then have all the necessary memory barriers occur on the boundaries of these work units. Taking this approach allows the processor to optimise the units of work without restriction. There is an advantage to grouping necessary memory barriers in that buffers flushed after the first one will be less costly because no work will be under way to refill them.
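A sketch of that shape (names are illustrative): do the plain writes inside the unit of work, and let a single volatile write on the boundary publish the whole batch, so the barriers are paid once per batch rather than once per element.

class Batcher {
    private final long[] results = new long[1024]; // plain writes, no per-element barrier
    private volatile int publishedCount;           // the barrier lives on this boundary

    void process(long[] inputs) {                  // assumes inputs fit in results
        int n = 0;
        for (long in : inputs) {
            results[n++] = in * in;  // ordinary stores inside the work unit
        }
        publishedCount = n;          // one volatile store publishes results[0..n-1]
    }
}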


More on Barriers

http://gee.cs.oswego.edu/dl/jmm/cookbook.html

Memory barrier instructions directly control only the interaction of a CPU with its cache, with its write-buffer that holds stores waiting to be flushed to memory, and/or its buffer of waiting loads or speculatively executed instructions. These effects may lead to further interaction among caches, main memory and other processors.

FENCE
A fence is a coarse-grained barrier instruction that guarantees that all loads and stores initiated before the fence will be strictly ordered before any load or store initiated after the fence. It is usually among the most time-consuming instructions on any given processor (often nearly as expensive as, or even more expensive than, atomic instructions).

Most processors additionally support more fine-grained barriers.

A property of memory barriers that takes some getting used to is that they apply BETWEEN memory accesses.

LoadLoad   - Load1; LoadLoad; Load2: ensures Load1's data is loaded before Load2 and all subsequent loads
StoreStore - Store1; StoreStore; Store2: ensures Store1's data is visible to other processors before Store2's
LoadStore  - Load1; LoadStore; Store2: ensures Load1's data is loaded before Store2's data is flushed
StoreLoad  - Store1; StoreLoad; Load2: ensures Store1's data is visible to other processors before Load2 loads; the most expensive, required between a Monitor Exit and a subsequent Monitor Enter (and after a volatile store that may be followed by a volatile load)




Should volatile be used just to publish state? Broadly yes: it suits status flags and one-time publication, but not compound check-then-act logic (see the discussion below).

http://stackoverflow.com/questions/30141634/using-volatile-variables-and-semaphores-java







http://stackoverflow.com/questions/19744508/volatile-vs-atomic/19745207#19745207




Barrier usage on different hardware architectures


There is one important difference between the instruction streams for x86 and SPARC and the instruction stream for Itanium. The JVM chased the consecutive write operations with a memory barrier on x86 and SPARC, but it did not place a memory barrier between the two write operations. On the other hand, the instruction stream for Itanium has a memory barrier between both writes.
Why does the JVM behave differently across hardware architectures? Because each hardware architecture has a memory model, and each memory model comes with a set of consistency guarantees. Some memory models, like those of x86 and SPARC, have a very strong set of consistency guarantees.
Other memory models, like those of Itanium, PowerPC or Alpha, have a much more relaxed set of guarantees. For example, x86 and SPARC do not re-order consecutive write operations, so no memory barrier is needed between them. Itanium, PowerPC and Alpha will re-order consecutive write operations, so the JVM has to place a memory barrier between them.
The JVM uses memory barriers to bridge the gaps between the Java memory model and the memory model of the hardware it runs on.
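A sketch of the case being described (field names are illustrative): two consecutive volatile stores, where the barrier between them is only emitted on architectures that can reorder stores.

class TwoWrites {
    volatile int a;
    volatile int b;

    void update() {
        a = 1; // volatile store
        // x86/SPARC (strong ordering): no barrier needed here, stores stay in order.
        // Itanium/PowerPC/Alpha: the JVM emits a StoreStore barrier here.
        b = 2; // volatile store, itself chased by a StoreLoad barrier
    }
}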





