Deprecated: The each() function is deprecated. This message will be suppressed on further calls in /home/zhenxiangba/zhenxiangba.com/public_html/phproxy-improved-master/index.php on line 456
Relaxed-Memory Concurrency
[go: Go Back, main page]

Relaxed-Memory Concurrency

Multiprocessors are now pervasive and concurrent programming is becoming mainstream, but typical multiprocessors (x86, Sparc, Power, ARM, Itanium) and programming languages (C, C++, Java) do not provide the sequentially consistent shared memory that has been assumed by most work on semantics and verification. Instead, they have subtle relaxed (or weak) memory models, exposing behaviour that arises from hardware and compiler optimisations to the programmer. Moreover, these memory models have usually described only in ambiguous (and sometimes flawed) prose, leading to widespread confusion. We are working to develop mathematically rigorous and usable semantics for multiprocessor programs, focussing on three processor architectures (x86, Power, and ARM); on the upcoming revisions of the C++ and C languages (C++0x and C1x); and on reasoning and verification using these models.

x86

For x86, we have produced the x86-TSO model. To the best of our knowledge, x86-TSO is sound with respect to actual processor behaviour, matches the current vendor intentions, and is a good model to program above. The model is described in the CACM paper below, written to be accessible for a broad audience, and formally defined in the TPHOLs 2009 paper below. The POPL 2009 paper gives a rather different model, x86-CC, based on the misleading vendor documentation of the time, that is not sound with respect to observable behaviour. See also more details.

Power

For Power, we have produced several models. The PLDI 2011 paper below gives an abstract-machine model that, to the best of our knowledge, accurately captures the architectural intent and observable processor behaviour for a wide range of subtle examples. The TACAS 2011 tool paper describes our Litmus experimental testing tool. The CAV 2010 paper gives an earlier axiomatic model that is sound with respect to current POWER processor behaviour, as far as we know, but does not match the architectural intent for some examples (as described in the PLDI 2011 paper). It is based on a general framework developed in Alglave's thesis. The DAMP 2009 paper gives a very preliminary axiomatic model, based on a naive reading of the vendor documentation. See also more details of the PLDI model and more details of the CAV and DAMP work.

C++0x and C1x

C and C++ are defined by standards, but those standards have historically not covered the behaviour of concurrent programs, motivating an ongoing effort to specify concurrent behaviour in a forthcoming revision of C++ (unofficially, C++0x) [WG21, Boehm and Adve, PLDI 2008]. The next C standard (unofficially, C1X) [WG14] is expected to follow suit. The key issue here is the multiprocessor relaxed-memory behaviour induced by hardware and compiler optimisations. The proposals aim to provide strong guarantees for race-free programs, together with new (but subtle) relaxed-memory atomic primitives for high-performance concurrent code. However, the draft standards (e.g. the Final Committee Draft (N3092), are prose documents: while the result of careful deliberation, they have almost inevitably been unclear on some points, and have been subject to some subtle semantic flaws. The POPL 2011 paper below gives a mathematically rigorous semantics for C++ concurrency. To the best of our knowledge, this captures the intent of the Final Committee Draft (N3092) text, but with changes we suggested at the Rapperswil meeting of the C++ Standards Committee in July 2010, which have been incorporated into the current draft. Some further changes are currently under discussion. We hope that this will aid discussion of any further changes to the draft standard, provide an unambiguous correctness condition for compilers, and give a basis for analysis and verification of concurrent C and C++ programs. As a starting point, the paper also proves correctness of a compilation scheme from C++0x concurrency to x86-TSO. See also more details.

Reasoning about x86 concurrency: TRF

The low-level assembly-language implementations of the abstractions that support language-level concurrency, such as locks and concurrent datastructures, are particularly interesting and challenging to reason about because they invariably contain data races. The ECOOP 2010 paper below develops a novel principle for reasoning about assembly programs on our previous x86-TSO memory model, uses it to analyze concurrency abstraction implementations of two spinlocks (from Linux), a non-blocking write protocol, the double-checked locking idiom; and java.util.concurrent's Parker. This triangular race freedom principle strengthens the usual data-race freedom style of reasoning. See also more details.

Compiler Verification: CompCertTSO, from Concurrent Clight (with TSO semantics) to x86-TSO

CompCertTSO is a compiler that generates x86 assembly code from ClightTSO, a large subset of the C programming language enhanced with concurrency primitives for thread management and synchronisation, and with a TSO relaxed memory model. The development is based on the CompCert compiler from sequential Clight to PowerPC and ARM (developed by Leroy et al., INRIA). The CompCertTSO compiler is written mostly within the specification language of the Coq proof assistant, and its correctness --- the fact that, for well-defined source programs, any behaviour of the generated code (with x86-TSO semantics) is permitted by the source-language semantics --- has been proved within Coq. The POPL 2011 paper describes some of this work, and the second paper below describes further work on fence elimination; see also more details.

Correctness of Compiler Optimisations for DRF

Current proposals for concurrent shared-memory languages, including C++ and C, give well-defined semantics (in fact, sequential consistency) only for programs without data races; this is known as the DRF guarantee. The correctness of compiler optimisations under the DRF guarantee is not completely clear, and experience with Java shows that this area is error-prone. The PLDI 2011 paper below gives a rigorous study of optimisations that involve both reordering and elimination of memory reads and writes, covering many practically important optimisations. It also discusses some surprising limitations of the DRF guarantee. Earlier work by Ševčík and Aspinall, in ECOOP 2008, showed that the Java Memory Model disallows some standard optimisations, including some performed by the Hotspot JVM. See also more details.

Program Logics for x86-TSO

The paper below develops a program logic for x86-TSO assembly programs, extending rely-guarantee reasoning. It is formalised in HOL4 and illustrated with Simpson's 4-slot algorithm.

Position Papers, Tutorials and Invited Talks

People

University of Cambridge:

University of Leicester:

IBM:

INRIA Paris-Rocquencourt:

MPI-SWS

Purdue University

Microsoft Research Cambridge:

  • Samin Ishtiaq

Funding

We acknowledge funding from:


[validate]