Deprecated: The each() function is deprecated. This message will be suppressed on further calls in /home/zhenxiangba/zhenxiangba.com/public_html/phproxy-improved-master/index.php on line 456
Relaxed-Memory Concurrency
[go: Go Back, main page]

Relaxed-Memory Concurrency

Multiprocessors are now pervasive and concurrent programming is becoming mainstream, but typical multiprocessors (x86, Sparc, Power, ARM, Itanium) and programming languages (C, C++, Java) do not provide the sequentially consistent shared memory that has been assumed by most work on semantics and verification. Instead, they have subtle relaxed (or weak) memory models, exposing behaviour that arises from hardware and compiler optimisations to the programmer. Moreover, these memory models have usually described only in ambiguous (and sometimes flawed) prose, leading to widespread confusion. This page collects work by a group of people working to develop mathematically rigorous and usable semantics for multiprocessor programs. We are focussing on three processor architectures (x86, Power, and ARM), on the recent revisions of the C++ and C languages (C++11 and C11), and on reasoning and verification using these models.

All Papers, Chronologically

People

University of Cambridge:

University of Leicester:

IBM:

INRIA Paris-Rocquencourt:

University of Oxford:

MPI-SWS

Purdue University

Microsoft and MSR Cambridge:

Uppsala University:

x86 (more details)

For x86, we have produced the x86-TSO model. To the best of our knowledge, x86-TSO is sound with respect to actual processor behaviour, matches the current vendor intentions, and is a good model to program above. The model is described in the CACM paper below, written to be accessible for a broad audience, and formally defined in the TPHOLs 2009 paper below. The POPL 2009 paper gives a rather different model, x86-CC, that was based on the misleading vendor documentation of the time but that is not sound with respect to observable behaviour.

Reasoning about x86 concurrency: TRF (more details)

The low-level assembly-language implementations of the abstractions that support language-level concurrency, such as locks and concurrent datastructures, are particularly interesting and challenging to reason about because they invariably contain data races. The ECOOP 2010 paper below develops a novel principle for reasoning about assembly programs on our previous x86-TSO memory model, uses it to analyze concurrency abstraction implementations of two spinlocks (from Linux), a non-blocking write protocol, the double-checked locking idiom; and java.util.concurrent's Parker. This triangular race freedom principle strengthens the usual data-race freedom style of reasoning.

Power (more details)

For Power, we have produced several models. The PLDI 2011 paper below gives an abstract-machine model that, to the best of our knowledge, accurately captures the architectural intent and observable processor behaviour for a wide range of subtle examples. It was revised and extended to support the Power load-reserve/store-conditional operations as described in our PLDI 2012 paper below. The TACAS 2011 tool paper describes our Litmus experimental testing tool. The CAV 2010 paper gives an earlier axiomatic model that is sound with respect to current POWER processor behaviour, as far as we know, but does not match the architectural intent for some examples (as described in the PLDI 2011 paper). It is based on a general framework developed in Alglave's thesis. The DAMP 2009 paper gives a very preliminary axiomatic model, based on a naive reading of the vendor documentation. See also more details of the CAV and DAMP work.

Software: Testing Tools

C11 and C++11 (more details)

C and C++ are defined by standards, but those standards have historically not covered the behaviour of concurrent programs, motivating an ongoing effort to specify concurrent behaviour in the new revisions of C and C++, C11 and C++11 [WG14, WG21, Boehm and Adve, PLDI 2008]. The key issue here is the multiprocessor relaxed-memory behaviour induced by hardware and compiler optimisations. The proposals aim to provide strong guarantees for race-free programs, together with new (but subtle) relaxed-memory atomic primitives for high-performance concurrent code. However, the draft standards (e.g. the Final Committee Draft (N3092), are prose documents: while the result of careful deliberation, they have almost inevitably been unclear on some points, and have been subject to some subtle semantic flaws. The POPL 2011 paper below gives a mathematically rigorous semantics for C++ concurrency. To the best of our knowledge, this captures the intent of the revised standards texts, incorporating changes that we suggested; see the further details for the current full version of the model. We hope that this will aid discussion of any further changes to the standard, provide an unambiguous correctness condition for compilers, and give a basis for analysis and verification of concurrent C and C++ programs. As a starting point for the latter, the paper also proves correctness of a compilation scheme from C/C++11 concurrency to x86-TSO. The PPDP 2011 paper uses the Isabelle Nitpick tool to efficiently explore C/C++11 executions.

Compiling from C/C++11 Concurrency to Power (see more details here and here)

Compiler Verification: CompCertTSO, from Concurrent Clight (with TSO semantics) to x86-TSO (more details)

CompCertTSO is a compiler that generates x86 assembly code from ClightTSO, a large subset of the C programming language enhanced with concurrency primitives for thread management and synchronisation, and with a TSO relaxed memory model. The development is based on the CompCert compiler from sequential Clight to PowerPC and ARM (developed by Leroy et al., INRIA). The CompCertTSO compiler is written mostly within the specification language of the Coq proof assistant, and its correctness --- the fact that, for well-defined source programs, any behaviour of the generated code (with x86-TSO semantics) is permitted by the source-language semantics --- has been proved within Coq. The POPL 2011 paper and draft below describe some of this work, and the third paper below describes further work on fence elimination.

Correctness of Compiler Optimisations for DRF

Current proposals for concurrent shared-memory languages, including C++ and C, give well-defined semantics (in fact, sequential consistency) only for programs without data races; this is known as the DRF guarantee. The correctness of compiler optimisations under the DRF guarantee is not completely clear, and experience with Java shows that this area is error-prone. The PLDI 2011 paper below gives a rigorous study of optimisations that involve both reordering and elimination of memory reads and writes, covering many practically important optimisations. It also discusses some surprising limitations of the DRF guarantee. Earlier work by Ševčík and Aspinall, in ECOOP 2008, showed that the Java Memory Model disallows some standard optimisations, including some performed by the Hotspot JVM. See also more details.

Program Logics for x86-TSO

The paper below develops a program logic for x86-TSO assembly programs, extending rely-guarantee reasoning. It is formalised in HOL4 and illustrated with Simpson's 4-slot algorithm.

Position Papers, Tutorials and Invited Talks

Funding

We acknowledge funding from:


[validate]