Deprecated: The each() function is deprecated. This message will be suppressed on further calls in /home/zhenxiangba/zhenxiangba.com/public_html/phproxy-improved-master/index.php on line 456
The Semantics of Multiprocessor Programs
[go: Go Back, main page]

The Semantics of Multiprocessor Programs

Multiprocessors are now dominant, but real multiprocessors do not provide the sequentially consistent memory that is assumed by most work on semantics and verification. Instead, they have subtle relaxed (or weak) memory models, usually described only in ambiguous (and sometimes flawed) prose, leading to widespread confusion.

We are working to develop mathematically rigorous and usable semantics for multiprocessor programs, focussing on three architectures: x86 (Intel 64/IA-32 and AMD64), Power, and ARM. Such models should provide a solid intuition for low-level programming, and a sound foundation for future work on verification, static analysis, and compilation of low-level concurrent code, and for the design of high-level concurrent programming languages.

x86-TSO

For x86, we have produced two models: x86-TSO and x86-CC. To the best of our knowledge, x86-TSO is sound with respect to actual processor behaviour, matches the current vendor intentions, and is a good model to program above. We give two equivalent definitions of x86-TSO: an intuitive operational model based on local write buffers, and an axiomatic total store ordering model, similar to that of the SPARCv8. Both are adapted to handle x86-specific features. We implemented the axiomatic model in our memevents tool, which calculates the set of all valid executions of test programs and compared the results with experimental data from our litmus tool. The model is described in the CACM paper below, written to be accessible for a broad audience.

The x86-TSO model was introduced and is formally defined in the TPHOLs paper below, together with its accompanying techreport and HOL4 definitions.

The paper below introduces a reasoning principle for programs running above the x86-TSO model: a generalisation, triangular race freedom of a data race freedom property. Of particular interest are the low-level implementations of the abstractions that support language-level concurrency - especially because they invariably contain data races. Triangular race freedom is used to analyze five concurrency abstraction implementations: two spinlocks (from Linux); a non-blocking write protocol; the double-checked locking idiom; and java.util.concurrent's Parker.

The paper below develops a program logic for x86-TSO assembly programs, extending rely-guarantee reasoning. It is formalised in HOL4 and illustrated with Simpson's 4-slot algorithm.

  • A Rely-Guarantee proof system for x86-TSO assembly code programs. Tom Ridge. To appear at VSTTE 2010.

x86-CC

Our earlier x86-CC model, presented at POPL 2009, was based on the then-current Intel and AMD documentation: the Intel 64 and IA-32 Architectures Software Developer's Manual [up to revision 28, Sept. 2008] and AMD64 Architecture Programmer's Manual [vol 2, rev. 3.14, Sept. 2007], and also with the earlier Intel 64 Architecture Memory Ordering White Paper [Aug. 2007]. We tested the semantics against actual processors and the vendor litmus-test examples, and gave an equivalent abstract-machine characterisation of our axiomatic memory model. We proved that the memory model is equivalent to a simpler (`nice execution') definition. For programs that are, in some precise sense, data-race free, we proved in HOL that their behaviour is sequentially consistent. However, that vendor documentation, and hence x86-CC, turned out to be unsound with respect to the behaviour of actual x86 processors: it forbids behaviour that actual processors exhibit, as detailed in the Addendum to the x86-CC paper below. More recent vendor documentation (revisions 29ff of the Intel SDM and 3.15 of the AMD APM) fixes that unsoundness but gives surprisingly weak guarantees to programmers. Accordingly, while the paper may still be of some interest (and the instruction semantics and testing material is still relevant), for most purposes one should focus on the x86-TSO model above.

Experimental data:

Power and ARM

The Power and ARM architectures have very different memory models to x86, but are broadly similar to each other. We have published two models, in DAMP 2009 and in CAV 2010. Both capture some important aspects of the semantics but are flawed in various ways; neither can be regarded as definitive. Further work is in progress.

Our preliminary DAMP 09 semantics described below is based on the Power 2.05 specification and the ARM Architecture Reference Manual v7.

Our CAV 2010 model below is based primarily on empirical testing using a new tool suite; it includes theoretical results on the fences required to strengthen executions in the model.

  • Fences in Weak Memory Models. To appear in CAV 2010. Jade Alglave, Luc Maranget, Susmit Sarkar and Peter Sewell

Instruction Semantics

To reason about multiprocessor machine-code or assembly code programs, the memory model must be integrated with a semantic definition of the behaviour of machine instructions. The HOL developments above for x86 (x86-CC) and for the Power and ARM (DAMP'09) include such definitions for modest fragments of the instruction sets. In the absence of fully verified processor implementations, we establish confidence in these definitions by empirical testing, comparing the behaviours permitted by the HOL definition against the behaviours observed in practice running randomly generated instructions on actual processors. Our x86sem tool and results for doing this on x86 are below.

See also the work by Fox and Myreen in the same vein on testing Fox and Gordon's ARM semantics.

Tools

Experimental data:

  • From the x86sem sequential testing: results

Position Papers, Tutorials and Invited Talks

People

Computer Laboratory, University of Cambridge:

University of Leicester:

INRIA Paris-Rocquencourt:

Microsoft Research Cambridge:

  • Samin Ishtiaq
an x86-CC example execution

[validate]