The Semantics of Multiprocessor Machine Code: x86, Power, and ARMMultiprocessors are now dominant, but real multiprocessors do not provide the sequentially consistent memory that is assumed by most work on semantics and verification. Instead, they have subtle relaxed (or weak) memory models, usually described only in ambiguous prose, leading to widespread confusion. We have developed rigorous semantics for multiprocessor programs above three architectures: x86, Power, and ARM. Each covers the relaxed memory model, instruction semantics, and instruction decoding, for fragments of the instruction sets, and is mechanized in HOL. Such models should provide a solid intuition for low-level programming, and a sound foundation for future work on verification, static analysis, and compilation of low-level concurrent code, and for the design of high-level concurrent programming languages. For x86, we have produced two models: x86-CC and x86-TSO. The first, x86-CC, was based on the then-current Intel and AMD documentation: the Intel 64 and IA-32 Architectures Software Developer's Manual [up to revision 28, Sept. 2008] and AMD64 Architecture Programmer's Manual [vol 2, rev. 3.14, Sept. 2007], and also with the earlier Intel 64 Architecture Memory Ordering White Paper [Aug. 2007]. We tested the semantics against actual processors and the vendor litmus-test examples, and gave an equivalent abstract-machine characterisation of our axiomatic memory model. We proved that the memory model is equivalent to a simpler (`nice execution') definition. For programs that are, in some precise sense, data-race free, we proved in HOL that their behaviour is sequentially consistent. However, that vendor documentation, and hence x86-CC, turned out to be unsound with respect to the behaviour of actual x86 processors: see the Addendum to the x86-CC paper below. More recent vendor documentation (revision 29 of the Intel SDM) fixes that unsoundness but gives unreasonably weak guarantees to programmers.
Accordingly, we have now produced what we believe to be a better model, x86-TSO.
To the best of our knowledge this is sound with respect to actual processor
behaviour, matches the current vendor intentions, and is a better
model to program above. We give two
equivalent definitions of x86-TSO: an intuitive operational model
based on local write buffers, and an axiomatic total store ordering
model, similar to that of the SPARCv8. Both are adapted to handle
x86-specific features. We have implemented the axiomatic model in our
The Power and ARM architectures have a very different memory model to x86, but are broadly similar to each other. Our preliminary DAMP 09 semantics described below is based on the Power 2.05 specification and the ARM Architecture Reference Manual v7. Note that this model has some major flaws; further work on the semantics is needed. Here the core memory model was also formalised in Coq. Papers
Code
Tools for the x86-CC paper
Experimental data for the x86-CC paper
People(Computer Laboratory, University of Cambridge) (INRIA Paris-Rocquencourt) (Microsoft Research Cambridge) |
|