Deprecated: The each() function is deprecated. This message will be suppressed on further calls in /home/zhenxiangba/zhenxiangba.com/public_html/phproxy-improved-master/index.php on line 456
An introduction to units of measure in programming languages
[go: Go Back, main page]

An introduction to units of measure in programming languages

Andrew Kennedy, 2004.

Static type systems for programming languages catch errors at compile time and express useful program invariants in the source code. Many people have pondered upon the idea of checking the units of measure for numerical quantities in programs by extending the type system to catch errors such as adding a length to a time or using pounds instead of kilograms. Polymorphic type inference and semantics for such a system was the subject of my thesis. If you aren't familiar with languages such as ML or Haskell, you may not find this particularly readable. Hence this page: a beginner's guide to types for units of measure.

All code will be written in a Java/C#-like language but similar features could be provided for any language with a static type system.

Introduction

The starting point for static checking of units or dimensions is to parameterize numeric types (real, complex) on their units or dimensions. Immediately the question of units vs dimensions arises. Dimensions are "classes" of inter-convertible units. The table below gives examples.

Dimension Units
Length metre, cm, inch, foot, mile, parsec
Mass kilogram, pound, tonne
Time second, hour, nanosecond, aeon
Velocity mph, km/hr, knots
Force newton, lbf
Pressure pascal, lb/sq inch, atmospheres
(None) radian, mole

Two things are worth noting:

  1. Dimensions can be divided into independent base dimensions from which other dimensions are derived. There might be more than one way of doing this: conventionally, length, mass and time are basic and velocity (length per unit time) and force (mass times length per unit time squared) are derived, but one could equally well make velocity basic and time derived.
  2. Some quantities are dimensionless but could still be considered to have particular units. Angles measured in radians are an example (ratio of arc length to radius) and molecular quantities measured in moles (just a number).

For programming purposes, if only one unit of measure is associated with each base dimension (e.g. one chooses to adopt m/kg/s for length/mass/time) then it's really a matter of taste whether types are parameterized on units or dimensions. If, however, different units of measure with the same dimension can be used in a single program then one desirable feature of the system is the automatic insertion of conversions between units. This is something that I have not considered in my work but which could fit into the framework. 

I'll assume from now that types are parameterized on units but that the unit system is fixed.

Types with units

Start with some set of base units (e.g. kg, m, s) and then define derived unit expressions by "product of powers" using notation such as

kg m s^-2

for the units associated with force. Then write

double<kg m s^-2>

for the type of floating-point quantities that have these units. Clearly we need a notation for dimensionless quantities such as angles: just omitting the units e.g.

double

is as good a notation as any. 

Values with units

Obviously we need some way of introducing values with particular units, as ordinary real-valued literals must be dimensionless. We assume that for each base unit there is a constant with the same name e.g.

double<kg> kg;
double<m> m;
double<s> s;

You can then multiply these constants by literal reals e.g.

gravity = 9.8 * m / (s*s);

(Behind the scenes, units are not carried around at run-time and the compiler would use the value 1.0 for all base units so the multiplications would not actually be performed at run-time; also, you might want some syntactic sugar that provides a neater notation than the above). 

Checking units

With this much the compiler can check the units of simple formulae; note, though, that this cannot just be encoded in a conventional type system as it relies on the algebraic properties of units e.g. kg m s^-2 is equivalent to m s^-2 kg. (To be precise, the units form an Abelian group if the powers are integers or a vector space if one uses rationals for powers). (Note that cunning use of C++ templates can be used to encode these algebraic properties, as templates permit unrestricted computation to be performed at compile-time).

For efficiency of type checking, unit expressions can be represented by vectors, with particular elements of the vector corresponding to the exponent on a particular base unit.

Generics

Unfortunately, this system only takes you so far - what if I want to write generic functions that work on values of different units? Here's a toy example:

double SumSq(double a, double b) { return a*a + b*b; }

I surely do not want to duplicate this function for every unit of measure that is passed to it e.g.

double<m^2> SumSq(double<m> a, double<m> b) { return a*a + b*b; }

double<kg^4> SumSq(double<kg^2> a, double<kg^2> b) { return a*a + b*b; }

Instead, I'd like to write it once and give generic types to the inputs and outputs (akin to templates in C++, generics in Ada, parametric polymorphism in ML). Suppose that U is a "unit variable". Then I can write

double<U^2> SumSq<U>(double<U> a, double<U> b) { return a*a + b*b; }

which is interpreted "for any units U, SumSq accepts two parameters both with units U and returns a result with units U squared". 

Now it can be a trifle painful to annotate all numeric types with their units of measure. For the extension to ML described in my thesis, the compiler can automatically infer the "most general" generic type for any function, such as the SumSq example above. These techniques could be applied to other languages with support for units of measure.

Some drawbacks

Not everything in the garden is rosy:

  1. Static typing has its limits. For example, consider a function that calculates the product of the elements in a list whose size is not known at compile-time. The units of the result depend on the size of the list - and hence cannot be determined at compile-time. In fact, type inference will insist that the inputs and outputs are dimensionless.
  2. A certain amount of cheating is required. Obviously one has to assume that the built-in arithmetic operations (+, -, *, /, etc.) are given predetermined generic types:

    double<U> operator+(double<U> x, double<U> y)
    double<U> operator-(double<U> x, double<U> y)
    double<U V> operator*(double<U> x, double<V> y)
    double<U V^-1> operator/(double<U> x, double<V> y)

    What's not so obvious is that certain derived operations cannot be coded in the programming language and still be assigned their natural generic types. For example, one might hope that an implementation of square root (by iterative approximation) would be assigned the type

    double<U> Sqrt(double<U^2> x)

    It can be shown that no non-trivial programs have this type in a language with just the four built-in arithmetic operations listed above. (Trivial programs include those that always return zero or go into an infinite loop). One of my papers (POPL'97) presents a mathematical proof using techniques from formal semantics. But more intuitively, consider the operation of such a root-finding function. It must start with an initial estimate of the root (of type double<U>) and yet it cannot generate this estimate from its argument (of type double<U^2>) using only the built-in arithmetic operations. 

    There are two ways out of this bind. The first is simply to build "enough" operations into the language as primitive and to assign these their natural generic types (roots being an obvious candidate). The second is to allow the user to break the type system and to "cast" from one unit of measure to another. This is rather unsatisfactory as arguably it defeats the whole purpose of units-of-measure checking, but perhaps if it were limited to "library writers" then its abuse would be curtailed.