Andrew Kennedy, 2004.
Static type systems for programming languages catch errors at compile time and express useful program invariants in the source code. Many people have pondered upon the idea of checking the units of measure for numerical quantities in programs by extending the type system to catch errors such as adding a length to a time or using pounds instead of kilograms. Polymorphic type inference and semantics for such a system was the subject of my thesis. If you aren't familiar with languages such as ML or Haskell, you may not find this particularly readable. Hence this page: a beginner's guide to types for units of measure.
All code will be written in a Java/C#-like language but similar features could be provided for any language with a static type system.
The starting point for static checking of units or dimensions is to parameterize numeric types (real, complex) on their units or dimensions. Immediately the question of units vs dimensions arises. Dimensions are "classes" of inter-convertible units. The table below gives examples.
| Dimension | Units |
| Length | metre, cm, inch, foot, mile, parsec |
| Mass | kilogram, pound, tonne |
| Time | second, hour, nanosecond, aeon |
| Velocity | mph, km/hr, knots |
| Force | newton, lbf |
| Pressure | pascal, lb/sq inch, atmospheres |
| (None) | radian, mole |
Two things are worth noting:
For programming purposes, if only one unit of measure is associated with each base dimension (e.g. one chooses to adopt m/kg/s for length/mass/time) then it's really a matter of taste whether types are parameterized on units or dimensions. If, however, different units of measure with the same dimension can be used in a single program then one desirable feature of the system is the automatic insertion of conversions between units. This is something that I have not considered in my work but which could fit into the framework.
I'll assume from now that types are parameterized on units but that the unit system is fixed.
Start with some set of base units (e.g. kg, m, s) and then define derived
unit expressions by "product of powers" using notation such as
kg m s^-2
for the units associated with force. Then write
double<kg m s^-2>
for the type of floating-point quantities that have these units. Clearly we need a notation for dimensionless quantities such as angles:
just omitting the units e.g.
double
is as good a notation as any.
Obviously we need some way of introducing values with particular units, as ordinary real-valued literals must be dimensionless.
We assume that for each base unit there is a constant with the same name e.g.
double<kg> kg;
double<m> m;
double<s> s;
You can then multiply these constants by literal reals e.g.
gravity = 9.8 * m / (s*s);
(Behind the scenes, units are not carried around at run-time and the compiler would use the value 1.0 for all base units so the multiplications would not actually be performed at run-time; also, you might want some syntactic sugar that provides a neater notation than the above).
With this much the compiler can check the units of simple formulae; note, though, that this cannot just be encoded in a conventional type system as it relies on the algebraic properties of units e.g. kg m s^-2 is equivalent to m s^-2 kg. (To be precise, the units form an Abelian group if the powers are integers or a vector space if one uses rationals for powers). (Note that cunning use of C++ templates can be used to encode these algebraic properties, as templates permit unrestricted computation to be performed at compile-time).
For efficiency of type checking, unit expressions can be represented by vectors, with particular elements of the vector corresponding to the exponent on a particular base unit.
Unfortunately, this system only takes you so far - what if I want to write generic functions that work on values of different units? Here's a toy example:
double SumSq(double a, double b) { return a*a + b*b;
}
I surely do not want to duplicate this function for every unit of measure that is passed to it e.g.
double<m^2> SumSq(double<m> a, double<m> b) { return
a*a + b*b; }
double<kg^4> SumSq(double<kg^2> a, double<kg^2> b) { return a*a + b*b; }
Instead, I'd like to write it once and give generic types to the inputs and outputs (akin to templates in C++, generics in Ada, parametric polymorphism in ML). Suppose that U is a "unit variable". Then I can write
double<U^2> SumSq<U>(double<U> a, double<U> b) { return
a*a + b*b; }
which is interpreted "for any units U, SumSq accepts two parameters both with units U and returns a result with units U squared".
Now it can be a trifle painful to annotate all numeric types with their units of measure.
For the extension to ML described in my thesis, the compiler can automatically
infer the "most general" generic type for any function, such as the SumSq example above.
These techniques could be applied to other languages with support for units of
measure.
Not everything in the garden is rosy: