This document is a list of frequently asked questions (FAQ) and their answers for the Larch family of specification languages. It is intended to be useful to those who are new to Larch, especially to students and others trying to understand and apply Larch. However, there is some material that is also intended for experts in other formal methods who would like to learn more about Larch. If you find something that seems too technical, please skip a bit, perhaps to another question.
We are looking for more contributions of questions and answers, as well as corrections and additions to the answers given. Please send e-mail to us at larch-faq@cs-DOT-iastate-DOT-edu (after changing each "-DOT-" to "."). We will welcome and acknowledge any help you may give.
Bibliographic references are described at the end of this document (see section Bibliography). They look like "[Wing83]", giving the names of up to three authors and an abbreviated year of publication.
This FAQ is accessible on the WWW at the following URL.
http://www.cs.iastate.edu/~leavens/larch-faq.html
HTML, plain text, postscript, and GNU info format versions are also available by anonymous ftp at the following URLs.
ftp://ftp.cs.iastate.edu/pub/larch/larch-faq.html ftp://ftp.cs.iastate.edu/pub/larch/larch-faq.txt.gz ftp://ftp.cs.iastate.edu/pub/larch/larch-faq.ps.gz ftp://ftp.cs.iastate.edu/pub/larch/larch-faq.info.tar.gz
Archive-name: larch-faq
Posting-Frequency: monthly
Last-Modified: $Date: 2001/07/18 16:45:42 $
Version: $Revision: 1.116 $
URL: http://www.cs.iastate.edu/~leavens/larch-faq.html
Copyright: (c) 1995, 1996, 1997 Gary T. Leavens
Maintainer: Gary T. Leavens and Matt Markland <larch-faq@cs-DOT-iastate-DOT-edu>
This FAQ is copyright (C) 1997 Gary T. Leavens.
Permission is granted for you to make copies of this FAQ for educational and scholarly purposes, and for commercial use in specifying software, but the copies may not be sold or otherwise used for direct commercial advantage; permission is also granted to make this document available for file transfers from machines offering unrestricted anonymous file transfer access on the Internet; these permissions are granted provided that this copyright and permission notice is preserved on all copies. All other rights reserved.
This FAQ is provided as is without any express or implied warranties. While every effort has been taken to ensure the accuracy of its information, the author and maintainers assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.
This work was partially funded by (US) National Science Foundation, under grant CCR-9503168.
Thanks to Matt Markland, who provided the initial impetus for getting this FAQ started and who helped make up the initial list of questions.
Many thanks to John Guttag, Jim Horning, Jeannette Wing, and Steve Garland for creating the Larch approach and its tools, and for their patient teaching of its ideas through their writing, personal contact, and many e-mail messages.
Thanks to Perry Alexander, Al Baker, Pieter Bekaert, Eric Eide, John Fitzgerald, Jim Horning, Ursula Martin, Bertrand Meyer, Clyde Ruby, and Jeannette Wing for general comments, updates, and corrections to this FAQ. Thanks to Steve Garland and Jim Horning for help with answers. Thanks to Pierre Lescanne, Kishore Dhara, Teo Rus, Narayanan Sridhar, and Sathiyanarayanan Vijayaraghavan for suggesting questions.
In this chapter, we discuss global questions about Larch.
Larch [Guttag-Horning93] may be thought of as an approach to formal specification of program modules. This approach is an extension of Hoare's ideas for program specification [Hoare69] [Hoare72a]. Its distinguishing feature is that it uses two "tiers" (or layers) [Wing83]. The top tier is a behavioral interface specification language (BISL), tailored to a specific programming language. Such BISLs typically use pre- and postcondition specifications to specify software modules. The bottom tier is the Larch Shared Language (LSL, see [Guttag-Horning93], Chapter 4), which is used to describe the mathematical vocabulary used in the pre- and postcondition specifications. The idea is that LSL specifies a domain model and some mathematical vocabulary, while the BISL specifies both the interface and the behavior of program modules. See section 1.2 Why does Larch have two tiers?, for more of the reasons for this separation.
The Larch family of specification languages [Guttag-Horning-Wing85b] consists of all the BISLs that use LSL for specification of mathematical vocabulary. Published BISLs with easily accessible references include:
There are also some BISLs whose documentation is not as easy to come by: Larch/Generic [Chen89], GCIL (which stands for Generic Concurrent Interface specification Language [Lerner91]), and Larch/CORBA (a BISL for CORBA IDL [Sivaprasad95]).
Typically, each BISL uses the declaration syntax of the programming language for which it is tailored (e.g., C for LCL, Modula-3 for LM3), and adds annotations to specify behavior. These annotations typically consist of pre- and postconditions (often using the keywords requires and ensures), some way to state a frame axiom (often using the keyword modifies), and some way to indicate what LSL traits are used to provide the mathematical vocabulary. See section 4. Behavioral Interface Specification Languages (BISLs), for more details.
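To make these annotations concrete, here is an LCL-like sketch of a procedure specification (the trait name SqrtTrait, the operator names, and the exact syntax are illustrative assumptions; each BISL has its own concrete syntax):

```
uses SqrtTrait;                          % LSL trait supplying sqrt and abs

double isqrt(double x) {
  requires x >= 0;                       % precondition
  ensures abs(result - sqrt(x)) < 0.001; % postcondition on the result
}
```

A specification with no modifies clause, as above, says the procedure may not change any client-visible state.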
The two "tiers" used in the Larch family of specification languages are LSL, which is called the bottom tier, and a behavioral interface specification language (a BISL), which is called the top tier.
This specification of program modules using two "tiers" is a deliberate design decision (see [Wing83] and [Guttag-Horning93], Chapter 3). It is similar to the way that VDM [Jones90] [ISO-VDM96] [Fitzgerald-Larsen98] is typically used (one specifies a model and some auxiliary functions, in a way similar to LSL, and then one uses that vocabulary to specify the operations that are to be implemented in some programming language.) However, VDM does not have a family of BISLs that are tailored to specific programming languages. Only the Larch family, and the RESOLVE family [Edwards-etal94], have specification languages tailored to specific programming languages. (See section 1.4 How does Larch compare to other specification languages?, for more comparisons.)
What are the advantages of using the Larch two-tiered approach?
One could imagine using some other language besides LSL for the bottom tier and still getting these advantages. This is done, for example, in RESOLVE [Edwards-etal94]. Chalin [Chalin95] has suggested that Z [Hayes93] [Spivey92] could also serve for the bottom tier, but one would probably want to use a subset of Z without state or the specification of side-effects.
The major disadvantage of the tailoring of each BISL to a specific programming language is that each BISL has its own syntax and semantics, and thus requires its own tool support, manuals, etc.
(Parts of the following paragraph are adapted from a personal communication from Horning of November 29, 1996.)
Something that might seem to be a disadvantage of the Larch approach
is that one sometimes finds oneself writing
out very similar specifications in both LSL and a BISL.
Isn't that a waste of effort? It may seem like that,
but one has to realize that one is doing two different things,
and thus the two similar specifications cannot really be the same.
What may happen is that one writes similar declarations on the two levels,
with parallel sets of operators and procedures.
For example, one might specify LSL operators empty,
push, pop, top for stacks,
and then specify in a BISL procedures
Empty, Push, Pop, and Top.
While the BISL procedure specification will use the LSL operators
in their pre- and postconditions,
there will probably be significant differences in their signatures.
For example, the procedure Push may have a side-effect,
rather than returning a new stack value as would the push operator
in the trait.
If you find yourself repeating the axioms from an LSL trait in a BISL
specification, then you're probably making a mistake.
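The parallelism described above can be sketched as follows. The trait and procedure names are those used in the text, but the syntax is simplified, and the BISL part is an LCL-like illustration, not exact LCL:

```
% LSL tier: mathematical vocabulary for stacks
Stack: trait
  introduces
    empty: -> Stk
    push: Stk, E -> Stk
    pop: Stk -> Stk
    top: Stk -> E
  asserts \forall s: Stk, e: E
    pop(push(s, e)) == s;
    top(push(s, e)) == e;

% BISL tier (LCL-like sketch): the procedure Push has a
% side-effect on s, unlike the LSL operator push, which
% returns a new stack value.
void Push(stack s, elem e) {
  modifies s;
  ensures s' = push(s^, e);    % s^ is the pre-state, s' the post-state
}
```

Note that the BISL specification uses the LSL operator push in its postcondition, but does not repeat the trait's axioms.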
It is also difficult to extract a simple mathematical vocabulary (like the total functions of LSL) from a BISL specification. This is because the semantics of program modules tends to be complex. Thus a BISL specification is not an appropriate vehicle for the specification of mathematical vocabulary.
See section 1.3 What is the difference between LSL and a Larch BISL?, for more discussion on this point.
The main difference between LSL and a Larch BISL is that in LSL one specifies mathematical theories of the operators that are used in the pre- and postcondition specifications of a Larch BISL. Thus LSL is used to specify mathematical models and auxiliary functions, and a Larch BISL is used to specify program modules that are to be implemented in some particular programming language.
The following summary, provided by Horning (private communication, October 1995), contrasts LSL and a Larch BISL.
A BISL specifies (not LSL)            LSL specifies (not a BISL)
---------------------------           ---------------------------
State                                 Theory definition
  storage allocation                    Theory operations
  aliasing, side-effects                  inclusion, assumption,
Control                                   implication, parameterization
  errors, exception handling,           Operator definitions
  partiality, pre-conditions,           Facilities for libraries, reuse
  concurrency
Programming language constructs
  type system, parameter modes,
  special syntax and notations
Implementation
  (no LSL operators are implemented)
Horning's summary is that a BISL handles "all the messy, boring stuff", while LSL handles "all the subtle, hard stuff", and thus the features in the two tiers do not overlap much. However, that summary is from the point-of-view of a designer of LSL. From the point of view of a user of a Larch BISL, the BISL is where you specify what is to be implemented, and LSL is where you do "domain modeling".
See [Lamport89] [Guttag-Horning93], Chapter 3, for more about the role of interface specification, and the separation into two tiers.
First, a more precise comparison is needed, because Larch is not a single language, but a family of languages (see above). Another problem with this comparison is that Larch has two tiers, but VDM-SL [Jones90] [ISO-VDM96] [Fitzgerald-Larsen98], Z [Hayes93] [Spivey92], and COLD-K [Feijs-Jonkers92] are all integrated languages, which mix aspects of both of the tiers of the Larch family. Thus we will compare some aspects of VDM-SL and Z to LSL and other aspects to BISLs in the Larch family (such as LCL or LM3 [Guttag-Horning93], Chapters 5 and 6).
(The comparisons below tend to be a bit technical; if you are not familiar with specification and verification, you might want to skip to another question.)
An excellent resource for comparison of LSL to other algebraic specification languages is [Mosses96]. Also a comparison of Larch with RESOLVE [Edwards-etal94] is found in [Sitaraman-Welch-Harms93].
By VDM, one means, of course, the specification language VDM-SL [Jones90] [ISO-VDM96] [Fitzgerald-Larsen98]. In comparison with LSL, in VDM-SL one can specify mathematical values (models) using constructs similar to those in denotational semantics and typed, functional programming languages. That is, one can declare types of records, tagged unions, sets, maps and so on, and one can specify mathematical functions on such values, either in a functional style, or by the use of pre- and postcondition specifications. For example, one might write the following in VDM-SL to define a phone book.
-- This is a VDM-SL model and auxiliary function specification
PhoneBook = map Id to (Info | HIDDEN)
Info :: number : seq of Digit
address : Address
listed : PhoneBook * Id -> bool
listed(pb, id) == id in set dom pb
The above example uses the interchange syntax of VDM-SL [ISO-VDM96].
In this syntax, for example,
* is an approximation to the mathematical notation
for a cross product; see Section 10 of [ISO-VDM96] for details.
The type PhoneBook is a map to a tagged union type.
The type Info is a record type.
The auxiliary function listed is given in a mathematical
form, similar to LSL.
In LSL one can use standard traits (for example, those in Guttag and Horning's handbook [Guttag-Horning93], Appendix A) and the LSL shorthands tuple and union (see Section 4.8 of [Guttag-Horning93]) to translate VDM-SL specifications of mathematical values into LSL. For example, the following is an approximation to the above VDM-SL specification.
% This is LSL
PhoneBookTrait : trait
includes Sequence(Digit, Number),
FiniteMap(PhoneBook, Id, InfoOrHIDDEN)
Info tuple of number : Number, address : Address
InfoOrHIDDEN union of left : Info, right : HIDDEN
introduces
listed : PhoneBook, Id -> Bool
asserts
\forall pb: PhoneBook, id: Id
listed(pb, id) = defined(pb, id);
However, VDM-SL does not have the equivalent of LSL's equational specification constructs, and thus it is more difficult in VDM-SL to specify a mathematical vocabulary at the same level of abstraction. That is, there is no provision for algebraic specification in VDM-SL.
Because VDM-SL has a component that is like a programming language, and because it allows recursive definitions, the logic used in VDM-SL is a three-valued logic instead of the classical, two-valued logic of LSL. This may make reasoning about VDM-SL specifications relatively more difficult than reasoning about LSL specifications.
In comparison to Larch family BISLs, the first thing to note is that VDM-SL is not a BISL itself, as it is not tailored to the specification of interfaces for some particular programming language. This is evident in the following example, which continues the above VDM-SL phonebook specification.
-- The following is a VDM-SL operation specification
LOOKUP(pb: PhoneBook, id: Id) i: Info
pre listed(pb, id)
post i = pb(id)
In the above VDM-SL specification, it is evident that one can define some generic interface information (parameters, information about their abstract values, etc.). The pre- and postcondition format of such specifications has been adopted by most Larch family BISLs. Framing for procedure specifications in VDM-SL is done by declaring external variables, and noting which are readable and which are writable by a procedure; in LCL and Larch/C++ the corresponding information is given, respectively, by declarations of the global variables used and by the modifies clause. Because VDM-SL is generic, it cannot, by itself, specify language-specific interface details, such as pointers, exception handling, etc. However, an advantage of VDM-SL is that it has better tool support than most BISLs in the Larch family.
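For example, the frame of LOOKUP could be made explicit by treating the phone book as an external state variable instead of a parameter. The following is a sketch; it assumes a state declaration containing pb:

```
-- A VDM-SL operation with an explicit frame (sketch)
LOOKUP(id: Id) i: Info
ext rd pb : PhoneBook    -- pb may be read but not written
pre listed(pb, id)
post i = pb(id)
```

Here "ext rd" plays roughly the role that omitting pb from a modifies clause plays in a Larch BISL.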
Like VDM-SL, Z [Hayes93] [Spivey92] (pronounced "zed") is a specification language that allows both the specification of mathematical values and program modules.
Like LSL, Z allows the definition of mathematical operators equationally (see [Spivey92], section 3.2.2). A Z schema is roughly comparable to an LSL trait. The main difference between Z and LSL is that in Z specifications can include state variables. That is, a Z schema can specify variables, or a procedure with side effects on such variables. In this respect a Z schema can act much like a Larch family BISL, since, unlike LSL, it is not restricted to the specification of mathematical constants and mathematical functions.
By concentrating on the features of Z that do not involve state variables, one can more closely compare Z and LSL. We do this for the next several paragraphs below.
The Z schema calculus (see [Spivey92], section 3.8) allows one to apply to schemas: most logical connectives, quantification, name hiding, and pipelining. (One can also use sequential composition between schemas that specify side-effects.) The schema calculus is thus more expressive than the LSL mechanisms: includes and assumes. However, in LSL one can effectively conjoin traits by including them all to make a new trait.
It is possible in Z to write schemas
that describe constants and mathematical functions only.
However, many purely mathematical Z schemas
describe something more like an instance of an LSL tuple.
This is done in Z by giving several variable declarations.
For example,
consider the following Z schema
(in which Z stands for the integers).
--Z_rational----------------------------------
| num, den: Z
----------------------------------------------
The above is semantically similar to the LSL trait LSL_rational given below.
LSL_rational: trait
includes Integer
introduces
num: -> Int
den: -> Int
However, in Z usage, the schema Z_rational
is likely to be useful,
whereas in LSL one would typically write
a trait such as the following,
which specifies a type (called a sort in LSL) of rational numbers.
LSL_rational_sort: trait
includes Integer
rational tuple of num: Int, den: Int
The above is semantically similar to a Z schema
that defines rational as the Cartesian product
of Z and Z, and specifies the constructors
and observers of the LSL shorthand
(see [Guttag-Horning93], Figure 4.10).
This is a difference in usage, not in expressiveness,
however.
Z and LSL both have comparable built-in notations for describing abstract values. Z has basic types (the integers, booleans, etc.) and more type constructors (sets, Cartesian products, schemas, and free types). A "free type" in Z (see [Spivey92], Section 3.10) is similar to an LSL union type, but Z allows direct recursion. Z also has a mathematical tool-kit (see [Spivey92], Chapter 4) which specifies: relations, partial and total functions, sequences, and bags, and a large number of operation symbols and mathematical functions defined on these types. This tool-kit is roughly comparable to Guttag and Horning's handbook (see [Guttag-Horning93], Appendix A), although it tends to make heavier use of symbols (like "+") instead of named functions (like "plus"). (This use of symbols in Z has been found to be a problem with the understandability of Z specifications [Finney96]. However, the readability of Z is improved by the standard conventions for mixing informal commentary with the formal notation.)
LSL and Z both use classical logic, and the underspecification approach to the specification of partiality, in contrast to VDM-SL. Z does not have any way of stating the equivalent of an LSL partitioned by or generated by clause.
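For example, an LSL trait for finite sets could use both clauses; the following sketch (with illustrative operator names) has no direct counterpart in Z:

```
% This is LSL
SetSketch: trait
  introduces
    {}: -> Set
    insert: E, Set -> Set
    __ \in __: E, Set -> Bool
  asserts
    Set generated by {}, insert      % induction: every set is built this way
    Set partitioned by \in           % sets with the same members are equal
    \forall e, e1: E, s: Set
      ~(e \in {});
      e \in insert(e1, s) == (e = e1 \/ e \in s);
```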
See [Baumeister96] for an extended Z example (the birthday book of [Spivey92]) worked in LSL.
Compared to a Larch family BISL, Z, like VDM-SL, is generic, and thus is not able to specify the interface details of a particular programming language.
An interesting semantic difference between Z and a typical Larch-family BISL (and VDM-SL) is that Z does not have true preconditions (see [Jackson95], Sections 7.2 and 8.1); instead, Z has guards (enabling conditions, which are somewhat like the when clauses of GCIL [Lerner91]). Thus Z may be a better match to the task of defining finite state machines; the use of enabling conditions also (as Jackson points out) allows different views of specifications to be combined. Aside from the Larch BISLs like GCIL that have support for concurrency, most Larch family BISLs have preconditions which are "disclaimers". Because of this difference, and the Z schema calculus, it is much easier to combine Z specifications than it is to combine specifications in a Larch BISL.
A related point of usage difference is the frequency of the use of invariants in Z (Spivey calls these "constraints"; see [Spivey92], section 3.2.2). Because in Z one often specifies schemas with state variables, it makes sense to constrain such variables in various ways. In Z, a common idiom is to conjoin two schemas, in which case the invariants of each apply in the conjoined schema. A typical Larch family BISL has no provision for such conjunction, although object-oriented BISLs (such as Larch/C++) do get some mileage out of conjunction when subtypes are specified.
Because Z does not distinguish operation specifications from auxiliary function specifications, it does not have a separation into tiers. So in Z one can use operations previously specified to specify other operations.
Like Larch, COLD-K [Feijs-Jonkers92] makes more of a separation into mathematical and interface specifications, although all are part of the same language. The part of COLD-K comparable to LSL is its algebraic specifications (see [Feijs-Jonkers92], Chapters 2 and 3). In contrast to LSL, COLD-K does not use classical logic, and can specify partial functions. All COLD-K types have an "undefined" element, except the type of the Booleans. The logic in COLD-K has a definedness predicate that allows one to specify that some term must be undefined, for example; such a specification is not directly supported by LSL. A feature of COLD-K not found in LSL is the ability to hide names (see [Feijs-Jonkers92], Section 3.3); that is, to export only certain names from a COLD-K scheme. Both COLD-K and LSL have mechanisms for importing and renaming.
Compared to a Larch family BISL, COLD-K is often more expressive, because it uses dynamic logic as its fundamental semantics (see [Feijs-Jonkers92], Section 5.9). In COLD-K one can also write algorithmic descriptions, and can mix algebraic, algorithmic, and dynamic logic specifications. COLD-K has relatively more sophisticated mechanisms for framing (see [Feijs-Jonkers92], p. 126, and [Jonkers91]) than most Larch family BISLs (see section 4.9 What does modifies mean?). All of this would seem to make COLD-K both more expressive and difficult to learn than a typical Larch family BISL.
(The following is adapted from a posting of Horning to the larch-interest mailing list on June 19, 1995, which was itself condensed from the preface of [Guttag-Horning93].)
This project has been a long time in the growing. The seed was planted by Steve Zilles on October 3, 1973. During a programming language workshop organized by Barbara Liskov, he presented three simple equations relating operations on sets, and argued that anything that could reasonably be called a set would satisfy these axioms, and that anything that satisfied these axioms could reasonably be called a set. John Guttag refined this idea. In his 1975 Ph.D. thesis (University of Toronto), he showed that all computable functions over an abstract data type could be defined algebraically using equations of a simple form. He also considered the question of when such a specification constituted an adequate definition.
As early as 1974, Guttag and Horning realized that a purely algebraic approach to specification was unlikely to be practical. At that time, they proposed a combination of algebraic and operational specifications which they referred to as "dyadic specification." Working with Wing, by 1980 they had evolved the essence of the two-tiered approach to specification; the term "two-tiered" was introduced by Wing in her dissertation [Wing83]. A description of an early version of the Larch Shared Language was published in 1983. The first reasonably comprehensive description of Larch was published in 1985 [Guttag-Horning-Wing85a].
In the spring of 1990, software tools supporting Larch were becoming available, and people began using them to check and reason about specifications. To make information on Larch more widely available, it was decided to write a book [Guttag-Horning93].
The First International Workshop on Larch, organized by Ursula Martin and Jeannette Wing, was held in Dedham, Massachusetts, July 13-15, 1992 [Martin-Wing93].
According to Jim Horning (personal communication, January 20, 1998) and John Guttag (personal communication, March 28 1998): "The name was not selected at PARC (hence from the Sunset Western Garden Book), but at MIT. The project had been known informally there as `Bicycle'." One day Mike Dertouzos [director of the MIT Laboratory for Computer Science] and John were talking on the phone. According to Jim, Mike asked John to "think up a new name quick, because he could just imagine what Senator Proxmire would say if he noticed that DARPA was sponsoring `a bicycle project' at MIT." John was, he says, "looking at a larch tree at" his "parent's house" and came up with "Larch". He also says "I did grow up in Larchmont, and I'm sure that influenced my choice of name." Jim adds, "It was only later that Butler Lampson suggested that we could use it for the specification of very `larch' programs."
The "Larch" is the common name of a genus of deciduous coniferous trees. The cover of [Guttag-Horning93] has a picture of Larch trees on it. For more pictures of Larch trees, see the following URL.
http://www.cs.iastate.edu/~leavens/larch-pics.html
GIF format pictures of the Larch logo can be found at the following URLs.
http://www.sds.lcs.mit.edu/spd/larch/pictures/larch.gif http://www.sds.lcs.mit.edu/spd/larch/pictures/trees.gif
A good place to start is Guttag and Horning's book [Guttag-Horning93]. (This book is sometimes called "the silver book" by Larchers.) Consider it required reading. If you find that all tough going, you might want to start with Liskov and Guttag's book [Liskov-Guttag86], which explains the background and motivates the ideas behind abstraction and specification. (See section 1.9 What is the use of formal specification and formal methods?, for more background.)
You might also want to read the introductory article about the family [Guttag-Horning-Wing85b], although it is now somewhat dated. You might (especially if you are an academic) want to also read some of the proceedings of the First International Workshop on Larch [Martin-Wing93], which contains several papers about Larch.
The usenet newsgroup `comp.specification.larch' is devoted to Larch. You might also want to read the newsgroup `comp.specification.misc' for more general discussions.
There was a mailing list, called "larch-interest," for Larch. However, it has been discontinued, now that the usenet newsgroup is available. The archives of this list are available from the following URL.
http://www.research.digital.com/SRC/larch/larch-interest.archive.html
Other on-line resources can be found through the global home page for Larch at the following URL.
http://www.sds.lcs.mit.edu/spd/larch/index.html
Other resources are SRC's Larch home page
http://www.research.digital.com/SRC/larch/larch-home.html
and Horning's page.
http://www.star-lab.com/horning/larch.html
This frequently asked questions list (FAQ) is accessible on the world-wide web at the following URL.
http://www.cs.iastate.edu/~leavens/larch-faq.html
HTML, plain text, postscript, and GNU info format versions are also available by anonymous ftp at the following URLs.
ftp://ftp.cs.iastate.edu/pub/larch/larch-faq.html ftp://ftp.cs.iastate.edu/pub/larch/larch-faq.txt.gz ftp://ftp.cs.iastate.edu/pub/larch/larch-faq.ps.gz ftp://ftp.cs.iastate.edu/pub/larch/larch-faq.info.tar.gz
This question might mean one of several other more precise questions. These questions are discussed below.
One question is: is there a Larch-style BISL for some particular object-oriented (OO) programming language? Yes, there are Larch-style BISLs for Modula-3 (see [Guttag-Horning93], chapter 6, and [Jones91]), Smalltalk-80 (see [Cheon-Leavens94]), and C++ (see [Leavens97] [Leavens96b]).
Another question is: does LSL have extensions that support object-oriented specification? No, there are none in particular. One could imagine extensions of LSL with some notion of subsorting, as in OBJ [Futatsugi-etal85] [Goguen-etal87], but as yet this has not been done.
There are endless debates about the use of formal specification and formal methods. For recent discussions see [Saiedian-etal96]. The following discussion may help you enter the debate in a reasonable fashion.
First, be sure you know what formal methods are, and what they are good for [Liskov-Guttag86] [Wing90]. Roughly speaking, formal methods encompass formalized software design processes, formal specifications, formal derivation of implementations, and formal verification. You might also want to look at the following URL for more recent work on formal methods.
http://www.comlab.ox.ac.uk/archive/formal-methods.html
Formal verification of implementations that were not designed with verification in mind is impossible in general. Formal verification has been critiqued as too difficult to justify and manage [DeMillo-Lipton79]. Programs can also only be verified relative to some abstract machine [Fetzer88]. However, these problems haven't stopped people from trying to use formal verification to gain an added level of confidence (or to help find bugs [Garland-Guttag-Horning90] [Tan94]) in software that is very critical (in terms of safety or business factors). Of course, you can still use formal methods without doing formal verification.
Several advocates of formal methods have tried to explain formal methods by debunking the "myths" that surround them [Hall90] [Bowen-Hinchey95]. Bowen and Hinchey have also given some advice in the form of "commandments" for good use of formal methods [Bowen-Hinchey95b].
Of the various kinds of formal methods, formal specification (that's where Larch comes in) seems to be the most useful. At a fundamental level, some specification is needed for any abstraction (otherwise there is no abstraction [Liskov-Guttag86]). A specification is useful as a contract between the implementor and a client of an abstraction (such as a procedure or a data type). This contract lays out the responsibilities and obligations of both parties, and allows division of labor (and a division of blame when something goes wrong). However, such a specification need not be formal; the main reason for preferring formal specification over informal specification is that formal specification can help prevent the ambiguity and unintentional imprecision which plague informal specification.
There are, however, some disadvantages to using formal specifications. One shouldn't take the search for precision to an extreme and try to formalize everything, as this will probably cost much time and effort. (Instead, try to formalize just the critical aspects of the system.) Another disadvantage is that a formal specification can itself contain errors and inconsistencies, which can lead to the specification becoming meaningless. The use of formality by itself also does not lead to well-structured and readable specifications; that is a job that humans have to do.
Although you can use the Larch languages and tools without subscribing to any particular view on the utility of formal methods, there is an emerging view of this question in the Larch community. This view is that formal specification and formal methods are most useful as aids to debugging designs [Garland-Guttag-Horning90] [Tan94]. The basic idea is that one writes (what one thinks is) redundant information into specifications, and this information is compared with the rest of the specification. Often the supposedly redundant information does not match, and trying to check that it does brings out design or conceptual errors. Thus formal methods play a role complementary to prototyping, in that they allow one to analyze certain properties of a design to see if there are lurking inconsistencies or problems.
Much of what has been said above seems logical, but experimentation would be helpful to understand what is really true and to help quantify the costs and benefits of formal methods.
One last answer to the question: some people find writing formal specifications (and even proofs) fun. No one knows why.
In this chapter, we discuss questions about the Larch Shared Language (LSL).
The Larch Shared Language (LSL) (see [Guttag-Horning93], Chapter 4, and [Guttag-Horning-Modet90]) is a language for specifying mathematical theories. LSL is a kind of equational algebraic specification language [Guttag75] [Guttag-Horning78] [Goguen-Thatcher-Wagner78] [Ehrig-Mahr85] [Futatsugi-etal85] [Mosses96] [Loeckx-Ehrich-Wolf96]. That is, specifications in LSL mainly consist of first-order equations between terms. The semantics of LSL is based on classical, multisorted first-order equational logic (see section 2.14 What is the meaning of an LSL specification?). However, LSL does have two second-order constructs, which allow you to specify rules of data type induction and certain kinds of deduction rules (see section 2.10 How and when should I use a generated by clause?, and see section 2.9 How and when should I use a partitioned by clause?).
Unlike a programming language, LSL is a purely declarative, mathematical formalism. In LSL there are no side-effects, algorithms, etc. (see section 2.18 What pitfalls are there for LSL specifiers?). Instead of specifying computation, in LSL one specifies mathematical operators and their theories. These operators are used in Larch behavioral interface specification languages as their mathematical vocabulary (see section 1.1 What is Larch? What is the Larch family of specification languages?).
An unusual feature of LSL is that it provides ways to add checkable redundancy to specifications (see section 2.11 When should I use an implies section?). See section 1.4 How does Larch compare to other specification languages?, for more detailed comparisons of LSL and related specification languages.
See Chapter 4 of [Guttag-Horning93], and the rest of this chapter, for more information.
Besides this FAQ, the best place to look is probably your own computer
system. You should have a manual page for the LSL checker,
if it's installed on your system.
Try the Unix command man lsl to see it.
You should also look for a directory (such as `/usr/larch/LSL') where the installation of LSL is found. In that directory, you will find a subdirectory `Doc', where there is some documentation on the checker. See section 2.3 Where can I get a checker for LSL?, if you don't have the LSL checker installed on your system.
There is no official reference manual for the latest version of LSL (3.1) on-line. The defining document for LSL version 2.3 [Guttag-Horning-Modet90] is available from the URL
ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-058.ps.gz
You can get the MIT releases of the LSL checker using the world-wide web, starting at the following URL.
http://www.sds.lcs.mit.edu/spd/larch/lsl.html
You can also get the MIT release by anonymous ftp from the following URL.
ftp://ftp.sds.lcs.mit.edu/pub/Larch/LSL/
At Iowa State, the Larch/C++ group has made available later beta releases of the LSL checker that fix problems with its profligate use of space. You can get the sources, as well as Unix and Windows 95 executables, from the following URL.
ftp://ftp.cs.iastate.edu/pub/larch/
You might also be interested in the Larch Development Environment (LDE), from Cheng's group at Michigan State University. This environment integrates tools that support LSL, LP, LCL, and Larch/C++. It runs under SunOS and Solaris. You can get it from the following URL.
ftp://ftp.cps.msu.edu/pub/serg/tools/LDE/
The following makefile shows one way to use the Unix
command make to help check LSL traits.
It relies on properties of the Bourne shell under Unix
and standard Unix make,
so it would have to be adjusted to work on other systems.
(Also, if you cut and paste this makefile, be sure to change the
leading blanks to tab characters on the rule lines.)
SHELL = /bin/sh
LSL = lsl
LSLFLAGS =

.SUFFIXES: .lsl .lsl-ckd .lp

.lsl.lsl-ckd:
        $(LSL) $(LSLFLAGS) $< 2>&1 | tee $@

.lsl.lp:
        $(LSL) -lp $(LSLFLAGS) $<

checkalltraits:
        $(LSL) $(LSLFLAGS) *.lsl

clean:
        rm -f *.lsl-ckd *.lpfrz

cleanall: clean
        rm -f *.lpscr *.lp *.lplog
If you use the emacs text editor, you can then check your traits by using M-x compile (and responding with something like checkalltraits and a return). When you do this, you can then use C-x ` (which is an alias for M-x next-error) to edit the LSL trait with the next error message (if you're using LSL checker version 3.1beta5 or later). Emacs will put the cursor on the appropriate line for you automatically.
See section 3.10 How do I use LP to check my LSL traits?, for more information on how to use LP to check the implications in LSL traits.
The "sorts" in an LSL trait correspond to the types in a programming language. They are names for sets of values.
This usage is historical (it seems to come from [Goguen-Thatcher-Wagner78]), and arises from the desire to avoid the word "type", which is heavily overused. However, it does have some practical benefit, in that in the context of a BISL, one can say that a type comes from a programming language, but a sort comes from LSL.
Unlike some other specification languages
(e.g., OBJ [Futatsugi-etal85] [Goguen-etal87]),
in LSL there is no requirement that all sort names be declared.
For example, in the following trait,
a sort Answer is used.
AnswerTrait: trait
introduces
yes, no, maybe: -> Answer
In summary, LSL sorts are declared by being mentioned in the signatures of operators.
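As an aside (an analogy of ours, not part of LSL), the sort Answer with its nullary operators yes, no, and maybe resembles an enumerated type in a programming language. The following Python sketch illustrates the correspondence; note that, unlike AnswerTrait alone, the Python type really is generated by its three constants.

```python
from enum import Enum

# A rough programming analogy (not LSL): the nullary operators
# yes, no, and maybe of sort Answer behave like the constants of
# an enumerated type.
class Answer(Enum):
    YES = "yes"
    NO = "no"
    MAYBE = "maybe"
```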
An LSL trait is used to describe a mathematical theory. This could be used by a mathematician to simply formalize a theory, but more commonly the theory specified is intended to be used as the mathematical vocabulary for some BISL. Another common use is as a way of specifying input to the Larch Prover (LP).
When used as mathematical vocabulary for some behavioral interface
specification, one can identify some other common uses.
Quite often one wants to specify the abstract values of some
data type.
(See section 4.4 What is an abstract value?.)
Examples include the Integer, Set,
Queue, and String
traits in Guttag and Horning's handbook [Guttag-Horning93a].
Another common use is to specify some function on abstract values,
in order to ease the specification of some procedure.
One might also write a trait for various other purposes.
The sections of an LSL trait are determined by the LSL grammar [Guttag-Horning-Modet90]. (See section 2.3 Where can I get a checker for LSL?, for a more recent grammar, which is found in the file `Doc/lsl3_1.y', for version 3.1, and `Doc/lsl3_2.y', for version 3.2.)
As a rough guide, one can have the following sections, in the following order: assumes, includes, shorthands (enumeration of, tuple of, union of), introduces, asserts, and implies.
A good general introduction is found in [Guttag-Horning93], Chapter 4. See section 2.8 What is the difference between assumes and includes?, for more on the purposes of the assumes and includes clauses.
One way to think of this is that includes requests textual
inclusion of the named traits, but assumes only says that
all traits that include the trait being specified
must include some trait or traits that make the assumptions true.
Hence includes is like #include in C or C++,
but assumes is quite different.
A common reason to use assumes instead of includes
is when you want to specify
a trait with a sort parameter,
and when you also need some operators to be defined on your sort
parameters.
For example, Guttag and Horning's handbook trait Sequence
(see [Guttag-Horning93], page 174),
starts as follows.
Sequence(E, C): trait assumes StrictPartialOrder(>, E)
The assumes clause is used to specify that the sort parameter
E
must have an operator > that makes it a strict partial order.
One way to view such assumptions is that they specify
what is needed about the theory of a sort parameter in order to
sort check,
and
make sense of
the operators that apply to the sort parameter [Goguen84] [Goguen86].
One can also use assumes to state assumptions about operators
that are trait parameters.
For example, Guttag and Horning's handbook trait PriorityQueue
(see [Guttag-Horning93], page 175),
starts as follows.
PriorityQueue (>: E,E -> Bool, E, C): trait assumes TotalOrder(E for T)
The assumes clause is used to specify that the parameter
>:E,E -> Bool must make E into a total order.
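Loosely speaking, such an assumption plays the role that a constraint on a type parameter plays in programming. The following Python sketch (our analogy; the class and method names are hypothetical) mimics the PriorityQueue idea: the element type is simply assumed to support a total order via "<".

```python
# Hypothetical sketch (ours, not from the handbook): a priority
# queue whose element type is *assumed* to support a total order
# via "<", much as PriorityQueue assumes TotalOrder(E for T).
class PriorityQueue:
    def __init__(self):
        self._items = []

    def add(self, e):
        self._items.append(e)

    def head(self):
        # Relies on the assumed total order on elements.
        return min(self._items)

q = PriorityQueue()
for x in [3, 1, 2]:
    q.add(x)
```

As in LSL, nothing in the queue itself supplies the order; a client must use it at element types where the assumption holds.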
Another reason to use an assumes clause instead of an includes
clause is to state that a trait is not self-contained,
but is intended to be included in some context in which
some other trait has been included.
For example, Guttag and Horning's handbook trait IntegerPredicates
(see [Guttag-Horning93], page 164),
starts as follows.
IntegerPredicates (Int): trait assumes Integer
The assumes clause is used to specify that this trait has
to be used in a trait that has included (the theory of)
the trait Integer.
See [Guttag-Horning93], Section 4.6, for more details.
You should use a partitioned by clause to identify (i.e., make equivalent) some terms that would otherwise be distinct in an LSL trait's theory. A classic example is in the specification of sets, which includes the following partitioned by clause (see [Guttag-Horning93], page 166).
C partitioned by \in
In this example, C is the sort for sets,
and \in is the membership operator on sets.
This means that two sets, s1 and s2, are distinct (i.e., not equal)
if there is some element e such that
e \in s1 is not equal to e \in s2.
Another way to look at this is that you can prove that two sets
are equal by showing that they have exactly the same members.
(The sets insert(1, insert(2, {})) and
insert(2, insert(1, {})) are thus equal.)
This property is what distinguishes a set from a list,
because it says that order does not matter in a set.
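The same property can be observed in any language whose built-in sets compare by membership. The following Python sketch (ours, not from the handbook) shows both the order-independence and the membership test that distinguishes sets.

```python
# Sets are compared by membership alone, so insertion order does
# not matter; this mirrors "C partitioned by \in".
s1 = frozenset([2, 1])   # like insert(1, insert(2, {}))
s2 = frozenset([1, 2])   # like insert(2, insert(1, {}))

# Two sets are distinct only if some element is in one but not
# the other (the implicit quantification over e).
def distinguishable(a, b, universe):
    return any((e in a) != (e in b) for e in universe)
```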
The reason there is a separate clause in LSL for this is because of the implicit quantifiers involved. The partitioned by clause in the example above is more powerful than the following (see [Wing95], Section 5.1).
\forall e: E, s1, s2: Set
((e \in s1) = (e \in s2)) => s1 = s2
The reason is that the above formula would allow
you to conclude that {1,2} = {2,3},
because both have 2 as a member.
(That is, one can let e be 2,
and then the left hand side of the implication holds.)
Historically, LSL did not have a way to write quantifiers inside
its formulas, and so the partitioned by clause was necessary.
With LSL version 3.1, however,
one could write the following equivalent
to the above partitioned by clause for Set.
(See section 2.24 How do I write logical quantifiers within an LSL term?,
for the meaning of \A.)
\forall e: E, s1, s2: Set
(\A e (e \in s1) = (e \in s2)) => s1 = s2
However, the above formula will be harder to work with in LP than the equivalent partitioned by clause. This is because LP has special provision for using partitioned by clauses. Another reason to prefer the partitioned by clause is that it conveys intent more clearly than a general logical formula like the above. Thus there are still good reasons to use a partitioned by clause in LSL.
See section 4.2 in Guttag and Horning's book [Guttag-Horning93] for more details.
You should use a generated by clause to specify that there
can be no abstract values beside the ones that you are describing
for a given sort. This is quite common for specifications
of the abstract values of data types, but will not usually
be the case for the abstract values of sort parameters
(such as E in Queue[E]).
A generated by clause can also be thought of as excluding
"junk"
(i.e., nonstandard values) from a type [Goguen-Thatcher-Wagner78].
For example, in the trait AnswerTrait
(see section 2.5 What is a sort?),
nothing in the specification prevents there being other elements
of the sort Answer besides yes, no,
and maybe.
That is, if x has sort Answer, then the formula
x = yes \/ x = no \/ x = maybe
is not necessarily true;
in other words, it's not valid.
However, in the following trait the above formula is valid,
because of the generated by clause.
(This is stated in the trait's implies section.)
AnswerTrait2: trait
includes AnswerTrait
asserts
Answer generated by yes, no, maybe
equations
yes \neq no;
yes \neq maybe;
no \neq maybe;
implies
\forall x: Answer
x = yes \/ x = no \/ x = maybe;
Yet another reason to use a generated by clause
is to assert a rule
of data type induction.
This view is particularly helpful in LP,
because it allows one to divide up a proof into cases,
and to use induction.
For example, when proving some property P(x)
holds for all x of sort Answer,
the generated by clause of the trait AnswerTrait2
allows one to proceed by cases;
that is, it suffices to prove P(yes), P(no),
and P(maybe).
A more typical induction example can be had by considering
Guttag and Horning's handbook trait Set
(see [Guttag-Horning93], page 167).
In this trait, C is the sort for sets.
It has the following generated by clause
(by virtue of including the trait
SetBasics, see [Guttag-Horning93], page 166).
C generated by {}, insert
This clause says that the only abstract values of sort C
are those that can be formed from (finite) combinations
of the operators {} and insert.
For example, insert(x, insert(y, {})) is a set.
This rules out, for example, infinite sets,
and sets that are nonstandard.
It does not say that each such term denotes a distinct set,
and indeed some terms fall in the same equivalence class
(see section 2.9 How and when should I use a partitioned by clause?).
It also does not mean that other operators that form sets
cannot be defined;
for example, in the trait Set
one can write {x} \U {y}.
What it does say is that every other way to build a set,
like this one, has to
be equivalent to some use of {} and insert.
Let us consider a sample proof by structural induction in
the trait Set.
One of the implications in this trait is that
\forall s: C
size(s) >= 0
To prove this, one must prove that it holds for all sets s.
Since, by the generated by clause,
all sets can be formed from {} and insert,
it suffices to prove two cases.
For the base case, one must prove the following trivial result.
size({}) >= 0
For the inductive case, one assumes, for an arbitrary set s,
the induction hypothesis
size(s) >= 0
and then one must prove the following.
\forall e: E
size(insert(e, s)) >= 0
This follows fairly easily; we leave the details to the reader.
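To get a feel for the two proof cases, here is a small Python sketch (the representation and names are ours, not the handbook's) that mirrors the size operator over the generators {} and insert; by construction, each case of the recursion matches one case of the induction.

```python
# Terms built from the generators {} and insert, modeled here as
# None (for {}) and nested pairs (e, s) (our representation).
EMPTY = None

def insert(e, s):
    return (e, s)

def member(e, s):
    if s is EMPTY:
        return False
    x, rest = s
    return e == x or member(e, rest)

def size(s):
    # One case per generator, matching the two proof obligations:
    # the base case size({}) and the inductive case size(insert(e, s)).
    if s is EMPTY:
        return 0
    e, rest = s
    return size(rest) if member(e, rest) else 1 + size(rest)
```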
See section 4.2 in Guttag and Horning's book [Guttag-Horning93] for more details on the generated by clause.
One situation where one usually does not want to use a
generated by clause is when specifying the theory of a sort parameter.
For example, in the trait Set
(see [Guttag-Horning93], page 167),
there is no generated by clause describing the elements E
of the set.
The reason not to use a generated by clause for such sort parameters
is to make the assumptions on such parameters as weak as possible.
Thus, unless one has to do induction over the elements of a sort parameter,
one would not need to know that they are generated.
You should usually use an implies section in a trait, even though it adds nothing to the theory of a trait, and is completely redundant. The reasons for doing so are as follows.
Redundancy is a good thing for debugging and consistency checking; after all, one needs two pieces of information to do any checking.
A motto that mathematicians live by is the following. (You do realize that writing an LSL trait is doing math, don't you?)
Few axioms, many theorems.
For LSL, this motto means that you should try to make your asserts section as small and as elegant as possible, and to push off any redundancy into the implies section.
An important piece of redundant information you can provide for a reader in the implies section is a converts clause. You should also consider using a converts clause for each operator you have introduced in the trait. See section 2.12 How and when should I use a converts clause?.
Writing a good implies section for a trait is something that takes some practice. It helps to get a feel for what can be checked with LP first (see [Guttag-Horning93], Chapter 7). It also helps to look at some good examples; for example, take a look at the traits in Guttag and Horning's handbook (see [Guttag-Horning93], Appendix A); most of these have an implies section, and some are quite extensive.
One general hint about writing implies clauses comes from the fact that there are an infinite number of theorems that follow from every LSL specification. (One infinite set of theorems concerns the built-in sort Bool.) Since there are an infinite number of theorems, you can't possibly list all of them. Focus on the ones that you think are interesting, especially those that will help the reader understand what your specification is about, and those that will help you debug your specification.
See Section 4.5 in Guttag and Horning's book [Guttag-Horning93] for more details.
A converts clause goes in the implies section of a trait. It is a way to make redundant claims about the completeness of the specification of various operators in a trait.
You should consider stating a converts clause for each non-constant operator you introduce in a trait. If you list an operator in a converts clause, you would be saying that the operator is uniquely defined "relative to the other operators in a trait" (see [Guttag-Horning93], p. 142, [Leavens-Wing97], Section B.1). Of course, this would not be true for every operator you might define. This means that if you state that an operator is converted in the implies section of a trait, then you are claiming that there is no ambiguity in the definition of that operator, once the other operators of the trait are fixed.
As an example of a converts clause, consider the following trait.
ConvertsExample: trait
includes Integer
introduces
unknown, hmm, known: Int -> Int
asserts \forall i: Int
hmm(i) == unknown(i);
known(i) = i + 1;
implies
converts
known: Int -> Int,
hmm: Int -> Int
In this trait, the converts clause claims that both
known and hmm are converted.
It should be clear that known is converted,
because it is uniquely-defined for all integers.
What may be a bit puzzling is that hmm is also converted.
The reason is that once the interpretation of the other operators
in the trait are fixed, in particular once the interpretation of
unknown is fixed, then hmm is uniquely-defined.
It would not be correct to claim that both hmm and
unknown are converted; since they are defined in terms of
each other, they would not be uniquely defined relative to the
other operators of the trait.(1)
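The following Python sketch (our analogy, not LSL) may make this clearer: once some interpretation of unknown is fixed, hmm and known are uniquely determined, no matter which interpretation was picked.

```python
# Fix *some* interpretation for the underspecified operator
# unknown (returning 42 is an arbitrary choice of model, not
# part of the trait)...
def unknown(i):
    return 42

# ...and then the converted operators are uniquely determined.
def hmm(i):
    return unknown(i)   # hmm(i) == unknown(i)

def known(i):
    return i + 1        # known(i) = i + 1
```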
See Section 4.5 and pages 142-144 in Guttag and Horning's book [Guttag-Horning93] for more details.
Often one wants to say that some operator is uniquely-defined, but not for some arguments. This can be done using an exempting clause, which is a subclause of a converts clause. See section 2.13 How and when should I use an exempting clause?, for more discussion.
An exempting clause is a subclause of a converts clause. (See section 2.12 How and when should I use a converts clause?.) You should use an exempting clause when you want to claim that some operator is uniquely-defined, "relative to the other operators in a trait" ([Guttag-Horning93], p. 142), except for some special cases.
A very simple example is the trait List
(see [Guttag-Horning93], p. 173),
where the converts clause includes the following.
converts head
exempting head(empty)
This just says that head is well-defined,
except for taking the head of an empty list.
A slightly more complicated exempting clause is illustrated by the following classic example. Suppose one specifies a division operator on the natural numbers. In such a trait (see [Guttag-Horning93], p. 201), one could use the following trait outline.
Natural (N): trait
introduces
0: -> N
div: N, N -> N
% ...
asserts
% ...
implies
converts div: N, N -> N
exempting \forall x: N
div(x, 0)
The converts clause in this example claims that div is
uniquely-defined, relative to the other
operators in a trait, except that division by 0 is not defined.
Without the exempting subclause, the converts clause
would not be correct.
See the following subsections and page 44 of [Guttag-Horning93] for more information.
No, exempting a term does not make it undefined.
An exempting clause merely
takes away the claim that it is defined
(see [Guttag-Horning93], page 44).
For example, in the trait List
(see [Guttag-Horning93], p. 173),
writing the following does not make head(empty) undefined.
converts head
exempting head(empty)
If anything, it is the lack of an axiom defining the value of
head(empty) that makes it "undefined".
However, you would certainly want to use an exempting
clause in situations where you have a term that is intended to
be "undefined".
The concept of "undefinedness" is a tricky one for LSL.
In a technical sense, nothing is undefined in LSL,
because each constant and variable symbol has a value,
and each non-constant operator is a total function
(see [Guttag-Horning93], p. 9, and [Leavens-Wing97], Appendix A).
It would be more accurate to say that in the trait List,
the term head(empty) is "underspecified".
This means it has a value, but the trait doesn't specify what
that value is.
See section 2.16 Is there support for partial specifications in LSL?,
for more discussion on this point.
With LSL (through version 3.1), the exempting subclause
of a converts clause is not based on semantics;
instead, it is based on syntax.
You can only exempt terms having particular forms;
these forms allow the arguments to an operator
to be either ground terms (as in head(empty)),
or universally quantified variables or both
(as in div(x, 0)).
If you want to exempt all terms for which one argument is positive, or has a size larger than 3, then you are out of luck. All you can do is to state your conversion claim informally in a comment (as you can't use a converts clause naming the operator in question). If you want to rewrite your trait, you could, for example, change the appropriate argument's sort to be the natural numbers instead of the integers, or define a new sort for which your operator is completely-defined.
See Section B.2 of [Leavens-Wing97] for a proposed extension to LSL that would solve this problem.
An LSL trait denotes a theory,
which is a collection of true formulas (of sort Bool).
This theory contains "the trait's assertions,
the conventional axioms of first-order logic, everything that follows
from them, and nothing else"
(see [Guttag-Horning93], p. 37).
For a brief introduction to these ideas,
see Chapter 2 of [Guttag-Horning93];
for general background on equational logic,
see [Ehrig-Mahr85] or [Loeckx-Ehrich-Wolf96].
Another way to look at the meaning of an LSL trait is the set of models it denotes. An algebraic model of a trait has a set for each sort, and a total function for each operator. In addition, it must satisfy the axioms of first-order logic and the axioms of the trait. A model satisfies an equational axiom if it assigns the same value to both sides of the equation. (See [Ehrig-Mahr85], Chapter 1, or [Loeckx-Ehrich-Wolf96], Chapter 5, for more details.)
In contrast to some other specification languages,
the semantics of LSL is loose
(see [Guttag-Horning93], p. 37).
That is, there may be several models of a specification
that have observably distinct behavior.
For example, consider the trait ChoiceSet
in Guttag and Horning's handbook (see [Guttag-Horning93], p. 176).
There are many models of this trait with different behavior;
for example, it might be that in one model
choose({2} \U {3}) = 2,
and in some other model choose({2} \U {3}) = 3.
The specification won't allow you to prove either of these things,
of course, because whatever is provable from the specification
is true in all models.
The following answer is adapted from a posting to `comp.specification.larch' by Horning (July 19, 1995) in response to a question by Leavens (July 18, 1995).
The trouble starts with your question: there are no "undefined terms" in LSL. LSL is a language of total functions. There are no partial functions associated with operators, although there may be underspecified operators (i.e., operators bound to functions with different values in different models). The possibility of underspecification is intentional; it is important that models of traits do not have to be isomorphic.
Another way to say this is that in an algebraic model of a trait (see section 2.14 What is the meaning of an LSL specification?), each term denotes an equivalence class, which can be thought of as its definition in that algebra. In other algebras, the same term may be in a nonisomorphic equivalence class, if the term is underspecified.
To make this clearer, consider the following trait.
QTrait : trait
includes Integer
Q union of theBool: Bool, theInt: Int
introduces
isPositive: Q -> Bool
asserts
\forall q: Q
isPositive(q) == q.theInt > 0 /\ (tag(q) = theInt);
implies
\forall q: Q, b: Bool
isPositive(q) == (tag(q) = theInt) /\ q.theInt > 0;
isPositive(q) == if (tag(q) = theInt)
then q.theInt > 0 else false;
isPositive(theBool(b)) == false;
converts isPositive: Q -> Bool
In this trait, the question is whether the assertion that defines
isPositive is written correctly.
In particular, is isPositive(theBool(true))
"undefined" or an error?
This is clarified in the implies section of the trait,
which claims that the order of conjuncts in the definition of isPositive
does not matter, that the definition using if is equivalent,
that the meaning of isPositive(theBool(true)) is false,
and that isPositive is converted
(see section 2.12 How and when should I use a converts clause?).
The trait QTrait says that q.theInt is always of sort Int.
However, it places no constraint on what integer it is,
unless tag(q) = theInt. So in the definition and the first
implication,
both operands of /\ are always defined
(i.e., always mean either true or false,
since Bool is generated by true and false),
even though we may not always know which.
So the definition is fine, and the implications are all provable.
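A Python sketch (our tagged-value encoding, not part of the trait) shows the same point operationally: isPositive is total on all of Q, returning false on every theBool value.

```python
# A tagged-value analogue (our representation) of the union sort
# Q (union of theBool: Bool, theInt: Int).
def theBool(b):
    return ("theBool", b)

def theInt(i):
    return ("theInt", i)

def tag(q):
    return q[0]

def is_positive(q):
    # Total on all of Q: when the tag is "theBool", the result is
    # false regardless of what value q.theInt may denote.
    return tag(q) == "theInt" and q[1] > 0
```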
The fact that you don't have to use if, and that all the classical laws of logic (like conjunction being symmetric) apply is a big advantage of the total function interpretation of LSL. See section 2.16.1 Can I specify a partial function in LSL? for more discussion.
There are two ways one might understand this question. These are dealt with below.
Technically, no; all functions specified in
LSL are total
(see section 2.14 What is the meaning of an LSL specification?).
Thus every operator specified in LSL takes on some value
for every combination of arguments.
What you can do is to underspecify such an operator,
by not specifying what its value is on all arguments.
For example, the operator head in the
handbook trait List
(see [Guttag-Horning93], p. 173)
is underspecified in this sense, because no value is specified
for head(empty).
As a synonym, one can say that head is weakly-specified.
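A Python sketch (our encoding; the value chosen for the empty case is an arbitrary assumption, not anything the trait specifies) shows how a total function can imitate the underspecified head.

```python
# Lists built from the generators empty and cons (our encoding).
EMPTY = ()

def cons(e, lst):
    return (e,) + lst

# head is total, as in LSL; its result on EMPTY is an arbitrary
# fixed value, mimicking underspecification: another "model"
# could pick a different value, and the trait allows both.
ARBITRARY = 0

def head(lst):
    return lst[0] if lst != EMPTY else ARBITRARY
```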
One reason LSL takes this approach to dealing with partial functions is that it makes the logic used in LSL classical. One doesn't have to worry about non-classical logics, and "bottom" elements of types infecting a term [Gries-Schneider95]. Another reason is that when LSL is used with a BISL, the precondition of a procedure specification can protect the operators used in the postcondition from any undefined arguments [Leavens-Wing97].
(If you want to develop a theory of partial functions,
however, you can do that in LSL.
For example, you can specify a type with a constant bottom,
that, for you, represents "undefined".
You will just have to manipulate your partial functions
with LSL operators that are total functions.)
No, you don't have to specify everything completely in LSL. It's a good idea, in fact, to only specify the aspects that are important for what you are doing. For example, if you want to reason about graph data structures, it's best to not (at first, anyway) try to specify the theory of graphs in great detail. Put in what you want, and go with that. (This won't hurt anything, because whatever you need to prove you'll be forced to add eventually.)
Anything you can specify in LSL, you can do incompletely. (There are no "theory police" that will check your traits for completeness.) To incompletely specify a sort, for example, nodes in a graph, just use its name in operations and don't say anything more about it. If you need to have some operators that manipulate nodes, then write their signatures into the introduces section, but don't write any axioms for them. If you need to know some properties of these operators, you can write enough axioms in the asserts section to prove them (perhaps making some of the properties theorems in the implies section).
The time to worry about the completeness of your theory is when you are having trouble proving some of the things you want to prove about it, or if you are preparing it for reuse by others. (On the other hand, a less complete theory is often more general, so when trying to generalize a trait for others to use, sometimes it helps to relax some of the properties you have been working with.)
Yes, you can define operators recursively.
A good, and safe, way to do this is to use structural induction
(e.g., on the generators of a sort).
For example, look at how count is defined
in the handbook trait BagBasics
(see [Guttag-Horning93], p. 168).
In this trait, the sort C of bags is generated by
the operators {} and insert.
There is one axiom that specifies count when its second
argument is the empty bag, and one that specifies count
when its second argument is constructed using insert.
There are many other examples in the Guttag and Horning's handbook
(see [Guttag-Horning93], Appendix A),
for example
the many specifications of operator size (Section A.5),
the specification of the operators head and tail
in the trait Queue (p. 171), the specification
of flatten in the trait ListStructureOps (p. 183),
and the specification of operator + in the trait
Positive (p. 202).
See section 2.19 Can you give me some tips for specifying things with LSL?,
for more details on structural induction.
This kind of recursive specification is so common that if you are familiar with functional programming (especially in a language like Haskell or ML), then it's quite helpful to think that way when writing LSL traits.
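For example, the count equations of BagBasics transcribe almost directly into a recursive program; the following Python sketch (our representation of bags, not the handbook's text) follows the same case split over the generators.

```python
# Bags built from the generators {} and insert, modeled as None
# and nested pairs (our representation).
EMPTY_BAG = None

def insert(e, b):
    return (e, b)

def count(e, b):
    # One equation per generator, as in BagBasics:
    #   count(e, {}) == 0
    #   count(e, insert(e2, b)) == count(e, b) + (if e = e2 then 1 else 0)
    if b is EMPTY_BAG:
        return 0
    e2, rest = b
    return count(e, rest) + (1 if e == e2 else 0)

b = insert(1, insert(2, insert(1, EMPTY_BAG)))
```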
There is one thing to be careful of, however.
Unless you use structural induction,
you might specify an inconsistent trait.
The following example (from David Schmidt, personal communication),
illustrates the problem.
This trait includes the Integer trait of Guttag and Horning's
handbook (see [Guttag-Horning93], p. 163).
BadRecTrait: trait
includes Integer
introduces
f: Int -> Int
asserts
\forall i: Int
f(i) == f(i) + 1;
It's easy to prove that this trait is inconsistent;
subtract f(i) from both sides of the defining equation
for f, and you obtain 0 = 1.
As a heuristic for avoiding this kind of mistake,
avoid recursive specifications that don't "make progress"
when regarded as programs.
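The heuristic can be made vivid by reading the bad axiom as a program: it makes no progress, so it loops forever. A Python sketch (ours) shows the operational symptom.

```python
# Reading the axiom f(i) == f(i) + 1 as a program gives a
# recursion that never makes progress: each call recurses on the
# very same argument.
def f(i):
    return f(i) + 1

def diverges(g, arg):
    # Python reports the unbounded recursion, rather than letting
    # us "conclude" that 0 = 1.
    try:
        g(arg)
        return False
    except RecursionError:
        return True
```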
It's also possible to fall into a pit when recursively defining a partial function. The following is a silly example, but illustrates the problem.
BadZeroTrait: trait
includes Integer, MinMax(Int)
introduces
zero: Int -> Int
asserts
\forall k: Int
zero(k) == if k = 0 then 0 else min(k, zero(k-1));
This is silly, the idea being to define a constant function
that returns zero for any nonnegative integer.
Still, it looks harmless.
However, what does the specification say about zero(-1)?
It's easy to see that it's less than -1, and less than
-2, and indeed less than any integer.
But the Integer trait in Guttag and Horning's handbook
(see [Guttag-Horning93], p. 163)
does not allow there to be such an integer;
so this trait is inconsistent.
To solve this problem, you could change the axiom defining
zero to indicate the intended domain,
as follows.
\forall k: Int
(k >= 0) => zero(k) = (if k = 0 then 0 else min(k, zero(k-1)));
This revised axiom would allow zero(-1) to have any value.
Another cure for this kind of example would be to specify the domain
of zero as the natural numbers.
(Of course, for this particular example, a nonrecursive definition
is also preferable.)
See section 2.18 What pitfalls are there for LSL specifiers?, for another thing to watch for when making a recursive definition.
According to Guaspari (posting to the larch-interest mailing list,
on March 8, 1995),
"the commonest `purely mathematical' mistakes" in trait definitions
occur when one uses structural induction "over constructors
that don't freely generate a sort".
To understand this, consider his example,
which includes the handbook trait Set
(see [Guttag-Horning93], page 167).
GuasparisExample: trait
includes Integer, Set(Int, Set[Int])
introduces
select: Set[Int] -> Int
asserts
\forall s: Set[Int], e: Int
select(insert(e, s)) == e;
implies
equations
insert(1, {2}) == insert(2, {1});
select(insert(1, {2})) == select(insert(2, {1}));
1 == 2; % ouch!
In the above example, the problem with the definition of select
is that it's not well-defined with respect to the equivalence
classes of the sort Set[Int].
The problem is highlighted in the implies section of the trait.
The first implication is a reminder that the order of insertion in
a set doesn't matter; that is, insert(1, {2}) and
insert(2, {1}) are equivalent sets
because they have the same members.
In the jargon, insert(1, {2}) and
insert(2, {1}) are members of the same equivalence class.
The second implication is a reminder that an operator,
such as select, must give the same result on equal arguments;
in the jargon, a function must "respect" the equivalence classes.
But the definition of select does not respect the equivalence
classes, because if one applies the definition to both sides of
the second implication, one concludes that 1 = 2.
One way to fix the above definition is shown by
the trait ChoiceSet in Guttag and Horning's handbook
(see [Guttag-Horning93], page 176).
Another way that would work in this example
(because integers are ordered) would be to define
an operator minimum on the sort Set[Int],
and to use that to define select as follows.
GuasparisExampleCorrected: trait
includes Integer, Set(Int, Set[Int]), MinMax(Int)
introduces
select: Set[Int] -> Int
minimum: Set[Int] -> Int
asserts
\forall s: Set[Int], e, e1, e2: Int
minimum(insert(e, {})) == e;
minimum(insert(e1, insert(e2, s)))
== min(e1, minimum(insert(e2, s)));
select(insert(e, s)) == minimum(insert(e, s));
implies
\forall s: Set[Int], e1, e2: Int
minimum(insert(e1, insert(e2, s)))
== minimum(insert(e2, insert(e1, s)));
select(insert(1, {2})) == 1;
select(insert(2, {1})) == 1;
To show that minimum is well-defined,
one has to prove that it gives the same answer
for all terms in the same equivalence class.
The first implication in this trait does not prove
this in general, but it does help in debugging the trait.
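The well-definedness issue can be simulated in Python (a hypothetical sketch, not from the FAQ; all names are ours) by representing sets as insertion-order tuples. A select that looks at the order of insertion distinguishes two representations of the same set, while a minimum-based select gives the same answer for every representation.

```python
def insert(e, s):
    """Insertion-order representation of a set: a tuple of elements."""
    return (e,) + s

def same_set(s1, s2):
    """The intended equivalence on representations: same members."""
    return set(s1) == set(s2)

def bad_select(s):
    """Analogue of select(insert(e, s)) == e: the most recent insertion.
    Not well-defined on the equivalence classes."""
    return s[0]

def good_select(s):
    """Analogue of the corrected select: the minimum member.
    Respects the equivalence classes."""
    return min(s)

s12 = insert(1, insert(2, ()))   # like insert(1, {2})
s21 = insert(2, insert(1, ()))   # like insert(2, {1})
assert same_set(s12, s21)                   # equivalent sets...
assert bad_select(s12) != bad_select(s21)   # ...but bad_select disagrees
assert good_select(s12) == good_select(s21) == 1
```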
To explain the difference that having a "freely-generated"
sort makes, compare the trait GuasparisExample above to
the handbook trait Stack
(see [Guttag-Horning93], page 170).
In that trait, top is defined using the following equation.
\forall e: E, stk: C
top(push(e, stk)) == e;
Why doesn't this definition have the same problem as the definition
of select?
Because it respects the equivalence classes of the sort C
(stacks).
Why? Because Stack has no partitioned by clause,
every equivalence class of stacks has a canonical representative.
The canonical representative is
a term built using just the operators
noted in the generated by clause
for Stack: empty and push;
it is canonical because there is only one way to use just these
generators to produce a value in each equivalence class.
The sort of stacks is called freely generated because of this
property.
(The sort for sets is not freely generated, because
there is more than one term built using the generators
that represents the value {1,2}.)
Since a term of the form push(e, stk) is a
canonical representative of its equivalence class,
the specification of top automatically respects the
equivalence classes.
A more involved example is the definition of
the set membership operator (\in) for sets
(see [Guttag-Horning93], p. 166 and p. 224).
Since sets are not freely generated,
it takes work to prove that the specification of \in
is well-defined.
You might want to convince yourself of that as an exercise.
In summary, to avoid this first pitfall, remember the following mantra when defining operators over a sort with a partitioned by clause or an interesting notion of equality.
Is it well-defined?
As a help in avoiding this problem, you can use a generated freely by clause in the most recent versions of LSL. The use of a generated freely by clause makes it clear that the reader does not have to be concerned about ill-defined structural inductions.
Another pitfall in LSL is to assert that all values of some freely-generated type satisfy some nontrivial property. An example is the trait given below.
BadRational: trait
includes Integer
Rational tuple of num: Int, den: Int
asserts
\forall r: Rational
r.den > 0;
implies
equations
([3, -4]).den > 0;
(-4) > 0; % oops!
In this trait, the implications highlight the consequences
of this specification pitfall: an inconsistent theory.
The reason the above trait is inconsistent
is that it says that all integers are positive.
This is because for all integers i and j,
one can form a tuple of sort Rational by writing [i, j].
This kind of inconsistency arises whenever one has a freely
generated sort, such as LSL tuples, and tries to add constraints
as properties.
There are several ways out of this particular pit.
The most straightforward is to use a different sort for the
denominators of rationals (e.g., the positive numbers,
as in [Guttag-Horning93], page 207).
Another way is to specify the sort Rational without using a tuple,
but using your own constructor, such as mkRat;
this allows the sort to be not freely generated.
The only problem with that approach is that
one either has to make arbitrary or difficult decisions
(such as the value of mkRat(3, 0)),
or one has to be satisfied with an underspecified constructor.
Another way to avoid this is to use a parameterized trait;
for example, we could have parameterized BadRational
by the sort Int,
and instead of including the trait Integer,
used an assumes clause to specify the property desired
of the denominator's sort.
In some cases, one can postpone asserting such properties
until one reaches the interface specification;
this is because in a BISL one can impose such properties on
variables, not on entire sorts.
To avoid this pit, think of the following mantra.
Is this property defining a sort, or imposing some constraint on another sort?
(If the answer is that it's imposing a constraint on another sort, watch out for the pit!)
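As a rough Python analogue of this pitfall (our own names, not from the FAQ): a bare tuple type freely generates all pairs of integers, so no blanket invariant on it can hold; the mkRat route corresponds to a constructor that enforces the invariant, at the price of deciding what to do with bad inputs (here we reject them).

```python
from collections import namedtuple
from dataclasses import dataclass

# A tuple sort is freely generated: every pair of ints is a value,
# so "r.den > 0 for all r" simply cannot hold.
FreeRational = namedtuple("FreeRational", ["num", "den"])
r = FreeRational(3, -4)     # nothing stops this value from existing
assert not (r.den > 0)

@dataclass(frozen=True)
class Rational:
    """The mkRat-style fix: the constructor enforces the invariant,
    so the awkward cases (like a zero denominator) are settled by an
    explicit decision -- here, rejection."""
    num: int
    den: int

    def __post_init__(self):
        if self.den <= 0:
            raise ValueError("denominator must be positive")
```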
A pitfall that some novices fall into is forgetting that LSL is a purely mathematical notation, that has no concept of state or side effects. For example, the following are surely incorrect signatures in LSL.
BadSig: trait
includes Integer
introduces
currentTime: -> Seconds
random: -> Int
If you think random is a good signature for a random number
generator, you've fallen into the pit.
The operator random is an (underspecified) constant;
that is, random stands for some particular integer,
always the same one no matter how many times it is mentioned in an assertion.
The operator currentTime illustrates how mathematical notation
is "timeless": it is also a constant.
How could you specify a notion like a clock in LSL? Instead of specifying the concept of "the current time", you have to specify the concept of a clock. For example, consider the following trait.
ClockTrait: trait
assumes TotalOrder(Seconds for T)
introduces
newClock: Seconds -> Clock
tick: Clock -> Clock
currentTime: Clock -> Seconds
asserts
Clock generated by newClock, tick
\forall sec: Seconds, c: Clock
currentTime(newClock(sec)) == sec;
currentTime(tick(c)) > currentTime(c);
implies
\forall sec: Seconds, c: Clock
currentTime(tick(tick(c))) > currentTime(c);
Note that a clock doesn't change state in LSL;
because each Clock value is also timeless, the tick operator
produces a new Clock value instead.
Tricks like this are fundamental to semantics
(see, for example, [Schmidt86]).
As an exercise, you might want to specify a random number generator
along the same lines.
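The same trick works in any programming language with immutable values. Here is a hypothetical Python sketch of ClockTrait (not from the FAQ), with tick returning a new value rather than mutating anything; advancing by exactly one second is just one model satisfying the axiom currentTime(tick(c)) > currentTime(c).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Clock:
    """An immutable clock value; nothing here ever changes state."""
    time: int  # current time in seconds

def new_clock(sec: int) -> Clock:
    return Clock(sec)

def tick(c: Clock) -> Clock:
    # Any strictly later time would satisfy the axiom; +1 is one choice.
    return Clock(c.time + 1)

def current_time(c: Clock) -> int:
    return c.time

c = new_clock(100)
later = tick(tick(c))
assert current_time(later) > current_time(c)  # the implied property
assert current_time(c) == 100                 # c itself is unchanged
```

A random number generator can be handled the same way: instead of a constant random: -> Int, pass the generator's state around explicitly, with one operator to extract a value from a state and another to advance the state.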
To avoid this pit remember the following mantra when you are writing the signatures of operators in LSL.
Does this operator have any hidden parameters?
There are no hidden parameters, or global variables, in LSL.
A logical pitfall that one can fall prey to in LSL is failing to understand that the quantifiers on an equation in LSL are outside the equation as a whole. Usually this problem can be detected when there is a free variable that appears only on the right-hand side of an equation. (This assumes that you use the typical convention of having the right-hand side of an equation be the "result" of the left-hand side.) For example, consider the following (from [Wing95], p. 13).
BadSubsetTrait: trait
includes SetBasics(Set[E] for C)
introduces
__ \subseteq __: Set[E], Set[E] -> Bool
asserts
\forall e: E, s1, s2: Set[E]
s1 \subseteq s2 == (e \in s1 => e \in s2);
implies
\forall e: E, s1, s2: Set[E]
(e \in s1 /\ e \in s2) => s1 \subseteq s2;
The implication highlights the problem:
if two sets have any single element in common,
then they are considered subsets of each other by the above trait.
This trait could be corrected in LSL by writing the quantifier inside
the defining equation.
(See section 2.24 How do I write logical quantifiers within an LSL term?,
for the meaning of \A.)
\forall s1, s2: Set[E]
s1 \subseteq s2 == \A e (e \in s1 => e \in s2);
This is different, because the right-hand side must hold for all e,
not just some arbitrary e.
To avoid this pit remember the following mantra whenever you use a variable on only one side of an equation.
Is it okay if this variable has a single, arbitrary value?
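In executable terms (a hypothetical Python sketch, not from the FAQ), the bad axiom behaves like a subset test that takes the "witness" element as an extra parameter, fixed from outside; the corrected axiom quantifies over all elements internally.

```python
def bad_subseteq(s1, s2, e):
    """Analogue of the bad axiom: e is quantified OUTSIDE the whole
    equation, so it arrives as a parameter with a single value."""
    return (e in s1) <= (e in s2)   # on bools, <= is implication

def subseteq(s1, s2):
    """Analogue of the corrected axiom: the quantifier is inside,
    so the implication must hold for every element."""
    return all(e in s2 for e in s1)

s1, s2 = {1, 2}, {1, 3}
assert bad_subseteq(s1, s2, 1)   # one shared element "proves" subset-hood
assert not subseteq(s1, s2)      # but {1, 2} is not a subset of {1, 3}
```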
See Wing's paper "Hints to Specifiers" [Wing95] for more tips and general advice.
Yes indeed, we can give you some tips.
The first tip helps you write down an algebraic specification
of a sort that is intended to be used as an abstract data type
(see [Guttag-Horning78], Section 3.4,
and [Guttag-Horning93], Section 4.9).
The idea is to divide the set of operators of your sort into
generators and observers.
A generator returns your sort, while an observer
takes your sort as at least one argument
and returns some more primitive sort (such as Bool).
The tip (quoted from [Guttag-Horning93], p. 54) is to:
Write an equation defining the result of applying each observer to each generator.
For example, consider the following trait.
ResponseSigTrait: trait
introduces
yes, no, maybe: -> Response
optimistic, pessimistic: Response -> Bool
asserts
Response generated by yes, no, maybe
Response partitioned by optimistic, pessimistic
In this trait, the generators are yes, no,
and maybe; the observers are optimistic
and pessimistic.
What equations would you write?
According to the tip, you would write equations with left-hand sides as in the following trait. (The tip doesn't tell you what to put in the right hand sides.)
ResponseTrait: trait
includes ResponseSigTrait
asserts
equations
optimistic(yes) == true;
optimistic(no) == false;
optimistic(maybe) == true;
pessimistic(yes) == false;
pessimistic(no) == true;
pessimistic(maybe) == true;
implies
\forall r: Response
yes ~= no;
yes ~= maybe;
no ~= maybe;
optimistic(r) == r = yes \/ r = maybe;
pessimistic(r) == r = no \/ r = maybe;
(Defining the operators optimistic and pessimistic
as in the last two equations in the
implies section of the above trait is not wrong,
but doesn't follow the tip.
That kind of definition can be considered a shorthand notation
for writing down the equations in the asserts section.)
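The tip translates directly into code. In this hypothetical Python sketch (not from the FAQ), each observer is defined by one case per generator, and the shorthand definitions from the implies section fall out as consequences.

```python
from enum import Enum

class Response(Enum):
    """The generators: yes, no, maybe."""
    YES = "yes"
    NO = "no"
    MAYBE = "maybe"

# Following the tip: one defining case per (observer, generator) pair.
def optimistic(r: Response) -> bool:
    return {Response.YES: True, Response.NO: False, Response.MAYBE: True}[r]

def pessimistic(r: Response) -> bool:
    return {Response.YES: False, Response.NO: True, Response.MAYBE: True}[r]

# The shorthand equations from the implies section agree with the cases:
for r in Response:
    assert optimistic(r) == (r in (Response.YES, Response.MAYBE))
    assert pessimistic(r) == (r in (Response.NO, Response.MAYBE))
```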
As in the above example, you can specify that some set of generators is complete by listing them in a generated by clause. If you have other generators, what Guttag and Horning call extensions, then they can be defined in much the same way as the observers are defined. As an exercise, try defining the operator introduced in the following trait.
ResponseTraitExercise: trait
includes ResponseTrait
introduces
negate: Response -> Response
Here are some other general tips for LSL.
Reuse the handbook's mixin traits where you can (for example,
Associative, Symmetric, TotalOrder,
and MinMax).
A mixin trait typically does not have a generated by
or partitioned by clause (although the latter is sometimes
implied by such traits),
whereas an ADT trait typically will have at least a
generated by clause.
Don't be afraid of definitions that do not suggest an efficient
algorithm. For example, when specifying a sort operator,
the following is a perfectly good idea,
even though the implicit algorithm
(generate and test) is terribly inefficient.
\forall l1,l2: List[OrderedItem]
sort(l1) = l2 == permutation(l1, l2) /\ ordered(l2);
Also, watch out for the parsing of negative numerals:
-1 < 0 does not parse in LSL
unless you put parentheses around the -1.
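The declarative specification of sort even has a direct, if absurdly slow, executable reading. In this hypothetical Python sketch (helper names are ours), sort literally generates permutations and tests them against the two conditions from the specification.

```python
from itertools import permutations

def ordered(l):
    """True when the sequence is in nondecreasing order."""
    return all(a <= b for a, b in zip(l, l[1:]))

def is_permutation(l1, l2):
    """True when l2 rearranges l1 (checked via canonical orderings)."""
    return sorted(l1) == sorted(l2)

def sort(l1):
    """Executable reading of
        sort(l1) = l2 == permutation(l1, l2) /\\ ordered(l2);
    the implicit generate-and-test algorithm, terribly inefficient
    but a perfectly good definition."""
    return next(list(l2) for l2 in permutations(l1)
                if is_permutation(l1, l2) and ordered(l2))

assert sort([3, 1, 2]) == [1, 2, 3]
```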
See Wing's paper "Hints to Specifiers" [Wing95] for several more tips and general advice.
You can also find other tips by looking for the "mottos" and "mantras" in this chapter. See section 2.18 What pitfalls are there for LSL specifiers?, for tips on what to avoid.
This is a good question, although there's no hard-and-fast answer to it. To help you in deciding whether a specification is sufficiently abstract, we'll give you a checklist, and an example of what not to do.
To illustrate this checklist, we'll give a really awful example of a specification that isn't abstract enough. It illustrates every fault in the above list. (Of course, you'd never write anything this bad.)
NonAbstractSet: trait
includes Integer
SmallSet tuple of firstDef, secondDef, thirdDef: Bool,
first, second, third: Int
introduces
{}: -> SmallSet
insert: Int, SmallSet -> SmallSet
__ \in __: Int, SmallSet -> Bool
size: SmallSet -> Int
asserts
\forall i,i1,i2,i3: Int, s: SmallSet, b1,b2,b3: Bool
{} == [false, false, false, 0, 0, 0];
insert(i, s)
== if ~s.firstDef
then set_first(set_firstDef(s, true), i)
else if ~s.secondDef
then set_second(set_secondDef(s, true), i)
else if ~s.thirdDef
then set_third(set_thirdDef(s, true), i)
else s;
i \in s == if s.firstDef /\ i = s.first
then true
else if s.secondDef /\ i = s.second
then true
else if s.thirdDef /\ i = s.third
then true
else false;
size(s) == if ~s.firstDef then 0
else if ~s.secondDef then 1
else if ~s.thirdDef then 2
else 3;
Comparing this specification with the one in Guttag and Horning's
handbook ([Guttag-Horning93], pp. 166-167)
reveals a lot about how to write more abstract specifications.
Notice that the trait NonAbstractSet represents the sort
SmallSet as a tuple.
This is akin to a "design decision"
and makes the specification less abstract.
Another problem is the specification of algorithms
for the observers insert, \in, and size.
These could be made more abstract, even without changing the
definition of the sort SmallSet.
For example, using the idea of generators and observers
(see section 2.19 Can you give me some tips for specifying things with LSL?),
one could specify \in as follows.
i \in {} == false;
size(s) < 3 => (i \in insert(i, s) = true);
The above trait also defines sets of integers, even though it doesn't particularly depend on the properties of integers. It would be more abstract to specify sets of elements. Finally, the trait only works for sets of three elements. Unless there is some good reason for this (for example, modeling some particular implementation), you should avoid such arbitrary limits in LSL. They can easily be imposed at the interface level.
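The point of the more abstract equations is that they constrain any correct implementation without committing to a representation. As a quick hypothetical check in Python (not from the FAQ), they hold of frozenset, a representation the trait never mentions.

```python
def insert(i, s):
    """One possible representation among many: Python frozensets.
    (These are unbounded, so the size(s) < 3 guard from the text is
    always satisfiable here.)"""
    return s | {i}

empty = frozenset()

# i \in {} == false, spot-checked for a few sample elements:
assert all(i not in empty for i in range(-5, 5))

# size(s) < 3 => (i \in insert(i, s)), with a true antecedent:
s = insert(7, insert(2, empty))
assert len(s) < 3 and 4 in insert(4, s)
```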
Look at traits in the LSL handbook for examples of how to do things right. You may find that a trait can be too abstract to make it easily understood; that's fine, you don't have to try to define the ultimate theory of whatever you're working with.
See [Wing95], section 4.1 for a more extended discussion on the above ideas. You might want to look at [Liskov-Guttag86] for more background on the general idea of abstraction in specification. See also [Jones90] for the related idea of "implementation bias".