This page gives a short descriptiono of the usage of the Jacke tool. Jacke is a parser-generator for Standard ML, generating LALR parsers from grammar specifications which can be embedded into ordinary ML code. It features compositional syntax to specify grammar productions, closely resembling ML notation.
Parsers are ubiquitous in computing, providing the interface between a convenient, user-friendly front-end syntax and an internal format of input data. The area of building parsers was one of the earliest studied in Computer Science, and is rather well understood [1,2].
For many practically useful classes of grammars a corresponding parser can be derived automatically: Given the specification of a grammar in, e.g., Backus-Naur Form (BNF), a program which yields a parse tree of an input string can be generated automatically. Tools performing this task are called parser generators, and usually the produced parsers are not much less efficient than hand-coded ones. Moreover, manually coding parsers for large grammars is certainly tedious, error-prone and makes it hard to adapt to grammar changes.
This approach to parsing is quite flexible. Instead of simply yielding a parse tree, the outcome of a successful parse can be defined using semantic actions attached to the productions of a grammar specification. For example, semantic actions can be used to construct evaluators, e.g. parsing an arithmetic expression and evaluating the input to a number.
Jacke is a parser-generator tool, producing ML code for parsers for LALR languages. It is similar to the Yacc tools, but with SML consistent syntax. A major issue in the design of Jacke is simplicity. The tool features the following properties.
In parsers produced by Jacke, all symbols carry left and right position information of arbitrary, user-definable type. This information is available to semantic actions and is also used in syntactic error messages. Several parsers may be defined in a single file, corresponding to the definition of multiple start symbols.
Token declarations in Jacke look very much like SML datatypes, and in fact a corresponding datatype is generated from them. Associativity declarations mimic SML infix directives. For example,
token PLUS | MINUS | TIMES | NUM of int assocl TIMES assocl PLUS MINUSdeclares four tokens,
TIMES, PLUS
and MINUS are declared to be associative
to the left, and TIMES is given highest
precedence.
The actual productions of the grammar (which are usually mutually recursive)
take the form of rule bindings, resembling SML function definitions.
They may include a type annotation on the left hand side, documenting the
type of the intended return value. The right hand sides of rule bindings,
commonly a sequence of symbols, are extended to BNF expressions.
These are result transformations ('=>')
and alternatives ('|'), which are
compositional in structure, i.e. they may be nested arbitrarily. To continue
the example of arithmetic expressions, consider the rule bindings
rule exp :int =
NUM
| n1 as exp, oper, n2 as exp => (oper (n1,n2))
and oper =
PLUS => (op+)
| MINUS => (op-)
| TIMES => (op*)
Conceptionally, each BNF expression returns a value, synthesized from the
attributes of the subexpressions. For token identifiers without token attributes,
this is the unit value (). For tokens with attributes
(those having an 'of' part, such as NUM
above), this is just their attribute. Sequential expressions return the
tuple of their component results, alternative expressions return the value
of the rule that was successful. The combinator '=>'
allows results to be transformed into a different value, using SML expressions.
The 'as' expression allows a subexpression
result to be named, so that it is available in following result transformations.
As short hand, expressions consisting of a symbol identifier symid
only (such as oper and PLUS
above) stand for the expression symid as symid.
This often allows for more compact rules and is consistent with
the scheme used to name results of subexpressions in New Jersey's ML-Yacc.
Finally, there is the expression 'skip',
representing the empty production. The value associated with 'skip'
is ().
The start symbols of the grammar are declared using parser bindings. These declarations finallygenerate SML functions. Such a parser binding just denotes the corresponding start symbol. For example,
parser eval = expwill generate the actual parsing function
eval,
taking a lexer as input to produce the parse result. The lexer is of type
unit -> token option * position_ty * position_tyand when applied to
() should supply the
next token, if any.
Such lexers can be conveniently generated with the
ML-Lex tool.
The sources can be compiled using the Compilation Manager of SML/NJ, by typing
CM.make' "jacke.cm"To use the tool on an input file, type
Output.output "filename"This will generate a file filename.out containing ML code for the parser.
To use the generated code, make sure that the the signatures and implementation
of LALR tables and parsing engine are in the environment. These are contained
in the files LrTable.sml and LrParser.sml.
A detailed report and source files for the implementation, written in Standard ML, are available for download.
Jacke was implemented as an advanced practical (Fortgeschrittenenpraktikum) while I was studying for an M.Sc. in Computer Science at the Universität des Saarlandes, Saarbrücken. Andreas Rossberg made the original proposal and design of the tool. The project was supervised jointly by him and Leif Kornstaedt at the Programming Systems Lab of the Computer Science department.
"Jacke" is german for jacket, and (I guess) chosen as name because it is pronounced almost like Yacc. It is planned to complement Jacke with a lexer generator, going by the name Hose (trousers).
[1] Aho,A.V., Sethi,R. and Ullman,J.D. Compilers: Principles, Techniques and Tools, Addison-Wesley. 1986
[2] Appel,A.W. Modern Compiler Implementation in ML, Cambridge University Press. 1998.