Deprecated: The each() function is deprecated. This message will be suppressed on further calls in /home/zhenxiangba/zhenxiangba.com/public_html/phproxy-improved-master/index.php on line 456
The Jacke Parser-Generator Tool
[go: Go Back, main page]

A Parser-Generator Tool for Standard ML

Jan Schwinghammer
Informatics
University of Sussex

Introduction

This page gives a short descriptiono of the usage of the Jacke tool. Jacke is a parser-generator for Standard ML, generating LALR parsers from grammar specifications which can be embedded into ordinary ML code. It features compositional syntax to specify grammar productions, closely resembling ML notation.

Parsers and Parser Generators

Parsers are ubiquitous in computing, providing the interface between a convenient, user-friendly front-end syntax and an internal format of input data. The area of building parsers was one of the earliest studied in Computer Science, and is rather well understood [1,2].

For many practically useful classes of grammars a corresponding parser can be derived automatically: Given the specification of a grammar in, e.g., Backus-Naur Form (BNF), a program which yields a parse tree of an input string can be generated automatically. Tools performing this task are called parser generators, and usually the produced parsers are not much less efficient than hand-coded ones. Moreover, manually coding parsers for large grammars is certainly tedious, error-prone and makes it hard to adapt to grammar changes.

This approach to parsing is quite flexible. Instead of simply yielding a parse tree, the outcome of a successful parse can be defined using semantic actions attached to the productions of a grammar specification. For example, semantic actions can be used to construct evaluators, e.g. parsing an arithmetic expression and evaluating the input to a number.

Standard ML

Standard ML is a polymorphic, statically typed functional programming language, with several implementations (e.g. SML-NJ and Moscow ML) freely available. A good introduction to programming in ML are Robert Harpers notes. A parser generator that is delivered with the SML-NJ system is ML-Yacc. Its functionality and syntax are essentially the same as those of the original Yacc (Yet Another Compiler Compiler) developed for C. Unfortunately it is not very intuitive to use.

Objectives of Jacke

Jacke is a parser-generator tool, producing ML code for parsers for LALR languages. It is similar to the Yacc tools, but with SML consistent syntax. A major issue in the design of Jacke is simplicity. The tool features the following properties.

In parsers produced by Jacke, all symbols carry left and right position information of arbitrary,  user-definable type. This information is available to semantic actions and is also used in syntactic error messages. Several parsers may be defined in a single file, corresponding to the definition of multiple start symbols.

Description

Token declarations in Jacke look very much like SML datatypes, and in fact a corresponding datatype is generated from them. Associativity declarations mimic SML infix directives. For example,

token PLUS | MINUS | TIMES | NUM of int

assocl TIMES
assocl PLUS MINUS
declares four tokens, TIMES, PLUS and MINUS are declared to be associative to the left, and TIMES is given highest precedence.

The actual productions of the grammar (which are usually mutually recursive) take the form of rule bindings, resembling SML function definitions. They may include a type annotation on the left hand side, documenting the type of the intended return value. The right hand sides of rule bindings, commonly a sequence of symbols, are extended to BNF expressions. These are result transformations ('=>') and alternatives ('|'), which are compositional in structure, i.e. they may be nested arbitrarily. To continue the example of arithmetic expressions, consider the rule bindings

rule exp :int =
    NUM
  | n1 as exp, oper, n2 as exp => (oper (n1,n2))
and oper =
    PLUS => (op+)
  | MINUS => (op-)
  | TIMES => (op*)

Conceptionally, each BNF expression returns a value, synthesized from the attributes of the subexpressions. For token identifiers without token attributes, this is the unit value (). For tokens with attributes (those having an 'of' part, such as NUM above), this is just their attribute. Sequential expressions return the tuple of their component results, alternative expressions return the value of the rule that was successful. The combinator '=>' allows results to be transformed into a different value, using SML expressions.

The 'as' expression allows a subexpression result to be named, so that it is available in following result transformations. As short hand, expressions consisting of a symbol identifier symid only (such as oper and PLUS above) stand for the expression symid as symid. This often allows for more compact rules and is consistent with the scheme used to name results of subexpressions in New Jersey's ML-Yacc. Finally, there is the expression 'skip', representing the empty production. The value associated with 'skip' is ().

The start symbols of the grammar are declared using parser bindings. These declarations finallygenerate SML functions. Such a parser binding just denotes the corresponding start symbol. For example,

parser eval = exp
will generate the actual parsing function eval, taking a lexer as input to produce the parse result. The lexer is of type
unit -> token option * position_ty * position_ty
and when applied to () should supply the next token, if any. Such lexers can be conveniently generated with the ML-Lex tool.

Usage

The sources can be compiled using the Compilation Manager of SML/NJ, by typing

CM.make' "jacke.cm"
To use the tool on an input file, type
Output.output "filename"
This will generate a file filename.out containing ML code for the parser.

To use the generated code, make sure that the the signatures and implementation of LALR tables and parsing engine are in the environment. These are contained in the files LrTable.sml and LrParser.sml.

Download

A detailed report and source files for the implementation, written in Standard ML, are available for download.

About

Jacke was implemented as an advanced practical (Fortgeschrittenenpraktikum) while I was studying for an M.Sc. in Computer Science at the Universität des Saarlandes, Saarbrücken. Andreas Rossberg made the original proposal and design of the tool. The project was supervised jointly by him and Leif Kornstaedt at the Programming Systems Lab of the Computer Science department.

"Jacke" is german for jacket, and (I guess) chosen as name because it is pronounced almost like Yacc. It is planned to complement Jacke with a lexer generator, going by the name Hose (trousers).

References

[1] Aho,A.V., Sethi,R. and Ullman,J.D. Compilers: Principles, Techniques and Tools, Addison-Wesley. 1986

[2] Appel,A.W. Modern Compiler Implementation in ML, Cambridge University Press. 1998.


Jan Schwinghammer, November 2004