I am a member of the Science and Technology Advancement Team at FNMOC (Monterey, California), U.S. Navy.
The package is available as a gzip tar file (version 1.1b): RegReg.tar.gz
This tool is written in the Scheme programming language. It generates parsers based on tagged regular expressions. For example, it can be used to construct WMO bulletin decoders.
A presentation given on October 29th, 2002, at the International Lisp Conference, San Francisco, covering Metcast fastCGI and RegReg applied to meteorological decoders, is available in PostScript and in PDF. A paper titled "RegReg: a Lightweight Generator of Robust Parsers for Irregular Languages", to appear in the proceedings of WCRE 2003, November 2003, shows an application of RegReg in reverse engineering.
The slides, in PDF, of a "demo" presentation for October 21st, 2003, are available here.
This package allows Scheme code to be run as a fastCGI application under the mod_fastcgi module available for Apache. It is composed of a proxy written in C, basic input and output functions written in Scheme to communicate with that proxy, and a general code structure for the Scheme application. The Scheme application can be run locally or on remote computers.
You need the fastCGI development kit to compile the proxy. It can be obtained from the fastCGI web site: fastCGI development Kit (external link).
Here is a tar of the code files: fastCGI_Scheme.tar
The files contain important explanations. They should be read before attempting to use this code.
If you have any questions regarding the use of this package, please contact me at mario dot latendresse dot ca at metnet dot navy dot mil. (As usual, replace ' dot ' with '.' and ' at ' with '@'.)
Here are the transparencies that were used during the four-day course given at FNMOC in February 2002.
I recommend printing the R5RS document for more details on Scheme, although it is not intended to be pedagogical. Dorai Sitaram wrote a nice pedagogical introduction to Scheme.
Dorai Sitaram's Scheme introduction is available as PDF or HTML.
Local copies of these documents:
Dorai Sitaram, Teach Yourself Scheme in Fixnum Days in PDF
Links to further information
The Metcast server was originally designed and written by Oleg Kiselyov, at FNMOC.
This documentation presents the modifications made to the code of the Metcast server. Among other things, these modifications increase the performance of the Metcast server. Some technical details of the implementation are also described. For all technical details not related to the fastCGI, the authoritative documentation is at Spawar/JMV-TNG.
The Metcast server runs under a Scheme interpreter, that is, most of the Metcast code is written in the Scheme language. The Scheme interpreter runs as a CGI application under Apache and, until recently, the server ran exclusively on a modified version of Gambit Scheme. (Some upcoming modifications will make it portable to several Scheme interpreters, hopefully PLT Scheme and Bigloo, and probably many others.)
There is also some documentation on the MBL language, the query language used by the Metcast clients. It explains how the language and the implementation can be extended to handle new products and features. The ease with which this can be done demonstrates the flexibility of the Scheme language.
A client request to Metcast is formulated using a specific language, the MBL language. Its syntax is relatively simple: it uses a well-formed parenthesized expression, known as an S-expr. This is the common syntax of Lisp and Scheme expressions. Here is an example:
(example1 (bounding-box 50 -77 42 -20) (st_constraint (st_country_code "CN")) (products (METAR)))
This request, formulated by the client, is identified by the token `example1'. It contains three sub-expressions. The first one specifies the region of the request, the second one constrains the request to stations from Canada, and the last one identifies the product, that is, Metar.
In general, an MBL request has the form:
(id parameter1 ... parametern (products product1 ... productn))
A product is an S-expr of the form
(product-name parameter1 ... parametern)
The list of parameters can be empty, as shown above in example1.
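As a rough illustration of this structure, the following Python sketch reads a request into nested lists and separates the identifier, the top-level parameters, and the products. It is not the Metcast parser (which is written in Scheme). The names tokenize, parse, and split_request are hypothetical, and the tokenizer is simplified: it ignores string escapes and comments.

```python
import re

# Tokens: an opening or closing parenthesis, a double-quoted string, or a bare atom.
_TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')
_NUMBER = re.compile(r'[-+]?(\d+\.?\d*|\.\d+)$')


def tokenize(text):
    return _TOKEN.findall(text)


def _atom(token):
    """Convert a token to a str (strings and symbols), an int, or a float."""
    if token.startswith('"'):
        return token[1:-1]
    if _NUMBER.match(token):
        return float(token) if "." in token else int(token)
    return token


def parse(text):
    """Parse one S-expr into nested Python lists."""
    tokens = tokenize(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while True:
                if pos >= len(tokens):
                    raise ValueError("missing ')'")
                if tokens[pos] == ")":
                    pos += 1
                    return items
                items.append(read())
        if token == ")":
            raise ValueError("unexpected ')'")
        return _atom(token)

    expr = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the request")
    return expr


def split_request(expr):
    """Return (id, global parameters, products) for a parsed request."""
    request_id, *rest = expr
    params = [e for e in rest if isinstance(e, list) and e and e[0] != "products"]
    groups = [e for e in rest if isinstance(e, list) and e and e[0] == "products"]
    products = groups[0][1:] if groups else []
    return request_id, params, products


EXAMPLE1 = ('(example1 (bounding-box 50 -77 42 -20) '
            '(st_constraint (st_country_code "CN")) (products (METAR)))')
request_id, global_params, products = split_request(parse(EXAMPLE1))
```

With the example1 request, request_id is the identifier, global_params holds the bounding-box and st_constraint expressions, and products holds the single METAR product with an empty parameter list.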
As can be seen from the syntax, a request may contain several products. The parameters specified at the top level of the request are global parameters: they apply to each product, unless a local parameter says otherwise. Each product may also have its own local parameters, which apply only to the product that specifies them. Some products have mandatory parameters; a request without them cannot be answered. The list of parameters, product names and their mandatory parameters are presented below.
Note: unknown parameter names will be silently ignored, without causing an error. But an unknown product name will generate an error message.
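The interaction between global parameters, local parameters, and unknown names can be sketched as follows in Python. This is only an illustration under stated assumptions: it assumes that a local parameter simply replaces the global parameter of the same name, the function effective_parameters is hypothetical, and the two sets of known names are small samples rather than the server's real tables.

```python
# Sample of documented parameter names (assumption: a subset, for illustration only).
KNOWN_PARAMETERS = {"bounding-box", "st_constraint", "max-records"}
# Sample of product names (assumption: only the product used in example1).
KNOWN_PRODUCTS = {"METAR"}


def effective_parameters(global_params, product):
    """Return the parameters that apply to one product as {name: args}.

    global_params is a list of [name, arg, ...] lists taken from the top
    level of the request; product is [product-name, local-param, ...].
    """
    name, *local_params = product
    if name not in KNOWN_PRODUCTS:
        # An unknown product name is an error.
        raise ValueError(f"unknown product: {name}")
    merged = {}
    # Globals first, then locals, so that a local parameter wins.
    for param in list(global_params) + local_params:
        param_name, *args = param
        if param_name in KNOWN_PARAMETERS:
            merged[param_name] = args
        # Unknown parameter names are skipped without raising an error.
    return merged


global_params = [["bounding-box", 50, -77, 42, -20], ["max-records", 10], ["bogus", 1]]
metar = ["METAR", ["max-records", 5]]
applied = effective_parameters(global_params, metar)
```

Here the unknown parameter bogus is dropped, the global bounding-box is inherited, and the local max-records replaces the global one.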
The product names are divided into categories. Here is the list of product names and the categories they belong to.
A parameter is an S-expr of the form
(parameter-name arg1 ... argn)
The possible parameter names are (in alphabetic order): attr-constraint, block_id, bounding-box, call_id, center-id, depth-max, depth-min, grid-id, isobar-p-max, isobar-p-min, keywords, layer, manops, max-records, mime-type, model, modified-since, msgtype, msg_constraint, name, process-id, product-dict, product-GRIB-code, productname, projection, resolution, source, st_constraint, tau, use, valid-at.
The argument types and values depend on the parameter.
(bounding-box lon-W lat-N lon-E lat-S)
This is the most used parameter, as it is required by most products. The four arguments can be strings, numbers, symbols, or a mix of these.
Examples
(bounding-box -75.0 40 -70 35.0)
(bounding-box 75W 40N 70W 35N)
(bounding-box "75W" "40N" 70W 35.0)
Longitude strings have the form <number><d> where <d> is one of the letters W, w, E, or e. The letters w and W specify a west longitude, the letters e and E specify an east longitude.
Latitude strings have the form <number><d> where <d> is one of the letters N, n, S, or s. The letters n and N specify a north latitude, the letters s and S specify a south latitude.
West longitudes are negative, up to -180, inclusive. East longitudes are positive, up to 180, inclusive. South latitudes are negative, up to -90, inclusive. North latitudes are positive, up to 90, inclusive.
Symbols, like 70W or 35N, are translated into strings and interpreted as above.
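The conversion rules above can be sketched in Python as follows. This is a minimal illustration, not the server's code: the function names to_degrees and parse_bounding_box are hypothetical, symbols are represented as plain strings, and a string without a direction letter is rejected, which is an assumption since the text does not say how such strings are treated.

```python
import re

# <number><d>: an unsigned number followed by one direction letter.
_COORD = re.compile(r'^(\d+(?:\.\d+)?)([NnSsEeWw])$')


def to_degrees(value, axis):
    """Convert one bounding-box argument to signed degrees.

    axis is "lon" or "lat". Numbers are taken as already signed;
    strings (and symbols, given here as strings) must be <number><d>.
    """
    if isinstance(value, (int, float)):
        degrees = float(value)
    else:
        match = _COORD.match(str(value))
        if match is None:
            raise ValueError(f"malformed coordinate: {value!r}")
        number, letter = float(match.group(1)), match.group(2).upper()
        allowed = "EW" if axis == "lon" else "NS"
        if letter not in allowed:
            raise ValueError(f"{value!r} is not a valid {axis} value")
        # West and south are negative, east and north are positive.
        degrees = -number if letter in "WS" else number
    limit = 180.0 if axis == "lon" else 90.0
    if not -limit <= degrees <= limit:
        raise ValueError(f"{axis} out of range: {degrees}")
    return degrees


def parse_bounding_box(lon_w, lat_n, lon_e, lat_s):
    """Return (lon-W, lat-N, lon-E, lat-S) as signed degrees."""
    return (to_degrees(lon_w, "lon"), to_degrees(lat_n, "lat"),
            to_degrees(lon_e, "lon"), to_degrees(lat_s, "lat"))


mixed = parse_bounding_box("75W", "40N", "70W", 35.0)
numeric = parse_bounding_box(-75.0, 40, -70, 35.0)
```

Under these assumptions the mixed form and the purely numeric form of the same box yield the same four signed values.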
An st_constraint parameter is an S-expr of the following form
(st_constraint constraint1 ... constraintn)
where each constraint is one of the following
A module registers, into the dynamic environment, the services it can provide. This is done at the time the module is loaded in the Scheme interpreter. It is done by associating a symbol with a closure. When parsing a sub-expression of an MBL request, which should be a list, the symbol at the head is used to retrieve the associated service.
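The registration and lookup mechanism can be pictured with a small Python sketch, where a dictionary stands in for the dynamic environment and strings stand in for symbols. All names here (SERVICES, register_service, dispatch, make_echo_service) are hypothetical and chosen only for illustration; the real server does this in Scheme.

```python
# Stand-in for the dynamic environment: maps a symbol to a closure.
SERVICES = {}


def register_service(symbol, handler):
    """Associate a symbol with the closure that provides the service."""
    SERVICES[symbol] = handler


def dispatch(subexpr):
    """Use the head of a sub-expression to find and call its service."""
    if not isinstance(subexpr, list) or not subexpr:
        raise ValueError("a sub-expression must be a non-empty list")
    head, *args = subexpr
    try:
        handler = SERVICES[head]
    except KeyError:
        raise LookupError(f"no service registered for {head!r}") from None
    return handler(*args)


def make_echo_service(label):
    """Build a closure that remembers which module created it."""
    def service(*args):
        return (label, args)
    return service


# Done once, when the (hypothetical) module is loaded.
register_service("METAR", make_echo_service("metar-module"))

result = dispatch(["METAR", ["max-records", 5]])
```

The point of the design is that adding a product or parameter only requires loading a module that registers a new symbol; the dispatch code itself does not change.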
The Metcast server communicates with the Informix database server through two pipes: an output pipe to send the SQL request to the Informix database and an input pipe to read the answer from it. The input pipe is specified by the global Scheme variable DB:PIPE-FROM-SQL and the output pipe by DB:output-port. The input pipe is a special node generated by the Unix mknod utility. The output pipe actually contains a command to execute the Informix program dbaccess. It is something similar to
"| dbaccess -e DB:NAME - 1>&2"
where DB:NAME is bound to the name of the database. When an SQL request is made to the Informix server, the special node DB:PIPE-FROM-SQL is specified as the location for its answer.
The file db-util.scm defines the function open-data-base which opens those two pipes. It is called when the file is loaded in the Scheme interpreter. Thereafter, the pipes remain open until the server dies.
Two functions can be used to actually send the SQL requests and process the answers. These are DB:for-each and DB:assoc-val.
If no results are expected from the database server, the function DB:imperative-stmt should be used.
The function DB:make-sql-stmt-buffer can be used to build up and execute an SQL request.
It is assumed by all these functions that the database server is
running.
The fastCGI implementation of Metcast is scalable. Several Metcast
servers can run simultaneously on different machines with one entry
point for the Metcast clients.
Benchmark results, presented in the subsections below, quantify the
fastCGI performance.
The first subsection is a comparison with the old Metcast server,
while the other two subsections are absolute performance measurements.
For this benchmark we are using the following configuration.
The Apache server is running on one machine (Zowie) while the Metcast
server is running on another (Ruby). The observation database server
is running on Ruby. Some database requests (grid and imagery) are
answered from a third machine (Spaceley).
The following table shows the number of replies per second for the
fastCGI and CGI Metcast server. (These are the saturated numbers, that
is the maximum number of requests per second that could be obtained.)
Compared to the CGI Metcast, the number of processes created by the
fastCGI Metcast is very low. With CGI, every request spawns
five processes. With fastCGI, only a few processes are created
every day; it is even possible that none are created even though
hundreds of requests are answered. This is because fastCGI Metcast
servers have an unlimited life span, extending over several days.
The machine running the server is my workstation. It is a 930 MHz
Pentium III, single processor, running Linux (Red Hat 6.1).
The following tables show the saturated number of replies per
second. That is, once the number of replies per second is reached, the
Metcast server cannot increase the rate of replies. This rate depends
on the size of the returned content.
Two tables are presented. One without requesting any information from
the Informix database, the other with requests to it. In the former
case, a fixed file is used as a fictional answer from Informix. In the
latter case, a Metar request is performed.
The fastCGI Metcast server can answer at a rate above 80 replies
a second if the returned content size is very small and fixed. In that
case, no processing is done with the Informix database and no prepared
file is processed. The only work done is the proxy relaying the
request to the Metcast server, the request being read by it, and a
short answer being returned. Since it is a fastCGI, no Unix process is
created.
The proxy server_fcgi_proxy (see the file
server_fcgi_proxy.c) relays the Metcast client requests to
the Metcast server. If the current Metcast servers cannot handle the
load of client requests, another proxy is started, up to a maximum
specified in the file httpd.conf. Starting a new proxy causes
the UNIX super daemon inetd to start a new Metcast server to answer
client requests coming through that proxy. The proxy and the
inetd.conf file must specify the same listening socket port
to start the Metcast server. Once started, communication with the
proxy remains persistent. Idle proxies are killed by the module
mod_fastcgi. Spawning and killing of proxies are governed
by the directive FastCgiConfig.
Input/output of the Metcast server goes through the standard ports,
although it uses two approaches depending on the mode of invocation.
(The mode of invocation is either local or fastCGI; it is specified
using the MODE_INVOCATION shell variable, or the FASTCGI
Scheme variable.)
If the Metcast server is called as ``local'', it uses the standard
ports with no additional encoding, whereas if it is called as
``fastCGI'' it uses a ``chunked encoding''. The chunked encoding
solves the problem of end of file recognition by the proxy.
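The exact chunk format used between the Metcast server and the proxy is not described here. The Python sketch below only illustrates the general idea, using a layout borrowed from HTTP/1.1 chunked transfer coding (hexadecimal length line, data, and a zero-length chunk as terminator). That layout, and the names encode_chunks and decode_chunks, are assumptions made for the example, not the Metcast wire format.

```python
import io


def encode_chunks(payload, chunk_size=4096):
    """Split payload (bytes) into length-prefixed chunks.

    Assumed layout: "<hex length>\r\n<data>\r\n" per chunk, then "0\r\n\r\n".
    """
    out = bytearray()
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # zero-length chunk marks the end of the answer
    return bytes(out)


def decode_chunks(stream):
    """Read one chunked answer from a binary stream.

    The reader stops at the zero-length chunk, so it never has to wait
    for the writer to close the connection.
    """
    payload = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError("stream ended before the terminating chunk")
        size = int(size_line.strip().decode("ascii"), 16)
        if size == 0:
            stream.readline()  # consume the blank line after the last chunk
            return bytes(payload)
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("truncated chunk")
        payload += data
        stream.read(2)  # consume the CRLF that follows the chunk data


# Two answers back to back on the same persistent connection.
wire = io.BytesIO(encode_chunks(b"first answer", 5) + encode_chunks(b"second"))
first = decode_chunks(wire)
second = decode_chunks(wire)
```

Because each answer carries its own terminator, the reader can separate consecutive answers on a connection that stays open, which is the end-of-file problem mentioned above.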
All modules use the function SRV:send to produce output to
the client, through the proxy. The output is buffered until a
SRV:send-terminate is executed. Each request marks and
flushes the custom dynamic environment. These functions are written
in Scheme.
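The buffering behaviour of SRV:send and SRV:send-terminate can be pictured with the following Python sketch. It is an analogy only: the class ResponseBuffer and its methods are hypothetical, and the real functions are Scheme procedures whose internals are not shown in this document.

```python
import io


class ResponseBuffer:
    """Collect output pieces and write them to the port in one go."""

    def __init__(self, port):
        self._port = port    # any text stream with write() and flush()
        self._parts = []

    def send(self, text):
        """Queue a piece of output; nothing reaches the port yet."""
        self._parts.append(text)

    def send_terminate(self):
        """Write everything queued so far, flush, and reset the buffer."""
        self._port.write("".join(self._parts))
        self._port.flush()
        self._parts.clear()


port = io.StringIO()
response = ResponseBuffer(port)
response.send("Content-type: text/plain\n\n")
response.send("METAR data...\n")
before = port.getvalue()   # still empty: output is only buffered
response.send_terminate()
after = port.getvalue()    # the complete answer, written at once
```

Buffering the whole answer means that its total size is known before anything is written, which fits well with a length-prefixed encoding towards the proxy.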
Connections to databases are persistent. A Metcast server opens
the streams once, when it is loaded into the Scheme interpreter. If for some
reason a database does not answer an SQL request, the Metcast server
will abort due to a timeout.
For Metcast, the maxProcesses parameter limits the number of proxies,
which indirectly limits the number of fastCGI Metcast servers. Note
that an initial environment could be specified for all fastCGI
applications. It could be used to replace the initial script that sets up
the proxy environment, but this is a less general solution and it was
not adopted.
The proxy server has been identified by the following Location
directive in httpd.conf:
One technique is to use dynamically loadable Apache modules.
Apache must be built with the mod_so module. To do so
configure an Apache make file by
The mod_fastcgi.so module, which controls fastCGI
applications, should be placed in the libexec directory (usually at
/usr/local/apache/libexec). It is dynamically loaded when Apache needs
to run fastCGI applications.
FastCGI Metcast
Some fastCGI Metcast benchmarks
This is a short presentation of the benchmark results showing the
essential numbers. The httperf software was used to benchmark
the fastCGI implementation.
Comparing the old (CGI) Metcast server with the new (fastCGI) one (Zowie, Ruby)
Size of content in KB   Replies/second fastCGI   Replies/second CGI
0.7                     17.6                     4.7
2.4                     14.4                     4.9
7                       10.3                     3.6
15                      6.5                      3.2
41                      3.1                      2.0
Benchmarks on a single machine
Size of content in KB   Replies/second
1.3                     30
4                       17
9                       11
13                      7.4
20                      5.4
35                      2.9
64                      2.0
for fastCGI Metcast, without Informix
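One way to read the table without Informix is to multiply each content size by its reply rate, which gives an approximate payload throughput. The Python sketch below does this with the figures from that table; the function name throughput_kb_per_second is illustrative, and the result ignores headers and protocol overhead.

```python
# (content size in KB, replies/second) for fastCGI Metcast without Informix,
# copied from the single-machine table.
WITHOUT_INFORMIX = [
    (1.3, 30), (4, 17), (9, 11), (13, 7.4), (20, 5.4), (35, 2.9), (64, 2.0),
]


def throughput_kb_per_second(rows):
    """Return (size, size * rate), the payload delivered per second."""
    return [(size, size * rate) for size, rate in rows]


for size, kb_per_second in throughput_kb_per_second(WITHOUT_INFORMIX):
    print(f"{size:>5} KB replies: about {kb_per_second:.1f} KB/s")
```

On these numbers the payload throughput grows from about 39 KB/s for 1.3 KB replies to 128 KB/s for 64 KB replies, which is consistent with a fixed cost per request weighing more heavily on small replies.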
Size of content in KB   Replies/second
1.3                     26
4                       10
9                       6
13                      4.7
20                      3.3
35                      1.6
64                      1.3
for fastCGI Metcast, with Informix
Benchmarks on Ruby
The following table shows the reply rates for Ruby. The Apache server,
and the Metcast proxy, are running on my workstation. A temporary ssh
connection is made through Zowie to Ruby to allow the proxy to
communicate with Metcast servers running on Ruby. For large responses
(e.g. 30KB) the network is probably a bottleneck.
Size of content in KB   Replies/second
1                       29
4                       11
7                       7.2
15                      3.2
37                      0.9
57                      0.9
60                      0.8
for fastCGI Metcast, with Informix on Ruby through Zowie
Implementation of the fastCGI
The fastCGI implementation uses the super daemon inetd and
the module mod_fastcgi running under Apache. The files
/etc/inetd.conf and /etc/httpd.conf must be
set up appropriately to run the Metcast server as a fastCGI. The file
inetd.conf applies to the Scheme code running the Metcast
server, whereas httpd.conf applies to a proxy running under
mod_fastcgi which itself is running under the Apache server.
Setting up fastCGI in the Apache configuration file
Load balancing is automatically performed by the fastCGI module of
Apache. The rates of spawning and killing of proxies are configured
using the directive FastCgiConfig in the Apache configuration file
httpd.conf.
Here is an example of a FastCgiConfig directive.
FastCgiConfig -restart -restart-delay 5 -idle-timeout 120
-maxProcesses 30 -maxClassProcesses 30 -initial-env none
<Location /cgi-bin/start_server_fcgi_proxy>
SetHandler fastcgi-script
#perhaps other authorization directives
</Location>
Consult the fastCGI documentation at
www.fastcgi.com for more information.
fastCGI applications under Apache
To use any fastCGI application under Apache, the fastCGI module must
be installed into Apache.
./configure --enable-module=so
and perform a make. (You must have all the Apache sources to do that.)
You should then have a new Apache server with dynamically loadable modules.