Mario Latendresse, FNMOC

Mario Latendresse

Last update: January 2004.

I am a member of the Science and Technology Advancement Team at FNMOC (Monterey, California), U.S. Navy.

Contents

RegReg: a parser generator based on tagged regular expressions

The package is available as a gzip tar file (version 1.1b): RegReg.tar.gz

This tool is written in the Scheme programming language. It generates parsers based on tagged regular expressions. For example, it can be used to construct WMO bulletin decoders.

RegReg HTML documentation

RegReg PDF documentation

A presentation given on October 29, 2002, at the International Lisp Conference, San Francisco, covering Metcast fastCGI and RegReg applied to meteorological decoders, is available here: in PostScript, in PDF. A paper titled RegReg: a Lightweight Generator of Robust Parsers for Irregular Languages, to appear in the proceedings of WCRE 2003, November 2003, shows an application of RegReg in reverse engineering.

The slides, in PDF, of a "demo" presentation for October 21, 2003, are available here.

Scheme fastCGI proxy

This package allows Scheme code to be run as a fastCGI application under the mod_fastcgi module available for Apache. It is composed of a proxy written in C, basic input and output functions written in Scheme to communicate with that proxy, and a general code structure for the Scheme application. The Scheme application can be run locally or on remote computers.

You need the fastCGI development kit to compile the proxy. It can be obtained from the fastCGI web site: fastCGI development Kit (external link).

Here is a tar of the code files: fastCGI_Scheme.tar

The files contain important explanations. They should be read before attempting to use this code.

If you have any questions regarding the use of this package, please contact me at mario dot latendresse dot ca at metnet dot navy dot mil. (As usual, replace ' dot ' with '.' and ' at ' with '@'.)

Scheme Course at FNMOC

Here are the transparencies that were used during the four-day course given at FNMOC in February 2002.

Transparencies in PostScript

Transparencies in PDF

I recommend printing the R5RS document for more details on Scheme, although it is not intended to be pedagogical. Dorai Sitaram wrote a nice pedagogical introduction to Scheme.

Dorai Sitaram's Scheme introduction is available as PDF or HTML.

Local copies of these documents:

R5RS in PostScript

R5RS in PDF

Dorai Sitaram, Teach Yourself Scheme in Fixnum Days in PDF

End of Scheme Course description

Links to further information

Testing the Metcast server

The Scheme language

The Metcast Decoders

The Metcast Server

Introduction

The Metcast server was originally designed and written by Oleg Kiselyov, at FNMOC.

This documentation presents the modifications made to the code of the Metcast server. Among other things, these modifications increase the performance of the Metcast server. Some technical details of the implementation are also described. For all technical details not related to fastCGI, the authoritative documentation is at Spawar/JMV-TNG.

The Metcast server runs under a Scheme interpreter, that is, most of the Metcast code is written using the Scheme language. The Scheme interpreter runs as a CGI application under Apache and until recently exclusively on a modified version of Gambit Scheme. (Some upcoming modifications will make it portable to several Scheme interpreters, hopefully under PLT Scheme and Bigloo, probably on many others.)

There is also some documentation on the MBL language, the query language used by the Metcast clients. It explains how the language and the implementation can be extended to handle new products and features. The ease with which this can be done demonstrates the flexibility of the Scheme language.

The client request language for Metcast, MBL

A client request to Metcast is formulated using a specific language, the MBL language. Its syntax is relatively simple: it uses a well-formed parenthesized expression, known as an S-expr. This is the common syntax of Lisp and Scheme expressions. Here is an example:

(example1
  (bounding-box 50 -77 42 -20)
  (st_constraint (st_country_code "CN")) 
  (products (METAR)))

This request, formulated by the client, is identified by the token `example1'. It contains three sub-expressions. The first one specifies the region of the request, the second one constrains the request to stations from Canada, and the last one identifies the product, namely METAR.

General syntax of MBL

In general, an MBL request has the form:

(id parameter1 ... parametern (products product1 ... productn))

A product is an S-expr of the form:

(product-name parameter1 ... parametern)

The list of parameters can be empty, as shown above in example1.

As can be seen from the syntax, a request may contain several products. The parameters specified at the top level of the request are global parameters: they apply to each product, unless overridden by a local parameter. Each product may also have its own local parameters, which apply only to the product that specifies them. Some products have mandatory parameters; a request without them cannot be answered. The parameters, the product names, and their mandatory parameters are presented below.

Note: unknown parameter names will be silently ignored, without causing an error. But an unknown product name will generate an error message.
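
As a rough illustration of the general form above, the following Python sketch reads a request into nested lists and separates the global parameters from the products. It is not the Metcast parser, which is written in Scheme; the names read_sexpr and split_request are invented for this example, and string tokens simply keep their double quotes.

```python
import re

# Tokens: parentheses, double-quoted strings, or runs of other characters.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def read_sexpr(text):
    """Parse one S-expr into nested Python lists of string tokens."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of request")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # skip the closing parenthesis
            return items
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        return tok

    expr = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the request")
    return expr


def split_request(request):
    """Return (id, global parameters, products) of a parsed request."""
    request_id, *clauses = request
    lists = [c for c in clauses if isinstance(c, list) and c]
    global_params = [c for c in lists if c[0] != "products"]
    products = [p for c in lists if c[0] == "products" for p in c[1:]]
    return request_id, global_params, products


if __name__ == "__main__":
    text = '''(example1
      (bounding-box 50 -77 42 -20)
      (st_constraint (st_country_code "CN"))
      (products (METAR)))'''
    print(split_request(read_sexpr(text)))
```

Each product comes back as a list whose head is the product name and whose tail holds its local parameters, so a product without parameters, such as METAR in example1, is a one-element list.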

Product identification

The product names are divided into categories. Here is the list of product names and the categories they belong to.

Parameters

A parameter is an S-expr of the form

  (parameter-name arg1 ... argn)

The possible parameter names are (in alphabetical order): attr-constraint, block_id, bounding-box, call_id, center-id, depth-max, depth-min, grid-id, isobar-p-max, isobar-p-min, keywords, layer, manops, max-records, mime-type, model, modified-since, msgtype, msg_constraint, name, process-id, product-dict, product-GRIB-code, productname, projection, resolution, source, st_constraint, tau, use, valid-at.

The argument types and values depend on the parameter.

Bounding-box parameter

The bounding-box parameter has the form:
(bounding-box lon-W lat-N lon-E lat-S) 

This is the most commonly used parameter, as it is required by most products. The four arguments can be strings, numbers, symbols, or a mix of these.

Examples

  (bounding-box -75.0 40 -70 35.0)
  (bounding-box 75W 40N 70W 35N) 
  (bounding-box "75W" "40N" 70W 35.0) 

Longitude strings have the form <number><d> where <d> is one of the letters W, w, E, or e. The letters w and W specify a west longitude, the letters e and E specify an east longitude.

Latitude strings have the form <number><d> where <d> is one of the letters N, n, S, or s. The letters n and N specify a north latitude, the letters s and S specify a south latitude.

West longitudes are negative, up to -180, inclusive. East longitudes are positive, up to 180, inclusive. South latitudes are negative, up to -90, inclusive. North latitudes are positive, up to 90, inclusive.

Symbols, like 70W or 35N, are translated into strings and interpreted as above.
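
The rules above can be sketched in Python as follows. This is only an illustration under assumptions, not the Metcast code: the function names are hypothetical, symbols are represented as plain strings, and the text does not say how a negative number combined with a direction letter should be treated, so the sketch does not handle that case specially.

```python
def to_degrees(value, negative_letters, positive_letters, limit):
    """Convert a number, string, or symbol such as 75W to signed degrees."""
    text = str(value).strip().strip('"')
    sign = 1.0
    if text and text[-1].isalpha():
        letter = text[-1].upper()
        if letter in negative_letters:
            sign = -1.0
        elif letter not in positive_letters:
            raise ValueError(f"bad direction letter in {value!r}")
        text = text[:-1]
    degrees = sign * float(text)
    # The limits are inclusive: -180..180 for longitude, -90..90 for latitude.
    if not -limit <= degrees <= limit:
        raise ValueError(f"{value!r} is outside the range -{limit}..{limit}")
    return degrees


def longitude(value):
    """West is negative, east is positive."""
    return to_degrees(value, "W", "E", 180)


def latitude(value):
    """South is negative, north is positive."""
    return to_degrees(value, "S", "N", 90)


def bounding_box(lon_w, lat_n, lon_e, lat_s):
    """Normalize the four arguments in the order lon-W lat-N lon-E lat-S."""
    return (longitude(lon_w), latitude(lat_n),
            longitude(lon_e), latitude(lat_s))


if __name__ == "__main__":
    print(bounding_box("75W", "40N", "70W", 35.0))
```

With this sketch, the three example forms given above all normalize to the same tuple of signed degrees.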

Constraints parameter

An st_constraint parameter is an S-expr of the following form:

(st_constraint constraint1 ... constraintn)

where each constraint is one of the following:

A constraint parameter may be global or local to a product, although for METAR, TAF and UAR, only local constraints are effective; the global ones have no effect for these three products.

Metcast Implementation

The Metcast system is made of one main part, server.scm, and several modules. The file global-vars.scm contains the global variables and parameters. A Metcast server is started by Apache using the file start_server.sh. This file contains the list of Unix environment variables that can be modified to customize the Metcast server. When installing the server, the values of these variables should be chosen carefully.

Parsing MBL

The parsing of the MBL request is decentralized. The global part is parsed by the main module, server.scm. Each module parses its own local parameters.

The dynamic environment

A dynamic environment is used to communicate between the main part and the modules. The file dynamicEnv.scm contains the functions to maintain it and use it.

A module registers, in the dynamic environment, the services it can provide. This is done when the module is loaded into the Scheme interpreter, by associating a symbol with a closure. When parsing a sub-expression of an MBL request, which should be a list, the symbol at the head is used to retrieve the associated service.
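
The registration mechanism can be pictured with a small Python sketch. It is an analogy only: the real functions live in dynamicEnv.scm and are written in Scheme, and every name below (services, register_service, dispatch, and the example handler for max-records with its default limit) is hypothetical.

```python
# The "dynamic environment": a table from symbols to closures.
services = {}


def register_service(symbol, handler):
    """Called by a module, at load time, for each service it provides."""
    services[symbol] = handler


def dispatch(expr):
    """Use the symbol at the head of a sub-expression to find its service."""
    if not isinstance(expr, list) or not expr:
        raise ValueError("a sub-expression must be a non-empty list")
    head, *args = expr
    try:
        handler = services[head]
    except KeyError:
        raise LookupError(f"no service registered for {head!r}") from None
    return handler(*args)


def make_max_records_service(default_limit):
    """Build a closure that remembers the module's default limit."""
    def handle(limit=None):
        value = default_limit if limit is None else int(limit)
        return {"max-records": value}
    return handle


# A hypothetical module registers its closure when it is loaded.
register_service("max-records", make_max_records_service(default_limit=100))

if __name__ == "__main__":
    print(dispatch(["max-records", "25"]))
```

Raising LookupError for an unregistered symbol leaves the policy to the caller, which could ignore an unknown parameter name and report an unknown product name.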

Database communication

For the following discussion, consult the Scheme code in file db-util.scm.

The Metcast server communicates with the Informix database server through two pipes: an output pipe to send the SQL request to the Informix database and an input pipe to read the answer from it. The input pipe is specified by the global Scheme variable DB:PIPE-FROM-SQL and the output pipe by DB:output-port. The input pipe is a special node generated by the Unix mknod utility. The output pipe actually contains a command to execute the Informix program dbaccess. It is something similar to

"| dbaccess -e DB:NAME - 1>&2"

where DB:NAME is bound to the name of the database. When a SQL request is made to the Informix server, the special node DB:PIPE-FROM-SQL is specified as the location for its answer.

The file db-util.scm defines the function open-data-base, which opens those two pipes. It is called when the file is loaded into the Scheme interpreter. Thereafter, the pipes remain open until the server dies.

Two functions can be used to actually send the SQL requests and process the answers. These are DB:for-each and DB:assoc-val.

If no results are expected from the database server, the function DB:imperative-stmt should be used.

The function DB:make-sql-stmt-buffer can be used to build up and execute a SQL request.

It is assumed by all these functions that the database server is running.
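
To make the idea of naming the special node as the location of the answer more concrete, here is a hedged Python sketch. It assumes that the answer is redirected with the Informix UNLOAD TO statement and that rows come back with every field followed by a '|' delimiter; the text above does not confirm either point, and the actual code in db-util.scm may work differently. The function names, the pipe path, and the table and column names are all hypothetical.

```python
def make_unload_statement(fifo_path, select_sql, delimiter="|"):
    """Wrap a SELECT so that its rows are written to the named pipe."""
    if "'" in fifo_path:
        raise ValueError("quote characters are not allowed in the pipe path")
    select_sql = select_sql.strip().rstrip(";")
    return f"UNLOAD TO '{fifo_path}' DELIMITER '{delimiter}' {select_sql};\n"


def parse_unload_row(line, delimiter="|"):
    """Split one unloaded row, dropping the empty piece after the last delimiter."""
    fields = line.rstrip("\n").split(delimiter)
    if fields and fields[-1] == "":
        fields.pop()
    return fields


if __name__ == "__main__":
    # Hypothetical pipe path and table; neither comes from the Metcast code.
    stmt = make_unload_statement("/tmp/pipe-from-sql",
                                 "SELECT call_id, lat, lon FROM stations")
    print(stmt, end="")
    print(parse_unload_row("CYUL|45.47|-73.75|\n"))
```

In such a scheme the statement would be written to the output pipe, and the rows would then be read, one line at a time, from the input pipe.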

FastCGI Metcast

The Metcast server runs as a fastCGI application. This increases the performance as it avoids restarting the Metcast server for each client request.

The fastCGI implementation of Metcast is scalable. Several Metcast servers can run simultaneously on different machines with one entry point for the Metcast clients.

Benchmark results, presented in the subsections below, quantify the fastCGI performance.

Some fastCGI Metcast benchmarks

This is a short presentation of the benchmark results showing the essential numbers. The httperf software was used to benchmark the fastCGI implementation.

The first subsection is a comparison with the old Metcast server, while the other two subsections are absolute performance measurements.

Comparing the old (CGI) Metcast server with the new (fastCGI) one (Zowie, Ruby)

For this benchmark we are using the following configuration.

The Apache server is running on one machine (Zowie) while the Metcast server is running on another (Ruby). The observation database server is running on Ruby. Some database requests (grid and imagery) are answered from a third machine (Spaceley). The following table shows the number of replies per second for the fastCGI and CGI Metcast servers. (These are the saturated numbers, that is, the maximum number of replies per second that could be obtained.)

Size of content (KB)    Replies/second (fastCGI)    Replies/second (CGI)
0.7                     17.6                        4.7
2.4                     14.4                        4.9
7                       10.3                        3.6
15                      6.5                         3.2
41                      3.1                         2.0

Number of replies per second for fastCGI Metcast and CGI Metcast

Compared to the CGI Metcast, the number of processes created by the fastCGI Metcast is very low. With CGI, every request spawns five processes. With fastCGI, only a few processes are created every day; it is even possible that none are created even though hundreds of requests are answered. This is because fastCGI Metcast servers have an unlimited life span, typically running for several days.
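
The ratio between the two reply rates in the table of this subsection can be computed directly. The short Python sketch below only restates the table's numbers; the variable and function names are made up for the example. The gain goes from roughly 3.7 times for the smallest replies to about 1.5 times for the largest.

```python
# (size in KB, replies/second with fastCGI, replies/second with CGI),
# copied from the comparison table for Zowie and Ruby.
BENCHMARK = [
    (0.7, 17.6, 4.7),
    (2.4, 14.4, 4.9),
    (7, 10.3, 3.6),
    (15, 6.5, 3.2),
    (41, 3.1, 2.0),
]


def speedups(rows):
    """Return (size, fastCGI rate divided by CGI rate) for each row."""
    return [(size, fast / cgi) for size, fast, cgi in rows]


if __name__ == "__main__":
    for size, ratio in speedups(BENCHMARK):
        print(f"{size:>5} KB: {ratio:.2f} times the CGI rate")
```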

Benchmarks on a single machine

The machine running the server is my workstation. It is a 930 MHz Pentium III, single processor, running Linux (Red Hat 6.1).

The following tables show the saturated number of replies per second. That is, once the number of replies per second is reached, the Metcast server cannot increase the rate of replies. This rate depends on the size of the returned content.

Two tables are presented: one without requesting any information from the Informix database, the other with requests to it. In the former case, a fixed file is used as a fictional answer from Informix. In the latter case, a METAR request is performed.

Size of content (KB)    Replies/second
1.3                     30
4                       17
9                       11
13                      7.4
20                      5.4
35                      2.9
64                      2.0

Number of replies per second, according to size, for fastCGI Metcast, without Informix

Size of content (KB)    Replies/second
1.3                     26
4                       10
9                       6
13                      4.7
20                      3.3
35                      1.6
64                      1.3

Number of replies per second, according to size, for fastCGI Metcast, with Informix

Benchmarks on Ruby

The following table shows the reply rates for Ruby. The Apache server and the Metcast proxy are running on my workstation. A temporary ssh connection is made through Zowie to Ruby to allow the proxy to communicate with Metcast servers running on Ruby. For large responses (e.g. 30 KB) the network is probably a bottleneck.

Size of content (KB)    Replies/second
1                       29
4                       11
7                       7.2
15                      3.2
37                      0.9
57                      0.9
60                      0.8

Number of replies per second, according to size, for fastCGI Metcast, with Informix on Ruby through Zowie

The fastCGI Metcast server can answer at a rate above 80 replies per second if the returned content is very small and fixed. In that case, no processing is done with the Informix database and no prepared file is processed. The only work done is the proxy relaying the request to the Metcast server, the server reading it, and a short answer being returned. Since it is a fastCGI application, no Unix process is created.

Implementation of the fastCGI

The fastCGI implementation uses the super daemon inetd and the module mod_fastcgi running under Apache. The files /etc/inetd.conf and /etc/httpd.conf must be appropriately set up to run the Metcast server as a fastCGI application. The file inetd.conf applies to the Scheme code running the Metcast server, whereas httpd.conf applies to a proxy running under mod_fastcgi, which itself is running under the Apache server.

The proxy server_fcgi_proxy (see the file server_fcgi_proxy.c) relays the Metcast client requests to the Metcast server. If the current Metcast servers cannot handle the load of client requests, another proxy is started, up to a maximum specified in the file httpd.conf. Starting a new proxy causes the UNIX super daemon inetd to start a new Metcast server to answer client requests coming through that proxy. The proxy and the inetd.conf file must specify the same listening socket port to start the Metcast server. Once started, communication with the proxy remains persistent. Idle proxies are killed by the module mod_fastcgi. Spawning and killing of proxies are governed by the directive FastCgiConfig.

Input/output of the Metcast server goes through the standard ports, although it uses two approaches depending on the mode of invocation. (The mode of invocation is either local or fastCGI; it is specified using the MODE_INVOCATION shell variable or the FASTCGI Scheme variable.)

If the Metcast server is called as "local", it uses the standard ports with no additional encoding, whereas if it is called as "fastCGI" it uses a "chunked encoding". The chunked encoding solves the problem of end-of-file recognition by the proxy.
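
The exact chunk format used between the Metcast server and the proxy is not described here. As an illustration of why framing solves the end-of-file problem on a persistent connection, the following Python sketch uses an assumed format: each chunk is a 4-byte big-endian length followed by that many bytes, and a zero length marks the end of the response. The function names are hypothetical.

```python
import io
import struct


def write_chunks(stream, payload, chunk_size=4096):
    """Write payload as length-prefixed chunks, then a zero-length terminator."""
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        stream.write(struct.pack(">I", len(chunk)) + chunk)
    stream.write(struct.pack(">I", 0))  # end of this response


def read_exact(stream, count):
    """Read exactly count bytes or fail; suitable for buffered streams."""
    data = stream.read(count)
    if len(data) != count:
        raise EOFError("stream ended inside a chunk")
    return data


def read_chunks(stream):
    """Read one complete response without waiting for the stream to close."""
    parts = []
    while True:
        (length,) = struct.unpack(">I", read_exact(stream, 4))
        if length == 0:
            return b"".join(parts)
        parts.append(read_exact(stream, length))


if __name__ == "__main__":
    pipe = io.BytesIO()
    write_chunks(pipe, b"first answer", chunk_size=5)
    write_chunks(pipe, b"second answer", chunk_size=5)
    pipe.seek(0)
    print(read_chunks(pipe), read_chunks(pipe))
```

Because each response carries its own terminator, the reader can separate consecutive answers on the same connection, which is what a persistent link between a proxy and a server requires.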

All modules use the function SRV:send to produce output to the client, through the proxy. The output is buffered until a SRV:send-terminate is executed. Each request marks and flushes the custom dynamic environment. These functions are written in Scheme.

Connections to databases are persistent. A Metcast server opens up the streams once when loaded into the Scheme interpreter. If for some reason a database does not answer a SQL request, the Metcast server will abort due to a timeout.

Setting up fastCGI in the Apache configuration file

Load balancing is automatically performed by the fastCGI module of Apache. The rates of spawning and killing of proxies are configured using the directive FastCgiConfig in the Apache configuration file httpd.conf. Here is an example of a FastCgiConfig directive.

FastCgiConfig -restart -restart-delay 5 -idle-timeout 120 \
              -maxProcesses 30 -maxClassProcesses 30 -initial-env none

For Metcast, the maxProcesses parameter limits the number of proxies, which indirectly limits the number of fastCGI Metcast servers. Note that an initial environment could be specified for all fastCGI applications. It could be used to replace the initial script that sets up the proxy environment, but this less general solution was not adopted.

The proxy server has been identified by the following Location directive in httpd.conf:

<Location /cgi-bin/start_server_fcgi_proxy>
 SetHandler fastcgi-script
 #perhaps other authorization directives
</Location>

Consult the fastCGI documentation at www.fastcgi.com for more information.

fastCGI applications under Apache

To use any fastCGI application under Apache, the fastCGI module must be installed into Apache.

One technique is to use dynamically loadable Apache modules. Apache must be built with the mod_so module. To do so, configure the Apache build with

./configure --enable-module=so

and perform a make. (You must have all the Apache sources to do that.) You should then have a new Apache server with dynamically loadable modules.

The mod_fastcgi.so module, which controls fastCGI applications, should be placed in the libexec directory (usually at /usr/local/apache/libexec). It is dynamically loaded when Apache needs to run fastCGI applications.