Deprecated: The each() function is deprecated. This message will be suppressed on further calls in /home/zhenxiangba/zhenxiangba.com/public_html/phproxy-improved-master/index.php on line 456 Nested Words and Visibly Pushdown Languages
Nested Words
aka Visibly Pushdown Languages
What are nested words?
Nested words is a model for
representation of data with both a
linear ordering and a hierarchically nested
matching of items. Examples of data with such dual linear-hierarchical structure
include executions of structured programs,
annotated linguistic data,
and HTML/XML documents.
A nested word consists of a sequence of linearly ordered positions, augmented
with nesting edges connecting
calls to
returns (or
open-tags to
close-tags).
The edges do not cross creating a properly nested hierarchical structure,
and we allow some of the edges to be pending. This nesting structure can be uniquely
represented by a sequence specifying the types of positions (calls, returns,
and internals).
Nested words generalize both words and ordered trees, and
allow both word and tree operations.
Nested word automata---finite-state acceptors for nested words,
define the class of regular languages of nested words.
This class has all the
appealing theoretical properties that the classical regular word
languages enjoys:
deterministic nested word automata are as expressive as their
nondeterministic counterparts; the class is closed under union, intersection,
complementation, concatenation, Kleene-*, prefixes, and language homomorphisms;
membership, emptiness, language
inclusion, and language equivalence are all decidable; and definability in
monadic second order logic corresponds exactly to finite-state recognizability.
These results generalize to infinite nested words also.
How do they relate to context-free languages of words?
Given a language L of nested words over an alphabet, the linear encoding
of nested words gives a language L' over the tagged alphabet consisting
of symbols tagged with the type of the position.
If L is regular language of nested words, then L' is context-free.
In fact, the pushdown automaton accepting L' has a special structure:
while reading a call, the automaton must push one symbol, while reading a
return symbol, it must pop one symbol (if the stack is non-empty), and while
reading an internal symbol, it can only update its control state.
We call such automata visibly pushdown automata and the
class of word languages they accept visibly pushdown languages (VPL).
Since our automata can be determinized,
VPLs correspond to a subclass of deterministic context-free languages (DCFL).
VPLs generalize of paranthesis languages, bracketed languages,
and balanced languages, and
they have better closure properties than
CFLs, DCFLs, or paranthesis languages.
We argue that for algorithmic verification of structured programs,
instead of viewing the program as a context-free language over words, one
should view it as a regular language of nested words (or equivalently,
a visibly pushdown language), and this would allow
model checking of many properties (such as stack inspection, pre-post
conditions) that are not expressible in existing specification logics.
In general, pushdown automata serve two distinct purposes:
discovering the hierarchical matching, and
processing/querying the matching.
In applications where only the second purpose is relevant
(as in program analysis), one can replace pushdown automata with NWAs
with many benefits.
How do they relate to ordered trees?
Data with dual linear-hierarchical structure is traditionally modeled
using binary, and more generally, using ordered unranked, trees,
and queried using tree automata.
In ordered trees, nodes with the same parent are linearly ordered,
and the classical tree traversals such as infix (or depth-first left-to-right)
can be used to define an implicit ordering of all nodes.
It turns out that,
hedges, where a hedge is a sequence of ordered trees,
are a special class of nested words, namely, the ones
corresponding to Dyck words, and
regular hedge languages correspond to balanced languages.
For document processing, nested words do have many advantages
over ordered trees.
Tree-based representation implicitly assumes that the input linear data
can be parsed into a tree, and thus, one cannot represent and process data that
may not parse correctly. Word operations such as prefixes, suffixes, and concatenation,
while natural for document processing, do not have analogous tree operations.
Second, tree automata can naturally express constraints on the sequence of labels
along a hierarchical path, and also along
the left-to-right siblings, but they have difficulty to capture constraints
that refer to the global linear order.
For example, the query that patterns p1,... pk appear in the document in that order
compiles into a deterministic word automaton (and hence
deterministic NWA) of linear size, but standard
deterministic bottom-up tree automaton for this query must be of size exponential in k.
NWAs can be viewed as a kind of tree automata
such that both bottom-up tree automata and top-down tree automata are
special cases.
These results imply that a query can be more succinctly
encoded in the nested words view with complexity benefits
A nested word automaton reads the word from left to right, processing
the nesting edges as and when they arrive. This matches with the SAX
API for XML, and thus has a natural use in streaming
algorithms.
References
The model of nested words went through a couple of iterations:
See Visibly pushdown languages; Alur and Madhusudan; STOC 2004;
and Adding nesting structure to words; Alur and Madhusudan; DLT 2006.
We recommend reading
this unified full version (under journal review).
The invited talk at CSR 2007 is also a good starting point.
There has been extensive follow-up research.
The purpose of this page is to keep track of the latest results related this topic.
Email me with comments and/or suggested additions.
Additional Decision Problems for VPAs/NWAs
Visibly pushdown games; Loding, Madhusudan, and Serre; FSTTCS 2004.
Visibly pushdown automata: From language equivalence to simulation and
bisimulation; Srba; CSL 2006.
Regularity problems for visibly pushdown languages; Barany, Loding and Serre;
STACS 2006.
On the membership problem for visibly pushdown languages; La Torre, Napoli,
and Parente; ATVA 2006.
Congruences and Minimization
Congruences for visibly pushdown languages;
Alur, Kumar, Madhusudan, and Viswanathan; ICALP 2005.
Minimization, learning, and conformance testing of Boolean programs;
Kumar, Madhusudan, and Viswanathan; CONCUR 2006.
Minimizing variants of visibly pushdown automata;
Chervet and Walukiewicz; MFCS 2007.
Temporal and Fixpoint Logics; Expressiveness
A temporal logic of nested calls and returns;
Alur, Etessami, and Madhusudan; TACAS 2004.
Regular languages of nested words: Fixed points, automata,
and synchronization; Arenas, Barcelo, and Libkin; ICALP 2007.
First-order and temporal logics for nested words,
Alur, Arenas, Barcelo, Etessami, Immerman, and Libkin; LICS 2007.
Alternating automata and a temporal fixpoint calculus for visibly pushdown languages;
Bozzelli; CONCUR 2007.
A grammatical representation of visibly pushdown languages; Baran and Barringer;
WoLLIC 2007.
Specifications for Program Analysis
VPA-based aspects: Better support for AOP over protocols;
Nguyen and Sudholt; SEFM 2006.
Instrumenting C programs with nested word monitors;
Chaudhuri and Alur; SPIN 2007.
Synthesizing monitors for safety properties -- This time with calls and returns;
Rosu, Chen, and Ball; UIUC TR 2007.
XML Processing
Visibly pushdown expression effects for XML stream processing;
Pitcher; PLAN-X 2005.
Visibly pushdown languages for streaming XML;
Kumar, Madhusudan, and Viswanathan; WWW 2007.
Marrying words and trees; Alur; PODS 2007.
Transducers
Visibly pushdown transducers for approximate validation of streaming XML;
Thomo, Venkatesh, and Ye; FoIKS 2008.
Visibly pushdown transducers;
Raskin and Servais; ICALP 2008.
Nested Trees
A fixpoint calculus for local and global program flows;
Alur, Chaudhuri, and Madhusudan; POPL 2006.
Languages of nested trees; Alur, Chaudhuri, and Madhusudan; CAV 2006.
Visibly pushdown languages and term rewriting; Chabin and Rety; FroCos 2007.
Words with Multiple Nestings
A note on nested words; Blass and Gurevich; Microsoft Research TR; 2006.
A robust class of context-sensitive languages; La Torre, Madhusudan, and Parlato;
LICS 2007.
2-Visibly Pushdown Automata; Carotenuto, Murano, and Peron; DLT 2007.
New Results Using Visibility of Calls/Returns
Third-order Idealized Algol with iteration is decidable;
Murawski and Walukiewicz; FoSSaCS 2005.
Synchronization of pushdown automata; Caucal; DLT 2006.
Propositional dynamic logic with recursive programs; Loding and Serre; FoSSaCS 2006.
Height-deterministic pushdown automata; Nowotka and Srba; MFCS 2007.
An infinite automaton characterization of double exponential time;
La Torre, Madhusudan, and Parlato; UIUC TR 2007.