|
Research Summary
My research focuses on methods to help programmers develop correct,
reliable, and secure software. Our goal is to create better tools for
manipulating programs and analyzing their execution properties.
The tools that we are developing are based on static analysis,
which provides a way to obtain information about the possible
states that a program reaches during execution, but without actually
running the program on specific inputs. Instead, static-analysis
techniques explore a program's behavior for all possible inputs
and all possible states that the program can reach. To make this feasible,
the program is ``run in the aggregate'' -- i.e., on descriptors
that represent collections of many states.
Most of our effort is focused on three topics:
(A summary of some of my past research work can be found
here.)
Our work on static analysis via 3-valued logic addresses the problem
of analyzing programs that use dynamic allocation and freeing of
storage cells and destructive updating of structure fields -- features
found in all modern programming languages, including C, C++, and Java.
In such programs, data structures can grow and shrink dynamically,
with no fixed upper bound on their size or number. In the case of
thread-based languages, such as Java, the number of threads can also
grow and shrink dynamically. Our techniques provide a way to create
finite-sized descriptors of memory configurations that (i) abstract
away certain details, but (ii) retain enough key information so that
the analysis can identify interesting properties that hold.
[More]
In our work on analyzing stripped executables, we have devised methods
that -- without debugging information -- identify variables, data
objects, and types, and track how they are manipulated by the
executable. This information provides answers to such questions as
``what could this dereference operation access?'', ``what function
could be called at this indirect call site?'', etc., which can help
human security analysts understand the workings of COTS components,
plugins, mobile code, and DLLs, as well as memory snapshots of worms
and virus-infected code.
[More]
Our work on weighted pushdown systems (WPDSs) has developed a new
static-analysis framework that is strictly richer than previous
approaches (35 years worth ...) We have also shown that WPDSs are a
useful formalism in a completely separate application domain, namely,
access control of shared computing resources in distributed systems.
[More]
Static Program Analysis via 3-Valued Logic
The aim of this work
[SRW02,
RSW04]
has been to create a
parametric framework for program analysis that is capable of
expressing rich properties of memory configurations, and
confirming such behavioral properties as
- when the input to a list-insert program
is an acyclic list, the output is an acyclic list, and
- when the input to a list-reversal program that uses
destructive-update operations is an acyclic list, the output is an
acyclic list.
A parametric analysis framework is one that can be instantiated in
different ways to create different program-analysis algorithms that
provide answers to different questions, with varying degrees of
efficiency and precision.
This work was originally motivated by the problem of
shape analysis
for programs written in languages that permit destructive updating
of dynamically allocated storage.
The aim of shape analysis is to discover information
(in a conservative way) about the possible ``shapes''
of the dynamically allocated data structures to which a program's pointer
variables can point.
As we investigated this problem more deeply, we discovered
methods that allowed us to deal with properties of memory configurations
other than just pure shape properties.
For instance, recent work has concerned numeric properties
[GDDRS04].
The key aspect of our approach is the way in which it makes use of
2-valued and 3-valued logic: 2-valued and 3-valued
logical structures---i.e., collections of relations---are used to
represent concrete and abstract stores, respectively;
individuals represent entities such as memory cells, threads,
locks, etc.;
unary and binary relations encode the contents of
variables, pointer-valued structure fields, and other aspects of
memory states; and first-order formulas with transitive closure are used
to specify properties such as sharing, cyclicity, reachability, etc.
Formulas are also used to specify how the store is affected by the
execution of the different kinds of statements in the programming
language.
The analysis framework can be instantiated in different ways by
varying the relation symbols of the logic, and, in particular, by varying
which of the unary relations control how nodes are folded
together. The specified set of relations
determines the set of properties that will be tracked,
and consequently what properties of stores can be discovered to
hold at different points in the program by the corresponding instance
of the analysis.
As a methodology for verifying properties of programs, the advantages
of the 3-valued-logic approach are:
-
No loop invariants are required.
-
No theorem provers are involved, and thus every abstract execution
step must terminate.
-
The method is based on
abstract interpretation
and satisfies conditions that guarantee that the entire process
always terminates.
-
The method applies to programs that manipulate pointers and
heap-allocated data structures.
-
The method eliminates the need for the user to write the usual
proofs required with abstract interpretation---i.e., to
demonstrate that the abstract descriptors that the analyzer
manipulates correctly model the actual heap-allocated data
structures that the program manipulates.
A PowerPoint presentation on some of this material can be found
here.
A prototype implementation that implements this approach has been
created, called TVLA
(Three-Valued-Logic Analyzer).
Bibliography:
[Back to the top]
Static Analysis of Executables
The aim of this research is to recover intermediate representations
that are similar to those that can be created for a program written in
a high-level language. Our initial work has concerned x86
executables. The goal is to provide a platform that an analyst can
use to understand the workings of COTS components, plugins, mobile
code, and DLLs, as well as memory snapshots of worms and
virus-infected code.
Static analysis provides techniques that can help with such a task;
however, there are several obstacles that must be overcome.
In particular, to understand memory-access operations, it is
necessary to determine the set of addresses accessed by each
operation. This is difficult because
-
While some memory operations use explicit memory addresses
in the instruction (easy), others use indirect addressing
via address expressions (difficult).
-
Arithmetic on addresses is pervasive. For instance, even when
the value of a local variable is loaded from its slot in an
activation record, address arithmetic is performed.
-
There is no notion of type at the hardware level, so address
values cannot be distinguished from integer values.
-
Memory accesses do not have to be aligned, so word-sized
address values could potentially be cobbled together from misaligned
reads and writes.
An additional challenge is that for many programs, symbol-table and
debugging information is entirely absent. Even if it is present,
the executable may be untrustworthy, in which case
symbol-table and debugging information cannot be relied upon.
Instead, an analysis that we have developed---called value-set
analysis (VSA)---is carried out to recover information about the
contents of memory locations and how they are manipulated by the
executable
[BR04,
RBL06,
BR07].
A key feature of VSA is that it tracks address-valued and
integer-valued quantities simultaneously. VSA is related to
pointer-analysis algorithms that have been developed for programs
written in high-level languages, which determine an over-approximation
of the set of variables whose addresses each pointer variable can
hold:
VSA determines an over-approximation of the set of addresses that
each data object can hold at each program point.
At the same time, VSA is similar to range analysis and other numeric
static-analysis algorithms that over-approximate the integer values
that each variable can hold:
VSA determines an over-approximation of the set of integer values
that each data object can hold at each program point.
Value-set analysis has been incorporated into a tool called CodeSurfer/x86.
The facilities of CodeSurfer/x86 provide an
analyst with a powerful and flexible platform for investigating the
properties and behaviors of an x86 executable, using either (i)
CodeSurfer/x86's GUI, (ii) CodeSurfer/x86's scripting language, which
provides access to all of the intermediate representations that
CodeSurfer/x86 builds for the executable, or (iii) GrammaTech's
Path
Inspector, which is a tool that uses a sophisticated pattern-matching
engine to answer questions about the flow of execution in a program.
CodeSurfer/x86 is the outcome of a joint project between the Univ. of
Wisconsin and GrammaTech, Inc. CodeSurfer/x86 makes use of both
IDAPro,
a disassembly toolkit, and GrammaTech's
CodeSurfer
system, a toolkit for building program-analysis and inspection tools.
CodeSurfer/x86 is a tool for code understanding and code inspection that
supports both a graphical user interface (GUI) and a scripting language for
accessing a program's system dependence graph (SDG)
[HRB90],
as well as other information stored in
CodeSurfer's intermediate representations (IRs).
The GUI supports browsing (``surfing'') of an SDG, along with
a variety of operations for making queries about the SDG---such as
slicing
[HRB90]
and chopping
[RR95].
The GUI allows a user to navigate through a program's source code
using these dependences in a manner analogous to navigating the World
Wide Web.
CodeSurfer's scripting language provides a programmatic interface to these
operations, as well as to lower-level information, such as the
individual nodes and edges of the program's SDG, call graph, and
control-flow graph, and a node's sets of used, killed, and
possibly-killed variables.
By writing programs that traverse CodeSurfer's IRs to implement
additional program analyses, the scripting language can be used to extend
CodeSurfer's capabilities.
Bibliography:
[Back to the top]
Weighted Pushdown Systems and their Application to Interprocedural Dataflow Analysis
This work has established new connections between interprocedural
dataflow analysis and model checking of pushdown systems (PDSs).
Various connections between dataflow analysis and model checking
have been established in past work,
e.g., [S91,
S93,
S98,
EK99,
CC00];
however, with one exception ([EK99]),
past work has shed light only on the relationship between
model checking and bit-vector dataflow-analysis problems, such
as live-variable analysis and partial-redundancy elimination.
In contrast, the results obtained in our work
[RSJ03,
RSJM]
apply to
(i) bit-vector problems, (ii) the one non-bit-vector problem
addressed in [EK99],
as well as (iii) certain dataflow-analysis problems that cannot be expressed
as bit-vector problems,
such as linear constant propagation
SRH96
and affine-relation analysis
MOS04.
In general, the approach can be applied
to any distributive dataflow-analysis problem for which the
domain of transfer functions has no infinite descending chains.
(Safe solutions are also obtained for problems that are
monotonic but not distributive.)
The contributions of the work can be summarized as follows:
-
Conventional dataflow-analysis algorithms merge together the
values for all states associated with the same program point,
regardless of the states' calling context.
With the dataflow-analysis algorithm obtained via weighted PDSs,
dataflow queries can be posed with respect to a regular language
of stack configurations.
Conventional merged dataflow information can also be obtained
by issuing appropriate queries;
thus, the new approach provides a strictly richer framework
for interprocedural dataflow analysis than is provided by
conventional interprocedural dataflow-analysis algorithms.
-
Because the algorithm for solving path problems in weighted PDSs
can provide a witness set of paths, it is possible to provide
an explanation of why the answer to a dataflow query has
the value reported.
The algorithms described in
[RSJ03,
RSJM]
have been implemented in two libraries (one implemented in C,
one implemented in C++) that solve reachability problems
on weighted PDSs;
these can be downloaded from
WPDS (C)
and
WPDS++ (C++).
These libraries have been used to create prototype implementations of
context-sensitive interprocedural dataflow analyses for uninitialized
variables, live variables, linear constant propagation, and the
detection of affine relationships.
Weighted pushdown systems are also a useful formalism in a completely
separate application domain, namely, access control of shared computing resources
in distributed systems [JR04,
SJRS03,
JR02,
JSWR06,
WJRSS06].
Bibliography:
- The use of WPDSs for program analysis:
- The use of WPDSs for access control in distributed systems:
[Back to the top]
|