Deprecated: The each() function is deprecated. This message will be suppressed on further calls in /home/zhenxiangba/zhenxiangba.com/public_html/phproxy-improved-master/index.php on line 456
Thomas W. Reps -- Research Summary
[go: Go Back, main page]

 
   
   

 
Computer Sciences Dept.

Thomas W. Reps

Professor
Computer Sciences Department
University of Wisconsin-Madison
1210 West Dayton Street
Madison, WI 53706-1685
USA

Picture of Thomas W. Reps

Research Summary

My research focuses on methods to help programmers develop correct, reliable, and secure software. Our goal is to create better tools for manipulating programs and analyzing their execution properties.

The tools that we are developing are based on static analysis, which provides a way to obtain information about the possible states that a program reaches during execution, but without actually running the program on specific inputs. Instead, static-analysis techniques explore a program's behavior for all possible inputs and all possible states that the program can reach. To make this feasible, the program is ``run in the aggregate'' -- i.e., on descriptors that represent collections of many states.

Most of our effort is focused on three topics:

(A summary of some of my past research work can be found here.)

Our work on static analysis via 3-valued logic addresses the problem of analyzing programs that use dynamic allocation and freeing of storage cells and destructive updating of structure fields -- features found in all modern programming languages, including C, C++, and Java. In such programs, data structures can grow and shrink dynamically, with no fixed upper bound on their size or number. In the case of thread-based languages, such as Java, the number of threads can also grow and shrink dynamically. Our techniques provide a way to create finite-sized descriptors of memory configurations that (i) abstract away certain details, but (ii) retain enough key information so that the analysis can identify interesting properties that hold. [More]

In our work on analyzing stripped executables, we have devised methods that -- without debugging information -- identify variables, data objects, and types, and track how they are manipulated by the executable. This information provides answers to such questions as ``what could this dereference operation access?'', ``what function could be called at this indirect call site?'', etc., which can help human security analysts understand the workings of COTS components, plugins, mobile code, and DLLs, as well as memory snapshots of worms and virus-infected code. [More]

Our work on weighted pushdown systems (WPDSs) has developed a new static-analysis framework that is strictly richer than previous approaches (35 years worth ...) We have also shown that WPDSs are a useful formalism in a completely separate application domain, namely, access control of shared computing resources in distributed systems. [More]

Static Program Analysis via 3-Valued Logic

The aim of this work [SRW02, RSW04] has been to create a parametric framework for program analysis that is capable of expressing rich properties of memory configurations, and confirming such behavioral properties as

  • when the input to a list-insert program is an acyclic list, the output is an acyclic list, and
  • when the input to a list-reversal program that uses destructive-update operations is an acyclic list, the output is an acyclic list.
A parametric analysis framework is one that can be instantiated in different ways to create different program-analysis algorithms that provide answers to different questions, with varying degrees of efficiency and precision.

This work was originally motivated by the problem of shape analysis for programs written in languages that permit destructive updating of dynamically allocated storage. The aim of shape analysis is to discover information (in a conservative way) about the possible ``shapes'' of the dynamically allocated data structures to which a program's pointer variables can point. As we investigated this problem more deeply, we discovered methods that allowed us to deal with properties of memory configurations other than just pure shape properties. For instance, recent work has concerned numeric properties [GDDRS04].

The key aspect of our approach is the way in which it makes use of 2-valued and 3-valued logic: 2-valued and 3-valued logical structures---i.e., collections of relations---are used to represent concrete and abstract stores, respectively; individuals represent entities such as memory cells, threads, locks, etc.; unary and binary relations encode the contents of variables, pointer-valued structure fields, and other aspects of memory states; and first-order formulas with transitive closure are used to specify properties such as sharing, cyclicity, reachability, etc. Formulas are also used to specify how the store is affected by the execution of the different kinds of statements in the programming language.

The analysis framework can be instantiated in different ways by varying the relation symbols of the logic, and, in particular, by varying which of the unary relations control how nodes are folded together. The specified set of relations determines the set of properties that will be tracked, and consequently what properties of stores can be discovered to hold at different points in the program by the corresponding instance of the analysis.

As a methodology for verifying properties of programs, the advantages of the 3-valued-logic approach are:

  1. No loop invariants are required.
  2. No theorem provers are involved, and thus every abstract execution step must terminate.
  3. The method is based on abstract interpretation and satisfies conditions that guarantee that the entire process always terminates.
  4. The method applies to programs that manipulate pointers and heap-allocated data structures.
  5. The method eliminates the need for the user to write the usual proofs required with abstract interpretation---i.e., to demonstrate that the abstract descriptors that the analyzer manipulates correctly model the actual heap-allocated data structures that the program manipulates.
A PowerPoint presentation on some of this material can be found here.

A prototype implementation that implements this approach has been created, called TVLA (Three-Valued-Logic Analyzer).

Bibliography:

[Back to the top]

Static Analysis of Executables

The aim of this research is to recover intermediate representations that are similar to those that can be created for a program written in a high-level language. Our initial work has concerned x86 executables. The goal is to provide a platform that an analyst can use to understand the workings of COTS components, plugins, mobile code, and DLLs, as well as memory snapshots of worms and virus-infected code.

Static analysis provides techniques that can help with such a task; however, there are several obstacles that must be overcome. In particular, to understand memory-access operations, it is necessary to determine the set of addresses accessed by each operation. This is difficult because

  • While some memory operations use explicit memory addresses in the instruction (easy), others use indirect addressing via address expressions (difficult).
  • Arithmetic on addresses is pervasive. For instance, even when the value of a local variable is loaded from its slot in an activation record, address arithmetic is performed.
  • There is no notion of type at the hardware level, so address values cannot be distinguished from integer values.
  • Memory accesses do not have to be aligned, so word-sized address values could potentially be cobbled together from misaligned reads and writes.

An additional challenge is that for many programs, symbol-table and debugging information is entirely absent. Even if it is present, the executable may be untrustworthy, in which case symbol-table and debugging information cannot be relied upon. Instead, an analysis that we have developed---called value-set analysis (VSA)---is carried out to recover information about the contents of memory locations and how they are manipulated by the executable [BR04, RBL06, BR07].

A key feature of VSA is that it tracks address-valued and integer-valued quantities simultaneously. VSA is related to pointer-analysis algorithms that have been developed for programs written in high-level languages, which determine an over-approximation of the set of variables whose addresses each pointer variable can hold:

VSA determines an over-approximation of the set of addresses that each data object can hold at each program point.
At the same time, VSA is similar to range analysis and other numeric static-analysis algorithms that over-approximate the integer values that each variable can hold:
VSA determines an over-approximation of the set of integer values that each data object can hold at each program point.

Value-set analysis has been incorporated into a tool called CodeSurfer/x86. The facilities of CodeSurfer/x86 provide an analyst with a powerful and flexible platform for investigating the properties and behaviors of an x86 executable, using either (i) CodeSurfer/x86's GUI, (ii) CodeSurfer/x86's scripting language, which provides access to all of the intermediate representations that CodeSurfer/x86 builds for the executable, or (iii) GrammaTech's Path Inspector, which is a tool that uses a sophisticated pattern-matching engine to answer questions about the flow of execution in a program.

CodeSurfer/x86 is the outcome of a joint project between the Univ. of Wisconsin and GrammaTech, Inc. CodeSurfer/x86 makes use of both IDAPro, a disassembly toolkit, and GrammaTech's CodeSurfer system, a toolkit for building program-analysis and inspection tools.

CodeSurfer/x86 is a tool for code understanding and code inspection that supports both a graphical user interface (GUI) and a scripting language for accessing a program's system dependence graph (SDG) [HRB90], as well as other information stored in CodeSurfer's intermediate representations (IRs). The GUI supports browsing (``surfing'') of an SDG, along with a variety of operations for making queries about the SDG---such as slicing [HRB90] and chopping [RR95]. The GUI allows a user to navigate through a program's source code using these dependences in a manner analogous to navigating the World Wide Web.

CodeSurfer's scripting language provides a programmatic interface to these operations, as well as to lower-level information, such as the individual nodes and edges of the program's SDG, call graph, and control-flow graph, and a node's sets of used, killed, and possibly-killed variables. By writing programs that traverse CodeSurfer's IRs to implement additional program analyses, the scripting language can be used to extend CodeSurfer's capabilities.

Bibliography:

[Back to the top]

Weighted Pushdown Systems and their Application to Interprocedural Dataflow Analysis

This work has established new connections between interprocedural dataflow analysis and model checking of pushdown systems (PDSs). Various connections between dataflow analysis and model checking have been established in past work, e.g., [S91, S93, S98, EK99, CC00]; however, with one exception ([EK99]), past work has shed light only on the relationship between model checking and bit-vector dataflow-analysis problems, such as live-variable analysis and partial-redundancy elimination. In contrast, the results obtained in our work [RSJ03, RSJM] apply to (i) bit-vector problems, (ii) the one non-bit-vector problem addressed in [EK99], as well as (iii) certain dataflow-analysis problems that cannot be expressed as bit-vector problems, such as linear constant propagation SRH96 and affine-relation analysis MOS04. In general, the approach can be applied to any distributive dataflow-analysis problem for which the domain of transfer functions has no infinite descending chains. (Safe solutions are also obtained for problems that are monotonic but not distributive.)

The contributions of the work can be summarized as follows:

  • Conventional dataflow-analysis algorithms merge together the values for all states associated with the same program point, regardless of the states' calling context. With the dataflow-analysis algorithm obtained via weighted PDSs, dataflow queries can be posed with respect to a regular language of stack configurations. Conventional merged dataflow information can also be obtained by issuing appropriate queries; thus, the new approach provides a strictly richer framework for interprocedural dataflow analysis than is provided by conventional interprocedural dataflow-analysis algorithms.
  • Because the algorithm for solving path problems in weighted PDSs can provide a witness set of paths, it is possible to provide an explanation of why the answer to a dataflow query has the value reported.

The algorithms described in [RSJ03, RSJM] have been implemented in two libraries (one implemented in C, one implemented in C++) that solve reachability problems on weighted PDSs; these can be downloaded from WPDS (C) and WPDS++ (C++). These libraries have been used to create prototype implementations of context-sensitive interprocedural dataflow analyses for uninitialized variables, live variables, linear constant propagation, and the detection of affine relationships.

Weighted pushdown systems are also a useful formalism in a completely separate application domain, namely, access control of shared computing resources in distributed systems [JR04, SJRS03, JR02, JSWR06, WJRSS06].

Bibliography:

[Back to the top]

 
Computer Sciences | UW Home