Past Research Accomplishments
The Cornell Program Synthesizer
The Cornell Program Synthesizer [TR81],
which Tim Teitelbaum
and I started working on in 1978, was the first interactive, WYSIWYG
(What You See Is What You Get) programming environment for a
strongly-typed, block-structured, imperative programming language. It
had a tightly integrated combination of facilities for programmers to
create, edit, execute, and debug their PL/C programs. In other words,
by 1979, we had created the first version of an interactive programming
environment similar to modern tools, such as
Visual Studio
and Eclipse.
At the time that it was created (i.e., 1978-79), the Cornell Program Synthesizer
was a radically new kind of programming tool. Along with the
Mentor
system of Donzeau-Gouge, Kahn, Huet, and Lang, it showed that
the kind of direct-manipulation programming tools that had been
pioneered in the Smalltalk
and Interlisp systems
could be created for a mainstream programming language (i.e., one that was
strongly-typed, block-structured, and had imperative programming constructs).
The Cornell Program Synthesizer was used successfully in introductory
programming classes by about 20,000 students over a five-year period
at Cornell, Rutgers, Princeton, UCLA, and several other universities.
Technology Transfer
The Cornell Program Synthesizer was distributed to researchers at
about 110 universities, research institutes, and industry
laboratories.
Through lectures that were given about the system and copies that were
distributed, it is possible to trace its direct influence on several
commercial products, including DEC's
Language Sensitive Editor,
Rational's Ada programming environment,
and Symantec's
LightspeedC and LightspeedPascal programming environments.
It is more difficult to connect the dots to the similar products that
hundreds of thousands or millions of people use on a daily basis today;
however, together with the initial paper about the system
[TR81],
the Cornell Program Synthesizer was very influential: as a model for
programming tools of the future, it helped to spark widespread
interest in the question of how to create better tools for developing,
analyzing, testing, and debugging programs.
The Synthesizer Generator
My Ph.D. dissertation [R84]
investigated the problem of automating the creation of programming
tools similar to the Cornell Program Synthesizer, but for languages
other than PL/I.
The goal of the research was to create an editor-generator program to
enable editors for different languages to be created automatically
from a language description. In the solution I proposed, language
descriptions are specified by giving
- a grammar for the language,
- sets of ``attributes'' (annotations) to be attached to nodes of the
grammar's derivation trees, and
- rules that describe how information from one node's attributes
affects the attributes of other nodes.
In essence, each editor can be thought of as a specialized
``spreadsheet'' for documents written in the given language. After an
editing operation changes the document's underlying tree, some of the
tree's attributes may no longer be consistent with one another;
attributes must be updated to reestablish consistency (similar to the
way values are updated in a spreadsheet program). A major result of
my Ph.D. research was an optimal algorithm that I devised to solve the
attribute-updating problem
[R82,
RTD83,
R84
].
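To make the ``spreadsheet'' analogy concrete, here is a minimal Python
sketch of change-driven attribute re-evaluation (illustrative only; this
is not the optimal algorithm of [R82, RTD83, R84], and the names are
invented): after an edit changes one attribute, only the attributes that
depend on it, directly or transitively, are recomputed.

    class Attr:
        """An attribute attached to a tree node, plus its dependences."""
        def __init__(self, name, rule=None, value=None):
            self.name = name          # e.g., "decl.type" or "use.ok"
            self.rule = rule          # function: list of input values -> value
            self.value = value        # cached value, as in a spreadsheet cell
            self.inputs = []          # attributes this one reads
            self.dependents = []      # attributes that read this one

        def depends_on(self, *attrs):
            for a in attrs:
                self.inputs.append(a)
                a.dependents.append(self)

    def reevaluate(changed):
        """Re-run rules of (transitive) dependents until values stabilize."""
        worklist = list(changed.dependents)
        while worklist:
            a = worklist.pop()
            new_value = a.rule([i.value for i in a.inputs])
            if new_value != a.value:          # propagate only real changes
                a.value = new_value
                worklist.extend(a.dependents)

    # Toy example: a declaration's type must match the type expected at a use.
    decl = Attr("decl.type", value="int")
    expected = Attr("use.expected", value="float")
    ok = Attr("use.ok", rule=lambda vs: vs[0] == vs[1])
    ok.depends_on(decl, expected)

    reevaluate(decl)      # initial evaluation
    print(ok.value)       # False: "int" vs. "float"
    decl.value = "float"  # an editing operation changes the declaration...
    reevaluate(decl)      # ...and only the affected attribute is recomputed
    print(ok.value)       # True: consistency is reestablished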
As part of my Ph.D. research, I developed a prototype implementation
of an editor-generating system that made use of the ideas described
above. I spent 1982-85 as a post-doc at Cornell (in residence during
1982-83 at INRIA, the national institute for computer science in
France) collaborating with my advisor, Tim Teitelbaum,
on the design and implementation of the Synthesizer Generator,
a full-scale version of such a system
[RT84,
RT87,
RT88a
].
The Synthesizer Generator creates a language-specific editor from an
input specification that defines a language's syntax, display format,
transformation rules for restructuring ``documents'' written in the
language, and attribution rules that codify context-sensitive
relationships between document elements. From this specification, the
Synthesizer Generator creates a specialized editor for manipulating
documents according to these rules. The Synthesizer Generator has
proven to be of great utility in a wide range of applications,
including program editors, text-preparation systems, and tools that
verify the correctness of proofs (for several different kinds of
mathematical and programming logics).
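As a rough illustration of how a language description drives a generated
tool, the following hypothetical Python sketch pairs the node kinds of a
tiny grammar with attribution rules and uses a generic,
language-independent evaluator to check a ``document''; the real system
is driven by specifications written in its own specification language,
and all names below are invented.

    # Attribution rules: for each node kind, compute the node's attributes
    # from its children's attributes (display and transformation rules omitted).
    attribution_rules = {
        "num":  lambda kids: {"type": "int"},
        "real": lambda kids: {"type": "float"},
        "plus": lambda kids: {
            # context-sensitive check: '+' requires operands of one type
            "type": kids[0]["type"] if kids[0]["type"] == kids[1]["type"]
                    else "TYPE ERROR"
        },
    }

    def attribute(node):
        """The language-specific part is just the rule table above;
        this evaluator is the same for every generated editor."""
        kind, children = node[0], node[1:]
        return attribution_rules[kind]([attribute(c) for c in children])

    doc = ("plus", ("num",), ("real",))   # a tiny "document": 1 + 2.0
    print(attribute(doc))                 # {'type': 'TYPE ERROR'}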
Technology Transfer
Beginning in August 1985, the Synthesizer Generator was made available
to universities and industrial research organizations.
Since distribution began it has been licensed
to about 330 sites in 23 countries. Two books describing the system,
which Teitelbaum and I co-authored, were published by Springer-Verlag
in 1988 [RT88a,
RT88b].
Also in 1988, Teitelbaum and I founded a company
(GrammaTech, Inc.) to
commercialize the
Synthesizer Generator technology.
GrammaTech subsequently created two products for the Ada programming language
based on the Synthesizer Generator,
Ada-ASSURED
and Ada-Utilities,
and, since the mid-90's, these have been used in the development of almost every major
Department of Defense system written in Ada.
Incremental Computing
The attribute-updating algorithm that I developed as part of my
Ph.D. research is an example of an
``incremental updating algorithm''---an algorithm that makes use
of the solution to one problem instance to
find the solution to a nearby problem instance.
This kind of problem has remained one of my interests, and is
something that I have returned to occasionally over the years
in a variety of contexts, including
interactive program-development environments
[R82,
RTD83,
R84,
RMT86,
R88],
graph algorithms
[RR94a,
RR94b,
RR96a,
RR96b],
and the construction of abstract transformers for program-analysis
problems
[RSL03].
Incremental computing was the subject of G. Ramalingam's
Ph.D. research
[R96],
which was carried out under my supervision.
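To give the flavor of what ``incremental'' means here, the following
minimal Python sketch (not one of the algorithms from the papers above;
the graph and names are made up) updates single-source shortest-path
distances after one edge weight decreases, re-examining only the
vertices whose distances can actually improve.

    import heapq

    def decrease_edge(graph, dist, u, v, new_w):
        """graph: {node: {successor: weight}}; dist: current shortest distances."""
        graph[u][v] = new_w
        if dist[u] + new_w >= dist[v]:
            return                           # no distance improves: no work at all
        dist[v] = dist[u] + new_w
        heap = [(dist[v], v)]                # propagate the improvement outward
        while heap:
            d, x = heapq.heappop(heap)
            if d > dist[x]:
                continue                     # stale queue entry
            for y, w in graph[x].items():
                if dist[x] + w < dist[y]:
                    dist[y] = dist[x] + w
                    heapq.heappush(heap, (dist[y], y))

    graph = {"s": {"a": 5, "b": 2}, "a": {"t": 1}, "b": {"a": 10, "t": 9}, "t": {}}
    dist = {"s": 0, "a": 5, "b": 2, "t": 6}  # shortest distances before the change
    decrease_edge(graph, dist, "b", "a", 1)  # one edge becomes cheaper
    print(dist)                              # {'s': 0, 'a': 3, 'b': 2, 't': 4}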
Technology Transfer
The incremental shortest-hyperpath algorithm from
[RR96b]
has been used by AT&T to improve the performance of Open Shortest Path
First (OSPF), the most commonly used intra-domain Internet routing
protocol.
The incremental shortest-hyperpath algorithm is used as a subroutine
in a heuristic search algorithm for optimizing OSPF weights (for a
given set of projected demands), thereby reducing congestion
[Fortz and Thorup 04].
The speed-up obtained by using our incremental algorithm was reported
to be a factor of 15 to 20.
In 2004, Mikkel Thorup told me that AT&T was using their results to adjust
capacities in the AT&T network when they had to schedule link downtimes
for maintenance. Apparently this happens quite frequently: he indicated
that on the order of a dozen links had to be serviced each day.
Program Slicing
Starting in 1986, I carried out a great deal of work exploring how
program slicing
(an operation originally proposed in 1979 by the late Mark Weiser)
can serve as the basis for semantics-based program-manipulation operations.
The slice of a program with respect to a set of program elements
S is
a projection of the program that includes only program elements that
might affect (either directly or transitively) the values of the
variables used at members of S. Slicing allows one to find
semantically meaningful decompositions of programs, where the
decompositions consist of elements that are not textually contiguous.
Program slicing is a fundamental operation that can aid in solving
many software-engineering problems
[HR92].
For instance, it has applications in program understanding, maintenance, debugging,
testing,
differencing [HR91],
specialization [RT96],
reuse, and merging of program variants
[HPR89].
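As a toy illustration of the operation itself (a hypothetical example;
the algorithms in [HRB88, HRB90] work on program and system dependence
graphs and handle procedure calls), a backward slice can be computed as
reachability over dependence edges:

    # Program:           Dependence edges (a statement points to the
    #  1: x = 1          statements whose values it uses):
    #  2: y = 2
    #  3: z = x + y
    #  4: print(z)
    deps = {4: {3}, 3: {1, 2}, 2: set(), 1: set()}

    def backward_slice(deps, criterion):
        """All statements that might affect the statements in `criterion`."""
        slice_, worklist = set(), list(criterion)
        while worklist:
            s = worklist.pop()
            if s not in slice_:
                slice_.add(s)
                worklist.extend(deps[s])
        return slice_

    print(sorted(backward_slice(deps, {4})))   # [1, 2, 3, 4]
    print(sorted(backward_slice(deps, {3})))   # [1, 2, 3]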
The projects that I carried out with my students and co-workers were
aimed at developing these applications of slicing.
(Click here
for the home page of the Wisconsin Program-Slicing Project.)
The conference paper on interprocedural slicing that
Susan Horwitz,
David Binkley,
and I wrote [HRB88]
was selected for inclusion in a special SIGPLAN collection of the
50 most influential papers from the SIGPLAN Conference on Programming
Language Design and Implementation from 1979 to 1999:
-
Horwitz, S., Reps, T., and Binkley, D.,
Interprocedural slicing using dependence graphs.
20 Years of the ACM SIGPLAN Conference on Programming Language Design
and Implementation (1979 - 1999): A Selection, K.S. McKinley, ed.,
ACM SIGPLAN Notices 39, 4 (April 2004), 232-243.
A journal version of
[HRB88]
appeared as
[HRB90].
(Of all my papers, this is the most highly-cited;
e.g., see Google Scholar
and Citeseer.)
A retrospective on [HRB88]
was published as
-
Horwitz, S., Reps, T., and Binkley, D.,
Retrospective: Interprocedural slicing using dependence graphs.
20 Years of the ACM SIGPLAN Conference on Programming Language Design
and Implementation (1979 - 1999): A Selection, K.S. McKinley, ed.,
ACM SIGPLAN Notices 39, 4 (April 2004), 229-231.
[retrospective (PS);
retrospective (PDF)]
In more recent work, my students, colleagues, and I have developed even
better methods for program analysis and slicing based on a formalism called
weighted pushdown systems.
Technology Transfer
In collaboration with Genevieve Rosay and Susan Horwitz,
I built a system, called the
Wisconsin Program-Slicing Tool,
that implemented slicing of ANSI C programs.
Although the Wisconsin Program-Slicing Tool
was distributed from 1996-2000 for not-for-profit
research purposes under license from the University of Wisconsin,
it is no longer being distributed.
However, there is a commercial tool available, named
CodeSurfer,
that is derived from the Wisconsin implementation and is available from
GrammaTech, Inc.,
which licensed the technology from the Wisconsin Alumni Research Foundation (a
not-for-profit foundation that uses royalty income from the
technologies it licenses to support research at the University of
Wisconsin).
GrammaTech has improved on the Wisconsin implementation considerably
(both in terms of speed and space efficiency), and extended it to
support C++.
GrammaTech also offers
CodeSurfer to academic researchers
on very favorable terms.
In 2007, Tavis Ormandy of Gentoo Linux said the following about CodeSurfer:
``Our team is responsible for investigating and verifying security problems
in any of the 10,000+ open source programs we distribute. Codesurfer has
made a dramatic impact on the time it takes us to isolate and understand
security flaws, allowing us to traverse complex and unfamiliar codebases
easily and trace problems to their source.''
A group at SAIC/NASA had this to say about CodeSurfer:
``Our group analyzes many mission-critical software projects to
reduce defects and ensure that the software meets requirements. We
conducted a formal study to see if CodeSufer could improve our
software inspections. We found that CodeSurfer reduced inspection
time by an average of 30%. In addition, when using CodeSurfer the
number of defects found more than doubled.''
Slicing was also used in many commercial tools designed to address the
Year 2000 problem. In IBM's VisualAge for COBOL, for example, slicing
was used to support application understanding, program conversion,
and program structuring, as well as the Year 2000 and other
code-renovation issues.
Since 2000, slicing has continued to live on in tools for software renovation (e.g., to
modify financial software to account for conversion to the Euro).
(A list of more than 20 such tools can be found
here;
another list of slicing tools is available
here.)
Interprocedural Dataflow Analysis and Context-Free-Language Reachability
Work that I carried out with Susan Horwitz, Mooly Sagiv, and David
Melski established some interesting connections between the work on
interprocedural program slicing and a number of other program-analysis
problems, such as classical interprocedural dataflow analysis.
In particular, we showed that a large class of program-analysis
problems can be solved by transforming them into instances of a
special kind of
graph-reachability problem,
called context-free-language reachability (CFL-reachability)
[RHS95,
HRS95,
R95b,
R96,
MR98,
R98,
R00
].
The notion of CFL-reachability was originated by
Mihalis Yannakakis,
who developed it in the context of
graph-theoretic methods in database theory.
Context-free-language reachability problems can be solved in
polynomial time by algorithms similar to the ones we originally developed
for interprocedural slicing [HRB90,
RHSR94].
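The following minimal Python sketch illustrates CFL-reachability itself
(it is not one of the algorithms from the papers above): facts are
labeled edges, and grammar productions combine existing edges into new
ones until nothing more can be added. The language here is balanced
parentheses, the usual stand-in for matched calls and returns.

    def cfl_reachability(edges, unary, binary, nullable):
        """edges: set of (u, label, v).  unary[X] = A means A -> X.
        binary[(X, Y)] = A means A -> X Y.  nullable: nonterminals deriving eps."""
        facts = set(edges)
        nodes = {u for u, _, v in edges} | {v for u, _, v in edges}
        for a in nullable:                        # A -> eps: a self-loop everywhere
            facts |= {(n, a, n) for n in nodes}
        worklist = list(facts)
        while worklist:
            u, x, v = worklist.pop()
            new = set()
            if x in unary:                                    # A -> x
                new.add((u, unary[x], v))
            for (w, y, u2) in list(facts):                    # A -> y x
                if u2 == u and (y, x) in binary:
                    new.add((w, binary[(y, x)], v))
            for (v2, y, w) in list(facts):                    # A -> x y
                if v2 == v and (x, y) in binary:
                    new.add((u, binary[(x, y)], w))
            for f in new - facts:
                facts.add(f)
                worklist.append(f)
        return facts

    # Grammar for matched parentheses:  M -> eps | "(" Mc     Mc -> M ")"
    # Graph:  a --(--> b --(--> c --)--> d --)--> e
    edges = {("a", "(", "b"), ("b", "(", "c"), ("c", ")", "d"), ("d", ")", "e")}
    binary = {("(", "Mc"): "M", ("M", ")"): "Mc"}
    facts = cfl_reachability(edges, unary={}, binary=binary, nullable={"M"})
    print(("a", "M", "e") in facts)   # True:  "(())" is matched
    print(("a", "M", "d") in facts)   # False: "(()"  is not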
David Melski and I studied the relationship [MR98]
between CFL-reachability and the program-analysis formalism of set constraints, which was
introduced by John Reynolds
and subsequently studied by
Alex Aiken and
Nevin Heintze.
The PowerPoint presentation prepared for the tutorial that I gave on
this material at PLDI '00 can be found
here.
Technology Transfer
In addition to the CFL-reachability-based slicing algorithms that are
used in
CodeSurfer,
the work on program analysis via CFL-reachability had direct impact on the
well-known SLAM tool
from Microsoft.
In particular, as described in
[BR01],
one of the key algorithms used in SLAM is based on the interprocedural dataflow-analysis
algorithm given in [RHS95].
The SLAM work even got on Bill Gates's radar screen.
For instance, in a keynote address at the Windows
Hardware Engineering Conference in 2002, Gates
touted the work on SLAM as follows:
``... software verification ... has been the Holy Grail of Computer
Science for many decades, but now in some very key areas, for
example, driver verification, we're building tools that can do
actual proof about the software and how it works in order to
guarantee the reliability.''
In more recent work, my students, colleagues, and I have developed even
better methods for interprocedural dataflow analysis based on a formalism called
weighted pushdown systems.
Techniques for Software Renovation
Mike Siff and I investigated techniques for converting C programs
to C++ programs that take advantage of the improved features
of C++, such as templates and classes.
For instance, in
[SR96,
S98]
we considered the problem of software generalization: Given a program
component C, create a parameterized program component
C' such that C' is usable in a wider variety of
syntactic contexts than C. Furthermore, C' should
be a semantically meaningful generalization of C; namely,
there must exist an instantiation of C' that is equivalent in
functionality to C.
We gave an algorithm that generalizes C functions via type inference.
The original functions operate on specific data
types; the result of generalization is a collection of C++ function
templates that operate on parameterized types. This version of the
generalization problem is useful in the context of converting existing
C programs to C++.
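The following toy Python sketch (hypothetical; the actual work
generalizes C functions to C++ function templates via type inference)
illustrates the requirement stated above: the parameterized component
C' has an instantiation that reproduces C exactly, yet C' is usable in
more contexts.

    from typing import Callable, List, TypeVar

    def sum_ints(xs: List[int]) -> int:         # the original component C
        total = 0
        for x in xs:
            total = total + x
        return total

    T = TypeVar("T")

    def fold(xs: List[T], zero: T,              # the generalized component C'
             plus: Callable[[T, T], T]) -> T:
        total = zero
        for x in xs:
            total = plus(total, x)
        return total

    # One instantiation of C' is equivalent in functionality to C ...
    assert fold([1, 2, 3], 0, lambda a, b: a + b) == sum_ints([1, 2, 3])
    # ... but C' also works in contexts where C does not, e.g., on strings.
    print(fold(["a", "b", "c"], "", lambda a, b: a + b))    # "abc"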
We also described a general technique for identifying modules in legacy
code [SR98,
S98].
The method is based on concept analysis---a branch of lattice
theory that can be used to identify similarities among a set of
objects based on their attributes. We showed how concept analysis can
identify potential modules using both ``positive'' and ``negative''
information, and presented an algorithmic framework to construct a
lattice of concepts from a program, where each concept represents a
potential module.
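To illustrate the underlying idea, the following minimal Python sketch
(with invented data) applies concept analysis to a tiny relation between
functions and the global variables they use; each resulting concept, a
maximal set of functions together with all the variables they share, is
a candidate module.

    from itertools import combinations

    uses = {                                     # which globals each function uses
        "read_config":  {"cfg"},
        "write_config": {"cfg"},
        "push":         {"stack", "top"},
        "pop":          {"stack", "top"},
    }
    all_attrs = set().union(*uses.values())

    def extent(attrs):       # functions that use every variable in attrs
        return {f for f, a in uses.items() if attrs <= a}

    def intent(funcs):       # variables used by every function in funcs
        return set.intersection(*(uses[f] for f in funcs)) if funcs else all_attrs

    concepts = set()
    for r in range(len(all_attrs) + 1):          # brute force is fine for a toy
        for attrs in combinations(sorted(all_attrs), r):
            e = extent(set(attrs))
            concepts.add((frozenset(e), frozenset(intent(e))))

    for e, i in sorted(concepts, key=lambda c: len(c[0])):
        print(sorted(e), "share", sorted(i))
    # Output includes the concepts ({'pop', 'push'}, {'stack', 'top'}) and
    # ({'read_config', 'write_config'}, {'cfg'}): two plausible candidate modules.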
The Use of Program Profiling for Software Maintenance
In 1996, I was asked by the Defense Advanced Research Projects Agency
(DARPA) to plan a project aimed at reducing the impact of the Year
2000 problem on the Department of Defense. DARPA was particularly
interested in whether there were ``any techniques in the research
community that could be applied to the Y2K problem and have impact
beyond present commercial Y2K products and services.'' I spent a
semester as a kind of talent scout for DARPA to see what ideas might
be worth pursuing. The most exciting of the ideas that turned up
concerned a method for using path profiling as a heuristic to locate
some of the sites in a program where there are problematic date
manipulations [RBDL97]
(and more generally as an aid for testing and debugging).
A path profiler instruments a program so that the number of times each
different loop-free path executes is accumulated during an execution
run. With such an instrumented program, each run of the program
generates a path spectrum for the execution---a distribution of the
paths that were executed during that run. A path spectrum is a
finite, easily obtainable characterization of a program's execution on
a dataset, and provides a behavior signature for a run of the program.
Our techniques are based on the idea of comparing path spectra from
different runs of the program. When different runs produce different
spectra, the spectral differences can be used to identify paths in the
program along which control diverges in the two runs. By choosing
input datasets to hold all factors constant except one, the divergence
can be attributed to this factor. The point of divergence itself may
not be the cause of the underlying problem, but it provides a starting
place for a programmer to begin exploring.
One application of this technique is in the ``Year 2000 Problem'' (i.e.,
the problem of fixing computer systems that use only 2-digit year
fields in date-valued data). In this context, path-spectrum
comparison provides a heuristic for identifying paths in a program
that are good candidates for being date-dependent computations.
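A minimal Python sketch of the idea follows (illustrative only; real
path profilers use an efficient path-numbering instrumentation rather
than building explicit path tuples, and the program and data below are
made up):

    from collections import Counter

    def process(year):
        """A toy date-dependent computation; returns the loop-free path it took."""
        path = []
        if year < 50:                 # 2-digit year assumed to mean 20xx
            path.append("assume-20xx")
            year += 2000
        else:                         # otherwise assumed to mean 19xx
            path.append("assume-19xx")
            year += 1900
        path.append("pre-2000" if year < 2000 else "post-2000")
        return tuple(path)

    def spectrum(years):
        """Path spectrum: the distribution of paths executed on one dataset."""
        return Counter(process(y) for y in years)

    run_a = spectrum([97, 98, 99])       # all factors held constant except ...
    run_b = spectrum([97, 98, 99, 1])    # ... one post-1999 record is added
    print((run_b - run_a) + (run_a - run_b))
    # Counter({('assume-20xx', 'post-2000'): 1}): the path where control diverges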
A few years later, we found out that essentially the same idea had been discovered
earlier by J. Collofello and L. Cousin [CC87],
as well as by N. Wilde
(who has done extensive research on the subject, and uses the term
``software reconnaissance'',
e.g., see [W94,
WC96]).
Computational Differentiation and Computational Divided Differencing
Tools for computational differentiation transform a program that
computes a numerical function F(x) into a related program that
computes F'(x) (the derivative of F).
Louis Rall (of the UW Mathematics Department) and I
described how techniques similar to those used in computational-differentiation
tools can be used to implement other program transformations---in
particular, a variety of transformations for computational divided
differencing
[RR00,
RR03
].
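As a small taste of these transformations (a sketch, not the method of
[RR00, RR03]; the names are invented), the following Python class
propagates first divided differences through a computation in the style
of dual-number automatic differentiation, avoiding the cancellation
that the naive formula (F(x0) - F(x1)) / (x0 - x1) suffers when x0 and
x1 are close:

    class DD:
        """Carries f(x0), f(x1), and the first divided difference f[x0,x1]."""
        def __init__(self, v0, v1, dd):
            self.v0, self.v1, self.dd = v0, v1, dd

        def __add__(self, other):
            return DD(self.v0 + other.v0, self.v1 + other.v1, self.dd + other.dd)

        def __mul__(self, other):
            # (f*g)[x0,x1] = f(x0)*g[x0,x1] + f[x0,x1]*g(x1): no subtraction needed
            return DD(self.v0 * other.v0, self.v1 * other.v1,
                      self.v0 * other.dd + self.dd * other.v1)

    def var(x0, x1):  return DD(x0, x1, 1.0)
    def const(c):     return DD(c, c, 0.0)

    def F(x):                            # the "program" being transformed: x*x + 3*x
        return x * x + const(3.0) * x

    print(F(var(2.0, 2.0 + 1e-9)).dd)    # ~7.0, with no catastrophic cancellation
    print(F(var(2.0, 2.0)).dd)           # exactly 7.0: degenerates to F'(2.0)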