Debugging Programs
by Trace Visualization
and Contract Discovery
S. Kanat Bolazar
4/7/2006 Friday
Dr. Fawcett’s Brown Bag Seminar,
Syracuse University, Syracuse, NY
Information Theory and
Computational Complexity
• Information Theory
• Shannon's Entropy
• Mutual Information
• Kolmogorov Complexity
• Minimum Message Length (MML),
Minimum Description Length (MDL)
• Chaitin's Algorithmic Information Theory
Ideal versus Actual Program
Comprehension
• Ideal Minimal Program
• Actual Programs
• Complete Comprehension
• Task-Oriented Partial Comprehension
• Debugging and Pointwise Comprehension
My Research: How I Got Here
• Automated Software Reuse
• Automated Testing, using Contracts
• Real-Life Example: Azureus, A BitTorrent
File Sharing Client
• Problems: No Documentation, No Unit
Tests, Long Methods, No Comments
• Idea: Merge Comprehension and Testing
Innovative Debugging Techniques
• Reversible Execution, "Omniscient
Debugging"
• Delta Debugging
• Slicing
• Dynamic Slicing
• Query-Based Debugging
Complexity: Kilo-Lines-Of-Code
• Animal Farm - 144 pp, about 5000 lines total; 5 KL (kilo-lines)
• Da Vinci Code - 454 pp, about 18000 lines total; 18 KL
• Statistics from 2001 for some software:
– Apache 80 KLOC (kilo-lines of code)
– Linux 800 KLOC
– Mozilla 2100 KLOC (2.1 million lines)
• Book is a linear presentation
• Software is a highly-interconnected entity
Debugging: Steps
1. From program failure, find the defect
(fault origin) that caused the failure.
– This is usually the hardest part
– Mostly due to complexity of code
2. Remove the defect
– Essentially patch-up programming (ideally, a
patch that will last) or reprogramming.
– Can't be automated; requires creative input
of a programmer
Expected vs. Actual Behavior
• Specifications
– Formal vs. Informal
– Requirements must be documented
• Hierarchy of expectations:
– What is a step (how) for a goal becomes a target
(what) for a subgoal.
• Unit Tests
– Test each part of the program individually, before
putting pieces together.
– "Oracle"s
• Someone who states what the "expected behavior" is.
• Active oracles predict; passive "oracle"s verify
– Automated unit tests
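For illustration, a minimal JUnit 4 test acting as a passive oracle (the choice of java.util.ArrayDeque as the unit under test is just an example):

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;

    import java.util.ArrayDeque;
    import org.junit.Test;

    // A passive oracle: the test does not predict behavior on its own,
    // it only verifies that actual behavior matches the stated expectation.
    public class DequeAsStackTest {

        @Test
        public void pushThenPopReturnsLastPushedValue() {
            ArrayDeque<Integer> stack = new ArrayDeque<>();
            stack.push(1);
            stack.push(42);
            assertEquals(42, (int) stack.pop());   // expected behavior stated by the tester
        }

        @Test
        public void newDequeIsEmpty() {
            assertTrue(new ArrayDeque<Integer>().isEmpty());
        }
    }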
Promising Approaches for
Automatic Fault Localization
• Preventive measures:
– Formal requirements
– Unit tests
• Automated fault localization:
– Reversible / Omniscient Debugging
– Delta Debugging
– Software Visualization
Reversible / Omniscient Debugging
• Motivation
– Debugging: Lost in source code space and time,
taking tiny steps.
• Definition
– Can run the program in reverse, change the past,
play forward.
• Advantages
– Binary search, natural, helps pinpoint minimal failure-causing behavior.
• Issues
– Full state information must be saved.
– We can’t revert the environment (file deleted, etc).
Delta Debugging
• Motivation
– Find minimal (change in) behavior that causes failure
to appear.
• Definition
– Given a passing run, a failing run, and an automated
passive oracle, automatically discover the fault origin
by trial-and-error.
• Advantages
– Can be used to simplify error reports.
– Quite time-efficient, and automated.
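A heavily simplified Java sketch of this trial-and-error idea (this is not the full ddmin algorithm from the delta debugging literature; the Oracle interface and list-of-tokens input are illustrative assumptions): repeatedly drop pieces of the failure-inducing input as long as the automated passive oracle still reports a failure.

    import java.util.ArrayList;
    import java.util.List;

    // Simplified input-minimization loop in the spirit of delta debugging.
    // NOT the full ddmin algorithm; just the core trial-and-error idea.
    public class SimpleDeltaDebug {

        // Automated passive oracle: true if running the SUT on this input fails.
        interface Oracle {
            boolean fails(List<String> input);
        }

        // Shrinks a failing input while the oracle keeps reporting a failure.
        static List<String> minimize(List<String> failingInput, Oracle oracle) {
            List<String> current = new ArrayList<>(failingInput);
            boolean shrunk = true;
            while (shrunk) {
                shrunk = false;
                for (int i = 0; i < current.size(); i++) {
                    List<String> candidate = new ArrayList<>(current);
                    candidate.remove(i);            // try the input without element i
                    if (oracle.fails(candidate)) {
                        current = candidate;        // still fails: keep the smaller input
                        shrunk = true;
                        break;                      // restart the scan on the smaller input
                    }
                }
            }
            return current;                         // minimal w.r.t. single-element removals
        }
    }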
Delta Debugging, cont’d
• Requirements
– Test automation (oracle) is needed
– There must be an automated way to run the program
with different inputs (data file, emulated keyboard
and mouse inputs, etc) without human intervention.
• Issues
– For most programs, pinpointing the cause is harder,
but localizing the failure origin is possible, and
advantageous.
– May not be feasible for larger programs; automated
unit tests may be needed.
Software Visualization
• Motivation
– Program may not be understood, or worse, may be
misunderstood
– One can't debug what one can't understand
• Definition
– Program understanding
• Static approaches
• Dynamic approaches
– Algorithm animation
• Advantages
– Usable power output
– The "big picture" perspective
– You may even find errors you were not looking for!
– Understanding the program better will be helpful in the long term as well.
Software Visualization, cont’d
• Requirements
– Automated data gathering
– A visual exploration tool
• Issues
– Cognitive economy
– Richness / simplicity
• encoding information in hue, brightness and contrast
• encoding information in every pixel
– Not fully automated; human is the pattern-detector
– Ideally, should allow fast set-up and quick feedback
The Making of a Legacy System
• Software evolves over time due to changes in the environment, customer needs, etc.
• Original authors of a program may have moved on to other projects or another company.
• Architecture, design, and code-level documentation may be missing.
• The developer/maintainer does not fully understand the original design.
The Making of a Legacy System, cont’d
• Deadlines and time pressure often do not leave the developer enough time to align changes with the original design.
• Over time, changes in directions not envisaged by original architects accumulate.
• Software becomes inconsistent, incoherent.
• We call such a system a “legacy system”.
The Legacy System
• Runs, possibly with some small problems
• Cannot be understood due to conflicting design decisions implicit in accumulated changes
• Is often brittle/fragile/unstable
– Code is replete with strong components (cycles of
dependency A --> B --> A, etc)
– Some bugs are resolved away from origin, making
both components buggy in isolation
– Small changes may produce quite unexpected
erroneous behavior (chaotic behavior)
• Cannot be properly maintained
Software Maintenance and Code Reuse
• Maintenance and code reuse require that the developer understands the program.
• Adaptive Maintenance: Adapt to new environments, versions of libraries used, hardware, etc.
• Perfective Maintenance: Add new features.
• Corrective Maintenance: Debug; remove defects that cause (fatal / nonfatal) failures.
• Reuse: Use developed components and libraries as they are, in other projects.
Program Comprehension Theories
• Most comprehension theories are arrived at by empirical observation of programmers
• Top-down comprehension: From overall program functionality, go down to code
• Bottom-up comprehension: From code, read and build up understanding, going up
• Opportunistic: Switch between these two as needed, in a single comprehension session
• Integrated: Combined model integrates earlier models
Program Comprehension Theory Dictionary
• Chunking: Grouping code and other "chunks" into larger understood blocks
• Plan: Patterns and templates (also called slots) of code; code structure decisions
• Rules of Discourse: Coding practices; rules such as "no duplicates; no unused code"
• Beacon: A cue or a clue that a certain plan may be used in code (e.g., swap suggests sort)
• Strategy: Approach the developer takes in trying to understand the program
Program Comprehension Theory
Dictionary, cont’d
• Hypotheses/Conjectures: Developer's guesses
about the program and its parts.
– New evidence may require that a hypothesis be
accepted, changed, or rejected.
• Cross-referencing and annotations: Mappings
between features or specifications to source
code.
Top-Down Comprehension
• Separate theories by Brooks and by Soloway, Adelson, and Ehrlich
• Assumes an expert developer, familiar with the application domain, tools, and coding practices
• If not an expert (in the field or in the language), the same developer may instead prefer a bottom-up strategy.
• Experiments show that expert programmers are:
– More flexible: Faulty hypotheses are refined or
discarded more quickly
– As inefficient as novice programmers on "unplan-like
code" (beacons mislead)
Brooks' (Top-Down) Theory
• A large tree of hypotheses is first formed (top-down) then validated (bottom-up).
• Brooks theorizes that the developer:
– Knows the application domain (vertical), intermediate
domains, and the language domain
– Maps the problem through these domains, from
functionality to code
– Forms hypotheses and subhypotheses from top of the
program, down
• By skimming names and code of the modules, classes,
methods, fields, etc
– Validates these hypotheses from the bottom (code
level) up
Bottom-Up Comprehension
• Separate theories by Shneiderman and Mayer, and Pennington
• Developer starts from code, chunking it into meaningful blocks, up to the top level
• Hypotheses are formed, starting at the code level
• Pennington model:
– Program model: Control-flow abstraction of the
program; program structure
– Situation model: Program semantics; mapping to the
application domain
– Program model is often developed before situation
model
Letovsky's Model: Opportunistic
Comprehension Strategies
• Letovsky encouraged developers to "think aloud" while performing debugging tasks
• He later analyzed the collected records and found that the developers:
– Often switch between top-down and bottom-up
strategies, as needed
– Ask "what", "why" and "how" questions while
understanding the program
– Use a mental annotation that maps specifications to
implementation
1980s, 1990s, and Now
• Most program comprehension theories were formed in the 1980s
• Tests validated individual theories, but theories were not comparatively evaluated
• Languages, program sizes, programmer skill level and number would be different today
• Von Mayrhauser and Vans, in 1994:
– Conducted a detailed survey as part of Vans' PhD work
– Integrated top-down and bottom-up (Pennington)
models
– Performed large-scale tests with industry partners to
validate their theory
Von Mayrhauser & Vans Model
• Von Mayrhauser & Vans integrated metamodel
of program comprehension consists of:
– Knowledge Base: Program model, situation model and
top-down structures
– Program model process (bottom-up, by chunking)
– Situation model process (bottom-up, by chunking)
– Top-down comprehension process (using schemata,
plans)
What, Why, and How
• These three questions about a piece of code are interconnected.
• Example: Method similarity(.) performs its task by calling method dotProduct(.) (see the hypothetical sketch below).
– What does method dotProduct(.) do?
It probably calculates a dot product of two vectors.
– Why does dotProduct(.) exist?
Because it is needed by similarity(.)
– How does similarity(.) work?
It uses the dot product as a similarity measure.
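A hypothetical Java sketch matching this example (the cosine-similarity interpretation and the vector types are assumptions, not something stated on the slide):

    // Hypothetical code matching the what/why/how example above.
    public class VectorMath {

        // WHAT: computes the dot product of two equal-length vectors.
        // WHY: it exists because similarity() needs it (its only caller here).
        static double dotProduct(double[] a, double[] b) {
            double sum = 0.0;
            for (int i = 0; i < a.length; i++) {
                sum += a[i] * b[i];
            }
            return sum;
        }

        // HOW: similarity() works by using the dot product
        // (interpreted here as cosine similarity, which is an assumption).
        static double similarity(double[] a, double[] b) {
            double norms = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
            return dotProduct(a, b) / norms;
        }
    }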
What, Why, and How: Hypotheses
• Note that these are merely hypotheses that need testing.
• Why questions refer to callers; without callers, a method need not be in a program.
• How questions refer to code and methods called; called methods must be somewhat understood.
Why We Need Partial Understanding,
Hypotheses, Top-Down & Bottom-Up Analysis
• Without understanding all callers and all called methods, we can't answer these three questions fully.
• Stability in program comprehensibility:
– Confidence in hypotheses about a method should not
require higher confidence level for all called and caller
methods
– Complete understanding of one method should be
achievable through partial understanding of the
caller(s) and the called methods
• Comprehension must be partial, by hypotheses,
going both top-down and bottom-up
Common Maintenance Problems
and Suggested Solutions
System Under Test, Inputs, Outputs
• Possible inputs to a tested program (system under
test, SUT):
– GUI (mouse) and text (keyboard) inputs to the program
– Configuration files, other files, network connections,
other input streams, other OS state
• Possible outputs from the SUT:
– Displayed images, icons, diagrams, text, 2-D color-coded information murals
– File and network outputs, other output streams
– Other OS actions, possibly changing OS state
• SUT internal state can be considered an input and
an output as well.
Debugging a Program
• A maintainer (tester) may have to debug
a program (system under test, SUT).
• Often debugging tools are used to
extract more internal state information
from SUT.
• We'll define five potential problems in
detail, and suggest guidelines for
solutions.
Five Potential Problems
1. Tester may have faulty, incomplete or no mental model of the SUT
2. Delay in feedback causes delay in task completion
3. Comprehension is limited to usable information in the output of SUT and tools used
4. Setting up and interacting with SUT and testing tools used may take too much time
5. In a large program, mapping features to code takes time.
1. Incomplete / Faulty Mental Model
Problem: Tester may have faulty, incomplete or
no mental model of the SUT
– The tester often is not the original developer.
– The original developer may even have left the
company.
– Tester may lack application domain (vertical
domain) information.
Solution: Combine program comprehension and
testing
2. Feedback Delay Causes Delay In
Task Completion
Problem: Delay in feedback causes delay in
task completion
– Delay between writing and running the tests
allows faulty assumptions to live longer.
– Delay between reading the source code and
seeing its dynamic behavior hinders
comprehension.
Solution: Minimize feedback delay, in both
testing and comprehension
3. Comprehension is Limited to
Usable Information Output
Problem: Comprehension is limited to usable
information in the output of SUT and tools used
– Output of SUT may be limited, making
understanding the SUT harder.
– Output already predictable by tester's mental
model does not help evolve this model.
(in information theory, this would be called "mutual
information")
– Tools used can extract and display more SUT
information (internal state of the SUT).
3. Comprehension is Limited to
Usable Information Output, cont’d
– Some types of output are better for certain
tasks:
• Tables of numbers can be much more precise and
detailed
• Plots and graphs are easier to understand than
tables
• Graphical descriptions make the large-scale
patterns apparent
Solution: Maximize usable output; use
visualization where possible
4. Inputs (to SUT and Tools) May
Take Too Much Time
Problem: Setting up and interacting with
SUT and testing tools used may take too
much time
– Some programs require a lot of user input
before reaching a failure state.
– Debugging tools often require very small
steps, and take a long time to reach a failure.
– Delay and input complexity cause
misrepresentations in tester's mental model of
SUT.
4. Inputs (to SUT and Tools) May
Take Too Much Time, cont’d
– Time spent also delays completion of the
overall task.
– Automated unit testing tools such as JUnit
require that tests be written.
– Once such tests are written, automation
greatly improves efficiency.
Solution: Minimize required input from the
user; allow automation
5. Mapping Features to Code May
Take Time
Problem: In a large program, mapping features
to code takes time.
– Feature-responsibility matrix is often not explicitly
documented; tester may not know.
– Source code for the SUT may not be well-documented.
– Features (dynamic behavior, functionality) are visible
only when the SUT is run.
– Often, parts of a deployed SUT can not be run and
tested easily in isolation.
Solution: Connect functionality to code; help
tester locate features
Guiding Principles, Goals, and
Design Decisions
• The solutions suggested for these five problems become the goals of our system.
• Here we list these goals again, and describe how we attempt to achieve them.
• The problems were, and these goals are, listed in the order in which we prioritize these issues.
• Goal 1: Combine Program Comprehension and Testing
– Create an understand-test loop
– Use this loop to incrementally improve understanding
and confidence in accuracy
Guiding Principles, Goals, and
Design Decisions, cont'd
• Goal 2: Minimize Feedback Delay (in Comprehension and
Testing)
– Loop must execute without need for recompilation
– Allow specifying and evolving hypotheses about the program during a run
– Allow these hypotheses to specify and execute tests without need for
recompilation
• Goal 3: Maximize Usable Output
– Use visualization where possible; increase the information content and
availability
– Allow drilling down from a high-level view to individual method calls
– Allow display of aggregate behavior (of all calls for one method), visually
– Allow statistical analysis of this aggregate behavior
Guiding Principles, Goals, and
Design Decisions, cont'd
• Goal 4: Minimize Required Input; Allow Automation
– Allow interactive annotation of the program while it is running
– The annotated code will run for each call of the annotated
method
– Such annotations can be used to specify expected behavior
(contracts) of the method
• Goal 5: Connect Functionality to Code; Help Tester
Locate Features
– Discover methods central to one feature by tracing while using
that feature
– Synchronously display program I/O and executing methods (and
method call I/O)
Our System
Tracing Sessions
• The system under test (SUT) is run inside our prototype environment.
• First, the tester selects the methods to trace.
– To understand a class, all its methods may be chosen.
– Often, the focus is on a few methods related by functionality.
• The tester interacts with the program to elicit the erroneous behavior (non-fatal failure).
• The tester stops the tracing and views the trace to form an understanding and hypotheses.
Tracing Sessions, cont’d
• The tester may repeat these steps without stopping the SUT.
• Sessions are recorded in separate execution trace records, which remain viewable.
Trace Configurations and
Contracts
• Selected methods to trace constitute a configuration.
• A new configuration can be created by cloning any existing configuration.
• Configurations can be modified.
• When a configuration is used in a test, a frozen instance called a contract is saved.
• Contracts contain methods to trace, methods not to trace, and method contracts.
• We'll see more about method contracts later.
Tool Used: AspectJ Aspect-Oriented
Programming Framework
• AspectJ is an Aspect-Oriented Programming (AOP) framework.
• AOP allows cross-cutting concerns such as logging to be implemented separately.
• This is done by intercepting interesting events and executing some code.
• Interesting events may be method calls, thrown exceptions, or variable access/change.
• In AspectJ, pointcuts define which important events will be intercepted.
Interception With Method Call Pointcuts
• AspectJ can do load-time weaving of code into Java bytecode.
• This means we can intercept methods of any library for which we have no source code.
• This is not black-box testing of methods; intra-library method calls can be caught.
• Example: While intercepting library methods f and g, we can also catch when f calls g (see the sketch below).
• Note: This is optional and can be specified in an AspectJ pointcut as desired.
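A minimal annotation-style AspectJ sketch of such interception (package and class names are placeholders; the tool's actual tracing aspect is more elaborate). The call pointcut exposes the caller, the target object, the arguments, and the return value:

    import org.aspectj.lang.ProceedingJoinPoint;
    import org.aspectj.lang.annotation.Around;
    import org.aspectj.lang.annotation.Aspect;

    // Minimal tracing aspect (annotation-style AspectJ).
    // "com.example.lib" is a placeholder for the traced library's package.
    @Aspect
    public class TraceAspect {

        // Intercept calls to any public method in the traced package,
        // including calls made from one traced method to another (f calling g).
        @Around("call(public * com.example.lib..*.*(..))")
        public Object traceCall(ProceedingJoinPoint jp) throws Throwable {
            Object caller = jp.getThis();    // the "that" reference: the object making the call
            Object callee = jp.getTarget();  // the "this" reference of the called method
            Object[] args = jp.getArgs();    // method parameters

            Object result = jp.proceed();    // run the intercepted call

            System.out.printf("%s -> %s with %d arg(s), returned %s%n",
                    caller, callee, args.length, result);
            return result;                   // returned value (null if the method is void)
        }
    }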
Standard Method Call Observables
• Method parameters
• The "this" reference
• The value returned (null if void is the
return type)
• A "that" reference: Refers to the object
that called this method
– Example: If str.substring(2) calls
DefaultLocale.getEncoding() method:
• getEncoding method call has this = DefaultLocale
and that = str
User-Defined Observables
• The tester can define variables for a method while the SUT is running.
• The variables can be defined as precondition variables or postcondition variables.
– Precondition variables evaluate before a method call
– Postcondition variables evaluate after a method call
• These variables define the user-defined observables.
• A variable can use standard observables and user-defined observables defined earlier.
User-Defined Observables, cont’d
• We allow calling private methods and accessing private fields of any object (see the reflection sketch below).
• In the definitions, the tester can use arbitrary Java code, including blocks of code.
• These variables will be evaluated for each method call.
• We store the values of all observables (standard and user-defined) for each method call.
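Reading a private field for a user-defined observable can be done with standard Java reflection; a minimal sketch (the helper name and the usage example are hypothetical):

    import java.lang.reflect.Field;

    // Minimal sketch of reading a private field, as a user-defined observable
    // might do. The class and field names in the usage comment are hypothetical.
    public class PrivateFieldReader {

        static Object readPrivateField(Object target, String fieldName) throws Exception {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);   // bypass the private access modifier
            return field.get(target);
        }

        // Usage (hypothetical): Object balance = readPrivateField(account, "balance");
    }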
Sample Uses of User-Defined
Observables
• Extracting a field value to save it for later
• Cloning an object to save the whole object for later
• Displaying some values on standard output
• Loops to aggregate or check values of an array (sum, min, max, forall checks, etc.); see the sketch below
• Code to pop open a new GUI frame or dialog
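For example, an aggregate/forall-style check over an array might be written as the following plain-Java helper (the method and parameter names are placeholders for whatever the observable definition would reference):

    // Sketch of aggregate and "forall"-style checks over an array, the kind of
    // code a user-defined observable might contain. Names are placeholders.
    public class ArrayChecks {

        static boolean summarize(double[] values) {
            double sum = 0, min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
            boolean allNonNegative = true;               // "forall" check
            for (double v : values) {
                sum += v;
                min = Math.min(min, v);
                max = Math.max(max, v);
                allNonNegative = allNonNegative && (v >= 0.0);
            }
            System.out.printf("sum=%.2f min=%.2f max=%.2f%n", sum, min, max);
            return allNonNegative;
        }
    }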
Design By Contract (DBC)
• Tester can specify the expected behavior of the method using method contracts.
• We use programmatically specified DBC (Design by Contract).
• Method contracts are made up of preconditions and postconditions for the method (see the plain-Java sketch after this list):
– Preconditions must hold true before a method call
proceeds
– Postconditions must hold true when a method call ends
• Caller is responsible for making preconditions true
• Method itself is responsible for making
postconditions hold true
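A minimal plain-Java sketch of a programmatic contract (the withdraw example and the use of Java assert statements are illustrative assumptions; they stand in for the tool's own contract mechanism):

    // Minimal Design-by-Contract sketch using plain Java assertions
    // (run with assertions enabled: java -ea ...). The example is hypothetical.
    public class Account {
        private int balance;

        public void withdraw(int amount) {
            // Preconditions: the caller must make these true before the call.
            assert amount > 0 : "precondition: amount must be positive";
            assert amount <= balance : "precondition: amount must not exceed balance";

            int oldBalance = balance;
            balance -= amount;

            // Postcondition: the method itself must make this true when it returns.
            assert balance == oldBalance - amount : "postcondition: balance decreased by amount";
        }
    }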
Method Contracts
• Each precondition and postcondition is defined in a Boolean variable.
• These variables are then asserted to always hold true.
• For common tasks, to reduce the typing effort, we use macros that expand to code.
Advantages and Disadvantages of Using
Programmatic Syntax for Specifications
• Advantages of using the target language for
specification:
– Familiar to developers; no new language needs to be
learned
– Specification has unambiguous semantics
– Specification is directly executable
• Potential disadvantages:
– A formal specification language may allow shorter
specifications
• We partly address this problem by using macros
– A formal specification language may allow declarative
syntax
• But this means it won't be directly executable
JML: Java Modeling Language
• Java annotated with a modeling language that is extended Java (see the small example below)
• Declaration of side effects
• Careful avoidance of side-effects in the model
• Model holds more information to check the object's behavior
• Extra model information is stored for each object (extra fields, objects, etc.)
• Comes with a large side-effect-free library of collection classes
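A small, hypothetical example of what JML annotations look like (only a few basic JML keywords are shown):

    // Hypothetical example of JML annotations: preconditions (requires),
    // declared side effects (assignable), and postconditions (ensures,
    // with \old referring to the pre-state value) live in //@ comments.
    public class SavingsAccount {
        private /*@ spec_public @*/ int balance;

        //@ requires amount > 0;
        //@ assignable balance;
        //@ ensures balance == \old(balance) + amount;
        public void deposit(int amount) {
            balance += amount;
        }
    }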
OCL: Object Constraint
Language
• A standard part of UML 2.0, often used in annotations in UML diagrams
• Tools can share such annotations using XML Metadata Interchange (XMI)
• Independent of target language
• Defines basic quantifiers like forall, exists, etc.
• Defines some basic data structures (set, bag, etc.)
UML 2.0 OCL
• OCL now defines a tuple type
• Earlier versions of OCL could not define
multi-layer structured models
• Collections (sets, sequences and bags) are
not automatically flattened anymore
• A separate flatten operation is added
• We can now create a set of sets, and the
result is not just the union of these sets.
Execution Trace Summary
Statistics
• While the SUT is being traced, summary statistics are displayed in real time.
• Any failure (as per method contracts) is highlighted in red.
• If an interaction with the SUT causes some methods to fail, this will be immediately visible.
• This makes pinpointing the cause of a problem easier.
Visualizer
• An execution trace record stores all observables for all calls to selected methods.
• Visualizer displays individual method calls over time as an execution mural.
• A timeline axis displays tick marks, and uses tool tips to interactively display time.
• Method signatures are hyperlinked to method contracts used for this trace record.
Color Spectrum
• We use discriminable colors from a relatively cool color spectrum (blue-green-yellow).
• These colors are used only for method calls that conform to the method's contract.
• Method calls which have any failed assertions are shown in red.
• If a method has any failed calls, it is highlighted with a red cross mark.
Method Call Details
• Tester clicks on one method call rectangle to inspect method call details (observables).
• Recall that observables consist of standard and user-defined variables and references:
– Standard observables: parameters, $this, $that,
$result
– User-defined observables: Precondition and
postcondition variables defined, if any
• Failed conditions and calls are highlighted in red
• The method signature displayed is hyperlinked
to the method contract used
Object Details
• The tester can inspect object details
including private field values
• Superclasses and their field values can
also be inspected.
• The tester can choose to:
– Expand any objects in this window (click on
triangle on the left)
– Inspect the object in a separate window (click
on the object's string representation)
Aggregate Analysis: Statistical
Analysis and Plots
• Standard debuggers would only allow access to a single method call's details.
• We accumulate data from multiple calls to a method of interest.
• We leverage this extra data to the tester's advantage using two techniques:
– Tester can view covariances between all observables, in data collected from all calls
– Tester can select any two observables and view a 2D plot of these two variables
Statistical Analysis
• We display the covariances visually in a
table
• The data used is accumulated from all
calls to this method
• Darker boxes represent stronger
relationship between variables/observables
• Tester can then click on any box to open a
plot of these two variables
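For reference, the quantity behind each box is the sample covariance of two observables over all recorded calls (standard textbook definition; the slides do not spell out the formula):

    \operatorname{cov}(X, Y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})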
Plots
• The points in the plot are hyperlinked back to the method call which provided this data.
• Same color spectrum is used to highlight correct versus failed behavior.
• Tester will often start from such analysis to form an understanding of the method.
• Observed relationships between variables can then be asserted as contract conditions.
Discussion
• Our approach is best for non-fatal, repeatable failures that don't corrupt program internal state in a way that can't be reset easily.
• We avoid recompilation, and ideally, we want to avoid restarting the SUT.
• Fatal errors break the instant-feedback understand-test loop.
• If a failure corrupts internal state in a way that can't be reset easily, our loop is again broken by the program behavior.
• For fatal errors, it is best to wrap the program in an exception/error-catching, resetting outer program. This way, restarting delays can be avoided and feedback delay can be minimized.
Future Work
• Usability tests by graduate students and software engineers will allow us to assess the utility of our tool.
• Program input/output and execution synchronization is very important in understanding how functions are implemented in a program. GUI recording and playback, as well as stream (input/output/file/network) following, will be used for this purpose.
• Controlling so many open windows: We would like to use a categorized, iconized format to easily find and control any previously opened window.
• Integration with a standard IDE/debugger that allows two-way switching would make our work much easier. We would not have to deal with all the tasks debuggers have already implemented.