Debugging Programs by Trace Visualization and Contract Discovery
S. Kanat Bolazar
Friday, 4/7/2006
Dr. Fawcett's Brown Bag Seminar, Syracuse University, Syracuse, NY

Information Theory and Computational Complexity
• Information Theory
• Shannon's Entropy
• Mutual Information
• Kolmogorov Complexity
• Minimum Message Length (MML), Minimum Description Length (MDL)
• Chaitin's Algorithmic Information Theory

Ideal versus Actual Program Comprehension
• Ideal Minimal Program
• Actual Programs
• Complete Comprehension
• Task-Oriented Partial Comprehension
• Debugging and Pointwise Comprehension

My Research: How I Got Here
• Automated Software Reuse
• Automated Testing, Using Contracts
• Real-Life Example: Azureus, a BitTorrent File Sharing Client
• Problems: No Documentation, No Unit Tests, Long Methods, No Comments
• Idea: Merge Comprehension and Testing

Innovative Debugging Techniques
• Reversible Execution, "Omniscient Debugging"
• Delta Debugging
• Slicing
• Dynamic Slicing
• Query-Based Debugging

Complexity: Kilo-Lines-Of-Code
• Animal Farm: 144 pp., about 5,000 lines total; 5 KL (kilo-lines)
• The Da Vinci Code: 454 pp., about 18,000 lines total; 18 KL
• Statistics from 2001 for some software:
  – Apache: 80 KLOC (kilo-lines of code)
  – Linux: 800 KLOC
  – Mozilla: 2,100 KLOC (2.1 million lines)
• A book is a linear presentation
• Software is a highly interconnected entity

Debugging: Steps
1. From the program failure, find the defect (fault origin) that caused the failure.
  – This is usually the hardest part
  – Mostly due to the complexity of the code
2. Remove the defect.
  – Essentially patch-up programming (ideally, a patch that will last) or reprogramming.
  – Cannot be automated; requires the creative input of a programmer

Expected vs. Actual Behavior
• Specifications
  – Formal vs. informal
  – Requirements must be documented
• Hierarchy of expectations:
  – What is a step (how) for a goal becomes a target (what) for a subgoal.
• Unit Tests
  – Test each part of the program individually, before putting pieces together.
  – Oracles
    • Someone who states what the "expected behavior" is.
    • Active oracles predict; passive oracles verify
  – Automated unit tests

Promising Approaches for Automatic Fault Localization
• Preventive measures:
  – Formal requirements
  – Unit tests
• Automated fault localization:
  – Reversible / omniscient debugging
  – Delta debugging
  – Software visualization

Reversible / Omniscient Debugging
• Motivation
  – Debugging: Lost in source code space and time, taking tiny steps.
• Definition
  – Can run the program in reverse, change the past, and play forward.
• Advantages
  – Binary search, natural, helps pinpoint minimal failure-causing behavior.
• Issues
  – Full state information must be saved.
  – We can't revert the environment (file deleted, etc.).

Delta Debugging
• Motivation
  – Find the minimal (change in) behavior that causes the failure to appear.
• Definition
  – Given a passing run, a failing run, and an automated passive oracle, automatically discover the fault origin by trial and error.
• Advantages
  – Can be used to simplify error reports.
  – Quite time-efficient, and automated.

Delta Debugging, cont'd
• Requirements
  – Test automation (an oracle) is needed
  – There must be an automated way to run the program with different inputs (data file, emulated keyboard and mouse inputs, etc.) without human intervention.
• Issues
  – For most programs, pinpointing the cause is harder, but localizing the failure origin is possible, and advantageous.
  – May not be feasible for larger programs; automated unit tests may be needed.
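To make the delta debugging idea above concrete, here is a simplified sketch in the spirit of Zeller's ddmin, not the full algorithm and not the tool described in these slides: it repeatedly removes chunks of a failure-inducing input as long as an automated passive oracle still reports the failure. The Oracle interface and the failsOn method are names invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical passive oracle: returns true if the SUT fails on the given input.
interface Oracle {
    boolean failsOn(List<String> input);
}

public class DeltaDebug {

    // Simplified delta debugging: shrink a failure-inducing input by removing
    // chunks while the oracle still reports the failure.
    public static List<String> minimize(List<String> failingInput, Oracle oracle) {
        List<String> current = new ArrayList<>(failingInput);
        int chunkSize = Math.max(1, current.size() / 2);

        while (chunkSize >= 1) {
            boolean removedSomething = false;
            for (int start = 0; start + chunkSize <= current.size(); start += chunkSize) {
                // Candidate input with one chunk removed.
                List<String> candidate = new ArrayList<>(current.subList(0, start));
                candidate.addAll(current.subList(start + chunkSize, current.size()));

                if (!candidate.isEmpty() && oracle.failsOn(candidate)) {
                    current = candidate;      // smaller input still fails: keep it
                    removedSomething = true;
                    start -= chunkSize;       // re-test the same position
                }
            }
            if (!removedSomething) {
                chunkSize /= 2;               // refine the granularity
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // Toy oracle: the "program" fails whenever the input contains "BUG".
        Oracle oracle = input -> input.contains("BUG");
        List<String> failing = List.of("a", "b", "BUG", "c", "d", "e");
        System.out.println(DeltaDebug.minimize(failing, oracle));  // prints [BUG]
    }
}
```

In practice the oracle would run the SUT on each candidate input and check the observed behavior, which is why the slides note that fully automated test execution is a prerequisite for delta debugging.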
Software Visualization
• Motivation
  – The program may not be understood, or worse, may be misunderstood
  – One can't debug what one can't understand
• Definition
  – Program understanding
    • Static approaches
    • Dynamic approaches
  – Algorithm animation
• Advantages
  – Usable power output
  – The "big picture" perspective
  – You may even find errors you were not looking for!
  – Understanding the program better will be helpful in the long term as well.

Software Visualization, cont'd
• Requirements
  – Automated data gathering
  – A visual exploration tool
• Issues
  – Cognitive economy
  – Richness / simplicity
    • Encoding information in hue, brightness, and contrast
    • Encoding information in every pixel
  – Not fully automated; the human is the pattern detector
  – Ideally, should allow fast set-up and quick feedback

The Making of a Legacy System
• Software evolves over time due to changes in the environment, customer needs, etc.
• The original authors of a program may have moved on to other projects or another company
• Architecture, design, and code-level documentation may be missing
• The developer/maintainer does not fully understand the original design.

The Making of a Legacy System, cont'd
• Deadlines and time pressure often do not leave the developer enough time to align changes with the original design.
• Over time, changes in directions not envisaged by the original architects accumulate.
• The software becomes inconsistent and incoherent.
• We call such a system a "legacy system".

The Legacy System
• Runs, possibly with some small problems
• Cannot be understood due to conflicting design decisions implicit in accumulated changes
• Is often brittle/fragile/unstable
  – Code is replete with strong components (cycles of dependency A --> B --> A, etc.)
  – Some bugs are resolved away from their origin, making both components buggy in isolation
  – Small changes may produce quite unexpected erroneous behavior (chaotic behavior)
• Cannot be properly maintained

Software Maintenance and Code Reuse
• Maintenance and code reuse require that the developer understands the program
• Adaptive maintenance: Adapt to new environments, versions of libraries used, hardware, etc.
• Perfective maintenance: Add new features
• Corrective maintenance: Debug; remove defects that cause (fatal or non-fatal) failures
• Reuse: Use the components and libraries developed as they are, in other projects

Program Comprehension Theories
• Most comprehension theories are arrived at by empirical observation of programmers
• Top-down comprehension: From overall program functionality, go down to the code
• Bottom-up comprehension: From the code, read and build up understanding, going up
• Opportunistic: Switch between these two as needed, in a single comprehension session
• Integrated: A combined model integrates the earlier models

Program Comprehension Theory Dictionary
• Chunking: Grouping code and other "chunks" into larger understood blocks
• Plan: Patterns and templates (with slots) of code; code structure decisions
• Rules of discourse: Coding practices; rules such as "no duplicates; no unused code"
• Beacon: A cue or a clue that a certain plan may be used in the code (e.g., a swap suggests a sort)
• Strategy: The approach the developer takes in trying to understand the program.

Program Comprehension Theory Dictionary, cont'd
• Hypotheses/Conjectures: The developer's guesses about the program and its parts.
  – New evidence may require that a hypothesis be accepted, changed, or rejected.
• Cross-referencing and annotations: Mappings between features or specifications and source code.
Top-Down Comprehension
• Separate theories by Brooks and by Soloway, Adelson, and Ehrlich
• Assumes an expert developer, familiar with the application domain, tools, and coding practices
• If not an expert (in the field or in the language), the same developer may instead prefer a bottom-up strategy.
• Experiments show that expert programmers are:
  – More flexible: Faulty hypotheses are refined or discarded more quickly
  – As inefficient as novice programmers on "unplan-like" code (beacons mislead)

Brooks' (Top-Down) Theory
• A large tree of hypotheses is first formed (top-down), then validated (bottom-up).
• Brooks theorizes that the developer:
  – Knows the application domain (vertical), intermediate domains, and the language domain
  – Maps the problem through these domains, from functionality to code
  – Forms hypotheses and subhypotheses from the top of the program, down
    • By skimming the names and code of the modules, classes, methods, fields, etc.
  – Validates these hypotheses from the bottom (code level) up

Bottom-Up Comprehension
• Separate theories by Shneiderman and Mayer, and by Pennington
• The developer starts from the code, chunking it into meaningful blocks, up to the top level
• Hypotheses are formed, starting at the code level
• Pennington model:
  – Program model: Control-flow abstraction of the program; program structure
  – Situation model: Program semantics; mapping to the application domain
  – The program model is often developed before the situation model

Letovsky's Model: Opportunistic Comprehension Strategies
• Letovsky encouraged developers to "think aloud" while performing debugging tasks
• He later analyzed the collected records and found that the developers:
  – Often switch between top-down and bottom-up strategies, as needed
  – Ask "what", "why", and "how" questions while understanding the program
  – Use a mental annotation that maps specifications to implementation

1980s, 1990s, and Now
• Most program comprehension theories were formed in the 1980s
• Tests validated individual theories, but the theories were not comparatively evaluated
• Languages, program sizes, and programmer skill levels and numbers would be different today
• Von Mayrhauser and Vans, in 1994:
  – Conducted a detailed survey as part of Vans' PhD work
  – Integrated the top-down and bottom-up (Pennington) models
  – Performed large-scale tests with industry partners to validate their theory

Von Mayrhauser & Vans Model
• The von Mayrhauser & Vans integrated metamodel of program comprehension consists of:
  – Knowledge base: Program model, situation model, and top-down structures
  – Program model process (bottom-up, by chunking)
  – Situation model process (bottom-up, by chunking)
  – Top-down comprehension process (using schemata, plans)

What, Why, and How
• These three questions about a piece of code are interconnected.
• Example: Method similarity() performs its task by calling method dotProduct().
  – What does dotProduct() do? It probably calculates the dot product of two vectors.
  – Why does dotProduct() exist? Because it is needed by similarity().
  – How does similarity() work? It uses the dot product as a similarity measure.

What, Why, and How: Hypotheses
• Note that these are merely hypotheses that need testing.
• "Why" questions refer to callers; without callers, a method need not be in the program.
• "How" questions refer to the code and the methods called; called methods must be somewhat understood.
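As a concrete rendering of the similarity()/dotProduct() example above, here is a hypothetical implementation. The slides only say that the dot product is used as a similarity measure; the normalized form (cosine similarity) shown below is one plausible choice, not the program the slides refer to.

```java
public class VectorMath {

    // "What" does dotProduct do? It computes the dot product of two vectors.
    // "Why" does it exist? Because similarity() below needs it.
    static double dotProduct(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // "How" does similarity work? It uses the (normalized) dot product
    // as a similarity measure between the two vectors.
    static double similarity(double[] a, double[] b) {
        double norms = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
        return norms == 0.0 ? 0.0 : dotProduct(a, b) / norms;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 0.0};
        double[] y = {1.0, 1.0};
        System.out.println(similarity(x, y)); // ~0.707
    }
}
```

Note how the "why" hypothesis about dotProduct() depends on its caller similarity(), while the "how" hypothesis about similarity() depends on the called method; this is exactly the caller/callee interdependence the next slides discuss.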
Why We Need Partial Understanding, Hypotheses, Top-Down & Bottom-Up Analysis
• Without understanding all callers and all called methods, we can't answer these three questions fully.
• Stability in program comprehensibility:
  – Confidence in hypotheses about a method should not require a higher confidence level for all called and caller methods
  – Complete understanding of one method should be achievable through partial understanding of the caller(s) and the called methods
• Comprehension must be partial, by hypotheses, going both top-down and bottom-up

Common Maintenance Problems and Suggested Solutions

System Under Test, Inputs, Outputs
• Possible inputs to a tested program (system under test, SUT):
  – GUI (mouse) and text (keyboard) inputs to the program
  – Configuration files, other files, network connections, other input streams, other OS state
• Possible outputs from the SUT:
  – Displayed images, icons, diagrams, text, 2-D color-coded information murals
  – File and network outputs, other output streams
  – Other OS actions, possibly changing OS state
• SUT internal state can be considered an input and an output as well.

Debugging a Program
• A maintainer (tester) may have to debug a program (system under test, SUT).
• Often, debugging tools are used to extract more internal state information from the SUT.
• We'll define five potential problems in detail, and suggest guidelines for solutions.

Five Potential Problems
1. The tester may have a faulty, incomplete, or no mental model of the SUT
2. Delay in feedback causes delay in task completion
3. Comprehension is limited to the usable information in the output of the SUT and the tools used
4. Setting up and interacting with the SUT and the testing tools used may take too much time
5. In a large program, mapping features to code takes time.

1. Incomplete / Faulty Mental Model
Problem: The tester may have a faulty, incomplete, or no mental model of the SUT
  – The tester is often not the original developer.
  – The original developer may even have left the company.
  – The tester may lack application domain (vertical domain) information.
Solution: Combine program comprehension and testing

2. Feedback Delay Causes Delay in Task Completion
Problem: Delay in feedback causes delay in task completion
  – Delay between writing and running the tests allows faulty assumptions to live longer.
  – Delay between reading the source code and seeing its dynamic behavior hinders comprehension.
Solution: Minimize feedback delay, in both testing and comprehension

3. Comprehension Is Limited to Usable Information Output
Problem: Comprehension is limited to the usable information in the output of the SUT and the tools used
  – The output of the SUT may be limited, making understanding the SUT harder.
  – Output already predictable by the tester's mental model does not help evolve this model. (In information theory, this would be called "mutual information".)
  – Tools used can extract and display more SUT information (the internal state of the SUT).

3. Comprehension Is Limited to Usable Information Output, cont'd
  – Some types of output are better for certain tasks:
    • Tables of numbers can be much more precise and detailed
    • Plots and graphs are easier to understand than tables
    • Graphical descriptions make large-scale patterns apparent
Solution: Maximize usable output; use visualization where possible

4. Inputs (to SUT and Tools) May Take Too Much Time
Problem: Setting up and interacting with the SUT and the testing tools used may take too much time
  – Some programs require a lot of user input before reaching a failure state.
  – Debugging tools often require very small steps, and take a long time to reach a failure.
  – Delay and input complexity cause misrepresentations in the tester's mental model of the SUT.

4. Inputs (to SUT and Tools) May Take Too Much Time, cont'd
  – Time spent also delays completion of the overall task.
  – Automated unit testing tools such as JUnit require that tests be written.
  – Once such tests are written, automation greatly improves efficiency.
Solution: Minimize required input from the user; allow automation

5. Mapping Features to Code May Take Time
Problem: In a large program, mapping features to code takes time.
  – The feature-responsibility matrix is often not explicitly documented; the tester may not know it.
  – The source code for the SUT may not be well documented.
  – Features (dynamic behavior, functionality) are visible only when the SUT is run.
  – Often, parts of a deployed SUT cannot be run and tested easily in isolation.
Solution: Connect functionality to code; help the tester locate features

Guiding Principles, Goals, and Design Decisions
• The solutions suggested for these five problems become the goals of our system.
• Here we list these goals again, and describe how we attempt to achieve them.
• Both the problems and the goals are listed in the order in which we prioritize these issues.
• Goal 1: Combine Program Comprehension and Testing
  – Create an understand-test loop
  – Use this loop to incrementally improve understanding and confidence in accuracy

Guiding Principles, Goals, and Design Decisions, cont'd
• Goal 2: Minimize Feedback Delay (in Comprehension and Testing)
  – The loop must execute without the need for recompilation
  – Allow specifying and evolving hypotheses about the program during a run
  – Allow these hypotheses to specify and execute tests without the need for recompilation
• Goal 3: Maximize Usable Output
  – Use visualization where possible; increase the information content and availability
  – Allow drilling down from a high-level view to individual method calls
  – Allow display of aggregate behavior (of all calls for one method), visually
  – Allow statistical analysis of this aggregate behavior

Guiding Principles, Goals, and Design Decisions, cont'd
• Goal 4: Minimize Required Input; Allow Automation
  – Allow interactive annotation of the program while it is running
  – The annotated code will run for each call of the annotated method
  – Such annotations can be used to specify the expected behavior (contracts) of the method
• Goal 5: Connect Functionality to Code; Help Tester Locate Features
  – Discover the methods central to one feature by tracing while using that feature
  – Synchronously display program I/O and executing methods (and method call I/O)

Our System

Tracing Sessions
• The system under test (SUT) is run inside our prototype environment.
• First, the tester selects the methods to trace.
  – To understand a class, all its methods may be chosen.
  – Often, the focus is on a few methods related by functionality.
• The tester interacts with the program to elicit the erroneous behavior (non-fatal failure)
• The tester stops the tracing and views the trace to form an understanding and hypotheses.

Tracing Sessions, cont'd
• The tester may repeat these steps without stopping the SUT.
• Sessions are recorded in separate execution trace records, which remain viewable.

Trace Configurations and Contracts
• The selected methods to trace constitute a configuration.
• A new configuration can be created by cloning any existing configuration.
• Configurations can be modified.
• When a configuration is used in a test, a frozen instance called a contract is saved.
• Contracts contain methods to trace, methods not to trace, and method contracts.
• We'll see more about method contracts later.

Tool Used: AspectJ Aspect-Oriented Programming Framework
• AspectJ is an Aspect-Oriented Programming (AOP) framework
• AOP allows cross-cutting concerns such as logging to be implemented separately.
• This is done by intercepting interesting events and executing some code
• Interesting events may be method calls, thrown exceptions, or variable access/change
• In AspectJ, pointcuts define which important events will be intercepted

Interception With Method Call Pointcuts
• AspectJ can do load-time weaving of code into Java bytecode.
• This means we can intercept the methods of any library for which we have no source code
• This is not black-box testing of methods; intra-library method calls can be caught
• Example: While intercepting library methods f and g, we can also catch when f calls g.
• Note: This is optional and can be specified in an AspectJ pointcut as desired

Standard Method Call Observables
• Method parameters
• The "this" reference
• The value returned (null if void is the return type)
• A "that" reference: refers to the object that called this method
  – Example: If str.substring(2) calls the DefaultLocale.getEncoding() method:
    • the getEncoding method call has this = DefaultLocale and that = str

User-Defined Observables
• The tester can define variables for a method while the SUT is running
• The variables can be defined as precondition variables or postcondition variables
  – Precondition variables evaluate before a method call
  – Postcondition variables evaluate after a method call
• These variables define the user-defined observables
• A variable can use standard observables and user-defined observables defined earlier

User-Defined Observables, cont'd
• We allow calling private methods and accessing private fields of any object.
• In the definitions, the tester can use arbitrary Java code, including blocks of code.
• These variables will be evaluated for each method call
• We store the values of all observables (standard and user-defined) for each method call
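For readers unfamiliar with AspectJ, the following is a minimal sketch of the kind of tracing aspect described above: an around advice intercepts calls to the methods of interest and records the standard observables (parameters, this, that, and the returned value). The pointcut pattern and the package name com.example.sut are placeholders for this sketch, not the pointcuts used by the actual prototype.

```java
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

// A minimal tracing aspect (annotation style).
@Aspect
public class TracingAspect {

    // Intercept calls to every public method in the (hypothetical) package
    // under test; with load-time weaving this also catches intra-library calls.
    @Around("call(public * com.example.sut..*.*(..))")
    public Object trace(ProceedingJoinPoint jp) throws Throwable {
        Object[] args = jp.getArgs();      // method parameters
        Object caller = jp.getThis();      // the "that" reference (the calling object)
        Object callee = jp.getTarget();    // the "this" reference (the receiver)

        Object result = jp.proceed();      // run the intercepted call

        System.out.println(jp.getSignature()
                + " args=" + java.util.Arrays.toString(args)
                + " this=" + callee
                + " that=" + caller
                + " result=" + result);    // null when the return type is void
        return result;
    }
}
```

At a call join point, AspectJ's getThis() is the object whose code makes the call and getTarget() is the receiver, which is how the "this"/"that" observables of the previous slides can be recovered.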
Sample Uses of User-Defined Observables
• Extracting a field value to save it for later
• Cloning an object to save the whole object for later
• Displaying some values on standard output
• Loops to aggregate or check values of an array (sum, min, max, forall checks, etc.)
• Code to pop open a new GUI frame or dialog

Design by Contract (DBC)
• The tester can specify the expected behavior of the method using method contracts
• We use programmatically specified DBC (Design by Contract)
• Method contracts are made up of preconditions and postconditions for the method:
  – Preconditions must hold true before a method call proceeds
  – Postconditions must hold true when a method call ends
• The caller is responsible for making the preconditions true
• The method itself is responsible for making the postconditions hold true

Method Contracts
• Each precondition and postcondition is defined in a Boolean variable
• These variables are then asserted to always hold true
• For common tasks, to reduce the typing effort, we use macros that expand to code

Advantages and Disadvantages of Using Programmatic Syntax for Specifications
• Advantages of using the target language for specification:
  – Familiar to developers; no new language needs to be learned
  – The specification has unambiguous semantics
  – The specification is directly executable
• Potential disadvantages:
  – A formal specification language may allow shorter specifications
    • We partly address this problem by using macros
  – A formal specification language may allow declarative syntax
    • But this means it won't be directly executable

JML: Java Modeling Language
• Java annotated with a modeling language that is extended Java
• Declaration of side effects
• Careful avoidance of side effects in the model
• The model holds more information to check the object's behavior
• Extra model information is stored for each object (extra fields, objects, etc.)
• Comes with a large side-effect-free library of collection classes

OCL: Object Constraint Language
• A standard part of UML 2.0, often used in annotations in UML diagrams
• Tools can share such annotations using XML Metadata Interchange (XMI)
• Independent of the target language
• Defines basic quantifiers such as forall, exists, etc.
• Defines some basic data structures (set, bag, etc.)

UML 2.0 OCL
• OCL now defines a tuple type
• Earlier versions of OCL could not define multi-layer structured models
• Collections (sets, sequences, and bags) are no longer automatically flattened
• A separate flatten operation has been added
• We can now create a set of sets, and the result is not just the union of these sets.

Execution Trace Summary Statistics
• While the SUT is being traced, summary statistics are displayed in real time.
• Any failure (as per the method contracts) is highlighted in red.
• If an interaction with the SUT causes some methods to fail, this will be immediately visible.
• This makes pinpointing the cause of a problem easier.

Visualizer
• An execution trace record stores all observables for all calls to the selected methods
• The visualizer displays individual method calls over time as an execution mural.
• A timeline axis displays tick marks, and uses tool tips to interactively display time.
• Method signatures are hyperlinked to the method contracts used for this trace record.
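The "Method Contracts" slide above describes preconditions and postconditions that are computed into Boolean variables and then asserted. The following is a minimal hand-written sketch of that style for a hypothetical method; the macro layer the slides mention is omitted, and the names are invented for illustration.

```java
public class ContractExample {

    // Hypothetical method under contract: integer square root.
    static int isqrt(int x) {
        // Precondition variable, evaluated before the body runs.
        boolean preNonNegative = (x >= 0);
        assert preNonNegative : "precondition failed: x must be non-negative";

        int r = (int) Math.floor(Math.sqrt(x));

        // Postcondition variables, evaluated after the result is computed.
        boolean postLowerBound = ((long) r * r <= x);
        boolean postUpperBound = ((long) (r + 1) * (r + 1) > x);
        assert postLowerBound && postUpperBound : "postcondition failed for x=" + x;

        return r;
    }

    public static void main(String[] args) {
        // Run with "java -ea ContractExample" so that assertions are enabled.
        System.out.println(isqrt(10)); // 3
    }
}
```

For comparison, JML would express the same contract declaratively with //@ requires and //@ ensures clauses, which is the shorter but non-executable declarative style the previous slides weigh against the programmatic one.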
Color Spectrum
• We use discriminable colors from a relatively cool color spectrum (blue-green-yellow)
• These colors are used only for method calls that conform to the method's contract.
• Method calls that have any failed assertions are shown in red.
• If a method has any failed calls, it is highlighted with a red cross mark.

Method Call Details
• The tester clicks on one method call rectangle to inspect the method call details (observables).
• Recall that observables consist of standard and user-defined variables and references:
  – Standard observables: parameters, $this, $that, $result
  – User-defined observables: precondition and postcondition variables defined, if any
• Failed conditions and calls are highlighted in red
• The method signature displayed is hyperlinked to the method contract used

Object Details
• The tester can inspect object details, including private field values
• Superclasses and their field values can also be inspected.
• The tester can choose to:
  – Expand any object in this window (click on the triangle on the left)
  – Inspect the object in a separate window (click on the object's string representation)

Aggregate Analysis: Statistical Analysis and Plots
• Standard debuggers would only allow access to a single method call's details
• We accumulate data from multiple calls to a method of interest
• We leverage this extra data to the tester's advantage using two techniques:
  – The tester can view covariances between all observables, in the data collected from all calls
  – The tester can select any two observables and view a 2-D plot of these two variables

Statistical Analysis
• We display the covariances visually in a table
• The data used is accumulated from all calls to this method
• Darker boxes represent a stronger relationship between variables/observables
• The tester can then click on any box to open a plot of these two variables

Plots
• The points in the plot are hyperlinked back to the method call that provided the data
• The same color spectrum is used to highlight correct versus failed behavior
• The tester will often start from such analysis to form an understanding of the method
• Observed relationships between variables can then be asserted as contract conditions

Discussion
• Our approach is best for non-fatal, repeatable failures that don't corrupt program internal state in a way that can't be reset easily.
• We avoid recompilation, and ideally, we want to avoid restarting the SUT.
• Fatal errors break the instant-feedback understand-test loop.
• If a failure corrupts internal state in a way that can't be reset easily, our loop is again broken, this time by the program's behavior.
• For fatal errors, it is best to wrap the program in an exception/error-catching, resetting outer program.
• This way, restarting delays can be avoided and feedback delay can be minimized.

Future Work
• Usability tests by graduate students and software engineers will allow us to assess the utility of our tool
• Program input/output and execution synchronization is very important in understanding how functions are implemented in a program.
• GUI recording and playback as well as stream (input/output/file/network) following will be used for this purpose.
• Controlling so many open windows: We would like to use a categorized, iconized format to easily find and control any previously opened window.
• Integration with a standard IDE/debugger that allows two-way switching would make our work much easier. We would not have to deal with all the tasks debuggers have already implemented.
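As a rough illustration of the aggregate analysis described in the "Aggregate Analysis" and "Statistical Analysis" slides above, the sketch below shows how numeric observables recorded over many calls could be turned into a covariance table of the kind the visualizer shades; the class and the toy data are hypothetical, not the prototype's code.

```java
public class CovarianceTable {

    // rows = one method call each, columns = numeric observables for that call.
    static double[][] covariance(double[][] samples) {
        int n = samples.length;          // number of recorded calls
        int d = samples[0].length;       // number of observables

        // Mean of each observable over all calls.
        double[] mean = new double[d];
        for (double[] call : samples)
            for (int j = 0; j < d; j++)
                mean[j] += call[j] / n;

        // Sample covariance between every pair of observables.
        double[][] cov = new double[d][d];
        for (double[] call : samples)
            for (int i = 0; i < d; i++)
                for (int j = 0; j < d; j++)
                    cov[i][j] += (call[i] - mean[i]) * (call[j] - mean[j]) / (n - 1);
        return cov;
    }

    public static void main(String[] args) {
        // Two observables recorded over four calls (e.g., a parameter and the result).
        double[][] calls = { {1, 2}, {2, 4}, {3, 6}, {4, 8} };
        double[][] cov = covariance(calls);
        System.out.printf("var(x)=%.2f cov(x,y)=%.2f var(y)=%.2f%n",
                cov[0][0], cov[0][1], cov[1][1]);
    }
}
```

A strong covariance such as the one in this toy data (the result is exactly twice the parameter) is the kind of observed relationship that, per the "Plots" slide, the tester might then assert as a contract condition.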