WuKong: Automatically Detecting and Localizing Bugs that Manifest at Large System Scales
Bowen Zhou
Jonathan Too
Milind Kulkarni
Saurabh Bagchi
Purdue University
Ever Changing Behavior of Software
• Software has to be adaptive to accommodate different platforms, inputs, and configurations.
• As a side effect, the manifestation of a bug may depend on a particular platform, input, or configuration.
Software Development Process
Develop a new feature and its unit tests
Test the new feature on a local machine
Push the feature into production systems
Break production systems
Roll back the feature
Bugs in Production Run
• Properties
– Remain unnoticed when the application is tested on the developer's workstation
– Break the production system when the application is running on a cluster and/or serving real user requests
• Examples
– Configuration Error
– Integer Overflow
Modeling Program Behavior for
Finding Bugs
• Known as statistical debugging [Bronevetsky DSN '10] [Mirgorodskiy SC '06] [Chilimbi ICSE '09] [Liblit PLDI '03]
– Represents program behavior as a set of features that can be measured at runtime
– Builds a model to describe and predict the features based on data collected from many runs
– Detects abnormal features that deviate from the model's prediction beyond a certain threshold
Modeling Scale-dependent Behavior
Is there a bug in one of the production runs?
[Figure: number of times a loop executes (y-axis) plotted against run number (x-axis), for training runs and production runs]
Modeling Scale-dependent Behavior
Accounting for scale makes trends clear and errors at large scales obvious.
[Figure: number of times a loop executes (y-axis) plotted against scale (x-axis), for training runs and production runs]
Modeling Scale-dependent Behavior
• Our Previous Research
– Vrisha [HPDC '11]
• Builds a collective model for all features of a program to detect bugs in any feature
– Abhranta [HotDep '12]
• Tweaks Vrisha's model to allow per-feature bug
detection and localization
Modeling Scale-dependent Behavior
• Big gap in scale
– e.g. training runs on up to 128 nodes, production
runs on 1024 nodes
• Noisy features
– Too many false positives render the model useless
Reconstructing Scale-dependent
Behavior: the WuKong way
• Covers a wide range of program features
• Predicts the expected value in a large-scale
run for each feature separately
• Prunes unpredictable features to improve
localization quality
• Provides a shortlist of suspicious features in its
localization roadmap
The Workflow
[Figure: workflow diagram. Training: the application (APP) runs under PIN for runs 1 through N; each run yields SCALE and FEATURE data, which are used to build the MODEL. Production: the application runs under PIN, and the MODEL takes the run's SCALE to predict the expected FEATURE value.]
Feature Collection
Features considered by WuKong
void foo(int a) {
    if (a > 0) {                /* feature 1 */
    } else {
    }
    if (a > 100) {              /* feature 2 */
        int i = 0;
        while (i < a) {         /* feature 3 */
            if (i % 2 == 0) {   /* feature 4 */
            }
            ++i;
        }
    }
}
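To make the notion of a feature concrete, here is a minimal sketch of how often each labeled conditional or loop in foo() is exercised for a given input. The deck's workflow gathers this data with PIN; the hand-placed counters, the name `foo_counted`, and the choice to count one hit per loop iteration for feature 3 are assumptions made only for illustration.

```c
#include <stddef.h>

enum { NUM_FEATURES = 4 };

/* Hand-instrumented copy of foo(): counts[k - 1] records how many times
 * the conditional or loop labeled k is reached (hypothetical helper). */
void foo_counted(int a, unsigned long counts[NUM_FEATURES]) {
    for (size_t k = 0; k < NUM_FEATURES; ++k)
        counts[k] = 0;

    ++counts[0];                /* feature 1: if (a > 0) */
    if (a > 0) {
    } else {
    }

    ++counts[1];                /* feature 2: if (a > 100) */
    if (a > 100) {
        int i = 0;
        while (i < a) {
            ++counts[2];        /* feature 3: one loop iteration */
            ++counts[3];        /* feature 4: if (i % 2 == 0) */
            if (i % 2 == 0) {
            }
            ++i;
        }
    }
}
```

With this sketch, features 3 and 4 grow with the input `a` while features 1 and 2 stay constant, which is the kind of scale-dependent count the model is meant to predict.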
Modeling
Predict Feature from Scale
• X ~ vector of scale parameters $X_1, \ldots, X_N$
• Y ~ number of times a particular feature occurs
• The model to predict Y from X:
• Compute the prediction error:
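The model and error formulas themselves are not reproduced in this text. As a minimal sketch of the idea only, the code below assumes an ordinary least-squares line in a single scale parameter and a relative prediction error; the model form, the error definition, and the names `ScaleModel`, `fit_scale_model`, `predict_count`, and `prediction_error` are all illustrative assumptions, not WuKong's actual formulation.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed model: count ~ b0 + b1 * scale, for one scale parameter. */
typedef struct {
    double b0;
    double b1;
} ScaleModel;

/* Ordinary least-squares fit over n training runs.
 * Requires n >= 2 and at least two distinct scale values. */
ScaleModel fit_scale_model(const double *scale, const double *count, size_t n) {
    assert(n >= 2);
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sx += scale[i];
        sy += count[i];
        sxx += scale[i] * scale[i];
        sxy += scale[i] * count[i];
    }
    double denom = (double)n * sxx - sx * sx;
    assert(denom != 0.0);
    ScaleModel m;
    m.b1 = ((double)n * sxy - sx * sy) / denom;
    m.b0 = (sy - m.b1 * sx) / (double)n;
    return m;
}

/* Expected feature count at a given scale. */
double predict_count(ScaleModel m, double scale) {
    return m.b0 + m.b1 * scale;
}

/* Assumed error: |observed - predicted| / |predicted| (predicted != 0). */
double prediction_error(double predicted, double observed) {
    double diff = observed - predicted;
    if (diff < 0.0)
        diff = -diff;
    double mag = predicted < 0.0 ? -predicted : predicted;
    return diff / mag;
}
```

Fitting on small training scales and then calling `predict_count` at the production scale gives the expected value that the observed count is compared against.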
Bug Localization
Locate Buggy Features
• First, we need to know whether the production run is buggy, by doing detection as follows:
$E_i > \eta \cdot M_i$
where $E_i$ is the error of feature i in the production run, $\eta$ is a constant parameter, and $M_i$ is the max error of feature i in all training runs
• If there is a bug in this run, we can start looking at
the prediction error of each feature:
– Rank all features by their prediction error to provide a
localization roadmap that contains the top N features
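The two steps above can be sketched as follows. The struct `FeatureError`, the functions `run_is_buggy` and `build_roadmap`, and the decision to flag a run as soon as any single feature exceeds its bound are assumptions for illustration; the slide specifies only the per-feature comparison and the ranking by prediction error.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    int id;            /* feature label */
    double error;      /* prediction error in the production run */
    double max_train;  /* max prediction error over all training runs */
} FeatureError;

/* Returns 1 if some feature's production error exceeds
 * threshold * max_train (threshold is the constant parameter). */
int run_is_buggy(const FeatureError *f, size_t n, double threshold) {
    for (size_t i = 0; i < n; ++i) {
        if (f[i].error > threshold * f[i].max_train)
            return 1;
    }
    return 0;
}

/* qsort comparator: larger prediction error first. */
static int by_error_desc(const void *a, const void *b) {
    const FeatureError *fa = a;
    const FeatureError *fb = b;
    if (fa->error > fb->error) return -1;
    if (fa->error < fb->error) return 1;
    return 0;
}

/* Sorts features by descending error in place and returns how many
 * leading entries form the roadmap (at most top_n). */
size_t build_roadmap(FeatureError *f, size_t n, size_t top_n) {
    qsort(f, n, sizeof *f, by_error_desc);
    return top_n < n ? top_n : n;
}
```

After `build_roadmap` returns, the first entries of the array are the most suspicious features, so a developer would inspect them in order.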
Improve Localization Quality by
Feature Pruning
Noisy Feature Pruning
• Some features cannot be effectively predicted by
the above model
– Random
– Not scale-determined
– Discontinuous
• The trade-off
– Keeping these features would pollute the diagnosis by pushing real faults down the list
– Removing these features could miss some faults if the faults happen to be in such features
Noisy Feature Pruning
• How to remove them?
For each feature:
1. Do a cross validation with training runs
2. Remove the feature if it triggers a prediction error greater than 100% in more than (100 - x)% of training runs
• Parameter x > 0 is for tolerating outliers in training runs
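The removal rule for a single feature can be sketched as below. The sketch assumes the cross-validation errors have already been computed elsewhere and are passed in as relative errors, where 1.0 means 100%; the function name `should_prune` and its parameters are hypothetical.

```c
#include <stddef.h>

/* cv_error[r] is the relative prediction error (1.0 == 100%) obtained for
 * training run r during cross validation. Returns 1 if the feature should
 * be pruned, i.e. its error exceeds 100% in more than (100 - x)% of runs. */
int should_prune(const double *cv_error, size_t n_runs, double x_percent) {
    if (n_runs == 0)
        return 0;  /* no evidence, keep the feature */

    size_t bad = 0;
    for (size_t r = 0; r < n_runs; ++r) {
        if (cv_error[r] > 1.0)
            ++bad;
    }
    double bad_percent = 100.0 * (double)bad / (double)n_runs;
    return bad_percent > 100.0 - x_percent;
}
```

For example, with x = 10 a feature is dropped only if it is mispredicted by more than 100% in over 90% of the training runs, so a feature with a single bad run is kept.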
Evaluation
• Fault injection in Sequoia AMG2006
– Up to 1024 processes
– Randomly selected conditionals to be flipped
• Two case studies
– Integer overflow in an MPI library
– Deadlock in a P2P file sharing application
Fault Injection Study
• Fault
– Injected at process 0
– Randomly pick a feature to flip
• Data
– Training (w/o fault): 110 runs, 8-128 processes
– Production (w/ fault): 100 runs, 1024 processes
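As a rough illustration of what flipping a conditional means, the sketch below inverts the outcome of one branch when a switch is set. The switch `inject_fault`, the helper `maybe_flip`, and the toy function `count_evens` are hypothetical and are not part of AMG2006 or of the study's injection tooling.

```c
/* Hypothetical switch: when nonzero, the wrapped conditional is flipped. */
int inject_fault = 0;

/* Returns the condition unchanged, or negated when the fault is active. */
static int maybe_flip(int cond) {
    return inject_fault ? !cond : cond;
}

/* Counts the even numbers in [0, n); with the fault active it counts
 * the odd numbers instead, so the branch's execution count changes. */
int count_evens(int n) {
    int evens = 0;
    for (int i = 0; i < n; ++i) {
        if (maybe_flip(i % 2 == 0))
            ++evens;
    }
    return evens;
}
```

A flipped branch like this changes how often the guarded code runs, which is the kind of deviation a per-feature count model can pick up.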
Fault Injection Study
• Result
– Total: 100
– Noncrashing: 57
– Detected: 53
– Located: 49
Case Study: A Deadlock in Transmission's DHT Implementation
Feature 53, 66
Conclusion
• Debugging scale-dependent program behavior
is a difficult and important problem
• WuKong incorporates the scale of a run into a predictive model for each individual program feature for accurate bug diagnosis
• We demonstrated the effectiveness of
WuKong through a large-scale fault injection
study and two case studies of real bugs
Q&A
Backup
Runtime Overhead