Deep Learning for
Program Repair
Aditya Kanade
Indian Institute of Science
Dagstuhl Seminar on Automated Program Repair, January 2017
A Probabilistic Perspective on Program Repair
• Specification = A set of examples
= { (x, y) | x is a faulty program and y is its fixed version }
• Represent the faulty and fixed programs as sequences of tokens
x = x1, x2, …, xn and y = y1, y2, …, yk
• Learn a conditional probability distribution P(Y = y | X = x)
• Repair procedure: To fix an unseen program x, generate a y such that
y = argmax_{y'} P(Y = y' | X = x)
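As a toy illustration of this repair procedure, the sketch below hand-builds a tiny conditional distribution P(y | x) over token sequences and returns the argmax fix. In DeepFix this distribution is learned by a seq2seq network; the lookup table, tokens, and probabilities here are invented for illustration only.

```python
# Toy illustration of repair as conditional sequence prediction.
# The "model" below is a hand-built lookup table giving P(y | x) for a
# single faulty token sequence; a real system learns this distribution
# with a neural network. All tokens and probabilities are invented.

def repair(x, model):
    """Return the fix y maximizing P(Y = y | X = x)."""
    candidates = model.get(tuple(x), {})
    if not candidates:
        return list(x)  # no known fix: leave the program unchanged
    best = max(candidates, key=candidates.get)
    return list(best)

# Hypothetical learned distribution over fixes for one faulty statement
# (a missing semicolon).
model = {
    ("int", "x", "=", "0"): {
        ("int", "x", "=", "0", ";"): 0.9,
        ("int", "x", "=", "0"): 0.1,
    },
}

print(repair(["int", "x", "=", "0"], model))
# → ['int', 'x', '=', '0', ';']
```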
Seq2seq Neural Net with Attention Mechanism
[Figure: seq2seq encoder-decoder; the attention mechanism lets the decoder focus on different parts of the input sequence, instead of compressing it into a single fixed-size context vector.]
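A minimal sketch of the attention computation, using plain dot-product scoring over made-up hidden states; the actual model uses learned attention parameters inside a GRU-based seq2seq network.

```python
import math

# Minimal sketch of attention: weight each encoder hidden state by its
# (dot-product) similarity to the decoder state, then form a weighted
# context vector. The vectors below are invented for illustration.

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attend(decoder_state, encoder_states):
    """Return the context vector and attention weights for one decoding step."""
    scores = [sum(a * b for a, b in zip(decoder_state, h))
              for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states))
               for d in range(dim)]
    return context, weights

encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = attend([1.0, 0.0], encoder_states)
print(weights)  # the first and third states get the most weight
```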
Neural Network Architecture
• Encoder
• Decoder
DeepFix: Fixing Common Programming Errors
• Common programming errors: Unlike logical errors, these are not
specific to the programming task at hand, but relate to the overall
syntax and structure of the programming language
• Analogous to grammatical errors in natural languages.
• Both experienced developers and novice students make them
• Detected by compilers as syntax errors, but compilers neither
localize them accurately nor fix them
Example
Input faulty program with attention weights
Program as repaired by DeepFix
The Iterative Repair Strategy of DeepFix
An end-to-end deep learning solution where the neural net
learns to both localize and fix the faults.
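The iterative strategy can be sketched as a loop: predict a fix, re-check with the compiler, and repeat until the program compiles or no new fix is proposed. The helpers `compiles` and `predict_fix` below are hypothetical stand-ins for the compiler check and the seq2seq prediction; the toy demo simply appends missing semicolons.

```python
# Sketch of an iterative repair loop, under assumed helpers:
# `compiles(program)` and `predict_fix(model, program)` are hypothetical
# stand-ins for a compiler check and the network's fix prediction.

def iterative_repair(program, model, compiles, predict_fix, max_iters=5):
    """Apply predicted fixes one at a time until the program compiles,
    the model proposes nothing new, or the iteration budget runs out."""
    for _ in range(max_iters):
        if compiles(program):
            return program, True          # fully fixed
        fixed = predict_fix(model, program)
        if fixed == program:              # no new fix proposed
            break
        program = fixed                   # accept the fix, try again
    return program, compiles(program)

# Toy demo: each "fix" appends one missing semicolon.
compiles = lambda p: all(line.endswith(";") for line in p)

def predict_fix(model, p):
    for i, line in enumerate(p):
        if not line.endswith(";"):
            return p[:i] + [line + ";"] + p[i + 1:]
    return p

print(iterative_repair(["int x = 0", "x = 1"], None, compiles, predict_fix))
# → (['int x = 0;', 'x = 1;'], True)
```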
Experimental Setup
• Identifiers (variable and method names) are canonicalized
• Resulting vocabulary size is 129, tokens embedded in 50-dimensional vectors
• Seq2seq w/ attention network implemented in TensorFlow
• Both encoder and decoder are 4 stacked GRU layers with 300 cells per layer
• Dropout at the rate of 0.2 on non-recurrent connections
• Training configuration
  • Trained using ~150K examples created synthetically
  • Stochastic gradient descent using the Adam optimizer with mini-batches of 128
  • Gradients clipped to [-1, 1]
  • Training time of 1-2 hours per epoch
  • Select the model with peak validation performance over 20 epochs
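The identifier canonicalization mentioned above can be sketched as follows. This is a minimal illustration on a flat token list with a tiny keyword subset; DeepFix's actual tokenizer handles full C.

```python
import re

# Minimal sketch of identifier canonicalization: each distinct
# identifier is renamed ID1, ID2, ... in order of first occurrence,
# shrinking the vocabulary the network must learn. Keywords and
# library names are kept as-is (a tiny illustrative subset here).
KEYWORDS = {"int", "return", "printf", "main", "if", "else", "for", "while"}

def canonicalize(tokens):
    mapping = {}
    out = []
    for tok in tokens:
        if re.fullmatch(r"[A-Za-z_]\w*", tok) and tok not in KEYWORDS:
            if tok not in mapping:
                mapping[tok] = f"ID{len(mapping) + 1}"
            out.append(mapping[tok])
        else:
            out.append(tok)
    return out, mapping

tokens = ["int", "total", "=", "count", "+", "total", ";"]
print(canonicalize(tokens)[0])
# → ['int', 'ID1', '=', 'ID2', '+', 'ID1', ';']
```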
Results
• Evaluated on 6971 faulty C programs submitted for 93 tasks in an
introductory programming course
  • Fixed 27% of the programs completely, so that they compile without errors
  • Partially fixed an additional 19% of the programs
• Evaluated on 9230 C programs with seeded faults
  • Fixed 56% of the programs completely
  • Partially fixed an additional 17% of the programs
• Localization accuracy on seeded faults: Top-1 = 79%, Top-5 = 89%
• Types of errors fixed: missing delimiters, extraneous symbols, swapped
symbols, missing variable declarations, etc.
Some Plots
Distribution of fixed programs by tasks
PCA projections of vector representations of
correct (circles) and faulty (triangles) programs
Lightweight Machine Learning
Statistical Correlation to Search for Likely Fixes
• Given a test suite and a potential faulty location, how to search for
expressions that are likely to appear in the fixed version?
• Use symbolic execution to obtain desired expression values
• Enumerate a set of expressions and their values on all executions
• Rank expressions using statistical correlation with the desired values
• Used for synthesis of repair hints in MintHint [ICSE’14]
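The ranking step can be sketched with Pearson correlation, as below. The candidate expressions, their per-execution values, and the desired values are invented for illustration; in MintHint the desired values come from symbolic execution over the test suite.

```python
import math

# Sketch of ranking candidate repair expressions by statistical
# correlation with the desired values. All data below is made up:
# a real system derives desired values via symbolic execution and
# enumerates candidate expressions from the program's vocabulary.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def rank_expressions(candidates, desired):
    """candidates: {expression text: its values on each test execution}.
    Return expressions sorted by |correlation| with the desired values."""
    scored = [(abs(pearson(vals, desired)), expr)
              for expr, vals in candidates.items()]
    return [expr for _, expr in sorted(scored, reverse=True)]

desired = [2, 4, 6, 8]                  # values the fixed expression should take
candidates = {
    "x * x": [1, 4, 9, 16],             # strongly but not perfectly correlated
    "2 * x": [2, 4, 6, 8],              # exact match
    "x % 3": [1, 2, 0, 1],              # weakly related
}
print(rank_expressions(candidates, desired)[0])  # → 2 * x
```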
Clustering Programs for Large-Scale Feedback
Generation on Student Code
[Figure: student programs (shown as question marks) are clustered by solution strategy; the instructor certifies one submission from each cluster as correct, and verified feedback is then generated for the remaining programs in each cluster.]
Used in CoderAssist [FSE’16] to generate feedback on student programs
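The workflow above can be sketched as: cluster submissions by a solution-strategy key, certify one member per cluster, and compare every other member against its cluster's certified reference. The helpers `strategy_key`, `certify`, and `feedback` below are hypothetical stand-ins; CoderAssist derives the strategy and feedback via program analysis, not simple string labels.

```python
from collections import defaultdict

# Sketch of cluster-then-propagate feedback generation. The strategy
# key, certification, and feedback functions are invented stand-ins
# for the program analyses a real system would use.

def cluster_and_propagate(submissions, strategy_key, certify, feedback):
    clusters = defaultdict(list)
    for prog in submissions:
        clusters[strategy_key(prog)].append(prog)
    results = {}
    for members in clusters.values():
        reference = certify(members)      # instructor-certified member
        for prog in members:
            results[prog] = feedback(prog, reference)
    return results

# Toy demo: the prefix before "_" stands in for the solution strategy.
subs = ["iter_a", "iter_b", "rec_a"]
key = lambda p: p.split("_")[0]
certify = lambda members: sorted(members)[0]   # pretend instructor's pick
fb = lambda p, ref: "certified" if p == ref else f"compare with {ref}"
print(cluster_and_propagate(subs, key, certify, fb))
```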
Acknowledgements
• DeepFix: Fixing common C language errors by deep learning
Rahul Gupta, Soham Pal, Aditya Kanade, Shirish Shevade
AAAI’17
• Semi-supervised verified feedback generation
Shalini Kaleeswaran, Anirudh Santhiar, Aditya Kanade, Sumit Gulwani
FSE’16
• MintHint: Automated synthesis of repair hints
Shalini Kaleeswaran, Varun Tulsian, Aditya Kanade, Alessandro Orso
ICSE’14
• Some images taken from the Deep Learning book by Goodfellow, Bengio and Courville
Discussion Points
• Improving performance of DeepFix, e.g., using a copying mechanism?
• Fixing more challenging programming errors?
• Handling larger programs? Learning better dependences?
• General-purpose deep nets versus special-purpose deep nets
designed for program text?
• How to obtain high-quality training data? Mutation strategies?