Efficient Detection of All Pointer and Array Access Errors

Todd M. Austin, Scott E. Breach, Gurindar S. Sohi
Computer Sciences Department
University of Wisconsin-Madison
1210 W. Dayton Street
Madison, WI 53706
{austin, breach, sohi}@cs.wisc.edu
December 1, 1993
Presented by Oren Markovitz
Topics
• Overview
• Memory access errors
• Motivation
• What are safe pointers
• Safe pointer implementation
• Optimizations
• Experimental framework
• Results
• Related work
Overview
• Detects all pointer and array access errors
• Complete but not efficient
• Based on an extended safe pointer representation
• Prototype implemented for C programs
• Amenable to both run-time and compile-time optimization
• Without optimization: execution overhead of 130%-540%; text and data overhead typically under 100%
Memory Access Errors
• Spatial access error: dereference of a pointer or subscripted array outside the bounds of its referent.
• Temporal access error: dereference of a pointer or subscripted array outside the lifetime of its referent.
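The two definitions can be made concrete with a small C model. This is an illustrative sketch, not the paper's code: the `Referent` record, `classify_access`, and the use of integer addresses (so the example never touches real memory) are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a referent: where it starts, how big it is, and
   whether its lifetime has ended (freed heap block, returned frame). */
typedef struct {
    uintptr_t base;
    size_t    size;
    bool      live;
} Referent;

typedef enum { ACCESS_OK, SPATIAL_ERROR, TEMPORAL_ERROR } AccessKind;

/* Classify an access of `width` bytes at address `addr` to referent `r`.
   The lifetime test comes first, as in ValidateAccess. */
static AccessKind classify_access(const Referent *r, uintptr_t addr, size_t width) {
    assert(width > 0);
    if (!r->live)
        return TEMPORAL_ERROR;
    /* In bounds iff base <= addr and addr + width <= base + size. */
    if (addr < r->base || width > r->size || addr - r->base > r->size - width)
        return SPATIAL_ERROR;
    return ACCESS_OK;
}
```

For a live 10-byte referent at address 1000, bytes 1000 through 1009 are accessible; 999 and 1010 are spatial errors, and any access after the referent dies is a temporal error.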
Some Statistics - Motivation
• Miller et al. injected random inputs into mature UNIX applications on six different platforms; almost all of the applications dumped core.
• Sullivan and Chillarege examined IBM MVS over four years: 50% of the reported errors were due to access errors.
Why access errors are hard to detect and fix
• The effects of access errors may not manifest themselves except under exceptional conditions.
• The exceptional conditions that lead to the program error may be hard to reproduce.
• It is very hard to correlate a memory access error with the resulting program error.
Safe pointers
• The program is transformed to use an extended pointer representation called safe pointers.
• A safe pointer contains the value of the pointer as well as attributes describing its referent.
Safe Pointers
• value: the pointer value.
• base and size: base address and size of the referent (used to detect spatial errors).
• storageClass: heap, local or global; allows detecting errant deallocations (e.g., it is illegal to free global or local variables).
• capability: FOREVER (always valid), NEVER (never valid), or a unique allocation identifier (used to detect temporal errors).
typedef struct {
    <type> *value;
    <type> *base;
    unsigned size;
    enum {heap = 0, local, global} storageClass;
    int capability;    /* plus FOREVER and NEVER */
} SafePtr<type>;
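Capability
• Each allocation is assigned a unique number, called its capability, which is inserted into the capability store.
• When the storage is deallocated, its capability is removed from the capability store; capabilities are never reused.
• When a pointer is dereferenced, its capability is checked for validity against the store.
• The capability store is associative (it is searched by capability value).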
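A minimal sketch of the capability store follows, using the routine names that appear on the slides (`NextCapability`, `InsertCapability`, `DestroyCapability`, `ValidCapability`). Assumptions: a fixed-size array with linear search stands in for the associative store, `NEVER` is 0 (as the stack-frame code assumes), and `FOREVER` is encoded as the all-ones 32-bit value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NEVER    0u            /* never valid (0, as the slides assume) */
#define FOREVER  0xFFFFFFFFu   /* always valid; encoding is an assumption */
#define MAX_LIVE 1024          /* capacity of this toy store */

static unsigned live_caps[MAX_LIVE];   /* capabilities of live allocations */
static size_t   live_count = 0;
static unsigned next_cap   = 1;        /* 0 is reserved for NEVER */

/* Issue a capability that has never been used before. */
static unsigned NextCapability(void) {
    assert(next_cap != FOREVER);       /* toy store: no wrap-around handling */
    return next_cap++;
}

/* Record cap as live (on malloc and on function entry). */
static void InsertCapability(unsigned cap) {
    assert(live_count < MAX_LIVE);
    live_caps[live_count++] = cap;
}

/* Forget cap (on free and on function exit). */
static void DestroyCapability(unsigned cap) {
    for (size_t i = 0; i < live_count; i++) {
        if (live_caps[i] == cap) {
            live_caps[i] = live_caps[--live_count];   /* swap-remove */
            return;
        }
    }
}

/* Temporal check: is the referent behind cap still allocated? */
static bool ValidCapability(unsigned cap) {
    if (cap == FOREVER) return true;
    if (cap == NEVER)   return false;
    for (size_t i = 0; i < live_count; i++)
        if (live_caps[i] == cap) return true;
    return false;
}
```

Because `next_cap` only increases, a destroyed capability is never handed out again, which is what lets a stale pointer fail `ValidCapability` even when its address is reused by a later allocation.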
Program transformation
• Pointer conversion: extend pointer definitions into safe pointers.
• Check insertion: add checks that detect access errors.
• Operator conversion: generate and maintain safe pointer attributes.
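Check insertion
• Inserted before each pointer dereference:
void ValidateAccess(<type> *addr) {
    /* temporal check: global referents are always live */
    if (storageClass != Global && !ValidCapability(capability))
        FlagTemporalError();
    /* spatial check: the unsigned subtraction wraps when addr < base,
       so one comparison catches both ends of the referent */
    if ((unsigned)addr - (unsigned)base > size - sizeof(<type>))
        FlagSpatialError();
    /* valid access */
}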
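A runnable version of the check for a safe pointer to `int` might look like this. It is a sketch under stated assumptions: the `Flag*Error` routines are replaced by a returned status so the result can be tested, the capability store is reduced to a single live capability, and addresses are compared as `uintptr_t` rather than `unsigned`, because casting a pointer to `unsigned` truncates it on 64-bit targets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum StorageClass { Heap = 0, Local, Global };
enum CheckResult { CHECK_OK, TEMPORAL_ERROR, SPATIAL_ERROR };

/* Safe pointer to int, following the SafePtr<type> layout on the slides. */
typedef struct {
    int *value;
    int *base;
    size_t size;                      /* referent size in bytes */
    enum StorageClass storageClass;
    unsigned capability;
} SafePtrInt;

/* Stand-in for the capability store (assumption): one live capability. */
static unsigned live_capability = 1;
static bool ValidCapability(unsigned cap) { return cap == live_capability; }

/* Check performed before each dereference of p. */
static enum CheckResult ValidateAccess(const SafePtrInt *p) {
    if (p->storageClass != Global && !ValidCapability(p->capability))
        return TEMPORAL_ERROR;
    /* Unsigned subtraction wraps to a huge value when value < base, so a
       single comparison covers both ends; tiny referents are rejected. */
    uintptr_t offset = (uintptr_t)p->value - (uintptr_t)p->base;
    if (p->size < sizeof(int) || offset > p->size - sizeof(int))
        return SPATIAL_ERROR;
    return CHECK_OK;
}
```

With a referent of four `int`s, the last valid element is at byte offset 12; element 4 (offset 16) and any address below `base` are spatial errors.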
Run-time support (1)
• The explicit storage allocation routines (malloc, calloc, realloc, free) are extended to support safe pointers.
• Function stack frame allocation: the frame is allocated on function call and, like any malloc, is assigned a capability; the capability is destroyed when the function returns.
Run-time support (2)
/* here "void *" denotes a safe pointer: value, base, size, storageClass, capability */
void *malloc(unsigned size) {
    void *p;
    p.base = p.value = unsafe_malloc(size);
    p.size = size;
    p.storageClass = Heap;
    p.capability = NextCapability();   /* fresh, unique capability */
    InsertCapability(p.capability);
    bzero(p.value, size);              /* stored pointers start as NEVER (0) */
    return p;
}

void free(void *p) {
    if (p.storageClass != Heap)
        FlagNonHeapFree();
    if (!ValidCapability(p.capability))
        FlagDuplicateFree();
    if (p.value != p.base)
        FlagNonOriginalFree();
    DestroyCapability(p.capability);
    unsafe_free(p.value);
}
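The two wrappers can be exercised with the self-contained sketch below. The names `safe_malloc`, `safe_free`, the `FreeResult` codes, and the array-based capability store are hypothetical; `memset` replaces the non-standard `bzero`, and instead of calling the `Flag*` routines, `safe_free` reports the first error it finds and frees nothing in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum StorageClass { Heap = 0, Local, Global };
enum FreeResult { FREE_OK, NON_HEAP_FREE, DUPLICATE_FREE, NON_ORIGINAL_FREE };

typedef struct {                  /* safe pointer to char */
    char *value;
    char *base;
    size_t size;
    enum StorageClass storageClass;
    unsigned capability;
} SafePtr;

/* Toy capability store: live capabilities in a small array. */
#define MAX_LIVE 64
static unsigned live[MAX_LIVE];
static size_t n_live = 0;
static unsigned next_cap = 1;     /* 0 is NEVER */

static bool ValidCapability(unsigned c) {
    for (size_t i = 0; i < n_live; i++)
        if (live[i] == c) return true;
    return false;
}
static void InsertCapability(unsigned c) {
    assert(n_live < MAX_LIVE);
    live[n_live++] = c;
}
static void DestroyCapability(unsigned c) {
    for (size_t i = 0; i < n_live; i++)
        if (live[i] == c) { live[i] = live[--n_live]; return; }
}

/* Checked malloc: record the referent and issue a fresh capability. */
static SafePtr safe_malloc(size_t size) {
    SafePtr p;
    p.base = p.value = malloc(size);
    assert(p.base != NULL);
    p.size = size;
    p.storageClass = Heap;
    p.capability = next_cap++;
    InsertCapability(p.capability);
    memset(p.value, 0, size);     /* any safe pointers stored here start as NEVER */
    return p;
}

/* Checked free: returns the first error found; frees only if there is none. */
static enum FreeResult safe_free(SafePtr p) {
    if (p.storageClass != Heap)         return NON_HEAP_FREE;
    if (!ValidCapability(p.capability)) return DUPLICATE_FREE;
    if (p.value != p.base)              return NON_ORIGINAL_FREE;
    DestroyCapability(p.capability);
    free(p.value);
    return FREE_OK;
}
```

Freeing an interior pointer, freeing twice, or freeing a local buffer are each reported without touching the underlying allocator.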
Run-time support (3)
void Func(int a) {
    /* procedure prologue */
    unsigned frameCapability = NextCapability();
    InsertCapability(frameCapability);
    ZeroFramePointers();   /* assumes capability NEVER == 0 */

    /* ... function body ... */

    /* procedure epilogue, at each exit point */
    DestroyCapability(frameCapability);
    return;
}
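The effect of the frame capability can be sketched as follows: a pointer to a local that outlives its function fails the temporal check, because the frame's capability has been destroyed. The `LocalPtr` record and the toy capability store are assumptions; addresses are stored as integers so the sketch never dereferences the dead local.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy capability store (assumption): live capabilities in a small array. */
#define MAX_LIVE 64
static unsigned live[MAX_LIVE];
static size_t n_live = 0;
static unsigned next_cap = 1;                     /* 0 is NEVER */

static unsigned NextCapability(void) { return next_cap++; }
static void InsertCapability(unsigned c) { assert(n_live < MAX_LIVE); live[n_live++] = c; }
static void DestroyCapability(unsigned c) {
    for (size_t i = 0; i < n_live; i++)
        if (live[i] == c) { live[i] = live[--n_live]; return; }
}
static bool ValidCapability(unsigned c) {
    for (size_t i = 0; i < n_live; i++)
        if (live[i] == c) return true;
    return false;
}

/* Attributes of a pointer to a local; addresses kept as integers. */
typedef struct {
    uintptr_t value, base;
    size_t size;
    unsigned capability;
} LocalPtr;

/* Hypothetical function that lets a pointer to its local escape. */
static LocalPtr Func(int a) {
    unsigned frameCapability = NextCapability();  /* prologue */
    InsertCapability(frameCapability);

    int local = a;
    LocalPtr p = { (uintptr_t)&local, (uintptr_t)&local, sizeof local,
                   frameCapability };             /* p = &local */
    assert(ValidCapability(p.capability));        /* valid inside the frame */

    DestroyCapability(frameCapability);           /* epilogue */
    return p;
}
```

After `Func` returns, the escaped pointer still carries the frame's capability, but that capability is no longer in the store, so any dereference is flagged as a temporal error.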
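Operator conversions
• Pointer assignment copies the source pointer's attributes to the destination pointer.
• The reference operator (&) determines the new pointer's attributes from the access path of its operand:
– Access path prefix: the part of the expression up to the last pointer dereference; it determines the storage class and capability.
– Access path suffix: the rest of the expression; it determines the extent (base and size) of the object being referenced.
– Direct reference: no pointer is dereferenced, so the referent is a local or global variable.
– Indirect reference: the referent is reached through a pointer, so its storage class and capability are copied from that pointer.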
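For assignment and pointer arithmetic the conversions are simple; here is a minimal C sketch for a safe pointer to `char` (the type `SafePtrChar` and the helpers `sp_assign` and `sp_add` are hypothetical). Arithmetic moves only `value`, so later checks still compare against the original referent, mirroring the overloaded `operator+` in the experimental framework.

```c
#include <assert.h>
#include <stddef.h>

enum StorageClass { Heap = 0, Local, Global };

typedef struct {                     /* safe pointer to char */
    char *value;
    char *base;
    size_t size;
    enum StorageClass storageClass;
    unsigned capability;
} SafePtrChar;

/* dst = src: plain struct assignment already copies every attribute. */
static SafePtrChar sp_assign(SafePtrChar src) { return src; }

/* p + n: only the value moves; base, size, storage class and capability
   still describe the original referent, so later checks use its bounds. */
static SafePtrChar sp_add(SafePtrChar p, ptrdiff_t n) {
    p.value += n;
    return p;
}
```

Passing and returning the struct by value also gives `sp_add` the "no side effect on the operand" behaviour of the C++ version.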
The Access path
Example: p = &f->g->h[3].i->j.k[4]
  access path prefix:        f->g->h[3].i
  last pointer dereference:  ->
  access path suffix:        j.k[4]
Resulting safe pointer attributes:
  p.value        = &f->g->h[3].i->j.k[4]
  p.base         = f->g->h[3].i->j.k
  p.size         = sizeof(f->g->h[3].i->j.k)
  p.storageClass = f->g->h[3].i.storageClass
  p.capability   = f->g->h[3].i.capability

Expression               Access path prefix   Prefix type   Access path suffix
a                        a                    direct        (none)
a.b                      a                    direct        b
a.b.c[4].d               a                    direct        b.c[4].d
(**p)[3]                 **p                  indirect      [3]
(*p)->b                  *p                   indirect      b
w->x                     w                    indirect      x
w->x->y                  w->x                 indirect      y
w->x->y[3].z->c[4].b     w->x->y[3].z         indirect      c[4].b
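The direct/indirect distinction can be illustrated with the struct from the access checking example (`struct { char a; char b[100]; }`). In this sketch (hypothetical helper names, `FOREVER` encoding assumed) the suffix `b[10]` narrows the referent to the array `b`, while the prefix decides where the storage class and capability come from.

```c
#include <assert.h>
#include <stddef.h>

#define FOREVER 0xFFFFFFFFu          /* assumed encoding of "always valid" */

enum StorageClass { Heap = 0, Local, Global };

struct S { char a; char b[100]; };

typedef struct {                     /* safe pointer to char */
    char *value, *base;
    size_t size;
    enum StorageClass storageClass;
    unsigned capability;
} SafePtrChar;

typedef struct {                     /* safe pointer to struct S */
    struct S *value, *base;
    size_t size;
    enum StorageClass storageClass;
    unsigned capability;
} SafePtrS;

static struct S x;                   /* a global referent */

/* q = &x.b[10]: direct reference. The prefix is the global x itself,
   so the storage class is global and the capability is FOREVER. */
static SafePtrChar ref_direct(void) {
    SafePtrChar q = { &x.b[10], x.b, sizeof x.b, Global, FOREVER };
    return q;
}

/* q = &p->b[10]: indirect reference. The suffix b[10] gives the bounds;
   storage class and capability are copied from p, the last pointer
   dereferenced (p's own access check is omitted here). */
static SafePtrChar ref_indirect(SafePtrS p) {
    SafePtrChar q = { &p.value->b[10], p.value->b, sizeof p.value->b,
                      p.storageClass, p.capability };
    return q;
}
```

Narrowing the referent to `b` is why, in the access checking example, `q = &p->b[10]` gets base 1001 and size 100 rather than the whole 101-byte struct.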
Access checking example
struct {
    char a;
    char b[100];
} x, *p;
char *q;

x is a global at address 1000. Attributes are shown as [value, base, size, storageClass, capability]; an x inside brackets marks an undefined field, and " means unchanged.

Statement              p                                 q                                 Capability store
(initial)              [x,x,x,x,NEVER]                   [x,x,x,x,NEVER]                   {}
p = &x;                [1000,1000,101,global,FOREVER]    "                                 "
*p;  /* no error */    "                                 "                                 "
q = &p->b[10];         "                                 [1011,1001,100,global,FOREVER]    "
q--;                   "                                 [1010,1001,100,global,FOREVER]    "
*q;  /* no error */    "                                 "                                 "
p -= 2;                [798,1000,101,global,FOREVER]     "                                 "
*p;  /* error! */      "                                 "                                 "
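The table can be replayed with a small simulation that uses integer addresses, so `x` can sit at address 1000 and `p -= 2` can be computed without undefined behavior. `SimPtr`, `sim_add`, `sim_ref_b` and `sim_in_bounds` are hypothetical helpers; only the spatial part of the check is modeled, since `x` is global.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FOREVER 0xFFFFFFFFu            /* assumed encoding of "always valid" */

enum StorageClass { Heap = 0, Local, Global };

typedef struct {
    uintptr_t value, base;             /* simulated addresses */
    size_t size;                       /* referent size in bytes */
    enum StorageClass storageClass;
    unsigned capability;
    size_t elem;                       /* size of the pointed-to type */
} SimPtr;

struct S { char a; char b[100]; };     /* the slide's struct type */

/* p + n, scaled by the element size; unsigned wrap-around makes negative
   n work. Attributes are unchanged. */
static SimPtr sim_add(SimPtr p, long n) {
    p.value += (uintptr_t)(n * (long)p.elem);
    return p;
}

/* &p->b[i]: bounds come from the suffix b[i]; storage class and
   capability are inherited from p (indirect reference). */
static SimPtr sim_ref_b(SimPtr p, long i) {
    SimPtr q = { p.value + offsetof(struct S, b) + (uintptr_t)i,
                 p.value + offsetof(struct S, b),
                 sizeof(((struct S *)0)->b),
                 p.storageClass, p.capability, 1 };
    return q;
}

/* Spatial part of ValidateAccess; global referents skip the temporal part. */
static bool sim_in_bounds(SimPtr p) {
    return p.size >= p.elem && p.value - p.base <= p.size - p.elem;
}
```

The final `*p` fails because 798 lies below the base 1000; the wrapped subtraction turns that into an offset far larger than `size - elem`.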
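Access checking example - heap storage
char *p, *q;

Statement              p                          q                          Capability store
(initial)              [x,x,x,x,NEVER]            [x,x,x,x,NEVER]            {}
p = malloc(10);        [2000,2000,10,heap,1]      "                          {1}
q = p + 6;             "                          [2006,2000,10,heap,1]      "
*q;  /* no error */    "                          "                          "
free(p);               "                          "                          {}
p = malloc(10);        [2000,2000,10,heap,2]      "                          {2}
*q;  /* error! */      "                          "                          "

The second malloc returns the same address, 2000; only the capability (1 vs. 2) reveals that q refers to freed storage.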
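A matching simulation of the heap example follows. `sim_malloc` takes the address the allocator "returns" as a parameter (an assumption that lets the sketch reuse address 2000 exactly as the table does), and the capability store is a small array; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum StorageClass { Heap = 0, Local, Global };

typedef struct {                     /* safe pointer with simulated addresses */
    uintptr_t value, base;
    size_t size;
    enum StorageClass storageClass;
    unsigned capability;
} SimPtr;

/* Toy capability store: live capabilities in a small array. */
#define MAX_LIVE 16
static unsigned live[MAX_LIVE];
static size_t n_live = 0;
static unsigned next_cap = 1;        /* 0 is NEVER */

static bool ValidCapability(unsigned c) {
    for (size_t i = 0; i < n_live; i++)
        if (live[i] == c) return true;
    return false;
}

/* Simulated malloc: `addr` is the address the allocator "returns". */
static SimPtr sim_malloc(uintptr_t addr, size_t size) {
    assert(n_live < MAX_LIVE);
    SimPtr p = { addr, addr, size, Heap, next_cap++ };
    live[n_live++] = p.capability;   /* InsertCapability */
    return p;
}

/* Simulated free: destroys the capability (error checks omitted). */
static void sim_free(SimPtr p) {
    for (size_t i = 0; i < n_live; i++)
        if (live[i] == p.capability) { live[i] = live[--n_live]; return; }
}

/* Temporal part of ValidateAccess; heap pointers are never global. */
static bool sim_temporally_valid(SimPtr p) { return ValidCapability(p.capability); }
```

`q` keeps base 2000 and capability 1 throughout; it is the disappearance of capability 1 from the store, not any change to `q`, that makes the second `*q` fail.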
Run-time optimization - spatial checks
• A dirty bit is added to the safe pointer attributes.
• The dirty bit is set when there has been no dereference check since the last change to the pointer's effective address; while the bit is clear, the spatial check on a dereference can be skipped.
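A minimal sketch of the dirty-bit optimization, assuming integer addresses and hypothetical names (`DirtyPtr`, `dp_add`, `dp_check`); the `full_checks` counter only exists to show which checks are skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t value, base;          /* simulated addresses */
    size_t size, elem;              /* referent size, element size */
    bool dirty;                     /* address changed since last passing check */
} DirtyPtr;

static unsigned full_checks = 0;    /* spatial checks actually executed */

/* p + n: any change to the effective address sets the dirty bit. */
static DirtyPtr dp_add(DirtyPtr p, long n) {
    p.value += (uintptr_t)(n * (long)p.elem);
    p.dirty = true;
    return p;
}

/* Dereference check: a clean pointer was already checked at this address,
   so the bounds comparison is skipped. */
static bool dp_check(DirtyPtr *p) {
    assert(p->size >= p->elem);     /* referent holds at least one element */
    if (!p->dirty) return true;
    full_checks++;
    bool ok = p->value - p->base <= p->size - p->elem;
    if (ok) p->dirty = false;       /* only a passing check cleans the pointer */
    return ok;
}
```

Repeated dereferences through an unchanged pointer, the common case in loops that reuse one element, then cost only a bit test.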
Run-time optimization - temporal checks
• A capability counter is added that changes whenever capabilities are allocated or destroyed.
• A pointer's last temporal check result is reused if the capability counter has not changed since that check.
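The temporal optimization can be sketched with a per-pointer cache keyed on a store version counter; all names here are hypothetical. One design point: the sketch bumps the counter on every insert and destroy rather than tracking how many capabilities are live, because a `free` followed by a `malloc` leaves the live count unchanged even though the freed capability is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy capability store (assumption) with a version counter. */
static unsigned live[16];
static size_t n_live = 0;
static unsigned long store_version = 0;   /* bumped on every change */
static unsigned lookups = 0;              /* full store searches performed */

static void InsertCapability(unsigned c) {
    assert(n_live < 16);
    live[n_live++] = c;
    store_version++;
}
static void DestroyCapability(unsigned c) {
    for (size_t i = 0; i < n_live; i++)
        if (live[i] == c) { live[i] = live[--n_live]; break; }
    store_version++;
}
static bool ValidCapability(unsigned c) {
    lookups++;
    for (size_t i = 0; i < n_live; i++)
        if (live[i] == c) return true;
    return false;
}

/* Per-pointer cache of the last temporal check. */
typedef struct {
    unsigned capability;
    unsigned long checked_version;        /* store_version at last check */
    bool has_result, last_result;
} TemporalCache;

static bool temporal_check(TemporalCache *t) {
    if (t->has_result && t->checked_version == store_version)
        return t->last_result;            /* store unchanged: reuse result */
    t->last_result = ValidCapability(t->capability);
    t->checked_version = store_version;
    t->has_result = true;
    return t->last_result;
}
```

Programs that rarely free storage keep the counter stable, so almost every temporal check collapses to a single comparison.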
Compile-time (static) check optimization
• A tree structure describing all possible execution paths of the program is built.
• All possible paths are scanned.
• If two identical checks occur in sequence on every path, with no change in between, the later check is redundant.
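Restricted to a single straight-line path, the redundancy rule reduces to the scan below. This is a simplified sketch with a hypothetical event encoding (`Op`, `mark_redundant`); a real optimizer must also show that the earlier check reaches the later one along every path through the program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum OpKind { CHECK, MODIFY };            /* check of, or change to, a pointer */
typedef struct { enum OpKind kind; int ptr; } Op;   /* ptr: small pointer id */

#define MAX_PTRS 32

/* Sets redundant[i] for every CHECK that repeats an earlier check of the
   same pointer with no MODIFY of that pointer in between.
   Returns the number of redundant checks found. */
static size_t mark_redundant(const Op *ops, size_t n, bool *redundant) {
    bool checked[MAX_PTRS] = { false };   /* checked since last change? */
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        assert(ops[i].ptr >= 0 && ops[i].ptr < MAX_PTRS);
        redundant[i] = false;
        if (ops[i].kind == MODIFY) {
            checked[ops[i].ptr] = false;
        } else if (checked[ops[i].ptr]) {
            redundant[i] = true;
            count++;
        } else {
            checked[ops[i].ptr] = true;
        }
    }
    return count;
}
```

Checks of different pointers do not interfere: a check of pointer 1 stays redundant after an intervening change to pointer 0.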
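Experimental framework (1)
[Figure: experimental framework. Unchecked build: C program → C compiler → unchecked executable. Checked build: C program → CPP → pointer & reference conversion → Safe-C program → C++ compiler → checked executable. The checked executable runs on varied inputs; trace generation (w/QPT) produces annotated traces, which an analyzer combines with a PC-to-program-point map and static analysis to compute lower bounds.]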
Experimental framework (2)
• Convert all pointer and array declarations to safe pointer declarations.
• Replace calls to malloc, free… with safe-C calls.
• Add capabilities to function frames.
• Overload operators to insert checks and to maintain safe pointer attributes across references and pointer operations.
Experimental framework (3) - overloading operations
template <class type>
class sp {
public:
    /* safe pointer representation */
    type *value;                 /* native pointer */
    type *base;                  /* base address of object */
    unsigned long size;          /* size of object in bytes */
    char storageClass;           /* type of allocation */
    unsigned short capability;   /* capability is always unique */

    /* constructor: a fresh safe pointer refers to nothing */
    sp(void) {
        value = NULL;
        base = NULL;
        size = 0;
        storageClass = NONE;
        capability = NEVER;
    }
Experimental framework (4) - overloading operations
    /* dereference: temporal check, then spatial check */
    type& operator*(void) {
        if (storageClass != Global && !ValidCapability(capability))
            FlagTemporalError();
        if ((unsigned long)value - (unsigned long)base > size - sizeof(type))
            FlagSpatialError();
        return *value;
    }

    /* pointer addition: only the value changes; attributes are kept */
    sp<type> operator+(int addend) {
        sp<type> p = *this;      /* no side effect on *this */
        p.value = p.value + addend;
        return p;
    }
};
Lower bound computation
• Each executed check and pointer reference writes a stamp to a log.
• The stamps record the program's execution path.
• Analyzing the stamp log shows which static and dynamic checks are redundant.
• The number of checks a real compile-time optimizer needs may be higher than this lower bound:
– the inputs are only sample inputs, so a check found redundant on these runs may still be needed on others;
– aliasing in the program may force the compile-time optimizer to make conservative assumptions and keep redundant checks.
Results (1)
Checks without optimization:
Static checks          Dynamic checks
Spatial   Temporal     Spatial       Temporal
95        95           2,544,106     2,544,106
73        73           13,730,020    13,730,020
1,259     1,259        1,623,490     1,623,490
246       246          2,268,757     2,268,757
217       217          5,662,107     5,662,107
501       501          38,995,428    38,995,428

Effect of optimization:
Static    Check       Run-time opt   Coverage   Compile-time opt (lower bound)
checks    type        Dynamic                   Static       Dynamic
                      (% unopt)                 (% unopt)    (% unopt)
95        spatial     39%            83%        49%          79%
95        temporal    0%             83%        0%           0%
73        spatial     67%            70%        55%          83%
73        temporal    0%             70%        0%           0%
1,259     spatial     20%            34%        ?            22%
1,259     temporal    9%             34%        5%           31%
246       spatial     24%            60%        3%           24%
246       temporal    3%             60%        9%           3%
217       spatial     17%            76%        21%          18%
217       temporal    0%             76%        0%           0%
501       spatial     86%            76%        43%          88%
501       temporal    0%             76%        0%           0%
Results (2) - execution overheads
[Figure: execution overheads for Anagram, Backprop, Min-Span and Partition, unoptimized (Unopt) and optimized (Opt), broken down into original program, user-defined pointers, spatial data, spatial checks, temporal data and temporal checks.]
Results (3) - text overheads
[Figure: text overheads for Anagram, Backprop, Min-Span and Partition, unoptimized (Unopt) and optimized (Opt), broken down into original program, user-defined pointers, spatial data, spatial checks, temporal data and temporal checks.]
Results (4) - data overheads
[Figure: data overheads for Anagram, Backprop, Min-Span and Partition, unoptimized (Unopt) and optimized (Opt), broken down into original program, user-defined pointers, spatial data and temporal data.]
Results (5) - Summary
• Execution overhead (without compiler optimization) is low enough for use during program development.
• Main contributors to execution overhead:
– safe pointer structures are not register-allocated;
– many traditional optimizations cannot be applied.
• Spatial checks are relatively cheap compared with other methods, and complex programs usually include such checks anyway.
• Temporal checks are usually eliminated by the run-time optimization.
• Text and data overheads are generally low:
– text overhead 41%-340% (all but two under 100%)
– data overhead 5%-330% (all but one under 100%)
Related work (1) - Purify
• Spatial errors: detected by marking memory and checking accesses, including stack accesses; only allocated zones are considered safe.
• Temporal errors: any allocated space is considered safe, so a dangling pointer into storage that has been reallocated goes undetected.
• Works on object code.
• Cannot detect access errors such as an access to one allocated variable that overruns into another allocated variable.
• The stack is "aged" to detect temporal errors, which increases stack size.
• Cross-language, but not cross-platform.
Related work (2) - RTCC & Code Center
• RTCC: uses safe pointers, but without temporal checks and without optimizations.
• Code Center: uses safe pointers:
– no temporal checks;
– supports access type protection;
– runs as an interpreter.
Related work (3)
• Integral C: similar to RTCC; detects only spatial errors.
• VW-Pascal compiler:
– detects both spatial and temporal access errors;
– not cross-language (Pascal only);
– limited by the expressiveness of the language.