Don’t let data Go astray
A Context-Sensitive Taint Analysis for Concurrent Programs in Go
Volker Stolz
Bergen University College, Norway
& University of Oslo, Norway
28th Nordic Workshop on Programming Theory (NWPT’16)
1st November 2016
Supported by the bilateral Norwegian/German project
GoRETech — Go Runtime Enforcement Techniques
& EU COST Action IC1402 “ARVI — Applied Runtime Verification”
Don’t let data Go astray
Violet Ka I Pun
Anna-Katharina Wickert
Martin Steffen
Eric Bodden
Volker Stolz
Michael Eichberg
Motivation
I
Taint analysis: data flow analysis
I
Secure information flow
I
Prevent untrusted/sensitive data from reaching sensitive locations
Examples (the usual suspects):
I
• SQL injection (user input flows unfiltered into SQL query)
• leaks (clear-text password ending up in log/debugging output)
Volker Stolz
Don’t let data Go astray
NWPT’16
1 / 20
The Go language
I
Backed by Google
I
imperative (C-programmers
should be able to read it)
I
object-oriented (maybe. . . )
I
concurrent (goroutines)
I
structurally typed
I
garbage collected; dynamic race
checker
I
higher-order functions and
closures
Volker Stolz
Don’t let data Go astray
NWPT’16
2 / 20
What are methods?
type Number struct { n int }
I
procedures – functions – methods
I
methods are “specific” functions
I
Example: add1 / add2 not that
much different from each other
func add1 (x Number, y Number)
func (x Number) add2 (y Number) int
{ return x.n+y.n }
method ∼ function with special first argument
f (o, v ) vs. o.f (v )
I
elsewhere often: special keyword for first argument: this (or self)
Volker Stolz
Don’t let data Go astray
NWPT’16
3 / 20
Higher-order functions
I
I
known from functional languages
languages with higher-order functions
• functions as “first-class” data ⇒
func add (x int) (func (int) int) {
return func ( y int ) ( int ) { return y + x }
}
add : int→ (int→int) = λx :int .λy :int .x + y
Volker Stolz
Don’t let data Go astray
NWPT’16
4 / 20
Deferred functions
Each function/method
can be called:
func main () {
defer fmt . Println ( ‘‘ 1’’ )
fmt . Println ( ‘‘2’’ )
}
1
conventionally
2
deferred
3
asynchronously (see later)
I
Also in Apple’s Swift language
Volker Stolz
I
Deferred call (guaranteed to be)
executed when the surrounding
function body returns
I
Eval’d for side-effect only,
returned value irrelevant
I
Deferred calls can be nested, too
Don’t let data Go astray
NWPT’16
5 / 20
Concurrency in Go
Go’s concurrency mantra
“Don’t communicate by sharing memory,
share memory by communicating!”
Go concurrency
goroutines + channels
Volker Stolz
I
claimed to be “easy”
I
first-class, typed channels
Don’t let data Go astray
NWPT’16
6 / 20
Channels
I
“named pipes”
I
FIFO, bounded, non-lossy communication
I
crucial data type with synchronization power
taking a back-seat:
I
•
•
•
•
I
locks
mutexes
monitors
semaphores. . .
channels: first-class data
• channels can send (references to) channels
• inspired by CSP (and CCS, and, actually π)
I
directed channels
Volker Stolz
Don’t let data Go astray
NWPT’16
7 / 20
Channels
Simple example: synchronized channel for strings
package main
import ‘‘fmt’’
func main () {
messages := make ( chan string , 0 )
go func () { messages ← ‘‘ping’’ } ()
msg := ← messages
fmt . Println (msg)
}
Volker Stolz
Don’t let data Go astray
NWPT’16
8 / 20
Back to Taint Analysis. . .
I
I
Identifies flows of private information to untrusted places
Terminology:
• Source (produce tainted data)
• Sink (consumer of data)
• Flow (from source to sink)
I
Both given as user-defined sets of methods
I
(Q: How to untaint data? Sanitizers, future work)
Volker Stolz
Don’t let data Go astray
NWPT’16
9 / 20
Abstract syntax
I
Abstract syntax captures essence of Go
I
Gory details in SSA representation in implementation
s ::=
|
e ::=
v ::=
Volker Stolz
x := e | x.f := e | if v then s else s | defer((λx.s) v)
go s | x ← y | x → y | return v | s ; s
v | v v | makeChan
x | x.f | () | true | false | λx.s
Don’t let data Go astray
NWPT’16
10 / 20
Lattice for Taint Analysis
Simple lattice:
uninitialized / untainted / tainted / both
Volker Stolz
∪
⊥
1
0
>
⊥
1
0
>
⊥
1
0
>
1
1
>
>
0
>
0
>
>
>
>
>
>
1
0
⊥
Don’t let data Go astray
NWPT’16
11 / 20
Standing on the Shoulders. . .
I
Existing Go compiler infrastructure:
• SSA representation
https://godoc.org/golang.org/x/tools/go/ssa
• Basic blocks / call-graph
https://godoc.org/golang.org/x/tools/go/callgraph
• Points-to analysis à la Andersen
https://godoc.org/golang.org/x/tools/go/pointer
I
Interprocedural, context-sensitive analysis:
Padhye / Khedker, SOAP@PLDI 2013
• standard worklist algorithm w/ calling context & value
Volker Stolz
Don’t let data Go astray
NWPT’16
12 / 20
Standing on the Shoulders. . .
I
Existing Go compiler infrastructure:
• SSA representation
https://godoc.org/golang.org/x/tools/go/ssa
• Basic blocks / call-graph
https://godoc.org/golang.org/x/tools/go/callgraph
• Points-to analysis à la Andersen
https://godoc.org/golang.org/x/tools/go/pointer
I
Interprocedural, context-sensitive analysis:
Padhye / Khedker, SOAP@PLDI 2013
• standard worklist algorithm w/ calling context & value
. . . or their toes?
Volker Stolz
Don’t let data Go astray
NWPT’16
12 / 20
Our analysis – example
main()
n1
a := “Hello World”
c1
b := g(a)
n3
f,
c2
n6
Volker Stolz
n4
b := make([]byte, 8)
r, = f.Read(b)
n5
c = string(b[:])
return
fmt.Print(b)
// sink
:= os.OpenFile(”./pw.txt”)
s, n := h(f)
fmt.Print(s)
for n > 0
c3
s, n = h(f)
c4
t := g(s)
n8
fmt.Print(t)
n9
h(f *os.File) (c string, r int)
exit
// sink
// sink
g(a string) (b string)
n2 |n7
Don’t let data Go astray
b=a
return
NWPT’16
13 / 20
Our analysis – example
main()
n1
a := “Hello World”
c1
b := g(a)
n3
f,
c2
n6
Volker Stolz
n4
b := make([]byte, 8)
r, = f.Read(b)
n5
c = string(b[:])
return
fmt.Print(b)
// sink
:= os.OpenFile(”./pw.txt”)
s, n := h(f)
fmt.Print(s)
for n > 0
c3
s, n = h(f)
c4
t := g(s)
n8
fmt.Print(t)
n9
h(f *os.File) (c string, r int)
exit
// sink
Untainted
// sink
g(a string) (b string)
n2 |n7
Don’t let data Go astray
b=a
return
NWPT’16
13 / 20
Our analysis – example
main()
n1
a := “Hello World”
c1
b := g(a)
n3
f,
c2
n6
Volker Stolz
n4
b := make([]byte, 8)
r, = f.Read(b)
n5
c = string(b[:])
return
fmt.Print(b)
// sink
:= os.OpenFile(”./pw.txt”)
s, n := h(f)
fmt.Print(s)
for n > 0
c3
s, n = h(f)
c4
t := g(s)
n8
fmt.Print(t)
n9
h(f *os.File) (c string, r int)
exit
Tainted
// sink
Untainted
// sink
g(a string) (b string)
n2 |n7
Don’t let data Go astray
b=a
return
NWPT’16
13 / 20
Channel Handling
Q: Can we handle flow through channels in the same framework?
Volker Stolz
n1
x :=“Hello World”
n2
ch := make(chan, string)
c1
go f(ch)
fn1
func f(ch chan string)
n3
sink(x)
fn2
y := ← ch
n4
x = tainted
fn3
sink(y)
n5
ch ← x
n6
...
Don’t let data Go astray
NWPT’16
14 / 20
Channel Handling
Q: Can we handle flow through channels in the same framework?
TA(n5 )↓ch = [ch 7→ S(x)]
n1
x :=“Hello World”
n2
ch := make(chan, string)
c1
go f(ch)
fn1
func f(ch chan string)
n3
sink(x)
fn2
y := ← ch
n4
x = tainted
fn3
sink(y)
n5
ch ← x
n6
...
A: Add feedback from writes to reads
Overapproximation; only feed back taint value of channel along this edge
Volker Stolz
Don’t let data Go astray
NWPT’16
14 / 20
Our analysis
“Nielson-style” data-flow analysis (monotone framework)
Analysis info of a node = transfer function of node
applied to
(union over all incoming flows)
TA(`) = Φ(S, N ` ) S
where S = {TA(`0 ) | (`0 , `) ∈ flow∗ (P)} and
N ` ∈ nodes(P) .
Volker Stolz
Don’t let data Go astray
NWPT’16
15 / 20
Our analysis
Assignments, from/to struct members, calls:
Φ(S, [x := e]` ) = φ(S, x, e, `)
Volker Stolz
Don’t let data Go astray
NWPT’16
16 / 20
Our analysis
Assignments, from/to struct members, calls:
Φ(S, [x := e]` ) = φ(S, x, e, `)
0
0
{TA(l 0 )↓y 0 | [y 0 := e]` ∨ [y 0 ← ch]`
f.a. (`0 : y 0 ) ∈ aliases(` : y )}]
S
0
0
φ(S, x.f , y .f 0 , l) = S[x.f 7→ {TA(`0 )↓y 0 | [y 0 := e]` ∨ [y 0 ← ch]`
f.a. (`0 : y 0 ) ∈ aliases(` : y )}]
S[x 7→ 1]
if v1 is a source
φ(S, x, v1 v2 , `) =
1
S[x 7→ Φvexit
]
otherwise
φ(S, x, y .f , `) = S[x 7→
S
I
aliases(` : u) produced by existing context-insensitive pta.
I
tainting a struct-member taints the entire struct
I
same for slices & built-in key/value map data structure
Volker Stolz
Don’t let data Go astray
NWPT’16
16 / 20
Our analysis
Assignments, from/to struct members, calls:
Φ(S, [x := e]` ) = φ(S, x, e, `)
φ(S, x, y .f , `) = S[x 7→
φ(S, x, v1 v2 , `) =
S
0
0
{TA(l 0 )↓y 0 | [y 0 := e]` ∨ [y 0 ← ch]`
f.a. (`0 : y 0 ) ∈ aliases(` : y )}]
S[x →
7 1]
1
S[x →
7 Φvexit
]
if v1 is a source
otherwise
I
aliases(` : u) produced by existing context-insensitive pta.
I
tainting a struct-member taints the entire struct
I
same for slices & built-in key/value map data structure
Volker Stolz
Don’t let data Go astray
NWPT’16
16 / 20
Channel Handling
Φ(S, [x → ch]l ) = S[ch 7→ S(x)]
Φ(S, [x ← ch]l ) = S[x 7→ A]
S
0
where A = {TA(l 0 )↓ch0 | [x 0 → ch0 ]l }
f.a. (l 0 : ch0 ) ∈ aliases(l : ch)
I
existing pta knows about channels
I
we lose information about ordering of messages in channels
(even in obvious cases)
Volker Stolz
Don’t let data Go astray
NWPT’16
17 / 20
Sanitizers & Monitoring
I
I
Taint analysis with only sources and sinks too restrictive
Certain operations untaint data:
• SQL injections: filter out dangerous characters
• Password example: data flowing through hash-function sanitized
I
Natural extension through third set of operations: sanitizers
I
Interesting questions:
• Where to report tainted flow at runtime (early/late)
• Minimize taint-tagging at runtime
• Automated placement of sanitizers to “repair” programs
(for C] : Livshits, POPL’13)
Volker Stolz
Don’t let data Go astray
NWPT’16
18 / 20
Practical Evaluation
I
Extensive list of sources & sinks from Eric’s security projects
I
Test suite with hand-crafted examples: X
I
Runtime bounded by lattice height, calling contexts
I
Real-world case study: well. . .
• fine-grained analysis descends into imported packages (fmt.Print)
• case study requires domain specific properties
(and by now everyone is avoiding SQL injections. . . )
• SSA-based infrastructure requires setup that actually compiles
(git clone github.com/random/goproject not enough)
I
High-profile targets:
• Docker, Dropbox, Android development
Volker Stolz
Don’t let data Go astray
NWPT’16
19 / 20
Summary: Challenge in Go
I
Traditional data-flow analysis, for a newer language
I
Information flow: sources to sinks (via sanitizers)
I
mixes problems from C (structs, pointers)
with problems from π-calculus (channels)
I
combination of off-the-shelf components (SSA representation & pta
from Go compiler infrastructure, vasco interprocedural analysis)1
I
“More research is needed on. . . ”:
• fine tune white-list/black-list for imports
• or: precompute & cache standard libraries?
• collect (performance) data on real-world
Go code
1
E. Bodden, K. I Pun, M. Steffen, V. Stolz, A.-K. Wickert
“Information Flow Analysis for Go”, ISoLA 2016.
Volker Stolz
Don’t let data Go astray
NWPT’16
20 / 20
Summary: Challenge in Go
I
Traditional data-flow analysis, for a newer language
I
Information flow: sources to sinks (via sanitizers)
I
mixes problems from C (structs, pointers)
with problems from π-calculus (channels)
I
combination of off-the-shelf components (SSA representation & pta
from Go compiler infrastructure, vasco interprocedural analysis)1
I
“More research is needed on. . . ”:
• fine tune white-list/black-list for imports
• or: precompute & cache standard libraries?
• collect (performance) data on real-world
Go code
1
E. Bodden, K. I Pun, M. Steffen, V. Stolz, A.-K. Wickert
“Information Flow Analysis for Go”, ISoLA 2016.
Volker Stolz
Don’t let data Go astray
NWPT’16
20 / 20
© Copyright 2026 Paperzz