LangSec meets protocol state machines

LangSec meets state machines
Erik Poll
joint work with Fabian van den Broek,
Joeri de Ruiter & many others
Radboud University Nijmegen
Overview
How can we tackle root causes of some classes of
security vulnerabilities in a systematic way?
Two (related) ideas
• language-theoretic security (LangSec)
• state machines
Erik Poll
Radboud University Nijmegen
2
LangSec
Language-theoretic
Security
LangSec (Language-theoretic Security)
• Interesting look at root cause of large class of security problems,
namely problems with input
• Useful suggestions for dos and don’ts
• See langsec.org, esp. http://langsec.org/bof-handout.pdf
Sergey Bratus & Meredith Patterson
Erik Poll
Radboud University Nijmegen
4
Tower of Babel
Web browsers and web applications involve many languages
HTTP(S), HTML, CCS, javascript, Flash, cookies & FSOs
Ajax & XML, ActiveX, jpeg, mpeg, mp4, png, gif, SilverLight,
user names, email addresses, URLs/URIs, X509 certificates,
TCP/IP (IPv4 or IPv6), file names, directories, OS commands,
SQL, LDAP, JSP, PHP, ASCII, Unicode, UTF-8, ...
Erik Poll
Radboud University Nijmegen
5
Input attacks
The common pattern in many attacks
buffer overflows, format string attacks, integer overflow, OS command
injection, path traversal attacks, SQL injection, HTML injection, PHP file
name injection, LDAP injection, XSS, CSRF, database command &
function injection, ShellShock, HeartBleed,...
1. attacker crafts some malicious input
2. software goes off the rails processing this
Like social engineering or hypnosis as attack vector on humans?
Erik Poll
Radboud University Nijmegen
6
Processing input is dangerous!
Processing involves
1) parsing/lexing
2) interpreting/executing
Eg interpreting a string as filename, URL, or email address
This relies on some language or format
1) relies on syntax
2) on semantics
Insecure processing of inputs exposes strange functionality that the
attacker can program & abuse: a weird machine
Erik Poll
Radboud University Nijmegen
7
Fallacy of classic input validation?
Classical input validation:
filter or encode harmful characters (blacklist)
or, slightly better:
only let through harmless characters (whitelist)
But:
• Which characters are harmful (or required!) depends on the
language or format. You need context to decide which characters
are dangerous.
• Not only presence of funny characters can cause problems, but
als the absence of other characters, or input fields that are too
long or too short, ...
Erik Poll
Radboud University Nijmegen
8
Root causes (dont’s)
Obstacles in producing code without input vulnerabilities
1. ad-hoc and imprecise notion of input validity
2. parser differentials
eg web-browsers parsing same certificate in different ways
3. mixing input recognition & processing
aka shotgun parsers
4. unchecked development of input languages
eg ASCI text email evolving to include HTML, Javascript,...
Erik Poll
Radboud University Nijmegen
9
Root cause: shotgun parsers
Handwritten code that incrementally parses & interprets input,
in a piece-meal fashion
Tell-tale signs in the code:
• use of strings or byte arrays
• code all over the place that parses and combines these
Erik Poll
Radboud University Nijmegen
10
An example shotgun parser – spot the security flaw!
...
char buf1[MAX_SIZE], buf2[MAX_SIZE];
// make sure url is valid URL and fits in buf1 and buf2:
if (!isValid(url)) return;
if (strlen(url) > MAX_SIZE – 1) return;
// copy url up to first separator, ie. first ’/’, to buf1
out = buf1;
do {
// skip spaces
if (*url != ’ ’) *out++ = *url;
} while (*url++ != ’/’);
loop termination
strcpy(buf2, buf1);
flaw (for URLs
...
without /) caused
Blaster worm
Erik Poll
Radboud University Nijmegen
11
LangSec principles (do’s)
No more handwritten shotgun parsers, but
1. precisely defined input languages
eg with EBNF grammar
2. generated parsers
3. complete parsing before processing
So don’t substitute strings & then parse,
but parse & then substitute in parse tree
Eg parameterised queries instead of dynamic SQL.
4. keep the input language simple & clear
So that equivalence of various parsers is decidable.
So that you give minimal processing power to attackers.
Erik Poll
Radboud University Nijmegen
12
Erik Poll
Radboud University Nijmegen
13
Erik Poll
Radboud University Nijmegen
14
Example complicated input language: GSM
GSM is a extremely rich & complicated protocol
Erik Poll
Radboud University Nijmegen
15
Example: GSM protocol fuzzing
Lots of stuff to fuzz!
With an USRP with OpenBTS software
we can fuzz phones
[Fabian vd Broek, Brinio Hond, Arturo Cedillo Torres, Security Testing of GSM
Implementations, Essos 2014]
Erik Poll
Radboud University Nijmegen
16
Example: GSM protocol fuzzing
Fuzzing SMS layer of GSM reveals weird functionality
Erik Poll
Radboud University Nijmegen
17
Example: GSM protocol fuzzing
Fuzzing SMS layer of GSM reveals weird
eg possibility to send faxes (!?)
you have a fax!
Only way to get rid if this icon; reboot the phone
Erik Poll
Radboud University Nijmegen
18
Results with GSM protocol fuzzing
• Lots of success to DoS phones: phones crash, disconnect from
the network, or stop accepting calls
• Little correlation between problems and phone brands & firmware
versions
• how many implementations of the GSM stack do vendors
have?
• The scary part: what would happen if we fuzz base stations?
Root cause: complex input language, with lots of
handwritten code to parse & interpret input
Erik Poll
Radboud University Nijmegen
19
protocol state machines
Messages & sequences of messages
Protocols not only involve messages,
but also sequences of messages
Erik Poll
Radboud University Nijmegen
21
Using a protocol state machine (FSM)
Language for sequences of inputs
can be specified using a
finite state machine (FSM)
This state machne only
describes the happy flows.
The implementation
will have to be input-enabled.
SSH transport layer
Erik Poll
Radboud University Nijmegen
22
Typical prose specifications: RFC for SSH 
“Once a party has sent a SSH_MSG_KEXINIT message for key exchange or reexchange, until it has sent a SSH_MSG_NEWKEYS message, it MUST NOT send any
messages other than:
• Transport layer generic messages (1 to 19) (but SSH_MSG_ SERVICE REQUEST
and SSH_MSG_SERVICE_ACCEPT MUST NOT be sent);
• Algorithm negotiation messages (20 to 29) (but further SSH_MSG KEXINIT
messages MUST NOT be sent);
• Specific key exchange method messages (30 to 49).
The provisions of Section 11 apply to unrecognised messages”
…
“An implementation MUST respond to all unrecognised messages with an
SSH_MSG_UNIMPLEMENTED. Such messages MUST be otherwise ignored. Later
protocol versions may define other meanings for these message types.”
Understanding state machines from prose is hard!
Erik Poll
Radboud University Nijmegen
23
Extracting state machines from code!
Using state machine learning we can automatically infer a
state machine from implementation by black box testing.
• This is effectively a form of fuzzing.
• not fuzzing the content of messages,
but fuzzing the order of messages.
• Using variants of the L* algorithm,
implemented in open source libraries such as LearnLib
This is a great way to obtain protocol state machines
• without reading specs!
• withour reading code!
Erik Poll
Radboud University Nijmegen
24
How does state machine learning work?
Just try out sequences of inputs, and observe outputs
A/X
Suppose input A results in output X
A/X
• If second input A results in the same output X
B/error
Now try all sequences of inputs with A, B, C, ...
A/X
B/error
Erik Poll
B/Y
A/error
Radboud University Nijmegen
A/Y
A/X
• If a second input A results in different output Y
C/X
A/error
25
Example: state machine learning for
Erik Poll
Radboud University Nijmegen
26
Example: state machine learning for
merging arrows
with identical response
Erik Poll
Radboud University Nijmegen
27
Example: state machine learning for
merging arrows with
same start & end state
Erik Poll
Radboud University Nijmegen
28
Understanding & comparing implementations
Volksbank
implementation
Rabobank
implementation
Are both implementations correct & secure? And compatible?
Erik Poll
Radboud University Nijmegen
29
State machine inference for this device?
Erik Poll
Radboud University Nijmegen
30
Internet banking with
TLS
transfer € 10.00
to 52.72.83.232
type: 23459876
Erik Poll
Radboud University Nijmegen
31
Internet banking with USB-connected
More secure: display shows transaction details
Also, more user-friendly
transfer € 10.00
to 52.72.83.232
USB
transfer € 10.00
to 52.72.83.232
Erik Poll
Radboud University Nijmegen
32
Security flaw in state machine
Embarrasing security flaw:
attacker can press the OK key via the USB cable
Could we detect such flaws
automatically?
[Arjan Blom et al., Designed to fail, NordSec 2012]
Erik Poll
Radboud University Nijmegen
33
State machine learning using
Erik Poll
Radboud University Nijmegen
34
Erik Poll
Radboud University Nijmegen
35
State machine of old vs new device
Erik Poll
Radboud University Nijmegen
36
Would you trust this to be secure?
Complete state machine of new device,
using richer alphabet of USB commands
[Georg Chalupar et al., Automated reverse engineering using Lego, WOOT 2014]
Erik Poll
Radboud University Nijmegen
37
State machine learning for TLS
Model learned for the NSS implementation
Comforting to see this is so simple!
Erik Poll
Radboud University Nijmegen
38
TLS... according to GnuTLS
Erik Poll
Radboud University Nijmegen
39
TLS... according to OpenSSL
Erik Poll
Radboud University Nijmegen
40
TLS... according to Java Secure Socket Exension
Erik Poll
Radboud University Nijmegen
41
Which TLS state machines are secure?
[Joeri de Ruiter and Erik Poll, Protocol State Fuzzing of TLS implementations, Usenix 2015]
Erik Poll
Radboud University Nijmegen
42
Conclusions
LangSec provides an interesting look at input problems
• explains root causes & a way to avoid theseS
State machines are great specification formalism
• to avoid ambiguities
• to help the programmer
• special case of LangSec,
using state machine to express input language
Extracting state machines from code is great tool!
• analysis of existing implementations
• obtaining reference state machines for existing protocols
Erik Poll
Radboud University Nijmegen
43
Thanks for you attention!
Erik Poll
Radboud University Nijmegen
44