20060411_scs

Secure Compiler Seminar 4/11
Visions toward a Secure Compiler
Toshihiro YOSHINO
<[email protected]>
(D1, Yonezawa Lab.)
Talk Agenda
Brief Introduction to TAL and PCC
Introduction to My Master’s Thesis
Visions toward a Secure Compiler

Brief Introduction to TAL and PCC
Background
Program verification
= Mathematically ensuring that a program has certain properties
Useful for security
• Memory access safety, information flow analysis, …
Verifying low-level code directly reduces the TCB (Trusted Computing Base)
• High-level code must be compiled after it is verified
⇒ We must trust the compiler
• Assemblers are much simpler than compilers
Assemblers are much simpler than compilers
Current Techniques and Problems
Code signing
Based on public key cryptography
Can prove the authenticity of code
Cannot prove its safety by itself
Signature matching
Uses a dictionary of malicious patterns and matches target programs against it
Employed in many antivirus systems
Passing does NOT mean safety
• Often unable to detect very new viruses
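To make the point that neither technique proves safety more concrete, here is a toy Java sketch. It uses the standard java.security API for code signing and a naive substring dictionary for signature matching. The program strings, the dictionary, and the class and method names are hypothetical, and real antivirus engines match much richer patterns.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;
import java.util.List;

// Toy illustration: code signing and signature matching both accept harmful code.
class CurrentTechniques {
    // Code signing: proves the bytes come unchanged from the private-key holder.
    static boolean signedAndVerified(byte[] code) {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            KeyPair keys = gen.generateKeyPair();

            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(keys.getPrivate());
            signer.update(code);
            byte[] sig = signer.sign();

            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(keys.getPublic());
            verifier.update(code);
            return verifier.verify(sig); // says nothing about what the code does
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Signature matching: reject code containing any known malicious pattern.
    static boolean passesScan(String code, List<String> dictionary) {
        for (String pattern : dictionary) {
            if (code.contains(pattern)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String harmful = "wipe_disk();";          // hypothetical program text
        String variant = "erase_all_sectors();";  // same intent, new pattern
        List<String> dictionary = List.of("wipe_disk()");
        System.out.println(signedAndVerified(harmful.getBytes(StandardCharsets.UTF_8))); // true
        System.out.println(passesScan(harmful, dictionary)); // false: known pattern
        System.out.println(passesScan(variant, dictionary)); // true: new variant slips through
    }
}
```

The signature check succeeds for harmful code, and the scanner misses a variant that is not yet in its dictionary. This is why the next slides turn to proofs about the code itself.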
Proof-Carrying Code
[Necula et al. 1997]

Technique for safe execution of untrusted code
The code consumer does not need to trust the producer
Code is distributed with a proof of its safety
The producer creates a proof
The consumer verifies the proof against its own security policy


Low cost for the consumer
The consumer only has to verify the proof
• For example, by type checking
Tamper-proof
If code passes the check, it does NO harm even if it has been modified
• If a modification makes the code fail the check, the code is not run, so the consumer stays safe
• Otherwise, the code still obeys the consumer’s security policy
Typed Assembly Language
[Morrisett et al. 1999]

Extends a conventional assembly language with static type checking
An instance of Proof-Carrying Code
By type checking, it can guarantee
Memory access safety
• The program never accesses memory outside the area allocated for it
Interface consistency
• Function arguments and return values agree with their declared types
etc.
TAL System Illustrated
[Figure: code with type information is passed to the TAL System at the Code Consumer, which consists of a Type Checker, an Assembler, and a Linker]
A Brief Example of a TAL Program
Type information (used to typecheck the program) annotates the labels;
the program code is the same as in conventional assembly languages.

fact: {eax: B4}
    movl %eax, %ecx
    movl $1, %eax
loop: {eax: B4, ecx: B4}
    mull %ecx
    decl %ecx
    cmpl $0, %ecx
    jg loop
end: {eax: B4}
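The checking idea behind the example can be sketched as a toy Java program. This is not the TALx86 type system. It assumes a single register type (B4, read here as a 4-byte integer), models only the instructions that appear in the example, and treats each label annotation as the set of registers that must be typed on entry, both for jumps and for falling through. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy checker in the spirit of TAL (not the TALx86 type system). Each label carries a
// precondition: the set of registers that must hold a B4 value on entry.
class ToyTal {
    interface Line {}
    record Label(String name, Set<String> pre) implements Line {}
    record Insn(String op, String src, String dst) implements Line {}

    static boolean isReg(String operand) { return operand.startsWith("%"); }

    static boolean check(List<Line> program) {
        Map<String, Set<String>> pre = new HashMap<>();
        for (Line l : program) {
            if (l instanceof Label lab) pre.put(lab.name(), lab.pre());
        }
        Set<String> env = null; // registers known to hold B4; null before the first label
        for (Line l : program) {
            if (l instanceof Label lab) {
                // Falling through into a label must satisfy its precondition.
                if (env != null && !env.containsAll(lab.pre())) return false;
                env = new HashSet<>(lab.pre()); // afterwards only the annotation is known
                continue;
            }
            if (env == null) return false; // instruction outside any labeled block
            Insn i = (Insn) l;
            switch (i.op()) {
                case "movl" -> { // a register source must already be typed
                    if (isReg(i.src()) && !env.contains(i.src())) return false;
                    env.add(i.dst());
                }
                case "mull" -> { // %edx:%eax := %eax * src
                    if (!env.contains("%eax") || !env.contains(i.src())) return false;
                    env.add("%edx");
                }
                case "decl" -> { if (!env.contains(i.src())) return false; }
                case "cmpl" -> { if (!env.contains(i.dst())) return false; }
                case "jg" -> { // the jump target's precondition must hold here
                    Set<String> target = pre.get(i.src());
                    if (target == null || !env.containsAll(target)) return false;
                }
                default -> { return false; } // instruction not modeled
            }
        }
        return true;
    }

    // The factorial example; initEcx = false drops "movl %eax, %ecx".
    static List<Line> fact(boolean initEcx) {
        List<Line> p = new ArrayList<>();
        p.add(new Label("fact", Set.of("%eax")));
        if (initEcx) p.add(new Insn("movl", "%eax", "%ecx"));
        p.add(new Insn("movl", "$1", "%eax"));
        p.add(new Label("loop", Set.of("%eax", "%ecx")));
        p.add(new Insn("mull", "%ecx", null));
        p.add(new Insn("decl", "%ecx", null));
        p.add(new Insn("cmpl", "$0", "%ecx"));
        p.add(new Insn("jg", "loop", null));
        p.add(new Label("end", Set.of("%eax")));
        return p;
    }

    public static void main(String[] args) {
        System.out.println(check(fact(true)));  // true
        System.out.println(check(fact(false))); // false: %ecx untyped on entry to loop
    }
}
```

If `movl %eax, %ecx` is dropped, the check fails, because %ecx has no type when control falls into `loop`.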
Related Work:
TALK, TOS [Maeda, 2005]

TALK: TAL for Kernel
Morrisett et al. use a garbage collector for memory management in TAL
In an OS, a GC cannot be assumed
• Memory management (malloc/free) must be implemented
TOS: Typed Operating System
An experimental OS written in TALK
Introduction to My Master’s Thesis
My Work for the Master’s Thesis
“A Framework Using a Common Language to Build Program Verifiers for Low-Level Languages”
To help developers of program verifiers
To serve as a common basis for verifying low-level programs
• Such as assembly and machine languages
Motivation: Verifiers Are Hard to Develop
Especially for low-level languages…
Complex semantics
The semantics of each instruction is complex
There are many instructions in a language
Low portability
Low-level languages heavily depend on the underlying architecture
Accordingly, the entire verifier also depends on the underlying architecture

Our Idea
Split a verifier into three parts
1. Design a common language,
2. Translate the target program into that language, and
3. Verify the translated program
These parts are explicitly independent of each other
Thus we can replace them easily
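The three-part split can be sketched as Java interfaces. This is a hypothetical illustration, not the framework's actual API. An ADL program is reduced to a list of command strings, and the toy translator and toy verification logic exist only to show that either part can be replaced without touching the other.

```java
import java.util.List;

// Hypothetical interfaces for the three independent parts; names are illustrative.
class Framework {
    // (1) Common language: here an ADL program is just a list of command strings.
    record AdlProgram(List<String> commands) {}

    // (2) Translator from a source language P into the common language.
    interface Translator<P> { AdlProgram translate(P source); }

    // (3) Verification logic over common-language programs.
    interface VerificationLogic { boolean verify(AdlProgram program); }

    // A verifier is the composition; each part can be replaced independently.
    record Verifier<P>(Translator<P> translator, VerificationLogic logic) {
        public boolean check(P source) { return logic.verify(translator.translate(source)); }
    }

    // Toy instance: the "source language" has only nop; anything else becomes error.
    static Verifier<List<String>> toyVerifier() {
        Translator<List<String>> translator = src -> new AdlProgram(
            src.stream().map(s -> s.equals("nop") ? "nop;" : "error;").toList());
        VerificationLogic noError = p -> !p.commands().contains("error;");
        return new Verifier<>(translator, noError);
    }

    public static void main(String[] args) {
        Verifier<List<String>> v = toyVerifier();
        System.out.println(v.check(List.of("nop", "nop"))); // true
        System.out.println(v.check(List.of("nop", "hlt"))); // false
    }
}
```

To support another architecture, only a new `Translator` would be written. The `VerificationLogic` is reused as-is.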
[Figure: Target Program → Translator → Translated Program (2) → Verifier (Verification Logic (3); Semantics of Common Language (1)) → Result: Success/Fail]
How Do We Solve the Problems?
Coping with complex semantics
Only translators deal with the semantics of the source language
Translators are reusable
• Once a source language has been described, the description can be reused
Improving portability
The verification logic is also reusable
• Once implemented, it can be used for other architectures simply by replacing the translator
[Figure: the same pipeline with a program in another language as input; the Translator is replaced, while the Verification Logic and the Semantics of Common Language are reused]
Overview of the Work
Designed a framework for building program verifiers
Designed a common language, ADL
Discussed the correctness of translators
Proved that the verified properties are preserved by translation
Implemented the framework in Java
ADL: A Common Language
Design Concept
ADL: Architecture Description Language
Based on observations of many architectures
Expressiveness
• Data is stored in registers and memory, and the program manipulates it
• Jumps alone are sufficient for control flow
• Arithmetic, logical operations, …
C-like expressions
Conservative semantics
• No need to describe indecent (ill-behaved) programs
• To simplify the semantics
ADL: A Common Language
Overview of the Language
An imperative language that manipulates registers and memory
5 kinds of commands
• nop, error, assignment, goto, if-then-else
More like C than assembly
• Infix operators, parenthesized expressions
• Conditional execution on an arbitrary condition using the if command
Only goto modifies control flow
• Unconditional branch
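The five command kinds can be sketched as a small Java interpreter. This is a hypothetical model, not ADL's actual definition. Registers hold longs, there is no memory, expressions are Java lambdas, and a program is a flat command list whose labels name positions. As in ADL, only `Goto` changes the control flow, while `If` just selects which command runs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Minimal interpreter for ADL's five command kinds. Assumptions (not ADL's actual
// definition): registers hold longs, there is no memory, expressions are Java lambdas,
// and a program is a flat command list whose labels name positions in that list.
class MiniAdl {
    sealed interface Cmd permits Nop, ErrorCmd, Assign, Goto, If {}
    record Nop() implements Cmd {}
    record ErrorCmd() implements Cmd {}
    record Assign(String reg, Function<Map<String, Long>, Long> expr) implements Cmd {}
    record Goto(String label) implements Cmd {}
    record If(Predicate<Map<String, Long>> cond, Cmd then, Cmd otherwise) implements Cmd {}

    // Runs at most maxSteps commands; returns the final registers, or null if error runs.
    static Map<String, Long> run(List<Cmd> code, Map<String, Integer> labels,
                                 Map<String, Long> init, int maxSteps) {
        Map<String, Long> regs = new HashMap<>(init);
        int pc = 0;
        for (int step = 0; step < maxSteps && pc < code.size(); step++) {
            Cmd c = code.get(pc);
            while (c instanceof If i) c = i.cond().test(regs) ? i.then() : i.otherwise();
            if (c instanceof ErrorCmd) return null;
            if (c instanceof Goto g) { pc = labels.get(g.label()); continue; } // only goto jumps
            if (c instanceof Assign a) regs.put(a.reg(), a.expr().apply(regs));
            pc++; // nop and assignment fall through
        }
        return regs;
    }

    // lp:  if %ecx == 0 then goto &end else nop;
    //      %eax = %eax + %ecx;  %ecx = %ecx - 1;  goto &lp;
    // end: goto &end;
    static long sumDown(long n) {
        List<Cmd> code = List.of(
            new If(r -> r.get("%ecx") == 0, new Goto("end"), new Nop()),
            new Assign("%eax", r -> r.get("%eax") + r.get("%ecx")),
            new Assign("%ecx", r -> r.get("%ecx") - 1),
            new Goto("lp"),
            new Goto("end"));
        Map<String, Integer> labels = Map.of("lp", 0, "end", 4);
        return run(code, labels, Map.of("%eax", 0L, "%ecx", n), 10_000).get("%eax");
    }

    public static void main(String[] args) {
        System.out.println(sumDown(4)); // 10
        System.out.println(run(List.of(new ErrorCmd()), Map.of(), Map.of(), 10)); // null
    }
}
```

The step bound matters because ADL programs may end in a self-loop such as `end: goto &end;` rather than a halt command.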
ADL: A Common Language
A Brief Example
x86
data:
    ...
main:
    movl $data, %ebx
    movl $0, %eax
lp:
    addl 0(%ebx), %eax
    movl 4(%ebx), %ebx
    cmpl $0, %ebx
    je   end
    jmp  lp
end:
    jmp  end

ADL
data:
    ...
main:
    %ebx = &data;
    %eax = 0;
    goto &lp;
lp:
    %eax = %eax + *[4](%ebx);
    %ebx = *[4](%ebx + 4);
    if %ebx == &null then
        goto &end
    else goto &lp;
end:
    goto &end;
ADL: A Common Language
Restrictions
ADL has a few restrictions by design
• Code and data are completely separated
• We assume NOTHING about the memory layout of a program
To simplify the semantics
Some programs cannot be expressed
However, most decent programs can still be written under these restrictions
• Discussed in the following slides
ADL: A Common Language > Restrictions
Separation of Code and Data

Do not treat code as data
ADL programs cannot read or write code
We cannot express programs that use dynamic code generation
However, the patterns of generated code are fixed in many cases
⇒ Other solutions are possible
• For example, prepare a function for each code pattern
ADL: A Common Language > Restrictions
Not Assuming a Memory Layout
Casting is prohibited
ADL distinguishes integers from pointers
• In real architectures, pointers are not distinguished from integers
Pointer arithmetic is restricted
Only pointer + integer and pointer - pointer are defined
• Other operations return ‘undetermined’
Sufficient for array/structure operations and offset calculation
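The pointer restrictions can be sketched as a small value domain in Java. This is a hypothetical model with several assumptions. A pointer is a symbolic base (such as a data label) plus an integer offset. Pointer - pointer is defined only when both share a base, since no memory layout is assumed. Integer + pointer is accepted in either operand order. Following the slide literally, pointer - integer is undetermined here and must be written as pointer + (-n).

```java
// Hypothetical model of ADL-style values: integers and pointers are kept apart, a
// pointer is a symbolic base (e.g. a data label) plus an integer offset, and every
// operation the slide does not define yields Undetermined.
class AdlValues {
    sealed interface Value permits Int, Ptr, Undetermined {}
    record Int(long v) implements Value {}
    record Ptr(String base, long offset) implements Value {}
    record Undetermined() implements Value {}

    static Value add(Value a, Value b) {
        if (a instanceof Int x && b instanceof Int y) return new Int(x.v() + y.v());
        // pointer + integer (either operand order)
        if (a instanceof Ptr p && b instanceof Int n) return new Ptr(p.base(), p.offset() + n.v());
        if (a instanceof Int n && b instanceof Ptr p) return new Ptr(p.base(), p.offset() + n.v());
        return new Undetermined(); // e.g. pointer + pointer
    }

    static Value sub(Value a, Value b) {
        if (a instanceof Int x && b instanceof Int y) return new Int(x.v() - y.v());
        // pointer - pointer: only within one base, since no memory layout is assumed
        if (a instanceof Ptr p && b instanceof Ptr q && p.base().equals(q.base())) {
            return new Int(p.offset() - q.offset());
        }
        return new Undetermined(); // e.g. pointers into different objects, pointer - integer
    }

    public static void main(String[] args) {
        Value node = new Ptr("data", 0);
        System.out.println(add(node, new Int(4)));          // Ptr[base=data, offset=4]
        System.out.println(sub(new Ptr("data", 8), node));  // Int[v=8]
        System.out.println(add(node, node));                // Undetermined[]
        System.out.println(sub(node, new Ptr("null", 0)));  // Undetermined[]
    }
}
```

The first line corresponds to `%ebx + 4` in the list-walking example: field offsets inside a node remain expressible, but no absolute address is ever revealed.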
Program Translator

Translates low-level programs into ADL
We must ensure that program translators are correct
Otherwise, we cannot trust the verifier as a whole
Correctness is defined next
Program Translator
What Is Correctness of Program Translation?
Instruction = function over machine states
Correctness = the correspondence between the states of the two machines is preserved by translation
[Figure: the original program takes State to State’; the translated program takes the corresponding State to a State’ that again corresponds]
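Read as a formula, the diagram says roughly the following. The symbols are ours, not the slide's: f is the state function of an original instruction, \hat{f} is the function computed by its translated ADL code (possibly several ADL steps), and \sim is the correspondence between the two machines' states.

```latex
\[
  \forall S,\ \hat{S}.\quad S \sim \hat{S} \;\Longrightarrow\; f(S) \sim \hat{f}(\hat{S})
\]
```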
Program Translator
How to Confirm the Correctness of Translation
If every program results in corresponding states for every input ⇒ the translation is correct
Exhaustive checking is NOT realistic
A theorem prover would be useful
• Automatic proving is future work
• But how do we confirm that the description of the source language itself is correct?
For now, we take an empirical approach
Test several cases using an interpreter
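The empirical approach can be sketched as differential testing of a single instruction. The sketch is hypothetical and makes three assumptions. The source machine has 32-bit registers. The translated code computes over 64-bit values, so the translator must make wrap-around explicit. Plain Java functions stand in for the ADL interpreter. Passing random tests is evidence, not proof.

```java
import java.util.Random;
import java.util.function.LongBinaryOperator;

// Hypothetical differential test: run one instruction under a reference semantics and
// under a candidate translation on random states, comparing via the state correspondence.
class TranslationTest {
    // Reference semantics of "addl %ecx, %eax" on 32-bit registers (wraps around).
    static int x86Addl(int eax, int ecx) { return eax + ecx; }

    // Correspondence between a 32-bit register and a translated (64-bit) register value.
    static long corr(int reg) { return Integer.toUnsignedLong(reg); }

    // Candidate translations of the same instruction.
    static final LongBinaryOperator CORRECT = (eax, ecx) -> (eax + ecx) & 0xFFFFFFFFL;
    static final LongBinaryOperator BUGGY = (eax, ecx) -> eax + ecx; // forgets the wrap-around

    // True if no counterexample is found in `trials` random states (not a proof).
    static boolean agrees(LongBinaryOperator translated, int trials, long seed) {
        Random rnd = new Random(seed);
        for (int i = 0; i < trials; i++) {
            int eax = rnd.nextInt(), ecx = rnd.nextInt();
            long expected = corr(x86Addl(eax, ecx));
            long actual = translated.applyAsLong(corr(eax), corr(ecx));
            if (expected != actual) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agrees(CORRECT, 10_000, 42)); // true
        System.out.println(agrees(BUGGY, 10_000, 42));   // false: overflow cases differ
    }
}
```

A missing overflow rule is exactly the kind of translator bug that random states expose quickly. A bug that only shows up in rare states could still slip through.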
Verification Logic

Verifies the properties of translated programs
A function that takes a program and returns success or failure
Its soundness must be ensured
• This is the task of whoever creates the verification logic
• We do not discuss it further here
Definition: Soundness of a verification logic
Verification logic V: State → Bool
The set {S | V(S)} is closed under step execution
• If V(S), execution never falls into an error state, and
• If V(S) and S→T (→ denotes step execution), then V(T)
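Written out, the two conditions and their consequence look like this. Reading the first bullet as condition (i), that a state satisfying V is not itself the error state, is our interpretation. Induction on the number of steps using (ii), followed by (i), then shows that a run starting from a verified state never reaches the error state.

```latex
\begin{align*}
\text{(i)}\quad    & \forall S.\; V(S) \Rightarrow S \neq \mathit{error} \\
\text{(ii)}\quad   & \forall S, T.\; V(S) \land S \to T \Rightarrow V(T) \\
\text{hence}\quad  & V(S_0) \land S_0 \to S_1 \to \cdots \to S_n
                     \;\Rightarrow\; V(S_n) \;\Rightarrow\; S_n \neq \mathit{error}
\end{align*}
```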
Verification Logic
Soundness of Verification Logic
[Figure: among all machine states, the subset of states S such that V(S); soundness = if V(S) ∧ S→T, then V(T)]
Verification Logic
Program Translation and Verification
We proved the following theorem:
If the program translator is correct, and the verification logic is sound, then
• A closed subset can be defined on the states of the translation source language
⇒ Verification of the original program and verification of the translated program are equivalent
Implementation

Framework
ADL data structures
ADL interpreter
• Used to confirm the correctness of translators
Translator and verification logic interfaces
Translation rule compiler
• Compiles translation rules into Java implementations of translators
And, as a proof of concept,
Translators from Intel x86 and SPARC
A simple type checker
Related Work
Foundational TAL [Crary, 2003]
The TAL type checker is still large
• The TALx86 type checker consists of approx. 23k LoC in O’Caml (!)
The TCB is reduced by using a logical framework
• Designed a language called TALT on the Twelf logical framework [Pfenning et al., 1999]
• Proved GC safety of TALT mechanically
• Correspondence between TALT and realistic architectures is not discussed
The TALT type system is fixed
• Our work allows verification logics to be replaced
Future Work

Automatically confirm the correctness of translation
Automatic testing
• Cooperating with emulators or debuggers
Or, build a model and use a theorem prover
Support dynamic memory allocation
Currently all memory must be allocated statically
Support concurrent programs
Concurrency is not taken into consideration
Concurrency plays an important role when applying the framework to OSes, etc.
Visions toward a Secure Compiler
What Is a Secure Compiler?
A compiler that produces certified code
For example, TAL code as output
Like the Popcorn compiler in TALx86
• Safe dialect of C → TALx86
A compiler that ensures correct compilation (optional)
Like a credible compiler [Rinard, 1999]
Reduces the TCB
Motivation

Infrastructure has been built
TALK, TOS [Maeda, 2005]
Verifier framework [Yoshino, 2006]
Next, we have to build a house on it!
Most people do not want to write low-level code directly
⇒ Secure Compiler
Toward a Secure World
If we built a secure compiler…
Memory-error-free systems
Prevent memory-error-based attacks
• OS kernels, core libraries, network servers…
Writing secure code
Vulnerable code will fail verification
So code security will be improved
More to be discovered…
Tasks to Do

Determine what properties to guarantee
• Memory access safety? Information flow?
Design the verification logic
• Must be mechanically checkable
• Use the verifier framework?
Design the language
Target: TAL-based? ADL?
• ADL can be used as a certified language
• Register allocation is already done, so a simple mapping should be possible…
Source: ???