Overview

Compiler
Baojian Hua
[email protected]
Compiler




Compilers are the most fundamental
developer tool
Long history of study in CS
Almost all of the key ideas in compiler design
are also important in other problem domains
You will end up using compiler design
principles in almost every software
development project
Compilers are Fundamental

Compilers are fundamental elements of
computer systems



Every new machine architecture defines
standard calling conventions and comes
with a compiler
Chip performance is measured by using a
compiler
Even embedded processors are now so
complex that compilers are necessary
Still learning about compilers



Compiler design is still an active area of
research
In recent years, there has been a tremendous
amount of activity in various forms of security
and safety-related compiling
We will do some reading of recent research
papers, in addition to working on our own
compiler projects
What is a Compiler?



A compiler translates source programs
into target programs.
In a high-level source language,
programs specify both static (compile-time) and dynamic (run-time)
computations.
Usually, the target language specifies
only dynamic computations
Static and dynamic
source program -> compiler (static computations) -> target program
target program -> machine (dynamic computations) -> results
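The static/dynamic split can be sketched in a few lines. This is a minimal illustration, assuming a toy expression language invented here (tuples for integer literals, variables, and addition) and an imaginary stack machine; the names `compile_expr` and `run` are hypothetical, not part of the course projects. The compiler does the static work (rejecting undeclared variables, emitting instructions) and the machine does the dynamic work (computing a result from run-time values).

```python
# Toy illustration (hypothetical language, not from the course):
# expressions are ("num", n), ("var", name), or ("add", e1, e2).

def compile_expr(expr, declared):
    """Static stage: check variables and emit stack-machine instructions."""
    kind = expr[0]
    if kind == "num":
        return [("PUSH", expr[1])]
    if kind == "var":
        if expr[1] not in declared:
            # A static error: reported before the program ever runs.
            raise NameError(f"undeclared variable: {expr[1]}")
        return [("LOAD", expr[1])]
    if kind == "add":
        left = compile_expr(expr[1], declared)
        right = compile_expr(expr[2], declared)
        return left + right + [("ADD",)]
    raise ValueError(f"unknown expression: {expr!r}")


def run(code, env):
    """Dynamic stage: the 'machine' executes the target program."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "LOAD":
            stack.append(env[instr[1]])  # value known only at run time
        elif instr[0] == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
    return stack.pop()


source = ("add", ("var", "x"), ("num", 2))
target = compile_expr(source, declared={"x"})  # static computations
result = run(target, env={"x": 40})            # dynamic computations
```

Note that the undeclared-variable check happens once, at compile time, no matter how many times the target program is later run.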
Compiler Structure

Compilers are structured in highly
modularized fashion



promotes better correctness, maintenance,
etc
permits easier retargeting for new target
architectures
naturally follows the static/dynamic staging
of high-level source programs
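Compiler structure
string of characters -> lexical analyzer -> sequence of tokens
sequence of tokens -> parser -> abstract syntax tree
abstract syntax tree -> semantic analyzer -> intermediate code
intermediate code -> code generator -> relocatable object code
symbol table (consulted by the phases)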
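The chain of phases can be sketched as three small functions, each consuming the previous phase's output. This is a minimal sketch under stated assumptions: the mini-language (integers, `+`, `*`), the tuple-based tree, and the stack-machine mnemonics are all invented for illustration, and the semantic analyzer and symbol table are omitted for brevity.

```python
# Hypothetical mini-language: integers, '+' and '*', no parentheses.

def lex(text):
    """Lexical analyzer: string of characters -> sequence of tokens."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("INT", int(text[i:j])))
            i = j
        elif ch in "+*":
            tokens.append((ch, ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens


def parse(tokens):
    """Parser: sequence of tokens -> abstract syntax tree.

    Grammar: expr -> term ('+' term)*, term -> INT ('*' INT)*
    """
    pos = 0

    def atom():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != "INT":
            raise SyntaxError("expected an integer")
        node = ("int", tokens[pos][1])
        pos += 1
        return node

    def term():
        nonlocal pos
        node = atom()
        while pos < len(tokens) and tokens[pos][0] == "*":
            pos += 1
            node = ("mul", node, atom())
        return node

    def expr():
        nonlocal pos
        node = term()
        while pos < len(tokens) and tokens[pos][0] == "+":
            pos += 1
            node = ("add", node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree


def generate(tree):
    """Code generator: tree -> instructions for an imaginary stack machine."""
    if tree[0] == "int":
        return [f"push {tree[1]}"]
    return generate(tree[1]) + generate(tree[2]) + [tree[0]]


tokens = lex("1 + 2 * 3")
tree = parse(tokens)
code = generate(tree)
```

Because each phase only depends on the data structure handed to it, any one of them can be replaced or tested in isolation, which is the modularity argument made above.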
The UNCOL argument
Source languages: SML, Java, C, C++, C#
Target machines: x86, Sparc, MIPS, PPC, ARM
One compiler per (language, machine) pair: n×m compilers!
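The UNCOL argument
Source languages: SML, Java, C, OCaml, C#
Target machines: x86, Sparc, MIPS, PPC, ARM
Every language compiles to a shared IR, and the IR compiles to every machine
vs n+m compilers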
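The counting argument can be checked with a short sketch. The front ends and back ends below are placeholder functions that only build strings; they are assumptions for illustration, not real translators. The point is that n front ends plus m back ends, composed through one IR, cover all n×m (language, machine) pairs.

```python
sources = ["SML", "Java", "C", "OCaml", "C#"]
targets = ["x86", "Sparc", "MIPS", "PPC", "ARM"]

# Without a shared IR: one translator per (source, target) pair.
direct_count = len(sources) * len(targets)

# With a shared IR: one front end per source, one back end per target.
# The lambdas are stand-ins that just tag the program text.
front_ends = {s: (lambda prog, s=s: f"IR({s}:{prog})") for s in sources}
back_ends = {t: (lambda ir, t=t: f"{t}[{ir}]") for t in targets}
ir_count = len(front_ends) + len(back_ends)


def compile_via_ir(source_lang, target_arch, program):
    """Compose a front end and a back end through the shared IR."""
    return back_ends[target_arch](front_ends[source_lang](program))


# Every pair is still reachable by composition.
covered = {(s, t) for s in sources for t in targets
           if compile_via_ir(s, t, "main")}
```

With five languages and five machines this is 25 translators versus 10, and adding a sixth language costs one new front end instead of five new compilers.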
Compiler phases, more detailed…
Source program -> lex -> Tokens -> parse -> Reductions -> parsing actions -> Abstract syntax
Abstract syntax -> semantic analysis -> Translate -> translate -> IR trees
IR trees -> canonicalize -> IR trees -> instruction selection -> Assem
Assem -> control flow analysis -> Flow Graph -> register allocation -> Register assignment
Register assignment -> code emission -> Assembly code -> assembler -> Relocatable code
Relocatable code -> linker -> Machine code
Optimizations




The most important function of a compiler is
code optimization
For modern architectures, register allocation
is the most important optimization
But there are many, many other optimizations
as well
A serious design challenge is how to order
the optimizations
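Why ordering matters can be seen with just two passes. This is a minimal sketch, assuming a made-up straight-line instruction format (`("const", dst, value)` and `("add", dst, a, b)`) and two deliberately simple passes; real compilers use far richer IRs and analyses.

```python
def propagate(code):
    """Constant propagation: replace names bound to constants by their values."""
    known, out = {}, []
    for instr in code:
        if instr[0] == "const":
            _, dst, value = instr
            known[dst] = value
            out.append(instr)
        else:  # ("add", dst, a, b)
            _, dst, a, b = instr
            a = known.get(a, a)
            b = known.get(b, b)
            known.pop(dst, None)  # dst no longer holds a known constant
            out.append(("add", dst, a, b))
    return out


def fold(code):
    """Constant folding: evaluate additions whose operands are both literals."""
    out = []
    for instr in code:
        if (instr[0] == "add"
                and isinstance(instr[2], int)
                and isinstance(instr[3], int)):
            out.append(("const", instr[1], instr[2] + instr[3]))
        else:
            out.append(instr)
    return out


# x = 2; y = x + 3
program = [("const", "x", 2), ("add", "y", "x", 3)]

fold_then_propagate = propagate(fold(program))
propagate_then_fold = fold(propagate(program))
```

Folding first finds nothing to fold, so the addition survives; propagating first exposes `2 + 3` to the folder, which reduces it to a constant. One common response is to rerun passes until nothing changes, at the cost of compile time.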
Declarative Specifications



Another interesting aspect of compilers
is the level of automation
Some phases are automatically
generated from declarative
specifications
Most of this is derived directly from CS
theory, such as automata theory and
type theory
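As an illustration of a phase driven by a declarative specification, the sketch below builds a lexer from a table of token names and regular expressions. The token set and the `make_lexer` helper are assumptions made up for this example; lexer generators in the lex family work from a similar kind of table, though they compile it to an automaton rather than delegating to a regex library.

```python
import re

# Declarative part: what the tokens look like, not how to scan them.
TOKEN_SPEC = [
    ("INT", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
]


def make_lexer(spec):
    """Generate a lexer function from a (name, regex) table."""
    pattern = re.compile(
        "|".join(f"(?P<{name}>{regex})" for name, regex in spec)
    )

    def lexer(text):
        tokens, pos = [], 0
        while pos < len(text):
            match = pattern.match(text, pos)
            if match is None:
                raise SyntaxError(
                    f"unexpected character {text[pos]!r} at {pos}"
                )
            if match.lastgroup != "SKIP":  # drop whitespace
                tokens.append((match.lastgroup, match.group()))
            pos = match.end()
        return tokens

    return lexer


tokenize = make_lexer(TOKEN_SPEC)
tokens = tokenize("x = y1 + 42")
```

Changing the language's tokens means editing the table only; the scanning loop is never touched.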
Every phase is different


All in all, every phase presents unique
challenges, and makes use of different
(math) concepts, data structures, and
algorithms
A major aspect of compiler design,
therefore, is how to synthesize all of
this into a coherent, reliable, and robust
system
How This Course Works
Structure of this course

Lectures: Monday, 2:00-4:00
Readings: Textbook & research papers
Exercises & quizzes: paper+pencil exercises + 2 quizzes
Projects: development of a compiler from scratch
Online Resources

Web site: http://staff.ustc.edu.cn/~bjhua/courses/spring10
Critical course information:
course policies and schedule of lectures
readings and exercises
project information
development resources
discussion boards
late-breaking announcements
some lecture notes
Read the web site every day!
Course staff resources

TAs: Zhong Zhuang ([email protected])
Me: [email protected]
See the webpage for office hours, etc.
The best way to get quick answers is to
use email
Textbooks & Reference




Modern Compiler Implementation in ML
(tiger book)
Compilers: Principles, Techniques, and
Tools (dragon book)
Advanced Compiler Design and
Implementation (whale book)
Engineering a Compiler (ark book)
Projects

7 projects planned



a trivial warm-up
Worth a combined total of 70/100 points
Each project involves developing:


a complete working compiler component
plus test programs

Each project will build on the previous one

You should work by yourself
Project mechanics

Projects due at midnight on specified date
handed in automatically
Test programs due one week prior
Test programs will be published for all to use
Your project score is the percentage of all
test programs that your compiler passes
some TA discretion is allowed
Grading your work


Homework exercises to be turned in at lecture, to be
graded by TAs
For projects:

It must be possible to build and execute your compilers
automatically
This means following the directions given on the webpage
precisely
Makefiles, test file formats, etc.
We officially support SML
All compilers must target the x86 by generating assembly
code that can be assembled by gcc running under Linux
Coding style


For grading purposes, we will not read your
code
But you will be living with your code all term,
so attention paid to commenting, good
structure, and (especially) good modularity
will be critical to staying sane!

Also, you should understand your compiler
completely


Think: Extreme Programming
Or at least do detailed design and interface
development
Summary of Grading

70% for projects
determined by successful tests of your
compiler on test programs
10% for homework exercises
given roughly every other week, to be
handed in to TAs during lecture
20% for midterm and final quizzes
Late policy

Firm due dates, so you should be able
to plan and manage this


hence, late submissions generally not
accepted
See me if something serious comes up
that causes you to need more time
About cheating

You are required to do your own project
work
read the cheating and collaboration policy on the
Blackboard, if you want a course grade

We may use code-similarity checkers

Collaboration is OK!


Sharing ideas, approaches, limited debugging help,
etc. are all good
But you must write your own code
Choice of Programming
Language

Since we will not read your code, any
language can be used


but we must be able to build and test your
compiler automatically
SML is officially supported

see webpage for more details
Choice of Language


SML provides major advantages for
compiler construction
If you are unfamiliar with SML, then
probably best to stick with Java


But pay extra-extra-extra-special attention
to good modularity
Will say more about this during the term
Why SML?



Each project will extend/enhance the
previous project
In SML, changes to an interface or module
will cause the SML compiler to check all other
modules that depend on it
In Java, changes to a class often do not
cause the Java compiler to complain, even if
the changes affect other classes
Why SML? (cont'd)



As a result, in Java, you will often be forced
to test your changes by executing/testing
your project compiler
This is slow, painful, error-prone; and you get
almost no help in locating the source of bugs
If you use Java, it is super-important to be
very very disciplined


See the basic principles in Ch.1!
It is probably important to make good use of the
visitor pattern
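For readers who have not met the visitor pattern, here is a minimal sketch of its shape. It is written in Python for brevity, and the node classes (`Num`, `Add`) and visitors (`Evaluator`, `Printer`) are hypothetical names invented for this example; a Java version would follow the same structure with an explicit visitor interface.

```python
class Num:
    """Leaf node: an integer literal."""
    def __init__(self, value):
        self.value = value

    def accept(self, visitor):
        return visitor.visit_num(self)


class Add:
    """Interior node: the sum of two subtrees."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def accept(self, visitor):
        return visitor.visit_add(self)


class Evaluator:
    """One traversal: compute the value of the tree."""
    def visit_num(self, node):
        return node.value

    def visit_add(self, node):
        return node.left.accept(self) + node.right.accept(self)


class Printer:
    """Another traversal over the same node classes: pretty-print the tree."""
    def visit_num(self, node):
        return str(node.value)

    def visit_add(self, node):
        return f"({node.left.accept(self)} + {node.right.accept(self)})"


tree = Add(Num(1), Add(Num(2), Num(3)))
value = tree.accept(Evaluator())
text = tree.accept(Printer())
```

Each compiler pass becomes one visitor class, so adding a pass does not require editing the tree classes. In Java, declaring the visitor as an interface also means that adding a new node kind forces every pass to be updated before the project compiles again, which recovers some of the checking described above for SML.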
Other Development Resources


All projects will develop compilers that
target the Intel x86 architecture
x86 development tools will be GNU-based, running under Linux

For some development, can use



MinGW system for Windows (but note
incompatibilities with Linux gcc)
Mac OS X (with development tools installed)
See the webpage for details
Summary



This is intended to be a fun and
engaging project-oriented class
By the end of the term, you will have
implemented a serious compiler for a
nontrivial programming language!
Stay engaged, and pay attention to
coding style and modularity, and it
will be fun and profitable
Last Thing



Get the textbook
Read the online SML book
Bring your laptop to class next time