Computer Science 210
Computer Organization
Building an Assembler
Part IV: The Second Pass, Syntax Analysis,
and Code Generation
The Second Pass
[Figure: assembler pipeline. Components shown: Text file, CharacterIO, Line stream, Scanner, Token stream, First Pass, Second Pass. Tools: Opcode table, Symbol table. Outputs: Source program listing and error messages (file and/or terminal), Sym file, Bin file.]
What the Second Pass Does
• Scans through the lines of code and
performs syntax analysis
• Translates each line of code to a 16-bit
binary instruction (or data values when
.FILL, .STRINGZ, and .BLKW appear)
Implementation: The Data
#define FILL_ZERO "0000000000000000"
#define SIX_ZEROS "000000"
static int   spAddress;         // The address counter
static int   spNotDone;         // More instructions?
static token spToken;           // The current token
static FILE* binfile;           // The output file
static char  outputBuffer[17];  // The current binary instruction
The Top-Level Function
// Initializes the data, gets the first instruction, gets the
// first token, and calls program()
void secondPass(FILE* infile, FILE* outfile, FILE* bfile){
binfile = bfile;
outputBuffer[16] = 0;
initScanner(infile, outfile);
spAddress = DEFAULT_START_ADDRESS;
spNotDone = nextInstruction();
spToken = nextToken();
program();
}
Implementation: Second Pass Tools
• Define some utility functions to
– Output a line of binary code
– Process a label reference
– Finish an instruction (scans to end of line, increments
the address counter, gets the next token)
– Check a token’s type and output an error message if
it’s unexpected
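The token-checking utility might look like the sketch below. The slides do not show the body of accept, so the token struct, the tokenType enum, the errorCount tally, the return value, and this version of putError are all assumptions made for illustration.

```c
#include <stdio.h>

// Hypothetical token types; the real scanner defines its own set.
typedef enum { TC_REG, TC_COMMA, TC_INT, TC_NEWLINE } tokenType;

// Hypothetical token record holding only the field used here.
typedef struct {
    tokenType type;
} token;

static token spToken;        // The current token
static int errorCount = 0;   // Hypothetical tally of reported errors

// Stand-in for the assembler's error reporter.
static void putError(const char* message){
    errorCount++;
    fprintf(stderr, "Error: %s\n", message);
}

// Reports an error if the current token is not of the expected type.
// Returns 1 on a match and 0 otherwise.
static int accept(tokenType expected, const char* message){
    if (spToken.type != expected){
        putError(message);
        return 0;
    }
    return 1;
}
```

Returning a flag lets a caller skip code generation after a mismatch; a void version that only reports the error would match the way the parsing functions call it.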
Finishing an Instruction
// The purported end of an instruction has been reached, so
// check for the newline, get the next instruction, get
// its first token, and increment the address counter.
void finishInstruction(){
spToken = nextToken();
accept(TC_NEWLINE, "Too many tokens in instruction.");
spNotDone = nextInstruction();
if (spNotDone){
spAddress++;
spToken = nextToken();
}
}
The Parsing Functions
• Each syntax rule in the EBNF grammar
translates to a parsing function
• Each function assumes that the current token
is its start symbol
• Each function calls finishInstruction
as its last step
The Prototypes
// Parsing function prototypes
void program();
void instruction();
void orig_ins();
void add_or_and_ins();
void blkw_ins();
void br_ins();
void fill_ins();
void jmp_ins();
void jsr_ins();
void jsrr_ins();
void ld_ldi_lea_st_sti_ins();
void ldr_or_str_ins();
void not_ins();
void ret_or_rti_ins();
void stringz_ins();
void trap_ins();
Instructions with the same format differ only in the leading token
Parsing with the Top-Level Rule
// program = [ orig-directive ] { [ label ] instruction } ".END"
void program(){
orig_ins();
while (spNotDone && spToken.type != TC_END)
instruction();
accept(TC_END, ".END expected.");
}
We stop when .END is reached or there are no more instructions
accept checks the current token’s type and reports an error if it is not the expected one
Parsing the not Instruction
// not-ins = "NOT" register "," register
void not_ins(){
strcpy(outputBuffer, spToken.binary);
spToken = nextToken();
accept(TC_REG, "Register expected.");
strcat(outputBuffer, spToken.binary);
spToken = nextToken();
accept(TC_COMMA, "Comma expected.");
spToken = nextToken();
accept(TC_REG, "Register expected.");
strcat(outputBuffer, spToken.binary);
strcat(outputBuffer, "111111");   // Bits [5:0] of NOT are all ones
outputBinary();
finishInstruction();
}
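As a worked example of how the output buffer is assembled, the sketch below encodes NOT R1, R2 without the scanner. It assumes the LC-3 format for NOT (opcode 1001, destination register, source register, then six 1 bits). The names registerBits and encodeNot are hypothetical and stand in for the binary strings the tokens carry.

```c
#include <string.h>

// Hypothetical helper: writes the 3-bit code for register 0..7 into out.
void registerBits(int reg, char out[4]){
    int i;
    for (i = 2; i >= 0; i--)
        out[2 - i] = ((reg >> i) & 1) ? '1' : '0';
    out[3] = '\0';
}

// Builds the 16-character encoding of NOT DR, SR in buffer,
// which must hold 17 characters.
void encodeNot(int dr, int sr, char buffer[17]){
    char bits[4];
    strcpy(buffer, "1001");       // NOT opcode
    registerBits(dr, bits);
    strcat(buffer, bits);         // Destination register
    registerBits(sr, bits);
    strcat(buffer, bits);         // Source register
    strcat(buffer, "111111");     // Bits [5:0] are all ones
}
```

Calling encodeNot(1, 2, buffer) leaves "1001001010111111" in buffer: 1001 for the opcode, 001 for R1, 010 for R2, and the six trailing ones.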
Parsing the .FILL Directive
// fill-ins = ".FILL" integer-literal
void fill_ins(){
spToken = nextToken();
accept(TC_INT, "Integer literal expected.");
strcpy(outputBuffer, signedBinary(spToken.intValue, 16));
outputBinary();
finishInstruction();
}
Should add a check on the bounds of an integer fill value!
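One possible form of that bounds check is sketched below. The function name fitsInWord is hypothetical, and accepting both the signed and the unsigned reading of a 16-bit literal is an assumption; a stricter assembler might allow only one of the two ranges.

```c
// Returns 1 if value can be stored in one 16-bit word, 0 otherwise.
// Assumption: both the signed range (-32768..32767) and the unsigned
// range (0..65535) are acceptable for a .FILL literal.
int fitsInWord(int value){
    return value >= -32768 && value <= 65535;
}
```

fill_ins could call this right after accepting the TC_INT token and report an error instead of emitting a truncated word.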
Parsing the .BLKW Directive
// blkw-ins = ".BLKW" integer-literal
void blkw_ins(){
strcpy(outputBuffer, FILL_ZERO);
spToken = nextToken();
accept(TC_INT, "Integer literal expected.");
int i;
for (i = 1; i <= spToken.intValue; i++)
outputBinary();
spAddress += spToken.intValue - 1;
finishInstruction();
}
Should add a check on the memory available for the
given number of words!
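A minimal sketch of such a memory check follows. MAX_ADDRESS and blockFits are hypothetical names, and treating xFFFF as the last usable address is an assumption; a real assembler may reserve part of the address space.

```c
#include <assert.h>

#define MAX_ADDRESS 0xFFFF   // Assumption: last address of a 16-bit address space

// Returns 1 if count words starting at address fit in memory, 0 otherwise.
// A count of zero or less is rejected as well.
int blockFits(int address, int count){
    assert(address >= 0 && address <= MAX_ADDRESS);
    if (count <= 0)
        return 0;
    // Written as a subtraction so a very large count cannot overflow.
    return count <= MAX_ADDRESS - address + 1;
}
```

blkw_ins could test blockFits(spAddress, spToken.intValue) before its output loop and report an error when the block does not fit.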
Parsing the .STRINGZ Directive
// stringz-ins = ".STRINGZ" string-literal
void stringz_ins(){
spToken = nextToken();
accept(TC_STRING_LIT, "String literal expected.");
char* lit = spToken.source;
int i;
for (i = 0; i < spToken.intValue; i++){
char ch = lit[i];
strcpy(outputBuffer, unsignedBinary(ch, 16));
outputBinary();
}
strcpy(outputBuffer, FILL_ZERO);
outputBinary();
spAddress += spToken.intValue - 2;
finishInstruction();
}
Should add a check on the memory available for the characters!
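The sketch below shows the words a .STRINGZ directive produces, with a capacity check folded in. It is independent of the scanner: wordBits and stringzWords are hypothetical names, and the text argument is assumed to be the string's characters without the surrounding quotes.

```c
#include <string.h>

// Hypothetical helper: writes value as a 16-bit unsigned binary string.
void wordBits(unsigned value, char out[17]){
    int i;
    for (i = 15; i >= 0; i--)
        out[15 - i] = ((value >> i) & 1u) ? '1' : '0';
    out[16] = '\0';
}

// Fills words with one entry per character of text plus a zero terminator.
// Returns the number of words written, or -1 if maxWords is too small.
int stringzWords(const char* text, char words[][17], int maxWords){
    int length = (int)strlen(text);
    int i;
    if (length + 1 > maxWords)
        return -1;
    for (i = 0; i < length; i++)
        wordBits((unsigned char)text[i], words[i]);
    wordBits(0, words[length]);    // Terminating zero word
    return length + 1;
}
```

For the text Hi this yields three words: one for H, one for i, and the zero terminator. The address counter therefore has to advance by the string length plus one.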
Parsing the LD Instruction
// ld-ins = "LD" register "," label
void ld_ldi_lea_st_sti_ins(){
strcpy(outputBuffer, spToken.binary);
spToken = nextToken();
accept(TC_REG, "Register expected.");
strcat(outputBuffer, spToken.binary);
spToken = nextToken();
accept(TC_COMMA, "Comma expected.");
spToken = nextToken();
processLabel(9);
outputBinary();
finishInstruction();
}
LD, LDI, LEA, ST, and STI all have the same format
Processing a Reference to a Label
// Looks up the label, computes its PC-relative offset, and appends
// that offset to the output buffer as a signed bit string of numBits bits
void processLabel(int numBits){
accept(TC_LABEL, "Label expected.");
if (spToken.type == TC_LABEL){
int labelAddress = findSymbol(spToken.source);
if (labelAddress == -1)
putError("Undeclared label.");
else{
int offset = labelAddress - (spAddress + 1);
strcat(outputBuffer, signedBinary(offset, numBits));
}
}
}
Make sure there is a label, make sure it’s declared, and compute the numBits-bit
offset as the label’s address minus the incremented PC (spAddress + 1)
Should add a check on the limits of the offset!
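The limit check and the conversion to a signed bit string can be sketched as follows. offsetFits and toSignedBits are hypothetical names; the slides use signedBinary but do not show it, so this is only one way it might work.

```c
#include <assert.h>
#include <string.h>

// Returns 1 if offset fits in a two's complement field of numBits bits.
int offsetFits(int offset, int numBits){
    assert(numBits > 0 && numBits <= 16);
    int low = -(1 << (numBits - 1));
    int high = (1 << (numBits - 1)) - 1;
    return offset >= low && offset <= high;
}

// Writes the low numBits bits of value to out as a two's complement
// bit string. out must hold numBits + 1 characters.
void toSignedBits(int value, int numBits, char* out){
    assert(numBits > 0 && numBits <= 16);
    unsigned bits = (unsigned)value;
    int i;
    for (i = numBits - 1; i >= 0; i--)
        out[numBits - 1 - i] = ((bits >> i) & 1u) ? '1' : '0';
    out[numBits] = '\0';
}
```

For example, an instruction at x3001 that refers to a label at x3005 has offset x3005 - (x3001 + 1) = 3, which a 9-bit field holds as 000000011. A 9-bit field covers offsets from -256 to 255, so processLabel(9) could report an error whenever offsetFits(offset, 9) returns 0.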
For Friday
Review and Wrapup