Lobster Design Journal - Rose

Lobster 16-bit Processor
Design Journal
Brandon Cannaday
Michael Kuehl
Andrew Toth
Scott Turner
September 20th, 2003 (Day 1)
Today marks the beginning of the design process. The first thing we accomplished was choosing a name for the project that was simpler than saying “The Computer Architecture Final Project.” The name agreed upon was Lobster, which has its roots in an inside joke among a small group of people. The project will be referred to as Lobster from this point on.
The first version of the algorithm in assembly language was written today (by Mike). This version of the program is not tailored to Lobster at all; its primary purpose is to give us a basic understanding of the kinds of instructions our assembly language specification will need.
# File: Project.asm
# Written by: Michael Kuehl, 9/18/2003
# Euclid's Algorithm
# Currently 7 registers
# Currently 7 different commands
#
# Register usage
# $zero - 0
# $s0 - A
# $s1 - B
# $s2 - two uses:
#    1) flag for a < b
#    2) temp storage
# $s3 - The constant 1
# $s4 - A (temp)
# $s5 - B (temp)

        .text                        # Text section of the program (as opposed to data).
        .globl main                  # Make MAIN globl so you can refer to it in SPIM.
main:                                # Program starts at MAIN.
                                     # wait for input forever
        ori  $s0, $0, 8              # $s0 = number entered by user
        beq  $s0, $zero, main        # If user input is 0, start again
        ori  $s1, $0, 2              # B ($s1) starts at 2
        ori  $s3, $0, 1              # The Constant 1
        add  $s4, $zero, $s0         # Load A ($s0) into Temp A ($s4)
        add  $s5, $zero, $s1         # Load B ($s1) into Temp B ($s5)
InternalLoop:
        beq  $s5, $zero, ExternalLoop   # Jump to ExternalLoop if Temp B ($s5) equals 0
        slt  $s2, $s4, $s5           # If B Temp > A Temp, $s2 = 1
        beq  $s2, $zero, ELSE        # If $s2 = 0, jump to ELSE
        add  $s2, $zero, $s4         #
        add  $s4, $zero, $s5         # Swap A Temp and B Temp
        add  $s5, $zero, $s2         #
        j    InternalLoop            # Jump to Internal Loop
ELSE:
        sub  $s4, $s4, $s5           # Temp A = Temp A - Temp B
        j    InternalLoop            # Jump to InternalLoop
ExternalLoop:
        beq  $s4, $s3, found         # If TempA = 1, Jump to Found
        add  $s1, $s1, $s3           # Add 1 to B
        add  $s4, $zero, $s0         # Load A ($s0) into Temp A ($s4)
        add  $s5, $zero, $s1         # Load B ($s1) into Temp B ($s5)
        j    InternalLoop            # Jump to InternalLoop
found:
        # display B somehow
        # j main
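For readers who don't read MIPS, here is a rough Python model of what the program above computes. It is a sketch, not code from the project: the function names are made up, and it only mirrors the control flow (swap when Temp A < Temp B, otherwise subtract, then try the next B until the gcd is 1).

```python
# Hypothetical high-level model of the first Project.asm (not project code).

def gcd_by_subtraction(a: int, b: int) -> int:
    """Mirror of InternalLoop/ELSE: swap when a < b, else subtract."""
    while b != 0:
        if a < b:
            a, b = b, a        # swap Temp A and Temp B
        else:
            a = a - b          # Temp A = Temp A - Temp B
    return a                   # Temp A holds the gcd once Temp B reaches 0


def smallest_relative_prime(a: int) -> int:
    """Mirror of ExternalLoop: try B = 2, 3, ... until gcd(A, B) == 1."""
    if a <= 0:
        raise ValueError("A must be a positive integer")
    b = 2                      # B starts at 2
    while gcd_by_subtraction(a, b) != 1:
        b += 1                 # Add 1 to B
    return b


print(smallest_relative_prime(8))   # 8 is the value hard-coded by "ori $s0, $0, 8"
```

Because the subtraction form of Euclid's algorithm only needs compare, subtract, and branch, it maps onto the small instruction set listed next.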
From this program we derived the registers and instructions we need for now (we are still waiting for interrupts to be taught in class).
Register Specifications
Register Name    Number    Usage
$zero            0         The constant 0
$at              1         Reserved for assembler
$v0              2         Results of a procedure
$a0              3         Argument 1 for a procedure
$a1              4         Argument 2 for a procedure
$t0              5         Temporary (not preserved across call)
$t1              6         Temporary (not preserved across call)
$t2              7         Temporary (not preserved across call)
$t3              8         Temporary (not preserved across call)
$s0              9         Saved temporary (preserved across call)
$s1              10        Saved temporary (preserved across call)
$s2              11        Saved temporary (preserved across call)
$s3              12        Saved temporary (preserved across call)
$s4              13        Saved temporary (preserved across call)
$sp              14        Stack pointer
$ra              15        Return address
Assembly Language Specifications
Category            Instruction            Example              Meaning                     Comments
Arithmetic          Add                    add $s1, $s2, $s3    $s1 = $s2 + $s3             Regular addition
                    Subtract               sub $s1, $s2, $s3    $s1 = $s2 - $s3             Regular subtraction
Logical             Shift left logical     sll $s1, $s2, 10     $s1 = $s2 << 10             Shift left by constant
                    Shift right logical    srl $s1, $s2, 10     $s1 = $s2 >> 10             Shift right by constant
Data Transfer       Load lower immediate   lli $s1, 10          $s1 = 10                    Loads constant into lower 8 bits
                    Load upper immediate   lui $s1, 10          $s1 = 10 * 2^8              Loads constant into upper 8 bits
                    Load word              lw $s1, 10($s2)      $s1 = Memory[$s2 + 10]      Word from memory to register
                    Store word             sw $s1, 10($s2)      Memory[$s2 + 10] = $s1      Word from register to memory
Conditional Branch  Branch on equal        beq $s1, $s2, $s3    if ($s1 == $s2) goto $s3    Equal test; instruction address in $s3
                    Branch on not equal    bne $s1, $s2, $s3    if ($s1 != $s2) goto $s3    Not-equal test; instruction address in $s3
                    Set on less than       slt $s1, $s2, $s3    if ($s1 < $s2) goto $s3     Compare less than; instruction address in $s3
Unconditional Jump  Jump                   j $s1                goto $s1                    Jump to address
                    Jump and link          jal $s1              $ra = current; goto $s1     For procedure call
After the initial assembly language specification was created, the next logical step
was to invent some sort of machine language specification.
The first thing to be decided upon was the op-code. Deciding how many bits it should have led to a series of massive arguments, ultimately leading to the defenestration of one of the members. Maybe it wasn’t quite that severe; however, deciding on the number of bits was difficult. A 3-bit op-code would leave 13 bits for the rest of the instruction, but it would allow a total of only 8 instructions. A 4-bit op-code, on the other hand, would allow a total of 16 instructions, but would leave only 12 bits for the rest of the instruction. After realizing that forcing the algorithm into a mere 8 instructions would prove to be very difficult, we decided on a 4-bit op-code.
Since the machine language needs register addresses, the next step was deciding how many bits a register address requires. This time the only argument was between Mike and himself. Apparently there were some fists thrown and some name calling, but there were no witnesses or concrete evidence of this. In the end the register address became a 4-bit number, giving Lobster the ability to accommodate 16 registers for your programming convenience.
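The bit budget behind both decisions can be checked with a few lines of Python. This is an illustrative sketch; `opcode_budget` is a hypothetical helper, and it assumes the 16-bit instruction word used throughout this journal.

```python
# Bit budget for a 16-bit Lobster instruction word (illustrative sketch).
WORD_BITS = 16
REG_BITS = 4
NUM_REGISTERS = 2 ** REG_BITS          # a 4-bit register field addresses 16 registers


def opcode_budget(opcode_bits: int) -> tuple:
    """Return (number of distinct op-codes, bits left for everything else)."""
    return 2 ** opcode_bits, WORD_BITS - opcode_bits


for bits in (3, 4):
    count, remaining = opcode_budget(bits)
    print(f"{bits}-bit op-code: {count} instructions, {remaining} bits left")

# With a 4-bit op-code, three 4-bit register fields exactly fill the remaining 12 bits.
print(opcode_budget(4)[1] == 3 * REG_BITS)
```

The last line is why the 4-bit choice works out neatly: op-code plus three register addresses uses every bit of the word.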
The machine code then split into three different instruction formats, each with a distinct purpose. The first format begins with a 4-bit op-code, followed by three 4-bit register addresses.
Type A
Op-Code  | Register Address | Register Address | Register Address
4 bits   | 4 bits           | 4 bits           | 4 bits
This format is used for any instruction that requires three registers (e.g. add $s1, $s2, $s3). The second instruction format is for dealing with immediate values. It begins with the 4-bit op-code, followed by a 4-bit register address, and lastly an 8-bit value.
Type B
Op-Code  | Register Address | Misc. Value
4 bits   | 4 bits           | 8 bits
This format is for loading an immediate value into a register (e.g. lli $s0, 5). The third and last instruction format is used for jumping. It begins with the 4-bit op-code, followed by a register address, and ends with 0’s. Since jumping only requires a register address, the remaining bits are not needed and are left as 0’s.
Type C
Op-Code  | Register Address | Unused
4 bits   | 4 bits           | 8 bits (0’s)
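As a sketch of how the three formats pack into a 16-bit word, the hypothetical helpers below place each field with shifts and ORs. The example uses register numbers from the register table ($s1 = 10, $s2 = 11, $s3 = 12) and assumes the op-code 0000 that the Day 2 table assigns to add.

```python
# Hypothetical encoders for the Day 1 formats (a sketch, not project code):
#   Type A: op | reg | reg | reg    Type B: op | reg | 8-bit value
#   Type C: op | reg | 00000000


def _field(value: int, bits: int, name: str) -> int:
    """Reject values that do not fit in the field width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return value


def encode_a(op: int, r1: int, r2: int, r3: int) -> int:
    return ((_field(op, 4, "op") << 12) | (_field(r1, 4, "r1") << 8)
            | (_field(r2, 4, "r2") << 4) | _field(r3, 4, "r3"))


def encode_b(op: int, reg: int, value: int) -> int:
    return ((_field(op, 4, "op") << 12) | (_field(reg, 4, "reg") << 8)
            | _field(value, 8, "value"))


def encode_c(op: int, reg: int) -> int:
    # The low 8 bits are unused and stay 0.
    return (_field(op, 4, "op") << 12) | (_field(reg, 4, "reg") << 8)


word = encode_a(0b0000, 10, 11, 12)   # add $s1, $s2, $s3
print(f"{word:016b}")
```

The range checks matter most for Type B: an immediate larger than 255 cannot be represented and has to be split across an upper and a lower load.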
While Mike was coming up with the above information, I (Brandon) began
construction on the webpage. Since boring web pages are boring, I decided to make ours
slightly more interesting to read. The navigation bar on the left will hold links to all the
milestones as well as links to the design journal and assembly language/machine
language specifications, and of course any other relevant information that we deem
important enough to add.
September 23rd, 2003 (Day 2)
Today was slightly less eventful than Day 1. The major accomplishment was modifying the program. The initial program was written in MIPS, but as if it were going to receive input from a user. As of now we do not know how to give PCSPIM an input, or if it is even possible. I (Brandon) suggested modifying the program so that it would run in PCSPIM, which is done simply by hard-coding the input value. When the program was run, it was found to be slightly flawed, so Mike rewrote it so that it ran correctly. By doing this we have tested that our algorithm works and is ready to be converted to our assembly and machine languages.
Mike also created the machine language specification for Lobster today.
Command                 Type    Op-Code
Add                     A       0000
Subtract                A       0001
Shift Left Logical      A       0010
Shift Right Logical     A       0011
Load Lower Immediate    B       0100
Load Upper Immediate    B       0101
Load Word               A       0110
Store Word              A       0111
Branch on Equal         A       1000
Branch on Not Equal     A       1001
Set on Less Than        A       1010
Jump                    C       1011
Jump and Link           C       1100
The machine language specification is essentially an extension of Day 1’s instruction format specification. Op-codes are assigned to each instruction, and the types correspond to the three instruction formats created earlier.
There were minor updates to the web page today as well. The link to milestone 1
now has a page to go to. It includes the assembly language and machine language
specifications.
Most attempts to figure out interrupts have failed, so we are still waiting to go
over them in class.
September 24th, 2003 (Day 3)
Interrupts were taught today! Now that we know them, we realize that many changes need to be made to the design. The goal today is to complete Milestone 1.
Before the changes involving interrupts were made, Mike made changes that make our assembly language easier to implement, but harder to program in. Since we aren’t being graded on the ease of programming our processor, but rather on how well it works, the changes are necessary. One of the easiest ways to make the processor easier to implement is to limit immediate values. There are currently four instructions that use immediate values that can be changed to use registers instead.
- sll is now type A; instead of using an immediate value to determine the shift amount, the value must first be stored in a register.
- lw is now type A; the loaded value is taken from a register.
- sw is now type A; the value is taken from a register.
The jump instruction format, which used to have the op-code followed by the register address, has been modified so the register address is at the end. This was done to match format A: “bne” and “beq” hold the register address they jump to in their last four bits. We decided, for now, that it would probably be simpler to implement if every instruction that jumps has its target address register in the last four bits. This way we may be able to create a generic jump system that works with all instructions that require jumping.
Along with these minor changes came some rather major changes from the original design. The “srl” command was removed to make room for an “or” command. The “or” command is used when the user enters a 16-bit number. Since the user can only enter 8 bits at a time, we need to take the first 8 bits, store them in a register, shift that register left by 8 bits, and then “or” the second 8 bits into the same register. The result is a 16-bit number stored in a single register.
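The shift-then-or sequence can be modeled in Python as follows. The function name is hypothetical, and the model assumes a shift of exactly 8 bits and that each entry is an unsigned 8-bit value.

```python
# Hypothetical model of building a 16-bit input from two 8-bit entries.

def combine_input(first_byte: int, second_byte: int) -> int:
    for b in (first_byte, second_byte):
        if not 0 <= b <= 0xFF:
            raise ValueError("each entry must fit in 8 bits")
    reg = first_byte              # store the first 8 bits in a register
    reg = (reg << 8) & 0xFFFF     # sll by 8, truncated to the 16-bit register width
    reg = reg | second_byte       # "or" the second 8 bits into the low byte
    return reg


print(hex(combine_input(0x12, 0x34)))
```

Since the shift leaves the low byte all zeros, the "or" can never disturb the first entry, which is why "or" works here where "add" would also work but needs an adder.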
A few instructions had to be added when we began dealing with I/O and interrupts. The charts below show the new instruction set and corresponding op-codes.
A-Type Instructions
Instruction         Op-Code
ADD RD, RS, RT      0000
SUB RD, RS, RT      0001
OR RD, RS, RT       0010
SLL RD, RS, RT      0011
LW RD, RS(RT)       0100
SW RD, RS(RT)       0101
BEQ RD, RS, RT      0110
BNE RD, RS, RT      0111
SLT RD, RS, RT      1000
DISP PORT, RS       1001

L-Type Instructions
Instruction         Op-Code
LUI RD, IMM         1010
LLI RD, IMM         1011

J-Type Instructions
Instruction         Op-Code
J RT                1100
JAL RT              1101
MASKI               1110
RFI                 1111
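A minimal assembler sketch for this op-code map is shown below. It is hypothetical (not the assembler Brandon later wrote) and assumes register operands are given as plain numbers from the register table. J-type targets go in the last four bits, as decided above. Two details are inferred from the machine code listing in the Day 3 entry rather than from a written spec: MASKI and RFI carry 1111 (the IEB address) in their low four bits, and `la` appears to expand into an LUI/LLI pair.

```python
# Hypothetical mini-assembler for the Day 3 Lobster op-code map (a sketch).
OPCODES = {
    "ADD": 0b0000, "SUB": 0b0001, "OR": 0b0010, "SLL": 0b0011,
    "LW": 0b0100, "SW": 0b0101, "BEQ": 0b0110, "BNE": 0b0111,
    "SLT": 0b1000, "DISP": 0b1001,                  # A-type
    "LUI": 0b1010, "LLI": 0b1011,                    # L-type
    "J": 0b1100, "JAL": 0b1101,                      # J-type
    "MASKI": 0b1110, "RFI": 0b1111,
}
IEB = 0b1111  # inferred from the listing: MASKI and RFI end in 1111


def assemble(mnemonic: str, *operands: int) -> int:
    """Encode one instruction as a 16-bit integer."""
    name = mnemonic.upper()
    word = OPCODES[name] << 12
    if name == "DISP":                     # DISP PORT, RS -> op | port | rs | 0000
        port, rs = operands
        return word | (port << 8) | (rs << 4)
    if name in ("LUI", "LLI"):             # op | rd | imm8
        rd, imm = operands
        return word | (rd << 8) | (imm & 0xFF)
    if name in ("J", "JAL"):               # op | 0000 | 0000 | rt
        (rt,) = operands
        return word | rt
    if name in ("MASKI", "RFI"):           # op | 0000 | 0000 | IEB
        return word | IEB
    rd, rs, rt = operands                  # remaining A-type: op | rd | rs | rt
    return word | (rd << 8) | (rs << 4) | rt


def la(reg: int, address: int) -> list:
    """Expand the la pseudo-instruction into LUI (high byte) + LLI (low byte)."""
    return [assemble("LUI", reg, (address >> 8) & 0xFF),
            assemble("LLI", reg, address & 0xFF)]


# la $t3, main (main is at 0x1A in the listing), then beq $s2, $zero, $t3
for w in la(8, 0x1A) + [assemble("BEQ", 11, 0, 8)]:
    print(f"{w:016b} {w}")    # binary and decimal forms
```

Keeping the jump-target register in the last four bits means BEQ, BNE, SLT, J, and JAL all read their target from the same field, which is the generic jump idea described earlier.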
The program was rewritten today to deal with interrupts. According to the professor, everything seemed to be correct.
# File: Project.asm
# Written by: Michael Kuehl, 9/25/2003
# Euclid's Algorithm
#
# Register usage
# $zero - 0
# $s0 - A
# $s1 - B
# $s2 - three uses:
#    1) flag to start
#    2) flag for a < b
#    3) temp storage
# $s3 - The constant 1
# $t0 - A (temp)
# $t1 - B (temp)
# $t3 - address holder
# $v0 - Interrupt input
# $v1 - Display Register

init:
        la   $t3, main
        add  $s1, $zero, $zero
        lli  $s1, 2                  # B ($s1) starts at 2
        add  $s3, $zero, $zero
        lli  $s3, 1                  # The Constant 1
        add  $s2, $zero, $zero
main:                                # $s0 = number entered by user
        beq  $s2, $zero, $t3         # If Start flag == 0, don't start
        beq  $s0, $zero, $t3         # If user input is 0, start again
        MaskI                        # Keeps interrupts from happening for now
        add  $s4, $zero, $s0         # Load A ($s0) into Temp A ($s4)
        add  $s5, $zero, $s1         # Load B ($s1) into Temp B ($s5)
InternalLoop:
        la   $t3, ExternalLoop
        beq  $s5, $zero, $t3         # Jump to ExternalLoop if Temp B ($s5) equals 0
        la   $t3, ELSE
        slt  $s5, $s4, $t3           # If B Temp > A Temp, $s2 = 1
        add  $s2, $zero, $s4         #
        add  $s4, $zero, $s5         # Swap A Temp and B Temp
        add  $s5, $zero, $s2         #
        la   $t3, InternalLoop
        j    $t3                     # Jump to Internal Loop
ELSE:
        sub  $s4, $s4, $s5           # Temp A = Temp A - Temp B
        la   $t3, InternalLoop
        j    $t3                     # Jump to InternalLoop
ExternalLoop:
        la   $t3, found
        beq  $s4, $s3, $t3           # If TempA = 1, Jump to Found
        add  $s1, $s1, $s3           # Add 1 to B
        add  $s4, $zero, $s0         # Load A ($s0) into Temp A ($s4)
        add  $s5, $zero, $s1         # Load B ($s1) into Temp B ($s5)
        la   $t3, InternalLoop
        j    $t3                     # Jump to InternalLoop
found:
        add  $v0, $zero, $s1         # Puts B (the rel. prime number) into display register
        disp $v0, PORT               # Displays $v0
        add  $s2, $zero, $zero
        MaskI                        # Lets interrupts happen again
        j    main                    # Jumps back to main
InputIntr:
        sll  $s0, $s0, 3             # Shifts A over by 8 bits
        or   $s0, $s0, $v0           # Loads $v0 into the lower 8 bits
        add  $v1, $zero, $s0         # Puts A into $v1
        disp $v1, PORT               # Displays $v1
        RFI                          # Returns to program
StartIntr:
        lli  $s2, 1                  # Sets start flag to 1
        RFI                          # Returns to program
The next step after completing the assembly language program is to convert it all to machine code. While I (Brandon) waited for this long process to be completed, I jumped back on the webpage and created links to all the files that are offered thus far. I also updated the current Milestone 1 page to the design we have now.
We also decided to try and create an assembler for our assembly language. This
may not happen depending on how much time we have. We did agree on the fact that it
would be neat.
During the process of converting the assembly language to machine code, Mike noticed the need for some specialty registers. These include IAT0, IAT1, IAT2, and IEB. IAT0 is connected to the input interrupt, which is called when a user inputs a number. IAT1 is the start interrupt, which is called when the user has already entered the number and wants to begin computations. IAT2 is the display port, and IEB is the interrupt enable bit. When IEB is 1, interrupts are disabled; when it is 0, interrupts are enabled.
Register    Address
IAT0        0000
IAT1        0001
IAT2        0010
IEB         1111
These addresses can be the same as the regular register addresses because, when these registers are accessed, the processor knows to use the specialty registers instead of the regular ones.
Just for kicks, here is the pure machine code conversion of our program as of
today.
# File: Project.asm
# Written by: Michael Kuehl, 9/25/2003
# Euclid's Algorithm
#
Address             Instruction
0000000000000000    1010100000000000
0000000000000010    1011100001100100
0000000000000100    1001000010000000
0000000000000110    1010100000000000
0000000000001000    1011100001110000
0000000000001010    1001000110000000
0000000000001100    1010100000000000
0000000000001110    1011100000011010
0000000000010000    0000101000000000
0000000000010010    1011101000000010
0000000000010100    0000110000000000
0000000000010110    1011110000000001
0000000000011000    0000101100000000
0000000000011010    0110101100001000
0000000000011100    0110100100001000
0000000000011110    1110000000001111
0000000000100000    0000010100001001
0000000000100010    0000011000001010
0000000000100100    1010100000000000
0000000000100110    1011100001000100
0000000000101000    0110011000001000
0000000000101010    1010100000000000
0000000000101100    1011100000111100
0000000000101110    1000010101101000
0000000000110000    0000011100000101
0000000000110010    0000010100000110
0000000000110100    0000011000000111
0000000000110110    1010100000000000
0000000000111000    1011100000100100
0000000000111010    1100000000001000
0000000000111100    0001010101010110
0000000000111110    1010100000000000
0000000001000000    1011100000100100
0000000001000010    1100000000001000
0000000001000100    1010100000000000
0000000001000110    1011100001011000
0000000001001000    0110010111001000
0000000001001010    0000101010101100
0000000001001100    0000010100001001
0000000001001110    0000011000001010
0000000001010000    1010100000000000
0000000001010010    1011100000100100
0000000001010100    1100000000001000
0000000001011000    1001001010100000
0000000001011010    0000101100000000
0000000001011100    1010100000000000
0000000001011110    1011100000011010
0000000001100000    1110000000001111
0000000001100010    1100000000001000
0000000001100100    0000011100000000
0000000001100110    1011011100000011
0000000001101000    0011100110011011
0000000001101010    0010100110010010
0000000001101100    1001001010010000
0000000001101110    1111000000001111
0000000001110000    1011101100000001
0000000001110010    1111000000001111
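To read the listing above, a small disassembler helps. This is a hypothetical sketch built on the Day 3 op-code map; register numbers follow the register table (for example r8 is $t3), and for DISP the first field is printed as a port number. Decoding the first three words gives LUI/LLI loading 100 (0x64, the address of the InputIntr code in the listing) into $t3, followed by what looks like that address being written to IAT0.

```python
# Hypothetical disassembler for the Day 3 op-code map (a reading aid only).
NAMES = ["ADD", "SUB", "OR", "SLL", "LW", "SW", "BEQ", "BNE",
         "SLT", "DISP", "LUI", "LLI", "J", "JAL", "MASKI", "RFI"]


def disassemble(word: str) -> str:
    """Turn one 16-character binary string from the listing into text."""
    value = int(word, 2)
    op = value >> 12
    rd, rs, rt = (value >> 8) & 0xF, (value >> 4) & 0xF, value & 0xF
    name = NAMES[op]
    if name in ("LUI", "LLI"):              # op | rd | imm8
        return f"{name} r{rd}, {value & 0xFF}"
    if name in ("J", "JAL"):                # target register in the last 4 bits
        return f"{name} r{rt}"
    if name in ("MASKI", "RFI"):            # operand is implicitly IEB
        return name
    if name == "DISP":                      # DISP PORT, RS
        return f"DISP port{rd}, r{rs}"
    return f"{name} r{rd}, r{rs}, r{rt}"


for w in ("1010100000000000", "1011100001100100", "1001000010000000"):
    print(w, disassemble(w))
```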
And now I (Brandon) will take all of the files that are ready to be submitted as Milestone 1 and make them available online through the Lobster webpage. This marks the completion of Milestone 1, pending any changes the professor may suggest.
October 28th, 2003 (Day 4) – Milestone 2
Milestone 2 begins with the assignment of tasks. The sixteen instructions that
Lobster offers were split up today among the four members of our group. Each member
was to take their corresponding instructions and convert them into Register Transfer
Language.
Cannaday: lui, lli, sll, sub
Kuehl: or, assert, maski, rfi
Toth: add, j, sw, lw
Turner: bne, beq, slt, jal
That’s it for today. Hopefully we all make sure to take into account that Lobster
specifications are different from MIPS.
October 3rd, 2003 (Day 5)
Today was the meeting with the professor, as well as the day we had decided to have our RTL specs finished so the group could review one another’s work. The professor’s comments were encouraging; however, the RTL specs were not. It seems that we do not know the machine language specs of our own processor. I won’t mention names, but one member switched register addresses, and another created an RTL for a 32-bit processor. Eventually we got everything straightened out, and the two members took their RTL specs back to be repaired. They will be emailing them to me (Brandon) and Mike to be compiled into the Milestone 2 docs.
We decided that I would continue to write the design journal (what fun) for
consistency. Mike will be creating the finished RTL specification to turn in, and I will
continue updating the webpage as necessary.
October 6th, 2003 (Day 6)
Crunch day! All the specs have been emailed and this is the day we begin
combining everything into a finished product. There are errors that need to be fixed
within all of our RTL specs, but most are minor and were easy to repair. Scott was (I guess) still a little confused about the difference between Lobster and MIPS specs. He combined the two and used Lobster addresses in a MIPS-style RTL. I accidentally used an 8-bit register address instead of a 4-bit address in LUI (oops), but all of these are fixable.
Mike proposed an addition to the webpage today. Since the RTL specs can be nicely converted into a flow chart, why don’t we put a fully animated flow chart of the entire RTL on the webpage? After much mental anguish over how tedious such a task would be, I decided to do it. So look at the website for an animated RTL; it will be neat.
(3 hours later) The flow chart is done, and the Design Journal decided to corrupt
itself, so I had to reformat the last few days. All the docs are done, and linked on the
webpage. It seems that this milestone is done, hazaa! Since the RTL specs will take up a
few pages, I decided to leave them out of the design journal. All that’s left is for all of
this to be committed to CVS for your grading enjoyment.
October 10th, 2003 (Day whatever)
Today we decided it was way too nice outside to spend it in our dorm rooms, so
we all met in Olin to stare at a white board for two hours and design our data path. This
time Scott and Andrew were the two that were going to do most of the work. After the
data path and state diagrams were done being drawn on the boards, they were to convert
them into some digital form. I (Brandon) was at the white board, Mike was at my laptop
reading off instructions, Andrew was taking pictures for future reference, and Scott was
taking notes of the state diagram. After 2 hours, 3 markers, and 20 feet of dry erase board, they are all done. All that’s left to do now is stand back and marvel at our genius before we leave it to the weekend cleaning crew to destroy.
October 12th, 2003 (Day whatever + 1)
As usual, Mike and I will be taking everything and compiling it into a finished
Milestone. Today we received and reviewed the Flash animations of the Data Path and
State Diagram. After going through each instruction, we found that there were some
minor changes that needed to be made to the data path. The data path is now essentially finished and ready to be turned in.
We also received our grade for Milestone 2, which suggested some changes to the RTL. The most important was the ability to remove one clock cycle from jal, beq, slt, and bne. For RFI, we noticed we were trying to read from and write to the IT register at the same time, which cannot work for what RFI needs to do. Therefore, another clock cycle was needed. This will make adjusting the state diagram tomorrow fun.
October 13th, 2003 (Day whatever + 2)
This is the exciting day referred to earlier as crunch day. As promised, there were
lots of changes to the state diagram. It didn’t help that in our meeting we named each
register a letter, and now we have to change them all to something that reflects what it
actually does. For example, mux “n” is now called “PCMux.” So after an hour of
modifying the state diagram it’s time to take a break and go to the SRC. We’ll be back in
45 minutes.
The state diagram is now adjusted to reflect all changes. There was one change
that needed to be made to the data path. The input to the ITDataMux that was originally
a 1 needed to be changed to a 0. The webpage contains the Flash animations of both the
data path and the state diagrams. Using Flash makes reading them much easier, since you can zoom in and out and move the image around to look at a specific part more closely. And
of course, for your viewing enjoyment, here are the images of the data path and state
diagram.
I don’t know if you remember me saying that we would have liked to create an
assembler for Lobster assembly language, but hopefully you do not. It’s not likely to
happen. The Milestones seem to be getting closer and closer together and there just isn’t
the required time available to create the program. It seems that all the documents are
nearly finished, so this ends Milestone 3.
October 22nd, 2003 (Day whatever + 12)
Well, for a change of pace for this Milestone the design journal shall be done by me,
Andrew Toth. The team assignments for this milestone split nicely among the other
members of the group so I will be taking care of this. Although I was initially assigned to
make the “sll” component, Mike and I had a fierce battle over who would actually do it.
Before it got too violent I decided to let him do it. Seeing as how I am the only one in our
group who gets to experience the amazing joys of the Sophomore Engineering curriculum
this quarter I thought he deserved to have a little fun too. That and the fact that he went
ahead and did it last night and I did not, so therefore he got to do it. So tonight I simply
have to update the design journal. Brandon is working on writing our assembler. We
decided it would be a good idea to make one after all since it was our group who did
suggest it and we want those 10 extra points. He will also do some touch up and
administration of the webpage once everything is gathered from the group. Mike has
spent countless hours glued to his laptop and Xilinx and has created and tested the
components for registers A, B, and C, the PC, the IR, the Interrupt Control, the ALU, the
Register file, the IT register file, and as mentioned above, the SLL component. There
were a few delays when his index finger cramped up and he may need carpal tunnel
release surgery but all is well for now. While still at it he has also made documentation of
the testing of all the components. Scott has been busy working on translating the state
diagrams for the control into Xilinx, and will be including the minor changes needed to
catch up with our latest modifications. I sadly have only done this humble design journal,
but most of the other work was done even before I asked how the work was being divided
up for this part of the project, so oh well. Perhaps some screenshots or the like will be
added to this later.
October 23rd, 2003
Today hasn’t been a productive day, and this is Brandon again. This morning at
about 1:41am, Xilinx completely destroyed Mike’s computer while he tried to add the HDL file to the Lobster project. It turns out that any computer this is tried on instantly blue-screens. The goal for today is to complete the milestone, but with the current
computer difficulties, this may not be doable. The way around this glitch (hopefully) is
to create another file and add the HDL file of the control first, and then add all the other
Lobster files after it. As it stands right now, everything is done except testing the control.
There were some changes made to the RTL because interrupts are finally understood. Thanks to some advice from the professor, Mike was able to add the HDL file to the project. To do this, he created a blank file and added it, and then copied and pasted the actual HDL code into the new file.
The webpage is updated. Unfortunately, this milestone did not let me make any neat graphics to look at. I tried; however, getting high-quality images out of Xilinx is quite difficult.
At this point the assembler can convert all A-type and L-type instructions to their binary and decimal equivalents. I will continue programming for hours on end to complete it by next Wednesday. This milestone is done.
November 7th, 2003
The project is almost done; however, the long process of making sure each instruction works correctly has just begun. For this milestone, the work is split up as follows. Mike is working diligently in Xilinx debugging the processor. Andrew is
writing up the final report. And I (Brandon) will be making the presentation for
presentation day. Scott, I haven’t seen in class for a couple of days, so no assignment for
him has been made yet. The presentation will be created in Flash, since it allows more
flexibility than PowerPoint. The final report will set a new standard by which literature is
written, and Lobster will be faster than any processor currently on the market.
We’re hoping to have the processor fully working by Monday, so as to show it to
the professor for review. It is also our goal to implement this on the chip emulator thing
that takes our processor and pretends to be it, or whatever it does. This is of course dependent on how long Mike wishes to stare at Xilinx in order to make it work. Today
he successfully completed the implementation of lli, add, and assert.
In order to get these instructions to work, changes needed to be made to the existing RTL and the logic of the processor. The changes to the RTL included moving the PC increment to clock cycle 3. This was done because if it was incremented in clock cycle 2, as it was, the instruction that was supposed to be executed was overwritten by the instruction at PC + 3. The timing was such that the PC was done being incremented while the IR was still reading the value for the instruction.
The changes inside Lobster included the following. Temporary registers A and B
have an inverted clock, so they read on the other side of the clock cycle. This was done
because A and B were not getting their values soon enough. A and B now get their values
a half clock cycle earlier, which makes the ALU slightly happier than before. The
Control Unit had a pretty big mistake in selecting the Reg1Mux value: it was set to select input 2 instead of 1. Other than that, the changes fixed small, scattered mistakes.
The presentation made a little progress today as well. I have decided to start by discussing our machine language specification, such as instruction formats, op-code lengths, blah, blah, blah. I have created the frames necessary for whoever will be discussing this section.
November 8th, 2003
Today is a continuation of the debugging from yesterday. Hours and hours of
Mike staring at his computer screen may have caused him to go temporarily insane,
however, the thought of another 5 or so instructions working may help him a little. I was writing small programs and then compiling them with the assembler, which I modified so it outputs in the format needed for the RAM. These programs were then loaded into Lobster, and slowly we were able to get it to the point where it could find the gcd of 2 numbers. To get this to work, each instruction had to be debugged one at a time.
“Sub” required the entire adder to be redone. The ALU that we built was a little too buggy to be used, so we switched to an ALU generated by Xilinx. Doing this was more difficult than that sentence implies (according to Mike). After the switch, each instruction that was working yesterday had to be retested.
“OR” just worked after the new ALU was added, therefore nothing was changed.
“SLL” required a change to the control unit state diagram. The CMux was set in
one state; however it was not saved when going to the next state. This caused CMux to
be reset to 0. It could not explicitly be reset in the following state since there were other
instructions branching to it; therefore SLL had to be separated out.
“SW”, “LW”, and “MASKI” just worked all by themselves.
“JAL” had a pretty interesting problem. What was happening was the linking part
was coming after the jumping part. So when it was storing the address in $ra, it was the
address it was jumping to instead of the address of where it was currently at. This was
solved by adding a state in the control unit which linked before it jumped.
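A toy Python model of the ordering problem, purely illustrative: the dictionary stands in for the PC and $ra, and `next_pc` is whatever return address the hardware should save (its exact value in Lobster is an assumption, so it is left as a parameter).

```python
# Illustrative model of why "link" must happen before "jump" in jal.
# Not Lobster's control logic; next_pc is a hypothetical parameter.

def jal(next_pc: int, target: int, link_first: bool) -> dict:
    state = {"pc": next_pc, "ra": None}
    if link_first:
        state["ra"] = state["pc"]   # link: save the return address
        state["pc"] = target        # then jump
    else:
        state["pc"] = target        # jump first...
        state["ra"] = state["pc"]   # ...so $ra receives the jump target (the bug)
    return state


print(jal(0x20, 0x40, link_first=False))
print(jal(0x20, 0x40, link_first=True))
```

In the buggy order, the procedure's eventual return would jump back to its own first instruction instead of the caller.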
“BEQ” needed another state as well. This time the state’s only purpose was to
waste time. This was because the values being compared weren’t being loaded into the
registers before they were being compared. We needed to give the registers more time to load the values.
“BNE” had the same problems as “BEQ”, plus the ALU output didn’t stabilize in time for the PC to be written. So we had to add another waiting state to allow time for the ALU to process the inputs.
“SLT” again had the same problems as “BNE” and “BEQ”. Much like “BNE”, a
waiting state had to be added since the ALU did not stabilize itself in time for the PC to
be written.
After all of these instructions were found to work, I started writing programs
while Mike implemented them. The first program that was written was just a very simple
add and load program that then asserted a value to the display ports. After this program
ran successfully, I modified it with some jumping. This program also ran successfully.
We hit a problem when I did some pretty wild jumping and jump and linking.
Apparently my assembler hit a glitch (I still can’t explain it) that caused it to load label
addresses wrong. Once we found the problem, through very exhaustive step-by-step
debugging, the program ran successfully. The next step was to try Euclid’s Algorithm in
Lobster. This failed pretty badly. I then went through it and found that, as written, the program got itself into an infinite loop. We decided we would be happy if the processor could just find the gcd of any two numbers. So I wrote the algorithm in Lobster
necessary to do this. After some minor problems in the program were fixed, like
branching when a<b instead of a>b, the processor was successfully able to run this
program. It calculated the gcd of any numbers we were able to throw at it.
Just for kicks, we decided to calculate what the clock speed of Lobster is. Since
the high and low times of the clock are 1250ps, or 1.25ns, the total clock cycle time is
2.5ns. Using the equations learned in class, and the magic of Maple, we calculated the
clock speed to be 400 MHz.
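The same arithmetic in Python, using integer picoseconds so no floating-point rounding sneaks in:

```python
# Clock arithmetic from the entry above, in integer picoseconds.
HIGH_PS = 1250                       # clock high time
LOW_PS = 1250                        # clock low time
period_ps = HIGH_PS + LOW_PS         # 2500 ps = 2.5 ns per cycle
freq_mhz = 1_000_000 / period_ps     # 1 us = 10^6 ps, so cycles per us = MHz
print(period_ps, freq_mhz)
```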
November 9th, 2003
The processor works today! Now let’s discuss how it got there. Again, Mike
worked hour after hour debugging and debugging to make this miracle happen. The only
untested features that needed to work were “RFI”, “IAT0”, and “IAT1”. Apparently the only major change needed to get these three working was inverting the clock input on the ITReg file. Other than that, all that was required was minor tweaking: WIB had to be reset during the last cycle of the interrupt, not the first, and values had to be reinitialized in each state, since Xilinx does not automatically carry over values.
Mike modified the program I wrote yesterday to now find the lowest relative
prime number to a given input. This time the program uses procedures, unlike our very
first program. For some reason the assembler now works fine (still can’t explain it). As
far as we can tell, the processor is working just fine. The only problem is the algorithm
takes a long time when checking large numbers, so testing is becoming a more and more
difficult process. We tested the number 2*3*4*5, which equals 120. The lowest relative prime number to this number will be seven; however, the number of clock cycles needed to calculate this is 14865. This takes a long time to render in ModelSim. We can now see why this program was chosen to test our processors: it is just about the simplest program that takes the longest time to run. With a little work, the program was modified to remove eight instructions per iteration. Now the relative prime can be found in 7163 clock cycles.
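Assuming every cycle takes the 2.5 ns period from the November 8th entry, the two cycle counts translate to simulated run times as follows. This is a back-of-the-envelope sketch; `run_time_us` is a made-up helper.

```python
# Simulated run time for the two versions of the relative-prime program,
# assuming a fixed 2.5 ns (2500 ps) clock period.
PERIOD_PS = 2500


def run_time_us(cycles: int) -> float:
    """Cycles times period, converted from picoseconds to microseconds."""
    return cycles * PERIOD_PS / 1_000_000


before, after = 14865, 7163
print(run_time_us(before), run_time_us(after))
print(round(before / after, 2))   # speedup from removing eight instructions per iteration
```

Roughly a 2x speedup, which suggests the removed instructions made up about half of each iteration's cycle count.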
A Whole Bunch of Days all in One Glorious Section
After the processor worked, the last thing was just the compilation of documents,
creating the final presentation, and final report. However, since we completed it so early
it was our responsibility, as good citizens, to implement it in hardware. This was so
much harder than typing the four words to describe it.
The first gigantic undertaking was figuring out which pin was which. We were so kindly provided with the pin configurations of the FPGA and of the LED board, but there was no document describing the connections between them. So after six hours of work, I was never so excited to see a number display on an LED. The next problem was displaying multiple numbers. For some unbelievably stupid reason, the LED board didn’t have inputs for each separate LED. Therefore, I had to refresh through all four at 4ms each. Getting it to do this was another complicated process. I went and talked to the professor, and after a couple of hours it was decided that to do this we had to:
- Slow the clock speed by counting to 2^14
- Use the 14th bit as the clock for a 2-bit counter
- Feed the output of the counter to a multiplexer that selected each LED for 4ms
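The refresh scheme in the list above can be modeled in software. This is a hypothetical sketch: it reads "the 14th bit" as zero-based bit index 14 of a free-running counter and counts the rising edges of that bit to drive the 2-bit counter. The real circuit's exact tap and timing may differ, so no timing figures are claimed here.

```python
# Hypothetical model of the LED refresh logic: free-running counter ->
# one tap bit clocks a 2-bit counter -> 2-bit value selects 1 of 4 LEDs.
TAP_BIT = 14  # assumption: "the 14th bit" read as zero-based bit index 14


def selected_digit(cycle: int, tap_bit: int = TAP_BIT) -> int:
    """Digit (0-3) selected after `cycle` board-clock ticks."""
    # Bit `tap_bit` of a free-running counter first rises at 2**tap_bit and
    # then every 2**(tap_bit + 1) ticks; count those rising edges.
    rising_edges = (cycle + 2 ** tap_bit) // 2 ** (tap_bit + 1)
    return rising_edges % 4   # the 2-bit counter wraps from 3 back to 0


# Each digit stays selected for 2**(TAP_BIT + 1) board-clock ticks.
print([selected_digit(2**14 + k * 2**15) for k in range(5)])
```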
The reason I couldn’t cycle through at clock speed was that the LEDs act like capacitors: they need time to charge and discharge before lighting again. The result of 50 MHz cycling was a faint 8 on each LED. The reason for the 8 instead of the input was that it was cycling faster than the multiplexer could select a 7-bit value for the LED. After I got the displays to fully work, I made a sub circuit that took in a 4-bit number and output the correct 7-bit LED signal. The next step was making a simple program work. I was able to put a counter on it that counted down if a switch was enabled, and up when it was disabled. Nothing happened when it hit zero; it just started over.
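The 4-bit to 7-segment sub circuit is essentially a 16-entry lookup table. The sketch below assumes segment order g f e d c b a (bit 6 down to bit 0) and active-high segments; the actual LED board's pin order and polarity are not documented here, so both are assumptions, and `seven_segment` is a hypothetical name.

```python
# Hypothetical 4-bit to 7-segment decoder.
# Assumed bit order: bit 6 = g, ..., bit 0 = a; 1 = segment lit (active-high).
SEGMENTS = [
    0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07,   # 0-7
    0x7F, 0x6F, 0x77, 0x7C, 0x39, 0x5E, 0x79, 0x71,   # 8, 9, A, b, C, d, E, F
]


def seven_segment(nibble: int, active_low: bool = False) -> int:
    if not 0 <= nibble <= 0xF:
        raise ValueError("input must be a 4-bit value")
    code = SEGMENTS[nibble]
    # Boards that light a segment on a 0 need the pattern inverted.
    return (~code & 0x7F) if active_low else code


print(f"{seven_segment(8):07b}")   # every segment lit
```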
Of course, all of this eventually led to nothing. Our processor only works in a specific range of clock speeds. The FPGA runs at 50 MHz, and our processor, for some
unknown reason, does not work at these speeds. It’s a sad day for Comp Arch kind, a sad
day indeed.