Lobster 16-bit Processor Design Journal

Brandon Cannaday
Michael Kuehl
Andrew Toth
Scott Turner

September 20th, 2003 (Day 1)

Today marks the beginning of the design process. The first thing accomplished was creating a name for the project that was simpler than saying “The Computer Architecture Final Project.” The name agreed upon was Lobster, which obtains its roots from an inside joke between a small group of people. The project will be referred to as Lobster from this point on.

The first iteration of the assembly language algorithm was created today (by Mike). This version of the program is not tailored to Lobster at all and its primary function is to give us a basic understanding of what kind of instructions we will need in our assembly language specification.

    # File: Project.asm
    # Written by: Michael Kuehl, 9/18/2003
    # Euclid's Algorithm
    # Currently 7 registers
    # Currently 7 different commands
    #
    # Register usage
    # $zero - 0
    # $s0 - A
    # $s1 - B
    # $s2 - two uses: 1) flag for a < b
    #                 2) temp storage
    # $s3 - The constant 1
    # $s4 - A (temp)
    # $s5 - B (temp)

    .text                   # Text section of the program (as opposed to data).
    .globl main             # Make MAIN globl so you can refer to it in SPIM.
    main:                   # Program starts at MAIN.
        #wait for input forever
        ori $s0, $0, 8              # $s0 = number entered by user
        beq $s0, $zero, main        # If user input is 0, start again

        ori $s1, $0, 2              # B ($s1) starts at 2
        ori $s3, $0, 1              # The Constant 1
        add $s4, $zero, $s0         # Load A ($s0) into Temp A ($s4)
        add $s5, $zero, $s1         # Load B ($s1) into Temp B ($s5)

    InternalLoop:
        beq $s5, $zero, ExternalLoop    #Jump to ExternalLoop if Temp B ($s5) equals 0
        slt $s2, $s4, $s5           #If B Temp > A Temp, $s2 = 1
        beq $s2, $zero, ELSE        #If $s2 = 0, jump to ELSE
        add $s2, $zero, $s4         #
        add $s4, $zero, $s5         # Swap A Temp and B Temp
        add $s5, $zero, $s2         #
        j InternalLoop              # Jump to Internal Loop

    ELSE:
        sub $s4, $s4, $s5           # Temp A = Temp A - Temp B
        j InternalLoop              # Jump to InternalLoop

    ExternalLoop:
        beq $s4, $s3, found         # If TempA = 1, Jump to Found
        add $s1, $s1, $s3           # Add 1 to B
        add $s4, $zero, $s0         # Load A ($s0) into Temp A ($s4)
        add $s5, $zero, $s1         # Load B ($s1) into Temp B ($s5)
        j InternalLoop              # Jump to InternalLoop

    found:
        #display B somehow
        #j main

From this program we created a list of registers and instructions we will need as of now (we are still waiting on interrupts to be taught in class).
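The search that the program above performs can also be sketched in a high-level language. The Python below is only an illustrative sketch and was not part of the project: the function names are hypothetical, and it assumes the input A is a positive integer. It mirrors InternalLoop (swap when Temp A < Temp B, otherwise subtract) and ExternalLoop (try the next B until the result is 1).

```python
def gcd_by_subtraction(a: int, b: int) -> int:
    """Subtractive form of Euclid's algorithm, mirroring InternalLoop."""
    while b != 0:
        if a < b:
            a, b = b, a      # swap Temp A and Temp B
        else:
            a -= b           # Temp A = Temp A - Temp B
    return a


def lowest_relatively_prime(a: int) -> int:
    """Smallest B >= 2 with gcd(A, B) == 1, mirroring ExternalLoop."""
    if a <= 0:
        raise ValueError("A must be a positive integer")
    b = 2                    # B starts at 2
    while gcd_by_subtraction(a, b) != 1:
        b += 1               # add 1 to B and try again
    return b


if __name__ == "__main__":
    print(lowest_relatively_prime(8))
```

With the hard-coded input of 8 used in the listing, this sketch tries B = 2 (which shares the factor 2 with 8) and then stops at B = 3.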
Register Specifications

    Register Name   Number   Usage
    $zero           0        The Constant 0
    $at             1        Reserved for Assembler
    $v0             2        Results of a Procedure
    $a0             3        Argument 1 for a Procedure
    $a1             4        Argument 2 for a Procedure
    $t0             5        temporary (not preserved across call)
    $t1             6        temporary (not preserved across call)
    $t2             7        temporary (not preserved across call)
    $t3             8        temporary (not preserved across call)
    $s0             9        saved temporary (preserved across call)
    $s1             10       saved temporary (preserved across call)
    $s2             11       saved temporary (preserved across call)
    $s3             12       saved temporary (preserved across call)
    $s4             13       saved temporary (preserved across call)
    $sp             14       stack pointer
    $ra             15       return address

Assembly Language Specifications

    Category             Instruction            Example              Meaning                    Comments
    Arithmetic           Add                    add $s1, $s2, $s3    $s1 = $s2 + $s3            Regular Addition
                         Subtract               sub $s1, $s2, $s3    $s1 = $s2 - $s3            Regular Subtraction
    Logical              Shift Left Logical     sll $s1, $s2, 10     $s1 = $s2 << 10            Shift Left by Constant
                         Shift Right Logical    srl $s1, $s2, 10     $s1 = $s2 >> 10            Shift Right by Constant
    Data Transfer        Load Lower Immediate   lli $s1, 10          $s1 = 10                   Loads constant into Lower 8 bits
                         Load Upper Immediate   lui $s1, 10          $s1 = 10 * 2^8             Loads constant into Upper 8 bits
                         Load Word              lw $s1, 10($s2)      $s1 = Memory[$s2 + 10]     Word from Memory to Register
                         Store Word             sw $s1, 10($s2)      Memory[$s2 + 10] = $s1     Word from Register to Memory
    Conditional Branch   Branch on Equal        beq $s1, $s2, $s3    if($s1 == $s2) goto $s3    Equal test - instruction address in $s3
                         Branch on Not Equal    bne $s1, $s2, $s3    if($s1 != $s2) goto $s3    Not equal test - instruction address in $s3
                         Set on Less Than       slt $s1, $s2, $s3    if($s1 < $s2) goto $s3     Compare Less Than - instruction address in $s3
    Unconditional Jump   Jump                   j $s1                goto $s1                   Jump to address
                         Jump and Link          jal $s1              $ra = Current, goto $s1    For procedure call

After the initial assembly language specification was created, the next logical step was to invent some sort of machine language specification. The first thing to be decided upon was an op-code.
Deciding how many bits this was to be led to a series of massive arguments, ultimately leading to the defenestration of one of the members. Maybe it wasn’t quite that severe; however, deciding on the number of bits was difficult. A 3 bit op-code would leave 13 bits for the rest of the instruction, but it only allowed for a total of 8 instructions. A 4 bit op-code, on the other hand, allowed for a total of 16 instructions, but would leave only 12 bits for the rest of the instruction. After realizing that forcing the algorithm into a mere 8 instructions would prove to be very difficult, we decided on a 4 bit op-code.

Since the machine language needs a register address, the next step was deciding how many bits it requires. This time the only argument was between Mike and himself. Apparently there were some fists thrown and some name calling, but there were no witnesses or concrete evidence of this. In the end the register address became a 4 bit number, allowing Lobster to accommodate 16 registers for your programming convenience.

The machine code then split itself into three different instruction formats, each with a distinct purpose. The first format begins with a 4 bit op-code, followed by three 4 bit register addresses.

    Type A
    Op-Code   Register Address   Register Address   Register Address
    4 bits    4 bits             4 bits             4 bits

This format is used for any instruction that requires 3 registers (e.g. add $s1, $s2, $s3).

The second instruction format is for dealing with immediate values. This format begins with the 4 bit op-code, followed by a 4 bit register address, and lastly an 8 bit value.

    Type B
    Op-Code   Register Address   Misc. Value
    4 bits    4 bits             8 bits

This format is for loading an immediate value into a register (e.g. lli $s0, 5).

The third and last instruction format is used for jumping. It begins with the 4 bit op-code, followed by a register address, and ends with 0’s.
Since jumping only requires a register address, the remaining bits are not needed and therefore remain 0’s.

    Type C
    Op-Code   Register Address   Unused
    4 bits    4 bits             8 bits (0’s)

While Mike was coming up with the above information, I (Brandon) began construction on the webpage. Since boring web pages are boring, I decided to make ours slightly more interesting to read. The navigation bar on the left will hold links to all the milestones as well as links to the design journal and assembly language/machine language specifications, and of course any other relevant information that we deem important enough to add.

September 23rd, 2003 (Day 2)

Today was slightly less eventful than Day 1. The major accomplishment was modifying the program. The initial program was written in MIPS, but as if it were to receive an input from a user. As of now we do not know how to give PCSPIM an input, or if it is even possible. I (Brandon) suggested modifying the program so that it would run in PCSPIM. This is made possible by simply hard-coding a number as the input. When the program was run it was found to be slightly flawed. Mike then rewrote the program so it ran correctly. By doing this we have successfully tested that our algorithm works and is ready to be converted to our assembly and machine languages.

Mike also created the machine language specification for Lobster today.

    Command                Type   Op-Code
    Add                    A      0000
    Subtract               A      0001
    Shift Left Logical     A      0010
    Shift Right Logical    A      0011
    Load Lower Immediate   B      0100
    Load Upper Immediate   B      0101
    Load word              A      0110
    Store word             A      0111
    Branch on Equal        A      1000
    Branch on Not Equal    A      1001
    Set on Less Than       A      1010
    Jump                   C      1011
    Jump and Link          C      1100

The machine language specification is basically an extension of Day 1’s instruction format specification. Op-codes are assigned to each instruction, and the types correspond to the 3 different instruction formats created earlier.

There were minor updates to the web page today as well.
The link to milestone 1 now has a page to go to. It includes the assembly language and machine language specifications. Most attempts to figure out interrupts have failed, so we are still waiting to go over them in class.

September 24th, 2003 (Day 3)

Interrupts were taught today! And now that we know them, we realize that there are many changes that need to be made to the design. The goal today is to complete milestone 1.

Before the changes that involve interrupts were made, Mike made changes that make our assembly language easier to implement, though harder to code in. But since we aren’t being graded on the ease of programming our processor, but rather on how well it works, the changes are necessary. One of the easiest ways to make the processor easier to implement is limiting immediate values. There are currently four instructions that use immediate values that can be changed to use registers instead.

- sll is now of type A; instead of using an immediate value to determine the shift, the value must first be stored in a register.
- lw is now of type A; the loaded value is taken from a register.
- sw is now of type A; the value is taken from a register.

The jump instruction format, which used to have the op-code followed by the register address, has been modified so the register address is at the end. This was done to match the style of format A: “bne” and “beq” have the register address that they jump to as the last four bits. We decided, as of now, that it would probably be simpler to implement if every instruction that jumps has the address it is jumping to in the last four bits. This way we may be able to create a generic jump system that works with all instructions that require jumping.

Along with these minor changes came some rather major changes from the original design. The “srl” command was removed to make room for an “or” command. The “or” command is used when the user enters a 16 bit number.
Since the user can only enter 8 bits at a time, we need to take the first 8 bits, store them in a register, shift the register over 8 bits, and then “or” the second 8 bits into the same register. The result is a 16 bit number stored in a particular register.

A few instructions had to be added when we began dealing with I/O and interrupts. The chart below illustrates the new instruction set and corresponding op-codes.

    A-Type Instructions
    Instruction         OP-Code
    ADD RD, RS, RT      0000
    SUB RD, RS, RT      0001
    OR RD, RS, RT       0010
    SLL RD, RS, RT      0011
    LW RD, RS(RT)       0100
    SW RD, RS(RT)       0101
    BEQ RD, RS, RT      0110
    BNE RD, RS, RT      0111
    SLT RD, RS, RT      1000
    DISP PORT, RS       1001

    L-Type Instructions
    Instruction         OP-Code
    LUI RD IMM          1010
    LLI RD IMM          1011

    J-Type Instructions
    Instruction         OP-Code
    J RT                1100
    JAL RT              1101
    MASKI               1110
    RFI                 1111

The program was rewritten today to deal with interrupts. According to the professor, everything seemed to be correct.

    # File: Project.asm
    # Written by: Michael Kuehl, 9/25/2003
    # Euclid's Algorithm
    #
    # Register usage
    # $zero - 0
    # $s0 - A
    # $s1 - B
    # $s2 - three uses: 1) flag to start
    #                   2) flag for a < b
    #                   3) temp storage
    # $s3 - The constant 1
    # $t0 - A (temp)
    # $t1 - B (temp)
    # $t3 - address holder
    # $v0 - Interrupt input
    # $v1 - Display Register

    init:
        la $t3, main
        add $s1, $zero, $zero
        lli $s1, 2                  # B ($s1) starts at 2
        add $s3, $zero, $zero
        lli $s3, 1                  # The Constant 1
        add $s2, $zero, $zero

    main:                           # $s0 = number entered by user
        beq $s2, $zero, $t3         # If Start flag == 0, don't start
        beq $s0, $zero, $t3         # If user input is 0, start again
        MaskI                       # Keeps interrupts from happening for now
        add $s4, $zero, $s0         # Load A ($s0) into Temp A ($s4)
        add $s5, $zero, $s1         # Load B ($s1) into Temp B ($s5)

    InternalLoop:
        la $t3, ExternalLoop
        beq $s5, $zero, $t3         #Jump to ExternalLoop if Temp B ($s5) equals 0
        la $t3, ELSE
        slt $s5, $s4, $t3           #If B Temp > A Temp, $s2 = 1
        add $s2, $zero, $s4         #
        add $s4, $zero, $s5         # Swap A Temp and B Temp
        add $s5, $zero, $s2         #
        la $t3, InternalLoop
        j $t3                       # Jump to Internal Loop

    ELSE:
        sub $s4, $s4, $s5           # Temp A = Temp A - Temp B
        la $t3, InternalLoop
        j $t3                       # Jump to InternalLoop

    ExternalLoop:
        la $t3, found
        beq $s4, $s3, $t3           # If TempA = 1, Jump to Found
        add $s1, $s1, $s3           # Add 1 to B
        add $s4, $zero, $s0         # Load A ($s0) into Temp A ($s4)
        add $s5, $zero, $s1         # Load B ($s1) into Temp B ($s5)
        la $t3, InternalLoop
        j $t3                       # Jump to InternalLoop

    found:
        add $v0, $zero, $s1         # Puts B (The Rel. Prime Number) into display register
        disp $v0, PORT              # Displays $v0
        add $s2, $zero, $zero
        MaskI                       # Lets interrupts happen again
        j main                      # Jumps back to main

    InputIntr:
        sll $s0, $s0, 3             # Shifts A over by 8 bits
        or $s0, $s0, $v0            # loads $v0 into the lower 8 bits
        add $v1, $zero, $s0         # Puts A into $v0
        disp $v1, PORT              # Displays $v0
        RFI                         # Returns to program

    StartIntr:
        lli $2, 1                   # Sets start flag to 1
        RFI                         # Returns to program

The next step after completing the assembly language program is to convert it all to machine code. While I (Brandon) waited for this long process to be completed, I jumped back on the webpage and created links to all the files that are offered thus far. I also updated the current Milestone 1 page to the design we have now.

We also decided to try to create an assembler for our assembly language. This may not happen, depending on how much time we have. We did agree that it would be neat.

During the process of converting the assembly language to machine code, Mike noticed the need for some specialty registers. These include IAT0, IAT1, IAT2, and IEB. IAT0 is connected to the input interrupt, which is called when a user inputs a number. IAT1 is the start interrupt, which is called when the user has already entered the number and wants to begin computations. IAT2 is the display port, and IEB is the interrupt enable bit. When IEB is 1, interrupts are disabled; when it is 0, interrupts are enabled.
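The hand conversion to machine code described above can be sketched as a few bit-packing helpers. This is a minimal Python sketch and not the project's assembler: the helper names are hypothetical, the op-codes are taken from the Day 3 instruction chart, the register numbers from the Day 1 register table, and the field order (op-code, RD, RS, RT for A-type; op-code, RD, 8 bit immediate for L-type; op-code, eight 0's, RT for J-type) is an assumption based on the format descriptions in this journal.

```python
# Op-codes from the Day 3 instruction chart.
OPCODES = {
    "add": 0b0000, "sub": 0b0001, "or": 0b0010, "sll": 0b0011,
    "lw": 0b0100, "sw": 0b0101, "beq": 0b0110, "bne": 0b0111,
    "slt": 0b1000, "disp": 0b1001,
    "lui": 0b1010, "lli": 0b1011,
    "j": 0b1100, "jal": 0b1101, "maski": 0b1110, "rfi": 0b1111,
}


def _field(value: int, bits: int, name: str) -> int:
    """Check that a field value fits in the given number of bits."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return value


def encode_a(op: str, rd: int, rs: int, rt: int) -> int:
    """A-type: op-code, then three 4 bit register addresses."""
    return (OPCODES[op] << 12) | (_field(rd, 4, "rd") << 8) \
        | (_field(rs, 4, "rs") << 4) | _field(rt, 4, "rt")


def encode_l(op: str, rd: int, imm: int) -> int:
    """L-type: op-code, one register address, 8 bit immediate."""
    return (OPCODES[op] << 12) | (_field(rd, 4, "rd") << 8) | _field(imm, 8, "imm")


def encode_j(op: str, rt: int) -> int:
    """J-type: op-code, eight unused 0 bits, register address in the last four bits."""
    return (OPCODES[op] << 12) | _field(rt, 4, "rt")


def to_bits(word: int) -> str:
    """Render a 16 bit instruction word as a binary string."""
    return format(word, "016b")


if __name__ == "__main__":
    print(to_bits(encode_a("add", 10, 0, 0)))   # add $s1, $zero, $zero
    print(to_bits(encode_l("lli", 10, 2)))      # lli $s1, 2
    print(to_bits(encode_j("j", 8)))            # j $t3
```

Under these assumptions, `add $s1, $zero, $zero` packs to 0000101000000000 and `lli $s1, 2` to 1011101000000010, both of which appear in the machine code listing for this day.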
    Register   Address
    IAT0       0000
    IAT1       0001
    IAT2       0010
    IEB        1111

The addresses can be the same as regular register addresses because when these are called, the program knows to invoke specialty registers over regular ones.

Just for kicks, here is the pure machine code conversion of our program as of today.

    # File: Project.asm
    # Written by: Michael Kuehl, 9/25/2003
    # Euclid's Algorithm
    #
    # Address           Instruction
    0000000000000000    1010100000000000
    0000000000000010    1011100001100100
    0000000000000100    1001000010000000
    0000000000000110    1010100000000000
    0000000000001000    1011100001110000
    0000000000001010    1001000110000000
    0000000000001100    1010100000000000
    0000000000001110    1011100000011010
    0000000000010000    0000101000000000
    0000000000010010    1011101000000010
    0000000000010100    0000110000000000
    0000000000010110    1011110000000001
    0000000000011000    0000101100000000
    0000000000011010    0110101100001000
    0000000000011100    0110100100001000
    0000000000011110    1110000000001111
    0000000000100000    0000010100001001
    0000000000100010    0000011000001010
    0000000000100100    1010100000000000
    0000000000100110    1011100001000100
    0000000000101000    0110011000001000
    0000000000101010    1010100000000000
    0000000000101100    1011100000111100
    0000000000101110    1000010101101000
    0000000000110000    0000011100000101
    0000000000110010    0000010100000110
    0000000000110100    0000011000000111
    0000000000110110    1010100000000000
    0000000000111000    1011100000100100
    0000000000111010    1100000000001000
    0000000000111100    0001010101010110
    0000000000111110    1010100000000000
    0000000001000000    1011100000100100
    0000000001000010    1100000000001000
    0000000001000100    1010100000000000
    0000000001000110    1011100001011000
    0000000001001000    0110010111001000
    0000000001001010    0000101010101100
    0000000001001100    0000010100001001
    0000000001001110    0000011000001010
    0000000001010000    1010100000000000
    0000000001010010    1011100000100100
    0000000001010100    1100000000001000
    0000000001011000    1001001010100000
    0000000001011010    0000101100000000
    0000000001011100    1010100000000000
    0000000001011110    1011100000011010
    0000000001100000    1110000000001111
    0000000001100010    1100000000001000
    0000000001100100    0000011100000000
    0000000001100110    1011011100000011
    0000000001101000    0011100110011011
    0000000001101010    0010100110010010
    0000000001101100    1001001010010000
    0000000001101110    1111000000001111
    0000000001110000    1011101100000001
    0000000001110010    1111000000001111

And now I (Brandon) will take all of the files that are ready to be submitted as Milestone 1 and make them available online through the Lobster webpage. This marks the completion of Milestone 1, pending any changes the professor may suggest.

September 28th, 2003 (Day 4) - Milestone 2

Milestone 2 begins with the assignment of tasks. The sixteen instructions that Lobster offers were split up today among the four members of our group. Each member was to take their instructions and convert them into Register Transfer Language.

    Cannaday: lui, lli, sll, sub
    Kuehl:    or, assert, maski, rfi
    Toth:     add, j, sw, lw
    Turner:   bne, beq, slt, jal

That’s it for today. Hopefully we all make sure to take into account that Lobster specifications are different from MIPS.

October 3rd, 2003 (Day 5)

Today was the meeting with the professor, as well as the day we had decided to have our RTL specs finished so the group could review one another’s work. The professor’s comments were encouraging; the RTL specs, however, were not. It seems that we apparently do not know the machine language specs of our own processor. I won’t mention names, but one member switched register addresses, and another created an RTL for a 32 bit processor. Eventually we got everything straightened out, and the two members took their RTL specs back to be repaired. They will be emailing them to me (Brandon) and Mike to be compiled into the Milestone 2 docs.

We decided that I would continue to write the design journal (what fun) for consistency. Mike will be creating the finished RTL specification for turn in, and I will continue updating the webpage as necessary.

October 6th, 2003 (Day 6)

Crunch day!
All the specs have been emailed and this is the day we begin combining everything into a finished product. There are errors that need to be fixed within all of our RTL specs, but most are minor and were easy to repair. Scott was (I guess) still a little confused over the difference between Lobster specs and MIPS. He combined the two and used Lobster addresses in a MIPS style RTL. I accidentally used an 8 bit register address instead of a 4 bit address in LUI (oops), but all of these are fixable.

A proposed addition to the webpage was made today, from Mike to me. Since the RTL specs can be nicely converted into a flow chart, why don’t we put a fully animated flow chart of the entire RTL on the webpage? After much mental anguish over how tedious such a task would be, I decided to do it. So look at the website for an animated RTL; it will be neat.

(3 hours later) The flow chart is done, and the Design Journal decided to corrupt itself, so I had to reformat the last few days. All the docs are done and linked on the webpage. It seems that this milestone is done, huzzah! Since the RTL specs would take up a few pages, I decided to leave them out of the design journal. All that’s left is for all of this to be committed to CVS for your grading enjoyment.

October 10th, 2003 (Day whatever)

Today we decided it was way too nice outside to spend it in our dorm rooms, so we all met in Olin to stare at a white board for two hours and design our data path. This time Scott and Andrew were the two that were going to do most of the work. After the data path and state diagrams were drawn on the boards, they were to convert them into some digital form. I (Brandon) was at the white board, Mike was at my laptop reading off instructions, Andrew was taking pictures for future reference, and Scott was taking notes of the state diagram. 2 hours, 3 markers, and 20 feet of dry erase board later, they are all done.
All that’s left to do now is stand back and marvel at our genius before we leave it to the weekend cleaning crew to destroy.

October 12th, 2003 (Day whatever + 1)

As usual, Mike and I will be taking everything and compiling it into a finished Milestone. Today we received and reviewed the Flash animations of the Data Path and State Diagram. After going through each instruction, we found that there were some minor changes that needed to be made to the data path. The data path is now essentially finished and ready to be turned in.

We also received our grade for Milestone 2, which suggested some changes to the RTL. The most important of these was the ability to remove one clock cycle from jal, beq, slt, and bne. For RFI we noticed we were trying to read and write the IT register at once, and we can’t do that for what it needs to do. Therefore, another clock cycle was needed. This will make adjusting the state diagram tomorrow fun.

October 13th, 2003 (Day whatever + 2)

This is the exciting day referred to earlier as crunch day. As promised, there were lots of changes to the state diagram. It didn’t help that in our meeting we named each component with a letter, and now we have to change them all to something that reflects what each one actually does. For example, mux “n” is now called “PCMux.” So after an hour of modifying the state diagram it’s time to take a break and go to the SRC. We’ll be back in 45 minutes.

The state diagram is now adjusted to reflect all changes. There was one change that needed to be made to the data path. The input to the ITDataMux that was originally a 1 needed to be changed to a 0. The webpage contains the Flash animations of both the data path and the state diagrams. Using Flash makes reading them much easier, since you can zoom in and out and move the image around to look at a specific part more closely. And of course, for your viewing enjoyment, here are the images of the data path and state diagram.
I don’t know if you remember me saying that we would have liked to create an assembler for Lobster assembly language, but hopefully you do not. It’s not likely to happen. The Milestones seem to be getting closer and closer together and there just isn’t enough time available to create the program.

It seems that all the documents are nearly finished, so this ends Milestone 3.

October 22nd, 2003 (Day whatever + 12)

Well, for a change of pace, the design journal for this Milestone shall be done by me, Andrew Toth. The team assignments for this milestone split nicely among the other members of the group, so I will be taking care of this. Although I was initially assigned to make the “sll” component, Mike and I had a fierce battle over who would actually do it. Before it got too violent I decided to let him do it. Seeing as how I am the only one in our group who gets to experience the amazing joys of the Sophomore Engineering curriculum this quarter, I thought he deserved to have a little fun too. That, and the fact that he went ahead and did it last night and I did not, so therefore he got to do it. So tonight I simply have to update the design journal.

Brandon is working on writing our assembler. We decided it would be a good idea to make one after all, since it was our group who suggested it and we want those 10 extra points. He will also do some touch up and administration of the webpage once everything is gathered from the group.

Mike has spent countless hours glued to his laptop and Xilinx and has created and tested the components for registers A, B, and C, the PC, the IR, the Interrupt Control, the ALU, the Register file, the IT register file, and, as mentioned above, the SLL component. There were a few delays when his index finger cramped up, and he may need carpal tunnel release surgery, but all is well for now. While still at it he has also made documentation of the testing of all the components.
Scott has been busy working on translating the state diagrams for the control into Xilinx, and will be including the minor changes needed to catch up with our latest modifications.

I sadly have only done this humble design journal, but most of the other work was done even before I asked how the work was being divided up for this part of the project, so oh well. Perhaps some screenshots or the like will be added to this later.

October 23rd, 2003

Today hasn’t been a productive day, and this is Brandon again. This morning at about 1:41am, Xilinx completely destroyed Mike’s computer while he tried to add the HDL file to the Lobster project. It turns out that any computer this is tried on instantly bluescreens. The goal for today is to complete the milestone, but with the current computer difficulties, this may not be doable. The way around this glitch (hopefully) is to create another project and add the HDL file of the control first, and then add all the other Lobster files after it. As it stands right now, everything is done except testing the control.

There were some changes made to the RTL because interrupts are finally understood, and thanks to some advice from the professor Mike was able to add the HDL file to the project. To do this, he created a blank file and added it, and then copied and pasted the actual HDL code into the new file.

The webpage is updated. Unfortunately, this milestone did not let me make any neat graphics to look at. I tried; however, getting high quality images out of Xilinx is quite difficult.

At this point the assembler can convert all A-type and L-type instructions to binary and decimal equivalents. I will continue programming for hours on end to complete it for next Wednesday. This milestone is done.

November 7th, 2003

The project is almost done; however, the long process of making sure each instruction works correctly has just begun. For this milestone, the work split-up is as follows.
Mike is working diligently in Xilinx debugging the processor. Andrew is writing up the final report. And I (Brandon) will be making the presentation for presentation day. Scott I haven’t seen in class for a couple of days, so no assignment for him has been made yet. The presentation will be created in Flash, since it allows more flexibility than PowerPoint. The final report will set a new standard by which literature is written, and Lobster will be faster than any processor currently on the market.

We’re hoping to have the processor fully working by Monday, so as to show it to the professor for review. It is also our goal to implement this on the chip emulator thing that takes our processor and pretends to be it, or whatever it does. This is of course dependent on how long Mike wishes to stare at Xilinx in order to make it work. Today he successfully completed the implementation of lli, add, and assert. In order to get these instructions to work, changes needed to be made to the existing RTL and to the logic of the processor.

The change to the RTL was moving the PC increment to clock cycle 3. This was done because if the PC was incremented in clock cycle 2, as it was, the instruction that was supposed to be executed was overwritten by the instruction at PC + 3. The timing was such that the PC was done being incremented while the IR was still reading the value for the instruction.

The changes inside Lobster included the following. Temporary registers A and B now have an inverted clock, so they read on the other edge of the clock cycle. This was done because A and B were not getting their values soon enough. A and B now get their values a half clock cycle earlier, which makes the ALU slightly happier than before. The Control Unit had a pretty big mistake in selecting the Reg1Mux value: it was set to select input 2 instead of 1. Other than that, the changes were simply random simple mistakes.
The presentation made a little progress today as well. I have decided to start with discussing our machine language specification: instruction formats, opcode lengths, blah, blah, blah. I have created the frames necessary for whoever will be discussing this section.

November 8th, 2003

Today is a continuation of the debugging from yesterday. Hours and hours of Mike staring at his computer screen may have caused him to go temporarily insane; however, the thought of another 5 or so instructions working may help him a little. I was writing small programs and then compiling them with the assembler, which I modified so it outputs in the format needed for the RAM. These programs were then loaded into Lobster, and slowly we were able to get it to the point where it was able to find the gcd of 2 numbers. To get this to work, each instruction had to be debugged one at a time.

“Sub” required the entire adder to be redone. The ALU that we built was a little too buggy to be used, so we switched to a generated ALU from Xilinx. Doing this was more difficult than that sentence implies (according to Mike). After the switch, each instruction that was working yesterday had to be retested.

“OR” just worked after the new ALU was added, so nothing was changed.

“SLL” required a change to the control unit state diagram. The CMux was set in one state; however, it was not saved when going to the next state. This caused CMux to be reset to 0. It could not explicitly be set in the following state since there were other instructions branching to it; therefore SLL had to be separated out.

“SW”, “LW”, and “MASKI” just worked all by themselves.

“JAL” had a pretty interesting problem. What was happening was that the linking part was coming after the jumping part. So when it was storing the address in $ra, it was the address it was jumping to instead of the address of where it was currently at.
This was solved by adding a state in the control unit which linked before it jumped.

“BEQ” needed another state as well. This time the state’s only purpose was to waste time. This was because the values being compared weren’t being loaded into the registers before they were compared. We needed to give the registers more time to load the values.

“BNE” had the same problems as “BEQ”, plus the ALU output didn’t stabilize in time for the PC to be written. So we had to add another waiting state to allow time for the ALU to process the inputs.

“SLT” again had the same problems as “BNE” and “BEQ”. Much like “BNE”, a waiting state had to be added since the ALU did not stabilize in time for the PC to be written.

After all of these instructions were found to work, I started writing programs while Mike ran them. The first program was just a very simple add and load program that then asserted a value to the display ports. After this program ran successfully, I modified it with some jumping. This program also ran successfully. We hit a problem when I did some pretty wild jumping and jump and linking. Apparently my assembler hit a glitch (I still can’t explain it) that caused it to load label addresses wrong. Once we found the problem, through very exhaustive step-by-step debugging, the program ran successfully.

The next step was to try Euclid’s Algorithm in Lobster. This failed pretty badly. I then went through it and found that the way it was currently written got itself into an infinite loop. We decided we would be happy if the processor could just find the gcd of any two numbers. So I wrote the algorithm in Lobster necessary to do this. After some minor problems in the program were fixed, like branching when a<b instead of a>b, the processor was successfully able to run this program. It calculated the gcd of any numbers we were able to throw at it.

Just for kicks, we decided to calculate what the clock speed of Lobster is.
Since the high and low times of the clock are each 1250 ps, or 1.25 ns, the total clock cycle time is 2.5 ns. Using the equations learned in class, and the magic of Maple, we calculated the clock speed to be 1 / 2.5 ns = 400 MHz.

November 9th, 2003

The processor works today! Now let’s discuss how it got there. Again, Mike worked hour after hour debugging and debugging to make this miracle happen. The only untested instructions that needed to work were “RFI”, “IAT0”, and “IAT1”. Apparently the only major change needed to get these three working was the inversion of the clock input on the ITReg file. Other than that, all that was required was minor tweaking: WIB had to be reset during the last cycle of the interrupt, not the first, and values had to be reinitialized in each state, since Xilinx does not automatically carry over values.

Mike modified the program I wrote yesterday to now find the lowest relatively prime number to a given input. This time the program uses procedures, unlike our very first program. For some reason the assembler now works fine (still can’t explain it). As far as we can tell, the processor is working just fine. The only problem is that the algorithm takes a long time when checking large numbers, so testing is becoming a more and more difficult process. We tested the number 2*3*4*5, which equals 120. The lowest relatively prime number to this number is seven; however, the number of clock cycles needed to calculate this is 14865. This takes a long time to render in ModelSim. We can now see why this program was chosen to test our processors; it is just about the simplest program that takes the most amount of time. With a little work, the program was modified to remove eight instructions per iteration. Now the relative prime can be found in 7163 clock cycles.

A Whole Bunch of Days all in One Glorious Section

After the processor worked, the last thing was just the compilation of documents, creating the final presentation, and final report.
However, since we completed it so early it was our responsibility, as good citizens, to implement it in hardware. This was so much harder than typing the four words to describe it.

The first gigantic undertaking was figuring out which pin was which. We were so kindly provided with the pin configuration of the FPGA, and that of the LED board, but there was no document describing the communication between them. So after six hours of work, I was never so excited to see a number display on an LED.

The next problem was that of displaying multiple numbers. For some unbelievably stupid reason, the LED board didn’t have inputs for each separate LED. Therefore, I had to refresh through all four at 4ms each. Getting it to do this was another complicated process. I went and talked to the professor, and after a couple of hours it was decided that to do this we had to:

- Slow the clock speed by counting to 2^14
- Use the 14th bit as the clock to a 2 bit counter
- Feed the output of the counter to a multiplexer which selects each of the LEDs at 4ms each

The reason I couldn’t cycle through at clock speed was that the LEDs act like capacitors. They need time to charge and discharge before lighting again. The result of 50 MHz cycling was a faint 8 on each LED. The reason for the 8 instead of the input was that it was cycling faster than the multiplexer could select a 7 bit value for the LED.

After I got the displays to fully work, I made a sub circuit that took in a 4 bit number and outputted the correct 7 bit LED signal. The next step was making a simple program work. I was able to put a counter on it that counted down if a switch was enabled, and up when it was disabled. Nothing happened when it hit zero; it just started over.

Of course, all of this eventually led to nothing. Our processor only works in a specific range of clock speeds. The FPGA runs at 50 MHz, and our processor, for some unknown reason, does not work at that speed.
It’s a sad day for Comp Arch kind, a sad day indeed.
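The display logic described in this section, a 4 bit to 7 bit decoder plus a 2 bit counter that selects one digit at a time, can be sketched in software. The Python below is only an illustration under stated assumptions and is not the circuit that was built: the function names are hypothetical, and the segment bit order (g f e d c b a, with bit 0 as segment a) and the active-high polarity are assumptions, since the journal does not record the LED board's wiring.

```python
# Seven-segment patterns for hex digits 0-F.
# Assumed bit order: g f e d c b a (bit 0 = segment a), 1 = segment lit.
SEGMENTS = [
    0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07,  # 0-7
    0x7F, 0x6F, 0x77, 0x7C, 0x39, 0x5E, 0x79, 0x71,  # 8-F
]


def segments_for(nibble: int) -> int:
    """Decode a 4 bit number into a 7 bit segment pattern."""
    if not 0 <= nibble <= 0xF:
        raise ValueError("nibble must be in the range 0..15")
    return SEGMENTS[nibble]


def digit_pattern(value: int, select: int) -> int:
    """Pattern for the digit picked by the 2 bit counter (0 = rightmost digit)."""
    if not 0 <= select <= 3:
        raise ValueError("select must be in the range 0..3")
    nibble = (value >> (4 * select)) & 0xF
    return segments_for(nibble)


def refresh_sequence(value: int) -> list[int]:
    """Patterns driven in one full pass of the 2 bit counter over four digits."""
    return [digit_pattern(value, select) for select in range(4)]


if __name__ == "__main__":
    for pattern in refresh_sequence(0x1234):
        print(format(pattern, "07b"))
```

In this encoding the digit 8 is the pattern with all seven segments lit, which is consistent with the faint 8 seen when the digits were cycled too quickly for a single pattern to settle.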