Thesis Presentation - CSE

Asynchronous 8051 Microcontroller Presentation
By: Ryan Mabry
April 18, 2005
Agenda
• 8051 Background
• Motivation
• Architecture
• Design Flow
• Design Implementation
• Results
• Challenges
• Conclusion
8051 Background
• Developed by Intel in 1980
• Widely used in embedded systems
• Still very popular after 25 years on the market
• Official 8051 family designation is MCS-51
• Based on the Harvard architecture: separate memories for instructions and data
- ROM stores program instructions
- RAM stores program data
8051 Background Continued
• The 8051's predecessor was the 8048
- Used in IBM's first PC keyboard
• The enhanced version of the 8051 is the 8052
- Increased internal memory capacity
- Additional timer
- More registers
Motivation
• The project is based on the synthesizable VHDL 8051 model
developed by the University of California, Riverside's Dalton Project
(http://www.cs.ucr.edu/~dalton/8051)
• Two goals
A) Develop an asynchronous 8051
B) Use synchronous design tools in the process
• Asynchronous advantages
A) Lower power consumption
B) No clock skew
Motivation Continued
• Asynchronous disadvantages
A) No complete design tool suite for asynchronous circuits
B) No global clock: communication must be done
through handshaking or other methods
C) Timing and data integrity must be ensured when using
asynchronous communication methods
Synchronous Architecture
[Figure: synchronous 8051 block diagram with blocks I8051_CTR, I8051_DEC, I8051_ALU, I8051_ROM, I8051_RAM and Ports; signals include Clock, rst, td, ip, Op-code, ALU-Op-code, Src-1/2/3, Carry-in 1 & 2, des, Carry-out 1 & 2, Overflow, addr, data, wr, Is-bit-addr, Data-bit and the port bus (Rd, wr, addr, data_out, data_in).]
Asynchronous Architecture
[Figure: asynchronous 8051 block diagram with the same blocks and signals as the synchronous version, plus an ALU Wrapper and a CTR Wrapper exchanging req/ack, and an on-chip Clocking Element that generates Clock.]
Architecture Differences
• The clock is generated on-chip in the asynchronous 8051
• The clock is stopped while the controller waits for the ALU to complete
an operation
- Implemented through handshaking signals
generated by the ALU and controller wrappers
• No excess cycles in the asynchronous controller
- Excess cycles: clock cycles in the synchronous version
in which the controller does nothing but wait for the
ALU to complete an operation
Asynchronous Design Flow
Functional Simulation -> Synthesis of Synchronous Blocks -> Timing Analysis -> Asynchronous Wrapper Design -> Timing Simulation
Asynchronous Design Flow Continued
• Functional Simulation – Verify the functionality of the design
A) Standard VHDL synthesis tools cannot synthesize
VHDL code that implements asynchronous logic, but such code can still be simulated
B) This project used ModelSim
C) Compare controller registers, memory contents
and instructions executed in the asynchronous and
synchronous versions, and verify that they match
• Synchronous Block Synthesis – Synthesize the synchronous
parts of both 8051 microcontrollers. This project used
Ambit BuildGates.
Behavioral Code -> Verilog Netlist
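The comparison in step C) can be sketched as a small Python script. It assumes, hypothetically, that each simulation run has been exported as a list of per-instruction snapshots containing the executed instruction plus register and memory values. The export format and the names `first_mismatch`, `sync_run` and `async_run` are illustrative and are not part of the project's tool flow.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical snapshot format: one dict per executed instruction, mapping
# "instr", register names and memory addresses to their values afterwards.
Snapshot = Dict[str, object]


def first_mismatch(sync_trace: List[Snapshot],
                   async_trace: List[Snapshot]) -> Optional[Tuple[int, str, object, object]]:
    """Return (step, field, sync_value, async_value) for the first difference, or None."""
    for step, (s, a) in enumerate(zip(sync_trace, async_trace)):
        for field in sorted(set(s) | set(a)):
            if s.get(field) != a.get(field):
                return step, field, s.get(field), a.get(field)
    if len(sync_trace) != len(async_trace):
        # One run executed more instructions than the other.
        return (min(len(sync_trace), len(async_trace)), "<instruction count>",
                len(sync_trace), len(async_trace))
    return None


if __name__ == "__main__":
    # Illustrative values only, not taken from the project's simulations.
    sync_run = [{"instr": "MOV A,#5", "ACC": 5}, {"instr": "ADD A,#3", "ACC": 8}]
    async_run = [{"instr": "MOV A,#5", "ACC": 5}, {"instr": "ADD A,#3", "ACC": 8}]
    print(first_mismatch(sync_run, async_run))  # None: the two runs agree
```

Reporting the first divergent step, rather than only a pass/fail flag, points directly at the instruction where the two controllers started to disagree.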
Asynchronous Design Flow Continued
• Timing Analysis – Generate delay numbers
A) Cadence Encounter generates parasitics for the circuits
B) Use Synopsys PrimeTime for critical path analysis
C) Import the parasitics and Verilog netlist into PrimeTime
D) Remove ALU operations one at a time, slowest first, to get the
delay of each operation
e.g.: remove the division case from the ALU to obtain the
critical path delay for multiplication
E) Also generate critical path numbers for the RAM, ROM,
decoder, and controller modules
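Step D) can be illustrated with a toy Python model. Here `report_critical_path` is a hypothetical stand-in for one PrimeTime run. It simply assumes that the ALU's critical path is set by the slowest operation still present in the netlist. The per-operation delays are the figures from the ALU Ops table later in this presentation. In the real flow those figures are the unknowns being measured.

```python
from typing import Dict, List, Tuple

# Per-operation delays (ps) from the "ALU Ops / Delay(ps)" table in this presentation.
ALU_DELAYS_PS: Dict[str, int] = {
    "division": 37000,
    "multiplication": 15800,
    "add_subtract": 12800,
    "logical": 9000,
}


def report_critical_path(ops: Dict[str, int]) -> Tuple[str, int]:
    """Hypothetical stand-in for a PrimeTime run on an ALU containing only `ops`."""
    slowest = max(ops, key=ops.__getitem__)
    return slowest, ops[slowest]


def delays_by_successive_removal(ops: Dict[str, int]) -> List[Tuple[str, int]]:
    """Measure, record the slowest operation's delay, delete that case, and repeat."""
    remaining = dict(ops)
    measured = []
    while remaining:
        op, delay = report_critical_path(remaining)
        measured.append((op, delay))
        del remaining[op]  # e.g. remove the division case to expose multiplication
    return measured


if __name__ == "__main__":
    for op, delay in delays_by_successive_removal(ALU_DELAYS_PS):
        print(f"{op:15s} {delay} ps")
```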
Asynchronous Design Flow Continued
• Asynchronous Wrapper Design
A) Implement the delay elements for the wrappers in the
Cadence Composer schematic editor
B) Combinational logic elements in the wrappers
can be designed in VHDL and then imported
C) Wire the two parts together in the schematic
• Timing Simulation
The asynchronous implementation could not be tested
because the university does not have a post-synthesis timing
simulator installed.
Design Implementation - Handshaking
• The controller needs an ALU operation to be performed:
A) Assert the request line
B) Stop the clock
• Once the ALU operation is finished:
A) Assert the acknowledge line
B) Start the clock
• Deassert the request line
• Deassert the acknowledge line
[Figure: handshake cycle Req+ -> Stop Clock -> Ack+ -> Start Clock -> Req- -> Ack-]
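A minimal Python sketch of this four-phase (return-to-zero) handshake follows. It steps through the request/acknowledge transitions and applies the Clocking Unit's rule that the clock is stopped only while req='1' and ack='0'. The function names `clock_enabled` and `handshake_trace` are hypothetical and do not come from the VHDL design.

```python
from typing import List, Tuple


def clock_enabled(req: int, ack: int) -> bool:
    """Clock runs unless a request is outstanding and not yet acknowledged."""
    return not (req == 1 and ack == 0)


def handshake_trace() -> List[Tuple[str, int, int, bool]]:
    """Return (event, req, ack, clock_running) for each phase of one handshake."""
    req, ack = 0, 0
    trace = [("idle", req, ack, clock_enabled(req, ack))]
    req = 1  # controller wrapper requests an ALU operation: clock stops
    trace.append(("Req+", req, ack, clock_enabled(req, ack)))
    ack = 1  # ALU wrapper's delay has elapsed: clock restarts
    trace.append(("Ack+", req, ack, clock_enabled(req, ack)))
    req = 0  # controller wrapper has seen ack and releases the request
    trace.append(("Req-", req, ack, clock_enabled(req, ack)))
    ack = 0  # falling request propagates to ack: back to idle
    trace.append(("Ack-", req, ack, clock_enabled(req, ack)))
    return trace


if __name__ == "__main__":
    for event, r, a, running in handshake_trace():
        print(f"{event:5s} req={r} ack={a} clock={'running' if running else 'stopped'}")
```

The clock is stopped only in the single phase between Req+ and Ack+. This is exactly the window in which the synchronous controller would otherwise burn excess cycles.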
Design Implementation – ALU Wrapper
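[Figure: ALU wrapper with inputs Req and ALU Opcode; Select Logic producing S2 S1 S0; delay elements for Logical Operations, Add, Subtract, Multiply and Divide; three 2-to-1 muxes selected by S0, S1 and S2, the last of which outputs Ack.]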
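One reading of the ALU wrapper diagram is sketched below in Python. The select logic decodes the ALU opcode into S2 S1 S0, and three cascaded 2-to-1 muxes choose which delay element's output becomes Ack. The mux wiring and the select encoding in `SELECT_BITS` are assumptions reconstructed from the diagram and are not confirmed by the slides.

```python
from typing import Dict, Tuple

# Assumed select encoding (S2, S1, S0) for each ALU operation class.
SELECT_BITS: Dict[str, Tuple[int, int, int]] = {
    "logical": (0, 0, 0),
    "add_subtract": (0, 0, 1),
    "multiply": (0, 1, 0),
    "divide": (1, 0, 0),
}


def mux2(sel: int, in0: str, in1: str) -> str:
    """2-to-1 mux: pass in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0


def ack_source(s2: int, s1: int, s0: int) -> str:
    """Name of the delay element whose output drives Ack (assumed cascade)."""
    stage0 = mux2(s0, "logical", "add_subtract")  # first mux, selected by S0
    stage1 = mux2(s1, stage0, "multiply")         # second mux, selected by S1
    return mux2(s2, stage1, "divide")             # last mux, selected by S2


if __name__ == "__main__":
    for op, bits in SELECT_BITS.items():
        print(op, bits, "->", ack_source(*bits))
```

In this cascade S2 has the highest priority: when it is set, the division delay drives Ack regardless of S1 and S0.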
Design Implementation – ALU Wrapper Continued
• Remove operations from the ALU to obtain delay numbers
• Buffers are used as the building block for each delay element
- Delay of 114 ps (100 ps used to simplify the design)
• PrimeTime was used for critical path analysis
• A 50% safety margin is applied to the initial numbers to account for
operating conditions such as temperature changes and
voltage fluctuations
ALU Ops              Delay (ps)   Buffers
Division             37000        163
Multiplication       15800        30
Add & Subtract       12800        38
Logical Operations   9000         90
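The sizing rule on this slide can be written as a small Python helper. It applies the 50% safety margin to a measured critical path, divides by the 100 ps nominal buffer delay, and rounds up. The input in the usage line is a hypothetical value, not one of the measured paths.

```python
import math

NOMINAL_BUFFER_PS = 100  # design value from the slide (rounded from 114 ps)
ACTUAL_BUFFER_PS = 114   # buffer delay reported on the slide
MARGIN_PERCENT = 50      # safety margin for temperature and voltage variation


def buffers_needed(critical_path_ps: float) -> int:
    """Number of buffers whose nominal delay covers the path plus margin."""
    target_ps = critical_path_ps * (100 + MARGIN_PERCENT) / 100
    return math.ceil(target_ps / NOMINAL_BUFFER_PS)


if __name__ == "__main__":
    path_ps = 10_000  # hypothetical critical path, not a measured value
    n = buffers_needed(path_ps)
    print(n, "buffers, actual delay", n * ACTUAL_BUFFER_PS, "ps")
```

Each buffer actually takes 114 ps, not the 100 ps nominal value used for sizing. Every delay element therefore comes out about 14% longer than planned. For a matched delay this errs on the safe side: Ack arrives later than strictly necessary, never earlier.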
Design Implementation – Controller Wrapper
[Figure: CTR Wrapper block with signals ALU Op-code, Ack and Req.]
• Asserts the request signal while the controller is waiting for the ALU
to complete an operation
• Deasserts the request signal once the acknowledge signal from the
ALU wrapper is received
• Implemented in VHDL
Design Implementation – Controller Modifications
• Excess cycles eliminated
Example: the ADDC_1 instruction takes 8 clock cycles
in the synchronous controller and 6 clock cycles in the
asynchronous controller
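when OPC_ADDC_1 =>
  case exe_state is
    -- …
    when ES_5 =>  -- excess cycle: controller only waits for the ALU
      exe_state <= ES_6;
    when ES_6 =>  -- excess cycle: controller only waits for the ALU
      exe_state <= ES_7;
    when ES_7 =>  -- last execute state: shut down the ALU, advance cpu_state, reset exe_state
      SHUT_DOWN_ALU;
      cpu_state <= CS_1;
      exe_state <= ES_0;
  end case;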
States ES_5 and ES_6 are excess cycles and are
eliminated in the asynchronous version
Design Implementation – Clocking Unit
[Figure: clocking unit in which req and ack gate an inverter chain that produces Clock.]
• When req='1' and ack='0' the clock is stopped; otherwise it
behaves as a synchronous clock
• The inverter chain delay is longer than the critical path in the
RAM to avoid timing violations
• The critical path in the RAM module is 30.9 ns
• Each inverter has a delay of 50 ps, and the chain is
682 inverters long (682 × 50 ps = 34.1 ns > 30.9 ns)
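The chain length can be checked numerically in Python. The check uses the slide's simplification that the chain delay is the sum of identical 50 ps inverter delays. Under that model, 618 inverters is the minimum that covers 30.9 ns. A 682-inverter chain gives 34.1 ns, about 3.2 ns of slack.

```python
import math

RAM_CRITICAL_PATH_PS = 30_900  # 30.9 ns, from the slide
INVERTER_DELAY_PS = 50         # per-inverter delay, from the slide
CHAIN_LENGTH = 682             # inverter count stated on the slide


def min_chain_length(path_ps: int, inverter_ps: int) -> int:
    """Fewest inverters whose summed delay is at least path_ps."""
    return math.ceil(path_ps / inverter_ps)


if __name__ == "__main__":
    chain_delay_ps = CHAIN_LENGTH * INVERTER_DELAY_PS
    print("minimum length:", min_chain_length(RAM_CRITICAL_PATH_PS, INVERTER_DELAY_PS))
    print("chain delay:", chain_delay_ps, "ps; slack:",
          chain_delay_ps - RAM_CRITICAL_PATH_PS, "ps")
```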
Results
• Targeted the VTVT standard cell library developed by Virginia
Tech VLSI for Telecommunications
• The asynchronous 8051 consumes more area due to its on-chip
clock and wrappers; RAM dominates both chip areas
Asynchronous Cell Area: 72400
Synchronous Cell Area: 65662
• The divmul program from the Dalton website was used to roughly
benchmark both designs in ModelSim
Asynchronous Simulation Time: 172,030 ns
Synchronous Simulation Time: 221,390 ns
• The asynchronous 8051 is roughly 28.7% faster (221,390 / 172,030 ≈ 1.287)
while using about 10% more area (72,400 / 65,662 ≈ 1.103) than the synchronous version
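The percentages on this slide follow from the raw numbers, as the Python below shows. "28.7% faster" is the throughput ratio. Expressed as a reduction in run time, the same result is a 22.3% saving.

```python
SYNC_TIME_NS = 221_390   # divmul benchmark, synchronous 8051
ASYNC_TIME_NS = 172_030  # divmul benchmark, asynchronous 8051
SYNC_AREA = 65_662       # cell area, synchronous
ASYNC_AREA = 72_400      # cell area, asynchronous

speedup_pct = (SYNC_TIME_NS / ASYNC_TIME_NS - 1) * 100     # throughput gain
time_saved_pct = (1 - ASYNC_TIME_NS / SYNC_TIME_NS) * 100  # run-time reduction
area_overhead_pct = (ASYNC_AREA / SYNC_AREA - 1) * 100     # extra cell area

print(f"speedup:       {speedup_pct:.1f}%")        # speedup:       28.7%
print(f"time saved:    {time_saved_pct:.1f}%")     # time saved:    22.3%
print(f"area overhead: {area_overhead_pct:.1f}%")  # area overhead: 10.3%
```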
Challenges
• Had to learn all of the different tools
A) Technical assistance was available for Ambit
BuildGates and Cadence Encounter
B) Resorted to user manuals and the Internet for
Synopsys PrimeTime
• Learned other tools that turned out to be unnecessary for the design flow
- Time spent learning Synopsys Design Analyzer and
TimeMill could have been better spent on later stages
of the design flow
Conclusion
• Converting an existing synchronous design into an
asynchronous one takes a lot of work
• Using synchronous design tools in the asynchronous design
flow made the process much easier
• Because no post-synthesis timing simulator is installed, the
correctness of the asynchronous design could not be
verified
• I would like to thank Narender Hanchate for his time in
helping me learn most of the tools used in this project