Embedded-Systems-PMarwedel-3d-mem

Universität Dortmund
Embedded System Hardware
Embedded system hardware is frequently used in a loop
(„hardware in a loop“):
actuators
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 0-
Universität Dortmund
Summary
Processing units
 Power efficiency of target technologies
 ASICs
 Processors
• Energy efficiency
• Code size efficiency and code compaction
• Run-time efficiency
• DSP processors
• Multimedia processors
• Very long instruction word (VLIW) machines
• Micro-controllers
 Reconfigurable Hardware
 Memory
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 1-
Universität Dortmund
Memory
For the memory, efficiency is again a concern:
 speed (latency and throughput); predictable timing
 energy efficiency
 size
 cost
 other attributes (volatile vs. persistent, etc)
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 2-
Universität Dortmund
Access times and energy consumption
increases with the size of the memory
Example (CACTI Model):
"Currently, the size of
some applications is
doubling every 10
months"
[STMicroelectronics,
Medea+ Workshop,
Stuttgart, Nov. 2003]
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 3-
Universität Dortmund
Access times and energy consumption
for multi-ported register files
Area (l2x106)
Cycle Time (ns)
Power (W)
1.8
7
14
1.7
6
12
5
10
4
8
3
6
1.2
2
4
1.1
1
2
1
0
0
1.6
1.5
1.3
16
32
64
Register File Size
128
16
32
GP6M2
64
128
GP6M3
16
32
128
64
Register File Size
Rixner’s et al. model [HPCA’00], Technology of 0.18 mm
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 4-
Source and © H. Valero, 2001
1.4
Universität Dortmund
Reducing energy consumption with Sub-banking
30
1-Subbank
4-Subbanks
16-Subbanks
25
Energy [uJ]
shorter wires,
lower capacitances,
faster access
2-Subbanks
8-Subbanks
20
15
10
5
Memory Size (bytes)
http://www.ecse.rpi.edu/frisc/theses/
MaierThesis/FIG319.GIF
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 5-
2M
8k
16
k
32
k
64
k
12
8k
25
6k
51
2k
1M
4k
2k
1k
51
2
25
6
0
Universität Dortmund
How much of the energy consumption
of a system is memory-related?
Mobile PC
Thermal Design (TDP) System Power
Other
13%
Other
13%
600/500 MHz uP
37%
Power Supply
10%
600/500 MHz uP
13%
Power Supply
10%
Memory+Graphics
12%
HDD
9%
Mobile PC
Average System Power
LCD 10"
30%
Memory+Graphics
15%
LCD 10"
19%
HDD
19%
Note: Based on Actual Measurements
CPU Dominates Thermal
Design Power
[Courtesy: N. Dutt; Source: V. Tiwari]
Multiple Platform
Components Comprise
Average Power
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 6-
Universität Dortmund
„CPU“ Power Dissipation
EBOX
8%
DMMU
8%
Others
5%
Icache
26%
42%/40% again memory-related !
Clock
10%
IMMU
9%
Ibox
18%
Dcache
16%
TLB
17%
Data Flow
11%
I/O
7%
Clock
19%
Strong ARM
IEEE Journal of SSC
Nov. 96
Control L.
16%
PLA
5%
ROM
2%
Cache
23%
Power PC
Based on slide by and ©: Osman S. Unsal, Israel Koren, C.
Mani Krishna, Csaba Andras Moritz, University of
Massachusetts, Amherst, 2001
Proceedings of ISSCC 94
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 7-
Universität Dortmund
Access-times will be a problem
Speed gap between processor and main DRAM increases
Speed
• early 60ties (Atlas):
page fault ~ 2500 instructions
• 2002 (2 GHz µP):
access to DRAM ~ 500
instructions
 penalty for cache miss about
same as for page fault in Atlas
8
4
 2x
every 2
years
2
1
0
1
2
3
4
5
years
[P. Machanik: Approaches to Addressing the
Memory Wall, TR Nov. 2002, U. Brisbane]
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 8-
Universität Dortmund
Predictability is a problem (1)
Embedded system systems are often real-time systems:
 Have to guarantee meeting timing constraints.
Predictability: For satisfying timing constraints in hard realtime systems, predictability is the most important concern;
pre run-time scheduling is often the only practical means of
providing predictability in a complex system [Xu, Parnas]
Time-triggered, statically scheduled operating systems
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 9-
Universität Dortmund
Predictability is a problem (2)
Currently available caches don‘t solve the problem:
• Improve the average case behavior
• Use „non-deterministic“ cache replacement algorithms
Scratch-pad/tightly coupled memory based
predictability
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 10 -
Universität Dortmund
Hierarchical memories
using scratch pad memories (SPM)
Hierarchy
Example
main
Address
space
SPM
processor
0
no tag
memory
ARM7TDMI
cores, well-known
for low power
consumption
scratch pad memory
FFF..
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 11 -
Universität Dortmund
Comparison of currents using measurements
E.g.: ATMEL board with
ARM7TDMI and
ext. SRAM
Current
32 Bit-Load Instruction (Thumb)
200
mA
150
100
116
77,2
82,2
1,16
53,1
50
48,2
50,9
44,4
Prog Main/ Data
Main
Prog Main/ Data
SPM
Prog SPM/ Data
Main
0
Core+SPM (mA)
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
Prog SPM/ Data SPM
Main Memory Current (mA)
- 12 -
Universität Dortmund
Comparison of energy consumption
Example: Atmel ARM-Evaluation board
Main memory access takes more cycles
savings (86%) larger than for current.
energy reduction:
/ 7.06
Energy
10 nJ
32 Bit-Load Instruction (Thumb)
140,0
120,0
100,0
80,0
60,0
40,0
20,0
0,0
115,8
76,5
51,6
16,4
Prog Main/ Data Main
Prog Main/ Data SPM
Prog SPM/ Data Main
€
Prog SPM/ Data SPM
Energy
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 13 -
Universität Dortmund
Why not just use a cache ?
1. Predictability?
Worst case execution time
(WCET) may be large
[P. Marwedel et al., ASPDAC, 2004]
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 14 -
Universität Dortmund
Why not just use a cache ? (2)
2. Energy for parallel access of sets, in comparators, muxes.
.
9
8
Energy per access [nJ]
7
6
Scratch pad
5
Cache, 2way, 4GB space
4
Cache, 2way, 16 MB space
Cache, 2way, 1 MB space
3
2
1
0
256
512
1024
2048
4096
8192
16384
memory size
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
[R. Banakar, S. Steinke, B.-S. Lee, 2001]
- 15 -
Universität Dortmund
Influence of the associativity
Parameters different from
previous slides
[P. Marwedel et al., ASPDAC, 2004]
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 16 -
Universität Dortmund
Flash Memory
Based on EPROM/EEPROM-Memory
EPROM storage cell
Row (j)
FG charged
FG not charged
Ground
Read amplifier
floating gate FG can be charged by high voltage.
Charged transistor non-conducting if row selected.
Discharging with UV light.
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 17 -
Universität Dortmund
EEPROMS storage cell
EEPROM= electrically erasable read only memory.
EEPROM needs additional transistor per bit
Row (j)
FG charged
FG not charged
Ground
Writing and erasing requires just electrical signals
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 18 -
Universität Dortmund
NOR- and NAND-Flash
contact
contact
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
[www.samsung.com/Products/Semiconductor/
Flash/FlashNews/FlashStructure.htm]
NOR: Transistor between bit line and ground
NAND: Several transistor between bit line and ground
- 19 -
NOR
NAND
Random access
Yes 
No 
Erase block
Slow 
Fast 
Size of cell
Larger 
Small 
Reliability
Larger 
Smaller 
Execute in place
Yes 
No 
Applications
Code storage, boot
flash, set top box
Data storage, USB
sticks, memory cards
[www.samsung.com/Products/Semiconduc
Properties
of NORand NANDFlash
memories
Type/Property
tor/Flash/FlashNews/FlashStructure.htm]
Universität Dortmund
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 20 -
Universität Dortmund
Embedded System Hardware
Embedded system hardware is frequently used in a loop
(„hardware in a loop“):
actuators
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
For Impressive display technology see http://www.dateconference.com/conference/ 2003/keynotes/index.htm
- 21 -
Universität Dortmund
Digital-to-Analog (D/A) Converters
Various types, can be quite simple,
e.g.:
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 22 -
Universität Dortmund
Output voltage 
no. represented by x
Due to Kirchhoff‘s laws:
I  x3 

Due to Kirchhoff‘s laws:
Vref
R
Vref
R
 x2 
Vref
2R
 x1 
Vref
4R
 x0 
Vref
8 R
3
  xi  2i 3
i 0
V  R1  I '  0
Current into Op-Amp=0: I  I '
Hence:
Finally:
V  R1  I  0
 V  Vref
R1 3
R1
i 3
  xi  2  Vref 
 nat ( x )
R i 0
8 R
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 23 -
Universität Dortmund
Possible to reconstruct analog value from
digitized value?
Let fs be the sampling frequency
Input signals with frequency components > fs/2 cannot be
distinguished from signals with frequency components < fs/2.
Example: Signal: 5.6 Hz; Sampling: 9 Hz
1.5
1
0.5
0
-0.5
-1
-1.5
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 24 -
Universität Dortmund
Frequency spectrum of sampled signal
Let Xc(): frequency
spectrum of the
continuous signal, cutoff frequency ΩN=2π fN
Let Ωs=2π fs: sampling
frequency
Let Xs: frequency
spectrum of the
sampled signal
Xs consists of multiple
copies of Xc, separated
by Ωs
Cannot be distinguished in sampled signal
Formally: Xs = Xc folded with S,
with S: frequency spectrum of clock
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 25 -
Universität Dortmund
Aliasing
If Ωs < 2  ΩN copies of
spectrum will overlap
(we don’t know the
original frequencies any
more)
No problem for signal reconstruction if this is avoided.
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 26 -
Universität Dortmund
 Nyquist theorem
• Analog input to sample-and-hold can be precisely
reconstructed from its output, provided that sampling
proceeds at  double of the highest frequency found in the
input voltage. [Nyquist 1928, Shannon, 1949]
S/H
A/D-converter
D/A-converter
Interpolate
=?
Does not capture effect of value quantization:
Quantization noise prevents precise reconstruction.
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 27 -
Universität Dortmund
Embedded System Hardware
Embedded system hardware is frequently used in a loop
(„hardware in a loop“):
actuators
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 28 -
Universität Dortmund
Actuators and output
Huge variety of actuators and output devices,
impossible to present all of them.
Microsystems motors as examples (© MCNC):
(© MCNC)
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 29 -
Universität Dortmund
Actuators and output (2)
Courtesy and ©:
E. Obermeier, MAT,
TU Berlin
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 30 -
Universität Dortmund
Stepper Motor
 Stepper motor: rotates fixed number of degrees when
given a “step” signal.
 In contrast, DC motor just rotates when power applied.
 Rotation achieved by applying specific voltage sequence
to coils
 Controller greatly simplifies this
http://www.cise.ufl.edu/~prabhat/Teaching/cis6930-f04/comp4.pdf
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 31 -
Universität Dortmund
Summary
We have covered:
 sensors (briefly)
 sample-and-hold and A/D converter circuits
 communication
 information processing hardware
• ASICs (briefly)
• processors
• FPGAs
• memory
 D/A-converters
 actuators (briefly)
 P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6
- 32 -