advanced processor architectures and memory organisation

Digital Signal Processors (DSPs)
DSP
• Advanced signal processor circuits
• MAC (Multiply and Accumulate) unit (s) - provides
fast multiplication of two operands and
accumulating results at a single address.
• MAC unit computes fast an expression such as the
following summation. yn = Σ (ai × xn−i) where the
sum is made for i = 0, 1, 2, …, N-1. Here i, n and N
are the integers, ai is a coefficient, xj is
independent variable or an input element and yk
is the dependent variable or an output element.
DSP
• DSP processors invariably have Harward
architecture.
• Caches are organized in Harward architecture
separate I-cache and D-Cache
• Basic Units: Memory, Internal Bus, Data bus,
Address bus, control bus, Bus Interface Unit,
Instruction fetch register, Instruction decoder,
Control unit, Instruction Cache, Data Cache,
multistage pipeline processing, multi-line
superscalar processing for obtaining processing
speed higher than one instruction per clock cycle,
Program counter
DSP special structural units
•
•
•
•
Packed Data processing
Parallel Execution MAC units
Special Instructions
Instruction Packing unit
ADVANCED PROCESSOR ARCHITECTURES
AND MEMORY ORGANISATION
SHARC and Tiger SHARC
SHARC
• Processor from Analog Devices.
• SHARC stands for Super Harvard Single - Chip
Computer Super Harvard architecture means
more than one set of address and data buses for
data
• For example, set J of address and data buses for
data-memory space, set K of address and data
buses for data-memory space, and set I of address
and data buses for program memory space
SHARC functions
• Program memory configurable as program and
data memory parts (Princeton architecture)
• SHARC functions as VLIW (very large instruction
word) processor.
• used in large number of DSP applications.
• Controlled power dissipation in floating point
ALU.
• Different SHARCs can link by serial communication
between them
SHARC features
• Instruction word 48-bits
• 32-bit data word for integer and floating point
operations
• 40-bit extended floating point.
• Smaller 16 or 8 bit must also store as full 32-bit data.
Therefore, the big endian or little endian data
alignment is not considered during processing with
carry also extended for rotating (RRX).
• ON chip memory 1 MB Program memory and data
memory (Harvard architecture)
• External OFF chip memory
• OFF chip as well as ON-chip Memory can be
configured for 32-bit or 48 bit words.
Parallel Processing features
• Supports processing instruction level parallelism
as well as memory access parallelism.
• Multiple data accesses in a single Instruction
Addressing Features
• SHARC 32-bit address space for accesses
• Accesses 16 GB or 20 GB or 24 GB as per the word
size ( 32-bit, 40-bit or 48 bit) configured in the
memory for each address.
• When word size = 32-bit then external memory
configuration addressable space is 16 GB (232 × 4)
bytes
Word size features
• Provides for two word sizes 32-bit and 48-bit.
• SHARC two full sets of 16 general-purpose
registers, therefore fast context switching
• Allows multitasking OS and multithreading.
• Registers are called R0 to R15 or F0 to F15
depending upon integer operation configuration
or floating point configuration.
• Registers are of 32-bit. A few registers are special,
of 48 bits that may also be accesses as pair of 16bit and 32-bit registers.
TigerSHARC
• Highest performance density family of processors
from Analog Devices
• Precision high-performance integrated circuits
used in analog and digital signal processing
applications
• Designed for multiprocessing applications and for
peak performance greater than BFLOPS (billion
floating-point operations per second)
Example
• ADSP-TS203SABP-050 processor processes using
250 MHz clock and on chip memory of 6 M bits
and operates at 1.2V/3.3 V
• Low voltage design helps in processing with little
power dissipation.
• Analog Devices claims the highest performance
per Watt.
TigerSHARC
• Multiple TigerSHARCs can connect by serial
communication at 1 GBps.
• A TigerSHARC version has 24M bits ON-chip
memory.
• Two ALUs and twos set of address and data buses
for data memory
• On set of address and data buses for program
memory
• TigerSHARC is available as IP core also so that new
applications and enhancements can be
developed.
Arithmatic
Fn = Fx + Fy
Fn = Fx-Fy
Fn = ABS(Fx + Fy)
Fn = ABS(Fx-Fy)
Fn=(Fx + Fy)/2
COMP(Fx,Fy)
Fn = -Fx
Rn = Rx + Ry + CI
Add
Subtract
Absolute value of sum
Absolute value of difference
Average
Compare
Negate
Add with Carry
Arithmatic
Fn = ABSFx
Fn = PASS Fx
Fn = RND Fx
Fn = SCALE Fx by Ry
Rn = MANX Fx
Rn = LOGB Fx
Rn = FIX Fx,
Absolute value
CopyFx to Fn
Round
Scale exponent of Fx by Ry
Extract mantissa of Fx
Convert exponent of Fx to integer
Convert floating-point to integer
Fn = FLOAT Rx by Ry Convert integer to floating-point
Arithmatic
Fn = RECIPS Fx
Fn = RSQRTS Fx
Fn
Fn
Fn
Fn
=
=
=
=
Create seed for reciprocal
Create seed for reciprocal square
root
Fx COPYSIGN Fy Copy sign of Fy to Fx
Minimum of Fx, Fy
MIN(Fx.Fy)
Maximum of Fx, Fy
MAX(Fx,Fy)
Clip Fx within range [-Fy,Fy]
CLIP Fx by Fy
Logical
Rn = LSHIFT Rx by Ry
Rn = Rn OR LSHIFT Rx by Ry
Rn=ASHIFT Rx by Ry
Rn = Rn OR ASHIFT Rx by Ry
Rn = ROT Rx by Ry
Rn = BCLR Rx by Ry
Rn = BSET Rx by Ry
Rn = BTGL Rx by Ry
Logical shift distance Ry
Logical shift and logical OR
Arithmetic shift
Arithmetic shift and logical OR
Rotate distance Ry
Clear one bit in Rx
Set one bit in Rx
Toggle one bit in Rx
Logical
BTST Rx by Ry
Rn = FDEP Rx by Ry
Rn = Rn OR FDEP Rx by Ry
Rn = FDEP Rx by Ry
Rn = Rn OR FDEP Rx by Ry
Rn = FEXT Rx by Ry
Rn = FEXT Rx by Ry
Rn = EXP Rx
Test one bit in Rx
Deposit field from Rx into Rn
Deposit field from Rx using OR
Deposit and sign extend field from Rx
Deposit and sign extend using OR
Extract field from Rx
Extract and sign extend field from Rx
Extract exponent field
Operations
Rn
Rn
Rn
Rn
= EXP Rx (EX)
= LEFTZ Rx
= LEFTO Rx
= FPACK Fx
Fx = FUNPACK Rn
Extract exponent field from ALU
Extract number of leading Os
Extract number of leading Is
Convert 32-bit floating-point to 16bit floating-point
Convert 16-bit floating-point to 32bit floating-point
Load Value
• R0 = DM (0x2000000);
• R0 = DM(_a);
loads R0 the contents of the variable a
• DM(_a) = R0;
stores R0 into memory location
Little Endian and Big Endian in a Memory Organization
Little Endian: The LSB Byte (8-bit) of a word of 32 bit is
written into the lowest address byte, and the other
bytes are written in increasing order of significance.
Word (32 bit): 0x90ABCDEF
Word Address: 0x1000
Address
0x1000
0x1001
0x1002
0x1003
Little
EF
CD
AB
90
Big Endian: The byte order is revered. The MSB Byte (8bit) of a word of 32 bit is written into the lowest
address byte, and the other bytes are written in
decreasing order of significance.
Word (32 bit): 0x90ABCDEF
Word Address: 0x1000
Address
0x1000
0x1001
0x1002
0x1003
Big
90
AB
CD
EF
Princeton Architecture
• 80x86 processors and ARM7 have Princeton
architecture for main memory. 8051-family
microcontrollers have Harvard architecture.).
Vectors and pointers, variables, program
segments and memory blocks for data and stacks
have different addresses in the program in
Princeton memory architecture.
Harvard architecture
• When the address spaces for the data and for
program are distinct
• Handling streams of data that are required to be
accessed in cases of single instruction multiple data
type instructions and DSP instructions.
• Separate data buses ensure simultaneous accesses
for instructions and data.
• Program segments and memory blocks for data and
stacks have separate set of addresses in Harvard
architecture.
• Control signals and read-write instructions are also
read-write instructions are also separate for accessing
the program memory and data memory.