A comparison of DSP Architectures
Blackfin ADSP-BFXXX Compute Unit
Based on an ENEL619.23 white paper
prepared by Darrell Anklovitch
Jan 28, 2004
Blackfin Compute Unit
REV B
Overview
• Architecture Overview
• Register Map
• ALU features and sample instructions
• Multiplier features and sample instructions
• Shifter features and sample instructions
References
• ADSP-BF535 Blackfin Processor Hardware Reference, Rev 2, April 2004, Analog Devices, Section 2
• Blackfin Processor Instruction Set Reference, Rev 2, May 2003, Analog Devices, Sections 8 to 10, 14 and 15
• A number of the figures in this presentation are based on figures found in the ADSP-BF535 Blackfin Processor Hardware Reference.
ADSP-2106x Core Architecture
[Figure: ADSP-2106x (SHARC) core block diagram]
Blocks: cache memory (32 x 48), JTAG test & emulation, flags, timer, program sequencer, DAG 1 (8 x 4 x 32), DAG 2 (8 x 4 x 24), bus connect, register file (16 x 40), floating & fixed-point multiplier with fixed-point accumulator, 32-bit barrel shifter, floating-point & fixed-point ALU
Buses: PMA bus (24), DMA bus (32), PMD bus (48), DMD bus (40)
Register File and COMPUTE Units
• Key issues
– 5 data paths FROM COMPUTE units
– 5 data paths TO COMPUTE units
– Highly parallel operations UNDER THE RIGHT CONDITIONS
BF533 Memory Accesses
Under the right conditions: 4 memory accesses at the same time
64 bit Instruction Fetch, 2x32 bit Data Loads, 32 bit Data Store
PLUS up to 2 ALU(32 bit) and 2 MAC(16 bit) operations at the same time
PLUS background DMA activity
Compute Unit Architecture
[Figure: compute unit block diagram]
• Register File
• 1 Shifter
• 2 Multipliers
• 2 ALUs
• 1 set of Video ALUs
Register File
• 8 x 32 bit OR 16 x 16 bit data registers
• 2 x 40 bit accumulators
DATA REGISTER SYNTAX:
• R0, R1 etc. refer to 32 bit registers
• R0.L refers to the low 16 bits of the 32 bit R0 register
• R0.H refers to the high 16 bits of the R0 register
ACCUMULATOR SYNTAX:
• A0.L => low 16 bits
• A0.H => next 16 bits
• A0.W => least significant 32 bit word
• A0.X => most significant 8 bit extension
SHARC: 16 32-bit data registers, integer and float. There is a pair of SHARC accumulator registers too.
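The register and accumulator naming above can be illustrated with a small Python sketch. It only models how the `.L`, `.H`, `.W` and `.X` views map onto bit fields, as described on this slide; the function names are hypothetical and registers are assumed to be held as plain unsigned integers.

```python
def low16(reg32):
    """R0.L: low 16 bits of a 32 bit data register."""
    return reg32 & 0xFFFF


def high16(reg32):
    """R0.H: high 16 bits of a 32 bit data register."""
    return (reg32 >> 16) & 0xFFFF


def acc_views(acc40):
    """Split a 40 bit accumulator value into its named views."""
    acc40 &= (1 << 40) - 1  # keep only the 40 accumulator bits
    return {
        "L": acc40 & 0xFFFF,          # A0.L: low 16 bits
        "H": (acc40 >> 16) & 0xFFFF,  # A0.H: next 16 bits
        "W": acc40 & 0xFFFFFFFF,      # A0.W: least significant 32 bit word
        "X": (acc40 >> 32) & 0xFF,    # A0.X: most significant 8 bit extension
    }
```

For example, `acc_views(0xAB12345678)` splits the value into X = 0xAB, H = 0x1234 and L = 0x5678.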
ALU Data Flow
• 2 x 32 bit paths to dual Multiplier/ALU units
• 2 x 32 bit paths back to register file
Sample instructions
Blackfin: R0 = R1 + R2;
SHARC:    R0 = R1 + R2;
68K:      MOVE.L R2, R0
          ADD.L R1, R0

Blackfin: R0.L = R1.L + R2.H;
68K:      MOVE.W R2, R0
          ADD.W R1, R0

Blackfin: R0 = R1 +|- R2;
          Means R0.L = R1.L - R2.L in parallel with R0.H = R1.H + R2.H
SHARC:    Closest is R0 = R1 + R2, R4 = R1 - R2;
68K:      MOVE.L R2, R0
          ASR.L #16, R0
          MOVE.L R1, R3
          ASR.L #16, R3
          ADD.W R3, R0
          ASL.L #16, R0
          MOVE.W R2, R0
          ADD.W R1, R0
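As a rough illustration of what `R0 = R1 +|- R2;` computes, the following Python sketch models the two 16 bit halves independently, following the slide's description (high halves added, low halves subtracted). It is a behavioural sketch only: the function name is hypothetical, registers are assumed to be unsigned 32 bit integers, and results simply wrap at 16 bits with no saturation or flags.

```python
MASK16 = 0xFFFF


def dual_add_sub(r1, r2):
    """Model of R0 = R1 +|- R2: R0.H = R1.H + R2.H, R0.L = R1.L - R2.L."""
    hi = ((r1 >> 16) + (r2 >> 16)) & MASK16        # add the high halves
    lo = ((r1 & MASK16) - (r2 & MASK16)) & MASK16  # subtract the low halves
    return (hi << 16) | lo
```

With R1 = 0x00030005 and R2 = 0x00010002 the high half becomes 3 + 1 and the low half 5 - 2, giving 0x00040003. Neither half carries into or borrows from the other.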
ALU Features
[Figure: 32 bit source registers Rm and Rp, result register Rn]
ALU operations can be:
• Single 16 bit ops
• Dual 16 bit ops
• Single 32 bit ops
• Dual 16 bit cross ops
ALU Sample Instructions
• Single 16 bit ops
• Dual 16 bit ops
• Single 32 bit ops
  – Does not work in parallel
  – Must have this option
• Dual 32 bit ops
• Quad 16 bit ops (figure labels: C, A, B and D, A, B)
  – Operator order is important: + must come before -
  – A and B registers must stay on the same side of the '|' in both instructions
• For dual and quad 16 bit operations the (CO) option crosses the results: the high and low halves are swapped in the destination register
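The effect of the (CO) option on a dual 16 bit operation can be sketched as below. This is an illustrative Python model, not vendor code: `dual16` and its `cross` flag are hypothetical names, registers are assumed to be unsigned 32 bit integers, and results wrap at 16 bits (no saturation).

```python
import operator


def dual16(r1, r2, op_hi, op_lo, cross=False):
    """Apply op_hi to the high halves and op_lo to the low halves.

    With cross=True the two 16 bit results swap places in the
    destination, modelling the (CO) option.
    """
    hi = op_hi((r1 >> 16) & 0xFFFF, (r2 >> 16) & 0xFFFF) & 0xFFFF
    lo = op_lo(r1 & 0xFFFF, r2 & 0xFFFF) & 0xFFFF
    if cross:
        hi, lo = lo, hi  # results cross in the destination register
    return (hi << 16) | lo
```

Calling `dual16(r1, r2, operator.add, operator.sub)` models `+|-`, and passing `cross=True` places the sum in the low half and the difference in the high half.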
Multiply Data Flow
• 2 x 32 bit paths to dual Multiplier/ALU units
• Multipliers share the same operand/result buses as the ALU
• 2 x 40 bit accumulators
• 2 x 32 bit paths back to register file
Multiply Features
16 x 16 bit multiplies can pair the register halves as L x L, L x H, H x L or H x H
• Multiplies are signed fractional by default
• A signed fractional multiply result is automatically left shifted by 1 bit
• Signed fractional multiply != signed integer multiply
• Rounding is available on fractional multiplies and, as a special option, on integer multiplies
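The difference between the two multiply types can be seen in a short Python sketch. It assumes 16 bit operands in 1.15 signed fractional format, so the left shift by 1 bit turns the 2.30 product into a 1.31 result. The function names are hypothetical, and the saturation of -1 x -1 to 0x7FFFFFFF is included as an assumption about the hardware rather than something stated on the slide.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def int_mul(a, b):
    """Signed integer multiply: plain 16 x 16 -> 32 bit product."""
    return to_signed16(a) * to_signed16(b)


def frac_mul(a, b):
    """Signed fractional multiply: product left shifted by 1 bit."""
    product = (to_signed16(a) * to_signed16(b)) << 1
    # Assumption: 0x8000 * 0x8000 (-1 * -1) cannot be represented in 1.31
    # and is clamped to the largest positive value.
    return min(product, 0x7FFFFFFF)
```

For 0x4000 x 0x4000 (0.5 x 0.5) the fractional result is 0x20000000 (0.25 in 1.31 format), while the integer result is 0x10000000.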
Rounding
2 cases:
– 32 bit multiplier result (Rm x Rp): 0x8000 is added, then the top 16 bits go to destination register Rd
– Accumulator value: 0x8000 is added, then the top 16 bits go to destination register Rd
Rounding adds 0x8000 to the 32 bit multiplier result or
accumulator value before extracting a 16 bit value to the
destination register
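The rounding step described above can be sketched in Python as follows. This models only what the slide states (add 0x8000, keep the top 16 bits); the function name is hypothetical, the input is assumed to be a signed 32 bit value, and the clamp that stops the addition overflowing is an assumption added for the sketch.

```python
def round_to_16(value32):
    """Round a signed 32 bit value to 16 bits by adding 0x8000."""
    rounded = value32 + 0x8000
    # Assumption: clamp so that rounding cannot overflow past 0x7FFFFFFF.
    rounded = min(rounded, 0x7FFFFFFF)
    return (rounded >> 16) & 0xFFFF  # top 16 bits go to the destination
```

A value of 0x12347FFF rounds down to 0x1234, while 0x12348000 rounds up to 0x1235.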
Fractional Multiply
Fractional Multiply != Integer Multiply
• When extracting a 16 bit fractional value from an accumulator, the high 16 bits are taken
• Where it goes in the destination register depends on which accumulator is being extracted from
Integer Multiply
Fractional Multiply != Integer Multiply
• When extracting a 16 bit integer value from an accumulator, the low 16 bits are taken
• Where the 16 bit value goes in the destination register depends on which accumulator is being extracted from
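The two extraction rules (high 16 bits for fractional values, low 16 bits for integer values) can be compared in one Python sketch. It assumes that A0 writes the low half and A1 writes the high half of the destination register, which matches the `R3.L = (A0 ...)` and `R3.H = (A1 ...)` forms used in the sample instructions; the function name is hypothetical and rounding and saturation are left out.

```python
def extract16(acc, dest, which_acc, fractional=True):
    """Copy 16 bits of an accumulator into one half of a 32 bit register.

    fractional=True takes accumulator bits 31..16, False takes bits 15..0.
    which_acc=0 (A0) writes dest.L, which_acc=1 (A1) writes dest.H.
    """
    if fractional:
        half = (acc >> 16) & 0xFFFF
    else:
        half = acc & 0xFFFF
    if which_acc == 0:
        return (dest & 0xFFFF0000) | half      # keep dest.H, replace dest.L
    return (dest & 0x0000FFFF) | (half << 16)  # keep dest.L, replace dest.H
```

The half of the destination register that is not written keeps its previous contents in this model.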
Multiply Sample Instructions
16 bit extraction from ACC 0
32 bit extraction
16 bit extraction from ACC 1
Multi-issue MAC Instruction Examples
A1 += R1.H * R2.L , A0 += R1.L * R2.L;
R3.H = (A1 += R1.H * R2.L) , R3.L = (A0 += R1.L * R2.L);
Any combination of .H and .L in the 2 operands is allowed
R3 = (A1 += R1.H*R2.L), R2 = (A0 += R1.L * R2.L);
Where destination registers must be paired as follows:
R[1,0], R[3,2], R[5,4] and R[7,6]
R3.H = (A1 += R1.H * R2.L), A0 += R1.L * R2.L;
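A behavioural sketch of the first multi-issue example, `A1 += R1.H * R2.L , A0 += R1.L * R2.L;`, is given below in Python. It assumes the default signed fractional mode (product left shifted by 1 bit) and ignores 40 bit accumulator saturation; the function names are hypothetical.

```python
def s16(x):
    """Interpret the low 16 bits of x as a two's complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def dual_mac(a1, a0, r1, r2):
    """Model of: A1 += R1.H * R2.L , A0 += R1.L * R2.L;"""
    r1_h, r1_l = s16(r1 >> 16), s16(r1)
    r2_l = s16(r2)
    a1 += (r1_h * r2_l) << 1  # MAC1: signed fractional product into A1
    a0 += (r1_l * r2_l) << 1  # MAC0: signed fractional product into A0
    return a1, a0
```

Both accumulators are updated from the same source registers in a single step, which is what allows the two MACs to issue together.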
Shifter Sample Instructions
Arithmetic shift:
• 3 operand register shift
• 3 operand immediate shift
• 2 operand immediate shifts
• 2 operand register shifts
Parallel Instruction Examples
• In general there are 16 and 32 bit versions of the arithmetic instructions
• Most of the 32 bit instructions can be executed in parallel with 2 x 16 bit memory/index operations
• Exceptions are DIVS, DIVQ and MULTIPLY with 32 bit operands
• || means parallel
• Examples:
– A1 = R2.L * R1.L, A0 = R2.H * R1.H || R2.H = W[I2++] || [I3++] = R3;
– R2 = R2 +|+ R4, R4 = R2 -|- R4 || I0 += M0 || R1 = [I0];
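The arithmetic part of the second example, `R2 = R2 +|+ R4, R4 = R2 -|- R4`, is worth a small sketch because both results are computed from the old values of R2 and R4. The Python model below makes that explicit by reading all operands before returning either result. The function name is hypothetical, 16 bit results wrap without saturation, and the parallel memory/index operations are not modelled.

```python
def quad_add_sub(ra, rb):
    """Model of: Rx = Ra +|+ Rb, Ry = Ra -|- Rb (four 16 bit operations)."""
    a_hi, a_lo = (ra >> 16) & 0xFFFF, ra & 0xFFFF
    b_hi, b_lo = (rb >> 16) & 0xFFFF, rb & 0xFFFF
    sums = (((a_hi + b_hi) & 0xFFFF) << 16) | ((a_lo + b_lo) & 0xFFFF)
    diffs = (((a_hi - b_hi) & 0xFFFF) << 16) | ((a_lo - b_lo) & 0xFFFF)
    return sums, diffs  # both derived from the original ra and rb
```

Usage for the slide's example would be `r2, r4 = quad_add_sub(r2, r4)`, where the tuple assignment mirrors the hardware reading both source registers before either is overwritten.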