A comparison of DSP Architectures
BlackFin ADSP-BFXXX Compute Unit

Based on an ENEL619.23 white paper prepared by Darrell Anklovitch

Jan 28, 2004 Blackfin Compute Unit REV B

Overview
• Architecture Overview
• Register Map
• ALU features and sample instructions
• Multiplier features and sample instructions
• Shifter features and sample instructions

References
• ADSP-BF535 Blackfin Processor Hardware Reference, Rev 2, April 2004, Analog Devices.
  – Section 2
• Blackfin Processor Instruction Set Reference, Rev 2, May 2003, Analog Devices.
  – Sections 8 ~ 10, 14 & 15
• A number of the figures in this presentation are based on figures found in the ADSP-BF535 Blackfin Processor Hardware Reference.

ADSP-2106x Core Architecture
[Figure: ADSP-2106x core block diagram. Labelled blocks: cache memory (32 x 48), JTAG test & emulation, flags, DAG 1 (8 x 4 x 32), DAG 2 (8 x 4 x 24), program sequencer, timer, bus connect, register file (16 x 40), floating & fixed-point multiplier with fixed-point accumulator, 32-bit barrel shifter, floating-point & fixed-point ALU. Labelled buses: PMA (24), DMA (32), PMD (48), DMD (40).]

Register File and COMPUTE Units
• Key issues
  – 5 data paths FROM COMPUTE units
  – 5 data paths TO COMPUTE units
  – Highly parallel operations UNDER THE RIGHT CONDITIONS

BF533 Memory Accesses
Under the right conditions:
• 4 memory accesses at the same time: 64 bit Instruction Fetch, 2 x 32 bit Data Loads, 32 bit Data Store
• PLUS up to 2 ALU (32 bit) and 2 MAC (16 bit) operations at the same time
• PLUS background DMA activity

Compute Unit Architecture
• Register File
• 1 Shifter
• 2 Multipliers
• 1 set of Video ALUs
• 2 ALUs

Register File
• 8 x 32 bit OR 16 x 16 bit
• 2 x 40 bit accumulators

DATA REGISTER SYNTAX:
• R0, R1 etc refer to 32 bit registers
• R0.L refers to the low 16 bits of the R0 32 bit register
• R0.H refers to the high 16
bits of the R0 register

ACCUMULATOR SYNTAX:
• A0.L => low 16 bits
• A0.H => next 16 bits
• A0.W => least significant 32 bit word
• A0.X => MS 8 bit extension

SHARC: 16 x 32-bit data registers, integer and float. There is a pair of SHARC accumulator registers too.

ALU Data Flow
• 2 x 32 bit paths to dual Multiplier/ALU units
• 2 x 32 bit paths back to register file

Sample instructions

Blackfin: R0 = R1 + R2;
SHARC:    R0 = R1 + R2;
68K:      MOVE.L R2, R0
          ADD.L R1, R0

Blackfin: R0.L = R1.L + R2.H;
68K:      MOVE.W R2, R0
          ADD.W R1, R0

Blackfin: R0 = R1 +|- R2;
          Means R0.L = R1.L - R2.L in parallel with R0.H = R1.H + R2.H
SHARC:    Closest: R0 = R1 + R2, R4 = R1 - R2;
68K:      MOVE.L R2, R0
          ASR.L #16, R0
          MOVE.L R1, R3
          ASR.L #16, R3
          ADD.W R3, R0
          ASL.L #16, R0
          MOVE.W R2, R0
          ADD.W R1, R0

ALU Features
[Figure: source registers Rm and Rp and destination register Rn, shown for single 16 bit ops, dual 16 bit ops, single 32 bit ops and dual 16 bit cross ops.]

ALU Sample Instructions
[Figure: sample instructions for single 16 bit ops, dual 16 bit ops, single 32 bit ops, dual 32 bit ops and quad 16 bit ops. Annotations: "Does not work in parallel"; "Must have this option"; registers labelled C, A, B and D, A, B.]
• Operator order is important: + must come before -
• A & B registers must stay on the same side of the '|' for both instructions
• For dual and quad 16 bit operations the (CO) option causes the destination registers to cross

Multiply Data Flow
• 2 x 32 bit paths to dual Multiplier/ALU units
• Multipliers share the same operand/result buses as the ALU
• 2 x 40 bit accumulators
• 2 x 32 bit paths back to register file

Multiply Features
[Figure: the four 16 bit operand pairings: L x L, L x H, H x L, H x H.]
• Multiplies are signed fractional by default
• Signed fractional multiply result is automatically left shifted 1 bit.
• Signed fractional multiply != signed integer multiply
• Rounding is available on fractional multiplies and, as a special option, on integer multiplies

Rounding
[Figure: 2 cases. 0x8000 is added either to the 32 bit result of Rm x Rp or to the accumulator value; in both cases the top 16 bits go to the destination register Rd.]
Rounding adds 0x8000 to the 32 bit multiplier result or accumulator value before extracting a 16 bit value to the destination register.

Fractional Multiply
Fractional Multiply != Integer Multiply
• When extracting a 16 bit fractional value from an accumulator, the high 16 bits are taken
• Where in the destination register the 16 bit value goes depends on which accumulator is being extracted from

Integer Multiply
Fractional Multiply != Integer Multiply
• When extracting a 16 bit integer value from an accumulator, the low 16 bits are taken
• Where in the destination register the 16 bit value goes depends on which accumulator is being extracted from

Multiply Sample Instructions
[Figure: sample instructions for 16 bit extraction from ACC 0, 32 bit extraction, and 16 bit extraction from ACC 1.]

Multi-issue MAC Instruction Examples
• A1 += R1.H * R2.L, A0 += R1.L * R2.L;
• R3.H = (A1 += R1.H * R2.L), R3.L = (A0 += R1.L * R2.L);
  – Any combination of .H and .L in the 2 operands is allowed
• R3 = (A1 += R1.H * R2.L), R2 = (A0 += R1.L * R2.L);
  – Destination registers must be paired as follows: R[1,0], R[3,2], R[5,4] and R[7,6]
• R3.H = (A1 += R1.H * R2.L), A0 += R1.L * R2.L;

Shifter Sample Instructions
[Figure: sample instructions for arithmetic shift, 3 operand register shift, 3 operand immediate shift, 2 operand immediate shifts and 2 operand register shifts.]

Parallel Instruction Examples
• In general there are 16 and 32 bit versions of the arithmetic instructions
• Most of the 32 bit instructions can be executed in parallel with 2 x 16 bit memory/index
operations
• Exceptions are DIVS, DIVQ and MULTIPLY with 32 bit operands
• || means parallel
• Examples:
  – A1=R2.L*R1.L, A0=R2.H*R1.H || R2.H=W[I2++] || [I3++]=R3;
  – R2=R2+|+R4, R4=R2-|-R4 || I0+=M0 || R1=[I0];
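
The dual and quad 16 bit operations used in the examples (such as R0 = R1 +|- R2 and R2=R2+|+R4, R4=R2-|-R4) can be hard to picture from the syntax alone. The following Python sketch models them on plain 32 bit integers. It is an illustration only, not Blackfin code: the function names are hypothetical, and wrap-around on 16 bit overflow (no saturation) is an assumption, since the slides do not state the overflow behaviour.

```python
MASK16 = 0xFFFF


def split(reg):
    """Return the (high, low) 16 bit halves of a 32 bit register value."""
    return (reg >> 16) & MASK16, reg & MASK16


def join(high, low):
    """Pack two 16 bit halves back into one 32 bit register value."""
    return ((high & MASK16) << 16) | (low & MASK16)


def dual16(rm, rp, op_high, op_low):
    """Model Rn = Rm <op_high>|<op_low> Rp.

    The operator left of '|' acts on the high halves, the operator right
    of '|' acts on the low halves. Assumption: results wrap at 16 bits.
    """
    mh, ml = split(rm)
    ph, pl = split(rp)
    high = mh + ph if op_high == "+" else mh - ph
    low = ml + pl if op_low == "+" else ml - pl
    return join(high, low)  # join() masks each half, so no carry crosses over


def quad16(ra, rb):
    """Model Rx = Ra +|+ Rb, Ry = Ra -|- Rb issued together.

    Both results are computed from the ORIGINAL inputs, which is why
    R2 = R2 +|+ R4, R4 = R2 -|- R4 can reuse R2 and R4 as destinations.
    """
    return dual16(ra, rb, "+", "+"), dual16(ra, rb, "-", "-")


if __name__ == "__main__":
    r1, r2 = 0x00030005, 0x00010002
    print(hex(dual16(r1, r2, "+", "-")))       # R0 = R1 +|- R2
    print([hex(v) for v in quad16(r1, r2)])    # sums and differences
```

With R1 = 0x00030005 and R2 = 0x00010002, the +|- form gives high half 3 + 1 and low half 5 - 2, that is 0x00040003.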
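
The multiplier slides state three rules: a signed fractional product is shifted left by 1 bit, rounding adds 0x8000 before the top 16 bits are extracted, and an integer result is taken from the low 16 bits. The Python sketch below works through those rules with numbers. It is a simplified model under stated assumptions: the function names are hypothetical, operands are treated as signed 1.15 fractions, and saturation (for example on 0x8000 x 0x8000) is not modelled.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def frac_mult_32(a, b):
    """Signed fractional multiply: integer product shifted left by 1 bit.

    Assumption: a and b are 1.15 fractions, so the shifted product is 1.31.
    """
    return (to_signed16(a) * to_signed16(b)) << 1


def extract_high(product32, rounding=True):
    """Take the top 16 bits, optionally adding 0x8000 first (rounding)."""
    if rounding:
        product32 += 0x8000
    return (product32 >> 16) & 0xFFFF


def int_mult_low16(a, b):
    """Signed integer multiply: no shift, the low 16 bits are the result."""
    return (to_signed16(a) * to_signed16(b)) & 0xFFFF


if __name__ == "__main__":
    half = 0x4000  # 0.5 as a 1.15 fraction
    print(hex(extract_high(frac_mult_32(half, half))))  # 0.5 * 0.5 as 1.15
    print(int_mult_low16(3, 5))                         # plain integer product
    tiny = frac_mult_32(0x0001, half)                   # exactly half an LSB
    print(extract_high(tiny, rounding=False), extract_high(tiny))
```

The same bit patterns give different answers in the two modes: 0x4000 x 0x4000 yields 0x2000 (0.25) as a fractional multiply, while 3 x 5 as a fractional multiply rounds to 0 and as an integer multiply gives 15.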