Chapter 2 TMS320C6000 Architectural Overview Learning Objectives Chapter 2, Slide 2 Describe C6000 CPU architecture. Introduce some basic instructions. Describe the C6000 memory map. Provide an overview of the peripherals. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 General DSP System Block Diagram Internal Memory Internal Buses External Memory Central Processing Unit Chapter 2, Slide 3 P E R I P H E R A L S Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Implementation of Sum of Products (SOP) It has been shown in Chapter 1 that SOP is the key element for most DSP algorithms. So let’s write the code for this algorithm and at the same time discover the C6000 architecture. N Y = an * xn n = 1 = a1 * x1 + a2 * x2 +... + aN * xN Two basic operations are required for this algorithm. (1) Multiplication (2) Addition Therefore two basic instructions are required Chapter 2, Slide 4 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Implementation of Sum of Products (SOP) So let’s implement the SOP algorithm! N Y = an * xn n = 1 = a1 * x1 + a2 * x2 +... + aN * xN The implementation in this module will be done in assembly. Two basic operations are required for this algorithm. (1) Multiplication (2) Addition Therefore two basic instructions are required Chapter 2, Slide 5 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Multiply (MPY) N Y = an * xn n = 1 = a1 * x1 + a2 * x2 +... + aN * xN The multiplication of a1 by x1 is done in assembly by the following instruction: MPY a1, x1, Y This instruction is performed by a multiplier unit that is called “.M” Chapter 2, Slide 6 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Multiply (.M unit) 40 Y = an * xn n = 1 .M The . M unit performs multiplications in hardware MPY .M a1, x1, Y Note: 16-bit by 16-bit multiplier provides a 32-bit result. 32-bit by 32-bit multiplier provides a 64-bit result. Chapter 2, Slide 7 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Addition (.?) 40 Y = an * xn n = 1 .M .? Chapter 2, Slide 8 MPY .M a1, x1, prod ADD .? Y, prod, Y Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Add (.L unit) 40 Y = an * xn n = 1 .M .L MPY .M a1, x1, prod ADD .L Y, prod, Y RISC processors such as the C6000 use registers to hold the operands, so lets change this code. Chapter 2, Slide 9 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Register File - A 40 Register File A A0 A1 A2 A3 A4 Y = a1 x1 an * xn n = 1 prod Y .M . . . .L MPY .M a1, x1, prod ADD .L Y, prod, Y A15 32-bits Let us correct this by replacing a, x, prod and Y by the registers as shown above. Chapter 2, Slide 10 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Specifying Register Names 40 Register File A A0 A1 A2 A3 A4 Y = a1 x1 an * xn n = 1 prod Y .M . . . .L MPY .M A0, A1, A3 ADD .L A4, A3, A4 A15 32-bits The registers A0, A1, A3 and A4 contain the values to be used by the instructions. Chapter 2, Slide 11 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Specifying Register Names 40 Register File A A0 A1 A2 A3 A4 Y = a1 x1 an * xn n = 1 prod Y .M . . . .L MPY .M A0, A1, A3 ADD .L A4, A3, A4 A15 32-bits Register File A contains 16 registers (A0 -A15) which are 32-bits wide. Chapter 2, Slide 12 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Data loading Register File A A0 A1 A2 A3 A4 Q: How do we load the operands into the registers? a1 x1 prod Y .M . . . .L A15 32-bits Chapter 2, Slide 13 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Load Unit “.D” Register File A A0 A1 a1 x1 A2 A3 Q: How do we load the operands into the registers? prod Y .M . . . .L A: The operands are loaded into the registers by loading them from the memory using the .D unit. .D A15 32-bits Data Memory Chapter 2, Slide 14 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Load Unit “.D” Register File A A0 A1 a1 x1 A2 A3 It is worth noting at this stage that the only way to access memory is through the .D unit. prod Y .M . . . .L .D A15 32-bits Data Memory Chapter 2, Slide 15 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Load Instruction Register File A A0 A1 a1 x1 A2 A3 Q: Which instruction(s) can be used for loading operands from the memory to the registers? prod Y .M . . . .L .D A15 32-bits Data Memory Chapter 2, Slide 16 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Load Instructions (LDB, LDH,LDW,LDDW) Register File A A0 A1 a1 x1 A2 A3 Q: Which instruction(s) can be used for loading operands from the memory to the registers? prod Y .M . . . .L A: The load instructions. .D A15 32-bits Data Memory Chapter 2, Slide 17 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Using the Load Instructions Before using the load unit you have to be aware that this processor is byte addressable, which means that each byte is represented by a unique address. Data address 00000000 00000002 00000004 00000006 00000008 Also the addresses are 32-bit wide. FFFFFFFF 16-bits Chapter 2, Slide 18 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Using the Load Instructions The syntax for the load instruction is: LD *Rn,Rm Where: Rn is a register that contains the address of the operand to be loaded Data address a1 x1 00000000 00000002 00000004 00000006 00000008 prod Y and Rm is the destination register. FFFFFFFF 16-bits Chapter 2, Slide 19 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Using the Load Instructions The syntax for the load instruction is: LD *Rn,Rm The question now is how many bytes are going to be loaded into the destination register? Data address a1 x1 00000000 00000002 00000004 00000006 00000008 prod Y FFFFFFFF 16-bits Chapter 2, Slide 20 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Using the Load Instructions The syntax for the load instruction is: LD *Rn,Rm The answer, is that it depends on the instruction you choose: Data address a1 x1 00000000 00000002 00000004 00000006 00000008 prod Y • LDB: loads one byte (8-bit) • LDH: loads half word (16-bit) • LDW: loads a word (32-bit) • LDDW: loads a double word (64-bit) Note: LD on its own does not exist. Chapter 2, Slide 21 FFFFFFFF 16-bits Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Using the Load Instructions The syntax for the load instruction is: Data 1 0 0xA 0xB 0xC 0xD 0x2 0x1 Example: 0x4 0x3 If we assume that A5 = 0x4 then: 0x6 0x5 (1) LDB *A5, A7 ; gives A7 = 0x00000001 0x8 0x7 LD *Rn,Rm address 00000000 00000002 00000004 00000006 00000008 (2) LDH *A5,A7; gives A7 = 0x00000201 (3) LDW *A5,A7; gives A7 = 0x04030201 (4) LDDW *A5,A7:A6; gives A7:A6 = 0x0807060504030201 FFFFFFFF 16-bits Chapter 2, Slide 22 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Using the Load Instructions The syntax for the load instruction is: address Data LD *Rn,Rm Question: If data can only be accessed by the load instruction and the .D unit, how can we load the register pointer Rn in the first place? 0xA 0xB 0xC 0xD 0x2 0x1 0x4 0x3 0x6 0x5 0x8 0x7 00000000 00000002 00000004 00000006 00000008 FFFFFFFF 16-bits Chapter 2, Slide 23 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Loading the Pointer Rn The instruction MVKL will allow a move of a 16-bit constant into a register as shown below: MVKL .? a, A5 (‘a’ is a constant or label) How many bits represent a full address? 32 bits So why does the instruction not allow a 32-bit move? All instructions are 32-bit wide (see instruction opcode). Chapter 2, Slide 24 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Loading the Pointer Rn To solve this problem another instruction is available: MVKH eg. MVKH .? a, A5 (‘a’ is a constant or label) Chapter 2, Slide 25 ah al a ah x A5 Finally, to move the 32-bit address to a register we can use: MVKL a, A5 MVKH a, A5 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Loading the Pointer Rn Always use MVKL then MVKH, look at the following examples: Example 1 A5 = 0x87654321 MVKL 0x1234FABC, A5 A5 = 0xFFFFFABC (sign extension) MVKH 0x1234FABC, A5 A5 = 0x1234FABC ; OK Example 2 MVKH A5 = 0x12344321 Chapter 2, Slide 26 0x1234FABC, A5 MVKL 0x1234FABC, A5 A5 = 0xFFFFFABC ; Wrong Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 LDH, MVKL and MVKH Register File A A0 A1 a x A2 A3 prod Y .M . . . .L .D A15 MVKL MVKH pt1, A5 pt1, A5 MVKL MVKH pt2, A6 pt2, A6 LDH .D *A5, A0 LDH .D *A6, A1 MPY .M A0, A1, A3 ADD .L A4, A3, A4 32-bits Data Memory Chapter 2, Slide 27 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Creating a loop So far we have only implemented the SOP for one tap only, i.e. Y= a1 * x1 So let’s create a loop so that we can implement the SOP for N Taps. Chapter 2, Slide 28 MVKL MVKH pt1, A5 pt1, A5 MVKL MVKH pt2, A6 pt2, A6 LDH .D *A5, A0 LDH .D *A6, A1 MPY .M A0, A1, A3 ADD .L A4, A3, A4 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Creating a loop So far we have only implemented the SOP for one tap only, i.e. Y= a1 * x1 With the C6000 processors there are no dedicated instructions such as block repeat. The loop is created using the B instruction. So let’s create a loop so that we can implement the SOP for N Taps. Chapter 2, Slide 29 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 What are the steps for creating a loop 1. Create a label to branch to. 2. Add a branch instruction, B. 3. Create a loop counter. 4. Add an instruction to decrement the loop counter. 5. Make the branch conditional based on the value in the loop counter. Chapter 2, Slide 30 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 1. Create a label to branch to loop Chapter 2, Slide 31 MVKL MVKH pt1, A5 pt1, A5 MVKL MVKH pt2, A6 pt2, A6 LDH .D *A5, A0 LDH .D *A6, A1 MPY .M A0, A1, A3 ADD .L A4, A3, A4 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 2. Add a branch instruction, B. loop Chapter 2, Slide 32 MVKL MVKH pt1, A5 pt1, A5 MVKL MVKH pt2, A6 pt2, A6 LDH .D *A5, A0 LDH .D *A6, A1 MPY .M A0, A1, A3 ADD .L A4, A3, A4 B .? loop Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Which unit is used by the B instruction? Register File A A0 A1 a x .S prod Y .M .M . . . .L .L A2 A3 loop .D .D A15 MVKL MVKH pt1, A5 pt1, A5 MVKL MVKH pt2, A6 pt2, A6 LDH .D *A5, A0 LDH .D *A6, A1 MPY .M A0, A1, A3 ADD .L A4, A3, A4 B .? loop 32-bits Data Memory Chapter 2, Slide 33 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Which unit is used by the B instruction? Register File A A0 A1 a x .S prod Y .M .M . . . .L .L A2 A3 loop .D .D A15 MVKL .S MVKH .S pt1, A5 pt1, A5 MVKL .S MVKH .S pt2, A6 pt2, A6 LDH .D *A5, A0 LDH .D *A6, A1 MPY .M A0, A1, A3 ADD .L A4, A3, A4 B .S loop 32-bits Data Memory Chapter 2, Slide 34 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 3. Create a loop counter. Register File A A0 A1 a x .S prod Y .M .M . . . .L .L A2 A3 loop .D .D A15 32-bits MVKL .S MVKH .S pt1, A5 pt1, A5 MVKL .S MVKH .S MVKL .S pt2, A6 pt2, A6 count, B0 LDH .D *A5, A0 LDH .D *A6, A1 MPY .M A0, A1, A3 ADD .L A4, A3, A4 B .S loop B registers will be introduced later Data Memory Chapter 2, Slide 35 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 4. Decrement the loop counter Register File A A0 A1 a x .S prod Y .M .M . . . .L .L A2 A3 loop .D .D A15 32-bits MVKL .S MVKH .S pt1, A5 pt1, A5 MVKL .S MVKH .S MVKL .S pt2, A6 pt2, A6 count, B0 LDH .D *A5, A0 LDH .D *A6, A1 MPY .M A0, A1, A3 ADD .L A4, A3, A4 SUB .S B0, 1, B0 B .S loop Data Memory Chapter 2, Slide 36 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 5. Make the branch conditional based on the value in the loop counter What is the syntax for making instruction conditional? [condition] Instruction [B1] loop Label e.g. B (1) The condition can be one of the following registers: A1, A2, B0, B1, B2. (2) Any instruction can be conditional. Chapter 2, Slide 37 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 5. Make the branch conditional based on the value in the loop counter The condition can be inverted by adding the exclamation symbol “!” as follows: [!condition] Instruction Label [!B0] B loop ;branch if B0 = 0 [B0] B loop ;branch if B0 != 0 e.g. Chapter 2, Slide 38 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 5. Make the branch conditional MVKL .S2 pt1, A5 MVKH .S2 pt1, A5 Register File A A0 A1 a x .S prod Y .M .M . . . .L .L A2 A3 MVKL .S2 pt2, A6 MVKH .S2 pt2, A6 MVKL .S2 count, B0 loop .D .D A15 32-bits [B0] LDH .D *A5, A0 LDH .D *A6, A1 MPY .M A0, A1, A3 ADD .L A4, A3, A4 SUB .S B0, 1, B0 B .S loop Data Memory Chapter 2, Slide 39 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 More on the Branch Instruction (1) With this processor all the instructions are encoded in a 32-bit. Therefore the label must have a dynamic range of less than 32-bit as the instruction B has to be coded. 32-bit B Case 1: 21-bit relative address B .S1 label Relative branch. Label limited to +/- 220 offset. Chapter 2, Slide 40 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 More on the Branch Instruction (2) By specifying a register as an operand instead of a label, it is possible to have an absolute branch. This will allow a dynamic range of 232. 32-bit 5-bit register B Case 2: code B .S2 register Absolute branch. Operates on .S2 ONLY! Chapter 2, Slide 41 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Testing the code MVKL .S2 pt1, A5 MVKH .S2 pt1, A5 MVKL .S2 pt2, A6 MVKH .S2 pt2, A6 MVKL .S2 count, B0 This code performs the following operations: loop LDH .D *A5, A0 LDH .D *A6, A1 MPY .M A0, A1, A3 However, we would like to perform: ADD .L A4, A3, A4 a0*x0 + a1*x1 + a2*x2 + … + aN*xN SUB .S B0, 1, B0 B .S loop a0*x0 + a0*x0 + a0*x0 + … + a0*x0 [B0] Chapter 2, Slide 42 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Modifying the pointers MVKL .S2 pt1, A5 MVKH .S2 pt1, A5 MVKL .S2 pt2, A6 MVKH .S2 pt2, A6 MVKL .S2 count, B0 The solution is to modify the pointers loop A5 and A6. [B0] Chapter 2, Slide 43 LDH .D *A5, A0 LDH .D *A6, A1 MPY .M A0, A1, A3 ADD .L A4, A3, A4 SUB .S B0, 1, B0 B .S loop Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Indexing Pointers Syntax Description *R Pointer Pointer Modified No In this case the pointers are used but not modified. R can be any register Chapter 2, Slide 44 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Indexing Pointers Syntax Description *R *+R[disp] *-R[disp] Pointer + Pre-offset - Pre-offset Pointer Modified No No No In this case the pointers are modified BEFORE being used and RESTORED to their previous values. [disp] specifies the number of elements size in DW (64-bit), W (32-bit), H (16-bit), or B (8-bit). disp = R or 5-bit constant. R can be any register. Chapter 2, Slide 45 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Indexing Pointers Syntax Description *R *+R[disp] *-R[disp] *++R[disp] *--R[disp] Pointer + Pre-offset - Pre-offset Pre-increment Pre-decrement Pointer Modified No No No Yes Yes In this case the pointers are modified BEFORE being used and NOT RESTORED to their Previous Values. Chapter 2, Slide 46 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Indexing Pointers Syntax Description *R *+R[disp] *-R[disp] *++R[disp] *--R[disp] *R++[disp] *R--[disp] Pointer + Pre-offset - Pre-offset Pre-increment Pre-decrement Post-increment Post-decrement Pointer Modified No No No Yes Yes Yes Yes In this case the pointers are modified AFTER being used and NOT RESTORED to their Previous Values. Chapter 2, Slide 47 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Indexing Pointers Syntax Description *R *+R[disp] *-R[disp] *++R[disp] *--R[disp] *R++[disp] *R--[disp] Pointer + Pre-offset - Pre-offset Pre-increment Pre-decrement Post-increment Post-decrement Chapter 2, Slide 48 Pointer Modified No No No Yes Yes Yes Yes [disp] specifies # elements - size in DW, W, H, or B. disp = R or 5-bit constant. R can be any register. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Modify and testing the code MVKL .S2 pt1, A5 MVKH .S2 pt1, A5 MVKL .S2 pt2, A6 MVKH .S2 pt2, A6 MVKL .S2 count, B0 This code now performs the following loop operations: a0*x0 + a1*x1 + a2*x2 + ... + aN*xN [B0] Chapter 2, Slide 49 LDH .D *A5++, A0 LDH .D *A6++, A1 MPY .M A0, A1, A3 ADD .L A4, A3, A4 SUB .S B0, 1, B0 B .S loop Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Store the final result MVKL .S2 pt1, A5 MVKH .S2 pt1, A5 MVKL .S2 pt2, A6 MVKH .S2 pt2, A6 MVKL .S2 count, B0 This code now performs the following loop operations: a0*x0 + a1*x1 + a2*x2 + ... + aN*xN [B0] Chapter 2, Slide 50 LDH .D *A5++, A0 LDH .D *A6++, A1 MPY .M A0, A1, A3 ADD .L A4, A3, A4 SUB .S B0, 1, B0 B .S loop STH .D A4, *A7 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Store the final result MVKL .S2 pt1, A5 MVKH .S2 pt1, A5 MVKL .S2 pt2, A6 MVKH .S2 pt2, A6 MVKL .S2 count, B0 loop The Pointer A7 has not been initialised. [B0] Chapter 2, Slide 51 LDH .D *A5++, A0 LDH .D *A6++, A1 MPY .M A0, A1, A3 ADD .L A4, A3, A4 SUB .S B0, 1, B0 B .S loop STH .D A4, *A7 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Store the final result MVKL .S2 pt1, A5 MVKH .S2 pt1, A5 MVKL .S2 pt2, A6 MVKH .S2 pt2, A6 MVKL .S2 pt3, A7 MVKH .S2 pt3, A7 MVKL .S2 count, B0 The Pointer A7 is now initialised. loop [B0] Chapter 2, Slide 52 LDH .D *A5++, A0 LDH .D *A6++, A1 MPY .M A0, A1, A3 ADD .L A4, A3, A4 SUB .S B0, 1, B0 B .S loop STH .D A4, *A7 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 What is the initial value of A4? MVKL .S2 pt1, A5 MVKH .S2 pt1, A5 MVKL .S2 pt2, A6 MVKH .S2 pt2, A6 A4 is used as an accumulator, so it needs to be reset to zero. loop [B0] Chapter 2, Slide 53 MVKL MVKH MVKL ZERO LDH .S2 .S2 .S2 .L .D pt3, A7 pt3, A7 count, B0 A4 *A5++, A0 LDH .D *A6++, A1 MPY .M A0, A1, A3 ADD .L A4, A3, A4 SUB .S B0, 1, B0 B .S loop STH .D A4, *A7 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Increasing the processing power! Register File A A0 A1 A2 A3 A4 .S1 .M1 . . . .L1 How can we add more processing power to this processor? .D1 A15 32-bits Data Memory Chapter 2, Slide 54 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Increasing the processing power! Register File A A0 A1 A2 A3 A4 .S1 .M1 . . . (1) Increase the clock frequency. (2) Increase the number of Processing units. .L1 .D1 A15 32-bits Data Memory Chapter 2, Slide 55 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 To increase the Processing Power, this processor has two sides (A and B or 1 and 2) Register File A A0 A1 A2 A3 A4 . . . A15 Register File B .S1 .S2 .M1 .M2 .L1 .L2 .D1 .D2 32-bits B0 B1 B2 B3 B4 . . . B15 32-bits Data Memory Chapter 2, Slide 56 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Can the two sides exchange operands in order to increase performance? Register File A A0 A1 A2 A3 A4 . . . A15 Register File B .S1 .S2 .M1 .M2 .L1 .L2 .D1 .D2 32-bits B0 B1 B2 B3 B4 . . . B15 32-bits Data Memory Chapter 2, Slide 57 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 The answer is YES but there are limitations. To exchange operands between the two sides, some cross paths or links are required. What is a cross path? A cross path links one side of the CPU to the other. There are two types of cross paths: Chapter 2, Slide 58 Data cross paths. Address cross paths. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Data Cross Paths Data cross paths can also be referred to as register file cross paths. These cross paths allow operands from one side to be used by the other side. There are only two cross paths: Chapter 2, Slide 59 one path which conveys data from side B to side A, 1X. one path which conveys data from side A to side B, 2X. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 TMS320C67x Data-Path Chapter 2, Slide 60 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Data Cross Paths Data cross paths only apply to the .L, .S and .M units. The data cross paths are very useful, however there are some limitations in their use. Chapter 2, Slide 61 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Data Cross Path Limitations <dst> .L1 .M1 .S1 A <src> <src> 2x (1) The destination register must be on same side as unit. (2) Source registers - up to one cross path per execute packet per side. 1x B Execute packet: group of instructions that execute simultaneously. Chapter 2, Slide 62 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Data Cross Path Limitations <dst> eg: ADD MPY SUB || ADD .L1 .M1 .S1 .L1x .M1x .S1x .L1x A <src> <src> A0,A1,B2 A0,B6,A9 A8,B2,A8 A0,B0,A2 2x 1x B || Means that the SUB and ADD belong to the same fetch packet, therefore execute simultaneously. Chapter 2, Slide 63 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Data Cross Path Limitations <dst> eg: ADD MPY SUB || ADD .L1 .M1 .S1 .L1x .M1x .S1x .L1x A <src> <src> A0,A1,B2 A0,B6,A9 A8,B2,A8 A0,B0,A2 2x 1x B NOT VALID! Chapter 2, Slide 64 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Data Cross Paths for both sides <dst> <dst> Chapter 2, Slide 65 .L1 .M1 .S1 .L2 .M2 .S2 A <src> <src> 2x <src> <src> 1x B Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Address cross paths Data Addr A .D1 (1) The pointer must be on the same side of the unit. LDW.D1T1 *A0,A5 STW.D1T1 A5,*A0 Chapter 2, Slide 66 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Load or store to either side Data1 DA1 = T1 DA2 = T2 Data2 Chapter 2, Slide 67 A5 .D1 *A0 LDW.D1T1 *A0,A5 LDW.D1T2 *A0,B5 A B B5 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Standard Parallel Loads Data1 DA1 = T1 DA2 = T2 A5 .D1 *A0 .D2 *B0 A B LDW.D1T1 *A0,A5 || LDW.D2T2 *B0,B5 Chapter 2, Slide 68 B5 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Parallel Load/Store using address cross paths Data1 DA1 = T1 DA2 = T2 A5 .D1 *A0 .D2 *B0 A B LDW.D1T2 *A0,B5 || STW.D2T1 A5,*B0 Chapter 2, Slide 69 B5 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Fill the blanks ... Does this work? Data1 DA1 = T1 DA2 = T2 .D1 *A0 .D2 *B0 A B LDW.D1__ *A0,B5 || STW.D2__ B6,*B0 Chapter 2, Slide 70 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Not Allowed! Data1 DA2 = T2 .D1 *A0 .D2 *B0 B5 A B LDW.D1T2 *A0,B5 B6 || STW.D2T2 B6,*B0 Chapter 2, Slide 71 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Not Allowed! Parallel accesses: both cross or neither cross Data1 DA2 = T2 .D1 *A0 .D2 *B0 B5 A B LDW.D1T2 *A0,B5 B6 || STW.D2T2 B6,*B0 Chapter 2, Slide 72 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Conditions Don’t Use Cross Paths If a conditional register comes from the opposite side, it does NOT use a data or address cross-path. Examples: [B2] [A1] Chapter 2, Slide 73 ADD LDW .L1 .D2 A2,A0,A4 *B0,B5 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 ‘C62x Data-Path Summary CPU Ref Guide Full CPU Datapath (Pg 2-2) Chapter 2, Slide 74 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 ‘C67x Data-Path Summary Chapter 2, Slide 75 ‘C67x Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Cross Paths - Summary Data Address Chapter 2, Slide 76 Destination register on same side as unit. Source registers - up to one cross path per execute packet per side. Use “x” to indicate cross-path. Pointer must be on same side as unit. Data can be transferred to/from either side. Parallel accesses: both cross or neither cross. Conditionals Don’t Use Cross Paths. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Code Review (using side A only) 40 Y = n = 1 MVK loop: LDH LDH MPY ADD SUB [A2] B STH .S1 .D1 .D1 .M1 .L1 .L1 .S1 .D1 an * xn 40, A2 *A5++, A0 *A6++, A1 A0, A1, A3 A3, A4, A4 A2, 1, A2 loop A4, *A7 ; A2 = 40, loop count ; A0 = a(n) ; A1 = x(n) ; A3 = a(n) * x(n) ; Y = Y + A3 ; decrement loop count ; if A2 0, branch ; *A7 = Y Note: Assume that A4 was previously cleared and the pointers are initialised. Chapter 2, Slide 77 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Let us have a look at the final details concerning the functional units. Consider first the case of the .L and .S units. Chapter 2, Slide 78 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Operands - 32/40-bit Register, 5-bit Constant Operands can be: 5-bit constants (or 16-bit for MVKL and MVKH). 32-bit registers. 40-bit Registers. However, we have seen that registers are only 32-bit. So where do the 40-bit registers come from? Chapter 2, Slide 79 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Operands - 32/40-bit Register, 5-bit Constant A 40-bit register can be obtained by concatenating two registers. However, there are 3 conditions that need to be respected: Chapter 2, Slide 80 The registers must be from the same side. The first register must be even and the second odd. The registers must be consecutive. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Operands - 32/40-bit Register, 5-bit Constant All combinations of 40-bit registers are shown below: 40-bit Reg : odd even 8 Chapter 2, Slide 81 32 40-bit Reg : odd even 8 32 A1:A0 B1:B0 A3:A2 B3:B2 A5:A4 B5:B4 A7:A6 B7:B6 A9:A8 B9:B8 A11:A10 B11:B10 A13:A12 B13:B12 A15:A14 B15:B14 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Operands - 32/40-bit Register, 5-bit Constant instr .unit 32-bit Reg <src>, <src>, <dst> 5-bit Const 32-bit Reg 40-bit Reg < src > < src > .L or .S < dst > 32-bit Reg Chapter 2, Slide 82 40-bit Reg Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Operands - 32/40-bit Register, 5-bit Constant instr 32-bit Reg 5-bit Const .unit <src>, <src>, <dst> 32-bit Reg 40-bit Reg < src > < src > .L or .S < dst > 32-bit Reg Chapter 2, Slide 83 40-bit Reg Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Operands - 32/40-bit Register, 5-bit Constant instr 32-bit Reg 5-bit Const .unit <src>, <src>, <dst> 32-bit Reg 40-bit Reg < src > < src > .L or .S OR.L1 A0, A1, A2 < dst > 32-bit Reg Chapter 2, Slide 84 40-bit Reg Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Operands - 32/40-bit Register, 5-bit Constant instr 32-bit Reg 5-bit Const .unit 32-bit Reg .L or .S < dst > Chapter 2, Slide 85 40-bit Reg < src > < src > 32-bit Reg <src>, <src>, <dst> OR.L1 ADD.L2 A0, A1, A2 -5, B3, B4 40-bit Reg Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Operands - 32/40-bit Register, 5-bit Constant instr 32-bit Reg 5-bit Const .unit 32-bit Reg .L or .S < dst > Chapter 2, Slide 86 40-bit Reg < src > < src > 32-bit Reg <src>, <src>, <dst> OR.L1 ADD.L2 ADD.L1 A0, A1, A2 -5, B3, B4 A2, A3, A5:A4 40-bit Reg Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Operands - 32/40-bit Register, 5-bit Constant instr 32-bit Reg 5-bit Const .unit 32-bit Reg .L or .S < dst > Chapter 2, Slide 87 40-bit Reg < src > < src > 32-bit Reg <src>, <src>, <dst> 40-bit Reg OR.L1 ADD.L2 ADD.L1 SUB.L1 A0, -5, A2, A2, A1, A2 B3, B4 A3, A5:A4 A5:A4, A5:A4 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Operands - 32/40-bit Register, 5-bit Constant instr 32-bit Reg 5-bit Const .unit 32-bit Reg .L or .S < dst > Chapter 2, Slide 88 40-bit Reg < src > < src > 32-bit Reg <src>, <src>, <dst> 40-bit Reg OR.L1 ADD.L2 ADD.L1 SUB.L1 ADD.L2 A0, A1, A2 -5, B3, B4 A2, A3, A5:A4 A2, A5:A4, A5:A4 3, B9:B8, B9:B8 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Register to register data transfer Chapter 2, Slide 89 To move the content of a register (A or B) to another register (B or A) use the move “MV” Instruction, e.g.: MV A0, B0 MV B6, B7 To move the content of a control register to another register (A or B) or vice-versa use the MVC instruction, e.g.: MVC IFR, A0 MVC A0, IRP Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 TMS320C6211/6711 Instruction Set Chapter 2, Slide 90 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 'C6211 Instruction Set (by category) Arithmetic Logical ABS ADD ADDA ADDK ADD2 MPY MPYH NEG SMPY SMPYH SADD SAT SSUB SUB SUBA SUBC SUB2 ZERO AND CMPEQ CMPGT CMPLT NOT OR SHL SHR SSHL XOR Bit Mgmt CLR EXT LMBD NORM SET Data Mgmt LDB/H/W MV MVC MVK MVKL MVKH MVKLH STB/H/W Program Ctrl B IDLE NOP Note: Refer to the 'C6000 CPU Reference Guide for more details. Chapter 2, Slide 91 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 'C6211 Instruction Set (by unit) .L Unit .S Unit ADD ADDK ADD2 AND B CLR EXT MV MVC MVK MVKL MVKH MVKLH NEG NOT OR SET SHL SHR SSHL SUB SUB2 XOR ZERO .M Unit MPY MPYH SMPY SMPYH Other NOP Chapter 2, Slide 92 IDLE ABS ADD AND CMPEQ CMPGT CMPLT LMBD MV NEG NORM NOT OR SADD SAT SSUB SUB SUBC XOR ZERO .D Unit ADD ADDA LDB/H/W MV NEG STB/H/W SUB SUBA ZERO Note: Refer to the 'C6000 CPU Reference Guide for more details. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 ‘C6711 Additional Instructions (by unit) .S Unit ABSSP ABSDP CMPGTSP CMPEQSP CMPLTSP CMPGTDP CMPEQDP CMPLTDP RCPSP RCPDP RSQRSP RSQRDP SPDP .L Unit ADDDP ADDSP DPINT DPSP INTDP INTDPU INTSP INTSPU SPINT SPTRUNC SUBSP SUBDP ‘C67x .D Unit .M Unit MPYSP MPYDP Chapter 2, Slide 93 MPYI MPYID ADDAD LDDW Note: Refer to the 'C6000 CPU Reference Guide for more details. Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 TMS320C6211/6711 Memory Map Chapter 2, Slide 94 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Byte Address ‘C6211 Memory Map 0000_0000 (L2 cache) 0180_0000 8000_0000 On-chip Peripherals 256M x 8 External 9000_0000 1 256M x 8 External A000_0000 2 256M x 8 External FFFF_FFFF Chapter 2, Slide 95 Async (SRAM, ROM, etc.) Sync (SBSRAM, SDRAM) Internal Memory 0 B000_0000 External Memory 64K x 8 Internal 3 Unified (data or prog) 4 blocks - each can be RAM or cache Level 1 Cache 4KB Program 4KB Data Not in map 4K P CPU 256M x 8 External L2 64K 4K D Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 TMS320C6211/6711 Peripherals Chapter 2, Slide 96 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 'C6x System Block Diagram Memory External Memory Internal Buses .M1 .M2 .L1 .L2 .S1 .S2 Control Regs Regs (B0-B15) Regs (A0-A15) .D1 .D2 P E R I P H E R A L S CPU Chapter 2, Slide 97 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 ‘C6x Internal Buses Internal Memory A D External Interface A D x32 Peripherals A D ‘C67x x32 Program Addr x32 Program Data x256 Data Addr - T1 x32 Data Data - T1 x32/64 Data Addr - T2 x32 Data Data - T2 x32/64 DMA Addr - Read x32 DMA Data - Read x32 DMA Addr - Write x32 DMA Data - Write x32 PC A regs B regs DMA can perform 64-bit data loads. Chapter 2, Slide 98 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 'C6x System Block Diagram Memory Internal Buses EMIF .M1 .M2 .L1 .L2 .S1 .S2 Control Regs Regs (B0-B15) Regs (A0-A15) Ext’l Memory .D1 .D2 P E R I P H E R A L S CPU Chapter 2, Slide 99 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 'C6x System Block Diagram Program RAM Data Ram Addr Internal Buses D (32) EMIF - Async .M1 .M2 .L1 .L2 .S1 .S2 Control Regs Regs (B0-B15) - Sync Regs (A0-A15) Ext’l Memory .D1 .D2 P E R I P H E R A L S CPU Chapter 2, Slide 101 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 'C6x Peripherals External Memory McBSP EMIF DMA ‘C6x CPU HPI/XB Timer Boot PLL EMIF (External Memory Interface) - Glueless access to async/sync memory EPROM, SRAM, SDRAM, SBSRAM DMA/EDMA (Enhance Direct Memory Acces) - 4/16 Channels BOOT - Boot from 4M external block - Boot from HPI/XB McBSP (Multi-Channel Buffered Serial Port) - High speed sync serial comm - T1/E1/MVIP interface HPI (Host Port Interface) /Expansion Bus (XB) - 16/32-bit host P access Timer/Counters - Two 32-bit Timer/Counters Chapter 2, Slide 102 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Clocking - Basic Definitions What is a “clock cycle”? ‘C6x CLKOUT1 (‘C6x clock cycle) PLL CLKIN CLKOUT2 (1/2 CLKOUT1) x1 x4 When we talk about cycles ... CLKIN - MHz PLL CLKOUT1 - MHz CLKOUT2 - MHz MIPs (max) 250 x1 250 (4ns) 125 2000 200 x1 200 (5ns) 100 1600 50 x4 200 100 1600 25 x4 100 (10ns) 50 800 Chapter 2, Slide 103 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 'C6x System Block Diagram (Final) Program RAM Data Ram Addr Internal Buses DMA D (32) EMIF .M1 .M2 .L1 .L2 .S1 .S2 Control Regs Serial Port Regs (B0-B15) - Sync - Async Regs (A0-A15) Ext’l Memory .D1 .D2 Host Port Boot Load Timers Pwr Down CPU Chapter 2, Slide 104 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 ‘C6000 Device Summary Device MIPS MHz Kbytes pins mm W $ Periphs 6201B 1600 200 128 352 27 1.9 80-110 D2H 6202 2000 250 384 352 27 1.9 120-150 D3X 6211 1200 150 72 256 27 1.5 20-40 E2H TMS320 MFLOPS MHz Kbytes pins mm W $ 6701 1000 167 128 352 35 1.9 170-200 D2H 6711 600 100 72 256 27 0.9 20-40 E2H Peripherals Legend: D,E: DMA,EDMA 2,3: # of McBSPs H,X: HPI, XBUS TMS320C62x DSP Generation Parametric Table Periphs TMS320C64x DSP Generation Parametric Table TMS320C67x DSP Generation Parametric Table Chapter 2, Slide 106 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 ’C6000 History 6201 r1 1Q97 coincident ‘C6x architectural announcement. Sample CPU core, minimal peripherals. 6201 r2 4Q97. Full production, with peripherals. 6201B 4Q98. Power reduced, .18 micron silicon, double ports into internal data memory. 6701 3Q98. Pin-for-pin compatible floating-point version of ‘C6201. 1GFLOP (@ 167MHz) performance. 6202 2Q99. 2000 MIPS @ 250MHz. 2-3x 6201 on-chip memory. Replaced HPI with Expansion Bus (32-bit HPI + more). 6211 3Q99. 2 cents per MIPS! 1200MIPS @ 150MHz as low as $25. Double-level cache, enhanced DMA. 6711 Announced 3/1/99. 6701 floating-point CPU with 6211-like memory/peripherals. Volume pricing under $20. Chapter 2, Slide 107 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Module 1 Exam 1. Functional Units a. How many can perform an ADD? Name them. six; .L1, .L2, .D1, .D2, .S1, .S2 b. Which support memory loads/stores? .M 2. Memory Map .S .D .L a. How many external ranges exist on ‘C6201? Four Chapter 2, Slide 110 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 3. Conditional Code a. Which registers can be used as cond’l registers? A1, A2, B0, B1, B2 b. Which instructions can be conditional? All of them 4. Performance a. What is the 'C6711 instruction cycle time? CLKOUT1 b. How can the 'C6711 execute 1200 MIPs? 1200 MIPs = 8 instructions (units) x 150 MHz Chapter 2, Slide 111 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 5. Coding Problems a. Move contents of A0-->A1 or or MV .L1 ADD .S1 MPY .M1 A0, A1 A0, 0, A1 A0, 1, A1 (what’s the problem with this?) Chapter 2, Slide 112 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 5. Coding Problems a. Move contents of A0-->A1 or or MV .L1 ADD .S1 MPY .M1 A0, A1 A0, 0, A1 A0, 1, A1 (A0 can only be a 16-bit value) b. Move contents of CSR-->A1 MVC CSR, A1 c. Clear register A5 or or or or or Chapter 2, Slide 113 ZERO .S1 SUB .L1 MPY .M1 CLR .S1 MVK .S1 XOR .L1 A5 A5, A5, A5 A5, 0, A5 A5, 0, 31, A5 0, A5 A5,A5,A5 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 5. Coding Problems (cont’d) d. A2 = A02 + A1 MPY.M1 ADD.L1 A0, A0, A2 A2, A1, A2 e. If (B1 0) then B2 = B5 * B6 [B1] MPY.M2 B5, B6, B2 f. A2 = A0 * A1 + 10 MPY A0, A1, A2 ADD 10, A2, A2 g. Load an unsigned constant (19ABCh) into register A6. value .equ 0x00019abc mvkl .s1 0x00019abc,a6 mvkl.s1 value,a6 mvkh .s1 0x00019abc,a6 mvkh.s1 value,a6 Chapter 2, Slide 114 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 5. Coding Problems (cont’d) h. Load A7 with contents of mem1 and postincrement the selected pointer. x16 mem A7 mem1 load_mem1: Chapter 2, Slide 115 10h MVKL .S1 mem1, A6 MVKH .S1 mem1, A6 LDH .D1 *A6++, A7 Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Architecture Links: Chapter 2, Slide 116 C6711 data sheet: tms320c6711.pdf User guide: spru189f.pdf Errata: sprz173c.pdf Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2002 Chapter 2 TMS320C6000 Architectural Overview - End -
© Copyright 2026 Paperzz