UNIT NO: D75P 34
UNIT TITLE: Computer Architecture
Session 2003 - 2004
Outcome 2: Demonstrate an understanding of the functions of computer system components
All materials © Aberdeen College 2002 unless stated otherwise. May contain reference to external websites outwith the control of Aberdeen College. All comments to: [email protected]
Computing (TN3), Engineering, Computing and Business Studies: Computer Architecture (D75P34)

Week 1. Main Components of a Computer System.

A computer can be represented in a block diagram like the one below. The four main functional blocks are the central processing unit, memory, input and output. The input and output devices, often called peripherals, are used to input and output instructions and data. In the course of this unit we will look at each of these components in turn. For the first session, we will concentrate on the CPU, more often called just the "processor" or "chip".

Classifying Processors.

Here are four different methods of describing the "type" of a processor.

Clock Speed. This tells us how many times the clock "ticks" per second - early chips could run at a staggering 4.77 MHz; now a more acceptable speed is 2 GHz or more.

Processor Size. There are two methods of defining this. The first is to use the register size - i.e. whatever size the internal registers are. These are usually 8, 16, 32 or 64 bits - most commonly 16 or 32. Processors with 16-bit registers are called 16-bit processors. Modern derivatives of the 80x86 family, Motorola 68000s and most RISC chips have 32-bit registers. We define a 32-bit processor as having a 32-bit word size. The greater the number of bits, the more powerful the processor should be, because it can process a larger amount of information in one operation. For example, a 32-bit processor can add two 32-bit numbers at once; an 8-bit processor can only add two 8-bit numbers at once.
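The difference can be sketched in a few lines of Python (a simulation for illustration only, not a description of real circuitry): an 8-bit ALU has to build a 32-bit sum one byte at a time, propagating the carry between steps, whereas a 32-bit ALU produces the same sum in a single operation.

```python
def add_32bit_via_bytes(a, b):
    """Add two 32-bit numbers the way an 8-bit ALU must:
    one byte at a time, propagating the carry between steps."""
    result = 0
    carry = 0
    for i in range(4):                      # four separate 8-bit additions
        byte_a = (a >> (8 * i)) & 0xFF
        byte_b = (b >> (8 * i)) & 0xFF
        total = byte_a + byte_b + carry
        result |= (total & 0xFF) << (8 * i)
        carry = total >> 8                  # carry into the next byte
    return result & 0xFFFFFFFF

# A 32-bit ALU does the same work in one operation:
a, b = 0x12345678, 0x0FEDCBA9
assert add_32bit_via_bytes(a, b) == (a + b) & 0xFFFFFFFF
```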
A 32-bit processor can theoretically also transfer 32 bits to/from memory at once. The actual performance of any processor, however, depends on many different factors (size is not everything!). On a 16-bit processor you will sometimes see references to "long words", which mean 32 bits.

Bus Size. Another method is to classify the processor using its data bus size. In this case a 16-bit processor means one that uses a 16-bit data bus (although the register sizes might be entirely different): the CPU transfers 16 bits in one operation. The simple Intel model that we will begin with will be an 8-bit processor, because it uses an 8-bit data bus with 16-bit registers. The Motorola 68000 has a 16-bit bus and 32-bit registers. And sometimes they are called 8/16 and 16/32 processors! The data bus width is important because it helps determine how fast data can be transferred to and from the CPU. An Intel 8088 has to transfer two lots of 8 bits to fill a 16-bit register.

Instruction Set. We will later examine how any processor has a fixed list of commands that it can respond to, and that this list can vary between processors. These roughly fall into two groups, called CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction Set Computer). There are also "hybrid" designs (sometimes called CRISC) which fall somewhere between the two. We will examine later the relative advantages and disadvantages of each type, but a rough guide is that a CISC chip will support a large number of instructions (some have 300 or more) and the reduced set of course has far fewer (a typical RISC chip has about 30). The Motorola and Intel derivatives are all CISC chips; Sun SPARCs and the ARM and MIPS chips found in Acorns, PDAs and Nokia mobile phones are all examples of RISC designs. RISC instructions typically operate on 32-bit registers.
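The 8088's two-step register load mentioned above can be sketched as follows (a simulation for illustration; the low byte sits at the lower address, as on all x86 processors):

```python
def load_16bit_over_8bit_bus(memory, addr):
    """An 8088-style load: two bus cycles are needed to fill a
    16-bit register, because the data bus is only 8 bits wide."""
    low = memory[addr]          # bus cycle 1: low byte
    high = memory[addr + 1]     # bus cycle 2: high byte
    return (high << 8) | low

memory = {100: 0x34, 101: 0x12}
assert load_16bit_over_8bit_bus(memory, 100) == 0x1234
```

An 8086, with its 16-bit data bus, would fetch the same value in a single cycle.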
This photo shows a Pentium P3 processor - the chip itself is the tiny rectangle in the middle of the ceramic square, which helps to spread and dissipate heat. Also shown is the fan, which is bolted on top and runs continually while the processor is in use. The actual chip measures less than 2 cm across. Photo © C Nyssen 2002

Internal Components of the CPU.

[Block diagram: the CPU's internal registers connected to the external data bus and address bus.]

The central processing unit (CPU) consists of a control unit, an arithmetic and logic unit, and various other registers, although individual computers differ as to the exact organisation. Not all registers have to be the same size, because they hold different types of information. Those registers which hold data or instructions have to be the same size as a memory location. Registers which hold the address of a memory location, such as the program counter and the memory address register, all need to be large enough to contain the highest memory address. In a typical 8-bit microcomputer the registers which hold data are 8 bits wide, whereas those which hold memory addresses are 16 bits wide, to allow for a maximum memory size of 2^16 (65,536) locations. A 16-bit PC computer normally has a wider address range, typically 24 address lines, giving a maximum memory size of 2^24 (16,777,216) locations, i.e. 16 MB.

The Control Unit.

The Control Unit is the "Brain Department" of the CPU. This component controls all the timing and activities of the processor, and controls everything that happens within the CPU - and therefore the whole system! The CU itself consists of a number of different elements, but the most relevant to this unit is the Instruction Decoder. Whenever a program instruction arrives in the CPU to be executed, the Decoder interprets the information and decides how to process it.
Control signals are required to connect registers to the bus, to control the functions of the ALU and to provide timing signals to the rest of the computer system. Most of the control signals originate in the control section of the Central Processing Unit. All the actions of the control unit are connected with the decoding and execution of instructions - the FETCH and EXECUTE cycles.

The ALU (Arithmetic and Logic Unit).

The arithmetic and logic unit (ALU) is involved in the execution of arithmetic and logic operations. The operands of an arithmetic or logical operation are to be found in memory, but to speed up the operation many computers have several (typically 8 or 16) faster memory locations, called registers, within the CPU. Many computers have a single special register, called the accumulator, which is the source of one of the operands and the destination of an arithmetic or logical operation. If this is the case, the structure of the processor can be represented as follows:

[Diagram: the accumulator (ACC) and the ALU attached to the data bus, with a flags register on the ALU output; the flags connect to the Control Unit.]

The above example also shows a flag register. A flag register contains a number of individual bits which store information about the result of the last ALU operation - for example, whether it produced a zero result or a negative result, or generated a carry or an overflow. This information may be used by later instructions.

Structure of the ALU.

The inputs and outputs will typically be 8 or 16 bits wide, depending on the size of the ALU. The number of control signals will depend upon the number of functions which the ALU is capable of performing; n control signals are required for 2^n operations.

The Sub-units of the ALU.

The ALU can perform a range of arithmetic and logic operations.
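Before looking at the individual circuits, the flag-setting behaviour described above can be sketched in Python. This is a simulation for illustration; the flag names are typical rather than taken from any particular processor.

```python
def alu_add_with_flags(a, b, bits=8):
    """Add two unsigned values and set the usual status flags."""
    mask = (1 << bits) - 1
    raw = a + b
    result = raw & mask
    flags = {
        "zero": result == 0,
        "negative": bool(result >> (bits - 1)),   # top bit of the result set
        "carry": raw > mask,                      # result too big for the register
    }
    # Signed overflow: both operands share a sign that the result lacks
    sa, sb, sr = a >> (bits - 1), b >> (bits - 1), result >> (bits - 1)
    flags["overflow"] = (sa == sb) and (sa != sr)
    return result, flags

# -128 + -128 in 8 bits: the result wraps to zero, with carry and overflow
result, flags = alu_add_with_flags(0x80, 0x80)
assert result == 0 and flags["zero"] and flags["carry"] and flags["overflow"]
```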
The following circuit descriptions would not necessarily be found in more modern CPUs, which implement the operations using more regular structures such as PLAs (Programmable Logic Arrays).

a) An Adder

A computer works on a pattern of bits, and so the lowest level of adder is a one-bit adder. This adder has to implement the following truth table:

A  B | Carry  Sum
0  0 |   0     0
0  1 |   0     1
1  0 |   0     1
1  1 |   1     0

The circuit that implements this truth table is called a half adder, since for addition of multiple bits an additional circuit is needed which has an extra input: the carry from the previous bit addition. This circuit is called a full adder.

b) Logic Tests

An ALU normally contains logic to perform a number of different logical tests, such as a test to see if the result of an operation is zero. Some of these logical tests affect the flag register used to store information regarding the result of the last operation; other logical tests produce a result used as data in further processing.

c) Logical Test For Zero

All that is needed for a test for zero is a NOR gate (an OR gate with an inverted output) with the requisite number of inputs, as shown below. This circuit may be used to set the zero flag on the result of an operation.

d) Bitwise AND Of Two Operands

As this operation suggests, what is required is a set of AND gates which have as inputs the corresponding bits of the two operands. The outputs are the resultant AND of the bit pairs, as shown by the circuit below. The other bit operations, for example the bitwise OR, may be implemented by similar schemes using different gates.
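The adder circuits described in a) above can be modelled directly from the truth table. In this sketch (gate equations, not real hardware) the half adder's sum is the XOR of its inputs and its carry is the AND; a full adder chains two half adders and ORs their carries.

```python
def half_adder(a, b):
    """Sum is XOR of the inputs, carry is AND - exactly the truth table."""
    return a ^ b, a & b

def full_adder(a, b, carry_in):
    """Two half adders plus an OR gate handle the carry from the previous bit."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2

# Reproduce the half-adder truth table (sum, carry):
assert [half_adder(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == \
       [(0, 0), (1, 0), (1, 0), (0, 1)]
```

Chaining one full adder per bit, with each carry output feeding the next carry input, gives a multi-bit (ripple-carry) adder.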
e) Shifting

Most computers include some form of shift or rotate instructions in the instruction set. These instructions move bits right or left within a word. The various shift and rotate operations differ in what is placed in the bit position left vacant by the moving of the bit pattern, and in what happens to the bit which is moved out of the word by the shifting operation. A shift register may be implemented by a series of edge-triggered flip-flops as shown below. On the occurrence of a clock pulse, the external input is clocked into the first flip-flop, the output from the first flip-flop is clocked into the second, and so on. Thus all bits are shifted one place to the right. The output and input will be connected in the particular way required for the shift operation, and initial loading of all the bits of the shift register in parallel is normally allowed.

f) Comparator

Most computers include a number of comparison operations, such as tests for equality, greater than and less than. All these comparisons can be performed by subtraction, with the setting of the appropriate status flags, but without the storing of the subtraction result.

g) Multiplication and Division

In most small computers, multiplication and division are not implemented in hardware but have to be implemented by the programmer in software. In larger computers special hardware is provided, but this type of hardware is outwith the scope of this unit.

The Registers.

Registers are small temporary storage units of a fixed size. Most registers are dedicated to a specific purpose, although general-purpose registers are available in some processors. The number and nature of registers will vary between processors. Some registers will not be available to programmers.
[Diagram of CPU registers. Registers only for processor use: Memory Address Register (MAR), connected to the address bus; Memory Data Register (MDR), connected to the data bus; Instruction Register (IR); Program Counter (PC). Mainly for processor use but can also be accessed by the programmer: Stack Pointer (SP), Status Register (SR), General Purpose Registers, Accumulator.]

Registers can be grouped into two types - data registers, which hold data actually being worked on, and pointer registers, which point to where the data can be found, or where it is being sent to. Most processors will contain at least the following:

Memory Address Register - points to a location in memory where data is being read from or written to.

Program Counter or Instruction Pointer - points to the address in memory of the next program instruction, i.e. the one immediately after the instruction currently being executed.

Memory Data Register or Memory Buffer Register - the only register through which data can enter or leave the CPU. Acts like a portal or gateway for the data travelling between the CPU and RAM.

Instruction Register - used as a "workspace" by the Control Unit, to hold and decode the program instruction currently being executed.

Accumulator - used as a "workspace" by the ALU to hold data currently being manipulated.

Registers are designed to do a specific job and are not bound by the word size of the computer. Generally, the more complex the set of instructions, the greater the number of internal registers required.

Week 2 - Busses and Peripherals.

System Busses.

The CPU is connected to everything else by system channels called busses. A bus is a physical, electrical connection between different parts of the computer, and consists of either copper circuits or electrical cable, or a combination of the two.
In the next session we will look in detail at the Data, Address and Control Buses, but this is a general idea of what they do:

• the data bus is used to transfer the actual data values;
• the address bus signals where in RAM the data is going to or coming from;
• the control bus carries control signals.

In order to attach any input/output devices, or peripherals, you need something to connect the device to the system bus. In reality, this involves plugging a small circuit board ("card") into the motherboard to form the physical connection between the two. This photograph shows a 486 motherboard with an expansion card fitted. You can see the CPU and RAM chips to the middle and front of the picture. The expansion card is a VGA (video) card for outputting signals to a monitor. Note that this card also carries ROM chips of its own. The card therefore acts as a device controller and device interface.

Data then flows from one device to another along the busses. For example, data typed in at a keyboard can enter the system via the keyboard port, travelling along the bus in order to reach the processor. All computers have a number of separate bus systems so that data can be moving between different pairs of components at the same time. Most systems will have a separate CPU-Memory Bus linking memory directly with the CPU, which runs at very high speeds. It will also be connected to an I/O bus via a bus adapter, with the I/O bus running at much slower speeds.

I/O devices and memory run slowly compared to the CPU clock speed. The more cycles per second, the more actions the CPU can carry out. Processor speeds are measured in hertz - a hertz is 1 cycle, or clock "tick", per second. 1 kHz = 1000 Hz; 1 MHz = 1000 kHz; 1 GHz = 1000 MHz. The first PCs ran at about 4.77 MHz; now they run at 2 GHz or more.
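The relationship between clock speed and tick length is a simple reciprocal; a quick sketch:

```python
def cycle_time_ns(clock_hz):
    """The duration of one clock 'tick' in nanoseconds: 1/f seconds."""
    return 1_000_000_000 / clock_hz

# On the first PCs one tick lasted roughly 210 ns...
assert round(cycle_time_ns(4_770_000)) == 210
# ...while at 2 GHz a tick lasts only half a nanosecond.
assert cycle_time_ns(2_000_000_000) == 0.5
```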
However, there is no way that the memory and devices can keep up with this speed, so for each subsequent clock tick the actual task being processed can vary. To be capable of high-speed transfers, the physical bus length must be quite short. Many systems therefore consist of a network of very short busses all joined together, rather than just one big one. However, one large bus is much cheaper to produce than lots of small ones, so this results in a trade-off between speed and economy.

The first electronic computers, such as COLOSSUS and ENIAC, did not use transistors (they hadn't been invented yet!). Instead these early machines relied on thermionic valves to store binary values. The world's first electronic computer, Colossus, was built at Bletchley Park near Milton Keynes in 1943. It relied on huge valves to operate. Likewise, the valves in ENIAC used so much electricity that the surrounding city of Philadelphia would reputedly experience power brown-outs whenever the computer was switched on. Valves were fragile, unreliable and got extremely hot in use, which is one reason why computers used to take up whole rooms. Later computers such as the Manchester Mk 2 incorporated elaborate liquid coolant systems, much like a domestic freezer but an awful lot bigger!

The valves shown in the picture on the left are of a particular type called a "Nixie" tube. These were used to create illuminated alphanumeric output. Nixie displays of this sort were used in calculators and industrial instruments right up until the mid-1970s, when they gradually began to be superseded by Liquid Crystal Display screens. This is how the above valves would have looked when soldered onto a primaeval motherboard!

This photograph shows an Intel™ Socket 370 motherboard, supporting both Celeron™ and Pentium processors.
[Photo labels: serial and parallel (printer) ports; ISA slot (black); PCI slots (white); keyboard and mouse connectors; socket for processor, heatsink and fan; AGP slot (brown); BIOS ROM chip (the backup battery is in the middle of the board); slots for fitting RAM (this board will support 2 × 512 MB RAM modules, giving 1 GB of memory); power supply attachment; IDE ports for attaching fixed disk drives, CD-ROM etc.; FD (Floppy Disk) port for attaching the cable to connect a floppy drive.]

The System Bus.

The main method of communication between the various parts of a computer is by the use of one or more buses. A bus consists of a group of signal lines used to carry information. Usually the components tap on to the bus to send and receive information, as illustrated below:

[Diagram: the CPU, ROM, RAM and parallel/serial input-output interfaces all tapping onto the address bus, data bus and control bus; the clock and interrupt lines form part of the control bus.]

In order to work correctly, only one sender must be active on the bus at any one time. In a simple computer this is achieved by having a single master, the central processing unit, which controls the whole system. The other devices on the bus, called slaves, respond to commands from the central processing unit. The information carried falls into three types (address, data and control), and a bus is often subdivided into these three. In a computer system there will be a number of groups of buses. In this unit only the lowest-level buses will be considered: those between components of the CPU, and those between the CPU, memory and input-output interfaces on a single printed circuit board.

The Address bus is used to specify the memory location (the address) involved in a data transfer, while the data itself is transferred between devices using the data bus. The data bus, therefore, must be bi-directional, allowing data to be read into and written out of the CPU.
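The division of labour between the address bus and the data bus can be sketched as a toy transaction (the device names and address ranges here are invented for illustration): the CPU places an address on the address bus, and only the device whose address range matches responds on the data bus.

```python
# Each device watches the address bus and only drives the data bus
# when the address falls inside its own (hypothetical) range.
devices = {
    "ROM": (range(0x0000, 0x4000), [0xAA] * 0x4000),
    "RAM": (range(0x4000, 0x8000), [0x00] * 0x4000),
}

def bus_read(address):
    for name, (addr_range, cells) in devices.items():
        if address in addr_range:                       # address decode
            return cells[address - addr_range.start]    # device drives the data bus
    return None                                         # no device selected

assert bus_read(0x0010) == 0xAA    # falls in the ROM's range
assert bus_read(0x4010) == 0x00    # falls in the RAM's range
```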
The Control bus comprises various lines used to distribute timing and control signals throughout the system. Important among these are:

• signals concerned with the direction of the data transfer (to or from the CPU);
• signals which indicate that the data is to be transferred to I/O rather than memory;
• requests from external devices requiring the attention of the CPU. The response to such 'interrupts' can be programmed in various ways, and a system of prioritisation may often be desirable.

A system clock generator is responsible for providing an accurate and highly stable timing signal. This generator often forms part of the microprocessor itself.

The number of lines contained in the address and data buses depends upon the particular microprocessor employed. Most of today's microprocessors are capable of performing operations on binary numbers consisting of either 8 or 16 bits; they are thus known as 8-bit and 16-bit microprocessors respectively. In a microcomputer based on an 8-bit microprocessor, the data bus has 8 separate lines. Similarly, in a 16-bit system the data bus will have 16 separate lines. Address buses for 8-bit systems invariably comprise 16 lines, whereas those for 16-bit systems may consist of as many as 24 lines.

A further complication exists in the case of a number of microprocessors which, in order to minimise the CPU pin count (so that a 40-pin rather than a 64-pin package may be utilised), employ multiplexed data and address buses. Certain CPU pins are then used to convey both address and data information, and external latches are used to separate this information on to the respective buses.

Since a bus may be connected to many devices, bus drivers/buffers are usually required. These are usually packaged in groups of eight bits (i.e. one byte) and may be unidirectional (e.g. for use with an address bus) or bi-directional (e.g. for use with a data bus).
In the latter case the devices are usually referred to as 'bus transceivers'. The largest binary number that can appear on an 8-bit bus is 11111111 (2^8 - 1 = 255), while that for a 16-bit bus is 1111111111111111 (2^16 - 1 = 65,535, i.e. a 64K address range). Each address corresponds to a unique binary code; hence the linear addressable range (without resorting to 'paging') will be dependent upon the number of address lines provided within the system. (The maximum number of individual memory locations that can exist in a system having n address lines is 2^n.)

Signals on all lines, whether they be address, data or control, can exist in only one of two states: logic 0 (low) or logic 1 (high). As far as individual devices sharing the data bus are concerned, a third 'high impedance' state exists whenever a device is in its deselected or disabled state. This allows the CPU to communicate with other devices without the risk of bus conflict. Bus transceivers can usually also be placed in a tri-state condition, thus permitting access to the bus for a second processor or other 'intelligent' device. The address range corresponding to a particular device (e.g. ROM) is decoded from the address bus and is used to generate an appropriate 'enable' signal. A TTL decoder (or demultiplexer) is often used in such an application.

Although the CPU is the heart of any microprocessor system, it may not be the only 'intelligent' device present. A second data processor, for example, may be fitted in order to perform numeric data processing (NDP), or a dedicated microprocessor may be incorporated, for example, in an intelligent keyboard.

The most desirable characteristics of a bus are listed below (but not in any order of importance). Their importance will vary according to the application that one has in mind.
A bus should:

• be processor and manufacturer independent;
• allow the use of multiple masters;
• permit asynchronous operation;
• employ a simple, non-multiplexed data transfer protocol;
• use a simple low-cost backplane;
• incorporate some means of signalling bus errors;
• permit as high a bus data rate as possible (to minimise processing delays);
• allow as wide an addressing range as possible (both in relation to memory and to I/O space);
• support as wide a range of processors as possible (including 16-bit and 32-bit types).

A bus is simply a collection of wires on which electrical signals are passed from component to component. The size and speed of the busses will vary between processor models, but their functions remain the same. A typical 80x86 system uses standard TTL logic levels: each wire on a bus uses a standard voltage level to represent zero and one. We think of binary values as being zero and one rather than electrical levels, because these levels vary on different processors.

The Data Bus.

The data bus is used to transfer the actual data values, and the size of this bus varies widely. On typical systems, the data bus may be 8, 16, 32, or 64 bits (lines) wide. The 8088 and 80188 microprocessors have an 8-bit data bus (eight data lines) - this means that the CPU can transfer eight bits of data at a time. The 8086, 80186, 80286, and 80386SX processors have a 16-bit data bus, and so on. The data bus size is usually linked to the size of the internal registers: for example, a processor with 32-bit registers will commonly have a 32-bit data bus (but this is not always the case!). Having an 8-bit data bus does not limit the processor to 8-bit data types. It simply means that the processor can only access one byte of data per memory cycle; the obvious disadvantage is that an 8-bit bus can only transmit half as much information per unit time as a 16-bit one.
However, since each memory address corresponds to a byte, this also has distinct advantages - the CPU can address memory in chunks as small as a single byte. It also means that a byte is the smallest unit of memory you can access at once with the processor: if the processor wants to transfer a 4-bit value, it must read eight bits and then ignore the extra four.

80x86 Processor Data Bus Sizes

Processor                       Data Bus Size
8088                            8
80188                           8
8086                            16
80186                           16
80286                           16
80386sx                         16
80386dx                         32
80486                           32
80586 class / Pentium (Pro)     64

The Address Bus.

We have already seen that the data bus transfers information between a particular memory location or I/O device and the CPU. But how do we know where the data is supposed to come from or go to? To differentiate memory locations and I/O devices, the system designer assigns a unique memory address to each memory element and I/O device. When some particular memory location or I/O device has to be accessed, the relevant address is placed on the address bus. Circuitry associated with the memory or I/O device recognises this address and instructs the memory or I/O device to read the data from, or place data on, the data bus. In either case, all other memory locations ignore the request. Only the device whose address matches the value on the address bus responds.

The size of the address bus will also vary between processors, and bears a direct relationship to how many memory locations can be accessed. If the address bus had only 1 line, the processor could access 2^1, i.e. 2 addresses. If the address bus is a 12-bit bus, the processor can provide 2^12, or 4,096, unique addresses. (Each address commonly refers to 1 byte, so this gives us 4 kB of addressable memory.) Some 8088 and 8086 derivatives, for example, have 20-bit address busses and can access up to 1,048,576 memory locations.
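The pattern behind these figures is simply that n address lines select 2^n locations; each extra line doubles the addressable space. A quick sketch:

```python
def addressable_bytes(address_lines):
    """Each extra address line doubles the addressable space."""
    return 2 ** address_lines

assert addressable_bytes(20) == 1_048_576        # 1 MB  (8088/8086)
assert addressable_bytes(24) == 16_777_216       # 16 MB (80286)
assert addressable_bytes(32) == 4_294_967_296    # 4 GB  (80386dx and later)
```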
Some computers have up to 36 address lines, giving a theoretical 64 GB of addressable space.

80x86 Family Address Bus Sizes

Processor                       Address Bus Size    Max Addressable Memory
8088                            20                  1,048,576 (1 MB)
8086                            20                  1,048,576
80188                           20                  1,048,576
80186                           20                  1,048,576
80286                           24                  16,777,216 (16 MB)
80386sx                         24                  16,777,216
80386dx                         32                  4,294,967,296 (4 GB)
80486                           32                  4,294,967,296
80586 / Pentium (Pro)           32                  4,294,967,296

What happens if there is not enough space in the given memory location to write back the data, or if the RAM is spread over more than one chip? The simple answer is that the data will be spread between the two chips, with part of the byte written to each. For example, if we wanted to write 1110 0110 to location 42, the first half (1110) would be written to address 42 of the first chip and the second half (0110) to location 42 of the second chip. The CPU prepares the chosen chip for writing by means of a chip enable or chip select line. In order to write data to a chip, therefore, the CPU must follow a sequence of steps:

• the address goes on the address bus;
• any address lines involved in the chip select are decoded;
• the chip select is activated;
• the actual data goes on the data bus;
• the data gets written to the correct location via the write line.

When accessing data, from an I/O port for example, the reverse happens:

• the address goes on the address bus;
• the relevant I/O line is activated;
• any address lines involved in the chip select are decoded;
• the chip select is activated;
• the actual data goes on the data bus;
• the data gets sent back to the processor.

The Control Bus.

The control bus is a collection of signal lines that control how the processor communicates with the rest of the system. Consider for a moment the data bus: the CPU sends data to memory and receives data from memory on the data bus. This prompts the question, "Is it sending or receiving?"
There are two lines on the control bus, read and write, which specify the direction of data flow. Other signals include system clocks, interrupt lines and status lines. The read and write control lines control the direction of data on the data bus. When both carry a logic 1, the CPU and memory or I/O are not communicating with one another. If the read line is low (logic 0), the CPU is reading data from memory (that is, the system is transferring data from memory to the CPU). If the write line is low, the system transfers data from the CPU to memory.

The byte enable lines are another important set of control lines. These allow 16-, 32- and 64-bit processors to deal with smaller chunks of data.

Note that it is quite possible for byte, word, and double-word values to overlap in memory. For example, you could have a word variable beginning at address 193, a byte variable at address 194, and a double-word value beginning at address 192. These variables would all overlap.

Besides the address lines which access memory, the 80x86 family provides a 16-bit I/O address bus. This gives the 80x86 CPUs two separate address spaces: one for memory and one for I/O operations. Lines on the control bus differentiate between memory and I/O addresses. Other than the separate control lines and a smaller bus, I/O addressing behaves exactly like memory addressing. Memory and I/O devices both share the same data bus and the lower 16 lines of the address bus.

We began studying our hardware theory by looking at the two most important components of any system - the CPU and memory (RAM). In order to do anything useful, however, these must somehow interface with the human user, and so they are attached to various peripheral devices.
This term refers to any piece of hardware that is attached to a CPU and forms the interface between the outside world and what is happening inside the processor. These are also sometimes referred to simply as "devices" or "peripherals". Devices can be classified into two general groups - input/output devices and storage devices. Although these are designed to fulfil different purposes, they interface between processors and users in the same way. I/O devices enable communication between computers and users - for example through keyboards, monitors, mice and barcode scanners. Storage devices store data on a permanent basis - theoretically indefinitely, although most media do tend to deteriorate over time. Some examples are hard drives, floppy disks and CD-Rs.

Expansion Busses.

When we looked at the motherboard we saw that the expansion slots came in different sizes. Expansion busses are designed to make it easier to connect devices to the computer system. In the early days of microcomputers, a form called the S-100 bus was widely used on CP/M systems. The Apple II was based on a proprietary design and had the first expansion bus that made it easy for end users to add cards on their own. The idea of an open architecture based on a simple expansion bus was one of the factors behind the first IBM PC's overnight success. The first type of slot to be introduced was the ISA slot, and although this technology is now nearly 20 years old you can still fit ISA cards in some modern motherboards. Any typical Pentium motherboard has a selection of different expansion bus designs.

Different Expansion Busses.

The Industry Standard Architecture (ISA) bus was the 8-bit bus that originally debuted on the IBM PC. At that point it ran at the same speed as the system bus (4.77 MHz); it was later upgraded first to 6 and then to 8 MHz, and to 16 bits in width.
Computers then started to carry faster processors, and it was soon discovered that many expansion cards simply could not keep up with system demand. The industry had by this time standardised on the 8 MHz speed, although most expansion buses now use a speed independent of the system bus. The 8-bit-wide 4.77-MHz IBM PC bus had a peak throughput rating of about 2 megabytes per second.* Bringing the speed up to 8 MHz increased the maximum throughput to 8 MBps. A further 8-bit extension made it possible for computers to address 16 MB of memory (up from the original 1 MB address space), but using the additional 8 bits is not as easy as with the original 8-bit design, because memory access operations require two steps.

*How to measure it: 8 MHz = 8 * 1000 * 1000 clock cycles per second, which equals 8 000 000. For an 8 bit bus, multiply by 8, which gives 64 000 000 bits per second. Divide by 8 to get bytes and you get 8 000 000 bytes per second, or 8 MBps.

As processors became faster and gained wider data paths, the basic ISA bus design did not change to keep pace. Even now, most ISA cards remain 8-bit. The few types with 16-bit data paths (hard disk controllers, graphics adapters, and some network adapters) are still constricted by the low throughput levels of the ISA bus. Expansion cards in faster bus slots can better handle these processes - so much so that some newer motherboards don't even carry ISA slots anymore. As the slow and narrow ISA bus became a bottleneck between the processor and expansion devices, the Peripheral Component Interconnect (PCI) bus was created by Intel to solve this problem. The PCI bus runs at its own clock speed, separate from the system bus speed. Originally specified as a 32-bit-wide bus operating at 33 MHz, PCI had a theoretical maximum transfer rate of 132 MBps (16½ times as fast as the ISA bus). This is the version most widely implemented in PC systems.
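The footnote's arithmetic generalises to any bus width and clock speed, and can be used to check the ISA and PCI figures quoted here. A minimal sketch, assuming the idealised one transfer per clock cycle:

```python
def peak_throughput_bytes(clock_hz, bus_width_bits):
    """Theoretical peak: one transfer of bus_width_bits per clock cycle."""
    bits_per_second = clock_hz * bus_width_bits
    return bits_per_second // 8          # convert bits to bytes

# 8-bit ISA bus at 8 MHz, as in the footnote:
print(peak_throughput_bytes(8_000_000, 8))     # 8 000 000 bytes/s = 8 MBps

# 32-bit PCI bus at 33 MHz:
print(peak_throughput_bytes(33_000_000, 32))   # 132 000 000 bytes/s = 132 MBps
```

Real transfers take more than one cycle each, which is why the 4.77-MHz 8-bit bus managed only about 2 MBps in practice rather than the 4.77 MBps this idealised formula would suggest.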
PCI also simplified system configuration by supporting plug-and-play, and it extended the limited resources of the original PC-compatible hardware by supporting shared IRQ assignments.

The original 124-pin slot specification (62 pins on each side of the expansion slot) has been revised to support even greater throughput. First, a 64-bit extension was designed using an extended connector, much like the 16-bit addition to the ISA bus, adding another 64 contact pins (32 pins per side). This doubled the theoretical throughput (though 64-bit cards are still rare at this point). More productive is the PCI 2.1 specification, which calls for a 66-MHz bus speed. This effectively doubles the theoretical throughput of the original 32-bit specification to 264 MBps, 33 times as fast as the ISA bus.

Along with the 64-bit version, there are some other aspects of the PCI specification that most users may not know about. For example, PCI cards can run on either 5 volts or 3.3 volts. A 5-volt card has a notch cut into the edge connector toward the front of the computer case, with a corresponding key in the slot. A 3.3-volt card has a notch toward the rear of the case, with a corresponding key in the slot. This prevents a user from accidentally plugging the wrong card into a slot. The PCI specification also calls for a universal card, which fits either slot and runs on either voltage.

Another less-known trait of the PCI expansion slot is that the bus is limited to ten electrical loads. Most cards apply more than one load to the bus, and as a result, the practical limit for expansion cards on a single PCI bus is three cards (in some cases, four will work). If you need more than three PCI cards installed in a single system, you can have more than one PCI bus, using a PCI bridge configuration.
In the days of the original ISA bus, we used the relatively simple Monochrome Display Adapter (MDA) and Color Graphics Adapter (CGA) cards to drive our monitors, and these required relatively small amounts of data. A CGA graphics display could show four colors (2 bits of data) at 320 x 200 resolution at 60 Hz, which required 128,000 bits of data per screen, or just over 937 kilobytes per second. In contrast, a 16-bit high-color image requires 1.5 MB of data, and at 75 Hz this data is refreshed 75 times per second. (75 Hz is probably the minimum acceptable refresh rate for monitors.) Thanks to graphics accelerators, not all of this data has to be transmitted across the expansion bus to the graphics card, but new imaging technology has created new problems. Now 3-D graphics have made it possible to model both fantastic and realistic worlds on-screen with amazing detail. Texture mapping and object hiding require enormous amounts of data, and the graphics adapter needs to have fast access to this information.

Accelerated Graphics Port (AGP) first appeared with Pentium II motherboards. It barely conforms to our original definition of a bus, as it is really a point-to-point connection, dedicated to the single task of connecting a graphics adapter more directly to the motherboard's resources. AGP has limited capabilities. PCI devices must support communication with a variety of devices (storage adapters, network connections, and sound cards, for example) but AGP deals only with graphics. This single, dedicated task makes it possible to streamline the design for maximum speed. The speed is used to give the graphics adapter fast access to texture and buffer data. Instead of loading up the graphics card with expensive memory, AGP lets the card access this information directly from the computer's system memory, without involving the CPU in the process.
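The display bandwidth figures above can be checked with a short calculation. A sketch: the 1024 x 768 resolution for the high-color example is an assumption (the text does not state it), chosen because 1024 x 768 at 16 bits per pixel gives exactly the 1.5 MB quoted:

```python
def screen_bytes(width, height, bits_per_pixel):
    """Size of one full screen image in bytes."""
    return width * height * bits_per_pixel // 8

def screen_bandwidth(width, height, bits_per_pixel, refresh_hz):
    """Bytes per second needed to redraw the screen at the given refresh rate."""
    return screen_bytes(width, height, bits_per_pixel) * refresh_hz

# CGA: four colors (2 bits per pixel) at 320 x 200, refreshed 60 times per second.
print(screen_bandwidth(320, 200, 2, 60))   # 960 000 bytes/s - just over 937 KB/s

# 16-bit high color at 1024 x 768 (assumed resolution):
print(screen_bytes(1024, 768, 16))         # 1 572 864 bytes = 1.5 MB per screen
```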
How much faster is AGP than PCI? A 33-MHz 32-bit PCI bus supports up to 132 MBps throughput. AGP is also a 32-bit design, but it runs at speeds up to 133 MHz, four times as fast, so it has a maximum transfer rate of 532 MBps. (This is still twice as fast as the rate of a 66-MHz 32-bit PCI bus.) Best of all, however, is that the graphics card on the AGP bus does not have to compete with any other devices to get access to its data.

You can have only one AGP device in a system at a time. If you want to use a second display (a feature that Windows 98 makes relatively easy to implement), you will need to rely on a PCI graphics adapter for the second display. If you want to upgrade an AGP display, you will need to replace the adapter. As with PCI, there are some lesser-known details contained in the AGP specification. Just as there are two different PCI slot designs depending on the card voltage, so there are two different voltage designs for AGP: the common 3.3-volt design and a 1.5-volt type.

As part of the compulsory questions for Outcome 3, you will be expected to draw a graph to demonstrate the differences in performance between systems with differing data and address bus sizes. An assessment-level question is given next for you to try. (Your lecturer will explain the clock cycle part, as we don't actually cover this until Book 3.)

Address and data bus sizes - Graph Drawing Exercise. © SQA 2001 - taken from draft Exemplar for unit.

Scenario. A semiconductor manufacturer has decided to produce a range of microprocessors/microcontrollers for use in a variety of application areas. As speed and cost are both important factors, the designers have decided to use a common core processor and provide different address and data bus widths for different family members. The difference in cost between processors is largely caused by the differences in packaging.
One result of this decision is that each member of the family can perform a maximum of one million memory fetches per second (as long as it is attached to memory of a sufficient speed). This corresponds to one fetch per two machine cycles.

Part 1. You have been detailed to help the design team of your company's latest product, and the task that you have been given is to produce clear graphs showing the performance of different members of the processor family. This will be used to help decide the lowest cost component that can be used in the product. The graph will be used at a meeting where the choice of device will be finalized. The graph should be in a form suitable for its intended use, labeled clearly and scaled appropriately.

Part Number   Data Bus Size   Address Bus Size   Cost (ex VAT)
Hyc4e         4               8                   2.00
Hyc4t         4               10                  2.50
Hyc4w         4               12                  3.20
Hyc4s         4               16                  4.00
Hyc8w         8               12                  3.70
Hyc8s         8               16                  4.60
Hyc12s        12              16                  5.30
Hyc12n        12              20                  6.20
Hyc12s        16              16                  6.10
Hyc16f        16              24                  7.00
Hyc32f        32              24                  8.50
Hyc32o        32              32                 10.20

Prepare a graph to the above specification for this set of data. It is estimated that the proposed application will require a processor capable of transferring at least seven million bits per second. Add a line to a new copy of your graph indicating this level of performance. From your graph, determine which processors meet this requirement.

Part 2. Now that the range of candidate processors has been reduced, it has been decided to further reduce the list of candidates by considering the required memory space of the application. You have been detailed to produce a graph showing the amount of memory that each of these processors can address. Again, this will be required at a meeting, and should be appropriately presented. The system will require a minimum of 30 KB of memory, and your graph should include this.
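The quantities needed for the graphs follow directly from the bus widths and the one-million-fetches-per-second figure given in the scenario. A short sketch (the example part number is taken from the table above):

```python
FETCHES_PER_SECOND = 1_000_000   # given in the scenario

def throughput_bits_per_second(data_bus_width):
    """Peak transfer rate: one fetch of data_bus_width bits per fetch."""
    return data_bus_width * FETCHES_PER_SECOND

def addressable_locations(address_bus_width):
    """Each extra address line doubles the number of reachable locations."""
    return 2 ** address_bus_width

# For example, the Hyc8s (8-bit data bus, 16-bit address bus):
print(throughput_bits_per_second(8))   # 8 000 000 bits per second
print(addressable_locations(16))       # 65 536 locations (64 KB at 1 byte each)
```

Computing these two values for every row of the table gives the points to plot for Part 1 and Part 2 respectively.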
Based on the data bus size, the address bus size and unit cost, which processor would you recommend?

Week 3 - Memory.

A computer stores information in its memory. There are basically two types of system memory - RAM (Random Access Memory) and ROM (Read Only Memory). There are further subdivisions of these two types, which we shall examine in detail later. Read Only Memory can be used to store algorithms (i.e. the instructions of a program) when the memory is manufactured, and once tested these algorithms should not need changing. Obviously algorithms which are going to be changed are not stored in ROM, as this would be very inefficient. In general, once a ROM has been programmed it cannot be changed. In PCs, ROM is often used to store part of the operating software of a computer system. When a computer is switched on there is nothing inside the RAM, because such data is lost when the power is removed. It is therefore necessary to have a program which can be loaded automatically and which will then load the necessary programs into the main memory. Such a loading program is called a 'bootstrap' loader. The algorithm stored in the ROM is obeyed and reads other programs from a peripheral device. In most cases the program read in is part of the operating system, which then controls the subsequent operation of the computer system.

Different Memory Types.

Random Access Memory (RAM). In common usage, the term RAM is synonymous with main memory, the memory available for data and programs. For example, a computer with 8M RAM has approximately 8 million bytes of memory that programs can use. It can be both read from and written to. It is typically cheap and fast - standard Dynamic RAM (DRAM) has a usual access speed of about 60-70 ns. Dynamic RAM uses capacitors for storing electrical charge. These minute capacitors can only hold information for a very short period of time (a thousandth of a second, or millisecond).
Dynamic memory must therefore be refreshed at frequent intervals in order to retain the information stored in the capacitors, and this is done when the microprocessor is carrying out other work so that processing time does not suffer. The basic idea is that information is stored in the form of a charge on a capacitor; this allows a higher bit density and gives lower power consumption than static memories.

Main memory consists of a large number of cells, each capable of storing 1 bit of information. These cells are grouped into locations. A location will normally be either 8 bits (1 byte) or 16 bits, and each location will have a unique address. A location is therefore the smallest addressable unit of memory, and the size of the location is known as the memory word size. The word size is the smallest number of bits that can be stored or retrieved in one memory access.

A more modern type of RAM is Synchronous DRAM (SDRAM), a type of RAM that can run at much higher clock speeds than conventional memory. SDRAM actually synchronizes itself with the CPU's bus and is capable of running about twice as fast as ordinary DRAM. Today's fastest Pentium systems use CPU buses running at 100 MHz or more, so SDRAM can keep up with them, though barely. SDRAM is not expected to support the ever-higher speeds of the latest CPUs, which is why new memory technologies, such as RDRAM and SLDRAM, are being developed.

Older systems used SIMMs (Single Inline Memory Modules), small circuit boards which provide a 32-bit path to the memory chips. With the development of the Pentium, which required a 64-bit path to memory, SIMMs had to be installed in pairs. Memory is nowadays supplied as DIMMs (Dual Inline Memory Modules), which can be installed one DIMM at a time. This is what a SIMM looks like, in contrast to the DIMM above.

Static Random Access Memory (SRAM).
This is a type of memory that is faster and more reliable than the more common DRAM. The term static is derived from the fact that it doesn't need to be refreshed like dynamic RAM. It is faster than dynamic RAM, but it requires more power and is a lot more expensive. Both types of RAM are volatile, meaning that they lose their contents when the power is turned off. While DRAM supports access times of about 60 nanoseconds, SRAM can give access times as low as 10 nanoseconds. In addition, its cycle time is much shorter than that of DRAM because it does not need to pause between accesses. Due to its high cost, SRAM is often used only as a memory cache (see below). SRAM is constructed from bipolar cells, unlike the capacitors used for DRAM. It is fast, but not very compact, and has a high power consumption. Static RAM uses minute switches to indicate an ON or OFF state. These switches are called flip-flops. Whether a switch is on or off, it requires a current to be passed to it. As a result, static RAM is used mainly for small memory sizes.

L1, L2 and Secondary Cache.

The speed at which a program executes instructions will be dependent on the rate at which instructions and data can be read from and written to main memory. Application code and data that will be frequently used can reside in cache memory. This is an intermediate memory system that sits between the CPU and main memory, and works on the principle of locality of reference. In other words, stuff that gets used a lot is kept handy! If you have accessed one location, you are more likely to access its neighbours next, because programs are stored sequentially, as are arrays of data and processes occurring in loops. Cache memory is SRAM or "Static RAM", the fastest available. It is also expensive compared to main RAM.
The cache memory is connected to the CPU by an extremely fast back-side bus; consequently, data can be accessed from cache memory much faster than from main memory. Processors usually have cache built in or as part of the CPU module - if you look at advertisements for processors, they are marketed with an n-size cache. Some early Celerons had no cache at all, and subsequently performed very poorly! Computers generally only have 128 kilobytes to 512 kilobytes of cache memory, but very high-end systems may have up to 2 megabytes. Among the Intel processors, the Pentium II and III chips generally come with 256 (the accepted minimum) or 512 kilobytes of cache memory. Cheaper Celeron chips usually have 128 or 256 kilobytes. The top end of the Pentium range comes with 1 or even 2 megabytes of cache, but these are extremely expensive and probably unnecessary for normal use. If, however, you have a "dual processor capable" computer or motherboard, you could have a dual-chip system with 256K on each chip at a much cheaper price.

With a large enough cache memory, the entire executable application program might be contained in the cache. If frequently used code isn't in the cache, the computer loses time in two ways. First, it still spends time looking for the code in the cache; then, after wasting this time, it spends more time fetching the code from the slower main memory. Hence, bigger cache memories can substantially increase performance in most applications. The cache acts a bit like a buffer: frequently used data is kept, and data that has not been accessed recently "drops" off the bottom - the policy deciding what to discard is known as the "replacement algorithm". The most common is the Least Recently Used (LRU) algorithm, by which the block which has gone the longest time without being referenced is overwritten.

The back-side bus is an extremely fast data pipeline connecting the core processor of the CPU with its cache memory.
This bus can run at full processor speed or, more often, at half (or some other fraction) of the speed of the processor. A 600 MHz processor might have this bus running at 200 or maybe even 300 MHz.

Memory caching is effective because most programs will access the same data or instructions over and over. By keeping as much of this information as possible in SRAM, the computer avoids accessing the slower DRAM. Some memory caches are built into the architecture of microprocessors. The Intel 80486 microprocessor, for example, contains an 8K memory cache; most modern Pentiums ship with a 256K or 512K cache. Such internal caches are often called Level 1 (L1) caches. Some systems also come with external cache memory, called Level 2 (L2) cache. These caches sit between the CPU and the DRAM. Like L1 caches, L2 caches are composed of SRAM, but they are much larger. Where a system contains both L1 and L2 cache, the L2 is sometimes called a secondary cache.

Disk caching works under the same principle as memory caching, but instead of using high-speed SRAM, a disk cache uses conventional main memory. The most recently accessed data from the disk is stored in a memory buffer. When a program needs to access data from the disk, it first checks the disk cache to see if the data is there. Disk caching can dramatically improve the performance of a system, because accessing data in RAM can be thousands of times faster than accessing the hard drive. When data is found in the cache, it is called a cache hit, and the effectiveness of a cache is judged by its hit rate. Many caches use a technique known as smart caching, in which the system can recognise certain types of frequently used data.

Optimising Your Cache.

If you're buying a system or upgrading your motherboard and processor, you should make sure that your system has as much cache memory as you can afford.
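The Least Recently Used replacement policy described earlier can be modelled in a few lines. This is an illustrative sketch of the replacement algorithm only, not a model of any particular processor's cache:

```python
from collections import OrderedDict

class LRUCache:
    """A tiny model of a cache with Least Recently Used replacement."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()   # insertion order doubles as recency order
        self.hits = self.misses = 0

    def access(self, block):
        if block in self.store:
            self.hits += 1
            self.store.move_to_end(block)       # mark as most recently used
        else:
            self.misses += 1
            if len(self.store) >= self.capacity:
                self.store.popitem(last=False)  # evict the least recently used
            self.store[block] = True

cache = LRUCache(capacity=2)
for block in ["A", "B", "A", "C", "B"]:
    cache.access(block)
print(cache.hits, cache.misses)   # 1 hit (the second A), 4 misses
```

Note how accessing C evicts B (the block unused for longest), so the final access to B misses again: this is exactly the "drops off the bottom" behaviour described above.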
If using a system for heavy-duty applications such as video editing, two CPUs may be installed; this effectively doubles the amount of cache from 512 kilobytes to a more than adequate 1 megabyte.

Adverts.

The following advertisements have been taken from recent copies of popular computer magazines. These demonstrate the differences in specification and price between different processor models.

Two Different Types Of Cache.

Write-through. Every write operation to the cache is accompanied by a write of the same data to main memory. If this is implemented, then the input/output processor need not consult the cache directory when it reads memory, since the state of main memory is an accurate reflection of the state of the cache as updated by the central processor. Although this scheme simplifies the accesses for the input/output processor, it results in fairly high traffic between the central processor and memory, and the high traffic tends to degrade input/output performance.

Write-back. In this scheme, the central processor updates the cache during a write, but actual updating of the memory is deferred until the line that has been changed is discarded from the cache. At that point, the changed data are written back to main memory.

Read Only Memory (ROM).

Computers almost always contain a small amount of read-only memory that holds program instructions for starting up the computer and performing special diagnostics. This is often referred to as a BIOS (Basic Input/Output System) chip. Unlike RAM, ROM cannot be written to. In fact, both types of memory (ROM and RAM) allow random access, so strictly speaking, RAM should be called read-write memory. ROM typically has a slow access time and is more expensive to produce than RAM. ROM is non-volatile, i.e.
its contents will be retained even if the power is switched off. Other devices which are added to the PC can have their own ROM, e.g. a graphics card will have its own ROM dedicated to the operation of the graphics card alone.

Programmable Read-Only Memory (PROM).

Like a ROM, this is a memory chip on which you can store program code. But once the PROM has been used, you cannot wipe it clean and use it to store something else. Like ROMs, PROMs are non-volatile; they retain their contents even when the computer is turned off. The difference between a PROM and a ROM is that a PROM is manufactured as blank memory, whereas a ROM is programmed during the manufacturing process. To write data onto a PROM chip, you need a special device called a PROM burner. The process of programming a PROM is sometimes called burning the PROM.

A PROM / EPROM burner.

PROMs are cheap to produce (although the initial setup costs are high) and are used for various types of firmware. This could be anything from sound cards to washing machines - in fact, anything that has some sort of electronic device embedded in it will use a PROM.

Erasable Programmable Read-Only Memory (EPROM) and Electrically Erasable Programmable Read-Only Memory (EEPROM).

EPROM is a special type of memory that retains its contents until it is exposed to ultraviolet light. The ultraviolet light clears the contents, making it possible to reprogram the memory. To reprogram an EPROM, you need a PROM burner. An EPROM differs from a PROM in that a PROM can be written to only once and cannot be erased. EPROMs are used widely in types of firmware that may be subject to upgrade at some point in the future. They also enable the manufacturer to change the contents of the PROM before the device is actually shipped - for example, in a PC any bugs can be removed and new versions installed shortly before delivery.
Another widespread use is in component manufacturing processes, where EPROMs may be used for testing and quality control purposes. The EEPROM works like the EPROM but is cleared using an electrical charge rather than UV light. Like other types of PROM, both EPROMs and EEPROMs retain their contents even when the power is turned off. They are not as fast as RAM and are comparatively expensive. EEPROM is similar to flash memory (sometimes called flash EEPROM). The principal difference is that EEPROM requires data to be written or erased one byte at a time, whereas flash memory allows data to be written or erased in blocks (thereby making flash memory faster).

Week 4 - How RAM works.

We have already seen that main memory, the RAM, communicates with the processor by the data and address buses. We also learned that each bus consists not of a single electrical circuit or line, but of several. The width of the address bus dictates how many different memory locations can be accessed, and the width of the data bus how much information is stored at each RAM location. Every time a bit is added to the width of the address bus, the address range doubles. This means that the CPU can access 2^n locations, where n is the number of address lines - for example, the Intel 386 processor had a 32-bit address bus, enabling it to access up to 2^32 = 4,294,967,296 locations. (If each location = 1 byte, this would be 4 GB of memory.) The Pentium processor - introduced in 1993 - had a data bus width of 64 bits, enabling it to access 8 bytes of data at a time. This model also used 168-pin DIMMs (earlier computers mainly used SIMMs), which are specifically designed to support 64-bit paths and which are still the industry standard.

The actual chips themselves consist of rectangular arrays of memory cells, arranged in rows (wordlines) and columns (bitlines).
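The doubling effect of each extra address line can be demonstrated directly:

```python
def address_space(address_lines):
    """Number of distinct locations reachable with this many address lines."""
    return 2 ** address_lines

print(address_space(20))   # 1 048 576     - a 1 MB address space (20 lines)
print(address_space(24))   # 16 777 216    - a 16 MB address space (24 lines)
print(address_space(32))   # 4 294 967 296 - the 4 GB space of the Intel 386

# Adding one line always doubles the range:
print(address_space(21) == 2 * address_space(20))   # True
```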
Each memory cell also has a unique location or address, defined by the intersection of a row and a column, and we usually refer to these addresses in hexadecimal notation (remember that an address would really be stored in binary - humans can't readily understand long binary strings!).

DRAM is manufactured using a similar process to a processor. A thin wafer of silicon has a circuit etched onto it using an acid bath - the circuit includes millions of tiny transistors and capacitors and the control circuitry. The overall design is just a series of simple, repeated structures, so the whole thing can be reproduced very simply and cheaply. Over the years, several different structures have been used to create the memory cells on a chip, but the support circuitry usually consists of sense amplifiers to amplify the signal or charge detected on a memory cell, and some sort of address logic to select the correct rows and columns. Other components on the RAM chip may include internal counters or registers to keep track of the refresh sequence, or to initiate refresh cycles as needed; plus there will be some sort of control device for actually reading from or writing to the selected cell.

In DRAM, microscopically small capacitors are used to hold the charge representing binary 1s and 0s, but these are so tiny that they discharge very quickly, and all the data is lost. To overcome this problem, other circuitry refreshes the memory, reading the value before it disappears completely and rewriting it back. (This action is what makes the memory dynamic.) Access times are expressed in nanoseconds (0.000000001 s, or roughly the time that light takes to travel 30 cm!) - most models of DRAM have an access time of 60 or 70 ns.

The most difficult aspect of working with DRAM devices is resolving the timing requirements.
A sequence of several events has to take place before a RAM address can be read from or written to; all of this has to be co-ordinated by the control unit and system clock.

Row Address Select. The /RAS circuitry is used to latch the row address and to initiate the memory cycle. It is required at the beginning of every operation. To enable /RAS, the voltage level is changed from high to low, and must stay in a low state until /RAS is no longer required. /RAS may also be used to trigger a refresh cycle (/RAS Only Refresh, or ROR).

Column Address Select. The /CAS is used to latch the column address and to initiate the read or write operation. /CAS may also be used to trigger a /CAS-before-/RAS refresh cycle. This refresh cycle requires /CAS to be active prior to /RAS and to remain active for a specified time. Like /RAS, it is activated by a low voltage.

Address. The addresses are used to select a memory location on the chip. The address pins on a memory device are used for both row and column address selection, which is known as multiplexing. The number of address pins depends on the memory's size and organisation. The voltage level present at each address pin at the time that /RAS or /CAS goes active determines the row or column address, respectively, that is selected. Other circuitry and control structures confirm that the address being read from or written to was the one that was in fact selected!

Write Enable. The /WE signal is used to choose a read operation or a write operation. A low voltage level signifies that a write operation is desired; a high voltage level is used to choose a read operation. The operation to be performed is usually determined by the voltage level on /WE when /CAS goes low.

Output Enable. During a read operation, this control signal is used to prevent data from appearing at the output until needed. When /OE is low, data appears at the data outputs as soon as it is available. /OE is ignored during a write operation.
In many applications, the /OE pin is grounded and is not used to control the DRAM timing.

Data In or Out. The DQ pins (also called Input/Output pins or I/Os) on the memory device are used for input and output. During a write operation, a voltage (high = 1, low = 0) is applied to the DQ. This voltage is translated into the appropriate signal and stored in the selected memory cell. During a read operation, data read from the selected memory cell appears at the DQ once access is complete and the output is enabled (/OE low). At most other times, the DQs are in a high-impedance state; they do not source or sink any current, and do not present a signal to the system. This also prevents DQ contention when two or more devices share the data bus.

Because most PC memory accesses are sequential, the current industry-standard RAM is designed to fetch all the bits in a burst as fast as possible. This type of memory is known as Synchronous DRAM. An on-chip burst counter allows the column part of the address to be incremented very rapidly, which helps speed up retrieval of information. A component known as the memory controller provides the location and size of the block of memory required; the SDRAM chip can then supply the bits as fast as the CPU can take them, using an on-chip clock to synchronise operations to the CPU's system clock.

Different Speeds of RAM.

Until a couple of years ago, most RAM ran at its own speed (asynchronously). The industry standard nowadays, however, is Synchronous DRAM (SDRAM), which is synchronised to the system clock. This enables data to be delivered off-chip at burst rates of up to 133 MHz, although some set-up time is required for the initial data transfer. The problem with SDRAM was that it was never truly designed to run at speeds beyond about 100 MHz.
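The /RAS and /CAS sequencing described above can be modelled as a toy multiplexed-address memory. This is purely illustrative - real devices impose strict timing relationships between the strobes that a software model cannot capture:

```python
class ToyDRAM:
    """A toy DRAM array addressed by multiplexed row/column strobes."""
    def __init__(self, rows, cols):
        self.cells = [[0] * cols for _ in range(rows)]
        self.row = None               # latched when /RAS falls

    def ras(self, row_address):
        """Falling /RAS latches the row address and starts the cycle."""
        self.row = row_address

    def cas_write(self, col_address, bit):
        """Falling /CAS with /WE low: store a bit in the selected cell."""
        self.cells[self.row][col_address] = bit

    def cas_read(self, col_address):
        """Falling /CAS with /WE high: the stored bit appears at the DQ pin."""
        return self.cells[self.row][col_address]

chip = ToyDRAM(rows=4, cols=4)
chip.ras(2)               # /RAS: select row 2
chip.cas_write(1, 1)      # /CAS + /WE low: write a 1 at row 2, column 1
print(chip.cas_read(1))   # /CAS + /WE high: read it back -> 1
```

The same pins carry first the row and then the column address, which is the multiplexing described in the Address paragraph above.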
Developments in the technology of chipsets began to outstrip developments in RAM, and various manufacturers began to develop alternatives. One stop-gap was Intel's S-RIMM specification, which allows PC100 SDRAM chips to use Direct RDRAM memory modules; but this was complex and expensive. The next step was DRDRAM, or Rambus, specifically designed for the Pentium 4. This is a totally new RAM architecture, complete with bus mastering (the Rambus Channel Master) and a new pathway (the Rambus Channel) between memory devices (the Rambus Channel Slaves). Direct RDRAM is actually the third version of the Rambus technology. The original (Base) design ran at 600 MHz, and this was increased to 700 MHz in the second iteration, known as Concurrent RDRAM. A Direct Rambus channel includes a controller and one or more Direct RDRAMs connected together via a common bus - which can also connect to devices such as microprocessors, digital signal processors, graphics processors and other circuits. The controller is located at one end, and the RDRAMs are distributed along the bus, which is parallel-terminated at the far end. The two-byte-wide channel uses a small number of very high-speed signals to carry all address, data and control information at up to 800 MHz.

The other big player battling to provide system builders with high-performance RAM is Double Data Rate SDRAM (DDR SDRAM). This works by allowing output operations on the chip to occur on both the rising and falling edge of a clock cycle, thereby providing an effective doubling of the clock frequency without increasing the actual frequency. Like other types, DDR SDRAM is tied to the front-side bus, with both the memory and bus executing instructions at the same time rather than one of them having to wait for the other.

Virtual Memory.
We have already looked at different types of physical memory in a PC system, and in particular at the mainstream RAM - the memory which holds the code or application currently being run. When a PC loads up Windows, for example, the code relating to the windows and graphics (USER and GDI code) loads into the lower section of memory, as do any older DOS applications or drivers. The core Windows operating system (VMM code) loads into the top part. Each Windows application is then loaded into its own protected memory space, usually above the system and DOS code. These allocations can be shown pictorially as a memory map. But what happens if there isn't enough unallocated memory to run an application? In this case, Windows has to pinch a bit of hard disk space to park any RAM code that hasn't been recently used. This becomes part of the system's virtual memory. Virtual memory is a combination of RAM (physical system memory) and reserved hard disk space. It can be used to store both program code and data when applications are running. During the execution of a program, at any given point parts of the code will be in physical RAM and other parts are swapped out to the hard disk. This arrangement makes it possible to have more virtual memory in your system than you have RAM installed, and also makes it possible to run more applications simultaneously. How To Find Out How Much Virtual Memory A System Has. Windows can tell you the amount of virtual memory available at any given time and the percentage of your total system resources that are currently available for applications. To get this information, choose Start>>Programs>>Accessories>>Resource Meter. This will show the percentage of free resources. (There are also various freeware programs available which will do the same thing, but fancier.) It's recommended that you keep free memory and resources as high as possible.
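The split between physical RAM and swap space can be pictured with a minimal sketch. The page size, the page-table contents and the helper name `locate` are all invented for illustration:

```python
# Minimal sketch of the virtual-memory idea described above: each page of
# a running program lives either in physical RAM or in the swap file on
# disk. The table contents here are invented for illustration.

PAGE_SIZE = 4096

page_table = {          # virtual page number -> where the page lives
    0: ("ram", 7),      # resident, in physical frame 7
    1: ("disk", 120),   # swapped out, at slot 120 in the swap file
    2: ("ram", 3),
}

def locate(virtual_address):
    """Split an address into (page, offset) and look the page up."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    where, index = page_table[page]
    return where, index, offset

print(locate(4100))  # page 1 is swapped out: ('disk', 120, 4)
```

A real system would fault on the disk-resident page and copy it back into RAM before the access completes; the sketch only shows the bookkeeping.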
What Causes Free Memory to Decrease? Every time you run an application program under Windows, that program uses some of your free virtual memory to run program code and to store and display data. Programs use additional memory as they open new documents, execute utilities or perform other operations. If you're running low on virtual memory, one of the first indications is that your system will slow to a crawl! This can be solved by closing down some (or all) of the applications that are running in the background; in some cases, you'll need to close and restart Windows, because some applications don't de-allocate memory after they're closed. Increasing the Amount of Virtual Memory in a System. In some circumstances you may have to increase the amount of virtual memory in the system. You can do this in two ways: increase the amount of system RAM available, by adding to or upgrading the chips; or create a permanent or temporary swap file, or increase the size of the current Windows swap file. How to Create, Delete, or Change the Size of a Swap File. Whenever possible, it's best to let Windows manage your virtual memory. Windows chooses the default setting based on the amount of free hard-disk space. The swap file then shrinks and grows dynamically based on actual memory usage. If you need to specify a different disk or set limits on the minimum or maximum reserved space, however, you can create, delete or resize a swap file manually. Before creating a swap file, run a disk defragmentation utility. Then go to Start>>Settings>>Control Panel>>System>>Properties>>Virtual Memory. Click the radio button for "Let me specify my own virtual memory settings", and then enter the new disk in Hard disk or enter values (in kilobytes) in Minimum or Maximum. Note that Windows cannot create a swap file from compressed or stacked hard disk space. Performance Considerations for Virtual Memory. The fastest type of virtual memory is physical RAM.
The more virtual memory that's provided by memory chips in your computer, the faster Windows will run. Because of this, it's best to increase physical memory whenever possible. Creating a swap file on a network drive is not recommended - network swap files are extremely slow. If you must create a swap file on a network drive, create a permanent swap file. Before creating the swap file, you must make sure the network directory does not have a read-only attribute, and you must have both create and write access to the directory. Temporary vs. Permanent Swap Files. Windows allows you to set up two types of swap file: temporary or permanent. Temporary swap files can be created out of fragmented hard disk space, but permanent swap files can only be created out of contiguous hard disk space. Depending on the amount of contiguous free hard disk space available, performance concerns, and the amount of hard disk space you need when not running Windows, one type of swap file may be better for your configuration than the other. Of the two types, temporary swap files are slower. The more fragmented your hard disk is, the slower a temporary swap file becomes. A temporary swap file takes the form of a DOS file (WIN386.SWP) that is created on your hard disk when Windows loads, and gets deleted when you exit from Windows. To maintain the best performance from a temporary swap file, run a defragmentation utility on the hard disk frequently. Once you've set up virtual memory for a temporary swap file, a swap file of the requested size will be created every time Windows loads. However, if the requested swap file would use more than 50% of the available hard disk space, its size is reduced to accommodate the 50% limit.
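The 50% limit just described can be sketched in a couple of lines; the function name and the figures are illustrative:

```python
# Sketch of the 50% rule above: a requested temporary swap file is
# silently shrunk so it never takes more than half the free disk space.

def actual_swap_size_kb(requested_kb, free_disk_kb):
    """Return the swap file size Windows would actually create."""
    limit = free_disk_kb // 2
    return min(requested_kb, limit)

print(actual_swap_size_kb(100_000, 500_000))  # fits as requested: 100000
print(actual_swap_size_kb(400_000, 500_000))  # capped at 50%: 250000
```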
Windows does not warn you that it's creating a smaller swap file, so if disk space is low, and if you add files to the hard disk, be aware that your swap file size may be affected! Overclocking. Overclocking is the practice of running your CPU past the speed that it is rated at, for example running a 1.2 GHz CPU at 1.4 GHz. How can this be achieved? Most CPU manufacturers create their CPUs and then test them at a certain speed. If the CPU fails at a certain speed, then it is sold as a CPU at the next lower speed. The tests are usually very stringent, so a CPU may be able to run at the higher speed quite reliably. In fact, the tests are often not used at all once a company has been producing a certain CPU for a while - they may well mark some of them down as the slower CPUs in order to fulfil market demand! Is overclocking dangerous? For the most part, no - provided you are not trying to run your old 486 33 MHz at 1 GHz. Another practice that is not recommended is monkeying about with any of the voltage settings. You must also keep the CPU as cool as possible, perhaps by fitting an auxiliary fan. Most modern CPUs are multiplier locked - i.e. you cannot change the actual CPU speed - but you can change the bus speed. The multiplier is a figure obtained by dividing the default CPU speed by the default bus speed, e.g. a 1.2 GHz Athlon with a 133 MHz bus gives 1200/133 = a multiplier of 9. On older CPUs it was possible to change the multiplier by altering some of the jumper settings on the motherboard, but this is not possible with most CPUs on the market today. The only way to alter the overall CPU speed, therefore, is to alter the speed of the bus. Changing the bus speed is actually more beneficial than changing the CPU's speed - when you increase the bus speed, in many cases you will be overclocking all the parts in your AGP, PCI and ISA slots, and your RAM, as well as the CPU. Usually this is by a small margin and won't hurt these components.
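The multiplier arithmetic above is easy to check with a short sketch (the function names are illustrative):

```python
# The overclocking arithmetic from the text: CPU speed is the product of
# the bus (front-side) clock and a fixed multiplier, so on a
# multiplier-locked chip the only lever left is the bus speed.

def multiplier(cpu_mhz, bus_mhz):
    """The (locked) multiplier, e.g. the 1.2 GHz Athlon example."""
    return round(cpu_mhz / bus_mhz)

def overclocked_cpu_mhz(bus_mhz, mult):
    """Resulting CPU speed after a bus-speed change."""
    return bus_mhz * mult

m = multiplier(1200, 133)           # 1200/133 -> 9
print(m)                            # 9
print(overclocked_cpu_mhz(150, m))  # raising the bus to 150 MHz -> 1350
```

Raising the bus from 133 to 150 MHz therefore takes the example chip from 1.2 GHz to about 1.35 GHz, which is why a small bus change moves the CPU speed by a large amount.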
In your motherboard manual, find the jumper settings for the particular bus speed you want to use. Locate those jumpers on your motherboard and change them to fit the jumper settings in the manual. Some motherboards have a "SoftMenu", which enables you to change the bus speed in the computer's BIOS. Calculate the new processor speed by multiplying the bus speed by your CPU's multiplier. WEEK 5 - Memory Maps. We have already seen that when a computer is first booted up, it loads part of the operating system into RAM. This means that not all of the RAM is available for subsequent applications. As well as the OS, many peripherals also "claim" a bit of RAM space for their I/O processes as soon as their device drivers are loaded. Any applications which are then opened have to fit themselves in round the sections of RAM that have already been bagged. In a typical Windows configuration it is easy to see where the RAM space for any given device is located. This screen was obtained by clicking on My Computer >> Control Panel >> System >> Device Manager >> Modem >> Properties >> Resources. We can tell from this screen that the modem card uses IRQ channel 3 and memory locations 1428-142F and 2000-20FF. The addresses are always given in hexadecimal notation. The Conflicting Device list shows "No Conflicts", meaning that this modem is not in competition with any other device for the IRQ channels and RAM locations it's using. By looking at the properties of other devices you can see which resources are claimed on startup. It is sometimes useful to set out a diagram showing which parts of RAM are claimed by devices, and which are free for running applications; these diagrams are called memory maps. Examples of Memory Maps.
These are all common, commercial examples provided by different computer manufacturers which demonstrate, if nothing else, that the term "memory map" can mean different things to different manufacturers! Memory maps are usually labelled in hex. This is because computers store their locations in binary - and we humans cope better in hexadecimal! Example 1. The following memory maps show which areas of the memory space are available for your program's use. Essentially, if you do not use the MON51 Target Monitor, you have the entire address space available. Configuration using MON51:
XDATA 0000h-6AFFh: von Neumann RAM/ROM (reserved for program code).
XDATA 6B00h-7FFFh: von Neumann RAM/ROM used by the Monitor (data area).
XDATA 8000h-FFFFh: free RAM (available for target program).
CODE 0000h-6AFFh: von Neumann RAM/ROM (reserved for program code).
CODE 6B00h-7FFFh: von Neumann RAM/ROM used by the Monitor (data area).
CODE 8000h-9100h: ROM used by the Monitor (code area).
Example 2. There are no figures with this one showing the locations and space used, because this will vary between manufacturers. However, this is a very common configuration and will probably be similar to your own PC. It's actually a memory map of a Motorola 68000 derivative as used in an iMac. Figure: a simple schematic memory map of a microcomputer. The order of the different segments of memory can vary depending on the system. Example 3. A very fancy memory map of a games console; this would be similar to a Nintendo 64 or a PlayStation configuration. Example 4.
One to draw for yourself. This is similar to the memory map question in Outcome 2 for Computer Architecture. A certain system has an addressable memory using 16 address lines. 8 kb of boot code starts at 0000h. System RAM starts at location 16384; this block of RAM extends for 16 kb. RAM reserved for the video display begins at location 8000h and continues for 4 kb. Immediately after that comes 4 kb of flash memory. The top 2 kb is reserved for memory-mapped I/O buffer space. Draw the relevant memory map, labelling the areas claimed by the connected devices, unused space and the addresses of the start and finish of each area. How to draw a memory map. First, you must calculate how much addressable memory you are working with - i.e. the number of addresses, or locations, available. This will always be 2^(number of address lines). For example, a system with an 8-bit address bus will have 2^8, or 256, addresses. Now draw a vertical bar chart and mentally divide it into 256 slices. Label the bottom section 0 and the top one 255. Note that the number of the top address is always the size minus 1. It is now a simple task to fill in the slices that have been claimed by the various devices. Your assessment question will give you addresses or sizes in both hex and decimal formats, so some base conversion will be required. You may find it useful to label the sections in hexadecimal on one side of the bar chart and in decimal on the other. For assessment purposes, at least one side must show all the starting and finishing addresses. Once you have done, check your solution with the answer over the page. Example 4 suggested solution. First work out the total addressable memory - it's 2^16, which equals 65536 locations, each of which is 1 byte in size. Addresses start from 0, so the range will run from 0 to (65536 less 1), which equals 0 to 65535 in decimal notation, or 0000h to FFFFh in hex.
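The working above can also be reproduced programmatically; this sketch computes the start and finish of each region in Example 4, in both decimal and hex:

```python
# Reproduces the Example 4 working: for each region, compute the start
# and finish address and print both decimal and hex, as the map requires.

ADDRESS_LINES = 16
TOP = 2 ** ADDRESS_LINES - 1   # 65535, i.e. FFFFh

KB = 1024
regions = [                    # (name, start address, size in bytes)
    ("Boot",  0x0000, 8 * KB),
    ("RAM",   16384,  16 * KB),
    ("Video", 0x8000, 4 * KB),
    ("Flash", 0x9000, 4 * KB),              # immediately after video
    ("I/O",   TOP + 1 - 2 * KB, 2 * KB),    # top 2 kb of the space
]

for name, start, size in regions:
    end = start + size - 1     # finish address is always start + size - 1
    print(f"{name:5s} {start:5d}-{end:5d}  {start:04X}h-{end:04X}h")
```

Running it lists Boot as 0-8191 (0000h-1FFFh) up to I/O at 63488-65535 (F800h-FFFFh), matching the suggested solution; the gaps between regions are the free space.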
Now draw a map - for assessment purposes it does not have to be to scale, but the addresses MUST be accurate!

Address (decimal)         | Contents   | Address (hexadecimal)
63488 - 65535 (62k - 64k) | I/O Buffer | F800h - FFFFh
40960 - 63487 (40k - 62k) | Free       | A000h - F7FFh
36864 - 40959 (36k - 40k) | Flash      | 9000h - 9FFFh
32768 - 36863 (32k - 36k) | Video      | 8000h - 8FFFh
16384 - 32767 (16k - 32k) | RAM        | 4000h - 7FFFh
8192 - 16383 (8k - 16k)   | Free       | 2000h - 3FFFh
0000 - 8191 (0 - 8k)      | Boot       | 0000h - 1FFFh

Exercise - external hardware, Amstrad original PCW series. Draw the relevant memory map based on the following figures. Label all the starting and finishing addresses and show both claimed and free space. How wide is the address bus?
FDC status register: starts from 0 and uses 1 byte.
FDC data register: comes next and also uses 1 byte.
Parallel ports: start at 136 and use 8 bytes.
Kempston joystick: location 159.
AMX mouse: A0-A7.
EMR MIDI interface: A0-A2.
Hard drive: starts at 168 and uses 8 bytes.
Fax Link interface (CPS8256-compatible circuitry): C8-CF.
Kempston mouse: starts at 208 and uses 8 bytes.
MasterScan (b0 = ink under scan head): DF.
Cascade/Spectravideo joystick (input: b4 right, b3 up, b2 left, b1 fire, b0 down): location 224.
Immediately after comes free space.
Top location (FF) reserved for PROM code.

Exercise - Atari games console (part). Draw the relevant memory map based on the following figures. Label all the starting and finishing addresses and show both claimed and free space. You may assume a 16-bit address bus.
Contents (from the top of memory downwards): Operating System ROM; device handler routines; serial I/O utilities; interrupt handler; central I/O utilities; Operating System vectors; RAM vectors on power-up; JMP vectors; cassette; printer; keyboard; screen editor; ROM character set; floating point ROM package; I/O chips; ANTIC programmable interrupt; Power On Key; GTIA or CTIA.
Locations (as given, in corresponding order): top 4630 bytes, 60906-65535; E944 onwards; 59093-59715; E4A6 for 559 bytes; 58533; E480 onwards; 58448-58495; 58432-58447; 16 bytes up to E43F; 58400-58415; start at E41F for 16 bytes; 58368-58383; E36D; 57343; 55295; start at 54272 for 12 bytes; 54016-54271; 53760-54015; start from D000 for 1/4 kilobyte.
Week 7 - Polling, Interrupts and Device Handling. An interrupt is a signal informing a program that an event has occurred. When a program receives an interrupt signal, it takes a specified action (which can be to ignore the signal). Interrupt signals can cause a program to suspend itself temporarily to service the interrupt. Interrupt signals can come from a variety of sources. For example, every keystroke generates an interrupt signal. Interrupts can also be generated by other devices, such as a printer, to indicate that some event has occurred. These are called hardware interrupts. Interrupt signals initiated by programs are called software interrupts. A software interrupt is also called a trap or an exception. PCs support 256 types of software interrupt, including 16 hardware interrupts. Each type of software interrupt is associated with an interrupt handler - a routine that takes control when the interrupt occurs. For example, when you press a key on your keyboard, this triggers a specific interrupt handler. The complete list of interrupts and associated interrupt handlers is stored in a table called the interrupt vector table, which resides in the first 1 K of addressable memory. Why Interrupts Are Used to Process Information.
The processor is a highly-tuned machine that is designed to (basically) do one thing at a time. However, we use our computers in a way that requires the processor to at least appear to do many things at once. If you've ever used a multitasking operating system like Windows 95, you've done this; you may have been editing a document while downloading information on your modem and listening to a CD simultaneously. The processor is able to do this by sharing its time among the various programs it is running and the different devices that need its attention. It only appears that the processor is doing many things at once because of the blinding speed at which it is able to switch between tasks. Most of the different parts of the PC need to send information to and from the processor, and they expect to be able to get the processor's attention when they need to do this. The processor has to balance the information transfers it gets from various parts of the machine and make sure they are handled in an organised fashion. There are two basic mechanisms that a processor can employ. Polling: the processor could take turns going to each device and asking if they have anything they need it to do. This is called polling the devices. In some situations in the computer world this technique is used; however, it is not used by the processor in a PC, for a couple of basic reasons. One reason is that it is wasteful: going around to all the devices constantly asking if they need the attention of the CPU wastes cycles during which the processor could be doing something useful. This is particularly true because in most cases the answer will be "no". Another reason is that different devices need the processor's attention at differing rates; the mouse needs attention far less frequently than, say, the hard disk (when it is actively transferring data). Interrupting: the other way that the processor can handle information transfers is to let the devices request them when they need its attention.
This is the basis for the use of interrupts. When a device has data to transfer, it generates an interrupt that says "I need your attention now, please". The processor then stops what it is doing and deals with the device that requested its attention. It can actually handle many such requests at a time, using a priority level for each to decide which to handle first. It's also interesting to put into perspective just how fast the modern processor is compared to many of the devices that transfer information to it. Let's imagine a very fast typist; say, 120 words per minute. At an average of 5 letters per word, this is 600 characters per minute on the keyboard. You might be fascinated to realize that if you type at this rate, a 200 MHz computer will process 20,000,000 instructions between each keystroke you make! You can see why having the processor spend a lot of time asking the keyboard if it needs anything would be wasteful, especially since at any time you might stop for a minute or two to review your writing, or do something else. Even while handling a full-bandwidth transfer from a 28,800 bit/s modem, which of course moves data much faster than your fingers, the processor has over 60,000 instruction cycles between bytes it needs to process. Interrupt Controllers. Device interrupts are fed to the processor using a special piece of hardware called an interrupt controller. The standard for this device is the Intel 8259 interrupt controller, and has been since the earliest PCs. As with most of these dedicated controllers, in modern motherboards the 8259 is, in most cases, incorporated into a larger chip as part of the chipset. The interrupt controller has 8 Interrupt Request Lines (IRQs) that take requests from up to 8 different devices.
The controller then passes the request on to the processor, telling it which device issued the request (which interrupt number triggered the request, from 0 to 7). The original PC and XT had one of these controllers, and hence supported interrupts 0 to 7 only. Starting with the IBM AT, a second interrupt controller was added to the system to expand it; this was part of the expansion of the ISA system bus from 8 to 16 bits. In order to ensure compatibility the designers of the AT didn't want to change the single interrupt line going to the processor. So what they did instead was to cascade the two interrupt controllers together. The first interrupt controller still has 8 inputs and a single output going to the processor. The second one has the same design, but it takes 8 new inputs (doubling the number of interrupts) and its output feeds into input line 2 of the first controller. If any of the inputs on the second controller become active, the output from that controller triggers interrupt #2 on the first controller, which then signals the processor. Interrupt Priority The PC processes device interrupts according to their priority level. This is a function of which interrupt line they use to enter the interrupt controller. For this reason, the priority levels are directly tied to the interrupt number: On an old PC/XT, the priority of the interrupts is 0, 1, 2, 3, 4, 5, 6 and 7. On a modern machine, it's slightly more complicated, because remember that IRQ2 cascades to the higher eight lines. The result of this is that the priorities become 0, 1, (8, 9, 10, 11, 12, 13, 14, 15), 3, 4, 5, 6 and 7. Non-Maskable Interrupts (NMI) All of the regular interrupts that we normally use and refer to by number are called maskable interrupts. The processor is able to mask, or temporarily ignore, any interrupt if it needs to, in order to finish something else that it is doing. 
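The priority sequence quoted above falls out of the cascade mechanically, as this short sketch shows (the list names are illustrative):

```python
# The first controller scans its lines 0..7 in order, but line 2 is
# really the whole second controller, so IRQs 8..15 are expanded in its
# place - producing the priority order quoted in the text.

first_controller = [0, 1, 2, 3, 4, 5, 6, 7]
cascade_input = 2
second_controller = [8, 9, 10, 11, 12, 13, 14, 15]

priority = []
for line in first_controller:
    if line == cascade_input:
        priority.extend(second_controller)  # IRQ2 expands to 8..15
    else:
        priority.append(line)

print(priority)  # [0, 1, 8, 9, 10, 11, 12, 13, 14, 15, 3, 4, 5, 6, 7]
```

This is why, on an AT-class machine, IRQ15 outranks IRQ3 even though its number is higher.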
In addition, however, the PC has a non-maskable interrupt (NMI) that can be used for serious conditions that demand the processor's immediate attention. The NMI cannot be ignored by the system unless it is shut off specifically. When an NMI signal is received, the processor immediately drops whatever it was doing and attends to it. As you can imagine, this could cause havoc if used improperly. In fact, the NMI signal is normally used only for critical problem situations, such as serious hardware errors. The most common use of NMI is to signal a parity error from the memory subsystem. This error must be dealt with immediately to prevent possible data corruption. Multiple Devices and Conflicts. In general, interrupts are single-device resources. Because of the way the system bus is designed, it is not feasible for more than one device to use an interrupt at one time, because this can confuse the processor and cause it to respond to the wrong device at the wrong time. If you attempt to use two devices with the same IRQ, an IRQ conflict will result. This is one of the types of resource conflict. It is possible to share an IRQ among more than one device, but only under limited conditions. In essence, if you have two devices that you seldom use, and that you never use simultaneously, you may be able to have them share an IRQ. However, this is not the preferred method, since it is much more prone to problems than just giving each device its own interrupt line. One of the most common problems regarding shared IRQs is the use of the third and fourth serial (COM) ports, COM3 and COM4. By default, COM3 uses the same interrupt as COM1 (IRQ4), and COM4 uses the same interrupt as COM2 (IRQ3). If you have a mouse on COM1 and set up your modem as COM3 - a very common setup - guess what happens the first time you try to go online?
You can share COM ports on the same interrupt, but you have to be very careful not to use both devices at once; in general this arrangement is not preferred. Many modems will let you change the IRQ they use to IRQ5 or IRQ2, for example, to avoid this problem. Other common areas where interrupt conflicts occur are IRQ5, IRQ7 and IRQ12. The following entries describe the most common IRQ configurations on a PC-based system.

IRQ 0. 16-bit priority: 1. Bus line: no. Default use: system timer. Other common uses: none; for system use only. Description: this is used exclusively for internal operations and is never available to peripherals or user devices. Conflicts: this is a dedicated interrupt line; there should never be any conflicts. If software indicates a conflict on this IRQ, there is a good possibility of a hardware problem somewhere on your system board.

IRQ 1. 16-bit priority: 2. Bus line: no. Default use: keyboard / keyboard controller. Other common uses: none; for system use only. Description: this is used exclusively for keyboard input. Even on systems without a keyboard, IRQ1 is not available for use by other devices. Note that the keyboard controller also controls the PS/2-style mouse if the system has one, but the mouse uses a separate line, IRQ12. Conflicts: this is a dedicated interrupt line; there should never be any conflicts. If there is, this would indicate a motherboard or chipset (keyboard controller) problem.

IRQ 2. 16-bit priority: n/a. Bus line: no. Default use: cascades a second interrupt controller to the first, allowing the use of IRQs 8 to 15. Other common uses: seldom used nowadays except for older modems and EGA video cards, or as an alternative IRQ for COM3 or COM4. Description: for compatibility with older cards that used IRQ2 on the original PC or XT machines (which had only one controller and a normal IRQ2 line), the motherboard of modern PCs reroutes IRQ2 to IRQ9. Hence IRQ2 can still be used but appears to the system as IRQ9.
Conflicts on IRQ 2 generally come from trying to use a device on IRQ2 and another on IRQ9 at the same time. Some modems and serial port cards allow IRQ2 to be used as an alternative for the two standard lines used for modems and serial ports (IRQ3 and IRQ4) in order to avoid conflicts in those two heavily-contested areas. This is generally a good configuration decision, since unused IRQs from 3 to 7 are harder to find than unused IRQs from 10 to 15. If you want to use IRQ2, move any device using IRQ9 to another line like 10 or 11.

IRQ 3. 16-bit priority: 11. Bus line: 8/16-bit. Default use: COM2. Other common uses: COM4, modems, sound cards, network cards, tape accelerator cards. Description: also a popular option for modems, sound cards and other devices. Modems often come pre-configured to use COM2 on IRQ3. Conflicts: conflicts on IRQ3 are relatively common. The two biggest problem areas are modems attempting to use COM2/IRQ3 and clashing with the built-in COM2 port, and systems that attempt to use both COM2 and COM4 simultaneously on this same interrupt line. Many devices (particularly network interface cards) come with IRQ3 as the default.

IRQ 4. 16-bit priority: 12. Bus line: 8/16-bit. Default use: COM1. Other common uses: COM3, modems, sound cards, network cards, tape accelerator cards. Description: this port and interrupt are almost always used by the serial mouse, where there is no PS/2 mouse fitment. IRQ4 is also the default interrupt for the third serial port, COM3, and a popular option for modems, sound cards and other devices. Modems sometimes come pre-configured to use COM3 on IRQ4. Conflicts: conflicts on IRQ4 are relatively common, although not as common as on IRQ3. On systems with a PS/2 mouse, problems are less common. The two biggest problem areas are modems that attempt to use COM3/IRQ4 and clash with COM1, and systems that attempt to use both COM1 and COM3 simultaneously on this same interrupt line.
IRQ 5. 16-bit priority: 13. Bus line: 8/16-bit. Default use: sound card. Other common uses: LPT2, COM3, COM4, modems, network cards, tape accelerator cards, hard disk controller on the old PC/XT. Description: this is probably the single "busiest" IRQ in the whole system. On the original PC/XT system this IRQ was used to control the (massive 10 MB) hard disk drive. When the AT was introduced, hard disk control was moved to IRQ14 to free up IRQ5 for 8-bit devices. As a result, IRQ5 is in most systems the only free interrupt below IRQ9 and is therefore the first choice for use by devices that would otherwise conflict with IRQ3, IRQ4, IRQ6 or IRQ7. Conflicts: conflicts on IRQ5 are very common because of the large variety of devices that have it as an option. Sound cards especially like to grab IRQ5 and are generally best left there, to avoid problems with poorly written older software that just assumed the sound card would always be left at IRQ5. To whatever extent possible, move devices that can use higher-valued IRQs away from IRQ5.

IRQ 6. 16-bit priority: 14. Bus line: 8/16-bit. Default use: floppy disk controller. Other common uses: tape accelerator card. Description: technically IRQ6 is available for use by other devices, and some will allow you to select IRQ6, but most will not. Conflicts: conflicts on IRQ6 are uncommon and are usually the result of an incorrectly configured peripheral card, since IRQ6 is almost always used for floppy disks. If you use a tape accelerator card along with an integrated floppy disk controller on your motherboard, watch out for the accelerator trying to take over IRQ6.

IRQ 7. 16-bit priority: 15. Bus line: 8/16-bit. Default use: LPT1. Other common uses: COM3, COM4, modems, sound cards, network cards, tape accelerator cards. Description: normally used for a printer port. These days, of course, many other devices use parallel ports, including external drives. If you are not using a printer or other device then IRQ7 can be used in a similar way to IRQ5: as an alternate for any of the devices that would normally be fighting over IRQ3 or IRQ4. Conflicts: conflicts on IRQ7 are relatively unusual.
If you are using two parallel ports, make sure the second uses IRQ5 or another available IRQ. Some add-in parallel boards try to make LPT2 also use IRQ7, which generally won't work. Otherwise, avoid using IRQ7 for expansion cards.

IRQ 8. 16-bit priority: 3. Bus line: no. Default use: real-time clock. Other common uses: none; for system use only. Description: this is the reserved interrupt for the real-time clock timer. This timer is used by software programs to manage events that must be calibrated to real-world time; this is done by setting "alarms", which trigger this interrupt at a specified time. Conflicts: this is a dedicated interrupt line; there should never be any conflicts. If software indicates a conflict on this IRQ, there is a good possibility of a hardware problem somewhere on your system board.

IRQ 9. 16-bit priority: 4. Bus line: 16-bit only. Default use: none. Other common uses: network cards, sound cards, SCSI host adapters, PCI devices, rerouted IRQ2 devices. Description: on most PCs it can be used freely since it has no default setting. Conflicts: there are a couple of things to watch out for when using this IRQ. First, if you are trying to use IRQ2, you cannot use IRQ9 as well, since devices that try to use IRQ2 really end up using IRQ9 instead. Also, some systems that use PCI cards that require the use of a system IRQ line will grab IRQ9; this can be changed in some cases using the BIOS setup to manually assign IRQs to devices.

IRQ 10. 16-bit priority: 5. Bus line: 16-bit only. Default use: none. Other common uses: network cards, sound cards, SCSI host adapters, secondary IDE channel, quaternary IDE channel, PCI devices. Description: this is usually open and one of the easiest IRQs to use, since it is generally not contested by many devices. While the secondary IDE controller can sometimes be set to use IRQ10, it almost always uses IRQ15 instead. Conflicts: conflicts on IRQ10 are unusual. The only thing to watch out for is a PCI card that needs an interrupt line being assigned IRQ10 by the BIOS; this can be changed in some cases using the BIOS setup parameters that assign IRQs to PCI devices.
IRQ 11 (priority 6; 16-bit only). Default use: none. Other common uses: network cards, sound cards, SCSI host adapters, VGA video cards, tertiary IDE channel, quaternary IDE channel, PCI devices.
This line is usually open and relatively easy to use, since it is generally not contested by many devices. If you are using three IDE channels (the third typically being on a sound card), IRQ11 is typically the one the tertiary controller will try to use. Also, some PCI video cards will try to use IRQ11. Watch out for PCI cards, especially video cards, that grab IRQ11.

IRQ 12 (priority 7; 16-bit only). Default use: PS/2 mouse. Other common uses: network cards, sound cards, SCSI host adapters, VGA video cards, tertiary IDE channel, PCI devices.
On machines that use a PS/2 mouse, this is the IRQ reserved for its use. Using a PS/2 mouse frees up the COM1 serial port and the interrupt it uses (IRQ4) for other devices. Normally this is a good trade, since free IRQs numbered below 8 are harder to find than ones above 8. If a PS/2 mouse is not used, IRQ12 is a good choice for other devices such as network cards. Watch out for PCI cards that can sometimes be assigned this line by the system BIOS. If you are using a PS/2 mouse, make sure no other devices use IRQ12.

IRQ 13 (priority 8; not on the bus). Default use: floating point unit (FPU / NPU / math coprocessor). Other common uses: none; for system use only.
This is the reserved interrupt for the integrated floating point unit (on 80486 or later machines) or the math coprocessor (on 80386 or earlier machines that use one). It is used exclusively for internal signaling and is never available for use by peripherals. This is a dedicated interrupt line; there should never be any conflicts. If software indicates a conflict on this IRQ, there is a good possibility of a hardware problem somewhere on your system board, or possibly with your processor or math coprocessor.
IRQ 14 (priority 9; 16-bit only). Default use: primary IDE channel. Other common uses: SCSI host adapters.
Reserved for use by the primary IDE controller, which provides access to the first two IDE/ATA devices (usually hard disk drives and/or CD-ROM drives). On machines that do not use IDE devices at all, this IRQ can be used for another purpose, such as a SCSI host adapter providing SCSI drives. Problems with IRQ14 are rare, since the universality of its use for IDE means most peripheral vendors avoid offering it as an option. If you are using SCSI and not IDE, and want to use IRQ14, make sure any integrated IDE controllers are disabled first.

IRQ 15 (priority 10; 16-bit only). Default use: secondary IDE channel. Other common uses: network cards, SCSI host adapters.
This IRQ is nowadays reserved for use by the secondary IDE controller, which provides access to the third and fourth IDE/ATA devices (usually hard disk drives and/or CD-ROM drives). If you are not using IDE, or are using only two devices and want to put them on the primary channel to free up this IRQ, that can be done easily as long as you remember to disable the secondary IDE channel. Problems with IRQ15 typically result from assigning a peripheral to use it while forgetting to disable the integrated secondary IDE controller. Most Pentium or later (PCI-based) motherboards have two integrated IDE controllers. Some people incorrectly assume that there will be no conflict if nothing is attached to the secondary channel, but this is not always the case.

Interrupt Service Routines.
Suppose the PC is currently running a software application, say a spreadsheet. When you ask the machine to print a certain spreadsheet, the PC must stop what it is currently doing and deal with the print request. This involves transferring control to another programme to deal with the request. This second programme is called an interrupt service routine.
The purpose of an interrupt system is to utilise the CPU to the full. In order to do this it is important that the interrupt is dealt with as quickly and accurately as possible, transparently to the user.

The interrupt routine must therefore:
1. Remember the state of the current programme and where it left off.
2. Deal with the interrupt.
3. Return to the interrupted programme.

Steps 1 and 2 are dealt with by the hardware; step 3 is dealt with by the software. When the CPU is ready to accept an interrupt it sends an acknowledgement signal. The sequence of events is as follows:
1. An interrupt signal is generated by a device.
2. The CPU completes the execution of the current instruction and acknowledges the signal.
3. The requesting device sends an address location to the CPU via the I/O data bus and switches off its request signal.
4. The CPU stores the current value of the PC (programme counter) in a known memory location, loads the address supplied in step 3 into the PC, and resumes processing. The instruction now being executed is the first instruction of the service routine.

Remember we talked previously about registers in the CPU. Registers hold information currently being processed, or information that will be useful at a later date. Whenever an interrupt is called and the CPU has to deal with it, some mechanism has to exist to store the current values of all the registers. These can then be restored once the interrupt service routine is completed.

Multiple and Nested Interrupts.
There may be a number of interrupt sources, so some means is required to identify the cause of each interrupt. This can be achieved by using multiple interrupt lines, each with its own set of memory locations. However, more than one type of interrupt may be required on one line.
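The four-step acceptance sequence above can be sketched in code. The following is a minimal, hypothetical Python model (the names pc, save_area and the memory layout are illustrative assumptions, not any real processor's interface): the "CPU" saves the programme counter, loads the service-routine address supplied by the device, runs the routine, then restores the PC.

```python
# Minimal sketch of the interrupt-accept sequence described above.
# All names (pc, save_area, isr_address) are illustrative assumptions.

class SimpleCPU:
    def __init__(self):
        self.pc = 0          # programme counter
        self.save_area = []  # stack used to remember the interrupted PC

    def accept_interrupt(self, isr_address, isr):
        """Steps 2-4: acknowledge, save the PC, jump to the routine."""
        self.save_area.append(self.pc)   # step 4a: store the current PC
        self.pc = isr_address            # step 4b: load address sent by the device
        isr()                            # service routine runs
        self.pc = self.save_area.pop()   # ISR step 3: return to interrupted programme

cpu = SimpleCPU()
cpu.pc = 100                             # pretend the main programme is at address 100
log = []
cpu.accept_interrupt(0x2000, lambda: log.append("printer serviced"))
print(cpu.pc)   # → 100, back in the main programme
print(log)      # → ['printer serviced']
```

Note how saving and restoring the PC is what makes the interrupt transparent to the interrupted programme.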
Therefore, a number of different interrupt devices can be attached to the same interrupt line. An interrupt line consists of:
A request line (which transmits the request).
An acknowledgement line (which acknowledges the request signal so that the requesting device can switch its interrupt request off).
Identifying the requesting device can be achieved either by a software technique or by a hardware function.

Software Interrupt Identification.
When an interrupt occurs, as we now know, the hardware transfers control to a service routine. There may be a number of devices attached to the line, so the software has to identify which device is requesting the interrupt. The software checks first one device, then skips on to the next. It checks a bit flag, which is set to one if an interrupt has been requested and set back to zero when the interrupt has been dealt with. When the flag is zero, the software skips on to the next device.

It is also possible for interrupts to interrupt each other; this is called a nested interrupt. A nested interrupt occurs when an interrupt currently being serviced (interrupt A) is temporarily suspended to deal with another interrupt (interrupt B). In order to deal with this, the return address for interrupt A is pushed onto a stack. (A stack is simply a last-in, first-out pile of memory locations.) Each time an interrupt is accepted the PC (programme counter) is pushed onto the stack; the stored addresses are then popped off in reverse order as each routine completes.

Hardware Interrupt Identification.
When using multiple interrupt lines, priorities can be achieved simply by using a priority arbitration circuit with all the peripheral lines attached. Priority can be either fixed or programmed. If there is only one device per interrupt line then we have total priority between the devices.
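The software identification scheme just described - check each device's flag bit in turn, skipping on when it is zero - can be sketched as follows. This is an illustrative Python model, not real driver code; the device list and flag values are assumptions standing in for hardware status registers.

```python
# Skip-chain sketch: poll each device's interrupt flag in turn.
# Devices and their flag bits are simulated assumptions.

def find_interrupting_device(devices):
    """Return the first device whose flag is set to 1, else None."""
    for name, flag in devices:
        if flag == 1:        # flag set: this device requested the interrupt
            return name      # the handler would service it and clear the flag
        # flag is 0: skip on to the next device on the line
    return None

line = [("floppy", 0), ("printer", 1), ("modem", 0)]
print(find_interrupting_device(line))   # → printer
```

The order of the list fixes the priority: a device earlier in the chain is always checked (and serviced) first.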
However, there is usually more than one device on the line, in which case priority can be achieved by daisy chaining the interrupt acknowledgement line between the devices.

This second method of interrupt handling is called daisy chaining. The devices are attached to the same interrupt request line; however, the acknowledgement line, instead of being attached in parallel, is attached first to one device, then the next. This means the device closest to the CPU has the highest priority.

All interrupt lines have an associated bit pattern, and an interrupt will only be recognised if the corresponding bit is set to 1. A programmable register called the interrupt mask register allows this bit pattern to be changed. The mask register is compared with the interrupt request bits; therefore, unless the corresponding bit in the mask register is 1, an interrupt will not be recognised.

Input / Output (I/O) Channels.
We have seen how the CPU operates and how it deals with interrupts. It also has to deal with peripheral devices attached to the PC, such as printers. In earlier PCs the CPU was tied up by I/O and could not perform any processing while communicating with a peripheral device. This meant the CPU could not process data while sending and receiving data, and was dictated to by the speed of the peripheral device. To overcome this problem, I/O channels were developed so that I/O could be handled independently of the CPU. In today's systems the CPU executes an instruction to initiate an I/O transfer over a channel. The transfer is then dealt with by the I/O channel, leaving the CPU free to process other data. When the data transfer is complete, the I/O channel sends an 'I/O complete' interrupt to inform the CPU that it can now transfer more data.
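The interrupt mask register described above is easy to show with bit operations. In this hedged Python sketch the bit positions and register widths are hypothetical; the point is simply that a request is recognised only where both the request bit and the mask bit are 1.

```python
# Interrupt mask sketch: a request is recognised only if the
# corresponding bit in the mask register is also set to 1.

def recognised_interrupts(request_reg, mask_reg, width=8):
    """Return the list of line numbers whose requests pass the mask."""
    passed = request_reg & mask_reg          # bitwise AND of the two registers
    return [i for i in range(width) if passed & (1 << i)]

requests = 0b00101010   # lines 1, 3 and 5 are requesting
mask     = 0b00001010   # only lines 1 and 3 are enabled
print(recognised_interrupts(requests, mask))   # → [1, 3]
```

Line 5's request is ignored because its mask bit is 0; reprogramming the mask register would let it through.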
Memory Buffers.
When data is being transferred from, say, a PC to a printer, the information being sent is stored in a buffer. A buffer is simply an area of memory used to hold the data being transferred temporarily. When sending data to the printer it is held in a buffer contained in the PC, or in the printer, or possibly both. The CPU then instructs the I/O channel to transfer the data from the buffer to the printer. This is why a printer may continue to print a document even though the PC has been taken off line, or indeed switched off entirely. Similarly, when typing data at the keyboard, the information is sent to a buffer by the I/O channel and held there until the Enter key is pressed; the information is then dealt with by the CPU (i.e. a write operation).

Polling and Interrupts – Comparison of Approaches.
As part of your assessment material you will be required to evaluate a real-life scenario (not necessarily computer-oriented) and explain whether it uses a polling or an interrupt-driven approach. The following are examples of assessment-level questions for you to try.

Examples 1 & 2 © SQA – taken from Draft Exemplar 2001.

Example 1. A computer technician is designated as the technical support operative. Users phone in to report faults, and are required to state their location as part of the fault report procedure. As soon as a fault is reported, the technical support operative is required to go to the assistance of the user who reported the fault. Is this an interrupt-driven or polling approach? What, if any, is the disadvantage of this approach?

Example 2. A computer technician is designated as the technical support operative. Every hour the technical support operative is required to visit each computer user in turn to find out if there are any problems to report. Is this an interrupt-driven or polling approach?
What, if any, is the disadvantage of this approach?

Example 3. Sam and Ella run a sandwich shop. The shop is usually quiet during the mornings, so they are free to work in the back preparing sandwiches for the busy lunchtime period. If a customer does come in, a small bell attached to the door rings to alert Sam and Ella that someone requires serving. At lunchtimes, however, the shop is very busy and both Sam and Ella have to serve at the counter. Customers regularly have to stand in a queue to collect and pay for their sandwiches. Which elements of this scenario equate to an interrupt-driven approach, which to a polling approach, and why?

Week 8 - Direct Memory Access.
Programmed I/O channels are fine for slow peripheral devices. However, the data still has to pass through the MBR and MAR, which means the CPU will spend a fair proportion of its time dealing with the transfer of data. For high-speed devices, e.g. laser printers or disk drives, we need to transfer the data directly to the PC's memory, bypassing the CPU so that it can continue processing other data. This is called Direct Memory Access (DMA). It is achieved by moving many of the functions normally performed in software into a hardware DMA controller. This controller needs the following:
• A register for generating the memory address;
• A register for keeping track of the word count;
• A register to be used as a data buffer between the peripheral device and the main memory.
The DMA controller can therefore be connected directly to the peripheral device and to the PC's memory, avoiding the use of the CPU in the transfer of data. To commence an I/O operation utilising the DMA controller, the programme will do the following:
· Load the initial memory address.
· Load the count of the number of words to be transferred.
· Load a control word stating whether to input or output.
· Execute the 'start' command.

When the DMA controller receives the start command it begins transferring the data independently of the CPU. This allows the CPU to process either another part of the same programme or another programme. Note: the DMA controller is still attached to the CPU, because the CPU has to initiate the transfer with the start command. There will also be occasions when the DMA controller is in the process of transferring data to memory and the CPU also wishes to access memory. In these circumstances the DMA controller is usually given priority, as the transfer of data from a fast peripheral device cannot be held up. This is known as cycle stealing, because the memory access cycle is taken from the CPU. Although memory is accessed to fetch each instruction, executing an instruction does not always involve a further memory access; cycle stealing therefore causes less disruption than you might at first expect.

Any peripheral which can use DMA will normally default to a particular channel when it is first installed. The most common configurations are as follows.

DMA 0 (not on the bus). Default use: memory (DRAM) refresh. Other common uses: none; for system use only.
Reserved for use by the internal DRAM refresh circuitry. (Remember that dynamic RAM must be refreshed frequently to make sure that it does not lose its contents.) Most devices stay far away from DMA0, recognising its use by the system. Beware, however: some devices actually offer DMA0 as an option - never, under any circumstances, use DMA0 for peripherals! If you have no devices set to use DMA0 but a conflict becomes apparent anyway, it could be a problem with your motherboard.

DMA 1 (not on the bus). Default use: low DMA channel for sound card. Other common uses: SCSI host adapters, ECP parallel ports, tape cards, network cards, voice modems.
Most sound cards today actually use two DMA channels: one must be chosen from DMAs 1, 2 or 3, while the other can be any free DMA channel (and so is selected from the less-used 5, 6 or 7). DMA1 is also a popular choice for many other peripherals, largely for historical reasons. DMA1 is one of the two most contested channels in the system (the other being DMA3, which is often worse). It is important to watch for conflicts between multiple devices here, particularly if you are using a sound card. It is generally preferable to leave the sound card on DMA1 and move any other devices out of its way, for compatibility with older (poorly written) software that assumes the sound card is on DMA1. Also watch out for ECP parallel port conflicts here.

DMA 2 (8/16-bit). Default use: floppy disk controller. Other common uses: tape accelerators.
Not usually offered as an option by most peripherals (except the occasional tape accelerator card, because many tape drives run off the floppy interface and can even be set to drive floppy disks themselves). DMA2 is not often a source of conflicts, as long as you remember not to put any other devices on it if you have a floppy disk controller in your system (which almost everyone does). Beware tape accelerator cards that default to DMA2 for their channel assignment.

DMA 3 (8/16-bit). Default use: none. Other common uses: ECP parallel ports, SCSI host adapters, tape accelerator cards, sound or network cards, voice modems.
Normally the only channel free on the first controller (DMAs 0 to 3) when you are using a sound card. As a result, it is probably the "busiest" channel in the PC, with many different devices vying for its services. On very old XT systems, DMA channel 3 was used by the hard disk drive. DMA3 is probably the worst channel in the system for conflicts, because so many devices try to use it. It is important to watch for conflicts between multiple devices here, particularly if you are using a sound card or ECP parallel port.

DMA 4 (8/16-bit). Default use: cascade for DMA channels 5 to 7. Other common uses: none; for system use only.
This DMA channel is reserved for cascading the two DMA controllers on systems with a 16-bit ISA bus. It is not available for use by peripherals. There should not be any conflicts on this channel; any problems with it indicate a possible system hardware failure.

DMA 5 (16-bit only). Default use: high DMA channel for sound card. Other common uses: SCSI host adapters, network cards.
Normally taken by the sound card in your PC for its "high" DMA channel. Some network cards also use this channel, though others don't use DMA at all. Few conflicts arise here because relatively few devices can use DMA channels 5, 6 or 7.

DMA 6 (16-bit only). Default use: none. Other common uses: sound cards (high DMA), network cards.
This DMA channel is normally open and available for use by peripherals. It is one of the least used channels in the system and is an alternative location for the "high" sound card DMA channel or other devices. Few conflicts arise here because relatively few devices can use DMA channels 5, 6 or 7.

DMA 7 (16-bit only). Default use: none. Other common uses: sound cards (high DMA), network cards.
Normally open and available for use by peripherals. It is one of the least used channels in the system and is an alternative location for the "high" sound card DMA channel or other devices. Few conflicts arise here because relatively few devices can use DMA channels 5, 6 or 7.

We have seen that slow devices use programmed I/O while high-speed devices use DMA. It would be beneficial, therefore, for every high-speed device to have a DMA controller attached, in order to achieve maximum use of the CPU. This would not be practical, however.
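The DMA initiation sequence described earlier (load the initial memory address, load the word count, load a control word, then issue the start command) can be sketched as a toy model. The register names and the list standing in for main memory are illustrative assumptions, not a real controller interface such as the 8237A's.

```python
# Toy DMA controller sketch: the CPU loads three registers and issues
# 'start'; the transfer then proceeds without further CPU involvement.

class ToyDMAController:
    def __init__(self, memory):
        self.memory = memory     # list standing in for main memory
        self.address_reg = 0     # register: where in memory to start
        self.count_reg = 0       # register: how many words to move
        self.control_reg = ""    # register: "input" (device -> memory) or "output"

    def start(self, device_data=None):
        """Perform the whole transfer, then signal completion."""
        if self.control_reg == "input":
            for i in range(self.count_reg):
                self.memory[self.address_reg + i] = device_data[i]
        return "I/O complete"    # interrupt sent back to the CPU

memory = [0] * 8
dma = ToyDMAController(memory)
dma.address_reg = 2              # step 1: load the initial memory address
dma.count_reg = 3                # step 2: load the word count
dma.control_reg = "input"        # step 3: load the direction control word
print(dma.start([7, 8, 9]))      # step 4: start command; → I/O complete
print(memory)                    # → [0, 0, 7, 8, 9, 0, 0, 0]
```

After the start command the CPU's only remaining involvement is handling the 'I/O complete' interrupt, which is the whole point of DMA.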
To overcome this we can make use of a channel - a small processor which acts as a shared DMA controller for a number of peripheral devices. There are three basic types of channel:

Selector channel. A selector channel may have a number of devices attached to it. However, the channel 'selects' a particular device and will not service any other device until it has finished with the selected one. The channel transfers a block of words to or from main memory, synchronises the speed of transfer and performs parity checking. When the transfer is complete it generates a 'transfer complete' interrupt, or an error signal if the parity check failed.

Byte multiplexer channel. A byte multiplexer channel is used to service slower devices. It can service a number of devices simultaneously, since its rate of data transfer is greater than the rate at which any one device can supply data. The channel polls the individual devices connected to it and transfers the next character for each device as it becomes ready for transfer. The character count and memory data addresses are kept in a fixed memory location: when a device is attached these parameters are fetched from memory, and when the device is disconnected they are placed back. A multiplexer channel can also be attached to a single medium-speed device for a burst transfer (i.e. more than one character at a time); in this mode it acts as a medium-speed selector channel.

Block multiplexer channel. The block multiplexer channel combines the best of both the selector and the multiplexer channels. It can transfer data from high-speed devices like the selector, transfer blocks of data like the multiplexer, and poll devices to transfer blocks of data when requested. The block multiplexer has a distinct advantage over selector channels because it is not dedicated to one device until the transfer of data is complete.
For example, to read from a hard disk the disk has to rotate until the read/write heads are over the required sector from which the data is to be accessed. A block multiplexer channel can service another device until the disk is ready to transfer.

DMA-Enabled versus Non-DMA-Enabled Devices.
As part of your assessment material you will be required to demonstrate, in the form of a graph, the increase in performance of a device which can use a DMA channel over an equivalent device which cannot. You will be given the various data transfer rates of each device and asked to plot these on a rate/time graph, together with a control line showing 100% processor usage. The following exercises are of equivalent difficulty to an assessment-level question.

Exercise 1. You urgently need to access a web page on a foreign server, and on trying to navigate to the page a pop-up box asks you to download and install Georgian text support. The files are 9.2 megabytes in size and will take approximately 5.23 minutes to download and install. Of the total time taken, 1/3 represents the transfer through the modem card buffer, 1/3 the processing of the data through the CPU, and 1/3 the transfer of the file to the hard drive. Assuming that it is possible to apply DMA to the modem and hard drive, estimate the time required to download and install the text support files, where setting up the DMA controller takes 1/10th of the time that the CPU would have taken, measured over 1 second. Draw a graph to show both arrangements. Assume that the given transfer rate represents the processor and peripherals working at 75% capacity. What is the maximum transfer rate where a DMA controller is employed?

Exercise 2. Two identical computers are required to download new anti-virus signature files from a remote server.
Computer A transfers the data to its hard drive via the processor, because someone has inadvertently disabled the DMA controller; Computer B's programmed DMA is still intact. Computer A takes 1/4 second to transfer the data through the modem card buffer, 1/2 second for the CPU to process it, and 1/4 second to transfer it to the hard drive buffer. The data can be downloaded at a maximum speed of just under 14.65 kilobytes/second. Computer B still takes 1/2 second to collect and transfer the file, but setting up its DMA controller takes only 5% of Computer A's processing time. Assume that each computer has a large enough cache that the processor and DMA controller are not competing for resources. Plot a graph showing the difference in transfer rates for the two computers, comparing the performance of the system using the DMA controller with the one that does not. Now assume that the transfer rate of 14.65 kBps represents Computer A working at 90% capacity. Draw in a control line to show this, then use the control to estimate the maximum transfer rate of Computer B (also in kBps). Your answer does not have to be 100% accurate but MUST be a close approximation.

Glossary Of Computing Terms.
You may find it helpful to complete this glossary, writing in your own definition for each term as you learn about it.
Accumulator
Adder
Address
AGP
ALU
ASCII
Assembly code
Binary
BIOS
Bit
Buffer
Bus
Byte
Cache
Capacitor
CISC
CMOS
Compiler
Computer
Control Unit
CPU
Decimal
Device driver
Disk drive
DMA
EPROM
Firmware
Flag
Flip-Flop
Floppy disk
Frequency
Gigabyte
Gigahertz
Handshaking
Hard disk
Hardware
Hexadecimal
High-level language
I/O port
Instruction
Instruction set
Interface
Interpreter
Interrupt
ISA
Kilobyte
Kilohertz
LCD
Linker
Low-level language
Map File
Megabyte
Megahertz
Memory
Memory Map
Microprocessor
Microprogram
Monitor
Nanoprocessor
Object file
Operating system
Overclocking
Parallel port
Parallel transmission
Parity
PCI
Peripheral
Pixel
Polling
POST
Program counter
PROM
Protocol
RAM
Read
Register
RISC
ROM
SCSI
Serial port
Serial transmission
Software
Swap file
Synchronous
Terabyte
Terahertz
Timeslice
Transistor
Virtual memory
Volatile storage
Word
Write
Writeback
Writethrough