Anne Bracy CS 3410 Computer Science Cornell University

The slides are the product of many rounds of teaching CS 3410 by
Professors Weatherspoon, Bala, Bracy, McKee, and Sirer.
How does a processor interact with its environment?
Computer System = Memory + Datapath + Control + Input + Output
[Figure: computer system connected to Network, Keyboard, Display, and Disk]
Behavior
Partner
Data Rate(b/sec)
Keyboard
Input
Human
100
Mouse
Input
Human
3.8k
SoundInput
Input
Machine
3M
VoiceOutput
Output
Human
264k
SoundOutput
Output
Human
8M
Laser Printer
Output
Human
3.2M
GraphicsDisplay
Output
Human
800M– 8G
Network/LAN
Input/Output Machine
100M– 10G
Network/WirelessLAN Input/Output Machine
11 – 54M
OpticalDisk
Storage
Machine
5 – 120M
Flashmemory
Storage
Machine
32– 200M
Magnetic Disk
Storage
Machine
800M – 3G
Replace all devices as the interconnect changes
e.g. keyboard speed == main memory speed?!
[Figure: Core0 and Core1, each with a Cache, on a Unified Memory and I/O Interconnect shared with Memory, Display, Disk, Keyboard, and Network]
Decouple I/O devices from Interconnect
Enable smarter I/O interfaces
[Figure: Core0 and Core1, each with a Cache, on a Unified Memory and I/O Interconnect; a Memory Controller connects Memory, and separate I/O Controllers connect Display, Disk, Keyboard, and Network]
Separate high-performance processor, memory, display interconnect from lower-performance interconnect
[Figure: Core0 and Core1, each with a Cache, on a High Performance Interconnect with the Memory Controller (Memory) and an I/O Controller (Display); a Lower Performance Legacy Interconnect carries the I/O Controllers for Disk, Keyboard, and Network]
Processor–Memory (“Front Side Bus”. Also QPI)
• Short, fast, & wide
• Mostly fixed topology, designed as a “chipset”
  – CPU + Caches + Interconnect + Memory Controller
I/O and Peripheral busses (PCI, SCSI, USB, LPC, …)
• Longer, slower, & narrower
• Flexible topology, multiple/varied connections
• Interoperability standards for devices
• Connect to processor-memory bus through a bridge
Name                 Use       Devices per channel  Channel Width  Data Rate (B/sec)
Firewire 800         External  63                   4              100M
USB 2.0              External  127                  2              60M
USB 3.0              External  127                  2              625M
Parallel ATA         Internal  1                    16             133M
Serial ATA (SATA)    Internal  1                    4              300M
PCI 66MHz            Internal  1                    32-64          533M
PCI Express v2.x     Internal  1                    2-64           16G/dir
Hypertransport v2.x  Internal  1                    2-64           25G/dir
QuickPath (QPI)      Internal  1                    40             12G/dir
Set of methods to write/read data to/from device and control device
Example: Linux Character Devices
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

// Open a toy "echo" character device
int fd = open("/dev/echo", O_RDWR);
// Write to the device (sizeof includes the terminating '\0')
char write_buf[] = "Hello World!";
write(fd, write_buf, sizeof(write_buf));
// Read from the device
char read_buf[32];
read(fd, read_buf, sizeof(read_buf));
// Close the device
close(fd);
// Verify the result
assert(strcmp(write_buf, read_buf) == 0);
Typical I/O Device API
• a set of read-only or read/write registers
Command registers
• writing causes device to do something
Status registers
• reading indicates what device is doing, error codes, …
Data registers
• Write: transfer data to a device
• Read: transfer data from a device
Every device uses this API
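The register pattern above can be sketched in C with an in-memory struct standing in for a device. This is only an illustration: `toy_dev`, the `TOY_*` constants, and the `latch` field are hypothetical names, and the "device" reacts instantly instead of running on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy device with the three register kinds from the slide. */
enum { TOY_STATUS_READY = 0x01, TOY_STATUS_ERROR = 0x80 };
enum { TOY_CMD_RESET = 0x01, TOY_CMD_LOAD = 0x02 };

struct toy_dev {
    uint8_t command;  /* writing causes the device to do something */
    uint8_t status;   /* reading reports what the device is doing */
    uint8_t data;     /* written by software to pass data to the device */
    uint8_t latch;    /* internal device state, not a register */
};

/* Software writes the data register. */
void toy_write_data(struct toy_dev *d, uint8_t v)
{
    assert(d != 0);
    d->data = v;
}

/* Software writes the command register; the simulated device reacts at once. */
void toy_write_command(struct toy_dev *d, uint8_t cmd)
{
    assert(d != 0);
    d->command = cmd;
    switch (cmd) {
    case TOY_CMD_RESET:
        d->latch = 0;
        d->status = TOY_STATUS_READY;
        break;
    case TOY_CMD_LOAD:            /* device captures the data register */
        d->latch = d->data;
        d->status = TOY_STATUS_READY;
        break;
    default:                      /* unknown command: report an error */
        d->status = TOY_STATUS_ERROR;
        break;
    }
}

/* Software reads the status register. */
uint8_t toy_read_status(const struct toy_dev *d) { return d->status; }

/* Software reads data back from the device. */
uint8_t toy_read_data(const struct toy_dev *d) { return d->latch; }
```

A driver would typically write data, write a command, then read status before trusting the result.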
Simple (old) example: AT Keyboard Device
8-bit Status: PE | TO | AUXB | LOCK | AL2 | SYSF | IBS | OBS
  (IBS = Input Buffer Status, OBS = Output Buffer Status)
8-bit Command:
  0xAA = “self test”
  0xAE = “enable kbd”
  0xED = “set LEDs”
  …
8-bit Data:
  scancode (when reading)
  LED state (when writing) or …
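A small sketch of decoding the status byte in C. It assumes the bits are listed most-significant first (PE = bit 7 down to OBS = bit 0), which matches the `status & 1` test used in the polling code later in these slides; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: status bits listed MSB first, so OBS is bit 0. */
enum kbd_status_bits {
    KBD_PE   = 1 << 7,
    KBD_TO   = 1 << 6,
    KBD_AUXB = 1 << 5,
    KBD_LOCK = 1 << 4,
    KBD_AL2  = 1 << 3,
    KBD_SYSF = 1 << 2,
    KBD_IBS  = 1 << 1,  /* input buffer status */
    KBD_OBS  = 1 << 0   /* output buffer status */
};

/* Command bytes named on the slide. */
enum kbd_commands {
    KBD_CMD_SELF_TEST = 0xAA,
    KBD_CMD_ENABLE    = 0xAE,
    KBD_CMD_SET_LEDS  = 0xED
};

/* True when the device has a byte (e.g. a scancode) waiting to be read. */
bool kbd_has_scancode(uint8_t status) { return (status & KBD_OBS) != 0; }

/* True when the device's input buffer is empty, so a write is safe. */
bool kbd_can_accept_write(uint8_t status) { return (status & KBD_IBS) == 0; }

/* Treats PE and TO as error flags (an assumption about their meaning). */
bool kbd_has_error(uint8_t status) { return (status & (KBD_PE | KBD_TO)) != 0; }
```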
Q: How does OS code talk to a device?
A: special instructions to talk over special busses
Programmed I/O
• Interact with cmd, status, and data device registers directly
• inb $a, 0x64    ← kbd status register
  outb $a, 0x60   ← kbd data register
• Specifies: device, data, direction
• Protection: only allowed in kernel mode
Kernel boundary crossing is expensive
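A toy model of the protection rule, assuming a hypothetical `toy_cpu` with its own small port space. Real hardware raises a protection fault when a user-mode program executes a port instruction; here that is reduced to a counter and an error return.

```c
#include <assert.h>
#include <stdint.h>

enum cpu_mode { USER_MODE, KERNEL_MODE };

/* Hypothetical CPU model: the port space is separate from memory. */
struct toy_cpu {
    enum cpu_mode mode;
    uint8_t ports[256];
    unsigned faults;      /* privileged instructions attempted in user mode */
};

/* Models "outb": returns 0 on success, -1 if refused in user mode. */
int toy_outb(struct toy_cpu *c, uint8_t value, uint8_t port)
{
    assert(c != 0);
    if (c->mode != KERNEL_MODE) {
        c->faults++;
        return -1;
    }
    c->ports[port] = value;
    return 0;
}

/* Models "inb": stores the port value in *value; -1 if refused. */
int toy_inb(struct toy_cpu *c, uint8_t port, uint8_t *value)
{
    assert(c != 0 && value != 0);
    if (c->mode != KERNEL_MODE) {
        c->faults++;
        return -1;
    }
    *value = c->ports[port];
    return 0;
}
```

Each call names the device (port), the data, and the direction (in or out), which is all a programmed I/O instruction specifies.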
Q: How does OS code talk to a device?
A: Map registers into virtual address space
Memory-mapped I/O
• Accesses to certain addresses redirected to I/O devices
• Data goes over the memory bus
• Protection: via bits in page table entries
• OS + MMU + devices configure mappings
Faster. Less boundary crossing
[Figure: Virtual Address Space (0x00000000 to 0xFFFFFFFF) with Memory-Mapped I/O regions mapped into the Physical Address Space (0x00000000 to 0x00FFFFFF), where they reach the I/O Controllers for Display, Disk, Keyboard, and Network]
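One way to picture "accesses to certain addresses redirected to I/O devices" is a toy bus that decodes each address. The address `KBD_BASE`, the register offsets, and the clear-on-read behavior are assumptions made for this sketch, not the layout of any real machine.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_SIZE   0x1000u
#define KBD_BASE   0xF000u            /* hypothetical device address */
#define KBD_STATUS (KBD_BASE + 0u)
#define KBD_DATA   (KBD_BASE + 4u)

struct toy_bus {
    uint8_t ram[RAM_SIZE];
    uint8_t kbd_status;               /* bit 0: data ready */
    uint8_t kbd_data;
};

/* An ordinary load: the address alone decides whether RAM or a device answers. */
uint8_t bus_load(struct toy_bus *b, uint32_t addr)
{
    assert(b != 0);
    if (addr < RAM_SIZE)
        return b->ram[addr];
    if (addr == KBD_STATUS)
        return b->kbd_status;
    if (addr == KBD_DATA) {
        b->kbd_status &= (uint8_t)~1u;  /* assumed: reading data clears "ready" */
        return b->kbd_data;
    }
    return 0xFF;                        /* unmapped address */
}

/* An ordinary store, routed the same way. */
void bus_store(struct toy_bus *b, uint32_t addr, uint8_t v)
{
    assert(b != 0);
    if (addr < RAM_SIZE)
        b->ram[addr] = v;
    else if (addr == KBD_DATA)
        b->kbd_data = v;
    /* stores to any other address are ignored in this sketch */
}
```

Note that a device load can have side effects, which is why real drivers access such registers through `volatile` pointers.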
Less-favored alternative = Programmed I/O:
• Syscall instructions that communicate with I/O
• Communicate via special device registers
Both are polling examples, but mmap I/O is more efficient

Programmed I/O
char read_kbd()
{
    char status;
    do {
        sleep();
        status = inb(0x64);     // syscall
    } while (!(status & 1));
    return inb(0x60);           // syscall
}

Memory-Mapped I/O
struct kbd {
    char status, pad0[3];
    char data, pad1[3];
};
struct kbd *k = mmap(...);

char read_kbd()
{
    char status;
    do {
        sleep();
        status = k->status;     // NO syscall
    } while (!(status & 1));
    return k->data;
}
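A bounded version of the memory-mapped polling loop, sketched against a simulated keyboard so it can run anywhere. The `fake_kbd` struct and `fake_kbd_tick` are hypothetical stand-ins for the real device and for `sleep()`; the registers are marked `volatile` so the compiler rereads them on every poll.

```c
#include <assert.h>
#include <stdint.h>

/* Register layout as in the slide; volatile forces a real load on each read. */
struct kbd_regs {
    volatile uint8_t status; uint8_t pad0[3];
    volatile uint8_t data;   uint8_t pad1[3];
};

/* Hypothetical simulated device: a key arrives after a number of ticks. */
struct fake_kbd {
    struct kbd_regs regs;
    unsigned ticks_until_key;
    uint8_t next_scancode;
};

/* Advance simulated time by one tick; may make a scancode available. */
void fake_kbd_tick(struct fake_kbd *f)
{
    if (f->ticks_until_key > 0 && --f->ticks_until_key == 0) {
        f->regs.data = f->next_scancode;
        f->regs.status |= 1;
    }
}

/* Poll up to max_polls times. Returns the scancode, or -1 on timeout. */
int read_kbd_polled(struct fake_kbd *f, unsigned max_polls, unsigned *polls_used)
{
    assert(f != 0 && polls_used != 0);
    for (unsigned i = 1; i <= max_polls; i++) {
        fake_kbd_tick(f);                    /* stands in for sleep() */
        if (f->regs.status & 1) {
            *polls_used = i;
            f->regs.status &= (uint8_t)~1u;  /* assumed: read consumes the key */
            return f->regs.data;
        }
    }
    *polls_used = max_polls;
    return -1;
}
```

The timeout is a design choice added here; the loop on the slide spins until the device answers.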
How to talk to device?
• Programmed I/O or Memory-Mapped I/O
How to get events?
• Polling or Interrupts
How to transfer lots of data?
disk->cmd = READ_4K_SECTOR;
disk->data = 12;
while (!(disk->status & 1)) { }
for (i = 0; i < 4096; i++)
    buf[i] = disk->data;        // Very, very expensive
1. Programmed I/O: Device ↔ CPU ↔ RAM
for (i = 1..n)
• CPU issues read request
• Device puts data on bus & CPU reads into registers
• CPU writes data to memory
[Figure: CPU, RAM, and DISK on a shared bus; data flows DISK to CPU to RAM]
2. Direct Memory Access (DMA): Device ↔ RAM
• CPU sets up DMA request
• for (i = 1...n)
    Device puts data on bus & RAM accepts it
• Device interrupts CPU after done
[Figure: CPU, RAM, and DISK on a shared bus; data flows DISK to RAM directly]
Which one is the winner? Which one is the loser?
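To make the comparison concrete, the sketch below counts how many bus operations the CPU itself performs under each scheme. It is a simplified model: `xfer_stats`, `pio_read`, and `dma_read` are hypothetical names, `memcpy` stands in for the device writing RAM on its own, and the three setup writes are an assumption (base address, count, command).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct xfer_stats {
    size_t cpu_bus_ops;   /* loads and stores issued by the CPU itself */
    size_t interrupts;    /* completion interrupts delivered to the CPU */
};

/* Programmed I/O: every byte passes through a CPU register. */
struct xfer_stats pio_read(const uint8_t *device, uint8_t *ram, size_t n)
{
    struct xfer_stats s = {0, 0};
    assert(device != 0 && ram != 0);
    for (size_t i = 0; i < n; i++) {
        uint8_t reg = device[i];   /* CPU reads device data into a register */
        s.cpu_bus_ops++;
        ram[i] = reg;              /* CPU writes the register to memory */
        s.cpu_bus_ops++;
    }
    return s;
}

/* DMA: the CPU only sets up the request; the device moves the data. */
struct xfer_stats dma_read(const uint8_t *device, uint8_t *ram, size_t n)
{
    struct xfer_stats s = {0, 0};
    assert(device != 0 && ram != 0);
    s.cpu_bus_ops = 3;             /* assumed: base address, count, command */
    memcpy(ram, device, n);        /* stands in for device-to-RAM transfers */
    s.interrupts = 1;              /* device interrupts the CPU when done */
    return s;
}
```

For a 4096-byte transfer the model charges programmed I/O 8192 CPU bus operations and DMA 3 plus one interrupt, so DMA's fixed cost pays off as transfers grow.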
DMA example: reading from audio (mic) input
• DMA engine on audio device… or I/O controller… or …
int dma_size = 4*PAGE_SIZE;
int *buf = alloc_dma(dma_size);
...
dev->mic_dma_baseaddr = (int)buf;   // buf is a virtual address
dev->mic_dma_count = dma_size;
dev->cmd = DEV_MIC_INPUT |
    DEV_INTERRUPT_ENABLE | DEV_DMA_ENABLE;
DMA example: reading from audio (mic) input
• DMA engine on audio device… or I/O controller… or …
int dma_size = 4*PAGE_SIZE;
void *buf = alloc_dma(dma_size);
...
dev->mic_dma_baseaddr = virt_to_phys(buf);   // device needs the physical address
dev->mic_dma_count = dma_size;
dev->cmd = DEV_MIC_INPUT |
    DEV_INTERRUPT_ENABLE | DEV_DMA_ENABLE;
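Why the translation matters can be sketched with a toy single-level page table. Everything here is hypothetical (`toy_page_table`, its contents, and both helpers); it only illustrates that a virtual address and its physical address differ, and that pages adjacent in virtual memory need not be adjacent in physical memory. Presumably `alloc_dma` on the slide returns a buffer that is physically contiguous, but that is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NUM_PAGES 8u

/* Hypothetical mapping: virtual page number -> physical frame number. */
static const uint32_t toy_page_table[TOY_NUM_PAGES] = { 5, 2, 7, 0, 1, 6, 3, 4 };

/* Translate a virtual address by swapping the page number for its frame. */
uint32_t toy_virt_to_phys(uint32_t vaddr)
{
    uint32_t vpn = vaddr / TOY_PAGE_SIZE;
    uint32_t offset = vaddr % TOY_PAGE_SIZE;
    assert(vpn < TOY_NUM_PAGES);
    return toy_page_table[vpn] * TOY_PAGE_SIZE + offset;
}

/* Returns 1 if the n-byte buffer at vaddr occupies consecutive frames. */
int toy_is_phys_contiguous(uint32_t vaddr, uint32_t n)
{
    assert(n > 0);
    uint32_t first = vaddr / TOY_PAGE_SIZE;
    uint32_t last = (vaddr + n - 1) / TOY_PAGE_SIZE;
    assert(last < TOY_NUM_PAGES);
    for (uint32_t vpn = first; vpn < last; vpn++) {
        if (toy_page_table[vpn + 1] != toy_page_table[vpn] + 1)
            return 0;
    }
    return 1;
}
```

A device that walks physical addresses from a base and a count would scribble over unrelated frames if handed a buffer for which the second helper returns 0.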
Programmed I/O
• Requires special instructions
• Can require dedicated hardware interface to devices
• Protection enforced via kernel mode access to instructions
• Virtualization can be difficult
Memory-Mapped I/O
• Re-uses standard load/store instructions
• Re-uses standard memory hardware interface
• Protection enforced with normal memory protection scheme
• Virtualization enabled with normal memory virtualization scheme
How does program learn device is ready/done?
1. Polling: Periodically check I/O status register
  • Common in small, cheap, or real-time embedded systems
  + Predictable timing, inexpensive
  – Wastes CPU cycles
2. Interrupts: Device sends interrupt to CPU
  • Cause register identifies the interrupting device
  • Interrupt handler examines device, decides what to do
  + Only interrupt when device ready/done
  – Forced to save CPU context (PC, SP, registers, etc.)
  – Unpredictable, event arrival depends on other devices’ activity
Which one is the winner? Which one is the loser?
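The interrupt style can be approximated in user space with a signal handler, assuming a POSIX system where `SIGUSR1` is available. This is an analogy, not kernel code: `raise` plays the role of the device asserting its interrupt line, and all function names are hypothetical.

```c
#include <assert.h>
#include <signal.h>

static volatile sig_atomic_t device_ready = 0;
static volatile sig_atomic_t interrupts_taken = 0;

/* "Interrupt handler": records the event and returns quickly. */
static void device_interrupt_handler(int signo)
{
    signal(signo, device_interrupt_handler);  /* re-arm for portability */
    device_ready = 1;
    interrupts_taken++;
}

/* Install the handler. Returns 0 on success, -1 on failure. */
int install_device_handler(void)
{
    return signal(SIGUSR1, device_interrupt_handler) == SIG_ERR ? -1 : 0;
}

/* Stand-in for the device raising its interrupt line. */
void simulate_device_interrupt(void)
{
    raise(SIGUSR1);
}

/* Main-loop side: returns 1 if an event was pending and is now consumed. */
int consume_device_event(void)
{
    if (!device_ready)
        return 0;
    device_ready = 0;
    return 1;
}

/* Number of interrupts delivered so far. */
int interrupts_seen(void)
{
    return (int)interrupts_taken;
}
```

Between interrupts the program does no status checks at all, which is the work that polling would have wasted.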
Diverse I/O devices require hierarchical interconnect, which is more recently transitioning to point-to-point topologies.
Memory-mapped I/O is an elegant technique to read/write device registers with standard load/stores.
Interrupt-based I/O avoids the wasted work in polling-based I/O and is usually more efficient.
Modern systems combine memory-mapped I/O, interrupt-based I/O, and direct-memory access to create sophisticated I/O device subsystems.