
Parallelism and Concurrency
COS 326
David Walker
Princeton University
slides copyright 2013-2015 David Walker and Andrew W. Appel
permission granted to reuse these slides for non-commercial educational purposes
Parallelism
•  What is it?
•  Today's technology trends.
•  Then:
   –  Why is it so much harder to program?
      •  (Is it actually so much harder to program?)
   –  Some preliminary linguistic constructs
      •  thread creation
      •  our first parallel functional abstraction: futures
PARALLELISM: WHAT IS IT?
Parallelism
•  What is it?
   –  doing many things at the same time instead of sequentially (one-after-the-other).
Flavors of Parallelism
Data Parallelism
   –  the same computation being performed on a collection of independent items
   –  e.g., adding two vectors of numbers
Task Parallelism
   –  different computations/programs running at the same time
   –  e.g., running a web server and a database
Pipeline Parallelism
   –  assembly line: a sequential stage f (map f over all items) feeds a sequential stage g (map g over all items)
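For reference, here is that pipeline structure in plain (sequential) OCaml; f, g, and items are invented for illustration, and a parallel implementation would run each map over many items at once:

   let f x = x * x                              (* first stage *)
   let g x = x + 1                              (* second stage *)
   let items = [1; 2; 3; 4]
   let result = List.map g (List.map f items)   (* = [2; 5; 10; 17] *)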
Parallelism vs. Concurrency
Parallelism: perform many tasks simultaneously
•  purpose: improve throughput
•  mechanism:
   –  many independent computing devices
   –  decrease run time of program by utilizing multiple cores or computers
•  e.g.: running your web crawler on a cluster versus one machine.
Concurrency: mediate multi-party access to shared resources
•  purpose: decrease response time
•  mechanism:
   –  switch between different threads of control
   –  work on one thread when it can make useful progress; when it can't, suspend it and work on another thread
•  e.g.: running your clock, editor, chat at the same time on a single CPU.
   –  OS gives each of these programs a small time-slice (~10 msec)
   –  often slows throughput due to cost of switching contexts
•  e.g.: don't block while waiting for an I/O device to respond, but let another thread do useful CPU computation
Parallelism vs. Concurrency
Parallelism: perform several independent tasks simultaneously
Concurrency: mediate/multiplex access to a shared resource
[diagram: for parallelism, each job runs on its own CPU; for concurrency, many jobs are multiplexed onto one shared resource (CPU, disk, server, data structure)]
many efficient programs use some parallelism and some concurrency
UNDERSTANDING TECHNOLOGY TRENDS
Moore's Law
•  Moore's Law: The number of transistors you can put on a computer chip doubles (approximately) every couple of years.
•  Consequence for most of the history of computing: All programs double in speed every couple of years.
   –  Why? Hardware designers are wicked smart.
   –  They have been able to use those extra transistors to (for example) double the number of instructions executed per time unit, thereby doubling the processing speed of programs.
•  Consequence for application writers:
   –  watch TV for a while and your programs optimize themselves!
   –  perhaps more importantly: new applications thought impossible became possible because of increased computational power
CPU Clock Speeds from 1993-2005
[chart: clock speeds climbing steadily ("Next year's machine is twice as fast!") and then flattening abruptly around 2004-2005 ("Oops!")]
CPU Power 1993-2005
[chart: CPU power consumption climbing over the same period]
But power consumption is only part of the problem… cooling is the other!
The Heat Problem
[photos: a 1993 Pentium heat sink next to a 2005 cooler that dwarfs it]
Cray-4: 1994
•  Up to 64 processors
•  Running at 1 GHz
•  8 Megabytes of RAM
•  Cost: roughly $10M
The CRAY 2, 3, and 4 CPU and memory boards were immersed in a bath of electrically inert cooling fluid.
[photo: water cooled!]
Power Dissipation
[chart: power delivered to the chip, peaking in the mid-2000s]
Darn! Intel engineers no longer optimize my programs while I watch TV!
But look: Moore's Law still holds, so far, for transistors-per-chip. What do we do with all those transistors?
1.  Multicore!
2.  System-on-chip with specialized coprocessors (such as GPUs)
Both of those are PARALLELISM
Parallelism
Why is it particularly important (today)?
   –  Roughly every other year, a chip from Intel would:
      •  halve the feature size (size of transistors, wires, etc.)
      •  double the number of transistors
      •  double the clock speed
      •  this drove the economic engine of the IT industry (and the US!)
   –  No longer able to double the clock or cut the voltage: a processor won't get any faster!
      •  (so why should you buy a new laptop, desktop, etc.?)
      •  power and heat are limitations on the clock
      •  errors, variability (noise) are limitations on the voltage
      •  but we can still pack a lot of transistors on a chip… (at least for another 10 to 15 years.)
Multi-core h/w – common L2
[diagram: two cores, each with its own ALUs and private L1 cache, sharing an L2 cache and main memory]
Today… (actually 9 years ago!)
GPUs
•  There's nothing like video gaming to drive progress in computation!
•  GPUs can have hundreds or even thousands of cores
•  Three of the 5 most powerful supercomputers in the world take advantage of GPU acceleration.
•  Scientists use GPUs for simulation and modelling
   –  e.g.: protein folding and fluid dynamics
[photo] John Danskin, PhD Princeton 1994, Vice President for GPU architecture, Nvidia
(what he does with his spare time… built this car himself)
So…
Instead of trying to make your CPU go faster, Intel's just going to pack more CPUs onto a chip.
   –  a few years ago: dual core (2 CPUs).
   –  a little more recently: 4, 6, 8 cores.
   –  Intel is testing 48-core chips with researchers now.
   –  Within 10 years, you'll have ~1024 Intel CPUs on a chip.
In fact, that's already happening with graphics chips (e.g., Nvidia).
   –  really good at simple data parallelism (many deep pipes)
   –  but they are much dumber than an Intel core.
   –  and right now, they chew up a lot of power.
   –  watch for GPUs to get "smarter" and more power efficient, while CPUs become more like GPUs.
STILL MORE PROCESSORS: THE DATA CENTER
Data Centers: Generation Z Supercomputers
Data Centers: Lots of Connected Computers!
Data Centers
•  10s or 100s of thousands of computers
•  All connected together
•  Motivated by new applications and scalable web services:
   –  let's catalogue all N billion web pages in the world
   –  let's allow anyone in the world to search for the page he or she needs
   –  let's process that search in less than a second
•  It's Amazing!
•  It's Magic!
Data Centers: Lots of Connected Computers
Computer containers for plug-and-play parallelism:
•  Somyoldprogramswillrun2x,4x,48x,256x,1024xfaster?
33
SoundsGreat!
•  Somyoldprogramswillrun2x,4x,48x,256x,1024xfaster?
–  noway!
34
SoundsGreat!
•  Somyoldprogramswillrun2x,4x,48x,256x,1024xfaster?
–  noway!
–  toupgradefromIntel386to486,theappwriterandcompiler
writerdidnothavetodoanything(much)
•  IA486interpretedthesamesequenGalstreamofinstrucGons;it
justdiditfaster
•  thisiswhywecouldwatchTVwhileIntelengineersopGmizedour
programsforus
–  toupgradefromIntel486todualcore,weneedtofigureout
howtosplitasinglestreamofinstrucGonsintotwostreamsof
instrucGonsthatcollaboratetocompletethesametask.
•  withoutwork&thought,ourprogramsdon'tgetanyfasteratall
•  ittakesingenuitytogenerateefficientparallelalgorithmsfrom
sequen%alones
35
What’stheanswer?
InPart:FuncGonalProgramming!
Naiad
Pig
Dryad
PARALLELANDCONCURRENT
PROGRAMMING
Multicore Hardware & Data Centers
[diagram: two cores, each with its own ALUs and private L1 cache, sharing an L2 cache and main memory]
Speedup
•  Speedup: the ratio of sequential program execution time to parallel execution time.
•  If T(p) is the time it takes to run a computation on p processors:
      speedup(p) = T(1) / T(p)
•  A parallel program has perfect speedup (aka linear speedup) if
      T(1) / T(p) = speedup(p) = p
•  Bad news: Not every program can be effectively parallelized.
   –  in fact, very few programs will scale with perfect speedups.
   –  we certainly can't achieve perfect speedups automatically
   –  limited by sequential portions, data transfer costs, ...
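A hypothetical worked example (numbers invented for illustration): if a job takes T(1) = 60 seconds sequentially and T(4) = 20 seconds on four processors, then speedup(4) = 60 / 20 = 3, short of the perfect speedup of 4; the gap typically comes from the sequential portions and data transfer costs just mentioned.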
Most Troubling…
Most, but not all, parallel and concurrent programming models are far harder to work with than sequential ones:
•  They introduce nondeterminism
   –  the root of (almost all) evil
   –  program parts suddenly have many different outcomes
      •  they have different outcomes on different runs
      •  debugging requires considering all of the possible outcomes
      •  horrible heisenbugs are hard to track down
•  They are nonmodular
   –  module A implicitly influences the outcomes of module B
•  They introduce new classes of errors
   –  race conditions, deadlocks
•  They introduce new performance/scalability problems
   –  busy-waiting, sequentialization, contention, ...
Informal Error Rate Chart
[chart: the regularity with which you shoot yourself in the foot, increasing from "heaven on earth", through manual memory management; null pointers, paucity of types, inheritance; kitchen sink + manual memory; up to unstructured parallel or concurrent programming]
Solid Parallel Programming Requires
1.  Good sequential programming skills.
   –  all the things we've been talking about: use modules, types, ...
2.  Deep knowledge of the application.
3.  Picking a correct-by-construction parallel programming model
   –  whenever possible, a parallel model with semantics that coincide with sequential semantics
      •  whenever possible, reuse well-tested libraries that hide parallelism
   –  whenever possible, a model that cuts down non-determinism
   –  whenever possible, a model with fewer possible concurrency bugs
   –  if bugs can arise, know and use safe programming patterns
4.  Careful engineering to ensure scaling.
   –  unfortunately, there is sometimes a tradeoff:
      •  reduced nondeterminism can lead to reduced resource utilization
   –  synchronization, communication costs may need optimization
OUR FIRST PARALLEL PROGRAMMING MODEL: THREADS
Threads: A Warning
•  Concurrent threads with locks: the classic shoot-yourself-in-the-foot concurrent programming model
   –  all the classic error modes
•  Why threads?
   –  almost all programming languages will have a threads library
      •  OCaml in particular!
   –  you need to know where the pitfalls are
   –  they are the assembly language of concurrent programming paradigms
      •  we'll use threads to build several higher-level programming models
Threads
•  Threads: an abstraction of a processor.
   –  the programmer (or compiler) decides that some work can be done in parallel with some other work, e.g.:

      let _ = compute_big_thing() in
      let y = compute_other_big_thing() in
      ...

   –  we fork a thread to run the computation in parallel, e.g.:

      let t = Thread.create compute_big_thing () in
      let y = compute_other_big_thing () in
      ...
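A minimal compilable sketch of this fork pattern (the two computations are invented placeholders; build against OCaml's threads library, e.g. ocamlfind ocamlopt -package threads.posix -linkpkg):

   (* stand-ins for expensive work *)
   let compute_big_thing () =
     let sum = ref 0 in
     for i = 1 to 10_000_000 do sum := !sum + i done;
     !sum

   let compute_other_big_thing () =
     String.length (String.concat "" (List.init 1_000 (fun _ -> "x")))

   let () =
     (* fork one computation; keep running the other on the current thread *)
     let t = Thread.create (fun () -> ignore (compute_big_thing ())) () in
     let y = compute_other_big_thing () in
     Thread.join t;   (* wait for the forked thread before exiting *)
     Printf.printf "y = %d\n" y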
Intuition in Pictures
   let t = Thread.create f () in
   let y = g () in
   ...

            processor 1          processor 2
   time 1   Thread.create        (* doing nothing *)
   time 2   execute g ()         execute f ()
   time 3   ...                  ...
Of Course…
Suppose you have 2 available cores and you fork 4 threads. In a typical multi-threaded system,
   –  the operating system provides the illusion that there are an infinite number of processors.
      •  not really: each thread consumes space, so if you fork too many threads the process will die.
   –  it time-multiplexes the threads across the available processors.
      •  about every 10 msec, it stops the current thread on a processor, and switches to another thread.
      •  so a thread is really a virtual processor.
OCaml, Concurrency and Parallelism
Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It multiplexes all threads over a single core:
[diagram: many threads multiplexed onto one core]
Hence, OCaml provides concurrency, but not parallelism. Why? Because OCaml (like Python) has no parallel "runtime system" or garbage collector. Other functional languages (Haskell, F#, ...) do.
Fortunately, when thinking about program correctness, it doesn't matter that OCaml is not parallel -- I will often pretend that it is.
You can hide I/O latency, do multiprocess programming, or distribute tasks amongst multiple computers in OCaml.
Coordination
   Thread.create : ('a -> 'b) -> 'a -> Thread.t

   let t = Thread.create f () in
   let y = g () in
   ...
How do we get back the result that t is computing?
First Attempt
   let r = ref None
   let t = Thread.create (fun _ -> r := Some (f ())) () in
   let y = g () in
   match !r with
   | Some v -> (* compute with v and y *)
   | None -> ???
What's wrong with this?
Second Attempt
   let r = ref None
   let t = Thread.create (fun _ -> r := Some (f ())) () in
   let y = g () in
   let rec wait () =
     match !r with
     | Some v -> v
     | None -> wait ()
   in
   let v = wait () in
   (* compute with v and y *)
Two Problems
   let r = ref None
   let t = Thread.create (fun _ -> r := Some (f ())) () in
   let y = g () in
   let rec wait () =
     match !r with
     | Some v -> v
     | None -> wait ()
   in
   let v = wait () in
   (* compute with v and y *)

First, we are busy-waiting (one mitigation is sketched below).
•  consuming cpu without doing something useful.
•  the processor could be either running a useful thread/program or powered down.

Second, an operation like r := Some v may not be atomic.
•  r := Some v requires us to copy the bytes of Some v into the ref r
•  we might see part of the bytes (corresponding to Some) before we've written in the other parts (e.g., v).
•  So the waiter might see the wrong value.
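One standard mitigation for the busy-waiting problem (a sketch only; it does nothing about the atomicity problem, and join, coming below, is the better answer): have the loop call Thread.yield, which asks the scheduler to run another thread:

   let rec wait () =
     match !r with
     | Some v -> v
     | None -> Thread.yield (); wait ()   (* give other threads a chance to run *)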
Atomicity
Consider the following:
   let inc (r : int ref) = r := (!r) + 1
and suppose two threads are incrementing the same ref r:

   Thread 1        Thread 2
   inc(r);         inc(r);
   !r              !r

If r initially holds 0, then what will Thread 1 see when it reads r?
Atomicity
The problem is that we can't see exactly what instructions the compiler might produce to execute the code. It might look like this:

   Thread 1                 Thread 2
   EAX := load(r);          EAX := load(r);
   EAX := EAX + 1;          EAX := EAX + 1;
   store EAX into r         store EAX into r
   EAX := load(r)           EAX := load(r)
Atomicity
But a clever compiler might optimize this to:

   Thread 1                 Thread 2
   EAX := load(r);          EAX := load(r);
   EAX := EAX + 1;          EAX := EAX + 1;
   store EAX into r         store EAX into r
   EAX := load(r)           EAX := load(r)

[the optimization is marked visually on the original slide: e.g., the final reload of r can be dropped, since EAX already holds the stored value]
Atomicity
Furthermore, we don't know when the OS might interrupt one thread and run the other.

   Thread 1                 Thread 2
   EAX := load(r);          EAX := load(r);
   EAX := EAX + 1;          EAX := EAX + 1;
   store EAX into r         store EAX into r
   EAX := load(r)           EAX := load(r)

(The situation is similar, but not quite the same, on multiprocessor systems.)
The Happens-Before Relation
We don't know exactly when each instruction will execute, but there are some constraints: the happens-before relation.
Rule 1: Given two expressions (or instructions) in sequence, e1; e2, we know that e1 happens before e2.
Rule 2: Given a program:
   let t = Thread.create f x in
   ...
   Thread.join t;
   e
we know that (f x) happens before e.
Atomicity
Different interleavings of the instructions give different answers:

   Thread 1                 Thread 2
   EAX := load(r);          EAX := load(r);
   EAX := EAX + 1;          EAX := EAX + 1;
   store EAX into r         store EAX into r
   EAX := load(r)           EAX := load(r)

[the original slides step through two interleavings with arrows: e.g., if both increments complete before Thread 1's final load, Thread 1 reads 2; if both loads happen before either store, one increment is lost and Thread 1 reads 1]

Moral: The system is responsible for scheduling execution of instructions.
Moral: This can lead to an enormous degree of nondeterminism.
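A sketch that makes this nondeterminism observable (the counts are invented; the Thread.yield between the read and the write invites a context switch at the worst possible moment, so even on OCaml's single-core threads the lost updates actually show up):

   let () =
     let r = ref 0 in
     let inc_many () =
       for _ = 1 to 100_000 do
         let v = !r in        (* read *)
         Thread.yield ();     (* invite a context switch mid-update *)
         r := v + 1           (* write: may lose the other thread's updates *)
       done
     in
     let t1 = Thread.create inc_many () in
     let t2 = Thread.create inc_many () in
     Thread.join t1; Thread.join t2;
     Printf.printf "final = %d (200000 only if no updates were lost)\n" !r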
Atomicity
In fact, today's multicore processors don't treat memory in a sequentially consistent fashion. That means that we can't even assume that what we will see corresponds to some interleaving of the threads' instructions!
[diagram: four cores, each with its own ALU and private L1 cache, sharing an L2 cache. When Core 1 stores to "memory", the store lazily propagates to Core 2's L1 cache; the load at Core 2 might not see it, unless there is an explicit synchronization.]
Beyond the scope of this class! But the take-away is this: It's not a good idea to use ordinary loads/stores to synchronize threads; you should use explicit synchronization primitives so the hardware and optimizing compiler don't optimize them away.
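In OCaml specifically, versions newer than the one this course used (4.12 and later) expose such primitives directly as the Atomic module; a sketch of the increment written with it:

   let r = Atomic.make 0
   let inc () = Atomic.incr r      (* an atomic read-modify-write *)
   let current () = Atomic.get r   (* an atomic read *)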
Summary: Interleaving & Race Conditions
Calculate possible outcomes for a program by considering all of the possible interleavings of the atomic actions performed by each thread.
   –  Subject to the happens-before relation.
      •  can't have a child thread's actions happening before a parent forks it.
      •  can't have later instructions execute earlier in the same thread.
   –  Here, atomic means indivisible actions.
      •  For example, on most machines reading or writing a 32-bit word is atomic.
      •  But, writing a multi-word object is usually not atomic.
      •  Most operations like "b := b - w" are implemented in terms of a series of simpler operations such as
         –  r1 = read(b); r2 = read(w); r3 = r1 - r2; write(b, r3)
Reasoning about all interleavings is hard. Just about impossible for people.
   –  Number of interleavings grows exponentially with number of statements.
   –  It's hard for us to tell what is and isn't atomic in a high-level language.
   –  YOU ARE DOOMED TO FAIL IF YOU HAVE TO WORRY ABOUT THIS STUFF!

WARNING
If you see people talk about interleavings, BEWARE! It probably means they're assuming "sequential consistency," which is an oversimplified, naïve model of what the parallel computer really does. It's actually more complicated than that.
A conventional solution for shared-memory parallelism
   let inc (r : int ref) = r := (!r) + 1

   Thread 1             Thread 2
   lock(mutex);         lock(mutex);
   inc(r);              inc(r);
   !r                   !r
   unlock(mutex);       unlock(mutex);

The lock/unlock pairs are the synchronization. This guarantees mutual exclusion of these critical sections.
This solution works (even for real machines that are not sequentially consistent), but…
Complex to program, subject to deadlock, prone to bugs, not fault-tolerant, hard to reason about.
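In OCaml's threads library this pattern is written with the Mutex module (Mutex.create, Mutex.lock, Mutex.unlock); a sketch, with r and m as invented names:

   let m = Mutex.create ()
   let r = ref 0

   let inc_and_read () =
     Mutex.lock m;       (* enter the critical section *)
     r := !r + 1;
     let v = !r in
     Mutex.unlock m;     (* leave the critical section *)
     v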
Another approach to the coordination problem
   Thread.create : ('a -> 'b) -> 'a -> Thread.t

   let t = Thread.create f () in
   let y = g () in
   ...
How do we get back the result that t is computing?
One Solution (using join)
   let r = ref None
   let t = Thread.create (fun _ -> r := Some (f ())) () in
   let y = g () in
   Thread.join t;
   match !r with
   | Some v -> (* compute with v and y *)
   | None -> failwith "impossible"

Thread.join t causes the current thread to wait until the thread t terminates. The join is the synchronization: after the join, we know that all of the operations of t have completed.
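This join pattern generalizes into a reusable helper, a first step toward the futures promised at the start of the lecture; a sketch (the name both is invented):

   (* run f in a forked thread while g runs on the current thread;
      join guarantees f's write to r happens before the final read *)
   let both (f : unit -> 'a) (g : unit -> 'b) : 'a * 'b =
     let r = ref None in
     let t = Thread.create (fun () -> r := Some (f ())) () in
     let y = g () in
     Thread.join t;
     match !r with
     | Some v -> (v, y)
     | None -> failwith "impossible"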
In Pictures
   Thread 1             Thread 2
   t = create f x
   inst1,1;             inst2,1;
   inst1,2;             inst2,2;
   inst1,3;             inst2,3;
   inst1,4;             ...
   ...                  inst2,m;
   inst1,n-1;
   inst1,n;
   join t

We know that for each thread the previous instructions must happen before the later instructions. So for instance, inst1,1 must happen before inst1,2.
We also know that the fork must happen before the first instruction of the second thread.
And thanks to the join, we know that all of the instructions of the second thread must be completed before the join finishes.
However, in general, we do not know whether inst1,i executes before or after inst2,j.
In general, synchronization instructions like fork and join reduce the number of possible interleavings. Synchronization cuts down nondeterminism. In the absence of synchronization we don't know anything…