DConf 2016 presentation: D for Primary Storage

Using D for Development
of Large Scale Primary Storage
#DConf2016
Liran Zvibel
Weka.IO, CTO
[email protected]
@liranzvibel
Agenda
• Weka.IO Introduction
• Our progress since we left off
• Examples where D really shines
• Our challenges
• Improvement suggestions
• Q&A
D for Primary Storage #DConf2016
Weka.IO Introduction
About Weka.IO
• Enabling clouds and enterprises with a single storage solution for resilience, performance, scalability and cost efficiency
• HQ in San Jose, CA; R&D in Tel Aviv, Israel
• 30 engineers, vast storage experience
• VC-backed company; Series B led by Walden International; Series A led by Norwest Venture Partners
• Product used in production by early adopters (still in stealth)
• Over 200k LOC of our own D code, about 35 packages
Storage system requirements
• Extremely reliable, “always on”, stateful
• High-performance data path, measured in µsecs
• Complicated “control path” / “management code”
• Distributed nature due to HA requirements
• Low-level interaction with HW devices
• Some kernel-level code, some assembly
• Language has to be efficient to program in, and fit for large projects
The Weka.IO framework
• Software-only solution
• User-space processes
• 100% CPU, polling-based networking and storage
• Asynchronous programming model, using Fibers and a Reactor
• Memory efficient, zero-copy everything, very low latency
• GC-free, lock-free efficient data structures
• Proprietary networking stack from Ethernet to RPC
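To make the Fiber/Reactor model concrete, here is a minimal sketch built on druntime's `core.thread.Fiber`. The `Reactor` class and its `spawn`/`run` methods are hypothetical, not Weka's implementation: a real reactor would poll NIC and storage completion queues between fiber runs (the “100% CPU, polling” bullet) and would avoid the GC allocations this sketch makes for brevity.

```d
import core.thread : Fiber;

/// Hypothetical minimal reactor: runs ready fibers round-robin until all of
/// them have terminated. Each fiber runs until it calls Fiber.yield() (where
/// real code would wait for I/O) or returns.
final class Reactor
{
    private Fiber[] ready;

    /// Wrap `task` in a new fiber and queue it for the next pass.
    void spawn(void delegate() task)
    {
        ready ~= new Fiber(task);
    }

    /// Drive all fibers to completion on the current thread.
    void run()
    {
        while (ready.length > 0)
        {
            Fiber[] stillRunning;
            foreach (f; ready)
            {
                f.call();                          // resume until yield or return
                if (f.state != Fiber.State.TERM)
                    stillRunning ~= f;             // yielded: resume on the next pass
            }
            ready = stillRunning;
            // A real reactor would poll network/storage completion queues here
            // and resume only the fibers whose I/O has completed.
        }
    }
}
```

For example, spawning one fiber that logs 0, 1, 2 and another that logs 10, 11, each yielding after every entry, produces the interleaved log `[0, 10, 1, 11, 2]`.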
Our Progress
Current state for Weka
• No more show-stoppers, though there is still a long way to go
• Productivity is indeed very high, with a very good code-to-features ratio
• We are able to “rapid prototype” features and then iron them out
• All major runtime issues resolved
• We get great performance
• Choosing D was a good move, and proved to be a huge success
Compilation progress
• Switched to LDC (thanks David Nadlinger and the LDC team!)
• Compilation is now done by package
• Better RAM “management”
• Leveraging parallelism to speed up build time
• Recent front-ends “feel” much more stable
• LDC lets us build optimized binaries with asserts enabled, which is a good thing for QA.
LDC status
• Got over 100% performance boost over DMD
  • When compiling as a single package with optimizations
• Fiber switching based on registers and not pthreads
• No GC allocation when throwing and handling exceptions (Thanks Mithun!)
• Integrate libunwind with DWARF support for stack traces (no --disable-fpelim)
• Support debug info (-g) with backend optimizations
• Template instantiation bug: still unresolved upstream
• @ldc.attributes.section("SECTIONNAME")
• -static flag to ldc, allowing easy compilation and shipment of utilities
GC allocation and latency
• From the Reactor, we now check how much we have allocated (using hacks; an API would be nice), and decide to collect if we allocated more than 20MB
• Collection actually happens very infrequently (a few times an hour)
• Collection time is de-synchronized across the cluster
• Collection time is still significant: about 10ms
• Main drawback: allocation MAY take an ‘infinite’ amount of time if the kernel is under memory pressure.
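A sketch of the “collect after 20MB” policy, assuming a D release newer than the one in the talk: `core.memory.GC.stats()` now exposes `usedSize`, which can stand in for the hacks mentioned above. `GcPolicy`, `shouldCollect` and `onReactorTick` are hypothetical names. The decision is a pure function of the observed heap size, so it can be tested without depending on when the runtime collects on its own.

```d
import core.memory : GC;

/// Hypothetical collection policy, consulted once per Reactor loop iteration.
struct GcPolicy
{
    size_t threshold = 20 * 1024 * 1024; // collect after ~20MB of growth (slide value)
    size_t baseline;                     // GC heap usage right after the last collection

    /// Pure decision: has the heap grown by more than `threshold` since `baseline`?
    bool shouldCollect(size_t usedNow) const @nogc nothrow pure
    {
        return usedNow > baseline && usedNow - baseline > threshold;
    }
}

/// Called from the reactor loop; returns true if a collection was run.
bool onReactorTick(ref GcPolicy policy)
{
    if (!policy.shouldCollect(GC.stats().usedSize))
        return false;
    GC.collect();
    policy.baseline = GC.stats().usedSize;
    return true;
}
```

Such a setup would typically also call `GC.disable()` at startup so the runtime does not collect on its own schedule; the runtime may still collect when it deems it necessary, for example when an allocation cannot otherwise be satisfied. Giving each node a slightly different `threshold` is one possible way to de-synchronize collections; the talk does not say how Weka does it.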
Exceptions and GC
• Exception handling code was modified to never rely on GC allocation
• Reactor and Fibers code (+ our TraceInfo class) modified to keep the trace in a fiber-local state.
✴ Problem: potentially throwing from scope(exit/success/failure)
• Throwables are classes, so allocating them goes through the GC; they must be statically allocated:
• static __gshared auto ex = new Exception(":o(");
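A minimal sketch of the preallocation pattern. This version creates the instance in a module constructor rather than a static initializer; `queueFullError` and `push` are hypothetical names. Because no `new` appears at the throw site, the throwing function can be annotated `@nogc`. Whether capturing a stack trace at throw time allocates depends on the runtime, which is what the Reactor/Fiber/TraceInfo changes above address.

```d
/// Hypothetical preallocated exception, created once when the module is initialized.
__gshared Exception queueFullError;

shared static this()
{
    queueFullError = new Exception("queue full");
}

/// Hypothetical fixed-capacity push that signals overflow by throwing the
/// preallocated instance. There is no `new` at the throw site, so @nogc is accepted.
void push(ref int[4] buf, ref size_t len, int value) @nogc
{
    if (len == buf.length)
        throw queueFullError;
    buf[len++] = value;
}
```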
Code Tidbits
NetworkBufferPtr
@nogc @property inout(NetworkBuffer)* get() inout nothrow pure {
    auto ptr = cast(inout(NetworkBuffer)*)(_addr >> MAGIC_BITS);
    assert(ptr is null || (_addr & MAGIC_MASK) == ptr._gen);
    return ptr;
}
alias get this;
• _gen keeps incrementing when buffers are allocated from pools
• Pointers remember their generation and validate it on every access
• Helps debugging stale pointers
• Problem with implicit casts of null: alias this is not strong enough. Maybe some syntax could help.
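A self-contained version of the generation-checked pointer, under stated assumptions: 64-bit pointers whose top bits are unused (true for typical x86-64 user-space addresses, so shifting left by `MAGIC_BITS` loses nothing), a 16-bit `_gen`, and a hypothetical `isStale` helper for non-asserting checks. Because the packed value no longer looks like a pointer, the garbage collector cannot see it, so buffers must come from a manually managed pool, as in the talk.

```d
static assert(size_t.sizeof == 8, "sketch assumes 64-bit pointers");

enum MAGIC_BITS = 16;                                   // low bits hold the generation
enum size_t MAGIC_MASK = (cast(size_t) 1 << MAGIC_BITS) - 1;

/// Hypothetical pooled buffer; only the generation field matters here.
struct NetworkBuffer
{
    ushort _gen;          // bumped each time the pool hands this buffer out
    ubyte[64] data;
}

struct NetworkBufferPtr
{
    private size_t _addr; // (address << MAGIC_BITS) | generation at creation time

    this(NetworkBuffer* p) @nogc nothrow pure
    {
        _addr = p is null ? 0 : (cast(size_t) p << MAGIC_BITS) | p._gen;
    }

    @nogc @property inout(NetworkBuffer)* get() inout nothrow pure
    {
        auto ptr = cast(inout(NetworkBuffer)*)(_addr >> MAGIC_BITS);
        assert(ptr is null || (_addr & MAGIC_MASK) == ptr._gen, "stale NetworkBufferPtr");
        return ptr;
    }

    /// True if the buffer was recycled after this pointer was created.
    bool isStale() const @nogc nothrow pure
    {
        auto ptr = cast(const(NetworkBuffer)*)(_addr >> MAGIC_BITS);
        return ptr !is null && (_addr & MAGIC_MASK) != ptr._gen;
    }

    alias get this;
}
```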
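Handling all enum values
switch (pkt.header.type) {
    // Unrolled at compile time: one case per PacketType member
    foreach (name; __traits(allMembers, PacketType)) {
    case __traits(getMember, PacketType, name):
        return __traits(getMember, this, "handle" ~ name)(pkt);
    }
    default:
        assert(0, "unknown packet type");
}
• A similar solution verifies that all fields of a C struct have the same offsets in its D definition; naturally, the C part ends up being much more complex.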
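A self-contained version of the dispatch, using `static foreach` (added in DMD 2.076, after this talk) instead of the tuple `foreach`. `PacketType`'s members, `Packet` and the `handle*` methods are hypothetical stand-ins. Adding a `PacketType` member without a matching `handle<Name>` method becomes a compile-time error rather than a silently unhandled case.

```d
/// Hypothetical packet types; each member needs a matching handle<Name> method.
enum PacketType : ubyte { Ping, Data, Ack }

struct Header { PacketType type; }
struct Packet { Header header; int payload; }

/// Hypothetical handler set, counting how often each handler ran.
struct Dispatcher
{
    int pings, datas, acks;

    int handlePing(ref Packet pkt) { ++pings; return 1; }
    int handleData(ref Packet pkt) { ++datas; return pkt.payload; }
    int handleAck(ref Packet pkt)  { ++acks;  return 3; }

    int dispatch(ref Packet pkt)
    {
        switch (pkt.header.type)
        {
            // One `case` per PacketType member, generated at compile time.
            static foreach (name; __traits(allMembers, PacketType))
            {
            case __traits(getMember, PacketType, name):
                return __traits(getMember, this, "handle" ~ name)(pkt);
            }
            default:
                assert(0, "corrupt packet type");
        }
    }
}
```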
Flag setting/testing
@property bool flag(string NAME)() {
    return (_flags & __traits(getMember, NBFlags, NAME)) != 0;
}
@property void flag(string NAME)(bool val) {
    if (val) {
        _flags |= __traits(getMember, NBFlags, NAME);
    } else {
        _flags &= ~__traits(getMember, NBFlags, NAME);
    }
}
buffer.flag!"TX_ACK" = true;
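A self-contained version of the flag accessors, with a hypothetical `NBFlags` set and a minimal `NetworkBuffer` that holds only the flags word. Because the flag name is a template argument resolved with `__traits(getMember, ...)`, a misspelled name such as `"TX_AKC"` is a compile-time error instead of a silent bug.

```d
/// Hypothetical flag bits; the real NBFlags set is not shown in the talk.
enum NBFlags : uint
{
    TX_ACK    = 1 << 0,
    RX_DONE   = 1 << 1,
    ZERO_COPY = 1 << 2,
}

/// Hypothetical buffer carrying only the flags word.
struct NetworkBuffer
{
    private uint _flags;

    /// Getter: `buffer.flag!"TX_ACK"`.
    @property bool flag(string NAME)() const
    {
        return (_flags & __traits(getMember, NBFlags, NAME)) != 0;
    }

    /// Setter: `buffer.flag!"TX_ACK" = true`.
    @property void flag(string NAME)(bool val)
    {
        if (val)
            _flags |= __traits(getMember, NBFlags, NAME);
        else
            _flags &= ~__traits(getMember, NBFlags, NAME);
    }
}
```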
Efficient packing
static if (JoinedKV.sizeof <= CACHE_LINE_SIZE) {
    alias KV = JoinedKV;
    enum separateKV = false;
} else {
    struct KV {
        K key;
        /* values will be stored separately for
           better cache behavior */
    }
    V[NumEntries] values;
    enum separateKV = true;
}
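A self-contained sketch of the layout choice, wrapped in a hypothetical `Bucket(K, V, NumEntries)` template and assuming a 64-byte cache line. The hypothetical `valueAt` accessor hides from callers which layout was chosen.

```d
enum CACHE_LINE_SIZE = 64; // assumption: typical x86-64 cache line size in bytes

/// Hypothetical fixed-size table bucket that picks its memory layout at compile time.
struct Bucket(K, V, size_t NumEntries)
{
    struct JoinedKV { K key; V value; }

    static if (JoinedKV.sizeof <= CACHE_LINE_SIZE)
    {
        alias KV = JoinedKV;        // small: key and value share a cache line
        enum separateKV = false;
    }
    else
    {
        struct KV { K key; }        // large values: keep the key array dense
        V[NumEntries] values;       // values stored in a parallel array
        enum separateKV = true;
    }

    KV[NumEntries] entries;

    /// Layout-independent access to the i-th value.
    ref V valueAt(size_t i)
    {
        static if (separateKV)
            return values[i];
        else
            return entries[i].value;
    }
}
```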
Challenges
Compilation time
• The project is broken into ~35 packages.
• Some logical packages are compiled as several smaller packages
• With the current 2.068.2 compiler, several packages take about 90 seconds each to compile, leading to a total compile time of 4-5 minutes.
• The newer 2.070.2 + PGO compiler reduces time by about 35% (Thanks Johan!). Still getting 3-4 minutes per complete compile.
Compile-time improvement suggestions
• Introduce more parallelism into the build process
• Support incremental compiles.
  • Today, when a dependency changes, complete packages have to be rebuilt. In many cases, most of the work is redundant
  • When only a dependency's IMPLEMENTATION changes, everything still gets recompiled
• Support (centralized) caching for build results.
• Don't let humans “context switch” while waiting for the compiler!
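One possible shape for the “(centralized) caching” suggestion is to key each package's build output by a hash of everything that can affect it. The sketch assumes a content-addressed store exists elsewhere; `buildCacheKey` and its parameters are hypothetical, and a real key would also have to cover the sources of imported modules. Hashing only dependencies' interfaces instead of their full source would be one way to avoid rebuilds when just an implementation changes, though cross-module inlining and templates complicate that.

```d
import std.digest : toHexString;
import std.digest.sha : SHA256;

/// Hypothetical cache key for one package's build output: any change to the
/// compiler version, the flags, or any input's bytes produces a different key.
string buildCacheKey(string compilerVersion, const(string)[] flags,
                     const(ubyte)[][] inputs)
{
    SHA256 sha;
    sha.start();

    // Length-prefix every field (native byte order, fine for one platform) so
    // that, e.g., flags ["-O", "3"] and ["-O3"] cannot produce the same stream.
    void putLength(ulong n)
    {
        sha.put((cast(const(ubyte)*) &n)[0 .. n.sizeof]);
    }

    void putField(const(ubyte)[] bytes)
    {
        putLength(bytes.length);
        sha.put(bytes);
    }

    putField(cast(const(ubyte)[]) compilerVersion);
    putLength(flags.length);
    foreach (f; flags)
        putField(cast(const(ubyte)[]) f);
    putLength(inputs.length);
    foreach (src; inputs)
        putField(src);

    auto digest = sha.finish();         // ubyte[32]
    auto hex = toHexString(digest);     // char[64], upper-case hex
    return hex[].idup;
}
```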
Long Symbols
• Total symbols: 99649; longer than 1k: 9639; longer than 500k: 102; longer than 1M: 62
• The longest symbol was 5M!
• Makes working with standard tools much harder (some nm tools crash on the executable).
• A simple hashing solution was implemented in our special compiler
• Demangling has now stopped working for us; we only get the module/function name
• More time is spent on hashing than is saved on linkage. We may need a “native” solution.
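The slide does not describe Weka's hashing scheme. The sketch below is one plausible shape, with a hypothetical cutoff, prefix length, `__H` marker and choice of MD5: keep a prefix of the mangled name (D mangled names start with `_D` followed by the qualified name) and replace the remainder with a fixed-size digest. Keeping only a prefix would be consistent with demanglers recovering no more than the module/function name.

```d
import std.digest : toHexString;
import std.digest.md : md5Of;

enum maxSymbolLength = 1024; // hypothetical cutoff
enum keptPrefix = 200;       // hypothetical: leading bytes kept verbatim
static assert(keptPrefix < maxSymbolLength);

/// Hypothetical symbol shortener: symbols over the cutoff keep a readable
/// prefix and replace the rest with a digest of the full mangled name.
string shortenSymbol(string mangled)
{
    if (mangled.length <= maxSymbolLength)
        return mangled;
    auto digest = md5Of(mangled);    // ubyte[16]
    auto hex = toHexString(digest);  // char[32]
    return mangled[0 .. keptPrefix] ~ "__H" ~ hex[].idup;
}
```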
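Phobos Algorithms Forcing GC
private struct MapResult(alias fun, Range, ARGS...) {
    ARGS _args;
    alias R = Unqual!Range;
    R _input;
    this(R input, ARGS args) {
        _input = input;
        _args = args;
    }
    @property auto ref front() { return fun(_input.front, _args); }
    …
}
// filter/map: the lambda captures `value`, so returning the range forces a GC-allocated closure
// xfilter/xmap: `value` is stored inside the range struct instead, so no closure is needed
auto under_value_gc(R)(R r, int value) { return r.filter!(x => x < value); }
auto under_value_nogc(R)(R r, int value) { return r.xfilter!((x, y) => x < y)(value); }
auto multiple_by_gc(R)(R r, int value) { return r.map!(x => x * value); }
auto multiple_by_nogc(R)(R r, int value) { return r.xmap!((x, y) => x * y)(value); }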
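A minimal, hypothetical re-implementation of an `xmap`-style adaptor, modeled on the `MapResult` above; Weka's actual `xmap`/`xfilter` are not shown in the talk. The extra arguments live inside the range struct, so the mapping lambda captures nothing, and the compiler accepts `@nogc` on the caller.

```d
import std.range.primitives : empty, front, isInputRange, popFront;
import std.traits : Unqual;

/// Range that yields fun(element, args...) for each element of the input.
struct XMapResult(alias fun, Range, ARGS...)
{
    alias R = Unqual!Range;
    R _input;
    ARGS _args;     // extra arguments stored by value: no closure needed

    this(R input, ARGS args)
    {
        _input = input;
        _args = args;
    }

    @property bool empty() { return _input.empty; }
    @property auto front() { return fun(_input.front, _args); }
    void popFront() { _input.popFront(); }
}

/// r.xmap!fun(args...) maps each element e to fun(e, args...).
template xmap(alias fun)
{
    auto xmap(Range, ARGS...)(Range r, ARGS args)
        if (isInputRange!(Unqual!Range))
    {
        return XMapResult!(fun, Range, ARGS)(r, args);
    }
}

/// The slide's multiple_by_nogc, annotated @nogc so the compiler checks the claim.
auto multiple_by_nogc(R)(R r, int value) @nogc
{
    return r.xmap!((x, y) => x * y)(value);
}
```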
Improvement Ideas
static foreach
• Make it explicit
• Allow it to manipulate types, to replace complex template recursion
import std.meta : AliasSeq; // AliasSeq is the current name for TypeTuple
template hasUDAttributeOfType(T, alias X) {
    alias attrs = AliasSeq!(__traits(getAttributes, X));
    template helper(int i) {
        static if (i >= attrs.length) {
            enum helper = false;
        } else static if (is(attrs[i] == T) || is(typeof(attrs[i]) == T)) {
            static assert(!helper!(i + 1), "More than one matching attribute: " ~ attrs.stringof);
            enum helper = true;
        } else {
            enum helper = helper!(i + 1);
        }
    }
    enum hasUDAttributeOfType = helper!0;
}
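For comparison, D gained `static foreach` in DMD 2.076 (2017). A sketch of the same trait written with it, replacing the recursive `helper` with a loop evaluated at compile time (CTFE); like the original it rejects more than one match, though with a shorter error message.

```d
/// True if X has exactly one attribute that is the type T or a value of type T;
/// more than one match is a compile-time error, as in the recursive version.
template hasUDAttributeOfType(T, alias X)
{
    enum hasUDAttributeOfType = () {
        size_t matches;
        // Unrolled at compile time: one `static if` per attribute of X.
        static foreach (attr; __traits(getAttributes, X))
        {
            static if (is(attr == T) || is(typeof(attr) == T))
                ++matches;
        }
        assert(matches <= 1, "More than one matching attribute");
        return matches == 1;
    }();
}
```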
Transitive @UDA
• Specify some @UDAs as transitive, so the compiler can help “prove” correctness.
• For example:
  • Define a function as @atomic if it does not context switch
  • A function may be @atomic if it only calls @atomic functions
  • Next step would be to prove that no context switch happens
• Can be implemented in “runtime” if there is a __traits that returns all the functions that a function may call.
• Next phase would be to be able to ‘prove’ things about the functions, so @nogc, nothrow, pure, etc. can use the same mechanism.
Other Suggestions
• A __traits that returns the max stack size of a function
• Add a predicate that tells whether an exception is currently being handled
• Donate Weka’s @nogc ‘standard library’ to Phobos:
  • Our Fiber additions into Phobos (throwInFiber, TraceInfo support, etc.) [other lib functions as well]
  • Containers, algorithms, lockless data structures, etc.
Questions?
Peta, Exa, Zetta, Yotta, Xenna, Weka (10^30)
Table 1: per-package compile time (seconds), 20 packages
0.68    0.70    0.70 + PGO
88.4    58.1    54.7
84.3    57.1    54.7
75.8    51.0    49.7
67.5    40.7    37.3
59.5    36.4    43.1
56.6    38.3    35.3
51.9    35.9    32.4
50.4    30.9    34.6
44.9    25.2    26.8
42.7    31.0    27.0
35.7    31.3    30.2
35.1    24.5    22.9
31.4    21.1    17.7
30.5    20.5    19.9
25.8    20.1    16.3
19.0    13.5    15.4
18.3    12.0    11.3
14.3    10.6    10.4
13.7    14.0    13.6
9.4     6.8     6.3
• 2.070.2 is a major improvement in compile time over 2.068.2
• Still, a 30-40% improvement means that engineers have to wait long minutes for the whole executable to build.
• We’re breaking large packages into smaller ones, when possible
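The per-package figures in Table 1 can be summed to check the “30-40%” figure. The sketch below (array names are hypothetical; units taken to be seconds, consistent with the “about 90 seconds” packages mentioned earlier) computes the reduction in total serial compile time. Wall-clock time with a parallel build is lower than these totals and is bounded below by the slowest packages, which is one reason to split large packages.

```d
import std.algorithm.iteration : sum;

// Per-package compile times from Table 1 (seconds), same package order in each array.
immutable double[] t068 = [88.4, 84.3, 75.8, 67.5, 59.5, 56.6, 51.9, 50.4, 44.9, 42.7,
                           35.7, 35.1, 31.4, 30.5, 25.8, 19.0, 18.3, 14.3, 13.7, 9.4];
immutable double[] t070 = [58.1, 57.1, 51.0, 40.7, 36.4, 38.3, 35.9, 30.9, 25.2, 31.0,
                           31.3, 24.5, 21.1, 20.5, 20.1, 13.5, 12.0, 10.6, 14.0, 6.8];
immutable double[] t070pgo = [54.7, 54.7, 49.7, 37.3, 43.1, 35.3, 32.4, 34.6, 26.8, 27.0,
                              30.2, 22.9, 17.7, 19.9, 16.3, 15.4, 11.3, 10.4, 13.6, 6.3];

/// Percentage reduction of the summed (serial) compile time from `before` to `after`.
double reductionPercent(const(double)[] before, const(double)[] after)
{
    return 100.0 * (1.0 - after.sum / before.sum);
}
```

The totals are 855.2 s, 579.0 s and 559.6 s, so `reductionPercent(t068, t070)` is about 32.3 and `reductionPercent(t068, t070pgo)` about 34.6, in line with the “about 35%” quoted earlier.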