Using D for Development of Large Scale Primary Storage
#DConf2016

Liran Zvibel
Weka.IO, CTO
[email protected]
@liranzvibel

Agenda
• Weka.IO introduction
• Our progress since we picked off
• Examples where D really shines
• Our challenges
• Improvement suggestions
• Q&A

D for Primary Storage #DConf2016

Weka.IO Introduction

About Weka.IO
• Enabling clouds and enterprises with a single storage solution for resilience, performance, scalability and cost efficiency
• HQ in San Jose, CA; R&D in Tel Aviv, Israel
• 30 engineers, vast storage experience
• VC-backed company; Series B led by Walden International; Series A led by Norwest Venture Partners
• Product used in production by early adopters (still in stealth)
• Over 200k LOC of our own D code, about 35 packages

Storage system requirements
• Extremely reliable, “always on”, stateful
• High-performance data path, measured in µsecs
• Complicated “control path” / “management code”
• Distributed nature due to HA requirements
• Low-level interaction with HW devices
• Some kernel-level code, some assembly
• Language has to be efficient to program, and fit for large projects

The Weka.IO framework
• Software-only solution
• User-space processes
• 100% CPU, polling-based on networking and storage
• Asynchronous programming model, using Fibers and a Reactor
• Memory efficient, zero-copy everything, very low latency
• GC free, lock-free efficient data structures
• Proprietary networking stack from Ethernet to RPC

Our Progress

Current state for Weka
• No more show-stoppers, still a long way to go
• Indeed productivity is very high, very good code-to-features ratio
• We are able to “rapid prototype” features and then iron them out
• All major runtime issues resolved
• We get great performance
• Choosing D was a good move, and proved to be a huge success

Compilation progress
• Switched to LDC (thanks David Nadlinger and the LDC team!)
• Compilation is now by package
• Better RAM “management”
• Leveraging parallelism to speed up build time
• Recent front-ends “feel” much more stable
• LDC lets us build optimized binaries with asserts enabled, which is a good thing for QA.

LDC status
• Got over 100% performance boost over DMD
• When compiling as a single package with optimizations
• Fiber switching based on registers and not pthreads
• No GC allocation when throwing and handling exceptions (thanks Mithun!)
• Integrate libunwind with DWARF support for stack traces (no --disable-fp-elim)
• Support debug (-g) with backend optimizations
• Template instantiation bug: still unresolved for the upstream
• @ldc.attribute.section("SECTIONNAME")
• -static flag to ldc, allowing easy compile and shipment of utilities

GC allocation and latency
• We now check how much we allocated (using hacks, an API would be nice) from the Reactor, and decide to collect if we allocated more than 20MB
• Collection actually happens very infrequently (a few times an hour)
• Collection time is de-synchronized across the cluster
• Collection time is still significant: about 10ms
• Main drawback: allocation MAY take an ‘infinite’ amount of time if the kernel is stressed on memory.

Exceptions and GC
• Exception handling code was modified to never rely on GC allocation
• Reactor and Fibers code (+ our TraceInfo class) modified to keep the trace in a fiber-local state.
✴ Problem: potentially throwing from scope(exit/success/failure)
• Throwables are a class, so allocating them comes from the GC; they must be statically allocated:
    static __gshared auto ex = new Exception(":o(");

Code Tidbits

NetworkBufferPtr
    @nogc @property inout(NetworkBuffer)* get() inout nothrow pure {
        auto ptr = cast(NetworkBuffer*)(_addr >> MAGIC_BITS);
        assert(ptr is null || (_addr & MAGIC_MASK) == ptr._gen);
        return ptr;
    }
    alias get this;
• _gen keeps incrementing when buffers are allocated from pools
• Pointers remember their generations, and validate accurate access
• Helps debugging stale pointers
• Problem with implicit casts of null, alias this is not strong enough.
Maybe some syntax could help.

Handling all enum values
    switch (pkt.header.type) {
        foreach (name; __traits(allMembers, PacketType)) {
            case __traits(getMember, PacketType, name):
                return __traits(getMember, this, "handle" ~ name)(pkt);
        }
    }
• A similar solution verifies that all fields in a C struct have the same offset; naturally the C part ends up being much more complex.

Flag setting/testing
    @property bool flag(string NAME)() {
        return (_flags & __traits(getMember, NBFlags, NAME)) != 0;
    }
    @property void flag(string NAME)(bool val) {
        if (val) {
            _flags |= __traits(getMember, NBFlags, NAME);
        } else {
            _flags &= ~__traits(getMember, NBFlags, NAME);
        }
    }

    buffer.flag!"TX_ACK" = true;

Efficient packing
    static if (JoinedKV.sizeof <= CACHE_LINE_SIZE) {
        alias KV = JoinedKV;
        enum separateKV = false;
    } else {
        struct KV {
            K key;
            /* values will be stored separately for better cache behavior */
        }
        V[NumEntries] values;
        enum separateKV = true;
    }

Challenges

Compilation time
• Project is broken into ~35 packages.
• Some logical packages are compiled as several smaller packages
• With the current 2.068.2 compiler, several packages take about 90 seconds each to compile, leading to a total compile time of 4-5 minutes.
• The newer 2.070.2 + PGO compiler reduces time by about 35% (thanks Johan!). Still getting 3-4 minutes per complete compile.

Compile time improvement suggestions
• Introduce more parallelism into the build process
• Support incremental compiles.
• Now when a dependency is changed, complete packages have to be completely rebuilt. In many cases, most of the work is redundant
• When a dependency IMPLEMENTATION is changed, still everything gets recompiled
• Support (centralized) caching for build results.
• Don't let humans “context switch” while waiting for the compiler!

Long Symbols
• Total symbols: 99649, over 1k: 9639, over 500k: 102, over 1M: 62
• Longest symbol was 5M!
• Makes working with standard tools much harder (some nm tools crash on the exe).
• A simple hashing solution was implemented in our special compiler
• Demangling has now stopped working for us, we only get module/func name
• More time is spent on hashing than what is saved on linkage. We may need a “native” solution.

Phobos Algs Forcing GC
    private struct MapResult(alias fun, Range, ARGS...) {
        ARGS _args;
        alias R = Unqual!Range;
        R _input;
        this(R input, ARGS args) { _input = input; _args = args; }
        @property auto ref front() { return fun(_input.front, _args); }
        …

    auto under_value_gc(R)(R r, int value) { return r.filter!(x => x < value); }
    auto under_value_nogc(R)(R r, int value) { return r.xfilter!((x, y) => x < y)(value); }
    auto multiple_by_gc(R)(R r, int value) { return r.map!(x => x * value); }
    auto multiple_by_nogc(R)(R r, int value) { return r.xmap!((x, y) => x * y)(value); }

Improvement Ideas

static foreach
• Make it explicit
• Allow it to manipulate types, to replace complex template recursion
    template hasUDAttributeOfType(T, alias X) {
        alias attrs = TypeTuple!(__traits(getAttributes, X));
        template helper(int i) {
            static if (i >= attrs.length) {
                enum helper = false;
            } else static if (is(attrs[i] == T) || is(typeof(attrs[i]) == T)) {
                static assert(!helper!(i + 1), "More than one matching attribute: " ~ attrs.stringof);
                enum helper = true;
            } else {
                enum helper = helper!(i + 1);
            }
        }
        enum hasUDAttributeOfType = helper!0;
    }

Transitive @UDA
• Specify some @UDAs as transitive, so the compiler can help “prove” correctness.
• For example:
    • Define a function as @atomic if it does not context switch
    • A function may be @atomic if it only calls @atomic functions
    • Next step would be to prove that no context switch happens
• Can be implemented in “runtime” if there is a __traits that returns all the functions that a function may call.
• Next phase would be to be able to ‘prove’ things on the functions, so @nogc, nothrow, pure etc. can use the same mechanism.
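The transitive @atomic idea can be approximated by hand in current D, as a rough sketch only. The slide asks for a __traits that lists the functions a function may call; since that is only a wish here, this sketch assumes the caller spells out the callee list manually. The names `atomic`, `allAtomic`, `readCounter`, `bumpCounter` and `yieldToReactor` are hypothetical and not taken from the Weka.IO code base; `hasUDA` is the existing helper from `std.traits`.

```d
import std.traits : hasUDA;

/// Hypothetical marker attribute: the function never context-switches.
struct atomic {}

/// Hypothetical helper: true only when every listed function carries @atomic.
/// The callee list is written by hand; a compiler-provided list of callees
/// (the __traits wished for on the slide) would replace it.
template allAtomic(callees...)
{
    static if (callees.length == 0)
        enum allAtomic = true;
    else
        enum allAtomic = hasUDA!(callees[0], atomic)
                      && allAtomic!(callees[1 .. $]);
}

@atomic int readCounter() { return 42; }
@atomic int bumpCounter() { return readCounter() + 1; }
void yieldToReactor() {} // unmarked: may context switch

// Checked at compile time; a violation fails the build.
static assert(allAtomic!(readCounter, bumpCounter));
static assert(!allAtomic!(readCounter, yieldToReactor));

void main() {}
```

The check only verifies that the attribute is present on the listed functions. It does not prove that their bodies avoid context switches, which is the part the slide leaves to the compiler.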
Other Suggestions
• __traits that returns the max stack size of a function
• Add a predicate that tells whether there is an existing exception currently handled
• Donate Weka’s @nogc ‘standard library’ to Phobos:
    • Our Fiber additions into Phobos (throwInFiber, TraceInfo support, etc.) [other lib funs as well]
    • Containers, algorithms, lockless data structures, etc.

Questions?
Peta, Exa, Zetta, Yotta, Xenna, Weka (10^30)

Table 1
0.68    0.70    0.70 + PGO
88.4    58.1    54.7
84.3    57.1    54.7
75.8    51.0    49.7
67.5    40.7    37.3
59.5    36.4    43.1
56.6    38.3    35.3
51.9    35.9    32.4
50.4    30.9    34.6
44.9    25.2    26.8
42.7    31.0    27.0
35.7    31.3    30.2
35.1    24.5    22.9
31.4    21.1    17.7
30.5    20.5    19.9
25.8    20.1    16.3
19.0    13.5    15.4
18.3    12.0    11.3
14.3    10.6    10.4
13.7    14.0    13.6
9.4     6.8     6.3

• 2.070.2 is a major improvement in compile time over 2.068.2
• Still, the 30-40% improvement means that engineers have to wait long minutes to get the whole exe to build.
• We’re breaking large packages into smaller ones, when possible