Replication in Node Partitioned Data Warehouses

Pedro Furtado
University of Coimbra
Departamento de Engenharia Informática
Pólo II, Pinhal de Marrocos
Coimbra, Portugal
pnf@dei.uc.pt

Abstract

In this paper we concentrate on guaranteeing efficient availability and promoting manageability in a node-partitioned data warehouse (NPDW). The objective is that the system be always-on and always efficient, even when entire parts of it are taken offline for maintenance and management functions, such as loading with new data or other DBA functionality. Replication has already been studied for parallel databases in general. We investigate how alternative replication strategies can be applied to the NPDW context and analyze advantages and drawbacks against metrics.

1. Introduction

Parallel architectures can significantly speed up the processing over large data warehouses. We have been pursuing the idea of replacing fully dedicated and powerful servers by a possibly non-dedicated network of low-cost, under-utilized computers to hold and process data warehouses.

The data warehouse can reach giga- or even terabytes and is typically organized as a set of multidimensional schemas []. There are typically some very big relations (facts), storing historical detail such as each individual sale of each product in each store of a retail chain, and smaller relations (dimensions) with descriptive properties for the dimensions (e.g. product, store, time). In that context, partitioning refers to dividing relations into nodes so as to take advantage of parallel node processing. We have discussed horizontal partitioning strategies for NPDW in [] and showed that a careful partitioning strategy over a switched network environment can achieve acceptable speedups.

However, availability is an issue in such a context, so that availability-oriented replication becomes a major necessity. A replica is a “standby” copy of some data that can be activated at any moment in case of unavailability or failure of the node holding the “original”, so that processing resumes as usual. If
processing with unavailable nodes is implemented efficiently, unavailability becomes less onerous to the whole system and it also becomes feasible to stop a set of nodes for data loading, maintenance, upgrading or other management activities without any major repercussions to processing. The system remains always-on and always efficient.

Replica placement has been studied in the context of generic parallel and distributed databases in which the relations are not partitioned []. We review those works in the related work section. In this paper we discuss replication for availability in the NPDW context and discuss the use of replicas both for tolerating node failures and for allowing multiple nodes to be offline simultaneously for loading or administration. We compare the approaches from the perspective of efficiency. Our main contributions include:
• showing how replication strategies can be applied to a workload-based, pre-partitioned NPDW setting and how processing can incorporate the replicas in case of node failures;
• analyzing alternatives against relevant metrics;
• evaluating the alternatives with emphasis on efficiency and flexibility for allowing multiple offline nodes;
• analyzing the tradeoff between efficiency and the capacity to take multiple nodes offline simultaneously.

The paper is organized as follows: Section 2 discusses related work. Section 3 overviews the Node Partitioned Data Warehouse. Sections 4 and 5 discuss replication alternatives and Section 6 compares the approaches. Section 7 contains concluding remarks and future work.

Figure 1 – Basic Partitioning Example in NPDW (TPC-H schema)

2. Related Work

The most relevant related work for this paper concerns replication strategies, but we also briefly review partitioning.

Some of the most promising partitioning and placement approaches focus on query workload-based partitioning choice []. The idea in those works is to use the query workload to determine the most appropriate partitioning attributes, which should be related to typical query access patterns. All those works focus mainly on hash-partitioning for efficient parallel join
processing [], also reviewed in []. Our previous work on the NPDW [] proposes and analyzes generic data partitioning strategies independently of the underlying database server and targeted at node-partitioned data warehouses. Our purpose in this paper is to study availability and replication concerns in the NPDW design.

Replication has been studied in the parallel database context. In Tandem’s NonStop SQL [], the use of mirrored disk drives offers a high level of availability but does a poor job of distributing the load of a failed processor. If a processor fails, the substitute processor will have to handle the disks of the failed processor as well as its own, essentially doubling the processing time. In this paper we apply this strategy as the Full Replication (FR) option. Teradata’s scheme [] assumes relation clusters (groups of nodes) and can back up a partitioned copy of a relation by placing it in the N-1 other nodes of the relation cluster with N nodes. Although this scheme balances the processing in case of failures, if more than one node is unavailable in the cluster the system stops. In chained declustering [], two declustered copies are kept, such that the fragments of the second declustered copy are placed in different nodes from the ones of the primary copy. This strategy improves availability while maintaining the performance level of the Teradata scheme. In [], interleaved declustering divides the disks into clusters and fully declusters relation partitions into the corresponding cluster. In [] the authors compare high-availability media recovery techniques in a generic OLTP environment, including Teradata’s interleaved declustering.

Recent work on replication includes []. The authors use data replication to improve data availability and query load balancing, while dealing with consistency problems. They propose a lazy preventive data replication solution in [] and a strategy to scale up the solution in []. The work in [] studies similar approaches when applied to WAN environments. They identify the most crucial bottlenecks of the existing protocols and propose optimizations that alleviate
the identified problems.

While these works focus on generic replication strategies for availability considering non-partitioned relations and/or OLTP loads, we discuss, analyze and evaluate replication strategies in the specific context of the Node Partitioned Data Warehouse and also consider whether the strategies allow multiple nodes to be offline simultaneously for maintenance or management.

3. The NPDW

The NPDW is a design for efficient processing of the data warehouse over low-cost computer nodes on a possibly non-dedicated switched network. The objective is not to assume any specialized hardware or interconnects, so that the NPDW is able to run, for instance, in a switched LAN in the Mbps range. Parallelism is obtained by dividing the data set initially into the disks of individual nodes, so that each node is able to access its data locally, and data is exchanged between nodes when necessary. In order to deliver a near-linear speedup over the NPDW, it is necessary to find suitable partitioning and placement strategies for the data, which may reduce the need to exchange data between nodes. This issue is discussed in detail in [] and we only review it in this paper. Our objective was for the NPDW to be able to process efficiently not only simple star schemas [] but also more complex data warehouse schemas such as TPC-H [].

In a partitioning and placement scheme, each relation can essentially be partitioned (divided into partitions or fragments) or copied in its entirety into all nodes of a group. In order to simplify our discussion, we assume that relations are either copied or partitioned into all nodes, that is, the group is “all nodes”. We also simplify the discussion by considering homogeneous nodes, so that each node has the same load. This constraint can be eliminated by taking into account node performances in the initial placement and subsequent reorganizations for load balancing. Relations that are copied into all nodes are also denoted as replicated relations. The decision to replicate relations for performance reasons is an output from partitioning, not availability-related replication, but the resulting
replicas are of course also useful for availability. In order to distinguish a replica dictated by a partitioning algorithm from one dictated by availability, we denote partitioning replication as P-replication.

Partitioned relations can be divided using a round-robin, random, range or hash-based scheme. The NPDW uses horizontal hash-partitioning, as this approach facilitates key-based tuple location and join operations. Figure 1 shows the partitioning and placement of relations for the TPC-H benchmark [] after the workload-based algorithm in [] was applied (LI: lineitem, O: orders, PS: partsupp, P: part, S: supplier, C: customer).

In that Figure, dashed rectangles represent fully partitioned relations; dashed arrows represent “repartition joins” (RJ), joins that require data to be shipped between nodes; bold arrows represent “equi-partitioned joins” (EJ), joins that do not require data to be shipped between nodes because the intervening data sets are partitioned by the join key; and normal arrows represent “P-replicated joins” (RRJ), joins that do not require data to be shipped between nodes because one of the intervening relations is P-replicated. Repartitioning refers to the need to exchange data between nodes in order to reorganize two data sets so that they become equi-partitioned (partitioned by the same attribute). In order to choose the most appropriate partitioning alternative, we must use a strategy such as workload-based partitioning []. The idea is to choose partitioning keys that “maximize” the amount of EJ as opposed to RJ by looking at the query workload. Additionally, for relations that are small in comparison to the data set that would need to be repartitioned to join with them, RRJ may be preferable [], as it avoids potentially large repartitioning overheads. This is the reason why the smallest relations, C and S, are P-replicated: it avoids the need to ship larger data between nodes to join with those smaller data sets.

Query processing over a parallel database, and in particular over the NPDW, follows roughly the steps in Figure 3, which we describe in more detail in []. Figure 2
illustrates a simple example. Consider a sum query. Each node needs to apply exactly the same initial query (or, more generically, a modified query) on its partial data, and the results are merged by applying a merge query again at the merging node with the partial results coming from the processing nodes.

More generically, the typical query processing cycle is shown in Figure 3 and a complete example is given in the figure “Basic Aggregation Query Steps”. Step 1 prepares the node and merge query components from the original submitted query. Step 2 (“Send Query”) forwards the node query into all nodes in the NPDW, which process the query locally in step 3. Each node then sends its partial result to the submitter node (step 4), which applies the merge query in step 5. Step 6 redistributes results into processing nodes if required (for some queries containing subqueries, in which case more than one processing cycle may be required).

Figure 2 – Typical Query over NPDW [the submitted query, SUM(X) over FACT GROUP BY dimAttrs, is sent to the nodes; each node computes SUM(X) over its fragment of FACT, GROUP BY dimAttrs; the merge computes SUM(SUMs) over UNION(Partial_Sums), GROUP BY dimAttrs]

Figure 3 – Query Processing Steps in NPDW [submitter node: Rewrite Query, Send Query, Apply Merge Query, Redistribute; computing nodes: Compute Partial Result, Send Partial Results]

Figure – Schema in Node X with replicated Schema from Node Y [P and PS partitioned by P_KEY (Px, Py; PSx, PSy); Li and O partitioned by O_KEY (Lix, Liy; Ox, Oy); C and S replicated]

In the rewriting and partial-result steps of the aggregation example, we can see that aggregation primitives are computed at each node. The most common primitives are:
• linear sum (LS = SUM(X));
• sum of squares (SS = SUM(X²));
• number of elements (N);
• extremes (MAX and MIN).

  Query submission:
    Select sum(a), count(a), average(a), max(a), min(a), stddev(a), group_attributes
    From fact, dimensions (join)
    Group by group_attributes
  Query rewriting and distribution to each node:
    Select sum(a), count(a), sum(a x a), max(a), min(a), group_attributes
    From fact, dimensions (join)
    Group by group_attributes
  Compute partial results:
    Select sum(a), count(a), sum(a x a), max(a), min(a), group_attributes
    From fact, dimensions (join)
    Group by group_attributes
  Results collecting:
    Create cached table PRqueryX(node, suma, counta, ssuma, maxa, mina, group_attributes) as <insert received results>
  Results merging:
    Select sum(suma), sum(counta), sum(suma)/sum(counta), max(maxa), min(mina), [standard deviation derived from sum(ssuma), sum(suma) and sum(counta)], group_attributes
    From UNION_ALL(PRqueryX), dimensions (join)
    Group by group_attributes

Figure – Basic Aggregation Query Steps

Although we have extensively discussed and evaluated partitioning and processing choices for the NPDW in previous works, we did not discuss availability, which is nevertheless very important in the potentially unreliable environment in which the NPDW is designed to run.

A discussion of availability for the NPDW brings up several issues: for instance, network failures, failure of the submitter or computing nodes, loading failures, availability monitoring and so on. Each of these issues requires specific solutions. For instance, network failures can be accommodated using backup connections; unavailability of the submitter node can be accommodated by allowing more than one node to be a potential submitter and routing client requests to available nodes. Failure of the submitter node in the middle of query processing can be handled by redirecting partial results to another node or resubmitting the query. These issues are part of our current and future work on the subject. In this paper we restrict our attention to the unavailability of computing nodes, to replication alternatives that achieve high availability, and to processing efficiency in the presence of replication and unavailability.

4. Availability-targeted Replication over NPDW

Consider first that the basic replication unit in NPDW is the node. A whole copy of the relation partitions from one node can be placed in another node and, in case of failure, the replacement node will process “twice” the amount of data: its own node data and the data of the node it is replacing. In practice, P-replicated relations (small dimensions) do not need to be replicated again for availability. The figure “Schema in Node X with replicated Schema from Node Y” shows the schema of a node X with replicated data from another node Y. Node X can now replace node Y in case of unavailability of Y.

We will also discuss in the next section availability strategies that slice the replication units further and divide the slices among more than one node. This strategy improves the
efficiency of processing in case of node unavailability. For instance, Liy in that figure will be replaced by slices Liy_j (j = 1..m) and divided into m nodes. In this case the unit of replication will be the slice.

There is also another requirement concerning replication slices. Consider a partition Li_i of a relation Li that is placed at a node X. As depicted in the partitioning and node schema figures, relations are partitioned by a partitioning key (typically hash-partitioned) and placed in equi-partitioned fashion when possible (e.g. Li and O are both partitioned by O_KEY and tuples with a specific value of O_KEY are placed on the same node). The requirement is that replication slices also be organized by partitioning key in a similar way, so that tuples with the same key will still be co-located.

With respect to query processing with replicas, there are two issues: which nodes process which replicas, and how they extend their processing to handle the replicas. The first issue is a scheduling problem, which is not our main concern in this paper and for which we use a simple greedy solution:

  Each node processes its own data;
  For each unavailable node:
    Choose the replica-holding node with the least load to process its replica
    (if more than one have the same load, choose the closest);

Although this algorithm does not guarantee a balanced distribution of load, it is sufficient for our purposes and, if there is imbalance in the result (e.g. the top load being a node with much more load than the other ones), a second step can try to reallocate the processing of one or more replicas from that node.

We now concentrate on how to handle replicas while processing queries. A node running a replica or slice can process its data set and the replica independently, as if it represented “two virtual nodes” running two independent instances of the cycle in Figure 3. These computations yield two partial results, as if they were the partial results from two separate nodes, which can be merged using step 5 of Figure 3 before sending a single partial result to the merger node. The normal processing resumes as before in step 4, with every node sending its results to the merging node. This strategy is not the most efficient
because the replacement node processes the whole data separately for both virtual nodes and applies an extra merge query.

A better alternative is to scan the union of partitioned relations. Scan operations over partitioned relations now scan both the node’s data and the replica’s data and the query proceeds as in a single node, with the query optimizer choosing the best query plan. This alternative is better because it avoids the extra merging overhead and also the need to join twice with replicated relations (once for each virtual node) that arises with the virtual nodes approach, while scan-union requires a single processing of replicated relations. As the scan-union alternative is more efficient than the virtual nodes approach, we adopted scan-union in NPDW (the experimental evaluation is based on scan-union).

5. Alternative Replication Strategies

In this section we consider and analyze alternative replication strategies. We analyze the advantages of each strategy using the following metrics: degree of fault tolerance (how many nodes can be unavailable or fail simultaneously); efficiency (performance upon node failure); and provision for taking several nodes offline simultaneously for data loading or other management or maintenance activities. For instance, it may be possible to take half the nodes offline for loading while the system remains online, then switch to loading the other half, without ever interrupting the availability of the system.

5.1. Full Replicas (FR)

The simplest replica placement strategy involves replicating each node’s data into at least one other node. In case of failure of one node, a node containing the replica resumes the operation of the failed node.
A simple placement algorithm considering R replicas is:

  Number nodes linearly;
  For each node i:
    For replica r = 1 to R:
      data for node i is also placed in node (i + r) MOD N;

Metrics:
• Degree of fault tolerance: R nodes when considering R replicas;
• Efficiency (performance upon node failure): processing time doubles when a node fails;
• Provision for taking several nodes offline simultaneously: multiple nodes can be taken offline simultaneously, as long as the set of unavailable nodes does not include all R+1 copies of any node. For example, in Figure 6 with two replicas, the shaded boxes may be unavailable and the system still works, because nodes 3, 6 and 9 contain replicas of their two closest neighbors. This suggests that up to N·R/(R+1) nodes can be offline simultaneously, if chosen carefully.

Figure 6 – Availability in FR

The major drawback of this simple strategy is processing efficiency when a few nodes are unavailable. Consider an NPDW system with N homogeneous nodes. Using a simplified linear model, assume that each node contains and processes about 1/N of the data in O(1/N) of the time it would take to process the whole data. If one node fails, the node replacing it with the replica will take (at least) about twice as long, O(2/N), even though all the other nodes will take O(1/N). The replica effort is placed on a single node, even though other nodes are less loaded.

5.2. Fully Partitioned Replicas (FPR)

Instead of keeping full replicas in a single node, processing is much more efficient if replicas are partitioned into as many slices as there are nodes minus one. If there are N nodes, a replica is partitioned into N-1 slices and each slice is placed in one node. The replica of node i is now dispersed into all nodes except node i. The following algorithm can be used to place the slices:

  Number nodes linearly;
  The data for node i is partitioned into N-1 numbered slices, starting at 1;
  For slice x from 1 to N-1:
    Place slice x in node (i + x) MOD N.
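The two placement rules and the simplified linear cost model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: nodes are numbered 0..N-1, the FR rule is taken to place replica r of node i on node (i + r) MOD N so that the R copies land on R different nodes, and the function names (`fr_placement`, `fpr_placement`, `worst_load_one_node_offline`) are hypothetical, not part of the NPDW.

```python
from fractions import Fraction


def fr_placement(n_nodes, n_replicas):
    """Full replicas (FR): node i's data is copied whole to nodes i+1 .. i+R (mod N)."""
    if not 1 <= n_replicas < n_nodes:
        raise ValueError("need 1 <= R < N")
    return {i: [(i + r) % n_nodes for r in range(1, n_replicas + 1)]
            for i in range(n_nodes)}


def fpr_placement(n_nodes):
    """Fully partitioned replica (FPR): slice x of node i goes to node (i + x) mod N."""
    if n_nodes < 2:
        raise ValueError("need at least two nodes")
    return {i: {x: (i + x) % n_nodes for x in range(1, n_nodes)}
            for i in range(n_nodes)}


def worst_load_one_node_offline(n_nodes, strategy):
    """Largest fraction of the whole data set scanned by one surviving node
    when exactly one node is offline (simplified linear model)."""
    share = Fraction(1, n_nodes)              # each node's own data
    if strategy == "FR":
        return 2 * share                      # the replica holder scans a full extra copy
    if strategy == "FPR":
        return share + share / (n_nodes - 1)  # own data plus one slice
    raise ValueError("strategy must be 'FR' or 'FPR'")


if __name__ == "__main__":
    print(fr_placement(4, 1))
    print(fpr_placement(4)[0])
    for name in ("FR", "FPR"):
        print(name, worst_load_one_node_offline(10, name))
```

Under this model, with N = 10 the most loaded surviving node handles 1/5 of the data with FR but only 1/9 with FPR, which is the efficiency argument made for slicing the replica.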
This strategy is the most efficient one because, with N nodes, each replica slice holds 1/(N-1) of a node's data and each node has to process only that extra fraction when a single node is unavailable. If a node becomes unavailable, the remaining nodes process their own data together with the replica slices corresponding to the unavailable node.

However, in this case it is not possible to stop more than one node if there is a single replica, because all nodes that remain active are needed to process a slice from the replica. In order to allow up to R nodes to become unavailable, there must be R non-overlapping replica slice sets. Two replicas are non-overlapping iff the equivalent slices of the two replicas are not placed in the same node.

Consider that R replicas are to be created (tolerance to unavailability of R nodes). In order to avoid slice overlapping, the following placement algorithm is used:

  Number nodes linearly;
  The copy of the data of node i is partitioned into N-1 numbered slices, starting at 1;
  For replica j = 0 to R-1:
    For slice x from 1 to N-1:
      Place slice x in node (i + j + x) MOD N

Metrics:
• Degree of fault tolerance: R nodes, when R replicas are used;
• Efficiency (performance upon node failure): processing time increases proportionally to the size of a slice (fraction 1/(N-1));
• Provision for taking several nodes offline simultaneously: requires multiple non-overlapping replicas.

5.3. Partitioned Replicas (PR)

Replicas may be partitioned into fewer than N-1 slices (in an NPDW with N nodes). If replicas are partitioned into x slices, we denote the strategy by PR(x). If x = N-1, we have a fully partitioned replica.
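One detail of the multi-replica rule is worth checking mechanically: the expression (i + j + x) MOD N can evaluate to i itself once j ≥ 1 (for x = N - j), which would put a replica slice on the very node it is meant to protect. The sketch below is not the paper's algorithm. It shows one possible variant, assumed here for illustration, that rotates the slices of replica j by j positions over the N-1 nodes other than i, together with small checks of the non-overlap property. All function names are hypothetical and nodes are numbered from 0.

```python
def rotated_fpr_placement(n_nodes, n_replicas, owner):
    """Hypothetical variant: replica j rotates the N-1 slices by j positions
    over the nodes other than `owner`. Returns {replica j: {slice x: node}}."""
    others = n_nodes - 1
    if not 1 <= n_replicas <= others:
        raise ValueError("need 1 <= R <= N-1")
    return {
        j: {x: (owner + 1 + (x - 1 + j) % others) % n_nodes
            for x in range(1, n_nodes)}
        for j in range(n_replicas)
    }


def printed_rule_hits_owner(n_nodes, n_replicas, owner):
    """True if (owner + j + x) mod N ever selects the owner node itself."""
    return any((owner + j + x) % n_nodes == owner
               for j in range(n_replicas)
               for x in range(1, n_nodes))


def non_overlapping(placement):
    """Equivalent slices of different replicas must sit on different nodes."""
    slices = next(iter(placement.values())).keys()
    return all(len({placement[j][x] for j in placement}) == len(placement)
               for x in slices)


if __name__ == "__main__":
    placement = rotated_fpr_placement(n_nodes=5, n_replicas=2, owner=0)
    print(placement)
    print("non-overlapping:", non_overlapping(placement))
    print("printed rule reaches owner:", printed_rule_hits_owner(5, 2, 0))
```

With this rotation every slice has R copies on R distinct nodes, none of them the owner, so any slice survives as long as at most R-1 of the nodes holding it are down together with the owner. Other schemes that skip the owner would serve equally well.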
A very simple algorithm to generate fewer than N-1 slices is:

  Number nodes linearly;
  The data for node i is partitioned into X slices, starting at 1;
  For slice set j = 0 to R-1:
    For slice x from 1 to X:
      Place slice x in node (i + j + x) MOD N

If we want y nodes to be able to go offline simultaneously when a single replica is used, then the y nodes must not contain replica slices of each other. In order to achieve this, we can divide the nodes into groups that we want to take offline simultaneously. Then we guarantee by placement that replica slices of the nodes in a group are not placed in any node of that group, and therefore we can take the whole group offline simultaneously for maintenance or other functionality. For instance, Figure 7 shows twelve nodes organized into two groups, G1 and G2. Replicas of each node are PR(6) and the slices are placed in the other group. The labels R1 and R2 in the Figure represent the replicas of the nodes of each group and indicate that they are placed in the other group. The replicas are fully partitioned into the other group.

Figure 7 – Grouping Replicas

Using this strategy, it is possible to take a whole group (6 nodes) offline simultaneously. The system will run slightly slower than if we had a single node offline with a replica fully partitioned over the other nodes, because the slices are larger. This layout guarantees availability under failures of a single node (R=1) but also of any number of nodes from a single group. We denote this strategy by PRG(g,x) (g groups with x elements each) or, for simplicity and considering equal-sized groups, PR(x). It works like FR at the inter-group level and FPR within each group.
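The grouping idea can be illustrated with a short Python sketch. It assumes equal-sized groups made of consecutively numbered nodes (starting at 0) and, as one arbitrary choice among many, places the slices of every node in the next group. The names `prg_placement` and `group_can_go_offline` are hypothetical.

```python
def prg_placement(n_nodes, n_groups):
    """Single replica per node, sliced over the nodes of the next group.
    Returns {node: [node holding slice 1, node holding slice 2, ...]}."""
    if n_groups < 2 or n_nodes % n_groups != 0:
        raise ValueError("need at least two equal-sized groups")
    size = n_nodes // n_groups
    placement = {}
    for node in range(n_nodes):
        target = (node // size + 1) % n_groups   # assumption: use the next group
        first = target * size                    # first node of the target group
        placement[node] = [first + (node + s) % size for s in range(size)]
    return placement


def group_can_go_offline(placement, n_nodes, n_groups, group):
    """A group may be stopped if none of its members' slices live inside it."""
    size = n_nodes // n_groups
    offline = set(range(group * size, (group + 1) * size))
    return all(offline.isdisjoint(placement[node]) for node in offline)


if __name__ == "__main__":
    layout = prg_placement(n_nodes=12, n_groups=2)   # the twelve-node, two-group case
    print(layout[0], layout[7])
    print([group_can_go_offline(layout, 12, 2, g) for g in range(2)])
```

For the twelve-node, two-group layout each node's six slices all fall in the other group, so either group can be stopped as a whole while the other one serves the slices.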
If we use this strategy with R replicas and R+1 groups, the system can tolerate failures or unavailability of nodes from up to R groups. More groups allow more nodes to be unavailable, but slices will be larger, leading to possibly slower processing when groups are offline.

Metrics:
• Degree of fault tolerance: X nodes from a single group; if R replicas over R+1 groups are used, the system can tolerate failures or unavailability of nodes from up to R groups;
• Efficiency (performance upon node failure): processing time increases proportionally to the size of a slice (fraction 1/X);
• Provision for taking several nodes offline simultaneously: whole groups can be taken offline.

6. Comparative Analysis

In this analysis we focus on the balance between efficient availability, by analyzing the performance under node unavailability, and the flexibility to take multiple nodes offline. We consider the use of full replicas (FR), fully partitioned replicas (FPR) and partitioned replicas (PR).

The analysis involved measuring the response time of NPDW on low-cost PCs. TPC-H [] was manually set up on two NPDW configurations, the second with half as many nodes as the first, with partitioning and placement as described in Section 3. We then measured the response time for a TPC-H query without nodes offline and compared the result to the response time with nodes offline. The query is reproduced below for reference (the query parameters were generated as described in the TPC-H specification and the results are averages over repeated runs).

  Select nation, o_year, sum(amount) as sum_profit
  from (
    Select n_name as nation, year(o_orderdate) as o_year,
           l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
    from tpcd.part, tpcd.supplier, tpcd.lineitem, tpcd.partsupp, tpcd.orders, tpcd.nation
    where s_suppkey = l_suppkey and ps_suppkey = l_suppkey
      and ps_partkey = l_partkey and p_partkey = l_partkey
      and o_orderkey = l_orderkey and s_nationkey = n_nationkey
      and p_name like x and n_nationkey > y
      and o_orderpriority = 'z' and ps_availqty > w
  ) as profit
  group by nation, o_year
  order by nation, o_year desc;

Figure 8 shows the response time (min:sec) when part of the nodes are offline. The alternatives compared are: “online” (every node is online); FPR (fully partitioned replicas); PR(x) (partitioned replicas) in two group configurations, one of them with two groups; and FR (full replicas). The figure also shows the minimum number of replicas that are necessary to provide the required availability. These results show the much larger penalty incurred by FR and the excessive number of replicas required for FPR to allow that many nodes offline simultaneously. PR with two groups is a good choice, as it requires a single replica and obtains a good response time simultaneously.

Figure 8 – Response Time / Replicas (nodes fail, TPC-H query) [series: online, FPR, PR, PR, FR; axes: response time (min:sec), nr. of replicas]

The results for NPDW with the smaller number of nodes are shown in Figure 9. In this case we consider a different number of unavailable nodes and a different pair of PR configurations.

Figure 9 – Response Time / Replicas (nodes fail, TPC-H query) [series: online, FPR, PR, PR, FR; axes: response time (min:sec), nr. of replicas]

The trend is similar to the one observed in Figure 8, the main difference being that the response times are much larger in every case, because there are only half the number of nodes. In this case one of the PR configurations seems to be the best choice, as it avoids the cost of FR or of the other PR configuration and, simultaneously, the requirement of FPR that there be multiple replicas of each node.

Figure 10 compares the response times of the two NPDW configurations. These results show that, although the response time with fewer nodes is much larger than that with more nodes, as expected, the comparison between alternative replication schemes follows a similar trend.

Figure 10 – Comparison between the two node counts [series: online, FPR, PR(5), PR(10), PR(2), PR(5), FR; axes: response time (min:sec) for each node count, nr. of replicas]

These experimental results have shown that it is advantageous to consider partitioned replicas instead of simply full replicas if the system is to offer efficient availability. With such a capability, the system can be always-on and always efficient, even though parts of it are taken offline for maintenance or management functions such as loading with new data or DBA functionality.

We are currently testing the strategies over additional query workloads with varied characteristics.

7. Conclusions and Future Work

The work presented in this paper focused on replication for efficient availability on the Node Partitioned Data Warehouse (NPDW). After reviewing placement and processing issues over the NPDW, we have compared alternative replica strategies using metrics that included efficiency, degree of tolerance to node failures and capacity to allow multiple nodes to be offline simultaneously. The alternatives, ranging from full replication to various degrees of partitioned replication, were compared experimentally from the perspective of performance degradation when nodes go offline. We concluded that replicas partitioned by groups are the most advantageous alternative for NPDW if we consider both performance and flexibility in allowing multiple nodes to be taken offline simultaneously for maintenance or loading reasons. Besides extensive testing of the approaches, our future work on this subject includes automating replication and recovery, as well as automated data warehouse loading with the system always-on, using the PR strategies described in this paper.

References

[] Copeland G., Tom Keller, “A comparison of high-availability media recovery techniques”, in Procs. of the ACM International Conf. on Management of Data.
[] Coulon C., E. Pacitti, P. Valduriez, “Scaling up the Preventive Replication of Autonomous Databases in Cluster Systems”, Vecpar, International Conference, Valencia, Spain, June.
[] DeWitt D., Gerber R., “Multiprocessor Hash-Based Join Algorithms”, Proceedings of the Eleventh Conference on Very Large Databases, Stockholm, Sweden, August.
[] Furtado P., “The Issue of Large Relations in Node-Partitioned Data Warehouses”, International Conference on Database Systems for Advanced Applications (DASFAA), Beijing, China, April.
[] Furtado P., “Experimental Evidence on Partitioning in Parallel Data Warehouses”, DOLAP Workshop of the Int’l Conference on Information and Knowledge Management (CIKM), Washington, November.
[] Furtado P., “Efficiently Processing Query-Intensive Databases over a Non-dedicated Local Network”, Nineteenth International Parallel and Distributed Processing Symposium, Denver, Colorado, USA, May.
[] Hsiao H., David J. DeWitt, “Chained Declustering: A New Availability Strategy for Multiprocessor Database Machines”, ICDE.
[] Hsiao H., David J. DeWitt, “A Performance Study of Three High Availability Data Replication Strategies”, PDIS.
[] Hsiao H., David J. DeWitt, “Replicated Data Management in the Gamma Database Machine”, Workshop on the Management of Replicated Data.
[] Kimball R., The Data Warehouse Toolkit, New York, J. Wiley & Sons.
[] Kitsuregawa M., Tanaka H., and Motooka T., “Application of Hash to Database Machine and its Architecture”, New Generation Computing.
[] Lin Y., B. Kemme, R. Jimenez-Peris, “Consistent Data Replication: Is it feasible in WANs?”, International Euro-Par Conference, Lisboa, Portugal, August.
[] Pacitti E., M. Özsu, C. Coulon, “Preventive Multi-Master Replication in a Cluster of Autonomous Databases”, International Euro-Par Conference, Klagenfurt, Austria, August.
[] Rao J., Zhang C., Megiddo N., Lohman G., “Automating Physical Database Design in a Parallel Database”, ACM International Conference on Management of Data, Madison, Wisconsin, USA, June.
[] Tandem Database Group, “NonStop SQL: A Distributed, High-Performance, High-Reliability Implementation of SQL”, Workshop on High Perform. Trans. Sys., CA, Sept.
[] Teradata, DBC Database Computer System Manual, Teradata, Nov.
[] TPC Benchmark H, Transaction Processing Council, June. Available at http://www.tpc.org/
[] Yu C. T. and Meng W., Principles of Database Query Processing for Advanced Applications, Morgan Kaufmann.
[] Zilio D. C., Jhingran A., Padmanabhan S., “Partitioning Key Selection for a Shared-Nothing Parallel Database System”, IBM Research Report RC.