Replication in Node Partitioned Data Warehouses

Pedro Furtado
University of Coimbra
Departamento de Engenharia Informática
Pólo II, Pinhal de Marrocos
Coimbra
Portugal
pnf@dei.uc.pt

Abstract
In this paper we concentrate on guaranteeing efficient availability and promoting manageability in a node-partitioned data warehouse (NPDW). The objective is that the system be always-on and always efficient, even when entire parts of it are taken offline for maintenance and management functions, such as loading with new data or other DBA functionality. Replication has already been studied for parallel databases in general. We investigate how alternative replication strategies can be applied to the NPDW context and analyze advantages and drawbacks against metrics.

1. Introduction
Parallel architectures can significantly speed up the processing over large data warehouses. We have been pursuing the idea of replacing fully-dedicated and powerful servers by a possibly non-dedicated network of low-cost, underutilized computers to hold and process data warehouses. The data warehouse can reach giga or even terabytes and is typically organized as a set of multidimensional schemas []. There are typically some very big relations – facts, storing historical detail such as each individual sale of each product in each store of a retail chain – and smaller relations – dimensions – with descriptive properties for the dimensions (e.g. product, store, time). In that context, partitioning refers to dividing relations into nodes somehow to take advantage of parallel node processing. We have discussed horizontal partitioning strategies for NPDW in [] and showed that a careful partitioning strategy over a switched network environment can achieve acceptable speedups. However, availability is an issue in such a context, so that availability-oriented replication becomes a major necessity as a way to provide availability. A replica is a "standby" copy of some data that can be activated at any moment in case of unavailability or failure of the node holding the "original", so that processing resumes as usual. If processing with unavailable nodes is implemented efficiently, unavailability becomes less onerous to the whole system and it also becomes feasible to stop a set of nodes for data loading, maintenance, upgrading or other management activities without any major repercussions to processing. The system remains always-on and always efficient.
Replica placement has been studied in the context of generic parallel and distributed databases, in which the relations are not partitioned []. We review those works in the related work section. In this paper we discuss replication for availability in the NPDW context and discuss their use for both tolerating node failures and allowing multiple nodes to be offline simultaneously for loading or administration. We compare the approaches from the perspective of efficiency. Our main contributions include: showing how replication strategies can be applied to a workload-based pre-partitioned NPDW setting and how processing can incorporate the replicas in case of node failures; analyzing alternatives against relevant metrics; evaluating the alternatives with emphasis on efficiency and flexibility for allowing multiple offline nodes; and analyzing the tradeoff between efficiency and the capacity to take multiple nodes offline simultaneously.
The paper is organized as follows: section 2 discusses related work. Section 3 overviews the Node Partitioned Data Warehouse. Sections 4 and 5 discuss replication alternatives and section 6 compares the approaches. Section 7 contains concluding remarks and future work.
Figure 1. Basic Partitioning Example in NPDW (TPC-H schema)

2. Related Work
The most relevant related work for this paper concerns replication strategies, but we also briefly review partitioning. Some of the most promising partitioning and placement approaches focus on query workload-based partitioning choice []. The idea in those works is to use the query workload to determine the most appropriate partitioning attributes, which should be related to typical query access patterns. All those works focus mainly on hash-partitioning for efficient parallel join processing [], also reviewed in []. Our previous work on the NPDW [] proposes and analyzes generic data partitioning strategies, independently of the underlying database server and targeted at node partitioned data warehouses. Our purpose in this paper is to add availability and replication concerns to the NPDW design.
Replication has been studied in the parallel database context. In Tandem's NonStop SQL [], the use of mirrored disk drives offers a high level of availability but does a poor job of distributing the load of a failed processor. If a processor fails, the substitute processor will have to handle the disks of the failed processor as well as its own, essentially doubling the processing time. In this paper we apply this strategy as the Full replication (FR) option. Teradata's scheme [] assumes relation clusters (groups of nodes) and can back up a partitioned copy of a relation by placing it in the N-1 other nodes of the relation cluster with N nodes. Although this scheme balances the processing in case of failures, if more than one node is unavailable in the cluster the system stops. In chained declustering [], two declustered copies are kept, such that the fragments of the second declustered copy are placed in different nodes from the ones of the primary copy. This strategy improves availability while maintaining the performance level of the Teradata scheme. In [], interleaved declustering divides the disks into clusters and fully declusters relation partitions into the corresponding cluster. In [] the authors compare high-availability media recovery techniques in a generic OLTP environment, including Teradata's interleaved declustering.
Recent work on replication includes []. The authors use data replication to improve data availability and query load balancing, while dealing with consistency problems. They propose a lazy preventive data replication solution in [] and a strategy to scale up the solution in []. The work in [] studies similar approaches when applied to WAN environments. They identify the most crucial bottlenecks of the existing protocols and propose optimizations that alleviate the identified problems.
While these works focus on generic replication strategies for availability, considering non-partitioned relations and/or OLTP loads, we discuss, analyze and evaluate replication strategies in the specific context of the Node Partitioned Data Warehouse, and also consider whether the strategies allow multiple nodes to be offline simultaneously for maintenance or management.

3. The NPDW
The NPDW is a design for efficient processing of the data warehouse over low-cost computer nodes on a possibly non-dedicated switched network. The objective is not to assume any specialized hardware or interconnects, so that the NPDW is able to run, for instance, in a switched LAN. Parallelism is obtained by dividing the data set initially into the disks of individual nodes, so that each node is able to access its data locally, and data is exchanged between nodes when necessary. In order to deliver a near to linear speedup over the node-partitioned NPDW context, it is necessary to find suitable partitioning and placement strategies for the data, which may reduce the need to exchange data between nodes. This issue is discussed in detail in [] and we only review it in this paper. Our objective was for the NPDW to be able to process efficiently not only simple star schemas [] but also more complex data warehouse schemas such as TPC-H []. In a partitioning and placement scheme, each relation can essentially be partitioned (divided into partitions or fragments) or copied in its entirety into all nodes of a group. In order to simplify our discussion, we assume that relations are either copied or partitioned into all nodes, that is, the group is "all nodes". We also simplify the discussion by considering homogeneous nodes, so that each node has the same load. This constraint can be eliminated by taking into account node performances in the initial placement and subsequent reorganizations for load balancing. Relations that are copied into all nodes are also denoted as replicated relations. The decision to replicate relations for performance reasons is an output from partitioning, not availability-related replication, but the resulting replicas are of course also useful for availability. In order to distinguish a replica dictated by a partitioning algorithm from one dictated by availability, we denote partitioning replication as P-replication.
Partitioned relations can be divided using a round-robin, random, range or hash-based scheme. The NPDW uses horizontal hash-partitioning, as this approach facilitates key-based tuple location and join operations. Figure 1 shows the partitioning and placement of relations for the TPC-H benchmark [] after the workload-based algorithm in [] was applied (LI – lineitem, O – orders, PS – partsupp, P – part, S – supplier, C – customer). In that Figure, dashed rectangles represent fully-partitioned relations; dashed arrows represent "repartition joins" (RJ) – joins that require data to be shipped between nodes; bold arrows represent "equi-partitioned joins" (EJ) – joins that do not require data to be shipped between nodes because the intervening data sets are partitioned by the join key; and normal arrows represent "P-replicated joins" (RRJ) – joins that do not require data to be shipped between nodes because one of the intervening relations is P-replicated. Repartitioning refers to the need to exchange data between nodes in order to reorganize two data sets so that they become equi-partitioned (partitioned by the same attribute). In order to choose the most appropriate partitioning alternative, we must use a strategy such as workload-based partitioning []. The idea is to choose partitioning keys that "maximize" the amount of EJ as opposed to RJ by looking at the query workload. Additionally, for relations that are small in comparison to the data set that would need to be repartitioned to join with them, RRJ may be preferable [], as it avoids potentially large repartitioning overheads. This is the reason why the smallest relations, C and S, are P-replicated, as it avoids the need to ship larger data between nodes to join with those smaller data sets.
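The effect of equi-partitioning can be illustrated with a small sketch. It assumes a plain modulo hash on the partitioning key and toy tuples; the names `node_for`, `partition` and `local_join` are hypothetical helpers, not part of the NPDW implementation.

```python
from collections import defaultdict

def node_for(key: int, n_nodes: int) -> int:
    """Map a partitioning-key value to a node (simple modulo hash, an assumption)."""
    return key % n_nodes

def partition(rows, key_of, n_nodes):
    """Split rows into per-node fragments by hashing the partitioning key."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[node_for(key_of(row), n_nodes)].append(row)
    return fragments

# Toy data: orders (o_key,) and lineitems (o_key, line_no), both partitioned by O_KEY.
orders = [(k,) for k in range(8)]
lineitems = [(k, line) for k in range(8) for line in range(2)]

N = 4
o_frag = partition(orders, lambda r: r[0], N)
li_frag = partition(lineitems, lambda r: r[0], N)

def local_join(node):
    """Equi-partitioned join: each node joins only its own fragments."""
    o_keys = {o[0] for o in o_frag[node]}
    return [li for li in li_frag[node] if li[0] in o_keys]

# Every lineitem finds its order locally, so no tuple is shipped between nodes.
joined = [row for node in range(N) for row in local_join(node)]
```

Because both relations hash on the same key, the union of the local joins equals the full join; a relation partitioned on a different key would first have to be repartitioned.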
Query processing over a parallel database, and in particular over the NPDW, follows roughly the steps in Figure 2, which we describe in more detail in []. Figure 2 illustrates a simple example. Consider a sum query. Each node needs to apply exactly the same initial query (or, more generically, a modified query) on its partial data, and the results are merged by applying a merge query again at the merging node with the partial results coming from the processing nodes.
More generically, the typical query processing cycle is shown in Figure 3 and a complete example is given in Figure 5. Step 1 prepares the node and merge query components from the original submitted query. Step 2, "Send Query", forwards the node query into all nodes in the NPDW, which process the query locally in step 3. Each node then sends its partial result into the submitter node, which applies the merge query in step 5. Step 6 redistributes results into processing nodes if required (for some queries containing subqueries, in which case more than one processing cycle may be required).
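Figure 2 – Typical Query over NPDW [the query SUM(X) over FACT GROUP BY dimAttrs is sent to the nodes; each node computes SUM(X) over its FACT partition GROUP BY dimAttrs; the merge computes SUM(SUMs) over UNION(Partial_Sums) GROUP BY dimAttrs]
Figure 3 – Query Processing Steps in NPDW [submitter node and computing nodes; steps: Rewrite Query, Send Query, Compute Partial Result, Send Partial Results, Apply Merge Query, Redistribute]
Figure 4. Schema in Node X with replicated Schema from Node Y [node X holds its own partitions Px, PSx, Lix, Ox, the replicated partitions Py, PSy, Liy, Oy, and the P-replicated relations C and S; P and PS are partitioned by P_KEY, Li and O by O_KEY]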
In Figure 5 we can see that aggregation primitives are computed at each node. The most common primitives are:
Linear sum (LS) = SUM(X);
Sum of squares (SS) = SUM(X²);
Number of elements (N);
Extremes MAX and MIN.

Query submission:
Select sum(a), count(a), average(a), max(a), min(a), stddev(a), group_attributes
From fact, dimensions (join)
Group by group_attributes;
Query rewriting and distribution to each node:
Select sum(a), count(a), sum(a x a), max(a), min(a), group_attributes
From fact, dimensions (join)
Group by group_attributes;
Compute partial results:
Select sum(a), count(a), sum(a x a), max(a), min(a), group_attributes
From fact, dimensions (join)
Group by group_attributes;
Results collecting:
Create cached table PRqueryX(node, suma, counta, ssuma, maxa, mina, group_attributes)
as <insert received results>;
Results merging:
Select sum(suma), sum(counta), sum(suma)/sum(counta), max(maxa), min(mina),
<stddev computed from sum(ssuma), sum(suma) and sum(counta)>, group_attributes
From UNION_ALL(PRqueryX), dimensions (join)
Group by group_attributes;
Figure 5 – Basic Aggregation Query Steps
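The merge of partial aggregates can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names `partial_aggregate` and `merge` are hypothetical, every node holds at least one value, and the standard deviation is the population one, derived from the primitives as sqrt(SS/N - (LS/N)²); the paper's exact merge expression is not recoverable from the source.

```python
import math

def partial_aggregate(values):
    """Per-node primitives: linear sum, sum of squares, count, max, min."""
    return {
        "ls": sum(values),
        "ss": sum(v * v for v in values),
        "n": len(values),
        "max": max(values),
        "min": min(values),
    }

def merge(partials):
    """Merge step: combine the primitives received from all nodes."""
    ls = sum(p["ls"] for p in partials)
    ss = sum(p["ss"] for p in partials)
    n = sum(p["n"] for p in partials)
    mean = ls / n
    variance = ss / n - mean * mean  # population variance from the primitives
    return {
        "sum": ls,
        "count": n,
        "average": mean,
        "max": max(p["max"] for p in partials),
        "min": min(p["min"] for p in partials),
        "stddev": math.sqrt(max(variance, 0.0)),
    }

# Three nodes, each holding a fragment of the fact data.
node_data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
result = merge([partial_aggregate(d) for d in node_data])
```

Averages and standard deviations cannot be merged directly, which is why the rewritten node query ships sums, counts and sums of squares instead.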
Although we have extensively discussed and evaluated partitioning and processing choices for the NPDW in previous works, we did not discuss availability, which is nevertheless very important in the potentially unreliable environment in which NPDW is designed to run.
A discussion of availability for the NPDW brings up several issues: for instance, network failures, failure of the submitter or computing nodes, loading failures, availability monitoring and so on. Each of these issues requires specific solutions. For instance, network failures can be accommodated using backup connections; unavailability of the submitter node can be accommodated by allowing more than one node to be a potential submitter and routing client requests into available nodes. Failure of the submitter node in the middle of query processing can be handled by redirecting partial results into another node or resubmitting the query. These issues are part of our current and future work on the subject. In this paper we restrict our attention to the unavailability of computing nodes: replication alternatives to achieve high availability, and processing efficiency in the presence of replication and unavailability.

4. Availability-targeted Replication over NPDW
Consider first that the basic replication unit in NPDW is the node. A whole copy of the relation partitions from one node can be placed in another node and, in case of failure, the replacement node will process "twice" the amount of data – its own node data and the data of the node it is replacing. In practice, P-replicated relations (small dimensions) do not need to be replicated again for availability. Figure 4 shows the schema of a node X with replicated data from another node Y. Node X can now replace node Y in case of unavailability of Y.
We will also discuss in the next section availability strategies that slice the replication units further and divide the slices among more than one node. This strategy improves the efficiency of processing in case of node unavailability. For instance, Liy in Figure 4 will be replaced by Liyj (j = 1, ..., m) and divided into m nodes. In this case the unit of replication will be the slice.
There is also another requirement concerning replication slices. Consider a partition Lii of a relation Li that is placed at a node X. As depicted in Figures 1 and 4, relations are partitioned by a partitioning key (typically hash-partitioned) and placed in equi-partitioned fashion when possible: e.g. Li and O are both partitioned by O_KEY, and tuples with a specific value of O_KEY are placed on the same node. The requirement is that replication slices also be organized by partitioning key in a similar way, so that tuples with the same key will still be co-located.
With respect to query processing with replicas, there are two issues: which nodes process which replicas, and how they extend their processing to handle the replicas. The first issue is a scheduling problem, which is not our main concern in this paper and for which we use a simple greedy solution:
Each node processes its own data;
For each unavailable node:
Choose the replica-holding node with less load to process its replica
(if more than one have the same load, choose the closest).
Although this algorithm does not guarantee a balanced distribution of load, it is sufficient for our purposes and, if there is imbalance in the result (e.g. the top load being a node with much more load than the other ones), a second step can try to reallocate the processing of one or more replicas from that node.
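The greedy rule above can be sketched as follows. The sketch makes several assumptions that are not in the paper: load is counted in units of one node's data, each unavailable node has a known list of replica holders, "closest" is measured by a caller-supplied distance function, and the name `schedule` is hypothetical.

```python
def schedule(nodes, unavailable, replica_holders, distance):
    """Greedy assignment of each unavailable node's replica to the least-loaded holder."""
    # Each available node starts with its own data (load 1).
    load = {n: 1 for n in nodes if n not in unavailable}
    assignment = {}
    for failed in sorted(unavailable):
        candidates = [h for h in replica_holders[failed] if h in load]
        if not candidates:
            raise RuntimeError(f"no available replica for node {failed}")
        # Least load first; ties are broken by distance to the failed node.
        chosen = min(candidates, key=lambda h: (load[h], distance(failed, h)))
        assignment[failed] = chosen
        load[chosen] += 1
    return assignment, load

# Six nodes on a ring; node i is replicated on its next two neighbours.
nodes = list(range(6))
holders = {i: [(i + 1) % 6, (i + 2) % 6] for i in nodes}
ring = lambda a, b: min((a - b) % 6, (b - a) % 6)
assignment, load = schedule(nodes, {0, 1}, holders, ring)
```

In this toy run node 2 takes over node 0 and node 3 takes over node 1; no rebalancing step is attempted.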
We now concentrate on how to handle replicas while processing queries. A node running a replica or slice can process its own data set and the replica independently, as if it represented "two virtual nodes" running two independent instances of the cycle in Figure 3. These computations yield two partial results, as if they were the partial results from two separate nodes, which can be merged locally with the merge query of Figure 3 before sending a single partial result to the merger node. Normal processing then resumes as before, with every node sending its results to the merging node.
This strategy is not the most efficient, because the replacement node processes the whole data separately for both virtual nodes and applies an extra merge query. A better alternative is to scan the union of the partitioned relations. Scan operations over partitioned relations now scan both the node's data and the replica's data, and the query proceeds as in a single node, with the query optimizer choosing the best query plan. This alternative is better because it avoids the extra merging overhead and also the need to join twice with replicated relations, which arises in the virtual nodes approach (one join for each virtual node), while scan-union requires a single processing of replicated relations. As the scan-union alternative is more efficient than the virtual nodes approach, we adopted scan-union in NPDW (the experimental evaluation is based on scan-union).
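The difference between the two alternatives can be made concrete with a toy sketch. It is only an illustration under assumed data: a tiny P-replicated dimension, a node's own fact partition and one replica partition, with `aggregate` as a hypothetical stand-in for the node query (join plus grouped sum).

```python
from collections import Counter

dim = {1: "red", 2: "blue"}              # P-replicated dimension: key -> attribute
own_fact = [(1, 10), (2, 5)]             # (dim_key, amount) rows owned by node X
replica_fact = [(1, 7), (2, 1), (2, 4)]  # replica of node Y's partition held by X

def aggregate(fact_rows):
    """Join fact rows with the dimension and sum amounts per attribute."""
    totals = Counter()
    for key, amount in fact_rows:
        totals[dim[key]] += amount
    return totals

# Virtual nodes: two independent runs (two joins) plus an extra local merge.
virtual = aggregate(own_fact) + aggregate(replica_fact)

# Scan-union: one scan over both partitions, a single join and aggregation.
scan_union = aggregate(own_fact + replica_fact)
```

Both produce the same partial result; scan-union simply does it in one pass over the dimension and without the local merge.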

5. Alternative Replication Strategies
In this section we consider and analyze alternative replication strategies. We analyze each strategy against the following metrics: degree of fault tolerance (how many nodes can be unavailable or fail simultaneously); efficiency (performance upon node failure); and provision for taking several nodes offline simultaneously for data loading or other management or maintenance activities. For instance, it may be possible to take half the nodes offline for loading while the system remains online, then switch to loading the other half, without the system ever becoming unavailable.
5.1. Full Replicas (FR)
The simplest replica placement strategy involves replicating each node's data into at least one other node. In case of failure of one node, a node containing the replica takes over the work of the failed node. A simple placement algorithm considering R replicas is:
Number nodes linearly;
For each node i
For replica r = 1 to R
data for node i is also placed in node (i+r) MOD N;
Metrics:
• Degree of fault tolerance: R nodes when considering
R replicas;
• Efficiency (performance upon node failure):
processing time doubles when a node fails;
• Provision for taking several nodes offline
simultaneously: can take multiple nodes offline
simultaneously, as long as the set of unavailable nodes
does not include all R+1 copies of any node. For
example, in Figure 6 with two replicas, shaded boxes
may be unavailable and the system still works, because
nodes 3, 6 and 9 contain replicas of their two closest
neighbors. This suggests that up to N·R/(R+1) nodes can
be offline simultaneously, if chosen carefully.
Figure 6. Availability in FR
The major drawback of this simple strategy is
processing efficiency when a few nodes become
unavailable. Consider an NPDW system with N
homogeneous nodes. Using a simplified linear model,
assume that each node contains and processes about
1/N of the data in O(1/N) of the time it would take to
process the whole data. If one node fails, the node
replacing it with the replica will take (at least) about
twice as long, O(2/N), even though all the other nodes
will take O(1/N). The replica effort is placed on a single
node, even though other nodes are less loaded.
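The FR placement and its tolerance to offline nodes can be checked with a short sketch. It assumes 0-based node numbers and the placement rule "replica r of node i goes to node (i+r) MOD N"; `fr_placement` and `survives` are hypothetical helper names.

```python
def fr_placement(n_nodes, n_replicas):
    """holders[i] = nodes storing a copy of node i's data (the owner first)."""
    return {
        i: [i] + [(i + r) % n_nodes for r in range(1, n_replicas + 1)]
        for i in range(n_nodes)
    }

def survives(holders, offline):
    """The system keeps working if every node's data has at least one online copy."""
    return all(any(h not in offline for h in copies) for copies in holders.values())

holders = fr_placement(n_nodes=9, n_replicas=2)

# Keep only 0-based nodes 2, 5 and 8 online (nodes 3, 6 and 9 when counting from 1).
offline_ok = {0, 1, 3, 4, 6, 7}
# Taking a node offline together with both of its replica holders loses its data.
offline_bad = {0, 1, 2}

ok = survives(holders, offline_ok)
bad = survives(holders, offline_bad)
```

With N = 9 and R = 2, six nodes (N·R/(R+1)) can be offline at once, but only if they are chosen so that one copy of every node's data stays online; each surviving node then carries three nodes' worth of data.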
5.2. Fully Partitioned Replicas (FPR)
Instead of having full replicas in a single node, much
more efficiency results if replicas are partitioned into as
many slices as there are nodes minus one. If there are N
nodes, a replica is partitioned into N-1 slices and each
slice is placed in one node. The replica of node i is now
dispersed into all nodes except node i. The following
algorithm can be used to place the slices:
Number nodes linearly;
The data for node i is partitioned into N-1 numbered
slices, starting at 1;
For slice x from 1 to N-1:
Place slice x in node (i+x) MOD N .
This strategy is the most efficient one because,
considering N nodes, each replica slice holds 1/(N-1) of
a node's data and each node has to process only that
extra fraction when a single node is unavailable. If a
node becomes unavailable, the remaining nodes
process their own data together with the replica slices
corresponding to the unavailable node. However, with a
single replica it is not possible to stop more than one
node, because all nodes that remain active are needed
to process a slice from the replica. In
order to allow up to R nodes to become unavailable,
there must be R non-overlapping replica slice sets. Two
replicas are non-overlapping iff the equivalent slices of
the two replicas are not placed in the same node.
Consider that R replicas are to be created (tolerance to
unavailability of R nodes). In order to avoid slice
overlapping, and to keep every slice away from node i
itself, the following placement algorithm is used:
Number nodes linearly;
The copy of the data of node i is partitioned into N-1
numbered slices, starting at 1;
For j=0 to R-1:
For slice x from 1 to N-1:
Place slice x in node (i + 1 + ((j + x - 1) MOD (N-1))) MOD N
Metrics:
• Degree of fault tolerance: R nodes, when R replicas
are used;
• Efficiency (performance upon node failure):
processing time increases proportionally to size of slice
(fraction 1/(N-1));
• Provision for taking several nodes offline
simultaneously: need multiple non-overlapping
replicas.
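The FPR placement can be sketched and checked as follows. The sketch assumes 0-based node numbers and a rotation of the slice order by j that always skips the owner node, which is one way to obtain non-overlapping slice sets for R up to N-1; `fpr_placement` and `extra_work` are hypothetical names.

```python
def fpr_placement(n_nodes, n_replicas):
    """placement[(i, j, x)] = node holding slice x of replica j of node i's data."""
    placement = {}
    for i in range(n_nodes):
        for j in range(n_replicas):
            for x in range(1, n_nodes):
                # Rotate the slice order by j, always skipping the owner node i.
                offset = 1 + (x - 1 + j) % (n_nodes - 1)
                placement[(i, j, x)] = (i + offset) % n_nodes
    return placement

def extra_work(placement, failed, n_nodes):
    """Extra fraction of one node's data scanned by each survivor, using replica 0."""
    work = {n: 0.0 for n in range(n_nodes) if n != failed}
    for (i, j, x), node in placement.items():
        if i == failed and j == 0:
            work[node] += 1.0 / (n_nodes - 1)
    return work

N, R = 5, 2
placement = fpr_placement(N, R)
work = extra_work(placement, failed=0, n_nodes=N)
```

With five nodes, the failure of node 0 adds a quarter of a node's data to each survivor, compared with a full extra node's data on a single replacement node under FR.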
5.3. Partitioned Replicas (PR)
Replicas may be partitioned into fewer than N-1 slices (in
an NPDW with N nodes). If replicas are partitioned into x
slices, we denote it by PR(x). If x=N-1, we have a fully
partitioned replica. A very simple algorithm to generate
X slices, with X less than N-1 and X+R no larger than N
(so that no slice is placed on node i itself), is:
Number nodes linearly;
The data for node i is partitioned into X slices,
numbered starting at 1;
For slice set j=0 to R-1:
For slice x from 1 to X:
Place slice x in node (i+j+x) MOD N
If we desire y nodes to be able to come offline
simultaneously when a single replica is used, then the y
nodes must not contain replica slices of each other. In
order to achieve this, we can divide the nodes into
groups that we want to take offline simultaneously.
Then we guarantee by placement that replica slices of
the nodes in a group are not placed in any node of that
group and therefore we can take the whole group
offline simultaneously for maintenance or other
functionality.
For instance, Figure 7 shows twelve nodes organized
into two groups G1 and G2. Replicas of each node are
PR(6) and the slices are placed in the other group. The
labels R1 and R2 in the Figure represent the replicas of
nodes of each group and indicate that they are placed in
the other group. The replicas are fully partitioned into
the other group.
Figure 7. Grouping Replicas [two groups of nodes, G1 and G2, each holding the replica slices (R2, R1) of the other group]
Using this strategy, it is possible to take a whole group
(6 nodes) offline simultaneously. The system will run
slightly slower than with a single node offline and
replicas fully partitioned over all twelve nodes, because
slices are larger. This layout guarantees availability
under the failure of a single node (R=1) but also of any
number of nodes from a single group.
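The two-group layout of Figure 7 can be sketched as follows. The sketch assumes two equal-sized groups and a single replica per node, with slice x of a node stored on the x-th member of the other group; `prg_placement`, `can_take_offline` and `load` are hypothetical names, and load is measured in units of one node's data.

```python
def prg_placement(groups):
    """For two equal-sized groups, spread each node's replica over the other group."""
    g1, g2 = groups
    placement = {}
    for own, other in ((g1, g2), (g2, g1)):
        for node in own:
            # Slice x of `node` goes to the x-th member of the other group.
            placement[node] = {x + 1: other[x] for x in range(len(other))}
    return placement

def can_take_offline(placement, offline):
    """True if every offline node still has all of its replica slices on online nodes."""
    return all(
        holder not in offline
        for node in offline
        for holder in placement[node].values()
    )

def load(placement, offline, node):
    """Work of an online node: its own data plus the replica slices it serves."""
    extra = sum(
        1.0 / len(placement[failed])
        for failed in offline
        for holder in placement[failed].values()
        if holder == node
    )
    return 1.0 + extra

g1, g2 = list(range(0, 6)), list(range(6, 12))
placement = prg_placement((g1, g2))
whole_group = can_take_offline(placement, set(g1))
mixed = can_take_offline(placement, {0, 6})
```

A whole group can go offline because none of its slices live inside the group, whereas taking one node from each group offline loses a slice. With one node offline each node of the other group carries 1 + 1/6 of a node's data; with a whole group offline it carries twice its own.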
We denote this strategy by PRG(g,x) (g groups with x
elements each) or PR(x), for simplicity and considering
equal-sized groups. It works like FR at the inter-group
level and FPR within each group.
If we use this
strategy with R replicas and R+1 groups, the system can
tolerate failures or unavailability of nodes from up to R
groups. More groups allow more nodes to be
unavailable but slices will be larger, leading to possibly
slower processing when groups are offline.
Metrics:
• Degree of fault tolerance: X nodes from a single
group; if R replicas over R+1 groups are used, the
system can tolerate failures or unavailability of nodes
from up to R groups;
• Efficiency (performance upon node failure):
processing time increases proportionally to size of slice
(fraction 1/X);
• Provision for taking several nodes offline
simultaneously: can take offline whole groups.

6. Comparative Analysis
In this analysis we focus on the balance between efficient availability, by analyzing the performance under node unavailability, and the flexibility to take multiple nodes offline. We consider the use of full replicas (FR), fully partitioned replicas (FPR) and partitioned replicas (PR). The analysis involved measuring the response time of NPDW on low-cost PCs. TPC-H [] was manually set up into 20-node and 10-node configurations, with partitioning and placement as described in section 3. We then measured the response time for query 9 of TPC-H without nodes offline and compared the result to the response time with nodes offline. Query 9 is reproduced below for reference (the query parameters were generated as described in the TPC-H specification and the results are averaged over several runs):
Select nation, o_year, sum(amount) as sum_profit from
(
Select n_name as nation, year(o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost *
l_quantity as amount
from
tpcd.part, tpcd.supplier, tpcd.lineitem, tpcd.partsupp,
tpcd.orders, tpcd.nation
where
s_suppkey = l_suppkey and ps_suppkey = l_suppkey
and ps_partkey = l_partkey and p_partkey = l_partkey
and o_orderkey = l_orderkey
and s_nationkey = n_nationkey
and p_name like x and n_nationkey > y
and o_orderpriority = 'z' and ps_availqty > w
) as profit
group by nation, o_year
order by nation, o_year desc;
Figure 8 shows the response time (min:sec, plotted as a line) when several of the 20 nodes are offline. The alternatives compared are: "online" – every node is online; FPR – fully partitioned replicas; PR(10) – partitioned replicas, two groups of 10 nodes each; PR(5) – partitioned replicas, 4 groups of 5 nodes each. It also shows the minimum number of replicas that are necessary to provide the required availability. These results show the much larger penalty incurred by FR and the excessive number of replicas required for FPR to allow that many nodes offline simultaneously. PR(10) (partitioned replicas with two 10-element groups) is a good choice, as it requires a single replica and obtains a good response time simultaneously.
Figure 8. Response Time and Replicas when nodes fail, Query 9 [20 nodes; series: online, FPR, PR(10), PR(5), FR; axes: response time (min:sec) and number of replicas]
The results for NPDW with 10 nodes are shown in Figure 9. In this case we consider a smaller number of unavailable nodes than in the previous results, and the pair PR(5), PR(2) instead of PR(10) and PR(5).
Figure 9. Response Time and Replicas when nodes fail, Query 9 [10 nodes; series: online, FPR, PR(5), PR(2), FR; axes: response time (min:sec) and number of replicas]
The trend is similar to the one observed in Figure 8, the main difference being that the response times are much larger in every case because there are only half the number of nodes (10 nodes in Figure 9 versus 20 nodes in Figure 8). In this case PR(5) seems to be the best choice, as it avoids the cost of FR or PR(2) and simultaneously the requirement of FPR that there be several replicas of each node.
Figure 10 compares the response time on NPDW with 10 nodes versus NPDW with 20 nodes. These results show that, although the response time with 10 nodes is much larger than that with 20 nodes, as expected, the comparison between alternative replication schemes follows a similar trend.
Figure 10. Comparison: 10 nodes versus 20 nodes [series: online, FPR, PR(5)/PR(10), PR(2)/PR(5), FR; axes: response time (min:sec) for each configuration and number of replicas]
These experimental results have shown that it is advantageous to consider partitioned replicas instead of simply full replicas if the system is to offer efficient availability. With such a capability, the system can be always-on and always efficient even though parts of it are taken offline for maintenance or management functions such as loading with new data or DBA functionality.
We are currently testing the strategies over additional query workloads with varied characteristics.

7. Conclusions and Future Work
The work presented in this paper focused on replication for efficient availability on the Node Partitioned Data Warehouse (NPDW). After reviewing placement and processing issues over the NPDW, we have compared alternative replica strategies using metrics that included efficiency, degree of tolerance to node failures and capacity to allow multiple nodes to be offline simultaneously. The alternatives, ranging from full replication to various degrees of partitioned replication, were compared experimentally from the perspective of performance degradation when nodes go offline. We concluded that replicas partitioned by groups are the most advantageous alternative for NPDW, if we consider both performance and flexibility in allowing multiple nodes to be taken offline simultaneously for maintenance or loading reasons. Besides extensive testing of the approaches, our future work in this subject includes automating replication and recovery, as well as automated data warehouse loading with the system always-on, using the PR strategies described in this paper.

References
[] Copeland, G., Tom Keller, "A comparison of high-availability media recovery techniques", In Procs. of the ACM International Conf. on Management of Data.
[] Coulon, C., E. Pacitti, P. Valduriez, "Scaling up the Preventive Replication of Autonomous Databases in Cluster Systems", Vecpar, International Conference, Valencia, Spain, June.
[] DeWitt, D., Gerber, R., "Multiprocessor Hash-Based Join Algorithms", Proceedings of the Eleventh Conference on Very Large Databases, Stockholm, Sweden, August.
[] Furtado, P., "The Issue of Large Relations in Node Partitioned Data Warehouses", International Conference on Database Systems for Advanced Applications (DASFAA), Beijing, China, April.
[] Furtado, P., "Experimental Evidence on Partitioning in Parallel Data Warehouses", DOLAP, Workshop of the Int'l Conference on Information and Knowledge Management (CIKM), Washington, November.
[] Furtado, P., "Efficiently Processing Query-Intensive Databases over a Non-dedicated Local Network", Nineteenth International Parallel and Distributed Processing Symposium, Denver, Colorado, USA, May.
[] Hsiao, H., David J. DeWitt, "Replicated Data Management in the Gamma Database Machine", Workshop on the Management of Replicated Data.
[] Hsiao, H., David J. DeWitt, "Chained Declustering: A New Availability Strategy for Multiprocessor Database Machines", ICDE.
[] Hsiao, H., David J. DeWitt, "A Performance Study of Three High Availability Data Replication Strategies", PDIS.
[] Kimball, R., The Data Warehouse Toolkit, New York, J. Wiley & Sons.
[] Kitsuregawa, M., Tanaka, H., and Motooka, T., "Application of Hash to Database Machine and its Architecture", New Generation Computing.
[] Lin, Y., B. Kemme, R. Jimenez-Peris, "Consistent Data Replication: Is it feasible in WANs?", in International Euro-Par Conference, Lisboa, Portugal, August.
[] Pacitti, E., M. Özsu, C. Coulon, "Preventive Multi-Master Replication in a Cluster of Autonomous Databases", International Euro-Par Conference, Klagenfurt, Austria, August.
[] Rao, J., Zhang, C., Megiddo, N., Lohman, G., "Automating Physical Database Design in a Parallel Database", ACM International Conference on Management of Data, Madison, Wisconsin, USA, June.
[] Tandem Database Group, "NonStop SQL, A Distributed, High-Performance, High-Reliability Implementation of SQL", Workshop on High Perform. Trans. Sys., CA, Sept.
[] Teradata, DBC Database Computer System Manual, Teradata, Nov.
[] TPC Benchmark H, Transaction Processing Council, June. Available at http://www.tpc.org/
[] Yu, C. T. and Meng, W., Principles of Database Query Processing for Advanced Applications, Morgan Kaufmann.
[] Zilio, D. C., Jhingran, A., Padmanabhan, S., "Partitioning Key Selection for a Shared-Nothing Parallel Database System", IBM Research Report RC.