Workload Characterization - CS, GMU

CS
Workload Characterization
Dr. Daniel A. Menascé
http://www.cs.gmu.edu/faculty/menasce.html
Department of Computer Science
George Mason University
© 1999 Menascé. All Rights Reserved.

What is Workload Characterization?

Workload
❚ The workload of a system can be defined as the set of all inputs that the system receives from its environment during any given period of time.
[Figure: HTTP requests arriving at a Web server.]

Workload Characterization: concepts and ideas
❚ Basic component of a workload refers to a generic unit of work that arrives at the system from external sources.
❙ Transaction
❙ Interactive command
❙ Process
❙ HTTP request
❙ The choice depends on the nature of the service provided

Workload Characterization: concepts and ideas
❚ Workload characterization
❙ A workload model is a representation that mimics the workload under study
❚ Workload models can be used for
❙ the selection of systems
❙ performance tuning
❙ capacity planning

Workload Description
[Figure: three levels of description. Business description (user), functional description (software), resource-oriented description (hardware).]

Workload Description
❚ Business characterization: a user-oriented description that describes the load in terms such as number of employees, invoices per customer, etc.
❚ Functional characterization: describes programs, commands and requests that make up the workload.
❚ Resource-oriented characterization: describes the consumption of system resources by the workload, such as processor time, disk operations, memory, etc.

A Web Server Example
❚ The pair (CPU time, I/O time) characterizes the execution of a request at the server
❚ Our basic workload: HTTP requests
❚ First case: only one document size ([...] KB)
❚ [...] executions -> ([...] sec, [...] sec)
❚ More realistic workload: documents have different sizes

Execution of HTTP Requests (sec)

Request No.   CPU time (sec)   I/O time (sec)   Elapsed time (sec)
1             0.0095           0.0400           0.0710
2             0.0130           0.1100           0.1450
3             0.0155           0.1200           0.1560
4             0.0088           0.0400           0.0650
5             0.0111           0.0900           0.1140
6             0.0171           0.1400           0.1630
7             0.2170           1.2000           4.3800
8             0.0129           0.1200           0.1510
9             0.0091           0.0500           0.0630
10            0.0017           0.1400           0.1890
Average       0.03157          0.205            0.5497

Representativeness of a Workload Model
[Figure: the real workload and the workload model are each applied to the system, producing performance measures Preal and Pmodel respectively.]

A Refinement in the Workload Model
❚ The average response time of [...] sec does not reflect the behavior of the actual server.
❚ Due to the heterogeneity of its components, it is difficult to view the workload as a single collection of requests.
❚ Three classes
❙ small documents
❙ medium documents
❙ large documents

Execution of HTTP Requests (sec)

Request No.   Class    CPU time (sec)   I/O time (sec)   Elapsed time (sec)
1             small    0.0095           0.0400           0.0710
2             medium   0.0130           0.1100           0.1450
3             medium   0.0155           0.1200           0.1560
4             small    0.0088           0.0400           0.0650
5             medium   0.0111           0.0900           0.1140
6             medium   0.0171           0.1400           0.1630
7             large    0.2170           1.2000           4.3800
8             medium   0.0129           0.1200           0.1510
9             small    0.0091           0.0500           0.0630
10            medium   0.0017           0.1400           0.1890

Three-Class Characterization

Type           CPU time (sec)   I/O time (sec)   No. of components
Small Docs.    0.0091           0.04             3
Medium Docs.   0.0144           0.12             6
Large Docs.    0.2170           1.20             1
Total          0.331            2.05             10

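The per-class rows can be obtained by grouping the requests by document class and averaging each parameter. The following is a minimal sketch, assuming the request table is used exactly as printed (so a class mean may differ from the summary table wherever the two tables disagree); the names `REQUESTS` and `class_averages` are illustrative and not part of the slides.

```python
from collections import defaultdict

# (class label, CPU time in sec, I/O time in sec), copied from the request table as printed.
REQUESTS = [
    ("small", 0.0095, 0.04), ("medium", 0.0130, 0.11), ("medium", 0.0155, 0.12),
    ("small", 0.0088, 0.04), ("medium", 0.0111, 0.09), ("medium", 0.0171, 0.14),
    ("large", 0.2170, 1.20), ("medium", 0.0129, 0.12), ("small", 0.0091, 0.05),
    ("medium", 0.0017, 0.14),
]


def class_averages(requests):
    """Return {class: (mean CPU time, mean I/O time, number of components)}."""
    groups = defaultdict(list)
    for label, cpu, io in requests:
        groups[label].append((cpu, io))
    result = {}
    for label, rows in groups.items():
        n = len(rows)
        mean_cpu = sum(row[0] for row in rows) / n
        mean_io = sum(row[1] for row in rows) / n
        result[label] = (mean_cpu, mean_io, n)
    return result


averages = class_averages(REQUESTS)
for label in ("small", "medium", "large"):
    cpu, io, n = averages[label]
    print(f"{label:<7} CPU={cpu:.4f} s  I/O={io:.2f} s  components={n}")
```

Averaging inside a class is only meaningful because the class is close to homogeneous; the single large request would otherwise dominate the overall mean.
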
Workload Models
❚ A model should be representative and compact.
❚ Natural models are constructed either using basic components of the real workload or using traces of the execution of the real workload.
❚ Artificial models do not use any basic component of the real workload.
❙ Executable models (e.g., synthetic programs, artificial benchmarks, etc.)
❙ Non-executable models, which are described by a set of parameter values that reproduce the same resource usage as the real workload

Workload Models
❚ The basic inputs to analytical models are parameters that describe the service centers (i.e., hardware and software resources) and the customers (e.g., requests and transactions)
❙ component (e.g., transactions) interarrival times
❙ service demands
❙ execution mix (e.g., levels of multiprogramming)

A Workload Characterization Methodology
Choice of an analysis standpoint
Identification of the basic component
Choice of the characterizing parameters
Data collection
Partitioning the workload
Calculating the class parameters

Selection of characterizing parameters
❚ Each workload component is characterized by two groups of information:
❚ Workload intensity
❙ arrival rate
❙ number of clients and think time
❙ number of processes or threads in execution simultaneously
❚ Service demands $(D_{i1}, D_{i2}, \ldots, D_{iK})$, where $D_{ij}$ is the service demand of component i at resource j

Data Collection
❚ This step assigns values to each component of the model.
❙ Identify the time windows that define the measurement sessions.
❙ Monitor and measure the system activities during the defined time windows.
❙ From the collected data, assign values to each characterizing parameter of every component of the workload.

Partitioning the workload
❚ Motivation: real workloads can be viewed as a collection of heterogeneous components.
❚ Partitioning techniques divide the workload into a series of classes such that their populations are composed of quite homogeneous components.
❚ What attributes can be used for partitioning a workload into classes of similar components?

Partitioning the Workload
❚ Resource usage
❚ Applications
❚ Objects
❚ Geographical orientation
❚ Functional
❚ Organizational units
❚ Mode

Workload Partitioning: Resource Usage

Transaction Classes   Frequency   Maximum CPU time (msec)   Maximum I/O time (msec)
Trivial               40%         8                         120
Light                 30%         20                        300
Medium                20%         100                       700
Heavy                 10%         900                       1200

Workload Partitioning: Internet Applications

Application Classes   KB Transmitted
WWW                   4,216
ftp                   378
telnet                97
Mbone                 595
Others                63

Workload Partitioning: Document Types

Document Class                      Percentage of Access (%)
HTML (html file types)              30
Images (e.g., gif or jpeg)          40
Sound (e.g., au or wav)             4.5
Video (e.g., mpeg, avi or mov)      7.3
Dynamic (e.g., cgi or perl)         12.0
Formatted (e.g., ps, dvi or doc)    5.4
Others                              0.8

Workload Partitioning: Geographical Orientation

Classes      Percentage of Total Requests
East Coast   32
West Coast   38
Midwest      20
Others       10

Calculating the class parameters
❚ How should one calculate the parameter values that represent a class of components?
❙ Averaging: when a class consists of components that are homogeneous with respect to service demands, an average of the parameter values of all components may be used.
❙ Clustering of workloads is a process in which a large number of components are grouped into clusters of similar components.

Calculating Class Parameters
❚ Homogeneous Workload
❙ compute the arithmetic mean
❙ Workload: $\{(D_{i1}, D_{i2}, \ldots, D_{iK}) \mid i = 1, \ldots, p\}$
❙ Workload Characterization:
❘ $(D_1, D_2, \ldots, D_K)$, where
❘ $D_j = \frac{1}{p} \sum_{i=1}^{p} D_{ij}$

Calculating Class Parameters
❚ Heterogeneous Workload
❙ Use clustering analysis to determine groups of "similar" workloads
❙ Use averaging within each group
❙ Clustering analysis algorithms: minimal spanning tree and k-means

Parameter Transformation
❚ Preventing extreme values of parameters from distorting the distribution: use a linear transformation
❚ $D_t = \dfrac{\text{measured } D - \min\{D_i\}}{\max\{D_i\} - \min\{D_i\}}$

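The linear transformation maps every measured value into the interval [0, 1]. A minimal sketch follows; the function name `linear_transform` and the sample values are illustrative assumptions, and the handling of the case where all values are equal is a choice made here, not something the slides specify.

```python
def linear_transform(values):
    """Scale each value to [0, 1] using (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # The formula is undefined when all values are equal; returning zeros is an assumption.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


cpu_msec = [8, 20, 100, 900]  # illustrative values only
scaled = linear_transform(cpu_msec)
print(scaled)
```
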
Workload Sample

Document   Size (KB)   No. Accesses
1          12          281
2          150         28
3          5           293
4          25          123
5          7           259
6          4           241
7          35          75

Workload Sample: logarithmic transformation of parameters

Document   log10 Size (KB)   log10 No. Accesses
1          1.08              2.45
2          2.18              1.45
3          0.70              2.47
4          1.40              2.09
5          0.85              2.41
6          0.60              2.38
7          1.54              1.88

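The transformed table can be reproduced by taking base-10 logarithms of both columns of the workload sample. This is a small sketch; the names `SAMPLE` and `log_transform` are illustrative.

```python
import math

# (document, size in KB, number of accesses) from the workload sample.
SAMPLE = [(1, 12, 281), (2, 150, 28), (3, 5, 293), (4, 25, 123),
          (5, 7, 259), (6, 4, 241), (7, 35, 75)]


def log_transform(sample):
    """Replace size and access count by their base-10 logarithms."""
    return [(doc, math.log10(size), math.log10(accesses))
            for doc, size, accesses in sample]


transformed = log_transform(SAMPLE)
for doc, log_size, log_accesses in transformed:
    print(f"{doc}  {log_size:.2f}  {log_accesses:.2f}")
```
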
[Figure: scatter plot of documents C1 to C7 after the logarithmic transformation; x-axis: Size (KB), y-axis: Number Accesses.]

Minimal Spanning Tree Example
[Figure sequence on the same axes (x-axis: Size (KB), y-axis: No. Accesses), one step per slide:]
❚ Start: documents C1 to C7, each in its own cluster.
❚ Minimum inter-cluster distance: combine C3 and C6.
❚ Minimum inter-cluster distance: combine C36 and C5.
❚ Minimum inter-cluster distance: combine C4 and C7.
❚ Minimum inter-cluster distance: combine C356 and C1.

Result of Workload Characterization

Type     Class   Size (KB)   No. Accesses   No. Components
Small    C1356   8.19        271.51         4
Medium   C47     29.58       96.05          2
Large    C2      150.00      28.00          1

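The merge sequence and the result table can be reproduced with a short agglomerative procedure on the log-transformed sample: repeatedly merge the two clusters whose centroids are closest. The sketch below assumes that the centroid of a merged cluster is the midpoint of the two merged centroids; this is inferred from the table values (it yields 8.19 KB and 271.51 accesses for C1356) and is not stated in the slides. The names `SAMPLE` and `merge_closest_clusters` are illustrative.

```python
import math

# name -> (size in KB, number of accesses), from the workload sample.
SAMPLE = {"C1": (12, 281), "C2": (150, 28), "C3": (5, 293), "C4": (25, 123),
          "C5": (7, 259), "C6": (4, 241), "C7": (35, 75)}


def merge_closest_clusters(sample, target_clusters):
    """Merge the two closest clusters until target_clusters remain."""
    # Each cluster is (member names, centroid in log10 space).
    clusters = [([name], (math.log10(size), math.log10(acc)))
                for name, (size, acc) in sample.items()]
    while len(clusters) > target_clusters:
        # Pair of cluster indices with the smallest centroid distance.
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: math.dist(clusters[pair[0]][1], clusters[pair[1]][1]),
        )
        (names_a, ca), (names_b, cb) = clusters[a], clusters[b]
        # Assumption: the new centroid is the midpoint of the two merged centroids.
        merged = (names_a + names_b, ((ca[0] + cb[0]) / 2, (ca[1] + cb[1]) / 2))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters


result = merge_closest_clusters(SAMPLE, 3)
summary = {}
for names, (cx, cy) in result:
    label = "C" + "".join(sorted(name[1:] for name in names))
    # Convert the centroid back from log10 space to KB and access counts.
    summary[label] = (10 ** cx, 10 ** cy, len(names))

for label, (size, accesses, count) in sorted(summary.items()):
    print(f"{label:<6} size={size:.2f} KB  accesses={accesses:.2f}  components={count}")
```
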
K-means Example
[Figure sequence on the same axes (x-axis: Size (KB), y-axis: Number Accesses), one step per slide:]
❚ Starting allocation: documents C1 to C7, with initial centroids Ca, Cb and Cc.
❚ C1 joins Ca.
❚ C5 joins Ca.
❚ C6 joins Ca.
❚ C7 joins Cb.

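The steps of the k-means example can be sketched as a single assignment pass: each remaining point joins the nearest centroid, and that centroid is then recomputed as the mean of its members. The initial centroids are assumed here to sit on C3, C4 and C2, which is read off the figure residue and is not stated in the slides; a full k-means run would repeat the pass until no point changes cluster. The names `POINTS`, `to_log` and `sequential_k_means` are illustrative.

```python
import math

# name -> (size in KB, number of accesses), from the workload sample.
POINTS = {"C1": (12, 281), "C2": (150, 28), "C3": (5, 293), "C4": (25, 123),
          "C5": (7, 259), "C6": (4, 241), "C7": (35, 75)}


def to_log(point):
    """Map (size, accesses) to log10 space, as in the transformed sample."""
    return (math.log10(point[0]), math.log10(point[1]))


def sequential_k_means(points, seeds):
    """One assignment pass; seeds maps a cluster name to its initial point."""
    centroids = {cluster: to_log(points[p]) for cluster, p in seeds.items()}
    members = {cluster: [p] for cluster, p in seeds.items()}
    for name, raw in points.items():
        if name in seeds.values():
            continue  # seed points already belong to their own cluster
        x = to_log(raw)
        nearest = min(centroids, key=lambda c: math.dist(centroids[c], x))
        size = len(members[nearest])
        cx, cy = centroids[nearest]
        # Running mean: the centroid moves toward the newly added point.
        centroids[nearest] = ((size * cx + x[0]) / (size + 1),
                              (size * cy + x[1]) / (size + 1))
        members[nearest].append(name)
    return members, centroids


# Assumed seeds: Ca at C3, Cb at C4, Cc at C2.
members, centroids = sequential_k_means(POINTS, {"Ca": "C3", "Cb": "C4", "Cc": "C2"})
print(members)
```
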
Novel Features in the WWW
❚ The Web exhibits extreme variability in workload characteristics
❙ Web document sizes vary in the range of [...] to [...] bytes
❙ Access patterns in the Web vary tremendously: load spikes of [...] to [...] times the average
❚ Web traffic exhibits a bursty behavior
❙ Traffic is bursty across several time scales
❙ It is difficult to size server capacity and bandwidth to support the demand created by load spikes

Types of Web Requests
❚ GET of static HTML requests
❚ Execution of application at the server
❙ CGI scripts (e.g., to process HTML forms)
❘ A new process is started for each request
❘ Stateless application
❙ Server APIs (e.g., NSAPI, ISAPI)
❘ Application code is loaded and executed in the same context as the server
❘ Poor security and no isolation
❙ FastCGI
❘ Web server and application communicate via lightweight TCP or local IPC
❘ Application can be persistent and stateful

Types of Web Requests (cont'd)
❚ Execution of application at the server (cont'd)
❙ Server-side scripting
❘ Server interprets scripts or programs embedded in pages before returning them to the client
• MS Active Server Pages (ASP) permits the use of JavaScript and VBScript combined with ActiveX controls written in any programming language
• Netscape's LiveWire permits the use of server-side JavaScript

Types of Web Requests (cont'd)
❚ Execution of application at the client
❙ Client-side scripting (e.g., JavaScript)
❙ Download of applications from server for execution at the client
❘ Java
• Platform independent
• Limited access to client resources
❘ MS ActiveX Controls
• For MS Windows environments only
• Unrestricted access to PC resources

CGI scripts vs server-side scripts: performance impact
[Figure: Server Throughput (req/sec) vs Number of Concurrent Requests (0 to 30) for two series, CGI scripts and server-side scripting. Annotations give the time to create/destroy a CGI process and the time to execute the application.]

WWW Traffic Bursts
[Figure: Bytes (logarithmic scale, 10^6 to 10^7) vs chronological time (slots of 1000 sec).]

Burstiness and Throughput
❚ Burstiness factor (b): fraction of time during which the instantaneous arrival rate exceeds the average arrival rate
❚ The site throughput decreases with the burstiness of the workload

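The burstiness factor can be estimated from a log by dividing the observation period into equal time slots and counting arrivals per slot. The sketch below assumes that the per-slot arrival count is an acceptable stand-in for the instantaneous arrival rate; the function name `burstiness_factor` and the sample counts are hypothetical.

```python
def burstiness_factor(arrivals_per_slot):
    """Fraction of equal-length time slots whose arrival count exceeds the overall mean."""
    if not arrivals_per_slot:
        raise ValueError("need at least one time slot")
    mean = sum(arrivals_per_slot) / len(arrivals_per_slot)
    above = sum(1 for count in arrivals_per_slot if count > mean)
    return above / len(arrivals_per_slot)


# Hypothetical arrival counts for ten consecutive slots.
counts = [10, 12, 9, 50, 11, 8, 10, 10, 45, 5]
b = burstiness_factor(counts)
print(b)
```

In this made-up trace the mean is 17 arrivals per slot and only the two spike slots exceed it, so the arrivals are concentrated in a small fraction of the time.
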
The Impact of Burstiness
[Figure: Maximum Throughput (0 to 60) vs Burstiness factor (0.0 to 0.3).]

Web Workload Characterization
❚ The distribution of the size of returned files is heavy-tailed
❙ Most files are small, but there is a non-negligible probability of returned files being large (e.g., images, video, sound)
❙ $\Pr[\text{returned file size} > x] = k\,x^{-\alpha}$ for large values of x and α [...] (Pareto distribution)

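A short worked step, offered as a sketch: taking logarithms of the tail formula shows why a heavy-tailed size distribution appears as a straight line when the logarithm of the tail probability is plotted against the logarithm of x. Only the symbols k, x and α from the slide are used.

```latex
\begin{align*}
\Pr[\text{returned file size} > x] &= k\,x^{-\alpha} \\
\ln \Pr[\text{returned file size} > x] &= \ln k - \alpha \ln x
\end{align*}
```

On log-log axes the tail is therefore a line with slope −α, so α can be read from the slope of the measured tail.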
Web Workload Characterization: Heavy-Tailed Distribution
[Figure: Ln(P[File Size > x]) (0 to -12) vs x (in bytes), on a logarithmic x-axis from 10,000 to 10,000,000.]

Incorporating New Phenomena in the Workload Characterization
Accounting for Heavy Tails in the Model
❚ Due to the large variability of the size of documents, average results for the whole population would have very little statistical meaning.
❚ Categorizing the requests into a number of classes, defined by ranges of document sizes, improves the accuracy and significance of performance metrics.
❚ Multiclass queuing network models, with classes associated with requests for documents of different sizes.

Accounting for Heavy Tails: an example
❚ The HTTP log of a Web server was analyzed during [...] hour. A total of [...] requests were successfully processed during the interval.
❚ Let us use a multiclass model to represent the server.
❚ There are [...] classes in the model, each corresponding to one of the [...] file size ranges.

Accounting for Heavy Tails: an example
❚ File Size Distributions

Class   File Size Range (KB)   Percent of Requests
1       size < 5               25
2       5 ≤ size ≤ 50          40
3       50 ≤ size ≤ 100        20
4       100 ≤ size ≤ 500       10
5       size ≥ 500             5

Accounting for Heavy Tails: an example
❚ The arrival rate for each class r is a fraction of the overall arrival rate λ = [...] requests/sec
❘ λ1 = [...] × [...] = [...] req/sec
❘ λ2 = [...] × [...] = [...] req/sec
❘ λ3 = [...] × [...] = [...] req/sec
❘ λ4 = [...] × [...] = [...] req/sec
❘ λ5 = [...] × [...] = [...] req/sec

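The per-class arrival rates follow from multiplying the overall arrival rate by each class's share of requests in the file size distribution table. In this sketch the overall rate of 6.0 req/sec is a hypothetical value chosen for illustration, since the slide's own figure did not survive; the names `CLASS_PERCENT` and `class_arrival_rates` are also illustrative.

```python
# Percent of requests per class, from the file size distribution table.
CLASS_PERCENT = {1: 25, 2: 40, 3: 20, 4: 10, 5: 5}


def class_arrival_rates(total_rate, class_percent):
    """Split the overall arrival rate (req/sec) across classes by request share."""
    return {r: total_rate * pct / 100 for r, pct in class_percent.items()}


# 6.0 req/sec is a hypothetical overall arrival rate, not a value from the slides.
rates = class_arrival_rates(6.0, CLASS_PERCENT)
for r, rate in rates.items():
    print(f"class {r}: {rate:.2f} req/sec")
```
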
Web Workload Characterization
❚ Popularity
❙ Zipf's Law: the number of references (P) to a file tends to be inversely proportional to its rank (r): $P = k / r$
❚ The second most popular file gets half the number of references of the most popular one.
❚ The nth most popular file gets 1/n of the number of references of the most popular one.

Web Workload Characterization: Zipf Law Example

File   Popularity   % Accesses
A      1            40.8%
B      2            20.4%
C      3            13.6%
D      4            10.2%
E      5            8.2%
F      6            6.8%

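The percentages in the example follow from giving each file a weight of 1/rank and normalizing the weights so that they sum to one. A minimal sketch, with the function name `zipf_access_shares` chosen for illustration:

```python
def zipf_access_shares(num_files):
    """Access share of each file when references are proportional to 1 / rank."""
    weights = [1 / rank for rank in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


shares = zipf_access_shares(6)
for name, share in zip("ABCDEF", shares):
    print(f"{name}  {100 * share:.1f}%")
```
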
Workload Characterization for E-commerce
❚ Customers interact with the site through sessions, which are sequences of interrelated requests.
❚ Sessions can be characterized by a Customer Behavior Model Graph (CBMG).

Customer Behavior Model Graph (CBMG)
[Figure, built up over several slides: a graph with states entry, browse, search, select, add and pay. Each arc from a state s to a state b is labeled with a transition probability P_{s,b} and a server-side think time Z_{s,b}.]

Think time
[Figure: timeline of messages between server and browser, showing the server think time, the browser think time, and nt (nt = network time).]

CBMG
CBMG: $G = (P, Z)$
$P = [p_{i,j}]$: n × n transition probability matrix
$Z = [z_{i,j}]$: n × n think time matrix
state 1: entry
state n: exit

Metrics Derived from the CBMG
$V_{entry} = 1$
$V_j = \sum_{k=1}^{n-1} V_k \times p_{k,j}$

Metrics Derived from the CBMG
❚ Average Number of Visits Per State
❙ E.g., average number of searches per visit to the site
❚ Average Buy to Visit Ratio: $V_{pay}$
❚ Average Session Length Per Visit: $\sum_{k=1}^{n-1} V_k$

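The visit equations can be solved numerically once a transition probability matrix is given. The sketch below uses simple fixed-point iteration on a small, entirely hypothetical four-state CBMG (entry, browse, pay, exit); the matrix values and the function name `average_visits` are assumptions made for illustration, and states are indexed from 0 in the code rather than from 1 as in the slides.

```python
def average_visits(P, tol=1e-12, max_iter=10_000):
    """Solve V_j = sum_k V_k * p[k][j] with V_entry = 1 by fixed-point iteration.

    P[k][j] is the probability of moving from state k to state j.
    State 0 is the entry state and the last state is the exit state.
    """
    n = len(P)
    V = [1.0] + [0.0] * (n - 1)
    for _ in range(max_iter):
        # The sum runs over all states except exit, as in the slide's formula.
        new_V = [1.0] + [sum(V[k] * P[k][j] for k in range(n - 1)) for j in range(1, n)]
        if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
            return new_V
        V = new_V
    raise RuntimeError("visit ratios did not converge")


# Hypothetical CBMG: entry -> browse; browse -> browse / pay / exit; pay -> exit.
P = [
    [0.0, 1.0, 0.0, 0.0],  # entry
    [0.0, 0.5, 0.1, 0.4],  # browse
    [0.0, 0.0, 0.0, 1.0],  # pay
    [0.0, 0.0, 0.0, 0.0],  # exit
]
V = average_visits(P)
buy_to_visit_ratio = V[2]        # V_pay
session_length = sum(V[:-1])     # sum of V_k over all states except exit
print(V, buy_to_visit_ratio, session_length)
```

For larger graphs a direct linear solve would be the more usual choice; iteration is used here only to keep the sketch dependency-free.
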
Workload Characterization Methodology
➊ Identify the different types of sessions that compose the workload (represented by CBMGs)
➋ Compute workload intensity parameters per class
❙ session arrival rate: $\lambda_s^r$
❙ arrival rate per request type: $\lambda_j^r = \lambda_s^r \times V_j^r$
❙ think times matrix (Z)

Workload Characterization Methodology
➌ Determine resource usage parameters
$D_{i,r} = \sum_{j=1}^{n-1} D_{i,r,j} \times V_j^r$
where $D_{i,r}$ is the demand at device i by class r, and $D_{i,r,j}$ is the demand at device i by class r for request j.

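Both formulas are weighted sums over the request types of one session class. The sketch below applies them to hypothetical numbers; the visit ratios, the per-request CPU demands, the session arrival rate and the function names are all assumptions made for illustration.

```python
def request_arrival_rates(session_rate, visits):
    """lambda_j^r = lambda_s^r * V_j^r for every request type j of one class r."""
    return {request: session_rate * v for request, v in visits.items()}


def device_demand(per_request_demand, visits):
    """D_{i,r} = sum over j of D_{i,r,j} * V_j^r for one device i and one class r."""
    return sum(per_request_demand[request] * visits[request] for request in visits)


# Hypothetical values for a single session class.
visits = {"browse": 2.0, "search": 1.5, "pay": 0.2}                       # V_j^r
cpu_sec_per_request = {"browse": 0.010, "search": 0.030, "pay": 0.050}    # D_{cpu,r,j}

rates = request_arrival_rates(0.5, visits)   # 0.5 sessions/sec is hypothetical
cpu_demand = device_demand(cpu_sec_per_request, visits)
print(rates, cpu_demand)
```
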
Workload Characterization for E-commerce
❚ Need to map e-commerce functions to transactions
❙ search = SearchBookByTitle
❚ Need to map transactions to resource demands
❙ SearchBookByTitle: [...] I/Os in the Index disk, [...] I/Os in the main DB disk, [...] msec of CPU, [...] Kbytes transferred over the LAN

Workload Characterization Methodology
HTTP Logs -> Merge and Filter -> Request Log -> Get Sessions -> Session Log -> Get CBMGs (clustering algorithm) -> CBMGs

Workload Characterization
❚ HTTP log entry: (user_id, request_type, request_time, execution_time)
❚ k-th session log entry: $(C_k, W_k)$
$C_k = [c_{i,j}]$: n × n matrix of transition counts between i and j
$W_k = [w_{i,j}]$: n × n matrix of accumulated think times between i and j

Procedure GetCBMGs
❚ Session log: $X_m = (C_m, W_m)$, $m = 1, \ldots, M$
❚ Distance between points $X_a$ and $X_b$:
$d_{a,b} = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} (C_a[i,j] - C_b[i,j])^2}$
❚ k-means clustering algorithm

Procedure GetCBMGs (cont'd)
❚ Adding point $X_m = (C_m, W_m)$ to centroid k, represented by point (C, W), gives the new centroid (C', W'):
$c'[i,j] = \dfrac{s(k) \times c[i,j] + c_m[i,j]}{s(k) + 1}$
$w'[i,j] = \dfrac{s(k) \times w[i,j] + w_m[i,j]}{s(k) + 1}$

Procedure GetCBMGs (cont'd)
❚ Obtaining matrices P and Z for each cluster:
$p[i,j] = \dfrac{c[i,j]}{\sum_{k=1}^{n} c[i,k]}$
$z[i,j] = \dfrac{w[i,j]}{c[i,j]}$

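The centroid update and the derivation of P and Z can be sketched with plain nested lists. The example matrices below are hypothetical, s(k) is taken to be the number of sessions already in cluster k, and leaving entries at zero when a denominator is zero is a choice made here rather than something the slides specify. The function names are illustrative.

```python
def add_to_centroid(c, w, size, c_m, w_m):
    """Fold session (c_m, w_m) into a centroid (c, w) that summarizes `size` sessions."""
    n = len(c)
    new_c = [[(size * c[i][j] + c_m[i][j]) / (size + 1) for j in range(n)] for i in range(n)]
    new_w = [[(size * w[i][j] + w_m[i][j]) / (size + 1) for j in range(n)] for i in range(n)]
    return new_c, new_w, size + 1


def cbmg_matrices(c, w):
    """Derive transition probabilities P and think times Z from a centroid (c, w)."""
    n = len(c)
    p = [[0.0] * n for _ in range(n)]
    z = [[0.0] * n for _ in range(n)]
    for i in range(n):
        row_total = sum(c[i])
        for j in range(n):
            if row_total > 0:
                p[i][j] = c[i][j] / row_total
            if c[i][j] > 0:
                z[i][j] = w[i][j] / c[i][j]
    return p, z


# Hypothetical three-state example: one session already in the cluster, one being added.
c = [[0, 1, 0], [0, 2, 1], [0, 0, 0]]
w = [[0, 5, 0], [0, 20, 8], [0, 0, 0]]
c_m = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
w_m = [[0, 3, 0], [0, 0, 4], [0, 0, 0]]

new_c, new_w, new_size = add_to_centroid(c, w, 1, c_m, w_m)
p, z = cbmg_matrices(new_c, new_w)
print(p, z)
```
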
Assessing the efficiency of the clustering algorithm
❚ Intra-cluster distance for cluster k:
$\tilde{d}_k = \dfrac{1}{s(k)} \sum_{x \in C_k} d(x, C_k)$
❚ Inter-cluster distance between clusters i and j:
$\tilde{D}_{i,j} = d(C_i, C_j)$

Assessing the efficiency of the clustering algorithm
❚ Intra-cluster distance: average, variance, and coefficient of variation over the k clusters
$\bar{d} = \dfrac{1}{k} \sum_{j=1}^{k} \tilde{d}_j$
$\sigma^2_{intra} = \dfrac{1}{k-1} \sum_{j=1}^{k} (\tilde{d}_j - \bar{d})^2, \quad k > 1$
$C_{intra} = \sigma_{intra} / \bar{d}$

Assessing the efficiency of the clustering algorithm
❚ Inter-cluster distance between clusters: average, variance, and coefficient of variation
$\bar{D} = \dfrac{1}{k(k-1)/2} \sum_{i=1}^{k} \sum_{j=i+1}^{k} \tilde{D}_{i,j}, \quad k > 1$
$\sigma^2_{inter} = \dfrac{1}{k(k-1)/2 - 1} \sum_{i=1}^{k} \sum_{j=i+1}^{k} (\tilde{D}_{i,j} - \bar{D})^2, \quad k > 2$
$C_{inter} = \sigma_{inter} / \bar{D}$

Assessing the efficiency of the clustering algorithm
❚ Minimize intra-cluster variance
❚ Maximize inter-cluster variance
❚ "Small" enough number of points to achieve a compact and representative workload representation
❚ Use the ratio between intra- and inter-cluster variances and coefficients of variation

Assessing the efficiency of the clustering algorithm
❚ Ratio between intra- and inter-cluster variances:
$\beta_{var} = \dfrac{\sigma^2_{intra}}{\sigma^2_{inter}}$
❚ Ratio between intra- and inter-cluster coefficients of variation:
$\beta_{cv} = \dfrac{C_{intra}}{C_{inter}}$

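The two ratios can be computed directly from a finished clustering. The sketch below assumes each point is a tuple of numbers (for sessions, a flattened transition count matrix would play that role) and uses Euclidean distance to the cluster centroid; the sample variance from Python's `statistics` module matches the 1/(k-1) and 1/(k(k-1)/2 - 1) denominators. The function name `clustering_quality` and the one-dimensional test data are hypothetical, and at least three clusters are needed for the inter-cluster variance to be defined.

```python
import math
from statistics import mean, variance


def clustering_quality(clusters):
    """Return (beta_var, beta_cv) for a list of clusters, each a list of point tuples."""
    centroids = [tuple(mean(coord) for coord in zip(*points)) for points in clusters]
    # Mean distance of each cluster's points to its own centroid.
    intra = [mean(math.dist(x, c) for x in points)
             for points, c in zip(clusters, centroids)]
    # Distance between every pair of distinct centroids.
    inter = [math.dist(centroids[i], centroids[j])
             for i in range(len(centroids)) for j in range(i + 1, len(centroids))]
    var_intra, var_inter = variance(intra), variance(inter)
    cv_intra = math.sqrt(var_intra) / mean(intra)
    cv_inter = math.sqrt(var_inter) / mean(inter)
    return var_intra / var_inter, cv_intra / cv_inter


# Hypothetical one-dimensional clusters.
clusters = [[(0,), (2,)], [(10,), (14,)], [(30,), (36,)]]
beta_var, beta_cv = clustering_quality(clusters)
print(beta_var, beta_cv)
```
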
Assessing the efficiency of the clustering algorithm
❚ Synthetic HTTP log with [...] entries
❙ Generated [...] sessions
❚ Real HTTP logs from a retail online store
❙ [...] requests after images were eliminated
❙ Robot sessions were detected (very long sessions)
❙ [...] identified customer sessions, with an average session length of [...] requests each

Intra- and Inter-cluster CV and β_CV vs number of clusters
[Figure: CV-inter, CV-intra and Beta CV (y-axis, 0 to 1.2) vs Number of Clusters (3 to 16).]

β_var vs number of clusters
[Figure: Beta - Variance (y-axis, 0.00 to 0.25) vs Number of Clusters (3 to 16).]

Clusters for Synthetic Logs

Cluster   % of Sessions   BV Ratio (%)   Session Length   AV Ratio (%)   Vb+Vs
1         44.28           5.70           5.6              11             3.6
2         28.00           4.50           15               15             11.4
3         10.60           3.70           27               21             20
4         9.29            4.00           28               20             23
5         6.20            3.50           50               32             39
6         1.50            2.00           81               50             70

• Cluster 1: majority of sessions, short sessions, and highest BV ratio
• Cluster 6: small fraction of sessions, large sessions, smallest BV ratio

Buy to Visit Ratio vs Session Length
[Figure: Buy to Visit Ratio (y-axis, 0 to 7) vs Session Length (x-axis, 0 to 90), with fitted curve $y = 0.0003x^2 - 0.07x + 5.7919$, $R^2 = 0.931$.]

Clusters for Real Logs

Cluster   Percent of Points   Avg. Session Length
1         6.5                 12.0
2         42.6                6.9
3         20.4                7.2
4         12.7                9.0
5         2.7                 14.8
6         8.0                 12.0
7         7.2                 11.2

Workload Characterization for E-commerce
❚ Customers interact with the site through sessions, which are sequences of interrelated requests.
❚ Sessions can be characterized by a CBMG.
❚ Groups of "similar" customers can be characterized by a CBMG per group.
❚ CBMGs can provide important metrics such as buy to visit ratio and average session length.