Eruptions of the Old Faithful Geyser

A geyser is a hot spring that occasionally becomes unstable and erupts hot water and
steam into the air. The Old Faithful Geyser at Yellowstone National Park in Wyoming is
probably the most famous geyser in the world. Visitors to the park try to arrive at the
geyser site to see it erupt without waiting too long; the name of the geyser comes from the
fact that eruptions follow a relatively stable pattern. The National Park Service erects a
sign at the geyser predicting when the next eruption will occur, and also posts these predictions at the Old Faithful Geyser WebCam page (http://www.nps.gov/archive/yell/OldFaithfulcam.htm), which includes a photo of the geyser that is updated roughly
every 30 seconds (see the last page of this handout for an example). Thus, it is of interest
to understand and predict the interval time until the next eruption.
The following analysis is based on a sample of 222 intereruption times taken during
August 1978 and August 1979 (data source: Applied Linear Regression, 2nd. ed., by S.
Weisberg). The histogram below shows that, in fact, Old Faithful isn’t as “faithful” as
you might think: times between eruptions range between 40 and 100 minutes, with two
apparent subgroups in the data (in fact, the geyser has become so popular not because
it is the largest or most regular geyser in Yellowstone Park, but rather because it erupts
more frequently than any of the other large geysers in the park).
© 2015, Jeffrey S. Simonoff
[Figure: histogram of the time interval until next eruption. Horizontal axis: Time interval until next eruption (40 to 100 minutes); vertical axis: Frequency (0 to 30).]
Times between eruptions apparently center around 55 minutes roughly one–third of the
time, and around 80 minutes roughly two–thirds of the time. The existence of two subgroups in this type of data is rare, but not unheard of; J.S. Rinehart, in a 1969 paper in
the Journal of Geophysical Research, provides a mechanism for this pattern based on the
temperature level of the water at the bottom of a geyser tube at the time the water at the
top reaches boiling temperature.
A readily available characteristic of the geyser that might be used to forecast the
time until the next eruption is the duration of the previous eruption. A scatter plot of
time interval until the next eruption on duration of previous eruption looks quite linear,
suggesting the use of a linear model relating the two variables:
[Figure: scatter plot. Vertical axis: Time interval until next eruption (40 to 100 minutes); horizontal axis: Duration of previous eruption (2 to 5 minutes).]
That a shorter eruption would be followed by a shorter time interval until the next eruption
(and a longer eruption would be followed by a longer time interval) is also consistent with
Rinehart’s geyser model, since a short eruption is characterized by having more water at
the bottom of the geyser being heated short of boiling temperature, and left in the tube.
This water has been heated somewhat, however, so it takes less time for the next eruption
to occur. A long eruption results in the tube being emptied, so the water must be heated
from a colder temperature, which takes longer. A. Azzalini and A.W. Bowman provide
further discussion of statistical analysis based on this model in a 1990 paper in Applied
Statistics.
Here are the results of a regression of time interval until the next eruption on previous
eruption duration:
Regression Analysis: Interval versus Duration

Analysis of Variance

Source          DF  Adj SS   Adj MS  F-Value  P-Value
Regression       1   27860  27859.9   734.56    0.000
  Duration       1   27860  27859.9   734.56    0.000
Error          220    8344     37.9
  Lack-of-Fit   32    1658     51.8     1.46    0.065
  Pure Error   188    6686     35.6
Total          221   36204

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
6.15853  76.95%     76.85%      76.57%

Coefficients

Term        Coef  SE Coef  T-Value  P-Value   VIF
Constant   33.97     1.43    23.79    0.000
Duration  10.358    0.382    27.10    0.000  1.00

Regression Equation

Interval = 33.97 + 10.358 Duration

Durbin-Watson statistic = 2.50204
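The summary figures in this output can be recomputed from the sums of squares in the Analysis of Variance table. The following is a minimal sketch, assuming only the rounded values printed above; the function name `anova_summary` is hypothetical and is not part of the software that produced the output.

```python
import math

def anova_summary(ss_regression, ss_error, df_regression, df_error):
    """Recompute R-squared, S, and the overall F statistic from ANOVA entries."""
    ss_total = ss_regression + ss_error
    r_squared = ss_regression / ss_total            # share of variability explained
    mse = ss_error / df_error                       # mean squared error
    f_value = (ss_regression / df_regression) / mse
    return r_squared, math.sqrt(mse), f_value

# Rounded entries taken from the ANOVA table above.
r_squared, s, f_value = anova_summary(27860, 8344, 1, 220)
print(f"R-sq = {100 * r_squared:.2f}%, S = {s:.3f}, F = {f_value:.1f}")
```

Because the inputs are rounded, the recomputed values agree with the printed ones only up to rounding.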
The regression is very significant, with previous duration accounting for 77% of the
variability in time until the next eruption. However, these data form a time series, so we
need to check whether there is evidence of autocorrelation in the errors. Here is a time
series plot of the standardized residuals:
[Figure: Residuals Versus the Order of the Data (response is Interval). Vertical axis: Standardized Residual (-2 to 3); horizontal axis: Observation Order (1 to 222).]
There isn’t any obvious cyclical pattern here, so that seems good. We need to check
the various tests for autocorrelation, however. The first is given with the regression output
— a Durbin–Watson value of 2.50. The sample size here is pretty large, so we can construct
an approximate z–statistic for this value,
$$z = (DW/2 - 1)\sqrt{n} = (1.25 - 1)\sqrt{222} = 3.72.$$
This is highly significant and positive, indicating negative autocorrelation in the errors.
Note that this reinforces the difficulty in identifying negative autocorrelation from a time
series plot, as it doesn’t show up as a cyclical effect.
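The approximate z-statistic for the Durbin–Watson value is simple to compute directly. This is a small sketch of the formula given above; the function name `durbin_watson_z` is a hypothetical helper, and the Durbin–Watson value is rounded to 2.50 as in the text.

```python
import math

def durbin_watson_z(dw, n):
    """Large-sample z-statistic for a Durbin-Watson value: (DW/2 - 1) * sqrt(n)."""
    return (dw / 2.0 - 1.0) * math.sqrt(n)

# DW rounded to 2.50 and n = 222, as in the text.
z = durbin_watson_z(2.50, 222)
print(round(z, 2))
```

A value of DW near 2 gives z near 0; DW above 2 gives a positive z, which corresponds to negative autocorrelation.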
Let’s look at an ACF plot to see what the autocorrelation structure looks like.
[Figure: Autocorrelation Function for SRES1. Vertical axis: Autocorrelation (-1.0 to 1.0); horizontal axis: Lag (1 to 20).]

Lag   Corr      T    LBQ
  1  -0.26  -3.81  14.69
  2   0.13   1.79  18.37
  3  -0.02  -0.26  18.45
  4  -0.03  -0.40  18.65
  5   0.06   0.87  19.55
  6   0.04   0.61  20.01
  7   0.05   0.72  20.64
  8   0.07   0.98  21.83
  9   0.01   0.14  21.85
 10   0.06   0.82  22.69
 11   0.01   0.08  22.70
 12   0.02   0.25  22.78
 13  -0.03  -0.47  23.06
 14   0.06   0.83  23.96
 15  -0.07  -0.90  25.02
 16   0.02   0.23  25.09
 17   0.14   1.83  29.57
 18   0.02   0.22  29.64
 19   0.06   0.77  30.46
 20  -0.01  -0.14  30.49
The first–order autocorrelation is −.256, which is not overwhelmingly large, but is
significantly negative. The evidence either for or against an AR(1) process is marginal
— the second– and third–order autocorrelations are positive and negative, respectively, as
desired, so the AR(1) model is probably not too bad an assumption. It might be difficult
to see the structure in the graphical ACF plot, so here is a nongraphical version.
Autocorrelation Function

ACF of SRES1

           -1.0 -0.8 -0.6 -0.4 -0.2  0.0  0.2  0.4  0.6  0.8  1.0
             +----+----+----+----+----+----+----+----+----+----+
  1  -0.256                     XXXXXXX
  2   0.128                           XXXX
  3  -0.019                           X
  4  -0.029                          XX
  5   0.063                           XXX
  6   0.045                           XX
  7   0.052                           XX
  8   0.072                           XXX
  9   0.010                           X
 10   0.060                           XX
 11   0.006                           X
 12   0.019                           X
 13  -0.034                          XX
 14   0.061                           XXX
 15  -0.066                         XXX
 16   0.017                           X
 17   0.136                           XXXX
 18   0.016                           X
 19   0.058                           XX
 20  -0.011                           X
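The T values reported with the autocorrelations appear consistent with dividing each autocorrelation by a large-lag (Bartlett) standard error. That is an assumption about how the software computed them, not something stated in this handout. A minimal sketch, with the hypothetical helper `acf_t_statistics`:

```python
import math

def acf_t_statistics(acf, n):
    """t-statistics for autocorrelations r_1, r_2, ... using Bartlett's
    large-lag standard error: se_k = sqrt((1 + 2 * sum_{j<k} r_j^2) / n)."""
    t_stats = []
    cum_sq = 0.0  # running sum of squared autocorrelations at lower lags
    for r in acf:
        se = math.sqrt((1.0 + 2.0 * cum_sq) / n)
        t_stats.append(r / se)
        cum_sq += r * r
    return t_stats

# First three autocorrelations of SRES1 from the listing above, n = 222.
t_values = acf_t_statistics([-0.256, 0.128, -0.019], 222)
print([round(t, 2) for t in t_values])
```

For lag 1 the standard error reduces to 1/sqrt(n), so the statistic is simply the autocorrelation times the square root of the sample size.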
Finally, here is a runs test of the residuals, which also agrees that there is negative
autocorrelation (too many runs):
Runs test for SRES1
Runs above and below K = 0
The observed number of runs = 128
The expected number of runs = 111.423
103 observations above K, 119 below
P-value = 0.025
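The runs test figures can be checked with the usual normal approximation for the number of runs. The sketch below assumes that this approximation (a two-sided test) is what the software used; `runs_test` is a hypothetical helper name.

```python
import math

def runs_test(n_runs, n_above, n_below):
    """Normal-approximation runs test; returns (expected runs, z, two-sided p)."""
    n = n_above + n_below
    prod = 2.0 * n_above * n_below
    expected = prod / n + 1.0
    variance = prod * (prod - n) / (n * n * (n - 1.0))
    z = (n_runs - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return expected, z, p_value

# Counts from the runs test output above.
expected, z, p_value = runs_test(128, 103, 119)
print(f"expected = {expected:.3f}, z = {z:.2f}, p = {p_value:.3f}")
```

A positive z means more runs than expected, which is the signature of negative autocorrelation.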
Since AR(1) looked reasonable here, let’s try the Cochrane–Orcutt procedure. After
forming the “*” variables (remember that we need to add .256 times the lagged variable
here), here is the Cochrane–Orcutt regression:
Analysis of Variance

Source          DF  Adj SS   Adj MS  F-Value  P-Value
Regression       1   18975  18975.1   539.10    0.000
  durstar        1   18975  18975.1   539.10    0.000
Error          219    7708     35.2
  Lack-of-Fit  163    5935     36.4     1.15    0.276
  Pure Error    56    1773     31.7
Total          220   26683

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
5.93277  71.11%     70.98%      70.64%

Coefficients

Term       Coef  SE Coef  T-Value  P-Value   VIF
Constant  45.58     1.92    23.75    0.000
durstar   9.709    0.418    23.22    0.000  1.00

Regression Equation

intstar = 45.58 + 9.709 durstar

Durbin-Watson statistic = 2.04644
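The starred variables used in this regression can be formed as in the following sketch. The data values are made up for illustration and are not the handout's data; `cochrane_orcutt_transform` is a hypothetical helper name.

```python
def cochrane_orcutt_transform(series, rho):
    """Return x*_t = x_t - rho * x_{t-1} for t = 2..n (the first case is lost)."""
    return [cur - rho * prev for prev, cur in zip(series, series[1:])]

# Hypothetical values, for illustration only.
duration = [4.4, 3.9, 4.0, 4.0, 3.5]
interval = [78.0, 74.0, 68.0, 76.0, 80.0]

rho_hat = -0.256  # estimated lag-1 autocorrelation of the OLS residuals
durstar = cochrane_orcutt_transform(duration, rho_hat)
intstar = cochrane_orcutt_transform(interval, rho_hat)
print(durstar[0], intstar[0])
```

Because the estimated autocorrelation is negative, subtracting rho times the lagged value amounts to adding .256 times the lagged value, and the transformed series has one fewer observation than the original (221 rather than 222 here).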
The regression is slightly less significant, but still strong. The fitted regression is
Interval = 36.287 + 9.7086 × Duration (after correcting the constant term), which represents a small increase in the constant and small decrease in the slope coefficient, and
implies that each additional minute’s duration of the previous eruption is associated with
an estimated expected 9.7 additional minutes until the next eruption. Note that to apply
this model and make predictions you must use this formula in the calculator directly, as
predictions that come from within the Cochrane–Orcutt regression will not be correct.
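The correction of the constant term, and a prediction on the original scale, can be sketched as follows. The helper names are hypothetical; the only inputs are the coefficients quoted above, and the four-minute duration is an arbitrary example.

```python
def original_scale_intercept(intercept_star, rho):
    """Convert the Cochrane-Orcutt constant back to the original scale."""
    return intercept_star / (1.0 - rho)

def predict_interval(duration, intercept, slope):
    """Predicted minutes until the next eruption for a given previous duration."""
    return intercept + slope * duration

# 45.58 is the rounded constant from the output; the text's 36.287 uses unrounded values.
b0 = original_scale_intercept(45.58, -0.256)
print(round(b0, 2))

# Prediction for an eruption lasting 4 minutes, using the equation in the text.
print(round(predict_interval(4.0, 36.287, 9.7086), 1))
```

The slope needs no correction, since the same transformation is applied to both variables.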
Has the autocorrelation been removed? The time series plot and ACF plot of the
residuals look good:
[Figure: Residuals Versus the Order of the Data (response is intstar). Vertical axis: Standardized Residual (-2 to 2); horizontal axis: Observation Order (1 to 221).]
[Figure: Autocorrelation Function for SRES2. Vertical axis: Autocorrelation (-1.0 to 1.0); horizontal axis: Lag (1 to 20).]

Lag   Corr      T    LBQ
  1  -0.03  -0.39   0.15
  2   0.09   1.30   1.87
  3  -0.01  -0.22   1.92
  4  -0.02  -0.33   2.04
  5   0.05   0.80   2.71
  6   0.07   1.01   3.79
  7   0.07   1.06   4.99
  8   0.08   1.20   6.57
  9   0.04   0.57   6.93
 10   0.07   0.98   8.01
 11   0.01   0.21   8.06
 12   0.02   0.26   8.13
 13  -0.03  -0.47   8.39
 14   0.04   0.65   8.87
 15  -0.07  -1.01  10.05
 16   0.05   0.67  10.59
 17   0.13   1.88  14.80
 18   0.07   1.03  16.10
 19   0.06   0.89  17.09
 20  -0.02  -0.30  17.20
Autocorrelation Function

ACF of SRES2

           -1.0 -0.8 -0.6 -0.4 -0.2  0.0  0.2  0.4  0.6  0.8  1.0
             +----+----+----+----+----+----+----+----+----+----+
  1  -0.026                          XX
  2   0.087                           XXX
  3  -0.015                           X
  4  -0.022                          XX
  5   0.054                           XX
  6   0.069                           XXX
  7   0.072                           XXX
  8   0.083                           XXX
  9   0.040                           XX
 10   0.068                           XXX
 11   0.015                           X
 12   0.018                           X
 13  -0.033                          XX
 14   0.045                           XX
 15  -0.070                         XXX
 16   0.047                           XX
 17   0.132                           XXXX
 18   0.073                           XXX
 19   0.064                           XXX
 20  -0.021                          XX
The Durbin–Watson test is not significant ($z = (2.05/2 - 1)\sqrt{221} = .37$), and neither
is the runs test:
Runs test for SRES2
Runs above and below K = 0
The observed number of runs = 102
The expected number of runs = 111.317
106 observations above K, 115 below
P-value = 0.208
Finally, a residual versus fitted plot indicates some slight nonconstant variance, but
otherwise there don’t appear to be any problems:
[Figure: Residuals Versus the Fitted Values (response is intstar). Vertical axis: Standardized Residual (-2 to 2); horizontal axis: Fitted Value (70 to 100).]

[Figure: Normal Probability Plot of the Residuals (response is intstar). Vertical axis: Standardized Residual (-2 to 2); horizontal axis: Normal Score (-3 to 3).]
Row     SRES2        HI2      COOK2
  1         *          *          *
  2  -0.06888  0.0059557  0.0000142
  3  -1.21031  0.0058103  0.0042805
  4  -0.15962  0.0059429  0.0000762
  5   1.68232  0.0045307  0.0064406
  6   1.75803  0.0057982  0.0090123
  7  -1.11728  0.0109818  0.0069305
  8   1.50206  0.0076968  0.0087500
  9   0.85784  0.0170279  0.0063739
 10  -1.23435  0.0080758  0.0062023
 11   0.54326  0.0162340  0.0024351
 12  -0.94893  0.0060030  0.0027191
 13   0.66322  0.0045633  0.0010082
 14   0.57954  0.0068265  0.0011543
 15   0.63161  0.0186940  0.0037999
 16   1.12708  0.0046435  0.0029631
 17  -0.28909  0.0047412  0.0001991
 18  -1.72482  0.0054976  0.0082229
 19   2.10828  0.0049836  0.0111311
 20  -1.35300  0.0183074  0.0170693
 21   2.00986  0.0045500  0.0092319
 22   0.17931  0.0178870  0.0002928
 23   1.85113  0.0079380  0.0137092
 24   0.21255  0.0200963  0.0004632
 25   0.10315  0.0062935  0.0000337
 26  -0.75234  0.0130769  0.0037499
 27  -0.87564  0.0058799  0.0022675
 28   1.15510  0.0060954  0.0040913
 29   1.34962  0.0077734  0.0071350
 30  -0.69286  0.0104149  0.0025262
 31   1.92303  0.0045755  0.0084990
 32  -0.07669  0.0175117  0.0000524
 33   0.52384  0.0062935  0.0008690
 34  -1.31246  0.0158828  0.0139003
 35   0.64940  0.0067620  0.0014356
 36  -0.20470  0.0155014  0.0003299
 37  -0.27955  0.0061450  0.0002416
 38  -1.01452  0.0144302  0.0075348
 39   1.73533  0.0057832  0.0087584
 40   0.48717  0.0053449  0.0006377
 41   0.12848  0.0047976  0.0000398
 42  -0.84575  0.0046481  0.0016701
 43  -1.92309  0.0073751  0.0137390
 44   2.06020  0.0047462  0.0101206
 45   0.43925  0.0047920  0.0004645
 46   0.25661  0.0049230  0.0001629
 47   0.38638  0.0049230  0.0003693
 48   1.00064  0.0096617  0.0048843
 49   0.91621  0.0066255  0.0027994
 50  -0.85636  0.0074114  0.0027378
 51   0.88401  0.0048604  0.0019084
 52  -1.93311  0.0048542  0.0091141
 53   2.25204  0.0045928  0.0117003
 54   0.02621  0.0052448  0.0000018
 55   1.24228  0.0112750  0.0087993
 56   0.70011  0.0057622  0.0014204
 57   0.07085  0.0072207  0.0000183
 58   0.83656  0.0081978  0.0028923
 59   1.51139  0.0045642  0.0052369
 60   0.21446  0.0179654  0.0004207
 61  -0.88106  0.0050408  0.0019664
 62  -1.64480  0.0057123  0.0077713
 63  -0.01817  0.0081366  0.0000014
 64   0.47416  0.0144302  0.0016459
 65   0.05791  0.0053329  0.0000090
 66  -0.04551  0.0046113  0.0000048
 67   0.80049  0.0046481  0.0014962
 68   0.41952  0.0196076  0.0017600
 69  -0.95180  0.0061450  0.0028006
 70   0.20459  0.0047002  0.0000988
 71  -0.74726  0.0053449  0.0015003
 72   1.60182  0.0047976  0.0061846
 73  -0.00490  0.0213885  0.0000003
 74   2.25516  0.0060030  0.0153570
 75   0.30717  0.0174347  0.0008371
 76   1.08464  0.0045396  0.0026825
 77   0.34003  0.0183074  0.0010781
 78   0.68550  0.0052089  0.0012303
 79  -0.46033  0.0151615  0.0016311
 80  -0.32160  0.0062935  0.0003275
 81   2.25055  0.0053685  0.0136691
 82   2.39998  0.0048287  0.0139739
 83   0.00744  0.0171417  0.0000005
 84   1.73613  0.0050408  0.0076354
 85  -1.18241  0.0170658  0.0121369
 86  -0.09986  0.0045500  0.0000228
 87   1.31411  0.0178870  0.0157256
 88  -1.19477  0.0067620  0.0048591
 89  -0.73568  0.0086700  0.0023667
 90  -0.28371  0.0056957  0.0002305
 91  -0.39396  0.0077734  0.0006080
 92   0.12193  0.0170658  0.0001291
 93   1.07190  0.0056273  0.0032511
 94  -1.04561  0.0134139  0.0074325
 95  -0.34384  0.0047703  0.0002833
 96  -1.01959  0.0093492  0.0049054
 97  -1.29061  0.0072207  0.0060574
 98   1.75015  0.0066879  0.0103116
 99   0.49492  0.0060820  0.0007494
100   1.16295  0.0065234  0.0044403
101   0.04542  0.0072459  0.0000075
102  -0.56352  0.0077159  0.0012346
103   0.93597  0.0144302  0.0064132
104   0.44439  0.0057503  0.0005711
105  -0.75373  0.0134139  0.0038621
106  -0.69250  0.0078836  0.0019054
107   0.36702  0.0080223  0.0005447
108   1.12507  0.0066879  0.0042613
109   0.75279  0.0073932  0.0021104
110  -0.54619  0.0103779  0.0015642
111   0.04232  0.0147926  0.0000134
112   0.06887  0.0069359  0.0000166
113  -1.47294  0.0127465  0.0140055
114  -1.37786  0.0071164  0.0068036
115  -0.18155  0.0075977  0.0001262
116  -0.49365  0.0055321  0.0006778
117  -1.31791  0.0096031  0.0084206
118  -1.15149  0.0098635  0.0066043
119  -1.74640  0.0088832  0.0136680
120  -0.89170  0.0118107  0.0047517
121  -0.92055  0.0118231  0.0050695
122   1.17035  0.0080960  0.0055899
123  -1.35648  0.0080223  0.0074403
124  -1.13356  0.0060820  0.0039315
125  -0.19884  0.0059429  0.0001182
126   0.33259  0.0088609  0.0004945
127   0.24002  0.0072207  0.0002095
128  -0.64337  0.0081978  0.0017107
129  -1.04058  0.0063799  0.0034762
130   1.27417  0.0054618  0.0044580
131  -2.04268  0.0049464  0.0103709
132  -0.27925  0.0079427  0.0003122
133   0.82585  0.0109546  0.0037770
134  -1.32010  0.0074968  0.0065816
135   0.13080  0.0130949  0.0001135
136   0.19253  0.0106685  0.0001999
137  -0.13495  0.0083149  0.0000764
138  -0.21362  0.0098882  0.0002279
139   0.26297  0.0053643  0.0001865
140   0.39638  0.0056841  0.0004491
141   0.23489  0.0065234  0.0001811
142  -0.09010  0.0178870  0.0000739
143  -0.70973  0.0052089  0.0013188
144   0.31348  0.0065385  0.0003234
145  -1.00221  0.0124828  0.0063483
146  -1.10221  0.0113654  0.0069831
147  -0.67211  0.0127158  0.0029090
148  -1.61388  0.0094594  0.0124368
149  -1.40123  0.0115626  0.0114841
150   0.42158  0.0109431  0.0009832
151  -0.02521  0.0055865  0.0000018
152   1.80438  0.0075710  0.0124188
153  -0.20277  0.0096031  0.0001993
154  -0.28322  0.0112472  0.0004562
155  -0.34754  0.0083149  0.0005064
156  -1.05507  0.0137249  0.0077454
157   0.61148  0.0069359  0.0013058
158  -1.60721  0.0155014  0.0203362
159   1.41389  0.0074784  0.0075313
160  -1.33303  0.0124225  0.0111760
161  -0.12588  0.0054107  0.0000431
162  -0.17728  0.0082271  0.0001303
163   0.60297  0.0055321  0.0010112
164   0.34057  0.0096031  0.0005623
165  -0.08123  0.0151615  0.0000508
166  -1.20173  0.0069359  0.0050433
167  -1.15250  0.0096272  0.0064558
168  -0.32793  0.0115464  0.0006281
169  -0.35233  0.0074968  0.0004688
170  -1.37297  0.0093592  0.0089046
171  -0.99730  0.0069526  0.0034817
172  -0.56347  0.0045256  0.0007217
173  -0.11126  0.0060555  0.0000377
174  -0.75358  0.0058399  0.0016679
175   0.66287  0.0081366  0.0018023
176  -0.84999  0.0045256  0.0016423
177  -0.62252  0.0060555  0.0011805
178   0.83642  0.0078204  0.0027571
179  -1.78426  0.0075153  0.0120533
180   1.47592  0.0052723  0.0057728
181   1.23528  0.0196076  0.0152589
182   1.55052  0.0067620  0.0081836
183   0.69415  0.0118396  0.0028866
184  -1.43357  0.0112354  0.0116763
185  -0.88078  0.0151466  0.0059655
186  -0.92324  0.0124225  0.0053609
187  -0.69250  0.0078836  0.0019054
188   1.10789  0.0137249  0.0085404
189   0.66696  0.0069359  0.0015535
190  -0.70752  0.0127465  0.0032315
191  -1.01764  0.0107815  0.0056435
192   0.63335  0.0107103  0.0021714
193   0.40465  0.0144158  0.0011975
194   1.16116  0.0048630  0.0032944
195  -0.16808  0.0175889  0.0002529
196  -1.12141  0.0066100  0.0041839
197  -0.44710  0.0068754  0.0006919
198  -1.17590  0.0124828  0.0087393
199   0.59300  0.0113654  0.0020213
200  -0.52561  0.0049259  0.0006838
201   0.14019  0.0060422  0.0000597
202  -0.36837  0.0127772  0.0008781
203  -0.44018  0.0073033  0.0007127
204  -1.77468  0.0118396  0.0188677
205   0.79171  0.0178479  0.0056953
206   1.82168  0.0046300  0.0077182
207   0.38157  0.0084198  0.0006181
208  -0.30627  0.0186940  0.0008935
209   0.59768  0.0051178  0.0009188
210  -0.00677  0.0080023  0.0000002
211  -1.17299  0.0118520  0.0082515
212   1.07970  0.0074968  0.0044027
213  -0.34709  0.0070536  0.0004279
214  -0.98962  0.0183074  0.0091319
215   0.82259  0.0067620  0.0023033
216  -0.03482  0.0155014  0.0000095
217  -0.24113  0.0056273  0.0001645
218   0.69115  0.0121346  0.0029339
219   0.85232  0.0048301  0.0017629
220  -1.30085  0.0131083  0.0112382
221  -0.84667  0.0122602  0.0044489
222   1.11647  0.0111919  0.0070543
Thus, the given line seems reasonable as a way to predict the time until the next eruption of the geyser, using the easily available duration of the previous eruption. A rough 95%
prediction interval for the time until the next eruption would be
$$\widehat{\text{Interval}} \pm 2\hat{\sigma}/\sqrt{1 - \hat{\rho}^2} = \widehat{\text{Interval}} \pm (2)(5.933)/\sqrt{1 - (-.256)^2} = \widehat{\text{Interval}} \pm (2)(6.14) = \widehat{\text{Interval}} \pm 12.28.$$
A rough 90% prediction interval would be
$$\widehat{\text{Interval}} \pm (1.65)(6.14) = \widehat{\text{Interval}} \pm 10.13.$$
You might have noted that the interval given on the Old Faithful WebCam site was of the form
$\widehat{\text{Predicted time}} \pm 10$, so apparently it corresponds to a 90% prediction interval.
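The half-widths of these rough prediction intervals follow directly from the formula. A short sketch, with `rough_prediction_halfwidth` as a hypothetical helper name and the multipliers 2 and 1.65 taken from the text:

```python
import math

def rough_prediction_halfwidth(sigma_hat, rho_hat, multiplier):
    """Half-width multiplier * sigma_hat / sqrt(1 - rho_hat^2) of a rough interval."""
    return multiplier * sigma_hat / math.sqrt(1.0 - rho_hat ** 2)

sigma_hat = 5.933   # S from the Cochrane-Orcutt regression
rho_hat = -0.256    # estimated lag-1 autocorrelation

half_95 = rough_prediction_halfwidth(sigma_hat, rho_hat, 2.0)
half_90 = rough_prediction_halfwidth(sigma_hat, rho_hat, 1.65)
print(f"95%: +/- {half_95:.2f} minutes, 90%: +/- {half_90:.2f} minutes")
```

Dividing by the square root of 1 minus the squared autocorrelation converts the standard deviation of the transformed errors back to the scale of the original errors.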
We can verify the usefulness of this model by validating it on a separate set of observations. The 1990 Azzalini and Bowman article includes data for 296 eruptions in August
1985. If the OLS and GLS (Cochrane–Orcutt) equations are applied to these new data,
the errors have the following statistics:
Descriptive Statistics: GLS error, OLS error

Variable    N  N*   Mean  Median  TrMean  StDev  SE Mean
GLS erro  230  68  2.012   1.223   1.745  6.823    0.450
OLS erro  230  68  2.066   1.369   1.833  6.735    0.444

Variable  Minimum  Maximum      Q1     Q3
GLS erro  -10.762   32.879  -3.711  6.032
OLS erro  -11.865   32.600  -2.871  6.397
In this case there is little to choose between the two models. Both underestimate the
time intervals by roughly two minutes on average, with standard deviations slightly higher
than the estimated standard deviation of errors from the original data.
As is typical, the Prais–Winsten procedure gives a similar result:
Analysis of Variance

Source          DF  Adj SS   Adj MS  F-Value  P-Value
Regression       1   19033  19032.8   534.14    0.000
  durstar2       1   19033  19032.8   534.14    0.000
Error          220    7839     35.6
  Lack-of-Fit  164    6066     37.0     1.17    0.253
  Pure Error    56    1773     31.7
Total          221   26872

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
5.96933  70.83%     70.69%      70.36%

Coefficients

Term       Coef  SE Coef  T-Value  P-Value   VIF
Constant  45.46     1.93    23.55    0.000
durstar2  9.722    0.421    23.11    0.000  1.00

Regression Equation

intstar2 = 45.46 + 9.722 durstar2
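This fit uses all 222 cases (221 total degrees of freedom) because the first observation is rescaled rather than dropped. The sketch below shows the standard textbook form of that transformation; the handout does not spell out its own construction of durstar2 and intstar2, so treat the details, the helper name `prais_winsten_transform`, and the made-up data as assumptions.

```python
import math

def prais_winsten_transform(series, rho):
    """Keep the first case, scaled by sqrt(1 - rho^2); the remaining cases
    use x_t - rho * x_{t-1}, so no observation is lost."""
    first = math.sqrt(1.0 - rho * rho) * series[0]
    rest = [cur - rho * prev for prev, cur in zip(series, series[1:])]
    return [first] + rest

# Hypothetical durations, for illustration only.
duration = [4.4, 3.9, 4.0, 4.0, 3.5]
durstar2 = prais_winsten_transform(duration, -0.256)
print(len(durstar2), round(durstar2[0], 3))
```

With more than 200 observations, keeping or dropping a single case has little effect, which is consistent with the two sets of estimates being so close.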
You might have wondered about the possibility of just using the lagged version of
Interval as a predictor in a regression, as we’ve done earlier. In fact, the regression
on just lagged interval is clearly inferior to using the duration of the previous eruption,
although the autocorrelation is addressed:
Analysis of Variance

Source              DF  Adj SS   Adj MS  F-Value  P-Value
Regression           1   14797  14797.0   151.72    0.000
  Lagged interval    1   14797  14797.0   151.72    0.000
Error              219   21358     97.5
  Lack-of-Fit       47    3291     70.0     0.67    0.948
  Pure Error       172   18067    105.0
Total              220   36155

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
9.87547  40.93%     40.66%      39.96%

Coefficients

Term                Coef  SE Coef  T-Value  P-Value   VIF
Constant          116.44     3.75    31.05    0.000
Lagged interval  -0.6399   0.0519   -12.32    0.000  1.00

Regression Equation

Interval = 116.44 - 0.6399 Lagged interval
Adding the duration variable (resulting in two predictors based on the previous eruption) yields a model that is comparable to the earlier models, but it is based on two
predictors, rather than one, and would thus be harder to implement “on the fly”:
Analysis of Variance

Source              DF   Adj SS   Adj MS  F-Value  P-Value
Regression           2  28448.8  14224.4   402.40    0.000
  Duration           1  13651.9  13651.9   386.20    0.000
  Lagged interval    1    635.6    635.6    17.98    0.000
Error              218   7706.1     35.3
  Lack-of-Fit      189   7196.2     38.1     2.17    0.008
  Pure Error        29    509.8     17.6
Total              220  36154.9

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
5.94550  78.69%     78.49%      78.14%

Coefficients

Term                Coef  SE Coef  T-Value  P-Value   VIF
Constant           50.14     4.06    12.35    0.000
Duration           9.159    0.466    19.65    0.000  1.59
Lagged interval  -0.1673   0.0395    -4.24    0.000  1.59

Regression Equation

Interval = 50.14 + 9.159 Duration - 0.1673 Lagged interval
We should note a mistake that we’ve made here: since these data have gaps corresponding to new days, cases 14, 27, 40, 54, 68, 82, 95, 108, 122, 136, 150, 164, 180, 194, and
208 should be considered missing in the Cochrane–Orcutt fit, since the lagged duration
and interval are not known for those cases (case 1 is of course taken as missing). If this is
done the results don’t change appreciably.
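One way to handle the gaps is to build the lagged variable so that the first case of each day has no lag. The sketch below uses the case numbers listed above; the function name `lag_with_gaps` and the placeholder series are hypothetical.

```python
# Case numbers (1-based) that begin a new day, as listed in the text.
GAP_CASES = {14, 27, 40, 54, 68, 82, 95, 108, 122, 136, 150, 164, 180, 194, 208}

def lag_with_gaps(series, gap_cases):
    """Lag a series by one case, returning None for case 1 and for every
    case that follows a gap (its previous value is not actually known)."""
    lagged = [None]  # case 1 never has a lagged value
    for i in range(1, len(series)):
        case_number = i + 1
        lagged.append(None if case_number in gap_cases else series[i - 1])
    return lagged

# Placeholder series of length 222 standing in for Duration or Interval.
placeholder = list(range(222))
lagged = lag_with_gaps(placeholder, GAP_CASES)
print(sum(value is None for value in lagged))  # cases treated as missing
```

Any starred variable formed from a missing lag would then also be treated as missing in the Cochrane–Orcutt fit.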