Geophys. J. Int. (1997) 131, 495-499
SPECIAL SECTION - ASSESSMENT OF SCHEMES FOR EARTHQUAKE PREDICTION
Earthquake prediction: the null hypothesis
Philip B. Stark
Department of Statistics, University of California, Berkeley, CA 94720-3860, USA. E-mail: [email protected]
Accepted 1997 July 1. Received 1997 June 6; in original form 1997 February 17
SUMMARY
The null hypothesis in assessing earthquake predictions is often, loosely speaking, that
the successful predictions are chance coincidences. To make this more precise requires
specifying a chance model for the predictions and/or the seismicity. The null hypothesis
tends to be rejected not only when the predictions have merit, but also when the
chance model is inappropriate. In one standard approach, the seismicity is taken to be
random and the predictions are held fixed. ‘Conditioning’ on the predictions this way
tends to reject the null hypothesis even when it is true, if the predictions depend on
the seismicity history. An approach that seems less likely to yield erroneous conclusions
is to compare the predictions with the predictions of a ‘sensible’ random prediction
algorithm that uses seismicity up to time t to predict what will happen after time t.
The null hypothesis is then that the predictions are no better than those of the random
algorithm. Significance levels can be assigned to this test in a more satisfactory way,
because the distribution of the success rate of the random predictions is under our
control. Failure to reject the null hypothesis indicates that there is no evidence that
any extra-seismic information the predictor uses (electrical signals for example) helps
to predict earthquakes.
Key word: earthquake prediction.
1 INTRODUCTION
Suppose we are given a seismicity sequence and a set of
earthquake predictions. We seek to assess the predictions
statistically to determine whether they have merit. We want
our test to have a significance level α; that is, we want to know
that when the (as yet unspecified) null hypothesis is true,
we have a chance of at most α of rejecting it erroneously. To
reject the null hypothesis is to conclude that the success of
the prediction method should not be ascribed to chance
coincidence; colloquially, this would be to conclude that the
prediction method has merit. A more precise statement is
‘either the null hypothesis is false, or an event that has
probability < α has occurred'. Throughout this note, we shall
take α = 0.1.
To conclude that the null hypothesis is false is not the same
as concluding that the prediction method works, but this
distinction is often neglected. We might conclude that the null
hypothesis is false neither because the predictions have merit
nor because an event with probability < α occurred, but
because the null hypothesis is a probabilistically inadequate
model, whether or not the predictions have merit.
It might help to have a simple example in mind. We are
given a black box with a button on top and a one-digit
© 1997 RAS
display. When we push the button, a number is displayed. We
hypothesize that inside the box a coin is tossed five times
whenever we push the button, and the display shows the
number of times the coin landed ‘heads’. We seek to test the
null hypothesis that the coin is fair against the alternative that
it has probability greater than 50 per cent of landing ‘heads’.
Under the null hypothesis, the number displayed has a
binomial distribution with parameters n = 5, p = 0.5; under the
alternative the number is binomial with n = 5, p > 0.5. If we
reject the null hypothesis when the display shows 4 or more,
we get a test with level α ≈ 0.19.
We push the button, and the display shows 9. We therefore
reject the null hypothesis. However, under the null hypothesis
(and under the alternative), 9 is an impossible outcome! It is
clear that the null hypothesis is false, but not because of the
hypothesized value of p; rather, something more fundamental
is wrong with our probabilistic model of the black box.
(In this case, the alternative hypothesis does not explain the
observation either.)
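The level of this test is easy to check directly; a minimal sketch (the function name is mine):

```python
from math import comb

def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Reject when the display shows 4 or more out of n = 5 tosses of a fair coin:
alpha = binom_tail(5, 0.5, 4)      # 6/32 = 0.1875, i.e. about 0.19

# Under either hypothesis, a display of 9 has probability zero:
p_nine = binom_tail(5, 0.5, 9)     # 0: nine heads in five tosses is impossible
```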
Let us look at a slightly different situation. We have a black
box as before, but now we record a history of its output. In
the first nine trials, the output has been (4, 3, 2, 4, 2, 3, 3, 5, 4).
We propose to test the hypothesis that n = 5, p = 0.5 against
the alternative n = 5, p > 0.5, by looking at the number of
times in 10 trials that the output is 4 or larger. Under the
null hypothesis, that number has a binomial distribution with
n = 10, p = 0.19, so if we reject when we observe the number 4
or higher four or more times in 10 trials, we have a test with
significance level α ≈ 0.1. We push the button one more time,
and 4 shows on the display, so we reject the null hypothesis.
This test does not have the significance level claimed. The
appropriate computation would have been to find the conditional
probability of observing the number 4 or higher, four
or more times in 10 trials, given that we had observed 4 or
higher four times in the first nine trials. That conditional
probability is clearly unity, not 0.1. The point is that whether
we compute a probability or a conditional probability in a
hypothesis test can matter a great deal.
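The gap between the nominal (unconditional) level and the conditional probability can be made concrete; a sketch, with function names of my own choosing:

```python
from math import comb

# P(display >= 4) on a single push, under the null Binomial(5, 0.5):
p_high = sum(comb(5, j) * 0.5**5 for j in (4, 5))        # 0.1875

def tail10(p, k):
    """P(at least k 'high' displays in 10 independent trials)."""
    return sum(comb(10, j) * p**j * (1 - p)**(10 - j) for j in range(k, 11))

alpha_unconditional = tail10(p_high, 4)   # about 0.10, the nominal level
# But having already seen four high displays in the first nine trials,
# P(four or more high in 10 | four high in first nine) = 1, not 0.10.
```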
One common approach to assessing earthquake predictions
(e.g. Mulargia & Gasperini 1992; Riedel 1996; Varotsos et al.
1996b) is to model seismicity as a stochastic process, holding
the predictions fixed, and then to compare the observed success
rate of the predictions on the real seismicity with the success
rate of the predictions on random seismicity generated from
the stochastic-process model. If the measured prediction success
rate exceeds the 90th percentile of the success rate under the
stochastic model, one rejects the null hypothesis (at a significance level α = 0.1) to conclude that the prediction method
works. There are several problems with this approach:
(1) We might reject the null hypothesis not because the
predictions have merit, but because the stochastic model of
seismicity is poor. An event that appears to be unlikely
according to our model of seismicity might in actuality be
quite likely. This idea is illustrated in the first ‘black box’
example above.
(2) The significance level of the test is conditional. The
nominal significance level is not the probability of a false
rejection, but the conditional probability of a false rejection
given the predictions (and the stochastic model of seismicity).
If the predictions are a function of the seismicity (for example,
if the prediction algorithm tries to exploit clustering), this
conditional probability can be very misleading, as we shall see.
This approach pretends that no matter what the seismicity
had been, the predictions would have been the same, while no
reasonable geophysicist would ignore recent seismicity in predicting future seismicity. This general concern is illustrated in
the second ‘black box’ example above.
(3) The test does not answer a more important practical
question: does the prediction method (which typically uses
information extrinsic to the seismicity, such as electrical signals)
do a better job of predicting earthquakes than a reasonable
seismologist could from the seismicity alone, using a simple
model? If the answer to this question is ‘no’, one might question
the utility of the prediction method.
The difficulty of specifying a unique stochastic model for
earthquakes (point 1) is well known; see Kagan (1991) and
Ogata (1988) for a variety of competing models. The sensitivity
of the conclusions about the earthquake predictions to details
of the stochastic model of aftershocks and to the spatial
variability of the rate of seismicity is well documented; see
e.g. Honkura & Tanaka (1996), Mulargia & Gasperini (1992),
Utada (1996) and Wyss & Allmann (1996). This note demonstrates that point (2) can substantially increase the chance
of concluding erroneously that the predictions have merit,
especially when the seismicity clusters. A similar conclusion
was reached by Michael (1997) in evaluating earthquake
predictions based on VLF anomalies. Point (3) has also been
raised by a number of authors, including Kagan & Jackson
(1996) and Mulargia & Gasperini (1996b). The explicit
recognition that this approach to testing predictions is a
conditional test appears to be new.
An alternative approach is to reformulate the null hypothesis
to say that the predictions are no better than those of another,
presumably simpler, method. In this approach, one conditions
on the observed seismicity, rather than on the predictions. I
assert that this approach overcomes all three of the problems
just mentioned. In this approach, one holds seismicity fixed,
and generates random predictions similar in character to given
predictions (for example, the number of predictions, length of
alarms, etc.). The algorithm for generating random predictions
should be sensible and causal: it should use only seismic
information available before time t in predicting what will
happen at time t. If the observed success rate is regularly
exceeded by random predictions, one concludes that the prediction method is not useful, in so far as it does no better than
a particular crude automated strategy that uses only seismic
information. This approach does not rely on a stochastic
model for seismicity, it allows for the possibility that the
predictions are a function of seismicity up to the time of each
prediction, and it explicitly compares the success rate with that
of other methods that rely exclusively on seismic data, not
extrinsic observations such as electrical signals. By deliberately
introducing chance into the ‘straw-man’ prediction algorithm,
one can assign a significance level to the test.
The approach was suggested by Stark (1996), but variants
have been suggested by others. For example, Kagan (1996)
suggested an extreme case of this approach: the ‘automatic
alarm’ strategy, in which one issues an alarm after every
sufficiently large event; he also compared the Varotsos,
Alexopoulos & Nomicos (VAN, 1981) Greek earthquake
predictions (Varotsos et al. 1996a) with a machine-learning
approach. Mulargia, Marzocchi & Gasperini (1996) also
explicitly compared the VAN predictions with a pattern-recognition algorithm that automatically produces an alarm
whenever certain conditions are met. Aceves, Park & Strauss
(1996) compared predictions derived by randomly sampling
the historic catalogue with the predictions to be evaluated
(VAN in their case); my principal objection to their
approach is the attempt to remove clustering from both the
predictions and the historical seismicity, and the fact that the
randomization of the catalogue prevents the predictions from
exploiting the seismic history.
The primary differences between this work and those just
mentioned are the recognition of the conditional nature of the
other approach to hypothesis testing; the deliberate introduction of chance into the comparison prediction algorithms
in order to obtain a more traditional statistical test; and the
rephrasing of the null hypothesis to be ‘these predictions are
no better than those of a (particular) automated strategy’,
rather than ‘the observed successes of these predictions are
chance coincidences’. The introduction of chance allows one
to adjust the prediction rate to match that of the method being
tested in a straightforward way, and allows one to assign a
significance level to the test.
2 SIMULATION MODEL
The points raised above can be illustrated by simulation. We
shall model seismicity as a Gamma renewal process, which is
one generalization of a Poisson process. In a Poisson process,
the times between events (interevent times) are independent
and identically distributed (iid) exponential random variables.
In a Gamma renewal process, interevent times are iid
Gamma(α, β) random variables. For α = 1, the Gamma distribution coincides with the exponential, yielding a Poisson
process as a special case. For other values of α, we can produce
synthetic seismicity sequences with more (α < 1) or less (α > 1)
clustering than Poisson processes exhibit. Udias & Rice (1975)
found that a Gamma renewal process with α = 0.509 (more
clustering than Poisson) fits some seismicity sequences better
than Poisson processes could. Fig. 1 shows three simulated
realizations of each of three Gamma renewal processes, all
with a rate of 15 events yr⁻¹. The top three sequences are for
a Poisson process, the middle three are for a process with
more clustering than a Poisson, and the bottom three are for
a process with still more clustering.
In spite of their ability to model clustered seismicity, Gamma
renewal processes do not have aftershocks per se: a shock does
not raise the chance of a new shock by some mechanism;
rather, the clock restarts after every event, and the expected
rate of seismicity, 1/(αβ), is constant over time. This is in contrast
to some stochastic models, such as the epidemic-type
aftershock sequence (ETAS) model (see Ogata 1988, 1993).
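Gamma renewal sequences of this kind are straightforward to simulate. In the following sketch (function and variable names are mine) β is a scale parameter, so the mean interevent time is αβ and the long-run rate is 1/(αβ):

```python
import numpy as np

def gamma_renewal(alpha, beta, t_max, rng):
    """Event times of a Gamma(alpha, beta) renewal process on [0, t_max]:
    interevent times are iid Gamma with shape alpha and scale beta."""
    times, t = [], rng.gamma(alpha, beta)
    while t < t_max:
        times.append(t)
        t += rng.gamma(alpha, beta)
    return np.array(times)

rng = np.random.default_rng(0)
poisson_seq   = gamma_renewal(1.0, 1 / 15, 3.0, rng)   # alpha = 1: Poisson, ~45 events
clustered_seq = gamma_renewal(0.5, 2 / 15, 3.0, rng)   # same rate, more clustering
```

Both calls use αβ = 1/15 yr, so each sequence averages about 15 events per year over the 3-year window.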
To give a scale to the simulations, we shall calibrate the
model to correspond roughly with Greek seismicity from
1987-1989, as reported by SI-NOA and tabulated by Geller
(1996). That tabulation shows an average of 15 events per
year with magnitude Ms > 5.0. We thus fix the expected
number of events per year, 1/(αβ), in our simulations to be 15.
This will allow us to make a crude test of the VAN (Varotsos
et al. 1981) earthquake predictions (e.g. Varotsos et al. 1996a),
but this paper does not attempt a formal test of the VAN
predictions.
Figure 1. Three simulated realizations of each of three Gamma renewal processes with the same rate, 15 events yr⁻¹. The top three sequences are simulations of a Poisson process (Gamma(1, 0.067)), the middle three (Gamma(0.5, 0.133)) show more clustering, and the bottom three (Gamma(0.10, 0.66)) still more. There are no 'aftershocks' in these models: the interevent times are independent and identically distributed. The difference is in the weight in the tails of the interevent distributions.

Because the observed rate of seismicity is a sufficient statistic
for the intensity of a homogeneous Poisson process, if we were
to assume that the random seismicity sequences we generate
were Poisson-distributed, and estimate the intensity by maximum likelihood, the expected value of the estimate would be
15 events yr-’. (Nothing in estimating the intensity would alert
us to the fact that the process was not Poisson, and we would
estimate the entire family of Gamma renewal processes with
fixed αβ to be about the same Poisson process.) We shall
generate 3-year sequences of seismicity, which correspond
roughly with the 1987-1989 VAN predictions, treating all of
Greece as a single region (that is, ignoring spatial variations in
the seismicity rate).
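The point about the sufficient statistic can be checked numerically: the maximum-likelihood Poisson intensity is just (number of events)/(observation time), whatever the true interevent distribution. A sketch, under the same assumed shape/scale parameterization as above (names mine):

```python
import numpy as np

def mle_poisson_intensity(event_times, t_max):
    """MLE of the rate of a homogeneous Poisson process observed on [0, t_max]."""
    return len(event_times) / t_max

def renewal(alpha, beta, t_max, rng):
    # Gamma(shape=alpha, scale=beta) interevent times; long-run rate 1/(alpha*beta).
    times, t = [], rng.gamma(alpha, beta)
    while t < t_max:
        times.append(t)
        t += rng.gamma(alpha, beta)
    return times

rng = np.random.default_rng(2)
# Clustered (non-Poisson) catalogues still give estimates near 15 events/yr:
estimates = [mle_poisson_intensity(renewal(0.5, 2 / 15, 3.0, rng), 3.0)
             for _ in range(500)]
```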
3 SIMULATION 'EXPERIMENTS'
3.1 First simulation
The first simulation illustrates points (1) and (2) of the
Introduction. In this experiment, we simulate 3 years of
Gamma(0.5, 0.133) seismicity (which tends to cluster more than
a Poisson process with the same rate). From this sequence, we
generate a random set of earthquake predictions: each time
there is an event, we toss a coin. If the coin lands on ‘heads’,
we predict that there will be another event within 2.5 weeks
(issue an alarm with duration 2.5 weeks). If the coin lands on
'tails', we do not issue a prediction. If a new prediction is
issued during an alarm, we extend the alarm time accordingly,
without increasing the number of predictions. Applying this
rule to the 'observed' seismicity sequence resulted in 15 predictions whose total duration was 0.83 years (28 per cent of
the 3-year period). Note that 15 is somewhat less than 0.5 × 45,
which would be the expected number of alarms if the alarm
window were vanishingly short, rather than of 2.5 weeks
duration. The smaller number of predictions results from the
concatenation of overlapping alarms. For this sequence of 15
predictions, the observed success rate is

success rate = (number of correctly predicted events) / (total number of events) = 0.2.
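The coin-toss prediction rule above can be sketched as follows; `heads` is passed in explicitly so the rule is deterministic given the coin tosses, and the names and the strict-inequality convention for counting hits are mine:

```python
def coin_toss_alarms(event_times, heads, window=2.5 / 52):
    """One alarm of length `window` (in years) after each event whose coin
    lands heads; a head during an open alarm extends the alarm instead of
    opening a new one, so overlapping alarms merge into one prediction."""
    alarms = []
    for t, h in zip(event_times, heads):
        if not h:
            continue
        if alarms and t <= alarms[-1][1]:
            alarms[-1][1] = t + window          # extend the current alarm
        else:
            alarms.append([t, t + window])      # open a new alarm
    return alarms

def success_rate(event_times, alarms):
    """Fraction of events falling strictly inside an alarm issued earlier."""
    hits = sum(any(a < t <= b for a, b in alarms) for t in event_times)
    return hits / len(event_times)
```

Note that the triggering event itself is not counted as predicted (the alarm interval is open at its start), which is why the success rate measures genuine forecasts.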
Holding those 15 predictions fixed, we now generate 1000
3-year seismicity sequences from two renewal processes that
have the same rate, 15 events yr⁻¹: a Gamma(0.5, 0.133) process (the same as the one that generated the 'observed'
seismicity) and a Gamma(1, 0.067) process (Poisson). From
the empirical cumulative distribution functions (ecdfs) of the
success rates of the fixed predictions on the two sets of random
seismicity, we can estimate the success rate that corresponds
to a significance-level 0.1 test of the null hypothesis (the critical
value), and compare the observed success rate for the original
sequence with the critical value to test the hypothesis. The
results of the two sets of simulations are shown in Table 1, in
which the symbol F0.9 denotes the 90th percentile of the success
rate in 1000 simulations.
Perhaps surprisingly, this test rejects the null hypothesis
even when it is true, that is even when the model of the
seismicity used in the test is the same as that used to generate
the original ‘observed’ sequence, and the correct predictions
succeed ‘by chance’. This results from conditioning on the
predictions: the predictions depend on the seismicity history.
They exploit clustering in the sequence from which they derive
by issuing alarms whose duration is somewhat longer than the
median interevent time after some events. While each simulated
Table 1. Results of simulations testing the hypothesis that successful
predictions are chance coincidences, conditional on the predictions,
with two different chance models for seismicity. Column 1: process
used to model seismicity; column 2: 90th percentile of the fraction of
events correctly predicted by the fixed predictions; equivalently, the
critical value for a significance-level 0.1 conditional test of the null
hypothesis.

Process                    F0.9   reject null hypothesis?   median false-alarm rate
True Gamma(0.5, 0.133)     0.18   YES                       0.53
Poisson Gamma(1, 0.067)    0.17   YES                       0.47
seismicity sequence has about the same amount of clustering,
the clusters are in different places, so the predictions are,
on average, less successful. This is one of the reasons that
conditioning on the predictions tends to yield erroneous
conclusions.
The median false-alarm rate is higher for the true process
than for a Poisson process with the same seismicity rate. This
also results from clustering: in the extreme limit of clustering,
all the simulated events would occur in one cluster, so if we
had n alarms, at least n - 1 of them would have to be false
alarms. On the other hand, conditional on the number of
events, the times of the Poisson-distributed events are uniformly distributed over the 3-year interval, so we have a
reasonable chance of no false alarms once the number of events
exceeds the number of alarms.
3.2 Second simulation
The second simulation is designed to evaluate the proposed
approach to testing the revised null hypothesis; namely, conditioning on the observed seismicity (holding it constant)
and comparing the success rate of the observed predictions
with that of randomly generated predictions. Using the
original seismicity sequence of the first simulation, we generate
1000 sets of random predictions, corresponding to different
realizations of the coin-tossing stage of that simulation. We
then calculate the empirical cdf of the success rate of those
predictions, and use its 90th percentile as the critical value of
the test of the null hypothesis. The results of these simulations
are shown in Table 2. In this test, the null hypothesis is
appropriately not rejected. (Note, however, that the original
realization of the coin tosses was particularly lucky!) This test
behaves as we would like it to, and addresses point (3) in the
Introduction: without using extrinsic information, a simple
rule for predicting seismicity does as well as or better than the
predictions being evaluated more than 10 per cent of the time.
Thus the value of any extrinsic information the predictions use
(in this case, there is none) has not been established. The null
hypothesis that this method is no better than an automated
strategy that uses only seismic information is not rejected.
Table 2. Results of 1000 simulations to test the revised null hypothesis
that the predictions are no better than those of an automatic method
that uses no extra-seismic information, conditionally on the observed
seismicity, rather than testing the null hypothesis that the successful
predictions succeeded by chance, conditionally on the predictions.
Column 1: 90th percentile of success rate of random predictions in
1000 trials.
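The seismicity-conditioned test of this section can be sketched end to end. This is a hedged sketch, not the exact code used for the simulations: function names are mine, and the hit-counting convention follows the description above.

```python
import numpy as np

def alarms_from_coins(times, heads, window=2.5 / 52):
    """Coin-toss alarms on a catalogue; overlapping alarms merge."""
    out = []
    for t, h in zip(times, heads):
        if h:
            if out and t <= out[-1][1]:
                out[-1][1] = t + window     # extend the open alarm
            else:
                out.append([t, t + window])
    return out

def success_rate(times, alarms):
    return sum(any(a < t <= b for a, b in alarms) for t in times) / len(times)

def critical_value(times, n_sims=1000, q=0.9, seed=0):
    """q-th quantile of the success rates of random coin-toss predictions
    on the *fixed* catalogue `times`: the critical value for the revised null."""
    rng = np.random.default_rng(seed)
    rates = [success_rate(times,
                          alarms_from_coins(times, rng.random(len(times)) < 0.5))
             for _ in range(n_sims)]
    return float(np.quantile(rates, q))

# Reject the revised null only if the observed success rate exceeds this value.
```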
3.3 Third simulation
The third simulation repeats the first two, but with the
'observed' seismicity generated from a Poisson process rather
than a more clustered Gamma renewal process. In this case,
the particular realization of the process and the coin tosses
led to 13 predictions with a total alarm time of 0.69 years
(23 per cent). 11 per cent of the events were successfully
predicted, and the false-alarm rate was 38 per cent. Note that
even for Poisson seismicity, the 90th percentile of the success
rate of the random predictions is higher when we condition
on the seismicity than when we condition on the predictions.
3.4 Fourth simulation
The fourth simulation apes a test of the VAN predictions
(Varotsos et al. 1996a). To make a more accurate test
would require accounting for the spatial heterogeneity of the
seismicity rate in Greece, which is beyond the scope of the
present work. According to the preliminary determination of
epicenters (PDE), the number of events in Greece between
1987 and 1989 with mb > 4.7 is 39 (Geller 1996). During the
same interval, the VAN group issued 23 predictions (Varotsos
et al. 1996a); the claimed success rate is 38 per cent. The
nominal duration of an alarm is about 23 days. Table 4 shows
the result of testing the null hypothesis that the successful
predictions are chance coincidences by conditioning on the
predictions and modelling the seismicity either as a Poisson
process or a Gamma renewal process with the same rate. In
both cases, the null hypothesis would have been rejected.
Table 5 shows the result of testing the null hypothesis that
the VAN predictions are no better than an automated strategy
that uses no extra-seismic information, conditional on the
observed seismicity. The automatic strategy was to generate
Table 3. Comparison of testing conditionally on the predictions, and
conditionally on the seismicity, when the seismicity is simulated from
a Poisson process.

Conditional on   F0.9   reject null hypothesis?   median false-alarm rate
predictions      0.15   NO                        0.46
seismicity       0.17   NO                        0.45
Table 4. Test of an idealization of the null hypothesis that the
successful VAN predictions, which are based on electrical signals, are
chance coincidences, holding the predictions fixed and comparing with
random seismicity. Column 1: 90th percentile of success rate of the
predictions on random seismicity.

Seismicity model    F0.9   reject null hypothesis?   median false-alarm rate
Poisson             0.17   YES                       0.30
Gamma(0.5, 0.15)    0.19   YES                       0.40
Table 5. Test of an idealization of the null hypothesis that the VAN
predictions, which use electrical signals, are no better than an automated prediction method that uses no extra-seismic information,
holding the seismicity fixed and comparing the VAN success rate with
that of random predictions. Column 1: 90th percentile of success rate
of random 23-day predictions on the true seismicity.

F0.9   reject null hypothesis?   median false-alarm rate
0.21   NO                        0.33
0.49   NO                        0.30
random 23-day ‘alarms’ from the observed seismicity, repeating
the steps of the earlier simulations, but using a biased coin
with probability p = 23/39 of landing on ‘heads’ (so that the
expected number of predictions, assuming no overlapping
alarms, agrees with the actual number of VAN predictions).
This resulted in an average of 10 predictions with an average
total alarm time of 0.95 years (32 per cent of the interval). The
90th percentile of the success rate of the random predictions
was 49 per cent, so the null hypothesis would not be rejected.
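The calibration of the coin bias in this simulation is just a matching of expected counts (variable names mine):

```python
n_events, n_van_predictions = 39, 23
p_heads = n_van_predictions / n_events      # about 0.59

# Ignoring the merging of overlapping alarms, the expected number of
# random predictions is p_heads * n_events = 23, matching the number of
# VAN predictions; merging reduced it to about 10 on average.
expected_alarms = p_heads * n_events
```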
4 DISCUSSION
The basic phenomena exhibited here (that using a conditional
hypothesis test or an inappropriate null hypothesis can be
misleading) arise in other geophysical problems as well. For
example, Hide & Malin (1970) found apparently statistically
significant correlations between the geoid and the geomagnetic
field. Eckhardt (1984) showed that those correlations are not
meaningful, and that the impression of statistical significance
came from an inappropriate null hypothesis, which included
the assumption that the spectrum of the fields was white
(rather than red, as most geophysical fields are), and from the
fact that the significance level did not account for the fact that
certain parameters were fitted to maximize the apparent correlation (in Hide & Malin’s case, a rotation between the fields).
Not accounting for the fact that certain parameters have been
adjusted on the basis of the data amounts to a conditional
test of the null hypothesis, which we have seen can be quite
misleading. Similarly, Morelli & Dziewonski (1987) found
that the correlations between core-mantle topography models
inferred from PcP and PKP traveltimes were apparently
significant. They used the same (inappropriate) null hypothesis
as Hide & Malin, namely that the spectrum of CMB heterogeneity is white, and in assigning a significance level to the
observed correlation they did not account for the calibration
of various parameters, such as the number of singular functions
retained in their damped least-squares estimates of topography
(Stark 1995).
These same effects are present in some assessments of
earthquake predictions, for example the VAN predictions:
Varotsos et al. (1996b) explicitly advocate the assumption that
the times, locations and magnitudes of events are jointly
independent, which is almost certainly false; the null hypothesis
has in many cases included the assumption that seismicity
(possibly after some data processing to ‘decluster’ the
sequences) has a Poisson distribution, which seems implausible
to me; and the geographical, magnitude and temporal windows
of the predictions were adjusted several times by the VAN
group to improve the apparent success rate of the predictions,
which has not generally been accounted for in determining
the significance level (Kagan & Jackson 1996; Mulargia &
Gasperini 1996a). Mulargia (1997) shows that if one accounts
for the optimization of those parameters in evaluating the
VAN predictions, the apparent significance disappears.
ACKNOWLEDGMENTS
I am grateful to S. N. Evans and J. R. Rice for helpful
discussions. I received support from the NSF and NASA.
REFERENCES
Aceves, R.L., Park, S.K. & Strauss, D.J., 1996. Statistical evaluation
of the VAN method using the historic earthquake catalog in Greece,
Geophys. Res. Lett., 23, 1425-1428.
Eckhardt, D.H., 1984. Correlations between global features of
terrestrial fields, Math. Geol., 16, 155-171.
Geller, R.J., 1996. Debate on evaluation of the VAN method: Editor’s
introduction, Geophys. Res. Lett., 23, 1291-1293.
Hide, R. & Malin, S.R.C., 1970. Novel correlations between global
features of the Earth's gravitational and magnetic fields, Nature,
225, 605-609.
Honkura, Y. & Tanaka, N., 1996. Probability of earthquake
occurrences in Greece with special reference to the VAN predictions,
Geophys. Res. Lett., 23, 1417-1420.
Kagan, Y.Y., 1991. Likelihood analysis of earthquake catalogues,
Geophys. J. Int., 106, 135-148.
Kagan, Y.Y., 1996. VAN earthquake predictions: an attempt at
statistical evaluation, Geophys. Res. Lett., 23, 1315-1318.
Kagan, Y.Y. & Jackson, D.D., 1996. Statistical tests of VAN earthquake
predictions: comments and reflections, Geophys. Res. Lett., 23,
1433-1436.
Michael, A.J., 1997. The evaluation of VLF guided waves as possible
earthquake precursors, Geophys. Res. Lett., in press.
Morelli, A. & Dziewonski, A.M., 1987. Topography of the core-mantle
boundary and lateral homogeneity of the liquid core, Nature,
325, 678-683.
Mulargia, F., 1997. Retrospective validation of the time association of
precursors, Geophys. J. Int., 131, 500-504 (this issue).
Mulargia, F. & Gasperini, P., 1992. Evaluating the statistical validity
of 'VAN' earthquake precursors, Geophys. J. Int., 111, 32-44.
Mulargia, F. & Gasperini, P., 1996a. Precursor candidacy and
validation: the VAN case so far, Geophys. Res. Lett., 23, 1323-1326.
Mulargia, F. & Gasperini, P., 1996b. VAN: Candidacy and
validation with the latest laws of the game, Geophys. Res. Lett., 23,
1327-1330.
Mulargia, F., Marzocchi, W. & Gasperini, P., 1996. Rebuttal to
Replies I and II by Varotsos et al., Geophys. Res. Lett., 23,
1339-1340.
Ogata, Y., 1988. Statistical models for earthquake occurrences and
residual analysis for point processes, J. Am. Stat. Assn, 83, 9-27.
Ogata, Y., 1993. Fast likelihood computation of epidemic type
aftershock-sequence model, Geophys. Res. Lett., 20, 2143-2146.
Riedel, K.S., 1996. Statistical tests for evaluating earthquake prediction
methods, Geophys. Res. Lett., 23, 1407-1409.
Stark, P.B., 1995. Reply to Comment by Morelli and Dziewonski,
J. geophys. Res., 100, 15399-15402.
Stark, P.B., 1996. A few considerations for ascribing statistical
significance to earthquake predictions, Geophys. Res. Lett., 23,
1399-1402.
Udias, A. & Rice, J.R., 1975. Statistical analysis of microearthquake
activity near San Andreas Geophysical Observatory, Bull. seism.
Soc. Am., 65, 809-828.
Utada, H., 1996. Difficulty of statistical evaluation of an earthquake
prediction method, Geophys. Res. Lett., 23, 1391-1394.
Varotsos, P., Alexopoulos, K. & Nomicos, K., 1981. Seismic electric
currents, Prakt. Akad. Athenon, 56, 277-286.
Varotsos, P., Eftaxias, K., Lazaridou, M., Dologlou, E. &
Hadjicontis, V., 1996a. Reply to ‘probability of chance correlations
of earthquakes with predictions in areas of heterogeneous seismicity
rate: the VAN case’, by M. Wyss and A. Allmann, Geophys. Res.
Lett., 23, 1311-1314.
Varotsos, P., Eftaxias, K., Vallianatos, F. & Lazaridou, M., 1996b.
Basic principles for evaluating an earthquake prediction method,
Geophys. Res. Lett., 23, 1295-1298.
Wyss, M. & Allmann, A., 1996. Probability of chance correlations of
earthquakes with predictions in areas of heterogeneous seismicity
rate: the VAN case, Geophys. Res. Lett., 23, 1307-1310.