A Generalization of Hewitt's Test for Seasonality

International Journal of Epidemiology
© International Epidemiological Association 1996
Vol. 25, No. 3
Printed in Great Britain
PETER A ROGERSON
Rogerson P A (Department of Geography and National Center for Geographic Information and Analysis, State University of
New York at Buffalo, Buffalo, NY 14261, USA). A generalization of Hewitt's test for seasonality. International Journal of
Epidemiology 1996; 25: 644-648.
Background. Hewitt's statistic for seasonality in monthly data is the maximal rank sum among all possible rank sums
derived using consecutive 6-month periods. In this paper, Hewitt's test is extended to include those instances where
3, 4, or 5-month pulses or periods of raised incidence are hypothesized.
Methods. Monte Carlo methods are used to derive the approximate distribution of the test statistic under the null hypothesis, when the length of the hypothesized period is k = 3, 4, or 5. A combinatorial method is used to derive exact levels
for the test statistic. The test is applied to monthly data on adolescent suicide. Finally, the power of the test is compared
with the χ² statistic using Monte Carlo simulation.
Results. The distribution of the test statistic was found and used to test the null hypothesis of no seasonal variation in
monthly adolescent suicides, using a period of k = 3 months. The null hypothesis was rejected, indicating seasonality in
the data. Monte Carlo simulations show the test statistic to be more powerful than the χ² statistic when sample sizes are
small.
Conclusions. This generalization of Hewitt's test should be most useful in those instances where the researcher wishes
to carry out a quick and simple test of the null hypothesis of no seasonality against the alternative of a predetermined
3, 4, or 5-month period of raised incidence. When there is no a priori hypothesis about the appropriate length of period,
the test may be used for different k values, in a more exploratory fashion.
Keywords: seasonality, Hewitt's test
There now exists a broad array of tests for detecting temporal trends or seasonality in monthly data.
One of the early tests was suggested by Edwards.1 His test, which has retained widespread popularity,
is designed to test the hypothesis of no seasonality against harmonic cyclical alternatives.
Hewitt, Milner, Csima, and Pakula2 noted that Edwards' test lacked power for small sample sizes, and
they suggested a non-parametric test as an alternative. Hewitt's test requires that the monthly data
first be ranked (12 = highest; 1 = lowest). Then all possible sequences of six consecutive months are
examined. The statistic, T, is the maximal sum of the ranks observed for any 6-month period, and it is
compared with either the approximate critical values given by Hewitt et al. or, preferably, with the
exact critical values provided by Walter.3 The probability of a maximal rank sum of T = 57
(12 + 11 + 10 + 9 + 8 + 7) is equal to 0.0130, and thus T = 57 may be used as a critical value for a
desired significance level of approximately 0.01 (99% confidence). Similarly, the probability that
T ≥ 55 is 0.0483, and thus T = 55 may be used for tests having an approximate Type I error
probability of 0.05.
Hewitt's test is preferable to the χ² test, which compares observed and expected monthly values,
because the latter takes no account of the ordering of the monthly observations. Comparisons of the
power of various tests of seasonality typically demonstrate that Hewitt's test is slightly more
powerful than the χ² test, though other tests (such as Kuiper's V_N statistic4) often appear to be
superior to Hewitt's statistic.
Because of its simplicity, Hewitt's test has witnessed widespread use. Recent applications of the test
include studies of seasonality in the mutagenic activity of airborne particulates,5 seasonal variation
in melanoma of the eye,6 seasonality in diseases of the skin,7,8 and monthly and seasonal variation in
congenital malformations.9-11
Hewitt's test makes the assumption that the year is split into two equally wide intervals of 6 months
each. Yet for any given phenomenon, there may be no compelling reason to divide the year in this way.
Marrero12 and Wallenstein et al.13 also stress the need to examine what they term the 'one-pulse'
model, where there is one duration of designated length, not necessarily equal to 6 months, where an
incidence rate is raised over and above some background rate. Wallenstein et al.13 develop a statistic
which may be used with monthly data to evaluate the null hypothesis of no pattern against the
alternative hypothesis of a pulse of predetermined length equal to 2 or 3 months. This so-called
ratchet circular scan statistic is defined as the maximum number of events falling into any k = 2 or
k = 3-month period (they do not examine cases where k > 3, though Krauth14 has recently emphasized the
desirability of doing so, and has suggested how the test can be extended to such cases). In their
paper, they suggest that an approach to the study of seasonal pattern having intuitive appeal would be
to consider all possible random orderings of the monthly totals. Because the minimum P-values possible
with such an approach exceed 0.05 for k = 2 and k = 3, they do not further examine the approach. In
what follows, we reconsider the idea of random reorderings of the monthly data and extend the results
of Hewitt and Walter to the cases where k = 3, 4 and 5.

TABLE 1 Approximate null distributions for the maximal rank sum

        k = 3                    k = 4                    k = 5
Maximal                  Maximal                  Maximal
rank sum  Probability    rank sum  Probability    rank sum  Probability
   t       (T ≥ t)          t       (T ≥ t)          t       (T ≥ t)
  33       0.0543          42       0.0267          50       0.0152
  32       0.1056          41       0.0509          49       0.0294
  31       0.1975          40       0.0927          48       0.0573
  30       0.3220          39       0.1540          47       0.0949
  29       0.4711          38       0.2398          46       0.1499
  28       0.6154          37       0.3356          45       0.2246
  27       0.7701          36       0.4590          44       0.3173
  26       0.8824          35       0.5909          43       0.4249
  25       0.9592          34       0.7316          42       0.5432
  24       0.9921          33       0.8472          41       0.6633
  23       0.9992          32       0.9325          40       0.7782
                           31       0.9788          39       0.8781
                           30       0.9967          38       0.9580
                           29       0.9998          37       0.9921
                                                    36       0.9996

NB The Table displays the probability that the maximal rank sum is equalled or exceeded under the null
hypothesis of no seasonal pattern in monthly data. Each distribution is based upon 20 000 randomly
generated sequences of the integers 1-12.
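The maximal rank sum that underlies Hewitt's test, and its extension to windows of k = 3, 4 or 5 months, can be sketched in a few lines of Python. This is an illustrative sketch rather than code from the paper: the function names and the monthly counts are hypothetical, ties are assumed absent, and the year is treated as circular (a window may run from December into January), which is the reading consistent with the probability of 0.0130 quoted for T = 57.

```python
def rank_values(values):
    """Return ranks 1 (lowest) .. n (highest) for a list of distinct values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def max_rank_sum(ranks, k):
    """Largest sum of ranks over all runs of k consecutive months.

    The year is treated as circular, so a run may wrap from
    December into January.
    """
    n = len(ranks)
    return max(
        sum(ranks[(start + offset) % n] for offset in range(k))
        for start in range(n)
    )


# Hypothetical monthly counts (January..December), chosen so that the
# six busiest months are adjacent.
counts = [40, 42, 45, 47, 44, 41, 20, 22, 25, 27, 24, 21]
ranks = rank_values(counts)

print(max_rank_sum(ranks, 6))  # 57 = 12 + 11 + 10 + 9 + 8 + 7
print(max_rank_sum(ranks, 3))  # 33 = 12 + 11 + 10
```

Setting k = 6 gives Hewitt's original statistic; smaller k gives the generalized statistic studied here.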
METHODS AND RESULTS
Monte Carlo Results
Table 1 gives the approximate probability that any
given maximal rank sum is equalled or exceeded under
the null hypothesis of no seasonal variation. Monte
Carlo simulation was used to determine the distribution
by examining the maximal rank sums for randomly
generated sequences of the integers from 1 to 12. For
each value of k, 20 000 such sequences were generated.
The proportion of these sequences having given maximal rank sums was then tabulated.
For example, if it is hypothesized that the incidence
of some disease is higher during some undetermined
4-month period of the year, using a critical maximal
rank sum of 41 would yield a test of the null hypothesis
with an approximate α = 0.05. Either the four highest
values would have to cluster together (12 + 11 + 10 + 9
= 42), or the three highest values would have to be
clustered together with the fifth highest value (12 + 11
+ 10 + 8 = 41). Similarly, for a Type I error probability
of approximately 0.09, one would use a critical value
of 40 for the k = 4 case.
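The simulation described in this section can be approximated with the Python standard library. The sketch below is an assumption-laden reconstruction, not the original program: the function names and the seed are hypothetical, and because the random number stream differs from the one used for Table 1, the estimates should agree with the table only to within Monte Carlo error.

```python
import random
from collections import Counter


def max_rank_sum(ranks, k):
    """Largest rank sum over all circular runs of k consecutive months."""
    n = len(ranks)
    return max(
        sum(ranks[(start + offset) % n] for offset in range(k))
        for start in range(n)
    )


def simulate_null_distribution(k, n_sequences=20000, seed=0):
    """Estimate P(T >= t) under the null by shuffling the ranks 1..12."""
    rng = random.Random(seed)  # the seed is an arbitrary choice
    ranks = list(range(1, 13))
    counts = Counter()
    for _ in range(n_sequences):
        rng.shuffle(ranks)  # every ordering of the ranks is equally likely
        counts[max_rank_sum(ranks, k)] += 1
    # Accumulate from the largest rank sum downwards to get upper tails.
    tail = {}
    running = 0
    for t in sorted(counts, reverse=True):
        running += counts[t]
        tail[t] = running / n_sequences
    return tail


for k in (3, 4, 5):
    tail = simulate_null_distribution(k)
    for t in sorted(tail, reverse=True)[:3]:
        print(f"k = {k}  t = {t}  P(T >= t) ~ {tail[t]:.4f}")
```

With 20 000 sequences the standard error of a tail probability near 0.05 is about 0.0015, so differences in the third decimal place between runs are expected.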
Exact Results
Walter3 describes a combinatorial method that may be
used to determine the exact distribution of T. Using his
notation,
\[
P(T = t) = \sum_{i} P(T; d_{ti} \mid d_{ti})\, P(d_{ti})
 - \sum_{i<j} P(T; d_{ti}, d_{tj} \mid d_{ti}, d_{tj})\, P(d_{ti}, d_{tj})
\]
where d_{ti} is the ith decomposition of t into k monthly ranks. For example, with k = 3, t = 31 could
be achieved via one of two decompositions: either 12, 11, and 8 could be adjacent to one another or
12, 10, and 9 could be adjacent to one another. P(d_{ti}) is the probability that a particular
decomposition appears in the data, and P(T; d_{ti} | d_{ti}) is the probability that T is attained as
the maximal rank sum, given that the particular decomposition is realized. For example, the appearance
of the sequence (12, 11, 8) does not guarantee that T = 31, since either (10, 12, 11, 8) or
(9, 12, 11, 8) would constitute sequences with higher rank sums. The last term in the equation accounts
for double counting; that is, there are some cases where two decompositions, i and j, appear in the
same set of data (in the case above, if (11, 8, 12, 10, 9) appear in sequence, there are two sums equal
to 31).
As noted by Walter,3 the method is quite tedious, and for given k, it is easier to work with higher
values of t than with lower values. Table 2 gives the exact values associated with the null
distributions; these permit tests to be carried out at approximate significance levels of 0.01, 0.05,
and/or 0.10. Because of the difficulty associated with carrying out the calculations, the tabulations
in Table 2 are not presented for lower values of t, similar to Walter's presentation of results for
k = 6. The results in Table 2 are in close agreement with the Monte Carlo results of Table 1.

TABLE 2 Exact null distributions for the maximal rank sum

        k = 3                    k = 4                    k = 5                    k = 6
Maximal                  Maximal                  Maximal                  Maximal
rank sum  Probability    rank sum  Probability    rank sum  Probability    rank sum  Probability
   t       (T ≥ t)          t       (T ≥ t)          t       (T ≥ t)          t       (T ≥ t)
  33       0.05454         42       0.02424         50       0.01515         57       0.0130
  32       0.10505         41       0.04697         49       0.02944         56       0.0253
                           40       0.08910         48       0.05621         55       0.0483
                                                    47       0.09360         54       0.0805
                                                                             53       0.1299

TABLE 3 Number of adolescent suicides per month in the USA, 1978-1979

Month        Number   Average per day   Rank
January        334         5.38          12
February       291         5.20          11
March          332         5.35          10
April          284         4.73           7
May            311         5.02           9
June           257         4.28           2
July           282         4.55           4
August         261         4.21           1
September      275         4.58           5
October        289         4.66           6
November       290         4.83           8
December       268         4.32           3

From Table 2, critical values of T corresponding to an α of approximately 0.05 are T = 33, 41, and 48
for k = 3, 4 and 5, respectively. Similarly, critical values of T corresponding to an α of
approximately 0.10 are T = 32, 40, and 47, respectively.
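The first entry in each column of Table 2 can be checked by a direct count, since T attains its largest possible value only when the k highest ranks fall in k consecutive months. There are 12 possible windows on the circular year, k! orderings of the top ranks within the window and (12 - k)! orderings of the remaining months, out of 12! equally likely orderings in total. The sketch below covers only this simplest case (a single decomposition, no double counting) and is not Walter's full procedure; the function name is hypothetical.

```python
from fractions import Fraction
from math import factorial


def prob_top_ranks_adjacent(k, n=12):
    """P(the k highest ranks occupy k consecutive months of a circular year).

    Valid for 0 < k < n, where the n circular windows are all distinct.
    """
    favourable = n * factorial(k) * factorial(n - k)
    return Fraction(favourable, factorial(n))


for k in (3, 4, 5, 6):
    t_max = sum(range(12 - k + 1, 13))  # e.g. 10 + 11 + 12 = 33 for k = 3
    p = prob_top_ranks_adjacent(k)
    print(f"k = {k}  t_max = {t_max}  P(T = t_max) = {float(p):.5f}")
```

The four values agree with the tabulated 0.05454, 0.02424, 0.01515 and 0.0130 to the precision shown in Table 2.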
Application
Wallenstein et al.13 report data on the number of
adolescent suicides, by month, for the 2-year period
1978-1979, and also give the corresponding average
number of suicides per day. Table 3 displays the data.
The monthly rank of the average is also given, where
rank 12 corresponds to the highest monthly average,
and rank 1 the lowest.
With the predetermined value of k = 3 used by
Wallenstein et al.,13 the maximal rank sum is T = 33, for
the period January-March. The P-value associated with
the observed test statistic is 0.0545 under the null
hypothesis, suggesting seasonal variation in the data.
Note that if Hewitt's original test had been employed T
would be equal to 51, for the 6-month period from January through June. For that case, using Hewitt's Monte
Carlo approximation to the distribution yields a clearly
insignificant P-value of 0.29.
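The k = 3 calculation for the suicide data can be reproduced from the printed ranks. The sketch below is illustrative only: it takes the ranks exactly as they appear in Table 3 and looks up the exact upper-tail probability from Table 2; the helper name is hypothetical.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
# Ranks of the average daily number of suicides, as printed in Table 3.
RANKS = [12, 11, 10, 7, 9, 2, 4, 1, 5, 6, 8, 3]
# Exact upper-tail probabilities for k = 3, copied from Table 2.
EXACT_TAIL_K3 = {33: 0.05454, 32: 0.10505}


def best_window(ranks, k):
    """Return (T, start index) of the circular k-month run with the largest rank sum."""
    n = len(ranks)
    sums = [sum(ranks[(start + j) % n] for j in range(k)) for start in range(n)]
    t = max(sums)
    return t, sums.index(t)


t, start = best_window(RANKS, 3)
window = [MONTHS[(start + j) % 12] for j in range(3)]
p_value = EXACT_TAIL_K3.get(t)  # None if t lies below the tabulated range

print(t, window, p_value)  # 33 ['January', 'February', 'March'] 0.05454
```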
Kevan,15 in his review of seasonal variation in
suicide, summarizes a large number of previous studies
on the subject. Though the underlying causes (e.g.
socioeconomic or biometeorological) for seasonal variation in suicide remain unclear, an inspection of Kevan's
summary table suggests that pulse models with k = 3, 4,
or 5 may be plausible and useful in future studies.
Power Tests
The power of this generalization of Hewitt's test was compared with the χ² test through a series of
Monte Carlo simulations. For k = 3, N = 50, 100, and 500 observations were randomly allocated to
months, using the same 3-month pulse employed by Wallenstein et al., where p_i = 0.102 for i = 1, 2, 3,
and p_i = 0.077 for i = 4, ..., 12. For k = 4, 5 and 6, pulses of similar strength were examined (where
the ratio of p_i during the pulse period to the value of p_i for the remainder of the year was kept
constant at 0.102/0.077 = 1.32). For k = 4, N observations were randomly allocated to months using
probabilities p_i = 0.0996, i = 1, 2, 3, 4 and p_i = 0.0752 for i = 5, ..., 12. For k = 5, observations
were generated using p_i = 0.0972, i = 1, ..., 5, and p_i = 0.0734 for i = 6, ..., 12. Finally, for
k = 6, observations were generated using p_i = 0.0950 for i = 1, ..., 6, and p_i = 0.0717 for
i = 7, ..., 12. Note that the choice of the particular p_i's associated with each alternative is
arbitrary, and a more extensive investigation of relative power could be carried out by using stronger
and/or weaker pulses.
In those cases where ties in the monthly frequencies arose, for the generalization of Hewitt's test the
ties were broken by the addition of a small, random decimal to the frequencies. The proportion of null
hypotheses rejected out of 2000 repetitions is given for each value of k (using α = 0.05) in Table 4.
For k = 3, the power of the ratchet circular scan statistic developed by Wallenstein et al.13 is also
provided.
Table 4 reveals that the generalized Hewitt test is slightly more powerful than the χ² test for small
samples (N = 50). For large samples (N = 500), the generalized Hewitt test is weaker than the χ² test
when k = 3 or 4, and the two tests have similar power when k = 5 or 6. For samples of size 100, the
generalized Hewitt test is more powerful than the χ² test when k = 5 and 6. Note that for k = 3, the
circular ratchet scan statistic displays greater power than either of the other two statistics.
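The power comparison can be sketched as follows. This is a hedged reconstruction under stated assumptions, not the original program: the function names and seed are hypothetical; the pulse probabilities are rebuilt from the fixed ratio 0.102/0.077 and normalized to sum to one (so for k = 3 they differ slightly from the rounded 0.102 and 0.077); the χ² test uses the standard upper 5% point for 11 degrees of freedom, 19.675, which is not stated in the text; and the ratchet scan statistic is omitted.

```python
import random

CHI2_CRIT_11DF = 19.675  # upper 5% point of chi-square with 11 d.f. (assumed cutoff)


def pulse_probabilities(k, ratio=0.102 / 0.077, n=12):
    """Monthly probabilities with a k-month pulse at the given in/out ratio."""
    p_out = 1.0 / (k * ratio + (n - k))
    return [ratio * p_out] * k + [p_out] * (n - k)


def max_rank_sum(ranks, k):
    """Largest rank sum over all circular runs of k consecutive months."""
    n = len(ranks)
    return max(sum(ranks[(s + j) % n] for j in range(k)) for s in range(n))


def simulate_power(k, n_obs, critical_t, reps=2000, seed=0):
    """Return (generalized Hewitt power, chi-square power) estimates."""
    rng = random.Random(seed)
    probs = pulse_probabilities(k)
    months = range(12)
    expected = n_obs / 12
    hewitt_rejections = 0
    chi2_rejections = 0
    for _ in range(reps):
        counts = [0] * 12
        for month in rng.choices(months, weights=probs, k=n_obs):
            counts[month] += 1
        # Break ties with a random decimal below 0.5, which cannot
        # reorder months whose counts already differ.
        jittered = [c + 0.5 * rng.random() for c in counts]
        order = sorted(months, key=lambda i: jittered[i])
        ranks = [0] * 12
        for rank, index in enumerate(order, start=1):
            ranks[index] = rank
        if max_rank_sum(ranks, k) >= critical_t:
            hewitt_rejections += 1
        chi2 = sum((c - expected) ** 2 / expected for c in counts)
        if chi2 > CHI2_CRIT_11DF:
            chi2_rejections += 1
    return hewitt_rejections / reps, chi2_rejections / reps


# k = 3 with N = 50 and the critical value T = 33 from Table 2.
print(simulate_power(3, 50, critical_t=33))
```

Calling `pulse_probabilities(4)` returns values that round to the 0.0996 and 0.0752 quoted above, which is a quick check that the ratio has been applied as described.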
TABLE 4 Power of generalized Hewitt's test versus χ² test

k = 3                   N = 50    N = 100   N = 500
Generalized Hewitt       0.079     0.099     0.370
χ²                       0.060     0.099     0.447
Ratchet scan             0.113     0.150     0.631
Note: For k = 3, the exact α = 0.0545 for the generalized Hewitt's test.

k = 4                   N = 50    N = 100   N = 500
Generalized Hewitt       0.084     0.107     0.463
χ²                       0.072     0.115     0.501
Note: For k = 4, the exact α = 0.047 for the generalized Hewitt's test.

k = 5                   N = 50    N = 100   N = 500
Generalized Hewitt       0.090     0.146     0.519
χ²                       0.077     0.114     0.500
Note: For k = 5, the exact α = 0.056 for the generalized Hewitt's test.

k = 6                   N = 50    N = 100   N = 500
Hewitt                   0.085     0.127     0.520
χ²                       0.066     0.112     0.521
Note: For k = 6, the exact α = 0.048 for Hewitt's test.

DISCUSSION
In this paper, Hewitt's test is generalized for the cases where it is hypothesized that there is a
'pulse' or short period of time, less than 6 months in duration, where there is a raised incidence of
the phenomenon of interest. The null distribution of the maximal rank sum statistic is derived, and
Monte Carlo tests demonstrate that for small samples, this generalization of Hewitt's test is more
powerful than the χ² test. In larger samples, the χ² test dominates, owing to the amount of information
essentially thrown away in the ranking process.
While Hewitt's test is easy to use, and has advantages over the common χ² test when sample sizes are
small, other statistics, such as the ratchet circular scan proposed by Wallenstein et al., provide
alternatives that may have greater power. The ratchet circular scan statistic has to date been
developed only for k = 3, though Krauth14 has recently provided bounds for the upper-tail probabilities
for other values of k. Despite the availability of such alternatives, the simplicity of the generalized
Hewitt statistic is attractive. Particularly in those cases where either sample sizes are small or only
data on ranks or rates are available, the extension of Hewitt's test described here should prove useful.
The test has the disadvantage that the length of the hypothesized pulse, k, must be specified in
advance. Researchers should not simply examine the significance associated with each value of k, and
then identify that k which is most significant, since the simultaneous testing of multiple hypotheses
will change the level of the test. However, as a purely exploratory device, such a strategy could be
fruitful in developing hypotheses for future study.
ACKNOWLEDGEMENTS
I gratefully acknowledge the assistance of NSF Grant
SES-8810917 to the National Center for Geographic
Information and Analysis, and NSF Grant SES-9022192 to the Center for Advanced Study in the
Behavioral Sciences in Stanford, CA, where this work
was initiated. I am also grateful for the comments of the
anonymous referee.
REFERENCES
1 Edwards J H. The recognition and estimation of cyclic trends. Ann Hum Genetics 1961; 25: 83-86.
2 Hewitt D, Milner J, Csima A, Pakula A. On Edwards' criterion of seasonality and a non-parametric alternative. Br J Prev Soc Med 1971; 25: 174-76.
3 Walter S D. Exact significance levels for Hewitt's test for seasonality. J Epidemiol Community Health 1980; 34: 147-49.
4 Freedman L S. The use of a Kolmogorov-Smirnov type statistic in testing hypotheses about seasonal variation. J Epidemiol Community Health 1979; 33: 223-28.
5 Vellosi R, Vannucchi C, Bianchi F, Firio R, Rosellini D, Ciacchini G, Giaconi V, Bronzetti G. Mutagenic activity and chemical analysis of airborne particulates collected in Pisa (Italy). Bull Environ Contam Toxicol 1994; 52: 465-73.
6 Schwartz S M, Weiss N S. Absence of seasonal variation in the diagnosis of melanoma of the eye in the United States. Br J Cancer 1988; 58: 402-04.
7 Akslen L A, Hartveit F. Cutaneous melanoma - season and invasion. Acta Dermato-Venereologica 1988; 68: 390-94.
8 Larsen T E, Mogensen S B, Holme I. Seasonal variations of pigmented naevi. Acta Dermato-Venereologica 1990; 70: 115-20.
9 Bound J B, Harvey P W, Francis B J. Seasonal prevalence of major congenital malformations in the Fylde of Lancashire, 1957-81. J Epidemiol Community Health 1989; 43: 330-42.
10 Castilla E E, Orioli I M, Lugarinho R, Dutra G P, Lopez-Camelo J S, Campana H E, Spagnolo A, Mastroiacovo P. Monthly and seasonal variation in the frequency of congenital anomalies. Int J Epidemiol 1990; 19: 399-404.
11 FitzPatrick D R, Raine P A M, Boorman J G. Facial clefts in the west of Scotland in the period 1980-84: epidemiology and genetic diagnosis. J Med Genet 1994; 31: 126-29.
12 Marrero O. The performance of several statistical tests for seasonality in monthly data. J Stat Comp and Simul 1983; 17: 275-90.
13 Wallenstein S, Weinberg C R, Gould M. Testing for a pulse in seasonal data. Biometrics 1989; 45: 817-30.
14 Krauth J. Bounds for the upper-tail probabilities of the circular ratchet scan statistic. Biometrics 1992; 48: 1177-85.
15 Kevan S M. Season of suicide: a review. Soc Sci Med 1980; 14D: 369-78.
(Revised version received November 1995)