Monthly and seasonal rainfall in Australia:
distribution, consistency and predictors
By
Md Masud Hasan (M.Sc)
This thesis is submitted in fulfilment of the requirements for the completion of the
degree of Doctor of Philosophy
School of Health and Sport Sciences
Faculty of Science, Health and Education
University of the Sunshine Coast
Sippy Downs Drive, Sippy Downs
Sunshine Coast, Queensland 4556
Australia
September 2011
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
DECLARATION OF ORIGINALITY
ABSTRACT
PUBLICATIONS FROM DOCTORAL THESIS
CHAPTER 1: INTRODUCTION
Background
Research Aims
Presentation of the Thesis
CHAPTER 2: LITERATURE REVIEW
Introduction
Rainfall Variability
Measures of Rainfall Variability
Entropy and Rainfall Variability
Impact of ENSO on Rainfall Variability
Rainfall Distributions
Daily Rainfall Distributions
Monthly Rainfall Distributions
Modelling Monthly Rainfall Totals
Monthly Rainfall Models
Seasonal Rainfall Distributions
Conclusion
CHAPTER 3: ENTROPY, CONSISTENCY IN RAINFALL DISTRIBUTION AND POTENTIAL WATER RESOURCE AVAILABILITY IN AUSTRALIA
Statement of Intellectual Contribution
Abstract
Introduction
Methodology
Entropy and Rainfall Variability
Entropy of Stable Rainfall and Consistency Index
Potential Water Resources Availability
Data and Preliminaries
Results and Discussion
Conclusion
References
CHAPTER 4: TWO TWEEDIE DISTRIBUTIONS THAT ARE NEAR-OPTIMAL FOR MODELLING MONTHLY RAINFALL IN AUSTRALIA
Statement of Intellectual Contribution
Abstract
Introduction
Data
Methodology
Tweedie Densities
Results and Discussion
Conclusion
References
CHAPTER 5: A SIMPLE POISSON–GAMMA MODEL FOR MODELLING RAINFALL OCCURRENCE AND AMOUNT SIMULTANEOUSLY
Statement of Intellectual Contribution
Abstract
Introduction
Data and Preliminaries
Models
Exponential Dispersion Models
The Tweedie Family
Generalized Linear Models
Model Fitting
Model Results
Fitted Model
Interpretation
Diagnostic Checks
Simulation
Conclusion
References
CHAPTER 6: UNDERSTANDING THE EFFECT OF CLIMATOLOGY ON MONTHLY RAINFALL AMOUNTS IN AUSTRALIA USING TWEEDIE GLMS
Statement of Intellectual Contribution
Abstract
Introduction
Data
Models
Exponential Dispersion Models
The Tweedie Family
Generalized Linear Models
Model Fitting
Model Results
The Models
The Fitted Models and Interpretation
Conclusion
References
CHAPTER 7: THE TWEEDIE FAMILY OF DISTRIBUTIONS FOR MODELLING SEASONAL RAINFALL TOTALS IN AUSTRALIA
Statement of Intellectual Contribution
Abstract
Introduction
Data and Preliminaries
Methodology
Exponential Dispersion Models
The Tweedie Family
Results and Discussion
Conclusion
References
CHAPTER 8: CONCLUSION AND FUTURE DIRECTIONS
Conclusion
Future Directions
References
APPENDIX A: Article entitled "Two Tweedie distributions that are near-optimal for modelling monthly rainfall in Australia", published in International Journal of Climatology.
APPENDIX B: Paper published in Agricultural and Forest Meteorology.
APPENDIX C: Paper entitled "Entropy, consistency in rainfall distribution and potential water resource availability in Australia", published in Hydrological Processes.
APPENDIX D: Article "Understanding the effect of climatology on monthly rainfall amounts in Australia using Tweedie GLMs", published online in International Journal of Climatology.
APPENDIX E: Letter from the Journal of Hydrology acknowledging receipt of the paper "The Tweedie family of distributions for modelling seasonal rainfall totals in Australia".
LIST OF FIGURES
Chapter: 3
Figure 1 Location of the stations studied; the six case study stations mentioned in the paper are named and indicated using squares.
Figure 2 Boxplots of the monthly rainfall distributions. The horizontal lines indicate median rainfall for each month; the boxes indicate the first and third quartiles of the distribution. The lines extending from the boxes extend to 1.5 times the interquartile range; circles indicate observations more extreme than this. Note that the vertical scales are not the same for each station.
Figure 3 Contour maps of Australia showing the AE and ESR values.
Figure 4 Maximum possible entropy, ESR, AE of individual years and average AE for the six studied stations.
Figure 5 The CI and DI of individual years and average of CI and DI for Bidyadanga, Oenpelli and Clarence.
Figure 6 Map of Australia showing the average consistency index of Australian rainfall stations.
Figure 7 Map of Australia showing the average consistency index of Australian rainfall stations for El Niño and La Niña years.
Figure 8 Years divided into four categories on the basis of CI and total annual rainfall.
Figure 9 Contour maps representing the percentage of years with different rainfall categories for the Australian rainfall stations.
Figure 10 Contour maps representing the percentage of years with Category I and Category IV in La Niña and El Niño years for the Australian rainfall stations.
Chapter: 4
Figure 1 Locations of the stations studied; the four case study stations mentioned in the paper are named and indicated using squares.
Figure 2 Monthly rainfall distributions for Bidyadanga station.
Figure 3 Monthly rainfall distributions for Cowal station.
Figure 4 Scatterplots showing the mean–variance relationship (measured on log scale) of monthly rainfall for the stations Bidyadanga, Yoweragabbie, Cowal and Clarence from 1910 to 2007, for all months. Each point represents the mean and variance of the amount of rainfall for a single month.
Figure 5 The mean–variance relationships for the monthly rainfall distributions at Bidyadanga, Yoweragabbie, Cowal and Clarence stations. Each line represents the mean–variance relationship for a different month computed from 1910 to 2007.
Figure 6 The 95% confidence intervals of p-indices for different months (1 = January, 2 = February and so on) for the stations Bidyadanga, Yoweragabbie, Cowal and Clarence.
Figure 7 Sample QQ-plots of the quantile residuals after fitting Tweedie distributions to monthly rainfall totals for different months for the stations Bidyadanga, Yoweragabbie, Cowal and Clarence. An ideal plot would show the points falling on the solid line, which corresponds to the standard normal distribution.
Figure 8 The QQ-plots of the quantile residuals after fitting Tweedie distributions to monthly rainfall totals for the months in the upper panel of Figure 7, with p 1.99
Figure 9 Maps of Australia with black and grey dots representing the value of p for different months for 102 rainfall stations. Black dots represent p 1.6 and grey dots represent p 2
Chapter: 5
Figure 1 Location of the stations studied; the six case studies mentioned in the paper are named and indicated using squares, grey dots represent the other studied stations.
Figure 2 Boxplots of the monthly rainfall distributions. The horizontal lines indicate median rainfall; the boxes indicate the first and third quartiles of the distribution. The lines extending from the boxes extend to 1.5 times the interquartile range; circles indicate observations more extreme than this.
Figure 3 Scatterplots showing the mean–variance relationship (measured on the log scale) of monthly rainfall for the stations Bidyadanga, Trayning, Oenpelli, Ceduna, Theebine and Clarence for the years from 1912 to 2007 for all months. Each point represents the mean and variance of amount of rainfall for a single month.
Figure 4 The profile likelihood plot for the studied stations with sine and cosine terms as predictors used to find the MLE of p. The points represent the computed likelihood values; the solid line is a cubic-spline smooth interpolation through these points. The dotted line indicates the nominal 95% confidence interval for p.
Figure 5 Plots showing the predicted and observed (for whole dataset of 96 years and validation dataset of 36 years) mean monthly rainfall of the studied stations for different months.
Figure 6 Plots showing the predicted and observed (for whole dataset of 96 years and validation dataset of 36 years) probability of no rain of the studied stations for different months.
Figure 7 Maps of Australia showing the effect of sine and cosine terms on the amount of rainfall. Black dots show significant positive effect (significantly larger rainfall amount during summer), grey dots show significant negative effect (significantly larger rainfall amount during winter), black circles show no significant effect (even rainfall distribution).
Figure 8 QQ-plots of quantile residuals for the rainfall models of the studied stations.
Figure 9 Boxplots of the simulated monthly rainfall distributions for 96 years.
Chapter: 6
Figure 1 Location of the stations studied. The four case studies mentioned in the paper are named and indicated using squares; grey dots represent the other studied stations.
Figure 2 Stations where Model 1, 2, 3 or 4 is preferred (on the basis of BIC).
Figure 3 Contour maps of Australia showing the regression coefficients of NINO 3.4 (from Model 2) and SOI (from Model 3) on rainfall amounts.
Figure 4 Contour maps of Australia showing the p-values from LRT for Model 2 (NINO 3.4), Model 3 (SOI) and Model 4 (SOI phase) comparing with Model 1 (base model).
Figure 5 Percentage change in predicted rainfall amount for the extra variable (NINO 3.4, SOI or SOI phase) with sine and cosine terms as variables.
Figure 6 Percentage change in predicted probability of no rainfall for the extra variable (NINO 3.4, SOI or SOI phase) with sine and cosine terms as variables.
Chapter: 7
Figure 1 Location of the stations studied. The four case study stations mentioned in the paper are named and indicated using squares; black dots represent the other studied stations.
Figure 2 Seasonal rainfall distributions of Bidyadanga, Trayning, Theebine and Clarence.
Figure 3 Mean–variance relationships of autumn, winter, spring and summer rainfall of Bidyadanga measured on log scale.
Figure 4 Maps of Australia showing the stations where gamma, P–G and normal distributions are near-optimal.
Figure 5 Graphs showing the observed medians (cross), the medians of simulated medians (square) along with the empirical 95% confidence intervals (vertical lines) for forty relatively dry seasons.
Figure 6 Graphs showing the observed 5th percentiles (cross), medians of the 5th percentiles of simulated rainfall totals (squares) along with the empirical 95% confidence intervals (vertical lines) for forty relatively dry seasons.
Figure 7 Graphs showing the observed 95th percentiles (cross), medians of the 95th percentiles of simulated rainfall totals (squares) along with the empirical 95% confidence intervals (vertical lines) for forty relatively dry seasons.
Figure 8 Observed probabilities of no rainfall (cross) and the medians of the probabilities of no rainfall (squares) with empirical 95% confidence intervals (vertical lines) for simulated rainfall for some seasonal datasets where the probability of no rainfall in the observed rainfall time series is between 0.05 and 0.17.
LIST OF TABLES
Chapter: 4
Table 1 Summary statistics of the monthly rainfall for Bidyadanga, Yoweragabbie, Cowal and Clarence from 1910 to 2007.
Chapter: 5
Table 1 Summary statistics of the monthly rainfall for whole dataset (1912 to 2007), estimation dataset (1912 to 1971) and validation dataset (1972 to 2007) for six studied stations.
Table 2 The estimated values of the coefficients for the predictors in the fitted rainfall model. The significance level is based on a standard asymptotic Wald z-test.
Chapter: 6
Table 1 The estimated values of the coefficients for the predictors in the fitted rainfall models for Bidyadanga.
Table 2 The estimated values of the coefficients for the predictors in the fitted rainfall models for Trayning.
Table 3 The estimated values of the coefficients for the predictors in the fitted rainfall models for Theebine.
Table 4 The estimated values of the coefficients for the predictors in the fitted rainfall models for Clarence.
Table 5 Number (percentage) of stations where Models 1, 2, 3 and 4 are preferred using the BIC, and where Models 2, 3 and 4 fit significantly better than the base model (Model 1) based on the LRT (among the 220 studied stations).
Chapter: 7
Table 1 Summary statistics of the seasonal rainfall for Bidyadanga, Trayning, Theebine and Clarence from 1912 to 2007.
Table 2 Number (percentage) of stations where normal, gamma and P–G distributions are near-optimal.
Table 3 Number (percentage) of stations where the P-values from the K-S test statistics are more than 5% level indicating a good fit of the data when fitting with the Tweedie and log-normal distributions.
Table 4 Number (percentage) of stations where the observed 5th percentiles, 25th percentiles, medians, 75th percentiles, 95th percentiles and probability of no rainfall of seasonal rainfall data for different seasons are within the 95% empirical confidence intervals of those of the simulated data. The bold figures indicate which of the two models is superior.
ACKNOWLEDGEMENTS
This dissertation would not have been possible without the guidance and the help of
several individuals who in one way or another contributed and extended their valuable
assistance in the preparation and completion of this study.
First and foremost, my utmost gratitude goes to my PhD supervisor, Associate Professor
Peter K Dunn, whose sincerity and encouragement I will never forget. With his
inspiration, I overcame all the obstacles in completing this research work. Throughout
my data analysis, paper writing and thesis writing periods, he provided
encouragement, sound advice, good company, and lots of good ideas.
I would like to thank Associate Professor Shahjahan Khan of the University of Southern
Queensland for his advice and guidance at the early stage of my study. He is an
excellent teacher and I learned a lot from him. I am indebted to my senior friend Dr.
Zahirul Hoque for his great support from the beginning of undergraduate study to the
end of my PhD. I am lucky to have a friend like him.
I am grateful to Shakhawat, Amin, Zaglul, Rob, Humayun, Giash, Antu, Azad, Darda,
Salim, Ratul and many more friends in Bangladesh for their great support and
encouragement towards my study. I also acknowledge Enamul,
Munia, Ripon, Nitol, Jasim, Bashar, Sajal and other friends in Australia for their
physical and emotional support during my stay in Australia.
I would like to thank Barbara Palmer and Lyndal Kroker of the Research Office, and Samantha
Poole and Nicole Bolitho of the School of Health and Sport Sciences, for their great help.
I owe my loving thanks to my wife Kazi Nasima Begum, and my sons Nahin and Nadif.
They have lost a lot due to my research and without their encouragement it would have
been impossible for me to finish this work. My special gratitude is due to my brothers,
my sisters and their families for their loving support. My loving thanks to my nephew
and nieces Hena, Saadi, Moumi and Jacob.
I am grateful to Prof NSM Yahya, Prof SM Shafiqul Islam, Prof MK Roy, Prof RN
Shil, Prof Shafiqur Rahman, Prof JC Paul, Prof Abdul Karim, Prof Soma Chowdhury,
Rakibul Mowla and other colleagues in the Department of Statistics, University of
Chittagong, Bangladesh for their great inspiration. I am also grateful to the authority of
the University of Chittagong for granting me study leave to complete my study in
Australia.
The financial support of the University of the Sunshine Coast is gratefully acknowledged.
DECLARATION OF ORIGINALITY
This thesis is submitted to the University of the Sunshine Coast in fulfilment of the
requirements for a Doctor of Philosophy.
The work presented in this thesis is, to the best of my knowledge and belief, original
except where explicit reference is made in the text. I hereby declare that this thesis
contains no material published elsewhere or extracted in whole or in part from a thesis
by which I have qualified for, or been awarded, another degree or diploma at this or any
other institution. No other person's work has been relied upon or used without due
acknowledgment in the main text and bibliography of the thesis. This research was
conducted under the supervision of Associate Professor Peter K. Dunn (School of Health
and Sport Sciences, University of the Sunshine Coast).
This thesis has been prepared to conform to the guidelines provided by the University of
the Sunshine Coast. Three of the five papers presented in the thesis are in press or
published by Wiley-Blackwell and, hence, the referencing style of the journals is used
here. Spelling is Australian English.
…………………………
……………………..
Md Masud Hasan
Date
ABSTRACT
Rainfall has a substantial influence on agriculture, rural industries, food security and
water quality. However, different parts of the world, especially many regions of
Australia, often experience rainfall extremes of floods and droughts, and hence the
inhabitants struggle to adapt to the diverse rainfall climate. Better understanding of the
various characteristics of rainfall amounts in any region (variability in rainfall,
distribution of rainfall totals, potential predictors of rainfall, etc.) is the first step
towards adaptation. The first aim of the research was to develop measures to quantify
the consistency in monthly rainfall totals of Australian stations. Then the various
characteristics of the monthly and the seasonal rainfall totals of Australian stations were
studied using well-fitting theoretical probability distributions. Finally, appropriate
statistical models were fitted to the monthly rainfall totals to simulate and predict rainfall,
and to find the potential predictors of monthly rainfall totals of Australian rainfall
stations.
Australia has one of the most variable rainfall climates in the world, and substantial
variability in rainfall amounts is considered a threat to the inhabitants. Therefore,
quantifying the variability in rainfall is important to people living in highly variable
rainfall climates, as in many regions of Australia. The measures that are commonly used
to quantify variability in rainfall are the variance and the coefficient of variation.
However, these measures are not useful for quantifying the variability in monthly
rainfall totals of many Australian rainfall stations, which are highly skewed to the right,
i.e. some months have extremely large amounts of rainfall. Therefore, the concept of
entropy is applied to define measures appropriate for quantifying rainfall variability.
The entropy of stable rainfall (ESR) is defined to quantify the variability in the long-term
average rainfall totals across the months of the year. The stations in northern
regions of Australia record more variability in rainfall across the months of the year
than the stations in southern regions. The consistency index (CI) is defined to compare
the variability in rainfall of individual years with the variability in the long-term average
rainfall totals across the months of the year. Areas close to the coastlines in northern,
southern and eastern parts of Australia receive more consistent rainfall in individual
years than elsewhere. In this study, the years were divided into four categories of
potential water resources availability on the basis of annual rainfall total and CI. The
potential water resource availabilities of Australian stations in El Niño and La Niña
years were also compared. Almost everywhere in Australia, the El Niño years are at
greater risk of receiving below-median and inconsistent rainfall than the La Niña years.
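The general idea behind an entropy-based variability measure can be illustrated with a minimal sketch. It assumes the plain Shannon entropy of the monthly shares of an annual rainfall total; the formal definitions of the ESR and the CI are given in Chapter 3 and are not reproduced here. The function name `monthly_entropy` and the rainfall values are hypothetical and chosen only for illustration. Under this assumed definition, rainfall spread evenly over the twelve months attains the maximum possible entropy, log2(12), whereas rainfall concentrated in a wet season gives a smaller value.

```python
import math


def monthly_entropy(monthly_totals):
    """Shannon entropy (in bits) of the monthly shares of an annual total.

    Assumed illustrative definition, not the thesis's ESR or CI.
    """
    total = sum(monthly_totals)
    if total <= 0:
        raise ValueError("annual rainfall total must be positive")
    shares = [x / total for x in monthly_totals]
    # Months with zero rainfall contribute nothing (limit of p*log p as p -> 0).
    return -sum(p * math.log2(p) for p in shares if p > 0)


# Largest entropy attainable with twelve months: every month gets an equal share.
MAX_ENTROPY = math.log2(12)

# Hypothetical monthly totals (mm), January to December.
even_year = [50.0] * 12
wet_season_year = [300.0, 250.0, 150.0, 20.0, 0.0, 0.0,
                   0.0, 0.0, 0.0, 5.0, 40.0, 120.0]

print("even year:      ", monthly_entropy(even_year))
print("wet-season year:", monthly_entropy(wet_season_year))
print("maximum:        ", MAX_ENTROPY)
```

Unlike the variance, this quantity depends only on how the annual total is apportioned among the months, so a single extremely wet month lowers the entropy without dominating it in the way a squared deviation would.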
In addition to the variability, the other characteristics of monthly rainfall totals were
studied using theoretical probability distributions that fit well to the data. For this
purpose, the distributions within the Tweedie family were adopted to find the optimal
distributions within the family for modelling the monthly rainfall totals of Australian
stations. Monthly rainfall data of 102 rainfall stations from different parts of Australia
were considered. The possibility that different distributions are needed for modelling
the rainfall totals of each month was also explored. Two distributions within the
Tweedie family (the gamma and the Poisson–gamma) fitted well to the monthly rainfall
totals for the Australian rainfall stations that were studied. The gamma distributions
fitted well to the strictly positive monthly rainfall totals. The Poisson–gamma
distributions fitted well to the monthly rainfall totals that included exact zeros. A slight
variation of the gamma distribution is proposed for use in practice. The proposed
distributions fit the data almost as well as the gamma distributions but admit the
possibility that future months may have zero rainfall. These well-fitting theoretical
probability distributions are useful for studying the different rainfall characteristics and
also for fitting models to monthly rainfall totals with different explanatory variables.
Rainfall models have promising applications for prediction and simulation in
many areas of agriculture, forestry, meteorology and hydrology. Rainfall data at
different timescales have mixtures of discrete (exact zeros when no rain falls) and
continuous (amount of rainfall in rainy events) components. Usually two different
models are used to model these two components of rainfall.
Tweedie generalized linear models were used to model the occurrence and amount of
monthly rainfall simultaneously. First, simple models with only sine and cosine terms as
predictors were fitted to the monthly rainfall totals from 220 Australian stations. Despite
their simplicity, the models fitted well to monthly rainfall data based on the probability
of no rain and the mean monthly rainfall amounts. Tweedie generalized linear models
were also fitted to the monthly rainfall totals of Australian stations with climatological
variables (NINO 3.4, SOI and SOI phase), in addition to the cyclic sine and cosine
terms. The likelihood ratio test was performed to compare the fit of the models with the
climatological variables with the model having only sine and cosine terms as predictors
(the base model). The Bayesian Information Criterion (BIC) was used to identify the
model of best fit among the studied models. On the basis of the BIC, the model with
NINO 3.4 was preferred for modelling the monthly rainfall totals of most of the studied
stations. Stations for which the model with the SOI was preferred for modelling
monthly rainfall totals appeared in small clusters. Adding the climatological variables to
the base model improved the fit of the model and made substantial changes to the
predicted mean monthly rainfall amount and the probability of the occurrence of a dry
month. The models allow for disaggregation of monthly rainfall totals into the number
of rainfall events in each month and the mean amount of rainfall per event.
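The disaggregation mentioned above can be sketched as follows. This is a sketch under stated assumptions, not the fitted models of Chapters 5 and 6. For a Tweedie distribution with mean mu, dispersion phi and index 1 < p < 2, the standard Poisson–gamma representation gives a Poisson number of rainfall events with mean lambda = mu^(2-p) / (phi (2-p)), gamma-distributed event amounts with shape (2-p)/(p-1) and scale phi (p-1) mu^(p-1), and a probability of a dry month equal to exp(-lambda). The log link, the coefficient values, and the values of phi and p below are hypothetical, as are all function names.

```python
import math
import random


def harmonic_mean_rainfall(month, b0, b1, b2):
    """Mean monthly rainfall from sine and cosine terms (log link assumed)."""
    angle = 2.0 * math.pi * month / 12.0
    return math.exp(b0 + b1 * math.sin(angle) + b2 * math.cos(angle))


def tweedie_to_poisson_gamma(mu, phi, p):
    """Map Tweedie (mu, phi, p), 1 < p < 2, to Poisson-gamma parameters."""
    if not 1.0 < p < 2.0:
        raise ValueError("the Poisson-gamma case requires 1 < p < 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))    # mean number of events
    shape = (2.0 - p) / (p - 1.0)                # gamma shape per event
    scale = phi * (p - 1.0) * mu ** (p - 1.0)    # gamma scale per event
    return lam, shape, scale


def simulate_month(mu, phi, p, rng):
    """Draw one monthly total as a Poisson sum of gamma event amounts."""
    lam, shape, scale = tweedie_to_poisson_gamma(mu, phi, p)
    # Poisson draw: count unit-rate exponential arrivals falling in [0, lam).
    n_events = 0
    t = rng.expovariate(1.0)
    while t < lam:
        n_events += 1
        t += rng.expovariate(1.0)
    return sum(rng.gammavariate(shape, scale) for _ in range(n_events))


# Hypothetical coefficients and distribution parameters, for illustration only.
B0, B1, B2, PHI, P = 3.0, 0.8, 0.5, 2.0, 1.6
rng = random.Random(2011)

for month in (1, 7):
    mu = harmonic_mean_rainfall(month, B0, B1, B2)
    lam, shape, scale = tweedie_to_poisson_gamma(mu, PHI, P)
    print(f"month {month}: mean {mu:.1f}, events {lam:.2f}, "
          f"mean per event {shape * scale:.1f}, P(dry) {math.exp(-lam):.4f}")
    print("  simulated total:", round(simulate_month(mu, PHI, P, rng), 1))
```

Because the expected number of events multiplied by the mean amount per event recovers mu, a single fitted mean determines both the probability of a dry month and the distribution of positive totals, which is why one model can replace the usual pair of occurrence and amount models.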
Finally, an investigation determined whether the distributions within the Tweedie
family fitted well to the seasonal rainfall totals of Australian stations. For this purpose,
seasonal rainfall data from 220 stations from different parts of Australia were studied.
The study found that, within the Tweedie family, normal distributions were rarely
optimal, the gamma distributions were near-optimal for a reasonable number of stations,
and the Poisson–gamma distributions were near-optimal for more than half of the
studied stations. The stations at which the Poisson–gamma distributions were near-optimal
are scattered throughout Australia, whereas many of the stations at which the
gamma distributions were near-optimal are located along the coastlines. The near-optimal
distributions within the Tweedie family simulated data with properties very similar to
the original seasonal rainfall data.
In conclusion, this study explored and developed simple statistical methods to
understand various characteristics of rainfall. The proposed consistency index and the
categorisation of the rainfall stations can be helpful for water planners to manage water
supplies, to impose water restrictions, to decide on water allocations and to plan future
water resources. The proposed distributions within the Tweedie family can be used to
understand various characteristics of rainfall and also to fit models to the rainfall data.
The fitted Tweedie generalized linear models generate data with characteristics very
close to the original data. The models also quantify the impact of various climatological
variables on monthly rainfall totals. The model results can be used in agriculture
and primary industries, and can also be incorporated into crop simulation models.
PUBLICATIONS FROM DOCTORAL THESIS
At the date of submission of this thesis, the following papers are 'in press' or have been
submitted for publication:
Hasan MM, Dunn PK. 2011. Two Tweedie distributions that are near-optimal for
modelling monthly rainfall in Australia. International Journal of Climatology 31(9):
1389–1397.
Hasan MM, Dunn PK. 2010. A simple Poisson-gamma model for modelling rainfall
occurrence and amount simultaneously. Agricultural and Forest Meteorology 150:
1319–1330.
Hasan MM, Dunn PK. 2011. Entropy, consistency in rainfall distribution and
potential water resource availability in Australia. Hydrological Processes 25(16):
2613–2622.
Hasan MM, Dunn PK. 2010. Understanding the effect of climatology on monthly
rainfall amounts in Australia using Tweedie GLMs. International Journal of
Climatology. In press, accepted December 10, 2010.
Dunn PK, Hasan MM. 2011. The Tweedie family of distributions for modelling
seasonal rainfall totals in Australia. Submitted to Journal of Hydrology, June 11,
2011.
CHAPTER 1: INTRODUCTION
Background
Rainfall is the key element in the water cycle that fills lakes and rivers, recharges the
underground aquifers and provides water to plants and animals. Rainfall is critical to
agriculture as a regular rain pattern is vital for healthy plants. However, various parts of
the world, especially many regions of Australia, often experience extreme rainfall
events. Sometimes, deficiency of rainfall over an extended period of time results in
shortage of water for some activities, groups or environmental sectors, and results in
drought. During the drought from 1991 to 1995, the average production of rural
industries fell about 10 percent, resulting in a cost of about five billion dollars to
the Australian economy (Natural Disasters in Australia, 2007). During the 2002–2003
drought, Australia experienced widespread bushfires, severe dust storms and
agricultural impacts that resulted in a drop of more than one percent in Australia's gross
domestic product (Watkins, 2005). In contrast, during periods of heavy rainfall, the
level of water in rivers, lakes or dams rises, and the water overflows to adjoining areas
and causes floods. During significant floods, lives can be lost, stock losses may be in
the tens of thousands, and damage to homes, businesses, roads and the like can run into
hundreds of millions of dollars. For example, during the recent floods in Queensland
and Victoria (2010–11), over 200,000 people from at least 70 towns were affected, and
the estimated damage and lost revenue was about AU$30 billion (ABC News, 18
January 2011).
Understanding the various characteristics of rainfall (variability in rainfall, probability
distribution of rainfall totals, potential predictors, etc.) is important to preserve the
environment from disasters such as floods and droughts, and for better management of
water resources. In this study, the variability in the monthly rainfall totals of Australian
stations was measured. The monthly and seasonal rainfall totals were then studied to
find well-fitting theoretical probability distributions. Finally, the impacts of different
climatological variables on the monthly rainfall totals of Australian rainfall stations
were investigated using statistical models.
Australia has one of the most variable rainfall climates in the world, and this variability
in rainfall has great impacts on agriculture and primary industries (Beeton et al., 2006;
Wimalasuriya et al., 2008; Walker Institute, 2010). The highly variable rainfall climate
represents one of the challenges for planning and management of sustainable water
resources. Better understanding of the variability in rainfall has significant value for
efficient development, management and usage practices of water resources, and for
determining stream flow. Given the heavy reliance on rain water for consumptive,
agricultural, industrial and recreational uses in Australia, there is a clear need to
quantify the variability in rainfall.
The variance, the coefficient of variation and the variability index (based on percentiles)
are the common measures used to quantify the variability in rainfall (Bureau of
Meteorology, 1989; Sutherland et al., 1991; Dewar and Wallis, 1999; Van Etten, 2009).
The variance and the coefficient of variation are applicable when the data are
approximately normally distributed. When the variable is highly skewed, as is the case
for monthly rainfall totals from many Australian stations, the variance and the
coefficient of variation are not appropriate for quantifying the variability. The
variability index is not applicable to quantify the variability in the monthly rainfall
totals of individual years as there are only 12 rainfall events. Alternatively, the concept
of entropy is applied to the empirical probability distributions to develop measures
useful for quantifying the variability in rainfall. To quantify the variability in the
number of rainy days and in the rainfall amounts across the months of the year, the
intensity and apportionment entropies are defined (Maruyama et al., 2002; Maruyama et
al., 2005). In the literature, the variability in rainfall for any station is measured as the
mean of the entropy values over the entire study period (Maruyama and Kawachi, 1998;
Kawachi et al., 2001; Maruyama et al., 2002; Maruyama et al., 2005). However, the
variability in monthly rainfall totals for individual years may differ significantly from
the average variability of the entire study period. Quantifying this temporal variation in
the rainfall variability is of great importance in planning of water resource usage in
agriculture and in hydrology.
To overcome the shortcomings of the abovementioned measures, an index was defined
to quantify the variability in monthly rainfall totals. The index was defined using the
concept of entropy and, hence, is applicable to skewed rainfall data. The developed
index was also used to compare the variability in monthly rainfall for individual years
with the variability in the long-term average rainfall across the months of the year. As
an example, the index was used to categorise the Australian stations into different
groups based on the availability of water resources.
The rainfall variability and potential availability of water resources in Australia are
influenced by a number of climatological factors, including the El Niño Southern
Oscillation (ENSO). The ENSO is the oscillation between El Niño and La Niña
conditions and has been shown to have a great impact on both rainfall amounts and
rainfall variability in Australia (Drosdowsky and Chambers, 2001; Wang and Hendon,
2007; Rotstayn et al., 2009). The ENSO affects the rainfall variability in different parts
of Australia differently (Stone et al., 1996; Kiem and Franks, 2001; Hope et al., 2009;
Smith et al., 2009). Consequently, the variability in rainfall and the availability of the
water resources for Australian stations in El Niño and La Niña years were also
compared.
Rainfall totals at all timescales have a random component. This random component can
be modelled by theoretical probability distributions. The selection of probability
distributions appropriate for modelling rainfall totals at different timescales is of
primary interest for probabilistic predictions and also for better understanding of the
various characteristics of rainfall.
The monthly rainfall totals of most Australian stations are highly skewed to the right.
The construction of a universal probability distribution capable of adequately describing
the rainfall totals for all timescales is a difficult task. A literature search confirms that
different distributions have been proposed as appropriate for different timescales or
even for the same timescale. For example, gamma, Kappa, mixed exponential and
mixed gamma distributions have been used for modelling daily rainfall totals (Das,
1955; Ison et al., 1971; Meilke, 1973; Wilks, 1998, 1999; Jamaludin and Jemain, 2008).
The gamma, hyper-gamma, Weibull and log-normal distributions have been used to
model monthly rainfall totals (Momiyama and Mitsudera, 1952; Suzuki, 1964; Ati et al.,
2000; Haghighatju, 2002). Moreover, the rainfall totals for various timescales have both
discrete (exact zero when no rain falls) and continuous (amount of rainfall in rainy
events) components. However, the usual distributions used to explain the random
component of rainfall totals are defined for strictly positive values. To incorporate the
zeros within a continuous dataset, some authors propose mixture models between
Bernoulli and gamma, or log-normal distributions (Salvucci and Song, 2000; Husak et
al., 2007).
A search for the distributions useful for modelling the occurrence and amounts of
rainfall simultaneously was performed. The possibility that different distributions might
fit well to the rainfall data of different months was also explored. For this purpose, the
distributions within the Tweedie family were considered. One of the advantages of
using the Tweedie family of distributions is that the Poisson–gamma distributions
(Dunn and Smyth, 2005) belong to this family. These distributions consider the monthly
rainfall total as the sum of rainfall totals from a number of rainfall events and, hence,
can be used to model the amount and occurrence of rainfall simultaneously. Moreover, a
framework is already in place for fitting models based on the Tweedie distributions, for
inference and for diagnostic testing. In addition, predictors can be easily incorporated
into the modelling procedure.
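The compound structure described above can be illustrated with a small simulation. The sketch below is not the fitting procedure used in this thesis. It assumes hypothetical parameter values (`mean_events`, `shape`, `scale`) and draws a monthly total as the sum of a Poisson number of gamma-distributed event amounts, so that exact zeros arise whenever no rainfall event occurs.

```python
import math
import random


def poisson_draw(rng, mean_events):
    """Draw a Poisson count with Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean_events)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def monthly_total(rng, mean_events, shape, scale):
    """Sum of N gamma-distributed event amounts, with N ~ Poisson(mean_events)."""
    n_events = poisson_draw(rng, mean_events)
    return sum(rng.gammavariate(shape, scale) for _ in range(n_events))


# Hypothetical parameter values chosen only for illustration.
rng = random.Random(1)
mean_events, shape, scale = 1.5, 0.8, 20.0
totals = [monthly_total(rng, mean_events, shape, scale) for _ in range(20000)]

zero_fraction = sum(t == 0 for t in totals) / len(totals)
sample_mean = sum(totals) / len(totals)
print(f"simulated P(total = 0): {zero_fraction:.3f} (theory: {math.exp(-mean_events):.3f})")
print(f"simulated mean total:   {sample_mean:.1f} (theory: {mean_events * shape * scale:.1f})")
```

Under these assumptions the probability of a dry month is exp(-mean_events), which shows how a single distribution can describe both the occurrence and the amount of rainfall.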
Rainfall models are important for forecasting and simulation purposes and also for
understanding the impact of different climatological variables on rainfall. Appropriate
rainfall models assist in developing better climate-related risk management and decision
making capabilities. For instance, simulated rainfall data can be used as an input into
the management of the water cycle to manage watersheds, to simulate the infiltration
process of groundwater, and to simulate stream flow (Singh, 1997a; Conway et al.,
2005; Ngongondo, 2006). Prediction of monthly and seasonal rainfall totals is useful to
determine supplemental irrigation, water requirements and storage of water, and for
reservoir management. Rainfall models also have extended applications in modelling
runoff, in determining soil water content, and for forecasting drought and flood. More
recently, interest has grown in using these models to characterise the changes in
rainfall patterns due to the greenhouse effect and climate change.
The simplest models considered for modelling monthly and seasonal rainfall totals are
the linear regression models (Oettli and Camberlin, 2005; Westra and Sharma, 2009).
However, the use of linear regression models is not appropriate in cases in which the
data do not approximately follow a normal distribution, as is the case for the monthly
rainfall totals of many Australian rainfall stations. The assumption of a gamma
distribution to model positive daily, weekly or monthly rainfall totals has been used
extensively (Wilks, 1999; Chandler and Wheater, 2002). However, the gamma
distributions are not defined for exact zeros and, hence, are unable to model the zero
rainfall totals. Usually, two different models are used to model the occurrence and
amount of rainfall. Markov chains and logistic regressions are used to model the
occurrence, and the gamma distributions are used to model the amount of monthly
rainfall (Richardson and Wright, 1984; Hamlin and Rees, 1987; Hansen and Ines,
2005).
To model the rainfall occurrence and the amount simultaneously, the Tweedie
generalized linear models (GLMs) have been fitted to the monthly rainfall totals of
Australian stations (Dunn, 2004). Instead of fitting different models to different months,
a single Tweedie GLM is fitted to the monthly rainfall totals of each station, with the
sine and cosine terms as covariates. In addition to the cyclic sine and cosine terms, the
climatological variables such as NINO3.4, SOI and SOI phase are used as covariates in
the model.
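The role of the sine and cosine covariates can be seen in a short sketch. It assumes a log link, an annual period of 12 months, and the hypothetical coefficients `b0`, `b_sin` and `b_cos`; none of these values come from the fitted models. The point is only that one pair of harmonic terms gives a smooth annual cycle in the mean monthly rainfall, so a single model can serve all months.

```python
import math


def harmonic_terms(month):
    """Sine and cosine covariates for a calendar month (1 = January, ..., 12 = December)."""
    angle = 2.0 * math.pi * month / 12.0
    return math.sin(angle), math.cos(angle)


def mean_rainfall(month, b0, b_sin, b_cos):
    """Mean under an assumed log link: log(mu) = b0 + b_sin*sin + b_cos*cos."""
    s, c = harmonic_terms(month)
    return math.exp(b0 + b_sin * s + b_cos * c)


# Hypothetical coefficients, for illustration only.
b0, b_sin, b_cos = 4.0, 0.3, 0.6
means = [mean_rainfall(m, b0, b_sin, b_cos) for m in range(1, 13)]
for m, mu in enumerate(means, start=1):
    print(f"month {m:2d}: mean = {mu:6.1f} mm")
```

Climatological covariates such as NINO3.4 or SOI would enter the same linear predictor as additional terms.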
A search for the distributions appropriate for modelling the seasonal rainfall totals of
Australian stations was also performed. The seasonal rainfall totals of most Australian
stations are not normally distributed and are skewed to the right. In some cases, the
seasonal rainfall totals are mixture types (i.e. both discrete and continuous components
exist in the data). The seasonal rainfall data have very similar features to the monthly
rainfall data. The distributions within the Tweedie family were considered to find the
optimal or near-optimal distributions within the family for modelling the seasonal
rainfall totals.
Research Aims
The aims of the study were:
1. To develop an index appropriate for quantifying the consistency in rainfall totals of
Australian stations across the months of the year.
2. To group the Australian rainfall stations into different categories of water resource
availability on the basis of the consistency in rainfall and the total annual rainfall.
3. To assess the impacts of ENSO on the consistency in rainfall and on potential
availability of water resources at Australian rainfall stations.
The consistency index (CI) was defined to quantify the consistency in rainfall in
individual years with the long-term average rainfall across the months of the year. On
the basis of the CI and the total annual rainfall, the Australian rainfall stations were
divided into different categories of water resource availability. The values of the CI
and the potential availability of water resources in El Niño and La Niña years were also
compared. With the results from the analysis, an article entitled Entropy, consistency in
rainfall distribution and potential water resource availability in Australia has been
accepted for publication in the peer-reviewed journal, Hydrological Processes (an A-class
journal, based on the ERA system of classification) (Chapter 3).
4. To determine the theoretical probability distribution(s) that fit well to the monthly
rainfall totals of Australian stations.
Monthly rainfall totals from 102 stations from different parts of Australia were studied to
search for the theoretical probability distributions that fit well to the data. The Tweedie
family of distributions was considered to find the optimal distributions within the family
for modelling the monthly rainfall totals of the studied stations. With the outcomes from
this study an article entitled Two Tweedie distributions that are near-optimal for
modelling monthly rainfall in Australia has been published in the peer-reviewed journal
International Journal of Climatology (an A-class journal) (Chapter 4).
5. To fit simple and “well-fitting” models to the monthly rainfall totals of Australian
stations for modelling the amounts and occurrence of rainfall simultaneously.
The Poisson–gamma generalized linear models (GLMs) were fitted to the monthly
rainfall totals of the Australian rainfall stations, with the sine and cosine terms as
predictors. The impacts of the sine and cosine terms on the monthly rainfall totals of the
stations from different parts of Australia are presented as maps. Monthly rainfall data
were simulated using the models. Results from the models are presented in an article
entitled A simple Poisson–gamma model for modelling rainfall occurrence and amount
simultaneously, which has been published in the peer-reviewed journal Agricultural
and Forest Meteorology (an A-class journal) (Chapter 5).
6. To quantify the effect of different climatological variables on the monthly rainfall
totals of Australian stations using the Tweedie GLMs.
The Tweedie GLMs were also fitted to the monthly rainfall totals of Australian stations,
with sine and cosine terms and climatological variables. The performance of the models
including the climatological variables was compared with the models with only sine and
cosine terms as predictors. The model outputs are presented as tables and appropriate
graphs. Contour maps of Australia were constructed to identify the regions where the
climatological variables have significant impacts on monthly rainfall totals. With the
results of fitted models, a paper entitled Understanding the effect of climatology on
monthly rainfall amounts in Australia using Tweedie GLMs has been accepted for
publication in the International Journal of Climatology (an A-class journal) (Chapter
6).
7. To determine the probability distribution(s) that fit well to the seasonal rainfall totals
of Australian stations.
To determine the appropriate distributions for modelling the seasonal rainfall totals of
Australian stations, the seasonal rainfall data from 220 stations were studied. The
distributions within the Tweedie family were considered for modelling these seasonal
rainfall totals. The outputs from the analysis are presented in the article Optimal
distributions within the Tweedie family for modelling seasonal rainfall totals in
Australia, which has been submitted for publication in the Journal of Hydrology (an
A*-class journal). The article is now under review (Chapter 7).
Presentation of the Thesis
The background and aims of the research are discussed in Chapter 1. A brief review of
the relevant literature is presented in Chapter 2. The next five chapters are the
manuscripts written for publication and possible publication in A- and A*-class
international, peer-reviewed journals. The manuscripts have abstracts, introductions,
methods, discussion, concluding remarks and references. To date, two of the
manuscripts have been published; two manuscripts have been accepted, and one
manuscript is under review.
A statement of intellectual contribution and the reference or publication status for each
manuscript precedes each chapter. These manuscripts are presented in the submitted,
accepted or published form and, as a consequence, some repetition of background
information occurs.
Literature review
CHAPTER 2: LITERATURE REVIEW
Introduction
Rainfall is the key factor shaping vegetation, hydrology and water quality. The various
features of rainfall (occurrence and average amount of rainfall, rainfall variability, etc.)
determine which crops can be grown in different regions of the earth. Better
understanding of rainfall characteristics is critical for optimising farm production, for
precision farming and also for identifying the vulnerabilities to climate variability. As
additional insights into precision farming are gained, the importance of understanding
various rainfall characteristics becomes more apparent (Bosch and Davis, 1998).
Variability is a common feature of rainfall, and too much variability in rainfall is a
threat to inhabitants. Well-fitted theoretical probability distributions for rainfall totals at
various timescales have potential for the purposes of simulation and prediction and also
for measuring the various statistics of rainfall totals. In this exegesis, the different
characteristics of the monthly and seasonal rainfall totals of Australian stations such as
the variability in monthly rainfall totals, the well-fitted theoretical probability
distributions appropriate for modelling monthly and seasonal rainfall totals and
potential predictors of monthly rainfall totals will be discussed.
Rainfall Variability
Australia has one of the most variable rainfall climates in the world (Beeton et al.,
2006; Wimalasuriya et al., 2008), and the economy of Australia is vulnerable to rainfall
variability (Walker Institute, 2010). Reduced rainfall or increased rainfall variability
limits the availability of irrigation water for intensively irrigated enterprises such as
horticulture, cotton and sugar growing, dairying, livestock production and viticulture
(Barlow et al., 2010). Variability in the amount of rainfall has impacts on runoff,
groundwater recharge, ecosystems, farm business and rural industries (Ogden and
Julien, 1993; Sandstrom, 1995; Hammer et al., 1996; Weltzin et al., 2003; George et al.,
2005; Bewket, 2007; Chaves and Piau, 2008; Ati et al., 2009). Increased temporal
variability in rainfall patterns in grasslands increases plant water stress and alters key
carbon cycling processes, which further lower the productivity, even without a change
in mean rainfall (Knapp et al., 2002; Nippert et al., 2006). Rainfall variations are causes
of great stress to farming activities and crop production (Adejuwon et al., 2007). Under
some circumstances, the variability in rainfall has a more significant impact on
landscape change than total rainfall (Tucker and Bras, 2000). The seasonal and inter-annual variability in rainfall were found to be critical in defining the structure and
density of tree stands in northern Australia (Liedloff and Cook, 2007). The variability in
rainfall also influences the behavioural ecology of nonhuman primate populations
(Dunbar, 1992; Bronikowski and Altmann, 1996; Bronikowski and Webb, 1996).
Understanding the variability in rainfall amounts is important to develop, manage and
use water resources efficiently (Conway et al., 2005; Ngongondo, 2006), and to
determine the characteristics of stream flow (Singh, 1997a). Consequently, many
authors have studied the spatial and temporal variability in the rainfall totals of
Australian stations, and the impacts of this variability on agriculture and primary
industries (Mollah and Cook, 1996; Meinke et al., 2005; Meneghini et al., 2007;
O'Reagain et al., 2009). The various measures used to quantify the variability in rainfall
totals are discussed below.
Measures of Rainfall Variability
The variance and the coefficient of variation are commonly used to quantify the
variability in rainfall at different timescales (Nicholls and Wong, 1990; Kiem and
Franks, 2001; Ayansina, 2009; Hope et al., 2009; Van Etten, 2009) as these measures
are easily understandable and easy to calculate. These measures are useful to quantify
the inter-annual rainfall variability of wet stations, at which the distribution of rainfall
totals is approximately normal. However, the distributions of the monthly rainfall totals
of many Australian stations are skewed to the right. Hence, the variance and the
coefficient of variation are not appropriate measures for quantifying the variability in
rainfall across the months of the year.
The variability in rainfall is also measured by an index based on percentiles, called the
variability index (VI), which is measured as VI = (90th percentile − 10th percentile) / 50th
percentile (Bureau of Meteorology, 1989; Sutherland et al., 1991; Dewar and Wallis,
1999; Van Etten, 2009). The VI is not useful for quantifying the variability in the
monthly rainfall totals of individual years which have only 12 rainfall events (the 12
monthly rainfall totals). Simpson's index and the McIntosh index (McIntosh, 1967)
are sometimes used to quantify the evenness (as opposed to the variability) in rainfall
distribution across the months of the year (Dunbar, 1992; Conway et al., 2005).
However, the use of these two indices is not common.
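The variability index defined above can be computed as in the following sketch. The rainfall totals are hypothetical, and the percentile convention (Python's `statistics.quantiles` with the `inclusive` method) is an assumption, since the convention used by the cited sources is not stated here.

```python
import statistics


def variability_index(totals):
    """VI = (90th percentile - 10th percentile) / 50th percentile."""
    deciles = statistics.quantiles(totals, n=10, method="inclusive")
    p10, p50, p90 = deciles[0], deciles[4], deciles[8]
    return (p90 - p10) / p50


# Hypothetical annual rainfall totals (mm) for one station.
annual_totals = [412, 530, 298, 765, 640, 389, 455, 910, 520, 350,
                 605, 480, 275, 700, 560, 430, 815, 390, 495, 620]
print(f"VI = {variability_index(annual_totals):.2f}")
```

With only 12 monthly values in a year, the 10th and 90th percentiles rest on one or two observations each, which is why the VI is of little use for individual years.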
Entropy and Rainfall Variability
For skewed data, which are not approximately normal, entropy (Shannon, 1948) is a
better measure than the variance or the coefficient of variation to quantify the variability
(Singh, 1997b; Avseth et al., 2005). Entropy is also a better measure of variability than
the variance and the coefficient of variation when the sample size is small (Singh et al.,
2007). Moreover, entropy is related to higher order moments and so offers a closer
characterisation of the data since this measure uses much more information about the
probability distribution (Ebrahimi et al., 1999) than the variance (which only uses the
first two moments). Considering all these advantages over other measures, the concept
of entropy has been used for quantifying variability in rainfall.
Maruyama and Kawachi (1998) applied the concept of entropy to empirical probability
distributions to measure the variability in daily rainfall totals. Considering the data from
1105 rainfall stations in Japan, they constructed the empirical probability distributions
of daily rainfall for individual years. The empirical probability distributions were
expressed as the amounts of rainfall on individual days relative to the total annual
rainfall. The empirical probability distributions were then applied to the entropy
formula to obtain the entropy of rainfall for individual years. For a given station, the
mean of the yearly entropy values over the 22 studied years was considered as the
entropy of that station. The same concept was adapted by Kawachi et al. (2001) to
quantify the daily rainfall variability. Using the entropy values of the studied stations,
they constructed an iso-entropy map of Japan. They also used the entropy values to
categorise the rainfall stations of Japan into four groups based on potential water
resource availability. For this purpose, they coupled the average entropy of daily rainfall
with the average annual rainfall (both were averaged over the study period).
Maruyama et al. (2002) applied the concepts of entropy to the rainfall data of Japan in
order to develop measures to quantify the variability in the amount of rainfall and the
number of rainy days across the months of the year. They defined the apportionment
entropy (AE) to quantify the variability in the amount of rainfall across the months of
the year. Intensity entropy (IE) was also defined to measure the variability in the
number of rainy days across the months of the year. They constructed iso-entropy maps
of Japan with values of the AE and the IE. Maruyama et al. (2005) measured the AE
and IE of monthly rainfall totals from 11,260 stations worldwide to quantify the
variability in rainfall totals across the months of the year. With the values of the two
entropies, they grouped the studied stations into categories based on potential water
resource availability. For this purpose, they first standardised the AE and IE values of
the stations. The water resource categories were obtained using the simple and k-means
clustering of the standardised entropy values. They constructed worldwide maps
representing potential water resource availability.
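The entropy measures described above share one calculation: the monthly values are converted to shares of the annual total, and the Shannon entropy of those shares is taken. The sketch below is a minimal illustration under stated assumptions: the data are hypothetical, the base-2 logarithm is assumed, and the function name `share_entropy` is introduced here only for illustration.

```python
import math


def share_entropy(values, base=2.0):
    """Shannon entropy of the shares values[i] / sum(values); zero shares contribute nothing."""
    total = sum(values)
    if total <= 0:
        raise ValueError("at least one value must be positive")
    shares = [v / total for v in values if v > 0]
    return -sum(p * math.log(p, base) for p in shares)


# Hypothetical monthly rainfall totals (mm) and rainy-day counts for one year.
monthly_rain = [210, 180, 120, 60, 20, 5, 0, 0, 10, 40, 90, 160]
rainy_days = [18, 15, 12, 7, 3, 1, 0, 0, 2, 5, 9, 14]

ae = share_entropy(monthly_rain)   # apportionment entropy (rainfall amounts)
ie = share_entropy(rainy_days)     # intensity entropy (number of rainy days)
print(f"AE = {ae:.3f}, IE = {ie:.3f}, maximum = {math.log2(12):.3f}")
```

The entropy reaches its maximum, log2(12), when every month receives the same share, and falls towards zero as the rainfall concentrates in fewer months.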
In the abovementioned studies (Maruyama and Kawachi, 1998; Kawachi et al., 2001;
Maruyama et al., 2002; Maruyama et al., 2005), the rainfall variability of each station is
measured as the mean of annual entropy values over the entire studied period. However,
significant temporal variations in rainfall variability are evident for most Australian
rainfall stations. Quantifying the year-to-year variation in monthly rainfall variability is
important as these variations have impacts on agriculture and other primary industries.
Mishra et al. (2009) used the concept of entropy to measure rainfall variability at
different timescales (monthly, seasonal, yearly and decadal). To quantify the temporal
variations in the rainfall variability, they defined the disorder index as the difference
between the maximum entropy (entropy value from the complete even distribution) and
the observed entropy of individual years. For almost all Australian rainfall stations, the
amount of rainfall is not evenly distributed across the months of the year. Some stations
have extremely variable rainfall distribution (almost no rainfall in some months and
relatively large amounts of rainfall in other months). Hence, the value of the maximum
entropy is not a good choice to compare with the entropy of individual years. Moreover,
a comparison of the rainfall variability of different stations with a single value
(maximum entropy) is not always appropriate.
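For the monthly case, the disorder index described above can be written compactly. The notation is assumed here rather than taken from Mishra et al. (2009): p_i denotes the share of the annual rainfall falling in month i, n is the number of months, and the base-2 logarithm is an assumption.

```latex
\[
  H = -\sum_{i=1}^{n} p_i \log_2 p_i, \qquad
  H_{\max} = \log_2 n, \qquad
  \mathrm{DI} = H_{\max} - H .
\]
```

With n = 12, H_max = log2(12) ≈ 3.585 for every station, so the reference value is the same regardless of how seasonal the long-term rainfall at a station is.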
For individual stations, I propose measuring the variability in long-term average rainfall
totals across the months of the year using the entropy of stable rainfall (ESR). To
compare the rainfall variability in individual years with the long-term average rainfall
variability (as measured by ESR), I will define the consistency index (CI). The
availability of potential water resources depends on the total annual rainfall and the
variability in rainfall across the months of the year. Considering these two dimensions
of rainfall, the rainfall stations of Australia will be grouped into categories based on the
availability of water resources.
Impact of ENSO on Rainfall Variability
The El Niño Southern Oscillation (ENSO) refers to coherent fluctuations of sea surface
temperatures, pressure anomalies, and zonal winds in the basins of the tropical Indian
and Pacific Oceans, and has been shown to have a great impact on both rainfall amounts
and rainfall variability in Australia (Trenberth, 1997; Ropelewski and Halpert, 1987;
Drosdowsky and Chambers, 2001; Wang and Hendon, 2007; Rotstayn et al., 2009). The
ENSO affects the rainfall variability in various parts of Australia differently (Stone et
al., 1996; Kiem and Franks, 2001; Hope et al., 2009; Smith et al., 2009). Consequently,
the impact of the ENSO on the consistency in rainfall distribution and on the potential
availability of water resources in Australia was also studied. The NINO 3.4 index is one
of the ENSO indicators and is based on sea surface temperatures. NINO 3.4 is the
anomaly in average sea surface temperature in the region bounded by 5°N–5°S,
120°–170°W. This region has large variability on El Niño time scales and therefore is
used by many authors to understand the variability in rainfall distribution in different
parts of Australia. An El Niño or La Niña event is identified if the five-month running
average of the NINO 3.4 index exceeds 0.4°C (for an El Niño event) or falls below
−0.4°C (for a La Niña event) for at least six consecutive months (Wu and Kirtman, 2007; Abtew et al., 2009a;
Everingham and Reason, 2009; Lee et al., 2009).
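The identification rule can be sketched in code. The example below assumes a centred five-month running mean, thresholds of +0.4 and −0.4 °C, and a hypothetical anomaly series. The function names are introduced for illustration only, and labelling every month within a qualifying run is one possible reading of the rule.

```python
def running_mean(values, window=5):
    """Centred running mean; positions without a full window are None."""
    half = window // 2
    out = [None] * len(values)
    for i in range(half, len(values) - half):
        out[i] = sum(values[i - half:i + half + 1]) / window
    return out


def classify_months(anomalies, threshold=0.4, min_run=6, window=5):
    """Label each month 'El Nino', 'La Nina' or 'neutral' from NINO 3.4 anomalies (deg C)."""
    smoothed = running_mean(anomalies, window)
    signs = [0 if s is None else (1 if s > threshold else (-1 if s < -threshold else 0))
             for s in smoothed]
    labels = ["neutral"] * len(anomalies)
    start = 0
    while start < len(signs):
        end = start
        while end < len(signs) and signs[end] == signs[start]:
            end += 1
        # A run counts as an event only if it lasts at least min_run months.
        if signs[start] != 0 and end - start >= min_run:
            name = "El Nino" if signs[start] > 0 else "La Nina"
            labels[start:end] = [name] * (end - start)
        start = end
    return labels


# Hypothetical monthly NINO 3.4 anomalies (deg C), for illustration only.
anomalies = [0.1, 0.3, 0.6, 0.8, 1.0, 1.2, 1.1, 0.9, 0.7, 0.6, 0.5, 0.2,
             -0.1, -0.4, -0.7, -0.9, -1.0, -0.8, -0.7, -0.6, -0.5, -0.2, 0.1, 0.1]
labels = classify_months(anomalies)
for month, label in enumerate(labels, start=1):
    print(month, label)
```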
Rainfall Distributions
In addition to variability, other characteristics of rainfall, such as the occurrence and
amount of rainfall, can be studied by well-fitting theoretical probability distributions.
Establishing the theoretical probability distributions that provide a good fit to rainfall
totals at various timescales has long been a topic of interest in hydrology, climatology
and meteorology (Ben-Gai et al., 1998; Koutsoyiannis, 2005; Husak et al., 2007). The
well-fitting probability distributions are useful in the generation of synthetic rainfall
data, in probabilistic predictions, and for better understanding of rainfall characteristics
(Rosenberg et al., 2004; Madi and Raqab, 2007; Piantadosi et al., 2009; Ingsrisawang et
al., 2010).
For wet stations, the annual rainfall totals approximately follow the normal distribution.
However, when the time scale is finer than annual, such as seasonal, monthly, daily or
hourly, the distribution of rainfall is highly skewed to the right. The construction of a
universal probability distribution capable of adequately describing rainfall at all
timescales is a difficult task. In the literature, different distributions have been proposed
as appropriate for modelling rainfall totals of different timescales, or even of the same
timescale. Moreover, the daily, weekly, monthly and seasonal rainfall totals have both
discrete and continuous components. The usual distributions used for explaining the
random component of rainfall totals are defined for strictly positive values
(Haghighatjou, 2002; Rosenberg et al., 2004; May, 2004; Aksoy, 2006).
The aim of the current research was to search for the well-fitting theoretical probability
distributions for monthly and seasonal rainfall totals of Australian stations. However,
the daily and weekly rainfall totals have almost the same characteristics (e.g. right
skewed data with some exact zeros) as the monthly and seasonal rainfall totals. Hence, a
search of the literature for the distributions used for modelling the rainfall totals at finer
timescales was also performed. The distributions used for modelling the rainfall totals at
different timescales are discussed below.
Daily Rainfall Distributions
The gamma distribution is a good choice to describe the rainfall totals for several
reasons. The first advantage of using the gamma distribution is that it is bounded on the
left at zero (Wilks, 1995; Thom, 1958). This characteristic is important for modelling
rainfall as a negative rainfall total is not possible. Second, the gamma distribution is
positively skewed, meaning that it has an extended tail to the right of the distribution.
This characteristic is advantageous because it mimics actual rainfall distributions for
many areas where there is a non-zero probability of extremely high rainfall amounts,
even though the typical rainfall may not be very large (Ananthakrishnan and Soman,
1989). Third, the gamma distribution offers a tremendous amount of flexibility in the
shape of the distribution function (Wilks, 1995). The gamma distribution may range
from exponential-decay forms for shape values near one, to nearly normal forms for
shape values beyond twenty (Ozturk, 1981; Wilks, 1990). This flexibility allows the
gamma distribution to be fitted to any number of rainfall patterns with reasonable
accuracy.
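The effect of the shape parameter can be made explicit. In the parameterisation assumed here (shape α and scale β, symbols not used elsewhere in this chapter), the gamma density and its skewness are:

```latex
\[
  f(x;\alpha,\beta) = \frac{x^{\alpha-1} e^{-x/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}},
  \quad x > 0, \qquad
  \text{skewness} = \frac{2}{\sqrt{\alpha}} .
\]
```

For α = 1 the density reduces to the exponential distribution, with skewness 2, whereas for α = 20 the skewness is about 0.45, so the density is then close to symmetric.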
For the abovementioned reasons, gamma distributions are widely used for modelling
right-skewed, positive daily rainfall totals (Das, 1955; Ison et al., 1971; Katz, 1977;
Buishand, 1978; Coe and Stern, 1982; Stern and Coe, 1984; May, 2004; Aksoy, 2006).
The other theoretical probability distributions that have been used for analysing daily
rainfall totals are the Kappa (Meilke, 1973), generalized log-normal (Swift and
Schreuder, 1981), mixed exponential (Chapman, 1997; Wilks, 1998, 1999) and mixed
gamma (Jamaludin and Jemain, 2008). Jamaludin and Jemain (2008) used exponential,
gamma, mixed exponential and mixed gamma distributions to describe the daily rainfall
amount in Malaysia and, based on the Akaike Information Criterion, showed that the
mixture distributions are better than single distributions for describing the amount of
daily rainfall.
To model the occurrence of daily rainfall, first-order (Gabriel and Neumann, 1962; Elseed, 1987), higher-order (Katz, 1977; Deni et al., 2009), hybrid (Wilks, 1999) and
hidden (Robertson et al., 2003) Markov chain models have been used. First-order
Markov chain models assume that the occurrence of rainfall on a day depends on the
occurrence of rainfall on the previous day. Higher-order Markov chains assume that
the occurrence of rainfall on a day depends on the occurrence of rainfall on the two or more
preceding days. Higher-order Markov models are more complex, but perform marginally
better (Deni et al., 2009). Hybrid Markov chains consider different orders for wet and
dry days while hidden Markov chains consider some hidden states. Chandler (2005)
used logistic regression to model dry or wet days as a function of a number of
predictors.
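The first-order assumption described above can be sketched as a two-state (dry/wet) chain. This is an illustrative sketch only: the transition probabilities, the function names and the choice of a dry starting day are assumptions, not values or methods taken from the cited studies.

```python
import random


def simulate_occurrence(p_wet_after_dry, p_wet_after_wet, n_days, seed=0):
    """Simulate wet (True) / dry (False) days from a first-order Markov chain."""
    rng = random.Random(seed)
    wet = False  # assumption: the day before the first simulated day was dry
    days = []
    for _ in range(n_days):
        p = p_wet_after_wet if wet else p_wet_after_dry
        wet = rng.random() < p
        days.append(wet)
    return days


def estimate_transitions(days):
    """Estimate P(wet | previous dry) and P(wet | previous wet) from a sequence."""
    counts = {(False, False): 0, (False, True): 0, (True, False): 0, (True, True): 0}
    for prev, cur in zip(days, days[1:]):
        counts[(prev, cur)] += 1
    dry_total = counts[(False, False)] + counts[(False, True)]
    wet_total = counts[(True, False)] + counts[(True, True)]
    p01 = counts[(False, True)] / dry_total if dry_total else float("nan")
    p11 = counts[(True, True)] / wet_total if wet_total else float("nan")
    return p01, p11


sequence = simulate_occurrence(0.2, 0.6, 20_000)
print(estimate_transitions(sequence))
```

A higher-order chain would condition on two or more previous days, so the dictionary of counts would be keyed on longer histories.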
Monthly Rainfall Distributions
Like the daily rainfall totals, monthly rainfall totals for many Australian stations are
skewed to the right, and hence the assumption of normality is not appropriate. Again,
different distributions are suggested by different authors to model the monthly rainfall
totals of different regions. Sometimes, different distributions are suggested for
modelling the rainfall totals of different months.
Momiyama and Mitsudera (1952) showed that the gamma distribution fits well to the
monthly rainfall totals over Japan. However, Suzuki (1964, 1967) suggested that the
hyper-gamma is a well-fitting distribution for modelling the monthly rainfall totals of
Tokyo and Niigata, Japan. Mooley (1973) considered rainfall data from 39 stations in
different regions of Asia and studied the monthly rainfall data of the summer season.
Considering zero rainfall an attainable lower bound, he compared the fit of the gamma
distribution with other Pearsonian distributions (Pearsonian I and Pearsonian IX) and
concluded that the gamma distribution fits well to the monthly rainfall totals in the areas
experiencing Asian summer monsoon.
Considering the monthly rainfall totals from 29 stations in Libya, Sen and Eljadid
(1999) showed that gamma probability distributions with varying parameters fitted well
to the data. They constructed maps of the studied region with the shape and scale
parameters of the gamma distributions. From these two maps, the parameters for any
desired station could be found, provided that its position within the study area was
known. By substituting the zero rainfall totals with 0.01 inch, Ali et al. (2000) analysed
monthly rainfall totals over central and southern Florida. Comparing seven different
distributions (normal, log-normal with two parameters, log-normal with three
parameters, gamma with two parameters, gamma with three parameters, Weibull and
log-Pearson type III), they concluded that the two-parameter gamma distribution is
optimal within the studied distributions to model the monthly rainfall totals of the
region.
Haghighatjou (2002) compared the normal, two-parameter log-normal, three-parameter
log-normal, two-parameter gamma, Pearson type III, log-Pearson type III and type I
extremal (Gumbel) distributions for fit with the monthly rainfall data of the oldest
stations in Iran, including Bushehr, Isfahan, Mashhad, Tehran and Jask. To test how
well the distributions fitted the data, the Kolmogorov-Smirnov test, mean relative deviation,
and mean square relative deviation were used. The log-Pearson type III distribution
proved best for modelling the monthly rainfall data of the studied stations.
Rosenberg et al. (2004) proposed a more general model, in which the gamma
distribution is extended to a rapidly convergent series of associated Laguerre
polynomials obtained from the parameters for the original gamma distribution. The
model allows matching of the observed higher order moments and, hence, allows
matching of more of the observed characteristics. They also noted that, if there are some
observed zero values, the data can be modelled by a mixed distribution and the gamma
distribution or the more general series representation can be used to model the non-zero
part of the distribution.
By fitting different theoretical probability distributions to non-zero monthly rainfall
totals of eight stations from arid and semi-arid regions of Ethiopia, Tilahun (2006)
showed that no single distribution provided a good fit for all the monthly rainfall data.
However, the gamma and log-normal distributions were preferred. By fitting gamma,
normal, Weibull and log-normal distributions to monthly rainfall data from 32 stations
from the upper Blue Nile basin, Abtew et al. (2009b) showed that the probability
distribution of monthly rainfall varied from month to month. The March, April, May
and August rainfall fitted the normal distribution, whereas the September rainfall fitted
well to the log-normal distribution. The January, July, October and November average
rainfall of the basin fitted well with the gamma probability distribution. The Weibull
distribution fitted well to the February, June and December rainfall.
In the abovementioned studies, the zero rainfall totals are either excluded from the
analysis or replaced by an attainable value. Thus, for modelling monthly rainfall totals,
incorporating the exact zeros with positive rainfall totals is a challenge, which has been
addressed by some authors.
Dingens and Steyaert (1971) proposed a general form of distributions as a mixture of
gamma distributions with the probability of no rainfall. Using the rainfall data from an
observatory at the University of Ghent, Belgium, they showed that the mixture
distributions fitted well to the monthly rainfall totals. Abouammoh (1986) used mixture
models incorporating Poisson and gamma distributions to model the monthly rainfall
totals of three regions in Saudi Arabia. Salvucci and Song (2000) considered the
Poisson arrival of storms of gamma distribution depth (the Poisson–gamma model) for
aggregated monthly rainfall. They evaluated the model using forty-five-year time series
of rainfall from Boston, Massachusetts and Los Angeles, California. Lana and
Burgueno (2000) studied seven series of monthly rainfall totals, some with recording
periods exceeding 100 years, from the Mediterranean coast and nearby Atlantic coast.
First, three statistical distributions (gamma, log-normal and a combination of Poisson
and gamma distributions) were fitted to model the monthly empirical distributions of
rainfall amounts. Each distribution was tested with the Kolmogorov-Smirnov test. They
found that most of the monthly cases required the gamma distribution and the rainfall
totals of the summer months were well described by the Poisson–gamma distribution.
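The Poisson–gamma construction described above (a Poisson number of storms, each with a gamma-distributed depth) can be sketched by simulation. The parameter values and function names below are hypothetical and chosen only for illustration; they are not estimates from any of the cited studies.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw a Poisson count by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def monthly_total(rng, storm_rate, depth_shape, depth_scale):
    """Monthly total = sum of gamma storm depths over a Poisson number of storms."""
    n_storms = poisson_draw(rng, storm_rate)
    # with no storms the sum is empty, so the total is an exact zero
    return sum(rng.gammavariate(depth_shape, depth_scale) for _ in range(n_storms))


rng = random.Random(42)
# hypothetical values: 1.5 storms per month, gamma depths with shape 0.8, scale 20
totals = [monthly_total(rng, 1.5, 0.8, 20.0) for _ in range(20_000)]
dry_fraction = sum(t == 0 for t in totals) / len(totals)
print(dry_fraction, math.exp(-1.5))  # simulated vs theoretical P(total = 0)
```

The sketch shows why this model suits months with exact zeros: the probability of a dry month equals the Poisson probability of no storms, exp(-storm_rate), while positive totals remain continuous.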
Husak et al. (2007) accumulated probabilities conditional on the presence of rainfall,
with a mixture coefficient for the probability of no rain, to create the probability
distribution of monthly rainfall totals. They used a gamma distribution and excluded the
zero values in the rainfall history from the calculation of the shape and scale parameters.
In order to account for the occurrence of no rain in the modelled history, they used an
additional parameter in the theoretical distribution which corresponded to the
probability of receiving no rainfall during the interval. This probability of no rainfall is
estimated by counting the number of occurrences of zero and dividing it by the number
of historical observations. Gonzalez and Valdes (2008) used the gamma distribution
function for modelling the probability distribution of non-zero rainfall. However, due to
the existence of arid and semi-arid areas in the study region, some stations had months
with probability of zero rainfall. Therefore, the composite gamma distribution function
was proposed for modelling monthly rainfall totals with some exact zeros.
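The two-part approach described above can be sketched as follows: the probability of no rainfall is the proportion of zeros, and the gamma parameters are estimated from the non-zero totals only. The method-of-moments estimators are used here purely for brevity and are an assumption of this sketch; the cited authors' estimation methods are not reproduced, and the function name and data are hypothetical.

```python
from statistics import mean, variance


def fit_zero_gamma(totals):
    """Return (p_zero, shape, scale) for a gamma model mixed with exact zeros."""
    positives = [t for t in totals if t > 0]
    # probability of no rainfall: number of zeros / number of observations
    p_zero = 1 - len(positives) / len(totals)
    m = mean(positives)
    v = variance(positives)  # sample variance of the non-zero totals
    shape = m * m / v        # method of moments (assumed, not from the source)
    scale = v / m
    return p_zero, shape, scale


# hypothetical monthly totals (mm) for one calendar month over six years
p_zero, shape, scale = fit_zero_gamma([0, 0, 10, 20, 30, 40])
print(p_zero, shape, scale)
print((1 - p_zero) * shape * scale)  # mean of the mixed distribution
```

With these estimators the mean of the mixed distribution, (1 - p_zero) * shape * scale, reproduces the sample mean of all totals including the zeros.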
Piantadosi et al. (2009) presented a model for the generation of synthetic monthly
rainfall data for a rainfall station at Parafield in Adelaide, South Australia. The rainfall
for each month of the year was modelled as a non-negative random variable from a
mixed distribution, with either a zero outcome or a strictly positive outcome.
Dunn (2004) proposed the distributions from the Tweedie family for modelling rainfall
totals on daily and monthly timescales. The main appeal of these distributions is that
they can model the rainfall amounts for both mixture type (continuous data with exact
zeros) and continuous data (continuous rainfall totals for all months). That is, separate
models are not necessary for modelling the occurrence and amount of rainfall. These
distributions belong to the exponential family of distributions, upon which GLMs are
based. Consequently, a framework for fitting models based on the Tweedie distributions
and for diagnostic testing already exists. In addition, covariates can be incorporated into
the modelling procedure. These distributions provide a mechanism for understanding
the fine-scale structure in coarse-scale data (Dunn, 2004). All exponential dispersion
models that are closed with respect to scale transformations are Tweedie models
(Jørgensen, 1997; Jiang, 2007); that is, if Y is from a particular Tweedie distribution
with index p , and c is a constant, then cY is also from the same Tweedie distribution.
Considering these advantages of the Tweedie family of distributions, in this study I used
these distributions to find the optimal distributions within the family for modelling the
monthly rainfall totals of Australian stations. The possibility that different distributions
within the Tweedie family may be optimal for modelling the rainfall totals of different
months was also explored.
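The link between the Tweedie family and the Poisson–gamma model can be made concrete with a short sketch. It assumes the usual parameterization with mean mu, dispersion phi and variance phi * mu**p, and the standard mapping to a Poisson–gamma model for index values between 1 and 2; neither is stated in the text above, so both should be read as assumptions, and the function names are hypothetical.

```python
import math


def tweedie_to_poisson_gamma(mu, phi, p):
    """Map Tweedie (mu, phi, p), 1 < p < 2, to Poisson rate and gamma shape/scale."""
    if not 1 < p < 2:
        raise ValueError("the Poisson-gamma form requires 1 < p < 2")
    lam = mu ** (2 - p) / (phi * (2 - p))    # mean number of rainfall events
    shape = (2 - p) / (p - 1)                # gamma shape of each event amount
    scale = phi * (p - 1) * mu ** (p - 1)    # gamma scale of each event amount
    return lam, shape, scale


def prob_zero(mu, phi, p):
    """Probability of an exact zero: the Poisson probability of no events."""
    lam, _, _ = tweedie_to_poisson_gamma(mu, phi, p)
    return math.exp(-lam)


def rescale(mu, phi, p, c):
    """Parameters of c*Y: same index p, mean c*mu, dispersion phi * c**(2 - p)."""
    return c * mu, phi * c ** (2 - p), p


mu, phi, p = 50.0, 2.0, 1.5  # hypothetical values for one month
print(tweedie_to_poisson_gamma(mu, phi, p), prob_zero(mu, phi, p))
print(rescale(mu, phi, p, 0.0394))  # e.g. converting millimetres to inches
```

The `rescale` function illustrates the closure under scale transformations: changing the units of rainfall changes the mean and dispersion but leaves the index p unchanged.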
Modelling Monthly Rainfall Totals
The ability to model rainfall has promising applications in predictive analyses, and also
in crop growth, hydrological systems and crop simulation studies (Richardson and
Wright, 1984; Hansen et al., 2009). The monthly and seasonal rainfall models are used
for agricultural planning and management (Nnaji, 2001; Hansen et al., 2009; Nasseri
and Zahraie, 2010), and stochastic disaggregation of monthly rainfall data is useful for
crop simulation models (Lennox et al., 2004; Hansen and Ines, 2005). Simulated
synthetic rainfall data can be used as input into the management of the water cycle,
especially when the observed rainfall record is inadequate with respect to length,
completeness or spatial coverage (Wilks, 1999; Rosenberg et al., 2004). Estimation of
water demand and the simulation of water supply systems generally need monthly
rainfall models (Srikanthan and McMahon, 2001).
Monthly Rainfall Models
Considering the importance of modelling monthly rainfall totals in different areas of
hydrology, agriculture and ecology, many studies have been done in this context. The
existing literature includes studies on statistical (linear regression, polynomial
regression, generalized linear regression) models, stochastic (ARIMA, ARMA) models,
fuzzy-rule based methods and the method of fragments to predict or generate monthly
rainfall totals.
The simplest models for modelling the monthly rainfall totals are the linear regression
models. Using monthly rainfall data from 24 weather stations in major wheat-growing
regions of Australia, Boer et al. (1993) used models for the prediction of rainfall based
on geographical information. They fitted linear regression models to model the amount
of rainfall as a function of longitude, latitude and altitude of the stations. Jacob et al.
(2003) investigated the long-term changes in rainfall intensities at both annual and
monthly scales by applying linear regression analysis to search for trends in the data
record. The linear regression analysis was extended to address not only the trends but
also the highest amounts of rainfall in each year or month and the changes in the
second, third and nth highest rainfall intensities. Chowdhury and Sharma (2007) used a
linear regression model to quantify the effect of the El Niño Southern Oscillation on the
amount of monthly rainfall. Considering nonlinear effects of some covariates on the
amounts of monthly rainfall, Zaw and Naing (2008) used a polynomial regression to
model the amount of monthly rainfall in Myanmar.
One of the basic assumptions of the abovementioned models is that the amount of
rainfall is normally distributed with constant variance. For some stations, the amounts
of rainfall on some timescales (e.g. annual) approximately follow a normal distribution.
In these instances, the use of normal distributions is appropriate. However, the amount
of monthly, weekly or daily rainfall usually does not follow a normal distribution and is
right skewed. Therefore, alternative distributions are needed to model the amount of
rainfall on these shorter timescales.
In order to link the monthly rainfall to large-scale circulation patterns, Ozelkan et al.
(1996) used a fuzzy-indexing technique in conjunction with a fuzzy rule-based
technique and a standard linear regression. To measure the forecasting capability of the
models, the data were divided into a calibration period and a validation period. A
comparison of the results showed that the fuzzy rule-based model performed better than
the regression model and has potential for forecasting of monthly rainfall. Moreover,
they adapted a fuzzy rule-based framework in order to use the model under climate
change.
Monthly rainfall data can also be generated by disaggregating the generated annual
rainfall using the method of fragments (Srikanthan and McMahon, 1984), the method of
synthetic fragments (Porter and Pink, 1991) or the modified method of synthetic
fragments (Maheepala and Perera, 1996).
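The basic idea of disaggregation by fragments can be sketched as follows. This is a simplified reading of the approach, assumed for illustration: each historical year is converted to twelve monthly fractions of its annual total, and a generated annual total is split using the fractions of the historical year with the closest annual total. The selection rule, function names and data are assumptions of this sketch, not a reproduction of the cited methods.

```python
def fragments(monthly_history):
    """Convert each year's 12 monthly totals to (annual total, monthly fractions)."""
    result = []
    for year in monthly_history:
        total = sum(year)
        if total > 0:  # a completely dry year has no defined fractions
            result.append((total, [m / total for m in year]))
    return result


def disaggregate(annual_total, fragment_sets):
    """Split an annual total using the fragments of the closest historical year."""
    _, fractions = min(fragment_sets, key=lambda item: abs(item[0] - annual_total))
    return [annual_total * f for f in fractions]


# hypothetical two-year record (mm): an even year and a year with a dry half
history = [[10.0] * 12, [0.0] * 6 + [50.0] * 6]
monthly = disaggregate(280.0, fragments(history))
print(monthly)
```

Because the fractions of each historical year sum to one, the disaggregated monthly values always add up to the generated annual total.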
The merits of daily and monthly downscaling models for rainfall were compared using
data from Bern, Switzerland; Deuselbach, Germany; and De Bilt, the Netherlands
(Buishand et al., 2004). For each station, generalized linear models were fitted to
describe rainfall occurrence, the wet-day rainfall amounts, and the monthly rainfall
totals. The predictor dataset included dynamic variables and atmospheric moisture
(relative humidity for rainfall occurrence and specific humidity for rainfall amount).
Fealy and Sweeney (2007) observed daily and monthly rainfall data from 14 synoptic
stations, obtained from the Irish Meteorological Service, Met Eireann, for the period
1961–2000. Logistic regression was employed to model wet- and dry-day sequences of
rainfall. While the mixed exponential distribution has been found to provide a better fit
for rainfall amounts (Wilks and Wilby, 1999), the relationship between the mean and
the variance in this distribution makes it difficult to incorporate into a GLM.
Nonetheless, the gamma distribution GLM has been found to be a good fit for rainfall
amounts in a number of regions.
The use of a GLM based approach for forecasting of monsoon rainfall is proving to be
an attractive alternative to the existing statistical models. The GLM approach has the
versatility to produce forecasts of both discrete variables (such as wet days, dry days,
etc.) and also continuous variables (such as weekly, monthly and seasonal rainfall). This
framework can also be used to produce forecasts of extreme events (Katz et al., 2002).
Rajagopalan (2009) found that the GLMs were useful for forecasting monthly and
seasonal rainfall in Gujarat and also for predicting monthly and seasonal total rainfall
and number of wet days across India. They applied the GLMs to generate a forecast for
the June to September 2009 monsoon rainfall.
The monthly rainfall data from many Australian stations have exact zeros combined
with continuous rainfall totals. To incorporate the exact zeros within the continuous
data, the Tweedie generalized linear models were used in this study. Cyclical patterns
are likely to be evident in rainfall; for example, most locations have drier and wetter
months consistently from year to year. To model this cyclic pattern in rainfall totals,
models with the sine and cosine terms were developed.
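A minimal sketch of how sine and cosine terms can represent an annual cycle is given below. It assumes a single harmonic with a twelve-month period and a logarithmic link between the linear predictor and the mean, which is a common choice for such models but is not stated above; the coefficient values and function names are hypothetical.

```python
import math


def harmonic_terms(month):
    """Sine and cosine covariates for month 1..12 with a 12-month period."""
    angle = 2 * math.pi * month / 12
    return math.sin(angle), math.cos(angle)


def seasonal_mean(month, b0, b_sin, b_cos):
    """Mean rainfall under an assumed log link: exp(b0 + b_sin*sin + b_cos*cos)."""
    s, c = harmonic_terms(month)
    return math.exp(b0 + b_sin * s + b_cos * c)


# hypothetical coefficients giving a wet season near the start of the year
for month in range(1, 13):
    print(month, round(seasonal_mean(month, 3.5, 0.4, 0.9), 1))
```

Two coefficients are enough to place the peak of the cycle in any month and to set its amplitude, so the same model form can describe stations with different wet seasons.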
The relationship between the southern oscillation index (SOI) and rainfall in Australia
has been well known for many years and studied by numerous authors (e.g. Troup,
1965; Quinn and Burt, 1972; Power et al., 1997; Simmonds and Hope, 1997;
Chowdhury and Beecham, 2010). Using the SOI, Stone and Auliciems (1992)
constructed five SOI phases, which were then used to study the effect of the SOI on
rainfall amounts. The SOI phases have proven useful for studying their effects on
rainfall and on different types of cropping in Australia (Stone and McKeon, 1993;
Hammer et al., 1996; Meinke et al., 1996; Meinke and Ryley, 1997; Stone et al., 1996b;
Willcocks and Stone, 2000). The Niño 3.4 index is one of the El Niño Southern
Oscillation indicators based on the temperature of the sea surface and is related to
monthly rainfall amounts in Australia (Wu and Kirtman, 2007; Everingham and Reason,
2009; Lee et al., 2009). The Niño 3.4 region overlaps portions of the Niño 3 and Niño 4
regions covering an area from 5°N-5°S and 170°W-120°W. Barnston and Chelliah
(1997) defined the Niño 3.4 region based on the correlation between the SOI-defined
ENSO events being stronger with the Niño 3.4 index than with the Niño 3 index. The
Japan Meteorological Agency (JMA) index was defined with the sea surface
temperature of the region located within the Niño 3 region, extending from 4°N-4°S
and 150°W-90°W. The JMA index is found to be more sensitive to La Niña events than
all other indices. The Southern Oscillation, Niño 3.4, and Niño 4 indices are almost
equally sensitive to El Niño events and are more sensitive than the JMA, Niño 1+2, and
Niño 3 indices (Hanley et al., 2003).
Consequently, in this study, the Tweedie GLMs were fitted to the monthly rainfall totals
of Australian stations, with these climatological variables as predictors. The impact of
different climatological variables on monthly rainfall totals in various regions of
Australia was also examined.
Seasonal Rainfall Distributions
The linear regression models are commonly used for modelling the seasonal rainfall
totals of different regions (Nnaji, 2001; Chiew and Leahy, 2003; Suppiah, 2004;
Cheung et al., 2008; Gonzalez and Cariaga, 2009; Risbey et al., 2009; Kirono et al.,
2010; Purdie and Bardsley, 2010). One of the basic assumptions for fitting the linear
regression models is that the distribution of the dependent variable is approximately
normally distributed. This assumption is not valid for the seasonal rainfall totals of
many Australian rainfall stations, which are skewed to the right. Hence, the use of a
linear regression model is not appropriate for modelling the seasonal rainfall totals of
these stations.
Alternatively, the standardized precipitation index (SPI) is calculated for seasonal
rainfall totals, by transforming the seasonal rainfall totals which have gamma
distributions to standard normal distributions. Linear regression models were then fitted
to the SPI (Almeira and Scian, 2006; Canon et al., 2007; Pai et al., 2010). However, the
calculation of the SPI is not straightforward, and the models based on the SPI do not
predict the actual rainfall total. The seasonal rainfall totals of different regions can be
modelled by theoretical probability distributions. Searching for distributions that fit well
for seasonal rainfall totals is important for improving the present forecast format of
seasonal totals and for calculating the various statistics of the dataset.
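The transformation behind the SPI can be sketched in a few lines: a rainfall total is mapped to its cumulative probability under the fitted gamma distribution, and that probability is then mapped to a standard normal quantile. The sketch assumes that the gamma shape and scale have already been estimated and ignores zero totals; the series used for the gamma distribution function is suitable only for moderate values, and all names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (shape + n)
        total += term
    log_front = -z + shape * math.log(z) - math.lgamma(shape)
    return min(1.0, total * math.exp(log_front))


def spi(total, shape, scale):
    """Standard normal quantile with the same cumulative probability as the total."""
    prob = gamma_cdf(total, shape, scale)
    prob = min(max(prob, 1e-12), 1 - 1e-12)  # keep inv_cdf away from 0 and 1
    return NormalDist().inv_cdf(prob)


# hypothetical fitted parameters for one season at one station
for total in (50.0, 150.0, 400.0):
    print(total, round(spi(total, 2.0, 90.0), 2))
```

The sketch also makes the limitation noted above visible: the output is a standardized value, so a model fitted to the SPI does not directly return a rainfall total in millimetres.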
Mooley and Rao (1971) studied seasonal rainfall totals from 53 long-recorded stations
in India and found that the distribution is skewed to the right. Since the normality
assumption did not hold in this case, they tested if the gamma distributions fitted the
data. Using the chi-square tests for goodness-of-fit, they showed that gamma
distributions provide good fit to the seasonal rainfall patterns at the majority of stations
in different parts of India. However, a normal distribution was a good fit for seasonal
rainfall at stations in some parts of India. The gamma distributions were also used to
model the summer monsoon (June to September) rainfall for a network of 39 well
distributed stations over the Asiatic monsoon area (Mooley, 1975).
Ropelewski et al. (1985) analysed seasonal rainfall data from 3400 stations worldwide
and fitted gamma distributions to the data which had only positive seasonal rainfall
totals. As the gamma distribution is not defined for exact zeros, they did not fit gamma
distributions to the stations in arid regions or regions with pronounced wet and dry
seasons (i.e. monsoon and Mediterranean climates).
Wilks and Eggleston (1992) considered the modified gamma distribution to fit seasonal
rainfall totals of 5376 stations in the United States. They calculated the maximum
likelihood estimates of the parameters with zero seasonal totals as censored data. They
concluded that, although the gamma distribution is not necessarily the best parametric
distribution for representing the seasonal rainfall at a particular location, it often
provides a quite reasonable approximation.
To fit a single model for the rainfall stations in a region, Guttman et al. (1993) divided
the United States into 104 regions. They used the three-parameter generalized extreme
value, gamma, generalized logistic and log-normal distributions for the regional average
L-moments of the seasonal rainfall data. They observed that different distributions fitted
well to the seasonal rainfall data from different seasons in different areas. Sometimes,
more than one distribution fitted well to the rainfall data in particular seasons. To
incorporate the exact zero rainfall totals into the models, they first calculated the
probability of zero rainfall as a proportion of the dry season. The parameters of the
theoretical probability distributions were computed by the non-zero rainfall amounts.
Finally, the cumulative distribution functions of the mixture models were obtained as a
mixture of the probability of no rainfall and the cumulative distribution functions of the
non-zero rainfall totals.
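The mixture described in the last three sentences can be written compactly. The symbols are introduced here for illustration and are not taken from the cited paper: q denotes the probability of zero rainfall, G the cumulative distribution function fitted to the non-zero totals, and H the cumulative distribution function of the mixture.

```latex
H(x) = q + (1 - q)\, G(x), \qquad x \ge 0
```

At x = 0 the expression gives H(0) = q, the probability of a dry season, and as x increases H(x) approaches one because G(x) does.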
Cho et al. (2004) investigated the spatial characteristics of non-zero rain rates to develop
a probability density function of rainfall using the large rainfall dataset provided by the
TRMM satellite. As a first step, the gamma and log-normal distributions were fitted to
the data. The sensitivity of model parameters to bin width and dynamic range was then
investigated. Yue and Hashino (2003) studied the seasonal rainfall data from 22
meteorological stations throughout Japan with long-term monthly rainfall records (about
110 years). For seasonal rainfall, the Pearson type III (gamma) distribution was the best
fit for the observations of spring rainfall; the log-Pearson type III was the best fit for
summer and winter rainfall; and the three-parameter log-normal was the best fit for
autumn rainfall, with the log-Pearson type III as a potential alternative.
To develop a decision support system for agricultural risk for the farmers in Burkina
Faso, West Africa, Rader et al. (2009) analysed the July-August-September rainfall data
from a village in Bonam for the period 1963–1999. They expressed the seasonal forecast
of the rainfall total as the probabilities that the rainfall falls into low, middle or high rainfall
categories based on the historical rainfall record for the area. For this purpose, they used
gamma distributions to fit the seasonal rainfall totals.
The abovementioned studies observed that a single distribution does not fit well to the
seasonal rainfall totals of all stations and, hence, a variety of distributions are required
to model seasonal rainfall data. The seasonal rainfall totals of many Australian stations
are mixture types (continuous data with exact zeros), but the usual distributions used to
fit seasonal rainfall totals are not defined for exact zeros. Alternatively, the Tweedie
family of distributions was considered in this study to find the optimal distributions
within the family to model the seasonal rainfall totals of Australian stations. The
Tweedie family of distributions was considered as these distributions incorporate exact
zeros into the positive seasonal rainfall totals. I also explored the possibility that
different distributions may be optimal within the family for modelling the rainfall totals
in different seasons.
Conclusion
This chapter has reviewed classic and recent research on some characteristics of rainfall.
First, the different measures of rainfall variability were discussed. The commonly used
measures are not always appropriate for quantifying the rainfall variability of Australian
stations across the months of the year. Therefore, the concept of entropy has been
adapted to develop an index appropriate for quantifying the rainfall variability and to
compare rainfall variability in individual years with the long-term average rainfall
variability. Second, the theoretical probability distributions that fit well with monthly
and seasonal rainfall totals of Australian rainfall stations were reviewed. The theoretical
probability distribution(s) that fit well to the monthly and seasonal rainfall totals of
Australian stations were determined. Finally, the different models used for predicting
and generating monthly rainfall totals were discussed. Well fitted statistical models are
useful for simulating and predicting rainfall totals. The values of the coefficients of the
fitted models can be used to understand the effect of different climatological variables
on rainfall totals.
Consistency in rainfall and potential water resource availability
CHAPTER 3: ENTROPY, CONSISTENCY IN RAINFALL
DISTRIBUTION AND POTENTIAL WATER RESOURCE
AVAILABILITY IN AUSTRALIA
Authors: Md Masud Hasan and Peter K. Dunn
Affiliations: School of Health and Sport Sciences, Faculty of Science, Health and
Education, University of the Sunshine Coast.
Journal: Hydrological Processes (2011), 25: 2613–2622.
JCR ranking: 12/66 (Water Resources)
Impact factor: 1.870
ARC tier ranking: A
Statement of Intellectual Contribution
I, Md Masud Hasan, have made substantial independent intellectual contributions to the
research paper ‘Entropy, consistency in rainfall distribution and potential water
resource availability in Australia’. The intellectual contributions include the
development of the study hypothesis, identification and modification of methodology,
development of new applications, responsibility for independent analysis and
interpretation of results.
Md Masud Hasan (signature and date)
Peter K Dunn (signature and date)
Abstract
Understanding the variability in monthly rainfall amounts is important for the
management of water resources. We use entropy, a measure of variability, to quantify
the rainfall variability in Australia. We define the entropy of stable rainfall ESR to
measure the long-term average rainfall variability across the months of the year. The
stations in northern Australia observe substantially more variability in rainfall
distributions and stations in southern Australia observe less variability in rainfall
distribution across the months of the year. We also define the consistency index CI to
compare the monthly rainfall distribution of a given year with the long-term average
monthly rainfall distribution. A higher value of the CI indicates the rainfall in the year is
consistent with the overall long-term average rainfall distribution. Areas close to the
coastline in northern, southern and eastern Australia observe more consistent rainfall
distribution in individual years with the long-term average rainfall distribution. For the
studied stations, we categorize the years into different potential water resource
availability on the basis of annual rainfall amount and CI. For almost all Australian
rainfall stations, El Niño years have a greater risk of having below median and
relatively inconsistent rainfall distribution than La Niña years. The results may be
helpful for developing area-specific water usage strategies.
KEY WORDS: Entropy, Rainfall variability, Consistency index, Potential water
resource availability, El Niño Southern Oscillation.
Introduction
Australia has one of the most variable rainfall climates in the world (Beeton et al., 2006;
Wimalasuriya et al., 2008). Variability in the amount of rainfall has an important
influence on farm business, rural industries and food security of farming households
(George et al., 2005; Bewket, 2007). For example, high variability in seasonal rainfall
from year to year means that cropping is often financially risky (Hammer et al., 1996).
Studying the variability in rainfall amounts is also important for efficient water
resources development, management and usage practices (Conway et al., 2005) and for
determining the characteristics of stream flow hydrographs (Singh, 1997a). Many
authors have studied the spatial and temporal variability in total annual rainfall in
Australia and their effect on agriculture and primary industries (Mollah and Cook, 1996;
Meinke et al., 2005; Meneghini et al., 2007; O'Reagain et al., 2009).
Common statistics used to measure the variability in rainfall are the variance and
coefficient of variation (Kiem and Franks, 2001; Hope et al., 2009; Van Etten, 2009).
However, variance and coefficient of variation are unsuitable measures of variability
when the distribution is not symmetric, as is the case with the monthly rainfall
distribution at Australian stations which is very much skewed to the right. Because of
this, some authors measure the variability in rainfall using percentiles; for example, the
Variability Index VI is defined as (90th percentile - 10th percentile) / 50th percentile
(Dewar and Wallis, 1999; Van Etten, 2009). VI is not useful for measuring the
variability of the monthly rainfall totals within an individual year as there are only
twelve rainfall events (the twelve monthly totals).
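The Variability Index defined above can be computed with a short sketch. The percentile interpolation rule is not specified in the definition, so the "inclusive" method of the Python standard library is used here as an assumption; the function name and data are hypothetical.

```python
from statistics import quantiles


def variability_index(totals):
    """(90th percentile - 10th percentile) / 50th percentile of rainfall totals."""
    # n=10 returns the nine decile cut points; the interpolation rule is assumed
    deciles = quantiles(totals, n=10, method="inclusive")
    p10, p50, p90 = deciles[0], deciles[4], deciles[8]
    return (p90 - p10) / p50  # undefined when the median is zero


# hypothetical long record of January totals (mm) at one station
january_totals = [0, 2, 5, 8, 12, 20, 31, 45, 60, 95, 140]
print(round(variability_index(january_totals), 3))
```

With only twelve values the 10th and 90th percentiles are determined almost entirely by the smallest and largest monthly totals, which illustrates why the index is of little use within a single year.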
When the probability distributions are asymmetric, entropy (Shannon, 1948) is a better
measure of the variability than variance (Singh, 1997b; Avseth et al., 2005). Entropy is
also a better measure of variability than variance when the sample size is small (Singh
et al., 2007). Further, the variance is related to the first two moments of the distribution
only. However, the Legendre series expansion of the entropy formula reveals that
entropy may be related to higher-order moments of a distribution, and so offers a much
closer characterization of the distribution since it uses much more information about the
probability distribution than the variance (Ebrahimi et al., 1999). Considering these
advantages over other measures of variability, we use entropy to measure the variability
in monthly rainfall amounts in Australia.
Applying the concept of entropy to empirical probability distributions, the variability in
the number of rainy days per month is measured by intensity entropy IE and the
variability in the amount of rainfall across the months is measured by apportionment
entropy AE. Maruyama and Kawachi (1998) measured the variability in daily rainfall
amounts of stations as the average rainfall entropy of the studied years. Using the
average of annual entropies and median of annual rainfalls, Kawachi et al. (2001)
categorized the rainfall stations in Japan on the basis of potential water resources
availability. Maruyama et al. (2002) used AE and IE to measure the monthly rainfall
variability in Japan. Using rainfall data from 11,260 stations worldwide, Maruyama et
al. (2005) used the AE and IE to understand the monthly rainfall variability. Using
simple and k-means clustering with the standardized AE and IE, worldwide maps were
constructed representing potential water resource availability. From the parts of the map
representing Australia, it was observed that the coastlines of south-east Australia had
better water resources availability than elsewhere. Mishra et al. (2009) used entropy to
measure rainfall variability on different timescales (monthly, seasonal, yearly and
decadal). Analysing the data of 43 rainfall stations from Texas, USA, they concluded
that the increasing trend of drought in some regions may continue. In the
abovementioned studies (Maruyama and Kawachi, 1998; Kawachi et al., 2001;
Maruyama et al., 2002), the average of the annual entropy values over the entire studied
period was used as the entropy value of a station. The long-term average of entropy
values of a station may differ significantly from the entropy values of individual years.
In this paper, consistency in monthly rainfall distribution for individual years will be
measured to observe the changes in the rainfall distribution from long-term average
rainfall distribution. On the basis of consistency in rainfall distribution and annual
rainfall totals, we categorize years according to the different availability of water
resources.
The rainfall variability and potential water resource availability in Australia are
influenced by a number of climatological factors, including the El Niño Southern
Oscillation (ENSO). ENSO refers to coherent
fluctuations of sea surface temperatures, pressure anomalies, and zonal winds in the
basins of the tropical Indian and Pacific Oceans, and has been shown to have a great
impact on both rainfall amounts and rainfall variability in Australia (Trenberth, 1997;
Ropelewski and Halpert, 1987; Drosdowsky and Chambers, 2001; Wang and Hendon,
2007; Rotstayn et al., 2009). ENSO affects the rainfall variability in different parts of
Australia differently (Stone et al., 1996a; Kiem and Franks, 2001; Hope et al., 2009;
Smith et al., 2009). El Niño is associated with a significant reduction in rainfall over
north-eastern and south-eastern Australia (McBride and Nicholls, 1983; Stone and
Auliciems, 1992; Taschetto et al., 2010). Consequently, we also study the impact of
ENSO on the consistency in rainfall distribution and on the potential water resource
availability.
The aims of this paper are:
• To represent the AE and the entropy of the long-term average monthly rainfall
distribution for Australian stations;
• To introduce the consistency index to measure the relative change in the rainfall
distribution across the months for individual years compared to the stable
rainfall distribution;
• To compare the consistency indices of Australian rainfall stations for El Niño
and La Niña years;
• To classify the stations into different water resource availability categories;
• To produce contour maps for comparing potential water resource availability for
El Niño and La Niña years.
The results and figures used in the paper are helpful for understanding the consistency
in rainfall distribution of Australian stations. The contour maps give an indication of the
availability of water resources for different parts of Australia, and the effect of ENSO
on the consistency in rainfall distribution and on the availability of water resources.
Methodology
Entropy and Rainfall Variability
Shannon's entropy (Shannon, 1948), H, for the empirical probability distribution
$p_1, p_2, \ldots, p_n$ is defined as
$$H = -\sum_{j=1}^{n} p_j \log_2 p_j \qquad (2.1)$$
where $\sum_{j=1}^{n} p_j = 1$. Entropy measures the variability in a random variable. As used
here, larger values of H indicate a more even distribution across the categories (less
variability), whereas smaller values indicate a more uneven distribution (more
variability). In this paper, we measure the variability in the
apportionment of rainfall amounts within the months of the year using Shannon's
entropy (for the reasons detailed in Section 1).
Maruyama et al. (2002) defined the apportionment entropy (AE) to measure the
variability in monthly rainfall amounts. Let $r_j$ be the total amount of
rainfall during month $j$ in a given year and $R$ be the total rainfall for the same year. The
empirical probability distribution is the relative amount of rainfall $p_j$ for month $j$; that
is, $p_j = r_j / R$. Using Equation (2.1) for this empirical probability distribution, AE is
defined as
$$AE = -\sum_{j=1}^{12} \frac{r_j}{R} \log_2 \frac{r_j}{R} \qquad (2.2)$$
Here the value of AE lies between 0 and $\log_2 12 \approx 3.58$. AE is exactly zero for a given
year when all the yearly rainfall is concentrated in only one month and no rainfall
occurs in other months. If rainfall is evenly distributed among the months of the year so
that $r_j = R/12$, then AE = 3.58 for the year. In the literature (Maruyama and Kawachi,
1998; Kawachi et al., 2001; Maruyama et al., 2002), the rainfall variability for a station
is often measured by the average of yearly AE values over the studied period. However,
the variability in monthly rainfall amounts of a station may vary from year to year.
Investigating the temporal variability in the monthly rainfall distribution is important for
any planning regarding the usage of water resources.
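As an illustration of Equation (2.2), the following minimal Python sketch computes the AE of a single year from its twelve monthly totals. The function name `apportionment_entropy` and the rainfall values are illustrative assumptions and are not taken from the study; months with no rainfall are assumed to contribute nothing to the sum (the usual convention that 0 log 0 = 0).

```python
import math


def apportionment_entropy(monthly_rainfall):
    """Return AE (in bits) for one year's twelve monthly rainfall totals."""
    total = sum(monthly_rainfall)
    if total <= 0:
        raise ValueError("annual rainfall must be positive")
    entropy = 0.0
    for r in monthly_rainfall:
        if r > 0:  # dry months contribute 0 (assumed convention: 0 * log 0 = 0)
            p = r / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical years (mm): perfectly even rainfall, and all rain in one month
even_year = [50.0] * 12
single_month_year = [600.0] + [0.0] * 11

print(round(apportionment_entropy(even_year), 2))   # 3.58, i.e. log2(12)
print(apportionment_entropy(single_month_year))     # 0.0
```

The two hypothetical years reproduce the bounds stated above: the even year attains the maximum of log2 12, and the single-month year gives zero.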
Mishra et al. (2009) introduced the concept of a disorder index (DI), a measure of
entropy-based variability in rainfall distribution, defined as the difference between the maximum
possible entropy value under the evenly apportioned state (that is, $\log_2 12$) and the
actual entropy value obtained for each year in the time series:
$$DI = \log_2 12 - AE_i \qquad (2.3)$$
DI is simply AE with its sign reversed and its origin shifted to $\log_2 12$. For almost all
rainfall stations, the amount of rainfall
is never evenly distributed across the months of a year. Moreover, in Australia, most
stations have extremely variable rainfall distribution (almost no rainfall in some months
and relatively large amounts of rainfall in other months), while some stations have an
almost even rainfall distribution for all months of the year (BoM, 2010). For individual
stations, we propose measuring the variability in long-term average rainfall amounts
across the months of the year and then comparing the rainfall distribution of individual
years with the long-term average rainfall distribution. To do this we introduce the
concepts of the entropy of stable rainfall (ESR) and consistency index (CI) in Section
2.2.
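To make the disorder index of Equation (2.3) concrete, consider a hypothetical year (not taken from the data) in which the annual rainfall falls in equal amounts in six months and no rain falls in the other six. Each wet month then has relative amount 1/6, and the worked arithmetic is:

```latex
\begin{align*}
AE_i &= -\sum_{j=1}^{6} \tfrac{1}{6}\log_2 \tfrac{1}{6} = \log_2 6 \approx 2.58, \\
DI   &= \log_2 12 - AE_i = \log_2 12 - \log_2 6 = \log_2 2 = 1 .
\end{align*}
```

Under this assumed pattern, halving the number of months that receive rain raises DI from 0 to exactly 1 bit, whatever the station's usual seasonal pattern may be.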
Entropy of Stable Rainfall and Consistency Index
The stable rainfall distribution of a station is the long-term mean of rainfall amounts
across the months of the year. For a given rainfall station, let $s_1, s_2, \ldots, s_{12}$ be the average
rainfall amounts (averaged over the entire study period) for the months January, February,
..., December respectively. $S$ is the average annual rainfall of the station, so that
$S = \sum_{j=1}^{12} s_j$. The probability distribution of stable monthly rainfall amounts is obtained as
$p_j = s_j / S$. The entropy-based variability in the stable rainfall amounts across the
months of the year is defined as the entropy of stable rainfall (ESR). Using this stable
rainfall distribution and Equation (2.1), the ESR is defined as
$$ESR = -\sum_{j=1}^{12} \frac{s_j}{S} \log_2 \frac{s_j}{S} \qquad (2.4)$$
Note that each station has only one ESR for the study period, indicating the variability
of the average monthly rainfall totals. Smaller values of ESR indicate that the station
has more variability and larger values indicate less variability in long-term average
monthly rainfall amounts.
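The ESR of Equation (2.4) can be sketched in Python as follows. This is a minimal illustration only: the helper names `entropy_bits` and `entropy_of_stable_rainfall` and the two-year record are assumptions made for the example, and the input is assumed to contain complete years only.

```python
import math


def entropy_bits(amounts):
    """Shannon entropy (base 2) of the relative amounts; zeros are skipped."""
    total = sum(amounts)
    return -sum((a / total) * math.log2(a / total) for a in amounts if a > 0)


def entropy_of_stable_rainfall(years):
    """ESR for one station.

    years: list of 12-element lists, one list of monthly totals per year.
    """
    n_years = len(years)
    # Long-term mean rainfall for each calendar month (s_1, ..., s_12)
    stable = [sum(year[j] for year in years) / n_years for j in range(12)]
    return entropy_bits(stable)


# Hypothetical record (mm): the wet season falls in opposite halves of the year
records = [
    [120, 100, 60, 20, 5, 0, 0, 0, 5, 20, 60, 100],
    [0, 0, 5, 20, 60, 100, 120, 100, 60, 20, 5, 0],
]
print(round(entropy_of_stable_rainfall(records), 2))
print(round(entropy_bits(records[0]), 2))
```

In this contrived record the long-term monthly means are far more even than either individual year, so the ESR is larger than the entropy of each year taken alone.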
For a station, the consistency in monthly rainfall amounts for a year will be measured as
the deviation of the AE of that year from the ESR of the station. As entropy is measured on
the log scale, the ratio of the two entropies will be used to measure the deviation. For a
rainfall station, the consistency index $CI_i$ for year $i$ is defined as
$$CI_i = \frac{AE_i}{ESR} \qquad (2.5)$$
where $AE_i$ is the apportionment entropy of year $i$ and ESR is the entropy of stable
rainfall for the station. For years with larger values of CI, the monthly rainfall
distributions are consistent with the stable rainfall distribution of the
station. For years with smaller values of CI, the rainfall
distribution is more inconsistent with the stable rainfall distribution of the station: the
drier months receive smaller proportions and the wetter months receive larger
proportions of annual rainfall compared to the stable rainfall distribution. In other
words, the dry months will tend to be drier and the wet months will tend to be wetter. CI
is a better measure than DI for comparing the rainfall distribution of individual years, as
CI considers the long-term average rainfall distribution for each month at the respective
station, whereas DI compares each year with a single value (the maximum possible entropy of
$\log_2 12$, which is never attained in practice).
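A minimal Python sketch of the consistency index in Equation (2.5) is given below. The function names and the three-year record are hypothetical, chosen only to show a year whose rainfall is more concentrated than usual; the check for a zero ESR is an added safeguard, not something discussed in the text.

```python
import math


def entropy_bits(amounts):
    """Shannon entropy (base 2) of the relative amounts; zeros are skipped."""
    total = sum(amounts)
    return -sum((a / total) * math.log2(a / total) for a in amounts if a > 0)


def consistency_indices(years):
    """Return CI_i = AE_i / ESR for every year of one station."""
    n_years = len(years)
    stable = [sum(year[j] for year in years) / n_years for j in range(12)]
    esr = entropy_bits(stable)
    if esr == 0:
        raise ValueError("ESR is zero; the ratio AE/ESR is undefined")
    return [entropy_bits(year) / esr for year in years]


# Hypothetical station (mm): two typical years and one concentrated year
typical = [100, 80, 60, 40, 20, 10, 10, 20, 40, 60, 80, 100]
concentrated = [300, 200, 50, 0, 0, 0, 0, 0, 0, 0, 20, 50]
ci = consistency_indices([typical, typical, concentrated])
print([round(value, 2) for value in ci])
```

The third year has the same annual total as the others but a much smaller CI, which is the situation described above in which the wet months are wetter and the dry months drier than usual.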
Potential Water Resources Availability
The potential availability of water resources for use in agriculture and other primary
industries depends on the amount of yearly rainfall $y$ and the consistency of rainfall
distribution among the months of the year, CI. To consider these two dimensions of the
rainfall distribution, we divide the studied years into groups using the
arbitrary cut-off point CI = 0.85 and the median annual rainfall $\tilde{y}$. The studied
years are divided into four categories:
Category I: $y > \tilde{y}$ and $CI > 0.85$; above median annual rainfall and a more
consistent rainfall distribution across the months;
Category II: $y > \tilde{y}$ and $CI < 0.85$; above median annual rainfall and a more
inconsistent rainfall distribution across the months;
Category III: $y < \tilde{y}$ and $CI > 0.85$; below median annual rainfall and a more
consistent rainfall distribution across the months;
Category IV: $y < \tilde{y}$ and $CI < 0.85$; below median annual rainfall and a more
inconsistent rainfall distribution across the months.
Category I may be considered an ideal situation for water resource planning purposes,
with years receiving more than median annual rainfall and more consistent rainfall
distribution across the months of the year compared to the observed historical rainfall
distribution. In contrast, Category IV may be considered as the worst situation for water
resource planning purposes, with less than median rainfall and more inconsistent
rainfall distribution. Stations with predominantly Category IV rainfall patterns face a
greater risk of experiencing water shortage, at least in some months of the year.
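The four-way classification can be sketched in Python as below. The function names and the example values are hypothetical. The treatment of ties is also an assumption: a year exactly at the median rainfall or exactly at CI = 0.85 is placed in the lower group, since the text does not say how such years are handled.

```python
import statistics


def categorise_years(annual_totals, ci_values, ci_cutoff=0.85):
    """Assign each year to Category I-IV from its annual total and CI."""
    median_total = statistics.median(annual_totals)
    categories = []
    for total, ci in zip(annual_totals, ci_values):
        wet = total > median_total      # ties count as "not above" (assumption)
        consistent = ci > ci_cutoff     # ties count as "inconsistent" (assumption)
        if wet and consistent:
            categories.append("I")
        elif wet:
            categories.append("II")
        elif consistent:
            categories.append("III")
        else:
            categories.append("IV")
    return categories


def category_percentages(categories):
    """Percentage of years falling in each category."""
    return {c: 100.0 * categories.count(c) / len(categories)
            for c in ("I", "II", "III", "IV")}


# Hypothetical station: annual totals (mm) and CI for four years
totals = [400.0, 900.0, 650.0, 300.0]
cis = [0.95, 0.90, 0.70, 0.60]
cats = categorise_years(totals, cis)
print(cats)                        # ['III', 'I', 'II', 'IV']
print(category_percentages(cats))
```

The percentage of years in each category, computed per station, is the kind of quantity that can then be compared between stations.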
Figure 1 Location of the stations studied; the six case study stations mentioned in the
paper are named and indicated using squares.
Data and Preliminaries
In this study, we use Australian rainfall stations with long monthly rainfall
records (from 1920 or earlier to 2007) and less than 0.5% of records missing.
Years with missing rainfall records for any of the months are excluded from the analysis
and the maximum number of years we need to omit is three (out of about 100 years),
so the effect of the missing values is small. This gives 220 stations, and we use six
stations as case studies (Figure 1). These stations represent a variety of types of rainfall
distributions in Australia (Figure 2). Bidyadanga, Alice Springs and Oenpelli have large
variability in monthly rainfall distribution. Trayning and Theebine have less variability
in rainfall distribution and Clarence has an almost even rainfall distribution. Data were
obtained from the Australian Bureau of Meteorology. Some parts of Australia
(central and northern parts) do not have many stations with continuous rainfall records
for long periods and hence few rainfall stations from those regions are considered in the
present study. Because of this, our discussion and conclusions focus primarily on
southern and eastern Australia.
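The screening described above (stations with less than 0.5% of monthly records missing, and removal of any year with a missing month) might be sketched as follows. The data layout (a dictionary from year to twelve monthly totals, with `None` or NaN marking a missing value) and the function names are assumptions for illustration; the actual format of the Bureau of Meteorology files is not described here.

```python
import math


def is_missing(value):
    """True for None or NaN, the two missing-value markers assumed here."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_fraction(records):
    """Fraction of monthly values that are missing across all years."""
    values = [v for months in records.values() for v in months]
    return sum(is_missing(v) for v in values) / len(values)


def complete_years(records):
    """Keep only the years in which all twelve months were observed."""
    return {
        year: months
        for year, months in records.items()
        if len(months) == 12 and not any(is_missing(v) for v in months)
    }


# Hypothetical station record: year -> twelve monthly totals (mm)
station = {
    1920: [10.0] * 12,
    1921: [10.0] * 5 + [None] + [10.0] * 6,
    1922: [float("nan")] + [5.0] * 11,
}
print(sorted(complete_years(station)))      # [1920]
print(missing_fraction(station) < 0.005)    # False: 2 of 36 values are missing
```

A station like this toy example would fail the 0.5% criterion; for a station that passes, only the complete years would be carried forward to the entropy calculations.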
The NINO 3.4 index is one of the ENSO indicators, and is based on sea surface
temperatures. NINO 3.4 is the average sea surface temperature anomaly in the region
bounded by 5°N–5°S, 120°–170°W. The region has large variability on El Niño
time scales, and so is used by many authors to understand the variability in rainfall
distribution for different parts of Australia. An El Niño or La Niña event is identified
if the five-month running average of the NINO 3.4 index exceeds 0.40 for El Niño or
falls below −0.40 for La Niña for at least six consecutive months (Nicholson and Entekhabi, 1987;
Wu and Kirtman, 2007; Abtew et al., 2009; Everingham and Reason, 2009; Lee et al.,
2009).
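The event rule above (five-month running average beyond ±0.40 for at least six consecutive months) can be sketched in Python as below. Several details are assumptions rather than statements from the text: the running average is taken as centred, months without a full five-month window are left unclassified, and the function names are hypothetical. How monthly events are mapped onto El Niño or La Niña years is not specified in the text and is not attempted here.

```python
def running_mean(values, window=5):
    """Centred running mean; positions without a full window are None."""
    half = window // 2
    smoothed = [None] * len(values)
    for i in range(half, len(values) - half):
        smoothed[i] = sum(values[i - half:i + half + 1]) / window
    return smoothed


def enso_phases(nino34, threshold=0.40, min_run=6):
    """Label each month 'El Niño', 'La Niña' or 'neutral'."""
    smoothed = running_mean(nino34)
    phases = ["neutral"] * len(nino34)
    tests = {
        "El Niño": lambda v: v > threshold,
        "La Niña": lambda v: v < -threshold,
    }
    for label, exceeds in tests.items():
        run_start = None
        # The extra iteration (i == len) closes a run that reaches the end.
        for i in range(len(smoothed) + 1):
            in_run = (i < len(smoothed) and smoothed[i] is not None
                      and exceeds(smoothed[i]))
            if in_run and run_start is None:
                run_start = i
            elif not in_run and run_start is not None:
                if i - run_start >= min_run:
                    phases[run_start:i] = [label] * (i - run_start)
                run_start = None
    return phases


# Hypothetical monthly anomalies: ten warm months in a neutral background
index = [0.0] * 4 + [0.9] * 10 + [0.0] * 4
print(enso_phases(index).count("El Niño"))   # prints 10
```

A warm spell lasting only three months would stay "neutral" under the same rule, because its smoothed values exceed the threshold for fewer than six consecutive months.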
Results and Discussion
For the 220 Australian stations, the variability in rainfall distribution across the months
of the year is measured by the AE and presented by the contour map in the left panel of
Figure 3. Stations in north-west Australia observe more variability in rainfall
distribution (smaller AE), and south and south-east areas observe relatively less
variability in rainfall distribution. We construct the contour maps using kriging (Diggle
and Ribeiro, 2007) to interpolate the values at unobserved locations from nearby
stations.
Figure 2 Boxplots of the monthly rainfall distributions. The horizontal lines indicate
median rainfall for each month; the boxes indicate the first and third quartiles of the
distribution. The lines extending from the boxes extend to 1.5 times the interquartile
range; circles indicate observations more extreme than this. Note that the vertical scales
are not the same for each station.
Figure 3 Contour maps of Australia showing the AE and ESR values.
Variability in long-term average rainfall amounts for different months of the year is
measured by ESR. The contour map in the right panel of Figure 3 represents the values
of the ESR for the Australian stations. The stations in northern Australia observe
substantially more variability in rainfall distributions across the months of the year
(smaller ESR). Stations in southern Australia observe less variability in rainfall
distribution. Some stations in south-east Australia observe an almost even rainfall
distribution, on average, across the months of the year.
For the six case studies, the values of maximum possible entropy, the ESR, the AE of
individual years and the average AE are presented in Figure 4. For Bidyadanga and
Oenpelli, the ESR differs substantially from the value of maximum possible entropy.
These stations observe large variability in the long-term average rainfall distribution.
For the other four stations the ESR does not differ much from the maximum possible
entropy. However, unlike the other three stations, the entropy of individual years for
Alice Springs is much lower than the ESR and the maximum possible entropy. For
Oenpelli, the entropy of most of the individual years is close to the value of ESR but
differs substantially from the value of maximum possible entropy; that is, the
rainfall variability of individual years is close to the variability in historical monthly
rainfall amounts. In the current literature (Maruyama and Kawachi, 1998; Kawachi et al.,
2001; Maruyama et al., 2002), the average AE of the studied stations is usually compared
with the value of maximum possible entropy. The rainfall amounts are never expected
to have an even distribution and so are not comparable with the value of maximum
possible entropy. Moreover, large differences exist in the long-term average monthly
rainfall distribution for the Australian stations. Hence, the use of a single value, the
maximum possible entropy, is not appropriate to compare the consistency of rainfall
distribution of individual years for all the stations. To measure the relative change in the
rainfall distribution of individual years, we use the CI, the ratio of the AE of individual
years to the ESR of the respective station.
Figure 4 Maximum possible entropy, ESR, AE of individual years and average AE for
the six studied stations.
For comparison, we measure the consistency in rainfall distribution for three
selected rainfall stations by the CI defined in Equation (2.5) and the disorder in rainfall
distribution by the DI defined in Equation (2.3). The results are presented in Figure 5. If the
rainfall in individual years is consistent with the long-term average rainfall distribution
of the station, the value of the CI is close to one (Oenpelli and Clarence). If the rainfall
is evenly distributed across the months of a year, the value of the DI is zero. For
Bidyadanga, the rainfall distribution of individual years differs substantially from an even
rainfall distribution (larger values of DI) and also from the stable rainfall distribution of
the station (smaller CI). Oenpelli has a larger DI than Clarence, but both have a similar
CI. This implies that Oenpelli has greater variability in the distribution of the monthly
average rainfall (as measured by the DI), but that each station approximately receives
its respective average monthly rainfall distribution each year (as measured by the
CI). Although Oenpelli observes more variability in rainfall distribution across the
months of the year, its more consistent rainfall pattern from year to year is favourable
for the planning of water resources. CI is a good choice for understanding the rainfall
distribution of individual years as it measures the deviation in rainfall distribution of
individual years from the long-term average rainfall distribution of the respective station.
Figure 5 The CI and DI of individual years and average of CI and DI for Bidyadanga,
Oenpelli and Clarence.
Figure 6 Map of Australia showing the average consistency index of Australian
rainfall stations.
Average consistency in rainfall distribution is measured by the mean of CI over the
studied years for the Australian rainfall stations and is presented in Figure 6. Areas
close to the coastline in northern, southern and eastern Australia have larger values of
CI. These areas observe more consistent rainfall distribution in individual years with
respect to the long-term average. For central and north-western Australia, the CI has
relatively smaller values. These areas observe substantially more inconsistent rainfall
distribution in individual years with respect to the long-term average rainfall
distribution.
Figure 7 Map of Australia showing the average consistency index of Australian
rainfall stations for El Niño and La Niña years.
To quantify the influence of El Niño and La Niña on the consistency in rainfall
distribution, the average CI for La Niña and El Niño years is measured and represented
in Figure 7. For the rainfall stations of Tasmania (the small island in southern
Australia), the CI is almost identical for El Niño and La Niña years. For stations
located in the eastern coastal areas of mainland Australia, La Niña years observe a
more consistent rainfall distribution (larger CI) compared with the El Niño years. In
El Niño years, some of the months receive relatively smaller proportions of the total
annual rainfall of the year than the proportion of rainfall in those months observed in the
long-term average.
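The per-station comparison of average CI between El Niño and La Niña years amounts to a grouped mean, sketched below in Python. The function name, the year labels 1 to 5 and all CI values are hypothetical and do not correspond to any actual station or ENSO classification.

```python
from collections import defaultdict


def mean_ci_by_phase(ci_by_year, phase_by_year):
    """Average CI over the years belonging to each ENSO phase."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for year, ci in ci_by_year.items():
        phase = phase_by_year.get(year)
        if phase is None:       # years without a phase label are ignored
            continue
        sums[phase] += ci
        counts[phase] += 1
    return {phase: sums[phase] / counts[phase] for phase in sums}


# Hypothetical CI values and phase labels for five years at one station
ci_by_year = {1: 0.90, 2: 0.80, 3: 0.70, 4: 0.95, 5: 0.88}
phase_by_year = {1: "La Niña", 2: "El Niño", 3: "El Niño", 4: "La Niña"}
means = mean_ci_by_phase(ci_by_year, phase_by_year)
print({phase: round(value, 3) for phase, value in means.items()})
```

Repeating this for every station gives one pair of averages per location, which is the kind of input a map such as Figure 7 requires.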
Figure 8 Years divided into four categories on the basis of CI and total annual rainfall.
The availability of water resources is determined using the CI and total annual
rainfall, and for any rainfall station the years are categorized into four groups as
mentioned in Section 2. For the six case studies, each year is categorized and located in
Figure 8. Alice Springs has 6.3% and Bidyadanga has 9.4% of years belonging to Category
I (more than median annual rainfall with a more consistent monthly rainfall
distribution). The two stations have 43.8% and 38.5% of years respectively in Category
IV (less than median yearly rainfall with a more inconsistent monthly rainfall
distribution). Hence, Alice Springs and Bidyadanga are more likely to have years with
below-median and inconsistent rainfall distribution, which presents difficulty for water
resource planning. This categorization may be useful for determining stations at risk of
water shortage, especially in some months of the year.
The contour maps in Figure 9 represent the percentage of years in the different rainfall
categories for the Australian rainfall stations. We pay special attention to the contour
maps representing Category I (top left panel) and Category IV (bottom right panel)
in Figure 9, as these categories may be regarded as the ideal and worst situations
respectively for water resource planning. Stations located in northern, south-western,
south-eastern and eastern Australia observe more than 30% of years in Category I and
less than 15% of years in Category IV. These stations may be considered to be in an ideal
situation with regard to water resource planning. Relatively larger percentages of
years in Category IV are observed for the stations in central and south-western
Australia. These stations may be considered to be in the worst situation with regard to water
resource planning.
Figure 9 Contour maps representing the percentage of years with different rainfall
categories for the Australian rainfall stations.
The contour maps in Figure 10 represent the percentage of years in Category I and
Category IV in El Niño and La Niña years for the Australian rainfall stations. From the
contour maps in the top panels of Figure 10, we observe that, in El Niño years, stations in
southern and eastern Australia have a smaller percentage of years in Category I than in
La Niña years. Conversely, from the contour maps in the bottom panels of Figure 10,
we observe that, almost everywhere in Australia, the El Niño years have larger proportions
of years in Category IV than La Niña years. Hence, for almost all Australian rainfall
stations, the El Niño years carry a greater risk of having below-median and relatively
inconsistent rainfall distribution, presenting greater challenges for water resource
planning.
Figure 10 Contour maps representing the percentage of years with Category I and
Category IV in La Niña and El Niño years for the Australian rainfall stations.
Conclusion
As monthly rainfall amounts of Australian stations are highly skewed, entropy is a better choice
than variance or coefficient of variation for measuring the rainfall variability. Australia
has rainfall stations with reasonably even monthly rainfall distributions, stations with
moderately even rainfall distribution and stations with extremely uneven monthly
rainfall distributions. To measure the variability in long-term average rainfall
distribution, we define the ESR (entropy of stable rainfall). Contour maps of Australia
with the values of ESR provide a clear representation of the long-term average rainfall
variability of Australian rainfall stations. The values of ESR for a station may be used
as a baseline to compare the consistency in rainfall distribution of individual years.
We then define the CI (the consistency index) to compare the monthly rainfall
distribution of individual years with the stable rainfall distribution of the station.
Almost everywhere in Australia, CI is smaller in El Niño years than in La Niña years.
This means that in El Niño years Australian stations receive a more inconsistent monthly
rainfall distribution than in La Niña years.
For the studied stations, the years are divided into four categories on the basis of CI and
total annual rainfall. Category I is considered an ideal situation for water resource
planning, with more than median rainfall and a consistent rainfall distribution. Category
IV is considered the worst situation for water resource planning, with less than median
rainfall and a more inconsistent rainfall distribution. Stations with a larger percentage of
years belonging to Category I and a smaller percentage of years belonging to Category IV
have a reduced risk in terms of potential water resource availability. A good
understanding of the water resource availability may be obtained from the contour maps
showing the percentage of years in the different categories of rainfall distribution for
Australian rainfall stations.
For El Niño events, almost everywhere in Australia observes a higher percentage of
years with below-median and more inconsistent rainfall distributions. These years have
a higher probability of experiencing a water shortage, especially in some months of the
year. Special programs, including water preservation or efficient water usage strategies,
should be implemented to meet the demand for water in the months with an
expected shortage of water.
Acknowledgement: The authors thank Sarah Lennox (CSIRO) for assistance with
graphs. The comments of the reviewers are gratefully acknowledged; they improved the
flow, interpretation and understanding of the paper.
References
Abtew W, Melesse AM, Dessalegne T. 2009. El Niño Southern Oscillation link to the
Blue Nile River Basin hydrology. Hydrological Processes 23: 3653–3660.
Avseth P, Mukerji T, Mavko G. 2005. Quantitative Seismic Interpretation: Applying
Rock Physics Tools to Reduce Interpretation Risk. Cambridge University Press, New
York.
Beeton RJS, Buckley KI, Jones GJ, Morgan D, Reichelt RE, Trewin D. 2006. Australia
state of the environment 2006. Technical report, Independent report to the Australian
Government Minister for the Environment and Heritage, Department of the
Environment and Heritage, Canberra.
Bewket W. 2007. Rainfall variability and agricultural vulnerability in the Amhara
region, Ethiopia. Ethiopian Journal of Development Research 29(1): 1–34.
Bureau of Meteorology (BoM). 2010. Australian Government Website. URL:
http://www.bom.gov.au/lam/climate/levelthree/ausclim/zones.htm.
Conway D, Allison E, Felstead R, Goulden M. 2005. Rainfall Variability in East Africa:
Implications for Natural Resources Management and Livelihoods. Philosophical
Transactions: Mathematical, Physical and Engineering Sciences 363(1826): 49–54.
Dewar RE, Wallis JR. 1999. Geographical patterning of interannual rainfall variability
in the tropics and near tropics: An L-moments approach. Journal of Climate 12:
3457–3466.
Drosdowsky W, Chambers LE. 2001. Near-Global sea surface temperature anomalies as
predictors of Australian seasonal rainfall. Journal of Climate 14: 1677–1687.
Ebrahimi N, Maasoumi E, Soofi ES. 1999. Ordering univariate distributions by entropy
and variance. Journal of Econometrics 90(2): 317–337.
Everingham YL, Reason CJC. 2009. Interannual variability in rainfall and wet spell
frequency during the New South Wales sugarcane harvest season. International Journal
of Climatology, Published online in Wiley InterScience. DOI: 10.1002/joc.2066.
George DA, Birch C, Buckley D, Partridge IJ, Clewett JF. 2005. Assessing climate risk
to improve farm business management. Extension Farming Systems Journal 1(1):
71–77.
Hammer GL, Holzworth DP, Stone RC. 1996. The value of skill in seasonal climate
forecasting to wheat crop management in a region with high climatic variability.
Australian Journal of Agricultural Research 47: 717–737.
Hope P, Timbal B, Fawcett R. 2009. Associations between rainfall variability in the
southwest and southeast of Australia and their evolution through time. International
Journal of Climatology, Published online in Wiley InterScience. DOI:
10.1002/joc.1964.
Kawachi T, Maruyama T, Singh VP. 2001. Rainfall entropy for delineation of water
resources zones in Japan. Journal of Hydrology 246: 36–44.
Kiem AS, Franks SW. 2001. On the identification of ENSO–induced rainfall and runoff
variability: A comparison of methods and indices. Hydrological Sciences 46(5):
715–727.
Lee CK, Shen SSP, Bailey B, North GR. 2009. Factor analysis for El Niño signals in
sea surface temperature and precipitation. Theoretical and Applied Climatology 97:
195–203.
Maruyama T, Kawachi T. 1998. Evaluation of rainfall characteristics using entropy.
Journal of Rainwater Catchment Systems 4(1): 7–10.
Maruyama T, Kawachi T, Maeda S. 2002. Entropy-based assessments of monthly
rainfall variability. Journal of Rainwater Catchment Systems 8(1): 21–25.
Maruyama T, Kawachi T, Singh VP. 2005. Entropy-based assessment and clustering of
potential water resources availability. Journal of Hydrology 309: 104–113.
McBride JL, Nicholls N. 1983. Seasonal relationships between Australian rainfall and
southern oscillation. Monthly Weather Review 111: 1998–2004.
Meinke H, Devoil P, Hammer GL, Power S, Allan R, Stone RC, Folland C, Potgieter A.
2005. Rainfall variability at decadal and longer time scales: Signal or noise? Journal of
Climate 18: 89–96.
Meneghini B, Simmonds I, Smith IN. 2007. Association between Australian rainfall and
the Southern Annular Mode. International Journal of Climatology 27: 109–121.
Mishra AK, Özger M, Singh VP. 2009. An entropy-based investigation into the
variability of precipitation. Journal of Hydrology 370: 139–154.
Mollah WS, Cook IM. 1996. Rainfall variability and agriculture in the semi-arid
tropics-the Northern Territory, Australia. Agricultural and Forest Meteorology 79:
39–60.
Nicholson SE, Entekhabi D. 1987. Rainfall variability in equatorial and southern Africa:
relationships with sea-surface temperatures along the south-western coast of Africa.
Journal of Climate and Applied Meteorology 26: 561–578.
O'Reagain P, Bushell JC, Holloway C, Reid A. 2009. Managing for rainfall variability:
Effect of grazing strategy on cattle production in a dry tropical Savanna. Animal
Production Science 49: 85–99.
Rotstayn LD, Collier MA, Dix MR, Feng Y, Gordon HB, O'Farrell SP, Smith IN,
Syktus J. 2009. Improved simulation of Australian climate and ENSO related rainfall
variability in a global climate model with an interactive aerosol treatment. International
Journal of Climatology, Published online in Wiley InterScience. DOI:
10.1002/joc.1952.
Shannon CE. 1948. Mathematical theory of communication. The Bell System Technical
Journal xxvii: 379 423.
Singh VP. 1997a. Effect of spatial and temporal variability in rainfall and watershed
characteristics on stream flow hydrograph. Hydrological Processes 11: 1649 1669.
Singh VP. 1997b. The use of entropy in hydrology and water resources. Hydrological
Processes 11: 587 626.
Singh VP, Jain SK, Tyagi AK. 2007. Risk and Reliability Analysis: A Handbook for
Civil and Environmental Engineers. ASCE Publications, San Diego.
Smith IN, Collier M, Rotstayn L. 2009. Patterns of summer rainfall variability across
tropical Australia-results from EOT analysis. Technical report, 8th World IMACS/
MODSIM Congress, Cairns, Australia.
Stone RC, Auliciems A. 1992. SOI phase relationships with rainfall in eastern Australia.
International Journal of Climatology 12: 625 636.
Stone RC, Hammer GL, Marcussen T. 1996. Prediction of global rainfall probabilities
using phases of the Southern Oscillation Index. Nature 384: 252 255.
Taschetto AS, Haarsma RJ, Gupta AS, Ummenhofer CC, England MH. 2010.
Teleconnections associated with the intensification of the Australian monsoon during
El Niño Modoki events. Conference Series: Earth and Environmental Science 11:
012031.
Van Etten EJB. 2009. Inter-annual rainfall variability of arid Australia: greater than
elsewhere? Australian Geographer 40(1): 109–120.
Wang G, Hendon HH. 2007. Sensitivity of Australian rainfall to inter-El Niño
variations. Journal of Climate 20(16): 4211–4226.
Wimalasuriya R, Ha A, Tsafack E, Larson K. 2008. Rainfall variability and its impact on
dry land cropping in Victoria. Technical report, 52nd Annual Conference of the
Australian Agricultural and Resource Economics Society (AARES), Canberra.
Wu R, Kirtman BP. 2007. Roles of the Indian Ocean in the Australian Summer
Monsoon-ENSO Relationship. Journal of Climate 20: 4768–4788.
Tweedie distributions for modelling monthly rainfall in Australia
CHAPTER 4: TWO TWEEDIE DISTRIBUTIONS THAT ARE
NEAR-OPTIMAL FOR MODELLING MONTHLY RAINFALL IN
AUSTRALIA
Authors: Md Masud Hasan and Peter K. Dunn
Affiliations: Department Health and Sport Sciences, Faculty of Science, Health and Education,
University of the Sunshine Coast.
Journal: International Journal of Climatology, (2011) 31: 1389–1397.
JCR ranking: 18/63 (Meteorology and Atmospheric Sciences)
Impact factor: 2.347
ARC tier ranking: A
Statement of Intellectual Contribution
I, Md Masud Hasan, have made substantial independent intellectual contributions to the
research paper “Two Tweedie distributions that are near-optimal for modelling monthly
rainfall in Australia”. The intellectual contributions include the development of the
study hypothesis, identification and modification of methodology, development of new
applications, responsibility for independent analysis and interpretation of results.
…………………………….
…………………………………………
Md Masud Hasan
Date
………………………………..
……………………………..............
Peter K Dunn
Date
Abstract
Statistical models for total monthly rainfall used for forecasting, risk management and
agricultural simulations are usually based on gamma distributions and variations. In this
study, we examine a family of distributions (called the Tweedie family of distributions)
to determine if the choice of the gamma distribution is optimal within the family. We
restrict ourselves to the exponential family of distributions as they are the response
distributions used for generalized linear models (GLMs), which have numerous
advantages. Further, we restrict ourselves to distributions where the variance is
proportional to some power of the mean, as these distributions also have desirable
properties. Under these restrictions, an infinite number of distributions exist for
modelling positive continuous data and include the gamma distribution as a special
case. Results show that for positive monthly rainfall totals in the data history for a
particular station, monthly rainfall is optimally or near-optimally modelled using the
gamma distribution by varying the parameters of the gamma distribution; using
different distributions for each month cannot improve on this approach. In addition,
under the same model restrictions, monthly rainfall totals that include zeros are also
well modelled by the same family of distributions. Hence monthly rainfall can be
suitably modelled using one of two Tweedie distributions depending on whether exact
zeros appear in the rainfall history. We propose a slight variation of the gamma
distribution for use in practice. This model fits the data almost as well as the gamma
distribution but admits the possibility that future months may have zero rainfall.
KEY WORDS: Rainfall modelling; Tweedie family of distributions; Poisson–gamma
distribution
Introduction
Rainfall models are important for forecasting and simulation purposes with extended
applications in modelling runoff, soil water content and for forecasting drought and
flood (Toth et al., 2000; Aubert et al., 2003). Appropriate rainfall models assist in
developing better climate-related risk management and decision-making capabilities.
For modelling purposes, two different aspects of rainfall are common on any given
timescale: the occurrence of rainfall and the amount of rainfall (Dunn, 2004). Rainfall
models exist for different timescales such as hourly, daily, weekly, monthly, seasonal or
annual (Boer et al., 1993; Sharda and Das, 2005; Aksoy, 2006; Tilahun, 2006). To
model the occurrence of daily rainfall, first-order (Gabriel and Neumann, 1962; El-seed,
1987), higher order (Katz, 1977; Deni et al., 2009), hybrid (Wilks, 1999) and hidden
(Robertson et al., 2003) Markov chain models have been used. First-order Markov chain
models assume that the occurrence of rainfall on a day depends on the occurrence of
rainfall on the previous day. Higher order Markov chains assume that the occurrence of
rainfall on a day depends on the occurrence of rainfall on two or more preceding days.
Higher order Markov models are more complex, but perform marginally better (Deni et
al., 2009). Hybrid Markov chains consider different orders for wet and dry days while
hidden Markov chains consider some hidden states. Chandler (2005) used logistic
regression to model dry or wet days as a function of site altitude, North Atlantic
Oscillation, seasonality and autocorrelation (indicators for rain on each of the previous 5
days, plus persistence indicators for rain on both of the previous 2 days and on all of the
previous 7 days).
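The first-order Markov chain idea described above can be illustrated with a minimal sketch. The function names and the transition probabilities used here are hypothetical and chosen only for illustration; they are not taken from any of the cited studies.

```python
import random


def simulate_occurrence(n_days, p_wet_after_dry, p_wet_after_wet, seed=None):
    """Simulate a wet (1) / dry (0) daily sequence from a first-order Markov chain.

    The probability of rain today depends only on whether yesterday was wet.
    """
    rng = random.Random(seed)
    state = 0  # assumption: the day before the sequence starts was dry
    sequence = []
    for _ in range(n_days):
        p_wet = p_wet_after_wet if state == 1 else p_wet_after_dry
        state = 1 if rng.random() < p_wet else 0
        sequence.append(state)
    return sequence


def stationary_wet_probability(p_wet_after_dry, p_wet_after_wet):
    """Long-run proportion of wet days implied by the two transition probabilities."""
    return p_wet_after_dry / (1.0 + p_wet_after_dry - p_wet_after_wet)
```

For example, `simulate_occurrence(365, 0.2, 0.6, seed=1)` produces one simulated year of wet and dry days; a higher order chain would condition `p_wet` on two or more previous states instead of one.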
Sometimes, modelling the amount of rainfall is more important than modelling the
occurrence of rainfall. Using rainfall data from New South Wales, Boer et al. (1993)
used linear regression to model the amount of seasonal and annual rainfall as a function
of longitude, latitude and altitude of the stations. Chowdhury and Sharma (2007) used a
linear regression model to quantify the effect of El Niño Southern Oscillation on the
amount of monthly rainfall. Considering nonlinear effects of some covariates on
monthly rainfall amount, Zaw and Naing (2008) used polynomial regression to model
the amount of monthly rainfall in Myanmar. One of the basic assumptions regarding the
abovementioned models is that the amount of rainfall is normally distributed with
constant variance. For some stations, the amounts of rainfall on some timescales (e.g.
annual) approximately follow a normal distribution, in which case the use of normal
distributions is appropriate. However, the amount of monthly, weekly or daily rainfall
usually does not follow a normal distribution and is right skewed, and so alternative
distributions are needed to model the amount of rainfall on shorter timescales.
To model the right-skewed daily rainfall amounts, distributions that have been
employed include the gamma (Aksoy, 2006), truncated gamma (Das, 1955), kappa
(Meilke, 1973), generalized log-normal (Swift and Schreuder, 1981), mixed exponential
(Chapman, 1997; Wilks, 1998, 1999) and mixed gamma (Jamaludin and Jemain, 2008).
Jamaludin and Jemain (2008) used exponential, gamma, mixed exponential and mixed
gamma distributions to describe the daily rainfall amount in Malaysia, and based on the
Akaike Information Criterion (AIC), they showed that the mixture distributions are better
than single distributions for describing the amount of daily rainfall. Comparing
log-normal, gamma, Weibull and log-logistic distributions on the non-zero weekly rainfall
data from Dehradun, India, Sharda and Das (2005) showed that the Weibull distribution
fits best (on the basis of the Anderson-Darling test). Taking 29 stations from Libya, Sen and
Eljadid (1999) showed that, for monthly rainfall amounts, the gamma distribution fits
well. Compared with other Pearsonian distributions (Pearsonian I and Pearsonian IX),
the gamma distribution fits best for modelling the amount of monthly rainfall in the
Asian summer monsoon (Mooley, 1973). Tilahun (2006) compared five different
distributions (normal, log-normal, gamma, Weibull and Gumbel) for modelling the
amount of rainfall in wet months in eight rainfall stations in Ethiopia and found none
were optimal for every station.
Another alternative for modelling right-skewed rainfall amounts is to use distributions
from a special family called the exponential dispersion model (EDM) family of
distributions (Jørgensen, 1997). The EDM family comprises the response
distributions for generalized linear models (GLMs) (McCullagh and Nelder, 1989) and
includes common distributions such as the binomial, Poisson, gamma and normal
distributions. These models are widely used because the GLM framework is already in place for
fitting models based on the EDM family of distributions and for diagnostic testing. In
addition, covariates are easily incorporated into the modelling procedure (Jørgensen,
1987). GLMs have been used for fitting models to climatological data such as rainfall
by numerous researchers (Coe and Stern, 1982; Wilks, 1999; Chandler, 2005). The
common models used in modelling monthly, weekly or daily rainfall amount have
difficulty with the mixture of discrete (exact zero when no rainfall is recorded) and
continuous (rainfall amount with non-zero rainfall recorded) data. To overcome the
difficulty, some authors used logistic regression (Chandler and Wheater, 2002) or
Markov chains (Richardson and Wright, 1984; Stern and Coe, 1984; Laux et al., 2009)
to model the occurrence of wet or dry days, then gamma distributions to model the
amount of rainfall on wet days. For example, Das et al. (2006) used Markov chains for
rainfall occurrence and a gamma distribution to model the amount of weekly rainfall in
Bihar, India. An alternative approach was adopted by Glasbey and Nevison (1997), who
applied a monotonic transformation of rainfall data to define a latent Gaussian variable
with zero rainfall corresponding to censored values below some threshold (1.05 mm).
Husak et al. (2007) used a conditional distribution by accumulating probabilities
conditional on the presence of rainfall. This is combined with a mixture coefficient used
to account for the probability of no rain to create the probability distribution. Yoo et al.
(2005) used a mixed gamma distribution for modelling the amount of daily rainfall for
both wet and dry periods. The distribution has two parts: the first is the probability of
a dry day, and the second is the probability of a wet day multiplied by a
gamma distribution describing the amount of rainfall on a wet day.
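The two-part structure of the mixed gamma distribution described above can be written compactly. The notation here is introduced only for illustration and is not taken from Yoo et al. (2005): $q$ denotes the probability of a wet day, and $G(y; \alpha, \beta)$ the gamma distribution function with shape $\alpha$ and scale $\beta$.

```latex
% Distribution function of daily rainfall Y under a mixed gamma model:
% a point mass (1 - q) at zero for dry days, plus q times a gamma
% distribution function for the rainfall amount on wet days.
F(y) = (1 - q) + q \, G(y; \alpha, \beta), \qquad y \ge 0 .
```

At $y = 0$ this gives $F(0) = 1 - q$, the probability of a dry day, and $F(y) \to 1$ as $y \to \infty$.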
Dunn (2004) used Poisson–gamma distributions to model the occurrence and amount of
rainfall simultaneously. The distributions in the Poisson–gamma family belong to the
EDM family of distributions (Jørgensen, 1997), upon which the GLMs are based.
Clearly, numerous probability models exist for modelling rainfall over various
timescales. Numerous studies have fitted particular distributions to monthly rainfall,
using the same distribution for each month but by varying the parameters, such as the
mean and the variance, for each month (Mooley 1973; Husak et al., 2007; Piantadosi et
al., 2009). The amount of rainfall in different months may follow different distributions
rather than following the same distribution with varying parameters. We explore the
possibility that different distributions are needed for each month by considering a broad
family of distributions. To do so, we restrict ourselves to the EDM family of distributions
as these distributions are the response distributions for GLMs. Further, we consider
EDMs where the variance is proportional to some power of the mean (often called the
Tweedie family of distributions), as these distributions have properties useful for
rainfall modelling (discussed in Section 3). The Tweedie family includes distributions
suitable for modelling positive continuous data (such as the gamma) and also for
modelling positive continuous data with exact zeros (such as Poisson–gamma).
We first discuss the data (Section 2), and then introduce the Tweedie distributions and
their properties (Section 3). The results and discussion (Section 4) are followed by some
concluding comments (Section 5).
Data
To study the different features of rainfall distribution, the monthly rainfall data from
four Australian rainfall stations were taken as case studies (Figure 1) covering the
period from 1910 to 2007 and were obtained from the Australian Bureau of
Meteorology. Two were dry stations, Bidyadanga and Yoweragabbie, in Western
Australia; and two were wet stations, Cowal in Queensland and Clarence in New South
Wales. Table 1 shows the summary statistics of the rainfall distribution for the four
rainfall stations. The dry stations had high percentages of months with zero rainfall
(34.4% and 17.5% of all months for Bidyadanga and Yoweragabbie, respectively). The
wet stations, Cowal and Clarence, each had less than 1% of months with no rainfall. As
examples, consider the monthly rainfall distribution for Bidyadanga (Figure 2) and
Cowal (Figure 3): all monthly rainfall distributions are highly skewed to the right. For
Bidyadanga, the distributions are quite different: summer months (December to March)
clearly receive larger amounts of rainfall than the other months in general. For Cowal,
the rainfall distributions are similar over the different months. Another 98 stations from
different parts of Australia were also studied (Figure 1).
Figure 1 Locations of the stations studied; the four case study stations mentioned in
the paper are named and indicated using squares.
Figure 2 Monthly rainfall distributions for Bidyadanga station.
Figure 3 Monthly rainfall distributions for Cowal station.
Table 1 Summary statistics of the monthly rainfall for Bidyadanga, Yoweragabbie,
Cowal and Clarence from 1910 to 2007.
Stations                                 Bidyadanga   Yoweragabbie   Cowal   Clarence
Mean (mm)                                      42.3           19.9    92.3       89.7
Median (mm)                                     2.3           11.4    68.3       66.4
Percentage of months with no rainfall          34.4           17.5     0.9        0.5
IQR^a (mm)                                     46.2           24.9    91.2       85.0
Standard deviation (mm)                        86.4           26.2    90.7       80.1
Coefficient of variation (%)                  204.5          131.7    98.3       89.3
^a IQR, interquartile range.
Methodology
Tweedie Densities
Figures 2 and 3 show that the monthly rainfall distributions are highly skewed to the
right, and hence researchers have used a variety of distributions for modelling monthly
rainfall Y, including the log-normal, Weibull, generalized log-normal, gamma and
mixed gamma distributions (Mooley, 1973; Sen and Eljadid, 1999; Tilahun, 2006). In
all the abovementioned literature, one type of distribution is used to model the rainfall
amount of all the months. As foreshadowed in Section 1, we explore the possibility that
different distributions may be required for each month. We do this by embedding the
candidate distributions in a broad family, called the EDM family of distributions, whose probability
functions have the form
$$f(y; \mu, \phi) = a(y, \phi) \exp\left\{ \frac{1}{\phi} \left[ y\theta - \kappa(\theta) \right] \right\} \qquad (1)$$
where $\mu$ is the mean of the distribution, $\phi > 0$ and the functions $\theta$ and $\kappa(\theta)$ are known
functions. Since these distributions are the response distributions for GLMs, they are
useful for modelling and simulation using the extensive GLM literature and software
already in place (McCullagh and Nelder, 1989). For EDMs, the mean is $\mu = d\kappa(\theta)/d\theta$
and the variance is $\mathrm{Var}[Y] = \phi \, d^2\kappa(\theta)/d\theta^2$. As $\mu$ is a one-to-one function of $\theta$,
we can write $V(\mu) = d^2\kappa(\theta)/d\theta^2$, called the variance function, which characterizes the
distribution in the class of EDMs.
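As a worked illustration of how the variance function follows from $\kappa(\theta)$, consider the gamma distribution. The standard choice of $\theta$ and $\kappa(\theta)$ below is supplied here as an example and does not appear in the original text.

```latex
% Gamma distribution with mean \mu and dispersion \phi, written in the EDM form (1)
\begin{align*}
\theta &= -\frac{1}{\mu}, & \kappa(\theta) &= -\log(-\theta), \\
% the mean is the first derivative of \kappa
\frac{d\kappa(\theta)}{d\theta} &= -\frac{1}{\theta} = \mu, \\
% the variance function is the second derivative of \kappa, written in terms of \mu
\frac{d^2\kappa(\theta)}{d\theta^2} &= \frac{1}{\theta^2} = \mu^2
  & \Longrightarrow \quad V(\mu) &= \mu^2, \quad \mathrm{Var}[Y] = \phi\,\mu^2 .
\end{align*}
```

The variance is therefore proportional to the square of the mean, which is the familiar constant coefficient of variation of the gamma distribution.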
Within the class of EDMs, we restrict ourselves to those distributions with the variance
function $V(\mu) = \mu^p$ for $p \notin (0, 1)$, where the index $p$ specifies the particular
distribution. These distributions are often called the Tweedie family of distributions
(Jørgensen, 1987, 1997; Dunn and Smyth, 2005). While these conditions may appear
restrictive, special cases include many popular distributions, including the normal
($p = 0$), Poisson ($p = 1$), gamma ($p = 2$) and inverse Gaussian ($p = 3$) distributions.
Apart from these special cases, the probability functions for the Tweedie distributions
have no closed form. For $p \ge 2$, the distributions are suitable for modelling positive,
right-skewed data. For $1 < p < 2$, the Tweedie distributions are suitable for modelling positive
continuous data with exact zeros and are sometimes called the Poisson–gamma
distributions.
Figure 4 Scatterplots showing the mean–variance relationship (measured on log scale)
of monthly rainfall for the stations Bidyadanga, Yoweragabbie, Cowal and Clarence
from 1910 to 2007, for all months. Each point represents the mean and variance of the
amount of rainfall for a single month.
The case $1 < p < 2$ deserves special mention. This case corresponds to a Poisson sum of
gamma distributions and has an interesting interpretation in the context of rainfall
modelling. Assume the amount of rainfall on day $i = 1, 2, \ldots, N$ is $R_i$, where $N$ is the
number of days with non-zero rainfall in the respective month and each $R_i$ follows a
gamma distribution. Assume also that $N$ has an
approximate Poisson distribution. Note that there will be months with no rainfall events
(when $N = 0$). The total monthly rainfall $Y$ is the Poisson sum of the gamma random
variables, so that $Y = R_1 + R_2 + \cdots + R_N$, defining $Y = 0$ when $N = 0$. The probability
function of $Y$ is complicated and cannot be written in a closed form (Dunn and Smyth,
2005).
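The Poisson sum of gamma amounts can be simulated directly, which may help make the interpretation concrete. The sketch below is illustrative only: the function names, the parameters `lam` (mean number of wet days), `shape` and `scale` (gamma parameters of each daily amount) are assumptions introduced here, and the mapping to the Tweedie parameters $(\mu, \phi, p)$ is the standard compound Poisson–gamma result rather than something stated in the text.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_monthly_total(lam, shape, scale, rng):
    """Total monthly rainfall Y = R_1 + ... + R_N, with Y = 0 when N = 0."""
    n_wet_days = sample_poisson(lam, rng)
    return math.fsum(rng.gammavariate(shape, scale) for _ in range(n_wet_days))


def tweedie_parameters(lam, shape, scale):
    """Map (lam, shape, scale) to the Tweedie mean, dispersion and index (mu, phi, p)."""
    p = (shape + 2.0) / (shape + 1.0)  # always lies strictly between 1 and 2
    mu = lam * shape * scale
    phi = lam ** (1.0 - p) * (shape * scale) ** (2.0 - p) / (2.0 - p)
    return mu, phi, p
```

Simulated totals are exactly zero with probability $\exp(-\lambda)$, so months with no rainfall arise naturally from the model rather than being handled by a separate occurrence model.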
There are some important properties of the Tweedie distributions that make them
particularly appealing for use in rainfall modelling (Dunn, 2004):
• There is some intuitive appeal for the models, considering total rainfall as a sum
of rainfall on smaller timescales (outlined above).
• These distributions belong to the exponential family of distributions, upon which
GLMs are based. Consequently, there is a framework already in place for fitting
models based on the Tweedie distributions and for diagnostic testing. In
addition, covariates can be incorporated into the modelling procedure.
• They provide a mechanism for understanding the fine scale structure in coarse
scale data (Dunn, 2004).
• All exponential dispersion models that are closed with respect to scale
transformations are Tweedie models (Jørgensen, 1997; Jiang, 2007); that is, if Y
is from a particular Tweedie distribution with index p and c is some positive constant,
then cY is also from a Tweedie distribution with the same index p.
While these are important considerations, the Tweedie distributions also fit the data well
in practice, as shown below.
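The scale-closure property in the last bullet point can be checked against the mean–variance relationship. The notation $\mathrm{Tw}_p(\mu, \phi)$ for a Tweedie distribution with mean $\mu$, dispersion $\phi$ and index $p$ is introduced here for illustration; the calculation is a consistency check on the first two moments, not a proof of the full distributional result.

```latex
% If Y ~ Tw_p(\mu, \phi), then E[Y] = \mu and Var[Y] = \phi \mu^p.
% For a constant c > 0:
\begin{align*}
\mathrm{E}[cY]   &= c\mu, \\
\mathrm{Var}[cY] &= c^2 \phi \mu^p = \left( c^{2-p} \phi \right) (c\mu)^p ,
\end{align*}
% so cY has the same power-variance form with the same index p,
% mean c\mu and dispersion c^{2-p}\phi:
%   cY ~ Tw_p(c\mu, c^{2-p}\phi).
```

In practical terms, converting rainfall from millimetres to inches changes the mean and dispersion but not the index $p$, so the choice of distribution does not depend on the measurement unit.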
Results and Discussion
To determine the appropriate Tweedie distribution for a monthly rainfall distribution,
the mean–variance relationship defined by the index parameter p must be determined.
The mean–variance relationship can be studied informally. For each calendar month,
compute the mean and variance of the monthly rainfall totals over all years, giving one
(mean, variance) pair per month. Plotting the log of the variance against
the log of the mean (Figure 4) shows an approximate linear relationship between the
group means and group variances for all four rainfall stations. The variance of the
amount of rainfall is clearly not constant but depends on the mean; express the
relationship between mean and variance of the amount of rainfall as
$\log(\text{group variance}) = \alpha + p \log(\text{group mean})$. Rearranging,
$$\text{group variance} = \exp[\alpha + p \log(\text{group mean})] = \text{const} \times (\text{group mean})^p;$$
that is, the approximate linear relationship implies a variance function of the form
$V(\mu) = \mu^p$, precisely the variance function for the Tweedie distributions.
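The informal procedure above amounts to a least-squares fit of log variance on log mean across the calendar months, with the slope giving a rough value of $p$. The following sketch assumes the data are held in a dictionary mapping each month to its list of rainfall totals; the function name and data layout are hypothetical, and the result is only a preliminary guide, not the profile likelihood estimate used in the paper.

```python
import math


def estimate_index_from_groups(monthly_totals):
    """Fit log(variance) = intercept + slope * log(mean) across groups.

    monthly_totals maps a month label to a list of rainfall totals (one per year).
    Returns (intercept, slope); the slope is an informal estimate of the index p.
    """
    log_means, log_variances = [], []
    for totals in monthly_totals.values():
        n = len(totals)
        mean = sum(totals) / n
        variance = sum((t - mean) ** 2 for t in totals) / (n - 1)
        if mean > 0 and variance > 0:  # the log scale needs positive values
            log_means.append(math.log(mean))
            log_variances.append(math.log(variance))
    x_bar = sum(log_means) / len(log_means)
    y_bar = sum(log_variances) / len(log_variances)
    sxx = sum((x - x_bar) ** 2 for x in log_means)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log_means, log_variances))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope
```

With only twelve points per station the slope is imprecise, which is one reason a formal likelihood-based estimate of $p$ is preferable.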
Figure 5 The mean–variance relationships for the monthly rainfall distributions at
Bidyadanga, Yoweragabbie, Cowal and Clarence stations. Each line represents the mean–
variance relationship for a different month computed from 1910 to 2007.
More formally, estimating the index parameter requires sophisticated numerical
techniques (Dunn and Smyth, 2005, 2008), as implemented in the tweedie.profile
function of the R (R Development Core Team, 2010) package tweedie (Dunn, 2010).
The slopes of the lines in mean–variance plots approximately
determine the p-indices, and hence the distributions in the Tweedie family. For Cowal
and Clarence, the mean–variance relationship is not the same for all months (Figure 5),
implying different Tweedie distributions are appropriate for different months. However,
for Bidyadanga and Yoweragabbie, the mean–variance relationship is similar for most
of the months. We therefore estimate $p$ (denoted by $\hat{p}$) for each month using
profile maximum likelihood, along with 95% confidence intervals. This is
possible using the function tweedie.profile in the R package tweedie (Dunn, 2010).
The results for the four case studies are shown in Figure 6.
By embedding in the Tweedie family of distributions, we determine the distribution that
is optimal or nearly optimal to model the rainfall total for each month. For months with
$Y \ge 0$ (some exact zeros), necessarily $1 < p < 2$. If $Y > 0$ (i.e. no $Y = 0$), $p$ can be 0 or greater than 1,
though commonly $p \approx 2$ is obtained. For Bidyadanga in January, $\hat{p} \approx 2$, hence a gamma
distribution is near-optimal for modelling the amount of January rainfall. Apart from
January, the values of $\hat{p}$ for all months are very similar, and all the confidence intervals
contain $p = 1.6$. The confidence intervals for $p$ for Yoweragabbie contain $p = 1.6$ for
all months. For Cowal, $\hat{p} \approx 1.6$ for the months from June to November. For the other
months, the confidence intervals for $p$ span $p = 2.0$, indicating that the use of the gamma
distribution is appropriate. For Clarence, $\hat{p} \approx 1.6$ for February, March, July, August,
September and November; for the other months, the gamma distribution is near-optimal.
Quantile residuals were used to assess how well the distributions fit the original data
(Dunn and Smyth, 1996). These have an exact standard normal distribution (apart from
sampling error) provided that the correct distribution is used. Six sample QQ-plots of
the quantile residuals (Figure 7) show, in all cases, that the Tweedie distributions fit the
total monthly rainfall well. Other residuals, such as deviance and Pearson residuals,
have difficulty with the exact zeros (Dunn and Smyth, 1996). On the basis of the results
from 102 studied stations, the following observations are made. For months with $Y \ge 0$ (some exact zeros),
the Tweedie distributions are appropriate with $\hat{p} \approx 1.6$. For months with $Y > 0$ for all
years, the gamma distribution is almost always the optimal or near-optimal choice of
distribution within the Tweedie class of distributions; no other Tweedie distribution is
a better choice for modelling such data.
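To illustrate how quantile residuals cope with exact zeros, the sketch below computes them for one Poisson–gamma case in which the distribution function has a simple series form: a Poisson number of exponentially distributed amounts (gamma shape 1), which corresponds to $p = 1.5$. This special case, the function names and the parameters `lam` and `scale` are assumptions made for illustration; the paper's own residuals were computed for general $p$ using the R package tweedie.

```python
import math
import random
from statistics import NormalDist


def poisson_gamma_cdf(y, lam, scale, tol=1e-12, n_max=1000):
    """P(Y <= y) for a Poisson(lam) sum of exponential(scale) amounts."""
    p_zero = math.exp(-lam)  # probability of no rainfall at all
    if y <= 0:
        return p_zero
    x = y / scale
    cdf = p_zero
    pois = p_zero        # P(N = n), starting from n = 0
    term = math.exp(-x)  # e^{-x} x^k / k!, starting from k = 0
    survival = 0.0       # P(sum of n exponential amounts > y)
    for n in range(1, n_max + 1):
        pois *= lam / n
        survival += term
        term *= x / n
        cdf += pois * (1.0 - survival)
        if n > lam and pois < tol:
            break
    return cdf


def quantile_residual(y, lam, scale, rng):
    """Randomized quantile residual: normal quantile of the fitted CDF at y."""
    if y <= 0:
        # An exact zero maps to a uniform draw over the jump of the CDF at zero.
        u = rng.uniform(0.0, math.exp(-lam))
    else:
        u = poisson_gamma_cdf(y, lam, scale)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # guard against rounding to 0 or 1
    return NormalDist().inv_cdf(u)
```

The random draw for zero observations is what allows the residuals to be standard normal even though the response has a point mass at zero; deviance and Pearson residuals have no comparable device.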
Figure 6 The 95% confidence intervals of p-indices for different months (1 = January, 2
= February and so on) for the stations Bidyadanga, Yoweragabbie, Cowal and Clarence.
However, using the gamma distribution explicitly excludes the possibility of any
months in the simulated future of receiving zero rainfall; this may be unrealistic. In a
small number of cases, months with $Y > 0$ are optimally modelled using $\hat{p} \approx 1.6$. Very
rarely, we found months where $\hat{p} \approx 1$. In these cases, the profile plots are
unhelpful due to an artefact of the data that has been noted by numerous authors (Jørgensen, 1987;
Jørgensen and Paes de Souza, 1994; Gilchrist and Drinkwater, 1999) and has been
illustrated and described technically by Dunn and Smyth (2005). In these rare cases, we
propose using $\hat{p} = 1.6$; a QQ-plot based on $\hat{p} = 1.6$ shows no problems with the model.
Most stations are modelled using $\hat{p} \approx 1.6$ for most months; stations where $\hat{p} \approx 2$ are
generally concentrated on the coastline (Figure 9).
Figure 7 Sample QQ-plots of the quantile residuals after fitting Tweedie distributions to
monthly rainfall totals for different months for the stations Bidyadanga, Yoweragabbie,
Cowal and Clarence. An ideal plot would show the points falling on the solid line,
which corresponds to the standard normal distribution.
Figure 8 The QQ-plots of the quantile residuals after fitting Tweedie distributions to
monthly rainfall totals for the months in the upper panel of Figure 7, with $p = 1.99$.
Conclusion
We have considered the Tweedie family of distributions for modelling total monthly
rainfall. These distributions have many desirable properties: they belong to the EDM
family and so can be used in the GLM framework with all the advantages this brings;
there is intuitive appeal as a Poisson sum of gamma distributions; they provide
mechanisms for understanding the fine-scale structure of the data; and they are all
closed with respect to scale transformations. Under these conditions, we have shown
that the gamma distribution (the Tweedie distribution with $p = 2$) is almost always the
optimal or near-optimal distribution for modelling positive monthly rainfall amounts in
Australian stations; no other distribution in the Tweedie class is a better choice. In a
small number of cases, $\hat{p} \approx 1.6$ seems appropriate. For months in which some years
record exactly zero rainfall, Poisson–gamma distributions are shown to fit well using
$\hat{p} \approx 1.6$. These results mean that the gamma distribution is almost always the optimal
distribution among those studied for modelling monthly rainfall totals when $Y > 0$; in
rare cases, $\hat{p} \approx 1.6$ is suitable. Consequently, for most cases, simulation studies can do
no better (within the class of distributions studied) than to use the gamma distribution as
a basis for simulation for any month of the year with $Y > 0$ in the available data history.
We propose that simulations instead should be based on a value of $p$ slightly less than
2, say $p = 1.99$. This has the advantage of still modelling the data well (compare the
QQ-plots of quantile residuals in the top panels of Figure 7 with the QQ-plots of Figure
8), yet admitting the possibility of zero rainfall in the simulated future. The values
$p = 2$ and $p = 1.99$ are both within all of the 95% confidence intervals for $p$, so either
choice is sensible based on the profile likelihood plots.
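The probability of a zero month implied by a Poisson–gamma distribution can be written explicitly, which shows how a value such as $p = 1.99$ admits zeros. The expression below is the standard result for the Poisson–gamma case $1 < p < 2$; the symbol $\lambda$ (the mean of the underlying Poisson count) is notation introduced here and is not used in the original text.

```latex
% Probability of exactly zero rainfall for a Tweedie distribution with 1 < p < 2:
P(Y = 0) = \exp(-\lambda), \qquad
\lambda = \frac{\mu^{2-p}}{\phi\,(2 - p)} .
% For p = 1.99 this becomes
P(Y = 0) = \exp\!\left( -\frac{100\,\mu^{0.01}}{\phi} \right),
% which is positive for every finite \phi, whereas the gamma distribution (p = 2)
% assigns probability zero to Y = 0.
```

The probability is small unless the dispersion $\phi$ is large, so the fit to months with no recorded zeros is changed very little while zero totals remain possible in simulation.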
Figure 9 Maps of Australia with black and grey dots representing the value of p for
different months for 102 rainfall stations. Black dots represent $p \approx 1.6$ and grey dots
represent $p \approx 2$.
Acknowledgements
The authors acknowledge Dr Leigh Findlay for editorial assistance with the manuscript.
The comments of a reviewer are gratefully acknowledged; they improved the flow,
interpretation and understanding of the paper.
References
Aksoy H. 2006. Use of gamma distributions in hydrological analysis. Turkish Journal of
Engineering and Environmental Sciences 24: 419–428.
Aubert D, Loumagne C, Oudin L. 2003. Sequential assimilation of soil moisture and
stream flow data in a conceptual rainfall runoff model. Journal of Hydrology 280: 145–
161.
Boer R, Fletcher DJ, Campbell LC. 1993. Rainfall patterns in a major wheat-growing
region of Australia. Australian Journal of Agricultural Research 44: 609–624.
Chandler RE. 2005. On the use of generalized linear models for interpreting climate
variability. Environmetrics 16: 699–715.
Chandler RE, Wheater HS. 2002. Analysis of rainfall variability using generalized
linear models: a case study from the west of Ireland. Water Resources Research 38(10):
1192–1202.
Chapman TG. 1997. Stochastic models for daily rainfall in the western Pacific.
Mathematics and Computers in Simulation 43: 351–358.
Chowdhury S, Sharma A. 2007. Mitigating parameter bias in hydrological modelling
due to uncertainty in covariates. Journal of Hydrology 340: 197–204.
Coe R, Stern RD. 1982. Fitting models to daily rainfall. Journal of Applied
Meteorology 21: 1024–1031.
Das PK, Subash N, Sikka AK, Sharda VN, Sharma NK. 2006. Modelling weekly
rainfall using gamma probability distribution and Markov chain for crop planning in a
subhumid (dry) climate of central Bihar. Indian Journal of Agricultural Sciences 76(6):
358–361.
Das SC. 1955. The fitting of truncated Type III curves to daily rainfall data. Australian
Journal of Physics 8: 298–304.
Deni SM, Jemain AA, Ibrahim K. 2009. Fitting optimum order Markov chain models
for daily rainfall occurrences in peninsular Malaysia. Theoretical and Applied
Climatology 97: 109–121.
Dunn PK. 2004. Occurrence and quantity of precipitation can be modelled
simultaneously. International Journal of Climatology 24: 1231–1239.
Dunn PK. 2010. tweedie: Tweedie exponential family models. R package version 2.0.5.
Vienna, Austria.
Dunn PK, Smyth GK. 1996. Randomized quantile residuals. Journal of Computational
and Graphical Statistics 5(3): 236–244.
Dunn PK, Smyth GK. 2005. Series evaluation of Tweedie exponential dispersion model
densities. Statistics and Computing 15: 267–280.
Dunn PK, Smyth GK. 2008. Evaluation of Tweedie exponential dispersion model
densities by Fourier inversion. Statistics and Computing 18: 73–86.
El-seed AMG. 1987. An application of Markov chain model for wet and dry spell
probabilities at Juba in southern Sudan. Geojournal 15(4): 420–424.
Gabriel KR, Neumann J. 1962. A Markov chain model for daily rainfall occurrence at
Tel Aviv. Quarterly Journal of the Royal Meteorological Society 88: 90–95.
Gilchrist R, Drinkwater D. 1999. Fitting Tweedie Models to Data with Probability of
Zero Responses. Technical Report, Proceedings of the 14th International Workshop on
Statistical Modelling, Graz, Austria, July 19–23.
Glasbey CA, Nevison IM. 1997. Rainfall modelling using a latent Gaussian variable.
Modelling Longitudinal and Spatially Correlated Data: Methods, Applications, and
Future Directions. Springer, New York, 233–242.
Husak GJ, Michaelsen J, Funk C. 2007. Use of the gamma distribution to represent
monthly rainfall in Africa for drought monitoring applications. International Journal of
Climatology 27: 935–944.
Jamaludin S, Jemain AA. 2008. Fitting the statistical distribution for daily rainfall in
peninsular Malaysia based on the AIC criterion. Journal of Applied Sciences Research
4: 1846–1857.
Jiang J. 2007. Linear and Generalized Linear Mixed Models and Their Applications.
Springer: New York.
Jørgensen B. 1987. Exponential dispersion models (with discussion). Journal of the
Royal Statistical Society Series B 49: 127–162.
Jørgensen B. 1997. The Theory of Dispersion Models. Chapman and Hall: London.
Jørgensen B, Paes de Souza MC. 1994. Fitting Tweedie's compound Poisson model to
insurance claims data. Scandinavian Actuarial Journal 1: 69–93.
Katz RW. 1977. Precipitation as a chain dependent process. Journal of Applied
Meteorology 16: 671–676.
Laux P, Wagner S, Wagner A, Jacobeit J, Bardossy A, Kunstmann H. 2009. Modelling
daily precipitation features in the Volta Basin of west Africa. International Journal of
Climatology 21: 5113–5134.
McCullagh P, Nelder JA. 1989. Generalized Linear Models, 2nd edition, Chapman and
Hall: London.
Meilke PW Jr. 1973. Another family of distributions for describing and analysing
precipitation data. Journal of Applied Meteorology 12: 275–280.
Mooley DA. 1973. Gamma distribution probability model for Asian summer monsoon
monthly rainfall. Monthly Weather Review 101(2): 160–176.
Piantadosi J, Boland J, Howlett P. 2009. Generating synthetic rainfall on various
timescales daily, monthly and yearly. Environmental Modeling and Assessment 14:
431–438.
R Development Core Team. 2010. R: A Language and Environment for Statistical
Computing, R Foundation for Statistical Computing: Vienna, Austria. ISBN 3-900051-07-0.
Richardson CW, Wright DA. 1984. WGEN: A Model for Generating Daily Weather
Variables. Technical Report No 8, United States Department of Agriculture, Agriculture
Research Service.
Robertson AW, Kirshner S, Smyth P. 2003. Hidden Markov Models for Modeling Daily
Rainfall Occurrence over Brazil. Technical report. University of California.
Sen Z, Eljadid AG. 1999. Rainfall distribution function for Libya and rainfall
prediction. Journal of Hydrological Sciences 44: 665–680.
Sharda VN, Das PK. 2005. Modelling weekly rainfall data for crop planning in a
sub-humid climate of India. Agricultural Water Management 76: 120–138.
Stern RD, Coe R. 1984. A model fitting analysis of daily rainfall data (with discussion).
Journal of the Royal Statistical Society, Series A 147: 1–34.
Swift LW Jr, Schreuder HT. 1981. Fitting daily precipitation amounts using the SB
distribution. Monthly Weather Review 109: 2535–2540.
Tilahun K. 2006. The characterisation of rainfall in the arid and semiarid regions of
Ethiopia. Water South Africa 32: 429–436.
Toth E, Brath A, Montanari A. 2000. Comparison of short-term rainfall prediction
models for real-time flood forecasting. Journal of Hydrology 239: 132–147.
Wilks DS. 1998. Multisite generalization of a daily stochastic precipitation generation
model. Journal of Hydrology 210: 178–191.
Wilks DS. 1999. Interannual variability and extreme-value characteristics of several
stochastic daily precipitation models. Agricultural and Forest Meteorology 93: 153–169.
Yoo C, Jung KS, Kim TW. 2005. Rainfall frequency analysis using a mixed gamma
distribution: evaluation of the global warming effect on daily rainfall. Hydrological
Processes 19: 3851–3861.
Zaw WT, Naing TT. 2008. Empirical Statistical Modeling of Rainfall Prediction over
Myanmar. Technical report 36, Proceedings of World Academy of Science, Engineering
and Technology.
Poisson–gamma model for modelling rainfall occurrence and amount simultaneously
CHAPTER 5: A SIMPLE POISSON–GAMMA MODEL FOR
MODELLING RAINFALL OCCURRENCE AND AMOUNT
SIMULTANEOUSLY
Authors: Md Masud Hasan, Peter K. Dunn
Affiliations: School of Health and Sport Sciences, Faculty of Science, Health and
Education, University of the Sunshine Coast.
Journal: Agricultural and Forest Meteorology (2010), 150: 1319–1330.
JCR ranking: 1/46 (Forestry), 3/61 (Agronomy).
Impact factor: 3.197
ARC tier ranking: A
Statement of Intellectual Contribution
I, Md Masud Hasan, have made substantial independent intellectual contributions to the
research paper “A simple Poisson–gamma model for modelling rainfall occurrence and
amount simultaneously”. The intellectual contributions include the development of the
study hypothesis, identification and modification of methodology, development of new
applications, and responsibility for independent analysis and interpretation of results.
Md Masud Hasan (signature and date)
Peter K Dunn (signature and date)
Abstract
Modelling rainfall is important for prediction and simulation purposes in many areas of
planning, agriculture, forestry, meteorology and hydrology. Usually two different
models are needed to understand the two important features of rainfall: the occurrence
and the amount. Here we use a single model, a Tweedie generalized linear model, to
model the occurrence and amount of rainfall simultaneously. Choosing a simple model
with only sine and cosine terms as predictors, the model is fitted for 220 Australian
stations, with six stations taken as case studies. Judged by the probability of no rain
each month and the mean monthly rainfall amounts, the model fits the monthly rainfall
data well. Using the model, it is possible to simulate monthly rainfall data for stations
with inadequate rainfall records. The model also allows monthly rainfall amounts to be
disaggregated into the number of rainfall events in each month and the mean amount of
rainfall per event. This information can then be used in agricultural production system
simulators for agricultural planning and management.
KEY WORDS: Rainfall modelling, Tweedie generalized linear model, Poisson–gamma
model.
Introduction
The ability to model rainfall has promising applications not only in predictive analyses,
but also in crop growth, hydrological systems and crop simulation studies (Richardson
and Wright, 1984; Hansen et al., 2009). Monthly and seasonal rainfall models are used
for agricultural planning and management (Nnaji, 2001; Hansen et al., 2009; Nasseri
and Zahraie, 2010), and stochastic disaggregation of monthly rainfall data is useful for
crop simulation models (Lennox et al., 2004; Hansen and Ines, 2005). Simulated
synthetic rainfall data can be used as an input to water cycle management,
especially where the observed rainfall record is inadequate with respect to length,
completeness, or spatial coverage (Wilks, 1999; Rosenberg et al., 2004).
The simplest models considered for modelling monthly rainfall amounts are linear
regression models (Hughes and Saunders, 2002; Oettli and Camberlin, 2005; Westra
and Sharma, 2009). For example, Bhakar et al. (2006) used additive time series
decomposition models to model monthly rainfall amounts of the Kota region. These
models are not appropriate where the monthly rainfall amounts do not approximately
follow a normal distribution, as is the case for Australian stations, where monthly
rainfall is generally highly right-skewed. For modelling
skewed data, transformations of the rainfall amounts are commonly used. Meng et al.
(2007) used a logarithmic transformation of the monthly rainfall to achieve approximate
normality. However, applying the transform to the summer monsoon rainfall amounts of
39 rainfall stations in Asia, Mooley (1973) showed that the performance of the
logarithmic transformation is poor.
Sometimes, non-normal rainfall amounts are converted to a rainfall anomaly (Lorenzo et
al., 2010) or a standardized precipitation index (Loukas and Vasiliades, 2004; Gonzalez
and Cariaga, 2009) so that linear regression models can be fitted to the converted
indices. However, the transformation loses some information about the rainfall data. An
alternative to transforming for normality is to model rainfall directly with non-normal
distributions. Considering zero rainfall to be an attainable lower bound, Mooley (1973)
showed that the gamma distribution fits well to monthly rainfall in the Asian summer
monsoon.
The assumption of a gamma distribution for rainfall amounts has been used extensively
in the literature on rainfall modelling (Allan and Haan, 1975; Feuerverger, 1979;
Chapman, 1998; Wilks, 1999; Chandler and Wheater, 2002), often with variations or
extensions (Das, 1955; Stern and Coe, 1984; Wilks, 1990). After replacing zero monthly
rainfall values with 0.01 in., Abtew et al. (2009) showed that the monthly rainfall
amounts of 32 rainfall stations in the upper Blue Nile basin are fitted well by different
distributions (gamma, normal, Weibull and log-normal) in different months. Apart from
the challenge of finding an appropriate distribution for rainfall
amounts, another difficulty when modelling monthly rainfall is that rainfall has both
discrete and continuous components. When no rain falls, the amount of rainfall is exactly
zero (a discrete outcome); otherwise the amount of rainfall is continuous. Rosenberg et al. (2004) used
gamma distributions, a zero-order approximation of Laguerre polynomials, to fit the
non-zero part of the monthly rainfall distribution. They suggested that when there are
observed zero values, the data should be modelled by a mixed distribution but did not fit
such models. The rainfall models used in crop simulation (Richardson and Wright,
1984; Hamlin and Rees, 1987; Hansen and Ines, 2005) use Markov chains to model the
occurrence and gamma distributions to model the amount of rainfall. Buishand et al.
(2004) used logistic regression to model occurrence and a gamma distribution to model
the amount of monthly rainfall. Piantadosi et al. (2009) used maximum likelihood to
find parameters for both the probability of a zero outcome and the gamma distribution
that best matches the observed probability density of monthly rainfall amounts for the
strictly positive outcomes.
To incorporate the zeros with a continuous dataset, some authors (Velarde et al., 2004;
Kamarianakis et al., 2008; Fernandes et al., 2009) proposed mixture models between
Bernoulli and gamma or log-normal distributions. An alternative approach was adopted
by Glasbey and co-workers (Glasbey and Nevison, 1997; Durban and Glasbey, 2001),
who applied a monotonic transformation of rainfall data to define a latent Gaussian
variable with zero rainfall corresponding to censored values below some threshold (1.05
mm). In this paper, we use a probability distribution which models the exact zeros and
the amount of rainfall simultaneously (Dunn, 2004). The models are based on the
Tweedie distributions (Smyth, 1996; Jørgensen, 1997; Dunn and Smyth, 2005), which
can be used in the generalized linear modelling (GLM) framework. Previously, the
Tweedie distribution has been shown to model monthly rainfall data very well, by
fitting separate models for each month but using no predictors (Hasan and Dunn, 2010).
Here, we use a single model for each station (not one for each month) and use sine
and cosine terms to model the monthly variations. This produces simpler models.
The aim of the paper is to establish simple models that capture the features of the data
for the purpose of simulation. Such models are useful in applications where the observed
rainfall record is relatively short, incomplete, or lacks spatial coverage. Tweedie GLMs
are easily used in practice and computationally fast to fit. They are also potentially
important for investigating and simulating various rainfall scenarios such as the
probability of no rain, the mean monthly rainfall amounts, the number of rainfall events
in each month and the mean amount of rainfall per event. In rain-fed agriculture, this
information is useful in planning agricultural policies. As an example application,
Tweedie models have previously been used in the agricultural production systems
simulator (APSIM) (McCown et al., 1996) for simulating crop yields (Lennox et al.,
2004) by disaggregating monthly rainfall data.
The next section discusses the data used in this study. Section 3 introduces GLMs which
are used to model the data and includes information on the distributions appropriate to
model rainfall data. Section 4 discusses the results of fitting the models to monthly
rainfall data, includes a diagnostic analysis and evaluates how well the model simulates
data. In Section 5, some summary remarks and conclusions are made.
Figure 1 Location of the stations studied. The six case studies mentioned in the paper are
named and indicated using squares; grey dots represent the other studied stations.
Data and Preliminaries
Monthly rainfall data from 220 Australian rainfall stations are studied with 6 stations as
case studies (data obtained from the Australian Bureau of Meteorology) (Figure 1).
These stations represent a variety of types of rainfall distributions and agricultural
profiles in Australia (Figure 2). Oenpelli and Bidyadanga are grazing areas for beef
cattle. Wheat is the main crop produced in Trayning, while Ceduna supports a seafood
industry. Beef and dairy cattle, sugarcane and fruits are the main agricultural products in
Theebine and its surrounding areas. Vegetables and fruits are the main crops produced in
the areas surrounding Clarence, which also has a substantial forestry industry. Among
the drier stations, Bidyadanga has its wet months from December to March, and
Trayning and Ceduna have wet months from May to August. Among the three
wetter stations, Oenpelli and Theebine have wet months from December to March, and
Clarence has an even rainfall distribution over the months of the year (Figure 2). Table
1 shows summary statistics of the rainfall distribution for the six rainfall stations.
Oenpelli has a high percentage (29.4%) of months with no rainfall, whereas the dry
stations Trayning and Ceduna have small percentages (6.5% and 6.0%) of months with
no rainfall. Oenpelli has four wet months (DJFM) with median rainfall of more than
200 mm each, whereas the months in the dry season (MJJAS) have little rainfall. A
portion of the data is used for model estimation, and the rest is reserved for model
validation. The relative proportion to use is subject to much debate and research.
Because the data form a time series, contiguous portions are used for each part of the data:
the data from 1912 to 1971 are used to fit the model (62.5%), and the rest for validation.
The two sets of data have similar properties (Table 1). In addition to boxplots and
summary statistics, another way to understand the nature of the data is to examine the
mean–variance relationship, obtained by
computing the variance and mean of the rainfall in each month over all years. Plotting
the log of the variance against the log of the mean for the six stations (Figure 3) shows
an approximately linear relationship between the log group means and log group variances. The
variance of the amount of rainfall is clearly not constant but depends on the mean. The
relationship between mean and variance of amount of rainfall can be expressed as
$$\log(\text{group variance}) \approx \alpha + p \log(\text{group mean}).$$
Rearranging:
$$\text{group variance} \approx \exp\{\alpha + p \log(\text{group mean})\} = \text{const} \times (\text{group mean})^{p}.$$
That is, the variance is approximately proportional to some power, p, of the mean. This
relationship will be used later to find a suitable distribution for modelling the monthly
rainfall.
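The power relationship above can be checked numerically with a least-squares line on the log scale. The sketch below is illustrative only: the function name `fit_power_law` and the monthly means are hypothetical, and the variances are generated exactly from a power law so that the fit can be verified. It is not the analysis used to produce Figure 3.

```python
import math

def fit_power_law(means, variances):
    """Least-squares fit of log(variance) = alpha + p * log(mean)."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    p = sxy / sxx                 # slope: the power p
    alpha = y_bar - p * x_bar     # intercept: log of the constant
    return alpha, p

# Hypothetical monthly means (mm); variances follow 4 * mean^1.6 exactly
monthly_means = [5.0, 12.0, 20.0, 35.0, 60.0, 90.0]
monthly_vars = [4.0 * m ** 1.6 for m in monthly_means]

alpha, p = fit_power_law(monthly_means, monthly_vars)
print("estimated power p:", round(p, 3))
print("estimated constant:", round(math.exp(alpha), 3))
```

With real station data, the twelve monthly means and variances would replace the hypothetical lists, and the slope would give only a rough first guess at the power p.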
Figure 2 Boxplots of the monthly rainfall distributions. The horizontal lines indicate
median rainfall; the boxes indicate the first and third quartiles of the distribution. The
lines extending from the boxes extend to 1.5 times the interquartile range; circles
indicate observations more extreme than this.
Table 1 Summary statistics of the monthly rainfall for the whole dataset (1912 to 2007),
estimation dataset (1912 to 1971) and validation dataset (1972 to 2007) for the six studied
stations.
Station     Dataset     Mean    Median  Months with  IQR     Coefficient of
                        (mm)    (mm)    no rain (%)  (mm)    variation (%)
Bidyadanga  All years    42.5     2.0   38.6          46.5   204.8
            Estimation   38.7     2.1   38.5          38.6   208.0
            Validation   48.9     1.9   38.9          57.8   198.1
Trayning    All years    27.1    19.6    6.5          33.6    99.3
            Estimation   27.2    18.6    6.5          35.5   102.5
            Validation   27.0    20.9    6.5          30.8    93.8
Oenpelli    All years   118.1    32.3   29.4         210.2   133.1
            Estimation  112.0    29.3   30.0         206.4   130.9
            Validation  128.3    39.2   28.5         216.5   135.0
Ceduna      All years    25.1    19.6    6.0          29.0    91.0
            Estimation   25.8    21.1    6.0          29.8    88.5
            Validation   23.8    17.2    6.0          27.8    95.3
Theebine    All years    79.8    55.3    4.4          85.1   101.8
            Estimation   82.8    55.7    5.4          88.1   104.0
            Validation   74.8    55.1    2.8          80.2    96.6
Clarence    All years    89.6    66.2    0.5          84.8    89.6
            Estimation   91.5    66.7    0.6          85.4    92.1
            Validation   86.4    65.3    0.5          83.6    84.7
Figure 3 Scatterplots showing the mean–variance relationship (measured on the log
scale) of monthly rainfall for the stations Bidyadanga, Trayning, Oenpelli, Ceduna,
Theebine and Clarence for the years from 1912 to 2007 for all months. Each point
represents the mean and variance of amount of rainfall for a single month.
Models
Exponential Dispersion Models
GLMs, as used in this paper, are discussed in Section 3.3. Here, we introduce the
probability models upon which GLMs are based.
A probability function of the form
$$f(y; \theta, \phi) = a(y, \phi) \exp\left\{ \frac{1}{\phi} \left[ y\theta - \kappa(\theta) \right] \right\} \qquad (3.1)$$
for $y \in S$, for some suitable function $a(y, \phi)$ and known functions $\theta$ and $\kappa(\cdot)$, is called
an exponential dispersion model (EDM). The mean is $\mu = \kappa'(\theta)$ and $\phi > 0$ is the
dispersion parameter. The function $a(y, \phi)$ cannot always be written in closed form; it
is the normalizing function that ensures the density integrates (or sums) to one over the
domain $S$. Examples of EDMs include the normal, binomial, gamma and Poisson
distributions.
The notation $Y \sim \mathrm{ED}(\mu, \phi)$ indicates that the random variable $Y$ comes from an EDM
with location parameter $E[Y] = \mu = \kappa'(\theta)$ and variance $\mathrm{var}[Y] = \phi\kappa''(\theta)$, as in Eq.
(3.1). The functional relationship between $\mu$ and $\theta$ defined by $\mu = \kappa'(\theta)$ is invertible,
so that $\theta$ can be written as a function of $\mu$. Hence, the variance can be written
$\mathrm{var}[Y] = \phi V(\mu)$, where $V(\mu)$ is called the variance function.
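As a brief worked example (added here for illustration, using only the standard Poisson probability function), the Poisson distribution can be written in the exponential dispersion form, which identifies its pieces and its variance function:

```latex
\begin{align*}
f(y;\mu) &= \frac{\mu^{y} e^{-\mu}}{y!}
          = \frac{1}{y!}\exp\{\, y\log\mu - \mu \,\}, \qquad y = 0, 1, 2, \ldots \\
\theta &= \log\mu, \qquad \kappa(\theta) = e^{\theta}, \qquad \phi = 1, \qquad a(y,\phi) = \frac{1}{y!}, \\
\kappa'(\theta) &= e^{\theta} = \mu, \qquad \kappa''(\theta) = e^{\theta} = \mu
  \;\Longrightarrow\; V(\mu) = \mu .
\end{align*}
```

The variance function $V(\mu) = \mu$ is the case of a power variance function with power one.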
The Tweedie Family
A special case of EDMs is called the Tweedie family, studied by Jørgensen (1987,
1997) and named in honour of Tweedie (1984). The Tweedie family of distributions are
those EDMs with variance function $V(\mu) = \mu^{p}$ for $p \notin (0, 1)$ (Jørgensen, 1987). For
further information, see Smyth (1996), Jørgensen (1997) or Dunn and Smyth (2005).
The Tweedie distribution with mean $\mu$, dispersion parameter $\phi$ and index parameter $p$
is denoted as $\mathrm{Tw}_p(\mu, \phi)$.
There are notable special cases of the Tweedie distributions. When $p = 0$, the Tweedie
distributions correspond to the normal distribution; when $p = 1$ and $\phi = 1$, to the
Poisson distribution; when $p = 2$, to the gamma distribution; and when $p = 3$, to the
inverse Gaussian distribution. Models for which $p < 0$
have support on the entire real line but have $\mu > 0$ and are of no further interest in this
paper. When $p \ge 2$, the models have support on the positive reals. Of special interest
are the distributions for which $1 < p < 2$, also called the Poisson–gamma distributions
(Dunn and Smyth, 2005), which have support on the non-negative reals. In this context,
the probability distributions for which $1 < p < 2$ can be developed as follows.
Assume any rainfall event $i$ produces an amount of rainfall $R_i$, and that each $R_i$ comes
from a gamma distribution $\mathrm{Gam}(\alpha, \gamma)$ (in this parameterization, the mean is $\alpha\gamma$ and the
variance is $\alpha\gamma^2$). Assume the number of rainfall events in any one month is $N$, where $N$
has a Poisson distribution with mean $\lambda$; that is, $N \sim \mathrm{Pois}(\lambda)$. A month has
no rainfall when $N = 0$. The total monthly rainfall, $Y$, is the sum $Y = R_1 + R_2 + \cdots + R_N$
(where $Y = 0$ when $N = 0$). This model is used in this paper to model monthly rainfall,
and complements the work of Hamlin and Rees (1987), Chandler and Wheater (2002),
Buishand et al. (2004) and others who used two separate models to model the
occurrence and amount of rainfall. Apart from the special cases of the normal, Poisson,
gamma and inverse Gaussian distributions, the Tweedie densities have no closed form.
Dunn and Smyth (2005, 2008) studied two numerical methods for evaluating the
Tweedie density in general. However, fitting Tweedie GLMs only requires knowledge
of the first two moments; these extra computational issues are relevant only for computing
quantile residuals (Dunn and Smyth, 1996) and for obtaining maximum likelihood estimates
(MLEs) of the index parameter $p$ and dispersion parameter $\phi$.
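The construction of the monthly total as a Poisson number of gamma-distributed events can be simulated directly. The following is a minimal sketch using only the Python standard library; the function names and the parameter values are hypothetical choices for illustration, not estimates from this study. It compares the simulated mean and the simulated proportion of dry months with the values implied by the construction.

```python
import math
import random

def poisson_draw(lam, rng):
    """Poisson(lam) draw by Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count

def simulate_monthly_total(lam, alpha, gamma, rng):
    """Y = R_1 + ... + R_N with N ~ Pois(lam) and R_i ~ Gam(alpha, gamma); Y = 0 when N = 0."""
    n_events = poisson_draw(lam, rng)
    # random.gammavariate takes (shape, scale), so each event has mean alpha * gamma
    return sum(rng.gammavariate(alpha, gamma) for _ in range(n_events))

rng = random.Random(2010)            # fixed seed so the run is repeatable
lam, alpha, gamma = 1.2, 0.7, 40.0   # hypothetical values, not estimates from the thesis
totals = [simulate_monthly_total(lam, alpha, gamma, rng) for _ in range(20000)]

sim_mean = sum(totals) / len(totals)
sim_p_zero = sum(1 for y in totals if y == 0) / len(totals)
print("simulated mean:", round(sim_mean, 2), "theoretical mean:", lam * alpha * gamma)
print("simulated Pr(Y = 0):", round(sim_p_zero, 3), "theoretical:", round(math.exp(-lam), 3))
```

The theoretical mean is the product of the mean number of events and the mean amount per event, and the probability of a dry month is the Poisson probability of zero events.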
The Tweedie distributions have numerous useful properties making them particularly
appealing for use in rainfall modelling (Dunn, 2004):
• There is intuitive appeal for the models, considering total rainfall as a sum of
rainfall amounts on smaller timescales (outlined above).
• They provide a mechanism for understanding the fine-scale structure in coarse-scale
data, and consequently are useful for the disaggregation of monthly rainfall
to a daily timescale for incorporation into cropping system and other models.
• They belong to the EDM family of distributions, upon which GLMs are based.
Consequently, a framework is already in place for fitting models based on the
Tweedie distributions, for inference and for diagnostic testing. In addition,
predictors are easily incorporated into the modelling procedure, as we do in this
paper.
• They are closed under scale transformations so that changing the units from, say,
inches to centimetres, will not change the analysis quantitatively, apart from
obvious rescaling of $\mu$ and $\phi$. The Tweedie distributions are the only EDMs
with this property (Jørgensen, 1997).
Smyth (1996), Dunn (2004), Lennox et al. (2004) and Dunn and White (2005) have
used these distributions in related contexts.
Figure 4 The profile likelihood plot for the studied stations with sine and cosine terms
as predictors used to find the MLE of p. The points represent the computed likelihood
values; the solid line is a cubic-spline smooth interpolation through these points. The
dotted line indicates the nominal 95% confidence interval for p.
Generalized Linear Models
GLMs consist of two components (McCullagh and Nelder, 1989; Dobson, 2002):
• The response variable $Y_i$ follows an EDM family distribution, with mean $\mu_i$
and dispersion parameter $\phi$, such that $Y_i \sim \mathrm{ED}(\mu_i, \phi/w_i)$ for $i = 1, 2, \ldots, n$,
where $w_i > 0$ are known prior weights; and
• The mean $\mu_i$ is related to the predictors through a monotonic, differentiable
link function $g(\cdot)$ so that $g(\mu) = X\beta$, where $X = [x_1, x_2, \ldots, x_p]$; $x_1, x_2, \ldots, x_p$
are vectors of predictors and $\beta$ is a vector of unknown regression coefficients.
Often, the linear predictor $X\beta$ is denoted by $\eta$, so that $g(\mu_i) = \eta_i = X_i\beta$, where $X_i$ is
the $i$th row of $X$.
Model Fitting
Fitting the Tweedie family requires estimates of $\beta$, $\phi$ and $p$. Estimating $\beta$ for given
$p$ is performed using a usually robust iterative procedure called iteratively reweighted
least-squares (McCullagh and Nelder, 1989). Many software packages fit GLMs; here,
we fit a novel family of distributions (the Tweedie family) and so use R (R
Development Core Team, 2010), the techniques expounded in Dunn and Smyth (2005,
2008), and the corresponding R packages (Dunn, 2010; Smyth, 2009).
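To make the iteratively reweighted least-squares step concrete, the sketch below implements it for a log link and variance function $V(\mu) = \mu^p$ in plain Python. It is an illustration under stated assumptions, not the R implementation used in this study: the function names `solve` and `irls_log_link` are hypothetical, a fixed number of iterations replaces a convergence test, and the first column of the design matrix is assumed to be the intercept. For the log link the working weights are $\mu^{2-p}$ and the working response is $\eta + (y - \mu)/\mu$.

```python
import math

def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def irls_log_link(X, y, p, n_iter=25):
    """IRLS for a GLM with log link and V(mu) = mu**p; column 0 of X is the intercept."""
    k = len(X[0])
    # Start from the intercept-only fit, so zeros in y cause no difficulty
    beta = [math.log(sum(y) / len(y))] + [0.0] * (k - 1)
    for _ in range(n_iter):
        eta = [sum(xj * bj for xj, bj in zip(row, beta)) for row in X]
        mu = [math.exp(e) for e in eta]
        w = [m_i ** (2.0 - p) for m_i in mu]                         # (dmu/deta)^2 / V(mu)
        z = [e + (yi - m_i) / m_i for e, yi, m_i in zip(eta, y, mu)]  # working response
        xtwx = [[sum(wi * r[a] * r[b] for wi, r in zip(w, X)) for b in range(k)]
                for a in range(k)]
        xtwz = [sum(wi * r[a] * zi for wi, r, zi in zip(w, X, z)) for a in range(k)]
        beta = solve(xtwx, xtwz)                                     # weighted least squares
    return beta

# Noise-free check: when y equals its own mean, the fit should return the coefficients used
months = [m for _ in range(3) for m in range(1, 13)]
X = [[1.0, math.sin(2 * math.pi * m / 12), math.cos(2 * math.pi * m / 12)] for m in months]
beta_true = [3.0, 0.5, -0.3]         # hypothetical coefficients
y = [math.exp(sum(x * b for x, b in zip(row, beta_true))) for row in X]
beta_hat = irls_log_link(X, y, p=1.6)
print([round(b, 4) for b in beta_hat])
```

Only the mean and the variance function enter this step, which is why the density itself is not needed to estimate the regression coefficients for a given $p$.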
To estimate the maximum likelihood value of $p$, a profile (log-) likelihood plot is used;
this requires the computation of the density. To estimate $p$ for any postulated model,
proceed as follows. For a given value of $p$, assumed fixed, find the MLEs of $\beta$ and $\phi$
as above, and compute the log-likelihood. This is repeated for a range of $p$ values and,
because of the associated computational burden, a cubic-spline interpolation through
these computed points is fitted. The value of $p$ for which the log-likelihood is
maximized is chosen as the maximum likelihood value, $\hat{p}$. R functions (Dunn, 2009;
Smyth, 2009) are used to automate the process. Later, we compute the MLE of the
probability of a month having no rainfall, which is a function of $\phi$. This requires using
the MLE of $\phi$. The MLE of $\phi$ is difficult to compute but the algorithms used to
estimate $p$ also compute the MLE of $\phi$ (Dunn and Smyth, 2005).
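The profile likelihood procedure can be sketched in a simplified setting. The code below is an illustration only and differs from the procedure used in this study in several labelled ways: it assumes an intercept-only model (so the MLE of the mean is the sample mean), it evaluates the density for $1 < p < 2$ by summing the Poisson–gamma series directly rather than by the numerical methods of Dunn and Smyth, it maximizes over $\phi$ by a golden-section search on an assumed range, and it takes the best grid value of $p$ instead of a cubic-spline interpolation. All function names and the simulated "station" are hypothetical.

```python
import math
import random

def cp_params(mu, phi, p):
    """Poisson-gamma parameters (lam, alpha, gam) implied by Tweedie (mu, phi, p), 1 < p < 2."""
    lam = mu ** (2 - p) / (phi * (2 - p))
    alpha = (2 - p) / (p - 1)
    gam = phi * (p - 1) * mu ** (p - 1)
    return lam, alpha, gam

def log_density(y, mu, phi, p):
    """Log density for 1 < p < 2, summing the Poisson-gamma series over event counts."""
    lam, alpha, gam = cp_params(mu, phi, p)
    if y == 0:
        return -lam                                   # Pr(Y = 0) = exp(-lam)
    terms, best, n = [], -math.inf, 0
    while True:
        n += 1
        t = (n * math.log(lam) - math.lgamma(n + 1)   # Poisson weight, without exp(-lam)
             + (n * alpha - 1) * math.log(y) - y / gam
             - math.lgamma(n * alpha) - n * alpha * math.log(gam))
        terms.append(t)
        best = max(best, t)
        if t < best - 40:                             # log-terms are concave in n: safe to stop
            break
    return -lam + best + math.log(sum(math.exp(t - best) for t in terms))

def profile_loglik(y, p, iters=20):
    """Log-likelihood maximized over phi (golden-section search on log phi) for fixed p."""
    mu = sum(y) / len(y)                              # intercept-only model
    def loglik(log_phi):
        return sum(log_density(v, mu, math.exp(log_phi), p) for v in y)
    a, b = math.log(0.1), math.log(200.0)             # assumed search range for phi
    r = (math.sqrt(5) - 1) / 2
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = loglik(c), loglik(d)
    for _ in range(iters):
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = loglik(c)
        else:
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = loglik(d)
    return max(fc, fd)

def simulate(mu, phi, p, size, rng):
    """Draw monthly totals: N ~ Pois(lam) events, each Gam(alpha, gam)."""
    lam, alpha, gam = cp_params(mu, phi, p)
    out = []
    for _ in range(size):
        n, prod = 0, rng.random()
        while prod > math.exp(-lam):                  # Knuth's Poisson sampler
            n += 1
            prod *= rng.random()
        out.append(sum(rng.gammavariate(alpha, gam) for _ in range(n)))
    return out

rng = random.Random(1912)
data = simulate(mu=30.0, phi=8.0, p=1.6, size=500, rng=rng)   # hypothetical station
grid = [1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9]
profile = {p: profile_loglik(data, p) for p in grid}
p_hat = max(profile, key=profile.get)
print("grid value of p with the largest profile log-likelihood:", p_hat)
```

Plotting the values in `profile` against the grid would give a simplified analogue of a profile likelihood plot for a single simulated series.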
Table 2 The estimated values of the coefficients for the predictors in the fitted rainfall
model. The significance level is based on a standard asymptotic Wald z-test.
Station     Dataset     $\hat{\beta}_0$   $\hat{\beta}_1$   $\hat{\beta}_2$   $\hat{\phi}$
Bidyadanga  Estimation  2.842***   1.908***    0.653      8.21
            Validation  2.992***   1.907***    0.834***   8.60
            All data    2.903***   1.908***    0.724***   8.33
Trayning    Estimation  3.158***   -0.098*     -0.764***  3.31
            Validation  3.203***   -0.160***   -0.582***  3.35
            All data    3.177***   -0.122**    -0.692***  3.32
Oenpelli    Estimation  3.460***   1.768***    1.990***   7.11
            Validation  3.585***   1.903***    1.882***   7.22
            All data    3.510***   1.822***    1.945***   7.13
Ceduna      Estimation  3.189***   -0.263**    -0.424***  3.54
            Validation  3.097***   -0.437***   -0.337***  3.36
            All data    3.158***   -0.324***   -0.393***  3.46
Theebine    Estimation  4.297***   0.503***    0.483***   4.50
            Validation  4.224***   0.404***    0.467***   4.23
            All data    4.271***   0.452***    0.472***   4.43
Clarence    Estimation  4.496***   0.289***    0.034      1.99
            Validation  4.418***   0.396***    0.102*     2.10
            All data    4.469***   0.327***    0.057      2.00
* 0.01 < p≤0.05; ** 0.001 < p≤0.01; *** p≤0.001.
Model Results
Fitted Model
Models of the form $\log \mu_i = X_i\beta$ are fitted, based on Tweedie EDMs. To model
cyclical rainfall patterns (Figure 2), we use as predictors sine and cosine terms of the
form $\sin(2\pi m/12)$ and $\cos(2\pi m/12)$, where $m = 1, 2, \ldots, 12$ correspond to January,
February, . . ., December. Hence, the fitted model with sine and cosine terms is
$$\log \mu_i = \beta_0 + \beta_1 \sin\left(\frac{2\pi m}{12}\right) + \beta_2 \cos\left(\frac{2\pi m}{12}\right) \qquad (4.1)$$
where $Y_i \sim \mathrm{Tw}_p(\mu_i, \phi)$; $\beta_0$, $\beta_1$ and $\beta_2$ are regression coefficients. Notice that
$i = 12(j - 1) + m$, where $j = 1, 2, \ldots, J$ indexes the year (starting at $j = 1$ for the
year 1912).
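A standard trigonometric identity, not stated in the original text, may help in reading Eq. (4.1): the sine and cosine terms together are equivalent to a single cosine wave. The amplitude $A$ and phase $\psi$ are symbols introduced here for illustration only.

```latex
\begin{align*}
\beta_1 \sin\!\left(\tfrac{2\pi m}{12}\right) + \beta_2 \cos\!\left(\tfrac{2\pi m}{12}\right)
  &= A \cos\!\left(\tfrac{2\pi m}{12} - \psi\right), \\
A &= \sqrt{\beta_1^2 + \beta_2^2}, \qquad \cos\psi = \beta_2 / A, \qquad \sin\psi = \beta_1 / A, \\
\exp(\beta_0 - A) &\le \mu_i \le \exp(\beta_0 + A).
\end{align*}
```

Treating $m$ as continuous, the fitted mean peaks where $2\pi m/12 = \psi$. Positive $\beta_1$ and $\beta_2$ therefore place the peak between $m = 0$ and $m = 3$, and negative values of both place it between $m = 6$ and $m = 9$.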
To fit models to the estimation dataset, first the appropriate estimates of the index
parameter p are found to determine the particular Tweedie distribution for the station.
The profile (log-) likelihood plots with sine and cosine terms as predictors (Figure 4)
suggest the MLEs of p for the studied stations are 1.63, 1.54, 1.48, 1.49, 1.51 and 1.75,
respectively.
Figure 5 Plots showing the predicted and observed (for whole dataset of 96 years and
validation dataset of 36 years) mean monthly rainfall of the studied stations for different
months.
Here we propose a simple model useful for simulation purposes, choosing a single value
of $\hat{p} = 1.6$ for all the studied stations rather than using different values of $p$ for different
stations. Estimation of $p$ for each station is computationally intensive; however, once
$\hat{p} = 1.6$ is decided upon, the models are very quick and easy to fit in R using the
Tweedie family functions (Dunn, 2009; Smyth, 2009). For some stations, $\hat{p} = 1.6$ does
not fall within the 95% confidence interval; however, using $\hat{p} = 1.6$ for each station
actually performs very well in practice (Section 4.3). The value of $\hat{p}$ makes no
difference to the estimates of regression coefficients, but has an impact on the monthly
rainfall variation (discussed in Section 4.4). Provided $\hat{p}$ is not too far from $p = 1.6$, the
impact on the variation is usually not too great. Of course, $p$ can easily be estimated for
each station if necessary; this causes no great problems for the approach discussed
here, but requires extra effort. We
adopt $\hat{p} = 1.6$ for simplicity. The models are fitted to the estimation, validation and
whole datasets (Table 2). In all cases, almost the same regression coefficients and $\hat{\phi}$ are
obtained.
Figure 6 Plots showing the predicted and observed (for whole dataset of 96 years and
validation dataset of 36 years) probability of no rain of the studied stations for different
months.
Interpretation
Using the model, we show how to understand various aspects of monthly rainfall, such
as the probability of no rain, the mean monthly rainfall amounts, the shape of the
rainfall distribution, the number of rainfall events in each month and the mean amount
of rainfall per event. Most of these features have application in agricultural planning and
management, and in hydrology. With the exception of Clarence, both the sine and
cosine terms are important predictors of rainfall (Table 2). From Figure
2, Bidyadanga, Oenpelli and Theebine receive more rainfall during the southern
hemisphere summer (DJFM), hence producing positive coefficients for the sine and
cosine terms. Trayning and Ceduna receive more rainfall during the southern
hemisphere winter (MJJA), and so negative coefficients of the sine and cosine predictors
are observed. Clarence, which has a reasonably even rainfall distribution over the months
of the year, has significant coefficients for the sine terms, but the coefficients for the
cosine terms are not significant. Since the right-hand side of Eq. (4.1) only involves $m$, the
predicted rainfall $\hat{\mu}_i$ for each month only depends on the month $m$.
Figure 7 Maps of Australia showing the effect of sine and cosine terms on the amount
of rainfall. Black dots show significant positive effect (significantly larger rainfall
amount during summer), grey dots show significant negative effect (significant larger
rainfall amount during winter), black circles show no significant effect (even rainfall
distribution).
So we write $\hat{\mu}_m$ for the predicted rainfall for month $m$. Individual realizations from this
model for each month will vary from year to year. To interpret the coefficients, recall
that the model uses a logarithmic link function, so that
$\log \mu_i = \beta_0 + \beta_1 \sin(2\pi m/12) + \beta_2 \cos(2\pi m/12)$. Hence the predicted mean amount of
rainfall for month $m$ is $\hat{\mu}_i = \exp\{\hat{\beta}_0 + \hat{\beta}_1 \sin(2\pi m/12) + \hat{\beta}_2 \cos(2\pi m/12)\}$ and the
baseline prediction is $\exp(\hat{\beta}_0)$. For example, consider the model fitted to the estimation
dataset for the rainfall in Bidyadanga. The constant term in the model for Bidyadanga is
akin to a baseline rainfall prediction of exp(2.842) = 17.15 mm. The predicted mean
amount of rainfall in December, when $m = 12$, is $\hat{\mu}_{12} = 32.96$ mm. The solid lines in
Figure 5 represent the predicted mean rainfall amounts using the model fitted to the
estimation data. The “dash” and “dot-dash” lines represent the monthly mean amount of
rainfall for the whole dataset and validation dataset, respectively. The model performs
well for both datasets.
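The worked figures for Bidyadanga can be reproduced with a few lines of code. This is a minimal sketch: the function name `monthly_mean` is hypothetical, and the coefficients are the estimation-dataset values for Bidyadanga as read from Table 2, which are rounded to three decimals.

```python
import math

def monthly_mean(beta0, beta1, beta2, m):
    """Predicted mean rainfall (mm) for month m = 1..12 under the log-link harmonic model."""
    angle = 2 * math.pi * m / 12
    return math.exp(beta0 + beta1 * math.sin(angle) + beta2 * math.cos(angle))

# Bidyadanga, estimation dataset (coefficients as read from Table 2)
b0, b1, b2 = 2.842, 1.908, 0.653

baseline = math.exp(b0)
december = monthly_mean(b0, b1, b2, 12)
print("baseline prediction (mm):", round(baseline, 2))
print("predicted December mean (mm):", round(december, 2))

# Predicted means for every month of the year
for m in range(1, 13):
    print(m, round(monthly_mean(b0, b1, b2, m), 1))
```

In December the sine term vanishes and the cosine term equals one, so the prediction reduces to $\exp(\hat{\beta}_0 + \hat{\beta}_2)$. The result agrees with the value quoted in the text up to the rounding of the coefficients.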
The Tweedie parameters $(\mu, p, \phi)$ can be reparameterized to the Poisson and gamma
distribution parameters $(\lambda, \gamma, \alpha)$ when $1 < p < 2$ (Dunn, 2004), providing approximate
down-scaling information about monthly rainfall. The transformation between the
parameterizations (when $1 < p < 2$) is:
$$\lambda = \frac{\mu^{2-p}}{\phi(2-p)}; \qquad \gamma = \phi(p-1)\mu^{p-1}; \qquad \alpha = \frac{p-2}{1-p}.$$
The mean number of rainfall events ($\lambda$), the scale of the gamma distribution of rainfall
per event ($\gamma$) and the mean amount of rain per rainfall event ($\alpha\gamma$) can then be computed. Using
Bidyadanga as an example, the MLE of the dispersion parameter is $\hat{\phi} = 8.21$.
Reparameterizing, the mean number of rainfall events for December is estimated as
$\hat{\lambda} = 1.24$; the scale of the rainfall gamma distribution is $\hat{\gamma} = 39.95$; and the mean
amount of rain per event is $\hat{\alpha}\hat{\gamma} = 26.63$ mm. Using the model, the parameters can be
estimated for other months of Bidyadanga, as well as for each month of other rainfall
stations. The Tweedie distribution has been used successfully in this way, to understand
the finer timescale structure of monthly rainfall, in simulating the growing of
sorghum (Lennox et al., 2004) with APSIM. That application did not use the sine and
cosine terms as explanatory variables in the modelling. Clearly, the model used here
could also be used for simulating crop yields.
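The reparameterization to the Poisson and gamma parameters is simple to compute. The sketch below applies it to the December figures quoted for Bidyadanga; the function name `poisson_gamma_params` is hypothetical, and the inputs are the rounded values given in the text, so the output should be close to, but need not exactly match, the quoted estimates.

```python
def poisson_gamma_params(mu, phi, p):
    """Map Tweedie (mu, phi, p), with 1 < p < 2, to Poisson-gamma (lam, gam, alpha)."""
    lam = mu ** (2 - p) / (phi * (2 - p))     # mean number of rainfall events per month
    gam = phi * (p - 1) * mu ** (p - 1)       # gamma parameter multiplying alpha in the event mean
    alpha = (p - 2) / (1 - p)                 # gamma parameter depending only on p
    return lam, gam, alpha

# Bidyadanga, December: predicted mean, dispersion estimate and index as quoted in the text
lam, gam, alpha = poisson_gamma_params(mu=32.96, phi=8.21, p=1.6)

print("mean number of events:", round(lam, 2))
print("mean rain per event (mm):", round(alpha * gam, 2))
print("implied monthly mean (mm):", round(lam * alpha * gam, 2))
```

The last line is a consistency check: the mean number of events multiplied by the mean amount per event returns the monthly mean, whatever the values of $\phi$ and $p$.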
Importantly, the probability of recording no rain is $\pi_0 = \Pr(Y = 0) = \exp(-\lambda)$ (Dunn
and Smyth, 2005). The probability of recording no December rain in Bidyadanga is
$\hat{\pi}_0 = 0.2894$; thus we predict the probability of no December rain in Bidyadanga as
about 29%. The "dash" and "dot-dash" lines in Figure 6 are the observed probabilities of
no rain for the respective months calculated from the whole dataset and the validation
dataset, respectively. For example, of the 96 December rainfall totals in the whole dataset
for Bidyadanga, 15 were exactly zero (no rainfall), so the observed probability of no rainfall
is 0.16. The solid lines represent the predicted probability of no rain using the model
fitted to the estimation data. From Figure 6, the model predicts the probability of
rainfall occurrence well. For the whole dataset for Clarence, the observed probability of
no rainfall in each month is either 0 (for the six months with no exact zeros) or 1/96 (for
the months with exactly one zero-rainfall record). For the validation dataset, the
observed probability of no rain is zero for 10 months and 1/36 for the 2 months with one
exact zero each, with no possible value in between. The predicted probability of no
rainfall for different months for Clarence lies between 0.0002 and 0.001. The fitted
Tweedie model predicts a small value of $\pi_0$ for
all months, so the model performs well. In months with larger probabilities of no
rainfall the risks associated with cropping are increased. Modelling the probability of
recording no rain for different months is important in taking decisions regarding any
shift of cropping time or alternative cropping. For some months in some rainfall stations
no exact zeros are observed, so the model predicts a very small probability of getting no
rainfall. As an example, for Clarence the observed probability of getting no rainfall in
January is zero but the model predicts the probability as 0.0003. The result is helpful for
simulation purposes as even when there is no month with zero rainfall in the rainfall
history, the possibility of having months with no rainfall in future is not completely
excluded. Thus, the very simple model is useful for predicting or simulating the
probability of no rainfall $P(Y = 0)$ and the amount of rainfall $Y$ simultaneously,
without needing separate models.
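The dry-month probability quoted above follows directly from the Poisson part of the model. A minimal Python sketch is given below; the function name is hypothetical, and the inputs are the December figures quoted in the text for Bidyadanga.

```python
import math

def prob_dry_month(lam):
    """Pr(Y = 0) = exp(-lam): probability of no rainfall events in the month."""
    return math.exp(-lam)

predicted = prob_dry_month(1.24)   # lambda-hat quoted for December at Bidyadanga
observed = 15 / 96                 # 15 dry Decembers among the 96 December totals
```

Because the exponential is strictly positive, the predicted probability is never exactly zero, which is why a month with no recorded zeros still receives a small non-zero dry-month probability.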
Figure 8 Q–Q plots of quantile residuals for the rainfall models of the studied stations.
The model is also fitted to another 214 rainfall stations in Australia covering various
types of rainfall characteristics and agricultural regimes. In most cases the sine
and cosine terms have coefficients with the same sign. The maps in Figure 7 represent the
effect of the sine and cosine terms on monthly rainfall for the studied rainfall stations.
From Figure 7, both the sine and cosine terms have significant negative
coefficients for the southern parts of Australia (grey dots). The southern
rainfall stations receive significantly larger amounts of rainfall during June–November. A
few stations show no significant effect of the sine and cosine terms, indicating relatively
even rainfall distributions throughout the months of a year (black circles). For the other
stations, the sine and cosine terms have significant positive effects on rainfall amount,
indicating larger rainfall amounts during the southern hemisphere summer (black dots).
Diagnostic Checks
Assessing the model presents unusual difficulties: the exact zeros (dry months) cause
the usual deviance or Pearson residuals to have spurious and distracting patterns. An
alternative is to use (randomized) quantile residuals (Dunn and Smyth, 1996), which
explicitly remove these patterns for discrete data, or discrete components of data, by
incorporating the smallest amount of randomization necessary on the cumulative
probability scale. Dunn and Smyth (1996) suggest creating four sets of the residuals:
any features not preserved across the replications are considered artefacts of the
randomization. In this paper, one representative replication is given. The quantile
residuals are normally distributed (apart from sampling error in the estimated
parameters) provided the correct distribution form is used; hence they are useful for
checking if the correct distribution has been chosen. If the residuals are on or close to
the straight line (indicating normality), the fitted model is considered to perform well.
This is important here, where the suitability of the Poisson–gamma distributions is
crucial for the models fitted in this paper. A Q–Q plot (normal probability plot) of the
quantile residuals shows whether the distribution is appropriate for the fitted model.
Figure 8 shows all the residuals lying on or close to the line indicating normality,
suggesting the fitted models perform well. No large deviations are evident apart from
small deviations in the upper tail. The same feature was also observed by Chandler and
Wheater (2002). The diagnostic plots indicate that a Poisson–gamma distribution with an
index parameter of $\hat{p} = 1.60$ fits the monthly rainfall well for different stations, and the
distribution proves to be adequate for simultaneously modelling both the continuous and
discrete components of monthly rainfall.
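The randomized quantile residuals used in this section can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation used for Figure 8: the function names are hypothetical, and the Poisson–gamma distribution function is evaluated here by a truncated Poisson mixture of gamma distribution functions rather than by the series or Fourier-inversion methods of Dunn and Smyth (2005, 2008). For a dry month the cumulative probability is drawn uniformly over the probability mass at zero; for a wet month it is the distribution function at the observed total.

```python
import math
import random
from statistics import NormalDist

def gamma_cdf(x, shape, scale):
    """Gamma distribution function via the power series for the incomplete gamma."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = 1.0 / shape                 # k = 0 term of sum z^k / (a (a+1) ... (a+k))
    total = term
    k = 1
    while term > 1e-15 * total and k < 10000:
        term *= z / (shape + k)
        total += term
        k += 1
    return min(1.0, total * math.exp(shape * math.log(z) - z - math.lgamma(shape)))

def poisson_gamma_cdf(y, lam, alpha, gamma, max_events=200):
    """Pr(Y <= y) for Y = R_1 + ... + R_N, N ~ Pois(lam), R_i ~ Gam(alpha, gamma)."""
    if y < 0.0:
        return 0.0
    pois = math.exp(-lam)              # Pr(N = 0): the point mass at zero
    cdf = pois
    for n in range(1, max_events + 1):
        pois *= lam / n                # Pr(N = n)
        cdf += pois * gamma_cdf(y, n * alpha, gamma)
    return min(cdf, 1.0)

def quantile_residual(y, lam, alpha, gamma, rng=random):
    """Randomized quantile residual: normal quantile of the cumulative probability."""
    if y == 0.0:
        u = rng.uniform(0.0, math.exp(-lam))   # randomize within the zero mass
    else:
        u = poisson_gamma_cdf(y, lam, alpha, gamma)
    u = min(max(u, 1e-12), 1.0 - 1e-12)        # keep u strictly inside (0, 1)
    return NormalDist().inv_cdf(u)
```

If the fitted distribution is correct, residuals computed this way for all months should look like a standard normal sample, which is what the Q–Q plot assesses.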
Simulation
The main purpose of the current study is to develop a simple model for simulating
rainfall for use in agricultural production systems. In the previous section, we
demonstrated that the model fits well. In this section we show that the simulated data
has similar properties to the actual rainfall data.
Rainfall data for 96 years is simulated from the corresponding Tweedie EDM based on
this value of $\mu$, the index parameter $p = 1.6$, and the MLE of the dispersion parameter $\phi$.
The properties of the simulated data were then compared with the properties of the
actual data (also based on 96 years). The boxplots in Figure 9 represent the simulated
monthly rainfall distribution of the studied stations. Comparing with the observed
rainfall distribution in Figure 2, the simulated data using the model have very similar
properties to the observed rainfall. The model can be used to generate synthetic rainfall
data for stations with inadequate observed rainfall records. If the lower limit of the
95% CI for $p$ is greater than 1.6, the effect is to slightly overestimate the variation
(Ceduna and Oenpelli), and if the upper limit is less than 1.6, the effect is to slightly
underestimate the variation (Clarence).
Figure 9 Boxplots of the simulated monthly rainfall distributions for 96 years.
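The simulation step can be illustrated through the Poisson–gamma construction: draw the number of rainfall events in a month, then sum that many gamma-distributed event amounts. The sketch below is a hedged illustration, not the procedure used to produce Figure 9; the function names, the seed and the use of Knuth's multiplication method for the Poisson draw are assumptions. The parameter values are the December estimates quoted for Bidyadanga, with the gamma shape computed as $(2 - p)/(p - 1)$ for $p = 1.6$.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw N ~ Pois(lam) by Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    n, prod = 0, rng.random()
    while prod > limit:
        n += 1
        prod *= rng.random()
    return n

def simulate_month(lam, alpha, gamma, rng):
    """One monthly total: the sum of N gamma event amounts (zero when N = 0)."""
    n_events = sample_poisson(lam, rng)
    return sum(rng.gammavariate(alpha, gamma) for _ in range(n_events))

rng = random.Random(2011)   # fixed seed so the sketch is reproducible
# 96 simulated December totals, mimicking a 96-year record.
december = [simulate_month(1.24, 2.0 / 3.0, 39.95, rng) for _ in range(96)]
dry_fraction = sum(1 for y in december if y == 0.0) / len(december)
```

Exact zeros arise naturally whenever no events are drawn, so the occurrence and the amount of rainfall are simulated from the same model.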
Conclusion
The ability to model rainfall is important for agricultural decision making, crop
management and crop simulation studies. Instead of using two different models for
monthly rainfall occurrence P(Y = 0) and monthly rainfall amounts Y, we use a single
model to understand both aspects of rainfall simultaneously. For this purpose, we use the
Poisson–gamma distribution from the Tweedie family of distributions. These distributions
have many desirable properties. The most important of these for practitioners is that
these models provide mechanisms for understanding the fine scale structure of the data,
and so are useful for disaggregation of monthly rainfall amounts to daily timescale for
incorporation into cropping system and other models using software such as APSIM.
In the model we used a single value of $\hat{p} = 1.60$ for all the stations. Better but more
complex models may be obtained by using a $p$ estimated separately for each station.
Using $\hat{p} = 1.60$ is a good compromise, balancing a well-fitting model with a model
simple to fit (i.e., $\hat{p}$ is not re-estimated for each station). We used a simple model with
only sine and cosine terms as predictors. These models are computationally fast to fit,
and are useful in a number of applications where the observed rainfall record is
relatively short, incomplete, or lacks spatial coverage. The models are also potentially
important for investigating and simulating various aspects of rainfall useful in practice,
such as the probability of no rain, the mean monthly rainfall amounts, the number of
rainfall events in each month and the mean amount of rainfall per event. These features
of rainfall have many potential uses in primary industries, water resources planning and
hydrology. Despite their simplicity, these models fit the data well and produce
reasonable simulated monthly rainfall data. The climatological drivers of rainfall can
also be added to the model. In principle, this is just an extension to the development
here. However, in practice, other issues emerge that need to be addressed, and so this
remains an area of further research.
Acknowledgements
The comments of the reviewers and the editor are most gratefully acknowledged; they
improved the flow, interpretation and understanding of the paper. They were also
helpful in addressing some practical implications of the model.
References
Abtew W, Melesse AM, Dessalegne T. 2009. Spatial, inter and intra-annual variability
of the upper Blue Nile basin rainfall. Hydrological Processes 23: 3075–3082.
Allan DM, Haan CT. 1975. Stochastic Simulation of Daily Rainfall. Technical Report
82. Water Resources Institute, University of Kentucky.
Bhakar SR, Singh RV, Chhajed N, Bansal AK. 2006. Stochastic modeling of monthly
rainfall at Kota region. ARPN Journal of Engineering and Applied Sciences 1 (3): 36–44.
Buishand TA, Shabalova MV, Brandsma T. 2004. On the choice of the temporal
aggregation level for statistical downscaling of precipitation. Journal of Climate 17:
1816–1827.
Chandler RE, Wheater HS. 2002. Analysis of rainfall variability using generalized
linear models: a case study from the west of Ireland. Water Resources Research 38 (10):
1192–1202.
Chapman T. 1998. Stochastic modelling of daily rainfall: the impacts of adjoining wet
days on the distribution of rainfall amounts. Environmental Modelling and Software 13:
317–324.
Das SC. 1955. The fitting of truncated Type III curves to daily rainfall data. Australian
Journal of Physics 8: 298–304.
Dobson AJ. 2002. An Introduction to Generalized Linear Models, 2nd edition.
Chapman and Hall, London.
Dunn PK. 2004. Occurrence and quantity of precipitation can be modelled
simultaneously. International Journal of Climatology 24: 1231–1239.
Dunn PK. 2010. Tweedie: Tweedie exponential family models. R package version 2.0.5.
Vienna, Austria.
Dunn PK, Smyth GK. 1996. Randomized quantile residuals. Journal of Computational
and Graphical Statistics 5 (3): 236–244.
Dunn PK, Smyth GK. 2005. Series evaluation of Tweedie exponential dispersion model
densities. Statistics and Computing 15: 267–280.
Dunn PK, Smyth GK. 2008. Evaluation of Tweedie exponential dispersion model
densities by Fourier inversion. Statistics and Computing 18: 73–86.
Dunn PK, White N. 2005. Power-variance models for modelling rainfall. In: Statistical
Solution to Modern Problems: Proceedings of the 20th International Workshop on
Statistical Modelling, Sydney, pp. 149–156.
Durban M, Glasbey CA. 2001. Weather modelling using a multivariate latent Gaussian
model. Agricultural and Forest Meteorology 109: 187–201.
Fernandes MVM, Schmidt AM, Migon HS. 2009. Modelling zero-inflated spatio-temporal
processes. Statistical Modelling 9 (1): 3–25.
Feuerverger A. 1979. On some methods of analysis for weather experiments.
Biometrika 66: 655–658.
Glasbey CA, Nevison IM. 1997. Rainfall modelling using a latent Gaussian variable. In:
Modelling Longitudinal and Spatially Correlated Data: Methods, Applications and
Future Directions. Springer, New York, pp. 233–242.
Gonzalez MH, Cariaga ML. 2009. An approach to seasonal forecasting of summer
rainfall in Buenos Aires, Argentina. Atmosfera 22 (3): 265–279.
Hamlin MJ, Rees DH. 1987. The use of rainfall forecasts in the optimal management of
small-holder rice irrigation—a case study. Hydrological Sciences 32 (1): 15–29.
Hansen JW, Ines AVM. 2005. Stochastic disaggregation of monthly rainfall data for
crop simulation studies. Agricultural and Forest Meteorology 131: 233–246.
Hansen JW, Mishra A, Rao KPC, Indeje M, Ngugi RK. 2009. Potential value of GCM-based
seasonal rainfall forecasts for maize management in semi-arid Kenya.
Agricultural Systems 101: 80–90.
Hasan MM, Dunn PK. 2010. Two Tweedie distributions that are near-optimal for
modelling monthly rainfall in Australia. International Journal of Climatology, Published
online in Wiley InterScience. doi:10.1002/joc.2162.
Hughes BL, Saunders MA. 2002. Seasonal prediction of European spring
precipitation from El Niño Southern Oscillation and local sea-surface temperatures.
International Journal of Climatology 22: 1–14.
Jørgensen B. 1987. Exponential dispersion models (with discussion). Journal of the
Royal Statistical Society, Series B 49: 127–162.
Jørgensen B. 1997. The Theory of Dispersion Models. Chapman and Hall, London.
Kamarianakis Y, Feidas H, Kokolatos G, Chrysoulakis N, Karatzias V. 2008.
Evaluating remotely sensed rainfall estimates using nonlinear mixed models and
geographically weighted regression. Environmental Modelling & Software 23: 1438–1447.
Lennox SM, Dunn PK, Power BD, Devoil P. 2004. A statistical distribution for
modelling rainfall with promising applications in crop science. Technical report. In:
Fischer, T., et al. (Eds.), New Directions for a Diverse Planet: Proceedings for the 4th
International Crop Science Congress. Brisbane, Australia.
Lorenzo MN, Iglesias I, Taboada JJ, Gesteira MG. 2010. Relationship between monthly
rainfall in northwest Iberian Peninsula and North Atlantic sea surface temperature.
International Journal of Climatology 30: 980–990.
Loukas A, Vasiliades L. 2004. Probabilistic analysis of drought spatiotemporal
characteristics in Thessaly region, Greece. Natural Hazards and Earth System Sciences
4: 719–731.
McCown RL, Hammer GL, Hargreaves JNG, Holzworth DP, Freebairn DM. 1996.
APSIM: a novel software system for model development, model testing, and simulation
in agricultural research. Agricultural Systems 50: 255–271.
McCullagh P, Nelder JA. 1989. Generalized Linear Models, 2nd edition. Chapman and
Hall, London.
Meng Q, Zhang Y, Wang Z. 2007. Rainfall Predictive Models for Building Simulation
II—Rainfall Estimation, Proceedings: Building Simulation. Technical Report. Tsinghua
University, Beijing, China.
Mooley DA. 1973. Gamma distribution probability model for Asian summer monsoon
monthly rainfall. Monthly Weather Review 101: 160–176.
Nasseri M, Zahraie B. 2010. Application of simple clustering on space time mapping of
mean monthly rainfall patterns. International Journal of Climatology, Published online
in Wiley InterScience. doi:10.1002/joc.2109.
Nnaji AO. 2001. Forecasting seasonal rainfall for agricultural decision-making in
northern Nigeria. Agricultural and Forest Meteorology 107: 193–205.
Oettli P, Camberlin P. 2005. Influence of topography on monthly rainfall distribution
over East Africa. Climate Research 28: 199–212.
Piantadosi J, Boland JW, Howlett PG. 2009. Generating synthetic rainfall on various
timescales—daily, monthly and yearly. Environmental Modeling and Assessment 14:
431–438.
R Development Core Team. 2010. R: A Language and Environment for Statistical
Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN
3-900051-07-0, URL http://www.R-project.org.
Richardson CW, Wright DA. 1984. WGEN: A Model for Generating Daily Weather
Variables. Technical Report. United States Department of Agriculture, Agriculture
Research Service.
Rosenberg K, Boland JW, Howlett PG. 2004. Simulation of monthly rainfall totals.
ANZIAM Journal 46: 85–104.
Smyth GK. 1996. Regression analysis of quantity data with exact zeroes. In:
Proceedings of the Second Australia–Japan Workshop on Stochastic Models in
Engineering, Technology and Management. Technical report. Technology Management
Centre, University of Queensland, pp. 572–580.
Smyth GK, with contributions from Hu Y, Dunn PK. 2009. Statmod: Statistical
Modelling. R Package Version 1.4.1.
Stern RD, Coe R. 1984. A model fitting analysis of daily rainfall data (with discussion).
Journal of the Royal Statistical Society Series A 147: 1–34.
Tweedie MCK. 1984. An index which distinguishes between some important
exponential families. Statistics: applications and new directions. In: Proceedings of the
Indian Statistical Institute Golden Jubilee International Conference. Technical Report.
Indian Statistical Institute, Calcutta.
Velarde LGC, Migon HS, Pereira BB. 2004. Space-time modeling of rainfall data.
Environmetrics 15: 561–576.
Westra S, Sharma A. 2009. Estimating the seasonal predictability of global
precipitation: an empirical approach. Technical Report. In: 18th World
IMACS/MODSIM Congress, Cairns, Australia.
Wilks DS. 1990. Maximum likelihood estimation for the gamma distribution using data
containing zeros. Journal of Climate 3: 1495–1501.
Wilks DS. 1999. Interannual variability and extreme-value characteristics of several
stochastic daily precipitation models. Agricultural and Forest Meteorology 93: 153–169.
Effect of climatology on monthly rainfall of Australia using Tweedie GLMs
CHAPTER 6: UNDERSTANDING THE EFFECT OF
CLIMATOLOGY ON MONTHLY RAINFALL AMOUNTS IN
AUSTRALIA USING TWEEDIE GLMS
Authors: Md Masud Hasan, Peter K. Dunn
Affiliations: School of Health and Sport Sciences, Faculty of Science, Health and
Education, University of the Sunshine Coast.
Journal: International Journal of Climatology, Published online, DOI: 10.1002/joc.2332
JCR ranking: 18/63 (Meteorology and Atmospheric Sciences)
Impact factor: 2.347
ARC tier ranking: A
Statement of Intellectual Contribution
I, Md Masud Hasan, have made substantial independent intellectual contributions to the
research paper 'Understanding the effect of climatology on monthly rainfall amounts in
Australia using Tweedie GLMs'. The intellectual contributions include the development
of the study hypothesis, identification and modification of methodology, development
of new applications, responsibility for independent analysis and interpretation of results.
…………………………….
…………………………………………
Md Masud Hasan
Date
………………………………..
……………………………..............
Peter K Dunn
Date
Abstract
Rainfall models are used to understand the effect of various climatological variables on
rainfall amounts. The models also have potential uses in predicting and simulating
rainfall. We use Tweedie generalized linear models to model monthly rainfall amounts
and occurrence simultaneously with a set of predictors (sine term, cosine term, NINO
3.4, SOI and SOI phase). Models are fitted to the monthly rainfall data of 220
Australian stations with four stations as case studies. First, models with only sine and
cosine terms (the base model) are fitted to model the cyclic pattern of rainfall data, and
then one of the climatological variables is added each time in addition to the base
model. On the basis of the BIC, the model with NINO 3.4 is preferred for most of the
studied stations. Stations for which the model using the SOI is preferred appear in small
clusters. Adding the climatological variables to the base model improves the fit of the
model and makes substantial changes in the predicted mean monthly rainfall amount and
probability of getting a dry month. The climatological variables have significant
impacts on the amount of rainfall in most stations located in the eastern and
north-eastern regions of Australia. The models use lag-one values of the climatological
covariates (i.e. the value of the covariate in the previous month is used with the rainfall
amount of a given month) and are useful for one-month lead rainfall prediction.
KEY WORDS: Tweedie generalized linear model, Poisson–gamma model,
Climatology, rainfall Australia.
Introduction
Rainfall models are useful for understanding the impact of different climatological
variables on the occurrence and amount of rainfall, and have potential uses in predicting
and simulating rainfall. Consequently rainfall models are useful for modelling the
growth and development of crops, developing crop simulation models, and for
agricultural planning and management (Lennox et al., 2004; Shui and Haque, 2004;
Hansen et al., 2009). Prediction of monthly and seasonal rainfall amounts is useful in
determining supplemental irrigation, water requirements, storage of water, and for
reservoir management.
The use of the simplest statistical model, the linear regression model, is not appropriate
for modelling monthly rainfall amounts for Australian stations, which are generally
highly right skewed. For modelling skewed data, transformations of the rainfall totals
are commonly used (Mooley, 1973; Meng et al., 2007). Sometimes, rainfall totals are
converted to rainfall anomalies (Lorenzo et al., 2010) or a standardized precipitation
index (Loukas and Vasiliades, 2004; Almeira and Scian, 2006; Gonzalez and Cariaga,
2009) to fit linear regression models to the converted indices. However, because of the
transformation some information regarding the rainfall data is lost.
For modelling right skewed rainfall amounts, the assumption of a gamma distribution
has been used extensively in the literature on rainfall modelling (Allan and Haan, 1975;
Feuerverger, 1979; Chapman, 1998; Wilks, 1999; Chandler and Wheater, 2002), often
with variations or extensions (Das, 1955; Stern and Coe, 1984; Wilks, 1990).
Another difficulty when modelling monthly rainfall is that rainfall may have both
discrete (exact zero for dry months) and continuous (amount of rainfall for wet months)
components. For the rainfall models used in crop simulation it is a common practice to
use Markov chains to model the occurrence and gamma distributions to model the
amount of rainfall (Richardson and Wright, 1984; Hamlin and Rees, 1987; Rosenberg et
al., 2004; Hansen and Ines, 2005; Abtew et al., 2009). Buishand et al. (2004) used
logistic regression to model occurrence and a gamma distribution to model the amount of
monthly rainfall. To model continuous data with exact zeros, some authors have proposed
mixtures of Bernoulli and gamma or log-normal distributions (Fernandes
et al., 2009; Little et al., 2009; Piantadosi et al., 2009). An alternative approach was
adopted by Glasbey and co-workers (Glasbey and Nevison, 1997; Durban and Glasbey,
2001), who applied a monotonic transformation of rainfall data to define a latent
Gaussian variable with zero rainfall corresponding to censored values below some
threshold (1.05 mm).
The rainfall models based on the Tweedie distributions (Smyth, 1996; Jørgensen, 1997;
Dunn, 2004; Dunn and Smyth, 2005) model the exact zeros and the amount of rainfall
simultaneously. The models can be used in the generalized linear modelling framework.
Previously, the Tweedie distributions have been shown to model monthly rainfall data
well when separate models are fitted for each month with no predictors (Hasan and
Dunn, 2010a). Cyclical patterns are likely to be evident in rainfall; for example, most
locations have drier and wetter months consistently from year to year. Using Tweedie
generalized linear models (GLMs), Hasan and Dunn (2010b) fitted models with sine
and cosine terms as predictors to model the monthly rainfall amounts in Australia.
Apart from the cyclical patterns, climatological factors, such as the Southern Oscillation
Index (SOI) and the sea surface temperature anomaly, influence monthly rainfall. The
relationship between the SOI and rainfall in Australia has been well known for many
years and studied by numerous authors (e.g. Troup, 1965; Quinn and Burt, 1972; Power
et al., 1997; Simmonds and Hope, 1997; Chowdhury and Beecham, 2010). Correlation
coefficients and linear regression (Power et al., 1997; Simmonds and Hope, 1997;
Almeira and Scian, 2006) have been used to understand the relationship between SOI
and amount of rainfall. Using the SOI, Stone and Auliciems (1992) constructed five SOI
phases which can then be used to study the effect of the SOI on rainfall amounts. The
SOI phases have proven useful for studying their effects on rainfall and on different
types of cropping in Australia (e.g. Stone and McKeon, 1993; Hammer et al., 1996;
Meinke et al., 1996; Meinke and Ryley, 1997; Stone et al., 1996b; Willcocks and Stone,
2000). The NINO 3.4 index is one of the El Niño Southern Oscillation indicators based
on sea surface temperature and is related to the monthly rainfall amounts of Australia.
Several authors used linear regression analysis to study the effect of NINO 3.4 on the
rainfall amounts (Wu and Kirtman, 2007; Everingham and Reason, 2009; Lee et al.,
2009). The use of linear regression models is not appropriate for understanding the
relationship between the climatological variables and monthly rainfall totals of
Australian stations, because the rainfall distribution is not normal and the relationship is
not approximately linear.
In this paper, Tweedie GLMs will be fitted to understand the effect of the
abovementioned climatological variables on monthly rainfall amounts of Australian
rainfall stations. First, the simplest model with sine and cosine terms as predictors will
be fitted (the base model) as in Hasan and Dunn (2010b). Then one of the three
climatological variables (NINO 3.4, SOI and SOI phase) will be added each time to the
base model. Thus, in the study, we have four different models (one base model and
three models with climatological covariates). The impacts of the climatological
variables on monthly rainfall totals after adjusting for the sine and cosine terms will be
studied. We then examine the preferred model for each of the 220 Australian stations
and identify geographical regions where each climatological variable is superior for
modelling monthly rainfall. Statistical methods will be used to assess the extent to
which each model with a climatological variable improves on the base model for modelling
monthly rainfall in different regions across Australia. Values of the climatological
variables from the previous month (i.e. a lag of one) will be used in the models for
one-month lead prediction of rainfall. We will also use the features of Tweedie models to
simultaneously examine the improvement in predicted mean rainfall and predicted
probability of no rainfall using each climatological variable.
We first discuss the data (Section 2), and then introduce the Tweedie distributions and
their properties (Section 3). In Section 4, different models, different model comparison
criteria and test statistics are defined, and a detailed interpretation of the results is
presented. Some concluding comments are made in Section 5.
Data
Monthly rainfall data from 220 Australian stations are studied (data obtained from the
Australian Bureau of Meteorology), with four stations as case studies: Bidyadanga and
Trayning, from Western Australia, Theebine from Queensland, and Clarence from New
South Wales (Figure 1). These stations represent a variety of types of rainfall
distributions in Australia. Most of the studied stations (92%) have data available from
1910 or earlier to the end of 2007. The remaining stations are generally located in
remote areas, and have rainfall records from 1950 or earlier to the end of 2007.
Figure 1 Location of the stations studied. The four case studies mentioned in the paper
are named and indicated using squares; grey dots represent the other studied stations.
NINO 3.4 is the average sea surface temperature anomaly in the region bounded by
5°N–5°S, 120°–170°W. The region has large variability on El Niño time scales, and so is
used by many authors to understand its impact on rainfall amounts for different parts
of Australia (Wu and Kirtman, 2007; Everingham and Reason, 2009; Lee et al., 2009).
The monthly NINO 3.4 data used in the study were obtained from
http://www.cgd.ucar.edu/cas/catalog/climind/TNI_N34/index.html#Sec5. In the early part
of the record there is a considerable difference in the estimated magnitude of El Niño
events (reconstructed versus reanalysed), even though strong correlation exists between
the two time series (Giese et al., 2010).
The SOI measures standardized fluctuations in the air pressure difference between Tahiti and
Darwin (Troup, 1965). Sustained negative values of the SOI often indicate El Niño
episodes. These negative values are usually accompanied by sustained warming of the
central and eastern tropical Pacific Ocean, a decrease in the strength of the Pacific Trade
Winds, and a reduction in rainfall over eastern and northern Australia. Positive values of
the SOI are associated with stronger Pacific trade winds and warmer sea temperatures to
the north of Australia, known as a La Niña episode. Waters in the central and eastern
tropical Pacific Ocean become cooler during this time. The phases of the SOI (Stone
and Auliciems, 1992; Stone et al., 1996) are defined using a cluster analysis to group all
sequential two-month pairs of the SOI into five clusters. The phases are consistently
negative, consistently positive, falling, rising and consistently near zero. The SOI and
SOI phase data used in the study were obtained from
http://www.longpaddock.qld.gov.au/seasonalclimateoutlook/.
Lagged values of the climatological variables will be used; that is, the models use the
value of the climatological variables in the previous month to model the rainfall amount of
a given month.
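The alignment implied by a lag of one can be made concrete with a small sketch that pairs each month with the climate index observed one month earlier. The function name, the annual harmonic form $\sin(2\pi m/12)$, $\cos(2\pi m/12)$ and the index values are assumptions made for illustration; the exact form of the sine and cosine terms is not restated in this section.

```python
import math

def design_rows(months, covariate):
    """Build rows [1, sin, cos, lagged covariate]; the first record has no lag.

    months    : month numbers 1..12 in time order
    covariate : climate index (e.g. SOI) for the same months, in the same order
    """
    rows = []
    for t in range(1, len(months)):
        angle = 2.0 * math.pi * months[t] / 12.0     # assumed annual harmonic
        rows.append([1.0, math.sin(angle), math.cos(angle), covariate[t - 1]])
    return rows

months = [11, 12, 1, 2]
soi = [-3.1, 4.0, 7.5, -1.2]    # made-up index values, for illustration only
rows = design_rows(months, soi)
```

Here the December row carries the November index value, so a model fitted to such rows can predict a month's rainfall from information available one month ahead.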
Models
Exponential Dispersion Models
Generalized linear models (GLMs), as used in this paper, are discussed in Section 3.3.
Here, we introduce the probability models upon which GLMs are based in general; the
Tweedie exponential dispersion models in particular are introduced in the following section.
A probability function of the form
$$f(y; \theta, \phi) = a(y, \phi) \exp\left\{ \frac{1}{\phi} \left[ y\theta - \kappa(\theta) \right] \right\} \qquad (1)$$
for $y \in S$, for some suitable function $a(y, \phi)$ and known functions $\theta$ and $\kappa(\cdot)$, is
called an exponential dispersion model (EDM). The mean is $\mu = \kappa'(\theta)$ and $\phi > 0$ is the
dispersion parameter. The function $a(y, \phi)$ cannot always be written in closed form and
is the function necessary to ensure the total integral or summation of the probability
function over the domain $S$ is one. Examples of EDMs include the normal, binomial, gamma
and Poisson distributions.
The notation $Y \sim \mathrm{ED}(\mu, \phi)$ indicates the random variable $Y$ comes from an EDM with
location parameter $E[Y] = \mu = \kappa'(\theta)$ and variance $\mathrm{var}[Y] = \phi\kappa''(\theta)$, as in Equation (1).
The functional relationship between $\mu$ and $\theta$ defined by $\mu = \kappa'(\theta)$ is invertible, so the
variance can be written as $\mathrm{var}[Y] = \phi V(\mu)$, where $V(\mu)$ is called the variance function.
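As a brief worked illustration of the EDM form in Equation (1), the Poisson distribution with mean $\mu$ can be written in that form as follows. This uses the standard Poisson probability function and is offered only as an example of how the components of an EDM are identified.

```latex
\begin{align*}
f(y; \mu) &= \frac{e^{-\mu}\mu^{y}}{y!}
           = \frac{1}{y!}\exp\{\, y\log\mu - \mu \,\}, \qquad y = 0, 1, 2, \dots \\
\theta &= \log\mu, \quad \kappa(\theta) = e^{\theta}, \quad \phi = 1, \quad
          a(y, \phi) = \frac{1}{y!}, \\
\kappa'(\theta) &= e^{\theta} = \mu, \qquad
\kappa''(\theta) = e^{\theta} = \mu
          \;\Rightarrow\; V(\mu) = \mu .
\end{align*}
```

The variance function is therefore the first power of the mean, with dispersion fixed at one.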
The Tweedie Family
A special case of EDMs is the Tweedie family of distributions studied by Jørgensen
(1987, 1997) and named in honour of Tweedie (1984). The Tweedie family of
distributions are those EDMs with variance function $V(\mu) = \mu^p$ for $p \notin (0, 1)$
(Jørgensen, 1987). For further information, see Smyth (1996), Jørgensen (1997) or
Dunn and Smyth (2005). The Tweedie distribution with mean $\mu$, dispersion parameter
$\phi$ and index parameter $p$ is denoted $\mathrm{Tw}_p(\mu, \phi)$.
To determine the appropriate distribution for modelling the monthly rainfall data, we
examine the mean-variance relationship of the monthly rainfall totals. By plotting the
log of mean against the log of variance of monthly rainfall totals, Hasan and Dunn
(2010b) show that the relationship can be suitably expressed as
\[
\log(\text{group variance}) = \alpha + p \log(\text{group mean}).
\]
Rearranging,
\[
\text{group variance} = \exp\{ \alpha + p \log(\text{group mean}) \}
= \text{const} \times (\text{group mean})^{p}.
\]
That is, the variance is approximately proportional to some power of the mean. Hence,
Tweedie distributions are appropriate for modelling monthly rainfall totals of Australian
rainfall stations.
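The slope of this log-log relationship can be estimated by ordinary least squares. The
sketch below is only an illustration of that idea: the group means and variances are
made-up numbers constructed to follow an exact power law, and the function name
`fit_power_law` is hypothetical rather than taken from any package used in this chapter.

```python
import math

def fit_power_law(group_means, group_variances):
    """Least-squares fit of log(variance) = alpha + p * log(mean)."""
    xs = [math.log(m) for m in group_means]
    ys = [math.log(v) for v in group_variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    p = sxy / sxx                 # slope: the power of the mean
    alpha = y_bar - p * x_bar     # intercept: log of the proportionality constant
    return alpha, p

# Hypothetical group means (mm); variances built as 10.84 * mean**1.6 for illustration.
means = [5.0, 20.0, 60.0, 120.0, 250.0]
variances = [10.84 * m ** 1.6 for m in means]
alpha_hat, p_hat = fit_power_law(means, variances)
```

With real station data the points scatter about the line, so the fitted slope is only a
rough guide to the index parameter.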
There are four notable special cases of Tweedie distributions: the normal distribution
($p = 0$), the Poisson distribution ($p = 1$, $\phi = 1$), the gamma distribution ($p = 2$)
and the inverse Gaussian distribution ($p = 3$). Apart from these special cases, the
probability functions for the Tweedie distributions have no closed form. For $p \ge 2$, the
distributions are suitable for modelling positive, right-skewed data. Of special interest
are the distributions for which $1 < p < 2$, also called the Poisson–gamma distributions
(Dunn and Smyth, 2005). In this context, the probability distributions for which
$1 < p < 2$ can be developed as follows.
Assume any rainfall event $i$ produces an amount of rainfall $R_i$, and that each $R_i$
comes from a gamma distribution $\mathrm{Gam}(\alpha, \gamma)$. In this parameterization,
the mean is $\alpha\gamma$ and the variance is $\alpha\gamma^{2}$. Assume the number of
rainfall events in any one month is $N$, where $N$ has a Poisson distribution with mean
$\lambda$; that is, $N \sim \mathrm{Pois}(\lambda)$. This allows for months with no
rainfall, when $N = 0$. The total monthly rainfall, $Y$, is the sum
$Y = R_1 + R_2 + \cdots + R_N$; when $N = 0$, then $Y = 0$.
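The construction above can be checked by simulation. The following is a minimal sketch
using only the Python standard library; the parameter values are illustrative, and the
helper names `poisson_draw` and `monthly_total` are hypothetical. It assumes the gamma
distribution is parameterized by shape $\alpha$ and scale $\gamma$, so that each event has
mean $\alpha\gamma$.

```python
import math
import random

def poisson_draw(rng, lam):
    """Draw from Pois(lam) by Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def monthly_total(rng, lam, alpha, gamma):
    """Y = R_1 + ... + R_N with N ~ Pois(lam) and R_i ~ Gam(shape alpha, scale gamma)."""
    n_events = poisson_draw(rng, lam)
    return sum(rng.gammavariate(alpha, gamma) for _ in range(n_events))

rng = random.Random(2011)
lam, alpha, gamma = 1.58, 2.0 / 3.0, 116.59   # illustrative values only
totals = [monthly_total(rng, lam, alpha, gamma) for _ in range(100_000)]

prop_zero = sum(1 for y in totals if y == 0) / len(totals)   # compare with exp(-lam)
sample_mean = sum(totals) / len(totals)                      # compare with lam * alpha * gamma
```

The simulated proportion of exact zeros should be close to $\exp(-\lambda)$ and the sample
mean close to $\lambda\alpha\gamma$, which is the sense in which one distribution describes
both occurrence and amount.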
One of the important properties of Tweedie GLMs is that they provide a mechanism for
understanding the fine-scale structure in coarse-scale data, and consequently are useful
for the disaggregation of monthly rainfall to a daily timescale for incorporation into
cropping system and other models (Dunn, 2004). This model is used in this paper to
model monthly rainfall, and complements the work of Chandler and Wheater (2002),
Hamlin and Rees (1987), Buishand et al. (2004) and others, who modelled the occurrence
and quantity of rainfall with two separate models. Smyth (1996), Dunn (2004), Lennox
et al. (2004), Dunn and White (2005) and Hasan and Dunn (2010a, 2010b) used these
distributions in related contexts.
Generalized Linear Models
GLMs consist of two components (McCullagh and Nelder, 1989; Dobson and Barnett,
2008):
1. The response variable $Y_i$ follows an EDM family distribution, with mean $\mu_i$ and
dispersion parameter $\phi$, such that $Y_i \sim \mathrm{ED}(\mu_i, \phi / w_i)$ for
$i = 1, 2, \ldots, n$, where $w_i > 0$ are known prior weights. This is the random component.
2. The mean $\mu$ is related to the predictors through a monotonic, differentiable link
function $g(\cdot)$ so that $g(\mu) = X\beta$, where $X = [x_1, x_2, \ldots, x_p]$ is a
matrix; $x_1, x_2, \ldots, x_p$ are $(n \times 1)$ vectors of predictors and $\beta$ is a
$(p \times 1)$ vector of unknown regression coefficients. This is the systematic component.
Often, the linear predictor $X_i^{T}\beta$ is denoted by $\eta_i$, so that
$g(\mu_i) = \eta_i = X_i^{T}\beta$, where $X_i$ is row $i$ of matrix $X$.
Model Fitting
Fitting the Tweedie family requires estimates of $\beta$, $\phi$ and $p$. Estimating
$\beta$ for given $p$ is performed using a usually-robust iterative procedure called
iteratively reweighted least-squares (McCullagh and Nelder, 1989). Many software packages
fit GLMs; here, we use R (R Development Core Team, 2010), the techniques expounded in
Dunn and Smyth (2005, 2008), and the corresponding R packages statmod (Smyth, 2009) and
tweedie (Dunn, 2010).
To get the maximum likelihood estimate (MLE) of $p$, a profile (log-) likelihood plot is
used; this requires the computation of the density. To estimate $p$ for any postulated
model, proceed as follows. For a given value of $p$ assumed fixed, the MLEs of $\beta$ and
$\phi$ are found as above, and the log-likelihood computed. This is repeated for a range of
$p$ values and, because of the associated computational burden, a cubic spline
interpolation through these computed points is fitted. The value of $p$ for which the
log-likelihood is maximized is chosen as the MLE, $\hat{p}$. R functions (Dunn, 2010; Smyth,
2009) are used to automate the process. The probability of a month having no rainfall,
$\pi_0 = P(Y = 0)$, is a function of $\phi$. Finding the MLE of $\pi_0$ requires using the
MLE of $\phi$. The MLE of $\phi$ is difficult to compute but the algorithms used to
estimate $p$ also compute the MLE of $\phi$ (Dunn and Smyth, 2005).
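To make the iteratively reweighted least-squares step concrete, here is a bare-bones
sketch for a log link with variance function $V(\mu) = \mu^{p}$ and $p$ held fixed. It is
not the implementation in statmod or tweedie: the data are made up, the function names
`solve` and `irls_log_link` are hypothetical, and a fixed number of iterations stands in
for a proper convergence check.

```python
import math

def solve(a, b):
    """Solve a x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def irls_log_link(X, y, p, n_iter=200):
    """Estimate beta for log(mu) = X beta with variance function V(mu) = mu**p."""
    k = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (k - 1)   # column 0 is the intercept
    for _ in range(n_iter):
        eta = [sum(x * b for x, b in zip(row, beta)) for row in X]
        mu = [math.exp(e) for e in eta]
        w = [m ** (2.0 - p) for m in mu]                        # (dmu/deta)^2 / V(mu)
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        xtwx = [[sum(wi * r[j] * r[l] for wi, r in zip(w, X)) for l in range(k)]
                for j in range(k)]
        xtwz = [sum(wi * r[j] * zi for wi, r, zi in zip(w, X, z)) for j in range(k)]
        beta = solve(xtwx, xtwz)
    return beta

# Made-up monthly totals (mm) for two years, including exact zeros.
months = list(range(1, 13)) * 2
factors = [1.0, 0.6, 1.4, 0.0, 1.5, 1.1] * 4
X = [[1.0, math.sin(2 * math.pi * m / 12), math.cos(2 * math.pi * m / 12)] for m in months]
y = [f * math.exp(3.0 + 1.0 * row[1] + 0.5 * row[2]) for f, row in zip(factors, X)]

beta_hat = irls_log_link(X, y, p=1.6)
mu_hat = [math.exp(sum(x * b for x, b in zip(row, beta_hat))) for row in X]
# Estimating equations: sum_i x_ij (y_i - mu_i) mu_i^(1 - p) should be close to zero.
score = [sum(r[j] * (yi - m) * m ** (1 - 1.6) for r, yi, m in zip(X, y, mu_hat))
         for j in range(3)]
```

Estimating $\phi$ and $p$ additionally requires the Tweedie density, which this sketch does
not attempt.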
Model Results
The Models
Models of the form $\log \mu_i = X_i^{T}\beta$ are fitted, based on Tweedie EDMs. To fit
the models, the index parameter $p$ is first estimated to determine the particular Tweedie
distribution for each model. Except for four rainfall stations (which have $\hat{p}$ close
to but larger than two), all the 220 studied stations have $\hat{p}$ between 1 and 2, and
for most of the studied stations $\hat{p}$ is close to 1.6.
For simplicity, we propose models with a single value $\hat{p} = 1.6$ for all the studied
stations rather than using different values of $p$ for different stations. Estimating $p$
for each station is computationally intensive, but once $\hat{p} = 1.6$ is decided upon the
models are very quick and easy to fit. The value of $\hat{p}$ makes no difference to the
estimates of the regression coefficients, but has a small impact on the monthly rainfall
variation and on $\pi_0$. Provided a station's own $\hat{p}$ is not too far from 1.6, the
impacts are usually not too great, and $p$ can still be estimated for each station if
necessary. Using $\hat{p} = 1.6$ for each station performs very well in practice (for
details, see Hasan and Dunn, 2010b).
The four models considered here have the following systematic components:
1. $\log \mu_t = \beta_0 + \beta_1 \sin(2\pi m / 12) + \beta_2 \cos(2\pi m / 12)$;
2. $\log \mu_t = \beta_0 + \beta_1 \sin(2\pi m / 12) + \beta_2 \cos(2\pi m / 12) + \beta_3\,(\text{NINO 3.4})_{t-1}$;
3. $\log \mu_t = \beta_0 + \beta_1 \sin(2\pi m / 12) + \beta_2 \cos(2\pi m / 12) + \beta_4\,(\text{SOI})_{t-1}$;
4. $\log \mu_t = \beta_0 + \beta_1 \sin(2\pi m / 12) + \beta_2 \cos(2\pi m / 12) + \beta_5\,(\text{SOI phase})_{t-1}$;
where $Y_t \sim \mathrm{Tw}_p(\mu_t, \phi)$, $m$ is the month of the year (1 means January,
2 means February, and so on), and $t = 12(j - 1) + m$, where $j = 1$ (first year of studied
data), 2 (second year of studied data), and so on. Model 1 (the base model) is the simplest,
with only sine and cosine terms as predictors, and is nested within the three other models.
Models 2, 3 and 4 consider the lagged values from one of the climatological variables,
NINO 3.4, SOI or SOI phase respectively, in addition to the cyclic sine and cosine terms.
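The predictors in these systematic components can be assembled as in the short sketch
below. It is illustrative only: the function names `time_index` and `design_row` and the
index values in `index_by_t` are hypothetical, not data from the study.

```python
import math

def time_index(year_number, month):
    """t = 12 (j - 1) + m for year j = 1, 2, ... and month m = 1, ..., 12."""
    return 12 * (year_number - 1) + month

def design_row(month, lagged_index=None):
    """Predictors for one month: intercept, sine and cosine terms and, optionally,
    the previous month's value of a climate index such as NINO 3.4 or SOI."""
    angle = 2 * math.pi * month / 12
    row = [1.0, math.sin(angle), math.cos(angle)]
    if lagged_index is not None:
        row.append(lagged_index)
    return row

# Made-up index values for t = 1, ..., 4, used only to show the one-month lag.
index_by_t = {1: 0.4, 2: -0.2, 3: -1.1, 4: 0.3}

t = time_index(1, 3)                                         # March of the first year
row_model1 = design_row(3)                                   # base model
row_model2 = design_row(3, lagged_index=index_by_t[t - 1])   # uses the February value
```

A categorical predictor such as the SOI phase would instead contribute one indicator
column for each non-reference phase.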
The Likelihood Ratio Test (LRT) is used to compare two models, one of which is nested
within the other (that is, the simpler model, Model A, can be obtained from the more
complex model, Model B, by imposing a set of linear constraints on the parameters). Under
the null hypothesis that the additional terms in Model B have no effect, the LRT statistic
follows an F-distribution. For P-values below the significance level, the null hypothesis
is rejected and consequently Model B is assumed to fit significantly better than Model A.
The LRT will be used to compare the fit of Models 2, 3 and 4 with Model 1 (the base model).
Alternative methods for model comparison (nested or non-nested) are the Akaike
Information Criterion (AIC) and the Bayesian Information Criterion (BIC). The AIC
and the BIC balance model simplicity against goodness of fit. Smaller values of the AIC
and the BIC indicate superior models, so the model with the minimum AIC or BIC is the
preferred model. The BIC penalizes complex models more heavily than the AIC, and hence
gives preference to simpler models in selection (Anderson, 2008; Lewis et al., 2010). The
BIC will be used to identify the best-fitting model among the fitted models.
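For reference, the two criteria can be written as small functions using their usual
definitions. This is a sketch: the chapter does not state how parameters were counted (for
example, whether $\phi$ and $p$ are included), so `n_params` is left to the user. The BIC
values in the example are those reported for Bidyadanga in Table 1.

```python
import math

def aic(log_lik, n_params):
    """Akaike Information Criterion: -2 log L + 2 k."""
    return -2.0 * log_lik + 2.0 * n_params

def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion: -2 log L + k log n."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

# BIC values reported for Bidyadanga (Table 1); the smallest value is preferred.
reported_bic = {"Model 1": 8081, "Model 2": 8081, "Model 3": 8075, "Model 4": 8095}
preferred = min(reported_bic, key=reported_bic.get)
```

Because $\log n > 2$ once $n \ge 8$, the BIC penalty per parameter exceeds the AIC penalty
for any realistic rainfall record.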
The Fitted Models and Interpretation
Results of the models fitted to data up to 2007 for the four case study stations are
presented in Tables 1, 2, 3 and 4. To study the impact of SOI phases, phase 1 is taken as
the reference category and the effects of the other phases are compared with this
reference category. Apart from Clarence (where the cosine term has no significant impact),
the cyclical sine and cosine terms have a significant impact on monthly rainfall. For all
four case studies, NINO 3.4 and SOI have significant impacts on monthly rainfall amounts
after adjusting for the sine and cosine terms. For Bidyadanga and Trayning, the SOI phases
do not have a significant impact on the monthly rainfall amount. Among the four studied
models, Model 1 is nested in the other three models. The LRT is used to compare the fit of
Models 2, 3 and 4 with Model 1. When considering all 220 studied stations, the model with
NINO 3.4 fits significantly better (based on the LRT statistic) than the base model for
most of the stations (83.2%). The model with SOI phase fits significantly better than the
base model for just over half of the studied stations (57.7%) (Table 5).
Table 1 The estimated values of the coefficients for the predictors in the fitted rainfall
models for Bidyadanga.

                Model 1            Model 2            Model 3            Model 4
                (Base Model)       (Base Model        (Base Model        (Base Model
                                   + NINO 3.4)        + SOI)             + SOI phase)
LRT statistic                      4.22*              8.41**             2.18
BIC             8081               8081               8075               8095
Constant        2.90 (43.76***)    2.91 (43.92***)    2.89 (44.68***)    2.82 (19.70***)
Sine            1.90 (21.28***)    1.91 (21.47***)    1.93 (22.08***)    1.94 (21.96***)
Cosine          0.72 (8.57***)     0.72 (8.55***)     0.71 (8.70***)     0.71 (8.57***)
NINO 3.4                           -0.11 (-1.98*)
SOI                                                   0.02 (2.83**)
SOI phase
  phase 2                                                                0.29 (1.64)
  phase 3                                                                -0.23 (-1.12)
  phase 4                                                                0.12 (0.66)
  phase 5                                                                -0.02 (-0.10)

Coefficient entries are the estimate $\hat{\beta}_j$ with its t-value in parentheses; the
LRT statistic compares each model with Model 1.
*** p < 0.001; ** 0.001 ≤ p < 0.01; * 0.01 ≤ p < 0.05
Table 2 The estimated values of the coefficients for the predictors in the fitted rainfall
models for Trayning.

                Model 1            Model 2            Model 3            Model 4
                (Base Model)       (Base Model        (Base Model        (Base Model
                                   + NINO 3.4)        + SOI)             + SOI phase)
LRT statistic                      11.86**            5.19*              2.30
BIC             9702               9694               9702               9718
Constant        3.18 (103.32***)   3.19 (105.46***)   3.18 (104.24***)   3.13 (41.74***)
Sine            -0.12 (-2.84**)    -0.13 (-3.02**)    -0.12 (-2.87**)    -0.12 (-2.95**)
Cosine          -0.69 (-16.00***)  -0.70 (-16.59***)  -0.69 (-16.20***)  -0.70 (-16.33***)
NINO 3.4                           -0.10 (-3.38**)
SOI                                                   0.01 (2.23*)
SOI phase
  phase 2                                                                0.210 (2.17)
  phase 3                                                                0.005 (0.04)
  phase 4                                                                -0.003 (-0.03)
  phase 5                                                                -0.007 (-0.07)

Coefficient entries are the estimate $\hat{\beta}_j$ with its t-value in parentheses; the
LRT statistic compares each model with Model 1.
*** p < 0.001; ** 0.001 ≤ p < 0.01; * 0.01 ≤ p < 0.05
Table 3 The estimated values of the coefficients for the predictors in the fitted rainfall
models for Theebine.

                Model 1            Model 2            Model 3            Model 4
                (Base Model)       (Base Model        (Base Model        (Base Model
                                   + NINO 3.4)        + SOI)             + SOI phase)
LRT statistic                      19.61***           13.51***           3.64**
BIC             12117              12102              12109              12129
Constant        4.27 (160.40***)   4.28 (161.30***)   4.27 (161.11***)   4.08 (61.11***)
Sine            0.47 (12.52***)    0.48 (12.91***)    0.47 (12.77***)    0.47 (12.49***)
Cosine          0.47 (12.65***)    0.47 (12.60***)    0.47 (12.60***)    0.48 (12.72***)
NINO 3.4                           -0.11 (-4.31***)
SOI                                                   0.01 (3.66***)
SOI phase
  phase 2                                                                0.31 (3.61***)
  phase 3                                                                0.21 (2.14*)
  phase 4                                                                0.25 (2.78**)
  phase 5                                                                0.15 (1.84)

Coefficient entries are the estimate $\hat{\beta}_j$ with its t-value in parentheses; the
LRT statistic compares each model with Model 1.
*** p < 0.001; ** 0.001 ≤ p < 0.01; * 0.01 ≤ p < 0.05
Table 4 The estimated values of the coefficients for the predictors in the fitted rainfall
models for Clarence.

                Model 1            Model 2            Model 3            Model 4
                (Base Model)       (Base Model        (Base Model        (Base Model
                                   + NINO 3.4)        + SOI)             + SOI phase)
LRT statistic                      14.36***           5.69*              3.34
BIC             12616              12605              12616              12627
Constant        4.47 (177.49***)   4.48 (177.63***)   4.47 (177.55***)   4.41 (70.81***)
Sine            0.33 (9.14***)     0.33 (9.30***)     0.33 (9.22***)     0.33 (9.40***)
Cosine          0.06 (1.60)        0.05 (1.43)        0.05 (1.48)        0.04 (1.31)
NINO 3.4                           -0.09 (-3.75***)
SOI                                                   0.01 (2.34*)
SOI phase
  phase 2                                                                0.20 (2.42*)
  phase 3                                                                -0.09 (-0.97)
  phase 4                                                                0.01 (0.16)
  phase 5                                                                0.07 (0.95)

Coefficient entries are the estimate $\hat{\beta}_j$ with its t-value in parentheses; the
LRT statistic compares each model with Model 1.
*** p < 0.001; ** 0.001 ≤ p < 0.01; * 0.01 ≤ p < 0.05
Table 5 Number (percentage) of stations where Models 1, 2, 3 and 4 are preferred using
the BIC, and where Models 2, 3 and 4 fit significantly better than the base model
(Model 1) based on the LRT (among the 220 studied stations).

Model                                 BIC           LRT (compared to Model 1)
Model 1 (Base model)                  45 (20.5)     --
Model 2 (Base model + NINO 3.4)       145 (65.9)    183 (83.2)
Model 3 (Base model + SOI)            29 (13.2)     153 (69.5)
Model 4 (Base model + SOI phase)      1 (0.5)       127 (57.7)

The performance of the four models is also compared using the BIC. On the basis of the
BIC, the model with the SOI is preferred for modelling the monthly rainfall in
Bidyadanga. The model using the NINO 3.4 is preferred for modelling the monthly
rainfall totals for the other three case study stations. Among the four studied models, the
base model and the model with SOI phase are not preferred for modelling the monthly
rainfall totals of any of the four case study stations. When considering all 220 studied
stations (Table 5), the model with NINO 3.4 is preferred for modelling the monthly
rainfall totals for most of the stations (65.9%). The model with SOI phase is preferred
for only one of the studied stations. The base model is preferred for modelling the
monthly rainfall totals of 20.5% of stations (Table 5), suggesting there are a reasonable
number of stations for which none of the climatological variables have a substantial
impact on rainfall.
Maps of Australia in Figure 2 show the stations where each of the four studied models is
preferred (based on the BIC). Clearly, as indicated in Table 5, the model with NINO 3.4
is generally the preferred model. However, rainfall in south-east Australia is better
modelled using the SOI. Stations where the base model is preferred are clustered in
western inland Australia and in southern Australia.
Figure 2 Stations where Model 1, 2, 3 or 4 is preferred (on the basis of BIC).
Figure 3 Contour maps of Australia showing the regression coefficients of NINO 3.4
(from Model 2) and SOI (from Model 3) on rainfall amounts.
The regression coefficients from all the 220 studied stations for NINO 3.4 (from Model
2) and SOI (from Model 3) are represented by contour maps in Figure 3. Model 4
considers one categorical variable, the SOI phase, which has five levels and so produces
four extra regression coefficients along with the other coefficients. The regression
coefficients from Model 4 are not presented as contour maps, but the fit of each of the
three models is compared with the base model using the LRT, and the P-values are
presented as contour maps in Figure 4. When constructing the contour maps, Kriging
(Diggle and Ribeiro, 2007) is used to interpolate the values at unobserved locations
from nearby stations.
Figure 4 Contour maps of Australia showing the P-values from LRT for Model 2
(NINO 3.4), Model 3 (SOI) and Model 4 (SOI phase) comparing with Model 1 (base
model).
The values of the regression coefficients for the two models are not comparable, as the
models consider variables on different scales. After adjusting for the cyclical features
of the rainfall data, NINO 3.4 has a negative impact on rainfall almost everywhere in
Australia, indicating that when the value of the index increases the amount of rainfall
decreases.
Larger negative impacts of NINO 3.4 are observed on monthly rainfall amounts in
eastern and north-eastern Australia and along the coastline of western Australia. The
SOI has a positive impact on rainfall in most of Australia, indicating that when the value
of the SOI increases, the amount of rainfall also increases. Negative regression
coefficients for the SOI are observed in some places in south-west Australia. However,
only a few stations from this region are included in the study. Like NINO 3.4, a larger
impact of SOI is observed on monthly rainfall amounts in eastern and north-eastern
Australia.
From Figure 4, observe that models with the climatological variables fit significantly
better than the base model for the rainfall data from most places in north-eastern
and eastern Australia and in western parts of Tasmania. The model with NINO 3.4 also
fits significantly better for some places on the coastline of western Australia. For
stations in southern, central, inland western and north-western parts of Australia, the
models with the climatological variables show no significant improvement over the base
model.
To interpret the coefficients, recall the model uses the logarithmic link function. So,
for example, for Model 2, for a given rainfall station, the mean amount of rainfall for
month $t$ is
\[
\hat{\mu}_t = \exp(\hat{\beta}_0) \times \exp\{\hat{\beta}_1 \sin(2\pi m / 12)\}
\times \exp\{\hat{\beta}_2 \cos(2\pi m / 12)\}
\times \exp\{\hat{\beta}_3\,(\text{NINO 3.4})_{t-1}\}.
\]
To understand this interpretation, use, as an example, the values of the climatological
variables from February 2008 to predict March 2008 rainfall of Bidyadanga. The values
of the NINO 3.4, SOI and SOI phase for February 2008 are -1.87, 20.99 and 2
respectively. For March 2008, the mean predicted rainfall amounts for Bidyadanga from
Models 1, 2, 3 and 4 are 122.85 mm, 152.48 mm, 173.79 mm and 158.06 mm respectively.
Model 1 includes only the sine and cosine terms as predictors, so the predicted rainfall
amounts $\hat{\mu}$ follow the same recurrent pattern in every year of the study period.
The predicted mean rainfall amounts from Models 2, 3 and 4 have some year-to-year
variations depending on the NINO 3.4, SOI and SOI phase.
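The March 2008 predictions for Bidyadanga can be reproduced approximately from the rounded
coefficients in Table 1. The sketch below makes two assumptions that should be read as
such: the February 2008 NINO 3.4 value is taken to be negative (-1.87), since only a
negative value is consistent with Model 2 predicting more rain than the base model, and
the function name `predicted_mean` is hypothetical. Model 3 is omitted because its SOI
coefficient is reported to only two decimal places (0.02), which is too coarse to recover
the published prediction.

```python
import math

def predicted_mean(coefs, month, extra=0.0):
    """exp(b0 + b1 sin(2 pi m / 12) + b2 cos(2 pi m / 12) + extra), in mm."""
    b0, b1, b2 = coefs
    angle = 2 * math.pi * month / 12
    return math.exp(b0 + b1 * math.sin(angle) + b2 * math.cos(angle) + extra)

nino_feb_2008 = -1.87   # assumed sign; see the note above
march = 3

# Rounded coefficients from Table 1 (constant, sine, cosine) for each model.
model1 = predicted_mean((2.90, 1.90, 0.72), march)
model2 = predicted_mean((2.91, 1.91, 0.72), march, extra=-0.11 * nino_feb_2008)
model4 = predicted_mean((2.82, 1.94, 0.71), march, extra=0.29)   # SOI phase 2
```

The results agree with the reported 122.85 mm, 152.48 mm and 158.06 mm to within about
two per cent, the remaining gap being due to rounding of the coefficients.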
Figure 5 Percentage change in predicted rainfall amount for the extra variable (NINO
3.4, SOI or SOI phase) with sine and cosine terms as variables.
To compare the results from the models having climatological variables with the base
model, percentage changes in predicted mean monthly rainfall amounts are measured
for the four case study stations. The changes for the three models for the months of
2008 and 2009 are presented in Figure 5. The horizontal line through zero represents the
predictions made by the reference base model, and the other lines show the percentage
change in the predicted mean monthly rainfall due to adding the climatological variable
to the base model. Values further from the zero line indicate a greater percentage change
in the predicted mean monthly rainfall amount relative to the base model. From the figure,
substantial changes in the predicted monthly rainfall amounts are observed when the
climatological variables are added to the base model. The lines for the percentage changes
in monthly rainfall amounts have almost the same trends for all the case study stations.
However, the percentage changes are higher for the drier rainfall station (Bidyadanga)
and lower for the wetter rainfall station (Clarence).
Figure 6 Percentage change in predicted probability of no rainfall amount for the extra
variable (NINO 3.4, SOI or SOI phase) with sine and cosine terms as variables.
The Tweedie distribution parameters $(\mu, p, \phi)$ can be reparameterized to the Poisson
and gamma parameters $(\lambda, \gamma, \alpha)$ when $1 < p < 2$ (Dunn, 2004), providing
approximate down-scaling information about monthly rainfall. The transformation between
the parameterizations (when $1 < p < 2$) is:
\[
\lambda = \frac{\mu^{2-p}}{\phi\,(2 - p)}; \qquad
\gamma = \phi\,(p - 1)\,\mu^{p-1}; \qquad
\alpha = \frac{p - 2}{1 - p}.
\]
The mean number of rainfall events $\lambda$, the scale parameter of the rainfall gamma
distribution $\gamma$ and the mean amount of rain per rainfall event $\alpha\gamma$ can
then be computed. The maximum likelihood estimate of the dispersion parameter for the
model fitted to the rainfall data of Bidyadanga is $\hat{\phi} = 10.84$. For Bidyadanga,
the predicted mean numbers of rainfall events in March 2008 for Models 1, 2, 3 and 4 are
$\hat{\lambda}$ = 1.58, 1.72, 1.82 and 1.75 respectively. For these models, the scale
parameters of the rainfall gamma distribution are $\hat{\gamma}$ = 116.59, 132.73, 143.56
and 135.62; and the mean amounts of rainfall per event are $\hat{\alpha}\hat{\gamma}$ =
77.73 mm, 88.49 mm, 95.71 mm and 90.41 mm respectively.
Using the model, the parameters can be estimated for other months of Bidyadanga, as
well as for each month of other stations. This use of the Tweedie distribution to
understand the finer timescale structure of monthly rainfall has been applied successfully
to simulate the growing of sorghum (Lennox et al., 2004) using APSIM (McCown et
al., 1996).
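The reported Bidyadanga values for Model 1 can be checked directly from the transformation.
This is a minimal sketch assuming $\hat{p} = 1.6$, $\hat{\phi} = 10.84$ and the predicted
mean of 122.85 mm; the function name `poisson_gamma_parameters` is hypothetical, and
$\alpha$ is written in the equivalent form $(2 - p)/(p - 1)$.

```python
def poisson_gamma_parameters(mu, phi, p):
    """Convert Tweedie (mu, phi, p), 1 < p < 2, to Poisson-gamma (lambda, gamma, alpha)."""
    if not 1.0 < p < 2.0:
        raise ValueError("the Poisson-gamma form requires 1 < p < 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))    # mean number of rainfall events
    gam = phi * (p - 1.0) * mu ** (p - 1.0)      # gamma scale parameter
    alpha = (2.0 - p) / (p - 1.0)                # gamma shape parameter
    return lam, gam, alpha

lam, gam, alpha = poisson_gamma_parameters(mu=122.85, phi=10.84, p=1.6)
mean_per_event = alpha * gam                     # mean rainfall per event (mm)
```

Up to rounding this gives about 1.58 events, a scale of about 116.6 and about 77.7 mm per
event, and the product $\lambda\alpha\gamma$ returns the monthly mean $\mu$.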
Also of interest is how well the model predicts the probability of no rainfall. The
probability of recording no rain is $\pi_0 = \exp(-\lambda)$ (Dunn and Smyth, 2005). Note
that, for some rainfall stations, there are months of the year with no exact zeros in the
observed rainfall records; however, the models permit the possibility of such months
having no rainfall in future.
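This expression follows from the Poisson–gamma construction: each event amount is
strictly positive, so the monthly total is zero only when no event occurs. Written in
terms of the Tweedie parameters (using $\lambda = \mu^{2-p} / \{\phi(2 - p)\}$ for
$1 < p < 2$), a short derivation is
\[
\pi_0 = P(Y = 0) = P(N = 0) = e^{-\lambda}
= \exp\left\{ -\frac{\mu^{2-p}}{\phi\,(2 - p)} \right\},
\]
which shows that, for fixed $p$ and $\phi$, a larger predicted mean $\mu$ always implies a
smaller probability of a month with no rain.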
Using the values of the climatological variables from February 2008 for Bidyadanga,
the probabilities of recording no rainfall in March 2008 under Models 1, 2, 3 and 4 are
0.21, 0.18, 0.16 and 0.17 respectively. Percentage changes in the predicted probability of
no rainfall for Models 2, 3 and 4 with respect to the base model are measured for the
months of 2008 and 2009 and presented in Figure 6. The horizontal line through zero
represents the predictions using the base model, and the other lines represent the
percentage change in the probability of no rainfall due to adding the climatological
variable to the base model. From Figure 6, substantial changes in the predicted
probability of no rainfall are observed for Models 2, 3 or 4 relative to the base model.
For the wetter stations, the percentage changes in the probability of no rainfall for
Models 2, 3 or 4 relative to the base model are larger than for the drier stations. For
drier stations the models produce more stable results (predicted probability of no
rainfall) than for the wetter stations, as the wetter stations have few months with no
rainfall.
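The quoted probabilities, and the kind of percentage change plotted in Figure 6, can be
recomputed from the predicted mean numbers of rainfall events. The sketch below uses the
rounded Bidyadanga values for March 2008 (1.58, 1.72, 1.82 and 1.75); the variable names
are illustrative.

```python
import math

# Predicted mean number of rainfall events for Bidyadanga, March 2008.
lam_by_model = {"Model 1": 1.58, "Model 2": 1.72, "Model 3": 1.82, "Model 4": 1.75}

# Probability of a month with no rainfall: pi_0 = exp(-lambda).
p_zero = {name: math.exp(-lam) for name, lam in lam_by_model.items()}

# Percentage change in pi_0 relative to the base model (Model 1).
base = p_zero["Model 1"]
pct_change = {name: 100.0 * (value - base) / base for name, value in p_zero.items()}
```

Rounded to two decimal places, `p_zero` reproduces 0.21, 0.18, 0.16 and 0.17, and the
percentage changes for Models 2, 3 and 4 are all negative for this month.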
Conclusion
Tweedie GLMs were used to fit the monthly rainfall amounts of 220 Australian rainfall
stations with cyclic sine and cosine terms and the lagged values of the climatological
variables NINO 3.4, SOI and SOI phase. The four models were studied to determine whether
any of the models with climatological variables improve on the base model, which one(s)
perform better than the base model, and where each model with a climatological variable
performs better than the base model.
Based on the BIC, the model with NINO 3.4 is preferred for modelling the monthly
rainfall totals for most of the studied stations. The model with SOI is preferred
for the data for some stations in south-eastern Australia and the model with SOI phase is
preferred for only one of the studied stations. The stations where the base model is
preferred are concentrated in two clusters, in southern Australia and inland Western
Australia.
Based on the LRT, models with the climatological variables fit significantly better than
the base model for modelling the monthly rainfall totals in most places in north-eastern
and eastern parts of Australia. Models with NINO 3.4 also fit well in some places on the
coastline of Western Australia. None of the climatological variables show a significant
impact on the monthly rainfall totals for the stations located in central and
north-western Australia. However, little data is available in these regions.
Substantial changes in the predicted amount and probability of no rainfall are observed
due to adding the climatological variables to the base model.
The sophisticated Tweedie generalized linear models were used for predicting various
features of rainfall, such as the probability of no rainfall, the monthly rainfall amounts,
the number of rainfall events in each month and the amount of rainfall per event. These
features of rainfall can be used in many areas of planning in agriculture and hydrology.
Acknowledgements
The comments of the reviewers and the editor are most gratefully acknowledged; they
improved the flow, interpretation and understanding of the paper.
References
Abtew W, Melesse AM, Dessalegne T. 2009. Spatial, inter and intra-annual variability
of the upper Blue Nile basin rainfall. Hydrological Processes 23: 3075–3082.
Allan DM, Haan CT. 1975. Stochastic simulation of daily rainfall. Technical Report 82,
Water Resources Institute, University of Kentucky.
Almeira GJ, Scian B. 2006. Some atmospheric and oceanic indices as predictors of
seasonal rainfall in the Del Plata Basin of Argentina. Journal of Hydrology 329: 350–
359.
Anderson DR. 2008. Model Based Inference in the Life Sciences. Springer, USA.
Buishand TA, Shabalova MV, Brandsma T. 2004. On the choice of the temporal
aggregation level for statistical downscaling of precipitation. Journal of Climate 17:
1816–1827.
Chandler RE, Wheater HS. 2002. Analysis of rainfall variability using generalized
linear models: a case study from the west of Ireland. Water Resources Research 38(10):
1192–1202.
Chapman T. 1998. Stochastic modelling of daily rainfall: The impacts of adjoining wet
days on the distribution of rainfall amounts. Environmental Modelling and Software 13:
317–324.
Chowdhury RK, Beecham S. 2010. Australian rainfall trends and their relation to the
southern oscillation index. Hydrological Processes 24: 504–514.
Das SC. 1955. The fitting of truncated Type III curves to daily rainfall data. Australian
Journal of Physics 8: 298–304.
Diggle PJ, Ribeiro PJ Jr. 2007. Model-based Geostatistics, Springer, New York.
Dobson AJ, Barnett AG. 2008. An introduction to generalized linear models, Chapman
and Hall, London.
Dunn PK. 2004. Occurrence and quantity of precipitation can be modelled
simultaneously. International Journal of Climatology 24: 1231–1239.
Dunn PK. 2010. tweedie: Tweedie exponential family models. R package version 2.0.5.
Vienna, Austria.
Dunn PK, Smyth GK. 2005. Series evaluation of Tweedie exponential dispersion model
densities. Statistics and Computing 15: 267–280.
Dunn PK, Smyth GK. 2008. Evaluation of Tweedie exponential dispersion model
densities by Fourier inversion. Statistics and Computing 18: 73–86.
Dunn PK, White N. 2005. Power-variance models for modelling rainfall. In Statistical
Solution to Modern Problems: Proceedings of the 20th International Workshop on
Statistical Modelling, Sydney, 149–156.
Durban M, Glasbey CA. 2001. Weather modelling using a multivariate latent Gaussian
model. Agricultural and Forest Meteorology 109: 187–201.
Everingham YL, Reason CJC. 2009. Interannual variability in rainfall and wet spell
frequency during the New South Wales sugarcane harvest season. International Journal
of Climatology, Published online in Wiley InterScience (www.interscience.wiley.com)
DOI: 10.1002/joc.2066.
Fernandes MVM, Schmidt AM, Migon HS. 2009. Modelling zero-inflated spatio-temporal
processes. Statistical Modelling 9(1): 3–25.
Feuerverger A. 1979. On some methods of analysis for weather experiments.
Biometrika 66: 655–658.
Giese BS, Compo GP, Slowey NC, Sardeshmukh PD, Carton JA, Ray S, Whitaker JS.
2010. The 1918/19 El Niño. Bulletin of American Meteorological Society 91: 177–183.
Glasbey CA, Nevison IM. 1997. Rainfall modelling using a latent Gaussian variable. In
Modelling Longitudinal and Spatially Correlated Data: Methods, Applications, and
Future Directions, Springer, New York, 233–242.
Gonzalez MH, Cariaga ML. 2009. An approach to seasonal forecasting of summer
rainfall in Buenos Aires, Argentina. Atmosfera 22(3): 265–279.
Hamlin MJ, Rees DH. 1987. The use of rainfall forecasts in the optimal management of
small-holder rice irrigation–a case study. Hydrological Sciences 32(1): 15–29.
Hammer GL, Holzworth DP, Stone RC. 1996. The value of skill in seasonal climate
forecasting to wheat crop management in a region with high climatic variability.
Australian Journal of Agricultural Research 47: 717–737.
Hansen JW, Ines AVM. 2005. Stochastic disaggregation of monthly rainfall data for
crop simulation studies. Agricultural and Forest Meteorology 131: 233–246.
Hansen JW, Mishra A, Rao KPC, Indeje M, Ngugi RK. 2009. Potential value of GCM-based
seasonal rainfall forecasts for maize management in semi-arid Kenya.
Agricultural Systems 101: 80–90.
Hasan MM, Dunn PK. 2010a. Two Tweedie distributions that are near-optimal for
modelling monthly rainfall in Australia. International Journal of Climatology. Published
online in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/joc.2162.
Hasan MM, Dunn PK. 2010b. A simple Poisson–gamma model for modelling rainfall
occurrence and amount simultaneously. Agricultural and Forest Meteorology. 150:
1319–1330.
Jørgensen B. 1987. Exponential dispersion models (with discussion). Journal of the
Royal Statistical Society Series B, 49: 127–162.
Jørgensen B. 1997. The Theory of Dispersion Models, Chapman and Hall, London.
Lee CK, Shen SPS, Bailey B, North GR. 2009. Factor analysis for El Niño signals in
sea surface temperature and precipitation. Theoretical and Applied Climatology 97:
195–203.
Lennox SM, Dunn PK, Power BD, Devoil P. 2004. A statistical distribution for
modelling rainfall with promising applications in crop science, Technical report, in
Fischer T. et al., New directions for a diverse planet: Proceedings for the 4th
International Crop Science Congress, Brisbane, Australia.
Lewis F, Butler A, Gilbert L. 2010. A unified approach to model selection using the
likelihood ratio test. Methods in Ecology & Evolution. doi:
10.1111/j.2041-210X.2010.00063.x.
Little MA, McSharry PE, Taylor JW. 2009. Generalized Linear Models for Site-Specific
Density Forecasting of U.K. Daily Rainfall. Monthly Weather Review 137: 1029–1045.
Lorenzo MN, Iglesias I, Taboada JJ, Gesteira MG. 2010. Relationship between monthly
rainfall in northwest Iberian Peninsula and North Atlantic sea surface temperature.
International Journal of Climatology 30: 980–990. DOI: 10.1002/joc.1959.
Loukas A, Vasiliades L. 2004. Probabilistic analysis of drought spatiotemporal
characteristics in Thessaly region, Greece. Natural Hazards and Earth System Sciences
4: 719–731.
McCown RL, Hammer GL, Hargreaves JNG, Holzworth DP, Freebairn DM. 1996.
APSIM: a novel software system for model development, model testing, and simulation
in agricultural research. Agricultural Systems 50: 255–271.
McCullagh P, Nelder JA. 1989. Generalized Linear Models, 2nd edition, Chapman and
Hall, London.
Meinke H, Ryley M. 1997. Effect of sorghum ergot on grain sorghum production: a
preliminary analysis. Australian Journal of Agricultural Research 48: 1241–1247.
Meinke H, Stone RC, Hammer GL. 1996. SOI phases and climatic risk to peanut
production: A case study for northern Australia. International Journal of Climatology
16: 783–789.
Meng Q, Zhang Y, Wang Z. 2007. Rainfall predictive models for building simulation
II-rainfall estimation, Proceedings: Building simulation. Technical report, Tsinghua
University, Beijing, China.
Mooley DA. 1973. Gamma distribution probability model for Asian summer monsoon
monthly rainfall. Monthly Weather Review 101(2): 160–176.
Piantadosi J, Boland JW, Howlett PG. 2009. Generating synthetic rainfall on various
timescales-daily, monthly and yearly. Environmental Modeling and Assessment 14:
431–438.
Power S, Tseitkin F, Torok S, Lavery B, Dahni R, McAvaney B. 1997. Australian
temperature, Australian rainfall and the Southern Oscillation, 1910–1992: coherent
variability and recent changes. Australian Meteorological Magazine 47: 85–101.
Quinn WH, Burt WV. 1972. Use of the southern oscillation in weather prediction.
Journal of Applied Meteorology 11: 616–628.
R Development Core Team. 2010. R: A Language and Environment for Statistical
Computing, R Foundation for Statistical Computing, Vienna, Austria. ISBN: 3-900051-07-0.
Richardson CW, Wright DA. 1984. WGEN: A model for generating daily weather
variables. Technical Report 8, United States Department of Agriculture, Agriculture
Research Service.
Rosenberg K, Boland JW, Howlett PG. 2004. Simulation of monthly rainfall totals.
ANZIAM Journal 46: 85–104.
Shui LT, Haque A. 2004. Stochastic Rainfall Model for Irrigation Projects. Pertanika
Journal of Science and Technology 12(1): 137–147.
Simmonds I, Hope P. 1997. Persistence characteristics of Australian rainfall anomalies.
International Journal of Climatology 17: 597–613.
Smyth GK. 1996. Regression modelling of quantity data with exact zeroes. Paper
presented at proceedings of the second Australia–Japan workshop on stochastic models
in engineering, technology and management. Technical report, Technology
Management Centre, University of Queensland.
Smyth GK with contributions from Hu Y and Dunn PK. 2009. statmod: Statistical
Modelling. R package version 1.4.1.
Stern RD, Coe R. 1984. A model fitting analysis of daily rainfall data (with discussion).
Journal of the Royal Statistical Society, Series A 147: 1–34.
Stone RC, Auliciems A. 1992. SOI phase relationships with rainfall in eastern Australia.
International Journal of Climatology 12: 625–636.
Stone RC, Hammer GL, Marcussen T. 1996a. Prediction of global rainfall probabilities
using phases of the Southern Oscillation Index. Nature 384: 252–255.
Stone RC, McKeon GM. 1993. Prospects for using weather prediction to reduce pasture
establishment risk. Tropical Grasslands 27: 406–413.
Stone RC, Nicholls N, Hammer G. 1996b. Frost in northeast Australia: Trends and
influences of phases of the southern oscillation. Journal of Climate 9(8): 1896–1909.
Troup AJ. 1965. The Southern Oscillation. Quarterly Journal of Royal Meteorological
Society 91: 490–506.
Tweedie MCK. 1984. An index which distinguishes between some important
exponential families. Statistics: Applications and new directions, Proceedings of the
Indian Statistical Institute Golden Jubilee International Conference. Technical report,
Indian Statistical Institute, Calcutta.
Wilks DS. 1990. Maximum likelihood estimation for the gamma distribution using data
containing zeros. Journal of Climate 3: 1495–1501.
Wilks DS. 1999. Interannual variability and extreme-value characteristics of several
stochastic daily precipitation models. Agricultural and Forest Meteorology 93: 153–169.
Willcocks JR, Stone RC. 2000. Frost risk in eastern Australia and the influence of the
Southern Oscillation. DPI Information Series QI00001. Queensland Department of
Primary Industries: Toowoomba.
Wu R, Kirtman BP. 2007. Roles of the Indian Ocean in the Australian summer
monsoon–ENSO relationship. Journal of Climate 20: 4768–4788.
Tweedie distributions for modelling seasonal rainfall totals in Australia
CHAPTER 7: THE TWEEDIE FAMILY OF DISTRIBUTIONS FOR
MODELLING SEASONAL RAINFALL TOTALS IN AUSTRALIA
Authors: Peter K. Dunn, Md Masud Hasan
Affiliations: School of Health and Sport Sciences, Faculty of Science, Health and
Education, University of the Sunshine Coast.
Journal: Journal of Hydrology, Submitted June 11, 2011.
JCR ranking: 4/64 (Water resources), 3/106 (Civil Engineering)
Impact factor: 2.433
ARC tier ranking: A*
Statement of Intellectual Contribution
I, Md Masud Hasan, have made substantial independent intellectual contributions to the
research paper ‘The Tweedie family of distributions for modelling seasonal rainfall
totals in Australia’. The intellectual contributions include the development of the study
hypothesis, identification and modification of methodology, development of new
applications, responsibility for independent analysis and interpretation of results.
…………………………….
…………………………………………
Md Masud Hasan
Date
………………………………..
……………………………..............
Peter K Dunn
Date
Abstract
Seasonal rainfall models have potential uses in agriculture, in streamflow hydrology and in
water resources planning and management. Understanding the distribution of seasonal
rainfall is important for achieving well-fitting models and for simulating reasonable rainfall
data. The widely-used log-normal and gamma distributions cannot model the two
components of seasonal rainfall, the occurrence and the amount, simultaneously. To
model both components of seasonal rainfall simultaneously, we consider distributions
from the Tweedie family. The Tweedie family includes continuous-symmetric (normal),
continuous-skewed (gamma) and mixture-type (Poisson-gamma) distributions. Studying the
seasonal rainfall data from 220 Australian stations, we observe that the gamma distributions
are near-optimal for a reasonable number of stations and the Poisson-gamma distributions
are optimal for more than half of the studied stations.
On the basis of the Kolmogorov-Smirnov test, the models from the Tweedie family fit the
observed seasonal rainfall data better than the log-normal distributions. Almost everywhere,
various statistics (5th percentiles, 25th percentiles, medians, 75th percentiles, 95th percentiles
and skewness) of observed seasonal rainfall totals are within the empirical 95% confidence
intervals of the respective statistics of data simulated using the Tweedie models. This is not
true for a considerable number of cases when data are simulated from the log-normal models. The
data simulated by the Poisson-gamma models also have similar properties to the
observed data with respect to the probability of no rainfall.
The ability of the Tweedie models to simulate the extreme rainfall amounts and the
probability of no rainfall can be useful in drought monitoring and in water resources
planning and management. The Tweedie models can also be incorporated into the
generalized linear modelling framework.
KEY WORDS: Tweedie family of distributions, optimal distribution, seasonal rainfall.
Introduction
Accurate forecasting of seasonal rainfall totals presents considerable opportunities to
crop managers to provide improvements in the overall system of crop production
(Meinke et al., 2001; Stone and Meinke, 2005). Seasonal rainfall forecasting can also
have significant value in determining supplemental irrigation, water requirements, and
in engineering studies related to storage analyses and reservoir management (Nnaji,
2001; Singhrattna et al., 2005). Seasonal rainfall forecasting has also been used in
forecasting of streamflow (Endale et al., 2003; Kirono et al., 2010), and in better
management of the water used in hydro-generation on a seasonal basis (Purdie and
Bardsley, 2010).
Linear regression models and correlation coefficients are widely used for understanding
the impact of different predictors on seasonal rainfall totals (Nnaji, 2001; Chiew and
Leahy, 2003; Suppiah, 2004; Cheung et al., 2008; Gonzalez and Cariaga, 2009; Risbey
et al., 2009; Kirono et al., 2010; Purdie and Bardsley, 2010). Ntale et al. (2003) used
canonical correlation analysis to predict the standardized seasonal rainfall totals of East
Africa at a 3-month lead time using the sea level pressure and the sea surface temperature
anomalies of the Indian and the Atlantic oceans. One of the basic assumptions of the
abovementioned models is that the dependent variable (seasonal rainfall total) is
normally distributed. Linear regression models are therefore not appropriate
when the seasonal rainfall total is not normally distributed, as is the case for most of the
Australian rainfall stations.
Sometimes, seasonal rainfall totals are divided into categories (for example, above the
third quartile or above the median) and discriminant analysis is used to forecast the
category into which the seasonal rainfall will fall (Ward and Folland, 1991; Mason, 1998;
Drosdowsky and Chambers, 2001). Discriminant analysis, however, forecasts only the
category of the seasonal rainfall total, not the total itself.
Alternatively, seasonal rainfall totals can be modelled directly considering specific
distributions. For example, the gamma distribution is widely used for modelling the
positive daily (Aksoy, 2006), weekly (Sharda and Das, 2005) or monthly (Sen and
Eljdid, 1999) rainfall totals. Gamma and log-normal distributions fit well to the seasonal
rainfall totals (Mooley, 1975; Cho et al., 2004; Rader et al., 2009); however, these
distributions are defined only for strictly positive values. The standardized precipitation
index (SPI) assumes that the seasonal rainfall totals have gamma distributions, which are
then transformed to normal distributions, and the transformed values are used in
modelling (Almeira and Scian, 2006; Canon et al., 2007; Pai et al., 2010).
The models mentioned above for rainfall totals at various timescales
also have difficulty with the mixture of discrete (exact zero when no rainfall is
recorded) and continuous (rainfall amount when non-zero rainfall is recorded) data. To
overcome this difficulty when modelling daily rainfall totals, some authors use logistic
regression (Chandler and Wheater, 2002) or Markov chains (Richardson and Wright,
1984; Stern and Coe, 1984; Laux et al., 2009) to model the occurrence of wet or dry
days, then gamma distributions to model the amount of rainfall on wet days (Coe and
Stern, 1982; Wilks, 1999; Chandler, 2005). Das et al. (2006) used Markov chains for
modelling rainfall occurrence and gamma distributions for modelling the amount of weekly
rainfall in Bihar, India. An alternative approach was adopted by Glasbey and co-workers
(Glasbey and Nevison, 1997; Durban and Glasbey, 2001), who applied a
monotonic transformation of rainfall data to define a latent Gaussian variable with zero
rainfall corresponding to censored values below some threshold. Dunn (2004) used
Poisson-gamma (P-G) distributions to model the occurrence and amount of monthly
rainfall simultaneously. Hasan and Dunn (2010a) showed that two different
distributions (gamma and P-G) in the Tweedie family fit well to the monthly rainfall
totals at Australian stations. Hasan and Dunn (2010b, 2011) used P-G models to
simulate the occurrence and amount of monthly rainfall totals for Australian stations.
However, these distributions have not yet been used to model seasonal rainfall totals.
The aim of this paper is to search for suitable distribution(s) for modelling seasonal
rainfall totals of Australian stations within a certain family of distributions that have
useful properties. In the context of seasonal rainfall totals, Australian rainfall stations
are varied: they range from very wet to extremely dry, and their rainfall distributions from
extremely skewed to approximately symmetric. We explore the possibility that the
seasonal rainfall totals of different seasons may follow different distributions (but
within the same family of distributions), rather than following a single distribution with
varying parameters.
The sources and some characteristics of the data used in the study are described in
Section 2. The methodology, including the formulation and important characteristics
of the Tweedie family of distributions, is
discussed in Section 3. The results and discussion (Section 4) are followed by some
concluding comments (Section 5).
Figure 1 Location of the stations studied. The four case study stations mentioned in the
paper are named and indicated using squares; black dots represent the other studied
stations.
Data and Preliminaries
Seasonal rainfall data from 220 Australian rainfall stations were studied (data obtained
from the Australian Bureau of Meteorology) with four stations used as case studies
(Figure 1). Most of the studied stations (about 92%) have data available from 1910 or
earlier to 2007. The remaining stations are generally located in remote areas, and have
rainfall records from 1950 or earlier to 2007. The monthly rainfall totals were
accumulated to obtain the southern hemisphere seasonal rainfall totals (autumn: March,
April, May; winter: June, July, August; spring: September, October, November;
summer: December, January, February). The seasonal time series of the four case study
stations were studied for the period from 1912 to 2007. These stations represent a
variety of types of rainfall distributions in Australia. Bidyadanga has more rainfall
during the southern hemisphere summer and little rainfall during spring and winter.
Theebine and Clarence also receive more rainfall during the southern hemisphere
summer, but their winters are not as dry as at Bidyadanga. Trayning is a winter-dominant rainfall
station. At Bidyadanga, some winter and spring seasons have no rainfall (Table 1).
The distributions of seasonal rainfall totals of the case study stations are presented in
the histograms in Figure 2. From the histograms we observe that, in most cases,
the seasonal rainfall totals are not normally distributed and are strongly skewed to the right.
Moreover, for a considerable number of cases (26.8% of the 4 × 220 = 880 studied seasonal
rainfall data series), the seasonal rainfall data have at least one exact zero. The widely-used
gamma and log-normal distributions are not appropriate for modelling seasonal
rainfall data that include any dry season.
For each season in Bidyadanga, the seasonal rainfall totals of ninety-six years were
divided into twelve groups of eight seasons each. The logarithms of the group means
and group variances are shown in the scatterplots in Figure 3. The scatterplots clearly
show that the group variances are not constant; in fact, the group variances
are roughly linearly related to the group means when measured on a log scale. For
spring rainfall of Bidyadanga, a strong mean-variance relationship is observed. For the
other seasons, the relationship is linear but more scattered, with some outliers. Each
plotted point in Figure 3 is based on only eight
seasonal rainfall totals, and hence the scatterplots provide only approximate
information regarding the magnitude of the relationship. In other words, the graphs
offer only informal support for a linear mean-variance relationship on the log scale; more
sophisticated methods are applied in this paper to quantify the magnitude of the
relationship and to identify the appropriate distributions within the Tweedie family for
modelling the data.
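The informal check described above (twelve groups of eight seasonal totals, then the logarithm of each group variance against the logarithm of each group mean) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `log_mean_variance_slope` is hypothetical, the groups are assumed to be formed from consecutive seasons (the grouping rule is not stated in the text), and the data are synthetic rather than the Bidyadanga series.

```python
import numpy as np

def log_mean_variance_slope(totals, n_groups=12):
    """Fit log(group variance) = intercept + slope * log(group mean)."""
    totals = np.asarray(totals, dtype=float)
    # Assumption: groups are consecutive blocks of seasons; the text does not say how groups were formed.
    groups = np.array_split(totals, n_groups)
    means = np.array([g.mean() for g in groups])
    variances = np.array([g.var(ddof=1) for g in groups])
    # Groups with a zero mean or zero variance cannot be placed on a log scale.
    keep = (means > 0) & (variances > 0)
    slope, intercept = np.polyfit(np.log(means[keep]), np.log(variances[keep]), 1)
    return intercept, slope

# Synthetic illustration: rescaling one fixed pattern makes the variance
# proportional to the squared mean, so the fitted slope should be close to 2.
base = np.array([0.2, 0.5, 0.8, 1.0, 1.0, 1.2, 1.5, 1.8])   # mean 1
levels = 10.0 * np.arange(1, 13)                             # twelve group means
synthetic = np.concatenate([m * base for m in levels])       # 96 "seasonal totals"
intercept_hat, slope_hat = log_mean_variance_slope(synthetic)
print("fitted slope:", slope_hat)
```

The fitted slope plays the role of a rough, informal estimate of the power in the mean-variance relationship; with only eight values per group it is no substitute for a likelihood-based estimate.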
Table 1 Summary statistics of the seasonal rainfall for Bidyadanga, Trayning, Theebine
and Clarence from 1912 to 2007.
Station     Season     Minimum    Median      Mean        SD        CV  Skewness
                          (mm)      (mm)      (mm)      (mm)       (%)
Bidyadanga  Autumn         0.3     129.2     152.7     131.1      85.8      1.80
            Winter         0.0      11.5      26.4      41.8     158.4      3.53
            Spring         0.0       1.6       7.8      20.8     265.6      5.54
            Summer        13.9     263.6     323.3     210.8      65.2      0.87
Trayning    Autumn        10.3      73.1      85.0      49.7      58.5      1.46
            Winter        56.2     141.0     143.3      43.2      30.1      0.67
            Spring         8.9      50.4      54.4      26.4      48.5      0.29
            Summer         0.4      33.9      42.5      36.5      85.9      1.15
Theebine    Autumn        37.5     219.3     241.5     149.2      61.8      1.30
            Winter         8.8     112.4     128.8      87.4      67.9      1.26
            Spring        32.0     173.1     184.0      90.0      48.9      0.64
            Summer       119.6     391.8     404.9     177.3      43.8      0.77
Clarence    Autumn        67.2     273.4     321.1     179.5      55.9      1.15
            Winter        26.5     189.2     227.0     129.8      57.2      1.56
            Spring        43.1     197.7     207.2     103.4      49.9      0.52
            Summer        43.7     309.8     322.1     146.0      45.3      0.91
Methodology
The seasonal rainfall totals for different seasons of the case study stations differ (Table
1) with respect to their centre (mean or median), dispersion (standard deviation or coefficient of variation) or skewness (computed using the standard definition given in, for
example, Daniel (2009)). Some of the seasonal rainfall data are continuous and some
are a mixture of continuous and discrete data (continuous rainfall totals with exact zeros).
Hence, a single distribution may not be sufficient to represent the seasonal rainfall totals
of all studied stations. We explore the possibility that different distributions are needed
for each season by considering first the Exponential Dispersion Model (EDM) family of
distributions and then restricting to the Tweedie family, for reasons outlined below.
First, the Tweedie distribution is motivated and the probability function given.
Exponential Dispersion Models
A probability function of the form
$$f(y; \theta, \phi) = a(y, \phi) \exp\left\{ \frac{1}{\phi} \left[ y\theta - \kappa(\theta) \right] \right\} \qquad (3.1)$$
for $y \in S$, for some suitable function $a(y, \phi)$ and known functions $\theta$ and $\kappa(\cdot)$, is
called an exponential dispersion model or EDM (Dunn and Smyth, 2005). The mean is
$\mu = \kappa'(\theta)$, and $\phi > 0$ is the dispersion parameter. The function $a(y, \phi)$ cannot always
be written in closed form and is the function necessary to ensure the total integral or
summation of the probability function over the domain $S$ is one. Examples of common EDMs include the
normal, binomial, gamma and Poisson distributions.
The notation $Y \sim \mathrm{ED}(\mu, \phi)$ indicates that the random variable $Y$ comes from an EDM
with location parameter $\mathrm{E}[Y] = \mu = \kappa'(\theta)$ and variance $\mathrm{var}[Y] = \phi\kappa''(\theta)$, as in
Equation (3.1). The functional relationship between $\mu$ and $\theta$ defined by $\mu = \kappa'(\theta)$ is
invertible, so the variance can be written as $\mathrm{var}[Y] = \phi V(\mu)$, where $V(\mu) = \kappa''(\theta)$ is
called the variance function.
Figure 2 Seasonal Rainfall distributions of Bidyadanga, Trayning, Theebine and
Clarence.
Figure 3 Mean-variance relationships of autumn, winter, spring and summer rainfall of
Bidyadanga measured on log scale.
The Tweedie Family
To determine the appropriate distribution for modelling the seasonal rainfall totals, we
examine the mean-variance relationship of the seasonal rainfall totals. The scatterplots
in Figure 3 informally suggest that the group variances are not constant
and are roughly linearly related to the group means when measured on a log scale. The
relationship can be suitably expressed as $\log(\text{group variance}) = \alpha + p \log(\text{group mean})$.
Rearranging:
$$\text{group variance} = \exp\{\alpha + p \log(\text{group mean})\} = \text{const} \times (\text{group mean})^{p}.$$
The variance is approximately proportional to some power of the mean; that is,
$\mathrm{var}(y) \propto \mu^{p}$ or $\mathrm{var}(y) = \phi\mu^{p}$. As mentioned earlier, more sophisticated methods
are applied to determine the value of $p$, and to identify the appropriate distributions
within the Tweedie family for modelling the data.
Considering this property of seasonal rainfall totals, we restrict ourselves to a special
case of EDMs, the Tweedie family of distributions studied by Jørgensen (1987, 1997)
and named in honour of Tweedie (1984). The Tweedie family of distributions are those
EDMs with variance function $V(\mu) = \mu^{p}$ for $p \notin (0, 1)$ (Jørgensen, 1987). For further
information, see Smyth (1996), Jørgensen (1997) or Dunn and Smyth (2005). The
Tweedie distribution with mean $\mu$, dispersion parameter $\phi$ and index parameter $p$ is
denoted $\mathrm{Tw}_p(\mu, \phi)$.
There are four notable special cases of Tweedie distributions: the normal distribution
($p = 0$), the Poisson distribution ($p = 1$, $\phi = 1$), the gamma distribution ($p = 2$) and the
inverse Gaussian distribution ($p = 3$). Apart from these special cases, the probability
functions for the Tweedie distributions have no closed form. For $p \geq 2$, the
distributions are suitable for modelling positive, right-skewed data. Of special interest
are the distributions for which $1 < p < 2$, also called the Poisson-gamma (or P-G)
distributions (Dunn and Smyth, 2005). In this context, the probability distributions for
which $1 < p < 2$ can be developed as follows.
Assume any rainfall event $i$ produces an amount of rainfall $R_i$, and that each $R_i$ comes
from a gamma distribution $\mathrm{Gam}(\alpha, \gamma)$. In this parameterization, the mean is $\alpha\gamma$ and
the variance is $\alpha\gamma^{2}$. Assume the number of rainfall events in any one season is $N$,
where $N$ has a Poisson distribution with mean $\lambda$; that is, $N \sim \mathrm{Pois}(\lambda)$. This implies
seasons with no rainfall when $N = 0$. The total seasonal rainfall, $Y$, is the sum
$Y = R_1 + R_2 + \cdots + R_N$. When $N = 0$, then $Y = 0$. The Tweedie parameters $(\mu, p, \phi)$ can
be reparameterized to the parameters of the Poisson and gamma distributions $(\lambda, \alpha, \gamma)$
when $1 < p < 2$ (Dunn, 2004), providing approximate down-scaling information about
seasonal rainfall. The transformation between the parameterizations (when $1 < p < 2$) is:
$$\lambda = \frac{\mu^{2-p}}{\phi(2-p)}; \qquad \gamma = \phi(p-1)\mu^{p-1}; \qquad \alpha = \frac{p-2}{1-p}.$$
Importantly, the probability of recording no rain is $\Pr(Y = 0) = \exp(-\lambda)$ (Dunn and
Smyth, 2005).
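As a rough numerical illustration of this construction, the Python sketch below converts Tweedie parameters to Poisson and gamma parameters and simulates seasonal totals as a Poisson number of gamma-distributed events. It assumes the parameterization λ = μ^(2−p)/(φ(2−p)), γ = φ(p−1)μ^(p−1) and a positive gamma shape α = (2−p)/(p−1) with scale γ; the function names and the parameter values (μ = 50, p = 1.5, φ = 4) are hypothetical and chosen only for illustration, not taken from any fitted station.

```python
import math
import numpy as np

def tweedie_to_poisson_gamma(mu, p, phi):
    """Map Tweedie (mu, p, phi), 1 < p < 2, to Poisson mean lam, gamma shape alpha, gamma scale gam."""
    if not 1.0 < p < 2.0:
        raise ValueError("the Poisson-gamma form requires 1 < p < 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))
    gam = phi * (p - 1.0) * mu ** (p - 1.0)
    alpha = (2.0 - p) / (p - 1.0)
    return lam, alpha, gam

def simulate_poisson_gamma(mu, p, phi, size, rng):
    """Simulate totals Y = R_1 + ... + R_N with N ~ Poisson(lam) and R_i ~ Gam(alpha, gam)."""
    lam, alpha, gam = tweedie_to_poisson_gamma(mu, p, phi)
    n_events = rng.poisson(lam, size=size)
    totals = np.zeros(size)                 # seasons with N = 0 stay exactly zero
    wet = n_events > 0
    # The sum of N independent Gam(alpha, gam) amounts is Gam(N * alpha, gam).
    totals[wet] = rng.gamma(shape=n_events[wet] * alpha, scale=gam)
    return totals

rng = np.random.default_rng(2011)
mu, p, phi = 50.0, 1.5, 4.0                 # illustrative values only
lam, alpha, gam = tweedie_to_poisson_gamma(mu, p, phi)
y = simulate_poisson_gamma(mu, p, phi, size=200_000, rng=rng)
print("lambda, alpha, gamma:", lam, alpha, gam)
print("simulated mean:", y.mean(), "target mu:", mu)
print("simulated Pr(Y = 0):", (y == 0).mean(), "target exp(-lambda):", math.exp(-lam))
```

Under these assumptions the simulated mean should be close to μ and the proportion of exact zeros close to exp(−λ), which is the property used later when comparing observed and simulated probabilities of no rainfall.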
There are some important properties of the Tweedie distributions that make them
particularly appealing for use in rainfall modelling (Dunn, 2004):
• There is some intuitive appeal for the models, considering total rainfall as a sum
of rainfall events on smaller timescales (outlined above).
• These distributions belong to the EDM family of distributions upon which the
generalized linear models (McCullagh and Nelder, 1989) are based.
Consequently, there is a framework already in place for fitting models based on
the Tweedie distributions and for diagnostic testing. In addition, covariates can
be incorporated into the modelling procedure.
• They provide a mechanism for understanding the fine-scale structure in
coarse-scale data, as discussed above.
• All exponential dispersion models that are closed with respect to scale
transformations are Tweedie models (Jørgensen, 1997; Jiang, 2007); that is, if $Y$
follows a Tweedie distribution with index $p$ and $c$ is some positive constant,
then $cY$ also follows a Tweedie distribution with the same index $p$. This means that restricting
to the Tweedie class of EDMs is sensible.
While these are important considerations, the Tweedie distributions also fit the data well
as shown in Section 4.
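The scale-closure property listed above can be checked, at the level of the first two moments, directly from the variance function. As a brief worked step (with $c > 0$ the constant from that property), if $Y \sim \mathrm{Tw}_p(\mu, \phi)$ then:

```latex
\begin{aligned}
\mathrm{E}[cY] &= c\mu, \\
\operatorname{var}[cY] &= c^{2}\,\phi\,\mu^{p} = \left(c^{2-p}\phi\right)(c\mu)^{p}.
\end{aligned}
```

So $cY$ has the same power variance function with index $p$, mean $c\mu$ and dispersion $c^{2-p}\phi$. For example, converting rainfall totals from millimetres to inches changes $\mu$ and $\phi$ but leaves $p$ unchanged. This checks only the mean and variance; the full distributional result is given in Jørgensen (1997).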
A formal way to determine the optimal distribution within the Tweedie family is to
compute the maximum likelihood estimate, $\hat{p}$, of the index parameter $p$ of the
Tweedie family. Estimation of the index parameter requires sophisticated numerical
techniques (Dunn and Smyth, 2005, 2008), as implemented in the tweedie.profile
function (Dunn, 2010) of the R (R Development Core Team, 2009) package tweedie
(Dunn, 2010). Sometimes, the maximum likelihood estimate of $p$ is found on the
boundary of the parameter space so that $\hat{p} \to 1$. This phenomenon is discussed in detail
in Dunn and Smyth (2005). One interpretation is that the normal distribution ($p = 0$) may
be optimal within the Tweedie family.
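To give a feel for what profiling the likelihood over $p$ involves, the following Python sketch evaluates a Poisson-gamma log-likelihood on a grid of $p$ and $\phi$ values and picks the maximum. It is not the algorithm used by the R function: it is restricted to $1 < p < 2$, it evaluates the density as a truncated Poisson mixture of gamma densities (the truncation point `n_max` is an assumption that must comfortably exceed the plausible number of events), and the names `pg_loglik`, `profile_p` and `simulate_pg`, the grids and the simulated data are all hypothetical.

```python
import math
import numpy as np

def pg_loglik(y, mu, p, phi, n_max=300):
    """Poisson-gamma (Tweedie, 1 < p < 2) log-likelihood with a common mean mu."""
    y = np.asarray(y, dtype=float)
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))        # Poisson mean
    gam = phi * (p - 1.0) * mu ** (p - 1.0)          # gamma scale
    alpha = (2.0 - p) / (p - 1.0)                    # gamma shape
    n = np.arange(1, n_max + 1)
    # log Pr(N = n) for the Poisson number of events
    log_pois = -lam + n * math.log(lam) - np.array([math.lgamma(k + 1.0) for k in n])
    # normalising constants of the Gam(n * alpha, gam) densities
    log_norm = -n * alpha * math.log(gam) - np.array([math.lgamma(k * alpha) for k in n])
    pos = y[y > 0]
    # rows: positive observations; columns: number of events n
    terms = (log_pois + log_norm) + np.outer(np.log(pos), n * alpha - 1.0) - pos[:, None] / gam
    peak = terms.max(axis=1, keepdims=True)          # log-sum-exp for numerical stability
    log_dens = peak[:, 0] + np.log(np.exp(terms - peak).sum(axis=1))
    n_zero = np.count_nonzero(y == 0)
    return float(log_dens.sum() - lam * n_zero)      # Pr(Y = 0) = exp(-lam)

def profile_p(y, p_grid, phi_grid):
    """Grid search: for each p keep the best phi, then return the best (p, phi)."""
    mu_hat = float(np.mean(y))   # the sample mean is the MLE of mu for a common-mean EDM
    profile = []
    for p in p_grid:
        best_ll, best_phi = max((pg_loglik(y, mu_hat, p, phi), phi) for phi in phi_grid)
        profile.append((best_ll, float(p), float(best_phi)))
    _, p_hat, phi_hat = max(profile)
    return p_hat, phi_hat, profile

def simulate_pg(mu, p, phi, size, rng):
    """Simulate Poisson-gamma totals (used here only to create test data)."""
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))
    gam = phi * (p - 1.0) * mu ** (p - 1.0)
    alpha = (2.0 - p) / (p - 1.0)
    n_events = rng.poisson(lam, size=size)
    y = np.zeros(size)
    wet = n_events > 0
    y[wet] = rng.gamma(shape=n_events[wet] * alpha, scale=gam)
    return y

rng = np.random.default_rng(42)
y_sim = simulate_pg(50.0, 1.5, 4.0, size=1000, rng=rng)   # illustrative "true" values
p_hat, phi_hat, profile = profile_p(y_sim, np.linspace(1.2, 1.8, 7), np.geomspace(0.5, 30.0, 30))
print("grid estimate of p:", p_hat, "and of phi:", phi_hat)
```

A grid search of this kind becomes unreliable as $p$ approaches 1 or 2, where the Poisson mean grows or the mixture degenerates, which is one reason specialised series and inversion methods are needed in practice.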
Results and Discussion
The formal approach of deciding upon the optimal distribution within the Tweedie family
on the basis of the estimated value of the index parameter is adopted here and the results
are presented in Table 2. For the rainfall totals of each season, $1 < \hat{p} < 2$ is
obtained for more than half of the studied stations. In these cases, within the Tweedie
family, a P-G distribution is optimal for modelling seasonal rainfall. For a considerable
number of stations, $\hat{p}$ is greater than but close to two and the 95% confidence interval
of $p$ includes two. The gamma distributions are near-optimal for modelling seasonal
rainfall totals of these stations. The estimate of $p$ is on the boundary of the parameter
space for 11.4% of winter data series and 7.3% of spring data series, indicating that the
normal distribution is optimal for these cases. The results show that the normal
distributions are rarely optimal within the Tweedie family and that a single distribution is
not appropriate for modelling the seasonal rainfall totals of Australian stations.
Table 2 Number (percentage) of stations where normal, gamma and P–G distributions
are near-optimal.
Season    $\hat{p} \to 1$    $1 < \hat{p} < 2$    $\hat{p} \geq 2$
          (normal)           (P–G)                (gamma)
Autumn    2 (0.9)            120 (54.5)           98 (44.6)
Winter    25 (11.4)          123 (55.9)           72 (32.7)
Spring    16 (7.3)           138 (62.7)           66 (30.0)
Summer    1 (0.5)            126 (57.3)           93 (42.2)
The maps of Australia in Figure 4 show where different distributions within the
Tweedie family are optimal or near-optimal for modelling seasonal rainfall totals. As
mentioned earlier, the P-G distributions are optimal for more than half of the studied
stations. The stations where the P-G distributions are optimal are scattered throughout
Australia. Most of the stations where the gamma distributions are near-optimal are
concentrated on the coastlines. For autumn and summer rainfall, the normal distributions
are rarely optimal within the Tweedie family. For winter rainfall, one cluster of stations exists in
south-east Australia and, for spring rainfall, two clusters exist in south and south-west
Australia where the normal distributions are near-optimal within the Tweedie family.
Figure 4 Maps of Australia showing the stations where gamma, P–G and normal
distributions are near-optimal.
The performance of the Tweedie and log-normal distributions for modelling the seasonal
rainfall data of the studied stations was compared using the Kolmogorov-Smirnov (K-S) test
statistic. For this purpose, data are simulated from the optimal distributions
within the Tweedie family and also from the log-normal distributions. The simulated
data have the same length as the observed data. The test statistic $D$ is defined as the
maximum value of the absolute differences between the cumulative distribution
functions of the observed, $S_N(x)$, and the simulated, $F(x)$, data as
$$D = \max_x \left| S_N(x) - F(x) \right|$$
where $N$ is the number of data points. For observed values of $D$ greater than the
critical value at the 0.05 level of significance, the difference between the two distributions
is considered significant. The results from the K-S tests for comparing the simulated
data from the Tweedie models and the log-normal models with the observed seasonal
rainfall data are presented in Table 3. For a large number of cases, the Tweedie models
fit the seasonal rainfall totals well and, for all seasons, the percentages of stations with an adequate fit (at the
arbitrary 5% level of significance) are higher for the Tweedie models than for the log-normal models.
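The statistic D described above compares two empirical distribution functions, one from the observed totals and one from the simulated totals. A minimal Python sketch of that two-sample computation is given below; the function names are hypothetical, and the critical value uses the standard large-sample approximation c·sqrt((n + m)/(nm)) with c = 1.358 for the 0.05 level, which is an assumption about how significance could be assessed rather than a statement of the procedure used in this study.

```python
import numpy as np

def ks_two_sample(observed, simulated):
    """D = max over x of |S_N(x) - F(x)| for the empirical CDFs of two samples."""
    observed = np.sort(np.asarray(observed, dtype=float))
    simulated = np.sort(np.asarray(simulated, dtype=float))
    # Both empirical CDFs are step functions, so the maximum is attained at a data point.
    grid = np.concatenate([observed, simulated])
    s_n = np.searchsorted(observed, grid, side="right") / observed.size
    f_x = np.searchsorted(simulated, grid, side="right") / simulated.size
    return float(np.max(np.abs(s_n - f_x)))

def ks_critical_value(n, m, c_alpha=1.358):
    """Large-sample two-sided critical value; c_alpha = 1.358 corresponds to the 0.05 level."""
    return float(c_alpha * np.sqrt((n + m) / (n * m)))

# Small illustration with made-up numbers (not station data)
d_stat = ks_two_sample([1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0])
print("D =", d_stat, "critical value for n = m = 96:", ks_critical_value(96, 96))
```

A fit would be judged adequate at the 0.05 level when D falls below the critical value for the two sample sizes involved.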
Table 3: Number (percentage) of stations where the P-values from the K-S tests
exceed 0.05, indicating a good fit to the data, when fitting the Tweedie
and log-normal distributions.
Model        Autumn       Winter       Spring       Summer       Total
Tweedie      220 (100)    192 (87.3)   208 (94.5)   194 (88.2)   814 (92.5)
Log-normal   172 (78.2)   168 (76.4)   173 (78.6)   155 (70.5)   668 (75.9)
We then compare the performance of the Tweedie and the log-normal models in simulating
the seasonal rainfall data. Different statistics (5th percentile, 25th percentile, median, 75th
percentile, 95th percentile and skewness) of the observed data were compared with the
corresponding statistics of the data simulated from the Tweedie and the log-normal models. For
each season at each of the 220 studied stations, 1000 samples of size 100 (as most of the stations
have data for around 100 years) were simulated from the Tweedie and the log-normal
distributions. The log-normal distribution is not defined for exact zeros. To
estimate the parameters when simulating data using log-normal distributions, the zero
rainfall totals were replaced with 0.01 mm. The statistics of the simulated data
were calculated for each sample (for each season at a station we obtain 1000 values of each
statistic from the 1000 simulated samples). The medians and the empirical 95%
confidence intervals of the statistics of the simulated data were then estimated.
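The simulation procedure just described can be sketched in Python as follows. This is an illustration under stated assumptions: the simulator is a placeholder gamma model with made-up parameters standing in for a fitted Tweedie or log-normal model, the skewness is the simple moment-based estimator (the exact estimator used in the study is not restated here), and all function names are hypothetical.

```python
import numpy as np

def sample_skewness(x):
    """Moment-based skewness (an assumed estimator, used only for illustration)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)

def summary_statistics(x):
    """5th, 25th, 50th, 75th and 95th percentiles followed by the skewness."""
    q5, q25, q50, q75, q95 = np.percentile(x, [5, 25, 50, 75, 95])
    return np.array([q5, q25, q50, q75, q95, sample_skewness(x)])

def empirical_intervals(simulate, n_samples=1000, sample_size=100):
    """Median and empirical 95% interval of each statistic across simulated samples."""
    stats = np.array([summary_statistics(simulate(sample_size)) for _ in range(n_samples)])
    lower, median, upper = np.percentile(stats, [2.5, 50, 97.5], axis=0)
    return lower, median, upper

def within_interval(observed, lower, upper):
    """For each statistic, is the observed value inside the simulated interval?"""
    obs = summary_statistics(observed)
    return (obs >= lower) & (obs <= upper)

rng = np.random.default_rng(7)
# Placeholder for a fitted model: gamma with shape 2 and scale 40 (hypothetical values)
simulate = lambda n: rng.gamma(shape=2.0, scale=40.0, size=n)
observed = rng.gamma(shape=2.0, scale=40.0, size=96)   # stand-in for an observed series
lower, median, upper = empirical_intervals(simulate)
covered = within_interval(observed, lower, upper)
print("statistics covered by the 95% intervals:", covered)
```

Counting, across stations, how often each entry of `covered` is true gives tallies of the kind reported for each statistic and season.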
Table 4 presents the number (percentage) of stations out of 220 where
different statistics of the observed seasonal rainfall data were within the empirical 95%
confidence intervals of the respective statistics for the simulated data from the Tweedie
and the log-normal distributions. Of the distributions considered in the analysis, only
the P-G distributions simulate continuous data with exact zeros. The observed and
simulated probabilities of no rainfall are compared only for the cases where the P-G
distributions are optimal within the Tweedie family. With respect to the considered
statistics (5th percentiles, 25th percentiles, medians, 75th percentiles, 95th percentiles and
skewness), the Tweedie models simulate data with properties very similar to the
observed seasonal rainfall totals. For median and 75th percentile, the log-normal
distributions simulate data reasonably well. For more than 99% of stations, the observed
probability of no rainfall is within the 95% confidence interval of the simulated
probability of no rainfall using the P-G distributions.
Table 4: Number (percentage) of stations where the observed 5th percentiles, 25th
percentiles, medians, 75th percentiles, 95th percentiles and probability of no rainfall of
seasonal rainfall data for different seasons are within the 95% empirical confidence
intervals of those of the simulated data. The bold figures indicate which of the two
models is superior.
Statistic                    Model     Autumn       Winter       Spring       Summer       All seasons
5th percentile               Tweedie   214 (97.3)   199 (90.5)   204 (92.7)   205 (93.2)   822 (93.4)
                             LN        198 (90.0)   200 (90.9)   199 (90.5)   192 (87.3)   789 (89.7)
25th percentile              Tweedie   220 (100)    211 (95.9)   216 (98.2)   217 (98.6)   864 (98.2)
                             LN        175 (79.5)   181 (82.3)   181 (82.3)   161 (73.2)   698 (79.3)
Median                       Tweedie   220 (100)    216 (98.2)   220 (100)    219 (99.5)   875 (99.4)
                             LN        213 (96.8)   198 (90.0)   201 (91.4)   205 (93.2)   817 (92.8)
75th percentile              Tweedie   219 (99.5)   220 (100)    219 (99.5)   220 (100)    878 (99.8)
                             LN        219 (99.5)   217 (98.6)   219 (99.5)   220 (100)    875 (99.4)
95th percentile              Tweedie   219 (99.5)   219 (99.5)   218 (99.1)   220 (100)    876 (99.5)
                             LN        173 (78.6)   180 (81.8)   172 (78.2)   154 (70.0)   679 (77.2)
Skewness                     Tweedie   192 (87.3)   197 (89.5)   204 (92.7)   188 (85.5)   781 (88.8)
                             LN        165 (75.0)   129 (58.6)   105 (47.7)   131 (59.5)   530 (60.2)
Probability of no rainfall   P-G       119 (99.2)   123 (100)    138 (100)    126 (100)    506 (99.8)
                             LN        Not applicable
The graph in Figure 5 shows the distribution of the simulated medians with empirical
95% confidence intervals and the observed medians of seasonal rainfall data sets. The
seasonal rainfall data sets shown all have a median rainfall between 45 and 65 mm;
these were selected for comparison because the distributions can all be shown using a
similar vertical scale. When the data were simulated from the Tweedie model, the
median of the simulated medians is close to the median of the observed seasonal
rainfall totals. For all the seasons presented in the graph, the medians of observed
rainfall totals are within the empirical 95% confidence intervals of the simulated
medians.
Figure 5: Graphs showing the observed medians (cross), the medians of simulated
medians (square) along with the empirical 95% confidence intervals (vertical lines) for
forty relatively dry seasons.
It is also important that the distributions simulate the extreme rainfall events well.
Figures 6 and 7 were constructed with the 5th percentiles (for stations whose observed 5th
percentiles are between 15 mm and 18 mm) and the 95th percentiles (for the stations
chosen earlier to construct the graph for the median) respectively of seasonal rainfall
totals. Both the 5th percentiles and 95th percentiles of the observed seasonal rainfall
totals of the selected seasons are close to the medians of the 5th percentiles and 95th
percentiles respectively of the simulated data from the Tweedie distributions. Except for
one of the selected forty stations, the 5th percentiles of observed rainfall totals are within
the empirical 95% confidence intervals of the 5th percentiles of simulated rainfall data.
For all selected rainfall stations, the 95th percentiles of the observed rainfall are within
the empirical 95% confidence intervals of the 95th percentiles of the simulated data from
the near-optimal distributions within the Tweedie family. Figures 5, 6 and 7 indicate
that the Tweedie models simulate these features of the seasonal rainfall data well.
Figure 6: Graphs showing the observed 5th percentiles (cross), medians of the 5th
percentiles of simulated rainfall totals (squares) along with the empirical 95%
confidence intervals (vertical lines) for forty relatively dry seasons.
Figure 7: Graphs showing the observed 95th percentiles (cross), medians of the 95th
percentiles of simulated rainfall totals (squares) along with the empirical 95%
confidence intervals (vertical lines) for forty relatively dry seasons.
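The checks summarised in Figures 5 to 7 can be sketched in a few lines of Python. This is a minimal illustration only, not the code used in the thesis (which relied on R): it assumes the usual Poisson-gamma representation of a Tweedie distribution with mean mu, dispersion phi and index p between 1 and 2, the parameter values are hypothetical, and the "observed" series is a synthetic stand-in generated from the same model rather than station data. All function names are illustrative.

```python
import math
import random

def tweedie_to_poisson_gamma(mu, phi, p):
    """Map Tweedie (mu, phi, p), 1 < p < 2, to Poisson rate, gamma shape, gamma scale."""
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))
    shape = (2.0 - p) / (p - 1.0)
    scale = phi * (p - 1.0) * mu ** (p - 1.0)
    return lam, shape, scale

def draw_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def draw_total(mu, phi, p, rng):
    """One simulated seasonal total: a Poisson number of gamma-distributed amounts."""
    lam, shape, scale = tweedie_to_poisson_gamma(mu, phi, p)
    n_events = draw_poisson(lam, rng)
    return sum(rng.gammavariate(shape, scale) for _ in range(n_events))

def quantile(values, q):
    """Empirical quantile by linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

def simulation_interval(mu, phi, p, n_years, q, n_sims, rng):
    """Lower limit, median and upper limit of the empirical 95% interval
    of the q-th quantile across n_sims simulated series of n_years each."""
    stats = [
        quantile([draw_total(mu, phi, p, rng) for _ in range(n_years)], q)
        for _ in range(n_sims)
    ]
    return quantile(stats, 0.025), quantile(stats, 0.5), quantile(stats, 0.975)

rng = random.Random(1)
mu, phi, p = 55.0, 4.0, 1.6  # hypothetical values, not estimates from the thesis
observed = [draw_total(mu, phi, p, rng) for _ in range(100)]  # synthetic stand-in
low, mid, high = simulation_interval(mu, phi, p, n_years=100, q=0.5, n_sims=200, rng=rng)
inside = low <= quantile(observed, 0.5) <= high
print(f"simulated median {mid:.1f}, 95% interval ({low:.1f}, {high:.1f}), inside: {inside}")
```

Setting q to 0.05 or 0.95 gives the corresponding check for the lower and upper percentiles. With real data, `observed` would be replaced by the station's seasonal totals and (mu, phi, p) by the fitted values.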
The P-G distributions within the Tweedie family simulate continuous data with exact
zeros. Figure 8 shows the observed probabilities of no rainfall and the medians of the
simulated probabilities of no rainfall, with empirical 95% confidence intervals, for
some seasonal rainfall datasets (those with an observed probability of no rainfall
between 0.05 and 0.17). These seasonal datasets were chosen because their
probabilities of no rainfall were similar, and hence the information could be shown with
a convenient vertical scale. The observed probabilities of no rainfall are very close to
the medians of the probabilities of no rainfall from the simulated data (Figure 8). For
the seasonal rainfall datasets included in the graph, the observed probabilities of no
rainfall are within the empirical 95% confidence intervals of the probabilities of no
rainfall of the simulated data.
Figure 8: Observed probabilities of no rainfall (cross) and the medians of the
probabilities of no rainfall (squares) with empirical 95% confidence intervals (vertical
lines) for simulated rainfall for some seasonal datasets where the probability of no
rainfall in the observed rainfall time series is between 0.05 and 0.17.
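The exact zeros arise because a Poisson-gamma total is zero whenever the underlying Poisson count of rainfall events is zero. A minimal Python sketch of the resulting probability follows. It assumes the usual parameterisation with mean mu, dispersion phi and index p between 1 and 2, and the numerical values are hypothetical rather than fitted to any station.

```python
import math

def poisson_rate(mu, phi, p):
    """Poisson rate of rainfall events implied by Tweedie (mu, phi, p), 1 < p < 2."""
    if not 1.0 < p < 2.0:
        raise ValueError("the Poisson-gamma case requires 1 < p < 2")
    return mu ** (2.0 - p) / (phi * (2.0 - p))

def zero_probability(mu, phi, p):
    """P(Y = 0): the probability that the Poisson count of events is zero."""
    return math.exp(-poisson_rate(mu, phi, p))

# Hypothetical means (mm) with a common dispersion and index.
for mu in (5.0, 20.0, 55.0):
    print(mu, round(zero_probability(mu, phi=4.0, p=1.6), 4))
```

For fixed phi and p the probability of a dry season falls as the mean rises, which is the qualitative behaviour expected when moving from drier to wetter stations.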
Conclusion
Modelling the seasonal rainfall totals has potential uses in hydrology, in agriculture and
also in water resources planning and management. The log-normal and the gamma
distributions are commonly used to model non-zero seasonal rainfall amounts. To
incorporate exact zeros with the positive rainfall totals, we consider the Tweedie family
of distributions for modelling the seasonal rainfall totals. Considering seasonal rainfall
data from 220 stations from different parts of Australia, we show that the normal
distributions are rarely optimal within the Tweedie family for modelling seasonal
rainfall totals. The gamma distributions are near-optimal for a reasonable number of
stations and the P-G distributions are near-optimal for more than half of the studied
stations. Seasonal rainfall data simulated from the near-optimal distributions within the
Tweedie family have properties very similar to the observed rainfall data. The P-G
distributions, the near-optimal distributions within the Tweedie family for more than
half of the studied stations, simulate continuous rainfall data with exact zeros. For the
studied stations, the medians of the probability of no rainfall for the data simulated from
the P-G distributions are very close to the observed probability of no rainfall from the
available dataset.
The Tweedie distributions can be used to simulate the occurrence and the amount of
seasonal rainfall reasonably well, and hence, can be used in hydrology for assessing
droughts, and in water resources planning and management. The distributions can also
be used to fit models to the seasonal rainfall totals with climatological variables to
understand their effect on the amount and the occurrence of seasonal rainfall totals.
References
Aksoy H. 2006. Use of gamma distributions in hydrological analysis. Turkish Journal of
Engineering and Environmental Sciences 24: 419–428.
Almeira GJ, Scian B. 2006. Some atmospheric and oceanic indices as predictors of seasonal
rainfall in the Del Plata Basin of Argentina. Journal of Hydrology 329: 350–359.
Canon J, Gonzalez J, Valdes J. 2007. Precipitation in the Colorado River basin and its low
frequency associations with PDO and ENSO signals. Journal of Hydrology 333: 252–264.
Chandler RE. 2005. On the use of generalized linear models for interpreting climate variability.
Environmetrics 16: 699–715.
Chandler RE, Wheater HS. 2002. Analysis of rainfall variability using generalized linear models:
a case study from the west of Ireland. Water Resources Research 38(10): 1192–1202.
Cheung WH, Senay GB, Singh A. 2008. Trends and spatial distribution of annual and seasonal
rainfall in Ethiopia. International Journal of Climatology. Published online in Wiley InterScience
DOI: 10.1002/joc.1623.
Chiew FHS, Leahy MJ. 2003. Inter-decadal Pacific Oscillation modulation of the impact of
El Niño Southern Oscillation on Australian rainfall and streamflow. MODSIM 1–4: 100–105.
Cho HK, Bowman KP, North GR. 2004. A comparison of gamma and lognormal
distributions for characterizing satellite rain rates from the tropical rainfall measuring mission.
Journal of Applied Meteorology 43: 1586–1597.
Coe R, Stern RD. 1982. Fitting models to daily rainfall. Journal of Applied Meteorology 21:
1024–1031.
Daniel WW. 2009. Biostatistics: A Foundation for Analysis in the Health Sciences, Ninth Edition.
John Wiley and Sons, Inc. United States of America.
Das PK, Subash N, Sikka AK, Sharda VN, Sharma NK. 2006. Modelling weekly rainfall using
gamma probability distribution and Markov chain for crop planning in a subhumid (dry) climate
of central Bihar. Indian Journal of Agricultural Sciences 76(6): 358–361.
Drosdowsky W, Chambers LE. 2001. Near-Global sea surface temperature anomalies as
predictors of Australian seasonal rainfall. Journal of Climate 14: 1677–1687.
Dunn PK. 2004. Occurrence and quantity of precipitation can be modelled simultaneously.
International Journal of Climatology 24: 1231–1239.
Dunn PK. 2010. Tweedie: Tweedie exponential family models. R package version 2.0.5.
Vienna, Austria.
Dunn PK, Smyth GK. 2005. Series evaluation of Tweedie exponential dispersion model densities.
Statistics and Computing 15: 267–280.
Dunn PK, Smyth GK. 2008. Evaluation of Tweedie exponential dispersion model densities by
Fourier inversion. Statistics and Computing 18: 73–86.
Durban M, Glasbey CA. 2001. Weather modelling using a multivariate latent Gaussian model.
Agricultural and Forest Meteorology 109: 187–201.
Endale DM, Fisher DS, Steiner JL. 2003. Long-term rainfall-runoff characteristics of a small
southern piedmont watershed. First Interagency Conference on Research in the Watersheds. 497–
502.
Glasbey CA, Nevison IM. 1997. Rainfall modelling using a latent Gaussian variable. In:
Modelling Longitudinal and Spatially Correlated Data: Methods, Applications and Future
Directions. Springer, New York, pp. 233–242.
Gonzalez MH, Cariga ML. 2009. An approach to seasonal forecasting of summer rainfall in
Buenos Aires, Argentina. Atmosfera 22(3): 265–279.
Hasan MM, Dunn PK. 2010a. Two Tweedie distributions that are near-optimal for modelling
monthly rainfall in Australia. International Journal of Climatology. Published online in Wiley
InterScience. DOI: 10.1002/joc.2162.
Hasan MM, Dunn PK. 2010b. A simple Poisson–gamma model for modelling rainfall occurrence
and amount simultaneously. Agricultural and Forest Meteorology 150: 1319–1330.
Hasan MM, Dunn PK. 2011. Understanding the effect of climatology on monthly rainfall
amounts in Australia using Tweedie GLMs. International Journal of Climatology. Published
online in Wiley InterScience. (www.interscience.wiley.com) DOI: 10.1002/joc.2332.
Jiang J. 2007. Linear and Generalized Linear Mixed Models and Their Applications. Springer:
New York.
Jørgensen B. 1987. Exponential dispersion models (with discussion). Journal of the Royal
Statistical Society Series B 49: 127–162.
Jørgensen B. 1997. The Theory of Dispersion Models. Chapman and Hall: London.
Kirono DGC, Chiew FHS, Kent DM. 2010. Identification of best predictors for forecasting
seasonal rainfall and runoff in Australia. Hydrological Processes 24: 1237–1247.
Laux P, Wagner S, Wagner A, Jacobeit J, Bardossy A, Kunstmann H. 2009. Modelling daily
precipitation features in the Volta Basin of west Africa. International Journal of Climatology 21:
5113–5134.
Mason SJ. 1998. Seasonal forecasting of south African rainfall using a non-linear discriminant
analysis model. International Journal of Climatology 18: 147–164.
McCullagh P, Nelder JA. 1989. Generalized Linear Models, 2nd edition, Chapman and Hall:
London.
Meinke H, Baethgen WE, Carberry PS, Donatelli M, Hammer GL, Selvaraju R, Stöckle CO.
2001. Increasing profits and reducing risks in crop production using participatory systems
simulation approaches. Agricultural Systems 70: 493–513.
Mooley DA. 1975. Worst summer monsoon failures over the Asiatic monsoon area. Proceedings
of Indian National Science Academy 42A (1): 34–43.
Nnaji AO. 2001. Forecasting seasonal rainfall for agricultural decision-making in northern
Nigeria. Agricultural and Forest Meteorology 107: 193–205.
Ntale HK, Gan TY, Mwale D. 2003. Prediction of East African Seasonal Rainfall Using Simplex
Canonical Correlation Analysis. Journal of Climate 16: 2105–2112.
Pai DS, Sridhar L, Guhathakurta P, Hatwar HR. 2010. District-wise drought climatology of the
southwest monsoon season over India based on Standardized Precipitation Index (SPI). Research
Report 2/2010. National Climate Centre, India Meteorological Department, Pune - 411 005.
Purdie JM, Bardsley WE. 2010. Seasonal prediction of lake inflows and rainfall in a hydroelectricity catchment, Waitaki River, New Zealand. International Journal of Climatology 30: 372–
389.
R Development Core Team. 2010. R: A Language and Environment for Statistical Computing. R
Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL:
http://www.R-project.org.
Rader M, Kirshen P, Roncoli C, Hoogenboom G, Ouattara F. 2009. Agricultural risk decision
support system for resource-poor farmers in Burkina Faso, West Africa. Journal of Water
Resources Planning and Management 135(5): 323–333.
Richardson CW, Wright DA. 1984. WGEN: A Model for Generating Daily Weather Variables.
Technical Report No 8, United States Department of Agriculture, Agriculture Research Service.
Risbey JS, Pook MJ, McIntosh PC, Wheeler MC, Hendon HH. 2009. On the remote drivers of
rainfall variability in Australia. Monthly Weather Review 137: 3233–3253.
Sen Z, Eljadid AG. 1999. Rainfall distribution function for Libya and rainfall prediction. Journal
of Hydrological Sciences 44: 665–680.
Sharda VN, Das PK. 2005. Modelling weekly rainfall data for crop planning in a sub-humid
climate of India. Agricultural Water Management 76: 120–138.
Singhrattna N, Rajagopalan B, Clark M, Kumar KK. 2005. Seasonal forecasting of Thailand
summer monsoon rainfall. International Journal of Climatology 25: 649–664.
Smyth GK. 1996. Regression analysis of quantity data with exact zeroes. In: Proceedings of the
Second Australia–Japan Workshop on Stochastic Models in Engineering, Technology and
Management. Technical report. Technology Management Centre, University of Queensland, pp.
572–580.
Stern RD, Coe R. 1984. A model fitting analysis of daily rainfall data (with discussion). Journal
of the Royal Statistical Society Series A 147: 1–34.
Stone RC, Meinke H. 2005. Operational seasonal forecasting of crop performance. Philosophical
Transactions of the Royal Society B 360: 2109–2124.
Suppiah R. 2004. Trends in the southern oscillation phenomenon and Australian rainfall and
changes in their relationship. International Journal of Climatology 24: 269–290.
Tweedie MCK. 1984. An index which distinguishes between some important exponential
families. Statistics: applications and new directions. In: Proceedings of the Indian Statistical
Institute Golden Jubilee International Conference. Technical Report. Indian Statistical Institute,
Calcutta.
Ward MN, Folland CK. 1991. Prediction of seasonal rainfall in the north nordeste of Brazil using
eigenvectors of sea-surface temperature. International Journal of Climatology 11(7): 711–743.
Wilks DS. 1999. Interannual variability and extreme-value characteristics of several stochastic
daily precipitation models. Agricultural and Forest Meteorology 93: 153–169.
CHAPTER 8: CONCLUSION AND FUTURE DIRECTIONS
Conclusion
In the current study, some novel measures of rainfall variability were explored and
developed for application to water resource availability in Australia. To understand
some other characteristics of the monthly and seasonal rainfall totals of Australian
stations, I aimed to find the theoretical probability distributions most useful for
modelling rainfall. For this purpose, the distributions within the Tweedie family were
considered. This family of distributions has important and useful properties. Some of
the distributions within the Tweedie family incorporate exact zeros with positive
monthly and seasonal rainfall totals. The Tweedie family belongs to the exponential
dispersion model family of distributions, upon which the generalized linear models
(GLMs) are based. Consequently, a framework already exists for fitting models based
on the Tweedie distributions, for inference and for diagnostic testing. I also explored the
possibility that different distributions are needed for modelling the rainfall totals of each
month or season. Tweedie GLMs with cyclic sine and cosine terms and climatological
variables were fitted to the monthly rainfall totals of Australian rainfall stations. The
following general conclusions can be drawn from this study:
1. To quantify the variability in distribution of long-term average rainfall, I defined
the ESR (entropy of stable rainfall). On the basis of the ESR, the stations located
in the northern regions of Australia record more variability in long-term average
rainfall across the months of the year than the stations located in the southern
regions. The value of the ESR for a station is used as a baseline to compare the
consistency in rainfall of individual years with the distribution of long-term
average rainfall.
2. To compare the variability in the distribution of monthly rainfall in individual
years with the variability in the distribution of stable rainfall, I defined the
consistency index (CI). On the basis of the CI, the regions close to the coastline
in northern, southern and eastern Australia receive more consistent rainfall than
elsewhere. Almost everywhere in Australia, the El Niño years receive more
inconsistent rainfall than the La Niña years.
3. I divided the studied years into four categories of water resource availability on
the basis of the CI and the total annual rainfall. The northern, south-eastern and
south-western parts of Australia have better availability of water resources than
elsewhere. In El Niño years, almost everywhere in Australia has a higher
percentage of years with below-average and more inconsistent rainfall
distributions than in La Niña years. The El Niño years also have a higher
probability of experiencing water shortage, especially in some months of the
year. Special programs, including water preservation and efficient strategies for
water usage, should be implemented to cater for the necessity of providing water
in the months with an expected shortage of water.
4. Distributions within the Tweedie family fit the monthly rainfall totals of
Australian stations well. These distributions are useful for the disaggregation of
monthly rainfall into shorter timescales. A framework already exists for fitting
models based on the Tweedie distributions, for inference and for diagnostic
testing. In addition, predictors are easily incorporated into the modelling
procedure.
5. The gamma distribution (the Tweedie distribution with p = 2) is almost always
the optimal or near-optimal distribution for modelling positive monthly rainfall
amounts in Australian stations. In cases in which some months record exactly
zero rainfall, Poisson–gamma distributions using p̂ = 1.6 are shown to fit well.
In cases in which the gamma distribution within the Tweedie family is optimal, I
propose that simulations instead should be based on a value of p slightly less
than 2; say p = 1.99. This has the advantage of still modelling the data well, yet
admitting the possibility of zero rainfall in the simulated future.
6. Poisson–gamma GLMs with a single value of p̂ = 1.6 were used for modelling
the monthly rainfall totals of Australian stations. Despite their simplicity, these
models fit the data well and produce reasonable simulated monthly rainfall data.
7. The Poisson–gamma GLMs were fitted to the monthly rainfall totals of the
Australian rainfall stations with cyclic sine and cosine terms and the lagged
values of the climatological variables NINO 3.4, SOI and SOI phase. Four
different models were studied to determine whether any of the models with
climatological variables improved on the performance of the base model (the
model with only the sine and cosine terms as predictors), which models performed
better than the base model, and where each model with a climatological index
did so.
8. Based on the BIC, the model with the NINO 3.4 is generally preferred for most
of the studied stations. Based on the LRT, the models with the climatological
variables fit significantly better than the base model in most of the regions in
north-eastern and eastern parts of Australia. Models with NINO 3.4 also fit
significantly better in some places on the coastlines of Western Australia.
Substantial changes in the predicted amount and the probability of no rainfall are
observed when the climatological variables are added to the base model.
9. Importantly, in some parts of western and central Australia, none of the models
with climatological variables fit significantly better than the base model;
however, little data is available from these regions.
10. Most of the seasonal rainfall models used in the literature are based on the
assumption of the normality of the data. However, the seasonal rainfall totals for
most Australian stations are not approximately normally distributed and have
long tails to the right. The distributions within the Tweedie family fit well to
such seasonal rainfall totals.
11. Within the Tweedie family, the normal distributions are rarely optimal for
seasonal rainfall totals of Australian stations. The gamma distributions are
near-optimal for a reasonable number of stations and the Poisson–gamma
distributions are near-optimal for more than half of the 220 studied stations.
12. Seasonal rainfall data simulated from the near-optimal distributions within the
Tweedie family have properties very similar to the observed rainfall data of the
respective stations. The Poisson–gamma distributions, the near-optimal
distributions within the Tweedie family for more than half of the studied
stations, simulate continuous rainfall data with exact zeros. For the studied
stations, the medians of the probability of no rainfall in the data simulated from
the Poisson–gamma distributions are very close to the observed probability of no
rainfall from the available dataset.
13. The Tweedie distributions have been used in simulating seasonal rainfall data.
The distributions can also be used to fit models to the seasonal rainfall totals
with a set of predictors.
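Conclusions 7 and 8 rest on three ingredients: cyclic sine and cosine predictors, the BIC, and the likelihood ratio statistic. The Python sketch below illustrates them under stated assumptions: a 12-month period and a log link are assumed rather than taken from this chapter, the function names are illustrative, and the coefficients are hypothetical values chosen only to show a seasonal cycle.

```python
import math

def harmonic_terms(month):
    """Annual-cycle predictors for a calendar month numbered 1 to 12."""
    angle = 2.0 * math.pi * month / 12.0
    return math.sin(angle), math.cos(angle)

def fitted_mean(month, b0, b_sin, b_cos, b_clim=0.0, clim=0.0):
    """Mean monthly rainfall under an assumed log link.

    b_clim and clim stand for one lagged climatological index (for example
    NINO 3.4); leaving them at zero gives the base model.
    """
    s, c = harmonic_terms(month)
    return math.exp(b0 + b_sin * s + b_cos * c + b_clim * clim)

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion; the smaller value is preferred."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

def lrt_statistic(log_lik_base, log_lik_full):
    """Likelihood ratio statistic for a base model nested in a fuller model."""
    return 2.0 * (log_lik_full - log_lik_base)

# Hypothetical coefficients, chosen only to show the seasonal cycle.
for month in (1, 4, 7, 10):
    print(month, round(fitted_mean(month, b0=4.0, b_sin=0.3, b_cos=0.5), 1))
```

In use, the likelihood ratio statistic would be compared with a chi-square distribution whose degrees of freedom equal the number of extra parameters in the fuller model, while the BIC comparison needs only the two maximised log-likelihoods, the parameter counts and the number of observations.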
Future Directions
In this study, various characteristics of monthly and seasonal rainfall totals of Australian
rainfall stations are studied. An index is defined to quantify the consistency in rainfall
distribution across the months of the year. The monthly and seasonal rainfall totals of
Australian rainfall stations are investigated and well-fitting theoretical probability
distributions are proposed. Tweedie generalized linear models are fitted to model
monthly rainfall totals
with a number of predictors. The research that has been undertaken for this thesis has
highlighted a number of topics on which further research would be beneficial.
1. The proposed consistency index can be used for quantifying the variability in
seasonal rainfall totals.
2. The impact of some other climatological variables on the consistency index and
on the potential water resource availability of Australian stations can be
quantified.
3. Some other climatological variables can be added to the Tweedie GLMs to
assess their impacts on the monthly rainfall totals of Australian stations.
4. Tweedie GLMs can be fitted to the seasonal rainfall totals of the Australian
stations with the season factor and with a number of climatological variables.
5. The models can be incorporated into other models used in agriculture and in
hydrology.
References
ABC News. January 18, 2011. Flood costs tipped to top $30b.
Abouammoh AM. 1986. On Probability distribution of monthly precipitation totals in
arid regions. Journal of Applied Statistics 2(3): 51–68.
Abtew W, Melesse AM, Dessalegne T. 2009a. El Niño Southern Oscillation link to the
Blue Nile River Basin hydrology. Hydrological Processes 23: 3653–3660.
Abtew W, Melesse AM, Dessalegne T. 2009b. Spatial, inter and intra-annual variability
of the upper Blue Nile basin rainfall. Hydrological Processes 23: 3075–3082.
Adejuwon JO, Odekunle TO, Omotayo MO. 2007. Extended-Range Weather
Forecasting in Sub-Saharan West Africa: Assessing a Potential Tool for Adapting Food
Production to Climate Variability and Climate Change. AIACC Working Paper No. 46.
Aksoy H. 2006. Use of gamma distributions in hydrological analysis. Turkish Journal of
Engineering and Environmental Sciences 24: 419–428.
Ali A, Abtew W, Horn SV, Khanal N. 2000. Temporal and spatial characterization
of rainfall over central and south Florida. Journal of the American Water Resources
Association 36(4): 833–848.
Allan DM, Haan CT. 1975. Stochastic Simulation of Daily Rainfall. Technical Report
82. Water Resources Institute, University of Kentucky.
Almeira GJ, Scian B. 2006. Some atmospheric and oceanic indices as predictors of
seasonal rainfall in the Del Plata Basin of Argentina. Journal of Hydrology 329: 350–
359.
Ananthakrishnan R, Soman MK. 1989. Statistical distribution of daily rainfall and its
association with the coefficient of variation of rainfall series. International Journal of
Climatology 9: 485–500.
Anderson DR. 2008. Model Based Inference in the Life Sciences. Springer, USA.
Ati OF, Stigter CJ, Iguisi EO, Afolayan JO. 2009. Profile of Rainfall Change and
Variability in the Northern Nigeria, 1953–2002. Research Journal of Environmental and
Earth Sciences 1(2): 58–63.
Aubert D, Loumagne C, Oudin L. 2003. Sequential assimilation of soil moisture and
stream flow data in a conceptual rainfall runoff model. Journal of Hydrology 280: 145–
161.
Avseth P, Mukerji T, Mavko G. 2005. Quantitative Seismic Interpretation: Applying
Rock Physics Tools to Reduce Interpretation Risk. Cambridge University Press, New
York.
Ayansina A. 2009. Seasonal rainfall variability in Guinea Savanna part of Nigeria: a
GIS approach. International Journal of Climate Change Strategies and Management
1(3): 282–296.
Barlow S et al. 2010. National Climate Change Adaptation Research Plan: Primary
Industries. Consultation Draft.
Barnston AG, Chelliah M, Goldenberg SB. 1997. Documentation of a highly
ENSO-related SST region in the equatorial Pacific. Atmosphere-Ocean 35: 367–383.
Beeton RJS, Buckley KI, Jones GJ, Morgan D, Reichelt RE, Trewin D. 2006. Australia
state of the environment 2006. Technical report, Independent report to the Australian
Government Minister for the Environment and Heritage, Department of the
Environment and Heritage, Canberra.
Ben-Gai T, Bitan A, Manes A, Alpert P, Rubin S. 1998. Spatial and temporal changes in
rainfall frequency distribution patterns in Israel. Theoretical and Applied Climatology.
61: 177–190.
Bewket W. 2007. Rainfall variability and agricultural vulnerability in the Amhara
region, Ethiopia. Ethiopian Journal of Development Research. 29(1): 1–34.
Bhakar SR, Singh RV, Chhajed N, Bansal AK. 2006. Stochastic modeling of monthly
rainfall at Kota region. ARPN Journal of Engineering and Applied Sciences 1 (3): 36–
44.
Boer R, Fletcher DJ, Campbell LC. 1993. Rainfall patterns in a major wheat-growing
region of Australia. Australian Journal of Agricultural Research 44: 609–624.
Bosch DD, Davis FM. 1998. Rainfall variability and spatial patterns for the southeast.
In Proceedings of the 4th International Conference on Precision Agriculture. July 19–22,
1998. St. Paul, Minnesota.
Bronikowski AM, Altmann J. 1996. Foraging in a variable environment: weather
patterns and the behavioural ecology of baboons. Behavioural Ecology and
Sociobiology 39: 11–25.
Bronikowski AM, Webb C. 1996. A critical examination of rainfall variability measures
used in behavioural ecology studies. Behavioural Ecology and Sociobiology 39: 27–30.
Buishand TA. 1978. Some remarks on the use of daily rainfall models. Journal of
Hydrology 36: 295–308.
Buishand TA, Shabalova MV, Brandsma T. 2004. On the choice of the temporal
aggregation level for statistical downscaling of precipitation. Journal of Climate 17:
1816–1827.
Bureau of Meteorology. 1989. Climate of Australia. Australian Government Publishing
Service, Canberra.
Bureau of Meteorology (BoM). 2010. Australian Government Website. URL:
http://www.bom.gov.au/lam/climate/levelthree/ausclim/zones.htm.
Canon J, Gonzalez J, Valdes J. 2007. Precipitation in the Colorado River basin and its
low frequency associations with PDO and ENSO signals. Journal of Hydrology 333:
252–264.
Chandler RE. 2005. On the use of generalized linear models for interpreting climate
variability. Environmetrics 16: 699–715.
Chandler RE, Wheater HS. 2002. Analysis of rainfall variability using generalized
linear models: a case study from the west of Ireland. Water Resources Research 38(10):
1192–1202.
Chapman TG. 1997. Stochastic models for daily rainfall in the western Pacific.
Mathematics and Computers in Simulation 43: 351–358.
Chapman TG. 1998. Stochastic modelling of daily rainfall: the impacts of adjoining wet
days on the distribution of rainfall amounts. Environmental Modelling and Software 13:
317–324.
Chaves HML, Piau LP. 2008. Effect of rainfall variability and land use on Runoff and
sediment in the Pipiripau River basin in the Distrito Federal, Brazil. Journal of Soil
Science, 32: 333–343.
Cheung WH, Senay GB, Singh A. 2008. Trends and spatial distribution of annual and
seasonal rainfall in Ethiopia. International Journal of Climatology. Published online in
Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/joc.1623.
Chiew FHS, Leahy MJ. 2003. Inter-decadal Pacific Oscillation modulation of the
impact of El Niño Southern Oscillation on Australian rainfall and streamflow.
MODSIM 1–4, 100–105.
Cho HK, Bowman KP, North GR. 2004. A comparison of gamma and log-normal
distributions for characterizing rain rates from the tropical rainfall measuring mission.
Journal of Applied Meteorology 43: 1586–1597.
Chowdhury RK, Beecham S. 2010. Australian rainfall trends and their relation to the
southern oscillation index. Hydrological Processes 24: 504–514.
Chowdhury S, Sharma A. 2007. Mitigating parameter bias in hydrological modelling
due to uncertainty in covariates. Journal of Hydrology 340: 197–204.
Coe R, Stern RD. 1982. Fitting models to daily rainfall. Journal of Applied
Meteorology 21: 1024–1031.
Conway D, Allison E, Felstead R, Goulden M. 2005. Rainfall Variability in East Africa:
Implications for Natural Resources Management and Livelihoods. Philosophical
Transactions: Mathematical, Physical and Engineering Sciences. 363(1826): 49–54.
Daniel WW. 2009. Biostatistics: A Foundation for Analysis in the Health Sciences, Ninth
Edition. John Wiley and Sons, Inc. United States of America.
Das SC. 1955. The fitting of truncated Type III curves to daily rainfall data. Australian
Journal of Physics 8: 298–304.
Das PK, Subash N, Sikka AK, Sharda VN, Sharma NK. 2006. Modelling weekly
rainfall using gamma probability distribution and Markov chain for crop planning in a
subhumid (dry) climate of central Bihar. Indian Journal of Agricultural Sciences 76(6):
358–361.
Deni SM, Jemain AA, Ibrahim K. 2009. Fitting optimum order Markov chain models
for daily rainfall occurrences in peninsular Malaysia. Theoretical and Applied
Climatology 97: 109–121.
Dewar RE, Wallis JR. 1999. Geographical patterning of interannual rainfall variability
in the tropics and near tropics: An L-moments approach. Journal of Climate. 12: 3457–
3466.
Diggle PJ, Ribeiro PJ Jr. 2007. Model-based Geostatistics, Springer, New York.
Dingens P, Steyaert H. 1971. Distribution for K-day rainfall totals. Hydrological
Sciences Journal. 16(3): 19–24.
Dobson AJ. 2002. An Introduction to Generalized Linear Models, 2nd edition.
Chapman and Hall, London.
Dobson AJ, Barnett AG. 2008. An introduction to generalized linear models, Chapman
and Hall, London.
Drosdowsky W, Chambers LE. 2001. Near-Global sea surface temperature anomalies as
predictors of Australian seasonal rainfall. Journal of Climate 14: 1677–1687.
Dunbar RIM. 1992. Time: a hidden constraint on a behavioural ecology of Baboons.
Behavioural Ecology and Sociobiology, 31: 35–49.
Dunn PK. 2004. Occurrence and quantity of precipitation can be modelled
simultaneously. International Journal of Climatology 24: 1231–1239.
Dunn PK. 2010. Tweedie: Tweedie Exponential Family Models. R package, Vienna,
Austria, R package version 2.0.5.
Dunn PK, Smyth GK. 1996. Randomized quantile residuals. Journal of Computational
and Graphical Statistics 5(3): 236–244.
Dunn PK, Smyth GK. 2005. Series evaluation of Tweedie exponential dispersion model
densities. Statistics and Computing 15: 267–280.
Dunn PK, Smyth GK. 2008. Evaluation of Tweedie exponential dispersion model
densities by Fourier inversion. Statistics and Computing 18: 73–86.
Dunn PK, White N. 2005. Power-variance models for modelling rainfall. In: Statistical
Solution to Modern Problems: Proceedings of the 20th International Workshop on
Statistical Modelling, Sydney, 149–156.
Durban M, Glasbey CA. 2001. Weather modelling using a multivariate latent Gaussian
model. Agricultural and Forest Meteorology 109: 187–201.
Ebrahimi N, Maasoumi E, Soo ES. 1999. Ordering univariate distributions by entropy
and variance. Journal of Econometrics 90(2): 317–337.
El-seed AMG. 1987. An application of Markov chain model for wet and dry spell
probabilities at Juba in southern Sudan. Geojournal 15(4): 420–424.
Endale DM, Fisher DS, Steiner JL. 2003. Long-term rainfall-runoff characteristics of a
small southern piedmont watershed. First Interagency Conference on Research in the
Watersheds. 497–502.
Everingham YL, Reason CJC. 2009. Interannual variability in rainfall and wet spell
frequency during the New South Wales sugarcane harvest season. International Journal
of Climatology, Published online in Wiley InterScience. DOI: 10.1002/joc.2066.
Fealy R, Sweeney J. 2007. Statistical downscaling of precipitation for a selection of
sites in Ireland employing a generalized linear modelling approach. International
Journal of Climatology, 27 (15): 2083–2094.
Fernandes MVM, Schmidt AM, Migon HS. 2009. Modelling zero-inflated
spatiotemporal processes. Statistical Modelling 9 (1): 3–25.
Feuerverger A. 1979. On some methods of analysis for weather experiments.
Biometrika 66: 655–658.
Gabriel KR, Neumann J. 1962. A Markov chain model for daily rainfall occurrence at
Tel Aviv. Journal of the Royal Meteorological Society 88: 90–95.
George DA, Birch C, Buckley D, Partridge IJ, Clewett JF. 2005. Assessing climate risk
to improve farm business management. Extension Farming Systems Journal. 1(1): 71–
77.
Giese BS, Compo GP, Slowey NC, Sardeshmukh PD, Carton JA, Ray S, Whitaker JS.
2010. The 1918/19 El Niño. Bulletin of American Meteorological Society 91: 177–183.
Gilchrist R, Drinkwater D. 1999. Fitting Tweedie Models to Data with Probability of
Zero Responses. Technical Report, Proceedings of the 14th International Workshop on
Statistical Modelling, Graz, Austria, July 19–23.
Glasbey CA, Nevison IM. 1997. Rainfall modelling using a latent Gaussian variable.
Modelling Longitudinal and Spatially Correlated Data: Methods, Applications, and
Future Directions. Springer: New York, 233–242.
Gonzalez J, Valdes JB. 2008. A regional monthly precipitation simulation model based
on an L-moment smoothed statistical regionalization approach. Journal of Hydrology
348: 27–39.
Gonzalez MH, Cariaga ML. 2009. An approach to seasonal forecasting of summer
rainfall in Buenos Aires, Argentina. Atmosfera 22 (3): 265–279.
Guttman NB, Hosking JRM, Wallis JR. 1993. Regional precipitation for the continental
United States computed from L-moments. Journal of Climate 6: 2326–2340.
Haghighatjou P. 2002. Probability distribution functions as applied to monthly and
annual precipitation of old station in Iran. Journal of Agricultural Sciences and Natural
Resources 9 (3): 41–48.
Hamlin MJ, Rees DH. 1987. The use of rainfall forecasts in the optimal management of
small-holder rice irrigation—a case study. Hydrological Sciences 32 (1): 15–29.
Hammer GL, Holzworth DP, Stone RC. 1996. The value of skill in seasonal climate
forecasting to wheat crop management in a region with high climatic variability.
Australian Journal of Agricultural Research 47: 717–737.
Hanley DE, Bourassa MA, O'Brien JJ, Smith SR, Spade ER. 2003. A Quantitative
Evaluation of ENSO Indices. Journal of Climate 16: 1249–1258.
Hansen JW, Ines AVM. 2005. Stochastic disaggregation of monthly rainfall data for
crop simulation studies. Agricultural and Forest Meteorology 131: 233–246.
Hansen JW, Mishra A, Rao KPC, Indeje M, Ngugi RK. 2009. Potential value of GCM-based seasonal rainfall forecasts for maize management in semi-arid Kenya.
Agricultural Systems 101: 80–90.
Hasan MM, Dunn PK. 2010a. Two Tweedie distributions that are near-optimal for
modelling monthly rainfall in Australia. International Journal of Climatology,
doi:10.1002/joc.2162, Published online in Wiley InterScience.
Hasan MM, Dunn PK. 2010b. A simple Poisson–gamma model for modelling rainfall
occurrence and amount simultaneously. Agricultural and Forest Meteorology 150(10):
1319–1330.
Hope P, Timbal B, Fawcett R. 2009. Associations between rainfall variability in the
southwest and southeast of Australia and their evolution through time. International
Journal of Climatology, Published online in Wiley InterScience. DOI: 10.1002/joc.1964.
Hughes BL, Saunders MA. 2002. Seasonal prediction of European spring precipitation
from El Niño southern oscillation and local sea-surface temperatures. International
Journal of Climatology 22: 1–14.
Husak GJ, Michaelsen J, Funk C. 2007. Use of the gamma distribution to represent
monthly rainfall in Africa for drought monitoring applications. International Journal of
Climatology 27: 935–944.
Ingsrisawang L, Ingsrisawang S, Luenam P, Trisaranuwatana P, Klinpratoom S,
Aungsuratana P, Khantiyanan W. 2010. Applications of statistical methods for rainfall
prediction over the eastern Thailand. Proceeding of the IMECS, Hong Kong.
Ison NT, Feyerherm AM, Bark DL. 1971. Wet period precipitation and the gamma
distribution. Journal of Applied Meteorology 10: 658–665.
Jacob M, McKendry I, Lee R. 2003. Long-term changes in rainfall intensities in
Vancouver, British Columbia. Canadian Water Resources Journal 28(4): 587–603.
Jamaludin S, Jemain AA. 2008. Fitting the statistical distribution for daily rainfall in
peninsular Malaysia based on the AIC criterion. Journal of Applied Sciences Research
4: 1846–1857.
Jiang J. 2007. Linear and Generalized Linear Mixed Models and Their Applications.
Springer: New York.
Jørgensen B. 1987. Exponential dispersion models (with discussion). Journal of the
Royal Statistical Society Series B 49: 127–162.
Jørgensen B. 1997. The Theory of Dispersion Models. Chapman and Hall: London.
Jørgensen B, Souza MCPD. 1994. Fitting Tweedie's compound Poisson model to
insurance claims data. Scandinavian Actuarial Journal 1: 69–93.
Kamarianakis Y, Feidas H, Kokolatos G, Chrysoulakis N, Karatzias V. 2008.
Evaluating remotely sensed rainfall estimates using nonlinear mixed models and
geographically weighted regression. Environmental Modelling & Software 23: 1438–
1447.
Katz RW. 1977. Precipitation as a chain dependent process. Journal of the Royal
Statistical Society Series B 16: 671–676.
Katz RW, Parlange MB, Naveau P. 2002. Statistics of extremes in hydrology. Advances
in Water Resources 25: 1287–1304.
Kawachi T, Maruyama T, Singh VP. 2001. Rainfall entropy for delineation of water
resources zones in Japan. Journal of Hydrology. 246: 36–44.
Kiem AS, Franks SW. 2001. On the identification of ENSO-induced rainfall and runoff
variability: A comparison of methods and indices. Hydrological Sciences. 46(5): 715–
727.
Kirono DGC, Chiew FHS, Kent DM. 2010. Identification of best predictors for
forecasting seasonal rainfall and runoff in Australia. Hydrological Processes 24: 1237–
1247.
Knapp AK, Fay PA, Blair JM, Collins SL, Smith MD, Carlisle JD, Harper CW, Danner
BT, Lett MS, McCarron JK. 2002. Rainfall Variability, Carbon Cycling, and Plant
Species Diversity in a Mesic Grassland. Science 298: 2202–2205.
Koutsoyiannis D. 2005. Uncertainty, entropy, scaling and hydrological stochastics. 1.
Marginal distribution properties of hydrological processes and state scaling.
Hydrological Sciences Journal 50(3): 381–404.
Lana X, Burgueno A. 2000. Some statistical characteristics of monthly and annual
pluviometric irregularity for the Spanish Mediterranean Coast. Theoretical and Applied
Climatology 65: 79–97.
Laux P, Wagner S, Wagner A, Jacobeit J, Bardossy A, Kunstmann H. 2009. Modelling
daily precipitation features in the Volta Basin of west Africa. International Journal of
Climatology 21: 5113–5134.
Lee CK, Shen SSP, Bailey B, North GR. 2009. Factor analysis for El Nino signals in
sea surface temperature and precipitation. Theoretical and Applied Climatology. 97:
195–203.
Lennox SM, Dunn PK, Power BD, Devoil P. 2004. A statistical distribution for
modelling rainfall with promising applications in crop science, Technical report. In:
Fischer, T., et al. (Eds.), New Directions for a Diverse Planet: Proceedings for the 4th
International Crop Science Congress. Brisbane, Australia.
Lewis F, Butler A, Gilbert L. 2010. A unified approach to model selection using the
likelihood ratio test. Methods in Ecology & Evolution. doi: 10.1111/j.2041-210X.2010.00063.x.
Liedloff AC, Cook GD. 2007. Modelling the effects of rainfall variability and fire on
tree populations in an Australian tropical savanna with the FLAMES simulation model.
Ecological Modelling 201: 269–282.
Little MA, McSharry PE, Taylor JW. 2009. Generalized Linear Models for Site-Specific Density Forecasting of U.K. Daily Rainfall. Monthly Weather Review 137:
1029–1045.
Lorenzo MN, Iglesias I, Taboada JJ, Gesteira MG. 2010. Relationship between monthly
rainfall in northwest Iberian Peninsula and North Atlantic sea surface temperature.
International Journal of Climatology 30: 980–990.
Loukas A, Vasiliades L. 2004. Probabilistic analysis of drought spatiotemporal
characteristics in Thessaly region, Greece. Natural Hazards and Earth System Sciences
4: 719–731.
Madi MT, Raqab MZ. 2007. Bayesian prediction of rainfall records using the
generalized exponential distribution. Environmetrics 18: 541–549.
Maheepala S, Perera CJC. 1996. Monthly hydrologic data generation by
disaggregation. Journal of Hydrology 178: 277–291.
Maruyama T, Kawachi T. 1998. Evaluation of rainfall characteristics using entropy.
Journal of Rainwater Catchment System. 4(1): 7–10.
Maruyama T, Kawachi T, Maeda S. 2002. Entropy-based assessments of monthly
rainfall variability. Journal of Rainwater Catchment Systems. 8(1): 21–25.
Maruyama T, Kawachi T, Singh VP. 2005. Entropy-based assessment and clustering of
potential water resources availability. Journal of Hydrology 309(1–4): 104–113.
Mason SJ. 1998. Seasonal forecasting of South African rainfall using a non-linear
discriminant analysis model. International Journal of Climatology 18: 147–164.
May W. 2004. Variability and extremes of daily rainfall during the Indian Summer
Monsoon in the period 1901–1989. Global and Planetary Change 44: 83–105.
McBride JL, Nicholls N. 1983. Seasonal relationships between Australian rainfall and
Southern Oscillation. Monthly Weather Review 111: 1998–2004.
McCown RL, Hammer GL, Hargreaves JNG, Holzworth DP, Freebairn DM. 1996.
APSIM: a novel software system for model development, model testing, and simulation
in agricultural research. Agricultural Systems 50: 255–271.
McCullagh P, Nelder JA. 1989. Generalized Linear Models, 2nd edition, Chapman and
Hall: London.
McIntosh RP. 1967. An index of diversity and the relation of certain concepts to
diversity. Ecology 48: 392–404.
Mielke PW Jr. 1973. Another family of distributions for describing and analysing
precipitation data. Journal of Applied Meteorology 12: 275–280.
Meinke H, Baethgen WE, Carberry PS, Donatelli M, Hammer GL, Selvaraju R, Stöckle
CO. 2001. Increasing profits and reducing risks in crop production using participatory
systems simulation approaches. Agricultural Systems 70: 493–513.
Meinke H, Devoil P, Hammer GL, Power S, Allan R, Stone RC, Folland C, Potgieter A.
2005. Rainfall variability at decadal and longer time scales: Signal or noise? Journal of
Climate 18: 89–96.
Meinke H, Ryley M. 1997. Effect of sorghum ergot on grain sorghum production: a
preliminary analysis. Australian Journal of Agricultural Research 48: 1241–1247.
Meinke H, Stone RC, Hammer GL. 1996. SOI phases and climatic risk to peanut
production: A case study for northern Australia. International Journal of Climatology
16: 783–789.
Meneghini B, Simmonds I, Smith IN. 2007. Association between Australian rainfall and
the Southern Annular Mode. International Journal of Climatology 27: 109–121.
Meng Q, Zhang Y, Wang Z. 2007. Rainfall Predictive Models for Building Simulation
II—Rainfall Estimation, Proceedings: Building Simulation. Technical Report. Tsinghua
University, Beijing, China.
Mishra AK, Ozgera MB, Singh VP. 2009. An entropy-based investigation into the
variability of precipitation. Journal of Hydrology 370: 139–154.
Mollah WS, Cook IM. 1996. Rainfall variability and agriculture in the semi-arid
tropics-the Northern Territory, Australia. Agricultural and Forest Meteorology 79(1–2):
39–60.
Momiyama M, Mitsudera M. 1952. A stochastic study of climatology, Tendency of
climatology. Meteorology and Statistics 3(2–5): 171–177.
Mooley DA. 1973. Gamma distribution probability model for Asian summer monsoon
monthly rainfall. Monthly Weather Review 101(2): 160–176.
Mooley DA. 1975. Worst summer monsoon failures over the Asiatic monsoon area.
Proceedings of Indian National Science Academy 42A (1): 34–43.
Mooley DA, Rao GA. 1971. Distribution function for seasonal and annual rainfall over
India. Monthly Weather Review 99: 796–799.
Nasseri M, Zahraie B. 2010. Application of simple clustering on space time mapping of
mean monthly rainfall patterns. International Journal of Climatology, Published online
in Wiley InterScience (www.interscience.wiley.com).
Natural disasters in Australia. 2007. Website of Australian Government. URL:
http://www.cultureandrecreation.gov.au/articles/naturaldisasters/
Ngongondo CS. 2006. An analysis of long-term rainfall variability, trends and
groundwater availability in the Mulunguzi river catchment area, Zomba mountain,
Southern Malawi. Quaternary International 148: 45–50.
Nicholls N, Wong KK. 1990. Dependence of Rainfall Variability on Mean Rainfall,
Latitude, and the Southern Oscillation. Journal of Climate 3: 163–170.
Nicholson SE, Entekhabi D. 1987. Rainfall variability in equatorial and southern Africa:
relationships with sea-surface temperatures along the south-western coast of Africa.
Journal of Climate and Applied Meteorology 26: 561–578.
Nippert JB, Knapp AK, Briggs JM. 2006. Intra-annual rainfall variability and grassland
productivity: can the past predict the future? Plant Ecology 184: 65–74.
Nnaji AO. 2001. Forecasting seasonal rainfall for agricultural decision-making in
northern Nigeria. Agricultural and Forest Meteorology 107: 193–205.
Ntale HK, Gan TY, Mwale D. 2003. Prediction of East African Seasonal Rainfall Using
Simplex Canonical Correlation Analysis. Journal of Climate 16: 2105–2112.
Oettli P, Camberlin P. 2005. Influence of topography on monthly rainfall distribution
over East Africa. Climate Research 28: 199–212.
Ogden FL, Julien PL. 1993. Runoff sensitivity to temporal and spatial rainfall
variability at runoff plane and small basin scale. Water Resources Research 29(8): 2589–
2597.
O'Reagain P, Bushell JC, Holloway C, Reid A. 2009. Managing for rainfall variability:
Effect of grazing strategy on cattle production in a dry tropical savanna. Animal
Production Science 49: 85–99.
Özelkan EC, Ni F, Duckstein L. 1996. Relationship between monthly atmospheric
circulation patterns and precipitation: Fuzzy logic and regression approaches. Water
Resources Research 32(7): 2097–2103.
Ozturk A. 1981. On the study of a probability distribution for precipitation totals.
Journal of Applied Meteorology 20: 1499–1595.
Pai DS, Sridhar L, Guhathakurta P, Hatwar HR. 2010. District-wise drought
climatology of the southwest monsoon season over India based on Standardized
Precipitation Index (SPI). Research Report 2/2010. National Climate Centre, India
Meteorological Department, Pune - 411 005.
Piantadosi J, Boland J, Howlett P. 2009. Generating synthetic rainfall on various
timescales daily, monthly and yearly. Environmental Modeling and Assessment 14:
431–438.
Porter JW, Pink BJ. 1991. A method of synthetic fragments for disaggregation in
stochastic data generation. Hydrology and Water Resources Symposium, Institution of
Engineers, Australia, 187–191.
Power S, Tseitkin F, Torok S, Lavery B, Dahni R, McAvaney B. 1997. Australian
temperature, Australian rainfall and the Southern Oscillation, 1910–1992: coherent
variability and recent changes. Australian Meteorological Magazine 47: 85–101.
Purdie JM, Bardsley WE. 2010. Seasonal prediction of lake inflows and rainfall in a
hydro-electricity catchment, Waitaki River, New Zealand. International Journal of
Climatology 30: 372–389.
Quinn WH, Burt WV. 1972. Use of the Southern Oscillation in weather prediction.
Journal of Applied Meteorology 11: 616–628.
R Development Core Team. 2010. R: A Language and Environment for Statistical
Computing, R Foundation for Statistical Computing: Vienna, Austria. ISBN 3-900051-07-0.
Rader M, Kirshen P, Roncoli C, Hoogenboom G, Ouattara F. 2009. Agricultural risk
decision support system for resource-poor farmers in Burkina Faso, West Africa.
Journal of Water Resources Planning and Management 135(5): 323–333.
Rajagopalan B. 2009. Risk Assessment and Forecasting of Indian Summer Monsoon for
Agricultural Drought Impact Planning. Colorado Water Institute. Completion Report
No. 215.
Richardson CW, Wright DA. 1984. WGEN: A Model for Generating Daily Weather
Variables. Technical Report No 8, United States Department of Agriculture, Agriculture
Research Service.
Risbey JS, Pook MJ, McIntosh PC, Wheeler MC, Hendon HH. 2009. On the remote
drivers of rainfall variability in Australia. Monthly Weather Review 137: 3233–3253.
Robertson AW, Kirshner S, Smyth P. 2003. Hidden Markov Models for Modeling Daily
Rainfall Occurrence over Brazil. Technical report. University of California.
Ropelewski CF, Halpert MS. 1987. Global and regional scale precipitation patterns
associated with the El Niño Southern Oscillation. Monthly Weather Review 115: 1606–
1626.
Ropelewski CF, Janowiak JE, Halpert MS. 1985. The Analysis and Display of Real
Time Surface Climate Data. Monthly Weather Review 113(6): 1101–1106.
Rosenberg K, Boland JW, Howlett PG. 2004. Simulation of monthly rainfall totals.
ANZIAM Journal 46: 85–104.
Rotstayn LD, Collier MA, Dix MR, Feng Y, Gordon HB, O'Farrell SP, Smith IN,
Syktus J. 2009. Improved simulation of Australian climate and ENSO related rainfall
variability in a global climate model with an interactive aerosol treatment. International
Journal of Climatology, Published online in Wiley InterScience. DOI: 10.1002/joc.1952.
Salvucci GD, Song C. 2000. Derived distributions of storm depth and frequency
conditioned on monthly total precipitation: adding value to historical and satellite-derived estimates of monthly precipitation. Journal of Hydrometeorology 1: 113–120.
Sandstrom K. 1995. Modeling the effects of rainfall variability on groundwater recharge
in semi-arid Tanzania. Nordic Hydrology 26: 313–330.
Sen Z, Eljadid AG. 1999. Rainfall distribution function for Libya and rainfall
prediction. Journal of Hydrological Sciences 44: 665–680.
Shannon CE. 1948. Mathematical theory of communication. The Bell System Technical
Journal xxvii: 379–423.
Sharda VN, Das PK. 2005. Modelling weekly rainfall data for crop planning in a sub-humid climate of India. Agricultural Water Management 76: 120–138.
Shui LT, Haque A. 2004. Stochastic Rainfall Model for Irrigation Projects. Pertanika
Journal of Science and Technology 12(1): 137–147.
Simmonds I, Hope P. 1997. Persistence characteristics of Australian rainfall anomalies.
International Journal of Climatology 17: 597–613.
Singh VP. 1997a. Effect of spatial and temporal variability in rainfall and watershed
characteristics on stream flow hydrograph. Hydrological Processes 11: 1649–1669.
Singh VP. 1997b. The use of entropy in hydrology and water resources. Hydrological
Processes 11: 587–626.
Singh VP, Jain SK, Tyagi AK. 2007. Risk and Reliability Analysis: A Handbook for
Civil and Environmental Engineers. ASCE Publications, San Diego.
Singhrattna N, Rajagopalan B, Clark M, Kumar KK. 2005. Seasonal forecasting of
Thailand summer monsoon rainfall. International Journal of Climatology 25: 649–664.
Smith IN, Collier M, Rotstayn L. 2009. Patterns of summer rainfall variability across
tropical Australia-results from EOT analysis. Technical report, 18th World
IMACS/MODSIM Congress, Cairns, Australia.
Smyth GK. 1996. Regression analysis of quantity data with exact zeroes. In:
Proceedings of the Second Australia–Japan Workshop on Stochastic Models in
Engineering, Technology and Management. Technical report. Technology Management
Centre, University of Queensland, 572–580.
Smyth GK. with contributions from Hu Y, Dunn PK. 2009. Statmod: Statistical
Modeling. R Package Version 1.4.1.
Srikanthan R, McMahon TA. 1984. Synthesizing daily rainfall and evaporation data as
input to water balance-crop growth models. Journal of the Australian Institute of
Agricultural Science. 50: 51–54.
Srikanthan R, McMahon TA. 2001. Stochastic generation of annual, monthly and daily
climate data: A review. Hydrology and Earth System Sciences 5(4): 653–670.
Stern RD, Coe R. 1984. A model fitting analysis of daily rainfall data (with discussion).
Journal of the Royal Statistical Society, Series A 147: 1–34.
Stone RC, Auliciems A. 1992. SOI phase relationships with rainfall in eastern Australia.
International Journal of Climatology 12: 625–636.
Stone RC, Hammer GL, Marcussen T. 1996a. Prediction of global rainfall probabilities
using phases of the Southern Oscillation Index. Nature 384: 252–255.
Stone RC, McKeon GM. 1993. Prospects for using weather prediction to reduce pasture
establishment risk. Tropical Grasslands 27: 406–413.
Stone RC, Meinke H. 2005. Operational seasonal forecasting of crop performance.
Philosophical Transactions of the Royal Society 360: 2109–2124.
Stone RC, Nicholls N, Hammer G. 1996b. Frost in northeast Australia: Trends and
influences of phases of the southern oscillation. Journal of Climate 9(8): 1896–1909.
Suppiah R. 2004. Trends in the southern oscillation phenomenon and Australian rainfall
and changes in their relationship. International Journal of Climatology 24: 269–290.
Sutherland RA, Bryan RB, Wijendes DO. 1991. Analysis of the monthly and annual
rainfall climate in a semi-arid environment, Kenya. Journal of Arid Environments, 20:
257–275.
Suzuki E. 1964. Hyper gamma distribution and its fitting to the rainfall data.
Meteorology and Geophysics 15: 31–51.
Suzuki E. 1967. A statistical and climatological study on the rainfall in Japan.
Meteorology and Geophysics 18(3): 103–181.
Swift LW Jr, Schreuder HT. 1981. Fitting daily precipitation amounts using the SB
distribution. Monthly Weather Review 109: 2535–2540.
Taschetto AS, Haarsma RJ, Gupta AS, Ummenhofer CC, England MH. 2010.
Teleconnections associated with the intensification of the Australian monsoon during
El Niño Modoki events. Conference Series: Earth and Environmental Science. 11,
012031.
Thom HC. 1958. A note on the gamma distribution. Monthly Weather Review 86: 117–
122.
Tilahun K. 2006. The characterisation of rainfall in the arid and semiarid regions of
Ethiopia. Water South Africa 32: 429–436.
Toth E, Brath A, Montanari A. 2000. Comparison of short-term rainfall prediction
models for real-time flood forecasting. Journal of Hydrology 239: 132–147.
Trenberth KV. 1997. The Definition of El Niño. Bulletin of American Meteorological
Society 78: 2771–2777.
Troup AJ. 1965. The Southern Oscillation. Quarterly Journal of Royal Meteorological
Society 91: 490–506.
Tucker GE, Bras RL. 2000. A stochastic approach to modeling the role of rainfall
variability in drainage basin evolution. Water Resources Research 36(7): 1953–1964.
Tweedie MCK. 1984. An index which distinguishes between some important
exponential families. Statistics: applications and new directions. In: Proceedings of the
Indian Statistical Institute Golden Jubilee International Conference. Technical Report.
Indian Statistical Institute, Calcutta.
Van Etten EJB. 2009. Inter-annual rainfall variability of arid Australia: greater than
elsewhere? Australian Geographer 40(1): 109–120.
Velarde LGC, Migon HS, Pereira BB. 2004. Space-time modeling of rainfall data.
Environmetrics 15: 561–576.
Walker Institute. 2010. Rainfall variability in Queensland: Walker Institute research.
http://www.walker-institute.ac.uk/publications/factsheets/walkerfactsheetQueenslandlit%20rev.pdf.
Wang G, Hendon HH. 2007. Sensitivity of Australian rainfall to inter-El Niño
variations. Journal of Climate 20(16): 4211–4226.
Ward MN, Folland CK. 1991. Prediction of seasonal rainfall in the north nordeste of
Brazil using eigenvectors of sea-surface temperature. International Journal of
Climatology 11(7): 711–743.
Watkins AB. 2005. The Australian drought of 2005. WMO Bulletin 54(3): 156–162.
Weltzin JF, Loik ME, Schwinning S, Williams DG, Fay PA, Haddad BM, Harte J,
Huxman TE, Knapp AK, Lin G, Pockman WT, Shaw MR, Small EE, Smith MD, Smith
SD, Tissue DT, Zak JC. 2003. Assessing the Response of Terrestrial Ecosystems to
Potential Changes in Precipitation. BioScience 53(10): 941–952.
Westra S, Sharma A. 2009. Estimating the seasonal predictability of global precipitation
–an empirical approach. Technical Report. In: 18th World IMACS/MODSIM Congress,
Cairns, Australia.
Wilks DS. 1990. Maximum likelihood estimation for the gamma distribution using data
containing zeros. Journal of Climate 3: 1495–1501.
Wilks DS. 1995. Statistical Methods in the Atmospheric Sciences: an Introduction.
Academic Press: San Diego, CA.
Wilks DS. 1998. Multisite generalization of a daily stochastic precipitation generation
model. Journal of Hydrology 210: 178–191.
Wilks DS. 1999. Interannual variability and extreme-value characteristics of several
stochastic daily precipitation models. Agricultural and Forest Meteorology 93: 153–169.
Wilks DS, Eggleston KL. 1992. Estimating monthly and seasonal precipitation
distributions using the 30- and 90-day outlooks. Journal of Climate 5(3): 252–259.
Wilks DS, Wilby RL. 1999. The weather generation game: a review of stochastic
weather models. Progress in Physical Geography 23: 329–357.
Willcocks JR, Stone RC. 2000. Frost risk in eastern Australia and the influence of the
Southern Oscillation. DPI Information Series QI00001. Queensland Department of
Primary Industries: Toowoomba.
Wimalasuriya R, Ha A, Tsafack E, Larson K. 2008. Rainfall variability and its impact
on dry land cropping in Victoria. Technical report, 52nd Annual Conference of the
Australian Agricultural and Resource Economics Society (AARES), Canberra.
Wu R, Kirtman BP. 2007. Roles of the Indian Ocean in the Australian Summer
Monsoon-ENSO Relationship. Journal of Climate 20: 4768–4788.
Yoo C, Jung KS, Kim TW. 2005. Rainfall frequency analysis using a mixed gamma
distribution: evaluation of the global warming effect on daily rainfall. Hydrological
Processes 19: 3851–3861.
Yue S, Hashiho M. 2003. Long-term trends of annual and monthly precipitation in
Japan. Journal of American Water Resources Association 39(3): 587–596.
Zaw WT, Naing TT. 2008. Empirical Statistical Modeling of Rainfall Prediction over
Myanmar. Technical report 36. Proceedings of World Academy of Science, Engineering
and Technology.
APPENDIX A: Article entitled, “Two Tweedie distributions that are near-optimal for
modelling monthly rainfall in Australia”, published in International Journal of
Climatology.
APPENDIX B: Paper published in Agricultural and Forest Meteorology.
APPENDIX C: Paper, entitled “Entropy, consistency in rainfall distribution and
potential water resource availability in Australia”, published in Hydrological Processes.
APPENDIX D: Article, “Understanding the effect of climatology on monthly rainfall
amounts in Australia using Tweedie GLMs”, published online in International Journal
of Climatology.
APPENDIX E: Letter from the Journal of Hydrology acknowledging receipt of the paper “The
Tweedie family of distributions for modelling seasonal rainfall totals in Australia”.
HYDROL11596: Notice of manuscript number
[email protected] [[email protected]]
on behalf of J. Hydrology [[email protected]]
Sent: Monday, 20 June 2011 2:23 AM
To: Md Hasan
Dear Mr Hasan,
Your submission entitled "The Tweedie family of distributions
for modelling seasonal rainfall totals in Australia" has been
assigned the following manuscript number: HYDROL11596.
You will be able to check on the progress of your paper by
logging on http://ees.elsevier.com/hydrol/ as Author.
Thank you for submitting your work to this journal.
With kind regards,
D. Jones
Administrative Support Agent [23-Mar-11]
Journal of Hydrology