MODELING THE PRECIPITATION AMOUNTS DYNAMICS
FOR DIFFERENT TIME SCALES IN ROMANIA USING MULTIPLE
REGRESSION APPROACH
N. BARBU1,2, V. CUCULEANU1,3, S. STEFAN1
1 University of Bucharest, Faculty of Physics, RO-077125, P.O.BOX MG-11, Magurele, Bucharest,
E-mail: [email protected]
2 National Meteorological Administration, Bucharest, Romania
3 Academy of Romanian Scientists, Splaiul Independentei 54, RO-050094, Bucharest, Romania
Received September 29, 2014
Water resources are very important for ecosystems, and water deficits may cause serious
social and economic problems. The aim of this study is to analyze the performance of a
prediction procedure based on the Multiple Linear Regression Model (MLRM) for
precipitation amounts at yearly and seasonal time scales in Romania. For this purpose
we used the annual and seasonal precipitation amounts as predictand and, as
predictors, Mean Sea Level Pressure (MSLP), geopotential height at 300 hPa
(HGT300), wind speed at 700 hPa (WS700), temperature at 850 hPa (T850) and Total
Column Water (TCW). The selection of predictors is based on collinearity and
multicollinearity analysis. Multicollinearity problems occur only during winter. All
data sets used in this study are reanalysis gridded data with a spatial resolution of 2° x
2° lat/lon, obtained from the 20th Century Reanalysis Version 2 (20CR V2) Project.
The analysis covers a period of 140 years between 1871 and 2010, the period 1871–2000
being used to build the MLRM and the period 2001–2010 to test its
prediction performance. Using the MLRM we obtained a good
correlation between predicted and measured precipitation amounts. The correlation
coefficient varies between 0.34 and 0.98, with the smallest values in winter and the
greatest values in spring. The Spearman correlation was also used to validate the
MLRM; its correlation coefficients range from 0.25 for winter to 0.98 for spring.
Key words: precipitation amount prediction, multiple linear regression, Spearman correlation.
1. INTRODUCTION
Precipitation is a principal element of the hydrological cycle, and changes in
precipitation patterns are important because they may lead to floods or droughts
that cause economic losses and casualties. The frequency of precipitation plays an
important role in the management of agriculture, water resources and ecosystems,
and the time scales of precipitation variability range from months to years or
decades. Identifying and understanding the influence of the large-scale air
circulation patterns which produce temporal and spatial variations of precipitation
in Romania is therefore of great importance.
Rom. Journ. Phys., Vol. 59, Nos. 9–10, P. 1127–1149, Bucharest, 2014
In addition to global warming, several other factors, such as variations in air
circulation and topography, may determine future climate change on the regional scale
[1]. Variations in air circulation influence the climate over large areas on both
interannual and longer time scales, while topography modifies the effects of air
circulation on the local scale [2].
Precipitation studies have been made for various time periods and on various
spatial scales: global [3], hemispheric [4], regional [5] and local [6]. A general
review of seasonal and annual precipitation trends in Italy using historical records
was made by Buffoni et al. [7]. Various methods are used to analyze spatial and
temporal variability of precipitation, for example: trend and change point analysis
[8, 4], cluster analysis [9, 10], Empirical Orthogonal Function (EOF) analysis [11,
6], canonical correlation analysis [11], and multiple linear regressions [12, 13, 14].
The multiple linear regression approach was developed and successfully
applied several decades ago by Klein [15] to the specification of surface temperature
from the mid-troposphere circulation, mainly for the purpose of weather prediction.
The multiple linear regression model has also been used in several studies of
atmospheric physics, such as pollutant dynamics [16, 17], the estimation and
prediction of atmospheric concentrations of the naturally occurring radionuclides
radon and thoron [18], and the estimation and prediction of the radiative forcing of
clouds [19].
In Romania, various studies have pointed out changes observed in the precipitation
regime and their connection with changes in the large-scale circulation patterns [6,
20]. The main physical and physico-geographical factors controlling the spatial
distribution of the climatic conditions in Romania are the large-scale mechanisms
(represented by the air circulation) and the local mechanisms induced by the Black Sea
and the Carpathian Mountains (orographic forcing). Interactions between the large-scale
air circulation and the orography contribute to the structure of the precipitation
field [21]. Winter precipitation variability in Romania is mainly
controlled by variations in air circulation, the south-westerly circulation having the
dominant role, modulated by the Carpathian topography [6, 20].
The aim of this paper is to develop a rainfall predictive model at yearly and
seasonal time scales. To achieve this, the multiple linear regression model
(MLRM) is used. The MLRM was built by using as predictors mean sea level
pressure, geopotential height at 300 hPa, wind speed at 700 hPa, temperature at 850
hPa and total column water. First, the predictors were tested for
collinearity and multicollinearity. The selected predictors were then used to
build the MLRM for a period of 130 years between 1871 and 2000. The
analytical equation obtained in this way was used to predict the
precipitation amount at yearly and seasonal time scales for 10 years, during the
period 2001–2010. The performance of the MLRM was tested by using the Pearson
and Spearman correlations between predicted and measured precipitation amounts.
The paper is structured as follows. The data sets used in this study are
presented in Section 2. Section 3 presents the methodology. In Section 4 the
collinearity and multicollinearity analysis are discussed. Section 5 presents the
estimation of annual and seasonal precipitation amount. Section 6 is dedicated to
the prediction of annual and seasonal precipitation amount. The main conclusions
are presented in Section 7.
2. DATA
The data sets used in this study are reanalysis gridded data obtained from
NOAA’s Twentieth Century Reanalysis Project [22] and consist of the annual and
seasonal amount of precipitation (PP) and the annual and seasonal averages of mean sea
level pressure (MSLP), geopotential height at 300 hPa (HGT300), wind speed at
700 hPa (WS700), temperature at 850 hPa (T850) and total column water (TCW).
The newly available Twentieth Century Reanalysis (20CR) covers the period
1871–2010. It assimilates only surface pressure observations [23, 22] into an
atmospheric model with a horizontal resolution of T62 (approximately 1.9°) and also
uses observed monthly sea-surface temperatures and sea-ice distributions as
boundary conditions. An Ensemble Kalman Filter is used to optimally combine the
imperfect observations and the estimates of the current state, producing an ensemble of 56
realizations of the reanalysis [22]. This allows an investigation of the observational
uncertainties in the data assimilation. The spatial resolution of the gridded data is 2 x 2
degrees latitude/longitude.
All data cover a period of 140 years, 1871–2010. The data were
divided into two groups: the first covers the 130 years between 1871 and
2000 and the second the 10 years between 2001 and 2010. The first
group was used to build the MLRM, and the second to test its prediction
performance. In this study we used the same database for
predictors and predictands, to be consistent with the multiple linear regression
method.
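The division into a calibration and a verification period can be sketched as follows. This is only an illustration: the function name `split_by_year` and the placeholder values are assumptions and do not come from the study.

```python
# Hypothetical helper: split a yearly series into the period used to build
# the model (1871-2000) and the period used to test it (2001-2010).
def split_by_year(years, values, last_training_year=2000):
    train = [(y, v) for y, v in zip(years, values) if y <= last_training_year]
    test = [(y, v) for y, v in zip(years, values) if y > last_training_year]
    return train, test

years = list(range(1871, 2011))   # 1871 ... 2010 inclusive
values = [0.0] * len(years)       # placeholder data, not real precipitation
train, test = split_by_year(years, values)
print(len(train), len(test))      # 130 10
```

Keeping the last 10 years out of the fit means the verification data play no part in estimating the regression coefficients.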
The grid points, located in different regions of Romania, used to extract the PP,
MSLP, HGT300, WS700, T850 and TCW time series are presented in Figure 1.
Each grid point used in the study is located in a different region of Romania,
with a different climate induced by the orographic barrier of the Carpathian
Mountains and by the Black Sea. Grid point 1 is located in the western part of
Romania, which is influenced by dry air masses from Central Europe and wet air
masses from the Mediterranean Sea. Grid point 2 is located in the eastern part of
Romania, which is influenced by synoptic patterns due to the Black Sea and the Russian
Plain. Grid point 3 is located in the Intra-Carpathian region, which is influenced by the
configuration of the Carpathian Mountains. Grid point 4 is located in the southern part of
Romania, which is influenced by the southwestern air circulation and by the presence of
the Black Sea.
Fig. 1 – Regions in Romania subjected to particular climate and weather patterns and associated grid
points used in this study (Region 1 – western part of Romania, Region 2 – eastern part of Romania,
Region 3 – Intra-Carpathian region and Region 4 – southern part of Romania).
To highlight different climatic characteristics of each region Figure 2 presents
the annual amounts of precipitation for all regions in Romania subjected to
particular climate and weather patterns.
Fig. 2 – Box plot of annual amount of precipitation for all regions of Romania
(whiskers represent minimum and maximum).
The smallest value of annual amount of precipitation averaged over the entire
study period (377 mm) belongs to the Region 2 and the greatest value of annual
amount of precipitation averaged over the entire study period (556 mm) belongs to
the Region 3, followed by the Region 1 (538 mm). Region 4 has a multiannual
mean of precipitation equal to 444 mm.
3. METHODOLOGY
In order to predict the annual and seasonal precipitation amount, the multiple
linear regression model (MLRM) has been used. The MLRM is a statistical technique
used to model a linear relationship between a continuous dependent variable
(the predictand) and one or more independent variables (the predictors) [24].
The estimation procedure uses the analytical expression of the
multiple linear regression with the selected predictors. The predictand y(t) is
related to the predictors {x_i(t)}, i = 1, 2, …, n, by the following relationship [25]:
\[ y(t) = \beta_0 + \beta_1 x_1(t) + \beta_2 x_2(t) + \dots + \beta_n x_n(t) \tag{1} \]
where β0 is the regression constant and β1, …, βn are the regression coefficients.
The performance of the MLRM is quantified by the following statistical
parameters:
• Multiple correlation coefficient (R) – a measure of the correlation between
the predictand and the predictors;
• Squared multiple correlation coefficient (R²) – quantifies the proportion
of the variance of the predictand that is explained by the predictors;
• F-Test – a global test of significance of the ensemble of coefficients.
The F-Test is used to test the null hypothesis that all regression
coefficients are equal to zero against the alternative hypothesis that at
least one of the regression coefficients is different from zero. Large values
of the F-Test provide evidence against the null hypothesis.
• p-value – characterizes the significance level at which the null hypothesis
is rejected or accepted against the alternative hypothesis. The p-value
is computed by using the Student t distribution for the regression
coefficients and the F distribution for the F-Test of the overall model. A
p-value less than 0.05 indicates that the null hypothesis may be rejected
and the alternative one accepted.
• Sum of Squared Residuals (SSR) – indicates the deviation between the
predictand estimated or predicted by the MLRM and the measured
values.
The analytical expression obtained by building the MLRM can therefore
be used to predict the values of the dependent variable.
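A minimal sketch of how equation (1) can be fitted by least squares and how R, R², the F statistic and the SSR follow from the fit is given below. It uses only the Python standard library and synthetic data; the function names `solve_linear_system` and `fit_mlr` and all numbers are assumptions made for illustration, not the software or data of the study. The p-value is omitted because it needs the F distribution, which the standard library does not provide.

```python
import math
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_mlr(predictors, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    predictors: list of rows, one row of k predictor values per time step.
    Returns the coefficients [b0, ..., bk] and a dict of fit statistics.
    """
    design = [[1.0] + list(row) for row in predictors]  # intercept column
    n, p = len(design), len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    y_mean = sum(y) / n
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # squared residuals
    sst = sum((yi - y_mean) ** 2 for yi in y)               # total variation
    r2 = 1.0 - ssr / sst
    k = p - 1
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return beta, {"R": math.sqrt(r2), "R2": r2, "F": f_stat, "SSR": ssr}

# Synthetic example: 130 "years", two predictors, known coefficients plus noise.
random.seed(0)
x = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(130)]
y = [400.0 + 30.0 * a - 20.0 * b + random.gauss(0, 5) for a, b in x]
beta, stats = fit_mlr(x, y)
print([round(b, 1) for b in beta], round(stats["R2"], 2))
```

Solving the normal equations directly is the simplest route but is numerically less robust than a QR-based solver when predictors are strongly correlated, which is one more reason to screen the predictors for multicollinearity first.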
The performance of the MLRM was also tested by using the Pearson correlation and
the Spearman (rank) correlation between measured and estimated
precipitation amounts at yearly and seasonal time scales. Pearson’s correlation
coefficient is a measure of the strength of the linear relationship between two
variables [26]. Spearman’s rank correlation coefficient is a nonparametric
(distribution-free) rank statistic proposed by Spearman [27] as a measure of the
strength of an association between two variables. Contrary to what is sometimes
stated, Spearman’s coefficient is not a measure of the linear relationship between
two variables. It assesses how well an arbitrary monotonic function can describe the
relationship between two variables, without making any assumptions about the
frequency distribution of the variables.
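The difference between the two coefficients can be illustrated with a short sketch in which Spearman’s coefficient is computed as the Pearson correlation of the ranks. The function names and the toy data are assumptions chosen for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [math.exp(v) for v in x]  # monotonic but strongly nonlinear in x
print(round(pearson(x, y), 2), round(spearman(x, y), 2))
```

For this monotonic but nonlinear relationship the rank correlation equals 1, while the Pearson coefficient stays below 1 because the relationship is not linear.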
4. COLLINEARITY AND MULTICOLLINEARITY ANALYSIS
The selection of predictors is an iterative process based partly on the user’s
subjective judgment. The two main requirements for predictors are a good
relationship with the predictand and a sufficient temporal coverage. In order to select
independent predictors to build the MLRM, collinearity and
multicollinearity analyses were performed.
The collinearity analysis was made by determining the correlation
coefficients (R) and the corresponding levels of significance (p-values) between
predictors. Collinearity is a linear relationship between two predictors. Two
collinear predictors explain the same part of the variation of the predictand, which
lowers the significance of the individual partial regression
coefficients. The values of the regression coefficients become more dependent on the
distribution and values of the errors. Collinearity difficulties appear when the
correlation coefficient is close to 1 (larger than 0.99). A correlation describes the
strength of an association between variables. An association between variables
means that the value of one variable can be predicted, to some extent, from the value
of the other. A correlation is a special kind of association: there is a linear relation
between the values of the variables.
Table 1 presents only the highest correlation coefficients (R) and the
corresponding levels of significance (p) for yearly and seasonal time scales for the
MSLP, HGT300, TCW, T850 and WS700 predictors, together with the region for which they were
obtained. Because collinearity problems occur only when the correlation coefficients
are larger than 0.99, only the highest values of R are presented here. The R
values for all variables, all regions and all time scales are less than 0.99, which
indicates that there are no collinearity problems.
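Table 1
The highest correlation coefficients (R), the corresponding level of significance (p) and the region
for which they were obtained, for yearly and seasonal time scales, for the MSLP, HGT300, TCW, T850
and WS700 predictors

Time scale  Predictor 1  Predictor 2  R      p         Region
Year        MSLP         HGT300       0.38   7.80E-06  1
Year        MSLP         TCW          -0.64  3.10E-16  4
Year        MSLP         T850         -0.44  1.60E-07  4
Year        MSLP         WS700        -0.53  8.00E-11  1
Year        HGT300       TCW          0.27   0.002     2
Year        HGT300       T850         0.73   1.00E-21  1
Year        HGT300       WS700        -0.12  0.17      1
Year        TCW          T850         0.5    1.60E-09  4
Year        TCW          WS700        0.33   1.10E-04  1
Year        T850         WS700        0.11   0.22      1
Winter      MSLP         HGT300       0.5    2.00E-09  1
Winter      MSLP         TCW          -0.68  1.10E-18  4
Winter      MSLP         T850         -0.45  1.30E-07  4
Winter      MSLP         WS700        -0.52  2.70E-10  4
Winter      HGT300       TCW          0.38   6.80E-06  2
Winter      HGT300       T850         0.7    2.00E-19  2
Winter      HGT300       WS700        -0.07  0.46      2
Winter      TCW          T850         0.79   9.90E-27  2
Winter      TCW          WS700        0.35   4.10E-05  1
Winter      T850         WS700        0.33   1.20E-04  4
Spring      MSLP         HGT300       0.2    0.02      1
Spring      MSLP         TCW          -0.76  1.40E-24  4
Spring      MSLP         T850         -0.57  2.60E-12  4
Spring      MSLP         WS700        -0.42  8.40E-07  1
Spring      HGT300       TCW          0.43   2.80E-07  2
Spring      HGT300       T850         0.76   1.60E-24  2
Spring      HGT300       WS700        0.16   0.06      1
Spring      TCW          T850         0.77   3.00E-25  3
Spring      TCW          WS700        0.42   7.40E-07  1
Spring      T850         WS700        0.35   5.20E-05  1
Summer      MSLP         HGT300       0.21   0.02      4
Summer      MSLP         TCW          -0.58  1.27E-12  4
Summer      MSLP         T850         -0.43  3.26E-07  4
Summer      MSLP         WS700        0.39   4.80E-06  4
Summer      HGT300       TCW          0.2    0.02      4
Summer      HGT300       T850         0.78   3.10E-26  1
Summer      HGT300       WS700        -0.29  8.50E-04  3
Summer      TCW          T850         0.03   0.72      4
Summer      TCW          WS700        -0.3   6.20E-04  4
Summer      T850         WS700        -0.34  8.20E-05  2
Autumn      MSLP         HGT300       0.21   0.02      1
Autumn      MSLP         TCW          -0.61  3.90E-14  1
Autumn      MSLP         T850         -0.5   1.50E-09  4
Autumn      MSLP         WS700        -0.62  1.20E-14  1
Autumn      HGT300       TCW          0.39   4.70E-06  2
Autumn      HGT300       T850         0.79   1.70E-26  1
Autumn      HGT300       WS700        -0.15  0.09      2
Autumn      TCW          T850         0.67   7.60E-18  2
Autumn      TCW          WS700        0.39   4.80E-06  1
Autumn      T850         WS700        0.23   0.01      1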
From the collinearity analysis one may draw the following conclusions:
T850 is positively correlated with HGT300. This is due to the anticyclonic systems
(high pressure systems), which generally extend high into the troposphere.
TCW is negatively correlated with MSLP; R varies between -0.52 (p-value
2.14e-10) for summer in Region 1 and -0.76 (p-value 1.37e-24) for spring in
Region 4. This result was expected because the climate of Region 4 is influenced by
the Mediterranean Sea and the Black Sea, while the climate of Region 1 is influenced by the
dry continental air masses that come from Central Europe. Cyclonic activity,
which is pronounced during spring, contributes to the advection of humidity into Region 4
from the Mediterranean Sea.
WS700 is negatively correlated with MSLP and positively correlated with TCW
(R ranges from 0.42, with p-value 7.4e-7, for spring in Region 1
down to 2.08e-5, with p-value 0.99, for summer in Region 3). The explanation
is that the wind speed is proportional to the baric gradient; it is associated with
cyclones, and wind speed contributes to moisture advection. Region 3 is
sheltered by the Carpathian Mountains, which act as a barrier against wet air masses
from the Mediterranean Sea.
HGT300 is positively correlated with MSLP; R is 0.5, with a corresponding
p-value of 2.04e-9, for winter in Region 1 and 0.06, with a corresponding p-value of 0.95,
for spring in Region 2. Baric systems generally extend vertically up to near the
tropopause, and this leads to the correlation between MSLP and HGT300.
T850 is positively correlated with TCW. It is well known that warm air is able
to hold a larger amount of water vapor than cold air.
The variability of the frequency of occurrence and of the life cycle of the baric systems
is of great importance for the selected predictors. The Carpathian Mountains modulate
the climate characteristics and induce a climate regionalization of Romania.
For the selected predictors, on all time scales and for all regions of Romania, there
are no collinearity problems.
For the multicollinearity analysis the variance inflation factor (VIF) is used.
The VIF is the main statistic for multicollinearity diagnostics and is defined as [28]:
\[ \mathrm{VIF} = \frac{1}{1 - R_j^2} \tag{2} \]
where R_j^2 is the coefficient of determination obtained when variable x_j is regressed
on the remaining independent variables. A variable is considered to be problematic if its
VIF is larger than 10.0 [28].
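To make the threshold concrete, the short sketch below evaluates equation (2) for a few values of R_j^2. The function name `vif` and the chosen input values are illustrative assumptions.

```python
def vif(r2_j):
    """Variance inflation factor for a predictor whose regression on the
    remaining predictors has coefficient of determination r2_j."""
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("r2_j must lie in [0, 1)")
    return 1.0 / (1.0 - r2_j)

for r2 in (0.0, 0.5, 0.9, 0.99):
    print(r2, round(vif(r2), 2))
```

A VIF of 10 therefore corresponds to R_j^2 = 0.9, that is, 90% of the variance of the predictor is explained by the other predictors. In the special case of only two predictors, R_j^2 is the squared pairwise correlation, so a pairwise correlation of 0.99 would give a VIF of about 50.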
Table 2 lists the VIF values for all predictors, all regions of Romania and all time
scales.
Table 2
The VIF values for all predictors, all time scales and all regions of Romania. The values of VIF
that exceed 10 are marked with an asterisk

Time scale  Predictor  Region 1  Region 2  Region 3  Region 4
Year        MSLP       5.41      4.39      5.12      4.79
Year        HGT300     7.21      5.23      6.31      4.55
Year        TCW        1.92      2.04      2.03      2.31
Year        T850       5.8       5.28      5.55      4.71
Year        WS700      1.5       1.26      1.38      1.24
Winter      MSLP       10.42*    9.14      9.63      10.58*
Winter      HGT300     14.07*    11.47*    12.98*    10.79*
Winter      TCW        4.71      3.96      4.27      3.82
Winter      T850       11.96*    12.38*    11.66*    10.94*
Winter      WS700      1.47      1.46      1.4       1.55
Spring      MSLP       8.29      6.92      8.27      8.33
Spring      HGT300     9.49      7.92      8.64      8.21
Spring      TCW        4.82      3.96      5.07      4.05
Spring      T850       9.48      9.84      9.16      9.97
Spring      WS700      1.35      1.15      1.28      1.17
Summer      MSLP       2.55      3.01      2.67      3.14
Summer      HGT300     4.53      3.26      3.88      2.44
Summer      TCW        1.81      1.68      1.77      1.72
Summer      T850       4.45      4.06      4.01      3.14
Summer      WS700      1.1       1.14      1.13      1.24
Autumn      MSLP       6.1       4.6       5.69      4.4
Autumn      HGT300     9.21      6.53      8.13      5.56
Autumn      TCW        2.05      2.25      2.09      1.84
Autumn      T850       9.67      8.2       9.01      7.09
Autumn      WS700      1.76      1.22      1.59      1.45
The VIF values show that multicollinearity is present only in winter, when
the VIF exceeds the upper limit (10.0)
for MSLP in the western and southern parts of Romania and for HGT300 and
T850 in all regions of Romania. A possible explanation is that winter has a high
thermodynamic stability, which means that the temporal fluctuations of the predictors
are lower than in other seasons. The anticyclonic regime is dominant during
winter, and this leads to the same weather situation persisting for longer
than in other seasons. The VIF values for WS700 are the lowest and
do not exceed 1.76. To avoid multicollinearity problems, the predictors whose VIF exceeds
10.0 are not used to build the MLRM for the winter season. VIF values
above 10.0 can lead to multicollinearity consequences such as negative
predicted values of precipitation amount. From the analysis of the VIF we can say
that the considered variables are good predictors of precipitation amount,
except for winter.
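The selection rule for winter can be written down in a few lines. The sketch below applies the threshold of 10.0 to the winter VIF values of Table 2; the names `winter_vif` and `select_predictors` are assumptions introduced for illustration.

```python
VIF_LIMIT = 10.0

# Winter VIF values copied from Table 2, one entry per region (1 to 4).
winter_vif = {
    "MSLP":   [10.42, 9.14, 9.63, 10.58],
    "HGT300": [14.07, 11.47, 12.98, 10.79],
    "TCW":    [4.71, 3.96, 4.27, 3.82],
    "T850":   [11.96, 12.38, 11.66, 10.94],
    "WS700":  [1.47, 1.46, 1.40, 1.55],
}

def select_predictors(vif_table, region, limit=VIF_LIMIT):
    """Return the predictors whose VIF does not exceed the limit in a region."""
    return [name for name, values in vif_table.items()
            if values[region - 1] <= limit]

for region in (1, 2, 3, 4):
    print(region, select_predictors(winter_vif, region))
# 1 ['TCW', 'WS700']
# 2 ['MSLP', 'TCW', 'WS700']
# 3 ['MSLP', 'TCW', 'WS700']
# 4 ['TCW', 'WS700']
```

Under this rule, Regions 1 and 4 keep only TCW and WS700 in winter, while Regions 2 and 3 also keep MSLP.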
5. ESTIMATION OF PRECIPITATION AMOUNT
Annual and seasonal precipitation amounts were estimated by using the multiple
linear regression software developed by Wessa [29]. The regression coefficients are
determined by the least squares method. The estimation of the annual and
seasonal amounts of precipitation is performed by building the MLRM for the 130 years
of the period 1871–2000, using as predictors MSLP, HGT300, T850, TCW and
WS700. The analytical expressions obtained were then used to estimate the
precipitation amount at yearly and seasonal time scales for the same period that
was used to build the MLRM.
5.1. ESTIMATION OF ANNUAL PRECIPITATION AMOUNT
Table 3 presents the regression statistics for the estimation of the annual precipitation
amount for all regions of Romania. One may notice that the multiple
correlation coefficients (R) have similar values for all regions. R is
greatest for the western part of Romania and the Intra-Carpathian region (0.87), and
smallest (0.82) for the southern part of Romania. The p-values for all regions
are close to zero, which indicates that the results are statistically significant at the
0.01 level.
Table 3
Regression statistics for all regions of Romania for annual time scale

Region  R     R²    p         F-test  SSR
1       0.87  0.76  2.11E-36  77.19   275617
2       0.84  0.71  9.50E-32  60.88   260226
3       0.87  0.75  1.81E-35  73.69   323827
4       0.82  0.67  1.45E-28  51.23   327359
Figure 3 presents the estimated and measured values of annual precipitation
amount for Region 1 (a), Region 2 (b), Region 3 (c) and Region 4 (d).
Fig. 3 – Estimated and measured annual amount of precipitation (PP) for all four regions of Romania
(a – Region 1, b – Region 2, c – Region 3 and d – Region 4) considered in this study.
The explained variance (R²) varies between 67% for Region 4 and 76% for
Region 1. The performance of the MLRM is also quantified by the SSR; small values of
SSR indicate a good estimation of the precipitation amount. SSR has the smallest
value for Region 2 (260226) and the highest value for Region 4 (327359). In terms of
SSR, at the yearly time scale the best performance of the MLRM was noted for Region 2.
From Figure 3 it can be seen that for all regions of Romania the estimated
annual precipitation amounts are very close to the measured values.
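As a rough consistency check, the overall F statistic can be recomputed from the multiple correlation coefficient using the standard relation F = (R²/k) / ((1 − R²)/(n − k − 1)). The sketch below assumes n = 130 years and k = 5 predictors and uses the rounded R values of Table 3, so agreement with the tabulated F values can only be expected to within that rounding. The function name `f_statistic` is an assumption.

```python
def f_statistic(r, n_samples, n_predictors):
    """Overall F statistic of a linear regression computed from its multiple R."""
    r2 = r * r
    return (r2 / n_predictors) / ((1.0 - r2) / (n_samples - n_predictors - 1))

# Multiple R per region from Table 3; n = 130 years and k = 5 predictors assumed.
for region, r in {1: 0.87, 2: 0.84, 3: 0.87, 4: 0.82}.items():
    print(region, round(f_statistic(r, 130, 5), 1))
```

For Region 1 this gives about 77.2, close to the tabulated 77.19; the differences for the other regions are of the size expected from rounding R to two decimals.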
5.2. ESTIMATION OF SEASONAL PRECIPITATION AMOUNT
Table 4 presents the regression statistics for the estimation procedure of the
MLRM for all seasons and all regions of Romania. According to the multicollinearity
analysis (see Table 2), the predictors selected to build the MLRM for
spring, summer and autumn are MSLP, HGT300, TCW, T850 and WS700, whereas
for winter the predictors are MSLP, TCW and WS700 for the Intra-Carpathian region and
the eastern part of Romania, and TCW and WS700 for the southern and western parts of
Romania.
Table 4
Regression statistics for all four regions of Romania and for seasonal time scale

Season  Region  R     R²    p         F-test  SSR
Winter  1       0.51  0.26  1.90E-06  21.9    102044
Winter  2       0.6   0.36  9.20E-21  23.21   92694
Winter  3       0.73  0.53  5.00E-12  48.02   65584
Winter  4       0.43  0.19  6.80E-09  14.64   159251
Spring  1       0.88  0.78  1.70E-38  85.54   49889
Spring  2       0.82  0.67  1.80E-28  50.99   56434
Spring  3       0.88  0.78  6.40E-39  87.27   57108
Spring  4       0.85  0.73  3.80E-33  65.51   64809
Summer  1       0.92  0.84  4.30E-47  127.26  67677
Summer  2       0.82  0.67  2.00E-28  50.83   60549
Summer  3       0.91  0.83  1.20E-45  119.34  72234
Summer  4       0.85  0.73  2.70E-33  66.01   35483
Autumn  1       0.82  0.68  5.50E-29  52.45   73293
Autumn  2       0.78  0.61  1.70E-23  38.04   60395
Autumn  3       0.83  0.68  2.20E-29  53.59   72963
Autumn  4       0.77  0.6   5.60E-23  36.81   81219
The multiple correlation coefficients (R) indicate a good correlation between
estimated and measured precipitation amounts at the seasonal time scale, which
means that the predictors selected for the prediction of precipitation amount capture
very well the mechanisms that control the precipitation dynamics. The greatest
R (0.92) was obtained for the summer season in the western part of Romania and the
smallest R (0.43) for the southern part of Romania during winter.
Generally, summer is the season with the largest values of R for all regions of
Romania and winter is the season with the smallest.
The smallest values of R, obtained for winter, are due to the lower number of
predictors used compared to the other seasons. The p-values are close to zero for all
seasons and all regions of Romania, which indicates that the results are
statistically significant at the 0.05 level. The greatest explained variance (84%) was
found for summer in Region 1 and the lowest (19%)
for winter in Region 4. In terms
of SSR, the highest performance of the MLRM was found for summer in Region 4.
Figure 4 presents the estimated and measured precipitation amount for winter
for all regions of Romania (Region 1 - a, Region 2 - b, Region 3 - c and Region
4 - d).
Fig. 4 – Estimated and measured winter amount of precipitation (PP) for all regions of Romania
(a – Region 1, b – Region 2, c – Region 3 and d – Region 4) considered in this study for the period
1871–2000.
Figure 5 presents the estimated and measured precipitation amount for
spring for all regions of Romania (Region 1 - a, Region 2 - b, Region 3 - c and
Region 4 - d).
Fig. 5 – The same as in Figure 4, but for spring.
Figure 6 presents the estimated and measured precipitation amount for summer
for all regions of Romania (Region 1 - a, Region 2 - b, Region 3 - c and Region 4 - d).
Fig. 6 – The same as in Figure 4, but for summer.
Figure 7 presents the estimated and measured precipitation amount for
autumn for all regions of Romania (Region 1 - a, Region 2 - b, Region 3 - c and
Region 4 - d).
Fig. 7 – The same as in Figure 4, but for autumn.
From Figures 4, 5, 6 and 7 one can clearly see that for the winter season
there are differences between the estimated and measured values of precipitation
amount. For spring, summer and autumn the differences between
estimated and measured values are small. From this analysis we can
expect poor performance of the MLRM only for winter.
6. PREDICTION OF PRECIPITATION AMOUNT
The prediction procedure is based on the analytical regression expression
built with data from the previous period, using the predictors selected by the
procedure described in Section 4. The analytical expressions were obtained by using
the MLRM with the selected predictors for a period of 130 years, between 1871 and 2000,
for yearly and seasonal time scales. The prediction period covers 10 years,
between 2001 and 2010.
6.1. PREDICTION OF ANNUAL PRECIPITATION AMOUNT
Figure 8 presents the linear relationship between predicted and measured
precipitation amount for all regions of Romania (Region 1 - a, Region 2 - b,
Region 3 - c and Region 4 - d) for yearly time scale.
Fig. 8 – Measured precipitation amount (PP measured) as a function of predicted precipitation amount
(PP predicted) for all regions of Romania (a – Region 1, b – Region 2, c – Region 3 and d – Region 4)
for yearly time scale. The linear regression equation and corresponding parameters are shown in the
bottom right corner. The prediction was made for a period of 10 years (2001–2010).
We obtained similar values of the correlation coefficient (R) for all four
regions of Romania, all larger than 0.85. The greatest value of the
correlation coefficient between measured and predicted values is 0.92, and it was
obtained for the western part of Romania and the Intra-Carpathian region. All values of R
are statistically significant at the 0.05 level, with p-values less than 1.64e-3. At the
yearly time scale one can say that the performance of the MLRM is high.
6.2. PREDICTION OF SEASONAL PRECIPITATION AMOUNT
Figures 9, 10, 11 and 12 present the linear relationship between predicted and
measured precipitation amount for all regions of Romania (Region 1 - a, Region
2 - b, Region 3 - c and Region 4 - d) for winter, spring, summer, and autumn.
Fig. 9 – The same as in Figure 8, but for winter.
Fig. 10 – The same as in Figure 9, but for spring.
Fig. 11 – The same as in Figure 9, but for summer.
Fig. 12 – The same as in Figure 9, but for autumn.
The correlation coefficients (R) obtained vary between 0.34 for winter in the
western part of Romania and 0.98 for spring in the eastern part of Romania. Generally,
the lowest values of R are obtained for winter and the highest values of R
for spring. The prediction of the winter precipitation
amount performs poorly for the western and southern parts of Romania, where the correlation
coefficients are 0.34 (p-value 0.33) and 0.52 (p-value 0.13), respectively. This is
due to the smaller number of predictors used for prediction compared to the other
seasons. However, for the Intra-Carpathian region and the eastern part of Romania the
correlation coefficients are high, 0.87 and 0.73, respectively.
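With only 10 years in the prediction period, a correlation coefficient has to be fairly large before it is statistically significant. The sketch below uses the standard t test for a Pearson correlation, t = r·sqrt(n − 2)/sqrt(1 − r²), together with the tabulated two-sided 5% critical value of Student's t for 8 degrees of freedom (2.306). The function names are assumptions, and the r values are the winter coefficients quoted above.

```python
import math

T_CRIT_DF8 = 2.306  # two-sided 5% critical value of Student's t, 8 degrees of freedom

def t_statistic(r, n):
    """t statistic for testing that a Pearson correlation r from n pairs is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

def critical_r(t_crit, n):
    """Smallest |r| that is significant for the given critical t value."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

for r in (0.34, 0.52, 0.73, 0.87):
    t = t_statistic(r, 10)
    print(r, round(t, 2), abs(t) > T_CRIT_DF8)
print(round(critical_r(T_CRIT_DF8, 10), 2))  # 0.63
```

Under these assumptions only correlations above about 0.63 are significant at the 0.05 level for a 10-year sample, which agrees with the p-values of 0.33 and 0.13 reported for the two weakest winter correlations.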
6.3. TESTING THE PERFORMANCE OF MLRM PREDICTION
The performance of the MLRM was also tested by using the Spearman correlation,
which is more robust and is not sensitive to the linear trend.
Table 5
Spearman correlation coefficients between measured and predicted annual and seasonal precipitation
amount for all regions of Romania (1 – Region 1, 2 – Region 2, 3 – Region 3 and 4 – Region 4).
The prediction period is 2001–2010

Region  Year  Winter  Spring  Summer  Autumn
1       0.89  0.25    0.91    0.84    0.76
2       0.83  0.73    0.9     0.66    0.66
3       0.83  0.86    0.98    0.9     0.76
4       0.65  0.29    0.9     0.87    0.67
Table 5 presents the Spearman correlation coefficients between measured and
predicted precipitation amounts for yearly and seasonal time scales. The correlation
coefficient varies between 0.25 in winter for the western part of Romania and 0.98 in
spring for the Intra-Carpathian region (Region 3 in Table 5). This analysis also shows
that the performance of the MLRM is weaker for winter, because the model uses fewer
predictors than for the other seasons. The highest performance of the MLRM was found for
spring for all regions. The Spearman correlation coefficients are close to the Pearson
correlation coefficients.
7. CONCLUSIONS
In this study, the MLRM was developed to investigate the influence of the
MSLP, HGT300, TCW, T850 and WS700 on the annual and seasonal precipitation
amount in Romania. The collinearity and multicollinearity analysis indicate the fact
that the MSLP, HGT300, TCW, T850 and WS700 are very good predictors for
precipitation amount at yearly and seasonally time scales with small inconvenients
for winter when for MSLP, HGT300 and T850 predictors the multicollinearity
problems appears.
The regression statistics for the precipitation amount estimation show large
values of multiple R, with a high level of significance (p-value close to zero), for the
entire year and for all seasons; multiple R varies between 0.43 in winter for the southern
part of Romania and 0.92 in summer for the western part of Romania.
The correlation coefficients between measured and predicted precipitation
amount, obtained with the MLRM for the 10 years of the period 2000–2010, vary
between 0.34 in winter for the western part of Romania and 0.98 in spring for the
eastern part of Romania.
For the yearly time scale the correlation coefficients vary between 0.85 and
0.92 and the Spearman correlation coefficients vary between 0.65 and 0.89. These
results are in accordance with those found in previous studies. For example,
Rajeevan et al. [30] reported that the correlation between predicted and observed
rainfall for a 24-year period was 0.77–0.84, and Mizanur Rahman et al. [13] reported
that the correlation between predicted and observed rainfall for the 31 years during
the period 1976–2007 is 0.74.
Using the Spearman rank correlation method, the correlation coefficients are
approximately the same as the Pearson correlation coefficients, and vary between
0.25 in winter for the western part of Romania and 0.98 in spring for the eastern part
of Romania.
The multiple regression model presented in this study can be used to predict
precipitation anomalies in Romania in a future climate using simulated predictors.
This is very useful because predictor fields such as HGT300, T850 or MSLP are
accurately simulated by climate models.
Finally, we can conclude that the MLRM developed in this study is a very
good model for predicting the precipitation amount on yearly and seasonal time scales,
and the analytical equation obtained by running the MLRM for the 130-year period can
be used for future climate projections. It is important to select an optimal number of
predictors to build the MLRM. A small number of predictors leads to lower
performance of the MLRM.
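The fit-then-test procedure summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the predictor series, the "true" coefficients and all function names are hypothetical, and the data are noise-free so that the recovered coefficients can be checked against the known ones. The coefficients are obtained by ordinary least squares through the normal equations.

```python
from math import sin, cos

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: check predictors for collinearity")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_mlr(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coeffs, rows):
    """Evaluate b0 + b1*x1 + ... + bk*xk for each row of predictors."""
    return [coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], r)) for r in rows]

# Hypothetical, noise-free synthetic data: 140 "years", three predictor anomalies.
TRUE_COEFFS = [600.0, 40.0, -25.0, 10.0]  # assumed for illustration only
years = list(range(1871, 2011))
predictors = [(sin(0.3 * t), cos(0.11 * t), sin(0.7 * t + 1.0))
              for t in range(len(years))]
precip = predict(TRUE_COEFFS, predictors)

# Fit on the first 130 years and test on the last 10.
n_train = 130
coeffs = fit_mlr(predictors[:n_train], precip[:n_train])
test_pred = predict(coeffs, predictors[n_train:])
max_err = max(abs(p - o) for p, o in zip(test_pred, precip[n_train:]))
print("coefficients:", [round(c, 3) for c in coeffs])
print("largest test-period error:", round(max_err, 6))
```

With real data the test-period predictions would then be compared with the measured amounts using the Pearson and Spearman coefficients, and the fitted coefficients could be applied to simulated predictor fields in the same way as `predict` is applied to the held-out years here.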
In addition, the methodology can be applied to the prediction of precipitation
fields not only for the Romanian territory but also for other regions of Europe.
Acknowledgments. The work of N. Barbu was supported by the strategic grant
POSDRU/159/1.5/9.137750, “Project Doctoral and Postdoctoral programs support for increased
competitiveness in Exact Sciences research”, co-financed by the European Social Fund within the
Sectoral Operational Program Human Resources Development 2007–2013.
The authors thank the Executive Agency for Higher Education, Research Development and
Innovation Funding (UEFISCDI) for the research funds in the research project CLIMYDEX
“Changes in climate extremes and associated impact in hydrological events in Romania”, code PNIIPCCE-ID-2011-2-0073.
The present study is also useful for the project EMERSYS (Toward an integrated, joint cross-border detection system and harmonized rapid response procedures to chemical, biological,
radiological and nuclear emergencies), MIS-ETC code 774.
REFERENCES
1. J. H. Christensen, B. Hewitson, A. Busuioc, A. Chen, X. Gao, I. Held, R. Jones, R. K. Kolli,
W.-T. Kwon, R. Laprise, V. Magaña Rueda, L. Mearns, C. G. Menéndez, J. Räisänen,
A. Rinke, A. Sarr and P. Whetton, Regional Climate Projections. In: Climate Change 2007:
The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report
of the Intergovernmental Panel on Climate Change [Solomon S., Qin D., Manning M., Chen
Z., Marquis M., Averyt K.B., Tignor M. and Miller H.L. (eds.)]. Cambridge University Press,
Cambridge, United Kingdom and New York, NY, USA (2007).
2. R. Bojariu and F. Giorgi, The North Atlantic Oscillation signal in a regional climate simulation for
the European region. Tellus, 57A:641–653 (2005).
3. H. F. Diaz, R. S. Bradley and J. K. Eischeid, Precipitation Fluctuation Over Global Land Areas
Since the Late 1800’s, J. Geophys. Res. 94D1:1195–1210 (1989).
4. R. S. Bradley, H. F. Diaz, J. K. Eischeid, P.D. Jones, P. M. Kelly and C. M. Goodess, Precipitation
Fluctuation over Northern Hemisphere Land Areas Since the Mid-19th Century, Science
(Reprint Series), 237:171–175 (1987).
5. C. D. Schönwiese, J. Rapp, T. Fuchs and M. Denhard, Observed climate trends in Europe 1891–1990.
Meteorol Z NF 3:22–28 (1994).
6. A. Busuioc and H. von Storch, Changes in the winter precipitation in Romania and its relation to
the large scale circulation. Tellus 48A:538–552 (1996).
7. L. Buffoni, M. Maugeri and T. Nanni, Precipitation in Italy from 1833 to 1996. Theor. Appl.
Climatol. 63:33–40 (1999).
8. A. N. Pettitt, A non-parametric approach to the change-point problem. Applied Statistics, 126–135
(1979).
9. G. Galliani and F. Filippini, Climate clusters in a small area. J. Clim. 3:47–63 (1985).
10. C. Cacciamani, S. Tibaldi and S. Nanni, Mesoclimatology of winter temperature and precipitation
in the Alpine Region of Northern Italy. Int. J. Climatol. 14:777–814 (1994).
11. H. von Storch, Spatial patterns: EOFs and CCA. In: von Storch H, Navarra A (eds) Analysis of
climate variability: applications of statistical techniques. Springer, Heidelberg, 227–258
(1995).
12. M. C. Valverde Ramíres, N. J. Ferreira and H. F. de Campos Velho, Linear and Nonlinear
Statistical Downscaling for Rainfall Forecasting over Southeastern Brazil. Weather and
Forecasting. 21:969–989 (2006).
13. M. D. Mizanur Rahman, M. Rafiuddin and M. D. Mahbub Alam, Seasonal forecasting of
Bangladesh summer monsoon rainfall using simple multiple regression model. J. Earth Syst.
Sci. 122(2):551–558 (2013).
14. R. Chifurira and D. Chikobvu, A Weighted Multiple Regression Model to Predict Rainfall
Patterns: Principal Component Analysis approach. Mediterranean Journal of Social Sciences,
5(7):34–42 (2014).
15. W. H. Klein, Specification of monthly mean surface temperatures from 700 mb heights. J. Appl.
Meteor. 1:154–156 (1962).
16. M. C. Hubbard and W. G. Cobourn, Development of a regression model to forecast ground-level
ozone concentration in Louisville, KY. Atmos. Environ. 32(14–15):2637–2647 (1998).
17. A. Vlachogianni, P. Kassomenos, A. Karppinen, S. Karakitsios and J. Kukkonen, Evaluation of a
multiple regression model for the forecasting of the concentrations of NOx and PM10 in
Athens and Helsinki. Science of the Total Environment. 409:1559–1571 (2011).
18. F. Simion, V. Cuculeanu, E. Simion and A. Geicu, Modeling the ²²²Rn and ²²⁰Rn progeny
concentration in atmosphere using multiple linear regression with meteorological variables as
predictors. Rom. Rep. Phys. 65, 524–544 (2013).
19. V. Cuculeanu, I. Ungureanu and S. Stefan, Study of the relationship among radiative forcing,
albedo and cover fraction of the clouds. Rom. Journ. Phys. 58, 987–999 (2013).
20. A. Busuioc, Large-scale mechanisms influencing the winter Romanian climate variability,
Detecting and Modelling Regional Climate Change and Associated Impacts, M. Brunet and
D. Lopez eds., Springer-Verlag, 333–343 (2001).
21. R. A. Houze, Cloud dynamics. Academic Press, Inc, San Diego (1993).
22. G. P. Compo, Whitaker J.S., Sardeshmukh P.D., Matsui N., Allan R.J., Yin X., Gleason
B.E., Vose R.S., Rutledge G., Bessemoulin P., Brönnimann S., Brunet M., Crouthamel
R.I., Grant A.N., Groisman P.Y., Jones P.D., Kruk M.C., Kruger A.C., Marshall G.J., Maugeri
M., Mok H.Y., Nordli O., Ross T.F., Trigo R.M., Wang X.L., Woodruff S.D. and Worley S.J.,
The Twentieth Century Reanalysis Project, Q. J. R. Meteorol. Soc. 137:1–28,
doi:10.1002/qj.776 (2011).
23. G. P. Compo, J. S. Whitaker and P. D. Sardeshmukh, Feasibility of a 100 year reanalysis using
only surface pressure data. Bull. Am. Meteorol. Soc. 87:175–190, doi:10.1175/BAMS-87-2-175 (2006).
24. P. Aksornsingchai and C. Srinilta, Statistical Downscaling for Rainfall and Temperature
Prediction in Thailand, Proceedings of the International MultiConference of Engineers and
Computer Scientists, 2011, March 16 – 18, 2011, Hong Kong (2011).
25. D. S. Wilks, Forecast verification. Statistical Methods in the Atmospheric Sciences, Academic
Press, p. 467 (1995).
26. K. Pearson, Mathematical contributions to the theory of evolution. III. Regression, heredity, and
panmixia. Philosophical Transactions of the Royal Society Ser. A 187:253–318 (1896).
27. C. E. Spearman, The proof and measurement of association between two things. Am. J. Physiol.
15:72–101 (1904).
28. P. A. Rogerson, A Statistical Method for the Detection of Geographic Clustering. Geogr Anal
33(3):215–227 (2001).
29. P. Wessa, Free Statistics Software, Office for Research Development and Education, version
1.1.23-r7, URL http://www.wessa.net/ (2014).
30. M. Rajeevan, D. S. Pai and R. Anil Kumar, New statistical models for long range forecasting of
south-west monsoon rainfall. Climate Dynamics 28(7–8):813–828, doi:10.1007/s00382-006-0197-6 (2006).