
Sarhad J. Agric. Vol.25, No.4, 2009
COLLECTIVE AND INDIVIDUAL MONTH-WISE DATA MANAGEMENT
APPROACH ON THE DATA COLLECTED IN KALAM (SWAT) THROUGH
MULTIPLE REGRESSION ANALYSIS
AMJAD MASOOD*, SYED MUHAMMAD SAEED SHAH**, MANZOOR AHMAD MALIK*
GUL DARAZ KHAN***, SAMAR GUL* and IKRAMUL HAQ****
* Pakistan Council of Research in Water Resources, Peshawar – Pakistan.
** Center of Excellence in Water Resources Engineering, University of Engineering and Technology, Lahore – Pakistan.
*** Department of Water Management, NWFP Agricultural University, Peshawar – Pakistan.
**** Department of Agricultural Extension Education and Communication, NWFP Agricultural University, Peshawar – Pakistan.
ABSTRACT
An understanding of the hydrological regimes of mountain rivers is essential for water resources management
in Pakistan. Because there are no proper estimates of the relationships between river flow and climatic variables, and
especially of snowmelt stream flow relationships, there is always a risk of floods that cause serious damage to crops,
human beings and other infrastructure. A proper study is therefore required to understand and analyze the runoff
regimes and their relation to the climatic variables in order to forecast river flow. In this study the hydrological
regimes of the River Swat Basin at Kalam are investigated using the inter-relationships of runoff with rainfall and
temperature. Two data management schemes are used in the regression analysis. The first is Individual Month-wise
Regression, in which 30 years of monthly values of each parameter are regressed for each month individually. The
second is Collective Month-wise Regression, in which the normal (30-year mean) value of each parameter is tabulated
for each month, and the flow values are then regressed collectively on precipitation, temperature and relative
humidity. The collective month-wise technique gave encouraging results, which can be used for future prediction of
flow. Thirty years of data were used to find the linkages between river flows and climatic variables (temperature,
precipitation and relative humidity). The linkages obtained with the collective month-wise approach were
considerably better than those from the individual month-wise approach, especially for flow and temperature. The
weaker individual month-wise linkages may be due to the absence of gauging stations at upper elevations, or to the
station not being representative of the whole catchment. The study finds that the Collective Month-wise Technique
is a useful technique for predicting flow in the River Swat at Kalam.
Key Words: Water resources management, river flow, climatic variable, snowmelt, runoff regimes, regression analysis
Citation: Masood, A., S.M.S. Shah, M.A. Malik, G.D. Khan, S. Gul and I. Haq. 2009. Collective and individual
month-wise data management approach on the data collected in Kalam (Swat) through multiple regression analysis.
Sarhad J. Agric. 25(4): 557-561.
INTRODUCTION
Many river catchments lie in the northernmost part of Pakistan. The climate of Pakistan is mainly arid
and semi-arid. Being at high altitudes, these catchments receive a considerable amount of snowfall during the winter
season, and their stream flow is mainly due to snowmelt. Snowmelt stream flow is valuable because it occurs in April,
May and June, before the monsoon rainfall. This early runoff is therefore available for irrigation, power generation
and water supply at a time of extreme drought. Snowmelt and glacier melt continue in July and August, but by then
there is also abundant water from the monsoon, which can cause damaging floods. Because of the lack of proper
estimates of the relationships between river flow and climatic variables, and of snowmelt stream flow relationships,
there is always a risk of floods that cause serious damage to crops, human beings and other infrastructure. A proper
study is therefore required to understand and analyze the runoff regimes and their relation to the climatic variables,
in order to forecast river flow at the proposed site.
One strategy for fitting a "best" line through the data would be to minimize the sum of the residual errors,
∑ei. This is an inadequate criterion, because positive and negative residuals cancel each other: for two data points,
any straight line passing through the midpoint of the line connecting them (except a perfectly vertical line) yields the
minimum value, ∑ei = 0 (Chapra and Canale, 1984). The usual remedy is to minimize the sum of the squared
residuals, ∑ei², which is the least-squares criterion on which linear regression is based.
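The cancellation argument can be checked numerically. The short Python sketch below uses two hypothetical data points (not data from this study) and forces lines of several arbitrary slopes through the midpoint of the segment joining them, printing ∑ei and ∑ei² for each line.

```python
# Two hypothetical data points (x, y), for illustration only.
points = [(1.0, 2.0), (3.0, 6.0)]

def residuals(points, slope, intercept):
    """Residuals e_i = y_i - (intercept + slope * x_i)."""
    return [y - (intercept + slope * x) for x, y in points]

# Midpoint of the segment joining the two points.
mx = sum(x for x, _ in points) / len(points)
my = sum(y for _, y in points) / len(points)

for slope in (-5.0, 0.0, 2.0, 10.0):
    intercept = my - slope * mx  # forces the line through the midpoint
    e = residuals(points, slope, intercept)
    print(f"slope={slope:5.1f}  sum(e)={sum(e):7.3f}  sum(e^2)={sum(v * v for v in e):7.3f}")
```

Every slope gives ∑ei = 0, so that criterion cannot choose between the lines; ∑ei² is zero only for slope 2, the line through both points.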
Seasonal snow accumulation, assessed either from surface measurements (De Scally, 1994) or from remotely
sensed estimates of snow-covered area (Rango et al., 1977; Dey et al., 1989), plays a role in estimating runoff. De
Scally (1994) studied the Jhelum Basin and obtained high correlation coefficients between annual runoff and either
the annual maximum snowpack water storage or the total winter precipitation, whilst summer precipitation was of
little use in estimating annual flow. The same study concluded that, in the Kunhar basin, low-elevation snow courses
were as useful for forecasting as data from remote high-elevation sites.
Salomonson et al. (1997) used low-resolution meteorological satellite data and simple photo-interpretation
techniques to map snow-covered areas during early April over the Indus and Kabul River basins in Pakistan, and
estimated seasonal stream flow for each watershed by regression (Indus River, 1969-1973, R² = 0.82; Kabul River,
1967-1973, R² = 0.89). Predictions of the 1974 seasonal stream flow using the regression equations were within 7%
of the actual 1974 flow. Singh and Jain (2003) simulated daily stream flow for the Sutlej River basin in the western
Himalayan region. The model was calibrated using a three-year data set (1985/86-1987/88), and the model
parameters were optimized. Using these optimized parameters, daily stream flow was simulated for a period of six
years (1988/89-1990/91 and 1996/97-1998/99). Their modeling of stream flow takes into account the physical features
of the basin, including its total area, its altitudinal distribution through elevation zones and the areas of these zones,
and the altitudes of the precipitation and temperature stations.
MATERIALS AND METHODS
Regression Analysis
The two data management schemes (individual and collective month-wise) were used for the regression analysis
in this study. Regression is a statistical technique that may be used to evaluate correlations inferred from knowledge
of the physical environment. The resulting equations are of the form (Acreman, 1985):

$$Y = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + \cdots + B_n X_n \qquad (1b)$$
Evaluating the Regression
After fitting the best line, the second step in regression analysis is to determine whether the data can be adequately
described by the regression line (Haan, 1973). One approach is to determine how much of the variability in the
dependent variable is explained by the regression. The total sum of squares about the mean can be split into a part
due to regression and a residual part:
$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2 \qquad (2)$$
The larger the sum of squares due to regression, the better the data are explained by the regression equation. The
ratio of the sum of squares due to regression to the total sum of squares corrected for the mean can be used as a
measure of the ability of the regression line to explain variations in the dependent variable. This ratio, commonly
denoted by r² (Haan, 1973), is the sum of squares due to regression divided by the sum of squares corrected for the mean:
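$$r^2 = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} \qquad (3)$$

r² is called the coefficient of determination. For a simple linear regression with a single independent variable X,
fitted by least squares, Equation (3) can also be written as:

$$r^2 = \frac{B_0 \sum Y_i + B_1 \sum X_i Y_i - \left(\sum Y_i\right)^2 / n}{\sum Y_i^2 - \left(\sum Y_i\right)^2 / n} \qquad (4)$$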
If the regression equation perfectly predicts every value of Yi, then (Yi − Ŷi) is zero for every observation, the
residual sum of squares vanishes, and Equation (2) becomes:
$$\sum (\hat{Y}_i - \bar{Y})^2 = \sum (Y_i - \bar{Y})^2 \qquad (5)$$
Under this condition the ratio of the two sides, which is r², equals one. If, on the other hand, the regression
equation explains none of the variation in Y, the sum of squares due to regression, ∑(Ŷi − Ȳ)², is zero, and so is r².
The coefficient of determination therefore ranges from zero to one; the closer it is to one, the better the regression
equation fits the data.
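As a check on Equations (2) to (4), the Python sketch below fits a one-variable least-squares line to a few hypothetical observations (not data from this study). It verifies that the total sum of squares splits into the regression and residual parts, and computes r² both from its definition (Equation 3) and from the raw-sum form (Equation 4). The function names are illustrative only; the authors used MINITAB rather than code like this.

```python
def fit_simple(x, y):
    """Least-squares intercept B0 and slope B1 for Y = B0 + B1*X."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1

def sums_of_squares(x, y, b0, b1):
    """Equation (2): total SS about the mean, SS due to regression, residual SS."""
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return sst, ssr, sse

def r2_shortcut(x, y, b0, b1):
    """Equation (4): r^2 from raw sums; valid only for a least-squares fit with one X."""
    n, sy = len(y), sum(y)
    num = b0 * sy + b1 * sum(xi * yi for xi, yi in zip(x, y)) - sy ** 2 / n
    den = sum(yi ** 2 for yi in y) - sy ** 2 / n
    return num / den

# Hypothetical observations, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple(x, y)
sst, ssr, sse = sums_of_squares(x, y, b0, b1)
r2 = ssr / sst  # Equation (3)
print(f"SST = {sst:.4f}, SSR + SSE = {ssr + sse:.4f}")
print(f"r2 (Eq. 3) = {r2:.4f}, r2 (Eq. 4) = {r2_shortcut(x, y, b0, b1):.4f}")
```

Equation (4) only agrees with Equation (3) when B0 and B1 are the least-squares estimates, because its derivation uses the property that the fitted values and residuals are uncorrelated.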
Application of Linear Regression and Linearization of Non-Linear Relationship
Linear regression provides a powerful technique for fitting a "best" line to data. It is predicated on the assumption
that the relationship between the dependent and independent variables is linear (Chapra and Canale, 1984). This is not
always the case: in hydrology, variables sometimes have a non-linear relationship with each other. In such cases a
transformation (linearization) is needed to express the data in a form suitable for linear regression, so that the
constant and coefficients can be evaluated. The linearized models are fitted by linear regression to obtain the best-fit
coefficients, and can then be transformed back to their original form and used for predictive purposes. Three criteria
(Haan, 1973; Pergram and Stretch, 1982; Chapra and Canale, 1984) can be used to test the accuracy of the models.
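As an illustration of linearization, the sketch below assumes a hypothetical power-law relation Y = aX^b, which is not a relation fitted in this study. Taking logarithms gives ln Y = ln a + b ln X, which is linear in ln X, so a and b can be estimated by ordinary linear regression and the intercept transformed back with a = exp(ln a).

```python
import math

def fit_power_law(x, y):
    """Fit Y = a * X**b by linear regression of ln(Y) on ln(X), then back-transform."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    ln_a = my - b * mx          # intercept of the linearized model
    return math.exp(ln_a), b    # transform back to the original (power-law) form

# Hypothetical data generated from a = 2.5, b = 1.5 (illustration only).
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.5 * v ** 1.5 for v in x]

a, b = fit_power_law(x, y)
print(f"a = {a:.4f}, b = {b:.4f}")
print("prediction at X = 10:", a * 10 ** b)
```

Note that the least-squares criterion is then applied to the logarithms, so the back-transformed curve minimizes squared errors in ln Y rather than in Y itself.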
Multiple Linear Regression Method
A useful extension of linear regression is the case where the dependent variable Y is a linear function of two or more
independent variables X1, X2, X3 and so on:

$$\hat{Y} = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + \cdots \qquad (6)$$
where B0 is the constant coefficient or intercept, and B1, B2, B3 and so on are the coefficients of the variables X1, X2,
X3 and so on. Usually n observations of the variable Y are available, and n equations are formed, one for each
observation. Since these n equations must be solved for the p unknown parameters (regression coefficients), n must be
greater than p (Haan, 1973). The n equations are:
$$
\begin{aligned}
Y_1 &= B_0 + B_1 X_{1,1} + B_2 X_{1,2} + B_3 X_{1,3} + \cdots + B_p X_{1,p} && (7)\\
Y_2 &= B_0 + B_1 X_{2,1} + B_2 X_{2,2} + B_3 X_{2,3} + \cdots + B_p X_{2,p} && (8)\\
Y_3 &= B_0 + B_1 X_{3,1} + B_2 X_{3,2} + B_3 X_{3,3} + \cdots + B_p X_{3,p} && (9)\\
&\;\;\vdots\\
Y_n &= B_0 + B_1 X_{n,1} + B_2 X_{n,2} + B_3 X_{n,3} + \cdots + B_p X_{n,p} && (10)
\end{aligned}
$$
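Solving Equations (7) to (10) in the least-squares sense leads to the standard normal equations (XᵀX)B = XᵀY, where X is the n-row matrix of predictor values with a leading column of ones for B0. The pure-Python sketch below forms and solves these equations by Gaussian elimination on hypothetical data, and enforces the requirement that there be more observations than unknown coefficients. The helper names `solve` and `fit_multiple` are illustrative; a statistics package (the authors used MINITAB) or a QR-based solver is numerically more robust than forming XᵀX explicitly.

```python
def solve(a, b):
    """Solve the square system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_multiple(rows, y):
    """Least-squares [B0, B1, ..., Bp] for Y = B0 + B1*X1 + ... + Bp*Xp via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 carries the intercept B0
    k = len(design[0])
    if len(design) <= k:
        raise ValueError("need more observations than unknown coefficients")
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical observations of X1, X2, X3 with Y = 1 + 2*X1 - 0.5*X2 + 3*X3.
rows = [(1, 2, 0.5), (2, 1, 1.5), (3, 4, 2.0), (4, 3, 0.0), (5, 5, 1.0), (6, 2, 3.0)]
y = [1 + 2 * x1 - 0.5 * x2 + 3 * x3 for x1, x2, x3 in rows]

coeffs = fit_multiple(rows, y)
print([round(c, 6) for c in coeffs])
```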
Individual Month-wise Approach (Multiple Regression)
In this approach, a multiple regression of river flow on the climatic variables (temperature, precipitation and
relative humidity) was performed separately for each month, using that month's values from the 30 years of record,
to estimate the flows. The general regression equation is:
$$Q = B_0 + B_1 P + B_2 T_{\max} + B_3 T_{\min} + B_4 \, RH \qquad (11)$$
where Q = river flow (m³/s),
P = rainfall,
Tmax, Tmin = maximum and minimum temperature,
RH = relative humidity,
B0 = constant coefficient or intercept,
B1 = coefficient for the rainfall,
B2 = coefficient for the maximum temperature,
B3 = coefficient for the minimum temperature,
B4 = coefficient for the relative humidity.
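The individual month-wise scheme amounts to fitting Equation (11) twelve times, once per calendar month, each time using that month's values from all years of record. The sketch below illustrates this with a small pure-Python least-squares routine on a synthetic 30-year record. The record layout (dictionaries with keys "month", "P", "Tmax", "Tmin", "RH", "Q"), the helper names and all numbers are hypothetical and are not the Kalam data; for each month it returns [B0, B1, B2, B3, B4] and r² from Equation (3).

```python
import random

def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for the square system a·x = b."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        if m[c][c] == 0:
            raise ValueError("singular normal equations")
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares [B0, B1, ...] via the normal equations, and r^2 from Equation (3)."""
    d = [[1.0] + list(r) for r in rows]  # leading 1 carries the intercept
    k = len(d[0])
    xtx = [[sum(v[i] * v[j] for v in d) for j in range(k)] for i in range(k)]
    xty = [sum(v[i] * t for v, t in zip(d, y)) for i in range(k)]
    b = solve(xtx, xty)
    y_bar = sum(y) / len(y)
    fitted = [sum(bi * vi for bi, vi in zip(b, v)) for v in d]
    ssr = sum((f - y_bar) ** 2 for f in fitted)
    sst = sum((t - y_bar) ** 2 for t in y)
    return b, ssr / sst

def month_wise_fit(records):
    """Equation (11) fitted separately for each calendar month."""
    by_month = {}
    for rec in records:
        by_month.setdefault(rec["month"], []).append(rec)
    return {
        month: ols([(r["P"], r["Tmax"], r["Tmin"], r["RH"]) for r in recs],
                   [r["Q"] for r in recs])
        for month, recs in sorted(by_month.items())
    }

# Synthetic 30-year monthly record (hypothetical values, not the Kalam data).
rng = random.Random(1)
records = []
for _ in range(30):
    for month in range(1, 13):
        tmax = rng.uniform(-5, 30)
        rec = {"month": month, "P": rng.uniform(0, 200), "Tmax": tmax,
               "Tmin": tmax - rng.uniform(5, 15), "RH": rng.uniform(30, 90)}
        rec["Q"] = (5 + 0.1 * rec["P"] + 2.0 * rec["Tmax"] + 0.5 * rec["Tmin"]
                    - 0.05 * rec["RH"] + rng.gauss(0, 3))
        records.append(rec)

for month, (coeffs, r2) in month_wise_fit(records).items():
    print(month, [round(c, 3) for c in coeffs], f"R2 = {100 * r2:.1f}%")
```

Each monthly fit here has 30 observations for 5 coefficients, which satisfies the requirement that the observations outnumber the unknowns.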
Collective Month-wise Approach (Multiple Regression)
In this approach, a single multiple regression of river flow on the climatic variables (temperature, precipitation and
relative humidity) was performed across the months of the normal year, i.e. the 30-year average value of each
variable for each month, to estimate the flows:
$$Q = B_0 + B_1 P + B_2' \, T_{\text{mean}} + B_4 \, RH \qquad (12)$$
where Q = river flow (m³/s),
Tmean = mean temperature,
B0 = constant coefficient or intercept,
B1 = coefficient for the rainfall (P),
B2′ = combined coefficient for the maximum and minimum temperatures (say B2 + B3), applied to the mean temperature,
B4 = coefficient for the relative humidity (RH).
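For the collective scheme, the 30-year record is first reduced to a normal year (the mean of each variable for each calendar month), and Equation (12) is then fitted once across a selected group of months, such as the Nov-Oct, Apr-Oct and Nov-May groups used in the results. The sketch below assumes Tmean = (Tmax + Tmin)/2, which the text does not state explicitly, and uses a synthetic record with hypothetical seasonal cycles rather than the Kalam data; `make_records`, `monthly_normals` and `collective_fit` are illustrative names.

```python
import math
import random

def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for the square system a·x = b."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares [B0, B1, ...] via the normal equations, and r^2 from Equation (3)."""
    d = [[1.0] + list(r) for r in rows]
    k = len(d[0])
    xtx = [[sum(v[i] * v[j] for v in d) for j in range(k)] for i in range(k)]
    xty = [sum(v[i] * t for v, t in zip(d, y)) for i in range(k)]
    b = solve(xtx, xty)
    y_bar = sum(y) / len(y)
    fitted = [sum(bi * vi for bi, vi in zip(b, v)) for v in d]
    return b, sum((f - y_bar) ** 2 for f in fitted) / sum((t - y_bar) ** 2 for t in y)

def make_records(seed, noise_sd):
    """Synthetic 30-year monthly record with hypothetical seasonal cycles (not the Kalam data)."""
    rng = random.Random(seed)
    records = []
    for _ in range(30):
        for month in range(1, 13):
            phase = 2 * math.pi * (month - 1) / 12
            tmax = 12 - 12 * math.cos(phase) + rng.gauss(0, 2 * noise_sd)  # coldest Jan, warmest Jul
            tmin = tmax - 10 + rng.gauss(0, noise_sd)
            p = 80 + 40 * math.cos(2 * phase) + rng.gauss(0, 10 * noise_sd)
            rh = 60 + 15 * math.sin(phase) + rng.gauss(0, 5 * noise_sd)
            q = 3 + 0.02 * p + 1.8 * (tmax + tmin) / 2 - 0.01 * rh + rng.gauss(0, 2 * noise_sd)
            records.append({"month": month, "P": p, "Tmax": tmax, "Tmin": tmin, "RH": rh, "Q": q})
    return records

KEYS = ("Q", "P", "Tmax", "Tmin", "RH")

def monthly_normals(records):
    """Normal year: the mean of each variable for each calendar month."""
    grouped = {}
    for r in records:
        grouped.setdefault(r["month"], []).append(r)
    return {m: {k: sum(r[k] for r in rs) / len(rs) for k in KEYS} for m, rs in grouped.items()}

def collective_fit(normals, months):
    """Equation (12) fitted once across the normal values of the selected months."""
    rows, q = [], []
    for m in months:
        nm = normals[m]
        t_mean = (nm["Tmax"] + nm["Tmin"]) / 2  # assumption: Tmean = (Tmax + Tmin) / 2
        rows.append((nm["P"], t_mean, nm["RH"]))
        q.append(nm["Q"])
    return ols(rows, q)  # ([B0, B1, B2', B4], r^2)

normals = monthly_normals(make_records(seed=7, noise_sd=1.0))
groups = {"Nov-Oct": [11, 12] + list(range(1, 11)),
          "Apr-Oct": list(range(4, 11)),
          "Nov-May": [11, 12, 1, 2, 3, 4, 5]}
for name, months in groups.items():
    coeffs, r2 = collective_fit(normals, months)
    print(name, [round(c, 3) for c in coeffs], f"R2 = {100 * r2:.1f}%")
```

In this scheme each group contributes only one point per month, so a seven-month group supplies 7 normal-year points to estimate 4 coefficients.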
Model for Regression Analysis
For the regression analysis, the MINITAB 11 software was used. MINITAB 11 for Windows is a powerful
statistical package that provides a wide range of basic and advanced data analysis capabilities. Minitab Inc. has long
been recognized as a leading developer of easy-to-use statistical software. MINITAB's well-designed user interface
makes it accessible to users with a wide variety of backgrounds and experience.
RESULTS AND DISCUSSION
Individual Month-wise Approach (Multiple Regression)
The regression coefficients obtained with the individual month-wise approach (multiple regression) for the different
months of the year are given in Table I.
Table I. Individual month-wise regression coefficients for different months of the year
Coefficients of the different variables used in the regression model: B1, B2, B3, B4 and B0

| S. No. | Month | B1 | B2 | B3 | B4 | B0 | R² (%) | Remarks |
|---|---|---|---|---|---|---|---|---|
| 1 | Nov | 0.0252 | -0.613 | 0.000194 | 0.000031 | 0.000222 | 6.8 | Weak |
| 2 | Dec | 0.033 | 0.048 | -0.000289 | 0.000562 | 0.000048 | 30.1 | Medium |
| 3 | Jan | 0.0055 | -0.117 | 0.000185 | 0.000304 | 0.000164 | 11.2 | Weak |
| 4 | Feb | 0.0019 | 0.114 | 0.000197 | 0.000448 | 0.000092 | 21.3 | Weak |
| 5 | Mar | -0.0186 | 0.151 | 0.000487 | 0.000588 | 0.000107 | 28 | Medium |
| 6 | Apr | -0.296 | 2.47 | 0.00445 | 0.00207 | 0.000162 | 46.3 | Fair |
| 7 | May | -0.986 | 1.74 | 0.0127 | 0.00572 | 0.00131 | 63.3 | Good |
| 8 | Jun | -2.24 | -12.9 | 0.0402 | -0.0115 | 0.00422 | 57.9 | Good |
| 9 | Jul | -1.31 | 3.5 | 0.0207 | 0.00052 | 0.00292 | 12.6 | Weak |
| 10 | Aug | -1.18 | 14.9 | 0.0126 | 0.00172 | 0.00641 | 30.5 | Medium |
| 11 | Sep | -0.580 | 13.6 | 0.00730 | 0.00455 | -0.00058 | 40.7 | Fair |
| 12 | Oct | 0.0466 | 2.10 | 0.00104 | -0.00160 | 0.000326 | 16.5 | Weak |
According to these equations, the linkage between river flow and the climatic variables (temperature, precipitation
and relative humidity) is generally poor. Only in two months, May and June, are the linkages moderate. This suggests
that the station does not represent the whole catchment. The R² value for November is only 6.8%, showing that the
linkage between flow and the climatic variables in that month is very weak. Similarly, from December to April and
from July to October the relationships are weak to fair, with R² values between 11.2 and 46.3%. Only in May and June
are the coefficients of determination reasonable, at 63.3 and 57.9% respectively.
Collective Month-wise Approach (Multiple Regression)
The regression coefficients obtained with the collective month-wise approach for different groups of months are given
in Table II.
Table II. Collective month-wise regression coefficients for different groups of months
Coefficients of the different variables used in the regression model: B1, B2′, B4 and B0

| S. No. | Duration | B1 | B2′ | B4 | B0 | R² (%) | Remarks |
|---|---|---|---|---|---|---|---|
| 1 | Nov-Oct | -0.697 | -5.9 | 0.0132 | 0.00355 | 77.7 | Very Good |
| 2 | Apr-Oct | -1.35 | 23.8 | 0.0357 | -0.0105 | 95.1 | Excellent |
| 3 | Nov-May | -1.17 | -10.3 | 0.0159 | 0.00972 | 93.9 | Excellent |
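With this approach the regression results improve markedly: the R² values are 77.7%, 95.1% and 93.9% for
November-October, April-October and November-May respectively, compared with at most 63.3% for any single month
in the individual month-wise approach. These equations can therefore be used for flow prediction.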
CONCLUSION
With the individual month-wise approach, the linkage between flow and the climatic variables is moderate only in
May and June. The weak correlation in the remaining months of the year is most likely due to the single data
collection point, which does not represent the whole catchment area uniformly. The relationships obtained through
the individual month-wise approach therefore cannot be used for flow prediction. The collective month-wise multiple
regression, on the other hand, can be used as a tool for predicting flow in the Swat Basin at the Kalam site.
REFERENCES
Acreman, M.C. 1985. Predicting the mean annual flood from basin characteristics in Scotland. J. Hydrol. 30(1): 37-49.
Chapra, S.C. and R.P. Canale. 1984. Numerical methods for engineers. McGraw-Hill Co., London. pp. 286-309.
De Scally, F.A. 1994. Relative importance of snow accumulation and monsoon rainfall for estimating the annual
runoff, Jhelum basin, Pakistan. Hydrol. Sci. J. 39: 199-216.
Dey, B., V.K. Sharma and A. Rango. 1989. A test of snowmelt-runoff model for a major river basin in the western
Himalayas. Nordic Hydrol. 20: 167-178.
Haan, C.T. 1973. Statistical methods in hydrology. McGraw-Hill, New York. pp. 180-221.
Pergram, G.G.S. and D.D. Stretch. 1982. Recursive integrated estimation of effective precipitation and continuous
stream flow model. Int'l. Symp., Mississippi, USA. pp. 191-228.
Rango, A., V.V. Salomonson and J.L. Foster. 1977. Seasonal stream-flow estimation in the Himalayan region
employing meteorological snow cover observations. Water Resources Res. 13: 109-122.
Salomonson, V.V., A. Rango and J.L. Foster. 1997. Seasonal stream flow estimation in the Himalayan region
employing meteorological satellite snow cover observations. Water Resources Res. 27(7): 1541-1552.
Singh, P. and S.K. Jain. 2003. Modeling of stream flow and its components for a large Himalayan basin with
predominant snowmelt yields. IAHS 48(2): 257-276.
Vehvilainen, B. and J. Lohvansuu. 1991. The effects of climatic change on discharge and snow cover in Finland.
IAHS 36(2): 109-121.
Yevjevich, V. 1972. Probability and statistics in hydrology. Water Resources Publications, Colorado, USA. pp. 232-275.