NEAS Time Series Student Project
xxxx xxxxxx x. xxxxx
ELECTRICITY PRODUCTION FROM HYDROELECTRIC RESOURCES
IN THE PHILIPPINES
INTRODUCTION
This project examines annual electricity production from hydroelectric resources in the Philippines over a
specified period. Sources of electricity refer to the inputs used to generate electricity, while hydropower refers
to electricity produced by hydroelectric power plants.1 Hydroelectricity has been one of the top four sources of
electricity since the 1970s, along with coal, natural gas and other renewable sources.
TIME SERIES DATA
Figure 1 below shows the annual data values for this time series, expressed in billions of kilowatt-hours (kWh),
from 1973 (time 0) to 2011 (time 38).
This time series, with a total of 39 data points, is based on the World Bank Group's data bank for the
Philippines.
SAMPLE AUTOCORRELATION FUNCTIONS
The sample autocorrelation function (ACF) values for different lags ($k = 1$ to $38$) are computed using the
following formula
$$r_k = \frac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}$$
and then graphed in Figure 2 below. (See the ACF sheet in the working file embedded in Appendix 1 for the
computation of ACF values.)
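As a rough illustration of the formula above, the following Python sketch computes $r_k$ for a short series. The function name `sample_acf` and the toy values are assumptions made for illustration only; the project itself performs this computation in the ACF sheet of the Excel working file.

```python
def sample_acf(y, max_lag):
    """Return [r_1, ..., r_max_lag] for the series y, following the formula above."""
    n = len(y)
    mean = sum(y) / n
    # Denominator: total sum of squared deviations from the mean.
    denom = sum((v - mean) ** 2 for v in y)
    acf = []
    for k in range(1, max_lag + 1):
        # Sum over t = k+1..n in 1-based terms, i.e. indices k..n-1 in 0-based terms.
        num = sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n))
        acf.append(num / denom)
    return acf


# Hypothetical toy series, not the project data.
toy = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_acf(toy, 2))  # prints [0.4, -0.1]
```

Note that every lag shares the same denominator, so $r_k$ shrinks mechanically at high lags where few terms remain in the numerator.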
1 http://databank.worldbank.org/data/views/reports/tableview.aspx
From the ACF graph, we can say that the time series is neither white noise (as $r_k$ is far from zero for most $k$)
nor a random walk (as $r_k$ does not stay high for a very long time). Moreover, since there is no $q$ for which $r_k$ is
close to zero for all $k > q$, the time series is unlikely to follow a moving average model.
Given these observations and the damped-wave appearance of the ACF graph, we consider an
autoregressive model and look further at the sample partial autocorrelation function (PACF) to get an idea of
the autoregressive order. Figure 3 below shows the sample PACF values $\hat{\phi}_{kk}$ at different lags $k$, computed using the
following recursive relationships
$$\hat{\phi}_{kk} = \frac{r_k - \sum_{j=1}^{k-1} \hat{\phi}_{k-1,j}\, r_{k-j}}{1 - \sum_{j=1}^{k-1} \hat{\phi}_{k-1,j}\, r_j}$$
where $\hat{\phi}_{k,j} = \hat{\phi}_{k-1,j} - \hat{\phi}_{kk}\, \hat{\phi}_{k-1,k-j}$ for $j = 1, 2, \ldots, k-1$. (See the PACF sheet of the working file for the
computation of PACF values.)
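The recursion above can be sketched in Python as follows. This is only an illustrative sketch: the function name `sample_pacf` is an assumption, and the only inputs taken from this project are the two sample autocorrelations $r_1 = 0.7275$ and $r_2 = 0.6715$ quoted later in the text.

```python
def sample_pacf(r):
    """Given r = [r_1, ..., r_K], return [phi_11, ..., phi_KK] via the recursion above."""
    pacf = []
    prev = []  # prev[j - 1] holds phi_{k-1, j}
    for k in range(1, len(r) + 1):
        # Numerator and denominator of phi_kk (both sums are empty when k = 1).
        num = r[k - 1] - sum(prev[j - 1] * r[k - j - 1] for j in range(1, k))
        den = 1.0 - sum(prev[j - 1] * r[j - 1] for j in range(1, k))
        phi_kk = num / den
        # Update step: phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}.
        curr = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)]
        curr.append(phi_kk)
        pacf.append(phi_kk)
        prev = curr
    return pacf


# r_1 and r_2 as quoted in the text (rounded to four decimals).
print(sample_pacf([0.7275, 0.6715]))
```

With these rounded inputs the lag-2 value comes out at roughly 0.302, which agrees with the reported 0.3023 up to rounding of the inputs.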
For an $AR(p)$ model, $\hat{\phi}_{kk}$ is expected to be close to zero for $k > p$. Specifically, $\pm 2/\sqrt{n}$ can be used as critical limits
on $\hat{\phi}_{kk}$ to test this closeness and the fit of the $AR(p)$ model. As shown in Figure 3 above, $\hat{\phi}_{kk}$ remains within or
close to the range $(-0.10, 0.10)$ for $k > 2$. The critical limits $\pm 2/\sqrt{39} = \pm 0.3203$ contain this range, so the order of the
$AR$ process is unlikely to be greater than 2. For $k = 1$, $\hat{\phi}_{11} = 0.7275$ is well above the upper
limit, so an $AR(1)$ model is a possible candidate. For $k = 2$, although $\hat{\phi}_{22} = 0.3023 < 0.3203$, we
still check an $AR(2)$ model, because the critical limits are only approximate for a sample this small. In the
succeeding sections of this project, we therefore focus on determining which of the $AR(1)$ and $AR(2)$ models provides the better
fit for the time series, assuming that the time series follows a single model throughout its duration.
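The comparison against the critical limits can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `pacf_critical_limit` is an assumption, and the two PACF values are the ones quoted in the paragraph above.

```python
import math


def pacf_critical_limit(n):
    """Approximate critical limit 2 / sqrt(n) for the sample PACF."""
    return 2.0 / math.sqrt(n)


limit = pacf_critical_limit(39)
reported_pacf = {1: 0.7275, 2: 0.3023}  # values quoted in the text
for lag, value in reported_pacf.items():
    verdict = "outside" if abs(value) > limit else "inside"
    print(f"lag {lag}: {value:.4f} is {verdict} +/-{limit:.4f}")
```

The lag-2 value falls inside the limits by less than 0.02, which is why the text keeps the $AR(2)$ model under consideration.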
PARAMETRIC ESTIMATION
After identifying probable models by examining the sample ACF and sample PACF and their graphs, we
estimate the parameters of these models.
AR(1) Model
First, we use the method of moments (MOM) to estimate the autoregressive coefficients $\phi_k$, referring to the formulas
derived in the Cryer-Chan textbook, Chapter 7.1. For an $AR(1)$ model, the parameter $\phi$ can be estimated by $\hat{\phi} =
r_1 = 0.7275$. Since the mean of the time series is 5.8508, the MOM-estimated $AR(1)$ model is $Y_t - 5.8508 =
0.7275(Y_{t-1} - 5.8508) + e_t$, or equivalently
$$Y_t = 1.5945 + 0.7275\, Y_{t-1} + e_t$$
where $e_t$ is the usual "innovation" or error term.
To estimate the noise variance $\sigma_e^2$ of the error terms, we first estimate the process variance by the sample
variance, that is,
$$s^2 = \frac{1}{n-1} \sum_{t=1}^{n} (Y_t - \bar{Y})^2 = \frac{243.5032}{38} = 6.4080$$
The estimate of $\sigma_e^2$ is then
$$\hat{\sigma}_e^2 = (1 - \hat{\phi}\, r_1)\, s^2 = (1 - 0.7275^2)(6.4080) = 3.0167$$
Since $|\hat{\phi}| = 0.7275 < 1$, this model is also stationary.
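The method-of-moments arithmetic for the $AR(1)$ model can be checked with a short Python sketch. The function name `ar1_method_of_moments` is an assumption; the inputs are the rounded values of $r_1$, the series mean and $s^2$ reported above.

```python
def ar1_method_of_moments(r1, mean, sample_var):
    """Method-of-moments estimates for an AR(1) model with a nonzero mean."""
    phi = r1                                   # phi_hat = r_1
    intercept = mean * (1.0 - phi)             # constant after expanding the mean-centred form
    noise_var = (1.0 - phi * r1) * sample_var  # sigma_e^2 estimate
    return phi, intercept, noise_var


phi, intercept, noise_var = ar1_method_of_moments(0.7275, 5.8508, 6.4080)
print(phi, intercept, noise_var)
```

Because the inputs are already rounded to four decimals, the results match the reported 1.5945 and 3.0167 only to about three decimal places.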
The graph of the MOM-estimated $AR(1)$ model versus the actual time series is shown in Figure 4 below. The graph
shows that the model cannot fully capture the periods with larger fluctuations, i.e. the beginning and latter parts of the series.
Using $\hat{\phi}$ and the relationship $\rho_k = \phi^k$ for an $AR(1)$ model, we can generate the ACF at different lags and compare
the values with the sample ACF to judge whether the MOM-estimated model is reasonable. Intuitively, however,
since $\hat{\phi} > 0$, $\hat{\rho}_k$ is a nonnegative, monotonically decreasing function of $k$, unlike the sample ACF in Figure
2, which becomes negative starting at lag 16. For this reason, we discount the possibility of an $AR(1)$
model.
AR(2) Model
We next consider an $AR(2)$ model, for which the parameters $\phi_1$ and $\phi_2$ can be estimated by
$$\hat{\phi}_1 = \frac{r_1 (1 - r_2)}{1 - r_1^2} \quad \text{and} \quad \hat{\phi}_2 = \frac{r_2 - r_1^2}{1 - r_1^2},$$
respectively. Substituting $r_1 = 0.7275$ and $r_2 = 0.6715$, we obtain $\hat{\phi}_1 = 0.5076$ and $\hat{\phi}_2 = 0.3023$.
Similarly, the estimated $AR(2)$ model is $Y_t - 5.8508 = 0.5076(Y_{t-1} - 5.8508) + 0.3023(Y_{t-2} - 5.8508) + e_t$, or
equivalently,
$$Y_t = 1.1125 + 0.5076\, Y_{t-1} + 0.3023\, Y_{t-2} + e_t$$
The estimate of $\sigma_e^2$ is
$$\hat{\sigma}_e^2 = (1 - \hat{\phi}_1 r_1 - \hat{\phi}_2 r_2)\, s^2 = (1 - (0.5076)(0.7275) - (0.3023)(0.6715))(6.4080) = 2.7411$$
which is lower than that for the $AR(1)$ model due to the additional variable $Y_{t-2}$ used in estimation.
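The $AR(2)$ method-of-moments formulas above can be checked in the same way. This is a sketch under the assumption that the rounded values $r_1 = 0.7275$, $r_2 = 0.6715$, mean 5.8508 and $s^2 = 6.4080$ are used as inputs; the function name `ar2_method_of_moments` is hypothetical.

```python
def ar2_method_of_moments(r1, r2, mean, sample_var):
    """Method-of-moments (Yule-Walker) estimates for an AR(2) model with a nonzero mean."""
    phi1 = r1 * (1.0 - r2) / (1.0 - r1 ** 2)
    phi2 = (r2 - r1 ** 2) / (1.0 - r1 ** 2)
    intercept = mean * (1.0 - phi1 - phi2)     # constant after expanding the mean-centred form
    noise_var = (1.0 - phi1 * r1 - phi2 * r2) * sample_var
    return phi1, phi2, intercept, noise_var


phi1, phi2, intercept, noise_var = ar2_method_of_moments(0.7275, 0.6715, 5.8508, 6.4080)
print(phi1, phi2, intercept, noise_var)
```

As with the $AR(1)$ check, the outputs agree with the reported 0.5076, 0.3023, 1.1125 and 2.7411 to about three decimal places, since the inputs here are rounded.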
Since the parameter estimates satisfy the following conditions:
1. $\hat{\phi}_1 + \hat{\phi}_2 = 0.8099 < 1$
2. $\hat{\phi}_2 - \hat{\phi}_1 = -0.2053 < 1$
3. $|\hat{\phi}_2| = 0.3023 < 1$
this model is likewise stationary.
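The three conditions listed above can be expressed as a small Python check. The function name `ar2_is_stationary` is an assumption, and the second call uses hypothetical coefficients chosen only to show a failing case.

```python
def ar2_is_stationary(phi1, phi2):
    """Check the three AR(2) stationarity conditions used in the text."""
    return (phi1 + phi2 < 1.0) and (phi2 - phi1 < 1.0) and (abs(phi2) < 1.0)


print(ar2_is_stationary(0.5076, 0.3023))  # MOM estimates from the text
print(ar2_is_stationary(0.6, 0.5))        # hypothetical coefficients: phi1 + phi2 > 1
```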
The graph of this MOM-estimated model versus the actual time series is shown in Figure 5 below.
The $AR(2)$ model approximates the overall trend of the actual time series but deviates noticeably at
some data points, particularly in the latter part of the time series.
To check the applicability of an $AR(2)$ model, we also estimate the parameters using the regression analysis tool in
Excel. Running this tool on the time series gives an intercept of 0.6581 and coefficients $\phi_1$ and $\phi_2$
of 0.5927 and 0.3380, respectively; hence the model $Y_t = 0.6581 + 0.5927\, Y_{t-1} + 0.3380\, Y_{t-2} + e_t$. The
computed adjusted $R^2$ of this model is 0.8389, indicating that around 84% of the variation in the time series
can be explained by the estimated model, which is a reasonably good percentage.
The graph of this regression-estimated model versus the actual time series is shown in Figure 6 below.
As with the MOM-estimated model in Figure 5, the regression-estimated model approximates the overall trend
of the actual time series but deviates noticeably at some data points. (See the AR(p) sheet in the
working file for the complete summary output of the regression tool runs.)
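The regression step described above regresses $Y_t$ on $Y_{t-1}$ and $Y_{t-2}$ with an intercept. The project does this with Excel's regression tool; the following is only a rough Python equivalent that solves the least-squares normal equations directly. The function names and the noise-free toy series are assumptions for illustration and are not the project data.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ar2_ols(y):
    """Least-squares fit of Y_t on (1, Y_{t-1}, Y_{t-2}); returns [intercept, phi1, phi2]."""
    rows = [[1.0, y[t - 1], y[t - 2]] for t in range(2, len(y))]
    target = [y[t] for t in range(2, len(y))]
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(3)]
    return solve_linear(xtx, xty)


# Hypothetical noise-free series generated by Y_t = 1 + 0.5 Y_{t-1} + 0.3 Y_{t-2}.
toy = [0.0, 3.0]
for _ in range(10):
    toy.append(1.0 + 0.5 * toy[-1] + 0.3 * toy[-2])
print(fit_ar2_ols(toy))  # should recover values close to 1.0, 0.5 and 0.3
```

Unlike the method of moments, this fit uses only the $n - 2$ observations that have two lagged values available, so the two sets of estimates need not coincide.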
RESIDUAL ANALYSIS
To check which of the two $AR(2)$ models provides the better fit, we compute their standardized residuals across
time and plot them together in one graph (Figure 7).
From Figure 7, we note that the residuals of the two models are close to each other until time 26. From then on,
the residuals differ significantly. For this range (time 27 onwards), the $Y_t$ estimates from the regression
model are greater than those from the MOM model (Figure 8 below), hence its smaller positive residuals and
larger negative residuals. However, if we look at the sum of the residuals, the regression-estimated
model has a sum of zero, as least squares with an intercept guarantees, while the MOM-estimated model has a sum of 9.81. Hence, we are better off with the regression-estimated model.
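The residual computation behind this comparison can be sketched in Python as follows. The text does not state how the residuals were standardized, so dividing by the sample standard deviation is an assumption here, as are the function names and the toy inputs.

```python
import math


def ar2_residuals(y, intercept, phi1, phi2):
    """One-step residuals e_t = Y_t - (intercept + phi1 Y_{t-1} + phi2 Y_{t-2})."""
    return [y[t] - (intercept + phi1 * y[t - 1] + phi2 * y[t - 2])
            for t in range(2, len(y))]


def standardize(residuals):
    """Divide each residual by the sample standard deviation (assumed convention)."""
    n = len(residuals)
    mean = sum(residuals) / n
    sd = math.sqrt(sum((e - mean) ** 2 for e in residuals) / (n - 1))
    return [e / sd for e in residuals]


# Hypothetical toy series and coefficients, not the project data.
toy = [1.0, 2.0, 4.0, 3.0, 5.0]
res = ar2_residuals(toy, 0.0, 1.0, 0.0)
print(res)                       # raw residuals
print(standardize(res))          # standardized residuals
print(sum(res))                  # sum of residuals
print(sum(e * e for e in res))   # sum of squared residuals
```

Running the same functions with two different coefficient sets on one series gives the two residual sequences that a plot such as Figure 7 compares.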
SUMMARY AND CONCLUSIONS
This project aimed to model the actual time series of electricity production from hydroelectric resources in the
Philippines from 1973 to 2011. Data values are expressed in billions of kilowatt-hours (kWh) (Figure 1). By
examining the graph of the sample autocorrelation function (ACF) (Figure 2), the following have been eliminated
as choices for the model: white noise, random walk and moving average models. By examining next the graph of
the sample partial autocorrelation function (PACF) (Figure 3), the probable orders of an autoregressive model
have been identified, i.e. $p = 1$ or $p = 2$.
The parameters of the $AR(1)$ and $AR(2)$ models are estimated using the method of moments. For $p = 1$, the estimated
stationary model is $Y_t = 1.5945 + 0.7275\, Y_{t-1} + e_t$ with $\hat{\sigma}_e^2 = 3.0167$. However, because of the consistent deviations
of this model from the actual time series (Figure 4) and because the ACF implied by
$\hat{\phi} = 0.7275$ is inconsistent with the sample ACF, this model is discounted as well.
On the other hand, the MOM-estimated stationary $AR(2)$ model is $Y_t = 1.1125 + 0.5076\, Y_{t-1} + 0.3023\, Y_{t-2} + e_t$
with $\hat{\sigma}_e^2 = 2.7411$. Graphically, this model fits the actual time series better than the $AR(1)$ model (Figure 5), but
it can be further improved by minimizing the squared residuals using regression analysis. Using the built-in Excel
tool, the regression-estimated model is $Y_t = 0.6581 + 0.5927\, Y_{t-1} + 0.3380\, Y_{t-2} + e_t$. The two $AR(2)$ models
are then subjected to residual analysis for comparison. Graphs of the standardized residuals (Figure 7) and the
estimated models (Figure 8) show how close the two models are despite the differences in parameters. But as
mentioned, the regression-estimated model, by the nature of its estimation method, yields a sum of
residuals very close to zero, so this model is ultimately chosen to represent the time series
considered.
However, it must be noted that the chosen $AR(2)$ model is by no means the optimal model for the time series. As
shown in Figure 7, the deviations of the residuals from zero increase as time increases. The techniques used in
this project are limited to the coverage of the time series course. More advanced techniques would likely yield
models that provide a better fit for the time series.
APPENDIX 1. EXCEL WORKING FILE
The Excel file below contains the relevant data, computations and graphs used in this project.
NEAS Time Series Student Project Dec 2014 (Luzon, Paul Adrian).xlsx