Forecasting Paper - gozips.uakron.edu

Adam Pilz
Forecasting
Grade: A-
Copper Demand
Low interest rates resulting from the 2001 recession fueled a boom in the demand for
credit from industrial, retail, and institutional borrowers [1]. These customers used the
funds to expand their businesses, borrow on credit cards, take out new mortgages, and
purchase and securitize loans for resale, which allowed borrowing to continue.
The expansion of business facilities and housing was very raw-material intensive.
Copper is used in nearly all facets of expansion, in industrial facilities as well as
housing. Both require wiring, telecommunications equipment, electrical and electronic
products, and plumbing. When power transmission and generation are included, which
not only require large amounts of copper but also represent the power that fuels the
expansion, about three quarters of copper use is accounted for [2].
Before any wire or piping is sold, copper must be removed from the ground and refined.
Therefore, if we wanted to “see” economic expansion before it showed up in
corporate profits or GDP, we could look at the demand for copper, because
there is a lag between the time copper is mined, turned into products, and sold. Since we
are currently in another recession [3], knowing when demand is starting up again is
important, as it will help determine both monetary and fiscal policy. Since copper seems
to be a decent proxy for future economic growth and for the fate of the ailing housing
market, knowing its demand should give us an idea of what is to come.
[1] http://research.stlouisfed.org/publications/net/20000801/netpub.pdf
[2] http://www.copper.org/education/c-facts/c-electrical.html
[3] http://www.nber.org/cycles/dec2008.html
Section II of this research contains a review of the literature. Section III contains a
description of the data used. Section IV reviews the methodology used. Section V presents
the results, and Section VI presents the conclusions of this research.
II. Literature Review
The literature attempting to estimate copper demand is very limited, and there is some
debate about what kind of model should be used in estimating copper demand.
Ordinary Least Squares (OLS) models as well as techniques to handle censored
regressions (i.e., Tobit models) are used and debated. MacKinnon and Olewiler (1980)
suggest that OLS models should not be used. This claim is based on the findings of
McNichol (1975), who speculates that there were five periods between 1947 and 1974 in which
the copper market was in disequilibrium: 1947-48, 1951-53, late 1954-1955, 1964-1970, and
1973-1974. Disequilibrium, McNichol suggests, can be seen where the differential
between the US Producer Price, which is the price prevalent in the US, and the price on
the London Metal Exchange is large. He asserts that the large differential persists
because US producers, who set the US price, keep it low and ration supplies
to consumers, which drives up demand on the London Metal Exchange and causes the large
differential. The problem with this, as MacKinnon and Olewiler (1980) assert, is that
during contiguous quarters of disequilibrium, OLS tends to produce errors of the same sign,
so the errors will be serially correlated. Using their Tobit model, MacKinnon and Olewiler
(1980) find that the market for copper is not always in equilibrium.
Fisher, Cootner, and Baily (1972) estimate a world copper market model with very complex
methods. They estimated supply and demand equations for the US, Europe, Japan, and
the rest of the world separately and then used a closed model with “a net input equation.”
They fit their model to data from 1948-1968, which is then used for forecasting and
simulation experiments. They find that the short-run forecasts are satisfactory and the
long-run forecasts are not.
III. Data
Data for this research was obtained from the “United States Geological Survey(USGS)
Historical Statistics for Mineral and Material Commodities” at the USGS web page. The
USGS keeps historical records of copper consumption and unit pricing back to 1900.
Reported copper consumption is recorded directly from industry sources and is measured
in metric tons. The phrase “reported consumption” is used because there may be some
discrepancy between what is reported and what the actual consumption is. Unless there is
some incentive to lie, I would expect these discrepancies to be rather small. Reported
consumption, hereafter referred to as “consumption,” is defined as “the quantity of
refined copper used by the domestic industry (brass mills, wire-rod mills, foundries, etc.),
as measured by direct survey of the copper consuming industries, in the production of
semi fabricates, castings, chemicals, etc. in the United States.” Descriptive statistics are
included below as well as a chart in Appendix A.
Variable      Mean     SE Mean  StDev    Min     Q1      Median   Q3       Maximum
Consumption   1536329  107752   1140339  118000  765500  1360000  2117500  8767403
Unit value is a measure of the price of a physical unit of consumption (in this case, a
metric ton) in nominal dollars. Unit value, hereafter referred to as “price,” is estimated
from the “Annual Average U.S. Producer Copper Price” as reported in the Metal Prices in
the United States through 1998 (MP98) and the 2006 Minerals Yearbook(MY06).
Descriptive statistics are included below as well as a chart in Appendix A.
Variable                    N    Mean  SE Mean  StDev  Minimum  Q1   Median  Q3    Maximum
Unit value ($/metric ton)   108  1069  112      1167   128      296  642     1587  7231
IV. Methodology
As a precursor to the modeling methodology, one important assumption must be stated.
Whichever model I use assumes that consumption and demand are exactly
equal; that is, the copper market is always in equilibrium.
In evaluating the different modeling techniques available for constructing a copper
demand forecast, I judged an OLS model to be more prone to error than the other choices.
This error could arise from the findings of MacKinnon and Olewiler (1980), or from the
breadth of copper’s uses. So many different variables would be required to account for what
influences demand in the United States, let alone what is happening abroad that also
influences domestic consumption, that I do not believe an accurate model would be produced.
Also, in order to forecast using an OLS model, I would need forecasts for the exogenous
variables, which, again, are numerous.
After trying various naïve, moving average, and smoothing models, I tried an ARIMA
model. I determined this to be the best model I could use for several reasons. Under
the Box-Jenkins methodology, an ARIMA model does not assume any particular pattern
in the data to be forecast, as the other models do. Also, not only did the other models
perform poorly, but the ARIMA model makes use of the information in the series itself to
produce a forecast. Therefore, the influence of exogenous variables, such as new uses
for copper, changes in preferences, and the prices of substitutes and
complements, will be taken into account by this model to the extent that it is reflected
in the series.
I began with the consumption data as reported by the USGS. The consumption data
were clearly trending upward.
[Figure: Time Series Plot of Consumption. Axes: Consumption versus Year.]
I checked an autocorrelation correlogram to confirm.
[Figure: Autocorrelation Function for Consumption (with 5% significance limits for the
autocorrelations). Axes: Autocorrelation versus Lag.]
I used a first difference to remove the trend and continued. The autocorrelation and
partial autocorrelation correlograms seemed to suggest the use of either an MA(1) model
or an ARIMA(1,1,1) model. Both models provided horrendous results. The MS in both
models was on the order of 37,000,000,000, which may not have been a significant
problem given the size of the numerical values of consumption. My real problem was
that no matter what variation I tried, I could not get the LBQ statistics to be insignificant.
Results for these models, as well as additional commentary, are provided in Appendix B.
Then I decided to try a slight transformation of the data. I divided consumption by the
price to see if this would stabilize the data. In essence, I would be forecasting the amount
of copper consumed in a year adjusted for its nominal price, hereafter referred to as
“adjusted consumption.” The useful thing about this approach is that
once I had my forecasts, I could multiply each forecast by the expected future price to
get the forecast tonnage consumed. An example:
Consumption ÷ Price = Adj. Cons.
This stabilizes the data. Once the forecast is done, the forecasts are transformed back using:
Exp. Price × Adj. Cons. = Consumption
A convenient future price for copper was supplied by the price of a copper futures
contract. The only adjustment needed was that copper on the NYMEX is
priced in $ / pound, while all price figures used thus far were in $ / metric ton. A simple
conversion to $ / metric ton, hereafter referred to as the “Conversion Factor,” was needed.
An example:
($ ÷ Pound) × (Pounds ÷ Metric Ton) = ($ ÷ Metric Ton)
In the tables that follow, the Conversion Factor is reported as 0.00045359 metric tons per
pound, so the price per pound is divided by it. This is equivalent to multiplying by about
2,204.6 pounds per metric ton.
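The two transformations described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the constant is the rounded conversion factor (metric tons per pound) that appears in the forecast tables later in the paper. The example uses the 2008 price of $3.2 per pound and the adjusted consumption forecast reported in Section V.

```python
# Metric tons per pound, rounded as in the paper's tables.
METRIC_TONS_PER_POUND = 0.00045359


def price_per_metric_ton(price_per_pound: float) -> float:
    """Convert a $/pound price to $/metric ton."""
    return price_per_pound / METRIC_TONS_PER_POUND


def adjusted_consumption(consumption_tons: float, price_per_ton: float) -> float:
    """Consumption divided by price: the series that is actually modeled."""
    return consumption_tons / price_per_ton


def consumption_from_adjusted(adj_cons: float, expected_price_per_ton: float) -> float:
    """Undo the adjustment using an expected future price."""
    return adj_cons * expected_price_per_ton


# Example with the 2008 figures used later in the paper.
price_2008 = price_per_metric_ton(3.2)
tonnage_2008 = consumption_from_adjusted(323.4019481, price_2008)
print(round(price_2008, 2), round(tonnage_2008))
```

The back-transformed figure depends entirely on the expected price that is plugged in, so any error in the futures price carries straight through to the tonnage forecast.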
[Figure: Time Series Plot of Adjusted Consumption. Axes: Adjusted Consumption versus Year.]
There was clearly still some trending going on so I tried a first difference to see if the
trend could be removed.
[Figure: Time Series Plot of Adjusted Consumption diff1. Axes: Adjusted Consumption diff1
versus Year.]
It appears as though the trend had been removed and the data fluctuate around a mean of
zero. The auto and partial autocorrelation correlograms provided below seem to imply
the use of either an MA(1), MA(2), or any variation of an ARIMA model.
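For readers who want to reproduce this step outside Minitab, a minimal sketch of first differencing and the sample autocorrelation function follows. The helper names and the toy series are made up for illustration and are not the USGS data. The 1.96 / sqrt(n) limit is a common large-sample approximation, so Minitab's plotted limits may differ somewhat.

```python
import math


def first_difference(series):
    """Return y[t] - y[t-1] for t = 1..n-1."""
    return [b - a for a, b in zip(series, series[1:])]


def sample_acf(series, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def significance_limit(n):
    """Approximate 5% limit for a white-noise autocorrelation."""
    return 1.96 / math.sqrt(n)


# Illustrative (made-up) trending series, not the USGS data.
toy = [10.0, 12.0, 15.0, 14.0, 18.0, 21.0, 20.0, 24.0]
diffs = first_difference(toy)
acf = sample_acf(diffs, max_lag=3)
limit = significance_limit(len(diffs))
print(diffs, [round(r, 3) for r in acf], round(limit, 3))
```

Autocorrelations of the differenced series that fall outside the limit are the ones that suggest MA or AR terms.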
[Figure: Autocorrelation Function for Adjusted Consumption diff1 (with 5% significance
limits for the autocorrelations). Axes: Autocorrelation versus Lag.]
[Figure: Partial Autocorrelation Function for Adjusted Consumption diff1 (with 5%
significance limits for the partial autocorrelations). Axes: Partial Autocorrelation versus Lag.]
V. Empirical Results
The results that were presented in the rough draft have been moved to Appendix C. After
careful reexamination of the AIC and BIC table I had, I realized that I forgot to include
one model before which ended up being the best in terms of those measures. Therefore
the results presented below are different than before. Here is the amended AIC and BIC
table:
Model          AIC       BIC
ARIMA(0,1,1)   11.9812   16.6881
ARIMA(0,1,2)   11.9951   16.7269
ARIMA(1,1,2)   11.9915   16.7048
The rough draft was written using the ARIMA(1,1,2) model (see Appendix C), and the results
presented below use the ARIMA(0,1,1) model. The two are very close in both measures, yet the
ARIMA(0,1,1) measures better in each, is simpler, and provides a seemingly better
forecast.
Final Estimates of Parameters ARIMA(0,1,1)
Type   Coef      SE Coef   T       P
MA 1   -0.3449   0.0912    -3.78   0.000
MS = 159749
From the chart we see that the moving average coefficient is significant above the 99%
level. This tells us that the error of the past value is important in determining the next
value. Even though the MS has been reduced we cannot compare that to the models in
Appendix B since the transformation has been done.
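As a quick arithmetic check, the T value reported for the MA coefficient is the coefficient divided by its standard error. The sketch below recomputes it and attaches a two-sided p-value using a standard normal approximation. That approximation is an assumption made here for simplicity; Minitab's reported P may be based on a t distribution, and the function names are hypothetical.

```python
import math


def t_ratio(coef, se):
    """Coefficient divided by its standard error."""
    return coef / se


def two_sided_p_normal(t):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2.0))


t = t_ratio(-0.3449, 0.0912)
p = two_sided_p_normal(t)
print(round(t, 2), p < 0.01)
```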
Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag          12      24      36      48
Chi-Square   12.6    24.3    26.5    36.1
DF           11      23      35      47
P-Value      0.319   0.386   0.847   0.875
The LBQ statistics do not raise any flags so we know the residual autocorrelations as a
group are insignificant. Confirmation of this is also provided in the four in one plot and
the residual autocorrelation correlogram below.
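The LBQ statistic itself is simple to compute from a residual series. The sketch below uses the usual Ljung-Box formula, Q = n(n+2) Σ r_k² / (n − k) summed over lags 1 to m, and compares against a chi-square distribution whose degrees of freedom are m minus the number of estimated ARMA parameters (which matches the DF of 11 at lag 12 for the one-parameter model above). The function names and the toy residuals are illustrative assumptions, not Minitab output.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..m} r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total


def degrees_of_freedom(max_lag, n_arma_params):
    """Chi-square degrees of freedom for the test at a given lag."""
    return max_lag - n_arma_params


# Made-up residuals that alternate in sign, so they are strongly autocorrelated.
toy_residuals = [1.0, -1.0] * 4
q1 = ljung_box_q(toy_residuals, max_lag=1)
print(q1, degrees_of_freedom(12, 1))
```

A large Q relative to the chi-square critical value, equivalently a small P-Value, signals leftover autocorrelation in the residuals.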
[Figure: Residual Plots for Adjusted Consumption. Panels: Normal Probability Plot, Versus
Fits, Histogram, Versus Order.]
The errors seem to be normally distributed in regards to the normal probability plot. The
residual histogram appears to be normal. The “versus order” plot gives us a time series
of the residuals and it has the appearance of white noise. The last check of residuals is
provided below in the correlogram of the residuals.
[Figure: Autocorrelation Function for RESI of Adjusted Consumption (with 5% significance
limits for the autocorrelations). Axes: Autocorrelation versus Lag.]
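Therefore our model, written for the first-differenced series and using the same sign
convention as the model in Appendix C (each MA coefficient is subtracted), is:
ΔŶt = εt − (−0.3449) ε(t−1) = εt + 0.3449 ε(t−1)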
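A short sketch can show why an ARIMA(0,1,1) model without a constant gives the same point forecast at every horizon while its limits keep widening. It assumes the Box-Jenkins sign convention, (1 − B)Yt = (1 − θB)εt with θ = −0.3449, under which the h-step forecast error variance is MS × [1 + (h − 1)(1 − θ)²]. The function name is hypothetical, the one-step forecast is taken as an input rather than computed, and z = 1.96 is an assumption, so the limits only approximately match Minitab's.

```python
import math


def arima011_forecast_intervals(last_forecast, theta, mse, horizons, z=1.96):
    """Flat point forecast with widening limits for ARIMA(0,1,1), no constant.

    Assumes (1 - B) y_t = (1 - theta * B) e_t, so the h-step forecast error
    variance is mse * (1 + (h - 1) * (1 - theta) ** 2).
    """
    rows = []
    for h in range(1, horizons + 1):
        variance = mse * (1.0 + (h - 1) * (1.0 - theta) ** 2)
        half_width = z * math.sqrt(variance)
        rows.append((h, last_forecast - half_width, last_forecast,
                     last_forecast + half_width))
    return rows


# Inputs taken from the fitted model and forecast reported in this section.
rows = arima011_forecast_intervals(323.4019481, -0.3449, 159749, 5)
for h, lower, point, upper in rows:
    print(h, round(lower, 1), round(point, 1), round(upper, 1))
```

Under these assumptions the computed limits land within about one unit of the reported ones, which is consistent with the reported MA coefficient entering the model with a positive sign.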
[Figure: Time Series Plot for Adjusted Consumption with Forecasts (with forecasts and their
95% confidence limits). Axes: Adjusted Consumption versus Time.]
Forecasts for Adjusted Consumption
Year   Lower          Forecast      Upper
2008   -460.1399517   323.4019481   1106.943848
2009   -989.7340626   323.4019481   1636.537959
2010   -1360.25908    323.4019481   2007.062976
2011   -1662.826319   323.4019481   2309.630215
2012   -1925.039958   323.4019481   2571.843854
Something to note is that the model is predicting that adjusted consumption will be
exactly the same for the next 5 years. We must keep in mind that these forecasts are of
adjusted consumption until we convert them back to original form. Therefore:
Forecasts for Consumption
Year   Average Price [4]   Conversion Factor [5]   Price / Metric Ton   Adjusted Consumption   Forecast Tonnage
2008   3.2                 0.00045359              7054.829251          323.4019481            2281545.523
2009   2.048278            0.00045359              4515.703609          323.4019481            1460387.344
2010   2.027125            0.00045359              4469.068983          323.4019481            1445305.615
2011   2.0155              0.00045359              4443.440111          323.4019481            1437017.188
This same table for the lower and upper bounds is included in Appendix D. Below is the
time series of copper consumption including both the upper and lower forecasts. Since I
had to calculate the consumption numbers using the expected prices, Minitab would not
produce the typical forecast graph. Therefore I had to overlay three data sets, which is the
reason for the different colors.
[4] Average Price: the 2008 figure is from the USGS website. The 2009-2011 figures are the
average of the most recent close of all the months available for the respective years,
derived from the NYMEX website. http://www.nymex.com/cop_fut_csf.aspx
[5] The conversion factor necessary to convert $ / pound to $ / metric ton.
http://www.metricconversions.org/weight/pounds-to-metric-tons.htm
[Figure: Time Series Plot of Consumption with Forecast, Lower, Upper. Series: Forecast_1,
Lower_2, Upper_2. Axes: Data versus Year.]
There is a slight discrepancy with the lower forecasts when we consider that they reflect
negative consumption, which is not possible. Therefore I provide another chart. Here is
basically the same graph just with a minimum value of zero. As you can see, the model
predicts a fall in the demand for copper.
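The back-transformation of the limits, and one way to mirror the second chart by flooring the lower bound at zero, can be sketched as follows. The numbers are copied from the forecast table and from Appendix D. The dictionary and function names are hypothetical, and flooring at zero is an assumption made for display purposes rather than part of the fitted model.

```python
# Values copied from the paper's forecast table and Appendix D.
PRICE_PER_TON = {2008: 7054.829251, 2009: 4515.703609,
                 2010: 4469.068983, 2011: 4443.440111}
ADJ_LOWER = {2008: -460.1399517, 2009: -989.7340626,
             2010: -1360.25908, 2011: -1662.826319}
ADJ_UPPER = {2008: 1106.943848, 2009: 1636.537959,
             2010: 2007.062976, 2011: 2309.630215}
ADJ_FORECAST = 323.4019481  # flat forecast of adjusted consumption


def back_transform(year, floor_at_zero=True):
    """Return (lower, forecast, upper) tonnage for a forecast year."""
    price = PRICE_PER_TON[year]
    lower = ADJ_LOWER[year] * price
    if floor_at_zero:
        # Negative consumption is not possible, so clip the lower bound.
        lower = max(0.0, lower)
    return lower, ADJ_FORECAST * price, ADJ_UPPER[year] * price


for year in sorted(PRICE_PER_TON):
    lower, forecast, upper = back_transform(year)
    print(year, round(lower), round(forecast), round(upper))
```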
[Figure: Time Series Plot of Consumption with Forecast, Upper. Series: Forecast_1, Lower_2,
Upper_2. Axes: Data versus Year, with a minimum value of zero.]
[Figure: Time Series Plot of Consumption with forecasts. Axes: Consumption versus Year.]
In fact when we look at the time series plot of consumption by itself, the model predicts a
steep fall in the demand for copper in the next few years. The predicted consumption by
the USGS for 2008 was 2,000,000 metric tons and the model predicts 2,281,545 metric
tons, then a sharp fall, which may be the case if this worldwide recession continues.
Year   USGS Forecast   Forecast Tonnage
2008   2000000         2281545.523
2009                   1460387.344
2010                   1445305.615
2011                   1437017.188
Now that we have a model that is satisfactory a check of an Ex Post as well as an Ex Post
Historical forecast is in order.
Ex Post: ARIMA(0,1,1)
Final Estimates of Parameters
Type   Coef      SE Coef   T       P
MA 1   -0.3432   0.1128    -3.04   0.003
MS = 216154
The moving average coefficient is significant at the 99% level and the MS appears to be
close to the former model.
Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag          12      24      36      48
Chi-Square   11.4    24.4    29.7    34.0
DF           11      23      35      47
P-Value      0.409   0.38    0.72    0.922
The LBQ stats are insignificant. The four in one plot below does not raise any red flags,
and neither does the correlogram of the residuals.
[Figure: Ex Post Residual Plots for Adjusted Consumption. Panels: Normal Probability Plot,
Versus Fits, Histogram, Versus Order.]
[Figure: Autocorrelation Function for RESI1_Ex Post (with 5% significance limits for the
autocorrelations). Axes: Autocorrelation versus Lag.]
[Figure: Ex Post Time Series Plot for Adjusted Consumption (with forecasts and their 95%
confidence limits). Axes: Adjusted Consumption versus Time.]
[Figure: Time Series Plot of Ex Post Adjusted Consumption and Adjusted Consumption. Series:
Ex Post Adjusted Consumption, Actual Adjusted Consumption. Axes: Data versus Year.]
We see that the Ex Post model does not predict the downturn that happens, when plotted
against adjusted consumption, because it seems to predict the same value into the future.
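One way to put a number on how far an ex post forecast misses is to hold out the last observations, forecast them, and compute an error measure such as the root mean squared error. The sketch below is only an illustration of that idea: the toy series is made up, the helper names are hypothetical, and the flat forecast simply repeats the last training value as a stand-in for a model's flat forecast (the paper itself does not report an RMSE).

```python
import math


def split_holdout(series, n_holdout):
    """Split a series into a fitting sample and a held-out tail."""
    if not 0 < n_holdout < len(series):
        raise ValueError("n_holdout must be between 1 and len(series) - 1")
    return series[:-n_holdout], series[-n_holdout:]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up series with a downturn at the end, not the USGS data.
toy = [5.0, 6.0, 7.0, 9.0, 4.0, 3.0]
train, held_out = split_holdout(toy, 2)
flat_forecast = [train[-1]] * len(held_out)  # stand-in for a flat model forecast
error = rmse(held_out, flat_forecast)
print(round(error, 3))
```

A flat forecast cannot follow a late downturn, so its error grows with the length of the holdout period.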
Ex Post Historical: ARIMA(0,1,1)
Final Estimates of Parameters
Type   Coef      SE Coef   T       P
MA 1   -0.3449   0.0912    -3.78   0.000
MS = 159749
The moving average coefficient is significant above the 99% level and the MS appears to
be below the Ex Post model. Nothing appears wrong with the LBQ stats below, they are
all insignificant.
Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag          12      24      36      48
Chi-Square   12.6    24.3    26.5    36.1
DF           11      23      35      47
P-Value      0.319   0.386   0.847   0.875
[Figure: Residual Plots for Ex Post Historical Adjusted Consumption. Panels: Normal
Probability Plot, Versus Fits, Histogram, Versus Order.]
The four in one plot appears to be ok as well as the correlogram of the residuals below.
[Figure: Autocorrelation Function for RESI2_Ex Post Historical (with 5% significance limits
for the autocorrelations). Axes: Autocorrelation versus Lag.]
[Figure: Ex Post Historical Time Series Plot for Adjusted Consumption (with forecasts and
their 95% confidence limits). Axes: Adjusted Consumption versus Time.]
Here is the Ex Post Historical forecast against adjusted consumption. It doesn’t seem to
forecast the downturn that happens toward the end, although it seems reasonable prior to
that.
VI. Conclusion
In the beginning of this paper, I asserted that copper seems to be a decent proxy for future
economic growth, and the fate of the ailing housing market, and that knowing its demand
should give us an idea of what is to come. I employed the use of an ARIMA(0,1,1)
model to forecast the demand for copper in hopes of understanding when we can expect a
turnaround in the economy.
I am very pleased with this model and the forecasts generated by it. Considering that the
data only go through 2007, when the housing market and the economy were in better
shape, it provides an astonishingly low forecast for the demand for copper. This was
undoubtedly helped by the adjustment for price as the model was allowed to take into
account copper prices falling in half, which would not have happened had this adjustment
not been done.
If this model is correct, we should not expect a turnaround in the economy any time soon
if it is agreed that copper consumption is a good proxy for future economic growth.
The Ex Post and Ex Post Historical forecasts performed rather poorly, although less so
when we consider the length of time over which the model is asked to predict.
Thirty-six years into the future is a long time to assume that the fundamentals of the data
set will be the same as in the period before it. One reason why domestic
consumption patterns would be vastly different after the early seventies is the enormous
expansion of computer technology. Copper is used in many components that serve
this sector, whose demand patterns were not reflected prior to the nineties.
BIBLIOGRAPHY
Copper Development Association Inc. (2009). COPPER.ORG. Retrieved 4/1, 2009, from
http://www.copper.org/resources/market_data/homepage.html
Fisher, F. M., Cootner, P. H., & Baily, M. N. (1972). “An econometric model of the
world copper industry.” The Bell Journal of Economics and Management Science,
3(2), 568-609. Retrieved from http://www.jstor.org/stable/3003038
MacKinnon, J. G., & Olewiler, N. D. (1980). “Disequilibrium estimation of the demand
for copper.” The Bell Journal of Economics, 11(1), 197-211. Retrieved from
http://www.jstor.org/stable/3003408
McNichol, D. L. (1975). “The two-price system in the copper industry.” The Bell Journal of
Economics, 6(1), 50-73.
New York Mercantile Exchange, I. (4/3/2009). Copper. Retrieved 4/4, 2009, from
http://www.nymex.com/cop_opt_cso.aspx
rcallaghan. (Oct. 1, 2008). Retrieved 4/1, 2009, from
http://minerals.usgs.gov/ds/2005/140/#copper
TheOptionsGuide.com. (2009). Copper futures explained. Retrieved 4/4, 2009, from
http://www.theoptionsguide.com/copper-futures.aspx
Appendix A.
Summary for Consumption
Anderson-Darling Normality Test
A-Squared      1.18
P-Value <      0.005
Mean           1385843
StDev          761342
Variance       5.79641E+11
Skewness       0.14072
Kurtosis       -1.02552
N              108
Minimum        118000
1st Quartile   745000
Median         1335000
3rd Quartile   2045000
Maximum        3020000
95% Confidence Interval for Mean     1240613   1531072
95% Confidence Interval for Median   1140000   1620000
95% Confidence Interval for StDev    671574    879030
[Figure: histogram of Consumption with 95% confidence intervals for the mean and median.]
Summary for Unit value ($/t)
Anderson-Darling Normality Test
A-Squared      8.10
P-Value <      0.005
Mean           1068.9
StDev          1167.1
Variance       1362208.3
Skewness       2.8882
Kurtosis       11.7192
N              108
Minimum        128.0
1st Quartile   296.0
Median         642.0
3rd Quartile   1587.3
Maximum        7231.0
95% Confidence Interval for Mean     846.3    1291.5
95% Confidence Interval for Median   433.9    789.0
95% Confidence Interval for StDev    1029.5   1347.6
[Figure: histogram of Unit value with 95% confidence intervals for the mean and median.]
Summary for Adjusted Consumption
Anderson-Darling Normality Test
A-Squared      3.87
P-Value <      0.005
Mean           1918.2
StDev          1193.3
Variance       1423851.4
Skewness       1.68183
Kurtosis       3.49480
N              108
Minimum        295.9
1st Quartile   1087.6
Median         1626.5
3rd Quartile   2482.3
Maximum        6113.2
95% Confidence Interval for Mean     1690.6   2145.8
95% Confidence Interval for Median   1443.4   1786.6
95% Confidence Interval for StDev    1052.6   1377.7
[Figure: histogram of Adjusted Consumption with 95% confidence intervals for the mean and median.]
Appendix B.
Final Estimates of Parameters
Type   Coef      SE Coef   T       P
MA 1   -0.0206   0.0971    -0.21   0.833
MS = 37846422882
Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag          12      24     36      48
Chi-Square   25.1    32.5   37.4    52.8
DF           11      23     35      47
P-Value      0.009   0.09   0.361   0.259
The MA(1) coefficient has no explanatory power in this model (p = 0.833). The normality of
the errors was not in question, judging by the normal probability plot and the residual
frequency histogram presented by Minitab. However, the LBQ statistic at lag 12 is
significant (p = 0.009), which implies that the residuals are serially correlated, so this
model should be discarded.
Final Estimates of Parameters
Type   Coef     SE Coef   T      P
AR 1   0.7170   0.2051    3.50   0.001
MA 1   0.8393   0.1561    5.38   0.000
MS = 37092726953
Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag          12      24      36      48
Chi-Square   22.2    28.6    34.3    49.4
DF           10      22      34      46
P-Value      0.014   0.158   0.454   0.34
Both the autoregressive and the moving average coefficients are significant and should be
included. The MS is extremely large and is about the same as in the model above. The
normality of the errors was not in question, judging by the normal probability plot and the
residual frequency histogram presented by Minitab. However, the LBQ statistic at lag 12 is
significant (p = 0.014), which implies that the residuals are serially correlated, so this
model should be discarded.
Appendix C.
[Figure: Time Series Plot of Adjusted Consumption diff1. Axes: Adjusted Consumption diff1
versus Index.]
The chart resembles the white noise look that we are searching for. The data seem to
fluctuate around a mean of zero. The autocorrelation and partial autocorrelation
correlograms provided below seem to imply the use of either an MA(1), MA(2), or any
variation of an ARIMA model.
[Figure: Autocorrelation Function for Adjusted Consumption diff1 (with 5% significance
limits for the autocorrelations). Axes: Autocorrelation versus Lag.]
[Figure: Partial Autocorrelation Function for Adjusted Consumption diff1 (with 5%
significance limits for the partial autocorrelations). Axes: Partial Autocorrelation versus Lag.]
After receiving these results, I viewed the four in one plot to check for the normality of
errors.
The corrected model suggested the use of an ARIMA(1,1,1) model, and, after some
variations were tried, an ARIMA(1,1,2) performed extremely well. The results:
Final Estimates of Parameters
Type   Coef     SE Coef   T      P
AR 1   0.8116   0.1612    5.03   0.000
MA 1   0.5083   0.1666    3.05   0.003
MA 2   0.3665   0.0919    3.99   0.000
MS = 158531
Model (for the first-differenced series):
Ŷt = 0.8116 Y(t−1) + εt − 0.5083 ε(t−1) − 0.3665 ε(t−2)
From the chart we see that autoregressive coefficient and both the moving average
coefficients are significant above the 99% level. This tells us that not only the past value
is important for determining the next one, but that the errors of the past two values are
important. Even though the MS has been reduced we cannot compare that to the models
in Appendix B since the transformation has been done.
Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag          12      24      36      48
Chi-Square   9.1     19.8    21.9    31.8
DF           9       21      33      45
P-Value      0.427   0.537   0.929   0.932
The LBQ statistics do not raise any flags so we know the residual autocorrelations as a
group are insignificant. Confirmation of this is also provided in the four in one plot and
the residual autocorrelation correlogram below.
[Figure: Residual Plots for Adjusted Consumption. Panels: Normal Probability Plot, Versus
Fits, Histogram, Versus Order.]
The errors seem to be normally distributed in regards to the normal probability plot. The
residual histogram appears to be normal when we consider the fact that the last bar all the
way to the right in the histogram represents a frequency of one. That is, 1% of the 107
values taken in to consideration lie that far out. The “versus order” plot gives us a time
series of the residuals and it has the appearance of white noise. The last check of
residuals is provided below in the correlogram of the residuals.
[Figure: Autocorrelation Function for RES of Adjusted Consumption (with 5% significance
limits for the autocorrelations). Axes: Autocorrelation versus Lag.]
Here are the forecasts from the ARIMA(1,1,2) model along with the copper prices in the
respective time periods. I could not find any futures contracts farther into the future than
2011.
Forecasts of Adj. Cons. from period 2007
Period   Forecast   Lower      Upper
2008     388.36     -392.19    1168.91
2009     454.17     -828.08    1736.41
2010     507.58     -1072.54   2087.69
2011     550.93     -1241.94   2343.80
2012     586.11     -1371.19   2543.41
Copper Pricing
Year   Price/pound
2008   3.2*
2009   2.01**
2010   2.0247**
2011   2.0285**
* USGS
** COMEX
The next table contains the calculated forecasts using the formula:
(Price in Dollars ÷ Pound) × (Pounds ÷ Metric Ton) = Price in Dollars ÷ Metric Ton = Exp. Price
Exp. Price × Adj. Cons. = Consumption
Forecasts of Consumption in metric tons from period 2007
Period   Forecast   Lower     Upper
2008     8767403    -799052   26388673
2009     4045266    -944343   15466103
2010     4587354    -209224   18867915
2011     4997849    -405955   21262145
As we can see even though the model performed well in terms of significance, the
forecasts seem to be extremely wrong. Here is the amended time series plot including
forecasts of Adjusted Consumption:
[Figure: Time Series Plot of Adjusted Consumption. Axes: forecast cons/dollar versus Index.]
It seems to be predicting that copper demand will pick up in the near future. But when
we see how this translates into the amended times series of Consumption…
[Figure: Time Series Plot of Consumption. Axes: Consumption versus Index.]
We see that there is obviously some flaw in the model. In fact, the USGS estimates total
2008 US copper consumption to be 2,000,000 metric tons, whereas this model predicts
8,767,403 metric tons.
Theory of model:
Even though the model predicts copper demand at roughly 4 times the 2008 level
predicted by the USGS, there may be some merit to it. The data used in the model only
go through 2007. This means that at the end of the data the US economy is in a steady
climb upward and, most notably, the housing market is on fire. The model also accounts
for copper prices being around $4 per pound in 2007.
This model could be seen as a prediction of what would have happened if the economy, and
especially the housing market, had continued upward and copper prices had fallen by
half; I used $2 copper to calculate the forecasted demand. Revisiting the plot above, it
tells us that if the market had kept up its acceleration and copper prices had then suddenly
fallen dramatically, demand for copper would have skyrocketed, which is what we would
expect.
Appendix D.
Lower forecast
Year   Average Price   Conversion Factor   Price / Metric Ton   Adjusted Consumption   Lower Forecast Tonnage
2008   3.2             0.00045359          7054.829251          -460.1399517           -3246208.791
2009   2.048278        0.00045359          4515.703609          -989.7340626           -4469345.678
2010   2.027125        0.00045359          4469.068983          -1360.25908            -6079091.662
2011   2.0155          0.00045359          4443.440111          -1662.826319           -7388669.165
Upper forecast
Year   Average Price   Conversion Factor   Price / Metric Ton   Adjusted Consumption   Upper Forecast Tonnage
2008   3.2             0.00045359          7054.829251          1106.943848            7809299.837
2009   2.048278        0.00045359          4515.703609          1636.537959            7390120.367
2010   2.027125        0.00045359          4469.068983          2007.062976            8969702.892
2011   2.0155          0.00045359          4443.440111          2309.630215            10262703.54