Regression of pie sales on price and advertising expenses

Hrishikesh Vinod

September 22, 2010

1 Introduction
The aim of this document is to teach beginning students how to write R code R Development Core
Team (2010). I assume that the reader has already downloaded R, read some R manuals, and
started to play with R. The manuals section of the R website offers many free R books; for
example, see Arai's A Brief Guide to R for Beginners in Econometrics.
Some help is also available in my MS-Word file called r-lang.doc, available at my website.
For informative viewing of R code, with color coding for comments and
important lines, use a program called Tinn-R, freely available at
http://www.sciviews.org/Tinn-R/. Download it, save any R code with the file
extension .R, and open it in Tinn-R; this is much fun. For example, try saving
Appendix 1 alone as a separate file called appendix1.R and viewing it with Tinn-R.
If you want to use Tinn-R to send code to R directly, some more work is
needed. Tinn-R has a menu on the left side, under R-card, which helps if
you forget the exact syntax of an R command. You will have to tell Tinn-R
where Rgui.exe is on your system. Type some simple commands: "x=3;
y=x+3; y". Now ask Tinn-R to send this to R. Tinn-R will ask you to locate
the Rgui.exe file by browsing. On my system it is at:
C:\Program Files\R\R-2.10.1\bin\Rgui.exe
It may be different on your computer; you may have to use your computer's
search facility to locate it. Alternatively, if you have a shortcut for R on your
desktop, right-click on it, go to the Properties tab, and you will find the path
under 'target'.
First we set up some initialization to clean out all old objects from R's memory.
rm(list=ls())
paste("Today is", date())
[1] "Today is Wed Sep 22 14:52:08 2010"
options(prompt=" ") #this changes the prompt of R to one space
# I prefer this because one can copy and paste directly
# to R. The default R prompt > also means greater than;
# it does not allow simple copy and paste.
options(continue="    ") #four spaces replace the default continuation prompt
2 Data Entry in R
Now we enter the y(=sales) data:
y=c(350, 460, 350, 430, 350, 380, 430, 470, 450,
490, 340, 300, 440, 450, 300)
[1] 350 460 350 430 350 380 430 470 450 490 340 300 440 450 300
The c(...) function means combine or concatenate. If an input needs more
than one line, just end the earlier lines with a comma; the comma tells R that
more is coming. Here y is apple pie sales. This is the simplest way of reading
data into R; more advanced ways are available. If you did not choose the options
above (a space for the prompt and spaces for continuation), you should get back
the prompt > from R, suggesting that you have fully entered the data. If you get
the + sign, R expects continuation of the earlier function, that is, R wants more
input. If you see no reason to supply more input, then something is wrong;
get out by pressing the escape key.
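As an aside, here is a hedged sketch of two other common ways to enter the same numbers; scan() and read.csv() are standard base R functions, but the file name pie.csv below is hypothetical, used only for illustration.

```r
# Alternative data entry (a sketch): scan() reads free-form numbers;
# the text= argument lets us embed them in the script itself
y2 = scan(text="350 460 350 430 350 380 430 470 450 490 340 300 440 450 300")
length(y2)  # 15 observations, same as c() above
# For data stored in a file, read.csv("pie.csv") would return a data frame
# (pie.csv is a hypothetical file with columns named y, x and z)
```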
Now we enter x(=price) data:
x=c(5.5, 7.5, 8.0, 8.0, 6.8, 7.5, 4.5, 6.4, 7.0,
5.0, 7.2, 7.9, 5.9, 5.0, 7.0)#copy and paste into your R
[1] 5.5 7.5 8.0 8.0 6.8 7.5 4.5 6.4 7.0 5.0 7.2 7.9 5.9 5.0 7.0
x is price charged for the apple pies. Now we enter data for advertising
expenses
z=c(3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0,
3.5, 3.2, 4.0, 3.5, 2.7)#copy and paste into your R
[1] 3.3 3.3 3.0 4.5 3.0 4.0 3.0 3.7 3.5 4.0 3.5 3.2 4.0 3.5 2.7
z holds the advertising expenses.
Check the data entry for y by simply typing y:
y
[1] 350 460 350 430 350 380 430 470 450 490 340 300 440 450 300
3 Simple Graphics in R
Figure 1 displays the y data; there are 15 data points. Figure 2 illustrates
a line plot, obtained by adding typ="l" to the plot command of R.
Figure 3 illustrates a simple box plot. It can be made more meaningful by
calling an R package called 'gtools' Warnes and Others (2010).
Figure 4 illustrates notched box plots, with the fill color chosen via
col="lavender".
plot(y)#copy and paste into your R
Figure 1: Plot of pie sales data.
plot(x, typ="l")

Figure 2: Line plot of price of apple pies data.
boxplot(z)

Figure 3: Box plot of advertising data.
library(gtools)
g=quantcut(z)
boxplot(split(z,g), col="lavender", notch=TRUE)

Figure 4: Notched Box plots of advertising data, grouped by quartiles of z: [2.7,3.1], (3.1,3.5], (3.5,3.85], (3.85,4.5].
4 Preliminary Data Analysis Using R
The summary command in R gives a quick summary of data.
summary(cbind(x,y,z))
       x               y               z
 Min.   :4.500   Min.   :300.0   Min.   :2.70
 1st Qu.:5.700   1st Qu.:350.0   1st Qu.:3.10
 Median :7.000   Median :430.0   Median :3.50
 Mean   :6.613   Mean   :399.3   Mean   :3.48
 3rd Qu.:7.500   3rd Qu.:450.0   3rd Qu.:3.85
 Max.   :8.000   Max.   :490.0   Max.   :4.50
This may be adequate for some purposes. Note that it does not give information about skewness, kurtosis, number of data points, number of missing
data, etc.
Next, we try to understand the data with tests of the null hypothesis mean µ = 0; that is,
we test whether the population mean of one variable at a time is zero. For
example, we can check whether the price data have zero mean as follows in R.
t.test(x)#null hypothesis x data mean is zero
One Sample t-test
data: x
t = 21.8617, df = 14, p-value = 3.207e-12
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
5.964518 7.262149
sample estimates:
mean of x
6.613333
t.test(y)#null hypothesis y data mean is zero
One Sample t-test
data: y
t = 24.3471, df = 14, p-value = 7.366e-13
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
364.1552 434.5115
sample estimates:
mean of x
399.3333
t.test(z)#null hypothesis z data mean is zero
One Sample t-test
data: z
t = 27.5776, df = 14, p-value = 1.331e-13
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
3.20935 3.75065
sample estimates:
mean of x
3.48
We can see that none of the confidence intervals contains zero. Hence
the means of y, x and z are all significantly different from zero.
Now we consider basic descriptive statistics by using the package ‘fBasics’
Wuertz and core team (2010). Note everything in R is case-sensitive. The
letter B is upper case here. Its output includes results of such tests on the
mean of each variable and even gives confidence limits for such a test.
library(fBasics)
basicStats(cbind(y,x,z))
                      y         x         z
nobs          15.000000 15.000000 15.000000
NAs            0.000000  0.000000  0.000000
Minimum      300.000000  4.500000  2.700000
Maximum      490.000000  8.000000  4.500000
1. Quartile  350.000000  5.700000  3.100000
3. Quartile  450.000000  7.500000  3.850000
Mean         399.333333  6.613333  3.480000
Median       430.000000  7.000000  3.500000
Sum         5990.000000 99.200000 52.200000
SE Mean       16.401703  0.302508  0.126190
LCL Mean     364.155178  5.964518  3.209350
UCL Mean     434.511488  7.262149  3.750650
Variance    4035.238095  1.372667  0.238857
Stdev         63.523524  1.171609  0.488730
Skewness      -0.215868 -0.431478  0.373353
Kurtosis      -1.572078 -1.343088 -0.847871
nobs = number of observations.
NAs = not available or missing data; usually we want zero missing data!
'1. Quartile' means the first quartile; '3. Quartile' is the third quartile.
SE Mean is the standard error of the mean for testing µ = 0.
LCL Mean and UCL Mean are the lower and upper 95% confidence limits for the mean.
Skewness and Kurtosis are Pearson's measures based on moments of orders 3 and 4.
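To see where the last two rows come from, here is a small sketch computing Pearson's moment-based measures by hand for z. It assumes, as the basicStats output suggests, that the sample standard deviation with an (n-1) denominator is used in the denominator; the exact convention may vary across packages.

```r
# Hand computation of skewness m3/s^3 and excess kurtosis m4/s^4 - 3,
# where m3, m4 are the average third and fourth central moments and s
# is the usual sample standard deviation (n-1 denominator)
z = c(3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0,
3.5, 3.2, 4.0, 3.5, 2.7)
m = mean(z); s = sd(z)
skew = mean((z - m)^3)/s^3
kurt = mean((z - m)^4)/s^4 - 3
round(c(skew, kurt), 6)  # close to basicStats' 0.373353 and -0.847871
```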
5 Tukey-style Outlier Detection in R
Professor Tukey of Princeton argued long ago that outlier detection using 3
times the standard deviation is faulty, since it assumes a normal density and
the standard deviation itself is sensitive to outliers. He suggested replacing the
standard deviation with the IQR (interquartile range), since the IQR is not
sensitive to outliers. One computes lower and upper limits from the data, such
that data values below the lower limit (or above the upper limit) are defined as
outliers.
There is no easy way to do outlier detection in R. I have written an R
function to accomplish this task. It also illustrates how to write functions in
R. A function involves several lines of code to do some computing task; it
has an input and an output. Here, any data array is the input to my function.
The name of the function is 'get.outliers'. Note that names, like everything in
R, are case-sensitive. One is free to choose any function name one wishes.
Functions are 'objects' in R; their names should not conflict with existing
names and should not contain strange characters or symbols.
#BEGIN OUTLIER DETECTION FUNCTION
#the following function automatically computes outliers
get.outliers = function(x) {
#function to compute the number of outliers automatically
#author H. D. Vinod, Fordham University, New York, 24 March, 2006
  if (ncol(as.matrix(x)) > 1) {
    print("Error: input to get.outliers function has 2 or more columns")
    return(0)}
  su = summary(x)
  iqr = su[5] - su[2]   #su[2]=Q1 and su[5]=Q3
  dn = su[2] - 1.5*iqr  #lower limit
  up = su[5] + 1.5*iqr  #upper limit
  LO = x[x < dn]        #vector of values below the lower limit
  nLO = length(LO)
  UP = x[x > up]        #vector of values above the upper limit
  nUP = length(UP)
  print(c(" Q1-1.5*(inter quartile range)=", as.vector(dn),
    "number of outliers below it are=", as.vector(nLO)), quote=F)
  if (nLO > 0) {
    print(c("Actual values below the lower limit are:", LO), quote=F)}
  print(c(" Q3+1.5*(inter quartile range)=", as.vector(up),
    "number of outliers above it are=", as.vector(nUP)), quote=F)
  if (nUP > 0) {
    print(c("Actual values above the upper limit are:", UP), quote=F)}
  list(below=LO, nLO=nLO, above=UP, nUP=nUP, low.lim=dn, up.lim=up)}
#xx=get.outliers(x)
# END FUNCTION here copy+paste all the way to this line
#Now assuming x, y and z are already in memory, use the outlier function as:
xx=get.outliers(x)
[1]  Q1-1.5*(inter quartile range)= 3     number of outliers below it are= 0
[1]  Q3+1.5*(inter quartile range)= 10.2  number of outliers above it are= 0
#Some authors suggest omitting outliers from the data; others say NO.
xx=get.outliers(y)
[1]  Q1-1.5*(inter quartile range)= 200   number of outliers below it are= 0
[1]  Q3+1.5*(inter quartile range)= 600   number of outliers above it are= 0
xx=get.outliers(z)
[1]  Q1-1.5*(inter quartile range)= 1.975 number of outliers below it are= 0
[1]  Q3+1.5*(inter quartile range)= 4.975 number of outliers above it are= 0
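Base R also has a shortcut worth knowing here: boxplot.stats() applies the same Tukey 1.5*IQR rule (using hinges, which can differ slightly from the quartiles that summary() reports) and returns any detected outliers in its $out component. A minimal sketch:

```r
# boxplot.stats() applies Tukey's rule; $out lists points beyond the whiskers
y = c(350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300)
boxplot.stats(y)$out  # numeric(0): no outliers in the pie sales data
```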
6 Analysis of Two Variables at a Time
We have read the pie sales (y) data. Demand theory in Economics suggests
that as price (x) increases, y should decrease. We also have advertising data
(z). Marketing theory suggests that as z increases, y also increases. These
binary relations can be studied with simple correlation analysis.
Pearson’s correlation coefficient and Spearman’s rank correlation coefficient are two standard tools for studying relations between two variables.
R allows easy computation of correlations and tests for their significance as
shown in this section.
#Tests for Null that Correlation Coefficient is ZERO! One pair at a time
#there are 3 possible pairs here so make 3 separate tests.
cor.test(x,y)#price and sales correlation is negative
Pearson's product-moment correlation
data: x and y
t = -1.783, df = 13, p-value = 0.09794
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.7787120 0.0892576
sample estimates:
cor
-0.4432732
#but correlation coefficient is not statistically significant.
cor.test(z,y)#sales and advertising correlation is positive
Pearson's product-moment correlation
data: z and y
t = 2.4139, df = 13, p-value = 0.03126
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.06161665 0.83159352
sample estimates:
cor
0.5563199
#this correlation coefficient is statistically significant.
cor.test(x,z)#correlation coeff between x and z
Pearson's product-moment correlation
data: x and z
t = 0.1098, df = 13, p-value = 0.9142
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.4894569 0.5343685
sample estimates:
cor
0.03043758
#is insignificant.
#creating a matrix of correlation coefficients
vmtx=cbind(y,x,z)
cor(vmtx)
           y           x          z
y  1.0000000 -0.44327318 0.55631986
x -0.4432732  1.00000000 0.03043758
z  0.5563199  0.03043758 1.00000000
library(Hmisc)#this package has a nifty function for all such
#correlation tests and reports three matrices
rcorr(vmtx)
      y     x    z
y  1.00 -0.44 0.56
x -0.44  1.00 0.03
z  0.56  0.03 1.00

n= 15

P
       y      x      z
y         0.0979 0.0313
x 0.0979         0.9142
z 0.0313 0.9142
These tests on correlation coefficients reveal a possible linear relation for
each pair of variables only, but they may fail to detect nonlinear relations!
Section 6.1 below discusses scatterplots designed for checking possible
nonlinearities.
Note that the output of rcorr function of the ‘Hmisc’ package Frank E
Harrell Jr and Others. (2010) has 3 parts.
1. The top matrix gives correlation coefficients to two digits.
2. The middle part of the output of ‘rcorr’ lists the number of pairs of
values for which data are not missing. In our case all pairs of data have
n = 15 observations. R is smart not to report a 3 by 3 matrix of all 15
values here.
The availability of this output means ‘rcorr’ can compute correlation
coefficients in a smart way by using the largest number of data points
available for each pair of data. This is very tedious to do with very
large data sets and lots of missing data spread all over. Missing data
are indicated in R by the character string ‘NA’.
3. The bottom part of the output of ‘rcorr’ is entitled P. This refers to
the p-values for the test of the null hypothesis that the correlation
coefficient is zero. When p-value is less than say 0.05 we reject the
null.
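The pairwise idea in item 2 can also be reproduced with base R's cor(), whose use= argument controls the treatment of missing data. A hedged illustration; the deleted value below is artificial, inserted only to demonstrate the behavior:

```r
# Demonstrate NA handling: the default use="everything" propagates NA,
# while use="pairwise.complete.obs" drops only the incomplete pairs
y = c(350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300)
x = c(5.5, 7.5, 8.0, 8.0, 6.8, 7.5, 4.5, 6.4, 7.0, 5.0, 7.2, 7.9, 5.9, 5.0, 7.0)
x2 = x; x2[3] = NA          # pretend the third price is missing
cor(x2, y)                  # NA
cor(x2, y, use="pairwise.complete.obs")  # computed from the 14 complete pairs
```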
6.1 Scatterplots in R
The package 'car' Fox (2009) is an extremely useful package in R. It provides
very nice scatterplots and many things useful for econometrics. It plots the
Cleveland (1981) nonparametric regression line using the lowess method with
span=1/2, along with a 95% confidence band for the nonlinear curve.
Our code adds a second lowess line with span=2/3 to the same plot, using
the 'lines' command, which is useful in any R plot.
Figure 5 displays the scatter plot of y against x. The black line is a
nonparametric smooth regression from lowess with span of 2/3.
Next we study whether the sales and advertising relation is positive, using a
sophisticated scatterplot provided by R.
Figure 6 displays the scatter plot of y against z. The black line is a
possible local nonlinear fit.
7 Regression Analysis in R
We have read the pie sales (y) data. Demand theory in Economics suggests
that as price (x) increases, y should decrease. We also have advertising data
(z). Marketing theory suggests that as z increases, y also increases.
We can run a linear regression of y on x and z very simply in R. Regression
allows us to check whether these two relations hold simultaneously, not just in
pairs as in the previous section.
reg1=lm(y~x+z)# R function called lm=linear model is for multiple regression
#reg1 is name of the object which contains my first regression results
#I could have chosen any name I wish. It is used several times below
#
su1=summary(reg1) #this is needed to get detailed output of regression
su1#regression stats (multiple R square, Adjusted R square, Standard
library(car) #install and bring package into current memory of R
scatterplot(y,x, main="Demand Curve ", ylab="Price", xlab="Sales")
lines(lowess(y,x))#note the horizontal axis variable is listed first.
Figure 5: Scatterplot of pie sales and price data.
library(car) #install and bring package into current memory of R
scatterplot(z, y, main="Advertising and Sales",
ylab="Sales", xlab="Advertising")
lines(lowess(z,y))#note the horizontal axis variable is listed first.
Figure 6: Scatterplot of pie sales and advertising data.
Call:
lm(formula = y ~ x + z)

Residuals:
    Min      1Q  Median      3Q     Max
-63.795 -33.796  -9.088  17.175  96.155

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   306.53     114.25   2.683   0.0199 *
x             -24.98      10.83  -2.306   0.0398 *
z              74.13      25.97   2.855   0.0145 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 47.46 on 12 degrees of freedom
Multiple R-squared: 0.5215,   Adjusted R-squared: 0.4417
F-statistic: 6.539 on 2 and 12 DF, p-value: 0.01201
#Error, degrees of freedom) are given here.
#Excel also reports multiple R (= square root of multiple R square),
#which need not be reported.
#plot(reg1) #gives sophisticated plots of residuals
#One MUST hit the "enter" key five times to look at these fancy plots.
#My Hands-on book page 21 explains these plots.
library(xtable) # this creates a nice setup for LATEX tables
#I recommend that you learn to use Latex, free typesetting software.
xtable(su1)
% latex table generated in R 2.11.1 by xtable 1.5-6 package
% Wed Sep 22 14:52:08 2010
\begin{table}[ht]
\begin{center}
\begin{tabular}{rrrrr}
\hline
& Estimate & Std. Error & t value & Pr($>$$|$t$|$) \\
\hline
(Intercept) & 306.5262 & 114.2539 & 2.68 & 0.0199 \\
x & -24.9751 & 10.8321 & -2.31 & 0.0398 \\
z & 74.1310 & 25.9673 & 2.85 & 0.0145 \\
\hline
\end{tabular}
\end{center}
\end{table}
This document is prepared with the Latex system. You can learn about it at the
WikiBook; also of interest is the Not-so short introduction to Latex. I copy and
paste the output from xtable into my Latex document to get the following
tabulated output for the regression coefficients.
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  306.5262    114.2539     2.68    0.0199
x            -24.9751     10.8321    -2.31    0.0398
z             74.1310     25.9673     2.85    0.0145
7.1 Regression Coefficient Confidence Intervals
confint(reg1) #prints 95% confidence intervals for coefficients
                 2.5 %     97.5 %
(Intercept)   57.58834 555.464042
x            -48.57626  -1.373916
z             17.55303 130.708883

confint(reg1, level=0.99) #prints 99% confidence intervals
                  0.5 %     99.5 %
(Intercept) -42.466848 655.519234
x           -58.062245   8.112065
z            -5.187243 153.449158
#confint does not need the package called car, it is generic in R
r1=resid(reg1)#creates an object called r1 containing residuals
sum(r1^2) #prints sum of squared residuals
[1] 27033.31
anova(reg1) #prints the analysis of variance table and F values for each
#regressor separately. For the F test needed in the paper use the
#LAST line of summary(reg1) output by R:
# F-statistic: 6.539 on 2 and 12 DF, p-value: 0.01201
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value  Pr(>F)
x          1  11100 11100.4  4.9274 0.04646 *
z          1  18360 18359.6  8.1498 0.01449 *
Residuals 12  27033  2252.8
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ano=anova(reg1) #ano object has anova results for further manipulation
regSS=ano$S[1]+ano$S[2] #extracts regression sum of squares $S of sum of sq
regSS
[1] 29460.03
residSS=ano$S[3] #extract residual sum of squares
residSS
[1] 27033.31
totalSS=regSS+residSS
totalSS #INDIRECT estimate of total sum of squares
[1] 56493.33
totSS=var(y)*(length(y)-1)
totSS #direct estimate of total sum of squares should match exactly
[1] 56493.33
7.2 Regression Forecasting and Quality of Fit
Often the regression model is used for forecasting. In our case, one might
want to forecast sales y from known values of x and z. If the model fit is
good then the forecasts are likely to be good.
Hence we are led to the question: how good a job does the regression model
do in fitting? One way to check this is to plot actual y values against fitted
y values. This is done in R as follows.
Figure 7 displays the y data against the fitted y data with a 45 degree line
also plotted. Since R makes a smart choice of 'aspect ratio' in its plots, the two
axes need not have identical scales. The R code shown with Figure 7 is designed
to get this right and draw the 45 degree line.
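The forecasting step itself can be sketched with R's generic predict() function. The price 6.0 and advertising level 3.5 below are hypothetical values chosen only for illustration:

```r
# Refit the regression (same data as above) and forecast sales at new
# hypothetical values of price (x) and advertising (z)
y = c(350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300)
x = c(5.5, 7.5, 8.0, 8.0, 6.8, 7.5, 4.5, 6.4, 7.0, 5.0, 7.2, 7.9, 5.9, 5.0, 7.0)
z = c(3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7)
reg1 = lm(y ~ x + z)
new = data.frame(x=6.0, z=3.5)  # hypothetical new observation
predict(reg1, newdata=new, interval="prediction")  # point forecast with 95% band
```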
7.3 Study of Regression Residuals
One of the assumptions of the classical regression model is that the regression
errors are independent and identically distributed. This means no autocorrelation
and no heteroscedasticity. We shall not discuss heteroscedasticity here; it is
discussed in Vinod (2008) and Vinod (2010). We shall also not discuss the four
residual diagnostic plots mentioned on page 20 of Vinod (2008): (i) residuals vs.
fitted values, (ii) a normal quantile-quantile plot to see if the residuals are
normally distributed, (iii) a scale-location plot, and (iv) residuals versus
leverage plots to locate outliers. All are obtained by the simple R command
'plot(reg1)' if the regression is saved into the object 'reg1.'
Recall that we have created an R object called r1 containing the residuals of
the regression stored in the object reg1. First we compute the simple
autocorrelation between residuals. The following code illustrates subsetting
of the r1 data in R. When we compute the correlation between r1 and lagged r1,
we lose one observation. The usable array is r1[2:n] and its lagged values are
in the array r1[1:(n-1)]. The ranges of the two r1 vectors must be properly
specified, as in the following R code.
n=length(r1)#useful R function length computes array length
cor(r1[1:(n-1)],r1[2:n])#correlation between residuals(t)
[1] 0.06579175
However, going beyond first order lag becomes tedious. The well known
Durbin-Watson test for autocorrelated errors in a regression is more powerful
fity=fitted(reg1)
plot(y,fity, main="Fitted against Actual y",
xlab="Actual Y", ylab="Fitted Y")
# Good fit =this figure should be close to a virtual 45 degree line
#the actual angle will depend on the scales along axes
#it will be 45 degree only if origin+ scale of axes match exactly
lines(c(min(y),max(y)), c(min(y),max(y)))#draws the 45 degree line
miny=min(y)
minfy=min(fity)
minboth=min(miny,minfy)
maxy=max(y)
maxfy=max(fity)
maxboth=max(maxy,maxfy)
plot(y,fity, main="Fitted against Actual y", xlab="Actual Y",
ylab="Fitted Y", xlim=c(minboth,maxboth),
ylim=c(minboth,maxboth) )
lines(c(minboth,maxboth), c(minboth,maxboth))#draws 45 degree line
Figure 7: Plot of Fitted Sales against Actual Sales with a 45 degree line
than simple correlation computations. It is available in the R package ‘car’
Fox (2009) for any suitable number of lags. For quarterly data, economists
are interested in lags of order 4.
require(car)# the R package car needs to have been loaded
durbin.watson(reg1, max.lag=4)
 lag Autocorrelation D-W Statistic p-value
   1       0.0643978      1.683120   0.590
   2      -0.1887002      1.844379   0.944
   3      -0.2812442      2.004223   0.546
   4      -0.2324956      1.823004   0.622
 Alternative hypothesis: rho[lag] != 0
A simple function in R called 'acf' computes autocorrelation coefficients and
plots them with confidence bands; if the value of any acf falls outside the band,
it is statistically significant, implying that the problem of autocorrelated
errors is present in the regression model.
Figure 8 displays the autocorrelation function of the residuals, showing that
none of the autocorrelations is significant.
8 Conclusion
R is an extremely powerful and easy-to-use tool for statistical data analysis.
What is shown above is only the tip of the iceberg: R has over 2000 packages
doing all kinds of nifty things in statistics and graphics.
My book is entitled "Hands On Intermediate Econometrics Using R: Templates
for Extending Dozens of Practical Examples," World Scientific Publishers:
Hackensack, NJ, 2008. The Internet link Hands-on Econometrics has further
information about buying the book, errata and exercises; it is listed in the
bibliography as Vinod (2008).
I would urge everyone to also download the LATEX typesetting software. Some
help is available by googling the phrase Latex software; Latex for dummies may
be a good start. It notes that you first have to download MikTex, etc. You will
need a front end for Latex; I use 'WinShell', available by googling that phrase.
I have already mentioned the Not-so short introduction to Latex.
acf(r1)
Figure 8: Plot of the Autocorrelation Function of Regression Residuals
Fordham Ph.D. students can use a dissertation template: the Fordham
Dissertation Template zip file. The most convenient feature of Latex is that it
sets up bibliographies, indexing, tables of contents, very complicated math
equations, tables, etc. in a very flexible way.
My own notes (very ill-organized, containing all I know of Latex) are
available in my MS-Word file called latex help.doc, available at my website.
This particular pdf document was created by Latex in combination with the
R function 'Sweave' Leisch (2002). I first created a "*.Rnw" file containing
Latex commands and R commands, then used the R function Sweave on that file to
create the *.tex file needed by Latex. Next, I used 'WinShell' to create the
present pdf file. If you agree that mine is a very good-looking document (much
better than what MS Word produces), you too can start writing similar documents
with a little patience.
References
W. S. Cleveland. Lowess: A program for smoothing scatterplots by robust
locally weighted regression. The American Statistician, 35:54–, 1981.
J. Fox. car: Companion to Applied Regression. R package version 1.2-14,
2009. URL http://CRAN.R-project.org/package=car.
Frank E Harrell Jr and Others. Hmisc: Harrell Miscellaneous, 2010. URL
http://CRAN.R-project.org/package=Hmisc. R package version 3.8-2.
Friedrich Leisch. Sweave: Dynamic generation of statistical reports using literate data analysis. In Wolfgang Härdle and Bernd Rönz, editors, Compstat 2002 — Proceedings in Computational Statistics, pages 575–580. Physica Verlag, Heidelberg, 2002. URL http://www.stat.uni-muenchen.de/~leisch/Sweave. ISBN 3-7908-1517-9.
R Development Core Team. R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna, Austria, 2010.
URL http://www.R-project.org. ISBN 3-900051-07-0.
H. D. Vinod. Superior estimation and inference avoiding heteroscedasticity and flawed pivots: R-example of inflation unemployment trade-off. In H. D. Vinod, editor, Advances in Social Science Research Using R, pages 39–63. Springer, New York, 2010.
Hrishikesh D. Vinod. Hands-on Intermediate Econometrics Using R: Templates for Extending Dozens of Practical Examples. World Scientific, Hackensack, NJ, 2008. URL http://www.worldscibooks.com/economics/6895.html. ISBN 10-981-281-885-5.
Gregory R. Warnes and Others. gtools: Various R programming tools, 2010.
URL http://CRAN.R-project.org/package=gtools. R package version
2.6.2.
Diethelm Wuertz and Rmetrics core team. fBasics: Rmetrics - Markets and Basic Statistics, 2010. URL http://CRAN.R-project.org/package=fBasics. R package version 2110.79.