
Regression
Given a set of data {xᵢ, yᵢ}, 1 ≤ i ≤ n, we want to discover a
simple linear relationship between the independent variable xᵢ
and the corresponding dependent variable yᵢ of the form
yᵢ = β₀ + β₁xᵢ + εᵢ … (1)
where the last term εᵢ is the noise factor.
A scatter plot of all these data points might look something like this:
[Scatter plot: y on the Y axis against x on the X axis.]
To find a linear model means finding the 'best' straight line
through the cloud of points, such that the sum of the squared
vertical distances (residuals) from the given points to the drawn
line is a minimum.
[Scatter plot with the least-squares line drawn through the points.]
This particular line is called the least-squares line, trend line,
linear regression line, linear model, or best-fit line, and it is
given by two parameters:
Intercept: OA
Slope: tan θ, where θ is as shown.
[Diagram: the fitted line crosses the Y axis at A, so the intercept is the length OA; O is the origin and θ is the angle the line makes with the X axis.]
This generalizes to a multiple linear regression over several
independent variables:
yᵢ = β₀ + β₁xᵢ + β₂wᵢ + β₃vᵢ + εᵢ … (2)
Our task, given our data, is to fit a linear model like (1) or (2).
Of course, we need a criterion for model fit: obtaining a linear
model is no problem, but the real issue is how well it fits our
data, and we judge that from the regression statistics.
Let's look at the dataset 'cement' in MASS.
> library(MASS)
> data(cement)
> str(cement)
We go to http://cran.r-project.org/web/packages/MASS/MASS.pdf
to look up the description of the dataset cement.
x1, x2, x3, and x4 are described as active ingredients (active
chemicals) of Portland cement, and y is the heat evolved while
the cement was setting.
Let’s look at the data.
> summary(cement)
Let us fit a simple linear model regressing y on the independent
variable x1.
> model=lm(y~x1, data=cement)
> model
Calling the object model prints the estimated coefficients. For our
data the fitted line is
y = 81.479 + 1.869 x1
where x1 is the predictor variable and y is the response variable.
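Once the model is fitted we can use it for prediction with predict(); a minimal sketch (the value x1 = 10 is just an illustrative input):
> predict(model, newdata=data.frame(x1=10)) # predicted y at x1 = 10, i.e. 81.479 + 1.869*10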
To see how well the model fits, we compare the original data y
with the fitted values from the linear model, obtained with
fitted(model).
> mf=fitted(model)
> mm=data.frame(mf,y)
> mm
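Since the least-squares line minimizes the sum of squared residuals, we can also inspect those residuals directly; a minimal sketch:
> resid(model) # residuals, y - fitted(model)
> sum(resid(model)^2) # the residual sum of squares that the fit minimizes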
How do the correlation coefficients look for any two variables?
> cor(cement) # Gives the pairwise correlation matrix
This shows that the correlation between x1 and y is about 0.73;
its square, about 0.53, implies that this linear model explains
roughly 53% of the variation of y with x1.
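For a simple regression, R-squared is exactly the squared correlation; a minimal sketch verifying this for our model:
> cor(cement$x1, cement$y)^2 # about 0.53
> summary(model)$r.squared # the same value reported in the model summary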
[Figure: REGRESSION PLOT of Y vs X1 — y (roughly 80 to 110) plotted against x1 (roughly 5 to 20) with the least-squares line, produced by the commands below.]
If we want to plot our model, this is what we do
> plot(y~x1, data=cement,
+ main="REGRESSION PLOT of Y vs X1")
> abline(model) # This draws the least square line
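R can also produce the standard diagnostic plots for a fitted model (residuals vs fitted values, normal Q-Q plot, and so on); a minimal sketch:
> par(mfrow=c(2,2)) # arrange the four diagnostic plots in a 2x2 grid
> plot(model) # standard lm diagnostics
> par(mfrow=c(1,1)) # restore the default single-plot layout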
Expansion to Multiple Linear Regression
We are now going to estimate a linear model with lm() using all
four variables x1, x2, x3, and x4 together.
> mlm=lm(y~x1+x2+x3+x4, data=cement)
> mlm
Now the multiple linear regression estimate looks like
y = 62.4054 + 1.5511 x1 + 0.5102 x2 + 0.1019 x3 − 0.1441 x4
Let's look at how well this model fits the data this time.
> mfall=fitted(mlm)
> mmall=data.frame(mfall,y)
> mmall
> summary(mlm) #This gives the model performance stats
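Before reading the summary in detail, a quick way to see the improvement over the single-variable model is to compare residual sums of squares; a minimal sketch (model is the y ~ x1 fit from earlier):
> sum(resid(model)^2) # residual sum of squares, y ~ x1 only
> sum(resid(mlm)^2) # residual sum of squares, all four predictors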
How do we read this summary?
■ The very first line, the Call, shows the model whose summary
we are looking at.
■ Ideally, the regression residuals should have a symmetric,
bell-curve distribution centered around 0. A nonzero median
indicates that the distribution is skewed (here, to the right);
the higher its magnitude, the stronger the skewness.
■ If the residual distribution were perfectly symmetric, the
magnitudes of the 1Q and 3Q values would be equal. In our case
the magnitude of 1Q is higher than that of 3Q, indicating a
residual distribution slightly skewed to the right.
■ The Min and Max components show the extreme residuals,
pointing to possible outliers as far as the response variable is
concerned.
■ Every model presupposes a hypothesis. One is the null
hypothesis: nothing was learned, the treatment had no effect, the
model did not improve anything. Its counterpart is the alternative
hypothesis: something happened, the mean improved, the
treatment improved the patient's health, and so on. We start by
assuming the null hypothesis is true. We then calculate a test
statistic and the associated probability (p-value) of observing a
result at least this extreme if the null hypothesis were true. If
this p-value is very low (usually < 0.05), we are inclined to reject
the null hypothesis in favour of the alternative. If the data are
very noisy, a p-value threshold as high as 0.1 is sometimes used
to switch to the alternative hypothesis.
■ Coefficients refer to the β coefficients of the associated
variables in our formula, listed under the Estimate column. If the
true coefficient is zero, the variable contributes nothing to the
model. Therefore we ask the valid question: how likely is it that
the true coefficient is zero? That is the purpose of the t statistics
and the p-values (last column). The smaller the p-value, the more
confident we are that the estimate for that variable should be
kept. For the entire model the p-value is about 4.756e-07, which
is very small, indicating that the model as a whole is highly
significant.
■ Look at the R-squared values, particularly the adjusted one.
They show to what extent our linear model explains the variation
in the data. Here it is very high, over 97%, so this is an
extremely good model.
■ For statisticians, a bigger F value means stronger evidence that
the model explains more than just noise, and hence greater
acceptability of the model. The sketch after this list shows how to
pull each of these statistics out of the fitted model object.
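As a quick check on the items above, here is a minimal sketch, assuming the fitted object mlm from before (the name s is ours), that extracts each of these statistics directly:
> s=summary(mlm)
> quantile(resid(mlm)) # the five-number summary of the residuals
> coef(s) # estimates, standard errors, t values and p-values
> s$r.squared; s$adj.r.squared # R-squared and adjusted R-squared
> s$fstatistic # F statistic with its degrees of freedom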
Linear Regression without an intercept
Sometimes, because of noise, our data may suggest a pattern
that is not what the physics of the process implies. If we know
that the trend line should pass through the origin, we want a
model that enforces this. Here is how we coerce the fit to follow
this alternative model.
> # we add a 0 to the rhs of the model to force a 0 intercept
> model=lm(y~x1 + 0, data=cement)
> summary(model)
Notice that the intercept is now 0 and the slope of the trend line
has changed. The *** significance code on the coefficient
indicates that it is highly significant and should be accepted.
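To see how forcing a zero intercept changes the fit, here is a minimal sketch (the names m1 and m0 are ours) that overlays both trend lines on the scatter plot:
> m1=lm(y~x1, data=cement) # with intercept
> m0=lm(y~x1+0, data=cement) # intercept forced to 0
> plot(y~x1, data=cement, main="Trend lines with and without intercept")
> abline(m1) # least-squares line with intercept
> abline(a=0, b=coef(m0)[1], lty=2) # line through the origin, dashed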
Expansion of regression models to incorporate nonlinear
terms
We now want to include interaction terms in our regression. The
term u*v implies the variables u and v are interacting in the
model.
For a model like
> model = lm(y~x1*x2+x3*x4, data=cement)
> model
Call:
lm(formula = y ~ x1 * x2 + x3 * x4, data = cement)

Coefficients:
(Intercept)           x1           x2           x3           x4        x1:x2
   88.67608      0.12015      0.02587      0.61370     -0.19599      0.02827
      x3:x4
   -0.02090
Our model now changes to
y = 88.6761 + 0.12 x1 + 0.026 x2 + 0.61 x3 − 0.196 x4 + 0.028 x1·x2 − 0.02 x3·x4
This is how we introduce nonlinearity into our model. Writing u*v
means the model will use u, v, and their interaction u:v as
candidate independent variables to explain the behavior of the
response, as the sketch below confirms.
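As a quick check that the * operator expands this way, here is a minimal sketch comparing two equivalent formulas on the cement data; both fits give identical coefficients:
> coef(lm(y~x1*x2, data=cement))
> coef(lm(y~x1+x2+x1:x2, data=cement)) # same result, interaction written explicitly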
Note additional syntax:
> model=lm(y~(x1+x2+x3+x4)^2, data=cement)
> model
This includes each individual variable and all pairwise (first-order) interactions.
Call:
lm(formula = y ~ (x1 + x2 + x3 + x4)^2, data = cement)

Coefficients:
(Intercept)           x1           x2           x3           x4        x1:x2
 -4.657e+01    4.208e+00    9.180e-01   -3.009e+00    9.836e-01   -6.533e-04
      x1:x3        x1:x4        x2:x3        x2:x4        x3:x4
  1.041e-01   -5.214e-02    8.500e-02    1.514e-02    3.225e-02
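The ^2 operator is just shorthand; as a minimal sketch of the equivalence, the same model can be written out term by term (the name mlm2 is ours) and gives the same coefficients as above:
> mlm2=lm(y~x1+x2+x3+x4+x1:x2+x1:x3+x1:x4+x2:x3+x2:x4+x3:x4, data=cement)
> coef(mlm2) # matches the coefficients printed above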