Workshop on
Introduction to Software R
April 2015
Dr Maheswaran Rohan
Department of Biostatistics and Epidemiology
Auckland University of Technology
Auckland
Time Table
9.00 – 10.15
Session 1
Introductions
History of R and brief introduction to R
Reading and writing data in R
10.15 – 10.30
Morning Tea Break
10.30 – 12.00
Session 2
Introduction to R Commander
Descriptive Analysis – Graphs and Tables
12.00 – 12.45
Lunch
12.45 – 2.45
Session 3
Statistical modelling
Simple linear regression
2.45 - 3.00
Afternoon Tea Break
3.00 - 4.30
Session 4
Multiple linear regression
Session 1
History of R and Basic R
Software R
• Originated at the University of Auckland, New Zealand
• Main authors of the software
• Ross Ihaka and Robert Gentleman
• It is powerful, flexible and free
• But it has a steep learning curve
• However, it is easy to find information online
• It is very popular statistical software worldwide
• It is easy to learn statistical methods through R
• R is case sensitive
• More details
• http://cran.r-project.org/
Image was obtained from The New York Times
Why R
Nature , Vol 517, pages 109 - 110
January 6, 2009
“R is also the name of a popular programming
language used by a growing number of data
analysts inside corporations and academia.”
“While it is difficult to calculate exactly how many
people use R, those most familiar with the software
estimate that close to 250,000 people work with it
regularly. The popularity of R at universities could
threaten SAS Institute, the privately held business
software company that specializes in data analysis
software.”
January, 2015
“Now, twenty years later, it has become one of
the most widely used Statistical Analysis
software packages around the world”.
Like any other skill, learning R cannot be
done overnight. But Jennings says that it is
worth it. “Make that time. Set it aside as an
investment: for saving time later, and for
building skills that can be used across
multiple problems we face as scientists.”
Indication of Colours
• Red – Type the R code
• Magenta – Click the menu
• Blue – You don’t need to do anything (just for teaching purposes)
Install R
• Windows users
• http://cran.r-project.org/bin/windows/base
• Click Download R 3.1.2 for Windows
• Click Run to install R
• When you install R
• Close all other programs
• Use the default settings
• Once R is completely installed
• You can see the R icon on the desktop.
Open R
Double click the R icon on the desktop
Not at all clear what to do next
R Menu
Even less clear what to do next
RGui
R Console
• RGui
– R - Software name
– Gui – Graphical User Interface.
• R prompt
– “ > ” – greater than sign
– You can type your command here
– Known as “Command line”
• R Console window is the most
important element of the RGui.
– R interprets and executes the
command
[Screenshot of the R Console, with labels: Prompt; R version; Released on 2014-10-31]
Simple Arithmetic Expression
• + addition
• - subtraction
• * multiplication
• / division
• ^ exponentiation
i. > 2 + 5
ii. > 2 - 5
iii. > -2--3
iv. > 3 * 4
v. > 13 / 4
vi. > 2^5
vii. > 1/2+3
viii. > 1/(2+3)
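As a rough check on the arithmetic expressions above, the same lines can be typed into a script with the value R should print noted as a comment. This is only a sketch; the comments about precedence are added here for illustration.

```r
# Order of evaluation: ^ first, then * and /, then + and -
2 + 5      # 7
2 - 5      # -3
-2 - -3    # 1   (subtracting a negative number)
3 * 4      # 12
13 / 4     # 3.25
2^5        # 32
1/2 + 3    # 3.5 (division is done before addition)
1/(2 + 3)  # 0.2 (brackets are evaluated first)
```

Comparing the last two lines shows why brackets matter when an expression mixes division and addition.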
Example of R Notations and Functions
Notations in R
‘?’ – help
‘??’ – search
‘#’ – user notes or comments
‘c’ – combine values into a vector
Numerical functions
>exp(10)
>log(10)
>?log
# Give more detail about ‘log’ function
>log(10, base = 10)
>pi
Vector
i. >c(1,3,5,6)
ii. >c(1,3,5,6)/2
iii.>c(1,3,5,6) + c(3,5,6,9)
iv. >c(1,3,5,6) * c(3,5,6,9)
v. >c(1,3,5,6) / c(3,5,6,9)
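The vector examples work element by element. A minimal sketch, storing the two vectors under the illustrative names x and y (assignment with <- is assumed here), with the expected values as comments:

```r
x <- c(1, 3, 5, 6)
y <- c(3, 5, 6, 9)

x / 2      # every element is halved: 0.5 1.5 2.5 3.0
x + y      # element-wise sum:        4 8 11 15
x * y      # element-wise product:    3 15 30 54
length(x)  # number of elements:      4
```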
List Objects and Assigning Values
• List objects
>ls()
character(0) means
• There are no objects in R's memory.
• Your computations are lost because they have not been assigned to an object.
• To keep a result, assign it to an object
• Assignment operator “ <- ”
• Known as the left-pointing arrow
• Nowadays, the “ = ” sign is also used instead of “ <- ”
Exercise
>x <- 2
>x
>x = 4
>x
• Indexing Vector
>x <- seq(1, 50, 2)
>x
>x[2] # the index is given in square brackets
>X # See error message
Note that R is case sensitive
>x <- c(5,8,2,3)
>x
>y <- c(7,2,9,0)
>a <- x + y
>a
>mean(a) # Average
>sd(a) # Standard Deviation
# Generating random observations from a normal distribution with mean 5 and sd 2
>y <- rnorm(10, 5, 2)
>y
>y[5]
>a25 <- x[2] + y[5]
>a25
• List objects
>ls()
• Remove objects
>rm(list=ls())
>ls()
• >quit() or q()
• Say “No” when asked whether to save the workspace
• We will recommend a tidier way of saving your work
• Open R again
• >ls()
• All your work is lost
How do we keep the record?
• Use a script
• It is written in the R editor
• Go to Menu
• File >> New Script
• Save it in the desired directory (e.g. H:\\R\\)
• File >> Save as
• Give it an appropriate name with the .R extension
• For example “workshop1.R”
• To run the script
• Highlight the statements and press Ctrl+R
OR
• Place the cursor on the line and click the run icon in RGui
Data Frame
x <- 1:10
y <- c(0.7, 2.1,3.2, 3.6,5.3,5.9, 7.3, 8.1, 8.9, 11)
da<-cbind(x,y)
da
is.matrix(da)
• Setting up Data Frame
da<-data.frame(da)
is.matrix(da)
da$x
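The difference between the matrix returned by cbind() and the data frame can be checked directly. A small sketch using the same x and y as above; the intermediate name m is introduced here only so that both objects can be compared side by side.

```r
x <- 1:10
y <- c(0.7, 2.1, 3.2, 3.6, 5.3, 5.9, 7.3, 8.1, 8.9, 11)

m  <- cbind(x, y)     # cbind() of two vectors gives a matrix
is.matrix(m)          # TRUE

da <- data.frame(m)   # convert the matrix to a data frame
is.matrix(da)         # FALSE
is.data.frame(da)     # TRUE
names(da)             # "x" "y"
dim(da)               # 10 rows and 2 columns
da$x                  # the $ operator extracts one column by name
```

The $ operator is used on the data frame here; on the matrix m, a column would be picked out with m[, "x"] instead.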
Plot and Graphical Parameters
Plot
plot(y~x, data=da)
Character Size
plot(y~x, data=da,cex=1:10)
# Try cex = (1:10)*0.1
Plotting Character (point symbol)
plot(y~x, data=da, pch=1:10,cex =1.5)
# Try pch = 11:20
Colours
plot(y~x, data=da, pch=11:20,cex=1.5,col=1:10)
Types of patterns
plot(y~x, data=da, pch=11:20,cex=1.5, col=1:10, type="l")
plot(y~x, data=da,pch=11:20, cex=1.5, col=1:10, type="o")
Types of lines
plot(y~x, data=da, pch=11:20, cex=1.5, col=1:10, type="o", lty=2)
# Try lty =3
Thickness of line
plot(y~x, data=da,pch=1:10, cex = 1.5, col=1:10,type="o", lty=3,lwd=2)
# Try lwd=1.25,0.75
z=LETTERS[1:10]
z
text(x,y,z)
# Try this
plot(y~x, data=da,pch=1:10,col=1:10,type="o",cex = 1.5, lty=3,lwd=2)
text(jitter(x), jitter(y), z)
# Try this
plot(y~x, data=da,pch=1:10,col=1:10,type="o", cex = 1.5, lty=3,lwd=2)
text(jitter(x, 1.5), jitter(y, 2), z)
Example of writing R functions
• Area of the circle = A = π r², where r is the radius of the circle
A <- function(r) {
  pi * r^2
}
A(5)
A(2)
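A function written this way also accepts a vector of radii, because ^ and * work element by element. The sketch below uses the hypothetical name circle_area and adds an input check that is not part of the slide's version.

```r
# Area of a circle for one radius or a vector of radii
# (circle_area and its input check are illustrative additions)
circle_area <- function(r) {
  stopifnot(is.numeric(r), all(r >= 0))  # radius must be a non-negative number
  pi * r^2
}

circle_area(5)           # 25 * pi, about 78.54
circle_area(c(1, 2, 5))  # three areas returned at once
```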
Import data to R from various sources
• If the data are in .txt form (e.g. saved in Notepad), use read.table()
E.g.: fev.txt <- read.table("H:\\R\\fev.txt", header=TRUE)
• If the data are in .xlsx form (e.g. saved in Excel), it is better to convert them to a .csv file and use read.csv()
E.g.: fev.csv <- read.csv("H:\\R\\fev.csv")
• If the data are in SPSS format, load library(foreign) and use read.spss()
• Quick reference
• http://www.statmethods.net/input/importingdata.html
View in R
head(fev.csv)
dim(fev.csv)
str(fev.csv)
Export from R to Various sources
• Use write.table() or write.csv()
write.csv(object, file = <desired path to save the object>)
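A round trip through a .csv file shows how writing and reading fit together. This is a sketch only: the data frame d holds made-up values, and tempfile() is used so the example runs without the H:\\R\\ directory.

```r
# Made-up example data (not the FEV data set)
d <- data.frame(id = 1:3, height = c(57.0, 61.5, 65.5))

path <- tempfile(fileext = ".csv")            # temporary file location
write.csv(d, file = path, row.names = FALSE)  # export without row numbers

d2 <- read.csv(path)                          # import the file again
head(d2)                                      # first rows
dim(d2)                                       # 3 rows, 2 columns
str(d2)                                       # column names and types
```

Setting row.names = FALSE avoids an extra column of row numbers appearing when the file is read back.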
Session 2
R-Commander and Descriptive Analysis
Problems faced by non-statisticians
• Typing code is new for those used to a point-and-click approach
• This leads to command line errors
• Syntax error messages are hard to understand
• It is hard to become familiar with the system
• because they are occasional users.
• The help files within R are difficult to understand
Alternatives within R
• R Commander (Rcmdr)
• Author: Prof John Fox (McMaster University)
• http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/
• iNZight
• Author: Chris Wild (University of Auckland)
• https://www.stat.auckland.ac.nz/~wild/iNZight/
• Deducer
• Author: Ian Fellows (PhD Candidate)
• Similar to but more limited than R Commander
• http://www.deducer.org/pmwiki/pmwiki.php?n=Main.DeducerManual
• RExcel
• Access to the statistics package R from within Excel
R Commander
• Similar to many point and click statistical
packages
• Allows users to create and run R code using
menus
• Useful tool for learning R and carrying out
statistical analyses
Issues with R-commander
• Cannot be used for complex situations
• Unstable
• It may freeze
• Particularly if you open a number of windows
• Managing the instability
• Close all windows except R and R Commander
• Save the script window frequently
• Option to use the Single Document Interface (SDI)
R-Packages / Library
• Some packages already come with the installation
version of R.
• Some other packages need to be downloaded.
• More than 4000 packages available.
http://cran.r-project.org/
click packages
click Table of available packages, sorted by name
• Contributed by world wide authors
• Reference manual is available.
Install Additional Packages
• Example
• Install Rcmdr package
• Go to R and click packages in the menu
• Click Install packages ….
• Select New Zealand on the CRAN mirror window, then click ok
• Select Rcmdr on the Packages window, then click ok
• library(Rcmdr)
Menu
• When you click on the menu, the script will be automatically written here.
• You can write your own script if needed, then click Submit.
• You can view the results here.
• In some situations, another window will open automatically.
• Details about the data set are shown here when you import the data.
• Error messages will be displayed here if anything goes wrong.
Import the data from other resources
• E.g.: Import the FEV data set
• Data >> Import data >> from text file…
• Enter the name of the data set – fev (you can give your own name if you wish)
• Under the field separator, tick Commas if your data set has a .csv extension.
• Click OK
• Browse to the directory where the data are stored, then double-click the file.
Data
• Obtained from
• Rosner, B. (1999), Fundamentals of Biostatistics, 5th ed., Pacific Grove, CA: Duxbury
• Based on these studies
• Tager, I., Weiss, S., Munoz, A., Rosner, B., and Speizer, F. (1983), “Longitudinal Study of
the Effects of Maternal Smoking on Pulmonary Function,” New England Journal of
Medicine, 309(12), 699-703.
• Tager, I., Weiss, S., Rosner, B., and Speizer, F. (1979), "Effect of Parental Cigarette
Smoking on the Pulmonary Function of Children," American Journal of Epidemiology,
110(1), 15-26.
• Interest of the study
• The data are part of a larger study to follow the change in pulmonary function
over time in children.
• Assessing children's pulmonary function in the absence or presence of smoking
cigarettes, as well as exposure to passive smoke from at least one parent.
Data (continued)
• Forced Expiratory Volume (FEV) is an index of pulmonary function that measures
the volume of air expelled after one second of constant effort.
• Participants
• 654 children ages 6-22
• Study Location
• Seen in the Childhood Respiratory Disease Study in 1980 in East Boston, Massachusetts.
• Variables Recorded
• ID - ID number
• Age - years
• FEV - litres
• Height - inches
• Sex - Male / Female
• Smoker - Non = Non-smoker / Current = Current smoker
(Exposure to passive smoke from at least one parent)
View Data Set
• If the data are imported successfully, you can see the following on the second line from the top
• Data set: fev (appears automatically)
• Click View data set on the same line
• Note that some functions are not available in the Rcmdr menus
• Type head(fev) in the script window and click Submit
Summary Statistics
• Statistics >> Summaries >> Active data set
       ID             Age              FEV            Height          Sex          Smoker
 Min.   :  201   Min.   : 3.000   Min.   :0.791   Min.   :46.00   Female:318   Current: 65
 1st Qu.:15811   1st Qu.: 8.000   1st Qu.:1.981   1st Qu.:57.00   Male  :336   Non    :589
 Median :36071   Median :10.000   Median :2.547   Median :61.50
 Mean   :37170   Mean   : 9.931   Mean   :2.637   Mean   :61.14
 3rd Qu.:53639   3rd Qu.:12.000   3rd Qu.:3.119   3rd Qu.:65.50
 Max.   :90001   Max.   :19.000   Max.   :5.793   Max.   :74.00
Any association between gender and smoking?
(Not a question of interest in this study; it is included for learning purposes only)
• Statistics >> Contingency tables >> Two-way table..
• Click Sex
• Click Smoker
• H0: No association between variables (Independent)
H1: There is an association between variables
Frequency table:
        Smoker
Sex      Current Non
  Female      39 279
  Male        26 310

Pearson's Chi-squared test
data: .Table
X-squared = 3.739, df = 1, p-value = 0.05316
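The same test can be run from the command line by typing the frequency table in by hand. This is a sketch: the object names tab and res are illustrative, and correct = FALSE (no continuity correction) is assumed because it is the setting that reproduces the statistic shown above.

```r
# Counts from the frequency table above; matrix() fills column by column
tab <- matrix(c(39, 26, 279, 310), nrow = 2,
              dimnames = list(Sex    = c("Female", "Male"),
                              Smoker = c("Current", "Non")))
tab

res <- chisq.test(tab, correct = FALSE)  # Pearson test without continuity correction
res$statistic   # about 3.739
res$parameter   # degrees of freedom: 1
res$p.value     # about 0.053
res$expected    # expected counts under independence
```

Since the p-value is just above 0.05, the evidence for an association between sex and smoking is borderline at the 5% level.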
Comparison between smokers and nonsmokers
• Statistics >> Summaries >> Numerical summaries..
• Click FEV
• Click Summarize by groups
• Click Smoker and OK
• OK
            mean        sd   IQR    0%   25%   50%   75%  100% data:n
Current 3.276862 0.7499863 0.956 1.694 2.795 3.169 3.751 4.872     65
Non     2.566143 0.8505215 1.128 0.791 1.920 2.465 3.048 5.793    589
Graphical Comparison
• Graphs >> Boxplot
• Click FEV
• Click Plot by groups
• Click Smoker and OK
• OK
Exercise: Draw a box plot of FEV by Sex
[Figure: boxplot, with outliers labelled]
Any impact on FEV by smoking?
• Statistics >> Means >> Independent samples t-test..
• Click Smoker in Groups Box
• Click FEV in Response Variable
• H0: µsmoker = µnon-smoker
H1: µsmoker ≠ µnon-smoker
Welch Two Sample t-test
data: FEV by Smoker
t = 7.1496, df = 83.273, p-value = 3.074e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.5130126 0.9084253
sample estimates:
mean in group Current     mean in group Non
             3.276862              2.566143
• Why different results between graphical presentation and numerical output?
• t-test is very sensitive to outliers
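The Welch test statistic can be rebuilt from the group summaries alone (mean, sd and n for each group, as reported in the numerical summaries). The sketch below does this by hand; the variable names are illustrative.

```r
# Group summaries: Current smokers and non-smokers
m1 <- 3.276862; s1 <- 0.7499863; n1 <- 65    # Current
m2 <- 2.566143; s2 <- 0.8505215; n2 <- 589   # Non

v1 <- s1^2 / n1                 # squared standard error of mean 1
v2 <- s2^2 / n2                 # squared standard error of mean 2
se_diff <- sqrt(v1 + v2)        # standard error of the difference

t_stat <- (m1 - m2) / se_diff                                  # about 7.15
df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))        # Welch df, about 83.3
p_value <- 2 * pt(-abs(t_stat), df)                            # two-sided p-value
ci <- (m1 - m2) + c(-1, 1) * qt(0.975, df) * se_diff           # about 0.513 to 0.908
```

The non-integer degrees of freedom come from the Welch-Satterthwaite approximation, which does not assume equal variances in the two groups.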
View how the outliers change
• Modify one of the previous commands in the script window by adding :Smoker to the formula, then click Submit:
Boxplot(FEV~Sex:Smoker, data=fev, id.method="y")
Histogram
Graphs >> Histogram
Relationship between FEV vs Age
Graphs >> Scatterplot
Click Options tab to modify the graph
Pairwise relationship
• Graphs >> Scatterplot matrix
FEV versus all covariates
Graphs >> XY Conditioning Plot
Session 3
Statistical Modelling
Simple Linear Regression
Simple Linear Regression
• What do we mean by Simple Linear Regression?
• Simple: Only two variables, x and y
[one response variable y and one explanatory variable x]
• Linear: The relationship between x and y is a straight line, not a curve.
[mean of y = µy = β0 + β1 x, where β0 is the intercept and β1 is the slope]
• Regression: The value of y can be predicted for given x values.
Model
• Variation in y values is expected
• We cannot observe exact y value for fixed value of x
• Need to add some unexplained variable ε
• Known as error or residual
• Model:
observed y = mean of y + error
y = µy + ε
y = β0 + β1x + ε
Population (parameters): y = β0 + β1 x + ε
Sample (statistics / estimates): ŷ = β̂0 + β̂1 x
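The link between population parameters and sample estimates can be seen by simulating data from a known model and fitting it. In this sketch the values beta0 = 1, beta1 = 0.5, the error sd of 1 and the seed are all arbitrary choices made for illustration.

```r
set.seed(1)                         # arbitrary seed so the run is repeatable
n     <- 200
x     <- runif(n, 0, 10)            # explanatory variable
beta0 <- 1                          # hypothetical population intercept
beta1 <- 0.5                        # hypothetical population slope
eps   <- rnorm(n, mean = 0, sd = 1) # error term, N(0, 1)

y   <- beta0 + beta1 * x + eps      # observed y = mean of y + error
fit <- lm(y ~ x)                    # fit the simple linear regression
coef(fit)                           # estimates of beta0 and beta1
```

The estimates will be close to, but not exactly equal to, the values used to generate the data; a different seed gives slightly different estimates.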
Assumptions
(i) Independence
• The error terms are independent of each other.
• This implies, equivalently, that the y-values are independent.
• It can be checked graphically by plotting residuals versus fitted values.
• Warning
• If independence is violated, the estimates are no longer optimal.
(ii) Constant variance
• The variance of the errors is the same regardless of x.
• It can be checked graphically by plotting residuals versus fitted values.
• Warning
• Non-constant error variance does not bias the estimates, but does affect efficiency. [The standard errors are usually inflated]
(iii) Normally distributed
• Errors are normally distributed with mean 0 and variance σ².
ε ~ N(0, σ²)
• This implies that y ~ N(β0 + β1x, σ²)
• This assumption is needed only for hypothesis testing, not for the estimation procedure.
• It can be checked graphically using a Q-Q plot.
(iv) Variable x is fixed
• In observational studies, x is hardly ever fixed.
• It is assumed that the measurement error in x is very small.
Modeling
Four steps
• Explore/inspect the data
Draw plots and make tables (already done)
• Specify the model
Define the structure of the model with unexplained error
• Fit the model
Find the model closest to the observed data
• Subject the model to criticism
Check the assumptions
Examples of models in R
• lm - linear model (regression) [Normal data]
• aov - anova [Normal data]
• glm - generalised linear model [Non-normal data]
• gam - generalised additive model [Compromise between parametric and non-parametric methods]
• rpart - tree
Examples
lm(y~x) [regress y on x]
aov(y~sex) [regress y on factors]
lm(y~x+sex) [regress y on main effects x and sex]
lm(y~x*sex) [regress y on main effects and interaction]
Extract the information from R output
• Create a model
model1 <- lm(y ~ x + sex)
• Extract the information
resid(model1) - gives the estimated errors (residuals)
fitted(model1) - gives the estimated y values
plot(model1) - gives plots to test the assumptions
summary(model1) - summary of model1
• Other useful commands: print / coef / predict / plot / update
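The extractor functions can be tried on a small simulated data set. Everything about the data below (sample size, coefficients, seed) is made up for illustration; only the function calls mirror the commands listed above. The name model2 is also an illustrative addition.

```r
set.seed(2)                                   # arbitrary seed
n   <- 100
x   <- rnorm(n, mean = 10, sd = 2)
sex <- factor(sample(c("Female", "Male"), n, replace = TRUE))
y   <- 2 + 0.3 * x + 0.5 * (sex == "Male") + rnorm(n, sd = 0.4)

model1 <- lm(y ~ x + sex)

head(resid(model1))    # estimated errors (residuals)
head(fitted(model1))   # estimated y values
coef(model1)           # intercept, slope for x, shift for males
summary(model1)        # full summary table

# Predicted y for a male with x = 10
new_obs <- data.frame(x = 10, sex = factor("Male", levels = levels(sex)))
predict(model1, newdata = new_obs)

# update() refits the model with a term removed
model2 <- update(model1, . ~ . - sex)
```

For every observation, the residual plus the fitted value gives back the observed y, which is a quick way to check what resid() and fitted() return.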
The relationship between FEV and Age
Model 1
FEV= β0 + β1 Age + ε
where β0 and β1 are model parameters to be estimated.
Assumptions about error ε:
1. Independent
2. Constant Variance
3. Normal distribution
Fitting Model
Statistics >> Fit models >> Linear model
• Click OK
Interpretation of the model
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.431648   0.077895   5.541 4.36e-08 ***
Age         0.222041   0.007518  29.533  < 2e-16 ***
• Fitted Model
FÊV = 0.4316 + 0.2220 Age
• For each one-year increase in age, FEV increases on average by 0.222 litres.
• Is this result true for all ages?
• Confidence intervals for the estimates:
Models >> Confidence Intervals
             Estimate     2.5 %    97.5 %
(Intercept) 0.4316481 0.2786920 0.5846042
Age         0.2220410 0.2072777 0.2368043
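The confidence interval for the Age slope can be reproduced by hand from the estimate, its standard error and the residual degrees of freedom (652, since n = 654 and two coefficients are estimated). A sketch with illustrative variable names:

```r
est <- 0.222041    # estimated slope for Age
se  <- 0.007518    # its standard error
df  <- 652         # residual degrees of freedom (654 - 2)

t_crit <- qt(0.975, df)             # critical value for a 95% interval
ci <- est + c(-1, 1) * t_crit * se  # about 0.2073 to 0.2368
ci

est / se                            # t value for the slope, about 29.53
```

Small differences in the last digits compared with the R Commander output come from the rounded inputs used here.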
[Figure: FEV (vertical axis) plotted against Age (horizontal axis), marking for one observation the deviation from the mean, the deviation reduced by regression, and the error; Mean = 2.637]
ŷ - fitted value
ȳ - overall mean value
Deviation from mean = y − ȳ
Deviation reduced by regression = ŷ − ȳ
Error = y − ŷ
SST = ∑ (y − ȳ)²
SSR = ∑ (ŷ − ȳ)²
SSE = ∑ (y − ŷ)²
SST = SSR + SSE
ANOVA Table
Sources of variation   Degrees of freedom   Sum of squares   Mean sum of squares   F ratio
Regression             1                    SSR              MSR = SSR/1           MSR/MSE
Error                  n-2                  SSE              MSE = SSE/(n-2)
Total                  n-1                  SST
H0: The regression model has no predictive capability
(that is, all population regression coefficients are 0 simultaneously)
H1: At least one coefficient is not equal to 0
• R² = SSR/SST
• For example, R² = 80% means that 80% of the total variation is explained by the regression
ANOVA Table in R commander
• Models >> Hypothesis Test >> ANOVA table
Anova Table (Type II tests)
Response: FEV
          Sum Sq  Df F value    Pr(>F)
Age       280.92   1  872.18 < 2.2e-16 ***
Residuals 210.00 652
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
Call:
lm(formula = FEV ~ Age, data = fev)

Residuals:
     Min       1Q   Median       3Q      Max
-1.57539 -0.34567 -0.04989  0.32124  2.12786

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.431648   0.077895   5.541 4.36e-08 ***
Age         0.222041   0.007518  29.533  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1

Residual standard error: 0.5675 on 652 degrees of freedom
Multiple R-squared: 0.5722, Adjusted R-squared: 0.5716
F-statistic: 872.2 on 1 and 652 DF, p-value: < 2.2e-16

• Test statistic for slope = 0.222041 / 0.007518 = 29.533
• Residual standard error = √MSE
• Multiple R-squared is R²
• The F-statistic is the F ratio from the ANOVA table, = 29.533²
• Note that, in the simple linear regression case, F-statistic = (t value)² for Age.
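The quantities in the model summary are tied together by the ANOVA sums of squares. The sketch below recomputes them from the two sums of squares reported for the FEV on Age model (280.92 for Age and 210.00 for residuals, with n = 654); the variable names are illustrative.

```r
SSR <- 280.92          # sum of squares explained by Age
SSE <- 210.00          # residual sum of squares
n   <- 654             # number of children

SST <- SSR + SSE       # total sum of squares
R2  <- SSR / SST       # about 0.5722
MSR <- SSR / 1         # one regression degree of freedom
MSE <- SSE / (n - 2)   # 652 residual degrees of freedom
F_ratio <- MSR / MSE   # about 872.2

sqrt(MSE)              # residual standard error, about 0.5675
sqrt(F_ratio)          # about 29.53, the t value for Age
sqrt(R2)               # about 0.756, the correlation between FEV and Age
```

Each value agrees with the R Commander output up to rounding of the inputs.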
Interpretation of R²
• R² = 0.5722
• 57% of the variation in FEV is explained by Age
• The correlation between FEV and Age is 0.7564 (= √0.5722)
• In Rcmdr
Statistics >> Summaries >> Correlation Matrix
         Age      FEV
Age 1.000000 0.756459
FEV 0.756459 1.000000
Testing Assumptions
• Models >> Graphs >> Basic diagnostic
plots
• Use the Normal Q-Q plot to assess the normality assumption
• Use the Residuals vs Fitted plot to examine the independence assumption
• Use either the Residuals vs Fitted plot or the standardized residuals vs fitted values plot to examine the constant variance assumption.
• Use the lower right panel plot to identify outliers or influential points.
• Questions
• Is height associated with FEV?
• Is the relationship the same for the male and
female?
• Any impact on FEV by the interaction of smoking
and age?
• Answer
• Not sure
• Need further analysis to confirm this
• What Analysis?
• Possible to fit regression models for FEV on
Height, FEV on Sex (Ex)
• Multiple Regression
Session 4
Multiple Linear Regression
Multiple Regression
• Response variable is still continuous
• Same assumptions of simple linear regression
• Differences between simple and multiple linear regression:
• More than one independent variable
• Explanatory (independent) variables may be either continuous or factors
• Interaction between explanatory variables may be possible
• Relationships among covariates may be possible – multicollinearity
• It is important to identify the minimum adequate model (MAM)
Fitting main effects and interaction model
• Statistics >> Fit models >> Linear model
• Click OK

Call:
lm(formula = FEV ~ Age * Smoker + Sex + Height, data = fev)

Residuals:
    Min      1Q  Median      3Q     Max
-1.3615 -0.2519  0.0117  0.2492  1.9112

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)       -4.24138    0.43447  -9.762  < 2e-16 ***
Age                0.04907    0.02208   2.223   0.0266 *
Smoker[T.Non]     -0.17401    0.32234  -0.540   0.5895
Sex[T.Male]        0.15974    0.03337   4.787  2.1e-06 ***
Height             0.10296    0.00499  20.634  < 2e-16 ***
Age:Smoker[T.Non]  0.01980    0.02401   0.825   0.4099
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4123 on 648 degrees of freedom
Multiple R-squared: 0.7756, Adjusted R-squared: 0.7739
F-statistic: 447.9 on 5 and 648 DF, p-value: < 2.2e-16
• Type drop1(m4.int, test="F") and click Submit
Single term deletions
Model:
FEV ~ Age * Smoker + Sex + Height
           Df Sum of Sq    RSS      AIC  F value    Pr(>F)
<none>                  110.16 -1152.86
Sex         1     3.896 114.06 -1132.14  22.9158 2.099e-06 ***
Height      1    72.380 182.54  -824.58 425.7473 < 2.2e-16 ***
Age:Smoker  1     0.116 110.28 -1154.18   0.6799    0.4099
Fitting main effects model
• Statistics >> Fit models >> Linear model
• Click OK
• Type drop1(m4, test="F") and click Submit
Single term deletions
Model:
FEV ~ Age + Smoker + Sex + Height
        Df Sum of Sq    RSS      AIC F value    Pr(>F)
<none>               110.28 -1154.18
Age      1     8.099 118.38 -1109.83  47.666 1.206e-11 ***
Smoker   1     0.368 110.65 -1154.00   2.168    0.1414
Sex      1     3.803 114.08 -1134.00  22.383 2.743e-06 ***
Height   1    81.505 191.78  -794.28 479.660 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Remove ‘Smoker’ from the model and fit
• Statistics >> Fit models >> Linear model
• Click OK
• Type drop1(m3, test="F") and click Submit
Single term deletions
Model:
FEV ~ Age + Sex + Height
        Df Sum of Sq    RSS      AIC F value    Pr(>F)
<none>               110.65 -1154.00
Age      1     7.793 118.44 -1111.49  45.779 2.957e-11 ***
Sex      1     4.027 114.67 -1132.62  23.656 1.446e-06 ***
Height   1    82.287 192.94  -792.37 483.394 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
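The sequence of models m4.int, m4 and m3 can also be written as a script instead of using the menus. The FEV data file is not available here, so the sketch below runs on a simulated stand-in called fev_sim; its values, the seed and the coefficients used to generate it are made up, and only the model formulas and the drop1() calls follow the steps above.

```r
set.seed(3)                                   # arbitrary seed
n <- 300
# Simulated stand-in for the fev data set (illustrative values only)
fev_sim <- data.frame(
  Age    = sample(3:19, n, replace = TRUE),
  Sex    = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  Smoker = factor(sample(c("Current", "Non"), n, replace = TRUE,
                         prob = c(0.1, 0.9)))
)
fev_sim$Height <- 46 + 1.5 * fev_sim$Age + rnorm(n, sd = 3)
fev_sim$FEV    <- -4.4 + 0.06 * fev_sim$Age + 0.16 * (fev_sim$Sex == "Male") +
  0.10 * fev_sim$Height + rnorm(n, sd = 0.4)

# Step 1: main effects plus the Age by Smoker interaction
m4.int <- lm(FEV ~ Age * Smoker + Sex + Height, data = fev_sim)
drop1(m4.int, test = "F")

# Step 2: remove the interaction, keeping all main effects
m4 <- update(m4.int, . ~ . - Age:Smoker)
drop1(m4, test = "F")

# Step 3: remove Smoker
m3 <- update(m4, . ~ . - Smoker)
drop1(m3, test = "F")
```

With real data, each term would be removed only if its F test in the preceding drop1() table is not significant; here the same sequence is simply followed to show the commands.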
Assess the model
Models >> Graphs >> Basic diagnostic plots
Interpretation
Call:
lm(formula = FEV ~ Age + Sex + Height, data = fev)

Residuals:
     Min       1Q   Median       3Q      Max
-1.37613 -0.24834  0.01051  0.25748  1.94538

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.448560   0.222966 -19.952  < 2e-16 ***
Age          0.061364   0.009069   6.766 2.96e-11 ***
Sex[T.Male]  0.161112   0.033125   4.864 1.45e-06 ***
Height       0.104560   0.004756  21.986  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4126 on 650 degrees of freedom
Multiple R-squared: 0.7746, Adjusted R-squared: 0.7736
F-statistic: 744.6 on 3 and 650 DF, p-value: < 2.2e-16

• About 77% of the variation in FEV is explained by the regression model
• There is no evidence of a relationship between FEV and smoking or second-hand smoke.
• FEV is significantly higher for males than for females
• Female Model:
FÊV = -4.4486 + 0.0614 Age + 0.1046 Height
• Male Model:
FÊV = -4.4486 + 0.0614 Age + 0.1611 + 0.1046 Height
= -4.2875 + 0.0614 Age + 0.1046 Height
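The two fitted equations can be wrapped in a small function to obtain predicted FEV for a given age, height and sex. This is a sketch using the coefficient estimates from the output above; the function name predict_fev and the example child (age 10, height 60 inches) are illustrative.

```r
# Coefficient estimates from the model FEV ~ Age + Sex + Height
b0       <- -4.448560   # intercept
b_age    <-  0.061364   # per year of age
b_male   <-  0.161112   # shift for males
b_height <-  0.104560   # per inch of height

# male = 1 for males, 0 for females (predict_fev is a hypothetical helper)
predict_fev <- function(age, height, male) {
  b0 + b_age * age + b_male * male + b_height * height
}

predict_fev(age = 10, height = 60, male = 0)  # female: about 2.44 litres
predict_fev(age = 10, height = 60, male = 1)  # male:   about 2.60 litres
```

The difference between the two predictions is always 0.1611 litres, because the model has no interaction between sex and the other variables.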
Summary
Modelling flow chart – after
• Explore data
• Specify & fit the model
• Everything significant?
  No: Variable selection
  Yes: Model validation
• Model validation – check assumptions, e.g. independence, homogeneity, normality, no influential observations
  OK: Model interpretation
  Not OK: Try alternatives, e.g. GLM; GAM; mixed models: GLMM; GAMM
Statistical models
R and R commander
• R is a language and powerful tool for statistical computing and
graphics
• R cannot be learned overnight.
• But
• There are many R courses online nowadays
• Google search gives answers for your questions
• R Commander provides a bridge to learning the full power of R.
• R Commander works very well for basic statistical modelling
• Some infrequent users find that R Commander meets their limited
needs.
Thank You