HS853 Oct 10, 2016
Hands-on Experiment (Do-It-Yourself)
(1) Choose/Load your data. You can consider some of the case-studies:
https://umich.instructure.com/courses/38100/files/folder/Case_Studies
a. E.g., the Cancer (HDP) dataset: hdp <- read.csv("http://www.ats.ucla.edu/stat/data/hdp.csv"). This includes the
following variables:
tumorsize, co2, pain, wound, mobility, ntumors, nmorphine, remission, lungcapacity, Age,
Married, FamilyHx, SmokingHx, Sex, CancerStage, LengthofStay, WBC, RBC, BMI, IL6, CRP,
DID, Experience, School, Lawsuits, HID, Medicaid
(2) Select Features – for your specific dataset
a. Outcome (e.g., remission),
b. fixed (e.g., SmokingHx , IL6) effects, and
c. random (e.g., Doctor ID = DID, Hospital ID = HID) effects
(3) Some Exploratory Data Analytics/Plots (these need library(ggplot2), and library(reshape2) for melt):
a. box-plots, e.g., ggplot(data = hdp, aes(x=variable, y=value)) + geom_boxplot(aes(fill=DID)),
where variable and value stand for the columns you chose in step (2)
b. violin plots, e.g.,
tmp <- melt(hdp[, c("CancerStage", "IL6", "CRP")], id.vars="CancerStage")
ggplot(tmp, aes(x = CancerStage, y = value)) +
  geom_jitter(alpha = .1) +
  geom_violin(alpha = .75) +
  facet_grid(variable ~ .) +
  scale_y_sqrt()
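To see what the melt() call above does to the data, here is a minimal base-R sketch of the same wide-to-long reshaping. The data frame toy and its values are made up for illustration (they are not taken from hdp), and the hand-built object long only mimics the column layout that reshape2::melt() is expected to return.

```r
# Toy data: hypothetical values, for illustration only (not from hdp)
toy <- data.frame(
  CancerStage = c("I", "II", "III"),
  IL6 = c(3.1, 4.2, 5.0),
  CRP = c(1.5, 2.4, 6.3)
)

# Columns to stack; CancerStage plays the role of id.vars
measure_vars <- c("IL6", "CRP")

# Long format: one row per (CancerStage, variable) pair
long <- data.frame(
  CancerStage = rep(toy$CancerStage, times = length(measure_vars)),
  variable    = factor(rep(measure_vars, each = nrow(toy)), levels = measure_vars),
  value       = unlist(toy[measure_vars], use.names = FALSE)
)
print(long)
```

Each row of long holds a single measurement, which is why facet_grid(variable ~ .) can draw one panel per measured variable.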
(4) Fit some Models
a. Linear (fixed): e.g., mod_lin <- lm(as.numeric(CancerStage) ~ x) # linear
b. Log (fixed): e.g., mod_log <- lm(y ~ log(x)) # logarithmic
c. Mixed (linear): e.g., glmer(remission ~ IL6 + CRP + CancerStage + LengthofStay + Experience
+ (1|DID) + (1|HID), data = hdp, family = binomial) # random effects for Doctor and Hospital IDs
d. Mixed (cubic): e.g., glmer(remission ~ IL6 + I(IL6^3) + CRP + Experience + (1|DID) + (1 +
CRP|HID), data = hdp, family = binomial) # remission is binary, hence family = binomial
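For reference, the random-intercept model in (4c) could be written out as below. This is a sketch under assumptions not stated in the handout: remission is treated as a binary outcome with a logit link, i indexes patients, j doctors (DID), k hospitals (HID), and CancerStage enters through dummy indicators with coefficient vector beta_3.

```latex
\[
\operatorname{logit}\,\Pr(\text{remission}_{ijk}=1)
  = \beta_0 + \beta_1\,\text{IL6}_{ijk} + \beta_2\,\text{CRP}_{ijk}
  + \boldsymbol{\beta}_3^{\top}\,\text{CancerStage}_{ijk}
  + \beta_4\,\text{LengthofStay}_{ijk} + \beta_5\,\text{Experience}_{j}
  + u_j + v_k,
\]
\[
u_j \sim N(0,\sigma_u^2) \ \text{(doctor intercepts)}, \qquad
v_k \sim N(0,\sigma_v^2) \ \text{(hospital intercepts)}.
\]
```

The terms (1|DID) and (1|HID) in the formula correspond to u_j and v_k; the (1 + CRP|HID) term in (4d) would additionally let the CRP slope vary by hospital.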
(5) Model Assessment
a. Graphical: e.g.,
prd <- data.frame(x = seq(140, 290, by = 5))
result <- prd # get the result object initialized
result$mod_lin <- predict(mod_lin, newdata = prd)
result$mod_log <- predict(mod_log, newdata = prd)
result <- melt(result, id.vars="x", variable.name="model", value.name="fitted")
ggplot(result, aes(x = x, y = fitted)) +
theme_bw() +
geom_point(data = as.data.frame(cbind(x,y)), aes(x = x, y = y)) +
geom_line(aes(colour = model), size = 1)
b. Quantitative
lmer.model.0 <- lmer(Weight ~ Height + (1|Team) + (1|Position), data=data, REML=FALSE)
lmer.model.1 <- lmer(Weight ~ Height + Age + (1|Team) + (1|Position), data=data, REML=FALSE)
anova(lmer.model.0, lmer.model.1)
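The quantities that anova() reports for two nested models fitted by maximum likelihood (REML=FALSE) can be summarised as follows. The symbols are introduced here for illustration: l_0 and l_1 are the maximised log-likelihoods of the smaller and larger model, p_0 and p_1 their numbers of estimated parameters, and n the number of observations.

```latex
\[
\chi^2 = 2\,(\ell_1 - \ell_0), \qquad \mathrm{df} = p_1 - p_0,
\]
\[
\mathrm{AIC}_m = -2\,\ell_m + 2\,p_m, \qquad
\mathrm{BIC}_m = -2\,\ell_m + p_m \log n, \qquad m \in \{0, 1\}.
\]
```

Under the null hypothesis that the extra term (here Age) has no effect, the statistic is compared with a chi-square distribution with df degrees of freedom; lower AIC or BIC favours a model.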
Specific Example using the HDP Cancer Dataset
library(reshape2)
library(ggplot2)
library("lme4")
hdp <- read.csv("http://www.ats.ucla.edu/stat/data/hdp.csv")
hdp <- within(hdp, {
Married <- factor(Married, levels = 0:1, labels = c("no", "yes"))
DID <- factor(DID)
HID <- factor(HID)
})
# get the range of IL6
attach(hdp); summary(IL6)
x <- IL6
y <- tumorsize # try with various other models, outcomes & predictors (see table below)
# The mixed models need every covariate and grouping factor, so all models are
# evaluated at the observed IL6 values rather than on a separate grid of x values
prd <- data.frame(x = IL6) # one row per patient; length(IL6) == 8525
result <- prd # get the result object initialized
mod_lin <- lm(tumorsize ~ IL6, data = hdp) # linear
mod_log <- lm(tumorsize ~ log(IL6), data = hdp) # logarithmic
# tumorsize is continuous (Gaussian): fit with lmer(); calling glmer() with the default Gaussian family is deprecated
mod_glmer <- lmer(tumorsize ~ IL6 + CRP + CancerStage + LengthofStay + Experience + (1|DID) + (1|HID), data = hdp)
mod_glmer3 <- lmer(tumorsize ~ IL6 + I(IL6^3) + CRP + Experience + (1|DID) + (1 + CRP|HID), data = hdp)
# fitted values at the observed data (for the mixed models these include the random effects)
result$mod_lin <- predict(mod_lin)
result$mod_log <- predict(mod_log)
result$mod_glmer <- predict(mod_glmer)
result$mod_glmer3 <- predict(mod_glmer3)
head(result)
# Transform Result Obj: Wide to Long (for x=IL6)
result <- melt(result, id.vars="x", variable.name="model", value.name="fitted")
head(result)
myGraphicsObject <- ggplot(result, aes(x = x, y = fitted))
myGraphicsObject <- myGraphicsObject + theme_bw()
myGraphicsObject <- myGraphicsObject + geom_point(data = as.data.frame(cbind(x,y)), aes(x = x, y = y))
myGraphicsObject <- myGraphicsObject + geom_line(aes(colour=model), size=2)
myGraphicsObject # display the actual plot
Note that to overlay only ONE model on the data, you can subset the result object to the
model(s) of interest before plotting (in this case only the linear model, mod_lin):
myGraphicsObject <- ggplot(result[grep("mod_lin", result$model, ignore.case=TRUE),], aes(x = x, y = fitted))
myGraphicsObject <- myGraphicsObject + theme_bw()
myGraphicsObject <- myGraphicsObject + geom_point(data = as.data.frame(cbind(x,y)), aes(x = x, y = y))
myGraphicsObject <- myGraphicsObject + geom_line(aes(colour=model), size=2)
myGraphicsObject # display the actual plot
Examine the statistical significance of IL6 as a (fixed effect) predictor of tumorsize, controlling for the
random effects of Hospital and Doctor and for the fixed effects of CRP, CancerStage, LengthofStay, and
Experience:
# Gaussian outcome, so fit with lmer(); anova() refits both models with ML before comparing them
lmer.model.IL6 <- lmer(tumorsize ~ IL6 + CRP + CancerStage + LengthofStay + Experience
+ (1|DID) + (1|HID), data = hdp)
lmer.model.noIL6 <- lmer(tumorsize ~ CRP + CancerStage + LengthofStay + Experience +
(1|DID) + (1|HID), data = hdp)
anova(lmer.model.IL6, lmer.model.noIL6)
                  Df   AIC   BIC logLik deviance Chisq Chi Df Pr(>Chisq)
lmer.model.noIL6  10 65270 65341 -32625    65250
lmer.model.IL6    11 65272 65349 -32625    65250 0.334      1     0.5633
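As a quick sanity check on the table above, the p-value and the AIC column can be recomputed by hand in base R. The numbers are copied from the printed output; the variable names (chisq, chi_df, dev, n_par) are hypothetical helpers, and the AIC relation used here, deviance + 2 x Df, assumes that the deviance shown equals -2 x logLik.

```r
# Values copied from the anova() output above
chisq  <- 0.334   # likelihood-ratio statistic (Chisq column)
chi_df <- 1       # difference in number of parameters (11 - 10)

# Upper-tail chi-square probability; should agree with Pr(>Chisq) up to rounding
p_value <- pchisq(chisq, df = chi_df, lower.tail = FALSE)
print(round(p_value, 4))

# AIC = deviance + 2 * (number of parameters), using the rounded deviance
dev   <- 65250
n_par <- c(noIL6 = 10, IL6 = 11)
aic   <- dev + 2 * n_par
print(aic)
```

The large p-value gives no evidence that adding IL6 improves the fit once CRP, CancerStage, LengthofStay, Experience, and the Doctor and Hospital random effects are in the model, which matches the slightly higher AIC and BIC of the model with IL6.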
Many alternative models/studies of the HDP dataset can be set up using the summary
below.
Variable                                      Type
Tumor size                                    Continuous, Gaussian (correlated with CO2, multivariate multilevel candidate)
CO2 (percents)                                Continuous, Gaussian (correlated with Tumor size, multivariate multilevel candidate)
Pain                                          Continuous, Gaussian cut to be integer ranging from 1 to 10
Wound                                         Continuous, Gaussian cut to be integer ranging from 1 to 10
Mobility                                      Continuous, Gaussian cut to be integer ranging from 1 to 10
Number of tumors (right censored)             Count, Poisson, right censored at 9
Number of self-administered morphine doses    Count, Poisson, zero-inflated
Cancer in remission                           Dichotomized normal (TRUE/FALSE)
Proportion of optimal lung capacity           Continuous, Beta
L1 Predictors
L2 Predictors
L3 Predictors
Age
FamilyHx
LengthofStay (random by DID)
SmokingHx
CancerStage
SmokingHx x Age
DID
DID
Age
FamilyHx
LengthofStay (random by DID)
SmokingHx
CancerStage
SmokingHx x Age
Sex
IL6
CRP
IL6 x CRP
DID
DID
Experience
HID
Married
Sex
RBC (random by DID)
Sex x Married
Age
BMI
DID
Experience
School
Lawsuits
HID
Medicaid
Age
FamilyHx
LengthofStay
BMI
IL6
CRP
SmokingHx
CancerStage
BMI x FamilyHx
Experience x CancerStage
DID
Experience x CancerStage
HID
Zero-inflation (binomial model)
Zero-inflation (binomial model)
Age
SmokingHx
CancerStage
DID
Poisson model
DID
Age
LengthofStay (random by DID)
FamilyHx
IL6
CRP
CancerStage
DID
Experience
RBC
IL6 x CRP
SmokingHx x FamilyHx
DID
Experience
School
Poisson model
Age
Sex
LengthofStay
BMI (random by DID)
CancerStage
HID