REM Sleep in Mammals

James Bernhard
Math 260
In this paper, we develop a method to predict the proportion of REM sleep in
mammals, based on their body mass, brain mass, basal metabolic rate, gestation period,
and sleep exposure index.
In the first section, we give some background and
motivation. In the next section, we explain our analysis of the data, and in the last
section, we state the conclusions that the analysis led us to and assess the analysis.
Background
In 2006, Lesku et al. [1] investigated relationships among average times for
various types of sleep in mammals. They extended existing approaches for understanding
how slow-wave sleep (SWS) and REM sleep times vary among mammals by using path analysis to
examine how anatomy, physiology, and ecology influence sleep patterns.
In reading their article, we thought that it would be interesting to use their data
to develop a method to predict a mammal’s proportion of REM sleep from physical and
ecological variables. We were more interested in predicting the mean over all mammals
with given characteristics, rather than predicting the mean for a particular mammal
species, so we decided to compute 95% confidence intervals for mean proportion REM
sleep rather than computing prediction intervals for individual species. However, our
method will easily allow us to compute prediction intervals as well.
Based on information in the article, we decided to use the following variables to
predict the proportion of REM sleep:
Variable                  Units                         Abbreviation
proportion of REM sleep   none                          REM
body mass                 g                             BoM
brain mass                g                             BrM
basal metabolic rate      cubic cm of oxygen per hour   BMR
gestation period          days                          GP
sleep exposure index      points                        SEI
The original data set consisted of 24 mammal species, but one of them (the
golden-mantled ground squirrel) was missing data. We omitted it, which removed about
4% of the species and left 23 mammal species for our analysis.
Analysis
To analyze the data, we used R [2] along with the car package [3] to carry out the
computations and generate the graphics.
By considering the context of the data along with some exploratory data analysis,
we decided that log transformations would be appropriate for all of the explanatory
variables. For lack of any particular preference, we used natural logarithms for them
all.
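The choice of base does not affect predictions: switching from natural to base-10 logarithms only rescales each slope by ln(10). The following minimal Python sketch illustrates that identity. The analysis itself was carried out in R, and the slope value and function names below are illustrative assumptions rather than quantities from the fitted model.

```python
import math

def term_natural(slope, x):
    """Contribution of one predictor when the model uses ln(x)."""
    return slope * math.log(x)

def term_base10(slope, x):
    """Same contribution, rewritten for a model that uses log10(x)."""
    # ln(x) = ln(10) * log10(x), so the slope is rescaled by ln(10).
    return (slope * math.log(10)) * math.log10(x)

slope = 0.5          # hypothetical slope, for illustration only
body_mass_g = 99.0   # a body mass in grams
print(term_natural(slope, body_mass_g), term_base10(slope, body_mass_g))
```

The two printed values agree up to floating-point rounding, so the fitted values, residuals, and intervals would be the same under either base.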
After transformations, we had the following scatterplot matrix for our collection
of variables.
We notice some heavy collinearity between several pairs of the explanatory
variables, particularly among body mass, brain mass, and basal metabolic rate.
However, this is to be expected, and we are not planning to interpret individual
coefficients in the fitted model equation but are instead planning to use the fitted model
for predictions, so this is not a problem.
We don’t notice any other particular patterns or outliers needing investigation,
so we continue with the analysis.
For the next step, we fit the model of type:
REM ~ log(BoM) + log(BrM) + log(BMR) + log(GP) + log(SEI).
Because all of the explanatory variables are numerical and we have no particular reason
to suspect interaction terms will be necessary, we have not included any. We arrived at
this model by consideration of the context of the data and some basic exploratory data
analysis.
The fitted model equation is:
μ[REM|BoM, BrM, BMR, GP, SEI] = 0.35 + 0.02 log(BoM) + 0.03 log(BrM) – 0.02 log(BMR) – 0.06 log(GP) – 0.02 log(SEI).
We have omitted units in this equation because of the log transformations. Also note
that some of the coefficients have the opposite signs from what we might expect (the
negative ones), and this is probably due to multicollinearity.
As another check for multicollinearity, we can compute the variance inflation
factors for the explanatory variables. We find they are:
Variable    VIF
log(BoM)    47.5
log(BrM)    9.6
log(BMR)    34.8
log(GP)     3.30
log(SEI)    4.51
As we can see, several of these are near or above 10, which also indicates
probable multicollinearity among those variables. Fortunately, this is not an issue for
our purposes.
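To make these values more concrete, recall that the variance inflation factor of a predictor is 1/(1 - R²), where R² comes from regressing that predictor on all of the other predictors. The Python sketch below (function names are hypothetical) inverts this definition to show how much of each explanatory variable's variability is accounted for by the others, using the VIFs reported above.

```python
def vif_from_r_squared(r_squared):
    """VIF of a predictor, given R^2 from regressing it on the other predictors."""
    return 1.0 / (1.0 - r_squared)

def r_squared_from_vif(vif):
    """Invert the definition: share of a predictor's variance explained by the others."""
    return 1.0 - 1.0 / vif

reported_vifs = {"log(BoM)": 47.5, "log(BrM)": 9.6, "log(BMR)": 34.8,
                 "log(GP)": 3.30, "log(SEI)": 4.51}
for name, vif in reported_vifs.items():
    print(f"{name}: VIF = {vif}, implied R^2 = {r_squared_from_vif(vif):.3f}")
```

For example, a VIF of 47.5 corresponds to an R² of about 0.979, so nearly all of the variability in log(BoM) is accounted for by the other four explanatory variables.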
In order to check whether or not statistical inference (including the confidence
intervals that we will compute) will be valid for this model, we next check the sampling
variability assumptions.
To check the assumption of the normality of the conditional error terms, we
investigate a normal quantile plot of the standardized residuals for the fitted model.
In this plot, we see nothing troublesome. All of the points lie within the 95%
confidence bands, so there is no indication that the conditional error terms depart from
normality.
To assess the remaining sampling variability assumptions, we examine a
standardized residuals versus fitted plot.
In this plot, we see no signs that the conditional error term has nonconstant
variance, so the assumption that it has equal variance seems to hold. Also, we see no
patterns indicating a lack of independence among conditional error terms, so the
assumption of independence of the conditional error terms seems to hold. (Also, there
is no particular other order suggested by the context of the data that we should check
for a lack of independence.) In addition, the means of the conditional error terms don’t
seem to depart from 0, so that assumption appears satisfied.
We don’t notice any outliers in this plot. All 23 residuals appear to be within
about 2 standard deviations of the mean, as we might expect if the sampling variability
assumptions hold.
Since there don’t appear to be any major departures from the sampling
variability assumptions, statistical inference appears to be valid, and we can use our
model to make predictions. We give some illustrative examples of predictions of the mean
proportion of REM sleep among mammals having the specified characteristics in the
following table:
BoM (g)   BrM (g)   BMR (cu. cm oxygen/hr)   GP (days)   SEI (points)   μ[REM]
99        3         100                      27          1              0.184 (95% CI: 0.158-0.210)
2980      110       1635                     200         5              0.168 (95% CI: 0.132-0.204)
Note that the last column has no units because REM is a proportion. Also note that the
95% confidence interval is for the conditional mean of REM over all mammals with the
specified values for the explanatory variables.
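One way to read these intervals is to back out the standard error of the estimated mean. The Python sketch below assumes the intervals are the usual symmetric t intervals for a mean response, with 23 - 6 = 17 residual degrees of freedom (five slopes plus an intercept); this paper does not state the computation explicitly, so treat this as an assumption. The critical value 2.110 is the standard table value for the 0.975 quantile of the t distribution with 17 degrees of freedom, and the function name is hypothetical.

```python
T_CRIT_17_DF = 2.110  # 0.975 quantile of Student's t with 17 df (standard table value)

def ci_summary(lower, upper, t_crit=T_CRIT_17_DF):
    """Return (center, half_width, implied_standard_error) of a symmetric t interval."""
    center = (lower + upper) / 2
    half_width = (upper - lower) / 2
    return center, half_width, half_width / t_crit

# The two 95% confidence intervals reported in the table above.
for lower, upper in [(0.158, 0.210), (0.132, 0.204)]:
    center, half_width, se = ci_summary(lower, upper)
    print(f"center={center:.3f} half-width={half_width:.3f} implied SE={se:.4f}")
```

Under these assumptions, both intervals are centered on the reported estimates, and the implied standard errors are roughly 0.012 and 0.017. The second set of characteristics gives a wider interval, which is what we would expect for a point estimated less precisely by the fitted model.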
From these values, we can see that small mammals with low basal metabolic
rates, short gestation periods, and low sleep exposure indices tend to have a higher
proportion of REM sleep time than larger mammals with higher basal metabolic rates,
longer gestation periods, and higher sleep exposure indices, as we might expect.
To assess the goodness of fit, we note that the multiple R² for the fitted model is
0.60, meaning that the model accounts for 60% of the variability in REM found in our
data set.
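Because the model has five explanatory variables and only 23 observations, it can also be informative to look at the adjusted R², which penalizes the number of predictors. This quantity is not reported in the analysis above; the Python sketch below derives it from the rounded value 0.60, so the result is approximate. The function name is hypothetical.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Multiple R^2 of 0.60 (rounded), 23 species, 5 explanatory variables.
print(round(adjusted_r_squared(0.60, 23, 5), 2))
```

With these inputs the adjusted value is about 0.48, noticeably below 0.60, which reflects the small sample size relative to the number of explanatory variables.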
Conclusions
In this paper, we developed a method to predict the mean proportion of REM sleep
over all mammals with given physical and ecological characteristics. It uses the model
of type:
REM ~ log(BoM) + log(BrM) + log(BMR) + log(GP) + log(SEI),
where BoM denotes body mass, BrM denotes brain mass, BMR denotes basal metabolic
rate, GP denotes gestation period, and SEI denotes sleep exposure index. The fitted
model equation is:
μ[REM|BoM, BrM, BMR, GP, SEI] = 0.35 + 0.02 log(BoM) + 0.03 log(BrM) – 0.02 log(BMR) – 0.06 log(GP) – 0.02 log(SEI).
We have omitted units in this equation because of the log transform.
While there did appear to be multicollinearity among our explanatory variables,
this was not an issue, since we were using the model for predictive purposes only.
There did not appear to be any major departures from the sampling variability
assumptions that would affect the validity of our statistical inferences.
In our predictions, we found that small mammals with low basal metabolic rates,
short gestation periods, and low sleep exposure indices tend to have a higher
proportion of REM sleep time than larger mammals with higher basal metabolic rates,
longer gestation periods, and higher sleep exposure indices, as we might expect.
While most of the standard concerns for multiple linear regression did not seem
problematic in this analysis, one possible difficulty is the quality of the data. It was
culled from many different sources, and the physical and ecological characteristics of
the various mammal species given here were only estimates. In the data source, some
of these estimates had standard errors and some did not; we did not incorporate any of
this information into our analysis.
Bibliography
[1] J. Lesku, T. Roth II, C. Amlaner, and S. Lima, A Phylogenetic Analysis of Sleep
Architecture in Mammals: The Integration of Anatomy, Physiology, and Ecology, The
American Naturalist, Vol. 168, No. 4 (2006), 441-453.
[2] R Development Core Team (2011). R: A language and environment for
statistical computing. R Foundation for Statistical Computing,
Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.
[3] John Fox and Sanford Weisberg (2011). An R Companion to Applied
Regression, Second Edition. Thousand Oaks CA: Sage. URL:
http://socserv.socsci.mcmaster.ca/jfox/Books/Companion