James Bernhard
Math 260

REM Sleep in Mammals

In this paper, we develop a method to predict the proportion of REM sleep in mammals based on their body mass, brain mass, basal metabolic rate, gestation period, and sleep exposure index. In the first section, we give some background and motivation. In the next section, we explain our analysis of the data, and in the last section, we state the conclusions that the analysis led us to and assess the analysis.

Background

In 2006, Lesku et al. [1] investigated relationships among average times for various types of sleep in mammals. They extended existing approaches for understanding how slow-wave sleep (SWS) and REM sleep times vary among mammals by using path analysis to examine how anatomy, physiology, and ecology influence sleep patterns.

In reading their article, we thought that it would be interesting to use their data to develop a method to predict a mammal's proportion of REM sleep from physical and ecological variables. We were more interested in predicting the mean over all mammals with given characteristics than in predicting the mean for a particular mammal species, so we decided to compute 95% confidence intervals for the mean proportion of REM sleep rather than prediction intervals for individual species. However, our method will easily allow us to compute prediction intervals as well.

Based on information in the article, we decided to use the following variables to predict the proportion of REM sleep:

Variable                   Units                          Abbreviation
proportion of REM sleep    none                           REM
body mass                  g                              BoM
brain mass                 g                              BrM
basal metabolic rate       cubic cm of oxygen per hour    BMR
gestation period           days                           GP
sleep exposure index       points                         SEI

The original data set consisted of 24 mammal species, but one of them (the golden-mantled ground squirrel) was missing data, so we omitted it. This left 23 mammal species for our analysis; that is, we omitted about 4% of the species.
Analysis

To analyze the data, we used R [2] along with the car package [3] to carry out the computations and generate the graphics.

By considering the context of the data along with some exploratory data analysis, we decided that log transformations would be appropriate for all of the explanatory variables. For lack of any particular preference, we used natural logarithms for them all. After transformations, we had the following scatterplot matrix for our collection of variables.

We notice some heavy collinearity between several pairs of the explanatory variables, particularly among body mass, brain mass, and basal metabolic rate. However, this is to be expected, and we are not planning to interpret individual coefficients in the fitted model equation but are instead planning to use the fitted model for predictions, so this is not a problem. We don't notice any other particular patterns or outliers needing investigation, so we continue with the analysis.

For the next step, we fit the model of type:

REM ~ log(BoM) + log(BrM) + log(BMR) + log(GP) + log(SEI).

Because all of the explanatory variables are numerical and we have no particular reason to suspect that interaction terms will be necessary, we have not included any. We arrived at this model by considering the context of the data and some basic exploratory data analysis.

The fitted model equation is:

μ[REM|BoM, BrM, BMR, GP, SEI] = 0.35 + 0.02 log(BoM) + 0.03 log(BrM) - 0.02 log(BMR) - 0.06 log(GP) - 0.02 log(SEI).

We have omitted units in this equation because of the log transformations. Also note that some of the coefficients (the negative ones) have the opposite signs from what we might expect, and this is probably due to multicollinearity.

As another check for multicollinearity, we can compute the variance inflation factors for the explanatory variables.
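For reference, the variance inflation factor (VIF) of the j-th explanatory variable is conventionally defined in terms of the coefficient of determination obtained when that variable is regressed on the remaining explanatory variables. The symbols j and R_j^2 below are notation introduced here for illustration; they do not appear in the analysis itself.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad\text{so that}\qquad
R_j^2 = 1 - \frac{1}{\mathrm{VIF}_j}.
```

Under this definition, a VIF of 10 corresponds to R_j^2 = 0.9; that is, 90% of the variability in that explanatory variable is accounted for by the other explanatory variables.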
We find the following variance inflation factors (VIF):

Variable    VIF
log(BoM)    47.5
log(BrM)    9.6
log(BMR)    34.8
log(GP)     3.30
log(SEI)    4.51

As we can see, three of these (those for log(BoM), log(BrM), and log(BMR)) are near or above 10, which also indicates probable multicollinearity among those variables. Fortunately, this is not an issue for our purposes.

In order to check whether statistical inference (including the confidence intervals that we will compute) will be valid for this model, we next check the sampling variability assumptions.

To check the assumption of the normality of the conditional error terms, we investigate a normal quantile plot of the standardized residuals for the fitted model. In this plot, we see nothing troublesome. All of the points lie within the 95% confidence bands, so there is no indication that the conditional error terms depart from normality.

To assess the remaining sampling variability assumptions, we examine a plot of standardized residuals versus fitted values. In this plot, we see no signs that the conditional error term has nonconstant variance, so the assumption that it has equal variance seems to hold. Also, we see no patterns indicating a lack of independence among conditional error terms, so the assumption of independence of the conditional error terms seems to hold. (Also, the context of the data suggests no other particular ordering that we should check for a lack of independence.) In addition, the means of the conditional error terms don't seem to depart from 0, so that assumption appears satisfied.

We don't notice any outliers in this plot. All 23 residuals appear to be within about 2 standard deviations of the mean, as we might expect if the sampling variability assumptions hold.

Since there don't appear to be any major departures from the sampling variability assumptions, statistical inference appears to be valid, and we can use our model to make predictions.
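To illustrate how a point prediction follows from the fitted model equation, here is a minimal sketch in Python (the analysis itself was carried out in R, and this is not the original code). It uses the rounded coefficients from the fitted equation and natural logarithms. Two assumptions are worth flagging: the log(BMR) coefficient is taken to be negative, which is consistent with the remark about negative coefficients and with the tabulated predictions, and the function name predict_mean_rem is hypothetical. The sketch gives point estimates only; the confidence intervals require the full fitted model.

```python
import math

# Rounded coefficients copied from the fitted model equation in the text.
# Assumption: the log(BMR) coefficient is -0.02 (negative sign assumed).
COEFFICIENTS = {
    "BoM": 0.02,    # body mass (g)
    "BrM": 0.03,    # brain mass (g)
    "BMR": -0.02,   # basal metabolic rate (cubic cm of oxygen per hour)
    "GP": -0.06,    # gestation period (days)
    "SEI": -0.02,   # sleep exposure index (points)
}
INTERCEPT = 0.35


def predict_mean_rem(bom, brm, bmr, gp, sei):
    """Point estimate of the mean proportion of REM sleep (hypothetical helper)."""
    values = {"BoM": bom, "BrM": brm, "BMR": bmr, "GP": gp, "SEI": sei}
    # Each explanatory variable enters the model through its natural logarithm.
    return INTERCEPT + sum(
        COEFFICIENTS[name] * math.log(value) for name, value in values.items()
    )


small = predict_mean_rem(bom=99, brm=3, bmr=100, gp=27, sei=1)
large = predict_mean_rem(bom=2980, brm=110, bmr=1635, gp=200, sei=5)
print(round(small, 3), round(large, 3))
```

With these rounded coefficients, the sketch yields roughly 0.185 and 0.153 for the two sets of characteristics used in the illustrative predictions. These differ slightly from the values obtained with the full fitted model, presumably because the published coefficients are rounded to two decimal places.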
We give some illustrative examples of predictions of the mean proportion of REM sleep among mammals having the specified characteristics in the following table:

BoM (g)   BrM (g)   BMR (cu. cm oxygen/hr)   GP (days)   SEI (points)   μ[REM]
99        3         100                      27          1              0.184 (95% CI: 0.158-0.210)
2980      110       1635                     200         5              0.168 (95% CI: 0.132-0.204)

Note that the last column has no units because REM is a proportion. Also note that the 95% confidence interval is for the conditional mean of REM over all mammals with the specified values for the explanatory variables.

From these values, we can see that small mammals with low basal metabolic rates, short gestation periods, and low sleep exposure indices tend to have a higher proportion of REM sleep time than larger mammals with higher basal metabolic rates, longer gestation periods, and higher sleep exposure indices, as we might expect.

To assess the goodness of fit, we note that the multiple R² for the fitted model is 0.60, meaning that the model accounts for 60% of the variability in REM found in our data set.

Conclusions

In this paper, we developed a method to predict the mean proportion of REM sleep over all mammals with given physical and ecological characteristics. It uses the model of type:

REM ~ log(BoM) + log(BrM) + log(BMR) + log(GP) + log(SEI),

where BoM denotes body mass, BrM denotes brain mass, BMR denotes basal metabolic rate, GP denotes gestation period, and SEI denotes sleep exposure index. The fitted model equation is:

μ[REM|BoM, BrM, BMR, GP, SEI] = 0.35 + 0.02 log(BoM) + 0.03 log(BrM) - 0.02 log(BMR) - 0.06 log(GP) - 0.02 log(SEI).

We have omitted units in this equation because of the log transformations.

While there did appear to be multicollinearity among our explanatory variables, this was not an issue, since we were using the model for predictive purposes only. There did not appear to be any major departures from the sampling variability assumptions that would affect the validity of our statistical inferences.
In our predictions, we found that small mammals with low basal metabolic rates, short gestation periods, and low sleep exposure indices tend to have a higher proportion of REM sleep time than larger mammals with higher basal metabolic rates, longer gestation periods, and higher sleep exposure indices, as we might expect.

While most of the standard concerns for multiple linear regression did not seem problematic in this analysis, one possible difficulty is the quality of the data. It was culled from many different sources, and the physical and ecological characteristics of the various mammal species given here were only estimates. In the data source, some of these estimates had standard errors and some did not; we did not incorporate any of this information into our analysis.

Bibliography

[1] J. Lesku, T. Roth II, C. Amlaner, and S. Lima, A Phylogenetic Analysis of Sleep Architecture in Mammals: The Integration of Anatomy, Physiology, and Ecology, The American Naturalist, Vol. 168, No. 4 (2006), 441-453.

[2] R Development Core Team (2011). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.

[3] John Fox and Sanford Weisberg (2011). An R Companion to Applied Regression, Second Edition. Thousand Oaks CA: Sage. URL: http://socserv.socsci.mcmaster.ca/jfox/Books/Companion