STATISTICAL LABORATORY, May 12th, 2010
BODY TEMPERATURE DATA
Mario Romanazzi
1 INTRODUCTION
The data are taken from the article What’s Normal? – Temperature, Gender, and Heart Rate, by Allen
Shoemaker, published in the Journal of Statistics Education (1996). The data set records three
variables for a sample of 130 people:
1. body temperature (degrees Fahrenheit),
2. gender (1: male, 2: female),
3. heart rate (beats per minute).
These data can be used to address the following questions.
• Is the distribution of body temperatures normal?
• Is the true population mean really 98.6 degrees F (corresponding to 37 degrees C)?
• At what temperature should we consider someone’s temperature to be abnormal?
• Is there a significant difference between males and females in normal temperature?
• Is there a relationship between body temperature and heart rate?
2 DATA INPUT
> temp <- read.table("http://venus.unive.it/romanaz/statistics/data/bodytemp.txt",
+     header = TRUE)
> str(temp)
'data.frame':   130 obs. of  3 variables:
 $ tempf: num  96.3 96.7 96.9 97 97.1 97.1 97.1 97.2 97.3 97.4 ...
 $ gen  : int  1 1 1 1 1 1 1 1 1 1 ...
 $ batt : int  70 71 74 80 73 75 82 64 69 70 ...
> summary(data)
     tempf             gen           batt           tempc
 Min.   : 96.30   Min.   :1.0   Min.   :57.00   Min.   :35.72
 1st Qu.: 97.80   1st Qu.:1.0   1st Qu.:69.00   1st Qu.:36.56
 Median : 98.30   Median :1.5   Median :74.00   Median :36.83
 Mean   : 98.25   Mean   :1.5   Mean   :73.76   Mean   :36.81
 3rd Qu.: 98.70   3rd Qu.:2.0   3rd Qu.:79.00   3rd Qu.:37.06
 Max.   :100.80   Max.   :2.0   Max.   :89.00   Max.   :38.22
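The structure is a data table with n = 130 rows (the observed people, i.e. the statistical units) and
3 columns (the variables). The fourth column in the summary, tempc, is the temperature in degrees
Celsius, obtained from tempf by the conversion given in the next section; the summary was apparently
produced from a version of the table, called data, to which this column had already been added.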
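Since gen is stored as an integer code, it can be convenient to recode it as a labelled factor before any by-group analysis. The following is only a sketch on a small toy data frame: the values are illustrative, not the real data, and the column name gender is a hypothetical addition that does not appear in the original file.

```r
# Toy data frame with the same column names as temp; values are illustrative only.
toy <- data.frame(
  tempf = c(96.3, 96.7, 98.4, 98.8),
  gen   = c(1L, 1L, 2L, 2L),
  batt  = c(70L, 71L, 84L, 78L)
)

# Recode the integer codes (1: male, 2: female) as a labelled factor.
# 'gender' is a hypothetical new column, not part of the original file.
toy$gender <- factor(toy$gen, levels = c(1, 2), labels = c("M", "F"))

# Count the units in each group.
table(toy$gender)
```

With the real table, the same call applied to temp$gen should report 65 units per group, in agreement with the sample sizes quoted in the plot titles.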
3 DATA TRANSFORMATION
We first convert degrees Fahrenheit to degrees Celsius. Recall that
$T(C) = 5\,(T(F) - 32)/9.$
Moreover, we consider male and female temperatures separately.
> tempc <- 5 * (temp$tempf - 32)/9
> tc_m <- tempc[temp$gen == 1]
> tc_f <- tempc[temp$gen == 2]
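As a quick sanity check of the conversion, the formula can be wrapped in a small function and applied to a few known values. This is a minimal sketch; the function name f_to_c is a hypothetical helper, not something defined in the original session.

```r
# Hypothetical helper wrapping the conversion T(C) = 5 (T(F) - 32) / 9.
f_to_c <- function(tf) 5 * (tf - 32) / 9

# The conventional 98.6 degrees F should map to 37 degrees C.
round(f_to_c(98.6), 2)

# Sample minimum and maximum of tempf reported in the summary (96.30 and 100.80).
round(f_to_c(c(96.30, 100.80)), 2)
```

The last call should reproduce the minimum and maximum of the tempc column in the data summary (35.72 and 38.22).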
4 DATA SUMMARY AND DISTRIBUTIONAL PLOTS
> summary(tc_m)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  35.72   36.44   36.72   36.72   37.00   37.50
> stem(tc_m, scale = 0.5)

  The decimal point is at the |

  35 | 79
  36 | 1122223333344444
  36 | 566666677777777888888999999
  37 | 0000001111122223344
  37 | 5
> summary(tc_f)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  35.78   36.67   36.89   36.89   37.11   38.22
> stem(tc_f, scale = 0.5)

  The decimal point is at the |

  35 | 89
  36 | 02234
  36 | 55666666777777888888888999999
  37 | 00001111111111111222333344
  37 | 78
  38 | 2
Is the distribution of body temperature normal? First, we produce a qq-plot of the sample data,
comparing sample order statistics to the corresponding expectations under normality. Second, we perform
the Kolmogorov-Smirnov formal test of normality.
> qqnorm(scale(tc_m), pch = 20, xlab = "Theoretical Normal Quantiles",
+     main = "Body Temperature - Males (n = 65)")
> qqline(scale(tc_m), col = "red", lwd = 2)
> ks.test(tc_m, "pnorm", mean(tc_m), sd(tc_m))
One-sample Kolmogorov-Smirnov test
data: tc_m
D = 0.0685, p-value = 0.9204
alternative hypothesis: two-sided
> qqnorm(scale(tc_f), pch = 20, xlab = "Theoretical Normal Quantiles",
+     main = "Body Temperature - Females (n = 65)")
> qqline(scale(tc_f), col = "red", lwd = 2)
> ks.test(tc_f, "pnorm", mean(tc_f), sd(tc_f))
One-sample Kolmogorov-Smirnov test
data: tc_f
D = 0.1078, p-value = 0.4365
alternative hypothesis: two-sided
[Figure: normal qq-plots of the standardized body temperatures, in two panels titled "Body Temperature - Males (n = 65)" and "Body Temperature - Females (n = 65)"; x-axis: Theoretical Normal Quantiles, y-axis: Sample Quantiles.]
Both the qq-plots and the Kolmogorov-Smirnov tests (p-values 0.92 for males and 0.44 for females)
indicate good agreement of the sample data with the hypothesis of a normal distribution. Only a few
possible outliers are apparent, in the female sample.
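One caveat is worth keeping in mind: ks.test assumes that the parameters of the reference distribution are fixed in advance. When the mean and standard deviation are estimated from the same sample, as done here, the reported p-values tend to be too large. As a cross-check, the Shapiro-Wilk test (shapiro.test in base R) needs no parameters. The sketch below runs on simulated data, used as a stand-in for tc_m or tc_f (an assumption, not the real sample), and also shows the coordinates from which a normal qq-plot is built.

```r
set.seed(123)                            # reproducible simulated sample
x <- rnorm(65, mean = 36.8, sd = 0.4)    # stand-in for tc_m or tc_f (assumption)

# Coordinates of a normal qq-plot: sorted standardized data (y-axis)
# against normal quantiles at the plotting positions ppoints(n) (x-axis).
z <- sort(as.numeric(scale(x)))
q <- qnorm(ppoints(length(x)))
cor(q, z)                                # close to 1 when the points lie near a line

# Shapiro-Wilk test of normality: no mean or sd has to be supplied.
sw <- shapiro.test(x)
sw$p.value
```

Replacing x with tc_m or tc_f would apply the same check to the real samples.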
5 TESTING POPULATION MEAN TEMPERATURE
Is the true population mean of body temperature really 37 degrees C? We study this problem by means
of confidence intervals and significance tests, working separately on males and females.
> t.test(tc_m, mu = 37, alternative = "two.sided", conf.level = 0.95)
One Sample t-test
data: tc_m
t = -5.7158, df = 64, p-value = 3.084e-07
alternative hypothesis: true mean is not equal to 37
95 percent confidence interval:
36.62860 36.82098
sample estimates:
mean of x
36.72479
The results are clear-cut: the 95% confidence interval (36.63, 36.82) excludes the value under test
and the p-value (about 3e-07) is very small. The data consistently indicate that the population mean
temperature of males is below 37 degrees C.
> t.test(tc_f, mu = 37, alternative = "two.sided", conf.level = 0.95)
One Sample t-test
data: tc_f
t = -2.2355, df = 64, p-value = 0.02888
alternative hypothesis: true mean is not equal to 37
95 percent confidence interval:
36.78312 36.98782
sample estimates:
mean of x
36.88547
Here the results are less clear-cut. The 95% confidence interval still excludes the value under
test, but only narrowly (its upper limit is 36.99), and the p-value (0.029) lies between the classical
thresholds of 1% and 5%. It would be important to check the influence of the previously noted outlying
observations.
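To make explicit what t.test computes in the one-sample case, the statistic, the p-value and the confidence interval can be obtained by hand from the sample mean, the sample standard deviation and n. The sketch below uses simulated data rather than the real sample, and the function name one_sample_t is a hypothetical helper introduced only for illustration.

```r
# Hypothetical helper: two-sided one-sample t-test computed by hand.
one_sample_t <- function(x, mu0, conf.level = 0.95) {
  n  <- length(x)
  se <- sd(x) / sqrt(n)                        # standard error of the mean
  tstat <- (mean(x) - mu0) / se                # t statistic
  df <- n - 1
  p  <- 2 * pt(-abs(tstat), df)                # two-sided p-value
  half <- qt(1 - (1 - conf.level) / 2, df) * se
  list(statistic = tstat, df = df, p.value = p,
       conf.int = mean(x) + c(-1, 1) * half)
}

set.seed(1)
x <- rnorm(65, mean = 36.7, sd = 0.4)          # simulated, not the real data

manual  <- one_sample_t(x, mu0 = 37)
builtin <- t.test(x, mu = 37, alternative = "two.sided", conf.level = 0.95)

# The hand computation and t.test should agree.
c(manual = manual$statistic, t.test = unname(builtin$statistic))
rbind(manual = manual$conf.int, t.test = as.numeric(builtin$conf.int))
```

The same relation can be read off the output for the males: the half-width of the confidence interval is the standard error multiplied by the 0.975 quantile of a t distribution with 64 degrees of freedom.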
6 GENDER COMPARISON
The sample data suggest that mean body temperature could be higher in the female population. Is the
difference real, or can it be attributed to sampling (chance) error? We examine the problem first with
a pair of notched boxplots and then with a formal test. Recall that the notch of a boxplot displays an
approximate 95% confidence interval for the population median.
> boxplot(tc_m, tc_f, notch = TRUE, horizontal = TRUE,
+     xlab = "Body Temperature (Degrees Celsius)",
+     names = c("M", "F"), col = "lavender")
The plot confirms the tendency of females to have slightly higher temperatures than males, but it also
shows that the two confidence intervals overlap slightly; hence the hypothesis of equal location
(equal medians) of the male and female populations is not rejected.
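The notch limits can be reproduced by hand. In R they are computed as the median plus or minus 1.58 times the distance between the hinges, divided by the square root of n, and are returned in the conf component of boxplot.stats. The sketch below checks this on simulated data (an assumption, not the real sample).

```r
set.seed(42)
x <- rnorm(65, mean = 36.8, sd = 0.4)   # simulated, not the real data

h <- fivenum(x)                         # min, lower hinge, median, upper hinge, max
n <- length(x)

# Notch limits: median +/- 1.58 * (upper hinge - lower hinge) / sqrt(n)
notch <- h[3] + c(-1.58, 1.58) * (h[4] - h[2]) / sqrt(n)

notch
boxplot.stats(x)$conf                   # the interval R uses to draw the notch
```

Calling boxplot.stats(tc_m)$conf and boxplot.stats(tc_f)$conf on the real samples would give the two intervals whose overlap is judged in the plot.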
> t.test(tc_m, tc_f, alternative = "two.sided", paired = FALSE,
+     var.equal = FALSE, conf.level = 0.95)
Welch Two Sample t-test
data: tc_m and tc_f
t = -2.2854, df = 127.51, p-value = 0.02394
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.29980476 -0.02156277
sample estimates:
mean of x mean of y
36.72479 36.88547
[Figure: notched horizontal boxplots of body temperature for males (M) and females (F); x-axis: Body Temperature (Degrees Celsius), from 36.0 to 38.0.]
The test is not decisive either, because the p-value (0.024) lies between 1% and 5%.
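For reference, the Welch statistic and its fractional degrees of freedom (such as df = 127.51 above) come from the two sample variances via the Welch-Satterthwaite approximation. The sketch below computes them by hand on simulated data, not the real samples, and compares the result with t.test; the function name welch_t is a hypothetical helper.

```r
# Hypothetical helper: two-sided Welch two-sample t-test computed by hand.
welch_t <- function(x, y) {
  vx <- var(x) / length(x)                     # squared standard error, sample x
  vy <- var(y) / length(y)                     # squared standard error, sample y
  tstat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  list(statistic = tstat, df = df, p.value = 2 * pt(-abs(tstat), df))
}

set.seed(7)
m <- rnorm(65, mean = 36.7, sd = 0.4)          # simulated "male" sample
f <- rnorm(65, mean = 36.9, sd = 0.4)          # simulated "female" sample

manual  <- welch_t(m, f)
builtin <- t.test(m, f, alternative = "two.sided", paired = FALSE, var.equal = FALSE)

# Statistic, degrees of freedom and p-value should agree with t.test.
rbind(manual = unlist(manual),
      t.test = c(unname(builtin$statistic), unname(builtin$parameter), builtin$p.value))
```

With equal sample sizes and similar variances, as here, the approximate degrees of freedom stay close to n1 + n2 - 2 = 128, which is consistent with the value reported above.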