The R project for Comparisons of Several Multivariate Means

Chu-yu Chung
Hang Du
Yi Su
Xiangmin Zhang
December 7, 2009
Abstract
Comparisons of multivariate means involve hypothesis testing, constructing simultaneous confidence intervals (SCI), and decomposing variances under certain conditions. In this project, we write five individual R functions to perform such tasks, covering paired comparison, a repeated measures design for comparing treatments, comparison of mean vectors from two multivariate populations, and comparison of several multivariate population means. Our R functions are designed to facilitate the computation and to report as much information as is needed in practice.
1 Introduction
Multivariate hypothesis testing differs from univariate testing in many ways. It relies on the multivariate normal assumption, which is more appropriate in many practical settings, and it allows many possible alternatives. The advantages of multivariate testing include preserving the overall significance level α across all responses and testing with greater power.
In Sections 1.1 to 1.5, we describe how to formulate the tests for comparing multivariate means. For each test, either the critical value for rejection or the simultaneous confidence intervals are presented. The notation is adapted from Chapter 6 of Applied Multivariate Statistical Analysis (6th ed.) by Johnson R. A. and Wichern D. W.
1.1 Paired Comparison
Paired comparison is used to analyze measurements taken under different sets of experimental conditions and to assess whether the responses differ significantly between these sets.
In the multivariate paired comparison procedure, with p responses, two treatments, and n experimental units, we label the responses $X_{111}$ (variable 1 under treatment 1 in the first unit), ..., $X_{2np}$ (variable p under treatment 2 in the nth unit). The p paired-difference random variables for the jth unit are
$$
\begin{aligned}
D_{j1} &= X_{1j1} - X_{2j1} \\
D_{j2} &= X_{1j2} - X_{2j2} \\
&\;\;\vdots \\
D_{jp} &= X_{1jp} - X_{2jp}
\end{aligned}
\tag{1}
$$
Let $D_j = (D_{j1}, D_{j2}, \ldots, D_{jp})'$ for $j = 1, 2, \ldots, n$; then
$$
E(D_j) = \delta = \begin{pmatrix} \delta_1 \\ \delta_2 \\ \vdots \\ \delta_p \end{pmatrix}
\tag{2}
$$
$$
\mathrm{Cov}(D_j) = \Sigma_d
\tag{3}
$$
The null hypothesis is that the two treatments have the same mean vector, that is, δ = 0; the alternative is that the treatment means differ, δ ≠ 0.
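If we further assume that $D_1, D_2, \ldots, D_n$ are independent $N_p(\delta, \Sigma_d)$ random vectors, then
$$
T^2 = n\,(\bar{D} - \delta)'\, S_d^{-1}\, (\bar{D} - \delta)
\tag{4}
$$
where
$$
\bar{D} = \frac{1}{n}\sum_{j=1}^{n} D_j
\quad\text{and}\quad
S_d = \frac{1}{n-1}\sum_{j=1}^{n} (D_j - \bar{D})(D_j - \bar{D})',
$$
is distributed as an $\frac{(n-1)p}{n-p} F_{p,n-p}$ random variable.
With $T^2$ evaluated at δ = 0, we reject $H_0$ if
$$
T^2 > \frac{(n-1)p}{n-p}\, F_{p,n-p}(\alpha)
\tag{5}
$$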
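As an illustration of (4) and (5), the following is a minimal sketch that computes the statistic and the critical value directly. It is written independently of the paired function listed in the appendix. The data are the T6-1.dat values shown in Table 1 of Section 2.1.1; the significance level α = 0.05 and all variable names are assumptions made for this sketch.

```r
# Table 1 data (T6-1.dat): two responses under each of two treatments.
x1 <- cbind(c(6, 6, 18, 8, 11, 34, 28, 71, 43, 33, 20),
            c(27, 23, 64, 44, 30, 75, 26, 124, 54, 30, 14))
x2 <- cbind(c(25, 28, 36, 35, 15, 44, 42, 54, 34, 29, 39),
            c(15, 13, 22, 29, 31, 64, 30, 64, 56, 20, 21))

alpha <- 0.05          # assumed significance level
d     <- x1 - x2       # paired differences, one row per experimental unit
n     <- nrow(d)
p     <- ncol(d)
dbar  <- colMeans(d)   # sample mean of the differences
Sd    <- cov(d)        # sample covariance of the differences

# Equation (4) evaluated at delta = 0, and the critical value in (5)
tsq  <- drop(n * t(dbar) %*% solve(Sd) %*% dbar)
crit <- (n - 1) * p / (n - p) * qf(1 - alpha, p, n - p)

cat("T^2 =", round(tsq, 2), " critical value =", round(crit, 2), "\n")
cat("reject H0:", tsq > crit, "\n")
```

On these data the statistic is about 13.6 against a critical value of about 9.5, which agrees with the rejection reported in Section 2.1.1.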
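The 100(1 − α)% simultaneous confidence intervals for the individual mean differences are
$$
\delta_i:\ \bar{d}_i \pm \sqrt{\frac{(n-1)p}{n-p}\, F_{p,n-p}(\alpha)}\ \sqrt{\frac{s^2_{d_i}}{n}}
\tag{6}
$$
where $s^2_{d_i}$ is the ith diagonal element of $S_d$.
The Bonferroni 100(1 − α)% simultaneous confidence intervals for the individual mean differences are
$$
\delta_i:\ \bar{d}_i \pm t_{n-1}\!\left(\frac{\alpha}{2p}\right) \sqrt{\frac{s^2_{d_i}}{n}}
\tag{7}
$$
1.2 Repeated Measure Design Comparison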
This is another generalization of the univariate paired t-statistic, arising in situations where q treatments are compared with respect to a single response variable.
Assume the jth observation is
$$
X_j = \begin{pmatrix} X_{j1} \\ X_{j2} \\ \vdots \\ X_{jq} \end{pmatrix}
\tag{8}
$$
where $X_{ji}$ is the response to the ith treatment on the jth unit. Assume the observations come from an $N_q(\mu, \Sigma_x)$ population. Let C be a contrast matrix. An α-level test of $H_0: C\mu = 0$ (equal treatment means) versus $H_1: C\mu \neq 0$ is:
Reject $H_0$ if
$$
T^2 = n\,(C\bar{x})'\,(C S C')^{-1}\,(C\bar{x}) > \frac{(n-1)(q-1)}{n-q+1}\, F_{q-1,n-q+1}(\alpha)
\tag{9}
$$
where $F_{q-1,n-q+1}(\alpha)$ is the upper (100α)th percentile of an F-distribution with q − 1 and n − q + 1 d.f.,
$$
\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j
\quad\text{and}\quad
S = \frac{1}{n-1}\sum_{j=1}^{n} (x_j - \bar{x})(x_j - \bar{x})'.
$$
The 100(1 − α)% simultaneous confidence intervals for single contrasts $c'\mu$, for any contrast vectors c of interest, are
$$
c'\bar{x} \pm \sqrt{\frac{(n-1)(q-1)}{n-q+1}\, F_{q-1,n-q+1}(\alpha)}\ \sqrt{\frac{c'Sc}{n}}
$$
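The test in (9) and the contrast intervals can be sketched as below. This is a minimal illustration on simulated data, not the repmeasure function from the appendix: the helper name contrast_t2, the simulated means, the default α = 0.05, and the two contrast matrices are all assumptions made for the sketch. It also shows that T² does not depend on which full-rank contrast matrix is used, although the individual intervals do.

```r
set.seed(1)
# Hypothetical data: n = 20 units, q = 4 treatments (columns)
n <- 20
q <- 4
x <- matrix(rnorm(n * q, mean = rep(c(10, 10, 12, 12), each = n)), nrow = n)

# Hypothetical helper implementing equation (9) and the contrast intervals
contrast_t2 <- function(x, C, alpha = 0.05) {
  n    <- nrow(x)
  q    <- ncol(x)
  Cx   <- C %*% colMeans(x)          # estimated contrasts C xbar
  CSC  <- C %*% cov(x) %*% t(C)      # covariance of the contrasts, C S C'
  tsq  <- drop(n * t(Cx) %*% solve(CSC) %*% Cx)
  crit <- (n - 1) * (q - 1) / (n - q + 1) * qf(1 - alpha, q - 1, n - q + 1)
  half <- sqrt(crit * diag(CSC) / n) # half-widths of the simultaneous intervals
  list(tsq = tsq, crit = crit,
       sci = cbind(Estimate = drop(Cx),
                   LowerCI  = drop(Cx) - half,
                   UpperCI  = drop(Cx) + half))
}

# Two different full-rank contrast matrices (each row sums to zero)
C1 <- rbind(c(-1, 1, 0, 0), c(0, -1, 1, 0), c(0, 0, -1, 1))
C2 <- rbind(c(-1, -1, 1, 1), c(1, -1, 1, -1), c(1, -1, -1, 1))

r1 <- contrast_t2(x, C1)
r2 <- contrast_t2(x, C2)
print(r1$sci)
cat("T^2 with C1:", r1$tsq, " T^2 with C2:", r2$tsq, "\n")
```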
1.3 Comparison of Two Multivariate Population Means
In this part, we compare the responses from one set of experimental settings (population 1) with independent responses from another set of experimental settings (population 2). If $X_{11}, X_{12}, \ldots, X_{1n_1}$ is a random sample of size $n_1$ from $N_p(\mu_1, \Sigma)$ and $X_{21}, X_{22}, \ldots, X_{2n_2}$ is an independent random sample of size $n_2$ from $N_p(\mu_2, \Sigma)$, the likelihood ratio test of
$$
H_0: \mu_1 - \mu_2 = \delta_0
\tag{10}
$$
is based on the statistic
$$
T^2 = \left[\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)\right]' \left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right) S_{pooled}\right]^{-1} \left[\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)\right]
\tag{11}
$$
which is distributed as
$$
\frac{(n_1 + n_2 - 2)\,p}{n_1 + n_2 - p - 1}\, F_{p,\,n_1+n_2-p-1}
\tag{12}
$$
where
$$
S_{pooled} = \frac{n_1 - 1}{n_1 + n_2 - 2}\, S_1 + \frac{n_2 - 1}{n_1 + n_2 - 2}\, S_2
\tag{13}
$$
and $(n_1 - 1) S_1$ is distributed as $W_{n_1-1}(\Sigma)$ and $(n_2 - 1) S_2$ is distributed as $W_{n_2-1}(\Sigma)$.
The 100(1 − α)% simultaneous confidence interval for $\mu_{1i} - \mu_{2i}$ is
$$
\bar{X}_{1i} - \bar{X}_{2i} \pm c\, \sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) s_{ii,pooled}}
\tag{14}
$$
where $c^2 = \frac{(n_1 + n_2 - 2)\,p}{n_1 + n_2 - p - 1}\, F_{p,\,n_1+n_2-p-1}(\alpha)$.
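Equations (11) to (14) can be sketched in a few lines of R. The helper below is an illustrative assumption, not the twopop function from the appendix: its name two_sample_t2, the simulated data, and the default α = 0.05 are all made up for this sketch, and it tests H0 with δ0 = 0.

```r
# Hypothetical helper: two-sample Hotelling T^2 with a pooled covariance matrix
two_sample_t2 <- function(x1, x2, alpha = 0.05) {
  x1 <- as.matrix(x1)
  x2 <- as.matrix(x2)
  n1 <- nrow(x1)
  n2 <- nrow(x2)
  p  <- ncol(x1)
  diff <- colMeans(x1) - colMeans(x2)
  # Equation (13): pooled covariance with n1 + n2 - 2 degrees of freedom
  Sp  <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / (n1 + n2 - 2)
  # Equation (11) evaluated at mu1 - mu2 = 0
  tsq <- drop(t(diff) %*% solve((1 / n1 + 1 / n2) * Sp) %*% diff)
  # Squared critical constant c^2 from equations (12) and (14)
  csq <- (n1 + n2 - 2) * p / (n1 + n2 - p - 1) *
    qf(1 - alpha, p, n1 + n2 - p - 1)
  half <- sqrt(csq) * sqrt((1 / n1 + 1 / n2) * diag(Sp))
  list(tsq = tsq, csq = csq, reject = tsq > csq,
       sci = cbind(Estimate = diff, LowerCI = diff - half, UpperCI = diff + half))
}

set.seed(2)
# Hypothetical samples: 30 and 25 observations on p = 3 responses
x1 <- matrix(rnorm(30 * 3), ncol = 3)
x2 <- matrix(rnorm(25 * 3, mean = 0.5), ncol = 3)
res <- two_sample_t2(x1, x2)
print(res$sci)
```

With a single response (p = 1), T² reduces to the square of the usual pooled two-sample t statistic, which gives a quick sanity check on the helper.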
1.4 Comparison of Several Multivariate Population Means
Multivariate analysis of variance (MANOVA) is a generalized form of univariate analysis of variance (ANOVA) for vector-valued responses. The MANOVA table decomposes the total sum of squares and cross products into a treatment component and a residual component. We will not go into the details here. A complete explanation of the variance decomposition and the MANOVA summary tables can be found in Chapter 6 of Applied Multivariate Statistical Analysis (6th ed.) by Johnson R. A. and Wichern D. W.
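The decomposition behind the MANOVA table can be checked numerically. The following is a minimal sketch, assuming R's built-in iris data purely for illustration (it is not one of the datasets used in this report), and the variable names are assumptions. It builds the treatment matrix B and the error matrix W and confirms that they add up to the total sum of squares and cross products.

```r
Y   <- as.matrix(iris[, 1:4])   # built-in data, used only for illustration
grp <- iris$Species             # treatment (group) labels
N   <- nrow(Y)
g   <- nlevels(grp)
ybar <- colMeans(Y)             # overall mean vector

B <- matrix(0, ncol(Y), ncol(Y))   # treatment (between-group) SS&CP
W <- matrix(0, ncol(Y), ncol(Y))   # error (within-group) SS&CP
for (k in levels(grp)) {
  Yk  <- Y[grp == k, , drop = FALSE]
  nk  <- nrow(Yk)
  eff <- colMeans(Yk) - ybar       # estimated treatment effect vector
  B <- B + nk * tcrossprod(eff)
  W <- W + (nk - 1) * cov(Yk)
}
Tot <- (N - 1) * cov(Y)            # total SS&CP about the overall mean

# Degrees of freedom for the three rows of the one-way MANOVA table
df <- c(Treatment = g - 1, Error = N - g, Total = N - 1)
print(df)

# Wilks' lambda, the likelihood ratio statistic for equal treatment means
wilks <- det(W) / det(B + W)
cat("max |B + W - Total| =", max(abs(B + W - Tot)), "\n")
```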
1.5 Treatment Effect Comparison
In treatment effect comparison, we first test whether the treatment effects are the same. When the hypothesis of equal treatment effects is rejected, we construct simultaneous confidence intervals for the components of the differences of the mean vectors. Treatment effect comparison is closely related to MANOVA. Again, a complete discussion can be found in Chapter 6 of Applied Multivariate Statistical Analysis (6th ed.) by Johnson R. A. and Wichern D. W.
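For reference, the Bonferroni intervals computed by the functions in the appendix have the following form. This is a sketch in the notation of Johnson and Wichern, where the symbols are assumptions not defined elsewhere in this report: g is the number of treatments, $n_k$ the size of the kth sample, $N = \sum_k n_k$, $\hat{\tau}_{ki} = \bar{x}_{ki} - \bar{x}_i$ the estimated effect of treatment k on variable i, and $w_{ii}$ the ith diagonal element of the error sum of squares and cross products matrix W.

```latex
% Bonferroni 100(1 - alpha)% simultaneous intervals for tau_{ki} - tau_{li},
% over all p variables and all g(g - 1)/2 treatment pairs k < l
\tau_{ki} - \tau_{li}:\quad
\hat{\tau}_{ki} - \hat{\tau}_{li}
\;\pm\; t_{N-g}\!\left(\frac{\alpha}{p\,g\,(g-1)}\right)
\sqrt{\frac{w_{ii}}{N-g}\left(\frac{1}{n_k} + \frac{1}{n_l}\right)}
```

The divisor p g (g − 1) arises because α/2 is split across the m = p g (g − 1)/2 simultaneous statements.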
2 Examples
For illustrative purposes, we run our functions on several datasets, all of which accompany Chapter 6 of the book Applied Multivariate Statistical Analysis (6th ed.).
2.1 Paired Comparison
2.1.1 Example 1 (T6-1.dat)
Sample   x11j   x12j   x21j   x22j
     1      6     27     25     15
     2      6     23     28     13
     3     18     64     36     22
     4      8     44     35     29
     5     11     30     15     31
     6     34     75     44     64
     7     28     26     42     30
     8     71    124     54     64
     9     43     54     34     56
    10     33     30     29     20
    11     20     14     39     21
Table 1: T6-1.dat
In the above table, the first two columns are from treatment 1 and the last two columns are from treatment 2.
The R output from running our function paired on this dataset is as follows:
 reject null hypothesis, nonzero mean difference exists

 T Squared Based Simultaneous CI for difference
   Estimate    LowerCI   UpperCI
1 -9.363636 -22.453272  3.726000
2 13.272727  -5.700119 32.245574

 Bonferroni Based Simultaneous CI for difference
   Estimate    LowerCI   UpperCI
1 -9.363636 -20.573107  1.845835
2 13.272727  -2.974903 29.520358
2.2 Repeated Measure Design Comparison
2.2.1 Example 2 (T6-2.dat)
Sample    x1    x2    x3    x4
     1   426   609   556   600
     2   253   236   392   395
     3   359   433   349   357
     4   432   431   522   600
     5   405   426   513   513
     6   324   438   507   539
     7   310   312   410   456
     8   326   326   350   504
     9   375   447   547   548
    10   286   286   403   422
    11   349   382   473   497
    12   429   410   488   547
    13   348   377   447   514
    14   412   473   472   446
    15   347   326   455   468
    16   434   458   637   524
    17   364   367   432   469
    18   420   395   508   531
    19   397   556   645   625
Table 2: T6-2.dat
In the above table, each column represents data from an individual treatment.
The R output from running our function repmeasure on this dataset is as follows:
 reject null hypothesis of equal treatment means

 contrast matrix
     [,1] [,2] [,3] [,4]
[1,]   -1    1   -1    1
[2,]   -1   -1    1    1
[3,]   -1    1    1   -1

 Simultaneous CI for contrasts
    Estimate    LowerCI   UpperCI
1 -206.32812 -282.19953 -130.4567
2 -306.92188 -415.73637 -198.1074
3   22.42188  -31.82305   76.6668
2.3 Comparison of Two Multivariate Population Means
2.3.1 Example 3 (T6-9.dat)
Sample    x1    x2    x3   gender
     1    98    81    38   female
     2   103    84    38   female
     .     .     .     .     .
     .     .     .     .     .
     .     .     .     .     .
    23   162   124    61   female
    24   177   132    67   female
    25    93    74    37   male
    26    94    78    35   male
     .     .     .     .     .
     .     .     .     .     .
     .     .     .     .     .
    47   131    95    46   male
    48   135   106    47   male
Table 3: T6-9.dat
In the above table, the first 24 rows are data from population one (gender = female), and the last 24 rows are data from population two (gender = male).
The R output from running our function twopop on this dataset is as follows:
 mean vector of population one
 4.900659 4.622909 3.940286

 mean vector of population two
 4.725444 4.477574 3.703186

 reject equality of mean vectors

The coeffcient of the linear combination
 of most responsible for rejection is
 -43.72677 -8.710687 67.54641

 T Squared Based Simultaneous CI for the difference
   Estimate    LowerCI   UpperCI
1 0.1752157 0.05776762 0.2926638
2 0.1453352 0.05411666 0.2365537
3 0.2371000 0.12906223 0.3451377

 Bonferroni Based Simultaneous CI for the difference
   Estimate    LowerCI   UpperCI
1 0.1752157 0.07702893 0.2734025
2 0.1453352 0.06907636 0.2215940
3 0.2371000 0.14678026 0.3274197
2.3.2 Example 4 (T6-12.dat)
Sample     x1     x2     x3      x4   gender
     1   0.34   3.71   2.87   30.87   male
     2   0.39   5.08   3.38   43.85   male
     .      .      .      .       .     .
     .      .      .      .       .     .
     .      .      .      .       .     .
    24   0.34   4.27   4.00   50.35   male
    25   0.40   4.58   2.82   32.48   male
    26   0.29   5.04   1.93   33.85   female
    27   0.28   3.95   2.51   35.82   female
     .      .      .      .       .     .
     .      .      .      .       .     .
     .      .      .      .       .     .
    49   0.37   5.23   2.48   34.86   female
    50   0.35   5.37   2.25   35.07   female
Table 4: T6-12.dat
In the above table, the first 25 rows are data from population one (gender = male), and the last 25 rows are data from population two (gender = female).
The R output from running our function twopop on this dataset is as follows:
 mean vector of population one
 0.3136 5.1788 2.3152 38.1548

 mean vector of population two
 0.3972 5.3296 3.6876 49.4204

 reject equality of mean vectors

The coeffcient of the linear combination
 of most responsible for rejection is
 -99.39898 6.375999 6.228141 -0.7908238

 T Squared Based Simultaneous CI for the difference
  Estimate     LowerCI      UpperCI
1  -0.0836  -0.1697234  0.002523361
2  -0.1508  -1.4650835  1.163483457
3  -1.3724  -1.8760572 -0.868742824
4 -11.2656 -17.1438597 -5.387340281

 Bonferroni Based Simultaneous CI for the difference
  Estimate     LowerCI     UpperCI
1  -0.0836  -0.1509852 -0.01621484
2  -0.1508  -1.1791296  0.87752962
3  -1.3724  -1.7664745 -0.97832550
4 -11.2656 -15.8649035 -6.66629645
2.4 Comparison of Several Multivariate Population Means
2.4.1 Example 5 (T6-12.dat)
Continuing Example 4, we now demonstrate the results of running MANOVA on the same dataset, T6-12.dat. The R output from running our function MANOVA is as follows:
 Overall mean vector
       [,1]   [,2]   [,3]    [,4]
[1,] 0.3554 5.2542 3.0014 43.7876

 Treatment sample size
     [,1] [,2]
[1,]   25   25

 Treatment effect matrix
        [,1]   [,2]
[1,] -0.0418 0.0418
[2,] -0.0754 0.0754
[3,] -0.6862 0.6862
[4,] -5.6328 5.6328

 One-Way MANOVA Table

 Treatment SS&CP matrix
          [,1]      [,2]       [,3]       [,4]
[1,]  0.087362  0.157586   1.434158   11.77255
[2,]  0.157586  0.284258   2.586974   21.23566
[3,]  1.434158  2.586974  23.543522  193.26137
[4,] 11.772552 21.235656 193.261368 1586.42179

 Error SS&CP matrix
         [,1]       [,2]       [,3]        [,4]
[1,] 0.404480   5.378180   0.854764    4.328096
[2,] 5.378180  94.196160   2.597532  113.078548
[3,] 0.854764   2.597532  13.833280  105.750500
[4,] 4.328096 113.078548 105.750500 1884.311320

 Total SS&CP matrix
          [,1]       [,2]       [,3]       [,4]
[1,]  0.491842   5.535766   2.288922   16.10065
[2,]  5.535766  94.480418   5.184506  134.31420
[3,]  2.288922   5.184506  37.376802  299.01187
[4,] 16.100648 134.314204 299.011868 3470.73311

 Degrees of Freedom
  Treatment Error Total
1         1    46    49

 Bonferroni Based Simultaneous CI for Treatments’ Difference
  Trt.1 Trt.2 Estimate LowerCI UpperCI
1     1    -1   -0.084  -0.151  -0.016
2     1    -1   -0.151  -1.179   0.878
3     1    -1   -1.372  -1.766  -0.978
4     1    -1  -11.266 -15.865  -6.666
2.4.2 Example 6 (T6-13.dat)
Sample    x1    x2    x3    x4   Group
     1   131   138    89    49   1
     2   125   131    92    48   1
     .     .     .     .     .   .
     .     .     .     .     .   .
     .     .     .     .     .   .
    29   131   136   114    54   1
    30   124   138   101    46   1
    31   124   138   101    48   2
    32   133   134    97    48   2
     .     .     .     .     .   .
     .     .     .     .     .   .
     .     .     .     .     .   .
    59   135   132    98    54   2
    60   130   128   101    51   2
    61   137   141    96    52   3
    62   129   133    93    47   3
     .     .     .     .     .   .
     .     .     .     .     .   .
     .     .     .     .     .   .
    89   138   133   100    55   3
    90   138   133    91    46   3
Table 5: T6-13.dat
In the above table, the first 30 rows are data from group 1, the next 30 rows are data from group 2, and the last 30 rows are data from group 3.
The R output from running our function MANOVA on this dataset is as follows:
 Overall mean vector
         [,1]     [,2]     [,3]     [,4]
[1,] 132.7528 133.3146 98.19101 50.46067

 Treatment sample size
     [,1] [,2] [,3]
[1,]   29   30   30

 Treatment effect matrix
           [,1]       [,2]       [,3]
[1,] -1.3734986 -0.3861423  1.7138577
[2,]  0.1336691 -0.6146067  0.4853933
[3,]  1.3262301  0.8756554 -2.1576779
[4,]  0.1255327 -0.2273408  0.1059925

 One-Way MANOVA Table

 Treatment SS&CP matrix
            [,1]       [,2]        [,3]      [,4]
[1,]  147.300878  26.752383 -173.908098  3.083107
[2,]   26.752383  18.918597  -42.424177  6.221813
[3,] -173.908098 -42.424177  213.678096 -8.005024
[4,]    3.083107   6.221813   -8.005024  2.344543

 Error SS&CP matrix
          [,1]      [,2]       [,3]      [,4]
[1,] 1785.2609  174.1690  125.11034 289.05172
[2,]  174.1690 1904.2724  225.07586 178.87931
[3,]  125.1103  225.0759 2046.07471 -17.82644
[4,]  289.0517  178.8793  -17.82644 837.76782

 Total SS&CP matrix
           [,1]      [,2]       [,3]      [,4]
[1,] 1932.56180  200.9213  -48.79775 292.13483
[2,]  200.92135 1923.1910  182.65169 185.10112
[3,]  -48.79775  182.6517 2259.75281 -25.83146
[4,]  292.13483  185.1011  -25.83146 840.11236

 Degrees of Freedom
  Treatment Error Total
1         2    80    88

 Bonferroni Based Simultaneous CI for Treatments’ Difference
   Trt.1 Trt.2 Trt.3 Estimate LowerCI UpperCI
1      1    -1     0   -0.987  -4.451   2.476
2      1     0    -1   -3.087  -6.551   0.376
3      0     1    -1   -2.100  -5.563   1.363
4      1    -1     0    0.748  -2.829   4.325
5      1     0    -1   -0.352  -3.929   3.225
6      0     1    -1   -1.100  -4.677   2.477
7      1    -1     0    0.451  -3.257   4.158
8      1     0    -1    3.484  -0.224   7.191
9      0     1    -1    3.033  -0.674   6.741
10     1    -1     0    0.353  -2.020   2.725
11     1     0    -1    0.020  -2.353   2.392
12     0     1    -1   -0.333  -2.706   2.039
2.5 Treatment Effect Comparison
Example 7 and Example 8 use the datasets T6-12.dat and T6-13.dat, respectively (the same as in the last subsection).
2.5.1 Example 7 (T6-12.dat)
The R output from running our function trt.effect is as follows:
 Bonferroni Based Simultaneous CI for Treatments’ Difference
  Trt.1 Trt.2 Estimate LowerCI UpperCI
1     1    -1   -0.084  -0.151  -0.016
2     1    -1   -0.151  -1.179   0.878
3     1    -1   -1.372  -1.766  -0.978
4     1    -1  -11.266 -15.865  -6.666
2.5.2 Example 8 (T6-13.dat)
The R output from running our function trt.effect is as follows:
 Bonferroni Based Simultaneous CI for Treatments’ Difference
   Trt.1 Trt.2 Trt.3 Estimate LowerCI UpperCI
1      1    -1     0   -1.000  -4.442   2.442
2      1     0    -1   -3.100  -6.542   0.342
3      0     1    -1   -2.100  -5.542   1.342
4      1    -1     0    0.900  -2.674   4.474
5      1     0    -1   -0.200  -3.774   3.374
6      0     1    -1   -1.100  -4.674   2.474
7      1    -1     0    0.100  -3.680   3.880
8      1     0    -1    3.133  -0.647   6.913
9      0     1    -1    3.033  -0.747   6.813
10     1    -1     0    0.300  -2.061   2.661
11     1     0    -1   -0.033  -2.395   2.328
12     0     1    -1   -0.333  -2.695   2.028
3 Appendix (R code)
3.1 Paired Comparison
In this part, x1 is an n × p numeric matrix or data frame of responses under treatment 1, where n is the number of experimental units and p is the number of responses; x2 is the corresponding n × p numeric matrix or data frame of responses under treatment 2; and the input level is the confidence level of the intervals.
paired <- function (x1, x2, level)
{
    p <- ncol(x1)                       # number of responses
    n <- nrow(x1)                       # number of experimental units
    d <- x1 - x2                        # paired differences
    dbar <- apply(d, 2, mean)
    s <- cov(d)
    # Hotelling's T^2 for H0: delta = 0 and its critical value
    tsq <- n * t(dbar) %*% solve(s) %*% dbar
    csq <- (n - 1) * p/(n - p) * qf(level, p, n - p)
    if (tsq > csq)
        cat("\n reject null hypothesis, nonzero mean difference exists \n")
    else cat("\n do not reject null hypothesis, no evidence of a nonzero mean difference \n")
    scit <- matrix(rep(0, p * 3), nrow = p)    # T^2-based intervals
    scib <- matrix(rep(0, p * 3), nrow = p)    # Bonferroni intervals
    for (i in 1:p) {
        scit[i, 1] <- dbar[i]
        scit[i, 2] <- dbar[i] - sqrt(s[i, i]/n * csq)
        scit[i, 3] <- dbar[i] + sqrt(s[i, i]/n * csq)
        scib[i, 1] <- dbar[i]
        scib[i, 2] <- dbar[i] - qt(1 - (1 - level)/(2 * p), n - 1) *
            sqrt(s[i, i]/n)
        scib[i, 3] <- dbar[i] + qt(1 - (1 - level)/(2 * p), n - 1) *
            sqrt(s[i, i]/n)
    }
    scit <- data.frame(Estimate = scit[, 1], LowerCI = scit[, 2],
        UpperCI = scit[, 3])
    scib <- data.frame(Estimate = scib[, 1], LowerCI = scib[, 2],
        UpperCI = scib[, 3])
    cat("\n T Squared Based Simultaneous CI for difference \n")
    print(scit)
    cat("\n Bonferroni Based Simultaneous CI for difference \n")
    print(scib)
}
3.2 Repeated Measure Design Comparison
In this part, x is an n × q matrix or data frame, where n is the number of experimental units and q is the number of treatments, and C is the contrast matrix. The input level is the confidence level of the intervals.
repmeasure <- function (x, C, level)
{
    q <- ncol(x)                        # number of treatments
    n <- nrow(x)                        # number of experimental units
    xbar <- apply(x, 2, mean)
    xbar.new <- C %*% xbar              # estimated contrasts
    s <- cov(x)
    s.new <- C %*% s %*% t(C)           # covariance of the contrasts
    # Hotelling's T^2 for H0: C mu = 0 and its critical value
    tsq <- n * t(xbar.new) %*% solve(s.new) %*% xbar.new
    csq <- (n - 1) * (q - 1)/(n - q + 1) * qf(level, q - 1, n - q + 1)
    if (tsq > csq)
        cat("\n reject null hypothesis of equal treatment means \n\n")
    else cat("\n do not reject null hypothesis \n\n")
    m <- nrow(C)
    sci <- matrix(rep(0, m * 3), nrow = m)
    for (i in 1:m) {
        sci[i, 1] <- xbar.new[i]
        sci[i, 2] <- xbar.new[i] - sqrt(csq/n * s.new[i, i])
        sci[i, 3] <- xbar.new[i] + sqrt(csq/n * s.new[i, i])
    }
    sci <- data.frame(Estimate = sci[, 1], LowerCI = sci[, 2],
        UpperCI = sci[, 3])
    cat(" contrast matrix \n")
    print(C)
    cat("\n Simultaneous CI for contrasts \n")
    print(sci)
}
3.3 Comparison of Two Multivariate Population Means
In this part, x1 is an n1 × p numeric matrix or data frame of data from population one, where n1 is the sample size and p is the number of responses; x2 is an n2 × p numeric matrix or data frame of data from population two, where n2 is the sample size. The input level is the confidence level of the intervals.
twopop <- function (x1, x2, level)
{
    p <- ncol(x1)
    n1 <- nrow(x1)
    n2 <- nrow(x2)
    x1bar <- apply(x1, 2, mean)
    x2bar <- apply(x2, 2, mean)
    cat("\n mean vector of population one \n", x1bar)
    cat("\n\n mean vector of population two \n", x2bar)
    s1 <- cov(x1)
    s2 <- cov(x2)
    # Pooled covariance matrix with n1 + n2 - 2 degrees of freedom
    s.pool <- (n1 - 1)/(n1 + n2 - 2) * s1 + (n2 - 1)/(n1 + n2 - 2) * s2
    # Hotelling's T^2 for H0: mu1 = mu2 and its critical value
    tsq <- t(x1bar - x2bar) %*% solve((1/n1 + 1/n2) * s.pool) %*%
        (x1bar - x2bar)
    csq <- (n1 + n2 - 2) * p/(n1 + n2 - p - 1) * qf(level, p,
        n1 + n2 - p - 1)
    if (tsq > csq) {
        cat("\n\n reject equality of mean vectors\n\n")
        cat("The coeffcient of the linear combination \n of most responsible for rejection is \n",
            solve(s.pool) %*% (x1bar - x2bar))
    }
    else cat("\n\n do not reject equality of mean vectors\n\n")
    scit <- matrix(rep(0, p * 3), nrow = p)    # T^2-based intervals
    scib <- matrix(rep(0, p * 3), nrow = p)    # Bonferroni intervals
    for (i in 1:p) {
        scit[i, 1] <- x1bar[i] - x2bar[i]
        scit[i, 2] <- x1bar[i] - x2bar[i] - sqrt(csq) * sqrt((1/n1 +
            1/n2) * s.pool[i, i])
        scit[i, 3] <- x1bar[i] - x2bar[i] + sqrt(csq) * sqrt((1/n1 +
            1/n2) * s.pool[i, i])
        scib[i, 1] <- x1bar[i] - x2bar[i]
        scib[i, 2] <- x1bar[i] - x2bar[i] - qt(1 - (1 - level)/(2 *
            p), n1 + n2 - 2) * sqrt((1/n1 + 1/n2) * s.pool[i, i])
        scib[i, 3] <- x1bar[i] - x2bar[i] + qt(1 - (1 - level)/(2 *
            p), n1 + n2 - 2) * sqrt((1/n1 + 1/n2) * s.pool[i, i])
    }
    scit <- data.frame(Estimate = scit[, 1], LowerCI = scit[, 2],
        UpperCI = scit[, 3])
    scib <- data.frame(Estimate = scib[, 1], LowerCI = scib[, 2],
        UpperCI = scib[, 3])
    cat("\n\n T Squared Based Simultaneous CI for the difference \n")
    print(scit)
    cat("\n\n Bonferroni Based Simultaneous CI for the difference \n")
    print(scib)
}
3.4 Comparison of Several Multivariate Population Means
In this part, Y is an N × p numeric matrix or data frame of data, where N is the total sample size and p is the number of variables. X holds the N treatment labels, coded 1, ..., g. The input level is the confidence level of the intervals. C is the matrix of contrasts used to test treatment-effect differences, with g rows and one column for each of the g(g − 1)/2 pairwise comparisons.
MANOVA <- function (Y, X, level, C)
{
    # Recode the treatment labels as 1, ..., g
    X <- as.numeric(as.factor(unlist(X)))
    p <- ncol(Y)
    g <- length(unique(X))
    N <- length(X)
    meanvec <- matrix(apply(Y, 2, mean), ncol = 1)
    n <- matrix(rep(0, g), ncol = 1)
    trt.effect <- matrix(rep(0, p * g), ncol = g)
    W <- matrix(rep(0, (p * p)), nrow = p)
    B <- matrix(rep(0, (p * p)), nrow = p)
    for (k in 1:g) {
        Yk <- as.matrix(Y[X == k, , drop = FALSE])
        n[k] <- nrow(Yk)
        trt.effect[, k] <- colMeans(Yk) - meanvec
        # Error (within) and treatment (between) SS&CP contributions
        W <- W + (n[k] - 1) * unname(cov(Yk))
        B <- B + n[k] * trt.effect[, k] %*% t(trt.effect[, k])
    }
    Tot <- W + B
    df1 <- g - 1
    df2 <- N - g          # error d.f.: sum over groups of (n[k] - 1)
    df3 <- N - 1
    # Bonferroni critical value for p * g * (g - 1)/2 comparisons
    prob <- 1 - ((1 - level)/(p * g * (g - 1)))
    tcrit <- qt(prob, (N - g))
    d <- (g * (g - 1)/2)
    # Columns of A are the pairwise contrasts, ordered
    # (1,2), (1,3), ..., (g-1,g)
    A <- matrix(C, ncol = d)
    tau <- matrix(rep(0, (p * d)), ncol = 1)
    lower <- matrix(rep(0, (p * d)), ncol = 1)
    upper <- matrix(rep(0, (p * d)), ncol = 1)
    for (i in 1:p) {
        pair <- 0
        for (k in 1:(g - 1)) {
            for (l in (k + 1):g) {
                pair <- pair + 1
                r <- (i - 1) * d + pair
                tau[r] <- sum(A[, pair] * trt.effect[i, ])
                # Each interval uses the sizes of the two groups compared
                half <- tcrit * sqrt((W[i, i]/(N - g)) * ((1/n[k]) +
                    (1/n[l])))
                lower[r] <- tau[r] - half
                upper[r] <- tau[r] + half
            }
        }
    }
    cat(" Overall mean vector \n")
    print(t(meanvec))
    cat("\n Treatment sample size \n")
    print(t(n))
    cat("\n Treatment effect matrix \n")
    print(trt.effect)
    cat("\n One-Way MANOVA Table \n")
    cat("\n Treatment SS&CP matrix\n")
    print(B)
    cat("\n Error SS&CP matrix\n")
    print(W)
    cat("\n Total SS&CP matrix\n")
    print(Tot)
    cat("\n Degrees of Freedom\n")
    df <- data.frame(Treatment = df1, Error = df2, Total = df3)
    print(df)
    cat("\n Bonferroni Based Simultaneous CI for Treatments’ Difference \n")
    Bonferroni.SCI = data.frame(Trt = t(A), Estimate = round(tau, 3),
        LowerCI = round(lower, 3), UpperCI = round(upper, 3))
    print(Bonferroni.SCI)
}
3.5 Treatment Effect Comparison
In this part, Y is an N × p numeric matrix or data frame of data, where N is the total sample size and p is the number of variables. X holds the N treatment labels. The input level is the confidence level of the intervals. C is the matrix of contrasts used to test treatment-effect differences, with g rows and one column for each of the g(g − 1)/2 pairwise comparisons.
trt.effect <- function (Y, X, level, C)
{
    p <- ncol(Y)
    g <- length(levels(as.factor(X)))
    N <- length(X)
    # var.decomp() is not listed in this appendix; it is expected to
    # return the components trt.effect, W and trt.size
    vd <- var.decomp(Y, X)
    # Bonferroni critical value for p * g * (g - 1)/2 comparisons
    prob <- 1 - ((1 - level)/(p * g * (g - 1)))
    tcrit <- qt(prob, (N - g))
    d <- (g * (g - 1)/2)
    # Columns of A are the pairwise contrasts, ordered
    # (1,2), (1,3), ..., (g-1,g)
    A <- matrix(C, ncol = d)
    tau <- matrix(rep(0, (p * d)), ncol = 1)
    lower <- matrix(rep(0, (p * d)), ncol = 1)
    upper <- matrix(rep(0, (p * d)), ncol = 1)
    for (i in 1:p) {
        pair <- 0
        for (k in 1:(g - 1)) {
            for (l in (k + 1):g) {
                pair <- pair + 1
                r <- (i - 1) * d + pair
                tau[r] <- sum(A[, pair] * vd$trt.effect[i, ])
                # Each interval uses the sizes of the two groups compared
                half <- tcrit * sqrt((vd$W[i, i]/(N - g)) *
                    ((1/vd$trt.size[k]) + (1/vd$trt.size[l])))
                lower[r] <- tau[r] - half
                upper[r] <- tau[r] + half
            }
        }
    }
    cat("\n Bonferroni Based Simultaneous CI for Treatments’ Difference \n")
    Bonferroni.SCI = data.frame(Trt = t(A), Estimate = round(tau, 3),
        LowerCI = round(lower, 3), UpperCI = round(upper, 3))
    print(Bonferroni.SCI)
}
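The listing above calls var.decomp, which is not defined anywhere in this appendix. The following is a hypothetical stand-in, not the authors' function: it assumes only what trt.effect needs, namely a list with components trt.effect (a p × g matrix of estimated treatment effects), W (the error sum of squares and cross products matrix), and trt.size (the g group sizes). The extra component B and the use of R's built-in iris data in the usage line are further assumptions made for illustration.

```r
# Hypothetical stand-in for the undefined helper var.decomp()
var.decomp <- function(Y, X) {
  Y   <- as.matrix(Y)
  grp <- as.factor(unlist(X))     # treatment labels as a factor
  p   <- ncol(Y)
  g   <- nlevels(grp)
  ybar <- colMeans(Y)             # overall mean vector

  trt.effect <- matrix(0, p, g)
  trt.size   <- numeric(g)
  W <- matrix(0, p, p)            # error (within-group) SS&CP
  B <- matrix(0, p, p)            # treatment (between-group) SS&CP
  for (k in seq_len(g)) {
    Yk <- Y[grp == levels(grp)[k], , drop = FALSE]
    trt.size[k]     <- nrow(Yk)
    trt.effect[, k] <- colMeans(Yk) - ybar
    # Centre the group at its own mean, then accumulate cross products
    W <- W + crossprod(sweep(Yk, 2, colMeans(Yk)))
    B <- B + trt.size[k] * tcrossprod(trt.effect[, k])
  }
  list(trt.effect = trt.effect, W = W, B = B, trt.size = trt.size)
}

# Usage on built-in data (not one of the datasets in this report)
vd <- var.decomp(iris[, 1:4], iris$Species)
print(vd$trt.size)
print(round(vd$trt.effect, 3))
```

Computing the decomposition once and reusing the result avoids repeating the same work inside the loops of trt.effect.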
4 Reference
• Johnson R. A. and Wichern D. W. (2007). Applied Multivariate Statistical Analysis (6th ed.). Englewood Cliffs, New Jersey: Prentice Hall.
• Rencher A. C. (1998). Multivariate Statistical Inference and Applications. John Wiley & Sons, Inc.
• Morrison D. F. (1976). Multivariate Statistical Methods (2nd ed.). McGraw-Hill Book Company.
