Descriptive Statistics: C1, C2, C3

InQu 5006 Project Assignment
Group 1 – Armando Alvarez, Jennifer Rivera, Axel Figueroa
Problem 7
Manufactured gas plants are used to produce gas for lighting, heating, and feedstock for
the chemical industry. This process creates wastes that include toxic hydrogen sulfide.
The following data represent the amount of sulfides (meg/g) for three independent runs
produced by a gas plant.
Run 1   Run 2   Run 3
0.50    0.67    0.56
0.75    0.78    0.67
0.80    0.56    0.49
0.60    0.55    0.68
0.40    0.57    0.97
0.80    0.68    0.68
0.60    0.88    0.59
0.76    0.98    0.99
0.54    1.20    0.57
0.87    0.65    0.66
(a) At a 0.05 significance level, is there evidence of a difference in the average
amount of sulfides for the three runs?
(b) If the results obtained in (a) indicate it is appropriate, determine which runs
differ in average sulfides.
(c) What assumptions are necessary in part (a)?
(d) Do you think these assumptions are valid for these data? Show.
Table 1 – Amount of sulfides present in the toxic waste of a gas manufacturing plant
Run   Observations (meg/g)                                          Totals   Averages
      1     2     3     4     5     6     7     8     9     10
1     0.50  0.75  0.80  0.60  0.40  0.80  0.60  0.76  0.54  0.87     6.62     0.66
2     0.67  0.78  0.56  0.55  0.57  0.68  0.88  0.98  1.20  0.65     7.52     0.75
3     0.56  0.67  0.49  0.68  0.97  0.68  0.59  0.99  0.57  0.66     6.86     0.69
All                                                                 21.00     0.70
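The totals and averages in Table 1 can be reproduced with a short script. This is a minimal sketch; the names `runs`, `totals` and `averages` are illustrative and not part of the original analysis.

```python
# Sulfide measurements (meg/g) for the three runs, copied from Table 1.
runs = {
    "Run 1": [0.50, 0.75, 0.80, 0.60, 0.40, 0.80, 0.60, 0.76, 0.54, 0.87],
    "Run 2": [0.67, 0.78, 0.56, 0.55, 0.57, 0.68, 0.88, 0.98, 1.20, 0.65],
    "Run 3": [0.56, 0.67, 0.49, 0.68, 0.97, 0.68, 0.59, 0.99, 0.57, 0.66],
}

# Total and average of each run.
totals = {name: sum(values) for name, values in runs.items()}
averages = {name: totals[name] / len(values) for name, values in runs.items()}

# Grand total and grand average over all 30 observations.
grand_total = sum(totals.values())
grand_average = grand_total / sum(len(values) for values in runs.values())

for name in runs:
    print(f"{name}: total = {totals[name]:.2f}, average = {averages[name]:.3f}")
print(f"Overall: total = {grand_total:.2f}, average = {grand_average:.2f}")
```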
Graph 1 – Box plot of test data of each run
[Box plot of test data of the amount of sulfides present in toxic waste. Vertical axis: Amount of Sulfides (meg/g), 0.3 to 1.2. Horizontal axis: Run 1, Run 2, Run 3.]
The box plot shows no obvious pattern in the measurements within each run. The
spread of the amount of sulfides is roughly symmetric in each run and differs little
between runs. The second run differs most from the other two, but further tests are
needed before drawing firmer conclusions.
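The quantities a box plot displays can be listed numerically. The sketch below computes a five-number summary for each run. The quartile convention used for the original plot is not stated, so the default "exclusive" method of `statistics.quantiles` is an assumption here, and the name `summary` is illustrative.

```python
from statistics import quantiles

# Sulfide measurements (meg/g) for the three runs, copied from Table 1.
runs = {
    "Run 1": [0.50, 0.75, 0.80, 0.60, 0.40, 0.80, 0.60, 0.76, 0.54, 0.87],
    "Run 2": [0.67, 0.78, 0.56, 0.55, 0.57, 0.68, 0.88, 0.98, 1.20, 0.65],
    "Run 3": [0.56, 0.67, 0.49, 0.68, 0.97, 0.68, 0.59, 0.99, 0.57, 0.66],
}

# Five-number summary: minimum, first quartile, median, third quartile, maximum.
summary = {}
for name, values in runs.items():
    q1, med, q3 = quantiles(values, n=4)  # assumed quartile convention
    summary[name] = (min(values), q1, med, q3, max(values))
    print(name, " ".join(f"{v:.3f}" for v in summary[name]))
```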
Graph 2 – Probability plot of test data of each run
[Normal probability plot (95% CI) of the amount of sulfides in toxic waste. Horizontal axis: Amount of sulfides (meg/g). Vertical axis: Percent.]
Variable   Mean    StDev    N    AD      P
Run 1      0.662   0.1550   10   0.346   0.403
Run 2      0.752   0.2116   10   0.508   0.150
Run 3      0.686   0.1670   10   0.796   0.025
The probability plot shows that the data from the three runs follow similar
distributions, which suggests that the amount of sulfides does not vary much between
runs.
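The AD statistic reported with a normal probability plot is the Anderson-Darling statistic. The sketch below computes it for each run with the mean and standard deviation estimated from the sample. The P-value step uses a commonly cited small-sample adjustment and piecewise approximation; that this is the formula behind the reported P-values is an assumption, and the function name `anderson_darling` is illustrative.

```python
from math import exp, log
from statistics import NormalDist, mean, stdev

def anderson_darling(sample):
    """Return (A2, approximate P-value) for a normality check on `sample`."""
    n = len(sample)
    m, s = mean(sample), stdev(sample)
    # Standardize with the sample mean and standard deviation, then sort.
    z = sorted((x - m) / s for x in sample)
    cdf = [NormalDist().cdf(v) for v in z]
    total = sum(
        (2 * i + 1) * (log(cdf[i]) + log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    # Assumed small-sample adjustment and piecewise P-value approximation.
    a_star = a2 * (1 + 0.75 / n + 2.25 / n**2)
    if a_star >= 0.6:
        p = exp(1.2937 - 5.709 * a_star + 0.0186 * a_star**2)
    elif a_star >= 0.34:
        p = exp(0.9177 - 4.279 * a_star - 1.38 * a_star**2)
    elif a_star >= 0.2:
        p = 1 - exp(-8.318 + 42.796 * a_star - 59.938 * a_star**2)
    else:
        p = 1 - exp(-13.436 + 101.14 * a_star - 223.73 * a_star**2)
    return a2, p

# Sulfide measurements (meg/g) for the three runs, copied from Table 1.
runs = {
    "Run 1": [0.50, 0.75, 0.80, 0.60, 0.40, 0.80, 0.60, 0.76, 0.54, 0.87],
    "Run 2": [0.67, 0.78, 0.56, 0.55, 0.57, 0.68, 0.88, 0.98, 1.20, 0.65],
    "Run 3": [0.56, 0.67, 0.49, 0.68, 0.97, 0.68, 0.59, 0.99, 0.57, 0.66],
}

results = {name: anderson_darling(values) for name, values in runs.items()}
for name, (a2, p) in results.items():
    print(f"{name}: AD = {a2:.3f}, P = {p:.3f}")
```

A P-value below 0.05 for a run would indicate a departure from normality for that run at the 5% level.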
(a) We test whether there is evidence of a difference in the average amount of
sulfides among the three runs of the collected data. The hypotheses
are:
\[ H_0: \tau_1 = \tau_2 = \tau_3 = 0 \]
\[ H_1: \tau_i \neq 0 \text{ for at least one } i \]
In order to determine whether we reject the null hypothesis, we need to find $f_0$. We reject $H_0$ if
$f_0 > f_{\alpha, a-1, a(n-1)}$. To find $f_0$ we use the sums of squares for the analysis of variance:
\[ SS_T = \sum_{i=1}^{3} \sum_{j=1}^{10} y_{ij}^2 - \frac{y_{..}^2}{N} = (0.50)^2 + (0.75)^2 + \cdots + (0.66)^2 - \frac{21^2}{30} = 0.9136 \]
\[ SS_{Treatments} = \sum_{i=1}^{3} \frac{y_{i.}^2}{n} - \frac{y_{..}^2}{N} = \frac{(6.62)^2 + (7.52)^2 + (6.86)^2}{10} - \frac{(21)^2}{30} = 0.04344 \]
\[ SS_E = SS_T - SS_{Treatments} = 0.9136 - 0.04344 = 0.87016 \]
\[ MS_{Treatments} = \frac{SS_{Treatments}}{a - 1} = \frac{0.04344}{3 - 1} = 0.02172 \]
\[ MS_E = \frac{SS_E}{a(n - 1)} = \frac{0.87016}{3(10 - 1)} = 0.032228 \]
\[ F_0 = \frac{MS_{Treatments}}{MS_E} = \frac{0.02172}{0.032228} = 0.673968 \]
\[ f_{\alpha, a-1, a(n-1)} = f_{0.05, 2, 27} = 3.35 \]
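The sums of squares above can be checked numerically. This is a minimal sketch that follows the same computing formulas; the variable names are illustrative, and small differences in the last decimals of the F statistic can arise from rounding in the hand calculation.

```python
# Sulfide measurements (meg/g): one list per run, copied from Table 1.
runs = [
    [0.50, 0.75, 0.80, 0.60, 0.40, 0.80, 0.60, 0.76, 0.54, 0.87],
    [0.67, 0.78, 0.56, 0.55, 0.57, 0.68, 0.88, 0.98, 1.20, 0.65],
    [0.56, 0.67, 0.49, 0.68, 0.97, 0.68, 0.59, 0.99, 0.57, 0.66],
]

a = len(runs)        # number of treatments (runs)
n = len(runs[0])     # observations per run
N = a * n            # total number of observations

grand_total = sum(sum(run) for run in runs)
correction = grand_total**2 / N          # y..^2 / N

ss_total = sum(y * y for run in runs for y in run) - correction
ss_treatments = sum(sum(run) ** 2 for run in runs) / n - correction
ss_error = ss_total - ss_treatments

ms_treatments = ss_treatments / (a - 1)
ms_error = ss_error / (a * (n - 1))
f0 = ms_treatments / ms_error

print(f"SST = {ss_total:.4f}, SSTreatments = {ss_treatments:.5f}, SSE = {ss_error:.5f}")
print(f"MSTreatments = {ms_treatments:.5f}, MSE = {ms_error:.6f}, F0 = {f0:.4f}")
```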
The ANOVA is summarized in Table 2. Since $f_0 < f_{0.05, 2, 27}$, we fail to reject
$H_0: \tau_1 = \tau_2 = \tau_3 = 0$ at the α = 0.05 level and conclude that there is not enough evidence
of a significant difference in the average amount of sulfides among the
three runs. The P-value for this test is
\[ P = P(F_{2, 27} > 0.673968) = 0.518 \]
The P-value is considerably larger than 0.05. Thus, $H_0: \tau_1 = \tau_2 = \tau_3 = 0$ would be
rejected only at a level of significance α ≥ P-value = 0.518.
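The P-value and the critical value can be checked without statistical tables. When the numerator has exactly 2 degrees of freedom, the upper-tail probability of the F distribution has the closed form $P(F_{2,d} > f) = (1 + 2f/d)^{-d/2}$. The sketch below relies on that special case only; the function names are illustrative.

```python
def f_upper_tail_df2(f, d):
    """P(F > f) for an F distribution with 2 and d degrees of freedom."""
    return (1 + 2 * f / d) ** (-d / 2)

def f_critical_df2(alpha, d):
    """Value f with P(F > f) = alpha, for 2 and d degrees of freedom."""
    return (d / 2) * (alpha ** (-2 / d) - 1)

f0 = 0.673968          # test statistic from the analysis of variance
error_df = 27          # a(n - 1) = 3(10 - 1)

p_value = f_upper_tail_df2(f0, error_df)
f_crit = f_critical_df2(0.05, error_df)

print(f"P-value = {p_value:.3f}")
print(f"f(0.05, 2, 27) = {f_crit:.2f}")
print("Reject H0" if f0 > f_crit else "Fail to reject H0")
```

This shortcut does not generalize to other numerator degrees of freedom, where a statistical library or table would be needed.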
Table 2 – ANOVA for the amount of sulfide data
Source of    Sum of     Degrees of   Mean
Variation    Squares    Freedom      Square      f0         P-value
Runs         0.04344    2            0.02172     0.673968   0.518
Error        0.87016    27           0.032228
Total        0.9136     29
Table 3 – Minitab analysis of variance output for amount of sulfides produced in each run
One Way ANOVA: Analysis of Variance for Amount of Sulfide

Source   DF   SS       MS       F      P
Factor    2   0.0434   0.0217   0.67   0.518
Error    27   0.8702   0.0322
Total    29   0.9136

S = 0.1795   R-Sq = 4.75%   R-Sq(adj) = 0.00%

Individual 95% CIs For Mean Based on Pooled StDev
Level    N    Mean     StDev
Run 1    10   0.6620   0.1550
Run 2    10   0.7520   0.2116
Run 3    10   0.6860   0.1670
[Character plot of the three intervals, axis from 0.60 to 0.90]

Pooled StDev = 0.1795

Fisher 95% Individual Confidence Intervals
All Pairwise Comparisons
Simultaneous confidence level = 88.07%

Run 1 subtracted from:
         Lower     Center    Upper
Run 2    -0.0747   0.0900    0.2547
Run 3    -0.1407   0.0240    0.1887
[Character plot of the two intervals, axis from -0.15 to 0.30]

Run 2 subtracted from:
         Lower     Center    Upper
Run 3    -0.2307   -0.0660   0.0987
[Character plot of the interval, axis from -0.15 to 0.30]
(b) Since the test suggests that there is no significant difference between the runs, there
is no need to test which runs differ in the average amount of sulfides.
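For reference, the Fisher pairwise intervals reported in the Minitab output can be reproduced from the run means and the error mean square. In this sketch the critical value 2.052 for $t_{0.025, 27}$ is taken from a standard t table (an assumption, since the report does not state it), and the variable names are illustrative.

```python
from math import sqrt

# Run means and error mean square from the analysis of variance.
means = {"Run 1": 0.662, "Run 2": 0.752, "Run 3": 0.686}
n = 10                  # observations per run
ms_error = 0.032228     # error mean square with 27 degrees of freedom
t_crit = 2.052          # assumed table value of t(0.025, 27)

# Least significant difference for comparing two run means.
lsd = t_crit * sqrt(2 * ms_error / n)

intervals = {}
for first, second in [("Run 1", "Run 2"), ("Run 1", "Run 3"), ("Run 2", "Run 3")]:
    diff = means[second] - means[first]
    intervals[(first, second)] = (diff - lsd, diff + lsd)
    lower, upper = intervals[(first, second)]
    contains_zero = lower < 0 < upper
    print(f"{second} - {first}: ({lower:.4f}, {upper:.4f}) contains 0: {contains_zero}")
```

Every interval contains zero, which agrees with the conclusion that no pair of runs differs significantly.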
(c) The analysis of variance assumes that the observations are normally and
independently distributed with the same variance for each treatment or factor level.
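A quick, informal look at the equal-variance assumption is to compare the sample standard deviations of the runs. The sketch below uses the common rule of thumb that the largest standard deviation should be no more than about twice the smallest. This rule is an assumption added here for illustration; it is not a formal test and is not part of the original analysis.

```python
from statistics import stdev

# Sulfide measurements (meg/g) for the three runs, copied from Table 1.
runs = {
    "Run 1": [0.50, 0.75, 0.80, 0.60, 0.40, 0.80, 0.60, 0.76, 0.54, 0.87],
    "Run 2": [0.67, 0.78, 0.56, 0.55, 0.57, 0.68, 0.88, 0.98, 1.20, 0.65],
    "Run 3": [0.56, 0.67, 0.49, 0.68, 0.97, 0.68, 0.59, 0.99, 0.57, 0.66],
}

# Sample standard deviation of each run.
sds = {name: stdev(values) for name, values in runs.items()}
ratio = max(sds.values()) / min(sds.values())

for name, sd in sds.items():
    print(f"{name}: StDev = {sd:.4f}")
print(f"Largest / smallest StDev = {ratio:.2f}")
```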
(d) We believe these assumptions are valid for these data, and we check them by examining the
residuals. A residual is the difference between an observation and its estimated (or
fitted) value from the statistical model being studied. The residuals for the amount of
sulfide experiment are shown in Table 4. The normality assumption can be checked by
constructing a normal probability plot of the residuals (Graph 3). The independence
assumption is normally checked by plotting the residuals against the order in which the
observations were collected; here the residuals are plotted against the factor levels (Graph 4).
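In the one-way model the fitted value for an observation is the average of its run, so each residual is the observation minus its run average. The sketch below computes the residuals from the data in Table 1. It uses unrounded run averages, so values can differ slightly in the second decimal from a table prepared with rounded averages; the name `residuals` is illustrative.

```python
# Sulfide measurements (meg/g) for the three runs, copied from Table 1.
runs = {
    "Run 1": [0.50, 0.75, 0.80, 0.60, 0.40, 0.80, 0.60, 0.76, 0.54, 0.87],
    "Run 2": [0.67, 0.78, 0.56, 0.55, 0.57, 0.68, 0.88, 0.98, 1.20, 0.65],
    "Run 3": [0.56, 0.67, 0.49, 0.68, 0.97, 0.68, 0.59, 0.99, 0.57, 0.66],
}

# Residual = observation minus the fitted value (the run average).
residuals = {}
for name, values in runs.items():
    fitted = sum(values) / len(values)
    residuals[name] = [y - fitted for y in values]

for name, res in residuals.items():
    print(name, " ".join(f"{e:+.3f}" for e in res))
```

By construction the residuals within each run sum to zero, which is a useful check on the arithmetic.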
Table 4 – Residuals for the Amount of Sulfide Experiment
Run   Residuals
      1       2       3       4       5       6       7       8       9
1     -0.16    0.09    0.14   -0.06   -0.26    0.14   -0.06    0.10   -0.12
2     -0.08    0.03   -0.19   -0.20   -0.18   -0.07    0.13    0.23    0.45
3     -0.13   -0.02   -0.20   -0.01    0.28   -0.01   -0.10    0.33   -0.12
Graph 3 – Normal probability plot of residuals
[Normal probability plot (95% CI) of Run 1_1, Run 2_1, Run 3_1. Horizontal axis: Residual Value. Vertical axis: Percent.]
Variable   Mean    StDev    N    AD      P
Run 1_1    0.002   0.1550   10   0.346   0.403
Run 2_1    0.002   0.2116   10   0.508   0.150
Run 3_1    0.002   0.1729   10   0.756   0.032
Graph 4 – Interval plot of residuals vs. factor levels
[Interval plot of Run 1_1, Run 2_1, Run 3_1, showing the 95% CI for the mean. Vertical axis: Residual Value, -0.15 to 0.15. Horizontal axis: Run 1_1, Run 2_1, Run 3_1.]
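These plots do not reveal any serious model inadequacy or unusual problem with the
assumptions. The one caution is the Anderson-Darling P-value for the Run 3 residuals
(0.032), which is below 0.05, so the normality assumption is less well supported for that run.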
Table 4 (continued) – Residuals for observation 10: Run 1, 0.21; Run 2, -0.10; Run 3, 0.00