information and decision analysis

INFORMATION AND DECISION ANALYSIS
Question 1
Claim 1
PART 1 DESCRIPTIVE STATISTIC
Descriptive Statistics: Diameter
Results for Store = EagleBoys
Variable  CrustDescription   N  N*    Mean  SE Mean  StDev
Diameter  DeepPan           43   0  29.089   0.0730  0.479
          MidCrust          43   0  28.782   0.0737  0.483
          ThinCrust         39   0  29.701   0.0881  0.550
The mean diameter of the DeepPan pizzas is 29.089, with a standard deviation of 0.479 and a
standard error of the mean of 0.0730. The mean diameter of the MidCrust pizzas is 28.782, with a
standard deviation of 0.483 and a standard error of the mean of 0.0737. The mean diameter of the
ThinCrust pizzas is 29.701, with a standard deviation of 0.550 and a standard error of the mean of 0.0881.
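The standard errors quoted above follow from the usual relation SE Mean = StDev / sqrt(N). The short Python sketch below is only an illustrative check and not part of the Minitab output; the dictionary name `groups` and the helper `se_mean` are made up for this example, and the inputs are the N and StDev values from the descriptive statistics.

```python
import math

# (N, StDev) for each crust type, copied from the descriptive statistics.
groups = {
    "DeepPan": (43, 0.479),
    "MidCrust": (43, 0.483),
    "ThinCrust": (39, 0.550),
}


def se_mean(n, stdev):
    """Standard error of the mean: StDev divided by the square root of N."""
    return stdev / math.sqrt(n)


se = {name: se_mean(n, stdev) for name, (n, stdev) in groups.items()}

for name, value in se.items():
    print(f"{name}: SE Mean = {value:.4f}")
```

Rounded to four decimal places, the results can be compared directly with the SE Mean column.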
The histogram of Eagle Boys
[Figure: Histogram (with Normal Curve) of Diameter by CrustDescription, Store = EagleBoys. Panel variable: CrustDescription; x-axis: Diameter (27.00 to 30.75); y-axis: Frequency.]
Panel       Mean   StDev   N
DeepPan     29.09  0.4788  43
MidCrust    28.78  0.4830  43
ThinCrust   29.70  0.5499  39
The histogram shows that the DeepPan diameters are approximately normally distributed, with no
outliers. The MidCrust diameters are also approximately normally distributed, with one outlier.
The ThinCrust diameters are approximately normally distributed as well.
The box plot of the Eagle Boys store
[Figure: Boxplot of Diameter, Store = EagleBoys. x-axis: CrustDescription (DeepPan, MidCrust, ThinCrust); y-axis: Diameter (26 to 31).]
The boxplot suggests that the diameters of the deep pan, mid crust, and thin crust pizzas are each
roughly symmetric, which is consistent with an approximately normal distribution.
b. The 99% confidence interval for the mean diameter of the Eagle Boys and Domino’s pizzas
Level        N    Mean  StDev
Dominos    125  27.442  1.169
EagleBoys  125  29.174  0.626
[Figure: Individual 99% CIs For Mean Based on Pooled StDev, one interval per level; axis ticks 27.60, 28.20, 28.80, 29.40.]
The 99% confidence interval for the Domino’s mean is (26.431, 28.611), while the confidence
interval for the Eagle Boys mean is (28.548, 29.40). From these confidence intervals, it is
concluded that the claim by Eagle Boys is valid: their large pizzas are larger than those of
Domino’s.
Claim 2
a.
[Figure: Individual 99% CIs For Mean Based on Pooled StDev, one interval per level (Dominos, EagleBoys); axis ticks 10.75, 11.00, 11.25, 11.50.]
The confidence interval for the Eagle Boys mean is (11.221, 11.715) inches, which does not
contain 12 inches. It is therefore concluded that the Eagle Boys claim that they have “real size
12-inch large pizzas” is not supported by the data.
b. Assume the population distribution for the diameter of Eagle Boys pizza is normal, with
mean and standard deviation equal to the sample mean and standard deviation. Calculate the
probability that a pizza selected from the population at random will be a “real size 12 inch
large pizza”.
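One way to set up this calculation is sketched below in Python. It rests on assumptions that are not stated in the report: that the Eagle Boys sample mean (29.174) and standard deviation (0.626) are measured in centimetres, and that a “real size 12 inch” pizza means a diameter of at least 12 inches (30.48 cm). The variable names are chosen for illustration only.

```python
from statistics import NormalDist

CM_PER_INCH = 2.54

# Eagle Boys sample statistics from the report (assumed to be in centimetres).
mean_cm = 29.174
stdev_cm = 0.626

# Assumption: "real size 12 inch" means a diameter of at least 12 inches.
threshold_cm = 12 * CM_PER_INCH

population = NormalDist(mu=mean_cm, sigma=stdev_cm)

# Upper-tail probability P(diameter >= threshold) under the normal model.
p_real_size = 1 - population.cdf(threshold_cm)

print(f"P(diameter >= 12 in) = {p_real_size:.4f}")
```

Under these assumptions the probability is small, because the threshold lies a little more than two standard deviations above the sample mean.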
Question 2
a) What percentage of the cuts of meat in the entire Sydney sample have a weight of at least
250 grams?
The weight data contain 240 observations, of which 90 weigh at least 252 grams. The
percentage of cuts weighing at least 252 grams is therefore 90/240 = 37.5%.
b) Find a 95% confidence interval for the mean weight of all cuts of meat in the Sydney
stores. State any assumptions about the population distribution shape that you are making
to estimate this interval, for your estimate to be valid. State the chance that your estimate
may not be correct.
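Assumption: the weights are normally distributed (Clear, 2008).
The mean weight is 248.97 grams and the standard deviation is 7.20 grams, from n = 240
observations. The 95% confidence interval for the mean weight is 248.97 ± 1.96 × 7.20/√240,
which is approximately (248.06, 249.88). There is a 5% chance that an interval constructed this
way does not contain the true mean weight.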
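A confidence interval for a mean can be computed from the summary statistics alone. The Python sketch below uses the sample mean (248.97), standard deviation (7.20), and sample size (240) reported in this question. It uses the large-sample normal critical value in place of the t value, which is an approximation chosen here for simplicity, and the function name `mean_ci` is made up for this example.

```python
import math
from statistics import NormalDist


def mean_ci(mean, stdev, n, confidence=0.95):
    """Large-sample confidence interval for a population mean.

    Uses the normal critical value (about 1.96 for 95%) as an
    approximation to the t critical value, which is close for large n.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev / math.sqrt(n)
    return mean - margin, mean + margin


# Summary statistics for the Sydney sample, as reported in the text.
lower, upper = mean_ci(mean=248.97, stdev=7.20, n=240)

print(f"95% CI for the mean weight: ({lower:.2f}, {upper:.2f})")
```

The margin of error shrinks with the square root of the sample size, so with 240 observations the interval for the mean is much narrower than one standard deviation either side of it.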
c) Indicate whether the confidence interval meets all the requirements of the inspector and
therefore whether she should pursue the matter further.
The confidence interval does not satisfy all of the inspector’s requirements. The inspector
should therefore pursue the matter further.
d) An inspector in Melbourne was asked to conduct a similar analysis of a sample of cuts
from Melbourne supermarkets. The mean weight and standard deviation of 240 randomly
selected cuts taken from supermarkets around the city was 250.90 grams and 9.85 grams
respectively. The manager of a certain store in Melbourne would like to know what
percentage of the cuts in their specific store are likely to weigh at least 250 grams.
Keeping in mind the information provided in the introduction, what should the inspector
advise this store manager? Be specific in explaining why she can or cannot answer the
store manager’s request.
From the above analysis, the mean of the weights is 250.9 grams. The inspector should advise
the manager that about 50% of the cuts are likely to weigh at least 250 grams.
Question 3
(a) Find the correlation coefficients (and p-values) between Price and each of the independent
predictor variables. Highlight any that are significantly correlated at α = 0.05. Interpret the
significant correlations.
Correlations: Carat, Price, Cut_Princess, Cut_Round, Colour_D, Colour_E, ...
Each cell shows the correlation followed by its p-value in parentheses.

Variable       Carat            Price            Cut_Princess     Cut_Round
Price           0.927 (0.000)
Cut_Princess   -0.061 (0.481)    0.013 (0.882)
Cut_Round       0.061 (0.481)   -0.013 (0.882)   -1.000 (*)
Colour_D        0.102 (0.240)    0.111 (0.201)   -0.114 (0.191)    0.114 (0.191)
Colour_E       -0.125 (0.150)   -0.054 (0.537)    0.117 (0.179)   -0.117 (0.179)
Colour_F       -0.211 (0.014)   -0.156 (0.071)    0.088 (0.309)   -0.088 (0.309)
Colour_G        0.072 (0.406)    0.083 (0.339)    0.000 (1.000)    0.000 (1.000)
Colour_H        0.088 (0.314)    0.034 (0.700)   -0.062 (0.480)    0.062 (0.480)
Colour_I        0.183 (0.034)    0.113 (0.194)   -0.039 (0.651)    0.039 (0.651)
Colour_J       -0.020 (0.815)   -0.073 (0.403)   -0.087 (0.319)    0.087 (0.319)
Colour_L        0.101 (0.243)   -0.067 (0.442)   -0.123 (0.157)    0.123 (0.157)
Clarity_P1      0.361 (0.000)    0.169 (0.051)   -0.197 (0.023)    0.197 (0.023)
Clarity_SI1     0.034 (0.700)   -0.046 (0.596)   -0.117 (0.179)    0.117 (0.179)
Clarity_SI2     0.076 (0.385)   -0.003 (0.977)   -0.118 (0.173)    0.118 (0.173)
Clarity_VS1    -0.068 (0.433)   -0.018 (0.836)    0.161 (0.063)   -0.161 (0.063)
Clarity_VS2    -0.131 (0.130)   -0.055 (0.527)   -0.016 (0.851)    0.016 (0.851)
Clarity_VVS1   -0.056 (0.519)   -0.000 (0.996)    0.050 (0.563)   -0.050 (0.563)
Clarity_VVS2   -0.047 (0.587)    0.072 (0.409)    0.227 (0.008)   -0.227 (0.008)
The correlation analysis shows a significant correlation between the carat and the price of the
diamond: the correlation is 0.927 with a p-value of 0.000, which is below the 0.05 significance
level, so heavier diamonds tend to cost more. No other predictor is significantly correlated with
Price at α = 0.05; the closest are Clarity_P1 (0.169, p = 0.051) and Colour_F (-0.156, p = 0.071).
The significant values for Colour_F (-0.211, p = 0.014), Colour_I (0.183, p = 0.034) and
Clarity_P1 (0.361, p = 0.000) are correlations with Carat, not with Price.
(b) Find the best subsets regression of the variable Price using the independent variables. Select,
with full reasons, the variable(s) you would use in your final model.
Regression Analysis: Price versus Carat, Cut_Princess, ...
The regression equation is
Price = - 9179 + 23372 Carat - 394 Cut_Princess + 5631 Colour_D + 5396 Colour_E
        + 5096 Colour_F + 4672 Colour_G + 3942 Colour_H + 1982 Colour_I
        + 1983 Colour_J - 7574 Clarity_P1 - 2570 Clarity_SI1 - 3021 Clarity_SI2
        - 1126 Clarity_VS1 - 1264 Clarity_VS2 + 218 Clarity_VVS1

Predictor        Coef  SE Coef      T      P
Constant        -9179     1070  -8.58  0.000
Carat         23372.3    512.1  45.64  0.000
Cut_Princess   -393.7    238.5  -1.65  0.101
Colour_D         5631     1000   5.63  0.000
Colour_E       5395.9    985.8   5.47  0.000
Colour_F       5095.9    982.6   5.19  0.000
Colour_G       4672.0    970.8   4.81  0.000
Colour_H       3941.8    997.4   3.95  0.000
Colour_I         1982     1123   1.76  0.080
Colour_J         1983     1579   1.26  0.212
Clarity_P1    -7574.3    783.5  -9.67  0.000
Clarity_SI1   -2570.3    426.1  -6.03  0.000
Clarity_SI2   -3021.3    505.3  -5.98  0.000
Clarity_VS1   -1126.0    451.5  -2.49  0.014
Clarity_VS2   -1264.4    414.5  -3.05  0.003
Clarity_VVS1    218.5    832.1   0.26  0.793

S = 1261.87   R-Sq = 95.2%   R-Sq(adj) = 94.6%

Analysis of Variance
Source           DF          SS         MS       F      P
Regression       15  3715761912  247717461  155.57  0.000
Residual Error  118   187894013    1592322
Total           133  3903655924
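The R-Sq, R-Sq(adj) and F values in the output above are all functions of the analysis of variance table. The following Python sketch recomputes them from the sums of squares and degrees of freedom in that table; it is an illustrative cross-check with variable names chosen for this example, not part of the Minitab output.

```python
# Sums of squares and degrees of freedom from the Analysis of Variance table.
ss_regression = 3715761912
ss_error = 187894013
ss_total = 3903655924
df_regression, df_error, df_total = 15, 118, 133

# R-squared: share of the total variation explained by the regression.
r_sq = ss_regression / ss_total

# Adjusted R-squared penalises the number of predictors via the
# degrees of freedom of the error and total sums of squares.
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)

# F statistic: mean square for regression over mean square for error.
f_stat = (ss_regression / df_regression) / (ss_error / df_error)

print(f"R-Sq = {100 * r_sq:.1f}%")
print(f"R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"F = {f_stat:.2f}")
```

The same three formulas apply to any regression ANOVA table, so the check can be repeated for a reduced model by substituting its sums of squares and degrees of freedom.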
(c) Run the final regression and display the full Minitab output for the equation of your
regression model for Price using only the significant variables you selected in part (b)
above. [Do not remove any outliers or leverage points.]
The regression equation is
Price = - 7862 + 23387 Carat + 4173 Colour_D + 3752 Colour_E + 3462 Colour_F
        + 3075 Colour_G + 2363 Colour_H - 7676 Clarity_P1 - 2484 Clarity_SI1
        - 2882 Clarity_SI2 - 1085 Clarity_VS1 - 1150 Clarity_VS2

Predictor       Coef  SE Coef       T      P
Constant     -7862.2    691.2  -11.38  0.000
Carat        23387.4    506.4   46.18  0.000
Colour_D      4173.2    610.9    6.83  0.000
Colour_E      3751.8    539.1    6.96  0.000
Colour_F      3461.8    524.8    6.60  0.000
Colour_G      3074.9    507.2    6.06  0.000
Colour_H      2362.6    548.2    4.31  0.000
Clarity_P1   -7676.2    716.8  -10.71  0.000
Clarity_SI1  -2484.1    385.6   -6.44  0.000
Clarity_SI2  -2881.9    469.1   -6.14  0.000
Clarity_VS1  -1085.3    423.7   -2.56  0.012
Clarity_VS2  -1149.6    379.7   -3.03  0.003

S = 1272.63   R-Sq = 94.9%   R-Sq(adj) = 94.5%

Analysis of Variance
Source           DF          SS         MS       F      P
Regression       11  3706066665  336915151  208.03  0.000
Residual Error  122   197589259    1619584
Total           133  3903655924
(d) Write out the regression coefficients in your final model and interpret them.
Predictor       Coef  SE Coef       T      P
Constant     -7862.2    691.2  -11.38  0.000
Carat        23387.4    506.4   46.18  0.000
Colour_D      4173.2    610.9    6.83  0.000
Colour_E      3751.8    539.1    6.96  0.000
Colour_F      3461.8    524.8    6.60  0.000
Colour_G      3074.9    507.2    6.06  0.000
Colour_H      2362.6    548.2    4.31  0.000
Clarity_P1   -7676.2    716.8  -10.71  0.000
Clarity_SI1  -2484.1    385.6   -6.44  0.000
Clarity_SI2  -2881.9    469.1   -6.14  0.000
Clarity_VS1  -1085.3    423.7   -2.56  0.012
Clarity_VS2  -1149.6    379.7   -3.03  0.003
In the final model, the coefficient of Carat is 23387.4, with a t-statistic of 46.18 and a p-value
of 0.000. Holding colour and clarity fixed, each additional carat raises the predicted price by
about 23387.
The colour coefficients are all positive and significant. Colour_D is 4173.2 (t = 6.83,
p = 0.000), Colour_E is 3751.8 (t = 6.96, p = 0.000), Colour_F is 3461.8 (t = 6.60, p = 0.000),
Colour_G is 3074.9 (t = 6.06, p = 0.000) and Colour_H is 2362.6 (t = 4.31, p = 0.000). Each
coefficient is the predicted price difference relative to the colour grades left out of the
model, for a diamond of the same carat and clarity.
The clarity coefficients are all negative and significant. Clarity_P1 is -7676.2 (t = -10.71,
p = 0.000), Clarity_SI1 is -2484.1 (t = -6.44, p = 0.000), Clarity_SI2 is -2881.9 (t = -6.14,
p = 0.000), Clarity_VS1 is -1085.3 (t = -2.56, p = 0.012) and Clarity_VS2 is -1149.6 (t = -3.03,
p = 0.003). Each coefficient is the predicted price difference relative to the clarity grades
left out of the model, for a diamond of the same carat and colour.
Every p-value is below 0.05, so all of these variables are significant and can be used to
predict the price.
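Each t-statistic in the coefficient table is the coefficient divided by its standard error, which gives a quick way to check the figures quoted in the interpretation. The Python sketch below recomputes the ratios from the Coef and SE Coef columns of the final model and compares them with the reported T column; the names `final_model` and `max_gap` are made up for this example, and small differences are expected because the table values are rounded.

```python
# (Coef, SE Coef, reported T) for each predictor in the final model.
final_model = {
    "Constant": (-7862.2, 691.2, -11.38),
    "Carat": (23387.4, 506.4, 46.18),
    "Colour_D": (4173.2, 610.9, 6.83),
    "Colour_E": (3751.8, 539.1, 6.96),
    "Colour_F": (3461.8, 524.8, 6.60),
    "Colour_G": (3074.9, 507.2, 6.06),
    "Colour_H": (2362.6, 548.2, 4.31),
    "Clarity_P1": (-7676.2, 716.8, -10.71),
    "Clarity_SI1": (-2484.1, 385.6, -6.44),
    "Clarity_SI2": (-2881.9, 469.1, -6.14),
    "Clarity_VS1": (-1085.3, 423.7, -2.56),
    "Clarity_VS2": (-1149.6, 379.7, -3.03),
}

# t = coefficient / standard error of the coefficient.
t_values = {name: coef / se for name, (coef, se, _) in final_model.items()}

# Largest absolute difference between recomputed and reported T values.
max_gap = max(
    abs(t_values[name] - reported) for name, (_, _, reported) in final_model.items()
)

for name, t in t_values.items():
    print(f"{name:12s} t = {t:7.2f}")
print(f"Largest gap from the reported T column: {max_gap:.3f}")
```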
e) Comment, with reasons, on whether you feel the model is a good fit to the data
From the regression analysis, the coefficient of determination is 0.949. This implies that the
regression model accounts for 94.9% of the variation in price. The regression model is
therefore a good fit to the data.
f) Based on the regression output for your final model identify if there are any diamonds
that appear to be over-priced ( poor value) or under-priced, that is “good value buys”?
The mean price of the princess cut diamonds is 8856, while that of the round cut diamonds is
8716. This means that no diamond appears to be over-priced or under-priced.
g) Comment on the validity of the statement “The square princess cut diamond is usually
slightly cheaper than round brilliant cut”
One-way ANOVA: Price versus Cut
Source   DF          SS        MS     F      P
Cut       1      658281    658281  0.02  0.882
Error   132  3902997643  29568164
Total   133  3903655924

S = 5438   R-Sq = 0.02%   R-Sq(adj) = 0.00%

Level      N  Mean  StDev
Princess  67  8856   4894
Round     67  8716   5932
[Figure: Individual 95% CIs For Mean Based on Pooled StDev, one interval per level; axis ticks 7700, 8400, 9100, 9800.]

Pooled StDev = 5438

Grouping Information Using Tukey Method
Cut        N  Mean  Grouping
Princess  67  8856  A
Round     67  8716  A
Means that do not share a letter are significantly different.
Comparing the mean price of the princess cut diamonds with that of the round cut diamonds
(F = 0.02, p = 0.882), there is no significant evidence to support the statement “The square
princess cut diamond is usually slightly cheaper than round brilliant cut”.
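The one-way ANOVA comparison of the two cuts can be approximately reproduced from the group sizes, means, and standard deviations alone. The Python sketch below does this with the rounded Princess and Round summaries from the output, so its sums of squares differ slightly from Minitab's. The p-value line uses a large-sample normal approximation (with two groups, F is the square of a t-statistic), which is an assumption made here to keep the example within the standard library.

```python
import math
from statistics import NormalDist

# (N, Mean, StDev) for each cut, from the one-way ANOVA output.
groups = {"Princess": (67, 8856, 4894), "Round": (67, 8716, 5932)}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * mean for n, mean, _ in groups.values()) / n_total

# Between-group and within-group sums of squares from summary statistics.
ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)

# Two groups only: F = t^2, so a two-sided normal approximation to the
# t-test gives an approximate p-value for large samples.
p_approx = 2 * (1 - NormalDist().cdf(math.sqrt(f_stat)))

print(f"F = {f_stat:.2f}, approximate p = {p_approx:.3f}")
```

An F statistic this close to zero means the difference between the two group means is tiny compared with the spread of prices within each cut.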
h) Brianna has found a diamond she likes. It is 1.1 carats, with a brilliant round cut and she
has been told it has been classified as E colour grading and VS1 clarity. Estimate a fair
price for this diamond, and also estimate the range of prices that you would be 95%
confident an average diamond with these characteristics would fall within. (4 marks)
The regression equation is
Price = - 7862 + 23387 Carat + 4173 Colour_D + 3752 Colour_E + 3462 Colour_F
        + 3075 Colour_G + 2363 Colour_H - 7676 Clarity_P1 - 2484 Clarity_SI1
        - 2882 Clarity_SI2 - 1085 Clarity_VS1 - 1150 Clarity_VS2
Substituting Carat = 1.1, Colour_E = 1 and Clarity_VS1 = 1, with every other indicator set to 0,
gives an estimated price of - 7862 + 23387 × 1.1 + 3752 - 1085 = 20530.7.
For the range of prices that an average diamond with these characteristics would fall within
with 95% confidence, the standard deviation is 5418, so the range of prices is
(15112.7, 25948.7).
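The point estimate for this diamond comes from substituting its characteristics into the final regression equation. The Python sketch below encodes the rounded coefficients from that equation; the function name `predict_price` and the lookup tables are made up for illustration. Colour and clarity grades that do not appear in the equation contribute nothing, since they belong to the baseline left out of the model, and cut does not appear in the final model at all.

```python
# Rounded coefficients from the final regression equation.
INTERCEPT = -7862
CARAT_COEF = 23387
COLOUR_COEF = {"D": 4173, "E": 3752, "F": 3462, "G": 3075, "H": 2363}
CLARITY_COEF = {"P1": -7676, "SI1": -2484, "SI2": -2882, "VS1": -1085, "VS2": -1150}


def predict_price(carat, colour, clarity):
    """Predicted price from the final model.

    Grades missing from the lookup tables are treated as the baseline
    (coefficient 0), matching how indicator variables enter the equation.
    """
    price = INTERCEPT + CARAT_COEF * carat
    price += COLOUR_COEF.get(colour, 0)
    price += CLARITY_COEF.get(clarity, 0)
    return price


# The diamond described in the question: 1.1 carats, colour E, clarity VS1.
estimate = predict_price(carat=1.1, colour="E", clarity="VS1")

print(f"Estimated price: {estimate:.1f}")
```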
Reference
Clear, T. R. (2008). American Corrections. Belmont, CA: Cengage Learning.