the Exam Jam Booklet

STAT 35000 Exam Jam
Contents
1 First Day
Question 1: PDFs, CDFs, and Finding E(X), V(X)
Question 2: Bayesian Inference
Question 3: Binomial to Normal Approximation
Question 4: Sampling Distributions
Question 5: Confidence Intervals
Question 6: One Sample Hypothesis Testing, Power
Question 7: Two Sample Hypothesis Testing
Question 8: Paired Sample Hypothesis Testing
Question 9: One Sample Proportion Hypothesis Testing, Power
Question 10: Simple Linear Regression
2 Second Day
Question 1: Unions, Intersections, and Bayes Theorem
Question 2: Normal Distribution
Question 3: Binomial to Normal Approximation
Question 4: Confidence Intervals
Question 5: One Sample Hypothesis Testing
Question 6: One Sample Hypothesis Testing, Power
Question 7: Two Sample Proportion Hypothesis Testing
Question 8: Simple Linear Regression
Question 9: ANOVA
Mathematical Sciences Department @ IUPUI
1 First Day
1. A continuous random variable X has the pdf given below.
\[ f(x) = \begin{cases} k e^{-x/100}, & 0 < x < \infty \\ 0, & \text{elsewhere} \end{cases} \]
(a) Find the value of k such that f(x) is a valid p.d.f.
(b) Write out the c.d.f., F(x).
(c) Find P(0 < X < 80).
(d) Find E(X) and V(X).
(e) Suppose we define a new variable Y = −2X + 50. Find E(Y) and V(Y).
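A numerical self-check for this problem can be sketched as follows. It assumes the candidate value k = 1/100 and verifies it by integrating the density with a simple midpoint Riemann sum; the helper `integrate` and the truncation point 5000 are illustrative choices, not part of the problem.

```python
import math

SCALE = 100.0          # from the exponent -x/100 in the pdf
K = 1.0 / SCALE        # candidate normalizing constant (assumption to be checked)

def pdf(x):
    """Density k * exp(-x/100) on (0, inf), zero elsewhere."""
    return K * math.exp(-x / SCALE) if x > 0 else 0.0

def cdf(x):
    """Closed form F(x) = 1 - exp(-x/100) for x > 0, zero otherwise."""
    return 1.0 - math.exp(-x / SCALE) if x > 0 else 0.0

def integrate(g, a, b, n=100_000):
    """Midpoint Riemann sum of g over [a, b] (hypothetical helper)."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

UPPER = 5000.0  # truncation point; the tail mass beyond it is about exp(-50)

total = integrate(pdf, 0.0, UPPER)                      # should be close to 1
mean_x = integrate(lambda x: x * pdf(x), 0.0, UPPER)    # E(X)
var_x = integrate(lambda x: x * x * pdf(x), 0.0, UPPER) - mean_x ** 2  # V(X)
p_0_80 = cdf(80) - cdf(0)                               # P(0 < X < 80)

# Linear transformation Y = -2X + 50: E(Y) = -2 E(X) + 50, V(Y) = (-2)^2 V(X)
mean_y = -2 * mean_x + 50
var_y = 4 * var_x

print(total, p_0_80, mean_x, var_x, mean_y, var_y)
```

If `total` is not close to 1 for a different candidate k, that k does not give a valid density.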
2. Suppose that there are two drawers that contain different numbers of marbles. The first drawer contains
5 white marbles and 5 blue marbles. The second drawer contains 9 white marbles and 3 blue marbles.
Mike randomly opens one of the drawers, but he is twice as likely to open the second drawer as
the first drawer. Then, without looking, he randomly picks up one marble. Apart from this, Mike does
not treat the drawers or the marbles any differently.
(a) Create a tree diagram with the corresponding probabilities (the tree should have two layers).
(b) Suppose that we have the following hypotheses:
H0: Marble was picked from the 1st drawer
Ha: Marble was picked from the 2nd drawer
What is the probability of each hypothesis?
(c) Given that Mike opened the 2nd drawer, what is the probability that the marble is white?
(d) Now, let's say that the marble happens to be blue. What is the updated belief for the two
hypotheses? In other words, what is the probability of each hypothesis now? (Use
Bayes' rule.)
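The Bayesian update in part (d) can be sketched with exact fractions. The dictionary names below are illustrative; the priors encode the statement that the second drawer is twice as likely to be opened, and the likelihoods assume each marble in a drawer is equally likely to be picked.

```python
from fractions import Fraction

# Prior: the second drawer is twice as likely to be opened as the first
prior = {"drawer1": Fraction(1, 3), "drawer2": Fraction(2, 3)}

# Likelihood of drawing a blue marble from each drawer
p_blue = {"drawer1": Fraction(5, 10), "drawer2": Fraction(3, 12)}

# Law of total probability: P(blue)
evidence = sum(prior[d] * p_blue[d] for d in prior)

# Bayes' rule: P(drawer | blue) = P(blue | drawer) P(drawer) / P(blue)
posterior = {d: prior[d] * p_blue[d] / evidence for d in prior}

for d in prior:
    print(d, "prior:", prior[d], "posterior:", posterior[d])
```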
3. Chris loves eating Snickers candy. He rolls a fair six-sided die, and if the die lands on 5, he eats one
candy. Let X denote the number of Snickers bars that Chris eats from a bag of 50.
(a) Name the exact distribution of X and its parameters.
(b) Find P(X ≥ 4).
(c) Name the approximate distribution for X and its parameters. Explain why the approximation is valid.
(d) Find the approximate probability that Chris eats 10 or fewer Snickers from the bag of 50, P(X ≤ 10).
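A rough way to compare the exact and approximate answers is sketched below. It assumes one die roll per bar, so X is treated as binomial with n = 50 and p = 1/6, and it applies a continuity correction in the normal approximation; whether your course requires the correction, and which rule of thumb it uses for validity (np and n(1 − p) at least 5 or at least 10), are assumptions to check against your notes.

```python
import math
from statistics import NormalDist

n, p = 50, 1 / 6   # 50 bars, each eaten with probability 1/6 (assumed setup)

def binom_pmf(k):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k):
    """P(X <= k) by direct summation."""
    return sum(binom_pmf(j) for j in range(k + 1))

# Part (b): exact P(X >= 4) = 1 - P(X <= 3)
p_at_least_4 = 1 - binom_cdf(3)

# Parts (c)-(d): normal approximation with matching mean and variance
mu = n * p
sigma = math.sqrt(n * p * (1 - p))
approx = NormalDist(mu, sigma)
p_le_10_approx = approx.cdf(10.5)   # continuity correction: P(X <= 10.5)
p_le_10_exact = binom_cdf(10)       # exact value, for comparison only

print(p_at_least_4, p_le_10_approx, p_le_10_exact)
```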
4. Suppose X has the following distribution
x      0    1    3
p(x)   .1   .5   .4
Suppose we take a random sample of n = 2.
(a) Calculate P(X̄_2 = 2).
(b) Find the distribution of X̄_2.
(c) Find E[X̄] and V[X̄] for n = 2.
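The sampling distribution of the mean for n = 2 in the preceding problem can be enumerated directly. This sketch assumes the two draws are independent with the pmf p(0) = .1, p(1) = .5, p(3) = .4; the variable names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# pmf of a single observation X
pmf = {0: Fraction(1, 10), 1: Fraction(5, 10), 3: Fraction(4, 10)}

# Enumerate all ordered pairs (x1, x2) and accumulate P(sample mean = value)
xbar_pmf = defaultdict(Fraction)
for (x1, p1), (x2, p2) in product(pmf.items(), repeat=2):
    xbar_pmf[Fraction(x1 + x2, 2)] += p1 * p2

mean_xbar = sum(v * q for v, q in xbar_pmf.items())
var_xbar = sum(v**2 * q for v, q in xbar_pmf.items()) - mean_xbar**2

for value in sorted(xbar_pmf):
    print(float(value), xbar_pmf[value])
print("E:", mean_xbar, "V:", var_xbar)
```

The same loop extends to larger n by changing `repeat` and the divisor, although the number of tuples grows quickly.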
5. In certain chemical processes it is very important that a particular solution have a pH of exactly 8.20.
A method for determining the pH for solutions of this type is known to give measurements that are
normally distributed with a mean equal to the actual pH and with a standard deviation of 0.1. Suppose
6 independent measurements yielded the following pH values.
8.10   8.25   8.45   8.30   8.60   8.25
(a) Calculate the MOE.
(b) Construct a 95% Confidence Interval for the average pH level.
(c) What does this confidence interval represent?
(d) What conclusion can be made at the α = 0.05 level of significance?
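Because the measurement standard deviation is stated as known (0.1), a z-based interval is one reasonable reading of this problem. The sketch below follows that assumption and also checks whether the target pH of 8.20 falls inside the interval.

```python
import math
from statistics import NormalDist, mean

ph = [8.10, 8.25, 8.45, 8.30, 8.60, 8.25]
sigma = 0.1      # known measurement standard deviation
conf = 0.95      # confidence level

# Critical value z_{alpha/2} from the standard normal distribution
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)

# Margin of error and interval for the mean pH
moe = z * sigma / math.sqrt(len(ph))
xbar = mean(ph)
ci = (xbar - moe, xbar + moe)

# Two-sided test at alpha = 0.05 is equivalent to checking the 95% interval
target_in_ci = ci[0] <= 8.20 <= ci[1]

print(round(xbar, 3), round(moe, 4), ci, target_in_ci)
```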
6. It is advertised that certain machinery on average lasts 12.7 years. Paul wants to test and see whether
the average life of the machinery is actually less than 12.7 years. He tests 20 machines and finds that
the sample mean was 11.8. Assume the lifetime of the machinery is normally distributed with standard
deviation of 1.5.
(a) State the null and alternative hypotheses.
(b) Name the parameters and describe what they represent.
(c) What method would you use?
i. 1-sample T
ii. 1-sample Z
iii. 2-sample T
iv. 2-sample Z
(d) Calculate the p-value.
(e) Assuming α = .01, state the conclusion and interpret the results in the context of the problem.
(f) Suppose that the true average time a machine lasts is actually 12. For the test above, what is the
power to successfully detect the alternative?
(g) If we want the above power to be 95%, what is the minimum number of machines needed in the
sample?
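The p-value, power, and sample-size parts can be sketched as below, assuming a lower-tailed one-sample Z test (the standard deviation 1.5 is treated as the known population value) and α = 0.01 throughout. The sample-size formula used is the usual one-sided expression n = ((z_α + z_β) σ / (μ0 − μa))², rounded up.

```python
import math
from statistics import NormalDist

mu0, xbar, sigma, n, alpha = 12.7, 11.8, 1.5, 20, 0.01
std_normal = NormalDist()

# Test statistic and lower-tailed p-value
se = sigma / math.sqrt(n)
z_stat = (xbar - mu0) / se
p_value = std_normal.cdf(z_stat)

# Power at the true mean 12: reject H0 when the sample mean falls below cutoff
z_alpha = std_normal.inv_cdf(1 - alpha)
cutoff = mu0 - z_alpha * se
mu_true = 12.0
power = NormalDist(mu_true, se).cdf(cutoff)

# Smallest n giving 95% power at the same alpha and true mean
target_power = 0.95
z_beta = std_normal.inv_cdf(target_power)
n_needed = math.ceil(((z_alpha + z_beta) * sigma / (mu0 - mu_true)) ** 2)

print(round(z_stat, 3), round(p_value, 4), round(power, 3), n_needed)
```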
7. Mrs. McBroom teaches two different independent classes. Class A has 25 students and Class B has
16 students. Both classes took the same standardized exam. The Class A average was 80 with a standard
deviation of 10, and the Class B average was 85 with a standard deviation of 7. Mrs. McBroom wants to
know if there is a significant difference between the scores in her two classes (use α = .05).
(a) State the null and alternative hypotheses.
(b) What method would you use?
i. 1-sample T
ii. 1-sample Z
iii. 2-sample T
iv. 2-sample Z
(c) Find the test statistic.
(d) Find the rejection region.
(e) Using the rejection region, state the conclusion and interpret the results in the context of the
problem.
8. A group of students wanted to see whether their new filter invention could filter zinc out of
drinking water. Five pairs of data were taken, recording the zinc concentration in water before and
after the filtration. The data are shown below.
                Zinc %                              Mean    S.D.
Before          10     12     9.5    16     13.5    12.2    2.66
After           5      8      6      8.5    9.5     7.4     1.85
After-Before    -5     -4     -3.5   -7.5   -4      -4.8    1.6
Is there any difference in zinc concentration before and after the filtration?
(a) State the Null and Alternative Hypothesis.
(b) What do the parameter(s) represent?
(c) What method would you use?
i. 1-sample T
ii. 1-sample Z
iii. 2-sample T
iv. 2-sample Z
(d) Calculate the test statistic.
(e) State your conclusion and interpret the results in context for the researcher using α = 0.01.
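A paired analysis of the zinc data can be sketched as follows, assuming a two-sided paired t test on the differences (After minus Before). Python's standard library has no t distribution, so the critical value 4.604 is hardcoded from a t table for 4 degrees of freedom and two-sided α = 0.01; treat it as a table lookup rather than a computed value.

```python
import math
from statistics import mean, stdev

before = [10, 12, 9.5, 16, 13.5]
after = [5, 8, 6, 8.5, 9.5]

# Work with the paired differences, After - Before
diffs = [a - b for a, b in zip(after, before)]

d_bar = mean(diffs)                 # mean difference
s_d = stdev(diffs)                  # sample standard deviation of differences
df = len(diffs) - 1                 # degrees of freedom
t_stat = d_bar / (s_d / math.sqrt(len(diffs)))

T_CRIT = 4.604                      # t table value t_{0.005, 4} (assumed lookup)
reject = abs(t_stat) > T_CRIT

print(d_bar, round(s_d, 3), round(t_stat, 3), df, reject)
```

Using the rounded S.D. of 1.6 from the table gives a slightly different statistic than the unrounded value computed here.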
9. The CEO of a large company says that about 60 percent of its 10000 customers are satisfied with their
services. To test this claim, a local news reporter tests 100 customers. From that sample, 64 said they
were satisfied with the service. Assume α = 0.02.
(a) State the Null and Alternative Hypotheses.
(b) Calculate the test statistic.
(c) State the conclusion of your test.
(d) Suppose that we later found out that the true percent satisfaction rate was actually 55%. Find
the power of the test.
(e) If we wanted to have a power of 95%, what is the minimal sample size we would need?
10. Consider the following data.
x    1    2    3
y    6    15   12
Assume the form of the regression line is y = mx + b + ε, where ε is a random error term whose mean
is zero. Compute the following.
(a) $\bar{x}$, $\bar{y}$
(b) $S_{xy} = \sum_{i=1}^{3} (x_i - \bar{x})(y_i - \bar{y})$
(c) $S_{xx} = \sum_{i=1}^{3} (x_i - \bar{x})^2$
(d) The slope of the regression line, $m = S_{xy} / S_{xx}$.
(e) Find the equation of the regression line using the slope from part (d) and the fact that the
regression line must go through the point $(\bar{x}, \bar{y})$.
(f) Estimate y when x = 10.
(g) What is the sum of the residuals? What is the sum of the residuals squared?
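The quantities in parts (a) through (g) can be computed step by step as in this sketch. Variable names mirror the symbols in the problem; the `predict` helper is an illustrative addition.

```python
xs = [1, 2, 3]
ys = [6, 15, 12]
n = len(xs)

# (a) sample means
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# (b), (c) sums of cross-products and squares about the means
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)

# (d) slope, (e) intercept from the fact that the line passes through (x_bar, y_bar)
m = s_xy / s_xx
b = y_bar - m * x_bar

def predict(x):
    """Fitted value on the least-squares line (hypothetical helper)."""
    return m * x + b

# (f) prediction at x = 10, (g) residuals and their sums
y_at_10 = predict(10)
residuals = [y - predict(x) for x, y in zip(xs, ys)]
sum_resid = sum(residuals)
sse = sum(r ** 2 for r in residuals)

print(x_bar, y_bar, s_xy, s_xx, m, b, y_at_10, sum_resid, sse)
```

Note that x = 10 lies well outside the observed range of x, so the estimate in part (f) is an extrapolation.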
2 Second Day
1. Suppose that there is a 35% chance that the wife but not the husband will win the raffle. The chance
that both of them will win is 20%. The chance that both of them lose is 40%. Let H = husband wins
the raffle and W = wife wins the raffle.
(a) Calculate the probability that the wife OR the husband wins the raffle, P(H ∪ W).
(b) Calculate the probability that the husband wins the raffle, P(H).
(c) What is the probability that the wife wins given that the husband wins, P(W | H)?
(d) What is the probability that the husband loses given that the wife has won, P(H^c | W)?
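The four regions of the Venn diagram determine every answer in this problem. A minimal sketch with exact fractions, using illustrative variable names:

```python
from fractions import Fraction

p_w_only = Fraction(35, 100)    # wife wins, husband does not
p_both = Fraction(20, 100)      # both win
p_neither = Fraction(40, 100)   # both lose

# (a) union is the complement of "both lose"
p_union = 1 - p_neither

# Remaining region: husband wins, wife does not
p_h_only = p_union - p_w_only - p_both

# (b) marginal probabilities
p_h = p_h_only + p_both
p_w = p_w_only + p_both

# (c), (d) conditional probabilities
p_w_given_h = p_both / p_h
p_hc_given_w = p_w_only / p_w

print(p_union, p_h, p_w_given_h, p_hc_given_w)
```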
2. Suppose that the amount of snow accumulation in Jasper is normally distributed. Suppose we take a
random sample of 25 independent measurements and find the mean to be 32.5 and standard deviation
9. Let X represent the snow accumulation.
(a) What is the probability that the amount of snow will be between 28.5 and 36.5?
(b) Find the 20th percentile for X.
(c) In order to have margin of error of 2 and be 95% confident, what is the smallest sample size we
would need to have?
3. A certain machine produces bottle caps for Coca-Cola. The machine produces 3% defectives. A total of
20,000 bottle caps are produced. Let X be the number of defective bottle caps out of 20,000.
(a) What is the name of the exact distribution? (Specify the parameter values)
(b) What is the name of the approximate distribution? (Specify the parameter values) What are the
conditions that allow us to use this approximation? (Name the Theorem as well)
(c) Find the approximate probability that more than 550 bottle caps are defective.
4. A standardized test is given annually to all sixth grade students in the state of Washington. To
determine the average score of students in a particular district, a school supervisor selects a random
sample of 100 students. The sample mean of these students’ scores is 320 and the sample standard
deviation is 16.
(a) Give a 99% confidence interval estimate of the average score of students in that supervisor’s
district.
(b) Interpret the confidence interval found above in words.
(c) Suppose that the district claimed that the average student score was 322. Should we believe the
district at the 95% confidence level?
(d) If we wanted to have MOE = 5, what is the smallest sample size we should have in order to be
90% confident?
5. A car is advertised as having a gas mileage rating of at least 30 mpg for highway driving. If the
miles per gallon obtained in five independent experiments are {28, 30, 33, 25, 29}, is the advertisement
correct?
(a) State the Null and Alternative Hypothesis.
(b) What method would you use?
i. 1-sample T
ii. 1-sample Z
iii. 2-sample T
iv. 2-sample Z
(c) Calculate the test statistic.
(d) Find the Rejection Region using α = 0.05.
(e) Should you believe the advertisement? Explain.
(f) What assumptions are you making?
6. Suppose we want to show that children have a different cholesterol level than the national average. It
is known that the mean cholesterol level of Americans is 180 and α = .10. We test 5 children and find
the sample mean to be 200. We also know that the population standard deviation is 20.
(a) State the Null and Alternative Hypothesis.
(b) State the parameters.
(c) What method would you use?
i. 1-sample T
ii. 1-sample Z
iii. 2-sample T
iv. 2-sample Z
(d) Find the p-value.
(e) Do we reject or fail to reject the null? Interpret your results.
(f) Suppose that instead of 5 children, we tested 49 children and the mean and standard deviation
remained the same. Would we reject or fail to reject the null? Interpret your results.
(g) Given that the true value is 190, what sample size would we need for the test to have a power
of 90%?
7. A famous reporter, Ian, wanted to figure out whether the students at IUPUI were more Republican
than those at IU. From his two independent surveys, he found that 60 students out of 100 at IUPUI
and 65 out of 120 at IU supported the Republican party. Assume α = .05.
(a) State the Null and Alternative Hypothesis.
(b) What do the parameter(s) represent?
(c) What method would you use?
i. 1-sample T
ii. 1-sample Z
iii. 2-sample T
iv. 2-sample Z
(d) Calculate the test statistic.
(e) State your conclusion and interpret the results in the context of the problem.
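One common way to carry out this comparison is a two-sample Z test for proportions with a pooled standard error. The sketch below assumes that approach and an upper-tailed alternative (IUPUI proportion greater than IU proportion); both are assumptions about how the course frames the test.

```python
import math
from statistics import NormalDist

x1, n1 = 60, 100    # IUPUI: supporters, sample size
x2, n2 = 65, 120    # IU: supporters, sample size
alpha = 0.05

p1_hat = x1 / n1
p2_hat = x2 / n2

# Pooled proportion under H0: p1 = p2
p_pool = (x1 + x2) / (n1 + n2)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# Test statistic and upper-tailed p-value (Ha: p1 > p2)
z_stat = (p1_hat - p2_hat) / se
p_value = 1 - NormalDist().cdf(z_stat)
reject = p_value < alpha

print(round(z_stat, 4), round(p_value, 4), reject)
```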
8. A researcher of the MAC wants to know whether visiting the MAC affects the grade the student receives
in a class. He records the number of visits (x) and the score received in a class (y) on 30 randomly
selected students. Summary statistics for the data are as follows: x̄ = 8.5, sx = 4, ȳ = 86.5, sy =
12, n = 30.
(a) In view of the question the researcher wants to answer, what is the independent variable, and the
response variable?
(b) Suppose that correlation between number of visits and exam score is r = 0.95. Find the least
squares regression equation.
(c) What is the predicted score in a class if a student visits the MAC 5 times?
(d) Is the model overestimating or underestimating for the student who visited the MAC 5 times and
has a score of 94%?
(e) Interpret the meaning of slope and the intercept.
(f) Find the 95% confidence interval for the slope of the regression line. Does the 95% confidence
interval support the claim that β > 3?
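When only summary statistics are available, the least squares line can be recovered from the identities slope = r · s_y / s_x and intercept = ȳ − slope · x̄. The sketch below applies them to parts (b) through (d); the `predict` helper is illustrative, and part (f) is left out because it needs a t critical value from a table.

```python
x_bar, s_x = 8.5, 4.0     # number of visits: mean and standard deviation
y_bar, s_y = 86.5, 12.0   # class score: mean and standard deviation
r = 0.95                  # correlation given in part (b)

# Least squares slope and intercept from summary statistics
slope = r * s_y / s_x
intercept = y_bar - slope * x_bar

def predict(visits):
    """Predicted score for a given number of MAC visits (hypothetical helper)."""
    return intercept + slope * visits

# (c) prediction at 5 visits; (d) residual = observed - predicted
predicted = predict(5)
residual = 94 - predicted   # positive residual means the model underestimates

print(slope, intercept, predicted, residual)
```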
9. Suppose we have four different patient groups, each treated with a different medicine. Each of the
groups has 7 people. The doctors want to know whether there is a significant difference in the average
recovery time among the groups, so they run an ANOVA test. The output is shown below with
some missing entries.
Source        df    SS       MS       F    p-value
Regression    3     56310    18770    ?    ≈0
Residual      ?     ?        ?
Total         27    75998
Answer the following questions.
(a) What are the residual degrees of freedom?
(b) What is the number of observations, N?
(c) Find SSE.
(d) Find the standard deviation of the regression, σ.
(e) Calculate the F statistic and explain what it means (Include appropriate Hypothesis).
(f) Find R^2.
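The missing table entries follow from the identities df_total = df_regression + df_residual, SS_total = SS_regression + SSE, MS = SS / df, and F = MS_regression / MSE. A minimal sketch, reading "standard deviation of the regression" as the square root of MSE (an assumption about the intended definition):

```python
import math

groups, per_group = 4, 7
ss_reg, ss_total = 56310, 75998   # values given in the ANOVA table

# (b) number of observations and (a) residual degrees of freedom
n_obs = groups * per_group
df_total = n_obs - 1
df_reg = groups - 1
df_resid = df_total - df_reg

# (c) error sum of squares
sse = ss_total - ss_reg

# Mean squares, (d) estimated standard deviation, (e) F statistic
ms_reg = ss_reg / df_reg
mse = sse / df_resid
sigma_hat = math.sqrt(mse)
f_stat = ms_reg / mse

# (f) proportion of total variation explained by the group differences
r_squared = ss_reg / ss_total

print(df_resid, n_obs, sse, round(sigma_hat, 2), round(f_stat, 2), round(r_squared, 4))
```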