Last year's final - Department of Mathematics and Statistics

NATS 1500 Final Exam
April 17, 2010
Page 1 of 12
Name (PRINT)______________________________________________
Student Number_____________________________________________
Signature__________________________________________________
York University
Division of Natural Science
NATS 1500 3.0 Statistics and Reasoning in Modern Society
Final Exam
April 17, 2010, 7 pm
Instructor: G. Monette
Duration: 2 hours
Instructions:

No aids are allowed except a non-programmable calculator. Any device that can
establish a wireless or telephone connection is not allowed.

The marks for each question are shown in brackets. They sum to 210 but you will be
marked out of 200.

For multiple choice questions, clearly circle the letter corresponding to your answer. If
you change your mind, be sure to cross out the incorrect answer clearly and circle the
new answer. Make sure that your choice is not ambiguous, otherwise you will not
receive a mark for the question.

If you don't have enough space for your answer, please continue overleaf. Indicate
clearly with an arrow that your answer continues overleaf.
DO NOT OPEN
until an invigilator announces the start of the exam.
1.
A study investigated whether there was a higher risk of complications for a type of
colorectal surgery performed at a large urban hospital, University Hospital, versus a
smaller regional hospital, County Hospital. 400 patients who had the surgery at County
Hospital and 2,000 patients who had the surgery at University Hospital were studied. The
following table summarizes the number of cases with 'complications' in each group:
                      Complications   No complications   Total
County Hospital                  12                388     400
University Hospital             160               1840    2000
Total                           172               2228    2400
a)
[4] Find the rate of cases with complications in each group.
County Hospital __________ University Hospital___________
b)
[4] Which of the following is correct:
A. This is an observational study because there is not an equal number of patients at
each hospital.
B. This is an experimental study because each patient attends one and only one of
exactly two hospitals.
C. This is an observational study because each subject needs to be observed in order
to determine the value of the response variable, 'complications'.
D. None of the above.
c)
[10] The data suggest that it is safer (in the sense of a lower risk of complications) to
have surgery at County Hospital. Discuss whether this implies that a patient should
consider having surgery at County Hospital in order to reduce the risk of
complications. Identify at least one plausible confounding factor and one plausible
mediating factor that could partly explain the results of the study.
d)
[10] How could you design a study that would establish which hospital a patient
should choose to lower the risk of complications? What obstacles would you
encounter in carrying out your design?
2.
The figure below shows four scatterplots: A, B, C and D.
[20] For each of the four scatterplots, identify whether a simple linear regression would be
appropriate and briefly support your answer. Continue your answer overleaf if necessary.
A)
B)
C)
D)
3.
The figure below shows four scatterplots.
a)
[5] Of the following possible values for correlation:
a) +1, b) –1, c) 0, d) +0.97, e) –0.97, f) +0.7, g) –0.7, h) +0.1, i) –0.1
the correct correlation for each scatterplot is:
A)__________ B) __________ C) __________ D) __________
b)
[2] In panel C, y is a z-score for weight and x is a z-score for height. If we used the
original units, namely kilograms and centimetres, would the correlation go up, go
down or stay the same?
4.
[10] Clearly define 2 of the following 3 terms: a) non-response bias, b) response bias,
c) selection bias.
5.
In a certain statistics course, 30% of the students got an A on the midterm. Of those who got
an A, 70% got an A on the course and of those who did not get an A on the midterm, 20%
got an A on the course.
a)
[10] What proportion of the class got an A on the course? Show your work on the
opposite page.
b)
[10] If a student selected at random has an A on the course, what is the probability that
they got an A on the midterm?
6.
[6] For each of the following measures, give the value that implies no difference between
the groups being compared:
a) Relative risk __________
b) Odds ratio __________
c) Percent increase in risk __________
7.
The following table classifies employees in a large government department by gender and
according to whether they have been promoted in the last two years.
          Not promoted   Promoted
Female             800        200
Male              1500        500
a)
[5] What is the risk of promotion for female employees?
b)
[5] What is the relative risk of promotion for females as compared with males?
c)
[5] What is the odds ratio for promotion for females as compared with males?
d)
[5] The following figure shows output from Rcmdr analyzing the table above.
How would you describe the evidence for a relationship between gender and the
probability of promotion?
e)
[5] Is there conclusive evidence of discrimination in promotions against women?
Discuss briefly.
8.
[4] Suppose you were to read about a study showing that smokers have twice as much risk
of kidney disease as non-smokers. Can you conclude that smoking causes a higher risk of
kidney disease?
A. No, because the result was clearly based on an observational study.
B. Yes, because the result was clearly based on a randomized experiment.
C. The answer depends on whether the research was based on an observational study
or a randomized experiment, and it isn't obvious which was used.
D. No, because the baseline risk of kidney disease is not given.
9.
[4] Which of the following is true about the margin of error for the most common types of
surveys?
A. If the number of individuals in the sample were to be substantially increased, the
margin of error would decrease.
B. If the number of individuals in the sample were to be substantially decreased, the
margin of error would also decrease.
C. If the number of individuals in the sample were to be substantially increased, the
margin of error would not change.
D. If the number of individuals in the sample were to be substantially decreased, the
margin of error would not change.
10.
[4] Suppose a poll using a sample of 900 individuals found that 25% of them supported a
particular opinion. An interval that is 95% certain to contain the truth about the population
percent that support that opinion is:
A. 25% ± 0.11%
B. 25% ± 3.33%
C. 25% ± 6.67%
D. 25% ± 0.22%
11.
[4] Simpson's Paradox occurs when
A. No baseline risk is given, so it is not known whether or not a high relative risk has
practical importance.
B. A confounding variable rather than the explanatory variable is responsible for a
change in the response variable.
C. The direction of the relationship between two variables changes when the
categories of a confounding variable are taken into account.
D. The results of a test are statistically significant but are really due to chance.
12.
[4] An outlier is a data value that
A. is not consistent with the bulk of the data.
B. equals the maximum value in a set of data.
C. is significantly larger than the maximum value in a set of data.
D. is larger than 1 million.
13.
[4] If an exam was worth 100 points, and your score was at the 85th percentile, then
A. 85% of the class had scores at or above your score.
B. your score was 85 out of 100.
C. 15% of the class had scores at or above your score.
D. 15% of the class had scores at or below your score.
14.
[4] If the results of a hypothesis test give a p-value of 0.09, then, with a significance level
(alpha) = .05, the results are said to be
A. Not statistically significant because p ≥ alpha
B. Statistically significant because p ≥ alpha
C. Of practical importance because p ≥ alpha
D. Not of practical importance because p ≥ alpha
15.
[4] The likelihood that a statistic would be as extreme or more extreme than what was
observed, assuming the null hypothesis to be valid, is called a
A. statistically significant result
B. test statistic
C. significance level
D. p-value
16.
[4] In hypothesis testing, a Type 1 error occurs when:
A. The null hypothesis is not rejected when the alternative hypothesis is true.
B. The null hypothesis is rejected when the alternative hypothesis is true.
C. The null hypothesis is not rejected when the null hypothesis is true.
D. The null hypothesis is rejected when the null hypothesis is true.
17.
The following figures show a scatterplot and Rcmdr output for an analysis of the
relationship between a 'prestige' rating and the mean education of 102 occupations studied
in a Canadian census.
a)
[5] What is the correlation between prestige and mean education?
b)
[5] Is there evidence of a relationship between prestige and mean education?
c)
[5] Is it reasonable to use a simple linear regression to investigate this
relationship? Why or why not?
d)
[5] One occupation has a prestige rating of 48.5 and a mean education of 8.2.
What is the prestige rating predicted by the regression equation for this
occupation and what is the residual?
18.
[4] Michael wants to take Urdu or Japanese, or both. But classes are closed, and he must
apply and get accepted to be allowed to enrol in a language class. He has a 50% chance of
being admitted to Urdu, a 40% chance of being admitted to Japanese, and a 10% chance of
being admitted to both Urdu and Japanese. If he applies to both Urdu and Japanese, the
probability that he will be enrolled in Urdu or Japanese (or possibly both) is
A. 50%
B. 80%
C. 90%
D. 100%
19.
[4] A test to detect tuberculosis had a sensitivity of 95%. This means that
A. 95% of people who test positive will actually have tuberculosis.
B. 95% of people with tuberculosis will test positive.
C. 95% of people who do not have tuberculosis will test negative.
D. 95% of people who test negative will actually not have tuberculosis.
20.
[4] A test to detect tuberculosis had a specificity of 90%. This means that
A. 90% of people who test positive will actually have tuberculosis.
B. 90% of people with tuberculosis will test positive.
C. 90% of people who do not have tuberculosis will test negative.
D. 90% of people who test negative will actually not have tuberculosis.
21.
[4] The Canadian Quidditch Association (CQA) randomly tests Quidditch players for
illegal drugs. John has been randomly picked for the test and has tested positive. To
calculate the probability that John takes illegal drugs, the CQA needs to know:
A. the sensitivity of the test
B. the specificity and the sensitivity of the test
C. the prevalence of illegal drug taking, the specificity and the sensitivity of the test
D. the prevalence of illegal drug taking and the sensitivity of the test
22.
[4] A test to detect prostate cancer in men had a specificity of 80%. This means that
A. 80% of the men who test positive will actually have prostate cancer.
B. 80% of the men with prostate cancer will test positive.
C. 80% of the men who do not have prostate cancer will test negative.
D. 80% of the men who test negative will actually not have prostate cancer.
23.
[4] A coffee and donut shop company holds a contest in which a prize may be revealed on
the inside of the rim of a paper coffee cup. The probability that each cup reveals a prize is
0.1 and winning is independent from one cup to the next. What is the probability that a
customer must roll up three or more rims before winning a prize?
A. (.1)(.1)(.9) = .009
B. (.9)(.9)(.1) = .081
C. (.9)(.9) = .81
D. 1 – (.1)(.1)(.9) = .991
24.
[4] A medical treatment has a success rate of .8. Two patients will be treated with this
treatment. Assuming the results are independent for the two patients, what is the probability
that neither one of them will be successfully cured?
A. .64
B. .36
C. .4
D. .04
25.
[10] P-values are almost universally used in scientific research to evaluate evidence against
null hypotheses. Criminal courts that assess evidence under the assumption that the
accused is innocent follow a procedure that appears formally very similar to testing a null
hypothesis. Why can the use of p-values in court testimony lead to unfair convictions?
END