Practice Midterm I
Problems from Chapter 1
Consider the data table:
Id   Sex   Age   Eye Color
1    F     23    Brown
2    M     21    Blue
1.1 How many records are in this table?
Ans: 2
1.2 How many quantitative variables are in this table?
Ans: 1
1.3 How many categorical variables are in this table?
Ans: 2
Problems from Chapter 2
2.1 Fill in the missing relative frequencies:
Class     Frequency   Relative Frequency
First     325         14.77%
Second    285         12.95%
Third     706         32.08%
Crew      885         40.21%
Total:    2201        100%
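Each relative frequency is the class frequency divided by the total, 2201. A minimal R sketch of that arithmetic follows; the variable names freq and rel_pct are illustrative and not part of the original problem.

```r
# Frequencies from the table in problem 2.1
freq <- c(First = 325, Second = 285, Third = 706, Crew = 885)

# Relative frequency = class frequency / total, shown as a percentage
rel_pct <- round(100 * freq / sum(freq), 2)
rel_pct
```

Because each percentage is rounded to two decimals, the rounded values can sum to slightly more or less than 100.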
Problems from Chapter 3
3.1 For each property, which of the histograms have that property?
[Figure: four histograms, labeled A, B, C, and D]
(a) bimodal      Ans: B
(b) skew right   Ans: C
(c) skew left    Ans: D
(d) unimodal     Ans: A, C, D
(e) symmetric    Ans: A
3.2 Find the median of the numbers 1, 3, 5, 7, 9, 100.
Solution: Since there is an even number of values here, we take the average of the middle two: (5 + 7)/2 = 6
3.3 What is the range of the data in problem 3.2?
Solution: The range is the largest minus the smallest value in the data: 100 – 1 = 99
3.4 What are the first and third quartiles, Q1 and Q3, for the data in problem 3.2?
Solution: Q1 is the median of the lower half of the data, 1, 3, 5, which in this case is 3.
Q3 is the median of the upper half of the data, 7, 9, 100, which in this case is 9.
3.5 Find the IQR for the data in problem 3.2.
Solution: IQR = Q3 – Q1 = 9 – 3 = 6
3.6 Find the mean of the data in problem 3.2.
Solution: (1 + 3 + 5 + 7 + 9 + 100)/6 = 125/6 ≈ 20.83
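The summary statistics in problems 3.2 through 3.6 can be checked with a short R sketch. It computes the quartiles as medians of the lower and upper halves, which is the rule used in the solutions above. R's built-in quantile() and IQR() use a different interpolation rule by default and can give slightly different quartiles. All variable names here are illustrative.

```r
x <- c(1, 3, 5, 7, 9, 100)

x_sorted <- sort(x)
n <- length(x_sorted)
half <- n %/% 2   # size of each half; the middle value is left out when n is odd

lower_half <- x_sorted[1:half]
upper_half <- x_sorted[(n - half + 1):n]

med <- median(x_sorted)                       # average of the middle two values
data_range <- max(x_sorted) - min(x_sorted)   # largest minus smallest
q1 <- median(lower_half)                      # median of the lower half
q3 <- median(upper_half)                      # median of the upper half
iqr <- q3 - q1
avg <- mean(x_sorted)
```

The single large value, 100, pulls the mean well above the median, while the median and IQR are barely affected by it.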
3.7 Describe the differences between a bar chart and a histogram.
Answer: There are two differences: one is in the type of data presented, and the other is in the way
they are drawn. Bar charts are usually used to display categorical data, that is, data that fits into
categories. Histograms, on the other hand, are usually used to present quantitative data, that is, data
that represents a measured quantity where, at least in theory, the numbers can take on any value in a
certain range. Histograms are drawn with no spaces between adjacent rectangles, while bar charts are
drawn with spaces between the rectangles.
Problems from Chapter 4
4.1 Which of the following could not be the cause of an erroneous outlier?
(a) Transposing digits
(b) Excluding extreme values from the data ← Answer
(c) Confusion about units
(d) Error entering data
(e) Cheating
Problems from Chapter 5
5.1 Suppose that [value missing in source] and [value missing in source]. What is the z-score of 9?
Solution: [equation missing in source]
5.2 An SAT test has a mean of 620 with a standard deviation of 90, while an ACT test has a mean of
20 with a standard deviation of 5. If a student scores 780 on the SAT and 30 on the ACT, which
score is more impressive?
Solution: We need to z-transform both scores to compare them.
SAT: z = (780 – 620)/90 ≈ 1.78
ACT: z = (30 – 20)/5 = 2
The ACT score was a little more impressive.
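The comparison in problem 5.2 can be sketched in R. A z-score is the distance of a value from the mean, measured in standard deviations. The helper name z_score is hypothetical, not a built-in R function.

```r
# z-score: how many standard deviations a value lies from the mean
z_score <- function(value, mean, sd) (value - mean) / sd

z_sat <- z_score(780, mean = 620, sd = 90)   # 160/90
z_act <- z_score(30, mean = 20, sd = 5)      # 10/5

# The larger z-score is the more impressive result
c(SAT = z_sat, ACT = z_act)
```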
5.3 When is it appropriate to model data with a normal distribution?
(A) The distribution is bimodal and strongly left-skewed.
(B) The distribution is multimodal and symmetric.
(C) The distribution is unimodal and strongly right-skewed.
(D) The distribution is unimodal and symmetric. ← Answer
5.4 About what percentage of normally distributed data falls within one sigma of the mean?
Answer: 68%
5.5 About what percentage of normally distributed data falls within two sigmas of the mean?
Answer: 95%
5.6 About what percentage of normally distributed data falls within three sigmas of the mean?
Answer: 99.7%
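The 68%, 95%, and 99.7% figures can be reproduced from the standard normal distribution with R's pnorm function. This is a minimal sketch; the helper name within_sigmas is hypothetical.

```r
# Fraction of a normal distribution lying within k standard deviations of the mean
within_sigmas <- function(k) pnorm(k) - pnorm(-k)

pct <- 100 * sapply(1:3, within_sigmas)
round(pct, 1)   # 68.3 95.4 99.7
```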
Problems from Chapter 6
6.1 Which of the following is not an assumption one should make when doing a linear regression?
(a) The variables must be quantitative.
(b) The variables should have the same variance. ← Answer
(c) The scatterplot of the variables should be relatively straight.
(d) There are no strong outliers.
6.2 Correlation measures the strength of the linear association between two variables. (True or False)
Answer: True
6.3 Correlation implies causation. (True or False)
Answer: False!
6.4 When does one obtain a correlation r = –1 between two variables, x and y?
Answer: When all the (x, y) pairs are on a line with a negative slope.
6.5 What kind of variable may cause both variables in our model to be highly correlated?
Answer: a lurking variable
Problems from Chapter 7
7.1 Suppose that you obtain a correlation coefficient r = [value missing in source]. About what
percentage of the variation of the response variable is explained by this linear regression model?
Solution: The square of the correlation coefficient gives the fraction of the variation
explained. Since r² ≈ 0.792, this model explains 79.2% of the variance.
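7.2 Suppose you do a linear regression and obtain the line [equation missing in source]. If one of
your data points is [point missing in source], what's the residual of this data point?
Solution: Our linear model predicts a value of [value missing in source]. But the actual y-value for
this data point is [value missing in source]. The residual is thus [value missing in source].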
7.3 What sort of residual plot does one expect to see if the data is truly linear?
Answer: A patternless scatter of points around zero. A good model will have the spread of the
residuals consistent and small.
7.4 Suppose that you want to perform a linear regression on some data with a number of x values and
corresponding y values. If the standard deviation of the x values is 7 and the standard deviation of the
y values is 20, and the correlation of x with y is 0.7, then what is the slope of the regression line?
Solution: slope = r × (standard deviation of y)/(standard deviation of x) = 0.7 × 20/7 = 2
7.5 Continuing from 7.4, assume that the average of all the x values is 40 and the average of all the y
values is 77. Find the y-intercept of the regression line.
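Solution: The slope from 7.4 is 0.7 × 20/7 = 2, and the regression line passes through the point of
averages, so intercept = (mean of y) – slope × (mean of x) = 77 – 2 × 40 = –3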
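Problems 7.4 and 7.5 build the regression line from summary statistics alone. A minimal R sketch of that calculation follows, using the values given in the problems. The variable names are illustrative.

```r
r     <- 0.7   # correlation of x with y
sd_x  <- 7     # standard deviation of the x values
sd_y  <- 20    # standard deviation of the y values
x_bar <- 40    # mean of the x values
y_bar <- 77    # mean of the y values

# Slope: correlation times the ratio of standard deviations
slope <- r * sd_y / sd_x

# Intercept: the line passes through the point of averages (x_bar, y_bar)
intercept <- y_bar - slope * x_bar

c(slope = slope, intercept = intercept)
```

As a check, plugging x = 40 into the fitted line, –3 + 2 × 40, returns the mean of y, 77.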
Problems from Chapter 8
8.1 A data point whose x-value is far from the mean of the rest of the x-values
is said to have high ____.
Answer: leverage
8.2 A point is ______ if omitting it from the analysis changes the model enough
to make a meaningful difference.
Answer: influential
8.3 Working with summary statistics such as the mean and median, rather than the individual data
values, tends to inflate the impression of the strength of the linear relationship. (True or False)
Answer: True
Problems from Chapter 9
9.1 What R command can you use to randomly simulate the results of 1000 die rolls?
Answer: sample(1:6, 1000, replace=TRUE)
9.2 Suppose event A happens 20% of the time, B happens 30% of the time, and C happens 50% of the
time. What R command creates a sample of 1000 A’s, B’s, C’s with these probabilities?
Answer: sample(c("A", "B", "C"), 1000, replace=TRUE, prob=c(.2, .3, .5))
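One way to sanity-check a simulation like the one in problem 9.2 is to tabulate the draws and compare the observed proportions with the requested probabilities. In this sketch the outcome labels are written as quoted strings so that R treats them as values rather than variable names. The seed and the names draws and props are arbitrary choices, not part of the problem.

```r
set.seed(1)   # arbitrary seed, only so the run is repeatable

# 1000 draws of "A", "B", "C" with probabilities 0.2, 0.3, 0.5
draws <- sample(c("A", "B", "C"), 1000, replace = TRUE, prob = c(0.2, 0.3, 0.5))

# Observed proportion of each outcome; these should be near 0.2, 0.3, 0.5
props <- table(draws) / length(draws)
props
```

The observed proportions will not match the probabilities exactly, but with 1000 draws they should land within a few percentage points.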
Problems from Chapter 10
10.1 When obtaining a sample from a population, which is more important – to select a representative
sample or to select a large sample?
Answer: to select a representative sample
10.2 Suppose that a population is 70% women and 30% men. If your sampling frame consists of 150
women and 200 men, for stratified sampling, what is the maximum number of women and men you
should randomly select for your survey?
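Answer: 147 women and 63 men. The 150 women in the frame are the limiting group: the total can be at
most 150/0.7 ≈ 214, and the largest such total that splits exactly 70/30 is 210.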
10.3 If you survey every third student in a class, this is an example of _________ sampling.
Answer: systematic
10.4 If you randomly select a Math B22 class, and survey every student in the class, this is an example
of _________ sampling.
Answer: cluster
10.5 If you randomly select 100 students from all students currently taking Math B22, this is an
example of __________ sampling.
Answer: simple random
10.6 A sampling scheme that uses more than one sampling method is called _________________.
Answer: multistage sampling
10.7 A small trial run of a survey that you will eventually give to a larger group is called a _________.
Answer: pilot
10.8 If you find serious bias in a survey you've conducted, will it help to increase the size of your
sample?
Answer: no
Problems from Chapter 11
11.1 Is it easy to determine cause and effect relationships from observational studies and from
retrospective studies?
Answer: no
11.2 An experiment requires random assignments of subjects to _________ .
Answer: treatments
11.3 Is it possible to determine cause and effect relationships from conducting experiments?
Answer: yes
11.4 Explanatory variables are called _______ .
Answer: factors
11.5 In experiments, control what you can, and _______ the rest.
Answer: randomize
11.6 In experimental design, _________ plays the role that stratifying accomplishes in survey design.
Answer: blocking
11.7 A difference is called statistically ___________ if the difference is much greater than what one
would expect from random chance.
Answer: significant
11.8 When neither the subject nor the experimenter knows which treatment is given, this is called ___.
Answer: double-blinding
11.9 A substance given to a subject that looks like the drug being tested but in reality is a sugar pill
is called a _______.
Answer: placebo
Problems from Chapter 12
12.1 Suppose that events A and B are disjoint, and [probabilities missing in source]. Then
[expression missing in source].
Solution: Since A and B are disjoint, [equation missing in source].
12.2 Suppose that events A and B are independent, and [probabilities missing in source]. Then
[expression missing in source].
Solution: Since A and B are independent, [equation missing in source].
12.3 Suppose that events A and B are disjoint. Can they also be independent?
Solution:
No. The reason is that when events are truly independent, one gives you zero information about
the other, in the sense that knowing that A occurred gives you no information about whether or
not B occurred. But if the events are disjoint, and you know that A occurred, then you know
that B did not occur. Since A occurring gives you information about whether or not B occurred,
A and B can't be independent.
12.4 Consider the game where you throw two dice.
What is the probability that you throw an odd number on the first die and a 5 or 6 on the second?
Solution:
The probability of throwing an odd number on the first die is 3/6 = 1/2.
For the second die, the probability of throwing a 5 or 6 is 2/6 = 1/3.
Since these two events are independent, we have:
P(odd on first and 5 or 6 on second) = 1/2 × 1/3 = 1/6
12.5 Consider the game described in 12.4. What is the probability that you don't throw an odd
number on the first die and a 5 or 6 on the second?
Solution:
This is the complement of what we calculated in problem 12.4, which has probability 1/2 × 1/3 = 1/6.
Hence the probability is 1 – 1/6 = 5/6.
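The answers to 12.4 and 12.5 can be checked by listing all 36 equally likely outcomes of two fair six-sided dice and counting the ones that qualify. This is a sketch under that fairness assumption. The names rolls, hit, p_hit, and p_miss are illustrative.

```r
# All 36 equally likely (first, second) outcomes for two fair dice
rolls <- expand.grid(first = 1:6, second = 1:6)

# Event from 12.4: odd on the first die AND a 5 or 6 on the second
hit <- (rolls$first %% 2 == 1) & (rolls$second %in% c(5, 6))

p_hit  <- mean(hit)    # fraction of outcomes in the event
p_miss <- 1 - p_hit    # complement, as in 12.5

c(p_hit = p_hit, p_miss = p_miss)
```

Six of the 36 outcomes satisfy the event, which matches the product rule for independent events.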