Practice Midterm I

Problems from Chapter 1

Consider the data table:

Id   Sex   Age   Eye Color
1    F     23    Brown
2    M     21    Blue

1.1 How many records are in this table?
Ans: 2

1.2 How many quantitative variables are in this table?
Ans: 1 (Age)

1.3 How many categorical variables are in this table?
Ans: 2 (Sex and Eye Color; Id is an identifier)

Problems from Chapter 2

2.1 Fill in the missing relative frequencies:

Class     Frequency   Relative Frequency
First     325         14.77%
Second    285         12.95%
Third     706         32.08%
Crew      885         40.21%
Total:    2201        100%

Problems from Chapter 3

3.1 For each property, which of the histograms have that property?

[Figure: histograms A, B, C, and D]

(a) bimodal      Ans: B
(b) skew right   Ans: C
(c) skew left    Ans: D
(d) unimodal     Ans: A, C, D
(e) symmetric    Ans: A

3.2 Find the median of the numbers 1, 3, 5, 7, 9, 100.
Solution: Since there is an even number of values here (six), we take the average of the middle two: (5 + 7)/2 = 6.

3.3 What is the range of the data in problem 3.2?
Solution: The range is the largest minus the smallest value in the data: 100 - 1 = 99.

3.4 What are the first and third quartiles, Q1 and Q3, for the data in problem 3.2?
Solution: Q1 is the median of the lower half of the data, 1, 3, 5, which in this case is 3. Q3 is the median of the upper half of the data, 7, 9, 100, which in this case is 9.

3.5 Find the IQR for the data in problem 3.2.
Solution: IQR = Q3 - Q1 = 9 - 3 = 6

3.6 Find the mean of the data in problem 3.2.
Solution: (1 + 3 + 5 + 7 + 9 + 100)/6 = 125/6 ≈ 20.83

3.7 Describe the differences between a bar chart and a histogram.
Answer: There are two differences: one is in the type of data that is presented, and the other is in the way they are drawn. Bar charts are usually used to display categorical data, that is, data that fits into categories. Histograms, on the other hand, are usually used to present quantitative data, that is, data that represents a measured quantity where, at least in theory, the numbers can take on any value in a certain range. Histograms are drawn with no spaces between adjacent rectangles, while bar charts are drawn with spaces between the rectangles.
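The summary statistics asked for in problems 3.2 through 3.6 can be checked in R. The following is a minimal sketch, not part of the original exam: the variable names are illustrative, and the quartile step assumes an even number of values so that the data splits cleanly into a lower and an upper half, as in problem 3.4.

```r
# Data from problem 3.2
x <- c(1, 3, 5, 7, 9, 100)

med <- median(x)          # average of the two middle values when n is even
rng <- max(x) - min(x)    # largest value minus smallest value
avg <- mean(x)            # sum of the values divided by how many there are

# Quartiles as medians of the lower and upper halves (the method in 3.4).
# Assumption: n is even, so the halves do not share a middle value.
s <- sort(x)
n <- length(s)
lower <- s[1:(n / 2)]
upper <- s[(n / 2 + 1):n]
q1 <- median(lower)
q3 <- median(upper)
iqr <- q3 - q1

print(c(median = med, range = rng, mean = avg, Q1 = q1, Q3 = q3, IQR = iqr))
```

The halves are computed by hand here because R's built-in quantile() and IQR() interpolate between data values by default, so on a small data set like this one they may not match the median-of-halves quartiles.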
Problems from Chapter 4

4.1 Which of the following could not be the cause of an erroneous outlier?
(a) Transposing digits
(b) Excluding extreme values from the data
(c) Confusion about units
(d) Error entering data
(e) Cheating ← Answer

Problems from Chapter 5

5.1 Suppose that the mean is [value missing] and the standard deviation is [value missing]. What is the z-score of 9?
Solution: z = (9 - mean) / (standard deviation) = [value missing]

5.2 An SAT test has a mean of 620 with a standard deviation of 90, while an ACT test has a mean of 20 with a standard deviation of 5. If a student scores 780 on the SAT and 30 on the ACT, which score is more impressive?
Solution: We need to z-transform both scores to compare them.
SAT: z = (780 - 620)/90 ≈ 1.78
ACT: z = (30 - 20)/5 = 2.00
The ACT score was a little more impressive.

5.3 When is it appropriate to model data with a normal distribution?
(A) The distribution is bimodal and strongly left-skewed.
(B) The distribution is multimodal and symmetric.
(C) The distribution is unimodal and strongly right-skewed.
(D) The distribution is unimodal and symmetric. ← Answer

5.4 About what percentage of normally distributed data falls within one sigma of the mean?
Answer: 68%

5.5 About what percentage of normally distributed data falls within two sigmas of the mean?
Answer: 95%

5.6 About what percentage of normally distributed data falls within three sigmas of the mean?
Answer: 99.7%

Problems from Chapter 6

6.1 Which of the following is not an assumption one should make when doing a linear regression?
(a) The variables must be quantitative.
(b) The variables should have the same variance. ← Answer
(c) The scatterplot of the variables should be relatively straight.
(d) There are no strong outliers.

6.2 Correlation measures the strength of the linear association between two variables. (True or False)
Answer: True

6.3 Correlation implies causation. (True or False)
Answer: False!

6.4 When does one obtain a correlation r = -1 between two variables, x and y?
Answer: When all the (x, y) pairs are on a line with a negative slope.

6.5 What kind of variable may cause both variables in our model to be highly correlated?
Answer: a lurking variable

Problems from Chapter 7

7.1 Suppose that you obtain a correlation coefficient r = [value missing]. About what percentage of the variation of the response variable is explained by this linear regression model?
Solution: The square of the correlation coefficient gives the fraction of the variation explained. Since r^2 ≈ 0.792, this model explains 79.2% of the variance.

7.2 Suppose you do a linear regression and obtain the line [equation missing]. If one of your data points is [value missing], what is the residual of this data point?
Solution: Our linear model predicts a value of [value missing]. But the actual y-value for this data point is [value missing]. The residual is the actual value minus the predicted value, which is thus [value missing].

7.3 What sort of residual plot does one expect to see if the data is truly linear?
Answer: The residuals should scatter randomly around zero with no visible pattern. A good model will have the spread of the residuals consistent and small.

7.4 Suppose that you want to perform a linear regression on some data with a number of x values and corresponding y values. If the standard deviation of the x values is 7, the standard deviation of the y values is 20, and the correlation of x with y is 0.7, then what is the slope of the regression line?
Solution: slope = r × (s_y / s_x) = 0.7 × 20/7 = 2

7.5 Continuing from 7.4, assume that the average of all the x values is 40 and the average of all the y values is 77. Find the y-intercept of the regression line.
Solution: intercept = (mean of y) - slope × (mean of x) = 77 - 2 × 40 = -3

Problems from Chapter 8

8.1 A data point whose x-value is far from the mean of the rest of the x-values is said to have high ____.
Answer: leverage

8.2 A point is ______ if omitting it from the analysis changes the model enough to make a meaningful difference.
Answer: influential

8.3 Basing a regression on summary statistics such as means and medians, rather than on the individual data values, tends to inflate the impression of the strength of the linear relationship. (True or False)
Answer: True

Problems from Chapter 9

9.1 What R command can you use to randomly simulate the results of 1000 die rolls?
Answer: sample(1:6, 1000, replace=TRUE)

9.2 Suppose event A happens 20% of the time, B happens 30% of the time, and C happens 50% of the time.
What R command creates a sample of 1000 A's, B's, and C's with these probabilities?
Answer: sample(c("A", "B", "C"), 1000, replace=TRUE, prob=c(.2, .3, .5))

Problems from Chapter 10

10.1 When obtaining a sample from a population, which is more important: to select a representative sample or to select a large sample?
Answer: to select a representative sample

10.2 Suppose that a population is 70% women and 30% men. If your sampling frame consists of 150 women and 200 men, for stratified sampling, what is the maximum number of women and men you should randomly select for your survey?
Answer: 147 women and 63 men. The sample must keep the 7-to-3 ratio of the population, and the 150 women in the frame are the limiting group: 147 is the largest multiple of 7 that does not exceed 150, which pairs with 63 men for a total of 210.

10.3 If you survey every third student in a class, this is an example of _________ sampling.
Answer: systematic

10.4 If you randomly select a Math B22 class, and survey every student in the class, this is an example of _________ sampling.
Answer: cluster

10.5 If you randomly select 100 students from all students currently taking Math B22, this is an example of __________ sampling.
Answer: simple random

10.6 A sampling scheme that uses more than one sampling method is called _________________.
Answer: multistage sampling

10.7 A small trial run of a survey that you will eventually give to a larger group is called a _________.
Answer: pilot

10.8 If you find serious bias in a survey you've conducted, will it help to increase the size of your sample?
Answer: no

Problems from Chapter 11

11.1 Is it easy to determine cause and effect relationships from observational studies and from retrospective studies?
Answer: no

11.2 An experiment requires random assignment of subjects to _________.
Answer: treatments

11.3 Is it possible to determine cause and effect relationships by conducting experiments?
Answer: yes

11.4 Explanatory variables are called _______.
Answer: factors

11.5 In experiments, control what you can, and _______ the rest.
Answer: randomize

11.6 In experimental design, _________ plays the role that stratifying plays in survey design.
Answer: blocking

11.7 A difference is called statistically ___________ if the difference is much greater than what one would expect from random chance.
Answer: significant

11.8 When neither the subject nor the experimenter knows which treatment is given, this is called ___.
Answer: double-blinding

11.9 A substance given to a subject that looks like the drug being tested but in reality is a sugar pill is called a _______.
Answer: placebo

Problems from Chapter 12

12.1 Suppose that events A and B are disjoint, and P(A) = [value missing], P(B) = [value missing]. Then P(A or B) = ____.
Solution: Since A and B are disjoint, P(A or B) = P(A) + P(B) = [value missing].

12.2 Suppose that events A and B are independent, and P(A) = [value missing], P(B) = [value missing]. Then P(A and B) = ____.
Solution: Since A and B are independent, P(A and B) = P(A) × P(B) = [value missing].

12.3 Suppose that events A and B are disjoint. Can they also be independent?
Solution: No. When events are truly independent, one gives you zero information about the other, in the sense that knowing that A occurred gives you no information about whether or not B occurred. But if the events are disjoint, and you know that A occurred, then you know that B did not occur. Since A occurring gives you information about whether or not B occurred, A and B can't be independent.

12.4 Consider the game where you throw two dice. What is the probability that you throw an odd number on the first die and a 5 or 6 on the second?
Solution: The probability of throwing an odd number on the first die is P(odd) = 3/6 = 1/2. For the second die, P(5 or 6) = 2/6 = 1/3. Since these two events are independent, we have: P(odd and 5 or 6) = 1/2 × 1/3 = 1/6.

12.5 Consider the game described in 12.4. What is the probability that you don't throw an odd number on the first die and a 5 or 6 on the second?
Solution: This is the complement of what we calculated in problem 12.4. Hence the probability is 1 - 1/6 = 5/6.
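The probabilities in problems 12.4 and 12.5 can be checked by listing every outcome. This is a minimal sketch rather than part of the original exam: it assumes the game uses two fair six-sided dice, and the names rolls, event, p_event, and p_complement are illustrative.

```r
# Assumption: two fair six-sided dice, so all 36 outcomes are equally likely
rolls <- expand.grid(first = 1:6, second = 1:6)

# Event from 12.4: odd number on the first die and a 5 or 6 on the second
event <- (rolls$first %% 2 == 1) & (rolls$second >= 5)

# With equally likely outcomes, the probability is the fraction of
# outcomes that belong to the event
p_event <- mean(event)

# Complement rule, as used in 12.5
p_complement <- 1 - p_event

print(c(p_event = p_event, p_complement = p_complement))
```

Counting outcomes this way does not rely on the independence argument, so it serves as a separate check on the product rule used in 12.4.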