Statistics Team: Question #1
March Statewide Invitational
The quartile coefficient of dispersion (QCD) is a descriptive measure of dispersion that is used to make comparisons both
within and between data sets or probability distributions. The QCD is easily computed from the quartiles of the data set
or probability distribution using the formula $QCD = \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$. Note: it is possible for the QCD to be undefined in the case
when the first and third quartiles are opposites of each other. Compute the QCD for each of the following:
A) The data set of random digits {7, 3, 1, 4, 9, 6, 8, 5, 0, 2} as a simplified fraction.
B) The discrete uniform distribution for the set of digits {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} as a simplified fraction.
C) A Normal distribution with mean of 100 and a standard deviation of 20 rounded to the nearest thousandth.
D) The Standard Normal distribution.
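The QCD formula can be sketched in a few lines of Python. This is only an illustration: the function name `qcd`, the zero tolerance, and the sample inputs are assumptions rather than part of the question, and the quartiles are supplied by the caller because quartile conventions for small data sets vary.

```python
from fractions import Fraction
from statistics import NormalDist


def qcd(q1, q3):
    """Return (Q3 - Q1) / (Q3 + Q1), or None when Q3 + Q1 is zero."""
    total = q3 + q1
    # The tolerance is an assumption so that float inputs are handled sensibly.
    if abs(total) < 1e-12:
        return None
    return (q3 - q1) / total


# Hypothetical exact quartiles, kept as fractions.
print(qcd(Fraction(2), Fraction(6)))  # 1/2

# Quartiles of a hypothetical Normal(50, 10) distribution.
dist = NormalDist(mu=50, sigma=10)
print(qcd(dist.inv_cdf(0.25), dist.inv_cdf(0.75)))

# Quartiles that are opposites of each other leave the QCD undefined.
print(qcd(-1.5, 1.5))  # None
```

Passing `Fraction` values keeps the result exact, which suits the parts that ask for a simplified fraction.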
Statistics Team: Question #2
March Statewide Invitational
What are the degrees of freedom for each of the following inference procedures?
A) A one-sample t-test comparing the sample mean of a SRS of 25 test scores to a population mean.
B) Regression inference performed on a set of 10 pre-tests paired with a set of 10 post-tests.
C) A chi-square goodness-of-fit test assessing the fairness of a random digit generator.
D) An independent two-sample t-test comparing the mean of a SRS of 25 test scores to a SRS of 20 test scores
using the conservative approach.
Statistics Team: Question #3
March Statewide Invitational
Which of the following statements are true regarding the sum of a finite set of independent normally distributed random
variables? Answer each part as 0 for “False” or 1 for “True.”
A) The mean of their sum is the sum of their means.
B) The median of their sum is the sum of their medians.
C) The variance of their sum is the sum of their variances.
D) The standard deviation of their sum is the sum of their standard deviations.
Statistics Team: Question #4
March Statewide Invitational
Given the appropriate conditions (such as sufficiently large sample size or number of independent trials, or as the degrees
of freedom increases), the 68, 95, 99.7 rule (also known as the Empirical Rule) applies to which of the following
distributions? Answer each part as 0 for “No” or 1 for “Yes.”
A) The Normal Distribution
B) The Student’s t-Distribution
C) The Binomial Distribution
D) The Chi-Square Distribution
Statistics Team: Question #5
March Statewide Invitational
The distribution of a large data set is such that the odds of randomly selecting a data value from it with a positive Z-score
to that of randomly selecting a data value with negative Z-score are 4 to 1, and none of the Z-scores are equal to 0.
A) What is the probability of randomly selecting a data value from this set with a positive Z-score?
B) What is the probability of randomly selecting a data value from this set below the mean?
C) What is the probability of randomly selecting a data value from this set that is exactly equal to the mean?
D) Two data values are randomly selected, with replacement, from the set. What is the probability that exactly
one has a positive Z-score and the other has a negative Z-score in either order?
Statistics Team: Question #6
March Statewide Invitational
The classic role-playing game of Dungeons & Dragons uses a 4, 6, 8, 10, 12, and 20-sided die. All of the dice are
numbered in the usual way with 1 through n (for n sides) with the exception of the 10-sided die. It is numbered with the
digits 0 through 9. A player takes all 6 dice into his hands, shakes them up, and rolls them all simultaneously exactly one
time. Compute each of the following probabilities as simplified fractions. You may assume all dice are fair.
A) P(All 6 dice show the same side)
B) P(A product of 0 when the results of the 6 dice are multiplied together)
C) P(All dice show a prime number)
D) P(The sum on the six dice is 50 given that the result on the 10-sided die is 0)
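Probabilities like these can be checked by brute-force enumeration, since the six dice produce only 4 × 6 × 8 × 10 × 12 × 20 = 460,800 equally likely outcomes. The sketch below assumes fair, independent dice; the helper names `event_probability` and `standard_die` and the sample event are illustrative, not part of the question.

```python
from fractions import Fraction
from itertools import product


def standard_die(n):
    """Faces 1 through n."""
    return list(range(1, n + 1))


def event_probability(dice, event):
    """Exact probability that event(outcome) is true over all equally likely outcomes."""
    favorable = 0
    total = 0
    for outcome in product(*dice):
        total += 1
        if event(outcome):
            favorable += 1
    return Fraction(favorable, total)


# The six dice from the question; the 10-sided die carries the digits 0 through 9.
dnd_dice = [
    standard_die(4),
    standard_die(6),
    standard_die(8),
    list(range(10)),
    standard_die(12),
    standard_die(20),
]

# Hypothetical event: the 4-sided and 6-sided dice show the same number.
print(event_probability(dnd_dice, lambda o: o[0] == o[1]))  # 1/6
```

Any of the events in parts A through D can be expressed as a function of the outcome tuple and passed in the same way.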
Statistics Team: Question #7
March Statewide Invitational
Recall the six dice from question 6: 4, 6, 8, 10, 12, and 20-sided dice all numbered the usual way with 1 through n (for n
sides) with the exception of the 10-sided die. It is numbered with the digits 0 through 9. Compute each of the following.
A) The minimum possible sum on all six dice.
B) The maximum possible sum on all six dice.
C) The expected value of the sum on all 6 dice.
D) The variance of the sum on all 6 dice as a simplified fraction.
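For a sum of independent dice, the expected values add and so do the variances. A minimal sketch of that computation follows, assuming fair dice rolled independently; the function names are hypothetical and the example uses two ordinary 6-sided dice rather than the set in the question.

```python
from fractions import Fraction


def die_mean_and_variance(faces):
    """Exact mean and variance of one fair die with the given faces."""
    n = len(faces)
    mean = Fraction(sum(faces), n)
    variance = Fraction(sum(f * f for f in faces), n) - mean ** 2
    return mean, variance


def sum_mean_and_variance(dice):
    """Mean and variance of the total, assuming the dice are independent."""
    stats = [die_mean_and_variance(faces) for faces in dice]
    return sum(m for m, _ in stats), sum(v for _, v in stats)


# Two ordinary 6-sided dice as a small check.
d6 = list(range(1, 7))
print(sum_mean_and_variance([d6, d6]))  # (Fraction(7, 1), Fraction(35, 6))
```

The 10-sided die is handled by passing its actual faces, `list(range(10))`, instead of 1 through 10.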
Statistics Team: Question #8
March Statewide Invitational
The letters a, b, c, and d in the following two-way table each represent probabilities, not frequencies, for events X and Y.
          X      X^C
Y         a      c
Y^C       b      d
A) What is the sum of a, b, c, and d?
B) Compute the sum of the following statement numbers that are true if events X and Y are mutually exclusive:
1. a = 0
2. P(X or Y) = b + c
3. d = 1 – P(X or Y)
4. P(Y | X) = 0
5. Events X and Y are independent.
C) If events X and Y are both mutually exclusive and exhaustive, then a + d = ____.
D) Compute the sum of the following statement numbers that are true if events X and Y are independent:
1. a = (a + b)(a + c) and d = (c + d)(b + d)
2. b = P(X) P(Y^C) and c = [1 – P(Y^C)] [1 – P(X)]
3. If P(X) = P(Y), then b = c.
4. a ≠ 0
Statistics Team: Question #9
March Statewide Invitational
A vehicle manufacturing plant receives shipments from three different parts vendors. Lately, there has been a defect issue
with some of the electrical wiring in the vehicles manufactured at the plant and the plant manager fears that all of the
vendors are contributing equally to the defect issue. Hence, he cannot justify cancelling the business contract with any
one of them in favor of the others. He reviews a random sample of 500 quality assurance inspections from the last six
months to see if his fear is warranted. The data are summarized in the table below.
                        Part Quality
             Perfect   Acceptable   Rejected   Total
Vendor A        53         93          24       170
Vendor B        48         71          31       150
Vendor C        70         88          22       180
Total          171        252          77       500
Compute each of the following as either simplified fractions or decimals rounded to hundredths place.
A) P (A randomly selected rejected part came from Vendor B)
B) P (A randomly selected part that did not come from Vendor B was not rejected)
C) Assume all inference assumptions and conditions are met and perform the appropriate statistical test on the
data. Compute the value of the test statistic and add it to the degrees of freedom for the appropriate test.
D) Does the data provide statistically significant evidence at the 5% level to reject the plant manager’s fear? And
hence, can he justify cancelling the business contract with any one of the parts vendors? If so, which vendor
A, B, or C? Answer with the appropriate vendor letter or “NONE” if the manager cannot justify cancelling
any of the vendors’ contracts based on this data.
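If the test in part C is taken to be a chi-square test on the two-way table (an assumption, since the question leaves the choice of test to the reader), the statistic compares each observed count with the expected count, row total × column total ÷ grand total. The sketch below uses a hypothetical 2 × 2 table and a hypothetical function name.

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of no association.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical counts, not the vendor data.
stat, df = chi_square_statistic([[10, 20], [20, 10]])
print(round(stat, 4), df)  # 6.6667 1
```

The table passed in should contain only the observed cells, without the row and column totals.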
Statistics Team: Question #10
March Statewide Invitational
Let X and Y represent distinct random variables defined on some subset of real numbers with finite and distinct means
and variances, none of which are equal to zero. Let the letters c and k represent distinct positive real constants.
Determine which of the following statements about X and Y are always true, sometimes true (and hence, sometimes
false), or never true (and hence, always false).
i.) P(X ≤ c) = P(X < c)
ii.) P(X = c and Y = k) = P(X = c)*P(Y = k)
iii.) P(X = c or Y = k) = P(X = c) + P(Y = k)
iv.) µX + µY ≠ 0
v.) P(Y < µY) = 0.5
vi.) Var(X + Y) = Var(X) + Var(Y)
A = the number of statements which are always true.
B = the number of statements which are sometimes true.
C = the number of statements which are never true.
D = the mean of A, B, and C.
Statistics Team: Question #11
March Statewide Invitational
The total score in the “Hustle” team event at the FAMAT State Convention ranges from 0 to 500. The XL Miner linear
regression output table below represents the results of analyzing the top 20 scores over the past 10 years to see if it is
possible to predict the expected value of a hustle team’s score from the team’s final rank. Thus, the explanatory variable is
the hustle team’s final rank denoted as “Hustle Rank” (1 = 1st, 2 = 2nd, and so on until 20 = 20th) and the response variable
is the team’s total score in the event denoted as “Hustle Score.” Assume all assumptions and conditions are verified.
Correlation   R Square   Degrees of Freedom   Standard Error   Observations
A             0.8288     B                    28.1664          200

                      Intercept               Hustle Rank
Coefficient           408.9332                -10.6932
Standard Error        4.1376                  C
t-score               98.8340                 -30.9589
P-value               0.0000                  0.0000
95% Conf. Interval    (400.7738, 417.0925)    (-11.3743, -10.0120)
A) What is the sample linear correlation coefficient between a hustle team’s final score and their final rank
rounded to the nearest hundredth?
B) What are the degrees of freedom for this regression analysis?
C) What is the standard error of the sample slope about the regression line rounded to 4 decimal places?
D) What is the estimated value of the variance in hustle score about the regression line for each given final rank
rounded to the nearest thousandth?
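The missing entries in a simple linear regression table are tied to the given ones by standard identities: r has the sign of the slope and magnitude √(R²), the degrees of freedom are n − 2, and a coefficient's standard error equals the coefficient divided by its t-score. The sketch below applies them to the values printed in the table; the variable names are illustrative, and reading the summary "Standard Error" as the residual standard error is an assumption.

```python
from math import copysign, sqrt

# Values copied from the regression output table.
r_square = 0.8288
observations = 200
slope = -10.6932
slope_t_score = -30.9589
residual_std_error = 28.1664  # assumed to be the standard error of the residuals

# r carries the sign of the slope.
correlation = copysign(sqrt(r_square), slope)

# One explanatory variable: two estimated parameters (intercept and slope).
degrees_of_freedom = observations - 2

# t-score = coefficient / standard error, so standard error = coefficient / t-score.
slope_std_error = slope / slope_t_score

# The variance about the regression line is the squared residual standard error.
residual_variance = residual_std_error ** 2

print(round(correlation, 4), degrees_of_freedom, round(slope_std_error, 4))
```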
Statistics Team: Question #12
March Statewide Invitational
Recall the linear regression output table from the previous question. The explanatory variable is the hustle team’s final
rank denoted as “Hustle Rank” (1 = 1st, 2 = 2nd, and so on until 20 = 20th) and the response variable is the team’s total
score in the event denoted as “Hustle Score.” Note: Trophies are awarded to the top-10 hustle teams; and again, you may
assume all regression assumptions and conditions are met. Round all FINAL answers to the nearest whole number.
Correlation   R Square   Degrees of Freedom   Standard Error   Observations
-0.9104       0.8288     198                  28.1664          200

                      Intercept               Hustle Rank
Coefficient           408.9332                -10.6932
Standard Error        4.1376                  0.3454
t-score               98.8340                 -30.9589
P-value               0.0000                  0.0000
95% Conf. Interval    (400.7738, 417.0925)    (-11.3743, -10.0120)
A) What does the model predict as the approximate positive difference between 1st and 10th place hustle teams’ scores?
B) The highest scoring hustle team to not win a trophy was the 2015 11th place team who scored 337 points. According to
this regression model, by approximately how many points did this team overperform?
C) What does the model predict as the approximate final rank of the 2015 hustle team who scored 337 points?
D) The number of correct statements below.
a. The size of the average or “typical” prediction error when using this linear regression model to predict a hustle
team’s expected final score from their final rank is approximately 28 points.
b. About 82.9% of the variance in a hustle team’s final score is accounted for by the linear relationship the team’s
final score has with its final rank.
c. The model is statistically significant at the 1% level of significance.
d. We are 95% confident that for each unit change in a hustle team’s final rank, their predicted final score will
change by about 10.01 to 11.37 points, on average.
e. Using the model to predict the expected final hustle score of a team who finishes in 40th place is quite accurate.
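Predictions, residuals, and inverse predictions all follow from the fitted line, score = 408.9332 − 10.6932 × rank. The sketch below is one way to set these up; the function names and the sample rank and score are hypothetical.

```python
from math import isclose

# Coefficients copied from the regression output table.
INTERCEPT = 408.9332
SLOPE = -10.6932


def predicted_score(rank):
    """Predicted Hustle Score for a given final rank."""
    return INTERCEPT + SLOPE * rank


def residual(observed_score, rank):
    """Observed minus predicted; positive means the team beat the prediction."""
    return observed_score - predicted_score(rank)


def rank_for_score(score):
    """Rank at which the fitted line reaches the given score."""
    return (score - INTERCEPT) / SLOPE


# Hypothetical team: finished 5th with 360 points.
print(round(predicted_score(5), 4))  # 355.4672
print(round(residual(360, 5), 4))    # 4.5328
print(isclose(rank_for_score(predicted_score(5)), 5))  # True
```

The line was fitted to ranks 1 through 20, so inputs outside that range are extrapolations.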
Statistics Team: Question #13
March Statewide Invitational
Suppose random variable X follows a continuous uniform distribution on [0, 10]. Compute each of the following
probabilities rounded to the nearest thousandth:
A) P(x is prime)
B) P(x is composite)
C) P(x is a perfect square)
D) 𝑃(√2 < 𝑥 < 𝜋)
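For a continuous uniform distribution, the probability of an interval is its length divided by the length of the support, so a single point contributes nothing. A minimal sketch follows, with a hypothetical function name and hypothetical endpoints.

```python
from math import isclose


def uniform_interval_probability(a, b, low=0.0, high=10.0):
    """P(a < X < b) for X uniform on [low, high], clipping the interval to the support."""
    left = max(a, low)
    right = min(b, high)
    return max(right - left, 0.0) / (high - low)


print(isclose(uniform_interval_probability(2, 5), 0.3))  # True
print(uniform_interval_probability(3, 3))                # 0.0, a single point has zero length
print(uniform_interval_probability(-5, 20))              # 1.0, the whole support
```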
Statistics Team: Question #14
March Statewide Invitational
A small SRS of multiple-choice AP Statistics exam scores is selected from the population of Florida scores and is
compared to a small SRS of multiple-choice AP Statistics exam scores from California to determine if there is a
statistically significant difference between the mean scores of students from the two states. The results of the two samples
summarized in the table below are for the raw scores, which is the number correct out of 40 multiple-choice questions.
                   Florida   California
Sample Mean           25         20
Sample Variance      160         90
Sample Size           10         10
After all necessary assumptions and conditions are verified, the appropriate statistical test is performed on each of the
following sets of scores in order to compare the two group means. Compute the absolute value of each test statistic to the
nearest hundredth.
A) The set of raw scores from each state.
B) The set of percent correct from each state (dividing each raw score by 40 and converting to a percent).
C) Each score in each set of raw scores is multiplied by 1.25, which converts them to a 50-point scale.
D) The raw score in each set of sample data is converted to a Z-score based on the respective sample mean and
sample standard deviation for each set of scores.
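The effect of rescaling on a two-sample t statistic can be explored numerically. The sketch below assumes the unpooled statistic t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂) and uses hypothetical summary values; multiplying every score by a constant k multiplies each mean by k and each variance by k².

```python
from math import isclose, sqrt


def two_sample_t(mean1, var1, n1, mean2, var2, n2):
    """Unpooled two-sample t statistic computed from summary statistics."""
    return (mean1 - mean2) / sqrt(var1 / n1 + var2 / n2)


def rescale(mean, var, k):
    """Summary statistics after multiplying every score by the constant k."""
    return mean * k, var * k * k


# Hypothetical samples: means 12 and 9, variances 36 and 64, four scores each.
t_raw = two_sample_t(12, 36, 4, 9, 64, 4)

# The same samples after every score is doubled.
m1, v1 = rescale(12, 36, 2)
m2, v2 = rescale(9, 64, 2)
t_scaled = two_sample_t(m1, v1, 4, m2, v2, 4)

print(isclose(t_raw, 0.6), isclose(t_scaled, t_raw))  # True True
```

Converting each sample to Z-scores with its own mean and standard deviation is a different transformation, because it sets both sample means to 0.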
Statistics Team: Question #15
March Statewide Invitational
Determine which of the following statements about graphical displays of data sets are always true, sometimes true (and
hence, sometimes false), or never true (and hence, always false).
i. It is possible to extract the exact values of the data set from a stemplot.
ii. A bar graph is useful for determining the skewness or symmetry of the distribution of a categorical data set.
iii. Any data set that is appropriately displayed in a bar graph could also be displayed in a pie chart.
iv. The mean of the data set can be estimated from a boxplot that has a sufficiently detailed scale.
v. A scatterplot is useful for determining if there is a correlation between two categorical variables.
vi. The exact shape of the distribution in a frequency histogram and a relative frequency histogram of the same
data set are exactly the same.
A = the number of statements which are always true.
B = the number of statements which are sometimes true.
C = the number of statements which are never true.
D = the sample variance of A, B, and C.