STAT 210: Midterm - Takehome
Fall 2016
Points: 50
Name(s): ________________________________
________________________________
________________________________
Consider the following statistics from Blair Walsh, the recently released Minnesota Vikings kicker.
Let us first consider Walsh’s field goal statistics from 2016. He attempted a total of 16 field
goals and made a total of 12. Walsh’s career percentage for field goals made is 84.2%. [See
information from the boxes labeled A.]
Research Question: Is there statistical evidence to suggest Blair Walsh was underperforming in
2016 relative to his career percentage of 84.2%?
1. Identify the smallest possible value, largest possible value, location of the reference distribution,
and the outcome from the study for this situation on the number line below. (4 pts)
[Number line with axis "# Field Goals Made"; mark on it: Smallest possible value, Largest possible value, Location of pyramid, Outcome from study]
2. Use the binomial probability model to complete the following statistical test.
Research Question: Is there statistical evidence to suggest Blair Walsh was underperforming this year relative to his career percentage of 84.2%?
Testable Hypothesis
H0: π = 84.2%
HA: π < 84.2%
a. What is the p-value for this test? (2 pts)
b. Write a final conclusion for this research question. (3 pts)
3. Repeat Problem #2, but this time evaluate whether or not Blair Walsh was underperforming
on Point After Touchdowns, i.e. PATs. [Use information from the boxes labeled
B.]
Research Question: Is there statistical evidence to suggest Blair Walsh was underperforming on PATs this year relative to his career percentage for PATs?
a. What is the p-value for this test? (3 pts)
b. Write a final conclusion for this research question. (3 pts)
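Note: The p-values in Problems 2 and 3 are lower-tail binomial probabilities, i.e. P(X ≤ observed count) computed under the null value of π. The Python sketch below is one possible way to check a value obtained in JMP; it is not the required method. The function name binomial_lower_tail is hypothetical, and the inputs shown are the field goal values stated above (16 attempts, 12 made, π = 0.842). The PAT values from the boxes labeled B would be substituted for Problem 3.

```python
from math import comb

def binomial_lower_tail(n, k, p):
    """Return P(X <= k) for X ~ Binomial(n, p).

    This is the p-value for a lower-tailed test (HA: pi < p)
    when k successes are observed in n attempts.
    """
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

# Field goal values from the problem statement: 16 attempts, 12 made,
# null value 0.842 (the career percentage).
p_value = binomial_lower_tail(n=16, k=12, p=0.842)
print(f"p-value = {p_value:.4f}")
```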
4. Between the 2014 and 2015 seasons, the NFL made a rule change for Point After Touchdowns
(PAT). The distance of a PAT was 20 yards before the 2015 season; this distance was increased
to 33 yards beginning in 2015.
a. Review the PAT statistics for Blair Walsh for the 2015 and 2016 seasons. Does it
appear that this rule change affected Blair Walsh? Briefly discuss. (2 pts)
b. If the rule change affected Walsh, then it is not really fair to compare his 2016
PAT performance against 94.5%, his career PAT percentage. What would be a
fairer percentage to use instead? Briefly discuss. (3 pts)
For the remaining problems, we will consider the (unofficial) results from the 2016 US Elections
as provided by Townhall.com. I have augmented the election results with several other
demographic variables, e.g. % Bachelor’s degree, % Born in US, Median Household Income, etc.,
from the American Community Survey provided by the US Census Bureau.
Note: Election results are not available for Alaska on the Townhall.com website and hence are not
included in this data.
Data Sources:
2016 Raw Election Data: https://github.com/tonmcg/County_Level_Election_Results_12-16
2016 Election Results: http://townhall.com/election/2016/president/
ACS Data Dictionary: http://api.census.gov/data/2014/acs5/profile/variables.html
Example API pull for ACS Data: http://api.census.gov/data/2014/acs5/profile?get=NAME,DP03_0062E&for=county:*
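Note: The example API pull above can be rebuilt programmatically. The Python sketch below constructs the same query URL and shows one way to turn a response into records. It assumes the response is a JSON array of rows whose first row is the header; that format is an assumption here and should be checked against the Census API documentation. The function names are hypothetical, the sample rows are made up for illustration and are not real ACS values, and no network request is made.

```python
from urllib.parse import urlencode

# Base URL taken from the example API pull in the Data Sources list.
ACS_PROFILE_URL = "http://api.census.gov/data/2014/acs5/profile"

def build_acs_url(variables, geography="county:*"):
    """Build a query URL of the same form as the example pull."""
    query = urlencode(
        {"get": ",".join(variables), "for": geography},
        safe=",:*",  # keep commas, colons, and asterisks unescaped
    )
    return f"{ACS_PROFILE_URL}?{query}"

def rows_to_records(rows):
    """Convert header-plus-rows data into a list of dictionaries."""
    header, *data = rows
    return [dict(zip(header, row)) for row in data]

url = build_acs_url(["NAME", "DP03_0062E"])
print(url)

# Made-up response rows, for illustration only (not real ACS values).
sample = [
    ["NAME", "DP03_0062E", "state", "county"],
    ["Example County, Example State", "50000", "00", "001"],
]
records = rows_to_records(sample)
print(records[0]["NAME"], records[0]["DP03_0062E"])
```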
To begin, create a new variable called Winner. This new variable simply indicates whether or
not Trump gained more votes than Clinton in each county.
New Variable: Winner
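Note: The logic behind the Winner variable can be sketched outside of JMP as follows. This Python sketch assumes the vote counts are stored in the columns Votes_Democrat and Votes_Republican, and it assigns a tie to Clinton because the definition above only asks whether Trump gained more votes; the tie rule is an assumption. The county rows are made up for illustration.

```python
def winner(votes_democrat, votes_republican):
    """Return 'Trump' if he has strictly more votes, otherwise 'Clinton'."""
    return "Trump" if votes_republican > votes_democrat else "Clinton"

# Made-up county rows, for illustration only.
counties = [
    {"County": "A", "Votes_Democrat": 1200, "Votes_Republican": 3400},
    {"County": "B", "Votes_Democrat": 5100, "Votes_Republican": 2900},
]

for row in counties:
    row["Winner"] = winner(row["Votes_Democrat"], row["Votes_Republican"])

print([row["Winner"] for row in counties])  # ['Trump', 'Clinton']
```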
5. Use Analyze > Distribution to determine the proportion of counties where Trump received
more votes than Clinton. Briefly discuss. Provide the JMP output. (2 pts)
Next, pick three demographic variables of interest in this dataset. These include the variables
from Percent_Households_Married_Family through Percent_Hispanic in the dataset.
Use Cols > Utilities > Make Binning Formula to create a binned version of the three
demographic variables you’ve decided to investigate. You should set the offset value to a
reasonable value, the width should be set so that you have about 4-6 bins, and select Range
Labels in the upper right corner. Click Make Formula Columns to create the new variable. An
example is shown here for Percent_BachelorsDegree.
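Note: The role of the offset and width settings can be illustrated with a small Python sketch of fixed-width binning. This is not JMP's own formula; the function name bin_label, the label format, and the settings (offset 0, width 10) are all hypothetical choices for illustration. Each bin includes its lower edge and excludes its upper edge.

```python
from math import floor

def bin_label(value, offset, width):
    """Return a range label for the fixed-width bin that contains value.

    Bins start at offset and are width units wide: [low, low + width).
    """
    index = floor((value - offset) / width)
    low = offset + index * width
    return f"{low:g} - {low + width:g}"

# Hypothetical settings: offset 0, width 10 (percentage points).
for pct in [8.3, 17.9, 24.0, 41.5]:
    print(pct, "->", bin_label(pct, offset=0, width=10))
```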
Next, Use Analyze > Fit Y by X to explore the relationship between Winner and each of the
demographic variables you’ve selected to investigate. Again, an example setup is provided here
for Percent_BachelorsDegree Binned.
Note: The default color scheme in JMP has Clinton as red and Trump as blue in the plots. Use
Value Orderings to flip the order of the response variable so that Clinton is blue and Trump is red,
which is the common color scheme used for the Democratic and Republican parties, respectively.
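Note: The Fit Y by X analysis of Winner against a binned variable summarizes, within each bin, the share of counties won by each candidate. A minimal Python sketch of that tabulation is given below, assuming each county is a row with a Winner value and a binned column; the function name is hypothetical and the rows are made up for illustration, not taken from the election data.

```python
from collections import Counter, defaultdict

def winner_share_by_bin(rows, bin_col, winner_col="Winner"):
    """Within each bin, return the proportion of counties won by each candidate."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[bin_col]][row[winner_col]] += 1
    shares = {}
    for bin_name, tally in counts.items():
        total = sum(tally.values())
        shares[bin_name] = {name: n / total for name, n in tally.items()}
    return shares

# Made-up rows, for illustration only.
rows = [
    {"Percent_BachelorsDegree Binned": "10 - 20", "Winner": "Trump"},
    {"Percent_BachelorsDegree Binned": "10 - 20", "Winner": "Trump"},
    {"Percent_BachelorsDegree Binned": "10 - 20", "Winner": "Clinton"},
    {"Percent_BachelorsDegree Binned": "40 - 50", "Winner": "Clinton"},
]
shares = winner_share_by_bin(rows, bin_col="Percent_BachelorsDegree Binned")
for bin_name, share in shares.items():
    print(bin_name, share)
```

Comparing how strongly these shares change from bin to bin is one way to judge how closely a demographic variable tracks the outcome.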
6. Which of the three demographic variables appears to be most indicative of the outcome of
the 2016 US Election? Explain how you made this determination. Provide your JMP output
as supporting evidence. (6 pts)
7. Consider the demographic variable identified in the previous problem (the most indicative
demographic variable). Use a local data filter to investigate whether or not the relationship
discovered above is influenced in some way by the other demographic variables you’ve
decided to investigate. Explain in detail any relevant findings. Again, provide your JMP
output as supporting evidence. (6 pts)
Next, we will use JMP to do some mapping. Mapping will be easier if yet another new
variable is created. This will be a numeric version of the previously created Winner variable.
New Variable: Winner_Number
After this variable is created, select Graph > Graph Builder. Place Name in the Map Shape box
in the lower left corner and slide the newly created Winner_Number variable onto the map.
Click Done and a map should be produced.
Note: You can ignore the fact that LaSalle County, IL and Oglala County, SD are not being
plotted on our map.
8. Use your map to determine which geographic areas of the country tend to vote
Democratic and which areas tend to vote Republican. Briefly discuss. (3 pts)
9. You are able to use a local data filter on your map (just like any other output in JMP). Apply
a local data filter to investigate how the map changes conditioning on another predictor
variable of your choice. (2 pts)
Consider the following summary done in JMP using Tables > Summary.
10. What is the interpretation of the Mean(Winner_Number) value computed here? Briefly
discuss. (2 pts)
11. Use Tables > Summary to answer the following. Provide a screen-shot of relevant JMP
output for each.
a. Were there any states where Trump won the majority of votes in all counties? If
so, which states were these? (2 pts)
b. Were there any states where Clinton won the majority of votes in all counties? If
so, which states were these? (2 pts)
c. What states tend to be most evenly divided between Trump and Clinton? Briefly
discuss how you made this determination. (2 pts)
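Note: A Tables > Summary request for the mean of Winner_Number grouped by state amounts to a grouped average. The Python sketch below illustrates the calculation under two assumptions that are not confirmed above: that Winner_Number is coded 1 for Trump and 0 for Clinton, and that the grouping column is named State. The function name is hypothetical and the rows are made up for illustration.

```python
from collections import defaultdict

def mean_by_group(rows, group_col, value_col):
    """Return the mean of value_col within each level of group_col."""
    totals = defaultdict(lambda: [0.0, 0])  # running [sum, count] per group
    for row in rows:
        acc = totals[row[group_col]]
        acc[0] += row[value_col]
        acc[1] += 1
    return {group: s / n for group, (s, n) in totals.items()}

# Made-up rows, for illustration only.
# Assumed coding: Winner_Number = 1 for Trump, 0 for Clinton.
rows = [
    {"State": "X", "Winner_Number": 1},
    {"State": "X", "Winner_Number": 1},
    {"State": "Y", "Winner_Number": 0},
    {"State": "Y", "Winner_Number": 1},
]
state_means = mean_by_group(rows, "State", "Winner_Number")
print(state_means)  # {'X': 1.0, 'Y': 0.5}
```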
12. Use Tables > Summary to verify that Clinton actually received more of the popular vote than
Trump. Provide your JMP output as evidence. Briefly discuss. (3 pts)
Hint: You should use Votes_Democrat and Votes_Republican for this problem.
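Note: The comparison in this problem comes down to summing each vote column over all counties. A minimal Python sketch is shown below, assuming each county is a row containing Votes_Democrat and Votes_Republican; the function name is hypothetical and the counts are made up for illustration, not actual election results.

```python
def two_candidate_totals(rows):
    """Return (Democratic total, Republican total, Democratic share of the two)."""
    dem = sum(row["Votes_Democrat"] for row in rows)
    rep = sum(row["Votes_Republican"] for row in rows)
    return dem, rep, dem / (dem + rep)

# Made-up county rows, for illustration only.
rows = [
    {"Votes_Democrat": 1200, "Votes_Republican": 3400},
    {"Votes_Democrat": 5100, "Votes_Republican": 2000},
]
dem_total, rep_total, dem_share = two_candidate_totals(rows)
print(dem_total, rep_total, round(dem_share, 4))
```

Note that the share computed here is relative to the two listed candidates only, since votes for other candidates are not included in these two columns.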