Probability-Based Estimation in the
2012 General Election Exit Poll
Clint W. Stevenson, Chief Statistician
May 19, 2013
Goals
• Apply varying discrete and continuous
probability distributions to exit poll data.
• Provide a framework for estimating
national electoral votes using exit poll data.
• Expand estimation procedures to
incorporate known data using probability
distributions.
Topics
• Simulating State Probabilities Using the
Dirichlet Distribution
• Dirichlet Process Model to Cluster
Elements
• Using Bayesian Regression to Model
State Estimates
• Multinomial Electoral Vote Simulation
Using the Dirichlet and Binomial
Distributions
Probability Distributions
• The Normal distribution is by far the best
known.
• The Binomial distribution works for discrete
outcomes, for example winning a state's
all-or-nothing electoral votes.
Probability Distributions
• Beta distribution works well when working
with univariate proportions.
• Dirichlet distribution is the multivariate
generalization of the Beta distribution.
• Many components of exit polling are
hierarchical in nature and result in different
distributions at each level.
Statewide Simulation
[Figure: two histograms titled "Simulated Probability of Obama Winning
Using Election Day Exit Poll Data", one for Florida and one for
North Carolina. x-axis: Probability of Obama Winning the state;
y-axis: Frequency. Each panel marks the Total Probability and the
99% Credible Interval.]
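A minimal sketch of how a statewide simulation like this could be set up, assuming a Dirichlet posterior over the candidate shares. The respondent counts, the symmetric prior weight, and the function names are hypothetical and are not taken from the slides or the actual 2012 data.

```python
import random


def dirichlet_draw(alphas, rng):
    """Draw one sample from Dirichlet(alphas) by normalizing Gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def simulate_win_probability(counts, n_sims=5000, prior=1.0, seed=2012):
    """Simulate posterior vote shares and estimate P(candidate A leads B).

    counts: exit poll respondents for (candidate A, candidate B, other).
    prior:  symmetric Dirichlet prior weight (an assumption of this sketch).
    """
    rng = random.Random(seed)
    alphas = [c + prior for c in counts]
    shares_a = []
    wins = 0
    for _ in range(n_sims):
        a, b, _other = dirichlet_draw(alphas, rng)
        shares_a.append(a / (a + b))  # two-party share for candidate A
        wins += a > b
    shares_a.sort()
    lower = shares_a[int(0.005 * n_sims)]
    upper = shares_a[int(0.995 * n_sims) - 1]
    return wins / n_sims, (lower, upper)


# Hypothetical exit poll counts, NOT the actual 2012 data.
hypothetical_counts = (2050, 2000, 60)
p_win, interval_99 = simulate_win_probability(hypothetical_counts)
print(f"P(A leads) = {p_win:.3f}")
print(f"99% interval for A's two-party share = "
      f"({interval_99[0]:.3f}, {interval_99[1]:.3f})")
```

A histogram of the simulated shares, with the interval endpoints marked, would give a picture comparable to the panels on this slide.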
Beta Marginal Estimation
[Figure: histogram titled "Marginal Distribution for Age Group 1 -
Age Group 2". x-axis: Marginal Difference (about -0.42 to -0.30);
y-axis: Frequency.]
Florida Voter Age    Marginal Percentage
18-29                16 (n=671)
30-59                52 (n=2180)
60+                  32 (n=1342)
Although the joint distribution of the
age-group shares is a Dirichlet, the marginal
distribution of each share reduces to a Beta.
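The reduction to a Beta can be checked by simulation. The sketch below uses the Florida age-group counts from this slide and assumes a uniform Dirichlet prior, which the slides do not state. It compares the first component of the Dirichlet draws with direct Beta draws, and also simulates the difference between the first two age groups.

```python
import random
import statistics

# Florida age-group counts from the Florida Voter Age table.
counts = {"18-29": 671, "30-59": 2180, "60+": 1342}
PRIOR = 1.0  # assumed uniform Dirichlet prior; not stated in the slides


def simulate_age_shares(n_sims=20000, seed=1):
    """Return draws of the 18-29 share (Dirichlet and Beta) and of the
    difference between the 18-29 and 30-59 shares."""
    rng = random.Random(seed)
    alphas = [c + PRIOR for c in counts.values()]
    alpha_rest = sum(alphas[1:])
    dirichlet_first, beta_first, diffs = [], [], []
    for _ in range(n_sims):
        # Dirichlet draw via normalized Gamma variates.
        g = [rng.gammavariate(a, 1.0) for a in alphas]
        total = sum(g)
        p = [x / total for x in g]
        dirichlet_first.append(p[0])
        diffs.append(p[0] - p[1])
        # Marginal of the first component: Beta(alpha_1, sum of the rest).
        beta_first.append(rng.betavariate(alphas[0], alpha_rest))
    return dirichlet_first, beta_first, diffs


dirichlet_first, beta_first, diffs = simulate_age_shares()
print("mean 18-29 share (Dirichlet):", round(statistics.mean(dirichlet_first), 4))
print("mean 18-29 share (Beta):     ", round(statistics.mean(beta_first), 4))
print("mean difference, group 1 - group 2:", round(statistics.mean(diffs), 4))
```

The two sets of draws for the 18-29 share should agree closely in mean and spread, and the simulated difference should centre near 0.16 - 0.52 = -0.36.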
Dirichlet Process Clustering
• Several variants exist
– Chinese Restaurant Model
• An infinite number of tables with an infinite number of
chairs at each table.
– Pólya’s Urn Model
• Draw a colored ball from an urn, then return it to the
urn along with one more ball of the same color.
– Stick Breaking Model
• Break a stick at a location drawn from a
Beta distribution, then repeat on the remainder.
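Two of these constructions can be sketched in a few lines. This is an illustration under stated assumptions rather than the clustering procedure used for the slides: the concentration parameter alpha, its value, and the function names are all hypothetical.

```python
import random


def chinese_restaurant_process(n_customers, alpha, rng):
    """Seat customers one at a time and return the list of table sizes."""
    tables = []
    for n_seated in range(n_customers):
        # Open a new table with probability alpha / (n_seated + alpha);
        # otherwise join an existing table in proportion to its size.
        r = rng.uniform(0, n_seated + alpha)
        if not tables or r < alpha:
            tables.append(1)
            continue
        r -= alpha
        for i, size in enumerate(tables):
            if r < size:
                tables[i] += 1
                break
            r -= size
        else:
            tables[-1] += 1  # guard against a floating-point edge case
    return tables


def stick_breaking_weights(alpha, n_sticks, rng):
    """Return the first n_sticks weights of a stick-breaking construction."""
    weights, remaining = [], 1.0
    for _ in range(n_sticks):
        frac = rng.betavariate(1.0, alpha)  # where to break what is left
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    return weights


rng = random.Random(0)
tables = chinese_restaurant_process(n_customers=200, alpha=1.5, rng=rng)
weights = stick_breaking_weights(alpha=1.5, n_sticks=25, rng=rng)
print("number of clusters:", len(tables))
print("cluster sizes:", tables)
print("weight covered by the first 25 sticks:", round(sum(weights), 4))
```

In both constructions the number of occupied clusters is not fixed in advance. It grows with the data, at a rate governed by alpha.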
Dirichlet Process Clustering
• Provides a natural way to cluster observations, much like the well-known k-means
clustering, but without requiring a predefined number of clusters.
• Example graphs using multivariate partitioning based on the 2008 final
vote, 2012 final vote, and 2012 exit poll vote.
Bayesian Regression
Example model uses past 2008 vote as well
as current 2012 exit poll vote.
Example Bayesian Regression
Both the 2008 past vote and the 2012 exit poll vote
are strong predictors of the 2012 final vote, with
an R-squared value of 0.95.
Bayesian Regression Parameters
Bayesian regression using noninformative priors.
The example uses Florida data.
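A sketch of how posterior draws for such a regression could be generated, assuming the standard noninformative prior p(beta, sigma^2) proportional to 1/sigma^2. The data below are synthetic stand-ins for precinct-level vote shares, not the Florida data, and the function name is hypothetical. NumPy is assumed to be available.

```python
import numpy as np


def sample_posterior(X, y, n_draws=5000, seed=0):
    """Draw (beta, sigma^2) from the posterior under p(beta, sigma^2)
    proportional to 1 / sigma^2."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y          # least-squares estimate
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - k)          # residual variance estimate
    # sigma^2 | y follows a scaled inverse chi-square(n - k, s2).
    sigma2 = (n - k) * s2 / rng.chisquare(n - k, size=n_draws)
    # beta | sigma^2, y follows Normal(beta_hat, sigma^2 * (X'X)^-1).
    chol = np.linalg.cholesky(xtx_inv)
    z = rng.standard_normal((n_draws, k))
    betas = beta_hat + np.sqrt(sigma2)[:, None] * (z @ chol.T)
    return betas, sigma2


# Synthetic data for illustration only (NOT the Florida exit poll data).
data_rng = np.random.default_rng(42)
n_obs = 50
vote_2008 = data_rng.uniform(0.35, 0.65, n_obs)
exit_2012 = vote_2008 + data_rng.normal(0, 0.02, n_obs)
final_2012 = (0.01 + 0.40 * vote_2008 + 0.58 * exit_2012
              + data_rng.normal(0, 0.01, n_obs))

X = np.column_stack([np.ones(n_obs), vote_2008, exit_2012])
betas, sigma2 = sample_posterior(X, final_2012)
print("posterior means (intercept, 2008 vote, 2012 exit poll):",
      betas.mean(axis=0).round(3))
print("posterior mean of sigma:", np.sqrt(sigma2).mean().round(4))
```

Because the 2008 vote and the exit poll vote are highly correlated in this synthetic setup, the two slope posteriors are individually wide while their sum is well determined.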
Posterior Predictive Distribution
The posterior predictive distribution can be used for model checking.
Example uses data from Florida.
Dirichlet-Multinomial Simulation
• Provides a way to sample from the
electoral vote posterior distribution.
• Determine probability of candidate winning
a state then use that probability to simulate
winning the electoral vote for the given
state.
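The two steps above can be sketched as follows. The electoral vote counts are the 2012 allocations for the three states shown. The exit poll counts, the symmetric prior weight, and the function names are hypothetical and serve only to illustrate the procedure.

```python
import random

# Electoral votes are the 2012 allocations; the poll counts
# (candidate A, candidate B, other) are hypothetical.
states = {
    "Florida": {"ev": 29, "counts": (2050, 2000, 60)},
    "North Carolina": {"ev": 15, "counts": (1480, 1520, 40)},
    "Ohio": {"ev": 18, "counts": (1900, 1800, 50)},
}


def state_win_probability(counts, rng, n_sims=2000, prior=1.0):
    """Estimate P(candidate A leads B) under a Dirichlet posterior."""
    alphas = [c + prior for c in counts]
    wins = 0
    for _ in range(n_sims):
        g = [rng.gammavariate(a, 1.0) for a in alphas]
        wins += g[0] > g[1]  # normalizing is unnecessary for a comparison
    return wins / n_sims


def simulate_electoral_votes(states, n_sims=5000, seed=7):
    """Step 1: win probability per state. Step 2: all-or-nothing draws."""
    rng = random.Random(seed)
    probs = {name: state_win_probability(s["counts"], rng)
             for name, s in states.items()}
    totals = []
    for _ in range(n_sims):
        total = sum(s["ev"] for name, s in states.items()
                    if rng.random() < probs[name])
        totals.append(total)
    return probs, totals


probs, totals = simulate_electoral_votes(states)
for name, p in probs.items():
    print(f"{name}: P(win) = {p:.3f}")
print("mean simulated electoral votes:", sum(totals) / len(totals))
```

Extending the dictionary to all states and plotting a histogram of the totals would give a distribution of electoral votes for the candidate.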
Dirichlet-Multinomial Simulation
Distribution of Democratic Electoral Votes Based
on the 2012 Exit Poll
Summary & Conclusion
• There are many ways to analyze election
exit poll data, using both traditional
statistics and simulation approaches.
• Simulation approaches provide a
good way to visualize the probability
distribution of the data rather than focusing
on a single estimate.
• Simulation also provides a natural and
intuitive way to understand the results of
election estimates.
Further Research
• Applying other probability distribution concepts
to an exit poll to evaluate:
– Exit polling complex designs
– Hierarchical modeling
– Small sample sizes
– Missing data
– Censored data
– Probability Distributions of Table
• If you have questions or you would like a copy of
the paper: [email protected]