What is probability?

MA Politics and Economics of
Contemporary Eastern and
Southeastern Europe
Empirical Methods
Appendix 1: Probability &
Descriptive Statistics
• Relevance of probability to statistics
• Definition and terminology of probability
• Random variables
• Probability distributions
• Measures of central tendency and variability
• The Normal and related distributions
Empirical Methods - 2
Owen O'Donnell, University of Macedonia
What’s probability got to do with it?





We examine a sample to learn about the population.
To ensure representativeness, the sample is random.
So, observed outcomes (and statistics) are
determined by chance – they are values of random
variables.
Probability tells us about their behaviour.
Allows us to think in terms of expected values.


What is the expected value of a statistic computed from
thousands of repeated random samples?
What does that expected value reveal about the
population?
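The repeated-sampling question above can be explored with a small simulation. This is only a sketch under stated assumptions: the population (100,000 normal draws with mean 50 and standard deviation 10), the sample size of 30 and the 5,000 repetitions are all invented for illustration and do not come from the course.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: drawn once, then treated as fixed.
population = [random.gauss(50, 10) for _ in range(100_000)]
population_mean = statistics.fmean(population)


def sample_mean(sample_size):
    """Mean of one simple random sample drawn without replacement."""
    return statistics.fmean(random.sample(population, sample_size))


# Repeat the sampling experiment many times and average the statistic.
sample_means = [sample_mean(30) for _ in range(5_000)]
average_of_sample_means = statistics.fmean(sample_means)

print(round(population_mean, 2), round(average_of_sample_means, 2))
```

Individual sample means vary from sample to sample, but their average across repeated samples should lie close to the population mean, which is what makes the sample mean informative about the population.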
What is probability?





Relative frequency with which an event occurs (frequentist)
Experiment – a procedure that can be repeated infinitely and has
defined outcomes
Random variable – outcome of an experiment in form of a numerical
value
X is random variable and x is particular value of X
E.g. Tossing a coin is an experiment with potential outcome ‘heads’. If
you toss a fair coin millions of times, then n(heads)/n ≈ 0.5, so P(heads) = 0.5.
$P(X = x) = \lim_{n \to \infty} \frac{n(X = x)}{n}$
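The frequentist definition can be illustrated by simulating coin tosses and watching the relative frequency of heads settle near 0.5 as n grows. This is a minimal sketch: the fair coin, the seed and the numbers of tosses are assumptions chosen for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible


def relative_frequency_of_heads(n_tosses):
    """Share of heads in n_tosses simulated tosses of a fair coin."""
    heads = sum(random.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# The relative frequency n(heads)/n for increasing n.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))

freq_large = relative_frequency_of_heads(200_000)
```

With few tosses the relative frequency can be far from 0.5; with many tosses it should be very close, in line with the limit definition.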
Discrete random variables

Take on a finite or countably infinite number of values
E.g. 0 (tails) or 1 (heads) → Bernoulli; or 0, 1, 2, 3, 4, 5, ... (marriages!)


Are described by a list of all possible values and their
associated probabilities
Values: $x_1, x_2, x_3, \ldots, x_k$
Probabilities: $p_1, p_2, p_3, \ldots, p_k$
$p_j = P(X = x_j), \quad j = 1, 2, \ldots, k.$

This is known as the probability density function (pdf), $f(x_j) = p_j, \quad j = 1, 2, \ldots, k.$
Probabilities must sum to 1, $\sum_{j=1}^{k} p_j = 1$
And lie between 0 and 1, $0 \le p_j \le 1$
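The two requirements on a discrete pdf can be checked mechanically. The sketch below uses a hypothetical distribution (the number of heads in two tosses of a fair coin); the dictionary layout and the function name `is_valid_pdf` are illustrative choices, not part of the course material.

```python
import math

# Hypothetical example: number of heads in two tosses of a fair coin.
# Keys are the possible values x_j, values are the probabilities p_j.
pdf = {0: 0.25, 1: 0.50, 2: 0.25}


def is_valid_pdf(pdf):
    """True if every probability lies in [0, 1] and they sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in pdf.values())
    sums_to_one = math.isclose(sum(pdf.values()), 1.0)
    return in_range and sums_to_one


print(is_valid_pdf(pdf))
```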
Continuous random variables



Take on any particular real value with probability zero
There are so many possible values that it is not feasible to list them all, e.g. 10.000000000001, 10.000000000002, ...
Since the probability of any particular value is zero, define an event as a range of values and compute the associated probability,
$P(X \le b) = \int_{-\infty}^{b} f(x)\,dx = F(b)$
F(x) is the cumulative distribution function (cdf)
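Probability density function and cumulative distribution function
[Figure: the probability density function f(x) and the cumulative distribution function F(x), with points a and b marked on the horizontal axis and F(a) and F(b) marked on the cumulative curve]
$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx = F(b) - F(a)$
$P(X > b) = 1 - F(b)$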
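The link between the area under f(x) and differences in F(x) can be checked numerically. This sketch assumes a standard normal X and the interval from a = -1 to b = 2 purely for illustration; the midpoint-rule helper `integrate_pdf` is a hypothetical name, and `statistics.NormalDist` is from the Python standard library.

```python
from statistics import NormalDist

X = NormalDist(mu=0.0, sigma=1.0)  # assumed example distribution
a, b = -1.0, 2.0                   # assumed interval


def integrate_pdf(dist, lower, upper, steps=10_000):
    """Midpoint-rule approximation of the integral of dist.pdf over [lower, upper]."""
    width = (upper - lower) / steps
    return sum(dist.pdf(lower + (i + 0.5) * width) for i in range(steps)) * width


prob_by_integration = integrate_pdf(X, a, b)  # area under f(x) between a and b
prob_by_cdf = X.cdf(b) - X.cdf(a)             # F(b) - F(a)
prob_upper_tail = 1.0 - X.cdf(b)              # P(X > b)

print(round(prob_by_integration, 4), round(prob_by_cdf, 4), round(prob_upper_tail, 4))
```

The two routes to P(a ≤ X ≤ b) should agree up to the small error of the numerical integration.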
Features of probability distributions



In statistics, we are interested in the
probability distributions of random
variables
In particular, in the centre and the
dispersion of the distribution
Parameters describe these characteristics
of the distribution
Measure of central tendency



The expected value of a variable is a
weighted average of all possible values
with weights given by the associated
relative frequencies.
It is the mean
Discrete: $E(X) = x_1 f(x_1) + x_2 f(x_2) + \cdots + x_k f(x_k) = \sum_{j=1}^{k} x_j f(x_j) = \mu$
Continuous: $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \mu$
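For a discrete variable the expected value is just a probability-weighted sum, which the following sketch computes exactly. The fair six-sided die is an assumed example, and `expected_value` is an illustrative helper name.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die, f(x_j) = 1/6 for each face.
pdf = {face: Fraction(1, 6) for face in range(1, 7)}


def expected_value(pdf):
    """Probability-weighted average of the possible values: sum of x_j * f(x_j)."""
    return sum(x * p for x, p in pdf.items())


mean = expected_value(pdf)
print(mean)  # 7/2
```

Note that the expected value 3.5 is not itself a possible outcome of the die; it is the long-run average over many rolls.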


Properties of expected values



E.1: E(c) = c, for any constant c
E.2: E(aX+b) = aE(X)+b, for any constants a and b
E.3: $E(a_1 X_1 + a_2 X_2 + \cdots + a_k X_k) = a_1 E(X_1) + a_2 E(X_2) + \cdots + a_k E(X_k)$
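Property E.2 can be verified exactly on a small example. The sketch below assumes a fair die for X and the arbitrary constants a = 3 and b = 2; the `transform` argument is an illustrative device for computing the expectation of a function of X.

```python
from fractions import Fraction

pdf = {face: Fraction(1, 6) for face in range(1, 7)}  # fair die (assumed example)
a, b = 3, 2                                           # arbitrary constants


def expected_value(pdf, transform=lambda x: x):
    """E[g(X)] for a discrete X: sum of g(x_j) * f(x_j)."""
    return sum(transform(x) * p for x, p in pdf.items())


lhs = expected_value(pdf, lambda x: a * x + b)  # E(aX + b), computed directly
rhs = a * expected_value(pdf) + b               # a E(X) + b, using property E.2

print(lhs, rhs)
```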
The Median






The median is another measure of central
tendency
It is the value at the centre of the distribution, in
the sense that 50% of realised outcomes lie
below it.
P(X<Med(X))=0.5
Mean = median if the distribution is symmetric; for skewed distributions they generally differ
Mean and median measure different
characteristics
Mean is more widely used because it is the
expected value
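The difference between the two measures shows up clearly in a skewed sample. The two small data sets below are invented for illustration only.

```python
import statistics

symmetric = [1, 2, 3, 4, 5]            # hypothetical symmetric data
right_skewed = [10, 12, 14, 15, 100]   # hypothetical data with one large value

mean_symmetric = statistics.mean(symmetric)
median_symmetric = statistics.median(symmetric)

mean_skewed = statistics.mean(right_skewed)
median_skewed = statistics.median(right_skewed)

print(mean_symmetric, median_symmetric)
print(mean_skewed, median_skewed)
```

In the symmetric data the mean and median coincide, while in the skewed data the single large value pulls the mean well above the median.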
Measures of dispersion


How tightly is a variable distributed around its expected
value?
Variance is the expected (squared) distance of a variable
from its mean
$\mathrm{Var}(X) = E\left[(X - \mu)^2\right] = \sigma^2$




V.1: If P(X=c)=1, E(X)=c, Var(X)=0
V.2: $\mathrm{Var}(aX + b) = a^2 \mathrm{Var}(X)$
Standard deviation is the (positive) square root of the
variance, $sd(X) = \sqrt{\mathrm{Var}(X)} = \sigma$
Unlike the variance, the standard deviation is measured in
the same units as X
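The definition of the variance and property V.2 can be checked exactly on a discrete example. As a sketch, X is again assumed to be a fair die and a = 3, b = 2 are arbitrary constants; the helper names are illustrative.

```python
from fractions import Fraction

pdf = {face: Fraction(1, 6) for face in range(1, 7)}  # fair die (assumed example)
a, b = 3, 2                                           # arbitrary constants


def expected_value(pdf, g=lambda x: x):
    """E[g(X)] for a discrete X."""
    return sum(g(x) * p for x, p in pdf.items())


def variance(pdf, g=lambda x: x):
    """Var[g(X)] = E[(g(X) - mean)^2], the expected squared distance from the mean."""
    mu = expected_value(pdf, g)
    return expected_value(pdf, lambda x: (g(x) - mu) ** 2)


var_x = variance(pdf)                          # Var(X)
var_ax_b = variance(pdf, lambda x: a * x + b)  # Var(aX + b), computed directly
sd_x = float(var_x) ** 0.5                     # standard deviation, same units as X

print(var_x, var_ax_b, round(sd_x, 3))
```

Adding the constant b shifts the distribution without changing its spread, and multiplying by a scales the variance by a squared.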
Standardised variables



Transforming a variable by subtracting
its mean and dividing by its standard
deviation gives a variable with mean 0
and variance 1.
Define $Z = \frac{X - \mu}{\sigma}$, then using E.2 and V.2,
$E(Z) = \frac{E(X) - \mu}{\sigma} = 0, \quad \mathrm{Var}(Z) = \frac{\mathrm{Var}(X)}{\sigma^2} = 1$
The Normal distribution






Often in statistics we either know, or must
assume, that a distribution has a particular shape.
The most popular distribution used is the normal.
This is sometimes because we know that a statistic (e.g.
the sample mean) follows the normal distribution, at least approximately, and
sometimes because of its convenience.
The normal distribution is symmetric and bell-shaped.
If random variable X follows a normal distribution
with mean μ and variance σ², we write X ~ N(μ, σ²).
Height and weight are approximately normally distributed.
Income is approximately log-normally distributed.
Normal distribution with μ=1 & σ=1
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty$

Normal distributions with different means and standard deviations
        Plot 1   Plot 2   Plot 3
Mean    1        2        3
S.d.    1        2        1
Standard normal distribution
$X \sim N(\mu, \sigma^2) \;\Rightarrow\; Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$
Standard normal probability density function: $\phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-z^2/2\right)$
Standard normal cumulative distribution function: $\Phi(z) = P(Z \le z)$
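Standardising lets any normal probability be read from the standard normal distribution. As a sketch, the parameters μ = 1, σ = 2 and the point x = 3 are assumed for illustration; `statistics.NormalDist` is from the Python standard library.

```python
from statistics import NormalDist

mu, sigma = 1.0, 2.0  # assumed example parameters
x = 3.0               # assumed point of interest

z = (x - mu) / sigma  # standardised value of x

prob_via_z = NormalDist().cdf(z)            # P(Z <= z) from the standard normal
prob_direct = NormalDist(mu, sigma).cdf(x)  # P(X <= x) from N(mu, sigma^2)

print(z, round(prob_via_z, 4), round(prob_direct, 4))
```

Both routes should give the same probability, which is why a single table for the standard normal suffices for every normal distribution.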
Some commonly used probabilities from the standard normal





P(Z<-1.96)=0.025
Since the distribution is symmetric,
P(Z>1.96)=0.025
So, P(-1.96<Z<1.96)=0.95
P(-1.645<Z<1.645)=0.90
P(-2.576<Z<2.576)=0.99
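These probabilities can be reproduced with the standard normal cdf rather than looked up in a table. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `central_probability` is illustrative, and 2.576 is the conventional critical value for 99% coverage.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def central_probability(c):
    """P(-c < Z < c) for the standard normal."""
    return Z.cdf(c) - Z.cdf(-c)


lower_tail = Z.cdf(-1.96)         # P(Z < -1.96)
upper_tail = 1.0 - Z.cdf(1.96)    # P(Z > 1.96), equal to the lower tail by symmetry
critical_975 = Z.inv_cdf(0.975)   # value with 97.5% of the distribution below it

for c in (1.645, 1.96, 2.576):
    print(c, round(central_probability(c), 3))
```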
Distributions related to the normal


In statistics, we make frequent use of
three distributions related to the
normal: the χ², t and F distributions
We will frequently refer to tables to
compute probabilities from these
distributions