MA Politics and Economics of Contemporary Eastern and Southeastern Europe
Empirical Methods
Appendix 1: Probability & Descriptive Statistics
Empirical Methods - 2, Owen O'Donnell, University of Macedonia

• Relevance of probability to statistics
• Definition and terminology of probability
• Random variables
• Probability distributions
• Measures of central tendency and variability
• The Normal and related distributions

What's probability got to do with it?
• We examine a sample to learn about the population.
• To ensure representativeness, the sample is random.
• So, observed outcomes (and statistics) are determined by chance: they are values of random variables.
• Probability tells us about their behaviour.
• It allows us to think in terms of expected values.
• What is the expected value of a statistic computed from thousands of repeated random samples?
• What does that expected value reveal about the population?

What is probability?
• Relative frequency with which an event occurs (frequentist)
• Experiment: a procedure that can be repeated infinitely and has defined outcomes
• Random variable: the outcome of an experiment in the form of a numerical value
• $X$ is a random variable and $x$ is a particular value of $X$
• E.g. tossing a coin is an experiment with potential outcome 'heads'. If you toss a fair coin millions of times, then n(heads)/n approaches 0.5, so P(heads) = 0.5.

$$P(X = x) = \lim_{n \to \infty} \frac{n(X = x)}{n}$$

Discrete random variables
• Take on a finite or countably infinite number of values
• E.g. 0 (tails) or 1 (heads), which is a Bernoulli variable; or 0, 1, 2, 3, 4, 5, ... (marriages!)
• Are described by a list of all possible values and their associated probabilities, $p_j = P(X = x_j)$, $j = 1, 2, \ldots, k$:

  Value:        $x_1$  $x_2$  $x_3$  ...  $x_k$
  Probability:  $p_1$  $p_2$  $p_3$  ...  $p_k$

• This is known as the probability density function (pdf); for a discrete variable it is also called the probability mass function:
  $$f(x_j) = p_j, \quad j = 1, 2, \ldots, k$$
• Probabilities must sum to 1:
  $$\sum_{j=1}^{k} p_j = 1$$
• And lie between 0 and 1:
  $$0 \le p_j \le 1$$

Continuous random variables
• Can take any real value (within some range), and each particular value has probability zero
• There are so many possible values that it is not feasible to list them all, e.g. 10.000000000001, 10.000000000002, ...
• Since the probability of any particular value is zero, define an event as a range of values and compute the associated probability:
  $$P(X \le b) = \int_{-\infty}^{b} f(x)\,dx = F(b)$$
• $F(x)$ is the cumulative distribution function (cdf), called the cumulative density function on these slides

[Figure: two panels, the probability density function $f(x)$ and the cumulative distribution function $F(x)$, with points $a$ and $b$ and the values $F(a)$ and $F(b)$ marked]

$$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx = F(b) - F(a)$$
$$P(X > b) = 1 - F(b)$$

Features of probability distributions
• In statistics, we are interested in the probability distributions of random variables
• In particular, in the centre and the dispersion of the distribution
• Parameters describe these characteristics of the distribution

Measure of central tendency
• The expected value of a variable is a weighted average of all possible values, with weights given by the associated probabilities (long-run relative frequencies).
• It is the mean, written $\mu = E(X)$
• Discrete:
  $$E(X) = x_1 f(x_1) + x_2 f(x_2) + \ldots + x_k f(x_k) = \sum_{j=1}^{k} x_j f(x_j)$$
• Continuous:
  $$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$

Properties of expected values
For constants $a$, $b$, $c$ and $a_1, \ldots, a_k$:
• E.1: $E(c) = c$
• E.2: $E(aX + b) = aE(X) + b$
• E.3: $E(a_1 X_1 + a_2 X_2 + \ldots + a_k X_k) = a_1 E(X_1) + a_2 E(X_2) + \ldots + a_k E(X_k)$

The Median
• The median is another measure of central tendency
• It is the value at the centre of the distribution, in the sense that 50% of realised outcomes lie below it: $P(X < \mathrm{Med}(X)) = 0.5$
• If the distribution is symmetric, the mean equals the median; in skewed distributions the two generally differ
• Mean and median measure different characteristics
• The mean is more widely used because it is the expected value

Measures of dispersion
• How tightly is a variable distributed around its expected value?
• Variance is the expected (squared) distance of a variable from its mean:
  $$\mathrm{Var}(X) = E\left[(X - \mu)^2\right] = \sigma^2$$
• V.1: If $P(X = c) = 1$, then $E(X) = c$ and $\mathrm{Var}(X) = 0$
• V.2: $\mathrm{Var}(aX + b) = a^2 \mathrm{Var}(X)$
• Standard deviation is the (positive) square root of the variance:
  $$\mathrm{sd}(X) = \sqrt{\mathrm{Var}(X)} = \sigma$$
• Unlike the variance, the standard deviation is measured in the same units as $X$

Standardised variables
• Transforming a variable by subtracting its mean and dividing by its standard deviation gives a variable with mean 0 and variance 1.
• Define $Z = \dfrac{X - \mu}{\sigma}$; then, using E.2 and V.2,
  $$E(Z) = \frac{E(X) - \mu}{\sigma} = 0, \qquad \mathrm{Var}(Z) = \frac{\mathrm{Var}(X)}{\sigma^2} = 1$$

The Normal distribution
• Often in statistics we either know, or must assume, that a distribution has a particular shape.
• The most popular distribution used is the normal.
• Sometimes this is because we know that a statistic (e.g. the sample mean) follows the normal distribution, at least approximately in large samples, and sometimes it is because of its convenience.
• The normal distribution is symmetric and bell-shaped.
• If random variable $X$ follows a normal distribution with mean $\mu$ and variance $\sigma^2$, we write $X \sim N(\mu, \sigma^2)$.
• Height and weight are approximately normally distributed. Income is approximately log-normally distributed.

Normal distribution with μ=1 & σ=1
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty$$

[Figure: density of the normal distribution with μ=1 and σ=1]

Normal distributions with different means and standard deviations

        Plot 1  Plot 2  Plot 3
  Mean  1       2       3
  S.d.  1       2       1

[Figure: the three normal densities listed in the table]

Standard normal distribution
$$X \sim N(\mu, \sigma^2) \;\Rightarrow\; Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$
• Standard normal probability density function:
  $$\phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right)$$
• Standard normal cumulative distribution function:
  $$\Phi(z) = P(Z \le z)$$

Some commonly used probabilities from the standard normal
• P(Z < -1.96) = 0.025
• Since the distribution is symmetric, P(Z > 1.96) = 0.025
• So, P(-1.96 < Z < 1.96) = 0.95
• P(-1.645 < Z < 1.645) = 0.90
• P(-2.576 < Z < 2.576) = 0.99

Distributions related to the normal
• In statistics, we make frequent use of three distributions related to the normal: the $\chi^2$, $t$ and $F$ distributions
• We will frequently refer to tables to compute probabilities from these distributions
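
The definitions in this appendix can also be checked numerically instead of read from tables. The following is a minimal Python sketch, not part of the original slides: the function names (`expected_value`, `variance`, `std_normal_cdf`, `normal_cdf`) are illustrative choices, the standard normal cdf is computed from the standard library's error function, and the 99% case uses the conventional rounded cutoff 2.576.

```python
import math


def expected_value(values, probs):
    """E(X) = sum_j x_j * f(x_j) for a discrete random variable."""
    return sum(x * p for x, p in zip(values, probs))


def variance(values, probs):
    """Var(X) = E[(X - mu)^2] for a discrete random variable."""
    mu = expected_value(values, probs)
    return sum(p * (x - mu) ** 2 for x, p in zip(values, probs))


def std_normal_cdf(z):
    """Phi(z) = P(Z <= z) for Z ~ N(0, 1), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), standardising to Z = (X - mu) / sigma."""
    return std_normal_cdf((x - mu) / sigma)


# Bernoulli coin toss: X = 1 (heads) with probability 0.5, else 0 (tails).
coin_values, coin_probs = [0, 1], [0.5, 0.5]
coin_mean = expected_value(coin_values, coin_probs)
coin_var = variance(coin_values, coin_probs)

# Central probabilities P(-c < Z < c) = Phi(c) - Phi(-c).
central = {c: std_normal_cdf(c) - std_normal_cdf(-c) for c in (1.645, 1.96, 2.576)}

print(f"Bernoulli(0.5): mean = {coin_mean}, variance = {coin_var}")
for c, p in central.items():
    print(f"P(-{c} < Z < {c}) = {p:.4f}")
```

Because `normal_cdf` works by standardising first, a probability for any normal variable reduces to a standard normal one: for example, `normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)` gives P(a < X ≤ b) as F(b) − F(a).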