
Probability
Statistical inference about a population always involves
uncertainty, since inference is based on a sample (data), and a
sample is only a small subset of the population. That is, our
conclusions cannot be 100% certain.
The magnitude of uncertainty can be measured by probability.
For any (random) event A, 0 ≤ P(A) ≤ 1, with P(A) = 1
meaning 100% (certain) and P(A) = 0 meaning 0% (impossible).
In statistical analysis, it is important to estimate the magnitude of
uncertainty.
Probability
In probability calculations, certain rules are useful.
Combination rule: The total number of (un-ordered) ways to
select r objects from n (distinct) objects (n ≥ r) is

  (n choose r) = n! / (r!(n − r)!),

where n! = n × (n − 1) × · · · × 2 × 1.
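As a quick check, here is a minimal Python sketch of the
combination rule; the choice n = 5, r = 3 is only an illustration,
and only the standard library is used.

```python
from math import comb, factorial

n, r = 5, 3  # illustrative choice: select 3 objects out of 5

# Direct formula: n! / (r! (n - r)!)
by_formula = factorial(n) // (factorial(r) * factorial(n - r))

# Built-in binomial coefficient, for comparison
by_builtin = comb(n, r)

print(by_formula, by_builtin)  # 10 10
```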
P(AB) denotes the probability that both events A and B happen.
P(A|B) denotes the conditional probability of event A given
that event B has already occurred.
Independent events
Two events A and B are independent if knowing whether one
event occurred does not affect the probability of the other
event.
For example, if two (or more) individuals are randomly chosen
from a population, then these two (or more) individuals can be
assumed to be independent.
Formally, two events A and B are independent if
P(A|B) = P(A) (i.e., knowing B does not affect P(A)), or
equivalently, P(AB) = P(A) × P(B) (the multiplication rule).
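A small sketch that checks the multiplication-rule criterion by
enumerating equally likely outcomes; the two-dice events A and B
below are our own illustrative choice, not from the slides.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))

A = {o for o in outcomes if o[0] % 2 == 0}  # first die is even
B = {o for o in outcomes if sum(o) == 7}    # total is 7

p_A = Fraction(len(A), len(outcomes))       # 1/2
p_B = Fraction(len(B), len(outcomes))       # 1/6
p_AB = Fraction(len(A & B), len(outcomes))  # 1/12

# Independence criterion: P(AB) = P(A) x P(B)
print(p_AB == p_A * p_B)  # True
```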
Disjoint events
Notation: A + B means “either event A or event B (or both) happens”.
Two events A and B are called disjoint or mutually exclusive if
they cannot happen at the same time. For example, A=“student
A passes the course”, B=“student A fails the course”.
The complement of A, denoted by Ā, consists of all outcomes
that are not in A. Thus, P(Ā) = 1 − P(A).
Multiplication rule and addition rule
Multiplication rule: If events A and B are independent, then
P(AB) = P(A) × P(B)
Addition rule: If events A and B are disjoint, then
P(A + B) = P(A) + P(B).
More general versions of the multiplication and addition rules are available.
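A short numeric sketch of these rules; the card-drawing setup is
an illustrative assumption (a standard 52-card deck, with
replacement between draws so that the draws are independent).

```python
# One draw from a standard 52-card deck
p_heart = 13 / 52
p_spade = 13 / 52

# Addition rule (hearts and spades are disjoint on one draw):
p_heart_or_spade = p_heart + p_spade  # 0.5

# Complement rule: P(not heart) = 1 - P(heart)
p_not_heart = 1 - p_heart             # 0.75

# Multiplication rule (two independent draws, with replacement):
p_two_hearts = p_heart * p_heart      # 0.0625

print(p_heart_or_spade, p_not_heart, p_two_hearts)
```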
Random Variable
A random variable (r.v.) X is a variable such that each outcome
of a random event corresponds to a certain value of X.
There are two types of random variables: discrete random
variables and continuous random variables.
The distribution of a discrete random variable shows all
possible values of the random variable and the corresponding
probability for each value. The distribution can be displayed
either in a table or by a general formula.
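For instance, a discrete distribution can be stored as a table of
value–probability pairs; the table below (number of heads in two
fair coin tosses) is an illustrative choice.

```python
# P(X = x) for X = number of heads in two fair coin tosses
dist = {0: 0.25, 1: 0.50, 2: 0.25}

for x, p in dist.items():
    print(f"P(X = {x}) = {p}")

# The probabilities over all possible values must sum to 1
assert sum(dist.values()) == 1.0
```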
Continuous Random Variable
The distribution of a continuous random variable is usually
described by a density function f(x). For example, the density
function of the r.v. Z ∼ N(0, 1) is

  f(x) = (1/√(2π)) e^(−x²/2),  −∞ < x < ∞.

The 5th percentile z0.05 of N(0, 1) is the value satisfying

  P(Z < z0.05) = ∫_{−∞}^{z0.05} f(x) dx = 0.05,

where Z ∼ N(0, 1) and f(x) is the density of N(0, 1).
In general, for a continuous r.v. X, P(a ≤ X ≤ b) is the area
under the density curve f(x) between a and b, for any real
numbers a ≤ b.
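A sketch of these N(0, 1) calculations, assuming SciPy is
available (scipy.stats.norm provides the cdf and its inverse, ppf):

```python
from scipy.stats import norm

# 5th percentile z_0.05: the value with P(Z < z_0.05) = 0.05
z_05 = norm.ppf(0.05)
print(z_05)            # about -1.645

# Check: area under the density to the left of z_0.05
print(norm.cdf(z_05))  # 0.05

# P(a <= Z <= b) = area under the density between a and b
a, b = -1.0, 1.0
print(norm.cdf(b) - norm.cdf(a))  # about 0.683
```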
Mean and Variance
The most important summaries of a random variable are its
mean (or expectation) and variance (or standard deviation).
The mean (or expectation) of a discrete random variable X,
denoted by E(X), is
  E(X) = ∑ᵢ xᵢ × P(X = xᵢ).
The variance of a discrete random variable X, denoted by
Var(X), is
  Var(X) = ∑ᵢ (xᵢ − E(X))² × P(X = xᵢ).
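These two sums are easy to compute directly; a sketch using the
earlier illustrative table (heads in two fair coin tosses):

```python
dist = {0: 0.25, 1: 0.50, 2: 0.25}  # P(X = x), illustrative table

mean = sum(x * p for x, p in dist.items())               # E(X) = 1.0
var = sum((x - mean) ** 2 * p for x, p in dist.items())  # Var(X) = 0.5
sd = var ** 0.5                                          # standard deviation

print(mean, var, sd)
```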
Mean and Variance
The mean (or expectation) of a continuous random variable X,
denoted also by E(X), is
  E(X) = ∫ x f(x) dx.
The variance of a continuous random variable X, denoted by
Var(X), is
  Var(X) = ∫ (x − E(X))² f(x) dx.
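These integrals can be approximated numerically; a sketch using
scipy.integrate.quad (assuming SciPy) with the N(0, 1) density
from the earlier slide, for which E(X) = 0 and Var(X) = 1 are the
known answers:

```python
from math import exp, inf, pi, sqrt
from scipy.integrate import quad

# Density of N(0, 1)
def f(x):
    return exp(-x**2 / 2) / sqrt(2 * pi)

# E(X) = integral of x f(x) over the whole real line
mean, _ = quad(lambda x: x * f(x), -inf, inf)

# Var(X) = integral of (x - E(X))^2 f(x)
var, _ = quad(lambda x: (x - mean) ** 2 * f(x), -inf, inf)

print(round(mean, 6), round(var, 6))  # approximately 0 and 1
```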
Properties of Mean and Variance
Let X and Y be two r.v.’s (either discrete or continuous), and let
a and b be two constants (any fixed real numbers). Then

  E(a + bX) = a + b E(X),   Var(a + bX) = b² Var(X),
  E(X + Y) = E(X) + E(Y),   E(X − Y) = E(X) − E(Y).
If X and Y are independent, then
  Var(X + Y) = Var(X) + Var(Y),   Var(X − Y) = Var(X) + Var(Y).
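These identities can be checked by simulation; a sketch with
NumPy, where the distributions of X and Y and the constants a, b
are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed
n = 1_000_000

# Independent X and Y
X = rng.normal(loc=2.0, scale=3.0, size=n)  # Var(X) = 9
Y = rng.exponential(scale=2.0, size=n)      # Var(Y) = 4

# Var(a + bX) = b^2 Var(X): with a = 1, b = 5, expect 25 * 9 = 225
print(np.var(1 + 5 * X))

# For independent X, Y: Var(X + Y) and Var(X - Y) are both about 13
print(np.var(X + Y), np.var(X - Y))
```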
Linear Combination of Normal Random Variables
Random variables X₁, X₂, · · · , Xₙ are called i.i.d. if they are
independent and identically distributed. Example: a simple
random sample.
A linear combination of i.i.d. normally distributed random
variables still follows a normal distribution. That is, if
X₁, X₂, · · · , Xₙ are i.i.d. with each Xᵢ ∼ N(µ, σ), and if
c₁, c₂, · · · , cₙ are constants, then the linear combination
Y = c₁X₁ + c₂X₂ + · · · + cₙXₙ also follows a normal distribution:

  Y ∼ N( (∑ᵢ cᵢ) µ, √(∑ᵢ cᵢ²) σ ),

where the sums run over i = 1, . . . , n.
Example: choose cᵢ = 1/n; then Y is the sample mean X̄, which
follows N(µ, σ/√n).
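A closing sketch checking the sample-mean case by simulation
(NumPy assumed; µ = 10, σ = 2, n = 25 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 10.0, 2.0, 25

# 100,000 replications of the sample mean of n i.i.d. N(mu, sigma) draws
xbar = rng.normal(mu, sigma, size=(100_000, n)).mean(axis=1)

# Theory (c_i = 1/n): Xbar ~ N(mu, sigma / sqrt(n)) = N(10, 0.4)
print(xbar.mean(), xbar.std())  # about 10 and 0.4
```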