Statistics for Cyber Security
Wenyaw Chan
Division of Biostatistics
School of Public Health
University of Texas
- Health Science Center at Houston
Module (a): Basic Properties
of Probability and Statistical
Inferences
Definition of Probability
Three ways of defining Probability:
1. Objective Probability
2. Deductive Logic Definition of Probability
3. Subjective Definition of Probability
Definition of Probability
• Objective Probability
– If E is an event in an experiment, the experiment is repeated a
very large number of times, say N, and the event E is observed
in n of these N trials then Prob(E) =n/N.
• Deductive Logic Definition of Probability
– The probability of an event is determined logically from
symmetry or geometric considerations associated with the
experiment.
• Subjective Definition of Probability
– The probability of an event is determined subjectively, reflecting
a person’s “degree of belief” that an event will occur.
Examples
1. Toss a coin 10,000 times; if heads comes up
5,001 times, then Pr(Head) = 5001/10000 = 0.5001.
2. Throw a dart at a circular target:
Pr(region A) = area(A)/total area
3. What is the probability that John will
pass this course? The answer depends on who
answers: John, the professor, or his friend.
Properties of probability
1. 0 ≤ P(E) ≤ 1
2. P(A or B occurs) = P(A) + P(B), if A and B
cannot happen at the same time
Mutually Exclusive
– Two events A and B are mutually exclusive if A ∩ B = ∅
– For a die toss, A1 = {1,2}, A2 = {3,4} and A3 = {5,6} are
mutually exclusive
Properties of probability II
3. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
– If two events A and B are independent, then
P(A ∩ B) = P(A)P(B)
4. If events A and B are independent, then
P(A ∪ B) = P(A) + P(B) − P(A)P(B)
= P(A) + P(B)[1 − P(A)]
Conditional Probability
Conditional probability of B given A:
P(B|A) = P(A ∩ B)/P(A).
– If A and B are independent, then
P(B|A) = P(B) = P(B|Aᶜ).
– If A and B are not independent, then
P(B|A) ≠ P(B) ≠ P(B|Aᶜ).
Toss two dice: what is the probability that the sum = 6,
given that the sum is even?
(5/36)/(1/2) = 5/18
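The answer can be sanity-checked by simulation. A minimal SAS sketch (not part of the original slides; the seed and 100,000 replications are arbitrary choices):

data dice;
  call streaminit(20240501);
  do i = 1 to 100000;
    red   = ceil(6*rand('uniform'));   * red die, 1 to 6;
    blue  = ceil(6*rand('uniform'));   * blue die, 1 to 6;
    total = red + blue;
    output;
  end;
run;

* conditional relative frequency of {sum = 6} among even sums;
* should be close to 5/18 = 0.2778;
proc sql;
  select mean(total = 6) as pr_sum6_given_even
  from dice
  where mod(total, 2) = 0;
quit;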
Exhaustive Events
A set of events A1, A2, A3, …, Ak is exhaustive if at least
one of the events must occur,
i.e. A1 ∪ A2 ∪ A3 ∪ … ∪ Ak = sample space
Toss a die:
A1 = {1,2,3}, A2 = {1,3,4} and A3 = {2,5,6} are exhaustive
Law of Total Probability
P(B) = Σi P(B|Ai) P(Ai), if A1, A2, A3, …, Ak
are mutually exclusive and exhaustive
Toss a blue die and a red die:
Pr(red = even)
= Pr(red = even | sum = 2) Pr(sum = 2)
+ Pr(red = even | sum = 3) Pr(sum = 3)
+ … + Pr(red = even | sum = 12) Pr(sum = 12)
Bayes’ Rule
Let A and B be two events; then
P(B|A) = [P(A|B) P(B)] / [P(A|B) P(B) + P(A|Bᶜ) P(Bᶜ)]
Toss a blue die and a red die:
Pr(sum = even | red = 2) =
Pr(red = 2 | sum = even) Pr(sum = even) /
{Pr(red = 2 | sum = even) Pr(sum = even) +
Pr(red = 2 | sum = odd) Pr(sum = odd)}
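A worked evaluation (added for concreteness, not on the original slide): counting the 36 equally likely outcomes gives Pr(red = 2 | sum = even) = Pr(red = 2 | sum = odd) = 3/18 = 1/6 and Pr(sum = even) = Pr(sum = odd) = 1/2, so

Pr(sum = even | red = 2)
= (1/6 · 1/2) / (1/6 · 1/2 + 1/6 · 1/2)
= 1/2,

which matches direct reasoning: given red = 2, the sum is even exactly when the blue die is even.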
Population and Samples
• Random Sample
– is a selection of some members of the population such that each
member is independently chosen and has a known nonzero
probability of being selected.
• Simple Random Sample
– is a random sample in which each group member has the same
probability of being selected.
• Cluster Sampling
– involves selecting a random sample of clusters and then looking
at all study units within the chosen clusters.
– (one-stage)
– In two-stage sampling, a random sample of clusters is selected
and then, within each cluster, a random sample of study units is
selected.
Random Variables and their
Distributions
Random Variable
• Random Variable:
– A function that assigns a numeric value to
each outcome in the sample space.
• Discrete Random Variable:
– A random variable that assumes only a finite or
denumerable number of values.
– The probability mass function of a discrete random
variable X that assumes values x1, x2,… is p(x1),
p(x2), …., where p(xi)=Pr[X= xi].
• Continuous Random Variable:
– A random variable whose possible values cannot be
enumerated.
Example: Flip a coin 3 times
• Random Variable
– X = # of heads in the 3 coin tosses
• Probability Mass Function
– P(X=3) = P{HHH} = 1/8
– P(X=2) = P{HHT, HTH, THH} = 3/8
– P(X=1) = P{HTT, THT, TTH} = 3/8
– P(X=0) = P{TTT} = 1/8
• X is a discrete random variable with probability
(mass) function:
x:        0     1     2     3
P(X=x):  1/8   3/8   3/8   1/8
Random Variable
Expected value of X:
E(X) = Σ_{i=1}^{k} x_i Pr(X = x_i)
Variance of X:
σ² = Var(X) = Σ_{i=1}^{k} (x_i − µ)² Pr(X = x_i), where µ = E(X)
Standard Deviation of X:
σ = √Var(X)
Random Variable
• Note:
Var(X) = E[(X − µ)²] = E(X²) − [E(X)]²
• Cumulative Distribution Function
– of X: F(x) = Pr(X ≤ x)
Binomial Distribution
• Examples of the binomial distribution have a
common structure:
– n independent trials
– each trial has only two possible outcomes, called
“success” and “failure”.
– Pr (success) = p for all trials
Binomial Distribution
• If X = # of successful trials in these n trials,
then X has a binomial distribution:
P(X = k) = C(n,k) p^k (1 − p)^{n−k}
• k = 0, 1, 2, …, n
• where
C(n,k) = n! / [(n − k)! k!]
• Example: Flip a coin 10 times
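As an illustration (not from the original slides), SAS's built-in PDF and CDF functions evaluate the binomial p.m.f. and c.d.f.; n = 10 and p = 0.5 match the coin-flipping example:

data binom;
  n = 10; p = 0.5;
  do k = 0 to n;
    prob_eq = pdf('binomial', k, p, n);   * P(X = k);
    prob_le = cdf('binomial', k, p, n);   * P(X <= k);
    output;
  end;
run;

proc print data=binom noobs;
  var k prob_eq prob_le;
run;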
Properties of Binomial
Distribution
• If X~ Binomial (n, p), then
E(X) = np
Var (X) = np(1-p)
Poisson Distribution
Pr(X = k) = e^{−λ} λ^k / k!
k = 0, 1, 2, …
If X ~ Poisson(λ), then E(X) = λ and Var(X) = λ.
Poisson Process
• Assumption 1:
– Pr{1 event occurs in a very small time interval [0, t)} ≈ λt
– Pr{0 events occur in a very small time interval [0, t)} ≈ 1 − λt
– Pr{more than one event occurs in a very small time interval [0,
t)} ≈ 0
• Assumption 2:
– The probability that an event occurs per unit time is the
same throughout the entire time interval
• Assumption 3:
– Pr{one event in [t1, t2) | one event in [t0, t1)}
= Pr{one event in [t1, t2)}
Poisson Distribution
• X = the number of events occurring in the time
period t for the above process with
parameter λ; then mean = λt and
Pr(X = k) = e^{−λt} (λt)^k / k!
where k = 0, 1, 2, …
and e ≈ 2.71828
E(X) = Var(X) = λt
Poisson approximation to
Binomial
• If X ~ Binomial(n, p), n is large and p is
small, then
P(X = k) ≈ e^{−np} (np)^k / k!
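A numerical check of the approximation (a sketch, not from the original slides; n = 1000 and p = 0.003, so np = 3, are arbitrary illustrative values):

data approx;
  n = 1000; p = 0.003; lambda = n*p;
  do k = 0 to 10;
    exact = pdf('binomial', k, p, n);    * exact binomial P(X = k);
    pois  = pdf('poisson', k, lambda);   * Poisson approximation;
    diff  = exact - pois;
    output;
  end;
run;

proc print data=approx noobs;
  var k exact pois diff;
run;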
Continuous Probability
Distributions
• Probability density function (p.d.f.) (of a
random variable):
– a curve f(x) such that the area under the curve
between any two points a and b equals
Pr(a ≤ X ≤ b) = ∫_a^b f(x) dx
[figure: shaded area Pr(a ≤ X ≤ b) under the density curve between a and b]
Continuous Probability
Distributions
• Cumulative distribution function: F(a) = Pr(X ≤ a)
[figure: shaded area Pr(X ≤ a) under the density curve to the left of a]
Continuous Probability
Distributions
• The expected value of a continuous
random variable X is
E(X) = ∫ x f(x) dx, where f(x) is the p.d.f. of X.
• The definition of the variance of a
continuous random variable is the
same as that of a discrete random
variable, i.e.
Var(X) = E(X²) − (EX)² = ∫ (x − µ)² f(x) dx, where
µ = E(X).
The Normal Distribution
(The Gaussian distribution)
• The p.d.f. of a normal distribution is
f(x) = [1 / (√(2π) σ)] exp( −(x − µ)² / (2σ²) )
where −∞ < x < ∞
The Normal Distribution
[figure: a bell-shaped curve symmetric about µ, with points of
inflection at µ − σ and µ + σ]
• Notation: X ~ N(µ, σ²)
µ: mean
σ²: variance
The Normal Distribution
• N(0,1) is the standard normal distribution
• If X ~ N(0,1), then
Φ(x) = Pr(X ≤ x)
– ~ : “is distributed as”
– Φ : c.d.f. for the standard normal r.v.
• Note:
– A point of inflection is a point where the
concavity of the curve changes direction.
Properties of the N(0,1)
• 1. Φ(−x) = 1 − Φ(x)
• 2.
– About 68% of the area under the standard
normal curve lies between –1 and 1.
– About 95% of the area under the standard
normal curve lies between –2 and 2.
– About 99% of the area under the standard
normal curve lies between –2.5 and 2.5.
Properties of the N(0,1)
• If X ~ N(0,1) and P(X < Z_u) = u, 0 ≤ u ≤ 1,
then Z_u is called the 100u-th percentile
of the standard normal distribution.
95th %tile = 1.645, 97.5th %tile = 1.96, 99th
%tile = 2.33
[figure: standard normal curve with area u to the left of Z_u]
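These percentiles can be verified with SAS's QUANTILE function (a minimal sketch, not part of the original slides):

data _null_;
  z95  = quantile('normal', 0.95);    * 1.645;
  z975 = quantile('normal', 0.975);   * 1.960;
  z99  = quantile('normal', 0.99);    * 2.326;
  put z95= z975= z99=;
run;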
Properties of the N(0,1)
• If X ~ N(µ, σ²), then
(X − µ)/σ ~ N(0,1)
• This property allows us to calculate the
probability of a non-standard normal
random variable:
Pr(a ≤ X ≤ b) = Pr( (a − µ)/σ ≤ (X − µ)/σ ≤ (b − µ)/σ )
= Φ( (b − µ)/σ ) − Φ( (a − µ)/σ )
Other Distributions--t distribution
• Let X1, …, Xn be a random sample from a
normal population N(µ, σ²).
Then
(X̄ − µ) / (s/√n)
has a t distribution with n−1 degrees of
freedom (df).
Other Distributions--Chi-square distribution
• Let X1, …, Xn be a random sample from a
normal population N(0, 1).
Then
Σ_{i=1}^{n} X_i²
has a chi-square distribution with n
degrees of freedom (df).
Other Distributions--F distribution
• Let U and V be independent random
variables having chi-square
distributions with p and q degrees of
freedom, respectively.
Then
(U/p) / (V/q)
has an F distribution with p and q degrees
of freedom (df).
Covariance and Correlation
• The covariance between two random
variables is defined by
Cov(X,Y)=E[(X-µX)(Y-µY)].
• The correlation coefficient between two
random variables is defined by
ρ=Corr(X,Y)=Cov(X,Y)/(σX σ Y).
Variance of a Linear
Combination
• Var(c1X1 + c2X2)
= c1² Var(X1) + c2² Var(X2) + 2 c1 c2 Cov(X1, X2)
= c1² Var(X1) + c2² Var(X2) + 2 c1 c2 σ_{X1} σ_{X2} Corr(X1, X2)
Estimation
• Point Estimates
– A point estimate of a parameter θ is a single
number used as an estimate of the value of θ.
– e.g. A natural estimate to use for estimating the
population mean is the sample mean
X̄ = Σ_{i=1}^{n} X_i / n.
• Interval Estimation
– If a random interval I = (L, U) satisfies Pr(L < θ
< U) = 1 − α, the observed values of L and U for a
given sample are called a 1 − α confidence interval
estimate for θ.
Which one is more accurate?
Which one is more precise?
Estimation
What to estimate?
• B(n, p): the proportion p
• Poisson(λ): the mean λ
• N(µ, σ²): the mean µ and/or the variance σ²
Estimation of the Mean of a
Distribution
• A point estimator of the population mean µ
is the sample mean X̄.
• The sampling distribution of X̄
is the distribution of values of X̄ over all
possible samples of size n that could have
been selected from the reference population.
• E(X̄) = µ
Estimation
• An estimator of a parameter is an unbiased
estimator if its expectation is equal to the
parameter.
• Note: Unbiasedness is not sufficient to be used as
the only criterion for choosing an estimator.
• The unbiased estimator with the
minimum variance (MVUE) is preferred.
• If the population is normal, then X̄ is the MVUE of µ.
Sample Mean
• Standard error (of the mean)
= standard deviation of the sample mean
= √(σ²/n) = σ/√n
• The estimated standard error is
s/√n
where s: sample standard deviation.
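A sketch of this computation in SAS (not from the original slides; the eight data values are made up for illustration):

data example;
  input x @@;
  datalines;
12 15 9 11 14 10 13 12
;
run;

* n, sample mean, s, and the estimated standard error s/sqrt(n);
proc means data=example n mean std stderr;
  var x;
run;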
Central Limit Theorem
• Let X1, …, Xn be a random sample from
some population with mean µ and
variance σ².
Then, for large n,
X̄ is approximately distributed as N(µ, σ²/n).
Interval Estimation
• Let X1, …, Xn be a random sample from a
normal population N(µ, σ²). If σ² is known,
a 95% confidence interval (C.I.) for µ is
( X̄ − 1.96 σ/√n , X̄ + 1.96 σ/√n )
why? (next slide)
Interval Estimation
If X̄ ~ N(µ, σ²/n), then
Pr( −1.96 ≤ (X̄ − µ)/(σ/√n) ≤ 1.96 ) = .95
i.e.
−1.96 σ/√n ≤ X̄ − µ ≤ 1.96 σ/√n
⇔ X̄ − 1.96 σ/√n ≤ µ ≤ X̄ + 1.96 σ/√n
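As a sketch (not from the original slides), both interval forms in SAS, reusing the made-up data set from the PROC MEANS sketch above; CLM gives the usual t-based interval for unknown σ, while the known-σ interval is computed by hand assuming σ = 2 for illustration:

* t-based 95% C.I. (sigma unknown);
proc means data=example mean clm alpha=0.05;
  var x;
run;

* known-sigma interval xbar +/- 1.96*sigma/sqrt(n), with sigma = 2 assumed;
proc sql;
  select mean(x) - 1.96*2/sqrt(count(x)) as lower,
         mean(x) + 1.96*2/sqrt(count(x)) as upper
  from example;
quit;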
Interval Estimation
Interpretation of Confidence Interval
• Over the collection of 95% confidence
intervals that could be constructed from
repeated random samples of size n, 95%
of them will contain the parameter
• It is wrong to say:
There is a 95% chance that the parameter
will fall within a particular 95%
confidence interval.
Interval Estimation
• Note:
1. When σ and n are fixed, a 99% C.I. is wider than a 95% C.I.
2. If the width (length) of the C.I. is specified, the sample size can be
determined: as n increases, the length decreases.
Hypothesis Testing
• Null hypothesis (H0): the statement to be
tested, usually reflecting the status quo.
• Alternative hypothesis (H1): the logical
complement of H0.
• Note: the null hypothesis is analogous to the
defendant in the court. It is presumed to be true
unless the data argue overwhelmingly to the
contrary.
Hypothesis Testing
• Four possible outcomes of the decision:
                   Truth
Decision       H0              H1
Accept H0      OK              Type II error
Reject H0      Type I error    OK
• Notation:
α = Pr(Type I error) = level of significance
β = Pr(Type II error)
1 − β = power = Pr(reject H0 | H1 is true)
Hypothesis Testing
• Goal:
to make α and β both small
• Facts:
if α decreases, then β increases;
if α increases, then β decreases.
• General Strategy:
fix α, minimize β
Testing for the Population
Mean
• When the sample is from a normal population:
H0 : µ = 120 vs H1 : µ < 120
• The best test is based on X̄, which is called the
test statistic. The "best test" means that the test
has the highest power among all tests with a given
type I error.
Is there any bad test? Yes.
• Rejection Region:
– the range of values of the test statistic for which H0 is rejected.
One-tailed test
• Our rejection region is X̄ ≤ c.
• Now, Pr(Type I error)
= Pr( X̄ ≤ c | X̄ ~ N(µ0, σ²/n) )
= Φ( (c − µ0)/(σ/√n) )
i.e. set
(c − µ0)/(σ/√n) = Z_α, or c = µ0 + Z_α σ/√n
Result
• To test H0 : µ = µ0 vs H1 : µ < µ0, based
on the samples taken from a normal
population with mean µ and variance
unknown,
the test statistic is
t = (x̄ − µ0) / (s/√n).
• Assume the level of significance is α; then,
– if t < t_{n−1,α}, we reject H0.
– if t ≥ t_{n−1,α}, we do not reject H0.
(Here t_{n−1,α} denotes the 100α-th percentile of the t distribution
with n−1 df, which is negative for small α; see Note 1 below.)
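A sketch of this test in SAS (not from the original slides; the data values are made up, the null value 120 echoes the earlier example, and SIDES=L, which requests the lower one-sided alternative, assumes SAS 9.3 or later):

data scores;
  input x @@;
  datalines;
118 123 115 121 119 116 120 117
;
run;

* one-sample t test of H0: mu = 120 vs H1: mu < 120;
proc ttest data=scores h0=120 sides=L alpha=0.05;
  var x;
run;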
P-value
• The minimum α-level at which we can reject H0
based on the sample.
• The p-value can also be thought of as the
probability of obtaining a test statistic as
extreme as or more extreme than the actual
test statistic obtained from the sample, given
that the null hypothesis is true.
[figure: shaded tail area under the curve representing the p-value]
Remarks
• Two different approaches to determining
statistical significance:
– Critical value method
– P-value method
One-tailed test
• Testing H0: µ = µ0 vs H1: µ > µ0,
when σ² is unknown and the population is normal:
Test Statistic: t = (x̄ − µ0) / (s/√n)
Rejection Region: t > t_{n−1,1−α}
p-value = 1 − F_{t,n−1}(t), where F_{t,n−1}(·) is the c.d.f. for the t distribution
with df = n−1.
• Note: If σ² is known, the s in the test statistic is replaced by
σ, t_{n−1,1−α} in the rejection region is replaced by z_{1−α}, and F_{t,n−1}(t)
is replaced by Φ(t).
Testing For Two-Sided
Alternative
• Let X1, …, Xn be a random sample from the
population N(µ, σ²), where σ² is unknown.
• H0 : µ = µ0 vs H1 : µ ≠ µ0
– Test Statistic: t = (x̄ − µ0) / (s/√n)
– Rejection Region: |t| > t_{n−1,1−α/2}
– p-value = 2·F_{t,n−1}(t), if t ≤ 0 (see figures on next
slide)
2·[1 − F_{t,n−1}(t)], if t > 0.
• Warning: an exact p-value requires use of a
computer.
Testing For Two-Sided
Alternative
[figures: two-sided p-value as shaded tail areas. If x̄ > µ0, the
shaded regions lie below 2µ0 − x̄ and above x̄; if x̄ ≤ µ0, they lie
below x̄ and above 2µ0 − x̄.]
The Power of A Test
• To test H0 : µ = µ0 vs H1 : µ < µ0 in a normal
population with known variance σ², the power is
Φ( Z_α + (µ0 − µ1)√n / σ ).
• Review: Power = Pr[rejecting H0 | H0 is false]
• Factors Affecting the Power
1. as α increases, Z_α increases, so the power increases
2. as |µ0 − µ1| increases, the power increases
3. as σ decreases, the power increases
4. as n increases, the power increases
The Power of The 1-Sample T Test
• To test H0 : µ = µ0 vs H1 : µ < µ0 in a normal
population with unknown variance σ², the
power, for true mean µ1 and true s.d. σ, is
F(t_{n−1,0.05}), where F(·) is the c.d.f. of the non-central t
distribution with df = n−1 and non-centrality
(µ1 − µ0)√n / σ.
Notes:
1. t_{n−1,0.05} = −t_{n−1,0.95}.
2. If X and Y are independent random variables
such that Y ~ N(Δ, 1) and X ~ χ² with d.f. = m,
then Y/√(X/m) is said to have a non-central
t distribution with non-centrality Δ.
Power Function For Two-Sided
Alternative
• To test H0 : µ = µ0 vs H1 : µ ≠ µ0 in a normal
population with known variance σ², the
power is
Φ( −Z_{1−α/2} + (µ0 − µ1)√n / σ ) + Φ( −Z_{1−α/2} + (µ1 − µ0)√n / σ ),
where µ1 is the true alternative mean.
Case of Unknown Variance
• For the same test with an unknown
variance population, the power is
F(−t_{n−1,1−α/2}) + 1 − F(t_{n−1,1−α/2}), where F(·) is the c.d.f.
of the non-central t distribution with df = n−1 and
non-centrality (µ1 − µ0)√n / σ.
Sample Size Determination
For example: H0 : µ = µ0 vs H1 : µ < µ0
power: Φ( Z_α + (µ0 − µ1)√n / σ ) = 1 − β,
if µ1 < µ0.
Hence, Z_α + (µ0 − µ1)√n / σ = Z_{1−β}
√n = σ ( Z_{1−β} − Z_α ) / (µ0 − µ1)
n = ( Z_{1−α} + Z_{1−β} )² σ² / (µ0 − µ1)²   (using Z_α = −Z_{1−α})
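A sketch of this calculation in SAS (not from the original slides; µ0 = 120, µ1 = 115, σ = 10, α = 0.05, and power = 0.90 are illustrative values):

data _null_;
  mu0 = 120; mu1 = 115; sigma = 10;
  alpha = 0.05; power = 0.90;
  z_alpha = quantile('normal', 1 - alpha);   * Z_{1-alpha};
  z_beta  = quantile('normal', power);       * Z_{1-beta};
  n = ceil( (z_alpha + z_beta)**2 * sigma**2 / (mu0 - mu1)**2 );
  put 'required n = ' n;                     * about 35 here;
run;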
Factor Affecting Sample Size
1. as σ² increases, n increases
2. as α decreases, n increases
3. as the required power 1 − β increases, n increases
4. as |µ0 − µ1| decreases, n increases
• To test H0 : µ=µ0 vs H1 : µ≠µ0, σ² is
known.
The sample size calculation is
n = ( Z_{1−α/2} + Z_{1−β} )² σ² / (µ0 − µ1)²
Relationship between
Hypothesis Testing and
Confidence Interval
• To test H0 : µ = µ0 vs H1 : µ ≠ µ0, H0 is
rejected with a two-sided level α test if and
only if the two-sided 100(1 − α)%
confidence interval for µ does not contain
µ0.
One Sample Test for the
Variance of A Normal Population
One Sample Test for A
Proportion
Exact Method
• If p̂ < p0,
the p-value
= 2 Pr[ X ≤ # of events in the observation | X ~ B(n, p0) ]
= 2 Σ_{k=0}^{# of events} C(n,k) p0^k (1 − p0)^{n−k}
• If p̂ ≥ p0,
the p-value
= 2 Pr[ X ≥ # of events in the observation | X ~ B(n, p0) ]
= 2 Σ_{k=# of events}^{n} C(n,k) p0^k (1 − p0)^{n−k}
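A sketch of the exact method in SAS (not from the original slides; n = 20 trials, x = 3 observed events, and p0 = 0.5 are made-up values with p̂ < p0, and the min(·, 1) cap follows the usual convention for two-sided exact p-values):

data _null_;
  n = 20; x = 3; p0 = 0.5;
  p_value = min(2 * cdf('binomial', x, p0, n), 1);   * 2*Pr(X <= x);
  put p_value=;
run;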
Power and Sample size
One-Sample Inference for the
Poisson Distribution
• X ~ Poisson with mean μ
• To test H0 : µ=µ0 vs H1 : µ≠µ0 at α level of
significance,
– Obtain a two-sided 100(1- α)% C.I. for µ,
say (C1, C2)
– If µ0 ∈ (C1, C2), we accept H0; otherwise we
reject H0.
One-Sample Inference for the
Poisson Distribution
• The p-value (for the above two-sided test):
– If the observed X < µ0, then
p = min[ 2 F(x | µ0), 1 ]
– If the observed X > µ0, then
p = min[ 2 (1 − F(x − 1 | µ0)), 1 ]
where F(x | µ0) is the Poisson c.d.f. with mean =
µ0.
Large-Sample Test for Poisson
(for µ0 ≥ 10)
• To test H0 : µ=µ0 vs H1 : µ≠µ0 at α level of
significance,
– Test Statistic:
X² = (x − µ0)² / µ0 ~ χ²₁ under H0
– Rejection Region:
X² > χ²_{1,1−α}
– p-value:
Pr( χ²₁ > X² )
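A sketch of this test in SAS (not from the original slides; the observed count x = 24 and null mean µ0 = 15 are made-up values satisfying µ0 ≥ 10):

data _null_;
  mu0 = 15; x = 24;
  chisq   = (x - mu0)**2 / mu0;               * test statistic;
  p_value = 1 - cdf('chisquare', chisq, 1);   * upper chi-square tail;
  put chisq= p_value=;
run;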
Introduction to SAS
• To run SAS you must create a file of SAS code, which
SAS will read and use to run the program.
• You can type your SAS code into the program editor and
create a SAS program file.
• A SAS program usually consists of two components:
Data steps and Procedure steps.
• In a SAS program, every statement terminates with a
semicolon.
• A comment statement begins with a * and ends with a
semicolon. For a comment of several lines, we use /* … */
A Simple SAS Program
title "My First SAS Program";
data temp;
  input id security_status $ years_of_Using_PC;
  datalines;
1 yes 10
2 yes 9
3 no 15
4 yes 12
5 no 7
;
proc print;
  var id security_status years_of_Using_PC;
run;
How to run a SAS program
• To run a SAS program, you click on the “running man” icon at the
center of the top of the main window.
• After you run the program, the log window will become active and
provide you with information that includes error or warning
messages.
• The output can be found in the “Results Viewer - SAS Output”
window.
Probability Models Related to
Computer Security
• If a computer hacker has probability p of
succeeding on each attempt, what is the
probability that he/she will succeed within N
attempts?
• The probability that there will be no success
within N attempts is (1 − p)^N.
• The probability that he/she will succeed at
least once within N attempts is 1 − (1 − p)^N.
Probability Models Related to
Computer Security
• How many attempts are required for this
hacker to achieve a success rate of x%?
• From the last question, we set
x% = 1 − (1 − p)^N
• i.e. log(1 − x%) = N·log(1 − p)
• So N is the smallest integer that is ≥
log(1 − x%) / log(1 − p).
(Note log(1 − p) < 0, so dividing by it flips the inequality,
which is why the smallest such integer suffices.)
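A sketch of this calculation in SAS (not from the original slides; p = 0.05 per attempt and a 90% target success rate are illustrative values):

data _null_;
  p = 0.05; x = 0.90;                    * per-attempt success prob and target rate;
  N = ceil( log(1 - x) / log(1 - p) );   * smallest qualifying integer N;
  check = 1 - (1 - p)**N;                * achieved success probability;
  put N= check=;                         * N=45, check is about 0.9006;
run;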