Part 2
The main methods of mathematical statistics: Probability distribution of a random variable
Ondrej Ploc
Outline
• 2.1 Assignment of a theoretical distribution to an empirical distribution
• 2.2 Comparison of empirical and theoretical parameters, estimation of theoretical parameters, testing of parametric hypotheses
• 2.3 Measurement of statistical dependences – some fundamentals of regression and correlation analysis
2.1
Assignment of a theoretical distribution to an empirical distribution
Goals
• Probabilistic investigation of the selective statistical set: choice of an acceptable theoretical distribution
• Probabilistic picture of the selective statistical set: testing of non-parametric hypotheses
Acquired concepts and knowledge
• Theoretical distributions, a partial survey in alphabetical order:
  – Bernoulli, Beta, Binomial, Chi-square, Discrete Uniform, Erlang, Exponential, F, Gamma, Geometric, Lognormal, Negative binomial, Normal, Poisson, Student's, Triangular, Uniform, Weibull
• Testing of non-parametric hypotheses
  – Test of the null hypothesis H0
  – Accepting or rejecting the null hypothesis H0
  – Level of statistical significance α, e.g. α = 0.05
Assigned example
• Hypothesis: the empirical distribution can be substituted by the normal distribution
Assigned example (2)
• The results of the elaboration of 50 tests
Assignment of a theoretical distribution to an empirical distribution
= testing of a non-parametric hypothesis
• A theoretical distribution is preferable because its simple mathematical apparatus makes it possible to obtain information that is inaccessible in any other way
2.1.1 Interval division of frequencies
• It is recommended to construct 5 to 20 equidistant intervals covering the range of the statistical sign (variable) values
• Sturges' rule (empirical)
  k = 1 + 3.3 log n,
  where n is the extent (size) of the selective statistical set
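As a simple illustration (not worked out in the source), applying Sturges' rule to the n = 50 results of the assigned example gives
  k = 1 + 3.3 · log10(50) ≈ 1 + 3.3 · 1.699 ≈ 6.6,
i.e. about 7 equidistant intervals.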
2.1.2 Theoretical distribution
• Fundamental concept of probability theory
  – it is the rule which assigns a probability to every value of a random variable
• A random variable is a quantity whose value is uniquely determined by the result of a random attempt (trial)
• A random attempt is a realization of activities or processes whose result cannot be anticipated with certainty
• Probability = number of favourable results of the random attempt / number of all possible results (e.g. shooting at a target)
• Random variables can be discrete or continuous
2.1.2 Theoretical distribution
Random variable
• To the values of a random variable it is possible to assign the probabilities with which they occur in the course of a random attempt.
2.1.2 Theoretical distribution
Distribution function F
• The (cumulative) distribution function gives the probability that a random variable RV takes a value smaller than or equal to a chosen value xi (or x); this cumulative probability is expressed by a summation (or an integral) of the partial probabilities:
  F(x) = P(X ≤ x)
• The probability that X lies in the semi-closed interval (a, b], where a < b, is therefore
  P(a < X ≤ b) = F(b) − F(a)
• Properties (standard): F is non-decreasing and right-continuous, F(x) → 0 as x → −∞ and F(x) → 1 as x → +∞.
Parameters of theoretical distributions
• The theoretical general, central and standardized moments Oj, Cj and Nj
• Discrete case: Pi denotes the probability function (the probabilities of the values xi of the random variable RV)
• Continuous case: ρ(x) denotes the probability density and x the values of the RV
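The moment formulas themselves are not reproduced in the source; a standard reconstruction consistent with the notation Oj, Cj, Nj introduced above would be
  Oj = Σi xi^j · Pi            (discrete),      Oj = ∫ x^j · ρ(x) dx            (continuous),
  Cj = Σi (xi − O1)^j · Pi     (discrete),      Cj = ∫ (x − O1)^j · ρ(x) dx     (continuous),
  Nj = Cj / (C2)^(j/2),
so that in particular O1 = E (expected value) and C2 = D (dispersion), as stated on the next slide.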
Parameters of theoretical distributions
• The names and symbols "mean value (expected value) E" and "dispersion (variance) D" are often used as well. The expected value E is a location parameter which measures the level of the random variable RV. The dispersion D is a variability parameter which measures the "diffusion" (spread) of the random variable values. The expected value E is equal to the theoretical general moment of 1st order O1, and the dispersion D is equal to the theoretical central moment of 2nd order C2.
• The theoretical general moment of 1st order O1 is the location parameter, the theoretical central moment of 2nd order C2 is the variability parameter, the theoretical standardized moment of 3rd order N3 is the skewness parameter and the theoretical standardized moment of 4th order N4 is the kurtosis parameter.
• The relation between empirical and theoretical parameters is described by the law of large numbers. Subject to certain conditions, it can be expected that the empirical distribution and the related empirical parameters will approximate the theoretical distribution and the associated theoretical parameters, and the more so, the greater the extent of the selective statistical set (the larger the number of realized random attempts). The approach of the empirical parameters to the theoretical parameters does not have the character of mathematical convergence but of convergence in probability.
Discrete distribution
Binomial distribution
• Characteristic of the random phenomenon
  – n independent random attempts are carried out; the probability of the monitored random phenomenon is the same in all the random attempts and is equal to p. We seek the probability that this phenomenon occurs 0, 1, …, n times. According to this definition, the values x0, x1, …, xn of the relevant random variable are given by the numbers 0, 1, …, n.
• Theoretical distribution (probability function)
  – For the described random phenomenon the probability function is the rule which assigns the probabilities Pi, i = 0, 1, …, n, to the values xi of the random variable.
 Distribution function:
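Neither the probability function nor the distribution function of the binomial distribution is written out in the source; the standard forms (a reconstruction, with C(n, i) denoting the binomial coefficient) are
  Pi = C(n, i) · p^i · (1 − p)^(n−i),   i = 0, 1, …, n,
  F(x) = Σ(i ≤ x) C(n, i) · p^i · (1 − p)^(n−i).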
Discrete distribution
Binomial distribution
• The significance of the binomial distribution
  – A typical example of independent random attempts is a random selection of elements from a set in which the selected element is returned back, the so-called selection with return (sampling with replacement). It can be shown that, in the case where the extent of the selective set is small in comparison with the extent of the basic set, the difference between the selection with return and the selection without return is insignificant. The binomial distribution can therefore serve as a suitable criterion of whether the selective statistical set was created on the basis of random selection.
Continuous distribution
Normal distribution
• Characteristic of the random phenomenon
  – A continuous random variable whose values x ∈ (−∞, ∞) can have a normal distribution. The graph of the function which assigns probabilities to these values of the random variable is the well-known Gaussian curve with the shape of a "bell". What is sought is the probability assigned to an interval of values of the continuous random variable, in the sense that this interval contains the value x.
• Theoretical distribution (probability density function)
 Distribution function:
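The density and distribution function formulas are not reproduced in the source; the standard forms of the normal distribution N(μ, σ) (a reconstruction) are
  ρ(x) = 1 / (σ · √(2π)) · e^(−(x − μ)² / (2σ²)),   x ∈ (−∞, ∞),
  F(x) = ∫ from −∞ to x of ρ(t) dt.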
Continuous distribution
Normal distribution
• Standardized normal distribution: N(0, 1)
• Its distribution function F(u) is the Laplace function
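The standardization formulas are not written out in the source; the usual relations (a reconstruction) are
  u = (x − μ) / σ,
  φ(u) = 1 / √(2π) · e^(−u²/2),   F(u) = ∫ from −∞ to u of φ(t) dt,
so that probabilities of a general N(μ, σ) distribution can be read from tables of the standardized distribution.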
Continuous distribution
Normal distribution
Discrete distribution
Alternative distribution
• Special case of the binomial distribution for n = 1
• The alternative distribution is a discrete theoretical distribution A(p) with one theoretical parameter p of a zero-one random variable RV (the random variable takes the values xi = i = 0, 1).
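The probability function of A(p) is not shown in the source; the standard form (a reconstruction) is
  P0 = 1 − p,   P1 = p,
with expected value E = p and dispersion D = p · (1 − p).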
Discrete distribution
Poisson distribution
• The Poisson distribution is a discrete theoretical distribution Po(λ) with one theoretical parameter λ of a random variable RV (the random variable takes the values xi = i = 0, 1, …, ∞).
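The probability function of Po(λ) is not reproduced in the source; the standard form (a reconstruction) is
  Pi = (λ^i / i!) · e^(−λ),   i = 0, 1, …, ∞,
with expected value E = λ and dispersion D = λ.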
Discrete distribution
Geometric distribution
• The geometric distribution is a discrete theoretical distribution Ge(p) with one theoretical parameter p of a random variable RV (the random variable takes the values xi = i = 0, 1, …, ∞).
• The probabilities Pi decrease geometrically with increasing values of i. Independent attempts are carried out and the probability of the observed phenomenon (i.e. the probability of success) is the same in all the attempts and equal to p. The probability that success occurs only in attempt i + 1 is given by the probability function Pi.
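The probability function of Ge(p) is not written out in the source; the form consistent with the description above (success only in attempt i + 1, i.e. after i failures) is
  Pi = (1 − p)^i · p,   i = 0, 1, …, ∞.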
Discrete distribution
Geometric distribution
Continuous distribution
Lognormal distribution
• The lognormal distribution is a continuous theoretical distribution LN(μ, σ) of a random variable RV which is an increasing function of a random variable Y of the form x = e^y (the random variable Y has the normal distribution N(μ, σ)). The lognormal distribution has two theoretical parameters μ, σ.
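The probability density of LN(μ, σ) is not reproduced in the source; the standard form (a reconstruction following from x = e^y with Y having the distribution N(μ, σ)) is
  ρ(x) = 1 / (x · σ · √(2π)) · e^(−(ln x − μ)² / (2σ²)),   x > 0.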
Continuous distribution
Lognormal distribution
Apparatus of non-parametric testing
• The null hypothesis H0 supposes that the empirical distribution can be substituted by the intended theoretical distribution
• The alternative hypothesis HA then supposes that this presumption is not correct
• A comparison between theoretical and empirical absolute frequencies is the essence of testing non-parametric hypotheses.
Apparatus of non-parametric testing
• For the verification of non-parametric and parametric hypotheses a special group of theoretical distributions was developed – these distributions are not intended to replace the empirical distributions but serve as statistical criteria. The normal distribution is the only exception – in its standardized shape it may play the role of a statistical criterion, while in its non-standardized shape it may substitute the empirical distributions.
• The standardized normal distribution (u-test), Student's distribution (t-test), Pearson's χ2 distribution (χ2-test) and the Fisher-Snedecor distribution (F-test) belong among the most frequent statistical criteria.
Apparatus of non-parametric testing
• For the verification of the hypotheses H0 and HA a suitable statistical criterion has to be selected. The χ2-test is used most frequently for the verification of a non-parametric hypothesis. Since the creation of an interval division of frequencies is a condition for its application, each partial interval must then be associated with an absolute frequency of at least 5. If this condition is not fulfilled, it is necessary to merge neighbouring partial intervals. One proceeds in the same way when an interval division of frequencies is already available.
Apparatus of non-parametric testing
• After the selection of a statistical criterion (e.g. the χ2-test) it is necessary to determine the experimental value of this criterion (e.g. χ2-exp.) and the critical theoretical value (e.g. χ2-theor.). The so-called critical domain W of the relevant statistical criterion is then delimited by means of the critical theoretical value.
• If the experimental value of the selected criterion is an element of the critical domain W, it is necessary to accept the alternative hypothesis HA – i.e. the empirical distribution cannot be substituted by the intended theoretical distribution. In the opposite case (the experimental value is not an element of the critical domain W) the null hypothesis H0 can be accepted – i.e. the empirical distribution can be substituted by the intended theoretical distribution.
Significance level
• The determination of the significance level α is an essential element of testing non-parametric and parametric hypotheses. The significance level gives the probability of an erroneous rejection of the tested hypothesis (i.e. the probability of an error of the first type). The most frequent significance levels are the values α = 0.05 and α = 0.01. E.g., for a positive test of normality (i.e. the hypothesis H0 that the empirical distribution can be substituted by the normal distribution is accepted and the hypothesis HA is rejected), the significance level 0.05 allows the following conclusion: if the selective statistical set SSS is selected 100 times from the basic statistical set BSS, in about 95 cases it will turn out that the empirical distribution can be substituted by the normal distribution.
2.1.5 Illustration of Non-parametric Testing
• Hypothesis: the empirical distribution can be substituted by the normal distribution
2.1.5 Illustration of Non-parametric Testing
• In the course of testing the χ2-test will be applied. In its application the letter k refers to the number of intervals of the interval division of frequencies and the letter r to the number of theoretical parameters of the normal distribution (i.e. r = 2). The expression ν = k − r − 1 gives the number of degrees of freedom, which, together with the selected level of significance, makes it possible to determine the critical theoretical value
  χ2-theor. = χ2α(ν),
  i.e. the critical value of the χ2 distribution with ν degrees of freedom at the significance level α, using statistical tables.
• The significance level is selected as α = 0.05.
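The experimental value of the criterion is not written out in the source; the usual form of the χ2 statistic, consistent with the comparison of empirical and theoretical absolute frequencies described above (the symbols nj and πj are introduced here only for illustration), is
  χ2-exp. = Σ(j = 1 … k) (nj − n · πj)² / (n · πj),
where nj are the empirical absolute frequencies of the k intervals and n · πj the theoretical absolute frequencies expected under the assumed normal distribution. The hypothesis H0 is then rejected at the level α when χ2-exp. falls into the critical domain W, i.e. when χ2-exp. ≥ χ2-theor. = χ2α(ν) with ν = k − r − 1.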