Ondrej Ploc
Part 2: The main methods of mathematical statistics. Probability distribution

Outline
2.1 Assignment of a theoretical distribution to an empirical distribution
2.2 Comparison of empirical and theoretical parameters, estimation of theoretical parameters, testing of parametric hypotheses
2.3 Measurement of statistical dependences: some fundamentals of regression and correlation analysis

2.1 Assignment of a theoretical distribution to an empirical distribution

Goals
Probabilistic investigation of the selective statistical set: choice of an acceptable theoretical distribution.
Probabilistic picture of the selective statistical set: testing of non-parametric hypotheses.

Acquired concepts and knowledge
Theoretical distributions (partial survey in alphabetical order): Bernoulli, Beta, Binomial, Chi-square, Discrete Uniform, Erlang, Exponential, F, Gamma, Geometric, Lognormal, Negative binomial, Normal, Poisson, Student's, Triangular, Uniform, Weibull.
Testing of non-parametric hypotheses: test of the null hypothesis H0, acceptance or rejection of the null hypothesis H0, level of statistical significance α (e.g. α = 0.05).

Assigned example
Hypothesis: the empirical distribution can be substituted by the normal distribution.
Assigned example (2): the results of the elaboration of 50 tests.

Assignment of a theoretical distribution to an empirical distribution = testing of a non-parametric hypothesis. A theoretical distribution is preferable because its simple mathematical apparatus makes it possible to obtain information that is inaccessible in any other way.

2.1.1 Interval division of frequencies
It is recommended to construct 5 to 20 equidistant intervals over the range of the values of the statistical sign. Sturges' rule (empirical): k = 1 + 3.3 log n, where n is the extent of the selective statistical set.

2.1.2 Theoretical distribution
A fundamental concept of probability theory: it is the rule that assigns a probability to every value of a random variable. A random variable is a variable whose value is uniquely determined by the result of a random attempt. A random attempt is a realization of activities or processes whose result cannot be anticipated with certainty. Probability = number of favourable random attempt results / number of all random attempt results (e.g. shooting at a target). Random variables can be discrete or continuous.

Random variable
To the values of a random variable it is possible to assign the probabilities with which they occur in the course of the random attempt.

Distribution function F
The (cumulative) distribution function gives the probability that the random variable obtains a value smaller than or equal to a chosen value xi (or x); this cumulative probability is expressed by a summation (or an integral) of the partial probabilities: F(x) = P(X ≤ x). The probability that X lies in the semi-closed interval (a, b], where a < b, is therefore P(a < X ≤ b) = F(b) − F(a).
Properties: F is non-decreasing, 0 ≤ F(x) ≤ 1, F(x) → 0 for x → −∞ and F(x) → 1 for x → +∞.

Parameters of theoretical distributions
The theoretical general, central and standardized moments Oj, Cj and Nj.
Discrete case (Pi denotes the probability of the value xi of the random variable): Oj = Σ xi^j · Pi, Cj = Σ (xi − O1)^j · Pi.
Continuous case (ρ(x) denotes the probability density of the random variable): Oj = ∫ x^j · ρ(x) dx, Cj = ∫ (x − O1)^j · ρ(x) dx.
Standardized moments: Nj = Cj / (√C2)^j.
Often the names and symbols "mean value (expected value) E" and "dispersion (variance) D" are used as well. The expected value E is a location parameter which measures the level of the random variable. The dispersion D is a variability parameter which measures the "spread" of the random variable values.
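As a concrete illustration of Sturges' rule and of the empirical counterparts of the moments O1, C2, N3 and N4, the following Python sketch may help; it assumes NumPy and uses randomly generated data as a stand-in for the 50 test results of the assigned example (all numbers are illustrative, not taken from the course material).

    import numpy as np

    def sturges_bins(n):
        # Sturges' rule: k = 1 + 3.3 * log10(n) equidistant intervals
        return int(round(1 + 3.3 * np.log10(n)))

    # Illustrative data only; stands in for the 50 test results of the assigned example
    rng = np.random.default_rng(1)
    x = rng.normal(loc=70.0, scale=10.0, size=50)
    n = x.size

    k = sturges_bins(n)                          # 1 + 3.3*log10(50) ~ 6.6 -> 7 intervals
    counts, edges = np.histogram(x, bins=k)      # interval division of frequencies

    # Empirical counterparts of the theoretical parameters
    O1 = x.mean()                                # general moment of 1st order = mean E
    C2 = ((x - O1) ** 2).mean()                  # central moment of 2nd order = dispersion D
    N3 = ((x - O1) ** 3).mean() / C2 ** 1.5      # standardized moment of 3rd order (skewness)
    N4 = ((x - O1) ** 4).mean() / C2 ** 2        # standardized moment of 4th order (kurtosis)

    print(k, counts, O1, C2, N3, N4)

The sample moments computed in this way are the empirical parameters whose relation to the theoretical parameters is discussed next.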
The expected value E is equal to the theoretical general moment of the 1st order, O1; the dispersion D is equal to the theoretical central moment of the 2nd order, C2. The theoretical general moment of the 1st order O1 is the location parameter, the theoretical central moment of the 2nd order C2 is the variability parameter, the theoretical standardized moment of the 3rd order N3 is the skewness parameter and the theoretical standardized moment of the 4th order N4 is the kurtosis parameter.

The relation between the empirical and the theoretical parameters is described by the law of large numbers. Subject to certain conditions, it can be expected that the empirical distribution and the related empirical parameters will approximate the theoretical distribution and its associated theoretical parameters, and the more so, the greater the extent of the selective statistical set (i.e. the larger the number of realized random attempts). The approach of the empirical parameters to the theoretical parameters does not have the character of mathematical convergence but of convergence in probability.

Discrete distribution: Binomial distribution
Characteristic of the random phenomenon: n independent random attempts are carried out; the probability of the monitored random phenomenon is the same in all attempts and equal to p. We seek the probability that this phenomenon occurs 0, 1, …, n times. According to this definition, the values x0, x1, …, xn of the relevant random variable are the numbers 0, 1, …, n.
Theoretical distribution (probability function): for the described random phenomenon the probability function assigns to the values xi = i the probabilities Pi = C(n, i) · p^i · (1 − p)^(n − i) for i = 0, 1, …, n.
Distribution function: F(x) = Σ Pi over all i with xi ≤ x.

The significance of the binomial distribution
A typical example of independent random attempts is a random selection of elements from a set in which the selected element is returned, the so-called selection with return. It can be shown that, when the extent of the selective set is small in comparison with the extent of the basic set, the difference between the selection with return and the selection without return is insignificant. The binomial distribution can therefore serve as a suitable criterion of whether the selective statistical set was created on the basis of random selection.

Continuous distribution: Normal distribution
Characteristic of the random phenomenon: a continuous random variable whose values x ∈ (−∞, ∞) can have a normal distribution. The graph of the function which assigns the probabilities to these values of the random variable is the well-known Gauss curve in the shape of a "bell". What is sought is the probability assigned to a unit interval of the continuous random variable values, in the sense that this interval contains the value x.
Theoretical distribution (probability density function): ρ(x) = 1 / (σ·√(2π)) · exp(−(x − μ)² / (2σ²)).
Distribution function: F(x) = ∫ ρ(t) dt over t from −∞ to x.

Standardized normal distribution N(0, 1)
Obtained by the substitution u = (x − μ) / σ; its distribution function F(u) is the Laplace function.

Discrete distribution: Alternative distribution
A special case of the binomial distribution for n = 1. The alternative distribution is the discrete theoretical distribution A(p), with one theoretical parameter p, of a zero-one random variable (the random variable has the values xi = i = 0, 1).
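To make the binomial and normal formulas above tangible, here is a minimal Python sketch assuming scipy.stats is available; the parameter values n = 10, p = 0.3, μ = 10 and σ = 2 are purely illustrative and not taken from the lecture.

    from scipy.stats import binom, norm

    # Binomial Bi(n, p): probability that the phenomenon occurs exactly i times
    n, p = 10, 0.3                                   # illustrative parameters
    P = [binom.pmf(i, n, p) for i in range(n + 1)]   # probability function Pi, i = 0..n
    F7 = binom.cdf(7, n, p)                          # distribution function F(7) = P(X <= 7)

    # Normal N(mu, sigma): density and distribution function
    mu, sigma = 10.0, 2.0                            # illustrative parameters
    rho = norm.pdf(12.0, loc=mu, scale=sigma)        # density rho(12)
    F12 = norm.cdf(12.0, loc=mu, scale=sigma)        # F(12) = P(X <= 12)

    # Standardization u = (x - mu) / sigma gives the same probability under N(0, 1)
    u = (12.0 - mu) / sigma
    assert abs(norm.cdf(u) - F12) < 1e-12

The final check shows numerically why the standardized normal distribution can replace any N(μ, σ) when it is used as a statistical criterion.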
Discrete distribution: Poisson distribution
The Poisson distribution is the discrete theoretical distribution Po(λ) with one theoretical parameter λ; the random variable has the values xi = i = 0, 1, …, ∞.

Discrete distribution: Geometric distribution
The geometric distribution is the discrete theoretical distribution Ge(p) with one theoretical parameter p; the random variable has the values xi = i = 0, 1, …, ∞. The probabilities Pi decrease geometrically with increasing i. Independent attempts are carried out and the probability of the observed phenomenon (i.e. the probability of success) is the same in all attempts and equal to p. The probability that the success occurs only in attempt i + 1 is given by the probability function Pi.

Continuous distribution: Lognormal distribution
The lognormal distribution is the continuous theoretical distribution LN(μ, σ) of a random variable which is an increasing function of a random variable Y, in the form x = e^y, where Y has the normal distribution N(μ, σ). The lognormal distribution has the two theoretical parameters μ and σ.

Apparatus of non-parametric testing
The null hypothesis H0 supposes that the empirical distribution can be substituted by the intended theoretical distribution; the alternative hypothesis HA supposes that this presumption is not correct. The essence of testing non-parametric hypotheses is a comparison between the theoretical and the empirical absolute frequencies.

For the verification of non-parametric and parametric hypotheses a special group of theoretical distributions was developed. These distributions are not intended to replace empirical distributions; they serve as statistical criteria. The normal distribution is the only exception: in its standardized shape it may play the role of a statistical criterion, while in its non-standardized shape it may substitute empirical distributions. The standardized normal distribution (u-test), Student's distribution (t-test), Pearson's χ2 distribution (χ2-test) and the Fisher-Snedecor distribution (F-test) belong among the most frequent statistical criteria.

For the verification of the hypotheses H0 and HA a suitable statistical criterion has to be selected. The χ2-test is used most frequently for the verification of a non-parametric hypothesis. A condition for its application is an interval division of frequencies in which each partial interval is connected with an absolute frequency of at least 5; if this condition is not fulfilled, adjacent partial intervals have to be merged and the interval division of frequencies adjusted accordingly.

After the selection of the statistical criterion (e.g. the χ2-test) the experimental value of this criterion (e.g. χ2-exp) and the critical theoretical value (e.g. χ2-theor) have to be determined. The so-called critical domain W of the relevant statistical criterion is delimited by means of the critical theoretical value. If the experimental value of the selected criterion is an element of the critical domain W, the alternative hypothesis HA has to be accepted, i.e. the empirical distribution cannot be substituted by the intended theoretical distribution. In the contrary case (the experimental value is not an element of the critical domain W) the null hypothesis H0 can be accepted, i.e. the empirical distribution can be substituted by the intended theoretical distribution.
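The decision rule just described can be written down compactly. The following Python sketch assumes scipy.stats; the function name chi2_decision and the numbers in the example call are hypothetical, chosen only to show the mechanics of the critical domain W.

    from scipy.stats import chi2

    def chi2_decision(chi2_exp, k, r, alpha=0.05):
        # chi2_exp: experimental value of the criterion
        # k: number of intervals of the frequency division, r: number of estimated parameters
        nu = k - r - 1                          # degrees of freedom
        chi2_theor = chi2.ppf(1 - alpha, nu)    # critical theoretical value
        in_W = chi2_exp >= chi2_theor           # critical domain W = [chi2_theor, infinity)
        return ("accept HA (reject H0)" if in_W else "accept H0"), chi2_theor

    # Hypothetical numbers, only to illustrate the decision step
    print(chi2_decision(chi2_exp=4.2, k=7, r=2))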
Significance level
The determination of the significance level α is an essential element of testing non-parametric and parametric hypotheses. The significance level gives the probability of an erroneous rejection of the tested hypothesis (i.e. the probability of a Type I error). The most frequent significance levels are α = 0.05 and α = 0.01. For example, with the significance level 0.05 a positive test of normality (i.e. the hypothesis H0 that the empirical distribution can be substituted by the normal distribution is accepted and the hypothesis HA is rejected) allows the conclusion: if the selective statistical set SSS were selected 100 times from the basic statistical set BSS, in 95 cases it would be shown that the empirical distribution can be substituted by the normal distribution.

2.1.5 Illustration of non-parametric testing
Hypothesis: the empirical distribution can be substituted by the normal distribution.
In the course of the testing the χ2-test is applied. In its application the letter k refers to the number of intervals of the frequency interval division and the letter r to the number of theoretical parameters of the normal distribution (i.e. r = 2). The expression ν = k − r − 1 gives the number of degrees of freedom, which, together with the selected level of significance, determines the critical theoretical value χ2-theor (the quantile of the χ2 distribution with ν = k − r − 1 degrees of freedom) from statistical tables. The significance level is selected as α = 0.05.
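A minimal end-to-end sketch of this illustration in Python (assuming NumPy and SciPy) might look as follows; the data are randomly generated stand-ins for the 50 test results, k = 7 follows from Sturges' rule for n = 50, and the simplifications are noted in the comments.

    import numpy as np
    from scipy.stats import norm, chi2

    # Illustrative stand-in for the 50 test results of the lecture example
    rng = np.random.default_rng(0)
    x = rng.normal(loc=70.0, scale=10.0, size=50)

    # H0: the empirical distribution can be substituted by the normal distribution
    mu, sigma = x.mean(), x.std()            # the r = 2 theoretical parameters estimated from the data

    k = 7                                    # number of intervals (Sturges' rule for n = 50)
    edges = np.linspace(x.min(), x.max(), k + 1)
    observed, _ = np.histogram(x, bins=edges)

    # Expected absolute frequencies under the fitted normal distribution.
    # Simplification: in practice the outer intervals are extended to +/- infinity
    # and intervals with an expected frequency below 5 are merged with their neighbours.
    expected = x.size * np.diff(norm.cdf(edges, loc=mu, scale=sigma))

    chi2_exp = np.sum((observed - expected) ** 2 / expected)   # experimental value of the criterion
    nu = k - 2 - 1                                             # nu = k - r - 1 with r = 2
    chi2_theor = chi2.ppf(0.95, nu)                            # critical value at alpha = 0.05

    if chi2_exp < chi2_theor:
        print("H0 accepted: the empirical distribution can be substituted by N(mu, sigma)")
    else:
        print("HA accepted: the normal distribution does not fit")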