Ondrej Ploc

Part 1
The main methods of descriptive statistics, Statistical probability

Outline
1.1 Formulation of statistical investigation
1.2 Creation of Scale
1.3 Measurement, Probability
1.4 Elementary statistical processing

1.1 Formulation of statistical investigation

Goals
Collective random phenomenon and the reason for investigating it
Selective statistical set as a part of the basic statistical set

Acquired concepts and knowledge pieces
collective random phenomenon
statistical unit
statistical sign – statistical character
values of statistical sign
basic statistical set – basic statistical file – population
selective statistical set – sample statistical file

Assigned example (1)
4000 enterprises have undergone tests of “export ability”. For preliminary information it was necessary to determine the average “export ability” on a scale from 1 to 5 (1 – maximum export ability, 5 – minimum export ability). For this reason 50 tests were randomly selected; their results are presented in table Tab.1. Elaborate the collective random phenomenon (export ability of an enterprise) step by step and in full.

Assigned example (2)
The results of the elaboration of the 50 tests

Collective random phenomenon
The collective random phenomenon CRP (e.g. the export ability of an enterprise) is the realization of activities or processes whose result cannot be predicted with certainty and which take place in an extensive set of elements (e.g. enterprises). These elements share a certain group of identical properties (e.g. an identical type of economic parameter – the enterprise character) and differ in another group of properties (e.g. different values of export ability or of another parameter of the global economic state of the enterprise). Mathematical statistics and probability theory deal with the qualitative and quantitative analysis of the patterns of collective random phenomena.

The statistical unit SU is delimited by the identical properties of the elements of the investigated set (e.g. the enterprises and their character).
The statistical sign SS is given by one of the different properties of the elements of the investigated set (e.g. by the export ability of an enterprise).

The values of statistical sign VSS are the way in which the investigated statistical sign is described (e.g. the export ability of mining industry enterprises described by the percentage of mined ore transported for processing within a fortnight of extraction).

The basic statistical set BSS (population) is given by all the statistical units; its extent is equal to the number of all the statistical units (e.g. the extent of the investigated BSS in the assigned example is the total of 4000 enterprises).

It is usually not practically possible for statisticians to investigate the statistical sign SS in all the statistical units SU, so the number of statistical units SU has to be limited.

The random selection RS limits the number of investigated statistical units SU in such a way that the results obtained can be transferred to the entire BSS. Various ways of making a selection exist (drawing lots, generating a table of random numbers, deliberate selection). It is necessary to verify whether the selection obtained can be considered random.

The selective statistical set SSS is given by those statistical units which have been selected from the basic statistical set by the process of random selection. The extent of the SSS is equal to the number of selected statistical units (e.g. in the assigned example the extent of the SSS is 50 selected enterprises). The selective statistical set SSS is one-dimensional if only one statistical sign is investigated, and multidimensional if several statistical signs are investigated.

Assigned example
In the assigned example the formulation of the statistical investigation is implemented by delimiting the selective statistical set of 50 enterprises.
In the context of this delimitation, all the related concepts must be characterized exactly: the investigated collective random phenomenon CRP, the definition of the statistical unit SU, the determination of the investigated statistical sign SS, the characterization of the values of statistical sign VSS, the exact delimitation of the basic statistical set BSS and, finally, the procedure ensuring the random selection RS.

Check questions
What is the subject of investigation of statistics and probability theory?
What is a collective random phenomenon?
How is the statistical unit delimited?
How are the statistical sign and its values delimited?
What is the difference between the basic and the selective statistical set?
Why is the process of random selection important?

1.2 Creation of Scale

Goals
creation of scale
choice of scale type

Acquired concepts and knowledge pieces
scale
classification of scales
parameters of the selected type of scale

Scale creation
Scale creation is the suitable expression of the values of a statistical sign by means of scale elements. The point is that the values of the statistical sign can be divided into reasonable groups, the scale elements. The system of scale elements forms the scale. The number k of scale elements can be calculated, for example, by the Sturges rule k = 1 + 3.3 log10 n, where n is the extent of the selective statistical set SSS.

Classification of scales
According to the nature of the statistical sign:
1. qualitative (nominal)
2. ordinal
3. quantitative metric
4. absolute metric
The classification of scales can also be used to classify statistical signs. In some cases the values of the statistical sign identify the scale immediately and scaling is not necessary.

The nominal scale is a classification into categories (the scale elements are the individual categories).
For any two statistical units of the selective statistical set it is possible to decide whether they are identical or different in terms of the investigated statistical sign (such as gender or employment, if the statistical units are individual persons).

The ordinal scale makes it possible not only to decide on the identity or difference of statistical units but also to establish their order (e.g. the attained degree of school education). The scale elements are the individual ranks. This scale does not make it possible to determine the distance between two neighbouring statistical units arranged according to it.

The quantitative metric scale additionally makes it possible to establish the distance between two neighbouring statistical units; for this purpose a unit of scale has to be defined (e.g. a percentage evaluation of export ability or of another parameter of the global economic condition, or temperature in degrees Celsius). The scale elements are the individual points of the scale, expressed by numerical values. The quantitative metric scale expresses the values of the statistical sign without any factual interpretation of the beginning (zero point) of the scale: the choice of the scale beginning is arbitrary.

The absolute metric scale is a quantitative metric scale whose beginning can, in addition, be interpreted factually: the scale zero corresponds to a real zero value of the investigated statistical sign (e.g. temperature in kelvins, the number of errors in a test, the length of school attendance). The scale elements are the individual points of the scale, expressed by numerical values, together with the absolute zero of the scale. Only the absolute metric scale makes it meaningful to calculate ratios: the ratio of any two points of the scale does not depend on the choice of the scale unit.

Assigned example
In the assigned example the values of the statistical sign “degree of export ability” are given by the degrees 1, 2, …, 5.
Evidently a way of expressing export ability had to be devised (e.g. degree 1 – an enterprise of the mining industry exported 100%–80% of the mined ore, degree 2 – exported 80%–60% of the mined ore, …, degree 5 – exported 20%–0% of the mined ore), so the degrees 1, 2, …, 5 can be identified as the elements of a scale, which is a typical quantitative metric scale. The scale elements are the points of the scale expressed by the numerical values x1 = 1, x2 = 2, …, x5 = 5. This scale should reflect “the identical distance (e.g. 20%)” of export ability between any two neighbouring scale elements.

Check questions
What is the creation of a scale?
According to which facts can the types of scales be distinguished?
What are the basic types of scales?
What is the difference between the quantitative metric scale and the absolute metric scale?

1.3 Measurement

Goals
process of measurement
expression of measurement results

Acquired concepts and knowledge pieces
measurement
absolute frequency
relative frequency
cumulative frequencies

Measurement
Measurement is the process by which one of the k scale elements x1, x2, …, xk is assigned to each statistical unit SU of the selective statistical set SSS (with an extent of n statistical units). The measurement results are the findings that the scale element xi (i = 1, 2, …, k) was measured ni times. The sum of all the values ni (i = 1, 2, …, k), the so-called absolute frequencies, must be equal to the extent n of the selective statistical set SSS.

Measurement
The potential results of measurement xi (i = 1, 2, …, k) can be evaluated by the probability with which they appear in the course of measurement. The statistical definition of probability is based on n independently performed measurements (the number of measurements n corresponds to the extent of the selective statistical set SSS) and on the absolute frequencies ni found for the potential measurement results. The statistical probability p(xi) of the result xi is then given by the so-called relative frequency ni / n.
The sum of all the relative frequencies must be equal to 1.

Measurement
The cumulative frequencies can also be counted among the results of measurement. The cumulative frequency Σ (ni / n), summed over all scale elements up to and including xi, is the probability that the measured result will be less than or equal to xi. Evidently the cumulative frequencies can be determined only within quantitative metric or absolute metric scales. The cumulative frequencies are of great significance, for example, in the construction of financial or economic balance sheets.

Assigned example
In the assigned example, table Tab.1 shows that a scale of 5 elements x1 = 1, x2 = 2, …, x5 = 5 was used (see the first column of the table); their absolute frequencies were n1 = 9, n2 = 15, n3 = 20, n4 = 4, n5 = 2 respectively (see the second column of the table). The relative frequencies ni / n are presented in the third column of the table, the cumulative frequencies in the fourth column. Of the fifty enterprises of the selective statistical set (n = 50), 9 enterprises had the maximum export ability (the probability of this degree is 0.18), 15 enterprises had the degree just below the highest (probability 0.30), 20 enterprises had the middle export ability (probability 0.40), 4 enterprises had the degree just below the middle (probability 0.08) and 2 enterprises had the lowest degree of export ability (probability 0.04).

Assigned example
In the assigned example the cumulative frequency of, e.g., the result x3 = 3 is 0.88. This probability that degree 1, 2 or 3 will be found when investigating the degree of export ability is obtained by summing the probabilities p(1) + p(2) + p(3) = 0.18 + 0.30 + 0.40 = 0.88. So the probability of finding the middle degree or a better one is high.
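The first four columns of the frequency table can be rebuilt from the absolute frequencies quoted in the assigned example. The following is a minimal sketch in Python; the function name `frequency_table` and the variable names are illustrative assumptions, not part of the original material.

```python
# Scale elements x_i and absolute frequencies n_i quoted in the assigned example
scale = [1, 2, 3, 4, 5]
abs_freq = [9, 15, 20, 4, 2]


def frequency_table(xs, ns):
    """Return rows (x_i, n_i, n_i / n, cumulative n_i / n).

    The cumulative column is computed from a running count so that
    no floating-point error accumulates from adding fractions.
    """
    n = sum(ns)  # extent of the selective statistical set
    rows = []
    running_count = 0
    for x, count in zip(xs, ns):
        running_count += count
        rows.append((x, count, count / n, running_count / n))
    return rows


table = frequency_table(scale, abs_freq)

# Print the four columns: x_i, n_i, relative and cumulative frequency
for x, count, rel, cum in table:
    print(f"{x:>2} {count:>3} {rel:.2f} {cum:.2f}")

# Checking sums: absolute frequencies add up to n, relative ones to 1
total_n = sum(row[1] for row in table)
total_rel = sum(row[2] for row in table)
```

The checking sums at the end mirror the two conditions stated in the text: the absolute frequencies sum to n = 50 and the relative frequencies sum to 1.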
Notes (1)
In the case of a quantitative metric or absolute metric scale, measurement can be regarded as a mapping of the set of statistical units (e.g. of the selective statistical set) into the set of real numbers. The measurement methods depend on the field of expertise within which the investigated selective statistical set SSS was defined. They will differ, for example, between the investigation of a collective random phenomenon in sociology (various questionnaire forms of measurement) and in economics (various ways of measuring export ability before and after the economic optimization of an enterprise).

Notes (2)
The measurement method must satisfy the conditions of validity (whether what is measured is what is supposed to be measured), reliability (reproducibility of measurements) and objectivity (whether different evaluators measure the statistical unit in the same way). The measurement results for the investigated selective statistical set SSS are given by the information on the values of the statistical sign, i.e. by the absolute and relative frequencies of the individual scale elements and by the cumulative frequencies.

Check questions
What is measurement within the statistical elaboration of a collective random phenomenon?
What does the selection of a measurement method depend on?
What conditions must the measurement method fulfil?
What are the results of measurement?
What is the statistical definition of probability?
How are the absolute and relative frequencies defined?
How are the cumulative frequencies defined?

1.4 Elementary statistical processing

Goals
Goals of investigation of descriptive statistics
Empirical picture of the selective statistical set

Acquired concepts and knowledge pieces
Frequency tables
Empirical distribution
Graphical expression
Plotting function – graphical expression of empirical distribution
Frequency polygon
Empirical parameters
General moments, e.g. mean (arithmetic mean)
Central moments, e.g. variance, standard deviation (determinative deviation)
Standardized moments, e.g. obliqueness (skewness), pointedness (kurtosis)

Statistical processing
The measurement results have to be arranged, expressed graphically and expressed by suitable empirical parameters. These tasks are fulfilled by elementary statistical processing. Its result is the empirical picture of the investigated selective statistical set SSS. Elementary statistical processing also completes the group of main statistical methods that can be called descriptive statistics. The partial tasks “arrangement”, “graphical expression” and “expression by parameters” correspond to the three basic results of elementary statistical processing: “table”, “empirical distributions (preferably in the shape of a polygon)” and “empirical parameters”.

1.4.1. Table (1)
The table is a form of arrangement of the measurement results. Table Tab.1 of the assigned example serves as the illustration. The table contains eight columns. The first four columns are needed partly for displaying the measurement results (task “arrangement”) and partly for representing the empirical distributions (task “graphical expression”). The remaining four columns are auxiliary and serve for an easy and quick calculation of the empirical parameters (task “expression by parameters”).

1.4.1. Table (2)
The first four columns contain:
1. column marked xi – scale elements
2. column marked ni – absolute frequencies of scale elements
3. column marked ni / n – relative frequencies of scale elements
4. column marked Σ (ni / n) – cumulative frequencies

1.4.1. Table (3)
The following four columns contain the products needed for the calculation of the empirical parameters:
5. column contains the products xi·ni
6. column contains the products xi²·ni
7. column contains the products xi³·ni
8. column contains the products xi⁴·ni

1.4.1. Table (4)
The table is closed by the sums of the data in the individual columns. In the first four columns these sums serve as a check; in the other four columns they are needed for the calculation of the empirical parameters.

1.4.2. Empirical distributions of frequencies
The empirical distributions of frequencies can be divided into two basic types. The first type assigns the corresponding absolute frequencies ni or relative frequencies ni / n to the scale elements xi. The second type assigns the corresponding cumulative frequencies Σ (ni / n) to the scale elements xi.

1.4.2. Empirical distributions of frequencies
The graphical expression of the empirical distribution of a one-dimensional statistical set uses a coordinate system in the plane. The scale elements xi are always plotted on the horizontal axis and the corresponding frequencies on the vertical axis. The graph of this functional dependence is the set of points whose first coordinate is the scale element xi and whose second coordinate is the corresponding frequency. Connecting neighbouring points of this set by line segments gives a broken line called a “polygon”. One can distinguish the “polygon of absolute frequencies”, the “polygon of relative frequencies” and the “polygon of cumulative frequencies”.

1.4.2. Empirical distributions of frequencies
In addition to the graphical expression of empirical distributions by a polygon, a number of auxiliary graphical representations are used. Their “advantage” is a departure from the mathematically exact apparatus and a certain quick orientation.
Their shortcoming is that they cannot be followed up with the deeper apparatus of mathematical statistics, above all when investigating dependencies in multidimensional statistical sets. Bar charts (bar graphs), pie charts, etc. belong to these auxiliary graphical representations. In general, it is advisable to rely on the exact graphical expression alone.

1.4.2. Empirical distributions of frequencies
The significance of the graphical expression of the empirical distribution is substantial. It makes it possible to see immediately which theoretical distribution (in terms of probability theory) is close to the empirical distribution obtained as a result of descriptive statistics. It also allows an immediate assessment of the parameters of location, variability, skewness and kurtosis of the empirical distribution and thereby of the investigated statistical set.

1.4.2. Empirical distributions of frequencies
Within the assigned example it is possible to practise, e.g., the construction of the polygons of absolute and cumulative frequencies. Figure Fig.2 shows the polygon of absolute frequencies, figure Fig.3 the polygon of cumulative frequencies.

1.4.3. Empirical parameters
The empirical parameters express the nature of the investigated statistical set briefly and simply. They mostly relate to a selective statistical set, which is why they are often called “selective parameters”. As selective parameters they themselves have a statistical-probabilistic character and for this reason behave as a special group of “statistical signs”. This view will not be developed in the following explanation, but attention must be drawn to it, especially with a view to a deeper study of statistics and probability theory.

Classification of emp. parameters
1. Classification according to the feature of the investigated statistical set (investigated statistical sign):
   parameters of location
   parameters of variability
   parameters of obliqueness (skewness)
   parameters of pointedness (kurtosis)
2. Classification of empirical parameters according to the way of their calculation:
   moment parameters (they are functions of all values of the statistical sign)
   quantile parameters (they represent only certain values of the statistical sign)

Quantile parameters
The quantile parameters are closely related to the moment parameters but are constructed in a different way. An empirical quantile is always a certain value of the statistical sign (expressed on a quantitative metric or absolute metric scale). That value divides the smaller and the greater values of the statistical sign in a certain ratio. For example, the quantile dividing the values of the statistical sign into two equal parts (i.e. the fifty-percent quantile) is called the “median”. The quantile parameters will not be investigated in more detail.

Moment parameters
The moment parameters are divided into general moments, central moments and standardized moments. Location (arithmetic mean) is characterized by the general moment of 1. order, variability (empirical variance) by the central moment of 2. order, and obliqueness (skewness) and pointedness (kurtosis) by the standardized moments of 3. and 4. order.
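As a brief aside on the quantile parameters, the median of the assigned example can be read off the cumulative frequencies. The sketch below assumes one common convention, namely the smallest scale element whose cumulative count reaches the fraction p of n; other quantile conventions exist, and the name `empirical_quantile` is a hypothetical helper introduced only for illustration.

```python
# Scale elements (in ascending order) and absolute frequencies
# quoted in the assigned example
scale = [1, 2, 3, 4, 5]
abs_freq = [9, 15, 20, 4, 2]


def empirical_quantile(xs, ns, p):
    """Smallest scale element whose cumulative count reaches p * n.

    Assumes xs is sorted in ascending order and 0 < p <= 1.
    This is one convention among several for empirical quantiles.
    """
    n = sum(ns)
    running_count = 0
    for x, count in zip(xs, ns):
        running_count += count
        if running_count >= p * n:
            return x
    return xs[-1]


# Fifty-percent quantile (median) of the export ability degrees
median = empirical_quantile(scale, abs_freq, 0.5)
print(median)
```

With the cumulative counts 9, 24, 44, 48, 50, the 25th and 26th ordered values both fall on degree 3, so the median under this convention is the middle degree.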
Moment parameters
Since the standardized moments can be calculated from the central moments and the central moments from the general moments, the following procedure is used in the explanation below (the investigated statistical sign is denoted by the letter x; the notation for the values of the statistical sign xi, the absolute frequencies ni and the extent n of the selective statistical set does not change):
Presentation of the general relations for general and central moments
Expression of the needed central moments using general moments
Expression of the needed standardized moments using central moments

Relations for general and central moments
General moment of r-th order:
$O_r(x) = \frac{1}{n}\sum_{i=1}^{k} n_i\, x_i^{r}$
General moment of 1. order (arithmetic mean):
$O_1(x) = \frac{1}{n}\sum_{i=1}^{k} n_i\, x_i$
Central moment of r-th order:
$C_r(x) = \frac{1}{n}\sum_{i=1}^{k} n_i\, \bigl(x_i - O_1(x)\bigr)^{r}$
Central moment of 2. order (empirical variance):
$C_2(x) = \frac{1}{n}\sum_{i=1}^{k} n_i\, \bigl(x_i - O_1(x)\bigr)^{2}$
Determinative (standard) deviation:
$S_x = \sqrt{C_2(x)}$

The expression of needed central moments using general moments
$C_2(x) = O_2(x) - O_1(x)^2$
$C_3(x) = O_3(x) - 3\,O_2(x)\,O_1(x) + 2\,O_1(x)^3$
$C_4(x) = O_4(x) - 4\,O_3(x)\,O_1(x) + 6\,O_2(x)\,O_1(x)^2 - 3\,O_1(x)^4$

The expression of needed standardized moments using central moments
$N_3(x) = \frac{C_3(x)}{C_2(x)^{3/2}}$
$N_4(x) = \frac{C_4(x)}{C_2(x)^{2}}$

Since all the needed moment parameters can be determined by this procedure, it is now possible to describe the parameters of location, variability, obliqueness (skewness) and pointedness (kurtosis).

Location parameter
The location parameter is determined by the general moment of 1. order O1(x) and bears the name “arithmetic mean”. It gives the position of the empirical distribution of frequencies on the horizontal axis of the coordinate system.

Variability parameter
The variability parameter is determined by the central moment of 2. order C2(x) and bears the name “empirical variance” (the square root of the variance bears the name “standard deviation”). The determinative (standard) deviation shows how much information value the arithmetic mean has: if the determinative (standard) deviation is large, the information value of the arithmetic mean is small, and vice versa.

Obliqueness parameter
The obliqueness parameter (skewness) is determined mainly by the standardized moment of 3. order N3(x), which then bears the name “coefficient of skewness”. If the skewness coefficient is positive, the scale elements lying to the left of the arithmetic mean have greater frequencies (positively skewed distribution of frequencies: greater concentration of the lower scale elements, i.e. of the smaller values of the statistical sign), and vice versa.

Pointedness parameter
The pointedness parameter (kurtosis) is determined mainly by the standardized moment of 4. order N4(x), which then bears the name “coefficient of kurtosis”. A greater value of the kurtosis coefficient corresponds to a more pointed distribution of frequencies for a given variance. The quantity “excess”, defined by the relation Ex = N4(x) - 3, is used as well. The excess compares the kurtosis of the empirical distribution with the kurtosis of the normal distribution, for which N4 = 3. If the excess is positive, the empirical distribution is more pointed than the normal distribution.

Assigned example
In the assigned example the empirical parameters of location, variability, skewness and kurtosis will now be calculated. First, the general moments of 1. to 4. order are calculated using columns 5 to 8 of table Tab.1:
O1(x) = 2.50
O2(x) = 7.26
O3(x) = 23.50
O4(x) = 82.86
The next part of the procedure is the calculation of the central moments of 2. to 4. order:
C2(x) = 1.031 (standard deviation Sx = 1.015; this value corresponds to the n - 1 divisor, i.e. (n / (n - 1))·(O2(x) - O1(x)²) = (50/49)·1.01)
C3(x) = 0.300
C4(x) = 2.922

Assigned example
The final part of the calculation of the empirical parameters is the determination of the standardized moments of 3. and 4. order and of the excess:
N3(x) = C3(x) / C2(x)^(3/2) ≈ 0.29
N4(x) = C4(x) / C2(x)² = 2.75
Ex = N4(x) - 3 = -0.25

Assigned example
The location parameter (arithmetic mean) O1(x) indicates the position of the empirical distribution of frequencies on the horizontal axis: the arithmetic mean of export ability is 2.5 (a value lower than the middle degree of export ability).
The determinative (standard) deviation, the square root of C2(x), indicates the information value of the arithmetic mean. This can be quantified as follows: roughly 70% of the enterprises lie in the range from export ability degree 1.5 to export ability degree 3.5 (the applicability of this statement depends on whether the empirical distribution can be replaced by the theoretical normal distribution).

Assigned example
The positive skewness coefficient N3(x) indicates a greater concentration of the lower scale elements, i.e. of the lower degrees of export ability. Figure Fig.2 confirms this finding: a slight asymmetry, with the frequencies concentrated to the left of the arithmetic mean. The relatively high value of the kurtosis coefficient and the value of the excess (-0.25) indicate a kurtosis comparable with that of the normal distribution. This finding additionally supports the conclusion that the arithmetic mean has good information value.
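The moment calculation of the assigned example can be retraced numerically. The sketch below is hedged: the helper `general_moment` and all variable names are illustrative, and the factor n / (n - 1) applied to the variance is an assumption inferred from the quoted value C2(x) = 1.031 (the plain difference O2 - O1² gives 1.01).

```python
# Scale elements and absolute frequencies quoted in the assigned example
scale = [1, 2, 3, 4, 5]
abs_freq = [9, 15, 20, 4, 2]
n = sum(abs_freq)  # extent of the selective statistical set


def general_moment(r):
    """General moment of r-th order: (1/n) * sum of n_i * x_i**r."""
    return sum(count * x**r for x, count in zip(scale, abs_freq)) / n


o1, o2, o3, o4 = (general_moment(r) for r in (1, 2, 3, 4))

# Central moments expressed through general moments
c2_divisor_n = o2 - o1**2            # 1.01 with divisor n
c2 = c2_divisor_n * n / (n - 1)      # assumption: n - 1 divisor, gives 1.031
c3 = o3 - 3 * o2 * o1 + 2 * o1**3
c4 = o4 - 4 * o3 * o1 + 6 * o2 * o1**2 - 3 * o1**4

# Standard deviation, standardized moments and excess
s_x = c2**0.5
n3 = c3 / c2**1.5   # coefficient of skewness, about 0.287
n4 = c4 / c2**2     # coefficient of kurtosis, about 2.75
excess = n4 - 3     # about -0.25

print(f"O1={o1:.2f} O2={o2:.2f} O3={o3:.2f} O4={o4:.2f}")
print(f"C2={c2:.3f} Sx={s_x:.3f} C3={c3:.3f} C4={c4:.4f}")
print(f"N3={n3:.3f} N4={n4:.2f} Ex={excess:.2f}")
```

Under these assumptions the general moments, C3, C4, N4 and the excess agree with the values quoted in the example, and the skewness coefficient evaluates to roughly 0.287.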
Check questions (1)
What are the main goals of elementary statistical processing?
How can the measurement results be arranged in a suitable way?
How can the measurement results be expressed graphically in a suitable way?
How can the parameters of the measurement results be expressed in a suitable way?
What is the empirical distribution of frequencies?
How can the empirical distribution of a one-dimensional statistical set be expressed graphically?

Check questions (2)
What is the frequency polygon?
What is the significance of the graphical expression of the empirical distribution?
How can the empirical parameters be divided according to the described feature of the investigated statistical set?
How can the empirical parameters be divided according to the way of calculation?
How are the general, central and standardized moments defined?
What is the most important parameter of location, variability, skewness and kurtosis, and what is the statistical interpretation of these parameters?
How is the quantity “excess” defined and what is its significance?