Ondrej Ploc
Part 1
The main methods of descriptive
statistics, Statistical probability
Outline
 1.1 Formulation of statistical investigation
 1.2 Creation of Scale
 1.3 Measurement, Probability
 1.4 Elementary statistical processing
1.1
Formulation of statistical
investigation
Goals
 Collective random phenomenon and reason of its
investigation
 Selective statistical set as a part of basic statistical set
Acquired concepts
and knowledge pieces
 collective random phenomenon
 statistical unit
 statistical sign – statistical character
 values of statistical sign
 basic statistical set – basic statistical file – population
 selective statistical set – sample statistical file
Assigned example (1)
 A total of 4000 enterprises underwent tests of “export
ability”. For preliminary information it was necessary to
determine the average “export ability” on a scale of 1 to 5
(1 – maximum export ability, 5 – minimum export ability).
For this purpose 50 tests were randomly selected; their
results are presented in table Tab.1.
Elaborate the collective random phenomenon (export
ability of an enterprise) step by step and comprehensively.
Assigned example (2)
 Processed results of the 50 selected tests (table Tab.1)
Collective random phenomenon CRP
 e.g. export ability of enterprise
 is the realization of activities or processes whose result
cannot be predicted with certainty and which take place in
an extensive set of elements (e.g. enterprises). These
elements share one group of identical properties (e.g. the
same type of economic parameter – the character of the
enterprise) and differ in another group of properties (e.g.
different values of export ability or of the overall economic
state of the enterprise). Mathematical statistics and
probability theory deal with the qualitative and quantitative
analysis of the patterns of collective random phenomena.
The statistical unit SU
 is delimited by the identical properties of the elements
of the investigated set (e.g. the enterprises and their
character).
The statistical sign SS
 is given by one of the differing properties of the
elements of the investigated set (e.g. by the export ability
of an enterprise).
The values of statistical sign VSS
 are the way in which the investigated statistical sign is
described (e.g. the export ability of mining-industry
enterprises described by the percentage of mined ore
transported for processing within a fortnight of extraction).
The basic statistical set BSS
 Population
 is given by all the statistical units; its extent equals the
number of all the statistical units (e.g. the extent of the
investigated BSS in the assigned example equals the total
of 4000 enterprises). It is usually not practically possible
for statisticians to investigate the statistical sign SS in
all the statistical units SU, so the number of investigated
statistical units SU has to be limited.
The random selection RS
 limits the number of investigated statistical units SU
in such a way that the results obtained can be transferred
to the entire BSS. Various methods of selection exist
(drawing lots, generating a table of random numbers,
deliberate selection). It is necessary to verify whether
the selection obtained can be considered random.
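A minimal sketch of selection by drawing, assuming the 4000 enterprises of the assigned example are simply numbered 1 to 4000. The identifiers, the seed and the variable names are illustrative assumptions, not part of the original example.

```python
import random

POPULATION_SIZE = 4000   # extent of the basic statistical set BSS (from the example)
SAMPLE_SIZE = 50         # extent of the selective statistical set SSS (from the example)

# Assumption: statistical units are identified by the numbers 1..4000.
unit_ids = range(1, POPULATION_SIZE + 1)

# A fixed seed is used only so that the sketch is reproducible.
rng = random.Random(2024)

# Drawing without replacement: every unit has the same chance of selection.
selected = sorted(rng.sample(unit_ids, SAMPLE_SIZE))

print(len(selected), "units selected, e.g.", selected[:5])
```

Drawing without replacement guarantees that no enterprise appears twice in the selective statistical set.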
The selected statistical set SSS
 is given by those statistical units that have been
selected from the basic statistical set by the process of
random selection. The extent of the SSS equals the
number of selected statistical units (e.g. the extent of the
SSS in the assigned example equals the 50 selected
enterprises). A selected statistical set SSS is
one-dimensional if only one statistical sign is
investigated, and multidimensional if several statistical
signs are investigated.
Assigned example
In the assigned example the statistical investigation is
formulated by delimiting the selective statistical set of
50 enterprises. Within this delimitation all the related
concepts must be characterized exactly: the investigated
collective random phenomenon CRP, the definition of the
statistical unit SU, the determination of the investigated
statistical sign SS, the characterization of the statistical
sign values VSS, the exact delimitation of the basic
statistical set BSS and, finally, the procedure of random
selection RS.
Check questions
 What is the subject of investigation of statistics and
probability theory?
 What is the collective random phenomenon?
 How is the statistical unit delimited?
 How are the statistical sign and its values delimited?
 What is the difference between the basic and the selective
statistical set?
 Why is the process of random selection important?
1.2
Creation of Scale
Goals
 creation of scale
 choice of scale type
Acquired concepts
and knowledge pieces
 Scale
 classification of scales
 parameters of selective type of scale
Scale creation
 Scale creation is the suitable expression of statistical
sign values by means of scale elements. The point is that
the statistical sign values can be divided into reasonable
groups, the scale elements. The system of scale elements
forms the scale. The number k of scale elements can be
estimated, for example, by Sturges' rule
k = 1 + 3.3 log10 n, where n is the extent of the
selective statistical set SSS.
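As an illustration, Sturges' rule can be evaluated for the extent n = 50 of the assigned example. This is only a sketch; the function name `sturges_k` and the choice to round up are assumptions made here.

```python
import math


def sturges_k(n: int) -> float:
    """Unrounded number of scale elements, k = 1 + 3.3 * log10(n)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 1 + 3.3 * math.log10(n)


n = 50                      # extent of the selective statistical set
k = sturges_k(n)
print(f"k = {k:.2f}")       # unrounded value of the rule
print("rounded up:", math.ceil(k))
```

The rule is only a guide: for n = 50 it suggests about 7 scale elements, while the assigned example works with the 5 degrees that the tests already provide.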
Classification of scales
 According to the nature of the statistical sign, scales are
1. qualitative (nominal)
2. ordinal
3. quantitative metric
4. absolute metric.

The classification of scales can also be used to
classify statistical signs. In some cases the statistical
sign values identify the scale directly and scaling
is not necessary.
The nominal scale
 is a classification into categories (the scale elements
are the individual categories). For any two statistical
units of the selective statistical set it is possible to
decide whether they are identical or different in terms of
the investigated statistical sign (e.g. gender or
employment, if the statistical units are individual
persons).
The ordinal scale
 makes it possible not only to decide on the identity or
difference of statistical units but also to establish
their order (e.g. the level of education attained). The
scale elements are the individual ranks. This scale does
not make it possible to determine the distance between
two neighbouring statistical units arranged according
to it.
The quantitative metric scale
 also makes it possible to establish the distance between
two neighbouring statistical units; for this purpose a
unit of scale must be defined (e.g. percentage evaluation
of export ability or of another parameter of the overall
economic condition, temperature in degrees Celsius). The
scale elements are the individual points of the scale
expressed by numerical values. The quantitative metric
scale expresses the values of the statistical sign without
allowing a factual interpretation of the origin (zero
point) of the scale; the choice of the origin is arbitrary.
The absolute metric scale
 is a quantitative metric scale whose origin can, in
addition, be interpreted factually: the zero of the scale
corresponds to a real zero value of the investigated
statistical sign (e.g. temperature in kelvins, the number
of errors in a test, the length of school attendance). The
scale elements are the individual points of the scale
expressed by numerical values, together with the absolute
zero of the scale. Only the absolute metric scale makes it
possible to calculate ratios; the ratio of any two points
of the scale does not depend on the choice of the scale
unit.
Assigned example
In the assigned example the statistical sign values “degree
of export ability” are given by the degrees 1, 2, …, 5.
Evidently a way of expressing export ability had to be
devised (e.g. degree 1 – the mining enterprise exported
100%–80% of mined ore, degree 2 – exported 80%–60% of mined
ore, …, degree 5 – exported 20%–0% of mined ore), so the
degrees 1, 2, …, 5 can be identified with the scale, which
is a typical quantitative metric scale. The scale elements
are the points of the scale expressed by the numerical
values x1 = 1, x2 = 2, …, x5 = 5. This scale should reflect
“the identical distance (e.g. 20%)” of export ability
between any two neighbouring scale elements.
Check questions
 What is the creation of a scale?
 According to which facts can the types of scales be
distinguished?
 What are the basic types of scales?
 What is the difference between the quantitative metric
scale and the absolute metric scale?
1.3
Measurement
Goals
 process of measurement
 expression of measurement results
Acquired concepts
and knowledge pieces
 measurement
 absolute frequency
 relative frequency
 cumulative frequencies
Measurement
Measurement is the process by which one of the k scale
elements x1, x2, …, xk is assigned to each statistical
unit SU of the selective statistical set SSS (with an extent
of n statistical units). The result of measurement is the
finding that the scale element xi (i = 1, 2, …, k) was
measured ni times. The sum of all the values ni (i = 1, 2,
…, k), the so-called absolute frequencies, must equal the
extent n of the selective statistical set SSS.
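A short sketch of how absolute frequencies arise from individual measurements. The ten measured degrees below are hypothetical and serve only as an illustration; they are not the data of table Tab.1.

```python
from collections import Counter

scale = [1, 2, 3, 4, 5]  # scale elements x1..x5

# Hypothetical measured degrees for ten statistical units (not the data of Tab.1).
measured = [3, 1, 2, 3, 2, 5, 3, 4, 1, 2]

counts = Counter(measured)
# Absolute frequency ni of every scale element, including elements never measured.
abs_freq = {xi: counts.get(xi, 0) for xi in scale}

n = len(measured)
for xi, ni in abs_freq.items():
    print(f"x = {xi}: n_i = {ni}")

# Check: the absolute frequencies must add up to the extent n.
print("sum of n_i equals n:", sum(abs_freq.values()) == n)
```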
Measurement
The potential results of measurement xi (i = 1, 2, …, k)
can be evaluated by the probability with which they occur
in the course of measurement. The statistical definition
of probability is based on n independently performed
measurements (the number of measurements n corresponds
to the extent of the selective statistical set SSS) and on
the absolute frequencies ni found for the potential
measurement results. The statistical probability p(xi) of
the result xi is then given by the so-called relative
frequency ni / n. The sum of all the relative frequencies
must equal 1.
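The statistical probabilities can be sketched directly from absolute frequencies. The frequencies used here are those quoted for the assigned example (9, 15, 20, 4, 2); the variable names are assumptions of this sketch.

```python
# Absolute frequencies of the scale elements 1..5 from the assigned example.
abs_freq = {1: 9, 2: 15, 3: 20, 4: 4, 5: 2}

n = sum(abs_freq.values())  # extent of the selective statistical set

# Statistical probability p(xi) = relative frequency ni / n.
rel_freq = {xi: ni / n for xi, ni in abs_freq.items()}

for xi, p in rel_freq.items():
    print(f"p({xi}) = {p:.2f}")

# Check: the relative frequencies must add up to 1 (within rounding error).
print("sum:", round(sum(rel_freq.values()), 10))
```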
Measurement
Cumulative frequencies can also be counted among the
results of measurement. The cumulative frequency
Σ (ni / n) is the probability that the measured result
will be less than or equal to xi. Evidently, cumulative
frequencies can be determined only on quantitative metric
or absolute metric scales. Cumulative frequencies are of
great significance, for example, in the construction of
financial or economic balance sheets.
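A sketch of the cumulative frequencies as running sums, again using the absolute frequencies quoted for the assigned example. Accumulating the integer counts first and dividing afterwards is a design choice of this sketch that avoids accumulating rounding errors.

```python
from itertools import accumulate

scale = [1, 2, 3, 4, 5]
abs_freq = [9, 15, 20, 4, 2]     # absolute frequencies from the assigned example
n = sum(abs_freq)

# Running sums of the counts, then division by n: probability of a result <= xi.
cum_freq = [c / n for c in accumulate(abs_freq)]

for xi, c in zip(scale, cum_freq):
    print(f"P(result <= {xi}) = {c:.2f}")
```

The last cumulative frequency is always 1, which serves as a quick check of the column.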
Assigned example
In the assigned example, table Tab.1 shows that the scale consisted
of 5 elements x1=1, x2=2, …, x5=5 (first column of the table) with
the absolute frequencies n1=9, n2=15, n3=20, n4=4, n5=2 (second
column). The relative frequencies ni / n are presented in the third
column of the table and the cumulative frequencies in the fourth
column. Of the fifty enterprises of the selective statistical set
(n=50), 9 enterprises had the maximum export ability (probability of
this degree 0.18), 15 enterprises had degree 2, just below the
maximum (probability 0.30), 20 enterprises had the middle export
ability (probability 0.40), 4 enterprises had degree 4, below the
middle degree (probability 0.08) and 2 enterprises had the lowest
degree of export ability (probability 0.04).
Assigned example
In the assigned example the cumulative frequency of, e.g.,
the result x3=3 is the probability 0.88. This probability
that degree 1, 2 or 3 will be found when investigating the
degree of export ability is obtained as the sum
p(1) + p(2) + p(3) = 0.18 + 0.30 + 0.40 = 0.88. The
probability of finding at least the middle degree of
export ability is therefore high.
Notes (1)
 On a quantitative metric or absolute metric scale,
measurement can be regarded as a mapping of the set of
statistical units (e.g. the selective statistical set) into
the set of real numbers.
 The measurement methods depend on the field of expertise
in which the investigated selective statistical set SSS was
defined. They differ, e.g., between the investigation of a
collective random phenomenon in sociology (various
questionnaire forms of measurement) and in economics
(various ways of measuring export ability before and after
the economic optimization of an enterprise).
Notes (2)
 The measurement method must satisfy the conditions of
validity (whether what is measured is what was meant to be
measured), reliability (reproducibility of measurements)
and objectivity (whether different evaluators measure the
statistical unit in the same way).
 The measurement results for the investigated selective
statistical set SSS consist of the information on the
statistical sign values, i.e. the absolute frequencies and
relative frequencies of the individual scale elements and
the cumulative frequencies.
Check questions
 What is measurement within the statistical elaboration
of a collective random phenomenon?
 What does the selection of the measurement method
depend on?
 What conditions must the measurement method fulfil?
 What are the results of measurement?
 What is the statistical definition of probability?
 How are the absolute and relative frequencies defined?
 How are the cumulative frequencies defined?
1.4
Elementary statistical processing
Goals
 Goals of investigation of descriptive statistics
 Empirical picture of selective statistical set
Acquired concepts
and knowledge pieces
 Frequency tables
 Empirical distribution
 Graphical expression
 Plotting function – graphical expression of empirical
distribution
 Frequency polygon
 Empirical parameters
 General moments, e.g. average (arithmetic mean)
 Central moments, e.g. variance, standard deviation
(determinative deviation)
 Standardized moments, e.g. obliqueness (skewness),
pointedness (kurtosis)
Statistical processing
 The measurement results need to be arranged, expressed
graphically and expressed by suitable empirical parameters.
These tasks are fulfilled by elementary statistical
processing, whose result is the empirical picture of the
investigated selective statistical set SSS. Elementary
statistical processing also completes the group of major
statistical methods known as descriptive statistics.
 The partial tasks “arrangement”, “graphical
expression” and “expression by parameters” correspond to
the three basic results of elementary statistical
processing: the “table”, the “empirical distributions
(preferably in the form of a polygon)” and the “empirical
parameters”.
1.4.1. Table (1)
 The table is a form of arrangement of the measurement
results. The description below refers to table Tab.1 of the
assigned example.
 The table contains eight columns. The first four columns
are needed partly to display the measurement results (task
“arrangement”) and partly to represent the empirical
distributions (task “graphical expression”). The remaining
four columns are auxiliary and serve for the easy and quick
calculation of the empirical parameters (task “expression
by parameters”).
1.4.1. Table (2)
 The first four columns contain:
1. column marked xi – scale elements
2. column marked ni – absolute frequencies of scale
elements
3. column marked ni / n – relative frequencies of
scale elements
4. column marked Σ (ni / n) – cumulative
frequencies
1.4.1. Table (3)
 The following four columns contain the products
needed for the calculation of empirical parameters:
5. column contains the products xi·ni
6. column contains the products xi²·ni
7. column contains the products xi³·ni
8. column contains the products xi⁴·ni
1.4.1. Table (4)
 The table is closed by the sums of the data in the
individual columns. In the first four columns these
sums serve as a check; in the other four columns they
are needed for the calculation of empirical parameters.
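The eight-column layout described above can be sketched from the absolute frequencies quoted for the assigned example (9, 15, 20, 4, 2). The column labels and the printing layout are assumptions of this sketch, not a reproduction of Tab.1.

```python
from itertools import accumulate

x = [1, 2, 3, 4, 5]            # column 1: scale elements xi
n_abs = [9, 15, 20, 4, 2]      # column 2: absolute frequencies ni
n = sum(n_abs)

rel = [ni / n for ni in n_abs]               # column 3: ni / n
cum = [c / n for c in accumulate(n_abs)]     # column 4: cumulative frequencies

# Columns 5-8: products xi^r * ni for r = 1..4.
rows = [
    (xi, ni, ri, ci, xi * ni, xi**2 * ni, xi**3 * ni, xi**4 * ni)
    for xi, ni, ri, ci in zip(x, n_abs, rel, cum)
]

header = ("xi", "ni", "ni/n", "cum", "xi*ni", "xi^2*ni", "xi^3*ni", "xi^4*ni")
print("".join(f"{h:>9}" for h in header))
for r in rows:
    print(f"{r[0]:>9}{r[1]:>9}{r[2]:>9.2f}{r[3]:>9.2f}"
          f"{r[4]:>9}{r[5]:>9}{r[6]:>9}{r[7]:>9}")

# Closing row: sums of columns 5-8, needed for the empirical parameters.
col_sums = [sum(r[j] for r in rows) for j in (4, 5, 6, 7)]
print("sums of columns 5-8:", col_sums)
```

Dividing the four sums by n = 50 gives the general moments of 1. to 4. order.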
1.4.2. Empirical
distributions of frequencies
 The empirical distributions of frequencies can be
divided into two basic types. The first type assigns
corresponding absolute frequencies ni or relative
frequencies ni / n to the scale elements xi. The second
type assigns corresponding cumulative frequencies
Σ(ni / n) to the scale elements xi.
1.4.2. Empirical
distributions of frequencies
 The graphical expression of the empirical distribution of a
one-dimensional statistical set uses a coordinate system in
the plane. The scale elements xi are always plotted on the
horizontal axis and the corresponding frequencies on the
vertical axis. The graph of this functional dependence is
the set of points whose first coordinate is the scale
element xi and whose second coordinate is the corresponding
frequency. Connecting neighbouring points of this set by
line segments yields a broken line called a “polygon”. It is
possible to distinguish the “polygon of absolute
frequencies”, the “polygon of relative frequencies” and the
“polygon of cumulative frequencies”.
1.4.2. Empirical
distributions of frequencies
 Besides the graphical expression of empirical
distributions by a polygon, a number of auxiliary graphical
representations are used. Their “advantage” is a departure
from the mathematically exact apparatus and a certain
quickness of orientation. Their shortcoming is that they
cannot be followed up with the deeper apparatus of
mathematical statistics, above all in the investigation of
dependencies in multidimensional statistical sets.
Bar charts, bar graphs, pie charts, etc. belong to these
auxiliary graphical representations. In general, it is
advisable to rely on the exact graphical expression.
1.4.2. Empirical
distributions of frequencies
 The significance of the graphical expression of the
empirical distribution is substantial. It makes it
possible to see immediately which theoretical
distribution (in terms of probability theory) is close to
the empirical distribution obtained as a result of
descriptive statistics. It also allows an immediate
assessment of the location, variability, skewness and
kurtosis of the empirical distribution and thus of the
investigated statistical set.
1.4.2. Empirical
distributions of frequencies
The assigned example can be used to practise, e.g., the
construction of the polygons of absolute and cumulative
frequencies. Figure Fig.2 shows the polygon of absolute
frequencies and figure Fig.3 the polygon of cumulative
frequencies.
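The vertices and line segments of both polygons can be sketched from the frequencies quoted for the assigned example. The helper name `polygon` is an assumption of this sketch, and no plotting library is used; the segments could be passed to any plotting tool.

```python
from itertools import accumulate

x = [1, 2, 3, 4, 5]
n_abs = [9, 15, 20, 4, 2]      # absolute frequencies from the assigned example
n = sum(n_abs)


def polygon(xs, ys):
    """Return the line segments joining neighbouring points (xi, frequency)."""
    points = list(zip(xs, ys))
    return list(zip(points[:-1], points[1:]))


abs_polygon = polygon(x, n_abs)
cum_polygon = polygon(x, [c / n for c in accumulate(n_abs)])

print("Polygon of absolute frequencies:")
for start, end in abs_polygon:
    print(f"  {start} -> {end}")

print("Polygon of cumulative frequencies:")
for start, end in cum_polygon:
    print(f"  {start} -> {end}")
```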
1.4.3. Empirical parameters
The empirical parameters express the nature of the
investigated statistical set briefly and simply. They
mostly relate to a selective statistical set, which is why
they are often called “selective parameters”. As selective
parameters they themselves have a statistical-probabilistic
character and therefore behave as a special group of
“statistical signs”. This view will not be developed in the
following explanation, but attention must be drawn to it,
especially with regard to a deeper study of statistics and
probability theory.
Classification of emp. parameters
1. Classification according to the feature of the
investigated statistical set (investigated statistical sign)
 parameters of location
 parameters of variability
 parameters of obliqueness (skewness)
 parameters of pointedness (kurtosis)
2. Classification of empirical parameters according to
the way of their calculation:
 moment parameters (they work as a function of all
values of statistical sign)
 quantile parameters (they represent only certain values
of statistical sign)
Quantile parameters
The quantile parameters are closely related to the
moment parameters but are constructed in a different
way. An empirical quantile is always a certain value of
the statistical sign (expressed on a quantitative metric
or absolute metric scale) that divides the smaller and
greater values of the statistical sign in a given ratio.
E.g., the quantile dividing the values of the statistical
sign into two equal parts (the 50% quantile) is called the
“median”. The quantile parameters will not be
investigated in more detail.
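A brief sketch of the median and of a general empirical quantile for the frequencies quoted in the assigned example. The convention used in `empirical_quantile` (smallest scale element whose cumulative frequency reaches p) is one of several possible conventions and is an assumption of this sketch.

```python
import statistics
from itertools import accumulate

x = [1, 2, 3, 4, 5]
n_abs = [9, 15, 20, 4, 2]      # absolute frequencies from the assigned example
n = sum(n_abs)

# Expand the frequency table back into the 50 individual values.
values = [xi for xi, ni in zip(x, n_abs) for _ in range(ni)]
median = statistics.median(values)


def empirical_quantile(p: float) -> int:
    """Smallest scale element whose cumulative frequency is at least p."""
    for xi, c in zip(x, accumulate(n_abs)):
        if c / n >= p:
            return xi
    return x[-1]


print("median:", median)
print("25% quantile:", empirical_quantile(0.25))
print("50% quantile:", empirical_quantile(0.50))
```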
Moment parameters
The moment parameters are divided into general
moments, central moments and standardized moments.
Location (arithmetic mean) is characterized by the
general moment of 1. order, variability (empirical
variance) by the central moment of 2. order, and
obliqueness (skewness) and pointedness (kurtosis) by the
standardized moments of 3. and 4. order.
Moment parameters
 Since the standardized moments can be calculated from the
central moments and the central moments from the general
moments, the following procedure is adopted in the
explanation below (the investigated statistical sign is
denoted by the letter x; the notation of the statistical sign
values xi, the absolute frequencies ni and the extent n of
the selective statistical set remains unchanged):
 Presentation of the general relations for general and
central moments
 Expression of the required central moments using general
moments
 Expression of the required standardized moments using
central moments
Relations for
general and central moments
 General moment of r-th order:
 General moment of 1. order:
 arithmetic mean
 Central moment of r-th order:
 Central moment of 2. order:
 empirical variance
 Determinative (standard) deviation:
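The relations named above can be sketched as follows, assuming the usual frequency-weighted definitions over the k scale elements. The symbols follow the notation O, C and Sx used in this part. The divisor n in the central moments is an assumption: the value C2(x) = 1.031 quoted in the assigned example matches the divisor n – 1 for the 2. order, while the quoted C3(x) and C4(x) match the divisor n.

```latex
\begin{align*}
O_r(x) &= \frac{1}{n}\sum_{i=1}^{k} n_i\, x_i^{\,r}
  && \text{general moment of $r$-th order} \\
O_1(x) &= \frac{1}{n}\sum_{i=1}^{k} n_i\, x_i
  && \text{arithmetic mean} \\
C_r(x) &= \frac{1}{n}\sum_{i=1}^{k} n_i\,\bigl(x_i - O_1(x)\bigr)^{r}
  && \text{central moment of $r$-th order} \\
C_2(x) &= \frac{1}{n}\sum_{i=1}^{k} n_i\,\bigl(x_i - O_1(x)\bigr)^{2}
  && \text{empirical variance} \\
S_x &= \sqrt{C_2(x)}
  && \text{determinative (standard) deviation}
\end{align*}
```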
The expression of needful central
moments using general moments
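A sketch of these expressions, obtained by expanding the power in the central moment defined with the divisor n (an assumption, since the original formulas are not reproduced here):

```latex
\begin{align*}
C_2(x) &= O_2(x) - O_1(x)^2 \\
C_3(x) &= O_3(x) - 3\,O_2(x)\,O_1(x) + 2\,O_1(x)^3 \\
C_4(x) &= O_4(x) - 4\,O_3(x)\,O_1(x) + 6\,O_2(x)\,O_1(x)^2 - 3\,O_1(x)^4
\end{align*}
```

With the general moments of the assigned example (2.50, 7.26, 23.50, 82.86) these give C3(x) = 0.300 and C4(x) = 2.922, as quoted there. For the 2. order they give 1.010, which becomes the quoted 1.031 after multiplication by n/(n – 1) = 50/49.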
The expression of needful
standardized moments using central
moments
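A sketch of the standardized moments of 3. and 4. order in their usual form, in which the central moment is divided by the corresponding power of the standard deviation. The symbols N3, N4 and Ex follow the notation used later in this part.

```latex
\begin{align*}
N_3(x) &= \frac{C_3(x)}{C_2(x)^{3/2}} = \frac{C_3(x)}{S_x^{3}} \\
N_4(x) &= \frac{C_4(x)}{C_2(x)^{2}} = \frac{C_4(x)}{S_x^{4}} \\
Ex &= N_4(x) - 3
\end{align*}
```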
 Since all the needful moment parameters can be
determined using this procedure, now it is possible to
describe the parameters of location, variability, obliqueness
(skewness) and pointedness (kurtosis).
Location parameter
 The location parameter is given by the general
moment of 1. order O1(x) and is called the
“arithmetic mean”. The location of the empirical
frequency distribution is its position on the horizontal
axis of the coordinate system.
Variability parameter
The variability parameter is given by the central
moment of 2. order C2(x) and is called the
“empirical variance” (the square root of the variance is
called the “standard deviation”). The determinative
(standard) deviation indicates how informative the
arithmetic mean is: if the standard deviation is large,
the information value of the arithmetic mean is small,
and vice versa.
Obliqueness parameter
The obliqueness parameter (skewness) is given chiefly
by the standardized moment of 3. order N3(x), called
the “coefficient of skewness”. If the skewness
coefficient is positive, the scale elements lying to the
left of the arithmetic mean have greater frequencies
(positively skewed distribution of frequencies – greater
concentration at the lower scale elements, i.e. at the
smaller values of the statistical sign), and vice versa.
Pointedness parameter
The pointedness parameter (kurtosis) is given chiefly
by the standardized moment of 4. order N4(x), called
the “coefficient of kurtosis”. A greater value of the
kurtosis coefficient corresponds to a more pointed
distribution of frequencies for a given variance. The
quantity “excess”, defined by the relation
Ex = N4(x) – 3, is used as well. The excess compares the
kurtosis of the empirical distribution with that of the
normal distribution, for which N4 = 3. If the excess is
positive, the empirical distribution is more pointed than
the normal distribution.
Assigned example
In the assigned example the empirical parameters of location,
variability, skewness and kurtosis will now be calculated. First,
the general moments of 1. to 4. order are calculated from the sums
of columns 5 to 8 of table Tab.1:
 O1(x) = 2.50
 O2(x) = 7.26
 O3(x) = 23.50
 O4(x) = 82.86
 The next step is the calculation of the central moments of 2. to
4. order:
 C2(x) = 1.031 (standard deviation Sx = 1.015); this value
equals n/(n – 1) · (O2(x) – O1(x)²), whereas O2(x) – O1(x)² alone
gives 1.010
 C3(x) = 0.300
 C4(x) = 2.922
Assigned example
 The final part of the calculation of empirical
parameters is the determination of the standardized
moments of 3. and 4. order and of the excess:
 $N_3(x) = C_3(x) / C_2(x)^{3/2} = 0.29$
 $N_4(x) = C_4(x) / C_2(x)^{2} = 2.75$
 $Ex = N_4(x) - 3 = -0.25$
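The whole chain of calculations can be checked with a short sketch based on the absolute frequencies quoted for the assigned example. The factor n/(n – 1) in C2 is an assumption inferred from the quoted value 1.031 (without it the result is 1.010); the standardized moments are then computed from that C2.

```python
import math

x = [1, 2, 3, 4, 5]
n_abs = [9, 15, 20, 4, 2]      # absolute frequencies from the assigned example
n = sum(n_abs)


def general_moment(r: int) -> float:
    """General moment of r-th order: (1/n) * sum of ni * xi^r."""
    return sum(ni * xi**r for xi, ni in zip(x, n_abs)) / n


O1, O2, O3, O4 = (general_moment(r) for r in (1, 2, 3, 4))

# Central moments expressed through general moments.
# Assumption: C2 carries the factor n/(n-1), which reproduces the quoted 1.031.
C2 = n / (n - 1) * (O2 - O1**2)
C3 = O3 - 3 * O2 * O1 + 2 * O1**3
C4 = O4 - 4 * O3 * O1 + 6 * O2 * O1**2 - 3 * O1**4
Sx = math.sqrt(C2)

# Standardized moments and excess.
N3 = C3 / C2**1.5
N4 = C4 / C2**2
Ex = N4 - 3

print(f"O1..O4: {O1:.2f} {O2:.2f} {O3:.2f} {O4:.2f}")
print(f"C2 = {C2:.3f}, Sx = {Sx:.3f}, C3 = {C3:.3f}, C4 = {C4:.3f}")
print(f"N3 = {N3:.3f}, N4 = {N4:.3f}, Ex = {Ex:.3f}")
```

Under these assumptions the skewness coefficient comes out at about 0.287 and the kurtosis coefficient at about 2.751.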
Assigned example
 The location parameter (arithmetic mean) O1(x) indicates the
position of the empirical frequency distribution on the
horizontal axis – the arithmetic mean of export ability is 2.5
(a value numerically lower than the middle degree 3).
 The determinative (standard) deviation, the square root of
C2(x), indicates the information value of the arithmetic
mean. This can be quantified as follows – roughly 70% of the
enterprises lie in the range from export ability degree 1.5 to
export ability degree 3.5 (the applicability of this
statement depends on whether the empirical distribution
can be replaced by a theoretical normal distribution).
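The 70% statement can be checked with a small sketch: the empirical share of enterprises within one standard deviation of the mean is compared with the share a normal distribution would give. The frequencies are those quoted for the assigned example; the n/(n – 1) factor in the variance is an assumption matching the quoted C2(x) = 1.031.

```python
import math

x = [1, 2, 3, 4, 5]
n_abs = [9, 15, 20, 4, 2]      # absolute frequencies from the assigned example
n = sum(n_abs)

mean = sum(xi * ni for xi, ni in zip(x, n_abs)) / n
# Assumption: variance with the divisor n - 1.
variance = sum(ni * (xi - mean) ** 2 for xi, ni in zip(x, n_abs)) / (n - 1)
s = math.sqrt(variance)

low, high = mean - s, mean + s     # roughly 1.5 to 3.5

# Empirical share of enterprises whose degree lies inside the interval.
share = sum(ni for xi, ni in zip(x, n_abs) if low <= xi <= high) / n

# Share within one standard deviation under a normal distribution.
normal_share = math.erf(1 / math.sqrt(2))

print(f"interval: {low:.2f} to {high:.2f}")
print(f"empirical share: {share:.2f}, normal share: {normal_share:.4f}")
```

Only degrees 2 and 3 fall inside the interval, i.e. 35 of the 50 enterprises.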
Assigned example
 The positive skewness coefficient N3(x) indicates a
greater concentration at the lower scale elements, i.e. at
the lower degrees of export ability. Figure Fig.2
confirms this finding – a slight asymmetry towards the
left of the arithmetic mean.
 The value of the kurtosis coefficient, close to 3, and the
small value of the excess indicate a kurtosis comparable
with that of the normal distribution. This finding
additionally supports the conclusion that the arithmetic
mean has a good information value.
Check questions (1)
 What are the main goals of elementary statistical
processing?
 How can the measurement results be arranged in a suitable
way?
 How can the measurement results be expressed graphically
in a suitable way?
 How can the parameters of the measurement results be
expressed in a suitable way?
 What is the empirical distribution of frequencies?
 How can the empirical distribution of a one-dimensional
statistical set be expressed graphically?
Check questions (2)
 What is the frequency polygon?
 What is the significance of the graphical expression of the
empirical distribution?
 How can the empirical parameters be divided according to
the described feature of the investigated statistical set?
 How can the empirical parameters be divided according to
the way they are calculated?
 How are the general, central and standardized moments defined?
 What is the most important parameter of location, variability,
skewness and kurtosis, and what is the statistical
interpretation of these parameters?
 How is the quantity “excess” defined and what is its significance?