Reliability and Risk Analysis
Multidimensional Data
Jiří Neubauer

Statistical data, One-dimensional data, Two-dimensional data, Descriptive measures

Elementary Statistical Terms

• A population consists of all elements (individuals, items, or objects) whose characteristics are being studied. The population being studied is also called the target population.
• A unit is a single entity (usually a person or an object) whose characteristics are of interest.
• A sample from a statistical population is a portion (a subset) of the population selected for study.
• A survey that includes every member of the population is called a census.
• The technique of collecting information from a portion of the population is called a sample survey.
• A sample that represents the characteristics of the population as closely as possible is called a representative sample.
• A variable is a characteristic under study that assumes different values for different elements.
• The value of a variable for an element is called an observation or measurement.
• A data set is a collection of observations on one or more variables.
• The number of observations is called the sample size and is usually denoted $n$.

Main Types of Data (variables)

Basic types of data (variables):
• nominal or categorical
• ordinal
• cardinal or numerical
  - discrete
  - continuous

One-dimensional discrete data: Frequency and relative frequency

• Frequency $n_j$ is the number of occurrences of variant $x_j$.
  We can write $\sum_{j=1}^{k} n_j = n$, where $k$ is the number of variants.
• Relative frequency is given by $p_j = \frac{n_j}{n}$; it fulfills $\sum_{j=1}^{k} p_j = 1$.
• Cumulative frequency $N_j$: $N_j = n_1 + \dots + n_j$
• Relative cumulative frequency $F_j$: $F_j = \frac{N_j}{n} = p_1 + \dots + p_j$

Frequency and relative frequency – example

We have a data set containing the heights of 50 randomly chosen 15-month-old boys (in cm):

83 85 81 82 84 82 79 84 80 81
82 82 80 82 80 82 83 84 82 79
83 82 83 82 82 82 81 80 82 82
83 80 82 85 81 83 81 81 83 82
81 85 83 79 81 81 81 84 81 82

Height   Freq.   Rel. freq.   Cum. freq.   Rel. cum. freq.
x_j      n_j     p_j          N_j          F_j
79        3      0.06          3           0.06
80        5      0.10          8           0.16
81       11      0.22         19           0.38
82       16      0.32         35           0.70
83        8      0.16         43           0.86
84        4      0.08         47           0.94
85        3      0.06         50           1.00
Σ        50      1.00          -           -

Table: Frequency table – height of 15-month-old boys

Figure: Frequency distribution

Empirical distribution function

We define the empirical distribution function as
$$F_n(x) = \frac{N(x_i \le x)}{n},$$
where the numerator is the number of elements whose value is less than or equal to $x$.
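The frequency definitions and the empirical distribution function can be checked against the height example with a short script. This is only a minimal sketch in plain Python; the function names `frequency_table` and `ecdf` are illustrative choices, not part of the lecture.

```python
from collections import Counter

# Heights (cm) of the 50 boys from the example.
heights = [
    83, 85, 81, 82, 84, 82, 79, 84, 80, 81,
    82, 82, 80, 82, 80, 82, 83, 84, 82, 79,
    83, 82, 83, 82, 82, 82, 81, 80, 82, 82,
    83, 80, 82, 85, 81, 83, 81, 81, 83, 82,
    81, 85, 83, 79, 81, 81, 81, 84, 81, 82,
]


def frequency_table(data):
    """Return rows (x_j, n_j, p_j, N_j, F_j), one per variant, in ascending order."""
    n = len(data)
    counts = Counter(data)
    rows = []
    cumulative = 0
    for x in sorted(counts):
        cumulative += counts[x]  # N_j = n_1 + ... + n_j
        rows.append((x, counts[x], counts[x] / n, cumulative, cumulative / n))
    return rows


def ecdf(data, x):
    """Empirical distribution function F_n(x) = N(x_i <= x) / n."""
    return sum(1 for value in data if value <= x) / len(data)


if __name__ == "__main__":
    for x_j, n_j, p_j, N_j, F_j in frequency_table(heights):
        print(f"{x_j:>3} {n_j:>3} {p_j:.2f} {N_j:>3} {F_j:.2f}")
```

At each observed variant the empirical distribution function coincides with the relative cumulative frequency, so `ecdf(heights, 82)` agrees with $F_j = 0.70$ in the table row for 82 cm.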

One-dimensional continuous data: Frequency and relative frequency – example

We have a data set containing the quantity of dust particles (in µg/m³):

1.23 1.51 1.41 1.14 1.47 1.10 1.53 1.22 1.34 1.24
1.54 1.31 1.27 1.16 1.45 1.34 1.23 1.37 1.51 1.29
1.06 1.31 1.14 1.58 1.17 1.09 1.27 1.22 1.33 1.63
1.41 1.17 1.43 1.31 1.39 1.48 1.27 1.40 1.04 1.02
1.52 1.34 1.41 1.58 1.38 1.37 1.27 1.51 1.12 1.39
1.37 1.09 1.51 1.19 1.43 1.63 1.01 1.47 1.17 1.28

Create a frequency table and plot the data.

Class          Midpoint   Freq.   Rel. freq.   Cum. freq.   Rel. cum. freq.
               x_j        n_j     p_j          N_j          F_j
(1.00; 1.10]   1.05        7      0.117         7           0.117
(1.10; 1.20]   1.15        8      0.133        15           0.250
(1.20; 1.30]   1.25       11      0.183        26           0.433
(1.30; 1.40]   1.35       14      0.233        40           0.667
(1.40; 1.50]   1.45        9      0.150        49           0.817
(1.50; 1.60]   1.55        9      0.150        58           0.967
(1.60; 1.70]   1.65        2      0.033        60           1.000
Σ              -          60      1             -           -

Table: Frequency table – quantity of dust particles in µg/m³

Figure: Frequency distribution – histograms

Figure: Empirical distribution function

Two-dimensional discrete data
Let us have a two-dimensional data set
$$\begin{pmatrix} x_1 & y_1 \\ \vdots & \vdots \\ x_n & y_n \end{pmatrix},$$
where $X$ has $r$ variants and $Y$ has $s$ variants.

• Joint absolute frequency of $(x_j, y_k)$ is $n_{jk} = N(X = x_j \wedge Y = y_k)$.
• Joint relative frequency of $(x_j, y_k)$ is $p_{jk} = \frac{n_{jk}}{n}$.
• Marginal absolute frequency of the variant $x_j$ is $n_{j.} = N(X = x_j) = n_{j1} + \dots + n_{js}$.
• Marginal relative frequency of the variant $x_j$ is $p_{j.} = \frac{n_{j.}}{n} = p_{j1} + \dots + p_{js}$.
• Marginal absolute frequency of the variant $y_k$ is $n_{.k} = N(Y = y_k) = n_{1k} + \dots + n_{rk}$.
• Marginal relative frequency of the variant $y_k$ is $p_{.k} = \frac{n_{.k}}{n} = p_{1k} + \dots + p_{rk}$.

Two-dimensional discrete data – example

The age of 42 dwarf apple trees in years ($X$) and their annual harvest ($Y$) were recorded; see the table below.

x_j   y_i
3      4  7  5  5  5
4      9  5  7  6  8  7  8
5      9  8  9 10  7  7
6     10  8 10 10 10  9
7      9  7  8  9 10  9
8      8  7  7  8  6 10
9      5  4  6  7  6  8

harvest \ age    3   4   5   6   7   8   9   n_.k
4                1   0   0   0   0   0   1    2
5                3   1   0   0   0   0   1    5
6                0   1   0   0   0   1   2    4
7                1   2   2   0   1   2   1    9
8                0   2   1   1   1   2   1    8
9                0   1   2   1   3   0   0    7
10               0   0   1   4   1   1   0    7
n_j.             5   7   6   6   6   6   6   42

Table: Frequency table

Two-dimensional continuous data

Let us have a two-dimensional data set
$$\begin{pmatrix} x_1 & y_1 \\ \vdots & \vdots \\ x_n & y_n \end{pmatrix}.$$
We split the values of $X$ into $r$ intervals $(u_j, u_{j+1}]$, $j = 1, \dots, r$, and the values of $Y$ into $s$ intervals $(v_k, v_{k+1}]$, $k = 1, \dots, s$. Each frequency then counts the observations whose values fall into the given pair of intervals.

Two-dimensional continuous data – example

We have 34 measurements of pH and bicarbonate HCO₃⁻ in water. Construct the distribution table.

pH    HCO₃⁻    pH    HCO₃⁻    pH    HCO₃⁻    pH    HCO₃⁻
7.6   157      7.5   190      8.2   202      7.4   125
7.1   174      8.1   215      7.9   155      7.3    76
8.2   175      7.0   199      7.6   157      8.5    48
7.5   188      7.3   262      8.8   147      7.8   147
7.4   171      7.8   105      7.2   133      6.7   117
7.8   143      7.3   121      7.9    53      7.1   182
7.3   217      8.0    81      8.1    56      7.3    87
8.0   190      8.5    82      7.7   113
7.1   142      7.1   210      8.4    35

HCO₃⁻ \ pH   6.6–7.0   7.0–7.4   7.4–7.8   7.8–8.2   8.2–8.6   8.6–9.0   n_.k
30–70           0         0         0         2         2         0        4
70–110          0         2         1         1         1         0        5
110–150         1         3         4         0         0         1        9
150–190         0         2         5         3         0         0       10
190–230         1         2         0         2         0         0        5
230–270         0         1         0         0         0         0        1
n_j.            2        10        10         8         3         1       34

Table: Distribution table

Descriptive measures

• measures of location (center) – mean, quantiles, mode, ...
• measures of dispersion (variation) – variance, standard deviation, sample variance, sample standard deviation, ...
• measures of concentration – skewness and kurtosis
• measures of dependency – correlation coefficients

Measures of location

• mean:
  - arithmetic mean $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
  - harmonic mean $\bar{x}_H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$
  - geometric mean $\bar{x}_G = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$
• quantile: the quantile $x_p$ is the value of the variable such that at least $100p\,\%$ of the values of the ordered sample (or population) are smaller than or equal to $x_p$ and at least $100(1 - p)\,\%$ of the values are larger than or equal to $x_p$.
• mode: $\hat{x}$ is the value with the highest frequency.

Measures of dispersion

• range of variation: $R = x_{\max} - x_{\min}$
• interquartile range: $R_Q = x_{0.75} - x_{0.25}$
• variance: $s_n^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$
• standard deviation: $s_n = \sqrt{s_n^2}$
• sample variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
• sample standard deviation: $s = \sqrt{s^2}$
• average deviation: $d_{\bar{x}} = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$

Measures of concentration

• skewness: $a_3 = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^3}{s_n^3}$
• kurtosis: $a_4 = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^4}{s_n^4} - 3$

Measures of dependency

Let us have a two-dimensional data set
$$\begin{pmatrix} x_1 & y_1 \\ \vdots & \vdots \\ x_n & y_n \end{pmatrix},$$
where $\bar{x}$ and $\bar{y}$ denote the means of $X$ and $Y$, and $s_x$, $s_y$ are the standard deviations of $X$ and $Y$.

The Pearson correlation coefficient is defined by the formula
$$r_{xy} = \frac{1}{n} \sum_{i=1}^{n} \frac{x_i - \bar{x}}{s_x} \cdot \frac{y_i - \bar{y}}{s_y}.$$
We can rewrite it in the form
$$r_{xy} = \frac{s_{xy}}{s_x s_y}, \quad \text{where} \quad s_{xy} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
is the covariance of $X$ and $Y$.

We calculate the ranks of the values $x_i$, $y_i$ and denote them $p_i$, $q_i$. Spearman's correlation coefficient (rank correlation coefficient) is then defined by the formula
$$\rho = 1 - \frac{6 \sum_{i=1}^{n} (p_i - q_i)^2}{n(n^2 - 1)}.$$

We say that the pairs $(x_i, y_i)$ and $(x_j, y_j)$ are concordant if both $x_i > x_j$ and $y_i > y_j$, or if both $x_i < x_j$ and $y_i < y_j$. We say that they are discordant if both $x_i < x_j$ and $y_i > y_j$, or if both $x_i > x_j$ and $y_i < y_j$. If $x_i = x_j$ or $y_i = y_j$, the pair is neither concordant nor discordant. Let $n_c$ denote the number of concordant pairs and $n_d$ the number of discordant pairs. The Kendall correlation coefficient is defined by the formula
$$\tau = \frac{n_c - n_d}{\frac{1}{2} n(n - 1)}.$$
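The three measures of dependency can be compared side by side on a small data set. The sketch below is an illustration in plain Python, not the lecture's own code: the data `x`, `y` are made up and tie-free, the helper `ranks` assumes no ties, the Pearson coefficient uses the $\frac{1}{n}$ forms of covariance and standard deviation, and Spearman's coefficient uses the standard denominator $n(n^2 - 1)$.

```python
from itertools import combinations
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """r_xy = s_xy / (s_x * s_y), all computed with the 1/n convention."""
    n = len(x)
    mx, my = mean(x), mean(y)
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    s_x = sqrt(sum((a - mx) ** 2 for a in x) / n)
    s_y = sqrt(sum((b - my) ** 2 for b in y) / n)
    return s_xy / (s_x * s_y)


def ranks(values):
    """Ranks 1..n of the values; assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman(x, y):
    """rho = 1 - 6 * sum((p_i - q_i)^2) / (n * (n^2 - 1)), valid without ties."""
    n = len(x)
    p, q = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def kendall(x, y):
    """tau = (n_c - n_d) / (n * (n - 1) / 2)."""
    n = len(x)
    n_c = n_d = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            n_c += 1  # concordant pair
        elif sign < 0:
            n_d += 1  # discordant pair
        # sign == 0: tied pair, counted as neither
    return (n_c - n_d) / (n * (n - 1) / 2)


# Hypothetical illustrative data (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

if __name__ == "__main__":
    print(pearson(x, y), spearman(x, y), kendall(x, y))
```

For this data there are 8 concordant and 2 discordant pairs out of 10, so $\tau = 0.6$, while both $r_{xy}$ and $\rho$ come out at 0.8 (the values coincide with their ranks here). With tied observations the simple Spearman formula no longer applies exactly, which is why the sketch restricts itself to tie-free data.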