Descriptive statistics

Dr. Asif Rehman

Measures of Central Tendency
 Measures of central tendency are the summary measures
used to describe the most “typical” value in a set of values.
 The three most common measures of central tendency are:
1. Mean
2. Median
3. Mode
Mean
 The most popular measure of central tendency for a
quantitative data set.
 Also known as the average.
 It is calculated by adding all the observations and dividing by
the total number of observations.
 The sample mean is denoted by x̅ (pronounced x bar) and the
population mean is denoted by µ (the Greek letter mu).
 Mean can only be calculated for quantitative data
Mean
Suppose we draw a sample of five women and measure their
weights in pounds.
110, 110, 140, 150, 160
The Mean weight would be equal to:
(110+110+140+150+160)/5 = 670/5 = 134 pounds
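The same calculation can be sketched in a few lines of Python. This is only an illustration of the add-and-divide rule; the helper name `sample_mean` is a hypothetical choice, not part of the lecture.

```python
# Sample weights in pounds, taken from the example above.
weights = [110, 110, 140, 150, 160]


def sample_mean(values):
    """Add all observations and divide by the number of observations."""
    return sum(values) / len(values)


mean_weight = sample_mean(weights)
print(mean_weight)  # 134.0
```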
Median
 The median is an important measure of central tendency.
 It is the value that divides a distribution into two equal halves.
 Arrange the observations in order from smallest to largest value or
vice versa.
 If there are an odd number of total observations, the Median is the
middle value.
 If there is an even number of total observations, the Median is the
average of the two middle values.
 The Median value is useful when some measurements are much
bigger or much smaller than the rest. The mean of such data will be
biased towards these extreme values while the median is not
influenced by extreme values.
Median
Suppose we draw a sample of five women and measure their
weights in pounds.
110, 140, 110, 160, 150
Arranged in ascending order: 110, 110, 140, 150, 160
The Median value would be 140 pounds, since 140 pounds is the
middle weight.
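As a rough sketch, the sort-then-pick-the-middle procedure can be written as below. The function name `median` and the four-value sample used for the even case are illustrative assumptions, not data from the lecture.

```python
def median(values):
    """Sort the observations, then take the middle value (odd count)
    or the average of the two middle values (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Weights in pounds, in the unsorted order given in the example above.
weights = [110, 140, 110, 160, 150]
median_weight = median(weights)
print(median_weight)  # 140

# Hypothetical even-sized sample, to show the averaging rule.
even_median = median([110, 110, 140, 150])
print(even_median)  # 125.0
```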
Mode
 The Mode is the most frequently occurring value in a set of
observations.

 For example, in the set 110, 110, 140, 150, 160, the most
frequent value is 110 (it occurs twice), so the mode of the data
is 110 pounds.
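One possible way to find the mode in Python is to count how often each value occurs. The sketch below uses `collections.Counter` from the standard library; the variable names are illustrative.

```python
from collections import Counter

# Weights in pounds, from the example above.
weights = [110, 110, 140, 150, 160]

# Count the occurrences of each value and take the most frequent one.
counts = Counter(weights)
mode_value, mode_count = counts.most_common(1)[0]
print(mode_value, mode_count)  # 110 2
```

Note that if several values tie for the highest count, this sketch reports only one of them.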
Mean versus Median
 The median may be a better indicator of the most typical value if a set
of scores has an outlier. An outlier is an extreme value that differs
greatly from the other values.
 Scores that are far above or below the mean are called outliers.
E.g., suppose that in the above-mentioned data one individual has a weight
of 250 lbs (the weight of 160 lbs is replaced by 250 lbs). This is an
extreme value, i.e., an outlier, and it will affect the mean.

Mean = (110+110+140+150+250)/5 = 760/5 = 152 lbs
 Because of the 250 lbs value, the mean is higher than most readings
in the data set. Hence, in such cases the median should be reported,
which remains 140 lbs.
 However, when the sample size is large and doesn’t include outliers,
the mean usually provides a better measure of central tendency.
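The effect of a single extreme value can be checked with a short sketch using Python's standard `statistics` module. The two lists are the original weights and the same weights with 160 replaced by 250, as in the example above.

```python
import statistics

original = [110, 110, 140, 150, 160]      # weights in pounds
with_outlier = [110, 110, 140, 150, 250]  # 160 replaced by 250

mean_before = statistics.mean(original)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(original)
median_after = statistics.median(with_outlier)

print(mean_before, mean_after)      # 134 152
print(median_before, median_after)  # 140 140
```

The mean shifts by 18 pounds while the median does not move, which is why the median is reported when outliers are present.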
Measures of variations
 These include the measures that describe the amount of
variability or spread in a set of data. The most common
measures of variability are the
 Range
 Variance
 Standard deviation.
Measures of variations
Range
 Range is the simplest measure of variability. It is defined as the
difference in value between the highest and lowest observation
in the data set.
 For example, consider the following women's weights in the data
set: 110, 110, 140, 150, 160. The range would be:
160 - 110 = 50 pounds
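In code, the range is simply the largest observation minus the smallest. A minimal sketch with illustrative variable names:

```python
# Weights in pounds, from the example above.
weights = [110, 110, 140, 150, 160]

# Range = highest observation - lowest observation.
weight_range = max(weights) - min(weights)
print(weight_range)  # 50
```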
Measures of variations
Variance
Variance quantifies the amount of variability or spread about the mean of
the sample. For instance, the women's weights in the previous example
were 110, 110, 140, 150 and 160 pounds.
Variance: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$
$x_i$ = individual sample observation
$\bar{x}$ = sample mean
$n$ = total sample size
$\sum$ = sum of the squared differences between each individual sample
observation and the sample mean
Measures of variations
Variance
Variance: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$
$= \dfrac{(110-134)^2 + (110-134)^2 + (140-134)^2 + (150-134)^2 + (160-134)^2}{5 - 1}$
$= \dfrac{(-24)^2 + (-24)^2 + (6)^2 + (16)^2 + (26)^2}{4}$
$= \dfrac{576 + 576 + 36 + 256 + 676}{4} = \dfrac{2120}{4}$
$= 530$
(Squaring each difference removes the minus sign, so we obtain the squared
difference of each value from the mean.)
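The worked calculation can be reproduced step by step in Python. This is a sketch of the sample variance with the n - 1 denominator; the function name `sample_variance` is a hypothetical choice, and the result is cross-checked against `statistics.variance` from the standard library.

```python
import statistics


def sample_variance(values):
    """Sum of squared differences from the mean, divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_diffs = [(x - mean) ** 2 for x in values]
    return sum(squared_diffs) / (n - 1)


# Weights in pounds, from the example above.
weights = [110, 110, 140, 150, 160]

variance = sample_variance(weights)
print(variance)                      # 530.0
print(statistics.variance(weights))  # same value from the standard library
```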
Standard Deviation
 SD is the square root of the variance.
 The SD is a measure that describes how much individual
measurements differ, on average, from the mean.
$SD = \sqrt{\text{variance}} = \sqrt{530} \approx 23.02$
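As a quick check, the square root of the variance can be computed directly or obtained from `statistics.stdev`, which uses the same n - 1 denominator. The variable names below are illustrative.

```python
import math
import statistics

# Weights in pounds, from the example above.
weights = [110, 110, 140, 150, 160]

# SD as the square root of the sample variance computed earlier (530).
sd_from_variance = math.sqrt(530)

# The same quantity computed directly from the data.
sd_direct = statistics.stdev(weights)

print(round(sd_from_variance, 2))  # 23.02
print(round(sd_direct, 2))         # 23.02
```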
Standard Deviation
 A large SD reflects a wide scatter of measured values around
the mean.
 A small SD reflects that the individual values are
concentrated around the mean with little variation among
them.
Standard Deviation
 Remember that in planning and decision making we are interested in
a figure that tells us the average difference of each value from
the mean. What we obtained in the variance is the average of the
squared differences, so to get back to an average difference
we need to reverse the process, i.e., take the square root of
the variance.
 The figure obtained in this way is the one we are interested in. It is
termed the Standard Deviation and is one of the most powerful tools
in Biostatistics.
Standard Deviation Exercise
Suppose that, for a study on 300 chronic kidney disease patients, the Hb
levels were obtained and plotted. The data are normally distributed,
with the mean Hb and SD calculated as 7 mg/dL and 1 mg/dL,
respectively. (3 marks)
Calculate the number of patients whose Hb level will be within the
range of 6 mg/dL to 8 mg/dL.
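Mean = 7 mg/dL
SD = 1 mg/dL
The range 6 mg/dL to 8 mg/dL is the mean ± 1 SD, which covers about 68%
of a normal distribution.
No. of patients within 1 SD on either side of the mean = 300 x 68% = 204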
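The 68% figure is a rounded value for the share of a normal distribution lying within 1 SD of the mean. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`, shows both the rounded answer used above and the unrounded proportion; the variable names are illustrative.

```python
from statistics import NormalDist

n_patients = 300
hb = NormalDist(mu=7, sigma=1)  # mean and SD from the exercise

# Proportion of the distribution between 6 and 8 (mean ± 1 SD).
proportion = hb.cdf(8) - hb.cdf(6)
print(round(proportion, 4))  # 0.6827

# Using the rounded 68% rule, as in the worked answer.
patients_rounded_rule = round(n_patients * 0.68)
print(patients_rounded_rule)  # 204
```

With the unrounded proportion the expected count is closer to 205; the answer of 204 follows from using 68%.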