DESCRIPTIVE STATISTICS
Lectures delivered by Prof. K. K. Achary, YENEPOYA RESEARCH CENTRE

• A descriptive measure of a set of values (observations/data set) is a representative value (generally a single number) that summarizes the characteristic properties/features of the data. The common features observed are:
  • concentration of data around a central value
  • scatter (dispersion/spread) around a central value
  • asymmetry in the data distribution
  • peakedness of the data distribution
• The 'central value' and 'spread' are numerical summaries of the data.
• The center of a data set is commonly called the average. There are many ways to describe the average value of a distribution.
• The most appropriate measures of center and spread depend on the shape of the distribution and the scale of measurement.
• Once the characteristics of the distribution are known, we can analyze the data for interesting features, including unusual data values, called outliers.
• Consider the following data sets.
  • Age at first childbirth (years): 22,24,24,23,21,25,26,23,24,25,26,26,23,24,24
  • Blood cholesterol: 145,145,143,189,180,196,145,198,205,209,227,2 221,292,302,253,233,283
  • Day temperature: 27,26,27,28,29,29,29,30,30,30,29,30,31,31,31,32,31,32,32,35
• What are your general comments?
• The descriptive measures can be computed for the whole mass of data (if it is available), i.e. the population related to a study, or for only a representative part of the population, i.e. a sample (random or non-random).
• A measure computed from a random sample is termed a "statistic" (descriptive statistic).
• The measures describing the characteristics of a population are called parameters. We use the sample data to estimate the parameters. The functions of sample observations that estimate the parameters are called 'estimators' (they are statistics).
• The numerical value of an estimator computed from a particular sample is called an 'estimate'.
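To make the parameter / estimator / estimate distinction concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the 15 ages at first childbirth listed above form the whole population; the function name `sample_mean`, the sample size of 5 and the random seed are hypothetical choices, not part of the lecture.

```python
import random

# ASSUMPTION for illustration only: the 15 ages at first childbirth from the
# lecture are treated as if they were the entire population.
population = [22, 24, 24, 23, 21, 25, 26, 23, 24, 25, 26, 26, 23, 24, 24]

# Parameter: a fixed numerical characteristic of the population (its mean).
parameter = sum(population) / len(population)


def sample_mean(sample):
    """Estimator (hypothetical name): a rule applied to sample observations."""
    return sum(sample) / len(sample)


# Draw a simple random sample without replacement.
# Sample size 5 and seed 1 are arbitrary choices for this sketch.
rng = random.Random(1)
sample = rng.sample(population, 5)

# Estimate: the number the estimator yields for this particular sample.
estimate = sample_mean(sample)

print("parameter (population mean):", parameter)
print("random sample:", sample)
print("estimate (sample mean):", estimate)
```

Changing the seed gives a different sample and usually a different estimate, while the parameter stays at 24; this is the sense in which an estimate only approximates the parameter.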
Measures of Central Tendency (Averages)
• Some of the measures of central tendency are:
  • Mean (the arithmetic mean, or "average value" as commonly used)
  • Median
  • Mode
• We also have positional averages such as quartiles, deciles and percentiles.
• Arithmetic mean:
  • Definition: The arithmetic mean of a sample (or population) is the "average" of all observations in the sample/population. It is given by
    Arithmetic mean = (Total of all values in the sample/population) / (Number of values in the sample/population)
  • Let there be n values in the sample, recorded as x1, x2, x3, …, xn (you can also use y1, y2, y3, …, yn).
  • The sum or total of the n observations is x1 + x2 + x3 + … + xn = ∑xi, so
    $$\text{Mean} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
  • This formula is used to compute the mean of ungrouped data. (What is ungrouped data? What is grouped data?)
• Computation steps:
  – Add all the observations (sum up).
  – Find the number of observations in the sample data set (n).
  – Divide the total by n. This value is the mean.
• Example: Consider the age data 22,24,24,23,21,25,26,23,24,25,26,26,23,24,24. Here n = 15 and the sum (total) is 360. Hence mean = 360/15 = 24. The mean age at first childbirth is 24 years.
• Compute the mean level of blood cholesterol.
• Computation of mean in grouped data:
  • Grouped data may be in the form of a discrete frequency distribution or a continuous frequency distribution.
  • Let us first consider a discrete frequency distribution.

Example of a discrete frequency table (distribution):

  Score level (xi)    No. of students (fi)
  0                   3
  1                   12
  2                   23
  3                   46
  4                   18
  5                   6
  Total               108

• The mean is computed the same way as before.
• Total of all observations = 0×3 + 1×12 + 2×23 + 3×46 + 4×18 + 5×6 = 298
• Number of observations = 108
• Mean = 298/108 = 2.759259… ≈ 2.76
• For the above case, a computational formula can be developed as follows.
• Let the distinct values be x1, x2, x3, …, xk, where xi repeats fi times.
• Then the total of all observations is given by x1f1 + x2f2 + … + xkfk.
• The total number of observations is f1 + f2 + … + fk = n.
• Hence the formula for the mean is:
  $$\text{Mean} = \bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{n}, \qquad n = \sum_{i=1}^{k} f_i$$
• Computation of mean from a continuous frequency table (distribution):
  • Recall the method of constructing a frequency table when observations are recorded on a continuous variable.
  • Let the midpoint of the ith class interval be denoted xi and the corresponding class frequency fi.
  • Then the mean can be computed using the above formula by taking the midpoint values as the xi's, with f1 + f2 + … + fk = n, where k is the number of classes:
    $$\text{Mean} = \bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{n}$$
  • Remember that we have lost information on the individual values. We assume that in every interval the midpoint is the representative value. Hence we get only an approximate value (estimate) of the mean; it is not the actual/true value.

Properties of the mean
• For a given set of data the mean is unique, and it is computed using all the observations.
• Computation is easy.
• It can be computed for interval and ratio scale data.
• Extreme values influence the mean, and it may be completely distorted by such values.
  Ex: Consider the 'age of mothers' data: 22,24,24,23,21,25,26,23,24,25,26,26,23,24,24. The mean age is 24 years.
  • Suppose one observation is 39 (say, the last one). The revised total is 360 - 24 + 39 = 375, so the mean is 375/15 = 25 years.
  • Suppose 18 and 16 are two observations replacing 21 and 23. The revised total is 360 - 21 - 23 + 18 + 16 = 350, so the mean age becomes 350/15 ≈ 23.3 years.

An important note
• The mean value obtained may not be one of the values in the data set (e.g., the mean score of 2.76 for the score-level table is not a possible score level).
• If the data set contains many extreme values (outliers), then the mean is not an appropriate average.
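The mean calculations in this lecture can be checked with a short Python sketch. The ungrouped, discrete-frequency and extreme-value figures use the lecture's own data. The lecture gives no continuous frequency table, so the class intervals and frequencies in `class_table` / `class_freqs` are hypothetical values chosen only to illustrate the midpoint method; the function names are also illustrative, not from the lecture.

```python
def mean_ungrouped(values):
    """Mean of raw (ungrouped) data: total of all values divided by n."""
    return sum(values) / len(values)


def mean_discrete(xs, fs):
    """Mean of a discrete frequency distribution: sum(x_i * f_i) / sum(f_i)."""
    n = sum(fs)
    return sum(x * f for x, f in zip(xs, fs)) / n


def mean_continuous(classes, fs):
    """Approximate mean of a continuous frequency distribution.

    Each class (lower, upper) is represented by its midpoint, so the result
    is only an estimate of the true mean of the original observations.
    """
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    return mean_discrete(midpoints, fs)


# Ungrouped data: age at first childbirth (from the lecture).
ages = [22, 24, 24, 23, 21, 25, 26, 23, 24, 25, 26, 26, 23, 24, 24]
print("mean age:", mean_ungrouped(ages))  # prints: mean age: 24.0

# Discrete frequency distribution: score level vs. number of students (from the lecture).
scores = [0, 1, 2, 3, 4, 5]
students = [3, 12, 23, 46, 18, 6]
print("mean score:", round(mean_discrete(scores, students), 2))  # prints: mean score: 2.76

# Continuous frequency distribution: HYPOTHETICAL class intervals and frequencies.
class_table = [(20, 30), (30, 40), (40, 50)]
class_freqs = [4, 10, 6]
print("approx. mean:", mean_continuous(class_table, class_freqs))  # prints: approx. mean: 36.0

# Sensitivity of the mean to extreme values (from the lecture).
with_39 = ages[:-1] + [39]  # last observation (24) replaced by 39
print("mean with 39:", mean_ungrouped(with_39))  # prints: mean with 39: 25.0

lowered = ages.copy()
lowered[lowered.index(21)] = 18  # 21 replaced by 18
lowered[lowered.index(23)] = 16  # first 23 replaced by 16
print("mean with 18 and 16:", round(mean_ungrouped(lowered), 1))  # prints: mean with 18 and 16: 23.3
```

`mean_continuous` simply reuses the discrete formula with midpoints in place of the xi's, which mirrors the lecture's point that grouping discards the individual values and yields only an approximate mean.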