DESCRIPTIVE STATISTICS-1
Lecture - 5
Prof. K.K. Achary
YENEPOYA RESEARCH CENTRE

Any finite or infinite collection of individuals, not necessarily animate, that is subject to a statistical study is called a population (also called a universe).

Examples:
• Population of students (quite general)
• Population of students in YU (specified to an institution)
• Population of tobacco-using students in Mangalore

Populations are usually described in terms of their observational units, their extent (coverage) and time. Careful definition of the population is essential in designing a research study.

In a research study, it may not be possible to collect information/data from the entire study population, because it may be too large. We then take a sample from the study population. A sample is a subset of individuals/subjects from the study population, generally a representative part of the population.

In a statistical study, we consider random (unbiased) samples. There are different methods of selecting random samples from a given population. The type of random sampling and the size of the sample are very important in a research study and should be predetermined.

The information that is measured/observed on every study subject is a variable. The collection of all such measurements from all subjects in the sample (population) is called the sample data (population data).

Example:
Sample: tobacco-using students in YU
Study variables: age, gender, tobacco use

A descriptive measure of a set of values (observations/data set) is a representative value (generally a single number) that summarizes the characteristic properties/features of the data. The common features observed are:
• concentration of data around a central value
• scatter (dispersion/spread) around a central value
• asymmetry in the data distribution
• peakedness of the data distribution

The ‘central value’ and ‘spread/dispersion’ are numerical summaries of the data. The center of a data set is commonly called the average.
There are many ways to describe the average value of a distribution. The most appropriate measures of center and spread depend on the shape of the distribution and the scale of measurement. Once the characteristics of the distribution are known, we can analyze the data for interesting features, including unusual data values, called outliers.

Consider the following data sets.
Age at first child birth: 22, 24, 24, 23, 21, 25, 26, 23, 24, 25, 26, 26, 23, 24, 24
Blood cholesterol: 145, 145, 143, 189, 180, 196, 145, 198, 205, 209, 227, 222, 292, 302, 253, 233, 283
Day temperature: 27, 26, 27, 28, 29, 29, 29, 30, 30, 30, 29, 30, 31, 31, 31, 32, 31, 32, 32, 35
What are your general comments?

The descriptive measures can be computed for the whole mass of data related to a study, i.e. the population (if available?), or only for a representative part of the population, i.e. a sample (random/non-random sample). If we compute a measure from a random sample, it is termed a “statistic” (descriptive statistic). The measures describing the characteristics of a population are called parameters. Using the sample data we estimate the parameters. The functions of sample observations which estimate the parameters are called ‘estimators’/‘statistics’. The numerical value of an estimator is called an ‘estimate’.

Measures of Central Tendency (Averages)

Central tendency is the tendency of the data/observations to cluster around a central value. This central value is also called an ‘average’.

Some of the measures of central tendency are:
• Mean (arithmetic mean, or average value as commonly used)
• Median
• Mode
We also have positional averages such as quartiles, deciles and percentiles.

Arithmetic mean:
The arithmetic mean of a sample (or population) is the “average/mean” of all observations in the sample/population. It is given by

Arithmetic mean = (Total of all observations in the sample/population) / (Number of observations in the sample/population)
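The vocabulary above (parameter, estimator, estimate), together with the definition of the arithmetic mean just given, can be illustrated with a small Python sketch. Purely for illustration, it assumes that the 20 day-temperature readings listed above form a complete population; the sample size of 8, the random seed and the function name `sample_mean` are likewise assumptions made here, not part of the lecture.

```python
import random


def sample_mean(observations):
    """Estimator: a function of the sample observations (total / count)."""
    return sum(observations) / len(observations)


# Assumption: treat the 20 day-temperature readings as the whole population.
population = [27, 26, 27, 28, 29, 29, 29, 30, 30, 30,
              29, 30, 31, 31, 31, 32, 31, 32, 32, 35]

# Parameter: the mean of the entire population.
population_mean = sum(population) / len(population)

# Simple random sample without replacement (seed fixed for repeatability).
rng = random.Random(5)
sample = rng.sample(population, 8)

# Estimate: the numerical value the estimator takes on this particular sample.
estimate = sample_mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"estimate (sample mean):      {estimate:.2f}")
```

A different seed gives a different sample and, in general, a different estimate, while the parameter stays fixed.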
Let there be n values in the sample, recorded as x1, x2, x3, …, xn (you can also use y1, y2, y3, …, yn).
Sum or total of the n observations = x1 + x2 + x3 + … + xn = ∑xi

$$\text{Mean} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

This formula is used to compute the mean of ungrouped data. (What is ungrouped data? What is grouped data?)

Computation steps:
• Add all observations (sum up).
• Find the number of observations in the sample data set (n).
• Divide the total by n. This value is the mean.

Example: Consider the age data.
22, 24, 24, 23, 21, 25, 26, 23, 24, 25, 26, 26, 23, 24, 24
Here n = 15 and the sum (total) is 360. Hence mean = 360/15 = 24.
The mean age at first child birth is 24 years.

Compute the mean level of blood cholesterol.

Computation of mean in grouped data:
Grouped data may be in the form of a discrete frequency distribution or a continuous frequency distribution. Let us first consider a discrete frequency distribution.

Score level (xi)        0    1    2    3    4    5   Total
No. of students (fi)    3   12   23   46   18    6     108

The mean is computed the same way as before.
Total of all observations = 0×3 + 1×12 + 2×23 + 3×46 + 4×18 + 5×6 = 298
No. of observations = 108
Mean = 298/108 = 2.759259 ≈ 2.76

For the above case, a computational formula can be developed as follows. Let the distinct values be x1, x2, x3, …, xk, where xi repeats fi times. Then the total of all observations is x1f1 + x2f2 + … + xkfk. The total number of observations is f1 + f2 + … + fk = n. Hence the formula for the mean is:

$$\text{Mean} = \bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{n}$$

Computation of mean from a continuous frequency table (distribution):
Recollect the method of constructing a frequency table when observations are recorded on a continuous variable. Let the midpoint of the i-th class interval be denoted by xi and the corresponding class frequency by fi. Then the mean can be computed using the above formula, taking the midpoint values for the xi's and f1 + f2 + … + fk = n, where k is the number of classes.

$$\text{Mean} = \bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{n}$$

Remember that we have lost information on the individual values.
We assume that in every interval the midpoint is the representative value. Hence we get only an approximate value (estimate) of the mean; it is not the actual/true value.

• For a given set of data, the mean is unique and it is computed using all the observations.
• Computation is easy.
• It can be computed for interval and ratio scale data.
• Extreme values influence the mean, and it may be completely distorted by such values.

Example: Consider the ‘age of mothers’ data.
22, 24, 24, 23, 21, 25, 26, 23, 24, 25, 26, 26, 23, 24, 24
The mean age is 24 years.
Suppose two observations (say the last two) are 39 instead. The revised total is 360 - 24 - 24 + 39 + 39 = 390, and the mean is 390/15 = 26 years.
Suppose instead that 18 and 16 replace 21 and 23. The revised total is 360 - 21 - 23 + 18 + 16 = 350, and the mean age becomes 350/15 ≈ 23.3 years.

An important note
• The mean value obtained may not be one of the values in the data set.
• If the data set contains many extreme values (outliers), then the mean is not an appropriate measure of central tendency/average.
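The computations in this lecture can be sketched in Python. This is a minimal illustration rather than part of the original lecture; the helper names `mean_ungrouped` and `mean_grouped` are hypothetical and chosen here only for readability. The sketch reproduces the age example, the discrete frequency distribution example, and the effect of replacing observations with more extreme values.

```python
def mean_ungrouped(values):
    """Arithmetic mean of raw (ungrouped) observations: total / n."""
    return sum(values) / len(values)


def mean_grouped(xs, fs):
    """Mean from a frequency table: sum(x_i * f_i) / sum(f_i).

    For a continuous frequency table, pass the class midpoints as xs;
    the result is then only an approximation of the true mean.
    """
    n = sum(fs)
    return sum(x * f for x, f in zip(xs, fs)) / n


# Age at first child birth (ungrouped data from the lecture)
ages = [22, 24, 24, 23, 21, 25, 26, 23, 24, 25, 26, 26, 23, 24, 24]
print(mean_ungrouped(ages))  # 24.0

# Discrete frequency distribution: score level and number of students
scores = [0, 1, 2, 3, 4, 5]
students = [3, 12, 23, 46, 18, 6]
print(round(mean_grouped(scores, students), 2))  # 2.76

# Effect of extreme values: the last two observations become 39
ages_high = ages[:-2] + [39, 39]
print(mean_ungrouped(ages_high))  # 26.0

# Observations 21 and 23 are replaced by 18 and 16
ages_low = list(ages)
ages_low[ages_low.index(21)] = 18
ages_low[ages_low.index(23)] = 16
print(round(mean_ungrouped(ages_low), 1))  # 23.3
```

Changing only two of the fifteen observations moves the mean from 24 to 26 years, which is the sensitivity to extreme values noted above.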