2012 --STAT1010 Assignment 2-- Prof. Tsang

1. For the following population of size N = 15 scores: 3, 4, 4, 5, 6, 6, 6, 7, 7, 8, 9, 9, 9, 10, 11;
a. Sketch a histogram showing the population distribution.
b. Find and locate the value of the population mean in your sketch.
c. Compute the variance and standard deviation for the population.
d. Choose any sample of size n = 8 (you can include any 8 of the 15 scores above in your sample) from the population and compute the mean, variance, and standard deviation for this sample.
e. Compare the variance and standard deviation you got in (c) and (d). Are they the same, or which set is larger?
f. Now suppose you make a mistake in your calculation in (d): you divide the sum of squared deviations by n instead of n-1 in the formula for the variance. What do you get for the sample variance and standard deviation? Is the result larger or smaller than the result in (d)?
g. Compare the variance and standard deviation you got in (c) and (f). Are they the same, or which set is larger?
h. Which set of results for the sample variance and standard deviation ((d) or (f)) is a better estimate of the population variance and standard deviation?

Solution:
a. Number of classes: $2^k \ge 15 \Rightarrow k = 4$
b. Mean = 6.93
c. Population variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2 = 5.26$
   Population S.D.: $\sigma = \sqrt{\sigma^2} = 2.29$
d. We can choose the first 8 values as a sample: 3, 4, 4, 5, 6, 6, 6, 7

   Statistics (VAR00001)
   N Valid          8
   N Missing        0
   Mean             5.1250
   Std. Deviation   1.35620
   Variance         1.839

e. They are not the same; the values in (c) are larger.
f. The sum of squared deviations is 12.875, so Variance: 12.875/8 = 1.609, Std. Deviation: 1.269. The result is smaller than the result in (d).
g. The result in (c) is larger.
h. (d) is better.

2. Use the same set of population data of size N = 15 as in problem (1) to determine the following:
a. The smallest and largest scores, and the range of the dataset
b. The first quartile, median and the third quartile (using interpolation)
c. The interquartile range (IQR)
d.
Draw the Box-and-whisker plot for the dataset.
e. The skewness of the distribution using the formula: Sk = (mean - median)/σ

Solution:
a. Statistics (VAR00001)
   N Valid     15
   N Missing   0
   Range       8.00
   Minimum     3.00
   Maximum     11.00

b. Using interpolation:
   The first quartile: i = 15 × 25% = 3.75; the 3rd value is 4, the 4th value is 5, so Q1 = 4 + (5 - 4) × 0.75 = 4.75
   Median: i = 15 × 50% = 7.5; the 7th value is 6, the 8th value is 7, so Median = 6 + (7 - 6) × 0.5 = 6.5
   The third quartile: i = 15 × 0.75 = 11.25; the 11th value is 9, the 12th value is 9, so Q3 = 9
c. IQR = Q3 - Q1 = 9 - 4.75 = 4.25
d. [Box-and-whisker plot]
e. Sk = (mean - median)/σ = (6.93 - 6.5)/2.29 ≈ 0.19

3. The following is a table for countries with the highest CO2 emissions in 2006 and their GDP.

   Country          CO2 emissions in 2006 (MTon)   GDP in 2006 (trillions US$)
   China            6103                           2.7
   US               5752                           13.4
   Russia           1564                           1.0
   India            1510                           0.9
   Japan            1293                           4.4
   Germany          805                            2.9
   United Kingdom   568                            2.1
   Canada           544                            1.3
   South Korea      475                            1.3
   Italy            474                            1.9

a. Find the mean, variance and standard deviation of CO2 emissions for these countries.
b. Find the coefficient of variation for the CO2 emission data.
c. Find the "Five-number summary" of CO2 emissions for these countries.
d. Find the mean, variance and standard deviation of GDP for these countries.
e. Find the coefficient of variation for the GDP data.
f. Find the "Five-number summary" of GDP for these countries.
g. Construct a scatter-plot using the GDP as the horizontal variable and the CO2 emissions as the vertical variable. Can you observe any relationship between these 2 variables from your plot?
h. Find the covariance between the CO2 emissions and the GDP for these countries and their correlation coefficient.

Solution: Answer is given by SPSS.
a. Statistics (VAR00001)
   N Valid          10
   N Missing        0
   Mean             1908.80
   Std. Deviation   2160.55
   Variance         4667985.51

b. Coefficient of variation = S.D./Mean = 2160.55/1908.80 = 1.132
c. Statistics (VAR00001)
   N Valid          10
   N Missing        0
   Minimum          474.00
   Maximum          6103.00
   Percentiles 25   526.7500
   Percentiles 50   1049.0000
   Percentiles 75   2611.0000

d.
Statistics (VAR00002)
   N Valid          10
   N Missing        0
   Mean             3.190
   Std. Deviation   3.7427
   Variance         14.008

e. Coefficient of variation = S.D./Mean = 3.7427/3.190 = 1.173
f. Statistics (VAR00002)
   N Valid          10
   N Missing        0
   Minimum          .9
   Maximum          13.4
   Percentiles 25   1.225
   Percentiles 50   2.000
   Percentiles 75   3.275

g. There is a slight relationship between these 2 variables in the plot.
h. Covariance: $\frac{1}{n-1}\sum (x_i - \bar{x})(y_i - \bar{y}) = 5225.44$
   Correlation coefficient: covariance/($S_x S_y$) = 5225.44/(2160.55 × 3.7427) ≈ 0.646

4. Following is a table containing the raw scores of 45 students in the mid-term test and the final examination. Use an Excel spreadsheet or other software you know to:
(a) Calculate the mean, median, mode, variance, standard deviation and the coefficient of variation for the two sets of scores. What do you learn from these descriptive statistics?
(b) Construct histograms with a bin width of 10 to show the distributions of the two sets of scores.
(c) Find the Five Number Summary and the IQR of the two sets of scores from the histogram.
(d) Construct a scatter-plot using the final examination scores as the horizontal variable and the mid-term test scores as the vertical variable. Can you observe any relationship between these 2 variables from your plot? Find the covariance and correlation coefficient between these 2 variables.
(e) Transform the raw scores of the two datasets to standardized distributions using the z-scores and compare the 2 distributions. What conclusion can you draw from it?
(f) The student with ID = 10 got a raw score of 85 in both the mid-term and final exam. Can you determine if the student has improved his/her standing in the class or not?

   Student ID   Mid-term test   Final Exam
   1            83.3            93.5
   2            89.5            95
   3            82.9            89
   4            97.4            86
   5            71.7            77.5
   6            81.0            82.7
   7            94.0            95
   8            82.2            96.2
   9            84.3            95
   10           85.0            85
   11           73.6            66
   12           77.4            93.0
   13           87.2            83.9
   14           87.7            73
   15           78.0            87.6
   16           84.4            92.5
   17           81.4            85
   18           89.6            82.5
   19           82.5            81.2
   20           86.4            100
   21           80.0            79.2
   22           91.0            92.5
   23           78.9            81.4
   24           87.4            82.5
   25           73.3            81.2
   26           76.5            77.5
   27           89.1            76.2
   28           69.0            75
   29           75.0            80.0
   30           71.0            63.7
   31           84.1            54
   32           70.7            87.5
   33           65.8            70
   34           65.6            73.7
   35           68.5            76.2
   36           61.8            58
   37           53.4            66.2
   38           59.8            83.0
   39           44.0            65
   40           59.3            56.2
   41           46.8            57.5
   42           52.2            72.5
   43           49.0            75
   44           64.0            57
   45           36.0            43.8

Solution:
a. Statistics
                    Mid-term test   Final Exam
   N Valid          45              45
   N Missing        0               0
   Mean             74.482          78.320
   Median           78.000          81.200
   Mode             36.0 (a)        95.0
   Std. Deviation   14.3558         13.0473
   Variance         206.088         170.233
   (a) Multiple modes exist. The smallest value is shown.

   Coefficients of variation:
   for Mid-term test: 14.3558/74.482 = 0.193
   for Final Exam: 13.0473/78.320 = 0.167
   We learn: (i) students' scores improved a bit in the final exam, (ii) variation in scores also decreased.

b. [Histograms of the two sets of scores] Both distributions are left-skewed.

c. Statistics
                    Mid-term test   Final Exam
   N Valid          45              45
   N Missing        0               0
   Minimum          36.0            43.8
   Maximum          97.4            100.0
   Percentiles 25   65.700          71.250
   Percentiles 50   78.000          81.200
   Percentiles 75   84.700          87.550

   IQR = Q3 - Q1: 84.700 - 65.700 = 19.000 for the mid-term test; 87.550 - 71.250 = 16.300 for the final exam.

d. There is some relationship between these 2 variables in the plot. In general, students' scores in the mid-term and the final are positively related: students with better mid-term scores tend to have better final scores.
   Covariance: 130.09
   Correlation coefficient: 0.695

(e) [Histograms of the standardized (z-score) distributions] More students are one σ above the mean value in the final exam.

f. In the mid-term, $z_m = \frac{85 - 74.482}{14.3558} = 0.733$
   In the final exam, $z_f = \frac{85 - 78.320}{13.0473} = 0.512$
   Since $z_f < z_m$, the student has not improved his/her standing in the class.

5.
A manufacturer of tires wants to advertise a mileage interval that excludes no more than 5% of the mileage on tires he sells (at least 95% of the tires he sells will be able to achieve a mileage in that interval). Suppose that μ = 25000 and σ = 4000. What interval would you suggest by applying Chebyshev's theorem? What would be the answer if it is known that the mileage follows the normal distribution? Give your best estimate.

Solution: Applying Chebyshev's theorem, $1 - \frac{1}{k^2} = 0.95 \Rightarrow k = \sqrt{20}$
The interval is $[25000 \pm 4000\sqrt{20}] = [7111.46, 42888.54]$
If the mileage follows the normal distribution, more than 95% of the tires will be able to achieve a mileage in the interval [µ-2σ, µ+2σ], or [17000, 33000].
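The two intervals in problem 5 can be checked numerically. The following is a minimal Python sketch, not part of the original SPSS-based solution; the function names `chebyshev_interval` and `normal_interval` are hypothetical helpers introduced here for illustration, and the normal case uses the exact 95% quantile rather than the rounded k = 2 used in the solution above.

```python
import math
from statistics import NormalDist  # standard library, Python 3.8+


def chebyshev_interval(mu, sigma, coverage):
    """Interval mu +/- k*sigma, with k solved from 1 - 1/k^2 = coverage."""
    k = 1.0 / math.sqrt(1.0 - coverage)
    return mu - k * sigma, mu + k * sigma


def normal_interval(mu, sigma, coverage):
    """Central interval of a normal distribution holding `coverage` of the mass."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return mu - z * sigma, mu + z * sigma


mu, sigma = 25000.0, 4000.0  # values given in problem 5

cheb = chebyshev_interval(mu, sigma, 0.95)
norm = normal_interval(mu, sigma, 0.95)

# Share of a normal population that falls inside mu +/- 2*sigma
dist = NormalDist(mu, sigma)
two_sigma_coverage = dist.cdf(mu + 2 * sigma) - dist.cdf(mu - 2 * sigma)

print("Chebyshev interval:", [round(v, 2) for v in cheb])
print("Normal 95% interval:", [round(v, 2) for v in norm])
print("Normal coverage of mu +/- 2 sigma:", round(two_sigma_coverage, 4))
```

The Chebyshev interval is much wider because it holds for any distribution with this mean and standard deviation, while the normal interval relies on the stated distributional assumption.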