METR702 YILIN LU LAB 2 FALL 2014

Lab 2 – Exploring the Central Limit Theorem

Part A: Thought experiments with synthetic distributions:

1. To get an extreme value as the average of two numbers, both numbers have to be close to the same extreme (both close to 1 or both close to 0). However, the chance that two randomly chosen numbers are both extreme on the same side is very small. As a result, extreme values are under-represented and middle values are over-represented when we average 2 random numbers.

2. The distributions of “avg of 10” and “avg of 20” are more bell-shaped and narrower than the distribution of “avg of 2”.

[Figure: distributions on a 0 to 1 axis, each with a fitted normal curve. The parent distribution is labeled Normal (Mean, Stdev); the averages are labeled Normal (Average of Possible Means, SE).]
Parent distribution: Normal (0.54223, 0.30044)
avg of 2: Normal (0.55077, 0.20357)
avg of 10: Normal (0.52916, 0.09646)
avg of 20: Normal (0.54038, 0.0697)

3. The means are all around 0.5 and not substantially different. This is because the mean of the measurements from the uniform parent distribution is already close to the true mean, and averaging does not shift it.

4. The standard deviations decrease because the distributions get narrower as we average more points from the samples: high and low values within each average tend to cancel out.

5. Stdev(x) of parent distribution = 0.30044
SE (m-avg mean) = Stdev(x) / SQRT(m)
m=2, SE (2-avg mean) = 0.30044 / SQRT(2) = 0.2124, close to 0.20357
m=10, SE (10-avg mean) = 0.30044 / SQRT(10) = 0.0950, close to 0.09646
m=20, SE (20-avg mean) = 0.30044 / SQRT(20) = 0.0672, close to 0.0697
The observed SEs are pretty close to the prediction of the Central Limit Theorem rule.

6.
[Figure: distributions on a 0 to 1 axis, each with a fitted normal curve. The parent distribution is labeled Normal (Mean, Stdev); the averages are labeled Normal (Average of Possible Means, SE).]
Parent distribution: Normal (0.22402, 0.26943)
avg of 2: Normal (0.21219, 0.18546)
avg of 10: Normal (0.23125, 0.08641)
avg of 20: Normal (0.21925, 0.05751)

7. The distribution of the 2-point average does not look like a normal distribution: it is not bell-shaped, and its peak is shifted toward the left (low values). The 10-point average looks more like a normal distribution, and the 20-point average is more acceptably “normal”. This may be because, as the number of points in the average increases, the distribution becomes narrower around its peak.

8. Stdev(x) of parent distribution = 0.26943
SE (m-avg mean) = Stdev(x) / SQRT(m)
m=2, SE (2-avg mean) = 0.26943 / SQRT(2) = 0.1905, close to 0.18546
m=10, SE (10-avg mean) = 0.26943 / SQRT(10) = 0.0852, close to 0.08641
m=20, SE (20-avg mean) = 0.26943 / SQRT(20) = 0.0602, close to 0.05751
The observed SEs are close to the prediction of the Central Limit Theorem rule, about as close as the ones with the uniform parent distribution.

9. Yes, I do think averaging more measurements makes the distribution more “normal”.

Part B: Can you estimate the average flow from a watershed?

10.
[Figure: distributions of flow (L/s) on a 0 to 10 000 axis, each with a fitted normal curve. The parent distribution is labeled Normal (Mean, Stdev); the averages are labeled Normal (Average of Possible Means, SE).]
Flow (L/s) - Parent Distribution: Normal (1491.85, 2052.6)
avg of 10 (L/s): Normal (1487.53, 636.159)
avg of 50 (L/s): Normal (1485.76, 290.327)
As the number of samples being averaged increases, the peak of the distribution gets narrower and the distribution gets more “normal”.

11.
CLT Rule: SE (m-avg mean) = Stdev(x) / SQRT(m)
Stdev(x) = 2052.6 [L/s]

Number of points in    Standard error    Standard error expected
the average (m)        (SE) [L/s]        from the CLT Rule [L/s]
1                      2052.6            2052.6
10                     636.159           649.0891125
20                     460.992           458.9753131
50                     290.327           290.2814758
100                    197.17303         205.26

The rule still seems reasonably accurate: the expected and observed SE agree to within roughly 10 L/s (the largest difference is about 13 L/s, at m = 10).

12. Roughly, I need 100 samples in the average to get a standard error of about 200 L/s.

13. SE (m-avg mean) = Stdev(x) / SQRT(m)
SQRT(m) = Stdev(x) / SE (m-avg mean)
m = [ Stdev(x) / SE (m-avg mean) ]^2
Stdev(x) = 2052.6 [L/s]
SE (m-avg mean) = 15 [L/s]
m = [ 2052.6 (L/s) / 15 (L/s) ]^2 = 18725
So, to get a standard error of about 15 L/s (about 1% of the mean flow of 1491.85 L/s), we need about 18700 samples in the average.

14. This result implies that detecting the effects of climate change or land-use change with high precision requires a lot of samples. I would need to repeat the random measurement continuously, which is very hard for one person to do alone or by hand, and I would not even know whether the measured samples are representative of the whole population. Alternatively, I could accept a larger standard error and use fewer samples, but the resulting average would be much less precise and possibly inaccurate relative to the true mean if the parent distribution (the distribution of the population, which we do not know) is not uniform; that is, the results risk being biased.
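
The CLT rule used in questions 5, 8, 11 and 13 can also be checked numerically. The following is a minimal Python sketch, not the software used in the lab. The function names are hypothetical, and the simulation assumes an ideal uniform parent distribution on 0 to 1 (whose Stdev is 1/SQRT(12), about 0.2887) rather than the measured samples from Part A.

```python
import math
import random
import statistics


def clt_standard_error(stdev_x, m):
    """SE predicted by the CLT rule: Stdev(x) / SQRT(m)."""
    return stdev_x / math.sqrt(m)


def samples_needed(stdev_x, target_se):
    """Invert the CLT rule: m = (Stdev(x) / SE)^2 (not rounded)."""
    return (stdev_x / target_se) ** 2


def simulated_standard_error(m, n_averages=20000, seed=0):
    """Stdev of many m-point averages drawn from an ideal uniform(0, 1) parent.

    Assumption: the parent is an ideal uniform distribution, not the
    lab's measured samples.
    """
    rng = random.Random(seed)
    averages = [
        sum(rng.random() for _ in range(m)) / m
        for _ in range(n_averages)
    ]
    return statistics.stdev(averages)


# Part B: expected SE for the flow data (Stdev(x) = 2052.6 L/s, question 11).
stdev_flow = 2052.6
for m in (1, 10, 20, 50, 100):
    print(f"m = {m:3d}  expected SE = {clt_standard_error(stdev_flow, m):.1f} L/s")

# Question 13: samples needed for a standard error of 15 L/s.
print(f"samples needed for SE = 15 L/s: {samples_needed(stdev_flow, 15.0):.0f}")

# Part A: simulated SE versus the CLT rule for an ideal uniform parent.
stdev_uniform = 1 / math.sqrt(12)
for m in (2, 10, 20):
    simulated = simulated_standard_error(m)
    predicted = clt_standard_error(stdev_uniform, m)
    print(f"m = {m:2d}  simulated SE = {simulated:.4f}  CLT rule = {predicted:.4f}")
```

Because the simulation draws a finite number of averages, the simulated SE should land near the CLT value but not exactly on it, which is the same behavior seen in the lab's fitted distributions.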