STATISTICS 1403 PROBABILITY AND STATISTICS FOR THE BIOSCIENCES

PROBLEM SET: GAUSSIAN AND LAPLACE FENCES

Abstract. What qualifies a sample observation as an outlier? Explore two common distributions to get an idea.

1. Introduction

An outlier is an observation that is "extremely far" from the center of its distribution. In exploratory data analysis, we often use the box-and-whisker plot with fences to identify outliers (observations outside the fences). We define the lower and upper fences from the quartiles and the interquartile range as

$$L = Q_1 - 1.5\,\mathrm{IQR}, \qquad U = Q_3 + 1.5\,\mathrm{IQR}, \qquad \mathrm{IQR} = Q_3 - Q_1.$$

Notice that this definition attaches no probability statement to the appearance of an outlier. This exercise will find such statements.

2. The Gaussian Distribution

Calculate the probability that an observation from a sample of normally distributed observations is identified as an outlier:

(a) Find the z-scores for the first and third quartiles, Q1 and Q3 (the 25% and 75% points).
(b) Find the number of standard deviation units making up the interquartile range, IQR.
(c) Find the z-scores that correspond to the lower and upper fences, L and U.
(d) For normally distributed data, what is the probability that an observation x is an outlier, i.e., x < L or x > U? That is, find p = Pr(X < L) + Pr(X > U).
(e) Using a binomial model, estimate the sample size n such that the probability of seeing at least one outlier, Pr(N > 0), is at least 90%. Here N is the number of outliers among n independent observations, so N ~ Binomial(n, p). The required n is the smallest integer satisfying the inequality

$$\Pr(N > 0) = 1 - \Pr(N = 0) = 1 - (1 - p)^{n} \ge 0.9.$$

3. The Laplace Distribution

The Laplace distribution is a symmetric model for absolute deviations from the mean, µ. Its CDF and inverse CDF (or quantile function) are

$$F(x) = \Pr(X < x) = \begin{cases} \tfrac{1}{2}\, e^{(x-\mu)/b}, & x < \mu, \\ 1 - \tfrac{1}{2}\, e^{-(x-\mu)/b}, & x \ge \mu, \end{cases}$$

$$F^{-1}(p) = \mu - b\,\operatorname{sign}(p - 0.5)\,\ln\bigl(1 - 2\,|p - 0.5|\bigr),$$

where b > 0 is a scale factor.
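The Laplace CDF and quantile function can be checked against each other numerically. The following is a minimal sketch, assuming Python 3 and only the standard `math` module; the function names `laplace_cdf` and `laplace_quantile` are hypothetical, introduced here for illustration. It implements the two formulas for a location µ and scale b and confirms that F(F⁻¹(p)) returns p.

```python
import math


def laplace_cdf(x: float, mu: float = 0.0, b: float = 1.0) -> float:
    """Laplace CDF F(x) = Pr(X < x), split at the location mu."""
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)


def laplace_quantile(p: float, mu: float = 0.0, b: float = 1.0) -> float:
    """Inverse CDF: mu - b * sign(p - 0.5) * ln(1 - 2|p - 0.5|), for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # copysign treats sign(0) as +1; at p = 0.5 the log term is 0, so this is harmless.
    sign = math.copysign(1.0, p - 0.5)
    return mu - b * sign * math.log(1.0 - 2.0 * abs(p - 0.5))


# Round trip: F(F^{-1}(p)) should return p up to floating-point error.
for p in (0.05, 0.25, 0.50, 0.75, 0.95):
    x = laplace_quantile(p)
    print(f"p = {p:.2f}   x = {x:+.4f}   F(x) = {laplace_cdf(x):.4f}")
```

The sign function is applied through `math.copysign` so that the single formula covers both tails, mirroring the piecewise CDF.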
As before, calculate the probability that an observation from a sample of Laplace (double-exponential) distributed observations is identified as an outlier. Set µ = 0 and b = 1 to simplify the calculations, so that all scores are in standard units, i.e., multiples of the scale factor b. (Note that the standard deviation of the Laplace distribution is b√2, not b.)

(a) Find the scores for the first and third quartiles, Q1 and Q3 (the 25% and 75% points).
(b) Find the number of standard units making up the interquartile range, IQR.
(c) Find the scores that correspond to the lower and upper fences, L and U.
(d) For Laplace distributed data, what is the probability that an observation x is an outlier, i.e., x < L or x > U?
(e) Using a binomial model, estimate the sample size n such that the probability of seeing at least one outlier, Pr(N > 0), is at least 90%. (Use the same inequality as for the Gaussian distribution.)

4. Comparison

(a) Draw parallel box-and-whisker plots, labeled with standard scores, for the Gaussian and Laplace distributions. (Since you have no data, there will be no outliers.)
(b) Find the excess kurtosis of the Gaussian and Laplace distributions in any standard statistical reference (cite it in your write-up).
(c) Compare the outlier probabilities for the two distributions. Based on this comparison, what do you think the kurtosis statistic measures?

5. Your Analysis

Your analysis should be handwritten. Include a brief problem statement (in your own words; do NOT copy these instructions) and a brief narrative explaining your calculations and findings. Submit your write-up via Blackboard as a scanned or saved PDF or Word .DOC file.
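Hand calculations for parts (a) to (e) of the Gaussian and Laplace sections can be cross-checked with a short script. This is a minimal sketch, assuming Python 3.8 or later (for `statistics.NormalDist`), a standard normal (µ = 0, σ = 1) and a standard Laplace (µ = 0, b = 1). The helper names `tukey_fences`, `min_sample_size`, and `outlier_summary` are hypothetical, and the Laplace functions are restated so the block runs on its own. It computes the quartiles, IQR, fences, the outlier probability p = Pr(X < L) + Pr(X > U), and the smallest n with 1 − (1 − p)^n ≥ 0.9.

```python
import math
from statistics import NormalDist


def tukey_fences(quantile, k: float = 1.5):
    """Return (Q1, Q3, IQR, L, U) for a distribution given its quantile function."""
    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    return q1, q3, iqr, q1 - k * iqr, q3 + k * iqr


def min_sample_size(p: float, target: float = 0.9) -> int:
    """Smallest integer n with 1 - (1 - p)**n >= target under a Binomial(n, p) model."""
    n = max(1, math.ceil(math.log(1.0 - target) / math.log(1.0 - p)))
    # Nudge n to absorb floating-point rounding near an integer boundary.
    while 1.0 - (1.0 - p) ** n < target:
        n += 1
    while n > 1 and 1.0 - (1.0 - p) ** (n - 1) >= target:
        n -= 1
    return n


def laplace_quantile(p: float) -> float:
    """Standard Laplace (mu = 0, b = 1) inverse CDF."""
    return -math.copysign(1.0, p - 0.5) * math.log(1.0 - 2.0 * abs(p - 0.5))


def laplace_cdf(x: float) -> float:
    """Standard Laplace (mu = 0, b = 1) CDF."""
    return 0.5 * math.exp(x) if x < 0 else 1.0 - 0.5 * math.exp(-x)


def outlier_summary(quantile, cdf) -> dict:
    """Fences, outlier probability p = Pr(X < L) + Pr(X > U), and minimal n."""
    q1, q3, iqr, lower, upper = tukey_fences(quantile)
    p = cdf(lower) + (1.0 - cdf(upper))
    return {"Q1": q1, "Q3": q3, "IQR": iqr, "L": lower, "U": upper,
            "p": p, "n": min_sample_size(p)}


normal = NormalDist()  # standard normal: mu = 0, sigma = 1
results = {
    "Gaussian": outlier_summary(normal.inv_cdf, normal.cdf),
    "Laplace": outlier_summary(laplace_quantile, laplace_cdf),
}

for name, r in results.items():
    print(f"{name:8s} Q1={r['Q1']:+.4f} Q3={r['Q3']:+.4f} IQR={r['IQR']:.4f} "
          f"L={r['L']:+.4f} U={r['U']:+.4f} p={r['p']:.5f} n={r['n']}")
```

Because both distributions are symmetric about 0, Q1 = −Q3, so U = Q3 + 1.5(2 Q3) = 4 Q3 and L = −U in each case; only one tail beyond U actually needs to be computed and then doubled.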