Applied & Comp Math 625.403
“Statistical Methods and Data Analysis”
Dr. Jackie Telford
Class #1

General Information
Bring the textbook to every class.
– Try to read or skim the chapter(s) before lecture.
Homework is collected each class period (except on exam days); it will be graded and returned the next class period. See the Syllabus for details of the schedule.
For a copy of the viewgraphs (2 per page) or lecture notes, go to the “Specific Course Information” link on the www.apl.jhu.edu website
– can bring a copy to class for easier note-taking
– can print in B&W (color is usually not necessary)
– can print duplex to save paper

Scientific Method (cyclic process)
1. Collection of data through observations and experiments
2. Recognition of patterns in those data
3. Formulation of hypotheses and theories to systematize those patterns, often using mathematical logic and equations
4. Predictions that can be tested by more observations and experiments
from: “Great Principles of Science”, Robert Hazen, The Teaching Company

Predicting the Locations of Planets
1. Ptolemy (c. 100-170 AD): Earth-centered solar system; used circular orbits plus smaller epicycles (to explain “retrograde” motion)
2. Copernicus (1473-1543): Sun-centered solar system, but still assumed circular orbits
3. Tycho Brahe (1546-1601): improved instruments increased the precision and accuracy of observations; discovered errors in the predictions of both the Ptolemaic and Copernican systems
4. Kepler (1571-1630): based on Tycho Brahe’s data, derived three laws of planetary motion, including that the planets orbit in ellipses with the Sun at one focus

Scientific Dilemma
Increasingly precise and accurate observations (decreasing the measurement error) can lead to improved prediction capability. This works well for deterministic systems governed by Newtonian physics (the universal law of gravitation).
What if there is inherent unpredictability, so that the measurement error is small compared to the randomness in the phenomenon being studied? (manufacturing variation, a signal embedded in noise, genetics, public opinion polls, …)
How do you get reliable information? Answer: statistical methods.

Chapter 1: Introduction
Statistics is the science of collecting and analyzing data for the purpose of drawing conclusions and making decisions.
Statistical tasks:
1. Collecting data
2. Summarizing and exploring data
3. Drawing conclusions and making decisions based on data as the basis for taking action
Statistical terms:
Descriptive statistics and exploratory data analysis
Inferential statistics (involves fitting models to data)

Statistical Terms (Cont.)
A population is the collection of all units of interest. A sample is a subset of a population that is actually observed.
A measurable property or attribute associated with each unit of a population is called a variable.
A parameter is a numerical characteristic of a population. A statistic is a numerical characteristic of a sample. Statistics are used to infer the values of parameters.
A random sample gives an equal pre-assigned chance to every unit of the population to enter the sample.
In probability, we assume that the population and its parameters are known and compute the probability of drawing a particular sample. In statistics, we assume that the population and its parameters are unknown and use the sample to infer the values of the parameters.
Different samples give different estimates of population parameters (called sampling variability). Sampling variability leads to “sampling error”.
Probability is deductive. Statistics is inductive.
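To make sampling variability concrete, here is a minimal sketch in Python (an illustration of mine, not part of the course materials, which use JMP; the population shape, the sample size of 25, and the seed are all arbitrary choices):

    import random

    random.seed(1)  # fixed seed so the illustration is reproducible

    # A fully known "population": 10,000 units of one measurable variable.
    population = [random.gauss(50, 10) for _ in range(10_000)]
    mu = sum(population) / len(population)   # population parameter (the mean)

    # Each random sample yields a different statistic: sampling variability.
    for i in range(5):
        sample = random.sample(population, 25)   # random sample of n = 25
        xbar = sum(sample) / len(sample)         # sample statistic (the mean)
        print(f"sample {i + 1}: x-bar = {xbar:.2f}  (population mean = {mu:.2f})")

Each pass of the loop plays the role of a different investigator’s sample; the spread of the printed x-bar values around the population mean is the “sampling error” the slide refers to.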
Difference between Statistics and Probability
Probability: Given the information in the box, what is in your hand?
Statistics: Given the information in your hand, what is in the box?
from: Statistics, Norma Gilbert, W.B. Saunders Co., 1976

Statistical Software
SAS
SPSS
Minitab
Statistica
S-Plus
BMDP
Excel add-ons (not Microsoft “add-ins”)
JMP (used in class and in the computer lab)

Chapter 2: Review of Probability
Approaches to probability
– Classical approach (equally likely events, counting formulas)
– Frequentist (“Monte Carlo” the events)
– Personal or subjective approach (Bayesian)
– Axiomatic approach (mathematical)
Basic ideas of the axiomatic approach
– Sample space (the set of all possible things that could happen)
– Events (particular things that could happen)
– Union (e.g. in cards, either a 10 or a diamond)
– Intersection (e.g. in cards, the 10 of diamonds)
– Complement (e.g. in cards, neither a 10 nor a diamond)
– Disjoint or mutually exclusive events (e.g. in cards, a 10 and a 5: one card cannot be both)
– Inclusion (subset; e.g. in cards, “a 5” is a subset of “less than a 10”)

Axioms of Probability
Axioms:
1. P(A) ≥ 0
2. P(S) = 1, where S is the sample space
3. P(A ∪ B) = P(A) + P(B) if A and B are mutually exclusive
Theorems about probability can be proved from these axioms, and those theorems can be used in probability calculations.
P(A) = 1 − P(A^c) (see the “birthday problem” on p. 13)
Conditional probability and joint probability:
P(A | B) = P(A ∩ B) / P(B) and P(A ∩ B) = P(A | B) × P(B)
Events A and B are independent if P(A | B) = P(A), which implies P(A ∩ B) = P(A) × P(B).
• Example 2.11 on p. 17 (private or public school vs. Adv. Placement)
• Example 2.14 (“Let’s Make a Deal” problem) on p. 20

Sensor Problem
Assume there are two chemical hazard sensors, A and B. Let P(A falsely detects a hazardous chemical) = 0.05, and the same for B. What is the probability of both sensors falsely detecting a hazardous chemical?
P(A ∩ B) = P(A | B) × P(B) = P(A) × P(B) = 0.05 × 0.05 = 0.0025
– but only if A and B are independent (use different detection methods).
If A and B are both “fooled” by the same chemical substance, then
P(A ∩ B) = P(A | B) × P(B) = 1 × 0.05 = 0.05
– which is 20 times the rate of false alarms (same type of sensor).
DON’T assume independence without good reason!

AIDS Testing Example (made-up data)

                     AIDS   Not AIDS   Total
Test positive (+)      95        495     590
Test negative (−)       5       9405    9410
Total                 100       9900   10000

P(AIDS) = 100/10000 = .01 (prevalence)
Want these to be low:
P(+ | not AIDS) = 495/9900 = .05 (false positive rate)
P(− | AIDS) = 5/100 = .05 (false negative rate)
Want these to be high:
P(not AIDS | −) = 9405/9410 = .999 (negative predictive value)
P(AIDS | +) = 95/590 = .16 (positive predictive value)
but P(not AIDS | +) = 495/590 = .84 (incorrect predictability)
This is one reason why we don’t have mass AIDS testing.

Counting Formulas
Operation O_1 has n_1 outcomes, operation O_2 has n_2 outcomes, …, and operation O_k has n_k outcomes.
Multiplication rule: n_1 × n_2 × … × n_k is the total number of possible outcomes of O_1, O_2, …, O_k performed in sequence.
Permutations (ordered arrangements of distinct items):
1. n distinct items: n(n − 1)⋯(2)(1) = n!
2. r out of n distinct items: n(n − 1)⋯(n − r + 1) = n!/(n − r)!
   (n choices for position 1, n − 1 for position 2, …, n − r + 1 for position r)
Combinations (unordered arrangements of items):
3. r out of n items: n!/[(n − r)! r!] (dividing by r! “unorders” them)
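The counting formulas are easy to check numerically; a short sketch using Python’s standard math module (the choice of n = 10, r = 3 is arbitrary):

    import math

    n, r = 10, 3

    # Permutations of r out of n distinct items: n!/(n - r)!
    print(math.perm(n, r))                              # 720
    print(math.factorial(n) // math.factorial(n - r))   # 720, same result

    # Combinations: dividing by r! "unorders" the arrangements.
    print(math.comb(n, r))                              # 120
    print(math.perm(n, r) // math.factorial(r))         # 120, same result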
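The sensor and AIDS-testing slides above are both exercises in conditional probability. Here is a minimal sketch that recomputes the key quantities directly from the made-up joint counts in the table, so the definition P(A | B) = P(A ∩ B)/P(B) does all the work (the helper p() is mine, purely for illustration):

    # Joint counts from the AIDS-testing table: (test result, disease status).
    counts = {("+", "AIDS"): 95, ("+", "not AIDS"): 495,
              ("-", "AIDS"): 5,  ("-", "not AIDS"): 9405}
    total = sum(counts.values())   # 10,000 people

    def p(event):
        # P(event), where event is a predicate on (test, status) pairs
        return sum(n for pair, n in counts.items() if event(*pair)) / total

    prevalence = p(lambda t, s: s == "AIDS")      # 0.01
    p_positive = p(lambda t, s: t == "+")         # 0.059
    # P(AIDS | +) = P(AIDS and +) / P(+)
    ppv = p(lambda t, s: s == "AIDS" and t == "+") / p_positive
    print(f"P(AIDS) = {prevalence}, P(AIDS | +) = {ppv:.3f}")   # 0.01, 0.161

Even with only a 5% false-positive rate, a positive result leaves a 16% chance of disease, because the prevalence is so low.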
Suggestions for Solving Probability Problems
Draw a picture
– Venn diagram
– Tree or event diagram (Probabilistic Risk Assessment)
– Sketch
Write out all possible combinations, if feasible
Do a smaller-scale problem first
– Figure out the algorithm for the solution
– Increment the size of the problem by one and check the algorithm for correctness
– Generalize the algorithm (mathematical induction)

Random Variables
A random variable (r.v.) associates a unique numerical value with each outcome in the sample space.
Example: X = 1 if a coin toss results in a head, X = 0 if it results in a tail.
Discrete random variables: the number of possible values is finite or countably infinite: x_1, x_2, x_3, …
Probability mass function (p.m.f.): f(x) = P(X = x) (the sum over all possible values is always 1)
Cumulative distribution function (c.d.f.): F(x) = P(X ≤ x) = Σ_{k ≤ x} f(k)
• See Table 2.1 on p. 21 (p.m.f. and c.d.f. for two dice)
• See Figure 2.5 on p. 22 (p.m.f. and c.d.f. graphs for two dice)

Continuous Random Variables
An r.v. is continuous if it can assume any value from one or more intervals of real numbers.
Probability density function (p.d.f.) f(x):
f(x) ≥ 0
∫_{−∞}^{∞} f(x) dx = 1 (the area under the curve is always 1)
P(a ≤ X ≤ b) = ∫_a^b f(x) dx for any a ≤ b
• See Figure 2.6 on p. 23 (area under the curve between two values a and b)

Cumulative Distribution Function
The cumulative distribution function (c.d.f.), denoted F(x), of a continuous random variable is given by:
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(y) dy
It follows that f(x) = dF(x)/dx.
• See Example 2.16 on p. 24 (exponential distribution)

Expected Value
The expected value or mean of a discrete r.v. X, denoted by E(X), µ_X, or simply µ, is defined as:
E(X) = µ = Σ_x x f(x) = x_1 f(x_1) + x_2 f(x_2) + ⋯
The expected value of a continuous r.v. X is defined as:
E(X) = µ = ∫ x f(x) dx
• See Figure 2.7 on p. 25 (the mean as a center of gravity or balance point)

Variance and Standard Deviation
The variance of an r.v. X, denoted by Var(X), σ_X², or simply σ², is defined as:
Var(X) = σ² = E[(X − µ)²]
So Var(X) = E[(X − µ)²] = E(X² − 2µX + µ²) = E(X²) − 2µE(X) + E(µ²) = E(X²) − 2µ·µ + µ² = E(X²) − µ² = E(X²) − [E(X)]²
The standard deviation (SD) is the square root of the variance. Note that the variance is in the square of the original units, while the SD is in the original units.
• See Example 2.17 on p. 26 (mean and variance of two dice)

Quantiles and Percentiles
For 0 ≤ p ≤ 1, the pth quantile (or the 100pth percentile) of a continuous r.v. X, denoted by θ_p, is defined by the equation:
P(X ≤ θ_p) = F(θ_p) = p
θ_{.5} is called the median.
• See Example 2.20 on p. 30 (exponential distribution)
Jointly distributed random variables and independent random variables: see pp. 30-33.

Covariance and Correlation
Cov(X, Y) = σ_XY = E[(X − µ_X)(Y − µ_Y)] = E(XY) − E(X)E(Y) = E(XY) − µ_X µ_Y
If X and Y are independent, then E(XY) = E(X)E(Y), so the covariance is zero. The other direction is not true: zero covariance does not imply independence.
Note that E(XY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y f(x, y) dx dy
ρ_XY = corr(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)) = σ_XY / (σ_X σ_Y)
• See Examples 2.26 and 2.27 on pp. 37-38 (prob. vs. stat. grades)
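To see why zero covariance does not imply independence, here is a tiny counterexample of mine (not from the text): take X uniform on {−1, 0, 1} and Y = X², so Y is completely determined by X, yet the covariance vanishes:

    # X uniform on {-1, 0, 1}; Y = X**2 is completely determined by X.
    xs = [-1, 0, 1]
    ys = [x**2 for x in xs]

    ex  = sum(xs) / 3                              # E(X)  = 0
    ey  = sum(ys) / 3                              # E(Y)  = 2/3
    exy = sum(x * y for x, y in zip(xs, ys)) / 3   # E(XY) = E(X**3) = 0
    print("Cov(X, Y) =", exy - ex * ey)            # 0.0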
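Going back to the discrete-r.v. slides: a sketch that builds the p.m.f. of the sum of two fair dice (the distribution tabulated in Table 2.1) and applies the definitions of E(X) and Var(X) = E(X²) − [E(X)]² directly:

    from collections import Counter
    from fractions import Fraction

    # p.m.f. of X = sum of two fair dice: 36 equally likely outcomes.
    counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
    pmf = {x: Fraction(n, 36) for x, n in counts.items()}

    mu  = sum(x * p for x, p in pmf.items())        # E(X)   = 7
    ex2 = sum(x * x * p for x, p in pmf.items())    # E(X^2) = 329/6
    var = ex2 - mu**2                               # Var(X) = 35/6
    print(mu, var)                                  # 7 35/6

Using Fraction keeps the arithmetic exact, so the output matches the textbook values with no rounding.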
Two Famous Theorems
Chebyshev’s Inequality: Let c > 0 be a constant. Then, irrespective of the distribution of X,
P(|X − µ| ≥ c) ≤ σ²/c²
• See Example 2.29 on p. 41 (exact vs. Chebyshev for two dice)
Weak Law of Large Numbers: Let X̄ be the sample mean of n i.i.d. observations from a population with finite mean µ and variance σ². Then, for any fixed c > 0,
P(|X̄ − µ| ≥ c) → 0 as n → ∞

Selected Discrete Distributions
Bernoulli trials (a single coin flip):
f(x) = P(X = x) = p if x = 1 (success); 1 − p if x = 0 (failure)
E(X) = p and Var(X) = p(1 − p)
Binomial distribution (multiple coin flips): X successes out of n trials
f(x) = P(X = x) = (n choose x) p^x (1 − p)^(n−x) for x = 0, 1, …, n
E(X) = np and Var(X) = np(1 − p)
• See Example 2.30 on p. 43 (teeth)

Selected Discrete Distributions (cont.)
Hypergeometric: drawing balls from the box without replacing them (as in the hand with the question mark)
Poisson: number of occurrences of a rare event
Geometric: number of failures before the first success
Multinomial: more than two outcomes
Negative binomial: number of failures before the rth success
Uniform: N equally likely outcomes
• See Table 2.5 on p. 59 for properties of these distributions

Selected Continuous Distributions
Uniform: equally likely over an interval
Exponential: lifetimes of devices with no wear-out (“memoryless”); interarrival times when the arrivals are at random
Gamma: kth arrival time when the arrivals are at random
Lognormal: lifetimes (similar in shape to the Gamma but with a longer tail)
Beta: not equally likely over an interval
• See Table 2.5 on p. 59 for properties of these distributions

Normal Distribution (“Bell-curve”, Gaussian)
A continuous r.v. X has a normal distribution with parameters µ and σ² if its probability density function is given by:
f(x) = [1/(σ√(2π))] exp[−(x − µ)²/(2σ²)] for −∞ < x < ∞
E(X) = µ and Var(X) = σ² (see Figure 2.12, p. 53)
Standard normal distribution: Z = (X − µ)/σ ~ N(0, 1)
• See Table A.3 on p. 673 for Φ(z) = P(Z ≤ z)
P(X ≤ x) = P(Z = (X − µ)/σ ≤ (x − µ)/σ) = Φ((x − µ)/σ)
• See Examples 2.37 and 2.38 on pp. 54-55 (computations)

Carl Friedrich Gauss (1777-1855)

Karl Pearson (1857-1936)
“Many years ago I called the Laplace-Gauss curve the NORMAL curve, which name, while it avoids an international question of priority, has the disadvantage of leading people to believe that all other distributions of frequency are in one sense or another ABNORMAL. That belief is, of course, not justifiable.”
Karl Pearson, 1920

Percentiles of the Normal Distribution
Suppose that the scores on a standardized test are normally distributed with mean 500 and standard deviation 100. What is the 75th percentile score on this test?
P(X ≤ x) = P((X − 500)/100 ≤ (x − 500)/100) = Φ((x − 500)/100) = 0.75
From Table A.3, Φ(0.675) ≈ 0.75, so
(x − 500)/100 = 0.675 ⇒ x = 500 + (0.675)(100) = 567.5
Factoids:
68% of the normal distribution is within ±1σ of µ
95% of the normal distribution is within ±2σ of µ
99.7% of the normal distribution is within ±3σ of µ
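The percentile calculation can be checked in software; here is a sketch using scipy (my choice of library; the course itself uses JMP and Table A.3), where norm.ppf is the inverse of Φ:

    from scipy.stats import norm

    # 75th percentile of N(mu=500, sigma=100), as in the test-score example.
    print(norm.ppf(0.75, loc=500, scale=100))   # 567.449 (slide rounds z to 0.675)

    # The 68 / 95 / 99.7 factoids: probability within ±k sigma of the mean.
    for k in (1, 2, 3):
        print(k, norm.cdf(k) - norm.cdf(-k))    # 0.6827, 0.9545, 0.9973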
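Chebyshev’s inequality can likewise be checked against the exact two-dice distribution, in the spirit of Example 2.29 (the choice c = 3 below is mine; the example in the text may use a different value):

    from fractions import Fraction

    # Exact p.m.f. of X = sum of two fair dice: mu = 7, sigma^2 = 35/6.
    pmf = {}
    for d1 in range(1, 7):
        for d2 in range(1, 7):
            pmf[d1 + d2] = pmf.get(d1 + d2, Fraction(0)) + Fraction(1, 36)

    mu, var, c = 7, Fraction(35, 6), 3
    exact = sum(p for x, p in pmf.items() if abs(x - mu) >= c)   # 1/3
    bound = var / c**2                                           # 35/54
    print(f"exact = {exact}, Chebyshev bound = {bound}")

The bound (35/54 ≈ 0.65) is valid but loose compared with the exact probability (1/3); Chebyshev trades sharpness for holding under any distribution.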
Linear Combinations of r.v.s
Let X_i ~ N(µ_i, σ_i²) for i = 1, …, n, with Cov(X_i, X_j) = σ_ij for i ≠ j.
Let X = a_1 X_1 + a_2 X_2 + … + a_n X_n, where the a_i are constants. Then X has a normal distribution with mean and variance:
E(X) = E(a_1 X_1 + a_2 X_2 + … + a_n X_n) = a_1 µ_1 + a_2 µ_2 + … + a_n µ_n = Σ_i a_i µ_i
Var(X) = Var(a_1 X_1 + a_2 X_2 + … + a_n X_n) = Σ_i a_i² σ_i² + 2 Σ_{i<j} a_i a_j σ_ij
X̄ = (X_1 + X_2 + … + X_n)/n, so a_i = 1/n.
Therefore X̄ from n i.i.d. N(µ, σ²) observations is ~ N(µ, σ²/n), since the covariances (σ_ij) are zero (by independence).
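The last result, X̄ ~ N(µ, σ²/n) for n i.i.d. N(µ, σ²) observations, is easy to check by simulation; a sketch (all constants below are arbitrary choices of mine):

    import random
    import statistics

    random.seed(2)
    mu, sigma, n, reps = 50, 10, 25, 20_000

    # Simulate many sample means of n i.i.d. N(mu, sigma^2) observations.
    xbars = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
             for _ in range(reps)]

    print(statistics.fmean(xbars))     # close to mu = 50
    print(statistics.variance(xbars))  # close to sigma^2 / n = 100/25 = 4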
Homework #1
Read Chapters 1 and 2, omitting Sections 2.4.4 & 2.10.
Exercises (answers to odd-numbered problems are in the back of the book; show your work: no credit if calculations or printout are not shown):
2.33
2.34 (a) discrete, (b) 0.4, (c) 0.6
2.36 (a) only: mean = 2.57, var = 3.545
2.43
2.80 (a) 0.4207, (b) 0.3674, (c) 15.46
2.83
Read or skim Chapter 3 (omitting Section 3.4) and Chapter 4.