Random variables and probability distributions
Trainer: Bondor Cosmina-Ioana, PhD
ASK: Always Seek Knowledge

Objectives
• The meaning of the term "probability"
• Methods of sampling
• Probability distributions
• Examples

Life sciences
• If you want to know about living beings, then you need to use statistics
• Why?
• Living beings are not all the same
• Their characteristics are variable
• Populations

Probability
• Basic definitions
  – experiment
  – trial
  – outcome, event
• Rules for combining the probabilities of events: AND, OR, NOT
• Events: mutually exclusive, impossible, complementary, certain, independent

Probability
• An experiment is repeated many times
• Each repetition is called a trial
• One or more outcomes can result from each trial

P(outcome) = (number of times the outcome occurs) / (total number of trials)

Example 1 – Ebola virus cases
• Ebola virus disease (EVD): an acute infectious disease with a high mortality risk.
• The 2014 epidemic
Source: CDC. 2014 Ebola Outbreak in West Africa - Case Counts. Nov. 2015. Available at: http://www.cdc.gov/vhf/ebola/outbreaks/2014-west-africa/case-counts.html

Country         Cases   Deaths   Deaths (%)   Death probability   Survival probability
Guinea           3805     2536
Sierra Leone    14122     3955
Liberia         10672     4808
Other states       36       15
Total           28635    11314

Example 1 – Ebola virus cases: number of survivors
Sierra Leone: Survivors = Cases - Deaths = 14122 - 3955 = 10167

Example 1 – Ebola virus cases: death probability
Sierra Leone: Death probability = Deaths / Cases = 3955 / 14122 = 0.28

Example 1 – Ebola virus cases: deaths (%) and death probability
Country         Cases   Deaths   Survivors   Deaths (%)   Death probability
Guinea           3805     2536                  66.6         0.666
Sierra Leone    14122     3955     10167        28.0         0.28
Liberia         10672     4808                  45.1         0.451
Other states       36       15                  41.7         0.417
Total           28635    11314                  39.5         0.395
Other states: Deaths (%) = 15/36 * 100 = 41.7; Death probability = 15/36 = 0.417

Example 1 – Ebola virus cases: death and survival
No. of deaths + No. of survivors = Total number of cases
Deaths (%) + Survival (%) = 100 %

Example 1 – Ebola virus cases: survival probability
Deaths (%) + Survival (%) = 100 %
Country         Cases   Deaths   Survivors   Deaths (%)   Death probability   Survival probability
Guinea           3805     2536                  66.6         0.666             = 1 - 0.666
Sierra Leone    14122     3955     10167        28.0         0.28              = 1 - 0.28 = 10167/14122
Liberia         10672     4808                  45.1         0.451             = 1 - 0.451
Other states       36       15                  41.7         0.417             ?
Total           28635    11314                  39.5         0.395             ?

Example 1 - P(A) and P(nonA)
Deaths (%) + Survival (%) = 100 %
P(A) + P(nonA) = 1
Country         Cases   Deaths   Survivors   Deaths (%)   Death probability   Survival probability
Guinea           3805     2536                  66.6         0.666             0.334
Sierra Leone    14122     3955     10167        28.0         0.28              0.72
Liberia         10672     4808                  45.1         0.451             0.549
Other states       36       15                  41.7         0.417             0.583
Total           28635    11314                  39.5         0.395             0.605

Example 1 - P(A) and P(nonA): fill in the missing values
Deaths (%) + Survival (%) = 100 %
P(A) + P(nonA) = 1
Country         Cases   Deaths   Survivors   Death probability   Survival probability
Guinea           3805     2536     ?           ?                   ?
Sierra Leone    14122     3955     10167       0.28                0.72
Liberia         10672     4808     ?           ?                   ?
Other states       36       15     ?           ?                   ?
Total           28635    11314     ?           ?                   ?

Example 2 – P(A and B) – Blood type by gender for 1000 people
Blood type   Frequency   P(Blood type)   Women   P(Women)   P(Women and Blood type)
O              400                        200
A              450                        200
B              142                         96
AB               8                          4
Total         1000                        500

Example 2 – P(A and B) – Blood type by gender: P(Blood type)
Blood type   Frequency   P(Blood type)       Women
O              400       = 400/1000 = 0.40    200
A              450       = 450/1000 = 0.45    200
B              142       = 142/1000 = 0.142    96
AB               8       = 8/1000 = 0.008       4
Total         1000       1.0                  500

P(A and B) = P(A) * P(B) – for independent events
For blood type A: P(Women) = 200/450 = 0.44
P(Women and Type A) = P(Women) * P(Type A) = 0.44 * 0.45 = 0.20
Note: 200/450 is the proportion of women among the people with type A, that is, P(Women | Type A). The product above is therefore the general rule P(A and B) = P(A|B) * P(B); it reduces to P(A) * P(B) only when the two events are independent.

Example 2 – P(A and B) – Blood type distribution: fill in the missing values
Blood type   Frequency   P(Blood type)   Women   P(Women)   P(Women and Blood type)
O              400       ?                200     ?          ?
A              450       0.45             200     0.44       0.20
B              142       ?                 96     ?          ?
AB               8       ?                  4     ?          ?
Total         1000       ?                500     ?          ?

P(A or B) = P(A) + P(B) - P(A and B)
Blood type   Frequency   P(Blood type)
O              400       0.40
A              450       0.45
B              142       0.142
AB               8       0.008
Total         1000       1.0
P(Type O or Type A) = ?
P(Type O or Type A) = P(Type O) + P(Type A) - P(Type O and Type A) = 400/1000 + 450/1000 - 0 = 850/1000 = 0.85
Type O and Type A are mutually exclusive: P(Type O and Type A) = 0

Summary
• A, B two events: P(A or B) = P(A) + P(B) - P(A and B)
• A, B two independent events: P(A and B) = P(A) * P(B)
• A, B two dependent events: P(A and B) = P(A given B) * P(B), written P(A and B) = P(A|B) * P(B)

Summary
• A, B mutually exclusive events: P(A and B) = 0
• B the complementary event of A: P(complementary event of A) = 1 - P(A)
• A the certain event: P(certain event) = 1
• A the impossible event: P(impossible event) = 0

Bayes' theorem
• Dependent events
• Conditional probability: the probability of an outcome depending on an earlier outcome.
P(A|B) = P(B|A) * P(A) / P(B)
• Involved in medical decision making: the reasoning process of interpreting diagnostic procedures.

Why study samples instead of the whole population?
Researchers study samples instead of populations because samples:
• Can be studied more quickly
• Are less expensive
• Are less dangerous
• Can give more accurate conclusions

Clinical research
Generalizing results:
• Group of patients → similar subjects
• Sample → population

Example
• 100 patients with dental infections
• Treatment with erythromycin: 75 % success
• When can we generalize?
• Generalization: the probability of successfully treating a dental infection with erythromycin is 75 %

Experiment
• Probability of a child being born male ≈ 0.5 (50 % of the cases)
• A family with 4 children can have 0, 1, 2, 3 or 4 boys
• In 100 families with 4 children?
• In 100 families with 100 children?

Select 100 families with 4 children:
Number of boys     0    1    2    3    4   Total
No. of families    4   29   40   24    9   100
• Number of boys in a family – Variable

Probability distribution
The frequency distribution of a variable X is the number of times each possible value of X occurs.
Number of boys     0      1      2      3      4     Total
No. of families    4      29     40     24     9     100
Probability        0.04   0.29   0.40   0.24   0.09  1
• Number of boys in a family – Variable

If we select families from the infertility treatment clinic
• Families with identical twins:
Number of boys     0    1    2    3    4   Total
No. of families   50    0    0    0   50   100
• ! Selection influences the results

Inferential statistics - condition
• Condition: the sample must be selected at random from the population
• Question: what is the distribution of the number of boys in families with 4 children?
• Population → (sampling) → Sample → (inference) → Population

Why random selection?
• Reducing / eliminating experimental errors = decreasing / eliminating selection bias
• The sample should be representative of the population = have the same distribution of important characteristics as the population, where:
  – important characteristics = those connected with the studied characteristic

Sampling methods
• Random sampling – each subject has the same probability of being selected
• Systematic sampling – every kth subject is selected
• Stratified sampling – the population is divided into subgroups and a random sample is selected from each subgroup

Sampling methods
• Cluster sampling – the population is divided into clusters and a random sample is selected from each cluster (cluster = geographic zone)
• Nonprobabilistic sample – the probability that a subject is selected is unknown

Sampling methods
• When researchers select random samples and then make measurements, the result is a random variable.

Probability distribution
• The values of a random variable can be summarized in a frequency distribution, which we call a probability distribution

Probability distribution in families with 4 children?
Number of boys     0        1      2       3      4        Total
Probability        0.0625   0.25   0.375   0.25   0.0625   1.00
• How do we compute a probability distribution?
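One way to reproduce the theoretical table for families with 4 children is the binomial formula. The sketch below is only an illustration: it assumes that each birth is independent and that the probability of a boy is exactly 0.5 (the slides give ≈ 0.5), and the names `binomial_pmf`, `N_CHILDREN` and `P_BOY` are invented for this example.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


N_CHILDREN = 4  # family size used in the slides
P_BOY = 0.5     # assumption: P(boy) is exactly 0.5 and births are independent

# Probability of 0, 1, 2, 3 and 4 boys in a family with 4 children.
distribution = {k: binomial_pmf(k, N_CHILDREN, P_BOY) for k in range(N_CHILDREN + 1)}

for boys, prob in distribution.items():
    print(f"{boys} boys: {prob:.4f}")
print(f"Total: {sum(distribution.values()):.2f}")
```

Under these assumptions the printed probabilities are 0.0625, 0.25, 0.375, 0.25 and 0.0625, matching the table. Observed frequencies from a real sample of 100 families will generally differ somewhat from these theoretical values.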
How do we compute a probability distribution?
• Formula
• Rule
• Match with a theoretical probability distribution

Random variable
• The result of random selection and measurement of some characteristic is a random variable.
• Values of a random variable = a frequency distribution = a probability distribution.

Distribution types

Commonly used symbols in inference and statistics
Characteristic        Population parameter (symbol)   Sample statistic (symbol)
Mean                  μ                               X̄
Standard deviation    σ                               SD
Proportion            π                               p

Probability distributions
• The binomial distribution is used to determine the probability of yes/no events: the number of times a given outcome occurs in a given number of attempts.
• The Poisson distribution is used to determine the probability of rare events.
• The normal distribution is used to find the probability of an outcome when the numerical observations have a bell-shaped distribution.

Normal distribution
A random variable X is normal, N(μ, σ), if its distribution depends on two parameters: the mean μ and the standard deviation σ.
Formula:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

The Standard Normal Distribution
A normal distribution with μ = 0 and σ = 1.
We change the variable with (U is also written Z):
$$U = \frac{X-\mu}{\sigma}$$
Formula:
$$f(u) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{u^{2}}{2}}$$

Questions
• Which value of Z divides the area under the curve into 95 % and 5 %? Zα = 1.645
• Which value of Z divides the area under the curve into 97.5 % and 2.5 %? Zα = 1.96

Exercise
Age (years)   Men SBP   Women SBP
16            112       115
19            114       119
24            115       122
29            116       122
39            118       123
49            123       125
59            128       128
69            134       132
SBP is normally distributed with μ = 120 and σ = 10 mmHg.
1. Which value of SBP divides the area under the curve into 95 % and 5 %?
2. Which value of SBP divides the area under the curve into 97.5 % and 2.5 %?

• Thank you!
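As a numerical check on the SBP exercise, the cut-off values can be obtained by converting back from Z to SBP with SBP = μ + Z * σ. The sketch below uses Python's standard-library `statistics.NormalDist`; it assumes the 95 % (or 97.5 %) part of the area lies below the cut-off, as in the Z questions, and the variable names are invented for this example.

```python
from statistics import NormalDist

# Assumption from the exercise: SBP ~ N(mean = 120, SD = 10) mmHg.
sbp = NormalDist(mu=120, sigma=10)
standard = NormalDist()  # standard normal: mean 0, SD 1

# Z values that leave 95 % and 97.5 % of the area below them.
z_95 = standard.inv_cdf(0.95)
z_975 = standard.inv_cdf(0.975)

# Convert back to the SBP scale: SBP = mean + Z * SD.
sbp_95 = sbp.mean + z_95 * sbp.stdev
sbp_975 = sbp.mean + z_975 * sbp.stdev

print(f"Z(95 %)   = {z_95:.3f} -> SBP = {sbp_95:.1f} mmHg")
print(f"Z(97.5 %) = {z_975:.3f} -> SBP = {sbp_975:.1f} mmHg")
```

With these assumptions the cut-offs come out at roughly 136.4 mmHg and 139.6 mmHg, that is, 120 + 1.645 * 10 and 120 + 1.96 * 10.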