Introduction – part 1
Course Syllabus (Reminder: prepare a folder for homework)
- Prepare a 3-ring binder with dividers for class notes, homework, quizzes, tests, and related information.
Homework
- Check your answers to the odd-numbered problems in the back of the book, and show your work. No credit will be awarded if no work is shown.
- Use 3-hole lined paper with your name and ID in the upper right corner; write down the homework number and the assignment numbers.
- Place your homework in the homework folder; it is due Monday.
- A TI-84+ calculator is required. Write down the calculator commands you use on your homework.
Homework log (for your own record)
- Prepare a folder for homework with a homework log.
- Write the date next to the homework number; keep the homework log in your homework folder with your homework.
- Failure to comply with these instructions will result in a point deduction.

Introduction – part 2
- Lecture notes will be online; please print them out before class.
- Lecture notes may contain only a summary of the lecture; examples will be shown on the whiteboard, so taking notes is an essential part of learning.
- At the beginning of each class, there may be a warm-up (may count toward class participation points).
- At the end of each class, there may be an exit slip containing in-class work (may count toward class participation points).
- Quizzes (counted toward your grade) may be given unannounced and may contain vocabulary and homework problems.
- Be familiar with the syllabus, and keep track of your total score on your own.

1-1 Overview (1)
Key Concept: Overview and introduction of statistics.
- After this lesson, you will know some examples of statistics and be able to identify the terms and data types used in statistics.
Chapter Problem: 1936 presidential election predictions by the Literary Digest poll and George Gallup’s poll
- The Literary Digest polled 10 million people, got more than 2 million responses back, and predicted that Alf Landon would capture 57% of the votes.
- George Gallup polled 50,000 people and predicted that Roosevelt would capture 57% of the votes; Roosevelt actually got 61% of the popular vote.
- Why was the Literary Digest poll so wrong? Does the size of a poll matter?
What is Statistics
- The word “statistics” is derived from the Latin word status (meaning “state”).
- Statistics: the science of collecting, organizing, and interpreting numerical facts (data).
- You live in a world of statistics.
- Statisticians assemble, classify, and tabulate data, then analyze the data in order to make generalizations and decisions.
- Households, governments, and businesses rely heavily on statistical data for guidance.

1-1 Overview (2)
- Gallup Poll: conducted by the Gallup Organization (invented by George Gallup). Examples: elections (predicting the % of votes), prescription drugs (e.g. side effects), ethnic group populations, etc.
- Census: the United States conducts a census every 10 years. The results are used for many government projects and carry heavy political considerations.
- Other examples: baseball batting averages, imported car sales, the president’s popularity.
Why study statistics
- Provide a perspective on how to look at data and how to communicate information about it effectively to others.
- Help you understand research reports encountered in the daily news, such as:
  - The annual earnings of college graduates exceed, on average, those of high school graduates by $30,000.
  - Heavy users of tobacco suffer significantly more respiratory ailments than nonusers.
- Help with future research projects that involve statistical analysis.

1-1 Overview (3)
Definitions used in Statistics
- Data: collections of observations, such as measurements, genders, or survey responses.
- Statistics: the science of planning studies and experiments; obtaining data; and organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data.
- Population: the complete collection of all elements (e.g. scores, people, measurements). The collection is complete in the sense that it includes all subjects to be studied.
- Sample: a subcollection of members selected from a population.
- Census: the collection of data from every member of the population.

1-2 Statistical Thinking
Statistical thinking is the ability to see the big picture and to consider the following relevant factors:
- Context of the data: what do the data values represent?
- Source of the data: were the data obtained objectively, or is the source biased?
- Sampling method: how were the data collected?
- Conclusions: what can we conclude from the analysis?
- Practical implications: are there any practical implications, beyond statistical significance?
Statistical thinking may involve determining whether results are statistically significant (as opposed to practically significant).
Sample data collection for statistics
- Sample data must be collected in an appropriate way, such as through a random process.
- If the data are not collected in an appropriate way, they may be useless because they may be biased.
Statistics is an interesting subject whose real-world applications are extensive and meaningful.
HW #1: pp. 9-11, #1–17 Every Other Odd (EOO)

1-3 Types of Data (1)
Key Concepts
- The subject of statistics is largely about using sample data to make inferences (or generalizations) about an entire population.
- It is essential to know and understand the definitions that follow.
- To study different types of data, know the difference between quantitative data and qualitative data.
Parameter (for a population)
- A numerical measurement describing some characteristic of a population.
- Example: New York City has 3250 walk buttons that pedestrians can press to cross intersections; according to a report, 77% of them do not work. The 3250 walk buttons are the entire population, so the figure of 77% is a parameter.
Statistic (for a sample; note that a sample is part of a population)
- A numerical measurement describing some characteristic of a sample.
- Example: in a survey of 877 executives, 45% said they would not hire someone who made typos on a job application. The figure of 45% is a statistic, because it is based on a sample (877 executives), not the entire population of executives.

1-3 Types of Data (2)
Quantitative data
- Numbers representing counts or measurements (it is important to use the appropriate units of measurement, such as $, hours, feet, meters, years, etc.).
- Examples: thousands of dollars, hundredths of a second, 250 million people, the weights of supermodels.
Two types of quantitative data
- Discrete: the number of possible values is finite or countable (0, 1, 2, 3, ...), e.g. the number of eggs that a hen lays.
- Continuous (numerical): values correspond to some continuous scale without gaps, e.g. the amount of milk that a cow produces.
Qualitative data
- Sometimes called categorical or attribute data.
- Data can be separated into different categories that are distinguished by some nonnumeric characteristic. Examples: the genders (male/female) of professional athletes, eye colors, political party affiliations.

1-3 Types of Data (3)
Classified into Four Levels of Measurement
1. Nominal – categories only. Data consist of names, labels, or categories only; they cannot be ordered or used for calculations. Examples: Yes/No/Undecided; colors of cars; political party affiliations.
2. Ordinal – categories can be arranged in some order, but differences between data values either cannot be determined or are meaningless. Examples: course grades A, B, C, D, F are in order, but A – B is not a quantity; ranks, such as a magazine's ranking of cities by livability. Ordinal data provide information about relative comparisons, but not about the size of the differences.
3. Interval – like ordinal, with the added property that differences are meaningful, but there is no natural zero starting point (a zero that means none of the quantity is present). Examples: (1) body temperatures of 98.2F and 98.6F are ordered, with a difference of 0.4F; (2) years such as 1000, 1492, and 2008, where the zero point of the calendar is arbitrary and does not mean “no time”.
4. Ratio – like interval, with meaningful differences and a natural zero starting point. Examples: (1) weights, such as the weight of the diamond in a ring; (2) prices of college books, where zero means no cost.

1-3 Summary (4) and Homework #2
Levels of Measurement
- Ratio: there is a natural zero starting point and ratios are meaningful. Example: distances.
- Interval: differences are meaningful, but there is no natural zero starting point and ratios are meaningless. Example: body temperatures in degrees Fahrenheit or Celsius.
- Ordinal: categories are ordered, but differences can’t be found or are meaningless. Example: ranks of colleges in U.S. News and World Report.
- Nominal: categories only; data cannot be arranged in an ordering scheme. Example: eye colors.
Example: the finishing positions of a sample of drivers in a NASCAR race are 3, 8, 12, 15, 17 (3rd place, 8th place, etc.).
- What is the level of measurement of these data?
- Are these data discrete or continuous?
- Are the data qualitative or quantitative?
HW #2: page 16, #1-25 EOO

1-4 Critical Thinking (1)
Key Concepts: Success in the introductory statistics course typically requires more common sense than mathematical expertise. This section illustrates how common sense is used when we think critically about data and statistics.
- Importance of good samples: sampling techniques (sampling procedures); randomness; avoiding biased results.
- Correlation and causality: another way to misinterpret statistical data is to find a statistical association between two variables and conclude that one of the variables causes (or directly affects) the other. Correlation does not imply causality.
  - Correlation means two variables are related (more on this in Ch. 10).
  - Two variables may be related without one being the direct cause of the other, e.g. IQ and wealth?
  - News reports often use wording that suggests one is the cause of the other.

1-4 Critical Thinking (2)
Bad Samples
- Voluntary response sample (or self-selected sample): one in which the respondents themselves decide whether to be included. This results in biased samples, e.g. the Literary Digest poll.
  - Examples: polls conducted through the Internet, by mail-in, or by telephone.
  - In this case, valid conclusions can be made only about the specific group of people who agreed to participate.
  - What went wrong in the Literary Digest poll, which received 2.3 million responses out of 10 million ballots? The sample was a voluntary response sample, and it was biased because the ballots were sent to magazine subscribers, car owners, and telephone users.
- Small samples: your opinion: good? bad? biased? Can we draw a reasonable conclusion or inference?
  - Example: basing a school suspension rate on a sample of only three students.
  - Whether a sample is small or large may matter less than whether it was collected appropriately.
- Samples with reported results:
  - Voting behavior: when 1002 eligible voters were surveyed, 70% of them said they had voted in a recent presidential election. However, voting records show that only 61% of eligible voters actually voted.
  - When you ask subjects their weights, you will most likely get their desired weights, not their actual weights.
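The effect of a voluntary response sample can be illustrated with a small simulation. This is only a sketch under invented assumptions: the function name `simulate_poll`, the number of ballots, and both response rates are hypothetical, and it models only self-selection (not the biased mailing list of the Literary Digest poll). True support is set to 61%, but opponents are assumed to be more likely to mail their ballot back.

```python
import random

def simulate_poll(n_mailed, true_support, p_respond_supporter,
                  p_respond_opponent, seed=0):
    """Mail n_mailed ballots; each person decides whether to respond.

    Returns (number of responses, share of responses supporting the candidate).
    All rates are hypothetical inputs, not historical figures.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    supporters = 0
    opponents = 0
    for _ in range(n_mailed):
        is_supporter = rng.random() < true_support
        # Self-selection: the chance of responding depends on the opinion held.
        p_respond = p_respond_supporter if is_supporter else p_respond_opponent
        if rng.random() < p_respond:
            if is_supporter:
                supporters += 1
            else:
                opponents += 1
    responses = supporters + opponents
    return responses, supporters / responses

# Assumed rates: 15% of supporters respond, 35% of opponents respond.
responses, share = simulate_poll(100_000, 0.61, 0.15, 0.35)
print(f"{responses} responses, {share:.1%} support in the poll (true value: 61%)")
```

With these assumed rates the expected share among respondents is 0.61(0.15) / [0.61(0.15) + 0.39(0.35)], which is about 40%, so a poll with tens of thousands of responses still points to the wrong winner. A large sample does not repair a biased collection method.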
1-4 Critical Thinking (3)
Bad Samples
- Bar graphs or pie graphs can be misleading due to scale; you must analyze the numerical information given in the graph, not the graph’s shape.
  [Figure: two bar graphs of personal income per capita for California ($32,996) and Nevada ($30,180); one vertical axis starts at $0, the other at $28,500.]
- Pictographs: using drawings of objects to depict data can also be misleading due to scale.
  [Figure: daily oil consumption for the USA (20.0) and Japan (5.4), shown in two panels.]

1-4 Critical Thinking (4)
Bad Samples
- Percentages can also be misleading, depending on how the meaning of the % is interpreted.
  - Example from the book: referring to lost baggage, Continental Airlines ran ads claiming that this was “an area where we’ve already improved 100% in the last six months”.
  - What does 100% mean to you? Some took it to mean that no baggage is being lost, which is not true.
- Survey questions can be biased depending on how they are worded and on the order of the questions.
  - Wording: 97% yes to “Should the President have the line item veto to eliminate waste?” versus 57% yes to “Should the President have the line item veto, or not?”
  - Order: “Would you say that traffic contributes more or less to air pollution than industry?” or “Would you say that industry contributes more or less to air pollution than traffic?”
- Nonresponse: nonresponse to survey questions can skew the data, whether the person is unavailable or refuses to answer, e.g. telemarketers' phone calls.
- Missing data: results can be dramatically affected by missing data.
  - Data can be missing randomly, i.e. for reasons unrelated to its value or to other values.
  - Data can be missing due to specific factors such as non-reporting (as a result of how the survey was conducted).

1-4 Critical Thinking (5)
Bad Samples
- Self-interest study: studies are sometimes sponsored by parties with interests to promote. Can you think of good examples?
- Precise numbers: can be misleading and bias the result. Most of us believe that when a number is stated precisely, it is accurate. Example: the number of households in the United States is 103,215,027. In this case, it is better to say 103 million.
- Partial pictures: “90% of all our cars sold in this country in the last 10 years are still on the road”; most of those cars may have been sold within just the last 3 years. This misleads consumers, even though it is not a false statement.
- Deliberate distortions: produce biased results and can mislead users. Example: in a survey of car rental companies, Avis was the winner, but the results were distorted; Hertz sued Avis for false advertising based on the survey.

1-4 Summary and Homework #3
- There are many cases of misuse of statistics; you have probably heard or read about such statistical reports.
- Studying statistics will help you judge which studies or research are better than others; use your common sense to interpret data and statistics.
- Example: during a show on MTV, the host asks viewers to call in and vote for or against a new song, with the result that 74% of 12,335 viewers favor it. Given that it is a large sample, and more than 50% favored the song, is it valid to conclude that the majority of Americans favor the song? Why or why not?
Homework #3 (Chap. 1-4): pages 23-24, #1-17 EOO

1-5 Collecting Sample Data (1)
Key Concept: methods for collecting sample data appropriately, which determines the quality of the statistical analysis.
- Observational study – observe and measure specific characteristics, but don’t attempt to modify the subjects being studied.
- Experiment – apply some treatment and then observe its effect on the subjects (experimental units).
- Cross-sectional study – data are observed, measured, and collected at one point in time.
- Retrospective (case-control) study – data are collected from the past by going back in time (through examination of records, interviews, and so on).
- Prospective (longitudinal or cohort) study – data are collected in the future from groups sharing common factors (i.e. cohorts).

1-5 Collecting Sample Data (2)
Experiments and Important Considerations
- In an experiment, some treatment is applied. The results can be affected by confounding (confounding means being unable to distinguish among the effects of different factors).
Important considerations:
- Control the effects of variables:
  - Blinding – used a lot in experimental drug trials; placebo effect, blinding technique, double-blinding technique.
  - Blocks – a block is a group of subjects that are similar; examples: fertilizer or drug experiments.
  - Randomized block design – form blocks of subjects with similar characteristics, then randomly assign treatments within each block.
  - Completely randomized design – assign subjects to different treatment groups through a process of random selection. Example: polio experiment.
  - Rigorously controlled design – subjects are carefully assigned to treatment groups so that the groups are similar in the ways that matter for the experiment. E.g. a blood-pressure-lowering drug.
- Use replication – repetition of an experiment on enough subjects to recognize differences among treatments. Use a sample size that is large enough to avoid erratic behavior and to see the true nature of any effects, and obtain that sample using an appropriate method, such as one based on randomness.
- Use randomization (very important; see the next slide).

1-5 Collecting Sample Data (3)
Randomization and Sampling Strategies
- Random sample – members of a population are selected in such a way that each individual member has an equal chance of being selected, e.g. using a computer to generate telephone numbers.
- Simple random sample of n subjects – subjects are selected in such a way that every possible sample of the same size n has an equal chance of being chosen; this requirement is used in various statistical procedures.
- Probability sample – members of a population are selected in such a way that each member has a known (but not necessarily the same) chance of being selected.

1-5 Collecting Sample Data (4)
Sampling Techniques
- Systematic sampling – select some starting point and then select every kth element in the population (e.g. every 3rd).
- Convenience sampling – use results that are easy to get.

1-5 Collecting Sample Data (5)
Sampling Techniques
- Stratified sampling – subdivide the population into at least two different subgroups (strata) whose members share the same characteristics (e.g. gender or age bracket), then draw a sample from each subgroup (stratum); e.g. men and women, or Democrats and Republicans.
- Cluster sampling – divide the population into sections (clusters), then randomly select some of those clusters and use all members of the selected clusters as the sample; e.g. voting precincts, or all classes at a college divided by subject and section, polling all students in randomly selected classes.
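The sampling techniques above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function names, the 60 numbered students, the even/odd strata, and the three class sections are all hypothetical, chosen only to make each technique concrete.

```python
import random

def simple_random_sample(population, n, rng):
    """Every possible sample of size n is equally likely."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Pick a random starting point, then take every kth element."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Split the population into strata, then sample from EACH stratum."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters, then keep ALL members of those clusters."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(1)              # fixed seed so the run is repeatable
students = list(range(1, 61))       # hypothetical student ID numbers 1..60
sections = {                        # hypothetical class sections (clusters)
    "section A": students[0:20],
    "section B": students[20:40],
    "section C": students[40:60],
}

print(simple_random_sample(students, 6, rng))
print(systematic_sample(students, 3, rng))
print(stratified_sample(students, lambda s: s % 2, 5, rng))
print(cluster_sample(sections, 1, rng))
```

Note the difference between the last two: stratified sampling takes a few members from every subgroup, while cluster sampling takes every member from a few randomly chosen subgroups.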
1-5 Collecting Sample Data (6)
Multistage Sampling
- Multistage sample design – selection of a sample in different stages (which may include random, stratified, and cluster sampling at different stages); it is a very complicated design.
- Example: U.S. unemployment statistics are based on a household survey.
  - The United States is partitioned into 2,007 different regions (primary sampling units, or PSUs): metropolitan areas, counties, or groups of smaller counties.
  - 792 of the 2,007 PSUs are selected.
  - Each of the 792 PSUs is partitioned into blocks.
  - Clusters of households that are close to each other are identified.
  - Clusters are selected randomly.
- Sampling error – the difference between a sample result and the true population result.
- Non-sampling error – occurs when sample data are incorrectly collected, recorded, or analyzed (e.g. from a biased sample, a defective measurement instrument, or data entered incorrectly).

1-5 Statistical Studies & Homework #4
Statistical Studies: observations or experiments?
- Observational study: when were the observations made?
  - Past period of time: retrospective study
  - One point in time: cross-sectional study
  - Forward in time: prospective study
- Experimental study: design the experiment
  1. Control the effects of variables by blinding, blocks, completely randomized design
  2. Replication
  3. Randomization
Summary: types of studies and experiments; controlling the effects of variables; randomization; types of sampling; sampling errors
HW #4 (Chap. 1-5): pp. 34-36, #1–25 EOO
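Sampling error, as defined in this section, can be made concrete with a short simulation. The sketch below is only an illustration: the population of 10,000 ages is made up, and the helper names `sampling_error` and `mean_abs_error` are hypothetical. It draws simple random samples and compares each sample mean with the population mean.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 made-up ages between 18 and 90.
population = [rng.randint(18, 90) for _ in range(10_000)]
population_mean = statistics.mean(population)

def sampling_error(n):
    """Sample mean minus population mean for one simple random sample of size n."""
    sample = rng.sample(population, n)
    return statistics.mean(sample) - population_mean

def mean_abs_error(n, trials=200):
    """Average size of the sampling error over many samples of size n."""
    errors = [abs(sampling_error(n)) for _ in range(trials)]
    return statistics.mean(errors)

for n in (10, 100, 1000):
    print(n, round(mean_abs_error(n), 2))
```

Even with proper random sampling the error is rarely exactly zero, but it tends to shrink as the sample size grows; a census (sampling every member) has no sampling error at all. Non-sampling errors, such as a biased collection method, are not reduced by a larger sample.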