Chapters 1 & 3
Graphical Methods for Describing Data

Mr. Potter
• Bachelors in Mathematics w/ Education
• Masters in Educational Leadership
• Doctoral Student, Psych of Cog/Instruction
• World Champion in Taekwondo
• Married 16 years, my son Kai is 13 in 8th grade at McMillan, my daughter Kaitlin is 4
• Turn in information sheet
• Tutoring times
• Electronics – no phones, no music - Calc’s
• Dress code – if I tell you to tuck, it will be documented
• Tardies
• Prerequisites – successful completion of Algebra 2 or above
• Homework
• Supplies – TI-84 or TI-83 Plus calculator – preferred is TI-84
• AP Exam – Thursday, May 12, 2016 – 12:00 pm
  – Test:
    • 40 multiple choice questions
    • 5 free response questions
    • 1 investigative task
  – What is the passing rate?

What is statistics?
• the science of collecting, organizing, analyzing, and drawing conclusions from data

Why should one study statistics?
1. To be informed
   a) Extract information from tables, charts and graphs
   b) Follow numerical arguments
   c) Understand the basics of how data should be gathered, summarized, and analyzed to draw statistical conclusions
Can dogs help patients with heart failure by reducing stress and anxiety?
When people take a vacation do they really leave work behind?

Why should one study statistics? (continued)
2. To make informed judgments
3. To evaluate decisions that affect your life
Many companies now require drug screening as a condition of employment. With these screening tests there is a risk of a false-positive reading. Is the risk of a false result acceptable?
If you choose a particular major, what are your chances of finding a job when you graduate?

What is variability?
The state, quality, or degree of being variable or changeable.
In fact, variability is almost universal! It is variability that makes life interesting!!
Suppose you went into a convenience store to purchase a soft drink. Does every can on the shelf contain exactly 12 ounces?
NO – there may be a little more or less in the various cans due to the variability that is inherent in the filling process.

If the Shoe Fits ...
The two histograms to the right display the distribution of heights of gymnasts and the distribution of heights of female basketball players. Which is which? Why?
[Figures: Heights – Figure A; Heights – Figure B]

If the Shoe Fits ...
Suppose you found a pair of size 6 shoes left outside the locker room. Which team would you go to first to find the owner of the shoes? Why?
Suppose a tall woman (5 ft 11 in) tells you she is looking for her sister, who is practicing at the gym. To which team would you send her? Why?

The Data Analysis Process
1. Understand the nature of the problem
   It is important to have a clear direction before gathering data.
2. Decide what to measure and how to measure it
   It is important to carefully define the variables to be studied and to develop appropriate methods for determining their values.
3. Collect data
   It is important to understand how data is collected because the type of analysis that is appropriate depends on how the data was collected!
4. Summarize data and perform preliminary analysis
   This initial analysis provides insight into important characteristics of the data.
5. Perform formal analysis
   It is important to select and apply the appropriate inferential statistical methods.
6. Interpret results
   This step often leads to the formulation of new research questions.

Suppose we wanted to know the average GPA of high school graduates in the nation this year.
We could collect data from all high schools in the nation.
What term would be used to describe “all high school graduates”?

Population
• The entire collection of individuals or objects about which information is desired
What do you call it when you collect data about the entire population?
• A census is performed to gather information about the entire population

GPA Continued:
Suppose we wanted to know the average GPA of high school graduates in the nation this year.
We could collect data from all high schools in the nation.
Why might we not want to use a census here?
If we didn’t perform a census, what would we do?

Sample
• A subset of the population, selected for study in some prescribed manner
What would a sample of all high school graduates across the nation look like?
High school graduates from each state (region), ethnicity, gender, etc.

GPA Continued:
Suppose we wanted to know the average GPA of high school graduates in the nation this year.
We could collect data from a sample of high schools in the nation.
Once we have collected the data, what would we do with it?

Descriptive statistics
• the methods of organizing & summarizing data
If the sample of high school GPAs contained 1,000 numbers, how could the data be organized or summarized?
• Create a graph
• State the range of GPAs
• Calculate the average GPA

GPA Continued:
Suppose we wanted to know the average GPA of high school graduates in the nation this year.
We could collect data from a sample of high schools in the nation.
Could we use the data from our sample to answer this question?

Inferential statistics
• involves making generalizations from a sample to a population
Based on the sample, if the average GPA for high school graduates was 3.0, what generalization could be made?
The average national GPA for this year’s high school graduates is approximately 3.0.
Could someone claim that the average GPA for graduates in your local school district is 3.0?
No. Generalizations based on the results of a sample can only be made back to the population from which the sample came.
Be sure to sample from the population of interest!!

Variable
• any characteristic whose value may change from one individual to another
• Suppose we wanted to know the average GPA of high school graduates in the nation this year. Define the variable of interest.
  The variable of interest is the GPA of high school graduates.
Is this a variable . . . The number of wrecks per week at the intersection outside of school?
YES

Data
• The values for a variable from individual observations
For this variable . . . The number of wrecks per week at the intersection outside . . . What could observations be?
0, 1, 2, …

Two types of variables
• categorical
• numerical
  – discrete
  – continuous

Categorical variables
• Qualitative
• Identifies basic differentiating characteristics of the population
Can you name any categorical variables?

Numerical variables
• quantitative
• observations or measurements take on numerical values
• makes sense to average these values
• two types - discrete & continuous
Can you name any numerical variables?

Discrete (numerical)
• Isolated points along a number line
• usually counts of items

Continuous (numerical)
• Variable that can be any value in a given interval
• usually measurements of something

Identify the following variables:
1. the color of cars in the teacher’s lot
   Categorical
2. the number of calculators owned by students at your school
   Discrete numerical
3. the zip code of an individual
   Categorical
4. the amount of time it takes students to drive to school
   Continuous numerical
5. the appraised value of homes in your city
   Discrete numerical
Is money a measurement or a count?

Classifying variables by the number of variables in a data set
Suppose that the PE coach records the height of each student in his class.
This is an example of univariate data.
Univariate - data that describes a single characteristic of the population

Classifying variables by the number of variables in a data set
Suppose that the PE coach records the height and weight of each student in his class.
This is an example of bivariate data.
Bivariate - data that describes two characteristics of the population

Classifying variables by the number of variables in a data set
Suppose that the PE coach records the height, weight, number of sit-ups, and number of push-ups for each student in his class.
This is an example of multivariate data.
Multivariate - data that describes more than two characteristics (beyond the scope of this course)

Graphs for categorical data

Bar Chart
When to Use
Categorical data
How to construct
– Draw a horizontal line; write the categories or labels below the line at regularly spaced intervals
– Draw a vertical line; label the scale using frequency or relative frequency
– Place equal-width rectangular bars above each category label with a height determined by its frequency or relative frequency

Bar Chart (continued)
What to Look For
Frequently or infrequently occurring categories
Collect the following data and then display the data in a bar chart:
What is your favorite ice cream flavor? Vanilla, chocolate, strawberry, or other

Double Bar Charts
When to Use
Categorical data
How to construct
– Constructed like bar charts, but with two (or more) groups being compared
– MUST use relative frequencies on the vertical axis
– MUST include a key to denote the different bars
Why MUST we use relative frequencies?

Each year the Princeton Review conducts a survey of students applying to college and of parents of college applicants. In 2009, 12,715 high school students responded to the question “Ideally how far from home would you like the college you attend to be?” Also, 3007 parents of students applying to college responded to the question “How far from home would you like the college your child attends to be?” Data is displayed in the frequency table below.
What should you do first?

                         Frequency
Ideal Distance           Students    Parents
Less than 250 miles      4450        1594
250 to 500 miles         3942        902
500 to 1000 miles        2416        331
More than 1000 miles     1907        180

Create a comparative bar chart with these data.
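The first step toward the comparative bar chart is converting each frequency into a relative frequency, because the two groups have very different sizes. A minimal Python sketch using the counts from the frequency table above; the function and variable names are illustrative and not part of the slides:

```python
# Frequencies copied from the Princeton Review table above.
students = {
    "Less than 250 miles": 4450,
    "250 to 500 miles": 3942,
    "500 to 1000 miles": 2416,
    "More than 1000 miles": 1907,
}
parents = {
    "Less than 250 miles": 1594,
    "250 to 500 miles": 902,
    "500 to 1000 miles": 331,
    "More than 1000 miles": 180,
}


def relative_frequencies(freqs):
    """Divide each category count by the group total."""
    total = sum(freqs.values())
    return {category: count / total for category, count in freqs.items()}


student_rel = relative_frequencies(students)
parent_rel = relative_frequencies(parents)

# Print one row per category, rounded to two decimal places.
print(f"{'Ideal Distance':<22}{'Students':>10}{'Parents':>10}")
for category in students:
    print(f"{category:<22}{student_rel[category]:>10.2f}{parent_rel[category]:>10.2f}")
```

Comparing raw counts would make every student bar taller than the matching parent bar simply because more students responded; dividing by each group's own total puts both groups on the same 0 to 1 scale.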
                         Relative Frequency
Ideal Distance           Students    Parents
Less than 250 miles      .35         .53
250 to 500 miles         .31         .30
500 to 1000 miles        .19         .11
More than 1000 miles     .15         .06

Students column: found by dividing the frequency by the total number of students.
Parents column: found by dividing the frequency by the total number of parents.
What does this graph show about the ideal distance college should be from home?

Segmented (or Stacked) Bar Charts
When to Use
Categorical data
How to construct
– MUST first calculate relative frequencies
– Draw a bar representing 100% of the group
– Divide the bar into segments corresponding to the relative frequencies of the categories

Remember the Princeton survey . . .
Create a segmented bar graph with these data (the relative frequency table above).
First draw a bar that represents 100% of the students who answered the survey.
Next, divide the bar into segments.
Do the same thing for parents – don’t forget a key denoting each category.
Notice that this segmented bar chart displays the same relationship between the opinions of students and parents concerning the ideal distance that college is from home as the double bar chart does.
[Figure: segmented bar chart; vertical axis relative frequency from 0 to 1.0; one bar each for Students and Parents; key: Less than 250 miles, 250 to 500 miles, 500 to 1000 miles, More than 1000 miles]
Pie (Circle) Chart
When to Use
Categorical data
How to construct
– Draw a circle to represent the entire data set
– Calculate the size of each “slice”: Relative frequency × 360°
– Using a protractor, mark off each slice
To describe
– comment on which category had the largest proportion or smallest proportion

Typos on a résumé do not make a very good impression when applying for a job. Senior executives were asked how many typos in a résumé would make them not consider a job candidate. The resulting data are summarized in the table below.

Number of Typos    Frequency    Relative Frequency
1                  60           .40
2                  54           .36
3                  21           .14
4 or more          10           .07
Don’t know         5            .03

Create a pie chart for these data.
First draw a circle to represent the entire data set.
Next, calculate the size of the slice for “1 typo”: .40 × 360° = 144°. Draw that slice.
Repeat for each slice.
Here is the completed pie chart created using Minitab.
[Figure: pie chart of Number of Typos]
What does this pie chart tell us about the number of typos occurring in résumés before the applicant would not be considered for a job?

Graphs for numerical data

Dotplot
When to Use
Small numerical data sets
How to construct
– Draw a horizontal line and mark it with an appropriate numerical scale
– Locate each value in the data set along the scale and represent it by a dot. If there are two or more observations with the same value, stack the dots vertically

Dotplot (continued)
What to Look For
– The representative or typical value
– The extent to which the data values spread out
– The nature of the distribution along the number line
– The presence of unusual values
Collect the following data and then display the data in a dotplot:
How many body piercings do you have?

How to describe a numerical, univariate graph
What strikes you as the most distinctive difference among the distributions of exam scores in classes A, B, & C?
1.
Center
• discuss where the middle of the data falls
• three measures of central tendency – mean, median, & mode
The mean and/or median is typically reported rather than the mode.

What strikes you as the most distinctive difference among the distributions of scores in classes D, E, & F?
2. Spread
• discuss how spread out the data is
• refers to the variability in the data
• Measures of spread are – range, standard deviation, IQR
Remember, Range = maximum value – minimum value
Standard deviation & IQR will be discussed in Chapter 4

What strikes you as the most distinctive difference among the distributions of exam scores in classes G, H, & I?
3. Shape
• refers to the overall shape of the distribution
The following slides will discuss these shapes.

Symmetrical
• refers to data in which both sides are (more or less) the same when the graph is folded vertically down the middle
• bell-shaped is a special type – has a center mound with two sloping tails
1. Collect data by rolling two dice and recording the sum of the two dice. Repeat three times.
2. Plot your sums on the dotplot on the board.
3. What shape does this distribution have?

Uniform
• refers to data in which every class has equal or approximately equal frequency
1. Collect data by rolling a single die and recording the number rolled. Repeat five times.
2. Plot your numbers on the dotplot on the board.
3. What shape does this distribution have?
To help remember the name for this shape, picture soldiers standing in straight lines. What are they wearing?

Skewed
• refers to data in which one side (tail) is longer than the other side
• the direction of skewness is on the side of the longer tail
1. Collect data by finding the age of five coins in circulation (current year minus year of coin) and recording the ages.
2. Plot the ages on the dotplot on the board.
3. What shape does this distribution have?
Name a variable with a distribution that is skewed left.
The directions are right skewed or left skewed.
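The two dice activities above can be previewed without rolling anything. The Python sketch below is only an illustration under one stated assumption: instead of class data, it lists every equally likely outcome once (36 outcomes for two dice, 6 for one die) and draws a rough text dotplot. Real class results will be lumpier. The helper name `text_dotplot` is made up for this example.

```python
from collections import Counter
from itertools import product


def text_dotplot(values):
    """Return a text dotplot: one '*' per observation, stacked above its value."""
    counts = Counter(values)
    lo, hi = min(counts), max(counts)
    rows = []
    # Build the plot from the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(
            " * " if counts.get(v, 0) >= level else "   "
            for v in range(lo, hi + 1)
        )
        rows.append(row.rstrip())
    # Axis labels, one three-character cell per value.
    rows.append("".join(f"{v:^3}" for v in range(lo, hi + 1)).rstrip())
    return "\n".join(rows)


# Assumption: every outcome listed exactly once, not actual rolls.
two_dice_sums = [a + b for a, b in product(range(1, 7), repeat=2)]
one_die = list(range(1, 7))

print("Sum of two dice (mound in the middle, symmetrical):")
print(text_dotplot(two_dice_sums))
print()
print("Single die (every value equally frequent, uniform):")
print(text_dotplot(one_die))
```

The two-dice plot peaks at 7 and falls off evenly on both sides, while the single-die plot is flat, which matches the symmetrical and uniform shapes described above.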
Bimodal (multi-modal)
• refers to the number of peaks in the shape of the distribution
• Bimodal would have two peaks
• Multi-modal would have more than two peaks
Suppose we collect data on the time it takes to drive from San Luis Obispo, California to Monterey, California. Some people may take the inland route (approximately 2.5 hours) while others may take the coastal route (between 3.5 and 4 hours).
What shape would this distribution have?
Bimodal distributions can occur when the data set consists of observations from two different kinds of individuals or objects.
What would a distribution be called if it had ONLY one peak?
Unimodal

3. Shape
• refers to the overall shape of the distribution
• symmetrical, uniform, skewed, or bimodal

What strikes you as the most distinctive difference among the distributions of exam scores in class J?
4. Unusual occurrences
• Outlier - value that lies away from the rest of the data
• Gaps
• Clusters

5. In context
• You must write your answer in reference to the context in the problem, using correct statistical vocabulary and using complete sentences!

Dotplot (continued)
What to Look For
– The representative or typical value
– The extent to which the data values spread out
– The nature of the distribution along the number line
– The presence of unusual values
Collect the following data and then display the data in a dotplot:
How many body piercings do you have?
Describe the distribution of the number of body piercings the class has.

Numerical Graphs Continued

Stem-and-Leaf Displays
When to Use
Univariate numerical data
Use for small to moderate sized data sets. Doesn’t work well for large data sets.
How to construct
Each number is split into two parts: the stem consists of the first digit(s) and the leaf consists of the final digit(s).
– Select one or more of the leading digits for the stem
– List the possible stem values in a vertical column
– Record the leaf for each observation beside each corresponding stem value
– Indicate the units for stems and leaves in a key or legend
Be sure to list every stem from the smallest to the largest value.
If you have a long list of leaves behind a few stems, you can split stems in order to spread out the distribution.
Can also create comparative stem-and-leaf displays.
To describe
– comment on the center, spread, and shape of the distribution and if there are any unusual features
Remember the data set collected in Chapter 1 – how many piercings do you have? Would a stem-and-leaf display be a good graph for this distribution? Why or why not?

The following data are price per ounce for various brands of dandruff shampoo at a local grocery store.
0.32 0.21 0.29 0.54 0.17 0.28 0.36 0.23
Create a stem-and-leaf display with this data.
What would an appropriate stem be?
List the stems vertically.
For the observation of “0.32”, write the 2 behind the “3” stem.
Continue recording each leaf with the corresponding stem.

Stem   Leaf
1      7
2      1 9 8 3
3      2 6
4
5      4

Describe this distribution.
The median price per ounce for dandruff shampoo is $0.285, with a range of $0.37. The distribution is positively skewed with an outlier at $0.54.

The Census Bureau projects the median age in 2030 for the 50 states and Washington D.C. A stem-and-leaf display is shown below.
[Figure: stem-and-leaf display of projected median ages, shown first with unsplit stems and then with split stems]
Notice that you really cannot see a distinctive shape for this distribution due to the long list of leaves.
We can split the stems in order to better see the shape of the distribution.
We use L for lower leaf values (0-4) and H for higher leaf values (5-9).
Notice now that you can see the shape of this distribution.

The following is data on the percentage of primary-school-aged children who are enrolled in school for 19 countries in Northern Africa and for 23 countries in Central Africa.

Northern Africa
54.6 34.3 48.9 77.8 59.6 88.5 97.4 92.5 83.9 98.8 91.6 97.8 96.1 92.2 94.9 98.6 86.6 96.9 88.9

Central Africa
58.3 34.6 35.5 45.4 38.6 63.8 53.9 61.9 69.9 43.0 85.0 63.4 58.4 61.9 40.9 73.9 34.8 74.4 97.4 61.0 66.7 79.6

Create a comparative stem-and-leaf display.
What is an appropriate stem?
Let’s truncate the leaves to the unit place. “4.6” becomes “4”.
Be sure to use comparative language when describing these distributions!
The median percentage of primary-school-aged children enrolled in school is larger for countries in Northern Africa than in Central Africa, but the ranges are the same. The distribution for countries in Northern Africa is strongly negatively skewed, but the distribution for countries in Central Africa is approximately symmetrical.

Histograms
When to Use
Univariate numerical data
Constructed differently for discrete versus continuous data.
For comparative histograms – use two separate graphs with the same scale on the horizontal axis.
How to construct
Discrete data
– Draw a horizontal scale and mark it with the possible values for the variable
– Draw a vertical scale and mark it with frequency or relative frequency
– Above each possible value, draw a rectangle centered at that value with a height corresponding to its frequency or relative frequency
To describe
– comment on the center, spread, and shape of the distribution and if there are any unusual features

Queen honey bees mate shortly after they become adults. During a mating flight, the queen usually takes several partners, collecting sperm that she will store and use throughout the rest of her life. A study on honey bees provided the following data on the number of partners for 30 queen bees.
12 8 9 2 3 7 4 5 5 6 6 4 6 7 7 7 10 4 8 1 6 7 9 7 8 7 8 11 6 10
Create a histogram for the number of partners of the queen bees.
First draw a horizontal axis, scaled with the possible values of the variable of interest.
Next draw a vertical axis, scaled with frequency or relative frequency.
Draw a rectangle above each value with a height corresponding to the frequency.
[Figure: histogram of number of partners; horizontal axis 0 to 12; vertical axis frequency 0 to 7]
Suppose we use relative frequency instead of frequency on the vertical axis.
What do you notice about the shapes
of these two histograms?

Histograms
When to Use
Univariate numerical data
How to construct
Continuous data
– Mark the boundaries of the class intervals on the horizontal axis
– Draw a vertical scale and mark it with frequency or relative frequency
– Draw a rectangle directly above each class interval with a height corresponding to its frequency or relative frequency
This is the type of histogram that most students are familiar with.
To describe
– comment on the center, spread, and shape of the distribution and if there are any unusual features

A study examined the number of hours spent watching TV per day for a sample of children age 1 and for a sample of children age 3. Below are the comparative histograms.
[Figures: Children Age 1; Children Age 3]
Notice the common scale on the horizontal axis.
Write a few sentences comparing the distributions.
The median number of hours spent watching TV per day was greater for the 1-year-olds than for the 3-year-olds. The distribution for 3-year-olds was more strongly skewed right than the distribution for the 1-year-olds, but the two distributions had similar ranges.

Cumulative Relative Frequency Plot
When to use
- used to answer questions about percentiles.
A percentile is a value with a given percent of observations at or below that value.
How to construct
- Mark the boundaries of the intervals on the horizontal axis
- Draw a vertical scale and mark it with relative frequency
- Plot the point corresponding to the upper end of each interval with its cumulative relative frequency, including the beginning point
- Connect the points.

The National Climatic Center has been collecting weather data for many years. The annual rainfall amounts for Albuquerque, New Mexico from 1950 to 2008 were used to create the frequency distribution below.
Find the cumulative relative frequency for each interval.
Annual Rainfall (in inches)   Relative frequency   Cumulative relative frequency
4 to <5                       0.052                0.052
5 to <6                       0.103                0.155   (0.052 + 0.103)
6 to <7                       0.086                0.241   (0.155 + 0.086)
7 to <8                       0.103
8 to <9                       0.172
9 to <10                      0.069
10 to <11                     0.207
11 to <12                     0.103
12 to <13                     0.052
13 to <14                     0.052

Continue this pattern to complete the table.

To create a cumulative relative frequency plot, graph a point for the upper value of the interval and the cumulative relative frequency.

Annual Rainfall (in inches)   Relative frequency   Cumulative relative frequency
4 to <5                       0.052                0.052
5 to <6                       0.103                0.155
6 to <7                       0.086                0.241
7 to <8                       0.103                0.344
8 to <9                       0.172                0.516
9 to <10                      0.069                0.585
10 to <11                     0.207                0.792
11 to <12                     0.103                0.895
12 to <13                     0.052                0.947
13 to <14                     0.052                0.999

In the context of this problem, explain the meaning of this value.
Why isn’t this value one (1)?
Plot a point for each interval.
Plot a starting point at (4,0).
Connect the points.

[Figure: cumulative relative frequency plot; horizontal axis Rainfall (inches); vertical axis cumulative relative frequency from 0 to 1.0]

What proportion of years had rainfall amounts that were 9.5 inches or less?
Approximately 0.55

Approximately 30% of the years had annual rainfall less than what amount?
Approximately 7.5 inches

Which interval of rainfall amounts had a larger proportion of years – 9 to 10 inches or 10 to 11 inches? Explain.
The interval 10 to 11 inches, because its slope is steeper, indicating a larger proportion occurred.
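The table and the plot readings above can be checked with a short calculation. This Python sketch is a hedged illustration, not the method used in the slides: it builds the cumulative column by running sums, then reads values off the plot by linear interpolation between the plotted points, which is what connecting the points with straight lines amounts to. The function names are made up for this example.

```python
from itertools import accumulate

# Upper end of each rainfall interval (inches) and its relative frequency,
# copied from the table above.
upper_ends = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
rel_freq = [0.052, 0.103, 0.086, 0.103, 0.172, 0.069, 0.207, 0.103, 0.052, 0.052]

# Running sums give the cumulative relative frequencies.
cumulative = [round(c, 3) for c in accumulate(rel_freq)]

# Plotted points, including the starting point (4, 0).
points = [(4, 0.0)] + list(zip(upper_ends, cumulative))


def proportion_at_or_below(x):
    """Read the plot vertically: proportion of years with rainfall <= x."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the plotted range")


def amount_at_proportion(p):
    """Read the plot horizontally: rainfall amount with proportion p at or below it."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= p <= y1:
            return x0 + (x1 - x0) * (p - y0) / (y1 - y0)
    raise ValueError("p is outside the plotted range")


print(cumulative)
print(round(proportion_at_or_below(9.5), 2))
print(round(amount_at_proportion(0.30), 2))
```

The interpolated answers should land close to the values read from the graph (about 0.55 and about 7.5 inches). The last cumulative value comes out as 0.999 rather than 1, most likely because each relative frequency in the table was rounded to three decimal places.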
Displaying Bivariate Numerical Data

Scatterplots
When to Use
Bivariate numerical data
How to construct
- Draw a horizontal scale and mark it with appropriate values of the independent variable
- Draw a vertical scale and mark it with appropriate values of the dependent variable
- Plot each point corresponding to the observations
To describe
- comment on the relationship between the variables
Scatterplots are discussed in much greater depth in Chapter 5.

Time Series Plots
When to Use
- measurements collected over time at regular intervals
How to construct
- Draw a horizontal scale and mark it with appropriate values of time
- Draw a vertical scale and mark it with appropriate values of the observed variable
- Plot each point corresponding to the observations and connect
To describe
- comment on any trends or patterns over time
Can be considered bivariate data where the y-variable is the variable measured and the x-variable is time.

The accompanying time-series plot of movie box office totals (in millions of dollars) over 18 weeks in the summer for 2001 and 2002 appeared in USA Today (September 3, 2002).
Describe any trends or patterns that you see.

Who, What, When, Where, Why, How
• Who?
  – Individual cases about whom we record some characteristics. Individuals who answer a survey are called respondents. People on whom we experiment are subjects or participants, but animals, plants, and other inanimate subjects are called experimental units.
• What and why?
  – Type of variable and why you need to look at this variable.
• Where, When, and How?
  – What methods were used to collect the data?
  – Where and when addresses differences in years and locations which may be important.

Context
• Context
  – The context tells Who was measured, How the data were collected, Where the data were collected, and When and Why the study was performed.
• Every question must be answered in context in complete sentences.