Section 1.4: Types of Data and Some Simple Graphical Displays

Definitions
• Variable: any characteristic whose value may change from one individual or object to another.
• Data: the result of making observations either on a single variable or simultaneously on two or more variables.
• Univariate data set: consists of observations on a single variable made on individuals in a sample or population.

Univariate Data Sets
• Two types of univariate data sets:
  – Categorical data set: individual observations are categorical (nonnumerical) responses.
  – Numerical data set: individual observations are numerical (quantitative) in nature.

Airline Safety Violations
• The Federal Aviation Administration (FAA) monitors airlines and can take administrative actions for safety violations. Information about the fines assessed by the FAA appeared in the article “Just How Safe Is That Jet?” Violations that could lead to a fine were categorized as Security (S), Maintenance (M), Flight Operations (F), Hazardous Materials (H), or Other (O). Data for the variable type of violation for 20 administrative actions are given in the following list.

Airline Safety Violations List
S S M H M O S M S S F S O M S M S M S M

What type of data set is this list? Because type of violation is nonnumerical, this is a categorical data set.

More Than One Variable
If each observation consists of two or more responses or values, the data set is:
• Two variables: a bivariate data set
• More than two variables: a multivariate data set

Revisiting Airline Safety Violations: Example of a Bivariate Data Set
The same article gave data on both the number of violations and the average fine per violation for 10 major airlines.
The resulting data are given in the following table:

Airline          Number of Violations    Average Fine ($)
Alaska                    258                5038.76
America West              257                3112.84
American                 1745                2693.41
Continental               973                5755.39
Delta                    1280                3828.13
Northwest                1097                2643.57
Southwest                 535                3925.23
TWA                       642                2803.74
United                   1110                2612.61
US Airways                891                3479.24

Numerical Data
• Two types of numerical data:
  – Discrete: the possible values are isolated points on the number line.
  – Continuous: the set of possible values forms an entire interval on the number line.

Definitions
• Frequency distribution for categorical data: a table that displays the possible categories along with the associated frequencies or relative frequencies.
• Frequency: for a particular category, the number of times the category appears in the data set.
• Relative frequency: for a particular category, the fraction or proportion of the time that the category appears in the data set.

Definitions (continued)
Relative frequency is calculated as:

relative frequency = frequency / (number of observations in the data set)

When the table includes relative frequencies, it is sometimes referred to as a relative frequency distribution.

Preferred Leisure Activities
The data below come from a report on physical activity patterns in urban women. The following coding is used: W = walking, T = weight training, C = cycling, G = gardening, A = aerobics.

W T A W G T W W C W T W A T T W G W W C A W A W W W T W W T

The corresponding frequency distribution is given below.

Frequency Distribution for Preferred Activity
Activity           Frequency    Relative Frequency
Walking                15             0.500
Weight training         7             0.233
Cycling                 2             0.067
Gardening               2             0.067
Aerobics                4             0.133
Total                  30             1.000

Bar Charts
• A graph of the frequency distribution of categorical data.
How to construct:
1. Draw a horizontal line, and write the category names or labels below the line at regularly spaced intervals.
2. Draw a vertical line, and label the scale using either frequency or relative frequency.
3. Place a rectangular bar above each category label. The height of each bar is determined by the category's frequency (or relative frequency), and all bars should have the same width.
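The frequency and relative-frequency calculations, followed by the bar chart steps, can be sketched in Python for the preferred leisure activity data. This is a minimal illustration, not part of the original material: the helper names `frequency_distribution`, `text_bar_chart` and `LABELS` are hypothetical, and the bars are drawn sideways as rows of `#` characters rather than as vertical rectangles. Every bar still uses the same character width, and its length equals the category's frequency.

```python
from collections import Counter

# Coded responses from the preferred leisure activity example:
# W = walking, T = weight training, C = cycling, G = gardening, A = aerobics
responses = "W T A W G T W W C W T W A T T W G W W C A W A W W W T W W T".split()

# Hypothetical lookup table mapping each code to a readable category name
LABELS = {"W": "Walking", "T": "Weight training", "C": "Cycling",
          "G": "Gardening", "A": "Aerobics"}


def frequency_distribution(data):
    """Return {category: (frequency, relative frequency)} for categorical data.

    relative frequency = frequency / number of observations in the data set
    """
    n = len(data)
    counts = Counter(data)
    return {cat: (freq, freq / n) for cat, freq in counts.items()}


def text_bar_chart(dist, labels):
    """Print a sideways text bar chart, largest category first.

    Each bar's length is the category's frequency. The frequency and the
    relative frequency are printed beside the bar.
    """
    for cat, (freq, rel) in sorted(dist.items(), key=lambda kv: -kv[1][0]):
        print(f"{labels[cat]:<16} {'#' * freq:<16} {freq:>3}  {rel:.3f}")


dist = frequency_distribution(responses)
text_bar_chart(dist, LABELS)
```

Walking comes first with 15 of the 30 responses, for a relative frequency of 15/30 = 0.500. The relative frequencies always sum to 1, which makes a quick check on the arithmetic.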
Why Students Drop Out
An article examined the reasons that college seniors leave before graduating. Forty-two seniors who had dropped out were interviewed. Their reasons for leaving are summarized below.

Reason for Leaving              Frequency
Academic problems                    7
Poor teaching                        3
Needed a break                       2
Economic reasons                    11
Family responsibilities              4
To attend another school             9
Personal problems                    3
Other                                3
Total                               42

[Figure: Bar chart for why students drop out]

Dotplots
• A picture of numerical data in which each observation is represented by a dot on or above a horizontal measurement scale.
How to construct:
1. Draw a horizontal line, and mark it with an appropriate measurement scale.
2. Locate each value in the data set along the measurement scale, and represent it by a dot. If two or more observations have the same value, stack the dots vertically.

Graduation Rates for NCAA Division I Schools in California and Texas
The data are graduation rates: the percentage of freshmen who earned a bachelor's degree within 6 years.
California: 64 41 44 31 37 73 72 68 35 37 81 90 82 74 79 67 66 66 70 63
Texas: 67 21 32 88 35 71 39 35 71 63 12 46 35 39 28 65 25 24 22

[Figure: Dotplot of graduation rates for California and Texas]

Activity: Head Sizes
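The dotplot steps can also be sketched as text output for the California and Texas graduation-rate example. This is an illustrative Python sketch under some assumptions. The helpers `dotplot_rows` and `axis` are hypothetical. Each integer percentage gets one character column. The shared scale from 10 to 90 was chosen only because it covers both states' values.

```python
from collections import Counter

# Six-year graduation rates (%) from the graduation-rate example
california = [64, 41, 44, 31, 37, 73, 72, 68, 35, 37,
              81, 90, 82, 74, 79, 67, 66, 66, 70, 63]
texas = [67, 21, 32, 88, 35, 71, 39, 35, 71, 63,
         12, 46, 35, 39, 28, 65, 25, 24, 22]


def dotplot_rows(data, lo, hi, dot="o"):
    """Return the rows of a text dotplot, top row first.

    There is one character column per integer value from lo to hi. Each
    observation becomes a dot in the column for its value, and observations
    with the same value are stacked vertically. Values outside [lo, hi] are
    not drawn.
    """
    counts = Counter(data)
    height = max(counts.values())
    rows = []
    # Build rows from the top of the tallest stack down to the baseline.
    for level in range(height, 0, -1):
        row = "".join(dot if counts[v] >= level else " "
                      for v in range(lo, hi + 1))
        rows.append(row.rstrip())
    return rows


def axis(lo, hi, tick=10):
    """Return the horizontal measurement scale and its label line.

    Every multiple of `tick` gets a "+" tick mark on the scale and its
    number written underneath, starting at the tick's column.
    """
    line = ["-"] * (hi - lo + 1)
    labels = [" "] * (hi - lo + 1 + len(str(hi)))
    for v in range(lo, hi + 1):
        if v % tick == 0:
            line[v - lo] = "+"
            for i, ch in enumerate(str(v)):
                labels[v - lo + i] = ch
    return "".join(line), "".join(labels).rstrip()


# Use one shared scale so the two distributions can be compared directly.
LO, HI = 10, 90
for name, data in [("California", california), ("Texas", texas)]:
    print(name)
    for row in dotplot_rows(data, LO, HI):
        print(row)
    for line in axis(LO, HI):
        print(line)
    print()
```

Both plots sit on the same scale, so the lower Texas values (down to 12) stand out directly against the California values, which start at 31. The tallest stack in either plot is the three Texas schools at 35.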