90-786 Lecture 1

Lecture 1. Making Sense of Data: Data Variation
David R. Merrell
90-786 Intermediate Empirical Methods for Public Policy and Management

Introductions



Instructor: David R. Merrell
TAs: Max Hernandez-Toso and Hao Xu
Course Content:
USEFUL STATISTICS
Statistics is the use of data to reduce
uncertainty about potential observations
Course Information


Web site
 http://Duncan.heinz.cmu.edu/GeorgeWeb/
Heinz 90-786 Front Page.htm
Data files

r:/academic/90786
Making Sense of Data




Motivation in management and policy
What is data?
What’s the use of data?
Data variation
Motivation for Statistical Input

Managerial Decision Making



Changes in societal or organizational
conditions
Differences between observations and
expectations
Policy Making

Impact of changing the system
What is Data?


Unit of analysis
Number of variables


one, two, more than two
Level of measurement / kind of
data

Nominal, Ordinal, Interval
Unit of analysis







Focus of attention: a case that can be separately and uniquely identified
person (student, woman, tenant, ...)
place (city, street intersection, river, ...)
object (car, power plant, ...)
organization (school, corporation, ...)
incident (birth, election)
time period (day, season, year, ...)
Variables


Characteristics, attributes, and
occurrences observed about each unit
of analysis
Each variable requires a specific step-by-step
procedure to obtain its values
Examples

Driver's license application study




Unit of analysis: people who apply for a driver's license.
Outcome variable: License issued or not
Other variables: Applicant's age, sex, and race
Snowfall in Pittsburgh



Units of analysis: Snowstorms
Outcome variable: depth of the snowfall from each
storm
Other variables: date of snowstorm, temperature
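A minimal sketch of how the snowfall example might be stored, assuming one record per unit of analysis and one field per variable. The class name `Snowstorm`, its field names, and the sample values are hypothetical and are not course data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Snowstorm:
    """One unit of analysis: a single snowstorm (hypothetical layout)."""
    storm_date: date        # other variable
    temperature_f: float    # other variable
    depth_inches: float     # outcome variable


# Made-up illustrative values, not real Pittsburgh observations.
storms = [
    Snowstorm(date(1997, 1, 9), 28.0, 3.5),
    Snowstorm(date(1997, 2, 14), 31.0, 1.2),
]

# Each record is a unit of analysis; each field is a variable.
depths = [s.depth_inches for s in storms]
print(depths)
```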
Nominal data



Classifies outcomes by categories
Categories must be mutually exclusive
and exhaustive
Examples:

Marital status, region of the country, religion,
occupation, school district, place of birth, blood
type
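A small sketch of tabulating nominal data, assuming blood type is recorded with the four ABO groups only (ignoring the Rh factor). The function name `tabulate` and the observations are hypothetical. Counting is about all that can be done with nominal categories, since they have no order.

```python
from collections import Counter

# Assumed category set for illustration (ABO groups only).
BLOOD_TYPES = {"A", "B", "AB", "O"}


def tabulate(observations, categories):
    """Count nominal observations, rejecting labels outside the category set."""
    unknown = set(observations) - set(categories)
    if unknown:
        # The category set was not exhaustive for this data.
        raise ValueError(f"uncategorized values: {sorted(unknown)}")
    # Each observation carries exactly one label, so it falls in one category only.
    return Counter(observations)


# Hypothetical observations, not real data.
counts = tabulate(["O", "A", "O", "B", "AB", "O"], BLOOD_TYPES)
print(counts.most_common(1))  # the most frequent category
```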
Ordinal data


Classifies outcomes by ranked
categories
Examples:


Officers in the U.S. Army can be classified as:
1 = general
2 = colonel
3 = lieutenant colonel
4 = major
5 = captain
6 = first lieutenant
7 = second lieutenant
Education (highest diploma or degree attained)
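One way to represent the Army rank coding in code, as a sketch: the numeric codes come from the slide, while the class name `ArmyRank` and the helper `outranks` are hypothetical. The codes support ordering and comparison, but the gap between two codes carries no meaning.

```python
from enum import IntEnum


class ArmyRank(IntEnum):
    """Rank codes as listed on the slide (1 is the highest rank)."""
    GENERAL = 1
    COLONEL = 2
    LIEUTENANT_COLONEL = 3
    MAJOR = 4
    CAPTAIN = 5
    FIRST_LIEUTENANT = 6
    SECOND_LIEUTENANT = 7


def outranks(a, b):
    """True if rank a is higher than rank b (a lower code means a higher rank)."""
    return a < b


officers = [ArmyRank.CAPTAIN, ArmyRank.GENERAL, ArmyRank.MAJOR]
ordered = sorted(officers)  # highest rank first
print([rank.name for rank in ordered])
```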
Interval data


Classifies outcomes on a continuous
scale
Examples:



Scholastic Aptitude Test (SAT) score
Consumer Price Index (CPI)
Time of day
What’s the Use of Data?



Description
Evaluation
Estimation
Description



Summary of observations
In February, 1997 the M1A money
supply in Taiwan rose 6.46% over
February, 1996
Housing starts in June, 1996, rose to a
seasonally adjusted rate of 1,480,000
units from a revised 1,461,000 in May
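Descriptive summaries like these often come down to a percent change. The sketch below applies the usual formula, (new - old) / old, to the housing starts figures quoted above. The function name `percent_change` is hypothetical.

```python
def percent_change(old, new):
    """Percent change from old to new, relative to old."""
    return (new - old) / old * 100.0


may_starts = 1_461_000    # revised May figure from the slide
june_starts = 1_480_000   # June figure from the slide

change = percent_change(may_starts, june_starts)
print(f"Housing starts rose {change:.2f}% from May to June")
```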
Evaluation


Comparison of observed state of affairs
against expectations
Expectations are based on: ethical
norms, managerial plans and budgets
Estimation


Uses observations to assess an attribute of
a population or to predict future values.
A new charter school in Boston raised test
scores an average of 7 percentile points.


How would other charter schools do?
How will this charter school do in the future?
Data Variation: Data Compression and Display


Boxplots
Five number summary





minimum
lower quartile point
median
upper quartile point
maximum
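The five values above can be computed as in the sketch below. Quartile conventions differ between textbooks and software. This version uses the "inclusive" method of Python's `statistics.quantiles`, which is an assumption rather than the course's stated convention. The function name `five_number_summary` and the sample values are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    # quantiles(..., n=4) returns the three cut points that split data into quarters.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])


# Made-up batting averages for illustration only.
sample = [0.210, 0.240, 0.250, 0.263, 0.270, 0.290, 0.310]

summary = five_number_summary(sample)
for label, value in zip(
    ["minimum", "lower quartile", "median", "upper quartile", "maximum"], summary
):
    print(f"{label:<15}{value:.3f}")
```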
Batting Average of 263 major league baseball players
[Figure: vertical axis from 0 to 0.4]
Compressed Data Values
Median              0.263
Minimum             0.196
Maximum             0.353
Range               0.155
Mode                0.250
Mean                0.263
Standard Deviation  0.023
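A sketch of how summary values of this kind might be computed with Python's standard library. The data are made up and do not reproduce the 263 players. The slide does not say whether the standard deviation is the sample or population version, so the use of `statistics.stdev` (the sample version) is an assumption.

```python
import statistics

# Made-up batting averages for illustration only.
averages = [0.250, 0.250, 0.263, 0.270, 0.282]

summary = {
    "Median": statistics.median(averages),
    "Minimum": min(averages),
    "Maximum": max(averages),
    "Range": max(averages) - min(averages),
    "Mode": statistics.mode(averages),
    "Mean": statistics.mean(averages),
    # Sample standard deviation (divides by n - 1); an assumed choice.
    "Standard Deviation": statistics.stdev(averages),
}

for name, value in summary.items():
    print(f"{name:<20}{value:.3f}")
```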
Batting Average of 263 major league baseball players
[Figure: plot with vertical axis from 0 to 0.4, labeled Maximum 0.352, Median 0.263, Minimum 0.196]
Next Time ...

Data Compression for One Variable