Unit #1 Chapters

CHAPTER 2: DATA
• THE FIVE W’S
  • WHO- THE ROWS OF A DATA TABLE THAT CORRESPOND TO THE INDIVIDUAL CASES ABOUT WHOM WE RECORD SOME STATISTICS.
  • WHAT- THE CHARACTERISTICS RECORDED ABOUT EACH INDIVIDUAL
  • WHY- REASON DATA WAS COLLECTED
  • WHERE- WHERE THE DATA WAS COLLECTED
  • WHEN- THE TIME THAT THE DATA WAS COLLECTED
  • HOW- IT IS IMPORTANT TO EXPLAIN THE TYPE OF EXPERIMENT, SURVEY, OR STUDY THAT WAS CONDUCTED
• COLLECTED DATA IS ORGANIZED INTO DATA TABLES

        X         Y         Z
A       123       34567     56789
B       123345    789       0987654
CHAPTER 2 CONT.
• VARIABLES
  • CATEGORICAL VARIABLE- ANSWERS QUESTIONS ABOUT HOW CASES FALL INTO CATEGORIES
  • QUANTITATIVE VARIABLE- ANSWERS QUESTIONS ABOUT THE QUANTITY OF WHAT IS MEASURED. MEASURED IN UNITS.
• TYPES OF INDIVIDUALS
  • SUBJECTS- PEOPLE WE EXPERIMENT ON
  • EXPERIMENTAL UNITS- ANIMALS, PLANTS, WEB SITES, AND OTHER NONHUMAN SUBJECTS
  • RESPONDENTS- INDIVIDUALS WHO ANSWER A SURVEY
CHAPTER 3: DISPLAYING AND DESCRIBING DATA
• THREE RULES OF DATA ANALYSIS
  1. MAKE A PICTURE
  2. MAKE A PICTURE
  3. MAKE A PICTURE
• FREQUENCY TABLE
  • TABLE THAT ORGANIZES COUNTS FOR CATEGORICAL DATA
  • RELATIVE FREQUENCY TABLES SHOW PERCENTS INSTEAD OF COUNTS
  • IT IS IMPORTANT TO KNOW THE PROPORTIONS SO WE CAN REPORT PERCENTS
• AREA PRINCIPLE- THE AREA OCCUPIED BY A PART OF THE GRAPH SHOULD CORRESPOND TO THE MAGNITUDE OF THE VALUE IT REPRESENTS.
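
A frequency table and its relative frequency version can be sketched in a few lines of Python. The `grades` list below is made-up data for illustration only, and the variable names are assumptions rather than anything from the notes.

```python
from collections import Counter

# Hypothetical categorical data (made up for illustration): letter grades of 10 students.
grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B", "A"]

# Frequency table: the count of cases in each category.
counts = Counter(grades)

# Relative frequency table: each count divided by the total number of cases.
total = sum(counts.values())
relative = {category: count / total for category, count in counts.items()}

# Print both tables side by side, one category per row.
for category in sorted(counts):
    print(f"{category}: count={counts[category]}, percent={relative[category]:.0%}")
```

The relative frequencies always add up to 1 (100%), which is a quick check that no case was dropped.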
CHAPTER 3 CONT.
• BAR CHART- DISPLAYS THE DISTRIBUTION OF A CATEGORICAL VARIABLE, SHOWING THE COUNTS FOR EACH CATEGORY NEXT TO EACH OTHER FOR EASY COMPARISON.
• PIE CHART- SHOWS ALL THE CASES AS A CIRCLE, SLICED INTO PIECES WHOSE SIZES ARE PROPORTIONAL TO THE FRACTION OF THE WHOLE IN EACH CATEGORY.
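• CONTINGENCY TABLE
  • SHOWS HOW THE CASES ARE DISTRIBUTED ACROSS TWO CATEGORICAL VARIABLES AT ONCE, ONE IN THE ROWS AND ONE IN THE COLUMNS
  • MARGINAL DISTRIBUTION- THE TOTALS IN THE MARGINS OF THE TABLE, WHICH GIVE THE COUNTS (OR PERCENTS) FOR ONE VARIABLE ALONE
  • CONDITIONAL DISTRIBUTION- THE PERCENTS FOR ONE VARIABLE AMONG ONLY THE CASES IN A SINGLE CATEGORY OF THE OTHER VARIABLE
• INDEPENDENCE- WHEN THE DISTRIBUTION OF ONE VARIABLE IS THE SAME FOR ALL CATEGORIES OF ANOTHER.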
[Figures: Bar Chart and Pie Chart of AP Stats Grades (A, B, C, D, F)]
Contingency Table

            X         Y
A           5678      234567
B           98765     345678
Total       99999     98765
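
As a rough sketch of marginal and conditional distributions, the Python below works through a small two-way table. The counts, the category names, and the 0.05 tolerance used to compare percents are all hypothetical choices made for illustration; the tolerance is not a formal test of independence.

```python
# Hypothetical counts (made up): rows are class year, columns are preferred snack.
table = {
    "JUNIOR": {"CHIPS": 30, "FRUIT": 20},
    "SENIOR": {"CHIPS": 15, "FRUIT": 35},
}
columns = ["CHIPS", "FRUIT"]

grand_total = sum(sum(cells.values()) for cells in table.values())

# Marginal distribution of the row variable: row totals as a share of all cases.
row_marginal = {r: sum(cells.values()) / grand_total for r, cells in table.items()}

# Marginal distribution of the column variable: column totals as a share of all cases.
col_marginal = {c: sum(table[r][c] for r in table) / grand_total for c in columns}

# Conditional distribution of the column variable within each row category.
conditional = {
    r: {c: n / sum(cells.values()) for c, n in cells.items()}
    for r, cells in table.items()
}

# Informal independence check: do the conditional percents match the marginal ones?
# The 0.05 tolerance is an arbitrary illustration, not a statistical test.
looks_independent = all(
    abs(conditional[r][c] - col_marginal[c]) < 0.05 for r in table for c in columns
)

print(row_marginal, col_marginal, conditional, looks_independent)
```

Here the share preferring chips differs between the two rows, so the variables do not look independent in this made-up table.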
CHAPTER 4: DISPLAYING AND SUMMARIZING DATA
• HISTOGRAM- GROUPS THE VALUES OF A QUANTITATIVE VARIABLE INTO BINS AND REPRESENTS THE COUNT IN EACH BIN AS A BAR.
• RELATIVE FREQUENCY HISTOGRAM- SAME AS A HISTOGRAM, REPLACING THE COUNTS ON THE VERTICAL AXIS WITH PERCENTAGES OF THE TOTAL NUMBER OF CASES.
• STEM-AND-LEAF PLOT- SIMILAR TO A HISTOGRAM, BUT IT SHOWS EACH INDIVIDUAL VALUE.
• DOTPLOT- A DOT IS PLACED ALONG AN AXIS FOR EACH CASE IN THE DATA.
• QUANTITATIVE DATA CONDITION- THE DATA ARE VALUES OF A QUANTITATIVE VARIABLE WHOSE UNITS ARE KNOWN. CHECK THIS BEFORE MAKING A GRAPHICAL DISPLAY.
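
The displays above all start from the same step: sorting values into groups. The sketch below bins a made-up list of quiz scores into a text histogram with percents, then builds the stems and leaves of a stem-and-leaf plot. The data, the bin width of 10, and the starting edge of 50 are assumptions chosen for illustration.

```python
# Hypothetical quantitative data (made up): quiz scores in points.
scores = [52, 55, 61, 64, 67, 68, 70, 73, 75, 78, 81, 88]

bin_width = 10  # assumed bin width, in points
bin_start = 50  # assumed left edge of the first bin

# Histogram: count how many cases fall in each bin [left_edge, left_edge + bin_width).
counts = {}
for value in scores:
    left_edge = bin_start + ((value - bin_start) // bin_width) * bin_width
    counts[left_edge] = counts.get(left_edge, 0) + 1

# Relative frequency histogram: same bars, labeled with percents of all cases.
n = len(scores)
for left_edge in sorted(counts):
    count = counts[left_edge]
    percent = 100 * count / n
    print(f"{left_edge}-{left_edge + bin_width - 1}: {'#' * count} ({percent:.0f}%)")

# Stem-and-leaf plot: the tens digit is the stem, the ones digit is the leaf,
# so every individual value can still be read off the display.
stems = {}
for value in sorted(scores):
    stems.setdefault(value // 10, []).append(value % 10)
for stem in sorted(stems):
    print(f"{stem} | {''.join(str(leaf) for leaf in stems[stem])}")
```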
CHAPTER 4 CONT.
• THREE THINGS TO DESCRIBE A DISTRIBUTION
  1. SHAPE- WHETHER IT IS UNIMODAL OR BIMODAL, SYMMETRIC OR SKEWED, AND WHETHER OR NOT THERE ARE OUTLIERS.
  2. CENTER- THE CENTER OF THE DATA. USUALLY REPORTED AS THE MEDIAN.
     • MEDIAN- THE MIDDLE VALUE, WHICH DIVIDES THE HISTOGRAM INTO TWO HALVES OF EQUAL AREA.
  3. SPREAD- THE RANGE AND INTERQUARTILE RANGE OF THE DATA.
     • RANGE- THE DIFFERENCE BETWEEN THE MAXIMUM AND THE MINIMUM OF THE DATA.
     • INTERQUARTILE RANGE (IQR)- THE DIFFERENCE BETWEEN THE UPPER QUARTILE AND THE LOWER QUARTILE.
• 5 NUMBER SUMMARY- REPORTS THE MINIMUM, LOWER QUARTILE, MEDIAN, UPPER QUARTILE, AND MAXIMUM OF A DATA SET.
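
The 5 number summary, range, and interquartile range can be computed with a short function. This is a minimal sketch: the helper name `five_number_summary` and the data are hypothetical, and the quartiles use the median-of-halves rule (leaving the middle value out of both halves when the count is odd). Other textbooks and calculators use slightly different quartile rules and may give slightly different answers.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]           # values below the median position
    upper = data[n // 2 + n % 2 :]   # values above it (middle value excluded if n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]

# Hypothetical data set (made up for illustration).
values = [12, 15, 17, 20, 22, 25, 28, 31, 40]

minimum, q1, med, q3, maximum = five_number_summary(values)
data_range = maximum - minimum  # range: maximum minus minimum
iqr = q3 - q1                   # interquartile range: upper quartile minus lower quartile

print(minimum, q1, med, q3, maximum, data_range, iqr)
```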
CHAPTER 4 CONT.
• MEAN
  • FEELS LIKE THE CENTER BECAUSE IT IS THE POINT WHERE THE HISTOGRAM BALANCES.
  • CALCULATED BY DIVIDING THE TOTAL OF YOUR DATA BY THE NUMBER OF DATA POINTS.
  • USED WHEN THE HISTOGRAM IS SYMMETRIC AND THERE ARE NO OUTLIERS.
• MEDIAN
  • IS RESISTANT TO VALUES THAT ARE EXTRAORDINARILY LARGE OR SMALL.
  • USED WHEN THE DATA ARE SKEWED OR HAVE OUTLIERS.
• STANDARD DEVIATION
  • ACCOUNTS FOR HOW FAR EACH VALUE IS FROM THE MEAN.
  • MOST APPROPRIATE FOR SYMMETRIC DATA WITHOUT OUTLIERS, BECAUSE IT IS NOT RESISTANT TO EXTREME VALUES.
  • CALCULATED BY FIRST FINDING THE VARIANCE (THE SUM OF THE SQUARED DEVIATIONS FROM THE MEAN, DIVIDED BY N - 1) AND THEN TAKING ITS SQUARE ROOT.
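
To see why the median is called resistant and how the standard deviation comes from the variance, the sketch below compares two made-up data sets that differ only in their largest value. All data and variable names are hypothetical; the variance uses the sample formula with a divisor of n - 1.

```python
from math import isclose, sqrt
from statistics import mean, median, stdev

# Hypothetical data (made up): identical except for the last value.
symmetric = [4, 5, 6, 7, 8]
with_outlier = [4, 5, 6, 7, 80]

# The mean is pulled toward the extreme value; the median does not move.
print("mean:  ", mean(symmetric), "vs", mean(with_outlier))
print("median:", median(symmetric), "vs", median(with_outlier))

# Standard deviation by hand: variance first, then its square root.
m = sum(symmetric) / len(symmetric)
deviations_squared = [(x - m) ** 2 for x in symmetric]
sample_variance = sum(deviations_squared) / (len(symmetric) - 1)
sd = sqrt(sample_variance)

# The hand calculation agrees with the standard library's sample standard deviation.
print("sd by hand:", sd, "library:", stdev(symmetric))
print("agree:", isclose(sd, stdev(symmetric)))
```

The standard deviation of `with_outlier` is also far larger than that of `symmetric`, which is why it is a poor summary when outliers are present.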
CHAPTER 5: UNDERSTANDING AND COMPARING DISTRIBUTIONS
• BOXPLOT- A GRAPHICAL REPRESENTATION OF A 5 NUMBER SUMMARY. ALSO SHOWS OUTLIERS OF THE DATA.
• OUTLIERS
  • ANY POINT THAT STANDS FAR FROM THE REST OF THE DATA BECAUSE IT IS EXTREMELY HIGH OR EXTREMELY LOW.
  • TO DETERMINE WHETHER OR NOT A POINT IS AN OUTLIER, COMPUTE 1.5 X IQR, THEN SUBTRACT IT FROM THE LOWER QUARTILE AND ADD IT TO THE UPPER QUARTILE. ANY POINT BELOW THE LOWER RESULT OR ABOVE THE UPPER RESULT IS AN OUTLIER.
• RE-EXPRESSING OR TRANSFORMING DATA- APPLY A SIMPLE FUNCTION TO MAKE SKEWED DATA MORE SYMMETRIC. EX: TAKING THE NATURAL LOG OF YOUR DATA.
• BOXPLOTS ALLOW YOU TO COMPARE SEVERAL DISTRIBUTIONS SIDE BY SIDE.
[Figure: Comparing Distributions]
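
The 1.5 X IQR outlier rule can be sketched as follows. The data and the helper name `quartiles` are hypothetical, and the quartiles again use the median-of-halves rule, which is one of several conventions in use.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]
    upper = data[n // 2 + n % 2 :]
    return median(lower), median(upper)

# Hypothetical data (made up): one value sits far above the rest.
data = [10, 12, 13, 15, 16, 18, 19, 21, 45]

q1, q3 = quartiles(data)
iqr = q3 - q1

# Fences: 1.5 x IQR below the lower quartile and above the upper quartile.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any point outside the fences is flagged as an outlier.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```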
CHAPTER 6: THE STANDARD DEVIATION AS A RULER AND THE NORMAL MODEL
• STANDARD DEVIATION
  • ANSWERS THE QUESTIONS: HOW FAR IS THIS VALUE FROM THE MEAN? HOW DIFFERENT ARE THESE TWO STATISTICS?
  • STANDARDIZED VALUES OR Z-SCORES MEASURE THE DISTANCE OF EACH DATA VALUE FROM THE MEAN IN STANDARD DEVIATIONS. STANDARDIZED VALUES HAVE NO UNITS.
• SHIFTING AND RESCALING DATA
  • WHEN WE ADD A CONSTANT TO OR SUBTRACT A CONSTANT FROM EACH VALUE, ALL MEASURES OF POSITION (CENTER, PERCENTILES, MIN, AND MAX) INCREASE OR DECREASE BY THAT SAME CONSTANT. THIS LEAVES THE SPREAD THE SAME.
  • WHEN WE MULTIPLY OR DIVIDE EACH VALUE BY A CONSTANT, ALL MEASURES OF POSITION AND SPREAD ARE MULTIPLIED OR DIVIDED BY THAT CONSTANT.
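
The effects of shifting and rescaling, and the fact that z-scores have no units, can be checked numerically. The heights below are made-up data, and the helper `z_scores` is a hypothetical name introduced for this sketch; the rescaling uses the conversion 1 inch = 2.54 cm.

```python
from math import isclose
from statistics import mean, stdev

def z_scores(values):
    """Standardize: subtract the mean, then divide by the standard deviation."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

# Hypothetical data (made up): heights in inches.
heights_in = [60, 62, 65, 68, 70]

shifted = [x + 2 for x in heights_in]       # add a constant to every value
heights_cm = [x * 2.54 for x in heights_in]  # multiply every value by a constant

# Shifting moves the center but leaves the spread alone.
print(mean(heights_in), mean(shifted), stdev(heights_in), stdev(shifted))

# Rescaling multiplies both the center and the spread by the constant.
print(mean(heights_cm), stdev(heights_cm))

# Z-scores are unitless: inches and centimeters give the same standardized values.
z_in = z_scores(heights_in)
z_cm = z_scores(heights_cm)
print([round(z, 3) for z in z_in])
print(all(isclose(a, b, abs_tol=1e-9) for a, b in zip(z_in, z_cm)))
```

Standardizing is itself a shift followed by a rescale, which is why the z-scores always end up with mean 0 and standard deviation 1.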
CHAPTER 6 CONT.
• NORMAL MODEL
  • THE BELL-SHAPED CURVE THAT IS APPROPRIATE FOR DISTRIBUTIONS WHOSE SHAPES ARE UNIMODAL AND SYMMETRIC.
  • THE NUMBERS WE USE TO SPECIFY THIS MODEL (ITS MEAN AND STANDARD DEVIATION) ARE CALLED PARAMETERS.
  • SUMMARIES OF THE DATA ARE CALLED STATISTICS.
  • A NORMAL MODEL WITH A MEAN OF 0 AND A STANDARD DEVIATION OF 1 IS CALLED THE STANDARD NORMAL MODEL.
  • IN ORDER TO USE THIS MODEL, THE DATA MUST MEET THE NEARLY NORMAL CONDITION.
• THE 68-95-99.7 RULE- SAYS THAT IN A NORMAL MODEL ABOUT 68% OF THE VALUES FALL WITHIN 1 STANDARD DEVIATION OF THE MEAN, ABOUT 95% FALL WITHIN 2 STANDARD DEVIATIONS, AND ABOUT 99.7% FALL WITHIN 3 STANDARD DEVIATIONS.
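
The percentages in the 68-95-99.7 rule can be checked against the standard normal model using Python's standard library. This is only a sketch for verification; the variable names are assumptions.

```python
from statistics import NormalDist

# Standard normal model: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Share of the model within k standard deviations of the mean, for k = 1, 2, 3.
within = {}
for k in (1, 2, 3):
    within[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} SD: {within[k]:.4f}")
```

The exact shares are slightly different from the rounded figures in the rule, which is why the rule is stated with "about".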
CHAPTER 6 CONT.
• RULES FOR WORKING WITH THE NORMAL MODEL
  1. MAKE A PICTURE
  2. MAKE A PICTURE
  3. MAKE A PICTURE
• NORMAL PROBABILITY PLOT- TELLS YOU WHETHER YOUR DATA ARE NEARLY NORMAL. IF THE POINTS LIE CLOSE TO A STRAIGHT DIAGONAL LINE, THE NORMAL MODEL IS REASONABLE.
[Figure: Normal Model and Normal Probability Plot]
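
A normal probability plot pairs each sorted data value with the z-score a normal model would expect at that position. The sketch below builds those pairs without drawing anything and uses the correlation between them as a rough numeric stand-in for "close to a diagonal line". Several choices here are assumptions not taken from the notes: the plotting positions (i - 0.5) / n, the use of correlation as a summary, the helper names, and both made-up data sets.

```python
from math import sqrt
from statistics import NormalDist, mean

def normal_scores(n):
    """Expected z-scores for n sorted values, using plotting positions (i - 0.5) / n."""
    return [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def correlation(xs, ys):
    """Pearson correlation, written out so the sketch needs no extra libraries."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def straightness(values):
    """Correlation between sorted data and normal scores; near 1 means nearly straight."""
    data = sorted(values)
    return correlation(normal_scores(len(data)), data)

# Hypothetical data sets (made up): one roughly symmetric, one strongly right-skewed.
roughly_symmetric = [61, 64, 66, 67, 68, 69, 70, 72, 75]
right_skewed = [1, 1, 2, 2, 3, 4, 6, 12, 40]

r_symmetric = straightness(roughly_symmetric)
r_skewed = straightness(right_skewed)
print(round(r_symmetric, 3), round(r_skewed, 3))
```

The symmetric data give a correlation very close to 1, while the skewed data give a clearly lower value, matching the bend such data would show on the plot.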