
Last Time
• Hypothesis Testing
– Yes – No Questions
– Assess with p-value
P[what was seen or more conclusive (m.c.) | Boundary]
– Interpretation
– Small is conclusive
– 1-sided vs. 2-sided
Administrative Matters
Midterm I, coming Tuesday, Feb. 24
• Numerical answers:
– No computers, no calculators
– Handwrite Excel formulas (e.g. =9+4^2)
– Don’t do arithmetic (i.e. use such formulas)
• Bring with you:
– 8.5 x 11 inch sheet of paper
– With your favorite info (formulas, Excel, etc.)
• Course in Concepts, not Memorization
Administrative Matters
State of BlackBoard Discussion Board
• Generally happy with result
• But think carefully about “where to post”
– Look at current Thread HW 4
– Note “diffusion of questions”
– Hard to find what you want
• Suggest keeping posts for each HW problem together
– i.e. One “Root node” per HW problem
Administrative Matters
State of BlackBoard Discussion Board
• Suggest keeping posts for each HW problem together
– i.e. One “Root node” per HW problem
• Choose where to post (in tree) carefully
• Use better “Subject Lines”
– Not just generic “Replies”
– You can enter anything you want
– Try to make it clear to readers…
– Especially when “not following current line”
Reading In Textbook
Approximate Reading for Today’s Material:
Pages 261-262, 9-14
Approximate Reading for Next Class:
Pages 270-276, 30-34
Hypothesis Testing
In General:
p-value = P[what was seen, or more conclusive | at boundary between H0 & H1]
Caution:
“more conclusive” requires careful interpretation
Reason:
Need to decide between
1-sided Hypotheses, like
H0: p <
vs.
H1: p ≥
And 2-sided Hypotheses, like
H0: p =
vs.
H1: p ≠
Hypothesis Testing
e.g. a slot machine bears a sign which says
“Win 30% of the time”
In 10 plays, I don’t win any.
Can I conclude sign is false?
(& thus have grounds for complaint,
or is this a reasonable occurrence?)
Let p = P[win],
Model: let X = # wins in 10 plays, X ~ Bi(10, p)
Conclude sign false? Test:
H0: p = 0.3
vs.
H1: p ≠ 0.3
Hypothesis Testing
Test:
H0: p = 0.3
vs.
H1: p ≠ 0.3
p-value = P[X = 0 or more conclusive | p = 0.3]
(understand this by visualizing the number line)
Hypothesis Testing
Test:
H0: p = 0.3
vs.
H1: p ≠ 0.3
p-value = P[X = 0 or more conclusive | p = 0.3]
[Figure: number line of possible values of X, from 0 to 6]
3 = 30% of 10, most likely when p = 0.3,
i.e. least conclusive
so more conclusive includes X ≤ 0 (here just X = 0)
but since 2-sided, also include values as far out on the other side
Hypothesis Testing
Generally how to calculate?
[Figure: number line from 0 to 6, marking the Observed Value (0) and the Most Likely Value (3)]
# spaces = 3
so go 3 spaces in other direct’n
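The counting-spaces rule above can be sketched in Python. This is a minimal sketch under stated assumptions: `binom_pmf` and `more_conclusive_region` are hypothetical helper names, and when two values tie for most likely, the sketch simply takes the smaller one. Other conventions for 2-sided p-values exist, so statistical software will not always follow this exact rule.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P[X = k] for X ~ Bi(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def more_conclusive_region(observed: int, n: int, p0: float) -> tuple[int, int]:
    """Return (lower, upper) so that "observed or more conclusive" means
    X <= lower or X >= upper, computed at the boundary value p = p0.

    Hypothetical helper mirroring the slide's rule: count the spaces from the
    observed value to the most likely value, then go that many spaces the
    other way.
    """
    # Most likely value under p0; max() keeps the first (smallest) one on ties
    mode = max(range(n + 1), key=lambda k: binom_pmf(k, n, p0))
    spaces = abs(observed - mode)
    return mode - spaces, mode + spaces


print(more_conclusive_region(0, 10, 0.3))  # (0, 6)
```

For the slot machine, the most likely value is 3, the observed value 0 is 3 spaces away, so "more conclusive" means X ≤ 0 or X ≥ 6.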
Hypothesis Testing
Result: More conclusive means
X ≤ 0 or X ≥ 6
p-value = P[X = 0 or more conclusive | p = 0.3]
= P[X ≤ 0 or X ≥ 6 | p = 0.3]
= P[X ≤ 0] + (1 – P[X ≤ 5])
= 0.076
Excel result from:
http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor155-2009/ClassNotes/Stor155Eg4.xls
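As a cross-check on the 0.076 above, here is a small Python sketch of the same sum. `binom_cdf` is a hypothetical helper that plays the role of Excel's cumulative BINOMDIST(k, n, p, TRUE); it is not taken from the class spreadsheet.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P[X <= k] for X ~ Bi(n, p), like Excel's BINOMDIST(k, n, p, TRUE)."""
    if k < 0:
        return 0.0
    k = min(k, n)
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


n, p0 = 10, 0.3
# 2-sided: "X = 0 or more conclusive" means X <= 0 or X >= 6
p_value = binom_cdf(0, n, p0) + (1 - binom_cdf(5, n, p0))
print(round(p_value, 3))  # 0.076
```

The two pieces are P[X ≤ 0] ≈ 0.028 and P[X ≥ 6] ≈ 0.047.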
Hypothesis Testing
Test:
H0: p = 0.3
vs.
H1: p ≠ 0.3
p-value = 0.076
Yes-No Conclusion:
0.076 > 0.05,
so not safe to conclude “P[win] = 0.3” sign
is wrong, at level 0.05
(10 straight losses is reasonably likely)
Gray Level Conclusion:
in “fuzzy zone”,
some evidence, but not too strong
Hypothesis Testing
Alternate Question:
Same setup, can we conclude:
P[win] < 30% ???
• Seems like same question?
• Careful, “≠” became “<”
• I.e. 2-sided hypo became 1-sided hypo
• Difference can have major impact
Hypothesis Testing
Alternate Question:
Same setup, can we conclude:
P[win] < 30% ???
Test:
H0: p ≥ 0.3
vs.
H1: p < 0.3
p-value = P[X = 0 or m. c. | p = 0.3]
(same boundary between H0 & H1)
= P[X ≤ 0 | p = 0.3] = 0.028
Excel result from:
http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor155-2009/ClassNotes/Stor155Eg4.xls
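A short Python sketch puts the 1-sided and 2-sided p-values side by side. `binom_cdf` is again a hypothetical helper (not from the class spreadsheet) computing P[X ≤ k]; the only change between the two tests is whether the upper side X ≥ 6 is added in.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P[X <= k] for X ~ Bi(n, p)."""
    if k < 0:
        return 0.0
    k = min(k, n)
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


n, p0 = 10, 0.3
# 1-sided (H1: p < 0.3): only the lower side counts as more conclusive
one_sided = binom_cdf(0, n, p0)
# 2-sided (H1: p != 0.3): also adds the upper side X >= 6
two_sided = one_sided + (1 - binom_cdf(5, n, p0))
print(round(one_sided, 3), round(two_sided, 3))  # 0.028 0.076
```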
Hypothesis Testing
Alternate Question:
Same setup, can we conclude:
P[win] < 30% ???
p-value = 0.028
Yes-No:
Now can conclude P[win] < 30%
Hypothesis Testing
Yes-No:
Now can conclude P[win] < 30%
Paradox of Yes-No Approach:
• Have strong evidence that P[win] < 30%
• But cannot conclude P[win] diff’t from 30%
• Different from Common Sense
• I.e. “logic of statistical significance” differs from “ordinary logic”
• Reason: for 2-sided, uncertainty comes
from both sides, just adds to gray level
Hypothesis Testing
Alternate Question:
Same setup, can we conclude:
P[win] < 30% ???
p-value = 0.028
Yes-No:
Now can conclude P[win] < 30%
Gray Level:
Evidence still flaky, but stronger
• Note: No gray level paradox
• Since no cutoff, just “somewhat stronger…”
• This is why I recommend gray level
Hypothesis Testing
Lessons:
1-sided vs. 2-sided issues need:
1. Careful Implementation
(strongly affects answer)
2. Careful Interpretation
(notion of “P[win]≠30%” being tested
is different from usual)
Hypothesis Testing
Lessons:
1-sided vs. 2-sided issues need:
1. Careful Implementation
2. Careful Interpretation
But not so bad with Gray Level interpretation:
p-val < 0.01: “very strong”
0.01 ≤ p-val ≤ 0.1: “marginal” – “flaky”
0.1 < p-val: “very weak”
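The gray-level cutoffs listed above, and the yes-no rule at level 0.05 used earlier, can be written as two tiny Python functions. This is a sketch for illustration only; the function names are hypothetical, and the category labels are taken from the slide.

```python
def gray_level(p_value: float) -> str:
    """Gray-level reading of a p-value, using the cutoffs listed above."""
    if p_value < 0.01:
        return "very strong"
    if p_value <= 0.1:
        return "marginal / flaky"
    return "very weak"


def yes_no(p_value: float, level: float = 0.05) -> bool:
    """Yes-no reading: True means 'can conclude H1' at the given level."""
    return p_value < level


# The two slot-machine p-values from this lecture
for label, pv in [("2-sided", 0.076), ("1-sided", 0.028)]:
    print(label, pv, gray_level(pv), yes_no(pv))
```

Both slot-machine p-values land in the same gray-level band, while the yes-no answers flip across the 0.05 cutoff: the paradox above, seen in code.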
Hypothesis Testing
HW C14:
Answer from both gray-level and
yes-no viewpoints:
(c) A TV ad claims that 30% of people prefer
Brand X. Should we dispute this claim if a
random sample of 10 people shows:
(i) 2 people who prefer Brand X (p-val = 0.733)
(ii) 3 people who prefer Brand X (p-val = 1)
(iii) 6 people who prefer Brand X (p-val = 0.076)
(iv) 10 people who prefer Brand X (p-val = 5.9e-6)
Hypothesis Testing
HW C14:
Answer from both gray-level and
yes-no viewpoints:
(d) A manager asks 12 workers, of whom 7
say they are satisfied with working
conditions. Does this contradict the CEO’s
claim that ¾ of the workers are satisfied?
(p-val = 0.316)
Hypothesis Testing
HW:
8.22a, ignore “z statistic” (p-val = 0.006)
8.29a, ignore “sketch …” (p-val = 0.184)
And now for something
completely different
Coin tossing & die rolling:
• Useful thought models in this course
• We’ve calculated various probabilities
• Model for “randomness”…
• But how random are they really?
And now for something
completely different
Randomness in coin tossing:
• Excellent source
• Prof. Persi Diaconis (Stanford U.)
http://www-stat.stanford.edu/~cgates/PERSI/
And now for something
completely different
Randomness in coin tossing:
• Prof. Persi Diaconis (Stanford U.)
• Trained as performing magician
• Legendary Trick:
– He tosses coin, you call it, he catches it!
• Coin tosses not really random
And now for something
completely different
Randomness in die rolling?
Big Picture
• Hypothesis Testing
(Given dist’n, answer “yes-no”)
Can solve using BINOMDIST
• Margin of Error
(Find dist’n, use to measure error)
• Choose Sample Size
(for given amount of error)
Need better prob. tools
Big Picture
• Margin of Error
• Choose Sample Size
Need better prob. tools
Start with visualizing probability distributions
(key to “alternate representation”)
Visualization
Idea: Visually represent “distributions” (2 types)
a) Probability Distributions (e.g. Binomial)
Summarized by f(x)
b) Lists of numbers,
x1, x2, …, xn
Use subscripts to index different ones
Visualization
Examples of lists:
(will often use below)
1. Collection of “#’s of Males” from HW ???
2. 2.3, 4.5, 4.7, 4.8, 5.1
…
(there are many others)
Visualization
Connections between prob. dist’ns and lists:
(i) Given dist’n, can construct a related list by
drawing sample values from dist’n
e.g.
Bi(1,0.5)
(toss coins, count H’s)
1, 1, 1, 0, 0, 0, 1
Visualization
Connections between prob. dist’ns and lists
(ii) Given a list, x1, x2, …, xn,
(not thinking of these as random,
so use lower case)
can construct a dist’n:
$\hat{f}(x) = \dfrac{\#\{i : x_i = x\}}{n}$
• Use different symbol, to distinguish from f
• Use “hat” to indicate “estimate”
E.g. For above list: 1, 1, 1, 0, 0, 0, 1
$\hat{f}(x) = \begin{cases} 3/7 & x = 0 \\ 4/7 & x = 1 \\ 0 & \text{otherwise} \end{cases}$
Called the “empirical prob. dist’n”
or “frequency distribution”
Provides probability model for: choose random
number from list
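The empirical (frequency) distribution is just "count, then divide by n". A minimal Python sketch, with `empirical_dist` as a hypothetical helper name, using exact fractions so the 3/7 and 4/7 from the example come out exactly:

```python
from collections import Counter
from fractions import Fraction


def empirical_dist(xs):
    """Return f_hat as a dict: f_hat(x) = #{i : x_i = x} / n.

    Values not in the dict have f_hat(x) = 0 ("otherwise").
    """
    n = len(xs)
    counts = Counter(xs)
    return {x: Fraction(c, n) for x, c in sorted(counts.items())}


f_hat = empirical_dist([1, 1, 1, 0, 0, 0, 1])
print(f_hat)  # {0: Fraction(3, 7), 1: Fraction(4, 7)}
print(f_hat.get(2, 0))  # 0, since 2 is not in the list
```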
Visualization
Note: if start with f(x), and draw
random sample, X1, X2, …, Xn (as in (i))
(random, so use capitals)
And construct frequency distribution $\hat{f}(x)$ of X1, X2, …, Xn,
Then for n large, $\hat{f}(x) \approx f(x)$
(so “hat” notation is sensible)
– Recall “frequentist interpretation” of probability
– Can make precise, using $\lim_{n \to \infty} \hat{f}(x) = f(x)$
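A quick simulation illustrates $\hat{f}(x) \approx f(x)$ for large n. This sketch assumes f is Bi(1, 0.5) (toss a coin, count heads, as in (i)); the helper names and the fixed seed are illustrative choices, not part of the course materials.

```python
import random
from collections import Counter


def empirical_dist(xs):
    """f_hat(x) = #{i : x_i = x} / n, as floats."""
    n = len(xs)
    return {x: c / n for x, c in Counter(xs).items()}


def max_gap(n, seed=0):
    """Largest |f_hat(x) - f(x)| after n draws from Bi(1, 0.5)."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    f = {0: 0.5, 1: 0.5}  # Bi(1, 0.5): toss a coin, count heads
    sample = [rng.randint(0, 1) for _ in range(n)]
    f_hat = empirical_dist(sample)
    return max(abs(f_hat.get(x, 0.0) - f[x]) for x in f)


for n in (10, 100, 10_000, 200_000):
    print(n, round(max_gap(n), 4))
```

The gap tends to shrink as n grows, which is the frequentist interpretation in action; any single run can still wiggle a bit.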
Visualization
Simple visual representation for lists:
Use number line, put x’s
E.g. 2 (above): 2.3, 4.5, 4.7, 4.8, 5.1
[Figure: number line from 2 to 6, with an x at each of the 5 values]
• Picture already gives better impression
than list of numbers
• Will be much better when lists become “too
long to comprehend”
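The "number line, put x’s" picture can be mimicked in plain text. A rough sketch, assuming integer endpoints and single-digit tick labels; `dot_plot` is a hypothetical helper, and a real course plot would be drawn in Excel or a graphics package.

```python
def dot_plot(xs, lo, hi, width=41):
    """Hypothetical helper: an 'x' above a text number line for each value.

    lo and hi are integers. Values closer together than one character land
    on the same spot and only one 'x' shows: the overplotting problem.
    """
    scale = (width - 1) / (hi - lo)  # characters per unit
    marks = [" "] * width
    for x in xs:
        if not lo <= x <= hi:
            raise ValueError(f"{x} is outside [{lo}, {hi}]")
        marks[round((x - lo) * scale)] = "x"
    axis = ["-"] * width
    labels = [" "] * width
    for tick in range(lo, hi + 1):
        pos = round((tick - lo) * scale)
        axis[pos] = "+"
        labels[pos] = str(tick)
    return "\n".join("".join(row) for row in (marks, axis, labels))


print(dot_plot([2.3, 4.5, 4.7, 4.8, 5.1], lo=2, hi=6))
```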
Visualization
Drawbacks of:
Number line, & x’s
When have many data points:
• Hard to construct
• Can’t see all (overplotting)
• Hard to interpret
Visualization
Alternatives (Text, Sec. 1.1):
• Stem and leaf plots
– Clever visualization, for only pencil & paper
– But we have computers
– So won’t study further
• Histograms
– Will study carefully
Statistical Folklore
Graphical Displays:
• Important Topic in Statistics
• Has large impact
• Need to think carefully to do this
• Watch for attempts to fool you
Statistical Folklore
Graphical Displays:
Interesting Article:
“How to Display Data Badly”
Howard Wainer
The American Statistician, 38, 137-147.
Internet Available:
http://links.jstor.org
Statistical Folklore
Main Idea:
• Point out 12 types of bad displays
• With reasons behind
• Here are some favorites…
Statistical Folklore
Hiding the data in the scale
Statistical Folklore
The eye perceives
areas as “size”:
Statistical Folklore
Change of Scales in Mid-Axis
Really trust the Post???
Histograms
Idea: show rectangles, where area represents:
(a) Distributions: probabilities
(b) Lists (of numbers): # of observations
Note: will study these in parallel for a while
(several concepts apply to both)
Caution: There are variations not based on
areas, see bar graphs in text
But eye perceives area, so sensible to use it
Histograms
Steps for Constructing Histograms:
1. Pick class intervals that contain full dist’n
a. Prob. dist’ns:
If possible values are: x = 0, 1, … , n,
get good picture from choice:
[-½, ½), [½, 1.5), [1.5, 2.5), … , [n-½, n+½)
where [1.5, 2.5) is “all #s ≥ 1.5 and < 2.5”
(called a “half open interval”)
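Half-open intervals settle which class a boundary value belongs to. A small Python sketch, assuming the classes [x-½, x+½) for x = 0, 1, …, n; `class_edges` and `class_of` are hypothetical names, and `bisect.bisect_right` does the lookup.

```python
import bisect


def class_edges(n):
    """Edges -1/2, 1/2, 3/2, ..., n+1/2 for the classes [x-1/2, x+1/2), x = 0..n."""
    return [x - 0.5 for x in range(n + 2)]


def class_of(value, edges):
    """Index i of the half-open class [edges[i], edges[i+1]) containing value."""
    i = bisect.bisect_right(edges, value) - 1
    if i < 0 or i >= len(edges) - 1:
        raise ValueError(f"{value} is outside all classes")
    return i


edges = class_edges(10)
print(class_of(1.5, edges))  # 2, since 1.5 is in [1.5, 2.5), the class for x = 2
```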
Histograms
Steps for Constructing Histograms:
1. Pick class intervals that contain full dist’n
a. Prob. dist’ns
b. Lists:
e.g. 2.3, 4.5, 4.7, 4.8, 5.1 (same e.g. as above)
Start with [1,3), [3,7)
• As above, use half open intervals (to break ties)
• Note: these contain full data set
• Can use anything for class intervals
• But some choices better than others…
Histograms
Steps for Constructing Histograms:
1. Pick class intervals that contain full dist’n
2. Find “probabilities” or “relative frequencies”
for each class
(a) Probs: use f(x) for [x-½, x+½), etc.
(b) Lists: [1,3): rel. freq. = 1/5 = 20%
[3,7): rel. freq. = 4/5 = 80%
Histograms
Steps for Constructing Histograms:
1. Pick class intervals that contain full dist’n
2. Find “probabilities” or “relative frequencies”
for each class
3. Above each interval, draw rectangle where
area represents class frequency
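For the list example, steps 1 to 3 can be sketched in Python. The slides work step 3 only for width-1 classes; for the unequal widths [1,3) and [3,7), this sketch assumes height = rel. freq. / width, which is what "area represents class frequency" requires. `histogram_classes` is a hypothetical helper.

```python
from fractions import Fraction


def histogram_classes(xs, edges):
    """For half-open classes [edges[i], edges[i+1]), return a list of
    (class, relative frequency, rectangle height).

    Height = rel. freq. / width, so that rectangle AREA = rel. freq.
    """
    n = len(xs)
    rows = []
    for left, right in zip(edges, edges[1:]):
        count = sum(1 for x in xs if left <= x < right)
        rel_freq = Fraction(count, n)
        rows.append(((left, right), rel_freq, rel_freq / (right - left)))
    # Step 1 check: the classes must contain the full data set
    if sum(rel for _, rel, _ in rows) != 1:
        raise ValueError("class intervals do not contain the full data set")
    return rows


for (left, right), rel, height in histogram_classes([2.3, 4.5, 4.7, 4.8, 5.1], [1, 3, 7]):
    print(f"[{left},{right}): rel. freq. = {rel}, height = {height}")
# [1,3): rel. freq. = 1/5, height = 1/10
# [3,7): rel. freq. = 4/5, height = 1/5
```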
Histograms
3. Above each interval, draw rectangle where
area represents class frequency
(a) Probs: If width = 1, then
area = width x height = height
So get area = f(x), by taking height = f(x)
E.g. Binomial Distribution
Binomial Prob. Histograms
From Class Example 5
http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor155-2009/ClassNotes/Stor155Eg5.xls
Construct Prob. Histo:
• Create column of x values
(do 1st two, and drag box)
• Compute f(x) values
(create 1st one, and drag twice)
• Make bar plot
Binomial Prob. Histograms
• Make bar plot
– “Insert” tab
– Choose “Column”
– Right Click – Select Data
(Horizontal – x’s, “Add series”, Probs)
– Resize, and move by dragging
– Delete legend
– Click and change title
– Right Click on Bars, Format Data Series:
• Border Color, Solid Line, Black
• Series Options, Gap Width = 0
Binomial Prob. Histograms
Construct Prob. Histo:
• Make several, for interesting comparison
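Outside Excel, the same two columns (x and f(x)) and the bars can be sketched in Python as a text histogram. Each class [x-½, x+½) has width 1, so bar length is proportional to f(x) and hence to area. The helper names, the choice n = 10, and the comparison values p = 0.3, 0.5, 0.8 are illustrative assumptions.

```python
from math import comb


def binom_pmf_table(n, p):
    """Columns x = 0..n and f(x) = P[X = x], like the two Excel columns."""
    return [(x, comb(n, x) * p**x * (1 - p) ** (n - x)) for x in range(n + 1)]


def text_histogram(n, p, scale=100):
    """One bar per class [x-1/2, x+1/2); width 1, so bar length tracks f(x)."""
    lines = [f"Bi({n}, {p})"]
    for x, fx in binom_pmf_table(n, p):
        lines.append(f"{x:>3} {fx:6.4f} " + "#" * round(fx * scale))
    return "\n".join(lines)


# Make several, for comparison
for p in (0.3, 0.5, 0.8):
    print(text_histogram(10, p))
    print()
```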