Subjective Probability

Information Design
Scott Matthews
Courses: 12-706 / 19-702 / 73-359
Admin Issues
HW 5 (due next Wed.)
Next project schedule
Case studies coming
12-706 and 73-359
Subjective Probabilities
Main Idea: We all have to make personal
judgments (and decisions) in the face of
uncertainty (Granger Morgan’s career)
These personal judgments are subjective
Subjective judgments of uncertainty can be made in
terms of probability
Examples:
“My house will not be destroyed by a hurricane.”
“The Pirates will have a winning record (ever).”
“Driving after I have 2 drinks is safe.”
Outcomes and Events
Event: something about which we are uncertain
Outcome: result of uncertain event
Subjectively: once event (e.g., coin flip) has
occurred, what is our judgment on outcome?
Represents degree of belief of outcome
Long-run frequencies, etc. are irrelevant: we need a judgment about this one event
Example: Steelers* play AFC championship game at
home. I Tivo it instead of watching live. I assume
before watching that they will lose.
*Insert Cubs, etc. as needed (Sox removed 2005)
Next Steps
Goal is capturing the uncertainty, biases, etc. in
these judgments
Might need to quantify verbal expressions (e.g.,
remote, likely, non-negligible, ...)
What to do if question not answerable directly?
Example: if I say there is a “negligible” chance of
anyone failing this class, what probability do you
assume?
What if I say “non-negligible chance that someone
will fail”?
Merging of Theories
Science has known that “objective” and
“subjective” factors existed for a long time
Only more recently did we realize we could
represent subjective as probabilities
But inherently all of these subjective decisions
can be ordered by decision tree
Where we have a gamble or bet between what we
know and what we think we know
Clemen uses the basketball game gamble example
We would keep adjusting payoffs until indifferent between the two sides of the bet
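The betting idea can be sketched numerically. This is only an illustration, assuming the person is risk neutral (compares bets by expected value) and that the payoffs have been adjusted until neither side of the bet is preferred. The function name and the dollar amounts are hypothetical and are not taken from the slides.

```python
def implied_probability(win_amount: float, lose_amount: float) -> float:
    """Subjective probability implied by indifference between two sides of a bet.

    Side A: win `win_amount` if the event happens, lose `lose_amount` otherwise.
    Side B: the mirror image (lose `win_amount` if it happens, win `lose_amount`
    otherwise).
    Indifference under expected value means, with X = win_amount, Y = lose_amount:
        p*X - (1 - p)*Y = -p*X + (1 - p)*Y   =>   p = Y / (X + Y)
    """
    if win_amount <= 0 or lose_amount <= 0:
        raise ValueError("payoffs must be positive")
    return lose_amount / (win_amount + lose_amount)


# Hypothetical payoffs at which the bettor no longer prefers either side.
p = implied_probability(win_amount=250, lose_amount=380)
print(round(p, 3))
```

Being willing to risk more than you stand to win implies you believe the event is more likely than not, which is why the larger loss amount pushes the result above 0.5.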
Probability Wheel
Mechanism for formalizing our thoughts
on probabilities of comparative lotteries
You select the area of the pie chart until
you’re indifferent between the two lotteries
Quick 2-person exercise. Then we’ll
discuss p-values.
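The wheel procedure can be thought of as a search over the winning area of the wheel. The sketch below is one possible way to organize it (a simple bisection), not a procedure prescribed in the slides. The respondent is simulated by a hypothetical function `prefers_wheel`, which answers whether the wheel lottery looks better than the event lottery at a given winning area.

```python
from typing import Callable


def elicit_with_wheel(prefers_wheel: Callable[[float], bool],
                      tolerance: float = 0.01) -> float:
    """Narrow down the wheel's winning area until the respondent is indifferent.

    `prefers_wheel(area)` returns True if the respondent would rather play the
    wheel (winning with probability `area`) than bet on the uncertain event.
    """
    low, high = 0.0, 1.0
    while high - low > tolerance:
        area = (low + high) / 2
        if prefers_wheel(area):
            high = area  # wheel is too attractive: shrink the winning area
        else:
            low = area   # event bet is more attractive: enlarge the winning area
    return (low + high) / 2


# Simulated respondent whose underlying belief (unknown to the analyst) is 0.70.
hidden_belief = 0.70
estimate = elicit_with_wheel(lambda area: area > hidden_belief)
print(round(estimate, 2))
```

In a real session the answers come from a person rather than a function, so responses may be inconsistent and the search would usually stop after a handful of questions.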
Continuous Distributions
Similar to above, but we need to do it a
few times.
E.g., try to get 5%, 50%, 95% points on
distribution
Each point done with a “cdf-like” lottery
comparison
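Once the 5%, 50%, and 95% points have been assessed, they can be put to use in a few lines. The sketch below shows two options that are not spelled out in the slides: a straight-line interpolation of the cdf between the assessed points, and the extended Pearson-Tukey three-point approximation of the mean (weights 0.185, 0.630, 0.185 on the 5%, 50%, and 95% points). The function names and the example values are hypothetical.

```python
from bisect import bisect_right


def pearson_tukey_mean(p05: float, p50: float, p95: float) -> float:
    """Extended Pearson-Tukey approximation of the mean from three fractiles."""
    return 0.185 * p05 + 0.630 * p50 + 0.185 * p95


def cdf_between_points(x: float, points: list[tuple[float, float]]) -> float:
    """Linearly interpolate the cdf between assessed (value, cumulative prob) points.

    `points` must be sorted by value with distinct values. Nothing is assumed
    about the tails, so x outside the assessed range raises an error.
    """
    values = [v for v, _ in points]
    if not values[0] <= x <= values[-1]:
        raise ValueError("x is outside the assessed range")
    i = min(bisect_right(values, x), len(points) - 1)
    (x0, c0), (x1, c1) = points[i - 1], points[i]
    return c0 + (x - x0) / (x1 - x0) * (c1 - c0)


# Hypothetical assessed points for some uncertain quantity.
assessed = [(10.0, 0.05), (20.0, 0.50), (40.0, 0.95)]
print(pearson_tukey_mean(10.0, 20.0, 40.0))
print(cdf_between_points(30.0, assessed))
```

Assessing more points (for example the 25% and 75% points) would give the interpolated cdf a better shape. The three-point version is only a rough summary.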
Danger: Heuristics and Biases
Heuristics are “rules of thumb”
Which do we use in life? Biased? How?
Representativeness (fit in a category)
Availability (seen it before, fits memory)
Anchoring/Adjusting (common base point)
Motivational Bias (perverse incentives)
Idea is to consider these in advance and
make people aware of them
Asking Experts
In the end, often we do studies like this,
but use experts for elicitation
Idea is we should “trust” their predictions
more, and can better deal with biases
Lots of training and reinforcement steps
But in the end, get nice prob functions
Information Design
What is it? Idea of carefully linking what data you
have with what you want to say
“God” of the field: Edward Tufte (.com)
Quotes from his books (mostly his first)
The eye can recognize 150 Mbits of information
And is connected to our brain, a great processor
Perhaps most important: don’t just blindly use built-in graph/graphic tools when you have a significant
point to make
a.k.a. Excel and PowerPoint are not friends!
They create simplistic graphs that dumb us down
Your graphics say a lot about your perceived command
Some pre-thoughts
In statistics, plotting raw data is useful because it can show outliers (easy to see)
Analytical results need same treatment
Strive for “Graphical Excellence”
"consists of complex ideas communicated
with clarity, precision, and efficiency
is that which gives to the viewer the
greatest number of ideas in the shortest
time with the least “ink” in the smallest
space
is nearly always multivariate
requires telling the truth about the data."
Graphics/Viz should:
"show the data
induce viewer to think about the substance rather than about methodology, graphic design, the technology, etc.
avoid distorting what the data have to say
present many numbers in a small space
make large data sets coherent
encourage the eye to compare different pieces of data
reveal the data at several levels of detail, from a broad overview to the fine structure
serve a reasonably clear purpose: description, exploration, tabulation, or decoration
be closely integrated with the statistical and verbal descriptions of a data set."
Visualization goals
content focus
comparison rather than mere description
Integrity
high resolution
utilization of classic designs and concepts
proven by time.
Content Focus
“Above all else show the data." The focus
should be on the content of the data, not the
visualization technique. This leads to design
transparency.
The success of a visualization is based on deep
knowledge and care about the substance, and
the quality, relevance and integrity of the content
Assume that the viewer is just as smart as you
and cares just as much
Never “dumb down” a visualization.
Comparison vs. Description
At the heart of quantitative reasoning is a single
question: Compared to what?
Most visualizations today are descriptive rather
than comparative. The xy-plot invites reasoning
about causality in a way that even the most
impressive isosurface does not.
We should strive for relational, rather than
merely descriptive, visualizations.
Avoid relying on the viewer's memory to make
visual comparisons; a weak facility in most of
us.
Integrity - Misleading visualizations are common
To help limit unintentional visualization lies:
"The representation of numbers, as physically
measured on the surface of the graphic itself, should
be directly proportional to the numerical quantities
represented
Clear, detailed, and thorough labeling should be used
to defeat graphical distortion and ambiguity
Write out explanations of the data on the graphic
itself. Label important events in the data
Show data variation, not design variation
The number of information-carrying (variable)
dimensions depicted should not exceed the number
of dimensions in the data"
“Lie Factor”
Lie factor = (size of effect shown in visualization) / (size of effect in data)
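A quick way to check a graphic against this ratio is to measure it and compute the two effects. The sketch below assumes "size of effect" means relative change between two values, (second - first) / first, which is how Tufte applies the measure. The function names and the numbers are hypothetical.

```python
def relative_change(first: float, second: float) -> float:
    """Size of effect as the relative change from `first` to `second`."""
    return (second - first) / first


def lie_factor(graphic_first: float, graphic_second: float,
               data_first: float, data_second: float) -> float:
    """Ratio of the effect as drawn to the effect in the underlying data."""
    return (relative_change(graphic_first, graphic_second)
            / relative_change(data_first, data_second))


# Hypothetical chart: the data rise from 100 to 150 (+50%), but the bars
# are drawn 2.0 cm and 6.0 cm tall (+200%).
factor = lie_factor(graphic_first=2.0, graphic_second=6.0,
                    data_first=100, data_second=150)
print(factor)  # 4.0
```

A value near 1 means the graphic is proportional to the data. Values well above 1 overstate the effect, and values well below 1 understate it.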
Design Guidelines
Visualizations "are paragraphs about data and
should be treated as such." Words, pictures,
and numbers are all part of the information to be
visualized, not separate entities
"have a properly chosen format and design
use words, numbers, and drawing together
reflect balance, proportion, sense of relevant scale
display an accessible complexity of detail
often have a narrative quality, a story to tell about the
data
avoid content-free decoration, including “chartjunk”
(miscellaneous graphics that have nothing to do with
Examples, and what’s wrong?
Think of Tufte’s “rules” above. Specify.
Nice attempt gone bad..
Graphic was bad before scan made it worse ;-)
Source: NY Times, Aug 9, 1978, p. D-2
Caption says “Fuel Economy Standards for Autos, set by Congress and supplemented by DOT, in miles per gallon”
What’s wrong?
What could we do better?
Sorted by 5-yr
Formatted nicer (big small)
Source: http://edwardtufte.com
Consistent scale in this case causes lots of crossover and clutter.
Labels on both sides!
How far we’ve come!