Graphs
"The important thing is not to stop questioning"
Overview
One of the most popular techniques for visually displaying quantitative information is the graph, but you know this
if you have opened an introductory economics textbook or read the financial sections of newspapers or magazines.
Unfortunately, experience has clearly indicated there is a considerable amount of information lost in the translation
between words and graphs. Stated somewhat differently, it has become painfully clear graphs are often
misunderstood by many. On the input side, graphs often get in the way of students' learning of economics rather
than aiding in their learning, and on the output side, graphs seldom add to the quality of student presentations /
writing. And this is not peculiar to Economics. Richard Bowen, a Psychology researcher who has seen much the
same thing in his discipline, has written an interesting little book, Graph It! How to Make, Read, and Interpret
Graphs. In his book he talks of Graphicacy, graph literacy, as the goal for his readers. He recognizes, however,
that to achieve this goal he will need to help many overcome Graphobia, the fear of graphs. There is also a need to
motivate others to make the investment in developing the skills necessary to create and interpret graphs.
Fortunately, these skills will pay off well beyond any single college course since more and more data are being
presented in quantitative form and very often the data are presented graphically.
The difficulty many people have with graphs is not surprising once you realize how recently we 'discovered'
graphs. Until the 1800s graphical design was dependent upon a direct analogy to the physical world. The first
graphs were maps. When you look at a sheet of paper, it is fairly easy to make the transition from the right-left and
up-down movements on the page to the physical movements east west and north south. The grid lines on the page
substitute for latitude and longitude. We tend to order things spatially and thus it was natural to develop maps
that mimicked that order. For example, look at the map of the URI campus at Kingston. Once you get yourself
orientated to the north, if you turn to the right you will be able to look at the map and see what you will be looking
at. If you are looking North while standing in Chafee and then turn to the right you will see Woodward and Tyler
Hall, which is just what you see in the map. You will also see that Tyler is about twice as far from Chafee as
Woodward, so not only is direction retained, so is the concept of distance (URI map).
Similarly, if you have a map of the area surrounding URI (star), you will find Wickford is to the northeast of
campus and Newport is approximately twice as far in the easterly direction. One of the earliest maps was produced
in China in 1137 - nearly 400 years before comparable maps were produced in Western Europe. It was not until
the late 1600s, however, that we began to see the emergence of maps to which data are added. An example of a
data map that is available electronically appears below. What is missing is the legend, but we can see there is
'something' that sets apart the area extending from Virginia to New Hampshire, and California, while at the other
end of the pole you find the lower Mississippi delta region (Arkansas, Mississippi, and Louisiana).
Median Family Income by State
Eventually, however, we were able to extend the reach of our visual representation of information beyond space to
time and by 1786 we saw our first time-series graph in The Commercial and Political Atlas, by William Playfair.
Now we could visualize the passage of time with a graph, just as we had been able to envision the passage of
space. Just as we could say the movement from Kingston to Newport was twice the move from Kingston to
Wickford, now we can say that the movement from 1970 to 1980 was half the distance of the move from 1970 to
1990. Our ability at ordering of time allowed us to translate time-series relationships fairly easily into time-series
graphs. It allows us to 'see' the relationship between interest rates and time in the graph below. The reader can
easily see that interest rates peaked in 1980 and have been following a cyclical pattern downward.
Occasionally, we can combine space and time in one graph. One very impressive example would be Charles
Minard's graphical depiction of Napoleon's march into Russia, which we will talk about in class. As you look at
the graph below, try to take note of how many variables are being graphed simultaneously. If you start at Kovno,
the width of the drawn line conveys how many troops were in Napoleon's army - 422,000 - as it set off for Russia in
June 1812. As you move along with the army you see the location of the major battles and the size of the shrinking
army. The line switches colors at Moscow as the French abandon the city and retreat homeward. By the time they
leave Moscow, Napoleon's 400,000+ army has been reduced to about 100,000, and the temperature is falling
toward 0 degrees Celsius. So now we have location, size of army, date, direction, and temperature on the same
graph.
The final advance in the graphical display of information was to move beyond space and time to relational
graphics. Again it was Playfair who "broke free of the analogies to the physical world and drew graphics as
designs-in-themselves." The implication was that "any variable quantity could be placed in relationship to any
other variable quantity, measured for the same units of observation."
The examples of these relational graphs from your economics books are numerous, although as Tufte pointed out,
these graphs are not too frequent in the popular press. When he examined 15 news publications for the years 1974-1980, it was in Japan and Germany where we saw the highest use of relational graphics, but even there the number of
statistical graphics based on more than one variable that were not time-series or maps ranged between 5 and 10
percent. One of my favorites, The Economist, had only 2 percent, while the New York Times and Time had 0.5
percent and 0 percent. In a similar review of college and high school textbooks, Tufte found relational graphics to
be significantly more common, with the "leaders" being 77 percent in the high school text, Chemical Principles by
William Masterton and Emil Slowinski and 82 percent in the college text, Statistics: A Guide to the Unknown by
Judith Tanur. Among the three economics texts reviewed, relational graphics accounted for 16 percent of the
statistical graphics in the classic college text by Samuelson, about midway between what he found in two high
school texts.
It's now time to try your hand at data graphics which Tufte describes as a "visual display of quantities by means of
the combined use of points, lines, a coordinate system, numbers, symbols, words, shading, and color." As it turns
out, it is a good thing the creator of the graph has so many dimensions to use since in the majority of cases what we
are interested in representing are multivariate relationships on sheets of paper, what Tufte calls the flatlands. When
you see a graph, you should think that behind this graph is a table of numbers and that the creator of the graph was
attempting to make it easier for the reader to see the story / pattern that existed in the data. As Bowen indicates,
"Graphs are intended to make it easy to read, understand, and remember a relationship found in a set of data." To
get there, however, we need to look a little more closely at the mechanics of the various graphs. Because we will
rely on graphs, you will need to be sure you have mastered them. Before you create any graphs for me, check out
the guidelines based on previous student-generated graphs that can be found at the end of this unit. Be sure to
read them so you learn from the efforts of previous students and avoid their mistakes. In the unit that follows
you will find a sample of graphs I find interesting and provocative.
Some Basics of Graphing
When do we use graphs and how do we create them? The short answers are sparingly and with care, but let's look
at them a bit more carefully. On the when issue, I suspect your experiences have been similar to mine - you have
seen some impressive graphs that immediately invoke an image, and you have seen some losers that convey
virtually no information. Experience has convinced me of the importance of graphs, and of the ease with which
they can be abused. As for the how, you now have at your disposal a number of software packages that allow you
to produce some impressive visuals.
As a starter, we can look at texts, equations, tables and graphs as alternative means of presenting information and
you must decide which to use. First, equations are to be used only when the presentation is designed for a professional
audience. You will find them in economics journals, and scientific and medical journals, but not too many other
places. Second, the choice between tables and graphs, which you do see often in presentations, is not something
we can easily capture in some checklist, but there are some guidelines. To help you in making that choice here are
two examples - Slippery Slope’s revenues and stock market data for APC.
Slippery Slope: Below you have a table and a graph containing data on revenues for Slippery Slope University
over a six-year period.
Slippery Slope University Revenues
Year    Revenue ($millions)
1991    100
1992     90
1993     92
1994     95
1995     98
1996    101
The advantage of the Table is we can determine the actual revenue figures for each year, something not possible in
the graph [you could put value labels on and solve this problem for some graphs]. For example, if you needed to
know revenue in 1992, you would want to use a table. The graph, on the other hand, has the advantage of a visual
representation of any relationship between the variables in the table. Although the growth in revenues is
'contained' in both the table and the graph, it is far more obvious in the graph, and if it were your intent to convey
the growth, the graph would have been your preferred choice. This is not a choice to be ignored since the
technology allows you to create, almost effortlessly, any graphical representation of the data, and in the section on
graphs we will look at this issue closely.
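The table-versus-graph trade-off can also be sketched in a few lines of Python. The figures below are copied from the Slippery Slope table; the variable names are only illustrative. Looking up a single year is the "table" use of the data, while the year-to-year changes are the pattern a graph makes visible at a glance.

```python
# Revenue figures copied from the Slippery Slope University table ($ millions).
revenue = {1991: 100, 1992: 90, 1993: 92, 1994: 95, 1995: 98, 1996: 101}

years = sorted(revenue)

# Table use: look up one precise number.
revenue_1992 = revenue[1992]

# Graph use: look at the pattern, here the change from each year to the next.
changes = [revenue[later] - revenue[earlier]
           for earlier, later in zip(years, years[1:])]

print("Revenue in 1992:", revenue_1992)
for year, change in zip(years[1:], changes):
    print(year, f"{change:+d}")
```

After the drop in 1992, every change is positive, which is the steady recovery you would see as an upward-sloping line.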
APC stock prices: Consider the stock market and the price of stocks. When the stock market is booming as it was
in 1998, everyone seemed to be watching stock prices. If you were interested in following your stock, you could
access the following information on American Power Conversion, a successful local company whose stock is
traded on NASDAQ.
First let's look at a common tabular presentation that conveys information on the volume of shares traded that day
(801,600), the price of the stock at the end of trading (23 11/16), the High and Low prices reached during the day
(23 7/8 to 22 7/8) and the net change from the beginning to end of day (+11/16). You also know that over the past
year the price has ranged from a high of 31.5 to a low of 8.5. Taken together we have 9 pieces of information that
are conveyed in the table.
52 Weeks                 Vol
Hi      Low     Symbol   100s    High      Low       Close       Net Change
31.5    8.5     APEX     8016    23 7/8    22 7/8    23 11/16    +11/16
Now turn your attention to the following time-series graphs of APC's stock price and the volume of shares traded
for the past 200 days. In these graphs there are about 400 pieces of information presented, far more than you
would be able to present in a table. When we want to convey a large volume of data, it is most likely that we will
want to use a graph.
If you look at the graph you will note there is something important missing: the actual stock price. While the graph
allows us to look quickly for a pattern in the past prices of APC's stock, it does not allow us to quickly determine
the price yesterday. This is another of the important differences between graphs and tables - when you need a few
precise numbers, a table will probably serve the purpose, but when we want to demonstrate a relationship between
two variables, we would want to use a graph. In the graphs above we are able to look quickly at the relationship
between stock prices and time, volume and time, and with a little effort, the relationship between volume and
price. It takes only seconds to realize that APC's price was falling for most of 1997 after more than doubling
during the last four months of 1996. As for volume, there were three episodes of unusually high volume, and it
would look as though sharp price changes accompanied the increased activity levels.
You will also find economists tend to use graphs to look at the logical implications of some of their ideas /
theories. One particularly interesting historical example was produced by Dr. John Snow, who was concerned with the
spread of cholera in London. Two obvious ways of conveying the information on cholera deaths would be line
graphs describing the deaths each day and the cumulative number of deaths. Neither of these, however, helped
convey Dr. Snow's theory of cholera, that it was the result of the mixing of drinking water and sewage. To convey
this relationship Dr. Snow produced a data map, which clearly indicated the deaths tended to be concentrated
around one water pump. This demonstration was so convincing, the authorities removed the handle from the pump
and the epidemic ended.
As we go forward keep in mind that reading graphs is like riding a bike – it is difficult in the beginning and you
might fall a few times – but with practice you can master it and then it seems very easy. All graphs are pictures of
relationships – so start with figuring out what are the variables in the relationship. Also make sure you recognize
the difference between describing a graph and explaining it. Describing it usually involves phrases like “it was
rising” or “they are inversely related.” You are simply describing the pattern you see. Explaining is more difficult
because you are then trying to identify what might explain the pattern you are seeing, and this usually means you
need to have some outside information. In this course you will be asked to do both.
Now let's move on to a discussion of individual graph types. Once you have decided to use a graph, you must then
decide what type of graph to use. It should not surprise you that technology will allow you to produce almost any
type of graph; all you must decide is which one to use, that is, which one does the best job of conveying the information
you would like the viewer to know.
A Sample of Graph Types
Pie Graphs
Pie graphs are good with cross-section data where the primary concern is allocation - the parts of a whole. If you
use a pie graph you MUST be graphing something that when you add up all of the pieces you have something
meaningful. For example, you could construct a pie graph showing how much of your total spending is on food,
clothing, recreation, education, and transportation. In the example below, we have a pie graph of regional
population shares in the United States in 1990. New England and the Mountain regions had the smallest
populations, each with 5 percent of the nation's population, while the East North Central and South Atlantic
regions were the largest with 18 percent. You cannot, however, determine from the graph the level of population.
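The "parts of a whole" requirement can be illustrated with a short Python sketch. The dollar amounts below are hypothetical, made up only for illustration; the categories are the ones from the spending example above. A pie graph shows the shares, which must add up to 100 percent, and it hides the levels.

```python
# Hypothetical monthly spending in dollars (made-up numbers for illustration).
spending = {
    "food": 300,
    "clothing": 100,
    "recreation": 150,
    "education": 350,
    "transportation": 100,
}

# The whole that the pie represents.
total = sum(spending.values())

# Each slice is that category's share of the whole, in percent.
shares = {category: 100 * amount / total
          for category, amount in spending.items()}

for category, share in shares.items():
    print(f"{category:15s} {share:5.1f}%")
print("sum of shares:", sum(shares.values()))
```

If the pieces did not add up to something meaningful, the shares would have no interpretation, which is why the pie graph would be the wrong choice.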
Bar / Column Graphs
Bar and column graphs are appropriate with noncontinuous variables - a good example being population growth
for the nine US census regions. There is no notion of continuity as we move from one region to another so we
would not want to use a line graph. Below we have a bar graph of the distribution of US population by regions, the
same information as displayed in the pie graph - just a different spin. In both graphs we would see the two largest
regions are the East North Central and the South Atlantic, each with populations over 40 million. The difference
here is we know the population in the regions, but we do not know the shares. Another thing to keep in mind with
the bar graph is that the order of the variables matters. The regions in the graph below are not randomly ordered –
they are ordered by geographic location. You begin in the Pacific region (PA) and move through the Mountain (M)
all the way east to New England (NE).
We could go beyond this to look at stack graphs that convey information that would normally appear in a
bar/column graph and a pie graph. Below is a stack graph where the height of the columns provides us information
on total US population in 1900, 1950, 1980, and 1990, while the division of the column allows us to guesstimate
the regional population figures. What is not that easy to see here is what has happened to the share of population in
a region. For example, what has happened to the share of the US population in the Pacific region between 1900
and 1990? This information could be seen with a stack graph that looks at only the composition for each year so
the columns all are the same height at 100%.
Scatter Diagram
A third graph is the scatter diagram, the graph that, for reasons unknown to me, seems over the years to have
caused students the most problems. Scatter diagrams are designed to provide a visual image of a potential
relationship between any two variables. To better see how to construct and interpret the graphs, assume you have
decided to undertake a study to determine the relationship between the variables X and Y. As a first step, you
collect eight observations on the variables for the years 1981 through 1988 – maybe the relationships between
consumption spending and income, quantity demanded and price, exchange rates and imports, inflation and
unemployment, or the budget and trade deficits. What you want to know is whether or not these data support the
hypothesis that there is a relationship between Y and X.
Year    1981    1982    1983    1984    1985    1986    1987    1988
X       x1      x2      x3      x4      x5      x6      x7      x8
Y       y1      y2      y3      y4      y5      y6      y7      y8
Below you will find four possible 'patterns' that could emerge from your analysis. In each diagram the points
correspond to the individual years with the point corresponding to 1983 having been marked in each diagram.
What you are looking for is a pattern that could be approximated by a line because this would be evidence of a
relationship between X and Y. In the first diagram the points tend to be loosely scattered around the positively
sloped line, while in the second diagram the points seem to be more tightly packed around the negatively sloped
line. Based on these findings we would be led to conclude in the first case that there is weak evidence that X and Y
are positively related, while in the second there is strong evidence of a negative relationship. In diagram c, where
the scatter of points resembles the scatter of darts thrown by a novice, there is little evidence of any relationship as
the points seem to be randomly distributed. Finally, in diagram d, the data suggest there is evidence of a positive
relationship between Y and X, but it is a nonlinear relationship. This is the type of relationship we would expect in a
study of the income-consumption spending relationship.
Before leaving scatter diagrams behind, let us turn to the specific problem of determining the relationship between
the inflation rate and interest rates. Economic theory leads you to believe interest rates (r) and inflation rates (i) are
positively related, an increase in inflation rates pushing up interest rates. To test this theory the data on interest
rates and inflation that appear in the accompanying table were collected. Do these data support the hypothesis
there is a relationship between interest rates and inflation?
Year    Interest Rate    Inflation
1981    14               10.3
1982    10.7              6.2
1983     8.6              3.2
1984     9.6              4.3
1985     7.5              3.6
1986     6                1.9
1987     5.8              3.6
1988     6.7              5.8
1989     8.1              4.8
1990     7.5              5.4
1991     6.1              3.9
The scatter diagram generated by these data is presented below, where once again each point represents one year.
For example, the highest point on the Inflation and Interest Rate scatter diagram corresponds to 1981 when the
interest rate was 14 percent. As you can see, there does tend to be a relationship between the two variables. The
scatter of points tends to rise as we move to the right; as i increases, r tends to increase, but certainly not in a way
that can be easily captured by some linear relationship. This is where you would need to call on some of the
techniques that you learned in statistics, but we will leave that for a later date.
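As a preview of one such technique, here is a minimal Python sketch that computes the correlation coefficient for the interest rate and inflation data in the table above. The choice of this particular statistic and the function name are assumptions made for illustration, not something the course prescribes. A value near +1 means the points cluster around an upward-sloping line, a value near -1 means a downward-sloping line, and a value near 0 means no linear pattern.

```python
from math import sqrt

# Data copied from the table above (1981 through 1991).
interest = [14, 10.7, 8.6, 9.6, 7.5, 6, 5.8, 6.7, 8.1, 7.5, 6.1]
inflation = [10.3, 6.2, 3.2, 4.3, 3.6, 1.9, 3.6, 5.8, 4.8, 5.4, 3.9]


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together around their means.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable moves on its own.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / sqrt(spread_x * spread_y)


r = correlation(inflation, interest)
print(f"correlation between inflation and interest rates: {r:.2f}")
```

Keep in mind that a single number like this summarizes only how closely the scatter follows a straight line; it says nothing about why the two variables move together.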
Time-Series Graph
A time-series graph is a line graph where time is measured on the horizontal axis. You can see the importance of
these graphs by browsing through your economics textbooks and seeing how often they appear. As you study these
graphs, ask yourself whether you have a systematic approach to analyzing them. You should, if you are to extract
from the graph the information that is embedded in it. To get a feel for how easy generating a time-series graph
has become, you should check out the on-line site economagic and create a graph in a matter of a few minutes.
To extract the information embedded in any time-series graph, you should begin by conceptually decomposing the
movement of the variable being graphed into its separate components: long term trends, short term cyclical
movements, seasonal patterns, and unexplained fluctuations called noise. For example, if we were looking to
forecast retail sales for a state, we could expect the change in sales in any month would be larger if the state were
in the midst of a period of sustained growth rather than decline (Florida vs. North Dakota), if the State economy
was in the midst of a recovery rather than in the throes of a recession, and if we were talking about December rather
than January. [An example of the decomposition of a time series can be found in a discussion of how the time
frame in any analysis affects the results - the When question].
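To make the idea of decomposition concrete, here is a minimal Python sketch using a hypothetical monthly sales series built from only two of the components, a steady trend and a repeating seasonal pattern (cycles and noise are left out to keep it short). All of the numbers and names are made up for illustration. Averaging over any 12 consecutive months cancels the seasonal pattern and leaves the trend.

```python
# Hypothetical seasonal pattern for January..December; it sums to zero
# and peaks in December.
seasonal = [-8, -6, -4, -2, 0, 2, 2, 0, -2, 0, 4, 14]

# Three years of hypothetical monthly sales: trend of +2 per month plus season.
sales = [100 + 2 * t + seasonal[t % 12] for t in range(36)]


def moving_average(series, window):
    """Average of each run of `window` consecutive observations."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


# A 12-month window covers every month of the year once, so the seasonal
# ups and downs cancel and only the trend remains.
trend = moving_average(sales, 12)

print("first December:", sales[11], " following January:", sales[12])
print("smoothed values:", trend[:4])
```

Sales fall from December to the following January even though the underlying trend is rising, which is why you need to separate the components before drawing conclusions from one month's change.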
When decomposing the movement in sales of widgets, economists' favorite hypothetical good, it is easiest if you
begin with the trend. This would usually be reflected in the 'average slope' that can be determined by the
unsophisticated eyeball approach or the sophisticated regression approach. Does the curve slope up, does it slope
down, or is there a definite change in the direction or the magnitude of the slope over time? These are the
questions you must attempt to answer at the outset of your analysis. Comments such as, wages have grown on
average approximately 4 percent per year for the past two decades or that labor productivity rose at a rate of
approximately 2 percent per year during the post-WWII period, refer to these trends. In the graph below, the
underlying trend appears to be positive throughout the entire time under review.
It is clear from this graph, however, there are also significant variations about this trend, variations that are
referred to as cycles, business cycles to be more precise. These business cycles are a pervasive feature of capitalist
economies. The recurring pattern of recessions (R) followed by expansions (E) characterizes nearly all measures
of economic performance in nearly all economies. If you have any doubts, just look at the graphs of inflation,
unemployment, interest rates, budget deficits, trade deficits, and exchange rates that you will find in your
macroeconomics texts. You will see that the peaks and troughs of the various time-series graphs often tend to
coincide with each other.
Two examples of 'real' world time-series graphs appear below. In the first we can see the history of inflation in the
United States over the past 120 years. When trying to find the “story” in a time series graph I suggest you look for
cycles and trends - the short-term and long-term patterns. You should make a list of what you 'see' in the graph, and
then read on. I see two peaks, one associated with WWI (1916-1920) and one following WWII (1946-1948). [One
of the limitations is I cannot identify exactly what year it is peaking, but if this mattered, it may be best to use a
table.] Some other features of the graph are: (1) there are no negative numbers (deflation) after 1948; (2) there is an
upward trend beginning approximately 1960 and ending in approximately 1980; (3) there is a downward trend
extending from 1980 through the early 1990s.
In the second time series graph we have both the interest rate and the inflation rate. You would construct this graph
if you were looking to find relationships between relationships. For example, the inflation line describes the
relationship between inflation and time, while the second line describes the relationship between interest rates and
time. What I see is they have similar patterns - interest rates tend to rise when inflation rates rise (1970s) and tend
to fall when inflation rates fall (1980s).
Before looking into the specifics, however, we will look at two important features of graphs that deserve special
attention when you are reading them or creating them – the difference between causality and correlation and the
importance of the scale of axes in a graph.
Line Graph
Of all the possible graphs, the most popular among economists is the line graph. Unfortunately it is not one of the
easier graphs to understand. Because it is an important and frequently used tool that is not widely
understood, we'll spend a little time working on the basics of the line graph.
The first step in creating a line graph involves specifying the axes of the graph. There are two number lines that
are perpendicular to each other (blue lines). Each point on the graph represents one combination of values for the
two variables. In the graph below you will see two points. If we assume that X is being measured on the
horizontal axis and Y on the vertical axis, then at point #1 the value of X = -3 and the value of Y = 5. At point #2,
X = 4 and Y = -2.
Now let's answer the question: What happens to the values of X and Y as we move from point #1 to point #2?
First you should note that the direction is down and to the right. Down means less of Y while to the right means
more of X. The movement from #1 to #2 therefore is a visual representation of an increase in X and a decrease in
Y. Since they move in opposite directions, you would say that they were negatively related.
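The same reasoning can be written out as a short Python sketch. The coordinates are the two points from the graph described above; the variable names are only illustrative. If the changes in X and Y have opposite signs, the variables are negatively related over that movement.

```python
# The two points from the graph, written as (X, Y).
point_1 = (-3, 5)
point_2 = (4, -2)

# Movement to the right is a positive change in X; movement down is a
# negative change in Y.
change_in_x = point_2[0] - point_1[0]
change_in_y = point_2[1] - point_1[1]

if change_in_x * change_in_y < 0:
    relationship = "negative"
elif change_in_x * change_in_y > 0:
    relationship = "positive"
else:
    relationship = "no change in at least one variable"

print("change in X:", change_in_x)
print("change in Y:", change_in_y)
print("relationship:", relationship)
```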
Now let's talk about some graphs. The place to start is by recognizing that behind every line graph is a table and
behind every table is a story. So let's start with a simple story and work our way to a table and then to a graph. The
story is about grades and the relationship between grades and study time. At this time in your educational career
you probably believe there is a positive relationship between grades and time spent studying. To make life easy,
let's make it a very simple relationship. If you study no hours, you get a zero and for each hour studying you get a
5-point increase in your grade. What would the table and the graph look like?
The table and the graph appear below. In the table we can easily check out the story. Each row adds two more
hours of study, and as we move down one row we see that the grade increases by 10 points, or 5 points for each
additional hour. We get the same number whenever we compare two adjacent rows - it is always 5 more points for
another hour of study. What we have is a constant rate of increase - every time we increase time spent studying by
one hour, the grade increases by the same amount. So what does the picture of this relationship look like? Don't cheat and look
ahead. First see if you have an image of what the picture of the relationship would look like. You know that this is
a positive relationship - when study time increases the grade increases. What will you see in the graph that
conveys this information?
Time    Grade
 0        0
 2       10
 4       20
 6       30
 8       40
10       50
12       60
14       70
16       80
18       90
20      100
What you see is a line sloping upward from left to right. This is the "picture" of a positive relationship. Each point
on the line corresponds to a row in the table. The red line corresponds to the row with ten hours of study time. If
you read over from the graph you see that at 10 hours of study time, the grade will be 50. Now let's look at the
slope - the steepness of the line. A steeper line is said to have a greater slope, but what does that mean in terms of
our story? Let's look at the graph below and work back to a table and then a story. We see the line is still positively
sloped so we know that there is still a positive relationship. We also see that the curve is steeper, which tells us the
improvement in the grade for each hour spent studying is now greater. How do we know that? From high school
algebra you know that slope = rise/run. So let's pick two points. The first is the origin where the grade is zero for
zero hours spent studying. The second is at five hours of studying. In the old graph (Blue line), the grade is
25. The grade on the new graph (green line) is 30. The slopes of the two lines would be:
slope of Blue line = rise/run = ΔY/ΔX = (25-0)/(5-0) = 25/5 = 5
slope of Green line = rise/run = ΔY/ΔX = (30-0)/(5-0) = 30/5 = 6
The steeper slope means each hour spent studying in the new situation increases the grade by 6 points. We would
draw a steeper slope if we want to demonstrate a situation where changes in X (hours) created greater changes in
Y (grades).
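The rise-over-run arithmetic can be checked with a minimal Python sketch. The points are the ones used in the two slope calculations above; the function name is just an illustrative choice.

```python
def slope(point_a, point_b):
    """Rise over run between two (hours, grade) points."""
    (x1, y1), (x2, y2) = point_a, point_b
    return (y2 - y1) / (x2 - x1)


origin = (0, 0)

# After five hours of study the grade is 25 on the blue line
# and 30 on the steeper green line.
blue_slope = slope(origin, (5, 25))
green_slope = slope(origin, (5, 30))

print("blue line: ", blue_slope, "points per hour")
print("green line:", green_slope, "points per hour")
```

Because both lines are straight, any two points on the same line would give the same answer; the origin and the five-hour mark are simply convenient.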
You have now worked both ways - translating from words to graphs and from graphs to words. This is a skill that
comes with practice and you should consider getting some practice if you are not really comfortable with the
translations. For some practice consider the following two "extensions" of the model.
The first is a nonlinear situation - a line that is not straight. To see what we have here let's try to create a table and
a graph that correspond to the following "story." You will get 14 points without any study time and the first two
hours spent studying will give you 13 additional points, the second two hours will give you 12 additional points, the
third two hours will give you an additional 11 points, ... You can see the pattern. What will the table and the graph
look like?
Time    Grade
 0       14
 2       27
 4       39
 6       50
 8       60
10       69
12       77
14       84
16       90
18       95
20       99
Let's look at the table first and see what the table tells us about the rate of change. When we increase study time
from 0 to 2 hours, the grade increases by 13 points. When you increase from 10 hours to 12 hours the grade
increases by 8 points. What has happened to the rate of increase? The rate of change, or slope, for the two points
is:
slope at 0 = rise/run = ΔY/ΔX = (27-14)/(2-0) = 13/2 = 6.5
slope at 12 = rise/run = ΔY/ΔX = (77-69)/(12-10) = 8/2 = 4
The rate of increase has gone down. In fact as we increase the hours spent studying, the improvement to the grade
continues to fall. The rate at which the grades increase declines as the time spent studying increases. Now how do
we see this on the graph? Let's look. The change in the rate of change will show up as a change in the
slope. Because the rate of change slows as we increase hours studying, the slope decreases as we move to the right.
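For those who want to check the pattern row by row, here is a minimal Python sketch that rebuilds the table from the story (14 points to start, then gains of 13, 12, 11, ... for each additional two hours) and computes the slope of every two-hour segment. The variable names are only illustrative.

```python
# Study time in two-hour steps, 0 through 20.
hours = list(range(0, 21, 2))

# Build the grades from the story: start at 14, then each additional
# two hours adds one point less than the previous two hours did.
grades = [14]
gain = 13
for _ in hours[1:]:
    grades.append(grades[-1] + gain)
    gain -= 1

# Slope of each segment: change in grade divided by change in hours.
slopes = [(g2 - g1) / (h2 - h1)
          for (h1, g1), (h2, g2) in zip(zip(hours, grades),
                                        zip(hours[1:], grades[1:]))]

for h, g in zip(hours, grades):
    print(f"{h:2d} hours -> grade {g}")
print("segment slopes:", slopes)
```

Every slope is positive, so the curve always rises, but each one is smaller than the one before it, so the curve flattens as you move to the right.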
Now to that second extension. When we draw a graph showing a relationship between two variables (Grade and
hours), this does not suggest that there are not other factors that influence the relationship. Can you think of other
factors that might alter the relationship between hours spent studying and grades? How about amount of serious
drinking? I suspect that heavy drinking would decrease the value of your time studying. What about watching TV
while you are studying? How about a study group? And what about a scale? What about the type of questions?
We'll look at the effect of the type of questions. Let's assume you will get a mix of questions with more easy
questions. What will happen to the relationship between the grades and the time spent studying? You should get a
higher grade with the new questions - precisely what you see in the table below. Regardless of how many hours
are spent studying, the grade is higher for the easy test. So how does this look on the graph? The entire curve
shifts. When one of the "other" factors that influences grades changes, the relationship between grades and hours
changes, which shows up as a new line.
Time    Difficult    Easy
   0            0       5
   2           10      15
   4           20      25
   6           30      35
   8           40      45
  10           50      55
  12           60      65
  14           70      75
  16           80      85
  18           90      95
Correlation vs. causality
Regardless of the variables being graphed, caution must be taken when interpreting the 'results'. As a case in
point, consider the graph of interest rates and wage growth rates. The similarity in the trends and cycles of interest
rates and wage growth suggests there might be a direct relationship between the two phenomena. In the 1960s and
1970s interest rates and wage growth rates tended to rise together and in the 1980s, they both dropped
precipitously.
In fact, there is no direct relationship. These two phenomena appear to be related because they are both related to a
common third factor, the inflation rate. When inflation rates are high, both wage growth and interest rates tend to
be high, and when inflation falls all three tend to fall together. In normal times one can expect lenders of money will charge interest
rates to borrowers that are somewhat above inflation rates. Workers, meanwhile, can be expected to bargain for,
and receive, wage increases that exceed inflation as compensation for productivity increases. Neither lenders nor
workers will normally accept less than the inflation rate because by doing so they would be losing buying power.
Now you have the answer to the time-share sales person who tries to "scare" you into buying a time-share by
comparing the future price of the vacation with your current income, a very unfair and unjustifiable comparison. It
should be clear the driving force behind the rise in hotel room prices is inflation, which will also drive up your
wages. Given the close historical association of wages and prices, it would seem reasonable to assume your wages
would also keep rising along with the price of the hotel rooms. It is very likely the $5,187 your hotel room was
projected to cost in ten years would not represent any more of a financial strain than the current $2,000. If your
earnings, which are now $4,000 a month, followed the same pattern of growth then you would be earning in
excess of $10,000 a month in ten years. And if this is not enough, you can see the historical record suggests that 10
percent a year inflation is unlikely to be a reasonable forecast for inflation, but more about that at a later time.
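To see the arithmetic behind the time-share example, a short Python sketch follows. It assumes, as the sales pitch does, a constant 10 percent annual increase, and it applies that same rate to both the room price and your monthly earnings. The function name grow is just an illustrative label.

```python
def grow(amount, annual_rate, years):
    """Compound an amount forward at a constant annual rate."""
    return amount * (1 + annual_rate) ** years


rate, years = 0.10, 10               # assumed: 10 percent a year for ten years
room_now, income_now = 2000, 4000    # dollars; income is per month

room_later = grow(room_now, rate, years)
income_later = grow(income_now, rate, years)

print(round(room_later))     # projected room price in ten years
print(round(income_later))   # projected monthly income in ten years

# The room costs the same share of a month's income now and later
print(room_now / income_now, room_later / income_later)
```

Because both numbers grow at the same rate, the room price stays at half of a month's income, which is why the comparison of a future price with current income is unfair.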
Scale distortion
An important choice one must make when presenting data is the appropriate scale for the graphs. When you look
at a graph, if you are anything like me, your initial interpretation of the relationship between the two variables is
influenced by the steepness of the curve. For example, consider the time-series graphs of sales for companies Y
and X. Based on these graphs I suspect that you would be predisposed to believing that Company Y had
experienced the sharpest sales decline.
The truth of the matter is, however, there is no reason to believe this because the numeric scales on the axes are
excluded from the two graphs. To see the significance of this, consider the following simple example of beer demand.
The columns representing demand differ only in that in the first column output is measured in terms of quarts,
while in the second pints are the unit of measurement. At a price of 20, demand will be 10 quarts or 20 pints, and
as the price falls to 16, demand will increase by 2 quarts or 4 pints.
Beer Demand
Price    Demand (Quarts)    Demand (Pints)
   20                 10                20
   16                 12                24
   12                 14                28
    8                 16                32
    4                 18                36
    0                 20                40
When we graph these relationships we can see the influence of the units of measurement on the steepness of the
demand curves. The demand curve, when specified in terms of pints, would appear far more responsive to price
change, but as we know from the data above, there is no difference. This is why economists tend to favor
elasticities as measures of responsiveness rather than slope, something we will talk more about later.
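A quick Python sketch makes the point concrete using the beer demand table. Here slope is measured as the change in quantity per one-unit change in price, and elasticity uses the midpoint (arc) formula. Both choices are my assumptions for illustration, since the text has not yet settled on a formula.

```python
prices = [20, 16, 12, 8, 4, 0]
quarts = [10, 12, 14, 16, 18, 20]
pints = [2 * q for q in quarts]   # 1 quart = 2 pints


def slope(p0, p1, q0, q1):
    """Change in quantity per one-unit change in price."""
    return (q1 - q0) / (p1 - p0)


def arc_elasticity(p0, p1, q0, q1):
    """Midpoint formula: percent change in quantity / percent change in price."""
    pct_q = (q1 - q0) / ((q0 + q1) / 2)
    pct_p = (p1 - p0) / ((p0 + p1) / 2)
    return pct_q / pct_p


p0, p1 = prices[0], prices[1]     # price falls from 20 to 16
for unit, qty in (("quarts", quarts), ("pints", pints)):
    s = slope(p0, p1, qty[0], qty[1])
    e = arc_elasticity(p0, p1, qty[0], qty[1])
    print(f"{unit}: slope = {s}, elasticity = {e:.3f}")
```

The slope doubles when we switch from quarts to pints, but the elasticity does not move, because percentage changes do not depend on the unit of measurement.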
We are now ready to return to the Ansylium problem for another look at the effectiveness of the drug. Was there a
sharp drop in the blood level of chemical X following the introduction of Ansylium? It depends on which of the
two diagrams below better describes the situation. Although both diagrams have identical shapes, in Diagram A
there is a 50 percent decline in X between 1984 and 1990, while in Diagram B the decline is only 2 percent. So be
careful when reading a graph. Be sure to read the labels and the units of measurement before you jump to any
hasty interpretations.
Now let’s look at the basics of good graphs.
A Guide to Good Graphics
Not all graphs are created equal: some are wrong and misleading, some are boring, while others are WOW graphs
that capture the viewer’s attention.1 You want to be in the latter category especially if it is true that “we live in a
data driven world where the ability to create effective charts and graphs has become as indispensable as good
writing.” (Wong, 13) The good news is the technology exists that allows us to create “spiffy” graphics quite easily,
and we will look at how to create those graphs and the things to keep in mind as you are creating the graphs. The
bad news is there are too many graphs that are high on techniques such as 3-D and low on informational content
and persuasiveness. There is a good deal of !@#$%^ out there produced by those who thought little about the story
they were trying to convey, so as we move forward keep in mind that we are talking about storytelling with the
goal that your graphics will add to the stories you are telling.
One book that is devoted to this is Dona Wong’s The Wall Street Journal Guide to Information Graphics (Norton
2010). Wong identifies four key steps in the process that are worth keeping in mind as you work on your graphics
– research, edit, plot, and review. Good graphics start with the research that gets you the underlying data that must
be of high quality. Second you must edit the data since it is unlikely to be in the form that is most useful, an issue
discussed in the opening unit. Third you need to plot the data and this will / should involve the creation of a
number of graphs from which you will pick the best. Think of it as a photo shoot that produces an image in a
magazine. If you ever watched what goes into it you realize there were hundreds of photos taken to create one
photo in the magazine, which is why they look so good. You will need to do the same, and fortunately the
technology allows you to create the graphs easily and quickly. Finally, you need to review the graphs and make
sure they are correct since too often I see student work with errors – and given the adage “a chain is as strong as its
weakest link” – if your graph is trash then so is the rest of your story.
Another good source would be the works of Edward Tufte. According to Tufte, graphical displays should:
1. induce the reader to think about the substance rather than about methodology, graphic design, the technology
of graphic production, or something else
2. avoid distorting what data have to say
3. present many numbers in a small space
4. make large data sets coherent
5. encourage the eye to compare different pieces of data
6. serve a reasonably clear purpose: description, exploration, tabulation, or decoration
7. be closely integrated with the statistical and verbal descriptions of a data set
Now let’s look at a few things to keep in mind when constructing a graph.
Checklist for your graphs
1. Story matters: Most graphs are part of a story – either a written document or a presentation – and you need to
make sure that you do a good job of integrating the graphic into the story. This is a BIG problem with student
presentations and reports. If you are to use a graph then it should provide some insight into the story. For
example, below is a graph of the Shiller Home Price Index for Miami, Florida. Based on this graph you can
tell a story about the run up in home prices in the decade ending in 2006 and the collapse in the following two
years. Home prices in Miami tripled in the decade, while they fell about 50% before bottoming out in 2011.
2. Type matters: Make sure you create the best graph for the data. Be sure you understand the difference
between continuous and discontinuous variables since the graph you use depends upon the difference. You
would not use a line graph when looking at data for different age brackets, states, or racial and ethnic groups.
These are all discontinuous variables so you should use column / bar graphs rather than line graphs. If you are
looking for a comparison of two or more variables over a specified period of time use a line graph and not a
column graph. I trust you can see the difference.
3. Size matters: Be careful to avoid graphs that are too small. You should also make sure the font is large
enough in all of the text in the graph. If it is a PPT presentation make sure you have checked it out from the
distance the audience will be sitting at the presentation.
4. Complexity matters: Do not try to do too much. Graphs should not include too many variables because when
you have too many variables the noise level in the exhibit rises. You do not want a pie graph with too many
slices, a column graph with too many columns, or a line graph with too many lines. Below is a time series
graph of unemployment rates in the New England states. There are some subtle differences across states, but
the number of lines in the graph makes it difficult to see those subtle differences. It might be best to compare
one state to the US or regional average rather than the eight lines.
5. Titles matter: Make sure titles convey accurate information and that the reader has enough information
to understand the relationship being represented in the graph. Be careful with your title since very often in
student work it does not correspond to the data being graphed. And do not forget, spelling matters.
• Use abbreviations when possible. When looking at regions in the US, the titles could be
abbreviated to NE, MA, ENC, which would give much more space for the graph.
• If the data is in millions, thousands... then you should put this in parentheses below the title or on the axis
where the data are. An example would be:
United States Population in Billions at the Beginning of Each Decade
vs.
US Population: 1960-2000
(millions)
• Make it short. There is no need for the following title (and I did not make the typos):
Number of Government Workers at the Beginning of Each Year for the a 10 Year Period
It could be replaced with
Government Employment
6. Axes matter: Be careful about the axes. Sometimes Excel specifies axes it wants, not ones you want. As an
example, one column graph had columns designated as 1, 2, 3…, but this did not match the data or the
description. Be careful to look at the graph and make sure it is what you want.
7. Real estate matters: Think of a graph’s axes as defining a square, and your job is to adjust the default scales
so that the graph occupies as much of the real estate as possible. Do not use too much for the legend, and
make sure you do not use too many digits in the numbers on the axes. For example, when looking at
incomes you probably do not want 345.67546. You could probably use 345. The same is true when the table
includes GDP numbers like $3,980,000,000,000. In this case you would want to have the numbers in billions
and have $3,980. Do not include excessive 0s in the graph and do not include unnecessary digits. Also, be
sure to choose the scale to fill in the real estate. The impact of the scale can be seen in the graphs below.
8. Colors matter: Choose your colors carefully and assume the chart will be copied, so make sure it looks good
in black-and-white. Avoid using a variety of colors on a graph and think of using different shades of the same
color – and use a darker shade when you want to emphasize something. In a line graph make sure you use
different line types so when it is copied the reader will be able to identify the lines. Also be careful if the
graph will be reproduced in color. The graph on the left makes it easy for us to separate the two lines, but not
the one on the right. Use different styles and widths for the lines if it is going to be in black and white.
9. Accuracy matters: Make sure the graph makes sense. The graph below makes no sense to me because I can’t
really come up with a story that goes with it. This is another example of looking at what you get from Excel
before going public with it.
10. Interesting matters: If a graph can be explained with one or two sentences, then you do not need a graph. If
there is no pattern that "jumps" out at you then it is not clear this information is best shown in a graph. It
might be better as a part of the discussion with no graph. You could transform the data and this might create
an interesting story. Below you will see a graph of the CPI in Italy on the left - not the type of graph that
should be part of any presentation. What we see is the price level rising - but this is no surprise - and we
described it in one sentence. On the right we have a graph based on the same data. The inflation rate is the
percentage change in the price level so it was an easy transformation. But here there is a story - a dramatic
decline in the inflation rate in the early 1980s, and then another decline in the late 1990s. There is a story
about the euro here and the OPEC inflations of the late 1970s and the change in monetary policies in the early
1980s to mention a few that you could not see in the price level graph.
11. Effort matters: Anyone using Excel knows what the default settings are, and seldom are they the ones you
want, so a reader will know when you have not taken the time to “clean up” the graph. When you generate a set
of graphs as you look for the correct one you needn’t do this, but once you have settled on a graph you need to
work on it. The fact is everyone who has done Excel graphs knows the default settings so make sure the
reader knows you took the extra effort to make your graph stand out from the crowd. Using the default
settings in a graph would be similar to using the Word resume template when constructing a
resume. Everyone in the business of hiring knows these forms, and you probably do not want to use one if you
are trying to separate yourself from the crowd. Here are a couple of things to consider in that work.
• Use the gray scale and the gridlines at your peril. Unless you have a good reason to keep the gridlines
and gray scale that are the defaults in Excel, get rid of them. These are just distractions that take away
from your graph.
• Legends are defaults, but many times you do not need them. There is no reason to put a legend on a graph
when there is only one variable. Also, if you have a few variables you may want to try to write them onto
the graph and eliminate the legend. Compare the two graphs below and I am certain you will see a BIG
difference.
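The transformation described in item 10, from a price level to an inflation rate, can be sketched in a few lines of Python. The CPI values below are made up for illustration and are not the Italian data, and the function name inflation_rates is my own.

```python
def inflation_rates(price_index):
    """Year-over-year percentage change in a price index."""
    return [100 * (current - previous) / previous
            for previous, current in zip(price_index, price_index[1:])]


# Hypothetical CPI values for five consecutive years (illustration only)
cpi = [100.0, 120.0, 138.0, 151.8, 159.39]

rates = inflation_rates(cpi)
print([round(r, 1) for r in rates])
```

The price level rises every year, which is no surprise, but the inflation rate falls every year, and that is the story a graph of the transformed data would tell.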
A Critique of Some Graphs
Given the supply of graphs that are not quite ready for prime time, I am trying another approach. In this section I
have included a number of graphs that have been submitted along with my comments on those graphs. Some of the
comments have to do with the variables and some have to do with the graph itself and I have chosen a few graphs
to point out the features that are mostly covered in the reading.
Graph 1
1. Titles/legends: Make sure the titles are efficient – they are correct and as short as possible. This looks like a
good title until I see the legend that has production rather than consumption. Also, if this is actually
production, I would eliminate “Production” from the legend. When using data from an online source it is
common to reduce the name when creating the graph. What you need to do is change the title on the
spreadsheet column and it carries over to the graph. There is also no need to have 2013 in the horizontal scale
since you could do this by putting 2013 into the title.
2. Gridlines: This graph does double duty. The gridlines do nothing to help the reader so get rid of them, which
we saw is very simple.
3. Scale: Look at how much space is unused in the graph. There is no reason not to change the scale from
100-220 so the data will fill the graph.
U.S. Primary Energy Consumption (BTU quadrillions)
Graph 2:
1. KISS (keep it simple). You can pull out all of the stops, but I suggest keeping it simple. Here is an example
where the colors are overkill because they add nothing to the story being told. One thing I have done,
however, when I wanted to emphasize one region / state / country, is change the color of that one column to
focus attention on it.
Graph 3
1. Variable selection: This is a valid scatter diagram because we have two variables, but it does not tell us much
about a relationship between the two variables because they are probably both driven by a third variable –
population. We would expect more juvenile and criminal cases in big states, so we have nothing here that is
interesting.
2. Title: The title needs work since this does not give me a good idea of what I am looking at in the graph.
Graph 4:
1. Title: The title is ambiguous. I think it represents the rate of smoking in each state, but it might not. Also, there
is no date. NE usually means New England, but in this case it means Northeast. Also, you need to have female
in the legend.
2. Variable choice: Why have a Total column in the graph? It is a distraction for the reader and if we have both
males and females, then we have the picture with just those two and this would make it easier to see
differences across the states. For example, why is there such a small difference between M and F smoking
rates in RI? This jumps out easier without the total column.
Graph 4:
1. Title: The title needs to let us know the jurisdiction. Is this the US, California, Canada…? And you do not
want to use Trends because that is what you are looking for. I suggest simply Homicides or Number of
Homicides in US. You also do not need the years because you see that in the scale.
2. Scale: The scale for time should be changed to have fewer entries. For example, why not have 5-year intervals
rather than 1-year intervals? There is also the scale issue for the vertical variable. If you have a good title you
do not need to write it out on the vertical axis, plus you could start at 10,000. I would also change the numbers from
10000 to 10,000.
3. Gridlines: The gridlines serve no useful purpose here, so get rid of them.
Graph 5
1. Legend: This is a good graph except there is no reason for the legend. Get rid of it.
2. Title: Need more information in the title. Is this US?
3. Scale: The scale is good, except get rid of the decimal points in the numbers.
Graph 6
1. Graph Type: This is a good example of a poor graph choice. The data here shows the number of people with
disabilities – at least I think that is what we have here. This would be best represented by a column graph. A
scatter graph should have two continuous variables, but states is not a continuous variable. What would work
as a scatter graph would be a graph with the rate of disability and the average age in a set of states.
2. Variables: The problem here is we again have a graph that tells us what we expect so there is no reason for it.
We see the number with disabilities is bigger in states with more people. A better variable would have been
the rate of disability to see differences between states.
Graph 7:
1. Graph type / title: I am not sure where the problem is, but there is a problem. If the title is right, then the graph
type is wrong because a pie graph is good for representing pieces of a whole. Crime rates would be better
shown with a column graph. If we actually have the numbers of crimes, then the graph would be OK, but the
title is wrong because we do not have rates. You also need to give the jurisdiction and I believe these are only
violent crimes so this should be in the title.
Graph 8:
1. Title: The title is confusing and some work needs to be done on the wording. The problem is it is a strange
variable that you do not often see. It seems to be getting at what we get at with a budget deficit that measures
the imbalance between income and spending. There is also a spelling error, which is pretty close to a fatal
mistake.
2. Graph type: The series appears to contain data for only every other year because there are dots only every two
years and the line connecting them is straight. Given this, the time series graph is not the best choice, which it
would be if there were data for every year. In the case of data every two years I would suggest a column
graph, or get the data for each year.
3. Gridlines: They serve no purpose so get rid of them.
Graph 9:
1. Graph?: Here is an example of a graph that pretty much shows us it has no reason to be here. There are only
two categories and one is dominant, so writing that 98 percent of students are schooled in regular schools tells
us everything we need to know.
2. Title: Where are we and when are we talking about? You need this in the title.
Graph 10:
Here is an example of what happens with the copy/paste function. The first graph is a cut and paste while the
second is a cut / paste special (picture). The second one does a better job of holding the format that was in the
original, although you do lose a bit of definition.
A Sample of Interesting Graphs
If you are looking for some more “modern” examples of graphs, you should check out Data 360
(http://www.data360.org/index.aspx) and The Economist magazine’s Graphic Detail
(http://www.economist.com/blogs/graphicdetail). You may also want to check out Gapminder (animated data sets
that have shown up in Rosling’s TED videos). The first two graphs are from the Gapminder site.
1. As you begin your work with graphs, however, you should realize graphs can be used to distort as well as to
inform and you should be aware of what Tufte calls Graphical Integrity. According to Tufte, Graphical Integrity is
more likely to result if these principles are followed:
• The representation of numbers, as physically measured on the surface of the graph itself, should be
directly proportional to the numerical quantities represented
• Clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity. Write
out explanations of the data on the graphic itself. Label important events in the data.
• Show data variation, not design variation
• In time-series displays of money, deflated and standardized measurements are always better than nominal
units
• Graphics must not quote data out of context
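The principle about deflated measurements can be illustrated with a short Python sketch. The wage and price index numbers are hypothetical, chosen only to show the mechanics, and the function name to_real is my own label.

```python
def to_real(nominal, price_index, base_index=100.0):
    """Convert nominal money values to base-year dollars using a price index."""
    return [value * base_index / index
            for value, index in zip(nominal, price_index)]


# Hypothetical annual wages and price index (illustration only)
nominal_wage = [20000, 22000, 24200]
cpi = [100.0, 110.0, 121.0]

real_wage = to_real(nominal_wage, cpi)
print(real_wage)
```

A graph of the nominal wage would show a steady climb, while a graph of the deflated wage would be flat: in this made-up case, all of the increase is inflation.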