Graphs

"The important thing is not to stop questioning"

Overview

One of the most popular techniques for visually displaying quantitative information is the graph, as you know if you have opened an introductory economics textbook or read the financial sections of newspapers or magazines. Unfortunately, experience has clearly indicated that a considerable amount of information is lost in the translation between words and graphs. Stated somewhat differently, it has become painfully clear that graphs are often misunderstood. On the input side, graphs often get in the way of students' learning of economics rather than aiding it, and on the output side, graphs seldom add to the quality of student presentations and writing.

This is not peculiar to Economics. Richard Bowen, a Psychology researcher who has seen much the same thing in his discipline, has written an interesting little book, Graph It! How to Make, Read, and Interpret Graphs. In his book he talks of Graphicacy, or graph literacy, as the goal for his readers. He recognizes, however, that to achieve this goal he will need to help many overcome Graphobia, the fear of graphs. There is also a need to motivate others to make the investment in developing the skills necessary to create and interpret graphs. Fortunately, these skills will pay off well beyond any single college course, since more and more data are being presented in quantitative form, and very often the data are presented graphically.

The difficulty many people have with graphs is not surprising once you realize how recently we 'discovered' graphs. Until the 1800s graphical design depended upon a direct analogy to the physical world. The first graphs were maps. When you look at a sheet of paper, it is fairly easy to make the transition from the right-left and up-down movements on the page to the physical movements east-west and north-south. The grid lines on the page substitute for latitude and longitude.
We tend to order things spatially, and thus it was natural to develop maps that mimicked that order. For example, look at the map of the URI campus at Kingston. Once you orient yourself to the north, you can turn to the right and the map will show what you will be looking at. If you are looking north while standing in Chafee and then turn to the right, you will see Woodward and Tyler Hall, which is just what you see in the map. You will also see that Tyler is about twice as far from Chafee as Woodward, so not only is direction retained, so is the concept of distance (URI map). Similarly, if you have a map of the area surrounding URI (star), you will find Wickford is to the northeast of campus and Newport is approximately twice as far in the easterly direction.

One of the earliest maps was produced in China in 1137, nearly 400 years before comparable maps were produced in Western Europe. It was not until the late 1600s, however, that we began to see the emergence of maps to which data are added. An example of a data map that is available electronically appears below. What is missing is the legend, but we can see there is 'something' that sets apart the area extending from Virginia to New Hampshire, and California, while at the other extreme you find the lower Mississippi delta region (Arkansas, Mississippi, and Louisiana).

Median Family Income by State

Eventually, however, we were able to extend the reach of our visual representation of information beyond space to time, and by 1786 we saw our first time-series graph in The Commercial and Political Atlas, by William Playfair. Now we could visualize the passage of time with a graph, just as we had been able to envision the passage of space. Just as we could say the movement from Kingston to Newport was twice the move from Kingston to Wickford, now we can say that the movement from 1970 to 1980 was half the distance of the move from 1970 to 1990.
Our ability to order time allowed us to translate time-series relationships fairly easily into time-series graphs. It allows us to 'see' the relationship between interest rates and time in the graph below. The reader can easily see that interest rates peaked in 1980 and have been following a cyclical pattern downward.

Occasionally, we can combine space and time in one graph. One very impressive example is Charles Minard's graphical depiction of Napoleon's march into Russia, which we will talk about in class. As you look at the graph below, try to take note of how many variables are being graphed simultaneously. If you start at Kovno, the width of the drawn line conveys how many troops were in Napoleon's army (422,000) as it set off for Russia in June 1812. As you move along with the army you see the location of the major battles and the size of the shrinking army. The line switches colors at Moscow as the French abandon the city and retreat homeward. By the time they leave Moscow, Napoleon's 400,000+ army has been reduced to about 100,000, and the temperature is falling toward 0 degrees Celsius. So now we have location, size of army, date, direction, and temperature on the same graph.

The final advance in the graphical display of information was to move beyond space and time to relational graphics. Again it was Playfair who "broke free of the analogies to the physical world and drew graphics as designs-in-themselves." The implication was that "any variable quantity could be placed in relationship to any other variable quantity, measured for the same units of observation." The examples of these relational graphs in your economics books are numerous, although as Tufte pointed out, these graphs are not very frequent in the popular press.
When he examined 15 news publications for the years 1974-1980, it was in Japan and Germany where we saw the highest use of relational graphics, but even there the share of statistical graphics that were based on more than one variable, and were not time-series or maps, ranged between 5 and 10 percent. One of my favorites, The Economist, had only 2 percent, while the New York Times and Time had 0.5 percent and 0 percent. In a similar review of college and high school textbooks, Tufte found relational graphics to be significantly more common, with the "leaders" being 77 percent in the high school text Chemical Principles by William Masterton and Emil Slowinski and 82 percent in the college text Statistics: A Guide to the Unknown by Judith Tanur. Among the three economics texts reviewed, relational graphics accounted for 16 percent of the statistical graphics in the classic college text by Samuelson, about midway between what he found in two high school texts.

It's now time to try your hand at data graphics, which Tufte describes as a "visual display of quantities by means of the combined use of points, lines, a coordinate system, numbers, symbols, words, shading, and color." As it turns out, it is a good thing the creator of the graph has so many dimensions to use, since in the majority of cases what we are interested in representing are multivariate relationships on sheets of paper, what Tufte calls the flatlands. When you see a graph, you should think that behind this graph is a table of numbers and that the creator of the graph was attempting to make it easier for the reader to see the story or pattern that existed in the data. As Bowen indicates, "Graphs are intended to make it easy to read, understand, and remember a relationship found in a set of data." To get there, however, we need to look a little more closely at the mechanics of the various graphs.
Because we will use graphs throughout, you will need to be sure you have mastered them. Before you get out there and create some graphs for me, check out the guidelines based on previous student-generated graphs that can be found at the end of this unit. Be sure to read them so you learn from the efforts of previous students and avoid their mistakes. In the unit that follows you will find a sample of graphs I find interesting and provocative.

Some Basics of Graphing

When do we use a graph and how do we create one? The short answers are sparingly and with care, but let's look at them a bit more carefully. On the when issue, I suspect your experiences have been similar to mine: you have seen some impressive graphs that immediately evoke an image, and you have seen some losers that convey virtually no information. Experience has convinced me of the importance of graphs, and of the ease with which they can be abused. As for the how, you now have at your disposal a number of software packages that allow you to produce some impressive visuals.

As a starter, we can look at text, equations, tables, and graphs as alternative means of presenting information, and you must decide which to use. First, equations are to be used only when the presentation is designed for a professional audience. You will find them in economics journals, and in scientific and medical journals, but not in too many other places. Second, the choice between tables and graphs, both of which you see often in presentations, is not something we can easily capture in a checklist, but there are some guidelines. To help you in making that choice, here are two examples: Slippery Slope's revenues and stock market data for APC.

Slippery Slope: Below you have a table and a graph containing data on revenues for Slippery Slope University over a six-year period.
Slippery Slope University Revenues

Year    Revenue ($ millions)
1991    100
1992     90
1993     92
1994     95
1995     98
1996    101

The advantage of the table is that we can determine the actual revenue figure for each year, something not possible in the graph [you could put value labels on and solve this problem for some graphs]. For example, if you needed to know revenue in 1992, you would want to use a table. The graph, on the other hand, has the advantage of a visual representation of any relationship between the variables in the table. Although the growth in revenues is 'contained' in both the table and the graph, it is far more obvious in the graph, and if it were your intent to convey the growth, the graph would be your preferred choice. This is not a choice to be ignored, since the technology allows you to create, almost effortlessly, any graphical representation of the data, and in the section on graphs we will look at this issue closely.

APC stock prices: Consider the stock market and the price of stocks. When the stock market was booming, as it was in 1998, everyone seemed to be watching stock prices. If you were interested in following your stock, you could access the following information on American Power Conversion, a successful local company whose stock is traded on NASDAQ. First let's look at a common tabular presentation that conveys information on the volume of shares traded that day (801,600), the price of the stock at the end of trading (23 11/16), the high and low prices reached during the day (23 7/8 and 22 7/8), and the net change from the beginning to the end of the day (+11/16). You also know that over the past year the price has ranged from a high of 31.5 to a low of 8.5. Taken together we have 9 pieces of information that are conveyed in the table.
52 Weeks
Hi      Low     Symbol   Vol (100s)   High     Low      Close      Net Change
31.5    8.5     APEX     8016         23 7/8   22 7/8   23 11/16   +11/16

Now turn your attention to the following time-series graphs of APC's stock price and the volume of shares traded for the past 200 days. In these graphs there are about 400 pieces of information presented, far more than you would be able to present in a table. When we want to convey a large volume of data, it is most likely that we will want to use a graph. If you look at the graph you will note there is something important missing: the actual stock price. While the graph allows us to look quickly for a pattern in the past prices of APC's stock, it does not allow us to quickly determine the price yesterday. This is another of the important differences between graphs and tables. When you need a few precise numbers, a table will probably serve the purpose, but when you want to demonstrate a relationship between two variables, you would want to use a graph.

In the graphs above we are able to look quickly at the relationship between stock prices and time, volume and time, and, with a little effort, the relationship between volume and price. It takes only seconds to realize that APC's price was falling for most of 1997 after more than doubling during the last four months of 1996. As for volume, there were three episodes of unusually high volume, and it looks as though sharp price changes accompanied the increased activity levels.

You will also find economists tend to use graphs to look at the logical implications of some of their ideas and theories. One particularly interesting historical example was produced by Dr. John Snow, who was concerned with the spread of cholera in London. Two obvious ways of conveying the information on cholera deaths would be line graphs describing the deaths each day and the cumulative number of deaths. Neither of these, however, helped convey Dr. Snow's theory of cholera, that it was the result of the mixing of drinking water and sewage.
To convey this relationship Dr. Snow produced a data map, which clearly indicated the deaths tended to be concentrated around one water pump. This demonstration was so convincing that the authorities removed the handle from the pump and the epidemic ended.

As we go forward, keep in mind that reading graphs is like riding a bike: it is difficult in the beginning and you might fall a few times, but with practice you can master it and then it seems very easy. All graphs are pictures of relationships, so start by figuring out what the variables in the relationship are. Also make sure you recognize the difference between describing a graph and explaining it. Describing it usually involves phrases like "it was rising" or "they are inversely related." You are simply describing the pattern you see. Explaining is more difficult because you are then trying to identify what might account for the pattern you are seeing, and this usually means you need to have some outside information. In this course you will be asked to do both.

Now let's move on to a discussion of individual graph types. Once you have decided to use a graph, you must then decide what type of graph to use. It should not surprise you that technology will allow you to produce almost any type of graph. All you must decide is which one does the best job of conveying the information you would like the viewer to know.

A Sample of Graph Types

Pie Graphs

Pie graphs are good with cross-section data where the primary concern is allocation, the parts of a whole. If you use a pie graph you MUST be graphing something for which adding up all of the pieces gives you something meaningful. For example, you could construct a pie graph showing how much of your total spending is on food, clothing, recreation, education, and transportation. In the example below, we have a pie graph of regional population shares in the United States in 1990.
New England and the Mountain regions had the smallest populations, each with 5 percent of the nation's population, while the East North Central and South Atlantic regions were the largest with 18 percent. You cannot, however, determine from the graph the level of population.

Bar / Column Graphs

Bar and column graphs are appropriate with non-continuous variables, a good example being population for the nine US census regions. There is no notion of continuity as we move from one region to another, so we would not want to use a line graph. Below we have a bar graph of the distribution of US population by region, the same information as displayed in the pie graph with just a different spin. In both graphs we see that the two largest regions are the East North Central and the South Atlantic, each with populations over 40 million. The difference here is that we know the population in the regions, but we do not know the shares.

Another thing to keep in mind with the bar graph is that the order of the variables matters. The regions in the graph below are not randomly ordered; they are ordered by geographic location. You begin in the Pacific region (PA) and move through the Mountain (M) all the way east to New England (NE).

We could go beyond this to look at stack graphs, which convey information that would normally appear in a bar/column graph and a pie graph. Below is a stack graph where the height of the columns provides us information on total US population in 1900, 1950, 1980, and 1990, while the division of the column allows us to guesstimate the regional population figures. What is not that easy to see here is what has happened to the share of population in a region. For example, what has happened to the share of the US population in the Pacific region between 1900 and 1990? This information could be seen with a stack graph that looks at only the composition for each year, so the columns are all the same height at 100%.
Scatter Diagram

A third graph is the scatter diagram, the graph that, for reasons unknown to me, seems over the years to have caused students the most problems. Scatter diagrams are designed to provide a visual image of a potential relationship between any two variables. To better see how to construct and interpret the graphs, assume you have decided to undertake a study to determine the relationship between the variables X and Y. As a first step, you collect eight observations on the variables for the years 1981 through 1988. They might describe the relationship between consumption spending and income, quantity demanded and price, exchange rates and imports, inflation and unemployment, or the budget and trade deficits. What you want to know is whether or not these data support the hypothesis that there is a relationship between Y and X.

     1981  1982  1983  1984  1985  1986  1987  1988
X    x1    x2    x3    x4    x5    x6    x7    x8
Y    y1    y2    y3    y4    y5    y6    y7    y8

Below you will find four possible 'patterns' that could emerge from your analysis. In each diagram the points correspond to the individual years, with the point corresponding to 1983 having been marked in each diagram. What you are looking for is a pattern that could be approximated by a line, because this would be evidence of a relationship between X and Y. In the first diagram the points tend to be loosely scattered around the positively sloped line, while in the second diagram the points seem to be more tightly packed around the negatively sloped line. Based on these findings we would be led to conclude in the first case that there is weak evidence that X and Y are positively related, while in the second there is strong evidence of a negative relationship. In diagram c, where the scatter of points resembles the scatter of darts thrown by a novice, there is little evidence of any relationship, as the points seem to be randomly distributed.
Finally, in diagram d, the data suggest there is evidence of a positive relationship between Y and X, but it is a nonlinear relationship. This is the type of relationship we would expect in a study of the income-consumption spending relationship.

Before leaving scatter diagrams behind, let us turn to the specific problem of determining the relationship between the inflation rate and interest rates. Economic theory leads you to believe interest rates (r) and inflation rates (i) are positively related, with an increase in inflation rates pushing up interest rates. To test this theory, the data on interest rates and inflation that appear in the accompanying table were collected. Do these data support the hypothesis that there is a relationship between interest rates and inflation?

Year    Interest Rate    Inflation
1981    14.0             10.3
1982    10.7              6.2
1983     8.6              3.2
1984     9.6              4.3
1985     7.5              3.6
1986     6.0              1.9
1987     5.8              3.6
1988     6.7              5.8
1989     8.1              4.8
1990     7.5              5.4
1991     6.1              3.9

The scatter diagram generated by these data is presented below, where once again each point represents one year. For example, the highest point on the Inflation and Interest Rate scatter diagram corresponds to 1981, when the interest rate was 14 percent. As you can see, there does tend to be a relationship between the two variables. The scatter of points tends to rise as we move to the right: as i increases, r tends to increase, but certainly not in a way that can be easily captured by some linear relationship. This is where you would need to call on some of the techniques that you learned in statistics, but we will leave that for a later date.

Time-Series Graph

A time-series graph is a line graph where time is measured on the horizontal axis. You can see the importance of these graphs by browsing through your economics textbooks and seeing how often they appear. As you study these graphs, ask yourself whether you have a systematic approach to analyzing them.
You should, if you are to extract from the graph the information that is embedded in it. To get a feel for how easy generating a time-series graph has become, you should check out the on-line site economagic and create a graph in a matter of a few minutes.

To extract the information embedded in any time-series graph, you should begin by conceptually decomposing the movement of the variable being graphed into its separate components: long-term trends, short-term cyclical movements, seasonal patterns, and unexplained fluctuations called noise. For example, if we were looking to forecast retail sales for a state, we could expect the change in sales in any month to be larger if the state were in the midst of a period of sustained growth rather than decline (Florida vs. North Dakota), if the state economy were in the midst of a recovery rather than in the throes of a recession, and if we were talking about December rather than January. [An example of the decomposition of a time series can be found in a discussion of how the time frame in any analysis affects the results - the When question].

When decomposing the movement in sales of widgets, economists' favorite hypothetical good, it is easiest if you begin with the trend. This would usually be reflected in the 'average slope' that can be determined by the unsophisticated eyeball approach or the sophisticated regression approach. Does the curve slope up, does it slope down, or is there a definite change in the direction or the magnitude of the slope over time? These are the questions you must attempt to answer at the outset of your analysis. Comments such as "wages have grown on average approximately 4 percent per year for the past two decades" or "labor productivity rose at a rate of approximately 2 percent per year during the post-WWII period" refer to these trends. In the graph below, the underlying trend appears to be positive throughout the entire time under review.
It is clear from this graph, however, that there are also significant variations about this trend, variations that are referred to as cycles, business cycles to be more precise. These business cycles are a pervasive feature of capitalist economies. The recurring pattern of recessions (R) followed by expansions (E) characterizes nearly all measures of economic performance in nearly all economies. If you have any doubts, just look at the graphs of inflation, unemployment, interest rates, budget deficits, trade deficits, and exchange rates that you will find in your macroeconomics texts. You will see that the peaks and troughs of the various time-series graphs often tend to coincide with each other.

Two examples of 'real' world time-series graphs appear below. In the first we can see the history of inflation in the United States over the past 120 years. When trying to find the "story" in a time-series graph, I suggest you look for cycles and trends, the short-term and long-term patterns. You should make a list of what you 'see' in the graph, and then read on. I see two peaks, one associated with WWI (1916-1920) and one following WWII (1946-1948). [One of the limitations is that I cannot identify exactly what year it is peaking, but if this mattered, it may be best to use a table.] Some other features of the graph are: (1) there are no negative numbers (deflation) after 1948; (2) there is an upward trend beginning approximately 1960 and ending in approximately 1980; (3) there is a downward trend extending from 1980 through the early 1990s.

In the second time-series graph we have both the interest rate and the inflation rate. You would construct this graph if you were looking to find relationships between relationships. For example, the inflation line describes the relationship between inflation and time, while the second line describes the relationship between interest rates and time.
What I see is that they have similar patterns: interest rates tend to rise when inflation rates rise (1970s) and tend to fall when inflation rates fall (1980s).

Before looking into the specifics, however, we will look at two important features of graphs that deserve special attention when you are reading them or creating them: the difference between causality and correlation, and the importance of the scale of the axes in a graph.

Line Graph

Of all the possible graphs, the most popular among economists is the line graph. Unfortunately it is not one of the easier graphs to understand. Given that it is an important and frequently used tool, and that it is not widely understood, we'll spend a little time working on the basics of the line graph.

The first step in creating a line graph involves specifying the axes of the graph. There are two number lines that are perpendicular to each other (blue lines). Each point on the graph represents one combination of values for the two variables. In the graph below you will see two points. If we assume that X is being measured on the horizontal axis and Y on the vertical axis, then at point #1 the value of X = -3 and the value of Y = 5. At point #2, X = 4 and Y = -2. Now let's answer the question: What happens to the values of X and Y as we move from point #1 to point #2? First you should note that the direction is down and to the right. Down means less of Y, while to the right means more of X. The movement from #1 to #2 therefore is a visual representation of an increase in X and a decrease in Y. Since they move in opposite directions, you would say that they are negatively related.

Now let's talk about some graphs. The place to start is by recognizing that behind every line graph is a table and behind every table is a story. So let's start with a simple story and work our way to a table and then to a graph. The story is about grades and the relationship between grades and study time.
At this time in your educational career you probably believe there is a positive relationship between grades and time spent studying. To make life easy, let's make it a very simple relationship. If you study no hours, you get a zero, and for each hour of studying you get a 5-point increase in your grade. What would the table and the graph look like?

The table and the graph appear below. In the table we can easily check out the story. As we move down one row we are looking at what happens with two more hours of study, and we see that the grade increases by 10 points, or 5 points for each hour. We get the same number whenever we compare two rows: it is always 5 more points for another hour of study. What we have is a constant rate of increase. Every time we increase time spent studying by one hour, the grade increases by the same amount. So what does the picture of this relationship look like? Don't cheat and look ahead. First see if you have an image of what the picture of the relationship would look like. You know that this is a positive relationship: when study time increases, the grade increases. What will you see in the graph that conveys this information?

Time    Grade
 0        0
 2       10
 4       20
 6       30
 8       40
10       50
12       60
14       70
16       80
18       90
20      100

What you see is a line sloping upward from left to right. This is the "picture" of a positive relationship. Each point on the line corresponds to a row in the table. The red line corresponds to the row with ten hours of study time. If you read over from the graph you see that at 10 hours of study time, the grade will be 50.

Now let's look at the slope, the steepness of the line. A steeper line is said to have a greater slope, but what does that mean in terms of our story? Let's look at the graph below and work back to a table and then a story. We see the line is still positively sloped, so we know that there is still a positive relationship. We also see that the line is steeper, which tells us the improvement in the grade for each hour spent studying is now greater.
How do we know that? From high school algebra you know that slope = rise/run. So let's pick two points. The first is the origin, where the grade is zero for zero hours spent studying. The second is at five hours of studying. In the old graph (blue line), the grade is 25. The grade on the new graph (green line) is 30. The slopes of the two lines would be:

slope of blue line = rise/run = ΔY/ΔX = (25-0)/(5-0) = 25/5 = 5
slope of green line = rise/run = ΔY/ΔX = (30-0)/(5-0) = 30/5 = 6

The steeper slope means each hour spent studying in the new situation increases the grade by 6 points. We would draw a steeper slope if we wanted to demonstrate a situation where changes in X (hours) created greater changes in Y (grades).

You have now worked both ways, translating from words to graphs and from graphs to words. This is a skill that comes with practice, and you should consider getting some practice if you are not really comfortable with the translations. For some practice consider the following two "extensions" of the model.

The first is a nonlinear situation, a line that is not straight. To see what we have here, let's try to create a table and a graph that correspond to the following "story." You will get 14 points without any study time, and the first two hours spent studying will give you 13 additional points, the second two hours will give you 12 additional points, the third two hours will give you an additional 11 points, ... You can see the pattern. What will the table and the graph look like?

Time    Grade
 0       14
 2       27
 4       39
 6       50
 8       60
10       69
12       77
14       84
16       90
18       95
20       99

Let's look at the table first and see what the table tells us about the rate of change. When we increase study time from 0 to 2 hours, the grade increases by 13 points. When we increase from 10 hours to 12 hours, the grade increases by 8 points. What has happened to the rate of increase?
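One way to check the rate of change in the two study-time stories is to compute rise over run directly. The short Python sketch below is only an illustration: the function names slope and segment_slopes are made up for this example, and the numbers are the ones given in the text and in the nonlinear table above.

```python
# Hours studied and grades from the nonlinear "story" table above.
hours = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
grades = [14, 27, 39, 50, 60, 69, 77, 84, 90, 95, 99]


def slope(x1, y1, x2, y2):
    """Rise over run between two points (x1, y1) and (x2, y2)."""
    return (y2 - y1) / (x2 - x1)


def segment_slopes(xs, ys):
    """Slope between each pair of neighboring rows in a table."""
    return [
        slope(xs[i], ys[i], xs[i + 1], ys[i + 1])
        for i in range(len(xs) - 1)
    ]


# The two straight lines: from the origin to the grade at five hours.
blue_slope = slope(0, 0, 5, 25)    # 5 points per hour
green_slope = slope(0, 0, 5, 30)   # 6 points per hour

# The nonlinear table: one slope per two-hour step.
nonlinear_slopes = segment_slopes(hours, grades)

print("blue line:", blue_slope)
print("green line:", green_slope)
for start, s in zip(hours, nonlinear_slopes):
    print(f"{start:2d} to {start + 2:2d} hours: {s} points per hour")
```

For the straight lines the slope is the same no matter which two points you pick. For the nonlinear table the slope is different for every two-hour step, which is exactly what a curved line looks like in numbers.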
The rate of change, or slope, for the two pairs of points is:

slope at 0 = rise/run = ΔY/ΔX = (27-14)/(2-0) = 13/2 = 6.5
slope at 12 = rise/run = ΔY/ΔX = (77-69)/(12-10) = 8/2 = 4

The rate of increase has gone down. In fact, as we increase the hours spent studying, the improvement to the grade continues to fall. The rate at which the grades increase declines as the time spent studying increases. Now how do we see this on the graph? Let's look. The change in the rate of change will show up as a change in the slope. Because the rate of change slows as we increase hours studying, the slope decreases as we move to the right.

Now to that second extension. When we draw a graph showing a relationship between two variables (grade and hours), this does not mean that no other factors influence the relationship. Can you think of other factors that might alter the relationship between hours spent studying and grades? How about the amount of serious drinking? I suspect that heavy drinking would decrease the value of your time studying. What about watching TV while you are studying? How about a study group? And what about a scale? What about the type of questions? We'll look at the effect of the type of questions. Let's assume you will get a mix of questions with more easy questions. What will happen to the relationship between the grades and the time spent studying? You should get a higher grade with the new questions, which is precisely what you see in the table below. Regardless of how many hours are spent studying, the grade is higher for the easy test. So how does this look on the graph? The entire curve shifts. When one of the "other" factors that influence grades changes, the relationship between grades and hours changes, which shows up as a new line.

Time    Difficult    Easy
 0        0            5
 2       10           15
 4       20           25
 6       30           35
 8       40           45
10       50           55
12       60           65
14       70           75
16       80           85
18       90           95

Correlation vs. causality

Regardless of the variables being graphed, caution must be taken when interpreting the 'results'.
As a case in point, consider the graph of interest rates and wage growth rates. The similarity in the trends and cycles of interest rates and wage growth suggests there might be a direct relationship between the two phenomena. In the 1960s and 1970s interest rates and wage growth rates tended to rise together, and in the 1980s they both dropped precipitously. In fact, there is no direct relationship. These two phenomena appear to be related because they are both related to a common third factor, the inflation rate. When inflation rates are high, both wage growth and interest rates tend to be high, and all three tend to fall together. In normal times one can expect lenders of money to charge borrowers interest rates that are somewhat above inflation rates. Workers, meanwhile, can be expected to bargain for, and receive, wage increases that exceed inflation as compensation for productivity increases. Neither lenders nor workers will normally accept less than the inflation rate because by doing so they would be losing buying power.

Now you have the answer to the time-share salesperson who tries to "scare" you into buying a time-share by comparing the future price of the vacation with your current income, a very unfair and unjustifiable comparison. It should be clear the driving force behind the rise in hotel room prices is inflation, which will also drive up your wages. Given the close historical association of wages and prices, it would seem reasonable to assume your wages would keep rising along with the price of the hotel rooms. It is very likely the $5,187 your hotel room was projected to cost in ten years would not represent any more of a financial strain than the current $2,000. If your earnings, which are now $4,000 a month, followed the same pattern of growth, then you would be earning in excess of $10,000 a month in ten years.
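The time-share arithmetic can be checked with a short sketch. It assumes annual compounding at 10 percent, the rate that reproduces the $5,187 projection; the function name compound and the variable names are illustrative only and are not part of the original reading.

```python
def compound(value: float, annual_rate: float, years: int) -> float:
    """Grow a value at a constant annual rate, compounded once per year."""
    return value * (1 + annual_rate) ** years


# Figures from the time-share example. The 10 percent rate is an assumption:
# it is the salesperson's scenario, not a forecast.
RATE = 0.10
YEARS = 10
room_today = 2_000.0
income_today = 4_000.0

room_future = compound(room_today, RATE, YEARS)
income_future = compound(income_today, RATE, YEARS)

# The fair comparison is price relative to income in the SAME year.
share_today = room_today / income_today
share_future = room_future / income_future

print(f"Room price in {YEARS} years: ${room_future:,.0f}")
print(f"Monthly income in {YEARS} years: ${income_future:,.0f}")
print(f"Room as a share of monthly income: {share_today:.2f} now, {share_future:.2f} later")
```

Because both figures grow at the same rate, the room costs the same share of a month's income in both years, which is why comparing the future price with current income is misleading.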
And if this is not enough, you can see the historical record suggests that 10 percent a year is unlikely to be a reasonable forecast for inflation, but more about that at a later time.

Scale distortion

An important choice one must make when presenting data is the appropriate scale for the graphs. When you look at a graph, if you are anything like me, your initial interpretation of the relationship between the two variables is influenced by the steepness of the curve. For example, consider the time-series graphs of sales for companies Y and X. Based on these graphs I suspect that you would be predisposed to believing that Company Y had experienced the sharpest sales decline. The truth of the matter is, however, there is no reason to believe this because the numeric scales on the axes are excluded from the two graphs.

To see the significance of this, consider the following simple example of beer demand. The columns representing demand differ only in that in the first column output is measured in quarts, while in the second pints are the unit of measurement. At a price of 20, demand will be 10 quarts or 20 pints, and as the price falls to 16, demand will increase by 2 quarts or 4 pints.

Beer Demand

Price    Demand (Quarts)    Demand (Pints)
20       10                 20
16       12                 24
12       14                 28
8        16                 32
4        18                 36
0        20                 40

When we graph these relationships we can see the influence of the units of measurement on the steepness of the demand curves. The demand curve, when specified in terms of pints, would appear far more responsive to price change, but as we know from the data above, there is no difference. This is why economists tend to favor elasticities rather than slopes as measures of responsiveness, something we will talk more about later.

We are now ready to return to the Ansylium problem for another look at the effectiveness of the drug. Was there a sharp drop in the blood level of chemical X following the introduction of Ansylium?
It depends on which of the two diagrams below better describes the situation. Although both diagrams have identical shapes, in Diagram A there is a 50 percent decline in X between 1984 and 1990, while in Diagram B the decline is only 2 percent. So be careful when reading a graph. Be sure to read the labels and the units of measurement before you jump to any hasty interpretations. Now let’s look at the basics of good graphs.

A Guide to Good Graphics

Not all graphs are created equal: some are wrong and misleading, some are boring, while others are WOW graphs that capture the viewer's attention.1 You want to be in the last category, especially if it is true that “we live in a data driven world where the ability to create effective charts and graphs has become as indispensable as good writing.” (Wong, 13) The good news is the technology exists that allows us to create “spiffy” graphics quite easily, and we will look at how to create those graphs and the things to keep in mind as you are creating them. The bad news is there are too many graphs that are high on techniques such as 3-D and low on informational content and persuasiveness. There is a good deal of !@#$%^ out there produced by those who thought little about the story they were trying to convey, so as we move forward keep in mind that we are talking about storytelling, with the goal that your graphics will add to the stories you are telling.

One book that is devoted to this is Dona Wong’s The Wall Street Journal Guide to Information Graphics (Norton 2010). Wong identifies four key steps in the process that are worth keeping in mind as you work on your graphics: research, edit, plot, and review. Good graphics start with the research that gets you the underlying data, which must be of high quality. Second, you must edit the data since it is unlikely to be in the form that is most useful, an issue discussed in the opening unit.
Third, you need to plot the data, and this should involve the creation of a number of graphs from which you will pick the best. Think of it as a photo shoot that produces an image in a magazine. If you have ever watched what goes into it, you realize there were hundreds of photos taken to create the one photo in the magazine, which is why they look so good. You will need to do the same, and fortunately the technology allows you to create the graphs easily and quickly. Finally, you need to review the graphs and make sure they are correct since too often I see student work with errors, and given the adage “a chain is as strong as its weakest link,” if your graph is trash then so is the rest of your story.

Another good source would be the works of Edward Tufte. According to Tufte, graphical displays should:

1. induce the reader to think about the substance rather than about methodology, graphic design, the technology of graphic production, or something else
2. avoid distorting what data have to say
3. present many numbers in a small space
4. make large data sets coherent
5. encourage the eye to compare different pieces of data
6. serve a reasonably clear purpose: description, exploration, tabulation, or decoration
7. be closely integrated with the statistical and verbal descriptions of a data set

Now let’s look at a few things to keep in mind when constructing a graph.

Checklist for your graphs

1. Story matters: Most graphs are part of a story, either a written document or a presentation, and you need to make sure that you do a good job of integrating the graphic into the story. This is a BIG problem with student presentations and reports. If you are going to use a graph, then it should provide some insight into the story. For example, below is a graph of the Shiller Home Price Index for Miami, Florida. Based on this graph you can tell a story about the run-up in home prices in the decade ending in 2006 and the collapse in the following two years.
Home prices in Miami tripled in the decade and then fell about 50% before bottoming out in 2011.

2. Type matters: Make sure you create the best graph for the data. Be sure you understand the difference between continuous and discontinuous variables since the graph you use depends upon the difference. You would not use a line graph when looking at data for different age brackets, states, or racial and ethnic groups. These are all discontinuous variables, so you should use column / bar graphs rather than line graphs. If you are looking for a comparison of two or more variables over a specified period of time, use a line graph and not a column graph. I trust you can see the difference.

3. Size matters: Be careful to avoid graphs that are too small. You should also make sure the font is large enough in all of the text in the graph. If it is a PPT presentation, make sure you have checked it out from the distance at which the audience will be sitting.

4. Complexity matters: Do not try to do too much. Graphs should not include too many variables because when you have too many variables the noise level in the exhibit rises. You do not want a pie graph with too many slices, a column graph with too many columns, or a line graph with too many lines. Below is a time series graph of unemployment rates in the New England states. There are some subtle differences across states, but the number of lines in the graph makes it difficult to see those subtle differences. It might be best to compare one state to the US or regional average rather than show the eight lines.

5. Titles matter: Make sure titles convey accurate information and that the reader has enough information to understand the relationship being represented in the graph. Be careful with your title since very often in student work it does not correspond to the data being graphed. And do not forget, spelling matters.
• Use abbreviations when possible.
When looking at regions in the US, the titles could be abbreviated to NE, MA, ENC, which would give much more space for the graph.
• If the data are in millions, thousands... then you should put this in parentheses below the title or on the axis where the data are. An example would be "United States Population in Billions at the Beginning of Each Decade" versus "US Population: 1960-2000 (millions)".
• Make it short. There is no need for the following title (and I did not make the typos): "Number of Government Workers at the Beginning of Each Year for the a 10 Year Period". It could be replaced with "Government Employment".

6. Axes matter: Be careful about the axes. Sometimes Excel specifies the axes it wants, not the ones you want. As an example, one column graph had columns designated as 1, 2, 3…, but this did not match the data or the description. Be careful to look at the graph and make sure it is what you want.

7. Real estate matters: Think of a graph’s axes as defining a square, and your job is to adjust the default scales so that the graph occupies as much of the real estate as possible. Do not use too much for the legend, and make sure you do not use too many digits in the numbers on the axes. For example, when looking at incomes you probably do not want 345.67546. You could probably use 345. The same is true when the table includes GDP numbers like $3,980,000,000,000. In this case you would want to have the numbers in billions and have $3,980. Do not include excessive 0s in the graph and do not include unnecessary digits. Also, be sure to choose the scale to fill in the real estate. The impact of the scale can be seen in the graphs below.

8. Colors matter: Choose your colors carefully and assume the chart will be copied, so make sure it looks good in black-and-white. Avoid using a variety of colors on a graph and think of using different shades of the same color, with a darker shade when you want to emphasize something.
In a line graph make sure you use different line types so when it is copied the reader will be able to identify the lines. Also be careful if the graph will be reproduced in color. The graph on the left makes it easy for us to separate the two lines, but not the one on the right. Use different styles and widths for the lines if it is going to be in black and white.

9. Accuracy matters: Make sure the graph makes sense. The graph below makes no sense to me because I can’t really come up with a story that goes with it. This is another example of why you should look at what you get from Excel before going public with it.

10. Interesting matters: If a graph can be explained with one or two sentences, then you do not need a graph. If there is no pattern that "jumps" out at you, then it is not clear this information is best shown in a graph. It might be better as a part of the discussion with no graph. You could also transform the data, and this might create an interesting story. Below you will see a graph of the CPI in Italy on the left, which is not the type of graph that should be part of any presentation. What we see is the price level rising, but this is no surprise, and we described it in one sentence. On the right we have a graph based on the same data. The inflation rate is the percentage change in the price level, so it was an easy transformation. But here there is a story: a dramatic decline in the inflation rate in the early 1980s, and then another decline in the late 1990s. There is a story here about the euro, the OPEC inflations of the late 1970s, and the change in monetary policies in the early 1980s, to mention a few that you could not see in the price level graph.

11. Effort matters: Anyone using Excel knows what the default settings are, and seldom are they the ones you want, so a reader will know that you have not taken the time to “clean up” the graph.
When you generate a set of graphs as you look for the correct one you needn’t do this, but once you have settled on a graph you need to work on it. The fact is everyone who has done Excel graphs knows the default settings, so make sure the reader knows you took the extra effort to make your graph stand out from the crowd. Using the default settings in a graph would be similar to using the Word resume template when constructing a resume. Everyone in the business of hiring knows these forms, and you probably do not want to use one if you are trying to separate yourself from the crowd. Here are a couple of things to consider in that work.
• Use the gray scale and the gridlines at your peril. Unless you have a good reason to keep the gridlines and gray scale that are the defaults in Excel, get rid of them. These are just distractions that take away from your graph.
• Legends are defaults, but many times you do not need them. There is no reason to put a legend on a graph when there is only one variable. Also, if you have a few variables you may want to try to write them onto the graph and eliminate the legend. Compare the two graphs below and I am certain you will see a BIG difference.

A Critique of Some Graphs

Given the supply of graphs that are not quite ready for prime time, I am trying another approach. In this section I have included a number of graphs that have been submitted, along with my comments on those graphs. Some of the comments have to do with the variables and some have to do with the graph itself, and I have chosen a few graphs to point out the features that are mostly covered in the reading.

Graph 1

1. Titles/legends: Make sure the titles are efficient, meaning they are correct and as short as possible. This looks like a good title until I see the legend, which has production rather than consumption. Also, if this is actually production, I would eliminate “Production” from the legend.
When using data from an online source it is common to shorten the name when creating the graph. What you need to do is change the title on the spreadsheet column and it carries over to the graph. There is also no need to have 2013 in the horizontal scale since you could handle this by putting 2013 into the title.

2. Gridlines: This graph does double duty. The gridlines do nothing to help the reader so get rid of them, which we saw is very simple.

3. Scale: Look at how much space is unused in the graph. There is no reason not to change the scale from 100220 so the data will fill the graph.

U.S. Primary Energy Consumption (BTU quadrillions)

Graph 2

1. KISS (keep it simple): You can pull out all of the stops, but I suggest keeping it simple. Here is an example where the colors are overkill because they add nothing to the story being told. One thing I have done, however, when I wanted to emphasize one region / state / country, is change the color of one column to focus attention on that one.

Graph 3

1. Variable selection: This is a valid scatter diagram because we have two variables, but it does not tell us much about a relationship between the two variables because they are probably both driven by a third variable, population. We would expect more juvenile and criminal cases in big states, so we have nothing here that is interesting.

2. Title: The title needs work since it does not give me a good idea of what I am looking at in the graph.

Graph 4

1. Title: The title is ambiguous. I think it represents the rate of smoking in each state, but it might not. Also, there is no date. NE usually means New England, but in this case it means Northeast. Also, you need to have female in the legend.

2. Variable choice: Why have a Total column in the graph? It is a distraction for the reader, and if we have both males and females, then we have the picture with just those two and this would make it easier to see differences across the states.
For example, why is there such a small difference between M and F smoking rates in RI? This jumps out more easily without the total column.

Graph 4

1. Title: The title needs to let us know the jurisdiction. Is this the US, California, Canada…? And you do not want to use Trends because that is what you are looking for. I suggest simply Homicides or Number of Homicides in US. You also do not need the years because you see them in the scale.

2. Scale: The scale for time should be changed to have fewer entries. For example, why not have 5-year intervals rather than 1-year intervals? There is also the scale issue for the vertical variable. If you have a good title you do not need to write it out on the vertical axis, plus you could start at 10,000. I would also change the numbers from 10000 to 10,000.

3. Gridlines: The gridlines serve no useful purpose here, so get rid of them.

Graph 5

1. Legend: This is a good graph except there is no reason for the legend. Get rid of it.

2. Title: Need more information in the title. Is this the US?

3. Scale: The scale is good, except get rid of the decimal points in the numbers.

Graph 6

1. Graph type: This is a good example of a poor graph choice. The data here show the number of people with disabilities, or at least I think that is what we have here. This would be best represented by a column graph. A scatter graph should have two continuous variables, but states is not a continuous variable. What would work as a scatter graph would be a graph with the rate of disability and the average age in a set of states.

2. Variables: The problem here is we again have a graph that tells us what we expect, so there is no reason for it. We see the number with disabilities is bigger in states with more people. A better variable would have been the rate of disability, to see differences between states.

Graph 7

1. Graph type / title: I am not sure where the problem is, but there is a problem.
If the title is right, then the graph type is wrong because a pie graph is good for representing pieces of a whole. Crime rates would be better shown with a column graph. If we actually have the numbers of crimes, then the graph would be OK, but the title is wrong because we do not have rates. You also need to give the jurisdiction, and I believe these are only violent crimes, so this should be in the title.

Graph 8

1. Title: The title is confusing and some work needs to be done on the wording. The problem is it is a strange variable that you do not often see. It seems to be getting at what we get at with a budget deficit, which measures the imbalance between income and spending. There is also a spelling error, which is pretty close to a fatal mistake.

2. Graph type: The series appears to contain data only for every other year because there are dots on only every two years and the line connecting them is straight. Given this, the time series graph is not the best choice, which it would be if there were data for every year. In the case of data every two years I would suggest a column graph, or get the data for each year.

3. Gridlines: They serve no purpose so get rid of them.

Graph 9

1. Graph?: Here is an example of a graph that pretty much shows us it has no reason to be here. There are only two categories and one is dominant, so writing that 98 percent of students are schooled in regular schools tells us everything we need to know.

2. Title: Where are we and when are we talking about? You need this in the title.

Graph 10

Here is an example of what happens with the copy/paste function. The first graph is a cut and paste while the second is a cut / paste special (picture). The second one does a better job of holding the format that was in the original, although you do lose a bit of definition.

A Sample of Interesting Graphs
If you are looking for some more “modern” examples of graphs, you should check out Data 360 (http://www.data360.org/index.aspx) and The Economist magazine’s Graphic Detail (http://www.economist.com/blogs/graphicdetail). You may also want to check out Gapminder (animated data sets that have shown up in Rosling’s TED videos). The first two graphs are from the Gapminder site.

1. As you begin your work with graphs, however, you should realize graphs can be used to distort as well as to inform, and you should be aware of what Tufte calls Graphical Integrity. According to Tufte, Graphical Integrity is more likely to result if these principles are followed:
• The representation of numbers, as physically measured on the surface of the graph itself, should be directly proportional to the numerical quantities represented.
• Clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity. Write out explanations of the data on the graphic itself. Label important events in the data.
• Show data variation, not design variation.
• In time-series displays of money, deflated and standardized measurements are always better than nominal units.
• Graphics must not quote data out of context.
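The principle about deflated measurements can be illustrated with a minimal sketch. The wage and price index numbers below are hypothetical values chosen only to make the arithmetic easy to follow, and the function name deflate is an illustrative choice, not something from Tufte or from this reading.

```python
def deflate(nominal: list[float], price_index: list[float], base: float = 100.0) -> list[float]:
    """Convert nominal amounts into base-period units using a price index."""
    if len(nominal) != len(price_index):
        raise ValueError("nominal and price_index must have the same length")
    return [amount * base / index for amount, index in zip(nominal, price_index)]


# Hypothetical illustration values, NOT real data.
years = [2000, 2005, 2010]
nominal_wage = [30_000.0, 36_000.0, 40_000.0]
price_index = [100.0, 120.0, 125.0]  # base year 2000 = 100

real_wage = deflate(nominal_wage, price_index)

for year, nominal, real in zip(years, nominal_wage, real_wage):
    print(f"{year}: nominal ${nominal:,.0f}, real ${real:,.0f}")
```

In this made-up series the nominal wage rises by a third, but the deflated series shows no gain at all through 2005 and a much smaller gain by 2010, which is the story a graph in nominal units would hide.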