sorted bar plot with 45 degree labels step by step REV 2016-08

DAI 523 Information Design 1: Data Visualization | Trogu | Fall 2016
sorted bar plot with 45 degree labels – step by step
Wednesday, August 31, 2016
Sorted bar plot with 45 degree labels in R
In this exercise we’ll plot a bar graph, sort it in decreasing order (big to small from left to right) and place
long labels under the bars. The labels will be at a 45 degree angle so that they can fit and still be
readable. Note that in Illustrator you can quickly do this with a rotated text box and another box that wraps
the text and forces the labels to align with the bars at the base of the graph.
Thanks to Gabriel Bentley and Maggie Lee (TAs) who researched the code. The database file was
originally used by student Michelle Boccia.
Download the data set here. It lists increases in tuition at various state universities from 2010-11 to
2011-12. In the exercise, the percent increase will be plotted. The data set file (CSV) is called
stateU1011.csv. Below is the way the text file looks and the way it will look in R.
The data has been cleaned and there are no spaces or special characters in the file name or header
names. This matters because R alters such names when importing: if there is a dash in a header name,
R will change it into a period, and if a header name starts with a number, R will add an X in front
of the name.
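To see this renaming without importing anything, here is a minimal sketch. The header names are hypothetical (they are not in stateU1011.csv), and it assumes the default import settings, where read.csv cleans header names the same way the make.names function does.

```r
# Hypothetical header names, NOT taken from stateU1011.csv
rawNames <- c("percent-increase", "2011tuition", "campus")

# make.names() turns any text into a valid R name; read.csv() cleans
# headers this way by default (check.names = TRUE)
cleanNames <- make.names(rawNames)
print(cleanNames)   # "percent.increase" "X2011tuition" "campus"
```

Names that are already clean, like campus, pass through unchanged.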
Please note that not all data sets are ideally plotted as a bar chart. I believe that bar plots are best used
when the X axis (horizontal) is used for categories (universities, states, etc) rather than dates. When a
time series needs to be plotted (years etc. on the X axis) then a line graph is sufficient. Also, plotting
percentages as bars usually is great for comparison between the items, but the relation to the whole
(100%) is usually trimmed at the top and that can skew the perception of the graph. Just beware of it.
The final code can be found at the end of this document.
Import the dataset stateU1011.csv into RStudio (header: yes, comma separated: yes) and plot. As a rule,
type your code in the R script window (upper left) and run it with the Run button in the upper right of
that window. If necessary, select only the code you would like to run, then run. In the plot matrix (the
first plot below), identify which data columns from the data set you are going to visualize. Your choices
are the labels in the boxes along the diagonal of the matrix. For each plot, look up or down to identify
the X axis, and look sideways to identify the Y axis. For this step, refer also to Chapter 6 in the
textbook, and especially my annotated pages 188-189. In this example we will pick campus and
percentIncrease.
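If you prefer to import with code instead of the import dialog, the sketch below shows one way. It is only an illustration: the three rows are invented placeholders (not the real tuition data), toyStateU.csv is a made-up file name, and the column names are the ones used in this exercise.

```r
# Placeholder rows, invented for illustration only (not the real data)
toy <- data.frame(
  campus = c("Campus A", "Campus B", "Campus C"),
  ay2011 = c(6000, 7500, 5200),
  percentIncrease = c(12, 5, 9)
)

# Write the toy table to a temporary CSV so the example is self-contained
csvPath <- file.path(tempdir(), "toyStateU.csv")
write.csv(toy, csvPath, row.names = FALSE)

# Code equivalent of the import dialog: header row yes, comma separated yes
stateU <- read.csv(csvPath, header = TRUE, sep = ",")
str(stateU)   # lists the columns and their types
```

With the real file you would pass the path to stateU1011.csv to read.csv and skip the toy table.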
Plot matrix of possible bivariate combinations
plot(stateU1011)
Plot two columns against each other to check the graph (the code below uses ay2011 and percentIncrease).
By default R will plot little dots.
plot(stateU1011$ay2011, stateU1011$percentIncrease)
Plot percentIncrease using the barplot command (note that only one data column is needed to plot the
graph). Bars follow the row order of the data set, which is alphabetical by campus (the names of the
universities). It looks cool but it’s difficult to compare each university with the others. Sorting the
bars will look less cool but it will be much more informative.
barplot(stateU1011$percentIncrease)
Below, add the campus name labels using names.arg. Notice that only a few labels are displayed, simply
because there is not enough room for all the labels to show up.
barplot(stateU1011$percentIncrease, names.arg=stateU1011$campus)
Next, we’ll sort the bars. In order to do this, we’ll create an object in R where the data will be sorted by the
increase amount.
sortedTable <- stateU1011[order(stateU1011$percentIncrease), ]
midpts <- barplot(sortedTable$percentIncrease, 1, names.arg="")
See result below and sortedTable object in following picture.
Select the sortedTable object (right window) to display this new virtual data set (sorted by
percentIncrease). Note that by default R sorted the data in increasing order (small to big).
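To see what order is doing inside the square brackets, here is a small sketch with made-up numbers (not the real percent increases). The order function does not return the sorted values themselves. It returns the positions of the values, smallest first, and those positions are then used to pick the rows.

```r
increase <- c(12, 5, 9)      # made-up percent increases

idx <- order(increase)       # positions of the values, smallest first
print(idx)                   # 2 3 1
print(increase[idx])         # 5 9 12

# decreasing = TRUE reverses the direction (big to small)
idxDown <- order(increase, decreasing = TRUE)
print(increase[idxDown])     # 12 9 5
```

In the tutorial code the positions go before the comma, stateU1011[order(...), ], so whole rows are reordered and each campus name stays with its value.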
Now labels for the university names will be added at a 45 degree angle (run all three lines at once or
individually). If desired, experiment with the values for the text.
sortedTable <- stateU1011[order(stateU1011$percentIncrease), ]
midpts <- barplot(sortedTable$percentIncrease, 1, names.arg="")
text(x=midpts+.5, y=-1, sortedTable$campus, cex=0.5, srt=45, xpd=TRUE, pos=2)
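x tells R where the labels should be positioned horizontally. barplot returns the position of the middle
of each bar, which we saved in the midpts object (see data set window). Adding .5 shifts the labels
slightly to the right.
y sets the vertical position of the labels, in the units of the Y axis (negative values fall below the
base of the bars). Play around with this value: if it looks like nothing happened but you don’t get an
error, it probably means the labels are rendering off screen, outside the window. Change the value until
the labels appear.
sortedTable$campus supplies the names of the campuses, in the new sorted order by percent increase.
cex sets the type size of the labels.
srt sets the angle of the label, in this case 45 degrees.
xpd controls clipping: TRUE lets R draw outside the plot area, in the margin where the labels sit. If you
write FALSE the labels won’t appear.
pos sets where the label sits relative to its x, y point: 1 is below, 2 is to the left, 3 is above and 4
is to the right. With 2 the text ends next to the point, which works like flush right alignment. Try
different numbers for fun.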
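The settings above can be tried on a tiny example before running them on the full data set. This is a sketch only: the values and campus names are made up, and the midpoints shown in the comment are what barplot gives for three bars of width 1 with its default spacing.

```r
# Made-up values and names, for illustration only
increase <- c(12, 9, 5)
campus <- c("Campus A", "Campus B", "Campus C")

# barplot() draws the bars and returns the x position of each bar's middle
midpts <- barplot(increase, 1, names.arg = "")
print(as.numeric(midpts))   # 0.7 1.9 3.1

# Place one label per bar: shifted right by .5, one unit below the baseline,
# rotated 45 degrees, allowed to draw in the margin (xpd), text to the left
# of each point (pos = 2)
text(x = midpts + 0.5, y = -1, labels = campus,
     cex = 0.75, srt = 45, xpd = TRUE, pos = 2)
```

Changing one value at a time (for example y = -2 or pos = 4) and running again is a quick way to see what each setting does.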
Next, we’ll reverse the sorting to the more traditional large to small, left to right. The code is the
same as before, with the extra decreasing = TRUE part and a bigger type size (cex). Run all at once
again.
sortedTable <- stateU1011[order(stateU1011$percentIncrease, decreasing = TRUE), ]
midpts <- barplot(sortedTable$percentIncrease, 1, names.arg="")
text(x=midpts+.5, y=-1, sortedTable$campus, cex=0.75, srt=45, xpd=TRUE, pos=2)
Note that the labels are still cut off at the bottom of the window. Don’t worry: after exporting the plot
to PDF and opening the file in Illustrator the labels will display correctly, just make the artboard
bigger to make them fit. (See the picture in the export step below.)
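If you would rather see the full labels in the plot window, one option (not part of the original exercise) is to enlarge the bottom margin before plotting with par(mar = ...). The sketch below uses made-up values and names, and the margin size of 10 lines is only a starting guess to adjust.

```r
# Made-up values and long names, for illustration only
increase <- c(12, 9, 5)
campus <- c("Long Campus Name A", "Long Campus Name B", "Long Campus Name C")

# Margins are given in lines of text: bottom, left, top, right.
# par() returns the previous settings so they can be restored afterwards.
oldPar <- par(mar = c(10, 4, 2, 2))

midpts <- barplot(increase, 1, names.arg = "")
text(x = midpts + 0.5, y = -1, labels = campus,
     cex = 0.75, srt = 45, xpd = TRUE, pos = 2)

par(oldPar)   # put the margins back to what they were
```

The Illustrator route described above still works either way. The bigger margin only makes the labels visible sooner.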
Export graph to PDF. Note that the slanted labels may appear truncated below the edge of the graph in the
exported PDF. That’s OK, they are still there. You will see them when you open the PDF in Illustrator;
enlarge the document artboard as needed.
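The export can also be done in code with R’s pdf device instead of the Export button. This is a sketch under assumptions: the values and names are made up, sortedBars.pdf is a hypothetical file name, and the page size of 8 by 6 inches is arbitrary.

```r
# Made-up values and names, for illustration only
increase <- c(12, 9, 5)
campus <- c("Campus A", "Campus B", "Campus C")

# Hypothetical output file, saved in R's temporary folder
pdfPath <- file.path(tempdir(), "sortedBars.pdf")

pdf(pdfPath, width = 8, height = 6)   # open the PDF file, size in inches
midpts <- barplot(increase, 1, names.arg = "")
text(x = midpts + 0.5, y = -1, labels = campus,
     cex = 0.75, srt = 45, xpd = TRUE, pos = 2)
dev.off()                             # close the file so it can be opened

print(pdfPath)                        # shows where the PDF was saved
```

Everything drawn between pdf() and dev.off() goes into the file instead of the plot window.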
After opening the file in Illustrator you may need to
release the clipping mask:
Select all > Object > Clipping Mask > Release
Also: Compound Path > Release.
Remove any unwanted boxes.
When editing objects (rectangles etc.) remember that each object is split into two separate objects: fill
and border. Normally border and fill are separate attributes of the same object; the split is a quirk of
the R to PDF export.
If you want to change the spacing of the labels, use Align to distribute them with equal spacing. Or put
all text in one continuous text box, rotate it, place an object with text wrap on top, and use leading
(line spacing) to space the labels.
For more information:
How can I sort my data in R?
http://bit.ly/dxWybg
How to display all x labels in R barplot?
http://bit.ly/1fkfVhu
Final code, also available here:
# plot matrix of possible bivariate combinations
plot(stateU1011)
# plot two columns to check graph (here ay2011 and percentIncrease)
# by default R will plot little dots.
plot(stateU1011$ay2011, stateU1011$percentIncrease)
# plot percentIncrease using the barplot command
# (note that only one data column is needed to plot the graph).
barplot(stateU1011$percentIncrease)
# add the campus name labels using names.arg. Notice that only a few labels are
# displayed, simply because there is not enough room for all the labels to show up.
barplot(stateU1011$percentIncrease, names.arg=stateU1011$campus)
# now sort the bars by size. In order to do this, we’ll create an object in R
# where the data will be sorted by the increase amount.
sortedTable <- stateU1011[order(stateU1011$percentIncrease), ]
midpts <- barplot(sortedTable$percentIncrease, 1, names.arg="")
# now labels for the university names will be added at a 45 degree angle
# (run all three lines at once or individually).
# If desired, experiment with the values for the text.
sortedTable <- stateU1011[order(stateU1011$percentIncrease), ]
midpts <- barplot(sortedTable$percentIncrease, 1, names.arg="")
text(x=midpts+.5, y=-1, sortedTable$campus, cex=0.5, srt=45, xpd=TRUE, pos=2)
# reverse the sorting (decreasing = TRUE) from large to small,
# left to right. Run all at once again. Labels are bigger.
sortedTable <- stateU1011[order(stateU1011$percentIncrease, decreasing = TRUE), ]
midpts <- barplot(sortedTable$percentIncrease, 1, names.arg="")
text(x=midpts+.5, y=-1, sortedTable$campus, cex=0.75, srt=45, xpd=TRUE, pos=2)
# export graph to PDF. note that the slanted labels may appear truncated below
# the edge of the graph in the exported PDF. That’s OK, they are still there.
# You will see them when you open the PDF in Illustrator — enlarge the document
# art-board as needed.