how to speak ggplot2 like a native
DC R Meetup
Predictive Analytics World
October 19th, 2010
Harlan D. Harris, PhD

ggplot's philosophy
Graphics are (should be!) created by combining a specification with data (Wilkinson, 2005).
The specification is not the name of the visual form (bar graph, scatterplot, histogram).
The specification is a collection of rules that together describe how to build a graph: a Grammar of Graphics.

graphics as grammar
[Figure: a data table with columns date, ct, sz and z is combined with the specification x=date, y=ct/sz, bars, group by z to produce a bar chart.]

advantages
Flexible: can define new graph types by changing specifications; can combine many forms into single graphs.
Smart: compact (rules have useful defaults); graphs always have meaning.
Reusable: can plug new data into an old specification; can explore many types of plots from one set of data.

ggplot2
Hadley Wickham (Rice Univ.), who also wrote reshape2, plyr, etc.
Extends and implements The Grammar of Graphics (Wilkinson, 1999, 2005).
Focus on layers; based on grid.
Specification as R objects constructed by functions.
Large library of components with good defaults.
ggplot2: Elegant Graphics for Data Analysis (Wickham, 2009).

my gripes
The specification is a hierarchical structure, the grammar is a left-to-right R expression, and the graph is spatial.
You can't see the structure (usefully).
Abuses both notation and R semantics.
Deep Magic with lazy evaluation and proto objects.
Existing tutorials lead to conceptual confusion and require relearning the fundamentals.
Start with the structure, not with the shortcuts.

goal

data to plot

ggplot likes "long" data
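ggplot maps each aesthetic to a single column, so a condition that is spread across several column names has to become a column of its own before it can be mapped to color. Below is a minimal sketch of that reshaping, assuming a hypothetical wide table d.wide with made-up numbers. It uses base R's reshape() so it runs without extra packages, though reshape2::melt() or tidyr::pivot_longer() are the more usual tools. The column names Parameter, Errors and Condition mirror those of d.long.EI used later in the talk.

```r
# Hypothetical wide data: one row per Parameter value, one column per
# Condition (made-up numbers, for illustration only).
d.wide <- data.frame(
  Parameter = 1:3,
  Model     = c(10, 8, 5),
  Empirical = c(12, 9, 6)
)

# Stack the per-condition columns into a single Errors column, and record
# in a new Condition column which original column each value came from.
d.long <- reshape(
  d.wide,
  direction = "long",
  varying   = c("Model", "Empirical"),
  v.names   = "Errors",
  timevar   = "Condition",
  times     = c("Model", "Empirical"),
  idvar     = "Parameter"
)
rownames(d.long) <- NULL  # drop the "1.Model"-style row names reshape() adds

print(d.long)
```

The result has one row per (Parameter, Condition) pair, so it could then be plotted with ggplot(d.long, aes(x = Parameter, y = Errors, color = Condition)) + geom_line().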
will plot model vs. empirical

simplest plot
aes = "aesthetics" = "create mapping"

you don't need to know this! structure
p <- ggplot(data=d.long.EI, mapping=aes(x=Parameter, y=Errors, color=Condition)) + layer(geom="line", stat="identity", position="identity")
[Diagram: the ggplot object holds data (a copy), layers, a mapping (x=Param., y=Errs, color=Cond.), scales, coords, facets and options. layer[1] holds its own data, mapping, geom (line), stat (identity), geom_params and stat_params.]
Inspect the object with structure(p) or str(p).

add empirical data and chance

you don't need to know this! structure so far
[Diagram: the same ggplot object now holds several layers. Recoverable labels: layer data (U) and (K); geoms line, point and hline; stats identity and hline; mappings yint=Errs and yint=[64]; geom_params size=3, size=2, and color="black", linetype=2, size=.5.]

scales

coordinates & scales
Coordinates affect the display of axes: cartesian, polar, map, etc.
Scales affect the data mapping: colors, shapes, lines.
A source of confusion:
set axis ticks/breaks and labels with scale_x_continuous() or scale_y_discrete(), but
restrict the DATA range with scale_*(limits=c(1,10)) (values outside the limits are dropped), and
restrict the AXIS (plotted) range with coord_cartesian(xlim=c(1,10)).

options

shortcuts
All those layer() calls are tedious!
geom_*() creates a layer with a specific geom (and various defaults, including a stat).
stat_*() creates a layer with a specific stat (and various defaults, including a geom).
qplot() creates a ggplot and a layer.
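The difference between scale limits and coord_cartesian() limits is easiest to see through a stat. Below is a minimal sketch, assuming ggplot2 3.3 or later (for the fun = argument of stat_summary()) and a hypothetical four-point data set with one outlier. layer_data() returns what a layer computed, without drawing anything.

```r
library(ggplot2)

# Hypothetical data: a single group whose values include an outlier (10).
d <- data.frame(x = "a", y = c(1, 2, 3, 10))

# Base specification: plot the mean of y for each x.
base <- ggplot(d, aes(x = x, y = y)) +
  stat_summary(fun = mean, geom = "point")

# 1. Scale limits: y = 10 lies outside c(0, 5), so it is set to NA and
#    removed (with a warning) BEFORE the mean is computed.
p.scale <- base + scale_y_continuous(limits = c(0, 5))

# 2. Coordinate limits: all four values reach the stat; the panel is only
#    zoomed to 0..5 when it is drawn.
p.coord <- base + coord_cartesian(ylim = c(0, 5))

# layer_data() builds the plot and returns the computed data for one layer.
mean.scale <- layer_data(p.scale, 1)$y
mean.coord <- layer_data(p.coord, 1)$y
print(c(scale_limits = mean.scale, coord_limits = mean.coord))
```

The first build warns about a removed row; that dropped row is the point. With scale limits the plotted mean is 2 (the mean of 1, 2, 3), while with coord_cartesian() it stays 4 (the mean of all four values).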
quick note on stats
stat="identity": plot the data unchanged
stat="smooth": fit y=f(x) with loess(), or with lm() via method="lm"; generate new data to be plotted by geom_line(), with CIs drawn by geom_ribbon()
stat="summary": y=f(x) with an arbitrary f()
stat="bin": histograms

simplest faceted plot

everything else (+alpha)

other things I find useful
scale_x_continuous(breaks=seq(1,9,2), labels=c("one", "", "five", "", "nine"))
geom_text(aes(x=.., y=.., label=..))
annotate(geom="text", x=14, y=19, label="outlier!")
geom_density()
stat_summary(fun.data="mean_cl_boot", geom="crossbar")
geom_jitter(position=position_jitter(width=.5))

"fizzy bubbly" plot
library(ggplot2)
library(ggplot2movies)  # since ggplot2 2.0 the movies data lives in this package
rated.movies <- subset(movies, mpaa != "")  # keep only MPAA-rated films
rated.movies$mpaa <- factor(rated.movies$mpaa)  # only the remaining ratings become levels
p <- ggplot(rated.movies, aes(mpaa, rating)) +
  geom_jitter(alpha=.5) +
  stat_summary(fun.data="mean_sdl", geom="crossbar", color="red", size=1)  # mean_sdl needs the Hmisc package
ggsave("movies.png", p, dpi=150)

takehomes
A ggplot graph is generated by a specification plus data.
ggplot specifications are a core object plus layers.
Mappings among data, x/y, scales, and other attributes are fundamental.
geom and stat shortcuts allow smart, compact construction of graphs.
ggplot encourages good graphs, with facets, good use of color, and minimal chartjunk.

2010 case study competition winner

resources
Wickham, H. (2009). ggplot2: Elegant Graphics for Data Analysis. Springer.
http://had.co.nz/ggplot2/
http://groups.google.com/group/ggplot2
http://stackoverflow.com/questions/tagged/r
http://github.com/hadley/ggplot2/wiki

thanks!
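stat="summary" accepts an arbitrary summary function, which is also how fun.data="mean_sdl" works in the fizzy bubbly plot. Below is a sketch assuming a recent ggplot2. mean_pm_sd is a hypothetical stand-in for mean_sdl: the mean plus or minus mult standard deviations, with mult = 2 as in mean_sdl's default. It is written in plain R so it does not need Hmisc, and the data are made up.

```r
library(ggplot2)

# Hypothetical summary function in the spirit of mean_sdl. A fun.data
# function receives one group's y values and must return a data frame
# with columns y, ymin and ymax.
mean_pm_sd <- function(v, mult = 2) {
  m <- mean(v)
  s <- sd(v)
  data.frame(y = m, ymin = m - mult * s, ymax = m + mult * s)
}

# Hypothetical data: two groups, three observations each.
d <- data.frame(g = rep(c("a", "b"), each = 3), v = c(1, 2, 3, 4, 6, 8))

p <- ggplot(d, aes(x = g, y = v)) +
  geom_jitter(width = 0.1, alpha = 0.5) +           # layer 1: raw points
  stat_summary(fun.data = mean_pm_sd,               # layer 2: summary
               geom = "crossbar", color = "red")

# Inspect what the summary stat computed for each group.
summ <- layer_data(p, 2)
print(summ[, c("x", "y", "ymin", "ymax")])
```

Group a (1, 2, 3) has mean 2 and SD 1, so its crossbar spans 0 to 4. Group b (4, 6, 8) has mean 6 and SD 2, so its crossbar spans 2 to 10.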