Basics and Beyond: Displaying Your Data

Mario Davidson, PhD
Vanderbilt University School of Medicine
Department of Biostatistics
Instructor
Introduction

Purpose of Visual Displays

Goals


To understand visual displays of data
Structure

Interactive Discussion

Test Your Knowledge

Conclusion
Objectives
1. Understand effective displays
2. Understand how a Table 1 typically looks
3. Be able to interpret basic graphs
4. Know the types of displays that may be used, depending on the type of data
5. Be introduced to less familiar displays of data
Creating Effective Displays (Obj 1)
“One graph is more effective than another if its quantitative information can be decoded more quickly or more easily by most observers.” (Robbins, 2005)
Popular Displays
Description of Table 1 (Obj 2)

Typically summarizes baseline characteristics

Compares statistics: descriptives, confidence intervals, p-values

Summaries of all types of data

Likert scale: a scale indicating degree of agreement (e.g., "Rate the following statement: I have had a difficult time focusing on my studies this semester," answered as SD, D, N, A, or SA, from Strongly Disagree to Strongly Agree)
Example of a Table 1 (Obj 2)
(Ghodasara, et al. 2011)
Pie Charts and Bar Graphs (Obj 3)
• Interpret the following graphs.
• Cherry or apple pies sold the most in January. “Other” pies sold the least.
• Nearly 15 subjects chose Saturday as their favorite day. Sunday was the least chosen.
Pie Charts (Obj 3 & 4)

Features (Obj3)


Categorical Data
Advantages:

Easily Interpreted
• Larger Area; Greater Proportion


Easy to Create
Disadvantages

Difficult to Judge Areas

Wastes Ink
Bar Plots (Obj 3 & 4)
Features (Obj3)

Categorical Data

Advantages

Same as Pie Chart

Disadvantages

Similar to Pie Chart
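
Both pie charts and bar plots start from the same thing: a count for each category. A minimal sketch of that counting step, drawn as a text bar chart, is given here. The favorite-day responses are hypothetical and `text_bar_chart` is an illustrative helper, not part of any package.

```python
from collections import Counter

def text_bar_chart(values):
    """Return one line per category, most frequent first, with a bar of '#' marks."""
    counts = Counter(values)
    width = max(len(str(category)) for category in counts)
    lines = []
    for category, count in counts.most_common():
        lines.append(f"{str(category):<{width}} | {'#' * count} ({count})")
    return lines

# Hypothetical responses to "What is your favorite day?" (illustration only)
favorite_days = ["Sat", "Sat", "Fri", "Sat", "Sun", "Fri", "Sat"]
chart = text_bar_chart(favorite_days)
print("\n".join(chart))
```

Comparing bar lengths on a common baseline is usually easier than comparing pie-slice areas, which is one reason the two displays are not entirely interchangeable.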


Histogram and Dot Plot (Obj 3)
• The most frequent BMI appears to be approximately 24-26.
• There were 8 subjects weighing approximately 0 grams. There was only one weighing 10 grams.
Histograms (Obj 4)
Features
Shows Distribution
Quantitative Data
Advantages
Easy to Interpret
Easy to Produce
Disadvantages
Size of Bins Changes Perception
No Exact Values
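
The point about bin size can be checked with a small sketch. The BMI values below are hypothetical, and `bin_counts` is an illustrative helper that assumes integer data; the same values are binned twice, once with narrow bins and once with wide bins.

```python
from collections import Counter

def bin_counts(values, bin_width, start=0):
    """Count values per bin; each bin covers [lower, lower + bin_width)."""
    counts = Counter()
    for v in values:
        lower = start + ((v - start) // bin_width) * bin_width
        counts[lower] += 1
    return dict(sorted(counts.items()))

# Hypothetical BMI values (illustration only)
bmi = [19, 21, 22, 24, 24, 25, 25, 26, 27, 29, 31, 34]

for width in (2, 8):
    print(f"bin width {width}")
    for lower, n in bin_counts(bmi, width, start=18).items():
        print(f"  {lower:>2} to <{lower + width:<2} | {'#' * n}")
```

With a width of 2 the peak at 24 to 26 is visible; with a width of 8 most of the data falls into a single bar and the peak disappears. In both cases only the counts are shown, so the exact values cannot be read back from the display.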

Dot Plot (Obj 4)

Features



Quantitative Data
Advantages

Small and Moderate Data

Easy to Interpret
Disadvantages

Large Data

Not Produced in All Packages
Stem and Leaf Plot (Obj 4)
Features
Quantitative Data
Advantages
Small and Moderate Data
May be Used with Large Data
Can be Produced by Hand
Easy to Interpret
Disadvantages
May be Difficult to Measure Center
Not Appealing

• The most frequent USMLE1 scores in our data were in the 220s, 230s, and 260s. The lowest and highest scores were 190 and 278, respectively.
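
Because a stem and leaf plot can be produced by hand, it is also easy to sketch in code. The example below assumes three-digit integer scores, uses the tens and hundreds as the stem and the units digit as the leaf, and runs on hypothetical scores (not the data behind the slide).

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted leaves (value % 10)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

# Hypothetical exam scores (illustration only)
scores = [190, 204, 221, 225, 228, 233, 236, 239, 247, 262, 265, 278]
plot = stem_and_leaf(scores)

# Print every stem in range, including empty ones, so gaps stay visible
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>3} | {leaves}")
```

Unlike a histogram, every original value can be recovered from the display: stem 22 with leaves 1, 5, 8 stands for 221, 225, and 228.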
Test Your Knowledge

Why is this graph difficult to interpret?

What is the trend?
Test Your Knowledge

There is no y-axis label (Obj 1)

R is a statistical software package

From Jan-Dec, there is an upward trend (Obj 3)
Line Graph (Obj 4)
Features

Used with Continuous and Categorical Data

Associations, Trends, and Range
Advantages

Produced in Most Packages
Effective Graphs: Critiquing (Obj 1)
• Meltzoff, J. (2010)
Line Graph with Rugplot
Scatterplots (Obj 3)
• Age doesn't seem to influence survival for Grade 4 tumors; however, this appears different for Grade 3 tumors.
Scatterplot (Obj 4)
Features (Obj3)

Quantitative

Associations

Trend

Advantages

All Data

Produced in Most Packages

Exact Values

Easy to Interpret

Disadvantage

Large Data

Test Your Knowledge (Obj 4)
For the following scenarios, which graph(s) (Pie, Bar,
Histogram, Dot Plot, Stem and Leaf, Line, Scatterplot) is
appropriate?
Dr. Jennings is interested in displaying the number of
minutes people exercise/week.

Nurse Zhou wants to display the relationship between
the amount of blood loss by patients and their weight.

Mr. Thompson wants to display the distribution of
BMI.

Test Your Knowledge (Obj 4)
For the following scenarios, which graph(s) (Pie, Bar, Histogram,
Dot Plot, Stem and Leaf, Line, Scatterplot) would be appropriate?
Dr. Jennings is interested in displaying the number of minutes people exercise/week.
Answer: Histogram and Stem and Leaf
Nurse Zhou wants to display the relationship between the amount of blood loss by patients and their weight.
Answer: Scatterplot
Mr. Thompson wants to display the distribution of BMI.
Answer: Trick question
• If BMI is quantitative, a histogram or stem and leaf plot is appropriate; possibly a dot plot.
• If BMI is categorical, a bar graph or pie chart is appropriate.
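
The reasoning in these answers amounts to a lookup from data type to display. A minimal sketch follows; the dictionary only restates the answers above, and the key names (`"categorical"`, `"quantitative"`, `"two quantitative"`) are labels chosen for this example.

```python
# Suggested displays by data type, restating the quiz answers above.
# The key names are labels invented for this sketch.
SUGGESTED_DISPLAYS = {
    "categorical": ["bar graph", "pie chart"],
    "quantitative": ["histogram", "stem and leaf plot", "dot plot"],
    "two quantitative": ["scatterplot"],
}

def suggest_displays(data_type):
    """Return the displays suggested for a data type, or raise ValueError."""
    try:
        return SUGGESTED_DISPLAYS[data_type]
    except KeyError:
        raise ValueError(f"unknown data type: {data_type!r}") from None

# BMI recorded as a number versus BMI recorded as a category
print(suggest_displays("quantitative"))
print(suggest_displays("categorical"))
```

The BMI question is a trick question precisely because the same variable can land under either key, depending on how it was recorded.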

Less Familiar Graphs
(Obj 5)
Boxplot (Obj3)
Some Interpretations

The post-cardiac patients tend to have a higher Overall Competency Score (OCS).
The median OCS for pre-cardiac patients is 70.
The minimum OCS for pre-cardiac patients is 50 and the maximum is about 95.
The lower quartile of post-cardiac patients’ OCS is about 69. The third quartile is about 82.
Boxplot (Obj4)
Features

Categorical and Quantitative Data

Compare Groups

Advantages

Good Summary: Min, 1Q, 2Q (median), 3Q, Max
Disadvantages

Does not Display All the Data

Not as Appealing

Cannot be Created in All Packages

May not be Recognized
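
The five numbers a boxplot summarizes can be computed directly. This is a minimal sketch using Python's standard `statistics` module (Python 3.8 or later) on hypothetical scores. Quartile conventions differ between software packages, so the `"inclusive"` method used here is one reasonable choice, not the only one.

```python
import statistics

def five_number_summary(values):
    """Return min, Q1, median, Q3, and max for a list of numbers."""
    # "inclusive" treats the data as the whole population of interest;
    # other packages may use slightly different quartile rules.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}

# Hypothetical competency scores (illustration only)
ocs = [50, 60, 65, 70, 70, 75, 80, 90, 95]
summary = five_number_summary(ocs)
print(summary)
```

The summary keeps five numbers and discards the rest, which is the "does not display all the data" disadvantage listed above.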

Boxplot Overlaid with Stripchart (Obj4)

Features



Same as Boxplot
Advantages

Same as Boxplot

All of the Data
Disadvantage
Cannot be Created in All Packages
[Figure axis label: Total Quality Improvement Knowledge Assessment Tool Score]
Dot Chart (Obj 3 and 4)



Features

All Types of Data

Can Make Comparisons
Advantages

Easy to Interpret

All Sizes of Data
Disadvantage

May Not be Recognized
Kaplan-Meier Curve (Obj 3)

Demonstrates the probability of survival

The plot suggests that males have a more favorable rate of survival over the years.

Can be created in most programs
[Figure: Kaplan-Meier curve with number-at-risk table]
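
For readers curious about how the survival probability and the number at risk relate, here is a minimal sketch of the standard Kaplan-Meier product-limit estimate. The follow-up times and censoring indicators are hypothetical, and the function is an illustration, not a substitute for a statistical package.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)      # subjects still under observation
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0          # deaths plus censored subjects at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Multiply by the fraction of at-risk subjects surviving past t
            survival *= (at_risk - deaths) / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up data in years (illustration only)
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"year {t}: estimated survival {s:.3f}")
```

The curve only steps down when an event is observed; censored subjects lower the number at risk without changing the estimate at that time.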
Probably Even Less Familiar Graphs
(Obj 5)
Spaghetti Plot (Obj 4)
• The overall trend suggests that as age increases, so do earnings.
Features

Quantitative and Longitudinal
Shows Trend

Advantages

Shows all of the Data

Disadvantages

Not Available in All Packages
May be Difficult to Interpret
[Figure: spaghetti plot of Earnings (thousands) versus Age (years)]
Dendrogram: Cluster (Obj4)

Determine Clusters

Data Reduction

PGY and Clinical Year Clustered
Scatter Plot with Marginal Histograms
(Obj4)

Features

Quantitative

Trends, Associations, and Distributions

Visually appealing

Cannot be created in many programs
Large Data Sets
(Obj 5)
Sunflower Plot (Obj4)
Features

Categorical and Quantitative
Large Data

More Ink; More Dense

More fresh embryos were transferred to the uterus on day 3.
Heat Map with Rugplot

Lightness or Darkness Indicates Intensity

May not be Created in Some Programs
Nomogram

May Provide Risk, Probability, etc.

Predictive Scores

Sum the “Points” for each characteristic, find the “Total Points,” then look at the corresponding “Risk of Death.”

Example: 40 y.o., Male, 200 Cholesterol, and 170 BP gives a 48% Risk of Death
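
The reading procedure (sum the points, then look up the risk) can be sketched in a few lines. Every number in the tables below is HYPOTHETICAL and invented for illustration; the tables are not taken from the nomogram on the slide and will not reproduce its 48% figure. A real nomogram also reads risk off a continuous scale, whereas this sketch uses a simple step lookup.

```python
import bisect

# HYPOTHETICAL point assignments, invented for this sketch only.
POINTS = {
    "age": {40: 20, 50: 40, 60: 60},
    "sex": {"female": 0, "male": 15},
    "cholesterol": {160: 10, 200: 30, 240: 50},
    "bp": {130: 10, 150: 25, 170: 40},
}

# HYPOTHETICAL (minimum total points, risk) pairs, sorted by points.
RISK_BY_TOTAL_POINTS = [(0, 0.05), (50, 0.20), (100, 0.50), (150, 0.80)]

def total_points(patient):
    """Sum the points for each characteristic of the patient."""
    return sum(POINTS[name][value] for name, value in patient.items())

def risk_for(total):
    """Return the risk for the highest threshold not exceeding the total."""
    thresholds = [points for points, _ in RISK_BY_TOTAL_POINTS]
    index = max(bisect.bisect_right(thresholds, total) - 1, 0)
    return RISK_BY_TOTAL_POINTS[index][1]

patient = {"age": 40, "sex": "male", "cholesterol": 200, "bp": 170}
total = total_points(patient)
risk = risk_for(total)
print(f"total points: {total}, hypothetical risk: {risk:.0%}")
```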
Creating Effective Displays
(Robbins 2005)

Shades of Gray May be Indistinguishable when Copied

Proofread the Displays

Display Should be Consistent with Text

Draw to Scale
Conclusion

Choose Best Display

Consider the Type of Data

Effective Graphs

Consider your target audience

Color may cost
References
• Hamid, et al. (2010). BMC Infectious Diseases, 10:364. http://www.biomedcentral.com/14712334/10/364
• Grober, E, Hall, CB, Lipton, RB, Zonderman, AB, Resnick, SM, and Kawas, C (2009). Memory impairment, executive dysfunction, and intellectual decline in preclinical Alzheimer's disease. Journal of the International Neuropsychological Society, 14(2), 266-278.
• Ghodasara, SL, Davidson, MA, Reich, MS, Savoie, CV, Rodgers, SM (2011). Assessing student mental health at the Vanderbilt University School of Medicine. Academic Medicine, 86, 116-121.
• Meltzoff, J (2010). Critical thinking about research. American Psychological Association; Washington, D.C.
• Robbins, N (2005). Creating more effective graphs. Wiley; Hoboken, NJ.
• http://data.vanderbilt.edu/rapache/bbplot/