Visualization of Public Health Data in Maps: The Potential for Misinterpretation

Eric Hanson • Minnesota Environmental Public Health Tracking • Minnesota Department of Health
Introduction
Maps of public health data can be an informative way to view patterns and trends, to explore and
analyze the data, and to convey information quickly. Maps can also cause the data to be
misinterpreted. An inappropriate classification method for a choropleth map, unexamined boundary
changes from year to year, and patterns masked by aggregation can each undermine an accurate
cartographic display and analysis of the data. These three potential problems are examined here.

Spatial Aggregation and Scale
The ability to identify geographic patterns, or to access data at a finer geographic scale, often
depends on data availability and confidentiality. Viewing data in large geographic units such as
counties, rather than smaller units such as census tracts, reduces the ability to find patterns or
see trends in the data. Ideally, data are viewed at smaller geographic units such as census tracts,
but such data are often unavailable, either because small numbers raise confidentiality concerns or
because the data were not collected at that level of geography.
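
The masking effect described above can be illustrated with a small sketch. The tract names, case counts and populations below are invented for illustration and are not taken from the poster; the point is only that a single aggregated rate can hide a local cluster that is visible at the smaller unit.

```python
# Hypothetical census tracts within one county: (cases, population).
# All numbers are invented for illustration only.
tracts = {
    "Tract 1": (2, 4000),
    "Tract 2": (3, 5000),
    "Tract 3": (24, 3000),  # a local cluster
    "Tract 4": (1, 4000),
}


def rate_per_1000(cases, population):
    """Crude rate per 1,000 population."""
    return 1000 * cases / population


# Rates at the smaller geographic unit.
tract_rates = {name: rate_per_1000(c, p) for name, (c, p) in tracts.items()}

# Rate after aggregating every tract into the single larger unit.
total_cases = sum(c for c, _ in tracts.values())
total_pop = sum(p for _, p in tracts.values())
county_rate = rate_per_1000(total_cases, total_pop)

for name, rate in tract_rates.items():
    print(f"{name}: {rate:.2f} per 1,000")
print(f"County: {county_rate:.2f} per 1,000")
```

With these made-up values, the county rate sits well below the rate in the one high tract, so a county-level map would show a single moderate value where the tract-level map shows one clear outlier.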
In the example below, disease rates are mapped for block groups, census tracts, zip codes and
county. Patterns in the disease data are easiest to identify, and the maps are most informative,
with the smaller geographic units of block groups and census tracts. As the size of the units
increases to zip code and county, it becomes harder to identify patterns, to make precise
comparisons between geographic units and to access detailed information.

[Figure: disease rates mapped by Block Groups, Census Tracts, Zip Codes and County. Figure labels:
Confidentiality, Data availability, Pattern identification, Increased information.]

Choropleth Map Classification Methods
A choropleth map is a type of thematic map in which geographic areas are organized into classes
based on values. There are a number of different methods for grouping values into classes. The most
common are equal interval, quantile, natural breaks and standard deviation. Depending on the data
distribution, these methods can produce noticeably different maps from the same data.

In the example maps, all four maps use the same data set but show different results. The
differences between the equal interval and quantile maps for this data set could lead to different
analyses and interpretations, and to misinterpretation: areas with relatively low values may be
displayed in a high value class, or vice versa.

The best choice of classification method often depends on the data being displayed and the purpose
of the map. Examining the data distribution and how each method categorizes the data should be an
essential step in this process.

[Figure: choropleth maps of the same data set under each classification method, each with a
histogram of the data distribution marked with its class breaks.]
Equal Interval: value ranges divided equally. Classes: 0.00 - 0.66, 0.67 - 1.32, 1.33 - 1.98,
1.99 - 2.64, 2.65 - 3.30.
Quantile: each class contains an equal number of features. Classes: 0.00 - 0.20, 0.21 - 0.38,
0.39 - 0.68, 0.69 - 0.96, 0.97 - 3.30.
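
As a rough illustration of why the classification method matters, the sketch below computes equal interval and quantile class breaks for the same skewed data set. The values in `rates` are invented and are not the poster's data, and the helper functions are hypothetical names written for this example rather than part of any GIS package.

```python
from bisect import bisect_left


def equal_interval_breaks(values, k):
    """Upper bounds of k classes of equal width spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # Use hi itself as the last bound to avoid floating-point drift.
    return [lo + width * i for i in range(1, k)] + [hi]


def quantile_breaks(values, k):
    """Upper bounds of k classes holding (nearly) equal numbers of values.

    Assumes no tied values; ties can make class sizes unequal.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Class i ends at the value of rank ceil(n * i / k).
    return [ordered[(n * i + k - 1) // k - 1] for i in range(1, k + 1)]


def classify(value, breaks):
    """0-based index of the first class whose upper bound is >= value."""
    return bisect_left(breaks, value)


# Hypothetical, right-skewed rates (invented for illustration only).
rates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0, 1.5, 3.0]

ei_breaks = equal_interval_breaks(rates, 5)
q_breaks = quantile_breaks(rates, 5)

ei_classes = [classify(r, ei_breaks) for r in rates]
q_classes = [classify(r, q_breaks) for r in rates]

for r, ei, q in zip(rates, ei_classes, q_classes):
    print(f"rate {r:.1f}: equal interval class {ei + 1}, quantile class {q + 1}")
```

With these invented values, the rate 1.5 lands in the third of five equal interval classes but in the highest quantile class, which is the kind of shift between maps that the text warns about. Natural breaks and standard deviation classes are not sketched here.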
Natural Breaks: classes divided based on natural groupings in the data. Classes: 0.00 - 0.24,
0.25 - 0.59, 0.60 - 1.07, 1.08 - 2.31, 2.32 - 3.30.
Standard Deviation: classes based on standard deviations from the mean. Classes: < -0.50,
-0.50 - 0.50, 0.50 - 1.5, 1.5 - 2.5, > 2.5 Std. Dev.

Geographic Boundary Change
Another source of potential misinterpretation occurs when geographic boundaries change. When
viewing trends in data, if geographic units have changed their boundaries, comparisons of those
units between time periods can be inaccurate. The example below shows how an alteration of the
border between two geographic units can produce very different results.

From Year X to Year Y, the geographic boundaries of Units A and B change. In Year Y, Unit B loses
area to Unit A.

Disease counts are represented by points on the map. If the geographic locations of the disease
data are similar in Year X and Year Y but the geographic boundaries change, comparing Year X to
Year Y could lead to a mistaken analysis. If Unit A has 6 counts of disease in Year X and 15 counts
of disease in Year Y, it may appear that disease counts in Unit A increased sharply from Year X to
Year Y when, in fact, the increase was caused by a geographic border change.

Mapping disease counts in a choropleth map for both years will show an increase in disease counts
for Unit A in Year Y. The apparent increase in disease in Unit A in Year Y is simply the result of
a boundary change.

[Figure: point maps and choropleth maps of disease counts for Units A through H in Year X and
Year Y, with bar charts of disease counts by unit. Choropleth classes: 2-3, 4-6, 7-9, 10 - 17.]

Conclusion
Visualizing public health data on maps can be an important means to examine and analyze data. Maps
can also cause misinterpretation of the data if best practices are not applied. Best practices
include having a firm understanding of the data, of the purpose of mapping it and of which
classification method is appropriate for a choropleth map; being aware of boundary changes that
could alter comparisons; and, when the data are available and it suits the purpose of the map,
mapping at the smallest geographic unit possible.