
The Pennsylvania State University
The Graduate School
Department of Geography, College of Earth and Mineral Sciences
SCALE-SPECIFIC AUTOMATED MAP LINE SIMPLIFICATION BY
VERTEX CLUSTERING ON A HEXAGONAL TESSELLATION
A Thesis in
Geography
by
Paulo Raposo
© 2011 Paulo Raposo
Submitted in Partial Fulfillment
of the Requirements
for the Degree of
Master of Science
August 2011
The thesis of Paulo Raposo was reviewed and approved* by the following:
Cynthia A. Brewer
Professor of Geography
Thesis Advisor
Donna Peuquet
Professor of Geography
Karl Zimmerer
Professor of Geography
Head of the Department of Geography
*Signatures are on file in the Graduate School
ABSTRACT
Despite decades of research, effective automated line simplification methods retaining
cartographic nuance remain elusive. Solutions to this problem have also become increasingly
important with the development and prevalence of web cartography. Most existing algorithms
involve parameters selected arbitrarily or heuristically, with little reference to the scale change
between the original data and the desired generalized representation. This research presents the
hexagonal quantization algorithm, a new method by which regular hexagonal tessellations of
variable scale are used to sample cartographic lines for simplification. Hexagon width, reflecting
sampling fidelity, is varied in proportion to target scale, thereby allowing for cartographic scale-specificity. Tesserae then constitute loci among which a new set of vertices can be defined by
vertex clustering quantization, and this set of vertices is used to compose a generalized correlate
of the input line which is appropriate for its intended mapping scale. Hexagon scaling is
informed by sampling theory, drawn both from the field of geography and from cognate signal-processing fields such as computer science and engineering.
The present study also compares the hexagonal quantization algorithm to the Li-Openshaw raster-vector algorithm, which undertakes a similar process using traditional square
raster cells. The lines produced by either algorithm using the same tessera width are objectively
compared for fidelity to the original line in two ways: spatial displacement from the input line is
measured using Hausdorff distances, and the product lines are presented against their input lines
for visual inspection.
Results show that hexagonal quantization offers appreciable advantages over the square
tessellations of traditional raster cells for vertex clustering line simplification in that product lines
are less displaced from input lines. It is found that product lines from the hexagonal quantization
algorithm generally maintained shorter Hausdorff distances than did those from the Li-Openshaw
raster-vector algorithm. Also, visual inspection suggests lines produced by the hexagonal
quantization algorithm retain informative geographical shapes for greater differences in scale than
do those produced by the Li-Openshaw raster-vector algorithm. Results of this study yield a
scale-specific cartographic line simplification algorithm that is readily applicable to cartographic
linework.
TABLE OF CONTENTS
List of Figures .......................................................................................................................... vii
List of Tables ........................................................................................................................... x
Acknowledgements .................................................................................................................. xi
Chapter 1 Introduction ............................................................................................................. 1
Lines on Maps .................................................................................................................. 1
Unique Contributions ............................................................................................... 5
Thesis Structure................................................................................................................ 6
Chapter 2 Line Simplification Literature ................................................................................. 7
Generalization .................................................................................................................. 8
Line Simplification .......................................................................................................... 11
Characteristic Points................................................................................................. 12
Segmentation and Strip Trees .................................................................................. 14
Point Reduction vs. Line Redefinition ..................................................................... 15
Constraints and Scale-Specificity............................................................................. 16
Classes of Line Simplification Algorithms .............................................................. 18
Survey of Cartographic Algorithms ................................................................................. 20
Algorithms Popular in Cartography ......................................................................... 20
Outside of Cartography: Vertex Clustering and Mesh Simplification ..................... 30
Hexagonal and Square Tessellations Applied to Pattern Analysis and
Generalization .................................................................................................. 33
Hausdorff Distance .......................................................................................................... 38
The Hausdorff Distance vs. McMaster's Measures of Simplified Lines .................. 41
Summary .......................................................................................................................... 42
Chapter 3 The Hexagonal Quantization Algorithm and Study Methods ................................. 44
Overview of the Algorithm .............................................................................................. 44
Tessellation and Polyline Structure .......................................................................... 47
Steps of the Hexagonal Quantization Algorithm ............................................................. 49
Calculation of Tessellation Resolution..................................................................... 49
Tessellation Layout .................................................................................................. 50
Vertex Clustering and Quantization ......................................................................... 51
Clustering Routine Compared to Li & Openshaw’s Suggestion .................................. 53
Implementation ................................................................................................................ 56
Sample Lines ............................................................................................................ 58
Experiment Design and Statistical Comparison Between Hexagonal and Square
Outputs ..................................................................................................................... 60
Experimental Design ................................................................................................ 61
Chapter 4 Results and Interpretations ...................................................................................... 64
Resulting Line Simplifications: Visual Presentation ....................................................... 64
Statistical Results ............................................................................................................. 76
Interpretations .................................................................................................................. 85
Discussion of Cartographic Results ......................................................................... 85
Discussion of Statistical Results .............................................................................. 88
Summary .......................................................................................................................... 92
Chapter 5 Conclusions and Future Work ................................................................................. 94
Relative Magnitude of Improvement ....................................................................... 95
Future Tessellation Variations ................................................................................. 96
Repair of Line Self-Crossings .................................................................................. 98
General Summary ............................................................................................................ 99
Appendix A Summary Table of All Sample Lines ................................................................. 100
Appendix B Example Text Report from Software.................................................................. 105
References ................................................................................................................................ 106
LIST OF FIGURES
Figure 2.1 - Attneave's sleeping cat. (Source: Attneave, 1954, p. 185) ................................... 13
Figure 2.2 - Perkal’s method at three different values of ε. Hatched areas are inaccessible
to the roulette, and therefore dropped from the lake form. (Source: Perkal, 1965, p.
65) .................................................................................................................................... 25
Figure 2.3 - The Douglas-Peucker algorithm. (Source: McMaster & Shea, 1992, pp. 80-81) .................................................................................................................................... 27
Figure 2.4 - The Visvalingam-Whyatt algorithm. (Source: Visvalingam & Whyatt, 1993,
p. 47) ................................................................................................................................ 28
Figure 2.5 - The Li-Openshaw raster-vector algorithm. The sinuous gray line represents
the input line, the darker gray lines are segments within cells from entry to exit
points of the input line, and the black line is the simplified line, formed from the
midpoints of the darker gray lines. (Source: Weibel, 1997, p. 125) ............................... 30
Figure 2.6 - Mesh simplification. (Source: Dalmau, 2004) .................................................... 31
Figure 2.7 - The three possible regular tessellations of the plane. (Source: Peuquet, 2002) .. 34
Figure 2.8 - Connectivity paradox; in triangles and squares, whether or not regions A and
B are connected by the corners of cells l and m is unclear, as is whether or not gray
cells form a continuous region across cells p and q. There is no such ambiguity in
hexagons. (Adapted from source: Duff, Watson, Fountain, & Shaw, 1973, p. 254) ...... 36
Figure 2.9 - An equilateral hexagon and square in their circumcircles. The area of the
hexagon is closer to its circumcircle than is the square’s to that of its circumcircle.
(Source: WolframAlpha.com) .......................................................................................... 37
Figure 2.10 - The Hausdorff Distance in ℝ2. Line M represents the longest distance an
element a of all elements A has to go to reach the closest element b. Line N
represents the same, but from B (and all elements b thereof) to the closest element a.
Line M is the directed Hausdorff distance from A to B, while line N is the directed
Hausdorff distance from B to A. The longer of these two (M) represents the
(overall) Hausdorff distance. (Figure adapted from source:
http://www.mathworks.com/matlabcentral/fileexchange/26738-hausdorff-distance,
graphic by Zachary Danziger).......................................................................................... 39
Figure 3.1 - The hexagonal quantization algorithm. In each hexagon, the input vertices
(gray) are quantized to a single output vertex (black), resulting in a simplified output
line (in black). .................................................................................................................. 44
Figure 3.2 - Hexagon width (i.e., tessera resolution). .............................................................. 46
Figure 3.3 - Sixty-degree range of rotation for regular hexagonal tessellations. ..................... 46
Figure 3.4 - The effect on output lines caused by shifting the tesserae. Input vertices and
lines are in gray, and output vertices and lines are in red. ............................................... 47
Figure 3.5 - Layout of hexagons using the bounding box delimiting the line. The
hexagon in the north-west corner is drawn centered on the bounding box corner first,
with hexagons below it drawn to follow. The second “column” of hexagons to the
east is drawn next, and the process continues until the bounding box is completely
covered by a hexagon on all sides. ................................................................................... 51
Figure 3.6 - Constructing an output vertex (orange) for each pass (first in red, second in
blue) of the input line through the hexagon. .................................................................... 52
Figure 3.7 - The two clustering methods used in this research. The midpoint of the first
and last vertices method is illustrated on the left, while the spatial mean of vertices is
illustrated on the right. ..................................................................................................... 53
Figure 3.8 - Li's suggested solution for single vertex selection within cells with multiple
passes of the input line - see cell at top, center. (Source: Li, 2007, p. 153) .................... 54
Figure 3.9 - An effect of Li's suggested method of selecting single vertices in a cell with
multiple input line passes. In this example, the application of Li’s suggestion at the
tessera overlapping the peninsula’s connection to the mainland would cause the
entire peninsula to be deleted, whereas a representation of it could be retained at this
cell resolution (i.e., target scale). ..................................................................................... 55
Figure 3.10 - A screen shot of the graphical user interface of the software developed to
implement the algorithms and the calculation of Hausdorff distances. ........................... 57
Figure 3.11 - Locations of the 34 sample lines used in this research. Coast and shore
lines are indicated in italics. (Background hypsometric tint courtesy of Tom
Patterson, source: NaturalEarthData.com) ....................................................................... 60
Figure 4.1 - All 34 lines simplified by the hexagonal quantization algorithm to 1:500,000
and drawn to scale. ........................................................................................................... 65
Figure 4.2 - Simplifications of a portion of the coast of Maine produced by both the
hexagonal quantization algorithm (purple) and the Li-Openshaw raster-vector
algorithm (green) using the spatial mean quantization option, against the original
line (gray). All lines drawn to 1:24,000. ......................................................................... 67
Figure 4.3 - Simplifications of a portion of the coast of Maine produced by both the
hexagonal quantization algorithm (purple) and the Li-Openshaw raster-vector
algorithm (green) using the midpoint first and last vertices quantization option,
against the original line (gray). All lines drawn to 1:24,000........................................... 68
Figure 4.4 - Simplifications of a portion of the coast of the Alaskan Peninsula produced
by both the hexagonal quantization algorithm (purple, left) and the Li-Openshaw
raster-vector algorithm (green, right) using the spatial mean quantization option,
against the original line (gray). All lines drawn to 1:24,000........................................... 69
Figure 4.5 - Simplifications of a portion of the coast of the Alaskan Peninsula produced
by both the hexagonal quantization algorithm (purple, left) and the Li-Openshaw
raster-vector algorithm (green, right) using the midpoint first and last vertices
quantization option, against the original line (gray). All lines drawn to 1:24,000.......... 70
Figure 4.6 - Portion of the coast of Newfoundland, simplified to seven target scales by
the hexagonal quantization algorithm using the midpoint first and last vertices
quantization option........................................................................................................... 71
Figure 4.7 - Portion of the coast of Newfoundland, simplified to seven target scales by
the Li-Openshaw raster-vector algorithm using the midpoint first and last vertices
quantization option........................................................................................................... 72
Figure 4.8 - Portion of the Humboldt River, simplified to 1:150,000 by both algorithms
using both quantization options. The orange box signifies the location of the
1:24,000 segment (at top) on the simplified lines. ........................................................... 73
Figure 4.9 - Portion of the Mississippi Delta coastline, simplified to 1:250,000 by both
algorithms using both quantization options. .................................................................... 74
Figure 4.10 - Portion of the shore of the Potomac River, simplified to 1:500,000 by both
algorithms using both quantization options. The orange box signifies the location of
the 1:24,000 segment (at top-center) on the simplified lines. .......................................... 75
Figure 4.11 - Quantile-Quantile plots for mean Hausdorff distances across hexagonal and
square samples, using the spatial mean quantization option. ........................................... 78
Figure 4.12 - Quantile-Quantile plots for mean Hausdorff distances across hexagonal and
square samples using the midpoint first and last vertices quantization option. ............... 79
Figure 5.1 - “Untwisting” line self-crossings. The routine iterates through all line
segments, checking for intersections with other line segments. When one is found,
the sequence of vertices starting from the second vertex of the first line segment
until the first vertex of the second line segment is reversed. The process repeats
from the beginning of the line, “untwisting” self-crossings one at a time, until no
more are detected. ............................................................................................................ 98
LIST OF TABLES
Table 2.1 - Distances and areas for different regular tessellation geometries. (Source:
Duff et al., 1973, p. 245) .................................................................................................. 35
Table 2.2 - Required properties of a true mathematical metric. (Source: Veltkamp &
Hagedoorn, 2000, p. 468)................................................................................................. 40
Table 4.1 - Mean Hausdorff distances (in ground meters) between simplified and input
vertices. Each mean Hausdorff distance is calculated from n = 34 simplified lines
and their related input lines. ............................................................................................. 76
Table 4.2 - Pearson correlation coefficients for differences in means observed between
the hexagonal and square algorithms, using the midpoint first and last vertices
quantization option........................................................................................................... 81
Table 4.3 - T test statistics across seven scales for the difference in mean Hausdorff
distances between square and hexagonal algorithms using the midpoint first and last
vertices quantization option. ............................................................................................ 81
Table 4.4 - Pearson correlation coefficients for differences in means observed between
the hexagonal and square algorithms, using the spatial mean quantization option.......... 81
Table 4.5 - T test statistics across seven scales for the difference in mean Hausdorff
distances between square and hexagonal algorithms using the spatial mean
quantization option........................................................................................................... 82
Table 4.6 - Related-samples Wilcoxon signed rank statistics. ................................................. 83
Table 4.7 - Three-way ANOVA test statistics across all 952 simplifications and three
factors. .............................................................................................................................. 84
Table 4.8 - Mean percent reductions in vertices from the input line, averaged across all 34
sample lines, for each algorithm and each quantization option. ...................................... 85
ACKNOWLEDGEMENTS
I would like to thank my advisor Dr. Cynthia Brewer for all her invaluable support and
guidance, both on my thesis work and other projects. I also wish to thank the Department of
Geography in general, for having provided me with an excellent environment and community of
scholars within which to learn and work. I’ve had several thought-provoking and inspiring
conversations with Donna Peuquet about this research, and she has served as thesis reader - thank
you! I am indebted both to Professor Krzysztof Janowicz and to my fellow graduate student
Alexander Savelyev for their teaching and help with programming in Java. Also, conversations
with the following persons have helped me develop the ideas in this thesis, and I thank them each:
Barbara Buttenfield, Charlie Frye, Zhilin Li, and Alan Saalfeld.
Chapter 1
Introduction
This research has set out to develop a scale-specific algorithm for cartographic line
simplification that uses two-dimensional regular hexagonal tessellations and a vertex clustering
quantization technique. In development of the algorithm, this research has had two main goals: to
implement classical sampling theory and map resolution theory in service to cartographic line
simplification in order to achieve scale-specificity, and to demonstrate that hexagonally
tessellated sampling performs with greater fidelity to original lines than do the square
tessellations of traditional rasters. The algorithm developed, termed “hexagonal quantization,”
has been implemented in original software. It has then been compared to the Li-Openshaw raster-vector algorithm, which also uses a regular tessellation and vertex clustering technique, but with
square raster cells. The essential differences between the two algorithms are the geometry of the
tessellation used, and the formulae by which tessera dimensions are calculated in relation to target
scale. Using identical input lines and tessera widths, the distances between the input line and the
lines produced by each algorithm are compared. A simple formula based on map
resolution at target scale has been developed for use with the hexagonal quantization algorithm to
permit scale-specificity.
Lines on Maps
One of the most fundamental notations one can make when drawing any kind of diagram
or sketch is a line. Chaining lines together or making one end where it began builds any kind of
polygonal representation in a sketch. Maps, of course, are like any other kind of sketch in which
lines figure strongly.
In mathematics, a line is frequently regarded as the set of all points that lie on the path
defined by a function. By this definition, a line may exist in anywhere from one- to infinite-dimensional space (i.e., ℝ1, ℝ2, ..., ℝ∞), and is composed of an infinite set of points within the
range of the function. Such a definition mimics the behavior of a line in the real world, in that the
number of points along a real line is limited only by the precision with which the line can be
observed or measured. Some lines in contemporary geographic information systems and
cartography are defined in this mathematical way, such as Bezier curves, which draw a curve that
meets certain smoothness criteria by first deriving a mathematical function for it. These,
however, are rare, and most GIS and cartographic lines are defined as polylines.
Polylines are defined by finite sets of points, between which straight-line segments are
sequentially chained to build a linear feature. Strictly speaking, there is no curvature in polylines,
but because straight segments can meet at vertices at variable angles, the overall form can follow
or mimic curves. The large majority of contemporary digital “lines,” whether in cartography or
any other form of digital graphics, exist in this form. A reason for this may be the relative
simplicity of these lines over those defined by mathematical functions, both in terms of
conceptualization and in digital creation; it is generally easier to digitize a line by a series of
points than to try finding a mathematical function that satisfactorily models a real-world linear
feature. Another reason is the way in which sequenced sets of points in a polyline are easily
encoded and manipulated in the form of programming language arrays.
With polylines being the de facto standard for lines in a GIS, manipulation and rendering
of map lines is a matter of computation on the set of points (or, the set of line segments between
points) that define them. There are significant implications for multi-scale representation in this
fact, and these mainly reflect how the polyline model relates to the real world feature it
represents. For example, vertices along polylines exist with frequency or density that can be
measured several ways, such as number of points per unit polyline length, or mean distance
between points. Vertex density in a line is a measure of the precision with which (or resolution to
which) that line is defined. In cartography, that precision is usually closely or directly related to
the cartographic scale at which the data are collected or meant to be drawn.
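For illustration only, a minimal Java sketch of the two density measures named above, over a polyline stored as a simple coordinate array, follows; the class and method names are illustrative and are not drawn from the software described later in this thesis.

    /** Illustrative sketch: vertex-density measures over a polyline stored as an array of {x, y} pairs. */
    public final class PolylineDensity {

        /** Sum of the straight-line segment lengths between successive vertices. */
        static double length(double[][] vertices) {
            double total = 0.0;
            for (int i = 1; i < vertices.length; i++) {
                total += Math.hypot(vertices[i][0] - vertices[i - 1][0],
                                    vertices[i][1] - vertices[i - 1][1]);
            }
            return total;
        }

        /** Number of points per unit of polyline length. */
        static double pointsPerUnitLength(double[][] vertices) {
            return vertices.length / length(vertices);
        }

        /** Mean distance between successive points. */
        static double meanPointSpacing(double[][] vertices) {
            return length(vertices) / (vertices.length - 1);
        }
    }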
Based on such an understanding of polylines in maps, cartographic line simplification
may be defined as the transformation of the set of points that define a polyline to a new set
defining a new polyline which represents the input line at reduced symbol dimensions and with
reduced detail. The work presented in this thesis deals exclusively with line simplification, being
one of many operators (i.e., logical, geometric and graphical processes which transform spatial
data) involved in cartographic generalization. Line simplification in this work is understood as a
problem set in computational geometry regarding the transformation of sets of points which
define polylines in digital cartographic data. Specifically, this thesis presents an original,
automatable line simplification algorithm, with comparison and commentary on how it performs
against a similar algorithm: the raster-vector permutation (Li, 2007) of Li and Openshaw's (1993)
natural principle algorithm.
Motivation for the development of the hexagonal quantization algorithm has come from
several sources. Chief among these has been the belief that scale is the single most important
factor driving the need to simplify cartographic linework. As map scale decreases, less space is
available for symbolic representation of a sinuous linear landscape feature. While this is an
obvious fact well-known to cartographers, most techniques for line simplification presently used
are entirely uncoupled from the notion of scale. Whereas cartographers frequently refer to
technical specifications that clearly describe the desired linework qualities at given map scales,
the input parameters of most line simplification algorithms refer to metrics that cannot be
objectively related to a specific scale. Examples of these algorithms include the Douglas-Peucker
(1973) and Visvalingam-Whyatt (1993) algorithms. While scale-specific line simplification
algorithms have been developed (such as those by Perkal (1966), Li and Openshaw (1993), and
Dutton (1999)), they have not yet enjoyed popularity in implementation. Several reasons may
exist for this, including performance, relative complexity, and availability in commercial
software. Regardless of how common these scale-specific algorithms are, the author believes
they each conceive of the cartographic problem properly, and the algorithm presented here is in the
same vein. It is hoped that the performance, relative simplicity and availability of the presented
algorithm help to bring attention to scale-specific multi-scale representation and generalization
methods.
Further motivation to develop the algorithm described here was provided by the desire to
demonstrate an alternative geometric conception of the cartographic line simplification problem.
Many approaches to line simplification among cartographers have revolved around the notion of
characteristic points in a line, and the importance of retaining these. Characteristic points are
defined as those along a line that, as a subset, make an effective abstract gestalt of the line (these
will be discussed in greater depth in the literature review chapter). The author contends that
while characteristic points can be identified in any given map line, their qualification as
characteristic does not necessarily hold as scale is reduced, and thus that their retention in line
simplification uninformed by scale is a flawed approach. The present algorithm uses a different
approach, in that the input line is sampled using a regular tessellation upon which all vertices of
the input line are weighted equally, with none regarded as more characteristic of the input line than
any other. Rather than seek to retain certain input vertices, the present algorithm seeks to follow
the many input line vertices as closely as possible within a certain spatial resolution defined in
direct relation to target scale.
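A minimal Java sketch of this sampling idea follows, for illustration only; the tessera-lookup helper is a placeholder, and the full algorithm, including its treatment of multiple passes of the line through one cell and its two quantization options, is detailed in Chapter 3. Here, each run of consecutive input vertices falling in the same cell is collapsed to its spatial mean.

    import java.util.ArrayList;
    import java.util.List;

    /** Illustrative sketch of vertex-clustering quantization on a tessellation. */
    public final class VertexClusteringSketch {

        /** Placeholder: identifier of the tessera (hexagon) containing point (x, y). */
        static long tesseraAt(double x, double y, double tesseraWidth) {
            throw new UnsupportedOperationException("tessellation indexing not shown here");
        }

        /** Collapse each run of consecutive vertices sharing a tessera to its spatial mean. */
        static List<double[]> simplify(double[][] input, double tesseraWidth) {
            List<double[]> output = new ArrayList<>();
            long current = tesseraAt(input[0][0], input[0][1], tesseraWidth);
            double sumX = 0.0, sumY = 0.0;
            int count = 0;
            for (double[] v : input) {
                long cell = tesseraAt(v[0], v[1], tesseraWidth);
                if (cell != current) {
                    output.add(new double[] { sumX / count, sumY / count }); // one output vertex per pass
                    current = cell;
                    sumX = 0.0;
                    sumY = 0.0;
                    count = 0;
                }
                sumX += v[0];
                sumY += v[1];
                count++;
            }
            output.add(new double[] { sumX / count, sumY / count }); // final cluster
            return output;
        }
    }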
Unique Contributions
The hexagonal quantization algorithm developed and presented in this research represents
an effort in a new paradigm of scale-driven generalization operators as described by Li (1996).
An important element of the algorithm is its scale-specificity, and the development of that quality
in this research is unique in its direct reference to representational (i.e., visual) resolution at target
scale, informed by sampling theory.
The algorithm is essentially a cartographic application of vertex clustering, a
generalization technique employed in computer graphics research outside cartography, in that it
reduces line vertices specific to each tessera of an imposed tessellation. The process of reduction
undergone in each tessera is known in signal processing literature as quantization (Rossignac,
2004, p. 1224). Though the terms “vertex clustering” and “quantization” are not used by Li
and Openshaw, essentially the same process occurs in the raster-vector mode of their algorithm
(Li & Openshaw, 1992).
Two essential differences exist between the algorithm presented here and the Li-Openshaw
raster-vector algorithm: tessellation geometry and mathematical means of objectively relating
tessera dimensions to target scale. The Li-Openshaw method essentially performs vertex collapse
within the square pixels of traditional raster structures (i.e., regular square tessellations), whereas
the algorithm presented here does the same in a regular hexagonal tessellation. The hexagonal
tessellation is chosen for its radial symmetry and uniform inter-tessera topology and distances.
Also, whereas sound guidelines for estimation of raster cell size in relation to target scale are
given by Li (2007, p. 65) for use in the raster-vector Li-Openshaw algorithm, the present research
offers formulae for the direct calculation of scale-specific, appropriate tessera dimension derived
from resolution theory.
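Those formulae are stated in Chapter 3 and are not reproduced here. The following Java fragment is only a hypothetical illustration of the underlying scale arithmetic; the 0.5 mm map-space resolution used in the example is a common rule-of-thumb figure and is not necessarily the value adopted by the formulae in this thesis.

    /** Hypothetical illustration of scale arithmetic only; the thesis's own
        tessera-width formulae are derived from resolution theory in Chapter 3. */
    public final class ScaleArithmeticSketch {

        /** Ground distance, in metres, covered by resolutionMm millimetres of map space at 1:scaleDenominator. */
        static double groundMetres(double resolutionMm, double scaleDenominator) {
            return resolutionMm * scaleDenominator / 1000.0;
        }

        public static void main(String[] args) {
            // An assumed 0.5 mm map-space resolution corresponds to 12 m on the ground at 1:24,000.
            System.out.println(groundMetres(0.5, 24000));  // prints 12.0
        }
    }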
Thesis Structure
The remainder of this thesis is laid out in four chapters. The Line Simplification
Literature chapter reviews ideas on the subject put forth by cartographers for the last five decades,
as well as topics relevant to this research; these include computational geometry, hexagonal
tessellations, and signal processing applications. The next chapter, The Hexagonal Quantization
Algorithm and Study Methods, explains in detail how the algorithm operates. This chapter also
describes the methods by which the hexagonal quantization algorithm was implemented and then
tested against an implementation of the Li-Openshaw raster-vector algorithm, as well as the
metric by which positional deviation from the input line was measured for both algorithms. The
Results and Interpretations chapter documents the data produced and gathered by the research,
and then discusses these. The thesis is concluded by the Conclusions and Future Work chapter,
which contains observations and future plans regarding the research.
Chapter 2
Line Simplification Literature
Line simplification has been an important topic in the cartographic literature for decades.
That importance is also apparent in the literature of other fields, where the motivation and
problem formulation may be different, but the essential computational geometry of extracting
pattern and form at one scale of measurement from a linear signal machine-retrieved at a higher (or
noisier) scale of measurement bears many similarities. Particularly noteworthy among these
fields are signal processing, pattern detection, and computer graphics. Across these fields as well
as in cartography, interest in automated line simplification has been driven by the objective of
reformulating a signal (e.g., a map line, or lines in a computer-read image) for some other scale of
representation, or for producing a simplified correlate of a line.
Literature from several fields is reviewed in this chapter. In particular, automated line
simplification is briefly considered in light of the larger topic of cartographic generalization, a
field of research that spans several types of procedures (i.e., operators) on map data, with the
simplification of lines being among them. Line simplification has held a seemingly privileged
place in the generalization literature, ostensibly because most geospatial data in vector form is
composed of some type of polyline, with the exception perhaps only of point data. This review
then seeks to describe some of the most interesting and popular conceptualizations of digital
cartographic lines and their simplification. Combined with these cartographic views, several
algorithms developed within the cartographic community are noted, with brief descriptions of
their essential workings. This review generally then departs from explicitly cartographic
literature to discuss simplification solutions from cognate fields such as computer graphics and
signal processing. In particular, the quantization process in computer graphics known as vertex
clustering is examined, in part to observe precedent to, and document consilience with, certain
parts of the hexagonal quantization algorithm in this research. This discussion includes cellular
or cell-like geometric shapes within which vertex clustering can be defined.
The review then shifts to literature from various signal-processing fields that almost
unanimously extols the benefits of hexagonal sampling lattices over square ones. Several authors
describe the unique properties of regular hexagonal tessellation in ℝ2 (i.e., the two-dimensional
Euclidean plane), and how these permit the collection of data samples that are more efficient in
representation, more error-free, and less anisotropic than samples collected with sensors (i.e.,
cells, pixels) arranged in regular square grids; these claims are corroborated by quantitative
measures.
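One such quantitative measure, consistent with Figure 2.9 and easily verified, compares how much of a common circumcircle each shape covers. For circumradius r:

    circle area  = πr²        ≈ 3.142 r²
    square area  = 2r²        (side r√2), about 63.7% of its circumcircle
    hexagon area = (3√3/2)r²  ≈ 2.598 r², about 82.7% of its circumcircle

The regular hexagon thus approximates an isotropic, circular sampling footprint more closely than the square does.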
Finally, the review moves to the Hausdorff distance, a metric used by geometers to
measure the distance (i.e., difference) between two sets of objects. Mathematically speaking, the
metric is applicable in any metric space; it is used in this research as an objective means of output
line evaluation, quantifying the maximum displacement of simplified lines from their detailed
input lines across the ℝ2 surface of a projected map.
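In its discrete form over two vertex sets, the measure is straightforward to compute; a minimal Java sketch follows, for illustration only (a fuller treatment could also measure distances to points along segments rather than to vertices alone).

    /** Illustrative sketch: discrete Hausdorff distance between two vertex sets in ℝ2. */
    public final class HausdorffSketch {

        /** The (overall) Hausdorff distance: the larger of the two directed distances. */
        static double hausdorff(double[][] a, double[][] b) {
            return Math.max(directed(a, b), directed(b, a));
        }

        /** Directed Hausdorff distance: greatest distance from any point of 'from' to its nearest point in 'to'. */
        static double directed(double[][] from, double[][] to) {
            double worst = 0.0;
            for (double[] p : from) {
                double nearest = Double.POSITIVE_INFINITY;
                for (double[] q : to) {
                    nearest = Math.min(nearest, Math.hypot(p[0] - q[0], p[1] - q[1]));
                }
                worst = Math.max(worst, nearest);
            }
            return worst;
        }
    }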
Generalization
Weibel (1997, p. 101) offers the following description:
In cartography, the process which is responsible for cartographic scale reduction
is termed generalization (or map generalization, or cartographic generalization).
It encompasses a reduction of the complexity in a map, emphasizing the essential
while suppressing the unimportant, maintaining logical and unambiguous
relations between map objects, and preserving aesthetic quality. The main
objective then is to create maps of high graphical clarity, so that the map image is
easily perceived and the message that the map intends to deliver can be readily
understood.
It is apparent from the passage above that there exists no singular, objectively correct
way to generalize maps. Further, the broad objective given by Weibel refers to the achievement
of certain qualities in generalized maps, namely clarity and ease of perception, that are inherently
difficult to measure. These points about the generalization process along with others, such as the
challenge of encoding cartographically-acceptable representation decisions in strict logic for
computer automation, illustrate the circumstances under which cartographers have both
repeatedly conceived of generalization and recommended how it should be done.
Usually, it is agreed that generalization is a process that takes place in order to use
existing map data from some larger cartographic scale on smaller scale maps. By that definition,
generalization is scale-driven (Dutton, 1999; Li, 1996). The representation changes that
generalization calls for are negotiations of the reduced map area on which features must be
modeled and depicted while still retaining acceptable positional accuracy. Successful
generalization has been described by Ruas (2002, p. 75) as a synthesis process: the number or
complexity of symbols used is reduced, while the chief pieces of information and character of the
original are retained in a generalization that clearly conveys information.
The process of making any map at scales smaller than 1:1 involves generalization, in that
the real-world objects being mapped cannot be represented in their full detail. Generalization,
then, is essential to understanding a map (Bertin, 1983). Cartographic generalization, in the sense
that cartographers commonly refer to it, is an additional step of abstraction whereby existing
representations (e.g., polylines representing rivers) are further abstracted. Ratajski (1967) draws
a distinction between quantitative (i.e., numbers of features represented) and qualitative
generalization (i.e., abstraction of form). Similarly, Bertin (1983) describes conceptual and
structural types of generalization, being morphological redefinition of features in the former and
diminution of frequencies of occurrence in the latter.
Cartographers do not unilaterally agree, however, on the understanding that
generalization is a scale-driven process and, by extension, what should be expected of its
successful invocation. Without necessarily requiring a change in scale, generalization can be
viewed as the work necessary when a map fails to “maintain clarity, with appropriate content, at a
given scale, for a chosen map purpose and intended audience” (McMaster & Shea, 1988, p. 242).
Still others regard data reduction as an additional explicit goal of generalization (Cromley, 1992),
an objective that is partly a relic of early map computerization and concerns about efficient use of
digital memory. Reflecting this variety of viewpoints, there are several somewhat divergent
descriptions and sets of requirements for generalization in the literature, both in textbooks and in
scholarly journals (examples in Brassel & Weibel, 1988; Li, 2007; McMaster & Shea, 1992;
Sarjakoski, 2007; Stoter, Smaalen, Bakker, & Hardy, 2009). Robinson et al. (1995) describe
the collection of map scale, map purpose, graphic limitations and data quality as the appropriate
controls of the generalization process. Several authors have created taxonomies of situations that
call for generalization; notably among these Shea and McMaster (1989, p. 58) and McMaster and
Shea (1992, p. 45) describe the six conditions of congestion, coalescence, conflict, complication,
inconsistency and imperceptibility.
Harrie and Weibel (2007) describe generalization as having gone through an evolution,
from condition-action modeling where cartographers respond to problems (such as those
described by McMaster and Shea above), through human interaction modeling where the process
is semi-automated (as in many geoprocessing tools available through Esri's ArcGIS package), to
constraint-based modeling, where automated processes are run according to parameters defined
by explicit map requirements. Many of the processes undertaken to this day involve human
interaction and evaluation. Part of the reason for this may be the difficulty or even impossibility
of relating operator input parameters to target scales, such that operators need to be run and their
products evaluated iteratively until a satisfactory map is made. Related to this fact, some authors
have suggested that generalization is best undertaken with equipment that allows for the real-time
observation of products (Lecordix, Plazanet, & Lagrange, 1997; R. B. McMaster & Veregin,
1991); an example of such a piece of equipment is the online line simplification tool
MapShaper.org (Bloch & Harrower, 2008).
Regardless of motivation and paradigm, the process of generalization is regarded as
composed of various distinct operators, being processes that conduct specific geometric
modifications on specific types of data. As examples, line simplification is an operator which
modifies linear features, while amalgamation (or aggregation) is another which modifies any of
point, line, polygon or cellular (i.e., raster) features. The generalization process cartographers
undertake in map production is usually conceived of as a set of various operations, either
employed in parallel or in sequence or in mixtures thereof, and across several datasets.
Line Simplification
Line simplification is arguably one of the most important generalization operators, since
almost every map includes some form of lines. The volume and diversity of writings on this topic
in the cartographic literature reflect the community's continued concern over unresolved
geometric and practical issues. Even though some algorithms have been implemented and made
available in commercial GIS (such as Esri’s ArcGIS), no algorithms are yet considered suitable
and trustworthy enough for large-scale automated map production.
Cartographers have taken various theoretical approaches to line simplification, reflecting
diverse formulations of what constitutes line symbols and how they convey information, both in
and of themselves as well as implemented as polylines in digital cartography. Among the most
popular of these conceptualizations is that of Peucker (1976), who models polylines as potentially
noisy vertex-based representations of true feature lines. In his model, the frequency of vertices
along a polyline bears a relationship to how efficiently, and with how much lateral displacement,
the polyline communicates the position of the true line (Peucker, 1976, p. 508):
A line is a combination of a number of frequencies, each of which can be
represented by certain bandwidths. The break-up of a line into series of bands,
therefore, could be equated with the stepwise removal of the high frequencies
from the line.
Elaborating on his notion, Peucker writes (p. 512):
It is the objective of the theory to keep the computation at any time on the highest
level of abstraction that the particular problem allows. The critical issue of the
theory is to find the highest level in any different type of problem.
Writing in 1976, an explicit objective of Peucker's approach was to allow for the
winnowing of points from what could be a noisily-digitized polyline, with the degree of point
reduction being tailored to the data resolution required for a given spatial analysis task (Weibel,
1997, p. 120). While Peucker did not exclude the possibility that his ideas could apply to the
simplification of lines as scale reduces, it is in this sense that the algorithm he developed with
Douglas (Douglas & Peucker, 1973) became widely adopted. Operations using points
corresponding to various bandwidths around lines have been developed for use both in the
simplification of lines (with examples to follow) and in comparing corresponding lines digitized
at different point frequencies (e.g., Savary & Zeitouni, 2005).
Characteristic Points
Related to the notion of point hierarchy with respect to bandwidth has been the very
popular notion of point hierarchy with respect to varying degrees with which points represent a
line. Borrowing theory developed by psychologist Fred Attneave (1954), cartographers have
believed that among the set of vertices making up a polyline there exist subsets of characteristic
points. Attneave asserts that certain points can be identified along perceived linear features in
anything a person can see, such that these points can be used alone and in abstract to successfully
represent the real object to a human. Generally these points are those where lines have the
greatest directional change, being apexes of curves and sharp points. He provides a now famous
example of a sleeping cat (p. 185), drawn by connecting only 38 points with straight lines (Figure
2.1). Marino (1979) related Attneave's ideas to existing thoughts on special points in cartography
(Dent, 1972). In her influential study, she found that human subjects exhibited a high degree of
consistency in selecting points along river lines as being important to retain when seeking to
represent the line with a pre-defined number of vertices. Her findings were widely taken to
corroborate the notion that characteristic points existed in cartographic lines, and that
simplification routines that retained such points would yield symbolically-optimal results.
Figure 2.1 - Attneave's sleeping cat.
(Source: Attneave, 1954, p. 185)
Following Marino's research, several authors began to regard the line simplification
process as one of removing extraneous points, while retaining those that were characteristic
(McMaster, 1987; McMaster & Shea, 1988; Veregin, 1999; White, 1985). These scholars pointed
out that the Douglas-Peucker algorithm (1973) was the best available for automatically
identifying and retaining characteristic points along a line, and that the paradigm it represented
should thus be continued. Further applying theory on characteristic points to cartography, Jenks
(1979) defines characteristic points as being of two types: those that are relevant to perceived
form (e.g., curve apexes) and those that are given particular geographic importance (e.g., where a
river passes under a bridge). In later work, Jenks (1989) continues to advocate for the use of the
Douglas-Peucker algorithm, though his emphasis is more in line with the original conception of
the algorithm as a means of making data more efficient, than with the popular assertion that the
algorithm is well suited to reconstruct lines through scale change. With respect to map making,
Jenks (1989, p. 34) makes a recommendation that has still not been successfully implemented
today: that characteristic points should be differentially selected with respect to map purpose and
scale.
Segmentation and Strip Trees
Beyond treating certain points in lines as special, several scholars have also suggested
that distinctions should be made between certain lengths in lines. Arguing reasonably that lines
may exhibit decidedly different morphologies at different positions along their length (e.g., a river
that follows a jagged course through rough terrain and then gently meanders through plains),
some have suggested that local morphology should drive the degree to which simplification is
carried out, as well as possibly which routines are used. A strong proponent of this approach has
been Buttenfield (1985, 1989, 1991), and her position has been echoed by Cromley and Campbell
(1992), Dutton (1999, p. 36), Plazanet (1995), and García and Fdez-Valdivia (1994). According
to Buttenfield, lines can be characterized by their structure signature, consisting of geometric
measures observed on hierarchically-divided lengths of the line (1991, p. 152). She goes on to
say "the structure signature's purpose is to determine scales at which the geometry of a line
feature changes" (p. 170). Particular segments are defined with geometric reasoning, often using
zeros of curvature (i.e., points along the line at which curvature in one direction ceases and
reverses to the other direction). While several relevant measures have been suggested elsewhere in
the literature (Carstensen, 1990; McMaster, 1986), mathematically relating such differential
measurements along the line to input parameters for simplification algorithms still remains
hypothetical and unclear.
Directly related to the idea of segmenting curves, though not necessarily in tandem with
the notion that they should be simplified to locally-customized degrees, is the division of sinuous
lines into elementary curves, each encapsulated within a ribbon-like band delimiting the
dimensions of the curve. Suggestions for these schemes, often called strip trees (Ballard, 1981;
Buttenfield, 1985; Cromley, 1991), usually center around efficient computation and indexing of
geometrically-distinct areas of a complex line, as well as the possibility that length-specific
treatments in polylines can allow for multiple levels of detail (LODs) in their representation.
Authors have suggested that strip trees and other related segmentation schemes can be easily
constructed from line digitization processes (Ballard, 1981), as well as from certain simplification
algorithms that inherently segment lines for the purpose of deciding vertex eliminations within
small spans, such as the Douglas-Peucker (1973), Lang (1969), and Visvalingam-Whyatt (1993)
algorithms. Other segmentation strategies have been proposed, such as the use of Delaunay
triangulation in the space between lines (Van Der Poorten & Jones, 2002), as well as the use of
regularly-spaced (i.e., tessellated) areas (Dutton, 1999; Li & Openshaw, 1992; Zhan &
Buttenfield, 1996).
Point Reduction vs. Line Redefinition
As discussed above, some researchers (Jenks, 1989; McMaster, 1987; Veregin, 1999)
regard point reduction as a necessary, if not sufficient, quality for an algorithm to be considered
as implementing the generalization operator of line simplification. This is to say that these
authors generally regard line simplification to be a process which reduces input vertices down to
a representative subset. This view has been challenged by several scholars (e.g., Dutton, 1999;
Raposo, 2010). Even though there are grounds for thinking of some of the vertices in a polyline as
more representative of a shape, Dutton (1999) raises the point that all the vertices in question are
representative geometric abstractions, and no particular subset of them should be considered
sacrosanct when simplifying whole lines. Further, he avers that while some point reduction is a
likely consequence of simplification, transformation of features is valid and sometimes required
(1999, p. 34). Such transformations may involve the creation of new vertices which were not part
of the original data set. This view has been shared by Li and Openshaw (1992, 1993), who
classify generalization operators as falling into one of two groups: those that reduce points, and
those that smooth features. They argue that point reduction is relevant only to data efficiency,
and that authors who apply it to multi-scale representation are in error. Their arguments include
the widely-known poor performance of the Douglas-Peucker algorithm at relatively higher
bandwidths (supposedly being used for simplifying lines to much smaller scales), and the fact that
point-reduction methods cannot be considered a theory-based paradigm applicable to the whole
generalization process (Li & Openshaw, 1992, pp. 376-377). Line simplification for multi-scale
representation, then, ought not to be concerned with point reduction or retention, but rather with
producing lines appropriate for use at specified scales (Dutton, 1999; Li, 2007).
Constraints and Scale-Specificity
As previously mentioned, Harrie and Weibel (2007) describe the present paradigm of
generalization research as focused around the notion of constraints. Attempts to formalize
constraints usually hinge on quantifiable requirements (e.g., the minimum distance between lines
in map space for legibility). It has been suggested that popular algorithms such as the Douglas-Peucker method could be calibrated using a priori map constraints (Veregin, 2000). Also,
constraints such as “lines must not self-cross” have inspired post-processing routines (Saalfeld,
1999). This methodological focus on constraints is in keeping with contemporary tendencies in
the broader generalization literature (e.g., Stoter et al., 2009). Using constraints, some authors
have suggested that knowledge-based expert systems be used for line simplification (e.g.,
Kazemi, Lim, & Paik, 2009; Skopeliti & Tsoulos, 2001). Such systems are described by
Buchanan and Duda (1983, p. 1) as heuristic (they reason according to knowledge from theory
programmed into them), transparent (they make their reasoning explicit when queried),
and flexible (they can integrate new knowledge and so evolve). Wang and Muller (1998) present
methods based on recognizing line shapes and comparing them against cartographic
specifications, using, in their example implementation, rules from the Swiss Society of
Cartography (1977); confusingly, they do not regard their methods as rule-based, claiming that
rules for lines are too ambiguous in practice and thus cannot be applied.
Scale-specificity, itself a kind of constraint, has been discussed in the literature with
curious infrequency. Weibel (1997, pp. 120-121) suggests that the popularity of the point-reduction-based Douglas-Peucker algorithm is due to its having an early FORTRAN
implementation, and its subsequent inclusion into popular GIS applications such as ArcMap; this,
in conjunction with the popularity of earlier research on characteristic points, may well amount to
the reason why point-reduction methods have been dominant over scale-specific methods in
research.
Töpfer and Pillewizer (1966) authored one of the only pieces of literature expressly
devoted to scale-specificity in generalization, their work having become known as the Radical
Law. This work provides several equations for the calculation of how many features should
remain on a map generalized to a target scale, given the number of those features present on the
initial larger-scale map. As many scholars have noted, the Radical Law provides a guide for the
number of symbols to be retained as scale decreases, but not which ones.
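In its basic form (Töpfer and Pillewizer also give variants with additional constants for symbol exaggeration and symbol form), the law can be written

    n_f = n_a · √(M_a / M_f)

where n_a is the number of features on the source map at scale 1:M_a and n_f is the number to retain on the derived map at scale 1:M_f. For example, generalizing from 1:24,000 to 1:96,000 gives √(24,000 / 96,000) = 0.5, i.e., roughly half the features are retained.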
Li and Openshaw (1990) proposed the natural principle as a theoretical basis for
generalization. They apply the observation that in natural human vision, details gradually
diminish to the point of imperceptibility as one gets further and further away from the scene being
viewed. The series of algorithms produced by Li and Openshaw (Li, 2007; Li & Openshaw,
1992) are all efforts in their self-described “scale-driven paradigm” (Li, 1996). Similarly, Dutton
advocates for scale-specific line generalization: his Quaternary Triangular Mesh (QTM) (1999) is
a hierarchical nested tessellation of triangles on the globe's surface, and he describes how line
simplification can be done with respect to specific scale levels in the hierarchy. The hexagonal
quantization algorithm presented in this research is also an example of an expressly scale-specific
method.
Classes of Line Simplification Algorithms
McMaster (1987) and McMaster and Shea (1992, p. 73) offer a taxonomy of five kinds of
line simplification algorithms:
• independent point algorithms
  (Routines that operate on individual points irrespective of neighboring points.
  An example is an algorithm that eliminates every nth point; a minimal sketch of
  this routine follows the list.)
• local processing algorithms
  (Algorithms that use calculations on immediate vertex neighbors to determine
  whether a vertex should be dropped.)
• constrained extended local processing algorithms
  (As with local processing algorithms, but performing calculations using
  neighbors beyond just those immediately in sequence.)
• unconstrained extended local processing algorithms
  (Like constrained extended local processing algorithms, except with neighboring
  vertex search range defined by local geometric calculations.)
• global algorithms
  (Algorithms that perform calculations on the whole line synoptically.)
As has been mentioned above, some authors have disputed the utility of this
classification system, and have pointed out that its emphasis on point-reduction algorithms is
flawed. Douglas and Peucker (1973) gave a three-category taxonomy:
• algorithms based on elimination of points along the line
• algorithms based on approximation of a line using mathematical functions
• algorithms that delete map features from the line
Seemingly viewing Douglas and Peucker’s third category as superfluous, Li and Openshaw
(1992) suggest there be only two categories:
• data reduction methods
• smoothing methods
The algorithm presented in this research, much like the Li-Openshaw raster-vector
algorithm, can be classified as of the second type both in Douglas and Peucker’s, and in Li and
Openshaw’s taxonomies. It does not fit well in the McMaster and Shea taxonomy, since while it
applies a global hexagonal tessellation, it does not calculate line geometric properties for use in
the procedure except for total spatial extent. All of the geometric distances used in the algorithm
are derived from the specification of a target scale, rather than from a metric criterion by which points should be reduced.
Survey of Cartographic Algorithms
In this section, several algorithms are reviewed, as well as two hierarchical-tessellation
systems developed by Zhan and Buttenfield (1996) and Dutton (1999), respectively. Following
brief discussions of several cartographic line simplification algorithms, four noteworthy
algorithms are described and discussed in some detail: Perkal’s ε-band method (1965), the
Douglas-Peucker algorithm (1973), the Visvalingam-Whyatt algorithm (1993), and the Li-
Openshaw raster-vector algorithm (1992). The methods described here represent those most
commonly available to and used by cartographers to date, and include one of the most successful
applications of scale-specificity.
Algorithms Popular in Cartography
Perhaps because the topic inherently arouses geometric curiosity, researchers in line
simplification have exhibited an impressive degree of creativity. The problem has been
approached from several perspectives. Two of the most interesting methods put forward—being
among the few that are explicitly related to cartographic scale-specificity—are those of Dutton
(1999) and Zhan and Buttenfield (1996). As previously mentioned, Dutton presents a global
hierarchical triangular tessellation (Quaternary Triangular Mesh, or QTM), a construct he then
applies widely to many aspects of multi-scale representation and spatial indexing. In his paper,
Dutton (1999, p. 38) focuses on vector generalization, and in particular, line simplification:
Generalization via QTM is primarily a spatial filtering process. Starting at a data
set's highest encoded level of QTM detail, this filtering works by collecting line
vertices that occupy facets at the (coarser) target level of resolution, then
sampling coordinates within each such facet.
Arguments Dutton makes for the benefits of generalization by QTM include the fact that
latitude and longitude coordinates are directly manipulated, meaning that projection by any
means can be done after generalization with vertex positions remaining faithful to where they
belong on the globe, and the quality of the simplification can be manipulated by using
different sampling and simplification strategies inside each triangular mesh element (i.e., tessera).
His essential strategy is shared by Zhan and Buttenfield: they employ a raster pyramid scheme,
being a nested tessellation of square cells. Map resolutions (i.e., the resolutions to which
simplified lines can be drawn) are progressively doubled by doubling the cell resolution, a
process analogous to doubling cartographic ratio scale (Zhan & Buttenfield, 1996, p. 207).
Lines are simplified in a step-wise sequence as one goes from one resolution level to the next,
using methods described by Meer, Sher and Rosenfeld (1990). They decline to relate pyramid
levels to specific map scales, but suggest that the most detailed level should have a pixel
resolution of 0.2 to 0.3 mm in map space, reflecting the smallest marks that may be visible on the
map medium.
Consideration of shore lines in the famous work of Mandelbrot (1982) may well have
contributed to the enthusiasm some cartographers have shown for fractal-based simplification
methods (e.g., Buttenfield, 1989). A key concept in fractal geometry is the notion of self-similarity, whereby magnification (or diminution) of the neighborhood around a form yields the same form as the whole set. Researchers have pointed out the imperfect application of self-similarity to coastlines, since the whole of the line is acted upon by various geomorphologic forces operating at various spatial scales, and therefore cannot be expected to display self-similarity throughout scales. Buttenfield (1989), rather than apply self-similarity to lines as
wholes, suggests that features are self-similar at various sets of scales, then change at critical
points with scale-dependence, and again have self-similarity at different scale ranges—
quantifiable behavior she terms structure signature. Normant and Tricot (1993) sought to clarify
among cartographers that fractal geometry does not necessarily require the use of self-similarity.
The fractal dimension of a form (Mandelbrot, 1982) is a measure of how much that form fills a
space. It differs from Euclidean dimension in that it can be expressed in real (i.e., not just integer) numbers. So, while a curving line on a plane has a Euclidean dimension of 1 and lies in ℝ2, it may have a fractal dimension of something like 1.6, reflecting its sinuous, space-filling nature.
Muller (1987) has suggested that the preservation of measured fractal dimension should be a
guideline for simplification algorithms and used to evaluate their results, since product lines with
very similar fractal dimensions to their higher-detailed counterparts should retain the
morphological character of the line. Normant and Tricot (1993) used a convex-hull fractal
dimension computation method to operationalize that idea.
Other efforts in line simplification have involved computations on various derived
geometric shapes around the line. Cromley (1992) sought to implement an alternative bandwidth
concept proposed by Peucker (1976), wherein the band is defined around the principal axis of the
points in a length of the line (rather than the segment joining the first and last points of that length). His method is similar to the standard Douglas-Peucker (1973) and Lang (1969)
methods. Similarly inspired, Christensen (2000) sought to digitally implement—by way of
standard polygonal buffering procedures commonly used to create waterlines—Perkal's (1965)
proposal that medial axes of polygonal areas could be used to collapse areas to linear features.
Essentially, increasingly convergent lines eventually create the points at which a medial axis line
is defined, and this line can be used to represent the shape at scales at which the shape area is no
longer resolvable. Christensen suggests that a very similar methodology can be applied to lines:
the lines are artificially made into polygons, the process is undertaken, and then the artificial arcs
are removed (p. 24). Van Der Poorten and Jones (2002) propose a complex system in which the
areas around a sinuous polyline and within its calculated bounding box are partitioned using
Delaunay triangulation. Sequences of triangles in the resultant tessellation are used to define
“branches” of the sinuous form, which can be measured for differential weighting in
simplification routines, or flagged for deletion by pruning.
Relating more closely to line treatments from pattern recognition and processing fields,
Thapa (1988a) presents a cartographic algorithm based on Gaussian smoothing. Following
related work in function convolution by researchers in pattern recognition, Thapa's method
produces a mathematical approximation of the curve by convolving it with the second derivative of the Gaussian, and overlays this with the original line to find intersection
points, described as zero-crossings. His method can be used to varying degrees of simplification
by varying the Gaussian smoothing, though it is unclear whether this method can be related to a
target map scale. Thapa points out (1988b) that his method is also useful for detection of critical
points along a line, which he insists are not relevant for use in multi-scale representation but can
be useful for pattern recognition and data compaction.
While several multi-scale solutions involving chain pyramids (e.g., Zhan & Buttenfield,
1996), strip trees (e.g., Ballard, 1981), and Gaussian smoothing are available, Rosin has
suggested that it is most sensible to determine “natural scales” of lines, being those levels of
generalization where only the most informative forms of the curve are retained (Rosin, 1992, p.
1315):
The structure of an object is generated by a number of different processes,
operating at a variety of scales. Conversely, each of these scales is a natural
scale to describe one level of structure generated in the object's contour.
His method segments curves into elementary convex and concave arcs defined by zeros
of curvature, and applies Gaussian smoothing along with a shrinkage correction (since Gaussian
smoothing tends to shrink forms). The method is robust for even noisy lines (Rosin, 1992, p.
1321).
Finally, relating to concepts in animation, Cecconi (2003, pp. 84-112) suggests the use of
morphing (i.e., gradual shape changing between two states, as is commonly done in computer
graphics). This method requires established control scales where two “keyframes” (i.e., map
extents, in this application) are in spatial correspondence. Shape transformation then occurs
between the two keyframes by interpolation techniques. One of the two keyframes must always
be of lower detail, and the other of higher detail, than the desired generalization.
Perkal's ε-band
Developed before the digitization and automation of cartography, Perkal's ε-band method
(1965) is one of the few truly scale-based methods of simplification. Perkal devised the method
for the simplification of the borders of polygonal areas, but some scholars have suggested that the
same methods can be implemented for open lines. Nevertheless, it has been difficult to
implement Perkal's method in software (Christensen, 2000; Li, 2007, p. 147). The method entails
rolling a circular roulette of diameter ε along the edge of a polygonal feature. Lengths of the
polygon perimeter inaccessible to the roulette are considered too fine to retain, and instead the arc
formed by the roulette edge is taken to be the new, simplified line, until it connects again with the
line in the original polygon (Figure 2.2).
Perkal's method is scale-specific in that the value of ε is considered in direct relation to
the target scale for which the map is being generalized. For example, if the line weight one
wishes to use is 0.5 mm, and one is generalizing a lake from a map at 1:25,000 for use on a map
at 1:100,000, one should use a roulette in the lake on the 1:25,000 map with a diameter of 2 mm,
being 0.5 mm increased by the ratio of target and input scales. Generally:
ε = w (S_t / S_i)

where ε is the width of the band within which the original line must not overlap itself, or else needs to be generalized (i.e., dropped), in map units; w is the desired line weight to be used on the target map, in map units; and S_t and S_i are the target and initial scale denominators, respectively.
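Computationally the relation is trivial; the following sketch (method and parameter names are illustrative only) reproduces the worked example above.

public class PerkalEpsilon {
    /** Roulette diameter epsilon, in the same units as lineWeight, from the
     *  target and input scale denominators. */
    public static double epsilon(double lineWeight, double targetDenom, double inputDenom) {
        return lineWeight * (targetDenom / inputDenom);
    }

    public static void main(String[] args) {
        // 0.5 mm line weight, 1:25,000 input, 1:100,000 target: a 2 mm roulette.
        System.out.println(epsilon(0.5, 100000, 25000));
    }
}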
Figure 2.2 - Perkal’s method at three different values of ε.
Hatched areas are inaccessible to the roulette, and therefore
dropped from the lake form. (Source: Perkal, 1965, p. 65)
The Douglas-Peucker Algorithm
The Douglas-Peucker algorithm (1973) is by far the most popular algorithm for line
simplification in use by cartographers today, and has had several advocates from a theoretical
standpoint (McMaster, 1987; White, 1985). It is a standard method in the suite of geoprocessing
tools in Esri's ArcMap software, and various researchers have included it in their constructions of
whole generalization systems (Nickerson, 1988). The algorithm is based on Peucker's (1976)
theories of the nature of a cartographic polyline, being a form composed of vertices that
correspond to varying frequencies (i.e., levels of detail). It should be noted that the algorithm,
published in 1973 and having been independently developed, is virtually identical to that of
Ramer (1972), who was working on lines in computer graphics.
The algorithm begins when a user supplies a tolerance value, being a distance which a
vertex must lie beyond in order to be kept. The algorithm then considers every vertex in the line.
The first point is taken as an anchor, and a reference line connecting this and the last point, the
so-called floater, is drawn. The perpendicular distances from all other points to this line are then
measured. If there exist vertices with distances beyond that of the tolerance given (i.e., vertices
outside the band delimited by the tolerance distance from the measuring line), the algorithm
proceeds (otherwise, it generalizes the whole line to the segment running from the first to the last
vertex). The algorithm selects the vertex whose perpendicular distance to the line was greatest,
and uses this vertex as a new floating point. Also, the floating point is saved as a member of a
stack for later use. Now using a reference line between the anchor and the new floater, the
algorithm repeats the process of measuring perpendicular distances for each point in the line
between the anchor and present floater, and again will establish a new floater and add it to the
stack, if this is necessitated by the presence of vertices outside the tolerance band. The algorithm
continues to iterate, progressively working backward toward the beginning of the line and
establishing a stack of anchor points for itself for later use. When during these iterations the
algorithm does not find vertices beyond the tolerance distance, it considers any vertices within the
tolerance distance as extraneous, and deletes them, keeping only the anchor and floater and
joining them by a straight line. Each time it makes this join, it moves the anchor ahead to the
floater position, and repeats the process using the next-available floater from the recorded stack.
Figure 2.3 illustrates this process, depicting several steps from start to finish on a short polyline.
Figure 2.3 - The Douglas-Peucker algorithm. (Source: McMaster & Shea,
1992, p. 80-81)
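A common recursive formulation of the procedure is sketched below: the farthest-out vertex is retained and the two halves are processed in turn, which is logically equivalent to, though not a transcription of, the anchor-and-floater stack bookkeeping described above. Class and method names are illustrative only.

import java.util.ArrayList;
import java.util.List;

public class DouglasPeucker {

    public static List<double[]> simplify(List<double[]> pts, double tolerance) {
        if (pts.size() < 3) {
            return new ArrayList<double[]>(pts);
        }
        double[] a = pts.get(0);
        double[] b = pts.get(pts.size() - 1);
        int farthest = -1;
        double maxDist = 0.0;
        for (int i = 1; i < pts.size() - 1; i++) {
            double d = perpendicularDistance(pts.get(i), a, b);
            if (d > maxDist) {
                maxDist = d;
                farthest = i;
            }
        }
        List<double[]> out = new ArrayList<double[]>();
        if (maxDist > tolerance) {
            // Keep the farthest vertex and simplify the two halves on either side of it.
            List<double[]> left = simplify(pts.subList(0, farthest + 1), tolerance);
            List<double[]> right = simplify(pts.subList(farthest, pts.size()), tolerance);
            out.addAll(left.subList(0, left.size() - 1)); // avoid duplicating the split vertex
            out.addAll(right);
        } else {
            // Every intermediate vertex lies inside the tolerance band: keep only the endpoints.
            out.add(a);
            out.add(b);
        }
        return out;
    }

    /** Perpendicular distance from point p to the line through a and b. */
    private static double perpendicularDistance(double[] p, double[] a, double[] b) {
        double dx = b[0] - a[0];
        double dy = b[1] - a[1];
        double len = Math.hypot(dx, dy);
        if (len == 0.0) {
            return Math.hypot(p[0] - a[0], p[1] - a[1]);
        }
        // Magnitude of the cross product divided by the base length.
        return Math.abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / len;
    }
}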
Several authors have noted practical problems with the Douglas-Peucker algorithm (e.g.,
Li & Openshaw, 1992; Zhan & Buttenfield, 1996). The main issues consistently raised are the
problem of how the algorithm can produce self-intersecting lines given complex input lines, and
that the output tends to be so angular as to degrade the aesthetic quality of the line. Muller (1990)
has described a suite of post-processing methods for correcting self-intersection after any line
generalization procedure (though his work is generally aimed toward the Douglas-Peucker
algorithm). Saalfeld (1999) suggests that a test for self-crossings using convex hulls on segments
between anchors and floaters can be added to the algorithm. In this implementation, the
algorithm would not finalize a segment and move along the line until this test was satisfied, and would thereby avoid producing topological errors.
A final criticism of the Douglas-Peucker algorithm is that there is no reliable objective
way to relate the tolerance band distance to a target scale.
The Visvalingam-Whyatt Algorithm
The Visvalingam-Whyatt algorithm (1993) examines each vertex along the line with
respect to the triangle it forms with its two immediate neighbors. When the area of this triangle falls below a
user-specified areal displacement tolerance, the point in question is dropped. “The basic idea
underpinning this algorithm is to iteratively drop the point which results in the least areal
displacement from the current part-simplified line” (Visvalingam & Whyatt, 1993, p. 47).
Geometrically simple, this algorithm is also widely used, and incorporated into Esri’s ArcMap
software. It is also prone to topological error (i.e., self-crossing), and the user-specified tolerance,
as with the Douglas-Peucker algorithm, cannot be objectively related to target scale. Figure 2.4
illustrates the algorithm.
Figure 2.4 - The Visvalingam-Whyatt
algorithm. (Source: Visvalingam & Whyatt,
1993, p. 47)
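A minimal sketch of the procedure follows; it rescans the entire line on every pass for clarity, whereas a practical implementation would maintain a priority queue of triangle areas. Names are illustrative only.

import java.util.ArrayList;
import java.util.List;

public class VisvalingamWhyatt {

    /** Repeatedly drops the interior vertex whose triangle with its two neighbours
     *  has the smallest area, until that smallest area reaches the tolerance. */
    public static List<double[]> simplify(List<double[]> pts, double areaTolerance) {
        List<double[]> line = new ArrayList<double[]>(pts);
        while (line.size() > 2) {
            int minIndex = -1;
            double minArea = Double.POSITIVE_INFINITY;
            for (int i = 1; i < line.size() - 1; i++) {
                double area = triangleArea(line.get(i - 1), line.get(i), line.get(i + 1));
                if (area < minArea) {
                    minArea = area;
                    minIndex = i;
                }
            }
            if (minArea >= areaTolerance) {
                break; // every remaining interior vertex is significant at this tolerance
            }
            line.remove(minIndex); // drop the least significant vertex and repeat
        }
        return line;
    }

    private static double triangleArea(double[] p, double[] q, double[] r) {
        return Math.abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0;
    }
}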
The Li-Openshaw Raster-Vector Algorithm
One of few scale-specific line simplification algorithms, the Li-Openshaw raster-vector
algorithm is actually one of three related variants, the others being raster-mode and vector-mode
(Li & Openshaw, 1992). The algorithm is based on the natural principle developed by the
authors (1990), and forms a central part of Li's suggested “new paradigm” for map generalization
(Li, 1996; Li & Su, 1995).
To use the algorithm, the user first determines the width of the smallest visible size
(SVS), being the smallest mark that can be made on the target map; this value often falls between
0.2 and 1.0 mm (Li, 2007, p. 65; quoting Speiss, 1988), though Li writes that experience suggests
values from 0.5 to 0.7 mm for best results. The value of the SVS in terms of real distance units is
calculated by
K = k × S_T × (1 − S_T / S_S)

where K is the SVS diameter in ground units; k is the map symbol size (i.e., the SVS in map units); and S_S and S_T are the initial and target scales, respectively (Li, 2007, p. 65).
The SVS size in real world units is used to generate a raster, with one cell centered on the
first vertex of the line to be simplified. The raster is made large enough to cover the extent of the
line, such that every vertex of the line falls within some raster cell. Then, sequencing along the
line, all the vertices falling into a cell are collapsed to a single vertex. While Li suggests that
many different methods of generalized point selection within a cell are acceptable (2007, pp. 152-153), he recommends using the midpoint of the segment between the point at which the input line enters a cell and the point at which it exits the cell; Figure 2.5 illustrates the method, using this midpoint selection strategy.
Figure 2.5 - The Li-Openshaw raster-vector algorithm. The sinuous gray line represents the input
line, the darker gray lines are segments within cells from entry to exit points of the input line, and
the black line is the simplified line, formed from the midpoints of the darker gray lines. (Source:
Weibel, 1997, p. 125)
Outside of Cartography: Vertex Clustering and Mesh Simplification
In the fields of computer graphics and computational geometry, several strategies have
been employed to tackle the problem of geometric simplification. Noteworthy in the present
research is the concept of mesh simplification, and in particular, vertex clustering. To the
author's knowledge there has been little acknowledgement in the cartographic literature of the
similarity of vertex clustering to certain cartographic generalization routines. The concept is
generally the same as that employed in the Li-Openshaw algorithms (1992), Dutton's QTM
generalization scheme (1999), and the algorithm presented here.
Mesh simplification is a family of methods that seeks to reduce the geometric detail with
which a two- or three-dimensional form is rendered. (In principle, it remains possible to apply
the methods to objects in higher dimensions.) These methods are frequently applied in a variety
of computer graphics settings. A short survey of the literature suggests that the methods are most
often applied to three-dimensional forms composed of vertices defining triangle faces; the mesh-like system of the vertices and triangle faces that make up a form is known as a manifold. Yang
and Chuang (2003, p. 206) describe the mesh simplification methods as follows:
Most algorithms work by applying local geometry based criteria for simplifying
small regions on the meshes ... The criteria are iteratively applied until they are
no longer satisfied or a user-specified reduction rate is achieved.
Simplification of meshes is often required. For example, a mesh may constitute an object
in a computer video game in which the player's view is a simulated first-person perspective.
While the player is far from the object it is unnecessary to compute the appearance of the object
onscreen in all its detail; there would likely also be insufficient screen space (or pixels) to display
the detail. Instead, the form is usually pre-computed to several levels of detail (LODs),
corresponding to viewing distances from the player. As the player moves closer to the object in
the game, the game can progressively render more detailed LODs of the object. This is an example of a progressive mesh (Hoppe, 1996).
Mesh simplification is generally done by a process of elimination of vertices from the
manifold, sometimes also creating new vertices to represent those that have been collapsed
(Figure 2.6). One such family of algorithms for this process is vertex and edge collapse algorithms (Dalmau, 2004), which search the local neighborhoods of vertices (or edges) and delete a vertex (or two vertices, for an edge) from a manifold when triangle faces are found to be sufficiently coplanar, given a defined tolerance. The resulting gap in the manifold is then smoothed over with new, larger triangle faces.
Figure 2.6 - Mesh simplification. (Source: Dalmau, 2004)
Related conceptually to vertex and edge collapsing is vertex clustering. Rossignac (2004,
p. 1224) writes:
Vertex clustering, among the simplest simplification techniques, is based on a
crude vertex quantization, obtained by imposing a uniform, axis-aligned grid and
clustering all vertices that fall in the same grid cell.
The most generic version of this method applied to three-dimensional manifolds is a
three-dimensional tessellation of cubic voxels. It can be seen intuitively that the size of the voxels determines how many vertices fall within each, and thus defines the degree of simplification. The next task in the method is to choose a vertex in each voxel to represent all the
others. Rossignac and Borrel (1993) found that choosing the vertex farthest from the center of
the object's bounding box made for the best results, likely because this would counteract the
tendency the method has to shrink three-dimensional manifolds (Rossignac, 2004, pp. 1224-1225). Rossignac goes on to suggest that even better results can be achieved by using vertices chosen by more computationally costly methods, such as comparing vertices for weights
reflecting the likelihood that the vertex would be part of the object's silhouette from a random
viewing angle.
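The following minimal sketch illustrates the uniform-grid clustering Rossignac describes, quantizing three-dimensional vertices to axis-aligned voxels and collapsing each occupied voxel to the centroid of its vertices, which is a simpler representative choice than the silhouette-weighted selection discussed above. Names are illustrative only.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VoxelVertexClustering {

    /** Quantizes vertices (x, y, z) to a uniform voxel grid and collapses each
     *  occupied voxel to the centroid of the vertices it contains. */
    public static List<double[]> cluster(List<double[]> vertices, double voxelSize) {
        Map<String, List<double[]>> buckets = new HashMap<String, List<double[]>>();
        for (double[] v : vertices) {
            String key = (int) Math.floor(v[0] / voxelSize) + ","
                       + (int) Math.floor(v[1] / voxelSize) + ","
                       + (int) Math.floor(v[2] / voxelSize);
            List<double[]> bucket = buckets.get(key);
            if (bucket == null) {
                bucket = new ArrayList<double[]>();
                buckets.put(key, bucket);
            }
            bucket.add(v);
        }
        List<double[]> representatives = new ArrayList<double[]>();
        for (List<double[]> bucket : buckets.values()) {
            double x = 0.0, y = 0.0, z = 0.0;
            for (double[] v : bucket) { x += v[0]; y += v[1]; z += v[2]; }
            int n = bucket.size();
            representatives.add(new double[] { x / n, y / n, z / n });
        }
        return representatives;
    }
}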
Mesh simplification has been suggested in cartography before: Burghardt and Cecconi
(2007) have suggested it be used for building generalization.
Mesh simplification and vertex clustering both depend on the tessellation of space. This
review now shifts to issues specific to the use of tessellations as means of sampling signal;
concepts discussed are pertinent to the hexagonal clustering algorithm presented in this thesis.
The Hausdorff distance is then discussed as an objective means of evaluating signal sampled
using tessellation schemes.
Hexagonal and Square Tessellations Applied to Pattern Analysis and Generalization
The benefits peculiar to data models based on tessellations of the plane are well known,
and the familiar square-pixel raster data model is probably the best known implementation.
Applications of data models of this type have been widespread in fields such as pattern analysis
and computer vision, but have rarely been considered in cartographic line simplification; the exceptions include Dutton (1999), Li and Openshaw (1992), and Zhan and Buttenfield (1996).
Uniform tiling (i.e., regular tessellations) is frequently used to both sense and represent
planar data of various kinds (e.g., Landsat images). Inherent to the creation of a tiled
representation is a process of quantization; a single measurement is recorded in each cell of a
sampling mesh, thereby generalizing what is potentially infinitely-differentiable signal.
Naturally, the quantization in a uniform mesh is a function of the signal and the geometry —size,
shape, topology, orientation— of the cell (excluding matters of measurement precision). So long
as cells are arranged in a true tessellation (i.e., without gaps or overlaps), and so long as the mesh
spans the whole extent of the signal in question, geometric intersections between the signal and
the sampling mesh will always exist. This is to say there cannot be degenerate cases, such as
points beyond the sampling grid, or points falling between sampling mesh elements (Akman,
Franklin, Kankanhalli, & Narayanaswami, 1989). While it is true that there is a loss of the exact,
real data points as they are quantized to the locations of grid pixels (Kamgar-Parsi, Kamgar-Parsi,
& Sander, 1989, p. 604), this loss is always bounded by the mesh (i.e., sampling) resolution. It is
important to keep in mind that in the objective to generalize a complex signal, data loss is actually
a requirement (as it is, for example, in cartographic generalization).
As mentioned previously, quantization is a function of sampling geometry. A large body
of literature on pattern analysis examines the relative merits of the three possible regular
tessellations of the plane, being the triangular, rectangular, and hexagonal (Figure 2.7). The
triangular is rarely considered for this purpose, as there is inherent orientation variability in that
geometry, making measurements across pixels more complex than those of the rectangular and
hexagonal tessellations. Regarding the issue of element orientation, most literature discusses
squares and equilateral hexagons, though variations in pixel dimensions are considered as well, usually for optimization in specialized applications (e.g., Iftekharuddin & Karim, 1993; Kamgar-Parsi et al., 1989, p. 609; Scholten & Wilson, 1983).
Figure 2.7 - The three possible regular tessellations of the plane. (Source: Peuquet, 2002)
Overwhelmingly, the literature indicates that hexagonal sampling meshes perform more
efficiently, with less error, and with more meaningful inter-element connectivity than square
meshes (Birch, Oom, & Beecham, 2007; Carr, Olsen, & White, 1992; Condat, Van De Ville, &
Blu, 2005; Duff et al., 1973; Graham, 1990; Iftekharuddin & Karim, 1993; Mersereau, 1978,
1979; Nell, 1989; Puu, 2005; Scholten & Wilson, 1983; Weed & Polge, 1984; Yajima, Goodsell,
Ichida, & Hiraishi, 1981). Regardless of this virtual consensus, square pixel data models remain
dominant in practices using regular grids, examples of which include common digital graphics
formats (such as standard screen pixels and image file types), climate and ecology models, and
GIS raster modeling. Graham (1990, p. 56) has suggested that early pioneering work in computer
spatial modeling by Unger (1958) using square pixels may have set a decisive precedent, and the
ease with which square pixels may be indexed by uniform Cartesian coordinates is a quality that makes them attractive (Birch et al., 2007, p. 354). It also seems probable that the popular
adoption of square pixels was influenced by available hardware, as the early devices that were engineered and made available used square meshes (e.g., cartographic digitizing tablets).
Connectivity between cells is one of the most convincing reasons why hexagons are
frequently regarded as more suited to sampling planar signal than squares. If the neighbors of a
cell are considered to be those that contact the cell by either an edge or a corner, then it is seen
that triangles have 12 neighbors, squares 8, and hexagons 6. Table 2.1 summarizes the
comparative distances between neighboring tesserae.
Shape       Number of neighbors     Distance between neighbors       Cell area
Triangle    12                      (1/√3)C, C, or (2/√3)C           (√3/4)C²
Square      8                       C or √2C                         C²
Hexagon     6                       √3C                              (√3/2)C²

C = length of side of cell.
Table 2.1 - Distances and areas for different regular tessellation geometries. (Source: Duff et al.,
1973, p. 245)
It is readily apparent that the only shape with a consistent distance to its neighbors is the
hexagon. Furthermore, connectivity to neighbors is defined exclusively by edge contact, meaning
that the spatial relationship between one tessera and its neighbor always has a consistent spatial
meaning for hexagons; this is untrue both for triangles, which have edge connectivity and two
orientations of corner connectivity, and for squares which have edge connectivity (i.e., four
neighbors in a von Neumann neighborhood of range = 1, each at a distance equal to the square's
side) and corner connectivity (i.e., four additional neighbors at diagonals, each at a distance equal
to √2 times the square's side). Because hexagons can neighbor each other exclusively by sharing
common edges, they evade the connectivity paradox that occurs in triangle and square arrays
when connectivity by corners is permitted (Figure 2.8). Thus, connectivity between hexagonal
cells is better defined than for square cells (Yajima et al., 1981, p. 223). Further, when sampling
a linear signal, hexagonal sampling error is less sensitive to line orientation than square sampling error, because
the six-fold radial symmetry of hexagons is more isotropic than the four-fold symmetry of
squares (Kamgar-Parsi et al., 1989, p. 609; Mersereau, 1979, p. 932).
Figure 2.8 - Connectivity paradox; in triangles and squares,
whether or not regions A and B are connected by the
corners of cells l and m is unclear, as is whether or not gray
cells form a continuous region across cells p and q. There
is no such ambiguity in hexagons. (Adapted from source:
Duff, Watson, Fountain, & Shaw, 1973, p. 254)
Pappus of Alexandria (c. 290 - c. 350 CE) proposed the “honeycomb conjecture,”
mathematically proven only quite recently (Hales, 2001). It suggests that regular hexagons are
the most efficient way to tessellate the plane in terms of total perimeter per area covered. A
related property of hexagons in comparison to squares is how closely each shape approximates a
circle (Figure 2.9); since the area of a circle is defined as the locus of points at or within a certain
distance from the circle center (the distance being the circle's radius), a circle is the most compact
shape possible in ℝ2. Any equilateral polygon that covers its circumcircle more completely is a
closer approximation of the circle than another equilateral polygon which covers less. As a corollary, hexagons, because they approximate circles more closely, are more compact than squares. This fact has direct application to any set of point sensors arranged on a plane or similar surface, and can be seen reflected in nature (e.g., most animal vision organs have rods and cones arranged in nearly-regular hexagonal tessellations in the eye's fovea). Essentially, these compactness properties mean that when using either geometry in a tessellated plane array of sensors, the hexagonal array can sample a given planar signal with the same degree of fidelity using fewer tesserae (Condat et al., 2005; Mersereau, 1978; Nell, 1989, pp. 109-110).
Figure 2.9 - An equilateral hexagon and square in their circumcircles. The area of the hexagon is closer to its circumcircle than is the square's to that of its circumcircle. (Source: WolframAlpha.com)
Graham (1990) tested for anisotropic effects in medical images across three tessellations:
a pentagonal approximation of hexagonal tessellation, a non-regular hexagonal grid, and a regular
hexagonal grid. He found that tessellation artifacts in the sensor response were consistently
lowest in the regular grid. He thus recommends the use of regular hexagonal grids for their
superior detection and representation of local variation on a plane.
Beyond applications of Christaller's (1933) classic theory, hexagonal tessellation has
been advocated for thematic cartography by Carr, Olsen and White (2004), and has been used to
study cluster perception in animated maps (Griffin, MacEachren, Hardisty, Steiner, & Li, 2006),
as well as color perception (Brewer, 1996).
Hausdorff Distance
The Hausdorff distance has seen widespread application in computer science, often in
automated pattern matching applications (Alt, Godau, Knauer, & Wenk, 2002; Alt & Guibas,
2000; Huttenlocher, Klanderman, & Rucklidge, 1993; Knauer, Löffler, Scherfenberg, & Wolle,
2009; Llanas, 2005; Rucklidge, 1996, 1997; Veltkamp & Hagedoorn, 2000). Hausdorff distance
has also been used in cartography to both measure generalizations (Hangouët, 1995) and conflate
datasets of differing levels of generalization (Savary & Zeitouni, 2005). It is computationally
efficient, provides a single measure of global spatial difference, and is meaningful on the plane of
any distance-preserving map projection.
Named for Felix Hausdorff (1868 - 1942) and described in some detail by Rucklidge
(1996) and Veltkamp (2001), the Hausdorff distance is a measure of distance between two sets in
a metric space, commonly used in computer science image-matching applications. One kind of
distance commonly used is the L2 metric (i.e., the Euclidean straight-line distance). With two
sets A and B, the directed Hausdorff distance (h) from A to B is expressed as
h(A, B) = sup_{a∈A} inf_{b∈B} d(a, b)
with d(a,b) being the underlying distance. This formula equates the directed Hausdorff distance
from set A to set B to the maximum value (sup, short for supremum) among all the shortest (inf,
short for infimum) distances from any a (i.e., a member of set A) to any b (i.e., a member of set
B); the longer dotted line M in Figure 2.10 illustrates this relationship.
Figure 2.10 - The Hausdorff Distance in ℝ2. Line M represents the longest
distance an element a of all elements A has to go to reach the closest
element b. Line N represents the same, but from B (and all elements b
thereof) to the closest element a. Line M is the directed Hausdorff distance
from A to B, while line N is the directed Hausdorff distance from B to A.
The longer of these two (M) represents the (overall) Hausdorff distance.
(Figure adapted from source:
http://www.mathworks.com/matlabcentral/fileexchange/26738-hausdorffdistance, graphic by Zachary Danziger)
Separate, directed distances between the two sets are required, because the distance in
either direction is not necessarily the same. The directed Hausdorff distance between B and A
(line N, figure 2.10) is given simply by inverted notation:
h(B, A) = sup_{b∈B} inf_{a∈A} d(b, a)
The Hausdorff distance (H) is the greater of the two directed Hausdorff distances:
H(A, B) = max(h(A, B), h(B, A))
Applying the metric to points, the Hausdorff distance is the farthest away any one point
from either of two sets is from the nearest point of the other set; it is a global measure of the
greatest local difference in position observed between two point sets. If one set is derived from
another, the Hausdorff distance can be considered a measure of deviation. In this manner, the
vertices of input and simplified polylines, as they exist projected on a map (i.e., in ℝ2), are
meaningfully measured for displacement using the Hausdorff distance and the L2 metric.
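The definition translates directly into a brute-force computation over two vertex sets. The sketch below (class and method names are illustrative only) evaluates both directed distances under the L2 metric and returns their maximum.

import java.util.List;

public class HausdorffDistance {

    /** Discrete Hausdorff distance between two point sets: the maximum of the
     *  two directed distances. */
    public static double hausdorff(List<double[]> a, List<double[]> b) {
        return Math.max(directed(a, b), directed(b, a));
    }

    /** Directed Hausdorff distance: the largest of the nearest-neighbour
     *  distances from each point of 'from' to the set 'to'. */
    private static double directed(List<double[]> from, List<double[]> to) {
        double sup = 0.0;
        for (double[] p : from) {
            double inf = Double.POSITIVE_INFINITY;
            for (double[] q : to) {
                double d = Math.hypot(p[0] - q[0], p[1] - q[1]);
                if (d < inf) inf = d;
            }
            if (inf > sup) sup = inf;
        }
        return sup;
    }
}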
Mathematically speaking, the Hausdorff distance qualifies as a true metric because it
satisfies the properties outlined in Table 2.2.
Nonnegativity: d(A, B) ≥ 0. The distance between sets A and B will be zero or greater.
Identity: d(A, A) = 0. There is no distance between an element of set A and itself, so that the distance from A to itself is zero.
Uniqueness: d(A, B) = 0 if and only if A = B. There is zero distance between two sets if and only if the two sets are equal.
Triangle Inequality: d(A, B) + d(A, C) ≥ d(B, C). The sum of the distances from A to B and from A to C must be greater than or equal to the distance between B and C.
Table 2.2 - Required properties of a true mathematical metric. (Source: Veltkamp & Hagedoorn,
2000, p. 468)
The last condition, triangle inequality, is a particularly crucial property of mathematical distance
measures; it is important when using a metric to compare patterns, since without it, it would be possible to have A very similar (i.e., close) to B, and B very similar to C, while A is nevertheless very dissimilar to C (Arkin, Chew, Huttenlocher, Kedem, & Mitchell, 1991, p. 209).
The Hausdorff Distance vs. McMaster's Measures of Simplified Lines
In the context of cartographic line generalization measurement, McMaster has asserted
that six measures of a generalized line against its original correlate are useful, having narrowed
the list from a set of thirty (1986, p. 115):
1. percent change in the number of coordinates;
2. percent change in the standard deviation of the number of coordinates per map
unit;
3. percent change in angularity;
4. total vector displacement per map unit;
5. total areal displacement per map unit;
6. percent change in number of curvilinear segments (lengths in which all angles at
vertices are either positive or negative).
Three of McMaster's measures (1, 2, and by corollary 6, above) are related to the reduction of vertices in a polyline, an objective he believes is integral to cartographic line simplification.
The present author does not share this view, and instead believes that the morphological
simplification of a line as scale changes is of principal importance, rather than the number of
vertices used in the polyline to represent the feature. While lines will tend strongly to go from
higher to lower angularity with increasing levels of simplification, changes in angularity (3
above) are not absolutely reliable. It is conceivable, for example, that a rounded promontory at
one scale may be represented by a relatively more angular, smaller bump at a smaller scale,
increasing total line angularity (all other lengths of the line remaining equally angular).
Displacement measures are important, particularly in topographic mapping contexts, since they
relate to positional accuracy. Total vector and areal displacement (4 and 5) are difficult to relate
to the positional accuracy of a generalized line in its totality, since the displacement along
different lengths of the line may be variable, and separate computation would be required to find
where that displacement existed and in what quantities.
It is instead suggested that Hausdorff distance be used for measuring the relative
displacement of cartographic lines after simplification. Since the Hausdorff distance provides a
single number that represents the furthest distance any one element of one set is from an element
of the other set, it can be applied to the vertices of an initial polyline and its simplified polyline,
thereby describing the greatest existing displacement between the two lines. This value seems a
sensible measure for the relative positional deviation between the lines. Also, if the input line is
taken to be authoritatively “correct” in position, this value describes the “error” of displacement
introduced by a simplification. Further details on this reasoning are given in the next chapter,
where the application of the Hausdorff distance in this research is explained.
Summary
This literature review has illustrated several approaches cartographers have taken in
seeking to automate line simplification, and contrasted these to some of the approaches taken in
other cognate fields, such as signal processing. It is seen that among cartographers the goals of
line simplification are not unanimously agreed upon, with particular disagreement over whether simplification should retain a subset of the input vertices, or should instead seek a simplified correlate line without much concern for the particular vertices that make up its
form. Cartographers, also, have frequently pursued methods that lack scale-specificity, and
discussion of simplification for particular target scales is curiously rare in the literature. Finally,
there does not exist consensus among cartographers as to how simplified lines should be
evaluated.
The work presented in the following chapters represents a cartographic effort with the
goal of scale-specificity, and from the viewpoint that appropriate cartographic lines for target
scales are what should be sought, rather than the retention of certain vertices from the input line.
Chapter 3
The Hexagonal Quantization Algorithm and Study Methods
Overview of the Algorithm
The hexagonal quantization algorithm uses a vertex clustering technique to alter the set of
points that define a map polyline (Figure 3.1). The basic concept of the vertex clustering method
is to impose a tessellation on the form to be simplified, and to reduce the number of vertices
falling within each tessera to a single vertex, the latter process being known in the field of signal processing as quantization.
Figure 3.1 - The hexagonal quantization algorithm. In each hexagon, the input vertices (gray) are quantized to a single output vertex (black), resulting in a simplified output line (in black).
Vertex clustering works either for polyhedra and computer-graphics manifolds in ℝ3, using three-dimensional tessellations such as voxels, or for polygons, polylines, and point sets in ℝ2, using a two-dimensional tessellation. Tessellations used can either be regular,
having tesserae of equal shape and dimensions, or irregular, depending on the intended application
of the method. Vertex clustering has been in use for some time now in computer graphics
applications (Dalmau, 2004; Rossignac, 2004; Yang and Chuang, 2003), but is relatively new to
cartographic data transformation.
The algorithm operates across scale, and within a given resolution. Scale is understood in
the traditional cartographic sense, and is expressed as a ratio indicating the magnitude of
reduction at which the representation exists from the real feature. Resolution is understood to be
the level of representational detail possible (and by corollary, level of visual detail discernable)
on the target map, and is expressed by the size of the smallest possible graphic on the map (e.g.,
0.25 mm, or the width of a pixel). To determine the level of simplification necessary, the
hexagonal quantization algorithm ingests the target scale, in the form of a ratio denominator, and
a resolution, in the form of a line weight. These determine the size of hexagons used in the
sampling tessellation, which reflect the resolution at which the output line may be shown to
change direction between vertices on the target map. Details regarding the tessera width
calculation are given later in this chapter.
The hexagonal quantization algorithm begins with map data as it exists after projection;
that is, the data is considered and computed in the form of two-dimensional coordinates lying on
a Euclidean plane. There is considerable basis for investigating a vertex clustering approach to
line simplification using spherical tessellations and spherical geometry (i.e., angular latitude and
longitude coordinates and geodetic surfaces), but this is outside the scope of the present research.
One consideration in applying tessellated vertex clustering to three-dimensional surfaces is the
impossibility of a regular spherical tessellation in Euclidean geometry; semi-regular tessellations,
such as the alternation of hexagons and pentagons seen on a common soccer ball, are possible,
but violate the desired quality of equipotential line vertex sampling because of inconstant tessera size and orientation.
Regular hexagonal tessellations are used in the algorithm; all hexagons have the same
dimensions, angles, and orientation. The algorithm specifically uses equilateral hexagons (i.e., all
interior angles being 120° and all sides of equal length). Upon computing a desired resolution for the
tessellation from a target scale and line weight, hexagons are drawn according to their desired
“width,” being the perpendicular distance between two opposing sides (Figure 3.2). The
hexagons produced by the algorithm are all oriented such that two sides (i.e. the “top” and
“bottom” sides) are oriented perpendicular to true north.
Figure 3.2 - Hexagon width (i.e., tessera
resolution).
It is possible to rotate the tessellation through a range of 60° to achieve different hexagon
orientations (Figure 3.3). It is intuitively understood that differences in orientation would
produce differences in the sets of simplified line vertices produced by the algorithm, and thereby
produce differences in distance measured between the input and simplified lines. Similarly,
different positions of the tessellated grid over the input line would also produce differently-clustered output vertices (Figure 3.4). These variations are not tested in the present research, and
will comprise future research and development of the algorithm.
Figure 3.3 - Sixty-degree range of rotation for regular hexagonal tessellations.
Figure 3.4 - The effect on output lines caused by shifting the tesserae. Input
vertices and lines are in gray, and output vertices and lines are in red.
Tessellation and Polyline Structure
The tessellations used by the algorithm provide both a sampling strategy and a structure
upon which to construct simplified product lines. Regular tessellations are chosen over irregular
tessellations on the basis of equipotential sampling of the input polyline vertices.
The vertices of an input line are presumed to be points on a Euclidean plane (ℝ2) defined by Cartesian coordinates. The coordinate values of these points correspond to any two-dimensional coordinate system associated with a map projection, such as the eastings and
northings of the UTM or U.S. State Plane systems. By this definition, the vertices of the input
line are free to exist at any point within the bounding box defined by the maximum and minimum
x and maximum and minimum y values among the set of points defining the line.
Because the points can exist anywhere in this bounding box, the algorithm
implementations in this thesis begin with the assumption that any sampling window (i.e., tessera)
placed anywhere within the bounding box is equally likely to intersect with one or more points
as it is at any other position within the area. This holds true so long as the area of the sampling
window (i.e., tessera) is constant; it would be false if the area were variable, with likelihood of
intersection increasing as sample window area increases.
While an equipotential sampling approach is taken, it is understood that vertices along a
polyline defining a feature such as a river are not randomly placed, but are patterned to model a
real landscape feature. This holds true in both the case of data digitized by some mechanical
sampling method (e.g., digitizing tablets recording a point every second), as well as data where
each vertex was placed deliberately, since in both cases the vertices are placed along the linear
feature being modeled. To sample the variability in polyline direction changes by way of tessera
sampling windows, it is important that the orientation of the tesserae remain constant, so that any
measures of direction can be consistently made against a common tessellation layout. Each
straight line segment between vertices has its own orientation, and there is likely to be high
variability among those orientations across all the line segments in the polyline, particularly in the
sinuous polylines of rivers or coastline features. Because there are usually many line segments
constituting a polyline, there are many instances at which the polyline changes direction and wide
variation in the degree to which it does so. A common tessellation layout throughout allows the
variability in direction change to be consistently sampled.
In considering the variation in direction throughout any given map polyline and
observing that constant sampling orientation be maintained, another quality of sampling
tessellation geometry becomes desirable: equidistance to all immediate neighbors. Applying
tessellations as schemes for sampling plane surfaces where signal can be distributed freely across
the plane, the quality of equidistance to all immediate neighbors in each tessera translates to
regular and uniform sampling. This is desirable, since non-uniform sampling of areal point
features can introduce geometric artifacts into the set of detections which do not reflect the real
nature of the signal. As was noted in the preceding chapter, of the three possible regular
tessellations of the Euclidean plane, only hexagons maintain equidistance to all immediate
neighbors, a quality described by the term radial symmetry.
Steps of the Hexagonal Quantization Algorithm
The following section describes the three essential stages in the hexagonal quantization
algorithm: (1) the calculation of tessellation resolution, (2) the layout of the hexagons over the
input line, and (3) the vertex clustering procedure.
Calculation of Tessellation Resolution
The algorithm must first determine the dimensions of the hexagons to be used from user
input parameters. As with the Li-Openshaw raster-vector algorithm, the hexagonal quantization
algorithm achieves scale-specificity by sizing tessera according to a mathematical relation with
target scale. Li and Openshaw (1992, p. 378) suggest calculation of the diameter of their smallest
visible object (SVO) in relation to the input data scale as well as the target scale and symbol
width. In contrast, the method of calculating tessera resolution in this research considers target
scale and map resolution (i.e., symbol width) alone as definitive of appropriate resolution. This
approach is based on map resolution as described by Tobler (1987), who draws on notions from
sampling theory (Nyquist, 1928; Shannon, 1948). Tobler defines the resolution of a map to be
half the size of the smallest detectable feature on the map (1987, p. 42). He considers the
smallest mark a cartographer can make on the map, calculates the ground distance that size
represents at the map scale, and takes that value as the map resolution. That resolution is
understood to be sufficient for detecting (or representing) objects twice the size. Elaborating on
that reasoning, he offers an adjustment to compensate for inconsistencies in data sampling (p. 44):
From sampling theory it is known that the detection of a feature is only possible
if the sampling rate is twice as fine as the size of the feature to be detected ...
Since observations are never perfect, the better rule of thumb is to use a sampling
interval one fifth the size of the feature to be detected.
Building directly from Tobler's ideas, the hexagonal algorithm takes two input
parameters: the target scale for the product simplified line, and the line weight (i.e., symbol
thickness) at which the product line will be drawn. From these two values the tessellation
resolution r is derived using the following simple formula:
r = 5(l)(s)
where l is line weight and s is the target scale denominator. Units used throughout the calculation
should be those desired for defining the real world width of tessera (e.g., using meters, a line
weight of 0.5 mm [0.0005 m] and target scale of 1:250,000 yields r = 625 m).
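The calculation reduces to a one-line function; the sketch below (names are illustrative only) reproduces the worked example, with the line weight and the returned tessera width expressed in the same ground unit.

public class TesseraResolution {
    /** r = 5 * l * s, with lineWeight in the desired ground unit and
     *  scaleDenominator the target-scale denominator. */
    public static double resolution(double lineWeight, double scaleDenominator) {
        return 5.0 * lineWeight * scaleDenominator;
    }

    public static void main(String[] args) {
        // 0.5 mm (0.0005 m) line weight at 1:250,000 yields 625 m, as in the example above.
        System.out.println(resolution(0.0005, 250000));
    }
}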
In all cases in this research, a line weight of 0.25 mm was used. This value was chosen to
reflect the resolutions of modern topographic paper map printing, as well as today's high pixel
density displays (such as smart phone displays, which can exceed 240 ppi).
Tessellation Layout
The algorithm next computes the bounding box of the line to be simplified by identifying
the maximum and minimum x and y values of the vertices that make up the line. It then proceeds
to completely cover the area of the bounding box with hexagons of width r. This is done with
overlap around all four bounding box edges, to ensure that no points near the edges of the box fail
to intersect with a hexagon. The first hexagon is drawn at the north-west corner of the bounding
box, taking the corner as its center and defining its six corner points around that center. A
column of hexagons is then drawn south of this first hexagon until the southern edge of the
bounding box has been crossed. A new column is then defined immediately east of the first,
staggered to the south by half the value of r. All new hexagons borrow the exact x and y values
of corner vertices from pre-existing neighbors in order to ensure that no “sliver” gaps or overlaps
occur in the tessellation as a result of minute computer rounding errors. New columns are
defined until the eastern edge of the bounding box has been completely crossed. The process is
illustrated in Figure 3.5.
Figure 3.5 - Layout of hexagons using the bounding box delimiting the line. The
hexagon in the north-west corner is drawn centered on the bounding box corner
first, with hexagons below it drawn to follow. The second “column” of
hexagons to the east is drawn next, and the process continues until the bounding
box is completely covered by a hexagon on all sides.
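The layout can be sketched as follows, assuming flat-topped equilateral hexagons of width r (and therefore side length r/√3), column centres spaced 1.5 side lengths apart, and alternate columns staggered south by r/2, consistent with the description above. The exact starting position and amount of overlap differ from the implementation described later in this chapter, and all names are illustrative only.

import java.util.ArrayList;
import java.util.List;

public class HexagonLayout {

    /** Returns the centre coordinates of hexagons covering the bounding box,
     *  with extra columns and rows of overlap on every side. */
    public static List<double[]> centres(double minX, double minY,
                                         double maxX, double maxY, double r) {
        double side = r / Math.sqrt(3.0);   // hexagon side length
        double colSpacing = 1.5 * side;     // horizontal distance between column centres
        List<double[]> centres = new ArrayList<double[]>();
        int col = 0;
        for (double x = minX - colSpacing; x <= maxX + colSpacing; x += colSpacing, col++) {
            double stagger = (col % 2 == 0) ? 0.0 : r / 2.0;  // alternate columns shift south
            for (double y = maxY + r - stagger; y >= minY - r; y -= r) {
                centres.add(new double[] { x, y });
            }
        }
        return centres;
    }

    /** The six corner points of a flat-topped hexagon centred at (cx, cy). */
    public static double[][] corners(double cx, double cy, double r) {
        double side = r / Math.sqrt(3.0);
        double[][] pts = new double[6][2];
        for (int i = 0; i < 6; i++) {
            double angle = Math.toRadians(60.0 * i);
            pts[i][0] = cx + side * Math.cos(angle);
            pts[i][1] = cy + side * Math.sin(angle);
        }
        return pts;
    }
}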
Vertex Clustering and Quantization
Upon tessellation layout, the algorithm iterates through the vertices of the input line.
Starting with the first vertex, the single hexagon with which intersection occurs is identified.
Each subsequent vertex also identifies which hexagon it intersects with. If that hexagon is the
same as that intersected by the previous vertex, the current vertex is added to a current collection
of vertices pertaining to a single cluster. If the hexagon is different from that intersected by the
previous vertex, the previous collection of vertices is considered closed and a new collection is
begun with the current vertex. In this manner, a single hexagon may have more than one
collection (i.e., cluster of vertices) defined within itself, depending on how many times the input
line passes through it and places vertices in it (Figure 3.6). Many hexagons, especially at scales closer to that of the input data, will have only one pass of the input line through them, but because multiple passes are common, it is important to handle the cases in which more than one occurs. As clusters are defined, they are stored in a sequential array.
Figure 3.6 - Constructing an output vertex (orange) for each pass (first in red, second in blue) of the input line through the hexagon.
Once the whole line has been considered upon iterating through all vertices and every
line vertex has been assigned to a cluster, the algorithm implements the collapse of each cluster to
a single vertex (i.e., it quantizes each cluster). As Li (2007, pp. 152-153) notes, this can be done
by an almost infinite number of methods (i.e., any point within the hexagon can be used to
represent all points contributing to that cluster). This research considers two vertex clustering
methods. Each method represents a distinct means of quantizing the vertices in a given tessera.
These are made available to the user as options, chosen before the algorithm is run. The
methods are illustrated in Figure 3.7, and described as follows:
• the midpoint of a line segment drawn between the first and last vertices in a cluster;
• the spatial mean of the vertices in a cluster.
In both choices, the case of a single-vertex cluster quantizes to the unmoved vertex itself.
Finally, once all clusters have been quantized, their product single points are strung
together in sequence to produce the output simplified line.
Figure 3.7 - The two clustering methods used in this research. The midpoint of
the first and last vertices method is illustrated on the left, while the spatial mean
of vertices is illustrated on the right.
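The clustering and quantization steps just described can be sketched as follows. The point-in-hexagon lookup is assumed to be supplied by the tessellation layout and is left unimplemented here; this is an illustrative sketch rather than the thesis implementation, and all names are illustrative only.

import java.util.ArrayList;
import java.util.List;

public class HexagonalQuantization {

    /** Identifies the tessera containing a point; implementation is left to the tessellation. */
    public interface HexagonIndex {
        Object cellOf(double x, double y);
    }

    public static List<double[]> simplify(List<double[]> input, HexagonIndex index, boolean useSpatialMean) {
        // Consecutive vertices in the same hexagon form a cluster; a new cluster is
        // opened whenever the line moves to a different hexagon, so a hexagon crossed
        // twice yields two clusters.
        List<List<double[]>> clusters = new ArrayList<List<double[]>>();
        Object currentCell = null;
        List<double[]> current = null;
        for (double[] v : input) {
            Object cell = index.cellOf(v[0], v[1]);
            if (current == null || !cell.equals(currentCell)) {
                current = new ArrayList<double[]>();
                clusters.add(current);
                currentCell = cell;
            }
            current.add(v);
        }
        // Quantize each cluster to a single output vertex and string the results together.
        List<double[]> output = new ArrayList<double[]>();
        for (List<double[]> cluster : clusters) {
            output.add(useSpatialMean ? spatialMean(cluster) : firstLastMidpoint(cluster));
        }
        return output;
    }

    /** Midpoint of the segment joining the first and last vertices of the cluster;
     *  a single-vertex cluster quantizes to the vertex itself. */
    private static double[] firstLastMidpoint(List<double[]> c) {
        double[] a = c.get(0);
        double[] b = c.get(c.size() - 1);
        return new double[] { (a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0 };
    }

    /** Spatial mean (centroid) of all vertices in the cluster. */
    private static double[] spatialMean(List<double[]> c) {
        double x = 0.0, y = 0.0;
        for (double[] v : c) { x += v[0]; y += v[1]; }
        return new double[] { x / c.size(), y / c.size() };
    }
}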
Clustering Routine Compared to Li and Openshaw's Suggestion
This research departs from the Li-Openshaw raster-vector algorithm in an important way in addressing instances in which a line loops through a tessera more than once. Li
(2007, p. 154) writes, “If there is more than one intersection, the first (from the inlet direction)
and the last (from the outlet direction) intersections are used to determine the position of the new
point.” This is illustrated in Figure 3.8. This strategy cuts off any portions of the line lying
outside the tessera between the inlet and outlet points in question, and it thereby guarantees that
no line self-intersections can occur in the product line, since the output line will always progress
from one raster cell to the next without risk of curving back on itself. However, if any important
line features exist between the inlet and outlet vertices in question, these will be deleted by the
strategy (Figure 3.9).
Figure 3.8 - Li's suggested solution for single vertex selection within cells with multiple passes of
the input line; see cell at top, center. (Source: Li, 2007, p. 153)
The hexagonal quantization algorithm instead places a collapsed vertex inside each
tessera for each pass of the line through it (Figure 3.6). This permits all line segments to be
represented, though it also reintroduces the possibility of line self-intersection. This is a
particularly important property for the hexagonal quantization algorithm in this research, since
omissions of line segments based on upstream vertex clustering would problematically skew
observed Hausdorff distances between input and product lines. Though self-intersections are
observed to be rare, they are fundamental problems that must be resolved. While
self-intersections are not solved in this thesis, a method for their repair as a post-processing routine
has been devised, and is under development by the author (further details are given in the
Conclusions and Further Work chapter).
Figure 3.9 - An effect of Li's suggested method of selecting
single vertices in a cell with multiple input line passes. In
this example, the application of Li’s suggestion at the
tessera overlapping the peninsula’s connection to the
mainland would cause the entire peninsula to be deleted,
whereas a representation of it could be retained at this cell
resolution (i.e., target scale).
In addition to the development of the hexagonal quantization algorithm, this study has
also implemented the Li-Openshaw raster-vector algorithm. To allow for comparison across
geometries, both the hexagonal quantization and Li-Openshaw raster-vector algorithms are
implemented and run with tessera resolution derived by the formula given above (i.e., the
Li-Openshaw square cell size is not calculated using Li and Openshaw's SVO estimation formula).
There are two reasons for this. First and most importantly, maintaining like tessera “width”
allows for direct comparability between squares and hexagons. While it was considered that
squares and hexagons of equal area should be used, equal “width” was deemed more appropriate.
This was because width, rather than area, plays a definitive role in placing output polyline
vertices at such distances as have been determined to be visually resolvable. Second, the formula
developed here is based on map resolution at target scale, and does not require the input data
scale as a parameter, whereas the formula given by Li and Openshaw does. Li and Openshaw's
formula parameterizes their product lines by a scale-differential similar to that proposed by
Töpfer and Pillewizer (1966). However, not requiring an input scale parameter offers advantages
in that input data of variable or uncertain vertex resolution can be used, and error in data that are
maintained to inconsistent resolutions (often caused by inconsistent digitization) is not
propagated to the output line.
Also, the Li-Openshaw algorithm is implemented using the same tessellation and vertex
clustering methods described above.
Implementation
The algorithm was implemented using a mixture of tools in Esri's ArcGIS and software
custom-written in Java (version 6). Input lines were first loaded from Esri shapefile data in
ArcMap, and projected to the appropriate UTM zone, using the North American 1983 datum.
The Esri geoprocessing tool “Dissolve” was used to reduce sample lines to single polylines,
so that each vertex along the whole line could later be stored in a single, ordered data array. These
lines were then reduced to their vertices using the ArcGIS geoprocessing tool “Feature Vertices to
Points”. Two new columns were added to the attribute tables of these lines, one for Eastings and
another for Northings; these were subsequently calculated, in meters, from the UTM projection.
These attribute tables were then exported to csv files.
All of the tessellation and vertex collapse processes were handled by the custom-written
Java software. This software was designed with a graphical user interface, or GUI (Figure 3.10).
The interface permitted the selection of input csv line files; specification of output csv files;
selection of hexagonal, square, and Hausdorff distance calculation routines; selection of vertex
clustering methods; and specification of input parameters. The GUI also produced textual reports
on algorithm runs, and enabled the saving of these reports to txt files.
Both the hexagonal and square algorithms operated by accepting arrays of custom-written
objects of type Point as the vertices of an input line. These were read from the csv files exported
Figure 3.10 - A screen shot of the graphical user interface of the
software developed to implement the algorithms and the calculation of
Hausdorff distances.
from ArcMap. Using the input parameters specified by the user, the algorithms called on various
routines to lay out tessellations and perform the vertex clustering according to the methods
described earlier, as well as save their outputs to new user-specified csv files. The output files
took the form of basic csv tables, where each record represented a vertex along the simplified
line. Each record was attributed with three pieces of data: its number in the sequence of vertices
along the output line, and its easting and northing coordinates in meters.
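To illustrate the record layout just described, a minimal Java sketch of writing such a file is given below; the header row and formatting are illustrative assumptions, not the exact format of the thesis software.

// Minimal sketch of writing one output csv record per simplified-line vertex: its sequence
// number along the line, then its easting and northing in meters. Illustrative only.
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;

class CsvOutputSketch {

    static void writeVertices(String path, List<double[]> vertices) throws IOException {
        try (PrintWriter out = new PrintWriter(path)) {
            out.println("sequence,easting_m,northing_m");   // hypothetical header row
            for (int i = 0; i < vertices.size(); i++) {
                out.printf("%d,%.3f,%.3f%n", i + 1, vertices.get(i)[0], vertices.get(i)[1]);
            }
        }
    }
}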
Output csv files were then loaded into ArcMap, and x,y plotting was used to draw the
vertices in map space. A public-domain script written by David Wynne called “Points to Line”
(available for download from http://arcscripts.esri.com/details.asp?dbid=15945) was then used to
string the plotted vertices together in the sequence defined in their attribute values, and to save
the product lines in Esri shapefile format.
The Java portion of the implementation used in this research was designed to permit
sequential running of both the hexagonal quantization algorithm and the implementation of the
Li-Openshaw raster-vector algorithm using the same input parameters. In this manner, it was
possible to couple both algorithms, each time using the same input file and input parameters (line
weight and target scale), and each time calculating the Hausdorff distances between input and
output vertices. Thus, with each run, it was possible to produce two output simplified lines, one
by either algorithm, with hexagon or square width being identical across both shapes. Thus, the
products of a vertex clustering method using hexagons of width x could be compared to those of
the method using squares of side-length x. To keep related hexagon and square products
associated, a file naming convention was adopted that contained the line name, the collapse
method used, the shape used, and the scale to which the line was simplified (e.g.,
"NovaScotia_C_MpH_250k.csv" indicated the coast of Nova Scotia, collapsed using the midpoint of the 1st and last vertices in a hexagonal tessera, simplified to 1:250,000, being a hexagon
width of 312.5 meters). Also, coupled files were identified by name and algorithm parameters
stated in each output text report produced by the Java software (an example of one of these is
provided in Appendix B).
Sample Lines
Thirty-four sample lines were used in this study. A sample size of 34 was chosen for two
general reasons: first, when all lines would be considered across one given scale and algorithm
processing iteration, there would be a sufficient sample size (i.e., 30 or greater) to expect a
Gaussian distribution, making the use of parametric statistical analyses more likely to be
appropriate. Second, 34 lines, when processed once for each algorithm, each vertex clustering
method, and each scale, came to a total of 952 lines, a number which seemed both reasonable and
manageable.
All lines used in this study are portions of coastlines and rivers from Canada or the
United States. American lines were taken from the “high resolution,” 1:24,000 USGS National
Hydrography Dataset (NHD) (Simley & Carswell Jr., 2009). NHD data were downloaded using
the USGS National Map Viewer (http://viewer.nationalmap.gov/viewer/). Canadian data were
taken from the National Hydro Network (NHN), maintained by the Canadian Council on
Geomatics, and drawn from geospatial data collected by both federal and provincial or territorial
governments. NHN data are produced to varying scales from 1:10,000 to 1:50,000, and are
provided to the largest scale available in any given area (Geomatics Canada, 2010, p. 6);
Canadian lines were carefully chosen to be of larger rather than smaller scales.
All lines were sampled from larger downloaded data sets. Each line was clipped from a
larger river line or coastline such that the straight-line distance from beginning to end points was
within 15 to 20 km. Lines were selected to have a wide variety of complexities. Also, lines were
selected to represent a range of geomorphologic river and coast types (Trenhaile, 2007). Sample
coasts were taken from ice-dominated rocky beaches (e.g., the coast of Killiniq Island, Nunavut),
tidal-dominated coasts (e.g., the shore of the Bay of Fundy, Nova Scotia), a sandy
wave-dominated beach (Myrtle Beach, South Carolina), an estuary shore (e.g., Potomac River,
Virginia), lake shores (e.g., Lake Superior, Ontario), and a river delta (Mississippi River delta,
Louisiana). Rivers were chosen to represent complex and highly sinuous lines that strongly need
simplification at reduced scale (e.g., Humboldt River, Nevada; Sweetwater River, Wyoming; Rio
Grande, Texas), as well as those with relatively straighter courses (e.g., Yukon River, Yukon
Territory; Cedar River, Iowa). All 34 lines used are mapped in Figure 3.11. All lines are also
listed and depicted without simplification in thumbnails in Appendix A.
Figure 3.11 - Locations of the 34 sample lines used in this research. Coast and shore lines are
indicated in italics. (Background hypsometric tint courtesy of Tom Patterson, source:
NaturalEarthData.com)
Experiment Design and Statistical Comparison
Between Hexagonal and Square Outputs
Notes on the Use of Hausdorff Distance
The Hausdorff distance, as explained in Chapter 2, is a metric for measuring the
difference between two sets in a metric space. In this research, the Hausdorff distance using the
Euclidean distance between two points in ℝ² is measured between the sets of input and output
line vertices. Since the output line is generated from the input line, the Hausdorff distance
between the two lines can be taken to reflect a measure of maximum areal deviation of the
simplified line from the input line.
Because the output line is created by a vertex clustering approach within the cells of a
regular tessellation, the dimensions of the tessera provide an absolute upper-bound to the possible
resultant Hausdorff distance (Rossignac, 2004). In other words, the Hausdorff distance cannot
exceed the maximum length possible within a tessera. That distance is the one from one corner to
the opposite corner in the cases of both hexagons and squares. For example, in a hexagon of
“width” (side-to-opposite-side) 100 m, the corner-to-opposite-corner distance is approximately
115.5 m (i.e., 2/√3 times the width). Since this is the largest distance that can fit within the
hexagon, it provides an upper bound to any Hausdorff distances that can result from a
within-hexagon vertex clustering operation. (While it
is possible to shrink squares such that both shapes have equal corner-to-opposite-corner
dimensions and compare the two resulting tessellations, this would not allow for comparison
across the differing geometric connectivity with neighboring tessera between hexagons and
squares of a given resolution.)
Finally, while it is well known that the Hausdorff distance is sensitive to outliers, the
vertex clustering approach undertaken in this research guarantees that no outliers are ever
produced.
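For reference, a brute-force sketch of the discrete Hausdorff distance between two vertex sets is given below; it is illustrative only and is not the thesis software's implementation. Its result is bounded, as noted above, by the corner-to-opposite-corner length of the tessera used.

// Brute-force sketch of the discrete Hausdorff distance between two vertex sets in the plane,
// using Euclidean point distances. Vertices are given as {x, y} arrays; illustrative only.
class HausdorffSketch {

    static double hausdorff(double[][] setA, double[][] setB) {
        return Math.max(directed(setA, setB), directed(setB, setA));
    }

    /** Directed Hausdorff distance: the greatest nearest-neighbour distance from setFrom to setTo. */
    static double directed(double[][] setFrom, double[][] setTo) {
        double worstNearest = 0.0;
        for (double[] a : setFrom) {
            double nearest = Double.POSITIVE_INFINITY;
            for (double[] b : setTo) {
                nearest = Math.min(nearest, Math.hypot(a[0] - b[0], a[1] - b[1]));
            }
            worstNearest = Math.max(worstNearest, nearest);
        }
        return worstNearest;
    }
}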
Experimental Design
To compare the relative displacements caused by the hexagonal and square algorithms, a
randomized block experimental design (Mendenhall, Beaver, & Beaver, 2006) was used. All 34
lines were each simplified by both hexagonal and square algorithms to seven different scales,
chosen to correspond with round-number scales commonly used by national mapping agencies:
• 1:50,000
• 1:100,000
• 1:150,000
• 1:200,000
• 1:250,000
• 1:500,000
• 1:1,000,000
Also, all lines were processed for a target map resolution of 0.7 PostScript points
(equivalent to 0.25 mm), this value having been chosen to reflect common map printing standards
to date.
Hausdorff distances were measured between the vertices of all simplifications and their
input lines. Thus, for each of the seven target scales, 34 Hausdorff distances were recorded for
each of the set of hexagonal simplifications and the set of square simplifications. This entire
process was carried out twice: once for simplifications using the midpoint of a line segment
between the first and last vertices in a tessera as the quantization method, and again for
simplifications taking the spatial mean within a tessera.
SPSS (version 18) and R (version 2.11.0) statistical software packages were used to
analyze all Hausdorff distance data. The means and standard deviation of all sets of Hausdorff
distances across the 34 sample lines were calculated in order to compare relative values across
hexagon-square pairings. All sets of Hausdorff distances were examined using quantile-quantile
(Q-Q) plots for normality. It was observed from these that most data sets exhibited normal
distributions. Thus, the data were subjected to paired-samples T tests at the 95% confidence level
in order to determine whether relative mean values of Hausdorff distances across hexagon and
square simplifications differed significantly. Because some of the Hausdorff distance datasets
deviated substantially from normality, the data sets were also all subjected to nonparametric
related-samples Wilcoxon signed rank tests for comparison of results against the parametric
statistics. Finally, mean Hausdorff distances collected from all 952 simplification runs were
analyzed in a three-way analysis of variance (ANOVA) test to examine for significant effects
from three factors independently, as well as in interaction combinations: algorithm used,
quantization method used, and scale. Results of these statistical analyses are reported in the next
chapter.
Chapter 4
Results and Interpretations
There are two sets of results reported in this chapter, reflecting the cartographic results of
the line simplification algorithms implemented, and the results of the statistics calculated on
differing Hausdorff distances between the hexagonal quantization and Li-Openshaw raster-vector
line simplifications. Interpretations are then offered.
Resulting Line Simplifications: Visual Presentation
Both the hexagonal algorithm and the implementation of the Li-Openshaw raster-vector
algorithm yielded simplified lines. A total of 952 simplified lines were produced from the 34
samples across all iterations of both algorithms, all target scales, and both vertex clustering
methods. For concision, a sample of these lines are presented here; these were chosen by the
author as a representative sample of qualities observed across all of the study’s output lines. All
figures draw output lines at a line weight of 0.7 PostScript points (0.25 mm).
Figure 4.1 illustrates all 34 lines, simplified to 1:500,000 using the hexagonal
quantization algorithm and the midpoint first and last vertices clustering method. This figure
illustrates the general success of the algorithm’s application to a diversity of line forms.
Figure 4.1 - All 34 lines simplified by the hexagonal quantization algorithm to 1:500,000 and drawn to
scale.
Figure 4.2 uses a portion of the coast of Maine to illustrate the output lines of both
algorithms at all seven scales using the spatial mean vertex clustering method in each tessera.
Figure 4.3 does the same, but using the midpoint first and last points vertex clustering for either
algorithm. All lines on both figures are drawn at 1:24,000 with the original 1:24,000 line drawn
in gray in the background. While the Hausdorff distance analyses given later in this chapter
provide quantitative evaluations of difference, these figures are given to allow for visual
comparison of the positional fidelity of either algorithm to a common input line. Figures 4.4 and
4.5 are similar to 4.2 and 4.3; these illustrate the four most extreme scales in the study
(1:200,000; 1:250,000; 1:500,000; and 1:1,000,000) for a complex, curving pair of narrow
peninsulas extending from the Alaskan Peninsula. Again, the products of both algorithms are
drawn at 1:24,000 above the original 1:24,000 line for visual appraisal of relative fidelity.
Using a portion of the coast of Newfoundland, Figures 4.6 and 4.7 together compare the
output of both algorithms, this time with lines drawn to target scale. These figures provide
examples of the performances of either algorithm at the target scales at which they are meant to
be observed. Careful observation of the two figures permits visual comparison of the products of
either algorithm.
Figures 4.8 through 4.10 each provide output lines from both algorithms using both
vertex clustering methods; each figure illustrates one of three locations and one of three separate
scales. These figures provide further material for reader visual inspection.
A discussion of these figures is given under the Interpretations heading of this chapter.
Figure 4.2 - Simplifications of a portion of the coast of Maine produced by both the hexagonal
quantization algorithm (purple) and the Li-Openshaw raster-vector algorithm (green) using the
spatial mean quantization option, against the original line (gray). All lines drawn to 1:24,000.
Figure 4.3 - Simplifications of a portion of the coast of Maine produced by both the hexagonal
quantization algorithm (purple) and the Li-Openshaw raster-vector algorithm (green) using the
midpoint first and last vertices quantization option, against the original line (gray). All lines
drawn to 1:24,000.
Figure 4.4 - Simplifications of a portion of the coast of the Alaskan Peninsula produced by both
the hexagonal quantization algorithm (purple, left) and the Li-Openshaw raster-vector algorithm
(green, right) using the spatial mean quantization option, against the original line (gray). All lines
drawn to 1:24,000.
Figure 4.5 - Simplifications of a portion of the coast of the Alaskan Peninsula produced by both
the hexagonal quantization algorithm (purple, left) and the Li-Openshaw raster-vector algorithm
(green, right) using the midpoint first and last vertices quantization option, against the original
line (gray). All lines drawn to 1:24,000.
Figure 4.6 - Portion of the coast of Newfoundland, simplified to seven target scales by the
hexagonal quantization algorithm using the midpoint first and last vertices quantization option.
Figure 4.7 - Portion of the coast of Newfoundland, simplified to seven target scales by the
Li-Openshaw raster-vector algorithm using the midpoint first and last vertices quantization option.
Figure 4.8 - Portion of the Humboldt River, simplified to 1:150,000 by both algorithms using
both quantization options. The orange box signifies the location of the 1:24,000 segment (at top)
on the simplified lines.
Figure 4.9 - Portion of the Mississippi Delta coastline, simplified to 1:250,000 by both algorithms
using both quantization options.
Figure 4.10 - Portion of the shore of the Potomac River, simplified to 1:500,000 by both
algorithms using both quantization options. The orange box signifies the location of the 1:24,000
segment (at top-center) on the simplified lines.
Statistical Results
Mean Hausdorff Distances
Table 4.1 reports mean Hausdorff distances, in ground meters, between simplified and
input vertices calculated from all sample lines, across both simplification algorithms, at each
target scale.
                               Spatial Means              Midpoint of 1st and last vertices
Scale           Tessera        Hexagons     Squares       Hexagons     Squares
                width (m)
1:50,000        62.5           38.9         39.9          41.0         40.2
1:100,000       125.0          75.0         82.2          78.2         88.4
1:150,000       187.5          110.3        123.7         121.0        127.6
1:200,000       250.0          144.0        156.8         154.4        174.1
1:250,000       312.5          181.5        193.4         216.7        216.6
1:500,000       625.0          354.2        381.3         388.4        410.9
1:1,000,000     1250.0         676.8        701.7         765.1        747.0
Table 4.1 - Mean Hausdorff distances (in ground meters) between simplified and input vertices.
Each mean Hausdorff distance is calculated from n = 34 simplified lines and their related input
lines.
Paired-Samples T Tests
From Table 4.1 it is apparent that mean Hausdorff distances were usually (i.e., 11 of 14
pairs) shorter for hexagonal simplifications than their paired square counterparts. To determine
whether each difference between means was statistically significant, two tests were conducted:
the paired-samples T test for difference in means (a parametric test), and the related samples
Wilcoxon signed-ranks test (a nonparametric test).
Both the T tests and Wilcoxon signed-rank test begin with the null hypothesis (denoted
by H₀) that there is no significant difference between the mean Hausdorff distances produced by
either algorithm. Both tests were conducted to the 95% confidence level (i.e., α = 0.05). The
significance value (i.e., p-value) calculated by either test indicates the probability of observing
mean Hausdorff distances differing as extremely as they do in the data, if the null hypothesis of
no significant difference in means were true. When the significance value for either test is below
0.05, the null hypothesis of equivalent Hausdorff distances is rejected, and there is evidence to
suggest that the differences in mean distance seen in Table 4.1 are due to differences between the
performances of either algorithm.
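For reference, the paired-samples T statistic can be sketched as a simple hand computation over the 34 paired mean Hausdorff distances. The analyses reported in this chapter were run in SPSS and R; the following Java sketch is illustrative only.

// Minimal sketch of the paired-samples T statistic: take the per-line differences between the
// square and hexagonal Hausdorff distances, then t = mean(d) / (sd(d) / sqrt(n)). The resulting
// t is compared against the critical value at df = n - 1. Illustrative only.
class PairedTSketch {

    static double pairedT(double[] hexDistances, double[] squareDistances) {
        int n = hexDistances.length;                         // n = 34 sample lines in this study
        double[] d = new double[n];
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            d[i] = squareDistances[i] - hexDistances[i];     // positive when hexagons lie closer
            sum += d[i];
        }
        double mean = sum / n;
        double sumSquares = 0.0;
        for (double di : d) sumSquares += (di - mean) * (di - mean);
        double sd = Math.sqrt(sumSquares / (n - 1));         // sample standard deviation of differences
        return mean / (sd / Math.sqrt(n));
    }
}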
Figures 4.11 and 4.12 illustrate Q-Q (“quantile-quantile”) plots drawn on all sets of
Hausdorff distance measures. Q-Q plots are used to compare two probability distributions by
plotting their quantiles against each other. If one distribution consists of observed values while
the other consists of theoretically expected values, a Q-Q plot may be used to visually determine
whether or not a dataset is statistically normal, and can thereby be analyzed using parametric
statistical techniques. A normal dataset will lie in a relatively straight line along the y=x line,
indicating that observed values conform closely with those expected for normality. Interpreting
Q-Q plots is not strictly objective, and requires discrimination on the part of the analyst. In
Figure 4.11, for example, the Q-Q plots for hexagons and squares at 1:250,000 are particularly
exemplary of normal datasets, while that for squares at 1:250,000 in Figure 4.12 is not normal.
Figure 4.11 - Quantile-Quantile plots for mean Hausdorff distances across hexagonal and
square samples, using the spatial mean quantization option.
Figure 4.12 - Quantile-Quantile plots for mean Hausdorff distances across
hexagonal and square samples using the midpoint first and last vertices quantization
option.
Having observed that most data sets conformed to normality, it was decided to first
analyze using parametric statistical techniques throughout. For either vertex clustering method,
seven paired-samples T tests were run, one for each target scale. Their calculated statistics are
given in Tables 4.2 through 4.5, each pair of tables corresponding to one of the vertex clustering methods used.
Pearson correlation coefficients are given for each paired sample set (i.e., hexagons and squares
to a given scale), in Tables 4.2 and 4.4. These, when given with a significance value lower than
0.05, describe the degree to which one can predict a relationship between the paired samples (e.g.,
how consistently hexagons will result in smaller Hausdorff distances than squares). These
correlations are important because they describe, across several scales, how consistently one
algorithm may have performed with shorter Hausdorff distances than the other. All Pearson
correlation coefficients in Table 4.2, for example, are significant, while that given for the
1:50,000 comparison in Table 4.4 is not. The Pearson correlation coefficient can take any value
from -1 to 1, with -1 signifying a perfect negative correlation, zero signifying no correlation, and
1 signifying a perfect positive correlation. For example, the Pearson correlation coefficient of
0.799, with significance 0.000, given for the 1:500,000 comparison in Table 4.2 indicates a strong,
statistically significant correlation between the algorithm used and a shorter mean Hausdorff distance (i.e., the
hexagonal algorithm yielded shorter Hausdorff distances).
Hexagons vs. Squares Scale Pairings    N     Correlation    Sig.
1:50,000                               34    0.553          0.001
1:100,000                              34    0.748          0.000
1:150,000                              34    0.768          0.000
1:200,000                              34    0.584          0.000
1:250,000                              34    0.768          0.000
1:500,000                              34    0.799          0.000
1:1,000,000                            34    0.520          0.002
Table 4.2 - Pearson correlation coefficients for differences in means observed between the
hexagonal and square algorithms, using the midpoint first and last vertices quantization option.
                                      Paired Differences
                                                     95% Confidence Interval of the Difference
Scale Pairings  Mean          Std. Deviation  Std. Error Mean  Lower         Upper         t       df  Sig. (2-tailed)
1:50,000        -0.805294118  9.927300965     1.70251807       -4.269093175  2.65850494    -0.473  33  0.639
1:100,000       10.17705882   12.03082469     2.063269412      5.979305643   14.374812     4.932   33  0.000
1:150,000       6.590294118   18.1278225      3.10889591       0.265197832   12.9153904    2.12    33  0.042
1:200,000       19.69058824   28.06652605     4.813369507      9.89771434    29.48346213   4.091   33  0.000
1:250,000       -0.098823529  34.70618579     5.952061758      -12.20838423  12.01073717   -0.017  33  0.987
1:500,000       22.55235294   54.79708984     9.39762338       3.43274442    41.67196146   2.4     33  0.022
1:1,000,000     -18.08323529  179.0896429     30.71362037      -80.57056577  44.40409518   -0.589  33  0.560
Table 4.3 - T test statistics across seven scales for the difference in mean Hausdorff distances
between square and hexagonal algorithms using the midpoint first and last vertices quantization
option.
Hexagons vs. Squares Scale Pairings    N     Correlation    Sig.
1:50,000                               34    0.334          0.054
1:100,000                              34    0.482          0.004
1:150,000                              34    0.473          0.005
1:200,000                              34    0.595          0.000
1:250,000                              34    0.357          0.038
1:500,000                              34    0.210          0.233
1:1,000,000                            34    0.339          0.050
Table 4.4 - Pearson correlation coefficients for differences in means observed between the
hexagonal and square algorithms, using the spatial mean quantization option.
                                      Paired Differences
                                                     95% Confidence Interval of the Difference
Scale Pairings  Mean          Std. Deviation  Std. Error Mean  Lower         Upper         t       df  Sig. (2-tailed)
1:50,000        1.006764706   10.27130652     1.761514536      -2.577063565  4.590592977   0.572   33  0.572
1:100,000       7.091764706   7.523655813     1.290296327      4.46663709    9.716892321   5.496   33  0.000
1:150,000       13.42558824   9.863044644     1.691498202      9.984209269   16.8669672    7.937   33  0.000
1:200,000       12.85882353   10.88008321     1.865918877      9.06258303    16.65506403   6.891   33  0.000
1:250,000       11.90323529   17.79977285     3.052635859      5.692600941   18.11386965   3.899   33  0.000
1:500,000       27.08029412   40.19238572     6.892937284      13.05650777   41.10408047   3.929   33  0.000
1:1,000,000     24.87529412   91.23266126     15.64627233      -6.957286275  56.70787451   1.59    33  0.121
Table 4.5 - T test statistics across seven scales for the difference in mean Hausdorff distances
between square and hexagonal algorithms using the spatial mean quantization option.
The T-test results provide indications regarding whether or not differences in observed
mean Hausdorff distances between hexagonal and square treatments were significant at each
target scale. In both Tables 4.3 and 4.5, the “Mean” column gives the difference in mean
Hausdorff distances at each scale between the two algorithms (in meters). The “t” column gives
the calculated T-test statistic. Its absolute value must exceed the critical value at the test’s
degrees of freedom (“df” column) to indicate a statistically significant difference between means.
These critical values can be looked up on a table of t-distribution critical values, but the SPSS
output provides a two-tailed significance value (the right-most column) that makes this
unnecessary. If the value in the “Sig (2-tailed)” column is 0.05 or less, there is reason to reject
the hypothesis that the two means being compared are equal. For example, the statistics
calculated between mean Hausdorff distances from the hexagonal and square algorithms at
1:100,000 given in Table 4.3 indicate significant difference, while those at 1:1,000,000 do not.
Because some distributions of Hausdorff distances departed substantially from normality,
the nonparametric related-samples Wilcoxon signed rank test was also used to test for significant
difference in means, to a confidence interval of 95% in all cases. This was done to corroborate
findings from the paired samples T-tests, and to make certain that T-test findings were not
spurious in the presence of some non-normal data. These statistics are given in Table 4.6. For
each target scale (along the left-most column) and for either vertex-clustering method (indicated
in the top-most row), the mean Hausdorff distances from all 34 lines were compared across both
algorithms to determine whether they were significantly different, this time without assuming a
normal probability distribution in the data. When significance values are 0.05 or less, the null
hypothesis of no difference in the mean Hausdorff distances derived from either algorithm is
rejected, indicating that one algorithm places output lines significantly closer to the input line
than does the other.
                 Spatial Means                                 Midpoint of 1st and last point
Scale Pairings   H₀: no significant        Significance        H₀: no significant        Significance
                 difference in means                           difference in means
1:50,000         Kept                      .562                Kept                      .066
1:100,000        Rejected                  .000                Rejected                  .000
1:150,000        Rejected                  .000                Rejected                  .020
1:200,000        Rejected                  .000                Rejected                  .001
1:250,000        Rejected                  .001                Kept                      .285
1:500,000        Rejected                  .001                Rejected                  .009
1:1,000,000      Kept                      .101                Kept                      .952
Table 4.6 - Related-samples Wilcoxon signed rank statistics.
The Hausdorff distance of each of the 952 line simplifications generated in this research
represents a permutation of three factor variables: the algorithm used (hexagons vs. squares), the
quantization method used (spatial mean vs. midpoint of first and last vertices), and the target
scale (i.e., tessera width, by corollary). In order to investigate the effects of each of these factors,
both independently and in interaction with each other, a three-way analysis of variance (ANOVA)
test was conducted. As with the T test and Wilcoxon signed rank tests, this test determines
whether significant difference exists between groupings of means of a dependent variable (being
Hausdorff distance in this case); the null hypothesis is that no significant difference exists. At the
95% confidence level, the null hypothesis is rejected when the calculated significance value falls
below 0.05. The results of this test are given in Table 4.7.
Factors                             df     Sum of Squares    Mean Square    F value       Sig.
Scale                               1      46366000          46366000       12511.0597    0.000
Quantization                        1      116255            116255         31.3693       0.000
Algorithm                           1      23214             23214          6.2638        0.012
Scale × Quantization                1      101993            101993         27.5212       0.000
Scale × Algorithm                   1      24                24             0.0064        0.936
Quantization × Algorithm            1      4116              4116           1.1106        0.292
Scale × Quantization × Algorithm    1      10725             10725          2.8939        0.089
Residuals                           944    3498465           3706
Table 4.7 - Three-way ANOVA test statistics across all 952 simplifications and three factors.
Along with Hausdorff distances, numbers of vertices were recorded with each
simplification, and a percent reduction in this number from the input line was calculated for each
simplified line. Mean values for hexagonal and square treatments at each scale, and across both
vertex clustering methods, are given in Table 4.8. While no statistical analyses are performed on
these values, it can be quickly seen that differences in reductions of vertices by either algorithm,
using either vertex clustering method, are minute. An immediate conclusion from these data is
that neither algorithm seems to reduce vertices appreciably more than the other. While vertex
reduction has been a concern for some authors in line simplification research, it is not the aim of
either algorithm used in this research.
                 Spatial Means                Midpoint of 1st and last point
Scale            Hexagons      Squares        Hexagons      Squares
1:50,000         47.70         49.15          48.15         49.73
1:100,000        67.58         68.19          67.91         68.51
1:150,000        76.56         76.96          76.78         77.19
1:200,000        81.84         82.03          82.01         82.20
1:250,000        85.04         85.21          85.18         85.35
1:500,000        91.95         92.29          92.03         92.36
1:1,000,000      95.98         95.69          96.02         95.72
Table 4.8 - Mean percent reductions in vertices from the input line, averaged across all 34 sample
lines, for each algorithm and each quantization option.
Interpretations
One of the goals of this research has been to demonstrate that the fidelity of cartographic
lines produced from a vertex clustering simplification algorithm using hexagonal tessellated
sampling is greater than that produced by the similar Li-Openshaw raster-vector algorithm, which
uses square raster cells. There can be both subjective and objective evaluations of this claim,
based on either aesthetic or metric judgments.
Discussion of Cartographic Results
From the preceding material in this chapter, it can be seen that both the hexagonal
algorithm and the implementation of the Li-Openshaw raster-vector algorithm produce
comparably acceptable cartographic lines. It should be repeated that for the sake of direct
comparability, this research has used the tessera width calculation formula developed for the
hexagonal quantization algorithm for both hexagon and square size; thus, the products of the
Li-Openshaw raster-vector algorithm presented here are not precisely those that would be achieved
for a given target scale using Li and Openshaw's (1992, p. 378) formula. Also, the issue of line
self-intersections was discussed in the third chapter; while both algorithms as implemented in this
research did produce occasional self-intersections, this would not have happened with the
Li-Openshaw algorithm had Li's (2007, p. 154) recommended vertex clustering method been
employed. A counter argument against using Li's suggested clustering method, however, is that
whole portions of lines, such as peninsulas or small bays, would have been omitted because their
outlets were narrow enough to fall within one tessera (Figure 3.9).
Upon close observation, it is clear that the two algorithms consistently produce differing
lines. Figures 4.6 through 4.10 provide various examples of the output lines of either algorithm
drawn at target scale and with target line weight. An important point to make is that neither
algorithm seems to have an obvious advantage over the other in producing lines more acceptable
on aesthetic grounds. This is to say that both the hexagonal quantization algorithm and the
Li-Openshaw raster-vector algorithm are successful methods, and are able to produce lines that would
seem acceptable to many cartographers and map readers.
Allowing for the comparable performance of the two algorithms in terms of aesthetics,
another important consideration, particularly in topographic mapping settings, is the degree to
which either algorithm deviates from the original line or, put another way, the degree of fidelity
to the original line each algorithm exhibits. One way to
consider and seek to evaluate this is by direct visual comparison, as is possible with Figures 4.2
through 4.5. Observing these figures one can imagine a simplified line that a trained cartographer
may manually draw while seeking to stay faithful to the original line. The product lines from
either algorithm can then be considered for how closely each approximates the line drawn by the
imaginary cartographer, which we assume would be a superior line.
In the small, sinuous bay on the coast of Maine given in Figures 4.2 and 4.3, the line
produced by the hexagonal quantization algorithm seems to straighten curving sections and retain
narrow inlets with greater success than the Li-Openshaw raster-vector algorithm (noteworthy
examples are seen in the 1:500,000 graphics in both figures). Given sufficient space and
resolution on the target map to depict small details such as narrow inlets, the retention of these
makes an output line more faithful than another produced to the same scale that does not retain
the inlets. By this reasoning, one may conclude that the hexagonal quantization algorithm
performs with greater fidelity to the input line in that it will tend to retain visible geographical
features through greater scale change than will the Li-Openshaw raster-vector algorithm.
Further, in the case of extreme scale change to 1:1,000,000, the hexagonal quantization
algorithm retains a more descriptive shape for the bay in Figures 4.2 and 4.3 than does the
Li-Openshaw raster-vector algorithm. This too contributes to the greater fidelity of the hexagonal
quantization algorithm, since it tends to draw more geographically informative forms at extreme
scale changes than does the Li-Openshaw raster-vector algorithm.
The hexagonal quantization algorithm is also seen at times to reduce small details with
greater success than the Li-Openshaw algorithm. Figures 4.4 and 4.5 illustrate the performance
of both algorithms on a complex set of peninsulas. As inspection of these two figures may
suggest, the hexagonal quantization algorithm tended to omit the very narrow portion of the
southern peninsula more often than did the Li-Openshaw raster-vector algorithm, even though the
hexagonal algorithm still retained the larger portion of the peninsula. This illustrates a successful
simplification of the peninsula, retaining the important fact that a peninsula of significant land
mass exists while pruning away detail too small for the target map. In retaining the narrower
portion of the peninsula more often, the Li-Openshaw raster-vector algorithm as here
implemented encountered self-intersection problems more often than did the hexagonal
quantization algorithm. Still, both algorithms exhibit this flaw; future work, outlined in greater
detail in the Conclusions chapter, will address and resolve this issue.
It is difficult to visually isolate the respective effects of tessera shape and vertex
clustering method from the product lines presented (Figures 4.2 vs. 4.3; 4.4 vs. 4.5; and Figures
4.8 through 4.10). Close visual inspection suggests that the lines produced by the spatial mean
quantization are slightly less angular, and thus slightly more aesthetically-pleasing, though this is
not immediately obvious. Interestingly, this is coincident with the fact that the spatial mean
quantization always produced shorter mean Hausdorff distances than did the midpoint first and
last vertex quantization, at all scales and for both algorithms (see Table 4.1). This suggests that
greater objective positional accuracy actually contributes, however minutely, to aesthetically
superior results.
Discussion of Statistical Results
Objective evaluation is based on the statistical analyses of the Hausdorff distances
between input and simplified lines. Descriptive statistics in Table 4.1 indicate shorter mean
Hausdorff distances for hexagons than squares in 11 of 14 pairings. Both parametric and
nonparametric tests (Tables 4.3, 4.5 - 4.7) demonstrated significant difference in the Hausdorff
distances between hexagonal and square simplifications, for either vertex clustering method.
Since tests were conducted both across all seven target scales (three-way ANOVA) and at each
target scale (T tests and Wilcoxon signed rank test), the following discussion treats each set of
analyses individually.
Three-way ANOVA Results, Across Target Scales
The results of the three-way ANOVA test conducted (Table 4.7) indicate significant
effects on Hausdorff distances from each of the following factors and combinations of factors:
scale, algorithm, quantization method, and scale in interaction with quantization method; the
three-way interaction of scale, algorithm, and quantization approaches, but does not reach,
significance at the 95% confidence level (Sig. = 0.089).
The first factor, scale, is known a priori to have an effect on Hausdorff distance;
simplifications are a function of target scale and will obviously exhibit increasing Hausdorff
distances as target scale decreases (i.e., as tessera width increases). Thus the significance of scale
in this test is not surprising, but it is important to include it in the ANOVA model in order to
account for its effects against other factors.
Of most interest is the fact that the algorithm factor was determined to be significant,
nearly to the 99% confidence level (“Sig.” value of 0.012, Table 4.7). This provides grounds to
reject the null hypothesis of equivalence of mean Hausdorff distances between the hexagonal and
square algorithms. Since the descriptive statistics given in Table 4.1 indicate shorter mean
distances for hexagons in 11 of 14 comparisons, the rejection of the equivalence hypothesis
strongly suggests an advantage attributable to the hexagonal algorithm.
Quantization method was determined to be highly significant (“Sig.” value of 0.000,
Table 4.7). This, when considered with the consistently shorter distances for the spatial mean
quantization method seen in Table 4.1, strongly suggests that that quantization method
dependably produces simplified lines more faithful to the input line than those produced by the
midpoint first and last method.
The significant scale × quantization interaction and the marginal three-way interaction both
include scale. Since, as stated above, scale is expected to drive Hausdorff distance values,
interactions with scale are not especially interesting results.
T Test and Wilcoxon Signed Rank Results, Within Target Scales
As seen from test results in Tables 4.3, 4.5 and 4.6, significant difference existed in the
Hausdorff distances generated between hexagonal and square simplifications, for either vertex
clustering method, at most, but not all, target scales. Since hexagons were not better in all cases,
explorations of the exceptions are offered.
In the case of simplifications to 1:50,000, the hexagon mean was lower than square for
the spatial mean method, but not for the midpoint first and last vertices method, and in neither
case was the difference statistically significant (Table 4.1). A general explanation for this is that
scale difference from the input data, at approximately 1:24,000 (allowing for the variability in the
Canadian data), to the target scale at 1:50,000, is not particularly large. Hexagon and square
widths at that scale were 62.5 m. Table 4.8 gives the mean vertex reductions upon
simplification using either algorithm and either collapse method. It can be seen from that table
that vertex reductions in all cases for the 1:50,000 scale hovered just below 50%. Relative
spacing between input line vertices was visually inspected by the author and seen to be generally
consistent and uniform along the line. Granting, then, that input vertices were spaced at generally
regular intervals, this means that on average a single tessera in the 1:50,000 algorithm runs, in
either algorithm, usually produced its output point from two input vertices. Since both hexagonal
and square sampling tesserae were usually calculating output vertices from two input vertices (i.e.,
within-tessera variability was relatively constant across tessera shapes), the distances between the
collapsed points and input points would frequently be similar across both algorithms. This is true
of either collapse method, since collapsed points would be the same, whether using the spatial
mean or the midpoint between first and last points clustering method. At 1:50,000, hexagons,
then, did not produce statistically significantly shorter Hausdorff distances than squares because
the scale change was not large enough to take advantage of the lesser anisotropy of the hexagonal
packing; either tessera being only large enough to capture about two points, the differing point
clustering afforded between hexagons and squares was not reflected by the output vertices.
For the midpoint first and last clustering method, output lines at the 1:250,000 scale did
not display a statistically significant difference in mean Hausdorff distances between hexagons
and squares. Means were very close, with hexagons producing a slightly longer distance (216.7
m for hexagons and 216.6 m for squares, see Table 4.1). This same pattern, with hexagonal mean
Hausdorff distance greater than square is seen once more, using the same vertex clustering
method at 1:1,000,000. The reason for this likely lies in the overall layout of the input lines and
how closely they followed the natural anisotropy of either tessellation. While hexagons are
generally less anisotropic because of their six-fold radial symmetry and consistency of distance
between tesserae, it has been observed by some authors (Iftekharuddin & Karim, 1993;
Kamgar-Parsi et al., 1989) that square tessellations can have higher sampling fidelity rates when the signal
itself is more orthogonally distributed. Cartographic lines such as rivers and coastlines generally
have naturally high directional variability; while it may be true in general that hexagons sample
these lines with greater fidelity, there will be some instances at some tessellation resolutions
when a line will lie relatively more orthogonally along the x and y axes. In these relatively
infrequent cases, a square tessellation can actually sample the line with greater fidelity.
As noted before, the spatial mean quantization method always produced shorter
Hausdorff distances than did the midpoint first and last vertices method (Table 4.1). This is
because the midpoint first and last points clustering method will tend to place output vertices
further away from input vertices in the same tessera (see Figure 3.7). In the case of an input line
intersecting with sampling tessera in a relatively orthogonal pattern, the midpoint first and last
clustering method can actually reinforce the orthogonality, whereas the spatial means clustering
method would tend to obscure it.
Finally, at 1:1,000,000, hexagons were not seen to yield statistically significantly shorter
Hausdorff distances than squares for either vertex clustering method. The mean Hausdorff
distance in the case of the spatial means clustering method was shorter for hexagons, but not
within the 95% confidence interval (2-tailed significance of .121, Table 4.5). As such, while
hexagons did in fact perform better overall at this scale and with this clustering method (i.e., they
yielded a lower mean Hausdorff distance, Table 4.1), from this hexagon-square pairing alone at
1:1,000,000, the statistical analysis does not support rejection of the possibility that this is due to
chance.
In the case of the midpoint first and last clustering method at 1:1,000,000, mean hexagonal
Hausdorff distance was likewise slightly longer than square (Table 4.1); the same explanation regarding
relative orthogonality of input lines at certain sampling tessera widths is suggested to account for
this.
Magnitude of Improvement over Squares
From the compared mean Hausdorff distances given in Table 4.1, it can be seen that the
magnitude of improvement presented by hexagons over squares is small; across all test pairings,
this difference represents approximately 3.5% of the width of the tessera used (4.2% for the
spatial mean and 2.9% for the midpoint first and last vertices methods independently).
Summary
It is observed that of the two algorithms, the hexagonal quantization algorithm generally
performs with greater positional fidelity to the input line, in that it produces lines with
shorter Hausdorff distances to the input line. This conclusion is supported by the results of a
three-way analysis of variance on mean Hausdorff distances, taking into account the factors of
algorithm used, quantization method used, and target scale. It is also supported by 11 of 14 trial
pairings at seven target scales. The fidelity difference between output lines from hexagons vs.
squares is statistically confirmed, with the benefit of hexagons over squares being relatively
small. These findings are taken to support the notion that hexagons demonstrated superior
performance over squares in general. It is also found that the spatial mean quantization method
produces simplified lines at significantly shorter displacements from the input line than does the
midpoint first and last vertices method.
Chapter 5
Conclusions and Future Work
There are two principal conclusions to this thesis. The first is that classical sampling
theory can be successfully coupled to map resolution to inform scale-specific map generalization
processes. This corroborates Li’s (1995) vision of a scale-driven paradigm for automated digital
map generalization. It allows for objective generalization, and removes the need to iteratively
repeat processes until a desirable solution is achieved. This is an important finding for
cartography, particularly because many algorithms currently in use by cartographers cannot be
calibrated to target scales, despite the fact that cartographers are often tasked with making
generalizations for maps whose scales are determined ahead of time, as is the case, for example,
in national topographic mapping settings.
The second principal conclusion of this thesis is that hexagonal tessellation generally
produces demonstrably more faithful simplified map lines than does square tessellation using a
vertex clustering line simplification technique. “Faithfulness” is understood to be the
minimization of positional difference, measured between the two sets of input and output polyline
vertices in ℝ² by the Hausdorff distance. It is also argued from visual inspection that the
performance of the hexagonal quantization algorithm is closer to that which may be expected
from a human cartographer than is the performance of the Li-Openshaw raster-vector algorithm
(Li & Openshaw, 1992). One implication of this is that there now exists a method of line
simplification using a similar technique that produces improved lines relative to those generated by the
existing Li-Openshaw raster-vector algorithm.
This research has also demonstrated the utility of the Hausdorff distance in evaluating the
products of line simplification algorithms. The Hausdorff distance enjoys widespread use in
computer vision for pattern-matching because it metrizes pattern differences; the same ability to
compare can be applied to cartographic input and generalized data to quantify generalization
fidelity.
Related to the conclusion stated above regarding the relative performances of the
hexagonal quantization and Li-Openshaw raster-vector algorithms, another conclusion of this
thesis is formed from the significant difference in fidelity seen between the two quantization
methods tested. The spatial means method produces lines less deviated from the input line than
does the midpoint first and last method.
Relative Magnitude of Improvement
As was mentioned in the preceding chapter, the magnitude of fidelity improvement
presented by hexagons over squares is relatively small. Given that the tessera width is calculated
such that it is barely resolvable at target scale, the relatively small cartographic improvement the
hexagonal algorithm affords is not immediately visually appreciable. Small differences in
product lines are, however, visible at times upon close inspection (see Figures 4.2 through 4.10).
Future work by the author is planned to examine whether or not any visible differences in the
products of the two algorithms are due to differing levels of anisotropy inherited from either
tessellation; if one algorithm is found to be significantly more isotropic in its output, it is
suggested that that algorithm's cartographic output is truer to reality, even if the differences are
only minutely noticeable.
Despite the small visual improvements afforded by the hexagonal algorithm, value lies in
the fact that it is, however slightly, more accurate than square sampling. Any
subsequent modeling or analyses undertaken on simplified line data produced by the hexagonal
algorithm will be based on data with less inherent systematic error than data produced by a square
sampling process such as that of the Li-Openshaw raster-vector algorithm. Though the
magnitude of this difference in error is not great, there is no reason why analysts cannot adopt a
more accurate solution, particularly since it is no more difficult to employ. Even though analysts
frequently opt to use the highest-detail data available, maintaining low error levels in generalized
data is worthwhile: there always exist phenomena in geographic models that operate at smaller
scales. When examining for these, analysts are wise to select geospatial data appropriate to their
model’s scaling. Also, while “zooming in” on a line produced by the hexagonal quantization
algorithm goes against the scale-specific spirit in which the method was conceived, it is likely to
happen, given current paradigms in internet and mobile cartography. “Zooming in” on a vector
line produced by the hexagonal quantization algorithm would yield a more accurate line position
than doing the same on a line produced by the Li-Openshaw raster-vector algorithm, with the
slight improvements in the hexagonal quantization algorithm becoming more and more visually
appreciable as one increases map scale.
Future Tessellation Variations
One important possibility for investigation is the ability to refine fidelity by iterating
through many possible placements of the tessellation (Figure 3.4) and selecting the placement that
yields the lowest areal displacement for a given vertex clustering method. The Li-Openshaw
raster-vector algorithm, as described by Li and Openshaw (1992), explicitly places the first raster
cell centered on the first vertex of the input line, thus placing all other cells around this first one
in defined locations. Li also suggests (2007, p. 154) that his algorithm could be implemented
with a sliding raster grid following the input line, though he does not detail suggestions for
defining the amount of translation. A future branch of research on the hexagonal algorithm may,
for example, place the tessellation in 100 different randomly-generated positions. This may be
achieved by "jittering" the translation in x and y directions, each by some random value between
0 and 1, times half the tessera width. That calculation theoretically allows the tessellation to
move throughout the full range of motion possible before it simply re-coincides with its initial
position. Similarly, the hexagonal tessellation may be rotated through 60° (Figure 3.3). A
simplification may be undertaken for each tessellation position, and the software implementing
this may choose the simplification with the shortest Hausdorff distance as the final product. This
same process may be undertaken with the Li-Openshaw raster-vector algorithm, allowing for
shifting of the raster grid away from the position defined by the first vertex of the input line. In
this way it may be possible to optimize the outputs of either algorithm, and compare the
optimized fidelities.
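Under the assumptions just stated (random x and y translations of up to half the tessera width, and, for hexagons, a random rotation within 60°), the generation of candidate placements might be sketched as follows; the names are hypothetical and this is not part of the thesis software.

// Sketch of randomly "jittering" the tessellation placement: translate by up to half the tessera
// width in x and y, and rotate by up to 60 degrees (the period of hexagonal symmetry). One
// simplification would be run per candidate placement, and the run yielding the shortest
// Hausdorff distance kept as the final product. Names are hypothetical.
import java.util.Random;

class JitterSketch {

    static class Placement {
        final double dx, dy, rotationDegrees;
        Placement(double dx, double dy, double rotationDegrees) {
            this.dx = dx; this.dy = dy; this.rotationDegrees = rotationDegrees;
        }
    }

    /** Generates the requested number of random candidate placements of the tessellation. */
    static Placement[] randomPlacements(int count, double tesseraWidth, long seed) {
        Random random = new Random(seed);
        Placement[] placements = new Placement[count];
        for (int i = 0; i < count; i++) {
            double dx = random.nextDouble() * tesseraWidth / 2.0;   // random value in [0, width/2)
            double dy = random.nextDouble() * tesseraWidth / 2.0;
            double rotation = random.nextDouble() * 60.0;           // random rotation in [0, 60) degrees
            placements[i] = new Placement(dx, dy, rotation);
        }
        return placements;
    }
}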
Related to the translation of the sampling tessellation is the idea of varying local tessera
size according to local input line statistics. Local line statistics may include vertex frequency
along the line, or neighborhood total angularity, among other possibilities. It may be possible to
begin with a tessera size derived from a target map scale and resolution, and then expand or
contract local hexagons in relation to local line statistics, in order to achieve locally-varying
levels of simplification. This technique may also be useful in exaggeration procedures, identified
by researchers as a map generalization operator distinct from line simplification.
Future work on the hexagonal quantization algorithm may also involve alternative
quantitative evaluation methods. Several researchers have noted the utility of fractal dimension
in characterizing map lines (Buttenfield & McMaster, 1991; Normant & Tricot, 1993). It has
been asserted that maintaining fractal dimension should be an objective for automated line
simplification, since doing so would presumably retain the essential character of the line, and that
algorithms can be evaluated on their performance in this regard (Muller, 1987). Future analyses
of the hexagonal quantization algorithm, then, will measure and compare fractal dimension of the
input and output lines. Other metrics may also include more basic line characteristics, such as
sinuosity and angularity.
Repair of Line Self-Crossings
While both algorithms implemented in this research occasionally produced self-crossings, a
process for undoing these is currently under development. Because lines were permitted to place
more than one output vertex in a tessera, they may have crossed themselves one or more times. A
line self-crossing can be thought of as a “twist”. It is observed that a simple
process of checking for intersecting line segments and reversing vertex connectivity sequences is
able to undo these twists. A hypothetical post-processing algorithm would progress according to
the process laid out in Figure 5.1. Future work may implement this process, examine whether it
satisfactorily resolves self-crossings without creating spurious landscape features, and explore its
application to the products of other algorithms, such as the Douglas-Peucker (1973) algorithm.
Figure 5.1 - “Untwisting” line self-crossings. The routine iterates through all line segments,
checking for intersections with other line segments. When one is found, the sequence of vertices
starting from the second vertex of the first line segment until the first vertex of the second line
segment is reversed. The process repeats from the beginning of the line, “untwisting” self-crossings one at a time, until no more are detected.
General Summary
The preceding work has detailed the invention and implementation of a new scale-specific line simplification algorithm, termed the hexagonal quantization algorithm. The
development of this algorithm has demonstrated that scale-specificity in cartographic line
simplification can be achieved objectively by applying basic sampling theory to map resolution.
It has also been demonstrated that lines produced by the hexagonal quantization algorithm are
more faithful to their input lines than those produced by a closely related algorithm, the Li-Openshaw raster-vector algorithm (Li & Openshaw, 1992).
Appendix A
Summary Table of All Sample Lines
All lines are from Canadian or U.S. rivers and shores, sampled from the National Hydro Network
(Canadian Council on Geomatics) or the National Hydrography Dataset “high resolution” (USGS)
datasets. All straight-line distances from end to end fall within 15 to 20 km. Thumbnails are
individually reduced to fit.
Line: Geomorphological Type

Alaskan Peninsula: Ice-dominated rocky beach
Baranof Island coast: Ice-dominated rocky beach
Bay of Fundy shore: Tidal-dominated coast
Black River: Contorted river
Cape Breton coast: Rocky glacier-formed shore
Cape Cod coast: Wave-dominated, depositional shore
Cedar River: Dendritic river
Western Florida coast: Sandy wave-dominated beach
Gaspé Peninsula coast: Rocky glacier-formed shore
Humboldt River: Contorted river
Île Jésus, Laval shore: Depositional river island shore
Killiniq Island coast: Ice-dominated rocky beach
Klinaklini River: Dendritic river
Southeastern Labrador coast: Ice-dominated rocky beach
Lake Ontario shore: Wave-dominated lake shore
Lake Superior shore: Wave-dominated lake shore
Southern Maine coast: Rocky glacier-formed shore
Mancos River: Contorted river
Northern Michigan shore: Wave-dominated lake shore
Mississippi Delta coast: Alluvial river delta shore
Myrtle Beach coast: Sandy wave-dominated beach
Southeastern Newfoundland coast: Rocky glacier-formed shore
Eastern Nova Scotia coast: Rocky glacier-formed shore
Obion River: Dendritic, meandering river
Northern Oregon coast: Wave-dominated sandy beach, some sea cliffs
Pecatonica River: Dendritic, meandering river
Northeastern Prince Edward Island coast: Rocky glacier-formed shore
Potomac River shore: Estuary shore
Rio Grande: Dammed, agriculturally-managed meandering river
Saline River: High-sediment, meandering river
San Francisco coast: Partly human-defined shore
Suwannee River: Meandering river
Sweetwater River: Dendritic, high-sediment river
Yukon River: Dendritic river through mountainous region
Appendix B
Example Text Report from Software
::: Starting Program :::::::::::::::::::::::::::::::::::::::::::::
Reading input file:
C:\Courses\Thesis\ThesisData\SampleLines2csv\NovaScotia_C.csv
Input scale:                      10000 - 50000
Target scale:                     250000
Tessera width:                    312.5 m
Vertex collapse method:           midpoint 1st & last
Calculating Hausdorff distances:  true
...HEXAGONS.......................................
using output file:
C:\Courses\Thesis\ThesisData\SampleLines2csvSimplified\NovaScotia_C_MpH_250k.csv
Input vertices: 1190
Output vertices: 187
Output vertices are 15.714% of input.
(84.286% decrease)
~~ Hausdorff Report ~~~~~~~~~~~~~~~
h(input to simplified) = 251.13 m
h(simplified to input) = 73.59 m
* H(input, simplified) = 251.13 m
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
... Hexagons done! ..................................
...SQUARES.......................................
using output file:
C:\Courses\Thesis\ThesisData\SampleLines2csvSimplified\NovaScotia_C_MpS_250k.csv
Input vertices: 1190
Output vertices: 183
Output vertices are 15.378% of input.
(84.622% decrease)
~~ Hausdorff Report ~~~~~~~~~~~~~~~
h(input to simplified) = 253.9 m
h(simplified to input) = 106.28 m
* H(input, simplified) = 253.9 m
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
... Squares done! ................................
::: Ending Program :::::::::::::::::::::::::::::::::::::::::::::::
References
Akman, V., Franklin, W. R., Kankanhalli, M., & Narayanaswami, C. (1989). Geometric
computing and uniform grid technique. Computer-Aided Design, 21(7), 410-420.
Alt, H., Godau, M., Knauer, C., & Wenk, C. (2002). Computing the Hausdorff distance of
geometric patterns and shapes. Discrete and Computational Geometry: The Goodman-Pollack Festschrift.
Alt, H., & Guibas, L. J. (2000). Discrete geometric shapes: matching, interpolation, and
approximation; a survey. In J.-R. Sack & J. Urrutia (Eds.), Handbook of Computational
Geometry (pp. 121–153). Amsterdam: Elsevier Science B.V.
Arkin, E. M., Chew, L. P., Huttenlocher, D. P., Kedem, K., & Mitchell, J. S. B. (1991). An
efficiently computable metric for comparing polygonal shapes. Pattern Analysis and
Machine Intelligence, IEEE Transactions on, 13(3), 209-216.
Attneave, F. (1954). Some informational aspects of visual perception. Psychological Review,
61(3), 183-193.
Ballard, D. H. (1981). Strip trees: a hierarchical representation for curves. Communications of the
ACM, 24(5), 310-321.
Bertin, J. (1983). Semiology of Graphics: University of Wisconsin Press.
Birch, C. P. D., Oom, S. P., & Beecham, J. A. (2007). Rectangular and hexagonal grids used for
observation, experiment and simulation in ecology. Ecological Modelling, 206(3-4), 347-359.
Bloch, M., & Harrower, M. (2008). Mapshaper. Retrieved 4 September 2008, from mapshaper.org
Brassel, K., & Weibel, R. (1988). A review and conceptual framework of automated map
generalization. International Journal of Geographical Information Science, 2(3), 229-244.
Brewer, C. A. (1996). Prediction of simultaneous contrast between map colors with Hunt's model
of color appearance. Color Research and Application, 21(3), 221-235.
Buchanan, B. G., & Duda, R. O. (1983). Principles of rule-based expert systems. Advances in
Computers, 22, 163-216.
Burghardt, D., & Cecconi, A. (2007). Mesh simplification for building typification. International
Journal of Geographical Information Science, 21(3), 283-283.
Buttenfield, B. P. (1985). Treatment of the cartographic line. Cartographica: The International
Journal for Geographic Information and Geovisualization, 22(2), 1-26.
Buttenfield, B. P. (1989). Scale-dependence and self-similarity in cartographic lines.
Cartographica, 26(1), 79-100.
Buttenfield, B. P. (1991). A rule for describing line feature geometry. In B. P. Buttenfield & R. B.
McMaster (Eds.), Map Generalization: Making Rules for Knowledge Representation (pp.
150-239). Essex: Longman Scientific & Technical.
Buttenfield, B. P., & McMaster, R. B. (Eds.). (1991). Map Generalization: Making Rules for
Knowledge Representation. Essex: Longman Scientific and Technical.
Carr, D. B., Olsen, A. R., & White, D. (1992). Hexagon mosaic maps for display of univariate
and bivariate geographical data. Cartography and Geographic Information Science,
19(4), 228-236.
Carstensen, L. W. (1990). Angularity and capture of the cartographic line during digital data
entry. Cartography and Geographic Information Systems, 17(3), 209-224.
Swiss Society of Cartography. (1977). Cartographic generalization. Cartographic Publication Series.
Enschede, The Netherlands: ITC Cartography Department.
Cecconi, A. (2003). Integration of cartographic generalization and multi-scale databases for
enhanced web mapping. Ph.D., Universität Zürich, Zürich. Retrieved from http://ecollection.ethbib.ethz.ch/show?type=extdiss&nr=6
Christaller, W. (1933). Die zentralen Orte in Süddeutschland. Jena: Gustav Fischer.
Christensen, A. H. J. (2000). Line generalization by waterlining and medial-axis transformation.
Successes and issues in an implementation of Perkal's proposal. The Cartographic
Journal, 37(1), 19-28.
Condat, L., Van De Ville, D., & Blu, T. (2005). Hexagonal versus orthogonal lattices: a new
comparison using approximation theory. Paper presented at the IEEE International
Conference on Image Processing.
Cromley, R. G. (1991). Hierarchical methods of line simplification. Cartography and Geographic
Information Science, 18(2), 125-131.
Cromley, R. G. (1992). Principal axis line simplification. Computers & Geosciences, 18(8), 1003-1011.
Cromley, R. G., & Campbell, G. M. (1992). Integrating quantitative and qualitative aspects of
digital line simplification. The Cartographic Journal, 29(1), 25-30.
Dalmau, D. S-C. (2004). Core techniques and algorithms in game programming: New Riders
Publishing.
Dent, B. (1972). A note on the importance of shape in cartogram communication. Journal of
Geography, 71, 393-401.
Douglas, D. H., & Peucker, T. K. (1973). Algorithms for the reduction of the number of points
required to represent a digitized line or its caricature. Cartographica, 10(2), 112-122.
Duff, M. J. B., Watson, D. M., Fountain, T. J., & Shaw, G. K. (1973). A cellular logic array for
image processing. Pattern Recognition, 5, 229-247.
Dutton, G. (1999). Scale, sinuosity, and point selection in digital line generalization. Cartography
and Geographic Information Science, 26(1), 33-53.
García, J. A., & Fdez-Valdivia, J. (1994). Boundary simplification in cartography preserving the
different-scale shape features. Computers & Geosciences, 20(3), 349-368.
Geomatics Canada. (2010). National Hydro Network Data Product Specifications Distribution
Profile. Sherbrooke, Quebec: Her Majesty the Queen in Right of Canada, Department of
Natural Resources. Retrieved from
http://www.geobase.ca/doc/specs/pdf/GeoBase_NHN_Specs_EN.pdf.
Graham, M. D. (1990). Comparison of three hexagonal tessellations through extraction of blood
cell geometric features. Analytical and Quantitative Cytology and Histology, 12(1), 56-72.
Griffin, A. L., MacEachren, A. M., Hardisty, F., Steiner, E., & Li, B. (2006). A comparison of
animated maps with static small-multiple maps for visually identifying space-time clusters.
Annals of the Association of American Geographers, 96(4), 740-753.
Hales, T. C. (2001). The honeycomb conjecture. Discrete and Computational Geometry, 25(1), 1-22.
Hangouët, J. (1995). Computation of the Hausdorff distance between plane vector polylines.
Paper presented at the AutoCarto12 Conference, Charlotte, North Carolina.
Harrie, L., & Weibel, R. (2007). Modelling the overall process of generalisation. In W. A.
Mackaness, A. Ruas & L. T. Sarjakoski (Eds.), Generalisation of Geographic
Information: Cartographic Modelling and Applications (pp. 67-87). Elsevier.
Hoppe, H. (1996). Progressive meshes. Paper presented at the ACM SIGGRAPH Conference.
Huttenlocher, D., Klanderman, G., & Rucklidge, W. (1993). Comparing images using the
Hausdorff distance. IEEE Transactions on Pattern Analysis and Machine Intelligence,
15(9).
Iftekharuddin, K. M., & Karim, M. A. (1993). Acquisition of noise-free and noisy signal: effect of
different staring focal-plane-array pixel geometry. Paper presented at the IEEE National
Aerospace and Electronics Conference, Dayton, Ohio.
Jenks, G. F. (1979). Thoughts on line generalization. Paper presented at the AutoCarto 4
Conference, Reston, Virginia.
Jenks, G. F. (1989). Geographic logic in line generalization. Cartographica: The International
Journal for Geographic Information and Geovisualization, 26(1), 27-42.
Kamgar-Parsi, B., Kamgar-Parsi, B., & Sander, W. A., III. (1989). Quantization error in spatial
sampling: comparison between square and hexagonal pixels. Paper presented at the
Computer Vision and Pattern Recognition Conference.
Kazemi, S., Lim, S., & Paik, H. (2009). Generalisation expert system (GES): a knowledge-based
approach for generalisation of line and polyline spatial datasets. Paper presented at the
Surveying & Spatial Sciences Institute Biennial International Conference, Adelaide,
South Australia.
Knauer, C., Löffler, M., Scherfenberg, M., & Wolle, T. (2009). The directed Hausdorff distance
between imprecise point sets. In Y. Dong, D.-Z. Du & O. Ibarra (Eds.), Algorithms and
Computation (Vol. 5878, pp. 720-729). Berlin & Heidelberg: Springer.
Lang, T. (1969). Rules for robot draughtsmen. The Geographical Magazine, 42(1), 50-51.
Lecordix, F., Plazanet, C., & Lagrange, J. P. (1997). A platform for research in generalization:
application to caricature. GeoInformatica, 1(2), 161-182.
Li, Z. (1996). Transformation of spatial representation in scale dimension: a new paradigm for
digital generalization of spatial data. International Archives of Photogrammetry and
Remote Sensing, 31, 453-458.
Li, Z. (2007). Algorithmic Foundation of Multi-Scale Spatial Representation. Boca Raton,
London, New York: CRC Press.
Li, Z., & Openshaw, S. (1990). A natural principle of objective generalization of digital map data
and other spatial data. RRL Research Report: CURDS, University of Newcastle upon
Tyne.
Li, Z., & Openshaw, S. (1992). Algorithms for automated line generalization based on a natural
principle of objective generalization. International Journal of Geographical Information
Systems, 6(5), 373-389.
Li, Z., & Openshaw, S. (1993). A natural principle for the objective generalization of digital
maps. Cartography and Geographic Information Science, 20(1), 19-29.
Li, Z., & Su, B. (1995). From phenomena to essence: envisioning the nature of digital map
generalisation. The Cartographic Journal, 32(1), 45-47.
Llanas, B. (2005). Efficient computation of the Hausdorff distance between polytopes by exterior
random covering. Computational Optimization and Applications, 30, 161-194.
Mandelbrot, B. (1982). The Fractal Geometry of Nature. San Francisco: Freeman.
Marino, J. (1979). Identification of characteristic points along naturally occurring lines: an
empirical study. The Canadian Cartographer, 16(1), 70-80.
McMaster, R. B. (1986). A statistical analysis of mathematical measures for linear simplification. The
American Cartographer, 13(2), 103-116.
McMaster, R. B. (1987). Automated line generalization. Cartographica, 24(2), 74-111.
McMaster, R. B. & Shea, K. S. (1988). Cartographic generalization in a digital environment: a
framework for implementation in a geographic information system. Paper presented at the
GIS/LIS'88 Conference, San Antonio, Texas.
McMaster, R. B. & Shea, K. S. (1992). Generalization in digital cartography. Washington, D.C.:
Association of American Geographers.
109
McMaster, R. B., & Veregin, H. (1991). Visualizing cartographic generalization. Paper presented
at the AutoCarto 10 Conference, Baltimore, Maryland.
Meer, P., Sher, C. A., & Rosenfeld, A. (1990). The chain pyramid: hierarchical contour
processing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(4),
363-376.
Mendenhall, W., Beaver, R. J., & Beaver, B. M. (2006). Introduction to Probability and Statistics
(12 ed.). Belmont, California: Duxbury, Thomson Brooks/Cole.
Mersereau, R. M. (1978). Two-dimensional signal processing from hexagonal rasters. Paper
presented at the IEEE International Conference on Acoustics, Speech, and Signal
Processing.
Mersereau, R. M. (1979). The processing of hexagonally-sampled two-dimensional signals.
Proceedings of the IEEE, 67(6), 930-949.
Muller, J-C. (1987). Fractal and automated line generalization. The Cartographic Journal, 24(1),
27-34.
Muller, J. C. (1990). The removal of spatial conflicts in line generalization. Cartography and
Geographic Information Science, 17(2), 141-149.
Nell, A. L. (1989). Hexagonal image processing. Paper presented at the Southern African
Conference on Communications and Signal Processing, Stellenbosch, South Africa.
Nickerson, B. G. (1988). Automated cartographic generalization for linear features.
Cartographica, 25(3), 15-66.
Normant, F., & Tricot, C. (1993). Fractal simplification of lines using convex hulls. Geographical
Analysis, 25(2), 118-129.
Nyquist, H. (1928). Certain topics in telegraph transmission theory. Transactions of the American
Institute of Electrical Engineers, 47(2), 617-644.
Perkal, J. (1965). An attempt at objective generalization. Michigan Inter-University Community
of Mathematical Geographers, Discussion Paper 10.
Peucker, T. (1976). A theory of the cartographic line. International Yearbook of Cartography, 16,
134-143.
Peuquet, D. J. (2002). Representations of space and time. New York: Guilford Press.
Plazanet, C. (1995). Measurement, characterization and classification for automated line feature
generalization. Paper presented at the AutoCarto Conference, Charlotte, North Carolina.
Puu, T. (2005). On the genesis of hexagonal shapes. Networks and Spatial Economics, 5(1), 5-20.
Ramer, U. (1972). An iterative procedure for the polygonal approximation of plane curves.
Computer Graphics and Image Processing, 1, 244-256.
Raposo, P. (2010). Piece by Piece: A Method of Cartographic Line Generalization Using Regular
Hexagonal Tessellation. Paper presented at the ASPRS/CaGIS 2010 Fall Specialty
Conference, AutoCarto 2010, Orlando, Florida.
Ratajski, L. (1967). Phénomène des points de généralisation. International Yearbook of
Cartography, 7, 143-152.
Robinson, A. H., Morrison, J. J., Muehrcke, P. C., Kimerling, A. J., & Guptill, S. C. (1995).
Elements of Cartography (6 ed.): Wiley.
Rosin, P. L. (1992). Representing curves at their natural scales. Pattern Recognition, 25(11),
1315-1325.
Rossignac, J. (2004). Surface Simplification and 3D Geometry Compression. In J. E. Goodman &
J. O'Rourke (Eds.), Handbook of Discrete and Computational Geometry (2 ed.). Boca
Raton, Florida: Chapman & Hall/CRC.
Rossignac, J., & Borrel, P. (1993). Multi-resolution 3D approximations for rendering complex
scenes Geometric Modeling in Computer Graphics (pp. 445-465). Berlin: SpringerVerlag.
Ruas, A. (2002). Les problématiques de l'automatisation de la généralisation. In A. Ruas (Ed.),
Généralisation et représentation multiple (pp. 75-90). Hermès.
Rucklidge, W. (1996). Efficient visual recognition using the Hausdorff distance. Berlin: Springer
Verlag.
Rucklidge, W. (1997). Efficiently locating objects using the Hausdorff distance. International
Journal of Computer Vision, 24(3), 251-270.
Saalfeld, A. (1999). Topologically Consistent Line Simplification with the Douglas-Peucker
Algorithm. Cartography and Geographic Information Science, 26(1), 7-18.
Sarjakoski, L. T. (2007). Conceptual Models of Generalisation and Multiple Representation. In
W. A. Mackaness, A. Ruas & L. T. Sarjakoski (Eds.), Generalisation of Geographic
Information: Cartographic Modelling and Applications (pp. 11-36). Singapore: Elsevier,
on behalf of the International Cartographic Association.
Savary, L., & Zeitouni, K. (2005). Automated linear geometric conflation for spatial data
warehouse integration process. Paper presented at the Associated Geographic
Information Laboratories Europe (AGILE) Conference, Estoril, Portugal.
Scholten, D. K., & Wilson, S. G. (1983). Chain coding with a hexagonal lattice. IEEE
Transactions on Pattern Analysis and Machine Intelligence, PAMI-5(5), 526-533.
Shannon, C. E. (1948). A Mathematical Theory of Communication. The Bell System Technical
Journal, 27, 379-423.
Shea, K. S., & McMaster, R. B. (1989). Cartographic generalization in a digital environment:
When and how to generalize. Paper presented at the AutoCarto 9 Conference, Baltimore,
Maryland.
Simley, J. D., & Carswell Jr., W. J. (2009). The National Map - Hydrography: U.S. Geological
Survey Fact Sheet. (3054). Retrieved from
http://pubs.usgs.gov/fs/2009/3054/pdf/FS2009-3054.pdf.
Skopeliti, A., & Tsoulos, L. (2001). A knowledge based approach for the generalization of linear
features. Paper presented at the International Cartographic Conference, Beijing, China.
Speiss, E. (1988). Map compilation. In R. W. Anson (Ed.), Basic Cartography. London: Elsevier.
Stoter, J., Smaalen, J. v., Bakkerand, N., & Hardy, P. (2009). Specifying map requirements for
automated generalization of topographic data. The Cartographic Journal, 46(3), 214-227.
Thapa, K. (1988a). Automatic line generalization using zero-crossings. Photogrammetric
Engineering and Remote Sensing, 54, 511-517.
Thapa, K. (1988b). Critical points detection and automatic line generalisation in raster data using
zero-crossings. The Cartographic Journal, 25(1), 58-68.
Tobler, W. R. (1987). Measuring spatial resolution. Paper presented at the International
Workshop On Geographic Information Systems, Beijing, China.
Töpfer, F., & Pillewizer, W. (1966). The principles of selection. The Cartographic Journal, 3(1),
10-16.
Trenhaile, A. S. (2007). Geomorphology: a Canadian Perspective (3 ed.). Toronto: Oxford
University Press.
Unger, S. H. (1958). A computer oriented toward spatial problems. Proceedings of the IRE,
46(10), 1744-1750.
Van Der Poorten, P. M., & Jones, C. B. (2002). Characterisation and generalisation of
cartographic lines using Delaunay triangulation. International Journal of Geographical
Information Science, 16(8), 773 - 794.
Veltkamp, R. C. (2001). Shape matching: similarity measures and algorithms. Paper presented at
the International Conference on Shape Modeling & Applications, Genova, Italy.
Veltkamp, R. C., & Hagedoorn, M. (2000). Shape similiarity measures, properties, and
constructions. Paper presented at the 4th International VISUAL 2000 Conference.
Veregin, H. (1999). Line simplification, geometric distortion, and positional error.
Cartographica, 36(1), 25-39.
Veregin, H. (2000). Quantifying positional error induced by line simplification. International
Journal of Geographical Information Science, 14(2), 113-130.
Visvalingam, M., & Whyatt, J. (1993). Line generalisation by repeated elimination of points. The
Cartographic Journal, 30(1), 46-51.
Wang, Z., & Muller, J. (1998). Line generalization based on analysis of shape characteristics.
Cartography and Geographic Information Science, 25(1), 3-15.
Weed, J., & Polge, R. (1984). An efficient implementation of a hexagonal FFT. Paper presented at
the IEEE International Conference on Acoustics, Speech, and Signal Processing
(ICASSP) '84, San Diego, California.
Weibel, R. (1997). Generalization of spatial data: Principles and selected algorithms. In M. van
Kreveld, J. Nievergelt, T. Roos & P. Widmayer (Eds.), Algorithmic Foundations of
Geographic Information Systems (Vol. 1340, pp. 99-152): Springer Berlin / Heidelberg.
White, E. R. (1985). Assessment of line-generalization algorithms using characteristic points.
Cartography and Geographic Information Science, 12(1), 17-28.
Yajima, S., Goodsell, J. L., Ichida, T., & Hiraishi, H. (1981). Data Compression of the Kanji
Character Patterns Digitized on the Hexagonal Mesh. IEEE Transactions on Pattern
Analysis and Machine Intelligence, PAMI-3(2), 221-230.
Yang, S.-K., & Chuang, J.-H. (2003). Material-discontinuity preserving progressive mesh using
vertex-collapsing simplification. Virtual Reality, 6(4), 205-216.
Zhan, B., & Buttenfield, B. (1996). Multi-scale representation of a digital line. Cartography and
Geographic Information Science, 23(4), 206-228.