The Pennsylvania State University
The Graduate School
Department of Geography, College of Earth and Mineral Sciences

SCALE-SPECIFIC AUTOMATED MAP LINE SIMPLIFICATION BY VERTEX CLUSTERING ON A HEXAGONAL TESSELLATION

A Thesis in Geography
by
Paulo Raposo

2011 Paulo Raposo

Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science

August 2011

The thesis of Paulo Raposo was reviewed and approved* by the following:

Cynthia A. Brewer
Professor of Geography
Thesis Advisor

Donna Peuquet
Professor of Geography

Karl Zimmerer
Professor of Geography
Head of the Department of Geography

*Signatures are on file in the Graduate School.

ABSTRACT

Despite decades of research, effective automated line simplification methods retaining cartographic nuance remain elusive. Solutions to this problem have also become increasingly important with the development and prevalence of web cartography. Most existing algorithms involve parameters selected arbitrarily or heuristically, with little reference to the scale change between the original data and the desired generalized representation. This research presents the hexagonal quantization algorithm, a new method by which regular hexagonal tessellations of variable scale are used to sample cartographic lines for simplification. Hexagon width, reflecting sampling fidelity, is varied in proportion to target scale, thereby allowing for cartographic scale-specificity. Tesserae then constitute loci among which a new set of vertices can be defined by vertex clustering quantization, and this set of vertices is used to compose a generalized correlate of the input line which is appropriate for its intended mapping scale. Hexagon scaling is informed by sampling theory, drawn both from the field of geography and from other cognate signal-processing fields such as computer science and engineering.

The present study also compares the hexagonal quantization algorithm to the Li-Openshaw raster-vector algorithm, which undertakes a similar process using traditional square raster cells. The lines produced by either algorithm using the same tessera width are objectively compared for fidelity to the original line in two ways: spatial displacement from the input line is measured using Hausdorff distances, and the product lines are presented against their input lines for visual inspection. Results show that hexagonal quantization offers appreciable advantages over the square tessellations of traditional raster cells for vertex clustering line simplification, in that product lines are less displaced from input lines. It is found that product lines from the hexagonal quantization algorithm generally maintained shorter Hausdorff distances than did those from the Li-Openshaw raster-vector algorithm. Also, visual inspection suggests lines produced by the hexagonal quantization algorithm retain informative geographical shapes for greater differences in scale than do those produced by the Li-Openshaw raster-vector algorithm. Results of this study yield a scale-specific cartographic line simplification algorithm that is readily applicable to cartographic linework.

TABLE OF CONTENTS

List of Figures
List of Tables
Acknowledgements
Chapter 1 Introduction
    Lines on Maps
    Unique Contributions
    Thesis Structure
Chapter 2 Line Simplification Literature
    Generalization
    Line Simplification
    Characteristic Points
    Segmentation and Strip Trees
    Point Reduction vs. Line Redefinition
    Constraints and Scale-Specificity
    Classes of Line Simplification Algorithms
    Survey of Cartographic Algorithms
    Algorithms Popular in Cartography
    Outside of Cartography: Vertex Clustering and Mesh Simplification
    Hexagonal and Square Tessellations Applied to Pattern Analysis and Generalization
    Hausdorff Distance
    The Hausdorff Distance vs. McMaster's Measures of Simplified Lines
    Summary
Chapter 3 The Hexagonal Quantization Algorithm and Study Methods
    Overview of the Algorithm
    Tessellation and Polyline Structure
    Steps of the Hexagonal Quantization Algorithm
    Calculation of Tessellation Resolution
    Tessellation Layout
    Vertex Clustering and Quantization
    Clustering Routine Compared to Li & Openshaw's Suggestion
    Implementation
    Sample Lines
    Experiment Design and Statistical Comparison Between Hexagonal and Square Outputs
    Experimental Design
Chapter 4 Results and Interpretations
    Resulting Line Simplifications: Visual Presentation
    Statistical Results
    Interpretations
    Discussion of Cartographic Results
    Discussion of Statistical Results
    Summary
Chapter 5 Conclusions and Future Work
    Relative Magnitude of Improvement
    Future Tessellation Variations
    Repair of Line Self-Crossings
    General Summary
Appendix A Summary Table of All Sample Lines
Appendix B Example Text Report from Software
References

LIST OF FIGURES

Figure 2.1 - Attneave's sleeping cat. (Source: Attneave, 1954, p. 185)
Figure 2.2 - Perkal's method at three different values of ε. Hatched areas are inaccessible to the roulette, and therefore dropped from the lake form. (Source: Perkal, 1965, p. 65)
Figure 2.3 - The Douglas-Peucker algorithm. (Source: McMaster & Shea, 1992, pp. 80-81)
Figure 2.4 - The Visvalingam-Whyatt algorithm. (Source: Visvalingam & Whyatt, 1993, p. 47)
Figure 2.5 - The Li-Openshaw raster-vector algorithm. The sinuous gray line represents the input line, the darker gray lines are segments within cells from entry to exit points of the input line, and the black line is the simplified line, formed from the midpoints of the darker gray lines. (Source: Weibel, 1997, p. 125)
Figure 2.6 - Mesh simplification. (Source: Dalmau, 2004)
Figure 2.7 - The three possible regular tessellations of the plane. (Source: Peuquet, 2002)
Figure 2.8 - Connectivity paradox; in triangles and squares, whether or not regions A and B are connected by the corners of cells l and m is unclear, as is whether or not gray cells form a continuous region across cells p and q. There is no such ambiguity in hexagons. (Adapted from source: Duff, Watson, Fountain, & Shaw, 1973, p. 254)
Figure 2.9 - An equilateral hexagon and square in their circumcircles. The area of the hexagon is closer to its circumcircle than is the square's to that of its circumcircle. (Source: WolframAlpha.com)
Figure 2.10 - The Hausdorff Distance in ℝ2. Line M represents the longest distance an element a of all elements A has to go to reach the closest element b. Line N represents the same, but from B (and all elements b thereof) to the closest element a. Line M is the directed Hausdorff distance from A to B, while line N is the directed Hausdorff distance from B to A. The longer of these two (M) represents the (overall) Hausdorff distance. (Figure adapted from source: http://www.mathworks.com/matlabcentral/fileexchange/26738-hausdorff-distance, graphic by Zachary Danziger)
Figure 3.1 - The hexagonal quantization algorithm. In each hexagon, the input vertices (gray) are quantized to a single output vertex (black), resulting in a simplified output line (in black).
Figure 3.2 - Hexagon width (i.e., tessera resolution).
Figure 3.3 - Sixty-degree range of rotation for regular hexagonal tessellations.
Figure 3.4 - The effect on output lines caused by shifting the tesserae. Input vertices and lines are in gray, and output vertices and lines are in red.
Figure 3.5 - Layout of hexagons using the bounding box delimiting the line. The hexagon in the north-west corner is drawn centered on the bounding box corner first, with hexagons below it drawn to follow. The second "column" of hexagons to the east is drawn next, and the process continues until the bounding box is completely covered by a hexagon on all sides.
Figure 3.6 - Constructing an output vertex (orange) for each pass (first in red, second in blue) of the input line through the hexagon.
Figure 3.7 - The two clustering methods used in this research. The midpoint of the first and last vertices method is illustrated on the left, while the spatial mean of vertices is illustrated on the right.
Figure 3.8 - Li's suggested solution for single vertex selection within cells with multiple passes of the input line - see cell at top, center. (Source: Li, 2007, p. 153)
Figure 3.9 - An effect of Li's suggested method of selecting single vertices in a cell with multiple input line passes. In this example, the application of Li's suggestion at the tessera overlapping the peninsula's connection to the mainland would cause the entire peninsula to be deleted, whereas a representation of it could be retained at this cell resolution (i.e., target scale).
Figure 3.10 - A screen shot of the graphical user interface of the software developed to implement the algorithms and the calculation of Hausdorff distances.
Figure 3.11 - Locations of the 34 sample lines used in this research. Coast and shore lines are indicated in italics. (Background hypsometric tint courtesy of Tom Patterson, source: NaturalEarthData.com)
Figure 4.1 - All 34 lines simplified by the hexagonal quantization algorithm to 1:500,000 and drawn to scale.
Figure 4.2 - Simplifications of a portion of the coast of Maine produced by both the hexagonal quantization algorithm (purple) and the Li-Openshaw raster-vector algorithm (green) using the spatial mean quantization option, against the original line (gray). All lines drawn to 1:24,000.
Figure 4.3 - Simplifications of a portion of the coast of Maine produced by both the hexagonal quantization algorithm (purple) and the Li-Openshaw raster-vector algorithm (green) using the midpoint first and last vertices quantization option, against the original line (gray). All lines drawn to 1:24,000.
Figure 4.4 - Simplifications of a portion of the coast of the Alaskan Peninsula produced by both the hexagonal quantization algorithm (purple, left) and the Li-Openshaw raster-vector algorithm (green, right) using the spatial mean quantization option, against the original line (gray). All lines drawn to 1:24,000.
Figure 4.5 - Simplifications of a portion of the coast of the Alaskan Peninsula produced by both the hexagonal quantization algorithm (purple, left) and the Li-Openshaw raster-vector algorithm (green, right) using the midpoint first and last vertices quantization option, against the original line (gray). All lines drawn to 1:24,000.
Figure 4.6 - Portion of the coast of Newfoundland, simplified to seven target scales by the hexagonal quantization algorithm using the midpoint first and last vertices quantization option.
Figure 4.7 - Portion of the coast of Newfoundland, simplified to seven target scales by the Li-Openshaw raster-vector algorithm using the midpoint first and last vertices quantization option.
Figure 4.8 - Portion of the Humboldt River, simplified to 1:150,000 by both algorithms using both quantization options. The orange box signifies the location of the 1:24,000 segment (at top) on the simplified lines.
Figure 4.9 - Portion of the Mississippi Delta coastline, simplified to 1:250,000 by both algorithms using both quantization options.
Figure 4.10 - Portion of the shore of the Potomac River, simplified to 1:500,000 by both algorithms using both quantization options. The orange box signifies the location of the 1:24,000 segment (at top-center) on the simplified lines.
Figure 4.11 - Quantile-Quantile plots for mean Hausdorff distances across hexagonal and square samples, using the spatial mean quantization option.
Figure 4.12 - Quantile-Quantile plots for mean Hausdorff distances across hexagonal and square samples, using the midpoint first and last vertices quantization option.
Figure 5.1 - "Untwisting" line self-crossings. The routine iterates through all line segments, checking for intersections with other line segments. When one is found, the sequence of vertices starting from the second vertex of the first line segment until the first vertex of the second line segment is reversed. The process repeats from the beginning of the line, "untwisting" self-crossings one at a time, until no more are detected.

LIST OF TABLES

Table 2.1 - Distances and areas for different regular tessellation geometries. (Source: Duff et al., 1973, p. 245)
Table 2.2 - Required properties of a true mathematical metric. (Source: Veltkamp & Hagedoorn, 2000, p. 468)
Table 4.1 - Mean Hausdorff distances (in ground meters) between simplified and input vertices. Each mean Hausdorff distance is calculated from n = 34 simplified lines and their related input lines.
Table 4.2 - Pearson correlation coefficients for differences in means observed between the hexagonal and square algorithms, using the midpoint first and last vertices quantization option.
Table 4.3 - T test statistics across seven scales for the difference in mean Hausdorff distances between square and hexagonal algorithms using the midpoint first and last vertices quantization option.
Table 4.4 - Pearson correlation coefficients for differences in means observed between the hexagonal and square algorithms, using the spatial mean quantization option.
Table 4.5 - T test statistics across seven scales for the difference in mean Hausdorff distances between square and hexagonal algorithms using the spatial mean quantization option.
Table 4.6 - Related-samples Wilcoxon signed rank statistics.
Table 4.7 - Three-way ANOVA test statistics across all 952 simplifications and three factors.
Table 4.8 - Mean percent reductions in vertices from the input line, averaged across all 34 sample lines, for each algorithm and each quantization option.
ACKNOWLEDGEMENTS

I would like to thank my advisor Dr. Cynthia Brewer for all her invaluable support and guidance, both on my thesis work and other projects. I also wish to thank the Department of Geography in general, for having provided me with an excellent environment and community of scholars within which to learn and work. I've had several thought-provoking and inspiring conversations with Donna Peuquet about this research, and she has served as thesis reader - thank you! I am indebted both to Professor Krzysztof Janowicz and to my fellow graduate student Alexander Savelyev for their teaching and help with programming in Java. Also, conversations with the following persons have helped me develop the ideas in this thesis, and I thank them each: Barbara Buttenfield, Charlie Frye, Zhilin Li, and Alan Saalfeld.

Chapter 1
Introduction

This research has set out to develop a scale-specific algorithm for cartographic line simplification that uses two-dimensional regular hexagonal tessellations and a vertex clustering quantization technique. In development of the algorithm, this research has had two main goals: to implement classical sampling theory and map resolution theory in service to cartographic line simplification in order to achieve scale-specificity, and to demonstrate that hexagonally tessellated sampling performs with greater fidelity to original lines than do the square tessellations of traditional rasters. The algorithm developed, termed "hexagonal quantization," has been implemented in original software. It has then been compared to the Li-Openshaw raster-vector algorithm, which also uses a regular tessellation and vertex clustering technique, but with square raster cells. The essential differences between the two algorithms are the geometry of the tessellation used and the formulae by which tessera dimensions are calculated in relation to target scale. Using constant input lines and tessera widths, the distances between the lines produced by each algorithm and the input line are compared. A simple formula based on map resolution at target scale has been developed for use with the hexagonal quantization algorithm to permit scale-specificity.

Lines on Maps

One of the most fundamental notations one can make when drawing any kind of diagram or sketch is a line. Chaining lines together or making one end where it began builds any kind of polygonal representation in a sketch. Maps, of course, are like any other kind of sketch in which lines figure strongly. In mathematics, a line is frequently regarded as the set of all points that lie on the path defined by a function. By this definition, a line may exist in anywhere from one- to infinite-dimensional space (i.e., ℝ1, ℝ2, ..., ℝ∞), and is composed of an infinite set of points within the range of the function. Such a definition mimics the behavior of a line in the real world, in that the number of points along a real line is limited only by the precision with which the line can be observed or measured. Some lines in contemporary geographic information systems and cartography are defined in this mathematical way, such as Bezier curves, which draw a curve that meets certain smoothness criteria by first deriving a mathematical function for it. These, however, are rare, and most GIS and cartographic lines are defined as polylines. Polylines are defined by finite sets of points, between which straight-line segments are sequentially chained to build a linear feature.
Strictly speaking, there is no curvature in polylines, but because straight segments can meet at vertices at variable angles, the overall form can follow or mimic curves. The large majority of contemporary digital "lines," whether in cartography or any other form of digital graphics, exist in this form. A reason for this may be the relative simplicity of these lines over those defined by mathematical functions, both in terms of conceptualization and in digital creation; it is generally easier to digitize a line by a series of points than to try finding a mathematical function that satisfactorily models a real-world linear feature. Another reason is the way in which sequenced sets of points in a polyline are easily encoded and manipulated in the form of programming language arrays.

With polylines being the de facto standard for lines in a GIS, manipulation and rendering of map lines is a matter of computation on the set of points (or the set of line segments between points) that define them. There are significant implications for multi-scale representation in this fact, and these mainly reflect how the polyline model relates to the real-world feature it represents. For example, vertices along polylines exist with a frequency or density that can be measured in several ways, such as the number of points per unit polyline length, or the mean distance between points. Vertex density in a line is a measure of the precision with which (or resolution to which) that line is defined. In cartography, that precision is usually closely or directly related to the cartographic scale at which the data are collected or meant to be drawn.
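To make these measures concrete, the short sketch below treats a polyline as a plain array of coordinate pairs and computes the two density measures just mentioned. This is an illustrative example only (the thesis software itself was written in Java); the toy coordinates are arbitrary and assumed to be in projected, metric units.

```python
import math

def polyline_length(vertices):
    """Total length of a polyline given as a sequence of (x, y) tuples."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))

def vertex_density(vertices):
    """Number of vertices per unit of polyline length."""
    return len(vertices) / polyline_length(vertices)

def mean_vertex_spacing(vertices):
    """Mean distance between consecutive vertices."""
    return polyline_length(vertices) / (len(vertices) - 1)

# A short example polyline (total length 13 units).
line = [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0), (9.0, 0.0)]
print(vertex_density(line))       # 4 / 13  ≈ 0.31 vertices per metre
print(mean_vertex_spacing(line))  # 13 / 3  ≈ 4.33 m between vertices
```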
Based on such an understanding of polylines in maps, cartographic line simplification may be defined as the transformation of the set of points that define a polyline to a new set defining a new polyline which represents the input line at reduced symbol dimensions and with reduced detail.

The work presented in this thesis deals exclusively with line simplification, being one of many operators (i.e., logical, geometric and graphical processes which transform spatial data) involved in cartographic generalization. Line simplification in this work is understood as a problem set in computational geometry regarding the transformation of sets of points which define polylines in digital cartographic data. Specifically, this thesis presents an original, automatable line simplification algorithm, with comparison and commentary on how it performs against a similar algorithm: the raster-vector permutation (Li, 2007) of Li and Openshaw's (1993) natural principle algorithm.

Motivation for the development of the hexagonal quantization algorithm has come from several sources. Chief among these has been the belief that scale is the single most important factor driving the need to simplify cartographic linework. As map scale decreases, less space is available for symbolic representation of a sinuous linear landscape feature. While this is an obvious fact well known to cartographers, most techniques for line simplification presently used are entirely uncoupled from the notion of scale. Whereas cartographers frequently refer to technical specifications that clearly describe the desired linework qualities at given map scales, the input parameters of most line simplification algorithms refer to metrics that cannot be objectively related to a specific scale. Examples of these algorithms include the Douglas-Peucker (1973) and Visvalingam-Whyatt (1993) algorithms. While scale-specific line simplification algorithms have been developed (such as those by Perkal (1966), Li and Openshaw (1993), and Dutton (1999)), they have not yet enjoyed popularity in implementation. Several reasons may exist for this, including performance, relative complexity, and availability in commercial software. Regardless of how common these scale-specific algorithms are, the author believes they each conceive of the cartographic problem properly, and the algorithm presented here is in the same vein. It is hoped that the performance, relative simplicity and availability of the presented algorithm help to bring attention to scale-specific multi-scale representation and generalization methods.

Further motivation to develop the algorithm described here was provided by the desire to demonstrate an alternative geometric conception of the cartographic line simplification problem. Many approaches to line simplification among cartographers have revolved around the notion of characteristic points in a line, and the importance of retaining these. Characteristic points are defined as those along a line that, as a subset, make an effective abstract gestalt of the line (these will be discussed in greater depth in the literature review chapter). The author contends that while characteristic points can be identified in any given map line, their qualification as characteristic does not necessarily hold as scale is reduced, and thus that their retention in line simplification uninformed by scale is a flawed approach. The present algorithm uses a different approach, in that the input line is sampled using a regular tessellation upon which all vertices of the input line are weighted equally, with none regarded as more characteristic of the input line than any other. Rather than seek to retain certain input vertices, the present algorithm seeks to follow the many input line vertices as closely as possible within a certain spatial resolution defined in direct relation to target scale.

Unique Contributions

The hexagonal quantization algorithm developed and presented in this research represents an effort in a new paradigm of scale-driven generalization operators as described by Li (1996). An important element of the algorithm is its scale-specificity, and the development of that quality in this research is unique in its direct reference to representational (i.e., visual) resolution at target scale, informed by sampling theory. The algorithm is essentially a cartographic application of vertex clustering, a generalization technique employed in computer graphics research outside cartography, in that it reduces line vertices specific to each tessera of an imposed tessellation. The process of reduction undergone in each tessera is known in the signal processing literature as quantization (Rossignac, 2004, p. 1224). Though neither the term "vertex clustering" nor "quantization" is used by Li and Openshaw, essentially the same process occurs in the raster-vector mode of their algorithm (Li & Openshaw, 1992). Two essential differences exist between the algorithm presented here and the Li-Openshaw raster-vector algorithm: tessellation geometry, and the mathematical means of objectively relating tessera dimensions to target scale. The Li-Openshaw method essentially performs vertex collapse within the square pixels of traditional raster structures (i.e., regular square tessellations), whereas the algorithm presented here does the same in a regular hexagonal tessellation.
The hexagonal tessellation is chosen for its radial symmetry and uniform inter-tessera topology and distances. Also, whereas sound guidelines for estimation of raster cell size in relation to target scale are given by Li (2007, p. 65) for use in the raster-vector Li-Openshaw algorithm, the present research offers formulae for the direct calculation of scale-specific, appropriate tessera dimension derived from resolution theory.
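The core of the vertex-clustering idea just described can be sketched compactly. The following is a minimal illustrative sketch in Python, not the thesis's Java implementation: it assigns each polyline vertex to a cell of a flat-topped regular hexagonal grid and collapses each consecutive run of vertices falling within one hexagon (i.e., each pass of the line through that hexagon) to a single output vertex, using either of the two quantization options examined later in the thesis (the spatial mean of the run, or the midpoint of its first and last vertices). The `width` parameter, its interpretation as the corner-to-corner width of a tessera, the flat-topped orientation, and all function names are assumptions made for illustration; the thesis itself derives tessera width from target scale and also handles tessellation placement, rotation, and endpoint treatment, none of which is reproduced here.

```python
import math

def hex_cell(x, y, width):
    """Axial (q, r) index of the flat-topped hexagon containing (x, y).

    `width` is taken here to be the corner-to-corner width of a hexagon,
    i.e., twice its circumradius (an assumption made for this sketch).
    """
    size = width / 2.0                                  # circumradius
    q = (2.0 / 3.0) * x / size                          # fractional axial coordinates
    r = (-x / 3.0 + (math.sqrt(3.0) / 3.0) * y) / size
    return _axial_round(q, r)

def _axial_round(q, r):
    """Round fractional axial coordinates to the nearest whole hexagon."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return int(rq), int(rr)

def hex_quantize(vertices, width, use_spatial_mean=True):
    """Collapse each consecutive run of vertices sharing a hexagon to one output vertex."""
    output, run, current = [], [], None
    for v in vertices:
        cell = hex_cell(v[0], v[1], width)
        if cell != current and run:                     # line has left the previous hexagon
            output.append(_quantize(run, use_spatial_mean))
            run = []
        current = cell
        run.append(v)
    if run:
        output.append(_quantize(run, use_spatial_mean))
    return output

def _quantize(run, use_spatial_mean):
    if use_spatial_mean:                                # spatial mean of the run's vertices
        return (sum(x for x, _ in run) / len(run),
                sum(y for _, y in run) / len(run))
    (x0, y0), (x1, y1) = run[0], run[-1]                # midpoint of first and last vertices
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

# Example call; the width value here is arbitrary, in ground metres:
# simplified = hex_quantize(line_vertices, width=150.0)
```

Swapping `hex_cell` for a simple square binning such as `(math.floor(x / width), math.floor(y / width))` would give a rough analogue of the vertex-collapse step of the Li-Openshaw raster-vector approach, which is what the comparisons later in the thesis set against the hexagonal version.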
Thesis Structure

The remainder of this thesis is laid out in four chapters. The Line Simplification Literature chapter reviews ideas on the subject put forth by cartographers over the last five decades, as well as topics relevant to this research; these include computational geometry, hexagonal tessellations, and signal processing applications. The next chapter, The Hexagonal Quantization Algorithm and Study Methods, explains in detail how the algorithm operates. This chapter also describes the methods by which the hexagonal quantization algorithm was implemented and then tested against an implementation of the Li-Openshaw raster-vector algorithm, as well as the metric by which positional deviation from the input line was measured for both algorithms. The Results and Interpretations chapter documents the data produced and gathered by the research, and then discusses these. The thesis is concluded by the Conclusions and Future Work chapter, which contains observations and future plans regarding the research.

Chapter 2
Line Simplification Literature

Line simplification has been an important topic in the cartographic literature for decades. That importance is also apparent in the literature of other fields, where the motivation and problem formulation may be different, but where the essential computational geometry, extracting pattern and form at one scale of measurement from a linear signal machine-retrieved at a higher (or noisier) scale of measurement, bears many similarities. Particularly noteworthy among these fields are signal processing, pattern detection, and computer graphics. Across these fields as well as in cartography, interest in automated line simplification has been driven by the objective of reformulating a signal (e.g., a map line, or lines in a computer-read image) for some other scale of representation, or of producing a simplified correlate of a line.

Literature from several fields is reviewed in this chapter. In particular, automated line simplification is briefly considered in light of the larger topic of cartographic generalization, a field of research that spans several types of procedures (i.e., operators) on map data, with the simplification of lines being among them. Line simplification has held a seemingly privileged place in the generalization literature, ostensibly because most geospatial data in vector form is composed of some type of polyline, with the exception perhaps only of point data. This review then seeks to describe some of the most interesting and popular conceptualizations of digital cartographic lines and their simplification. Combined with these cartographic views, several algorithms developed within the cartographic community are noted, with brief descriptions of their essential workings. The review then generally departs from explicitly cartographic literature to discuss simplification solutions from cognate fields such as computer graphics and signal processing. In particular, the quantization process in computer graphics known as vertex clustering is examined, in part to observe precedent to, and document consilience with, certain parts of the hexagonal quantization algorithm in this research. This discussion includes cellular or cell-like geometric shapes within which vertex clustering can be defined. The review then shifts to literature from various signal-processing fields that almost unanimously extols the benefits of hexagonal sampling lattices over square ones. Several authors describe the unique properties of regular hexagonal tessellation in ℝ2 (i.e., the two-dimensional Euclidean plane), and how these permit the collection of data samples that are more efficient in representation, less error-prone, and less anisotropic than samples collected with sensors (i.e., cells, pixels) arranged in regular square grids; these claims are corroborated by quantitative measures. Finally, the review moves to the Hausdorff distance, a metric used by geometers to measure the distance (i.e., difference) between two sets of objects. Mathematically speaking, the metric is applicable in any metric space; it is used in this research as an objective means of output line evaluation, quantifying the maximum displacement of simplified lines from their detailed input lines across the ℝ2 surface of a projected map.
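Since the Hausdorff distance is the evaluation metric used throughout the later chapters, a minimal sketch of it may be useful here. The code below is an illustrative example, not the thesis software: it computes the discrete, vertex-to-vertex form of the metric between two vertex sets, following the directed and overall distances described for Figure 2.10; the sample coordinates are arbitrary. (SciPy's `scipy.spatial.distance.directed_hausdorff` offers a compiled equivalent of the directed computation for larger inputs.)

```python
import math

def directed_hausdorff(A, B):
    """h(A, B): the largest distance from any vertex of A to its nearest vertex in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Overall (symmetric) Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Toy example: an input line's vertices and a crudely simplified version of it.
original   = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.5), (3.0, 2.5), (4.0, 0.0)]
simplified = [(0.0, 0.0), (2.0, 1.0), (4.0, 0.0)]
print(hausdorff(original, simplified))  # ≈ 1.80 for this toy pair
```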
Generalization

Weibel (1997, p. 101) offers the following description:

    In cartography, the process which is responsible for cartographic scale reduction is termed generalization (or map generalization, or cartographic generalization). It encompasses a reduction of the complexity in a map, emphasizing the essential while suppressing the unimportant, maintaining logical and unambiguous relations between map objects, and preserving aesthetic quality. The main objective then is to create maps of high graphical clarity, so that the map image is easily perceived and the message that the map intends to deliver can be readily understood.

It is apparent from the passage above that there exists no singular, objectively correct way to generalize maps. Further, the broad objective given by Weibel refers to the achievement of certain qualities in generalized maps, namely clarity and ease of perception, that are inherently difficult to measure. These points about the generalization process, along with others such as the challenge of encoding cartographically acceptable representation decisions in strict logic for computer automation, illustrate the circumstances under which cartographers have both repeatedly conceived of generalization and recommended how it should be done.

Usually, it is agreed that generalization is a process that takes place in order to use existing map data from some larger cartographic scale on smaller-scale maps. By that definition, generalization is scale-driven (Dutton, 1999; Li, 1996). The representation changes that generalization calls for are negotiations of the reduced map area on which features must be modeled and depicted while still retaining acceptable positional accuracy. Successful generalization has been described by Ruas (2002, p. 75) as a synthesis process: the number or complexity of symbols used is reduced, while the chief pieces of information and character of the original are retained in a generalization that clearly conveys information. The process of making any map at scales smaller than 1:1 involves generalization, in that the real-world objects being mapped cannot be represented in their full detail. Generalization, then, is essential to understanding a map (Bertin, 1983). Cartographic generalization, in the sense that cartographers commonly refer to it, is an additional step of abstraction whereby existing representations (e.g., polylines representing rivers) are further abstracted. Ratajski (1967) draws a distinction between quantitative (i.e., numbers of features represented) and qualitative generalization (i.e., abstraction of form). Similarly, Bertin (1983) describes conceptual and structural types of generalization, being morphological redefinition of features in the former and diminution of frequencies of occurrence in the latter.

Cartographers do not unilaterally agree, however, on the understanding that generalization is a scale-driven process and, by extension, on what should be expected of its successful invocation. Without necessarily requiring a change in scale, generalization can be viewed as the work necessary when a map fails to "maintain clarity, with appropriate content, at a given scale, for a chosen map purpose and intended audience" (McMaster & Shea, 1988, p. 242). Still others regard data reduction as an additional explicit goal of generalization (Cromley, 1992), an objective that is partly a relic of early map computerization and concerns about efficient use of digital memory. Reflecting this variety of viewpoints, there are several somewhat divergent descriptions and sets of requirements for generalization in the literature, both in textbooks and in scholarly journals (examples in Brassel & Weibel, 1988; Li, 2007; McMaster & Shea, 1992; Sarjakoski, 2007; Stoter, Smaalen, Bakker, & Hardy, 2009). Robinson et al. (1995) describe the collection of map scale, map purpose, graphic limitations and data quality as the appropriate controls of the generalization process. Several authors have created taxonomies of situations that call for generalization; notably among these, Shea and McMaster (1989, p. 58) and McMaster and Shea (1992, p. 45) describe the six conditions of congestion, coalescence, conflict, complication, inconsistency and imperceptibility.

Harrie and Weibel (2007) describe generalization as having gone through an evolution, from condition-action modeling, where cartographers respond to problems (such as those described by McMaster and Shea above), through human interaction modeling, where the process is semi-automated (as in many geoprocessing tools available through Esri's ArcGIS package), to constraint-based modeling, where automated processes are run according to parameters defined by explicit map requirements. Many of the processes undertaken to this day involve human interaction and evaluation. Part of the reason for this may be the difficulty or even impossibility of relating operator input parameters to target scales, such that operators need to be run and their products evaluated iteratively until a satisfactory map is made. Related to this fact, some authors have suggested that generalization is best undertaken with equipment that allows for the real-time observation of products (Lecordix, Plazanet, & Lagrange, 1997; R. B. McMaster & Veregin, 1991); an example of such a piece of equipment is the online line simplification tool MapShaper.org (Bloch & Harrower, 2008).

Regardless of motivation and paradigm, the process of generalization is regarded as composed of various distinct operators, being processes that conduct specific geometric modifications on specific types of data.
As examples, line simplification is an operator which modifies linear features, while amalgamation (or aggregation) is another which modifies any of point, line, polygon or cellular (i.e., raster) features. The generalization process cartographers undertake in map production is usually conceived of as a set of various operations, either employed in parallel or in sequence or in mixtures thereof, and across several datasets.

Line Simplification

Line simplification is arguably one of the most important generalization operators, since almost every map includes some form of lines. The volume and diversity of writings on this topic in the cartographic literature reflect the community's continued concern over unresolved geometric and practical issues. Even though some algorithms have been implemented and made available in commercial GIS (such as Esri's ArcGIS), no algorithms are yet considered suitable and trustworthy enough for large-scale automated map production.

Cartographers have taken various theoretical approaches to line simplification, reflecting diverse formulations of what constitutes line symbols and how they convey information, both in and of themselves as well as implemented as polylines in digital cartography. Among the most popular of these conceptualizations is that of Peucker (1976), who models polylines as potentially noisy vertex-based representations of true feature lines. In his model, the frequency of vertices along a polyline bears a relationship to how efficiently, and with how much lateral displacement, the polyline communicates the position of the true line (Peucker, 1976, p. 508):

    A line is a combination of a number of frequencies, each of which can be represented by certain bandwidths. The break-up of a line into series of bands, therefore, could be equated with the stepwise removal of the high frequencies from the line.

Elaborating on his notion, Peucker writes (p. 512):

    It is the objective of the theory to keep the computation at any time on the highest level of abstraction that the particular problem allows. The critical issue of the theory is to find the highest level in any different type of problem.

Writing in 1976, an explicit objective of Peucker's approach was to allow for the winnowing of points from what could be a noisily digitized polyline, with the degree of point reduction being tailored to the data resolution required for a given spatial analysis task (Weibel, 1997, p. 120). While Peucker did not exclude the possibility that his ideas could apply to the simplification of lines as scale reduces, it is in this sense that the algorithm he developed with Douglas (Douglas & Peucker, 1973) became widely adopted. Operations using points corresponding to various bandwidths around lines have been developed for use both in the simplification of lines (with examples to follow) and in comparing corresponding lines digitized at different point frequencies (e.g., Savary & Zeitouni, 2005).

Characteristic Points

Related to the notion of point hierarchy with respect to bandwidth has been the very popular notion of point hierarchy with respect to the varying degrees with which points represent a line. Borrowing theory developed by psychologist Fred Attneave (1954), cartographers have believed that among the set of vertices making up a polyline there exist subsets of characteristic points.
Attneave asserts that certain points can be identified along perceived linear features in anything a person can see, such that these points can be used alone and in abstract to successfully represent the real object to a human. Generally these points are those where lines have the greatest directional change, being apexes of curves and sharp points. He provides a now famous example of a sleeping cat (p. 185), drawn by connecting only 38 points with straight lines (Figure 2.1).

Marino (1979) related Attneave's ideas to existing thoughts on special points in cartography (Dent, 1972). In her influential study, she found that human subjects exhibited a high degree of consistency in selecting points along river lines as being important to retain when seeking to represent the line with a pre-defined number of vertices. Her findings were widely taken to corroborate the notion that characteristic points existed in cartographic lines, and that simplification routines that retained such points would yield symbolically optimal results.

Figure 2.1 - Attneave's sleeping cat. (Source: Attneave, 1954, p. 185)

Following Marino's research, several authors began to regard the line simplification process as one of removing extraneous points, while retaining those that were characteristic (McMaster, 1987; McMaster & Shea, 1988; Veregin, 1999; White, 1985). These scholars pointed out that the Douglas-Peucker algorithm (1973) was the best available for automatically identifying and retaining characteristic points along a line, and that the paradigm it represented should thus be continued. Further applying theory on characteristic points to cartography, Jenks (1979) defines characteristic points as being of two types: those that are relevant to perceived form (e.g., curve apexes) and those that are given particular geographic importance (e.g., where a river passes under a bridge). In later work, Jenks (1989) continues to advocate for the use of the Douglas-Peucker algorithm, though his emphasis is more in line with the original conception of the algorithm as a means of making data more efficient than with the popular assertion that the algorithm is well suited to reconstruct lines through scale change. With respect to map making, Jenks (1989, p. 34) makes a recommendation that has still not been successfully implemented today: that characteristic points should be differentially selected with respect to map purpose and scale.
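Because the Douglas-Peucker algorithm recurs throughout this discussion (and is described in more detail later in this chapter), a common textbook formulation of it is sketched below for reference. This is an illustrative rendering, not the thesis software; its `tolerance` parameter is expressed in map units and, as the surrounding text emphasizes, has no objective relation to a target scale.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through points a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    if (ax, ay) == (bx, by):
        return math.dist(p, a)
    num = abs((bx - ax) * (ay - py) - (ax - px) * (by - ay))
    return num / math.dist(a, b)

def douglas_peucker(vertices, tolerance):
    """Keep the first and last vertices; recursively keep the vertex farthest
    from the chord whenever its offset exceeds the tolerance."""
    if len(vertices) < 3:
        return list(vertices)
    first, last = vertices[0], vertices[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(vertices) - 1):
        d = perpendicular_distance(vertices[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]
    left = douglas_peucker(vertices[:index + 1], tolerance)
    right = douglas_peucker(vertices[index:], tolerance)
    return left[:-1] + right
```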
Segmentation and Strip Trees

Beyond treating certain points in lines as special, several scholars have also suggested that distinctions should be made between certain lengths in lines. Arguing reasonably that lines may exhibit decidedly different morphologies at different positions along their length (e.g., a river that follows a jagged course through rough terrain and then gently meanders through plains), some have suggested that local morphology should drive the degree to which simplification is carried out, as well as possibly which routines are used. A strong proponent of this approach has been Buttenfield (1985, 1989, 1991), and her position has been echoed by Cromley and Campbell (1992), Dutton (1999, p. 36), Plazanet (1995), and García and Fdez-Valdivia (1994). According to Buttenfield, lines can be characterized by their structure signature, consisting of geometric measures observed on hierarchically divided lengths of the line (1991, p. 152). She goes on to say "the structure signature's purpose is to determine scales at which the geometry of a line feature changes" (p. 170). Particular segments are defined with geometric reasoning, often using zeros of curvature (i.e., points along the line at which curvature in one direction ceases and then goes to the other direction). While several relevant measures have been suggested elsewhere in the literature (Carstensen, 1990; McMaster, 1986), mathematically relating such differential measurements along the line to input parameters for simplification algorithms still remains hypothetical and unclear.

Directly related to the idea of segmenting curves, though not necessarily in tandem with the notion that they should be simplified to locally customized degrees, is the division of sinuous lines into elementary curves, each encapsulated within a ribbon-like band delimiting the dimensions of the curve. Suggestions for these schemes, often called strip trees (Ballard, 1981; Buttenfield, 1985; Cromley, 1991), usually center around efficient computation and indexing of geometrically distinct areas of a complex line, as well as the possibility that length-specific treatments in polylines can allow for multiple levels of detail (LODs) in their representation. Authors have suggested that strip trees and other related segmentation schemes can be easily constructed from line digitization processes (Ballard, 1981), as well as from certain simplification algorithms that inherently segment lines for the purpose of deciding vertex eliminations within small spans, such as the Douglas-Peucker (1973), Lang (1969), and Visvalingam-Whyatt (1993) algorithms. Other segmentation strategies have been proposed, such as the use of Delaunay triangulation in the space between lines (Van Der Poorten & Jones, 2002), as well as the use of regularly spaced (i.e., tessellated) areas (Dutton, 1999; Li & Openshaw, 1992; Zhan & Buttenfield, 1996).

Point Reduction vs. Line Redefinition

As discussed above, some researchers (Jenks, 1989; McMaster, 1987; Veregin, 1999) regard point reduction as a necessary, if not sufficient, quality for an algorithm to be considered as implementing the generalization operator of line simplification. This is to say that these authors generally regard line simplification to be a process which reduces input vertices down to a representative subset. This view has been challenged by several scholars (e.g., Dutton, 1999; Raposo, 2010). Even though there are grounds for thinking of some of the vertices in a polyline as more representative of a shape, Dutton (1999) raises the point that all the vertices in question are representative geometric abstractions, and no particular subset of them should be considered sacrosanct when simplifying whole lines. Further, he avers that while some point reduction is a likely consequence of simplification, transformation of features is valid and sometimes required (1999, p. 34). Such transformations may involve the creation of new vertices which were not part of the original data set. This view has been shared by Li and Openshaw (1992, 1993), who classify generalization operators as falling into one of two groups: those that reduce points, and those that smooth features. They argue that point reduction is relevant only to data efficiency, and that authors who apply it to multi-scale representation are in error.
Their arguments include the widely known poor performance of the Douglas-Peucker algorithm at relatively higher bandwidths (supposedly being used for simplifying lines to much smaller scales), and the fact that point-reduction methods cannot be considered a theory-based paradigm applicable to the whole generalization process (Li & Openshaw, 1992, pp. 376-377). Line simplification for multi-scale representation, then, ought not to be concerned with point reduction or retention, but rather with producing lines appropriate for use at specified scales (Dutton, 1999; Li, 2007).

Constraints and Scale-Specificity

As previously mentioned, Harrie and Weibel (2007) describe the present paradigm of generalization research as focused around the notion of constraints. Attempts to formalize constraints usually hinge on quantifiable requirements (e.g., the minimum distance between lines in map space for legibility). It has been suggested that popular algorithms such as the Douglas-Peucker method could be calibrated using a priori map constraints (Veregin, 2000). Also, constraints such as "lines must not self-cross" have inspired post-processing routines (Saalfeld, 1999). This methodological focus on constraints is in keeping with contemporary tendencies in the broader generalization literature (e.g., Stoter et al., 2009).

Using constraints, some authors have suggested that knowledge-based expert systems be used for line simplification (e.g., Kazemi, Lim, & Paik, 2009; Skopeliti & Tsoulos, 2001). Such systems are described by Buchanan and Duda (1983, p. 1) as heuristic (they reason according to knowledge from theory programmed into them), transparent (they can make their reasoning explicit when queried), and flexible (they can integrate new knowledge and so evolve). Wang and Muller (1998) present methods based on recognizing line shapes and comparing them against cartographic specifications, using, in their example implementation, rules from the Swiss Society of Cartography (1977); confusingly, they do not regard their methods as rule-based, claiming that rules for lines are too ambiguous in practice and thus cannot be applied.

Scale-specificity, itself a kind of constraint, has been discussed in the literature with curious infrequency. Weibel (1997, pp. 120-121) suggests that the popularity of the point-reduction-based Douglas-Peucker algorithm is due to its having an early FORTRAN implementation, and its subsequent inclusion into popular GIS applications such as ArcMap; this, in conjunction with the popularity of earlier research on characteristic points, may well amount to the reason why point-reduction methods have been dominant over scale-specific methods in research. Töpfer and Pillewizer (1966) authored one of the only pieces of literature expressly devoted to scale-specificity in generalization, their work having become known as the Radical Law. This work provides several equations for the calculation of how many features should remain on a map generalized to a target scale, given the number of those features present on the initial larger-scale map. As many scholars have noted, the Radical Law provides a guide for the number of symbols to be retained as scale decreases, but not which.
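The basic selection formula of the Radical Law is compact enough to restate here for reference. This is the standard form of Töpfer and Pillewizer's formula, given only as context and not as part of the thesis's own method:

```latex
% n_f : number of features to retain on the derived (smaller-scale) map
% n_a : number of features on the source map
% M_a, M_f : scale denominators of the source and derived maps
n_f = n_a \sqrt{\frac{M_a}{M_f}}
```

For example, deriving a 1:96,000 map from a 1:24,000 source gives n_f = n_a × sqrt(24,000 / 96,000) = n_a / 2, i.e., roughly half of the source features would be retained.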
Li and Openshaw (1990) proposed the natural principle as a theoretical basis for generalization. They apply the observation that in natural human vision, details gradually diminish to the point of imperceptibility as one gets further and further away from the scene being viewed. The series of algorithms produced by Li and Openshaw (Li, 2007; Li & Openshaw, 1992) are all efforts in their self-described "scale-driven paradigm" (Li, 1996). Similarly, Dutton advocates for scale-specific line generalization: his Quaternary Triangular Mesh (QTM) (1999) is a hierarchical nested tessellation of triangles on the globe's surface, and he describes how line simplification can be done with respect to specific scale levels in the hierarchy. The hexagonal quantization algorithm presented in this research is also an example of an expressly scale-specific method.

Classes of Line Simplification Algorithms

McMaster (1987) and McMaster and Shea (1992, p. 73) offer a taxonomy of five kinds of line simplification algorithms:

• independent point algorithms (Routines that operate on individual points irrespective of neighboring points. An example is an algorithm that eliminates every nth point; a minimal sketch of such a routine follows this list.)
• local processing algorithms (Algorithms that use calculations on immediate vertex neighbors to determine whether a vertex should be dropped.)
• constrained extended local processing algorithms (As with local processing algorithms, but performing calculations using neighbors beyond just those immediately in sequence.)
• unconstrained extended local processing algorithms (Like constrained extended local processing algorithms, except with the neighboring vertex search range defined by local geometric calculations.)
• global algorithms (Algorithms that perform calculations on the whole line synoptically.)
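As a concrete illustration of the first and simplest class above, an nth-point routine amounts to little more than array slicing. The sketch below is illustrative only, and its `n` parameter has no objective relation to target scale, which is precisely the shortcoming this chapter is concerned with.

```python
def nth_point(vertices, n):
    """Keep every nth vertex of a list of vertices, always retaining the final vertex."""
    kept = vertices[::n]
    if kept and kept[-1] != vertices[-1]:
        kept.append(vertices[-1])
    return kept
```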
Algorithms Popular in Cartography

Perhaps because the topic inherently arouses geometric curiosity, researchers in line simplification have exhibited an impressive degree of creativity, and the problem has been approached from several perspectives. Two of the most interesting methods put forward, being among the few that are explicitly related to cartographic scale-specificity, are those of Dutton (1999) and Zhan and Buttenfield (1996). As previously mentioned, Dutton presents a global hierarchical triangular tessellation (the Quaternary Triangular Mesh, or QTM), a construct he then applies widely to many aspects of multi-scale representation and spatial indexing. In his paper, Dutton (1999, p. 38) focuses on vector generalization, and in particular, line simplification:

Generalization via QTM is primarily a spatial filtering process. Starting at a data set's highest encoded level of QTM detail, this filtering works by collecting line vertices that occupy facets at the (coarser) target level of resolution, then sampling coordinates within each such facet.

Arguments Dutton makes for the benefits of generalization by QTM include the fact that latitude and longitude coordinates are directly manipulated, meaning that projection by any means can be done after generalization with vertex positions remaining faithful to where they belong on the globe, and that the quality of the simplification can be adjusted by using different sampling and simplification strategies inside each triangular mesh element (i.e., tessera).

His essential strategy is shared by Zhan and Buttenfield: they employ a raster pyramid scheme, being a nested tessellation of square cells. Map resolutions (i.e., the resolutions to which simplified lines can be drawn) are progressively doubled by doubling the cell resolution, a process analogous to doubling cartographic ratio scale (Zhan & Buttenfield, 1996, p. 207). Lines are simplified in a step-wise sequence as one goes from one resolution level to the next, using methods described by Meer, Sher, and Rosenfeld (1990). They decline to relate pyramid levels to specific map scales, but suggest that the most detailed level should have a pixel resolution of 0.2 to 0.3 mm in map space, reflecting the smallest marks that may be visible on the map medium.

Consideration of shorelines in the famous work of Mandelbrot (1982) may well have contributed to the enthusiasm some cartographers have shown for fractal-based simplification methods (e.g., Buttenfield, 1989). A key concept in fractal geometry is the notion of self-similarity, whereby magnification (or diminution) of the neighborhood around a form yields the same form as the whole set. Researchers have pointed out that self-similarity applies imperfectly to coastlines, since the whole of the line is acted upon by various geomorphologic forces operating at various spatial scales, and therefore cannot be expected to display self-similarity throughout scales. Buttenfield (1989), rather than apply self-similarity to lines as wholes, suggests that features are self-similar at various sets of scales, then change at critical points with scale-dependence, and again have self-similarity at different scale ranges, quantifiable behavior she terms the structure signature. Normant and Tricot (1993) sought to clarify among cartographers that fractal geometry does not necessarily require the use of self-similarity. The fractal dimension of a form (Mandelbrot, 1982) is a measure of how much that form fills a space.
It differs from Euclidean dimension in that it can take real (i.e., not just integer) values. So, while a curving line on a plane lies in the two-dimensional Euclidean space ℝ², it may have a fractal dimension of something like 1.6, reflecting its sinuous, space-filling nature. Muller (1987) has suggested that the preservation of measured fractal dimension should be a guideline for simplification algorithms and used to evaluate their results, since product lines with fractal dimensions very similar to those of their higher-detailed counterparts should retain the morphological character of the line. Normant and Tricot (1993) used a convex-hull fractal dimension computation method to operationalize that idea.

Other efforts in line simplification have involved computations on various geometric shapes derived around the line. Cromley (1992) sought to implement an alternative bandwidth concept proposed by Peucker (1976), wherein the band is defined around the principal axis of the points in a length of the line (rather than around the segment joining the first and last points of that length). His method is similar to the standard Douglas-Peucker (1973) and Lang (1969) methods. Similarly inspired, Christensen (2000) sought to digitally implement, by way of the standard polygonal buffering procedures commonly used to create waterlines, Perkal's (1965) proposal that medial axes of polygonal areas could be used to collapse areas to linear features. Essentially, increasingly convergent lines eventually define the points along a medial axis, and this axis can be used to represent the shape at scales at which the shape's area is no longer resolvable. Christensen suggests that a very similar methodology can be applied to lines: the lines are artificially closed into polygons, the process is undertaken, and then the artificial arcs are removed (p. 24). Van Der Poorten and Jones (2002) propose a complex system in which the areas around a sinuous polyline and within its calculated bounding box are partitioned using Delaunay triangulation. Sequences of triangles in the resultant tessellation are used to define "branches" of the sinuous form, which can be measured for differential weighting in simplification routines, or flagged for deletion by pruning.

Relating more closely to line treatments from pattern recognition and processing fields, Thapa (1988a) presents a cartographic algorithm based on Gaussian smoothing. Following related work in function convolution by researchers in pattern recognition, Thapa's method produces a mathematical approximation of the curve by taking the convolved values of the second derivative of the Gaussian, and overlays this with the original line to find intersection points, described as zero-crossings. His method can be applied to varying degrees of simplification by varying the Gaussian smoothing, though it is unclear whether it can be related to a target map scale. Thapa points out (1988b) that his method is also useful for detecting critical points along a line, which he maintains are not relevant for multi-scale representation but can be useful for pattern recognition and data compaction.

While several multi-scale solutions involving chain pyramids (e.g., Zhan & Buttenfield, 1996), strip trees (e.g., Ballard, 1981), and Gaussian smoothing are available, Rosin has suggested that it is most sensible to determine the "natural scales" of lines, being those levels of generalization at which only the most informative forms of the curve are retained (Rosin, 1992, p. 1315):
The structure of an object is generated by a number of different processes, operating at a variety of scales. Conversely, each of these scales is a natural scale to describe one level of structure generated in the object's contour.

His method segments curves into elementary convex and concave arcs defined by zeros of curvature, and applies Gaussian smoothing along with a shrinkage correction (since Gaussian smoothing tends to shrink forms). The method is robust even for noisy lines (Rosin, 1992, p. 1321).

Finally, relating to concepts in animation, Cecconi (2003, pp. 84-112) suggests the use of morphing (i.e., gradual shape changing between two states, as is commonly done in computer graphics). This method requires established control scales at which two "keyframes" (i.e., map extents, in this application) are in spatial correspondence. Shape transformation then occurs between the two keyframes by interpolation techniques. One of the two keyframes must always be of lower detail, and the other of higher detail, than the desired generalization.

Perkal's ε-band

Developed before the digitization and automation of cartography, Perkal's ε-band method (1965) is one of the few truly scale-based methods of simplification. Perkal devised the method for the simplification of the borders of polygonal areas, but some scholars have suggested that the same methods can be implemented for open lines. Nevertheless, it has been difficult to implement Perkal's method in software (Christensen, 2000; Li, 2007, p. 147). The method entails rolling a circular roulette of diameter ε along the edge of a polygonal feature. Lengths of the polygon perimeter inaccessible to the roulette are considered too fine to retain; instead, the arc formed by the roulette edge is taken to be the new, simplified line until it reconnects with the original polygon boundary (Figure 2.2). Perkal's method is scale-specific in that the value of ε is considered in direct relation to the target scale for which the map is being generalized. For example, if the line weight one wishes to use is 0.5 mm, and one is generalizing a lake from a map at 1:25,000 for use on a map at 1:100,000, one should use a roulette in the lake on the 1:25,000 map with a diameter of 2 mm, being 0.5 mm multiplied by the ratio of the target and input scale denominators. Generally:

$\varepsilon = w \left( \frac{S_t}{S_i} \right)$

where ε is the width of the band within which the original line must not overlap itself, or else needs to be generalized (i.e., dropped), in map units; w is the desired line weight to be used on the target map, in map units; and S_t and S_i are the target and initial scale denominators, respectively.

Figure 2.2 - Perkal's method at three different values of ε. Hatched areas are inaccessible to the roulette, and therefore dropped from the lake form. (Source: Perkal, 1965, p. 65)
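To make the scale relationship concrete, the short sketch below computes ε for the worked example above. It assumes the scales are supplied as ratio denominators and the line weight in map units; the class and method names are illustrative and are not drawn from any published implementation.

/** A minimal sketch of Perkal's epsilon calculation, assuming scales are
 *  supplied as ratio denominators (e.g., 25000 for 1:25,000) and the line
 *  weight in map units (e.g., millimetres). Names are illustrative only. */
public class PerkalEpsilon {

    /** epsilon = w * (St / Si), returned in the same units as the line weight. */
    static double epsilon(double lineWeight, double targetScaleDenom, double inputScaleDenom) {
        return lineWeight * (targetScaleDenom / inputScaleDenom);
    }

    public static void main(String[] args) {
        // The worked example above: 0.5 mm line weight, 1:25,000 source, 1:100,000 target.
        double eps = epsilon(0.5, 100000, 25000);
        System.out.println("Roulette diameter on the 1:25,000 map: " + eps + " mm"); // 2.0 mm
    }
}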
The Douglas-Peucker Algorithm

The Douglas-Peucker algorithm (1973) is by far the most popular algorithm for line simplification in use by cartographers today, and has had several advocates from a theoretical standpoint (McMaster, 1987; White, 1985). It is a standard method in the suite of geoprocessing tools in Esri's ArcMap software, and various researchers have included it in their constructions of whole generalization systems (Nickerson, 1988). The algorithm is based on Peucker's (1976) theories of the nature of a cartographic polyline, being a form composed of vertices that correspond to varying frequencies (i.e., levels of detail). It should be noted that the algorithm, published in 1973 and developed independently, is virtually identical to that of Ramer (1972), who was working on lines in computer graphics.

The algorithm begins when a user supplies a tolerance value, being a distance beyond which a vertex must lie in order to be kept. The algorithm then considers every vertex in the line. The first point is taken as an anchor, and a reference line connecting this and the last point, the so-called floater, is drawn. The perpendicular distances from all other points to this reference line are then measured. If there exist vertices with distances beyond the given tolerance (i.e., vertices outside the band delimited by the tolerance distance from the measuring line), the algorithm proceeds; otherwise, it generalizes the whole line to the segment running from the first to the last vertex. The algorithm selects the vertex whose perpendicular distance to the reference line was greatest and uses this vertex as the new floater; this floater is also saved on a stack for later use. Now using a reference line between the anchor and the new floater, the algorithm repeats the process of measuring perpendicular distances for each point between the anchor and the present floater, and again establishes a new floater and adds it to the stack if this is necessitated by the presence of vertices outside the tolerance band. The algorithm continues to iterate, progressively working backward toward the beginning of the line and building up a stack of saved floater points for later use. When during these iterations the algorithm does not find vertices beyond the tolerance distance, it considers any vertices within the tolerance distance extraneous and deletes them, keeping only the anchor and floater and joining them by a straight line. Each time it makes this join, it moves the anchor ahead to the floater position and repeats the process using the next available floater from the recorded stack. Figure 2.3 illustrates this process, depicting several steps from start to finish on a short polyline.

Figure 2.3 - The Douglas-Peucker algorithm. (Source: McMaster & Shea, 1992, pp. 80-81)

Several authors have noted practical problems with the Douglas-Peucker algorithm (e.g., Li & Openshaw, 1992; Zhan & Buttenfield, 1996). The main issues consistently raised are that the algorithm can produce self-intersecting lines given complex input lines, and that the output tends to be so angular as to degrade the aesthetic quality of the line. Muller (1990) has described a suite of post-processing methods for correcting self-intersection after any line generalization procedure (though his work is generally aimed toward the Douglas-Peucker algorithm). Saalfeld (1999) suggests that a test for self-crossings using convex hulls on segments between anchors and floaters can be added to the algorithm; in this implementation, the algorithm would not stop itself and move along the line until this test was satisfied, and would thereby not produce topological errors. A final criticism of the Douglas-Peucker algorithm is that there is no reliable, objective way to relate the tolerance band distance to a target scale.
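For readers who prefer code to prose, the following is a minimal sketch of the Douglas-Peucker idea, written recursively rather than with the explicit anchor, floater, and stack bookkeeping described above. Names are illustrative, and the sketch leaves the tolerance-to-scale problem just noted unresolved, exactly as the original algorithm does.

import java.util.ArrayList;
import java.util.List;

/** A minimal recursive sketch of the Douglas-Peucker idea; points are {x, y}
 *  pairs. Class and method names are illustrative, not a published implementation. */
public class DouglasPeuckerSketch {

    /** Returns the simplified polyline for a given tolerance band distance. */
    static List<double[]> simplify(List<double[]> line, double tolerance) {
        List<double[]> out = new ArrayList<>();
        out.add(line.get(0));
        recurse(line, 0, line.size() - 1, tolerance, out);
        return out;
    }

    /** Keeps the farthest intermediate vertex whenever it lies outside the tolerance band. */
    private static void recurse(List<double[]> line, int anchor, int floater,
                                double tolerance, List<double[]> out) {
        int farthest = -1;
        double maxDist = 0.0;
        for (int i = anchor + 1; i < floater; i++) {
            double d = perpendicularDistance(line.get(i), line.get(anchor), line.get(floater));
            if (d > maxDist) { maxDist = d; farthest = i; }
        }
        if (farthest != -1 && maxDist > tolerance) {
            recurse(line, anchor, farthest, tolerance, out);   // left part
            recurse(line, farthest, floater, tolerance, out);  // right part
        } else {
            out.add(line.get(floater)); // drop everything between anchor and floater
        }
    }

    /** Perpendicular distance from p to the infinite line through a and b. */
    private static double perpendicularDistance(double[] p, double[] a, double[] b) {
        double dx = b[0] - a[0], dy = b[1] - a[1];
        double len = Math.hypot(dx, dy);
        if (len == 0.0) return Math.hypot(p[0] - a[0], p[1] - a[1]);
        return Math.abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / len;
    }
}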
The Visvalingam-Whyatt Algorithm

The Visvalingam-Whyatt algorithm (1993) examines each vertex along the line with respect to the triangle it forms with its two immediate neighbors. When this triangle's area falls below a user-specified areal displacement tolerance, the point in question is dropped. "The basic idea underpinning this algorithm is to iteratively drop the point which results in the least areal displacement from the current part-simplified line" (Visvalingam & Whyatt, 1993, p. 47). Geometrically simple, this algorithm is also widely used, and is incorporated into Esri's ArcMap software. It is likewise prone to topological error (i.e., self-crossing), and the user-specified tolerance, as with the Douglas-Peucker algorithm, cannot be objectively related to target scale. Figure 2.4 illustrates the algorithm.

Figure 2.4 - The Visvalingam-Whyatt algorithm. (Source: Visvalingam & Whyatt, 1993, p. 47)

The Li-Openshaw Raster-Vector Algorithm

One of the few scale-specific line simplification algorithms, the Li-Openshaw raster-vector algorithm is actually one of three related variants, the others being raster-mode and vector-mode (Li & Openshaw, 1992). The algorithm is based on the natural principle developed by the authors (1990), and forms a central part of Li's suggested "new paradigm" for map generalization (Li, 1996; Li & Su, 1995). To use the algorithm, the user first determines the width of the smallest visible size (SVS), being the smallest mark that can be made on the target map; this value often falls between 0.2 and 1.0 mm (Li, 2007, p. 65; quoting Speiss, 1988), though Li writes that experience suggests values from 0.5 to 0.7 mm for best results. The value of the SVS in terms of real distance units is calculated by

$K = k \times S_T \times \left(1 - \frac{S_T}{S_S}\right)$

where K is the SVS diameter in ground units; k is the map symbol size (i.e., the SVS in map units); and S_S and S_T are the initial and target scales, respectively (Li, 2007, p. 65).

The SVS size in real-world units is used to generate a raster, with one cell centered on the first vertex of the line to be simplified. The raster is made large enough to cover the extent of the line, such that every vertex of the line falls within some raster cell. Then, sequencing along the line, all the vertices falling into a cell are collapsed to a single vertex. While Li suggests that many different methods of generalized point selection within a cell are acceptable (2007, pp. 152-153), he recommends using the midpoint of the segment between the point at which the input line enters a cell and the point at which it exits the cell; Figure 2.5 illustrates the method, using this midpoint selection strategy.

Figure 2.5 - The Li-Openshaw raster-vector algorithm. The sinuous gray line represents the input line, the darker gray lines are segments within cells from entry to exit points of the input line, and the black line is the simplified line, formed from the midpoints of the darker gray lines. (Source: Weibel, 1997, p. 125)

Outside of Cartography: Vertex Clustering and Mesh Simplification

In the fields of computer graphics and computational geometry, several strategies have been employed to tackle the problem of geometric simplification. Noteworthy in the present research is the concept of mesh simplification, and in particular, vertex clustering. To the author's knowledge there has been little acknowledgement in the cartographic literature of the similarity of vertex clustering to certain cartographic generalization routines. The concept is generally the same as that employed in the Li-Openshaw algorithms (1992), Dutton's QTM generalization scheme (1999), and the algorithm presented here. Mesh simplification is a family of methods that seeks to reduce the geometric detail with which a two- or three-dimensional form is rendered.
(In principle, it remains possible to apply the methods to objects in higher dimensions.) These methods are frequently applied in a variety of computer graphics settings. A short survey of the literature suggests that the methods are most often applied to three-dimensional forms composed of vertices defining triangle faces; the mesh-like system of vertices and triangle faces that makes up a form is known as a manifold. Yang and Chuang (2003, p. 206) describe mesh simplification methods as follows:

Most algorithms work by applying local geometry based criteria for simplifying small regions on the meshes ... The criteria are iteratively applied until they are no longer satisfied or a user-specified reduction rate is achieved.

Simplification of meshes is often required. For example, a mesh may constitute an object in a computer video game in which the player's view is a simulated first-person perspective. While the player is far from the object, it is unnecessary to compute the appearance of the object on screen in all its detail; there would likely also be insufficient screen space (i.e., pixels) to display that detail. Instead, the form is usually pre-computed to several levels of detail (LODs), corresponding to viewing distances from the player. As the player moves closer to the object in the game, the game can implement progressively more detailed LODs of the object for rendering. This is an example of a progressive mesh (Hoppe, 1996).

Mesh simplification is generally done by a process of elimination of vertices from the manifold, sometimes also creating new vertices to represent those that have been collapsed (Figure 2.6). One such family of algorithms for this process is vertex and edge collapse algorithms (Dalmau, 2004), which search the local neighborhoods of vertices (or edges) and delete a vertex (or two, for an edge) from a manifold when triangle faces are found to be sufficiently coplanar, given a defined tolerance. The resulting gap in the manifold is then smoothed over with new, larger triangle faces.

Figure 2.6 - Mesh simplification. (Source: Dalmau, 2004)

Related conceptually to vertex and edge collapsing is vertex clustering. Rossignac (2004, p. 1224) writes:

Vertex clustering, among the simplest simplification techniques, is based on a crude vertex quantization, obtained by imposing a uniform, axis-aligned grid and clustering all vertices that fall in the same grid cell.

The most generic version of this method applied to three-dimensional manifolds uses a three-dimensional tessellation of cubic voxels. It can be seen intuitively that the size of the voxels determines how many vertices fall within each one, and thus defines the degree of simplification. The next task in the method is to choose a vertex in each voxel to represent all the others. Rossignac and Borrel (1993) found that choosing the vertex farthest from the center of the object's bounding box gave the best results, likely because this counteracts the tendency of the method to shrink three-dimensional manifolds (Rossignac, 2004, pp. 1224-1225). Rossignac goes on to suggest that even better results can be achieved by using representative vertices chosen by more computationally costly methods, such as weighting vertices by the likelihood that they would form part of the object's silhouette from a random viewing angle.
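The following is a minimal two-dimensional sketch of the grid-based vertex clustering Rossignac describes, collapsing each occupied cell to the spatial mean of its vertices; the three-dimensional voxel version and other representative-vertex choices (such as the farthest-from-center rule noted above) follow the same pattern. Class and method names are illustrative only.

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** A minimal 2D sketch of grid-based vertex clustering: impose a uniform,
 *  axis-aligned grid and collapse all vertices sharing a cell to one
 *  representative (here, their spatial mean). Names are illustrative. */
public class GridVertexClustering {

    /** Collapses the vertices (x, y pairs) to one mean vertex per occupied grid cell. */
    static List<double[]> cluster(List<double[]> vertices, double cellSize) {
        // Keyed by "col:row"; LinkedHashMap keeps cells in first-encountered order.
        Map<String, double[]> sums = new LinkedHashMap<>(); // {sumX, sumY, count}
        for (double[] v : vertices) {
            long col = (long) Math.floor(v[0] / cellSize);
            long row = (long) Math.floor(v[1] / cellSize);
            String key = col + ":" + row;
            double[] s = sums.get(key);
            if (s == null) { s = new double[3]; sums.put(key, s); }
            s[0] += v[0];
            s[1] += v[1];
            s[2] += 1.0;
        }
        List<double[]> representatives = new ArrayList<>();
        for (double[] s : sums.values()) {
            representatives.add(new double[] { s[0] / s[2], s[1] / s[2] });
        }
        return representatives;
    }
}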
Mesh simplification has been suggested in cartography before: Burghardt and Cecconi (2007) have suggested it be used for building generalization. Mesh simplification and vertex clustering both depend on the tessellation of space. This review now shifts to issues specific to the use of tessellations as means of sampling signal; the concepts discussed are pertinent to the hexagonal clustering algorithm presented in this thesis. The Hausdorff distance is then discussed as an objective means of evaluating signal sampled using tessellation schemes.

Hexagonal and Square Tessellations Applied to Pattern Analysis and Generalization

The benefits peculiar to data models based on tessellations of the plane are well known, and the familiar square-pixel raster data model is probably the best-known implementation. Applications of data models of this type have been widespread in fields such as pattern analysis and computer vision, but have rarely been considered in cartographic line simplification; notable exceptions are Dutton (1999), Li and Openshaw (1992), and Zhan and Buttenfield (1996). Uniform tiling (i.e., regular tessellation) is frequently used to both sense and represent planar data of various kinds (e.g., Landsat images). Inherent to the creation of a tiled representation is a process of quantization: a single measurement is recorded in each cell of a sampling mesh, thereby generalizing what is a potentially infinitely differentiable signal. Naturally, the quantization in a uniform mesh is a function of the signal and of the geometry (size, shape, topology, orientation) of the cell, excluding matters of measurement precision. So long as cells are arranged in a true tessellation (i.e., without gaps or overlaps), and so long as the mesh spans the whole extent of the signal in question, geometric intersections between the signal and the sampling mesh will always exist. This is to say there cannot be degenerate cases, such as points beyond the sampling grid, or points falling between sampling mesh elements (Akman, Franklin, Kankanhalli, & Narayanaswami, 1989). While it is true that there is a loss of the exact, real data points as they are quantized to the locations of grid pixels (Kamgar-Parsi, Kamgar-Parsi, & Sander, 1989, p. 604), this loss is always bounded by the mesh (i.e., sampling) resolution. It is important to keep in mind that in the objective to generalize a complex signal, data loss is actually a requirement (as it is, for example, in cartographic generalization).

As mentioned previously, quantization is a function of sampling geometry. A large body of literature on pattern analysis examines the relative merits of the three possible regular tessellations of the plane: the triangular, the square, and the hexagonal (Figure 2.7). The triangular is rarely considered for this purpose, as there is inherent orientation variability in that geometry, making measurements across pixels more complex than those of the square and hexagonal tessellations. Regarding the issue of element orientation, most literature discusses squares and equilateral hexagons, though variations in pixel dimensions are considered as well, usually for optimization in specialized applications (e.g., Iftekharuddin & Karim, 1993; Kamgar-Parsi et al., 1989, p. 609; Scholten & Wilson, 1983).

Figure 2.7 - The three possible regular tessellations of the plane. (Source: Peuquet, 2002)
Overwhelmingly, the literature indicates that hexagonal sampling meshes perform more efficiently, with less error, and with more meaningful inter-element connectivity than square meshes (Birch, Oom, & Beecham, 2007; Carr, Olsen, & White, 1992; Condat, Van De Ville, & Blu, 2005; Duff et al., 1973; Graham, 1990; Iftekharuddin & Karim, 1993; Mersereau, 1978, 1979; Nell, 1989; Puu, 2005; Scholten & Wilson, 1983; Weed & Polge, 1984; Yajima, Goodsell, Ichida, & Hiraishi, 1981). Regardless of this virtual consensus, square-pixel data models remain dominant in practices using regular grids, examples of which include common digital graphics formats (such as standard screen pixels and image file types), climate and ecology models, and GIS raster modeling. Graham (1990, p. 56) has suggested that early pioneering work in computer spatial modeling by Unger (1958) using square pixels may have set a decisive precedent, and the uniform Cartesian coordinates with which square pixels may be easily indexed is a quality that makes them attractive (Birch et al., 2007, p. 354). It also seems probable that the popular adoption of square pixels was influenced by available hardware, as the early devices engineered and made available used square meshes (e.g., cartographic digitizing tablets).

Connectivity between cells is one of the most convincing reasons why hexagons are frequently regarded as better suited to sampling planar signal than squares. If the neighbors of a cell are considered to be those that contact the cell by either an edge or a corner, then triangles have 12 neighbors, squares 8, and hexagons 6. Table 2.1 summarizes the comparative distances between neighboring tesserae.

Shape      Number of neighbors   Distance between neighbors   Cell area
Triangle   12                    C/√3, C, or 2C/√3            (√3/4)C²
Square     8                     C or √2C                     C²
Hexagon    6                     √3C                          (3√3/2)C²

Table 2.1 - Distances and areas for the three regular tessellation geometries, where C is the length of a cell side. (Source: Duff et al., 1973, p. 245)

It is readily apparent that the only shape with a consistent distance to all of its neighbors is the hexagon. Furthermore, hexagon connectivity to neighbors is defined exclusively by edge contact, meaning that the spatial relationship between one tessera and its neighbor always has a consistent spatial meaning; this is untrue both for triangles, which have edge connectivity and two orientations of corner connectivity, and for squares, which have edge connectivity (i.e., four neighbors in a von Neumann neighborhood of range 1, each at a distance equal to the square's side) and corner connectivity (i.e., four additional neighbors at the diagonals, each at a distance equal to √2 times the square's side). Because hexagons can neighbor each other exclusively by sharing common edges, they evade the connectivity paradox that occurs in triangle and square arrays when connectivity by corners is permitted (Figure 2.8). Thus, connectivity between hexagonal cells is better defined than for square cells (Yajima et al., 1981, p. 223). Further, when sampling a linear signal, hexagonal sampling error is less sensitive to line orientation than square sampling, because the six-fold radial symmetry of hexagons is more isotropic than the four-fold symmetry of squares (Kamgar-Parsi et al., 1989, p. 609; Mersereau, 1979, p. 932).

Figure 2.8 - The connectivity paradox: in triangles and squares, whether or not regions A and B are connected by the corners of cells l and m is unclear, as is whether or not the gray cells form a continuous region across cells p and q. There is no such ambiguity in hexagons. (Adapted from source: Duff, Watson, Fountain, & Shaw, 1973, p. 254)
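A small sketch relating Table 2.1 to the equal-width comparison made later in this thesis is given below; it reports side length, cell area, and neighbor distances for a square and a regular hexagon of the same width (the perpendicular distance between opposite sides). The numbers follow directly from the geometry above; the class and values used are illustrative only.

/** A small sketch comparing a square and a regular hexagon of equal width w:
 *  side length, cell area, and centre-to-centre distances to neighbours.
 *  Names and the example width are illustrative. */
public class TesseraGeometry {

    public static void main(String[] args) {
        double w = 625.0; // e.g., metres; any width works

        // Square of width w: side w, area w^2, edge neighbours at w, corner neighbours at w*sqrt(2).
        double squareArea = w * w;
        double squareEdgeNeighbour = w;
        double squareCornerNeighbour = Math.sqrt(2.0) * w;

        // Regular hexagon of width w: side w / sqrt(3), area (sqrt(3)/2) * w^2,
        // and all six neighbours at distance w (equivalently sqrt(3) * side, as in Table 2.1).
        double hexSide = w / Math.sqrt(3.0);
        double hexArea = (Math.sqrt(3.0) / 2.0) * w * w;
        double hexNeighbour = w;

        System.out.printf("square : area %.1f, neighbours at %.1f or %.1f%n",
                squareArea, squareEdgeNeighbour, squareCornerNeighbour);
        System.out.printf("hexagon: side %.1f, area %.1f (about %.0f%% of the square), neighbours at %.1f%n",
                hexSide, hexArea, 100.0 * hexArea / squareArea, hexNeighbour);
    }
}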
Pappus of Alexandria (c. 290 - c. 350 CE) proposed the "honeycomb conjecture," mathematically proven only quite recently (Hales, 2001). It holds that regular hexagons are the most efficient way to tessellate the plane in terms of total perimeter per area covered. A related property of hexagons in comparison to squares is how closely each shape approximates a circle (Figure 2.9); since the area of a circle is defined by the locus of points at or within a certain distance from the circle's center (that distance being the radius), the circle is the most compact shape possible in ℝ². Any regular polygon that covers its circumcircle more completely is a closer approximation of the circle than another regular polygon that covers less of it. As a corollary, hexagons, because they approximate circles more closely, are more compact than squares. This fact has direct application to any set of point sensors arranged on a plane or similar surface, and can be seen reflected in nature (e.g., most animal vision organs have rods and cones arranged in nearly regular hexagonal tessellations in the eye's fovea). Essentially, these compactness properties mean that when using either geometry in a tessellated plane array of sensors, the hexagonal array can sample a given planar signal with the same degree of fidelity using fewer tesserae (Condat et al., 2005; Mersereau, 1978; Nell, 1989, pp. 109-110).

Figure 2.9 - An equilateral hexagon and square in their circumcircles. The hexagon's area is closer to that of its circumcircle than the square's is to that of its circumcircle. (Source: WolframAlpha.com)

Graham (1990) tested for anisotropic effects in medical images across three tessellations: a pentagonal approximation of a hexagonal tessellation, a non-regular hexagonal grid, and a regular hexagonal grid. He found that tessellation artifacts in the sensor response were consistently lowest in the regular grid, and he thus recommends the use of regular hexagonal grids for their superior detection and representation of local variation on a plane.

Beyond applications of Christaller's (1933) classic theory, hexagonal tessellation has been advocated for thematic cartography by Carr, Olsen, and White (2004), and has been used to study cluster perception in animated maps (Griffin, MacEachren, Hardisty, Steiner, & Li, 2006), as well as color perception (Brewer, 1996).

Hausdorff Distance

The Hausdorff distance has seen widespread application in computer science, often in automated pattern matching (Alt, Godau, Knauer, & Wenk, 2002; Alt & Guibas, 2000; Huttenlocher, Klanderman, & Rucklidge, 1993; Knauer, Löffler, Scherfenberg, & Wolle, 2009; Llanas, 2005; Rucklidge, 1996, 1997; Veltkamp & Hagedoorn, 2000). It has also been used in cartography both to measure generalizations (Hangouët, 1995) and to conflate datasets of differing levels of generalization (Savary & Zeitouni, 2005). It is computationally efficient, provides a single measure of global spatial difference, and is meaningful on the plane of any distance-preserving map projection.
Named for Felix Hausdorff (1868 - 1942) and described in some detail by Rucklidge (1996) and Veltkamp (2001), the Hausdorff distance is a measure of distance between two sets in a metric space, commonly used in computer science image-matching applications. One underlying distance commonly used is the L2 metric (i.e., the Euclidean straight-line distance). With two sets A and B, the directed Hausdorff distance h from A to B is expressed as

$\vec{h}(A, B) = \sup_{a \in A} \, \inf_{b \in B} \, d(a, b)$

with d(a, b) being the underlying distance. This formula equates the directed Hausdorff distance from set A to set B with the maximum value (sup, short for supremum) among all the shortest (inf, short for infimum) distances from any a (i.e., a member of set A) to any b (i.e., a member of set B); the longer dotted line M in Figure 2.10 illustrates this relationship.

Figure 2.10 - The Hausdorff distance in ℝ². Line M represents the longest distance any element a of A must travel to reach the closest element b. Line N represents the same, but from B (and all elements b thereof) to the closest element a. Line M is the directed Hausdorff distance from A to B, while line N is the directed Hausdorff distance from B to A. The longer of the two (M) is the (overall) Hausdorff distance. (Figure adapted from source: http://www.mathworks.com/matlabcentral/fileexchange/26738-hausdorffdistance, graphic by Zachary Danziger)

Separate directed distances between the two sets are required, because the distance in either direction is not necessarily the same. The directed Hausdorff distance from B to A (line N, Figure 2.10) is given simply by inverting the notation:

$\vec{h}(B, A) = \sup_{b \in B} \, \inf_{a \in A} \, d(b, a)$

The Hausdorff distance H is the greater of the two directed Hausdorff distances:

$H(A, B) = \max\left(\vec{h}(A, B), \, \vec{h}(B, A)\right)$

Applied to point sets, the Hausdorff distance is the farthest that any point of either set lies from the nearest point of the other set; it is a global measure of the greatest local difference in position observed between two point sets. If one set is derived from another, the Hausdorff distance can be considered a measure of deviation. In this manner, the vertices of input and simplified polylines, as they exist projected on a map (i.e., in ℝ²), are meaningfully measured for displacement using the Hausdorff distance and the L2 metric.

Mathematically speaking, the Hausdorff distance qualifies as a true metric because it satisfies the properties outlined in Table 2.2.

Nonnegativity: $d(A, B) \geq 0$. The distance between sets A and B is zero or greater.
Identity: $d(A, A) = 0$. There is no distance between an element of set A and itself, so the distance from A to itself is zero.
Uniqueness: $d(A, B) = 0$ if and only if $A = B$. There is zero distance between two sets if and only if the two sets are equal.
Triangle inequality: $d(A, B) + d(B, C) \geq d(A, C)$. The sum of the distances between A and B and between B and C must be greater than or equal to the distance between A and C.

Table 2.2 - Required properties of a true mathematical metric. (Source: Veltkamp & Hagedoorn, 2000, p. 468)

The last condition, the triangle inequality, is a particularly crucial property of mathematical distance measures; it is important when using a metric to compare patterns, since without it, it would be possible for A to be very similar (i.e., close) to B, and B very similar to C, while A is nevertheless very dissimilar to C (Arkin, Chew, Huttenlocher, Kedem, & Mitchell, 1991, p. 209).
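A brute-force computation of these quantities is straightforward. The sketch below implements the directed and overall Hausdorff distances between two vertex sets under the L2 metric, exactly as defined above; names are illustrative and no attempt is made at the efficiency optimizations found in the pattern-matching literature.

import java.util.List;

/** A minimal brute-force sketch of the Hausdorff distance between two point
 *  sets under the Euclidean (L2) metric. Points are {x, y} pairs; names are
 *  illustrative, not the thesis software. */
public class HausdorffSketch {

    /** Directed Hausdorff distance: sup over a in A of inf over b in B of d(a, b). */
    static double directed(List<double[]> a, List<double[]> b) {
        double sup = 0.0;
        for (double[] p : a) {
            double inf = Double.POSITIVE_INFINITY;
            for (double[] q : b) {
                inf = Math.min(inf, Math.hypot(p[0] - q[0], p[1] - q[1]));
            }
            sup = Math.max(sup, inf);
        }
        return sup;
    }

    /** Overall Hausdorff distance: the greater of the two directed distances. */
    static double hausdorff(List<double[]> a, List<double[]> b) {
        return Math.max(directed(a, b), directed(b, a));
    }
}

Applied as in this research, the two lists would hold the vertices of an input polyline and of its simplified product, both in projected map coordinates.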
The Hausdorff Distance vs. McMaster's Measures of Simplified Lines

In the context of cartographic line generalization measurement, McMaster has asserted that six measures of a generalized line against its original correlate are useful, having narrowed the list from a set of thirty (1986, p. 115):

1. percent change in the number of coordinates;
2. percent change in the standard deviation of the number of coordinates per map unit;
3. percent change in angularity;
4. total vector displacement per map unit;
5. total areal displacement per map unit;
6. percent change in the number of curvilinear segments (lengths in which all angles at vertices are either positive or negative).

Three of McMaster's measures (1, 2, and by corollary 6, above) are related to the reduction of vertices from a polyline, an objective he holds to be integral to cartographic line simplification. The present author does not share this view, and instead believes that the morphological simplification of a line as scale changes is of principal importance, rather than the number of vertices used in the polyline to represent the feature. While lines will tend strongly to go from higher to lower angularity with increasing levels of simplification, changes in angularity (3, above) are not absolutely reliable. It is conceivable, for example, that a rounded promontory at one scale may be represented by a relatively more angular, and smaller, bump at a smaller scale, increasing total line angularity (all other lengths of the line remaining equally angular). Displacement measures are important, particularly in topographic mapping contexts, since they relate to positional accuracy. Total vector and areal displacement (4 and 5) are, however, difficult to relate to the positional accuracy of a generalized line in its totality, since displacement may vary along different lengths of the line, and separate computation would be required to find where that displacement existed and in what quantities.

It is instead suggested that the Hausdorff distance be used for measuring the relative displacement of cartographic lines after simplification. Since the Hausdorff distance provides a single number representing the farthest any one element of one set is from an element of the other set, it can be applied to the vertices of an initial polyline and its simplified polyline, thereby describing the greatest displacement existing between the two lines. This value is a sensible measure of the relative positional deviation between the lines. Also, if the input line is taken to be authoritatively "correct" in position, this value describes the "error" of displacement introduced by a simplification. Further details on this reasoning are given in the next chapter, where the application of the Hausdorff distance in this research is explained.

Summary

This literature review has illustrated several approaches cartographers have taken in seeking to automate line simplification, and has contrasted these with some of the approaches taken in cognate fields, such as signal processing. It is seen that among cartographers the goals of line simplification are not unanimously agreed upon, with particular disagreement over whether simplification should retain a subset of the input vertices, or should instead seek a simplified correlate line without much concern for the particular vertices that make up its form.
Cartographers have also frequently pursued methods that lack scale-specificity, and discussion of simplification for particular target scales is curiously rare in the literature. Finally, there is no consensus among cartographers as to how simplified lines should be evaluated. The work presented in the following chapters represents a cartographic effort with the goal of scale-specificity, taken from the viewpoint that what should be sought are cartographic lines appropriate for target scales, rather than the retention of certain vertices from the input line.

Chapter 3 The Hexagonal Quantization Algorithm and Study Methods

Overview of the Algorithm

The hexagonal quantization algorithm uses a vertex clustering technique to alter the set of points that define a map polyline (Figure 3.1). The basic concept of the vertex clustering method is to impose a tessellation on the form to be simplified, and to reduce the vertices falling within each tessera to a single vertex, the latter process being known in the field of signal processing as quantization.

Figure 3.1 - The hexagonal quantization algorithm. In each hexagon, the input vertices (gray) are quantized to a single output vertex (black), resulting in a simplified output line (in black).

Vertex clustering works either for polyhedra and computer-graphics manifolds in ℝ³, using three-dimensional tessellations such as voxels, or for polygons, polylines, and point sets in ℝ², using a two-dimensional tessellation. The tessellations used can be either regular, having tesserae of equal shape and dimensions, or irregular, depending on the intended application of the method. Vertex clustering has been in use for some time in computer graphics applications (Dalmau, 2004; Rossignac, 2004; Yang & Chuang, 2003), but is relatively new to cartographic data transformation.

The algorithm operates across scale, and within a given resolution. Scale is understood in the traditional cartographic sense, and is expressed as a ratio indicating the magnitude of reduction between the representation and the real feature. Resolution is understood to be the level of representational detail possible (and, by corollary, the level of visual detail discernible) on the target map, and is expressed by the size of the smallest possible graphic mark on the map (e.g., 0.25 mm, or the width of a pixel). To determine the level of simplification necessary, the hexagonal quantization algorithm ingests the target scale, in the form of a ratio denominator, and a resolution, in the form of a line weight. These determine the size of the hexagons used in the sampling tessellation, which reflects the resolution at which the output line may be shown to change direction between vertices on the target map. Details regarding the tessera width calculation are given later in this chapter.

The hexagonal quantization algorithm begins with map data as they exist after projection; that is, the data are considered and computed in the form of two-dimensional coordinates lying on a Euclidean plane. There is considerable basis for investigating a vertex clustering approach to line simplification using spherical tessellations and spherical geometry (i.e., angular latitude and longitude coordinates and geodetic surfaces), but this is outside the scope of the present research.
One consideration in applying tessellated vertex clustering to three-dimensional surfaces is the impossibility of tessellating a sphere with regular hexagons; semi-regular tessellations, such as the alternation of hexagons and pentagons seen on a common soccer ball, are possible, but they violate the desired quality of equipotential sampling of line vertices because of inconstant tessera size and orientation.

Regular hexagonal tessellations are used in the algorithm; all hexagons have the same dimensions, angles, and orientation. The algorithm specifically uses regular (equilateral and equiangular) hexagons, with all sides of equal length and all interior angles equal. Upon computing a desired resolution for the tessellation from a target scale and line weight, hexagons are drawn according to their desired "width," being the perpendicular distance between two opposing sides (Figure 3.2). The hexagons produced by the algorithm are all oriented such that two sides (i.e., the "top" and "bottom" sides) run perpendicular to true north.

Figure 3.2 - Hexagon width (i.e., tessera resolution).

It is possible to rotate the tessellation through a range of 60° to achieve different hexagon orientations (Figure 3.3). It is intuitively understood that differences in orientation would produce differences in the sets of simplified line vertices produced by the algorithm, and would thereby produce differences in the distances measured between the input and simplified lines. Similarly, different positions of the tessellated grid over the input line would also produce differently clustered output vertices (Figure 3.4). These variations are not tested in the present research, and will comprise future research and development of the algorithm.

Figure 3.3 - Sixty-degree range of rotation for regular hexagonal tessellations.

Figure 3.4 - The effect on output lines caused by shifting the tesserae. Input vertices and lines are in gray, and output vertices and lines are in red.

Tessellation and Polyline Structure

The tessellations used by the algorithm provide both a sampling strategy and a structure upon which to construct simplified product lines. Regular tessellations are chosen over irregular tessellations on the basis of equipotential sampling of the input polyline vertices. The vertices of an input line are presumed to be points on a Euclidean plane (ℝ²) defined by Cartesian coordinates. The coordinate values of these points correspond to any two-dimensional coordinate system associated with a map projection, such as the eastings and northings of the UTM or U.S. State Plane systems. By this definition, the vertices of the input line are free to exist at any point within the bounding box defined by the maximum and minimum x and y values among the set of points defining the line.

Because the points can exist anywhere in this bounding box, the algorithm implementations in this thesis begin with the assumption that a sampling window (i.e., tessera) placed anywhere within the bounding box is as likely to intersect one or more points as it would be at any other position within that area. This holds true so long as the area of the sampling window is constant; it would be false if the area were variable, with the likelihood of intersection increasing as sample window area increases. While an equipotential sampling approach is taken, it is understood that the vertices along a polyline defining a feature such as a river are not randomly placed, but are patterned to model a real landscape feature.
This holds true both for data digitized by some mechanical sampling method (e.g., a digitizing tablet recording a point every second) and for data in which each vertex was placed deliberately, since in both cases the vertices are placed along the linear feature being modeled. To sample the variability in polyline direction changes by way of tessera sampling windows, it is important that the orientation of the tesserae remain constant, so that any measures of direction can be made consistently against a common tessellation layout. Each straight line segment between vertices has its own orientation, and there is likely to be high variability among those orientations across all the line segments in the polyline, particularly in the sinuous polylines of rivers or coastline features. Because there are usually many line segments constituting a polyline, there are many instances at which the polyline changes direction, and wide variation in the degree to which it does so. A common tessellation layout throughout allows this variability in direction change to be consistently sampled.

In considering the variation in direction throughout any given map polyline, and in requiring that a constant sampling orientation be maintained, another quality of sampling tessellation geometry becomes desirable: equidistance to all immediate neighbors. Applying tessellations as schemes for sampling plane surfaces where signal can be distributed freely across the plane, the quality of equidistance to all immediate neighbors of each tessera translates to regular and uniform sampling. This is desirable, since non-uniform sampling of areal point features can introduce geometric artifacts into the set of detections which do not reflect the real nature of the signal. As was noted in the preceding chapter, of the three possible regular tessellations of the Euclidean plane, only hexagons maintain equidistance to all immediate neighbors, a quality described by the term radial symmetry.

Steps of the Hexagonal Quantization Algorithm

The following section describes the three essential stages in the hexagonal quantization algorithm: (1) the calculation of tessellation resolution, (2) the layout of the hexagons over the input line, and (3) the vertex clustering procedure.

Calculation of Tessellation Resolution

The algorithm must first determine the dimensions of the hexagons to be used from the user's input parameters. As with the Li-Openshaw raster-vector algorithm, the hexagonal quantization algorithm achieves scale-specificity by sizing tesserae according to a mathematical relation with target scale. Li and Openshaw (1992, p. 378) suggest calculating the diameter of their smallest visible object (SVO) in relation to the input data scale as well as the target scale and symbol width. In contrast, the method of calculating tessera resolution in this research considers target scale and map resolution (i.e., symbol width) alone as definitive of the appropriate resolution. This approach is based on map resolution as described by Tobler (1987), who draws on notions from sampling theory (Nyquist, 1928; Shannon, 1948). Tobler defines the resolution of a map to be half the size of the smallest detectable feature on the map (1987, p. 42). He considers the smallest mark a cartographer can make on the map, calculates the ground distance that size represents at the map scale, and takes that value as the map resolution. That resolution is understood to be sufficient for detecting (or representing) objects twice its size.
Elaborating on that reasoning, he offers an adjustment to compensate for inconsistencies in data sampling (p. 44):

From sampling theory it is known that the detection of a feature is only possible if the sampling rate is twice as fine as the size of the feature to be detected ... Since observations are never perfect, the better rule of thumb is to use a sampling interval one fifth the size of the feature to be detected.

Building directly from Tobler's ideas, the hexagonal algorithm takes two input parameters: the target scale for the product simplified line, and the line weight (i.e., symbol thickness) at which the product line will be drawn. From these two values the tessellation resolution r is derived using the following simple formula:

$r = 5\,l\,s$

where l is the line weight and s is the target scale denominator. The units used throughout the calculation should be those desired for defining the real-world width of the tesserae (e.g., using meters, a line weight of 0.5 mm [0.0005 m] and a target scale of 1:250,000 yield r = 625 m). In all cases in this research, a line weight of 0.25 mm was used. This value was chosen to reflect the resolutions of modern topographic paper map printing, as well as today's high-pixel-density displays (such as smartphone displays, which can exceed 240 ppi).

Tessellation Layout

The algorithm next computes the bounding box of the line to be simplified by identifying the maximum and minimum x and y values of the vertices that make up the line. It then proceeds to completely cover the area of the bounding box with hexagons of width r. This is done with overlap around all four bounding box edges, to ensure that no points near the edges of the box fail to intersect with a hexagon. The first hexagon is drawn at the north-west corner of the bounding box, taking the corner as its center and defining its six corner points around that center. A column of hexagons is then drawn south of this first hexagon until the southern edge of the bounding box has been crossed. A new column is then defined immediately east of the first, staggered to the south by half the value of r. All new hexagons borrow the exact x and y values of corner vertices from pre-existing neighbors in order to ensure that no "sliver" gaps or overlaps occur in the tessellation as a result of minute computer rounding errors. New columns are defined until the eastern edge of the bounding box has been completely crossed. The process is illustrated in Figure 3.5.

Figure 3.5 - Layout of hexagons using the bounding box delimiting the line. The hexagon in the north-west corner is drawn centered on the bounding box corner first, with the hexagons below it drawn to follow. The second "column" of hexagons to the east is drawn next, and the process continues until the bounding box is completely covered by hexagons on all sides.
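As an illustration of these two steps, the sketch below computes r from a line weight and target scale denominator and then generates flat-topped hexagon centers over a bounding box in the column-by-column order described above. It is a simplified reading under stated assumptions: corner coordinates are not shared between neighbors here, and the names, overlap margins, and example values are illustrative rather than taken from the thesis software.

import java.util.ArrayList;
import java.util.List;

/** A minimal sketch of the tessellation-resolution calculation and of laying out
 *  flat-topped hexagon centres over a bounding box, starting at the north-west
 *  corner and staggering alternate columns south by r/2. Illustrative only. */
public class HexLayoutSketch {

    /** r = 5 * lineWeight * scaleDenominator (Tobler's one-fifth rule of thumb). */
    static double resolution(double lineWeightGroundUnits, double scaleDenominator) {
        return 5.0 * lineWeightGroundUnits * scaleDenominator;
    }

    /** Hexagon centres covering [minX, maxX] x [minY, maxY] with some overlap. */
    static List<double[]> hexCentres(double minX, double minY, double maxX, double maxY, double r) {
        double side = r / Math.sqrt(3.0);      // side length of a hexagon of width r
        double colSpacing = 1.5 * side;        // horizontal distance between column centres
        List<double[]> centres = new ArrayList<>();
        int col = 0;
        for (double x = minX; x <= maxX + colSpacing; x += colSpacing, col++) {
            double yStart = maxY + (col % 2 == 1 ? -r / 2.0 : 0.0); // alternate columns staggered south
            for (double y = yStart; y >= minY - r; y -= r) {
                centres.add(new double[] { x, y });
            }
        }
        return centres;
    }

    public static void main(String[] args) {
        double r = resolution(0.00025, 250000); // 0.25 mm line weight at 1:250,000 gives 312.5 m
        System.out.println("hexagon width: " + r + " m, centres laid out: "
                + hexCentres(0, 0, 10000, 10000, r).size());
    }
}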
Vertex Clustering and Quantization

Upon tessellation layout, the algorithm iterates through the vertices of the input line. Starting with the first vertex, the single hexagon with which intersection occurs is identified. Each subsequent vertex likewise identifies the hexagon it intersects. If that hexagon is the same as that intersected by the previous vertex, the current vertex is added to the current collection of vertices pertaining to a single cluster. If the hexagon is different from that intersected by the previous vertex, the previous collection of vertices is considered closed and a new collection is begun with the current vertex. In this manner, a single hexagon may have more than one collection (i.e., cluster of vertices) defined within it, depending on how many times the input line passes through it and places vertices in it (Figure 3.6). Many hexagons, especially at scales closer to that of the input data, will have only one pass of the input line through them, but because multiple passes are common, it is important to handle the events in which they occur. As clusters are defined, they are stored in a sequential array.

Figure 3.6 - Constructing an output vertex (orange) for each pass (first in red, second in blue) of the input line through the hexagon.

Once the whole line has been considered by iterating through all vertices, and every line vertex has been assigned to a cluster, the algorithm implements the collapse of each cluster to a single vertex (i.e., it quantizes each cluster). As Li (2007, pp. 152-153) notes, this can be done by an almost infinite number of methods (i.e., any point within the hexagon can be used to represent all points contributing to that cluster). This research considers two vertex clustering methods, each representing a distinct means of quantizing the vertices in a given tessera. These are made available to the user as options, chosen before the algorithm is run. The methods are illustrated in Figure 3.7, and described as follows:

• the midpoint of a line segment drawn between the first and last vertices in a cluster;
• the spatial mean of the vertices in a cluster.

In both choices, a single-vertex cluster quantizes to the unmoved vertex itself. Finally, once all clusters have been quantized, their product single points are strung together in sequence to produce the output simplified line.

Figure 3.7 - The two clustering methods used in this research. The midpoint of the first and last vertices method is illustrated on the left, while the spatial mean of vertices is illustrated on the right.
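The sequential clustering and the two quantization options can be sketched as follows. The point-in-hexagon test is abstracted behind a cell-identifier function so that the same skeleton applies to hexagons or squares; the structure and names are illustrative and do not reproduce the thesis software.

import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

/** A minimal sketch of the sequential clustering and quantization step: vertices
 *  are grouped while they stay in the same tessera, and each cluster collapses
 *  to one output vertex. cellIdOf must return the same id for any two vertices
 *  in the same tessera. Illustrative only. */
public class ClusterQuantizeSketch {

    /** useMidpoint = true: midpoint of first and last vertices; false: spatial mean. */
    static List<double[]> simplify(List<double[]> line, ToLongFunction<double[]> cellIdOf,
                                   boolean useMidpoint) {
        List<double[]> out = new ArrayList<>();
        List<double[]> cluster = new ArrayList<>();
        long currentCell = Long.MIN_VALUE;
        for (double[] v : line) {
            long cell = cellIdOf.applyAsLong(v);
            if (!cluster.isEmpty() && cell != currentCell) {   // the line moved to a new tessera:
                out.add(collapse(cluster, useMidpoint));       // close the previous cluster
                cluster = new ArrayList<>();
            }
            cluster.add(v);          // a re-entered tessera simply starts a new cluster,
            currentCell = cell;      // so each pass through a hexagon yields one output vertex
        }
        if (!cluster.isEmpty()) out.add(collapse(cluster, useMidpoint));
        return out;
    }

    private static double[] collapse(List<double[]> cluster, boolean useMidpoint) {
        if (useMidpoint) {
            double[] first = cluster.get(0), last = cluster.get(cluster.size() - 1);
            return new double[] { (first[0] + last[0]) / 2.0, (first[1] + last[1]) / 2.0 };
        }
        double sx = 0.0, sy = 0.0;
        for (double[] v : cluster) { sx += v[0]; sy += v[1]; }
        return new double[] { sx / cluster.size(), sy / cluster.size() };
    }
}

Note that a single-vertex cluster collapses to the unmoved vertex under either option, as described above.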
Clustering Routine Compared to Li & Openshaw's Suggestion

This research departs from the Li-Openshaw raster-vector algorithm in how it addresses instances in which a line loops through a tessera more than once. Li (2007, p. 154) writes, "If there is more than one intersection, the first (from the inlet direction) and the last (from the outlet direction) intersections are used to determine the position of the new point." This is illustrated in Figure 3.8. This strategy effectively cuts off any portion of the line that leaves the tessera between the inlet and outlet points in question. It thereby guarantees that no self-intersections can occur in the product line, since the output line will always progress from one raster cell to the next without risk of curving back on itself. However, if any important line features exist between the inlet and outlet vertices in question, they will be deleted by this strategy (Figure 3.9).

Figure 3.8 - Li's suggested solution for single vertex selection within cells with multiple passes of the input line; see the cell at top, center. (Source: Li, 2007, p. 153)

The hexagonal quantization algorithm instead places a collapsed vertex inside each tessera for each pass of the line through it (Figure 3.6). This permits all line segments to be represented, though it also reintroduces the possibility of line self-intersection. This is a particularly important property for the hexagonal quantization algorithm in this research, since omissions of line segments caused by upstream vertex clustering would problematically skew the observed Hausdorff distances between input and product lines. Though self-intersections are observed to be rare, they are fundamental problems that must be resolved. While self-intersections are not solved in this thesis, a method for their repair as a post-processing routine has been devised and is under development by the author (further details are given in the Conclusions and Further Work chapter).

Figure 3.9 - An effect of Li's suggested method of selecting single vertices in a cell with multiple input line passes. In this example, applying Li's suggestion at the tessera overlapping the peninsula's connection to the mainland would cause the entire peninsula to be deleted, whereas a representation of it could be retained at this cell resolution (i.e., target scale).

In addition to the development of the hexagonal quantization algorithm, this study has also implemented the Li-Openshaw raster-vector algorithm. To allow for comparison across geometries, both the hexagonal quantization and Li-Openshaw raster-vector algorithms are implemented and run with tessera resolution derived by the formula given above (i.e., the Li-Openshaw square cell size is not calculated using Li and Openshaw's SVO estimation formula). There are two reasons for this. First and most importantly, maintaining a like tessera "width" allows for direct comparability between squares and hexagons. While it was considered that squares and hexagons of equal area could be used, equal width was deemed more appropriate, because width, rather than area, plays the definitive role in placing output polyline vertices at distances that have been determined to be visually resolvable. Second, the formula developed here is based on map resolution at the target scale and does not require the input data scale as a parameter, whereas the formula given by Li and Openshaw does. Li and Openshaw's formula parameterizes their product lines by a scale differential similar to that proposed by Töpfer and Pillewizer (1966). Not requiring an input scale parameter, however, offers advantages: input data of variable or uncertain vertex resolution can be used, and error in data maintained to inconsistent resolutions (often caused by inconsistent digitization) is not propagated to the output line. Also, the Li-Openshaw algorithm is implemented here using the same tessellation-layout and vertex clustering methods described above.

Implementation

The algorithm was implemented using a mixture of tools in Esri's ArcGIS and software custom-written in Java (version 6). Input lines were first loaded from Esri shapefile data in ArcMap and projected to the appropriate UTM zone, using the North American Datum of 1983. The Esri geoprocessing tool "Dissolve" was used to reduce sample lines to single polylines, so that each vertex along the whole line could then be stored in a single, ordered data array. These lines were then reduced to their vertices using the ArcGIS geoprocessing tool "Feature Vertices to Points". Two new columns were added to the attribute tables of these lines, one for eastings and another for northings; these were subsequently calculated, in meters, from the UTM projection. The attribute tables were then exported to csv files. All of the tessellation and vertex collapse processes were handled by the custom-written Java software.
The Java software was designed with a graphical user interface, or GUI (Figure 3.10). The interface permitted the selection of input csv line files; specification of output csv files; selection of hexagonal, square, and Hausdorff distance calculation routines; selection of vertex clustering methods; and specification of input parameters. The GUI also produced textual reports on algorithm runs, and enabled the saving of these reports to txt files.

Figure 3.10 - A screen shot of the graphical user interface of the software developed to implement the algorithms and the calculation of Hausdorff distances.

Both the hexagonal and square algorithms operated by accepting arrays of custom-written objects of type Point as the vertices of an input line. These were read from the csv files exported from ArcMap. Using the input parameters specified by the user, the algorithms called on various routines to lay out tessellations and perform the vertex clustering according to the methods described earlier, as well as to save their outputs to new user-specified csv files. The output files took the form of basic csv tables, where each record represented a vertex along the simplified line. Each record was attributed with three pieces of data: its number in the sequence of vertices along the output line, and its easting and northing coordinates in meters. Output csv files were then loaded into ArcMap, and x,y plotting was used to draw the vertices in map space. A public-domain script written by David Wynne called “Points to Line” (available for download from http://arcscripts.esri.com/details.asp?dbid=15945) was then used to string the plotted vertices together in the sequence defined in their attribute values, and to save the product lines in Esri shapefile format.

The Java portion of the implementation was designed to permit sequential running of both the hexagonal quantization algorithm and the implementation of the Li-Openshaw raster-vector algorithm using the same input parameters. In this manner, it was possible to couple both algorithms, each time using the same input file and input parameters (line weight and target scale), and each time calculating the Hausdorff distances between input and output vertices. Thus, with each run, it was possible to produce two output simplified lines, one from each algorithm, with hexagon and square width being identical across both shapes. The products of a vertex clustering method using hexagons of width x could therefore be compared to those of the same method using squares of side-length x. To keep related hexagon and square products associated, a file naming convention was adopted that contained the line name, the collapse method used, the shape used, and the scale to which the line was simplified (e.g., "NovaScotia_C_MpH_250k.csv" indicated the coast of Nova Scotia, collapsed using the midpoint of the 1st and last vertices in a hexagonal tessera, simplified to 1:250,000, implying a hexagon width of 312.5 meters). Also, coupled files were identified by name and algorithm parameters stated in each output text report produced by the Java software (an example of one of these is provided in Appendix B).

Sample Lines

Thirty-four sample lines were used in this study.
A sample size of 34 was chosen for two general reasons. First, when all lines are considered for a given scale and algorithm processing iteration, the sample is large enough (i.e., 30 or more) that a Gaussian sampling distribution can reasonably be expected, making the use of parametric statistical analyses more likely to be appropriate. Second, 34 lines, when processed once for each algorithm, each vertex clustering method, and each scale, came to a total of 952 lines, a number which seemed both reasonable and manageable.

All lines used in this study are portions of coastlines and rivers from Canada or the United States. American lines were taken from the “high resolution,” 1:24,000 USGS National Hydrography Dataset (NHD) (Simley & Carswell Jr., 2009). NHD data were downloaded using the USGS National Map Viewer (http://viewer.nationalmap.gov/viewer/). Canadian data were taken from the National Hydro Network (NHN), maintained by the Canadian Council on Geomatics, and drawn from geospatial data collected by both federal and provincial or territorial governments. NHN data are produced to varying scales from 1:10,000 to 1:50,000, and are provided to the largest scale available in any given area (Geomatics Canada, 2010, p. 6); Canadian lines were carefully chosen to be of larger rather than smaller scales.

All lines were sampled from larger downloaded data sets. Each line was clipped from a larger river line or coastline such that the straight-line distance from beginning to end points was between 15 and 20 km. Lines were selected to have a wide variety of complexities. Also, lines were selected to represent a range of geomorphologic river and coast types (Trenhaile, 2007). Sample coasts were taken from ice-dominated rocky beaches (e.g., the coast of Killiniq Island, Nunavut), tidal-dominated coasts (e.g., the shore of the Bay of Fundy, Nova Scotia), a sandy wave-dominated beach (Myrtle Beach, South Carolina), an estuary shore (e.g., Potomac River, Virginia), lake shores (e.g., Lake Superior, Ontario), and a river delta (Mississippi River delta, Louisiana). Rivers were chosen to represent complex and highly sinuous lines that strongly need simplification at reduced scale (e.g., Humboldt River, Nevada; Sweetwater River, Wyoming; Rio Grande, Texas), as well as those with relatively straighter courses (e.g., Yukon River, Yukon Territory; Cedar River, Iowa). All 34 lines used are mapped in Figure 3.11. All lines are also listed and depicted without simplification in thumbnails in Appendix A.

Figure 3.11 - Locations of the 34 sample lines used in this research. Coast and shore lines are indicated in italics. (Background hypsometric tint courtesy of Tom Patterson, source: NaturalEarthData.com)

Experiment Design and Statistical Comparison Between Hexagonal and Square Outputs

Notes on the Use of Hausdorff Distance

The Hausdorff distance, as explained in Chapter 2, is a metric for measuring the difference between two sets in a metric space. In this research, the Hausdorff distance, using the Euclidean distance between two points in ℝ², is measured between the sets of input and output line vertices. Since the output line is generated from the input line, the Hausdorff distance between the two lines can be taken to reflect a measure of the maximum positional deviation of the simplified line from the input line.
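For illustration, a minimal sketch of this measure over two vertex arrays is given below. It is not the thesis code: the Point class stands in for the implementation's own Point type, and the class and method names are assumptions. The directed distances it computes correspond to the h(input to simplified) and h(simplified to input) values reported by the software (see Appendix B), and the symmetric value is their maximum.

```java
/** Illustrative sketch (not the thesis implementation) of the discrete Hausdorff
 *  distance between the vertex sets of an input line and its simplified output. */
final class HausdorffSketch {

    /** Simple planar point with easting/northing coordinates in meters. */
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
        double distanceTo(Point other) {
            double dx = x - other.x, dy = y - other.y;
            return Math.sqrt(dx * dx + dy * dy);
        }
    }

    /** Directed distance h(A, B): the greatest distance from any point of A to its nearest point in B. */
    static double directed(Point[] a, Point[] b) {
        double worst = 0.0;
        for (Point p : a) {
            double nearest = Double.POSITIVE_INFINITY;
            for (Point q : b) {
                nearest = Math.min(nearest, p.distanceTo(q));
            }
            worst = Math.max(worst, nearest);
        }
        return worst;
    }

    /** Symmetric Hausdorff distance H(A, B) = max(h(A, B), h(B, A)). */
    static double symmetric(Point[] input, Point[] simplified) {
        return Math.max(directed(input, simplified), directed(simplified, input));
    }
}
```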
Because the output line is created by a vertex clustering approach within the cells of a regular tessellation, the dimensions of the tessera provide an absolute upper bound on the possible resultant Hausdorff distance (Rossignac, 2004). In other words, the Hausdorff distance cannot exceed the maximum length possible within a tessera. That distance is the one from one corner to the opposite corner in the cases of both hexagons and squares. For example, in a hexagon of “width” (side-to-opposite-side) 100 m, the corner-to-opposite-corner distance is approximately 115.5 m (the width multiplied by 2/√3). Since this is the largest distance that can fit within the hexagon, it provides an upper bound to any Hausdorff distance that can result from a within-hexagon vertex clustering operation. (While it is possible to shrink squares such that both shapes have equal corner-to-opposite-corner dimensions and compare the two resulting tessellations, this would not allow for comparison across the differing geometric connectivity with neighboring tesserae between hexagons and squares of a given resolution.) Finally, while it is well known that the Hausdorff distance is sensitive to outliers, the vertex clustering approach undertaken in this research guarantees that no outliers are ever produced.

Experimental Design

To compare the relative displacements caused by the hexagonal and square algorithms, a randomized block experimental design (Mendenhall, Beaver, & Beaver, 2006) was used. All 34 lines were each simplified by both hexagonal and square algorithms to seven different scales, chosen to correspond with round-number scales commonly used by national mapping agencies:

• 1:50,000
• 1:100,000
• 1:150,000
• 1:200,000
• 1:250,000
• 1:500,000
• 1:1,000,000

Also, all lines were processed for a target map resolution of 0.7 PostScript points (approximately 0.25 mm), this value having been chosen to reflect common map printing standards to date. Hausdorff distances were measured between the vertices of all simplifications and their input lines. Thus, for each of the seven target scales, 34 Hausdorff distances were recorded for each of the set of hexagonal simplifications and the set of square simplifications. This entire process was carried out twice: once for simplifications using the midpoint of a line segment between the first and last vertices in a tessera as the quantization method, and again for simplifications taking the spatial mean within a tessera. SPSS (version 18) and R (version 2.11.0) statistical software packages were used to analyze all Hausdorff distance data. The means and standard deviations of all sets of Hausdorff distances across the 34 sample lines were calculated in order to compare relative values across hexagon-square pairings. All sets of Hausdorff distances were examined for normality using quantile-quantile (Q-Q) plots. It was observed from these that most data sets exhibited normal distributions. Thus, the data were subjected to paired-samples T tests at the 95% confidence level in order to determine whether mean Hausdorff distances across hexagon and square simplifications differed significantly (the test statistic is sketched below). Because some of the Hausdorff distance datasets deviated substantially from normality, the data sets were also all subjected to nonparametric related-samples Wilcoxon signed rank tests for comparison of results against the parametric statistics.
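For reference, the paired-samples test statistic takes its usual textbook form; the notation below is standard rather than reproduced from the thesis, with the differences taken as the square-cell minus the hexagonal mean Hausdorff distance per sample line, consistent with the signs reported in Tables 4.3 and 4.5.

```latex
d_i = H_i^{\mathrm{sq}} - H_i^{\mathrm{hex}}, \qquad
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad
\mathrm{df} = n - 1 = 33,
```

where s_d is the standard deviation of the n = 34 differences; a positive value of the mean difference indicates that hexagonal tesserae yielded the shorter mean Hausdorff distance at that scale.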
Finally, mean Hausdorff distances collected from all 952 simplification runs were analyzed in a three-way analysis of variance (ANOVA) test to examine for significant effects from three factors independently, as well as in interaction combinations: algorithm used, quantization method used, and scale. Results of these statistical analyses are reported in the next chapter. Chapter 4 Results and Interpretations There are two sets of results reported in this chapter, reflecting the cartographic results of the line simplification algorithms implemented, and the results of the statistics calculated on differing Hausdorff distances between the hexagonal quantization and Li-Openshaw raster-vector line simplifications. Interpretations are then offered. Resulting Line Simplifications: Visual Presentation Both the hexagonal algorithm and the implementation of the Li-Openshaw raster-vector algorithm yielded simplified lines. A total of 952 simplified lines were produced from the 34 samples across all iterations of both algorithms, all target scales, and both vertex clustering methods. For concision, a sample of these lines are presented here; these were chosen by the author as a representative sample of qualities observed across all of the study’s output lines. All figures draw output lines at a line weight of 0.7 PostScript points (0.25 mm). Figure 4.1 illustrates all 34 lines, simplified to 1:500,000 using the hexagonal quantization algorithm and midpoint first and last vertices vertex clustering method. This figure illustrates the general success of the algorithm’s application to a diversity of line forms. 65 Figure 4.1 - All 34 lines simplified by the hexagonal quantization algorithm to 1:500,000 and drawn to scale. 66 Figure 4.2 uses a portion of the coast of Maine to illustrate the output lines of both algorithms at all seven scales using the spatial mean vertex clustering method in each tessera. Figure 4.3 does the same, but using the midpoint first and last points vertex clustering for either algorithm. All lines on both figures are drawn at 1:24,000 with the original 1:24,000 line drawn in gray in the background. While the Hausdorff distance analyses given later in this chapter provide quantitative evaluations of difference, these figures are given to allow for visual comparison of the positional fidelity of either algorithm to a common input line. Figures 4.4 and 4.5 are similar to 4.2 and 4.3; these illustrate the four most extreme scales in the study (1:200,000; 1:250,000; 1:500,000; and 1:1,000,000) for a complex, curving pair of narrow peninsulas extending from the Alaskan Peninsula. Again, the products of both algorithms are drawn at 1:24,000 above the original 1:24,000 line for visual appraisal of relative fidelity. Using a portion of the coast of Newfoundland, Figures 4.6 and 4.7 together compare the output of both algorithms, this time with lines drawn to target scale. These figures provide examples of the performances of either algorithm at the target scales at which they are meant to be observed. Careful observation of the two figures permits visual comparison of the products of either algorithm. Figures 4.8 through 4.10 each provide output lines from both algorithms using both vertex clustering methods; each figure illustrates one of three locations and one of three separate scales. These figures provide further material for reader visual inspection. A discussion of these figures is given under the Interpretations heading of this chapter. 
Figure 4.2 - Simplifications of a portion of the coast of Maine produced by both the hexagonal quantization algorithm (purple) and the Li-Openshaw raster-vector algorithm (green) using the spatial mean quantization option, against the original line (gray). All lines drawn to 1:24,000.

Figure 4.3 - Simplifications of a portion of the coast of Maine produced by both the hexagonal quantization algorithm (purple) and the Li-Openshaw raster-vector algorithm (green) using the midpoint first and last vertices quantization option, against the original line (gray). All lines drawn to 1:24,000.

Figure 4.4 - Simplifications of a portion of the coast of the Alaskan Peninsula produced by both the hexagonal quantization algorithm (purple, left) and the Li-Openshaw raster-vector algorithm (green, right) using the spatial mean quantization option, against the original line (gray). All lines drawn to 1:24,000.

Figure 4.5 - Simplifications of a portion of the coast of the Alaskan Peninsula produced by both the hexagonal quantization algorithm (purple, left) and the Li-Openshaw raster-vector algorithm (green, right) using the midpoint first and last vertices quantization option, against the original line (gray). All lines drawn to 1:24,000.

Figure 4.6 - Portion of the coast of Newfoundland, simplified to seven target scales by the hexagonal quantization algorithm using the midpoint first and last vertices quantization option.

Figure 4.7 - Portion of the coast of Newfoundland, simplified to seven target scales by the Li-Openshaw raster-vector algorithm using the midpoint first and last vertices quantization option.

Figure 4.8 - Portion of the Humboldt River, simplified to 1:150,000 by both algorithms using both quantization options. The orange box signifies the location of the 1:24,000 segment (at top) on the simplified lines.

Figure 4.9 - Portion of the Mississippi Delta coastline, simplified to 1:250,000 by both algorithms using both quantization options.

Figure 4.10 - Portion of the shore of the Potomac River, simplified to 1:500,000 by both algorithms using both quantization options. The orange box signifies the location of the 1:24,000 segment (at top-center) on the simplified lines.

Statistical Results

Mean Hausdorff Distances

Table 4.1 reports mean Hausdorff distances, in ground meters, between simplified and input vertices calculated from all sample lines, across both simplification algorithms, at each target scale.

                                     Spatial mean           Midpoint of 1st and last vertices
Scale          Tessera width (m)   Hexagons   Squares          Hexagons   Squares
1:50,000              62.5            38.9      39.9              41.0      40.2
1:100,000            125.0            75.0      82.2              78.2      88.4
1:150,000            187.5           110.3     123.7             121.0     127.6
1:200,000            250.0           144.0     156.8             154.4     174.1
1:250,000            312.5           181.5     193.4             216.7     216.6
1:500,000            625.0           354.2     381.3             388.4     410.9
1:1,000,000         1250.0           676.8     701.7             765.1     747.0

Table 4.1 - Mean Hausdorff distances (in ground meters) between simplified and input vertices. Each mean Hausdorff distance is calculated from n = 34 simplified lines and their related input lines.

From Table 4.1 it is apparent that mean Hausdorff distances were usually (i.e., in 11 of 14 pairs) shorter for hexagonal simplifications than for their paired square counterparts.

Paired-Samples T Tests

To determine whether each difference between means was statistically significant, two tests were conducted: the paired-samples T test for difference in means (a parametric test), and the related-samples Wilcoxon signed-ranks test (a nonparametric test).
Both the T tests and Wilcoxon signed-rank test begin with the null hypothesis (denoted by H 0 ) that there is no significant difference between the mean Hausdorff distances produced by 77 either algorithm. Both tests were conducted to the 95% confidence level (i.e., α = 0.05). The significance value (i.e., p-value) calculated by either test indicates the probability of observing mean Hausdorff distances differing as extremely as they do in the data, if the null hypothesis of no significant difference in means were true. When the significance value for either test is below 0.05, the null hypothesis of equivalent Hausdorff distances is rejected, and there is evidence to suggest that the differences in mean distance seen in Table 4.1 are due to differences between the performances of either algorithm. Figures 4.11 and 4.12 illustrate Q-Q (“quantile-quantile”) plots drawn on all sets of Hausdorff distance measures. Q-Q plots are used to compare two probability distributions by plotting their quantiles against each other. If one distribution consists of observed values while the other consists of theoretically expected values, a Q-Q plot may be used to visually determine whether or not a dataset is statistically normal, and can thereby be analyzed using parametric statistical techniques. A normal dataset will lie in a relatively straight line along the y=x line, indicating that observed values conform closely with those expected for normality. Interpreting Q-Q plots is not strictly objective, and requires discrimination on the part of the analyst. In Figure 4.11, for example, the Q-Q plots for hexagons and squares at 1:250,000 are particularly exemplary of normal datasets, while that for squares at 1:250,000 in Figure 4.12 is not normal. 78 Figure 4.11 - Quantile-Quantile plots for mean Hausdorff distances across hexagonal and square samples, using the spatial mean quantization option. 79 Figure 4.12 - Quantile-Quantile plots for mean Hausdorff distances across hexagonal and square samples using the midpoint first and last vertices quantization option. 80 Having observed that most data sets conformed to normality, it was decided to first analyze using parametric statistical techniques throughout. For either vertex clustering method, seven paired-samples T tests were run, one for each target scale. Their calculated statistics are given in Tables 4.2 through 4.5, each corresponding to one of the vertex clustering methods used. Pearson correlation coefficients are given for each paired sample set (i.e., hexagons and squares to a given scale), in Tables 4.2 and 4.4. These, when given with a significance value lower than 0.05, describe the degree to which one can predict a relationship between the paired samples (e.g., how consistently hexagons will result in smaller Hausdorff distances than squares). These correlations are important because they describe, across several scales, how consistently one algorithm may have performed with shorter Hausdorff distances than the other. All Pearson correlation coefficients in Table 4.2, for example, are significant, while that given for the 1:50,000 comparison in Table 4.4 is not. The Pearson correlation coefficient can take any value from -1 to 1, with -1 signifying a perfect negative correlation, zero signifying no correlation, and 1 signifying a perfect positive correlation. 
For example, the Pearson correlation coefficient of 0.799, with significance 0.000 given for the 1:500,000 comparison in Table 4.2 indicates a sure and strong correlation between the algorithm used and a shorter mean Hausdorff distance (i.e., the hexagonal algorithm yielded shorter Hausdorff distances). 81 Hexagons vs. Squares Scale Pairings 1:50,000 N 34 Correlation 0.553 Sig. 0.001 1:100,000 34 0.748 0.000 1:150,000 34 0.768 0.000 1:200,000 34 0.584 0.000 1:250,000 34 0.768 0.000 1:500,000 34 0.799 0.000 1:1,000,000 34 0.520 0.002 Table 4.2 - Pearson correlation coefficients for differences in means observed between the hexagonal and square algorithms, using the midpoint first and last vertices quantization option. Paired Differences Scale Pairings 1:50,000 Mean -0.805294118 Std. Deviation 9.927300965 Std. Error Mean 1.70251807 1:100,000 10.17705882 12.03082469 1:150,000 6.590294118 1:200,000 95% Confidence Interval of the Difference Lower -4.269093175 Upper 2.65850494 t -0.473 df 33 Sig. (2tailed) 0.639 2.063269412 5.979305643 14.374812 4.932 33 0.000 18.1278225 3.10889591 0.265197832 12.9153904 2.12 33 0.042 19.69058824 28.06652605 4.813369507 9.89771434 29.48346213 4.091 33 0.000 1:250,000 -0.098823529 34.70618579 5.952061758 -12.20838423 12.01073717 -0.017 33 0.987 1:500,000 22.55235294 54.79708984 9.39762338 3.43274442 41.67196146 2.4 33 0.022 1:1,000,000 -18.08323529 179.0896429 30.71362037 -80.57056577 44.40409518 -0.589 33 0.560 Table 4.3 - T test statistics across seven scales for the difference in mean Hausdorff distances between square and hexagonal algorithms using the midpoint first and last vertices quantization option. Hexagons vs. Squares Scale Pairings 1:50,000 N 34 Correlation 0.334 Sig. 0.054 1:100,000 34 0.482 0.004 1:150,000 34 0.473 0.005 1:200,000 34 0.595 0.000 1:250,000 34 0.357 0.038 1:500,000 34 0.210 0.233 1:1,000,000 34 0.339 0.050 Table 4.4 - Pearson correlation coefficients for differences in means observed between the hexagonal and square algorithms, using the spatial mean quantization option. 82 Paired Differences 95% Confidence Interval of the Difference Scale Pairings 1:50,000 Mean 1.006764706 Std. Deviation 10.27130652 Std. Error Mean 1.761514536 Lower -2.577063565 Upper 4.590592977 t 0.572 df 33 Sig. (2-tailed) 0.572 1:100,000 7.091764706 7.523655813 1.290296327 4.46663709 9.716892321 5.496 33 0.000 1:150,000 13.42558824 9.863044644 1.691498202 9.984209269 16.8669672 7.937 33 0.000 1:200,000 12.85882353 10.88008321 1.865918877 9.06258303 16.65506403 6.891 33 0.000 1:250,000 11.90323529 17.79977285 3.052635859 5.692600941 18.11386965 3.899 33 0.000 1:500,000 27.08029412 40.19238572 6.892937284 13.05650777 41.10408047 3.929 33 0.000 1:1,000,000 24.87529412 91.23266126 15.64627233 -6.957286275 56.70787451 1.59 33 0.121 Table 4.5 - T test statistics across seven scales for the difference in mean Hausdorff distances between square and hexagonal algorithms using the spatial mean quantization option. The T-test results provide indications regarding whether or not differences in observed mean Hausdorff distances between hexagonal and square treatments were significant at each target scale. In both Tables 4.3 and 4.5, the “Mean” column gives the difference in mean Hausdorff distances at each scale between the two algorithms (in meters). The “t” column gives the calculated T-test statistic. This value must be greater than some critical value at the test’s degrees of freedom (“df” column) to indicate statistically significant difference between means. 
These critical values can be looked up in a table of t-distribution critical values, but the SPSS output provides a two-tailed significance value (the right-most column) that makes this unnecessary. If the value in the “Sig. (2-tailed)” column is 0.05 or less, there is reason to reject the hypothesis that the two means being compared are equal. For example, the statistics calculated between mean Hausdorff distances from the hexagonal and square algorithms at 1:100,000 given in Table 4.3 indicate a significant difference, while those at 1:1,000,000 do not.

Because some distributions of Hausdorff distances departed substantially from normality, the nonparametric related-samples Wilcoxon signed rank test was also used to test for significant difference in means, at the 95% confidence level in all cases. This was done to corroborate findings from the paired-samples T tests, and to make certain that T-test findings were not spurious in the presence of some non-normal data. These statistics are given in Table 4.6. For each target scale (along the left-most column) and for either vertex clustering method (indicated in the top-most row), the mean Hausdorff distances from all 34 lines were compared across both algorithms to determine whether they were significantly different, this time without assuming a normal probability distribution in the data. When significance values are 0.05 or less, the null hypothesis of no difference in the mean Hausdorff distances derived from either algorithm is rejected, indicating that one algorithm places output lines significantly closer to the input line than does the other.

                     Spatial means                  Midpoint of 1st and last point
Scale pairings       H0 decision   Significance     H0 decision   Significance
1:50,000             Kept          .562             Kept          .066
1:100,000            Rejected      .000             Rejected      .000
1:150,000            Rejected      .000             Rejected      .020
1:200,000            Rejected      .000             Rejected      .001
1:250,000            Rejected      .001             Kept          .285
1:500,000            Rejected      .001             Rejected      .009
1:1,000,000          Kept          .101             Kept          .952

Table 4.6 - Related-samples Wilcoxon signed rank statistics (H0: no significant difference in means).

The Hausdorff distance of each of the 952 line simplifications generated in this research represents a permutation of three factor variables: the algorithm used (hexagons vs. squares), the quantization method used (spatial mean vs. midpoint of first and last vertices), and the target scale (i.e., tessera width, by corollary). In order to investigate the effects of each of these factors, both independently and in interaction with each other, a three-way analysis of variance (ANOVA) test was conducted. As with the T test and Wilcoxon signed rank tests, this test determines whether significant difference exists between groupings of means of a dependent variable (Hausdorff distance in this case); the null hypothesis is that no significant difference exists. At the 95% confidence level, the null hypothesis is rejected when the calculated significance value falls below 0.05. The results of this test are given in Table 4.7.

Factors                               df    Sum of Squares   Mean Square      F value     Sig.
Scale                                  1        46366000        46366000    12511.0597    0.000
Quantization                           1          116255          116255       31.3693    0.000
Algorithm                              1           23214           23214        6.2638    0.012
Scale × Quantization                   1          101993          101993       27.5212    0.000
Scale × Algorithm                      1              24              24        0.0064    0.936
Quantization × Algorithm               1            4116            4116        1.1106    0.292
Scale × Quantization × Algorithm       1           10725           10725        2.8939    0.089
Residuals                            944         3498465            3706

Table 4.7 - Three-way ANOVA test statistics across all 952 simplifications and three factors.
Along with Hausdorff distances, the number of vertices was recorded with each simplification, and a percent reduction in this number from the input line was calculated for each simplified line. Mean values for hexagonal and square treatments at each scale, and across both vertex clustering methods, are given in Table 4.8. While no statistical analyses are performed on these values, it can be quickly seen that differences in the reduction of vertices by either algorithm, using either vertex clustering method, are minute. An immediate conclusion from these data is that neither algorithm seems to reduce vertices appreciably more than the other. While vertex reduction has been a concern for some authors in line simplification research, it is not the aim of either algorithm used in this research.

                   Spatial means              Midpoint of 1st and last point
Scale              Hexagons   Squares         Hexagons   Squares
1:50,000             47.70      49.15           48.15      49.73
1:100,000            67.58      68.19           67.91      68.51
1:150,000            76.56      76.96           76.78      77.19
1:200,000            81.84      82.03           82.01      82.20
1:250,000            85.04      85.21           85.18      85.35
1:500,000            91.95      92.29           92.03      92.36
1:1,000,000          95.98      95.69           96.02      95.72

Table 4.8 - Mean percent reductions in vertices from the input line, averaged across all 34 sample lines, for each algorithm and each quantization option.

Interpretations

One of the goals of this research has been to demonstrate that the fidelity of cartographic lines produced by a vertex clustering simplification algorithm using hexagonal tessellated sampling is greater than that produced by the similar Li-Openshaw raster-vector algorithm, which uses square raster cells. There can be both subjective and objective evaluations of this claim, based on either aesthetic or metric judgments.

Discussion of Cartographic Results

From the preceding material in this chapter, it can be seen that both the hexagonal algorithm and the implementation of the Li-Openshaw raster-vector algorithm produce comparably acceptable cartographic lines. It should be repeated that, for the sake of direct comparability, this research has used the tessera width calculation formula developed for the hexagonal quantization algorithm for both hexagon and square size; thus, the products of the Li-Openshaw raster-vector algorithm presented here are not precisely those that would be achieved for a given target scale using Li and Openshaw's (1992, p. 378) formula. Also, the issue of line self-intersections was discussed in the third chapter; while both algorithms as implemented in this research did produce occasional self-intersections, this would not have happened with the Li-Openshaw algorithm had Li's (2007, p. 154) recommended vertex clustering method been employed. A counter-argument against using Li's suggested clustering method, however, is that whole portions of lines, such as peninsulas or small bays, would have been omitted because their outlets were narrow enough to fall within one tessera (Figure 3.9). Upon close observation, it is apparent that the two algorithms consistently produce differing lines. Figures 4.6 through 4.10 provide various examples of the output lines of either algorithm drawn at target scale and with target line weight. An important point to make is that neither algorithm seems to have an obvious advantage over the other in producing lines more acceptable on aesthetic grounds. This is to say that both the hexagonal quantization algorithm and the Li-Openshaw raster-vector algorithm are successful methods, able to produce lines that would seem acceptable to many cartographers and map readers.
Allowing for the comparable performance between the two algorithms in terms of aesthetics, it is important to note that another important consideration, particularly in topographic mapping settings, is the degree to which either algorithm deviates from the original line, or, put another way, the degree of fidelity to the original line each algorithm exhibits. One way to consider and seek to evaluate this is by direct visual comparison, as is possible with Figures 4.2 through 4.5. Observing these figures one can imagine a simplified line that a trained cartographer may manually draw while seeking to stay faithful to the original line. The product lines from either algorithm can then be considered for how closely each approximates the line drawn by the imaginary cartographer, which we assume would be a superior line. In the small, sinuous bay on the coast of Maine given in Figures 4.2 and 4.3, the line produced by the hexagonal quantization algorithm seems to straighten curving sections and retain narrow inlets with greater success than the Li-Openshaw raster-vector algorithm (noteworthy 87 examples are seen in the 1:500,000 graphics in both figures). Given sufficient space and resolution on the target map to depict small details such as narrow inlets, the retention of these makes an output line more faithful than another produced to the same scale that does not retain the inlets. By this reasoning, one may conclude that the hexagonal quantization algorithm performs with greater fidelity to the input line in that it will tend to retain visible geographical features through greater scale change than will the Li-Openshaw raster-vector algorithm. Further, in the case of extreme scale change to 1:1,000,000, the hexagonal quantization algorithm retains a more descriptive shape for the bay in Figures 4.2 and 4.3 than does the LiOpenshaw raster-vector algorithm. This too contributes to the greater fidelity of the hexagonal quantization algorithm, since it tends to draw more geographically informative forms at extreme scale changes than does the Li-Openshaw raster-vector algorithm. The hexagonal quantization algorithm is also seen at times to reduce small details with greater success than the Li-Openshaw algorithm. Figures 4.4 and 4.5 illustrate the performance of both algorithms on a complex set of peninsulas. As inspection of these two figures may suggest, the hexagonal quantization algorithm tended to omit the very narrow portion of the southern peninsula more often than did the Li-Openshaw raster-vector algorithm, even though the hexagonal algorithm still retained the larger portion of the peninsula. This illustrates a successful simplification of the peninsula, retaining the important fact that a peninsula of significant land mass exists while pruning away detail too small for the target map. In retaining the narrower portion of the peninsula more often, the Li-Openshaw raster-vector algorithm as here implemented encountered self-intersection problems more often than did the hexagonal quantization algorithm. Still, both algorithms exhibit this flaw; future work, outlined in greater detail in the Conclusions chapter, will address and resolve this issue. It is difficult to visually isolate effects between tessera shape difference vs. vertex clustering method from the product lines presented (Figures 4.2 vs.4.3; 4.4 vs. 4.5; and Figures 88 4.8 through 4.10). 
Close visual inspection suggests that the lines produced by the spatial mean quantization are slightly less angular, and thus slightly more aesthetically-pleasing, though this is not immediately obvious. Interestingly, this is coincident with the fact that the spatial mean quantization always produced shorter mean Hausdorff distances than did the midpoint first and last vertex quantization, at all scales and for both algorithms (see Table 4.1). This suggests that greater objective positional accuracy actually contributes, however minutely, to aesthetically superior results. Discussion of Statistical Results Objective evaluation is based on the statistical analyses of the Hausdorff distances between input and simplified lines. Descriptive statistics in Table 4.1 indicate shorter mean Hausdorff distances for hexagons than squares in 11 of 14 pairings. Both parametric and nonparametric tests (Tables 4.3, 4.5 - 4.7) demonstrated significant difference in the Hausdorff distances between hexagonal and square simplifications, for either vertex clustering method. Since tests were conducted both across all seven target scales (three-way ANOVA) and at each target scale (T tests and Wilcoxon signed rank test), the following discussion treats each set of analyses individually. Three-way ANOVA Results, Across Target Scales The results of the three-way ANOVA test conducted (Table 4.7) indicate strongly significant effects on Hausdorff distances from each of the following factors and combinations of factors: scale, algorithm, quantization method, scale in interaction with quantization method, and scale, algorithm and quantization all in interaction. 89 The first factor, scale, is known a priori to have an effect on Hausdorff distance; simplifications are a function of target scale and will obviously exhibit increasing Hausdorff distances as target scale decreases (i.e., as tessera width increases). Thus the significance of scale in this test is not surprising, but it is important to include it in the ANOVA model in order to account for its effects against other factors. Of most interest is the fact that the algorithm factor was determined to be significant, nearly to the 99% confidence level (“Sig.” value of 0.012, Table 4.7). This provides grounds to reject the null hypothesis of equivalence of mean Hausdorff distances between the hexagonal and square algorithms. Since the descriptive statistics given in Table 4.1 indicate shorter mean distances for hexagons in 11 of 14 comparisons, the rejection of the equivalence hypothesis strongly suggests an advantage attributable to the hexagonal algorithm. Quantization method was determined to be highly significant (“Sig.” value of 0.000, Table 4.7). This, when considered with the consistently shorter distances for the spatial mean quantization method seen in Table 4.1, strongly suggests that that quantization method dependably produces simplified lines more faithful to the input line than those produced by the midpoint first and last method. Both significant interactions (scale × quantization and scale × quantization × algorithm) include scale. Since, as stated above, scale is expected to drive Hausdorff distance values, interactions with scale are not interesting results. 
T Test and Wilcoxon Signed Rank Results, Within Target Scales

As seen from the test results in Tables 4.3, 4.5, and 4.6, significant differences existed in the Hausdorff distances generated between hexagonal and square simplifications, for either vertex clustering method, at most, but not all, target scales. Since hexagons were not better in all cases, explorations of the exceptions are offered.

In the case of simplifications to 1:50,000, the hexagon mean was lower than the square mean for the spatial mean method, but not for the midpoint first and last vertices method, and in neither case was the difference statistically significant (Table 4.1). A general explanation for this is that the scale difference from the input data, at approximately 1:24,000 (allowing for the variability in the Canadian data), to the target scale at 1:50,000 is not particularly large. Hexagon and square widths at that scale were 62.5 m. Table 4.8 provides the mean vertex reductions upon simplification using either algorithm and either collapse method. It can be seen from that table that vertex reductions in all cases for the 1:50,000 scale hovered just below 50%. Relative spacing between input line vertices was visually inspected by the author and seen to be generally consistent and uniform along the line. Granting, then, that input vertices were spaced at generally regular intervals, this means that on average a single tessera in the 1:50,000 runs, of either algorithm, usually produced its output point from two input vertices. Since both hexagonal and square sampling tesserae were usually calculating output vertices from two input vertices (i.e., within-tessera variability was relatively constant across tessera shapes), the distances between the collapsed points and input points would frequently be similar across both algorithms. This is true of either collapse method, since the collapsed points would be the same whether the spatial mean or the midpoint between first and last points clustering method was used. At 1:50,000, then, hexagons did not produce statistically significantly shorter Hausdorff distances than squares because the scale change was not large enough to take advantage of the lesser anisotropy of the hexagonal packing; with either tessera only large enough to capture about two points, the differing point clustering afforded between hexagons and squares was not reflected by the output vertices.

For the midpoint first and last clustering method, output lines at the 1:250,000 scale did not display a statistically significant difference in mean Hausdorff distances between hexagons and squares. Means were very close, with hexagons producing a slightly longer distance (216.7 m for hexagons and 216.6 m for squares, see Table 4.1). This same pattern, with hexagonal mean Hausdorff distance greater than square, is seen once more using the same vertex clustering method at 1:1,000,000. The reason for this likely lies in the overall layout of the input lines and how closely they followed the natural anisotropy of either tessellation. While hexagons are generally less anisotropic because of their six-fold radial symmetry and the consistency of distance between tesserae, it has been observed by some authors (Iftekharuddin & Karim, 1993; Kamgar-Parsi et al., 1989) that square tessellations can have higher sampling fidelity when the signal itself is more orthogonally distributed.
Cartographic lines such as rivers and coastlines generally have naturally high directional variability; while it may be true in general that hexagons sample these lines with greater fidelity, there will be some instances, at some tessellation resolutions, when a line lies relatively more orthogonally along the x and y axes. In these relatively infrequent cases, a square tessellation can actually sample the line with greater fidelity.

As noted before, the spatial mean quantization method always produced shorter Hausdorff distances than did the midpoint first and last vertices method (Table 4.1). This is because the midpoint first and last points clustering method will tend to place output vertices further away from input vertices in the same tessera (see Figure 3.7). In the case of an input line intersecting the sampling tesserae in a relatively orthogonal pattern, the midpoint first and last clustering method can actually reinforce the orthogonality, whereas the spatial mean clustering method would tend to obscure it.

Finally, at 1:1,000,000, hexagons were not seen to yield statistically significantly shorter Hausdorff distances than squares for either vertex clustering method. The mean Hausdorff distance in the case of the spatial mean clustering method was shorter for hexagons, but not significantly so at the 95% confidence level (2-tailed significance of .121, Table 4.5). As such, while hexagons did in fact perform better overall at this scale and with this clustering method (i.e., they yielded a lower mean Hausdorff distance, Table 4.1), from this hexagon-square pairing alone at 1:1,000,000 the statistical analysis does not support rejection of the possibility that this is due to chance. In the case of the midpoint first and last clustering method at 1:1,000,000, the mean hexagonal Hausdorff distance was slightly longer than the square (Table 4.1); the same explanation regarding the relative orthogonality of input lines at certain sampling tessera widths is suggested to account for this.

Magnitude of Improvement over Squares

From the compared mean Hausdorff distances, it can be seen that the magnitude of improvement presented by hexagons over squares is small (as evidenced by the mean Hausdorff distances given in Table 4.1); this difference is observed across all test pairings in Table 4.1 to represent approximately 3.5% of the width of the tessera used (4.2% for the spatial mean and 2.9% for the midpoint first and last vertices methods independently). For example, for the spatial mean method at 1:250,000, the difference between the mean distances of 193.4 m and 181.5 m is 11.9 m, or approximately 3.8% of the 312.5 m tessera width.

Summary

It is observed that, of the two algorithms, the hexagonal quantization algorithm generally performs with greater positional fidelity to the input line, in that it produces lines with shorter Hausdorff distances to the input line. This conclusion is supported by the results of a three-way analysis of variance on mean Hausdorff distances, taking into account the factors of algorithm used, quantization method used, and target scale. It is also supported by 11 of 14 trial pairings at seven target scales. The fidelity difference between output lines from hexagons vs. squares is statistically confirmed, with the benefit of hexagons over squares being relatively small. These findings are taken to support the notion that hexagons demonstrated superior performance over squares in general. It is also found that the spatial mean quantization method produces simplified lines at significantly shorter displacements from the input line than does the midpoint first and last vertices method.
Chapter 5 Conclusions and Future Work

There are two principal conclusions to this thesis. The first is that classical sampling theory can be successfully coupled to map resolution to inform scale-specific map generalization processes. This corroborates Li’s (1995) vision of a scale-driven paradigm for automated digital map generalization. It allows for objective generalization, and removes the need to iteratively repeat processes until a desirable solution is achieved. This is an important finding for cartography, particularly because many algorithms currently in use by cartographers cannot be calibrated to target scales, despite the fact that cartographers are often tasked with making generalizations for maps whose scales are determined ahead of time, as is the case, for example, in national topographic mapping settings.

The second principal conclusion of this thesis is that hexagonal tessellation generally produces demonstrably more faithful simplified map lines than does square tessellation when using a vertex clustering line simplification technique. “Faithfulness” is understood to be the minimization of positional difference, measured between the two sets of input and output polyline vertices in ℝ² by the Hausdorff distance. It is also argued from visual inspection that the performance of the hexagonal quantization algorithm is closer to that which may be expected from a human cartographer than is the performance of the Li-Openshaw raster-vector algorithm (Li & Openshaw, 1992). One implication of this is that there now exists a method of line simplification using a similar technique that produces improved lines over those generated by the existing Li-Openshaw raster-vector algorithm.

This research has also demonstrated the utility of the Hausdorff distance in evaluating the products of line simplification algorithms. The Hausdorff distance enjoys widespread use in computer vision for pattern matching because it metrizes pattern differences; the same ability to compare can be applied to cartographic input and generalized data to quantify generalization fidelity. Related to the conclusion stated above regarding the relative performances of the hexagonal quantization and Li-Openshaw raster-vector algorithms, another conclusion of this thesis is formed from the significant difference in fidelity seen between the two quantization methods tested: the spatial mean method produces lines less deviated from the input line than does the midpoint first and last method.

Relative Magnitude of Improvement

As was mentioned in the preceding chapter, the magnitude of fidelity improvement presented by hexagons over squares is relatively small. Given that the tessera width is calculated such that it is barely resolvable at target scale, the relatively small cartographic improvement the hexagonal algorithm affords is not immediately visually appreciable. Small differences in product lines are, however, visible at times upon close inspection (see Figures 4.2 through 4.10). Future work by the author is planned to examine whether or not any visible differences in the products of the two algorithms are due to differing levels of anisotropy inherited from either tessellation; if one algorithm is found to be significantly more isotropic in its output, it is suggested that that algorithm's cartographic output is truer to reality, even if the differences are only minutely noticeable.
Despite the small visual improvements afforded by the hexagonal algorithm, a known value lies in the fact that it is, by however little, more accurate than square sampling. Any subsequent modeling or analyses undertaken on simplified line data produced by the hexagonal algorithm will be based on data with less inherent systematic error than data produced by a square sampling process such as that of the Li-Openshaw raster-vector algorithm. Though the magnitude of this difference in error is not great, there is no reason why analysts cannot employ a more accurate solution, particularly since it is no more difficult to employ. Even though analysts frequently opt to use the highest-detail data available, maintaining low error levels in generalized data is worthwhile: there always exist phenomena in geographic models that operate at smaller scales. When examining for these, analysts are wise to select geospatial data appropriate to their model’s scaling. Also, while “zooming in” on a line produced by the hexagonal quantization algorithm goes against the scale-specific spirit in which the method was conceived, it is likely to happen, given current paradigms in internet and mobile cartography. “Zooming in” on a vector line produced by the hexagonal quantization algorithm would yield a more accurate line position than doing the same on a line produced by the Li-Openshaw raster-vector algorithm, with the slight improvements of the hexagonal quantization algorithm becoming more and more visually appreciable as one increases map scale.

Future Tessellation Variations

One important possibility for investigation is the ability to refine fidelity by iterating through many possible placements of the tessellation (Figure 3.4) and selecting the placement that yields the lowest displacement for a given vertex clustering method. The Li-Openshaw raster-vector algorithm, as described by Li and Openshaw (1992), explicitly places the first raster cell centered on the first vertex of the input line, thus placing all other cells around this first one in defined locations. Li also suggests (2007, p. 154) that his algorithm could be implemented with a sliding raster grid following the input line, though he does not detail suggestions for defining the amount of translation. A future branch of research on the hexagonal algorithm may, for example, place the tessellation in 100 different randomly generated positions. This may be achieved by "jittering" the translation in the x and y directions, each by some random value between 0 and 1 multiplied by half the tessera width. That calculation theoretically allows the tessellation to move throughout the full range of motion possible before it simply re-coincides with its initial position. Similarly, the hexagonal tessellation may be rotated through 60° (Figure 3.3). A simplification may be undertaken for each tessellation position, and the software implementing this may choose the simplification with the shortest Hausdorff distance as the final product. This same process may be undertaken with the Li-Openshaw raster-vector algorithm, allowing for shifting of the raster grid away from the position defined by the first vertex of the input line. In this way it may be possible to optimize the outputs of either algorithm, and compare the optimized fidelities. Related to the translation of the sampling tessellation is the idea of varying local tessera size according to local input line statistics.
Local line statistics may include vertex frequency along the line, or neighborhood total angularity, among other possibilities. It may be possible to begin with a tessera size derived from a target map scale and resolution, and then expand or contract local hexagons in relation to local line statistics, in order to achieve locally-varying levels of simplification. This technique may also be useful in exaggeration procedures, identified by researchers as a map generalization operator distinct from line simplification. Future work on the hexagonal quantization algorithm may also involve alternative quantitative evaluation methods. Several researchers have noted the utility of fractal dimension in characterizing map lines (Buttenfield & McMaster, 1991; Normant & Tricot, 1993). It has been asserted that maintaining fractal dimension should be an objective for automated line simplification, since doing so would presumably retain the essential character of the line, and that algorithms can be evaluated on their performance in this regard (Muller, 1987). Future analyses 98 of the hexagonal quantization algorithm, then, will measure and compare fractal dimension of the input and output lines. Other metrics may also include more basic line characteristics, such as sinuosity and angularity. Repair of Line Self-Crossings While this research has produced occasional self-crossings for either algorithm implemented, a process for undoing these is currently under development. Because lines were permitted to place more than one output vertex in a tessera, they may have crossed themselves one or more times. A line self crossing is thought of as a “twist”. It is observed that a simple process of checking for intersecting line segments and reversing vertex connectivity sequences is able to undo these twists. A hypothetical post-processing algorithm would progress according to the process laid out in Figure 5.1. Future work may implement this process, examine whether it satisfactorily resolves self-crossings without creating spurious landscape features, and explore its application to the products of other algorithms, such as the Douglas-Peucker (1973) algorithm. Figure 5.1 - “Untwisting” line self-crossings. The routine iterates through all line segments, checking for intersections with other line segments. When one is found, the sequence of vertices starting from the second vertex of the first line segment until the first vertex of the second line segment is reversed. The process repeats from the beginning of the line, “untwisting” selfcrossings one at a time, until no more are detected. 99 General Summary The preceding work has detailed the invention and implementation of a new scalespecific line simplification algorithm, termed the hexagonal quantization algorithm. The development of this algorithm has demonstrated that scale-specificity in cartographic line simplification can be achieved objectively by applying basic sampling theory to map resolution. It has also been demonstrated that lines produced by the hexagonal quantization algorithm are more faithful to their input lines than those produced by a closely related algorithm, the LiOpenshaw raster-vector algorithm (Li & Openshaw, 1992). Appendix A Summary Table of All Sample Lines All lines from Canadian or U.S. rivers and shores, sampled from National Hydro Network (Canadian Council on Geomatics) or National Hydrography Dataset “high resolution” (USGS) datasets. All straight-line distances from end to end within 15 to 20 km. 
Thumbnails are individually reduced to fit. Line Thumbnail Geomorphological Type Alaskan Peninsula Ice-dominated rocky beach Baranof Island coast Ice-dominated rocky beach Bay of Fundy shore Tidal-dominated coast Black River Contorted river Cape Breton coast Rocky glacier-formed shore 101 Cape Cod coast Wave-dominated, depositional shore Cedar River Dendritic river Western Florida coast Sandy wave-dominated beach Gaspé Peninsula coast Rocky glacier-formed shore Humboldt River Contorted river Île Jésus, Laval shore Depositional river island shore Killiniq Island coast Ice-dominated rocky beach Klinaklini River Dendritic river Southeastern Labrador coast Ice-dominated rocky beach 102 Lake Ontario shore Wave-dominated lake shore Lake Superior shore Wave-dominated lake shore Southern Maine coast Rocky glacier-formed shore Mancos River Contorted river Northern Michigan shore Wave-dominated lake shore Mississippi Delta coast Aluvial river delta shore Myrtle Beach coast Sandy wave-dominated beach Southeastern Newfoundland coast Rocky glacier-formed shore Eastern Nova Scotia coast Rocky glacier-formed shore 103 Obion River Dendritic, meandering river Northern Oregon coast Wave-dominated sandy beach, some sea cliffs Pecatonica River Dendritic, meandering river Northeastern Prince Edward Island Rocky glacier-formed shore coast Potomac River shore Estuary shore Rio Grande Dammed, agriculturally-managed meandering river Saline River High-sediment, meandering river San Francisco coast Partly human-defined shore Suwannee River Meandering river 104 Sweetwater River Dendritic, high-sediment river Yukon River Dendritic river through mountainous region Appendix B Example Text Report from Software ::: Starting Program ::::::::::::::::::::::::::::::::::::::::::::: Reading input file: C:\Courses\Thesis\ThesisData\SampleLines2csv\NovaScotia_C.csv Input scale: Target scale: Tessera width: Vertex collapse method: Calculating Hausdorff distances: 10000 - 50000 250000 312.5 m midpoint 1st & last true ...HEXAGONS....................................... using output file: C:\Courses\Thesis\ThesisData\SampleLines2csvSimplified\NovaScotia_C_MpH_250k.csv Input vertices: 1190 Output vertices: 187 Output vertices are 15.714% of input. (84.286% decrease) ~~ Hausdorff Report ~~~~~~~~~~~~~~~ h(input to simplified) = 251.13 m h(simplified to input) = 73.59 m * H(input, simplified) = 251.13 m ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ... Hexagons done! .................................. ...SQUARES....................................... using output file: C:\Courses\Thesis\ThesisData\SampleLines2csvSimplified\NovaScotia_C_MpS_250k.csv Input vertices: 1190 Output vertices: 183 Output vertices are 15.378% of input. (84.622% decrease) ~~ Hausdorff Report ~~~~~~~~~~~~~~~ h(input to simplified) = 253.9 m h(simplified to input) = 106.28 m * H(input, simplified) = 253.9 m ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ... Squares done! ................................ ::: Ending Program ::::::::::::::::::::::::::::::::::::::::::::::: 106 References Akman, V., Franklin, W. R., Kankanhalli, M., & Narayanaswami, C. (1989). Geometric computing and uniform grid technique. Computer-Aided Design, 21(7), 410-420. Alt, H., Godau, M., Knauer, C., & Wenk, C. (2002). Computing the Hausdorff distance of geometric patterns and shapes. Discrete and Computational Geometry-The GoodmanPollack-Festschrift. Alt, H., & Guibas, L. J. (2000). Discrete geometric shapes: matching, interpolation, and approximation; a survey. In J.-R. Sack & J. 
References

Akman, V., Franklin, W. R., Kankanhalli, M., & Narayanaswami, C. (1989). Geometric computing and uniform grid technique. Computer-Aided Design, 21(7), 410-420.
Alt, H., Godau, M., Knauer, C., & Wenk, C. (2002). Computing the Hausdorff distance of geometric patterns and shapes. Discrete and Computational Geometry: The Goodman-Pollack Festschrift.
Alt, H., & Guibas, L. J. (2000). Discrete geometric shapes: matching, interpolation, and approximation; a survey. In J.-R. Sack & J. Urrutia (Eds.), Handbook of Computational Geometry (pp. 121-153). Amsterdam: Elsevier Science B.V.
Arkin, E. M., Chew, L. P., Huttenlocher, D. P., Kedem, K., & Mitchell, J. S. B. (1991). An efficiently computable metric for comparing polygonal shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(3), 209-216.
Attneave, F. (1954). Some informational aspects of visual perception. Psychological Review, 61(3), 183-193.
Ballard, D. H. (1981). Strip trees: a hierarchical representation for curves. Communications of the ACM, 24(5), 310-321.
Bertin, J. (1983). Semiology of Graphics. University of Wisconsin Press.
Birch, C. P. D., Oom, S. P., & Beecham, J. A. (2007). Rectangular and hexagonal grids used for observation, experiment and simulation in ecology. Ecological Modelling, 206(3-4), 347-359.
Bloch, M., & Harrower, M. (2008, 4 September). Mapshaper. Retrieved from mapshaper.org
Brassel, K., & Weibel, R. (1988). A review and conceptual framework of automated map generalization. International Journal of Geographical Information Science, 2(3), 229-244.
Brewer, C. A. (1996). Prediction of simultaneous contrast between map colors with Hunt's model of color appearance. Color Research and Application, 21(3), 221-235.
Buchanan, B. G., & Duda, R. O. (1983). Principles of rule-based expert systems. Advances in Computers, 22, 163-216.
Burghardt, D., & Cecconi, A. (2007). Mesh simplification for building typification. International Journal of Geographical Information Science, 21(3), 283-283.
Buttenfield, B. P. (1985). Treatment of the cartographic line. Cartographica: The International Journal for Geographic Information and Geovisualization, 22(2), 1-26.
Buttenfield, B. P. (1989). Scale-dependence and self-similarity in cartographic lines. Cartographica, 26(1), 79-100.
Buttenfield, B. P. (1991). A rule for describing line feature geometry. In B. P. Buttenfield & R. B. McMaster (Eds.), Map Generalization: Making Rules for Knowledge Representation (pp. 150-239). Essex: Longman Scientific & Technical.
Buttenfield, B. P., & McMaster, R. B. (Eds.). (1991). Map Generalization: Making Rules for Knowledge Representation. Essex: Longman Scientific & Technical.
Carr, D. B., Olsen, A. R., & White, D. (1992). Hexagon mosaic maps for display of univariate and bivariate geographical data. Cartography and Geographic Information Science, 19(4), 228-236.
Carstensen, L. W. (1990). Angularity and capture of the cartographic line during digital data entry. Cartography and Geographic Information Systems, 17(3), 209-224.
Cartography, Swiss Society of. (1977). Cartographic generalization. Cartographic Publication Series. Enschede, The Netherlands: ITC Cartography Department.
Cecconi, A. (2003). Integration of cartographic generalization and multi-scale databases for enhanced web mapping. Ph.D. dissertation, Universität Zürich, Zürich. Retrieved from http://ecollection.ethbib.ethz.ch/show?type=extdiss&nr=6
Christaller, W. (1933). Die zentralen Orte in Süddeutschland. Jena: Gustav Fischer.
Christensen, A. H. J. (2000). Line generalization by waterlining and medial-axis transformation. Successes and issues in an implementation of Perkal's proposal. The Cartographic Journal, 37(1), 19-28.
Condat, L., Van De Ville, D., & Blu, T. (2005). Hexagonal versus orthogonal lattices: a new comparison using approximation theory. Paper presented at the IEEE International Conference on Image Processing.
Cromley, R. G. (1991). Hierarchical methods of line simplification. Cartography and Geographic Information Science, 18(2), 125-131.
Cromley, R. G. (1992). Principal axis line simplification. Computers & Geosciences, 18(8), 1003-1011.
Cromley, R. G., & Campbell, G. M. (1992). Integrating quantitative and qualitative aspects of digital line simplification. The Cartographic Journal, 29(1), 25-30.
Dalmau, D. S.-C. (2004). Core Techniques and Algorithms in Game Programming. New Riders Publishing.
Dent, B. (1972). A note on the importance of shape in cartogram communication. Journal of Geography, 71, 393-401.
Douglas, D. H., & Peucker, T. K. (1973). Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica, 10(2), 112-122.
Duff, M. J. B., Watson, D. M., Fountain, T. J., & Shaw, G. K. (1973). A cellular logic array for image processing. Pattern Recognition, 5, 229-247.
Dutton, G. (1999). Scale, sinuosity, and point selection in digital line generalization. Cartography and Geographic Information Science, 26(1), 33-53.
García, J. A., & Fdez-Valdivia, J. (1994). Boundary simplification in cartography preserving the different-scale shape features. Computers & Geosciences, 20(3), 349-368.
Geomatics Canada. (2010). National Hydro Network Data Product Specifications Distribution Profile. Sherbrooke, Quebec: Her Majesty the Queen in Right of Canada, Department of Natural Resources. Retrieved from http://www.geobase.ca/doc/specs/pdf/GeoBase_NHN_Specs_EN.pdf
Graham, M. D. (1990). Comparison of three hexagonal tessellations through extraction of blood cell geometric features. Analytical and Quantitative Cytology and Histology, 12(1), 56-72.
Griffin, A. L., MacEachren, A. M., Hardisty, F., Steiner, E., & Li, B. (2006). A comparison of animated maps with static small-multiple maps for visually identifying space-time clusters. Annals of the Association of American Geographers, 96(4), 740-753.
Hales, T. C. (2001). The honeycomb conjecture. Discrete and Computational Geometry, 25(1), 1-22.
Hangouët, J. (1995). Computation of the Hausdorff distance between plane vector polylines. Paper presented at the AutoCarto 12 Conference, Charlotte, North Carolina.
Harrie, L., & Weibel, R. (2007). Modelling the overall process of generalisation. In W. A. Mackaness, A. Ruas & L. T. Sarjakoski (Eds.), Generalisation of Geographic Information: Cartographic Modelling and Applications (pp. 67-87). Elsevier.
Hoppe, H. (1996). Progressive meshes. Paper presented at the ACM SIGGRAPH Conference.
Huttenlocher, D., Klanderman, G., & Rucklidge, W. (1993). Comparing images using the Hausdorff distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(9).
Iftekharuddin, K. M., & Karim, M. A. (1993). Acquisition of noise-free and noisy signal: effect of different staring focal-plane-array pixel geometry. Paper presented at the IEEE National Aerospace and Electronics Conference, Dayton, Ohio.
Jenks, G. F. (1979). Thoughts on line generalization. Paper presented at the AutoCarto 4 Conference, Reston, Virginia.
Jenks, G. F. (1989). Geographic logic in line generalization. Cartographica: The International Journal for Geographic Information and Geovisualization, 26(1), 27-42.
Kamgar-Parsi, B., Kamgar-Parsi, B., & Sander, W. A., III. (1989). Quantization error in spatial sampling: comparison between square and hexagonal pixels. Paper presented at the Computer Vision and Pattern Recognition Conference.
Kazemi, S., Lim, S., & Paik, H. (2009). Generalisation expert system (GES): a knowledge-based approach for generalisation of line and polyline spatial datasets. Paper presented at the Surveying & Spatial Sciences Institute Biennial International Conference, Adelaide, South Australia.
Knauer, C., Löffler, M., Scherfenberg, M., & Wolle, T. (2009). The directed Hausdorff distance between imprecise point sets. In Y. Dong, D.-Z. Du & O. Ibarra (Eds.), Algorithms and Computation (Vol. 5878, pp. 720-729). Berlin & Heidelberg: Springer.
Lang, T. (1969). Rules for robot draughtsmen. The Geographical Magazine, 42(1), 50-51.
Lecordix, F., Plazanet, C., & Lagrange, J. P. (1997). A platform for research in generalization: application to caricature. GeoInformatica, 1(2), 161-182.
Li, Z. (1996). Transformation of spatial representation in scale dimension: a new paradigm for digital generalization of spatial data. International Archives of Photogrammetry and Remote Sensing, 31, 453-458.
Li, Z. (2007). Algorithmic Foundation of Multi-Scale Spatial Representation. Boca Raton, London, New York: CRC Press.
Li, Z., & Openshaw, S. (1990). A natural principle of objective generalization of digital map data and other spatial data. RRL Research Report: CURDS, University of Newcastle upon Tyne.
Li, Z., & Openshaw, S. (1992). Algorithms for automated line generalization based on a natural principle of objective generalization. International Journal of Geographical Information Systems, 6(5), 373-389.
Li, Z., & Openshaw, S. (1993). A natural principle for the objective generalization of digital maps. Cartography and Geographic Information Science, 20(1), 19-29.
Li, Z., & Su, B. (1995). From phenomena to essence: envisioning the nature of digital map generalisation. The Cartographic Journal, 32(1), 45-47.
Llanas, B. (2005). Efficient computation of the Hausdorff distance between polytopes by exterior random covering. Computational Optimization and Applications, 30, 161-194.
Mandelbrot, B. (1982). The Fractal Geometry of Nature. San Francisco: Freeman.
Marino, J. (1979). Identification of characteristic points along naturally occurring lines: an empirical study. The Canadian Cartographer, 16(1), 70-80.
McMaster, R. B. (1986). A statistical analysis of mathematical measures for linear simplification. The American Cartographer, 13(2), 103-116.
McMaster, R. B. (1987). Automated line generalization. Cartographica, 24(2), 74-111.
McMaster, R. B., & Shea, K. S. (1988). Cartographic generalization in a digital environment: a framework for implementation in a geographic information system. Paper presented at the GIS/LIS '88 Conference, San Antonio, Texas.
McMaster, R. B., & Shea, K. S. (1992). Generalization in Digital Cartography. Washington, D.C.: Association of American Geographers.
McMaster, R. B., & Veregin, H. (1991). Visualizing cartographic generalization. Paper presented at the AutoCarto 10 Conference, Baltimore, Maryland.
Meer, P., Sher, C. A., & Rosenfeld, A. (1990). The chain pyramid: hierarchical contour processing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(4), 363-376.
Mendenhall, W., Beaver, R. J., & Beaver, B. M. (2006). Introduction to Probability and Statistics (12th ed.). Belmont, California: Duxbury, Thomson Brooks/Cole.
Mersereau, R. M. (1978). Two-dimensional signal processing from hexagonal rasters. Paper presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing.
Mersereau, R. M. (1979). The processing of hexagonally-sampled two-dimensional signals. Proceedings of the IEEE, 67(6), 930-949.
Muller, J.-C. (1987). Fractal and automated line generalization. The Cartographic Journal, 24(1), 27-34.
Muller, J.-C. (1990). The removal of spatial conflicts in line generalization. Cartography and Geographic Information Science, 17(2), 141-149.
Nell, A. L. (1989). Hexagonal image processing. Paper presented at the Southern African Conference on Communications and Signal Processing, Stellenbosch, South Africa.
Nickerson, B. G. (1988). Automated cartographic generalization for linear features. Cartographica, 25(3), 15-66.
Normant, F., & Tricot, C. (1993). Fractal simplification of lines using convex hulls. Geographical Analysis, 25(2), 118-129.
Nyquist, H. (1928). Certain topics in telegraph transmission theory. Transactions of the American Institute of Electrical Engineers, 47(2), 617-644.
Perkal, J. (1965). An attempt at objective generalization. Michigan Inter-University Community of Mathematical Geographers, Discussion Paper 10.
Peucker, T. (1976). A theory of the cartographic line. International Yearbook of Cartography, 16, 134-143.
Peuquet, D. J. (2002). Representations of Space and Time. New York: Guilford Press.
Plazanet, C. (1995). Measurement, characterization and classification for automated line feature generalization. Paper presented at the AutoCarto Conference, Charlotte, North Carolina.
Puu, T. (2005). On the genesis of hexagonal shapes. Networks and Spatial Economics, 5(1), 5-20.
Ramer, U. (1972). An iterative procedure for the polygonal approximation of plane curves. Computer Graphics and Image Processing, 1, 244-256.
Raposo, P. (2010). Piece by piece: a method of cartographic line generalization using regular hexagonal tessellation. Paper presented at the ASPRS/CaGIS 2010 Fall Specialty Conference, AutoCarto 2010, Orlando, Florida.
Ratajski, L. (1967). Phénomène des points de généralisation. International Yearbook of Cartography, 7, 143-152.
Robinson, A. H., Morrison, J. J., Muehrcke, P. C., Kimerling, A. J., & Guptill, S. C. (1995). Elements of Cartography (6th ed.). Wiley.
Rosin, P. L. (1992). Representing curves at their natural scales. Pattern Recognition, 25(11), 1315-1325.
Rossignac, J. (2004). Surface simplification and 3D geometry compression. In J. E. Goodman & J. O'Rourke (Eds.), Handbook of Discrete and Computational Geometry (2nd ed.). Boca Raton, Florida: Chapman & Hall/CRC.
Rossignac, J., & Borrel, P. (1993). Multi-resolution 3D approximations for rendering complex scenes. In Geometric Modeling in Computer Graphics (pp. 445-465). Berlin: Springer-Verlag.
Ruas, A. (2002). Les problématiques de l'automatisation de la généralisation. In A. Ruas (Ed.), Généralisation et représentation multiple (pp. 75-90). Hermès.
Rucklidge, W. (1996). Efficient Visual Recognition Using the Hausdorff Distance. Berlin: Springer-Verlag.
Rucklidge, W. (1997). Efficiently locating objects using the Hausdorff distance. International Journal of Computer Vision, 24(3), 251-270.
Saalfeld, A. (1999). Topologically consistent line simplification with the Douglas-Peucker algorithm. Cartography and Geographic Information Science, 26(1), 7-18.
Sarjakoski, L. T. (2007). Conceptual models of generalisation and multiple representation. In W. A. Mackaness, A. Ruas & L. T. Sarjakoski (Eds.), Generalisation of Geographic Information: Cartographic Modelling and Applications (pp. 11-36). Singapore: Elsevier, on behalf of the International Cartographic Association.
Savary, L., & Zeitouni, K. (2005). Automated linear geometric conflation for spatial data warehouse integration process. Paper presented at the Associated Geographic Information Laboratories Europe (AGILE) Conference, Estoril, Portugal.
Scholten, D. K., & Wilson, S. G. (1983). Chain coding with a hexagonal lattice. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-5(5), 526-533.
Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27, 379-423.
Shea, K. S., & McMaster, R. B. (1989). Cartographic generalization in a digital environment: when and how to generalize. Paper presented at the AutoCarto 9 Conference, Baltimore, Maryland.
Simley, J. D., & Carswell Jr., W. J. (2009). The National Map - Hydrography. U.S. Geological Survey Fact Sheet 2009-3054. Retrieved from http://pubs.usgs.gov/fs/2009/3054/pdf/FS2009-3054.pdf
Skopeliti, A., & Tsoulos, L. (2001). A knowledge based approach for the generalization of linear features. Paper presented at the International Cartographic Conference, Beijing, China.
Spiess, E. (1988). Map compilation. In R. W. Anson (Ed.), Basic Cartography. London: Elsevier.
Stoter, J., Smaalen, J. v., Bakker, N., & Hardy, P. (2009). Specifying map requirements for automated generalization of topographic data. The Cartographic Journal, 46(3), 214-227.
Thapa, K. (1988a). Automatic line generalization using zero-crossings. Photogrammetric Engineering and Remote Sensing, 54, 511-517.
Thapa, K. (1988b). Critical points detection and automatic line generalisation in raster data using zero-crossings. The Cartographic Journal, 25(1), 58-68.
Tobler, W. R. (1987). Measuring spatial resolution. Paper presented at the International Workshop on Geographic Information Systems, Beijing, China.
Töpfer, F., & Pillewizer, W. (1966). The principles of selection. The Cartographic Journal, 3(1), 10-16.
Trenhaile, A. S. (2007). Geomorphology: A Canadian Perspective (3rd ed.). Toronto: Oxford University Press.
Unger, S. H. (1958). A computer oriented toward spatial problems. Proceedings of the IRE, 46(10), 1744-1750.
Van Der Poorten, P. M., & Jones, C. B. (2002). Characterisation and generalisation of cartographic lines using Delaunay triangulation. International Journal of Geographical Information Science, 16(8), 773-794.
Veltkamp, R. C. (2001). Shape matching: similarity measures and algorithms. Paper presented at the International Conference on Shape Modeling & Applications, Genova, Italy.
Veltkamp, R. C., & Hagedoorn, M. (2000). Shape similarity measures, properties, and constructions. Paper presented at the 4th International VISUAL 2000 Conference.
Veregin, H. (1999). Line simplification, geometric distortion, and positional error. Cartographica, 36(1), 25-39.
Veregin, H. (2000). Quantifying positional error induced by line simplification. International Journal of Geographical Information Science, 14(2), 113-130.
Visvalingam, M., & Whyatt, J. (1993). Line generalisation by repeated elimination of points. The Cartographic Journal, 30(1), 46-51.
Wang, Z., & Muller, J. (1998). Line generalization based on analysis of shape characteristics. Cartography and Geographic Information Science, 25(1), 3-15.
Weed, J., & Polge, R. (1984). An efficient implementation of a hexagonal FFT. Paper presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) '84, San Diego, California.
Weibel, R. (1997). Generalization of spatial data: principles and selected algorithms. In M. van Kreveld, J. Nievergelt, T. Roos & P. Widmayer (Eds.), Algorithmic Foundations of Geographic Information Systems (Vol. 1340, pp. 99-152). Springer Berlin / Heidelberg.
White, E. R. (1985). Assessment of line-generalization algorithms using characteristic points. Cartography and Geographic Information Science, 12(1), 17-28.
Yajima, S., Goodsell, J. L., Ichida, T., & Hiraishi, H. (1981). Data compression of the Kanji character patterns digitized on the hexagonal mesh. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-3(2), 221-230.
Yang, S.-K., & Chuang, J.-H. (2003). Material-discontinuity preserving progressive mesh using vertex-collapsing simplification. Virtual Reality, 6(4), 205-216.
Zhan, B., & Buttenfield, B. (1996). Multi-scale representation of a digital line. Cartography and Geographic Information Science, 23(4), 206-228.