PP slides (with comments)

Representation of
spatial data
GIS thematic layers, raster and vector,
conversion, subdivision representation,
continuous data: contours, DEMs, TINs
Thematic map layers
• Separate storage of data according to
theme: map layers (or data layers)
• GIS typically use tens to hundreds of
map layers
• For example: municipality borders, land
use, cadastral boundaries, water pipes,
churches, etc.
Example
map layers
Census data, 1995
(U.S.A.)
Geometry, topology and
attributes
• Geometry: coordinates
• Topology: adjacency relations of objects
• Attributes: properties, values
Example: Country map of South America
Geometry: coordinates of the borders
Topology: which countries border which
Attributes: names of countries, population, etc.
Representation of geometry
• Two main approaches:
raster and vector
• Can also be mixed in a
GIS, per map layer
• Conversion raster-vector
and vice versa possible
• Representation depends
on type of data, way of
acquisition, desired
operations, etc.
Raster structure
• Division of space into equal-size cells (squares,
pixels)
• Theme gives cells a value (nominal, ordinal,
interval, ratio, vector, …)
• Cells should not contain any further spatial
information (more detail)
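A minimal sketch of the raster idea in Python: space is divided into equal-size square cells and the theme gives each cell a value. The class name `Raster`, its fields and the sample land-use values are assumptions made for illustration, not part of the slides.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Raster:
    """One map layer stored as equal-size square cells (illustrative layout)."""
    x_min: float               # x-coordinate of the left edge of the grid
    y_min: float               # y-coordinate of the bottom edge of the grid
    cell_size: float           # side length of one square cell
    values: List[List[str]]    # values[row][col]; row 0 is the bottom row

    def cell_of(self, x: float, y: float) -> Tuple[int, int]:
        """Return (row, col) of the cell that contains the point (x, y)."""
        col = int((x - self.x_min) // self.cell_size)
        row = int((y - self.y_min) // self.cell_size)
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[0])):
            raise ValueError("point lies outside the raster")
        return row, col

    def value_at(self, x: float, y: float) -> str:
        """Return the theme value of the cell that contains (x, y)."""
        row, col = self.cell_of(x, y)
        return self.values[row][col]


# Hypothetical nominal land-use layer with 2 rows and 3 columns of 10 x 10 cells.
land_use = Raster(0.0, 0.0, 10.0, [
    ["forest", "forest", "water"],
    ["forest", "urban", "water"],
])
print(land_use.value_at(15, 12))
```

Every location inside a cell gets the same value, which is why a cell cannot carry further spatial detail.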
Data in raster form
Point object in
raster form
Line object in
raster form
Plane object in
raster form
Raster maps
Raster: pros and cons
Pros:
• Simple structure
• Simple operations
• Obtained after scanning,
remote sensing
Cons:
• Less suitable for point and
line objects: representation
does not follow intuition
• Network analysis difficult
• Not adaptive: no difference
in detail possible in
different regions
• Either expensive in
memory, or little precision
• Not obtained after digitizing
Raster: memory reduction
• Run-length encoding: instead of a 2-dim array, store
each run as start pixel, value and length of run
• Block encoding: 2-dim version
• Disadvantage: makes structure and operations
much more complex
Example run (run-length encoding): (34,67) forest 9
Example block (block encoding): (34,67) forest 4,6
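Run-length encoding can be sketched as follows. The sketch assumes that runs do not continue across row boundaries and that a run is stored as ((row, col) of the start pixel, value, length); the function names are illustrative.

```python
from typing import List, Optional, Tuple

Run = Tuple[Tuple[int, int], str, int]  # ((row, col) of start pixel, value, run length)


def rle_encode(grid: List[List[str]]) -> List[Run]:
    """Encode each row of the grid as maximal runs of equal values."""
    runs: List[Run] = []
    for r, row in enumerate(grid):
        c = 0
        while c < len(row):
            start = c
            # extend the run while the value stays the same
            while c < len(row) and row[c] == row[start]:
                c += 1
            runs.append(((r, start), row[start], c - start))
    return runs


def rle_decode(runs: List[Run], n_rows: int, n_cols: int) -> List[List[Optional[str]]]:
    """Rebuild the 2-dim array from the list of runs."""
    grid: List[List[Optional[str]]] = [[None] * n_cols for _ in range(n_rows)]
    for (r, c), value, length in runs:
        for k in range(length):
            grid[r][c + k] = value
    return grid


grid = [
    ["forest", "forest", "forest", "water", "water"],
    ["water", "water", "water", "water", "water"],
]
runs = rle_encode(grid)
print(runs)
```

Looking up the value of a single pixel now requires a search through the runs, which illustrates why operations become more complex.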
Vector structure
• Objects stored as points, lines and areas
• Points have coordinates; lines connect points;
areas are delimited by lines
• Attributes are stored with the objects (point, line
or areal)
Vector: pros and cons
Pros:
• Elegant structure; fits
point, line and areal
objects alike
• Small storage consumption
• Precise
• Adaptive: additional
control points possible
• Network and cluster
analysis possible
• Obtained after digitizing
Cons:
• Relatively complex
• Map overlay and buffer
computation complex
Vector representation of a region
• Not necessarily
simply-connected:
– NL has islands
– NL has holes
(Baarle-Nassau /
Baarle-Hertog);
there are even
regions in these
holes
Representation of subdivisions
Subdivisions: spaghetti model
• Every chain is
represented by a list
with coordinate pairs
• Split nodes are doubly
stored
• Areas are not present
explicitly
[Figure: subdivision with chains C1 to C6]
C1: (..,..), (..,..), (..,..), ...
C2: (..,..), (..,..), (..,..), ...
C3: (..,..), (..,..), (..,..), ...
Subdivisions: polygon ring
structure
• Every area is represented
by a list with coordinate
pairs
• Control points are doubly
stored
• Neighbor areas are
difficult to determine
• Consistency is difficult to
maintain
[Figure: subdivision with polygons P1, P2, P3]
P1: (..,..), (..,..), (..,..), ...
P2: (..,..), (..,..), (..,..), ...
P3: (..,..), (..,..), (..,..), ...
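Why neighbor areas are difficult to determine in the polygon ring structure can be seen in a small sketch: with only coordinate lists available, adjacency has to be found by comparing the edges of all rings. The coordinates and function names below are hypothetical, and the comparison only works if shared boundaries use exactly the same control points in both rings.

```python
from itertools import combinations


def ring_edges(ring):
    """Return the undirected edges of a closed ring given as a list of (x, y) pairs."""
    n = len(ring)
    return {frozenset((ring[i], ring[(i + 1) % n])) for i in range(n)}


def neighbours(polygons):
    """Return the pairs of polygon names whose rings share at least one edge."""
    edges = {name: ring_edges(ring) for name, ring in polygons.items()}
    return [(a, b) for a, b in combinations(sorted(polygons), 2)
            if edges[a] & edges[b]]


# Hypothetical rings: P1 and P2 share the edge (2, 0)-(2, 2); P3 is isolated.
polygons = {
    "P1": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "P2": [(2, 0), (4, 0), (4, 2), (2, 2)],
    "P3": [(5, 5), (6, 5), (6, 6)],
}
print(neighbours(polygons))
```

If one ring had an extra control point on the shared boundary, the two edge sets would no longer match, which is the consistency problem listed above.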
Subdivisions: topological
structure (node-link structure)
• Nodes are objects with
coordinates
• Edges are connections
of nodes
• Sequences of edges
along polygon
boundaries form cycles
• Polygons are objects
that can access their
boundaries
Subdivisions: topological
structure (doubly-connected edge list)
• Edges are split into
directed half-edges
• Half-edges have
pointers to
– Twin half-edge
– Origin vertex
– Next and Prev half-edges
of incident polygon
– Incident polygon
• Polygons have pointers
to half-edges, one in
each bounding cycle
[Figure: a half-edge with its Origin vertex, its Twin, Prev and Next
half-edges, and the polygons on either side]
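The pointer structure listed above can be sketched with three small record types. The class and field names, and the helper `build_triangle` that wires up a single triangle inside an unbounded outer polygon, are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Vertex:
    x: float
    y: float


@dataclass(eq=False)
class Face:
    name: str
    cycles: List["HalfEdge"] = field(default_factory=list)  # one half-edge per bounding cycle


@dataclass(eq=False)
class HalfEdge:
    origin: Vertex
    twin: Optional["HalfEdge"] = None
    next: Optional["HalfEdge"] = None
    prev: Optional["HalfEdge"] = None
    face: Optional[Face] = None  # incident polygon


def boundary(start: HalfEdge) -> List[HalfEdge]:
    """Follow Next pointers from start until the cycle closes."""
    edges, e = [], start
    while True:
        edges.append(e)
        e = e.next
        if e is start:
            return edges


def adjacent_faces(face: Face) -> List[str]:
    """Names of the polygons on the other side of the boundary half-edges."""
    names: List[str] = []
    for start in face.cycles:
        for e in boundary(start):
            if e.twin.face.name not in names:
                names.append(e.twin.face.name)
    return names


def build_triangle(pts):
    """Build the structure for one counter-clockwise triangle and its outer polygon."""
    inner_face, outer_face = Face("inner"), Face("outer")
    verts = [Vertex(x, y) for x, y in pts]
    n = len(verts)
    inner = [HalfEdge(origin=verts[i], face=inner_face) for i in range(n)]
    # the twin of inner[i] runs from verts[i + 1] back to verts[i]
    outer = [HalfEdge(origin=verts[(i + 1) % n], face=outer_face) for i in range(n)]
    for i in range(n):
        inner[i].twin, outer[i].twin = outer[i], inner[i]
        inner[i].next, inner[i].prev = inner[(i + 1) % n], inner[(i - 1) % n]
        # the outer cycle is traversed in the opposite direction
        outer[i].next, outer[i].prev = outer[(i - 1) % n], outer[(i + 1) % n]
    inner_face.cycles.append(inner[0])
    outer_face.cycles.append(outer[0])
    return inner_face, outer_face


inner_face, outer_face = build_triangle([(0, 0), (1, 0), (0, 1)])
print(adjacent_faces(inner_face))
```

Walking a boundary and finding the neighbouring polygons both need only pointer following, with no coordinate comparisons.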
Subdivisions: topological chain
structure (doubly-connected chain list)
• Splitting nodes are
objects with coordinates
• Chains are connections
between splitting nodes
and contain zero or more
nodes with coordinates
• Sequences of chains along
polygon boundaries form
cycles
• Polygons are objects that
can access their
boundaries
[Figure: chains split into half-chains]
Vector structures

                 Memory   Duplication   Polygon retrieve   Topology retrieve
Spaghetti          ++          +               --                  -
Polygon ring       -           --              ++                  -
DC edge list       --          ++              -                   +
DC chain list      ++          ++              +                   ++
Raster-vector conversion
E.g. for data integration
• Vector-to-raster: Like in computer graphics:
scan-conversion of lines, etc.
• Raster-to-vector: Consider pixel sides
between pixels with different values as
boundary and put in vector representation
→ Thinning, line simplification
Thinning
Raster-vector
conversion
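The first step of raster-to-vector conversion, collecting the pixel sides between pixels with different values, might look like the sketch below. It assumes 4-adjacency and that cell (row, col) covers the square [col, col+1] x [row, row+1]; the function name is illustrative.

```python
def boundary_sides(grid):
    """Return the pixel sides that separate adjacent cells with different values.

    Each side is a pair of grid-corner points ((x1, y1), (x2, y2)).
    """
    sides = []
    n_rows, n_cols = len(grid), len(grid[0])
    for r in range(n_rows):
        for c in range(n_cols):
            # vertical side between this cell and its right-hand neighbour
            if c + 1 < n_cols and grid[r][c] != grid[r][c + 1]:
                sides.append(((c + 1, r), (c + 1, r + 1)))
            # horizontal side between this cell and the cell in the next row
            if r + 1 < n_rows and grid[r][c] != grid[r + 1][c]:
                sides.append(((c, r + 1), (c + 1, r + 1)))
    return sides


grid = [
    ["a", "a", "b"],
    ["a", "b", "b"],
]
sides = boundary_sides(grid)
print(sides)
```

The sides still have to be linked into chains, and the resulting staircase-shaped boundary is what line simplification then cleans up.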
Thinning
Line simplification
• Douglas-Peucker algorithm from 1973
• Input: chain p1, …, pn and error ε
[Figure: chain from p1 to pn]
DP-algorithm
• Draw the line segment between the first and last point
• If all points in between are within distance ε of the segment: done
• Otherwise, determine the farthest point and recurse on the part
up to the farthest point and on the part after the farthest point
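The steps above can be sketched in Python as follows. The function names and the sample chain are assumptions; distance is measured from a point to the line segment between the two end points.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # parameter of the projection of p onto the line, clamped to the segment
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def dp_standard(points, i, j, eps):
    """Return the indices of the points kept between i and j (inclusive)."""
    if j <= i + 1:
        return [i, j]
    # farthest point from the segment between points[i] and points[j]
    k = max(range(i + 1, j),
            key=lambda m: point_segment_distance(points[m], points[i], points[j]))
    if point_segment_distance(points[k], points[i], points[j]) > eps:
        left = dp_standard(points, i, k, eps)
        right = dp_standard(points, k, j, eps)
        return left[:-1] + right  # index k appears in both halves; keep it once
    return [i, j]


def dp_simplify(points, eps):
    """Simplify a chain given as a list of (x, y) pairs."""
    if len(points) < 3:
        return list(points)
    return [points[m] for m in dp_standard(points, 0, len(points) - 1, eps)]


chain = [(0, 0), (1, 0), (2, 3), (3, 0), (4, 0)]
print(dp_simplify(chain, 1.0))
```

A larger error value removes more points: with this chain, an error of 5 leaves only the first and last point.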
DP-algorithm
DP-standard(i, j, ε)
  Determine farthest point pk between pi and pj
  If distance(pk, pi pj) > ε
    then Return the concatenation of the simplifications
         DP-standard(i, k, ε) and DP-standard(k, j, ε)
    else Return the segment pi pj
Properties of the DP-algorithm
• DP-algorithm does not minimize the number of
points in the simplification
[Figure: output of the DP-algorithm compared with an optimal simplification]
Properties of the DP-algorithm
• Determining farthest point takes O(n) time
• Whole algorithm takes
T(n) = T(m) + T(n-m+1) + O(n),
T(2) = O(1) time,
splitting in m and n-m+1 points
• “Fair” split gives O(n log n) time
• Worst case gives quadratic time
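The two extremes of this recurrence can be worked out as below. This is a sketch under the assumption that the O(n) term is written as c n for a constant c > 0, and that a "fair" split means m is about n/2.

```latex
% Assumption: the O(n) term is c n for some constant c > 0.
\begin{align*}
\text{Fair split } (m \approx n/2):\quad
  T(n) &\approx 2\,T(n/2) + c\,n
  \;\Rightarrow\; T(n) = O(n \log n) \\
\text{Worst case } (m = 2):\quad
  T(n) &= T(2) + T(n-1) + c\,n \\
       &= O(1) + \sum_{i=3}^{n} \bigl(c\,i + O(1)\bigr) = O(n^2)
\end{align*}
```

The worst case occurs when the farthest point is always next to an end point of the chain, so each level of recursion removes only one point.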
Properties of the DP-algorithm
• DP-algorithm may give self-intersections in the
output
Solution: test output for self-intersections
and continue adding control points if necessary
Improved DP-algorithm
DP-improved(i, j, ε)
  Simp = DP-standard(i, j, ε)
  V = set of intersecting segments of Simp
  Repeat
    For all segments s ∈ V: Refine(s) in Simp;
      do 1 refinement à la DP by adding the
      farthest point, giving a new Simp
    V = set of intersecting segments of Simp
  Until V is empty
Continuous data representation
Digital Elevation Model (DEM)
• Data on interval or ratio measurement scale
• Data values at nearby points usually differ
little
• Representation is necessarily an approximation:
finite representation of information with infinite
detail
• One raster representation (grid) and two vector
representations (contour lines, TIN)
Elevation models
[Figure: three elevation models with sample elevation values]
Raster: (elevation) grid
Vector: contour line model
Vector: triangulation (TIN; triangulated irregular network)
Grid elevation model
TIN elevation model
Elevation models
• Contour model well-suited for visualisation,
not for representation or storage
• Interpretations of a grid:
- elevation holds for the whole cell: not a continuous model
- elevation holds at the middle of the cell: interpolation
needed; how?
• Advantage grid: simple storage, operations
simple too
• Advantage TIN: more efficient in storage,
adaptive
Interpolation for grid
[Figure: grid points with elevations 18, 20 and 22; one square with
corner elevations 20, 18, 18, 22]
• Linear interpolation; saddle point problem
• Linear interpolation with an additional point:
(20+18+18+22)/4 = 19.5
• Non-linear interpolation
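The saddle point problem can be made concrete with the corner elevations 20, 18, 18, 22 from the slide. In the sketch below, the assignment of those values to particular corners is an assumption, and bilinear interpolation is used as one possible non-linear scheme; the slides do not say which one is meant.

```python
def centre_estimates(z_sw, z_se, z_nw, z_ne):
    """Elevation estimates at the centre of a square from its four corner values."""
    return {
        # split the square along the SW-NE diagonal: the centre lies on that diagonal
        "diagonal_sw_ne": (z_sw + z_ne) / 2,
        # split along the NW-SE diagonal instead
        "diagonal_nw_se": (z_nw + z_se) / 2,
        # additional point: the centre gets the mean of the four corners
        "mean_of_corners": (z_sw + z_se + z_nw + z_ne) / 4,
    }


def bilinear(z_sw, z_se, z_nw, z_ne, u, v):
    """Bilinear interpolation at relative position (u, v), with 0 <= u, v <= 1."""
    bottom = z_sw + u * (z_se - z_sw)   # interpolate along the southern side
    top = z_nw + u * (z_ne - z_nw)      # interpolate along the northern side
    return bottom + v * (top - bottom)  # interpolate between the two sides


# Assumed corner assignment: 20 and 22 on one diagonal, 18 and 18 on the other.
est = centre_estimates(z_sw=20, z_se=18, z_nw=18, z_ne=22)
print(est)
print(bilinear(20, 18, 18, 22, 0.5, 0.5))
```

The two choices of diagonal give different centre elevations (21 versus 18), which is the ambiguity that the additional point with the mean value 19.5 resolves.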
Topological TIN structure
• With explicit vertex and triangle representation
[Figure: triangle t with vertices u, v, w and adjacent triangles t1, t2, t3]
Vertices store x, y-coordinates and elevation
Because t1 has pointers to two of the same vertices as t,
we can determine their shared edge, even though it is
not represented explicitly
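A minimal sketch of this vertex and triangle representation, showing how the shared edge is recovered from the vertex pointers alone. The class names, the extra vertex `x`, all coordinates and the choice of which edge t1 shares with t are hypothetical; the `neighbours` field assumes that each triangle also refers to its adjacent triangles, as the figure labels t1, t2, t3 suggest.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Vertex:
    name: str
    x: float
    y: float
    z: float  # elevation


@dataclass(eq=False)
class Triangle:
    name: str
    vertices: Tuple[Vertex, Vertex, Vertex]
    neighbours: List[Optional["Triangle"]] = field(default_factory=list)


def shared_edge(a: Triangle, b: Triangle):
    """Return the two vertices that a and b have in common, or None."""
    common = [v for v in a.vertices if any(v is w for w in b.vertices)]
    return tuple(common) if len(common) == 2 else None


# Hypothetical coordinates and elevations.
u = Vertex("u", 0.0, 0.0, 10.0)
v = Vertex("v", 4.0, 0.0, 15.0)
w = Vertex("w", 2.0, 3.0, 25.0)
x = Vertex("x", 6.0, 3.0, 10.0)

t = Triangle("t", (u, v, w))
t1 = Triangle("t1", (w, v, x))
t.neighbours.append(t1)
t1.neighbours.append(t)

edge = shared_edge(t, t1)
print([p.name for p in edge])
```

Edges cost no storage in this variant, at the price of a small search each time an edge is needed.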
Topological TIN structure
• Alternatively, edges have an explicit representation
too
[Figure: triangle t with vertices u, v, w, edges e1, e2, e3 and
adjacent triangles t1, t2, t3]
Summary representation
• Objects have geometry and attributes; at least
the attributes are stored in a database
• Geometry can be stored in raster or vector
form; each has advantages and disadvantages
• Important geometric types of representations
are those for subdivisions and for elevation
models
• For subdivisions, the doubly-connected chain list
is the most suitable structure
• For elevation models, grids or TINs are most
useful