CH4-Mining-Spatial

Overview of Mining Spatial Data
7/13/2017
Mining Spatial Data

Mining spatial databases and data warehouses

Spatial DBMS

Spatial Data Warehousing

Spatial Data Mining

Spatiotemporal Data Mining
Generalizing Spatial

Spatial data:
- Generalize detailed geographic points into clustered regions, such as business, residential, industrial, or agricultural areas, according to land usage
- Requires the merging of a set of geographic areas by spatial operations
What Is a Spatial Database System?

Geometric, geographic or spatial data: space-related data

Example: geographic space (2-D abstraction of the earth's surface), VLSI design, a model of the human brain, 3-D space representing the arrangement of chains of protein molecules.

Spatial database system vs. image database systems.

Image database system: handling digital raster images (e.g., satellite sensing, computer tomography); may also contain techniques for object analysis and extraction from images and some spatial database functionality.

Spatial (geometric, geographic) database system: handling
objects in space that have identity and well-defined extents,
locations, and relationships.
GIS (Geographic Information System)

GIS: analysis and visualization of geographic data

Common analysis functions of GIS

Search (thematic search, search by region)

Location analysis (buffer, corridor, overlay)

Terrain analysis (slope/aspect, drainage network)

Flow analysis (connectivity, shortest path)

Distribution (nearest neighbor, proximity, change detection)

Spatial analysis/statistics (pattern, centrality, similarity, topology)

Measurements (distance, perimeter, shape, adjacency, direction)
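
A few of the analysis functions listed above (distance and perimeter measurements, nearest neighbor) can be sketched in plain Python. This is a minimal illustration, assuming planar (x, y) coordinates; the function names and the sample data are hypothetical and do not come from any particular GIS product.

```python
import math

def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def perimeter(polygon):
    """Sum of the edge lengths of a closed polygon given as a vertex list."""
    n = len(polygon)
    return sum(distance(polygon[i], polygon[(i + 1) % n]) for i in range(n))

def nearest_neighbor(query, points):
    """Point in `points` closest to `query` (simple linear scan)."""
    return min(points, key=lambda p: distance(query, p))

# Hypothetical sample data: a rectangular park and three facilities.
park = [(0, 0), (4, 0), (4, 3), (0, 3)]
facilities = [(5, 5), (0, 0), (2, 3)]

park_perimeter = perimeter(park)
closest = nearest_neighbor((1, 1), facilities)
```

A real GIS would use an index rather than a linear scan for nearest-neighbor search, and geodetic rather than planar distances.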
Spatial DBMS (SDBMS)

An SDBMS is a software system that

supports spatial data models, spatial ADTs, and a query language supporting them

supports spatial indexing, efficient spatial operations, and query optimization

can work with an underlying DBMS

Examples

Oracle Spatial Data Cartridge

ESRI Spatial Data Engine
Modeling Spatial Objects

What needs to be represented?

Two important alternative views

Single objects: distinct entities arranged in space, each of which has its own geometric description

e.g., modeling cities, forests, rivers

Spatially related collections of objects: describe space itself (a statement about every point in space)

e.g., modeling land use, the partition of a country into districts
Modeling Single Objects: Point, Line and Region

Point: location only, no extent

Line (or a curve, usually represented by a polyline, i.e., a sequence of line segments):

moving through space, or connections in space (roads, rivers, cables, etc.)

Region:

Something having extent in 2D-space (country, lake,
park). It may have a hole or consist of several disjoint
pieces.
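
The three abstractions above map naturally onto small data types. The sketch below is one possible representation, assuming planar coordinates; the class names (`Point`, `Polyline`, `Polygon`, `Region`) and the sample objects are illustrative assumptions, not the types of any specific spatial DBMS.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Coord = Tuple[float, float]

def ring_area(ring):
    """Unsigned area of a closed ring of vertices (shoelace formula)."""
    n = len(ring)
    twice = sum(ring[i][0] * ring[(i + 1) % n][1]
                - ring[(i + 1) % n][0] * ring[i][1] for i in range(n))
    return abs(twice) / 2.0

@dataclass(frozen=True)
class Point:
    """Location only, no extent."""
    x: float
    y: float

@dataclass
class Polyline:
    """A line represented as a sequence of line segments."""
    vertices: List[Coord]

    def length(self) -> float:
        return sum(math.dist(a, b)
                   for a, b in zip(self.vertices, self.vertices[1:]))

@dataclass
class Polygon:
    """One connected piece of a region; may contain holes."""
    exterior: List[Coord]
    holes: List[List[Coord]] = field(default_factory=list)

    def area(self) -> float:
        return ring_area(self.exterior) - sum(ring_area(h) for h in self.holes)

@dataclass
class Region:
    """A region may consist of several disjoint pieces."""
    parts: List[Polygon]

    def area(self) -> float:
        return sum(p.area() for p in self.parts)

# Hypothetical objects: a city, a river, a lake with an island, a two-part region.
city = Point(3.0, 4.0)
river = Polyline([(0, 0), (3, 4), (3, 10)])
lake = Polygon(exterior=[(0, 0), (10, 0), (10, 10), (0, 10)],
               holes=[[(4, 4), (6, 4), (6, 6), (4, 6)]])
territory = Region([lake, Polygon([(20, 0), (22, 0), (22, 2), (20, 2)])])
```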
Modeling Spatially Related Collection of Objects

Modeling spatially related collections of objects: plane partitions and networks.

A partition: a set of region objects that are required to be disjoint (e.g., a thematic map). Pairs of objects often share a common boundary (adjacency relationship).

A network: a graph embedded in the plane, consisting of a set of point objects forming its nodes and a set of line objects describing the geometry of its edges, e.g., highways, rivers, power supply lines.

Other interesting spatially related collections of objects: nested partitions, or a digital terrain (elevation) model.
Spatial Data Types and Models
Field-based model: raster data
- framework: partitioning of space
Object-based model: vector model
- point, line, polygon
- objects, attributes
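
The difference between the two models can be seen by converting one into the other. The sketch below takes a vector polygon and rasterizes it onto a grid of unit cells, marking a cell when its center lies inside the polygon. The ray-casting test and the cell-center rule are assumptions chosen for brevity; production rasterizers use other coverage rules.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                      # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize(polygon, width, height):
    """Field-based view: 1 where the cell center falls inside the polygon."""
    return [[1 if point_in_polygon(col + 0.5, row + 0.5, polygon) else 0
             for col in range(width)]
            for row in range(height)]

# Object-based view: a hypothetical triangle stored as three vertices.
triangle = [(0, 0), (4, 0), (0, 4)]
grid = rasterize(triangle, 4, 4)
```

The raster copy covers 6 cells although the exact triangle area is 8, which illustrates the resolution loss of the field-based representation.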

Spatial Query Language
Spatial query language
- Spatial data types, e.g. point, line segment, polygon, …
- Spatial operations, e.g. overlap, distance, nearest neighbor, …
- Callable from a query language (e.g. SQL3) of the underlying DBMS
    SELECT S.name
    FROM Senator S
    WHERE S.district.Area() > 300
Standards
- SQL3 (a.k.a. SQL 1999) is a standard for query languages
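
To make the semantics of the query above concrete, the following sketch evaluates the same selection in plain Python. It assumes that `district` is stored as a polygon vertex list and that `Area()` is the planar polygon area; the `Senator` class, the names, and the coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

def polygon_area(vertices):
    """Unsigned polygon area via the shoelace formula."""
    n = len(vertices)
    twice = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                - vertices[(i + 1) % n][0] * vertices[i][1] for i in range(n))
    return abs(twice) / 2.0

@dataclass
class Senator:
    name: str
    district: List[Tuple[float, float]]   # spatial attribute (polygon)

# Hypothetical rows of the Senator table.
senators = [
    Senator("Senator A", [(0, 0), (20, 0), (20, 20), (0, 20)]),   # area 400
    Senator("Senator B", [(0, 0), (10, 0), (10, 10), (0, 10)]),   # area 100
]

# SELECT S.name FROM Senator S WHERE S.district.Area() > 300
names = [s.name for s in senators if polygon_area(s.district) > 300]
```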

File Organization and Indices
SDBMS: the dataset resides in secondary storage, e.g. disk
- Space-filling curves: an ordering on the locations in a multi-dimensional space
- Linearize a multi-dimensional space
- Help search efficiently
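
One common space-filling curve is the Z-order (Morton) curve, which linearizes a 2-D grid by interleaving the bits of the two coordinates. The slide does not name a specific curve, so the choice of Z-order here is an assumption, as are the function name and the 4 x 4 example grid.

```python
def z_order(x, y, bits=16):
    """Interleave the bits of non-negative integers x and y into one key."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return code

# Order the cells of a 4 x 4 grid along the curve.
cells = [(x, y) for y in range(4) for x in range(4)]
ordered = sorted(cells, key=lambda c: z_order(*c))
```

Cells that are close in the plane tend to receive close keys, so a one-dimensional index such as a B-tree on the key can serve many spatial range lookups.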

Spatial Query Optimization
A spatial operation can be processed using
different strategies

Computation cost of each strategy depends on
many parameters


Query optimization is the process of

ordering the operations in a query and

selecting an efficient strategy for each operation

based on the details of a given dataset
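
As an illustration of two strategies for the same spatial operation, the sketch below answers "which objects have a vertex within distance r of q" either by running the exact test on every object, or by first filtering on bounding rectangles and refining only the survivors. The filter-and-refine strategy, the function names, and the road data are assumptions used for illustration; the slide does not prescribe them.

```python
import math

def mbr(vertices):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)

def exact_near(vertices, q, r):
    """Exact (more expensive) test: is any vertex within distance r of q?"""
    return any(math.hypot(x - q[0], y - q[1]) <= r for x, y in vertices)

def scan_strategy(objects, q, r):
    """Strategy 1: run the exact test on every object."""
    hits = [name for name, v in objects.items() if exact_near(v, q, r)]
    return hits, len(objects)                 # (result, number of exact tests)

def filter_refine_strategy(objects, q, r):
    """Strategy 2: cheap rectangle filter first, exact test on survivors."""
    hits, exact_tests = [], 0
    for name, v in objects.items():
        x0, y0, x1, y1 = mbr(v)
        if x0 - r <= q[0] <= x1 + r and y0 - r <= q[1] <= y1 + r:
            exact_tests += 1
            if exact_near(v, q, r):
                hits.append(name)
    return hits, exact_tests

# Hypothetical dataset of three roads.
roads = {
    "road_a": [(0, 0), (1, 1), (2, 0)],
    "road_b": [(10, 10), (12, 11)],
    "road_c": [(50, 50), (55, 52)],
}
scan_hits, scan_tests = scan_strategy(roads, (1, 0), 1.5)
fr_hits, fr_tests = filter_refine_strategy(roads, (1, 0), 1.5)
```

Both strategies return the same answer, but their cost depends on the data: the filter pays off when most objects lie far from the query point, which is the kind of dataset detail an optimizer would estimate.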
Spatial Data Warehousing

Spatial data warehouse: Integrated, subject-oriented, time-variant,
and nonvolatile spatial data repository

Spatial data integration: a big issue

Structure-specific formats (raster- vs. vector-based, OO vs.
relational models, different storage and indexing, etc.)


Vendor-specific formats (ESRI, MapInfo, Intergraph, IDRISI, etc.)

Geo-specific formats (geographic vs. equal area projection, etc.)
Spatial data cube: multidimensional spatial database

Both dimensions and measures may contain spatial components
Dimensions and Measures in Spatial
Data Warehouse

Dimensions

non-spatial
- e.g. “25-30 degrees” generalizes to “hot” (both are strings)

spatial-to-nonspatial
- e.g. Seattle generalizes to the description “Pacific Northwest” (as a string)

spatial-to-spatial
- e.g. Seattle generalizes to Pacific Northwest (as a spatial region)

Measures

numerical (e.g. monthly revenue of a region)
- distributive (e.g. count, sum)
- algebraic (e.g. average)
- holistic (e.g. median, rank)

spatial
- collection of spatial pointers (e.g. pointers to all regions with temperature of 25-30 degrees in July)
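
The three kinds of numerical measures differ in how they can be combined from partial results. The sketch below uses hypothetical revenue figures for two sub-regions: count and sum merge directly, the average is derived from those two, and the median cannot be obtained from per-partition medians.

```python
from statistics import median

# Hypothetical monthly revenues in two sub-regions of one region.
north = [10.0, 20.0, 30.0]
south = [40.0, 50.0]

# Distributive: partial counts and sums combine into the overall result.
partials = [(len(p), sum(p)) for p in (north, south)]
total_count = sum(c for c, _ in partials)
total_sum = sum(s for _, s in partials)

# Algebraic: the average is computed from a fixed number of distributive parts.
average = total_sum / total_count

# Holistic: the median needs the full data; merging sub-medians gives a wrong answer.
true_median = median(north + south)
median_of_medians = median([median(north), median(south)])
```

Here the true median is 30.0 while the median of the two sub-medians is 32.5, which is why holistic measures are expensive to maintain in a data cube.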
Spatial-to-Spatial Generalization


- Generalize detailed geographic points into clustered regions, such as business, residential, industrial, or agricultural areas, according to land usage
- Requires the merging of a set of geographic areas by spatial operations
[Figure labels: Dissolve, Intersect, Merge, Clip, Union]
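
A dissolve-style merge can be sketched on a raster of land-use labels: adjacent cells with the same label are merged into one region. The grid representation, the 4-neighbor adjacency rule, and the labels below are assumptions for illustration; a vector GIS would merge polygon boundaries instead.

```python
from collections import deque

def dissolve(grid):
    """Merge 4-adjacent cells that share a label into (label, cells) regions."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            label, cells = grid[r][c], []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:                      # breadth-first flood fill
                cr, cc = queue.popleft()
                cells.append((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and grid[nr][nc] == label):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            regions.append((label, cells))
    return regions

# Hypothetical land-use map: residential, industrial, agricultural.
land_use = [
    ["res", "res", "ind"],
    ["res", "agr", "ind"],
    ["agr", "agr", "ind"],
]
regions = dissolve(land_use)
summary = [(label, len(cells)) for label, cells in regions]
```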
Example: British Columbia Weather
Pattern Analysis

Input
- A map with about 3,000 weather probes scattered in B.C.
- Daily data for temperature, precipitation, wind velocity, etc.
- Data warehouse using star schema

Output
- A map that reveals patterns: merged (similar) regions

Goals
- Interactive analysis (drill-down, slice, dice, pivot, roll-up)
- Fast response time
- Minimizing storage space used

Challenge
- A merged region may contain hundreds of “primitive” regions (polygons)
Star Schema of the BC Weather Warehouse

Spatial data warehouse
- Dimensions
  - region_name
  - time
  - temperature
  - precipitation
- Measurements
  - region_map
  - area
  - count
[Figure: star schema; labels: Dimension table, Fact table]
Spatial Association Analysis

Spatial association rule: A ⇒ B [s%, c%]



A and B are sets of spatial or non-spatial predicates

Topological relations: intersects, overlaps, disjoint, etc.

Spatial orientations: left_of, west_of, under, etc.

Distance information: close_to, within_distance, etc.
s% is the support and c% is the confidence of the rule
Examples
1) is_a(x, large_town) ^ intersect(x, highway) ⇒ adjacent_to(x, water) [7%, 85%]
2) What kinds of objects are typically located close to golf courses?
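
Support and confidence of such a rule can be computed by counting objects. In the sketch below each object is represented by the set of predicates it satisfies; the predicate strings and the ten sample towns are hypothetical and are not the data behind the [7%, 85%] figures above.

```python
def rule_stats(objects, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent.

    support    = fraction of all objects satisfying antecedent and consequent
    confidence = fraction of antecedent-satisfying objects that also satisfy
                 the consequent
    """
    matches_a = [o for o in objects if antecedent <= o]
    matches_both = [o for o in matches_a if consequent <= o]
    support = len(matches_both) / len(objects)
    confidence = len(matches_both) / len(matches_a) if matches_a else 0.0
    return support, confidence

LARGE, HIGHWAY, WATER = "is_a(large_town)", "intersect(highway)", "adjacent_to(water)"

# Hypothetical sample of ten towns.
towns = (
    [{LARGE, HIGHWAY, WATER}] * 3
    + [{LARGE, HIGHWAY}] * 1
    + [{WATER}] * 2
    + [{LARGE}] * 4
)
support, confidence = rule_stats(towns, {LARGE, HIGHWAY}, {WATER})
```

On this sample the rule holds with support 30% and confidence 75%.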
Spatial Autocorrelation

Spatial data tends to be highly self-correlated

Example: Neighborhood, Temperature

Items in traditional data are assumed to be independent of each other, whereas properties of locations in a map are often “auto-correlated”.

First law of geography:
“Everything is related to everything else, but near things are more related than distant things.”
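
One standard way to quantify spatial autocorrelation is Moran's I, which is positive when neighboring locations have similar values and negative when they alternate. The slide does not name a statistic, so the choice of Moran's I is an assumption, as are the binary neighbor weights and the 1-D transect used as sample data.

```python
def morans_i(values, neighbors):
    """Moran's I with binary weights; neighbors[i] lists the neighbors of i."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(len(nbrs) for nbrs in neighbors.values())
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbors.items() for j in nbrs)
    variance_sum = sum(d * d for d in dev)
    return (n / w_total) * (cross / variance_sum)

def chain_neighbors(n):
    """Neighbors along a 1-D transect: each site touches its left and right site."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

# Hypothetical temperatures along a transect of five sites.
smooth = [1.0, 2.0, 3.0, 4.0, 5.0]        # nearby sites are similar
alternating = [1.0, 5.0, 1.0, 5.0, 1.0]   # nearby sites are dissimilar

i_smooth = morans_i(smooth, chain_neighbors(5))
i_alternating = morans_i(alternating, chain_neighbors(5))
```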
Spatial Trend Analysis

Function

Detect changes and trends along a spatial dimension

Study the trend of non-spatial or spatial data
changing with space

Application examples

Observe the trend of changes of the climate or
vegetation with increasing distance from an ocean

Observe how the crime rate or unemployment rate changes with regard to city geo-distribution
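
A simple way to detect a trend along a spatial dimension is to regress the attribute on distance. The sketch below fits a least-squares line; using linear regression is an assumption (the slide does not fix a method), and the rainfall figures are invented for illustration.

```python
def linear_trend(xs, ys):
    """Least-squares slope and intercept of ys as a function of xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical observations: annual rainfall vs. distance from the ocean.
distance_km = [0, 10, 20, 30, 40]
rainfall_mm = [1200, 1100, 1000, 900, 800]

slope, intercept = linear_trend(distance_km, rainfall_mm)
```

A negative slope (here, a drop of 10 mm per km) indicates that the attribute decreases with increasing distance from the ocean.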
Spatial Cluster Analysis


Mining clusters: k-means, k-medoids, hierarchical, density-based, etc.

Analysis of distinct features of the clusters
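
As a concrete instance of one of the listed methods, here is a minimal k-means sketch. It assumes the initial centers are supplied by the caller and runs a fixed number of iterations; the sample points are hypothetical, and real implementations add random restarts and a convergence test.

```python
import math

def kmeans(points, centers, iterations=10):
    """Alternate nearest-center assignment and center recomputation."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centers[i]          # keep an empty cluster's center
            for i, members in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical store locations forming two groups.
stores = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, clusters = kmeans(stores, centers=[(0, 0), (10, 10)])
```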
Constraints-Based Clustering

Constraints on individual objects
- Simple selection of relevant objects before clustering

Clustering parameters as constraints
- K-means, density-based: radius, min-# of points

Constraints specified on clusters using SQL aggregates
- Sum of the profits in each cluster > $1 million

Constraints imposed by physical obstacles
- Clustering with obstructed distance
Constrained Clustering: Planning ATM
Locations
[Figure: two panels, “Spatial data with obstacles” and “Clustering without taking obstacles into consideration”; labels: River, Mountain, clusters C1, C2, C3, C4]
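
The effect of an obstacle on distance can be sketched as follows. The river is modeled as a single straight segment that can only be crossed at given bridge points; if the straight line between two locations crosses the river, the distance is measured via the best bridge. This simplified model, the function names, and the coordinates are all assumptions for illustration, not the algorithm behind the figure.

```python
import math

def _orient(p, q, r):
    """Sign of the turn p -> q -> r (cross product of pq and pr)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def segments_cross(a, b, c, d):
    """True if segment ab properly intersects segment cd."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)

def obstructed_distance(a, b, river, bridges):
    """Straight-line distance, or the shortest detour over a bridge."""
    if not segments_cross(a, b, river[0], river[1]):
        return math.dist(a, b)
    return min(math.dist(a, br) + math.dist(br, b) for br in bridges)

# Hypothetical layout: a north-south river with one bridge near its north end.
river = ((5, 0), (5, 10))
bridges = [(5, 9)]

same_bank = obstructed_distance((4, 1), (4, 2), river, bridges)
across = obstructed_distance((4, 1), (6, 1), river, bridges)
```

Two customers 2 units apart in a straight line end up about 16 units apart once the river is respected, so a clustering that ignores the obstacle could assign both to the same ATM.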
Mining Spatiotemporal Data


Spatiotemporal data
- Data has spatial extensions and changes with time
- Ex: forest fires, moving objects, hurricanes, and earthquakes
Automatic anomaly detection in massive moving objects
- Moving objects are ubiquitous: GPS, radar, etc.
- Ex: maritime vessel surveillance
- Problem: automatic anomaly detection
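
A very simple form of anomaly detection on a moving object is to flag track segments whose speed is far above the object's typical speed. The rule used below (speed greater than a factor times the median speed), the function names, and the vessel track are assumptions for illustration; practical systems use much richer movement models.

```python
import math
from statistics import median

def segment_speeds(track):
    """Speed between consecutive (t, x, y) position fixes."""
    return [math.dist((x1, y1), (x2, y2)) / (t2 - t1)
            for (t1, x1, y1), (t2, x2, y2) in zip(track, track[1:])]

def speed_anomalies(track, factor=3.0):
    """Indices of segments whose speed exceeds factor * median speed."""
    speeds = segment_speeds(track)
    typical = median(speeds)
    return [i for i, s in enumerate(speeds) if s > factor * typical]

# Hypothetical vessel track: steady movement with one implausible jump.
vessel = [(0, 0, 0), (1, 1, 0), (2, 2, 0), (3, 10, 0), (4, 11, 0), (5, 12, 0)]

speeds = segment_speeds(vessel)
anomalies = speed_anomalies(vessel)
```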