Data Sources, Data Input and Data Quality

Babu Ram Dawadi
Major data feeds to GIS
 GIS systems employ a wide range of data sources.
 Most GIS projects formerly had to rely almost exclusively upon
data available only in printed or "paper" form.
 Much of the data available for use is still published on
paper, but a great deal of information is now
distributed in digital formats.
Private Suppliers (Commercial Data)
 Commercial mapmaking firms are among the largest
providers, but other firms have for years supplied detailed
demographic and economic information, such as data on
retail trade and marketing trends.
 Some of this information can be quite expensive to
purchase
 Copyright and licensing restrictions also play a role
 Many software vendors earn a substantial income by
repackaging and selling data in the proprietary forms used
by their software products.
Questions regarding data sources
 Where did it come from?
 In what medium was it originally produced?
 What is the area coverage of the data?
 To what map scale was the data digitized?
 What projection, coordinate system, and datum were used in
the maps?
 What was the density of observations used for its compilation?
 How accurate are positional and attribute features?
 Does the data seem logical and consistent?
 Do cartographic representations look "clean"?
 Is the data relevant to the project at hand?
 In what format is the data kept?
 How was the data checked?
 Why was the data compiled?
 What is the reliability of the provider?
Source: Map
 Marks on paper that stand for definable things on
the earth's surface.
 A representation usually on a flat surface, of the whole
or a part of an area
 Any concrete or abstract image of the distributions and
features that occur on or near the surface of the earth
or other bodies
 Map Resolution: Refers to how accurately the
location and shape of the map features can be
depicted for a given map scale
Maps gain value in three ways
 As a way of recording and storing information:
Governments, businesses, and society must store large
quantities of information about the environment and the
location of natural resources, capital assets, and people.
 As a means of analyzing distributions and spatial
patterns: Maps let us recognize spatial distributions and
relationships and make it possible for us to visualize and
hence conceptualize patterns and processes that operate
through space.
 As a method of presenting information and
communicating findings: Maps allow us to convey
information and findings that are difficult to express
verbally.
Virtual Maps vs. Real Maps
 Real map: A hard copy or conventional map.
 Virtual map: Information that can be converted into a
real map, i.e. information on a computer screen,
mental images, field information, notes, and remote
sensing information.
 Map Features
 Point
 Line
 Area
Elements of Map
 Scale: The extent of the reduction necessary to put a
portion of the earth's surface on a sheet of paper.
 Numeric or ratio scales: 1:24,000 and 1/24,000 are
equivalent; both mean that one inch on the map represents
24,000 inches on the ground.
 Verbal: 1 inch = 100 feet.
 Graphic or Bar: Rake scale or some other graphical
representation
 Direction
 Explanation (Legend)
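The scale forms above are simple unit arithmetic. A minimal sketch, assuming hypothetical helper names (`ground_distance`, `verbal_to_ratio`) that are not part of the slides:

```python
def ground_distance(map_distance, scale_denominator):
    """Ground distance for a ratio scale 1:scale_denominator.

    The result is in the same unit as map_distance
    (inches in, inches out).
    """
    return map_distance * scale_denominator


def verbal_to_ratio(map_inches, ground_feet):
    """Convert a verbal scale ("1 inch = 100 feet") to a ratio denominator."""
    inches_per_foot = 12
    return ground_feet * inches_per_foot / map_inches


# 1 inch on a 1:24,000 map, expressed in feet on the ground
feet_on_ground = ground_distance(1, 24_000) / 12
# the verbal scale "1 inch = 100 feet" as a ratio denominator
ratio = verbal_to_ratio(1, 100)
```

Under these assumptions, the verbal scale "1 inch = 100 feet" corresponds to the ratio scale 1:1,200.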
Global Positioning System
 GPS provides specially coded satellite signals that can
be processed in a GPS receiver, enabling the receiver to
compute position, velocity and time.
 GPS is funded and controlled by the U.S.
Department of Defense (DOD). While there are many
thousands of civil users of GPS world-wide, the system
was designed for and is operated by the U.S. military.
 Four GPS satellite signals are used to compute
positions in three dimensions and the time offset in
the receiver clock.
GPS: Space Segment
 The Space Segment of the system consists of the GPS
satellites. These space vehicles (SVs) send radio signals
from space
 The nominal GPS Operational Constellation consists
of 24 satellites that orbit the earth in 12 hours
 There are often more than 24 operational satellites as
new ones are launched to replace older satellites
 The orbit altitude is such that the satellites repeat the
same track and configuration over any point
approximately each 24 hours (4 minutes earlier each
day)
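The "4 minutes earlier each day" figure can be checked with a short calculation. This sketch assumes the satellite configuration repeats once per sidereal day (one rotation of the earth relative to the stars) and uses an assumed standard value for its length; neither detail is stated on the slide.

```python
SOLAR_DAY_S = 24 * 3600          # civil (solar) day in seconds
SIDEREAL_DAY_S = 86_164.0905     # assumed length of the sidereal day in seconds


def daily_shift_minutes():
    """Minutes by which the same satellite geometry arrives earlier each day."""
    return (SOLAR_DAY_S - SIDEREAL_DAY_S) / 60


shift = daily_shift_minutes()    # a little under four minutes
```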
Space Segment
 The nominal GPS Operational Constellation consists of 24
satellites that orbit the earth in 12 hours. There are often
more than 24 operational satellites as new ones are launched
to replace older satellites. The satellite orbits repeat almost
the same ground track (as the earth turns beneath them) once
each day. The orbit altitude is such that the satellites repeat
the same track and configuration over any point approximately
each 24 hours (4 minutes earlier each day). There are six
orbital planes, with nominally four SVs (Satellite Vehicles) in
each, equally spaced (60 degrees apart), and inclined at about
fifty-five degrees with respect to the equatorial plane. This
constellation provides the user with between five and eight SVs
visible from any point on the earth.
Space Segment Contd..
 There are six orbital planes (with nominally four SVs in
each), equally spaced (60 degrees apart), and inclined
at about fifty-five degrees with respect to the
equatorial plane.
 This constellation provides the user with between five
and eight SVs visible from any point on the earth
Control Segment
 The Control Segment consists of a system of tracking
stations located around the world.
User Segment
 The GPS User Segment consists of the GPS receivers
and the user community
 GPS receivers convert SV signals into position, velocity,
and time estimates
 Four satellites are required to compute the four
dimensions of X, Y, Z (position) and Time
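Why four satellites are needed can be illustrated by solving for the four unknowns directly. The sketch below is an illustration only, not a receiver algorithm from the slides: it assumes each measurement is a pseudorange (geometric range plus a common receiver clock offset expressed in metres) and solves for X, Y, Z and that offset by Gauss-Newton iteration. The function names and the starting guess at the earth's centre are assumptions.

```python
import math


def solve_linear(a, b):
    """Solve a small square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def solve_position(sats, pseudoranges, iterations=10):
    """Estimate [X, Y, Z, clock_offset_m] from four satellite pseudoranges.

    sats: four (x, y, z) satellite positions in metres.
    pseudoranges: four measured ranges in metres, each including the
    same unknown receiver clock offset (in metres).
    """
    est = [0.0, 0.0, 0.0, 0.0]           # start at the earth's centre
    for _ in range(iterations):
        jac, res = [], []
        for (sx, sy, sz), rho in zip(sats, pseudoranges):
            dx, dy, dz = est[0] - sx, est[1] - sy, est[2] - sz
            dist = math.sqrt(dx * dx + dy * dy + dz * dz)
            # partial derivatives of (range + offset) w.r.t. X, Y, Z, offset
            jac.append([dx / dist, dy / dist, dz / dist, 1.0])
            res.append(rho - (dist + est[3]))
        delta = solve_linear(jac, res)
        est = [e + d for e, d in zip(est, delta)]
    return est
```

With only three satellites the system would have three equations for four unknowns, which is why the fourth signal is needed to resolve the receiver clock offset.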
GPS Data
 The GPS Navigation Message consists of time-tagged data
bits marking the time of transmission of each subframe at
the time they are transmitted by the SV.
 A data bit frame consists of 1500 bits divided into five
300-bit subframes. A data frame is transmitted every thirty
seconds.
 Three six-second subframes contain orbital and clock data.
SV clock corrections are sent in subframe one and precise
SV orbital data sets for the transmitting SV are sent in
subframes two and three.
 Subframes four and five are used to transmit different
pages of system data.
 An entire set of twenty-five frames (125 subframes) makes
up the complete Navigation Message that is sent over a 12.5
minute period.
 Data bit subframes (300 bits transmitted over six
seconds) contain parity bits that allow for data
checking and limited error correction.
 Clock data parameters describe the SV clock and its
relationship to GPS time.
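The frame figures quoted above are mutually consistent, as a few lines of arithmetic show. The constant names below are illustrative; the numbers are the ones given on the slides.

```python
FRAME_BITS = 1500            # bits per data frame
SUBFRAMES_PER_FRAME = 5
FRAME_SECONDS = 30           # one frame is transmitted every 30 seconds
FRAMES_PER_MESSAGE = 25      # frames in the complete Navigation Message

subframe_bits = FRAME_BITS // SUBFRAMES_PER_FRAME         # bits per subframe
subframe_seconds = FRAME_SECONDS / SUBFRAMES_PER_FRAME    # seconds per subframe
bit_rate = FRAME_BITS / FRAME_SECONDS                     # bits per second
message_subframes = FRAMES_PER_MESSAGE * SUBFRAMES_PER_FRAME
message_minutes = FRAMES_PER_MESSAGE * FRAME_SECONDS / 60
```

The implied data rate of 50 bits per second is derived here, not quoted from the slides.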
[Figure: ranges R1 to R4 from the satellites, including Satellite 1 (X1, Y1, Z1), Satellite 2 (X2, Y2, Z2) and Satellite 3 (X3, Y3, Z3), to the GPS receiving station (Xr, Yr, Zr); additional label: Time]
Interruptions to the Satellite
 Several factors can affect a satellite's performance in
relaying data to the receivers, such as:
 Ionosphere and Troposphere Delays
 Signal Multipath
 Receiver Clock Errors
 Orbital Errors
 Number of Satellites Visible
 Satellite Geometry/Shading
 Intentional Degradation of the Satellite Signal
http://www8.garmin.com/aboutGPS/
Interruptions Continued…
 Ionosphere and Troposphere:
 The signal slows down as it passes through the
atmosphere
 Signal Multipath:
 The signal is reflected by objects in the
surrounding area (such as rocky mountains or
buildings)
 Receiver Clock:
 The clocks on the receiver and the satellite do not
match
 Number of Visible Satellites:
 The more the better
 Satellite Geometry/Shading:
 Satellites need to be well spaced; a line or tight
group gives poor results
 Intentional Degradation of Satellite Signal:
 Intended for military purposes, but affects the
civilian population's use of GPS
Other Global Navigation Satellite Systems (GNSS)
• GLONASS
– Russian Federation
– (24) Satellites
• Galileo
– European Union
– (27+3) Satellites
• Compass (BeiDou)
– China
– (27 MEO + 3 IGSO + 5 GEO) Satellites
• Regional Constellations
– Indian Regional Navigation Satellite
System (IRNSS) (7)
– Quasi-Zenith Satellite System (QZSS) (Japan) (4)
Satellite Navigation Orbits Comparison
Remote Sensing
 The term remote sensing was coined by
geographers in the Office of Naval Research of the
United States in the 1960s to refer to the
acquisition of information about an object without
physical contact
 Remote sensing is the science and art of acquiring
information (spectral, spatial, temporal) about
material objects, areas, or phenomena without
coming into physical contact with the objects,
areas, or phenomena under investigation.
 Electromagnetic waves are radiated through space.
When the energy encounters an object, even a very
tiny one like a molecule of air, one of three reactions
occurs
 The radiation will either be reflected off the object,
absorbed by the object, or transmitted through the object
 In remote sensing, information transfer is
accomplished by use of electromagnetic radiation
(EMR).
 EMR is a form of energy that reveals its presence by
the observable effects it produces when it strikes
matter.
 The total amount of radiation that strikes an object is
referred to as the incident radiation, and is equal to:
 Reflected radiation + absorbed radiation +
transmitted radiation
 In remote sensing, we are largely concerned with
REFLECTED RADIATION
 This is the radiation that causes our eyes to see colors,
causes infrared film to record vegetation, and allows
radar images of the earth to be created.
Types of Remote Sensing
 With respect to the type of energy source:
Passive Remote Sensing: Makes use of sensors that detect the
reflected or emitted electromagnetic radiation from natural sources.
Active Remote Sensing: Makes use of sensors that detect reflected
responses from objects that are irradiated from artificially generated
energy sources, such as radar.
 With respect to wavelength regions:
Remote sensing is classified into three types with respect to the
wavelength regions
o Visible and Reflective Infrared Remote Sensing.
o Thermal Infrared Remote Sensing.
o Microwave Remote Sensing.
Passive Remote Sensing and Active Remote Sensing
[Figure: elements of the remote sensing process. A. the Sun: energy source; C. target; D. sensor: receiving and/or energy source; E. transmission, reception, and pre-processing; F. processing, interpretation and analysis; G. analysis and application]
Global Geostationary Satellites
[Figure: coverage of the global geostationary satellites: N. & S. America; Eastern Pacific; Europe and Africa; C. Asia, Indian Ocean; China, Indian Ocean; Japan, Australia, W. Pacific. Earth radius 6,370 km; satellite altitude 35,800 km]
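The quoted altitude of 35,800 km is the height at which one orbit takes about one rotation of the earth. A rough check using Kepler's third law for a circular orbit; the gravitational parameter `MU_KM3_S2` is an assumed standard value that does not appear on the slide.

```python
import math

MU_KM3_S2 = 398_600.4418   # assumed earth gravitational parameter, km^3/s^2


def orbital_period_s(earth_radius_km, altitude_km):
    """Period of a circular orbit at the given altitude, in seconds."""
    r = earth_radius_km + altitude_km          # orbital radius from earth's centre
    return 2 * math.pi * math.sqrt(r ** 3 / MU_KM3_S2)


period_hours = orbital_period_s(6370, 35_800) / 3600   # close to one day
```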
Energy Interactions
 The proportions of energy reflected, absorbed, and
transmitted will vary for different earth features,
depending upon their material type and
conditions.
 These differences permit us to distinguish different
features on an image.
 Even within a given feature type, the proportion of
reflected, absorbed, and transmitted energy will
vary at different wavelengths.
Spatial data input
 Direct spatial data acquisition
 ground based field surveys
 remote sensors in satellites or airplanes
 In practice, it is not always feasible to obtain spatial
data using these techniques. Factors of cost and
available time may be a hindrance
 Digitizing paper maps
 On-tablet
 On-screen
The vectorization process
 Vectorization is the process that attempts to distill
points, lines and polygons from a scanned image.
 As scanned lines may be several pixels wide, they are
often first thinned, to retain only the centerline.
 This thinning process is also known as skeletonizing,
as it removes all pixels that make the line wider than
just one pixel
 Semi-automatic vectorization proceeds by placing the
mouse pointer at the start of a line to be vectorized
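The thinning (skeletonizing) step described above can be sketched in a few lines. The slides do not name an algorithm; the sketch below uses the well-known Zhang-Suen thinning scheme as a stand-in, on a binary image stored as a list of lists whose outer border is assumed to be all zeros.

```python
def neighbours(img, r, c):
    """Eight neighbours P2..P9, clockwise starting from the pixel above."""
    return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
            img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]


def transitions(n):
    """Number of 0 -> 1 transitions in the circular sequence P2..P9, P2."""
    seq = n + n[:1]
    return sum(1 for a, b in zip(seq, seq[1:]) if a == 0 and b == 1)


def thin(image):
    """Return a thinned copy of a binary image (1 = line pixel, 0 = background)."""
    img = [row[:] for row in image]
    rows, cols = len(img), len(img[0])
    changed = True
    while changed:
        changed = False
        for step in (0, 1):                      # two sub-iterations per pass
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    n = neighbours(img, r, c)
                    p2, p3, p4, p5, p6, p7, p8, p9 = n
                    if not (2 <= sum(n) <= 6 and transitions(n) == 1):
                        continue
                    if step == 0:
                        removable = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        removable = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if removable:
                        to_clear.append((r, c))
            for r, c in to_clear:                # remove all marked pixels together
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img
```

A scanned line three pixels wide is reduced to a one-pixel centerline, which a vectorization routine can then trace.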
[Figure: a scanned image and the vectorized data obtained after processing]
Spatial Referencing
 Geographic referencing, which is sometimes simply
called georeferencing, is defined as the representation
of the location of real-world features within the spatial
framework of a particular coordinate system
 The objective of georeferencing is to provide a rigid
spatial framework by which the positions of real-world
features are measured, computed, recorded, and
analyzed
Spatial reference system and frames
 The geometry and motion of objects in 3D Euclidean space
are described in a reference coordinate system
 A reference coordinate system is a coordinate system with
a well-defined origin and orientation of the three
orthogonal coordinate axes
 We shall refer to such a system as a spatial reference system
(SRS)
 Several spatial reference systems are used in the earth
sciences. The most important one for the GIS community is
the International Terrestrial Reference System (ITRS)
 The ITRS has its origin in the center of mass of the earth
[Figure: (a) the ITRS and (b) the ITRF visualized as the fundamental polyhedron]
ITRS
 The ITRS is realized through the International Terrestrial
Reference Frame (ITRF), a catalogue of estimated
coordinates (and velocities) at a particular epoch (era)
 The ITRF coordinates can be thought of as defining the
vertices of a fundamental polyhedron whose corners are
specific, identifiable points
 Maintenance of the spatial reference frame means relating
the rotated, translated and deformed polyhedron at a later
epoch to the fundamental polyhedron
 Frame maintenance is necessary because of geophysical
processes that deform the earth’s crust at measurable
global, regional and local scales.
Spatial reference surfaces and datum
 ITRF is sufficient for describing the geometry and
behavior in time of objects of interest near and on the
earth surface in terms of a uniform triad of geocentric,
Cartesian X, Y, Z coordinates and velocities
 Why, then, do we also need to introduce spatial
reference surfaces?
 Splitting the description of 3D location into 2D
(horizontal) and 1D (height) has a long tradition in
the earth sciences.
SRS & Datum…
 We humans are essentially inhabitants of 2D space
 In the first instance, we have sought intuitively to describe
our environment in two dimensions.
 Hence we need a simple 2D curved reference surface
upon which the complex 3D earth topography can be
projected for easier 2D horizontal referencing and
computations
Datum
 A datum is a set of parameters defining a coordinate
system, and a set of control points whose geometric
relationships are known, either through measurement
or calculation (Dewhurst, 1990).
 A datum is defined by a spheroid, which approximates
the shape of the Earth, and the spheroid’s position
relative to the center of the Earth. There are many
spheroids representing the shape of the Earth, and
many more datums based upon them.
The geoid and the vertical datum
 To describe heights, we need an imaginary surface of
zero height
 A surface where water does not flow, a level surface, is
a good candidate
 Each level surface is a surface of constant height
 However, there are infinitely many level surfaces.
Which one should we choose as the height reference
surface?
 The most obvious choice is the level surface that most
closely approximates all the earth’s oceans
 We call this surface the geoid
 Every point on the geoid has the same zero height all
over the world
 This makes it an ideal global reference surface for
heights
 Historically, the geoid has been realized only locally,
not globally
 For the Netherlands and Germany, the local mean sea
level is realized through the Amsterdam tide-gauge
(zero height).
 Obviously, there are several realizations of local mean
sea levels, also called local vertical datums, in the
world. They are parallel to the geoid but offset by up to
a couple of meters
The ellipsoid and the horizontal datum
 Earth has been found to be slightly flattened at the
poles, and the physical shape of the real earth is closely
approximated by the mathematical surface of the
rotational ellipsoid. The ellipsoid is widely used as the
reference surface for horizontal coordinates (latitude &
longitude)
[Figure: the geoid, an ellipsoid globally best fitting to the geoid, and an ellipsoid regionally best fitting to the geoid, with its region of best fit]
Ellipsoid…
 The mathematical shape that is simple enough and
most closely approximates the local mean sea level is
the surface of an ellipsoid
 An ellipsoid with specific dimensions (a and b, half
the lengths of the major and minor axes respectively) is
chosen which best fits the local mean sea level
 Then the ellipsoid is positioned and oriented with
respect to the local mean sea level by adopting a
latitude (φ), longitude (λ) and height (h) of a so-called
fundamental point
 We say that a local horizontal datum is defined by:
 Dimensions (a, b) of the ellipsoid
 The adopted geographic coordinates φ, λ and h of
the fundamental point, and
 Azimuth from this point to another
 Different ellipsoids with varying position and
orientation had to be adopted to best fit the local
mean sea level in different countries or regions
 An example is the Potsdam datum, the local
horizontal datum used in Germany. The fundamental
point is in Rauenberg and the underlying ellipsoid is
the Bessel ellipsoid (a = 6,377,397.156 m, b = 6,356,079.175 m).
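The two dimensions a and b fix the shape of the ellipsoid, which is usually summarised by its flattening. A minimal sketch with hypothetical function names; it takes the Bessel semi-major axis as 6,377,397.156 m (an assumed, commonly quoted value) and the semi-minor axis as quoted above.

```python
def flattening(a, b):
    """Flattening f = (a - b) / a of an ellipsoid with semi-axes a and b."""
    return (a - b) / a


def eccentricity_squared(a, b):
    """First eccentricity squared, e^2 = (a^2 - b^2) / a^2."""
    return (a * a - b * b) / (a * a)


BESSEL_A = 6_377_397.156   # semi-major axis in metres (assumed value)
BESSEL_B = 6_356_079.175   # semi-minor axis in metres, as quoted above

f = flattening(BESSEL_A, BESSEL_B)
inverse_f = 1 / f          # roughly 1/299: the earth is only slightly flattened
```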
Datum Transformation
 Satellite positioning and navigation technology, now
widely used around the world for spatial referencing,
implies a global geocentric datum
 Global and regional data sets nowadays almost
always refer to a global geocentric datum and are useful to
individual nations only if they can be reconciled with
the local datum
 Mapping organizations not only coach the user
community about the implications of the geocentric
datum; they also develop tools to enable users to
transform coordinates of spatial objects from the new
datum to the old one
Datum…
 This process is known as datum transformations. The
tools are called datum transformation parameters
 The good news is that a transformation from datum A
to datum B is a mathematically straightforward
process
 Essentially, it is a transformation between two
orthogonal Cartesian spatial reference frames together
with some elementary tools from adjustment theory
Datum
 To transform coordinates from one datum to another we must
know the relationship between the chosen ellipsoids in terms
of position and orientation. The relationship is defined
by 7 constants
 3 - Offsets of the ellipsoid center from the center of the
earth along X, Y, and Z
 3 - Rotations around the X, Y, and Z axes of the Cartesian
coordinate system (three rotation angles)
 1 - Scale change (S) of the survey control network
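The seven constants can be applied to a geocentric X, Y, Z coordinate as follows. This is a sketch under stated assumptions, not the formula from the slides: it uses the small-angle form of the rotation matrix and the "position vector" sign convention for rotations, and the function name and parameter order are hypothetical. Other sign conventions exist, so published parameters must be checked against the convention they were derived for.

```python
def helmert_transform(point, translation, rotation_rad, scale_ppm):
    """Apply a 7-parameter similarity transformation to a geocentric point.

    point:        (X, Y, Z) in metres in the source datum
    translation:  (tx, ty, tz) in metres
    rotation_rad: (rx, ry, rz) small rotations in radians about X, Y, Z
    scale_ppm:    scale change in parts per million
    """
    x, y, z = point
    tx, ty, tz = translation
    rx, ry, rz = rotation_rad
    s = 1.0 + scale_ppm * 1e-6

    # small-angle rotation (position vector convention, an assumption)
    xr = x - rz * y + ry * z
    yr = rz * x + y - rx * z
    zr = -ry * x + rx * y + z

    return (tx + s * xr, ty + s * yr, tz + s * zr)
```

With all seven constants set to zero the point is returned unchanged; a pure translation simply shifts the ellipsoid center.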
[Figure: the three kinds of change between datums: movement of points along an axis (X, Y, Z), movement of points around an axis, and changing the distance between points]
Map Projections
 A map projection is an attempt to portray the surface of the
earth or a portion of the earth on a flat surface; the manner
in which the spherical surface of the earth is represented
on a two-dimensional surface
 All projections distort some properties of the map
(conformality, distance, direction, scale, or area). Choose a
projection that will minimize distortion in your area and be
best suited for your application.
 Conformality: When the scale of a map at any point on
the map is the same in any direction, the projection is
conformal. Meridians (lines of longitude) and parallels
(lines of latitude) intersect at right angles. Shape is
preserved locally on conformal maps
Map Projections..
 Distance: A map is equidistant when it correctly portrays
distances from the center of the projection to any other
place on the map.
 Direction: A map preserves direction when azimuths
(angles from a point on a line to another point) are
portrayed correctly in all directions
 Scale: Scale is the relationship between a distance
portrayed on a map and the same distance on the
Earth
 Area: When a map portrays areas over the entire map
so that all mapped areas have the same proportional
relationship to the areas on the Earth that they
represent, the map is an equal-area map
Classification of map projections
 Map projections fall into three general classes:
 Cylindrical
 Conical
 Planar or Azimuthal
 Cylindrical projection: a cylinder is assumed to circumscribe a
transparent globe (marked with meridians and
parallels) so that the cylinder touches the equator
throughout its circumference
 Assuming that a light bulb is placed at the center of
the globe, the graticule of the globe is projected on to
the cylinder
Cylindrical
 By cutting open the cylinder along a meridian and
unfolding it, a rectangle-shaped cylindrical projection
is obtained
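The light-bulb construction described above corresponds to what is usually called the central cylindrical projection: a point at latitude φ is projected onto the cylinder at height R·tan φ, and unrolling the cylinder turns longitude into a horizontal distance R·λ. A minimal sketch on a sphere of radius R; the function name is hypothetical.

```python
import math


def central_cylindrical(lat_deg, lon_deg, radius=1.0):
    """Project latitude/longitude (degrees) onto the unrolled cylinder.

    Returns (x, y) in the same unit as radius. The poles cannot be
    projected because the light ray runs parallel to the cylinder.
    """
    if abs(lat_deg) >= 90:
        raise ValueError("the poles cannot be shown on this projection")
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * lon             # distance along the unrolled equator
    y = radius * math.tan(lat)   # height where the light ray meets the cylinder
    return x, y
```

Because y grows with tan φ, high latitudes are stretched strongly, which illustrates why every projection distorts some property of the map.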
Conical
 Conical Projection: a cone is placed over the globe in
such a way that the apex of the cone is exactly over the
polar axis
Planar or Azimuthal
 Planar or Azimuthal Projection: a plane is placed
so that it touches the globe at the North or South Pole.
 The resulting projection is better known as the polar
azimuthal projection
 It is circular in shape with meridians projected as
straight lines radiating from the center of the circle,
which is the pole
Data precision, error and repair
 Precision refers to the level of measurement and exactness
of description in a GIS database. Precise location data may
measure position to a fraction of a unit.
 The level of precision required for particular applications
varies greatly. Engineering projects such as road and utility
construction require very precise information measured to
the millimeter or tenth of an inch.
 Highly precise data can be very difficult and costly to
collect. Carefully surveyed locations needed by utility
companies to record the locations of pumps, wires, pipes
and transformers cost $5-20 per point to collect
Precision, error and accuracy
 Acquired data sets must be checked for consistency
and completeness. This requirement applies to the
geometric and topological quality as well as the
semantic quality of the data
 There are different approaches to clean up data. Errors
can be identified automatically, after which manual
editing methods can be applied to correct the errors
[Table: cleanup operations, each illustrated with a before-cleanup and an after-cleanup sketch]
 Erase duplicates or sliver lines
 Erase dangling objects or overshoots
Multiple data sources
 A GIS project usually involves multiple data sets, so the
next step addresses the issue of how these multiple
sets relate to each other
 There are three fundamental cases to be considered if
we compare data sets pairwise:
 They may be about the same area, but differ in accuracy,
 They may be about the same area, but differ in choice of
representation, and
 They may be about adjacent areas, and have to be
merged into a single data set
Differences in accuracy
 Images come at a certain resolution, and paper maps at a
certain scale. This typically results in differences of
resolution between acquired data sets
 Due to scale differences in the sources, the resulting
polygons do not perfectly coincide, and polygon
boundaries cross each other
[Figure: the integration of two vector data sets may lead to slivers]
Differences in representation
 There exist more advanced GIS applications that
require the possibility of representing the same
geographic phenomenon in different ways
[Figure: an object stored separately at scale i, scale j and scale k, compared with a single object with multiple representations]
Multi-scale and multi-representation systems compared; the main
difference is that multi-representation systems have a built-in
understanding that different representations belong together.
Data Transformation
 Format change: raster-to-vector and vector-to-raster
conversion, either within the same GIS system or between
different systems
 Loss of detail: especially at feature edges; vector data
generally represents a feature more accurately
 Loss of attribute data: some raster formats do not
allow for multiple attributes per cell
 Some systems use only one format exclusively and
provide utilities or import options to bring in the data
and convert it to the needed format.
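The loss of detail at feature edges can be seen in a small vector-to-raster conversion. The sketch below is illustrative only: it marks a cell as covered when its center lies inside the polygon (one common rule among several), using a standard ray-casting test. The function names and the sample triangle are assumptions.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies strictly inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                      # edge crosses the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, cols, rows, cell_size):
    """Return a rows x cols grid with 1 where the cell center is inside."""
    return [[1 if point_in_polygon((c + 0.5) * cell_size,
                                   (r + 0.5) * cell_size, polygon) else 0
             for c in range(cols)]
            for r in range(rows)]


triangle = [(0, 0), (4, 0), (0, 4)]           # vector area = 8 square units
grid = rasterize(triangle, cols=4, rows=4, cell_size=1)
raster_area = sum(map(sum, grid))             # area represented by the raster
```

The raster area differs from the vector area because the sloping edge is replaced by a staircase of whole cells; a smaller cell size reduces the difference at the cost of more cells.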