
In the name of Allah
Massive Data Algorithmics
An Introduction
Overview
• MADALGO
• SCALGO
• Basic Concepts
• The TerraFlow Project
• STREAM
• The TerraStream Project
• TPIE

MADALGO- Introduction
• MADALGO: Center for MAssive Data ALGOrithmics
• A major basic research center funded by the Danish National Research Foundation
• Covers all areas of the design, analysis and implementation of algorithms and data structures for processing massive data

MADALGO- Four core research
areas

I/O-efficient algorithms
◦ Algorithms designed in a two-level external memory (or I/O-) model
◦ The memory hierarchy consists of a main memory of limited size M and an external memory (disk) of unlimited size
◦ The goal is to minimize the number of I/O-operations (or simply I/Os), where one I/O reads or writes a block of B consecutive elements from or to disk
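
To make the cost measure concrete, the sketch below counts I/Os for two basic tasks in this model: scanning N elements and sorting them with a multiway merge sort. The function names and the exact pass-counting scheme are illustrative assumptions, not taken from the slides. The sort count is a simple model in which every pass reads and writes each block once.

```python
def scan_ios(n, b):
    """I/Os needed to read n consecutive elements in blocks of b elements."""
    return -(-n // b)  # ceiling division


def merge_sort_ios(n, m, b):
    """Rough I/O count for an external multiway merge sort.

    Assumed model: one pass forms sorted runs of m elements each, then
    every merge pass combines up to (m // b - 1) runs, keeping one block
    per input run plus one output block in main memory.
    """
    if m < 3 * b:
        raise ValueError("need room for at least two input blocks and one output block")
    blocks = scan_ios(n, b)
    runs = -(-n // m)        # sorted runs after run formation
    fan_in = m // b - 1      # runs merged together in one pass
    passes = 1               # the run-formation pass
    while runs > 1:
        runs = -(-runs // fan_in)
        passes += 1
    return 2 * blocks * passes  # each pass reads and writes every block
```

For example, with N = 1000, M = 100 and B = 10, a scan costs 100 I/Os, while the modelled sort needs three passes over the data.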

cache-oblivious algorithms
◦ Algorithms designed in the I/O-model, but without knowledge of M and B, and then analyzed as I/O-model algorithms
◦ The analysis then holds simultaneously on all levels of any multi-level memory hierarchy
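
A classic illustration of the idea is recursive matrix transposition: the algorithm always splits the longer dimension in half and never refers to M or B. The sketch below is a minimal version written for this overview (the function name and argument order are assumptions). Python lists of lists are not laid out contiguously in memory, so it only shows the recursion structure and will not show the cache behaviour.

```python
def transpose(src, dst, r0, r1, c0, c1):
    """Write the transpose of src[r0:r1][c0:c1] into dst (dst[c][r] = src[r][c])."""
    rows, cols = r1 - r0, c1 - c0
    if rows <= 0 or cols <= 0:
        return
    if rows == 1 and cols == 1:
        dst[c0][r0] = src[r0][c0]
    elif rows >= cols:
        # Split the longer dimension; no block or memory size is used anywhere.
        mid = r0 + rows // 2
        transpose(src, dst, r0, mid, c0, c1)
        transpose(src, dst, mid, r1, c0, c1)
    else:
        mid = c0 + cols // 2
        transpose(src, dst, r0, r1, c0, mid)
        transpose(src, dst, r0, r1, mid, c1)
```

Usage: allocate `dst` with swapped dimensions and call `transpose(src, dst, 0, len(src), 0, len(src[0]))`.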

streaming algorithms
◦ Only one sequential pass (or a small constant number of passes) over the data is allowed
◦ Solve a given problem using significantly less space than the input data size
◦ Process each data element as fast as possible
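
As one example of these constraints, the sketch below uses the Misra-Gries frequent-items algorithm, chosen here purely for illustration and not mentioned in the slides. It makes a single pass, keeps at most k - 1 counters regardless of the stream length, and every element occurring more than n/k times in a stream of length n is guaranteed to survive among the returned candidates.

```python
def misra_gries(stream, k):
    """One-pass frequent-item candidates using at most k - 1 counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

The returned counts are lower bounds on the true frequencies; a second pass over the data would be needed to obtain exact counts for the candidates.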

algorithm engineering
◦ The design and analysis of practical algorithms
◦ Efficient implementation of these algorithms
◦ Experimentation that provides insight into their applicability and further improvements
SCALGO
• SCALGO: SCALable alGOrithmics
• Founded in 2009 in Aarhus, Denmark
• Mission: to bring cutting-edge massive terrain data-processing technology to market

Terrain

Terrain: The vertical and horizontal dimensions of the land surface
LIDAR
• LIDAR: Light Detection And Ranging
• An optical remote sensing technology
• Measures the distance to, or other properties of, a target by illuminating the target with light
• Often uses pulses from a laser

Point cloud
• A set of vertices in a three-dimensional coordinate system
• Usually defined by X, Y, and Z coordinates
• Typically intended to be representative of the external surface of an object

DEM
• DEM: Digital elevation model
• A digital model or 3D representation of a terrain's surface
◦ The two most used types of DEM are the regular grid and the triangulated irregular network (TIN)
Regular grid DEM

a matrix of equally spaced points with
each point having x, y and z coordinate
values
Regular grid DEM- Quadtree
• A tree data structure in which each internal node has exactly four children
• Most often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions
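
The recursive subdivision can be sketched as a small point quadtree. Everything below (the class name, the `capacity` and `max_depth` parameters, the square-cell layout) is an illustrative assumption, not the structure used by any of the projects in these slides. A leaf splits into four quadrants once it holds more than `capacity` points; `max_depth` stops the recursion when many points coincide.

```python
class QuadTree:
    """Square region [x0, x0 + size) x [y0, y0 + size) holding 2D points."""

    def __init__(self, x0, y0, size, capacity=1, depth=0, max_depth=16):
        self.x0, self.y0, self.size = x0, y0, size
        self.capacity, self.depth, self.max_depth = capacity, depth, max_depth
        self.points = []
        self.children = None  # None for a leaf, else a list of four subtrees

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            self._split()

    def _split(self):
        half = self.size / 2
        self.children = [
            QuadTree(self.x0 + dx * half, self.y0 + dy * half, half,
                     self.capacity, self.depth + 1, self.max_depth)
            for dy in (0, 1) for dx in (0, 1)
        ]
        points, self.points = self.points, []
        for p in points:
            self._child_for(p).insert(p)

    def _child_for(self, p):
        half = self.size / 2
        ix = 1 if p[0] >= self.x0 + half else 0
        iy = 1 if p[1] >= self.y0 + half else 0
        return self.children[2 * iy + ix]

    def leaves(self):
        if self.children is None:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]
```

Every internal node has exactly four children, matching the definition above; dense areas of the input end up covered by smaller leaves than sparse areas.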

Triangulated Irregular Network
(TIN)
• Irregularly distributed nodes and lines with three-dimensional coordinates
• Arranged in a network of non-overlapping triangles

TIN- Delaunay triangulation
• A triangulation for a set of points such that no point is inside the circumcircle of any triangle
• Maximizes the minimum angle of all the angles of the triangles in the triangulation
• Tends to avoid skinny triangles
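
The empty-circumcircle condition is usually checked with a determinant test. The sketch below is the standard in-circle predicate on 2D points; the function name is an assumption, and plain floating-point arithmetic is used, so results for nearly co-circular points are not robust.

```python
def in_circumcircle(a, b, c, p):
    """Return True if p lies strictly inside the circumcircle of triangle abc."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - by * cx)
           - (bx * bx + by * by) * (ax * cy - ay * cx)
           + (cx * cx + cy * cy) * (ax * by - ay * bx))
    # The sign of det depends on the orientation of abc, so normalise by it.
    orient = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return det * orient > 0
```

A triangulation is Delaunay exactly when this test is False for every triangle and every other input point.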

The TerraFlow Project
• Emerged from experience with terrain analysis applications that do not scale up to large datasets
• A software package for computing flow routing and flow accumulation on massive grid-based terrains
• Based on theoretically optimal algorithms designed using external memory paradigms

Flow direction, flow routing and
flow accumulation

The flow directions of a cell correspond to the
directions in which water would flow if poured at
that cell onto the terrain
◦ water cannot go uphill

The flow routing problem: the problem of assigning
flow directions to all cells in the DEM such that
1. flow directions do not induce any cycles;
2. every cell has a flow path off the edge of the terrain

The flow accumulation of a terrain is an index
which estimates the surface runoff for each cell in
the terrain
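
The two steps can be sketched on a small in-memory grid. This is a minimal illustration, not TerraFlow's I/O-efficient algorithm. It assumes single-direction steepest-descent routing over the eight neighbouring cells, one unit of initial flow per cell, and a terrain without flat areas or interior sinks (otherwise condition 2 above would fail for some cells). All function names are hypothetical.

```python
import math


def flow_directions(h):
    """Map each cell (r, c) to its steepest strictly lower neighbour, or None."""
    rows, cols = len(h), len(h[0])
    direction = {}
    for r in range(rows):
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                        slope = (h[r][c] - h[nr][nc]) / math.hypot(dr, dc)
                        if slope > best_slope:  # water cannot go uphill
                            best, best_slope = (nr, nc), slope
            direction[(r, c)] = best
    return direction


def flow_accumulation(h, direction):
    """Push one unit of flow per cell downstream, visiting cells from high to low."""
    acc = {cell: 1 for cell in direction}
    by_height = sorted(direction, key=lambda rc: h[rc[0]][rc[1]], reverse=True)
    for cell in by_height:
        target = direction[cell]
        if target is not None:
            acc[target] += acc[cell]
    return acc
```

Because flow only moves to strictly lower cells, the directions cannot form a cycle, and processing cells in decreasing height order ensures every cell has received all upstream flow before it passes its own total on.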
STREAM- Introduction
• STREAM: Scalable Techniques for hi-Resolution Elevation data Analysis and Modeling
• Located in the CS department at Duke University
• Funded by the U.S. Army Research Office

STREAM- Projects

Constructing DEM
◦ Developed two methods for efficiently converting LIDAR point sets to more conventional formats:
  • Grid Construction: uses a quad-tree segmentation
  • TIN Construction: uses a Delaunay triangulation algorithm

Terrain Flow Modeling
◦ improvements to existing work done as part
of the TerraFlow project

Noise Removal
◦ There is some level of noise in DEMs derived from LIDAR
◦ The method computes a persistence score for topological features
◦ It uses this persistence score to remove small topological features that are likely the result of noise

Hierarchical Watershed Decomposition
◦ partitions a terrain into a hierarchy of nested
watersheds

Topographic Change
◦ Detecting topographic change can quickly
identify beach dunes damaged by hurricanes,
monitor urban development or measure
change in forest growth
TerraStream- Introduction

• A series of libraries and front-ends for these libraries
• Allows the user to perform a series of computational tasks on very large digital elevation models
• The data is represented either as a TIN or a grid
• A collaboration between Duke University CS researchers and researchers at MADALGO
TerraStream- Features

DEM Construction
◦ Computes a digital elevation model (DEM)
from a point cloud
◦ The input data is typically gathered using
LIDAR
◦ Constructs both TINs and grids

DEM Topological Conditioning
◦ Simplifies digital elevation models by first
identifying and then removing insignificant
geographical features
◦ Significance is measured by the feature's height, area, volume, or any combination of these
◦ A feature is insignificant if its significance is smaller than a threshold specified by the user

Flow Routing
◦ Computes flow directions for each data point in a DEM
◦ The routing models supported are
  • steepest-flow-descent
  • multiple-flow-directions
  • flux decomposition

Flow Accumulation
◦ Accumulates amounts of, e.g., water on a DEM along flow paths as computed by the flow routing module

Flood Simulation
◦ Flood Mask
  • Computes a mask of the cells that are flooded if the water level were raised 'x' units
◦ General
  • Transforms a DEM to a new DEM
  • The height of each cell in the produced DEM is the minimum height that the water level needs to be raised to in order for that particular cell to flood
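
One way to read the "General" transformation is as a priority-flood computation. The sketch below rests on assumptions the slides do not state: water enters from the edge of the grid, it spreads between 4-connected cells, and 'x' is measured from height 0. It is not TerraStream's implementation, and the function names are hypothetical.

```python
import heapq


def flood_levels(h):
    """For each cell, the lowest water level at which it floods from the grid edge."""
    rows, cols = len(h), len(h[0])
    level = [[None] * cols for _ in range(rows)]
    heap = []
    # Assumption: every boundary cell touches the water source.
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                level[r][c] = h[r][c]
                heapq.heappush(heap, (h[r][c], r, c))
    while heap:
        lv, r, c = heapq.heappop(heap)
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and level[nr][nc] is None:
                # Water must clear both the path so far and the cell itself.
                level[nr][nc] = max(h[nr][nc], lv)
                heapq.heappush(heap, (level[nr][nc], nr, nc))
    return level


def flood_mask(levels, x):
    """Cells that are under water when the level is raised to x."""
    return [[lv <= x for lv in row] for row in levels]
```

Under these assumptions the flood mask is simply a threshold on the transformed DEM, which is why the general variant is the more reusable output.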

Contour Map Computation
◦ Computes the contour map of a terrain

Raster Quality Assessment
◦ Takes a raster and a point cloud
◦ Computes how far the center of each raster cell is from the closest point in the point cloud
◦ This makes it easy to spot areas of the grid where there are no points nearby
◦ If the point cloud is the same one used to generate the input raster, this can be used for quality control of the point cloud, the classification algorithm used and the produced raster
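
The distance computation itself can be sketched by brute force. The raster layout (square cells, row-major, lower-left `origin`), the use of horizontal distance only, and the function name are all assumptions made for this example. A scalable implementation would need a spatial index rather than comparing every cell with every point.

```python
import math


def nearest_point_distance(rows, cols, cell_size, points, origin=(0.0, 0.0)):
    """Horizontal distance from each cell center to the closest (x, y, z) point."""
    x0, y0 = origin
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            cx = x0 + (c + 0.5) * cell_size  # cell center
            cy = y0 + (r + 0.5) * cell_size
            row.append(min(math.hypot(cx - x, cy - y) for x, y, _z in points))
        result.append(row)
    return result
```

Large values in the output mark cells whose elevation was interpolated from distant measurements.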

Watershed Hierarchy Construction
◦ Constructs a Pfafstetter labeling of the watersheds of a DEM

LS-Factor Computation
◦ LS-factor: an aggregate of the slope length factor (L) and the slope steepness factor (S)
◦ Estimates the effects of slope length and steepness on erosion

Format Flexibility
◦ reading and writing mosaic grids in many
common formats
TPIE- Introduction





• TPIE: The Templated Portable I/O Environment
• A tool-box providing efficient and convenient tools to ease the implementation of algorithms and data structures on very large sets of data
• The algorithms and data structures that form the core of TPIE all provide efficient worst-case space, time and disk usage guarantees
• On Windows, TPIE is known to work with the Microsoft Visual Studio 2008 and 2010 compilers
TPIE- Example

Internal sorting

Reading and writing file streams

External sorting
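
The idea behind external sorting can be sketched independently of TPIE. The Python code below is not TPIE code and does not use its API. It is a minimal two-phase merge sort, assuming integer input, that writes sorted runs of at most `run_size` elements to temporary files and then merges them with `heapq.merge`.

```python
import heapq
import tempfile


def _read_run(f):
    """Yield the integers stored one per line in an already sorted run file."""
    for line in f:
        yield int(line)


def external_sort(values, run_size):
    """Yield the integers in `values` in sorted order, holding one run in memory."""
    run_files, run = [], []

    def flush():
        run.sort()
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines(f"{v}\n" for v in run)
        f.seek(0)
        run_files.append(f)
        run.clear()

    # Phase 1: form sorted runs that each fit in memory.
    for v in values:
        run.append(v)
        if len(run) == run_size:
            flush()
    if run:
        flush()
    # Phase 2: merge all runs in a single sequential pass.
    try:
        yield from heapq.merge(*(_read_run(f) for f in run_files))
    finally:
        for f in run_files:
            f.close()
```

For example, `list(external_sort([5, 3, 9, 1, 4, 8, 2], 3))` forms three runs and merges them into one sorted sequence.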

Priority queue
TPIE- I/O parameters

M and B

get_block_size() implementation

Elements’ block size
◦ Pass the block factor to the constructor
The End

Thank you for your time