(D-03) Paul Amalaman - Department of Computer Science

Modeling Evolution in Spatial Datasets
Paul Amalaman
2/17/2012
Data Mining and Machine Learning Lab Team Members
Dr Eick Christoph
Nouhad Rizk
Zechun Cao
Sujing Wang
Anirup Dutta
Swati Goyal
Tarikul Islam
Paul Amalaman
I- Background
II- Research Goals
III- Case Study
IV- Summary
I-Background
Machine Learning Techniques are mostly used where
• modeling implicit trends is possible (Regression)
• stable patterns exist in dataset (Classification)
Simulation Systems are used when
• a model is hard to establish
• there is a great degree of randomness in the attribute values
• there are a lot of interactions between objects
• attributes have to be predicted recursively over many steps
Example Applications of Simulation Systems:
Traffic Modeling, Weather Forecasting, Social Networks, Urban Modeling
I-Background continued(2)
Spatial Simulation Systems (ABM)
• Cellular Automata (CA): cell-centered approach
• Continuous Agent Space or Multi-Agent System (MAS): agent-centered approach
I-Background continued(3)
Modeling with Cellular Automata
• Concept of neighborhood
• Moore Neighborhood
• Von Neumann neighborhood
D(x-1,y-1)  D(x,y-1)  D(x+1,y-1)
D(x-1,y)    P(x,y)    D(x+1,y)
D(x-1,y+1)  D(x,y+1)  D(x+1,y+1)
Moore Neighborhood
            D(x,y-1)
D(x-1,y)    P(x,y)    D(x+1,y)
            D(x,y+1)
Von Neumann Neighborhood
http://en.wikipedia.org/wiki/Von_Neumann_neighborhood
http://en.wikipedia.org/wiki/Moore_neighborhood
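The two neighborhoods can be written down as coordinate offsets around a cell P(x,y). The following is a minimal sketch, not taken from the slides: the function names are hypothetical, and grid boundaries are ignored, so a caller would still have to filter out cells that fall outside the grid.

```python
def moore_neighborhood(x, y):
    """Return the 8 cells surrounding (x, y), listed row by row."""
    return [(x + dx, y + dy)
            for dy in (-1, 0, 1)
            for dx in (-1, 0, 1)
            if (dx, dy) != (0, 0)]


def von_neumann_neighborhood(x, y):
    """Return the 4 cells that share an edge with (x, y)."""
    return [(x, y - 1), (x - 1, y), (x + 1, y), (x, y + 1)]
```

Every Von Neumann neighbor is also a Moore neighbor; the Moore neighborhood adds the four diagonal cells.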
I-Background continued(4)
Modeling with Cellular Automata
Cellular Automata
• provide the programmer with a cell-centered programming style in which the cells represent regularly organized computing units
• run efficiently on parallel architectures
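As an illustration of the cell-centered style, the sketch below applies one synchronous update to a small grid. It is an assumption-laden example rather than the method from the slides: `ca_step` and `mean_rule` are hypothetical names, the averaging rule is chosen only for simplicity, and cells outside the grid are simply skipped.

```python
def ca_step(grid, rule):
    """Apply `rule` to every cell of the old grid and return a new grid."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            # Collect the in-bounds Moore neighbors of cell (x, y).
            neighbors = [grid[y + dy][x + dx]
                         for dy in (-1, 0, 1)
                         for dx in (-1, 0, 1)
                         if (dx, dy) != (0, 0)
                         and 0 <= y + dy < rows
                         and 0 <= x + dx < cols]
            new_grid[y][x] = rule(grid[y][x], neighbors)
    return new_grid


def mean_rule(value, neighbors):
    """Hypothetical rule: average of the cell and its neighbors."""
    return (value + sum(neighbors)) / (1 + len(neighbors))


grid = [[0.0, 0.0, 0.0],
        [0.0, 9.0, 0.0],
        [0.0, 0.0, 0.0]]
next_grid = ca_step(grid, mean_rule)
```

Each new cell value depends only on the old grid, so the per-cell updates are independent of one another; that independence is what makes the approach a natural fit for parallel hardware.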
II-Research Goals
Using Data Mining and Machine Learning Techniques to Enhance Simulation Systems
New approach = Machine Learning Techniques + Spatial Simulation Systems
Goal1: Grid-based Models for Progression in Spatial Datasets
Goal2: Development of Cluster-based Bias Removal Methods
II-Research Goals continued (1)
Goal1: Grid-based Models for Progression in Spatial Datasets
[Figure: grid state at time t and the unknown grid state (?) at time t+1]
$$y_{i,j,t+1} = f_{ij}(x_{1,1,1,t}, \ldots, x_{1,n,n,t}, \ldots, x_{m,1,1,t}, \ldots, x_{m,n,n,t}, y_{1,1,t}, \ldots, y_{n,n,t})$$
Known at t: X1(t), X2(t), ..., Xn(t), Y(t)
To predict: X1(t+Δt) = ?, X2(t+Δt) = ?, ..., Xn(t+Δt) = ?, Y(t+Δt) = ?
Given that at t we know all the attribute values, including the output variable Y, can we predict all attribute values at t+1?
Challenges:
1. Many target variables to predict; different variables have to be predicted at different locations
2. Target variables are not independent of each other (e.g. some are auto-correlated)
3. Models have to be used over multiple steps
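The third challenge can be made concrete with a small sketch of recursive forecasting: the prediction for t+1 becomes the input for t+2, so every attribute (not only Y) has to be predicted at every step. The names `rollout` and `toy_step_model` and the toy dynamics below are hypothetical and serve only to show the loop structure; they are not the models proposed in the slides.

```python
def rollout(state, step_model, n_steps):
    """Apply a one-step model recursively, feeding each output back in."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step_model(state)
        trajectory.append(state)
    return trajectory


def toy_step_model(state):
    """Hypothetical dynamics: X halves each step, new Y copies the old X."""
    return {
        "x": [[0.5 * v for v in row] for row in state["x"]],
        "y": [[v for v in row] for row in state["x"]],
    }


# A 1 x 2 grid with one input attribute X and the output variable Y.
initial = {"x": [[8.0, 4.0]], "y": [[0.0, 0.0]]}
trajectory = rollout(initial, toy_step_model, n_steps=3)
```

Because each step consumes the previous prediction, any error made at one step is carried into all later steps, which is why models used this way need particular care.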
II-Research Goals continued (2)
Goal2: Development of Cluster-based Bias Removal Methods
[Diagram: Input x → Model → Output + bias b(x)]
EPA prediction models are meteorological and chemical transport models derived from solving differential equations. Over time, the bias of these models grows larger.
http://www.epa.gov/AMD/CMAQ/ch06.pdf
[Diagram: Input x → Weather pattern recognition → group(x); Input x → Model → Output with bias b(x); Correction (bias removal) h(b(x), group(x)) → Output]
Bias removal based on weather pattern recognition
Our model h learns group(x) and b(x) and makes better predictions
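One simple way to picture group-based bias removal is to estimate an average bias per group from past forecasts and subtract it from new ones. The sketch below makes that assumption explicitly; the slides do not specify the form of h, and the function names, group labels, and numbers are all hypothetical toy values.

```python
from collections import defaultdict


def fit_group_bias(groups, predictions, observations):
    """Estimate the mean bias (prediction - observation) for each group."""
    errors = defaultdict(list)
    for g, p, o in zip(groups, predictions, observations):
        errors[g].append(p - o)
    return {g: sum(e) / len(e) for g, e in errors.items()}


def remove_bias(group, prediction, group_bias):
    """Subtract the group's estimated bias; unseen groups stay unchanged."""
    return prediction - group_bias.get(group, 0.0)


# Toy history: the model over-predicts in one group, under-predicts in another.
groups = ["sunny", "sunny", "windy", "windy"]
predictions = [60.0, 70.0, 40.0, 50.0]
observations = [50.0, 60.0, 45.0, 55.0]

group_bias = fit_group_bias(groups, predictions, observations)
corrected = remove_bias("sunny", 65.0, group_bias)
```

In this sketch group(x) is given as a label; in the approach described above it would come from weather pattern recognition, for example by clustering the inputs.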
III-Case Study
Improving Ozone Forecasting for the Houston-Galveston Area
Goal1: Development of a Grid-based Prediction Framework
Goal2: Development of Cluster-based Bias Removal Methods
In Collaboration with
UH-IMAQS Institute for Multidimensional Air Quality Studies
(UH Department of Earth and Atmospheric Science)
-Dr Rappenglueck, Bernhard
-Dr Li, Xiangshang
III-Case Study Continued(1)
Ozone Prediction
Goal 1: Improving Prediction for Spatial Progression
Given what happened at t, can we predict what happens at t+Δ, t+2Δ, ...?
III-Case Study Continued(2)
Ozone Prediction
Goal 2: Improving Forecast Accuracy
III-Case Study Continued(3)
Status of Dissertation
• Methods to collect ozone data and to store it in a relational database have been developed.
• The necessary knowledge of simulation-based prediction systems in general, and of ozone prediction in particular, has been obtained.
• Work has started on different modeling approaches for grid-based prediction.
IV-SUMMARY
Thank you!