History and Philosophy of Science

Discovering Dynamic Models
Lecture 21
Dynamic Models: Introduction
Dynamic models can describe how variables change over
time or explain variation by appealing to mechanisms.
The informatics systems in this lecture address the
discovery of three types of models.
Gretl: supports the discovery of descriptive,
statistical models of time series.
MECHEM: generates qualitative models that explain the
results of chemical reactions.
Prometheus: creates quantitative models that explain the
dynamics of complex systems.
Each system combines domain knowledge provided by an
expert with heuristic search enacted by a computer.
ARIMA Models
ARIMA models are commonly used in statistical time series
analysis, particularly within the field of econometrics.
ARIMA stands for
Autoregressive: $X_t$ is a linear combination of the prior $p$ values
and a random process $Z_t$.
$X_t = a_1 X_{t-1} + \dots + a_p X_{t-p} + Z_t$
Integrated: a differencing method to remove a trend,
e.g., $X_t$ is replaced with $X_t - X_{t-1}$ for estimation.
Moving Average: $X_t$ is a mean $\mu$ plus a linear combination of the
random process at the current step and the previous $q$ steps.
$X_t = \mu + Z_t + b_1 Z_{t-1} + \dots + b_q Z_{t-q}$
An ARIMA model describes the time series for one variable
and may be used for forecasting.
VARIMA models extend these to support multiple variables.
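The three components can be sketched in a few lines of Python. This is only an illustration of the definitions above, not how any statistics package implements them. The function names are hypothetical, and the Gaussian noise and zero start-up values are assumptions made for the sketch.

```python
import random

def simulate_ar(coeffs, n, seed=0):
    """Simulate X_t = a_1 X_{t-1} + ... + a_p X_{t-p} + Z_t."""
    rng = random.Random(seed)
    p = len(coeffs)
    xs = [0.0] * p  # assumed start-up values before the series begins
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # assumed Gaussian random process Z_t
        # xs[-1] is X_{t-1}, xs[-2] is X_{t-2}, and so on
        xs.append(sum(a * xs[-i] for i, a in enumerate(coeffs, start=1)) + z)
    return xs[p:]

def simulate_ma(mu, coeffs, n, seed=0):
    """Simulate X_t = mu + Z_t + b_1 Z_{t-1} + ... + b_q Z_{t-q}."""
    rng = random.Random(seed)
    zs = [0.0] * len(coeffs)  # shocks before the start are taken to be zero
    xs = []
    for _ in range(n):
        zs.append(rng.gauss(0.0, 1.0))
        lagged = sum(b * zs[-1 - j] for j, b in enumerate(coeffs, start=1))
        xs.append(mu + zs[-1] + lagged)
    return xs

def difference(xs):
    """Replace X_t with X_t - X_{t-1}; the result is one value shorter."""
    return [later - earlier for earlier, later in zip(xs, xs[1:])]

trend = [2.0 * t + 5.0 for t in range(6)]
print(difference(trend))  # [2.0, 2.0, 2.0, 2.0, 2.0]
```

Differencing the linear trend leaves a constant series, which is why the "Integrated" step removes a trend before the AR and MA terms are estimated.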
The Gretl Environment
Gretl is an informatics tool used by econometricians that
supports a wide variety of statistical analyses.
The main window shows basic
information about the data.
Gretl supports several
customizable plots.
The data in this plot
show seasonality and a
general upward trend.
Input for Gretl
Gretl can discover the parameters of
a model, but not its structure.
Scientists tell Gretl
• the variables to be described;
• the type of model to use; and
• the structure of the model.
Gretl supports ARIMA and other
types of time-series models.
The structure of an ARIMA model
reflects the number of terms.
Search in Gretl
Gretl uses maximum likelihood
estimation to fit ARIMA parameters.
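$X_t = a_1 X_{t-1} + \dots + a_p X_{t-p} + Z_t + \mu + b_1 Z_{t-1} + \dots + b_q Z_{t-q}$
where $Z_0 = 0$ and, rearranging for the random term,
$Z_t = X_t - \mu - (a_1 X_{t-1} + \dots + a_p X_{t-p}) - (b_1 Z_{t-1} + \dots + b_q Z_{t-q})$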
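The recursion for the random term can be written out directly. The sketch below rearranges the model equation to recover each Z_t for given parameters and then scores the parameters with a Gaussian log-likelihood. It is not Gretl's estimation routine. The function names are hypothetical, and treating pre-sample values as zero and assuming normally distributed Z_t are assumptions of the sketch.

```python
import math

def arma_residuals(xs, mu, ar, ma):
    """Recover Z_t from the model equation, taking pre-sample X and Z as 0."""
    zs = []
    for t, x in enumerate(xs):
        ar_part = sum(a * xs[t - i] for i, a in enumerate(ar, start=1) if t - i >= 0)
        ma_part = sum(b * zs[t - j] for j, b in enumerate(ma, start=1) if t - j >= 0)
        zs.append(x - mu - ar_part - ma_part)
    return zs

def gaussian_loglik(zs):
    """Log-likelihood under Z_t ~ N(0, s2), with s2 set to its own estimate."""
    n = len(zs)
    s2 = sum(z * z for z in zs) / n
    return -0.5 * n * (math.log(2 * math.pi * s2) + 1)

# A short series built by hand from shocks 1.0, -1.0, 0.5 with b_1 = 0.5 and mu = 0.
xs = [1.0, -0.5, 0.0]
print(arma_residuals(xs, mu=0.0, ar=[], ma=[0.5]))  # [1.0, -1.0, 0.5]
```

A fitting procedure would repeat this calculation for many candidate parameter values and keep those with the highest likelihood.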
This window shows the resulting
parameters and fitness scores.
This graph shows the predictions
from an ARIMA model.
The forecast is in blue, its 95%
confidence interval is in green, and
the observed values are in red.
The MECHEM Environment
MECHEM is an interactive tool that generates plausible
reaction pathways for chemical interactions.
Scientists interact with MECHEM through a graphical
interface as presented in the figure.
Clockwise from the upper left,
the interface shows
• the main menu;
• the current reaction;
• an example mechanism;
• a set of constraints; and
• the output log.
Knowledge in MECHEM
Constraints on the structure of candidate mechanisms:
1. Every conjectured intermediate has at
most two O atoms
2. O atoms cannot be bonded to O atoms
3. Reject Eley-Rideal mechanisms
4. The site M must be present in all steps
5. No multiple occurrences of any reactant
pair on the left-hand side of steps
6. CH4 cannot appear on the left-hand
side of a step
Reaction-specific information, including chemical products:
starting materials: CO, H2
catalyst site: M
dual catalyst sites: MM
observed C1 products: CO2, CH3OH, CH4, H2O
MECHEM also includes a
general set of parameters,
such as “Consider at most 7
conjectured species.”
The purpose of all this knowledge is to rule out a large
number of implausible reaction pathways.
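How such knowledge prunes candidates can be illustrated with a small filter. This sketch is not MECHEM's code. It encodes three of the listed constraints as simple string checks, the function names and the pathway representation are hypothetical, and for brevity the two-oxygen limit is applied to every species in a step, not only to conjectured intermediates.

```python
import re

def oxygen_count(species):
    """Count O atoms in a formula string such as 'CO2' or 'CH3OH'."""
    return sum(int(n or 1) for n in re.findall(r"O(\d*)", species))

def violations(pathway):
    """List the constraints a pathway breaks.

    A pathway is a list of (reactants, products) pairs of formula strings.
    """
    broken = []
    for reactants, products in pathway:
        species = reactants + products
        if any(oxygen_count(s) > 2 for s in species):
            broken.append("more than two O atoms in a species")
        # Read here as: some species in the step contains the site M.
        if not any("M" in s for s in species):
            broken.append("site M absent from a step")
        if "CH4" in reactants:
            broken.append("CH4 on the left-hand side")
    return broken

accepted = [(["H2", "MM"], ["MH", "MH"]),
            (["CO", "MM"], ["M2CO"]),
            (["MH", "M2CO"], ["M2CHOM"])]
rejected = [(["CH4", "MM"], ["MCH3", "MH"])]  # made-up step for illustration

print(violations(accepted))  # []
print(violations(rejected))  # ['CH4 on the left-hand side']
```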
Search in MECHEM
MECHEM uses heuristic search through a space of
symbolic structures to identify reaction pathways.
The search has several features:
• it favors pathways with few species and steps;
• it ensures the unique generation of candidate pathways;
• it requires balanced chemical equations; and
• it limits steps to at most two reactants and two products.
These aspects involve highly general constraints that limit
the search space before a scientist adds prior knowledge.
1. H2 + MM ➞ 2MH
2. CO + MM ➞ M2CO
3. MH + M2CO ➞ M2CHOM
Partial reaction pathway
discovered by MECHEM.
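Two of the general search constraints, balanced equations and at most two reactants and two products per step, can be checked mechanically. The sketch below applies them to the three steps shown above. The parser and function names are hypothetical, the arrow is written as "->" in the code, and the catalyst site M is treated as if it were an element symbol.

```python
import re
from collections import Counter

def parse_term(term):
    """Split '2MH' into the coefficient 2 and the atom counts {'M': 1, 'H': 1}."""
    coeff, formula = re.fullmatch(r"(\d*)(.+)", term.strip()).groups()
    atoms = Counter()
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        atoms[symbol] += int(digits or 1)
    return int(coeff or 1), atoms

def summarize(side):
    """Return (number of molecules, total atom counts) for one side of a step."""
    molecules, totals = 0, Counter()
    for term in side.split("+"):
        coeff, atoms = parse_term(term)
        molecules += coeff
        for symbol, n in atoms.items():
            totals[symbol] += coeff * n
    return molecules, totals

def is_valid_step(step, max_molecules=2):
    """Check the size limit on both sides and the atom balance across the arrow."""
    left, right = step.split("->")
    n_left, atoms_left = summarize(left)
    n_right, atoms_right = summarize(right)
    return max(n_left, n_right) <= max_molecules and atoms_left == atoms_right

steps = ["H2 + MM -> 2MH", "CO + MM -> M2CO", "MH + M2CO -> M2CHOM"]
print([is_valid_step(s) for s in steps])  # [True, True, True]
```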
The Prometheus Environment
Recall that Prometheus supports quantitative process
models that relate variables through numeric processes.
Scientists can build models from generic processes that
appear in domain-specific libraries.
Knowledge in Prometheus
Generic processes encode the knowledge used by
Prometheus to discover quantitative process models.
Generic processes specify
• which variables may interact,
• the equations that govern the
dynamics of their interaction, and
• ranges on the parameters
associated with the processes.
generic process exponential_loss
  variables: S{species}, D{detritus}
  parameters: α [0, 1]
  equations: d[S,t,1] = -1 · α · S
             d[D,t,1] = α · S
generic process grazing
  variables: S1{species}, S2{species}, D{detritus}
  parameters: ρ [0, 1], γ [0, 1]
  equations: d[S1,t,1] = γ · ρ · S1
             d[D,t,1] = (1 - γ) · ρ · S1
             d[S2,t,1] = -1 · ρ · S1
generic process nutrient_uptake
  variables: S{species}, N{nutrient}
  parameters: τ [0, ∞], β [0, 1], μ [0, 1]
  conditions: N > τ
  equations: d[S,t,1] = μ · S
             d[N,t,1] = -1 · β · μ · S
Scientists also give Prometheus a
list of variables that should
appear in the model.
Search in Prometheus
Unlike Gretl and MECHEM, Prometheus carries out a multistage
search for a model’s structure and its parameters:
1. Find all ways to instantiate known generic processes
with specific variables, subject to type constraints;
2. Combine instantiated processes into candidate generic
models subject to additional constraints;
3. For each generic model, carry out search through
parameter space to find good coefficients;
4. Return the parameterized models with the best overall
scores (e.g., sum of squared errors).
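The four stages can be sketched end to end on a toy problem. This is a minimal illustration, not the Prometheus implementation. The library, the variable names, the "growth" process, the Euler integrator, the exhaustive grid search, and the tie-break toward smaller models are all assumptions made for the example.

```python
from itertools import combinations, permutations, product

VARIABLES = {"P": "species", "D": "detritus"}  # hypothetical typed variables

# Hypothetical library: name -> (argument types, function giving each variable's rate).
LIBRARY = {
    "exponential_loss": (("species", "detritus"),
                         lambda k, s, d, state: {s: -k * state[s], d: k * state[s]}),
    "growth": (("species",),
               lambda k, s, state: {s: k * state[s]}),
}

def instantiate(library, variables):
    """Stage 1: bind each generic process to variables of the required types."""
    bound = []
    for name, (types, rate) in library.items():
        for args in permutations(variables, len(types)):
            if all(variables[a] == t for a, t in zip(args, types)):
                bound.append((name, args, rate))
    return bound

def structures(bound):
    """Stage 2: every non-empty combination of instantiated processes."""
    for size in range(1, len(bound) + 1):
        yield from combinations(bound, size)

def simulate(structure, params, start, steps, dt=0.1):
    """Euler integration of the summed process rates."""
    state, trajectory = dict(start), [dict(start)]
    for _ in range(steps):
        deriv = {v: 0.0 for v in state}
        for (_, args, rate), k in zip(structure, params):
            for var, value in rate(k, *args, state).items():
                deriv[var] += value
        state = {v: state[v] + dt * deriv[v] for v in state}
        trajectory.append(dict(state))
    return trajectory

def sse(predicted, observed):
    """Sum of squared errors over all variables and time points."""
    return sum((p[v] - o[v]) ** 2 for p, o in zip(predicted, observed) for v in o)

def fit(structure, observed, grid):
    """Stage 3: grid search over one parameter per process."""
    steps = len(observed) - 1
    return min((sse(simulate(structure, ps, observed[0], steps), observed), ps)
               for ps in product(grid, repeat=len(structure)))

def search(library, variables, observed, grid):
    """Stage 4: rank fitted models by error, then by size (assumed tie-break)."""
    results = []
    for structure in structures(instantiate(library, variables)):
        error, params = fit(structure, observed, grid)
        names = [f"{name}({', '.join(args)})" for name, args, _ in structure]
        results.append((error, len(structure), names, params))
    return sorted(results, key=lambda r: (r[0], r[1]))

grid = [i / 10 for i in range(11)]
true_model = instantiate(LIBRARY, VARIABLES)[:1]  # exponential_loss(P, D)
observed = simulate(true_model, (0.3,), {"P": 10.0, "D": 0.0}, steps=20)
for error, size, names, params in search(LIBRARY, VARIABLES, observed, grid):
    print(names, params, round(error, 4))
```

The type constraints in stage 1 allow exponential_loss(P, D) but not exponential_loss(D, P). On this synthetic data the ranked list starts with the single-process model that generated the observations.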
Like MECHEM, Prometheus returns multiple models for the
scientist to inspect and possibly refine.
To this end, the environment supports incremental revision.
Discovering Dynamic Models: Summary
The informatics systems discussed in this lecture covered
three types of dynamic models.
Nevertheless, there were commonalities, as each system
• used domain knowledge such as a model structure,
constraints on candidates, or model components;
• used search techniques to discover model parameters,
structures, or both; and
• worked interactively with scientists to explain their data.
Moreover, researchers have applied these systems to
generate new knowledge in scientific domains.