Estimation de systèmes peu observables en grande dimension par apprentissage statistique
(Estimation of weakly observable high-dimensional systems by statistical learning)

K. Kasper (1), L. Mathelin (2) & H. Abou-Kandil (1)
(1) SATIE - ENS Cachan
(2) LIMSI, Orsay
Motivation: Inverse problems (IP)

Given a set of data S and a model F describing the state/solution u, estimate the parameters y:

parameters:  y ≈ D x    (spatial or temporal fields; the parameter space is sometimes high-dimensional)
state:       u ≈ F y    (F: forward model, expensive, uncertain, non-linear)
data:        S ≈ O u    (O: observation operator; the data are limited, noisy and indirect)

The state space is high-dimensional.
Lionel Mathelin
Estimation par apprentissage statistique
2 / 23
Motivation

Goal: inference of a high-dimensional spatial field from a few sensors.
Illustration adapted from Jafarpour & Khaninezhad, 2015.

Context:
High-dimensional spatial field
Limited number of sensors
Limited data
−→ severely ill-posed problem
−→ CPU-intensive, while fast solutions are needed
Standard approach (deterministic framework) — ROM-based state estimation

Derive a reduced-order model: y ≈ ŷ = D_ROM x. The columns {d^(i)}_{i=1}^{n_D} of D_ROM typically are PCA modes; x is the associated coefficient vector.

From a correlation kernel or a "learning" (unsorted) sequence Y := [y^(1) ... y^(n_snap)] ∈ R^{n × n_snap}, the thin SVD gives

    Y ≈ D_ROM Σ V*.

For a given number n_D of retained modes, this yields the best approximation in the following sense:

    Ŷ = D_ROM X̂ = arg min_{Ỹ ∈ R^{n × n_snap}, rank(Ỹ) ≤ n_D} ‖Y − Ỹ‖_F,   with X̂ = Σ V*.

Notation:
y ∈ R^n: field of interest,
D_ROM ∈ R^{n × n_D}: the approximation basis [dictionary],
x ∈ R^{n_D}: the basis coefficients as estimated from the n_s sensors,
s ∈ R^{n_s}: sensor information.
Standard approach (deterministic framework) — ROM-based state estimation

On site, only s = O F y =: G y ∈ R^{n_s} is measured.
F : R^n → R^n, y ↦ u: model operator.
O : R^n → R^{n_s}, u ↦ s: observation operator.
G : R^n → R^{n_s}, y ↦ s: forward operator.

Observer:

    x̂ ∈ arg min_{x̃ ∈ R^{n_D}} ‖s − G D_ROM x̃‖_2,   [data misfit]

or simply x̂ = (G D_ROM)⁺ s   ←− requires n_s ≥ n_D (absent additional hypotheses).

The reconstructed field is finally

    ŷ = D_ROM x̂ = D_ROM (G D_ROM)⁺ s.

L_ROM := D_ROM (G D_ROM)⁺ is the ROM-based lift-up operator.

Notation:
y ∈ R^n: field of interest,
D_ROM ∈ R^{n × n_D}: the approximation basis [dictionary],
x ∈ R^{n_D}: the basis coefficients as estimated from the n_s sensors,
s ∈ R^{n_s}: sensor information.
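The lift-up reconstruction ŷ = D_ROM (G D_ROM)⁺ s can be sketched in a few lines, assuming a linear point-sensor observation operator (the operator, dimensions and data below are illustrative assumptions, not the talk's setup):

```python
# Illustrative ROM-based estimation: x̂ = (G D_ROM)⁺ s, ŷ = D_ROM x̂.
import numpy as np

rng = np.random.default_rng(1)
n, n_D, n_s = 200, 8, 12                     # n_s ≥ n_D, as required

D_ROM = np.linalg.qr(rng.standard_normal((n, n_D)))[0]   # orthonormal modes

# Linear observation: n_s point sensors picking entries of the field
G = np.zeros((n_s, n))
G[np.arange(n_s), rng.choice(n, n_s, replace=False)] = 1.0

# Ground truth lying exactly in the ROM subspace, noiseless measurements
y_true = D_ROM @ rng.standard_normal(n_D)
s = G @ y_true

# Lift-up operator L_ROM = D_ROM (G D_ROM)⁺ and reconstruction
L_ROM = D_ROM @ np.linalg.pinv(G @ D_ROM)
y_hat = L_ROM @ s
```

With noiseless data and a field lying exactly in the ROM subspace, the reconstruction is exact whenever G D_ROM has full column rank, which is why n_s ≥ n_D is needed.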
Inversion as statistical inference (SI)

Deterministic inversion is often ill-posed (existence, uniqueness, stability).
−→ sparse basis learning approach
−→ Statistical inference restates the IP as a well-posed extension in a larger space of probability distributions.
A dictionary learning algorithm

Without additional hypotheses, it is impossible to estimate n_D > n_s modes.
−→ derive an over-complete dictionary D for a sparse representation of the field y ∈ M_y to be inferred:

    y ≈ D x,   x ∈ R^{n_D}.

−→ n_s-sparse approximation:

    y ≈ D x   with   ‖x‖_0 ≤ n_s,   ∀ y ∈ M_y.

Find {D, X} ∈ arg min_{D̃, X̃} ‖Y − D̃ X̃‖_F   s.t.   ‖x̃^(i)‖_0 ≤ n_s ∀i,   with X := [x^(1) ... x^(n_snap)].
A dictionary learning algorithm

−→ use the K-SVD algorithm.

Repeat:
Sparse coding: X ∈ arg min_{X̃} ‖Y − D X̃‖_F   s.t.   ‖x̃^(i)‖_0 ≤ n_s, ∀i.
Codebook update: update D and X so as to lower ‖Y − D X‖_F while maintaining the supports of the x^(i).

But typically x^(i) ∈ R^{n_D} cannot be estimated from the measurements: observability issue.
−→ determine D within the SI framework, for estimating x^(i) from s^(i) instead of y^(i).
−→ a dictionary D that is both accurate and observable: D(Y, S).
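The sparse-coding step of K-SVD is usually carried out with a greedy ℓ0 solver. A minimal Orthogonal Matching Pursuit sketch, one common choice but not necessarily the solver used in the talk (dictionary and data are synthetic):

```python
# Illustrative sparse coding for K-SVD via Orthogonal Matching Pursuit.
import numpy as np

def omp(D, y, k):
    """Greedy approximation of argmin ‖y − D x‖_2 s.t. ‖x‖_0 ≤ k."""
    residual, support = y.copy(), []
    x = np.zeros(D.shape[1])
    for _ in range(k):
        # Pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit coefficients on the selected support (orthogonal projection)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coef
        residual = y - D @ x
    return x

rng = np.random.default_rng(2)
n, n_D, k = 64, 128, 5
D = rng.standard_normal((n, n_D))
D /= np.linalg.norm(D, axis=0)               # unit-norm atoms, as in K-SVD
x_true = np.zeros(n_D)
x_true[rng.choice(n_D, k, replace=False)] = rng.standard_normal(k)
y = D @ x_true

x_hat = omp(D, y, k)
```

Each iteration adds at most one atom and re-projects, so the returned code is at most k-sparse and the residual can only shrink.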
−→ Observability-oriented Dictionary Learning

On-site data: s = G(y).
Let M_y = {y^(1), ..., y^(n_snap)} be examples of plausible fields. One looks for a recovery procedure minimizing the Bayes risk

    E_{s,y}[ ‖y − ŷ(s)‖_2^2 ] = ∬ ‖y − ŷ(s)‖_2^2 p(s|y) p(y) ds dy,

which for iid samples is approximated, up to a constant, by ‖Y − Ŷ(S)‖_F^2,

with   ŷ = D x,   ŝ = P(x)   [P: sparsifying measurement predictor]
−→ {D, X, P} ∈ arg min_{D̃, X̃, P̃} ‖Y − D̃ X̃(S; P̃)‖_F.

Alternatively:
    ŷ = L(x)   [L: nonlinear sparsifying lift-up operator],
    ŝ = D_feat x
−→ {D_feat, X, L} ∈ arg min_{D̃_feat, X̃, L̃} ‖Y − L̃(X̃(S; D̃_feat))‖_F.
−→ Observability-oriented Dictionary Learning

    {x^(i), D_feat} ∈ arg min_{x̃, D̃_feat} ‖s^(i) − D̃_feat x̃‖_2^2   s.t.   ‖x̃‖_0 ≤ n_s ∀i,   ‖d̃_feat,l‖_2 = 1 ∀l,

    L ∈ arg min_{L̃} ‖Y − L̃(X)‖_F.

Let Ψ^(i) := ‖s^(i) − D̃_feat x̃^(i)‖_2^2 + regul(x̃^(i)); it leads to

−→ a double-optimization problem:

    {D_feat, X, L} ∈ arg min_{D̃_feat, X̃, L̃} ‖Y − L̃(X̃)‖_F,
    s.t.   c^(i) := ∂Ψ^(i)/∂x̃^(i) = 0,   ∀ 1 ≤ i ≤ n_snap,
           ∂Ψ^(i)/∂d̃_feat,l = 0,   ‖d̃_feat,l‖_2 = 1,   ∀ 1 ≤ l ≤ n_D.

−→ consistent and as realistic as possible.

The constraints c^(i) are independent −→ parallel solving, e.g., using l-BFGS.
−→ Observability-oriented Dictionary Learning – Linear framework

On-site data: s = G y (linear relationship).
L ≡ D ∈ R^{n × n_D}, with D ∈ span{y^(i)}. Let Y = Q R (QR factorization) and D = Y B.

Using block-coordinate descent, alternately solve for:

Observability-oriented sparse coding:
    x^(i) ∈ arg min_{x̃ ∈ R^{n_D}} ‖s^(i) − D_feat x̃‖_2   s.t.   ‖x̃‖_0 ≤ n_s.

Estimation codebook update:
    D ∈ arg min_{D̃ ∈ R^{n × n_D}} ‖Y − D̃ X‖_F
−→ solved directly: D ← Y X⁺   [using qrginv (Katsikis et al., 2011)].
−→ Recovery error: ‖R (I_{n_snap} − B X)‖_F   [inexpensive to evaluate].

Feature codebook update:
    d_feat,l ∝ (S − D_feat,\l X_\l) x̂_l^T,   with   d̂_l x̂_l ≈ R (I_{n_snap} − B_\l X_\l)   [rank-1 approximation],
    ‖d_feat,l‖_2 = 1,   ∀ 1 ≤ l ≤ n_D
−→ consistent and as realistic as possible.
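The direct estimation-codebook update D ← Y X⁺ can be checked numerically. A sketch using numpy's pseudo-inverse rather than the qrginv routine cited above, with illustrative dimensions and a dense X for brevity:

```python
# Illustrative closed-form codebook update: D ← Y X⁺ minimizes ‖Y − D X‖_F
# over D for fixed codes X.
import numpy as np

rng = np.random.default_rng(5)
n, n_D, n_snap = 80, 12, 40

X = rng.standard_normal((n_D, n_snap))       # fixed codes (dense here for brevity)
Y = rng.standard_normal((n, n_snap))         # snapshot matrix

D = Y @ np.linalg.pinv(X)                    # closed-form Frobenius minimizer

# Any perturbation of D can only increase the misfit
err = np.linalg.norm(Y - D @ X, "fro")
D_pert = D + 1e-2 * rng.standard_normal(D.shape)
err_pert = np.linalg.norm(Y - D_pert @ X, "fro")
```

Since the update is a row-wise least-squares solution, no other dictionary can achieve a smaller misfit for the same codes.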
Bayesian Compressive Sensing

Relax ‖x‖_0 (an NP-hard optimization problem) into ‖·‖_1:

    x ∈ arg min_{x̃ ∈ R^{n_D}} ‖s − D_feat x̃‖_2^2 + τ ‖x̃‖_1.

Full information on the solution −→ Bayesian modeling (independent data during training):

    x ∈ arg max_{x̃ ∈ R^{n_D}} q(x̃|s) ∝ L(x̃|s) p(x̃),   ŝ = D_feat x,

with
    L(x|s, β) ∼ N(s | D_feat x, β^{-1}),
    p(x) = (λ/2) exp(−(λ/2) ‖x‖_1)   [Laplace prior],
    τ = λ/β   [sparsity penalty].
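The ℓ1-relaxed problem can be attacked with any proximal-gradient scheme. A minimal ISTA sketch; the solver choice, dimensions and data are illustrative assumptions, not the talk's:

```python
# Illustrative ℓ1-relaxed sparse recovery (LASSO form) via ISTA.
import numpy as np

def ista(D, s, tau, n_iter=500):
    """Minimize ‖s − D x‖_2² + τ ‖x‖_1 by iterative soft-thresholding."""
    step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)   # 1/L, L = Lipschitz const.
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ x - s)
        z = x - step * grad
        x = np.sign(z) * np.maximum(np.abs(z) - step * tau, 0.0)  # prox of τ‖·‖₁
    return x

rng = np.random.default_rng(3)
n_s, n_D = 30, 100
tau = 0.05
D_feat = rng.standard_normal((n_s, n_D)) / np.sqrt(n_s)
x_true = np.zeros(n_D)
x_true[rng.choice(n_D, 4, replace=False)] = [1.5, -2.0, 1.0, 0.8]
s = D_feat @ x_true

x_hat = ista(D_feat, s, tau)
```

ISTA monotonically decreases the relaxed objective, so the returned code always improves on the all-zero initialization.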
Bayesian Compressive Sensing (cont'd)

The Laplace prior is not conjugate to the likelihood model, and the sparsity regularization penalty τ is unknown −→ hierarchical formulation:

    p(x|γ) = ∏_{i=1}^{n_D} N(x_i | 0, γ_i),   γ = (γ_1, ..., γ_{n_D}),   (1)
    p(γ_i|λ) = (λ/2) exp(−λ γ_i / 2),
    p(λ|ν) = Γ(λ | ν/2, ν/2).   (2)

Each of the n_D independent hyperparameters γ_i controls the strength of the prior −→ introduces sparsity in the model.
Sequential Sparse Bayesian Learning

−→ three-stage hierarchy: Eqs. (1)-(2) result in a Laplace distribution p(x|λ),
−→ q(x|s, γ, β, λ) is a multivariate Gaussian distribution N(x|µ, Σ), with

    µ = β Σ D_feat^T s,   Σ = (β D_feat^T D_feat + Λ)^{-1},   Λ = diag(γ_1^{-1}, ..., γ_{n_D}^{-1}).

Follows Babacan et al. (2010).

The hyperparameters maximize the (log-)marginal likelihood

    log p(s, γ, β, λ) = log ∫ p(s|x, β) p(x|γ) p(γ|λ) p(λ) p(β) dx

−→ MAP estimate: ŷ = D µ.
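For fixed hyperparameters (β, γ), the Gaussian posterior above is available in closed form. A sketch with placeholder hyperparameter values; these are illustrative assumptions, not the update rules of Babacan et al. (2010):

```python
# Illustrative Gaussian posterior N(x|µ, Σ) of sparse Bayesian learning:
# µ = β Σ D_featᵀ s,  Σ = (β D_featᵀ D_feat + Λ)⁻¹,  Λ = diag(1/γ).
import numpy as np

rng = np.random.default_rng(4)
n_s, n_D = 20, 60

D_feat = rng.standard_normal((n_s, n_D)) / np.sqrt(n_s)
beta = 100.0                                  # noise precision (assumed known)
gamma = np.full(n_D, 1e-3)                    # tiny variances -> strong shrinkage
gamma[:5] = 1.0                               # a few loosely constrained coefficients

s = D_feat @ rng.standard_normal(n_D)         # synthetic measurements

Lam = np.diag(1.0 / gamma)
Sigma = np.linalg.inv(beta * D_feat.T @ D_feat + Lam)
mu = beta * Sigma @ D_feat.T @ s              # posterior mean = MAP estimate
```

Coefficients whose γ_i is driven toward zero are shrunk toward zero by the large corresponding entry of Λ, which is how the hierarchy induces sparsity.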
In a nutshell

1. Basis learning [offline]
   - Form a snapshot matrix Y of expected realizations of the field and the corresponding measurement matrix S,
   - Given the sensors, learn representation dictionaries D (and D_feat) using Sparse Bayesian Dictionary Learning.

2. Field reconstruction [online]
   - Use SBL with the measurement s to estimate x ∼ N(x|µ, Σ),
   - Reconstruct the MAP total field from ŷ = D µ, or use Markov chain Monte Carlo.
Numerical experiment: pressure field

y: pressure field
s: measurements at the surface (n_s ≤ 5)
Observability issue. Full information: s ≡ y
−→ significantly better L2 -reconstruction performance than PCA/KL.
Observability issue. Sensor information only
−→ significantly better L2 -reconstruction performance than PCA/KL.
−→ PCA coefficients are not accurately estimated from the sensors.
Reconstruction
Relative pressure field. Exact (left), estimated from PCA/KL (middle), estimated from present
approach (right).
−→ Much better reconstruction performance.
Combined with SensorSpace...

−→ PCA/Effective Independence compared to SOBAL / SensorSpace.
Cavity flow – Dictionary
Recovery performance – 3 sensors
Closing remarks

Offline/online strategy for inference: first learn about the system at hand, then exploit it.
The basis-learning philosophy is key to a realistic and successful approach.
Need to balance representation accuracy against observability when using a dictionary
−→ observability-oriented sparse Bayesian learning.

Ongoing efforts:
High-dimensional fields, nonlinear observation operators,
Combination with an assimilation technique, e.g., a sparsity-exploiting EnKF.