Towards a Formal Model for View Maintenance in Data Warehouses
D. Agrawal, A. El Abbadi, A. Mostéfaoui, M. Raynal and M. Roy
Summary
The Data Warehouse Problem
  Definitions
  Existing protocols
A Formal Definition of the Problem
  Formal Definition of Data Objects
  Abstract Definition of View Management
The Protocol
  A Virtual Topology
  A Pipelining Technique
The Data Warehouse Problem
A set of databases $x_1, x_2, \dots, x_n$
How to efficiently query a database aggregate? By adding a Data Warehouse.
[Figure: data sources x1 to x5, the Data Warehouse, and a Query]
Data Warehouse: Definition
The Data Warehouse maintains a DB summary: a Select-Project-Join (SPJ) expression
$F(X_1, \dots, X_n) = \Pi_A(\sigma_C(X_1 \bowtie \cdots \bowtie X_n))$
Data Warehouse (DWH) problem ≡ computation of a “simple” distributed function over changing Data Sources.
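As a rough illustration of the SPJ expression, the following Python sketch evaluates a projection of a selection of a join over small in-memory relations. The list-of-dicts representation, the helper names (`join`, `select`, `project`, `view`) and the sample `customers`/`orders` data are assumptions made for this example only; they do not come from the slides.

```python
def join(r, s):
    """Natural join of two relations given as lists of dict rows."""
    return [{**a, **b} for a in r for b in s
            if all(a[k] == b[k] for k in a.keys() & b.keys())]

def select(r, cond):
    """Keep the rows satisfying the predicate cond (the C of sigma_C)."""
    return [row for row in r if cond(row)]

def project(r, attrs):
    """Keep only the attributes in attrs (the A of Pi_A); duplicates are kept."""
    return [{k: row[k] for k in attrs} for row in r]

def view(sources, cond, attrs):
    """Evaluate Pi_A(sigma_C(X_1 join ... join X_n)) from scratch."""
    joined = sources[0]
    for s in sources[1:]:
        joined = join(joined, s)
    return project(select(joined, cond), attrs)

# Hypothetical data sources
customers = [{"cid": 1, "city": "Rennes"}, {"cid": 2, "city": "Santa Barbara"}]
orders = [{"oid": 10, "cid": 1, "amount": 50},
          {"oid": 11, "cid": 2, "amount": 5},
          {"oid": 12, "cid": 1, "amount": 7}]

result = view([customers, orders], lambda row: row["amount"] > 6, ["city", "oid"])
print(result)
```

Recomputing the view from scratch like this is what a view maintenance protocol tries to avoid once the sources start changing.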
Extremal Solutions
The DWH maintains the total aggregation of all Data Sources.
  costly in space
  unnecessary network usage
The DWH stores no data, and forwards queries to the Data Sources.
  high latency
  unnecessary network usage
  then, the DWH is just a proxy
Proposed Solutions
The DWH maintains the SPJ expression $F$
Periodically, it computes $\Delta F$
Major Problem: asynchrony of updates on Data Sources
  Error Terms
Major Difficulties
Asynchrony and distribution of the model:
  Consistency issues
  Performance issues
    network usage
    memory/disk usage on the DWH
Complexity of proposed protocols:
  unproved algorithms
  need for a formal definition of the problem
Formal Definitions (data)
Data Objects
  denoted $x_i$
  a data manager is associated with each $x_i$
  can be updated and read using the query/update primitives
Timeline: the successive values of $x_i$ are denoted $(x_i^{[t]})_{t>0}$.
Formal Definitions (operations)
Data Operations
  add/remove, denoted ⊕, for source updates
    associative
    commutative
  a join operation, denoted ⊗
    associative
    commutative
    distributive over ⊕
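One possible concrete model of these operations, offered as a sketch rather than as the authors' definition: a data object is a bag of rows with signed multiplicities, ⊕ adds multiplicities (a negative count encodes a removal) and ⊗ is a natural join that multiplies them. The names `rel`, `oplus` and `otimes` and the sample rows are hypothetical.

```python
from collections import defaultdict

def rel(*rows, sign=1):
    """Build a relation from dict rows; sign=-1 encodes removals."""
    out = defaultdict(int)
    for row in rows:
        out[frozenset(row.items())] += sign
    return dict(out)

def oplus(a, b):
    """Add/remove: sum the signed multiplicities, dropping rows that reach 0."""
    out = defaultdict(int)
    for r in (a, b):
        for row, n in r.items():
            out[row] += n
    return {row: n for row, n in out.items() if n != 0}

def otimes(a, b):
    """Join: merge rows that agree on shared attributes, multiplying multiplicities."""
    out = defaultdict(int)
    for ra, na in a.items():
        for rb, nb in b.items():
            da, db = dict(ra), dict(rb)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out[frozenset({**da, **db}.items())] += na * nb
    return {row: n for row, n in out.items() if n != 0}

x1 = rel({"k": 1, "a": "p"}, {"k": 2, "a": "q"})
x2 = rel({"k": 1, "b": "u"}, {"k": 2, "b": "v"})
d1 = rel({"k": 1, "a": "r"})  # an insertion into x1

# The join is commutative and distributes over add/remove on this sample.
commutes = otimes(x1, x2) == otimes(x2, x1)
distributes = otimes(oplus(x1, d1), x2) == oplus(otimes(x1, x2), otimes(d1, x2))
# A removal cancels the matching row.
after_removal = oplus(x1, rel({"k": 2, "a": "q"}, sign=-1))
```

Distributivity is the property that lets a source update δ be turned into a view increment without recomputing the whole join.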
Formal Definitions (dwh)
The Data Warehouse computes $F$ such that
$F = x_1 \otimes x_2 \otimes \cdots \otimes x_n$
Consistency is mandatory at all times.
Up-to-dateness is only eventual, for performance reasons.
Abstract Def. of View Management
A View Management protocol should satisfy:
Validity: any query on the DWH returns an $f = x_1^{[t_1]} \otimes \cdots \otimes x_n^{[t_n]}$.
Order Consistency: if $q_1 = x_1^{[t^1_1]} \otimes \cdots \otimes x_n^{[t^1_n]}$ (resp. $q_2 = x_1^{[t^2_1]} \otimes \cdots \otimes x_n^{[t^2_n]}$) is the result of a query, and $q_1$ was issued before $q_2$, then $\forall i,\ t^1_i \le t^2_i$.
Up-to-Dateness: for any $t > 0$ and any $i \in [1..n]$, an infinite sequence of queries will return at least one $f = F(\cdots, x_i^{[t']}, \cdots)$ with $t' \ge t$.
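Order Consistency can be checked mechanically if each query result is tagged with the vector of source versions it reflects. The sketch below assumes such tagging is available (the slides do not say how it would be exposed); `order_consistent` is a hypothetical helper.

```python
def order_consistent(version_vectors):
    """version_vectors: one tuple (t_1, ..., t_n) per query, in issue order.

    Comparing consecutive queries is enough, because the componentwise
    "less than or equal" relation is transitive.
    """
    return all(
        earlier[i] <= later[i]
        for earlier, later in zip(version_vectors, version_vectors[1:])
        for i in range(len(earlier))
    )

# Each source version only moves forward: consistent.
ok = order_consistent([(0, 0, 0), (1, 0, 0), (1, 2, 0)])
# The second query sees an older version of the first source: inconsistent.
bad = order_consistent([(1, 0), (0, 1)])
```

Up-to-Dateness is a liveness property over infinite query sequences, so it cannot be checked on a finite trace in the same way.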
The Protocol: a single update
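Suppose that $F = x_1 \otimes x_2 \otimes x_3 \otimes x_4$.
If $x_1$ is updated to $x_1 \oplus \delta_1$, then the corresponding $\Delta F$ is:
$\Delta F = \delta_1 \otimes x_2 \otimes x_3 \otimes x_4$
$x_1$'s data manager sends $\delta_1$ to $x_2$:
$x_2$'s data manager computes $\delta_1 \otimes x_2$ and sends the result to $x_3$
$x_3$'s data manager computes $\delta_1 \otimes x_2 \otimes x_3$ and sends the result to $x_4$
when $x_4$'s data manager has computed $\delta_1 \otimes x_2 \otimes x_3 \otimes x_4$, it can send the result to the DWH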
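The hop-by-hop computation can be sketched in Python under a simplifying assumption: each data object is an integer, with `+` standing in for ⊕ and `*` for ⊗ (integers satisfy the associativity, commutativity and distributivity required of the two operations). The function name and the wrap-around for an update starting at an arbitrary site are illustrative choices, not part of the slides.

```python
from math import prod

def propagate_single_update(values, i, delta):
    """Return the view increment for an update `delta` at site i.

    The partial result visits every other site once, in ring order,
    and each visited site multiplies it by its own value.
    """
    n = len(values)
    partial = delta
    for step in range(1, n):
        j = (i + step) % n
        partial = partial * values[j]  # site j computes partial (x) x_j
    return partial  # the last site sends this to the DWH

x = [2, 3, 5, 7]                      # hypothetical values of x1..x4
delta_f = propagate_single_update(x, 0, 1)
f_old = prod(x)
f_new = prod([x[0] + 1] + x[1:])      # view recomputed from scratch
```

In this model `f_old + delta_f` equals `f_new`, which is the integer counterpart of F′ = F ⊕ ∆F for a single update.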
The Protocol: Concurrent Updates
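Now, suppose that both $x_1$ and $x_2$ are updated.
$F' = (x_1 \oplus \delta_1) \otimes (x_2 \oplus \delta_2) \otimes x_3 \otimes x_4$
$F' = F \oplus \Delta F$
$\Delta F = (\delta_1 \otimes x_2 \otimes x_3 \otimes x_4) \oplus (x_1 \otimes \delta_2 \otimes x_3 \otimes x_4) \oplus (\delta_1 \otimes \delta_2 \otimes x_3 \otimes x_4)$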
complexity increases with concurrency
two solutions:
1. compute error terms
2. order the updates
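The two solutions can be compared on a small numeric instance. This is only a sketch: integers stand in for data objects, with `+` for ⊕ and `*` for ⊗, and the values are made up.

```python
x1, x2, x3, x4 = 2, 3, 5, 7   # hypothetical source values
d1, d2 = 1, 2                 # concurrent updates to x1 and x2

f_old = x1 * x2 * x3 * x4
f_new = (x1 + d1) * (x2 + d2) * x3 * x4

# Solution 1: evaluate each update against the old state, then add the
# error term that accounts for the interaction between the two updates.
independent = d1 * x2 * x3 * x4 + x1 * d2 * x3 * x4
error_term = d1 * d2 * x3 * x4
with_error_term = independent + error_term

# Solution 2: order the updates. The second increment is evaluated
# against the state that already includes the first update.
first = d1 * x2 * x3 * x4
second = (x1 + d1) * d2 * x3 * x4
ordered = first + second
```

Both `with_error_term` and `ordered` equal `f_new - f_old`, while `independent` alone falls short by exactly `error_term`. With more concurrent updates the number of cross terms grows, which is the motivation for ordering.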
The Protocol: a Virtual Topology
The star topology (center: the DWH; periphery: the data source nodes) is viewed as a ring.
A token perpetually moves around the ring.
It induces a natural order on updates.
The Protocol: Pipelining Updates
The token generates a global time (number of steps).
The sites maintain an additional variable, the difference $\Delta_i$ between the current $x_i$ and the last committed $x_i$.
Once an update has made a full rotation, it can be integrated into the data warehouse.
The token can contain up to $n$ updates in the commitment phase.
The Protocol: Code (1)
when the token arrives at xi with sequence number sn:
1. let ∆F = token[i];
2. if (∆F ≠ ⊥) then sn ← sn + 1;
   send incr(∆F, sn) to DWH endif;
3. token[i] ← ∆i;
4. ∀j ≠ i do token[j] ← (token[j] ⊗ (xi ⊖ ∆i)) enddo;
5. ∆i ← ⊥;
6. send token(token, sn) to next data source
Here xi ⊖ ∆i is the last committed value of xi, i.e. xi without its pending updates ∆i.
The Protocol: Code (2)
when update(δi) is received by xi:
1. xi ← xi ⊕ δi;
2. ∆i ← ∆i ⊕ δi
when incr(∆F, sn) is received by DWH:
1. wait (next_sn = sn);
2. f ← f ⊕ ∆F;
3. next_sn ← next_sn + 1
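The three handlers can be exercised together in a small single-process simulation. This is a sketch under several assumptions that are not in the slides: data objects are integers (`+` for ⊕, `*` for ⊗), `None` plays the role of ⊥, the token step multiplies by the last committed value of the site (current value minus pending ∆i), `sn` starts at 0 and `next_sn` at 1, and the `wait` is modelled by buffering. The class and function names are hypothetical.

```python
from math import prod

class Site:
    def __init__(self, value):
        self.x = value      # current value of x_i
        self.delta = None   # pending updates (Delta_i); None stands for "bottom"

    def update(self, d):
        # "when update(delta_i) is received by x_i"
        self.x += d
        self.delta = d if self.delta is None else self.delta + d

class Warehouse:
    def __init__(self, f):
        self.f = f
        self.next_sn = 1    # assumed initial value
        self.buffer = {}    # increments that arrived ahead of their turn

    def incr(self, delta_f, sn):
        # "wait (next_sn = sn)" is modelled by buffering out-of-order increments
        self.buffer[sn] = delta_f
        while self.next_sn in self.buffer:
            self.f += self.buffer.pop(self.next_sn)
            self.next_sn += 1

def token_arrives(i, sites, token, sn, dwh):
    site = sites[i]
    delta_f = token[i]                          # step 1
    if delta_f is not None:                     # step 2: full rotation done
        sn += 1
        dwh.incr(delta_f, sn)
    token[i] = site.delta                       # step 3
    pending = 0 if site.delta is None else site.delta
    committed = site.x - pending                # last committed value of x_i
    for j in range(len(sites)):                 # step 4
        if j != i and token[j] is not None:
            token[j] *= committed
    site.delta = None                           # step 5
    return sn                                   # step 6: caller forwards the token

def simulate(initial, schedule, steps):
    """schedule maps a token step to the (site, delta) updates applied just before it."""
    sites = [Site(v) for v in initial]
    dwh = Warehouse(prod(initial))
    token, sn = [None] * len(sites), 0
    history = [dwh.f]
    for step in range(steps):
        for i, d in schedule.get(step, []):
            sites[i].update(d)
        sn = token_arrives(step % len(sites), sites, token, sn, dwh)
        if dwh.f != history[-1]:
            history.append(dwh.f)
    return dwh.f, [s.x for s in sites], history

f, xs, history = simulate([2, 3, 5, 7], {0: [(0, 1), (1, 2)], 2: [(3, -1)]}, 12)
```

In this run the warehouse value moves through 210, 315, 525 and 450. Each of these is the product of one consistent set of source versions, and the last one equals the product of the final source values, which is what Validity and Up-to-dateness ask for in this toy model.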
The Protocol: Sketch for the Proof
Validity, Up-to-dateness and Order Consistency:
  use a total order: the number of steps performed by the token
  induction on the content of the token
A Real-Life Protocol
How to make the protocol quiescent?
  when there is no update, the token is destroyed
  when an update occurs, the data source sends a request to the data warehouse
  if the token was destroyed, it is recreated
A Real-Life Protocol (2)
How to remove the ring assumption?
  in a star network, every message goes to or comes from the DWH
  the DWH incorporates updates and destroys/recreates the token when necessary
Extension: Multi Term
Meta data warehouse:
  aggregation of multiple data warehouses
  a data object may appear in several views computed in the data warehouses
[Figure: data sources x1 to x5, two warehouses DWH1 and DWH2, and a Meta-DWH labelled x1x3x4 + x2x3x4x5]
synchronization problems, possible deadlocks.
Conclusion
a formal definition of a database problem
an abstract protocol
  provable
  can be adapted to fit real-life systems
  efficient