10^(10^6) Worlds and Beyond: Efficient Representation and

Dan Olteanu, Christoph Koch, Lyublena Antova
(ICDE2007 paper)
Presenter: For




Motivation of the Studies
Efficient representation of incomplete data
Relational Algebra query on incomplete data
Experiment

Managing uncertain data
185 or 785 ?
Single or married?
185 or 186?
Marital Status?

Storing uncertain data

Is the Or-set relation practical?

Data Cleaning: unique Social Number

Or-sets fail to represent the result after cleaning!
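A minimal sketch of why the cleaned result no longer fits an or-set relation, using the values of the running example (SSN 185 or 785 for Smith, 185 or 186 for Brown, two and four marital-status alternatives). The field order and variable names are illustrative assumptions, and the names Smith and Brown are left out because they are certain.

```python
from itertools import product

# Or-sets of the running example, in the order
# (T1.SSN, T1.marital, T2.SSN, T2.marital). Field order is an assumption.
or_sets = [
    [185, 785],      # T1 (Smith): SSN
    [1, 2],          # T1 (Smith): marital status
    [185, 186],      # T2 (Brown): SSN
    [1, 2, 3, 4],    # T2 (Brown): marital status
]

# An or-set relation treats every field as independent,
# so its worlds are the full cross product of the or-sets.
all_worlds = list(product(*or_sets))

# Data cleaning: social security numbers must be unique.
clean = [w for w in all_worlds if w[0] != w[2]]

# The tightest or-set relation covering the cleaned worlds keeps, per field,
# every value that still occurs. It re-admits the forbidden worlds.
covering = [sorted({w[i] for w in clean}) for i in range(4)]
recovered = list(product(*covering))

print(len(all_worlds), len(clean), len(recovered))
```

The cleaned world set is smaller than the cross product of its per-field values, so no choice of independent or-sets describes it exactly.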

Imposing constraint

Preserve all information
Instead of storing
You need to store …
[Table: the 24 possible worlds that satisfy the unique-SSN constraint, one row per world; columns are SSN, name, and marital status for tuples T1 and T2. T1: SSN 185 or 785, Smith, marital status 1 or 2. T2: SSN 185 or 186, Brown, marital status 1 to 4.]
Possible Worlds

Census Survey
Setting: 50 questions per survey

Population: 200 million = 2*10^8

[Figure: Certain DB, a table of 2*10^8 rows by 50 columns]




Total answers = population * questions = (2*10^8) * 50 = 10^10
Error rate: 1 in 10^4
Uncertain answers = total answers * error rate = 10^10 / 10^4 = 10^6
Possible worlds: 2^(10^6) (two alternatives per uncertain answer)
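The arithmetic above can be checked with a short script. It assumes, as the 2^(10^6) figure suggests, exactly two alternatives per uncertain answer; the variable names are illustrative.

```python
import math

population = 2 * 10**8          # 200 million people
questions = 50                  # questions per survey
errors_one_in = 10**4           # error rate: 1 answer in 10^4

total_answers = population * questions
uncertain_answers = total_answers // errors_one_in

# Assumption: each uncertain answer has exactly two alternatives,
# so the number of possible worlds is 2 ** uncertain_answers.
# Count its decimal digits via logarithms instead of building the string.
digits = int(uncertain_answers * math.log10(2)) + 1

print(total_answers, uncertain_answers, digits)
```

The world count has about three hundred thousand decimal digits, which is why enumerating worlds explicitly is not an option.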
[Figure: Certain DB (2*10^8 rows by 50 columns, i.e. 50*(2*10^8) fields) next to Uncertain DB (2^(10^6) copies of such a table, one per possible world)]
Possible Worlds
[Table: the 32 possible worlds, one row per world; columns are SSN, name, and marital status for tuples T1 and T2. T1: SSN 185 or 785, Smith, marital status 1 or 2. T2: SSN 185 or 186, Brown, marital status 1 to 4.]
Does it work when constraints are introduced?
Yes, it works.
component

Intuition of the World Set Decomposition
Store independent tuple fields in separate components
Store dependent tuple fields within the same component
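The intuition can be sketched on the running example after the unique-SSN constraint. The layout below (a component as a list of local worlds, each a dict from field name to value) and the field names such as `T1.SSN` are assumptions made for illustration, not the paper's storage format.

```python
from itertools import product

# Each component lists its local worlds; a world picks one row per component.
components = [
    # The two SSN fields depend on each other (they must differ),
    # so they share one component.
    [{"T1.SSN": 185, "T2.SSN": 186},
     {"T1.SSN": 785, "T2.SSN": 185},
     {"T1.SSN": 785, "T2.SSN": 186}],
    # The marital-status fields are independent: one component each.
    [{"T1.M": 1}, {"T1.M": 2}],
    [{"T2.M": m} for m in (1, 2, 3, 4)],
    # Certain fields form a component with a single local world.
    [{"T1.N": "Smith", "T2.N": "Brown"}],
]

def expand(components):
    """Yield every possible world as the union of one row per component."""
    for choice in product(*components):
        world = {}
        for local in choice:
            world.update(local)
        yield world

worlds = list(expand(components))
stored_fields = sum(len(c) * len(c[0]) for c in components)

print(len(worlds), stored_fields)
```

In this sketch the decomposition stores 14 field values, while listing all 24 worlds explicitly would take 24 * 6 = 144.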






Selection with constant
Selection with variables
Projection
Difference
Union / Product
Normalization of query answers
1) Search within the relevant components
2) Test the condition; if false, mark the field as ⊥
3) Propagate the ⊥ within the same component
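The three steps for selection with a constant can be sketched as follows. The component layout (lists of dicts keyed by `tuple.attribute`), the function name `select_const`, and the `BOTTOM` value standing in for the deletion mark are all assumptions of this sketch.

```python
BOTTOM = None  # hypothetical stand-in for the deletion mark

def select_const(components, tuple_ids, attr, const):
    """Return new components for the selection attr = const."""
    result = []
    for comp in components:
        new_comp = []
        for local in comp:
            row = dict(local)  # copy, so the input stays untouched
            for tid in tuple_ids:
                field = f"{tid}.{attr}"
                # 1) only components that hold the field are relevant
                # 2) test the condition on this local world
                if field in row and row[field] != const:
                    # 3) propagate the mark to every field of the same
                    #    tuple within this row of the component
                    for name in row:
                        if name.startswith(tid + "."):
                            row[name] = BOTTOM
            new_comp.append(row)
        result.append(new_comp)
    return result

components = [
    [{"t1.SSN": 185, "t2.SSN": 186},
     {"t1.SSN": 785, "t2.SSN": 185},
     {"t1.SSN": 785, "t2.SSN": 186}],
    [{"t1.M": 1}, {"t1.M": 2}],
]
selected = select_const(components, ["t1", "t2"], "SSN", 185)
print(selected[0])
```

Components that do not contain the selected attribute are copied unchanged. In this sketch a tuple counts as absent from a world as soon as one of its fields carries the mark.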
1) Pair up the relevant fields
2) Search within the relevant components
3) Test the condition; if false, mark the field as ⊥
4) Propagate the ⊥ within the same component
Information loss:
The fact that only one tuple appears in each world is lost
1) Merge all components involving t1, t2
2) Propagate the ⊥
3) Select the field(s) for projection
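A sketch of the merge, propagate, project sequence on a single tuple whose fields live in two components. The helper names, the `tuple.attribute` field naming, and the `BOTTOM` value standing in for the deletion mark are assumptions of this sketch.

```python
from itertools import product

BOTTOM = None  # hypothetical stand-in for the deletion mark

def merge(comp_a, comp_b):
    """Combine two components into one by pairing all their local worlds."""
    return [{**a, **b} for a, b in product(comp_a, comp_b)]

def propagate(comp):
    """Within each row, mark every field of a tuple that has a marked field."""
    out = []
    for local in comp:
        dead = {f.split(".")[0] for f, v in local.items() if v is BOTTOM}
        out.append({f: (BOTTOM if f.split(".")[0] in dead else v)
                    for f, v in local.items()})
    return out

def project(comp, attrs):
    """Keep only the fields whose attribute name is in attrs."""
    return [{f: v for f, v in local.items() if f.split(".")[1] in attrs}
            for local in comp]

comp_a = [{"t1.A": 1}, {"t1.A": 2}]
comp_b = [{"t1.B": BOTTOM}, {"t1.B": 7}]   # t1 was deleted in some worlds

projected = project(propagate(merge(comp_a, comp_b)), {"A"})
print(projected)
```

Dropping `t1.B` without merging first would discard the only record that t1 is missing from some worlds, which is why the mark is pushed onto the kept field before the projection.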
[Figure: components over the fields R.t1.A, R.t2.A and S.t1.A, and the resulting components over T.t1.A and T.t2.A]
Make a copy of everything
available to another relation
Normalize:
remove t2, as it is invalid in all possible worlds
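One way to sketch this normalization step: a tuple is invalid in every world when some component marks it in each of its local worlds, and its fields can then be dropped. The data layout, the function names, and the `BOTTOM` value standing in for the deletion mark are assumptions of this sketch.

```python
BOTTOM = None  # hypothetical stand-in for the deletion mark

def invalid_everywhere(components, tid):
    """True if some component marks a field of tid in all its local worlds."""
    prefix = tid + "."
    for comp in components:
        if comp and all(
            any(v is BOTTOM for f, v in local.items() if f.startswith(prefix))
            for local in comp
        ):
            return True
    return False

def normalize(components, tuple_ids):
    """Drop tuples that exist in no world, then duplicate and empty rows."""
    dead = {t for t in tuple_ids if invalid_everywhere(components, t)}
    result = []
    for comp in components:
        unique = []
        for local in comp:
            row = {f: v for f, v in local.items()
                   if f.split(".")[0] not in dead}
            if row and row not in unique:
                unique.append(row)
        if unique:  # a component left without fields disappears
            result.append(unique)
    return result

components = [
    [{"t1.A": 1, "t2.A": BOTTOM}, {"t1.A": 2, "t2.A": BOTTOM}],
    [{"t2.B": 5}, {"t2.B": 6}],
]
normalized = normalize(components, ["t1", "t2"])
print(normalized)
```

Here t2 is marked in both local worlds of the first component, so all of its fields are removed, including the second component that held only `t2.B`.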



Dataset: 5% extract from the 1990 US census,
50 multiple-choice questions,
~12.5 million tuples
Adding noise to the data: replace some answers with or-sets
Noise density: 0.005%, 0.01%, 0.05%, 0.1%
(e.g. 0.1% means 1 in 1000 fields is replaced by an or-set)
Queries: selection, projection, rename
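The noise step could look roughly like the sketch below. It is not the authors' generator: the function name `add_noise`, the fixed seed, and the choice of or-sets with exactly two alternatives (the original answer plus one other) are assumptions.

```python
import random

def add_noise(table, density, domain, seed=0):
    """Replace a fraction `density` of all fields with two-valued or-sets."""
    rng = random.Random(seed)
    rows, cols = len(table), len(table[0])
    n_noisy = round(rows * cols * density)
    noisy = [list(r) for r in table]  # copy, so the input stays certain
    # Pick distinct field positions so exactly n_noisy fields change.
    for idx in rng.sample(range(rows * cols), n_noisy):
        r, c = divmod(idx, cols)
        original = noisy[r][c]
        other = rng.choice([v for v in domain if v != original])
        noisy[r][c] = frozenset({original, other})
    return noisy

# Toy table: 200 surveys with 50 answers each, every answer equal to 1.
table = [[1] * 50 for _ in range(200)]
noisy = add_noise(table, density=0.001, domain=[1, 2, 3, 4])
n_or_sets = sum(isinstance(v, frozenset) for row in noisy for v in row)
print(n_or_sets)
```

At 0.1% density, 10 of the 10,000 fields in the toy table become or-sets, matching the "1 in 1000" reading above.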
1) X-axis: number of tuples
2) Y-axis: time in seconds
3) Data with different noise densities is used for the experiments
1) Larger noise density means more possible worlds
2) Query time over multiple worlds is comparable to query time over a single world
Explanation for the comparable query time:
- in practice, there are rather few differences between the worlds,
making the mapping and the components relatively small.