Dan Olteanu, Christoph Koch, Lyublena Antova (ICDE 2007 paper)
Presenter: For

Outline
  Motivation of the study
  Efficient representation of incomplete data
  Relational algebra queries on incomplete data
  Experiment

Managing uncertain data
  185 or 785? Single or married?
  185 or 186? Marital status?

Storing uncertain data
  Is the or-set relation practical?
  Data cleaning: the social security number must be unique.
  The or-set relation fails to represent the result after cleaning!

Imposing a constraint
  To preserve all information, instead of storing the or-set relation, you need to store …

Possible worlds (24 worlds; each row below stands for four worlds, one per listed value of T2.Status. Columns: social security number, name, and marital status of tuples T1 and T2.)

  T1.SSN  T1.Name  T1.Status  T2.SSN  T2.Name  T2.Status
  185     Smith    1          186     Brown    1, 2, 3, 4
  185     Smith    2          186     Brown    1, 2, 3, 4
  785     Smith    1          185     Brown    1, 2, 3, 4
  785     Smith    1          186     Brown    1, 2, 3, 4
  785     Smith    2          185     Brown    1, 2, 3, 4
  785     Smith    2          186     Brown    1, 2, 3, 4

Census Survey
  Setting: 50 questions per survey
  Population: 200 million = 2*10^8
  [Figure: certain DB, a table of dimensions 2*10^8 by 50]
  Total answers = population * questions = (2*10^8) * 50 = 10^10
  Error rate: 1 in 10^4
  Uncertain answers = total answers * error rate = 10^10 / 10^4 = 10^6
  Possible worlds: 2^(10^6)
  [Figure: the certain DB (2*10^8 by 50) next to a table of dimensions 50*(2*10^8) by 2^(10^6)]
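The census estimate above is plain arithmetic and can be checked in a few lines. This is a minimal sketch, assuming (as the figure 2^(10^6) implies) that every uncertain answer has exactly two alternatives and that the uncertain answers are independent of each other. The variable names are illustrative and not taken from the slides.

```python
import math

population = 2 * 10**8          # 200 million people
questions_per_survey = 50
error_rate_denominator = 10**4  # one answer in 10^4 is uncertain

total_answers = population * questions_per_survey
uncertain_answers = total_answers // error_rate_denominator

# Assumption: two alternatives per uncertain answer, all independent,
# so the number of possible worlds is 2 ** uncertain_answers.
# The digit count is derived with logarithms instead of printing the number.
decimal_digits = math.floor(uncertain_answers * math.log10(2)) + 1

print(f"total answers:     {total_answers:.0e}")
print(f"uncertain answers: {uncertain_answers:.0e}")
print(f"possible worlds:   2^{uncertain_answers} ({decimal_digits} decimal digits)")
```

The point of the exercise is the last line: the world count itself is a number with several hundred thousand digits, so listing the worlds one by one is out of the question.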
Uncertain DB: possible worlds

World-set table without constraints (32 worlds; each row below stands for four worlds, one per listed value of T2.Status. Columns: social security number, name, and marital status of tuples T1 and T2.)

  T1.SSN  T1.Name  T1.Status  T2.SSN  T2.Name  T2.Status
  185     Smith    1          185     Brown    1, 2, 3, 4
  185     Smith    2          185     Brown    1, 2, 3, 4
  185     Smith    1          186     Brown    1, 2, 3, 4
  185     Smith    2          186     Brown    1, 2, 3, 4
  785     Smith    1          185     Brown    1, 2, 3, 4
  785     Smith    1          186     Brown    1, 2, 3, 4
  785     Smith    2          185     Brown    1, 2, 3, 4
  785     Smith    2          186     Brown    1, 2, 3, 4

Does it work when constraints are introduced?
  Yes, it works.
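The world counts in the tables above can be reproduced by enumerating every combination of the or-set values and then applying the uniqueness constraint. The sketch below is illustrative only: the dictionary layout and the names `or_set_relation`, `unique_ssn`, and `refit` are assumptions made for this example, and brute-force enumeration is used purely to make the counts visible, not as the method the paper proposes.

```python
from itertools import product
from math import prod

# Or-set relation read off the example: each field lists its alternatives.
or_set_relation = {
    ("t1", "SSN"): [185, 785],
    ("t1", "Name"): ["Smith"],
    ("t1", "Status"): [1, 2],
    ("t2", "SSN"): [185, 186],
    ("t2", "Name"): ["Brown"],
    ("t2", "Status"): [1, 2, 3, 4],
}

fields = list(or_set_relation)

# One world per combination of alternatives.
all_worlds = [dict(zip(fields, choice))
              for choice in product(*or_set_relation.values())]

def unique_ssn(world):
    """Data-cleaning constraint: the two tuples must not share an SSN."""
    return world[("t1", "SSN")] != world[("t2", "SSN")]

clean_worlds = [w for w in all_worlds if unique_ssn(w)]

# Try to describe the cleaned world set with or-sets again:
# collect the values each field takes and count the combinations.
refit = {f: {w[f] for w in clean_worlds} for f in fields}
refit_count = prod(len(values) for values in refit.values())

print(len(all_worlds), len(clean_worlds), refit_count)
```

The re-fitted or-sets describe more worlds than survive the cleaning step, because the choice of T1's number and the choice of T2's number are no longer independent. That gap is what a representation with constraints has to close.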
[Figure: the 32-world table again, with the label "component"]

Intuition of the World Set Decomposition
  Store independent tuple fields in separate components.
  Store dependent tuple fields within the same component.

Selection with constant
1) Search within the relevant components.
2) Test the condition; if it is false, mark the field as deleted.
3) Propagate the mark within the same component.

Selection with variables
1) Pair up the relevant fields.
2) Search within the relevant components.
3) Test the condition; if it is false, mark the field as deleted.
4) Propagate the mark within the same component.

Projection
  Information loss: the information that only one tuple appears in each world is lost.
1) Merge all components involving t1, t2.
2) Propagate the mark.
3) Select the field(s) for projection.

Difference

  R.t1.A  R.t2.A  S.t1.A
  1       2       1
  ..      …       …
  ..      …       …
  2       3       3

  T.t1.A  T.t2.A
  2       …
  …       …
  …       2

Union / Product
  Make a copy of everything available to another relation.

Normalization of query answers
  Normalize: remove t2, as it is invalid in all possible worlds.

Experiment
  Data set: 5% extract from the 1990 US census, 50 multiple-choice questions, ~12.5 million tuples.
  Adding noise to the data: replace some answers with or-sets.
  Noise density: 0.005%, 0.01%, 0.05%, 0.1% (0.1% means that 1 in 1000 fields is replaced by an or-set).
  Queries: selection, projection, rename.

  Plots:
1) X-axis: number of tuples
2) Y-axis: time in seconds
3) Data sets with different noise densities are used for the experiments.

  Results:
1) The larger the noise density, the more possible worlds.
2) Query time on multiple worlds is comparable to query time on a single world.

  Explanation: in practice there are rather few differences between the worlds, so the mapping and the components stay relatively small.
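The decomposition intuition and the three steps of selection with a constant described earlier can be sketched on the Smith/Brown example. This is a toy illustration under stated assumptions, not the paper's implementation: a component is modelled as a pair (fields, rows), the names `wsd`, `DELETED`, `count_worlds`, `select_constant`, and `worlds` are hypothetical, and "propagate" is interpreted here as marking every field of the failing tuple in the same component row.

```python
from itertools import product

DELETED = object()  # hypothetical marker: tuple is absent in this local world

# Each component is (fields, rows); a field is a (tuple id, attribute) pair.
# The two SSN fields depend on each other (uniqueness), so they share a component.
wsd = [
    ((("t1", "SSN"), ("t2", "SSN")), [(185, 186), (785, 185), (785, 186)]),
    ((("t1", "Name"),), [("Smith",)]),
    ((("t1", "Status"),), [(1,), (2,)]),
    ((("t2", "Name"),), [("Brown",)]),
    ((("t2", "Status"),), [(1,), (2,), (3,), (4,)]),
]

def count_worlds(wsd):
    """A world picks one row per component, so the counts multiply."""
    n = 1
    for _, rows in wsd:
        n *= len(rows)
    return n

def select_constant(wsd, attribute, constant):
    """Selection attribute = constant, applied component by component."""
    result = []
    for fields, rows in wsd:
        new_rows = []
        for row in rows:
            row = list(row)
            # Steps 1-2: test the condition on the relevant fields of this row.
            failed = {t for (t, a), v in zip(fields, row)
                      if a == attribute and v != constant}
            # Step 3: propagate the mark to that tuple's fields in this row.
            for i, (t, _) in enumerate(fields):
                if t in failed:
                    row[i] = DELETED
            new_rows.append(tuple(row))
        result.append((fields, new_rows))
    return result

def worlds(wsd):
    """Expand a decomposition into explicit worlds (only viable for toy inputs)."""
    for combo in product(*(rows for _, rows in wsd)):
        tuples = {}
        for (fields, _), row in zip(wsd, combo):
            for (t, a), v in zip(fields, row):
                tuples.setdefault(t, {})[a] = v
        # A tuple exists in a world only if none of its fields is marked.
        yield frozenset(tuple(sorted(attrs.items()))
                        for attrs in tuples.values()
                        if all(v is not DELETED for v in attrs.values()))

stored_cells = sum(len(fields) * len(rows) for fields, rows in wsd)
selected = select_constant(wsd, "SSN", 185)
distinct_answers = set(worlds(selected))
print(count_worlds(wsd), stored_cells, len(distinct_answers))
```

In this toy case the decomposition keeps 14 field values where the explicit world table needs 24 rows of 6 fields, and the selection only touches the one component that contains SSN fields. That is in line with the explanation given for the experiment: when worlds differ in few places, the components stay small.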