Towards a Logic Formalization of Taxonomic Concepts
Dave Thau, Bertram Ludäscher, Shawn Bowers
UC Davis
[email protected]
5th International Conference on Ecological Informatics

Names are Confusing
[Figure, adapted from R. Peet. Labels: Gray 1834, Chapman 1860, Kral 1998, Thau 2006; Ranunculus plumosa, R.plumosa var intermedia, R.plumosa var plumosa, Ranunculus pinetcola, Ranunculus homunculus]

Impact on Data Analysis
• Can't find data
  – If $A \equiv B$, a search on A should retrieve B
  – Same if $A \supseteq B$
• Can't aggregate data
  – If $A \subseteq B$, you should be able to combine data from A into B

Where In Greece Can I Find Ranunculus aquatilis?
[Figure: panels labeled R. aquatilis and R. trichophyllus]

Mapping Taxonomies
[Figure: two taxonomies of Ranunculus aquatilis, Benson, 1948 and FNA-03, 1997. Node labels: Ranunculus aquatilis (in each taxonomy), R.a. var calvescens, R.a. var capillaceus, R.a. var aquatilis, R.a. var diffusus, R.a. var hispidulus]
Each pair of nodes A and B, one from each taxonomy, can stand in one of five relationships: A is equivalent to B, A is included in B, A includes B, A and B overlap, or A and B are disjoint.
This results in $5^{12}$ (more than 240 million) possible sets of relationships.

Overview
• The problems
  – Names change, experts disagree, data become incomparable
• The partial solution
  – Taxonomic Concepts
• Another part of the solution
  – Logic
• Representing taxonomy in logic
• Using the representation to detect inconsistencies and discover new relations
• Applications

Logic, why?
• Precise modeling language
• Solid mathematical basis
• Good tools for reasoning are available
• Explicit, "portable" representation (not buried in code)

Basic Taxonomy
• Rooted tree
• Only "isa" relations
Example: $T = (N, E)$, $N = \{A, B, C\}$, $E = \{B\ \mathit{isa}\ A,\ C\ \mathit{isa}\ A\}$
$\Phi_{isa}(T) = \{\, \forall x: m(x) \rightarrow n(x) \mid (m\ \mathit{isa}\ n) \in E,\ T = (N, E) \,\}$
In the basic taxonomy T, $\Phi(T) = \Phi_{isa}(T)$.

Some Additional Constraints
• No empty nodes
  – All nodes have at least one element
  – $\{\, \exists x: n(x) \mid n \in N,\ T = (N, E) \,\}$
• Disjointness
  – The children of a node are disjoint
  – $\{\, \neg\exists x: n_1(x) \wedge n_2(x) \mid (n_1\ \mathit{isa}\ m) \in E,\ (n_2\ \mathit{isa}\ m) \in E,\ n_1 \neq n_2,\ T = (N, E) \,\}$
• Closed World
  – A node with children is defined as the union of those children
  – This one's formula is a bit long; trust me…

Mapping Formulae
• Mappings between nodes in two different taxonomies have their own formulae
• In the slides and proofs to come I will use these symbols:
  $A \subseteq B$: A is included in B
  $A \supseteq B$: A includes B
  $A \equiv B$: A and B are equivalent

Inferring Unstated Correspondences
Given:
  Benson, 1948
    Ranunculus arizonicus
      R.a. var chihuahua
      R.a. var typicus
  Kartesz, 2004
    Ranunculus arizonicus
Given:
  Peet, 2005:
    B.1948:R.a.typicus is included in K.2004:R. arizonicus
    B.1948:R. arizonicus is congruent to K.2004:R. arizonicus
We can demonstrate: a relationship between B.1948:R.a. var chihuahua and K.2004:R. arizonicus that the mapping does not state.

Proving New Mappings
Benson, 1948
  A: Ranunculus arizonicus
    B: R.a. var chihuahua
    C: R.a. var typicus
Kartesz, 2004
  D: Ranunculus arizonicus
How is B related to D?
Show $B \subseteq D$ and $\neg(D \subseteq B)$

Formal Proof of Mapping
[Figure: formal proof, Part 1 and Part 2]

Inconsistent Mapping
Benson, 1948
  Ranunculus hydrocharoides
    R.h. var natans
    R.h. var stolonifer
    R.h. var typicus
Kartesz, 2004
  Ranunculus hydrocharoides
    R.h. var stolonifer
    R.h. var typicus
Peet, 2005:
  B.1948:R.h.stolonifer is congruent to K.2004:R.h.stolonifer
  B.1948:R.h.typicus is congruent to K.2004:R.h.typicus
  B.1948:R. hydrocharoides is congruent to K.2004:R. hydrocharoides

Proving Inconsistency
[Figure: the Benson, 1948 and Kartesz, 2004 taxonomies of Ranunculus hydrocharoides shown on the previous slide]

Formal Proof of Inconsistency
[Figure: formal proof]

Showing Inconsistency Using Popular Tools
Benson, 1948
  Ranunculus
    Ranunculus macranthus
    Ranunculus petiolaris
    …
Kartesz, 2004
  Ranunculus
    Ranunculus petiolaris
    …
Peet, 2005:
  B.1948:R. macranthus contains K.2004:R. petiolaris
  B.1948:R. petiolaris is contained by K.2004:R. petiolaris
B.48:R. petiolaris $\subseteq$ K.04:R. petiolaris $\subseteq$ B.48:R. macranthus contradicts: B.48:R. macranthus and B.48:R. petiolaris are disjoint.

Resolving Inconsistencies
• The mapping tries to simultaneously satisfy no emptiness, disjointness and the closed world
• Relaxing any of these makes the mapping consistent, giving us clues to hidden truths
• It turns out that Kartesz and Benson focus on different localities.

Summary
• Taxonomic Concepts are important
• Logic is a useful tool when reasoning about mappings between taxonomies
• We have the beginnings of a representation for taxonomies
• That representation can find unstated mappings
• And detect inconsistent mappings

Future Work
• Beefing up the representation
  – Formalizing more constraints, such as rank
  – Working in other factors, such as locality
• Adding 'intelligence' to tools which build mappings
• Using the representation in a workflow system to aid data integration

Thanks! Questions?
• We would like to acknowledge:
  – Bob Peet for the Ranunculus data set
  – NSF, under SEEK awards 0225676, 0225665, 0225635, and 0533368
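
The two Ranunculus examples in the slides (inferring a mapping for R. arizonicus, detecting the inconsistency for R. hydrocharoides) can be reproduced with a small brute-force check. This is a minimal sketch, not the prover used for the talk: it assumes every axiom is about one element at a time, enumerates all membership patterns of a single element, and keeps those that satisfy the isa, disjointness, closed-world and mapping axioms. All function names and the short concept labels (`Bh`, `Kn`, and so on) are hypothetical, chosen only for this illustration.

```python
from itertools import combinations, product

def taxonomy_axioms(edges):
    """Build isa, sibling-disjointness and closed-world axioms from (child, parent) edges."""
    children = {}
    for child, parent in edges:
        children.setdefault(parent, []).append(child)
    axioms = []
    for child, parent in edges:  # isa: child(x) -> parent(x)
        axioms.append(lambda t, c=child, p=parent: (not t[c]) or t[p])
    for parent, kids in children.items():
        for k1, k2 in combinations(kids, 2):  # siblings are disjoint
            axioms.append(lambda t, a=k1, b=k2: not (t[a] and t[b]))
        # closed world: parent(x) -> some child(x)
        axioms.append(lambda t, p=parent, ks=tuple(kids): (not t[p]) or any(t[k] for k in ks))
    return axioms

def included(a, b):
    """Mapping axiom: a is included in b."""
    return lambda t: (not t[a]) or t[b]

def congruent(a, b):
    """Mapping axiom: a and b are equivalent."""
    return lambda t: t[a] == t[b]

def allowed_types(concepts, axioms):
    """All membership patterns of one element that satisfy every axiom."""
    patterns = (dict(zip(concepts, bits))
                for bits in product([False, True], repeat=len(concepts)))
    return [t for t in patterns if all(ax(t) for ax in axioms)]

def consistent(concepts, types):
    """No-empty-nodes: every concept must be inhabited by some allowed pattern."""
    return all(any(t[c] for t in types) for c in concepts)

def entails_included(types, a, b):
    """a is included in b in every model (meaningful only if the theory is consistent)."""
    return not any(t[a] and not t[b] for t in types)

def entails_not_included(concepts, types, a, b):
    """Every model has an element in a but not b: removing such patterns empties a node."""
    rest = [t for t in types if not (t[a] and not t[b])]
    return not consistent(concepts, rest)

# R. arizonicus: A, B, C from Benson 1948, D from Kartesz 2004 (labels as on the slide).
c1 = ["A", "B", "C", "D"]
ax1 = taxonomy_axioms([("B", "A"), ("C", "A")]) + [included("C", "D"), congruent("A", "D")]
t1 = allowed_types(c1, ax1)
print(consistent(c1, t1))                      # True
print(entails_included(t1, "B", "D"))          # True
print(entails_not_included(c1, t1, "D", "B"))  # True

# R. hydrocharoides: B* = Benson 1948, K* = Kartesz 2004; n/s/t = natans/stolonifer/typicus.
c2 = ["Bh", "Bn", "Bs", "Bt", "Kh", "Ks", "Kt"]
ax2 = (taxonomy_axioms([("Bn", "Bh"), ("Bs", "Bh"), ("Bt", "Bh"), ("Ks", "Kh"), ("Kt", "Kh")])
       + [congruent("Bs", "Ks"), congruent("Bt", "Kt"), congruent("Bh", "Kh")])
t2 = allowed_types(c2, ax2)
print(consistent(c2, t2))                      # False
```

The enumeration is exponential in the number of concepts, so it only suits slide-sized examples; it is meant to make the argument concrete rather than to replace a first-order reasoner. In the second example no allowed pattern contains `Bn`, which is exactly the conflict between no emptiness, disjointness and the closed world that the slides describe.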