thau_isei5_logic_and_taxonomy

Towards a Logic Formalization of
Taxonomic Concepts
Dave Thau, Bertram Ludäscher, Shawn Bowers
UC Davis
[email protected]
Names are Confusing
Adapted from R. Peet
[Figure: diagram with the labels Gray 1834, Chapman 1860, Kral 1998; Ranunculus plumosa (several times), R. plumosa var intermedia, R. plumosa var plumosa, Ranunculus pinetcola, Ranunculus homunculus.]
5th International Conference on Ecological
Informatics
Thau
2006
Impact on Data Analysis
• Can’t find data
– If A ≡ B (A and B are equivalent), a search on A should retrieve B
– Same if A ⊇ B (A includes B)
• Can’t aggregate data
– If A ⊆ B (A is included in B), you should be able to combine data from A into B
Where In Greece Can I Find Ranunculus aquatilis?
[Figure: two panels labeled R. aquatilis and R. trichophyllus.]
Mapping Taxonomies
[Figure: two taxonomies of Ranunculus aquatilis, Benson, 1948 and FNA-03, 1997. Node labels: Ranunculus aquatilis (in each taxonomy), R.a. var calvescens, R.a. var capillaceus, R.a. var aquatilis, R.a. var diffusus, R.a. var hispidulus.]
Each pair of concepts A and B, one from each taxonomy, can stand in one of five relationships:
A ≡ B
A ⊂ B
B ⊂ A
A overlap B
A disjoint B
This results in 5^12 (more than 240 million) possible sets of relationships.
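The size of that search space can be checked with a few lines of Python. This is only a sketch: it assumes five possible relationships per pair and 12 cross-taxonomy pairs (for example 3 concepts in one taxonomy and 4 in the other). The slide gives the total but not this breakdown, and the relation names and `count_mappings` are illustrative.

```python
# Assumption (not stated on the slide): 12 cross-taxonomy concept pairs,
# e.g. 3 concepts on one side and 4 on the other, and five relations per pair.
RELATIONS = ["congruent", "included_in", "includes", "overlap", "disjoint"]


def count_mappings(n_left: int, n_right: int, n_relations: int = len(RELATIONS)) -> int:
    """Number of ways to pick one relation for every cross-taxonomy pair."""
    pairs = n_left * n_right
    return n_relations ** pairs


total = count_mappings(3, 4)
print(f"{total:,} possible sets of relationships")
```

Most of these candidate sets are logically impossible, which is why automated reasoning over the mappings is attractive.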
Overview
• The problems – Names change, experts
disagree, data become incomparable
• The partial solution – Taxonomic Concepts
• Another part of the solution – Logic
• Representing taxonomy in logic
• Using the representation to detect
inconsistencies and discover new relations
• Applications
Logic, why?
• Precise modeling language
• Solid mathematical basis
• Good tools for reasoning are available
• Explicit, “portable” representation (not buried in code)
Basic Taxonomy
• Rooted tree
• Only “isa” relations
[Figure: node A with two children, B and C, each joined to A by an isa edge.]
T = (N, E)
N = {A, B, C}
E = {B →isa A, C →isa A}
Φ_isa(T) = { ∀x: m(x) → n(x) | m →isa n ∈ E, T = (N, E) }
In the basic taxonomy, T is described by Φ_isa(T) alone.
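As a rough illustration of the isa reading, the sketch below turns each edge of a small tree into an implication and checks it against a finite set of specimens. The function names, the string syntax of the formulas and the specimen identifiers are all assumptions made for this example, not part of the talk.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (child, parent), read "child isa parent"


def isa_formulas(edges: List[Edge]) -> List[str]:
    """Render one implication per isa edge."""
    return [f"forall x: {m}(x) -> {n}(x)" for m, n in edges]


def satisfies_isa(edges: List[Edge], ext: Dict[str, Set[str]]) -> bool:
    """True if every child's extension is a subset of its parent's."""
    return all(ext[m] <= ext[n] for m, n in edges)


# The three-node taxonomy from the slide.
nodes = ["A", "B", "C"]
edges = [("B", "A"), ("C", "A")]

# Hypothetical specimens assigned to each node.
ext = {"A": {"x1", "x2"}, "B": {"x1"}, "C": {"x2"}}

print(isa_formulas(edges))
print(satisfies_isa(edges, ext))
```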
Some Additional Constraints
• No empty nodes
– All nodes have at least one element
– { ∃x: n(x) | n ∈ N, T = (N, E) }
• Disjointness
[Figure: node A with children B and C, each joined to A by an isa edge.]
– The children of a node are disjoint
– { ¬∃x: n1(x) ∧ n2(x) | n1 →isa m ∈ E, n2 →isa m ∈ E, n1 ≠ n2, T = (N, E) }
• Closed World
– A node with children is defined as the union of those
children
– This one’s formula is a bit long – trust me…
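The three constraints can be illustrated extensionally, on finite sets of specimens. The sketch below is an assumption-laden stand-in for the formulas: in particular, the closed-world check simply tests that a parent's extension equals the union of its children's, which is one reading of the bullet above rather than the formula omitted from the slide. All names are illustrative.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (child, parent)


def children(edges: List[Edge], parent: str) -> List[str]:
    return [c for c, p in edges if p == parent]


def non_empty(nodes: List[str], ext: Dict[str, Set[int]]) -> bool:
    """Every node has at least one element."""
    return all(ext[n] for n in nodes)


def siblings_disjoint(nodes: List[str], edges: List[Edge], ext: Dict[str, Set[int]]) -> bool:
    """No two distinct children of the same node share an element."""
    return all(
        not (ext[a] & ext[b])
        for p in nodes
        for a, b in combinations(children(edges, p), 2)
    )


def closed_world(nodes: List[str], edges: List[Edge], ext: Dict[str, Set[int]]) -> bool:
    """Every node with children equals the union of those children."""
    for p in nodes:
        kids = children(edges, p)
        if kids and ext[p] != set().union(*(ext[k] for k in kids)):
            return False
    return True


nodes = ["A", "B", "C"]
edges = [("B", "A"), ("C", "A")]
good = {"A": {1, 2}, "B": {1}, "C": {2}}  # hypothetical specimens
print(non_empty(nodes, good), siblings_disjoint(nodes, edges, good), closed_world(nodes, edges, good))
```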
Mapping Formulae
• Mappings between nodes in two different taxonomies have their own formulae
• In the slides and proofs to come I will use these symbols:
A ⊆ B: A is included in B
A ⊇ B: A includes B
A ≡ B: A and B are equivalent
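One plausible first-order reading of the three mapping symbols is sketched below, together with the corresponding set comparisons. The formula templates are an assumption (the slide names the relations but does not spell out their formulae), and the function and key names are made up for this example.

```python
from typing import FrozenSet


def included_in(a: FrozenSet[int], b: FrozenSet[int]) -> bool:
    """A is included in B."""
    return a <= b


def includes(a: FrozenSet[int], b: FrozenSet[int]) -> bool:
    """A includes B."""
    return a >= b


def equivalent(a: FrozenSet[int], b: FrozenSet[int]) -> bool:
    """A and B are equivalent."""
    return a == b


def mapping_formula(rel: str, a: str, b: str) -> str:
    """Assumed first-order reading of each mapping relation."""
    templates = {
        "included_in": "forall x: {a}(x) -> {b}(x)",
        "includes": "forall x: {b}(x) -> {a}(x)",
        "equivalent": "forall x: {a}(x) <-> {b}(x)",
    }
    return templates[rel].format(a=a, b=b)


print(mapping_formula("included_in", "A", "B"))
print(included_in(frozenset({1}), frozenset({1, 2})))
```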
Inferring Unstated Correspondences
[Figure: Benson, 1948 has Ranunculus arizonicus with children R.a. var chihuahua and R.a. var typicus; Kartesz, 2004 has Ranunculus arizonicus. Arrows between the two taxonomies mark two given mappings and one mapping that can be demonstrated.]
Peet, 2005:
B.1948:R.a.typicus is included in K.2004:R. arizonicus
B.1948:R. arizonicus is congruent to K.2004:R. arizonicus
Proving New Mappings
[Figure: Benson, 1948: A = Ranunculus arizonicus, with children B = R.a. var chihuahua and C = R.a. var typicus. Kartesz, 2004: D = Ranunculus arizonicus. The relationship between B and D is marked “?”.]
Show B ⊆ D and ¬(D ⊆ B)
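The goal on this slide can be sanity-checked by brute force. The sketch below enumerates every assignment of specimens to A, B, C and D over a tiny universe, keeps those that satisfy the taxonomy constraints and the two mappings from Peet, 2005, and confirms that in all of them B is included in D while D is not included in B. Reading the goal this way is an assumption, as is the three-element universe. A finite check like this is an illustration, not a substitute for the formal proof.

```python
from itertools import combinations, product

UNIVERSE = (0, 1, 2)  # hypothetical specimens; size chosen for illustration


def subsets(universe):
    """All subsets of the universe as frozensets."""
    return [
        frozenset(c)
        for r in range(len(universe) + 1)
        for c in combinations(universe, r)
    ]


def models():
    """Yield every (A, B, C, D) that satisfies the constraints and mappings."""
    for A, B, C, D in product(subsets(UNIVERSE), repeat=4):
        isa = B <= A and C <= A        # B isa A, C isa A
        non_empty = all((A, B, C, D))  # no empty nodes
        disjoint = not (B & C)         # siblings B and C are disjoint
        closed = A == B | C            # A is the union of its children
        mapping = C <= D and A == D    # the two given mappings
        if isa and non_empty and disjoint and closed and mapping:
            yield A, B, C, D


found = list(models())
# The claimed mapping must hold in every surviving model.
holds = all(B <= D and not D <= B for A, B, C, D in found)
print(len(found), holds)
```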
Formal Proof of Mapping
Part 1
Part 2
Inconsistent Mapping
[Figure: Benson, 1948 has Ranunculus hydrocharoides with children R.h. var natans, R.h. var stolonifer and R.h. var typicus; Kartesz, 2004 has Ranunculus hydrocharoides with children R.h. var stolonifer and R.h. var typicus. Arrows mark the mappings listed below.]
Peet, 2005:
B.1948:R.h.stolonifer is congruent to K.2004:R.h.stolonifer
B.1948:R.h.typicus is congruent to K.2004:R.h.typicus
B.1948:R. hydrocharoides is congruent to K.2004:R. hydrocharoides
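A brute-force search makes the inconsistency visible. This is a sketch under stated assumptions: specimens are drawn from a small hypothetical universe, the three congruences are applied by reusing Benson's sets for Kartesz's concepts, and the closed-world constraint is read as "parent equals the union of its children". The variable names (BH for Benson's R. hydrocharoides, BN for var natans, and so on) are illustrative.

```python
from itertools import combinations, product


def subsets(universe):
    """All subsets of the universe as frozensets."""
    return [
        frozenset(c)
        for r in range(len(universe) + 1)
        for c in combinations(universe, r)
    ]


def consistent(universe) -> bool:
    """True if some assignment satisfies every constraint and mapping."""
    for BH, BN, BS, BT in product(subsets(universe), repeat=4):
        KH, KS, KT = BH, BS, BT  # the three congruences from Peet, 2005
        non_empty = all((BH, BN, BS, BT))
        disjoint = not (BN & BS or BN & BT or BS & BT)
        closed = BH == BN | BS | BT and KH == KS | KT
        if non_empty and disjoint and closed:
            return True
    return False


print(consistent((0, 1, 2)))
```

No assignment survives: the closed world forces var natans inside the union of the other two varieties, disjointness forces it outside them, and non-emptiness forbids it from being empty.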
Proving Inconsistency
[Figure: the Benson, 1948 and Kartesz, 2004 taxonomies of Ranunculus hydrocharoides from the Inconsistent Mapping slide.]
Formal Proof of Inconsistency
Showing Inconsistency Using Popular Tools
[Figure: Benson, 1948 has Ranunculus with children Ranunculus macranthus, Ranunculus petiolaris, …; Kartesz, 2004 has Ranunculus with children Ranunculus petiolaris, ….]
B.48:R. petiolaris ⊆ K.04:R. petiolaris ⊆ B.48:R. macranthus contradicts
B.48:R. macranthus and B.48:R. petiolaris are disjoint.
Peet, 2005:
B.1948:R. macranthus contains K.2004:R. petiolaris
B.1948:R. petiolaris is contained by K.2004:R. petiolaris
Resolving Inconsistencies
• The inconsistency comes from trying to satisfy non-emptiness, disjointness and the closed world simultaneously
• Relaxing any one of these makes the mapping consistent, giving us clues to hidden truths
• It turns out that Kartesz and Benson focus on different localities.
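The effect of relaxing a constraint can be sketched with a small brute-force search. The example assumes the bullets refer to the R. hydrocharoides mapping (three congruences between Benson, 1948 and Kartesz, 2004), uses a hypothetical three-specimen universe, and reads the closed world as "parent equals the union of its children". The flag names are illustrative.

```python
from itertools import combinations, product


def subsets(universe):
    """All subsets of the universe as frozensets."""
    return [
        frozenset(c)
        for r in range(len(universe) + 1)
        for c in combinations(universe, r)
    ]


def consistent(universe, need_non_empty=True, need_disjoint=True, need_closed=True):
    """Search for an assignment satisfying the mappings and the chosen constraints."""
    for BH, BN, BS, BT in product(subsets(universe), repeat=4):
        KH, KS, KT = BH, BS, BT  # the three congruences from Peet, 2005
        # The isa edges always hold (Kartesz's follow from the congruences).
        if not (BN <= BH and BS <= BH and BT <= BH):
            continue
        if need_non_empty and not all((BH, BN, BS, BT)):
            continue
        if need_disjoint and (BN & BS or BN & BT or BS & BT):
            continue
        if need_closed and not (BH == BN | BS | BT and KH == KS | KT):
            continue
        return True
    return False


U = (0, 1, 2)
print("all constraints:", consistent(U))
print("without non-emptiness:", consistent(U, need_non_empty=False))
print("without disjointness:", consistent(U, need_disjoint=False))
print("without closed world:", consistent(U, need_closed=False))
```

Which relaxation is appropriate is a domain question: dropping the closed world, for instance, allows Kartesz's R. hydrocharoides to contain specimens that belong to none of his listed varieties.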
Summary
• Taxonomic Concepts are important
• Logic is a useful tool when reasoning
about mappings between taxonomies
• We have the beginnings of a
representation for taxonomies
• That representation can find unstated
mappings
• And detect inconsistent mappings
Future Work
• Beefing up the representation
– Formalizing more constraints, such as rank
– Working in other factors, such as locality
• Adding ‘intelligence’ to tools which build
mappings
• Using the representation in a workflow
system to aid data integration
Thanks! Questions?
• We would like to acknowledge:
– Bob Peet for the Ranunculus data set
– NSF, under SEEK awards 0225676, 0225665,
0225635, and 0533368