Weighted Partonomy-Taxonomy Trees
with Local Similarity Measures for
Semantic Buyer-Seller Match-Making
Lu Yang, Marcel Ball, Virendra C. Bhavsar
and Harold Boley
BASeWEB, May 8, 2005
Outline
Introduction
Motivation
Partonomy Similarity Algorithm
– Tree representation
– Tree simplicity
– Partonomy similarity
– Experimental results
Node Label Similarity
– Inner-node similarity
– Leaf-node similarity
Conclusion
Introduction
Buyer-seller matching in e-business and e-learning
[Figure: A multi-agent system — a user's Web browser connects to a main server holding user info and user profiles; user agents meet in virtual cafes (Cafe-1 … Cafe-n), each with its own matcher (Matcher 1 … Matcher n) and links to other network sites.]
Introduction
An e-learning scenario
[Figure: Learners 1…n and Course Providers 1…m meet in a Cafe, where a Matcher pairs learners with learning objects.]
H. Boley, V. C. Bhavsar, D. Hirtle, A. Singh, Z. Sun and L. Yang, A match-making system for learners and learning objects. Learning & Leading with Technology, International Society for Technology in Education, Eugene, OR, 2005 (to appear).
Motivation
Metadata for buyers and sellers
– Keywords/keyphrases
– Trees
Tree similarity
Tree representation
Characteristics of our trees
– Node-labeled, arc-labeled and arc-weighted
– Arcs are labeled in lexicographical order
– Weights sum to 1
[Figure: Example tree — root Car with arcs Make (0.3) → Ford, Model (0.2) → Explorer, Year (0.5) → 2002.]
Tree representation – Serialization of trees
Weighted Object-Oriented RuleML
– XML attributes for arc weights and subelements for arc labels
<Cterm>
  <Ctor>Car</Ctor>
  <slot weight="0.3"><Ind>Make</Ind><Ind>Ford</Ind></slot>
  <slot weight="0.2"><Ind>Model</Ind><Ind>Explorer</Ind></slot>
  <slot weight="0.5"><Ind>Year</Ind><Ind>2002</Ind></slot>
</Cterm>
Tree serialization in WOO RuleML
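The serialization above can be read back into a weighted tree with any XML parser. A minimal sketch (the `parse_cterm` helper and the `(label, arcs)` tuple encoding are our own illustration, not part of WOO RuleML):

```python
import xml.etree.ElementTree as ET

WOO = """
<Cterm>
  <Ctor>Car</Ctor>
  <slot weight="0.3"><Ind>Make</Ind><Ind>Ford</Ind></slot>
  <slot weight="0.2"><Ind>Model</Ind><Ind>Explorer</Ind></slot>
  <slot weight="0.5"><Ind>Year</Ind><Ind>2002</Ind></slot>
</Cterm>
"""

def parse_cterm(elem):
    """Turn a Cterm element into (root_label, [(arc_label, weight, subtree)])."""
    root = elem.find("Ctor").text
    arcs = []
    for slot in elem.findall("slot"):
        weight = float(slot.get("weight"))
        label_el, value_el = list(slot)          # first child = arc label, second = filler
        if value_el.tag == "Cterm":              # nested complex term: recurse
            arcs.append((label_el.text, weight, parse_cterm(value_el)))
        else:                                    # Ind filler: a leaf node
            arcs.append((label_el.text, weight, (value_el.text, [])))
    return (root, arcs)

tree = parse_cterm(ET.fromstring(WOO))
print(tree[0])                        # Car
print(sum(w for _, w, _ in tree[1]))  # arc weights sum to 1.0
```

Because the `slot` elements appear in lexicographical arc-label order, the parsed arc list is already sorted as the algorithm expects.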
Tree simplicity
[Figure: Example tree with root A and arcs a (0.3), b (0.7), c (0.2), d (0.8), e (0.1), f (0.9) over nodes B–G; annotated node simplicities A (0.9), C (0.45), G (0.225); overall tree simplicity: 0.0563.]
– The deeper the leaf node, the less its contribution to the tree simplicity
– Depth degradation index (0.9)
– Depth degradation factor (0.5)
– Reciprocal of tree breadth
L. Yang, B. Sarker, V. C. Bhavsar and H. Boley, A weighted-tree simplicity algorithm for similarity matching of partial product descriptions (submitted for publication).
Tree simplicity – Computation

Š(T) = DI · (DF)^d                          if T is a leaf node,
Š(T) = (1/m) · Σ_{j=1}^{m} w_j · Š(T_j)     otherwise.

Š(T): the simplicity value of a single tree T
DI and DF: depth degradation index and depth degradation factor
d: depth of a leaf node
m: root node degree of tree T when T is not a leaf (1/m is the reciprocal of tree breadth)
w_j: arc weight of the jth arc below the root node of tree T
T_j: subtree below the jth arc with arc weight w_j
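The two-case definition above translates directly into a recursion. A sketch over the `(label, arcs)` tuple encoding (the encoding is our own illustration; DI and DF use the slide's values):

```python
DI = 0.9   # depth degradation index (slide value)
DF = 0.5   # depth degradation factor (slide value)

def simplicity(tree, depth=0):
    """Š(T): a leaf at depth d contributes DI * DF**d; an inner node sums
    its arc-weighted subtree simplicities and divides by its degree m
    (1/m is the reciprocal of tree breadth)."""
    label, arcs = tree
    if not arcs:                                  # leaf node
        return DI * DF ** depth
    m = len(arcs)                                 # root degree
    return sum(w * simplicity(sub, depth + 1) for _, w, sub in arcs) / m

single = ("Ford", [])
two_leaves = ("A", [("a", 0.5, ("B", [])), ("b", 0.5, ("C", []))])
print(simplicity(single))      # 0.9 (depth-0 leaf)
print(simplicity(two_leaves))  # 0.225
```

Note how both the depth degradation factor and the 1/m breadth factor shrink Š(T) as trees get deeper and bushier, which is exactly the "deeper leaves contribute less" behavior the previous slide describes.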
Partonomy similarity – Simple trees
[Figure: Simple-tree base cases — two identical single-node trees (t = t´ = Ford) have similarity 1; trees whose root labels differ (Car vs. House) have similarity 0. Inner nodes carry the arc structure (e.g. Car with Make (0.3) and Model (0.7)); leaf nodes carry the values (Ford, Mustang, Escape).]
Partonomy similarity – Complex trees
[Figure: Two learning-object metadata trees t and t´ rooted at lom, with general, technical and educational branches (arcs to gen-set, tec-set and edu-set; weights such as 0.3334/0.3333/0.3333, 0.7/0.3 and 0.8/0.2) over arcs language, title, format and platform with leaves such as en, Basic, Oracle, "Introduction to Oracle", HTML and WinXP; '*' marks a Don't Care label.]
Subtree similarities s_i are combined with arithmetically averaged arc weights, Σ_i s_i · (w_i + w´_i)/2; in the recursion each contribution is actually A(s_i) · (w_i + w´_i)/2, where A is an averaging function with A(s_i) ≥ s_i.
Partonomy similarity – Main functions
Three main functions (Relfun)
– treesim(t,t´): recursively compares any (unordered) pair of trees; parameters N and i
– treemap(l,l´): recursively maps two lists, l and l´, of labeled and weighted arcs; descends into identically labeled subtrees
– treeplicity(i,t): decreases the similarity with decreasing simplicity
V. C. Bhavsar, H. Boley and L. Yang, A weighted-tree similarity algorithm for multi-agent systems in e-business environments. Computational Intelligence, 2004, 20(4):584-602.
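The interplay of the three functions can be sketched in a much-simplified form (our illustration, not the published Relfun code): trees are `(label, [(arc, weight, subtree)])` tuples, labels use exact matching with '*' as Don't Care, A(s) = (s + 1)/2 is one averaging function satisfying A(s) ≥ s, and arcs present on only one side contribute via the simplicity measure, standing in for treeplicity.

```python
DI, DF = 0.9, 0.5  # depth degradation index and factor (slide values)

def simplicity(tree, depth=0):
    """Treeplicity stand-in: Š(T) from the tree-simplicity slide."""
    _, arcs = tree
    if not arcs:
        return DI * DF ** depth
    return sum(w * simplicity(sub, depth + 1) for _, w, sub in arcs) / len(arcs)

def treesim(t1, t2):
    """Simplified recursive tree similarity (sketch, not the paper's exact
    algorithm): compare root labels, then combine arc-wise similarities
    with arithmetically averaged arc weights."""
    (lab1, arcs1), (lab2, arcs2) = t1, t2
    if lab1 != lab2 and "*" not in (lab1, lab2):
        return 0.0                          # differing root labels
    if not arcs1 and not arcs2:
        return 1.0                          # matching leaves
    d1 = {a: (w, sub) for a, w, sub in arcs1}
    d2 = {a: (w, sub) for a, w, sub in arcs2}
    total = 0.0
    for a in set(d1) | set(d2):
        if a in d1 and a in d2:             # identically labeled arcs: recurse
            (w1, s1), (w2, s2) = d1[a], d2[a]
            total += (treesim(s1, s2) + 1) / 2 * (w1 + w2) / 2
        else:                               # arc on one side only
            w, sub = d1.get(a) or d2.get(a)
            total += simplicity(sub, 1) * w / 2
    return total
```

With weights summing to 1, identical trees come out at exactly 1.0, and the A(s) ≥ s averaging keeps partial matches from collapsing to 0 — but the exact numbers on the following experiment slides come from the full published algorithm, not this sketch.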
Similarity of simple trees
Experiments 1 and 2
[Figure: Pairs of simple auto trees with make and year arcs over the leaves ford, chrysler, 2002 and 1998. Experiment 1 uses balanced weights (0.5/0.5), Experiment 2 skewed weights (1.0/0.0 and 0.0/1.0) on trees t1–t4. Reported similarities include 0.1, 0.55 and 1.0.]
Similarity of simple trees (Cont'd)
Experiment 3
[Figure: auto trees with make, model and year arcs over the leaves ford, mustang, explorer and 2000; t1/t2 put weight 1.0 on model, while t3/t4 use weights 0.45/0.45/0.1 and 0.05/0.9/0.05. Results: 0.2823 and 0.1203.]
Similarity of identical tree structures
Experiment 4
[Figure: t1 vs. t2 and t3 vs. t4 share the structure auto → make, model, year with leaves ford, explorer and 2002 vs. 1999. With weights 0.3/0.2/0.5 the similarity is 0.55; with near-uniform weights 0.3334/0.3333/0.3333 it is 0.7000.]
Similarity of complex trees
[Figure: Two three-level trees t and t´ rooted at A, with arcs b, c, d (weights 0.3333/0.3333/0.3334) to subtrees B, C, D and leaves B1–B4, C1–C4, D1, E and F (subtree arc weights such as 0.5, 0.25 and 0.3333). Reported similarity values: 0.8160, 0.9316, 0.8996, 0.9230, 0.9647, 0.9793.]
Similarity of complex trees (Cont'd)
[Figure: The same trees t and t´ as on the previous slide, re-evaluated; reported similarity values: 0.8555, 0.9626, 0.9314, 0.9499, 0.9824, 0.9902.]
Similarity of complex trees (Cont'd)
[Figure: The same trees with a Don't Care label '*' replacing inner node C in t´; the reported similarity values rise further: 0.9134, 0.9697, 0.9530, 0.9641, 0.9844, 0.9910.]
Node label similarity
For inner nodes and leaf nodes
– Exact string matching: binary result, 0.0 or 1.0
– Permutation of strings: "Java Programming" vs. "Programming in Java"
  similarity = (number of identical words) / (maximum length of the two strings)
Example: for two node labels "a b c" and "a b d e", their similarity is 2/4 = 0.5.
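The permutation measure above is a few lines of code. A sketch (the function name is ours; stop words such as "in" are simply counted as words, as in the slide's examples):

```python
def label_sim(a, b):
    """Permutation-tolerant label similarity: number of shared words
    divided by the word count of the longer label."""
    wa, wb = a.lower().split(), b.lower().split()
    shared = len(set(wa) & set(wb))
    return shared / max(len(wa), len(wb))

print(label_sim("a b c", "a b d e"))  # 0.5, matching the slide's example
```

Because the words are compared as a set, "Java Programming" and "Programming in Java" score well despite the different word order.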
Node label similarity (Cont'd)
Example: the node labels "electric chair" and "committee chair" also score 1/2 = 0.5 — but is this a meaningful semantic similarity?
Node label similarity – Inner nodes vs. leaf nodes
Inner nodes — class-oriented
– inner node labels can be classes
– classes are located in a taxonomy tree
– taxonomic class similarity measures
Leaf nodes — type-oriented
– address, currency, date, price and so on
– type similarity measures (local similarity measures)
Node label similarity
Non-semantic matching
– exact string matching (both inner and leaf nodes)
– string permutation (both inner and leaf nodes)
Semantic matching
– taxonomic class similarity (inner nodes)
– type similarity (leaf nodes)
Inner node similarity – Partonomy trees
[Figure: Two course trees. t1: Distributed Programming with Credit (0.2) → 3, Duration (0.1) → 2 months, Textbook (0.4) → "Introduction to Distributed Programming", Tuition (0.3) → $800. t2: Object-Oriented Programming with Credit (0.1) → 3, Duration (0.2) → 3 months, Textbook (0.2) → "Object-Oriented Programming Essentials", Tuition (0.5) → $1000.]
Inner node similarity – Taxonomy tree
[Figure: Taxonomy rooted at Programming Techniques, with subsumption factors (0.2–0.9) on arcs to General, Applicative Programming, Automatic Programming, Object-Oriented Programming, Sequential Programming and Concurrent Programming; Concurrent Programming branches into Parallel Programming and Distributed Programming.]
Arc weights
– at the same level of a subtree they do not need to add up to 1
– assigned by human experts or extracted from documents
A. Singh, Weighted tree metadata extraction. MCS Thesis (in preparation), University of New Brunswick, Fredericton, Canada, 2005.
Inner node similarity – Taxonomic class similarity
[Figure: The same taxonomy; red arrows trace the paths from the two classes being compared up toward the root.]
– the red arrows stop at the nearest common ancestor
– the product of the subsumption factors on the two paths = 0.018
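The nearest-common-ancestor computation can be sketched as follows. The taxonomy fragment and its factor values here are illustrative stand-ins (the slide's exact assignment is not fully recoverable), but the mechanics — walk both paths upward, multiplying subsumption factors until the paths meet — follow the slide:

```python
# Hypothetical taxonomy fragment: child -> (parent, subsumption factor).
TAXONOMY = {
    "Object-Oriented Programming": ("Programming Techniques", 0.2),
    "Concurrent Programming":      ("Programming Techniques", 0.4),
    "Distributed Programming":     ("Concurrent Programming", 0.9),
    "Parallel Programming":        ("Concurrent Programming", 0.5),
}

def class_sim(c1, c2):
    """Product of the subsumption factors on both paths up to the
    nearest common ancestor; 1.0 for identical classes."""
    if c1 == c2:
        return 1.0
    up1, prod, node = {}, 1.0, c1
    while node in TAXONOMY:                   # record c1's ancestors with
        parent, f = TAXONOMY[node]            # the cumulative path product
        prod *= f
        up1[parent] = prod
        node = parent
    prod2, node = 1.0, c2
    while node in TAXONOMY:                   # climb from c2 until the
        parent, f = TAXONOMY[node]            # paths meet
        prod2 *= f
        if parent in up1:
            return up1[parent] * prod2
        node = parent
    return 0.0                                # no common ancestor

print(class_sim("Distributed Programming", "Parallel Programming"))
```

Siblings under Concurrent Programming stay fairly similar (0.9 × 0.5), while classes that only meet at the root — such as Distributed vs. Object-Oriented Programming — pick up factors from both long paths and score much lower, which is the effect behind the slide's 0.018.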
Inner node similarity – Integration of taxonomy tree into partonomy trees
Taxonomy tree
– requires extra taxonomic class similarity measures
Semantic similarity without
– changing our partonomy similarity algorithm
– losing taxonomic semantic similarity
Encode (subsections of) the taxonomy tree into partonomy trees
www.teclantic.ca
Inner node similarity – Encoding taxonomy tree into partonomy tree
[Figure: Encoded taxonomy tree — root Programming Techniques with arcs for Applicative Programming, Automatic Programming, General, Object-Oriented Programming, Sequential Programming and Concurrent Programming (weights 0.1–0.3, summing to 1), the latter subdivided into Parallel Programming (0.4) and Distributed Programming (0.6); every class arc ends in a Don't Care leaf '*'.]
Inner node similarity – Encoding taxonomy tree into partonomy tree (Cont'd)
[Figure: Encoded partonomy trees t1 and t2 — each course tree gains a Classification arc (weight 0.65) beside Credit, Duration, Title and Tuition (weights 0.05–0.2); the Classification subtree carries the taxonomy path (Programming Techniques → Concurrent Programming → Distributed Programming in t1, → Object-Oriented Programming in t2) with the taxonomy's arc weights (e.g. 0.7/0.3, 0.8/0.2, 0.4/0.6) and Don't Care leaves '*'.]
Leaf node similarity (local similarity)
Different leaf node types → different type similarity measures
Various leaf node types
– "Price"-typed leaf nodes, e.g. for a buyer ≤ $800, i.e. the interval [0, Max]; for a seller ≥ $1000, i.e. the interval [Min, ∞]
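The slides leave the price measure itself open, so the following is purely one plausible local measure of our own devising (function name, gap-based decay and the $1000 scale are all assumptions, not from the deck): intervals that overlap score 1.0, and disjoint intervals decay with the size of the gap between them.

```python
def price_sim(lo1, hi1, lo2, hi2, scale=1000.0):
    """Illustrative price-interval similarity (not from the slides):
    1.0 for overlapping intervals, otherwise linear decay in the gap,
    floored at 0.0."""
    gap = max(lo1 - hi2, lo2 - hi1, 0.0)   # 0 when the intervals overlap
    return max(0.0, 1.0 - gap / scale)

# Buyer interval [0, 800] vs. seller interval [1000, inf): gap of $200.
print(price_sim(0, 800, 1000, float("inf")))  # 0.8
```

Any measure mapping a pair of intervals into [0, 1] could be plugged in here; that is exactly the point of keeping leaf similarity local.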
Leaf node similarity (local similarity)
Example: "Date"-typed leaf nodes
[Figure: Project trees t1 (start_date (0.5) → Jan 20, 2004; end_date (0.5) → Nov 3, 2004) and t2 (start_date (0.5) → May 3, 2004; end_date (0.5) → Feb 18, 2005); overall similarity 0.74.]

DS(d1, d2) = 0.0                      if |d1 – d2| ≥ 365,
DS(d1, d2) = 1 – |d1 – d2| / 365      otherwise.
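DS translates directly into code with Python's date arithmetic (the function name is ours; per the formula, dates a year or more apart score 0, and |d1 – d2| is measured in days):

```python
from datetime import date

def date_sim(d1, d2):
    """DS(d1, d2) from the slide: 0.0 if the dates are 365 or more days
    apart, otherwise linear decay over a year."""
    days = abs((d1 - d2).days)
    return 0.0 if days >= 365 else 1.0 - days / 365

# start_date comparison from the figure: Jan 20, 2004 vs. May 3, 2004.
print(date_sim(date(2004, 1, 20), date(2004, 5, 3)))
```

These per-leaf DS values are then combined with the arc weights (0.5 each here) by the partonomy similarity algorithm to give the tree-level score.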
Conclusion
Arc-labeled and arc-weighted trees
Partonomy similarity algorithm
– Traverses trees top-down
– Computes similarity bottom-up
Node label similarity
– Exact string matching (inner and leaf nodes)
– String permutation (inner and leaf nodes)
– Taxonomic class similarity (inner nodes): taxonomy tree encoded into partonomy trees
– Type similarity (leaf nodes): e.g. date-typed similarity measures
Questions?