Muse: A System for Understanding and
Designing Mappings
Bogdan Alexe, Laura Chiticariu (UC Santa Cruz)
Renée J. Miller (U. of Toronto)
Daniel Pepper, Wang-Chiew Tan (UC Santa Cruz)
Motivation
• Schema mapping = relationship between a source database schema
and a target database schema
• Designing a schema mapping is a fundamental problem in information
integration
• Specifying a semantically correct schema mapping is usually a complex
task
• Automatic tools can suggest potential mappings
• Ensuring mapping correctness still requires intricate manual work
• Few tools are available for helping a designer understand and design
alternative mappings
Muse Overview
• Muse is a mapping design wizard that uses data examples to help designers
understand, design and refine schema mappings
• In Muse, the designer works with data examples rather than with complex
specifications to understand the semantics of a mapping
• Muse uses real data examples whenever possible, otherwise it constructs
synthetic examples
• Muse consists of two components: Muse-G (design of desired nesting semantics for
mappings) and Muse-D (choosing the desired interpretation of ambiguous
mappings)
Designing Nesting Semantics with Muse-G

Source schema:
CompDB: Rcd
  Companies: Set of
    Company: Rcd
      cid
      cname
      location
  Projects: Set of
    Project: Rcd
      pid
      pname
      cid
      manager
  Employees: Set of
    Employee: Rcd
      eid
      ename
      contact
Figure labels: f1, f2
Target schema:
OrgDB: Rcd
  Orgs: Set of
    Org: Rcd
      oname
      Projects: Set of
        Project: Rcd
          pname
          manager
  Employees: Set of
    Employee: Rcd
      eid
      ename
• Nesting semantics are expressed through grouping functions, which are defined
for each nested set in the target schema
• A grouping function is a form of Skolem function, with atomic attributes as
parameters
• Example grouping function from mapping m2:
SKProjs(<…all attributes of c, p and e …>): target Project records are grouped
according to the values of all attributes of the Company, Project and Employee
source records
Mappings:
m1: for c in CompDB.Companies
    exists o in OrgDB.Orgs
    where c.cname=o.oname and
          o.Projects = SKProjs(c.cid,c.cname,c.location)
m2: for c in CompDB.Companies, p in CompDB.Projects,
        e in CompDB.Employees
    satisfy p.cid=c.cid and e.eid=p.manager
    exists o in OrgDB.Orgs, p1 in o.Projects, e1 in OrgDB.Employees
    satisfy p1.manager=e1.eid
    where c.cname=o.oname and e.eid=e1.eid and e.ename=e1.ename
          and p.pname=p1.pname and
          o.Projects = SKProjs(<…all attributes of c, p and e …>)
m3: for e in CompDB.Employees
    exists e1 in OrgDB.Employees
    where e.eid = e1.eid and e.ename=e1.ename
Example: Designing the grouping function for the target Projects set
• Suppose the set of possible arguments is S = {cid, cname, location}
• Muse-G probes every attribute in S
• At each probe, a small carefully chosen source instance is considered, from which
two differentiating target instances are obtained: one includes the probed attribute
in the grouping function (Scenario 1 below), and the other omits it (Scenario 2
below).
Step 1: Probing on the cid attribute
Example source:
  Companies (cid, cname, location)
    11  IBM  NY
    12  IBM  NY
  Projects (pid, pname, cid, manager)
    P1  DB   11  e4
    P2  Web  12  e5
  Employees (eid, ename, contact)
    e4  John  x234
    e5  Anna  x888
Target instances:
  Scenario 1:
    OrgDB
      Orgs
        IBM
          Projects:SK(11,y)
            DB e4
        IBM
          Projects:SK(12,y)
            Web e5
      Employees
        e4 John
        e5 Anna
  Scenario 2:
    OrgDB
      Orgs
        IBM
          Projects:SK(y)
            DB e4
            Web e5
      Employees
        e4 John
        e5 Anna
  y subset of {IBM,NY}
The designer chooses scenario 2 (excludes cid from the grouping function)
Step 2: Probing on the cname attribute
Example source:
  Companies (cid, cname, location)
    11  IBM  NY
    14  SBC  NY
  Projects (pid, pname, cid, manager)
    P1  DB    11  e4
    P4  WiFi  14  e6
  Employees (eid, ename, contact)
    e4  John  x234
    e6  Kat   x331
Target instances:
  Scenario 1:
    OrgDB
      Orgs
        IBM
          Projects:SK(IBM,y)
            DB e4
        SBC
          Projects:SK(SBC,y)
            WiFi e6
      Employees
        e4 John
        e6 Kat
  Scenario 2:
    OrgDB
      Orgs
        IBM
          Projects:SK(y)
            DB e4
            WiFi e6
        SBC
          Projects:SK(y)
            DB e4
            WiFi e6
      Employees
        e4 John
        e6 Kat
  y subset of {NY}
The designer chooses scenario 1 (includes cname in the grouping function)
Step 3: Probing on the location attribute
Example source:
  Companies (cid, cname, location)
    11  IBM  NY
    13  IBM  SF
  Projects (pid, pname, cid, manager)
    P1  DB   11  e4
    P2  Web  13  e5
  Employees (eid, ename, contact)
    e4  John  x234
    e5  Anna  x888
Target instances:
  Scenario 1:
    OrgDB
      Orgs
        IBM
          Projects:SK(IBM,NY)
            DB e4
        IBM
          Projects:SK(IBM,SF)
            Web e5
      Employees
        e4 John
        e5 Anna
  Scenario 2:
    OrgDB
      Orgs
        IBM
          Projects:SK(IBM)
            DB e4
            Web e5
      Employees
        e4 John
        e5 Anna
The designer chooses scenario 2 (excludes location from the grouping function)
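The two scenarios of the location probe can be reproduced with a small sketch of how the arguments of a grouping function decide which target Project records share a nested set. This is an illustration only, not Muse's implementation: the helper `nest_projects`, the dictionary layout, and the tuple encoding of Skolem terms are assumptions made for the example, and the rows are the Companies and Projects records of the location probe joined on cid.

```python
def nest_projects(rows, group_by):
    """Return {Skolem term: set of (pname, manager)} for the target Projects sets.

    Hypothetical helper: a Skolem term is encoded as the tuple
    ("SK", v1, ..., vn) built from the values of the attributes in `group_by`.
    """
    nested = {}
    for r in rows:
        term = ("SK",) + tuple(r[a] for a in group_by)
        nested.setdefault(term, set()).add((r["pname"], r["manager"]))
    return nested


# Location-probe source rows, with Companies and Projects joined on cid.
rows = [
    {"cid": 11, "cname": "IBM", "location": "NY", "pname": "DB", "manager": "e4"},
    {"cid": 13, "cname": "IBM", "location": "SF", "pname": "Web", "manager": "e5"},
]

# Scenario 1 keeps location in the grouping function, Scenario 2 omits it.
scenario1 = nest_projects(rows, ["cname", "location"])
scenario2 = nest_projects(rows, ["cname"])

print(scenario1)  # two Projects sets: SK(IBM,NY) and SK(IBM,SF)
print(scenario2)  # one Projects set: SK(IBM) holding both projects
```

With location included, the two IBM records produce distinct Skolem terms and therefore two separate Projects sets; without it, both projects fall into the single set SK(IBM).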
Conclusion: the desired grouping function for Projects is SK(cname)
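The three probes can be summarized as a loop over S. The sketch below is a simplified model under stated assumptions, not the Muse-G algorithm itself: it groups company records only (projects follow their company), it takes the three probe instances directly from the example instead of constructing them, and it replays the designer's answers from a fixed table. The names `partition`, `probe`, `design_grouping`, `PROBES` and `ANSWERS` are hypothetical.

```python
from collections import defaultdict


def partition(companies, args):
    """Group cids by the values of the grouping-function arguments `args`."""
    groups = defaultdict(list)
    for c in companies:
        groups[tuple(c[a] for a in args)].append(c["cid"])
    return sorted(groups.values())


def probe(attr, kept, undecided, companies):
    """Return the (Scenario 1, Scenario 2) partitions for one probe on `attr`."""
    # Assumption: a usable probe instance agrees on every still-undecided
    # attribute, so only `attr` can separate the two scenarios.
    for u in undecided:
        assert len({c[u] for c in companies}) == 1, u
    scenario1 = partition(companies, kept + [attr])  # attr included
    scenario2 = partition(companies, kept)           # attr omitted
    assert scenario1 != scenario2, "probe instance must differentiate"
    return scenario1, scenario2


def row(cid, cname, location):
    return {"cid": cid, "cname": cname, "location": location}


# Company records of the probe instances used in Steps 1 to 3.
PROBES = {
    "cid": [row(11, "IBM", "NY"), row(12, "IBM", "NY")],
    "cname": [row(11, "IBM", "NY"), row(14, "SBC", "NY")],
    "location": [row(11, "IBM", "NY"), row(13, "IBM", "SF")],
}
# The designer's choices in the example: True means Scenario 1 (keep attr).
ANSWERS = {"cid": False, "cname": True, "location": False}


def design_grouping(attrs):
    kept, undecided = [], list(attrs)
    for attr in attrs:
        undecided.remove(attr)
        probe(attr, kept, undecided, PROBES[attr])
        if ANSWERS[attr]:
            kept.append(attr)
    return kept


result = design_grouping(["cid", "cname", "location"])
print(result)  # ['cname']
```

Each probe instance differs on the probed attribute while agreeing on the attributes not yet decided, which is why the undecided arguments can stay abstract (the y in SK(…,y)) without affecting what the designer sees.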
Choosing Desired Mapping Interpretation with Muse-D
Source schema:
CompDB: Rcd
  Projects: Set of
    Project: Rcd
      pid
      pname
      manager
      tech-lead
  Employees: Set of
    Employee: Rcd
      eid
      ename
      contact
Target schema:
OrgDB: Rcd
  Projects: Set of
    Project: Rcd
      pname
      supervisor
      email
Ambiguous mapping:
ma: for p in CompDB.Projects,
        e1 in CompDB.Employees,
        e2 in CompDB.Employees
    satisfy e1.eid=p.manager and
            e2.eid=p.tech-lead
    exists p1 in OrgDB.Projects
    where p.pname=p1.pname and
          (e1.ename=p1.supervisor
           or e2.ename=p1.supervisor)
          and
          (e1.contact=p1.email
           or e2.contact=p1.email)
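Each parenthesized "or" in ma is an independent choice, one for supervisor and one for email, so the alternatives multiply to 2 × 2 = 4. A minimal sketch of that enumeration, using the single-project example instance of this section (project DB with manager e4 and tech-lead e5); the dictionary layout and variable names are assumptions made for illustration, not part of Muse-D:

```python
from itertools import product

# Example source instance: one project and its two related employees.
project = {"pid": "P1", "pname": "DB", "manager": "e4", "tech-lead": "e5"}
employees = {
    "e4": {"ename": "John", "contact": "john@ibm"},
    "e5": {"ename": "Anna", "contact": "anna@ibm"},
}

e1 = employees[project["manager"]]    # bound by e1.eid = p.manager
e2 = employees[project["tech-lead"]]  # bound by e2.eid = p.tech-lead

# One list of candidate values per ambiguous target attribute.
choices = {
    "supervisor": [e1["ename"], e2["ename"]],
    "email": [e1["contact"], e2["contact"]],
}

# Every combination of one supervisor value and one email value
# corresponds to one interpretation of the mapping.
interpretations = [
    {"pname": project["pname"], "supervisor": s, "email": m}
    for s, m in product(choices["supervisor"], choices["email"])
]

for target_project in interpretations:
    print(target_project)
```

Showing the two candidate values per attribute, as in `choices`, conveys the same information as listing all four target records, which is what makes a single compact example sufficient.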
• The mapping ma above is ambiguous: it can be interpreted in several ways,
e.g. the project supervisor can be either the manager or the tech-lead
• In total, there are four alternative interpretations
• Key idea of Muse-D: provide an example source instance to illustrate the four
interpretations in a compact way
Example source:
  Projects (pid, pname, manager, tech-lead)
    P1  DB  e4  e5
  Employees (eid, ename, contact)
    e4  John  john@ibm
    e5  Anna  anna@ibm
Target instance:
  Orgs:
    Projects:
      DB   John | Anna   john@ibm | anna@ibm
  Choice values for supervisor and email (the designer makes one selection for each attribute)
Extensions
• Muse-G can take advantage of constraints on the source schema (such as keys, and more
generally, functional dependencies)
• The designer can refine the desired nesting semantics incrementally