Databases: Some Research Opportunities For Latin America
Marcelo Arenas
Pontificia Universidad Católica de Chile

Goal Of This Talk
  - Present an interesting area of research in databases
      - Has been identified as an important area
      - Has enough open problems for many research projects
      - Needs theoretical and practical research
      - Has been the subject of some research projects in Latin America

The Problem Of Sharing Data
  Main challenges:
  - Data may reside at several different sites
  - Data may be stored in several different ways
      - Schema level: name, employee_name, emp_name, …
      - Format level: Relational databases, XML, plain text, …

The Problem Of Sharing Data
  [Figure: three data sources feeding a global database]
  - Relational table with column name: Peter, John
  - XML document:
      <employee>
        <name> John </name>
        <name> Phil </name>
      </employee>
  - Relational table with column emp_name: Peter, Ron

Data Exchange
  - Transform data structured under a source schema into data structured under a target schema
  [Figure: mapping ∑ST from source schema S to target schema T]

Data Exchange
  - Source schema: Emp(name)
  - Target schema: Worker(name)
  - Rule: Emp(X) → Worker(X)
  - Source instance Emp (name): Peter, John
  - Target instance Worker (name): Peter, John
  - A further Worker table on the slide (name): Ron

Data Exchange: Main Challenges
  - What is a good rule language?
  [Figure: the rule Emp(X) → Worker(X) between Emp(name) and Worker(name), next to the schemas Emp(name), Emp(name, phone) and Worker(name, salary)]

Data Exchange: Main Challenges
  - Rule language: precise semantics and good expressive power
  - Source schema: Emp(name)
  - Target schema: Worker(name, salary)
  - Rule: Emp(X) → ∃Y Worker(X, Y)
  - Source instance Emp (name): Peter, John
  - How can we translate the source data?
  - Can we do this efficiently?

Data Exchange: Main Challenges
  - Rule: Emp(X) → ∃Y Worker(X, Y), from Emp(name) to Worker(name, salary)
  - Source instance Emp (name): Peter, John
  - Target instance Worker, with the candidate salary values shown on the slide:

      name    salary
      Peter   NULL1 | 100K | NULL
      John    NULL2 | 120K | NULL

  - What is a good translation?

Data Exchange: Main Challenges
  - Rule: Emp(X) → ∃Y Worker(X, Y), from Emp(name) to Worker(name, salary)
  - Source instance Emp (name): Peter, John
  - Target instance Worker:

      name    salary
      Peter   NULL1
      John    NULL2

  - Does Peter have a salary?
  - What is the salary of Peter?
  - How do we answer target queries?
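The translation step and the two target queries above can be sketched in a few lines of Python. This is only an illustrative sketch, not the algorithm used by any particular system: the names `Null`, `translate`, `has_salary` and `salary_of` are hypothetical, and the reading of "answer" used here (report only what holds no matter which values the nulls stand for) is an assumption about the intended semantics.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True, repr=False)
class Null:
    """A labeled null: a value that must exist but is unknown."""
    label: int

    def __repr__(self):
        return f"NULL{self.label}"


def translate(emp):
    """Apply Emp(X) -> exists Y Worker(X, Y): one fresh null per source tuple."""
    fresh = count(1)
    return [(name, Null(next(fresh))) for name in emp]


def has_salary(worker, name):
    """Boolean query 'exists Y Worker(name, Y)'.

    True as soon as a tuple for `name` exists, because a null still
    stands for some salary (assumed semantics).
    """
    return any(n == name for n, _ in worker)


def salary_of(worker, name):
    """Query 'Y such that Worker(name, Y)', keeping only known constants."""
    return [s for n, s in worker if n == name and not isinstance(s, Null)]


emp = ["Peter", "John"]
worker = translate(emp)
print(worker)                       # [('Peter', NULL1), ('John', NULL2)]
print(has_salary(worker, "Peter"))  # True
print(salary_of(worker, "Peter"))   # []
```

Under these assumptions, "Does Peter have a salary?" is answered positively, while "What is the salary of Peter?" returns nothing, since no concrete salary is known.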

Data Exchange: Relational Databases
  - Data exchange has been extensively studied in the relational world
      - IBM Almaden, UCSC and UofT
  - It has also been implemented: Clio (DB2)
  - The semantics of data exchange has been precisely defined
  - Efficient algorithms for translating source data and answering target queries have been developed

Ongoing Work
  - XML data exchange
  - Metadata management

XML Data Exchange
  - Transform XML data structured under a source schema into data structured under a target schema
  [Figure: mapping ∑ST from source schema S to target schema T]
  - What is the difference?
      - XML query language: Navigational capabilities
      - XML schema: Powerful schema language
      - XML document: Data is semi-structured

XML Document: Example
  <company>
    <employee>
      <name> Peter Buneman </name>
    </employee>
    <employee>
      <name>
        <first> Ron </first>
        <last> Fagin </last>
      </name>
    </employee>
  </company>

Data Exchange: Relational And XML
  [Figure: exchange between a relational schema and an XML schema, and between two XML schemas]
  - Relational instance Emp (name): Peter, John
  - XML instance:
      <employee>
        <name> Peter </name>
        <name> John </name>
      </employee>
  - We can do the same for other data formats!

XML Data Exchange: Our Contribution
  - Ongoing project: U. Edinburgh, UofT and PUC Chile
  - Results: Fundamental problems of XML data exchange have been solved

XML Data Exchange: Our Contribution
  [Figure: mapping between two XML schemas]
  - Rule language: precise semantics and good expressive power
  - The semantics of XML data exchange has been precisely defined
  - Efficient algorithms for translating source data have been developed
  - Efficient algorithms for answering target queries have also been developed

What Else Has To Be Done?
  Ongoing Work
  - XML data exchange
  - Metadata management

Metadata Management
  - The process of creating schema mappings is time-consuming
  - We need tools to manage schema mappings automatically

Metadata Management: Composition
  [Figure: mappings ∑ST from S to T, ∑TU from T to U, and ∑SU from S to U]
  - Composition: ∑SU = ∑ST ∘ ∑TU

Metadata Management: Inverse
  [Figure: mappings ∑ST from S to T, ∑UT from U to T, ∑TU from T to U, and ∑SU from S to U]
  - Inverse: ∑TU = (∑UT)^-1
  - Composition: ∑SU = ∑ST ∘ ∑TU
  - Hence: ∑SU = ∑ST ∘ (∑UT)^-1

Metadata Management: More Operators
  [Figure: schemas S, T, W and U with mappings ∑ST, ∑SW, ∑TU and ∑WU]
  - What do we do in this case?

Metadata Management For Data Exchange Systems
  - A general metadata management framework was proposed by Bernstein
      - Based on generic schema-mapping operators: Composition, Inverse, ...
  - Has been studied for the case of relational databases
      - Microsoft, IBM Almaden and UCSC
      - The composition operator has been extensively studied

Metadata Management For Data Exchange Systems: Our (proposed) Contribution
  - Starting project: IBM Almaden and PUC Chile
  - Two main components:
      - Continue the study of the relational metadata operators
      - Extend the framework to XML data exchange systems

Thank You!

Another Interesting Area: RDF
  - What is RDF?
      - A framework for representing information on the Web (W3C)
      - Graph data model

RDF: Example
  [Figure: RDF graph with nodes John, Peter, Person, Employee, Company and Microsoft, and edges labeled rdf:type, rdf:sc and works_in]

RDF: Possible Applications
  - Web metadata
  - Automation of information processing on the Web by agents

RDF Databases: Motivation
  - Large volumes of RDF data
  - Use of RDF data in ways unpredicted when first designed
  - Need to design reliable tools to manage RDF data

RDF Databases: Motivation
  “Perhaps most interesting is the research opportunities suggested by the term “semantic Web.” While it may be unclear what the concept truly entails, much of the recent work has centered on “ontologies.” [...] The database community should be looking for opportunities to exploit these developments in future database management systems.”
  The Lowell Database Research Self-Assessment Meeting, May 2003

RDF Databases: Our Contribution
  - Foundations of RDF databases: U. Chile, CWR and UofT
  - Querying RDF databases: U. Chile, CWR, U. Talca and PUC Chile

Querying RDF Databases
  [Figure: the RDF graph from the earlier example, with nodes John, Peter, Person, Employee, Company and Microsoft, and edges labeled rdf:type, rdf:sc and works_in]

Querying RDF Databases: Our Contribution
  - SPARQL: A query language for RDF
      - Graph-matching query language
      - W3C Candidate Recommendation, 6 April 2006

SPARQL: Example
  [Figure: the RDF graph from the earlier example, extended with an email edge from John to [email protected]; answer tables over ?X and over ?X, ?Y listing John, Peter and [email protected]]
  - Query 1:
      ?X :- (?X, works_in, Microsoft)
  - Query 2:
      ?X, ?Y :- (?X, rdf:type, Employee) OPTIONAL (?X, email, ?Y)

SPARQL: Our Contribution
  - We consider a fragment of SPARQL which encompasses all the main issues yet is simple to formalize.
  - We provide a formal semantics for this fragment.
  - We study the complexity of evaluating queries and provide complexity bounds.
  - We propose some optimization techniques.

© 2006 Microsoft Corporation. All rights reserved.
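
The two SPARQL example queries can be mimicked with a small graph-matching sketch in Python. It is an illustration under stated assumptions, not the formal semantics from the talk: the helper names `match`, `compatible` and `optional` are hypothetical, OPTIONAL is read as a left outer join of variable bindings, the triples only loosely follow the slide's figure (which nodes carry rdf:type Employee is assumed), and `john@example.org` is a placeholder because the address is not legible in the source.

```python
def match(pattern, graph):
    """Return every variable binding that maps the triple pattern into the graph."""
    results = []
    for triple in graph:
        binding = {}
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                # A variable must be bound consistently within one pattern.
                if binding.setdefault(p, t) != t:
                    break
            elif p != t:
                break
        else:
            results.append(binding)
    return results


def compatible(m1, m2):
    """Two bindings are compatible if they agree on every shared variable."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def optional(left, right):
    """Left outer join: extend a left binding when possible, else keep it as is."""
    out = []
    for m1 in left:
        extended = [{**m1, **m2} for m2 in right if compatible(m1, m2)]
        out.extend(extended if extended else [m1])
    return out


# Hypothetical graph, loosely modeled on the slide.
graph = [
    ("John", "rdf:type", "Employee"),
    ("John", "email", "john@example.org"),
    ("Peter", "rdf:type", "Employee"),
    ("Peter", "works_in", "Microsoft"),
    ("Microsoft", "rdf:type", "Company"),
]

# ?X :- (?X, works_in, Microsoft)
q1 = match(("?X", "works_in", "Microsoft"), graph)

# ?X, ?Y :- (?X, rdf:type, Employee) OPTIONAL (?X, email, ?Y)
q2 = optional(
    match(("?X", "rdf:type", "Employee"), graph),
    match(("?X", "email", "?Y"), graph),
)

print(q1)  # [{'?X': 'Peter'}]
print(q2)  # [{'?X': 'John', '?Y': 'john@example.org'}, {'?X': 'Peter'}]
```

The second result shows the point of OPTIONAL: Peter is still returned even though the graph holds no email for him, so ?Y simply stays unbound in that answer.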