SPARQL Optimisation and Semantic store use cases
Ganesh Selvaraj
[email protected]

Semantic web
The Semantic Web is an extension of the Web through standards by the World Wide Web Consortium (W3C). The standards promote common data formats and exchange protocols on the Web, most fundamentally the Resource Description Framework (RDF).

Semantic web
AAA: “Anyone can say Anything about Any topic”. This means the data is schema-less or has only a semi-structured schema.

Triplestore
A semantic database that stores Semantic Web data as triples.

Subject    Predicate    Object
Ganesh     LivesIn      Auckland

Schema free - challenges
The biggest issue of the Semantic Web (or of any schema-free data store) is also its main advantage: AAA. The same fact can be asserted in different ways:

Ganesh LivesIn Auckland
Ganesh ResidesIn Auckland

Adding meaning about schema
Ontologies are used to add meaning to semantic data; the ontology has to be interpreted beforehand. Reasoners are used to infer further facts from the assertions. For example:

ResidesIn SameAs LivesIn

Sample RDF data
Ganesh livesIn Auckland.
Ganesh likes cars.
John likes cars.
John likes bikes.
John likes surfing.
Ganesh friendof John.

SPARQL
SPARQL (a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language, used to retrieve data from triplestores.

1. SQL-like syntax and capabilities (join, filter, aggregate etc.).
2. Has pattern matching capabilities.
3. Has four query forms: Select, Ask, Construct, Describe.

SPARQL example
Select ?x ?y {
  ?x LivesIn "Auckland" .
  ?x friendof ?y .
}

Query optimisation
Query optimisation is the process of reducing the response time of a query, that is, the time elapsed from the moment the query starts executing until it returns its result.

Query optimisation challenges
One of the hardest problems in query optimisation is to accurately estimate the costs of alternative query plans.
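
To illustrate why alternative plans for the same query can differ in cost, here is a minimal Python sketch that evaluates a two-pattern query over the sample RDF data in both pattern orders and records how many intermediate bindings each step produces. The query ("who lives in Auckland, and what do they like?") and the helper names `match` and `evaluate` are hypothetical, chosen for illustration only; this is not how any particular triplestore is implemented.

```python
# Sample triples from the "Sample RDF data" slide.
TRIPLES = [
    ("Ganesh", "livesIn", "Auckland"),
    ("Ganesh", "likes", "cars"),
    ("John", "likes", "cars"),
    ("John", "likes", "bikes"),
    ("John", "likes", "surfing"),
    ("Ganesh", "friendof", "John"),
]


def is_var(term):
    """Query variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def match(pattern, binding):
    """Yield an extended copy of `binding` for every triple matching `pattern`."""
    for triple in TRIPLES:
        new_binding = dict(binding)
        for term, value in zip(pattern, triple):
            if is_var(term):
                # Bind the variable, or reject if it is already bound differently.
                if new_binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield new_binding


def evaluate(patterns):
    """Evaluate patterns left to right; return (results, bindings after each step)."""
    bindings = [{}]
    sizes = []
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, b)]
        sizes.append(len(bindings))
    return bindings, sizes


# Hypothetical query, written in two different pattern orders.
likes = ("?x", "likes", "?y")
lives = ("?x", "livesIn", "Auckland")

results_a, sizes_a = evaluate([likes, lives])  # unselective pattern first
results_b, sizes_b = evaluate([lives, likes])  # selective pattern first
print(sizes_a, sizes_b)
```

Both orders return the same result, but the first order carries four intermediate bindings into the second step while the second order carries only one. On six triples this is negligible; on large data sets the same effect dominates response time.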
Optimisers cost query plans using a mathematical model of query execution costs that relies heavily on estimates of the cardinality, or number of tuples, flowing through each edge in a query plan.

Join order optimisation
The order in which the joins in a query are executed is called the join execution order. Suitable reordering of joins (join-order optimisation) can reduce the query response time by several orders of magnitude.

Join order optimisation
Query version 1:

Select ?x {
  ?x Gender Male .                   --> pattern A
  ?x hasEmail [email protected] .    --> pattern B
}

Query version 2 (the same query with re-ordered joins):

Select ?x {
  ?x hasEmail [email protected] .    --> pattern B
  ?x Gender Male .                   --> pattern A
}

Cost estimation
Put simply, the cost of a pattern can be taken as the number of results the pattern is likely to produce. For example: costOf(?x Gender Male) > costOf(?x hasEmail [email protected]).

PDStore query evaluation
At the moment, PDStore uses index nested loop joins to evaluate queries. In a nested loop, join-order optimisation means that a low-cost pattern has to be executed before a high-cost pattern.

PDStore - LSO
LSO, the Learning Statistics Optimiser, is a hybrid cost- and heuristics-based optimiser used in PDStore.

Cost model
In a query execution plan (QEP), the cost model describes how the cost of a query, or of part of a query, is generated and stored for later use.

SPARQL cost model challenges
In relational databases, the cost model is usually developed against the schemata, but the schema-relaxed nature of RDF and unpredictable join paths (due to the absence of key constraints) in SPARQL complicate the cost model for RDF. A cost model comprising the combination of all RDF terms in a triple pattern would be sufficient but may result in huge statistics.
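
To give a rough sense of how quickly exhaustive statistics grow, the Python sketch below counts the statistics entries needed for the six sample triples under two schemes. The first keeps one counter for every combination of bound subject, predicate and object values, and the second keeps one counter per predicate. Reading "the combination of all RDF terms" as "every bound/unbound combination of the three positions" is an assumption made for this illustration, and both function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Sample triples from the "Sample RDF data" slide.
TRIPLES = [
    ("Ganesh", "livesIn", "Auckland"),
    ("Ganesh", "likes", "cars"),
    ("John", "likes", "cars"),
    ("John", "likes", "bikes"),
    ("John", "likes", "surfing"),
    ("Ganesh", "friendof", "John"),
]


def exhaustive_statistics(triples):
    """Count matches for every bound/unbound combination of S, P and O.

    A key is an (s, p, o) tuple in which None marks an unbound position.
    """
    stats = Counter()
    for triple in triples:
        for mask in product((True, False), repeat=3):
            key = tuple(term if bound else None
                        for term, bound in zip(triple, mask))
            stats[key] += 1
    return stats


def predicate_statistics(triples):
    """Count matches per predicate only."""
    return Counter(predicate for _, predicate, _ in triples)


full = exhaustive_statistics(TRIPLES)
per_predicate = predicate_statistics(TRIPLES)
print(len(full), "entries versus", len(per_predicate), "entries")
```

For these six triples the exhaustive scheme already needs 32 entries against 3 for the per-predicate scheme, and the gap widens as the number of distinct subjects and objects grows.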
Furthermore, creating and maintaining such exhaustive statistical data in an often-changing, web-scale scenario would be very costly in terms of time, hardware and other resources.

LSO cost model
A predicate-driven cost model, since 77% or more of queries have known predicates.

Cost information per predicate
Example:

Subject    Predicate    Cost
Ganesh     likes        1000
John       likes        5000
Dave       likes        2000
Kate       likes        10000

Abstract triple pattern
An abstract triple pattern is one of the possible combinations of bound and unbound values in a triple pattern, considered as a concept that is not associated with any particular data. In a triple pattern, S, P and O denote a bound subject, predicate and object, and X denotes an unbound value (an unknown query variable).

Abstract triple pattern
The possible abstract triple pattern combinations are:

● XXX: All three RDF terms are unbound.
● XPX: Only the Predicate is bound.
● SXX: Only the Subject is bound.
● XXO: Only the Object is bound.
● SXO: Only the Predicate is unbound.
● SPX: Only the Object is unbound.
● XPO: Only the Subject is unbound.
● SPO: All three RDF terms are bound.

LSO cost model

Predicate    AbstractTriplePatternType    Cost
hasEmail     XPO                          1
hasEmail     SPX                          3

Learning query cost from query execution

Heuristics to estimate cost
Sel(XXX) > Sel(XXO, SXX, XPX) > Sel(SPX, XPO) > Sel(SXO) > Sel(SPO)

Benchmarking results - LUBM

LSO vs Jena

Use case 1 - Recommendation engine

Use case - Paper
Smart insights - ESWC 2014
https://pdfs.semanticscholar.org/107a/6e4a63884bbcc4b730d2d3190ff32290fdd0.pdf

Thank You
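
As a supplementary sketch for the abstract triple pattern, LSO cost model and heuristics slides, the Python code below classifies a triple pattern into its abstract type, looks up a learned cost keyed by predicate and abstract type, and orders the patterns of the join-order example. The slides do not say how LSO combines learned costs with heuristics, so the rule used here (heuristic rank first, learned cost as tie-breaker, unknown costs treated as infinite) is purely an assumption, and all names in the code are hypothetical.

```python
import math

# Heuristic rank per abstract type, following the slide's ordering
# Sel(XXX) > Sel(XXO, SXX, XPX) > Sel(SPX, XPO) > Sel(SXO) > Sel(SPO).
# A higher rank means more expected results, so the pattern runs later.
HEURISTIC_RANK = {
    "XXX": 5,
    "XXO": 4, "SXX": 4, "XPX": 4,
    "SPX": 3, "XPO": 3,
    "SXO": 2,
    "SPO": 1,
}

# Learned costs copied from the "LSO cost model" table.
LEARNED_COST = {
    ("hasEmail", "XPO"): 1,
    ("hasEmail", "SPX"): 3,
}


def abstract_type(pattern):
    """Map a (subject, predicate, object) pattern to a type such as 'XPO'."""
    return "".join(
        "X" if term.startswith("?") else letter
        for term, letter in zip(pattern, "SPO")
    )


def learned_cost(pattern):
    """Return the learned cost for the pattern, or infinity if none is stored."""
    return LEARNED_COST.get((pattern[1], abstract_type(pattern)), math.inf)


def order_patterns(patterns):
    """Order patterns cheapest first (assumed rule: rank, then learned cost)."""
    return sorted(
        patterns,
        key=lambda p: (HEURISTIC_RANK[abstract_type(p)], learned_cost(p)),
    )


# Patterns A and B from the join-order example; the email literal is kept
# exactly as it appears in the slides.
pattern_a = ("?x", "Gender", "Male")
pattern_b = ("?x", "hasEmail", "[email protected]")

ordered = order_patterns([pattern_a, pattern_b])
print([abstract_type(p) for p in ordered], ordered[0])
```

Both patterns have the abstract type XPO, so the heuristic alone cannot separate them. The learned cost of 1 for hasEmail breaks the tie and places pattern B first, which corresponds to query version 2 in the join-order example.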