SPARTex: A Vertex-Centric Framework for RDF Data Analytics
Ibrahim Abdelaziz∗
Razen Harbi∗
Semih Salihoglu‡
Panos Kalnis∗
Nikos Mamoulis†
∗ King Abdullah University of Science & Technology, {first}.{last}@kaust.edu.sa
‡ Stanford University, [email protected]
† University of Ioannina, [email protected]
MOTIVATION
• Public knowledge bases have billions of facts in RDF format:
  – Bio2RDF, DBpedia, Probase, etc.
• A growing number of applications require combining SPARQL queries with generic graph analytics.
• Limitations:
  – RDF systems support SPARQL only and cannot run analytics.
  – SPARQL lacks procedural capabilities.
  – Graph frameworks cannot evaluate ad hoc SPARQL queries.
SYSTEM ARCHITECTURE
• Based on the vertex-centric computation model.
• Efficient and scalable SPARQL operator.
• Graph analytics and ad hoc SPARQL querying can be pipelined.
• Computation results are materialized as vertex attributes.
• Vertex-centric programs are treated as stored procedures.
• Stored procedures are declaratively invoked from SPARQL.
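To make the vertex-centric model and the idea of materializing results as vertex attributes concrete, here is a minimal single-machine sketch in Python. It is not SPARTex code: the `Vertex` class, the `run` driver, and the `pagerank_compute` function are hypothetical names chosen for illustration, and the synchronous superstep loop is an assumption about how such a program might be structured.

```python
from collections import defaultdict


class Vertex:
    """A vertex with out-edges and a dictionary of named attributes."""

    def __init__(self, vid, out_edges):
        self.id = vid
        self.out_edges = list(out_edges)
        self.attrs = {}  # analytics results are stored here as attributes


def pagerank_compute(vertex, messages, superstep, num_vertices, damping=0.85):
    """Per-vertex logic for one superstep; returns (target, message) pairs."""
    if superstep == 0:
        vertex.attrs["pRank"] = 1.0 / num_vertices
    else:
        vertex.attrs["pRank"] = (1 - damping) / num_vertices + damping * sum(messages)
    if not vertex.out_edges:
        return []  # simplification: rank held by dangling vertices is not redistributed
    share = vertex.attrs["pRank"] / len(vertex.out_edges)
    return [(dst, share) for dst in vertex.out_edges]


def run(vertices, compute, max_iter):
    """Hypothetical driver: run `compute` on every vertex for max_iter + 1 supersteps."""
    inbox = defaultdict(list)
    n = len(vertices)
    for superstep in range(max_iter + 1):
        outbox = defaultdict(list)
        for v in vertices.values():
            for dst, msg in compute(v, inbox[v.id], superstep, n):
                outbox[dst].append(msg)
        inbox = outbox  # messages are delivered at the start of the next superstep
    return vertices


# Toy graph: a directed 3-cycle, so every vertex ends with the same rank.
toy = {k: Vertex(k, e) for k, e in {"a": ["b"], "b": ["c"], "c": ["a"]}.items()}
run(toy, pagerank_compute, max_iter=10)
```

After the run, each vertex carries its score in `attrs["pRank"]`, which mirrors how a later query could read the result as an ordinary property of the vertex.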
SPARQL EXTENSION
• User-Defined Function Invocation
    PREFIX prefix: path
    CALL prefix:proc(list[params]) AS list[properties]
• Managing Graph Properties
    ADD PROPERTY { list[property patterns] } WHERE { BGP }
    DROP PROPERTY { list[property patterns] } WHERE { BGP }
• Graph Analytics Filters using SPARQL
    FILTER_VERTEX AS filter_name WHERE { BGP }
    FILTER_EDGE AS filter_name WHERE { BGP }
Combining SPARQL and graph properties
SPARQL and analytics
    PREFIX algo: <file://path_to_algorithms>
    PREFIX sptx: <http://www.spartex.com/analytics/>
    CALL algo:centrality() AS sptx:centrality
    CALL algo:PageRank(max_iter) AS sptx:pRank
    ADD PROPERTY { ?p sptx:popular "1" . } WHERE {
        ?p teaches ?c .
        ?s takes ?c .
        ?s advisor ?p .
        ?p sptx:pRank ?rank .
        ?c sptx:centrality ?cent .
        FILTER (?rank > val1 && ?cent > val2)
    }
    FILTER_VERTEX AS start WHERE {
        ?p sptx:popular "1" .
    }
    CALL algo:SSSP() USING start AS sptx:sssp

    SELECT ?s WHERE {
        ?p teaches ?c .
        ?s takes ?c .
        ?s advisor ?p .
        ?p sptx:pRank ?rank .
        ?c sptx:centrality ?cent .
        FILTER (?rank > val1 && ?cent > val2)
    }
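The FILTER_VERTEX and CALL ... USING steps in the example can be pictured as a two-stage pipeline: select the vertices that carry a property, then start an analytic from them and write its result back as a new property. The Python sketch below illustrates that flow under stated assumptions: `filter_vertex` and `sssp` are hypothetical helpers, edges have unit weight, and multiple start vertices are treated as a multi-source shortest-path problem. None of this is taken from the SPARTex implementation.

```python
import math


def filter_vertex(attrs, prop, value):
    """Return the ids of vertices whose attribute `prop` equals `value`."""
    return {v for v, a in attrs.items() if a.get(prop) == value}


def sssp(edges, attrs, sources, out_prop="sssp"):
    """Message-passing shortest paths with unit edge weights (an assumption).

    In each round, a vertex that receives a shorter distance stores it as an
    attribute and proposes distance + 1 to its out-neighbours.
    """
    for v in attrs:
        attrs[v][out_prop] = math.inf
    messages = {s: 0 for s in sources}
    while messages:
        next_messages = {}
        for v, dist in messages.items():
            if dist < attrs[v][out_prop]:
                attrs[v][out_prop] = dist
                for dst in edges.get(v, []):
                    candidate = dist + 1
                    if candidate < next_messages.get(dst, math.inf):
                        next_messages[dst] = candidate
        messages = next_messages
    return attrs


# Toy data: p1 is marked popular; x is unreachable from it.
edges = {"p1": ["c1"], "c1": ["s1"], "s1": ["s2", "p1"]}
attrs = {"p1": {"popular": "1"}, "c1": {}, "s1": {}, "s2": {}, "x": {}}

start = filter_vertex(attrs, "popular", "1")  # plays the role of FILTER_VERTEX AS start
sssp(edges, attrs, start)                     # plays the role of CALL algo:SSSP() USING start
```

Because the distances end up in the same attribute dictionary as `popular`, a follow-up query could filter on them just as the SELECT above filters on `sptx:pRank`.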
SPARQL OPERATOR
• Queries are solved by following a tour of the query graph.
• Query graphs are made Eulerian.
• Branch and bound strategy to find all possible plans.
• Cost-based optimizer that minimizes the number of exchanged messages during query evaluation.
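The first two points can be illustrated with a small sketch. The poster does not say how a query graph is made Eulerian, so the code below assumes the simplest option: treat the query graph as undirected and duplicate every edge, which gives every vertex an even degree. It then finds a closed tour with Hierholzer's algorithm. The function names `make_eulerian` and `euler_tour` are hypothetical, and the query graph is the one from the teaches/takes/advisor example.

```python
from collections import defaultdict


def make_eulerian(edges):
    """Duplicate every undirected edge so that all vertex degrees become even.

    This is one simple construction, assumed here for illustration only.
    """
    return list(edges) + list(edges)


def euler_tour(edges, start):
    """Hierholzer's algorithm on an undirected multigraph given as (u, v) pairs."""
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    used = [False] * len(edges)
    stack, tour = [start], []
    while stack:
        u = stack[-1]
        # Discard edges already walked from the other endpoint.
        while adj[u] and used[adj[u][-1][1]]:
            adj[u].pop()
        if adj[u]:
            v, idx = adj[u].pop()
            used[idx] = True
            stack.append(v)
        else:
            tour.append(stack.pop())
    return tour[::-1]


# Query graph: one edge per triple pattern, joined on shared variables.
query_graph = [
    ("?p", "?c"),     # ?p teaches ?c
    ("?s", "?c"),     # ?s takes ?c
    ("?s", "?p"),     # ?s advisor ?p
    ("?p", "?rank"),  # ?p sptx:pRank ?rank
    ("?c", "?cent"),  # ?c sptx:centrality ?cent
]
eulerian_graph = make_eulerian(query_graph)
tour = euler_tour(eulerian_graph, "?p")
```

The resulting tour starts and ends at `?p` and walks every triple pattern, so a query evaluated along it can always continue from a variable it has already bound. Choosing the start vertex and the order of edges is where a cost-based optimizer would come in; this sketch makes no attempt at that.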
Rich RDF Analytics
[Figure: stacked bar chart. Y-axis: Runtime (mins), 0 to 350. Legend: Analytics, Formatting, Re-Indexing, SPARQL. Use Case 1 compares H2RDF+ with GPS, H2RDF+ with PEGASUS, and SPARTex; Use Case 2 compares H2RDF+ with GPS and SPARTex. Bar labels: 304, 211.37, 207.27, 20.38, 16.28.]
SPARQL Query Evaluation
LUBM-10240

System            L1        L2       L3        L4       L5       L6       L7
SPARTex           4.97      10.29    5.81      3.34     3.33     2.38     7.81
SHARD             413.72    187.31   ABORTED   358.20   116.62   209.80   469.34
H2RDF+            285.43    71.72    264.78    24.12    4.76     22.91    180.32
SPARTex-Native    2.881     0.406    2.953     0.001    0.001    0.01     2.386
RDF-3X            7765.36   14.91    1927.80   0.020    0.010    0.51     75.70
Trinity.RDF       7.00      3.50     6.00      0.004    0.003    0.01     27.5
TriAD             7.631     1.663    4.290     0.002    0.001    0.069    14.895
TriAD-SG          2.15      2.02     1.64      0.001    0.001    0.001    16.863
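As a reading aid for the table, the short Python sketch below computes per-query runtime ratios between two of its rows, H2RDF+ and SPARTex. The numbers are copied from the table; the variable names are illustrative, and the choice of which two systems to compare is an assumption made for the example.

```python
queries = ["L1", "L2", "L3", "L4", "L5", "L6", "L7"]

# Runtimes copied from the LUBM-10240 table above.
spartex = [4.97, 10.29, 5.81, 3.34, 3.33, 2.38, 7.81]
h2rdf_plus = [285.43, 71.72, 264.78, 24.12, 4.76, 22.91, 180.32]

# Ratio > 1 means SPARTex finished the query faster than H2RDF+.
speedup = {q: h / s for q, s, h in zip(queries, spartex, h2rdf_plus)}

for q in queries:
    print(f"{q}: {speedup[q]:.2f}x")
```

The ratio is above 1 for all seven queries, with the largest gap on L1 and the smallest on L5.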
http://cloud.kaust.edu.sa
DOWNLOAD IT · USE IT · SHARE IT · CONTACT US