A Discussion on the Design of Graph Database Benchmarks

David Dominguez-Sal, Norbert
Martínez, Victor Muntes, Pere Baleta,
Josep Lluis Larriba
TPCTC 2010
Singapore
Motivation
GDB Benchmarking
Growing volumes of graph data to analyze
Motivation
Emerging market
Many new graph libraries: Neo4j, HypergraphDB, Pregel, Jena-RDF, DEX, etc.
Performance?
Benchmark graph databases
Other benchmarks not suitable: relational, object oriented, XML, etc.
Few proposals available: HPC-SGAB (Bader et al.)
Objectives
Survey of graph applications with large data volumes
Classify graph applications: datasets, operations
Set GDB benchmarking as an open discussion topic
1. Introduction
2. Graph description
3. Graph operations
4. Experimental setting
Representative areas
1. Social graphs
Relations generated explicitly by human interactions.
E.g. Facebook, Flickr, citation and author networks...
2. Biological graphs
Relations defined by observations on nature.
E.g. protein-to-protein interaction, food web chain, biochemical reactions.
Representative areas
3. Routing
Relations are physical (usually 2D).
E.g. road routing, communication networks, real-time traffic analysis.
4. Recommendation
Mixed information sources to mine.
E.g. product recommendation, advertising...
Graph description
Attributes: nodes, edges (e.g. weight).
Identifiers
Directed / Undirected
Labeling (Typing)
Multigraphs
Hypergraphs: hyperedges may be modeled as special nodes
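The graph features listed above can be illustrated with a minimal Python sketch. It assumes an in-memory dictionary layout; the class `Graph` and its methods are hypothetical names for illustration and do not correspond to the API of any graph database mentioned in the talk. A hyperedge is stored as a special node connected to each of its members.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Graph:
    """Directed, labeled, attributed multigraph (hypothetical layout)."""
    nodes: dict = field(default_factory=dict)  # node id -> (label, attributes)
    edges: dict = field(default_factory=dict)  # edge id -> (source, target, label, attributes)
    _ids: count = field(default_factory=count, repr=False)  # identifier generator

    def add_node(self, label, **attrs):
        nid = next(self._ids)
        self.nodes[nid] = (label, attrs)
        return nid

    def add_edge(self, src, dst, label, **attrs):
        # Edges carry their own identifier, so parallel edges are allowed.
        eid = next(self._ids)
        self.edges[eid] = (src, dst, label, attrs)
        return eid

    def add_hyperedge(self, label, members, **attrs):
        """Model a hyperedge as a special node linked to every member."""
        hub = self.add_node(label, hyperedge=True, **attrs)
        for member in members:
            self.add_edge(hub, member, "member")
        return hub


g = Graph()
ana = g.add_node("person", name="Ana")
bob = g.add_node("person", name="Bob")
eva = g.add_node("person", name="Eva")
# Two parallel edges between the same pair of nodes: a multigraph.
g.add_edge(ana, bob, "emails", weight=3)
g.add_edge(ana, bob, "calls", weight=1)
# One relation over three nodes, stored as a special node plus three edges.
paper = g.add_hyperedge("coauthored", [ana, bob, eva], year=2010)
```

With this encoding a system that only supports binary edges can still represent n-ary relations, at the cost of one extra hop per access.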
Graph operations
Basic analysis:
Get node/edge
Get attributes from a node or an edge
Get neighbors
Node degree
Basic transformations:
Add/delete node/edge
Add/delete/update attribute
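As a rough illustration of these basic operations, the sketch below implements them on adjacency dictionaries. The class `SimpleGraph` and its method names are assumptions made for this example, not the interface of a particular graph database.

```python
class SimpleGraph:
    """Directed graph with node and edge attributes, stored as adjacency dicts."""

    def __init__(self):
        self.node_attrs = {}  # node -> {attribute: value}
        self.adj = {}         # node -> {neighbor: {attribute: value}}

    # Basic transformations
    def add_node(self, n, **attrs):
        self.node_attrs.setdefault(n, {}).update(attrs)
        self.adj.setdefault(n, {})

    def add_edge(self, u, v, **attrs):
        self.add_node(u)
        self.add_node(v)
        self.adj[u].setdefault(v, {}).update(attrs)

    def delete_edge(self, u, v):
        self.adj[u].pop(v, None)

    def delete_node(self, n):
        # Remove the node and every edge that points to it.
        self.node_attrs.pop(n, None)
        self.adj.pop(n, None)
        for targets in self.adj.values():
            targets.pop(n, None)

    def set_attribute(self, n, key, value):
        self.node_attrs[n][key] = value

    # Basic analysis
    def get_attributes(self, n):
        return self.node_attrs[n]

    def neighbors(self, n):
        return set(self.adj[n])

    def out_degree(self, n):
        return len(self.adj[n])


g = SimpleGraph()
g.add_edge("a", "b", weight=2)
g.add_edge("a", "c")
g.set_attribute("a", "name", "Ana")
degree_before = g.out_degree("a")  # "a" points to "b" and "c"
g.delete_node("c")
degree_after = g.out_degree("a")   # only "b" remains
```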
Graph operations
High level operations
Traversals
Component analysis
Communities
Graph analysis (statistics)
Centrality measures
Pattern matching
Anonymization
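Three of the high level operations can be sketched in a few lines of Python: a breadth-first traversal, component analysis built on top of it, and degree centrality as one simple centrality measure. The function names and the toy adjacency list are assumptions for illustration; a benchmark would run such operations on much larger graphs.

```python
from collections import deque


def bfs_order(adj, start):
    """Traversal: visit nodes reachable from start in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return order


def connected_components(adj):
    """Component analysis for an undirected graph given as adjacency lists."""
    remaining = set(adj)
    components = []
    while remaining:
        start = next(iter(remaining))
        component = set(bfs_order(adj, start))
        components.append(component)
        remaining -= component
    return components


def degree_centrality(adj):
    """Centrality measure: degree normalized by the number of other nodes."""
    n = len(adj)
    return {node: len(nbs) / (n - 1) for node, nbs in adj.items()}


# Toy undirected graph with two components.
adj = {
    "a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"],
    "x": ["y"], "y": ["x"],
}
order = bfs_order(adj, "a")
components = connected_components(adj)
centrality = degree_centrality(adj)
```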
Operation categorization
Transformation / Analysis
Cascaded access: at least depth 2 (friends of my friends)
Scale: global, neighborhood
Attributes: nodes, edges, none
Result: graph, aggregated results, sets.
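The "friends of my friends" query is a small example of cascaded access: the neighbors found in the first hop drive a second round of lookups. In terms of the categories above it would be an analysis operation with neighborhood scale, no attribute access and a set result. The sketch below assumes an undirected friendship graph stored as a dictionary of sets; the function name and the data are hypothetical.

```python
def friends_of_friends(adj, person):
    """Return the nodes at distance exactly 2 from person."""
    friends = set(adj[person])
    reached = set()
    for friend in friends:          # first hop
        reached.update(adj[friend])  # second hop: one more lookup per friend
    # Direct friends and the person itself are not friends of friends.
    return reached - friends - {person}


adj = {
    "ana": {"bob", "eva"},
    "bob": {"ana", "dan"},
    "eva": {"ana", "dan", "kim"},
    "dan": {"bob", "eva"},
    "kim": {"eva"},
}
fof = friends_of_friends(adj, "ana")
```

Each additional level of depth multiplies the number of lookups by roughly the average degree, which is why cascaded operations tend to stress a graph database more than single-node accesses.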
Summary of graph operations
Experimental setting
Configuration and setup:
Data partitioning, indexing, redundancy, data reorganization
ACID? Isolation? Eventual consistency?
Experimental process:
Warm-up, query sequence, sampling procedure
Measures:
Simple but adapted to the audience
E.g. load time, response time, throughput, image size, power, price/throughput, etc.
Adapted to graphs: TEPS, query completeness vs. time
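A minimal sketch of such an experimental process is shown below: a seeded query sequence, a warm-up phase whose timings are discarded, and three measures (mean response time, throughput and TEPS). Everything here is an assumption for illustration, including the function names, the choice of breadth-first search as the query, and the convention of counting every scanned adjacency entry as one traversed edge, so each undirected edge is counted twice.

```python
import random
import time
from collections import deque


def bfs_edges(adj, start):
    """Breadth-first search that returns the number of edges it scanned."""
    seen = {start}
    queue = deque([start])
    scanned = 0
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            scanned += 1
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return scanned


def run_benchmark(adj, n_queries=50, n_warmup=5, seed=0):
    rng = random.Random(seed)  # fixed seed: reproducible query sequence
    nodes = sorted(adj)
    starts = [rng.choice(nodes) for _ in range(n_warmup + n_queries)]
    for s in starts[:n_warmup]:  # warm-up: results and timings discarded
        bfs_edges(adj, s)
    latencies = []
    edges = 0
    for s in starts[n_warmup:]:
        t0 = time.perf_counter()
        edges += bfs_edges(adj, s)
        latencies.append(time.perf_counter() - t0)
    total = sum(latencies)
    return {
        "queries": n_queries,
        "edges_traversed": edges,
        "mean_response_time_s": total / n_queries,
        "throughput_qps": n_queries / total,
        "teps": edges / total,  # traversed edges per second
    }


# Synthetic dataset: an undirected ring with 1000 nodes.
n = 1000
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
report = run_benchmark(ring)
```

Timing values depend on the machine, so only the counters (`queries`, `edges_traversed`) are reproducible across runs.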
Conclusions
Graph databases are an emerging market
Many applications appearing
Large volumes of graph data available to analyze.
Benchmark comparison
Graphs are varied and their applications differ, but they share many aspects
Conclusions
Expectations of a generic graph benchmark:
Attributed, labeled (types), directed, multigraph.
Significant set of cascaded and graph result operations
Definition of experimental process
Candidate scenario: Social networks
Large datasets, variety of operations, industrial interest.
Future work
Materialize the benchmark
Analysis and optimization of QA systems
Thanks!
Questions