A Discussion on the Design of Graph Database Benchmarks
David Dominguez-Sal, Norbert Martínez, Victor Muntes, Pere Baleta, Josep Lluis Larriba
TPCTC 2010, Singapore
GDB Benchmarking

Motivation
  Growing volumes of graph data to analyze
  Emerging market
    Many new graph libraries: Neo4j, HypergraphDB, Pregel, Jena-RDF, DEX, etc.
  Performance?
    Benchmark graph databases
  Other benchmarks not suitable
    Relational, object oriented, XML, etc.
  Few proposals available
    HPC-SGAB (Bader et al.)

Objectives
  Survey of graph applications with large data volumes
  Classify graph applications
    Datasets
    Operations
  Set GDB benchmarking as an open discussion topic

1. Introduction
2. Graph description
3. Graph operations
4. Experimental setting

Representative areas
  1. Social graphs
     Relations generated explicitly by human interactions.
     E.g. Facebook, flickr, citation author networks...
  2. Biological graphs
     Relations defined by observations on nature.
     E.g. protein to protein interaction, food web chain, biochemical reaction.
  3. Routing
     Relations are physical (usually 2D).
     E.g. road routing, communication networks, real time traffic analysis.
  4. Recommendation
     Mixed information sources to mine.
     E.g. product recommendation, advertising…

Graph description
  Attributes
    Nodes, edges (e.g. weight).
  Identifiers
  Directed / Undirected
  Labeling (Typing)
  Multigraphs
  Hypergraphs
    Hyperedges may be modeled as special nodes
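
The graph features listed above (attributes, identifiers, direction, labels, parallel edges, and hyperedges modeled as special nodes) can be illustrated with a small in-memory data model. This is only a minimal sketch: the class and method names (PropertyGraph, add_hyperedge, the "member" edge label, the "hyperedge" attribute) are hypothetical and do not come from the talk or from any of the libraries it mentions.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    id: int                 # unique identifier
    label: str              # node type, e.g. "author"
    attrs: dict = field(default_factory=dict)


@dataclass
class Edge:
    id: int                 # unique identifier, so parallel edges stay distinct
    label: str              # edge type, e.g. "cites"
    src: int                # directed: src -> dst
    dst: int
    attrs: dict = field(default_factory=dict)   # e.g. {"weight": 1.0}


class PropertyGraph:
    """Directed, labeled, attributed multigraph (hypothetical API)."""

    def __init__(self):
        self.nodes = {}
        self.edges = {}
        self.out_edges = defaultdict(list)   # node id -> outgoing edge ids
        self._ids = count(1)

    def add_node(self, label, **attrs):
        node = Node(next(self._ids), label, attrs)
        self.nodes[node.id] = node
        return node.id

    def add_edge(self, label, src, dst, **attrs):
        # Multigraph: adding the same (src, dst) twice creates two edges.
        edge = Edge(next(self._ids), label, src, dst, attrs)
        self.edges[edge.id] = edge
        self.out_edges[src].append(edge.id)
        return edge.id

    def add_hyperedge(self, label, members, **attrs):
        # Assumption: a hyperedge is stored as a special node that is
        # linked to every member by an ordinary "member" edge.
        hub = self.add_node(label, hyperedge=True, **attrs)
        for member in members:
            self.add_edge("member", hub, member)
        return hub

    def neighbors(self, node_id):
        return [self.edges[e].dst for e in self.out_edges[node_id]]


g = PropertyGraph()
ann, bob, cat = (g.add_node("author", name=n) for n in ("Ann", "Bob", "Cat"))
g.add_edge("cites", ann, bob, weight=1.0)
g.add_edge("cites", ann, bob, weight=2.0)      # parallel edge
paper = g.add_hyperedge("coauthorship", [ann, bob, cat])
```

Storing the hyperedge as a node keeps the edge structure strictly binary, at the cost of one extra hop whenever the members of a hyperedge are enumerated.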
Graph operations
  Basic analysis
    Get node/edge
    Get attributes from a node or an edge
    Get neighbors
    Node degree
  Basic transformations
    Add/delete node/edge
    Add/delete/update attribute
  High level operations
    Traversals
    Component analysis
    Communities
    Graph analysis (statistics)
    Centrality measures
    Pattern matching
    Anonymization

Operation categorization
  Transformation / Analysis
  Cascaded access
    At least depth 2 (friends of my friends)
  Scale
    Global, neighborhood
  Attributes
    Nodes, edges, none
  Result
    Graph, aggregated results, sets.

Summary of graph operations

Experimental setting
  Configuration and setup
    Data partitioning, indexing, redundancy, data reorganization
    ACID? Isolation? Eventual consistency?
  Experimental process
    Warm-up, query sequence, sampling procedure
  Measures
    Simple but adapted to the audience
      E.g. load time, response time, throughput, image size, power, price/throughput, etc.
    Adapted to graphs
      TEPS, query completeness vs. time

Conclusions
  Graph databases are an emerging market
    Many applications appearing
    Large volumes of graph data available to analyze.
  Benchmark comparison
    Graphs are varied and their applications differ, but they have many shared aspects
  Expectations of a generic graph benchmark:
    Attributed, labeled (types), directed, multigraph.
    Significant set of cascaded and graph result operations
    Definition of experimental process
  Candidate scenario: Social networks
    Large datasets, variety of operations, industrial interest.
  Future work
    Materialize the benchmark
    Analysis and optimization of QA systems

Thanks! Questions?
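
As a supplementary sketch, the experimental process from the Experimental setting slide (warm-up, then a measured query sequence) can be combined with a cascaded depth-2 operation ("friends of my friends") and two of the listed measures, throughput and TEPS (traversed edges per second). Everything here is an assumption made for illustration: the function names, the plain dictionary used as an adjacency list, the choice to exclude the start node and its direct neighbors from the result, and the way traversed edges are counted.

```python
import time


def two_hop(adj, start):
    """Return (nodes at depth 2 from start, number of edges traversed).

    Assumption: the result excludes the start node and its direct neighbors.
    """
    traversed = 0
    found = set()
    for friend in adj.get(start, ()):
        traversed += 1
        for fof in adj.get(friend, ()):
            traversed += 1
            if fof != start:
                found.add(fof)
    return found - set(adj.get(start, ())), traversed


def run_benchmark(adj, queries, warmup=2):
    """Run the query sequence unmeasured `warmup` times, then measure once."""
    for _ in range(warmup):
        for q in queries:
            two_hop(adj, q)

    edges = 0
    t0 = time.perf_counter()
    for q in queries:
        _, n = two_hop(adj, q)
        edges += n
    elapsed = time.perf_counter() - t0

    # Guard against a zero reading on very small inputs.
    per_second = (lambda x: x / elapsed) if elapsed > 0 else (lambda x: float("inf"))
    return {
        "queries": len(queries),
        "edges_traversed": edges,
        "elapsed_s": elapsed,
        "throughput_qps": per_second(len(queries)),
        "teps": per_second(edges),
    }


# Tiny directed example graph as an adjacency list.
adj = {1: [2, 3], 2: [4], 3: [4, 1], 4: []}
stats = run_benchmark(adj, queries=[1, 2])
print(stats["edges_traversed"], stats["teps"])
```

A real run would draw the query sequence with a defined sampling procedure and use a graph large enough that elapsed time is not dominated by timer resolution; the timing values printed here are not meaningful on a four-node graph.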