Graph

SQLintersection
Graph Database Processing
with Azure SQL DB and SQL Server 2017
Joe Sack, Principal Program Manager, Microsoft
[email protected]
Session Objectives
 Understand graph scenarios and when they may be useful
 Learn about what’s being offered in SQL Server 2017 and Azure SQL
Database
 Learn enough to be able to get quickly started on your own
 Questions, feedback or issues we don’t get to?

Email me at [email protected]
© SQLintersection. All rights reserved.
http://www.SQLintersection.com
Adaptability Adapt based on customer workload characteristics
SQL Server
2017 Themes
Choice
Mindshare
Provide customers with a choice
Leverage the strength of strong technical communities
Ingredient Recall Scenario
Ingredient Recall Scenario
Ingredient Recall Scenario
Ingredient Recall Scenario
Ingredient Recall Scenario
Understanding the impact
Relational vs. Graph
 Graph and relational designs can answer the same questions
 But if traversal of relationships define the primary application
requirements, Graph can solve this more intuitively and with less code
© SQLintersection. All rights reserved.
http://www.SQLintersection.com
Define: Graph
 Nodes: Entities – for example, stores, people, products, businesses
 Edges: Relationships between nodes, lines that connect nodes to other
nodes
 Properties or Attributes: information associated with specific nodes
and edges
 Graph: a collection of nodes and edges (or entities and relationships)
What is a graph database?
 Edges or relationships are first class
entities in a Graph Database and can
have attributes or properties
associated with them
 A single edge can flexibly connect
multiple nodes in a Graph Database
 You can express pattern matching and
multi-hop navigation queries easily
 Supports OLTP and OLAP (analytics) just
like SQL databases
Why Graph Databases?
A
Manages
Manages
Location
Location
Manages
O
works_for
Manages
Manages
B
Manages
Leads
E
C
Manages
D
Leads
Manages
Leads
Location
B
Leads
F
collaborates
Hierarchical or interconnected
data, entities with multiple
parents.
A
Manages
Manages
C
D
collaborates
Complex many-to-many
relationships. Organically
grow connections as the
business evolves.
Analyze interconnected data,
materialize new information
from existing facts. Identify nonobvious connections
Graph Scenarios
 Recommendation engines
 Social networks
 Networks and IT infrastructure topologies
 Fraud detection
 Product catalog with sales and marketing data
 IOT device telemetry
SQL Server 2017 Support for graph data
 Native nodes and edge
table support
 Query language
extension provides multihop navigation using
join-free pattern
matching
 Query across regular SQL
tables and graph data
 Interoperability – for
example, support for
Columnstore indexes
Why Extend SQL Server to support Graphs?
• Trusted by many customers for enterprise and mission critical
applications
• Mature product, supports many advanced technologies like high
availability, disaster recovery etc. Also comes with cutting edge
technologies like Columnstore, advanced analytics, ML etc.
• One platform with no need to extract, transform and load data into
another system to analyze relationships
Recommendation Engine
Scenario examples
• Product sales data
• If a person P bought product A, find friends P
who bought A and also bought other products.
Make recommendation based on that.
• Social Network
• Yelp like application, find me friends who like
the same restaurant that I like and also
recommend other restaurants.
Friends
location
hasBought
isLooking
sellsAt
Product
Recommending Songs: Approach
User1
Song1
User2
Song2
User3
Song3
User4
Song4
User5
Song5
The Million Song Dataset
Data courtesy of the Million Song Dataset
(https://labrosa.ee.columbia.edu/millionsong/) and the
Echo Nest Taste Profile Subset
(https://labrosa.ee.columbia.edu/millionsong/tasteprofile
)
Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman,
and Paul Lamere. The Million Song Dataset. In
Proceedings of the 12th International Society for Music
Information Retrieval Conference (ISMIR 2011), 2011.
The Echo Nest Taste profile subset, the official user data
collection for the Million Song Dataset, available at:
http://labrosa.ee.columbia.edu/millionsong/tasteprofile
Implementation using SQL Graph
UniqueUser
(node table)
UniqueSong
Likes
(node table)
(edge table)
CREATE TABLE UniqueUser
(UserId VARCHAR(80))
AS NODE
CREATE TABLE Likes
(ListenCount BIGINT))
AS EDGE
CREATE TABLE UniqueSong
(SongId VARCHAR(50)
,SongTitle VARCHAR(500)
,ArtistName
VARCHAR(500))
AS NODE
Demo
SQL Graph
(Special thanks to Arvind Shyamsundar from the SQLCAT team!)
© SQLintersection. All rights reserved.
http://www.SQLintersection.com
Storing edge and nodes in tables
• Natural choice for us (no need to re-invent)
• Storing edge data in a separate table allows us to benefit from the
query optimizer, which can pick the optimal join strategy for large
queries
• Depending on the complexity of query and data statistics, the
optimizer can pick a nested loop join, hash join, or other join
strategies
$node_id
 Node tables have an implicit $node_id
column created which uniquely identifies a
given node in the database
 Combination of object_id of that node
table and an internally generated bigint
value
 When the $node_id column is selected, a
computed value in the form of a JSON
string is displayed
 $node_id is a pseudo column, that maps to
an internal name with hex string in it.
When you select $node_id from the table,
the column name will appear as
$node_id_\<hex_string>
Edge Columns
Metadata
sys.tables
• is_node and is_edge
sys.columns
• graph_type and graph_type_desc
• Shows graph column types like “from ID” and “to ID”
Indexing
Nodes
• We create a default unique, non-clustered index on $node_id by default
Edges
• We automatically create a unique non-clustered index on $edge_id
• For OLTP scenarios, we recommend that users create indexes on these
columns ($from_id, $to_id) for faster lookups in the direction of the edge
DDL (beyond creating the table)
ALTER TABLE
Supported for user-defined columns, indexes or constraints
CREATE INDEX
Supported on user-defined and pseudo-columns for node and edge tables, including
CCI and NCCI
DROP TABLE
Supported, but we don’t automatically cascade deletion of edges or nodes
Supported table types
 Regular tables are supported
 Not supported for nodes or edge tables (for this release):
Memory-optimized
Temporary tables (global and local)
Temporal tables
Stretch tables
External tables (PolyBase)
Can I alter an existing table into a node or edge table?
 In the first release, ALTER TABLE to convert an existing relational table into a
node or edge table is not supported
 Instead, users can create a node table and use INSERT INTO … SELECT FROM
to populate data into the node table
 To populate an edge table from an existing table, proper $from_id and
$to_id values must be obtained from the node tables.
INSERTs into Graph tables
 Inserting into a node is same as inserting into any relational table; the
values for the $node_id column are automatically generated.
 For edge table, users must provide values for $from_id and $to_id columns
DELETE, UPDATE, MERGE
DELETE
• Works the same as regular table
• No constraints to check for edges pointing to deleted nodes, or nodes used with
edges (no cascade)
UPDATE
• You can update user-defined columns – but not $from_id, $to_id
MERGE
• Not supported in this version
How can I ingest unstructured data?
• Since we are storing data in tables, users must know the schema at
the time of creation
• Users can always add new types of nodes or edges to their schema
• If you want to modify an existing node or edge table, they can use
ALTER TABLE to add or delete attributes
• If you expect any unknown attributes in your schema, you could
either use sparse columns or create a column to hold JSON strings
and use that as a placeholder for unknown attributes
Transitive Closure
• How do I find a node connected to me and arbitrary number of hops
away, in my graph?
• The ability to navigate through a combination of nodes and edges, an
arbitrary number of times, is called transitive closure
• For example, find all the people connected to me through three levels of indirections or
find the employee chain for a given employee in an organization
• Transitive closure is not supported in the first release and we’re going to
publish alternative working examples in the meantime
Graph Polymorphism
• How do I find ANY Node connected to me in my graph?
• The ability to find any type of node connected to a given node in a graph is
called polymorphism
• SQL graph does not support polymorphism in the first release
• A possible workaround is to write queries with UNION clause over a known
set of node and edge types
MATCH
 Specifies the search condition for a graph
 Can only be used with graph node and edge tables
 Usable in SELECT statement as part of WHERE clause
 MATCH (<graph_search_pattern>)
Uses ASCII art syntax to traverse a path in the graph
One node to another via an edge in the direction of the arrow provided
MATCH
 Node names or aliases appear at the two ends of the arrow
MATCH(UniqueUser-(LikesThis)->MySong)
 Edge names or aliases are provided inside parentheses
MATCH(UniqueUser-(LikesThis)->MySong)
 The arrow can go in either direction in the pattern
MATCH(SimilarSong<-(LikesOther)-UniqueUser-(LikesThis)->MySong)
MATCH
 The node names inside MATCH can be repeated
 An edge name cannot be repeated inside MATCH
Use an alias for the edge if you want to use to form a new path
 An edge can point in either direction, but it must have an explicit direction
MATCH
 MATCH can be combined with other expressions using AND in the WHERE clause
 OR and NOT operators are not supported in the MATCH pattern
But you can have a sub-query using EXISTS or NOT EXISTS part of the overall query
SQL Graph Customer Scenarios
Fraud detection for online gaming
• Look at devices, IPs, logins and connect to gaming activity
Sales Lead Management
• Traverse a hierarchy of child accounts, and given an opportunity see who
responded to specific emails and promotions
Product hierarchies, Bill of Materials management
• Find the hierarchy for a given product or the hierarchies for a set of products
• Find all products that contain a given product according to the BoM hierarchy
(Revisit) Ingredient Recall Scenario
(Revisit) Understanding the impact
Questions?
Don’t forget to complete an online evaluation!
Graph Database Processing
with Azure SQL DB and SQL Server 2017
Your evaluation helps organizers build better conferences
and helps speakers improve their sessions.
Thank you!
Save the Date
Oct 30-Nov 2, 2017
We’re back in Vegas baby!
www.SQLintersection.com