Learning Convolutional Neural Networks for Graphs

GA-653449
Learning Convolutional Neural
Networks for Graphs
Mathias Niepert
Mohamed Ahmed
NEC Laboratories Europe
Konstantin Kutzkov
Representation Learning for Graphs
Telecom
Safety
Transportation
Industry
Intermediate Representation: Graphs
?
Deep Learning System
(Edge) deployment
2
Learning Convolutional Neural Networks for Graphs
Smart cities
Problem Definition
▌ Input: Finite collection of graphs
…
● Nodes of any two graphs are not necessarily in correspondence
● Nodes and edges may have attributes (discrete and continuous)
▌ Problem: Learn a representation for classification/regression
▌ Example: Graph classification problem
[
class
3
] =?
Learning Convolutional Neural Networks for Graphs
State of the Art: Graph Kernels
▌ Define kernel based on substructures
● Shortest paths
● Random walks
● Subtrees
● ...
▌ Kernel is similarity function on pairs of graphs
● Count the number of common substructures
▌ Use graph kernels with SVMs
4
Learning Convolutional Neural Networks for Graphs
Patchy: Learning CNNs for Graphs
5
Learning Convolutional Neural Networks for Graphs
Patchy: Learning CNNs for Graphs
node sequence selection (w=6 nodes)
6
Learning Convolutional Neural Networks for Graphs
Patchy: Learning CNNs for Graphs
node sequence selection (w=6 nodes)
7
Learning Convolutional Neural Networks for Graphs
Patchy: Learning CNNs for Graphs
node sequence selection (w=6 nodes)
8
Learning Convolutional Neural Networks for Graphs
Patchy: Learning CNNs for Graphs
node sequence selection (w=6 nodes)
9
Learning Convolutional Neural Networks for Graphs
Patchy: Learning CNNs for Graphs
node sequence selection (w=6 nodes)
10
Learning Convolutional Neural Networks for Graphs
Patchy: Learning CNNs for Graphs
node sequence selection (w=6 nodes)
11
Learning Convolutional Neural Networks for Graphs
Patchy: Learning CNNs for Graphs
neighborhood assembly (at least k=4 nodes)
node sequence selection (w=6 nodes)
12
Learning Convolutional Neural Networks for Graphs
Patchy: Learning CNNs for Graphs
neighborhood assembly (at least k=4 nodes)
node sequence selection (w=6 nodes)
13
Learning Convolutional Neural Networks for Graphs
Patchy: Learning CNNs for Graphs
neighborhood assembly (at least k=4 nodes)
node sequence selection (w=6 nodes)
14
Learning Convolutional Neural Networks for Graphs
Patchy: Learning CNNs for Graphs
neighborhood assembly (at least k=4 nodes)
node sequence selection (w=6 nodes)
15
Learning Convolutional Neural Networks for Graphs
Patchy: Learning CNNs for Graphs
neighborhood assembly (at least k=4 nodes)
node sequence selection (w=6 nodes)
16
Learning Convolutional Neural Networks for Graphs
Patchy: Learning CNNs for Graphs
neighborhood assembly (at least k=4 nodes)
node sequence selection (w=6 nodes)
17
Learning Convolutional Neural Networks for Graphs
Patchy: Learning CNNs for Graphs
3
1
2
4
1
3
1
2
2
2
3
4
4
3
4
4
1
2
1
4
2
3
neighborhood normalization (exactly k=4 nodes)
neighborhood assembly (at least k=4 nodes)
node sequence selection (w=6 nodes)
18
Learning Convolutional Neural Networks for Graphs
1
3
Patchy: Learning CNNs for Graphs
●
●
2
4
n
a
a
m
a
…
1
m
a
…
1
3
1
2
2
2
3
4
4
3
4
4
1
2
1
4
neighborhood normalization
19
1 2 3 4
1
2
3
4
1
1 2 3 4
1
…
a
…
1
1
2
3
4
1 2 3 4
normalized neighborhoods serve as receptive fields
node and edge attributes correspond to channels
3
1
…
a
a
1 2 3 4
1
…
a
m
1
2
3
4
a
1 2 3 4
a
1 2 3 4
1
…
a
n
a
n
Convolutional architecture
Learning Convolutional Neural Networks for Graphs
2
3
1
3
Node Sequence Selection
▌ We use centrality measures to generate the node sequences
▌ Nodes with similar structural roles are aligned across graphs
node sequence selection
A: Betweenness centrality
centrality
C: Eigenvector centrality
20
B: Closeness
D: Degree centrality
Learning Convolutional Neural Networks for Graphs
Neighborhood Assembly
▌ Simple breadth-first expansion until at least k nodes added,
or no additional nodes to add
neighborhood assembly
21
Learning Convolutional Neural Networks for Graphs
Graph Normalization Problem
▌ Nodes of any two graphs should have similar position in the
adjacency matrices iff their structural roles are similar
adjacency matrices under labeling
2
4
1
labeling
method
(centrality etc.)
distance measures in
matrix and graph space
normalization
▌ Result: For several distance measure pairs
it is possible to efficiently compare labeling
methods without supervision
▌ Example: ||A – A’||1 and edit distance
on graphs
22
3
Learning Convolutional Neural Networks for Graphs
Graph Normalization
3
2
3
3
1
2
3
Distance to root node
23
Learning Convolutional Neural Networks for Graphs
Graph Normalization
5
2
5
4
1
3
6
Centrality measures, etc.
3
2
3
3
1
2
3
Distance to root node
24
Learning Convolutional Neural Networks for Graphs
Graph Normalization
5
2
6
4
1
3
7
Canonicalization (break ties)
5
2
5
4
1
3
6
Centrality measures, etc.
3
2
3
3
1
2
3
Distance to root node
25
Learning Convolutional Neural Networks for Graphs
Computational Complexity
▌ At most linear in number of input graphs
▌ At most quadratic in number of nodes for each graph
(depends on maximal node degree and centrality measure)
26
Learning Convolutional Neural Networks for Graphs
v
1
…
v
M
Convolutional Architecture
27
Learning Convolutional Neural Networks for Graphs
n
a
…
n
a
…
1
a
1 2 3 4
1 2 3 4
1
n
a
…
1
a
1 2 3 4
…
a
n
a
…
1
1 2 3 4
a
a
1 2 3 4
1
…
a
n
field size: 4, stride: 4, filters: M
v
1
…
v
M
Convolutional Architecture
28
Learning Convolutional Neural Networks for Graphs
n
a
…
n
a
…
1
a
1 2 3 4
1 2 3 4
1
n
a
…
1
a
1 2 3 4
…
a
n
a
…
1
1 2 3 4
a
a
1 2 3 4
1
…
a
n
field size: 4, stride: 4, filters: M
Convolutional Architecture
M
v
v
1
v
1
…
…
v
M
…
29
Learning Convolutional Neural Networks for Graphs
n
a
…
n
a
…
1
a
1 2 3 4
1 2 3 4
1
n
a
…
1
a
1 2 3 4
…
a
n
a
…
1
1 2 3 4
a
a
1 2 3 4
1
…
a
n
field size: 4, stride: 4, filters: M
v
1
…
v
N
Convolutional Architecture
field size: 3, stride: 1, N filters
M
v
v
1
v
1
…
…
v
M
…
30
Learning Convolutional Neural Networks for Graphs
n
a
…
n
a
…
1
a
1 2 3 4
1 2 3 4
1
n
a
…
1
a
1 2 3 4
…
a
n
a
…
1
1 2 3 4
a
a
1 2 3 4
1
…
a
n
field size: 4, stride: 4, filters: M
v
1
…
v
N
Convolutional Architecture
field size: 3, stride: 1, N filters
M
v
v
1
v
1
…
…
v
M
…
31
Learning Convolutional Neural Networks for Graphs
n
a
…
n
a
…
1
a
1 2 3 4
1 2 3 4
1
n
a
…
1
a
1 2 3 4
…
a
n
a
…
1
1 2 3 4
a
a
1 2 3 4
1
…
a
n
field size: 4, stride: 4, filters: M
Convolutional Architecture
N
v
v
1
v
1
…
…
v
N
…
field size: 3, stride: 1, N filters
M
v
v
1
v
1
…
…
v
M
…
32
Learning Convolutional Neural Networks for Graphs
n
a
…
n
a
…
1
a
1 2 3 4
1 2 3 4
1
n
a
…
1
a
1 2 3 4
…
a
n
a
…
1
1 2 3 4
a
a
1 2 3 4
1
…
a
n
field size: 4, stride: 4, filters: M
Experiments - Graph Classification
▌ Finite collection of graphs and their class labels
…
class = 1
class = 0
class = 1
● Nodes of any two graphs are not necessarily in correspondence
● Nodes and edges may have attributes (discrete and continuous)
▌ Learn a function from graphs to class labels
[
class
33
] =?
Learning Convolutional Neural Networks for Graphs
Experiments - Convolutional Architecture
softmax
flatten, dense 128 units
8
…
1
1
…
8
…
field size: 10, stride: 1, filters: 8
16
…
…
16
1
1
…
34
Learning Convolutional Neural Networks for Graphs
n
a
…
n
a
…
1
a
1 2 … 10
1 2 … 10
1
n
a
…
1
a
1 2 … 10
…
a
n
a
…
1
1 2 … 10
a
a
1 2 … 10
1
…
a
n
field size: k, stride: k, filters: 16
Classification Datasets
▌ MUTAG: Nitro compounds where classes indicate mutagenic
effect on a bacterium (Salmonella Typhimurium)
▌ PTC: Chemical compounds where classes indicate
carcinogenicity for male and female rats
▌ NCI: Chemical compounds where classes indicate activity
against non-small cell lung cancer and ovarian cancer cell lines
▌ D&D: Protein structures where classes indicate whether
structure is an enzyme or not
▌ ...
35
Learning Convolutional Neural Networks for Graphs
Experiments - Graph Classification
▌ Q: How efficient and effective compared to graph kernels?
▌ Apply Patchy to typical graph classification benchmark data
36
Learning Convolutional Neural Networks for Graphs
Experiments - Visualization
▌ Q: What do learned edge filters look like?
▌ Restricted Boltzmann machine applied to graphs
▌ Receptive field size of hidden layer: 9
graphs sampled from RBM
small instances of graphs
37
weights of hidden nodes
Learning Convolutional Neural Networks for Graphs
Discussion
▌ Pros:
● Graph kernel design not required
● Outperforms graph kernels on several datasets (speed and accuracy)
● Incorporates node and edge features (discrete and continuous)
● Supports visualizations (graph motifs, etc.)
▌ Cons:
● Prone to overfitting on smaller data sets (graph kernel benchmarks)
● Shift from designing graph kernels to tuning hyperparameters
● Graph normalization not part of learning
code to be released: patchy.neclab.eu
38
Learning Convolutional Neural Networks for Graphs