Subgraph Isomorphism Graph Challenge - draft -

Subgraph Isomorphism Graph Challenge
- draft Siddharth Samsi, Vijay Gadepally, Jeremy Kepner, Albert Reuther
http://GraphChallenge.org
Outline
•  Introduction
•  Data Sets
•  Static Graph Isomorphism Challenge
•  Metrics
•  Summary
Graph Challenge - 2
Introduction
•  Previous challenges in machine learning, High Performance Computing and visual
analytics include
–  YOHO, MNIST, HPC Challenge, ImageNet, VAST
•  GraphChallenge encourages community approaches, such as DARPA HIVE, to
develop new solutions for analyzing graphs derived from social media, sensor feeds,
and scientific data to enable relationships between events to be discovered as they
unfold in the field
•  GraphChallenge organizers will provide specifications, data sets, data generators, and
serial implementations in various languages
•  GraphChallenge participants are encouraged to apply innovative hardware, software,
and algorithm techniques to push the envelop of power efficiency, computational
efficiency, and scale
•  Submissions will be in the form of full conference write ups to IEEE HPEC which will
allow participants to be evaluated on their complete solution
Graph Challenge - 3
Graph Challenge
•  GraphChallenge seeks input from diverse communities to develop graph challenges
that take the best of what has been learned from groundbreaking efforts such as
GraphAnalysis, Graph500, FireHose, MiniTri, and GraphBLAS to create a new set of
challenges to move the community forward
•  Initial Graph Challenges
–  Static Graph Challenge: Sub-Graph Isomorphism
•  This challenge seeks to identify a given sub-graph in a larger graph
–  Streaming Graph Challenge: Stochastic Block Optimization
•  This challenge seeks to identify optimal blocks (or clusters) in a larger graph
Graph Challenge - 4
Static versus Streaming Mode
•  Static graph processing
–  Given a large graph G
–  Evaluate ƒ(G)
•  Two classes of streaming: stateless and stateful
–  Stateless: process data g as it goes by ƒ(g)
–  Stateful: add data g to a larger corpus G and then process corpus ƒ(G + g)
•  Graph processing is often in the stateful category
–  Easy to simulate by partitioning G into streaming pieces g
Graph Challenge - 5
Labels and Filtering
•  Filtering on edge/vertex labels is often used when available to reduce the search
space
–  Filtering can be applied at initialization, during intermediate steps, or at the vertex level
–  Very problem dependent
•  Some Graph Challenge data sets have labels and some data sets are unlabeled
–  Some participants will want to filter on labels
•  Initial example implementations will work without labels
–  Labels can be used if they choose
•  GraphChallenge is judged by a panel
–  Panel will value the variations that are submitted
Graph Challenge - 6
Outline
•  Introduction
•  Data Sets
•  Static Graph Isomorphism Challenge
•  Metrics
•  Summary
Graph Challenge - 7
Publicly Available Datasets
Initial Datasets in Blue
Stanford Large Network Dataset Collection snap.stanford.edu/data
• 
Social networks (10 data sets up to 4.8M vertices & 69M edges)
– 
• 
• 
– 
• 
• 
Nodes represent scientists, edges represent collaborations
Nodes represent webpages and edges are hyperlinks
Amazon networks (5 data sets up to 548K vertices & 3.4M edges)
– 
• 
• 
Nodes represent products and edges link commonly co-purchased
products
– 
Nodes represent computers and edges communication
• 
Data from online communities such as Reddit and Flickr
Online reviews (6 data sets up to 34M product reviews)
– 
Graph Challenge - 8
Memetracker phrases, links and Tweets
Online communities (3 data sets up to 2.3M images)
– 
Internet networks (9 data sets up to 62K vertices & 147K edges)
Talk, editing, voting, and article data from Wikipedia
Twitter and Memetracker (4 data sets up to 96M vertices & 476M edges)
– 
• 
Social networks with geographic check-ins
Wikipedia networks, articles, and metadata (7 data sets up to 3.5M
vertices & 250M edges)
– 
• 
Networks with positive and negative edges (friend/foe, trust/distrust)
Location-based online social networks (2 data sets up to 198K vertices
& 950K edges)
– 
• 
Graphs of the internet
Signed networks (10 data sets up to 4.8M vertices & 69M edges)
– 
Web graphs (4 data sets up to 875K vertices & 5.1M edges)
– 
• 
• 
Nodes represent intersections and edges roads connecting the
intersections
Autonomous systems (5 data sets up to 26K vertices & 106K edges)
– 
Nodes represent papers, edges represent citations
Collaboration networks (5 data sets up to 23K vertices & 198K edges)
– 
• 
Email communication networks with edges representing communication
Citation networks (3 data sets up to 3.7M vertices & 16M edges)
Road networks (3 data sets up to 1.9M vertices & 2.8M edges)
– 
Ground-truth network communities in social and information networks
Communication networks (3 data sets up to 2.3M vertices & 5M edges)
– 
• 
Online social networks, edges represent interactions between people
Networks with ground-truth (6 data sets up to 66M vertices & 1.8B edges,
e.g. Friendster)
– 
• 
Data from online review systems such as BeerAdvocate and Amazon
Publicly Available Datasets
Initial Datasets in Blue
Yahoo! Webscope Datasets webscope.sandbox.yahoo.com
AWS Public Data Sets aws.amazon.com/public-data-sets
• 
• 
• 
• 
• 
Astronomy (1 data set 180 GB)
–  Sloan Digital Sky Survey SQL MDF files
Biology (4 data sets up to 200 TB)
–  Genome sequence data
Climate (3 data sets size growing daily)
–  Satellite imagery data
Economics (10 data sets up to 220 GB)
–  Census, transaction, and transportation data
• 
Encyclopedic (10 data sets up to 541 TB - Common Crawl Corpus)
–  Various online encyclopedia data
Geographic (3 data sets up to 125 GB)
• 
–  Street maps
Mathematics (1 data sets 160 GB)
• 
–  Yahoo!'s auction-based platform for selling advertising space
• 
Competition Data (3 data sets up to 1.5 GB)
–  Data challenges run by Yahoo
• 
Computing Systems Data (5 data sets up to 8.8 GB)
–  Computer systems log data
• 
Graph and Social Data (3 data sets up to 5 GB)
–  Graph data from search, groups, and webpages
• 
Image Data (3 data sets up to 14 GB)
–  Flickr imagery and metadata
• 
Language Data (29 data sets up to 166 GB - Answers browsing behavior)
–  Wide range of question/answer data sets
–  University of Florida Sparse Matrix Collection
• 
Graph500.org Data Generator
Advertising and Market Data (4 data sets up to 3.7 GB)
Ratings and Classification Data (10 data sets up to 83 GB — 1.5 TB
dataset withdrawn)
–  Community preferences and data
•  Used to generate world’s largest power law graphs
•  Can be modeled with Kronecker Product of a Recursive
MATrix (R-MAT): G⊗k = G⊗k–1 ⊗ G
– 
Where “⊗”denotes the Kronecker product of two matrices
Graph Challenge - 9
Tools to generate (at various scales and parameters)
data sets will be provided as part of the challenge
•  R-MAT: A Recursive Model for Graph Mining, Chakrabarti, Zhan & Faloutsos (2004)
•  Mathematically Tractable Graph Generation and Evolution, Leskovec, Chakrabarti, Kleinberg & Faloutsos, (2005)
Ti
CA
AA phai
an S
RN ne
Bfo
et Lara garin
rC
m
icloha
H
ud
k__w
ud ello
CurlyBe
autys
josephredfern
lp
he
rm
lsrl
ed
9
6C
dn
elA
nd
nic
scottdware
emoore
CertiVox
kasimerkan rees
_
ughdom
ity
pan
Cover
RFairclo
ri
hyeti
ieIO
LinuxMisyone ri
ch Sec
mastalet
linuxnotla
cuer t ou u27 Te nfo
tI
nresw8rbwpgri4
ne
Ha
n
ntai ks
Ja
Cana
mou wor
s in
Sinan
Net
esu kle
r
ec
an T_
gu
ecJ joe
unda ult IC nfos
mD RrlaciS
tofuS
kI
re
n
Info
DayTTa
ig
Clic
DrKe
PG
es
D
glew
ah
ooha
G ns
re
atC
C
ks
on
m
SS
PO urj
ID ik
ire
c
Jh tors
on
yK
nx
Ju
n
am
ch
moh
Ra
Wikipedia Adminship
Road Network
yP
ge
inen
l
Ed
oist
ola
ita
kk
ig
riK
Pet i usiJaa
nD
so
eI U
ea
t
t
Aks A
Rerie
es
ne
kki
ez
vaH
ckie
on
LD
altd
PJ
Pere
rism
VM
gu d
e g owie
chmichto vikY
olic afta B
Thai
Hop sh Cas d
mK
Fran
ar
gm
el
U
1
kg
_
lh0
linprot
ku im
F ymza ta r wi
or
ota asgr
e
bioS
he aHuhmpile
ala
x g3aT
eK rbyem ng
hm
oherr
anme
rk
er6_6
rli inbi
gerCo
Liv ecm
mat Jasin
heini
eri
teng
num
cje
ia D
hester Ge
ut no
PauliAlti
a the88
th ays se onys Scec
se
sal
ulC k
ian
th
sp Pa dd kjope tomihj jve
info
Cynpsd an
ilwain
da
w b berze kra
vo
mattmc
Ho pz cy
atotoro
junaxu
de
UX
ox saunmobify
io_
lexisani
oinB
Turm
15
ELF
d_o
Bluc
Bar
n
Indie
ko ano
reaerf
laGe
AlexKara
ospad
thalmicTese
msauk
Andvgc
archer aggiora
giannim
Apryl_P
evktalo
r00f
gen ova
hack_lu
Tiffani_B
nikmd23
SofieHa
ibroadfo
bull3tp
erry atseitlin
KPbewelldoc
xandersh
is RusiAlpo
farahney
_tty0
antkanan
SexyLikeMeios
inboxbygmail mikk0j
skriptuurus
kev_mcdougall
defcoin
ustwogames
anttitikkanen
SpikeBeeJo
willcanine
twittersecurity
NeustarCTO
daavidhentunen marpichun
AbiWilks
HopeFran
jerrychen
k
herrod
GPGTools chrisb
jonasanbrow
kelly
certiv
nsporndly
shittydogs
Reversity camicelli
Mbosinwa
fallen
ox
prcheachine
abhishe
jehova SoluM
stevewerby
lab
ney s
Agsarr simonb
jasontco
kmdb
i_FRCryp ayly soto
Boulevard_
phil
hen mesosphe
ngc
pra
toV
ponore leahk
iliMo
Beer
xistorM
yu
PekkaN
uipers
ijim illa
gua
an
n0
out
Weikan
q hN3
riaNig
en sege
geeb
stin
erikander
l htD
sleemarkaci
angeTe e
x0
ar hum
ws der
75
iveStu
lorbo
son
alb
ble ro
sb
re st viP_ri mat eichp9
xa
0
Daarchuser
Se
ert
aeFe
l04
ne Fig
dio
sV
0olng
ini10
ria ma
nm
htThe ha
einm evetesh tbra1c
An
loogxii nox_ CAh
arsmi reasona
onO_Po cke mhausher
wo JR ga Gudad
ar
ym
rspace
Jeffr bly
roag
o wer13
r
odstar
NSaNew
o
ztehowe
eyCh snd jda sb
WOGofFARs
G su
bernin
Ancks emmangoldstein
lessn lehtio
DC s Br
deklano
rebrowfest
Ubad
irib b
upha
r2
_ua ian
oRn
o
er_P
Uto
wski
Go
d
ga
AS
nt
n
ot top
ea
2
ifeulne
ule
D
arOS
ke lerthbpi Erik
es Keltoedy
aark
t_is _S
r
gusuEC
a Rno Colis jjcga
Kolib
rtw en nn
utro
felip
ervic
Pron
ree
P
ti ele
ra
bla Mic un
cu hAatrtifa_io an
okjetaoscl
ad ivlp
ecli
e on
deray
sktnwe LB
db
M
or
de
bct
ck hqeirllee etbe er
re
Fo
anna
d e
fre pla
rs oor
adrie
rte nichquito
LxM C2D rtsra penos ap
n692
eb otr_im
olas flo
ageis
tlu i diea alex
skim
90
sd nsP langcKry P nruiz ajuC
nd
jheb orba UN
mfr
pp lear
lia
hu
gir le
u ep H o
lesh n3
ert ieli BF
sednawk
cssq
as age tocoIL_ ra
ca
lComunityCube
cl
Fund
KillYourTV_I2P
CSua
ow 005
ure ejomcin Fdan_sie
me
Dog
re_u
bo
huAPISHsllvim
__ nejoepie91
o
Flo
nb
nkI inm
iamjohnoliver
sM
Kedecourl
i2porignal
aks
o
tilla
vin ayy11 mna
rq
MorH
DebonairFox
ic od UID
Da
Sp g _
nieM
anked
Har
ehes
iamsambee
lRari
mon
MsLods
r ka
opjk
eD
rs
ers
yG
va
87
rits
ne
ijsF
dueri
noF
ari
Tay
ort
esl
eD
ale
ob
iggs
za
Be
atr
ijs
Rits
No
em
diN
a
ov h
i art
ro
be
rt
Twitter Data
•  Public data is available in a variety of formats
–  Linked list, tab separated, labeled/unlabeled
•  Requires parsing and standardization
•  Proposed formats
–  Tab separated triples in ASCII file with labels removed
–  MMIO ASCII format: math.nist.gov/MatrixMarket
Amazon cloud services will be used to host particularly large data sets
Graph Challenge - 10
The Matrix Market Exchange Formats: Initial Design, Boisvert, Pozo & Remington (1996)
erc
esA
Ke ust w
lly ri illn
BB a
ichjo
att
thhn
les
ary
be
ele
ktro anns
he
na
ut1 a
utar
Jede
sc
jim
bo
diah
op
bv
SP
es
ez
ro
Cyclin
eti
yaurlana
dyt
gnew
thle
Roc
tbw
te
sfeed
ans
samp
amA ns
lu
a
les
M
bth
muO
i b rafa
o1 ne
i
joshca
am Netm
_
O
tieurm
z hersvm
ganebo
biozul
YourAno
mo
HUBB be
ahn
taBjaee
torma
gg
oks
nXploit
i_erian
zar
sh
LE_spckyfe an cTionk
boo_eb
ks ooks
erd9
ace rreiraikam
bike
To9
snob
ther
sander
Spac dk s_ed
edo
a_enyc
h
books
sonmic uardng
eC ords tek yTec
h ic
hae hahe
rlie mu
ItalianFD
MarineTckUK
hilto
firmreadgoldtris
yami
kayla_ebo
Ideen
ellla_
wand
raffi
tesse
Yorgosh
er
tealc
Mauerfal
birds
erer
n
avalo
l89 oks
mo
fallsinJe
nLucceitanm
getlychee
__agwa
lent
open
Piqua kreno
zfson
D3rB3obacht3r
DebConf
osxnt
jlan
TheMatja
gda
z
Sklcrshrmtn
le
oursdemer
samhocevar Lucio
cijournalism
YaiAou
Astero
The_SolarS
idEne
IO
ystem
rgy
mathpunk
Fl0range
Street_Dogglevudev
yla
G
nda
zm
m
m
Wikipedia Voting
alld
jcsjay
csdotc
jcs om
JM
XX
X
to
ky
ph
oe
nixs
Be
nts
ha
M
ed
lu
Va
tra
ns
ee
fV uis
Pro ekh
Bro
nt uib
e
ials
de H
of0fe0
on
ater
eCing
fm
sp
ILiv
n
zW y
rre
billo
eie
Blit keel
in
co
p
lo
e_av
de
p
Sho
atB
ap
ShoKN
Uru
La rk ss_
ss
a le
erg
w
eD uthe
e 00p
3tnO
Th fatim
liz
l3 vy_Cre
In 9e0jp e
ca infil
avs
en 20 ric
Lo
HL
th
ndro799imJPion
sP
eSpto
aJe
lig
2
lla
odry
moo
oklCh Tire
She
CC
dP
g bHvied mm
enn
leda coDa g isla uk Goo
ee
sl1pe Ja dSmta
eirsv
un is kayg
lly eieeidg_
ago
ke Swdmro s ISorror
en wsom om RCte
frod
illym
lle
ra e c ndr RCAn
phlast
en
T
tie
ya
cgoa nN coineaslp
gazin
4kmith
usleb
u ticho
th no ainDeb
errls
lecy
ale
tism
erMa
autis ssivau
ira ckstca
uic A st ysSea
tor
go
fraukeld
pre
SX
ni
H i sununteVer
ica Numm
spst ks
r75 tok
de
Au
th
nF
n
oo
ca
mr
vo
nde
wd Debod
386_ 3Mch
lo oP
uin eb 16
s sc13ns Franz wo eleben
ff
TA
m Mw
pQto 6 i_dude s y erlo
urx smau uzat s_Koll
he 6ra6nd
hory
us532 sy ks kg
traeum
mer
T
au
_m
bz
ul
4b
TL
_bolte
n
pS
im
erC flr lcdc
tte
o_h oka nt_Thoma
matthich
die
Ibasw
SBhi
ett ca rm
neahe
al
anto
euNetzros
swad
RA
thusa
ers
yipo
scnd
rdluu
augru
riF 2
EL_
trcoipn mille
rd4U_
m
one
_jo
elizabe
Dtoistitzere
Ata 234
Ne
Fr
m
th
SPIEG fibre
eb
orp mitS
at
ni
rg
ba herc
tonic
ro
da
ntoon Kad
rech
eo
So
eC om
ana
mipham
jonoevheat Gajam
die
H
enSek
dPCcam
rs Cus
calsilcock
al
ulist sin
C
ug
lan _N
hremWasTaken
Bundes_G
alc din
No vir abn
keska
ibNorC
lec PerE
LUrlieS
Tom
tin
tail
ior
s
Cha
litt
AC
acmebench
McN
s geln
ip11 To ey
chaospira
CrowdTRe
lej
erdo
indie
casa
philsweeney
eliGN
pdhalili
on ver
RISI
auditdi
alk
OlafScholz
rtorseregu
_girlveRo
raisen
VElet linux jimfilm
esR
plan
eyer
Ro
Jam
rshall_
Nienor_
t
theodoricm
4
a_ma
etnaS
Lilia
cavatica
urrier
MontrealSauce
rcasticlArchip
fpa ey_va _mattio
coracchri
e201
Sa
ultweet
supaheld
landonfulle
DrPr hiltil IVAW
resentf
arkila langl
Ph
flamsm
tions
PoliceVideo
mur
ParteiHH
accessjames
esaopera
bradfitz
bspector
anarchival
sleepyle UnmagicalMe
x
cznweb
yawnbo
EightTons
Arb
Gar
m
willie
Infola it
ikis
dire
TACp
Sureca are
llsttArt
pa
od
shca
Hot uug
g
pestnd
GISg
vid
Spo ee
k
rack
am
t
deerv
equarte
y BAES
nodesecu
ystems
ichetandhem
rity
_AI
bre
minoad7 nolaforensix
AcademiaObscura
l BitquarkBot
martinbubertendart
romainle
miratim
sitelutions
TK5EPfujin_ i
ne ge
mu
ihmd
elizae
man
ksau
and idcom
peoE
wZeal
dotG
PureNe cpu
h_
nalytics
l iya
RulerA
avemd
Ha
an
n1dq
reNZtr
Z ulliv
Explo
s es
pNattrs
tem m
Scoozim
ysh_ga
er4
lana
ksid1HA
ng
fis
iusS
t
a
tre
lly
e
ud je an
dpKhal
dyR ouln
Clo qm
V
An
iallu
s
ofic
mile
p
_S
JD
Ap
l
ge
Yo
an
n
in
mie
M
te
jim
NT
O kurs
IC
au
is
H
es_
am
nrtG
diau
oad
ez R ey
tin
ds
m
in
m
r _L
su thy
nki o
iko Tim
therealred
gutweiler
JayKyuu
TheGutes
vnik5287
eill
nho
Tural30
Ch1 coliccn
DomSpoerli
stertpraetorianlabs
eckm
Sock
arx etIO
Am
M
clear
anHa ob
ile
hej
_S us login
ec
m 107
sw In djdrdika
Me
run
itc te av rso
lnik Midma lbour
hing
rFF ies7 gg
a_rek rketCI neGe
uz
davidth
toaxgu
O
Aio
sya
ek
P n2te
m
odey
nsI 4Vchinsi
s
Nftas
icky de
Th C Gr_
thecre
an s
e haJHsv
veet
G po bittaGin ndWod
leillK
hem
raM n
rm gerD rsohlenmis
cFieose
bem
an o karisaer
rdD
dJa
rwso
g
Fauta
ha
nM
tuA
og nixnd
ajou
thrertis
hipa
ya
ey
in t
e
es
2n
kt
esh
terd
kO
ct3
luqm
z ay
u3re
psy
an
_h
ak
im
_y
_ton
ch
y_
ris
L
ric
ha adyD
ha
ph
rds
ate sl
ete am
je
ctiv qu
rin
es1 ang5
a
D
78
ow
57
nto
8
nA
bb
ey
GraphChallenge Data Formats
Outline
•  Introduction
•  Data Sets
•  Static Graph Isomorphism Challenge
•  Metrics
•  Summary
Graph Challenge - 11
Static Subgraph Isomorphism Problem
Vertex-Based
Array-Based
•  Given a sub-graph H and
a larger graph G
•  Given sub-graph adjacency array B and
a larger graph adjacency array A
•  Is there a 1-to-1 mapping of the vertices
in H to vertices in G such that every
edge in H is also in G ?
•  A subgraph isomorphism exists iff there
is a permutation array P where
B = PT A P
Bold uppercase indicates a matrix, Bold lowercase indicates a vector, lowercase indicates scalars, sets, ...
Graph Challenge - 12
Scaling Observations
•  Two key parameters govern the complexity of subgraph isomorphism
–  Size of G and H
–  Nominal complexity is |G||H|
•  Large G and H is may be too challenging
•  Large G or H is more practical
•  Small G and G and H are of same order is less interesting
–  Focus of program is large graphs
•  Large G and small H (G is much larger than H) is more interesting
–  Example: H is a triangle (i.e., triangle finding algorithm)
–  Example: H is a k-truss (i.e., k-truss algorithm)
Graph Challenge - 13
Triangle Counting
•  Triangles are the most basic non-trivial subgraph
–  Set of three mutually adjacent vertices in a graph
•  Number of triangles in a graph is an important metric used in applications such as
– 
– 
– 
– 
– 
– 
Social network mining (e.g.: clustering coefficient calculation)
Link classification and recommendation
Cyber security
Intelligence
3
Functional biology
1
Spam detection
•  Example of triangles in a Graph
–  Triangle 1 : Nodes 3,4,5
–  Triangle 2 : Nodes 2,4,5
Graph Challenge - 14
Counting and Sampling Triangles from a Graph Stream
A. Pavan, K. Tangwongsan, S. Tirthapura & Kun-Lung Wu, VLDB 2013
4
2
5
2
1
Triangle Counting and Enumeration
- Example Algorithm -
•  Lower complexity array approach
gle listing
results for
SNAP datasets.
•  Triangle
enumeration
has big I/O
•  Triangle counting has minimal I/O
Article
Graphing trillions of triangles
Information Visualization
1–10
! The Author(s) 2016
Reprints and permissions:
sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/1473871616666393
ivi.sagepub.com
Paul Burkhardt
rkhardt
hich every vertex is connected to every other vertex,
d a cycle, i.e. a path with identical endpoints.
iangles in graphs are useful for complex network
alysis based solely on structural, i.e. adjacency,
ormation. Let D(v) be the local count of triangles
volving vertex v and D(G) be the global count of trigles in G. There can be many more triangles than
rtices and edges, and if the graph is completely concted, where G itself is a clique, the total number of
Graph Challenge - 15
5
Abstract
The increasing size of Big Data is often heralded but how data are transformed and represented is also profoundly important to knowledge discovery, and this is exemplified in Big Graph analytics. Much attention has
been placed on the scale of the input graph but the product of a graph algorithm can be many times larger
than the input. This is true for many graph problems, such as listing all triangles in a graph. Enabling scalable graph
to algorithms, architectures, and visual analy2 s exploration for Big Graphs requires new
1 approaches 2
s
tics. A brief tutorial is given to aid the argument for
thoughtful
representation
v of data in the context of graph
v
2
analysis.
Then a new
1
2 s algebraic method to reduce the arithmetic operations in counting and listing triangles in
graphs
a scalable triangle listing algorithm in the MapReduce model will be preij
6 isijintroduced. Additionally,
sented followed by a description of the experiments with that algorithm that led to the current largest and
fastest triangle listing benchmarks to date. Finally, a method for identifying triangles in new visual graph
exploration technologies is proposed.
Theorem 1. Given G P
and the Hadamard product
(A
(A A) and D(G) =
P A), then D(v) =
(A A) .
Proof. The paths of length two, i.e. 2-paths, are identified by A2 ; then element-wise multiplication A A2
Keywords
retains
only those
fv, ugvisualendpoints
2-paths
that are
Graph,
scalable algorithms,
triangle counting,
analytics, parallelof
programming,
MapReduce
also the fv, ug 2 E endpoints of edges. Then the sum
connected triplets in a graph2or network of entities,
Introduction
th
column
vector
of AwhereAeachisvertex
the
of
elements
in
the
v
often depicted
as a triangle
of the
Data, data, data, and more data! Obtaining data is just
s
s
triangle represents an entity that is connected
1 3to the
Triangle Counting using miniTri
- Example Algorithm •  Array based algorithm
–  Given a graph adjacency array A and incidence array E
3
C = AE
nT = nnz(C)/3
4
5
•  Multiplication is overloaded such that
2
C(i,j) = {i,x,y} iff
A(i,x) = A(i,y) = 1 and ET(x,j) = ET(y,j) = 1
0
B
B
A
=
A B
B
@
0
0
0
0
1
Graph Challenge - 16
0
0
0
1
1
0
0
0
1
1
0
1
1
0
1
1
1
1
1
0
1
C
C
C
C
A
0
B
B
T
B
=
E B
B
@
1
0
0
0
1
0
1
0
0
1
0
0
1
0
1
0
0
1
1
0
0
1
0
1
0
0
0
0
1
1
1
C
C
C
C
A
0
B
B
C=AET ==BB@
;
;
;
;
;
;
;
;
;
;
;
{2, 4, 5}
;
;
;
;
;
{3, 4, 5}
; {4, 2, 5} {4, 3, 5}
;
;
;
;
;
;
{5, 3, 4} {5, 2, 4}
;
miniTri represented in compact linear algebra-based operations
Kokkos/Qthreads Task-Parallel Approach to Linear Algebra Based Graph Analytics
Wolf, Edwards & Olivier, IEEE HPEC 2016
1
1
C
C
C
C
A
K-Truss
•  A graph is a k-truss if each edge is part of at least k-2 triangles
•  A generalization of a clique (a k-clique is a k-truss), ensuring a minimum level of
connectivity within the graph
•  Traditional technique to find a k-truss subgraph:
–  Compute the support for every edge
–  Remove any edges with support less than k-2 and update the list of edges
–  When all edges have support of at least k-2, we have a k-truss
Example 3-truss
Graph Challenge - 17
Graphulo: Linear algebra graph kernels for NoSQL databases
Gadepally, Bolewski, Hook, Hutchison, Miller & Kepner, IPDPS GABB 2015
K Truss: Array Formulation
- Example Algorithm -
•  If E is the unoriented incidence array
(rows are edges and columns are
vertices) of graph G, and A is the
adjacency array
Algorithm
R=EA
x = find( (R == 2 ) 1 < k − 2 )
while x
Ex = E(x,∶)
•  If G is a k-truss, the following must
be satisfied
–  (E A == 2) 1 > (k – 2)
E = E(xc,∶)
R = E(xc,∶) A
R = R − E ( ExExT − diag(ExExT) )
x = find( (R == 2 ) 1 < k − 2 )
Graph Challenge - 18
Graphulo: Linear algebra graph kernels for NoSQL databases,
Gadepally, Bolewski, Hook, Hutchison, Miller & Kepner, IPDPS GABB 2015
Outline
•  Introduction
•  Data Sets
•  Static Graph Isomorphism Challenge
•  Metrics
•  Summary
Graph Challenge - 19
Metrics
•  Correctness
–  Number of triangles is exact for a given graph
–  Enumerating all k-trusses is exact for a given graph
•  Performance
– 
– 
– 
– 
– 
Total number of edges in graph (edges)
Execution time (seconds)
Rate (edges/second)
Energy (Watts)
Rate per energy (edges/second/Watt)
Graph Challenge - 20
Summary
•  Static sub-graph isomorphism graph challenge consists of
–  Triangle finding
–  K-Truss
•  An implementation will perform these algorithms on provided data sets and report the
given metrics
•  A submission to the IEEE HPEC Graph Challenge consists of a conference style paper
describing the approach, implementation, innovations, and results
•  Both hardware and software should be described in solution
–  Innovative hardware solutions are of interest (in addition to best algorithm for hardware)
–  Special interest in performance for very large scale data sets (1-100 trillion edges)
Graph Challenge - 21