
Application of Graph Theory
to OO Software Engineering
Alexander Chatzigeorgiou, Nikolaos Tsantalis, George Stephanides
Department of Applied Informatics
University of Macedonia
Thessaloniki, Greece
WISER 2006, May 20, 2006, Shanghai, China
Motivation
• Application of Graph Theory to SE is not new:
Planning: network diagrams (CPM, PERT)
Analysis: DFDs, FSMs, Petri Nets
Design: everything is essentially a graph
Testing: McCabe's complexity measure
...
• Graph Theory is suitable for object-oriented SE:
Class diagrams can be perfectly mapped to graphs
System Representation
Identification of "God" classes
• Goal: to identify heavily loaded classes of an OO design
• such "God" classes imply a poor model
• Inspiration comes from the Web (HITS algorithm)
HITS
[Figure: two sets of web pages around "MyHumble HomePage". Relative importance low: Eat Anything, Super Cars, Car Loans, Anti-Wrinkle, Mykonos, Alternative Music. Relative importance high: SEI, IEEE TSE, ACM SIGSOFT, ICSE, GoF, NASA.]
Identification of "God" classes
• OO system : directed graph G=(V, E)
• classes → vertices
• associations → edges
• Each edge is annotated with an integer $m_{p,q}$, the number of discrete messages sent in the same direction, from p to q.
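[Figure: vertex p with authority weight $\alpha_p$ and hub weight $h_p$. Incoming edges from q1, q2, q3 (hub weights $h_{q1}$, $h_{q2}$, $h_{q3}$) carry $m_{q1,p}$, $m_{q2,p}$, $m_{q3,p}$; outgoing edges to q1, q2, q3 (authority weights $\alpha_{q1}$, $\alpha_{q2}$, $\alpha_{q3}$) carry $m_{p,q1}$, $m_{p,q2}$, $m_{p,q3}$.]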
Identification of "God" classes
[Figure: example graph of four classes (1 to 4) exchanging message1 to message10; each edge is annotated with its number of messages.]
Authority weights:
$a_1 = 0 h_1 + 1 h_2 + 0 h_3 + 0 h_4$
$a_2 = 2 h_1 + 0 h_2 + 1 h_3 + 1 h_4$
$a_3 = 0 h_1 + 1 h_2 + 0 h_3 + 1 h_4$
$a_4 = 0 h_1 + 1 h_2 + 2 h_3 + 0 h_4$
Hub weights:
$h_1 = 0 a_1 + 2 a_2 + 0 a_3 + 0 a_4$
$h_2 = 1 a_1 + 0 a_2 + 1 a_3 + 1 a_4$
$h_3 = 0 a_1 + 1 a_2 + 0 a_3 + 2 a_4$
$h_4 = 0 a_1 + 1 a_2 + 1 a_3 + 0 a_4$
In matrix form: $a = A^T h$, $h = A a$, with
$A = \begin{pmatrix} 0 & 2 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 1 & 1 & 0 \end{pmatrix}$
Identification of "God" classes
• Using theorems from Linear Algebra, authority/hub weights can be
obtained by finding the principal eigenvectors of $A^T A$ and $A A^T$, respectively
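The step from the update rules to eigenvectors can be spelled out. The derivation below uses only the relations $a = A^T h$ and $h = A a$, applied repeatedly with a rescaling after each update; the iteration index $k$ is notation introduced here.

```latex
a^{(k+1)} \propto A^{T} h^{(k)}, \qquad h^{(k+1)} \propto A\, a^{(k+1)}
\;\Longrightarrow\;
a^{(k+1)} \propto A^{T} A\, a^{(k)}, \qquad h^{(k+1)} \propto A A^{T} h^{(k)}
```

Repeating the two updates is therefore power iteration on $A^T A$ and $A A^T$. Provided the largest eigenvalue is simple and the starting vector is not orthogonal to its eigenvector, the normalized iterates converge to the principal eigenvectors.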
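[Figure: class diagram of the example system. Classes (vertex numbers in parentheses): Button (1), Door (2), Oven (3), Timer (4), Power Tube (5), Light (6), Beeper (7). Messages shown: doorClose, doorOpen, isOpen, cook, cancel, turnOn and turnOff (each shown twice), setTimeZero, countDown, add60sec, expired, beep.]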
Identification of "God" classes
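[Figure: graph of the example system with vertices 1 to 7 and message counts on the edges.]
Adjacency matrix (row p, column q holds $m_{p,q}$):
$A = \begin{pmatrix}
0 & 0 & 2 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 3 & 2 & 2 & 1 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}$
Normalized authority and hub weights:
$a_n^T = [0, 0.229, 0, 0.688, 0.459, 0.459, 0.229]$
$h_n^T = [0, 0, 1, 0, 0, 0, 0]$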
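The weights above can be reproduced with a few lines of power iteration. This is a minimal sketch, not the authors' implementation: the function names `normalize` and `hits` and the iteration count are assumptions, and the matrix is transcribed from the slide (row p, column q holds the number of messages from class p to class q).

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def hits(A, iterations=100):
    """Alternate a = A^T h and h = A a, normalizing after each update."""
    n = len(A)
    h = [1.0] * n
    a = [1.0] * n
    for _ in range(iterations):
        a = normalize([sum(A[p][q] * h[p] for p in range(n)) for q in range(n)])
        h = normalize([sum(A[p][q] * a[q] for q in range(n)) for p in range(n)])
    return a, h

# Adjacency matrix of the seven-class example (transcribed from the slide).
A = [
    [0, 0, 2, 0, 0, 0, 0],
    [0, 0, 2, 0, 0, 0, 0],
    [0, 1, 0, 3, 2, 2, 1],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

authority, hub = hits(A)
print([round(x, 3) for x in authority])  # [0.0, 0.229, 0.0, 0.688, 0.459, 0.459, 0.229]
print([round(x, 3) for x in hub])        # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

Class 3 ends up with the entire hub weight, which is how the heavily loaded class shows up in this example.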
Clustering
• Goal: to partition the system into strongly communicating classes
• might imply relevance of functionality
• might imply possible reusable components
• Spectral graph partitioning employs the degree matrix D (a diagonal
matrix containing the degrees of the vertices) and the Laplacian
matrix, defined as L = D - A
• the smallest eigenvalue of L is always zero
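Why the smallest eigenvalue is zero can be seen in two lines. The sketch assumes an undirected graph with symmetric (possibly weighted) adjacency matrix $A$ and degrees $d_i = \sum_j A_{ij}$; $\mathbf{1}$ denotes the all-ones vector.

```latex
(L\mathbf{1})_i = d_i - \sum_j A_{ij} = 0 \quad \text{for every vertex } i,
\qquad
x^{T} L x = \sum_{\{i,j\} \in E} A_{ij}\,(x_i - x_j)^2 \;\ge\; 0 .
```

The first identity shows that $\mathbf{1}$ is an eigenvector with eigenvalue 0; the second shows that no eigenvalue is negative, so 0 is the smallest.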
Clustering
• The properties of the eigenvector $x_2$ associated with the second
smallest eigenvalue $\lambda_2$ have been explored by M. Fiedler
• Clustering a graph G into two sub-graphs according to the positive
and negative entries of the Fiedler vector corresponds to a partition
that minimizes the weight of the cut set.
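The sign-based split can be tried on a toy graph. The sketch below is an illustration under stated assumptions: the graph (two triangles joined by one light edge) is hypothetical and not the one from the slides, and the Fiedler vector is approximated by power iteration on cI - L with the constant vector projected out, a pure-Python stand-in for a proper eigensolver.

```python
import math

def fiedler_vector(W, iterations=2000):
    """Approximate the eigenvector of L = D - W for the second smallest eigenvalue."""
    n = len(W)
    deg = [sum(row) for row in W]
    c = 2.0 * max(deg)  # Gershgorin bound: no eigenvalue of L exceeds 2 * max degree
    x = [float(i) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iterations):
        # Project out the all-ones eigenvector (eigenvalue 0 of L).
        mean = sum(x) / n
        x = [v - mean for v in x]
        # y = (c I - L) x = c x - D x + W x
        y = [(c - deg[i]) * x[i] + sum(W[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x

def cut_weight(W, x):
    """Total weight of edges whose endpoints fall on different sides of the sign split."""
    n = len(W)
    return sum(W[i][j] for i in range(n) for j in range(i + 1, n)
               if (x[i] >= 0) != (x[j] >= 0))

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the light edge 2-3.
edges = [(0, 1, 5), (0, 2, 4), (1, 2, 6), (3, 4, 7), (3, 5, 5), (4, 5, 6), (2, 3, 1)]
n = 6
W = [[0] * n for _ in range(n)]
for u, v, w in edges:
    W[u][v] = W[v][u] = w

x2 = fiedler_vector(W)
side = [v >= 0 for v in x2]
print(side, cut_weight(W, x2))
```

In this example the sign split separates the two triangles, so the only cut edge is the bridge of weight 1.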
[Figure: example weighted graph with edge weights 9, 11, 1, 6, 6, 7, 7.]
Clustering
[Figure: three alternative partitions of the example graph, with cut-set weights 17, 18 and 1; the partition with cut-set weight 1 is the one provided by the Fiedler vector.]
Clustering
• Application to OO systems: edges are undirected and the edge weight
is the sum of the numbers of messages exchanged in both directions
• Partitioning is performed iteratively
• When to stop? When a resulting graph is less cohesive than the
parent graph
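Turning directed message counts into the undirected weights used for partitioning is a one-line symmetrization. The sketch below assumes a small hypothetical message matrix `M` (row p, column q holds the messages from p to q); the helper names are illustrative, and the cohesion-based stopping rule is not covered because the slides do not define the cohesion measure.

```python
def undirected_weights(M):
    """w[i][j] = messages i->j plus messages j->i; self-loops are ignored."""
    n = len(M)
    return [[M[i][j] + M[j][i] if i != j else 0 for j in range(n)]
            for i in range(n)]

def laplacian(W):
    """L = D - W, with D the diagonal matrix of weighted degrees."""
    n = len(W)
    deg = [sum(row) for row in W]
    return [[(deg[i] if i == j else 0) - W[i][j] for j in range(n)]
            for i in range(n)]

# Hypothetical directed message counts between three classes.
M = [
    [0, 3, 0],
    [1, 0, 2],
    [0, 0, 0],
]

W = undirected_weights(M)
L = laplacian(W)
print(W)  # [[0, 4, 0], [4, 0, 2], [0, 2, 0]]
print(L)  # [[4, -4, 0], [-4, 6, -2], [0, -2, 2]]
```

Every row of L sums to zero, which is the property behind the zero eigenvalue mentioned earlier.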
Clustering
[Figure: example system with the classes InputForm, Entity1, Entity2, BusinessLogic, Connection, Result, Statement, Confirmation, DB and MainFrame (vertices 1 to 10); edges are annotated with the number of messages exchanged.]
Clustering
[Figure: the example system partitioned into the clusters GUI, Logic and DB.]
$x_2^T = [0.491, 0.491, 0.192, 0.285, 0.480, 0.410]$
$x_2^T = [0.446, 0.359, 0.317, 0.359, 0.152, 0.152, 0.108, 0.313, 0.388, 0.366]$
Design Pattern Detection
• Design Patterns (descriptions of communicating classes): form
solutions to common problems
• According to Parnas, software engineering deals with multi-version
projects
• Multiple Versions + Large Number of Components =
Complicated and messy architecture
• Patterns impose structure
• Consequently, the identification of implemented patterns
• is useful for understanding an existing design
• enables further improvements
Design Pattern Detection
[Figure: the system under study and the sought design pattern are each represented as a set of matrices plus further annotations; the pattern is matched against the system.]
[Figure: a UML class diagram (classes A, D, X, Y, Z with methods doIt(), doX(), doY(), doZ()) is mapped to a graph representation and then to a representation as a set of matrices, for both system and pattern.]
Design Pattern Detection
• Classical pattern matching algorithms fail since patterns often
differ from the standard representation
[Figure: a pattern and two system segments (System Segment 1, System Segment 2), with vertices labelled A, B, C, a, b, 1, 2.]
Design Pattern Detection
• Exploiting recent research on graph similarity [Blondel2004] it is
possible to measure the degree of similarity between two vertices
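The vertex similarity iteration of [Blondel2004] can be sketched in a few lines. This is a minimal pure-Python version, not the detection tool described in the slides: it iterates $S \leftarrow (B S A^T + B^T S A)$, normalizes by the Frobenius norm, and stops after an even number of steps. The function names and the test graph (a directed path 1 → 2 → 3 compared with itself) are assumptions for illustration, and both graphs are assumed to contain at least one edge.

```python
import math

def matmul(X, Y):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def similarity(A, B, steps=100):
    """S[i][j] scores how similar vertex i of graph B is to vertex j of graph A."""
    assert steps % 2 == 0, "the limit is taken over even iterates"
    S = [[1.0] * len(A) for _ in B]
    At, Bt = transpose(A), transpose(B)
    for _ in range(steps):
        T1 = matmul(matmul(B, S), At)
        T2 = matmul(matmul(Bt, S), A)
        S = [[T1[i][j] + T2[i][j] for j in range(len(A))] for i in range(len(B))]
        norm = math.sqrt(sum(v * v for row in S for v in row))
        S = [[v / norm for v in row] for row in S]
    return S

# Directed path 1 -> 2 -> 3, compared with itself.
path = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
S = similarity(path, path)
for row in S:
    print([round(v, 3) for v in row])
```

For this self-comparison each vertex is most similar to itself and the off-diagonal scores vanish. The scores printed here are Frobenius-normalized, so their scale differs from the per-vertex values shown on the slides.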
Design Pattern Detection
[Figure: the pattern and the two system segments split into generalization graphs and association graphs (vertices A, B, C, a, b, 1, 2). Vertex similarity scores of 0.5 and 1 are shown for the generalization graphs, and scores of 1 and 0 for the association graphs.]
Design Pattern Detection
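[Figure: similarity matrix for the vertices a, b and 1, 2 of System Segment 1 and System Segment 2 against the pattern (A, B, C), with entries 0.5, 0 in the first row and 0, 0.5 in the second.]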
Design Pattern Detection
• Experimental Results:
JHotDraw v5.1 (172 classes)
JRefactory 2.6.24 (572 classes)
JUnit 3.7 (99 classes)
Design Pattern Detection
Design Patterns      JHotDraw v5.1    JRefactory v2.6.24    JUnit v3.7
                     TP    FN         TP    FN              TP    FN
Adapter*/Command     18    0          7     0               1     0
Composite            1     0          0     0               1     0
Decorator            3     0          1     0               1     0
Factory Method       2     1          1     3               0     0
Observer             5     0          0     0               4     0
Prototype            1     0          0     0               0     0
Singleton            2     0          12    0               0     0
State/Strategy       22    1          11    1               3     0
Template Method      5     0          17    0               1     0
Visitor              1     0          2     0               0     0
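The TP/FN counts can be condensed into one recall figure per system, recall = TP / (TP + FN). The sketch below is a reading aid rather than a result stated on the slides: the (TP, FN) pairs are transcribed from the table in pattern order, and the names `results` and `recall` are illustrative. Precision is not computed because no false-positive counts are given.

```python
# (TP, FN) per pattern, in the order: Adapter*/Command, Composite, Decorator,
# Factory Method, Observer, Prototype, Singleton, State/Strategy,
# Template Method, Visitor.
results = {
    "JHotDraw v5.1": [(18, 0), (1, 0), (3, 0), (2, 1), (5, 0),
                      (1, 0), (2, 0), (22, 1), (5, 0), (1, 0)],
    "JRefactory v2.6.24": [(7, 0), (0, 0), (1, 0), (1, 3), (0, 0),
                           (0, 0), (12, 0), (11, 1), (17, 0), (2, 0)],
    "JUnit v3.7": [(1, 0), (1, 0), (1, 0), (0, 0), (4, 0),
                   (0, 0), (0, 0), (3, 0), (1, 0), (0, 0)],
}

def recall(pairs):
    """Fraction of actual pattern instances that were detected."""
    tp = sum(t for t, _ in pairs)
    fn = sum(f for _, f in pairs)
    return tp / (tp + fn)

for system, pairs in results.items():
    print(f"{system}: recall = {recall(pairs):.3f}")
```

With these counts the totals are 60 TP / 2 FN, 51 TP / 4 FN and 11 TP / 0 FN respectively.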
Scale-Freeness of OO Systems
• Popular topic: investigation of whether certain systems
(technological, biological, social, etc.) are scale-free
• A scale-free phenomenon shows up statistically in the form of a
power law.
• For a network, the probability P(k) that a node in the network
connects with k other nodes is $P(k) \sim k^{-\gamma}$
Scale-Freeness of OO Systems
• Naturally, research has also focused on OO systems
• Scale-freeness is usually graphically detected, since the relationship
of P(k) vs. k, plotted on a log-log scale, appears as a line with slope -γ
[Figure: log-log plot of cumulative frequency vs. k for JUnit, JHotDraw and JRefactory.]
Scale-Freeness of OO Systems
[Figure: panels (a) to (e); log-log plot of cumulative frequency vs. vertex degree.]
Degree Sequence = {16, 8, 8, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}
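The graphical check can be reproduced numerically for this degree sequence. The sketch below counts how many vertices have each degree, builds a cumulative frequency (assumed here to mean the number of vertices with degree at least k), and fits a least-squares line to the log-log points; the helper name `loglog_slope` is illustrative.

```python
import math
from collections import Counter

# Degree sequence from the slide: one 16, two 8s, four 4s, eight 2s, sixteen 1s.
degrees = [16] + [8] * 2 + [4] * 4 + [2] * 8 + [1] * 16

freq = Counter(degrees)  # number of vertices with degree exactly k
ks = sorted(freq)
cumulative = {k: sum(1 for d in degrees if d >= k) for k in ks}

def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    den = sum((x - mx) ** 2 for x in lx)
    return num / den

slope = loglog_slope(ks, [freq[k] for k in ks])
print(dict(freq))        # {16: 1, 8: 2, 4: 4, 2: 8, 1: 16}
print(cumulative)        # {1: 31, 2: 15, 4: 7, 8: 3, 16: 1}
print(round(slope, 6))   # -1.0
```

Here the number of vertices with degree k is exactly 16/k, so the frequency points lie on a straight line of slope -1 in the log-log plot.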
Scale-Freeness of OO Systems
• Recently, in [Li2005], a structural metric has been proposed to
evaluate the scale-freeness of a network.
• For an undirected, simple and connected graph $g = (V, E)$, with $d_i$ the degree of vertex $i$:
$s(g) = \sum_{(i,j) \in E} d_i d_j$
• The metric value is maximized when high-degree nodes ("hubs") are
connected to other high-degree nodes.
• Among all graphs having the same degree sequence, there is a graph
that maximizes the metric (value $s_{max}$) and a graph that
minimizes it (value $s_{min}$). Thus:
$S = \frac{s(g) - s_{min}}{s_{max} - s_{min}}$
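Computing $s(g)$ from an edge list is straightforward. The sketch below uses hypothetical function names and two tiny example graphs; finding $s_{max}$ and $s_{min}$ requires a search over all graphs with the same degree sequence and is not attempted here, so the normalization takes both as inputs.

```python
def s_metric(edges):
    """Sum of d_i * d_j over all edges (i, j) of an undirected simple graph."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sum(degree[u] * degree[v] for u, v in edges)

def normalized_s(s, s_min, s_max):
    """S = (s - s_min) / (s_max - s_min); s_min and s_max must be supplied."""
    return (s - s_min) / (s_max - s_min)

path = [(0, 1), (1, 2), (2, 3)]  # degrees 1, 2, 2, 1
star = [(0, 1), (0, 2), (0, 3)]  # degrees 3, 1, 1, 1

print(s_metric(path))  # 1*2 + 2*2 + 2*1 = 8
print(s_metric(star))  # 3*1 + 3*1 + 3*1 = 9
print(normalized_s(8, s_min=6, s_max=10))  # 0.5, with made-up bounds
```

The path and the star have different degree sequences, so they would not be compared through S; they only illustrate the arithmetic.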
Scale-Freeness of OO Systems
Given such a metric, it is possible:
• to validate whether a given OO system is scale-free
• to assess whether an optimization increases scale-freeness
• to evaluate the evolution of systems in terms of scale-freeness
[Figure: scale-free metric S (axis range 0.3 to 0.65) across successive versions of JUnit, JHotDraw and JRefactory.]
Conclusions
• Graph Theory has been widely applied in several CS fields
• It can provide a powerful "tool" for analyzing OO systems
• quantification of properties
• identification of structures
• Graph Theory is important for CS curricula
Thank you for your attention