Software Networks

Christian Bird
Computer Science Dept.
UC Davis
A network like any other
• A software network is made up of
– Nodes: software artifacts
– Edges: relationships between those artifacts
(may be directed or undirected)
[Figure: example software network. Nodes: function, module, class, file. Edge labels: imports, requires, co-committed, includes.]
Nodes
• The nodes in a software network usually
represent software artifacts at various
levels of granularity
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries
Nodes
– Functions (3000 in Apache), for example:
    int add (int a, int b) {
        printf("%i + %i = ", a, b);
        int c = a + b;
        printf("%i\n", c);
        return c;
    }
Nodes
– Classes, for example:
    class Logger {
        int logItem(Object item, int level) {
            stuff…
        }
        int logError(String msg) {
            more stuff…
        }
        more functions…
    }
Nodes
– Files (300 in Apache), for example math.c:
    float absoluteValue(float a) {
        return a > 0 ? a : -a;
    }
    void printName(char *name) {
        printf("Hello %s\n", name);
    }
    more functions…
Nodes
– Modules/Packages, for example a set of related classes:
    class Logger {
        stuff…
    }
    class LogMessage {
        stuff…
    }
    class LogError {
        stuff…
    }
    more classes…
Nodes
– Directories (65 in Apache), for example a directory containing:
    /apache/http-2.0/server/core/handle.c
    /apache/http-2.0/server/core/serve.c
    /apache/http-2.0/server/core/cgi.c
    /apache/http-2.0/server/core/locking.c
Nodes
– Libraries (25 in Apache). Example list of shared libraries:
libkdeinit_konqueror.so
libkonq.so.4
libkutils.so.1
libkio.so.4
libkdeui.so.4
libkdesu.so.4
libkdecore.so.4
libDCOP.so.4
libdl.so.2
libresolv.so.2
libutil.so.1
libart_lgpl_2.so.2
libidn.so.11
libqt-mt.so.3
libpng12.so.0
libXext.so.6
libX11.so.6
libSM.so.6
libICE.so.6
libXrender.so.1
Edges
• Edges in a software network represent a
relationship such as a function call,
instance member, library dependence, etc.
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries
Edges
– Functions, for example (add calls printf):
    int add (int a, int b) {
        printf("%i + %i = ", a, b);
        int c = a + b;
        printf("%i\n", c);
        return c;
    }
Edges
– Classes, for example (Logger inherits from Writer, takes a LogMessage argument, and holds a FileWriter member):
    class Logger inherits Writer {
        int logItem(LogMessage item, int level) {
            stuff…
        }
        int logError(String msg) {
            more stuff…
        }
        more functions…
        FileWriter w
    }
Edges
– Files, for example (calls from math.c to functions defined elsewhere):
    float absoluteValue(float a) {
        return max(a, -a);
    }
    void printName(char *name) {
        printf("Hello %s\n", name);
    }
    more functions…
Edges
– Modules/Packages, for example (each import is a dependency on another package):
    import java.util.*;
    import edu.ucdavis.senses;
    class WirelessSensor {
        …
    }
Edges
– Directories, for example: a function in /apache/http-2.0/server/core/handle.c
  may call a function in /apache/http-2.0/apr-util/hash.c
Edges
– Libraries, for example: library libkdecore.so may need to load libqt3-mt.so,
  which in turn may need to load libX11.so and libm.so, all of which need libc.so
[Figure: dependency graph over libkdecore.so, libqt3-mt.so, libX11.so, libm.so, and libc.so]
Example Callgraph
    #include <stdio.h>

    void printInt(int a) {
        printf("the number is %i\n", a);
    }
    /* defined but never called */
    int add(int a, int b) {
        return a + b;
    }
    int multiply(int a, int b) {
        return a * b;
    }
    int factorial(int a) {
        if (a <= 1) return 1;
        return multiply(a, factorial(a - 1));
    }
    int main(void) {
        printf("calculating 6!\n");
        printInt(factorial(6));
        return 0;
    }
[Figure: callgraph with nodes main, printInt, factorial, printf, multiply, and add; add is never called]
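The callgraph on this slide can be written down as a plain adjacency map and queried. The sketch below is only an illustration: the edges are transcribed by hand from the C code above, and the helper names `reachable` and `never_called` are made up for this example.

```python
# Callgraph from the example, transcribed by hand.
# Keys are callers; values are the functions they call.
callgraph = {
    "main": {"printf", "printInt", "factorial"},
    "printInt": {"printf"},
    "factorial": {"multiply", "factorial"},
    "multiply": set(),
    "add": set(),
    "printf": set(),
}

def reachable(graph, root):
    """Return every function reachable from root by following call edges."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return seen

def never_called(graph, root):
    """Functions present in the graph but not reachable from root."""
    return set(graph) - reachable(graph, root)

print(never_called(callgraph, "main"))
```

With `main` as the root, the only function left over is `add`, which matches the "never called" label in the figure.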
Static versus Runtime Callgraphs
• Static callgraphs are constructed by a syntactic analysis
of the source code
• Pros
– Don’t have to build or run the program
– Can work even when the code has syntactic or semantic errors
– Catches calls for exceptional situations
– Fairly fast
• Cons
– Doesn’t capture weights (how many times each function is called)
– Includes calls in dead code. Example: if (0 == 3) logError(…)
– Doesn’t include calls through function pointers
– Doesn’t include calls to functions in dynamically loaded libraries
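A minimal sketch of static callgraph extraction, assuming Python source and the standard `ast` module as a stand-in for a C parser (the slides do not say which tool was used). The sample program in `SOURCE` and the function name `static_callgraph` are hypothetical.

```python
import ast

# Hypothetical program to analyze; it is parsed, never executed.
SOURCE = '''
def log_error(msg):
    print(msg)

def add(a, b):
    return a + b

def main():
    if 0 == 3:
        log_error("unreachable")
    print(add(1, 2))
'''

def static_callgraph(source):
    """Map each top-level function to the names it calls, without running it."""
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for child in ast.walk(node):
                # Only calls by plain name are recorded. A call through a
                # variable is recorded under the variable's name, not its target.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    callees.add(child.func.id)
            graph[node.name] = callees
    return graph

graph = static_callgraph(SOURCE)
print(graph["main"])
```

The edge from `main` to `log_error` appears even though the `if (0 == 3)` branch can never run, which is the dead-code drawback listed above. No call counts are available either.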
Static versus Runtime Callgraphs
• Runtime callgraphs are constructed by running a piece
of software one or more times and logging the number of
function calls
• Pros
– Includes number of times function calls occur
– Includes calls through function pointers and dynamically loaded
libraries
– Will not include calls in dead code
• Cons
– Requires building the software
– Hard to get complete code coverage
– Can take a long time
– May require a test harness of some kind (especially for
interactive applications) along with test data
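A runtime callgraph can be sketched by logging calls while the program runs. The example below assumes Python and its standard `sys.setprofile` hook in place of a C profiler; `trace_calls` is a hypothetical helper, and `factorial` and `multiply` mirror the earlier example.

```python
import sys
from collections import Counter

def trace_calls(func, *args):
    """Run func(*args) and count caller -> callee edges between Python functions."""
    edges = Counter()

    def profiler(frame, event, arg):
        # "call" fires when a Python function is entered; frame.f_back is its caller.
        if event == "call" and frame.f_back is not None:
            caller = frame.f_back.f_code.co_name
            callee = frame.f_code.co_name
            edges[(caller, callee)] += 1

    previous = sys.getprofile()
    sys.setprofile(profiler)
    try:
        func(*args)
    finally:
        # Restore whatever profiler was active before.
        sys.setprofile(previous)
    return edges

def multiply(a, b):
    return a * b

def factorial(a):
    if a <= 1:
        return 1
    return multiply(a, factorial(a - 1))

edges = trace_calls(factorial, 6)
for (caller, callee), count in sorted(edges.items()):
    print(caller, "->", callee, count)
```

Unlike the static graph, each edge carries a count, but only calls that actually happened during this one run are recorded, so coverage depends on the inputs chosen.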
Differences between callgraphs
and other graphs we’ve seen
• Has a root and commonly will form a tree-like
structure
• Few if any cycles in callgraphs (direct or indirect
recursion is rare)
• Reciprocity is not common due to levels of
abstraction
• Preferential attachment?
– If a function is called by many functions is it more
likely to be called by other functions in the future?
Maybe.
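Two of these properties, reciprocity and the presence of cycles, can be checked directly on an edge list. This is a rough sketch: the edge list is the small example callgraph from earlier, and the definitions used (reciprocity as the fraction of edges with a reverse edge, self-loops ignored) are one common choice, not something the slides specify.

```python
def reciprocity(edges):
    """Fraction of directed edges (u, v), u != v, whose reverse (v, u) also exists."""
    pairs = {(u, v) for u, v in edges if u != v}
    if not pairs:
        return 0.0
    mutual = sum(1 for (u, v) in pairs if (v, u) in pairs)
    return mutual / len(pairs)

def has_cycle(edges):
    """Detect direct or indirect recursion with a depth-first search."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {n: WHITE for n in graph}

    def visit(n):
        color[n] = GRAY
        for m in graph[n]:
            if color[m] == GRAY:
                return True  # back edge: m is already on the current path
            if color[m] == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)

example_edges = [
    ("main", "printf"), ("main", "printInt"), ("main", "factorial"),
    ("printInt", "printf"), ("factorial", "multiply"),
    ("factorial", "factorial"),
]
print(reciprocity(example_edges), has_cycle(example_edges))
```

In this example no pair of functions calls each other, and the only cycle is the direct recursion in `factorial`.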
Software Repositories
• Used in development of virtually any software project
(commercial, personal, OSS, etc.)
• Examples include RCS, CVS, Subversion, Perforce,
BitKeeper, and SourceSafe
• Keeps track of every change to the software, who made
the change, time of change, comments associated with a
change, etc.
• Allows us to view the evolution of a piece of software
• A developer makes changes to software code and then
commits the changes to the software repository with a
description of the changes
Software Networks from Repositories
• The software history allows us to relate
different artifacts in the software
• Create an edge between artifacts (functions, files,
classes) if they were modified in the
same commit
• Create an edge between artifacts if they
were modified by the same developer
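Both constructions can be sketched from a commit log. The log below is invented for illustration (developer names and file lists are hypothetical), and weighting each edge by how often the pair co-occurs is an assumption; the slides only say an edge is created.

```python
from collections import Counter
from itertools import combinations

# Hypothetical commit log: (developer, files modified in that commit).
commits = [
    ("alice", ["handle.c", "serve.c"]),
    ("bob", ["serve.c", "cgi.c", "locking.c"]),
    ("alice", ["handle.c", "serve.c", "cgi.c"]),
]

def co_commit_edges(commits):
    """Undirected edges between files modified in the same commit,
    weighted by the number of commits that touched both."""
    edges = Counter()
    for _, files in commits:
        for a, b in combinations(sorted(set(files)), 2):
            edges[(a, b)] += 1
    return edges

def same_developer_edges(commits):
    """Undirected edges between files modified by the same developer,
    weighted by the number of developers who touched both."""
    touched = {}
    for dev, files in commits:
        touched.setdefault(dev, set()).update(files)
    edges = Counter()
    for files in touched.values():
        for a, b in combinations(sorted(files), 2):
            edges[(a, b)] += 1
    return edges

co = co_commit_edges(commits)
same = same_developer_edges(commits)
print(co.most_common(2))
```

Sorting each file pair keeps the edges undirected, so (a, b) and (b, a) land on the same key.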
Modularity: one use of a callgraph
• The characteristic of a system that has been divided into
smaller subsystems which interact with each other
• Software that is modular has distinct subsystems
(modules) with high levels of interaction within the
subsystems and low levels of interaction between the
subsystems
• Software that is modular is easier to understand and
maintain
Modular OS
[Figure: an operating system divided into modules: Scheduler, Networking, Filesystem, Kernel, I/O devices, Memory Management]
Modularity Case Study using Callgraphs
• “Exploring the Structure of Complex Software Designs: An Empirical
Study of Open Source and Proprietary Code” by Alan MacCormack, John Rusnak, and
Carliss Baldwin
• Created a “Design Structure Matrix” at the file level using function
calls as ties. (i.e. if a function in foo.c calls a function in bar.c then
there is a tie from foo.c to bar.c, non-symmetric)
• Used static analysis to extract the file-level callgraph
• Clustered the DSM using standard clustering techniques
• Metrics used:
– Clustering cost: measure of how many function calls are not within a
cluster
– Propagation cost: measure of how many functions will be affected if a
particular function is modified
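The two metrics can be approximated on a small DSM. This sketch rests on assumptions: propagation cost is computed as the density of the transitive closure of the DSM (with every element counted as reaching itself), and the clustering measure is reduced to a plain count of dependencies that cross cluster boundaries, which is simpler than the cost function in the study. The 4-file matrix and the cluster assignment are hypothetical.

```python
def visibility(dsm):
    """Transitive closure of a DSM (Warshall's algorithm), with each element
    counted as visible to itself."""
    n = len(dsm)
    vis = [[bool(dsm[i][j]) or i == j for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if vis[i][k]:
                for j in range(n):
                    if vis[k][j]:
                        vis[i][j] = True
    return vis

def propagation_cost(dsm):
    """Density of the visibility matrix: the fraction of (i, j) pairs where
    i depends on j directly or indirectly."""
    n = len(dsm)
    vis = visibility(dsm)
    return sum(sum(row) for row in vis) / (n * n)

def calls_outside_clusters(dsm, cluster_of):
    """Count dependencies whose two endpoints lie in different clusters."""
    n = len(dsm)
    return sum(1 for i in range(n) for j in range(n)
               if dsm[i][j] and cluster_of[i] != cluster_of[j])

# Hypothetical 4-file system: dsm[i][j] == 1 means file i calls into file j.
dsm = [
    [0, 1, 0, 0],  # file 0 calls file 1
    [0, 0, 1, 0],  # file 1 calls file 2
    [0, 0, 0, 0],  # file 2 calls nothing
    [0, 0, 1, 0],  # file 3 calls file 2
]
print(propagation_cost(dsm), calls_outside_clusters(dsm, [0, 0, 1, 1]))
```

Here a change to file 2 can reach every other file, while a change to file 0 reaches only itself, so the matrix is half full. Only the call from file 1 to file 2 crosses the assumed cluster boundary.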
DSM examples
[Figure: example system in graphical and dependency-matrix form. A change to F propagates to E, C, and A, while a change to B propagates only to A.]
[Figure: a DSM with dependencies in an “idealized modular form”. All calls are within clusters, so the clustering cost is 0.]
Mozilla Project
• Netscape open-sourced Navigator in March 1998
• The project was named Mozilla and eventually
led to what Firefox is today
• Initially the code was complex and tightly
coupled, a common phenomenon in industry
code
• This formed a high barrier to entry for volunteers
to contribute code
• Architecture was re-designed in late 1998 due to
increasing complexity
DSMs for Mozilla
Results of Mozilla Re-design
More Results
• After the re-design, volunteerism went up
dramatically (critical for an OSS project to
succeed)
• Both functionality and performance
increased
• Both code size and number of files
decreased (initially)
What are we doing with software nets?
• Thanks to CVS history, we can create a callgraph
for a piece of software at any point during its
evolution
• Do certain parts of the callgraph stabilize before
others? Why?
• Are certain portions of the callgraph more bug-prone than others?
• What does code ownership in the callgraph look
like?
• What is the relationship between callgraph
network, co-commit network, and ownership
network?
More Questions
• Does the software network bear any
resemblance to the social network of the
developers who work on it? (Conway’s Law)
• Are callgraphs small-world networks? What is
the distribution of in- and out-degrees? What
would the answers mean (if anything)?
• What partitioning techniques allow us to extract
module structure from source code?
• Is there a relationship between the co-committer
social network and the email social network for
developers?
On with the show…