Web Intelligence
Complex Networks I
This is a lecture for week 6 of 'Web Intelligence'.
Example networks in this lecture come from the fabulous website of
Mark Newman, University of Michigan: http://www-personal.umich.edu/~mejn/
This part of the course: WI (when / what / why / what else)
• Sep 27: Complex Networks I. What else: Assignment 1, worth 70% of CW.
• Oct 4: Complex Networks II. Why: the WWW is a huge complex network, and many other networks are overlaid upon it. Networks have important and interesting properties, e.g. speed of information spread, robustness, and the speed and quality of search.
• Oct 11: Web search: how Google works. Why: search, obviously, is central to web intelligence. What else: Assignment 3, worth 15% of CW.
• Nov 8: Text mining and knowledge discovery from the WWW. Why: towards inferring useful new knowledge automatically; also for better search of the non-marked-up web. What else: reading.
• Nov 15: Web communities and cultural models. Why: understanding how the web influences the formation and behaviour of groups, and the spread of information. What else: reading; Assignment 2, worth 15% of CW.
Introductory Points
Graphs and networks are of central importance to us, because:
• The web is a large and complex network.
• Major phenomena that underpin our existence, such as how information spreads, how diseases develop and how economies evolve, are best viewed mathematically as networks.
• Networks have structural properties and behaviour. When we analyse the structure of a network, we can reveal important clues about its behaviour. E.g. we can:
  • Predict how fast a virus or a rumour will spread on the web
  • Assess which are the most authoritative web sites
  • Predict how long it will take to search sections of the web
  • Predict how robust an area of the WWW, or a cellular process, is to damage
This Week’s Material
A basic introduction to graphs and networks, and their terminology.
The interesting properties of real-world networks.
Metrics and other structural properties that are currently used to analyse both the WWW and other networks.
To support understanding of these metrics and properties, this week we cover the basics of graphs and networks.
The very basics
A graph is a pair of sets: G = (V, E)
V = a set of vertices (also called nodes)
e.g. V = {A, B, C, D}
E = a set of edges (also called arcs, or links)
e.g. E = { {A,C}, {A,D}, {B,C}, {B, D} }
in which each edge is a set of two vertices from V
This graph is:
[Figure: nodes A, B, C, D, with edges A-C, A-D, B-C and B-D.]
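One minimal way to hold G = (V, E) in Python is sketched below, assuming each undirected edge is stored as a two-element `frozenset`, so that {A, C} and {C, A} are the same edge. The helper name `neighbours` is hypothetical, chosen for illustration; this is one possible representation, not a prescribed one.

```python
# V and E from the example above; each undirected edge is a frozenset
# of two vertices, so the order in which its ends are written does not matter.
V = {"A", "B", "C", "D"}
E = {frozenset(e) for e in [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]}

# Sanity check: every edge joins two distinct vertices that belong to V.
assert all(len(e) == 2 and e <= V for e in E)

def neighbours(v):
    """Return the set of vertices joined to v by an edge (hypothetical helper)."""
    return {u for e in E if v in e for u in e if u != v}

print(sorted(neighbours("A")))  # ['C', 'D']
```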
The very basics II
An undirected edge between A and B: {A, B} (or {B, A})
A directed edge from A to B: (A, B)
A loop at A: {A, A} or (A, A)
In an undirected graph, all edges are undirected.
In a directed graph, all edges are directed.
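The difference between the two kinds of edge maps directly onto unordered and ordered pairs. A small sketch, assuming undirected edges are Python `frozenset`s and directed edges are tuples; the function `is_loop` is a hypothetical helper for illustration.

```python
# Undirected edge: an unordered pair, so {A, B} and {B, A} are equal.
undirected_ab = frozenset({"A", "B"})
undirected_ba = frozenset({"B", "A"})
print(undirected_ab == undirected_ba)  # True

# Directed edge: an ordered pair, so (A, B) and (B, A) are different edges.
directed_ab = ("A", "B")
directed_ba = ("B", "A")
print(directed_ab == directed_ba)  # False

# A loop at A: as a tuple it is ("A", "A"); as a Python set, {A, A}
# collapses to {A}, so a loop has to be recognised specially.
def is_loop(edge):
    """True if the edge (tuple or frozenset) starts and ends at the same node."""
    return len(set(edge)) == 1

print(is_loop(("A", "A")), is_loop(frozenset({"A"})), is_loop(directed_ab))  # True True False
```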
The very basics III
[Figure: an undirected graph on nodes A, B, C, D, E, F, G.]
The degree of a node, in an undirected graph, is the number of
edges attached to it. In this one, the degrees are:
A: 2 B: 3 C: 3 D: 3 E: 0 F: 1 G: 2
What is the mean degree of this graph?
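The mean degree is the sum of the degrees divided by the number of nodes. Because every undirected edge adds 1 to the degree of each of its two ends, the degree sum is also twice the number of edges, which gives a useful consistency check. A small sketch using the degrees listed on this slide:

```python
# Degrees as listed on this slide (undirected graph on nodes A..G).
degree = {"A": 2, "B": 3, "C": 3, "D": 3, "E": 0, "F": 1, "G": 2}

total = sum(degree.values())          # sum of all degrees
assert total % 2 == 0                 # must be even: each edge is counted at both ends
num_edges = total // 2                # so the number of edges is half the degree sum
mean_degree = total / len(degree)     # equivalently 2 * num_edges / n

print(total, num_edges, mean_degree)  # 14 7 2.0
```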
The very basics IV
[Figure: a directed graph on nodes A, B, C, D, E, F, G.]
Nodes in directed graphs have in-degrees and out-degrees.
Here, each node's (in-degree, out-degree) is:
A: 1, 2 B: 1, 2 C: 2, 1 D: 2, 2 E: 1, 1 F: 1, 2 G: 0, 2
A directed graph with no directed cycles is called a DAG (directed acyclic graph).
Is this a DAG?
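In- and out-degrees can be counted straight from a list of (from, to) edges, and in any directed graph both kinds of degree sum to the number of edges. Whether a graph is a DAG can be tested with Kahn's algorithm, a standard technique swapped in here for illustration: repeatedly remove nodes that have no incoming edges; the graph is acyclic exactly when every node gets removed. The edge list below is hypothetical, not the graph in the figure.

```python
from collections import Counter, deque

# Hypothetical directed graph (not the one in the figure): edges are (from, to).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("B", "D")]
nodes = {n for e in edges for n in e}

out_deg = Counter(u for u, _ in edges)
in_deg = Counter(v for _, v in edges)
for n in sorted(nodes):
    print(n, in_deg[n], out_deg[n])   # node, in-degree, out-degree

def is_dag(nodes, edges):
    """Kahn's algorithm: repeatedly remove nodes whose in-degree is 0.
    The graph is acyclic exactly when every node gets removed."""
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return removed == len(nodes)

print(is_dag(nodes, edges))                 # True
print(is_dag(nodes, edges + [("D", "A")]))  # False: A -> B -> D -> A is a cycle
```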
The very basics V
This is an unlabelled graph.
[Figure]
This is a labelled graph.
[Figure: nodes labelled homepage, teaching, graphs and research.]
It is exactly the same as (isomorphic to) this one:
[Figure]
Since labels and links have meaning, this one is different:
[Figure: nodes labelled homepage, graphs, teaching and research.]
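The distinction can be made concrete in code: two labelled graphs on the same labels are identical only when their edge sets match, whereas two graphs are isomorphic when some one-to-one relabelling turns one edge set into the other. The sketch below uses a brute-force search over all relabellings (only practical for tiny graphs); the links between the four pages are hypothetical, since the slide's figures are not reproduced here.

```python
from itertools import permutations

def same_labelled_graph(e1, e2):
    """Labelled graphs on the same labels are identical exactly when their edge sets match."""
    return e1 == e2

def isomorphic(nodes1, e1, nodes2, e2):
    """Brute-force test: is there a one-to-one relabelling of nodes1 onto nodes2
    that turns edge set e1 into e2? Tries all n! relabellings, so tiny graphs only."""
    n1, n2 = sorted(nodes1), sorted(nodes2)
    if len(n1) != len(n2) or len(e1) != len(e2):
        return False
    for perm in permutations(n2):
        mapping = dict(zip(n1, perm))
        if {frozenset(mapping[x] for x in e) for e in e1} == e2:
            return True
    return False

def edges(*pairs):
    """Build an undirected edge set from (u, v) pairs."""
    return {frozenset(p) for p in pairs}

# Hypothetical links between the four pages.
N = {"homepage", "teaching", "graphs", "research"}
path1 = edges(("teaching", "homepage"), ("homepage", "research"), ("research", "graphs"))
path2 = edges(("graphs", "homepage"), ("homepage", "research"), ("research", "teaching"))
star = edges(("homepage", "teaching"), ("homepage", "graphs"), ("homepage", "research"))

print(same_labelled_graph(path1, path2))  # False: different pages are linked
print(isomorphic(N, path1, N, path2))     # True: both are a 4-node path
print(isomorphic(N, path1, N, star))      # False: a path is not a star
```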
Diversity of graphs: considering only loop-free graphs
How many different 2-node, labelled undirected graphs are there?
How many different 2-node, labelled directed graphs are there?
How many different 3-node, labelled undirected graphs are there?
Suppose there are $G(k)$ possible undirected labelled graphs on $k$ nodes.
Whenever we add one extra node to an undirected labelled graph on $k$ nodes, any subset of the $k$ existing nodes could link to it, and there are $2^k$ such subsets. So the number of possible undirected labelled graphs on $k+1$ nodes is $2^k$ times the number on $k$ nodes:
$$G(k+1) = 2^k \, G(k), \qquad G(1) = 1, \qquad \text{so} \qquad G(n) = 2^{1+2+\cdots+(n-1)} = 2^{n(n-1)/2}.$$
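The argument above can be checked by brute force for small sizes: start from the single-node graph and, at each step, add a new node linked to every possible subset of the existing nodes. A minimal sketch, assuming nodes are numbered 0..n-1 and each graph is stored as a frozenset of edges (the function name `all_labelled_graphs` is hypothetical):

```python
from itertools import combinations

def all_labelled_graphs(n):
    """Enumerate every loop-free undirected labelled graph on nodes 0..n-1
    (n >= 1), growing one node at a time as in the argument above.
    Each graph is a frozenset of edges; each edge is a frozenset {i, j}."""
    graphs = [frozenset()]                 # G(1) = 1: a single node, no edges
    for k in range(1, n):                  # add node k to every graph on nodes 0..k-1
        grown = []
        for g in graphs:
            for r in range(k + 1):         # choose any subset of the k existing nodes...
                for subset in combinations(range(k), r):
                    # ...and link each chosen node to the new node k
                    grown.append(g | {frozenset((i, k)) for i in subset})
        graphs = grown
    return graphs

for n in range(1, 6):
    graphs = all_labelled_graphs(n)
    # every graph produced is distinct; going from n-1 to n nodes multiplies the count by 2^(n-1)
    print(n, len(graphs), len(set(graphs)))
```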
Example numbers for undirected labelled graphs
Size of graph | Number of possible graphs
5 nodes | 1024
10 nodes | 35,184,372,088,832
20 nodes | $1.6 \times 10^{57}$
100 nodes | $1.3 \times 10^{1490}$
1000 nodes | a lot.
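These numbers follow from the fact that each of the $n(n-1)/2$ possible edges is either present or absent, giving $2^{n(n-1)/2}$ graphs. Python integers have arbitrary precision, so the exact counts are easy to compute; for the large ones the sketch below prints an approximation via base-10 logarithms, because recent CPython releases by default refuse to convert integers with thousands of digits into strings. The helper names `num_graphs` and `approx` are hypothetical.

```python
import math

def num_graphs(n):
    """Loop-free undirected labelled graphs on n nodes: each of the
    n(n-1)/2 possible edges is either present or absent."""
    return 2 ** (n * (n - 1) // 2)

def approx(x):
    """Write a (possibly enormous) positive int as 'm.m x 10^e' via log10,
    so the digits of x never have to be converted to a string."""
    e = math.floor(math.log10(x))
    m = 10 ** (math.log10(x) - e)
    return f"{m:.1f} x 10^{e}"

for n in (5, 10, 20, 100, 1000):
    # exact count for small n, scientific approximation for large n
    print(n, num_graphs(n) if n <= 10 else approx(num_graphs(n)))
```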
More basics
[Figure: an undirected graph on nodes A, B, C, D, E, F, G.]
If there is a path in the graph from every node to every other node, the graph is connected; otherwise it is unconnected. Is this one connected?
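Connectivity of an undirected graph can be tested with a breadth-first search (a standard technique, used here for illustration): start from any node, follow edges outwards, and check whether every node is eventually reached. A minimal sketch with hypothetical example graphs:

```python
from collections import deque

def is_connected(nodes, edges):
    """Undirected graph: run breadth-first search from any one node;
    the graph is connected exactly when the search reaches every node."""
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u] - seen:   # neighbours not yet visited
            seen.add(v)
            queue.append(v)
    return len(seen) == len(nodes)

nodes = {"A", "B", "C", "D"}
print(is_connected(nodes, [("A", "B"), ("B", "C")]))              # False: D is unreachable
print(is_connected(nodes, [("A", "B"), ("B", "C"), ("C", "D")]))  # True
```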
More basics II
[Figure: an undirected graph on nodes A, B, C, D.]
The complete (undirected) graph on $n$ nodes is the graph that contains all $n(n-1)/2$ possible edges.
Is this one complete?
Most graphs of interest and importance are far from complete; such graphs are called sparse.
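Completeness, and how far a graph falls short of it, can be checked by comparing the edges present with all $n(n-1)/2$ pairs of distinct nodes. The ratio of the two is often called the graph's density; a sparse graph has a value close to 0. A minimal sketch, assuming undirected edges given as pairs and ignoring loops and repeated edges; the example graphs are hypothetical.

```python
from itertools import combinations

def simple_edges(edges):
    """Undirected edges as frozensets, dropping loops and repeated edges."""
    return {frozenset(e) for e in edges if e[0] != e[1]}

def is_complete(nodes, edges):
    """True if every one of the n(n-1)/2 pairs of distinct nodes is joined."""
    present = simple_edges(edges)
    return all(frozenset(pair) in present for pair in combinations(nodes, 2))

def fraction_of_possible_edges(nodes, edges):
    """Edges present as a fraction of the n(n-1)/2 in the complete graph (n >= 2)."""
    n = len(nodes)
    return len(simple_edges(edges)) / (n * (n - 1) / 2)

nodes = ["A", "B", "C", "D"]
k4 = list(combinations(nodes, 2))                          # the complete graph: 6 edges
cycle = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]   # hypothetical 4-cycle

print(is_complete(nodes, k4), fraction_of_possible_edges(nodes, k4))        # True 1.0
print(is_complete(nodes, cycle), fraction_of_possible_edges(nodes, cycle))  # False 0.6666666666666666
```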
Think about the following graphs:
1. Nodes = students in this university; edge {A, B} exists if A and B have the same birthday.
2. Nodes = web pages; edge (A, B) exists if A links to B.
3. Nodes = types of molecules in our bloodstream; edge (A, B) exists if A interacts with B.
4. Nodes = all living humans; edge {A, B} exists if A and B have ever shaken hands.
More Structural Properties
Diameter: the greatest distance between any two nodes, where the distance between two nodes is the length of the shortest path between them.
Number of components: in an undirected graph, the number of separate connected pieces (a connected graph has exactly one).
Degree distribution: an interesting and important fingerprint of a graph that we will see more of.
Modularity: a graph is highly modular if it has several clusters of nodes with many links within the clusters, but few links between the clusters.
Hierarchical modularity: a graph seems to be hierarchically modular if it is modular, as above, but the modules are themselves modular.
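Two of these properties are easy to compute with breadth-first search, which gives shortest-path distances (in edges) from one node to every node it can reach. The number of components is how many searches are needed to reach every node; the diameter is the largest of all shortest-path distances, the greatest distance between any two nodes. A minimal sketch, assuming undirected edges given as pairs; the diameter function here simply refuses disconnected graphs, because some pairs there have no path at all. The function names and example graphs are hypothetical.

```python
from collections import deque

def adjacency(nodes, edges):
    """Map each node to the set of its neighbours (undirected)."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def number_of_components(nodes, edges):
    """Count how many separate searches are needed to reach every node."""
    adj = adjacency(nodes, edges)
    unseen, count = set(nodes), 0
    while unseen:
        unseen.difference_update(bfs_distances(adj, next(iter(unseen))))
        count += 1
    return count

def diameter(nodes, edges):
    """Greatest shortest-path distance over all pairs (connected graphs only)."""
    adj = adjacency(nodes, edges)
    best = 0
    for s in nodes:
        dist = bfs_distances(adj, s)
        if len(dist) < len(nodes):
            raise ValueError("graph is not connected")
        best = max(best, max(dist.values()))
    return best

path = [("A", "B"), ("B", "C"), ("C", "D")]
print(diameter({"A", "B", "C", "D"}, path))  # 3
print(number_of_components({"A", "B", "C", "D", "E", "F"},
                           [("A", "B"), ("B", "C"), ("D", "E")]))  # 3: {A,B,C}, {D,E}, {F}
```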
Some Networks
One of these is a network of protein interactions in yeast. The other is
a visualisation of an outbreak of TB.
What do the nodes and edges represent? And … which is which?
Is this:
spread of HIV infection (node = person / link = HIV transfer)
or is it:
books about politics (node = book / link = one mentions the other)
Notice how the book network is polarised
The internet
Assignment 1
Read: "Exploring Complex Networks", by Steven Strogatz, Nature 410, 268–276.
Write: a 500-word 'executive summary' of most of this article. Leave out Box 1 and the section "Regular networks of coupled dynamical systems"; restart at "Complex network architectures".
AND
Write: a 100-word account of what you assess to be the three main points conveyed by this article.
Write: a 200-word essay about the relevance of those points to the topic of your BSc or MSc (e.g. relevance to AI; relevance to IT (Business), etc.).
Word limits in this assignment are important; going over the limits means losing marks.
Marking
30% of the marks: completeness and readability
30% of the marks: evidence of understanding
the article, and generally making sense
30% of the marks: clarity of your arguments
10% of the marks: for making me say “Wow”
Next week
Much more advanced, about:
• Degree distributions
• Clustering coefficients
• Modularity and hierarchy
• Random networks vs real networks
• Some basic graph algorithms
• Another article, much shorter, to read.