Precept 4: Traveling Salesman Problem, Hierarchical Clustering
Qian Zhu
2/23/2011
Agenda
• Assignment: Traveling salesman problem
• Hierarchical clustering
– Example
– Comparisons with K-means
TSP
• TSP: Given the coordinates for a set of vertices, find an order of
connecting the vertices such that 1) each vertex is visited once
(except for the start vertex, which is visited twice), and 2) the
total distance is minimum. The start vertex and end vertex must be
the same.
Can you spot a TSP tour
among these possible answers?
TSP
• Smaller problems (n=50): humans can do very
well.
• Larger problems: not so much…
• Machine: no efficient general algorithm for solving TSP
exactly is known. The problem is computationally hard (NP-hard).
– Challenge: find a polynomial-time algorithm to
solve this problem, and you will win a million
bucks!
A naive algorithm to find a TSP tour
• Exhaustive search among all possible tours,
and pick the minimal tour.
• How many possibilities?
– N cities
– N × (N-1) × (N-2) × … × 1 = N! orderings
• Not feasible beyond small N: 20! is already about 2.4 × 10^18.
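The exhaustive search above can be sketched in Java as follows. This is only an illustration, not part of the assignment: the class name BruteForceTsp, its methods, and the representation of cities as {x, y} arrays are all assumptions made for the example. The sketch fixes city 0 as the start, so it checks (N-1)! orderings rather than N!, which is still factorial growth.

```java
// Illustrative sketch (hypothetical names): exhaustive search over every tour
// that starts and ends at city 0. Only practical for very small inputs.
class BruteForceTsp {
    // Euclidean distance between cities i and j, each stored as {x, y}.
    static double dist(double[][] pts, int i, int j) {
        return Math.hypot(pts[i][0] - pts[j][0], pts[i][1] - pts[j][1]);
    }

    // Length of the shortest closed tour through all cities.
    static double shortestTour(double[][] pts) {
        int n = pts.length;
        if (n < 2) return 0.0;
        boolean[] used = new boolean[n];
        used[0] = true; // fix the start city: rotations describe the same tour
        return search(pts, used, 0, 1, 0.0);
    }

    // Try every unvisited city as the next stop and keep the best total.
    static double search(double[][] pts, boolean[] used, int last,
                         int visited, double soFar) {
        int n = pts.length;
        if (visited == n) return soFar + dist(pts, last, 0); // close the tour
        double best = Double.POSITIVE_INFINITY;
        for (int next = 1; next < n; next++) {
            if (used[next]) continue;
            used[next] = true;
            double total = search(pts, used, next, visited + 1,
                                  soFar + dist(pts, last, next));
            best = Math.min(best, total);
            used[next] = false; // backtrack
        }
        return best;
    }

    public static void main(String[] args) {
        // Four corners of a unit square, listed out of order.
        double[][] square = { {0, 0}, {1, 1}, {1, 0}, {0, 1} };
        System.out.println(shortestTour(square));
    }
}
```

Each extra city multiplies the work by roughly N, so this approach stops being usable long before the input sizes used in the assignment.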
TSP
• Let’s say no polynomial-time algorithm exists
to solve this problem.
• We can use heuristic methods to derive a
reasonably good answer in reasonable time.
– Heuristics: no guarantee that the answer is optimal
– Modern heuristic methods can find good tours for much larger
TSP instances (involving millions of nodes) in reasonable time.
Assignment goals
• Use heuristic methods to approximate the answer to a
computationally hard problem.
• Visualize the answer to see how well the
heuristics perform.
• Implement a linked list structure to represent a
tour in Java.
– You cannot use LinkedList from the Java standard library.
• We will use two heuristics:
– Nearest neighbor
– Smallest increase (minimize the growth in total tour distance)
– Extra credit: implement your own heuristic
Tour representation
• Circular linked list
private class Node {
    Point p;
    Node next;
}
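Inner class: better encapsulation
[Figure: a circular list of size 1, where A.next points back to Node A, and a circular list of size 2, where A.next points to Node B and B.next points back to Node A.]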
Tour traversal
• Start from the head node
• Iterate through the list until the head node is
reached again.
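A minimal sketch of this traversal, assuming a circular list built from a Node like the one on the "Tour representation" slide. The Point class here is a hypothetical stand-in with only a distanceTo method (the assignment's own Point class may differ), and the names TourTraversalSketch, size, and length are made up for the example.

```java
// Illustrative sketch (hypothetical names): walking a circular linked list.
class TourTraversalSketch {
    // Stand-in for the assignment's Point class; only distanceTo is assumed.
    static class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
        double distanceTo(Point other) {
            return Math.hypot(x - other.x, y - other.y);
        }
    }

    static class Node {
        Point p;
        Node next;
        Node(Point p) { this.p = p; }
    }

    // Number of nodes in the tour: start at head, stop when head comes around again.
    static int size(Node head) {
        if (head == null) return 0;
        int count = 0;
        Node cur = head;
        do {
            count++;
            cur = cur.next;
        } while (cur != head);
        return count;
    }

    // Total tour length, including the edge from the last node back to head.
    static double length(Node head) {
        if (head == null) return 0.0;
        double total = 0.0;
        Node cur = head;
        do {
            total += cur.p.distanceTo(cur.next.p);
            cur = cur.next;
        } while (cur != head);
        return total;
    }
}
```

A do-while loop fits here because the head node must be processed once before the "back at head" test can end the loop. A plain while (cur != head) loop would exit immediately.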
Adding a node to the list
Add Node C to the list containing Node A and Node B.
[Figure: Node C (C.p, C.next) is inserted between Node A and Node B ("Insert here") in two numbered pointer updates, after which A.next points to Node C and C.next points to Node B.]
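The two pointer updates can be sketched as below. To keep the example short, the nodes carry a String label instead of a Point; that simplification, the class name TourInsertSketch, and the helper describe are assumptions for illustration only. One safe order is to set the new node's next pointer first, so the reference to the old successor is not lost.

```java
// Illustrative sketch (hypothetical names): inserting a node into a circular list.
class TourInsertSketch {
    static class Node {
        String label; // a label stands in for the Point payload
        Node next;
        Node(String label) { this.label = label; }
    }

    // Insert c immediately after a.
    static void insertAfter(Node a, Node c) {
        c.next = a.next; // first: C points to A's old successor
        a.next = c;      // second: A points to C
    }

    // Labels in tour order starting at head, separated by spaces.
    static String describe(Node head) {
        if (head == null) return "";
        StringBuilder sb = new StringBuilder();
        Node cur = head;
        do {
            if (sb.length() > 0) sb.append(' ');
            sb.append(cur.label);
            cur = cur.next;
        } while (cur != head);
        return sb.toString();
    }
}
```

The same method also covers a list of size 1, where a.next is a itself: after the insert, the two nodes point at each other.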
Nearest neighbor (NN)
• Read in the next point, and add it to the
current tour after the point to which it is
closest.
t_list = list of points in current tour
p_list = list of points
for each p in p_list:
    x = closest point in t_list
    add p to t_list, after x
[Figure: the current tour (nodes A, B, C) and the current point E.]
Compare E with each node in the tour.
Pick the node that is closest to E.
Connect E after this node.
Nearest neighbor (NN)
[Figure: the current tour (nodes A, B, C) and the current point E.]
Let’s say B is closest to E. Add E after B:
[Figure: the tour with E inserted after B.]
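One way the nearest neighbor step might look in Java is sketched below. It is a hedged illustration, not the assignment's reference solution: the class NearestInsertionSketch, its method names, and the stand-in Point class (with only distanceTo) are assumptions. Ties are broken in favor of the node encountered first.

```java
// Illustrative sketch (hypothetical names): nearest neighbor insertion.
class NearestInsertionSketch {
    // Stand-in for the assignment's Point class; only distanceTo is assumed.
    static class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
        double distanceTo(Point other) {
            return Math.hypot(x - other.x, y - other.y);
        }
    }

    static class Node {
        Point p;
        Node next;
        Node(Point p) { this.p = p; }
    }

    Node head; // null while the tour is empty

    // Add p after the tour node that is closest to it.
    void insertNearest(Point p) {
        Node added = new Node(p);
        if (head == null) {          // first point: a circular list of size 1
            head = added;
            added.next = added;
            return;
        }
        Node best = head;
        double bestDist = Double.POSITIVE_INFINITY;
        Node cur = head;
        do {                         // compare p with every node in the tour
            double d = cur.p.distanceTo(p);
            if (d < bestDist) {
                bestDist = d;
                best = cur;
            }
            cur = cur.next;
        } while (cur != head);
        added.next = best.next;      // splice in right after the closest node
        best.next = added;
    }

    // Total tour length, including the closing edge back to head.
    double length() {
        if (head == null) return 0.0;
        double total = 0.0;
        Node cur = head;
        do {
            total += cur.p.distanceTo(cur.next.p);
            cur = cur.next;
        } while (cur != head);
        return total;
    }
}
```

Each insertion scans the whole tour once, so building a tour of N points this way takes time proportional to N squared.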
Smallest increase heuristic
• Read in the next point, add it to the current
tour after the point where it results in the
least possible increase in tour length.
t_list = list of points in current tour
p_list = list of points
for each p in p_list:
    x = point in t_list such that inserting p after x gives the
        smallest increase in tour length
    add p to t_list, after x
Need to calculate every possible increase in order to know which is smallest.
How to calculate increase in tour length?
[Figure: inserting E into the tour A, B, C replaces an existing edge of length 5 with two new edges of length 4 each.]
Increase in tour length = (4+4) – 5 = 3
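The increase for inserting p after a node x is d(x, p) + d(p, x.next) - d(x, x.next), which is the (4+4) – 5 calculation in general form. The sketch below applies it to every node and keeps the smallest. As before, the names SmallestIncreaseSketch and insertSmallest and the stand-in Point class are assumptions for illustration, not the assignment's required API.

```java
// Illustrative sketch (hypothetical names): smallest increase insertion.
class SmallestIncreaseSketch {
    // Stand-in for the assignment's Point class; only distanceTo is assumed.
    static class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
        double distanceTo(Point other) {
            return Math.hypot(x - other.x, y - other.y);
        }
    }

    static class Node {
        Point p;
        Node next;
        Node(Point p) { this.p = p; }
    }

    Node head; // null while the tour is empty

    // Add p where it lengthens the tour the least.
    void insertSmallest(Point p) {
        Node added = new Node(p);
        if (head == null) {          // first point: a circular list of size 1
            head = added;
            added.next = added;
            return;
        }
        Node best = head;
        double bestIncrease = Double.POSITIVE_INFINITY;
        Node cur = head;
        do {
            // Two new edges replace the old edge cur -> cur.next.
            double increase = cur.p.distanceTo(p)
                            + p.distanceTo(cur.next.p)
                            - cur.p.distanceTo(cur.next.p);
            if (increase < bestIncrease) {
                bestIncrease = increase;
                best = cur;
            }
            cur = cur.next;
        } while (cur != head);
        added.next = best.next;
        best.next = added;
    }

    // Total tour length, including the closing edge back to head.
    double length() {
        if (head == null) return 0.0;
        double total = 0.0;
        Node cur = head;
        do {
            total += cur.p.distanceTo(cur.next.p);
            cur = cur.next;
        } while (cur != head);
        return total;
    }
}
```

Only the three edges that touch the insertion position matter, so there is no need to recompute the whole tour length for each candidate position.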
Assume…
• The order of inserting nodes is given (use the
order of nodes in the input file).
• Does order matter?
– Try randomizing the order and note whether you
get a better answer.
Other problems similar to TSP
• Easy problems:
– Shortest path problem:
• Assume: vertices and edges are known
• Given two vertices, find the shortest connected path between them.
• Algorithm: Dijkstra (O(|V|^2))
• Application: route planning
Edges weighted
– Chinese postman problem:
• A mailman wants to deliver mail to a certain neighborhood. The mailman
is lazy, and he wants to find the shortest route through neighborhood,
that meets these criteria:
– It is a closed circuit
– He needs to go through every street at least once
• Algorithm: Edmonds (O(|V|^3))
• Application: UPS delivery route planning
Edges weighted
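For the shortest path problem above, a compact array-based version of Dijkstra's algorithm matches the O(|V|^2) bound quoted on the slide. The sketch assumes non-negative edge weights and an adjacency matrix where a missing edge is marked with positive infinity. The class and method names are hypothetical.

```java
// Illustrative sketch (hypothetical names): O(|V|^2) Dijkstra on an adjacency matrix.
class DijkstraSketch {
    static final double INF = Double.POSITIVE_INFINITY;

    // w[u][v] is the weight of edge u -> v, or INF when there is no edge.
    // Assumes all weights are non-negative and w has at least one vertex.
    static double[] shortestDistances(double[][] w, int source) {
        int n = w.length;
        double[] dist = new double[n];
        boolean[] done = new boolean[n];
        for (int i = 0; i < n; i++) dist[i] = INF;
        dist[source] = 0.0;

        for (int round = 0; round < n; round++) {
            // Pick the closest vertex that is not finalized yet.
            int u = -1;
            for (int v = 0; v < n; v++) {
                if (!done[v] && (u == -1 || dist[v] < dist[u])) u = v;
            }
            if (dist[u] == INF) break; // everything left is unreachable
            done[u] = true;

            // Relax every edge leaving u.
            for (int v = 0; v < n; v++) {
                if (w[u][v] != INF && dist[u] + w[u][v] < dist[v]) {
                    dist[v] = dist[u] + w[u][v];
                }
            }
        }
        return dist;
    }
}
```

The result holds the distance from the source to every vertex; the distance between two given vertices is a single entry of that array. Unlike TSP, the work grows only polynomially with the number of vertices.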
Other problems similar to TSP
• Easy problems:
– Minimum spanning tree problem
• Given: a connected graph G with weighted edges.
• Find: a connected subgraph (a spanning tree) that touches every vertex
in the graph. The sum of the edge weights must be minimal.
• Algorithm: Prim or Kruskal (O(|E| log |E|))
• Application: A cable TV company is laying cable to a new neighborhood. It
is constrained to bury cable only along certain paths. Find a spanning tree
that is a subset of those paths, connects every house, and incurs the
minimum laying cost.
Edges weighted
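A short sketch of Kruskal's algorithm for the minimum spanning tree problem above: sort the edges by weight, then add each edge that connects two different components. The sort dominates, which gives the O(|E| log |E|) bound. The class KruskalSketch, the Edge type, and the simple union-find are assumptions made for the example, and the input graph is assumed to be connected.

```java
import java.util.Arrays;

// Illustrative sketch (hypothetical names): Kruskal's minimum spanning tree.
class KruskalSketch {
    static class Edge {
        final int u, v;
        final double w;
        Edge(int u, int v, double w) { this.u = u; this.v = v; this.w = w; }
    }

    // Union-find lookup with path halving.
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // Total weight of a minimum spanning tree over vertices 0..n-1.
    // Assumes the graph described by edges is connected.
    static double mstWeight(int n, Edge[] edges) {
        Edge[] sorted = edges.clone();          // leave the caller's array untouched
        Arrays.sort(sorted, (a, b) -> Double.compare(a.w, b.w));
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;

        double total = 0.0;
        for (Edge e : sorted) {
            int ru = find(parent, e.u);
            int rv = find(parent, e.v);
            if (ru != rv) {                     // edge joins two components: keep it
                parent[ru] = rv;
                total += e.w;
            }
        }
        return total;
    }
}
```

In the cable example, vertices would be houses, edges the permitted paths, and weights the cost of laying cable along each path.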
Other problems similar to TSP
• Easy problems (continued):
– Euler Trail problem:
Seven bridges of Konigsberg -
Take a walk through the city that would
cross each bridge once and only once.
Is there a solution?
Other problems similar to TSP
• Euler Trail problem (continued):
– Let’s convert it to a graph problem:
• Define a trail: a path that visits every edge exactly once.
• Open trail: a trail where start != end.
• Closed trail: a trail where start = end.
• Find an open trail or a closed trail in the graph.
Can you find an Euler trail in Konigsberg?
Edges unweighted
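A standard result (Euler's theorem) settles the question without searching: a connected graph has a closed trail through every edge exactly when every vertex has even degree, and an open one exactly when two vertices have odd degree. The sketch below counts odd-degree vertices. The class name, the vertex numbering, and the edge list for the seven bridges are assumptions made for the example, and the check assumes the graph is connected.

```java
// Illustrative sketch (hypothetical names): Euler trail test by vertex degree.
class EulerTrailCheck {
    // edges[i] = {u, v}; parallel edges are allowed (one entry per bridge).
    static int oddDegreeCount(int n, int[][] edges) {
        int[] degree = new int[n];
        for (int[] e : edges) {
            degree[e[0]]++;
            degree[e[1]]++;
        }
        int odd = 0;
        for (int d : degree) {
            if (d % 2 != 0) odd++;
        }
        return odd;
    }

    // True when a connected graph has an open or closed Euler trail.
    static boolean hasEulerTrail(int n, int[][] edges) {
        int odd = oddDegreeCount(n, edges);
        return odd == 0 || odd == 2;
    }

    public static void main(String[] args) {
        // Seven bridges: 0 = central island, 1 = north bank,
        // 2 = south bank, 3 = eastern island.
        int[][] bridges = {
            {0, 1}, {0, 1}, {0, 2}, {0, 2}, {0, 3}, {1, 3}, {2, 3}
        };
        System.out.println(hasEulerTrail(4, bridges)); // prints false
    }
}
```

All four land masses have odd degree (5, 3, 3, 3), so neither an open nor a closed trail exists for the original seven bridges.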
Other problems similar to TSP
• Euler Trail problem (continued):
– Modern day Konigsberg (after bombing during
WWII destroyed two bridges):
Other problems similar to TSP
• Hard problems:
– Find a Hamiltonian circuit (a path that visits every vertex in
the graph exactly once and returns to the starting vertex)
Edges unweighted
Hierarchical clustering
[Figure: scatter plot of five points labeled 1 to 5; axes x[,1] and x[,2].]
Pairwise distances:
       1     2     3     4
2    0.62
3    0.42  0.19
4    1.32  0.81  0.93
5    1.02  0.48  0.61  0.33
Hierarchical clustering (single-linkage)
Iteration 1
Pairwise distances:
       1     2     3     4
2    0.62
3    0.42  0.19
4    1.32  0.81  0.93
5    1.02  0.48  0.61  0.33

node   nodemin   distmin
1      3         0.42
2      3         0.19
3      2         0.19
4      5         0.33
5      4         0.33

Join 2 and 3
Updated distances:
       1     2,3   4
2,3  0.42
4    1.32  0.81
5    1.02  0.48  0.33
Iteration 2
Pairwise distances:
       1     2,3   4
2,3  0.42
4    1.32  0.81
5    1.02  0.48  0.33

node   nodemin   distmin
1      2,3       0.42
2,3    1         0.42
4      5         0.33
5      4         0.33

Join 4 and 5
Updated distances:
       1     2,3
2,3  0.42
4,5  1.02  0.48
Iteration 3
Pairwise distances:
       1     2,3
2,3  0.42
4,5  1.02  0.48

node   nodemin   distmin
1      2,3       0.42
2,3    1         0.42
4,5    2,3       0.48

Join 1 and 2,3
Updated distances:
       1,2,3
4,5    0.48
Iteration 4
Pairwise distances:
       1,2,3
4,5    0.48

node    nodemin   distmin
1,2,3   4,5       0.48
4,5     1,2,3     0.48

Join 1,2,3 and 4,5
Note that the height corresponds to the
minimum distance at each iteration.
Final Tree: { { {2, 3}, 1}, {4, 5} }
[Figure: Cluster Dendrogram of dist(x), hclust (*, "single"), with leaves 1 to 5 and a Height axis.]
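The four iterations above can be reproduced with a short single-linkage sketch. It uses the naive approach of rescanning the whole distance matrix at every step and returns only the merge heights. The class name SingleLinkageSketch and its method are assumptions for illustration, and the matrix in main is the five-point example from the slides.

```java
// Illustrative sketch (hypothetical names): naive single-linkage clustering.
class SingleLinkageSketch {
    // d is a symmetric n x n distance matrix.
    // Returns the n-1 merge heights in the order the joins happen.
    static double[] mergeHeights(double[][] d) {
        int n = d.length;
        if (n < 2) return new double[0];
        double[][] dist = new double[n][];
        for (int i = 0; i < n; i++) dist[i] = d[i].clone(); // work on a copy
        boolean[] active = new boolean[n];
        for (int i = 0; i < n; i++) active[i] = true;

        double[] heights = new double[n - 1];
        for (int step = 0; step < n - 1; step++) {
            // Find the closest pair of active clusters.
            int bi = -1, bj = -1;
            for (int i = 0; i < n; i++) {
                if (!active[i]) continue;
                for (int j = i + 1; j < n; j++) {
                    if (!active[j]) continue;
                    if (bi == -1 || dist[i][j] < dist[bi][bj]) {
                        bi = i;
                        bj = j;
                    }
                }
            }
            heights[step] = dist[bi][bj];

            // Merge cluster bj into bi. Single linkage keeps the minimum distance.
            for (int k = 0; k < n; k++) {
                if (!active[k] || k == bi || k == bj) continue;
                double m = Math.min(dist[bi][k], dist[bj][k]);
                dist[bi][k] = m;
                dist[k][bi] = m;
            }
            active[bj] = false;
        }
        return heights;
    }

    public static void main(String[] args) {
        // Pairwise distances for points 1 to 5 from the worked example.
        double[][] d = {
            {0.00, 0.62, 0.42, 1.32, 1.02},
            {0.62, 0.00, 0.19, 0.81, 0.48},
            {0.42, 0.19, 0.00, 0.93, 0.61},
            {1.32, 0.81, 0.93, 0.00, 0.33},
            {1.02, 0.48, 0.61, 0.33, 0.00}
        };
        for (double h : mergeHeights(d)) System.out.println(h);
    }
}
```

Replacing Math.min with Math.max in the merge step turns this into complete linkage. Rescanning the matrix for the closest pair at every step is what makes this naive version cubic in the number of points.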
Exercise
[Figure: scatter plot of ten points labeled 1 to 10; axes x[,1] and x[,2].]
Pairwise distances:
       1     2     3     4     5     6     7     8     9
2    0.46
3    1.02  1.43
4    0.19  0.37  1.06
5    0.76  0.88  0.90  0.62
6    1.37  1.83  0.94  1.52  1.70
7    1.42  1.88  1.00  1.58  1.76  0.06
8    1.92  2.38  1.35  2.07  2.19  0.55  0.50
9    1.23  1.68  0.86  1.38  1.58  0.14  0.19  0.69
10   2.18  2.62  1.78  2.35  2.57  0.87  0.81  0.49  0.99
Cluster these points using the single-linkage method and
the complete-linkage method.
Answers
[Figure: two Cluster Dendrograms of dist(x) for the ten points, each with a Height axis: single linkage, hclust (*, "single"), and complete linkage, hclust (*, "complete").]
Hierarchical vs K-means
• Run-time analysis. Which clustering algorithm
runs faster?
• What’s the bottleneck step in hierarchical
clustering?
• What’s the disadvantage of K-means?
• What’s the disadvantage of hierarchical
clustering?