Transactions on Information and Communications Technologies vol 9, © 1995 WIT Press, www.witpress.com, ISSN 1743-3517
The combined distribution/assignment problem
in transportation network planning: a parallel
approach on hypercube architecture
V. Lacagnina, R. Russo
Istituto di Matematica per la Ricerca Operativa, Facoltà di
Economia e Commercio, Università degli Studi di Palermo,
Viale delle Scienze, 90128 Palermo, Italy
Abstract
The joint distribution/assignment problem plays a central role in urban transport
network planning. In this problem, according to the mathematical model proposed by
S. P. Evans, the trips are iteratively calculated and assigned to the network in such a
way that the resulting traffic flows pattern satisfies the selfish equilibrium condition.
Unfortunately, the number of variables and constraints increases rapidly with the
size of the network, causing long computational times for the equilibrium
solution. In this paper an nCUBE 2 parallel computing architecture is employed to
solve the combined problem and to assess the potential of MIMD machines to handle
large scale transportation network problems.
1 Introduction
Transportation network planning is a very complex problem, with a number of
variables and constraints that increases rapidly for large scale real networks,
causing long computational times. Each transportation planning activity consists of a
number of stages. Among these, trip distribution and traffic assignment have a basic
importance. The trip distribution stage is concerned with the estimation of the number
of trips per unit time which will be made under certain circumstances between each
pair of zones (centroids) in the area to be studied. Traffic assignment estimates the
amount of traffic which will use each arc of a transport network under certain
conditions.
The estimated trip demands, obtained by solving a trip distribution model, are
allocated to different routes of the network, assuming that each traveller, in choosing
his route, attempts to minimise his own journey cost taking account of the traffic
conditions resulting from the other travellers' decisions. These decisions
produce a selfish equilibrium flow pattern such that no traveller can decrease his
journey cost by changing his route.
However, the trip distribution model cannot be solved independently of the
assignment model. The costs of each path obtained in the assignment stage are necessary to solve
the trip distribution stage. This problem is overcome by combining trip distribution
and traffic assignment into one stage and describing them by one model.
High-Performance Computing in Engineering
The aim of this work is to assess the feasibility and performance of a parallel
implementation, on MIMD machines, of the strictly serial algorithm for solving the
combined distribution/assignment problem. The experiments, carried out on random
networks, show good performance, obtaining a quasi-linear speedup.
2 The combined distribution/assignment problem
The overall transportation planning process is composed basically of four phases:
travel generation, travel distribution, modal split and travel assignment. Among these,
according to the mathematical model proposed by S. P. Evans [2], we consider the
trip distribution and traffic assignment stages.
In a standard approach the two phases are evaluated one at a time, the total
volumes of demand and supply of every centroid in the network being known. However, the
cost of travelling along an arc increases with the amount of traffic that uses it,
and this is usually taken into account when the trip demands obtained from a
distribution model are allocated to routes through the network according to a traffic
assignment model. However, the link costs which correspond to the final traffic
flows, estimated from the traffic assignment model, are generally not the same as
those assumed at the trip distribution stage. Combining trip distribution and traffic
assignment into one stage overcomes the problem.
The combined model is defined by the following three sets of assumptions. First, the
demand for trips is described by a gravity model, in which the number of trips per
unit time between each pair of centroids are known and the cost function has an
exponential form for each origin-destination (O/D) pair. This model is known as the
doubly constrained gravity model with exponential cost function. Secondly, it is
assumed that the cost of travelling along each arc is a known, strictly increasing
function of the traffic flow on that arc such that the cost increases indefinitely as the
capacity of the arc is approached. Thirdly, it is assumed that the assigned
traffic flow pattern satisfies the selfish equilibrium condition (Wardrop's first
assignment principle).
We denote:
𝒱 - set of the network nodes;
𝒵 - subset of nodes representing zone centroids, which are both origins and destinations;
𝒵_i - subset of centroids which are destinations for the trips which begin at i:
𝒵_i = {j ∈ 𝒵 : j ≠ i} ∀ i ∈ 𝒵;
ℒ - set of the network arcs;
V, Z and L - respectively the number of elements in 𝒱, 𝒵 and ℒ; it is assumed that
the zone centroids are identified by numbers running from 1 to Z, and the other nodes
by numbers running from Z+1 to V;
i and j - respectively origin and destination centroids;
x_kl - number of vehicles per unit time travelling on link (k,l) ∈ ℒ;
x_kl,i - number of vehicles per unit time which use the link (k,l) ∈ ℒ and which
originated at vertex i;
X_kl - capacity of the arc (k,l) ∈ ℒ;
t_ij - number of trips per unit time which begin at i and end at j;
a_i - number of trips per unit time beginning in origin zone i;
b_j - number of trips per unit time ending in destination zone j;
c_kl(x_kl) - cost or congestion function on link (k,l).
Let us define the set of functions C_kl(x_kl) = ∫_0^(x_kl) c_kl(x) dx. We assume that for each
link (k,l) the function C_kl(x_kl) is continuous on [0, X_kl) and tends to infinity as x_kl tends
to X_kl. It is also strictly convex, since its derivative c_kl is a strictly increasing function
on [0, X_kl). The combined distribution/assignment problem formulated by Evans can
be summarised as follows (S. P. Evans 1976):

minimise  P(x, t) = Σ_(k,l)∈ℒ C_kl(x_kl) + (1/α) Σ_i∈𝒵 Σ_j∈𝒵_i t_ij ln t_ij

where x is a 1×L row vector whose components are x_kl, (k,l) ∈ ℒ, and t is a
1×Z(Z−1) row vector whose components are the elements of the matrix (t_ij),
i ∈ 𝒵, j ∈ 𝒵_i, ordered row by row with the diagonal elements omitted.
The whole constraint set can be thought of as subdivided into three subsets corresponding to
the following stages: distribution, assignment and selfish equilibrium. In the trip
distribution model, the t_ij are estimated by:

t_ij = r_i s_j exp(−α y_ij)   ∀ i ∈ 𝒵, j ∈ 𝒵_i    (1)

where r_i and s_j are parameters associated with the origin i and destination j
respectively and α is a known parameter. The y_ij in condition (1) are the shortest
path costs for each origin i and destination j. They satisfy the assignment model
subject to the equilibrium condition, as will be shown later. The t_ij will be required
to satisfy the following conditions as well:

Σ_(j∈𝒵_i) t_ij = a_i   ∀ i ∈ 𝒵    (2)

Σ_(i∈𝒵_j) t_ij = b_j   ∀ j ∈ 𝒵    (3)

t_ij ≥ 0   ∀ i ∈ 𝒵, j ∈ 𝒵_i    (4)
The trip demands t_ij must be assigned to the routes through the network using the
assignment model:

x_kl,i ≥ 0   ∀ i ∈ 𝒵, (k,l) ∈ ℒ    (5)

x_kl,i = 0   ∀ i ∈ 𝒵, (k,l) ∉ ℒ    (6)

Σ_(i∈𝒵) x_kl,i = x_kl   ∀ (k,l) ∈ ℒ    (7)

x_kl < X_kl   ∀ (k,l) ∈ ℒ    (8)

Σ_(k∈𝒱) x_kj,i − Σ_(k∈𝒱) x_jk,i = t_ij   ∀ i ∈ 𝒵, j ∈ 𝒵_i    (9)

Σ_(k∈𝒱) x_kl,i − Σ_(k∈𝒱) x_lk,i = 0   ∀ i ∈ 𝒵, l ∉ 𝒵    (10)
where x_kl,i is the flow originated at node i that uses the arc (k,l) ∈ ℒ in the unit
time. It must be observed that the trip demands t_ij are estimated according to
conditions (1)-(4) each time that the flows are assigned with the minimum costs y_ij
which satisfy Wardrop's condition:

y_ii = 0   ∀ i ∈ 𝒵    (11)

y_ik + c_kl(x_kl) − y_il ≥ 0   ∀ i ∈ 𝒵, (k,l) ∈ ℒ    (12)

x_kl,i (y_ik + c_kl(x_kl) − y_il) = 0   ∀ i ∈ 𝒵, (k,l) ∈ ℒ    (13)
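Wardrop's conditions can be verified mechanically for a candidate solution: with respect to the node potentials y (minimum costs from the origin), every arc must have non-negative reduced cost, and arcs carrying flow must have zero reduced cost. A small hedged sketch for one origin; the `Link` structure and the tolerance are illustrative choices, not structures from the paper.

```c
#include <assert.h>

/* Check Wardrop's conditions for one origin on a small network. y[k] are the
   node potentials (minimum costs from the origin, y = 0 at the origin
   itself); every arc must satisfy y[from] + cost - y[to] >= 0, with equality
   on arcs carrying flow. */
typedef struct { int from, to; double cost, flow; } Link;

static int wardrop_holds(const Link *links, int n, const double *y, double tol)
{
    for (int a = 0; a < n; a++) {
        double slack = y[links[a].from] + links[a].cost - y[links[a].to];
        if (slack < -tol)
            return 0;                      /* reduced-cost condition violated */
        if (links[a].flow > tol && slack > tol)
            return 0;                      /* used arc not on a shortest path */
    }
    return 1;
}
```

For example, a three-node network where both routes from node 0 to node 2 cost 2 is in equilibrium; raising the direct arc's cost while it still carries flow breaks the condition.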
3 The algorithm for combined distribution/assignment problem
Let us define the set of constraints:

D = {(x, t): there exist x_kl,i such that conditions (2)-(10) are satisfied};

to solve the combined problem it is required to minimise the function P(x, t) subject
to (x, t) ∈ D. The algorithm starts from any solution (x_1, t_1) ∈ D and generates a
sequence {(x_n, t_n): n = 1, 2, 3, ...} in D. The procedure, which converges to
the optimum (x*, t*), can be described as follows.
For each origin vertex i ∈ 𝒵 find the minimum travel cost k_ij(x_n) to each
destination j ∈ 𝒵_i, and choose a minimum cost route from i to j. Then, a new vector
of demands q_n is found by solving the doubly constrained gravity model. These new
demands are assigned to the minimum cost routes to obtain for each link (k,l) the new flow:

y_kl = Σ_(i∈𝒵) Σ_(j∈𝒵_i) δ_ij,kl q_ij

where δ_ij,kl = 1 if (k,l) is a link of the minimum cost route from i to j, and 0
otherwise.
The new vectors (x_(n+1), t_(n+1)) are obtained as the linear combination
(1−λ)(x_n, t_n) + λ(y_n, q_n), 0 ≤ λ ≤ 1, of (x_n, t_n) and (y_n, q_n) that minimises the objective
function P. The iterative procedure generates a sequence {(x_n, t_n)} in D such that
P(x_(n+1), t_(n+1)) ≤ P(x_n, t_n) for each n ≥ 1. The procedure converges to the optimal
solution (x*, t*) as n tends to infinity, but in practice it can be stopped at a finite n
when:

P(x_n, t_n) − P(x_(n+1), t_(n+1)) < ε

where ε is some small quantity chosen according to the level of accuracy required.
See Evans [2] for a rigorous proof of convergence. The algorithm is depicted in figure 1.
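Finding the step length λ that minimises P along the segment between (x_n, t_n) and (y_n, q_n) is a one-dimensional minimisation over [0, 1]; the paper does not specify the search method, but a golden-section search is one conventional choice. A self-contained sketch, with a hypothetical demo objective standing in for P:

```c
#include <assert.h>
#include <math.h>

/* Golden-section search for the step length lambda in [0,1] minimising the
   objective along the segment (1-lambda)*(x_n,t_n) + lambda*(y_n,q_n).
   The objective is passed as a callback f(lambda, ctx). */
static double golden_min(double (*f)(double, void *), void *ctx)
{
    const double g = 0.6180339887498949; /* golden ratio conjugate */
    double lo = 0.0, hi = 1.0;
    double a = hi - g * (hi - lo), b = lo + g * (hi - lo);
    double fa = f(a, ctx), fb = f(b, ctx);
    while (hi - lo > 1e-8) {
        if (fa < fb) { /* minimum lies in [lo, b] */
            hi = b; b = a; fb = fa;
            a = hi - g * (hi - lo); fa = f(a, ctx);
        } else {       /* minimum lies in [a, hi] */
            lo = a; a = b; fa = fb;
            b = lo + g * (hi - lo); fb = f(b, ctx);
        }
    }
    return 0.5 * (lo + hi);
}

/* Hypothetical stand-in for P along the segment, with minimum at 0.3. */
static double demo_P(double lambda, void *ctx)
{
    (void)ctx;
    return (lambda - 0.3) * (lambda - 0.3);
}
```

Because P is convex along the segment, a unimodal bracketing search of this kind is guaranteed to find the minimising λ.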
4 The parallel system
Parallel computing systems consist of several processors connected by high-speed
links which allow reliable and predictable communication. The system we used is the
nCUBE 2, a general purpose parallel system with a hypercube topology. The nCUBE
2, according to Flynn's classification [4], is a MIMD (Multiple Instruction Multiple
Data) system. The maximum number of processors is 8192, on a hypercube of order
13.
According to Bertsekas' [1] definition of a hypercube, we "consider the set of all
points in d-dimensional space with each coordinate equal to zero or one. These points
may be thought of as the corners of a d-dimensional cube. We let these points
correspond to processors, and we consider a communication link for every two points
differing in a single coordinate. The resulting network is called hypercube or
d-cube". Naturally, if d is the dimension of the cube then there are 2^d processors.
Fig. 2 depicts a four-dimensional hypercube.
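The binary-label view of the d-cube makes the basic operations one-liners: the Hamming distance between two processor labels is the number of differing bits, and each neighbour is reached by flipping one bit. A small illustrative sketch:

```c
#include <assert.h>

/* Processors of a d-cube are the binary labels 0 .. 2^d - 1; two processors
   are directly linked iff their labels differ in exactly one bit, so the
   Hamming distance between labels is the length of a shortest route. */
static int hamming(unsigned a, unsigned b)
{
    unsigned x = a ^ b;
    int h = 0;
    while (x) { h += (int)(x & 1u); x >>= 1; }
    return h;
}

/* Neighbour of processor p across dimension k (0 <= k < d): flip bit k. */
static unsigned neighbour(unsigned p, int k)
{
    return p ^ (1u << k);
}
```

For instance, labels 5 (101) and 6 (110) differ in two bits, so a message between them crosses two links.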
Two processors of a d-cube have d independent paths between them, and their
Hamming distance h satisfies 1 ≤ h ≤ d: h is equal to the number of bits in which
their identity numbers differ. This property is very important: in fact, if the
identity numbers of two nodes in a d-cube differ in k bits, then there are k
independent paths of k links between these nodes, and the remaining d−k paths have
k+2 links. This characteristic makes the hypercube the most efficient parallel
system but also the most flexible, allowing it to represent other parallel systems
like arrays, meshes, rings and spanning trees.

Fig. 1 - Algorithm flow chart for the combined distribution/assignment problem:
from an initial solution, the shortest path step computes the minimum costs, the
distribution step computes the new trips q, the network loading step computes the
new flows y, and the convergence test either stops or sets n = n+1 and repeats.

Each nCUBE 2 node consists of an nCUBE 2S processor and 1 to 64 Mbytes of
memory [9]. The processor integrates a 64-bit general purpose CPU, a memory
management unit (MMU) and a network communication unit (NCU). The NCU
includes 14 bidirectional DMA (Direct Memory Access) ports. Of these, 13 are
dedicated to interprocessor communications, and one to communication with remote
hypercube spaces or foreign systems. The architecture of the NCU makes the 28
channels able to operate independently of the CPU, allowing communication and
computation to occur simultaneously. In this way, adding processors to the system
increases both computational speed and communications bandwidth.
5 The parallel algorithm
The program was realised in extended C code for nCUBE 2 machines.

Fig. 2 - A view of an order-4 hypercube.

Given the typically serial iterative structure of the algorithm, we chose a
master-slave organisation as the best parallel
approach: in a cube with p processors, only one processor (master) coordinates and
distributes jobs to the other p−1 processors (slaves). This means that only the master
knows the whole problem. Fig. 3 shows the activities of the master and the slaves and
the data flows. The synchronisation is obtained by a message passing method.
Fig. 3 - Master and slave activities and data flows: the master manages the
shortest path problem, the doubly constrained gravity model, network loading, the
test for convergence and the updating of the new solution; the slaves compute in
parallel the shortest path for a single O/D pair, the gravity model and the
objective function value; O/D pairs, cost arrays, the O/D matrix, new flow arrays
and objective function values are exchanged between them.

The program starts with the allocation of a hypercube of the required dimension.
After this phase, each slave processor receives a copy of the network information
and the subroutines for the computation of all modules subject to a parallel
activity: the shortest path module
(SPM), the gravity module (GM) and the line search module (LSM). The SPM is the
time-critical part of the whole algorithm. Since it is possible to calculate the shortest path
independently for each O/D pair, we have decomposed the whole problem into Z(Z−1)
shortest path problems, where Z is the number of centroids. Using p processors, the
work per processor theoretically drops to the order of Z(Z−1)/p shortest path
problems. This approach is preferable in order to obtain good scalability, as will
be shown in the experimental frame.
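One simple way to realise this decomposition (the paper does not detail the exact mapping used) is to index the Z(Z−1) O/D pairs linearly and give each of the p processors a contiguous block. A hypothetical sketch:

```c
#include <assert.h>

/* Static decomposition of the Z(Z-1) O/D pairs among p processors: pairs are
   indexed linearly and processor `rank` receives the contiguous block
   [*lo, *hi). One possible mapping, sketched for illustration. */
static void my_block(int Z, int p, int rank, int *lo, int *hi)
{
    int total = Z * (Z - 1);
    int base = total / p, rem = total % p;
    *lo = rank * base + (rank < rem ? rank : rem);
    *hi = *lo + base + (rank < rem ? 1 : 0);
}

/* Recover the (origin, destination) pair from a linear index, skipping the
   diagonal: index idx in [0, Z(Z-1)) maps to origin i and destination j != i. */
static void pair_of(int Z, int idx, int *i, int *j)
{
    *i = idx / (Z - 1);
    int r = idx % (Z - 1);
    *j = (r >= *i) ? r + 1 : r;
}
```

With Z = 50 and p = 8, for example, the 2450 pairs split into blocks of 306 or 307 consecutive indices, one per processor.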
For GM and LSM we have used the nCUBE 2 Parallel Library, which contains
communications routines, like a broadcast between processors, and global functions,
like a global sum of a 32-bit vector. Then the GM and LSM simply become
computation subroutines called from the master processor.
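A global sum on a hypercube is classically performed by dimension exchange: d rounds in which every processor swaps its partial result with the neighbour across one bit and adds it. The sketch below simulates that pattern sequentially to show the data movement; it illustrates the idea behind such a global-sum routine and is not the library call itself.

```c
#include <assert.h>

/* Sequential simulation of a global sum on a d-cube by dimension exchange:
   in round k every processor exchanges its partial sum with the neighbour
   across bit k and adds it; after d rounds all 2^d processors hold the full
   sum. v[node] holds processor node's value. */
static void global_sum(double *v, int d)
{
    int p = 1 << d;
    for (int k = 0; k < d; k++) {
        for (int node = 0; node < p; node++) {
            int nb = node ^ (1 << k);
            if (node < nb) {        /* combine each exchanging pair once */
                double s = v[node] + v[nb];
                v[node] = s;
                v[nb] = s;
            }
        }
    }
}
```

On a 2-cube holding {1, 2, 3, 4}, two exchange rounds leave every processor with the total 10.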
The network loading and the test for convergence do not need to be parallelised;
for this reason we have included them in the master activity.
It must be pointed out that the data consist of very long arrays and lists: this
posed a problem because of the size of the nCUBE 2 message buffer. As the
problem dimension increases, the buffer can overflow, causing the algorithm to fail
for memory reasons. We solved this problem by breaking messages into slices of
fixed size.
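The slicing scheme can be sketched as a small wrapper around the send primitive: the array is walked in fixed-size chunks so that no single message exceeds the buffer. `send` here is a hypothetical callback standing in for the system's message primitive, not the actual nCUBE call:

```c
#include <assert.h>
#include <stddef.h>

/* Send a long array in fixed-size slices so that no single message exceeds
   the communication buffer. `send` is a hypothetical message primitive
   passed as a callback. */
static void send_sliced(const char *data, size_t len, size_t slice,
                        void (*send)(const char *chunk, size_t n))
{
    size_t off = 0;
    while (off < len) {
        size_t n = (len - off < slice) ? (len - off) : slice;
        send(data + off, n);
        off += n;
    }
}

/* Demo receiver counting bytes and chunks, standing in for the real sink. */
static size_t total_received = 0, chunks = 0;
static void demo_send(const char *chunk, size_t n)
{
    (void)chunk;
    total_received += n;
    chunks++;
}
```

A 1000-byte array with a 256-byte slice, for instance, goes out as three full slices plus one 232-byte remainder.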
6 The experimental frame
The tests have been carried out on randomly generated networks. The random
generator we developed permits choosing the network characteristics: the number of
centroids and total nodes, the total volume of demand and supply for each centroid,
and the ranges of arc capacity and length. The connectivity of the network has been
guaranteed by an open bidirectional route connecting all nodes.
We have generated four test groups, each one constituted of five networks, as
summarised in table 1.
The principal aim of this work has been to assess the improvement in computing
performance obtained by parallelising the classical serial algorithm. To evaluate the parallel
implementation we have used two performance indices:
S_p(n) = T_1(n) / T_p(n),        E_p(n) = S_p(n) / p

where p is the number of processors, n the problem dimension, and T_1(n) and T_p(n)
the computing times employing respectively one and p processors.
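The two indices are direct ratios of the measured times; as a minimal sketch:

```c
#include <assert.h>
#include <math.h>

/* Speedup S_p(n) = T_1(n)/T_p(n) and efficiency E_p(n) = S_p(n)/p,
   computed from measured times t1 (one processor) and tp (p processors). */
static double speedup(double t1, double tp)
{
    return t1 / tp;
}

static double efficiency(double t1, double tp, int p)
{
    return speedup(t1, tp) / (double)p;
}
```

For example, a run that takes 120 s serially and 15 s on 16 processors has speedup 8 and efficiency 0.5.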
Tab. 1 - The four test groups.

Group | Centroids | Total nodes | Number of O/D pairs | Range of arc numbers
A     | 50        | 300         | 2450                | 1241-1262
B     | 100       | 1000        | 9900                | 4118-4136
C     | 150       | 1000        | 22350               | 4157-4168
D     | 150       | 2000        | 22350               | 8014-8097
The speedup estimates the performance of the parallel algorithm compared to the best
serial one. The efficiency measures the fraction of time that a typical processor is
usefully employed. If S_p(n) = p and E_p(n) = 1, the p processors speed up the
computation by a factor p, i.e. the computing time will be T_1(n)/p. This condition is
named linear speedup. Because of the global synchronisation, linear speedup must be
considered as an upper bound.

In order to run as many tests as possible, only one iteration of the algorithm has
been performed, because the serial computational time is very high even for small
networks.

Fig. 4 - Scalability diagram.
Tab. 2 - Speedup and efficiency mean values.

Group of | S_8  | E_8   | S_16  | E_16  | S_32  | E_32  | S_64  | E_64  | S_128  | E_128
problems |      |       |       |       |       |       |       |       |        |
A        | 7.92 | 0.990 | 14.00 | 0.875 | 30.08 | 0.940 | 53.12 | 0.830 |  96.00 | 0.750
B        | 7.00 | 0.875 | 14.00 | 0.875 | 28.00 | 0.875 | 52.03 | 0.813 | 107.90 | 0.843
C        | 6.00 | 0.750 | 12.00 | 0.750 | 26.02 | 0.813 | 52.67 | 0.823 | 113.92 | 0.890
D        | 7.00 | 0.875 | 14.00 | 0.875 | 29.12 | 0.910 | 55.04 | 0.860 | 115.20 | 0.900
Tab. 2 shows the average speedup and efficiency values obtained when changing the
number of processors. As can be observed, the efficiency is very high and
sometimes close to 1. This depends on two reasons. The first regards the
hardware characteristics of the nCUBE 2, which allow excellent interprocessor
communication in comparison with other parallel architectures. The second
results from the parallel algorithm and especially from the particular parallelisation of
the shortest path problem. It must be observed that the highest values of efficiency
are not in general obtained for the maximum number of employed processors. This
depends on the transmission time, which in these cases exceeds the computing time.

Fig. 4 depicts the scalability of the algorithm, expressed simply in computing time
as the problem dimension changes, for a fixed number of processors (from 1 to 128).
A dashed polyline joins the computation times for each number of processors as the
problem dimension changes. For the serial code, this polyline immediately exhibits a
high slope, which increases further starting from the C group. Therefore, the serial
computing time has a quasi-exponential behaviour versus the problem dimension. The
experiments with the parallel code exhibit, instead, dashed polylines with a gently
increasing trend. This behaviour is evident in figs. 5, 6 and 7, which display the
calculated average speedup compared to the linear speedup for the group B, C and D
problems. Increasing the problem dimension, the slope of the real speedup polyline
tends to the theoretical value (45°) not only globally but also locally.

Fig. 5 - Speedup for the group B problems.

Fig. 6 - Speedup for the group C problems.
Implementation notes

For the experimentation we made use of an nCUBE 2 machine configured with two
hypercubes: one with 128 processors with 4 Mbytes of memory each, and the other
with 8 processors with 16 Mbytes each. This system is installed at the CNUCE
Institute of CNR (Italian National Research Council).

Fig. 7 - Speedup for the group D problems.

The nCUBE
utilises a SUN workstation as a development system.
Conclusions
In this paper we dealt with a particular problem of the transport network planning
activity. In the scientific literature many exact models have been proposed, but few
computational experiences are available because of the high number of variables and
constraints, which causes long computational times. The purpose of this work was to
employ an nCUBE 2 parallel architecture to solve the combined
distribution/assignment problem. We have assessed the potential of MIMD machines
to handle large scale transport network problems. The results obtained on random
networks show very good performance of the parallel code compared to the serial
code. A quasi-linear speedup was achieved in solving medium-to-large scale
networks. Further work should be done to improve the efficiency of the shortest path
module and extend the model in order to solve real problems with greater detail in
system representation. To this last end it is necessary at least to consider multimodal
networks and employ a modal split model joined to trip distribution and traffic
assignment.
Acknowledgement
This work has been supported by "Progetto Finalizzato Trasporti 2" of CNR.
Bibliography
1. Bertsekas D. P., Tsitsiklis J. N., Parallel and Distributed Computation: Numerical
Methods, Prentice-Hall International Inc., 1989.
2. Evans S. P., Derivation and analysis of some models for combining trip
distribution and assignment, Transportation Research, 10, 1976.
3. Evans S. P., A relationship between the gravity model for trip distribution and the
transportation problem in linear programming, Transportation Research, 7, 1973.
4. Flynn M. J., Some computer organizations and their effectiveness, IEEE Trans.
Computers, C-21, 948-960, 1972.
5. Junchaya T., Chang G., Santiago A., Advanced traffic management system:
real-time network traffic simulation methodology with a massively parallel
computing architecture, Transportation Research Record, 1358.
6. Mahmassani H. S., Mouskos K. C., Vectorization of transportation network
equilibrium assignment codes, Impacts of Recent Computer Advances on
Operations Research, Elsevier Science Publishing Co., Inc., 1989.
7. Oppenheim N., Equilibrium trip distribution/assignment with variable destination
costs, Transportation Research, 27B, N. 3, June 1993.
8. Oppenheim N., Applied models in urban and regional analysis, Prentice-Hall, Inc.,
Englewood Cliffs, New Jersey, 1980.
9. Schmidt-Voigt M., Efficient parallel communication with the nCUBE 2S
processor, Parallel Computing, 20, 509-530, 1994.
10. Sheffi Y., Urban Transportation Networks, Prentice-Hall, Inc., Englewood Cliffs,
New Jersey, 1985.
11. Van Grol H. J. M., Bakker A. F., Special-Purpose Parallel Computer for Traffic
Simulation, Transportation Research Record, 1306.