2. Review of bridge management optimization models

System-Level Stochastic Optimization
of Bridge Maintenance and Replacement Decisions
Charles-Antoine Robelin1
Samer M. Madanat2
Abstract: A formulation of the problem of optimizing maintenance and replacement
decisions for a system of heterogeneous bridges is developed, allowing for exact
continuous solutions. The formulation is based on the solutions of a reliability-based,
facility-level optimization method that uses history-dependent deterioration models. An
implementation of the system-level solution is presented for the discrete case. This
numerical example supports that the discrete-case implementation represents a valid
approximation of the continuous case, while producing solutions that are
computationally efficient for large networks.
Acknowledgments: Partial funding for this research was provided by the University of
California Transportation Center (UCTC) to the first author through a dissertation-year
fellowship.
1
Graduate Student Researcher, Department of Civil and Environmental Engineering,
University of California Berkeley, 116 McLaughlin Hall, Berkeley, CA 94720-1720.
Telephone: +1-510-642-7390. Fax: +1-510-642-1246. Email address:
[email protected]
2
Xenel Professor, Department of Civil and Environmental Engineering, University of
California Berkeley, 110 McLaughlin Hall, Berkeley, CA 94720-1720. Telephone: +1510-643-1084. Fax: +1-510-642-1246. Email address: [email protected]
1.
Introduction
This article addresses the optimization of bridge maintenance and replacement
(M&R) decisions for a system of heterogeneous facilities. Evaluating the cost
effectiveness of maintenance and repair decisions requires the prediction of the future
condition of the facilities. This prediction is performed through the use of deterioration
models. Given the inherent stochasticity associated with the deterioration, the purpose
of a deterioration model is to explain part of the uncertainty in the prediction of future
conditions. There exists a wide range of deterioration models, differing in their required
inputs and their prediction capabilities. This article concentrates on the use of
deterioration models that take into account aspects of the history of maintenance and
deterioration. The use of these history-dependent models is presented in
computationally efficient formulations of the problem of optimizing maintenance and
replacement decisions for a system of several facilities.
The determination of optimal policies for a system of facilities cannot be reduced to
the determination of optimal facility-level policies, repeated independently for each
facility. At the system level, the facility-level decisions are interdependent due to the
presence of a budget constraint, which introduces trade-offs between facilities.
The system-level problem can be formulated using two different approaches: topdown or bottom-up. In a top-down approach, M&R optimization is first performed at
the system level. Actions are recommended for fractions of the system population.
Actual facilities on which actions are performed are then selected to match the systemlevel recommendations as closely as possible. This final selection may differ from the
optimization performed previously at the network level. The top-down approach is only
possible if the system consists of relatively homogeneous facilities. In a bottom-up
approach, M&R optimization is performed at the facility level, and actions are
recommended for individual facilities. These recommendations are then aggregated and
certain actions are selected to take into account system-level constraints, such as the
budget constraint mentioned above. The final selection of actions may differ from the
optimization done earlier at the facility level. The bottom-up approach is appropriate
when the system consists of heterogeneous facilities.
The formulation presented in this paper follows a bottom-up approach, in order to
account for facility-specific details and to be suitable for systems of heterogeneous
facilities.
2.
Review of bridge management optimization models
The objective of bridge management optimization models is to determine optimal
M&R decisions, based on the knowledge of the current condition of the system through
inspections and on the prediction of future condition through the use of deterioration
models.
The optimization can be formulated as a Markov decision process (Madanat, 1993;
Jiang et al., 2000). In these methods, the deterioration is described by a Markov chain,
with the state representing the condition of the facility. The main advantage of these
models is that they enable the use of standard and efficient optimization techniques. As
a consequence, these models have been used to solve the system-level problem, i.e. to
determine optimal M&R decisions for a system of facilities (Hawk, 1994; Golabi &
Shepard, 1997; Smilowitz & Madanat, 2000). The system-level optimization can be
solved using linear programming.
The limitation of the Markovian models is the memoryless assumption, according to
which the probability for the condition of a facility to transition from an initial state A
to a lower state B does not depend on the time spent in state A or on the history of
deterioration and maintenance. Although the assumption of history independence may
be valid for certain bridge states, namely those where the deterioration is primarily
governed by mechanical processes, it has been shown empirically that it is unrealistic
for bridge states where the deterioration is primarily governed by chemical processes
(Mishalani & Madanat, 2002). Moreover, these optimization models are based on a topdown approach, which is only feasible for systems of relatively homogeneous facilities.
Deterioration models in which the history of deterioration is taken into account exist
and have been used in bridge M&R decisions optimization, considering one facility
(Mori & Ellingwood, 1994), or a system of homogeneous facilities (Kong & Frangopol,
2003). However, due to the complexity of their underlying deterioration models, these
optimization methods use a very limited number of decision variables in order to remain
tractable. To the knowledge of the authors, there does not exist a system-level bridge
management optimization method that has more than a few decision variables and that
is based on a deterioration model that takes into account the history of deterioration and
maintenance.
The purpose of the present paper is to determine optimal bridge M&R decisions for a
system of heterogeneous facilities, while using a deterioration model that takes into
account important aspects of the history of deterioration and maintenance.
3.
Problem definition and formulation
The system considered in the present article is a system of facilities managed by a
single agency, such as a state Department of Transportation in the United States.
Although the condition of each bridge in the system changes over time, the system
remains constant over the planning horizon: bridges are neither added to the system nor
decommissioned. The system can be composed of heterogeneous bridges.
Of the three main bridge components (deck, substructure, superstructure), the deck
experiences the fastest deterioration. Therefore, the deck is the only component
considered in this study. The condition of a bridge deck is measured by its reliability
index  , which is linked to its probability p of failure by the following relationship:


p   , where  is the cumulative standard normal distribution. The deterioration
of each facility is modeled using a Markov chain whose state includes aspects of the

history of deterioration and maintenance, and whose transition probabilities are

estimated using a mechanistic model of deterioration.
The planning horizon is broken down into periods of one year. Maintenance and
replacement decisions are made every year for every facility.
The condition of each facility is assumed to be known perfectly, which in practice
means that inspections are carried out frequently and are error free.
3.1.
Facility-level problem and solution
The system-level optimization presented in this article is based on the results of the
optimization of maintenance and replacement decisions for one facility summarized
below (Robelin & Madanat, 2006).
The facility-level problem uses a Markovian deterioration model that accounts for
aspects of the history of deterioration and maintenance. These aspects are: the time
since the performance of the latest maintenance action and the type of that action. Such
a model represents a compromise between simple Markovian deterioration models that
allow the use of standard optimization techniques, and detailed mechanistic
deterioration models whose complexity prevents efficient optimization of maintenance
and replacement decisions. The state of the Markov chain that represents deterioration
is composed of the following variables:

 : current reliability index of the facility;

m : integer indicating the type of the latest action (maintenance or replacement)

performed on the deck (or 0 if no action has been performed since the deck was

new);

 : time since the latest action (or time since the deck was new, if no action has
been performed yet);
 The transition probabilities of the Markov chain are estimated from a mechanistic
deterioration model (Frangopol et al., 2001), using Monte Carlo simulation.
The M&R optimization problem can be formulated as a finite-state Markov decision
process, with a finite horizon (Bertsekas, 2001). It consists of minimizing the expected
cost of maintenance and replacement, subject to the following reliability constraint: the
reliability index of the facility must remain above a user-defined threshold.
The following notation is used:

X : state space of the Markov chain representing the deterioration of the deck.
X is the set of all possible values for x  ,m,  , as defined above.


U x : set of all possible M&R actions, i.e. all types of maintenance actions,

replacement, or do-nothing.




c u x : cost of action u performed on bridge deck in state x .

 : discount amount factor,   1/1 r where r is the discount rate.

Vt x  : minimum cost-to-go for the agency to manage a bridge deck from year t





to the end of the planning horizon, starting from state x in year t .



: set of optimal decisions. t x  is the optimal decision when the bridge deck

is in state x in year t .

 The problem formulationis as follows, for any x  X :


Vt x   min uU c u x    

 V S if t  T 1
y X

P y | x Vt 1 y  if t  0, ,T  2

(1)
where V S captures the usefulness of the bridge deck past the planning horizon. For
practical purposes, the influence of the actual value of V S is limited if the discount rate

r is strictly positive. The cost c u x  is used to implement the reliability constraint: for



each state x corresponding to an unacceptable reliability index, the cost of action u is
chosen to be infinite for each action u  U .
The problem can be solved using backward recursion (Bertsekas, 
2001), which

provides, for all time steps t and all states x , the optimal value t x , i.e. the optimal
action to be taken at time step t if the facility is in state x . The function  is referred to



as a policy, and is defined mathematically as a mapping from the set composed of the

state space and the time space ( X  0, ,T
) to the set U 
of actions. The optimal
expected cost of maintaining the bridge deck over the entire planning horizon is also
 recursion algorithm: V x , where x is the initial state of
provided by the backward
0 0
0
the facility.

3.2.

Formulation of the system-level problem
In the facility-level problem, the reliability threshold was expressed in terms of
reliability index  . In the system-level problem, it is expressed in terms of probability
of failure and is denoted by p . These two expressions are equivalent, since there is a
 relationship between the probability of failure p and the reliability index  :
one-to-one
p   .


At the facility level, the optimization consists 
of minimizing the cost of maintenance

and replacement, subject to a reliability constraint. A direct extension of this
formulation at the system level would consist of minimizing the cost of maintenance
and replacement for all facilities, subject to a reliability constraint for each facility. This
would require the user of the optimization method to provide a value for the threshold
of the probability of failure for each facility. An intuitive choice could be to take the
same value of the threshold for all facilities. With a slightly different formulation, it can
be proven that this choice is actually optimal in the context of reliability-based
optimization. An advantage of this formulation is that it avoids having to prioritize
bridges on the basis of the amount of traffic they carry, which eliminates the difficulty
of having to assign a monetary value to human life.
Let us define the set of decision variables as the set
p1,
, pn  of thresholds of
probability of failure for each facility, where n is the number of facilities. Let us
formulate the optimization as a minimization ofa quantity Jp1 ,
,p n 
, expressing the

performance in terms of probability of failure or risk, subject to a cost constraint. In the
context of reliability, it is relevant to choose Jp1 , ,p
to be the highest probability of
n
failure threshold among all facilities, denoted by
Jp1 ,
,p n 

 max i1,
,n 
pi
(2)
Another expression of the performance in terms of risk could be the average
 over all facilities. However, this expression is not suitable in the
probability of failure
context of reliability-based optimization, because the following unsafe situation would
be possible: the majority of the facilities with low probabilities of failure and a few
facilities with probabilities of failure close to 1.
The formulation of the system-level optimization is as follows:
min  p1 ,
, pn 
max p1, , pn 
(3)
subject to
n
 f p   B
i
i
i1
where B is the maximum total cost (budget), and f i pi  is the optimal cost of

maintenance and replacement for facility i  1, ,n, determined using the facility

level optimization method described in section 3.2.
 for a continuous interval of p 's, i.e. 0,1 . This
It is assumed that f i pi  is defined
 
i
assumption cannot be achieved in practice, since it would require solving an infinite
 the discrete case
number of 
facility-level problems; however, section 5 
shows that
implementation of this optimization problem can confidently be considered a good
approximation of the continuous case. Moreover, the functions f i are assumed to be
continuous. Even if these functions are likely to be only piecewise continuous in
 is smooth enough so that
practice, section 5 shows that the sum of these functions
approximating it by a continuous function is unlikely to lead to large errors.
For each facility, the function f i is nonincreasing. Intuitively, if larger probabilities of
failure are allowed for a facility (i.e. larger values of pi ), fewer M&R actions are
required, thus leading 
to a lower cost f i pi  . More formally, if pi  p j , then the set of

feasible actions with the threshold of probability of failure pi is included in the set of


feasible actions with the threshold of probability of failure p j . Therefore, the

formulation indicates that decreasing the threshold pi of the probability of failure for

each facility increases the total cost, and that the objective of decreasing the pi 's is

constrained by the budget.

4.
Continuous-case solution
The solution is first determined in the case where the facility-level optimal cost,
f i pi  , is defined for a continuous interval of pi 's, for each facility i  1, ,n.

4.1.
Optimal solution


If the following two conditions are met:
(i) i, j  1, ,n2, p  p
i
j

n

(ii)  f i  pi   B
 i=1
(4)
then the set p1, , pn  of thresholds of probabilities of failure of n heterogeneous

facilities is an optimal solution of (3).
Theoptimal solution p * can be expressed as



p*  min p  0,1 p :


n
(5)
f p  B
i

i1
If the functions f i are strictly decreasing, the conditions (4) are also necessary. In that

case, the function F : p

n
F  p   f i  p can be inverted, and the optimal solution
i1
p * has a simple expression:


p*  F 1B
(6)
where F 1 is the inverse function of F .
A corner solution alsoexists when the budget is very low, but it is very unlikely to


happen in practice. If the budget B is so low that it does not allow the threshold of
probability of failure of all n facilities to be strictly less than 1, then the optimal

solution is to use the entire budget. In that case, some facilities will have a threshold of
 strictly less than 1, while at least one facility will have this
probability of failure
threshold at 1.
Given the nature of the problem, it is impossible to maintain the threshold of
probabilities of failure of all facilities at 0 with a finite budget B . Therefore, there is no
corner solution for large values of the budget B .

4.2.
Intuition for the proof of
 optimality
The second condition in (4), which indicates that the entire budget is used at
optimality, is intuitive. In order to obtain an intuitive explanation for the first condition,
which indicates that all pi 's are equal at optimality, let us consider a system of two
facilities, and the following situation: the threshold of probability of failure of facility 1,
 than that of facility 2, p , as shown in figure 1, and the combination
p1, is strictly lower
2
,
of decision variables p1, p2  is feasible (i.e. f1 p1   f 2  p2   B ). It is easy to show that


this combination cannot be optimal.
Since f1 p1  decreases as p1 is increased, and


f 2 p2  increases as p2 is decreased, it is possible to decrease p2 by a small amount 2






and to increase p1 by 1 so as to keep f1 p1   f 2 p2   f1 p1  1  f 2  p2  2 . Thus,
the new combination
p1,  1, p2  2 
is feasible, and the objective function is



improved. This gives an intuitive explanation as to why p1 and p2 must be equal at
 proofs of the existence and optimality of the solution are provided in
optimality. Formal
Robelin (2006).


5.
Discrete-case implementation
5.1.
Empirical evidence of the validity of the discrete-case results
In practice, the values f i pi  cannot be determined for a continuous interval of pi ,
because this would require solving the facility-level optimization problem described in

section 3.2 an infinite
number of times. The values f i pi  can therefore bedetermined
only for a finite set of values of pi , for each facility. The present section aims to show

empirically that the results of the discrete case implementation represent a good

approximation of the continuous-case
solution.
The numerical application considers a system of 742 heterogenous bridges. The
deterioration of each bridge deck is modeled according to Frangopol et al. (2001), and
maintenance and replacement cost information is adapted from Kong & Frangopol
(2003). A system of heterogenous bridges is created by changing the parameters
provided in these papers within reasonable ranges. The facility-level optimization
problem is solved for each facility, for various values of the threshold of probability of
failure. The condition of each facility, measured by its reliability index  , is
discretized, and various discretization step sizes are tried.
 of the three
The results for the system of 742 bridges are shown in figure 2. For each
discretization step sizes, the optimization problem is solved for different values of the
threshold p of probability of failure (two different values with step size 2, four different

values with step size 1, and seven different values with step size 0.5). For a given
discretization step size, it can be noted that the sum of the facility-level optimal costs
decreases as the threshold p of probability of failure increases, which is intuitive, and
was one of the assumptions made in the continuous case. Moreover, the variation of the
optimal cost withrespect to p is relatively smooth; therefore, it is appropriate to
interpolate between the results for different values of p , and the continuity assumption
case is likely to be verified.
made in the continuous
The graph also shows that, for a given 
value of p , the sum of the facility-level
optimal costs decreases as the discretization step size decreases. This is intuitive,
because the model becomes finer as the step size decreases, thus allowing for
improvements in the optimization. It can also be noted that, for any given value of p ,
the difference in cost between step size 1 and step size 0.5 is much smaller than the
difference between step size 2 and step size 1, which suggests that the results
``converge'' as the step size decreases to 0.
These arguments, derived on a system of significant size, provide empirical evidence
that the results found in the discrete-case implementation can confidently be considered
valid approximations of the results in the continuous case.
5.2.
Implementation and derivation of facility-level policies
The facility-level optimization problem is first solved for each facility, for different
values of the threshold p of probability of failure (or equivalently, for different values
of the reliability index). This provides the optimal cost f i p for a discrete set of values
of p , for eachfacility. After summing these values over all facilities and interpolating,
the function F , defined as follows, is known.



F : 0,1  
(7)
n
p
 f p
i
i1
Given a user-defined value B of the budget, the optimal value p * of the threshold of
probability of failure
can be expressed as:

p*  min p : F p  B 
(8)
For each facility, solving the facility-level optimization with the threshold of
 taken as p * provides a set of maintenance and replacement
probability of failure
policies. This set of policies is optimal at the system level. The implementation
procedure is described 
in figure 3.
Note that the optimization described in section 3.2 utilizes a threshold of reliability
index instead of a threshold of probability of failure. The threshold of reliability index
corresponding to the threshold
p * of probability of failure is as follows:
*  1p*, where  is the cumulative standard normal distribution.


5.3.
Computational
complexity

The solution is such that the computational complexity of the system-level problem is
low. Namely, the fact that the threshold of probability of failure is the same for all
facilities at system-level optimum reduces the optimization problem for n facilities to n
independent facility-level problems. Therefore, the complexity is proportional to the


number of facilities in the system, i.e. On . For each facility,
the facility-level
optimization problem is solved a small number of times, as seen earlier. Since the time
 is of the order of a few seconds on a personal
to solve one facility-level problem
computer, typical values of computation times for the system-level problem are five
hours per thousand facilities. These computation times indicate that the present systemlevel optimization method can be applied to systems of the size of that managed by a
state Department of Transportation in the United States. For example, if applied to the
system managed by Caltrans, composed of 24,000 bridges (California Department of
Transportation, 2006), the optimization would require a computation time of
approximately five days, which is very short compared with the time scale of
maintenance and replacement decisions.
6.
Conclusion
This paper addresses the determination of the optimal maintenance and replacement
policies for a system of bridge decks. The results of the facility-level optimization are
incorporated in a new reliability-based, system-level formulation, with the following
characteristics:

the system can be composed of heterogeneous facilities, and recommendations
are provided for each individual facility;

the optimization uses deterioration models that take into account aspects of the
history of deterioration and maintenance. Such deterioration models have not
been widely used in optimization methods present in the literature, although
their benefits in reducing the uncertainty in the prediction of facility condition
are substantial;

the computational efficiency of the system-level solution makes the formulation
suitable for systems of realistic sizes;

the solution is proven to be optimal in the continuous case, and a parametric
study shows that the results obtained in the discrete-case implementation of the
solution seem to be valid approximations of the continuous-case results.
The reliability-based formulation for a system of facilities presented in this article is
likely to be applicable to other systems: fleets of vehicles, other civil infrastructure
systems, supply chain of time-critical goods, assembly lines, etc.
References
D. P. Bertsekas (2001). Dynamic Programming and Optimal Control. Athena
Scientific, Belmont, MA, 2nd edition.
California Department of Transportation (2006). Structure Maintenance and
Investigations. <http://www.dot.ca.gov/hq/structur/strmaint/> (Oct 09, 2006).
D. M. Frangopol, et al. (2001). ‘Reliability-based management of highway bridges’.
Journal of Computing in Civil Engineering 15(1):27–34.
K. Golabi & R. Shepard (1997). ‘Pontis: a system for maintenance optimization and
improvement for US bridge networks’. Interfaces 27(1):71–88.
H. Hawk (1994). Bridgit: Technical Manual. Toronto, Canada.
M. Jiang, et al. (2000). ‘Optimal life-cycle costing with partial observability’. Journal
of Infrastructure Systems 6(2):56–66.
J. S. Kong & D. M. Frangopol (2003). ‘Life-cycle reliability-based maintenance cost
optimization of deteriorating structures with emphasis on bridges’. Journal of Structural
Engineering 129(6):818–828.
S. M. Madanat (1993). ‘Optimal infrastructure management decision under
uncertainty’. Transportation Research Part C 1C(1):77–88.
R. G. Mishalani & S. M. Madanat (2002). ‘Computation of infrastructure transition
probabilities using stochastic duration models’. Journal of Infrastructure Systems
8(4):139–148.
Y. Mori & B. Ellingwood (1994). ‘Maintaining reliability of concrete structures. ii:
Optimum inspection/repair’. Journal of Structural Engineering 120(3):846–862.
C.-A. Robelin (2006). Facility-Level and System-Level Stochastic Optimization of
Bridge Maintenance and Replacement Decisions Using History-Dependent Models.
Ph.D. thesis, University of California, Berkeley, CA.
C. A. Robelin & S. M. Madanat (2006). ‘History-dependent optimization of bridge
maintenance and replacement decisions using Markov decision processes’. Forthcoming
in Journal of Infrastructure Systems.
K. Smilowitz & S. M. Madanat (2000). ‘Optimal Inspection, Maintenance and
Rehabilitation Policies for Networks of Infrastructure Facilities under Measurement and
Forecasting Uncertainty’. Journal of Computer-Aided Civil and Infrastructure
Engineering 15(1):5–13.
Figure 1: System of two facilities with thresholds of probability of failure such that
p1  p2.

Figure 2: Aggregation of results of facility-level optimization for 742 bridges.
Figure 3: Determination of facility-level policies that achieve system-level optimality.