Probabilistic Analysis of Tree Search

C.J.H. McDiarmid

Abstract
Consider the family tree of an age-dependent branching process, where the branches have costs corresponding to birth times. The first-birth problem of Hammersley (1974) then concerns the cost of an optimal (cheapest) node at depth $n$. Suppose that we must explore the tree so as to find an optimal or nearly optimal node at depth $n$. We now have a suitable model for analysing the behaviour of tree search algorithms, and we may extend the investigations of Karp and Pearl (1983).
1. Introduction
Many algorithms considered in operations research, computer science and artificial intelligence may be represented as a search or partial search through a rooted tree. Such algorithms typically involve backtracking but try to minimise the time spent doing so. This paper extends work of Karp and Pearl (1983), and gives a probabilistic analysis of backtracking and non-backtracking search algorithms in certain random trees. We thus cast some light on the question of when to backtrack: it seems that backtracking is valuable just for problems with 'dead-ends'.
Let us review briefly the model and results of Karp and Pearl. They consider an infinite rooted tree in which each node has exactly two sons. The branches have independent $0,1$-valued random costs $X$, with $p = P(X = 0)$. (We have swapped $p$ and $1-p$ from the original paper.) The problem is to find an optimal (cheapest) or nearly optimal path from the root to a node at depth $n$.
The problem changes nature depending on whether the expected number $m_0 = 2p$ of zero-cost branches leaving a node is greater than 1, equal to 1 or less than 1 (as was suggested in Hammersley 1974, Note 8). When $m_0 > 1$ a simple 'uniform cost' breadth-first search algorithm A1 finds an optimal solution in expected time $O(n)$; and when $m_0 = 1$ this algorithm takes expected time $O(n^2)$. When $m_0 < 1$ any algorithm that is guaranteed to find a solution within a constant factor of optimal must take exponential expected time. However, in this case a 'bounded-lookahead plus partial backtrack' algorithm A2 usually finds a solution close to optimal in linear expected time. This successful performance of the backtracking algorithm A2 for the difficult case when $m_0 < 1$ was taken to suggest that similar heuristics should be of general use for attacking NP-hard problems.
We shall see that with the above search model, a simple non-backtracking bounded-lookahead algorithm A3 performs as successfully as the backtracking algorithm A2. Thus it seems hard to recommend the use of heuristics like A2 on the basis of this search model. Similar comments hold if we allow more general finite random costs on the branches.

However, there is a qualitative difference if we allow nodes to have no sons (or allow branches to have infinite costs) so that there are 'dead-ends'.
We extend Karp and Pearl's work by considering search in random trees generated by an age-dependent branching process, in which the mean number of children of an individual is greater than one. The investigation is related to the first-birth (or death) problem, as introduced by J.M. Hammersley (1974) (see also Joffe et al. 1973). This model is discussed further below. Let $p_0$ be the probability that a node has no sons, and let $m_0$ be the mean number of zero-cost branches leaving a node (instantaneous births).
Our results concerning algorithms A1 and A2 are natural extensions of Karp and Pearl's results. Thus the breadth-first search algorithm A1 finds an optimal solution in linear expected time if $m_0 > 1$ and in quadratic expected time if $m_0 = 1$. If $m_0 < 1$ then any algorithm with a constant performance guarantee must take exponential expected time, but the backtracking algorithm A2 finds a nearly optimal solution in linear expected time.
However, the performance of the simple non-backtracking bounded-lookahead algorithm A3 depends critically on whether $p_0 = 0$ or $p_0 > 0$. Suppose that $m_0 < 1$, so that optimal search is hard. If $p_0 = 0$, so that as in the Karp and Pearl model there are no dead-ends, then algorithm A3 usually finds a nearly optimal solution in linear expected time; that is, it performs as successfully as the backtracking algorithm A2. However, if $p_0 > 0$ then algorithm A3 usually fails to find a solution. Thus our model suggests that backtracking becomes attractive when there is the possibility of dead-ends.
In the next section we give details concerning the search model and the algorithms A1, A2 and A3, then in Section 3 we present our results, and finally Section 4 contains proofs.
2. Model and Algorithms
We suppose that the tree to be searched is the family tree $F$ of an age-dependent branching process of Crump-Mode type (see Crump and Mode 1968). In this model an initial ancestor is born at time $t = 0$ and then produces children at random throughout his lifetime. If $Z_1(t)$ denotes the number of children born up to time $t$, then $Z_1(t)$ is an arbitrary counting process, that is a non-negative integer-valued non-decreasing right-continuous random process. We do not insist that $Z_1(0) = 0$. The children of the ancestor form the first generation: from their several birth times they behave like independent copies of their parent. Their children form the second generation, and so on. We let $Z_n(t)$ be the number of individuals born in the $n$th generation by time $t$, and let $Z_n(\infty) = \sup_t Z_n(t)$.
We shall always assume that the mean number $m = E[Z_1(\infty)]$ of children of an individual satisfies $m > 1$, so that the extinction probability $q$ satisfies $q < 1$. Let $p_k = P[Z_1(\infty) = k]$. Clearly $q > 0$ if and only if $p_0 > 0$, and these conditions correspond to the existence of 'dead-ends'.
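As a purely illustrative aside, the following sketch shows one way such a family tree might be generated lazily in a simulation. The offspring law used, a Poisson(2) number of children with each birth delay equal to $0$ with probability $1/4$ and otherwise uniform on $(0,1)$, is an arbitrary assumption; it gives $m = 2 > 1$, $m_0 = 1/2 < 1$ and $p_0 = e^{-2} > 0$, so dead-ends can occur.

```python
import math
import random
from dataclasses import dataclass, field

# Illustrative only: a lazily expanded Crump-Mode family tree.  The offspring
# law below (Poisson(2) children; each birth delay is 0 with probability 1/4,
# otherwise Uniform(0,1)) is an arbitrary assumption, giving m = 2, m_0 = 0.5
# and p_0 = exp(-2) > 0, so dead-ends can occur.

def poisson(mean):
    # Knuth's inversion method; adequate for small means.
    limit, k, prod = math.exp(-mean), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= limit:
            return k
        k += 1

def birth_delay():
    return 0.0 if random.random() < 0.25 else random.random()

@dataclass
class Node:
    birth_time: float              # cost of the node = birth time of the individual
    depth: int                     # generation number
    _kids: list = field(default=None, repr=False)

    def children(self):
        # Children are generated on first access, so the (possibly infinite)
        # tree is never materialised in full.
        if self._kids is None:
            self._kids = [Node(self.birth_time + birth_delay(), self.depth + 1)
                          for _ in range(poisson(2.0))]
        return self._kids

root = Node(birth_time=0.0, depth=0)   # the initial ancestor, born at time 0
```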
When searching the family tree $F$ we take the cost of a branch to be the difference between the birth times of the child and the parent, and so we take the cost of a node to be the birth time of the corresponding individual. Thus we seek a first-born individual in generation $n$. We shall denote the corresponding optimal cost by $C_n^*$ (rather than $B_n$): if $Z_n(\infty) = 0$ then we set $C_n^* = \infty$. Thus
$$P(C_n^* = \infty) = P(Z_n(\infty) = 0) = q_n \to q \quad \text{as } n \to \infty.$$
The interesting case is when the tree to be searched is infinite: we shall often condition on the event $S$ of ultimate survival, and then $C_n^*$ is finite for all $n$.
Assumptions. Recall that we assume that the mean family size $m$ satisfies $m > 1$: this is essential. For convenience we shall also assume that $m$ is finite and that lifetimes (branch costs) are bounded. We are thus able to show that certain events of interest fail with exponentially small probabilities. (Truncation arguments as in Kingman (1975) may then be used to obtain 'almost sure' results under weaker assumptions.) Further, a simple translation allows us to assume that small costs can occur, that is $E[Z_1(\delta)] > 0$ for $\delta > 0$.
The distinction between zero and non-zero costs turns out to be important. Let $m_0 = E[Z_1(0)]$ be the expected number of zero-cost branches from a node (instantaneous births to an individual).
We shall discuss the performance of three algorithms, A1, A2 and A3, the first two of which are taken from Karp and Pearl (1983). Each algorithm maintains a subtree $T$ of the family tree $F$ containing the root, and at each step explores some node of $T$. Here, exploring a node $x$ means appending to $T$ the next (leftmost) child $y$ of $x$ in $F$ but not yet in $T$, and observing the cost of the corresponding branch $xy$; or observing that node $x$ has no more children.
Algorithm A1 is a 'uniform cost' breadth-first search algorithm and will be analysed for the cases $m_0 > 1$ and $m_0 = 1$, when there are many zero-cost branches and search is easy. Algorithm A2 is a hybrid of local and global depth-first search strategies and will be analysed for $m_0 < 1$. Algorithm A3 consists of repeated local optimal searches, and will be analysed also for $m_0 < 1$. Note that A1 is an exact algorithm, whereas A2 and A3 are approximation algorithms or heuristics.
For each algorithm A$j$, we let the random cost of the solution found be $C^{Aj}$ ($= \infty$ if no solution is found), and the random time taken be $T^{Aj}$. We measure time by the number of nodes of the search tree encountered. The three algorithms are as follows.
Algorithm A1: At each step, explore the leftmost node among those active nodes of minimum cost. Here, a node is active if it is in $T$ and may perhaps have further children. The algorithm halts when it would next explore a node at depth $n$. That node then corresponds to an optimal solution.
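As an illustration only, here is a minimal sketch of algorithm A1 operating on the lazily generated tree above; `Node` is the hypothetical class from that sketch. For simplicity a node is expanded all at once rather than one child per step, and ties among minimum-cost nodes are broken by insertion order rather than strictly left to right; neither simplification affects the behaviour being illustrated.

```python
import heapq

# Illustrative sketch of algorithm A1 (uniform-cost search to depth n).
# Assumes the hypothetical Node class from the model sketch, with fields
# birth_time, depth and a children() method.

def algorithm_A1(root, n):
    """Return (optimal cost at depth n, number of nodes encountered),
    or (inf, count) if the tree dies out before depth n."""
    frontier = [(root.birth_time, 0, root)]   # (cost, tie-breaker, node)
    tie = 0
    encountered = 1
    while frontier:
        cost, _, node = heapq.heappop(frontier)
        if node.depth == n:
            # Branch costs are non-negative, so the first depth-n node
            # removed from the frontier is a cheapest one.
            return cost, encountered
        for child in node.children():
            tie += 1
            encountered += 1
            heapq.heappush(frontier, (child.birth_time, tie, child))
    return float("inf"), encountered
```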
Algorithm A2: This algorithm conducts a staged search with backtracking if a local test is failed. It has three parameters: $d$, $L$, and $\alpha$. By an $(\alpha, L)$-regular path we mean a path which consists of segments each of length $L$ and cost at most $\alpha L$ (except that the last segment may have length less than $L$). The algorithm A2 conducts a depth-first search to find an $(\alpha, L)$-regular path from a depth $d$ node to a depth $n$ node. If it succeeds in reaching depth $n$, the algorithm returns the corresponding path as a solution: if it fails, the search is repeated from another depth $d$ node. If all the nodes at depth $d$ fail to root an $(\alpha, L)$-regular path to a depth $n$ node, the algorithm terminates with failure.
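Again purely as an illustration, a sketch of algorithm A2, assuming the hypothetical `Node` class from the model sketch. Minor bookkeeping (the time count, and the budget allowed for a short final segment) is simplified, but the backtracking structure is as described: within the subtree of each depth-$d$ start the search backtracks freely, and the depth-$d$ starts are tried one after another.

```python
# Illustrative sketch of algorithm A2 (bounded lookahead plus partial
# backtrack).  Assumes the hypothetical Node class from the model sketch.

def segment_search(node, length, budget):
    """Yield descendants exactly `length` levels below `node` whose cost
    exceeds node's cost by at most `budget` (depth-first, left to right)."""
    if length == 0:
        yield node
        return
    for child in node.children():
        edge = child.birth_time - node.birth_time
        if edge <= budget:
            yield from segment_search(child, length - 1, budget - edge)

def find_regular_path(node, target_depth, alpha, L):
    """Depth-first search, with backtracking, for an (alpha, L)-regular path
    from `node` down to depth `target_depth`; returns its end node or None."""
    if node.depth >= target_depth:
        return node
    seg = min(L, target_depth - node.depth)
    for end in segment_search(node, seg, alpha * seg):
        found = find_regular_path(end, target_depth, alpha, L)
        if found is not None:
            return found
    return None

def algorithm_A2(root, n, d, L, alpha):
    # Try each depth-d node in turn as the root of an (alpha, L)-regular path.
    for start in segment_search(root, d, float("inf")):
        end = find_regular_path(start, n, alpha, L)
        if end is not None:
            return end.birth_time          # cost of the solution found
    return float("inf")                    # failure
```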
Algorithm A3: This simple bounded-lookahead or 'horizon' heuristic is a staged-search algorithm which avoids backtracking. It has one parameter $L$. Starting at the root it finds an optimal path to a node at depth $L$, makes that node the new starting point and repeats.
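For comparison, an equally rough sketch of algorithm A3, again assuming the hypothetical `Node` class: each stage enumerates the next $L$ levels exhaustively, moves to a cheapest node found there, and gives up without backtracking as soon as a stage finds no descendants.

```python
# Illustrative sketch of algorithm A3 (bounded lookahead, no backtracking).
# Assumes the hypothetical Node class from the model sketch.

def depth_L_optimum(node, L):
    """Return a cheapest descendant exactly L levels below `node`,
    or None if the subtree dies out first."""
    level = [node]
    for _ in range(L):
        level = [child for v in level for child in v.children()]
        if not level:
            return None                    # dead-end within the lookahead
    return min(level, key=lambda v: v.birth_time)

def algorithm_A3(root, n, L):
    node = root
    while node.depth < n:
        node = depth_L_optimum(node, min(L, n - node.depth))
        if node is None:
            return float("inf")            # A3 fails: it never backtracks
    return node.birth_time
```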
3. Results
We summarise our results in six theorems. Theorem 3.1 concerns the region where the mean number $m_0$ of zero-cost branches leaving a node satisfies $m_0 > 1$, Theorem 3.2 concerns $m_0 = 1$ and Theorems 3.3-3.6 concern $m_0 < 1$. When $m_0 > 1$ the main distinction is between zero and non-zero costs. Recall that $S$ denotes the event of ultimate survival.
Theorem 3.1. Let $m_0 > 1$.
(a) The random variable $C^* = \lim C_n^*$ is finite almost surely on $S$, and indeed there exists $\delta < 1$ such that
$$P(C^* > k \mid S) = O(\delta^k) \quad \text{as } k \to \infty.$$
(b) The time $T^{A1}$ taken by algorithm A1 satisfies $E[T^{A1}] = O(n)$.
Thus, if the family tree is infinite, the optimal cost $C_n^*$ remains bounded as $n \to \infty$, and algorithm A1 finds an optimal path in linear expected time.
Next we consider the critical case $m_0 = 1$. It is convenient here to restrict attention to a Bellman-Harris age-dependent branching process (see for example Harris 1963). Now children are produced according to a simple Galton-Watson branching process, and branch costs are independent and each distributed like some non-negative random variable $X$.
Theorem 3.2. Consider a Bellman-Harris process with $m_0 = mP(X = 0) = 1$ and $E[Z_1(\infty)^2] < \infty$.
(a) If further $E[Z_1(\infty)^{2+\varepsilon}] < \infty$ for some $\varepsilon > 0$, $P(0 < X < 1) = 0$ and $P(X = 1) > 0$, then
$$C_n^*/\log\log n \to 1$$
almost surely on $S$ as $n \to \infty$.
(b) The time $T^{A1}$ taken by algorithm A1 satisfies $E[T^{A1}] = O(n^2)$.
Part (a) shows roughly that if the optimal cost is finite then it is usually close to $\log\log n$: it is a special case of a result of Bramson (1978). Part (b) states that the algorithm A1 finds an optimal path in quadratic expected time.
Our main interest is in the case $m_0 < 1$. The first result for this case shows that we cannot quickly find guaranteed optimal or near optimal solutions, and so it is of interest to analyse heuristic approximation methods. The next result concerns the optimal cost $C_n^*$ and then we consider the algorithms A2 and A3.
Theorem 3.3. Assume that $m_0 < 1$. Let $T_n$ be the least number of nodes explored in any proof that guarantees a certain path of length $n$ to be within a constant factor $\beta$ of optimal. Then there exist $\eta > 1$ and $\delta < 1$ such that
$$P(T_n < \eta^n \mid S) = O(\delta^n) \quad \text{as } n \to \infty.$$
Theorem 3.4. There is a constant $\gamma \ge 0$, defined by equation (4.1) below and satisfying $\gamma > 0$ if and only if $m_0 < 1$, such that for any $\varepsilon > 0$ there exists $\delta < 1$ with
$$P\Big(\Big|\tfrac{1}{n}C_n^* - \gamma\Big| > \varepsilon \;\Big|\; S\Big) = O(\delta^n) \quad \text{as } n \to \infty.$$
This result shows roughly that if the optimal cost $C_n^*$ is finite then it is usually close to $\gamma n$. It is essentially due to Hammersley (1974) and Kingman (1975), see also Kesten (1973), Kingman (1976). We shall find that it follows quite easily from our analysis of the search algorithm A2; see also Biggins (1979).
Theorem 3.5. Let $m_0 < 1$, and consider the backtracking algorithm A2. For any $\varepsilon > 0$, with appropriate parameters the algorithm runs in linear expected time, and there exists $\delta < 1$ such that
$$P\big(C^{A2} \le (1+\varepsilon)C_n^*\big) = 1 - O(\delta^n) \quad \text{as } n \to \infty.$$
Theorem 3.6. Let $m_0 < 1$, and consider the non-backtracking algorithm A3.
(a) If $p_0 = 0$ then for any $\varepsilon > 0$, with appropriate constant lookahead the algorithm runs in linear expected time, and there exists $\delta < 1$ such that
$$P\big(C^{A3} \le (1+\varepsilon)C_n^*\big) = 1 - O(\delta^n) \quad \text{as } n \to \infty.$$
(b) If $p_0 > 0$, then for any constant lookahead there exists $\delta < 1$ such that
$$P(C^{A3} < \infty) = O(\delta^n) \quad \text{as } n \to \infty.$$
We thus see that the backtracking algorithm A2 is a good heuristic, and so is the non-backtracking algorithm A3 as long as $p_0 = 0$.
Hammersley (1974, Note 8) asked about the concentration of the random variable $C_n^*$, in particular in the special case considered by Karp and Pearl (1983) when each individual has exactly two children, both born at time 0 or 1. For this case the bounded differences inequality of Hoeffding (1963), Azuma (1967) in the form given in McDiarmid (1989) shows immediately that for any $t > 0$,
$$P\big(|C_n^* - E(C_n^*)| \ge t\big) \le 2e^{-2t^2/n}$$
(regard $C_n^*$ as a function of the $n$ levels of branch costs: altering the costs at any one level changes $C_n^*$ by at most 1).
4. Proofs
The first lemma below immediately gives part (a) of Theorem 3.1.
Lemma 4.1. Suppose that $m_0 > 1$. Let $T$ be the least depth at which an infinite path of zero-cost branches is rooted, where we let $T = \infty$ if there is no such path. Then there exists $\delta < 1$ such that $P(T > n \mid S) = O(\delta^n)$ as $n \to \infty$.
Proof: The zero-cost branches yield a branching process $\hat Z$ with mean $m_0 > 1$ and thus with extinction probability $\hat q < 1$. Suppose that $p_0 + p_1 > 0$. Then $a = f'(q)$ satisfies $0 < a < 1$. So, by a minor extension of Theorem 8.4 in Chapter I of Harris (1963), $f_n(\hat q) = q + O(a^n)$. Here $f_n$ is the generating function for $Z_n(\infty)$. But
$$P(T > n) = \sum_k P(Z_n(\infty) = k)\,\hat q^{\,k} = f_n(\hat q).$$
Hence
$$q + O(a^n) = P(T > n) = (1-q)\,P(T > n \mid S) + q,$$
and so $P(T > n \mid S) = O(a^n)$, as required. The case not considered so far is when $p_0 + p_1 = 0$, but then clearly
$$P(T > n \mid S) = P(T > n) \le \hat q^{\,2^n}. \qquad \square$$
Now consider a Galton-Watson branching process $\hat Z$. Let $D_n$ be the number of nodes encountered in a depth-first search of the family tree which terminates on reaching a node at depth $n$ or on searching the complete tree, and let $d_n = E[D_n]$.
Lemma 4.2. For each $n$, $d_n \le n + 1$.
Proof: Let $q_n = P(\hat Z_n = 0)$. Of course $d_0 = 1$. Suppose that $d_n$ is finite. Then by conditioning on $\hat Z_1$ (the subtree of the $i$th child is searched only if the subtrees of the first $i-1$ children fail to reach depth $n$, an event of probability $q_n^{\,i-1}$) we see that
$$d_{n+1} = 1 + \sum_{k \ge 1} \hat p_k \big(1 + q_n + \cdots + q_n^{k-1}\big) d_n = 1 + \frac{d_n}{1 - q_n} \sum_{k \ge 0} \hat p_k \big(1 - q_n^k\big) = 1 + \frac{d_n}{1 - q_n}\big(1 - \hat f(q_n)\big) \le 1 + d_n,$$
since $\hat f(q_n) = q_{n+1} \ge q_n$. The result follows by induction. $\square$
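As a quick illustrative check of Lemma 4.2 (not part of the proof), the following Monte Carlo sketch estimates $d_n$ under an arbitrary Poisson(2) offspring law and compares it with the bound $n+1$; the depth-first search visits children in an arbitrary order, which leaves the distribution of $D_n$ unchanged since the subtrees searched are exchangeable.

```python
import math
import random

# Monte Carlo illustration of Lemma 4.2 (d_n <= n + 1) for a Galton-Watson
# process with an illustrative Poisson(2) offspring law.

def poisson(mean):
    limit, k, prod = math.exp(-mean), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= limit:
            return k
        k += 1

def nodes_in_dfs_to_depth(n, mean=2.0):
    """Number of nodes encountered in a depth-first search of a random tree,
    stopping on reaching depth n or on exhausting the tree."""
    count = 0
    stack = [0]              # depths of nodes awaiting exploration; root at depth 0
    while stack:
        depth = stack.pop()
        count += 1
        if depth == n:
            break            # a node at depth n has been reached
        stack.extend([depth + 1] * poisson(mean))
    return count

n, trials = 20, 2000
estimate = sum(nodes_in_dfs_to_depth(n) for _ in range(trials)) / trials
print("estimated d_%d = %.2f (bound from Lemma 4.2: %d)" % (n, estimate, n + 1))
```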
We now prove part (b) of Theorem 3.1. Consider the branching process $\hat Z$ corresponding to the zero-cost branches. It has mean $m_0 > 1$ and so it has extinction probability $\hat q < 1$. It now follows from Lemma 4.2 (by Wald's equation) that
$$E[T^{A1}] \le \frac{n+1}{1 - \hat q} = O(n).$$
This completes our proof of Theorem 3.1. $\square$
Proof of Theorem 3.2: Part (a) has already been discussed, so consider part (b). Consider again the process $\hat Z$ corresponding to the zero-cost branches. This has mean $E[\hat Z_1] = m_0 = 1$ and variance $\hat\sigma^2 = \sigma^2 p^2 + mp(1-p) < \infty$, where $p = P(X = 0)$. Hence
$$P(\hat Z_n > 0) \sim \frac{2}{\hat\sigma^2 n}$$
(see for example Athreya and Ney 1972, p. 19). So, arguing as before,
$$E[T^{A1}] \le \frac{n+1}{P(\hat Z_n > 0)} = O(n^2). \qquad \square$$
To consider the case $m_0 < 1$ we must investigate the Crump-Mode model in more detail. The key to the analysis is the function $\phi(\theta)$ introduced by Kingman (1975). For $\theta \ge 0$, let
$$\phi(\theta) = E\Big[\sum_r e^{-\theta B_{1r}}\Big],$$
where the sum is over the birth times $B_{1r}$ of the children $r$ of the initial ancestor. Note that $\phi(0) = E[Z_1(\infty)] = m < \infty$, and so $\phi(\theta) < \infty$ for all $\theta \ge 0$. Next, for $a \ge 0$ let
$$\mu(a) = \inf\{ e^{\theta a} \phi(\theta) : \theta \ge 0 \},$$
and define the 'time constant' $\gamma$ by
$$\gamma = \inf\{ a \ge 0 : \mu(a) \ge 1 \}. \tag{4.1}$$
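To make (4.1) concrete, here is a small numerical sketch for the special case in which each individual has exactly two children and each branch cost is $0$ with probability $p$ and $1$ otherwise, so that $\phi(\theta) = 2(p + (1-p)e^{-\theta})$. The value $p = 0.3$ (giving $m_0 = 0.6 < 1$), the grid used for the infimum over $\theta$, and the bisection tolerance are all arbitrary illustrative choices.

```python
import math

# Crude numerical illustration of (4.1) for the special case of two children
# per node, branch cost 0 with probability p and 1 otherwise, so that
# phi(theta) = 2*(p + (1-p)*exp(-theta)).  With p = 0.3 we have m0 = 0.6 < 1,
# so gamma should come out strictly positive.

p = 0.3
THETAS = [0.01 * i for i in range(5000)]     # grid for the infimum over theta

def phi(theta):
    return 2.0 * (p + (1.0 - p) * math.exp(-theta))

def mu(a):
    # mu(a) = inf over theta >= 0 of exp(theta*a)*phi(theta), approximated on a grid
    return min(math.exp(t * a) * phi(t) for t in THETAS)

def gamma(tol=1e-6):
    # Bisection for gamma = inf{a >= 0 : mu(a) >= 1}; mu is non-decreasing in a,
    # with mu(0) = m0 < 1 and mu(1) = m = 2 >= 1 in this example.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mu(mid) >= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

print("m0 = %.2f, gamma approx %.4f" % (2 * p, gamma()))
```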
The next two lemmas may be found (essentially) in Kingman (1975).

Lemma 4.3. The function $\mu$ on $[0,\infty)$ is continuous; $\mu(0) = m_0$, and for some $b > 0$, $\mu$ is strictly increasing on $[0,b]$ and $\mu(a) = m$ for each $a \ge b$.

Lemma 4.4. For any $a \ge 0$,
$$E[Z_n(an)] = (\mu(a) + o(1))^n \quad \text{as } n \to \infty.$$
Proof of Theorem 3.5: Let $0 < \varepsilon < \gamma$ and let $\alpha = \gamma + \varepsilon$. By Lemmas 4.3 and 4.4 we may choose a constant $L$ such that $E[Z_L(\alpha L)] > 1$. By considering the $(\alpha, L)$-sons of a depth $d$ node and their $(\alpha, L)$-sons and so on we obtain a branching process $\tilde Z$ say. This process has mean $\tilde m > 1$ and thus has extinction probability $\tilde q < 1$.

We can bound the expected running time of algorithm A2 as follows. By Lemma 4.2 (and Wald's equation), for each node at depth $d$ the expected cost of a search to depth $n$ from that node is at most
$$m^{L+1}\big(\lceil (n-d)/L \rceil + 1\big) \le m^{L+1}(n+1).$$
Hence (by Wald's equation again)
$$E[T^{A2}] \le \frac{d + m^{L+1}(n+1)}{1 - \tilde q} = O(n).$$
Next consider costs. Let $\lambda$ be a bound on lifetimes or branch costs, and set $d = d(n) = \lfloor (\varepsilon/\lambda)n \rfloor$. If the algorithm A2 succeeds then
$$C^{A2} \le d\lambda + \lceil (n-d)/L \rceil (\alpha L) \le (\alpha + \varepsilon)n \quad \text{once } d \ge L.$$
But
$$P(C^{A2} = \infty) \le \sum_k P(Z_d(\infty) = k)\,\tilde q^{\,k} = f_d(\tilde q).$$
Also, by Lemmas 4.3 and 4.4,
$$P\big(C_n^* \le (\gamma - \varepsilon)n\big) \le E\big[Z_n((\gamma - \varepsilon)n)\big] = O(\delta_1^n)$$
for some suitable $\delta_1 < 1$. Hence
$$P\Big(C^{A2} > \frac{\gamma + 2\varepsilon}{\gamma - \varepsilon}\,C_n^*\Big) \le P\big(C_n^* \le (\gamma - \varepsilon)n\big) + P\big(\{C^{A2} = \infty\} \setminus \{C_n^* = \infty\}\big) \le O(\delta_1^n) + f_d(\tilde q) - f_d(0) = O(\delta_2^n)$$
for suitable $\delta_2 < 1$. $\square$
We do not need to prove Theorem 3.4 here, since it is implicit in Kingman (1975), but note that we have actually done so above. We may adapt the proof idea above to yield a variant of the 'Chernoff theorem' of Biggins (1979), which we shall use to prove Theorem 3.3.
Lemma 4.5. If $1 < \mu(a)$ then for any $1 < \eta < \mu(a)$ there exists $\delta < 1$ such that
$$P(Z_n(an) < \eta^n \mid S) = O(\delta^n) \quad \text{as } n \to \infty.$$
Proof: By Lemma 4.3 we may choose $b < a$ such that $\mu(b) > \eta$. Then by Lemma 4.4 we may choose $\varepsilon > 0$ sufficiently small and $L$ sufficiently large that $\varepsilon\lambda + b < a$ and $(\tilde m - \varepsilon)^{(1-2\varepsilon)/L} \ge \eta$, where $\tilde m = E[Z_L(bL)]$.

Consider the branching process $\tilde Z$, with mean $\tilde m$, formed by taking $(b, L)$-sons and their sons and so on. By a theorem of Seneta and Heyde (see for example Athreya and Ney 1972, Theorem 3, p. 30)
$$P\big(\tilde Z_k < (\tilde m - \varepsilon)^k\big) \to \tilde q \quad \text{as } k \to \infty.$$
By considering these processes rooted at the nodes at depth $d + i$, where $d = \lceil \varepsilon n \rceil$ and $0 \le i < L$, we see that
$$P\big(Z_{d+i+kL}((d+i)\lambda + kbL) < (\tilde m - \varepsilon)^k\big) \le f_{d+i}(\tilde q + o(1)) = q + O(\delta_1^n)$$
for suitable $\delta_1 < 1$. But, if $n = d + i + kL$ where $0 \le i < L$, then $an \ge (d+i)\lambda + kbL$ and $(\tilde m - \varepsilon)^k \ge \eta^n$ for $n$ sufficiently large. Hence
$$P(Z_n(an) < \eta^n) \le q + O(\delta_2^n).$$
Thus
$$P(Z_n(an) < \eta^n \mid S)(1 - q) + q_n \le q + O(\delta_2^n),$$
and the result follows, since $q_n = q + O(\delta_3^n)$ for some suitable $\delta_3 < 1$. $\square$
Proof of Theorem 3.3: Let $0 < \varepsilon < \gamma$, let $\beta' = \beta(\gamma + \varepsilon)/(\gamma - \varepsilon)$ and let $k = k(n) = \lfloor n/\beta' \rfloor$. The key observation is that if $C_n^* > (\gamma - \varepsilon)n$ then
$$T_n \ge Z_k\big((\gamma - \varepsilon)n/\beta\big) \ge Z_k\big((\gamma + \varepsilon)k\big),$$
since each node counted by $Z_k((\gamma - \varepsilon)n/\beta)$ has cost at most $C_n^*/\beta$ and so must be explored. Now let $1 < \eta < \mu(\gamma + \varepsilon)$. Then
$$P(T_n < \eta^k \mid S) \le P\big(C_n^* \le (\gamma - \varepsilon)n \mid S\big) + P\big(Z_k((\gamma + \varepsilon)k) < \eta^k \mid S\big) = O(\delta^k)$$
for suitable $\delta < 1$, by Lemmas 4.4 and 4.5. $\square$
Proof of Theorem 3.6:
(a) Let $p_0 = 0$. By Theorem 3.4 we may choose $L$ so that $E[C_L^*/L] \le (1 + \varepsilon)\gamma$. Now $C^{A3}$ is bounded above by the sum of $\lceil n/L \rceil$ independent copies of $C_L^*$, and we are done.
(b) Observe that
$$P(C^{A3} = \infty) \ge 1 - (1 - p_0)^{n/L}. \qquad \square$$
Acknowledgement

I would like to acknowledge stimulating conversations on tree search with Greg Provan.
References

Athreya, K.B. and Ney, P.E. (1972). Branching Processes. Springer-Verlag, Berlin.

Azuma, K. (1967). Weighted sums of certain dependent random variables. Tôhoku Mathematical Journal 19, 357-367.

Biggins, J.D. (1976). The first- and last-birth problems for a multitype age-dependent branching process. Advances in Applied Probability 8, 446-459.

Biggins, J.D. (1977). Martingale convergence in the branching random walk. Journal of Applied Probability 14, 25-37.

Biggins, J.D. (1979). Chernoff's theorem in the branching random walk. Journal of Applied Probability 16, 630-636.

Bramson, M.D. (1978). Minimal displacement of branching random walk. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 45, 89-108.

Chernoff, H. (1952). A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics 23, 493-507.

Crump, K.S. and Mode, C.J. (1968). A general age-dependent branching process I. Journal of Mathematical Analysis and Applications 24, 494-508.

Grimmett, G.R. (1985). Large deviations in subadditive processes and first-passage percolation. In Particle Systems, Random Media, and Large Deviations, ed. R. Durrett, Contemporary Mathematics 41, 175-194.

Hammersley, J.M. (1974). Postulates for subadditive processes. Annals of Probability 2, 652-680.

Harris, T.E. (1963). The Theory of Branching Processes. Springer-Verlag, Berlin.

Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58, 13-30.

Joffe, A., LeCam, L. and Neveu, J. (1973). Sur la loi des grands nombres pour des variables aléatoires de Bernoulli attachées à un arbre dyadique. Comptes Rendus de l'Académie des Sciences, Paris A 277, 963-964.

Karp, R.M. (1976). The probabilistic analysis of some combinatorial search algorithms. In Algorithms and Complexity, ed. J.F. Traub, Academic Press, New York, 1-19.

Karp, R.M. and Pearl, J. (1983). Searching for an optimal path in a tree with random costs. Artificial Intelligence 21, 99-116.

Kesten, H. (1973). Contribution to the discussion of Kingman (1973).

Kingman, J.F.C. (1973). Subadditive ergodic theory. Annals of Probability 1, 883-909.

Kingman, J.F.C. (1975). The first birth problem for an age-dependent branching process. Annals of Probability 3, 790-801.

Kingman, J.F.C. (1976). Subadditive processes. In École d'Été de Probabilités de Saint-Flour V-1975, ed. P.-L. Hennequin, Lecture Notes in Mathematics no. 539, Springer-Verlag, Berlin.

McDiarmid, C.J.H. (1989). On the method of bounded differences. Manuscript.
Department of Statistics
Oxford University
Oxford OX1 3TG.