Convergence Thresholds of Newton's Method
for Monotone Polynomial Equations∗

Javier Esparza, Stefan Kiefer, and Michael Luttenberger
Institut für Informatik
Technische Universität München, Germany†
{esparza,kiefer,luttenbe}@in.tum.de

∗ This work was supported by the project Algorithms for Software Model Checking of the Deutsche Forschungsgemeinschaft (DFG).
† Part of this work was done at the Universität Stuttgart.

Abstract
Monotone systems of polynomial equations (MSPEs) are systems of fixed-point equations X1 = f1 (X1 , . . . , Xn ),
. . . , Xn = fn (X1 , . . . , Xn ) where each fi is a polynomial with positive real coefficients. The question of computing the least non-negative solution of a given MSPE X = f (X) arises naturally in the analysis of stochastic
models like stochastic context-free grammars, probabilistic pushdown automata, and back-button processes. Etessami and Yannakakis have recently adapted Newton’s iterative method to MSPEs. In a previous paper we have
proved for strongly connected MSPEs the existence of a threshold kf such that after kf iterations of Newton’s
method each new iteration computes at least 1 new bit of the solution. However, the proof was purely existential.
In this paper we give an upper bound for kf as a function of the maximal and minimal components of the least
fixed-point µf of f(X). Using this result we show that kf is at most single exponential for strongly connected MSPEs derived from probabilistic pushdown automata and at most linear for those derived from back-button processes. Further, we prove the existence of a threshold for arbitrary MSPEs after which each new iteration computes at least $1/(w2^h)$ new bits of the solution, where w and h are the width and height of the DAG of strongly connected components.
1 Introduction
A monotone system of polynomial equations (MSPE for short) has the form
$$X_1 = f_1(X_1, \dots, X_n)$$
$$\vdots$$
$$X_n = f_n(X_1, \dots, X_n)$$
where f1 , . . . , fn are polynomials with positive real coefficients. In vector form we denote an MSPE by X =
f(X). We call MSPEs monotone because X ≤ X′ implies f(X) ≤ f(X′) for $X, X' \in \mathbb{R}^n_{\ge 0}$. MSPEs appear naturally in the analysis of many stochastic models, like context-free grammars (with numerous applications
to natural language processing [19, 15], and computational biology [21, 4, 3, 17]), probabilistic programs with
procedures [6, 2, 10, 8, 7, 9, 11], and web-surfing models with back buttons [13, 14].
By Kleene’s theorem, a feasible MSPE X = f (X) (i.e., an MSPE with at least one solution) has a least solution
µf ; this solution can be irrational and non-expressible by radicals. Given an MSPE and a vector v encoded in
binary, the problem whether µf ≤ v holds is in PSPACE and at least as hard as the SQUARE-ROOT-SUM problem, a well-known problem of computational geometry (see [10, 12] for more details).
For the applications mentioned above the most important question is the efficient numerical approximation of
the least solution. Finding the least solution of a feasible system X = f (X) amounts to finding the least solution
of F (X) = 0 for F (X) = f (X) − X. For this we can apply (the multivariate version of) Newton’s method
[20]: starting at some $x^{(0)} \in \mathbb{R}^n$ (we use uppercase to denote variables and lowercase to denote values), compute the sequence
$$x^{(k+1)} := x^{(k)} - \bigl(F'(x^{(k)})\bigr)^{-1} F(x^{(k)})$$
where F′(X) is the Jacobian matrix of partial derivatives.
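For illustration, here is a minimal sketch (ours, not from [10, 12]) of this iteration for a small invented MSPE using numpy; the system, names, and iteration count are chosen for the example only, and for this particular system the matrices stay invertible.

import numpy as np

# Illustrative MSPE: f(X) = (1/4 X1^2 + 1/4, 1/2 X1 X2 + 1/4); F(X) = f(X) - X.
def f(x):
    return np.array([0.25 * x[0]**2 + 0.25,
                     0.50 * x[0] * x[1] + 0.25])

def F_jacobian(x):
    # Jacobian of F(X) = f(X) - X
    return np.array([[0.5 * x[0] - 1.0, 0.0],
                     [0.5 * x[1],       0.5 * x[0] - 1.0]])

x = np.zeros(2)                                   # x^(0) = 0
for _ in range(20):                               # Newton iteration
    x = x - np.linalg.solve(F_jacobian(x), f(x) - x)
print(x)  # approaches the least solution (2 - sqrt(3), ...) from below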
While in general the method may not even be defined (F ′ (x(k) ) may be singular for some k), Etessami and
Yannakakis proved in [10, 12] that this is not the case for a more structured method, called Decomposed Newton’s
method (DNM), which decomposes the MSPE into strongly connected components (SCCs; more precisely, the proof also requires the MSPE to be clean, see Section 2 for details). We explain this method
in some more detail. In order to define the SCCs of an MSPE, associate to f a graph having the variables
X1 , . . . , Xn as nodes, and the pairs (Xi , Xj ) such that Xj appears in fi as edges. A subset of equations of f is an
SCC if its associated subgraph is an SCC of the whole graph. DNM starts by computing k iterations of Newton’s
method for each bottom SCC of the system. The values obtained for the variables of these SCCs are then “frozen”,
and their corresponding equations removed. The same procedure is then applied to the new bottom SCCs, again
with k iterations, until all SCCs have been processed. Etessami and Yannakakis prove the following properties of
DNM:
(a) The Jacobian matrices of all the SCCs remain invertible all the way throughout.
(b) The vector x(k) delivered by the method converges to µf when k → ∞ even if x(0) = 0 = (0, . . . , 0)⊤ .
Property (b) is in sharp contrast with the non-monotone case, where Newton’s method may not converge or may
exhibit only local convergence, i.e., the method may converge only in a small neighborhood of the zero.
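Returning to the SCC decomposition: the following minimal sketch (ours) builds the dependence graph just described and computes its SCCs with scipy; the 3-variable adjacency matrix is an invented example.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# adj[i][j] = 1 iff X_j appears in f_i (an invented 3-variable system)
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 0, 1]])
n_sccs, labels = connected_components(csr_matrix(adj), connection='strong')
print(n_sccs, labels)  # two SCCs: {X1, X2} and the bottom SCC {X3}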
The results of [10, 12] provide no information on the number of iterations needed to compute i valid bits of µf, i.e., to compute a vector ν such that $|\mu f_j - \nu_j| / |\mu f_j| \le 2^{-i}$ for every 1 ≤ j ≤ n. In a former paper [16]
we have obtained a first positive result on this problem. We have proved that for every strongly connected MSPE
f there exists a threshold kf such that for every i ≥ 0 the (kf + i)-th iteration of Newton’s method has at least i
valid bits of µf . Loosely speaking, after reaching the threshold DNM is guaranteed to compute at least 1 new bit
of the solution per iteration; we say that DNM converges linearly with rate 1.
The problem with this result is that its proof provides no information on kf other than its existence. In this
paper we prove that the threshold kf can be chosen as
$$k_f = 3n^2 m + 2n^2 |\log \mu_{\min}|$$
where n is the number of equations of the MSPE, m is such that all coefficients of the MSPE can be given as ratios
of m-bit integers, and µmin is the minimal component of the least solution µf .
It can be objected that kf depends on µf, which is precisely what Newton's method should compute. However, for MSPEs coming from probabilistic models such as the ones listed above we can do far better. The following observations and results help to deal with µmin:
• We obtain a syntactic bound on µmin for probabilistic programs with procedures (having stochastic context-free grammars and back-button stochastic processes as special instances) and prove that in this case $k_f \le n 2^{n+2} m$.
• We show that if every procedure has a non-zero probability of terminating, then kf ≤ 3nm. This condition
always holds in the special case of back-button processes [13, 14]. Hence, our result shows that i valid bits
can be computed in time $O((nm + i) \cdot n^3)$ in the unit cost model of Blum, Shub and Smale [1], where
each single arithmetic operation over the reals can be carried out exactly and in constant time. It was proved
in [13, 14] by a reduction to a semidefinite programming problem that i valid bits can be computed in
poly(i, n, m)-time in the classical (Turing-machine based) computation model. We will not improve this
result, because we do not have a proof that round-off errors (which are inevitable on Turing-machine based
models) do not crucially affect the convergence of Newton’s method. But our result sheds light on the
convergence of a practical method to compute µf .
• Finally, since x(k) ≤ x(k+1) ≤ µf holds for every k ≥ 0, as Newton’s method proceeds it provides better
and better lower bounds for µmin and thus for kf. To demonstrate this, we exhibit a concrete MSPE in the paper and, after a few iterations, use our theorem to prove that no component of the solution will reach the value 1 (something that no number of further iterations could prove by itself).
The paper contains two further results. In [16] we left open the problem whether DNM converges linearly for non-strongly-connected MSPEs. We prove that this is the case, but with a poorer convergence rate: if h and w are the height and width of the graph of SCCs of f, then there is a threshold $\tilde k_f$ such that $\tilde k_f + i \cdot w \cdot 2^{h+1}$ iterations of DNM compute at least i valid bits of µf. We also give an example where DNM needs at least $i \cdot 2^h$ iterations for i valid bits.
The final result of the paper brings us back to Etessami and Yannakakis' original motivation for DNM. They introduced the decomposition into SCCs as a tool for proving well-definedness: they showed that the matrix inverses used by Newton's method exist for all SCCs, which implies that DNM is always defined. Here we prove that the matrix inverse for the whole MSPE is guaranteed to exist, whether the MSPE is strongly connected or not. As a consequence, one can safely
replace DNM by the standard Newton’s method. Still, since DNM can be far more efficient (its iterations concern
only SCCs, which can be much smaller than the whole MSPE), and since SCCs play an important part in our
threshold analysis, we have formulated our results in terms of DNM.
The paper is structured as follows. In Section 2 we state preliminaries and give some background on Newton’s method applied to MSPEs. Sections 3, 5, and 6 contain the three results of the paper. Section 4 contains
applications of our main result. We conclude in Section 7. Missing proofs can be found in an appendix.
2 Preliminaries
In this section we introduce the notation used in the rest of the paper and formalize the concepts mentioned in the introduction.
2.1 Notation
As usual, $\mathbb{R}$ and $\mathbb{N}$ denote the sets of real and natural numbers, respectively. We assume $0 \in \mathbb{N}$. $\mathbb{R}^n$ denotes the set of n-dimensional real-valued column vectors and $\mathbb{R}^n_{\ge 0}$ the subset of vectors with non-negative components. We use bold letters for vectors, e.g. $x \in \mathbb{R}^n$, where we assume that x has the components $x_1, \dots, x_n$. Similarly, the $i$th component of a function $f : \mathbb{R}^n \to \mathbb{R}^n$ is denoted by $f_i$.
$\mathbb{R}^{m \times n}$ denotes the set of matrices having m rows and n columns. The transpose of a vector or matrix is indicated by the superscript ⊤. The identity matrix of $\mathbb{R}^{n \times n}$ is denoted by Id.
The formal Neumann series of $A \in \mathbb{R}^{m \times m}$ is defined by $A^* = \sum_{k \in \mathbb{N}} A^k$. It is well-known that $A^*$ exists if and only if the spectral radius of A is less than 1, i.e., $\max\{|\lambda| \mid \lambda \in \mathbb{C} \text{ is an eigenvalue of } A\} < 1$. In the case that $A^*$ exists, we have $A^* = (\mathrm{Id} - A)^{-1}$; the converse does not hold.
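As a quick numeric sanity check (ours, with an invented matrix), the partial sums of the Neumann series indeed approach the inverse when the spectral radius is below 1:

import numpy as np

A = np.array([[0.2, 0.3],
              [0.1, 0.4]])           # spectral radius 0.5 < 1
S, P = np.eye(2), np.eye(2)
for _ in range(200):                 # S accumulates Id + A + A^2 + ...
    P = P @ A
    S = S + P
print(np.allclose(S, np.linalg.inv(np.eye(2) - A)))  # True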
The partial order ≤ on $\mathbb{R}^n$ is defined as usual by setting x ≤ y if $x_i \le y_i$ for all 1 ≤ i ≤ n. Similarly, x < y if x ≤ y and x ≠ y. Finally, we write x ≺ y if $x_i < y_i$ for all 1 ≤ i ≤ n, i.e., if every component of x is smaller than the corresponding component of y.
We use X1 , . . . , Xn as variable identifiers and arrange them into the vector X. In the following n always
denotes the number of variables, i.e. the dimension of X. While x, y, . . . denote arbitrary elements in Rn , resp.
Rn≥0 , we write X if we want to emphasize that a function is given w.r.t. these variables. Hence, f (X) represents
the function itself, whereas f (x) denotes its value for some x ∈ Rn .
If Y is a set of variables and x a vector, then by xY we mean the vector obtained by restricting x to the
components in Y .
The Jacobian of a function f(X) with $f : \mathbb{R}^n \to \mathbb{R}^m$ is the matrix $f'(X)$ defined by
$$f'(X) = \begin{pmatrix} \frac{\partial f_1}{\partial X_1} & \cdots & \frac{\partial f_1}{\partial X_n} \\ \vdots & & \vdots \\ \frac{\partial f_m}{\partial X_1} & \cdots & \frac{\partial f_m}{\partial X_n} \end{pmatrix}.$$

2.2 Monotone Systems of Polynomials
Definition 1. A function f (X) with f : Rn≥0 → Rn≥0 is a monotone system of polynomials (MSP), if every
component fi (X) is a polynomial in the variables X1 , . . . , Xn with coefficients in R≥0 . We call an MSP f (X)
feasible if y = f (y) for some y ∈ Rn≥0 .
Fact 2. Every MSP f is monotone on Rn≥0 , i.e. for 0 ≤ x ≤ y we have f (x) ≤ f (y).
Since every MSP is continuous, Kleene’s fixed-point theorem (see e.g. [18]) applies.
Theorem 3 (Kleene's fixed-point theorem). Every feasible MSP f(X) has a least fixed point µf in $\mathbb{R}^n_{\ge 0}$, i.e., µf = f(µf) and, in addition, y = f(y) implies µf ≤ y. Moreover, the sequence $(\kappa_f^{(k)})_{k \in \mathbb{N}}$ with $\kappa_f^{(k)} = f^k(0)$ is monotonically increasing with respect to ≤ (i.e., $\kappa_f^{(k)} \le \kappa_f^{(k+1)}$) and converges to µf.
In the following we call $(\kappa_f^{(k)})_{k \in \mathbb{N}}$ the Kleene sequence of f(X), and drop the subscript whenever f is clear from the context. Similarly, we sometimes write µ instead of µf.
A variable $X_i$ of an MSP f(X) is productive if $\kappa_i^{(k)} > 0$ for some $k \in \mathbb{N}$. An MSP is clean if all its variables are productive. It is easy to see that $\kappa_i^{(k)} = 0$ for all $k \in \mathbb{N}$ if $\kappa_i^{(n)} = 0$. Just as in the case of context-free grammars we can determine all productive variables in time linear in the size of f.
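A minimal sketch (ours, with illustrative names) of this computation, with each polynomial given as a list of monomials (coefficient, list of variable indices); the loop below is the straightforward fixed-point version, which a worklist implementation makes linear-time:

# X1 = 1/2 X1 X2 + 1/2,  X2 = X1 X2   -->  only X1 is productive
def productive(system):
    prod, changed = set(), True
    while changed:                       # at most n rounds
        changed = False
        for i, monomials in enumerate(system):
            if i not in prod and any(all(v in prod for v in vs)
                                     for c, vs in monomials if c > 0):
                prod.add(i)
                changed = True
    return prod

print(productive([[(0.5, [0, 1]), (0.5, [])],
                  [(1.0, [0, 1])]]))     # {0}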
Notation 4. In the following, we always assume that an MSP f is clean and feasible. I.e., whenever we write
“MSP”, we mean “clean and feasible MSP”, unless explicitly stated otherwise.
For the formal definition of the Decomposed Newton Method (DNM) (see also Section 1) we need the notion of dependence between variables.
Definition 5. Let f(X) be an MSP. $X_i$ depends directly on $X_k$, denoted by $X_i E X_k$, if $\frac{\partial f_i}{\partial X_k}(X)$ is not the zero-polynomial. $X_i$ depends on $X_k$ if $X_i E^* X_k$, where $E^*$ is the reflexive transitive closure of E. An MSP is strongly connected (short: an scMSP) if all its variables depend on each other.
Any MSP can be decomposed into strongly connected components (SCCs), where an SCC S is a maximal set
of variables such that each variable in S depends on each other variable in S. The following result for strongly
connected MSPs was proved in [10, 12]:
Theorem 6. Let f(X) be an scMSP and define the Newton operator $N_f$ as follows:
$$N_f(X) = X + (\mathrm{Id} - f'(X))^{-1}(f(X) - X).$$
We have:
(1) $N_f(x)$ is defined for all $0 \le x \prec \mu f$ (i.e., $(\mathrm{Id} - f'(x))^{-1}$ exists). Moreover, $f'(x)^* = \sum_{k \in \mathbb{N}} f'(x)^k$ exists for all $0 \le x \prec \mu f$, and so $N_f(X) = X + f'(X)^*(f(X) - X)$.
(2) The Newton sequence $(\nu_f^{(k)})_{k \in \mathbb{N}}$ with $\nu^{(k)} = N_f^k(0)$ is monotonically increasing, bounded from above by µf (i.e., $\nu^{(k)} \le f(\nu^{(k)}) \le \nu^{(k+1)} \prec \mu f$), and converges to µf.
DNM works by substituting the variables of lower SCCs by corresponding Newton approximations that were
obtained earlier.
3 A Threshold for scMSPs
In this section we obtain a threshold after which DNM is guaranteed to converge linearly with rate 1.
We showed in [16] that for worst-case results on the convergence of Newton's method it is enough to consider quadratic MSPs, i.e., MSPs whose monomials have degree at most 2. The reason is that any MSP (resp. scMSP) f can be transformed into a quadratic MSP (resp. scMSP) $\tilde f$ by introducing auxiliary variables. This transformation is very similar to the transformation of a context-free grammar into Chomsky normal form. The transformation does not accelerate DNM, i.e., DNM on f is at least as fast (in a formal sense) as DNM on $\tilde f$, and so for a worst-case analysis it suffices to consider quadratic systems. We refer the reader to [16] for details.
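For illustration (our example, not from [16]): the cubic equation $X = \frac12 X^3 + \frac12$ becomes the quadratic scMSP $X = \frac12 XY + \frac12$, $Y = X^2$ by introducing an auxiliary variable Y for $X^2$; the least solution of the new system agrees with the old one in the X-component (namely $X = \frac{\sqrt{5}-1}{2}$).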
We start by defining the notion of “valid bits”.
Definition 7. Let f(X) be an MSP. A vector ν has i valid bits of the least fixed point µf if $|\mu f_j - \nu_j| / |\mu f_j| \le 2^{-i}$ for every 1 ≤ j ≤ n.
In the rest of the section we prove the following:
Theorem 8. Let f(X) be a quadratic scMSP. Let cmin be the smallest nonzero coefficient of f and let µmin and µmax be the minimal and maximal component of µf, respectively. Let
$$k_f = n \cdot \log \frac{\mu_{\max}}{c_{\min} \cdot \mu_{\min} \cdot \min\{\mu_{\min}, 1\}}.$$
Then $\nu^{(\lceil k_f \rceil + i)}$ has i valid bits of µf for every i ≥ 0.
Loosely speaking, the theorem states that after kf iterations of Newton’s method, every subsequent iteration
guarantees at least one more valid bit. It may be objected that kf depends on the least fixed point µf , which is
precisely what Newton’s method should compute. However, in the next section we show that there are important classes of MSPs (in fact, those which motivated our investigation), for which bounds on µmin can be easily
obtained.
The following corollary is weaker, but less technical in that it avoids a dependence on µmax and cmin .
Corollary 9. Let f (X) be a quadratic scMSP of dimension n whose coefficients are given as ratios of m-bit
integers. Let µmin be the minimal component of µf . Let
$$k_f = 3n^2 m + 2n^2 |\log \mu_{\min}|.$$
Then $\nu^{(\lceil k_f \rceil + i)}$ has at least i valid bits of µf for every i ≥ 0.
Corollary 9 follows from Theorem 8 by a suitable bound on µmax in terms of cmin and µmin, and by the inequality $c_{\min} \ge 2^{-m}$; see the appendix.
In the rest of the section we sketch the proof of Theorem 8. The proof makes crucial use of the vectors d ≻ 0
such that d ≥ f ′ (µf )d. We call a vector satisfying these two conditions a cone vector of f or, when f is clear
from the context, just a cone vector. To a cone vector d = (d1 , . . . , dn ) we associate two parameters, namely
the maximum and the minimum of the ratios µf 1 /d1 , µf 2 /d2 , . . . , µf n /dn , which we denote by λmax and λmin ,
respectively.
In a previous paper we have shown that if Id − f ′ (µf ) is singular, then f has a cone vector d ([16], Lemmata
4 and 8). As a first step towards the proof of Theorem 8 we show the following stronger proposition.
Proposition 10. Any scMSP has a cone vector.
The second step consists of showing (Proposition 12) that given a cone vector d, the threshold kf,d = log(λmax /λmin )
satisfies the same property as kf in Theorem 8, i.e., ν (⌈kf ,d ⌉+i) has i valid bits of µf for every i ≥ 0.
For that we need the following fundamental property of cone vectors: a cone vector leads to an upper bound on
the error of Newton’s method.
Lemma 11. Let d be a cone vector of an MSP f and let $\lambda_{\max} = \max_i \{\mu f_i / d_i\}$. Then
$$\mu f - \nu^{(k)} \le 2^{-k} \lambda_{\max} d.$$
Proof Idea (see Appendix A for a full proof). If we track the ray g(t) = µf − td starting in µf and headed in the direction −d (the dashed line in the picture below), then g(λmax) is the intersection of g with an axis that is located farthest from µf. One observes that the midpoint $g(\frac12 \lambda_{\max})$ between g(λmax) and µf is always less than or equal to the first Newton iterate ν(1). This is the first step of the proof.
Once this fact is proven, we proceed by repeatedly relocating the origin into the next Newton iterate and applying the same argument. By induction, one obtains $g(2^{-k}\lambda_{\max}) \le \nu^{(k)}$ for all $k \in \mathbb{N}$. The following picture shows the Newton iterates ν(k) for 0 ≤ k ≤ 2 (shape: ×) and the corresponding points $g(2^{-k}\lambda_{\max})$ (shape: +) located on the ray g. Notice that $\nu^{(k)} \ge g(2^{-k}\lambda_{\max})$.
[Figure: the curves X1 = f1(X) and X2 = f2(X) in the (X1, X2)-plane, the point µf = g(0), and the ray g through µf in direction −d down to g(λmax) on an axis.]
Now we easily obtain:
Proposition 12. Let f(X) be an scMSP and let d be a cone vector of f. Let $k_{f,d} = \log \frac{\lambda_{\max}}{\lambda_{\min}}$, where $\lambda_{\max} = \max_j \frac{\mu f_j}{d_j}$ and $\lambda_{\min} = \min_j \frac{\mu f_j}{d_j}$. Then $\nu^{(\lceil k_{f,d} \rceil + i)}$ has at least i valid bits of µf for every i ≥ 0.
We now proceed to the third and final step. We have the problem that $k_{f,d}$ depends on the cone vector d, about which we only know that it exists (Proposition 10). We now sketch how to obtain the threshold $k_f$ from Theorem 8, which is independent of any cone vector; see Appendix A for a full proof.
Consider Proposition 12 and let $\lambda_{\max} = \frac{\mu f_i}{d_i}$ and $\lambda_{\min} = \frac{\mu f_j}{d_j}$. Then the $k_{f,d}$ given there equals $\log \frac{d_j \cdot \mu f_i}{d_i \cdot \mu f_j}$. Notice that this gets large when $d_i$ gets small compared to $d_j$. By Proposition 10 the component $d_i$ cannot be 0 if f is strongly connected. However, MSPs that are not strongly connected can have vectors $d \ge 0$ with $f'(\mu f)d \le d$ such that some components are 0. One can make $d_j/d_i > 0$ arbitrarily large also for strongly connected MSPs. But when doing this, one can show that the strong connectedness "decreases" in some sense, i.e., there are variables X, Y such that X depends on Y only via a monomial that has a very small coefficient. So, $d_j/d_i$ can be bounded in terms of cmin.
4 Stochastic Models
As mentioned in the introduction, several problems concerning stochastic models can be reduced to problems
about the least solution µf of an MSPE f . In these cases, µf is a vector of probabilities, and so µmax ≤ 1.
Moreover, we can obtain information on µmin , which leads to bounds on the threshold kf .
4.1 Probabilistic Pushdown Automata
Our study of MSPs was initially motivated by the verification of probabilistic pushdown automata. A probabilistic pushdown automaton (pPDA) is a tuple P = (Q, Γ, δ, Prob) where Q is a finite set of control states, Γ is a finite stack alphabet, $\delta \subseteq Q \times \Gamma \times Q \times \Gamma^*$ is a finite transition relation (we write $pX \hookrightarrow q\alpha$ instead of $(p, X, q, \alpha) \in \delta$), and Prob is a function which to each transition $pX \hookrightarrow q\alpha$ assigns its probability $\mathrm{Prob}(pX \hookrightarrow q\alpha) \in (0, 1]$ so that for all $p \in Q$ and $X \in \Gamma$ we have $\sum_{pX \hookrightarrow q\alpha} \mathrm{Prob}(pX \hookrightarrow q\alpha) = 1$. We write $pX \xhookrightarrow{x} q\alpha$ instead of $\mathrm{Prob}(pX \hookrightarrow q\alpha) = x$. A configuration of P is a pair qw, where q is a control state and $w \in \Gamma^*$ is a stack content. A probabilistic pushdown automaton P naturally induces a possibly infinite Markov chain with the configurations as states and transitions given by: $pX\beta \xhookrightarrow{x} q\alpha\beta$ for every $\beta \in \Gamma^*$ iff $pX \xhookrightarrow{x} q\alpha$. We assume w.l.o.g. that if $pX \xhookrightarrow{x} q\alpha$ is a transition then $|\alpha| \le 2$.
pPDAs and the equivalent model of recursive Markov chains have been very thoroughly studied [6, 2, 10, 8, 7, 9, 11]. This work has shown that the key to the analysis of pPDAs is the termination probabilities [pXq], where p and q are states and X is a stack letter, defined as follows (see e.g. [6] for a more formal definition): [pXq] is the probability that, starting at the configuration pX, the pPDA eventually reaches the configuration qε (empty stack). It is not difficult to show that the vector of these probabilities is the least fixed point of the MSPE containing the equation
$$\langle pXq \rangle \;=\; \sum_{pX \xhookrightarrow{x} rYZ} x \cdot \sum_{t \in Q} \langle rYt \rangle \, \langle tZq \rangle \;+\; \sum_{pX \xhookrightarrow{x} rY} x \cdot \langle rYq \rangle \;+\; \sum_{pX \xhookrightarrow{x} q\varepsilon} x$$
for each triple (p, X, q). Call this quadratic MSPE the termination MSPE of the pPDA (we assume that termination MSPEs are clean, and it is easy to see that they are always feasible). We immediately have that if f is a termination MSP, then µmax ≤ 1. We also obtain a lower bound on µmin:
Lemma 13. Let f be a termination MSP with n variables. Then $\mu_{\min} \ge c_{\min}^{2^{n+1}-1}$.
Together with Theorem 8 we get an exponential bound for kf .
Proposition 14. Let f be a strongly connected termination MSP with n variables whose coefficients are expressed as ratios of m-bit numbers. Then $k_f \le n 2^{n+2} m$.
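As a small illustration of the construction (ours): the one-state pPDA with rules $pX \xhookrightarrow{1/2} pXX$ and $pX \xhookrightarrow{1/2} p\varepsilon$ yields the single termination equation $\langle pXp \rangle = \frac12 \langle pXp \rangle^2 + \frac12$, whose least solution is $\langle pXp \rangle = 1$; here n = 1, $c_{\min} = \frac12$, and Lemma 13 gives the (correct, if coarse) bound $\mu_{\min} \ge (\frac12)^3$.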
We conjecture that there is a lower bound on kf which is exponential in n, for the following reason. We know a family $(f^{(n)})_{n=1,3,5,\dots}$ of strongly connected MSPs with n variables and irrational coefficients such that $c_{\min}^{(n)} = \frac14$ for all n and $\mu_{\min}^{(n)}$ is double-exponentially small in n. Experiments suggest that $\Theta(2^n)$ iterations are needed for the first bit of $\mu f^{(n)}$, but we do not have a proof.
4.2 Strict pPDAs and Back-Button Processes
A pPDA is strict if for all $pX \in Q \times \Gamma$ and all $q \in Q$ the transition relation contains a pop-rule $pX \xhookrightarrow{x} q\varepsilon$ for some x > 0. Essentially, strict pPDAs model programs in which every procedure has at least one terminating execution that does not call any other procedure. The termination MSP of a strict pPDA is of the form b(X, X) + lX + c with c ≻ 0. So we have µf ≥ c, which implies $\mu_{\min} \ge c_{\min}$. Together with Theorem 8 we get:
Proposition 15. Let f be a strongly connected termination MSP with n variables whose coefficients are expressed as ratios of m-bit numbers. If f is derived from a strict pPDA, then $k_f \le 3nm$.
Since in most applications m is small, we obtain an excellent convergence threshold.
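For concreteness (our numbers, for illustration only): a strict pPDA with n = 100 variables whose rule probabilities are ratios of 8-bit integers gives $k_f \le 3 \cdot 100 \cdot 8 = 2400$ preparatory Newton iterations, after which every further iteration yields at least one valid bit.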
In [13, 14] a class of stochastic processes is introduced to model the behavior of web surfers who, from the current webpage A, can decide either to follow a link to another page, say B, with probability $l_{AB}$, or to press the "back button" with nonzero probability $b_A$. These back-button processes correspond to a very special class of pPDAs having one single control state (which we omit in the following) and rules of the form $A \xhookrightarrow{b_A} \varepsilon$ (press the back button from A) or $A \xhookrightarrow{l_{AB}} BA$ (follow the link from A to B, remembering A as the destination of pressing the back button at B). The termination probabilities, called revocation probabilities in [13, 14], are given by the MSPE containing the equation
$$\langle A \rangle = b_A + \sum_{A \xhookrightarrow{l_{AB}} BA} l_{AB} \langle B \rangle \langle A \rangle = b_A + \langle A \rangle \sum_{A \xhookrightarrow{l_{AB}} BA} l_{AB} \langle B \rangle$$
for every webpage A. Notice that the revocation probability of a page A is the probability that, when currently visiting an instance of webpage A with browser history $H_0 H_1 \dots H_{n-1} H_n$ of previously visited pages ($H_0$ being the start page of the random user), the random user, during the further random exploration of webpages starting from A, eventually returns to webpage $H_n$ with $H_0 H_1 \dots H_{n-1}$ as the remaining browser history.
4.3 An Example
As an example of the application of Theorem 8, consider the following scMSPE X = f(X):
$$\begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix} = \begin{pmatrix} 0.4\,X_2 X_1 + 0.6 \\ 0.3\,X_1 X_2 + 0.4\,X_3 X_2 + 0.3 \\ 0.3\,X_1 X_3 + 0.7 \end{pmatrix}$$
The least solution of the system gives the revocation probabilities of a back-button process with three webpages. For instance, if the surfer is at page 2, they can choose between following links to pages 1 and 3 with probabilities 0.3 and 0.4, respectively, or pressing the back button with probability 0.3.
We wish to know if any of the revocation probabilities is equal to 1. Performing 14 Newton steps (e.g. with Maple) yields an approximation $\nu^{(14)}$ to the termination probabilities with
$$\begin{pmatrix} 0.98 \\ 0.97 \\ 0.992 \end{pmatrix} \le \nu^{(14)} \le \begin{pmatrix} 0.99 \\ 0.98 \\ 0.993 \end{pmatrix}.$$
We have cmin = 0.3. In addition, since Newton's method converges to µf from below, we know µmin ≥ 0.97. Moreover, µmax ≤ 1, as 1 = f(1) and so µf ≤ 1. Hence $k_f \le 3 \cdot \log \frac{1}{0.97 \cdot 0.3 \cdot 0.97} \le 6$. Theorem 8 then implies that $\nu^{(14)}$ has (at least) 8 valid bits of µf. As µf ≤ 1, the absolute errors are bounded by the relative errors, and since $2^{-8} \le 0.004$ we know:
$$\mu f \prec \nu^{(14)} + \begin{pmatrix} 2^{-8} \\ 2^{-8} \\ 2^{-8} \end{pmatrix} \prec \begin{pmatrix} 0.994 \\ 0.984 \\ 0.997 \end{pmatrix} \prec \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$
So Theorem 8 gives us a proof that all three revocation probabilities are strictly smaller than 1. Notice also that Newton's method converges much faster than the Kleene iteration $(f^i(0))_{i \in \mathbb{N}}$. We have $\kappa^{(14)} \prec (0.89, 0.83, 0.96)^\top$, so $\kappa^{(14)}$ has no more than 4 valid bits in any component, whereas $\nu^{(14)}$ has more than 30 valid bits in each component.
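A short numpy sketch (ours) of this computation; running it should reproduce an approximation within the bounds displayed above.

import numpy as np

def f(x):
    x1, x2, x3 = x
    return np.array([0.4*x2*x1 + 0.6,
                     0.3*x1*x2 + 0.4*x3*x2 + 0.3,
                     0.3*x1*x3 + 0.7])

def fprime(x):                     # Jacobian of f
    x1, x2, x3 = x
    return np.array([[0.4*x2, 0.4*x1,          0.0   ],
                     [0.3*x2, 0.3*x1 + 0.4*x3, 0.4*x2],
                     [0.3*x3, 0.0,             0.3*x1]])

x = np.zeros(3)
for _ in range(14):                # 14 Newton steps
    x = x + np.linalg.solve(np.eye(3) - fprime(x), f(x) - x)
print(x)                           # ν^(14); cf. the bounds above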
5 Linear Convergence of the Decomposed Newton's Method
When using Newton's method for approximating the least fixed point µf of an scMSP f, Theorem 8 states that, after kf preparatory iterations of Newton's method, we obtain at least i valid bits if i additional iterations are performed. We call this linear convergence with rate 1. Now we show that DNM, which handles non-strongly-connected MSPs, converges linearly as well. We will also give an explicit convergence rate.
Let f(X) be any quadratic MSP (again we assume quadratic MSPs throughout this section), and let h(f) denote the height of the DAG of strongly connected components (SCCs). The convergence rate of DNM crucially depends on this height: in the worst case one needs asymptotically $\Theta(2^{h(f)})$ iterations in each component per bit, assuming one performs the same number of iterations in each component.
To get a sharper result, we suggest performing a different number of iterations in each SCC, depending on its depth. The depth of an SCC S is the length of the longest path in the DAG of SCCs from S to a top SCC.
In addition, we use the following notation. For a depth t, we denote by comp(t) the set of SCCs of depth t. Furthermore, we define $C(t) := \bigcup \mathrm{comp}(t)$ and $C_{>}(t) := \bigcup_{t' > t} C(t')$ and, analogously, $C_{<}(t)$. We will sometimes write $v^t$ for $v^{C(t)}$, $v^{>t}$ for $v^{C_{>}(t)}$, and $v^{<t}$ for $v^{C_{<}(t)}$, where v is any vector.
Figure 1 shows the Decomposed Newton's Method (DNM) for computing an approximation ν of µf, where f(X) is any quadratic MSP. The authors of [10] recommend running Newton's Method in each SCC S until "approximate solutions for S are considered 'good enough'". Here we suggest running Newton's Method in each SCC S for a number of steps that depends (exponentially) on the depth of S and (linearly) on a parameter j that controls the precision (see Figure 1).
function DNM(f, j)
  /* The parameter j controls the precision. */
  for t from h(f) downto 0
    forall S ∈ comp(t)                  /* all SCCs S of depth t */
      ν_S := N_{f_S}^{j·2^t}(0)         /* j · 2^t iterations */
      f^{<t}(X) := f^{<t}(X)[X_S/ν_S]   /* apply ν_S in the depending SCCs */
  return ν

Figure 1. Decomposed Newton's Method (DNM) for computing an approximation ν of µf (cf. [10])
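A direct Python rendering (ours) of Figure 1, under the simplifying assumption that the SCC decomposition and the depths are supplied by the caller:

import numpy as np

def dnm(f, fprime, sccs_by_depth, n, j):
    # sccs_by_depth[t] lists the SCCs of depth t as index lists;
    # f(x), fprime(x) evaluate the MSP and its Jacobian at a full vector x.
    v = np.zeros(n)
    for t in range(len(sccs_by_depth) - 1, -1, -1):  # h(f) downto 0
        for S in sccs_by_depth[t]:
            idx = np.array(S)
            for _ in range(j * 2**t):                # j * 2^t iterations
                J = fprime(v)[np.ix_(idx, idx)]      # Jacobian block of S
                rhs = f(v)[idx] - v[idx]
                v[idx] += np.linalg.solve(np.eye(len(S)) - J, rhs)
            # v now freezes ν_S for the SCCs of smaller depth
    return v

Since an SCC S of depth t depends only on its own variables and those of larger depth, evaluating the full f at v and selecting the components of S has the same effect as the substitution f^{<t}(X)[X_S/ν_S] in Figure 1.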
Recall that h(f) was defined as the height of the DAG of SCCs. Similarly, we define the width w(f) to be $\max_t |\mathrm{comp}(t)|$. Notice that f has at most $(h(f) + 1) \cdot w(f)$ SCCs. We have the following bound on the number of iterations run by DNM.
Proposition 16. The function DNM(f, j) of Fig. 1 runs at most $j \cdot w(f) \cdot 2^{h(f)+1}$ iterations of Newton's method.
We will now analyze the convergence behavior of DNM asymptotically (for large j). Let $\Delta_S^{(j)}$ denote the error in S when running DNM with parameter j, i.e., $\Delta_S^{(j)} := \mu_S - \nu_S^{(j)}$. Observe that the error $\Delta_t^{(j)}$ can be understood as the sum of two errors:
$$\Delta_t^{(j)} = \mu^t - \nu^{t\,(j)} = \bigl(\mu^t - \tilde\mu^{t\,(j)}\bigr) + \bigl(\tilde\mu^{t\,(j)} - \nu^{t\,(j)}\bigr),$$
where $\tilde\mu^{t\,(j)} := \mu\bigl(f^t(X)[X^{>t}/\nu^{>t\,(j)}]\bigr)$, i.e., $\tilde\mu^{t\,(j)}$ is the least fixed point of $f^t$ after the approximations from the lower SCCs have been applied. So, $\Delta_t^{(j)}$ consists of the propagation error $\bigl(\mu^t - \tilde\mu^{t\,(j)}\bigr)$ and the newly inflicted approximation error $\bigl(\tilde\mu^{t\,(j)} - \nu^{t\,(j)}\bigr)$.
The following lemma, technically non-trivial to prove, gives a bound on the propagation error.
Lemma 17 (Propagation error). There is a constant c > 0 such that
$$\|\mu^t - \tilde\mu^t\| \le c \cdot \sqrt{\|\mu^{>t} - \nu^{>t}\|}$$
holds for all $\nu^{>t}$ with $0 \le \nu^{>t} \le \mu^{>t}$, where $\tilde\mu^t = \mu\bigl(f^t(X)[X^{>t}/\nu^{>t}]\bigr)$.
Intuitively, Lemma 17 states that if $\nu^{>t}$ has k valid bits of $\mu^{>t}$, then $\tilde\mu^t$ has roughly k/2 valid bits of $\mu^t$. In other words, (at most) one half of the valid bits are lost on each level of the DAG due to the propagation error.
The following theorem assures that after combining the propagation error and the approximation error, DNM
still converges linearly.
Theorem 18. Let f be a quadratic MSP. Let ν (j) denote the result of calling DNM(f , j) (see Figure 1). Then
there is a kf ∈ N such that ν (kf +i) has at least i valid bits of µf for every i ≥ 0.
We conclude that increasing i by one gives us asymptotically at least one additional bit in each component and, by Proposition 16, costs $w(f) \cdot 2^{h(f)+1}$ additional Newton iterations.
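For instance (our numbers, for illustration only): for an MSP with h(f) = 4 and w(f) = 3, each additional bit costs at most $3 \cdot 2^5 = 96$ Newton iterations.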
The bound above is essentially optimal in the sense that an exponential (in h(f)) number of iterations is in general needed to obtain an additional bit. To see this, consider the following example, which we also used in a previous paper [16] for a slightly different purpose.
 1 2 1

1 2
X
+
X
X
+
X
0
1
0
1
4
2
4


..


.
X = f (X) =  1
(1)

1 2
1
2
 Xh−1
X
X
+
X
+
h−1
h
4
2
4 h
1
1 2
2 + 2 Xh
The only solution of X = f(X) is $(1, \dots, 1)^\top$. Notice that f has h + 1 SCCs: $\{X_0\}, \dots, \{X_h\}$, where $\{X_j\}$ has depth j. DNM starts at the bottom SCC $\{X_h\}$ and works its way up to $\{X_0\}$. It is easy to show (see [16]) that the propagation error on level t is at least $\sqrt{\Delta^{(j)}_{t+1}}$. The approximation error on level h equals $\Delta^{(j)}_h = 2^{-j \cdot 2^h}$ if DNM is called with parameter j. So, by induction we get $\Delta^{(j)}_t \ge 2^{-j \cdot 2^t}$ for $0 \le t \le h$. We conclude that at least $2^h$ Newton steps per bit are needed. Notice that this holds even when we approximate only the bottom SCC $X_h = \frac12 + \frac12 X_h^2$ with Newton's method and solve the other SCCs exactly. Therefore, any method that approximates (1) suffers from an "exponential" amplification of the propagation error. In other words, this example shows that computing µf is in general an ill-conditioned problem.
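The amplification can be observed numerically; the following sketch (ours) solves the bottom SCC of (1) with j Newton steps, solves each higher SCC exactly, and prints the errors, which roughly square-root on every level:

import math

def newton_bottom(j):                # X_h = 1/2 + 1/2 X_h^2, Newton from 0
    x = 0.0
    for _ in range(j):
        x -= (0.5 + 0.5*x*x - x) / (x - 1.0)
    return x

def solve_level(x_next):             # least root of X = 1/4 X^2 + 1/2 X x + 1/4 x^2
    a, b, c = 0.25, 0.5*x_next - 1.0, 0.25*x_next*x_next
    return (-b - math.sqrt(b*b - 4*a*c)) / (2*a)

for j in (5, 10, 20):
    x = newton_bottom(j)
    errs = [1.0 - x]
    for _ in range(3):               # three levels above the bottom SCC
        x = solve_level(x)
        errs.append(1.0 - x)
    print(j, ['%.1e' % e for e in errs])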
6 Newton's Method for General MSPs
Etessami and Yannakakis [10] introduced DNM because they could show that the matrix inverses used by
Newton’s method exist if Newton’s method is run on each SCC separately (see Theorem 6).
In this section we show, perhaps surprisingly, that the matrix inverses used by Newton's method exist even if the MSP is not decomposed. More precisely, we show the following theorem.
Theorem 19. Let f(X) be any MSP, not necessarily strongly connected. Let the Newton operator $N_f$ be defined as before:
$$N_f(X) = X + (\mathrm{Id} - f'(X))^{-1}(f(X) - X).$$
Then the Newton sequence $(\nu_f^{(k)})_{k \in \mathbb{N}}$ with $\nu^{(k)} = N_f^k(0)$ is well-defined (i.e., the matrix inverses exist), monotonically increasing, bounded from above by µf (i.e., $\nu^{(k)} \le \nu^{(k+1)} \prec \mu f$), and converges to µf.
Theorem 19 relies on a generalized Newton's method for solving fixed-point equations over commutative ω-continuous semirings, introduced in [5]. The semiring over the nonnegative reals, $S_{\mathbb{R}_{\ge 0}} = \langle \mathbb{R}_{\ge 0} \cup \{\infty\}, +, \cdot, 0, 1 \rangle$, is such a semiring. In $S_{\mathbb{R}_{\ge 0}}$ the operations + and · are defined as in the reals, with straightforward extensions to ∞, in particular 0 · ∞ = 0 and a · ∞ = ∞ if a > 0. If an infinite sum does not converge to a real number, it is defined to be ∞. So, for a matrix $A \in \mathbb{R}^{m \times m}_{\ge 0}$, the formal Neumann series $A^*$ is always defined in $S_{\mathbb{R}_{\ge 0}}$, although some entries of $A^*$ may be ∞.
A theorem of [5] applied to the semiring $S_{\mathbb{R}_{\ge 0}}$ yields the following proposition.
Proposition 20 (follows from [5], Theorem 3). Let f(X) be an MSP. Let the Newton operator $\hat N_f$ be defined as follows:
$$\hat N_f(X) = X + f'(X)^*(f(X) - X),$$
where $f'(X)^*$ is computed in $S_{\mathbb{R}_{\ge 0}}$ (i.e., may have ∞ as an entry). Then the Newton sequence $(\hat\nu_f^{(k)})_{k \in \mathbb{N}}$ with $\hat\nu^{(k)} = \hat N_f^k(0)$ is monotonically increasing, bounded from above by µf, and converges to µf.
Additive inverses are, strictly speaking, not defined in $S_{\mathbb{R}_{\ge 0}}$. Therefore, in Proposition 20, instead of f(X) − X we should rather write "a vector δ(X) such that X + δ(X) = f(X)". But the simpler formulation causes no trouble here, because $f(\hat\nu^{(k)}) - \hat\nu^{(k)}$ is positive in each component [5].
In order to show that the Newton scheme from Theorem 19 coincides with the Newton scheme from Proposition 20, we need to show that $f'(\nu^{(k)})^* = (\mathrm{Id} - f'(\nu^{(k)}))^{-1}$. It is sufficient to show that $f'(\nu^{(k)})^*$ does not have ∞ entries, because then clearly $f'(\nu^{(k)})^*(\mathrm{Id} - f'(\nu^{(k)})) = \mathrm{Id}$. Notice that this is not a trivial consequence of Proposition 20: it could be that $f'(\hat\nu^{(k)})^*$ has ∞ entries, but the $\hat\nu^{(k)}$ and µf do not, because the ∞ entries of $f'(\hat\nu^{(k)})^*$ are cancelled out by matching 0 entries of $f(\hat\nu^{(k)}) - \hat\nu^{(k)}$. What remains to show for Theorem 19 is that this is not the case and $f'(\hat\nu^{(k)})^*$ has no ∞ entries. The rest of the proof of Theorem 19 can be found in Appendix D.1.
6.1 Convergence Speed
As we now know that Newton’s method converges to µf for any MSP f , we address again the question of
convergence speed. By exploiting Theorem 18 and Theorem 19 one can show:
Theorem 21. Let f be any quadratic MSP. Then the Newton sequence $(\nu^{(k)})_{k \in \mathbb{N}}$ is well-defined and converges linearly to µf. More precisely, there is a $k_f \in \mathbb{N}$ such that $\nu^{(k_f + i \cdot (h(f)+1) \cdot 2^{h(f)})}$ has at least i valid bits of µf for every i ≥ 0.
A proof is given in Appendix D.2. Again, the exponential factor $2^{h(f)}$ cannot be avoided in general. This follows from the example and the discussion at the end of Section 5.
7 Conclusions
We have proved a threshold kf for strongly connected MSPEs. After kf + i steps of DNM we have i bits of
accuracy. The threshold kf depends on the representation size of f and on the least solution µf . Although this
latter dependence might seem to be a problem, lower and upper bounds on µf can be easily derived for stochastic
models (probabilistic programs with procedures, stochastic context-free grammars and back-button processes). In
particular, this allows us to show that kf depends linearly on the representation size for back-button processes. We
have also shown by means of an example that the threshold kf improves when the number of iterations of DNM
increases.
In [16] we left open the problem whether DNM converges linearly for non-strongly-connected MSPEs. We have proven that this is the case, although the convergence rate is poorer: if h and w are the height and width of the graph of SCCs of f, then there is a threshold $\tilde k_f$ such that $\tilde k_f + i \cdot w \cdot 2^{h+1}$ iterations of DNM compute at least i valid bits of µf. We have also given an example in which DNM needs at least $i \cdot 2^h$ iterations for i valid bits.
Finally, we have shown that the Jacobian of the whole MSPE is guaranteed to exist, whether the MSPE is
strongly connected or not.
Acknowledgment. The authors wish to thank Kousha Etessami for very valuable comments.
References
[1] L. Blum, M. Shub, and S. Smale. On a theory of computation and complexity over the real numbers: NP-completeness,
recursive functions and universal machines. Bulletin of the Amer. Math. Society, 21(1):1–46, 1989.
[2] T. Brázdil, A. Kučera, and O. Stražovský. On the decidability of temporal properties of probabilistic pushdown automata. In Proceedings of STACS’2005, volume 3404 of LNCS, pages 145–157. Springer, 2005.
[3] R. Dowell and S. Eddy. Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics, 5(71), 2004.
[4] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and
Nucleic Acids. Cambridge University Press, 1998.
[5] J. Esparza, S. Kiefer, and M. Luttenberger. On fixed point equations over commutative semirings. In Proceedings of
STACS, LNCS 4397, pages 296–307, 2007.
[6] J. Esparza, A. Kučera, and R. Mayr. Model-checking probabilistic pushdown automata. In Proceedings of LICS 2004,
pages 12–21, 2004.
[7] J. Esparza, A. Kučera, and R. Mayr. Quantitative analysis of probabilistic pushdown automata: Expectations and
variances. In Proceedings of LICS 2005, pages 117–126. IEEE Computer Society Press, 2005.
[8] K. Etessami and M. Yannakakis. Algorithmic verification of recursive probabilistic systems. In Proceedings of TACAS
2005, LNCS 3440, pages 253–270. Springer, 2005.
[9] K. Etessami and M. Yannakakis. Checking LTL properties of recursive Markov chains. In Proceedings of 2nd Int.
Conf. on Quantitative Evaluation of Systems (QEST’05), 2005.
[10] K. Etessami and M. Yannakakis. Recursive Markov chains, stochastic grammars, and monotone systems of nonlinear
equations. In STACS, pages 340–352, 2005.
[11] K. Etessami and M. Yannakakis. Recursive Markov decision processes and recursive stochastic games. In Proceedings
of ICALP 2005, volume 3580 of LNCS, pages 891–903. Springer, 2005.
[12] K. Etessami and M. Yannakakis. Recursive Markov chains, stochastic grammars, and monotone systems of nonlinear equations, 2006. Draft journal submission, http://homepages.inf.ed.ac.uk/kousha/bib_index.html.
[13] R. Fagin, A. Karlin, J. Kleinberg, P. Raghavan, S. Rajagopalan, R. Rubinfeld, M. Sudan, and A. Tomkins. Random
walks with “back buttons” (extended abstract). In STOC, pages 484–493, 2000.
[14] R. Fagin, A. Karlin, J. Kleinberg, P. Raghavan, S. Rajagopalan, R. Rubinfeld, M. Sudan, and A. Tomkins. Random
walks with “back buttons”. Annals of Applied Probability, 11(3):810–862, 2001.
[15] S. Geman and M. Johnson. Probabilistic grammars and their applications, 2002.
[16] S. Kiefer, M. Luttenberger, and J. Esparza. On the convergence of Newton’s method for monotone systems of polynomial equations. In Proceedings of STOC, pages 217–226. ACM, 2007.
[17] B. Knudsen and J. Hein. Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic
Acids Research, 31(13):3423–3428, 2003.
[18] W. Kuich. Handbook of Formal Languages, volume 1, chapter 9: Semirings and Formal Power Series: Their Relevance to Formal Languages and Automata, pages 609–677. Springer, 1997.
[19] C. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
[20] J. Ortega and W. Rheinboldt. Iterative solution of nonlinear equations in several variables. Academic Press, 1970.
[21] Y. Sakakibara, M. Brown, R. Hughey, I. Mian, K. Sjolander, R. Underwood, and D. Haussler. Stochastic context-free
grammars for tRNA. Nucleic Acids Research, 22:5112–5120, 1994.
A Proofs of Section 3
A.1 Proof of Proposition 10
Here is a restatement of Proposition 10.
Proposition 10. Any scMSP has a cone vector.
Let f be an scMSP. As mentioned before, it was proved in [16] that f has a cone vector if Id − f ′ (µf ) is
singular.
So, it remains to show that f has a cone vector d in the non-singular case, too. In a first step we show that we
can relax the requirement d ≻ 0 to d > 0.
Lemma 22. Let f be an scMSP and let d > 0 with f ′ (µf )d ≤ d. Then d is a cone vector, i.e., d ≻ 0.
Proof. Since f is an MSP, every component of f′(µf) is nonnegative. So,
$$0 \le f'(\mu f)^n d \le f'(\mu f)^{n-1} d \le \dots \le f'(\mu f) d \le d.$$
Let w.l.o.g. $d_1 > 0$. As f is strongly connected, there is for every j with 1 ≤ j ≤ n an $r_j \le n$ s.t. $(f'(\mu f)^{r_j})_{j1} > 0$. Hence, $(f'(\mu f)^{r_j} d)_j > 0$ for all j. With the above chain of inequalities, it follows that $d_j \ge (f'(\mu f)^{r_j} d)_j > 0$. So, d ≻ 0.
Now we can show that there is a cone vector also in the non-singular case.
Lemma 23. Let f be an scMSP. If $(\mathrm{Id} - f'(\mu f))^{-1}$ exists, then f has a cone vector.
Proof. By Lemma 22 it suffices to find a vector d > 0 such that $f'(\mu f) d \le d$. Take any e ≻ 0 and set $d := (\mathrm{Id} - f'(\mu f))^{-1} e$. Clearly $f'(\mu f) d \le d$, and so it remains to show d > 0. Since e ≻ 0, it suffices to prove that every entry of $(\mathrm{Id} - f'(\mu f))^{-1}$ is nonnegative, which we denote by $(\mathrm{Id} - f'(\mu f))^{-1} \ge 0$. For this, recall that $(\mathrm{Id} - f'(x))^{-1} = f'(x)^*$ on $G = \{x \mid 0 \le x \prec \mu f\}$, and, hence, $(\mathrm{Id} - f'(x))^{-1} \ge 0$ for $x \in G$.
We now make use of the fact that for every $i, j \in \{1, \dots, n\}$ we may write $(\mathrm{Id} - f'(X))^{-1}_{ij}$ as a rational function $r_{ij}(X) = \frac{n_{ij}(X)}{d(X)}$, where d(X) is the determinant of $(\mathrm{Id} - f'(X))$, and $n_{ij}(X)$ is obtained by (1) canceling the $j$th row and $i$th column of $(\mathrm{Id} - f'(X))$, (2) taking the determinant of the resulting submatrix, and (3) multiplying by $(-1)^{i+j}$. As $d(\mu f) \ne 0$, the functions $r_{ij}(X)$ are continuous at least on an open ball O centered at µf. Now, since $G \cap O \ne \emptyset$, $(\mathrm{Id} - f'(x))^{-1} \ge 0$ for every $x \in G$, and the $r_{ij}$ are continuous, we get $(\mathrm{Id} - f'(\mu f))^{-1} \ge 0$.
This finishes the proof of Proposition 10.
A.2 Proof of Lemma 11
Here is a restatement of Lemma 11.
Lemma 11. Let d be a cone vector of an MSP f and let $\lambda_{\max} = \max_i \{\mu f_i / d_i\}$. Then
$$\mu f - \nu^{(k)} \le 2^{-k} \lambda_{\max} d.$$
We handle the base case k = 1 of Lemma 11 in the following separate lemma.
Lemma 24. Let d be a cone vector of a (not necessarily clean) MSP f and let $\lambda_{\max} = \max_i \{\mu f_i / d_i\}$. Then
$$\mu f - \nu^{(1)} \le \tfrac12 \lambda_{\max} d.$$
Proof. We write f(X) as a sum
$$f(X) = c + \sum_{k=1}^{D} L_k(X, \dots, X) X$$
where D is the degree of f, and every $L_k$ is a (k−1)-linear map from $(\mathbb{R}^n)^{k-1}$ to $\mathbb{R}^{n \times n}$. Notice that $f'(X) = \sum_{k=1}^{D} k \cdot L_k(X, \dots, X)$. We simply write L for $L_1$, and h(X) for $f(X) - LX - c$. Then:
$$\begin{aligned}
\tfrac{\lambda_{\max}}{2} d &= \tfrac{\lambda_{\max}}{2}(L^* d - L^* L d) && (L^* = \mathrm{Id} + L^* L)\\
&\ge \tfrac{\lambda_{\max}}{2}(L^* f'(\mu f) d - L^* L d) && (f'(\mu f) d \le d)\\
&= \tfrac{\lambda_{\max}}{2} L^* h'(\mu f) d && (f'(x) = h'(x) + L)\\
&= L^* \tfrac12 h'(\mu f) \lambda_{\max} d \\
&\ge L^* \tfrac12 h'(\mu f) \mu f && (\text{by def. of } \lambda_{\max}\!: \lambda_{\max} d \ge \mu f)\\
&= L^* \tfrac12 \sum_{k=2}^{D} k \cdot L_k(\mu f, \dots, \mu f) \mu f \\
&\ge L^* \sum_{k=2}^{D} L_k(\mu f, \dots, \mu f) \mu f \\
&= L^* h(\mu f) \\
&= L^*(f(\mu f) - L \mu f - c) && (f(x) = h(x) + Lx + c)\\
&= L^* \mu f - L^* L \mu f - L^* c && (f(\mu f) = \mu f)\\
&= \mu f - L^* c && (L^* = \mathrm{Id} + L^* L)\\
&= \mu f - \nu^{(1)} && (\nu^{(1)} = L^* c)
\end{aligned}$$
By means of a suitable induction we can extend this last Lemma 24 to an arbitrary number of iterations, yielding the proof of Lemma 11:
Proof. For every k ≥ 0, define $g_k(X) = f(X + \nu^{(k)}) - \nu^{(k)}$. We first show that $g_k$ is an MSP (not necessarily clean) for every k ≥ 0. The only coefficients of $g_k$ that could be negative are those of degree 0. But we have $g_k(0) = f(\nu^{(k)}) - \nu^{(k)} \ge 0$, and so these coefficients are also nonnegative.
Moreover, it follows immediately from the definition that $\mu f - \nu^{(k)} \ge 0$ is the least fixed point of $g_k$. Finally, $g_k$ satisfies $g_k'(\mu f - \nu^{(k)}) d \le d$, and so d is also a cone vector of $g_k$.
Let $\lambda_k = \max_i \{(\mu f_i - \nu_i^{(k)}) / d_i\}$; in particular, $\lambda_0 = \lambda_{\max}$. We proceed to prove the lemma by induction on k. For k = 0 we have by definition $\nu^{(0)} = 0$ and $\mu f \le \lambda_{\max} d$, and we are done. Now let k ≥ 0 and assume $\mu f - \nu^{(k)} \le 2^{-k} \lambda_0 d$. We have
$$\lambda_k = \max_i \left\{ \frac{\mu f_i - \nu_i^{(k)}}{d_i} \right\} \le \max_i \left\{ \frac{2^{-k} \lambda_0 d_i}{d_i} \right\} = 2^{-k} \lambda_0.$$
Since d is a cone vector of $g_k$, we can apply Lemma 24 to $g_k$ and get
$$(\mu f - \nu^{(k)}) - \nu_{g_k}^{(1)} \le \tfrac12 \lambda_k d,$$
where $\nu_{g_k}^{(1)}$ denotes the first iterate of Newton's method applied to $g_k$. But we have
$$\nu_{g_k}^{(1)} = f'(\nu^{(k)})^* (g_k(0) - 0) = \nu^{(k+1)} - \nu^{(k)},$$
and so
$$\mu f - \nu^{(k+1)} \le \tfrac12 \lambda_k d \le 2^{-(k+1)} \lambda_0 d.$$

A.3 Proof of Proposition 12
Here is a restatement of Proposition 12.
Proposition 12. Let f(X) be an scMSP and let d be a cone vector of f. Let $k_{f,d} = \log \frac{\lambda_{\max}}{\lambda_{\min}}$, where $\lambda_{\max} = \max_j \frac{\mu f_j}{d_j}$ and $\lambda_{\min} = \min_j \frac{\mu f_j}{d_j}$. Then $\nu^{(\lceil k_{f,d} \rceil + i)}$ has at least i valid bits of µf for every i ≥ 0.
Proof. For all 1 ≤ j ≤ n the following holds:
$$\bigl(\mu f - \nu^{(\lceil k_{f,d} \rceil + i)}\bigr)_j \le 2^{-(\lceil k_{f,d} \rceil + i)} \lambda_{\max} d_j \le 2^{-k_{f,d} - i} \lambda_{\max} d_j = \lambda_{\min} d_j \cdot 2^{-i} \le \mu f_j \cdot 2^{-i}.$$

A.4 Proof of Theorem 8
Here is a restatement of Theorem 8.
Theorem 8. Let f(X) be a quadratic scMSP. Let cmin be the smallest nonzero coefficient of f and let µmin and µmax be the minimal and maximal component of µf, respectively. Let
$$k_f = n \cdot \log \frac{\mu_{\max}}{c_{\min} \cdot \mu_{\min} \cdot \min\{\mu_{\min}, 1\}}.$$
Then $\nu^{(\lceil k_f \rceil + i)}$ has i valid bits of µf for every i ≥ 0.
Proof. In what follows we shorten µf to µ. Let d be a cone vector of f (which exists by Proposition 10). Let $\lambda_j = \mu_j / d_j$ for all 1 ≤ j ≤ n and assume w.l.o.g. $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$. By Lemma 11 we have $\nu^{(k)} \ge \mu - 2^{-k} \lambda_1 d$. Let $k_j = \log \frac{\lambda_1}{\lambda_j}$. Then we have
$$\nu_j^{(\lceil k_j \rceil + i)} \ge \mu_j - \left(\tfrac12\right)^{k_j + i} \lambda_1 d_j = \mu_j - 2^{-i} \mu_j$$
and $\bigl(\mu_j - \nu_j^{(\lceil k_j \rceil + i)}\bigr) / \mu_j \le 2^{-i}$. So it remains to show $k_n \le k_f$.
We claim the existence of indices s, t with 1 ≤ s, t ≤ n such that $f'_{st}(\mu) \ne 0$ and $\log \frac{\lambda_s}{\lambda_t} \ge \frac1n k_n$. To prove that such s, t exist, we use the fact that f is strongly connected, i.e., that there is a sequence $1 = r_1, r_2, \dots, r_q = n$ with q ≤ n and $f'_{r_j r_{j+1}}(x) \ne 0$. Since µ ≻ 0, we also have $f'_{r_j r_{j+1}}(\mu) \ne 0$. It follows
$$\frac{\lambda_1}{\lambda_n} = \frac{\lambda_{r_1}}{\lambda_{r_2}} \cdots \frac{\lambda_{r_{q-1}}}{\lambda_{r_q}}, \quad \text{and so} \quad \log \frac{\lambda_1}{\lambda_n} = \log \frac{\lambda_{r_1}}{\lambda_{r_2}} + \dots + \log \frac{\lambda_{r_{q-1}}}{\lambda_{r_q}}.$$
So there must exist a j s.t.
$$\log \frac{\lambda_{r_j}}{\lambda_{r_{j+1}}} \ge \frac{1}{n-1} \log \frac{\lambda_1}{\lambda_n} \ge \frac1n k_n,$$
and one can choose $s = r_j$ and $t = r_{j+1}$.
Now, $f'(\mu) d \le d$ implies $f'_{st}(\mu) d_t \le d_s$, and so
$$f'_{st}(\mu) \le \frac{d_s}{d_t} = \frac{\lambda_t}{\lambda_s} \cdot \frac{\mu_s}{\mu_t} \le 2^{-k_n/n} \cdot \frac{\mu_{\max}}{\mu_{\min}}. \qquad (2)$$
On the other hand, since f is quadratic, f′ is a linear mapping such that
$$f'_{st}(\mu) = 2(b_1 \mu_1 + \dots + b_n \mu_n) + l,$$
where $b_1, \dots, b_n$ and l are coefficients of quadratic, respectively linear, monomials of f. As $f'_{st}(\mu) \ne 0$, at least one of these coefficients must be nonzero and so greater than or equal to cmin. It follows
$$f'_{st}(\mu) \ge c_{\min} \cdot \min\{\mu_{\min}, 1\},$$
which together with equation (2) yields
$$2^{k_n/n} \le \frac{\mu_{\max}}{c_{\min} \cdot \mu_{\min} \cdot \min\{\mu_{\min}, 1\}}, \quad \text{and so} \quad k_n \le n \cdot \log \frac{\mu_{\max}}{c_{\min} \cdot \mu_{\min} \cdot \min\{\mu_{\min}, 1\}}.$$
A.5 Proof of Corollary 9
Here is a restatement of Corollary 9.
Corollary 9. Let f(X) be a quadratic scMSP of dimension n whose coefficients are given as ratios of m-bit integers. Let µmin be the minimal component of µf. Let
$$k_f = 3n^2 m + 2n^2 |\log \mu_{\min}|.$$
Then $\nu^{(\lceil k_f \rceil + i)}$ has at least i valid bits of µf for every i ≥ 0.
First we check the case where f is linear, i.e., all monomials in f have degree at most 1. In this case, Newton's method reaches µf after one iteration, so the corollary holds. Consequently, we can assume in the following that f is strictly quadratic, meaning that f is quadratic and some polynomial in f has degree 2.
In the following lemma we give a bound on µmax in terms of µmin , cmin and n. Notice that log µmax is polynomial
in terms of those parameters.
Lemma 25. Let the preconditions of Theorem 8 hold, and let f be strictly quadratic, i.e., nonlinear. Then
$$\mu_{\max} \le \frac{1}{c_{\min}^{3n-2} \cdot \min(\mu_{\min}^{2n-2}, 1)}.$$
Furthermore, $c_{\min} \le 1$.
Proof. Let w.l.o.g. $\mu_{\max} = (\mu f)_1$. The proof is based on the idea that $X_1$ indirectly depends quadratically on itself. More precisely, by the strong connectedness, $X_1$ depends (indirectly) on some variable, say $X_{i_r}$, such that $f_{i_r}$ contains a degree-2 monomial. The variables in that monomial, in turn, depend on $X_1$. This gives an inequality of the form $(\mu f)_1 \ge C \cdot (\mu f)_1^2$, implying $(\mu f)_1 \le 1/C$.
We give the details in the following. Using the strong connectedness there exists a sequence of variables $X_{i_1}, \dots, X_{i_r}$ and a sequence of monomials $m_{i_1}, \dots, m_{i_r}$ (1 ≤ r ≤ n) with the following properties:
– $X_{i_1} = X_1$,
– $m_{i_u}$ is a monomial appearing in $f_{i_u}$ (1 ≤ u ≤ r),
– $m_{i_u} = c_{i_u} \cdot X_{i_{u+1}}$ (1 ≤ u < r),
– $m_{i_r} = c_{i_r} \cdot X_{j_1} \cdot X_{k_1}$ for some variables $X_{j_1}, X_{k_1}$.
Notice that
$$\mu_{\max} = (\mu f)_1 \ge c_{i_1} \cdots c_{i_r} \cdot (\mu f)_{j_1} \cdot (\mu f)_{k_1} \ge \min(c_{\min}^n, 1) \cdot (\mu f)_{j_1} \cdot (\mu f)_{k_1}. \qquad (3)$$
Again by strong connectedness, there exists a sequence of variables $X_{j_1}, \dots, X_{j_s}$ and a sequence of monomials $m_{j_1}, \dots, m_{j_{s-1}}$ (1 ≤ s ≤ n) with the following properties:
– $X_{j_s} = X_1$,
– $m_{j_u}$ is a monomial appearing in $f_{j_u}$ (1 ≤ u ≤ s−1),
– $m_{j_u} = c_{j_u} \cdot X_{j_{u+1}}$ or $m_{j_u} = c_{j_u} \cdot X_{j_{u+1}} \cdot X'_{j_{u+1}}$ for some variable $X'_{j_{u+1}}$ (1 ≤ u ≤ s−1).
Notice that
$$(\mu f)_{j_1} \ge c_{j_1} \cdots c_{j_{s-1}} \cdot \min(\mu_{\min}^{s-1}, 1) \cdot (\mu f)_1 \ge \min(c_{\min}^{n-1}, 1) \cdot \min(\mu_{\min}^{n-1}, 1) \cdot (\mu f)_1. \qquad (4)$$
Similarly, there exists a sequence of variables $X_{k_1}, \dots, X_{k_t}$ (1 ≤ t ≤ n) with $X_{k_t} = X_1$ showing
$$(\mu f)_{k_1} \ge \min(c_{\min}^{n-1}, 1) \cdot \min(\mu_{\min}^{n-1}, 1) \cdot (\mu f)_1. \qquad (5)$$
Combining (3) with (4) and (5) yields
$$\mu_{\max} \ge \min(c_{\min}^{3n-2}, 1) \cdot \min(\mu_{\min}^{2n-2}, 1) \cdot \mu_{\max}^2,$$
which implies
$$\mu_{\max} \le \frac{1}{\min(c_{\min}^{3n-2}, 1) \cdot \min(\mu_{\min}^{2n-2}, 1)}. \qquad (6)$$
Now the second statement of the lemma implies the first one. In order to prove the second statement, assume for contradiction $c_{\min} > 1$. This implies $\mu_{\min} > 1$ for the following reason. Consider the Kleene sequence $0, f(0), f^2(0), \dots$ For all 1 ≤ i ≤ n let $b_i$ be the smallest natural number such that $(f^{b_i}(0))_i > 0$. The numbers $b_i$ exist because f is clean and the Kleene sequence converges to µf. We show by induction on $b_i$ that $(f^{b_i}(0))_i > 1$, which, by the monotonicity of the Kleene sequence, implies $\mu_{\min} > 1$. For the inductive step notice that the value $(f^{b_i}(0))_i = f_i(f^{b_i - 1}(0))$ is computed as a sum of products of numbers which are either coefficients of f (and hence by assumption greater than 1) or of the form $(f^{b_i - 1}(0))_j$ for some j. By induction and by the monotonicity of the Kleene sequence, a number of the latter form is either 0 or greater than 1. So, $(f^{b_i}(0))_i$ itself must be 0 or greater than 1. By definition of $b_i$ it cannot be 0.
So we have $c_{\min} > 1$ and $\mu_{\min} > 1$. Plugging this into (6) yields $\mu_{\max} \le 1$. This implies $\mu_{\max} < \mu_{\min}$, contradicting the definition of µmax and µmin.
Now we can complete the proof of Corollary 9. By Theorem 8 it suffices to show
$$n \cdot \log \frac{\mu_{\max}}{c_{\min} \cdot \mu_{\min} \cdot \min\{\mu_{\min}, 1\}} \le 3n^2 m + 2n^2 |\log \mu_{\min}|.$$
We have
$$\begin{aligned}
n \cdot \log \frac{\mu_{\max}}{c_{\min} \cdot \mu_{\min} \cdot \min\{\mu_{\min}, 1\}}
&\le n \cdot \log \frac{1}{c_{\min}^{3n-1} \cdot \mu_{\min} \cdot \min(\mu_{\min}^{2n-1}, 1)} && \text{(by Lemma 25, as } \min(\mu_{\min}^{2n-2}, 1) \cdot \min\{\mu_{\min}, 1\} = \min(\mu_{\min}^{2n-1}, 1))\\
&\le 3n^2 \cdot (-\log c_{\min}) - n \log\bigl(\mu_{\min} \cdot \min(\mu_{\min}^{2n-1}, 1)\bigr) && \text{(by Lemma 25: } c_{\min} \le 1)\\
&\le 3n^2 m - n \log\bigl(\mu_{\min} \cdot \min(\mu_{\min}^{2n-1}, 1)\bigr) && (c_{\min} \ge 2^{-m}).
\end{aligned}$$
If $\mu_{\min} \ge 1$ we have $-n \log(\mu_{\min} \cdot \min(\mu_{\min}^{2n-1}, 1)) = -n \log \mu_{\min} \le 0$, so we are done in this case. If $\mu_{\min} \le 1$ we have $-n \log(\mu_{\min} \cdot \min(\mu_{\min}^{2n-1}, 1)) = -n \log \mu_{\min}^{2n} = 2n^2 |\log \mu_{\min}|$. This completes the proof of Corollary 9.
B Proofs of Section 4
B.1 Proof of Lemma 13
Here is a restatement of Lemma 13.
Lemma 13. Let f be a termination MSP with n variables. Then $\mu_{\min} \ge c_{\min}^{2^{n+1}-1}$.
Proof. We prove a stronger result. For every $k \in \{1, \dots, n\}$, f has k variables $X_1, \dots, X_k$ such that $\mu f_1, \dots, \mu f_k \ge c_{\min}^{2^{k+1}-1}$.
We proceed by induction on k. For k = 1, observe that, since the MSP is clean, $pX \xhookrightarrow{x} q\varepsilon$ is a transition of the pPDA for some $\langle pXq \rangle$, and so $[pXq] \ge x \ge c_{\min}$. We call $\langle pXq \rangle$ a sink.
For k > 1, let $X_1, \dots, X_{k-1}$ be variables such that $\mu f_1, \dots, \mu f_{k-1} \ge c_{\min}^{2^k - 1}$. We show that there is a variable $X_k$ such that $\mu f_k \ge c_{\min}^{2^{k+1}-1}$. Let $\hat f$ be the MSP obtained by replacing every occurrence of $X_i$ by $\mu f_i$ for every $i \in \{1, \dots, k-1\}$; it is easy to see that $\hat f$ is also a termination MSP with $\mu \hat f = \mu f$ and $\hat c_{\min} \ge c_{\min} \cdot (c_{\min}^{2^k - 1})^2 = c_{\min}^{2^{k+1}-1}$. So we can choose any sink of $\hat f$ for $X_k$.
B.2 Proof of Proposition 14
Here is a restatement of Proposition 14.
Proposition 14. Let f be a strongly connected termination MSP with n variables whose coefficients are expressed as ratios of m-bit numbers. Then $k_f \le n 2^{n+2} m$.
Proof.
$$\begin{aligned}
k_f &= n \log \frac{\mu_{\max}}{c_{\min} \cdot \mu_{\min} \cdot \min\{\mu_{\min}, 1\}} && \text{(Theorem 8)}\\
&\le -n \log(\mu_{\min}^2 \cdot c_{\min}) && \text{(termination MSP: } \mu_{\max} \le 1, \mu_{\min} \le 1)\\
&\le -n \log\bigl(c_{\min}^{2 \cdot (2^{n+1}-1)} \cdot c_{\min}\bigr) && \text{(Lemma 13)}\\
&\le -n 2^{n+2} \log c_{\min} \le n 2^{n+2} m && (c_{\min} \ge 2^{-m}).
\end{aligned}$$
C Proofs of Section 5
C.1 Proof of Proposition 16
Here is a restatement of Proposition 16.
Proposition 16. The function DNM of Figure 1 runs at most $j \cdot w(f) \cdot 2^{h(f)+1}$ iterations of Newton's method.
Proof. The number of iterations of DNM is $\sum_{t=0}^{h(f)} |\mathrm{comp}(t)| \cdot j \cdot 2^t$. This can be bounded as follows:
$$\sum_{t=0}^{h(f)} |\mathrm{comp}(t)| \cdot j \cdot 2^t \le w(f) \cdot j \cdot \sum_{t=0}^{h(f)} 2^t \le w(f) \cdot j \cdot 2^{h(f)+1}.$$

C.2 Proof of Lemma 17
Before proving Lemma 17, we show the following proposition, which covers the case of a quadratic, clean, and feasible scMSP
$$f(X) = b(X, X) + l(X) + c,$$
where b(X, Y) is a bilinear map, l(X) is linear, and c is constant.
Proposition 26. Let f(X) satisfy the conditions stated above, with µf its least fixed point. Then there is a constant C such that for all 0 ≤ δ ≤ µf it holds that
$$C \cdot \|\delta\|_2^2 \le \|(\mathrm{Id} - f'(\mu f)) \delta + b(\delta, \delta)\|_2.$$
Proof. We discuss three cases: either (case I) f is linear in X, or (case II) f is non-linear and $(\mathrm{Id} - f'(\mu f))^{-1}$ exists, or (case III) f is non-linear and $(\mathrm{Id} - f'(\mu f))$ is singular.
Case I: We first consider the case where the SCC represented by f is linear in X. Then $f'(X) \equiv f'(0)$ is constant, $b(X, X) \equiv 0$ and $(\mathrm{Id} - f')^{-1}$ exists, as we consider an SCC. So, we get
$$\|(\mathrm{Id} - f'(0)) \delta\|_2 \ge \lambda_{\min}(\mathrm{Id} - f'(0)) \cdot \|\delta\|_2,$$
where we define $\lambda_{\min}(A)$ to be the smallest absolute value of an eigenvalue of a square matrix A.
Case II: Next, we consider the case where f contains at least one quadratic term in X and $(\mathrm{Id} - f'(\mu f))$ is invertible. As shown in the proof of Lemma 23, we then have $(\mathrm{Id} - f'(\mu f))^{-1} \ge 0$. So, we may write
$$\|(\mathrm{Id} - f'(\mu f)) \delta + b(\delta, \delta)\|_2 \ge \lambda_{\min}(\mathrm{Id} - f'(\mu f)) \cdot \|\delta + (\mathrm{Id} - f'(\mu f))^{-1} b(\delta, \delta)\|_2 \ge \lambda_{\min}(\mathrm{Id} - f'(\mu f)) \cdot \|\delta\|_2,$$
where we used in the last step that δ ≥ 0 and $(\mathrm{Id} - f'(\mu f))^{-1} b(\delta, \delta) \ge 0$.
Case III: Finally, we consider the case where f depends quadratically on X and $\mathrm{Id} - f'(\mu f)$ is singular. As f(X) is clean and feasible, i.e. µf ≻ 0, and depends quadratically on X, we know that the first Newton step is well-defined, i.e. $f'(0)^* = (\mathrm{Id} - f'(0))^{-1}$ exists. We therefore may write
$$\|(\mathrm{Id} - f'(\mu f)) \delta + b(\delta, \delta)\|_2 \ge \lambda_{\min}(\mathrm{Id} - f'(0)) \cdot \bigl\| f'(0)^* \bigl((\mathrm{Id} - f'(\mu f)) \delta + b(\delta, \delta)\bigr) \bigr\|_2 = \lambda_{\min}(\mathrm{Id} - f'(0)) \cdot \|(\mathrm{Id} - 2\tilde b(\mu f)) \delta + \tilde b(\delta, \delta)\|_2,$$
where $\tilde b(X, X) := f'(0)^* b(X, X)$. Thus, it is sufficient to show that there exists a $\tilde C > 0$ with
$$\|(\mathrm{Id} - 2\tilde b(\mu)) \delta + \tilde b(\delta, \delta)\|_2 \ge \tilde C \|\delta\|_2^2, \qquad (7)$$
as we then may set $C := \lambda_{\min}(\mathrm{Id} - f'(0)) \cdot \tilde C$.
We note that f(X) and $\tilde f(X) := \tilde b(X, X) + f'(0)^* f(0)$ are equivalent in the sense that both functions have the same set of fixed points, their Newton sequences are identical, and the nullspaces of $\mathrm{Id} - f'(x^*)$ and $\mathrm{Id} - \tilde f'(x^*)$ are identical. These properties are easily checked.
The reason for multiplying by $f'(0)^*$ is that this guarantees that no component of $\tilde b(X, X)$ is the zero-polynomial. We are going to need this property shortly. First let us give an intuition why this property of $\tilde b$ holds – we leave the technical details to the reader: as we assume that S is an SCC, every variable of X depends on every other variable of X w.r.t. f. Hence, as we consider the case where f contains at least one quadratic term, every variable either depends directly on a quadratic term, or there exists a sequence of variables $X_{i_1}, X_{i_2}, \dots, X_{i_k}$ such that $X_{i_l}$ depends linearly on $X_{i_{l+1}}$ and $X_{i_k}$ itself depends directly on a quadratic term. All these "linear dependencies" are summarized in $f'(0)^*$. Multiplying by $f'(0)^*$ propagates them to the remaining quadratic terms.
Let us introduce the norm
$$\|y\|_{\mu f} := \max_i \left\{ \frac{|y_i|}{\mu f_i} \right\}.$$
Remember that we consider a clean scMSP; thus we have µf ≻ 0, and $\|\cdot\|_{\mu f}$ is well-defined. It is straightforward to check that this is indeed a norm. We then define the set of directions
$$D = \{d \in \mathbb{R}^n \mid d \ge 0, \|d\|_{\mu f} = 1\}.$$
Then we are guaranteed that for every direction d ∈ D the ray µf − r·d stays non-negative for r ∈ [0, 1], i.e.
$$0 \le \mu f - r \cdot d \le \mu f \qquad (r \in [0, 1]),$$
and, as 0 < δ ≤ µf, we have $r_\delta^{-1} \cdot \delta \in D$ for $r_\delta := \|\delta\|_{\mu f} \in [0, 1]$.
We will now show that there exists a $\tilde C > 0$ – independent of d – such that
$$\|r(\mathrm{Id} - \tilde f'(\mu f)) d + r^2 \tilde b(d, d)\|_2 \ge \tilde C \cdot r^2$$
for all r ∈ [0, 1] and d ∈ D, which implies Eq. (7). We set
$$U(r, d) := \frac{\|r(\mathrm{Id} - \tilde f'(\mu f)) d + r^2 \tilde b(d, d)\|_2^2}{r^4} = \|\tilde b(d, d)\|_2^2 + \frac{2}{r} \langle \tilde b(d, d), (\mathrm{Id} - \tilde f'(\mu f)) d \rangle + \frac{1}{r^2} \|(\mathrm{Id} - \tilde f'(\mu f)) d\|_2^2,$$
where ⟨·, ·⟩ is the Euclidean scalar product. We further define
$$\alpha(d) := \|\tilde b(d, d)\|_2^2, \qquad \beta(d) := \langle \tilde b(d, d), (\mathrm{Id} - \tilde f'(\mu f)) d \rangle, \qquad \gamma(d) := \|(\mathrm{Id} - \tilde f'(\mu f)) d\|_2^2.$$
Note that U(r,d) > 0 on (0,1] × D, as otherwise µf − r·d would be a fixed point of f(X), resp. f̃(X), less than µf.
As D is compact and U(r,d) is continuous on (0,1] × D, the function

g(r) := inf_{d∈D} U(r,d)

is non-negative and continuous on (0,1], too. Set

G(R) := inf_{R≤r≤1} g(r)

for 0 ≤ R ≤ 1. We then have G(R) > 0 for 0 < R ≤ 1, and G decreases monotonically as R → 0.
Now, if we can show that G(0) = inf_{0≤r≤1} g(r) > 0, our proof will be complete, as we will then have

‖(Id − f̃′(µf))δ + b̃(δ,δ)‖₂ = r_δ²·√U(r_δ, r_δ⁻¹δ) ≥ √G(0)·r_δ² = √G(0)·‖δ‖µf² ≥ C̃·‖δ‖₂²

for an appropriate constant C̃ > 0, by the equivalence of norms on Rⁿ.
We proceed by assuming the opposite, i.e. G(0) = inf_{r∈[0,1]} g(r) = 0, and show that this leads to the contradiction that µf is not the least fixed point. Under this assumption there has to exist a monotonically decreasing sequence (r_i) converging to 0 with g(r_i) → 0 for i → ∞.
As U(r,d) is continuous and D compact, we find for every r_i a d_i ∈ D with g(r_i) = U(r_i, d_i). As (d_i) is a sequence in the compact set D ⊆ Rⁿ, there exists a convergent subsequence; w.l.o.g. we may therefore assume that the sequence (d_i) already converges to some d∗ ∈ D.
We first show that we can refine the sequence (r_i, d_i)_{i∈ℕ} in such a way that there is a C_γ > 0 with γ(d_i) ≤ C_γ²·r_i² for all i. By the Cauchy–Schwarz inequality we have |β(·)| ≤ √(α(·)·γ(·)), thus

g(r_i) ≥ ( √α(d_i) − √γ(d_i)/r_i )² ≥ 0 ,   and   g(r_i) → 0 for i → ∞.

Hence, there have to exist constants c_γ ≥ 0 and i_γ ∈ ℕ such that for all i ≥ i_γ:

| √α(d_i) − √γ(d_i)/r_i | ≤ c_γ ,

which in turn implies

√γ(d_i)/r_i ≤ c_γ + √α(d_i) ≤ c_γ + max_{d∈D} √α(d) =: C_γ ,

i.e. γ(d_i) ≤ C_γ²·r_i².
Thus, we must have γ(d∗) = 0, i.e. d∗ lies in the nullspace of Id − f̃′(µf), which also implies β(d∗) = 0. As d∗ ∈ D, we have d∗ > 0 – see Lemma 22. Thus, by the strong connectivity of f(X), we have d∗ ≻ 0. Hence, as d_i → d∗, there has to exist an i_0 such that d_i ≻ 0 for all i ≥ i_0.
We have already stated that b̃(X,X) is not the zero polynomial in any component; hence b̃(d_i,d_i) ≻ 0 for all i ≥ i_0, too, and in particular α(d∗) > 0. As b̃ is continuous with b̃(d_i,d_i) → b̃(d∗,d∗) ≻ 0, there also exists a vector c_b̃ with b̃(d_i,d_i) ≻ c_b̃ ≻ 0 for all i ≥ i_0 (adjusting i_0 suitably if necessary).
Now, as we have

g(r_i) = ‖(Id − f̃′(µf))·(d_i/r_i) + b̃(d_i,d_i)‖₂² → 0   for i → ∞,

there has to exist an I_0 ≥ i_0 such that b̃(d_i,d_i) ≻ c_b̃ and g(r_i) < ( (1/2)·min_{1≤k≤n} (c_b̃)_k )² for all i ≥ I_0. This implies

(Id − f̃′(µf))·(d_i/r_i) ≺ 0 .
Consider the vector d̃ := d_{I_0}. As r_{I_0} > 0, we also have (Id − f̃′(µf))d̃ ≺ 0. Define

ρ := min{ 1 , min_{1≤k≤n} ( −((Id − f̃′(µf))d̃)_k / (b̃(d̃,d̃))_k ) } .

As (Id − f̃′(µf))d̃ ≺ 0 and b̃(d̃,d̃) ≻ 0, we have ρ > 0. Now for all 0 < r < ρ ≤ 1

f̃(µf − r·d̃) − (µf − r·d̃) = r(Id − f̃′(µf))d̃ + r²·b̃(d̃,d̃) ≺ 0 .

This means f̃(µf − r·d̃) ≤ µf − r·d̃ for 0 < r < ρ. But µf is the least solution of f̃(X) ≤ X over Rⁿ_{≥0} by virtue of the Knaster–Tarski theorem. So we get the desired contradiction.
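As a hedged numeric illustration of case III (the example is ours, not from the paper), consider the one-dimensional scMSP f(X) = 0.5X² + 0.5 with µf = 1 and f′(µf) = 1, so that Id − f′(µf) is singular; the quantity bounded above then degenerates to the pure quadratic term, which shows that a quadratic lower bound C·‖δ‖₂² is the best one can hope for:

# assumptions: f(X) = 0.5*X**2 + 0.5, mu f = 1, b(delta, delta) = 0.5*delta**2
for delta in (0.5, 0.25, 0.125, 0.0625):
    lhs = 0.0 * delta + 0.5 * delta**2    # (Id - f'(mu f))*delta + b(delta, delta)
    print(delta, lhs / delta**2, lhs / delta)
# lhs/delta**2 is constantly 0.5, while lhs/delta tends to 0,
# so no linear bound C'*delta can hold here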
Now we can prove Lemma 17, restated here.
Lemma 17. There is a constant c > 0 such that

‖µ^t − µ̃^t‖ ≤ c · √‖µ^{>t} − ν^{>t}‖

holds for all ν^{>t} with 0 ≤ ν^{>t} ≤ µ^{>t}, where µ̃^t = µ( f^t(X)[X^{>t}/ν^{>t}] ).
Proof. Let S be an SCC at level t, i.e. S ∈ comp(t). S itself need not depend on all variables X^{>t}. Thus, let dep(S) be the set of variables on which S really depends, excluding the variables X^S corresponding to S itself. We may then write the MSP f^S corresponding to S as f^S(X^S, X^{dep(S)}).
Now, let µ^{dep(S)} be the correct (non-negative) least fixed point of f^{dep(S)}(X^{dep(S)}), and ν^{dep(S)} the part of the approximation ν^{>t} relevant to S. Let

ε^{dep(S)} := µ^{dep(S)} − ν^{dep(S)}

be the absolute error in the underlying SCCs relevant to S. The propagation error is then

δ^S := µ( f^S(X^S, µ^{dep(S)}) ) − µ( f^S(X^S, ν^{dep(S)}) ) .
What we are going to show is that there is a C_S > 0 such that

‖δ^S‖ ≤ C_S · √‖ε^{dep(S)}‖ .

Note that this is sufficient to prove the lemma, as

‖µ^t − µ̃^t‖∞ = max_{S∈comp(t)} ‖δ^S‖∞ ≤ max_{S∈comp(t)} ( C_S · √‖ε^{dep(S)}‖∞ ) ≤ ( max_{S∈comp(t)} C_S ) · √‖µ^{>t} − ν^{>t}‖∞ .

Because of the equivalence of norms on any vector space of finite dimension over R, we are guaranteed the existence of an appropriate constant such that this bound holds in any other norm, too.
In the following, we therefore consider a fixed SCC S, and simplify the notation by setting:
• X := X^S – the variables corresponding to the SCC S,
• Y := X^{dep(S)} – the variables corresponding to the SCCs dep(S) on which S depends,
• F(X,Y) := f^S(X,Y) – the restriction of the given system f to the SCC S,
• y∗ := µ^{dep(S)} – the restriction of µ = µf to the variables Y,
• x∗ := µ( f^S(X^S, µ^{dep(S)}) ) – the least fixed point of F(X, µ^{dep(S)}),
• ε := ε^{dep(S)} – the restriction of the approximation error µ − ν, and
• δ := δ^S – the error x∗ − µ( F(X, ν^{dep(S)}) ) introduced by replacing y∗ by ν^{dep(S)}.
So we consider the following parameterized MSP

F : Rⁿ × Rᵐ → Rⁿ : (x, y) ↦ F(x, y)   with   F(x, y) = B( (x;y), (x;y) ) + L·(x;y) + c ,

where B is a bilinear map, L is a matrix, c is a vector, and (x;y) denotes the vector obtained by stacking x on top of y.
As we assume that the whole MSP is “clean” and feasible, we have x∗ ≻ 0 and y∗ ≻ 0.
We are given some approximation ν^{dep(S)} = y∗ − ε of y∗, i.e. 0 ≺ y∗ − ε ≤ y∗. As F(X, y∗−ε) ≤ F(X, y∗), the Kleene sequence of F(X, y∗−ε) is bounded from above by x∗. So, for the least fixed point µ( F(X, ν^{dep(S)}) ) = x∗ − δ of F(X, y∗−ε) we have 0 ≤ δ ≤ x∗. As we assume y∗ − ε ≻ 0, F(X, y∗−ε) stays clean, hence x∗ − δ ≻ 0, too.
We are now interested in bounding ‖δ‖ in terms of ‖ε‖. To this end, let us rewrite the equation x∗ − δ = F(x∗ − δ, y∗ − ε) as

x∗ − δ = F(x∗, y∗) − F′(x∗, y∗)·(δ;ε) + B( (δ;ε), (δ;ε) ) ,

which is exact since F is quadratic. With x∗ = F(x∗, y∗), and by moving all terms containing δ to one side and the remaining terms to the other, we get

F′(x∗, y∗)·(0;ε) − B( (0;ε), (0;ε) ) = (Id − F′(x∗, y∗))·(δ;0) + B( (δ;0), (δ;0) ) + 2B( (δ;0), (0;ε) ) .   (8)
We remark that

(Id − F′(x∗, y∗))·(δ;0) + B( (δ;0), (δ;0) ) = F(x∗ − δ, y∗) − F(x∗, y∗) + δ = F(x∗ − δ, y∗) − (x∗ − δ) .

As 0 ≤ x∗ − δ ≤ x∗, we have

F(x∗ − δ, y∗) − (x∗ − δ) > 0   for δ > 0,

as otherwise x∗ − δ would be a fixed point of F(X, y∗) less than x∗.
Combining Eq. (8) with B( (δ;0), (0;ε) ) ≥ 0 and −B( (0;ε), (0;ε) ) ≤ 0, we get

0 < F(x∗ − δ, y∗) − (x∗ − δ)
  = (Id − F′(x∗, y∗))·(δ;0) + B( (δ;0), (δ;0) )
  ≤ (Id − F′(x∗, y∗))·(δ;0) + B( (δ;0), (δ;0) ) + 2B( (δ;0), (0;ε) )
  = F′(x∗, y∗)·(0;ε) − B( (0;ε), (0;ε) )
  ≤ F′(x∗, y∗)·(0;ε) .
Thus we have

0 < ‖(Id − F′(x∗, y∗))·(δ;0) + B( (δ;0), (δ;0) )‖ ≤ ‖F′(x∗, y∗)·(0;ε)‖ ≤ ‖F′(x∗, y∗)‖·‖ε‖ .

Now, if we succeed in showing that there is always a constant C > 0 such that

C·‖δ‖² ≤ ‖(Id − F′(x∗, y∗))·(δ;0) + B( (δ;0), (δ;0) )‖ ,   (9)

we will obtain a proof of Lemma 17: combining (9) with the bound above yields C·‖δ‖² ≤ ‖F′(x∗, y∗)‖·‖ε‖, i.e. ‖δ‖ ≤ √( ‖F′(x∗, y∗)‖/C ) · √‖ε‖.
Note that this last inequality does not depend on ε anymore. Hence, it is sufficient to consider f(X) := F(X, y∗) in the following, forgetting about the underlying SCCs. We write

b(X, X) := B( (X;0), (X;0) )

for the quadratic part of f. Then the preceding inequality (9) may be written as

C·‖δ‖² ≤ ‖(Id − f′(x∗))δ + b(δ,δ)‖ ,   (10)

where f′ is now the Jacobian of f, i.e. taken only w.r.t. X. Because of the equivalence of norms on Rⁿ, we may pass to the Euclidean norm ‖·‖₂ and apply Proposition 26 to conclude the proof.
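To make the square-root loss bounded by Lemma 17 concrete, here is a hedged numeric sketch (the two-level system is ours, not from the paper): for Y = 0.5Y² + 0.5 below and X = 0.5X² + 0.5Y above, an error ε in the lower SCC propagates as exactly √ε in the upper one.

# assumptions: lower SCC Y = 0.5*Y**2 + 0.5 with mu = 1; upper SCC X = 0.5*X**2 + 0.5*y
# with the lower variable fixed to an approximation y = 1 - eps; the least
# solution of the upper equation is then x = 1 - sqrt(1 - y) = 1 - sqrt(eps)
import math

for eps in (1e-2, 1e-4, 1e-6, 1e-8):
    y = 1.0 - eps
    x = 1.0 - math.sqrt(1.0 - y)   # least root of 0.5*x**2 - x + 0.5*y = 0
    print(eps, 1.0 - x)            # propagated error is sqrt(eps)

Here ‖δ‖ = √‖ε‖, so the bound of Lemma 17 is attained with constant C_S = 1.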
C.3 Proof of Theorem 18
Here is a restatement of Theorem 18.
Theorem 18. Let f be a quadratic MSP. Let ν^(j) denote the result of calling DNM(f, j) (see Figure 1). Then there is a k_f ∈ ℕ such that ν^(k_f+i) has at least i valid bits of µf for every i ≥ 0.
We first prove the following lemma, which gives a bound on the error at level t.

Lemma 27. There is a constant c > 0 such that

Δ_t(j) ≤ 2^(c − j·2^t) .

Proof. It follows from Theorem 8 that ‖µ̃^t(j) − ν^t(j)‖, the approximation error at level t, decreases exponentially in the number of iterations, i.e., there is a constant c₁ > 0 such that

‖µ̃^t(j) − ν^t(j)‖ ≤ 2^(c₁ − j·2^t) .   (11)
Now we can prove the lemma by induction on t. In the base case (t = h(f)) there is no propagation error, so the claim of the lemma follows from (11). Let t < h(f). Then

Δ_t(j) = ‖µ^t − µ̃^t(j) + µ̃^t(j) − ν^t(j)‖
 ≤ ‖µ^t − µ̃^t(j)‖ + ‖µ̃^t(j) − ν^t(j)‖
 ≤ ‖µ^t − µ̃^t(j)‖ + 2^(c₁ − j·2^t)             (by (11))
 ≤ c₂·√(Δ_{>t}(j)) + 2^(c₁ − j·2^t)             (by Lemma 17)
 ≤ c₂·√(2^(c₃ − j·2^(t+1))) + 2^(c₁ − j·2^t)    (by induction hypothesis)
 ≤ 2^(c₄ − j·2^t)

for some constants c₂, c₃, c₄ > 0.
From Lemma 27 we deduce that for each component r of depth t there is a constant c_r such that

(µf_r − ν_r^(j)) / µf_r ≤ 2^(c_r − j·2^t) ≤ 2^(c_r − j) .

Let k_f ≥ c_r for all 1 ≤ r ≤ n. Then

(µf_r − ν_r^(j+k_f)) / µf_r ≤ 2^(c_r − (j+k_f)) ≤ 2^(−j) .
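The mechanism behind Lemma 27 can be replayed numerically. The following hedged sketch (ours; it mimics the iteration counts j·2^t per depth-t SCC rather than Figure 1 verbatim) runs Newton's method on the two-level system Y = 0.5Y² + 0.5, X = 0.5X² + 0.5Y, giving the bottom SCC twice as many iterations so that the square-root loss of Lemma 17 still leaves about j valid bits at the top:

import math

def newton_1d(g, gp, x, steps):
    # Newton iteration for G(X) = g(X) - X = 0
    for _ in range(steps):
        x = x - (g(x) - x) / (gp(x) - 1.0)
    return x

for j in (4, 8, 16):
    # depth t = 1 (bottom SCC): j * 2^1 iterations
    y = newton_1d(lambda t: 0.5*t*t + 0.5, lambda t: t, 0.0, 2 * j)
    # depth t = 0 (top SCC): j iterations, lower variable fixed to y
    x = newton_1d(lambda t: 0.5*t*t + 0.5*y, lambda t: t, 0.0, j)
    print(j, -math.log2(1.0 - x))   # roughly j valid bits in the top component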
D Proofs of Section 6
D.1 Rest of the Proof of Theorem 19
In the following, if M is a matrix we often write M^i_jk resp. M∗_jk when we mean (M^i)_jk resp. (M∗)_jk.
The following lemma assures that, in order to show that f′(ν^(k))∗ has no ∞ entries, it suffices to consider the diagonal elements of the matrix.
Lemma 28. Let A = (a_ij) ∈ R^{n×n}_{≥0}. Let A∗ have an ∞ entry. Then A∗ also has an ∞ entry on the diagonal, i.e. A∗_ii = ∞ for some 1 ≤ i ≤ n.
Proof. By induction on n. The base case n = 1 is clear. For n > 1 assume w.l.o.g. that A∗_1n = ∞. We have

A∗_1n = A∗_11 · Σ_{j=2..n} a_1j · (A[2..n,2..n])∗_jn ,   (12)

where by A[2..n,2..n] we mean the square matrix obtained from A by erasing the first row and the first column.
To see why (12) holds, think of A∗_1n as the sum of weights of paths from 1 to n in the complete graph over the vertices {1, ..., n}. The weight of a path P is the product of the weights of P's edges, and a_{i₁i₂} is the weight of the edge from i₁ to i₂. Each path P from 1 to n can be divided into two sub-paths P₁, P₂ as follows. The second sub-path P₂ is the suffix of P leading from 1 to n and not returning to 1. The first sub-path P₁, possibly empty, is chosen such that P = P₁P₂. Now, the sum of weights of all possible P₁ equals A∗_11, and the sum of weights of all possible P₂ equals Σ_{j=2..n} a_1j · (A[2..n,2..n])∗_jn. So (12) holds.
As A∗_1n = ∞, it follows that either A∗_11 or some (A[2..n,2..n])∗_jn equals ∞. In the first case, we are done. In the second case, by induction, there is an i such that (A[2..n,2..n])∗_ii = ∞. But then also A∗_ii = ∞, because every entry of (A[2..n,2..n])∗ is less than or equal to the corresponding entry of A∗.
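When all entries are finite, A∗ = Σ_i A^i = (Id − A)⁻¹, and identity (12) can be checked numerically. A hedged sketch (ours), for a random non-negative matrix with spectral radius below 1:

import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.random((n, n)) * (0.9 / n)                 # row sums < 0.9, so A* is finite
Astar = np.linalg.inv(np.eye(n) - A)               # A* = sum_i A^i
Bstar = np.linalg.inv(np.eye(n - 1) - A[1:, 1:])   # (A[2..n,2..n])*
lhs = Astar[0, -1]                                 # A*_{1n}
rhs = Astar[0, 0] * (A[0, 1:] @ Bstar[:, -1])      # A*_{11} * sum_j a_{1j} (A[2..n,2..n])*_{jn}
print(np.isclose(lhs, rhs))                        # True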
So it remains to show that f′(ν^(k))∗_ss ≠ ∞ for all 1 ≤ s ≤ n. This is done in the proof of Proposition 32 below. There, two cases are considered, depending on whether Newton's method terminates in the s-component or not. The following lemma will be used for the non-terminating case.
Lemma 29. Let 0 ≤ ν ≤ f(ν) ≤ µf. Let S denote an SCC with ν_S ≺ µf_S. Then the submatrix f′(ν)∗_SS does not have ∞ as an entry.

Proof. Let L denote the set of variables which are not in S but on which a variable in S depends. Let g(X_S) := f_S(X)[X_L/µf_L]. Then g(X_S) is an scMSP with µg = µf_S. As ν_S ≺ µg, Theorem 6 (1) is applicable, so g′(ν_S)∗ does not have ∞ as an entry. With g′(ν_S)∗ = f′(ν)∗_SS, the lemma follows.
The next lemma is a version of Taylor's theorem, which will be used in Lemma 31 below.

Lemma 30 (from [10]). Let 0 ≤ x ≤ f(x) and let d, k₁, k₂ ∈ ℕ with k₂ ≥ k₁. Then

f^(d+k₂)(x) − f^(d+k₁)(x) ≥ f′(f^(k₁)(x))^d · (f^(k₂)(x) − f^(k₁)(x)) ,

where by f^r(X) we mean f(f^(r−1)(X)) with f^0(X) = X.
Proof. The lemma follows from a generalized form of Taylor's theorem stating that for an MSP f and v, u ≥ 0:

f^d(v + u) ≥ f^d(v) + f′(v)^d · u .

For the sake of completeness we give a proof of this generalized form, closely following the proof of [10]. For d = 1 (induction base) the statement is essentially Taylor's theorem (see e.g. [5]). Let d ≥ 1. Then, by Taylor's theorem, we have:

f^(d+1)(v + u) = f(f^d(v + u))
 ≥ f(f^d(v) + f′(v)^d·u)                (induction hypothesis)
 ≥ f^(d+1)(v) + f′(f^d(v))·f′(v)^d·u    (Taylor)
 ≥ f^(d+1)(v) + f′(v)^(d+1)·u .

Lemma 30 itself follows with v = f^(k₁)(x) and u = f^(k₂)(x) − f^(k₁)(x).
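A quick hedged numeric check of the generalized Taylor bound (the quadratic MSP below is our own toy example, chosen so that v ≤ f(v)):

import numpy as np

def f(x):
    # toy MSP: f(X1, X2) = (0.4*X1*X2 + 0.6, 0.3*X1**2 + 0.4*X2 + 0.3)
    return np.array([0.4*x[0]*x[1] + 0.6, 0.3*x[0]**2 + 0.4*x[1] + 0.3])

def fprime(x):
    # Jacobian of f
    return np.array([[0.4*x[1], 0.4*x[0]],
                     [0.6*x[0], 0.4]])

v = np.array([0.1, 0.2])    # satisfies v <= f(v)
u = np.array([0.05, 0.1])
d = 3
lhs, rhs = v + u, v.copy()
for _ in range(d):
    lhs, rhs = f(lhs), f(rhs)       # lhs = f^d(v+u), rhs = f^d(v)
rhs = rhs + np.linalg.matrix_power(fprime(v), d) @ u
print(lhs >= rhs)                   # componentwise True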
The following lemma will be used for the case in which Newton's method terminates in some component X_s. It states that if Newton's method terminates in X_s, it must have terminated before in some other component on which X_s depends.

Lemma 31. Let 1 ≤ s, l ≤ n. Let f′(X)∗_ss depend non-trivially on X_l. Let 0 ≺ ν ≤ f(ν) ≤ µf and ν_s < µf_s and ν_l < µf_l. Then N̂(ν)_s < µf_s.
Proof. This proof closely follows a proof of [12]. Let d ≥ 0 be such that f′(X)^d_ss depends non-trivially on X_l. Let m′ ≥ 0 be such that f^(m′)(ν)_l > ν_l. Such an m′ exists because, by Kleene's theorem, the sequence (f^k(ν))_{k∈ℕ} converges to µf. Choose m ≥ m′ such that f^(m+1)(ν)_s > f^m(ν)_s. Such an m exists because the sequence (f^k(ν)_s)_{k∈ℕ} never reaches µf_s: X_s depends on itself (since f′(X)∗_ss is not constant 0), and so every increase of the s-component results in an increase of the s-component in some later iteration of the Kleene sequence.
Now we have

f^(d+m+1)(ν) − f^(d+m)(ν) ≥ f′(f^m(ν))^d · (f^(m+1)(ν) − f^m(ν))   (Lemma 30)
 ≥∗ f′(ν)^d · (f^(m+1)(ν) − f^m(ν))
 ≥ f′(ν)^d · f′(ν)^m · (f(ν) − ν)                                   (Lemma 30)
 = f′(ν)^(d+m) · (f(ν) − ν) .

The inequality marked with ∗ is strict in the s-component – this is due to the choice of d and m above. So, with b = d + m we have:

(f^(b+1)(ν) − f^b(ν))_s > (f′(ν)^b · (f(ν) − ν))_s .   (13)

For other choices of b, inequality (13) also holds, but with ≥ instead of >. Therefore:

µf_s = ( ν + Σ_{i=0..∞} (f^(i+1)(ν) − f^i(ν)) )_s   (Kleene)
     > ( ν + f′(ν)∗ · (f(ν) − ν) )_s                (inequality (13))
     = N̂(ν)_s .
Now we are ready for the central proposition of this proof of Theorem 19.

Proposition 32. For all k ≥ 0, the matrix f′(ν̂^(k))∗ does not have ∞ as an entry.
Proof. Using Lemma 28, it is enough to show that f′(ν̂^(k))∗_ss ≠ ∞ for all s.
Case 1: The sequence (ν̂_s^(k))_{k∈ℕ} does not terminate, i.e., ν̂_s^(k) < µf_s for all k ≥ 0. Then obviously this holds for all variables in the SCC of X_s. So Lemma 29 applies, hence f′(ν̂^(k))∗_ss ≠ ∞.
Case 2: The sequence (ν̂_s^(k))_{k∈ℕ} terminates, i.e., there is a k_s ≥ 1 such that ν̂_s^(k_s) = ν̂_s^(k_s+1) = ... = µf_s. Let k_s be the smallest such number, i.e., ν̂_s^(k_s−1) < ν̂_s^(k_s) = ν̂_s^(k_s+1) = µf_s. So there is a variable X_u on which X_s depends such that

0 < f′(ν̂^(k_s−1))∗_su · (f(ν̂^(k_s−1)) − ν̂^(k_s−1))_u < ∞ ,

where the latter inequality is implied by Proposition 20. This implies 0 < f′(ν̂^(k_s−1))∗_su < ∞, therefore also f′(ν̂^(k_s−1))∗_ss < ∞. But with Lemma 31, any variable X_l on which f′(X)∗_ss non-trivially depends has already terminated one step earlier, i.e. ν̂_l^(k_s−1) = ν̂_l^(k_s). Therefore f′(ν̂^(k_s))∗_ss = f′(ν̂^(k_s−1))∗_ss < ∞. As the l-components do not change any further, we have f′(ν̂^(k))∗_ss < ∞ for all k ≥ k_s. Since f′(X) is monotone and (ν̂^(k))_{k∈ℕ} is monotonically increasing, this also holds for 0 ≤ k ≤ k_s.
Combining Proposition 32 with Proposition 20 and the comments below Proposition 20 yields Theorem 19.
D.2 Proof of Theorem 21
Here is a restatement of Theorem 21.
Theorem 21. Let f be any quadratic MSP. Then the Newton sequence (ν^(k))_{k∈ℕ} is well-defined and converges linearly to µf. More precisely, there is a k_f ∈ ℕ such that ν^(k_f + i·(h(f)+1)·2^h(f)) has at least i valid bits of µf for every i ≥ 0.
We argue that Theorem 18, which assures linear convergence for DNM, essentially carries over to the “undecomposed” method.
The following lemma states that a Newton step is not faster on an SCC if the values of the lower SCCs are fixed.
Lemma 33. Let f be an MSP. Let 0 ≤ ν ≤ f(ν) ≤ µf. Let S denote an SCC of f, and let L denote the set of variables that are not in S, but on which a variable in S depends. Then (N̂_f(ν))_S ≥ N̂_{f_S[X_L/ν_L]}(ν_S), where N̂_f is defined as in Proposition 20.
Proof.

(N̂_f(ν))_S = ( f′(ν)∗ · (f(ν) − ν) )_S
 = f′(ν)∗_SS · (f(ν) − ν)_S + f′(ν)∗_SL · (f(ν) − ν)_L
 ≥ f′(ν)∗_SS · (f(ν) − ν)_S
 = ( (f_S[X_L/ν_L])′(ν_S) )∗ · ( f_S[X_L/ν_L](ν_S) − ν_S )
 = N̂_{f_S[X_L/ν_L]}(ν_S) .
The following lemma states the monotonicity of Newton's method; it was proved in [16].

Lemma 34 (Monotonicity of Newton's Method). Let f(X) be an MSP. Then N_f(x) ≤ N_f(y) for all 0 ≤ x ≤ y ≤ f(y) ≤ µf.
Lemma 33 and Lemma 34 can be combined into the following lemma, stating that i·(h(f)+1) iterations of the normal Newton's method “dominate” i iterations of a decomposed Newton's method in each SCC.

Lemma 35. Let ν̃^(i) denote the result of a decomposed Newton's method which performs i iterations of Newton's method in each SCC. Let ν^(i) denote the result of i iterations of the normal “undecomposed” Newton's method. Then ν^(i·(h(f)+1)) ≥ ν̃^(i).
Proof. Let h = h(f). Let C(t) resp. C(>t) denote the set of variables in SCCs of depth t resp. of depth > t. We show by induction on the depth t:

ν^(i·(h+1−t))_C(t) ≥ ν̃^(i)_C(t) .

Induction base: t = h. Clear, because for bottom SCCs the two methods are identical.
Let now t < h. Then

ν^(i·(h+1−t))_C(t) = ( N_f^i(ν^(i·(h−t))) )_C(t)
 ≥ N^i_{f_C(t)[X/ν^(i·(h−t))_C(>t)]} ( ν^(i·(h−t))_C(t) )    (Lemma 33)
 ≥ N^i_{f_C(t)[X/ν̃^(i)_C(>t)]} ( ν^(i·(h−t))_C(t) )          (induction hypothesis)
 ≥ N^i_{f_C(t)[X/ν̃^(i)_C(>t)]} ( 0_C(t) ) = ν̃^(i)_C(t) .     (Lemma 34)

Now, the lemma itself follows by using Lemma 34 once more.
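A hedged numeric check of Lemma 35 (the two-SCC system is ours, not from the paper): for X = 0.5X² + 0.5Y, Y = 0.5Y² + 0.5 with h(f) = 1, i·(h(f)+1) = 2i undecomposed Newton steps dominate i decomposed steps per SCC.

import numpy as np

def newton_full(nu, steps):
    # undecomposed Newton on (X, Y): X = 0.5*X**2 + 0.5*Y, Y = 0.5*Y**2 + 0.5
    for _ in range(steps):
        x, y = nu
        F = np.array([0.5*x*x + 0.5*y, 0.5*y*y + 0.5]) - nu
        J = np.array([[1.0 - x, -0.5],
                      [0.0,     1.0 - y]])           # Id - f'(nu)
        nu = nu + np.linalg.solve(J, F)
    return nu

def newton_1d(g, gp, x, steps):
    for _ in range(steps):
        x = x - (g(x) - x) / (gp(x) - 1.0)
    return x

for i in (1, 2, 3, 4):
    full = newton_full(np.zeros(2), 2 * i)                           # i*(h(f)+1) steps
    y = newton_1d(lambda t: 0.5*t*t + 0.5, lambda t: t, 0.0, i)      # bottom SCC
    x = newton_1d(lambda t: 0.5*t*t + 0.5*y, lambda t: t, 0.0, i)    # top SCC
    print(i, bool(np.all(full + 1e-12 >= np.array([x, y]))))         # True: dominance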
As a side note, observe that the above proof of Lemma 35 implicitly benefits from the fact that SCCs of the same depth are independent, so that SCCs of the same depth are handled in parallel by the “undecomposed” Newton's method. Therefore w(f), the width of f, is irrelevant here (cf. Proposition 16).
Now we can finish the proof of Theorem 21. Let k₂ be the k_f of Theorem 18, and let k₁ = k₂·(h(f)+1)·2^h(f). Then we have:

ν^(k₁ + i·(h(f)+1)·2^h(f)) = ν^((k₂+i)·(h(f)+1)·2^h(f)) ≥ ν̃^((k₂+i)·2^h(f))   (Lemma 35)

The approximation ν̃^((k₂+i)·2^h(f)) has at least as many valid bits as the approximation obtained from running DNM(f, k₂+i). This is because DNM(f, k₂+i) runs at most (k₂+i)·2^h(f) iterations in every SCC and Newton's method converges monotonically. So, by Theorem 18, ν^(k₁ + i·(h(f)+1)·2^h(f)) has at least i valid bits of µf. Therefore, Theorem 21 holds with k_f = k₁.