D. Yu. Grigoriev Introduction - Laboratory of Mathematical Logic | of

DEVIATION THEOREMS FOR PFAFFIAN SIGMOIDS
D. Yu. Grigoriev
Abstract. By a Pfaan sigmoid with a depth d we mean a circuit with d layers in which rational
operations are admitted at each layer, and to jump to the next layer one solves an ordinary dierential
equation of the type v0 = p(v) where p is a polynomial with the coecients being the functions computed
at the previous layers of the sigmoid. Thus, Pfaan sigmoid computes Pfaan functions (in the sense of
A. Khovanskii). The deviation theorem is proved which states that for a real function 0 6 f computed by a
Pfaan sigmoid with a depth (or parallel complexity) d there exists an integer n such that the inequalities
(exp ( ( exp (jxjn ) ) 1 jf (x)j exp ( ( exp (jxjn ) ) hold for all jxj x0 for a certain x0 ,
where the iteration of the exponential function is taken d times. One can treat the deviation theorem as
an analogue of the Liouvillean theorem (on algebraic numbers) for Pfaan functions.
Introduction
Under Pfaan sigmoid (cf. [MSS],[G] ) with a depth d we understand a computational
circuit having d layers such that at each layer the rational operations are admitted and
for a jump to (i + 1)-th layer, computing a function wi+1 : ! at (i + 1)-th layer
is admitted, being a solution of a dierential equation of the rst order wi0+1 = p(wi+1)
(cf. [Kh] ) where p is a polynomial with the coecients computed at the previous layers
of the sigmoid (see (1) and the section 1 for the exact denitions). In particular, a jump
could be made by taking exp or log of a function computed at previous layers, this kind of
Pfaan sigmoids are called elementary and were considered in [G], where more generally
the sigmoids were introduced in which for a jump to the next layer one can substitute a
function from a previous layer into a solution of a linear ordinary dierential equation with
the polynomial coecients, thus, into exp or log, in particular. Another particular case
of elementary sigmoids are \standard" sigmoids (see [MSS] ) where the jump is made by
applying the function (1 + exp( x)) 1 . So, a function computed at (i + 1)-th layer of a
sigmoid introduced in [G], satises a linear ordinary dierential equation with the coecients from the previous layers 1; : : : ; i. In the present paper the corresponding dierential
R
R
AMS subject classication: 68C25.
Key words and phrases: pfaan sigmoid, deviation theorems, parallel complexity.
Typeset by AMS-TEX
equation could be non-linear, but of the special form (see (1) below), providing that the
computed functions are Pfaan (see [Kh] ).
The main result (see the theorem in the section 1) states that two dierent functions
f1 ; f2 : ! computed by a Pfaan sigmoids with a depth (or parallel complexity) d
cannot be too close to each other, namely (exp( (exp p(x) ))) 1 j(f1 f2 )(x)j exp( (exp p(x))) for a suitable polynomial p 2 [x] and for all x x0 for some x0 2 ,
where the number of iterations of the exponential function equals to d. This type of the
results was called in [G] the deviation theorem where they were proved for the sigmoids
introduced in [G] (cf. above). Also the deviation theorem could be treated as an analogue
of Liouvillean theorem on the bound on the dierence of two (dierent) algebraic numbers,
for the functions computed by Pfaan sigmoids. In particular, it gives a lower bound on
the approximation of a function computed by a Pfaan sigmoid by means of a rational
function (see the corollary in the section 1). On the other hand, one could interpret the
theorem and the corollary as lower bounds on the depth (so, parallel complexity) of a
Pfaan sigmoid, computing a function, provided that it has a rather good approximation
by a \simple" (for example, rational) function.
R
R
R
R
The proof of the theorem is conducted by induction on the depth of a sigmoid. In the
section 2 an upper bound on a function computed by a Pfaan sigmoid is ascertained (see
the lemma), in the section 3 a lower bound and thus the inductive step are proved.
1. Pfaan sigmoids and dierential elds.
Denote the eld P0 = (X ), then by induction on i the eld Pi+1 is generated over
j ) : ! (maybe having a nite number of singularities)
Pi by all the functions wi(+1
satisfying rst-order nonlinear dierential equations of the form
R
R
R
wi j
( )
+1
0
=q
where a polynomial q(Z ) 2 Pi [Z ].
2
wi j
( )
+1
(1)
According to [Kh] any function f 2 Pi , being Pfaan, has a nite number of singularities and roots (for i = 1 see also [B] ). Hence for every two functions f1 ; f2 2
Pi ; f1 6= f2 the dierence (f1 f2 )(x) is either positive or negative everywhere on an
interval x 2 [x0 ; 1) for a certain x0 2 , we write f1 f2 or f1 f2 respectively. By
p1; p2 ; 2 [X ] we'll denote the polynomials with the positive leading coecients. By
exp(i) = exp( (exp) ) we denote the iteration of the exponential function i times.
Obviously exp(i)(p1 ); (exp(i)(p1 )) 1 2 Pi. Now we are able to formulate the main result
of the paper (cf. [G] ).
R
R
Theorem. For any function 0 6 f
2 Pi there exists a polynomial p1 such that
(exp(i) p1) 1 jf j exp(i) p1 :
The theorem will be proved in the next two sections, now we indicate the connections with
the sigmoids (cf. [MSS]). Under a Pfaan sigmoid with the depth d we understand a
j ) at (i +1)-th layer satises an
computations consisting of d layers such that a function wi(+1
P
q`Z ` are
equation (1) where the coecients q` ; 0 ` n of the polynomial q(Z ) =
0`n
j
the rational functions in the functions of the form wi computed at the previous (1; : : : ; i)
( 1)
layers of the sigmoid (cf. [G] ). Thus, at each layer of the Pfaan sigmoid the rational
operations are admitted and the jump from the previous layer to the next one is done
j ) computed at
by solving an equation (1). Thus, inducion on i shows that a function wi(+1
(i + 1)-th layer of a Pfaan sigmoid belongs to Pi+1.
j ) was obtained
In [G] the elementary sigmoids were considered where the function wi(+1
j ) = exp(f =f ) or w(j ) = log(f =f ) where f ; f were the polynomials in the
as either wi(+1
1
2
1
2
1
2
i+1
functions of the form wi(j ) computed at the previous (1; : : : ; i) layers of the elementary
sigmoid. Evidently, an elementary sigmoid is a particular case of Pfaan sigmoids. In its
turn, so-called \standard" sigmoid where the jump from the previous layers to the next
one is fullled by applying the gate function (1 + exp( x)) 1 ( [MSS] ) is a particular case
of the elementary sigmoids.
1
3
An important question, how good one can approximate a function computed by a
Pfaan sigmoid by means of a rational function (cf. [G] ).
Corollary. If a function wd 2 Pd is computed by a Pfaan sigmoid with a depth d then
for any rational function r 2 P0 holds jwd rj (exp(d)(p2 ))
p2, unless wd = r.
1
for a suitable polynomial
2. Upper bound on Pfaan function.
We start proving the theorem by induction on i. The base of induction for P0 is obvious,
let us proceed to the inductive step. In the present section we'll prove the required upper
j ) (see (1)).
bound on the function wi(+1
Assume by inductive hypothesis the coecients q` 2 Pi; qn 6= 0 of the polynomial
q satisfy the following inequalities (exp(i)(p2 )) 1 jq`j exp(i)(p2 ); 0 ` n for an
appropriate polynomial p2.
j ) j 4(exp(i) (p ))2 is valid, the required upper bound is proved. Suppose the
If jwi(+1
2
contrary. Then
j ) )0 1 (exp(i)(p )) 1 q + qn 1 + + q0 = (wi(+1
(i)
exp (p2 ) + 1
n
2
(
j
)
(
j
)
(
j
)
n
n
2
wi+1
(wi+1)
(wi+1)
x
j ) (x)j C + (exp(i) (p ) + 1) exp(i)(p (x)) for suciently large
If n = 0 then jwi(+1
0
2
3
R
x0
j ) (x)) n+1 j x x0 and suitable polynomial p3 and C0 2 . If n 2 then j(wi(+1
1
R
(n 1) 21 (exp(i)(p2 )) 1 (exp(i)(p4 (x))) 1 for suciently large x and a suitable polyx
j ) j (exp(i) (p )). Finally, if n = 1 then log jw(j ) (x)j C +
nomial p4, therefore jwi(+1
4
1
i+1
Rx
(exp(i)(p2 ) + 1) exp(i)(p5 (x)) for suciently large x x0 and suitable polynomial p5
R
x0
j ) j exp(i+1)(p ) for a suitable polynomial p .
and C1 2 , therefore jwi(+1
6
6
R
We summarize the proved upper bound in the following lemma (for i = 0 one can nd
its proof in [B] ).
4
j ) satises (1)
Lemma. Assume the statement of the theorem is proved for Pi and wi(+1
where deg(q) = n. Then
j ) j exp(i) (p )
a) if n = 0 or n 2 then jwi(+1
7
j ) j exp(i+1) (p )
b) if n = 1 then jwi(+1
7
for an appropriate polynomial p7.
Remark. For any polynomial h 2 Pi [Y1; : : : ; Ym ] the similar upper bound as in the lemma
j ) ; : : : ; w(jm ) )j exp(i+1) (p ) is valid where the functions w(j ) ; : : : ; w(jm ) satisfy
jh(wi(+1
8
i+1
i+1
i+1
js ) )0 = q (s) (w(js ) ) where q (s) 2 P [Z ]; 1 s m. This
similar to (1) equations, namely (wi(+1
i
i+1
j ) ; : : : ; w(jm ) ) 2 P required in the theorem.
is an upper bound on a function h(wi(+1
i+1
i+1
1
1
1
3. Lower bound on Pfaan function.
j ) ; : : : ; w(jm ) )j required in the
Now we proceed to proving a lower bound on jh(wi(+1
i+1
j ) ; ; w(jm ) are algebraically indepentheorem. Firstly, we consider the case when wi(+1
i+1
j ) ; : : : ; w(jm ) )j dent over Pi . Assume that the required lower bound is wrong, so jh(wi(+1
i+1
j ) ; ; w(jm ) )
(exp(i+1)(p`)) 1 for all the polynomials p`. Then we say that the function h(wi(+1
i+1
is small. Also we assume that m is the least possible with this property. Finally, without
loss of generality one can assume that the polynomial h is irreducible over Pi.
1
1
1
1
j ) ; ; w(jm ) ))0 2 P is a Pfaan function [Kh], it should
As the derivative (h(wi(+1
i+1
i+1
(js )
j ) ; ; w(jm ) ))0 = P
@h
(s)
be also small. One can represent (h(wi(+1
i+1
@w js q (wi+1 ) =
1
1
sm
1
j ) ; ; w(jm ) ) for a certain polynomial g 2 P [Y ; ; Y ].
g(wi(+1
i 1
m
i+1
(
)
i+1
1
If h g in the ring Pi [Y1; ; Ym ] then there exist polynomials h1; g1 2 Pi[Y1; ; Ym ]
such that 0 6 h h1 + gg1 2 Pi [Y1; ; Ym 1] since h is irreducible. But then the function
j ) ; ; w(jm ) ) is small, applying the remark at the end of the section 2
(hh1 + gg1)(wi(+1
i+1
to the polynomials h1; g1, that contradicts to the minimality of the choice of m.
-
1
1
Now suppose that g = hg0 whereg0 2 Pi [Y1; ; Ym ]. Consider any 1 s m for
js ) ) deg j (h(w(j ) ; ; w(jm ) ))
which degZ (q(s)) 1, then degwijs @w@hjs q(s)(wi(+1
i+1
i+1
wi s
( )
+1
(
)
i+1
5
( )
+1
1
@h(j ) q (`) (w(j` ) )
i+1
@wi+1`
j ) ; ; w(jm ) )) for every ` 6= s, hence
degwijs (h(wi(+1
and as degwijs
i+1
js ) does not occur in the polynomial g (w(j ) ; ; w(jm ) ). If for some 1 s m deg (q (s) )
wi(+1
0
Z
i+1
i+1
js ) j exp(i) (p ) for an appropriate polynomial p .
2 then lemma implies that jwi(+1
9
9
j
j
0
m
(
h
(
w
;
;w
))
j ) ; ; w(jm ) )j exp(i) (p ) for a certain p . Thus, j
i
i
Therefore jg0(wi(+1
10
10
i+1
h(wij ; ;wijm ) j
j ) ); ; w(jm ) j j exp(i) (p ) and jh(w(j ) ; ; w(jm ) )j exp(i)(p10), hence j log jh(wi(+1
11
i+1
i+1
i+1
j ) ; ; w(jm ) ) is small
(exp(i+1)(p11)) 1 . This contradicts to the assumption that h(wi(+1
i+1
j ) ; ; w(jm ) are
and proves the required in the theorem lower bound in the case when wi(+1
i+1
algebraically independent over Pi .
( )
+1
1
( )
+1
1
( 1)
+1
( 1)
+1
1
1
(
)
+1
(
)
+1
1
1
1
j ) ; ; w(js )
In the general case choose some transcendental over Pi basis (let it be wi(+1
i+1
j ) ; ; w(jm ) . Then there exists a polynomial t(Z ) =
without loss of generality) among wi(+1
i+1
P
j ) ; ; w(js ) ][Z ] where the coecients t(`) 2 P [w(j ) ; ; w(js ) ]; 0 t(`)Z ` 2 Pi[wi(+1
i i+1
i+1
i+1
1
1
0
1
`K
1
j ) ; ; w(jm ) )) 0. Since we have proved that
` K and t(0) 6 0, such that t(h(wi(+1
i+1
jt(0)j (exp(i+1)(p12 )) 1 and by lemma and remark after it jt(`)j (exp(i+1)(p12 )); 0 j ) ; : : : ; w(jm ) )j 1 (exp(i+1)(p )) 2 .
` K for a certain p12, we obtain that jh(wi(+1
12
i+1
2
1
1
This completes the proof of the inductive step in the proof of the theorem (see the
beginning of the section 2) because any element in Pi+1 can be represented as a quotient
j ) ; ; w(jm ) )h(2) (w(j ) ; ; w(jm ) ) for some elements w(j ) ; ; w(jm ) 2 P
h(1)(wi(+1
i+1
i+1
i+1
i+1
i+1
i+1
js ) )0 = q (s)(w(js ) ); 1 s m and for some
satisfying the equations of the kind (1) (wi(+1
i+1
polynomials h(1); h(2) 2 Pi [Y1; ; Ym ]. The theorem is proved.
1
1
1
6
BIBLIOGRAPHY
[B] Bellman, R., Stability theory of dierential equations. McGraw-Hill Book Co., 1953..
[G] Grigoriev, D., Deviation theorems for solutions of linear ordinary dierential equations and applications to lower bounds on parallel complexity of sigmoids, to appear.
[Kh] Khovanskii, A., Fewnomials, Transl. Math. Monogr. AMS, 88 (1991).
[MSS] Maass, W., Schnitger, G., Sontag, E., On the computational power of sigmoid versus boolean
threshold circuits, Proc. 32 FOCS, IEEE, (1991), pp. 767{776.
Departments of Computer Science and Mathematics
Penn State University
State College, PA 16802 USA
On leave from
Mathematical Institute
Fontanka 27
St. Petersburg 191011 RUSSIA
7