Applied Mathematical Sciences, Vol. 8, 2014, no. 172, 8601-8609
HIKARI Ltd, www.m-hikari.com
http://dx.doi.org/10.12988/ams.2014.410805

On Hessians of Composite Functions

Čestmír Bárta, Martin Kolář and Jan Šustek

Department of Mathematics, Faculty of Science
The University of Ostrava, 701 03 Ostrava, Czech Republic

Copyright © 2014 Čestmír Bárta, Martin Kolář and Jan Šustek. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

In this paper we study the Hessian matrices of composite functions $F$ of the form $F(x) = f(g(x))$ with $f\colon \mathbb{R} \to \mathbb{R}$ and $g\colon \mathbb{R}^n \to \mathbb{R}$. We present explicit formulas for the Hessian of $F$ and for the inverse of the Hessian matrix.

Mathematics Subject Classification: 26B05

Keywords: Hessian, composite function

1 Introduction

For a function $f\colon \mathbb{R}^n \to \mathbb{R}$ the Hessian matrix $Hf(x)$ is defined as the $n \times n$ matrix of second-order partial derivatives,
\[
Hf(x) = \begin{pmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}.
\]
The determinant of this matrix is called the Hessian. The Hessian matrix plays an important role in many areas of mathematics. It is important in neural computing: for instance, several non-linear optimization algorithms used for training neural networks are based on the second-order properties of the error surface, which are controlled by the Hessian matrix [1]. The inverse of the Hessian matrix has been used to identify the least significant weights in a network as part of network 'pruning' algorithms, and it can also be used to assign error bars to the predictions made by a trained network [1]. Hessian matrices are also applied in Morse theory [4].
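In numerical work, the entries of a Hessian matrix are often approximated by central finite differences. The following sketch is illustrative only: the test function and the step size $h$ are our own choices, not taken from the paper.

```python
# Central-difference approximation of the Hessian matrix of f: R^n -> R.
# The step size h and the test function below are illustrative choices.
def hessian_fd(f, x, h=1e-4):
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            xpp = list(x); xpp[i] += h; xpp[j] += h
            xpm = list(x); xpm[i] += h; xpm[j] -= h
            xmp = list(x); xmp[i] -= h; xmp[j] += h
            xmm = list(x); xmm[i] -= h; xmm[j] -= h
            # second-order central difference for d^2 f / (dx_i dx_j);
            # for i == j this reduces to (f(x+2h) - 2 f(x) + f(x-2h)) / (4 h^2)
            H[i][j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4 * h * h)
    return H

# test function f(x, y) = x^2 y + y^3, whose Hessian is [[2y, 2x], [2x, 6y]]
f = lambda v: v[0] ** 2 * v[1] + v[1] ** 3
H = hessian_fd(f, [1.0, 2.0])   # analytic value at (1, 2): [[4, 2], [2, 12]]
```

The approximation error is of order $h^2$, so with $h = 10^{-4}$ the computed entries agree with the analytic Hessian to roughly six digits.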
Throughout this paper we study the Hessian matrices of composite functions $F$ of the form $F(x) = f(g(x))$ with $f\colon \mathbb{R} \to \mathbb{R}$ and $g\colon \mathbb{R}^n \to \mathbb{R}$. We present explicit formulas for the Hessian of $F$ and for the inverse of the Hessian matrix. In previous years several papers with such formulas were published, but only for special functions $F$; see e.g. [2] or [6]. Those results follow easily from our general results.

We use the following notation for convenient work with vectors and matrices:
\[
\operatorname{row} a(i) = \bigl(a(1)\ \ldots\ a(n)\bigr), \qquad
\operatorname{col} a(i) = \begin{pmatrix} a(1) \\ \vdots \\ a(n) \end{pmatrix},
\]
\[
\operatorname{diag} a(i) = \begin{pmatrix} a(1) & & 0 \\ & \ddots & \\ 0 & & a(n) \end{pmatrix}, \qquad
\operatorname{mat} a(i,j) = \begin{pmatrix} a(1,1) & \ldots & a(1,n) \\ \vdots & \ddots & \vdots \\ a(n,1) & \ldots & a(n,n) \end{pmatrix}.
\]
Using this notation we have
\[
\nabla g(x) = \operatorname{col} \frac{\partial g}{\partial x_i}
\qquad\text{and}\qquad
I = \operatorname{diag} 1 = \operatorname{mat} \delta_{ij},
\]
where $\delta_{ij}$ is the Kronecker symbol, i.e. $\delta_{ii} = 1$ and $\delta_{ij} = 0$ for $i \neq j$.

2 Results

Our first result concerns the determinant of the Hessian matrix.

Theorem 1. Let $f\colon \mathbb{R} \to \mathbb{R}$ and $g\colon \mathbb{R}^n \to \mathbb{R}$ be real functions and let $F(x) = f(g(x))$ be their composition. Then the Hessian of the function $F$ equals
\[
\det HF(x) = \Bigl(1 + \frac{f''(g(x))}{f'(g(x))}\,\nabla g(x)^T \cdot Hg(x)^{-1} \cdot \nabla g(x)\Bigr) f'(g(x))^n \det Hg(x).
\]

The following result concerns the inverse of the Hessian matrix.

Theorem 2. Let $f\colon \mathbb{R} \to \mathbb{R}$ and $g\colon \mathbb{R}^n \to \mathbb{R}$ be real functions and let $F(x) = f(g(x))$ be their composition. Then the inverse of the Hessian matrix of the function $F$ equals
\[
HF(x)^{-1} = \Biggl(I - \frac{Hg(x)^{-1} \cdot \nabla g(x) \cdot \nabla g(x)^T}{\dfrac{f'(g(x))}{f''(g(x))} + \nabla g(x)^T \cdot Hg(x)^{-1} \cdot \nabla g(x)}\Biggr) \cdot \frac{Hg(x)^{-1}}{f'(g(x))}.
\]

Theorem 3 is a consequence of the previous theorems for a particular class of functions $g$.

Theorem 3. Let $\alpha_i \in \mathbb{R}$ for $i = 1, \ldots, n$ and let $f\colon \mathbb{R} \to \mathbb{R}$ be a real function. Put $g(x) = \prod_{k=1}^n x_k^{\alpha_k}$ and $F(x) = f(g(x))$. Then the Hessian of $F$ equals
\[
\det HF(x) = \Bigl(1 + \frac{g(x) f''(g(x))}{f'(g(x))}\,\frac{|\alpha|}{|\alpha|-1}\Bigr) f'(g(x))^n (-1)^{n+1} (|\alpha|-1) \prod_{k=1}^n \alpha_k x_k^{n\alpha_k - 2},
\]
where $|\alpha| = \sum_{k=1}^n \alpha_k$. The inverse of the Hessian matrix is
\[
HF(x)^{-1} = \frac{1}{g(x) f'(g(x))} \operatorname{mat}\Biggl(\biggl(\frac{1}{|\alpha|-1} - \frac{1}{\dfrac{f'(g(x))}{g(x) f''(g(x))}(|\alpha|-1)^2 + |\alpha|(|\alpha|-1)}\biggr) x_i x_j - \frac{\delta_{ij}}{\alpha_i}\, x_i x_j\Biggr).
\]

Example 1. Chen [2] considered functions of the form $F(x) = f\bigl(\sum_{k=1}^n h_k(x_k)\bigr)$ and computed their Hessians inductively. We obtain the same result directly using Theorem 1. We have $g(x) = \sum_{k=1}^n h_k(x_k)$. Hence
\[
\nabla g(x) = \operatorname{col} h_i'(x_i), \qquad
Hg(x) = \operatorname{diag} h_i''(x_i), \qquad
\det Hg(x) = \prod_{k=1}^n h_k''(x_k), \qquad
Hg(x)^{-1} = \operatorname{diag} \frac{1}{h_i''(x_i)}.
\]
Then Theorem 1 implies that
\[
\det HF(x) = \Bigl(1 + \frac{f''(g(x))}{f'(g(x))}\operatorname{row} h_i'(x_i) \cdot \operatorname{diag}\frac{1}{h_i''(x_i)} \cdot \operatorname{col} h_i'(x_i)\Bigr) f'(g(x))^n \prod_{k=1}^n h_k''(x_k)
= \Bigl(1 + \frac{f''(g(x))}{f'(g(x))} \sum_{k=1}^n \frac{h_k'(x_k)^2}{h_k''(x_k)}\Bigr) f'(g(x))^n \prod_{k=1}^n h_k''(x_k).
\]
Moreover, using Theorem 2 we obtain the inverse of the Hessian matrix:
\[
HF(x)^{-1} = \Biggl(I - \frac{\operatorname{diag}\frac{1}{h_i''(x_i)} \cdot \operatorname{col} h_i'(x_i) \cdot \operatorname{row} h_i'(x_i)}{\dfrac{f'(g(x))}{f''(g(x))} + \operatorname{row} h_i'(x_i) \cdot \operatorname{diag}\frac{1}{h_i''(x_i)} \cdot \operatorname{col} h_i'(x_i)}\Biggr) \cdot \frac{\operatorname{diag}\frac{1}{h_i''(x_i)}}{f'(g(x))}
\]
\[
= \operatorname{mat}\Biggl(\delta_{ij} - \frac{\frac{h_i'(x_i) h_j'(x_j)}{h_i''(x_i)}}{\dfrac{f'(g(x))}{f''(g(x))} + \sum_{k=1}^n \frac{h_k'(x_k)^2}{h_k''(x_k)}}\Biggr) \cdot \operatorname{diag} \frac{1}{f'(g(x)) h_i''(x_i)}
= \operatorname{mat}\Biggl(\biggl(\delta_{ij} - \frac{\frac{h_i'(x_i) h_j'(x_j)}{h_i''(x_i)}}{\dfrac{f'(g(x))}{f''(g(x))} + \sum_{k=1}^n \frac{h_k'(x_k)^2}{h_k''(x_k)}}\biggr) \frac{1}{f'(g(x)) h_j''(x_j)}\Biggr).
\]

Example 2. Trojovský and Hladíková [6] considered the function $F(x) = \exp \prod_{k=1}^n x_k$ and computed its Hessian as the sum of 2n simpler determinants. We obtain the same result directly using Theorem 3. We have $f(t) = e^t$ and $\alpha_i = 1$ for every $i$, hence $f'(t) = f''(t) = e^t$ and $|\alpha| = n$. Then Theorem 3 implies that
\[
\det HF(x) = \Bigl(1 + g(x) \frac{n}{n-1}\Bigr) e^{n g(x)} (-1)^{n+1} (n-1) \prod_{k=1}^n x_k^{n-2}
= (-1)^{n+1} \Bigl(n - 1 + n \prod_{k=1}^n x_k\Bigr) e^{\,n \prod_{k=1}^n x_k} \prod_{k=1}^n x_k^{n-2}.
\]
Moreover we obtain the inverse of the Hessian matrix:
\[
HF(x)^{-1} = \frac{1}{g(x) e^{g(x)}} \operatorname{mat}\Biggl(\biggl(\frac{1}{n-1} - \frac{1}{\frac{(n-1)^2}{g(x)} + n(n-1)}\biggr) x_i x_j - \delta_{ij} x_i x_j\Biggr)
= \frac{1}{e^{\prod_{k=1}^n x_k} \prod_{k=1}^n x_k} \operatorname{mat}\Biggl(\biggl(\frac{1}{n-1} - \frac{\prod_{k=1}^n x_k}{(n-1)^2 + n(n-1) \prod_{k=1}^n x_k}\biggr) x_i x_j - \delta_{ij} x_i x_j\Biggr).
\]
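For small $n$ the formulas of Theorems 1 and 2 are easy to cross-check numerically. The following sketch is illustrative only: the choices $f(t) = e^t$, $g(x) = x_1^2 + x_1 x_2 + x_2^2$ and the evaluation point are our own, and the $2 \times 2$ linear-algebra helpers are hand-rolled so the check is self-contained.

```python
import math

def matmul(A, B):
    # product of two 2x2 matrices
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inv2(A):
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]

# Illustrative choice (not from the paper): f(t) = e^t, g(x) = x1^2 + x1*x2 + x2^2
x = (1.0, 0.5)
g = x[0] ** 2 + x[0] * x[1] + x[1] ** 2
grad = [2 * x[0] + x[1], x[0] + 2 * x[1]]          # gradient of g
Hg = [[2.0, 1.0], [1.0, 2.0]]                      # Hessian of g (constant here)
fp = fpp = math.exp(g)                             # f'(g) = f''(g) = e^g

# Hessian of F = f(g(x)) assembled entrywise from the chain rule
HF = [[fpp * grad[i] * grad[j] + fp * Hg[i][j] for j in range(2)] for i in range(2)]

# Theorem 1: det HF = (1 + (f''/f') grad^T Hg^{-1} grad) (f')^n det Hg
Hginv = inv2(Hg)
q = sum(grad[i] * Hginv[i][j] * grad[j] for i in range(2) for j in range(2))
det_thm1 = (1 + (fpp / fp) * q) * fp ** 2 * det2(Hg)

# Theorem 2: HF^{-1} = (I - Hg^{-1} grad grad^T / (f'/f'' + q)) Hg^{-1} / f'
u = [sum(Hginv[i][k] * grad[k] for k in range(2)) for i in range(2)]
denom = fp / fpp + q
M = [[(1.0 if i == j else 0.0) - u[i] * grad[j] / denom for j in range(2)] for i in range(2)]
HFinv = [[sum(M[i][k] * Hginv[k][j] for k in range(2)) / fp for j in range(2)] for i in range(2)]

check = matmul(HF, HFinv)   # should be (numerically) the 2x2 identity matrix
```

With these choices the value `det_thm1` agrees with the determinant of `HF` computed directly, and `HF * HFinv` is the identity matrix up to rounding.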
Example 3. Consider the function $F(x) = \sqrt[n]{\prod_{k=1}^n x_k}$, i.e. the geometric mean. Using the notation of Theorem 3, we have $f(t) = \sqrt[n]{t}$ and $\alpha_i = 1$ for every $i$. Hence $f'(t) = \frac{1}{n} t^{\frac{1}{n}-1}$ and $f''(t) = \frac{1}{n}\bigl(\frac{1}{n}-1\bigr) t^{\frac{1}{n}-2}$. Then Theorem 3 implies that
\[
\det HF(x) = \Biggl(1 + \frac{g(x)\,\frac{1}{n}\bigl(\frac{1}{n}-1\bigr) g(x)^{\frac{1}{n}-2}}{\frac{1}{n}\, g(x)^{\frac{1}{n}-1}}\,\frac{n}{n-1}\Biggr) \frac{1}{n^n}\, g(x)^{1-n} (-1)^{n+1} (n-1) \prod_{k=1}^n x_k^{n-2} = 0.
\]
This reflects the fact that the geometric mean is homogeneous of degree $1$, so its Hessian matrix is singular.

3 Proofs

In our proofs we will need the following two lemmas. We present them together with their short proofs.

Lemma 1 (Matrix determinant lemma) ([3], Lemma 1.1). Let $A$ be an invertible $n \times n$ matrix and let $u, v \in \mathbb{R}^n$ be column vectors. Then
\[
\det(A + u v^T) = (1 + v^T A^{-1} u) \det A.
\]

Proof. By a direct computation we easily obtain the identity
\[
\begin{pmatrix} A & 0 \\ v^T & 1 \end{pmatrix}
\begin{pmatrix} I + A^{-1} u v^T & A^{-1} u \\ 0^T & 1 \end{pmatrix}
\begin{pmatrix} I & 0 \\ -v^T & 1 \end{pmatrix}
=
\begin{pmatrix} A & u \\ 0^T & 1 + v^T A^{-1} u \end{pmatrix}.
\]
Taking determinants of both sides we immediately obtain
\[
\det(A + u v^T) = \det A \cdot \det(I + A^{-1} u v^T) = \det A \cdot (1 + v^T A^{-1} u).
\]

Lemma 2 (Sherman-Morrison formula) ([5], Section 2.7.1). Let $A$ be an invertible $n \times n$ matrix and let $u, v \in \mathbb{R}^n$ be column vectors. Suppose that the matrix $A + u v^T$ is invertible. Then
\[
(A + u v^T)^{-1} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u}.
\]

Proof. From the assumption and Lemma 1 it follows that $1 + v^T A^{-1} u \neq 0$. Then
\[
I = I + A^{-1} u v^T - \frac{A^{-1} u v^T + (v^T A^{-1} u) A^{-1} u v^T}{1 + v^T A^{-1} u}
= I + A^{-1} u v^T - \frac{A^{-1} u v^T + A^{-1} u (v^T A^{-1} u) v^T}{1 + v^T A^{-1} u}
\]
\[
= A^{-1}(A + u v^T) - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u}(A + u v^T)
= \Bigl(A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u}\Bigr)(A + u v^T).
\]
We obtain the result by multiplying by $(A + u v^T)^{-1}$ from the right.

Using these lemmas we now prove our theorems.

Proof of Theorem 1. The second-order partial derivatives of $F$ are
\[
\frac{\partial^2 F}{\partial x_i \partial x_j} = f''(g(x)) \frac{\partial g}{\partial x_i}(x) \frac{\partial g}{\partial x_j}(x) + f'(g(x)) \frac{\partial^2 g}{\partial x_i \partial x_j}(x). \tag{1}
\]
Hence the Hessian matrix can be written as $HF(x) = A + u v^T$, where
\[
A = f'(g(x)) Hg(x), \qquad u = f''(g(x)) \nabla g(x), \qquad v = \nabla g(x). \tag{2}
\]
From Lemma 1 we obtain
\[
\det HF(x) = \Bigl(1 + \nabla g(x)^T \cdot \bigl(f'(g(x)) Hg(x)\bigr)^{-1} \cdot f''(g(x)) \nabla g(x)\Bigr) \det\bigl(f'(g(x)) Hg(x)\bigr)
= \Bigl(1 + \frac{f''(g(x))}{f'(g(x))}\,\nabla g(x)^T \cdot Hg(x)^{-1} \cdot \nabla g(x)\Bigr) f'(g(x))^n \det Hg(x).
\]

Proof of Theorem 2. As in the proof of Theorem 1 we obtain (1) and (2).
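As an aside, both lemmas admit a quick numerical sanity check; in the sketch below the matrix $A$ and the vectors $u$, $v$ are arbitrary illustrative choices, not taken from the paper.

```python
# Arbitrary illustrative 2x2 example (not from the paper)
A = [[4.0, 1.0], [2.0, 3.0]]
u = [1.0, 2.0]
v = [3.0, 1.0]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inv2(M):
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

B = [[A[i][j] + u[i] * v[j] for j in range(2)] for i in range(2)]   # A + u v^T
Ainv = inv2(A)
vAu = sum(v[i] * Ainv[i][j] * u[j] for i in range(2) for j in range(2))

# Lemma 1: det(A + u v^T) = (1 + v^T A^{-1} u) det A
lemma1_lhs = det2(B)
lemma1_rhs = (1 + vAu) * det2(A)

# Lemma 2: (A + u v^T)^{-1} = A^{-1} - (A^{-1} u)(v^T A^{-1}) / (1 + v^T A^{-1} u)
Au = [sum(Ainv[i][k] * u[k] for k in range(2)) for i in range(2)]   # A^{-1} u
vA = [sum(v[k] * Ainv[k][j] for k in range(2)) for j in range(2)]   # v^T A^{-1}
Binv = [[Ainv[i][j] - Au[i] * vA[j] / (1 + vAu) for j in range(2)] for i in range(2)]
```

Here `lemma1_lhs` equals `lemma1_rhs`, and `Binv` multiplied by `B` gives the identity matrix, confirming both rank-one update formulas on this example.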
Then Lemma 2 implies
\[
HF(x)^{-1} = \Biggl(I - \frac{\bigl(f'(g(x)) Hg(x)\bigr)^{-1} \cdot f''(g(x)) \nabla g(x) \cdot \nabla g(x)^T}{1 + \nabla g(x)^T \cdot \bigl(f'(g(x)) Hg(x)\bigr)^{-1} \cdot f''(g(x)) \nabla g(x)}\Biggr) \cdot \bigl(f'(g(x)) Hg(x)\bigr)^{-1}
= \Biggl(I - \frac{Hg(x)^{-1} \cdot \nabla g(x) \cdot \nabla g(x)^T}{\dfrac{f'(g(x))}{f''(g(x))} + \nabla g(x)^T \cdot Hg(x)^{-1} \cdot \nabla g(x)}\Biggr) \cdot \frac{Hg(x)^{-1}}{f'(g(x))}.
\]

Proof of Theorem 3. The derivatives of the function $g$ are
\[
\frac{\partial g}{\partial x_i} = \frac{\alpha_i}{x_i}\, g(x), \qquad
\frac{\partial^2 g}{\partial x_i^2} = \frac{\alpha_i(\alpha_i - 1)}{x_i^2}\, g(x), \qquad
\frac{\partial^2 g}{\partial x_i \partial x_j} = \frac{\alpha_i \alpha_j}{x_i x_j}\, g(x) \quad (i \neq j).
\]
Therefore its gradient equals $\nabla g(x) = g(x) \operatorname{col} \frac{\alpha_i}{x_i}$ and its Hessian matrix is
\[
Hg(x) = \Bigl(\operatorname{mat} \frac{\alpha_i \alpha_j}{x_i x_j} - \operatorname{diag} \frac{\alpha_i}{x_i^2}\Bigr) g(x) = -g(x)(A + u v^T),
\]
where
\[
A = \operatorname{diag} \frac{\alpha_i}{x_i^2}, \qquad u = -\operatorname{col} \frac{\alpha_i}{x_i}, \qquad v = \operatorname{col} \frac{\alpha_i}{x_i}.
\]
Then Lemma 1 implies that
\[
\det Hg(x) = (-g(x))^n \Bigl(1 - \operatorname{row} \frac{\alpha_i}{x_i} \cdot \operatorname{diag} \frac{x_i^2}{\alpha_i} \cdot \operatorname{col} \frac{\alpha_i}{x_i}\Bigr) \prod_{k=1}^n \frac{\alpha_k}{x_k^2}
= (-g(x))^n (1 - |\alpha|) \prod_{k=1}^n \frac{\alpha_k}{x_k^2}
= (-1)^n (1 - |\alpha|) \prod_{k=1}^n \alpha_k x_k^{n\alpha_k - 2}, \tag{3}
\]
since $g(x)^n = \prod_{k=1}^n x_k^{n\alpha_k}$. Lemma 2 implies that
\[
Hg(x)^{-1} = \frac{-1}{g(x)}\Biggl(\operatorname{diag} \frac{x_i^2}{\alpha_i} + \frac{\operatorname{diag}\frac{x_i^2}{\alpha_i} \cdot \operatorname{col}\frac{\alpha_i}{x_i} \cdot \operatorname{row}\frac{\alpha_i}{x_i} \cdot \operatorname{diag}\frac{x_i^2}{\alpha_i}}{1 - \operatorname{row}\frac{\alpha_i}{x_i} \cdot \operatorname{diag}\frac{x_i^2}{\alpha_i} \cdot \operatorname{col}\frac{\alpha_i}{x_i}}\Biggr)
= \frac{-1}{g(x)}\Bigl(\operatorname{diag} \frac{x_i^2}{\alpha_i} + \frac{\operatorname{mat} x_i x_j}{1 - |\alpha|}\Bigr). \tag{4}
\]
We will use the following identity:
\[
\operatorname{row}\frac{\alpha_i}{x_i} \cdot \operatorname{diag}\frac{x_i^2}{\alpha_i} \cdot \operatorname{col}\frac{\alpha_i}{x_i}
+ \operatorname{row}\frac{\alpha_i}{x_i} \cdot \frac{\operatorname{mat} x_i x_j}{1 - |\alpha|} \cdot \operatorname{col}\frac{\alpha_i}{x_i}
= |\alpha| + \frac{|\alpha|^2}{1 - |\alpha|} = \frac{|\alpha|}{1 - |\alpha|}. \tag{5}
\]
Then (3), (4), (5) and Theorem 1 imply that
\[
\det HF(x) = \Biggl(1 + \frac{f''(g(x))}{f'(g(x))}\, g(x) \operatorname{row}\frac{\alpha_i}{x_i} \cdot \frac{-1}{g(x)}\Bigl(\operatorname{diag}\frac{x_i^2}{\alpha_i} + \frac{\operatorname{mat} x_i x_j}{1 - |\alpha|}\Bigr) \cdot g(x) \operatorname{col}\frac{\alpha_i}{x_i}\Biggr) f'(g(x))^n (-1)^n (1 - |\alpha|) \prod_{k=1}^n \alpha_k x_k^{n\alpha_k - 2}
\]
\[
= \Bigl(1 - \frac{g(x) f''(g(x))}{f'(g(x))}\,\frac{|\alpha|}{1 - |\alpha|}\Bigr) f'(g(x))^n (-1)^{n+1} (|\alpha| - 1) \prod_{k=1}^n \alpha_k x_k^{n\alpha_k - 2}
= \Bigl(1 + \frac{g(x) f''(g(x))}{f'(g(x))}\,\frac{|\alpha|}{|\alpha| - 1}\Bigr) f'(g(x))^n (-1)^{n+1} (|\alpha| - 1) \prod_{k=1}^n \alpha_k x_k^{n\alpha_k - 2}.
\]
By a direct computation, using (5), we obtain
\[
\nabla g(x)^T \cdot Hg(x)^{-1} \cdot \nabla g(x)
= g(x) \operatorname{row}\frac{\alpha_i}{x_i} \cdot \frac{-1}{g(x)}\Bigl(\operatorname{diag}\frac{x_i^2}{\alpha_i} + \frac{\operatorname{mat} x_i x_j}{1 - |\alpha|}\Bigr) \cdot g(x) \operatorname{col}\frac{\alpha_i}{x_i}
= -g(x)\,\frac{|\alpha|}{1 - |\alpha|} = g(x)\,\frac{|\alpha|}{|\alpha| - 1} \tag{6}
\]
and
\[
Hg(x)^{-1} \cdot \nabla g(x) \cdot \nabla g(x)^T
= \frac{-1}{g(x)}\Bigl(\operatorname{diag}\frac{x_i^2}{\alpha_i} + \frac{\operatorname{mat} x_i x_j}{1 - |\alpha|}\Bigr) \cdot g(x) \operatorname{col}\frac{\alpha_i}{x_i} \cdot g(x) \operatorname{row}\frac{\alpha_j}{x_j}
\]
\[
= -g(x)\Bigl(\operatorname{diag}\frac{x_i^2}{\alpha_i} \cdot \operatorname{col}\frac{\alpha_i}{x_i} \cdot \operatorname{row}\frac{\alpha_j}{x_j} + \frac{1}{1 - |\alpha|} \operatorname{mat} x_i x_j \cdot \operatorname{col}\frac{\alpha_i}{x_i} \cdot \operatorname{row}\frac{\alpha_j}{x_j}\Bigr)
= \frac{g(x)}{|\alpha| - 1} \operatorname{mat} \frac{\alpha_j x_i}{x_j}. \tag{7}
\]
Then (6), (7) and Theorem 2 imply
\[
HF(x)^{-1} = \Biggl(I - \frac{Hg(x)^{-1} \cdot \nabla g(x) \cdot \nabla g(x)^T}{\dfrac{f'(g(x))}{f''(g(x))} + \nabla g(x)^T \cdot Hg(x)^{-1} \cdot \nabla g(x)}\Biggr) \cdot \frac{Hg(x)^{-1}}{f'(g(x))}
= \Biggl(I - \frac{\frac{g(x)}{|\alpha|-1} \operatorname{mat}\frac{\alpha_j x_i}{x_j}}{\dfrac{f'(g(x))}{f''(g(x))} + g(x)\frac{|\alpha|}{|\alpha|-1}}\Biggr) \cdot \frac{-1}{g(x) f'(g(x))}\Bigl(\operatorname{diag}\frac{x_i^2}{\alpha_i} + \frac{\operatorname{mat} x_i x_j}{1 - |\alpha|}\Bigr)
\]
\[
= \frac{1}{g(x) f'(g(x))}\Bigl(\beta \operatorname{mat}\frac{\alpha_j x_i}{x_j} - I\Bigr)\Bigl(\operatorname{diag}\frac{x_i^2}{\alpha_i} + \frac{\operatorname{mat} x_i x_j}{1 - |\alpha|}\Bigr), \tag{8}
\]
where
\[
\beta = \frac{\frac{g(x)}{|\alpha|-1}}{\dfrac{f'(g(x))}{f''(g(x))} + g(x)\frac{|\alpha|}{|\alpha|-1}}.
\]
Now we have
\[
\beta \operatorname{mat}\frac{\alpha_j x_i}{x_j} \cdot \operatorname{diag}\frac{x_j^2}{\alpha_j} = \beta \operatorname{mat} x_i x_j, \tag{9}
\]
\[
\beta \operatorname{mat}\frac{\alpha_j x_i}{x_j} \cdot \frac{\operatorname{mat} x_i x_j}{1 - |\alpha|} = \frac{\beta |\alpha|}{1 - |\alpha|} \operatorname{mat} x_i x_j, \tag{10}
\]
\[
-I \cdot \operatorname{diag}\frac{x_i^2}{\alpha_i} = -\operatorname{mat}\frac{\delta_{ij}}{\alpha_i} x_i x_j, \tag{11}
\]
\[
-I \cdot \frac{\operatorname{mat} x_i x_j}{1 - |\alpha|} = \frac{1}{|\alpha| - 1} \operatorname{mat} x_i x_j. \tag{12}
\]
From the identity
\[
\beta + \frac{\beta |\alpha|}{1 - |\alpha|} = \frac{\beta}{1 - |\alpha|}
\]
we obtain that the sum of (9) and (10) equals
\[
\frac{\beta}{1 - |\alpha|} \operatorname{mat} x_i x_j
= \frac{-1}{\dfrac{f'(g(x))}{g(x) f''(g(x))}(|\alpha| - 1)^2 + |\alpha|(|\alpha| - 1)} \operatorname{mat} x_i x_j. \tag{13}
\]
Finally, (8), (11), (12) and (13) imply that
\[
HF(x)^{-1} = \frac{1}{g(x) f'(g(x))} \operatorname{mat}\Biggl(\biggl(\frac{1}{|\alpha| - 1} - \frac{1}{\dfrac{f'(g(x))}{g(x) f''(g(x))}(|\alpha| - 1)^2 + |\alpha|(|\alpha| - 1)}\biggr) x_i x_j - \frac{\delta_{ij}}{\alpha_i}\, x_i x_j\Biggr).
\]

Acknowledgement. The authors were supported by grant GAČR P201/12/2351 of the Czech Science Foundation and by grant SGS08/PřF/2014 of the University of Ostrava.

References

[1] Ch.M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.

[2] B.-Y. Chen. An Explicit Formula of Hessian Determinants of Composite Functions and Its Applications. Kragujevac Journal of Mathematics, 36 (1), 2012, 27-39.

[3] J. Ding, A. Zhou.
Eigenvalues of rank-one updated matrices with some applications. Applied Mathematics Letters, 20 (12), 2007, 1223-1226.

[4] J. Milnor. Morse Theory. Princeton University Press, 1969.

[5] W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery. Numerical Recipes: The Art of Scientific Computing (3rd ed.). Cambridge University Press, New York, 2007.

[6] P. Trojovský, E. Hladíková. On the Hessian of the Exponential Function with n Variables. International Journal of Pure and Applied Mathematics, 66 (3), 2011, 287-296.

Received: October 19, 2014; Published: December 1, 2014