Hand-written character recognition

• MNIST: a data set of hand-written digits
  − 60,000 training samples
  − 10,000 test samples
  − Each sample consists of 28 x 28 = 784 pixels
• Various techniques have been tried (failure rate on the test samples):
  − Linear classifier: 12.0%
  − 2-layer BP net (300 hidden nodes): 4.7%
  − 3-layer BP net (300+200 hidden nodes): 3.05%
  − Support vector machine (SVM): 1.4%
  − Convolutional net: 0.4%
  − 6-layer BP net (7,500 hidden nodes): 0.35%

Hand-written character recognition

• Our own experiment:
  − BP learning with a 784-300-10 architecture
  − Total # of weights: 784*300 + 300*10 = 238,200
  − Total # of Δw computed per epoch: ≈ 1.4 × 10^10
  − Ran for 1 month before it stopped
  − Test error rate: 5.0%

Risk-Averting Error Function

• Mean Squared Error (MSE)

  Q(w) = (1/K) Σ_{k=1}^{K} (y_k − f̂(x_k, w))² = (1/K) Σ_{k=1}^{K} g_k²(w)

  where g_k(w) = y_k − f̂(x_k, w) and (x_k, y_k), k = 1, …, K, are the training data pairs.

• Risk-Averting Error (RAE)

  J_λ(w) = (1/K) Σ_{k=1}^{K} exp(λ (y_k − f̂(x_k, w))²) = (1/K) Σ_{k=1}^{K} exp(λ g_k²(w))

  λ: risk-sensitivity index (RSI)

1. James Ting-Ho Lo. Convexification for data fitting. Journal of Global Optimization, 46(2):307–315, February 2010.

Normalized Risk-Averting Error

• Normalized Risk-Averting Error (NRAE)

  C_λ(w) = (1/λ) ln J_λ(w)

• It can be simplified as

  C_λ(w) = g_M(w) + (1/λ) [ln h_λ(w) − ln K]

  where
  g_M(w) := max_{1 ≤ k ≤ K} g_k²(w)
  h_λ(w) := Σ_{k=1}^{K} exp(λ v_k(w))
  v_k(w) := g_k²(w) − g_M(w)

The Broyden–Fletcher–Goldfarb–Shanno (BFGS) Method

• A quasi-Newton method for solving nonlinear optimization problems
• Uses first-order gradient information to build an approximation to the Hessian (second-order derivative) matrix
• Avoiding the computation of the exact Hessian significantly reduces the computational cost of the optimization

The Broyden–Fletcher–Goldfarb–Shanno (BFGS) Method

The BFGS algorithm:
1. Generate an initial guess x_0 and an initial approximate Hessian matrix B_0 = I.
2. Obtain a search direction p_k at step k by solving
   B_k p_k = −∇f(x_k),
   where ∇f(x_k) is the gradient of the objective function f(x) evaluated at x_k.
3. Perform a line search to find an acceptable step size α_k in the direction p_k, then update
   x_{k+1} = x_k + α_k p_k.

The Broyden–Fletcher–Goldfarb–Shanno (BFGS) Method

4. Set s_k = α_k p_k and y_k = ∇f(x_{k+1}) − ∇f(x_k).
5. Update the approximate Hessian matrix:
   B_{k+1} = B_k + (y_k y_kᵀ)/(y_kᵀ s_k) − (B_k s_k s_kᵀ B_k)/(s_kᵀ B_k s_k)
6. Repeat steps 2–5 until x_k converges to the solution. Convergence can be checked by observing the norm of the gradient, ‖∇f(x_k)‖.

The Broyden–Fletcher–Goldfarb–Shanno (BFGS) Method

Limited-memory BFGS (L-BFGS) method:
• A variation of the BFGS method
• Uses only a few vectors to represent the approximation of the Hessian matrix implicitly
• Requires much less memory
• Well suited to optimization problems with a large number of variables

References
1. J. T. Lo and D. Bassu. An adaptive method of training multilayer perceptrons. In Proceedings of the 2001 International Joint Conference on Neural Networks, volume 3, pages 2013–2018, July 2001.
2. James Ting-Ho Lo. Convexification for data fitting. Journal of Global Optimization, 46(2):307–315, February 2010.
3. BFGS: http://en.wikipedia.org/wiki/BFGS

A Notch Function

  y = f(x) = χ_{[1,2] ∪ [2.1,3.1]}(x)

  where χ_A(x) = 1 if x ∈ A and χ_A(x) = 0 if x ∉ A.

MSE vs. RAE
(comparison plots)
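To make the MSE, RAE, and NRAE definitions above concrete, here is a minimal NumPy sketch. It is not part of the original slides: the `error_functions` helper and the toy targets/outputs are assumptions added for illustration. It computes Q(w), J_λ(w), and the simplified C_λ(w) for a batch of residuals, and shows why the shifted (log-sum-exp) form of the NRAE stays finite when the RAE overflows for large λ.

import numpy as np

def error_functions(y, y_hat, lam):
    """MSE Q(w), RAE J_lambda(w), and NRAE C_lambda(w) for targets y and model outputs y_hat."""
    g = y - y_hat                        # residuals g_k(w) = y_k - f^(x_k, w)
    K = g.size
    mse = np.mean(g**2)                  # Q(w) = (1/K) sum_k g_k^2(w)
    with np.errstate(over="ignore"):     # RAE may overflow to inf for large lam; that is the point
        rae = np.mean(np.exp(lam * g**2))
    # NRAE via the simplified form: shift by g_M(w) = max_k g_k^2(w) (a log-sum-exp trick)
    g_max = np.max(g**2)
    v = g**2 - g_max                     # v_k(w) <= 0, so exp(lam * v_k) never overflows
    nrae = g_max + (np.log(np.sum(np.exp(lam * v))) - np.log(K)) / lam
    return mse, rae, nrae

y     = np.array([0.0, 1.0, 0.0, 1.0])        # toy targets (illustrative only)
y_hat = np.array([0.1, 0.4, 0.05, 0.9])       # toy network outputs
print(error_functions(y, y_hat, lam=10.0))    # moderate lambda: all three values are finite
print(error_functions(y, y_hat, lam=1e6)[2])  # huge lambda: RAE is inf, NRAE ~ max_k g_k^2(w)

Shifting by g_M(w) before exponentiating is exactly the simplification on the NRAE slide; it is what makes very large risk-sensitivity indices usable in practice.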
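The BFGS steps listed above can also be illustrated with a short NumPy sketch. This is an illustrative toy implementation, not the code used in the experiments: the backtracking (Armijo) line search, the tolerance values, and the Rosenbrock test problem are assumptions added here for completeness.

import numpy as np

def bfgs(f, grad, x0, tol=1e-6, max_iter=100):
    """Minimal BFGS loop following steps 1-6 above (illustrative, not production code)."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                          # step 1: B_0 = I (approximate Hessian)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:             # step 6: stop when ||grad f(x_k)|| is small
            break
        p = np.linalg.solve(B, -g)              # step 2: solve B_k p_k = -grad f(x_k)
        alpha = 1.0                             # step 3: backtracking (Armijo) line search
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p):
            alpha *= 0.5
        x_new = x + alpha * p
        s = x_new - x                           # step 4: s_k = alpha_k p_k
        y = grad(x_new) - g                     #          y_k = grad f(x_{k+1}) - grad f(x_k)
        if y @ s > 1e-10:                       # step 5: rank-two update (keeps B positive definite)
            B = B + np.outer(y, y) / (y @ s) - (B @ np.outer(s, s) @ B) / (s @ B @ s)
        x = x_new
    return x

# Usage on the Rosenbrock function (a standard test problem, not from the slides)
rosen      = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
rosen_grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                                 200 * (x[1] - x[0]**2)])
print(bfgs(rosen, rosen_grad, x0=[-1.2, 1.0]))   # should approach the minimizer [1, 1]

For a network with 238,200 weights, as in the experiment above, storing and updating the full B_k matrix would be impractical; that is precisely the situation the limited-memory variant (L-BFGS) is designed for.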