Lecture Slides for Introduction to Machine Learning 2e, ETHEM ALPAYDIN © The MIT Press, 2010
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e

Neural Networks

- Networks of processing units (neurons) with connections (synapses) between them
- Large number of neurons: $10^{10}$
- Large connectivity: $10^5$ connections per neuron
- Parallel processing
- Distributed computation/memory
- Robust to noise, failures

Understanding the Brain

- Levels of analysis (Marr, 1982):
  1. Computational theory
  2. Representation and algorithm
  3. Hardware implementation
- Reverse engineering: from hardware to theory
- Parallel processing: SIMD vs MIMD
- Neural net: SIMD with modifiable local memory
- Learning: update by training/experience

Perceptron (Rosenblatt, 1962)

$y = \sum_{j=1}^{d} w_j x_j + w_0 = \mathbf{w}^T \mathbf{x}$

where $\mathbf{w} = [w_0, w_1, \ldots, w_d]^T$ and $\mathbf{x} = [1, x_1, \ldots, x_d]^T$, with $x_0 = +1$ as the bias unit.

What a Perceptron Does

- Regression: $y = wx + w_0$
- Classification: $y = \mathbf{1}(wx + w_0 > 0)$
- With a smooth output: $y = \mathrm{sigmoid}(o) = \dfrac{1}{1 + \exp(-\mathbf{w}^T \mathbf{x})}$

K Outputs

Regression:
$y_i = \sum_{j=1}^{d} w_{ij} x_j + w_{i0} = \mathbf{w}_i^T \mathbf{x}$, i.e., $\mathbf{y} = \mathbf{W}\mathbf{x}$

Classification:
$o_i = \mathbf{w}_i^T \mathbf{x}$, $\quad y_i = \dfrac{\exp o_i}{\sum_k \exp o_k}$; choose $C_i$ if $y_i = \max_k y_k$

Training

- Online (instances seen one by one) vs batch (whole sample) learning:
  - No need to store the whole sample
  - Problem may change in time
  - Wear and degradation in system components
- Stochastic gradient descent: update after a single pattern
- Generic update rule (LMS rule):
$\Delta w_{ij}^t = \eta (r_i^t - y_i^t) x_j^t$
Update = LearningFactor × (DesiredOutput − ActualOutput) × Input

Training a Perceptron: Regression

Regression (linear output):
$E^t(\mathbf{w} \mid \mathbf{x}^t, r^t) = \frac{1}{2}(r^t - y^t)^2 = \frac{1}{2}(r^t - \mathbf{w}^T\mathbf{x}^t)^2$
$\Delta w_j^t = \eta (r^t - y^t) x_j^t$

Classification

Single sigmoid output:
$y^t = \mathrm{sigmoid}(\mathbf{w}^T \mathbf{x}^t)$
$E^t(\mathbf{w} \mid \mathbf{x}^t, r^t) = -r^t \log y^t - (1 - r^t)\log(1 - y^t)$
$\Delta w_j^t = \eta (r^t - y^t) x_j^t$

K > 2 softmax outputs:
$y_i^t = \dfrac{\exp \mathbf{w}_i^T \mathbf{x}^t}{\sum_k \exp \mathbf{w}_k^T \mathbf{x}^t}$
$E^t(\{\mathbf{w}_i\}_i \mid \mathbf{x}^t, \mathbf{r}^t) = -\sum_i r_i^t \log y_i^t$
$\Delta w_{ij}^t = \eta (r_i^t - y_i^t) x_j^t$

Learning Boolean AND

[Figure: a perceptron learning the Boolean AND function]

XOR

No $w_0, w_1, w_2$ satisfy:
$w_0 \le 0$
$w_2 + w_0 > 0$
$w_1 + w_0 > 0$
$w_1 + w_2 + w_0 \le 0$
(Minsky and Papert, 1969)

Multilayer Perceptrons (Rumelhart et al., 1986)

$y_i = \mathbf{v}_i^T \mathbf{z} = \sum_{h=1}^{H} v_{ih} z_h + v_{i0}$
$z_h = \mathrm{sigmoid}(\mathbf{w}_h^T \mathbf{x}) = \dfrac{1}{1 + \exp\left[-\left(\sum_{j=1}^{d} w_{hj} x_j + w_{h0}\right)\right]}$

x1 XOR x2 = (x1 AND ~x2) OR (~x1 AND x2)

Backpropagation

Applying the chain rule to the two-layer equations above:
$\dfrac{\partial E}{\partial w_{hj}} = \dfrac{\partial E}{\partial y_i} \dfrac{\partial y_i}{\partial z_h} \dfrac{\partial z_h}{\partial w_{hj}}$

Regression

$E(\mathbf{W}, \mathbf{v} \mid \mathcal{X}) = \frac{1}{2} \sum_t (r^t - y^t)^2$, where $y^t = \sum_{h=1}^{H} v_h z_h^t + v_0$

Forward: $z_h = \mathrm{sigmoid}(\mathbf{w}_h^T \mathbf{x})$
Backward:
$\Delta v_h = \eta \sum_t (r^t - y^t) z_h^t$
$\Delta w_{hj} = -\eta \dfrac{\partial E}{\partial w_{hj}} = \eta \sum_t (r^t - y^t) v_h z_h^t (1 - z_h^t) x_j^t$
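As a concrete illustration of these update equations, here is a minimal sketch (my own, not from the slides) of a single-hidden-layer MLP trained online on XOR. The network size H, the learning factor eta, and the epoch count are illustrative assumptions.

```python
# A sketch of the backpropagation updates above: a single-hidden-layer
# MLP with a linear output, trained online (one pattern at a time) on XOR.
# H, eta, and the epoch count are illustrative choices, not from the text.
import numpy as np

rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs x^t
r = np.array([0.0, 1.0, 1.0, 0.0])                           # targets r^t (XOR)

d, H, eta = 2, 4, 0.5
W = rng.uniform(-0.1, 0.1, (H, d + 1))  # hidden weights w_h; column 0 is the bias w_h0
v = rng.uniform(-0.1, 0.1, H + 1)       # output weights v_h; index 0 is the bias v_0

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

for epoch in range(10000):
    for t in rng.permutation(len(X)):        # stochastic gradient descent
        x = np.concatenate(([1.0], X[t]))    # x = (1, x_1, ..., x_d)^T
        z = sigmoid(W @ x)                   # forward: z_h = sigmoid(w_h^T x)
        z1 = np.concatenate(([1.0], z))
        y = v @ z1                           # linear output (regression)

        delta = r[t] - y                     # r^t - y^t
        # backward: compute Delta w_hj with the current v_h, then apply both
        dW = eta * delta * np.outer(v[1:] * z * (1 - z), x)
        v += eta * delta * z1                # Delta v_h = eta (r^t - y^t) z_h^t
        W += dW                              # Delta w_hj = eta (r^t - y^t) v_h z_h (1 - z_h) x_j

for t in range(len(X)):                      # report the learned mapping
    x = np.concatenate(([1.0], X[t]))
    print(X[t], v @ np.concatenate(([1.0], sigmoid(W @ x))))
```

On a successful run the four outputs approach 0, 1, 1, 0. Since the error surface has poor local minima, some random initializations may fail, which is why training is usually repeated from several starting points.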
Regression with Multiple Outputs

$E(\mathbf{W}, \mathbf{V} \mid \mathcal{X}) = \frac{1}{2} \sum_t \sum_i (r_i^t - y_i^t)^2$, where $y_i^t = \sum_{h=1}^{H} v_{ih} z_h^t + v_{i0}$
$\Delta v_{ih} = \eta \sum_t (r_i^t - y_i^t) z_h^t$
$\Delta w_{hj} = \eta \sum_t \left[\sum_i (r_i^t - y_i^t) v_{ih}\right] z_h^t (1 - z_h^t) x_j^t$

[Figure: the two-layer network, with weights $w_{hj}$ from inputs $x_j$ to hidden units $z_h$ and weights $v_{ih}$ from hidden units to outputs $y_i$]

[Figures: evolution of the fit and the error during training; hidden unit inputs $\mathbf{w}_h^T\mathbf{x} + w_{h0}$, activations $z_h$, and weighted contributions $v_h z_h$]

Two-Class Discrimination

One sigmoid output $y^t$ for $P(C_1 \mid \mathbf{x}^t)$, with $P(C_2 \mid \mathbf{x}^t) \equiv 1 - y^t$:
$y^t = \mathrm{sigmoid}\left(\sum_{h=1}^{H} v_h z_h^t + v_0\right)$
$E(\mathbf{W}, \mathbf{v} \mid \mathcal{X}) = -\sum_t \left[ r^t \log y^t + (1 - r^t) \log(1 - y^t) \right]$
$\Delta v_h = \eta \sum_t (r^t - y^t) z_h^t$
$\Delta w_{hj} = \eta \sum_t (r^t - y^t) v_h z_h^t (1 - z_h^t) x_j^t$

K > 2 Classes

$o_i^t = \sum_{h=1}^{H} v_{ih} z_h^t + v_{i0}$, $\quad y_i^t = \dfrac{\exp o_i^t}{\sum_k \exp o_k^t}$, approximating $P(C_i \mid \mathbf{x}^t)$
$E(\mathbf{W}, \mathbf{V} \mid \mathcal{X}) = -\sum_t \sum_i r_i^t \log y_i^t$
$\Delta v_{ih} = \eta \sum_t (r_i^t - y_i^t) z_h^t$
$\Delta w_{hj} = \eta \sum_t \left[\sum_i (r_i^t - y_i^t) v_{ih}\right] z_h^t (1 - z_h^t) x_j^t$

Multiple Hidden Layers

An MLP with one hidden layer is a universal approximator (Hornik et al., 1989), but using multiple layers may lead to simpler networks:
$z_{1h} = \mathrm{sigmoid}(\mathbf{w}_{1h}^T \mathbf{x}) = \mathrm{sigmoid}\left(\sum_{j=1}^{d} w_{1hj} x_j + w_{1h0}\right), \quad h = 1, \ldots, H_1$
$z_{2l} = \mathrm{sigmoid}(\mathbf{w}_{2l}^T \mathbf{z}_1) = \mathrm{sigmoid}\left(\sum_{h=1}^{H_1} w_{2lh} z_{1h} + w_{2l0}\right), \quad l = 1, \ldots, H_2$
$y = \mathbf{v}^T \mathbf{z}_2 = \sum_{l=1}^{H_2} v_l z_{2l} + v_0$

Improving Convergence

- Momentum: $\Delta w_i^t = -\eta \dfrac{\partial E^t}{\partial w_i} + \alpha \Delta w_i^{t-1}$
- Adaptive learning rate: $\Delta\eta = \begin{cases} +a & \text{if } E^{t+\tau} < E^t \\ -b\eta & \text{otherwise} \end{cases}$

Overfitting/Overtraining

Number of weights: $H(d+1) + (H+1)K$

[Figures: error versus number of hidden units, and error versus training epochs, illustrating overtraining]

Structured MLP (Le Cun et al., 1989)

[Figure: a structured MLP with local, hierarchical connectivity]

Weight Sharing

[Figure: the same weights shared across local receptive fields]

Hints (Abu-Mostafa, 1995)

- Invariance to translation, rotation, size
- Virtual examples
- Augmented error: $E' = E + \lambda_h E_h$
- If $\mathbf{x}'$ and $\mathbf{x}$ are the "same": $E_h = [g(\mathbf{x} \mid \theta) - g(\mathbf{x}' \mid \theta)]^2$
- Approximation hint:
$E_h = \begin{cases} 0 & \text{if } g(\mathbf{x} \mid \theta) \in [a_x, b_x] \\ (g(\mathbf{x} \mid \theta) - a_x)^2 & \text{if } g(\mathbf{x} \mid \theta) < a_x \\ (g(\mathbf{x} \mid \theta) - b_x)^2 & \text{if } g(\mathbf{x} \mid \theta) > b_x \end{cases}$

Tuning the Network Size

- Destructive: weight decay
$\Delta w_i = -\eta \dfrac{\partial E}{\partial w_i} - \lambda w_i, \quad E' = E + \dfrac{\lambda}{2} \sum_i w_i^2$
- Constructive: growing networks (Ash, 1989; Fahlman and Lebiere, 1989)

Bayesian Learning

Consider the weights $w_i$ as random variables with prior $p(w_i)$:
$p(\mathbf{w} \mid \mathcal{X}) = \dfrac{p(\mathcal{X} \mid \mathbf{w})\, p(\mathbf{w})}{p(\mathcal{X})}$
$\hat{\mathbf{w}}_{MAP} = \arg\max_{\mathbf{w}} \log p(\mathbf{w} \mid \mathcal{X})$
$\log p(\mathbf{w} \mid \mathcal{X}) = \log p(\mathcal{X} \mid \mathbf{w}) + \log p(\mathbf{w}) + C$
With $p(\mathbf{w}) = \prod_i p(w_i)$ and the Gaussian prior $p(w_i) = c \exp\left[-\dfrac{w_i^2}{2(1/2\lambda)}\right]$, this gives $E' = E + \lambda \|\mathbf{w}\|^2$.
- Weight decay, ridge regression, regularization: cost = data-misfit + λ · complexity
- More about Bayesian methods in Chapter 14

Dimensionality Reduction

[Figures: an autoencoder MLP trained to reproduce its input, whose bottleneck hidden layer learns a low-dimensional encoding, and the resulting hidden representations of the data]
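The autoencoder idea in the figure can be sketched directly with the multiple-output regression updates above: train the network to reproduce its own input through a bottleneck of H < d hidden units. This is my illustration, not code from the text; the synthetic data and hyperparameters are assumptions.

```python
# A sketch of dimensionality reduction with an autoencoder MLP: inputs are
# also the targets, and the H < d hidden unit values z_h form a learned
# low-dimensional encoding. Data construction and hyperparameters are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))                    # 200 samples, d = 5 inputs
X[:, 2:] = X[:, :2] @ rng.normal(size=(2, 3))    # data really lies near 2 dims

d, H, eta = X.shape[1], 2, 0.01                  # bottleneck of H = 2 hidden units
W = rng.uniform(-0.1, 0.1, (H, d + 1))           # encoder weights (col 0: biases)
V = rng.uniform(-0.1, 0.1, (d, H + 1))           # decoder weights (col 0: biases)

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

for epoch in range(2000):
    for t in rng.permutation(len(X)):
        x = np.concatenate(([1.0], X[t]))
        z = sigmoid(W @ x)                       # encoding z in R^H
        z1 = np.concatenate(([1.0], z))
        y = V @ z1                               # linear reconstruction of x
        e = X[t] - y                             # per-output error r_i^t - y_i^t
        # multiple-output backprop updates, as in the regression slide
        dW = eta * np.outer((V[:, 1:].T @ e) * z * (1 - z), x)
        V += eta * np.outer(e, z1)
        W += dW
```

After training, the two z_h values per input serve as a learned two-dimensional encoding, which is the role the bottleneck hidden layer plays in the figure.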
Learning Time

Applications:
- Sequence recognition: speech recognition
- Sequence reproduction: time-series prediction
- Sequence association

Network architectures:
- Time-delay networks (Waibel et al., 1989)
- Recurrent networks (Rumelhart et al., 1986)

Time-Delay Neural Networks

[Figure: a time-delay neural network fed delayed copies of the input]

Recurrent Networks

[Figure: recurrent network architectures with feedback connections]

Unfolding in Time

[Figure: a recurrent network unfolded in time into an equivalent feedforward network]
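To make the unfolding concrete, here is a small sketch (my illustration, not from the slides) of a simple recurrent network's forward pass: one copy of the same step, with shared weights, is applied per time step, which is exactly the unfolded feedforward network through which backpropagation through time would run. Weight shapes and dimensions are illustrative assumptions.

```python
# A sketch of a simple recurrent network: the hidden state z^t depends on the
# current input x^t and the previous state z^{t-1}. Unfolding in time applies
# this same step T times with shared weights.
import numpy as np

rng = np.random.default_rng(2)
d, H = 3, 4                                # input and hidden dimensions
W_x = rng.uniform(-0.1, 0.1, (H, d))       # input-to-hidden weights
W_z = rng.uniform(-0.1, 0.1, (H, H))       # hidden-to-hidden (recurrent) weights
v = rng.uniform(-0.1, 0.1, H)              # hidden-to-output weights

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def forward(x_seq):
    """Run the unfolded network over a sequence: one copy of the step per t."""
    z = np.zeros(H)                        # initial state z^0
    for x in x_seq:
        z = sigmoid(W_x @ x + W_z @ z)     # recurrent state update
    return v @ z                           # output after the last step

print(forward(rng.normal(size=(5, d))))    # a length-5 input sequence
```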