The role of optimization in machine learning
Dominik Csiba, MLMU Bratislava, 19 April 2017


Motivation


Linear models


Linear models as optimization
Both LASSO and logistic regression fit a linear model by minimizing a loss over the training features and labels plus a regularizer:
◦ LASSO: min_w (1/2n) Σ_i (x_i^T w − y_i)^2 + λ ||w||_1
◦ Logistic regression: min_w (1/n) Σ_i log(1 + exp(−y_i x_i^T w)) + (λ/2) ||w||^2
Here the x_i are the features, the y_i are the labels, the sum is the loss function, and the last term is the regularizer.


Dimensionality Reduction - PCA


PCA as optimization
PCA can also be written as an optimization problem over the features: find the principal components that minimize the reconstruction loss subject to an orthogonality constraint,
◦ min_W ||X − X W W^T||_F^2 subject to W^T W = I,
where the columns of W are the principal components.


Matrix completion
[Figure: a partially observed matrix with entries in {−1, +1}; most entries are missing.]


Matrix completion as optimization
Find a low-rank matrix that matches the observed matrix on the observed indices:
◦ min_X Σ_{(i,j) ∈ Ω} (X_ij − M_ij)^2 subject to rank(X) ≤ r,
where M is the observed matrix, Ω is the set of observed indices, the sum is the loss function, and the rank bound is the constraint.


Real-time bidding in online advertising
[Figure: a real-time auction for a banner slot on a webpage, with bids of $0.10, $0.05, and $0.01 from competing advertisers.]


Real-time bidding as optimization
Auction requests arrive sequentially. We choose an allocation from the set of feasible allocations so as to maximize a utility function, while keeping the spent budget within its limit; budget violations enter the objective as a penalty.


Optimization in Machine Learning
Most of the learning boils down to an optimization problem.
Attracts many mathematicians
◦ Main topic of my PhD thesis
◦ Optimization has its own ecosystem in the machine learning community
Treated as a black box by a lot of practitioners
◦ The black box gets faster and faster each year
◦ Optimization is a consideration only if .fit() has issues in learning
Next: understanding some of the scenarios with such issues


Supervised learning


Main idea
[Figure: example images with ground-truth labels CAT, WORLD, BREAD, CAT, DOG; the task is to map examples to labels.]


Learning the true predictor
Empirical Risk Minimization (ERM):
◦ Samples: (x_1, y_1), …, (x_n, y_n) drawn i.i.d. from an unknown distribution D
◦ ERM: ĥ = argmin_{h ∈ H} (1/n) Σ_i ℓ(h(x_i), y_i), where H is the hypothesis class and ℓ is the loss function
◦ Hopefully: ĥ is close to the minimizer of the true risk E_{(x,y)~D}[ℓ(h(x), y)]
◦ Usual form: min_w f(w) := (1/n) Σ_i f_i(w) + λ R(w). Solve this!


How does optimization work?
Intuition: find the bottom of a valley in a dense fog using a teleport, where you can ask for the following information:
◦ 0th order info: the altitude at any location (function values)
◦ 1st order info: the slope at any location (gradients)
◦ 2nd order info: the curvature at any location (Hessians)
◦ …
In machine learning, first-order methods are by far the most popular.


Going down the hill – Gradient Descent
By far the most popular iterative scheme:
◦ w_{k+1} = w_k − η ∇f(w_k)
Intuition: we take a step down the hill.
◦ Stepsize η: in some cases given by theory, otherwise difficult to pick: too small and progress is very slow, too big and the iterates overshoot the valley. A sketch of the update follows below.
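To make the update concrete, here is a minimal gradient-descent sketch in Python (my own illustration, not code from the talk); the least-squares objective, the stepsize, and the iteration count are illustrative choices.

```python
import numpy as np

def gradient_descent(grad_f, w0, stepsize=0.1, n_iters=500):
    """Plain gradient descent: w_{k+1} = w_k - stepsize * grad_f(w_k)."""
    w = w0.copy()
    for _ in range(n_iters):
        w = w - stepsize * grad_f(w)
    return w

# Toy least-squares objective f(w) = 1/(2n) ||Xw - y||^2 on synthetic data.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(100)
grad = lambda w: X.T @ (X @ w - y) / len(y)

w_hat = gradient_descent(grad, np.zeros(5))
print("final objective:", 0.5 * np.mean((X @ w_hat - y) ** 2))
```

With a stepsize that is too large the iterates diverge; with one that is too small the loop needs many more iterations, which is exactly the trade-off from the slide.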
Big data


Large-scale data
For the usual form f(w) = (1/n) Σ_i f_i(w), one gradient descent step requires all #dimensions partial derivatives for each of the #examples component functions, so its cost scales with both.


Gradient Descent
Do the best we can using first-order information: a "wrong step in a smart direction".
◦ Stepsize: constant
◦ Iteration cost depends on both the dimension and the number of examples!
[Figure: contour plot with numbered gradient-descent iterates moving towards the minimum.]


Randomized Coordinate Descent
Update only one randomly chosen coordinate at a time: a "smart step in a wrong direction".
◦ Stepsize: a constant for every coordinate
◦ Iteration cost is independent of the dimension!
[Figure: contour plot with numbered axis-aligned (N/E/W/S) coordinate-descent steps.]


Stochastic Gradient Descent
Update using only one random example at a time: a "wrong step in a wrong direction".
◦ w_{k+1} = w_k − η_k ∇f_i(w_k), with i sampled uniformly at random
◦ Stepsize: decaying
◦ Iteration cost is independent of the number of examples!
[Figure: contour plot with numbered SGD iterates zig-zagging towards the minimum.]


Magic of SGD explained
Correct direction in expectation: E_i[∇f_i(w)] = ∇f(w).
However, the variance of ∇f_i(w) is not vanishing as we approach the optimum, so the stepsize has to be decaying; otherwise SGD will not converge.
For GD/RCD there is no such issue, because the full gradient itself vanishes at the optimum: ∇f(w*) = 0.


Stochastic Variance Reduced Gradient
A new method was proposed in 2013 (Johnson and Zhang) to reduce the variance of SGD.
Outer loop (repeat forever):
◦ Store the current iterate as the snapshot w̃. Compute and store the full gradient ∇f(w̃).
◦ Inner loop (repeat K times):
  ◦ sample i uniformly at random
  ◦ perform the update w ← w − η (∇f_i(w) − ∇f_i(w̃) + ∇f(w̃))
Correct direction! In expectation the update direction equals ∇f(w), and its variance vanishes as w and w̃ approach the optimum, so a constant stepsize suffices. A sketch of SVRG follows below.
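As an illustration (again not code from the talk), here is a minimal SVRG sketch in Python; the least-squares objective, the stepsize, and the loop lengths are illustrative assumptions.

```python
import numpy as np

def svrg(grad_i, n, w0, stepsize, n_epochs=20, inner_K=None):
    """SVRG sketch. The inner update direction
       grad_i(w, i) - grad_i(w_snap, i) + full_grad
    equals the full gradient in expectation, and its variance shrinks
    as both w and the snapshot w_snap approach the optimum."""
    w = w0.copy()
    K = inner_K if inner_K is not None else 2 * n
    rng = np.random.default_rng(0)
    for _ in range(n_epochs):
        w_snap = w.copy()
        full_grad = sum(grad_i(w_snap, i) for i in range(n)) / n
        for _ in range(K):
            i = rng.integers(n)
            w = w - stepsize * (grad_i(w, i) - grad_i(w_snap, i) + full_grad)
    return w

# Toy use on least squares: f_i(w) = 0.5 * (x_i^T w - y_i)^2.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
y = X @ rng.standard_normal(10)
grad_i = lambda w, i: (X[i] @ w - y[i]) * X[i]

w_hat = svrg(grad_i, n=200, w0=np.zeros(10), stepsize=0.02)
print("objective:", 0.5 * np.mean((X @ w_hat - y) ** 2))
```

Compared to plain SGD, each outer loop pays one full-gradient computation, but the constant stepsize typically makes the overall convergence on finite sums much faster.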
Distributed problems


Distributed framework
The data is split across nodes 1, …, K, which communicate through a master node. Communication is very expensive!
◦ Naïve approach 1: distributed GD (compute partial gradients locally, aggregate on the master every iteration)
◦ Naïve approach 2: one-shot averaging (each node solves its local problem, average the solutions once)


Distributed convergence rates
Standard convergence time measure: (number of iterations) × (computation cost per iteration).
Distributed convergence time measure: (number of communication rounds) × (communication cost + local computation per round).
Ideally the two are similar.


Distributed methods: Intuition
Iterate over the following steps:
1. Compute the minimizers of the local objectives
2. Send the minimizers to the master node
3. Create a new local objective for each node based on the other minimizers (local estimates of the global objective)
4. Distribute the local objectives back to the local nodes


Complex objectives


Deep Learning / Neural Networks
For a practitioner: the ultimate tool for machine learning.
For a mathematician: a nightmare (or the ultimate challenge)
◦ Optimization without any assumptions (except continuity)
◦ No real guarantees on generalization error
◦ No real guarantees on convergence
A lot of attention: roughly a quarter of the 2,500 papers submitted to NIPS 2016.
Next: understand deep learning better by understanding when it fails.


Learning parities (Failures of Deep Learning, Ohad Shamir, 2017)
TASK: learn a function which outputs the parity of the active entries in an unknown subset of the coordinates of a vector. Formally:
◦ Choose a vector v* ∈ {0, 1}^d
◦ For x ∈ {0, 1}^d define y(x) = (−1)^{⟨x, v*⟩}, the parity of the entries of x selected by v*
◦ Learn y without any information on v*
The task is realizable by a neural network with a single hidden layer and a modest number of units. In the following experiment we try to learn it using at least that many units; a sketch of the setup follows below.


Parities convergence (Failures of Deep Learning, Ohad Shamir, 2017)
[Figure: training curves for learning parities; gradient-based training makes essentially no progress once the dimension is moderately large.]
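To make the parity setup concrete, here is a rough sketch of such an experiment in Python (my own illustration, not the protocol from the paper); the dimension, sample size, number of hidden units, and optimizer are all illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

d = 30                                   # input dimension; the failure shows up as d grows
rng = np.random.default_rng(0)

v_star = rng.integers(0, 2, size=d)      # the unknown subset of coordinates, as a 0/1 mask
X = rng.integers(0, 2, size=(5000, d))   # inputs drawn uniformly from {0, 1}^d
y = (X @ v_star) % 2                     # parity of the active entries on the subset

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=0)

# One-hidden-layer ReLU network trained with a gradient-based method (Adam).
net = MLPClassifier(hidden_layer_sizes=(128,), activation="relu",
                    max_iter=300, random_state=0)
net.fit(Xtr, ytr)
print("test accuracy:", net.score(Xte, yte))  # stays near 0.5 (chance) once d is large enough
```

Even though a network of this size can represent the parity function, gradient-based training typically fails to find it, which is what the non-informative-gradients argument on the next slide explains.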
Non-informative gradients
Let F_{v*} be the loss corresponding to the parity problem for a given v*.
Claim: fix a point and consider the gradient vectors obtained for all possible choices of v*. Their variance (over the choice of v*) is roughly exponentially small in the dimension d.
It follows that for large dimensions, all methods based only on gradient information fail to converge: the gradient carries almost no information about v*.
A more general version of the above claim holds for linear functions composed with a periodic function (Shamir, 2016).


Objective function example (Distribution-specific Hardness of Learning Neural Networks, Ohad Shamir, 2016)
[Figure: a plot of an example objective function from the paper.]


Final remarks
Optimization is the backbone of machine learning
◦ One does not realize how important it is until something goes wrong
Optimization offers a lot of challenging problems
◦ Ideal for mathematically oriented people with applied tastes
Optimization improves a lot by analyzing its failures
◦ "Learning from failures is the key to success" (attribute to anyone you like)
◦ Responsible for most of the modern advances in deep learning


Thank you for your attention!
Feel free to contact me at [email protected] with any further questions!