Generalization in Supervised Machine Learning
BLiNQ MEDIA
Praneeth Vepakomma, Senior Data Scientist

Hypothetical Knapsack of Coins:
Copper and Gold Coins. Total number of coins is fixed and is a large sample. Capture-Recapture. What is the proportion of Gold coins?
Copper and Gold Coins. Total number of coins is variable and is a large sample. Capture-Recapture. What is the proportion of Gold coins?

BASIC ML/STAT TERMINOLOGY:
$X_{n \times p}$: Features, Predictors
$Y_{n \times k}$: Labels, Responses

190 Years after Gauss, the core problem of prediction remains an active problem:
Then: $\| Y - X\beta \|^2$
Now: $E(Y_i) = g^{-1}\left(\sum_{i=1}^{n} f_i(X_i)\right)$

Find a mapping# from the features. $\theta$ is a list of parameters required to represent the function.
# Approximation

What is Supervised Learning?
Existing Features, Loss Function, $f : X \to Y$, Known Labels, Assumptions
Unavailable Features, Loss Function, Unknown Labels

Evaluating the Learned Function:
The Loss Function quantifies the error in the approximation. Learn a mapping by optimizing the loss.
$L(f_\theta(X), Y)$
Example: $(f_\theta(X) - Y)^2$

Predictions with varying parameters:

How do we generalize?

Generalization and Predictability
$X, Y \sim D$
Empirical Risk Minimization:
$\frac{1}{n} \sum_{i=1}^{n} L(f_\theta(X_{i\cdot}), Y_{i\cdot})$
Empirical Risk is the average (expected) loss on seen data.
True Risk is the expected loss on the process generating the X,Y pairs.
True Risk Minimization:

PARAMETRIC CHARACTERIZATION OF THE MAPPING:
2d-Linear function: Slope, Intercept
Cubic Spline: Number of knots, Location of knots
Nearest-Neighbor regression: Number of neighbors
Lasso: L1-L2 Weights
Support Vector Machines: Kernel width, Margin Length
Random Forests: Resampling sample size

Long list of available Supervised Learning Techniques. Most of the techniques have tuning parameters.
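To make the gap between empirical risk and true risk concrete, here is a minimal sketch using nearest-neighbor regression, where the number of neighbors k is the tuning parameter. Everything in it is an assumption made for illustration and not taken from the slides: the data-generating process (a noisy sine curve), the helper names knn_predict, empirical_risk and sample, and the use of a large batch of fresh draws as a stand-in for the true risk.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k):
    """Predict each query point as the mean label of its k nearest training points."""
    # Pairwise absolute distances, shape (n_query, n_train)
    dist = np.abs(x_query[:, None] - x_train[None, :])
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def empirical_risk(y_pred, y_true):
    """Average squared loss: (1/n) * sum_i (f(X_i) - Y_i)^2."""
    return float(np.mean((y_pred - y_true) ** 2))

rng = np.random.default_rng(0)

def sample(n):
    # Hypothetical process D (an assumption): y = sin(x) + Gaussian noise
    x = rng.uniform(0.0, 6.0, size=n)
    y = np.sin(x) + rng.normal(0.0, 0.3, size=n)
    return x, y

x_train, y_train = sample(200)    # "seen" data
x_fresh, y_fresh = sample(2000)   # fresh draws from D approximate the true risk

risks = {}
for k in (1, 5, 20, 100):
    seen = empirical_risk(knn_predict(x_train, y_train, x_train, k), y_train)
    fresh = empirical_risk(knn_predict(x_train, y_train, x_fresh, k), y_fresh)
    risks[k] = (seen, fresh)
    print(f"k={k:3d}  empirical risk={seen:.3f}  risk on fresh draws={fresh:.3f}")
```

With k = 1 every training point is its own nearest neighbor, so the empirical risk is zero even though the risk on fresh draws is not. This is why a low loss on seen data alone says little about generalization.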
We can minimize out-of-sample error by tuning the technique with optimal parameters. Tuning can be performed by cross-validation over a discrete grid of parameter combinations.

CURSE OF DIMENSIONALITY
Flat World - 10D World:

CURSE OF DIMENSIONALITY
Let us validate:

Structural Risk Minimization via Regularization:
$\frac{1}{n} \sum_{i=1}^{n} L(f_{\mathrm{Linear}}(X_i), Y_i) + \lambda \sum \| \beta_i \|_1$

Genealogy?

Brief Description
Technology Overview
Hiring (What we're looking for)
http://blinqmedia.com/contact/job-openings/

Let's work with Abalone

Thank You!
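The two ideas from the closing slides, tuning by cross-validation over a discrete grid and structural risk minimization with an L1 penalty, can be sketched together. This is a rough illustration under stated assumptions, not the speaker's implementation: the loss is taken to be squared error, the model has no intercept, the data are synthetic (not the Abalone data), the solver is plain cyclic coordinate descent, and the names soft_threshold, lasso_fit, cv_risk and lam_grid are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; exactly zero when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_fit(X, y, lam, n_sweeps=200):
    """Minimize (1/n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(p):
            # Residual with feature j's current contribution added back
            r_j = y - X @ b + X[:, j] * b[j]
            b[j] = soft_threshold(X[:, j] @ r_j, n * lam / 2.0) / col_sq[j]
    return b

def cv_risk(X, y, lam, n_folds=5):
    """Average held-out squared loss across n_folds contiguous folds."""
    indices = np.arange(len(y))
    losses = []
    for held_out in np.array_split(indices, n_folds):
        train = np.setdiff1d(indices, held_out)
        b = lasso_fit(X[train], y[train], lam)
        losses.append(np.mean((X[held_out] @ b - y[held_out]) ** 2))
    return float(np.mean(losses))

# Synthetic data (an assumption): only 3 of the 10 features carry signal
rng = np.random.default_rng(0)
n, p = 120, 10
X = rng.normal(size=(n, p))
true_b = np.zeros(p)
true_b[:3] = [2.0, -1.5, 1.0]
y = X @ true_b + rng.normal(0.0, 0.5, size=n)

# Discrete grid of candidate penalty weights, scored by cross-validation
lam_grid = [0.001, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0]
cv_scores = {lam: cv_risk(X, y, lam) for lam in lam_grid}
best_lam = min(cv_scores, key=cv_scores.get)
b_hat = lasso_fit(X, y, best_lam)

for lam in lam_grid:
    print(f"lambda={lam:<6} cross-validated risk={cv_scores[lam]:.3f}")
print("selected lambda:", best_lam)
print("coefficients:", np.round(b_hat, 2))
```

The penalty weight is chosen on held-out folds, never on the data used for fitting, which is what ties the tuning step back to generalization. A very large penalty drives every coefficient to zero, while a very small one leaves the fit close to ordinary least squares.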