Theory and Application to Logistic Regression (Classification): Computational Details

• Recall the cost function of the logistic classifier:
      J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]
• There are different ways of minimizing J(\theta), for example:
  – Gradient descent
  – Newton's method

Newton's Method of Minimizing J(\theta)
• Recall the cost function J(\theta) of the logistic classifier, given above.
• Basic calculus tells us that to find the point(s) at which a function is a minimum, we find the value(s) of the parameter at which the derivative is zero, while ensuring that the point is indeed a minimum (i.e., not a maximum or a saddle point). That is, find \theta such that \frac{\partial}{\partial \theta} J(\theta) = 0.
• To simplify the explanation, we will consider a function f of a single parameter \theta.
• Initially: find \theta such that f(\theta) = 0.
• Later: find \theta such that f'(\theta) = 0, rather than f(\theta) = 0.

Illustration of Newton's Method
[Figure: plot of f(\theta), showing successive tangent lines leading from an initial guess toward the root of f(\theta) = 0.]
• Goal: find \theta such that f(\theta) = 0. How do we find it?
• We will use an iterative method to find \theta.
• Take \theta_0 as the initial value; then iterate as illustrated above, until f(\theta) \approx 0 is achieved.

• Update rule for iteration 1: the tangent line at \theta_0 has slope f'(\theta_0) = \frac{f(\theta_0)}{\theta_0 - \theta_1}, where \theta_1 is the point at which the tangent crosses the axis. Rearranging:
      \theta_1 = \theta_0 - \frac{f(\theta_0)}{f'(\theta_0)}

• Update rule for any iteration k:
      \theta_{k+1} = \theta_k - \frac{f(\theta_k)}{f'(\theta_k)}

• Cost function: J(\theta). To minimize it we want J'(\theta) = 0, so we apply Newton's method to f(\theta) = J'(\theta).
• Update rule for minimizing J(\theta):
      \theta_{k+1} = \theta_k - \frac{J'(\theta_k)}{J''(\theta_k)}

• In both the gradient descent and Newton's algorithms, we are interested in updating each component of \theta individually; the update rule for the j-th component therefore requires the first partial derivative \frac{\partial J(\theta)}{\partial \theta_j} and the second partial derivatives \frac{\partial^2 J(\theta)}{\partial \theta_j \, \partial \theta_k}.
• Note: the 2nd-order partial derivative means we need to take the derivative of \frac{\partial J(\theta)}{\partial \theta_j} with respect to \theta again; in order to avoid confusing that index with j, we use the index k.

• Recall \frac{\partial J(\theta)}{\partial \theta_j}:
      \frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

• Recall the non-vectorized expression for \frac{\partial J(\theta)}{\partial \theta_j} (see slide 35 of the Logistic Regression slides).
• What is the expression for \frac{\partial^2 J(\theta)}{\partial \theta_j \, \partial \theta_k}?

• For each j, k:
      \frac{\partial^2 J(\theta)}{\partial \theta_j \, \partial \theta_k} = \frac{1}{m} \sum_{i=1}^{m} h_\theta(x^{(i)}) \left( 1 - h_\theta(x^{(i)}) \right) x_j^{(i)} x_k^{(i)}

Vector Format of \nabla J(\theta)
• Recall: for each j,
      \frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
• In vector format:
      \nabla J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}
• For training example i, the scalar \left( h_\theta(x^{(i)}) - y^{(i)} \right) (as determined in previous slides) multiplies each component of the vector x^{(i)}.
• Verifying: for each j, the j-th component of \nabla J(\theta) above is exactly \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}, matching the non-vectorized expression.

Matrix Format of the Hessian
• Again, the 2nd-order partial derivative means taking the derivative of \frac{\partial J(\theta)}{\partial \theta_j} with respect to \theta a second time, using the index k to avoid confusion with j. For each j, k:
      H_{jk} = \frac{\partial^2 J(\theta)}{\partial \theta_j \, \partial \theta_k} = \frac{1}{m} \sum_{i=1}^{m} h_\theta(x^{(i)}) \left( 1 - h_\theta(x^{(i)}) \right) x_j^{(i)} x_k^{(i)}
• The matrix H of all such entries is called the Hessian matrix, named after the German mathematician Ludwig Otto Hesse.
• Substituting the entries H_{jk} into the matrix: for each training example i, the products x_j^{(i)} x_k^{(i)} over all j and k (with x_0^{(i)} = 1) form exactly the outer product x^{(i)} (x^{(i)})^T, so
      H = \frac{1}{m} \sum_{i=1}^{m} h_\theta(x^{(i)}) \left( 1 - h_\theta(x^{(i)}) \right) x^{(i)} (x^{(i)})^T

• Recall the outer product, i.e., the matrix multiplication of two vectors: a column vector u times a row vector v^T gives a matrix whose (j, k) entry is u_j v_k.
• Therefore x^{(i)} (x^{(i)})^T is equivalent to a matrix multiplication, provided that x^{(i)} is represented as an n × 1 column vector (which makes (x^{(i)})^T a 1 × n row vector).

• For the matrix H:
  – The h_\theta(x^{(i)}) \left( 1 - h_\theta(x^{(i)}) \right) are scalars.
  – The x^{(i)} (x^{(i)})^T forms an n × n matrix.
  – For each training index i, the scalar multiplies the (j, k)-th element x_j^{(i)} x_k^{(i)} of x^{(i)} (x^{(i)})^T.

Scalar?
• Consider h_\theta(x^{(i)}) = g(\theta^T x^{(i)}): the product \theta^T x^{(i)} of a 1 × n row vector and an n × 1 column vector is 1 × 1, i.e., a single number.
• Therefore the following is a scalar: h_\theta(x^{(i)}) \left( 1 - h_\theta(x^{(i)}) \right).

Finally, in Vector Format
• Newton's update rule for logistic regression:
      \theta := \theta - H^{-1} \nabla J(\theta)
  with \nabla J(\theta) and H as derived above.

Gradient Descent vs Newton's Method

  Gradient Descent                            | Newton's Method
  Simpler                                     | Slightly more complex
  Needs a choice of learning rate             | No learning-rate parameter to choose
  Needs more iterations to converge           | Needs fewer iterations to converge
  Order of complexity O(m × n) per iteration  | Order of complexity O(n^3) per iteration
  Scales well with the number of features     | Unable to handle very large n
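To make the final vector-format update concrete, here is a minimal Python/NumPy sketch of Newton's method for logistic regression, using the gradient and Hessian expressions derived above. The function names (sigmoid, newton_logistic) and the small synthetic dataset are illustrative assumptions, not taken from the slides.

    # Minimal sketch (not from the slides) of Newton's method for logistic
    # regression, following the update rule theta := theta - H^{-1} grad J(theta).
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def newton_logistic(X, y, num_iters=10):
        """X: m x n design matrix (first column of 1s for the intercept),
        y: length-m vector of 0/1 labels."""
        m, n = X.shape
        theta = np.zeros(n)
        for _ in range(num_iters):
            h = sigmoid(X @ theta)              # h_theta(x^(i)) for every example
            grad = X.T @ (h - y) / m            # (1/m) * sum_i (h - y^(i)) x^(i)
            S = np.diag(h * (1.0 - h))          # diagonal of the h(1 - h) scalars
            H = X.T @ S @ X / m                 # (1/m) * sum_i h(1-h) x^(i) x^(i)^T
            theta -= np.linalg.solve(H, grad)   # theta := theta - H^{-1} grad
        return theta

    # Tiny usage example on synthetic, noisy (non-separable) data -- assumed
    # here purely for illustration.
    rng = np.random.default_rng(0)
    X_raw = rng.normal(size=(200, 2))
    y = (X_raw[:, 0] + X_raw[:, 1] + rng.normal(size=200) > 0).astype(float)
    X = np.hstack([np.ones((200, 1)), X_raw])   # prepend the intercept column
    print(newton_logistic(X, y))

Note that the step solves H d = \nabla J(\theta) with np.linalg.solve rather than forming H^{-1} explicitly, which is the usual numerically safer choice; this solve is the O(n^3) per-iteration cost listed in the comparison table above.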