
WEEK 2
SOFT COMPUTING &
MACHINE LEARNING
YOSI KRISTIAN
Gradient Descent for Linear Regression
Gradient Descent
Single Variable Linear Regression
Gradient Descent
Have some function $J(\theta_0, \theta_1)$.
Want $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$.
Outline:
• Start with some $\theta_0, \theta_1$ (e.g. $\theta_0 = 0$, $\theta_1 = 0$).
• Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$
until we hopefully end up at a minimum.
Illustration
[3-D surface plot of $J(\theta_0, \theta_1)$ over the $\theta_0$ and $\theta_1$ axes; gradient descent walks downhill from a starting point.]
Illustration (contd.)
[The same surface plot of $J(\theta_0, \theta_1)$; starting from a slightly different point, descent can end up at a different local minimum.]
The Algorithm
Gradient descent algorithm:
repeat until convergence {
  $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$  (for $j = 0$ and $j = 1$)
}
Correct: simultaneous update
  temp0 $:= \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
  temp1 $:= \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
  $\theta_0 :=$ temp0
  $\theta_1 :=$ temp1
Incorrect:
  temp0 $:= \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
  $\theta_0 :=$ temp0
  temp1 $:= \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$  (this uses the already-updated $\theta_0$)
  $\theta_1 :=$ temp1
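As a concrete illustration (not from the slides), here is a minimal Python sketch of the simultaneous update, using a made-up cost $J(\theta_0, \theta_1) = \theta_0^2 + \theta_1^2$; any differentiable cost works the same way.

# Made-up cost for illustration: J(theta0, theta1) = theta0^2 + theta1^2.
def dJ_dtheta0(theta0, theta1):   # partial derivative w.r.t. theta0
    return 2.0 * theta0

def dJ_dtheta1(theta0, theta1):   # partial derivative w.r.t. theta1
    return 2.0 * theta1

alpha = 0.1                  # learning rate
theta0, theta1 = 1.0, 2.0    # some starting point

# Correct (simultaneous): both derivatives use the OLD parameter values.
temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
theta0, theta1 = temp0, temp1

# Incorrect (sequential): theta0 is overwritten first, so the second
# derivative would be evaluated at a mixed (new theta0, old theta1) point:
#   theta0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
#   theta1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)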
Algorithm Explained..
Gradient descent algorithm:
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$
$\alpha$ = learning rate: it controls how big a step we take.
The term following $\alpha$ is the derivative $\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$: the slope of $J$ at the current point.
α Effects..
If $\alpha$ is too small, gradient descent can be slow.
If $\alpha$ is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
Fixed ..
Gradient descent can converge to a local
minimum, even with the learning rate α fixed.
As we approach a local
minimum, gradient descent
will automatically take
smaller steps. So, no need
to decrease α over time.
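A tiny numeric sketch of this behaviour (my example, not the slides'), using the toy cost $J(\theta) = \theta^2$: the learning rate stays fixed, yet the step $\alpha \cdot J'(\theta)$ shrinks as the slope shrinks near the minimum.

theta = 4.0        # starting point
alpha = 0.3        # fixed learning rate, never decreased
for i in range(8):
    grad = 2.0 * theta        # J'(theta) for J(theta) = theta^2
    step = alpha * grad       # actual step taken this iteration
    theta -= step
    print(f"iter {i}: theta = {theta:+.5f}, |step| = {abs(step):.5f}")

Each printed step is smaller than the last, even though alpha never changes.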
Applying Gradient Descent
for Linear Regression
Gradient descent algorithm:
repeat until convergence {
  $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$  (for $j = 0, 1$)
}
Linear regression model:
$h_\theta(x) = \theta_0 + \theta_1 x$
$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Gradient Descent Function..
Working out the derivative terms:
$\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
$\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
Algorithm..
Gradient descent algorithm:
repeat until convergence {
  $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
  $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
}
update $\theta_0$ and $\theta_1$ simultaneously
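A minimal runnable sketch of this batch algorithm in Python with NumPy; the five data points are made up to follow roughly $y = 2x + 1$.

import numpy as np

# Made-up training data, roughly y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])
m = len(x)

theta0, theta1 = 0.0, 0.0
alpha = 0.05

for _ in range(2000):
    h = theta0 + theta1 * x                  # hypothesis on all m examples
    grad0 = (1.0 / m) * np.sum(h - y)        # dJ/dtheta0
    grad1 = (1.0 / m) * np.sum((h - y) * x)  # dJ/dtheta1
    # simultaneous update
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)  # approaches roughly (1, 2)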
Remember the Local Minimum Problem?
[Surface plot of a $J(\theta_0, \theta_1)$ with several local minima.]
It won't happen here..
For linear regression, the cost function $J(\theta_0, \theta_1)$ is convex (bowl-shaped), so it has a single global minimum and gradient descent cannot get stuck in a local one.
“Batch” Gradient Descent
“Batch”: Each step of gradient descent uses all the training examples.
Visualization
[Sequence of paired plots: on the left, $h_\theta(x)$ for fixed $\theta_0, \theta_1$ as a function of $x$ (the fitted line over the training data); on the right, $J(\theta_0, \theta_1)$ as a function of the parameters, drawn as a contour plot. Each successive slide shows one more gradient descent step: the point on the contour plot moves toward the minimum of $J$ while the line on the left fits the data progressively better.]
Homework
• Create a program to demonstrate gradient descent on a one-variable linear regression problem.
• Use the Diamond data.
• Input: 1 variable.
• Output: 1 variable.
• Visualize your program (the MSE over iterations and the fitted regression line).
• Allow $\theta_0$ and $\theta_1$ to be initialized manually.
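A minimal sketch of one possible visualization, assuming the diamond data has already been loaded into arrays x and y (the placeholder numbers below are not the real data set):

import numpy as np
import matplotlib.pyplot as plt

# Placeholder data; replace with the actual Diamond data (x, y).
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
y = np.array([1.6, 3.1, 4.4, 6.1, 7.4])

theta0, theta1 = 0.0, 0.0      # manually initializable parameters
alpha = 0.05
mse_history = []

for _ in range(500):
    h = theta0 + theta1 * x
    mse_history.append(np.mean((h - y) ** 2))        # record MSE
    grad0 = np.mean(h - y)
    grad1 = np.mean((h - y) * x)
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(mse_history)                                # MSE per iteration
ax1.set_xlabel("iteration"); ax1.set_ylabel("MSE")
ax2.scatter(x, y)                                    # data points
ax2.plot(x, theta0 + theta1 * x, color="red")        # fitted line
ax2.set_xlabel("x"); ax2.set_ylabel("y")
plt.show()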
Linear Regression with multiple variables
Multiple features
Previously
$h_\theta(x) = \theta_0 + \theta_1 x$, with a single input feature (e.g. house size) predicting a single output (price).
Multiple Features
Multiple features (variables).
Notation:
$n$ = number of features
$x^{(i)}$ = input (features) of the $i$-th training example
$x^{(i)}_j$ = value of feature $j$ in the $i$-th training example
Hypothesis:
Previously: $h_\theta(x) = \theta_0 + \theta_1 x$
Now: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$
Still the Hypothesis…
For convenience of notation, define $x_0 = 1$. Then
$x = [x_0, x_1, \dots, x_n]^T \in \mathbb{R}^{n+1}$, $\theta = [\theta_0, \theta_1, \dots, \theta_n]^T \in \mathbb{R}^{n+1}$,
so $h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^T x$.
Multivariate linear regression.
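In code the vectorized hypothesis is a single dot product; a small sketch (the feature and parameter values are invented for illustration):

import numpy as np

# One example with n = 3 features; x0 = 1 is prepended by convention.
x = np.array([1.0, 2104.0, 5.0, 45.0])      # [x0, x1, x2, x3]
theta = np.array([80.0, 0.1, 10.0, -1.0])   # [theta0, ..., theta3]

h = theta @ x    # h_theta(x) = theta^T x
print(h)         # 80 + 0.1*2104 + 10*5 - 1*45 = 295.4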
Linear Regression with multiple variables
Gradient descent for multiple-variable linear regression
Hypothesis (simplified): $h_\theta(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n$, with $x_0 = 1$.
Parameters: $\theta_0, \theta_1, \dots, \theta_n$ (an $(n+1)$-dimensional vector $\theta$).
Cost function:
$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Gradient descent:
Repeat {
  $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$
} (simultaneously update for every $j = 0, \dots, n$)
Gradient Descent
New algorithm ($n \ge 1$):
Repeat {
  $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}_j$
} (simultaneously update $\theta_j$ for $j = 0, \dots, n$)
Previously ($n = 1$):
Repeat {
  $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
  $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
} (simultaneously update $\theta_0$ and $\theta_1$)
The two coincide for $n = 1$ once we define $x^{(i)}_0 = 1$.
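A vectorized NumPy sketch of this multi-variable algorithm; X and y are made up (y is generated from $\theta = (1, 2, 3)$) so the result can be checked:

import numpy as np

# Design matrix X (m x (n+1)); the first column of ones is x0 = 1.
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 0.0],
              [1.0, 3.0, 3.0],
              [1.0, 4.0, 1.0]])
y = np.array([9.0, 5.0, 16.0, 12.0])   # generated from theta = (1, 2, 3)
m = X.shape[0]

theta = np.zeros(X.shape[1])
alpha = 0.1

for _ in range(5000):
    h = X @ theta                    # hypotheses for all m examples
    grad = (X.T @ (h - y)) / m       # all partials dJ/dtheta_j at once
    theta = theta - alpha * grad     # simultaneous update of every theta_j

print(theta)   # approaches (1, 2, 3)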
Linear Regression with multiple variables
Gradient descent in practice I: Feature Scaling
Feature Scaling
Idea: Make sure features are on a similar scale.
E.g. $x_1$ = size (0–2000 feet²), $x_2$ = number of bedrooms (1–5).
[Contour plots of $J(\theta)$ over $\theta_1$ and $\theta_2$: without scaling the contours are long thin ellipses and gradient descent zig-zags slowly; after scaling they are nearly circular and descent takes a much more direct path to the minimum.]
Feature Scaling
Get every feature into approximately a $-1 \le x_i \le 1$ range.
Mean normalization
Replace $x_i$ with $x_i - \mu_i$ to make features have approximately zero mean (do not apply to $x_0 = 1$).
E.g. $x_1 = \frac{\text{size} - 1000}{2000}$, $x_2 = \frac{\#\text{bedrooms} - 2}{5}$.
More generally: $x_i := \frac{x_i - \mu_i}{s_i}$, where $\mu_i$ is the mean of feature $i$ and $s_i$ is its range (max − min) or standard deviation.
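A small sketch of mean normalization in NumPy (the helper name and data are mine, not from the slides):

import numpy as np

def mean_normalize(X):
    """Scale each feature column: (x - mean) / std."""
    mu = X.mean(axis=0)      # per-feature mean
    sigma = X.std(axis=0)    # per-feature spread (std; the range also works)
    return (X - mu) / sigma, mu, sigma

# Made-up housing-style features: size in feet^2, number of bedrooms.
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0],
              [1416.0, 2.0]])

X_scaled, mu, sigma = mean_normalize(X)
print(X_scaled)   # each column now has mean ~0 and comparable spread

# Note: do NOT scale the x0 = 1 column (its std is 0), and apply the SAME
# mu and sigma to any new example before predicting.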
Linear Regression with multiple variables
Choosing Learning Rate
Making sure gradient descent is working correctly.
[Plot: $J(\theta)$ versus the number of iterations (0, 100, 200, 300, 400, …); a healthy run shows $J(\theta)$ decreasing steadily and flattening out.]
Example automatic convergence test:
Declare convergence if $J(\theta)$ decreases by less than some small threshold $\varepsilon$ (e.g. $10^{-3}$) in one iteration.
Making sure gradient descent is working correctly.
[Plots of $J(\theta)$ versus the number of iterations in which $J(\theta)$ keeps increasing, or repeatedly goes up and down: gradient descent is not working. Use a smaller $\alpha$.]
For sufficiently small $\alpha$, $J(\theta)$ should decrease on every iteration.
But if $\alpha$ is too small, gradient descent can be slow to converge.
Summary:
- If $\alpha$ is too small: slow convergence.
- If $\alpha$ is too large: $J(\theta)$ may not decrease on every iteration; it may not converge.
To choose $\alpha$, try a sequence of values roughly 3× apart, e.g. …, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …
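A sketch of trying several values of $\alpha$ with the automatic convergence test; the data is made up (exactly fit by $\theta = (1, 2)$), and $\varepsilon = 10^{-6}$ is an arbitrary choice:

import numpy as np

def cost(X, y, theta):
    """J(theta) = 1/(2m) * sum of squared errors."""
    return np.mean((X @ theta - y) ** 2) / 2.0

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])          # exactly y = 1 + 2x
m = len(y)
epsilon = 1e-6                               # convergence threshold

for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3]:
    theta = np.zeros(2)
    prev_J = cost(X, y, theta)
    status = "hit the iteration cap"
    for it in range(1, 20001):
        theta -= alpha * (X.T @ (X @ theta - y)) / m
        J = cost(X, y, theta)
        if J > prev_J:                       # J increasing: alpha too large
            status = f"diverging at iteration {it}"
            break
        if prev_J - J < epsilon:             # automatic convergence test
            status = f"converged at iteration {it}"
            break
        prev_J = J
    print(f"alpha = {alpha}: {status}, J = {J:.6f}")

The smallest alphas stop with J still noticeably above zero (slow convergence trips the test early), mid-range values converge cleanly, and the largest diverges immediately.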
Homework
• Create a program to demonstrate gradient descent on a multiple-variable linear regression problem.
• Use the Housing data.
• Input: 2 variables.
• Output: 1 variable.
• Allow the parameters ($\theta_0, \theta_1, \dots$) to be initialized manually.
• Make $\alpha$ customizable.
• Apply feature scaling.
Linear Regression with multiple variables
Features and polynomial regression
Housing prices prediction
Polynomial regression
[Scatter plot: Price ($y$) versus Size ($x$), with a curve fitted through the data.]
Instead of a straight line we can fit, e.g., $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$ by defining $x_1 = x$, $x_2 = x^2$, $x_3 = x^3$ and running multivariate linear regression. Feature scaling then matters a lot, since $x$, $x^2$, and $x^3$ span very different ranges.
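A sketch of this trick in NumPy: build $x$, $x^2$, $x^3$ as columns, scale them, and reuse the multivariate gradient descent loop (the data is invented for illustration):

import numpy as np

# Made-up (size, price) pairs; sizes in units of 1000 feet^2.
size = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
price = np.array([2.0, 3.9, 7.1, 12.2, 19.5])
m = len(price)

# Treat x, x^2, x^3 as three features of a linear model.
X = np.column_stack([np.ones(m), size, size**2, size**3])

# Feature scaling is essential: x, x^2, x^3 have very different ranges.
mu = X[:, 1:].mean(axis=0)
sigma = X[:, 1:].std(axis=0)
X[:, 1:] = (X[:, 1:] - mu) / sigma

theta = np.zeros(4)
alpha = 0.1
for _ in range(20000):
    theta -= alpha * (X.T @ (X @ theta - price)) / m   # batch update

print(X @ theta)   # fitted prices, close to the observed ones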
Fin…
Finally …