Lecture 3: Neural Nets, Fall 2015, Part 1
CSE 190, 9/30/15

Logistic Regression
Gary Cottrell
A generalization of the linear discriminant:

A monotonic activation function g():
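(The equations on this slide are not in the extracted text; the standard forms, which are assumed here, are:)

$$ y(x) = w^{T}x + w_0 \qquad\text{generalized to}\qquad y(x) = g(w^{T}x + w_0) $$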

This is still considered a linear discriminant: if g is monotonic, the decision boundary is still linear (even if the output “ramps up”), because thresholding g(wᵀx + w₀) at some value is equivalent to thresholding the linear function wᵀx + w₀ itself, and that boundary is a hyperplane.

To motivate this, imagine two gaussian-distributed categories with equal variance.

Gaussian probability density functions:

By Bayes’ rule:

And since these have to sum to 1 (if there are only two classes), the denominator is the sum of the numerators:
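(The formulas on this slide are missing from the extracted text; for the one-dimensional, equal-variance case described above they would read, under those assumptions:)

$$ p(x \mid C_k) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu_k)^2}{2\sigma^2}\right), \quad k = 1, 2 $$

$$ P(C_1 \mid x) = \frac{p(x \mid C_1)\,P(C_1)}{p(x)} = \frac{p(x \mid C_1)\,P(C_1)}{p(x \mid C_1)\,P(C_1) + p(x \mid C_2)\,P(C_2)} $$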
A somewhat counterintuitive derivation:

Call these terms A and B; then we have:
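(The algebra itself is missing from the extracted text; with A = p(x|C1)P(C1) and B = p(x|C2)P(C2) it would be:)

$$ P(C_1 \mid x) = \frac{A}{A+B} = \frac{1}{1 + B/A} = \frac{1}{1 + e^{-a}}, \qquad a = \ln\frac{A}{B} $$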
(Stop here to plot the logistic.)

In other words, the probability of class 1 follows a
sigmoid as a function of the log ratio of the probability of
class C1 to the probability of class C2.
The logistic activation function:

Allows us to interpret the output as a posterior probability – the probability of category C1 given x.

Note: a can be written as a linear function of x (and there is a generalization to multidimensional gaussians, where w and x are vectors).
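(The formulas are again missing from the extracted text. For the one-dimensional, equal-variance gaussian case, the logistic function and the linear form of a would be, under those assumptions:)

$$ g(a) = \frac{1}{1 + e^{-a}}, \qquad a = w\,x + w_0, \quad w = \frac{\mu_1 - \mu_2}{\sigma^2}, \quad w_0 = \frac{\mu_2^2 - \mu_1^2}{2\sigma^2} + \ln\frac{P(C_1)}{P(C_2)} $$

A quick numerical check of this claim (the distribution parameters below are invented for illustration, and the helper names are not from the lecture):

import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))

def gauss(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

# Hypothetical 1-D, equal-variance gaussian classes and priors
mu1, mu2, sigma = 1.0, -1.0, 1.5
p1, p2 = 0.4, 0.6

# Weights implied by the derivation above
w  = (mu1 - mu2) / sigma ** 2
w0 = (mu2 ** 2 - mu1 ** 2) / (2 * sigma ** 2) + np.log(p1 / p2)

x = np.linspace(-5.0, 5.0, 11)
posterior_bayes    = gauss(x, mu1, sigma) * p1 / (gauss(x, mu1, sigma) * p1 + gauss(x, mu2, sigma) * p2)
posterior_logistic = logistic(w * x + w0)

print(np.allclose(posterior_bayes, posterior_logistic))   # prints True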



That’s nice, but we don’t know that the data is gaussian, and we want to learn the weights.
What to do?
We are going to use something called the Maximum Likelihood Principle.
The MLP says, “set your parameters so that they maximize the probability of your training data.”
Motivating Example

(on board – gaussian model of grades)
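(The example was worked on the board; a minimal sketch of the same idea – fitting a gaussian to some grade data by maximum likelihood – is below. The scores and variable names are invented for illustration:)

import numpy as np

# Hypothetical exam scores (invented for illustration)
grades = np.array([62., 71., 75., 78., 80., 84., 88., 91.])

# For a gaussian model the maximum-likelihood estimates have a closed form:
# the sample mean and the (biased, divide-by-N) sample standard deviation.
mu_ml    = grades.mean()
sigma_ml = grades.std()          # np.std uses ddof=0, i.e. the ML estimate

# These are exactly the parameters that maximize the log-likelihood of the data.
def log_likelihood(mu, sigma, x):
    return np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2))

print(mu_ml, sigma_ml, log_likelihood(mu_ml, sigma_ml, grades))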
What does this mean for us?

We are trying to learn a mapping:
We have a training set of (x, t) pairs, where t=1 means x is in Category 1, and t=0 means x is in Category 2.

A complete model of this would be to find the distribution that best models p(x,t) – the joint probability of the data.
 We define the likelihood of the data as:
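(The slide’s formula is missing from the extracted text; with θ the model parameters and the training pairs assumed independent, the standard definition would be:)

$$ L(\theta) = p(\text{data} \mid \theta) = \prod_{n=1}^{N} p(x_n, t_n \mid \theta) $$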

What does this mean for us?

The likelihood of the data is indexed by our parameters θ – in the gaussian example, this would be μ and σ.

So now, the Maximum Likelihood Principle says we should choose our parameters as:
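(The formula itself is missing from the extracted text; the standard statement would be:)

$$ \theta^{*} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \prod_{n=1}^{N} p(x_n, t_n \mid \theta) $$

In practice one usually maximizes the log-likelihood ln L(θ) instead, since the log turns the product into a sum and has the same maximizer.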