Learning From Data Lecture 6 Bounding The Growth Function

Learning From Data
Lecture 6
Bounding The Growth Function
Bounding the Growth Function
Models are either Good or Bad
The VC Bound - replacing |H| with mH (N )
M. Magdon-Ismail
CSCI 4100/6100
recap:
The Growth Function mH(N )
A new measure for the diversity of a hypothesis set.
H(x1, . . . , xN ) = {(h(x1), . . . , h(xN ))}
The dichotomies (N -tuples) H implements on x1, . . . , xN .
H
H viewed through D
The growth function mH(N ) considers the worst possible x1, . . . , xN .
mH (N ) = max |H(x1, . . . , xN )|.
x1 ,...,xN
This lecture: Can we bound mH (N ) by a polynomial in N ?
Can we replace |H| by mH (N ) in the generalization bound?
c AM
L Creator: Malik Magdon-Ismail
Bounding the Growth Function: 2 /31
Example growth functions −→
Example Growth Functions
N
4
5
···
1
2
3
2-D perceptron
2
4
8
14
···
1-D pos. ray
2
3
4
5
···
2-D pos. rectangles
2
4
8
16
< 25 · · ·
• mH(N ) drops below 2N – there is hope.
• A break point is any k for which mH (k) < 2k .
c AM
L Creator: Malik Magdon-Ismail
Bounding the Growth Function: 3 /31
Quiz I −→
Pop Quiz I
∗
I give you a set of k ∗ points x1, . . . , xk∗ on which H implements < 2k dichotomys.
(a) k ∗ is a break point.
(b) k ∗ is not a break point.
(c) all break points are > k ∗.
(d) all break points are ≤ k ∗.
(e) we don’t know anything about break points.
c AM
L Creator: Malik Magdon-Ismail
Bounding the Growth Function: 4 /31
Answer −→
Pop Quiz I
∗
I give you a set of k ∗ points x1, . . . , xk∗ on which H implements < 2k dichotomys.
(a) k ∗ is a break point.
(b) k ∗ is not a break point.
(c) all break points are > k ∗.
(d) all break points are ≤ k ∗.
X (e) we don’t know anything about break points.
c AM
L Creator: Malik Magdon-Ismail
Bounding the Growth Function: 5 /31
Quiz II −→
Pop Quiz II
∗
For every set of k ∗ points x1, . . . , xk∗ , H implements < 2k dichotomys.
(a) k ∗ is a break point.
(b) k ∗ is not a break point.
(c) all k ≥ k ∗ are break points.
(d) all k < k ∗ are break points.
(e) we don’t know anything about break points.
c AM
L Creator: Malik Magdon-Ismail
Bounding the Growth Function: 6 /31
Answer −→
Pop Quiz II
∗
For every set of k ∗ points x1, . . . , xk∗ , H implements < 2k dichotomys.
X (a) k ∗ is a break point.
(b) k ∗ is not a break point.
X (c) all k ≥ k ∗ are break points.
(d) all k < k ∗ are break points.
(e) we don’t know anything about break points.
c AM
L Creator: Malik Magdon-Ismail
Bounding the Growth Function: 7 /31
Quiz III −→
Pop Quiz III
To show that k is not a break point for H:
(a) Show a set of k points x1, . . . xk which H can shatter.
(b) Show H can shatter any set of k points.
(c) Show a set of k points x1, . . . xk which H cannot shatter.
(d) Show H cannot shatter any set of k points.
(e) Show mH(k) = 2k .
c AM
L Creator: Malik Magdon-Ismail
Bounding the Growth Function: 8 /31
Answer −→
Pop Quiz III
To show that k is not a break point for H:
X (a) Show a set of k points x1, . . . xk which H can shatter.
overkill
(b) Show H can shatter any set of k points.
(c) Show a set of k points x1, . . . xk which H cannot shatter.
(d) Show H cannot shatter any set of k points.
X (e) Show mH (k) = 2k .
c AM
L Creator: Malik Magdon-Ismail
Bounding the Growth Function: 9 /31
Quiz IV −→
Pop Quiz IV
To show that k is a break point for H:
(a) Show a set of k points x1, . . . xk which H can shatter.
(b) Show H can shatter any set of k points.
(c) Show a set of k points x1, . . . xk which H cannot shatter.
(d) Show H cannot shatter any set of k points.
(e) Show mH (k) > 2k .
c AM
L Creator: Malik Magdon-Ismail
Bounding the Growth Function: 10 /31
Answer −→
Pop Quiz IV
To show that k is a break point for H:
(a) Show a set of k points x1, . . . xk which H can shatter.
(b) Show H can shatter any set of k points.
(c) Show a set of k points x1, . . . xk which H cannot shatter.
X (d) Show H cannot shatter any set of k points.
(e) Show mH (k) > 2k .
c AM
L Creator: Malik Magdon-Ismail
Bounding the Growth Function: 11 /31
Combinatorial puzzle again −→
Back to Our Combinatorial Puzzle
How many dichotomies can you list on 4 points so that no 2 is shattered.
x1
◦
◦
◦
◦
•
x2
◦
◦
◦
•
◦
x3
◦
◦
•
◦
◦
x4
◦
•
◦
◦
◦
Can we add a 6th dichotomy?
c AM
L Creator: Malik Magdon-Ismail
Bounding the Growth Function: 12 /31
Can’t add a 6th dichotomy −→
Can’t Add A 6th Dichotomy
x1
◦
◦
◦
◦
•
◦
c AM
L Creator: Malik Magdon-Ismail
x2
◦
◦
◦
•
◦
•
x3
◦
◦
•
◦
◦
•
x4
◦
•
◦
◦
◦
◦
Bounding the Growth Function: 13 /31
B(N, K) −→
The Combinatorial Quantity B(N, k)
How many dichotomies can you list on 4 points so that no 2 are shattered.
↑
↑
N
k
B(N , k): Max. number of dichotomys on N points so that no k are shattered.
x1
◦
◦
◦
•
x2
◦
◦
•
◦
x3
◦
•
◦
◦
B(3, 2) = 4
c AM
L Creator: Malik Magdon-Ismail
x1
◦
◦
◦
◦
•
x2
◦
◦
◦
•
◦
x3
◦
◦
•
◦
◦
x4
◦
•
◦
◦
◦
B(4, 2) = 5
Bounding the Growth Function: 14 /31
B(4, 3) −→
Let’s Try To Bound B(4, 3)
How many dichotomies can you list on 4 points so that no subset of 3 is shattered.
x1
◦
◦
◦
◦
•
◦
◦
•
◦
•
•
c AM
L Creator: Malik Magdon-Ismail
x2
◦
◦
◦
•
◦
◦
•
◦
•
◦
•
x3
◦
◦
•
◦
◦
•
◦
◦
•
•
◦
x4
◦
•
◦
◦
◦
•
•
•
◦
◦
◦
Bounding the Growth Function: 15 /31
Two kinds of dichotomys −→
Two Kinds of Dichotomys
Prefix appears once or prefix appears twice.
x1
◦
◦
◦
◦
•
◦
◦
•
◦
•
•
c AM
L Creator: Malik Magdon-Ismail
x2
◦
◦
◦
•
◦
◦
•
◦
•
◦
•
x3
◦
◦
•
◦
◦
•
◦
◦
•
•
◦
x4
◦
•
◦
◦
◦
•
•
•
◦
◦
◦
Bounding the Growth Function: 16 /31
Reorder the dichotomys −→
Reorder the Dichotomys
α
β
β
c AM
L Creator: Malik Magdon-Ismail
x1
◦
•
•
◦
◦
◦
•
◦
◦
◦
•
x2
•
◦
•
◦
◦
•
◦
◦
◦
•
◦
x3
•
•
◦
◦
•
◦
◦
◦
•
◦
◦
x4
◦
◦
◦
◦
◦
◦
◦
•
•
•
•
α: prefix appears once
β: prefix appears twice
B(4, 3) = α + 2β
Bounding the Growth Function: 17 /31
Bound for α + β −→
First, Bound α + β
α
β
β
c AM
L Creator: Malik Magdon-Ismail
x1
◦
•
•
◦
◦
◦
•
◦
◦
◦
•
x2
•
◦
•
◦
◦
•
◦
◦
◦
•
◦
x3
•
•
◦
◦
•
◦
◦
◦
•
◦
◦
x4
◦
◦
◦
◦
◦
◦
◦
•
•
•
•
α + β ≤ B(3, 3)
↑
A list on 3 points, with no 3 shattered (why?)
Bounding the Growth Function: 18 /31
Bound for β −→
Second, Bound β
α
β
β
c AM
L Creator: Malik Magdon-Ismail
x1
◦
•
•
◦
◦
◦
•
◦
◦
◦
•
x2
•
◦
•
◦
◦
•
◦
◦
◦
•
◦
x3
•
•
◦
◦
•
◦
◦
◦
•
◦
◦
x4
◦
◦
◦
◦
◦
◦
◦
•
•
•
•
β ≤ B(3, 2)
↑
If 2 points are shattered, then using the mirror dichotomies you shatter 3 points (why?)
Bounding the Growth Function: 19 /31
Combine the bounds −→
Combining to Bound α + 2β
α
β
β
c AM
L Creator: Malik Magdon-Ismail
x1
◦
•
•
◦
◦
◦
•
◦
◦
◦
•
x2
•
◦
•
◦
◦
•
◦
◦
◦
•
◦
x3
•
•
◦
◦
•
◦
◦
◦
•
◦
◦
x4
◦
◦
◦
◦
◦
◦
◦
•
•
•
•
B(4, 3) = α + β + β
≤ B(3, 3) + B(3, 2)
The argument generalizes to (N , k)
B(N , k) ≤ B(N − 1, k) + B(N − 1, k − 1)
Bounding the Growth Function: 20 /31
Simple boundary cases −→
Boundary Cases: B(N , 1) and B(N , N )
1
N
1
1
2
1
3
1
4
1
5
1
6
..
1
..
c AM
L Creator: Malik Magdon-Ismail
2
3
k
4
5
6
···
3
B(N , 1) = 1
(why?)
7
B(N , N ) = 2N − 1
15
(why?)
31
63
...
Bounding the Growth Function: 21 /31
Getting B(3, 2) −→
Recursion Gives B(N , k) Bound
B(N , k) ≤ B(N − 1, k) + B(N − 1, k − 1)
1
N
c AM
L Creator: Malik Magdon-Ismail
1
1
2
1
3
1
4
1
5
1
6
..
1
..
2
3
k
4
5
6
···
63
..
...
3
ց ↓
4
7
15
31
..
..
..
..
Bounding the Growth Function: 22 /31
Filling the table −→
Recursion Gives B(N , k) Bound
B(N , k) ≤ B(N − 1, k) + B(N − 1, k − 1)
1
N
c AM
L Creator: Malik Magdon-Ismail
2
3
k
4
5
1
1
2
1
3
3
1
4
7
4
1
5
11
15
5
1
6
16
26
31
6
..
1
..
7
..
22
..
42
..
57
..
Bounding the Growth Function: 23 /31
6
···
63
..
...
B(N , k) ≤
k−1
X
i=0
N
i
−→
Analytic Bound for B(N , k)
Theorem.
B(N , k) ≤
k−1 X
N
i=0
i
.
Proof: (Induction on N.)
1. Verify for N = 1: B(1, 1) ≤
k−1
P N
2. Suppose B(N, k) ≤
.
i
i=0
N
Lemma. Nk + k−1
= Nk+1 .
1
0
=1
X
B(N + 1, k) ≤ B(N, k) + B(N, k − 1)
k−1
P N k−2
P N
≤
+
i
i
=
i=0
k−1
P
i=0
= 1+
= 1+
N
i
+
i=1
k−1
P
i=1
k−1
P
N
i
k−1
P
i=0
c AM
L Creator: Malik Magdon-Ismail
Bounding the Growth Function: 24 /31
N +1
i
N +1
i
i=1
=
i=0
k−1
P
+
N
i−1
N
i−1
(lemma)
mH (N ) ≤ B(N, k) −→
mH(N ) is bounded by B(N, k)!
Theorem. Suppose that H has a break point at k. Then,
mH (N ) ≤ B(N, k).
x1
◦
◦
◦
◦
•
◦
◦
..
x2
◦
◦
◦
•
◦
◦
•
..
x3
◦
◦
•
◦
◦
•
◦
..
c AM
L Creator: Malik Magdon-Ismail
x4
◦
•
◦
◦
◦
•
•
..
. . . xN
... •
... ◦
... ◦
... ◦
... •
... •
... ◦
. . . ..
Consider any k points.
They cannot be shattered (otherwise k woud not be a break point).
B(N, k) is largest such list.
mH (N ) ≤ B(N, k)
Bounding the Growth Function: 25 /31
Once broken, forever polynomial −→
Once bitten, twice shy . . .
Once Broken, Forever Polynomial
Theorem. If k is any break point for H, so mH (k) < 2k , then
k−1 X
N
.
mH (N ) ≤
i
i=0
Facts (Problems 2.5 and 2.6):
k−1 X
N
i=0
i
≤


N k−1 + 1



k−1

eN



k−1
(polynomial in N )
This is huge: if we can replace |H| with mH (N ) in the bound, then learning is feasible.
c AM
L Creator: Malik Magdon-Ismail
Bounding the Growth Function: 26 /31
There’s good, bad, no ugly −→
A Hypothesis Set is either Good and Bad
log mH (N )
the bad H
the ugly H
the good H
N
1
2
3
2-D perceptron
2
4
8
1-D pos. ray
2
3
2-D pos. rectangles
2
4
c AM
L Creator: Malik Magdon-Ismail
N
4
mH (N )
5
···
14
···
···
≤ N3 + 1
4
5
···
···
≤ N1 + 1
8
16
< 25
···
≤ N4 + 1
Bounding the Growth Function: 27 /31
We have a bound on mH ; next: |H| ← mH ? −→
We have One Step in the Puzzle
X Can we get a polynomial bound on mH (N ) even for infinite H?
X Can we replace |H| with mH (N ) in the generalization bound?
c AM
L Creator: Malik Magdon-Ismail
Bounding the Growth Function: 28 /31
Ghost ‘test’ set D′ represents Eout −→
Eout
ր
Age
ց
Age
′
Ein
Income
The ghost data set: a ‘fictitious’ data set D ′:
Income
Ein
Income
(i) How to Deal With Eout (Sketch)
Probability distribution
′
of Ein, Ein
Age
′
Ein
is like a test error on N new points.
′
Ein deviates from Eout implies Ein deviates from Ein
.
′
Ein and Ein
have the same distribution.
′
Ein
Ein
′
P[(Ein
(g), Ein(g)) “deviate”] ≥
1
2
P [(Eout(g), Ein(g)) “deviate”]
Eout
We can analyze deviations between two in-sample errors.
c AM
L Creator: Malik Magdon-Ismail
Bounding the Growth Function: 29 /31
D ∪ D′ =⇒ mH (2N ) −→
(ii) Real Plus Ghost Data Set = 2N points
x1 x2 x3 . . . xN xN +1 xN +2 xN +3 . . . x2N
◦ ◦ • ... ◦
•
•
◦ ... ◦
Number of dichotomys is at most mH (2N ).
Up to technical details, analyze a “hypothesis set” of size at most mH (2N ).
c AM
L Creator: Malik Magdon-Ismail
Bounding the Growth Function: 30 /31
The VC-Bound −→
The Vapnik-Chervonenkis Bound (VC Bound)
P [|Ein(g) − Eout(g)| > ǫ] ≤ 4mH(2N )e−ǫ
2
N/8
,
P [|Ein(g) − Eout(g)| ≤ ǫ] ≥ 1 − 4mH(2N )e−ǫ
Eout(g) ≤ Ein(g) +
c AM
L Creator: Malik Magdon-Ismail
q
8
N
log 4mHδ(2N ) ,
Bounding the Growth Function: 31 /31
2
for any ǫ > 0.
N/8
,
for any ǫ > 0.
w.p. at least 1 − δ.