Machine Learning
Tutorial 2: KNN+Decision Trees
Tutorial Tasks
Problems w.r.t. Decision Trees
Suggestions for learning:
• KNN, probabilistic KNN : Barber, chapter 14
• NCA: original paper by Goldberger et al.
• Decision Trees: Murphy, chapter 16
Assume you are not yet fluent in matrix calculus. Can you do it "on foot", i.e. in components?
[Figure: three increasingly deep decision-tree diagrams over a single input x, with threshold splits of the form x ≤ x_jk vs. x > x_jk (thresholds x1, x21, x22, x31, …, x43) and leaf outputs a, b, y; beneath each tree, the thresholds are shown partitioning the x-axis.]
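The diagrams above can be mimicked by a tiny hard-coded 1-D threshold tree. This is purely illustrative; the concrete thresholds and leaf labels below are made up, not taken from the figures:

```python
def predict(x):
    """Tiny hard-coded 1-D decision tree with threshold splits, as in the
    diagrams. Thresholds (x1=0.5, x21=0.25, x22=0.75) and leaf labels
    (a, b, y) are illustrative placeholders."""
    x1, x21, x22 = 0.5, 0.25, 0.75
    if x <= x1:                      # left subtree: split again at x21
        return "a" if x <= x21 else "b"
    else:                            # right subtree: split again at x22
        return "y" if x > x22 else "b"
```

Each internal node tests the input against one threshold, so a path from root to leaf corresponds to one interval of the x-axis shown under each tree.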
Problem 6: Decision Trees: Random Forests (Slide 25):
Let
𝐷 denote the number of features,
𝒟 denote the whole data set,
𝒟𝑇 denote the whole training set, 𝑁 = |𝒟𝑇 |,
𝒟𝑇𝑖 denote the training set of tree 𝑇𝑖 .
To obtain 𝒟𝑇𝑖 : draw 𝑁 samples from 𝒟𝑇 with replacement.
How many distinct samples do we get on average?
(In other words: what is the expected cardinality of 𝒟𝑇𝑖 ?)
(Equivalently: what percentage of 𝒟𝑇 do we use on average for 𝒟𝑇𝑖 ?)
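Before deriving the answer, the quantity can be estimated empirically. A minimal Monte Carlo sketch (the function name is my own, not from the tutorial):

```python
import random

def avg_distinct(N, trials=20_000, seed=0):
    """Monte Carlo estimate of the expected number of distinct indices
    when drawing N indices uniformly with replacement from {0, ..., N-1},
    i.e. the expected cardinality of one bootstrap sample."""
    rng = random.Random(seed)
    return sum(len({rng.randrange(N) for _ in range(N)})
               for _ in range(trials)) / trials
```

For moderate 𝑁 the estimate settles near 63% of 𝑁, foreshadowing the exact result derived below.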
Let 𝑋 denote the random variable representing the number of distinct elements of the resulting multi-set 𝒟∗𝑇𝑖 of cardinality 𝑁 when drawing 𝑁 times with replacement from 𝒟𝑇 .

$$\mathbb{E}[X] \;=\; \sum_{X=1}^{N} X\, p(X) \;=\; \sum_{X=1}^{N} X \cdot \binom{N}{X} \cdot X! \cdot {N \brace X} \cdot \frac{1}{N^{N}}$$

where ${n \brace x}$ is a Stirling number of the second kind: the number of ways to partition a set of $n$ elements into $x$ non-empty subsets.
Denominator $N^N$: how many possibilities do I have to draw $N$ times out of a set with cardinality $N$, with replacement? Equivalently: how many possibilities do I have to place $N$ distinguishable balls into $N$ distinguishable boxes, with no constraints?
Numerator $X!\,{N \brace X}$: how many possibilities do I have to place $N$ distinguishable balls into $X$ distinguishable boxes in a surjective way?
In this balls-and-boxes picture, each ball is marked with the number $f$ of the respective draw ($f \in \{1, \dots, N\}$).
Each box corresponds to one distinct element $x$ of 𝒟∗𝑇𝑖 .
Surjective means that each box (each element $x$) occurs at least once, i.e. no box is empty: 𝒟∗𝑇𝑖 then contains exactly $X$ distinct elements, and not at most $X$. (In the standard counting table, the unconstrained placement gives our denominator and the surjective placement gives our numerator; from Mayr, DS 2005.)
The binomial coefficient $\binom{N}{X}$ maps boxes to actual training examples: it counts how many possibilities I have to choose the $X$ occurring training examples out of the $N$ available. Note: we must use the unordered $\binom{N}{X}$, because the ordering has already been taken care of when determining the sequence of distinguishable boxes (the factor $X!$ above).
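As a numerical check (my own sketch, not part of the tutorial), the Stirling-number formula can be evaluated and compared with the closed form $N\,(1 - (1 - 1/N)^N)$ obtained directly by linearity of expectation; both approach $(1 - 1/e) \approx 63.2\%$ of $|\mathcal{D}_T|$:

```python
from functools import lru_cache
from math import comb, factorial

@lru_cache(maxsize=None)
def stirling2(n, x):
    """Stirling number of the 2nd kind via S(n,x) = x*S(n-1,x) + S(n-1,x-1)."""
    if n == x:
        return 1
    if n == 0 or x == 0:
        return 0
    return x * stirling2(n - 1, x) + stirling2(n - 1, x - 1)

def expected_distinct(N):
    """E[X] = sum_{x=1}^{N} x * C(N,x) * x! * S(N,x) / N^N."""
    total = sum(x * comb(N, x) * factorial(x) * stirling2(N, x)
                for x in range(1, N + 1))
    return total / N**N

def expected_distinct_closed(N):
    """Closed form by linearity of expectation: N * (1 - (1 - 1/N)^N)."""
    return N * (1 - (1 - 1 / N) ** N)
```

For example, N = 2 gives E[X] = 1.5 by both routes, and the fraction E[X]/N tends to 1 − 1/e as N grows.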
Problem 7: For the following case with categorical features, derive a decision tree using entropy as the impurity measure (i.e. use Quinlan's ID3).
Solution source: Tom Mitchell (Carnegie Mellon), ML, Spring 2011.
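The two ingredients of ID3 — entropy and information gain for a categorical split — can be sketched as follows (a minimal sketch; the function and variable names are my own):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(S) = -sum_c p_c * log2(p_c) over the class proportions in S."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def information_gain(rows, labels, feature):
    """ID3 split criterion: H(S) minus the expected label entropy
    remaining after splitting S on a categorical feature."""
    n = len(labels)
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[feature], []).append(y)
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    return entropy(labels) - remainder
```

ID3 greedily picks, at each node, the feature with the largest information gain and recurses on the induced partitions.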
Problems w.r.t. KNN

$$d(\boldsymbol{x}, \boldsymbol{y}) = \sum_i (x_i - y_i)^2 \quad \text{or} \quad d(\boldsymbol{x}, \boldsymbol{y}) = (\boldsymbol{x} - \boldsymbol{y})^T \boldsymbol{\Sigma}\, (\boldsymbol{x} - \boldsymbol{y})$$

Let $\sigma_i^2$ be the scale of feature $i$. Then we want $\boldsymbol{\Sigma} = \mathrm{diag}(1/\sigma_1^2,\, 1/\sigma_2^2,\, \ldots,\, 1/\sigma_n^2)$.
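The rescaled (diagonal-Mahalanobis) distance with Σ = diag(1/σ₁², …, 1/σₙ²) can be sketched as follows (the function name is my own):

```python
from math import fsum

def scaled_sq_distance(x, y, sigma2):
    """(x-y)^T Sigma (x-y) with Sigma = diag(1/sigma2_1, ..., 1/sigma2_n):
    each squared feature difference is divided by that feature's variance,
    so features with large scales do not dominate the neighbour search."""
    return fsum((xi - yi) ** 2 / s2 for xi, yi, s2 in zip(x, y, sigma2))
```

With per-feature variances of 1 and 4, a difference of 1 in the first feature and a difference of 2 in the second contribute equally — exactly the scale invariance the diagonal Σ is meant to provide.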