Node centrality, data mining and machine learning concepts

Danny Hendler
Advanced Topics in on-line Social Networks Analysis
Social networks analysis seminar
Second introductory lecture
Presentation prepared by Yehonatan Cohen
Some of the slides are based on the online book "Social Media Mining" by R. Zafarani, M. A. Abbasi & H. Liu.
Talk outline
- Node centrality
  • Degree
  • Eigenvector
  • Closeness
  • Betweenness
- Transitivity measures
- Data mining & machine learning concepts
- Decision trees
- Naïve Bayes classifier
Node centrality
Name the most central/significant node:
[Figure: an example network with 13 numbered nodes]
Node centrality (continued)
Name it now!
[Figure: another drawing of the 13-node example network]
Node centrality: Applications

- Detection of the most popular actors in a network → advertising
- Identification of "super spreader" nodes → health care / epidemics
- Identification of vulnerabilities in the network structure → network design
- …
Node centrality (continued)

What makes a node central?
- Number of connections
- It is central if its removal disconnects the graph
- A high number of shortest paths pass through it
- Proximity to all other nodes
- A node is central if its neighbors are central
- …
Degree centrality
- Degree centrality is the number of a node's neighbours: C_d(v_i) = d_i, the degree of v_i
- Alternative definitions are possible:
  • Take into account connection strengths
  • Take into account connection directions
  • …
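To make this concrete, here is a minimal sketch using the networkx library; networkx and the toy five-node graph are assumptions for illustration, not part of the lecture material.

```python
import networkx as nx

# A small made-up undirected graph, just for illustration.
G = nx.Graph([(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)])

# Degree centrality as defined above: the number of each node's neighbours.
degrees = dict(G.degree())
print(degrees)  # {1: 2, 2: 2, 3: 3, 4: 2, 5: 1}

# networkx's built-in variant is one of the "alternative definitions":
# it normalizes the degree by (n - 1).
print(nx.degree_centrality(G))
```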
Degree centrality: an example
12
13
10
11
8
9
7
6
1
4
2
5
3
Node
Degree
4
4
6
3
7
3
8
3
9
3
10
3
11
2
12
2
Eigenvector centrality
- Not all neighbours are equal: popular ones (with high degree) should weigh more!
- Eigenvector centrality of node v_i: C_e(v_i) = (1/λ) Σ_j A_{j,i} C_e(v_j), where A is the adjacency matrix
- In matrix form, λ C_e = A^T C_e, i.e., C_e is an eigenvector of the adjacency matrix
- Choosing the maximum eigenvalue guarantees all vector values are positive
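A minimal power-iteration sketch of this computation, assuming numpy and a made-up 4-node adjacency matrix:

```python
import numpy as np

def eigenvector_centrality(A, iterations=100):
    """Power iteration: repeatedly apply A and re-normalize, converging
    to the eigenvector of the largest eigenvalue (all entries positive
    for a connected undirected graph)."""
    c = np.ones(A.shape[0])        # initial guess
    for _ in range(iterations):
        c = A.T @ c                # each node accumulates its neighbours' scores
        c = c / np.linalg.norm(c)  # re-normalize to avoid overflow
    return c

# Adjacency matrix of a made-up 4-node undirected graph.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(eigenvector_centrality(A))  # the degree-3 node (third entry) scores highest
```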
Eigenvector centrality: an example
Closeness centrality
- If a node is central, it can reach other nodes "quickly", i.e., with smaller average shortest paths
- C_c(v_i) = 1 / l̄_{v_i}, where l̄_{v_i} = (1/(n−1)) Σ_{v_j ≠ v_i} l_{i,j} is the average length of the shortest paths from v_i to all other nodes
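A quick sketch with networkx (an assumed library; the path graph is made up). networkx computes (n − 1) / Σ distances, i.e., exactly the reciprocal of the average shortest-path length defined above:

```python
import networkx as nx

G = nx.path_graph(5)  # nodes 0..4 in a line, for illustration

# closeness = 1 / (average shortest-path length to all other nodes)
print(nx.closeness_centrality(G))
# The middle node (2) reaches everyone fastest, so it scores highest.
```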
Closeness centrality: an example
[Figure: the 13-node example network]

Node   Closeness
4      0.353
6      0.438
7      0.444
8      0.400
9      0.428
10     0.342
11
12
Betweenness centrality
- C_b(v_i) = Σ_{s ≠ v_i ≠ t} σ_{s,t}(v_i) / σ_{s,t}, where σ_{s,t} is the number of shortest paths between nodes s and t, and σ_{s,t}(v_i) is the number of those paths that pass through v_i
Betweenness centrality: an example

[Figure: the 13-node example network]

Node   Betweenness
4      30
6      39
7      36
8      21.5
9      7.5
10     20.5
11
12
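A minimal networkx sketch (assumed library, made-up graph); normalized=False returns raw path counts comparable to the table above:

```python
import networkx as nx

# Made-up graph: a triangle (1, 2, 3) with a tail 3-4-5.
G = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)])

# For every pair (s, t), the fraction of shortest s-t paths passing
# through the node, summed over all pairs.
print(nx.betweenness_centrality(G, normalized=False))
# Cut vertices 3 and 4 carry all cross-side paths and score highest.
```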
Talk outline
- Node centrality
  • Degree
  • Eigenvector
  • Closeness
  • Betweenness
- Transitivity measures
- Data mining & machine learning concepts
- Decision trees
- Naïve Bayes classifier
Transitivity measures
- Link prediction: which links are more likely to appear?
- Transitivity is typical in social networks: a friend of my friend is likely to become my friend
- We need measures for such link-formation behaviour
(Global) Clustering Coefficient

C = (3 × number of triangles) / (number of connected triplets)

Example (four-node graph from the figure):
Triangles: {v1,v2,v3}, {v1,v3,v4}
Triplets:  (v1,v2,v3), (v2,v3,v1), (v3,v1,v2),
           (v1,v3,v4), (v3,v4,v1), (v4,v1,v3),
           (v1,v2,v4), (v2,v3,v4)

C = 6/8
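As a cross-check, a networkx sketch (assumed library) of the same example; the edge set below is reconstructed from the triangles and triplets listed above:

```python
import networkx as nx

# Edges consistent with the example: two triangles sharing edge v1-v3.
G = nx.Graph([("v1", "v2"), ("v2", "v3"), ("v1", "v3"),
              ("v1", "v4"), ("v3", "v4")])

# transitivity() is the global clustering coefficient:
# 3 * (number of triangles) / (number of connected triplets)
print(nx.transitivity(G))  # 0.75 == 6/8
```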
Local Clustering Coefficient

C(v_i) = |{e_jk : v_j, v_k ∈ N_i, e_jk ∈ E}| / (k_i (k_i − 1)/2)

The numerator is the number of connected neighbor pairs (edges among v_i's neighbors N_i); the denominator k_i(k_i − 1)/2 is the total number of neighbor pairs, where k_i = |N_i| is the degree of v_i.
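A hand-rolled sketch of the local coefficient (networkx is used only to store the made-up graph and to cross-check):

```python
from itertools import combinations
import networkx as nx

def local_clustering(G, v):
    """C(v) = (# edges among v's neighbours) / (# pairs of neighbours)."""
    neighbours = list(G[v])
    k = len(neighbours)
    if k < 2:
        return 0.0  # no neighbour pairs to connect
    linked = sum(1 for a, b in combinations(neighbours, 2)
                 if G.has_edge(a, b))
    return linked / (k * (k - 1) / 2)

G = nx.Graph([("v1", "v2"), ("v2", "v3"), ("v1", "v3"),
              ("v1", "v4"), ("v3", "v4")])
print(local_clustering(G, "v1"))  # 2 of v1's 3 neighbour pairs are linked -> 2/3
print(nx.clustering(G, "v1"))     # networkx agrees: 0.666...
```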
Talk outline
- Node centrality
  • Degree
  • Eigenvector
  • Closeness
  • Betweenness
- Transitivity measures
- Data mining & machine learning concepts
- Decision trees
- Naïve Bayes classifier
Big Data

- The data production rate has increased dramatically
  • Social media data, mobile phone data, healthcare data, purchase data, …

[Image taken from "Data Science and Prediction", CACM, December 2013]
Data mining / Knowledge Discovery in Databases (KDD)

- Infer actionable knowledge/insights from data, e.g.:
  • When men buy diapers on Fridays, they also buy beer
  • Email-spamming accounts tend to cluster in communities
  • Both love & hate drive reality-show ratings
- Involves several tasks:
  • Anomaly detection
  • Association rule learning
  • Classification
  • Regression
  • Summarization
  • Clustering
Data mining process
Data instances
Data instances (continued)
- An unlabeled example vs. a labeled example
- Example task: predict whether an individual who visits an online bookseller will buy a specific book
Categories of ML algorithms
- Supervised learning algorithms
  • Classification (class attribute is discrete): assign data into predefined classes
    e.g., spam detection, fraudulent credit-card detection
  • Regression (class attribute takes real values): predict a real value for a given data instance
    e.g., predict the price of a given house
- Unsupervised learning algorithms
  • Group similar items together into clusters
    e.g., detect communities in a given social network
Supervised learning process
- We are given a set of labeled examples
- These examples are records/instances in the format (x, y), where x is a vector and y is the class attribute, commonly a scalar
- The supervised learning task is to build a model that maps x to y (find a mapping m such that m(x) = y)
- Given an unlabeled instance (x', ?), we compute m(x'), e.g., a fraud/non-fraud prediction
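A minimal fit/predict sketch with scikit-learn (an assumed library; the two-feature fraud data is made up):

```python
from sklearn.neighbors import KNeighborsClassifier

# Labeled examples (x, y): x is a feature vector, y the class attribute.
X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_train = ["non-fraud", "non-fraud", "fraud", "fraud"]

# Learn a mapping m such that m(x) = y.
m = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# Given an unlabeled instance (x', ?), compute m(x'):
print(m.predict([[1, 0]]))  # -> ['fraud']
```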
Talk outline
 Node centrality
• Degree
• Eigenvector
• Closeness
• Betweeness
 Transitivity measures
 Data mining & machine learning concepts
 Decision trees
 Naïve Bayes classifier
Decision tree learning - an example

Model (decision tree):

Refund?
├─ Yes → No
└─ No → MarSt?
        ├─ Married → No
        └─ Single, Divorced → TaxInc?
                              ├─ < 80K → No
                              └─ > 80K → Yes

Refund, MarSt and TaxInc are the splitting attributes; the leaf values are the class labels (Cheat = Yes/No).

Training data:

Tid  Refund  Marital status  Taxable income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
Purity is measured by entropy
- Features are selected based on set purity
- To measure purity we can use (and minimize) entropy. Over a subset of training instances T with a binary class attribute (values in {+,−}), the entropy of T is defined as:

  Entropy(T) = −p₊ log₂(p₊) − p₋ log₂(p₋)

  where p₊ is the proportion of positive examples in T and p₋ is the proportion of negative examples in T
Entropy example
Assume a subset T containing 10 instances, where seven instances have a positive class attribute value and three have a negative one [7+, 3−]. The entropy of T is

  Entropy(T) = −0.7 log₂(0.7) − 0.3 log₂(0.3) ≈ 0.88

What is the range of entropy values? [0, 1]: 0 for a pure subset, 1 for a perfectly balanced one.
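The calculation above, as a small Python helper (a sketch, not from the slides):

```python
import math

def entropy(p_pos):
    """Entropy of a binary class distribution given p+, with p- = 1 - p+."""
    if p_pos in (0.0, 1.0):
        return 0.0  # a pure subset has zero entropy
    p_neg = 1.0 - p_pos
    return -p_pos * math.log2(p_pos) - p_neg * math.log2(p_neg)

print(entropy(0.7))  # the [7+, 3-] subset above: ~0.881
print(entropy(0.5))  # balanced -> 1.0
print(entropy(1.0))  # pure -> 0.0
```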
Information gain (IG)
- We select the feature that is most useful in separating between the classes to be learnt, based on IG
- IG is the difference between the entropy of the parent node and the weighted average entropy of the child nodes (each child weighted by its fraction of the parent's instances)
- We select the feature that maximizes IG
Information gain calculation example

[Worked example (figures): the IG of each candidate splitting feature is computed and the feature with the maximum IG is chosen]
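A sketch of the IG computation in Python (our own helper, not the slides' worked figures); the example split is Refund on the tax training data, whose root class distribution is [3+, 7−]:

```python
import math

def H(pos, neg):
    """Binary entropy of a subset given its class counts."""
    p = pos / (pos + neg)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def information_gain(parent, children):
    """Entropy of the parent minus the weighted average child entropy."""
    n = sum(pos + neg for pos, neg in children)
    avg = sum((pos + neg) / n * H(pos, neg) for pos, neg in children)
    return H(*parent) - avg

# Splitting the [3+, 7-] root on Refund:
# Refund=Yes -> [0+, 3-], Refund=No -> [3+, 4-]
print(information_gain((3, 7), [(0, 3), (3, 4)]))  # ~0.19
```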
Decision tree construction: example

The tree is grown greedily on the training data above, one splitting attribute at a time:
1. Refund is chosen as the first splitting attribute; the Refund = Yes branch (Tids 1, 4, 7) is pure and becomes a NO leaf.
2. The Refund = No branch is split on MarSt; the Married branch (Tids 2, 6, 9) is pure and becomes a NO leaf.
3. The remaining Single/Divorced instances are split on TaxInc at 80K: < 80K becomes a NO leaf (Tid 3), > 80K a YES leaf (Tids 5, 8, 10).

Model (decision tree):

Refund?
├─ Yes → NO
└─ No → MarSt?
        ├─ Married → NO
        └─ Single, Divorced → TaxInc?
                              ├─ < 80K → NO
                              └─ > 80K → YES
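For comparison, a scikit-learn sketch on the same training data (the library and the numeric encoding are our assumptions; the greedy learner need not reproduce the slides' hand-built splits exactly):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Encode the table: Refund (1 = Yes), Married (1 = Married,
# 0 = Single/Divorced), taxable income in thousands.
refund  = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
married = [0, 1, 0, 1, 0, 1, 0, 0, 1, 0]
income  = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]
cheat   = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

X = list(zip(refund, married, income))
clf = DecisionTreeClassifier(criterion="entropy").fit(X, cheat)

# Print the learned splits; entropy-based IG drives each choice.
print(export_text(clf, feature_names=["Refund", "Married", "TaxInc"]))
```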
Talk outline
- Node centrality
  • Degree
  • Eigenvector
  • Closeness
  • Betweenness
- Transitivity measures
- Data mining & machine learning concepts
- Decision trees
- Naïve Bayes classifier
Naïve Bayes classifier

- Let Y represent the class variable with class values (y_1, y_2, …, y_n)
- Let X = (x_1, x_2, …, x_m) be an unclassified instance (feature vector)
- The Naïve Bayes classifier estimates: y = argmax_{y_i} P(y_i | X)
- From Bayes' formula: P(y_i | X) = P(X | y_i) P(y_i) / P(X)
- Conditional-independence assumption: P(X | y_i) = ∏_{j=1}^{m} P(x_j | y_i)
- Therefore: P(y_i | X) = (∏_{j=1}^{m} P(x_j | y_i)) P(y_i) / P(X)
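A from-scratch sketch of exactly this estimate (the training data is made up; P(X) is dropped since it is the same for every class):

```python
from collections import Counter

def naive_bayes_predict(train, x):
    """train: list of (feature_tuple, label); x: feature tuple to classify.
    Returns argmax over y of P(y) * prod_j P(x_j | y)."""
    n = len(train)
    prior = Counter(label for _, label in train)
    best, best_score = None, -1.0
    for y, n_y in prior.items():
        score = n_y / n                                  # P(y_i)
        for j, xj in enumerate(x):
            n_match = sum(1 for f, label in train
                          if label == y and f[j] == xj)
            score *= n_match / n_y                       # P(x_j | y_i)
        if score > best_score:
            best, best_score = y, score
    return best

# Made-up categorical data: (weather, day) -> buys?
train = [(("sunny", "weekend"), "Y"), (("sunny", "weekday"), "Y"),
         (("sunny", "weekend"), "Y"), (("rainy", "weekend"), "N"),
         (("rainy", "weekday"), "N")]
print(naive_bayes_predict(train, ("rainy", "weekend")))  # -> 'N'
```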
Naïve Bayes classifier: example

P(y_i | X) = P(X | y_i) P(y_i) / P(X)

[Worked example (figures): class priors and per-feature conditional probabilities are estimated from a training table and multiplied for an unlabeled instance i8; the score of class N exceeds that of class Y, so the prediction is y(i8) = N]
Classification quality metrics

- Binary classification
- (Instances, class labels): (x1, y1), (x2, y2), …, (xn, yn)
- yi is {1, −1}-valued
- A classifier provides a class prediction Ŷ for an instance
- Outcomes for a prediction:

                        True class: 1        True class: −1
  Predicted class  1    True positive (TP)   False positive (FP)
  Predicted class −1    False negative (FN)  True negative (TN)
Classification quality metrics (cont'd)

- P(Ŷ = Y): accuracy = (TP + TN) / (TP + FP + FN + TN)
- P(Ŷ = 1 | Y = 1): true positive rate / recall / sensitivity = TP / (TP + FN)
- P(Ŷ = 1 | Y = −1): false positive rate = FP / (FP + TN)
- P(Y = 1 | Ŷ = 1): precision = TP / (TP + FP)

                        True class: 1        True class: −1
  Predicted class  1    True positive (TP)   False positive (FP)
  Predicted class −1    False negative (FN)  True negative (TN)
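The four metrics as a small helper (a sketch; the counts fed to it below are hypothetical):

```python
def classification_metrics(tp, fp, fn, tn):
    """Quality metrics from the confusion-matrix counts above."""
    return {
        "accuracy":  (tp + tn) / (tp + fp + fn + tn),
        "recall":    tp / (tp + fn),   # true positive rate / sensitivity
        "fpr":       fp / (fp + tn),   # false positive rate
        "precision": tp / (tp + fp),
    }

# Hypothetical counts, just to exercise the formulas:
print(classification_metrics(tp=40, fp=10, fn=20, tn=30))
# {'accuracy': 0.7, 'recall': 0.666..., 'fpr': 0.25, 'precision': 0.8}
```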
Classification quality metrics: example
- Consider a diagnostic test for a disease
- The test has 2 possible outcomes:
  • 'positive' = suggesting presence of the disease
  • 'negative' = suggesting absence of the disease
- An individual can test either positive or negative for the disease
Classification quality metrics: example (continued)

[Figures: two overlapping distributions of test results, one for individuals without the disease and one for individuals with the disease. A threshold on the test result calls patients to its left "negative" and to its right "positive"; successive slides shade the regions corresponding to the true positives, false positives, true negatives, and false negatives.]
Machine Learning: Cross-Validation
- What if we don't have enough data to set aside a test dataset?
- Cross-validation: each data point is used both as train and as test data
- Basic idea:
  • Fit the model on 90% of the data; test on the other 10%
  • Now do this on a different 90/10 split
  • Cycle through all 10 cases
  • 10 "folds" is a common rule of thumb
Machine Learning: Cross-Validation
- Divide the data into 10 equal pieces P1…P10
- Fit 10 models, each on 90% of the data
- Each data point is treated as an out-of-sample data point by exactly one of the models

model  P1     P2     P3     P4     P5     P6     P7     P8     P9     P10
1      train  train  train  train  train  train  train  train  train  test
2      train  train  train  train  train  train  train  train  test   train
3      train  train  train  train  train  train  train  test   train  train
4      train  train  train  train  train  train  test   train  train  train
5      train  train  train  train  train  test   train  train  train  train
6      train  train  train  train  test   train  train  train  train  train
7      train  train  train  test   train  train  train  train  train  train
8      train  train  test   train  train  train  train  train  train  train
9      train  test   train  train  train  train  train  train  train  train
10     test   train  train  train  train  train  train  train  train  train
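The 10-fold scheme above, sketched with scikit-learn (the library, dataset, and model are assumptions for illustration):

```python
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=10, shuffle=True, random_state=0)

scores = []
for train_idx, test_idx in kf.split(X):
    # Fit on 9 pieces (90% of the data), test on the held-out piece.
    model = KNeighborsClassifier().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

# Every point was out-of-sample for exactly one model.
print(sum(scores) / len(scores))
```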