Lecture notes

Classification and Clustering
CS102 - May 10

Supervised versus Unsupervised Machine Learning
● Supervised: Create a model from well-understood training data, use it for inference or prediction about other data. Examples: regression, classification
● Unsupervised: Try to understand the data, look for patterns or structure. Examples: data mining, clustering
● Also in-between approaches, such as semi-supervised learning and active learning

Classification
Goal: Given a set of feature values (often just called features) for an item not seen before, decide which one of a set of predefined categories the item belongs to.
Supervised: training data is a set of items with both features and categories (labeled data)
Examples:
● Customer purchases - features: age, income, gender, zipcode, profession; categories: likelihood of buying (high, medium, low)
● Medical diagnosis - features: age, gender, history, symptom1-severity, symptom2-severity, test1-result, test2-result; categories: diagnosis
● Fraud detection in online purchases - features: item, volume, price, shipping, address; categories: fraud or okay
Classification using k-nearest neighbors (kNN)
For any pair of items i1, i2 and their features, can compute distance(i1, i2)
● Example: combine numeric measures using equality for gender, profession; absolute difference for age, income; proximity for zipcode
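One way such a mixed-feature distance might be written, as a rough sketch for the customer-purchases features (the dict field names and scaling constants are made up for illustration, not from the lecture):

```python
# Hypothetical distance for the customer-purchases example: equality for
# gender/profession, absolute difference for age/income, proximity for zipcode.
def distance(i1, i2):
    d = 0.0
    d += 0 if i1["gender"] == i2["gender"] else 1               # equality
    d += 0 if i1["profession"] == i2["profession"] else 1       # equality
    d += abs(i1["age"] - i2["age"]) / 100                       # absolute difference, scaled
    d += abs(i1["income"] - i2["income"]) / 100000              # absolute difference, scaled
    d += abs(int(i1["zipcode"]) - int(i2["zipcode"])) / 100000  # crude zip-code proximity
    return d
```

The scaling factors only keep any one feature from dominating the sum; choosing them well is part of designing a good distance function.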
To classify a new item i: Find the k closest items to i, assign the most frequent category

Weather data:
● Categories based on temperature (frigid, cold, cool, comfy, warm)
● Predict category based on latitude and longitude
● Distance is Euclidean distance in two-dimensional space
● Show TempsCat spreadsheet, LatLongCats scatterplot
● Try two cities: Dallas, Texas (long 96.8, lat 32.8); Davenport, Iowa (long 90.6, lat 41.5)
  Truth: Dallas comfy, Davenport cold

Can also use for regression via average values:
● Customer purchases - predict dollars spent
● Weather - predict temperature from latitude and longitude
● Show LatLongTemps scatterplot - Truth: Dallas temp 34, Davenport temp 13

Note: No hidden complicated math! Once the distance function is defined, the rest is easy (though not necessarily efficient).
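A minimal sketch of kNN classification and kNN regression for the weather data, assuming the TempsCat rows are available as (longitude, latitude, category, temperature) tuples; the variable names, tuple layout, and choice of k are illustrative:

```python
import math
from collections import Counter

# cities: list of (longitude, latitude, category, temperature) tuples,
# e.g. loaded from the TempsCat spreadsheet (loading not shown)
def knn_classify(cities, lon, lat, k=5):
    # Euclidean distance in (longitude, latitude) space
    nearest = sorted(cities, key=lambda c: math.hypot(c[0] - lon, c[1] - lat))[:k]
    # most frequent category among the k nearest cities
    return Counter(c[2] for c in nearest).most_common(1)[0][0]

def knn_regress(cities, lon, lat, k=5):
    nearest = sorted(cities, key=lambda c: math.hypot(c[0] - lon, c[1] - lat))[:k]
    # average temperature of the k nearest cities
    return sum(c[3] for c in nearest) / k

# knn_classify(cities, 96.8, 32.8)   # Dallas
# knn_classify(cities, 90.6, 41.5)   # Davenport
```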
Classification using decision tree
Yes/no or partition questions over feature values
Navigate to bottom of the tree, find category
● Customer purchases - gender split, then age partitions, then income; categories on the leaves. Show classification of a new customer.
● Weather - Show TempsCat, want to predict category from other “features”. Speculate latitude as most discriminating (sort on lat), but what next?
Primary challenge is in building “good” trees from training data
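To make the navigate-to-a-leaf idea concrete, here is a tiny hand-built tree for the customer-purchases example; the split points and leaf categories are invented for illustration (in practice the tree is learned from the labeled training data):

```python
# Illustrative hand-built tree: gender split, then age partitions, then income;
# categories (high/medium/low likelihood of buying) sit on the leaves.
def classify_customer(c):
    if c["gender"] == "F":
        if c["age"] < 30:
            return "high" if c["income"] > 60000 else "medium"
        else:
            return "medium"
    else:
        if c["age"] < 30:
            return "medium"
        else:
            return "high" if c["income"] > 100000 else "low"

# classify_customer({"gender": "F", "age": 27, "income": 75000})  -> "high"
```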
Classification using Naive Bayes
Conditional independence assumption: Given two features X1 and X2 and a category Y, the probability that X1=v1 for an item in category Y is independent of the probability that X2=v2 for that item. (Customer purchases example: might be true for gender and age, unlikely to be true for income and zipcode.) This assumption is what makes the approach “naive”.
Algorithm:
● Calculate from training data:
  1. Fraction (probability) of items in each category
  2. For each category, fraction (probability) of items in that category with X=v, for each feature X and value v
● To classify a new item i: for each category Y compute the probability of being in category Y (1) times the probability of i’s feature values given category Y (product of 2’s). Pick the category with the highest result.
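A minimal sketch of both steps for the weather example, assuming the Bayes.csv rows are available as (region, coastal, category) tuples; the field names and tuple layout are assumptions for illustration:

```python
from collections import Counter, defaultdict

# rows: list of (region, coastal, category) tuples, e.g. loaded from Bayes.csv
def train(rows):
    n = len(rows)
    cat_counts = Counter(cat for _, _, cat in rows)
    prior = {cat: cnt / n for cat, cnt in cat_counts.items()}   # step 1: P(category)
    cond = defaultdict(float)                                   # step 2: P(feature=value | category)
    for region, coastal, cat in rows:
        cond[("region", region, cat)] += 1 / cat_counts[cat]
        cond[("coastal", coastal, cat)] += 1 / cat_counts[cat]
    return prior, cond

def classify(prior, cond, region, coastal):
    # pick the category maximizing P(Y) * P(region|Y) * P(coastal|Y)
    scores = {cat: p * cond[("region", region, cat)] * cond[("coastal", coastal, cat)]
              for cat, p in prior.items()}
    return max(scores, key=scores.get)
```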
Example: Predict temperature category from region and coastal. Show Bayes.csv
Coastal city in Northeast, probabilities:
  warm:   0.1  * 1    * 0    = 0
  comfy:  0.27 * 0.8  * 0    = 0
  cool:   0.28 * 0.44 * 0.13 = 0.016
  cold:   0.25 * 0.29 * 0.15 = 0.011
  frigid: 0.09 * 0    * 0.2  = 0
Non-coastal city in Southatlantic, probabilities:
  warm:   0.1  * 0    * 0.5  = 0
  comfy:  0.27 * 0.2  * 0.41 = 0.022
  cool:   0.28 * 0.56 * 0.13 = 0.020
  cold:   0.25 * 0.71 * 0    = 0
  frigid: 0.09 * 1    * 0    = 0
Some hidden math/logic, but mostly just arithmetic.
Disclaimer: Most presentations of the Naive Bayes algorithm (including the readings linked on the website) extend the classification step: “... for each category Y compute the probability of being in category Y (1) times the probability of the feature values given category Y (product of 2’s), divided by the probability of those feature values. Pick the category with the highest result.” Since that last value doesn’t change across categories, the division doesn’t change the outcome.
Classification using logistic regression
Recall: Logistic regression uses training data to compute a function f(x1,...,xn-1) that gives the probability of the result xn being “yes”. Generalize to probabilities of xn falling in a predefined set of categories. As with Naive Bayes, classify xn as belonging to the category with the highest probability.
Lots of hidden math.
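For comparison, a minimal sketch using scikit-learn's logistic regression, which handles multiple categories; X is assumed to be a list of [latitude, longitude] pairs and y the matching list of temperature categories (variable names and data loading are illustrative):

```python
from sklearn.linear_model import LogisticRegression

# X: list of [latitude, longitude] pairs; y: matching temperature categories,
# e.g. taken from the TempsCat data (loading not shown)
model = LogisticRegression()
model.fit(X, y)

print(model.predict_proba([[32.8, 96.8]]))   # probability of each category for Dallas
print(model.predict([[32.8, 96.8]]))         # category with the highest probability
```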
Clustering
Again multidimensional feature space, distance metric between items
Goal: Partition set of items into k groups (clusters) such that items within groups are “close to each other”
Unsupervised, no training data

Clustering using k-means
● Each group has a mean value
● For each item compute square of distance from mean value for its group
● Goal is to find k groups that minimize sum of those squares
Weather example: Cluster cities into six groups based on latitude/longitude
Distance is Euclidean distance in two-dimensional space
Show: Points.jpg, Clusters.jpg, ClusterMeans.jpg
Note clusters need not be of similar sizes.
As with k-nearest neighbors, no hidden complicated math, but efficient algorithm is non-trivial (actually, impossible).
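A minimal sketch of the standard k-means heuristic (Lloyd's algorithm) for the latitude/longitude example, assuming points is a list of (longitude, latitude) pairs; the variable names and the fixed iteration count are illustrative:

```python
import math
import random

# points: list of (longitude, latitude) pairs, one per city (loading not shown)
def kmeans(points, k=6, iterations=20):
    means = random.sample(points, k)                  # start from k random points as means
    for _ in range(iterations):
        # assign each point to the group with the closest mean
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda idx: math.dist(p, means[idx]))
            groups[j].append(p)
        # recompute each group's mean (keep the old mean if a group is empty)
        means = [(sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else m
                 for g, m in zip(groups, means)]
    return means, groups
```

Lloyd's algorithm is only a heuristic: it converges to a local minimum of the sum of squares rather than the guaranteed optimum.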