
cs6501: PoKER
Class 3:
Probabilistic
Reasoning
Spring 2010
University of Virginia
David Evans
Bayes’ Theorem
Plan
• One-line Proof of Bayes’ Theorem
• Inductive Learning
Home Game this Thursday, 7pm! (Game start: 7:15pm)
This is not an official course activity.
Email me by Wednesday afternoon if you are coming.
Bayes’ Theorem Proof
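Written out, the one-line proof just equates the two ways of expanding a joint probability:

\[
P(A \mid B)\,P(B) \;=\; P(A \wedge B) \;=\; P(B \mid A)\,P(A)
\quad\Longrightarrow\quad
P(A \mid B) \;=\; \frac{P(B \mid A)\,P(A)}{P(B)} .
\]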
Inductive Learning
“Learning from Examples”
[Diagram: Input (training examples) → Machine Learning Learner → Output (hypothesis)]
Limits of Induction
Karl Popper, Science as Falsification, 1963.
It was the summer of 1919 that I began to feel more and more
dissatisfied with these three theories—the Marxist theory of
history, psycho-analysis, and individual psychology; and I began
to feel dubious about their claims to scientific status. My
problem perhaps first took the simple form, “What is wrong with
Marxism, psycho-analysis, and individual psychology? Why are
they so different from physical theories, from Newton's theory,
and especially from the theory of relativity?”
To make this contrast clear I should explain that few of us at the
time would have said that we believed in the truth of Einstein's
theory of gravitation. This shows that it was not my doubting the
truth of those three other theories which bothered me, but
something else.
One can sum up all this by saying that the criterion of the scientific
status of a theory is its falsifiability, or refutability, or testability.
Deciding on h
• Many hypotheses fit the training data
• Depends on hypothesis space: what types of functions?
• Pick between a simple hypothesis function (that may not fit exactly) and a complex one
• How many functions are there for X: n bits, Y: 1 bit?
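For the last bullet: each of the 2^n possible inputs can be mapped independently to either output value, so

\[
\bigl|\{\, f : \{0,1\}^n \to \{0,1\} \,\}\bigr| \;=\; 2^{2^n} .
\]

For n = 10 input bits that is already 2^1024 hypotheses, far more than any training set can distinguish.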
Forms of Inductive Learning
• Supervised Learning
  Given: example inputs with labeled outputs
  Output: hypothesis function
• Unsupervised Learning (no explicit outputs)
  Given: example inputs only
  Output: clustering
• Reinforcement Learning
  Given: sequence of decisions and their outcomes
  Output: policy (what to decide in each situation)
  Feedback: overall reward, not labels for individual decisions
First Reinforcement Learner (?)
Arthur Samuel. Some Studies in Machine Learning
Using the Game of Checkers. IBM Journal, 1959.
No feedback for individual decisions (output), but overall feedback.
Earlier inductive learning paper:
R. J. Solomonoff. An Inductive Inference Machine, 1956.
(and neural networks studied earlier)
Spam Filtering
Supervised Learning: Spam Filter
[Diagram: labeled training messages (Message 1, Message 2, …, Message N, each labeled Spam or Not Spam) → Learner → spam classifier]
Feature Extraction
X-Sender-IP: 78.128.95.196
To: [email protected]
From: Nicole Cox <[email protected]>
Subject: Job Offer
Date: Thu, 22 Apr 2010 10:10:45 +0300 (EEST)
Dear David,
Do you want to participate in the greatest Mystery Shopping quests nationwide? Have you
ever wondered how Mystery Shoppers are recruited and how prosperous companies keep up
doing business in the highly competitive business world? The answer is that many companies
are recruiting young, creative, observant, and responsible individuals like you to give their
feedback on various products and customer services and thus improve their quality.
As a Mystery Shopper you have only one responsibility: Act as a real customer while
evaluating the place you are sent to mystery shop and enjoy all the benefits that go along with
your job. Remember that you have nothing to lose, because you are awarded generously for
your efforts:
-You get paid between $10 and $40 per hour for each mystery shopping assignment;
-You keep all things that you have purchased for free;
-You watch movies, eat in restaurants, and visit amusement parks for free;
-You are turning your most enjoyable hobby into a well-paying activity.
Be aware that as a Mystery Shopper you can earn on average $100 to $300 per week. The
Features
F1 = From: <someone in address book>
F2 = Subject: *FREE*
F3 = Body: *enlargement*
F4 = Date: <today>
F5 = Body: <contains URL>
F6 = To: [email protected]
…
Note: this assumes we already know what the features are! Need to learn them.
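A minimal sketch of extracting binary features like these from a raw message, using Python's email module; the address book and the exact feature tests here are hypothetical stand-ins.

# Sketch: turn a raw message into binary features in the style of F1-F6.
# The address book entries are hypothetical placeholders.
import re
from email import message_from_string
from email.utils import parseaddr

ADDRESS_BOOK = {"alice@example.org", "bob@example.org"}   # hypothetical

def extract_features(raw_message, address_book=ADDRESS_BOOK):
    msg = message_from_string(raw_message)
    body = msg.get_payload() if not msg.is_multipart() else ""
    subject = msg.get("Subject", "").lower()
    return {
        "F1: From: <someone in address book>":
            parseaddr(msg.get("From", ""))[1].lower() in address_book,
        "F2: Subject: *FREE*": "free" in subject,
        "F3: Body: *enlargement*": "enlargement" in body.lower(),
        "F5: Body: <contains URL>": bool(re.search(r"https?://", body)),
        # F4 (Date: <today>) and F6 (To: <a particular address>) omitted for brevity.
    }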
Bayesian Filtering

Feature                                Number of Spam   Number of Ham
F1: From: <someone in address book>                 2            4052
F2: Subject: *FREE*                              3058               2
F3: Body: *enlargement*                           253               1
F4: Date: <today>                                 304            5423
F5: Body: <contains URL>                         3630             263
…                                                   …               …
Total messages: 4000 Spam / 6000 Ham
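Plugging one row of the table into Bayes' theorem, with priors taken from the 4000/6000 totals, gives for example:

\[
P(\text{Spam} \mid F_2)
= \frac{P(F_2 \mid \text{Spam})\,P(\text{Spam})}
       {P(F_2 \mid \text{Spam})\,P(\text{Spam}) + P(F_2 \mid \text{Ham})\,P(\text{Ham})}
= \frac{\frac{3058}{4000}\cdot 0.4}
       {\frac{3058}{4000}\cdot 0.4 + \frac{2}{6000}\cdot 0.6}
\approx 0.999 .
\]

A message matching F2 is thus almost certainly spam; one matching F1 gets a correspondingly tiny spam probability.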
Combining Probabilities
Naïve Bayesian Model: assume all features are independent
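Under that independence assumption, the per-feature evidence combines by multiplication (the standard naïve Bayes form):

\[
P(\text{Spam} \mid F_1,\dots,F_n)
\;\propto\; P(\text{Spam}) \prod_{i=1}^{n} P(F_i \mid \text{Spam}),
\]

and the message is classified as spam when this exceeds the corresponding product for Ham.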
Learning the Features
Make every <context, token> pair a feature. Which ones should we keep?
[Diagram: labeled messages → Learner → per-feature spam likelihoods]
Feature                 Spam Likelihood
F1: Subject: *Poker*               0.03
F2: Subject: *FREE*               0.999
…                                     …
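A minimal sketch of how such a table could be learned from labeled messages; the +1 smoothing, the equal-prior likelihood, and the keep-threshold are illustrative choices, not the method the slide (or the cited papers) prescribes.

# Sketch: estimate a spam likelihood for every observed <context, token> feature
# from labeled message counts, then keep only the most discriminative ones.
from collections import Counter

def learn_spam_likelihoods(spam_msgs, ham_msgs, extract_features, keep=0.2):
    # extract_features(msg) is assumed to yield the <context, token> features
    # present in a message.
    spam_counts, ham_counts = Counter(), Counter()
    for msg in spam_msgs:
        spam_counts.update(set(extract_features(msg)))
    for msg in ham_msgs:
        ham_counts.update(set(extract_features(msg)))

    n_spam, n_ham = len(spam_msgs), len(ham_msgs)
    likelihoods = {}
    for feature in set(spam_counts) | set(ham_counts):
        p_f_spam = (spam_counts[feature] + 1) / (n_spam + 2)    # smoothed P(F | Spam)
        p_f_ham  = (ham_counts[feature] + 1) / (n_ham + 2)      # smoothed P(F | Ham)
        likelihoods[feature] = p_f_spam / (p_f_spam + p_f_ham)  # spam likelihood, equal priors

    # Keep only features whose likelihood is far from the uninformative 0.5.
    return {f: p for f, p in likelihoods.items() if abs(p - 0.5) >= keep}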
Bayesian Spam Filtering
Patrick Pantel and Dekang Lin. SpamCop: A
Spam Classification & Organization Program.
AAAI-98 Workshop on Text Classification,
1998.
Paul Graham. A Plan for Spam (2002), Better
Bayesian Filtering (2003)
SpamAssassin Bayesian Filter
Testing Learners
K-Fold Cross Validation
Randomly partition the training data into k folds (usually 10).
[Diagram: Training Data → Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 → Learner → Score]
Use k-1 folds for training, test on the unused fold (repeat, leaving each fold out in turn).
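A minimal sketch of the procedure, assuming hypothetical train(examples) and score(model, examples) callables; libraries such as scikit-learn provide this directly.

# Sketch of k-fold cross validation: partition the data, train on k-1 folds,
# test on the held-out fold, and average the per-fold scores.
import random

def k_fold_cross_validation(data, train, score, k=10, seed=0):
    data = list(data)
    random.Random(seed).shuffle(data)           # random partition
    folds = [data[i::k] for i in range(k)]      # k roughly equal folds

    fold_scores = []
    for i in range(k):
        test_fold = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train(training)                 # train on the other k-1 folds
        fold_scores.append(score(model, test_fold))
    return sum(fold_scores) / k                 # average score across folds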
Concerns
• Limits of Naïve Bayesian Model
• How many features?
• Expressiveness of learned features
• What if …
Adversarial Spam
• Player 1: Spammer
  – Goal: create a spam that tricks the filter, or make the filter reject ham
• Player 2: Filter
  – Goal: not be tricked (but do not reject ham messages)
Does this game have a Nash equilibrium?
How Many Ways Can You Spell V1@gra?
600,426,974,379,824,381,952
Really Adversarial Spam
Focused Attack: try to make the filter reject a particular non-spam message.
Blaine Nelson, Marco Barreno, Fuching Jack Chi, Anthony D. Joseph, Benjamin I. P. Rubinstein, Udam Saini, Charles Sutton, J. D. Tygar, and Kai Xia. Exploiting Machine Learning to Subvert Your Spam Filter. USENIX LEET 2008.
Hidden Markov Models
• Finite State Machine
  + probabilities on transitions
  + hide the state
  + add observations and output probabilities
[Diagram: Start moves with probability 1/3 each to hidden states A, K, Q; the observations Bet and Check are emitted with output probabilities such as 1, 1/3, and 2/3.]
Viterbi Path: given a sequence of observations, what is the most likely sequence of states?
Viterbi Algorithm (1967)
[Diagram: chain of hidden states … → x(t-2) → x(t-1) → x(t), each emitting an observation y(t-2), y(t-1), y(t).]
Key assumption: the most likely sequence for time t depends only on (1) the most likely sequence at time t-1, and (2) y(t).
[Photo: Andrew Viterbi]
This is true for first-order HMMs: transition probabilities depend only on the current state.
Viterbi Algorithm
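A standard dynamic-programming sketch of Viterbi decoding for a first-order HMM; the toy card-dealing model at the bottom is only loosely based on the earlier diagram, and its probabilities are illustrative.

# Sketch of the Viterbi algorithm: at each time step, keep the best-probability
# path ending in each state, extending the time t-1 bests using observation y(t).
def viterbi(observations, states, start_p, trans_p, emit_p):
    # best[s] = (probability of best path ending in state s, that path)
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            prob, path = max(
                (best[prev][0] * trans_p[prev][s] * emit_p[s][obs], best[prev][1])
                for prev in states
            )
            new_best[s] = (prob, path + [s])
        best = new_best
    return max(best.values())   # (probability, most likely state sequence)

# Toy HMM loosely based on the class diagram (probabilities are illustrative).
states  = ["A", "K", "Q"]
start_p = {"A": 1/3, "K": 1/3, "Q": 1/3}
trans_p = {s: {t: 1/3 for t in states} for s in states}
emit_p  = {"A": {"Bet": 1.0, "Check": 0.0},
           "K": {"Bet": 1/3, "Check": 2/3},
           "Q": {"Bet": 1/3, "Check": 2/3}}

print(viterbi(["Bet", "Check", "Bet"], states, start_p, trans_p, emit_p))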
Applications of HMMs
• Noisy Transmission (Convolutional Codes)
– Sequence of states: message to transmit
– Observations: received signal
• Speech Recognition
– Sequence of states: utterance
– Observations: recorded sound
• Bioinformatics
• Cryptanalysis
• etc.
Running time of the Viterbi algorithm: O(T·k²) for T observations and k hidden states.
Charge
• So far we have assumed the state transition
and output probabilities are all known!
• Thursday’s Class: learning the HMM
If you are coming to the home game Thursday 7pm,
remember to email me by 5pm Wednesday.
Include if you need a ride or can drive other people.