cs6501: PoKER
Class 3: Probabilistic Reasoning
Spring 2010, University of Virginia
David Evans

Plan
• One-line proof of Bayes' Theorem
• Inductive learning

Home Game this Thursday, 7pm! (Game start: 7:15pm)
This is not an official course activity. Email me by Wednesday afternoon if you are coming.

Bayes' Theorem
P(A | B) = P(B | A) P(A) / P(B)

Bayes' Theorem Proof
One line: P(A | B) P(B) = P(A ∧ B) = P(B | A) P(A); divide both sides by P(B).

Inductive Learning
"Learning from Examples"
Input → Machine Learning Learner → Output

Limits of Induction
Karl Popper, Science as Falsification, 1963:
"It was the summer of 1919 that I began to feel more and more dissatisfied with these three theories—the Marxist theory of history, psycho-analysis, and individual psychology; and I began to feel dubious about their claims to scientific status. My problem perhaps first took the simple form, 'What is wrong with Marxism, psycho-analysis, and individual psychology? Why are they so different from physical theories, from Newton's theory, and especially from the theory of relativity?' To make this contrast clear I should explain that few of us at the time would have said that we believed in the truth of Einstein's theory of gravitation. This shows that it was not my doubting the truth of those three other theories which bothered me, but something else."
"One can sum up all this by saying that the criterion of the scientific status of a theory is its falsifiability, or refutability, or testability."

Deciding on h
• Many hypotheses fit the training data.
• It depends on the hypothesis space: what types of functions are allowed?
• Pick between a simple hypothesis function (that may not fit the data exactly) and a complex one.
• How many functions are there for X: n bits, Y: 1 bit? (2^(2^n): each of the 2^n possible inputs can independently map to 0 or 1.)

Forms of Inductive Learning
• Supervised learning. Given: examples with labeled outputs. Output: hypothesis function.
• Unsupervised learning (no explicit outputs). Given: unlabeled examples. Output: clustering.
• Reinforcement learning. Given: a sequence of situations. Output: decisions (actions). Feedback: rewards, often only for the overall outcome.

First Reinforcement Learner (?)
Arthur Samuel. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal, 1959.
No feedback for individual decisions (outputs), but overall feedback on the outcome.
Earlier inductive learning paper: R. J. Solomonoff. An Inductive Inference Machine, 1956. (Neural networks were studied even earlier.)

Spam Filtering

Supervised Learning: Spam Filter
Training examples: Message 1, Message 2, …, Message N, each labeled Spam or Not Spam, are fed to the Learner.

Feature Extraction
An example message:

X-Sender-IP: 78.128.95.196
To: [email protected]
From: Nicole Cox <[email protected]>
Subject: Job Offer
Date: Thu, 22 Apr 2010 10:10:45 +0300 (EEST)

Dear David,

Do you want to participate in the greatest Mystery Shopping quests nationwide? Have you ever wondered how Mystery Shoppers are recruited and how prosperous companies keep up doing business in the highly competitive business world? The answer is that many companies are recruiting young, creative, observant, and responsible individuals like you to give their feedback on various products and customer services and thus improve their quality. As a Mystery Shopper you have only one responsibility: act as a real customer while evaluating the place you are sent to mystery shop and enjoy all the benefits that go along with your job. Remember that you have nothing to lose, because you are awarded generously for your efforts:
- You get paid between $10 and $40 per hour for each mystery shopping assignment;
- You keep all things that you have purchased for free;
- You watch movies, eat in restaurants, and visit amusement parks for free;
- You are turning your most enjoyable hobby into a well-paying activity.
Be aware that as a Mystery Shopper you can earn on average $100 to $300 per week.
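The slides do not show code for this step; the sketch below is a rough Python illustration of how a raw message like the one above could be mapped to binary features similar to the F1-F5 listed on the next slide. The specific tests, the regular expression, and the address_book and today parameters are assumptions made for illustration, not part of the course material.

import re
from email import message_from_string

def extract_features(raw_message, address_book, today):
    """Map a raw message (headers + body) to a dict of binary features.

    The feature definitions below are illustrative guesses at the kinds of
    features the slides list (F1-F5); they are not the course's actual set.
    """
    msg = message_from_string(raw_message)
    subject = msg.get("Subject", "")
    sender = msg.get("From", "")
    date = msg.get("Date", "")
    body = msg.get_payload()
    if isinstance(body, list):  # multipart message: crudely flatten the parts
        body = " ".join(str(part) for part in body)

    return {
        "from_in_address_book": any(addr in sender for addr in address_book),
        "subject_free": "FREE" in subject.upper(),
        "body_enlargement": "enlargement" in body.lower(),
        "date_today": today in date,
        "body_contains_url": re.search(r"https?://", body) is not None,
    }

Each feature is just a boolean test on the message; the learning problem is then to estimate how strongly each feature's presence is associated with spam versus ham.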
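The Bayesian Filtering table and Combining Probabilities slides that follow estimate P(Fi | Spam) from counts and combine the features under the naïve independence assumption. Here is one plausible sketch of that combination step; the Laplace smoothing, the log-odds form, and the choice to use only present features are assumptions, not details given in the slides.

from math import log

def spam_log_odds(features, spam_counts, ham_counts, n_spam, n_ham, alpha=1.0):
    """Naive Bayes log-odds that a message is spam.

    features:    feature name -> bool, as produced by feature extraction
    spam_counts: feature name -> number of spam training messages with the feature
    ham_counts:  feature name -> number of ham training messages with the feature
    alpha:       Laplace smoothing constant (an assumption, to avoid zero probabilities)

    Under the naive independence assumption,
        P(Spam | F1, ..., Fn) is proportional to P(Spam) * product_i P(Fi | Spam),
    so we add log likelihood ratios instead of multiplying small probabilities.
    """
    score = log(n_spam / n_ham)  # log prior odds
    for name, present in features.items():
        if not present:
            continue  # for simplicity, only features that fire contribute
        p_spam = (spam_counts.get(name, 0) + alpha) / (n_spam + 2 * alpha)
        p_ham = (ham_counts.get(name, 0) + alpha) / (n_ham + 2 * alpha)
        score += log(p_spam / p_ham)
    return score  # score > 0 means "more likely spam than ham"

# Counts echo the table on the following slides (4000 spam / 6000 ham messages).
spam_counts = {"from_in_address_book": 2, "subject_free": 3058,
               "body_enlargement": 253, "date_today": 304, "body_contains_url": 3630}
ham_counts = {"from_in_address_book": 4052, "subject_free": 2,
              "body_enlargement": 1, "date_today": 5423, "body_contains_url": 263}

example = {"from_in_address_book": False, "subject_free": True,
           "body_enlargement": False, "date_today": True, "body_contains_url": True}
print(spam_log_odds(example, spam_counts, ham_counts, n_spam=4000, n_ham=6000))

Working in log space avoids numerical underflow when many features are combined.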
The Features
F1 = From: <someone in address book>
F2 = Subject: *FREE*
F3 = Body: *enlargement*
F4 = Date: <today>
F5 = Body: <contains URL>
F6 = To: [email protected]
…
Note: this assumes we already know what the features are! We need to learn them.

Bayesian Filtering
Total messages: 4000 Spam / 6000 Ham

Feature                               Number of Spam   Number of Ham
F1: From: <someone in address book>                2            4052
F2: Subject: *FREE*                             3058               2
F3: Body: *enlargement*                          253               1
F4: Date: <today>                                304            5423
F5: Body: <contains URL>                        3630             263
…                                                  …               …

Combining Probabilities
Naïve Bayesian model: assume all features are independent, so
P(Spam | F1, …, Fn) ∝ P(Spam) · P(F1 | Spam) · … · P(Fn | Spam)

Learning the Features
Make every <context, token> pair a feature. Which ones should we keep?

Feature                  Spam Likelihood
F1: Subject: *Poker*                0.03
F2: Subject: *FREE*                0.999
…                                      …

Bayesian Spam Filtering
Patrick Pantel and Dekang Lin. SpamCop: A Spam Classification & Organization Program. AAAI-98 Workshop on Text Classification, 1998.
Paul Graham. A Plan for Spam (2002); Better Bayesian Filtering (2003).
SpamAssassin Bayesian filter

Testing Learners: K-Fold Cross Validation
• Randomly partition the training data into k folds (usually 10).
• Use k-1 folds for training; test on the unused fold.
• Repeat so that each fold is left out once, and average the resulting scores.

Concerns
• Limits of the Naïve Bayesian model
• How many features?
• Expressiveness of the learned features
• What if spam is really adversarial?

Adversarial Spam
• Player 1: Spammer
  – Goal: create a spam that tricks the filter, or make the filter reject ham
• Player 2: Filter
  – Goal: do not be tricked (but do not reject ham messages)
Does this game have a Nash equilibrium?

How Many Ways Can You Spell V1@gra?
600,426,974,379,824,381,952

Really Adversarial Spam
Focused attack: try to get the filter to reject a particular non-spam message.
Blaine Nelson, Marco Barreno, Fuching Jack Chi, Anthony D. Joseph, Benjamin I. P. Rubinstein, Udam Saini, Charles Sutton, J. D. Tygar, and Kai Xia. Exploiting Machine Learning to Subvert Your Spam Filter. USENIX LEET 2008.

Hidden Markov Models
A finite state machine
+ probabilities on transitions
+ hide the state
+ add observations and output probabilities
[State diagram from the slides: from Start, move to A, K, or Q with probability 1/3 each; each card state emits Bet or Check with its own output probability (the values 1, 1/3, and 2/3 appear in the diagram).]

Viterbi Path
Given a sequence of observations, what is the most likely sequence of states?

Viterbi Algorithm (Andrew Viterbi, 1967)
[Trellis diagram from the slides: hidden states …, x(t-2), x(t-1), x(t) with corresponding observations y(t-2), y(t-1), y(t).]
Key assumption: the most likely sequence for time t depends only on (1) the most likely sequence at time t-1 and (2) y(t).
This is true for first-order HMMs: transition probabilities depend only on the current state.
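A minimal Python sketch of the Viterbi dynamic program just described (not from the slides). The example model loosely mirrors the card-dealing HMM in the diagram above, with a card A, K, or Q chosen uniformly each hand and a Bet/Check observation emitted; the exact emission probabilities cannot be recovered from the extracted diagram, so the numbers used here are assumptions.

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for a first-order HMM.

    V[t][s] holds the probability of the best path ending in state s after
    seeing observations[0..t]; back[t][s] remembers that path's predecessor.
    """
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        V.append({})
        back.append({})
        for s in states:
            # Best previous state, using only V[t-1] and the current observation
            prev, prob = max(
                ((r, V[t - 1][r] * trans_p[r][s]) for r in states),
                key=lambda x: x[1],
            )
            V[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev
    # Trace the best final state back to the start
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path)), V[-1][last]

# Illustrative model (assumed numbers): a card is dealt uniformly each hand,
# and the player bets or checks with a card-dependent probability.
states = ["A", "K", "Q"]
start_p = {"A": 1/3, "K": 1/3, "Q": 1/3}
trans_p = {s: {"A": 1/3, "K": 1/3, "Q": 1/3} for s in states}  # redeal each hand
emit_p = {"A": {"Bet": 1.0, "Check": 0.0},
          "K": {"Bet": 1/3, "Check": 2/3},
          "Q": {"Bet": 0.0, "Check": 1.0}}

print(viterbi(["Bet", "Check", "Bet"], states, start_p, trans_p, emit_p))

In practice one sums log probabilities instead of multiplying, to avoid underflow on long observation sequences.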
Viterbi Algorithm
Running time: O(n·s²) for n observations and s hidden states, since each step considers every pair of previous and current state.

Applications of HMMs
• Noisy transmission (convolutional codes)
  – Sequence of states: message to transmit
  – Observations: received signal
• Speech recognition
  – Sequence of states: utterance
  – Observations: recorded sound
• Bioinformatics
• Cryptanalysis
• etc.

Charge
• So far we have assumed the state transition and output probabilities are all known!
• Thursday's class: learning the HMM.

If you are coming to the home game Thursday at 7pm, remember to email me by 5pm Wednesday. Include whether you need a ride or can drive other people.