Exploring Data Science: Bayes Theorem
May 2016
Randall Shane, PhD
[email protected]

…DISCLAIMERS…
PLEASE NOTE:
(1) Code in this presentation is written in Python 2.7.10 with the scikit-learn library, version 0.15.2.
(2) Images and some text have been borrowed from the interwebs. Apologies if I did not credit a source. Thanks for the info; no $$ were made, but please be comforted in the fact that you're making the world a smarter place!!

Bayes Theorem
Bayes Theorem is a useful tool for predicting conditional probabilities. Mathematically, it is expressed as follows:

    P(A | B) = P(B | A) * P(A) / P(B)

where A and B are events:
• P(A) and P(B) are the probabilities of A and B independently of each other.
• P(A | B) is the probability of event A given that event B is true.
• P(B | A) is the probability of event B given that event A is true.

How Does it Work?
EXAMPLE: What is the chance you have breast cancer given a positive mammogram result?
P(A | B) = probability of cancer given a positive result.
ALSO NOTE:
(1) Mammograms detect cancer that is present 80% of the time and miss it 20% of the time.
(2) They falsely detect cancer that is not present 9.6% of the time, returning a correct negative result 90.4% of the time.
(3) 1% of women have breast cancer; 99% do not.

How Does it Work?

                 Cancer (1%)    No Cancer (99%)
    test +           80%             9.6%
    test -           20%            90.4%

How to read the table:
• 1% of people have cancer.
• If you already have cancer, you are in the first column. There's an 80% chance you will test positive and a 20% chance you will test negative.
• If you don't have cancer, you are in the second column. There's a 9.6% chance you will test positive and a 90.4% chance you will test negative.

How Does it Work?
Suppose you get a positive test result. What really are the chances you have cancer? A positive result means you're in the top row of the table.
• Chance of a true positive: chance you have cancer * chance the test caught it (1% * 80% = 0.008).
• Chance of a false positive: chance you do not have cancer * chance the test said you did (99% * 9.6% = 0.09504).

    probability = desired event / all possibilities

How Does it Work?
    probability = desired event / all possibilities

Desired event (having cancer and testing positive) = 0.008
All possible outcomes = chance of a true positive + chance of a false positive
                      = 0.008 + 0.09504 = 0.10304

Probability of having cancer after receiving a positive result = desired event / all possibilities:

    0.008 / 0.10304 = 0.0776 = 7.76%

…nobody panic…

Theorem Applied
    P(A | B) = P(B | A) * P(A) / P(B)

    P(C+ | T+) = P(T+ | C+) * P(C+) / P(T+)

    note: P(T+) = P(T+ | C+) * P(C+) + P(T+ | C-) * P(C-)
                = 0.008 + 0.09504 = 0.10304

    P(C+ | T+) = (0.8 * 0.01) / 0.10304 = 0.077639 = 7.76%

(A small Python check of this arithmetic appears after the Iris slide below.)

Iris Data Set
One of the most common data sets used in machine learning…
The Iris flower data set, or Fisher's Iris data set, is a multivariate data set introduced by Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems" as an example of linear discriminant analysis. It is sometimes called Anderson's Iris data set because Edgar Anderson collected the data to quantify the morphologic variation of Iris flowers of three related species. The data set consists of 50 samples from each of three species of Iris (setosa, virginica, and versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in cm. Based on the combination of these four features, many models have been developed to distinguish the species from each other.
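A quick aside before the coding options on the next slide: the mammogram arithmetic from the "Theorem Applied" slide can be re-checked in a few lines of Python. This is a minimal sketch that is not part of the original deck; the variable names are mine, and the 1%, 80%, and 9.6% figures are the ones quoted on the slides above.

    # Re-check of the "Theorem Applied" arithmetic using the numbers quoted above.
    p_cancer = 0.01                # P(C+): prior probability of having cancer
    p_pos_given_cancer = 0.80      # P(T+ | C+): test detects cancer that is present
    p_pos_given_no_cancer = 0.096  # P(T+ | C-): false positive rate

    # Total probability of a positive test: true positives + false positives
    p_pos = (p_pos_given_cancer * p_cancer
             + p_pos_given_no_cancer * (1.0 - p_cancer))

    # Bayes Theorem: P(C+ | T+) = P(T+ | C+) * P(C+) / P(T+)
    p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos

    print("P(T+)    = %.5f" % p_pos)               # 0.10304
    print("P(C+|T+) = %.4f" % p_cancer_given_pos)  # 0.0776, i.e. 7.76%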
Options for coding naive bayes in Python
• http://machinelearningmastery.com/naive-bayes-classifier-scratch-python/
• https://github.com/muatik/naive-bayes-classifier/tree/master/naiveBayesClassifier
— OR —
• http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB

Naive Bayes using Sci-Kit Learn in Python
Code:
Output:
(The code and output were shown as screenshots on the original slide; a minimal sketch appears after the contact slide below.)

…So, what do I do with it?
…It's my base go-to for classification! Very robust, and it provides a baseline for evaluating other algorithms.
So:
• calculate correct predictions and errors
• calculate the error rate on the training set

Sources
• https://en.wikipedia.org/wiki/Bayes%27_theorem
• http://stattrek.com/probability/bayes-theorem.aspx
• http://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/
• https://www.youtube.com/watch?v=2Df1sDAyRvQ
• http://machinelearningmastery.com/naive-bayes-classifier-scratch-python/
• http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB

Thank you for coming!
Resources:
Code on GitHub: https://github.com/RandallShane/BoiseDataScienceMeetup
Code: bayes.py

If you have additional questions, please feel free to reach out:
[email protected]
@RandallShanePhD
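Since the scikit-learn code and output from the "Naive Bayes using Sci-Kit Learn in Python" slide are not reproduced here, the following is a minimal stand-in sketch, assuming scikit-learn is installed (the deck used version 0.15.2 on Python 2.7). It is not the deck's bayes.py; the variable names are mine. It fits GaussianNB on the Iris data, counts correct predictions and errors, and computes the error rate on the training set, as listed on the slide above.

    # Gaussian Naive Bayes on the Iris data, following the steps listed above:
    # fit the model, count correct predictions and errors, and compute the
    # error rate on the training set.
    from sklearn import datasets
    from sklearn.naive_bayes import GaussianNB

    iris = datasets.load_iris()     # 150 samples, 4 features, 3 species
    X, y = iris.data, iris.target

    model = GaussianNB()
    model.fit(X, y)

    predictions = model.predict(X)  # predicting on the training set itself
    correct = int((predictions == y).sum())
    errors = len(y) - correct
    error_rate = float(errors) / len(y)

    print("correct: %d  errors: %d" % (correct, errors))
    print("training-set error rate: %.4f" % error_rate)

Scoring on the training set matches the "calc error rate on training set" bullet above; in practice a held-out test split (for example via train_test_split, found in sklearn.cross_validation in the 0.15.x series and in sklearn.model_selection in newer releases) gives a more honest estimate of how the classifier generalizes.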