Bayes Theorem

Exploring Data Science
Bayes Theorem
May 2016
Randall Shane, PhD
[email protected]
…DISCLAIMERS…
PLEASE NOTE:
(1) Code in this presentation is written in Python 2.7.10
using the scikit-learn library, version 0.15.2.
(2) Images and some text have been borrowed
from the interwebs. Apologies if I did not
credit you. Thanks for the info; no $$ were made,
but please be comforted by the fact that
you're making the world a smarter place!!
Bayes Theorem
Bayes Theorem is a useful tool for calculating
conditional probabilities.
Mathematically, it is expressed as follows:

P(A | B) = P(B | A) * P(A) / P(B)

where A and B are events.
P(A) and P(B) are the probabilities of A and B
without regard to each other.
P(A | B) is the probability of event A given that event B is true.
P(B | A) is the probability of event B given that event A is true.
How Does it Work?
EXAMPLE: What is the chance you have breast cancer
given a positive mammogram result?
P( A | B ) = probability of cancer given a positive result.

ALSO NOTE:
(1) Mammograms detect cancer 80% of the time when it is
present, and miss it 20% of the time.
(2) Mammograms report a positive result 9.6% of the time
when cancer is not actually present, returning a correct
(negative) result 90.4% of the time.
(3) 1% of women have breast cancer. 99% do not.
How Does it Work?
           Cancer (1%)    No Cancer (99%)
test +     80%            9.6%
test -     20%            90.4%
How to read the table:
• 1% of people have cancer
• If you already have cancer, you
are in the first column. There’s an
80% chance you will test positive.
There’s a 20% chance you will test
negative.
• If you don’t have cancer, you are
in the second column. There’s a
9.6% chance you will test positive,
and a 90.4% chance you will test
negative.
How Does it Work?
           Cancer (1%)    No Cancer (99%)
test +     80%            9.6%
test -     20%            90.4%
Suppose you get a positive test result.
What really are the chances you have
cancer?
Positive result means you’re in the top
row of the table.
Chances of a true positive: chance you
have cancer * chance the test caught it.
(1% * 80% = .008)
Chances of a false positive: chance you
do not have cancer * chance test said
you did. (99% * 9.6% = 0.09504)
probability = desired event /
all possibilities
How Does it Work?
probability = desired event / all possibilities

chance of having cancer = .008

all possible outcomes:
• chance of a true positive = .008
• chance of a false positive = .09504

So:
.008 + .09504 = .10304

Probability of having cancer after receiving a positive result
= desired event / all possibilities
= .008 / .10304 = 0.0776 = 7.76%
…nobody panic…
Theorem Applied
P(A | B) = P(B | A) * P(A) / P(B)

P(C+ | T+) = P(T+ | C+) * P(C+) / P(T+)

note: P(T+) = P(T+ | C+) * P(C+) + P(T+ | C-) * P(C-)
            = .008 + .09504 = .10304

P(C+ | T+) = (.8 * .01) / .10304
           = .077639 = 7.76%
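
For reference, here is a minimal sketch (not from the original slides) that reproduces the arithmetic above in plain Python, using only the probabilities given in the example:

# Mammogram example: P(cancer | positive test) via Bayes Theorem
p_cancer = 0.01                # P(C+): 1% of women have breast cancer
p_pos_given_cancer = 0.80      # P(T+ | C+): test catches cancer 80% of the time
p_pos_given_no_cancer = 0.096  # P(T+ | C-): false positive rate, 9.6%

# total probability of a positive test = true positives + false positives
p_pos = (p_pos_given_cancer * p_cancer
         + p_pos_given_no_cancer * (1.0 - p_cancer))

# Bayes Theorem: P(C+ | T+) = P(T+ | C+) * P(C+) / P(T+)
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos

print(p_cancer_given_pos)      # ~0.0776, i.e. 7.76%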
Iris Data Set
One of the most common data sets used in
Machine Learning…
The Iris flower data set, or Fisher's Iris data set,
is a multivariate data set introduced by Ronald
Fisher in his 1936 paper "The use of multiple
measurements in taxonomic problems" as an
example of linear discriminant analysis. It is
sometimes called Anderson's Iris data set
because Edgar Anderson collected the data to
quantify the morphologic variation of Iris
flowers of three related species.
The data set consists of 50 samples from each
of three species of Iris (setosa, virginica and
versicolor). Four features were measured from
each sample: the length and the width of the
sepals and petals, in cm. Based on the
combination of these four features, many
models have been developed to distinguish
the species from each other.
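
A quick sketch (not part of the original slides) that loads the Iris data set with scikit-learn and confirms the shape described above:

from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)       # (150, 4): 150 samples, 4 features in cm
print(iris.feature_names)    # sepal length/width, petal length/width
print(iris.target_names)     # ['setosa' 'versicolor' 'virginica']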
Options for coding naive bayes in Python
• http://machinelearningmastery.com/naive-bayes-classifier-scratch-python/
• https://github.com/muatik/naive-bayes-classifier/tree/master/naiveBayesClassifier
— OR —
• http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB
Naive Bayes using scikit-learn in Python
Code:
Output:
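
The original code and output images are not reproduced here. As a stand-in, this is a minimal sketch of the approach the slide describes, fitting scikit-learn's GaussianNB to the Iris data (variable names are my own):

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

iris = load_iris()

model = GaussianNB()
model.fit(iris.data, iris.target)        # train on all 150 samples
predictions = model.predict(iris.data)   # predict the species back

print(predictions[:10])                  # first ten predicted class labels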
…So, what do I do with it?
…it's my base go-to for classification!
Very robust, and it provides a baseline for
evaluating other algorithms.
So:
• count correct predictions and errors
• calculate the error rate on the training set (see the sketch below)
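
A hedged sketch of that evaluation step, counting correct predictions and errors and reporting the training-set error rate (again, variable names are my own):

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
model = GaussianNB().fit(iris.data, iris.target)
predictions = model.predict(iris.data)

correct = int((predictions == iris.target).sum())
errors = int((predictions != iris.target).sum())
error_rate = float(errors) / len(iris.target)

print("correct: %d, errors: %d" % (correct, errors))
print("training-set error rate: %.3f" % error_rate)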
Sources
• https://en.wikipedia.org/wiki/Bayes%27_theorem
• http://stattrek.com/probability/bayes-theorem.aspx
• http://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/
• https://www.youtube.com/watch?v=2Df1sDAyRvQ
• http://machinelearningmastery.com/naive-bayes-classifier-scratch-python/
• http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB
Thank you for coming!
Resources:
Code on GitHub: https://github.com/RandallShane/BoiseDataScienceMeetup
Code:
bayes.py
if you have additional questions,
please feel free to reach out:
[email protected]
@RandallShanePhD