L-8-The Prediction Calculator Tool

The Prediction Calculator Tool
Imagine you are a salesperson who deals with many customers and
has to select which customers to follow up with to close a
transaction.
Following up comes at a cost:
the time you spend with the customer and
the marketing materials you present to them.
So you need to pick the follow-ups carefully.
A simple solution is a scorecard tool, which allows the salesperson to assign a
score to each customer based on individual attributes.
Typically, you would use a threshold, something like
“If the total score is at least 70, then the customer is likely to buy a bike, so go
on, follow up!”
The Prediction Calculator tool produces such a scorecard.
It also helps you find the optimum threshold for using the
scorecard: the threshold that minimizes the costs associated with incorrect
predictions and maximizes the profits associated with correct predictions.
The Prediction Calculator tool can perform only binary predictions. It can
be used to predict whether a column will have a certain value or not, but not
to select between multiple alternatives.
Example
In this example, we will use the Prediction Calculator tool to generate a
scorecard that predicts whether a customer is likely to purchase a bike or not,
based on demographics.
The Prediction Calculator
The following figure shows the operational Prediction Calculator report, which
can be used interactively to perform predictions.
The total is compared against the threshold at the top of the report (540, in this
example). If the total reaches or exceeds the threshold, the predicted value for
Purchased Bike is True; otherwise, it is False.
As an example, try to use the calculator to predict whether a new customer
will buy a bike or not. Enter the customer’s demographics as shown here:
• Married for Marital Status
• Male for Gender
• 97111-127371 for Income
• 3 for Children
• Graduate Degree for Education
• Professional for Occupation
• Yes for Home Owner
• 2 for Cars
• 0-1 Miles for Commute Distance
• North America for Region
• 46-55 for Age
The total changes to 603, which exceeds the 558 threshold. Therefore, the
prediction is True, and the customer is likely to buy a bike.
Refining the Results
In the previous example, we used the Prediction Calculator to predict, based
on demographic information, whether or not a customer will buy a bike.
The Prediction Calculator associates a score with each column value. If the sum
of these scores for a customer is equal to or exceeds a threshold, then the
prediction is positive (the customer will likely buy a bike). If the sum of these
scores is less than the threshold, then the prediction is negative.
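The decision rule above can be illustrated with a minimal sketch. The point values, the threshold of 130, and the names `SCORECARD`, `total_score`, and `predict` are all hypothetical, chosen only for illustration; they are not the scores the Prediction Calculator produces.

```python
# Hypothetical scorecard: these point values are invented for illustration
# and are NOT the scores generated by the Prediction Calculator tool.
SCORECARD = {
    "Commute Distance": {"0-1 Miles": 120, "2-5 Miles": 80, "10+ Miles": 0},
    "Children": {"0": 90, "1": 60, "3": 20},
}
THRESHOLD = 130  # hypothetical threshold


def total_score(customer, scorecard=SCORECARD):
    """Sum the points for each attribute value; unknown values add 0."""
    return sum(
        scorecard[attr].get(str(value), 0)
        for attr, value in customer.items()
        if attr in scorecard
    )


def predict(customer, threshold=THRESHOLD):
    """Positive prediction when the total is equal to or exceeds the threshold."""
    return total_score(customer) >= threshold


customer = {"Commute Distance": "0-1 Miles", "Children": 3}
print(total_score(customer), predict(customer))
```

With these made-up values the customer scores 120 + 20 = 140, which is at least 130, so the prediction is positive.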
The predictions can be classified into the following four categories:
• True negative predictions: This is a correct prediction, but it is
a negative one. The tool predicts that a customer is not a bike buyer and,
if you ask the customer, you find out that, indeed, the customer is not
interested in buying a bike.
• True positive predictions: This is a correct positive prediction. The tool
predicts that a customer is a bike buyer and, indeed, the customer is
interested in buying a bike.
• False positive predictions, also known as Type I errors: This is an
incorrect positive prediction. The tool predicts that a customer is a bike
buyer, but when you ask the customer, you find out that he or she is not
interested in buying a bike.
• False negative predictions, also known as Type II errors: This is
another kind of incorrect prediction, a negative one. The tool predicts
that the customer is not a bike buyer, but you find out later that he or she
was actually interested in buying a bike.
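As a small sketch of how the four categories follow from comparing the prediction with the actual outcome (the function name `classify_outcome` is a hypothetical helper, not part of the tool):

```python
def classify_outcome(predicted_buyer: bool, actual_buyer: bool) -> str:
    """Name the category of a single prediction."""
    if predicted_buyer and actual_buyer:
        return "true positive"
    if predicted_buyer and not actual_buyer:
        return "false positive"  # Type I error
    if not predicted_buyer and actual_buyer:
        return "false negative"  # Type II error
    return "true negative"


# The tool predicted "bike buyer", but the customer is not interested.
print(classify_outcome(predicted_buyer=True, actual_buyer=False))
```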
Our goal in using the calculator is to correctly identify as many bike buyers
as possible. In this scenario, consider the following:
• A true positive prediction produces value: the profit margin associated
with selling a bike.
• A true negative prediction produces neither value nor loss: it simply
saves you the marketing effort on an uninterested customer.
• A false positive prediction may produce some loss: the marketing cost
associated with that customer.
• A false negative prediction does not produce value: it may represent a
lost opportunity to sell a bike.
The total profit generated by the tool is the total profit margin associated
with true positive predictions, minus the total marketing cost associated with
false positive predictions.
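Written as a formula, under the assumption that every sale has the same profit margin and every follow-up the same marketing cost (the symbols are introduced here for illustration only):

```latex
\text{Total profit} = N_{TP} \cdot m \;-\; N_{FP} \cdot c
```

Here N_TP is the number of true positive predictions, N_FP the number of false positive predictions, m the profit margin per bike sold, and c the marketing cost per follow-up.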
Suppose that you are using the scorecard to identify high-risk patients. In
this case:
• The profit is zero for a true negative prediction.
• A false positive prediction may have some cost associated with extra
investigations.
• A false negative prediction has a very serious cost associated with patient
risks: the costs of treating a more advanced disease.
• A true positive prediction
You can use the Prediction Report to tune the Prediction Calculator to
maximize the profit. The figure shows the Prediction Report tuning tool.
By default, the tool associates a profit of $10 with a true positive prediction
and a cost of $10 with a false positive prediction.
These defaults represent a direct marketing scenario, where a true positive
leads to revenue and a false positive leads to losses related to direct marketing
costs.
Use this section of the tool to specify your own costs and profits.
The tool computes the optimum threshold for the Prediction Calculator as
the threshold that maximizes the profit (revenue from correct predictions,
minus costs from incorrect predictions) over the test set.
During execution, the tool creates a set of randomly selected table rows for
testing purposes.
Take a simple example, which considers only Commute Distance and Children.
Assume that the test set contains five rows.
Also assume that the following things are true:
• A correct prediction (true positive or true negative) has a profit of $10.
• An incorrect prediction (false positive or false negative) has a cost of $10.
If the threshold is set to 524, then any score greater than or equal to 524
generates a positive prediction (correct or incorrect), and any score below 524
generates a negative prediction (correct or incorrect). For a threshold of 524,
the test table produces the following:
• Three true positive predictions (rows with IDs 1, 2, and 3), resulting in a total
revenue of $30.
• One true negative prediction (row 5), resulting in a total revenue of $10.
• Zero false negative predictions.
• One false positive prediction (row 4), resulting in a total cost of $10.
Therefore, the total profit associated with a score threshold of 524 is $30.
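The arithmetic behind this total, using the $10 profit per correct prediction and the $10 cost per incorrect prediction assumed above, works out as:

```latex
\underbrace{3 \times \$10}_{\text{true positives}}
+ \underbrace{1 \times \$10}_{\text{true negatives}}
- \underbrace{0 \times \$10}_{\text{false negatives}}
- \underbrace{1 \times \$10}_{\text{false positives}}
= \$30
```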
If you repeat this experiment for all distinct score values in the test set, as well as for 0
and 1000 (the minimum and maximum possible scores), the total profit follows the
values shown in the Table.
As a result, the maximum total profit provided by the tool is $30, and it is
reached when the threshold is in the range of 221 to 524.
The granularity of the test set does not allow the tool to distinguish between
thresholds in this range, so it recommends a threshold of 221 (the first in
the range) as the optimum threshold.
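The threshold search can be sketched as follows. This is only an illustration under stated assumptions: the five rows in `TEST_ROWS` are hypothetical and do not reproduce the table from the text, the function names are invented, and the tie-breaking rule (pick the lowest threshold among those with the highest profit) mirrors the "first in the range" behavior described above rather than the tool's actual implementation.

```python
# Hypothetical test set of (score, is_buyer) pairs; NOT the table from the text.
TEST_ROWS = [(950, True), (800, True), (500, True), (500, False), (200, False)]
GAIN_CORRECT = 10      # profit of a correct prediction, in dollars
COST_INCORRECT = 10    # cost of an incorrect prediction, in dollars
MIN_SCORE, MAX_SCORE = 0, 1000


def profit_at(threshold, rows=TEST_ROWS):
    """Total profit over the test rows for one candidate threshold."""
    profit = 0
    for score, is_buyer in rows:
        predicted_buyer = score >= threshold
        if predicted_buyer == is_buyer:
            profit += GAIN_CORRECT
        else:
            profit -= COST_INCORRECT
    return profit


def best_threshold(rows=TEST_ROWS):
    """Try 0, 1000, and every distinct score; keep the lowest best threshold."""
    candidates = sorted({MIN_SCORE, MAX_SCORE, *(score for score, _ in rows)})
    profits = {t: profit_at(t, rows) for t in candidates}
    best = max(profits.values())
    return min(t for t, p in profits.items() if p == best), profits


threshold, profits = best_threshold()
print(threshold, profits)
```

For this made-up data, thresholds 500 and 800 both yield the highest profit, and the sketch returns 500, the first of the two.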
The profit starts very low for a low threshold. In this case, the number of
false positives is very large.
As the score threshold grows, the number of false positives is reduced.
As the score threshold grows even further, the number of false negatives
increases.
[Figure: The evolution of the profit for various thresholds]
[Figure: The cumulative costs associated with incorrect predictions]