
Analytics: Advanced Tools
October 2014
Jane Tang, SVP, Advanced Analytics, Vision Critical
Driver Analysis
Driver Analysis Motivation: Don’t Ask Why
• Why not just ask respondents directly why they
purchase a particular product?
• Consumers are generally unaware of why they do
what they do when it comes to product purchase
decisions.
• Respondents will tell you answers that they think you
want to hear.
• You get their justifications for their purchase, not their
motivation.
Reference: Nisbett, Richard, & Wilson, Timothy. (1977). Telling more than we can
know: Verbal reports on mental processes. Psychological Review, 84, 231-259.
Driver Analysis Motivation: Stated Importance
• Why not ask respondents to state the importance of
product/service attributes?
• The traditional approach of asking respondents to indicate the
importance of attributes on a scale requires no tradeoff:
• “Everything is really important”
• No Differentiation: what’s truly important vs. marginal items.
• Prosocial bias
• Tradeoff methods such as Conjoint and/or MaxDiff are
suitable, but are often more time-consuming and require
additional questionnaire space.
We recommend using a derived importance
method through driver analysis.
Driver Analysis Background
In driver analysis, we seek to understand the motivation
behind consumer behaviours by observing the pattern of
associations and correlations between their decisions
and their perception of, and experience with, the product or service
being offered.
• If we are interested in what drives consumer purchase decisions, we
look for correlations between purchase decision and consumer
perception and experience with the product.
• If we are interested in what drives customer satisfaction, we look for
correlations between overall customer satisfaction rating and their
satisfaction with key service points.
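The simplest version of this idea can be sketched as follows: correlate each attribute rating with the overall rating and rank the attributes by the result. This is only an illustration of the principle, not the method used later in the deck; the ratings and the attribute names are made up.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Made-up 1-5 ratings from five respondents (illustration only).
overall_satisfaction = [1, 2, 3, 4, 5]
attribute_ratings = {
    "staff helpfulness": [1, 2, 2, 4, 5],
    "price": [3, 3, 2, 4, 3],
}

correlations = {
    name: pearson(ratings, overall_satisfaction)
    for name, ratings in attribute_ratings.items()
}

# Rank attributes from strongest to weakest association with the outcome.
ranked = sorted(correlations, key=correlations.get, reverse=True)
for name in ranked:
    print(f"{name}: r = {correlations[name]:.2f}")
```

A high correlation here only says that respondents who rate the attribute highly also tend to rate the overall experience highly.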
Driver Analysis: Correlation & Causation
Despite the name, a “driver” analysis is the analysis of relationships and
correlation. A driver analysis does not establish causality.
Driver Analysis Issues
• Requires complete data for every variable.
• Missing data must be replaced with the mean, or other value.
• Alternatively, a reduced base size must be used in the analysis.
• The majority of the respondents need to be able to rate the
attributes/services
• Does not distinguish between drivers of satisfaction and
drivers of dissatisfaction.
• Not predictive. The analysis is based on observation of
current behavior of consumers only, not predictive of
their future behavior.
Driver Analysis Issues
As in any form of key driver analysis, variables
with little variation will not show up as key drivers.
• Table-stakes attributes are unlikely to show up as key drivers.
• All airlines are safe, so safety is not a key driver of airline choice
among travelers.
• Components that affect only a very small proportion of
customers will not show up as key drivers.
• Satisfaction with claims is not a key driver, as only a very small
portion of insurance policies result in claims.
• Consideration in questionnaire design – consistency in how you
measure potential drivers.
Unstructured vs. Structured
Dependent Variable: Overall Satisfaction
Independent Variables: Satisfaction with service attributes
1. Ease of making your purchase
2. The financing and payment terms associated with your purchase
3. Consistently approving my purchases
4. Communicating how the purchase approval process works
5. Having enough information to understand how Bill Me Later works
6. Stating clearly the financing and payment terms associated with your purchase
7. Adding Bill Me Later to your PayPal funding options
8. Paying your Bill Me Later bill
9. Having enough available credit to make your purchases
10. Communicating your available line of credit
11. Ease of managing your account online
Shapley Value Regression
Penalty & Reward
SV Regression Background
• Shapley Value Regression is a type of Unstructured
Driver Analysis.
• SV Regression is able to deal effectively with the
multicollinearity issue that’s often present in Market
Research data.
• Multicollinearity is when there are strong correlations between the
various aspects of consumer perceptions to the point that it affects the
stability of the results in the driver analysis via traditional approaches.
• http://www.visioncritical.com/blog/untangling-which-attributes-drive-purchase
SV Regression Methodology
• Model Specification
• Dependent variable is the response variable you want to study.
• Has to be at least ordinal, i.e. three or more ordered response categories.
• Interval or ratio scales are preferred.
• Independent variables are the potential drivers that could influence the
response variable.
• can be metric or non-metric.
• Examples: brand association questions: which of the following brands do
you associate with these attributes?
• What sample size do you need?
• We require at least 10 cases of data for each potential driver.
• If there are 20 potential drivers, we need at least n=200 cases of data.
• As the ratio falls below 10:1, we encounter the risk of overfitting the model to the
sample, making the results too specific to the sample and lacking generalizability.
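The 10:1 rule of thumb above is simple arithmetic; a minimal sketch of it as a check (the function names are hypothetical):

```python
def min_cases(n_drivers, cases_per_driver=10):
    """Minimum number of cases under the cases-per-driver rule of thumb."""
    return n_drivers * cases_per_driver

def meets_rule(n_cases, n_drivers, cases_per_driver=10):
    """True if the sample is large enough for the number of potential drivers."""
    return n_cases >= min_cases(n_drivers, cases_per_driver)

print(min_cases(20))        # 20 potential drivers need at least 200 cases
print(meets_rule(150, 20))  # 150 cases falls short of the 10:1 ratio
```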
SV Regression Example Output
• R2 – How much do the potential drivers together influence
the response variable
• Relative Importance – the relative importance of the
potential drivers (sums to 100%).
SV Regression Output - Details
• R2 - Coefficient of Determination
• Measures the proportion of the variance of the response variable that
is explained by all the potential drivers. It varies between 0 and 1. The
higher the R2, the stronger the association between potential drivers
and the response variable.
• SV – Shapley Value
• the contribution of each potential driver to the overall R2 of the
regression
• sdSV – Standard deviations of the Shapley Values
• Relative Importance
• Rebasing the Shapley Values so that they sum to 100%. The relative
importance of an item = SV/ Overall R2.
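The SV and Relative Importance outputs described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the production implementation: it takes the R2 of an ordinary least squares fit as the value of each subset of drivers, enumerates every subset (practical only for a handful of drivers), and runs on simulated data with hypothetical driver names. It does not compute sdSV.

```python
import random
from itertools import combinations
from math import factorial

def centre(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def r_squared(columns, y):
    """R2 of an OLS fit (with intercept) of y on the given predictor columns."""
    if not columns:
        return 0.0
    yc = centre(y)                      # centring removes the intercept term
    xc = [centre(col) for col in columns]
    k = len(xc)
    xty = [sum(a * b for a, b in zip(col, yc)) for col in xc]
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    # Assumes the predictors are not perfectly collinear.
    aug = [[sum(a * b for a, b in zip(xc[i], xc[j])) for j in range(k)] + [xty[i]]
           for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(k):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    explained = sum(b * v for b, v in zip(beta, xty))
    return explained / sum(v * v for v in yc)

def shapley_values(columns, y):
    """Shapley value of each predictor's contribution to the overall R2."""
    k = len(columns)
    r2 = {s: r_squared([columns[i] for i in s], y)
          for size in range(k + 1) for s in combinations(range(k), size)}
    values = []
    for j in range(k):
        others = [i for i in range(k) if i != j]
        total = 0.0
        for size in range(k):
            weight = factorial(size) * factorial(k - size - 1) / factorial(k)
            for s in combinations(others, size):
                with_j = tuple(sorted(s + (j,)))
                total += weight * (r2[with_j] - r2[s])
        values.append(total)
    return values, r2[tuple(range(k))]

# Simulated data: two correlated drivers plus one independent driver.
random.seed(0)
n = 60
common = [random.gauss(0, 1) for _ in range(n)]
drivers = {
    "ease": [c + random.gauss(0, 0.5) for c in common],
    "terms": [c + random.gauss(0, 0.5) for c in common],
    "online": [random.gauss(0, 1) for _ in range(n)],
}
overall = [0.6 * e + 0.3 * t + 0.1 * o + random.gauss(0, 0.5)
           for e, t, o in zip(drivers["ease"], drivers["terms"], drivers["online"])]

sv, overall_r2 = shapley_values(list(drivers.values()), overall)
relative_importance = [v / overall_r2 for v in sv]
for name, v, rel in zip(drivers, sv, relative_importance):
    print(f"{name}: SV = {v:.3f}, relative importance = {rel:.1%}")
```

Because the Shapley Values add up to the overall R2, dividing each by the overall R2 gives shares that sum to 100%.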
Performance vs. Importance
Performance Impact
BASE CASE
- SINGLE-CHOICE with a minimum of 3 ordered
categories to be used as the overall
performance indicator for ONE brand.
- 8-14 attributes that measure distinct aspects
of performance and are directly and positively
related to that brand.
- Require an absolute minimum of n=10 cases of
data for each performance attribute. Prefer a lot
more.
Penalty & Reward Analysis
• Shapley Value regression assumes the relationship is symmetric:
• Good product perception is associated with higher product purchase intent
AND poor product perception is associated with lower purchase intent.
• What if the relationship is NOT symmetric?
• Poor product performance on certain attributes may be associated with lower
purchase intent, but good product performance on that same attribute is
NOT associated with higher purchase intent. This is called a penalty.
• Good perception on an attribute is associated with higher purchase intent,
but poor perception on that attribute is not associated with lower purchase
intent. This is called a reward.
• Penalty & Reward analysis is a TUR (Total Unduplicated Reach) related counting process.
• It uses the same Shapley Value principle to understand the contribution of
each attribute, as a reward and separately as a penalty.
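The asymmetry itself can be illustrated with a simple group comparison. The sketch below is a deliberately simplified stand-in, not the TUR-based Shapley counting process referred to above: it compares the share of respondents with high purchase intent among those who rate an attribute low, middling and high. The cut-offs (2 and 4 on a 5-point scale) and the data are assumptions.

```python
def share(flags):
    """Proportion of 1s in a list of 0/1 flags."""
    return sum(flags) / len(flags)

def penalty_reward(perception, high_intent, low=2, high=4):
    """Penalty and reward for one attribute.

    perception:  1-5 attribute ratings, one per respondent
    high_intent: 1 if the respondent has high purchase intent, else 0
    Assumes each of the three perception groups is non-empty.
    """
    pairs = list(zip(perception, high_intent))
    low_group = [flag for rating, flag in pairs if rating <= low]
    mid_group = [flag for rating, flag in pairs if low < rating < high]
    high_group = [flag for rating, flag in pairs if rating >= high]
    penalty = share(mid_group) - share(low_group)   # intent lost by performing poorly
    reward = share(high_group) - share(mid_group)   # intent gained by performing well
    return penalty, reward

# Made-up data for one attribute, e.g. "Flavor of meat".
flavor = [1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5]
intent = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0]

penalty, reward = penalty_reward(flavor, intent)
print(f"penalty = {penalty:.2f}, reward = {reward:.2f}")
```

In this made-up case the attribute behaves as a penalty: poor ratings go with much lower intent, while excellent ratings add nothing over middling ones.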
Penalty/Reward Output
[Chart: penalty and reward for each attribute]
Attributes shown: Flavor of meat; Overall amount of food; Overall flavor; Value for the money; Quality of pasta or rice; Quality of vegetables; Appearance of meat; Overall appearance of food; Quality of nutrition information.
Structural Equation Model
PLS/Path Model
SEM/PLS
• Structural Equation Models (SEM) combine several analyses:
regression, confirmatory factor analysis and analysis of variance.
• We use a Partial Least Squares (PLS) algorithm to estimate SEM
models.
• SEM allows (and requires) you to presuppose a structure to your data
– which means you have to think very specifically about what you want
your data to explain and what you want the explanation to look like.
• Because you have presupposed—or hypothesized—a structure that you
expect your data will fit, SEMs provide statistical validation of your theory.
That is, the output tells you in a variety of ways whether or not your theory is
a plausible representation of reality (the data).
• A key feature of SEMs is the use of latent, or unobserved, variables (measured
indirectly through the Measurement Model).
• The drivers are the loadings from the latent variables to the dependent
variable—for example, overall satisfaction.
SEM Methodology
• Force the researcher to think about the analysis in
advance of doing it
• Provide simple interpretations of complex data.
• Can be useful in small sample situations
SEM Output
[Figure: SEM path diagram]
Measurement Model for the Latent Variables (indicator loadings):
• QUALITY: Durable 0.82, Cutting Edge 0.95, Trustworthy 0.66, High Tech 0.76, Highest Quality 1.12
• SERVICE: Responsive 1.2, Knowledgeable 0.98, Helpful 0.88, Friendly 0.65
• VALUE: Price Evaluate 0.85, Price Compare 0.87, Value for $ 1.02
• BRAND LOYALTY: Brand Usage, Recommend, Use Again
Latent Variables: QUALITY, SERVICE and VALUE lead to OVERALL SATISFACTION, which leads to BRAND LOYALTY.
Other coefficients shown in the diagram: 0.92, 0.93, 0.43, 0.86, 1.15, 0.81, 0.75.
• Quality is the strongest driver of satisfaction: more than twice as important as service.
• Satisfaction is an important determinant of loyalty.
SEM Output II
Conjoint Analysis:
Rating Based Conjoint
Choice Based Conjoint (CBC)
Discrete Choice Model (DCM)
Overview of Conjoint Analysis:
• Conjoint analysis is a popular marketing research technique
that marketers use to determine what features a new
product should have and how it should be priced.
• Conjoint analysis became popular because it was a far less
expensive (smaller sample size) and more flexible way to
address these issues than concept testing.
• When there are just too many potential product combinations for
concept testing
• Need to understand the tradeoff respondents make
• Need to understand the competitive context
• Need to test respondent’s reaction to price changes in a realistic
setting.
Overview of Conjoint Analysis:
• Conjoint analysis involves showing respondents
potential product combinations.
• Products can be broken down into parts, called factors.
The different options within each factor are the factor
levels.
• The basic premise of conjoint analysis is that a
respondent makes a purchase decision based on the
inherent value he or she places on factor levels.
• Respondents will trade off the levels within different factors, e.g. trade in
a favourite color for a lower price.
• Non-compensatory rules are allowed.
• E.g. for a vegetarian, the meatless burger patty is a “must have”.
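The premise above can be sketched as an additive utility: a product's value to a respondent is the sum of the values (part-worths) of its factor levels, with an optional must-have rule for the non-compensatory case. The part-worth numbers, factor names and the `must_have` parameter are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical part-worths for one respondent: the value placed on each level.
part_worths = {
    "color": {"red": 0.5, "blue": 0.0},     # red is the favourite color
    "price": {"$10": 0.8, "$12": 0.0},      # the lower price is worth more
    "patty": {"beef": 0.6, "meatless": 0.0},
}

def utility(product, part_worths, must_have=None):
    """Sum of part-worths, or -inf if a must-have level is missing."""
    for factor, level in (must_have or {}).items():
        if product.get(factor) != level:
            return float("-inf")            # non-compensatory: nothing makes up for it
    return sum(part_worths[factor][level] for factor, level in product.items())

favourite_color = {"color": "red", "price": "$12", "patty": "beef"}
lower_price = {"color": "blue", "price": "$10", "patty": "beef"}

# Compensatory trade-off: the lower price outweighs the favourite color.
print(utility(favourite_color, part_worths), utility(lower_price, part_worths))

# Non-compensatory rule: a vegetarian rejects any product without a meatless patty.
vegetarian_rule = {"patty": "meatless"}
print(utility(lower_price, part_worths, must_have=vegetarian_rule))
```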
Overview of Conjoint Analysis:
These three steps form the basics of conjoint analysis:
1. Collecting trade-offs: a questionnaire with a statistical design
showing various options of the product, and respondents' input
in terms of product preference.
2. Estimating buyer value systems: modeling by the analytics
team.
3. Making predictions: simulation based on the model developed.
The analytics team works with you to produce the results best suited to
answering your client’s marketing question.
Rating Based Conjoint
• We design conjoint cards that represent possible products based
on factor levels. Respondents are asked to rate each card in
terms of purchase intent. (Or, as in this example, likelihood to vote
for this candidate.)
Candidate A
Health Care: The government should get out of health care.
Foreign Affairs: Overseas, America should focus on leading the world and promoting our values, and not listen to the UN.
Size Of Government: The federal government is bloated, corrupt and wasteful; spending needs to be cut dramatically.
Energy/Environment: Jobs, a strong economy and energy independence are more important than the environment.
Education: The best way to improve the education system is by giving more resources to our public school teachers.
For a family of four with a household income of $85,000: Increase tax by $1,000.
For a single person with an income of $35,000: Increase tax by $400.
• Alternatively, we can show respondents a stack of cards and ask
them to rank all the cards in order of preference.
Rating Based Conjoint
• Analysis: based on regression. Linear (ratings), Logistic
(ranking).
• Individual-level estimation is possible, i.e., each respondent has
a model based on his or her own data, but it requires collecting a lot of
information from each individual, so most models are at the aggregate level.
• Output:
• Preference for the various product options on the same rating
scale
• simulated preference rating
• Relative preference for the various levels within each factor
• Isotherm
• Problems:
• Ratings: scale usage issues, yea-sayers vs. nay-sayers.
• Ranking: only works with very small problems.
• New applications: Media Impact tool, Virtual Menu Board
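The ratings analysis described above can be sketched on a toy example. For a balanced full-factorial design, the regression estimates of the level effects (with effects coding) equal each level's mean rating minus the grand mean, so this sketch uses means instead of a regression library. The two factors, their levels and the ratings are made up, and real studies use larger fractional designs that need a proper regression fit.

```python
# One respondent rates all four cards of a 2 x 2 design (made-up ratings).
cards = [
    ({"color": "red", "price": "$10"}, 9),
    ({"color": "red", "price": "$12"}, 6),
    ({"color": "blue", "price": "$10"}, 7),
    ({"color": "blue", "price": "$12"}, 4),
]

grand_mean = sum(rating for _, rating in cards) / len(cards)

# Collect the ratings of every card that shows a given level.
ratings_by_level = {}
for product, rating in cards:
    for factor, level in product.items():
        ratings_by_level.setdefault(factor, {}).setdefault(level, []).append(rating)

# Relative preference for each level: level mean minus grand mean.
part_worths = {
    factor: {level: sum(rs) / len(rs) - grand_mean for level, rs in levels.items()}
    for factor, levels in ratings_by_level.items()
}

def predicted_rating(product):
    """Simulated preference rating on the original rating scale."""
    return grand_mean + sum(part_worths[f][lvl] for f, lvl in product.items())

print(part_worths)
print(predicted_rating({"color": "red", "price": "$10"}))
```

The part-worths give the relative preference for the levels within each factor, and `predicted_rating` is the simulated preference rating for any product option.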
Choice Based Conjoint
• Choice Based Conjoint: we design conjoint cards that represent
possible products based on factor levels. Products are grouped
into options within a card, and respondents are asked to choose
within the group.
• Over the last decade, academics and practitioners have favored
choice over ratings-based methods:
• Stronger mathematical theory (McFadden: MNL theory)
• Stronger psychological underpinnings
• Argued to be more accurate (comparison to market data)
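Under the multinomial logit (MNL) model, the probability of choosing an option is the exponential of its utility divided by the sum of the exponentials across all options shown. A minimal sketch, with hypothetical utilities and product names:

```python
from math import exp

def mnl_shares(utilities):
    """Multinomial logit choice probabilities for a set of options."""
    # Subtracting the maximum keeps exp() numerically stable
    # without changing the resulting shares.
    top = max(utilities.values())
    exp_u = {name: exp(u - top) for name, u in utilities.items()}
    total = sum(exp_u.values())
    return {name: value / total for name, value in exp_u.items()}

# Hypothetical total utilities of three products in one choice task.
utilities = {"Product A": 1.2, "Product B": 0.7, "Product C": 0.1}
shares = mnl_shares(utilities)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The same calculation, applied to simulated product line-ups, is the basis of a choice-share simulator.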
Discrete Choice Model
• DCM is really just one type of CBC, where the focus is less on
optimizing the product offer, more on the market competitive
context.
CBC
• Uses multiple factors (6-10) to describe products.
• Respondents are shown a limited number of options per card (4-6).
• Usually comes at an earlier stage in product development, for:
– Market potential
– Best feature combination
– Rough price level
DCM
• Mostly uses Brand/Price combinations to describe products.
• Respondents are shown many options that represent most of the market.
• Usually comes at a later stage in product development, to:
– Test various marketing inputs, such as package and POS
– Determine pricing scenarios and product lineup vs. competition
CBC Choice Tasks
CBC Task
DCM Choice Tasks
CBC & DCM
• Output: the basic output is similar to that from rating-based
conjoint.
• CBC:
• Factor importance/Level preference - Isotherm
• Simulation: simulator, product optimization
• Individual-level estimation allows you to further segment the
respondents.
• Potentially developing a different optimized product for each segment.
Caution: no simple typing tools for these.
• Feature optimization.
• DCM:
• Usually no isotherm except for impact of packaging change,
sale/promotions
• Simulator: line optimization, pricing optimization
The CBC Simulator
Factor Importance/Level Preference
ISOTHERM EXPLANATION
• Each feature is shown as a vertical line: the longer the line, the greater its strength in driving choice. Features are displayed in
descending order, so features on the left are more important than features on the right.
• The options within each feature are shown as tick marks along each vertical line. The higher up on the vertical line, the stronger
the preference for that option.
• Let’s use Shape & display size as an example. After price, it is the most important attribute driving choice, so it should be an
important focus area when designing the new device.
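A common way to turn level preferences into the line lengths described above is to take the range of the part-worths within each feature and express it as a share of the total range across features. The sketch below assumes that convention; the part-worth values and level names are hypothetical and are not read from the chart.

```python
# Hypothetical part-worths (level preferences) for three features.
part_worths = {
    "price": {"Price 1": 1.5, "Price 2": 0.0, "Price 3": -1.5},
    "shape": {"Shape 4": 0.75, "Shape 3": 0.25, "Shape 1": -0.75},
    "brand": {"Brand A": 0.25, "Brand T": -0.25},
}

# Length of each vertical line: most preferred minus least preferred level.
ranges = {
    feature: max(levels.values()) - min(levels.values())
    for feature, levels in part_worths.items()
}

# Attribute importance: each range as a share of the total, sorted descending.
total_range = sum(ranges.values())
importance = {
    feature: ranges[feature] / total_range
    for feature in sorted(ranges, key=ranges.get, reverse=True)
}

for feature, share in importance.items():
    print(f"{feature}: {share:.0%}")
```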
[Figure: isotherm. Attributes run from more important (left) to less important (right); level preference runs from high (top) to low (bottom). Price (37%) and Shape & display size (18%) are the two most important attributes. The remaining attributes are manufacturer brand, speed, battery life, camera, thickness, display and Feature1 to Feature5, each at 12% or less.]
Choice Share
Pricing Scenarios Through Simulation
Supporting Materials
• For documentation, proposals, reporting, questionnaires,
learning materials, etc…
Advanced Analytics Intranet/ Knowledge Center/Statistical
Methods/CBC
PROGRAMMING CUSTOM CONJOINT SHOWCASE