Segmenting Consumers on the Basis of Nutrition Attitudes

APPLYING COMPONENTS ANALYSIS TO
ATTITUDINAL SEGMENTATION
KRISHNA TATENENI & MEGAN SCHILLER
CURRENT TOPICS IN THE THEORY AND APPLICATION OF LATENT VARIABLE MODELS:
A CONFERENCE HONORING THE SCIENTIFIC CONTRIBUTIONS OF MICHAEL W. BROWNE
September 9-10, 2010
OBJECTIVES OF OUR TALK
• Provide an Overview of Components Analysis and Segmentation
• Give an Example of the Use of Data-based Components Analysis for Consumer Segmentation
• Demonstrate Segment Profiling & Classification of Individuals
OVERVIEW OF COMPONENTS ANALYSIS AND SEGMENTATION
Though Factor Analysis (FA) and Components Analysis (CA) share
superficial similarities, there are key differences in their goals
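• FA and CA may be represented by the same equation, a multivariate multiple regression of p dependent variables on m independent variables:
  $x_i = \mu + \Lambda z_i + e_i$
  where, for individual i, $x_i$ is the p × 1 vector of dependent variables, $z_i$ is the m × 1 vector of independent variables, $\mu$ is the p × 1 vector of intercepts, $\Lambda$ is the p × m matrix of regression weights, and $e_i$ is the p × 1 vector of errors
• However, there are some important differences: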
Factor Analysis
• The independent variables are latent (i.e., unobservable for individuals)
• There is a model that implies uncorrelated error terms
  - The goodness-of-fit of the model can be measured and used to select one model over another

Components Analysis
• The independent variables are manifest (i.e., known for each individual)
• There is no testable model in components analysis
  - The goal is to replace many variables with a smaller set while minimizing loss of information
Browne and Tateneni (2009): "The dominant purpose of a method of data analysis is determined by assumptions made in the derivation of the method, not by common usage (or misuse)."
Data-based Components Analysis (DBCA) is a computer program
designed for implementing components analysis
• DBCA applies a singular value (Eckart-Young) decomposition directly to the data matrix, retaining the first m components
• Unlike in FA, correlations are not calculated at any stage of the analysis
• One side of the decomposition receives the singular values; this side consists of the "component loadings," which are analogous to factor loadings in factor analysis
• The other side consists of component scores; there is no equivalent in factor analysis, since factors are latent, unobservable variables
• The initial solution meets the identification conditions of principal components
• However, orthogonal or oblique rotation may be carried out without changing the amount of variance explained by the m components
• Rotation retains the property of the components being optimal predictors of the original variables, but destroys the principal component identification conditions
• Rotation in DBCA is double-sided: both component loadings and component scores are changed by rotation
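The decomposition and the double-sided rotation described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the DBCA program itself: `dbca` and `rotate` are hypothetical helper names, columns are standardized with the population standard deviation, component scores are scaled to unit variance so that the singular values sit on the loadings side, and the data and rotation angle are placeholders. The final check shows that rotating both sides changes the scores and loadings but leaves the fitted data matrix, and hence the variance explained, unchanged.

```python
import numpy as np


def dbca(X, m):
    """Truncated SVD of the column-standardized data matrix (hypothetical helper).

    Returns the standardized data, n x m component scores, and p x m loadings,
    with Xs approximately equal to scores @ loadings.T.
    """
    n = X.shape[0]
    # Standardize with ddof=0 so each column has mean 0 and variance 1.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = np.sqrt(n) * U[:, :m]               # unit-variance component scores
    loadings = Vt[:m].T * (s[:m] / np.sqrt(n))   # singular values go to this side
    return Xs, scores, loadings


def rotate(scores, loadings, T):
    """Double-sided orthogonal rotation: scores and loadings both change."""
    return scores @ T, loadings @ T


rng = np.random.default_rng(0)
X = rng.normal(size=(310, 6))  # placeholder data, not the study's ratings
Xs, Z, A = dbca(X, m=2)

theta = np.pi / 6  # arbitrary angle, for illustration only
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
Zr, Ar = rotate(Z, A, T)

# Rotation changes the individual scores and loadings ...
print(np.allclose(Z, Zr))                # False
# ... but not the fitted data matrix or the variance it accounts for.
print(np.allclose(Z @ A.T, Zr @ Ar.T))   # True
```

Because T is orthogonal, Zr @ Ar.T = Z @ T @ T.T @ A.T = Z @ A.T; an oblique rotation would need the inverse transformation on one side instead.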
There is a natural connection between the data-based view of
components analysis and segmentation of respondents
• The analysis may be viewed as simultaneously addressing two problems:
  - The data condensation problem: replacing a large number of variables with a smaller set of "components" that are linear combinations of the original variables
  - The respondent profiling problem: replacing complex respondent profiles on many different variables with more interpretable respondent profiles on a small set of variables
• Grouping ("segmenting") respondents by feeding the results of a principal components analysis into a clustering algorithm is not a new idea
• However, it is our view that a better understanding of components analysis, and in particular of the value of analytic rotation, will lead to better applications in segmentation
• In this presentation, we will review the problem of segmenting consumers on the basis of their attitudes and demonstrate the utility of data-based components analysis in addressing this problem
Segmentation is a business decision made in order to have a more
targeted and efficient marketing strategy
• Segmentation is often described as a compromise between individual and mass marketing
• Segmentation is a decision by the marketer to divide the overall market into two or more distinct segments
  - Each segment is different from the others (i.e., there is a different "need" to be met)
  - Members within the same segment are similar in ways that are important to the marketer
• Metrics for measuring similarity / distinctness depend on the specific business objective of the segmentation; some examples:
  - Interest in a new product
  - Geographic location
  - Attitude towards a specific social cause
Data-based components analysis is consistent with a parsimonious
approach to the business problem addressed by segmentation
• One type of marketing problem is to define segments that will respond to different messaging approaches
  - For example, one segment of consumers may be interested in fast food restaurants because they are affordable, while another segment may be interested in the convenience aspect of the same type of restaurant
• For differential messaging to various audiences, consumers may be segmented on the basis of responses to attitudinal statements
• Applying data-based components analysis to a set of ratings, all on the same scale, seems a natural fit for the method
• By defining a few key components, a large data set can be condensed into a smaller set that is easier to understand
• With careful planning, it may be possible to have just a handful of components and define segments simply by examining means or medians on those components instead of using further clustering techniques
USING DATA-BASED COMPONENTS ANALYSIS
FOR CONSUMER SEGMENTATION
We used DBCA to segment caretakers of young children on their
attitudes about nutrition and food preparation
• We surveyed 310 individuals. To qualify for the study, they had to:
  - Have at least one child between the ages of 9 months and 10 years
  - Take an active role in purchasing and preparing food for these children
• Respondents were asked to characterize their level of agreement with 18 attitudinal statements (9 about nutrition, 9 about food preparation)
  - These questions were intended to serve as the basis for the segmentation
• We also included some demographic questions so that we could later characterize our segments. These included, but were not limited to:
  - Gender
  - Age
  - Marital status
  - Employment status
We began the analysis by refining the basis variables for the
segmentation
• Instead of using all 18 attitudinal statements, we examined the correlations among them and selected a subset of 6 statements:
  Organic food attitudes
  - Organic foods are healthier than non-organic foods
  - Organic foods are safer for my family than non-organic foods
  - Organic foods are generally of a higher quality than non-organic foods
  Food prep attitudes
  - I prefer serving ready-to-eat meals because there is less cleaning up to do afterwards
  - I often buy ready-to-serve meals because it frees up my time to do other things with my family
  - I find it easier to get my child / children to eat pre-prepared foods than homemade food
• These variables are strongly correlated with each other within their two groupings and weakly correlated across groupings
• Investing the initial effort in refining the segmentation basis results in a clearer and more stable solution that is not susceptible to quirks in segmentation methods
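One simple way to summarize the correlation pattern described above is to average the absolute correlations within and across the two groups of statements. The sketch below does this with NumPy on synthetic ratings generated from two hypothetical attitude dimensions; `block_correlations` and the simulated data are illustrative assumptions, not the authors' selection procedure or data.

```python
import numpy as np


def block_correlations(X, groups):
    """Mean absolute correlation within and across groups of columns (hypothetical helper)."""
    R = np.corrcoef(X, rowvar=False)             # p x p correlation matrix
    g = np.asarray(groups)
    same = g[:, None] == g[None, :]              # True where two columns share a group
    off_diag = ~np.eye(len(g), dtype=bool)       # exclude each column's self-correlation
    within = np.abs(R[same & off_diag]).mean()
    across = np.abs(R[~same]).mean()
    return within, across


rng = np.random.default_rng(1)
f = rng.normal(size=(310, 2))                    # two simulated attitude dimensions
X = np.repeat(f, 3, axis=1) + 0.6 * rng.normal(size=(310, 6))
groups = ["organic"] * 3 + ["food prep"] * 3
within, across = block_correlations(X, groups)
print(f"within: {within:.2f}, across: {across:.2f}")
```

A large gap between the two averages is the kind of structure that later lets two components separate cleanly.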
The first two components account for over two-thirds of the total
variance in our subset of 6 attitudinal variables
COMPONENT   CUMULATIVE PROPORTION OF TOTAL VARIANCE
1           40%
2           69%
3           82%
4           89%
5           95%
6           100%
• Variables were standardized to have means of 0 and standard deviations of 1 prior to the singular value decomposition of the data matrix
  - Because the statements were rated on the same scale, the variances could have been left unstandardized
  - In this data set, the variances were fairly uniform, so we chose to standardize, which gives component loadings that can be interpreted as correlations in orthogonally rotated solutions
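Both claims on this slide, the cumulative proportions of variance and the reading of loadings as correlations, can be checked numerically. The sketch below uses synthetic data (an assumption; the study's ratings are not reproduced here) and hypothetical helper names. It standardizes with the population standard deviation, reads cumulative proportions off the squared singular values, and confirms that with unit-variance component scores the loadings equal the correlations between variables and components.

```python
import numpy as np


def standardize(X):
    """Column means 0 and standard deviations 1 (population SD, ddof=0)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def cumulative_variance(X):
    """Cumulative proportion of total variance accounted for by successive components."""
    s = np.linalg.svd(standardize(X), compute_uv=False)
    # s_k**2 / n is the variance accounted for by component k; the total is p.
    return np.cumsum(s**2) / np.sum(s**2)


def loadings_and_correlations(X, m):
    """Loadings from the SVD and the variable-component correlations, side by side."""
    Xs = standardize(X)
    n, p = Xs.shape
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    Z = np.sqrt(n) * U[:, :m]                  # unit-variance component scores
    A = Vt[:m].T * (s[:m] / np.sqrt(n))        # p x m component loadings
    R = np.corrcoef(Xs.T, Z.T)[:p, p:]         # corr(variable j, component k)
    return A, R


rng = np.random.default_rng(1)
f = rng.normal(size=(310, 2))                  # two simulated attitude dimensions
X = np.repeat(f, 3, axis=1) + 0.6 * rng.normal(size=(310, 6))

cum = cumulative_variance(X)
print(np.round(cum, 2))
A, R = loadings_and_correlations(X, m=2)
print(np.allclose(A, R))                       # True
```

An orthogonal rotation keeps the scores uncorrelated with unit variance, so the rotated loadings A @ T are still variable-component correlations; after an oblique rotation that no longer holds.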
Examining the composition of the variances for a two-component solution shows generally good results
COMMON VARIANCE   RESIDUAL VARIANCE   STATEMENT
      76%                24%          Organic foods are healthier than non-organic foods
      79%                21%          Organic foods are safer for my family than non-organic foods
      75%                25%          Organic foods are generally of a higher quality than non-organic foods
      68%                32%          I prefer serving ready-to-eat meals because there is less cleaning up to do afterwards
      74%                26%          I often buy ready-to-serve meals because it frees up my time to do other things with my family
      44%                56%          I find it easier to get my child / children to eat pre-prepared foods than homemade food
• Organic food statements were fairly strongly correlated with each other, while food preparation statements had weaker within-group correlations
Unrotated principal components reflect the two groupings of the statements; however, the components are difficult to interpret
[Figure: loadings of the six statements on the two unrotated principal components]
An orthogonal rotation towards an interpretable target yields very
good results, thanks to the careful selection of the basis variables*
[Figure: loadings of the six statements on the two components after orthogonal target rotation]
*An oblique rotation makes the components slightly negatively correlated.
We chose an orthogonal rotation for ease of presentation.
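A target rotation of this kind can be sketched with the orthogonal Procrustes solution: given unrotated loadings A and a target pattern B, the orthogonal T that minimizes the distance between A @ T and B is U @ Vt from the SVD of A.T @ B. This is a stand-in, not necessarily the criterion DBCA uses (target rotation may, for instance, use a partially specified target), and `procrustes_rotation` is a hypothetical name. For illustration only, the "unrotated" matrix is manufactured by turning the rounded loadings from the table on the next slide by 45 degrees; it is not the study's actual unrotated solution.

```python
import numpy as np


def procrustes_rotation(A, B):
    """Orthogonal T minimizing the Frobenius norm of A @ T - B."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt


# Rounded rotated loadings reported in the study (rows = six statements).
reported = np.array([[0.87, -0.08], [0.89, -0.05], [0.87, -0.01],
                     [-0.04, 0.82], [-0.05, 0.86], [-0.07, 0.66]])

# Hypothetical "unrotated" loadings: the reported ones turned by 45 degrees.
c = s = np.sqrt(0.5)
R45 = np.array([[c, -s], [s, c]])
A = reported @ R45.T

# Interpretable target: organic statements on component 1, food prep on component 2.
B = np.array([[1, 0]] * 3 + [[0, 1]] * 3, dtype=float)

T = procrustes_rotation(A, B)
rotated = A @ T
print(np.round(rotated, 2))
```

The rotation recovers the simple structure (large loadings on one component, near-zero cross-loadings) while leaving each statement's sum of squared loadings untouched.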
The rotated component loadings are structured in a way that makes
the components easy to interpret
Component 1   Component 2   Statement
    0.87         -0.08      Organic foods are healthier than non-organic foods
    0.89         -0.05      Organic foods are safer for my family than non-organic foods
    0.87         -0.01      Organic foods are generally of a higher quality than non-organic foods
   -0.04          0.82      I prefer serving ready-to-eat meals because there is less cleaning up to do afterwards
   -0.05          0.86      I often buy ready-to-serve meals because it frees up my time to do other things with my family
   -0.07          0.66      I find it easier to get my child / children to eat pre-prepared foods than homemade food
• High scores on Component 1 are associated with positive attitudes towards organic foods; high scores on Component 2 are associated with positive attitudes towards prepared foods
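Because the rotation is orthogonal, it does not change any statement's common variance: that is the sum of the statement's squared loadings, and for standardized variables the residual variance is one minus that sum. The short check below recomputes both from the rounded loadings in the table above. Since the loadings are rounded to two decimals, the results can differ by about a percentage point from the common-variance table on the earlier slide.

```python
# Rounded rotated loadings from the table above (rows = the six statements).
loadings = [
    (0.87, -0.08),
    (0.89, -0.05),
    (0.87, -0.01),
    (-0.04, 0.82),
    (-0.05, 0.86),
    (-0.07, 0.66),
]

# Common variance (communality): sum of squared loadings for each statement.
common = [sum(a * a for a in row) for row in loadings]
# Standardized variables have variance 1, so the remainder is residual variance.
residual = [1.0 - h for h in common]
# Share of the total variance (6 standardized variables) explained by both components.
total_explained = sum(common) / len(loadings)

for h, r in zip(common, residual):
    print(f"common {h:.0%}  residual {r:.0%}")
print(f"both components: {total_explained:.0%}")
```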
We used median splits to define four segments based on the two
optimal components after the orthogonal target rotation
[Figure: respondents' scores on the two rotated components, divided into four segments by median splits]
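The median-split rule can be written directly. This is a sketch under assumptions: `median_split_segments` is a hypothetical helper, column 0 holds the organic-attitude component and column 1 the prepared-food component, a score exactly at the median is counted as low, and the quadrant-to-segment names follow the profiling table later in the deck. Simulated scores stand in for the study's data.

```python
from collections import Counter

import numpy as np


def median_split_segments(scores):
    """Quadrant labels from two component scores split at their medians (hypothetical helper)."""
    med = np.median(scores, axis=0)
    high_org = scores[:, 0] > med[0]     # above-median organic attitude
    high_prep = scores[:, 1] > med[1]    # above-median prepared-food attitude
    labels = np.empty(len(scores), dtype=object)
    labels[high_org & high_prep] = "Q1 Healthy But Helpless"
    labels[~high_org & high_prep] = "Q2 Conventional Convenience"
    labels[~high_org & ~high_prep] = "Q3 You Process, I Prep"
    labels[high_org & ~high_prep] = "Q4 Organic On-Your-Own"
    return labels


rng = np.random.default_rng(2)
scores = rng.normal(size=(310, 2))   # simulated rotated component scores
labels = median_split_segments(scores)
counts = Counter(labels)
print(counts)
```

With two uncorrelated components, each median split halves the sample and the four quadrants come out roughly equal in size.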
Looking at the scores prior to rotation, it is clear that segment
assignments would be very different with principal components
[Figure: respondents' scores on the two unrotated principal components]
The oft-used “bubble plot” is an attractive albeit misleading
representation of segmentation solutions
[Figure: bubble plot of the four segments]
The four evenly sized segments reflect our choice of basis variables, which led to two orthogonal components
• In practice, we believe that thoughtful consideration of the business problem will usually yield a small, relevant basis for the segmentation
• In our opinion, traditional approaches to market segmentation rely too much on "the blender approach," in which the analyst runs cluster or latent class analysis on a large number of variables and produces a wide variety of solutions
  - Getting caught up in adding color to the segments ("the solution is not useful if the segments are not different on X") leads the analysis away from the core business issues
  - The resulting solutions often reflect compromises that make the segmentation unimplementable
• Segmentation is not a scientific theory, but a business decision
  - Approaches to segmentation will always be algorithmic in nature
  - Data-based components analysis is a suitable tool for building such algorithms
PROFILING SEGMENTS & CLASSIFYING INDIVIDUALS
We analyzed demographic and behavioral data to profile the segments
ORGANIC ON-YOUR-OWN (Q4)
• Description: Organic purchasers, prepare food themselves
• Demographics: More likely to be older, female, liberal, in a suburban setting
• Shopping behavior: More likely to shop at organic stores and farmers markets
• Sample proportion: 23%

YOU PROCESS, I PREP (Q3)
• Description: Non-organic purchasers, prepare food themselves
• Demographics: More likely to be male, politically and geographically diverse
• Shopping behavior: Diverse shopping habits
• Sample proportion: 27%

CONVENTIONAL CONVENIENCE (Q2)
• Description: Non-organic purchasers, pre-prepared food users
• Demographics: More likely to be male, in a rural setting
• Shopping behavior: Less likely to shop at organic stores and farmers markets
• Sample proportion: 22%

HEALTHY BUT HELPLESS (Q1)
• Description: Organic purchasers, pre-prepared food users
• Demographics: More likely to be younger, female, liberal, in an urban setting
• Shopping behavior: More likely to shop at organic stores and farmers markets
• Sample proportion: 28%
Often, a classification tool is required in order to implement a
segmentation and address the business issue that is at stake
• In segmentation studies aimed at uncovering subpopulations of high opportunity for a specific marketing situation, it is necessary to provide guidance on classifying individuals outside the study sample; for example:
  - Pre-classified lists of physicians that can be used by pharmaceutical sales representatives
  - Classifications of retail website visitors to customize the information presented to them
• In the study reported here, we did not set out with an obvious business problem, but rather hoped to develop a simplified but realistic example data set
  - We did include several questions in addition to the attitudinal statements that might be suitable for classifying individuals
  - We examined the relationship of these questions to the segment classifications to investigate whether a classification tool might be developed
We found that there were seven different questions that were
reasonably predictive of segment membership
1. What is the highest level of education you have completed?
Some high school, High school / GED, Some college, Associates, Bachelors, Advanced degree
2. Which of the following best describes the community where you live?
Urban, Suburban, Rural
3. How often do you buy organic food for your family?
Regularly, Occasionally, Never
4. (If applicable) In the last year my purchasing of organic foods has:
Increased, Stayed the same, Decreased
5. In a typical week I feed my child / children frozen food or pre-prepared food:
Never, 1-3 times, 4-5 times, More than 5 times
6. In a typical week I eat out with my child / children:
Never, 1-3 times, 4-5 times, More than 5 times
7. Do you shop for groceries at natural and organic grocery stores such as Whole Foods and
Trader Joe’s?
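All seven questions are categorical, so a categorical naive Bayes classifier of the kind used on the next slide needs only per-segment answer frequencies. The sketch below is a from-scratch version with Laplace smoothing. The smoothing constant `alpha=1` is an assumption, and the helper names and two-question toy data are hypothetical: this is not the authors' implementation or data.

```python
import math
from collections import Counter


def fit_naive_bayes(X, y, alpha=1.0):
    """Categorical naive Bayes (hypothetical helper). X: answer tuples, y: segment labels."""
    classes = Counter(y)
    # counts[c][j][v]: respondents in segment c who gave answer v to question j
    counts = {c: [Counter() for _ in X[0]] for c in classes}
    levels = [set() for _ in X[0]]          # answer categories seen for each question
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            counts[c][j][v] += 1
            levels[j].add(v)
    return classes, counts, levels, alpha


def predict(model, row):
    """Return the segment with the largest log posterior for one respondent."""
    classes, counts, levels, alpha = model
    total = sum(classes.values())
    best, best_score = None, -math.inf
    for c, n_c in classes.items():
        score = math.log(n_c / total)       # log prior: segment share of the sample
        for j, v in enumerate(row):
            # Laplace-smoothed P(answer v to question j | segment c)
            score += math.log((counts[c][j][v] + alpha) / (n_c + alpha * len(levels[j])))
        if score > best_score:
            best, best_score = c, score
    return best


# Toy data: (organic purchase frequency, weekly pre-prepared meals) -> segment
train = [
    (("Regularly", "Never"), "Organic On-Your-Own"),
    (("Regularly", "Never"), "Organic On-Your-Own"),
    (("Regularly", "1-3 times"), "Healthy But Helpless"),
    (("Regularly", "4-5 times"), "Healthy But Helpless"),
    (("Never", "4-5 times"), "Conventional Convenience"),
    (("Never", "1-3 times"), "Conventional Convenience"),
    (("Never", "Never"), "You Process, I Prep"),
    (("Occasionally", "Never"), "You Process, I Prep"),
]
model = fit_naive_bayes([x for x, _ in train], [y for _, y in train])
print(predict(model, ("Regularly", "Never")))   # Organic On-Your-Own
print(predict(model, ("Never", "4-5 times")))   # Conventional Convenience
```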
Using a naïve Bayes classifier, we found that these seven questions are
moderately effective at classifying respondents in the sample*
(Rows: predicted segment; columns: original segment)
PREDICTED SEGMENT          ORGANIC ON-YOUR-OWN  YOU PROCESS, I PREP  CONVENTIONAL CONVENIENCE  HEALTHY BUT HELPLESS
ORGANIC ON-YOUR-OWN                61%                  23%                     2%                     15%
YOU PROCESS, I PREP                14%                  55%                    16%                     14%
CONVENTIONAL CONVENIENCE           11%                  23%                    45%                     21%
HEALTHY BUT HELPLESS               16%                  21%                    16%                     47%
• Each row in the table above shows the percentage of respondents in each predicted segment who fall into each of the original segments
• Overall classification accuracy is 50%
• These questions perform significantly better than chance in classifying respondents:
  - If no information is available, there is approximately a 25% chance of correctly identifying the segment to which a respondent belongs
*Cross-validation samples are very rare in market research practice.
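The 25% chance figure and its comparison with the 50% accuracy can be made explicit. The sketch below takes the segment proportions from the profiling table and computes three common chance baselines. It then treats the 310 in-sample classifications as independent trials for a normal-approximation z statistic; that test is an illustrative assumption, since the slide does not say which test was used, and (per the footnote) the accuracies are not cross-validated.

```python
import math

# Segment shares from the profiling table (Q4, Q3, Q2, Q1).
props = [0.23, 0.27, 0.22, 0.28]

uniform_chance = 1 / len(props)                  # guess a segment uniformly at random
proportional_chance = sum(p * p for p in props)  # guess in proportion to segment sizes
max_chance = max(props)                          # always guess the largest segment

# Normal-approximation z statistic: 50% observed accuracy vs. 25% chance, n = 310.
n, observed = 310, 0.50
se = math.sqrt(uniform_chance * (1 - uniform_chance) / n)
z = (observed - uniform_chance) / se

print(f"{uniform_chance:.4f} {proportional_chance:.4f} {max_chance:.2f} z = {z:.1f}")
# 0.2500 0.2526 0.28 z = 10.2
```

All three baselines fall near the 25% quoted on the slide, so the comparison does not hinge on which one is used.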
A hierarchical approach to classification yields a slight improvement in classification accuracy
(Rows: predicted segment; columns: original segment)
PREDICTED SEGMENT          ORGANIC ON-YOUR-OWN  YOU PROCESS, I PREP  CONVENTIONAL CONVENIENCE  HEALTHY BUT HELPLESS
ORGANIC ON-YOUR-OWN                55%                  22%                     3%                     20%
YOU PROCESS, I PREP                11%                  57%                    19%                     13%
CONVENTIONAL CONVENIENCE           11%                  18%                    48%                     23%
HEALTHY BUT HELPLESS               15%                  19%                    17%                     49%
• The hierarchical classification tool uses the same seven questions as the naïve Bayes classifier
• However, classification is done in two steps (both also using naïve Bayes classifiers):
  1. Respondents are classified into either an Organic+ or an Organic− segment
  2. Within each of the two, respondents are further classified into one of the two pre-prepared segments
• Overall classification accuracy improves slightly to 52%
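The two-step scheme can be sketched by chaining categorical naive Bayes classifiers: one model separates Organic+ from Organic−, then a separate model within each branch handles the prepared-food split. The helper names, the Laplace smoothing, the ASCII labels (`Organic+`, `Organic-`, `Prep+`, `Prep-`), and the toy data are assumptions for illustration; the branch-to-segment mapping follows the quadrant definitions in the profiling table.

```python
import math
from collections import Counter


def fit_nb(X, y, alpha=1.0):
    """Categorical naive Bayes with Laplace smoothing (hypothetical helper)."""
    classes = Counter(y)
    counts = {c: [Counter() for _ in X[0]] for c in classes}
    levels = [set() for _ in X[0]]
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            counts[c][j][v] += 1
            levels[j].add(v)
    return classes, counts, levels, alpha


def predict_nb(model, row):
    """Most probable class under the naive Bayes model."""
    classes, counts, levels, alpha = model
    total = sum(classes.values())

    def log_posterior(c):
        n_c = classes[c]
        return math.log(n_c / total) + sum(
            math.log((counts[c][j][v] + alpha) / (n_c + alpha * len(levels[j])))
            for j, v in enumerate(row))

    return max(classes, key=log_posterior)


# Quadrant definitions from the profiling table.
SEGMENTS = {
    ("Organic+", "Prep+"): "Healthy But Helpless",
    ("Organic-", "Prep+"): "Conventional Convenience",
    ("Organic-", "Prep-"): "You Process, I Prep",
    ("Organic+", "Prep-"): "Organic On-Your-Own",
}


def fit_hierarchical(X, organic, prep):
    """Step 1 model: Organic+ vs Organic-. Step 2: one Prep+/Prep- model per branch."""
    stage1 = fit_nb(X, organic)
    stage2 = {}
    for branch in set(organic):
        idx = [i for i, o in enumerate(organic) if o == branch]
        stage2[branch] = fit_nb([X[i] for i in idx], [prep[i] for i in idx])
    return stage1, stage2


def predict_hierarchical(model, row):
    stage1, stage2 = model
    branch = predict_nb(stage1, row)                              # step 1
    return SEGMENTS[(branch, predict_nb(stage2[branch], row))]    # step 2


# Toy data: (organic purchase frequency, weekly pre-prepared meals)
X = [("Regularly", "Never"), ("Regularly", "Never"),
     ("Regularly", "1-3 times"), ("Regularly", "4-5 times"),
     ("Never", "4-5 times"), ("Never", "1-3 times"),
     ("Never", "Never"), ("Occasionally", "Never")]
organic = ["Organic+"] * 4 + ["Organic-"] * 4
prep = ["Prep-", "Prep-", "Prep+", "Prep+", "Prep+", "Prep+", "Prep-", "Prep-"]

model = fit_hierarchical(X, organic, prep)
print(predict_hierarchical(model, ("Regularly", "Never")))   # Organic On-Your-Own
print(predict_hierarchical(model, ("Never", "4-5 times")))   # Conventional Convenience
```

Each second-stage model only has to separate two segments, using respondents from its own branch, which is one plausible reason the chained version can edge out a single four-way classifier.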
Questions & Discussion . . .
Biographies & Acknowledgements
• Krishna Tateneni, Principal, Tateneni Consulting. Before starting his own consulting practice, Krishna worked as a Director in the Decision Modeling Group at Eidetics, a pharmaceutical marketing research and consulting firm in Boston. Prior to that, he worked as the Lead Statistician for the College Board's Advanced Placement Program at Educational Testing Service (ETS). He holds a Ph.D. in Quantitative Psychology from the Ohio State University.
• Megan Schiller, Director, Tateneni Consulting. Megan has over ten years of experience in pharmaceutical market research. Prior to working with Tateneni Consulting, she was a Senior Manager in the Client Service Group at Eidetics. She has experience in both quantitative and qualitative research techniques. She has a Bachelor of Science in Marketing from Boston College.
• The authors thank Pam Litner of Schlesinger Associates Downtown Chicago for her help in recruiting respondents for our study.
Schlesinger Associates
www.schlesingerassociates.com
APPENDIX: DATA ANALYSIS BY SEGMENT
Gender
[Figure: gender breakdown (male / female) for each segment: Conventional Convenience, Healthy But Helpless, You Process, I Prep, and Organic On-Your-Own]
Age
[Figure: age breakdown (less than 40 / 40 and over) for each segment: Conventional Convenience, Healthy But Helpless, You Process, I Prep, and Organic On-Your-Own]
Location
[Figure: location breakdown (urban / suburban / rural) for each segment: Conventional Convenience, Healthy But Helpless, You Process, I Prep, and Organic On-Your-Own]
Political Leaning
[Figure: political leaning (conservative / moderate / liberal) for each segment: Conventional Convenience, Healthy But Helpless, You Process, I Prep, and Organic On-Your-Own]
Frequency of Organic Food Purchase
[Figure: frequency of organic food purchase (regularly / occasionally / never) for each segment: Conventional Convenience, Healthy But Helpless, You Process, I Prep, and Organic On-Your-Own]
Shopping Locations
[Figure: share of each segment shopping at organic stores, specialty food stores, and farmers markets: Conventional Convenience, Healthy But Helpless, You Process, I Prep, and Organic On-Your-Own]
Frequency of Feeding Frozen / Pre-prepared Foods
[Figure: frequency of feeding frozen or pre-prepared foods (never / 1-3 times/wk / 4+ times/wk) for each segment: Conventional Convenience, You Process, I Prep, Organic On-Your-Own, and Healthy But Helpless]