APPLYING COMPONENTS ANALYSIS TO ATTITUDINAL SEGMENTATION

KRISHNA TATENENI & MEGAN SCHILLER

CURRENT TOPICS IN THE THEORY AND APPLICATION OF LATENT VARIABLE MODELS: A CONFERENCE HONORING THE SCIENTIFIC CONTRIBUTIONS OF MICHAEL W. BROWNE
September 9-10, 2010

OBJECTIVES OF OUR TALK

• Provide an Overview of Components Analysis and Segmentation
• Give an Example of the Use of Data-based Components Analysis for Consumer Segmentation
• Demonstrate Segment Profiling & Classification of Individuals

OVERVIEW OF COMPONENTS ANALYSIS AND SEGMENTATION

Though Factor Analysis (FA) and Components Analysis (CA) share superficial similarities, there are key differences in their goals

• FA and CA may be represented by the same equation, a multivariate multiple regression of p dependent variables on m independent variables:

    $x_i = \Lambda z_i + e_i$

  (here $x_i$ holds the p dependent variables for individual i, $z_i$ the m independent variables, $\Lambda$ the p × m matrix of regression weights, and $e_i$ the errors)

• However, there are some important differences:

  Factor Analysis
    • The independent variables are latent (i.e., unobservable for individuals)
    • There is a model that implies uncorrelated error terms
    • The goodness-of-fit of the model can be measured and used to select one model over another

  Components Analysis
    • The independent variables are manifest (i.e., known for each individual)
    • There is no testable model in components analysis
    • The goal is to replace many variables with a smaller set while minimizing loss of information

Browne and Tateneni (2009): “The dominant purpose of a method of data analysis is determined by assumptions made in the derivation of the method, not by common usage (or misuse.)”

Data-based Components Analysis (DBCA) is a computer program designed for implementing components analysis

• DBCA applies a singular value (Eckart-Young) decomposition directly to the data matrix, retaining the first m components
• Unlike in FA, correlations are not calculated at any stage of the analysis
• One side of the decomposition receives the singular values; this side consists of the “component loadings,” which are analogous to factor loadings in factor analysis
• The other side consists of component scores; there is no equivalent in factor analysis since factors are latent, unobservable variables
• The initial solution meets the identification conditions of principal components
• However, orthogonal or oblique rotation may be carried out without changing the amount of variance explained by the m components
• Rotation retains the property of the components being optimal predictors of the original variables, but destroys the principal component identification conditions
• Rotation in DBCA is double-sided: both component loadings and component scores are changed by rotation

There is a natural connection between the data-based view of components analysis and segmentation of respondents

• The analysis may be viewed as simultaneously addressing two problems:
    • The data condensation problem: Replacing a large number of variables with a smaller set of “components” that are linear combinations of the original variables
    • The respondent profiling problem: Replacing complex respondent profiles on many different variables by more interpretable respondent profiles on a small set of variables
• Grouping (“segmenting”) respondents by feeding the results of a Principal Components analysis into a clustering algorithm is not a new idea
• However, it is our view that a better understanding of components analysis, and in particular, the value of analytic rotation, will lead to better applications in segmentation
• In this presentation, we will review the problem of segmenting consumers on the basis of their attitudes and demonstrate the utility of data-based components analysis in addressing this problem

Segmentation is a business decision made in order to have a more targeted and efficient marketing strategy

• Segmentation is often described as a compromise between individual and mass marketing
• Segmentation is a decision by the marketer to divide the overall market into two or more distinct segments
    • Each segment is different from the others (i.e., there is a different “need” to be met)
    • Members within the same segment are similar in ways that are important to the marketer
• Metrics for measuring similarity / distinctness depend on the specific business objective of the segmentation; some examples:
    • Interest in a new product
    • Geographic location
    • Attitude towards a specific social cause

Data-based components analysis is consistent with a parsimonious approach to the business problem addressed by segmentation

• One type of marketing problem is to define segments that will respond to different messaging approaches
• For example, one segment of consumers may be interested in fast food restaurants because they are affordable, while another segment may be interested in the convenience aspect of the same type of restaurant
• For differential messaging to various audiences, consumers may be segmented on the basis of responses to attitudinal statements
• Applying data-based components analysis to a set of ratings, all on the same scale, seems a natural fit for the method
• By defining a few key components, a large data set can be condensed into a smaller set that is easier to understand
• With careful planning, it may be possible to have just a handful of components and define segments simply by examining means or medians on those components instead of using further clustering techniques

USING DATA-BASED COMPONENTS ANALYSIS FOR CONSUMER SEGMENTATION

We used DBCA to segment caretakers of young children on their attitudes about nutrition and food preparation

• We surveyed 310 individuals. To qualify for the study, they had to:
    • Have at least one child between the ages of 9 months and 10 years
    • Take an active role in purchasing and preparing food for these children
• Respondents were asked to characterize their level of agreement with 18 attitudinal statements (9 about nutrition, 9 about food preparation)
    • These questions were intended to serve as the basis for the segmentation
• We also included some demographic questions so that we could later characterize our segments. These included but were not limited to:
    • Gender
    • Age
    • Marital status
    • Employment status

We began the analysis by refining the basis variables for the segmentation

• Instead of using all 18 attitudinal statements, we examined the correlations among them and selected a subset of 6 statements:

  Organic food attitudes
    • Organic foods are healthier than non-organic foods
    • Organic foods are safer for my family than non-organic foods
    • Organic foods are generally of a higher quality than non-organic foods

  Food prep attitudes
    • I prefer serving ready-to-eat meals because there is less cleaning up to do afterwards
    • I often buy ready-to-serve meals because it frees up my time to do other things with my family
    • I find it easier to get my child / children to eat pre-prepared foods than homemade food

• These variables are strongly correlated with each other within their two groupings and weakly correlated across groupings
• Investing the initial effort in refining the segmentation basis results in a clearer and more stable solution that is not susceptible to quirks in segmentation methods

The first two components account for over two-thirds of the total variance in our subset of 6 attitudinal variables

  COMPONENT    CUMULATIVE PROPORTION OF TOTAL VARIANCE
  1            40%
  2            69%
  3            82%
  4            89%
  5            95%
  6            100%

• Variables were standardized to have means of 0 and standard deviations of 1 prior to the singular value decomposition of the data matrix
• Since statements were rated on the same scale, the variances could have been left alone
• In this data set, the variances were fairly uniform, so we chose to have component loadings that could be interpreted as correlations in orthogonally rotated solutions

Examining the composition of the variances for a two component solution shows generally good results

  COMMON VARIANCE   RESIDUAL VARIANCE   STATEMENT
  76%               24%                 Organic foods are healthier than non-organic foods
  79%               21%                 Organic foods are safer for my family than non-organic foods
  75%               25%                 Organic foods are generally of a higher quality than non-organic foods
  68%               32%                 I prefer serving ready-to-eat meals because there is less cleaning up to do afterwards
  74%               26%                 I often buy ready-to-serve meals because it frees up my time to do other things with my family
  44%               56%                 I find it easier to get my child / children to eat pre-prepared foods than homemade food

Organic food statements were fairly strongly correlated with each other while food preparation statements had weaker within-group correlations.

Unrotated principal components reflect the two groupings of the statements; however, the components are difficult to interpret

[Figure: plot of the unrotated principal components solution]

An orthogonal rotation towards an interpretable target yields very good results, thanks to the careful selection of the basis variables*

[Figure: plot of the orthogonally rotated solution]

*An oblique rotation makes the components slightly negatively correlated. We chose an orthogonal rotation for ease of presentation.
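
The decomposition and rotation steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the DBCA program itself: the data are synthetic, the function names are hypothetical, and the rotation shown is an orthogonal Procrustes rotation to a fully specified 0/1 target, which may differ from the target rotation used in DBCA. The final median split is likewise only one simple way of "examining medians" on the components.

```python
import numpy as np


def standardize(x):
    """Center each column and scale it to unit standard deviation."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def components(x, m):
    """Singular value decomposition of the data matrix; keep m components.

    Returns scores (n x m, unit variance), loadings (p x m), and the
    cumulative proportion of total variance over all components.
    """
    n = x.shape[0]
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = np.sqrt(n) * u[:, :m]
    # The singular values are assigned to the loadings side.
    loadings = vt[:m].T * s[:m] / np.sqrt(n)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    return scores, loadings, cumulative


def target_rotation(loadings, target):
    """Orthogonal Procrustes: T minimizing ||loadings @ T - target||."""
    p, _, qt = np.linalg.svd(loadings.T @ target)
    return p @ qt


# Synthetic stand-in for the survey (assumption): 310 respondents,
# 6 ratings that fall into two blocks of 3 statements each.
rng = np.random.default_rng(0)
n = 310
pattern = np.array(
    [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float
)
latent = rng.normal(size=(n, 2))
raw = latent @ (0.85 * pattern).T + 0.5 * rng.normal(size=(n, 6))

x = standardize(raw)
scores, loadings, cumulative = components(x, m=2)

# Double-sided rotation: the same orthogonal matrix is applied to both
# the loadings and the scores.
t = target_rotation(loadings, pattern)
rot_loadings = loadings @ t
rot_scores = scores @ t

# Common and residual variance of each standardized statement.
common = (rot_loadings**2).sum(axis=1)
residual = 1.0 - common

# Median split on each rotated component gives four segments (coded 0..3).
high = rot_scores > np.median(rot_scores, axis=0)
segment = 2 * high[:, 0].astype(int) + high[:, 1].astype(int)
```

Because the rotation matrix is orthogonal, the fitted values `scores @ loadings.T` and the total variance explained are the same before and after rotation. With standardized variables, the rotated loadings equal the correlations between the statements and the rotated components.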
The rotated component loadings are structured in a way that makes the components easy to interpret

  COMPONENT 1   COMPONENT 2   STATEMENT
   0.87         -0.08         Organic foods are healthier than non-organic foods
   0.89         -0.05         Organic foods are safer for my family than non-organic foods
   0.87         -0.01         Organic foods are generally of a higher quality than non-organic foods
  -0.04          0.82         I prefer serving ready-to-eat meals because there is less cleaning up to do afterwards
  -0.05          0.86         I often buy ready-to-serve meals because it frees up my time to do other things with my family
  -0.07          0.66         I find it easier to get my child / children to eat pre-prepared foods than homemade food

High scores on Component 1 are associated with positive attitudes towards organic foods; high scores on Component 2 are associated with positive attitudes towards prepared foods

We used median splits to define four segments based on the two optimal components after the orthogonal target rotation

[Figure: plot of rotated component scores with median splits defining the four segments]

Looking at the scores prior to rotation, it is clear that segment assignments would be very different with principal components

[Figure: plot of component scores prior to rotation]

The oft-used “bubble plot” is an attractive albeit misleading representation of segmentation solutions

[Figure: bubble plot of the four segments]

The four evenly sized segments reflect our choice of basis variables, which led to two orthogonal components

• In practice, we believe that thoughtful consideration of the business problem will usually yield a small, relevant basis for the segmentation
• In our opinion, traditional approaches to market segmentation rely too much on “the blender approach,” in which the analyst runs cluster or latent class analysis on a large number of variables and produces a wide variety of solutions
    • Getting caught up in adding color to the segments (“the solution is not useful if the segments are not different on X”) leads analysis away from the core business issues
    • The resulting solutions often reflect compromises that make the segmentation unimplementable
• Segmentation is not a scientific theory, but a business decision
    • Approaches to segmentation will always be algorithmic in nature
    • Data-based components analysis is a suitable tool for building such algorithms

PROFILING SEGMENTS & CLASSIFYING INDIVIDUALS

We analyzed demographic and behavioral data to profile the segments

  ORGANIC ON-YOUR-OWN (Q4)
    Description: Organic purchasers, Prepare food themselves
    Demographics: More likely to be older, female, liberal, in a suburban setting
    Shopping behavior: More likely to shop at organic stores and farmers markets
    Sample proportion: 23%

  YOU PROCESS, I PREP (Q3)
    Description: Non-organic purchasers, Prepare food themselves
    Demographics: More likely to be male, politically and geographically diverse
    Shopping behavior: Diverse shopping habits
    Sample proportion: 27%

  CONVENTIONAL CONVENIENCE (Q2)
    Description: Non-organic purchasers, Pre-prepared food users
    Demographics: More likely to be male, in a rural setting
    Shopping behavior: Less likely to shop at organic stores and farmers markets
    Sample proportion: 22%

  HEALTHY BUT HELPLESS (Q1)
    Description: Organic purchasers, Pre-prepared food users
    Demographics: More likely to be younger, female, liberal, in an urban setting
    Shopping behavior: More likely to shop at organic stores and farmers markets
    Sample proportion: 28%

Often, a classification tool is required in order to implement a segmentation and address the business issue that is at stake

• In segmentation studies that are aimed at uncovering subpopulations of high opportunity for a specific marketing situation, it is necessary to provide guidance on classifying individuals outside the study sample; for example:
    • Pre-classified lists of physicians that can be used by pharmaceutical sales representatives
    • Classifications of retail website visitors to customize the information presented to them
• In the study reported here, we did not set out with an obvious business problem, but rather hoped to develop a simplified, but realistic example data set
• We did include several questions in addition to the attitudinal statements that might be suitable for classifying individuals
• We examined the relationship of these questions to the segment classifications to investigate whether a classification tool might be developed

We found that there were seven different questions that were reasonably predictive of segment membership

1. What is the highest level of education you have completed? Some high school, High school / GED, Some college, Associates, Bachelors, Advanced degree
2. Which of the following best describes the community where you live? Urban, Suburban, Rural
3. How often do you buy organic food for your family? Regularly, Occasionally, Never
4. (If applicable) In the last year my purchasing of organic foods has: Increased, Stayed the same, Decreased
5. In a typical week I feed my child / children frozen food or pre-prepared food: Never, 1-3 times, 4-5 times, More than 5 times
6. In a typical week I eat out with my child / children: Never, 1-3 times, 4-5 times, More than 5 times
7. Do you shop for groceries at natural and organic grocery stores such as Whole Foods and Trader Joe’s?

Using a naïve Bayes classifier, we found that these seven questions are moderately effective at classifying respondents in the sample*

  Rows: PREDICTED SEGMENT. Columns: original segment.

  PREDICTED SEGMENT          ORGANIC       YOU PROCESS,   CONVENTIONAL   HEALTHY BUT
                             ON-YOUR-OWN   I PREP         CONVENIENCE    HELPLESS
  ORGANIC ON-YOUR-OWN        61%           23%            2%             15%
  YOU PROCESS, I PREP        14%           55%            16%            14%
  CONVENTIONAL CONVENIENCE   11%           23%            45%            21%
  HEALTHY BUT HELPLESS       16%           21%            16%            47%

• Each row in the above table shows the percent of respondents in each predicted segment that fall into each of the original segments
• Overall classification accuracy is 50%
• These questions perform significantly better than chance in classifying respondents: If no information is available, there is approximately a 25% chance of correctly identifying the segment to which a respondent belongs

*Cross-validation samples are very rare in market research practice.
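
To make the classification step concrete, here is a minimal sketch of a naïve Bayes classifier for categorical survey answers, written in plain Python. It is an illustration under stated assumptions rather than the tool used in the study: the class name, the toy answers, and the add-one (Laplace) smoothing are all hypothetical choices, and only two of the seven questions are mimicked. As in the table above, accuracy is computed on the same respondents used to fit the classifier.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical answers with add-one smoothing."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        n_questions = len(rows[0])
        # counts[c][j][a]: respondents in segment c answering a to question j
        self.counts = {
            c: [Counter() for _ in range(n_questions)] for c in self.classes
        }
        # levels[j]: distinct answers observed for question j
        self.levels = [set() for _ in range(n_questions)]
        for row, c in zip(rows, labels):
            for j, answer in enumerate(row):
                self.counts[c][j][answer] += 1
                self.levels[j].add(answer)
        return self

    def log_posterior(self, row):
        """Unnormalized log posterior of each segment for one respondent."""
        total = sum(self.class_counts.values())
        out = {}
        for c in self.classes:
            lp = math.log(self.class_counts[c] / total)
            for j, answer in enumerate(row):
                num = self.counts[c][j][answer] + 1
                den = self.class_counts[c] + len(self.levels[j])
                lp += math.log(num / den)
            out[c] = lp
        return out

    def predict(self, row):
        scores = self.log_posterior(row)
        return max(scores, key=scores.get)


# Hypothetical toy answers: (organic purchase frequency,
# frozen / pre-prepared meals per week). Not the study data.
rows = [
    ("Regularly", "Never"), ("Regularly", "1-3 times"),
    ("Regularly", "Never"),
    ("Never", "Never"), ("Occasionally", "Never"),
    ("Never", "1-3 times"),
    ("Never", "4-5 times"), ("Never", "More than 5 times"),
    ("Occasionally", "4-5 times"),
    ("Regularly", "4-5 times"), ("Regularly", "More than 5 times"),
    ("Occasionally", "4-5 times"),
]
labels = (
    ["Organic On-Your-Own"] * 3
    + ["You Process, I Prep"] * 3
    + ["Conventional Convenience"] * 3
    + ["Healthy But Helpless"] * 3
)

model = CategoricalNaiveBayes().fit(rows, labels)
predicted = [model.predict(r) for r in rows]
accuracy = sum(p == a for p, a in zip(predicted, labels)) / len(labels)

# Rows are predicted segments, columns are original segments.
confusion = defaultdict(Counter)
for p, a in zip(predicted, labels):
    confusion[p][a] += 1
```

Dividing each row of `confusion` by its row total gives percentages laid out like the table above. With four segments of roughly equal size, an uninformed guess would be right about 25% of the time, which is the baseline the slide compares against.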
A hierarchical approach to classification yields a little improvement in classification accuracy

  Rows: PREDICTED SEGMENT. Columns: original segment.

  PREDICTED SEGMENT          ORGANIC       YOU PROCESS,   CONVENTIONAL   HEALTHY BUT
                             ON-YOUR-OWN   I PREP         CONVENIENCE    HELPLESS
  ORGANIC ON-YOUR-OWN        55%           22%            3%             20%
  YOU PROCESS, I PREP        11%           57%            19%            13%
  CONVENTIONAL CONVENIENCE   11%           18%            48%            23%
  HEALTHY BUT HELPLESS       15%           19%            17%            49%

• The hierarchical classification tool uses the same seven questions as the naïve Bayes classifier
• However, classification is done in two steps (both also using naïve Bayes classifiers):
    1. Respondents are classified into either an Organic+ or an Organic− segment
    2. Within each of the two, respondents are further classified into one of the two pre-prepared segments (pre-prepared food users or those who prepare food themselves)
• Overall classification accuracy improves slightly to 52%

Questions & Discussion . . .

Biographies & Acknowledgements

Krishna Tateneni, Principal, Tateneni Consulting
Before starting his own consulting practice, Krishna worked as a Director in the Decision Modeling Group at Eidetics, a pharmaceutical marketing research and consulting firm in Boston. Prior to that he worked as the Lead Statistician for the College Board’s Advanced Placement Program at Educational Testing Service (ETS). He holds a Ph.D. in Quantitative Psychology from the Ohio State University.

Megan Schiller, Director, Tateneni Consulting
Megan has over ten years of experience in pharmaceutical market research. Prior to working with Tateneni Consulting she was a Senior Manager in the Client Service Group at Eidetics. She has experience in both quantitative and qualitative research techniques. She has a Bachelor of Science in Marketing from Boston College.

The authors thank Pam Litner, Schlesinger Associates Downtown Chicago, for her help in recruiting respondents for our study.
Schlesinger Associates www.schlesingerassociates.com

APPENDIX: DATA ANALYSIS BY SEGMENT

Each appendix slide shows one chart per segment (Conventional Convenience; Healthy But Helpless; You Process, I Prep; Organic On-Your-Own).

Gender
[Figure: share of Male and Female respondents in each segment]

Age
[Figure: share of respondents aged Less than 40 and 40 and over in each segment]

Location
[Figure: share of Urban, Suburban, and Rural respondents in each segment]

Political Leaning
[Figure: share of Conservative, Moderate, and Liberal respondents in each segment]

Frequency of Organic Food Purchase
[Figure: share of respondents answering Regularly, Occasionally, and Never in each segment]

Shopping Locations
[Figure: share of respondents shopping at Organic Stores, Specialty Foods, and Farmers Markets in each segment]

Frequency of Feeding Frozen / Pre-prepared Foods
[Figure: share of respondents answering Never, 1-3 times/wk, and 4+ times/wk in each segment]