Website vs. Traditional Survey Ratings
DO THEY TELL THE SAME STORY?
By D. Randall Brandt
Move over, marketing researchers, and make room
for social media. Today, a plethora of general and
industry-specific websites make it possible for consumers to read what other consumers have to say about everything
from hotels to hospitals, restaurants to repair services and caterers to car dealerships. These same sites also make it possible for
consumers to share their own experiences, often in the form of
ratings and open-ended comments very similar to those captured
via traditional service quality or customer satisfaction surveys.
Some argue that, as such websites proliferate, traditional
survey research will be less necessary because the type of data
typically furnished by surveys will be public and essentially “there
for the taking.” Others respond by saying that, while website data
are cheap and accessible, they also are not as representative or
projectable as traditional surveys and therefore should be treated
as a supplement to, rather than a substitute for, such surveys.
Consider an article that appeared on March 21, 2011 on
Advertising Age’s website (“Will Social Media Replace Surveys as
a Research Tool?”), which proclaims “the top research executive of likely the world’s biggest research buyer expects surveys
to dramatically decline in importance by 2020 and sees the rise
of social media as a big reason why.” Jack Neff, the author, was
apparently quite taken by remarks made by Joan Lewis, global
consumer and market knowledge officer at Procter & Gamble,
in a speech delivered at a 2011 Advertising Research Foundation
conference in New York. Lewis was quoted as saying that the
industry should get away from “believing that a [single] method, particularly survey research, will be the solution to anything,” and that “the more people see two-way engagement and being
able to interact with people all over the world, I think the less
they will want to be involved in structured research.”
Not surprisingly, reactions to the article were mixed. Readers
who agreed with Lewis offered comments such as, “monitoring social media absolutely is a path to powerful insights” and,
“social media listening is a more natural way of capturing the
voice of the customer in a non-contrived way.” Other readers
took an opposing view, making statements such as, “survey
research can never be fully replaced by social media research, just
as focus groups can never be replaced by surveys … every tool
has a specific strength that covers a specific weakness of another
tool,” and, “relying on social media to be the primary source of
research is like letting the squeaky wheel get all the grease.”
While they may be provocative, Lewis’ comments—and the
online commentary that ensued—can only take this burgeoning
debate so far. There is little doubt that social media channels are
going to be with us for a long time, and that they can furnish a
wealth of opportunities to listen, learn and respond to customers
who share their experiences online. But, can or will social media
replace surveys as a research tool? Can managers forgo investments in traditional surveys and, instead, rely on website ratings
and comments that are relatively cheap and readily available?
To answer these questions, we need research aimed at establishing the degree to which social media and traditional surveys
yield similar findings, where and how they are different, and
under what circumstances (if any) one should be preferred over the other.
Unfortunately, only a few relevant studies have been conducted, and the findings do not tell a clear or consistent story. Some
studies have focused on the extent to which social media and
survey data are correlated. For example, a study by Jamie Baker-Prewitt showed that social media “buzz” volume was positively
related to several measures of customer loyalty and brand equity
across 10 different brands. (“Listening Versus Asking: How Social Media Measures Relate to Consumer Attitudes and Purchase
Behaviors.” Paper presented at the CASRO Online Research
Conference, March 3-4, 2011.)
Other researchers have focused on the comparability of social
media and survey data. For instance, Christian Bourque, Rick
Hobbs and Danielle St. Hilaire compared comments obtained
from 1,200 completed surveys with more than 10,000 social
media entries. They found major differences both in the relative
incidence of topics, as well as the proportion of positive, negative and neutral comments, appearing in the two data sources.
Their study also revealed that product owners and users were
represented in the social media data to a much greater degree
than in the survey data, and that the two data sources were not
comparable with respect to the time frame and context of data
collection. These findings led the authors to conclude that, “data
provided by these two approaches are sufficiently different from
each other that both are likely necessary.” (See “Apples and Oranges,” Marketing Research, Fall 2011, 9-13.)
Gina Pingitore conducted two studies, one in the electric utility industry and the other in financial services, in which survey
data from traditional online panelists were compared to those
obtained from real time data collection (RTD) or “rivering” respondents. Results revealed that both incidence and completion
rates were higher among traditional panelists, but that the two
sample sources were comparable demographically. Pingitore also
found that, while RTD and traditional panel sources produced
comparable key driver analysis results, performance ratings and
overall satisfaction indices generally were lower among RTD
respondents. (See “Results from Real Time Data Collection
versus Data from Traditional Panelists: Is It Valid to Combine
Data from These Two Sources?” Paper presented at the CASRO
Online Research Conference, March 3-4, 2011.)
In a study published at www.hotelmanagement.net on July 8,
2011 (“Study Shows TripAdvisor Is a Reliable Review Source”),
Robert Honeycutt and Jonathan Barsky compared ratings from
TripAdvisor to those from a hotel customer satisfaction panel.
They found that these two data sources provided similar results
for some hotel properties, but conflicting results for others. In addition, there was much more variability (i.e., extremely favorable
or unfavorable ratings) in the TripAdvisor data.
From these studies it seems clear that: (a) no solid conclusions
can be drawn regarding the comparability of social media and
traditional survey data and (b) additional research is needed.
With this in mind, we developed a study designed to address three
questions:
• Do website ratings tell the same story as ratings captured
through traditional survey methods?
• What proportion of consumers who have an opportunity to
share their experiences via website ratings actually do so?
• How similar are “posters” to “non-posters,” and do ratings
from posters tell the same story as ratings from non-posters?
The Research
Addressing our first research question requires capturing ratings
on a common set of items using a common rating scale. Also, it
requires collecting these ratings from two sources: (a) a website
sample of customers who have taken the initiative to share their
experiences online; and (b) a sample of customers who may or
may not have chosen to take their stories to a website, but who
are willing to share evaluations of their experiences when solicited via a traditional survey approach.
In this study, we obtained Web-based and traditional survey-based ratings from guests who had stayed at one of 38 properties of a leading upscale hotel brand during August 2011. To
build the Web-based sample, all ratings posted on TripAdvisor
for the 38 properties in question were “scraped” and assembled
for analysis. This yielded a total of 369 completed TripAdvisor
ratings. The smallest number of completed ratings for a single
property was two, and the largest was 25.
To build the traditional sample, we were able to obtain a list
of all guests who had stayed at one of the 38 properties during
August 2011. Valid e-mail addresses were available for most of
these guests and were used to make initial contact and invite
them to visit a website where they would be asked to provide
ratings on a small set of items, followed by a few additional
questions. At an 18% response rate, this yielded a total of 1,586
completed ratings. The smallest number of completed ratings for
a single property was 19, and the largest was 62.
Guests in both the TripAdvisor and traditional samples provided the five core ratings typically captured in a TripAdvisor
review: (1) overall hotel rating, (2) service, (3) value, (4) sleep
quality and (5) cleanliness. A five-point scale (excellent, very good, average, poor and terrible) was used to capture all of these ratings. In addition to (and after) furnishing these ratings,
customers in the traditional sample also were asked about:
• The purpose of the August hotel stay (business or leisure);
• The annual number of business and leisure trips;
• If they posted about the August stay and, if so, on which
website;
• If they posted about any hotel stay during the past 12 months and, if so, on which website;
• If they used a website to plan their August stay and, if so, which website;
• If they used a website to plan any stay during the past 12 months and, if so, which website;
• Gender; and
• Age.
These additional items enabled us to address the remaining research questions: What proportion of “qualified” guests actually take the time to share their experiences in the form of a website review? How similar are posters to non-posters? Do ratings from posters tell the same story as ratings from non-posters?

FIGURE 1: Difference in Conclusions Drawn from TripAdvisor® versus a Traditional Survey Sample of Ratings
[Two panels chart importance and performance percentages for Service, Value, Cleanliness and Sleep Quality, one panel per sample.] TripAdvisor® results indicate that (a) “Value” has the highest relative importance of any customer experience category, and (b) the hotel receives its least favorable ratings on this category. Traditional survey results indicate that (a) “Service” has the highest relative importance of any customer experience category, and (b) the hotel receives its most favorable ratings on this category.
NOTE: In the case of Importance, percentages represent the proportion of total variance in the overall rating explained by each customer experience category. In the case of Performance, percentages represent the proportion of guests giving the hotel an “excellent” rating on each category.
The Results
For most of the 38 hotel properties studied, the number of
TripAdvisor and/or traditional survey ratings available for
analysis at the individual property level was inadequate. Therefore, our results are based upon analysis of data that were aggregated across the 38 properties.
Website-based ratings tell a different story than traditional
survey-based ratings. Two methods were used to compare
TripAdvisor with traditional ratings. Method 1 consisted of
drawing a proportionate random sample of 369 of the 1,586
guests who completed the traditional survey. This was done to
create a sample of traditional ratings that had the same total
number of observations as the TripAdvisor sample with a comparable number of guests per property. Ratings from these guests
were compared with those of the 369 guests who posted ratings
on TripAdvisor. Without exception, both mean and top-box (i.e.,
percent “excellent”) scores from TripAdvisor were lower than
the same scores obtained from the traditional sample, and the
differences were statistically significant at the 95% confidence
level for all comparisons except the top-box scores on the overall
hotel rating.
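For readers who want to apply this kind of comparison to their own data, the sketch below (Python, using pandas and SciPy) illustrates the logic of Method 1: draw a property-proportionate subsample of the traditional ratings, then test the differences in means and top-box shares. The data frames, column names and the specific significance tests shown are illustrative assumptions, not the code or variables used in the study itself.

import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical inputs: one row per completed rating, with a 'property' column
# and ratings coded 1 (terrible) through 5 (excellent).
# trip_df: 369 TripAdvisor ratings; trad_df: 1,586 traditional survey ratings.

def proportionate_sample(trad_df, trip_df, seed=42):
    """Draw a traditional-survey subsample whose per-property counts mirror
    the TripAdvisor sample (the logic of Method 1)."""
    rng = np.random.default_rng(seed)
    pieces = []
    for prop, n_trip in trip_df["property"].value_counts().items():
        pool = trad_df[trad_df["property"] == prop]
        n = min(n_trip, len(pool))  # guard against small property cells
        pieces.append(pool.sample(n=n, random_state=int(rng.integers(1_000_000_000))))
    return pd.concat(pieces, ignore_index=True)

def compare(trip_ratings, trad_ratings):
    """Welch t-test on mean scores and a two-proportion z-test on top-box shares."""
    a, b = np.asarray(trip_ratings), np.asarray(trad_ratings)
    _, p_mean = stats.ttest_ind(a, b, equal_var=False)
    top_a, top_b = (a == 5).mean(), (b == 5).mean()
    pooled = ((a == 5).sum() + (b == 5).sum()) / (len(a) + len(b))
    se = np.sqrt(pooled * (1 - pooled) * (1 / len(a) + 1 / len(b)))
    p_top = 2 * stats.norm.sf(abs(top_a - top_b) / se)
    return {"p_mean_diff": p_mean, "p_topbox_diff": p_top}

# Usage: compare(trip_df["overall"], proportionate_sample(trad_df, trip_df)["overall"])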
Recognizing that results from the preceding approach could
reflect the luck of the draw, Method 2 involved bootstrapping
both TripAdvisor and traditional samples. One thousand “subsamples” were drawn randomly (with replacement) in order to
estimate means, top-box scores and 95% confidence intervals for
both TripAdvisor and traditional ratings. Method 2 produced
essentially the same results as Method 1: On all five items, both
mean and top-box scores from TripAdvisor were lower than the
same scores obtained from the traditional sample, and the differences were statistically significant for all comparisons except the
top-box scores on the overall hotel rating.
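A minimal sketch of the Method 2 bootstrap follows, assuming the same hypothetical data frames as above. The 1,000 resamples and 95% intervals match the description in the text; the percentile-interval construction and function names are our own illustrative choices.

import numpy as np

def bootstrap_ci(ratings, stat_fn, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample ratings with replacement and return the
    point estimate plus a (1 - alpha) confidence interval for stat_fn."""
    rng = np.random.default_rng(seed)
    x = np.asarray(ratings)
    boot = np.array([stat_fn(rng.choice(x, size=len(x), replace=True))
                     for _ in range(n_boot)])
    lo, hi = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return stat_fn(x), (lo, hi)

mean_score = lambda x: x.mean()
top_box = lambda x: (x == 5).mean()  # share of "excellent" ratings

# Non-overlapping intervals from, e.g., bootstrap_ci(trip_df["overall"], top_box)
# versus bootstrap_ci(trad_df["overall"], top_box) correspond to the significant
# differences reported above.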
We actually employed a third approach in which the data
from all 1,586 guests in the traditional survey sample were compared to the data from the 369 TripAdvisor raters. A weighting
procedure was applied to the traditional survey sample in order
to achieve desired proportionality across properties. This produced essentially the same results obtained from the proportionate random sampling and bootstrapping methods. Incidentally,
and in contrast to results reported by Honeycutt and Barsky
(2011), the range and variance in TripAdvisor and traditional
survey-based performance ratings were comparable.
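The weighting procedure is not spelled out in the article. One simple possibility, sketched here purely as an assumption, is post-stratification by property: weight each traditional-survey respondent by the ratio of the TripAdvisor sample’s property share to the traditional sample’s property share.

import numpy as np

def property_weights(trad_df, trip_df):
    """Assumed post-stratification weights: make the weighted property mix of
    the traditional sample match the TripAdvisor sample's mix."""
    trad_share = trad_df["property"].value_counts(normalize=True)
    trip_share = trip_df["property"].value_counts(normalize=True)
    ratio = (trip_share / trad_share).fillna(0.0)
    w = ratio.reindex(trad_df["property"]).to_numpy()
    return w / w.mean()  # normalize weights so they average to 1

# Weighted comparison, e.g.:
# np.average(trad_df["overall"], weights=property_weights(trad_df, trip_df))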
Results from both methods of comparison suggest that, as
they pertain to performance, Web-based ratings may paint a
less favorable picture of the customer experience than ratings
obtained via a more traditional survey approach. Can the same
be said with regard to the importance of different customer
experience categories or elements? To address this question, we
performed a regression analysis using the overall hotel rating
as the criterion variable and the remaining four ratings (value,
service, cleanliness and sleep quality) as predictors.
Executive Summary
Do website ratings tell the same story as ratings captured
through traditional survey methods? What proportion of
consumers who have an opportunity to share experiences
via website ratings actually do so? How similar are “posters” to “non-posters,” and do ratings from posters tell
the same story as ratings from non-posters? This article
addresses these questions and offers some recommendations for getting the most out of both Web-based and more
traditional surveys.
An “averaging-over-orderings” regression method was used
to perform key driver analysis. Essentially, this method involves
performing multiple stepwise regressions in which all possible
orders of entering the predictor variables are examined. The relative importance of any given predictor is calculated as the mean
of its regression coefficients resulting from all possible orders of
entry. For a detailed description of this method, see Kruskal, W.
(1987), “Relative Importance by Averaging Over Orderings,” The American Statistician, 41, 6-10; Theil, H. and C. Chung (1988), “Information-Theoretic Measures of Fit for Univariate and Multivariate Linear Regressions,” The American Statistician, 42, 249-252; and Soofi, E. S., J.J. Retzer and M. Yasai-Ardekani (2000),
“A Framework for Measuring the Importance of Variables with
Applications to Management Research and Decision Models,”
Decision Sciences Journal, 31, 596-625.
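Kruskal’s averaging-over-orderings idea can be illustrated with a short, brute-force sketch: for every ordering of the four predictors, record the increase in R² each predictor contributes at the point it enters, then average those increments across all orderings. The study’s exact implementation is not published; the incremental-R² formulation below is the standard one and is consistent with the variance-explained percentages reported in Figure 1, but it should be read as an assumption, not the authors’ code.

from itertools import permutations
import numpy as np
from sklearn.linear_model import LinearRegression

def r_squared(X, y, cols):
    """R-squared of a linear regression of y on the listed columns of X."""
    if not cols:
        return 0.0
    model = LinearRegression().fit(X[:, cols], y)
    return model.score(X[:, cols], y)

def importance_by_orderings(X, y):
    """Average each predictor's incremental R-squared over all orders of entry
    (Kruskal 1987). The importances sum to the full-model R-squared."""
    k = X.shape[1]
    importances = np.zeros(k)
    orders = list(permutations(range(k)))
    for order in orders:
        entered, r2_prev = [], 0.0
        for j in order:
            r2_new = r_squared(X, y, entered + [j])
            importances[j] += r2_new - r2_prev
            entered, r2_prev = entered + [j], r2_new
    return importances / len(orders)

# X columns: service, value, cleanliness, sleep quality; y: overall hotel rating.
# With four predictors there are only 4! = 24 orderings, so brute force is cheap.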
A bootstrapping approach, similar to Method 2, was used to
develop importance estimates and their 95% confidence intervals
for both TripAdvisor and traditional samples. Results revealed
that TripAdvisor and traditional sample data differ with respect
to the order of importance of the four experience categories.
However, as the 95% confidence intervals overlap, the differences are not statistically significant.
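The confidence intervals for these importance estimates can be approximated by combining the two sketches above: resample whole respondent records with replacement and recompute the decomposition on each resample. The function below reuses importance_by_orderings from the previous sketch and is an illustrative assumption rather than the study’s actual procedure.

import numpy as np

def bootstrap_importance(X, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence intervals for averaging-over-orderings importances
    (reuses importance_by_orderings defined in the earlier sketch)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.array([importance_by_orderings(X[idx], y[idx])
                      for idx in (rng.integers(0, n, size=n) for _ in range(n_boot))])
    lo = np.percentile(draws, 100 * alpha / 2, axis=0)
    hi = np.percentile(draws, 100 * (1 - alpha / 2), axis=0)
    return importance_by_orderings(X, y), lo, hi

# Overlapping intervals across the TripAdvisor and traditional samples indicate
# that differences in the rank order of importance are not statistically significant.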
Still, if only one or the other source is used, the TripAdvisor
and traditional survey data do not tell the same story. As Figure
1 illustrates, importance and performance scores
from the TripAdvisor data suggest that “value” is the most
important of the four customer experience categories, and the
one on which the hotel brand performs poorest across the 38
properties examined. In contrast, importance and performance
scores from the traditional sample data suggest that “service” is
the most important category, and the one on which the brand
receives its most favorable performance ratings.
Clearly, relying on only one of these two data sources would
lead managers to draw very different conclusions regarding
customer experience category importance and performance. The
question is: Which of the two sources is more accurate? To answer
this question, we have to understand a bit more about who is
represented in website versus traditional survey samples.
Posters comprise a very small percentage of all guests. Of the
1,586 guests who comprised the traditional sample in this study,
only 3% reported having posted a review about their August
hotel stay on TripAdvisor or a similar website. Furthermore,
when asked if they had posted a review regarding any hotel stay
during the past 12 months, only 12% said “yes.” Clearly, the
bulk of guests in the traditional sample have not made use of
social media for the purpose of sharing their experiences with
others, and therefore are not represented in data obtained from
those social media channels.
Posters are behaviorally and demographically different from
non-posters. In some respects, posters and non-posters are quite
similar. For example, 52% of posters and 53% of non-posters
said the purpose of their August 2011 trip was business. Also,
when asked to estimate how many business versus leisure trips
they make annually, responses of posters and non-posters were
nearly identical:
• Mean annual business trips were 17.8 among posters, compared to 17.1 among non-posters.
• Mean annual leisure trips were 5.5 among posters, compared to 5.2 among non-posters.
However, posters differed significantly from non-posters in at
least three ways:
1. Posters are more likely than non-posters to have researched
website reviews/ratings while planning a trip during the past 12
months, and, as a rule, they do so more frequently:
• 97% of posters reported having checked a website while planning a trip during the past 12 months, compared to just 47% of non-posters.
• 64% of posters reported having checked a website five or more times while planning a trip during the past 12 months, compared to just 37% of non-posters.
2. Posters are more likely to be male:
• 67% of posters are male compared to 52% among non-posters.
3. Posters are more likely to be Gen-Xers than non-posters:
• 48% of posters were born between 1964 and 1984, compared to 34% of non-posters.
Generally speaking, whether sharing their own experiences
or reading those of others, posters are more engaged with social
media than non-posters. Also, they are demographically different
from non-posters in at least two ways.
Ratings obtained from posters tell a different story than
ratings obtained from non-posters. The two methods used to
compare TripAdvisor with traditional ratings were also used
to compare posters with non-posters. The results produced essentially the same findings: Without exception, both mean and
top-box scores from posters were lower than the same scores
obtained from non-posters, and the differences were statistically significant at the 95% confidence level for all top-box and
most mean scores. To evaluate relative importance of the four
customer experience categories, we used the same basic analytical approach used to compare TripAdvisor with traditional data.
Results revealed that posters and non-posters do differ with
respect to the order of importance of the four experience categories. However, as was the case with the TripAdvisor and traditional sample comparison, the differences are not statistically
significant, with the exception of “service,” which is significantly
more important among non-posters.
Keep Your Options Open
Website ratings do not appear to tell the same story as traditional
survey ratings. Web-based ratings are consistently less favorable
than traditional survey ratings, and, when used in conjunction
with estimates of the relative importance of alternative customer
experience categories, lead to different conclusions.
A very small minority of hotel guests post on websites like
TripAdvisor. They are similar to non-posters with respect to
purpose and frequency of travel, but different with respect to
age and gender, and in terms of their overall level of engagement
with social media. Ratings from posters do not appear to tell the
same story as ratings obtained from non-posters. Ratings from
posters are consistently less favorable than ratings from non-posters and often lead to different conclusions.
Admittedly, these findings are based on a single study in one
industry. Furthermore, only one website among many that are
available in the hotel industry was examined. Thus, none but
tentative conclusions can be drawn regarding the comparability
of Web-based and traditional survey-based customer feedback.
That said, the results of our research reveal clear differences
between the two data sources. Furthermore, the results suggest
posters are not representative of most customers, and that their
feedback is different from non-posters.
In light of these findings, one would be hard-pressed to defend
a recommendation that Web-based ratings may be used as a
replacement for more traditional survey ratings. Also, while
not directly examined in the research reported here, additional
limitations of Web-based data make it difficult to defend such a
recommendation.
• Sample sizes, particularly at the individual property or unit
level, are frequently too small to support statistical analysis and
inference. This is an especially critical consideration when incentives and/or compensation of managers and employees at the
unit level are in play.
• Unlike traditional surveys, observations obtained from
Web-based ratings may not be independent. To the extent that a
customer reads and is influenced by ratings of others, his or her
own ratings may change, distorting the experiential story he or
she might otherwise have shared and, thus, introducing a source
of bias into Web-based data.
• Relying on website ratings that are captured and controlled
by a third party means that managers are forced to live with the
categories and rating scales that are in place. Categories such as
“value” and “service” lack the granularity typically needed by
managers to make ratings actionable. Also, there is little opportunity for customization or exploration of critical ad hoc issues.
This is not to say that traditional surveys don’t have limitations. For example, declining response rates have been a concern for marketing researchers for many years now. In our own research, only an 18% response rate was achieved using a traditional survey approach. Even though this produced a sample more than four times the size of the TripAdvisor sample, the traditional survey approach still failed to capture feedback from 82% of guests who were “qualified” to share their experiences. This begs the question, “What are we leaving on the table when we fail to hear from these guests?”
Simply put, no method of capturing the voice of the customer
(VoC) is perfect. Each has its relative strengths and limitations.
At least one consequence is that alternative methods and data
sources may tell different—sometimes conflicting—stories. Managers who rely on any single source do so at their own peril.
So, what should marketing researchers and managers do?
• Rather than relying on traditional surveys or social media,
capture and leverage data from both sources, along with other
VoC listening posts.
• Select and apply specific VoC methods and data sources
based on the appropriateness of each for addressing specific
managerial questions and information needs.
• Develop and implement a process of VoC integration that
incorporates a uniform set of customer experience categories,
along with analytical techniques that make multiple data sources
work together.
• Draw conclusions and support for decisions and actions
based upon convergence of findings from multiple data sources.
Capturing and integrating VoC via multiple methods is increasingly being touted as a best practice. Results of the research
described in this article support such an approach and, at the
very least, should discourage managers from relying exclusively
on any single VoC data source.
Perhaps the world in 2020 will have changed enough to force
us to reconsider this conclusion—but for now, using social media
to complement, rather than replace, traditional surveys seems the
prudent way to go. MR
Note: Special thanks go to Jenny Anderson, Mary Barnidge, Laura
Nicklin, Liz Reyer, William Ryback, Kurt Salmela and Jianping Yuan,
without whose efforts this research would not have been possible.
D. RANDALL BRANDT
is senior vice president, customer experience
management at Maritz Research. He may be reached at
[email protected].