Disconfirmation Effect on Online Rating Behavior: A Dynamic Analysis

Abstract

This research studies the effect of "disconfirmation," the discrepancy between the expected and experienced quality of the same product, on the behavior of consumers leaving product reviews in an online setting. Specifically, we posit that an individual's pre-purchase expectation is driven by the product ratings she observes and by the credibility of the review system she perceives at the time of purchase. Upon product consumption, the individual obtains a post-purchase evaluation and uses this information to confirm or disconfirm her expectation. We extend the Bayesian learning framework to model an individual's perception of system credibility as a function of how closely her ex-ante expectation matches her ex-post evaluation. Incorporating this dynamic perception into raters' decisions allows us to better understand how likely consumers are to share product experiences over time. We estimate the proposed model using a rich data set comprising the complete purchasing and rating activities of 361 individuals on an e-commerce website. Our results show that consumer participation in leaving product reviews is subject to the disconfirmation effect: the further an individual's post-purchase evaluation deviates from her pre-purchase expectation, the more likely she is to express her own opinions about the product. Our results also show that online consumers tend to be less active in posting reviews as they perceive the system to be more precise over time. From simulations, we highlight 1) the dissimilar effects of disconfirmation and experienced quality on consumer rating behavior; 2) various evolutionary patterns of product ratings over time; and 3) the effect of fake, inflated ratings on subsequent rating entries.

Keywords: product reviews, disconfirmation, Bayesian learning, word of mouth, user-generated content

1. Introduction

It has been well recognized that online customer reviews and ratings [Footnote 1: Following the literature, by reviews we mean a general format of consumer evaluation, whereas by ratings we specifically refer to numeric values indicating overall satisfaction.] have a substantial impact on consumers' purchasing decisions. According to surveys, 82% of respondents agree that online reviews directly influence their decisions on what product to buy (Deloitte 2007), and over 75% consider recommendations from those who have experienced the product the most credible information source (Nielsen 2007). By observing the usage experience shared by peer consumers, prospective buyers can reduce product uncertainty and make more informed purchases, leading to higher satisfaction and lower merchandise return rates (PowerReviews 2010). As a result, the prevalence of online reviews enhances information transparency and improves the efficiency of the digital marketplace.

Recently, there has been growing evidence that companies manipulate online customer reviews for targeted products and services (Gormley 2013). Some boost the image of their own brands or merchandise by posting deceptively positive evaluations, whereas others undermine that of competitors by fabricating negative usage experiences, or both. Such unethical practices cast serious doubt on the trustworthiness of customer-reported ratings and jeopardize the credibility of the review system perceived by consumers (Johnston 2014). In this research, we empirically study how the behavior of consumers leaving online product reviews is affected by disconfirmation, the discrepancy between the pre-purchase expectation and the post-purchase evaluation of the same product.
Our proposed model is developed from two novel aspects. First, we examine online rating behavior by looking at the activities involved throughout the entire purchase process. An online review platform is an information system that facilitates information exchange among its users. System users can be categorized into two groups based on how they interact with the system. The first user type is the review reader (information receiver), who gathers information about others' usage experience stored in various formats, such as numeric ratings, textual content, and multimedia files. After consuming the product herself, a review reader has an opportunity to become a review poster (information provider), the second role a user can play, by sharing her own assessment of the products. It has been shown that when making rating decisions, an individual tends to first observe the opinions expressed by others and then adjust her own feedback accordingly (Moe and Schweidel 2012; Schlosser 2005). In terms of timing, prior research assumes that previously posted ratings affect a focal individual only when she faces a rating decision in the post-purchase stage, overlooking the important fact that review posters are often review readers. Motivated by the dual roles a consumer can play, we posit that the influence of existing ratings may also take place when the focal individual accesses review information in the pre-purchase stage.

The second novelty of this research is that we take into account that individuals may not perceive the review system to be fully trustworthy. Specifically, we consider that an individual holds a perception of system credibility, a subjective attitude reflecting how credible the system appears to her. In the pre-purchase stage, a consumer formulates her pre-purchase expectation of the product based on the posted ratings, taking into account her own perceived system credibility. Upon product consumption, she obtains the post-purchase evaluation of the product and encounters disconfirmation. The realizations of disconfirmation are then used by the consumer to update her perception of system credibility over time. Such a consumer response is especially realistic after the outbreak of firms' strategic manipulation in online rating environments.

Unlike extant work that studies review incidence from a static point of view, this research examines the evolution of individual rating behavior from a dynamic perspective. We extend the Bayesian learning framework, flexibly allowing the perception of system credibility to change over time and dynamically incorporating this time-variant perception into raters' decision-making process. The proposed model is applied to a unique data set consisting of the complete purchase history and review entries that took place on an e-commerce website. Utilizing these complete activity logs, we can exactly observe whether a consumer leaves a review for a product she has purchased earlier on. In addition, the length of our panel enables us to calibrate how consumer rating behavior changes over time. To the best of our knowledge, this is the first research that looks at the evolution of consumer participation in giving online reviews at the micro level.
Our estimation results show that consumer review-posting behavior is, at least partially, driven by quality disconfirmation, perhaps due to an intention to correct the overall tone established by peer consumers. This finding provides an additional explanation for the J-shaped distribution of online ratings commonly observed across platforms. Moreover, the disconfirmation effect on review contribution is further moderated by the rating environment to which a focal individual is exposed. The magnitude of the effect is weakened by the level of dissension among previous posters; however, it is intensified by the total number of submitted reviews. Our results also reveal that consumers tend to be less active in sharing their opinions as they perceive the review system to be more credible over time. This finding raises a potential free-riding issue of which platform owners should be aware.

To highlight the significance of the disconfirmation effect identified in this paper, we perform a series of simulations to better understand consumer rating behavior and the evolution of online product ratings. First, we recover the missing ratings for purchase occasions that do not have a follow-up review submission from the buyer. Our simulation results show that, at the population level, disappointed consumers (i.e., those who encounter negative disconfirmation) are more vocal. The second simulation presents various evolutionary patterns of online product ratings, all of which indicate that the average of posted ratings will converge to the true quality in the long run. While other researchers have explained why average product ratings decline over time (Li and Hitt 2008; Moe and Schweidel 2012), we believe that the disconfirmation effect provides an alternative explanation, since it can also account for other patterns commonly observed in real life. Our third simulation shows that fake, inflated reviews may strongly induce low ratings from customers who suffer from negative disconfirmation.

The rest of this paper is organized as follows. In Section 2, we briefly review the relevant literature and differentiate our work from others. Section 3 describes the data and provides model-free evidence regarding consumer rating behavior. Our empirical model is specified in detail in Section 4. Section 5 presents our estimation strategy, identification strategy, model fit and model comparison. In Section 6, we present our estimation results and discuss the associated insights. We also provide robustness checks and conduct a series of simulations to highlight the significance of our main findings. Concluding remarks and future research directions are provided in Section 7.

2. Literature Review

This work is related to the stream of literature studying why consumers engage in post-purchase word of mouth (WOM). Using survey data, researchers have investigated this question from a motivational point of view. In an offline setting, consumers' desire for altruism, product involvement and self-enhancement are the main factors leading to positive WOM, whereas consumers usually spread negative WOM for altruism, anxiety reduction and vengeance purposes (Sundaram et al. 1998). With a similar approach, Hennig-Thurau et al. (2004) conclude that social benefits, economic incentives, concern for others, and extraversion are the primary motivations for consumers sharing their product experiences on the Internet.
An extensive literature using quantitative methods has also been developed to identify what drives consumers to share their product or service experiences in the absence of monetary reward mechanisms. Dellarocas et al. (2004) examine rating behavior on an electronic trading platform. They find that such voluntary behavior is driven by the expectation of reciprocity, meaning that a trader evaluates her trading partner in order to solicit feedback from the other party. Shen et al. (2013) look at review-posting behavior from a strategic perspective. Using book review data from online book sellers, they argue that online reviewers are more prone to rate popular but less crowded books in order to gain attention and, at the same time, to reduce competition for attention. They also conclude that reviewers with high reputation costs are more likely to adopt an imitation strategy by posting ratings that conform to the consensus of a community.

Factors affecting the level of WOM have also been studied at the population level. Rather than adopting the common conception of the level of WOM activity as a monotonic function of customer satisfaction, Anderson (1998) discovers that the relationship between them exhibits a U-shaped pattern: customers are more likely to engage in WOM when they are either extremely satisfied or extremely dissatisfied with the products. Using data from a movie website, Dellarocas and Narayan (2006) also identify a similar association between observed rating density and perceived movie quality. Along this line, Dellarocas et al. (2010) further suggest that moviegoers are more prone to post reviews for the most or least popular movies as measured by box office revenues.

There is an emerging literature stream examining how existing ratings affect subsequent ones. In a lab setting, Schlosser (2005) identifies the "self-presentational" phenomenon in which a review poster strategically adjusts her product ratings downwards after observing negative opinions by others, in order to present herself as intelligent. She also finds that a consumer would make her opinions more balanced if the opinions from the crowd exhibit a high level of dissension. Li and Hitt (2008) argue that the predominant declining trend of book ratings can be attributed to consumers' "self-selection" bias, meaning that early buyers have higher perceived quality and therefore tend to give higher ratings than later buyers do. Using reviews posted on an online retailer of bath, fragrance and home products, Moe and Schweidel (2012) show that consumers' rating behavior is influenced by the rating environment to which they are exposed. In particular, they find that a consumer is more inclined to share her experience when the existing ratings have high valence and high volume. In addition, active posters are found to be more negative and to exhibit differentiation behavior, which is consistent with the self-presentational strategy suggested by Schlosser (2005). Lee et al. (2014) distinguish prior ratings by friends from those by strangers and investigate whether these two sets of ratings have distinct impacts on a focal individual's opinions. They find that friends' opinions always induce herding and that the presence of social networking dampens the impact of opinions from the crowd. Research studying consumer rating behavior naturally leads to another stream of literature, which examines whether customer reviews can represent true product quality.
This literature stream can be further classified into two categories, depending on whether publicly available reviews are entirely generated by consumers or partially manipulated by firms. Following the notion that online ratings are entirely truthful, Hu et al. (2006) discuss whether the mean of posted ratings can represent true product quality. In particular, they develop an analytical model assuming that a consumer posts a review only when her level of satisfaction is either above or below a "brag-and-moan" interval; she would otherwise be silent. Based on this assumption, they show that the average rating can serve as an unbiased signal if and only if the two bounds of the interval are equal or deviate symmetrically from the true quality. On the other hand, Dellarocas (2006) assumes that the observed ratings may not be fully trustworthy and could be strategically boosted by firms. He demonstrates that inflated reviews are still informative, since the firm producing high-quality products benefits the most from such manipulation.

In terms of research context and methodology, our work is most closely related to Moe and Schweidel (2012) (henceforth MS). We develop our model based on MS, which, in turn, is a generalization of Ying et al. (2006), who first propose that whether a product is rated should affect the analyst's prediction of that rating. Despite several similarities, this paper differs from MS in many aspects. First, our focus is to study how quality disconfirmation and perceived system credibility affect individual rating decisions, while that of MS is to examine whether online raters adjust their opinions according to the opinions expressed by others. Second, we extend a dynamic learning framework, flexibly allowing an individual's perception of system credibility to change over time and dynamically linking this perception to her rating decisions. In contrast, MS use a static model in which posting decisions are treated as independent and uncorrelated with each other. Finally, we are able to directly observe from our data set whether a consumer leaves a review for a product she has previously purchased, whereas the purchase data is missing in MS. Our lengthy panel also allows us to construct a richer model to investigate why consumer rating behavior changes over time.

This paper attempts to close a gap in the online review literature by studying the relationship between posted reviews and subsequent ones from a different perspective. Specifically, we postulate that the impact of existing ratings may occur when an individual accesses review information for purchasing decisions (in the pre-purchase stage), even before she faces the rating decision (in the post-purchase stage). We also consider that individuals may perceive the review system not to be fully trustworthy, by modeling how an individual's perception of system credibility evolves over time and how such perception affects her rating decisions in a dynamic fashion. While credibility has been examined in different applications, such as how it determines the persuasiveness of communication (Chaiken 1980), the way in which credibility impacts consumers' interaction with information systems (as review readers and posters) remains unstudied.

3. Data and Model-Free Evidence

The data for this study is provided by an online e-commerce website which sells a variety of electronics and household items.
The data set contains the complete purchase history and review entries made by 1,000 heavy users, each of whom purchases at least 10 products. [Footnote 2: We study heavy users mainly because they are overwhelmingly more impactful to the e-commerce website in terms of both revenue generation and review contribution.] The purchase record set consists of detailed order information such as the product name, price, handling time and order placement date. The rating data set records customer-reported reviews in a typical format, including a review title, a review body, the submission date, and an overall product rating on a discrete 5-star scale, with 5 being the best. The data spans from January 2007 to November 2011. During this period, the site had not gone through any policy or system design change that could influence consumers' rating behavior. Nor did it implement any monetary or reviewer-reputation mechanisms that could encourage posting. Contribution of product reviews is considered voluntary and self-driven.

Our data set is unique in two aspects. First, the complete logs of purchasing and rating activities inform us whether a consumer leaves a review for the product she has purchased earlier on. With this information, we can precisely discern posting behavior from lurking for each purchase occasion. Second, by leveraging the time stamps of all purchasing and rating activities, we are able to recover the rating information (characterized by the valence, volume and variance of posted ratings) available at any given time. It is also worth noting that our data was collected at the micro (individual) level. This nature distinguishes our work from others that commonly use review data at the aggregate (product) level.

To identify the experienced quality and consumer learning dynamics, we restrict our attention to consumers who post at least one review, resulting in a final data set comprising 361 panelists who made 37,209 purchase transactions and 2,257 review entries. [Footnote 3: The 361 selected individuals (36.1% of the entire sample) account for 36.5% of all revenue generated and 33.4% of all items purchased by the sampled population. Given these statistics, we believe the selected individuals are representative of the sampled consumers with respect to purchase habits.] We also collected the mean of all posted ratings for each product in December 2013. We will utilize this long-run mean rating to proxy the baseline product quality. [Footnote 4: We test the validity of this proxy in one of the robustness checks.]

Before introducing our empirical model, we briefly present some observations and address potential data issues. Table 1 reports the descriptive statistics of the variables in our final data set. At the population level, the review posting rate is nearly 6% and the mean of all observed ratings is 3.83. Some variables, such as price and handling time, exhibit a long-tail property (Figure 1a). To deal with this over-dispersion issue, we apply a logarithmic transformation to them. The log-transformed variables appear to be normally distributed (Figure 1b).

Table 1. Descriptive Statistics of Variables

Notation   Description                                                        Mean     St. Dev.   Min     Max
y_ijt      Dummy indicating if a purchase occasion leads to a review entry    0.061    0.239      0       1
z_ijt      Observed rating scores                                             3.826    1.200      1       5
p_ijt      Product prices                                                     50.208   200.170    0.440   13674
h_ijt      Order handling time (days)                                         2.328    3.674      0       142
R̄_jt       Valence of existing ratings                                        4.015    0.621      1       5
Vol_ijt    Volume of existing ratings                                         58.357   186.776    0       3587
Var_ijt    Variance of existing ratings                                       0.817    0.652      0       4

Figure 1. Over-dispersion vs. Logarithmic Transformation: (a) distribution of price; (b) distribution of log(price + 1).
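To make this data-construction step concrete, the sketch below shows one way to recover the rating environment (valence, volume, variance) as of each purchase time stamp. It is a minimal pandas illustration; the schema and column names (user_id, product_id, order_date, rating, submit_date) are our assumptions, not the site's actual data dictionary.

```python
import pandas as pd

# Hypothetical purchase and review logs (column names are illustrative).
purchases = pd.DataFrame({
    "user_id": [1, 2, 1],
    "product_id": ["A", "A", "B"],
    "order_date": pd.to_datetime(["2008-03-01", "2009-06-15", "2009-07-01"]),
})
reviews = pd.DataFrame({
    "product_id": ["A", "A", "B"],
    "rating": [5, 3, 4],
    "submit_date": pd.to_datetime(["2008-01-10", "2009-01-20", "2010-02-02"]),
})

def rating_environment(row):
    """Valence/volume/variance of ratings posted before this purchase."""
    prior = reviews[(reviews.product_id == row.product_id) &
                    (reviews.submit_date < row.order_date)]["rating"]
    return pd.Series({"valence": prior.mean(),          # NaN if no prior ratings
                      "volume": len(prior),
                      "variance": prior.var(ddof=0) if len(prior) else 0.0})

env = purchases.join(purchases.apply(rating_environment, axis=1))
print(env)
```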
The consumer rating behavior on the e-commerce website is shown in Figures 2 through 6. Figure 2 plots the frequency of the 5 discrete ratings. The distribution roughly follows a J shape, which is commonly found across various rating platforms (McGlohon et al. 2010). Such a unique shape is consistent with Anderson (1998), who argues that consumers are more prone to express opinions about products when they are either very satisfied (represented by a high score of 4 or 5) or very dissatisfied (represented by a low score of 1 or 2). To better understand the association between the rating left by a focal consumer (a focal rating) and the mean of ratings previously posted by others, we calculate the observed "rating discrepancy" (z_ijt − R̄_jt) for all rated occasions (y_ijt = 1). Figure 3 shows that most focal ratings do not deviate much from the overall tone established by the crowd. But, based on this distribution, can we conclude that online raters are more vocal when they concur with the consensus of their peers? This is the first question we attempt to answer in this paper.

Figure 2. Distribution of Posted Ratings (1 star: 7.2%; 2: 6.4%; 3: 19.9%; 4: 29.6%; 5: 36.9%)

Figure 3. Distribution of Rating Discrepancy

As discussed earlier, during the time span of our data, the e-commerce website did not implement any marketing campaign or system design change that could have altered consumer engagement in online WOM (eWOM). We should therefore expect rating behavior to be consistent over the time horizon. Indeed, Figure 4(a) shows that the monthly posting rate fluctuated in the early stage and stabilized later on. [Footnote 5: We present individuals' rating behavior since January 2008 because the number of samples is limited prior to this point in time. Also note that we exclude data points for 5 months in which no product review was posted.] From Figure 4(b), we also see that the monthly average rating discrepancy did not systematically move upward or downward either. Interestingly, if we plot the average posting rate evaluated at the level of the individual purchase occasion, [Footnote 6: We present the posting rate for the first 50 occasions since data points after the 50th occasion reveal a similar pattern.] it is evident that consumer participation in leaving reviews exhibits a declining trend (Figure 5). In other words, consumers become less active in expressing their own opinions as they engage in more purchase occasions on the site. To find potential explanations for this downward pattern, we also look at other posted outcomes evaluated at the same (by-occasion) level. Despite some random disturbance, the average rating discrepancy across individuals did not exhibit any noticeable trend (Figure 6a), and neither did the average posted ratings (Figure 6b). In this paper, we also intend to explore factors that potentially explain the decline in review incidence observed in the data.

Figure 4. Monthly Review Posting Rate and Average Rating Discrepancy

Figure 5. Review Posting Rate by Purchase Occasion

Figure 6. Individuals' Rating Behavior by Purchase Occasion
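As an illustration, the by-occasion posting rate underlying Figure 5 can be computed from such activity logs as follows; the column names are again hypothetical.

```python
import pandas as pd

# Minimal sketch: one row per purchase occasion with a 0/1 'posted' flag.
log = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2],
    "order_date": pd.to_datetime(
        ["2008-01-05", "2008-02-01", "2008-03-10",
         "2008-01-20", "2008-02-25", "2008-04-02"]),
    "posted": [1, 0, 0, 1, 1, 0],
})

# Rank each user's purchases chronologically to index occasions t = 1, 2, ...
log = log.sort_values(["user_id", "order_date"])
log["occasion"] = log.groupby("user_id").cumcount() + 1

# Average posting rate across individuals at each occasion index.
posting_rate = log.groupby("occasion")["posted"].mean()
print(posting_rate)
```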
4. Model

We develop a dynamic model to study the evolution of consumer rating behavior in an online setting. The general modeling context is as follows. At a given time, a consumer holds an idiosyncratic (prior) perception of the credibility of the review system. When a purchase occasion begins, the consumer first formulates a pre-purchase expectation about the product based on the rating signal she observes and the perceived system credibility she holds at that time. Upon product consumption, she obtains the post-purchase evaluation. Using one attitude to confirm or disconfirm the other, she encounters a certain degree of disconfirmation and then uses this private information to update her own (posterior) perception of system credibility. Driven by the disconfirmation and other factors, the consumer faces two rating decisions: first, whether to leave a product review; and, if she decides to do so, what rating to submit. The proposed model is presented in the following order: 1) formulation of the disconfirmation; 2) updating of perceived system credibility; 3) consumer decisions on leaving product ratings; and 4) the interdependence between the two rating decisions.

4.1. Formulation of Quality Disconfirmation

Pre-purchase Evaluation. Consider an individual i who is about to purchase product j during occasion t. In our research context, the e-commerce website displays the aggregate statistics of customer self-reported ratings (e.g., the arithmetic mean and the distribution of discrete ratings) at the top of a product landing page. We assume that the prospective buyer would (at least) access the aggregate rating information available to her before committing to the purchase. [Footnote 7: According to a survey conducted by Lightspeed Research (Leggatt 2011), 72% of online shoppers expect to find customer reviews available on the website they are shopping at, while 47% seek them out on company websites and 43% on price comparison sites. Therefore, we assume that one's expectation is primarily influenced by the mean ratings observed on the same site. Consumers' ex-ante expectation is also subject to textual product reviews (Ghose et al. 2012). However, we cannot observe how many textual reviews each prospective buyer reads before making a purchase decision. Therefore, we use the mean of posted ratings to measure the overall opinions shared by others.] While posted ratings seem to provide objective information about product quality, how credible the review system is perceived to be could be subjective and heterogeneous across individuals. As a result, we assume individual i's pre-purchase expectation, \hat{Q}_{ijt}, to follow a normal distribution:

\hat{Q}_{ijt} \sim N\left( \bar{R}_{jt} + \delta_{i,t-1},\ (T_{jt}\,\tau_{i,t-1})^{-1} \right),   (1)

where \bar{R}_{jt} and T_{jt} respectively denote the mean and the precision of all ratings for j posted prior to the time of purchase. [Footnote 8: The precision characterizes the level of disagreement among others. We use precision instead of variance because the former gives us a neater expression of belief updating.] These two components together characterize the rating signal available to individual i. We model the individual perception of system credibility as composed of two components: system biasedness (δ) and system precision (τ). In our model, the updating of perceived system credibility occurs when the disconfirmation is realized after product consumption. It should be clear that one's ex-ante expectation, formulated before the purchase, is influenced by the perception she updated during the previous occasion (hence the t−1 subscripts).

The system biasedness δ_i measures how biased the review system is perceived to be by individual i. When δ_i = 0, individual i perceives the system to be neutral, and therefore believes that \bar{R}_{jt} provides an unbiased signal of product quality. In this case, individual i formulates her expectation centered on \bar{R}_{jt}. When δ_i > 0 (δ_i < 0), she believes that \bar{R}_{jt} underrates (overrates) the product quality, and therefore adjusts the mean of her expectation upwards (downwards) to offset the system biasedness she perceives. The sign of δ_i can be thought of as the direction in which the mean of individual i's expectation deviates from the mean of posted ratings. The system precision τ_i measures how precise the system is perceived to be by i at a given occasion. When it is large (small), individual i perceives the review system to be precise (noisy), such that her expectation is tightly (loosely) centered on its mean.
Post-purchase Evaluation. One of the most challenging parts of our modeling task is to model the baseline quality of products. One approach is to assume the latent quality of all products follows a zero-mean random effect (Moe and Schweidel 2012). To best utilize the publicly available information, we instead use the long-term average rating, \bar{R}_j, as a proxy for the baseline product quality. [Footnote 9: The use of long-term average ratings as proxies for baseline product quality can be justified in two aspects. From the perspective of the volume of ratings, the mean and minimum values of the total number of posted ratings at the product level are 67 and 11, respectively. From the perspective of the time horizon, the time we collected our long-term rating data set (December 2013) is 2 years after the latest purchase occasion observed in our final data set (November 2011). It is reasonable to argue that these long-term average ratings are representative of opinions from a sufficiently large consumer base. We also perform a robustness check in Section 7.1, and the results provide supportive evidence that the long-term average ratings proxy the true product quality well.] We assume the post-purchase evaluation individual i acquires upon the consumption of j to be:

Q_{ij} = \bar{R}_j + \lambda_{i0},   (2)

where the parameter λ_{i0} allows for variation in product evaluation across individuals.

Disconfirmation. Having developed the expectation and the evaluation of the product, we are now able to formally define the disconfirmation. We model disconfirmation as how far individual i's ex-post evaluation deviates from her ex-ante expectation of the same product:

\Delta Q_{ijt} = Q_{ij} - \hat{Q}_{ijt}.   (3)

Plugging (1) and (2) into (3), we have:

\Delta Q_{ijt} \sim N\left( \overline{\Delta Q}_{ijt},\ (T_{jt}\,\tau_{i,t-1})^{-1} \right),   (4)

where

\overline{\Delta Q}_{ijt} = (\bar{R}_j + \lambda_{i0}) - (\bar{R}_{jt} + \delta_{i,t-1}).   (5)

A positive \Delta Q_{ijt} indicates that the experienced product quality is higher than the expectation, and we say that the consumer encounters a positive disconfirmation.
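A compact sketch of Equations (1)-(5): given the rating signal and the consumer's carried-over perception of system credibility, the disconfirmation signal is normal with the mean and precision computed below. The function and variable names are ours, and the inputs are illustrative placeholders.

```python
def disconfirmation(R_bar_j, R_bar_jt, T_jt, lam_i0, delta_prev, tau_prev):
    """Mean and precision of the disconfirmation signal, per Eqs. (4)-(5).

    R_bar_j   : long-run mean rating (baseline quality proxy)
    R_bar_jt  : mean of ratings posted before the purchase
    T_jt      : precision of those posted ratings
    lam_i0    : individual evaluation shifter in Eq. (2)
    delta_prev, tau_prev : perceived biasedness and precision from t-1
    """
    mean = (R_bar_j + lam_i0) - (R_bar_jt + delta_prev)   # Eq. (5)
    precision = T_jt * tau_prev                            # Eq. (4)
    return mean, precision

# Example: a consumer who expects slightly inflated ratings (delta > 0).
mean, prec = disconfirmation(R_bar_j=4.2, R_bar_jt=4.5, T_jt=2.0,
                             lam_i0=0.1, delta_prev=0.2, tau_prev=1.5)
print(mean, prec)  # negative mean -> negative disconfirmation on average
```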
4.2. Updating of the Perception of System Credibility

Next, we explain how the individual perception of system credibility evolves over time. We assume that, before experiencing the product, individual i has prior beliefs about δ_{i,t-1} | τ_{i,t-1} and τ_{i,t-1}, which jointly follow a normal-gamma distribution [Footnote 10: In the context of Bayesian learning, the disconfirmation defined here follows a normal model with unknown mean and unknown variance. The conjugate prior for this model is a normal-gamma joint distribution.]:

\delta_{i,t-1} \mid \tau_{i,t-1} \sim N\left( \mu_{i,t-1},\ (\gamma_{i,t-1}\,\tau_{i,t-1})^{-1} \right),   (6)
\tau_{i,t-1} \sim \mathrm{Gamma}\left( a_{i,t-1},\ b_{i,t-1} \right),   (7)

where τ_i is the precision parameter of the normal distribution, γ_i scales that precision, a_i is the shape parameter and b_i is the inverse scale parameter of the gamma distribution. After experiencing the product, the individual encounters a certain level of disconfirmation and uses this private information to update her beliefs about system credibility. As a result, the perceived system credibility changes over time as the consumer is involved in more purchase-consumption occasions. According to Bayes' rule, individual i's posterior beliefs after receiving one disconfirmation signal are given by (DeGroot 1970):

\delta_{it} \mid \tau_{it} \sim N\left( \mu_{it},\ (\gamma_{it}\,\tau_{it})^{-1} \right),   (8)
\tau_{it} \sim \mathrm{Gamma}\left( a_{it},\ b_{it} \right),   (9)

where

\mu_{it} = \mu_{i,t-1} + D_{jt} \frac{T_{jt}}{T_{jt} + \gamma_{i,t-1}} \left( \Delta Q_{ijt} - \mu_{i,t-1} \right),   (10)
\gamma_{it} = \gamma_{i,t-1} + T_{jt} D_{jt},   (11)
a_{it} = a_{i,t-1} + D_{jt}/2,   (12)
b_{it} = b_{i,t-1} + D_{jt} \frac{T_{jt}\,\gamma_{i,t-1} \left( \Delta Q_{ijt} - \mu_{i,t-1} \right)^2}{2\,(T_{jt} + \gamma_{i,t-1})},   (13)

and D_jt is a dummy variable indicating whether there is at least one rating posted for j at the time the t-th purchase is placed.

It is important at this point to explain how the perception of system credibility is updated. Consider a scenario in which individual i has prior beliefs about δ_{i,t-1} | τ_{i,t-1} and τ_{i,t-1}. Suppose that she observes the rating signal for j and purchases j during purchase occasion t. Upon product experience, she receives one signal of disconfirmation, ΔQ_ijt, and uses this private information to update her own beliefs. If there is no product rating available at the time of purchase (i.e., D_jt = 0), no belief updating occurs and the beliefs remain unchanged. If the rating signal is available (i.e., D_jt = 1), she jointly updates her beliefs about system biasedness and precision. According to (10), the posterior mean μ_it is the sum of the prior mean and the realized disconfirmation weighted by the fraction T_jt / (T_jt + γ_{i,t-1}). The signal precision T_jt and the prior precision γ_{i,t-1} can be interpreted as the strength of the disconfirmation signal she encounters and the strength of her prior belief about system biasedness, respectively. When T_jt is large, individual i perceives the posted ratings to be more precise and therefore updates μ_it by a relatively large amount, ceteris paribus. Similarly, the extent of updating of the scale parameter b_it is increasing in the magnitude of the realized disconfirmation and the precision of the rating signal. When T_jt is small, individual i anticipates the review signal to be noisy with a higher probability and therefore updates b_it by a relatively small amount, ceteris paribus.
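The conjugate updates (10)-(13) are straightforward to implement. The sketch below mirrors the equations as reconstructed above, including the gating by D_jt; it is an illustration of the updating rule, not the paper's estimation code.

```python
def update_credibility(mu, gamma, a, b, dQ, T_jt, D_jt):
    """One normal-gamma belief update, following Eqs. (10)-(13).

    (mu, gamma, a, b) parameterize the prior on (delta, tau); dQ is the
    realized disconfirmation, T_jt the rating-signal precision, and D_jt
    the dummy for whether any rating was posted at purchase time.
    """
    if not D_jt:                      # no rating signal -> no updating
        return mu, gamma, a, b
    w = T_jt / (T_jt + gamma)         # weight on the new signal, Eq. (10)
    mu_new = mu + w * (dQ - mu)
    gamma_new = gamma + T_jt          # Eq. (11)
    a_new = a + 0.5                   # Eq. (12)
    b_new = b + (T_jt * gamma * (dQ - mu) ** 2) / (2 * (T_jt + gamma))  # Eq. (13)
    return mu_new, gamma_new, a_new, b_new

# A large surprise (|dQ - mu| big) inflates b and hence lowers the perceived
# system precision Pre = a / b that enters the propensity model later.
state = (0.0, 0.1, 5.0, 8.9)
state = update_credibility(*state, dQ=-1.2, T_jt=2.0, D_jt=1)
print(state)
```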
What remains unspecified are the initial beliefs of each individual. The updating rule of the shape parameter indicates that the value of a_i0 measures the richness of individual i's initial learning experience. Since we observe the complete purchase history of all individuals in our data set, we fix a_i0 at a small number (provided a_i0 > 1), because consumers have a very limited amount of learning experience with respect to system credibility until they receive the first disconfirmation signal. [Footnote 11: The choice of a_i0 is subjective. We estimate the proposed model with a_i0 fixed at different values (2.5, 5, 10, and 20) and do not observe noticeable changes in the estimated parameters. Given this result, we report the estimation results with a_i0 = 5 since it gives the best model fit.] To allow for heterogeneity across individuals in their initial belief about system precision, we assume b_i0 ~ N( \bar{b}_0, σ_b^2 ), where \bar{b}_0 and σ_b^2 measure the mean effect and the dispersion of b_i0 across individuals, respectively. We further assume μ_i0 = 0. This is reasonable because, prior to the receipt of any disconfirmation signal, consumers may naturally perceive the review system to be unbiased, owing to their lack of purchase-consumption experience. Once they receive more disconfirmation signals, the perception of system biasedness is updated depending on whether the perceived rating information inflates or underestimates one's own product assessment. Finally, we fix γ_i0 = 0.1 to reflect consumers' uninformative initial belief about δ_i0.

The belief updating mechanism specified in our model is used to measure consumers' subjective perception of system credibility. In terms of the interpretation of the learning process, it is somewhat different from the traditional learning framework in which the unobserved knowledge (system biasedness and precision in our context) is fixed at a certain level and consumers learn more about it over time.

4.3. Consumer Decisions on Whether to Post and What to Rate

So far, we have presented a general model of how disconfirmation is derived, how the individual perception of system credibility is updated, and how these two constructs are linked to each other. In this section, we discuss how we model consumers' rating decisions.

Propensity Model. We posit that whether a consumer will post a review for the product she has previously purchased is mainly affected by three components: the post-purchase evaluation she holds, the disconfirmation she encounters, and the updated belief about system precision she carries. Given that beliefs are modeled in the form of distributions, we use the mean of the distribution to measure the individual perception. Since the belief about system precision follows a gamma distribution, individual i's expectation of the updated system precision, Pre_it, is given by:

\mathrm{Pre}_{it} = E[\tau_{it}] = a_{it} / b_{it}.   (14)

According to the belief updating rules in (12) and (13), the latent variable Pre_it becomes larger when the magnitude of disconfirmation is relatively small, meaning that the individual perceives the review system to be more precise. To investigate how this individual perception dynamically affects online rating behavior, we incorporate the latent construct Pre_it into the raters' decision-making process.

We model a rater's decision of whether to leave a product review as governed by a latent posting propensity. Specifically, individual i's posting propensity for product j associated with occasion t, Prop_ijt, is expressed as:

\mathrm{Prop}^*_{ijt} = \mathrm{Prop}_{ijt} + \varepsilon_{p,ijt} = \beta_{i0} + \beta_1 \overline{\Delta Q}_{ijt} + \beta_2 \overline{\Delta Q}_{ijt}^2 + \beta_3 \mathrm{Pre}_{it} + \beta_4 Q_{ij} + \beta_5 Q_{ij}^2 + \beta_{6:9} X_{jt} + \varepsilon_{p,ijt},   (15)

where β_i0 allows for heterogeneity in baseline propensity across individuals and can be interpreted as the individual-specific net utility derived from writing an online review. The covariate \overline{\Delta Q}_{ijt} is the mean of the disconfirmation given in (5), Pre_it is the time-variant precision construct specified in (14), and Q_ij is the post-purchase evaluation given in (2). We also add the quadratic terms \overline{\Delta Q}_{ijt}^2 and Q_{ij}^2 to capture possible nonlinear relationships associated with Prop_ijt. To control for other effects that could potentially impact individual posting decisions, we include four variables in X_jt: product price; shipping and handling time, which serves as a proxy for the service level received from the e-commerce site; and the volume and the statistical variance of posted ratings. [Footnote 12: We observe from our data that a typical consumer either posts a product rating within two weeks after the order has been placed or does not express her opinions at all. The rating environment (the volume, valence and variance of ratings) does not change noticeably during such a short period of time. Therefore, we use the review statistics observed at occasion t to characterize the rating environment to which individual i is exposed.] Finally, ε_p,ijt is an idiosyncratic error with a mean of 0.
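To illustrate how the latent constructs combine, the sketch below evaluates the propensity index of Eq. (15), with Pre_it from Eq. (14) supplied as an input, and applies the probit link introduced in Eq. (16) just below. The coefficient values are arbitrary placeholders, not estimates.

```python
from scipy.stats import norm

def posting_probability(beta, dQ_mean, pre, Q, X):
    """Posting probability implied by Eqs. (15)-(16): a probit over the
    latent propensity index. beta = (b_i0, b1..b5, four controls); X holds
    log price, log handling time, volume, and variance of posted ratings."""
    b_i0, b1, b2, b3, b4, b5, *b_controls = beta
    prop = (b_i0 + b1 * dQ_mean + b2 * dQ_mean ** 2 + b3 * pre
            + b4 * Q + b5 * Q ** 2
            + sum(bc * x for bc, x in zip(b_controls, X)))
    return norm.cdf(prop)            # Pr(y = 1), Eq. (16)

# Illustrative (not estimated) coefficients and covariates:
beta = (-1.5, -0.01, 0.03, -0.5, -0.3, 0.07, 0.3, -0.04, -0.01, 0.01)
print(posting_probability(beta, dQ_mean=-0.8, pre=0.6, Q=4.0,
                          X=(3.9, 0.7, 1.2, 0.8)))
```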
Since the outcome of the posting decision is binary (posting or lurking), we assume ε_p,ijt ~ N(0, 1) such that the resulting propensity model has a binary probit specification:

\Pr(y_{ijt} = 1) = \Pr(\mathrm{Prop}^*_{ijt} > 0) = \Phi(\mathrm{Prop}_{ijt}),   (16)

where y_ijt is an occasion-specific dummy indicating whether a purchase leads to a review entry. The parameters of main interest are β_1, β_2, and β_3. The estimates of β_1 and β_2 together inform us of the effect of disconfirmation on consumer participation in posting online reviews. The estimated parameter β_3 indicates whether the perceived system precision encourages (if β_3 > 0) or deters (if β_3 < 0) a consumer from expressing her own opinions about the product. A significant β_3 would suggest that the declining trend of the posting rate observed in our data (see Figure 5) may be partially explained by the change in perceived system precision over time. Moreover, it has been shown that online opinions are subject to a polarity effect: consumers with extreme opinions tend to be more vocal (Anderson 1998; Dellarocas and Narayan 2006). We will be able to confirm the existence of this effect if the signs of β_4 and β_5 are negative and positive, respectively.

Evaluation Model. Now suppose that individual i has decided to leave a product rating for j. We assume that i's decision of what score to give is governed by a latent rating evaluation, which is mainly driven by her product evaluation:

\mathrm{Eval}^*_{ijt} = \mathrm{Eval}_{ijt} + \varepsilon_{e,ijt} = Q_{ij} + \lambda_{1:4} X_{jt} + \varepsilon_{e,ijt},   (17)

where ε_e,ijt is a zero-mean random shock. While consumers are commonly advised to provide product-related feedback only, we often observe cases in which customers also complain in product reviews about the poor service they received from the merchant. The parameter λ_2 will indicate whether this anecdotal observation holds among consumers. Since the rating evaluation is continuous whereas the submitted ratings are discrete (1 to 5 in our context), we assume the relationship between them to follow:

z_{ijt} = 5 if κ_4 ≤ Eval*_{ijt} < κ_5,
z_{ijt} = 4 if κ_3 ≤ Eval*_{ijt} < κ_4,
z_{ijt} = 3 if κ_2 ≤ Eval*_{ijt} < κ_3,   (18)
z_{ijt} = 2 if κ_1 ≤ Eval*_{ijt} < κ_2,
z_{ijt} = 1 if κ_0 ≤ Eval*_{ijt} < κ_1,

where z_ijt denotes the submitted rating score and the κ's specify the evaluation-to-rating cutpoints. For identification purposes we set κ_0 = −∞, κ_1 = 0 and κ_5 = ∞ (Koop et al. 2007), leaving three cutpoints κ_2, κ_3 and κ_4 to be estimated.
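The cutpoint rule in Eq. (18) amounts to binning the latent evaluation. A minimal sketch follows, using the cutpoint posterior means reported in Section 6 (0.486, 1.406, 2.459) purely for illustration.

```python
import numpy as np

# kappa_0 = -inf, kappa_1 = 0, kappa_5 = +inf are fixed for identification;
# the interior cutpoints below are the values reported later in Table 4.
KAPPA = np.array([-np.inf, 0.0, 0.486, 1.406, 2.459, np.inf])

def star_rating(eval_star):
    """Translate a latent evaluation draw into a discrete 1-5 star score."""
    return int(np.searchsorted(KAPPA, eval_star, side="right"))  # bin index

print([star_rating(e) for e in (-0.3, 0.2, 1.0, 2.0, 3.1)])  # -> [1, 2, 3, 4, 5]
```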
4.4. Interdependence between the Two Rating Decisions

So far we have developed two separate models governing the individual decisions of whether to post and what to rate. However, the covariance matrix of the equation system has not yet been specified. Since the post-purchase evaluation, Q_ij, enters the two equations simultaneously, parameter estimates could be biased if the interdependence between the two decisions is not properly specified. As a result, we assume the two sets of error terms to follow a bivariate normal distribution:

\begin{pmatrix} \varepsilon_p \\ \varepsilon_e \end{pmatrix} \sim \mathrm{BVN}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right).   (19)

For identification purposes, we fix the standard deviations of ε_p and ε_e at 1 in order to obtain the binary probit and ordered probit specifications, respectively. The parameter ρ is the correlation coefficient to be estimated. [Footnote 13: Alternatively, one can compute the inverse Mills ratio (IMR) from the propensity model and plug ρ·IMR into (17).] Given this covariance structure and the translating cutpoints defined in (18), the probability of observing the joint event y_ijt = 1 and z_ijt = s is given by:

\Pr(y_{ijt} = 1, z_{ijt} = s) = \Phi_2\left( \kappa_s - \mathrm{Eval}_{ijt},\ \mathrm{Prop}_{ijt};\ -\rho \right) - \Phi_2\left( \kappa_{s-1} - \mathrm{Eval}_{ijt},\ \mathrm{Prop}_{ijt};\ -\rho \right), \quad s = 1, \ldots, 5,   (20)

with κ_0 = −∞ and κ_5 = ∞, where Φ_2 denotes the standard bivariate normal cumulative distribution function (CDF). Our proposed model relaxes the independence assumption between the two sets of observed outcomes (the y's and z's). The probability of observing y_ijt = 0 (i.e., lurking) can be expressed as [Footnote 14: The marginal distribution of one component of a bivariate normally distributed vector is simply a normal density.]:

\Pr(y_{ijt} = 0) = 1 - \Phi(\mathrm{Prop}_{ijt}).   (21)

Based on (20) and (21), the joint likelihood of observing individual i who makes m purchase occasions and posts n product ratings is given by:

L_i(y_i, z_i) = \prod_{t:\, y_{ijt}=0} \Pr(y_{ijt} = 0) \times \prod_{t:\, y_{ijt}=1} \Pr(y_{ijt} = 1, z_{ijt} = s),   (22)

where the first product contains (m − n) terms and the second contains n terms; the likelihood of observing the entire decision set made by N individuals is \prod_{i=1}^{N} L_i(y_i, z_i).

5. Estimation

For parameters that are subject to certain constraints, we apply the following transformation strategy. First, since the correlation coefficient ρ takes values in [−1, 1] only, we estimate a tangent-based transformation of it, which maps the support [−1, 1] to the real line (Ying et al. 2006). Second, we estimate log(\bar{b}_0) instead of \bar{b}_0, because the inverse scale parameter of a gamma distribution takes positive values only. Finally, to ensure that the three cutpoints obey the desired order (i.e., κ_2 < κ_3 < κ_4), we estimate the log of the difference between each pair of adjacent cutpoints. For variables exhibiting over-dispersion, such as product price p_ijt and order handling time h_ijt, we take a logarithmic transformation. We also mean-center all variables using the grand means to reduce the correlation between the estimated intercepts and slopes and to avoid potential multicollinearity in the propensity model (especially for variables that enter the propensity model in both linear and quadratic forms). [Footnote 15: We calculate the variance inflation factor (VIF) for the covariates included in the propensity model and the result shows no evidence of multicollinearity (VIF < 2).]
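The reparameterizations just described can be sketched as follows. The tangent-based map for ρ is one standard choice consistent with the description above, and all function names are ours.

```python
import numpy as np

def rho_to_real(rho):               # correlation in [-1, 1] -> real line
    return np.tan(rho * np.pi / 2)

def real_to_rho(r):                 # inverse map back to [-1, 1]
    return 2 / np.pi * np.arctan(r)

def cutpoints_to_real(k2, k3, k4):  # ordered cutpoints (k1 = 0 fixed) -> R^3
    return np.log(k2), np.log(k3 - k2), np.log(k4 - k3)

def real_to_cutpoints(u2, u3, u4):
    k2 = np.exp(u2)
    k3 = k2 + np.exp(u3)
    k4 = k3 + np.exp(u4)
    return k2, k3, k4               # guarantees 0 < k2 < k3 < k4

print(real_to_rho(rho_to_real(0.082)))                          # round-trips
print(real_to_cutpoints(*cutpoints_to_real(0.486, 1.406, 2.459)))
```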
To estimate the proposed model, we use a hierarchical Bayes approach, which is convenient for the estimation of individual-specific parameters. Given the parameter hierarchy, the parameters in our model can be divided into two groups (Netzer et al. 2008): (1) "random-effect" parameters that vary across individuals (denoted by θ_i); and (2) "fixed" parameters that do not vary across individuals (denoted by ψ). We allow the individual-specific parameters governing the propensity intercept, product evaluation and initial learning to be correlated by assuming:

\theta_i \sim \mathrm{MVN}(\bar{\theta}, \Sigma),   (23)

where \bar{θ} denotes the mean effects that persist across individuals and Σ denotes the variance-covariance matrix of θ. As we do not have much prior knowledge about the model parameters, we use a diffuse multivariate normal prior for the fixed parameters ψ. There are 19 elements in the vector ψ: β, capturing the coefficients of the 9 covariates in the propensity model; λ, measuring the effects of the 4 covariates in the evaluation model; the 3 cutpoints (κ_2, κ_3, κ_4), which map the continuous rating evaluation into discrete posted scores; and \bar{θ}, governing the mean effects of the 3 individual-specific parameters. Let ξ_i denote the individual-specific deviation from \bar{θ}. Following (23), we can directly estimate those deviations using ξ_i ~ MVN(0, Σ).

We develop a Markov chain Monte Carlo (MCMC) procedure to recursively draw parameters from the corresponding full conditional distributions in the following steps:

1. θ_i | Y_i, Z_i, \bar{θ}, Σ, ψ;
2. (\bar{θ}, Σ) | θ_i;
3. ψ_1 | Y, Z, θ_i, ψ_2;
4. ψ_2 | Y, Z, ψ_1, θ_i;   (24)

where ψ is partitioned into two blocks, ψ_1 and ψ_2. We adopt a random-walk Metropolis-Hastings algorithm for the steps in which the conditional posterior distributions do not have a closed form (Steps 1, 3 and 4 in (24)). To improve the efficiency of the sampler, we use a two-step strategy. In the first step, we run the MCMC for 50,000 iterations and discard the first 25,000 draws as "burn-in" samples. [Footnote 16: The choice of the burn-in period is based on visual inspection of the trace plots of the MCMC draws. In fact, 25,000 is a conservative number, since the chain appears to converge after the initial 5,000 iterations.] We calculate posterior means and empirical variances based on the remaining 25,000 draws. In the second step, we run a separate MCMC using the posterior means calculated in the previous step as the initial values and the empirical variances as the diagonal elements of the covariance matrix of the random-walk proposal distribution. [Footnote 17: The scale parameter of the proposal distribution is adaptively chosen such that the acceptance rate is around 23%, as suggested for a high-dimensional vector (Gelman et al. 2003).] We run the MCMC sampler for 100,000 iterations and record only every 10th draw to mitigate the autocorrelation that is an inevitable consequence of MCMC simulation (Hoff 2009). The adaptive chain converges immediately and explores the parameter space efficiently. To test convergence, we perform the Gelman-Rubin diagnostic (Gelman and Rubin 1992) by running 3 parallel chains with different initial values and random seeds. The potential scale reduction factors (PSRF) are roughly 1.02 for all parameters, suggesting that the chains have converged to the target distributions. [Footnote 18: A series of MCMC draws is considered to have achieved convergence as long as PSRF < 1.2.] We combine the draws from the three parallel chains, resulting in an effective sample size of at least 500 for all parameters.
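For readers unfamiliar with the sampler, a generic random-walk Metropolis step of the kind used in Steps 1, 3 and 4 looks like the following toy sketch; the target density and tuning values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def rw_metropolis(log_post, x0, step, n_iter=5000):
    """Random-walk Metropolis sampler for conditionals without a closed
    form. `step` is the proposal std. dev. vector (in the paper's two-step
    strategy it would come from pilot-run posterior variances)."""
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    draws = np.empty((n_iter, x.size))
    accept = 0
    for i in range(n_iter):
        prop = x + step * rng.standard_normal(x.size)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # MH acceptance rule
            x, lp = prop, lp_prop
            accept += 1
        draws[i] = x
    return draws, accept / n_iter

# Toy target: standard bivariate normal log-density.
draws, rate = rw_metropolis(lambda x: -0.5 * x @ x, x0=[0.0, 0.0],
                            step=np.array([2.4, 2.4]))
print(rate, draws[2500::10].mean(axis=0))  # burn in half, thin every 10th
```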
As a result, the relationship among them is identified from the joint likelihood of the change in posting behavior over time. The difficulty is that we cannot identify a0 and b0 simultaneously given 17 The scale parameter of proposal distribution is adaptively chosen such that the acceptance rate is around 23%, as suggested for high-dimension vector (Gelman et al. 2003). 18 A series of MCMC draws are considered to achieve convergence as long as PSRF < 1.2. 22 their relationship described in (14). Since all consumers have limited knowledge about the credibility of the review system until they receive the first disconfirmation signal, we fix the richness of learning experience a0 at a small number (provided a0 1 ) such that b0 is identifiable. Given everything else equal, the variation in the change of posting behavior over time helps us identify the initial inverse scale parameter bi0, which determines the rate at which each individual learns about the system credibility. Once b0 is identified, the value of biasness parameter δit is determined and can be thought as a constant (denoted by c) when it enters the propensity model. We can express the mean of the disconfirmation signal as Qijt Qijt c. Expanding 1Qijt 2 Qijt2 4Qijt 5Qijt2 , substituting and collecting terms, we will have ( 1 2 2 c 4 )Qijt 5Qijt2 1c 2 c 2 , with 4 variables and 4 identifiable parameters. 5.2. Model Fit and Model Comparison To demonstrate the fit of our proposed model, we compare our full (dynamic) model with its three nested versions: Model 1– A static model (no dynamic learning). Model 2– A static model without disconfirmation entering propensity model. Model 3– A static model without disconfirmation and product evaluation entering propensity model. Table 2. Model Fit and Comparison Model Full Model 1 Model 2 Model 3 Dynamic Belief Updating Yes No No No Quality Disconfirmation Yes Yes No No Experienced Quality Yes Yes Yes No DIC w.r.t. Full Model* --349.7 427.2 483.9 * Lower DIC is preferred. Table 2 reports the deviance information criterion (DIC) proposed by Spiegelhalter et al. (2002) for four models. The full model outperforms its three nested versions, and therefore we report and interpret the estimation results from the full model in the next section. Our full model has a better model fit over three static models because the belief updating mechanism provides a general way to explain the chance 23 in consumer rating behavior over time. In addition, the model fit becomes worse (higher DIC) if we exclude the disconfirmation effect (Model 2) and the effect of product evaluation (Model 3) on posting propensity one by one. This pattern provides supportive evidence that it is desirable to incorporate all three components when modeling the behavior of consumer leaving online reviews. 6. Results Table 3 reports the posterior means of parameters specified in the propensity model. Since we apply a Bayesian approach to estimate the model, we evaluate the significance level of parameter estimates based on the highest posterior density (HPD) intervals. A HPD interval describes the information we have about the location of the true parameter after we have observed the data (Hoff 2009). Parameter estimates are considered significant if the corresponding HPD intervals do not contain zero. Table 3. 
Table 3. Estimation Results of Propensity Model

Notation  Description                           Coeff.    Signif.
β_i0      Mean effect of baseline propensity     0.831    **
β_1       Disconfirmation – Linear              −0.011    **
β_2       Disconfirmation – Quadratic            0.025    **
β_3       Perceived system precision            −0.508    **
β_4       Experienced quality – Linear          −0.268    **
β_5       Experienced quality – Quadratic        0.065    **
β_6       Price                                  0.271    **
β_7       Shipping and handling time            −0.041    **
β_8       Volume of posted ratings              −0.009    *
β_9       Variance of posted ratings             0.009    **
** (*) Indicates the 95% (90%) HPD interval does not contain 0.

The parameters β_1 and β_2 together suggest that an individual's participation in leaving online product reviews is, at least partially, driven by the realization of disconfirmation. That is, the further the post-purchase evaluation deviates from the pre-purchase expectation, the more likely an individual is to express her opinions about the product. If the expectation more or less reflects one's own assessment, the consumer is more inclined to lurk. Our estimation results thus tell a very different story from what the distribution of rating discrepancy appears to suggest (see Figure 3). Loosely speaking, the disconfirmation effect on posting behavior could be attributed to several motives, such as altruism. An altruist may praise the product by submitting a score higher than the current mean level if her usage experience is better than what the rating signal represents. Conversely, if her experience with the product is worse than what the rating signal indicates, she may want to warn peer consumers to be vigilant about the product by leaving a below-average rating. Moreover, the negative β_1 suggests that negative disconfirmation induces posting behavior to a larger extent, ceteris paribus. While existing work has documented several motives for consumer engagement in WOM from a normative point of view (Hennig-Thurau et al. 2004; Sundaram et al. 1998), this research provides a positive validation and contributes to the literature by demonstrating that disconfirmation is one of the underlying forces driving consumers to voluntarily engage in online WOM.

A unique feature of this research is that we tie the change in consumer participation in generating reviews to the evolving perception of system precision. From an economic perspective, it should be clear that disconfirmation has a different effect from perceived system precision. The former captures the short-term impact of a disconfirmation shock (captured by β_1 and β_2), whereas the latter measures a cumulative attitude based on the series of disconfirmations realized in the past and can be thought of as having a long-term influence on rating behavior (captured by β_3). Our results show that consumers tend to be more silent in rating activities as they perceive the review system to be more precise over time (β_3 < 0). Alternatively, we can say that consumers are more vocal when the review system is perceived to be noisy.

As for the effect of product evaluation on posting propensity, the estimated coefficient is negative for the linear term (β_4) but positive for the quadratic term (β_5). Combined, these two estimates indicate that consumers are more likely to share opinions when the experienced product quality is either very high or very low. This finding echoes the polarity effect observed by Anderson (1998) in an offline setting and by several researchers in online settings (Dellarocas et al. 2010; Dellarocas and Narayan 2006; Moe and Schweidel 2012).
Furthermore, we find that consumers are more interested in rating products with higher prices (β_6 > 0) and products handled and shipped in a timely manner (β_7 < 0). The rating environment, characterized by the volume and variance of posted ratings, has dissimilar impacts on a focal consumer's posting decision. In particular, online raters' posting decisions are subject to a crowding-out effect, meaning that a focal individual is more likely to lurk if a big crowd has already shared its opinions (β_8 < 0). This finding is analogous to the political phenomenon in which voters tend to abstain once public polls have declared clear winners (Sudman 1986). On the other hand, dissension among posted opinions encourages the focal individual to contribute to eWOM (β_9 > 0).

The estimation results for the other parameters are reported in Table 4. The coefficient for product price is positive but insignificant (λ_1 > 0), implying that product price is in general a weak proxy for quality. As expected, the order handling time has a negative impact on rating evaluation (λ_2 < 0). This provides evidence that online consumers more or less reflect the service level they receive from the e-commerce site in their product ratings. Unlike in the propensity model, the dissension of public opinions has a negative impact on the latent rating evaluation. The posterior mean of our estimate of log(b_i0) is significant, suggesting that the updating mechanism captures well the evolution of the system precision perceived by consumers.

Table 4. Estimation Results of Evaluation Model and Other Parameters

Model                 Notation    Description                                                    Mean     Signif.
Evaluation            λ_i0        Mean effect of experienced quality                             1.997    **
                      λ_1         Price                                                          0.033
                      λ_2         Shipping and handling time                                    −0.104    **
                      λ_3         Volume of the rating environment                               0.015    *
                      λ_4         Variance of the rating environment                            −0.033    **
Cutoffs               κ_1         Cutpoint for s = 1 and 2 (fixed at 0)                          0.000    ---
                      κ_2         Cutpoint for s = 2 and 3                                       0.486    **
                      κ_3         Cutpoint for s = 3 and 4                                       1.406    **
                      κ_4         Cutpoint for s = 4 and 5                                       2.459    **
Credibility Updating  log(b_i0)   Mean effect of initial inverse scale parameter (log scale)     2.189    **
                      a_i0        Initial shape parameter (fixed at 5)                           5.000    ---
Correlation           ρ           Interdependency between the two rating stages                  0.082    **
** (*) Indicates the 95% (90%) HPD interval does not contain 0.

Table 5 reports the mean and standard deviation of the individual-specific parameters. The large standard deviations of β_i0 and λ_i0 suggest that online reviewers are substantially heterogeneous in baseline posting propensity and product evaluation (see also Figure 7). To explore the interdependence among the baseline latent parameters at the individual level, we provide their pair-wise correlation matrix in Table 6. The positive correlation between β_i0 and λ_i0 suggests that, after controlling for other factors, individuals with a higher baseline posting propensity appear to be more lenient in giving high numeric ratings. This finding is the opposite of Moe and Schweidel (2012), who find that active raters are more negative. Finally, our estimates indicate less heterogeneity across individuals in their initial perception of system precision, as indicated by the small standard deviation of b_i0.

Table 5. Individual-specific Parameter Estimates

Notation  Description              Mean among individuals   Std. dev. among individuals
β_i0      Baseline propensity      0.831                    0.598
λ_i0      Baseline evaluation      1.997                    0.520
b_i0      Initial learning status  8.931                    0.208

Figure 7. Distributions of Individual-Level Parameters: (a) β_i0; (b) λ_i0; (c) b_i0.
Table 6. Pair-wise Correlations among Individual-specific Parameters

       βi0     λi0     bi0
βi0    1.000
λi0    0.142   1.000
bi0    0.387   0.876   1.000

6.1. Extension and Robustness Checks

In this section, we relax some of our model assumptions by taking into account other factors that could potentially bias our findings. The estimation results remain consistent under each of the following three robustness checks.

6.1.1. The Effect of Disconfirmation on Product Evaluation

In our proposed model, we assume that disconfirmation enters only the propensity model and has no impact on the post-purchase evaluation of the product; that is, consumers do not factor their prior quality expectations into product evaluation. Prior research has shown that, in offline settings, a consumer's overall satisfaction can be affected by quality disconfirmation (Anderson and Sullivan 1993), and this relationship has been shown empirically to be positive (Rust et al. 1999). Based on this theory, we now allow a consumer's rating evaluation to be affected by her overall product satisfaction S, a linear combination of her experienced product quality and the quality disconfirmation:

S_ijt = Q_ijt + ν·ΔQ_ijt,

where ν is a new parameter to be estimated. With this specification, the evaluation model can be rewritten as

Eval_ijt = λ_i0·S_ijt + λ_{1:4}·X_jt, (25)

and the propensity model becomes

Prop_ijt = β_i0 + β_1·ΔQ_ijt + β_2·ΔQ_ijt² + β_3·Cred_it + β_4·S_ijt + β_5·S_ijt² + β_{6:9}·X_jt. (26)

The mean of the disconfirmation signal becomes

ΔQ̄_ijt = [(R̄_j + ξ_i0) − (R̄_jt + μ_it)] / (1 + ν). (27)

Comparing (27) with (5), we can see that the inclusion of ν shrinks the mean of the disconfirmation signal toward zero, provided ν > 0. The estimation results of the new model (both the parameter estimates and the DIC) deviate only marginally from those reported in Table 3, suggesting that the main results of this paper are robust even when we consider the impact of disconfirmation on overall product satisfaction. The posterior mean of ν is 0.117 and is significant at the 95% level, suggesting that online consumers indeed factor their ex-ante expectations into their product ratings.

6.1.2. The Interaction between Disconfirmation and Rating Environments

We have examined how a focal individual's posting decision is influenced by the disconfirmation she encounters as well as by the rating environment to which she is exposed. An interesting question to ask is: Is the disconfirmation effect itself also moderated by the opinions expressed by the crowd? To answer this, we allow the effect of disconfirmation on posting propensity19 to interact with the volume and variance of posted ratings and incorporate these two interaction terms into our propensity model. The parameters of interest are reported in Table 7.20

Table 7. Effects of Rating Environments on Posting Decision

Effect                                 Coeff.   Signif.
Volume on propensity                   -0.008   **
Variance on propensity                  0.023   *
Volume on disconfirmation effect        0.026
Variance on disconfirmation effect     -0.008   **

** (*) indicates that the 95% (90%) HPD interval does not contain 0.

19 The disconfirmation effect can be computed as the product of the disconfirmation variables and their corresponding coefficients (for both the linear and quadratic terms).
20 Since the other parameter estimates do not change substantially, we report here only the coefficients for variables related to the rating environment.

Consistent with the results shown in Table 3, we still find evidence that the level of disagreement among others encourages posting behavior. Interestingly, the coefficient for the Disconfirmation × Variance interaction term is negative and significant. This indicates that the disconfirmation effect is moderated by the degree of dissension among posted opinions, perhaps because a consumer anticipates the review information to be noisy and hence becomes less sensitive to the realized disconfirmation. Moreover, the coefficient of the Disconfirmation × Volume interaction has a positive sign. This also makes sense: the higher the volume of posted ratings, the more trustworthy consumers perceive the review information to be, and therefore the more sensitive they are to the disconfirmation effect.
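Per Footnote 19, the disconfirmation effect is the coefficient-weighted sum of the linear and quadratic disconfirmation terms; with the interactions in Table 7, the linear weight itself shifts with the rating environment. The sketch below illustrates this moderation under our own simplifications: the interactions are attached to the linear term only, the inputs are treated as standardized values, and all names are illustrative.

```python
# Coefficients from Tables 3 and 7 (signs as discussed in the text).
B1, B2 = -0.011, 0.025                 # disconfirmation: linear, quadratic
THETA_VOL, THETA_VAR = 0.026, -0.008   # Disconfirmation x Volume / x Variance

def disconfirmation_effect(disc, volume, variance):
    """Contribution of disconfirmation to the posting propensity, with the
    linear weight moderated by the volume and variance of posted ratings."""
    linear_weight = B1 + THETA_VOL * volume + THETA_VAR * variance
    return linear_weight * disc + B2 * disc ** 2

# How the moderated linear weight shifts with the rating environment.
for vol, var in [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]:
    w = B1 + THETA_VOL * vol + THETA_VAR * var
    eff = disconfirmation_effect(disc=-1.5, volume=vol, variance=var)
    print(f"volume={vol:.0f} variance={var:.0f}  linear weight={w:+.3f}  effect={eff:+.4f}")
```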
6.1.3. The Use of the Long-term Average Rating as a Proxy for Product Quality

The long-term product-level average ratings are used as proxies for the baseline product quality. Although the validity of this proxy is justified (Footnote 9), critical readers may argue that it does not necessarily reflect the overall evaluation among all consumers. Hu et al. (2006) argue that the mean rating over-reports (under-reports) the true quality if consumers are more inclined to brag (moan) about the product when they are highly satisfied (disgruntled). To address this issue, we add a product-level random effect to the formulation of the post-purchase evaluation. As a result, (2) can be rewritten as

Q_ij = (R̄_j + ω_j) + ξ_i0, (28)

where ω_j ~ N(0, σ_ω²). The parameter ω_j models product-level randomness, in the sense that some products are overrated and others underrated by the long-term mean ratings. If the estimated σ_ω is small, we can argue that the long-term average ratings more or less reflect the true product quality. The estimation results obtained under this new specification show no noticeable differences from those obtained under our original model. The posterior mean (standard deviation) of σ_ω is 0.011 (6×10⁻⁵), providing supportive evidence that the long-term average ratings serve as a good proxy for the baseline quality.

What makes our finding different from the extant work is that, in addition to the post-purchase evaluation, we also consider the disconfirmation effect. The richer model allows consumers to make posting decisions in response to the overall tone on product quality set by posted ratings. We should therefore expect that, in the presence of the disconfirmation effect, the average rating will gradually converge to the true quality in the long run. In contrast, the results of Hu et al. (2006) depend on the assumption that the mechanism underlying consumer posting behavior is unresponsive to opinions previously expressed by others.
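To see what the random-effect specification in (28) adds, the toy sketch below draws product-level offsets ω_j with the small posterior scale reported above and confirms that perceived quality then stays close to the long-term mean rating. The individual-level spread and all names are illustrative assumptions; only the functional form follows (28).

```python
import random

random.seed(0)
SIGMA_OMEGA = 0.011   # posterior mean of sigma_omega reported above
XI_SD = 0.5           # spread of the individual-level term (illustrative)

def perceived_quality(long_term_mean):
    """Q_ij = (R_bar_j + omega_j) + xi_i0, following specification (28)."""
    omega_j = random.gauss(0.0, SIGMA_OMEGA)   # product-level random effect
    xi_i0 = random.gauss(0.0, XI_SD)           # individual-level deviation
    return (long_term_mean + omega_j) + xi_i0

qs = [perceived_quality(3.5) for _ in range(100_000)]
print(f"average perceived quality: {sum(qs) / len(qs):.3f}")   # stays near 3.5
```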
6.2. Simulations and Analyses on Consumer Rating Behavior

In this subsection, we perform a series of counterfactual analyses to further highlight the significance of our findings and to better understand consumer rating behavior in an online setting.

6.2.1. Recovering unobserved rating scores

In the first analysis, we recover the rating outcomes (the y_ijt's and z_ijt's) for purchase occasions that have a follow-up review entry (y_ijt = 1). These recovered (simulated) outcomes are compared with the observed ones to gauge the performance of our model. We also "uncover" the outcomes for lurking occasions (y_ijt = 0) to demonstrate the mechanism underlying online rating behavior. The simulation procedure is summarized as follows:

1. We compute the latent posting propensity per (15) and the evaluation per (17) for all purchase occasions based on the posterior estimates.
2. Given the computed posting propensity, the evaluation, and the estimated cutpoints, we simulate individual decisions of whether to rate per (16) and what to rate per (18).
3. We repeat Step 2 for 1,000 iterations and compute the averages of the quantities of interest across iterations.

To make sure the number of iterations is sufficient, we repeat Steps 2–3 in three parallel processes with different random seeds. We compare the simulated results and do not find any inconsistency across processes.

Figure 8. Simulated vs. Actual Ratings (x-axis: rating, 1 to 5; y-axis: frequency in %)

We begin our analysis by assessing the performance of our model in explaining the behavior of consumers leaving ratings. Figure 8 plots the simulated ratings for the rated occasions against the actual ratings observed in the data set. The two series are very close to each other, indicating that the proposed model captures well the mechanism governing consumers' rating decisions.21

21 We also run the same analysis for the three alternative models defined in Section 6.2. The results show that our proposed model outperforms its three nested versions in terms of recovering consumers' rating decisions.

For interpretation purposes, we compute the posting rates along two dimensions and plot them against different levels of post-purchase evaluation (Figure 9a) and different levels of rating discrepancy (Figure 9b). The relationship between the posting rate and the discretized product evaluation is best characterized by a left-skewed U shape: negative experiences (rated as 1 or 2) have a higher chance of being reported (7.44% and 6.10%, respectively) relative to neutral (5.88%) and positive evaluations (5.54% and 5.70%). It is evident that disappointed or disgruntled consumers, those who encounter strong, negative rating discrepancy, are more vocal in expressing opinions about the product. The simulation results provide a sharp insight into what drives consumers to leave online reviews, an insight that cannot be discovered by directly observing the data patterns shown in Figure 2 and Figure 3.

Figure 9. Dissimilar Effects of Different Measurements on Posting Behavior (panel a: posting rate (%) vs. post-purchase evaluation; panel b: posting rate (%) vs. rating discrepancy)

6.2.2. Convergence of product ratings

To highlight how the disconfirmation effect influences the evolution of product ratings at the product level, we perform a second simulation following the procedure below (a stylized sketch of the procedure is given after the list):

1. We sample 3,000 individuals from the posterior estimates of the population-level parameters (Σ). We assume all individuals purchase and consume a product whose baseline quality is equivalent to 3.5 stars. The sequence in which individuals enter the market is random.
2. For each individual, we first simulate her posting decision; for an individual who decides to post a rating, we then simulate her decision of what rating to give. When a new rating is posted, we calculate the up-to-date mean rating as well as the other variables that serve as controls on subsequent rating occasions.
3. We repeat Step 2 until 150 ratings are posted.
4. We repeat Steps 1–3 for multiple iterations.
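The sketch below is a stylized, self-contained version of this procedure. It is not the estimated model: individual heterogeneity, covariates, and credibility updating are collapsed into a single hypothetical posting rule whose probability increases in absolute disconfirmation, which is just enough to reproduce the qualitative convergence logic.

```python
import random
import statistics

TRUE_QUALITY = 3.5

def simulate_rating_path(initial_mean, n_ratings=150, seed=0):
    """Stylized rating-path simulation.

    Each arriving consumer sees the current mean rating, experiences a
    quality drawn around TRUE_QUALITY, and posts with a probability that
    increases in the absolute disconfirmation; a posted rating is the
    experienced quality rounded into the 1-5 star scale.
    """
    rng = random.Random(seed)
    ratings = [initial_mean] * 10       # pretend ten early ratings set the tone
    path = []
    while len(path) < n_ratings:
        mean_now = statistics.fmean(ratings)
        quality = rng.gauss(TRUE_QUALITY, 0.8)         # experienced quality
        disc = quality - mean_now                      # disconfirmation
        post_prob = min(1.0, 0.05 + 0.10 * abs(disc))  # hypothetical rule
        if rng.random() < post_prob:
            ratings.append(min(5, max(1, round(quality))))
            path.append(statistics.fmean(ratings))
    return path

# An initially overrated product drifts down toward 3.5; an underrated one is
# lifted up by consumers who encounter positive disconfirmation.
for start in (4.5, 2.5):
    path = simulate_rating_path(start)
    print(f"initial mean={start}: mean after 150 posted ratings = {path[-1]:.2f}")
```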
Figure 10 depicts four major evolutionary patterns of online product ratings observed in our simulation results. It is evident that, in the long run, the mean rating converges to the true product quality (3.5). If the mean rating in the early stage is high relative to the true quality, individuals with neutral or negative evaluations perceive the rating signal to be inflated; the disconfirmation effect then takes place and induces low ratings from those dissatisfied consumers. As a result, we should expect the mean rating to gradually drop to the level of 3.5 and then stabilize (see Figure 10a). While such a declining trend has been attributed to effects such as self-selection biases (Li and Hitt 2008) and environmental factors (Moe and Schweidel 2012), the effect of disconfirmation on posting decisions provides an alternative explanation for this data pattern. Perhaps more importantly, the disconfirmation effect can also explain evolutionary patterns that those established theories cannot. For example, if the product is initially underrated, subsequent consumers who encounter positive disconfirmation are more likely to lift the mean rating by sharing their pleasant usage experiences (Figure 10b). Interestingly, we also observe an undershooting pattern in which the average rating first exhibits a steep decline and then bounces back to a steady state (Figure 10d). Based on our estimation and simulation results, we believe that the disconfirmation effect is one of the most important forces driving the voluntary behavior of consumers posting product reviews.

6.2.3. The effect of fake, inflated ratings on subsequent rating entries

Our simulation can also be used to understand how review manipulation in the early stage influences subsequent rating submissions. Similar to the previous setting, we consider a fictitious scenario in which a firm, selling a product with a true quality equivalent to 3.5, is able to "set" the mean rating at 4 (an inflation of 0.5 stars) once the number of reviews reaches 15. We repeat Steps 1–4 listed in the previous subsection for 10 iterations and average the posted ratings across iterations. As Figure 11 illustrates, the mean rating plummets after the 15th entry and stabilizes at the level of the true quality. Although fake, inflated ratings may elevate buyers' pre-purchase expectations and hence reap additional revenue, we argue that such deceptive information may strongly induce very negative feedback from disgruntled consumers who post reviews for anxiety reduction or vengeance. Similarly, we should expect that the damage caused by a competitor's malicious review distortion would be alleviated by subsequent positive ratings reported by genuine, satisfied customers.

Figure 10. Various Evolutionary Patterns of Ratings (x-axis: rating index)

Figure 11. Effect of Inflated Ratings on Subsequent Rating Entries (x-axis: rating index)
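Under the same stylized assumptions as the previous sketch (hypothetical posting rule, illustrative parameters), the manipulation scenario of Section 6.2.3 amounts to resetting the displayed mean once 15 reviews have accumulated:

```python
import random
import statistics

def simulate_with_manipulation(n_ratings=150, inflate_at=15, fake_mean=4.0,
                               true_quality=3.5, seed=1):
    """Rating path when the displayed mean is artificially 'set' to fake_mean
    once inflate_at reviews have been posted (cf. Section 6.2.3)."""
    rng = random.Random(seed)
    ratings = [true_quality] * 5        # early ratings near the true quality
    path, manipulated = [], False
    while len(path) < n_ratings:
        if not manipulated and len(path) >= inflate_at:
            ratings = [fake_mean] * len(ratings)   # the firm resets the mean
            manipulated = True
        mean_now = statistics.fmean(ratings)
        quality = rng.gauss(true_quality, 0.8)     # genuine experience
        disc = quality - mean_now                  # disconfirmation
        if rng.random() < min(1.0, 0.05 + 0.10 * abs(disc)):
            ratings.append(min(5, max(1, round(quality))))
            path.append(statistics.fmean(ratings))
    return path

path = simulate_with_manipulation()
print(f"mean right after manipulation: {path[15]:.2f}")   # close to 4.0
print(f"long-run mean:                 {path[-1]:.2f}")   # back near 3.5
```

As in Figure 11, the displayed mean jumps to the inflated level and is then pulled back toward the true quality by the disconfirmation-driven feedback of genuine buyers.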
7. Conclusion

The primary objective of this paper has been to study the effect of disconfirmation on consumer rating behavior in a digital setting. Early research on online product reviews studied how user-generated ratings relate to market performance, whereas recent work focuses on the impact of existing ratings on subsequent ones from a social-dynamics standpoint (Lee et al. 2014; Moe and Schweidel 2012; Shen et al. 2013). Our work contributes to the latter stream by proposing a novel framework in which a focal consumer's posting decisions are influenced by others' opinions even before the product is purchased and consumed. In addition, we model how consumers' perception of the system credibility evolves over time as a result of a series of realizations of disconfirmation. By integrating these two features into the individual decision-making process, we are able to provide a comprehensive understanding of what drives consumers to voluntarily contribute online product ratings.

Using a rich data set containing complete purchasing and rating activities at the individual level, we empirically show that online consumers' rating behavior is driven by disconfirmation, the extent to which the post-purchase evaluation deviates from the pre-purchase expectation of the same product. The magnitude of the disconfirmation effect is decreasing in the variance of posted ratings and increasing in the volume of submitted reviews. We also find that online raters tend to become silent as they perceive the review system to be more precise over time.

The main finding of this paper echoes prior research in some respects but provides different insights in others. On one hand, the disconfirmation effect joins the documented drivers of engagement in WOM, such as concern for others, anxiety reduction, and vengeance (Anderson 1998; Hennig-Thurau et al. 2004). The impact of disconfirmation on posting behavior can also (at least partially) explain 1) the commonly observed U-shaped distribution of online product ratings and 2) the declining trend of average ratings at the product level. On the other hand, we believe that the long-term average rating can more or less represent the true product quality. Through simulations, we demonstrate that the identified disconfirmation effect "ensures" that the average rating converges to the true value in the long run. Our proposition to some extent disagrees with Hu et al. (2006), who argue that the mean score may provide misleading recommendations; their result is contingent on the analytical assumption that consumers' posting decisions are solely triggered by their perceived quality.

Our empirical results shed light on the economic value of online product ratings in the following respects. Manufacturers and service providers should be aware that online reviewers are more prone to provide feedback when expectations do not match their own perceived quality. While manipulating product reviews by inflating numeric ratings can temporarily boost sales revenue, inflated ratings can turn out to be detrimental in the long run as disappointed customers engage in negative eWOM.22 Furthermore, online reviewers' perception of system precision has a negative impact on their propensity to contribute reviews. Existing literature also shows that rating environments with smaller volume or lower valence discourage posting incidence (Moe and Schweidel 2012). To keep the online rating environment healthy and prosperous, policy makers and marketers in charge of online rating campaigns should design incentives for different consumers accordingly.

22 In our empirical model, we can only show that inflated ratings drive disappointed buyers to leave negative numeric ratings; we cannot gauge the harm that those negative textual reviews would do to the firm's reputation.
This study has some limitations and can be extended in the following directions. First, our data contain limited information, which prevents us from examining online rating behavior in greater detail. For example, although we have utilized aggregate review information, such as valence and variance, in formulating the quality variables, we do not consider the possibility that some consumers value positive and negative ratings differently. If we were able to observe the distribution of posted ratings at the time of purchase, we might discover more interesting findings on the way consumers interpret the review signal in the pre-purchase stage. Second, a promising direction for future research is to apply text-mining techniques to extract the sentiments stored in the textual data and incorporate them into our econometric model. The synergy created by integrating different research methods may allow us to provide a sharper insight into consumers' online rating behavior. Third, consumers' browsing history before or after purchase could help us build an even more sophisticated model. For example, if we knew how many pages of reviews each prospective buyer reads and how long she stays on each page, we would be able to better calibrate how previously posted reviews shape the formation of the expected quality. If we further knew what similar items each prospective buyer has looked at, we could directly model the effect of existing ratings on a focal consumer's purchase decision.

References

Anderson, E. W. 1998. Customer Satisfaction and Word of Mouth. Journal of Service Research 1 (1):5-17.

Anderson, E. W., and M. W. Sullivan. 1993. The Antecedents and Consequences of Customer Satisfaction for Firms. Marketing Science 12 (2):125-143.

Chaiken, S. 1980. Heuristic versus Systematic Information Processing and the Use of Source versus Message Cues in Persuasion. Journal of Personality and Social Psychology 39 (5):752-766.

DeGroot, M. H. 1970. Optimal Statistical Decisions. Vol. 82. Wiley-Interscience.

Dellarocas, C. 2006. Strategic Manipulation of Internet Opinion Forums: Implications for Consumers and Firms. Management Science 52 (10):1577-1593.

Dellarocas, C., M. Fan, and C. Wood. 2004. Self-interest, Reciprocity, and Participation in Online Reputation Systems.

Dellarocas, C., G. Gao, and R. Narayan. 2010. Are Consumers More Likely to Contribute Online Reviews for Hit or Niche Products? Journal of Management Information Systems 27 (2):127-158.

Dellarocas, C., and R. Narayan. 2006. A Statistical Measure of a Population's Propensity to Engage in Post-purchase Online Word-of-mouth. Statistical Science 21 (2):277-285.

Deloitte. 2007. Most Consumers Read and Rely on Online Reviews; Companies Must Adjust. USA.

Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin. 2003. Bayesian Data Analysis. CRC Press.

Gelman, A., and D. B. Rubin. 1992. Inference from Iterative Simulation Using Multiple Sequences. Statistical Science 7 (4):457-472.

Ghose, A., P. G. Ipeirotis, and B. Li. 2012. Designing Ranking Systems for Hotels on Travel Search Engines by Mining User-generated and Crowdsourced Content. Marketing Science 31 (3):493-520.

Gormley, M. 2013. NY Attorney General Cracks Down on Fake Online Reviews. NBC News. Available from http://www.nbcnews.com/tech/internet/ny-attorney-general-cracks-down-fake-online-reviews-f4B11235875.
Hennig-Thurau, T., K. P. Gwinner, G. Walsh, and D. D. Gremler. 2004. Electronic Word-of-mouth via Consumer-opinion Platforms: What Motivates Consumers to Articulate Themselves on the Internet? Journal of Interactive Marketing 18 (1):38-52.

Hoff, P. D. 2009. A First Course in Bayesian Statistical Methods. Springer.

Hu, N., P. A. Pavlou, and J. Zhang. 2006. Can Online Reviews Reveal a Product's True Quality? Empirical Findings and Analytical Modeling of Online Word-of-mouth Communication. In Proceedings of the 7th ACM Conference on Electronic Commerce. Ann Arbor, Michigan, USA: ACM, 324-330.

Johnston, S. 2014. Don't Fall for Fake Online Reviews. US News. Available from http://money.usnews.com/money/personal-finance/articles/2014/07/08/dont-fall-for-fake-online-reviews.

Koop, G., D. J. Poirier, and J. L. Tobias. 2007. Bayesian Econometric Methods (Econometric Exercises). Cambridge University Press.

Lee, Y., K. Hosanagar, and Y. Tan. 2014. Do I Follow My Friends or the Crowd? Information Cascades in Online Movie Rating. Management Science, forthcoming.

Leggatt, H. 2011. 24% of Consumers Turned Off After Two Negative Online Reviews. Cited April 13, 2011. Available from http://www.bizreport.com/2011/04/27-of-consumers-turned-off-by-just-two-negative-online-revie.html.

Li, X., and L. M. Hitt. 2008. Self-Selection and Information Role of Online Product Reviews. Information Systems Research 19 (4):456-474.

McGlohon, M., N. Glance, and Z. Reiter. 2010. Star Quality: Aggregating Reviews to Rank Products and Merchants. Paper read at the International Conference on Weblogs and Social Media.

Moe, W. W., and D. A. Schweidel. 2012. Online Product Opinions: Incidence, Evaluation, and Evolution. Marketing Science 31 (3):372-386.

Netzer, O., J. M. Lattin, and V. Srinivasan. 2008. A Hidden Markov Model of Customer Relationship Dynamics. Marketing Science 27 (2):185-204.

Nielson. 2007. Word-of-Mouth the Most Powerful Selling Tool. Cited April 29, 2013. Available from http://nz.nielsen.com/news/Advertising_Oct07.shtml.

PowerReviews. 2010. Customer Reviews. Available from http://www.powerreviews.com/products/social-content-engine/customer-reviews.

Rust, R. T., J. J. Inman, J. Jia, and A. Zahorik. 1999. What You Don't Know About Customer-Perceived Quality: The Role of Customer Expectation Distributions. Marketing Science 18 (1):77-92.

Schlosser, A. E. 2005. Posting versus Lurking: Communicating in a Multiple Audience Context. Journal of Consumer Research 32 (2):260-265.

Shen, W., Y. J. Hu, and J. Rees. 2013. Competing for Attention: An Empirical Study of Online Reviewers' Strategic Behavior. Working paper.

Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. Van Der Linde. 2002. Bayesian Measures of Model Complexity and Fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64 (4):583-639.

Sudman, S. 1986. Do Exit Polls Influence Voting Behavior? The Public Opinion Quarterly 50 (3):331-339.

Sundaram, D. S., K. Mitra, and C. Webster. 1998. Word-of-mouth Communications: A Motivational Analysis. Advances in Consumer Research 25 (1):527-531.

Ying, Y., F. Feinberg, and M. Wedel. 2006. Leveraging Missing Ratings to Improve Online Recommendation Systems. Journal of Marketing Research 43 (3):355-365.