Using Online Ratings as a Proxy of Word-of-Mouth in Motion Picture Revenue Forecasting

Chrysanthos Dellarocas • Neveen Awad Farag • Xiaoquan (Michael) Zhang
R. H. Smith School of Business, University of Maryland, College Park, MD 20742
Wayne State University, Detroit, MI 48202
Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA 02139
[email protected] • [email protected] • [email protected]

Abstract

The emergence of online product review forums has enabled firms to monitor consumer opinions about their products in real time by mining publicly available information from the Internet. This paper studies the value of online product ratings in revenue forecasting of new experience goods. Our objective is to understand what metrics of online ratings are the most informative indicators of a product's future sales and how the explanatory power of such metrics compares to that of other variables that have traditionally been used for similar purposes in the past. We focus our attention on online movie ratings and incorporate our findings into practical motion picture revenue forecasting models that use very early (opening weekend) box office and movie ratings data to generate remarkably accurate forecasts of a movie's future revenue trajectory. Among the metrics of online ratings we considered, we found the valence of user ratings to be the most significant explanatory variable. The gender diversity of online raters was also significant, supporting the theory that word-of-mouth that is more widely dispersed among different social groups is more effective. Interestingly, our analysis found user ratings to be more influential in predicting future revenues than average professional critic reviews. Overall, our study has established that online ratings are a useful source of information about a movie's long-term prospects, enabling exhibitors and distributors to obtain revenue forecasts of a given accuracy sooner than with older techniques.
1 Introduction

Recent advances in information technology have enabled the creation of a diverse mosaic of technology-mediated word-of-mouth communities where individuals exchange experiences and opinions on a variety of topics, ranging from products and services to politics and world events.[1] Online communities allow the opinions of a single individual to instantly reach thousands, or even millions, of other people. This escalation in audience is altering the dynamics of many industries where word-of-mouth has traditionally played an important role. For example, the entertainment industry has found that the rapid spread of word-of-mouth is shrinking the lifecycles of its products (movies) and causing it to rethink its release and marketing strategies.[2] Rapid measurement is the first prerequisite of the fast reactions that are needed in this new playing field. Fortunately, in addition to accelerating its diffusion, the Internet has made word-of-mouth instantly measurable: traces of word-of-mouth can be found in many publicly available Internet forums, such as product review sites, discussion groups, chat rooms, and web logs. This public data provides organizations with the newfound ability to measure word-of-mouth as it happens by monitoring information available on the Internet. Unfortunately, unlike traditional media, online word-of-mouth currently lacks an accepted set of metrics. Therefore, even though firms can collect large amounts of information from online communities, it is not yet clear how they should analyze it. Only a handful of studies have looked at the information value of online word-of-mouth; each has studied a different type of community and (perhaps as a consequence) has found a different metric to be most relevant. Godes and Mayzlin (2004) studied unstructured Usenet conversations about TV shows.
They related various metrics of these conversations to a dynamic model of sales and found that the "dispersion" of conversations across communities had explanatory power, whereas the volume of conversations did not. Liu (2004) found that the volume of messages posted on Internet message boards about upcoming and newly released movies was a better predictor of their box office success than the valence (percentage of messages that express positive opinions) of those messages. This paper focuses on another important type of online word-of-mouth: numerical product ratings posted by consumers online. In the last few years, a number of popular web sites (such as Amazon, Epinions, and Yahoo) have attempted to introduce structure into the conversations posted therein by allowing users to submit numerical ratings about the topic being discussed, in addition to (or instead of) a more detailed text review. The introduction of numerical ratings has significantly lowered the cost of submitting evaluations online. This has led to a rapid increase in the number of consumers who become active contributors. Our objective is to establish evidence for the usefulness of online product ratings in revenue forecasting of new experience goods.

[1] Examples of such communities include online product review forums, Internet discussion groups, instant messaging chat rooms, mailing lists, and web logs. Schindler and Bickart (2003) provide a comprehensive overview.
[2] Movies are seeing much more rapid change in revenues between the opening weekend and the second weekend, suggesting that public opinion is spreading faster (Lippman 2003). Rick Sands, chief operating officer at Miramax, summarized this trend by stating that "In the old days . . . you could buy your gross for the weekend and overcome bad word of mouth, because it took time to filter out into the general audience. Those days are over. Today, there is no fooling the public" (Muñoz 2003).
Furthermore, we are interested in understanding what metrics of online ratings are the most informative indicators of a product's future sales and how the explanatory power of such metrics compares to that of other variables (such as marketing expenditures and expert reviews) that have traditionally been used for similar purposes in the past. We focus our attention on online movie ratings and incorporate our findings into practical motion picture revenue forecasting models that use very early (opening weekend) box office and movie ratings data to generate remarkably accurate forecasts of a movie's future revenue trajectory.[3] A number of factors make the motion picture industry an ideal test bed for this type of study. First, it is an industry where word-of-mouth plays an important role. Second, there is widespread availability of movie ratings on the Internet; the most popular sites (Yahoo! Movies, IMDB, RottenTomatoes.com) receive hundreds of ratings within hours of a new movie's release. Third, production, marketing, and daily box office data are easily available for most movies, making it easy to correlate the dynamic evolution of a movie's performance with that of online ratings. Fourth, a sizable academic literature exists on motion picture revenue forecasting (Section 2 provides an overview). Several of these studies have attempted to model the impact of word-of-mouth on movie revenues; most, however, have relied on more traditional explanatory variables, such as a movie's star power, marketing expenditures, distribution strategy, or professional critic reviews. These studies thus serve as a useful benchmark for assessing the added value of online ratings.

[3] Throughout the paper, our perspective is that online ratings constitute a valuable real-time "window" into consumer attitudes that can be exploited by firms to forecast future revenues earlier than with more traditional methods. Our study does not attempt to consider the important question of whether online ratings influence (as opposed to predict) future revenues.

We developed and tested a family of forecasting models based on a novel extension of the Bass model of product diffusion (Bass 1969) that takes into consideration the unique properties of the motion picture industry. Among the metrics of online ratings we considered, we found the valence (average numerical value) of user ratings to be the most significant explanatory variable. The gender diversity of online raters was also significant, supporting the theory that word-of-mouth that is more widely dispersed among different social groups is more effective (Godes and Mayzlin 2004). The daily volume of online ratings was highly correlated with the corresponding box office revenues. It is, therefore, best viewed as a proxy of sales volume. Our results support the hypothesis that the impact of word-of-mouth on future sales is proportional to the volume of past adopters; we did not find any special significance of the volume of online ratings beyond that. Interestingly, our analysis found user ratings to be more influential in predicting future revenues than average professional critic ratings. Given the amount of attention that critic ratings have received until now, this result has considerable practical consequences. At the same time, the correlation between user and expert ratings was relatively low; higher predictive power could be achieved by combining them. This finding provides support for the credibility of user ratings, but also suggests that they are best viewed as a complement to, rather than a substitute for, expert reviews. Using only opening weekend box office and online ratings data, our best model was able to forecast the total revenue of movies in a randomly chosen hold-out subset of our sample with an average relative absolute error of 14.1%.
As we discuss in Section 5, such levels of accuracy would have required two weeks of box office data using older techniques. Overall, our study provides positive evidence that online ratings are a useful source of information about a movie's long-term prospects. From a managerial perspective, the added value of online ratings is that they allow forecasts of a given accuracy to be obtained sooner than with older techniques. The ability to generate very early forecasts has the potential to alter the way the movie industry uses such tools. Currently, post-release forecasts are primarily of value to movie exhibitors, who use them to better manage the yield from their exhibition capacity. We believe that the real-time availability of reliable estimates of word-of-mouth can have important implications for motion picture marketing as well. Such information may allow movie distributors to fine-tune a movie's campaign, or to develop entirely new marketing strategies that respond to an audience's initial reception of a new movie. The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 describes our data set. Section 4 introduces our forecasting models. Section 5 presents the results of fitting our models to our data set and compares their forecasting accuracy to that of older models. Finally, Section 6 summarizes our findings, discusses the managerial implications of this work, points to its limitations, and suggests potential avenues for future research.

2 Related Work

Our work relates to two important streams of past research: forecasting models of motion picture revenues and methodologies for measuring word-of-mouth.

Forecasting models of motion picture revenues

Predicting the success of a motion picture has largely been viewed in the industry as a "wild guess" (Litman and Ahn 1998). Despite such difficulty, several researchers have proposed models that attempt to forecast motion picture revenues.
Such models can be classified along two dimensions. One classification is based on the type of forecasting model employed:

1. Econometric models identify factors that predict or influence motion picture box office success. A large variety of factors have been examined. Some studies have looked at movie characteristics, such as star power (De Vany and Walls 1999; Ravid 1999), movie genre and MPAA ratings (Austin and Gordon 1987), and academy awards (Dodds and Holbrook 1988). Others have examined a movie's media advertising (Faber and O'Guinn 1984), timing of release (Krider and Weinberg 1996), distribution strategy (Jones and Mason 1990; Jones and Ritz 1991), and competition (Ainslie, Dreze and Zufryden 2003). Several researchers have studied the role of professional critic reviews (Eliashberg and Shugan 1997; Reinstein and Snyder 2005; Basuroy, Chatterjee and Ravid 2003). Finally, a few integrative studies examined the relative contribution of a combination of factors (Litman 1983; Neelamegham and Chintagunta 1999; Elberse and Eliashberg 2003; Boatwright et al. 2005).

2. Behavioral models focus on the factors involved in an individual's decision to select a movie to watch (Eliashberg and Sawhney 1994; Sawhney and Eliashberg 1996; Zufryden 1996; De Silva 1998; Eliashberg et al. 2000). Such models usually employ a hierarchical framework that develops forecasts by relating behavioral traits of consumers to aggregate econometric factors.

Another classification is based on the timing of the forecast. Most of the proposed models are designed to produce forecasts before a movie's initial release (Litman 1983; Zufryden 1996; De Silva 1998; Eliashberg et al. 2000), while others focus on forecasting later-week revenues after a movie's early box office revenues become known (Sawhney and Eliashberg 1996; Neelamegham and Chintagunta 1999).
The latter category tends to generate more accurate forecasts because these models have access to more explanatory variables, including early box office receipts, critic reviews, and word-of-mouth effects. Our study proposes a family of diffusion models whose goal is to forecast later-week revenues very soon (i.e., within 2-3 days) after a movie's initial release. The novelty of our approach lies in the examination of various metrics of online ratings as a proxy of word-of-mouth. To the best of our knowledge, we are the first to examine the use of these metrics in the context of movie revenue forecasting.[4] Our contribution lies in establishing which metrics of online ratings are the best predictors of motion picture performance and in comparing the predictive power of these new metrics to that of more traditional explanatory variables used in past research, such as a movie's marketing expenditures, distribution strategy, and professional critic reviews.

[4] Concurrently and independently, Liu (2004) studied the impact of unstructured online discussions on movie revenues. Our study, in contrast, focuses on numerical online ratings. Whereas Liu found that the volume of discussion was significant but its valence (positive or negative) only marginally so, our study finds the valence of user ratings to be the most significant variable. Furthermore, our modified Bass model helps disentangle the different ways in which the volume and valence of ratings relate to the evolution of a movie's box office revenues.

Methodologies for measuring word-of-mouth

Traditional attempts to measure word-of-mouth are based on two principal techniques: inference and survey. For example, Bass (1969) and those who have extended his model typically use aggregated sales data to infer the model's coefficient of internal influence, which, in turn, is assumed to relate to word-of-mouth. As another example, Reingen et al. (1984) conduct a survey of the members of a sorority in which they compare brand preference congruity between women who lived in the same house and those who did not. They find that those who lived together had more congruent brand preferences than those who did not. The study then infers that those who lived together had more opportunities for interaction and, thus, that word-of-mouth communication was more prevalent. Surveys remain the most popular method to study word-of-mouth, largely because individuals can be asked directly about their communication habits; the error then lies in the self-reporting of behavior. Several well-known studies, such as Bowman and Narayandas (2001), Brown and Reingen (1987), Reingen and Kernan (1986), and Richins (1983), base their analyses on proprietary surveys designed to test a specific hypothesis related to word-of-mouth. The advent of the Internet introduced a third technique for measuring word-of-mouth: directly, through online discussion groups and online review forums. Researchers can easily gather large amounts of data from such forums. Nevertheless, sound methodological principles for analyzing such data are still in the process of being established. Previous research has looked at unstructured online discussion forums and has used volume and dispersion when examining online word-of-mouth. The theory behind measuring dispersion, or the spread of communication across communities, is that word of mouth spreads quickly within communities, but slowly across them (Granovetter 1973). Godes and Mayzlin (2004) found that the dispersion of conversations about weekly TV shows across Internet communities is positively correlated with the evolution of viewership of these shows. The theory behind volume is that the more consumers discuss a product, the higher the chance that other consumers will become aware of it. In a recent paper, Duan et al.
(2005) explore the dynamic relationship between online user reviews and motion picture box office revenues. They find that, whereas the volume of online postings shows significant correlation with box office sales, the valence (average numerical rating) of those postings does not have a significant impact. In this study we extend previous attempts to measure the impact of online word-of-mouth by looking at structured product rating forums and suggesting methodologies that endogenize the impact of volume and, thus, allow us to better explore the impact of the valence of online feedback.

Variable                                   | Min | Mean  | Max
Box office (aggregate; in millions)        | 2.5 | 68.1  | 403.7
Production budget (in millions)            | 2   | 46.1  | 140
Marketing budget (in millions)             | 2   | 24.3  | 50
Exhibition longevity (in weeks)            | 3   | 14    | 51
Screens in opening week                    | 4   | 2,393 | 3,615
Volume of total user ratings               | 67  | 689   | 6,295
Volume of first-week user ratings          | 2   | 312   | 3,802
Volume of critic ratings                   | 7   | 13    | 20
Average aggregate user rating (range 1-5)  | 1.9 | 3.4   | 4.4
Average critic rating (range 1-5)          | 1.4 | 3.1   | 4.6

Total number of movies: 80
Total number of user ratings: 55,156
Total number of critic ratings: 1,040
Total number of unique users: 34,893

Table 1: Key summary statistics of our data set.

3 Data Set

Data Collection Methodology

Data for this study were collected from Yahoo! Movies (http://movies.yahoo.com) and BoxOfficeMojo (http://www.boxofficemojo.com). From Yahoo! Movies, we collected the names of all movies released during 2002. For the purpose of our analysis, we excluded titles that were (a) not released in the United States, (b) not a theatrical release (e.g., DVD releases), or (c) not released nation-wide. For each of the remaining titles we collected detailed ratings information, including all professional critic reviews (text and letter ratings, which we converted to a number between 1 and 5) and all user reviews (date and time of review, user id, review text, and integer ratings between 1 and 5).
We used BoxOfficeMojo to obtain weekly box office, budget, and marketing expense data. This information was missing for several movies from the publicly accessible parts of that site. We obtained a data set of 80 movies with complete production, weekly box office, critic review, and daily user review data.[5] Our final data set consists of 1,188 weekly box office observations, 1,040 critic reviews (an average of 13 reviews per movie), and 55,156 user reviews from 34,893 individual users (an average of 689 reviews per movie and 1.5 reviews per user). Table 1 provides some key summary statistics.

[5] The final movie sample was found to have a similar overall profile to the full set of nationally released 2002 movies (in terms of genre, budget, and marketing), ensuring that no bias was introduced by considering only a subset of movies.

         | 2002 Yahoo! Movie Raters | 2001 US Moviegoers*
Age
  <18    | 13% | 15%
  18-29  | 58% | 35%
  30-44  | 23% | 28%
  45+    |  6% | 22%
Gender
  Men    | 74% | 49%
  Women  | 26% | 51%

* Source: Newspaper Association of America (NAA)

Table 2: Estimated demographic profile of Yahoo! Movies raters.

Demographics of Online Raters

We were able to collect partial rater demographic data by mining the user profiles associated with the raters' Yahoo IDs. About 85% of raters in our data set listed their gender and 34% their age. From that information, we constructed an estimate of the demographic profile of the Yahoo! Movies rater population (Table 2). We found that the demographic breakdown of online raters is substantially skewed relative to that of US moviegoers. Most notably, a disproportionately high percentage of online ratings were provided by young males under 30. This suggests that some rebalancing of online ratings might be required to improve their value in forecasting revenues.
Relationship between User and Critic Ratings

Since much work has been done on using critic reviews to predict movie revenue (Eliashberg and Shugan 1997; Reinstein and Snyder 2005; Basuroy, Chatterjee and Ravid 2003), it is natural to ask how well user ratings correlate with critic ratings. Table 3 depicts the correlation between critic and user ratings. All scores are relatively low. Interestingly, first-week user reviews exhibit higher correlation with critic reviews than do later-week reviews. Also, reviews posted by male users correlate better than reviews posted by female users. The low correlation between user and critic ratings emphasizes the importance of examining user reviews as a predictive tool, as the information provided by users is substantially different from the information provided by professional movie critics.

             | All raters | Male | Female
First week   | 0.63 | 0.61 | 0.46
Second week  | 0.58 | 0.57 | 0.53
Third week   | 0.53 | 0.46 | 0.45
All weeks    | 0.59 | 0.58 | 0.49

Table 3: Correlation of critic and user ratings.

Preponderance of Extreme User Ratings

Figure 1a plots the histogram of average user ratings for all movies in our data set. The histogram of average critic ratings (normalized to lie in the same interval as user ratings) is also plotted for comparison. User ratings are less evenly distributed than critic ratings, with the majority of movies receiving an average user rating between 3.5 and 4.5. Even more revealing is a plot of the relative incidence of the various types of ratings (Figure 1b). Critics seem to be rating movies on a (slightly upwardly biased) curve. In contrast, the majority of user ratings lie at the two extremes of the rating scale, with a strong emphasis on the positive end: almost half of all posted ratings are equal to the highest possible rating, 18% of ratings are equal to the lowest possible rating, and only about 30% are intermediate values.
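The critic-user correlations reported in Table 3 are presumably standard Pearson coefficients computed over per-movie average ratings; the sketch below illustrates that computation with made-up rating averages, not values from the paper's data set.

```python
# Pearson correlation between per-movie average critic and user ratings,
# in the spirit of Table 3. All rating values below are illustrative placeholders.
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

critic_avg = [3.1, 2.4, 4.2, 3.8, 1.9]  # hypothetical per-movie critic averages
user_avg   = [3.6, 2.9, 4.4, 4.1, 2.6]  # hypothetical per-movie user averages
r = pearson(critic_avg, user_avg)       # always lies in [-1, 1]
```

With these illustrative numbers the correlation comes out strongly positive; the correlations actually observed in the paper's data set are considerably lower (0.45-0.63).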
The preponderance of extreme reviews is consistent with similar findings related to online product reviews on Amazon.com and other sites (Admati and Pfleiderer 2000). It is also consistent with past research on word-of-mouth that finds that people are more likely to engage in interpersonal communication when they have very positive or very negative experiences (Anderson 1998).[6]

Dynamics of Ratings Volume

Online reviews are (at least in principle) contributed by people who have watched the movies being rated. It is thus expected that their daily volume will exhibit a strong correlation with the corresponding box office revenues and will decline over time. Figure 2 confirms this for "Spider-Man". Observe that the volume of daily ratings closely follows the box office peaks and valleys associated with weekends and weekdays, especially during the first two weeks. Most movies in our data set exhibit very similar patterns.

[6] It is important to note here that the skewed distribution of online ratings is not a cause for alarm and does not diminish their information value. In an interesting theoretical paper, Fudenberg and Banerjee (2004) prove that the presence of reporting bias (i.e., a higher propensity to communicate extreme rather than average outcomes) in a population does not diminish the ability of word-of-mouth to enable perfect social learning.

Figure 1: Comparison of user and critic rating behavior. (a) Histogram of average user and critic movie scores. (b) Relative use of different scores by users and critics.

Figure 2: Daily box office revenues and corresponding daily volume of user ratings for "Spider-Man" (all values have been normalized so that Day 1 = 100).
The correlation between total box office and total number of ratings for all movies in our data set is 0.80. This suggests that users rate movies soon after they watch them. It also suggests that the volume of online ratings is best thought of as a proxy of past sales.

4 Models

Since one of our objectives is to assist movie exhibitors in better managing supply (i.e., the number of screens on which a movie is exhibited each week), we are interested in forecasting both a movie's total revenues and its revenue trajectory over time. In common with most models of new product sales growth (Mahajan et al. 1990; Meade 1984), our model is based on a hazard rate formulation. The hazard rate of product adoption is the instantaneous probability that a representative consumer who has not yet adopted a (durable) product will do so at time t. Assuming that the size of the market is known, if F(t) denotes the cumulative fraction of adopters at time t and Ḟ(t) denotes its derivative with respect to time (i.e., the instantaneous rate of adoption at time t), the hazard rate of adoption is defined as:

    h(t) = Pr[adopts at time t] / Pr[adopts at time τ ≥ t] = Ḟ(t) / (1 − F(t))    (1)

If the size of the market is N and the purchase price is p, total revenues M are given by M = Np. From equation (1), the evolution of cumulative revenues R(t) = M F(t) is then governed by the following differential equation:

    Ṙ(t) = (M − R(t)) h(t)    (2)

From a theoretical perspective, hazard rate models have been shown to provide good approximations of the aggregate outcome of a large number of individual-level stochastic product adoption processes (Chatterjee and Eliashberg 1990). From a practical perspective, most growth curves used in sales forecasting by practitioners can be derived from equation (2) by assuming different functional forms for the hazard rate h(t).
For example, a constant hazard rate h(t) = a gives rise to an exponential curve, whereas a monotonically increasing or decreasing hazard rate h(t) = a t^b gives rise to a Weibull distribution. The well-known Bass model (Bass 1969) also arises as a special case of (2) if we set h(t) = P + Q R(t). A common interpretation of the Bass model is that product adoption is driven by two forces: an "external" force, which typically relates to advertising and publicity and is represented by the coefficient P, and an "internal" force, which relates to word-of-mouth and is represented by the coefficient Q multiplied by the cumulative number of past adopters. Figure 3 shows a plot of the empirical hazard rate curves corresponding to a representative subset of movies in our sample. We immediately see that these curves fall into two categories: hazard rates that steadily decline over time, corresponding to wide-release ("blockbuster") movies, and curves that increase and then decline, corresponding to narrow-release ("sleeper") movies. The form of the curves immediately rules out the use of constant and monotonically increasing hazard rate models. Interestingly, our empirical results also rule out the standard version of the Bass model, where both coefficients P, Q are assumed to be positive (the standard version of the model implies a monotonically increasing hazard rate). In the rest of the paper we will relax the assumption Q > 0 and will allow coefficient Q to take negative values as well (corresponding to negative word-of-mouth about a movie).

Figure 3: Empirical hazard rates of movie revenues in our data set. (a) Illustrative subset of wide-release movies. (b) Illustrative subset of sleeper movies.
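The correspondence between hazard-rate shapes and revenue growth curves can be illustrated by integrating equation (2) numerically. The sketch below uses simple forward-Euler integration; all parameter values (M, a, P, Q) are illustrative choices, not estimates from the paper.

```python
# Forward-Euler integration of equation (2): dR/dt = (M - R(t)) * h(t, R),
# for different hazard-rate specifications. Parameter values are illustrative.

def simulate(hazard, M=100.0, weeks=20, dt=0.01):
    """Integrate cumulative revenues R(t); return R sampled at each week's end."""
    R, t, samples = 0.0, 0.0, []
    steps = int(1 / dt)  # integration steps per week
    for _ in range(weeks):
        for _ in range(steps):
            R += (M - R) * hazard(t, R) * dt
            t += dt
        samples.append(R)
    return samples

# Constant hazard h = a yields the exponential curve R(t) = M(1 - e^(-a t)).
exp_curve = simulate(lambda t, R: 0.3)

# Bass hazard h = P + Q*R(t); with Q < 0 the hazard falls as adopters
# accumulate, mimicking the effect of negative word-of-mouth.
bass_curve = simulate(lambda t, R: 0.05 + 0.004 * R)
neg_wom    = simulate(lambda t, R: 0.30 - 0.002 * R)
```

Note that with Q < 0 the hazard declines as cumulative revenues grow, which is how the relaxed model can represent negative word-of-mouth while still producing a well-behaved growth curve.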
Drawing upon the unique properties of the motion picture industry, we propose a novel set of hazard rate models that are better able to fit the shape of movie revenue curves. Our models are theoretically justified by the following two observations: (i) the bulk of a movie's marketing effort occurs just before the movie's premiere and declines rapidly post-release;[7] most movies thus get an initial publicity "jolt" that diminishes in later weeks; (ii) word-of-mouth is localized in time; people talk more about movies immediately after watching them, and less as time goes by.[8] We incorporate these two observations into the Bass hazard rate h(t) = P + Q R(t) by introducing discount factors δ and ε that model the post-release decay of movie publicity and the time-locality ("perishability") of word-of-mouth, respectively. We obtain the following family of modified Bass hazard functions:

    h(t) = P δ^t + Q ∫_{τ=0}^{t} Ṙ(t − τ) ε^τ dτ,    0 ≤ δ ≤ 1,  0 ≤ ε ≤ 1    (3)

[7] Elberse and Anand (2005) report that the highest median TV advertising spending occurs immediately before a movie's opening weekend; it drops to less than 30% of its peak value in the following week and to less than 10% in later weeks.
[8] Eliashberg et al. (2000) recognize and explicitly take this phenomenon into consideration in their MOVIEMOD pre-release forecasting model. Elberse and Eliashberg (2003) also implicitly incorporate the "perishability" of word-of-mouth in their model by using a word-of-mouth proxy variable that is based only on previous-period (rather than cumulative) data.

Substituting into (2), we obtain our revenue forecasting equation:
    Ṙ(t) = (M − R(t)) (P δ^t + Q ∫_{τ=0}^{t} Ṙ(t − τ) ε^τ dτ)    (4)

Despite its apparent complexity, equation (4) has a simple intuitive interpretation: the instantaneous probability that a non-adopter will adopt a product at time t is proportional to the residual impact of early publicity surrounding the product, as well as to word-of-mouth from previous adopters, where the impact of conversations with recent adopters is greater than the impact of conversations with earlier adopters (or, alternatively, where recent adopters talk more than early adopters). The above equation defines a fairly general class of models. Depending on the values of parameters P, Q, M, δ, ε, hazard functions that are monotonically increasing, monotonically decreasing, or inverse U-shaped (first increasing, then decreasing) can be generated. For δ = ε = 1, equation (4) reduces to the standard Bass model. Given a sample of movies with known weekly box office revenues, production, and ratings data, estimation of a revenue forecasting model based on equation (4) requires two steps. First, using nonlinear least squares estimation, equation (4) is fitted to weekly box office revenue data. This step produces a set of coefficients P_i, Q_i, M_i, δ_i, ε_i for each movie in our sample. Second, linear prediction models are developed for each of the five coefficients by regressing the estimates produced by the first step against the set of available covariates. To forecast the future box office revenues of a new movie we reverse the process: using early (production, box office, and online ratings) data and the prediction models of the second step, we derive estimates P̂, Q̂, M̂, δ̂, ε̂ of the five coefficients for the new movie. Substitution into (4) and numerical integration then provide a forecast of the movie's cumulative revenues at any future point in time. We experimented with fitting the full 5-parameter model to the revenue data of our sample.
Although model fit to weekly revenue data was excellent (Adjusted R2 > 0.99), the overall ability of the model to forecast revenues was very poor, since the errors of the five linear prediction models generated by the second step of the model estimation procedure were compounded in the final forecast. A more effective model was obtained by limiting the degrees of freedom of equation (4). Specifically, we assume that (i) discount factors δ, ε do not change across different movies, and (ii) the maximum theoretical market size M is the same for all movies and equal to the entire population of moviegoers. The theoretical justification for the last assumption is based on the observation that most movies are taken out of theaters not when they exhaust their full market potential, but rather when their rate of revenue growth falls below the opportunity cost of screening the movie (relative to screening a newer, potentially more profitable movie)⁹. Based on this observation, we make the (arguably rather extreme) assumption that, if a movie remains in theaters forever, all moviegoers will eventually watch it, albeit at an arbitrarily slow rate. If we take δ and ε as given, the above two assumptions leave our model with only two free parameters (Pi, Qi) per movie. We acknowledge that our assumption regarding a movie's eventual number of adopters (equal to the entire population for all movies) is unorthodox, especially in comparison with the traditional Bass model literature. Observe, however, that, whereas an accurate estimate of a product's maximum market potential is essential in the traditional Bass model, it is less so in our model. The traditional Bass model h(t) = P + QR(t), P, Q > 0 has a monotonically increasing hazard rate. Therefore, the only way in which the sales curve can level off is if the market is exhausted. This property makes the shape of sales forecasts particularly sensitive to the estimated maximum market potential.
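The per-movie estimation of (Pi, Qi), with M, δ, and ε held fixed, can be illustrated with a simple least-squares grid search over a discretized version of equation (4). This is a hedged sketch under our own assumptions (function names, grid ranges, and parameter values are ours), not the authors' nonlinear least squares routine:

```python
def simulate(P, Q, M, delta, eps, weeks):
    """Weekly revenue increments under the discretized equation (4)."""
    R, inc = 0.0, []
    for t in range(weeks):
        wom = sum(inc[t - tau] * (eps ** tau) for tau in range(1, t + 1))
        rdot = (M - R) * (P * (delta ** t) + Q * wom)
        inc.append(rdot)
        R += rdot
    return inc

def fit_PQ(observed, M, delta, eps, P_grid, Q_grid):
    """Grid search for the (P, Q) pair minimizing squared error
    against an observed vector of weekly revenues."""
    best, best_err = None, float("inf")
    for P in P_grid:
        for Q in Q_grid:
            pred = simulate(P, Q, M, delta, eps, len(observed))
            err = sum((p - o) ** 2 for p, o in zip(pred, observed))
            if err < best_err:
                best, best_err = (P, Q), err
    return best

# Synthetic sanity check: recover known parameters from simulated data.
true_P, true_Q = 0.01, 1e-4
data = simulate(true_P, true_Q, 1000.0, 0.6, 0.6, 15)
Phat, Qhat = fit_PQ(data, 1000.0, 0.6, 0.6,
                    P_grid=[0.005, 0.01, 0.02], Q_grid=[5e-5, 1e-4, 2e-4])
```

In practice a continuous optimizer would replace the grid; the sketch only illustrates that, with M, δ, ε fixed, the fitting problem has two free parameters per movie.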
The more general model formulation we propose allows the hazard rate to become arbitrarily small before the market is saturated. As long as we choose M to be higher than the highest total revenues of any product in our sample, our model is capable of approximating a large variety of sales growth curves using only two free parameters. Most notably, our model does not require the direct estimation of a movie's market potential as a separate parameter, avoiding an additional source of forecasting error. Of course, the litmus test of any forecasting model is its forecasting accuracy. Appendix A (to be read after Section 5) compares the forecasting accuracy of our 2-parameter models with that of a more conventional 3-parameter family that assumes that a movie's total box office revenues represent its market potential. We show that the 2-parameter family outperforms the 3-parameter family by a factor of almost 100%.

9 The increasingly profitable secondary market of movie rentals and DVDs provides compelling evidence for the validity of this assumption.

Production, marketing, and distribution strategy
  BDG      Production budget (in millions of $)
  MKT      Estimated marketing costs (in millions of $)
  SCR      Number of screens in opening week
  SLEEPER  Categorical variable indicating if movie is sleeper or wide-release
MPAA Rating (dummy variables)
  G, PG, PG13, R, NR
Genre (dummy variables)
  SCIFI, THRILLER, COMEDY, ROMANCE, DRAMA, ACTION, KIDS
Professional Critic Ratings
  CRAVG    Arithmetic mean of professional critic reviews
User Ratings
  BAVG     Balanced arithmetic mean of user ratings posted during opening weekend*
  TOT      Total number of user ratings posted during opening weekend
  AENTR    Entropy of age group distribution of opening weekend raters
  GENTR    Entropy of gender distribution of opening weekend raters
Box office data
  BOX      Box office revenues during opening weekend (in millions of $)

*Average of arithmetic means of ratings posted by males and females

Table 4: List of independent variables.
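The user-rating metrics in Table 4 are straightforward to compute from raw rating records. A minimal sketch (function names and example data are ours):

```python
import math
from collections import Counter

def balanced_average(ratings_by_gender):
    """BAVG: average of the per-gender arithmetic means, giving equal
    weight to male and female raters regardless of group size."""
    group_means = [sum(r) / len(r) for r in ratings_by_gender.values()]
    return sum(group_means) / len(group_means)

def entropy(labels):
    """GENTR / AENTR: entropy H = -sum_i p_i log p_i of the distribution
    of raters across disjoint classes (gender or age group)."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Example: three male ratings averaging 4.0 and one female rating of 2.0.
bavg = balanced_average({"M": [5, 4, 3], "F": [2]})   # (4.0 + 2.0) / 2 = 3.0
gentr = entropy(["M", "M", "M", "F"])                  # gender diversity of the raters
```

A raw average of the four ratings above would be 3.5; the balanced average corrects for the over-representation of one gender among online raters.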
5 Results

Independent Variables

Table 4 lists all independent variables used in subsequent analyses. Production, Marketing and Availability. Several authors have shown that the budget, advertising, and availability of a film are significantly related to its box office performance (Litman 1983; Litman and Kohl 1989; Litman and Ahn 1998; Ravid 1999; Elberse and Eliashberg 2003). Accordingly, we include a movie's production budget (BDG), marketing budget (MKT), and number of opening weekend screens (SCR) in our variable list. Release Strategy. Most movies are released using one of two distinct strategies. Wide-release or "blockbuster" movies (such as Star Wars) open simultaneously in large numbers of theaters worldwide and are accompanied by intensive pre-release marketing campaigns. Revenues for such movies typically peak during the first weekend and exhibit a steady decline in subsequent weeks. "Sleeper" movies (such as My Big Fat Greek Wedding) are initially released in small numbers of theaters with modest marketing campaigns and rely on word-of-mouth for growth. Revenue streams for such movies typically increase for several weeks before they start to decline. Given the different growth patterns of these two movie categories, it is reasonable to expect that release strategy will have an important impact on parameters P and Q. Accordingly, we use a dummy variable (SLEEPER) to distinguish between the two classes of movies in our sample. We coded a movie as a "sleeper" if its number of opening weekend screens was less than 300¹⁰. MPAA Ratings. Ravid (1999) found MPAA ratings to be significant variables in his regressions. We code MPAA ratings using five dummy variables (G, PG, PG13, R, and NR). Genre. Several papers have included the genre of a film as a control variable (Austin and Gordon 1987; Litman 1983; Litman and Ahn 1998). We collected the genre description from Yahoo!
Movies and coded a movie's genre using 7 dummy variables (Sci-Fi, Thriller, Children, Romance, Comedy, Action, Drama). Professional Critics. An important objective of our study is to compare the relative predictive power of professional critic and user ratings. Accordingly, we included the arithmetic mean of the numerical equivalent (see Section 3) of all professional critic ratings published by Yahoo! for each movie (CRAVG). User Ratings. Past work on online word-of-mouth has considered the relationship of the volume, valence, and dispersion of online conversations to product revenues (Godes and Mayzlin 2004; Liu 2004; Duan et al. 2005). We use the total number of posted ratings during the first three days of a movie's release (TOT) as our measure of volume. We base our measure of valence on the arithmetic mean of posted ratings during the same period. Given the substantial discrepancy that exists between the demographics of online reviewers and those of moviegoers (Table 2), we found that a balanced average (BAVG) metric, equal to the average of the arithmetic means of ratings posted by (self-reported) male and female Yahoo! users during the period of interest, performed better than the raw average of all posted ratings.

10 Our data set exhibited a clear clustering of movies with regard to the number of opening weekend screens: the highest number of opening weekend screens for a movie classified as "sleeper" was 208. In contrast, the lowest number of opening weekend screens for a wide-release movie was 809.

According to the theory of strong and weak ties (Granovetter 1973), word-of-mouth is more effective when it spreads among different social groups than when it remains confined within a single group. The dispersion of online word-of-mouth about a product has, thus, been shown to exhibit positive correlation with the evolution of its revenues (Godes and Mayzlin 2004). Finding a good metric for dispersion was tricky in our context, because Yahoo!
Movies does not allow threaded discussions through which one could infer a network of strong and weak ties. We hypothesized, however, that dispersion can be proxied through some measure of the demographic (gender, age) diversity of each movie's raters. The underlying assumption is that most movie conversations take place within social groups of similar age or of the same gender. To test this hypothesis we included the entropy of the (self-reported) gender and age distribution of each movie's opening weekend raters in our list of independent variables¹¹.

Nonlinear Model Estimation

As we discuss in Section 4, estimation of our 2-parameter model requires fixing the values of parameters M, δ, ε for all movies. Since we did not have a basis for selecting a particular set of δ, ε, we estimated separate models for all possible combinations of discount factors in increments of 0.1. This gave us 100 models. Furthermore, we found that, as long as it was higher than the total revenues of the highest-grossing movie in our sample, the choice of M was not very crucial to the model's forecasting accuracy (even though it did affect the combination of δ, ε that produced the best results). The reported results are based on assuming M = $1000 million. This, in turn, corresponds to the assumption of a population of 166 million moviegoers and an average ticket price of $6. The average fit of equation (4) to weekly revenue vectors of movies in our data set was excellent, with an average R2 > 0.98 for all pairs of discount factors where δ = ε. Model fit deteriorated rapidly as parameters δ and ε diverged from each other¹².

11 Given a population whose members are distributed among a finite number of disjoint classes i = 1, ..., N with respective probabilities pi, entropy, defined as H = −Σi pi log pi, represents a measure of population diversity with respect to that classification. Entropy is minimized if all members of the population belong to the same class.
On the other hand, entropy is maximized if the population is evenly distributed among all classes.

12 We were intrigued by this interesting empirical relationship; a rigorous exploration of its causes, however, falls outside the scope of this paper. One possible explanation is that the mechanisms of decay ("consumer forgetting") are similar for stimuli received through publicity or word-of-mouth channels. Thus, when averaged across all movies, publicity and word-of-mouth decay at the same rate. Further investigation is needed to explore the validity of this hypothesis and the presence or absence of similar relationships in other markets.

Regression Analysis

In this section we report the results of regressing the sets of coefficients Pi and Qi obtained by the previous nonlinear estimation step against our list of independent variables. In selecting each model, we followed a variable selection procedure similar to the traditional stepwise selection method: in each step, we included a significant variable (at the 5% level) that brought the highest increase in adjusted R2 and checked whether inclusion of that variable caused a blow-up of the variance inflation factor (VIF), a commonly used measure of multicollinearity. Following each variable inclusion step, we removed any previously included variables that were no longer significant (at the 6% level). We stopped adding variables when the adjusted R2 did not increase, when additional variables were no longer significant, or when adding new variables resulted in a VIF higher than 8 for any of the variables. We repeated the above procedure for all combinations of discount factors and only accepted the subset of variables that were significant in all 100 models. The resulting models are summarized in Table 5¹³.

13 The specific coefficients of each model depend on the choice of discount factors δ and ε. Table 5 reports coefficients obtained for δ = ε = 0.6.

Coefficient P. Coefficient P can be predicted with very high accuracy (Adj-R2 = 0.99) from first-weekend box-office data (BOX). This is not surprising and follows directly from the model definition (for t = 0, equation (4) gives P = Ṙ(0)/M). To get a better sense of the conceptual significance of coefficient P, we removed BOX from the list of independent variables and repeated variable selection on the remaining covariates. We obtained a lower but still respectable model fit (Adj-R2 = 0.78). The two variables that were significant were marketing budget (MKT) and user ratings volume (TOT), supporting our interpretation of coefficient P as capturing the "external" forces (marketing, publicity) that drive initial movie revenues¹⁴.

14 We believe that the high statistical significance of the volume of opening week ratings (TOT) in predicting coefficient P is simply a consequence of the high correlation (0.80) between TOT and BOX, rather than a statement about the impact of the volume of ratings on initial movie performance. TOT here acts simply as a proxy of box office revenues, capturing a fraction of the variance of revenues that cannot be accounted for by the variation of MKT.

Coefficient Q. Five variables were significant in predicting coefficient Q. Among categorical variables, SLEEPER and PG were significant. The significance of SLEEPER is consistent with the higher relative importance of word-of-mouth for sleeper movies. The (positive) significance of PG relates to the fact that movies with less restrictive MPAA ratings generally do better in the box office. Among ordinal covariates the three significant variables were average user ratings (BAVG), average critic ratings (CRAVG), and gender entropy (GENTR). Observe that the standardized coefficient of BAVG (0.39) is almost twice as large as the standardized coefficient of CRAVG (0.20). This indicates that average user ratings are more influential than average professional critic reviews in predicting a movie's revenue trajectory. Given the amount of attention that critic ratings have received in the past, this result has considerable practical consequences. On the other hand, the simultaneous significance of BAVG and CRAVG, together with the relatively low correlation between user and critic ratings (Table 3), reinforces our earlier remark that these two variables should be considered as complementary proxies of a movie's revenue potential. Finally, the significance of GENTR states that diversity of a movie's online raters (with respect to gender) is positively correlated with future revenues.

Dependent Variable: P
Variable    Coefficient (Std. Coeff.)   Std. Error   t-value   p-value
BOX         1.71E-03 (0.99)             2.33E-05     73.51     0
Intercept   1.86E-03                    5.77E-04     3.23      0.001
Adjusted R2 = 0.99; F-statistic = 5404.00 (p = 0)

Dependent Variable: P (BOX removed from variable list)
Variable    Coefficient (Std. Coeff.)   Std. Error   t-value   p-value
TOT         5.23E-05 (0.66)             4.60E-06     11.37     0
MKT         1.19E-03 (0.38)             1.80E-04     6.65      0
Intercept   -1.08E-02                   4.23E-03     -2.54     0.01
Adjusted R2 = 0.78; F-statistic = 140.10 (p = 0)

Dependent Variable: Q
Variable    Coefficient (Std. Coeff.)   Std. Error   t-value   p-value
SLEEPER     3.85E-04 (0.46)             7.00E-05     5.51      0
PG          1.65E-04 (0.21)             5.60E-05     2.97      0.004
BAVG        1.65E-04 (0.39)             3.70E-05     4.46      0
CRAVG       3.10E-05 (0.20)             1.40E-05     2.22      0.02
GENTR       2.84E-04 (0.15)             1.48E-04     1.92      0.05
Intercept   -9.00E-04                   1.40E-04     -6.42     0
Adjusted R2 = 0.61; F-statistic = 26.07 (p = 0)

Table 5: Regression models for predicting coefficients P and Q.
This finding is consistent with Godes and Mayzlin's (2004) result that higher dispersion of word-of-mouth among different groups correlates with higher future viewership in the context of TV shows. Interestingly, covariates relating to marketing and early box-office revenues were significant in predicting coefficient P, but not in predicting coefficient Q. Similarly, covariates relating to user and critic ratings were significant in predicting Q, but not in predicting P. This is consistent with the theoretical interpretation of coefficients P and Q of equation (4) as capturing the intensity of publicity and word-of-mouth respectively, and reinforces the validity of using a modified Bass equation to model the evolution of motion picture revenues. Observe, finally, that the volume of ratings (TOT) was not significant in predicting coefficient Q. This result is not surprising, and does not contradict the results of Liu (2004) or Duan et al. (2005), both of whom found the volume of online conversations to be highly significant. To see this, observe that the structure of equation (4) already assumes that the impact of word-of-mouth on revenues is the product of coefficient Q multiplied by the (discounted) number of past adopters. Our data indicates that the volume of ratings is highly correlated with the volume of sales (Figure 2). One therefore expects that, if equation (4) provides a correct description of the underlying phenomenon, the impact of the volume of ratings would be absorbed by the term ∫_{τ=0}^{t} Ṙ(t − τ) ε^τ dτ and would not be significant in predicting coefficient Q. The fact that the volume of ratings was not a significant predictor of Q, thus, constitutes a further confirmation of our modeling assumptions.
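As an aside, the VIF screen used in the variable selection procedure can be illustrated for the two-regressor case, where VIF = 1/(1 − r²) with r the correlation between the two covariates; at the TOT-BOX correlation of 0.80 reported in footnote 14, this gives 1/(1 − 0.64) ≈ 2.8, below the stepwise procedure's cutoff of 8. A minimal sketch (function names are ours):

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_regressors(x, y):
    """VIF for either regressor in a two-regressor model:
    VIF = 1 / (1 - R^2), and with two regressors R^2 = r^2."""
    return 1.0 / (1.0 - pearson_r(x, y) ** 2)
```

With more than two regressors, R_j² comes from regressing each covariate on all the others, but the 1/(1 − R²) blow-up mechanism is the same.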
ε \ δ    0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1.0
0.1    0.180   0.242   0.271   0.465   0.903   1.584   2.597   4.460   7.213  12.216
0.2    1.960   0.165   0.213   0.305   0.578   1.076   1.897   3.282   5.688  10.255
0.3    4.986   0.443   0.156   0.221   0.395   0.776   1.429   2.555   4.446   8.099
0.4    5.756   0.940   0.271   0.150   0.260   0.543   1.076   2.008   3.717   6.761
0.5    6.292   8.474   0.551   0.299   0.144   0.325   0.763   1.545   2.964   5.648
0.6    5.570   6.108   6.536   2.746   0.382   0.141   0.431   1.104   2.322   4.562
0.7    2.440   2.503   5.403   5.037   4.978   0.479   0.143   0.617   1.651   3.762
0.8    1.021   1.141   2.275   2.235   2.978   1.292   0.606   0.147   0.884   2.557
0.9    1.833   2.260   2.331   0.985   0.969   1.133   0.945   0.732   0.154   1.456
1.0    1.971   1.394   3.160   1.055   1.163   1.009   1.023   2.267   1.452   0.161

Table 6: Mean relative absolute error (RAE) of total revenue forecasts obtained through 2-parameter forecasting models and different pairs of discount factors (rows: ε; columns: δ).

Forecasting Accuracy

To test the forecasting accuracy of our models we followed a procedure similar to that used by Sawhney and Eliashberg (1996). Specifically, we randomly divided our data set into a training set of 50 movies and a hold-out set consisting of the remaining 30 movies. We used the training set to calibrate regression equations for P and Q and then applied the equations to the hold-out set to obtain forecasts of a movie's total revenue at the end of its exhibition history. Table 6 lists the mean relative absolute error (RAE = |Predicted − Actual|/Actual) associated with these forecasts for each of the 100 combinations of discount factors δ, ε we considered. Observe that forecasting errors are minimized when δ = ε and grow rapidly as the two discount factors diverge from each other. We will, therefore, focus our attention on the case δ = ε, corresponding to the diagonal terms of Table 6. As the two discount factors range between 0 and 1, mean RAE first declines, reaches a minimum (14.1%) at δ = ε = 0.6, and then begins to grow again.
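The accuracy metric of Table 6 is simple to compute; a minimal sketch (names and example values are ours):

```python
def rae(predicted, actual):
    """Relative absolute error: |predicted - actual| / actual."""
    return abs(predicted - actual) / actual

def mean_rae(forecasts):
    """Mean RAE over a hold-out set of (predicted, actual) revenue pairs."""
    return sum(rae(p, a) for p, a in forecasts) / len(forecasts)

# Example: forecasts of $90M and $115M against actual revenues of $100M each.
m = mean_rae([(90.0, 100.0), (115.0, 100.0)])  # (0.10 + 0.15) / 2 = 0.125
```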
Interestingly, the case δ = ε = 1, which corresponds to the standard Bass model, has a mean RAE of 16.1%. This is 14% higher than the mean RAE of our best discounted model. Our forecasting results, thus, confirm our theory-based hypothesis that the introduction of discount factors to the two terms of a Bass equation improves the model's forecasting accuracy in the context of movie revenues. Of the two post-release motion picture forecasting models we are aware of, only the model of Sawhney and Eliashberg (1996) is directly comparable to ours¹⁵. Sawhney and Eliashberg (1996) developed and tested BOXMOD-I, a model for forecasting the gross revenues of motion pictures based on their early box-office data. They tested how the forecasting accuracy of their model improves as more box-office data becomes available and reported mean RAE of 71.1%, 51.6%, 13.2%, 7.2%, and 1.8% when using no box-office data, one week of data, two weeks of data, three weeks of data, and all available box-office data, respectively. Using only 3 days of box-office and user and critic ratings data, our best model (the 2-parameter model with discount factors δ = ε = 0.6) achieves levels of forecasting accuracy (mean RAE of 14.1%) for which BOXMOD-I requires two weeks of box office data. This comparison reinforces our original hypothesis that the use of online ratings enables reliable forecasts of the impact of a new experience good to be made much faster than with older methodologies¹⁶.

15 The model of Neelamegham and Chintagunta (1999) focuses on predicting first-week viewership for movies that are introduced sequentially in different markets (e.g., different countries). They use post-release data from one market in order to predict the movie's performance in another market. Their objective, thus, is different from ours: our model uses early box office and ratings data to predict a movie's future performance in the same market.
16 Since BOXMOD-I does not incorporate covariates, our result should be interpreted as evidence for the explanatory power of online ratings rather than as a statement about the power of the underlying behavioral model on which BOXMOD-I is based.

6 Summary, Managerial Implications, and Research Opportunities

Product review sites are widespread on the Internet and rapidly gaining popularity among consumers. Previous research has established that online product ratings have an influence on consumer behavior (Chevalier and Mayzlin 2003; Senecal and Nantel 2004). This paper shows that these systems can serve as a valuable source of information for firms as well. Specifically, firms can use statistics of online ratings as a reliable proxy of word-of-mouth in revenue forecasting models for new experience goods. We apply this idea to the context of motion pictures and propose motion picture revenue-forecasting models that use statistics of online movie reviews posted by users on Yahoo! Movies during the first weekend of a new movie's release to forecast that movie's future box-office performance. Online movie ratings are available in large numbers within hours of a new movie's theatrical release. As a predictor of a movie's long-term revenues we have found them to be more informative than other measures currently used by industry experts, such as critic reviews and early cumulative revenues. Their use, thus, allows the generation of reliable forecasts much sooner. Specifically, we have shown that, using only opening weekend (box-office, user, and critic review) data, our approach can generate forecasts whose accuracy would require two weeks of data using older techniques. The ability to derive early post-release forecasts of a new movie's performance has traditionally been of value to exhibitors (theater owners).
Exhibitor chains need to manage the yield from their exhibition capacity, based on their estimates of demand for movies that they are currently exhibiting. Using such estimates they can adapt the exhibition capacity allocated to a new movie, either by dropping the movie from a theater or by shifting it to a smaller (or larger) screening room. They are, thus, very interested in early forecasts of gross box-office revenues in making their exhibition decisions17 . We argue that the ability to generate reliable forecasts so quickly after a movie’s premiere can have important implications for motion picture marketing as well. Such knowledge will allow movie distributors to fine-tune a movie’s campaign or, perhaps, to develop entirely new marketing strategies that can respond to an audience’s initial reception of a new movie18 . In addition to its managerial implications, our study has produced several empirical insights related to the use of online product ratings in revenue forecasting. First, we found that the average valence of opening weekend user ratings was a highly significant predictor of a movie’s long-term box office performance. Given that the demographics of online raters are skewed relative to the population of moviegoers, we also found that rebalancing the average valence of user ratings, by giving equal weight to the arithmetic mean of ratings posted by males and females, improves their predictive accuracy. Second, our analysis found user ratings to be more influential in predicting future revenues than average professional critic reviews. Given the amount of attention that critic ratings have been receiving until now, this result has considerable practical consequences. On the other hand, we found the correlation between the user and critic ratings to be relatively low; our models were able to achieve better forecasting accuracy by considering a weighted average of user and critic ratings. 
This suggests that a degree of complementarity exists between the viewpoints of users and experts; both can, thus, add value to predicting a new product's future success. Third, we found that the gender diversity of a movie's online raters exhibits a positive correlation with that movie's long-term revenues. This finding supports the theory that word-of-mouth that is more widely dispersed among different social groups is more effective, and suggests the need for further research in developing good measures of WOM dispersion from online data. Fourth, we found that the weekly volume of online ratings exhibits high correlation with weekly sales, suggesting that people post ratings soon after they watch a movie. Our study supports the hypothesis, commonly made in diffusion theory, that the impact of word-of-mouth on future sales is proportional to the volume of past adopters, but does not find any special significance of the volume of online ratings beyond that. We conclude by pointing out a number of limitations of the current study and associated opportunities for future research. First, in common with the majority of past work in this area, our models do not incorporate the impact of competition from other movies. Such an enhancement is not possible with our current data set, since we do not have complete box office and production data for all movies playing in all weeks.

17 Today exhibitors usually commit to exhibit a movie for a minimum of three to four weeks. However, the increasing volatility of second- and later-week revenues (Lippman 2003), plus the availability of rapid forecasting tools such as the ones we propose in this paper, might lead the industry to adopt more flexible contracts that allow exhibitors to re-evaluate their decisions immediately after the opening week.
18 See Mahajan, Muller and Kerin (1984) for some early ideas on how firms can adapt advertising policies to positive and negative word-of-mouth.
Second, our objective in this paper was to generate future revenue forecasts from a single, early measurement of box office revenues and online ratings. We, thus, do not have to worry about potential endogeneity issues associated with the interplay between word-of-mouth and revenues. In future work, we plan to examine a model that uses measurements of revenues and ratings at multiple points in time to obtain more accurate forecasts; in such a model, endogeneity will be a more important factor and will be dealt with accordingly. Third, given its forecasting focus, our study did not attempt to consider the important question of whether online ratings influence (as opposed to predict) future revenues. The perspective of the paper has been that online ratings offer firms a valuable, real-time "window" that allows very fast measurement of what consumers think about a new product, as opposed to a force that, in itself, influences consumer behavior. Throughout the paper we have, thus, been very careful not to make any statements about causality. Given the increasing popularity of online product review sites, an investigation of causality would be an exciting next step of this line of research.

References

Admati, A., and Pfleiderer, P. (2000) Noisytalk.com: Broadcasting Opinions in a Noisy Environment. Working Paper 1670R, Stanford University.
Ainslie, A., Dreze, X., and Zufryden, F. (2003) Modeling Movie Lifecycles and Market Share. Working Paper.
Anderson, E.W. (1998) Customer satisfaction and word of mouth. Journal of Service Research 1(1): 5-17.
Austin, B., and Gordon, T. (1987) Movie Genres: Toward a Conceptualized Model and Standardized Definition. In B. Austin (ed.) Current Research in Film: Audience, Economics, and Law, Vol. 3. Ablex Publishing Co., Norwood, NJ.
Banerjee, A., and Fudenberg, D. (2004) Word-of-mouth learning. Games and Economic Behavior 46(1): 1-22.
Bass, F. (1969) A new product growth model for consumer durables. Management Science 15(January): 215-227.
Basuroy, S., Chatterjee, S., and Ravid, S.A. (2003) How Critical are Critical Reviews? The Box Office Effects of Film Critics, Star Power and Budgets. Journal of Marketing 67(October): 103-117.
Bowman, D., and Narayandas, D. (2001) Managing customer-initiated contacts with manufacturers: The impact on share of category requirements and word-of-mouth behavior. J. Marketing Res. 38: 291-297.
Brown, J.J., and Reingen, P. (1987) Social ties and word-of-mouth referral behavior. J. Consumer Res. 14: 350-362.
Chatterjee, R., and Eliashberg, J. (1990) The Innovation Diffusion Process in a Heterogeneous Population: A Micromodeling Approach. Management Science 36(9): 1057-1079.
Chevalier, J.A., and Mayzlin, D. (2003) The Effect of Word of Mouth on Sales: Online Book Reviews. Yale SOM Working Papers ES-28 & MK-15.
De Silva, I. (1998) Consumer Selection of Motion Pictures. In B. Litman (ed.) The Motion Picture Mega-Industry. Allyn & Bacon Publishing, Inc., Boston, MA.
De Vany, A., and Walls, W.D. (1999) Uncertainty in the movie industry: Does star power reduce the terror of the box office? J. Cultural Econom. 23(4): 285-318.
Dodds, J., and Holbrook, M. (1988) What's an Oscar Worth? An Empirical Estimation of the Effect of Nominations and Awards on Movie Distribution and Revenues. In B. Austin (ed.) Current Research in Film: Audience, Economics, and Law, Vol. 4. Ablex Publishing Co., Norwood, NJ.
Duan, W., Gu, B., and Whinston, A.B. (2005) Do Online Reviews Matter? An Empirical Investigation of Panel Data. Working Paper, University of Texas at Austin.
Elberse, A., and Anand, B.N. (2005) The Effectiveness of Pre-Release Advertising for Motion Pictures. Working Paper, Harvard Business School.
Elberse, A., and Eliashberg, J. (2003) Demand and Supply Dynamics for Sequentially Released Products in International Markets: The Case of Motion Pictures. Marketing Science 22(3): 329-354.
Eliashberg, J., and Sawhney, M.S.
(1994) Modeling Goes to Hollywood: Predicting Individual Differences in Movie Enjoyment. Management Science 40(9): 1151-1173.
Eliashberg, J., and Shugan, S.M. (1997) Film critics: Influencers or predictors? Journal of Marketing 61(2): 68-78.
Eliashberg, J., Jonker, J., Sawhney, M.S., and Wierenga, B. (2000) MOVIEMOD: An Implementable Decision Support System for Pre-Release Market Evaluation of Motion Pictures. Marketing Science 19(3): 226-243.
Faber, R., and O'Guinn, T. (1984) Effect of Media Advertising and Other Sources on Movie Selection. Journalism Quarterly 61: 371-377.
Godes, D., and Mayzlin, D. (2004) Using Online Conversations to Study Word of Mouth Communication. Marketing Science 23(4): 545-560.
Granovetter, M. (1973) The strength of weak ties. American Journal of Sociology 78(6): 1360-1380.
Jones, J.M., and Mason, C.H. (1990) The role of distribution in the diffusion of new durable consumer products. Technical working paper 90-110, Marketing Science Institute, Cambridge, MA.
Jones, J.M., and Ritz, C.J. (1991) Incorporating distribution into new product diffusion models. Internat. J. Res. Marketing 8(June): 91-112.
Krider, R., and Weinberg, C. (1998) Competitive Dynamics and the Introduction of New Products: The Motion Picture Timing Game. Journal of Marketing Research 35(February): 1-15.
Lippman, J. (2003) Box-Office Records Mask Second-Weekend Drops. The Wall Street Journal, June 6, 2003.
Litman, B.R., and Ahn, H. (1998) Predicting Financial Success of Motion Pictures. In B.R. Litman (ed.) The Motion Picture Mega-Industry. Allyn & Bacon Publishing Inc., Needham Heights, MA.
Litman, B.R., and Kohl, A. (1989) Predicting financial success of motion pictures: The 80's experience. The Journal of Media Economics 2(1): 35-50.
Litman, B.R. (1983) Predicting Success of Theatrical Movies: An Empirical Study. Journal of Popular Culture 16(Spring): 159-175.
Liu, Y. (2004) Word-of-Mouth for Movies: Its Dynamics and Impact on Box Office Receipts.
Working Paper, December 2004.
Mahajan, V., Muller, E., and Bass, F.M. (1990) New Product Diffusion Models in Marketing: A Review and Directions for Research. Journal of Marketing 54 (January): 1-26.
Mahajan, V., Muller, E., and Kerin, R.A. (1984) Introduction Strategy for New Products with Positive and Negative Word-of-Mouth. Management Science 30 (December): 1389-1404.
Meade, N. (1984) The Use of Growth Curves in Forecasting Market Development: A Review and Appraisal. Journal of Forecasting 3: 429-451.
Muñoz, L. (2003) High-Tech Word of Mouth Maims Movies in a Flash. Los Angeles Times, August 17, 2003.
Neelamegham, P., and Chintagunta, P. (1999) A Bayesian Model to Forecast New Product Performance in Domestic and International Markets. Marketing Science 18(2): 115-136.
Ravid, S.A. (1999) Information, Blockbusters, and Stars: A Study of the Film Industry. Journal of Business 72(4): 463-492.
Reingen, P., Foster, B., Brown, J.J., and Seidman, S. (1984) Brand Congruence in Interpersonal Relations: A Social Network Analysis. Journal of Consumer Research 11: 1-26.
Reingen, P., and Kernan, J.B. (1986) Analysis of Referral Networks in Marketing: Methods and Illustration. Journal of Marketing Research 23: 370-378.
Reinstein, D.A., and Snyder, C.M. (2005) The Influence of Expert Reviews on Consumer Demand for Experience Goods: A Case Study of Movie Critics. Journal of Industrial Economics, forthcoming.
Richins, M.L. (1983) Negative Word-of-Mouth by Dissatisfied Consumers: A Pilot Study. Journal of Marketing 47: 68-78.
Sawhney, M.S., and Eliashberg, J. (1996) A Parsimonious Model for Forecasting Gross Box-Office Revenues of Motion Pictures. Marketing Science 15(2): 113-131.
Schindler, R., and Bickart, B. (2003) Published "Word of Mouth": Referable, Consumer-Generated Information on the Internet. In C. Haugtvedt, K. Machleit, and R. Yalch (eds.) Online Consumer Psychology: Understanding and Influencing Behavior in the Virtual World. Lawrence Erlbaum Associates.
Senecal, S., and Nantel, J.
(2004) The Influence of Online Product Recommendations on Consumers' Online Choices. Journal of Retailing 80: 159-169.
Sorensen, A.T., and Rasmussen, S.J. (2004) Is Any Publicity Good Publicity? A Note on the Impact of Book Reviews. Working Paper, Stanford University.
Zufryden, F.S. (1996) Linking Advertising to Box Office Performance of New Film Releases: A Marketing Planning Model. Journal of Advertising Research 36(4): 29-41.

Appendix A: Comparison with Three-Parameter Models

Our 2-parameter forecasting model is based on the assumption that, given infinite time in theaters, the theoretical market potential of all movies is the same and equal to the entire population of moviegoers. The advantage of this assumption is that it avoids the hurdle of estimating the total market potential of each movie as a separate parameter. A more conventional modeling approach would have been to assume that each movie's market potential M is equal to its total box office revenues and to fit a 3-parameter (P, Q, M) revenue model under this assumption. For benchmarking purposes, this appendix reports the results of fitting a 3-parameter model to our data.

As before, the nonlinear estimation step was very successful in generating coefficients P_i, Q_i, M_i for each movie. The next step consists of developing linear prediction models that relate each set of coefficients to our covariates. We follow the same variable selection procedure that we used to generate our 2-parameter prediction models. The resulting models are summarized in Table 7.

Total revenues. The most important challenge in estimating a 3-parameter model is the estimation of a product's total market potential directly from covariates. Inspired by the work of Sorensen and Rasmussen (2004) on book reviews, we experimented with the following exponential model:

    M_i = M_{i0} \exp(X_i' \beta) \varepsilon_i    (5)

where M_i denotes movie i's total box-office revenues, M_{i0} denotes movie i's opening weekend revenues, and X_i is our vector of covariates.
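As an illustration of the per-movie nonlinear estimation step mentioned above, the sketch below fits a 3-parameter (P, Q, M) curve to one movie's cumulative weekly revenues. It uses the standard Bass cumulative-adoption form and synthetic data as a stand-in; the paper's exact functional form (including the discount factors δ and ε) and dataset are not reproduced here.

```python
import numpy as np
from scipy.optimize import curve_fit

def bass_cumulative(t, p, q, m):
    """Cumulative revenues by week t under Bass dynamics with market potential m."""
    e = np.exp(-(p + q) * t)
    return m * (1.0 - e) / (1.0 + (q / p) * e)

# Synthetic cumulative box-office trajectory for a single movie ($ millions);
# the "true" parameters below are illustrative, not estimates from the paper.
weeks = np.arange(1.0, 13.0)
rng = np.random.default_rng(1)
revenue = bass_cumulative(weeks, 0.15, 0.40, 80.0) * (1.0 + rng.normal(0.0, 0.01, weeks.size))

# Nonlinear least-squares fit of (P, Q, M) for this movie
(p_hat, q_hat, m_hat), _ = curve_fit(
    bass_cumulative, weeks, revenue,
    p0=(0.1, 0.3, revenue[-1]),                   # reasonable starting guesses
    bounds=([1e-4, 1e-4, 1.0], [1.0, 2.0, 1e4]),  # keep p, q, m in plausible ranges
)
```

Repeating this fit for every movie in the sample yields the per-movie coefficient triples that the linear prediction models of Table 7 then relate to the covariates.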
Model (5) can be estimated by linear regression through the following equation:

    LRAT_i = \ln\left( \frac{M_i}{M_{i0}} \right) = X_i' \beta + u_i    (6)

Fitting equation (6) to our data resulted in a respectable adjusted R^2 of 0.84. Three covariates were significant. Among ordinal covariates, only average user ratings (BAVG) turned out to be significant, providing further evidence for the importance of early user ratings in forecasting a movie's long-term revenue prospects. The other two significant variables (PG, SLEEPER) are categorical. Interestingly, all covariates that were significant in predicting LRAT were also significant in the 2-parameter regression model for Q. This result is intuitive, because both Q and LRAT describe aspects of the relationship between a movie's long-term and short-term revenues.

Dependent Variable: LRAT
Variable     Coefficient (Std. Coeff.)   Std. Error   t-value   p-value
SLEEPER       2.99E+00 (0.83)            1.71E-01      17.47    0
PG            3.20E-01 (0.09)            1.50E-01       2.035   0.04
BAVG          4.26E-01 (0.22)            9.00E-02       4.75    0
Intercept    -4.35E-01                   3.34E-01      -1.30    0.19
Adjusted R2 = 0.84    F-statistic = 136.90 (p-value 0)

Dependent Variable: P
Variable     Coefficient (Std. Coeff.)   Std. Error   t-value   p-value
SLEEPER       4.00E-02 (0.43)            9.20E-03       4.40    0
MKT          -1.07E-03 (-0.32)           2.70E-04      -3.89    0.0002
BAVG         -1.60E-02 (-0.36)           4.40E-03      -3.60    0.0005
Intercept     1.01E-01                   1.70E-02       6.00    0
Adjusted R2 = 0.45    F-statistic = 21.86 (p-value 0)

Dependent Variable: Q
Variable     Coefficient (Std. Coeff.)   Std. Error   t-value   p-value
SLEEPER      -1.60E-01 (-0.40)           3.20E-02      -4.94    0
PG           -7.00E-02 (-0.18)           2.90E-02      -2.37    0.02
BAVG         -1.06E-01 (-0.51)           1.70E-02      -6.25    0
Intercept     6.69E-01                   6.30E-02      10.52    0
Adjusted R2 = 0.54    F-statistic = 30.82 (p-value 0)

Table 7: Regression models for predicting coefficients LRAT, P, and Q in a 3-parameter forecasting model with discount factors δ = ε = 0.6.

ε \ δ    0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1.0
0.1      0.278   0.350   0.456   0.713   1.174   1.833   2.779   4.262   5.609   9.901
0.2      1.785   0.271   0.338   0.482   0.781   1.255   1.965   3.063   4.777   7.378
0.3      3.891   0.629   0.272   0.355   0.553   0.913   1.467   2.336   3.762   5.992
0.4      5.065   1.354   0.418   0.275   0.388   0.657   1.103   1.802   2.985   4.836
0.5     14.003  18.991   0.905   0.391   0.278   0.438   0.798   1.367   2.307   3.925
0.6     17.145  17.805  13.217   3.254   0.411   0.283   0.509   0.986   1.784   3.056
0.7     24.701  34.262  17.653  17.950   9.413   0.439   0.288   0.615   1.258   2.319
0.8      1.005   1.994  31.610  32.612  16.071   1.820   0.521   0.296   0.784   1.639
0.9      2.889  22.482  28.158   0.951   0.913   6.498   0.998   0.532   0.304   0.996
1.0      1.986   3.516  38.037   1.029   5.055   0.958   1.415  26.470   2.152   0.313

Table 8: Mean relative absolute error (RAE) of total revenue forecasts obtained through 3-parameter forecasting models and different pairs of discount factors (columns: δ; rows: ε).

Coefficients P and Q. Our covariates were not as successful in predicting coefficients P and Q in 3-parameter models. Both models have lower adjusted R^2 than their 2-parameter counterparts (see Table 5). Furthermore, marketing budget (MKT) and user ratings (BAVG) were significant in predicting both coefficients P and Q but with negative signs. Whereas, in the 2-parameter case, the regression models for predicting coefficients P and Q correspond nicely to the theoretical interpretation of the two coefficients, in the 3-parameter case, the third parameter (total revenues) seems to absorb most of the influence of online ratings, leaving the empirical values of coefficients P and Q with no clear intuitive interpretation.
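The estimation of equation (6) reduces to ordinary least squares on the log revenue ratio. The sketch below illustrates this on synthetic data: the covariate names mirror those in Table 7, but the values and the "true" coefficients are fabricated for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Illustrative covariates: SLEEPER and PG dummies, BAVG average user rating
X = np.column_stack([
    np.ones(n),                 # intercept
    rng.integers(0, 2, n),      # SLEEPER dummy
    rng.integers(0, 2, n),      # PG dummy
    rng.uniform(4.0, 9.0, n),   # BAVG
])
beta = np.array([-0.435, 2.99, 0.32, 0.426])   # illustrative coefficients only
lrat = X @ beta + rng.normal(0.0, 0.3, n)      # LRAT = ln(M / M0), equation (6)

# OLS estimate of beta
beta_hat, *_ = np.linalg.lstsq(X, lrat, rcond=None)

# Total-revenue forecast from opening weekend via equation (5): M = M0 * exp(X' beta)
m0 = rng.uniform(1.0, 50.0, n)                 # opening weekend revenues ($M), synthetic
m_forecast = m0 * np.exp(X @ beta_hat)
```

Because the model is linear in ln(M/M0), the forecast multiplies the observed opening weekend by an exponential markup driven entirely by the covariates, which is what makes very early forecasting possible.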
Table 8 lists the average relative absolute error associated with the 3-parameter model family for each of the 100 combinations of discount factors we considered. The mean RAE of forecasting a movie's final revenues directly from equation (6) is 19.3%. This is already 37% higher than the RAE of our best 2-parameter model. The mean RAE of forecasting a movie's final revenues from the 3-parameter model is even higher, because it compounds the forecasting errors of all three parameters. Observe that the average RAEs listed in Table 8 are almost twice as large as their 2-parameter counterparts (see Table 6), suggesting that the 2-parameter model family offers a superior method of forecasting movie revenues.
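The accuracy metric compared above, the mean relative absolute error, can be computed as follows; the revenue figures here are hypothetical.

```python
def mean_rae(actual, forecast):
    """Mean relative absolute error: average of |forecast - actual| / actual."""
    return sum(abs(f - a) / a for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical total revenues ($ millions) and forecasts for four movies;
# each forecast is off by exactly 10%, so the mean RAE is 0.10.
actual   = [100.0, 50.0, 200.0, 10.0]
forecast = [ 90.0, 55.0, 220.0, 11.0]
rae = mean_rae(actual, forecast)
```

Dividing each absolute error by the movie's own revenues keeps blockbusters from dominating the average, which is why RAE rather than raw error is used to compare the model families.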