chicken and egg? interplay between music blog buzz and album sales

Pacific Asia Conference on Information Systems
(PACIS)
PACIS 2009 Proceedings
Association for Information Systems
Year 2009
CHICKEN AND EGG? INTERPLAY
BETWEEN MUSIC BLOG BUZZ AND
ALBUM SALES
Sanjeev Dewan∗
Jui Ramprasad†
∗ University of California, Irvine, [email protected]
† University of California, Irvine, [email protected]
This paper is posted at AIS Electronic Library (AISeL).
http://aisel.aisnet.org/pacis2009/87
ABSTRACT
This paper examines the interrelationship between music blog buzz and album sales, considering both time
precedence and the contemporaneous relationship between the two. Using the Granger causality
methodology to examine time precedence, we find evidence of bi-directional causality for the full sample
of music albums as well as for both independently released and major-label released
music albums. Next, we employ two-stage least squares analysis to quantify the contemporaneous
relationships between music blog buzz and album sales. Preliminary results indicate that blog buzz has a
positive and significant relationship with album sales and that this relationship is stronger for
independently released music. Results from this study have implications for the music industry,
particularly music labels and artists in making decisions about album promotion.
Keywords: Social Media, Blogs, Word-of-Mouth, Music Industry
1 INTRODUCTION
Web 2.0 media in general and blogging in particular have exploded in recent times and their impact on
interactions between consumers, organizations and other stakeholders has attracted much interest (The
Economist 2006). What is interesting to note is that, in many cases, this social media has emerged as a
way for consumers to interact and influence one another and has potentially become an important source
of word-of-mouth (WOM).
Indeed, it has been shown that consumers trust recommendations from other consumers more than they
trust traditional forms of advertising such as those on television and radio; these recommendations
have the potential to impact consumption decisions (Intelliseek 2006). In the context of social media,
blogs have become a venue for these interactions between consumers. Anecdotal evidence and prior
research have indicated that the music industry in particular is an interesting microcosm in which to study
the impact of these interactions (New York Times 2004, Dewan and Ramaprasad 2009). Of particular
interest for firms is the impact that these interactions have on consumer decision-making, consistent with
questions examined in the word-of-mouth (WOM) literature.
In this study, we look at WOM created on blogs, in contrast to previous offline and online WOM studies,
which have focused on the spread of WOM through an offline social network or the impact
of online WOM as measured by consumer reviews on a third-party website. In addition, whereas books
and movies are the goods that have primarily been examined before, we look at WOM created about
music and also take a closer look at the simultaneous relationship between online WOM and sales.
We proceed as follows. In the next section, we provide a brief literature review. In Section 3, we present
our data source and models we use to test our hypotheses. Section 4 presents our results and we conclude
in Section 5.
2 LITERATURE REVIEW
This study draws mainly from the marketing and information systems literature, which has a long history of
research on WOM and has pointed out the importance of examining the duality between consumer
opinion formation and sales (e.g., see Duan et al. 2008). A brief overview of this literature follows.
2.1 WOM and Sales
Prior WOM literature has examined the spread of information and its impact in both offline and online
contexts. Offline studies examine the process of diffusion of information through a social network and the
impact of valence on consumer choice (see, e.g., Brown and Reingen 1987). Online WOM studies have
focused on the impact of valence, volume and dispersion on sales (see, e.g., Liu 2006) in the context
of consumer reviews on third-party websites; more recent studies have looked at individual attributes
such as reviewer disclosure and characteristics and their impact on consumption (Forman et al. 2008).
Many of these online WOM studies have focused on WOM created only through consumer reviews on
third-party websites and about goods such as books and movies. In addition, many of these studies have
attempted to identify the impact of online WOM; however, the mechanisms underlying this impact
are still unclear. In particular, understanding the direction of the relationship between online
WOM and sales has been challenging. Research that has looked at this relationship (see, e.g.,
Chevalier and Mayzlin 2006, Liu 2006, and Duan et al. 2008) has faced the issue of endogeneity due to
simultaneity and the existence of unobservable characteristics that impact both WOM and sales; this has
posed challenges for identification. Essentially, there is a “feedback loop” (see Figure 1) between WOM
and performance, given that WOM may not only precede sales, one measure of performance, but may
also result from sales (Godes and Mayzlin 2004a, Duan et al. 2008). Godes and Mayzlin (2004b) “suggest
that the more conversation there is about a product, the more likely someone is to be informed about it,
thus leading to greater sales” (Liu 2006); similarly, one might argue that a product that has a high level of
sales means that more consumers can consume and review the product, thus leading to greater overall
“buzz.”
[Figure 1: a two-node loop. Higher Level of Blog Buzz → Higher Level of Album Sales: more blog buzz
means more people are aware of the product, which increases demand. Higher Level of Album Sales →
Higher Level of Blog Buzz: more sales mean that a larger percentage of the population is aware of the
music, resulting in more buzz.]
Figure 1. Feedback Loop: Blog Buzz and Album Sales
In addition to the feedback loop, there may be unobserved characteristics that drive both the creation of
“buzz” and sales; this poses additional challenges to estimating this relationship. Different identification
strategies, such as difference-in-differences (Chevalier and Mayzlin 2006) and simultaneous equation
models (Duan et al. 2008) have been used to try to mitigate these endogeneity issues. Figure 1 outlines
the “feedback loop” between WOM and sales as has been discussed thus far in the literature.
2.2 This Study
In this work, therefore, we try to deconstruct this relationship in the context of music blog buzz and sales.
Previous research used cross-sectional data to examine the impact of music blogs, looking specifically at
the sampling phenomenon. In that work the authors studied the drivers of a consumer’s decision to listen
to an entire piece of music posted on a music blog, as well as the relationship that this sampling behaviour
had with sales (Dewan and Ramaprasad 2009). Here, the focus is on examining the simultaneous
relationship between blog buzz about music albums and corresponding sales (of music albums), looking
at two different dimensions of this relationship: time precedence and the contemporaneous relationship. In
particular, the questions we look at are: Does blog buzz precede or help explain music sales or do music
sales provide more explanatory power for blog buzz? How is this relationship affected by the
characteristics of the music? What is the size of the relationship between blog buzz and music sales? And
how is this relationship affected by characteristics of the music?
3 DATA
The dataset used to conduct this analysis is a time-series cross-section (TSCS) dataset where, for each of 2694
albums (cross sections), the dataset includes weekly total blog buzz and weekly album sales for a period of
sixty weeks from the first week of January 2006 until the last week of February 2007. Blog buzz is
measured by the number of blogs that mentioned both the exact artist name and the exact album
name in a given week, as reported by Google BlogSearch. The weekly blog buzz data is matched with
corresponding weekly album sales from Nielsen SoundScan, which comprises both offline and online
sales and is the data source for compiling the Billboard music charts. In this analysis, we included only
albums that have both sales and buzz observations that are different from 0 for at least one week of the
entire span of sixty weeks.
We supplement this data with additional album-level variables including record label (independent vs.
major), artist reputation, price and genre; these variables do not vary over time. Artist reputation is a
dummy variable, indicating whether the artist was on the Billboard “Top Artists of the Year” in any of the
years between 2002-2006 or if the artist was on the All-Time Hot 100 Artists. If the artist was on either
one of these charts in the years mentioned, the “artist reputation” variable is set to one; otherwise, it is
zero.
Finally, we have customer review data from Amazon.com, which was obtained through Amazon Web
Services and includes album-level data on the average customer review, the number of customer reviews
(logged) and the standard deviation of the last 100 customer reviews of an album. Although this review
data does vary over time, we only have it for one point in time.
Variable description and summary statistics can be seen in Table 1 below.
| Variable | Description | Full Sample | Major Label | Independent Label |
|---|---|---|---|---|
| Sales | Log(Sales_t) | 3.220 (2.159) | 3.756 (2.124) | 2.222 (1.846) |
| Buzz | Log(Buzz_t) | 1.086 (1.706) | 1.270 (1.771) | 0.743 (1.521) |
| RevVal | Average valence of customer reviews | 4.474 (0.415) | 4.476 (0.387) | 4.470 (0.465) |
| RevNum | Number of customer reviews | 3.301 (1.403) | 3.726 (1.280) | 2.494 (1.265) |
| RevStdDev | Standard deviation of last 100 reviews | 0.817 (0.376) | 0.849 (0.330) | 0.758 (0.443) |
| Reputation | Artist reputation (1 = High Reputation; 0 = Low Reputation) | 0.086 (0.280) | 0.125 (0.331) | 0.013 (0.112) |
| Price | Retail price of the album on Amazon.com | 14.958 (7.343) | 15.116 (8.426) | 15.018 (4.451) |
| Indie | Dummy variable for type of record label (Mainstream = 0; Independent = 1) | 0.350 (0.477) | | |

Notes: Means are reported with standard deviations in parentheses.
Table 1. Variable Description and Summary Statistics for the Full Sample, Major Label Released
and Independently Released Music.
4 EMPIRICAL METHODOLOGIES
As mentioned previously, we examine two different types of relationships between our key variables,
Sales and Buzz. We first examine time precedence, for which we use Granger causality, and then examine
the contemporaneous relationship between the two using two-stage least squares (2SLS) analysis, a
choice supported by the Hausman test.
In Granger causality analysis, a key step is identifying the proper model
specification. Thus, the discussion below on conducting the specification tests is extensive. In addition, it
is important to note that while the standard Granger causality analysis only includes the time series of the
two variables being examined, our implementation differs in two ways: 1) we have multiple cross
sections of data over time and 2) we add control variables to get closer to “true” causality.
In the following two sections, we describe the model specification and empirical methodology used in this
paper.
4.1 Granger Causality
Granger causality is an econometric technique that has been employed to examine dual relationships, for
example, the relationship between crime levels and size of the police force (Marvell and Moody 1996). It
is important to note that Granger causality is not true causality; it is a means of examining time-series
data of two variables to understand which variable is more or less likely to precede the other. It has
become standard to refer to this as causality in the Granger sense, and thus any use of the term causality
in this context refers to Granger causality and not true causality. Results of Granger causality tests can
show that Granger causality is stronger in one direction than the other or that there is bi-directional
Granger causality.¹ Granger causality has not been used extensively in the Information Systems literature;
however, Dutta (2001) examines the relationship between telecommunications infrastructure and
economic activity, finding that the causal relationship is stronger from telecommunications infrastructure
to economic activity.
While Granger’s (1969) original implementation involved two variables with observations across several
time periods, Holtz-Eakin et al. (1988) have extended it for use in time-series cross-section (TSCS) data
sets. Hood III et al. (2008) identify three “causal scenarios” in a TSCS framework, which guide our
hypotheses: 1) an identical causal relationship between x and y in all cross-sections; 2) no causal
relationship between x and y in any of the cross-sections; and 3) a causal relationship in some subset of
cross sections. Note that the context in which Hood III et al. propose these hypotheses is one where there
are a relatively limited number of cross sections and thus it is feasible to test each cross-section. It is clear
that given the large number of cross-sections in the dataset used in this study, the individual cross-section
analysis is neither feasible nor informative, but the subset analysis is.
Granger causality is defined as follows: a variable $y_t$ is said to (Granger) cause another variable $x_t$ if “we
are better able to predict $x_t$ using all available information than if the information apart from $y_t$ had been
used,” where the “information” used are lagged values of each of the variables (Granger 1969). The
standard model that is used to test for Granger causality is:
$$
y_t = \alpha_0 + \sum_{l=1}^{m} \alpha_l \, y_{t-l} + \sum_{l=1}^{m} \delta_l \, x_{t-l} + u_t \tag{1}
$$

where l indexes the lags and m is the number of lags of the two variables included in the specification
(determined by lag length tests, discussed below); m can differ between the two variables. Equation (1)
represents the equation used to test whether x Granger causes y; a similar model with x as the dependent
variable would test whether y Granger causes x. As noted, lag lengths can be different for each of the
variables in the model as well as for the same variable across the two specifications.

¹ For example, Marvell and Moody (1996) found that there is bi-directional causality between crime levels and the size of the
police force but that it is stronger from police to crime. Hood III et al. (2008) found that the (Granger) causal direction varied for
different sub-samples of the data; that is, the strength of the causal direction between black mobilization and GOP growth varied
for different regions in the south (southern states of the United States).
Using the models just described, the test for Granger causality is conducted as follows: first, estimate
both the restricted model (with only a variable’s own lags) and the unrestricted model (with
both variables’ lags) using Ordinary Least Squares (OLS); second, conduct an F-test to see
if the unrestricted model adds any explanatory power (see, e.g., Marvell and Moody 1996). This test is
conducted with both $y_t$ and $x_t$ as the dependent variable, with the right-hand side model specification
depending on the results of the lag length tests. If the F-test is significant, then the variable added in the
unrestricted model is said to (Granger) cause the first variable; that is, it indicates that, for example, the
lagged values of x contain information that helps predict y better than the information that y holds alone
or that “changes in one variable, x, precede changes in another variable, y” (Gschwend 2004). Again, it
is important to note that Granger causality establishes neither true causality of one variable to another nor
exogeneity of either of the variables. Granger causality does not account for omitted variables that could
impact both x and y, thus potentially biasing the results. In addition, true exogeneity requires that neither
current nor past values of $x_t$ affect $y_t$, which is not established with this methodology either (Enders
1995).
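The restricted-versus-unrestricted comparison described above can be sketched for a single pair of time series as follows. This is a minimal illustration, not the authors' implementation: the helper names (`lag_matrix`, `rss`, `granger_f`), the use of the same lag length m for both variables, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def lag_matrix(series, n_lags, max_lag):
    """Columns series[t-1], ..., series[t-n_lags] for t = max_lag .. T-1."""
    T = len(series)
    return np.column_stack([series[max_lag - l:T - l] for l in range(1, n_lags + 1)])

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def granger_f(y, x, m):
    """F-statistic for H0: the m lags of x add no explanatory power for y."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[m:]
    const = np.ones((len(target), 1))
    X_restricted = np.hstack([const, lag_matrix(y, m, m)])               # own lags only
    X_unrestricted = np.hstack([X_restricted, lag_matrix(x, m, m)])      # add lags of x
    rss_r = rss(X_restricted, target)
    rss_u = rss(X_unrestricted, target)
    df_resid = len(target) - X_unrestricted.shape[1]
    return ((rss_r - rss_u) / m) / (rss_u / df_resid)

# Assumption: simulated data in which x leads y by one period (not the paper's data).
rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal(scale=0.5)

f_x_to_y = granger_f(y, x, m=2)  # lags of x should help predict y
f_y_to_x = granger_f(x, y, m=2)  # lags of y should not help predict x
```

In this simulated setting the statistic is large in the x-to-y direction and small in the reverse direction; each value would be compared against an F distribution with (m, df_resid) degrees of freedom.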
Before estimating this model, one has to first conduct multiple specification tests to determine the
appropriate model. Briefly, the steps for determining the model that should be estimated can be
summarized as follows: First, conduct Augmented Dickey-Fuller (ADF) tests to check for the order of
integration of each time-series and adjust appropriately (i.e. by differencing) if the series are not I(0).
Second, determine the appropriate lag length to use for each variable. In past work, determination of lag
length has been done using model selection criteria such as Akaike’s Information Criterion (AIC),
Bayesian Information Criterion (BIC) or Akaike’s Final Prediction Error. Each of these determines the
optimal model to use by “rewarding” goodness of fit (as determined, for example, by the Adjusted
R-Squared) but penalizing an increase in parameters. Others have selected lags by estimating the model with
a large number of lags and removing the lags one at a time, starting from the longest lag, if the coefficient
estimates are insignificant and the significance level of the F-test does not decline after removing the lag
(see, e.g., Marvell and Moody 1996). After the model has been determined, specification tests to check for
autocorrelation are conducted.
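As an illustration of criterion-based lag selection, the sketch below picks the lag length of a simple autoregression by minimizing AIC. It assumes one common Gaussian form of the criterion, n·log(RSS/n) + 2k, which may differ from the exact formula used in the paper; the function names and the simulated AR(2) series are hypothetical.

```python
import numpy as np

def aic_for_lag(y, p, max_lag):
    """AIC of an AR(p) model fit by OLS on a sample common to all candidate lags."""
    T = len(y)
    target = y[max_lag:]
    cols = [np.ones(len(target))] + [y[max_lag - l:T - l] for l in range(1, p + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    n, k = X.shape
    # Fit term rewards a small residual variance; 2k penalizes extra parameters.
    return n * np.log(resid @ resid / n) + 2 * k

def select_lag(y, max_lag):
    """Return the AIC-minimizing lag length and the AIC of every candidate."""
    y = np.asarray(y, dtype=float)
    aics = {p: aic_for_lag(y, p, max_lag) for p in range(1, max_lag + 1)}
    return min(aics, key=aics.get), aics

# Assumption: a simulated stationary AR(2) series, used only to exercise the code.
rng = np.random.default_rng(7)
T = 600
y = np.zeros(T)
for t in range(2, T):
    y[t] = 0.5 * y[t - 1] - 0.6 * y[t - 2] + rng.normal()

best_lag, aics = select_lag(y, max_lag=6)
```

Holding the estimation sample fixed across candidates (every model starts at `max_lag`) keeps the AIC values comparable; BIC would replace the `2 * k` penalty with `k * np.log(n)`.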
The lag length in the model used in this analysis is determined by AIC; BIC gives similar lag length
selections. For the Sales equation, AIC and BIC indicated that four lags were sufficient for Buzz in
addition to five lags for Sales. For the Buzz equation, AIC and BIC indicated that three lags were
sufficient for Sales in addition to the five lags used for Buzz. Given the large size of the dataset,
conducting the ADF tests on each sixty-week time series for the 2694 music albums in the dataset is
time-intensive. At this point, ADF tests for a random sub-sample of 100 albums have been conducted and
indicate that overall, the time series are I(0); this indicates that the Sales and Buzz variables do not need to
be differenced.
Equations 2 and 3 (for any album i at time t, with lags as discussed above) represent the standard Granger
causality model using logged values of the Sales and Buzz variables, with additional control variables for
record label, genre and artist reputation.
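$$
\begin{aligned}
\mathit{Sales}_{it} = {} & \beta_0 + \beta_1 \mathit{Sales}_{it-1} + \beta_2 \mathit{Sales}_{it-2} + \beta_3 \mathit{Sales}_{it-3} + \beta_4 \mathit{Sales}_{it-4} + \beta_5 \mathit{Sales}_{it-5} + \beta_6 \mathit{Buzz}_{it-1} \\
& + \beta_7 \mathit{Buzz}_{it-2} + \beta_8 \mathit{Buzz}_{it-3} + \beta_9 \mathit{Buzz}_{it-4} + \beta_{10} \mathit{Indie}_i + \beta_{11} \mathit{Rep}_i + \sum_{j=1}^{J} \delta_j \mathit{Genre}_{ij} + u_{it}
\end{aligned}
\tag{2}
$$

$$
\begin{aligned}
\mathit{Buzz}_{it} = {} & \beta_0 + \beta_1 \mathit{Sales}_{it-1} + \beta_2 \mathit{Sales}_{it-2} + \beta_3 \mathit{Sales}_{it-3} + \beta_4 \mathit{Buzz}_{it-1} + \beta_5 \mathit{Buzz}_{it-2} + \beta_6 \mathit{Buzz}_{it-3} \\
& + \beta_7 \mathit{Buzz}_{it-4} + \beta_8 \mathit{Buzz}_{it-5} + \beta_9 \mathit{Indie}_i + \beta_{10} \mathit{Rep}_i + \sum_{j=1}^{J} \delta_j \mathit{Genre}_{ij} + v_{it}
\end{aligned}
\tag{3}
$$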
where the variables are as described in Table 1. Genre_ij, for j = 1, ..., J, are dummy variables so that
Genre_ij = 1 if album i is of Genre j, and equal to 0 otherwise.
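To make the pooled structure of Equation (2) concrete, the sketch below builds its design matrix from an album-by-week panel, taking lags within each album so that one album's history never spills into another's. It is a simplified illustration under stated assumptions: the function names and the toy panel are hypothetical, and the genre dummies are omitted for brevity.

```python
import numpy as np

def within_album_lags(panel, n_lags, max_lag):
    """Stack lags 1..n_lags of an (n_albums, T) panel without crossing album boundaries.

    Rows are ordered album by album and cover weeks t = max_lag .. T-1.
    """
    n_albums, T = panel.shape
    return np.column_stack(
        [panel[:, max_lag - l:T - l].reshape(-1) for l in range(1, n_lags + 1)]
    )

def sales_equation_design(sales, buzz, indie, rep, sales_lags=5, buzz_lags=4):
    """Design matrix and target for a pooled version of Equation (2), genre dummies omitted."""
    max_lag = max(sales_lags, buzz_lags)
    n_albums, T = sales.shape
    weeks = T - max_lag
    target = sales[:, max_lag:].reshape(-1)
    X = np.column_stack([
        np.ones(n_albums * weeks),                      # intercept
        within_album_lags(sales, sales_lags, max_lag),  # Sales lags 1..5
        within_album_lags(buzz, buzz_lags, max_lag),    # Buzz lags 1..4
        np.repeat(indie, weeks),                        # time-invariant album controls
        np.repeat(rep, weeks),
    ])
    return X, target

# Assumption: a toy panel of 4 albums observed for 60 weeks (random numbers, not real data).
rng = np.random.default_rng(1)
sales = rng.normal(size=(4, 60))
buzz = rng.normal(size=(4, 60))
indie = np.array([0.0, 1.0, 1.0, 0.0])
rep = np.array([1.0, 0.0, 1.0, 0.0])

X, target = sales_equation_design(sales, buzz, indie, rep)
beta, *_ = np.linalg.lstsq(X, target, rcond=None)  # pooled OLS coefficients
```

The Granger test for "Buzz causes Sales" would then compare this fit against one that drops the four Buzz lag columns; Equation (3) follows the same pattern with the roles and lag counts swapped.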
These equations are estimated with the dataset described above for three different sets of data based on
the release date of the album: 1) the full set of albums; 2) only albums released during the sixty weeks
examined or within the ten weeks before them (called “recently released”); and 3) only albums
released more than ten weeks before the start of the data collection (called “not recently released”). Godes
and Mayzlin (2004a) conducted a similar analysis, looking at the impact of online conversations on TV show
ratings early in the season as opposed to late in the season. We chose ten weeks as the threshold here because it
has been shown that the “bursty” effect of an album’s release occurs within the first ten weeks after
release (Bhattacharjee et al. 2007).
As discussed previously, Hood III et al. (2008) suggest estimating the Granger causality model for each
of the cross-sections. Given that there are almost 2700 cross-sections, this is difficult and not very
informative; instead, the model is estimated for sub-samples of the data. An album’s record label has been
shown to be important to examine in music studies (Chellapa et al. 2007), and thus the sub-sample
analysis compares music released by independent record labels with music released by one
of the major labels (Sony BMG, Warner, Universal, and EMI).
4.2 Two-stage least squares (2SLS) Methodology
In the second part of this study, we use a different approach, regression analysis, to quantify the
relationship between Buzz and Sales. Granger causality does not allow for measuring the impact of
current time period effects (Marvell and Moody 1996), nor does it allow for other variables measured at a
particular cross-section to be included in the specification. Thus, the goal of the analysis conducted here is
to understand the impact of Buzz at time t on Sales. Given the results of the Granger causality test as well
as prior literature on online WOM, it is likely that these two variables are jointly determined and therefore
an appropriate estimation method must be used. Two estimation methods that help reduce the endogeneity
bias and the corresponding inconsistent estimates are 2SLS and 3SLS estimation, where 3SLS takes
into account cross-equation correlation. Results of the Hausman specification test (see Table 2) indicated that
2SLS should be used here.
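For reference, a Hausman-type comparison of two estimators is usually based on a quadratic form in the difference between their coefficient vectors. The expression below is a generic sketch in notation that does not appear in the paper: $\hat{\beta}_{2SLS}$ and $\hat{\beta}_{3SLS}$ denote the two coefficient estimates, $\hat{V}$ their estimated covariance matrices, and $k$ the degrees of freedom.

```latex
H = \left(\hat{\beta}_{2SLS} - \hat{\beta}_{3SLS}\right)'
    \left[\hat{V}_{2SLS} - \hat{V}_{3SLS}\right]^{-1}
    \left(\hat{\beta}_{2SLS} - \hat{\beta}_{3SLS}\right)
    \;\sim\; \chi^2_k \quad \text{under } H_0
```

Under $H_0$ both estimators are consistent and 3SLS is efficient, so a large value of $H$ relative to the $\chi^2_k$ distribution favors the estimator that remains consistent under $H_1$, here 2SLS.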
| Efficient under H0 | Consistent under H1 | Degrees of Freedom | Statistic | p-value |
|---|---|---|---|---|
| 3SLS | 2SLS | 59 | 8990 | <0.0001 |

Table 2. Hausman Test Results
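Thus, we estimated the following equation (for any album i) using 2SLS:

$$
\begin{aligned}
\mathit{Sales}_{it} = {} & \alpha_0 + \alpha_1 \mathit{Buzz}_{it} + \alpha_2 \mathit{RevNum}_i + \alpha_3 \mathit{RevVal}_i + \alpha_4 \mathit{RevStdDev}_i + \alpha_5 \mathit{Price}_i \\
& + \alpha_6 \mathit{Indie}_i + \alpha_7 \mathit{Rep}_i + \sum_{j=1}^{J} \delta_j \mathit{Genre}_{ij} + \varepsilon_{1i}
\end{aligned}
\tag{4}
$$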
In this model, we treat Buzz as endogenous. Thus, before estimating Equation (4), an appropriate
instrumental variable for this variable must be determined. In situations where time-series data is
available, it is common practice to use lagged values of the endogenous right-hand-side variable, along
with all other exogenous variables, as instruments. Thus, in this case, $\mathit{Buzz}_{it-1}$, which is highly correlated
with $\mathit{Buzz}_{it}$ (0.875) and has a much lower correlation with $\mathit{Sales}_{it}$ (0.302), and all other exogenous variables
are used as instruments for $\mathit{Buzz}_{it}$.² White’s (1980) test indicated significant heteroskedasticity, so
heteroskedasticity-adjusted standard errors are used throughout.
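The instrumenting strategy described above can be sketched with a hand-rolled 2SLS estimator. This is an illustrative sketch only: the function names, the simulated data-generating process and the single exogenous control are assumptions, and the code returns point estimates without the heteroskedasticity-adjusted standard errors used in the paper.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients of y on X (y may have several columns)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_least_squares(y, endog, exog, excluded_instruments):
    """2SLS for y = a0 + a1*endog + exog @ g + e, with endog treated as endogenous."""
    n = len(y)
    const = np.ones((n, 1))
    X = np.hstack([const, endog.reshape(-1, 1), exog])
    # Instrument set: all exogenous regressors plus the excluded instrument(s).
    Z = np.hstack([const, exog, excluded_instruments.reshape(n, -1)])
    X_hat = Z @ ols(Z, X)   # stage 1: project the regressors on the instruments
    return ols(X_hat, y)    # stage 2: regress y on the fitted regressors

# Assumption: simulated data where an unobserved shock drives both buzz and sales.
# buzz_lag is drawn independently here and merely stands in for last week's buzz.
rng = np.random.default_rng(42)
n = 20_000
buzz_lag = rng.normal(size=n)
price = rng.normal(size=n)
shock = rng.normal(size=n)
buzz = 0.9 * buzz_lag + shock + rng.normal(scale=0.5, size=n)
sales = 1.0 + 0.3 * buzz - 0.5 * price + shock + rng.normal(scale=0.5, size=n)

exog = price.reshape(-1, 1)
beta_2sls = two_stage_least_squares(sales, buzz, exog, buzz_lag)
beta_ols = ols(np.hstack([np.ones((n, 1)), buzz.reshape(-1, 1), exog]), sales)
```

In this simulation the OLS coefficient on buzz is biased upward by the shared shock, while the 2SLS estimate lies close to the true value of 0.3, which is the motivation for instrumenting Buzz with its lag.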
5 RESULTS
The results for the Granger causality tests can be seen in Tables 3 through 5. Tables 3 and 4 provide the
results for the estimation of Equations (2) and (3), the standard Granger causality model with the record
label, genre and artist reputation (in the full sample) control variables included. Specifically, the results
presented in these tables are the F-statistics and corresponding p-values for the F-tests of the variable
being tested for driving causality. That is, the first row of results presents the F-statistics for the lagged buzz
variables in Equation (2) in the full sample as well as the “independent” and “major” subsamples; the second
row of results presents the F-statistics for the lagged sales variables in Equation (3) for the same set of
samples. The F-tests in Tables 3 and 4 seem to indicate strong bi-directional causality, both for all
albums and for recently released albums, and in each case for both independently released
and major label released music. The results for the albums that are not recently released (Table 5) also
show evidence of bi-directional causality, although the relationships are weaker, and seem to
indicate that there is stronger causality from Sales to Buzz for major labels.
Major Label
32.68***
39.80***
Independent Label
47.07***
61.79***
Full Sample
Major Label
56.15***
19.67***
Predicting Log (Sales)
***
116.82
51.63***
Predicting Log (Buzz)
***, **, *
Notes:
denote significance at 1%, 5% and 10%, respectively.
Independent Label
35.09**
66.78***
Predicting Log (Sales)
Predicting Log (Buzz)
Full Sample
76.08***
96.68***
Notes: ***, **, * denote significance at 1%, 5% and 10%, respectively.
Table 3.
Table 4.
Granger Causality Results (All Albums )
Granger Causality Results (Recently Released)
| | Full Sample | Major Label | Independent Label |
|---|---|---|---|
| Predicting Log(Sales) | 4.36*** | 2.52** | 2.54** |
| Predicting Log(Buzz) | 6.25*** | 4.16*** | 2.30* |

Notes: ***, **, * denote significance at 1%, 5% and 10%, respectively.
Table 5. Granger Causality Estimates (Not Recently Released)

² Note that we conducted the same analysis using $\mathit{Buzz}_{it-6}$ and results were qualitatively consistent.
Tables 6 through 8 present the empirical results from the 2SLS estimations of Equation (4). It is
interesting to note that in estimation using the full dataset, the relationship between Buzz and Sales is
significantly stronger for independently-released music than for major label released music.
| | Full Sample | Major Label | Independent Label |
|---|---|---|---|
| Intercept | 0.963*** (0.078) | 0.454*** (0.148) | 1.311*** (0.089) |
| Log(Buzz) | 0.147*** (0.005) | 0.096*** (0.006) | 0.235*** (0.008) |
| RevVal | 0.040*** (0.016) | 0.103*** (0.030) | -0.029*** (0.018) |
| RevNum | 0.724*** (0.005) | 0.777*** (0.007) | 0.674*** (0.007) |
| RevStdDev | 0.213*** (0.017) | 0.265*** (0.033) | 0.199*** (0.020) |
| Reputation | 0.853*** (0.022) | 0.739*** (0.023) | 1.485*** (0.110) |
| Price | -0.011*** (0.001) | -0.007*** (0.001) | -0.021*** (0.002) |
| Indie | -0.171*** (0.015) | | |
| N | 120596 | 67614 | 52982 |
| Adjusted R-Squared | 0.359 | 0.318 | 0.247 |

Notes: Variables are as defined in Table 1. The results correspond to the regression equation (4) with dependent
variable Log(AlbumSales). Heteroskedasticity-adjusted standard errors are in parentheses. ***, **, * denote significance
at 1%, 5% and 10%, respectively.
Table 6. 2SLS Estimation Results (All Albums)
Examining these differences further, we see that for recently released music the
relationship between Buzz and Sales is almost equal for major label and independently released music
(Table 7). In the subset of music that was not recently released (Table 8), however, the
relationship between these two variables is positive and significant for independently released music but
negative and significant for music released by major labels.
| | Full Sample | Major Label | Independent Label |
|---|---|---|---|
| Intercept | 1.775*** (0.121) | 0.431 (0.267) | 2.550*** (0.129) |
| Log(Buzz) | 0.356*** (0.013) | 0.373*** (0.020) | 0.357*** (0.015) |
| RevVal | -0.038 (0.024) | 0.314*** (0.053) | -0.281*** (0.026) |
| RevNum | 0.726*** (0.013) | 0.644*** (0.021) | 0.825*** (0.016) |
| RevStdDev | 0.055** (0.028) | 0.093 (0.075) | -0.001 (0.029) |
| Reputation | 0.369*** (0.121) | 0.598*** (0.128) | 0.564 (0.551) |
| Price | -0.009*** (0.002) | -0.012*** (0.001) | -0.005* (0.003) |
| Indie | -0.181*** (0.032) | | |
| N | 38999 | 14160 | 24839 |
| Adjusted R-Squared | 0.212 | 0.159 | 0.194 |

Notes: Variables are as defined in Table 1. The results correspond to the regression equation (4) with dependent
variable Log(AlbumSales). Heteroskedasticity-adjusted standard errors are in parentheses. ***, **, * denote significance
at 1%, 5% and 10%, respectively.
Table 7. 2SLS Estimation Results (Recent Releases)
| | Full Sample | Major Label | Independent Label |
|---|---|---|---|
| Intercept | -0.635*** (0.079) | -0.698*** (0.132) | -0.520*** (0.096) |
| Log(Buzz) | 0.018*** (0.003) | -0.012*** (0.004) | 0.090*** (0.006) |
| RevVal | 0.201*** (0.016) | 0.179*** (0.026) | 0.226*** (0.020) |
| RevNum | 0.985*** (0.005) | 1.014*** (0.006) | 0.952*** (0.007) |
| RevStdDev | 0.166*** (0.017) | 0.167*** (0.028) | 0.134*** (0.023) |
| Reputation | 0.742*** (0.017) | 0.647*** (0.018) | 1.650*** (0.084) |
| Price | -0.015*** (0.001) | -0.010*** (0.001) | -0.043*** (0.001) |
| Indie | -0.256*** (0.011) | | |
| N | 81597 | 53454 | 28143 |
| Adjusted R-Squared | 0.585 | 0.520 | 0.517 |

Notes: Variables are as defined in Table 1. The results correspond to the regression equation (4) with dependent
variable Log(AlbumSales). Heteroskedasticity-adjusted standard errors are in parentheses. ***, **, * denote significance
at 1%, 5% and 10%, respectively.
Table 8. 2SLS Estimation Results (Older Releases)
6 CONCLUSION
In this paper, we examine the dual relationship between blog buzz and album sales through two different
means, first employing the Granger causality methodology and then using 2SLS estimation. This analysis
is related to but distinct from prior work on the impact of music blogs as a form of word of mouth
(Dewan and Ramaprasad 2009), which focused on examining music sampling as a form of consumption
in and of itself. While that paper also examines the relationship between sampling and sales, it does so
with cross-sectional data. In this paper, we are fortunate to have time-series data, which
allows us to study the broader impact of blog buzz as a whole as well as the relationship with sales over
time.
We are encouraged by our results thus far and will continue to examine this simultaneous relationship
more closely. While our Granger causality results indicate bi-directional causality, we plan on pursuing
this analysis to understand in which direction the causality is stronger and for what types of music. As
discussed above, we will refine our Granger causality specification and, following previous literature,
conduct the Granger causality analysis on different sub-samples of data (e.g., genre). In addition, we
plan to implement a richer causal model using a system of simultaneous equations linking blog buzz and
sales.
References
Brown, J.J., P.H. Reingen. 1987. Social Ties and Word-of-Mouth Referral Behavior. Journal of
Consumer Research. 14(3) 350-362.
Chevalier, J.A., D. Mayzlin. 2006. The effect of word of mouth on sales: Online book reviews. Journal of
Marketing Research. 43(3) 345-354.
Dewan, S., J. Ramaprasad. 2009. Consumer Blogging and Music Sampling: Long Tail Effects. Working
Paper, Paul Merage School of Business, University of California – Irvine.
Duan, W., B. Gu, A.B. Whinston. 2008. Do online reviews matter? – An empirical investigation of panel
data. Decision Support Systems. doi:10.1016/j.dss.2008.04.001
Dutta, A. 2001. Telecommunications and Economic Activity: An Analysis of Granger Causality. Journal
of Management Information Systems. 17(4) 71-95.
Forman, C., A. Ghose, B. Wiesenfeld. 2008. Examining the Relationship Between Reviews and Sales:
The Role of Reviewer Identity Disclosure in Electronic Markets. Information Systems Research. 19(3)
291-313.
Godes, D., D. Mayzlin. 2004a. Using online conversation to study word of mouth communication.
Marketing Science. 23(4) 545-560.
Godes, D., D. Mayzlin. 2004b. Firm-created word-of-mouth communication: A field-based quasi-experiment. Harvard Business School Marketing Research Papers.
Granger, C.W.J. 1969. Investigating Causal Relations by Econometric Models and Cross-Spectral
Methods. Econometrica. 37(3) 424-438.
Hood, M. V., III, Q. Kidd, et al. 2008. Two Sides of the Same Coin? Employing Granger Causality Tests
in a Time Series Cross-Section Framework. Political Analysis. 16(3) 324-344.
Holtz-Eakin, D., W. Newey, H.S. Rosen. 1988. Estimating Vector Autoregressions with Panel Data.
Econometrica. 56(6) 1371-1395.
Liu, Y. 2006. Word of mouth for movies: Its dynamics and impact on box office revenues. Journal of
Marketing. 70(3) 74-89.
Marvell, T.B., C. E. Moody. 1996. Specification Problems, Police Levels, and Crime Rates. Criminology
34(4) 609-646
The Economist. April 20, 2006. Among the audience.
The New York Times. October 18, 2004. A draining week in the indie-music spotlight.