This article appeared in a journal published by Elsevier. The attached
copy is furnished to the author for internal non-commercial research
and education use, including for instruction at the author's institution
and sharing with colleagues.
Other uses, including reproduction and distribution, or selling or
licensing copies, or posting to personal, institutional or third party
websites are prohibited.
In most cases authors are permitted to post their version of the
article (e.g. in Word or Tex form) to their personal website or
institutional repository. Authors requiring further information
regarding Elsevier’s archiving and manuscript policies are
encouraged to visit:
http://www.elsevier.com/copyright
Author's personal copy
Electronic Commerce Research and Applications 11 (2012) 275–289
Contents lists available at SciVerse ScienceDirect
Electronic Commerce Research and Applications
journal homepage: www.elsevier.com/locate/ecra
Online user reviews, product variety, and the long tail: An empirical investigation
on online software downloads
Wenqi Zhou 1, Wenjing Duan ⇑
Department of Information Systems & Technology Management, School of Business, The George Washington University, 2201 G Street, NW, Washington, DC 20052, United States
Article info
Article history:
Received 24 November 2010
Received in revised form 1 December 2011
Accepted 11 December 2011
Available online 21 December 2011
Keywords:
Long tail
Superstar
Online user reviews
Product variety
Word-of-mouth
Software download
Quantile regression
Abstract
Our study examines the impact of both a demand side factor (online user reviews) and a supply side factor (product variety) on the long tail and superstar phenomena in the context of online software downloading. The descriptive analysis suggests a significant superstar download pattern and also the
emergence of the long tail. Using the quantile regression technique, we find a significant interaction
effect between online user reviews and product variety on software downloads. We also find that the impacts
of both positive and negative user reviews are weakened as product variety goes up. In addition, the
increase in product variety reduces the impact of user reviews on popular products more than it does
on niche products. After taking the interaction effect into account, we find that the overall impact of
the increased product variety helps niche products to get more downloads. These results highlight the
importance of considering the intricate interplay between demand side and supply side factors in the
long tail and online word-of-mouth research.
© 2011 Elsevier B.V. All rights reserved.
1. Introduction
In traditional brick-and-mortar stores, vendors are advised to apply an intensive advertising strategy, i.e., to highlight
the hits (superstars), and sellers have long recognized the high risks
of investing in new product development. The belief
underlying such strategies is the well-recognized Pareto principle applied to sales distribution, namely the 80/20 rule (Brynjolfsson
et al. 2011). This rule says that the cumulative sales from the most
popular products (the top 20%) account for approximately 80% of
total sales. In essence, the superstar effect represents a concentrated consumption pattern in which a relatively small number of
very popular products account for the majority of sales (Frank and
Philip 1995, Rosen 1981). Such a superstar effect still prevails in online markets, as documented in recent studies (Brynjolfsson et al.
2010b, Duan et al. 2009).
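As a rough illustration of the 80/20 rule described above, the following sketch computes the share of total sales captured by the best-selling fraction of a catalog. The function name `top_share` and the sales figures are hypothetical and chosen only for illustration; they are not taken from the data used in this study.

```python
def top_share(sales, top_fraction=0.2):
    """Share of total sales captured by the best-selling `top_fraction` of products."""
    if not sales or sum(sales) <= 0:
        raise ValueError("sales must be non-empty with a positive total")
    ranked = sorted(sales, reverse=True)
    # Treat at least one product as a hit, even in a very small catalog.
    n_top = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / sum(ranked)


# Hypothetical catalog of ten products (made-up numbers).
sales = [500, 300, 50, 40, 30, 25, 20, 15, 12, 8]
share = top_share(sales)  # the top 20% here are the two best sellers
```

In this made-up catalog the two best sellers account for 800 of 1,000 units, so the distribution matches the 80/20 pattern exactly; real distributions would only approximate it.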
However, the significantly increased product choices and information made possible by the Internet and electronic markets provide the opportunity for more niche products to be discovered
and adopted. Anderson (2006) first coined the term “long tail”
(see Fig. 1), which envisages that more niche products offered exclusively in online stores better satisfy consumers’ diversified preferences and thus have the potential to outgrow the demand for
those popular products often sold through traditional channels. Recent studies have applied Anderson’s original long tail prediction to
both offline and online contexts, and have even extended it to purely online
investigations to examine the change in demand distribution over
time (Brynjolfsson et al. 2010a, Elberse and Oberholzer-Gee 2008,
Ghose and Gu 2007, Tan and Netessine 2009, Tucker and Zhang
2007, Zhao et al. 2008). Thus, the more thorough and essential definition of the long tail effect, which we use in this research, describes
the change in the consumption pattern when more niche products are
being selected and the demand is shifting from the hits to the niches
over time (Anderson 2006, Elberse and Oberholzer-Gee 2008).
⇑ Corresponding author. Tel.: +1 202 994 3217; fax: +1 202 994 5830.
1 Tel.: +1 202 994 2454; fax: +1 202 994 5830.
E-mail addresses: [email protected] (W. Zhou), [email protected] (W. Duan).
1567-4223/$ - see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.elerap.2011.12.002
There are both supply side and demand side factors that could
contribute to the long tail formation (Brynjolfsson et al. 2006,
2010b). Major supply side factors include increased availability
and variety of products on the Internet resulting from virtually
unlimited ‘‘shelf space,’’ as well as make-to-order production and
digital distribution, which significantly reduce the costs for producers and retailers (Brynjolfsson et al. 2006, 2010b). Key demand
side factors include abundant online product information, such as
consumer reviews and recommendations, and powerful search and
sampling tools on e-commerce websites (Brynjolfsson et al. 2010b,
2011).
Fig. 1. Long tail distribution curve. Source: Long Tail Blog, http://www.longtail.com.
The predicted shift of user choices to niche products has already
attracted much attention from academics examining the change in
consumption patterns on the Internet. Those studies mainly focus
on the demand side and results are inconsistent. Some researchers
believe that lower consumer search costs, resulting from online
feedback and recommendation systems, contribute to the reduced
concentration of sales of popular products, resulting in the long tail
phenomenon (Clemons et al. 2006, Hervas-Drane 2009,
Maryanchyk 2008, Oestreicher-Singer and Sundararajan 2009).
Meanwhile, other empirical studies have advocated the superstar
effect because of lower consumer search costs (Fleder and
Hosanagar 2009, Ghose and Gu 2007, Zhao et al. 2008). Ghose
and Gu (2007) found that search costs for price information are
lower for hit products compared to niche products. Fleder and
Hosanagar (2009) suggest that selection-biased recommendation
systems may help reduce the sales diversity because these systems
tend to recommend products with more historical data, i.e., the
popular products.
In contrast to these studies, which primarily focus on either demand side or supply side justifications for the skewed shape of
user choices, we focus in this research on both a key supply side
factor, product variety, and a critical demand side factor, online user
reviews, to study their effects on the formation of the long tail and
superstar phenomena. Using a data set of online software downloads, we first undertake a descriptive analysis to examine
changes in distribution patterns of software downloads over time,
which is followed by a rigorous quantile regression empirical analysis to examine how these two factors and their interplay influence
user choices of software with different levels of popularity.
Our descriptive analysis suggests a coexistence of the superstar
and long tail effects. We still observe a small number of very popular products dominating the demand, but a larger number of hits
are receiving fewer individual downloads over time. In the meantime, we also observe that significantly more niche products are
available and are being chosen by users. More importantly, the demand is shifting from the hits to the niches over time. The quantile
regression results show a significant interaction effect between online user reviews and product variety on the skewed shape of the
download distribution. The interaction effect demonstrates that
consumers’ reliance on online user reviews to choose products is
significantly influenced by the quantity of products available. Specifically, we find that the impacts of both positive and negative
user reviews are weakened as product variety goes up. In addition,
the increase in product variety reduces the impact of user reviews
on popular products more than it does on niche products. As a result, the impact of user reviews on user choices does not show a
well-defined trend over various product popularities, and the magnitude of the impact depends on the level of product variety. On
the other hand, our empirical analyses suggest a clear trend of
the impact of product variety on the distribution of user choices,
even after taking its interaction effect with online user reviews into account. Increased product variety has a greater positive impact on tail products than it does on popular products, leading to the long tail
formation, regardless of whether the products receive positive or
negative user reviews.
Our study, to the best of our knowledge, is the first to investigate the impacts of both online user reviews (demand side factor)
and product variety (supply side factor), and particularly their
interaction effect, on the formations of long tail and superstar phenomena. The quantile regression technique allows us to investigate
these influences across the entire spectrum of user choices distribution by estimating models of conditional quantile functions from
covariates. We find empirical evidence that omitting the interaction effect between product variety and user reviews could lead
to misleading inferences about the word-of-mouth (WOM) effect.
Hence, this paper contributes to the long tail research by offering
a new perspective on the interaction effect between demand side
and supply side factors in the interpretation of the long tail and
superstar predictions of online consumption patterns. The intertwined and complex relationships among user reviews, product
variety, and product popularity, also offer possible explanations
for extant mixed results on the impact of online user reviews, thus
contributing to both Information Systems (IS) and Marketing research by re-examining the influence of online user reviews.
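To make the idea of estimating quantile functions more concrete, the sketch below shows only the simplest building block of quantile regression: the check (pinball) loss, whose minimizer over a constant is a sample quantile. This is an intercept-only illustration with hypothetical download counts; it is not the model estimated in this paper, which conditions on covariates. The names `pinball_loss` and `best_constant` are assumptions introduced here.

```python
def pinball_loss(residual, tau):
    """Check loss: tau * r for r >= 0, and (tau - 1) * r for r < 0."""
    return residual * (tau - (1 if residual < 0 else 0))


def best_constant(values, tau):
    """Observed value that minimizes the total check loss, i.e. a tau-th sample quantile."""
    def total_loss(q):
        return sum(pinball_loss(v - q, tau) for v in values)
    return min(sorted(set(values)), key=total_loss)


# Hypothetical weekly download counts for nine products (made-up numbers).
downloads = [3, 5, 8, 13, 21, 34, 55, 89, 144]
median_pick = best_constant(downloads, 0.5)  # a middle-of-the-distribution product
upper_pick = best_constant(downloads, 0.9)   # a product near the popular end
```

Varying `tau` moves the fitted value from the niche end of the distribution to the popular end, which is why the technique suits questions about how effects differ across popularity levels.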
This paper, as the first attempt to examine the long tail phenomenon in the Internet software market, also adds to the research
on long tail e-commerce by expanding the boundary of long tail research to a very important territory. Many extant long tail studies
focus on the cultural goods (e.g., books and movies) upon which
Anderson (2006) originally built the concept of the long tail. However, the investigation of the demand pattern in the software market is also vital, given its rapid growth in the online marketplace. In
2013, the global software market is forecasted to have a value of
$457 billion, an increase of 50.5% since 2008, of which the US accounts for 42.6% (see footnote 2). As noted in Anderson’s (2006) illustrations, which
included eBay, Google, and Salesforce (a customer relationship management (CRM) software service provider), the long tail could certainly manifest in the software industry. The economics of the
software industry, with the help of online recommendation systems,
could be significantly influenced by three primary long tail forces,
which we can summarize as ‘‘make it, get it out there, and help me find
it,’’ particularly because most software programs can be produced
and delivered on the Internet with almost zero marginal cost. However, unlike books and movies, software adoptions usually come
with much greater risks and complexities resulting from installation, learning, and long-term maintenance. Therefore, the conclusions drawn from cultural goods may suffer from poor validity if
directly extended to the software market, which calls for independent investigations from researchers.
The rest of this paper proceeds as follows. We review the related literature and develop the conceptual framework in the next
section. We then describe the data and analyze the empirical model. In the last section, we discuss the results and implications, and
conclude the paper by addressing the limitations and identifying
areas for future research.
2. Literature review and conceptual framework
2.1. The long tail
The long tail phenomenon in the distribution of product sales
was first observed through the comparison between offline and online sales (Anderson 2006, Brynjolfsson et al. 2006, Tucker and
Zhang 2007). Recent long tail studies in e-commerce have extended the shifts predicted in user choices from the hits to the
niches to purely online channels (Clemons et al. 2006, Fleder and
Hosanagar 2009, Ghose and Gu 2007, Hervas-Drane 2009, Maryanchyk 2008, Oestreicher-Singer and Sundararajan 2009, Zhao et al.
2008). In essence, the long tail phenomenon describes the change
in consumption patterns and thus can only be examined by comparing consumer demand distribution shifts over time in pure online channels (Brynjolfsson et al. 2010b, 2011, Elberse and
Oberholzer-Gee 2008). The long tail phenomenon can be investigated and identified from the following perspectives. First, the
widely agreed-on feature of the long tail effect is that the longer
tail should be emerging, i.e., more niche products are being consumed over time (Anderson 2006, Brynjolfsson et al. 2011, Elberse
and Oberholzer-Gee 2008, Oestreicher-Singer and Sundararajan
2009).
2 Software: Global Industry Guide (2009).
In addition, the long tail consumption pattern is also expected
to include a relatively fatter tail, i.e., the demand is shifting from
the hits to the niches over time. Anderson (2006), who first officially coined the term, ‘‘long tail,’’ pointed out that the theory of
the long tail in economics is that demand is ‘‘shifting away from a
focus on a relatively small number of ‘hits’ at the head of the demand
curve and toward a huge number of ‘niches’ in the tail.’’ Specifically,
there are three scenarios that lead to a relatively fatter tail compared to the head. First, the demand for the niches increases while
the demand for the hits decreases; second, the demand for the
niches increases at a greater rate than the increase in demand for
the hits; and third, in the most plausible scenario, the demand for all products generally decreases, but the decrease for the
hits is more pronounced (Elberse and Oberholzer-Gee 2008). Given
the largely expanded product variety on the Internet (Brynjolfsson
et al. 2006, 2010a,b, 2011; Tan and Netessine 2009), we expect that
the sales of an individual product would decrease, regardless of its
popularity, as many long tail studies show (e.g., Brynjolfsson et al.
2010a,b; Elberse and Oberholzer-Gee 2008; Tan and Netessine
2009). In this case, when the decrease in demand for the niches
is less significant than the decrease in demand for the hits, the demand actually shifts from the hits to the niches, resulting in a relatively fatter tail. Following the literature, in this study we
examine the long tail phenomenon from two perspectives: longer
and fatter tail, with the fatter tail resulting from the shift in demand from the hits to the niches.
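The two perspectives above can be sketched in a few lines. The example below assumes, purely for illustration, that the two best-selling products are the hits, and uses made-up download counts that follow the third scenario (demand falls for every product, but more sharply for the hits). The function name `tail_profile` and the cutoff are hypothetical.

```python
def tail_profile(downloads, n_hits):
    """Summarize one period: how many products were chosen, and the demand share outside the hits."""
    ranked = sorted(downloads, reverse=True)
    return {
        # "Longer tail": number of products with at least one download.
        "products_chosen": sum(1 for d in ranked if d > 0),
        # "Fatter tail": share of demand going to products outside the top n_hits.
        "tail_share": sum(ranked[n_hits:]) / sum(ranked),
    }


# Made-up data: period 2 has more products, and each hit loses more demand than each niche.
period_1 = [900, 600, 40, 30, 20, 10]
period_2 = [700, 500, 35, 28, 19, 10, 9, 8, 7, 4]

profile_1 = tail_profile(period_1, n_hits=2)
profile_2 = tail_profile(period_2, n_hits=2)
```

Here the tail becomes longer (ten products chosen instead of six) and fatter (the tail share rises from about 6% to about 9%), even though no individual product gains downloads.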
The specific measures to examine the long tail phenomenon require appropriately categorizing products into the hits and the
niches, which has raised disagreements among researchers. Many
of them followed Anderson’s (2006) approach by adopting the
absolute measure (Brynjolfsson et al. 2003, 2010a). They identified
the long tail by comparing the demand shares accounted for by the
hits and the niches differentiated by the amount of sales, e.g.,
the Top 100 products as the hits and products above rank 100 as
the niches. Others examine the long tail phenomenon using the
relative measure, i.e., percentage of sales, to evaluate the change
in distribution curve (Elberse and Oberholzer-Gee 2008, Tan and
Netessine 2009). Tan and Netessine (2009) argued that the relative
measure is more appropriate because it controls for the significant
increase in the number of products over time or across channels.
Nevertheless, Brynjolfsson et al. (2010b) conducted a comprehensive review of extant long tail studies. They pointed out that both
the absolute and relative measures have their strengths and weaknesses in comparing the demand shares for both the hits and the
niches, depending on the study settings.
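A small sketch may help show why the two measures can disagree when the catalog grows. It assumes a fixed top-2 cutoff for the absolute measure and a top-20% cutoff for the relative measure; both cutoffs, the function names, and the download counts are hypothetical.

```python
def absolute_tail_share(downloads, top_n):
    """Tail share when hits are a fixed number of top-ranked products."""
    ranked = sorted(downloads, reverse=True)
    return sum(ranked[top_n:]) / sum(ranked)


def relative_tail_share(downloads, top_pct):
    """Tail share when hits are a fixed percentage of the catalog."""
    ranked = sorted(downloads, reverse=True)
    n_hits = max(1, round(len(ranked) * top_pct))
    return sum(ranked[n_hits:]) / sum(ranked)


# Made-up data: the catalog doubles from 10 to 20 products by adding small niches.
early = [100, 80, 60, 40, 20] + [10] * 5
late = [100, 80, 60, 40, 20] + [10] * 15

abs_early = absolute_tail_share(early, top_n=2)
abs_late = absolute_tail_share(late, top_n=2)
rel_early = relative_tail_share(early, top_pct=0.2)
rel_late = relative_tail_share(late, top_pct=0.2)
```

With these numbers the absolute measure reports a growing tail share (about 0.49 to 0.60), while the relative measure reports a shrinking one (about 0.49 to 0.38), because the top 20% now covers four products instead of two.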
In contrast to the long tail effect, the identification of superstar
effect involves only an observation of the concentration of demand
distribution, as demonstrated both in its original description of the
Pareto principle and the recent long tail research. The superstar effect has been consistently defined as the consumption pattern in
which a small number of popular products account for the majority
of sales (Elberse and Oberholzer-Gee 2008, Rosen 1981, Tucker and
Zhang 2007, Zhao et al. 2008), which shows a concentrated demand
distribution toward the hits. In light of such a definition, even as
the tail of the demand distribution becomes longer and fatter over
time, a small number of products could still dominate the sales and
thus could still demonstrate the superstar effect.
Therefore, the long tail and superstar phenomena are not necessarily conflicting and could coexist in descriptions of various attributes of a consumption pattern. Although the long tail indicates
the shift of demand from the hits to the niches, the very popular
products can still dominate market demand at the same time. Supporting such arguments, Elberse and Oberholzer-Gee (2008) found
empirical evidence for the coexistence of the long tail and superstar phenomena by examining overall video sales through both online and offline channels. They found that the number of obscure
products increases almost twice every week; meanwhile, the demand for niche products decreases at a much less pronounced rate
than the demand for hits, which denotes the long tail phenomenon.
They also observed that an even smaller number of video titles accounts for the majority of sales, which indicates that the superstar
phenomenon also still prevails. Tucker and Zhang (2007) provided
some explanations for such coexistence using an empirical comparison of consumers’ click-through behavior. Examining click-throughs between the catalog and Internet channels of a website
that provides online wedding service vendor lists, the study
showed that sales from superstar products are enhanced by
attracting new demand without cannibalizing the demand for
niche products.
2.2. Online user reviews and product variety
The widely adopted online user feedback systems allow consumers to exchange their evaluations and experiences on the Internet and thus to automate and amplify the digital WOM process
(Duan et al. 2009). Nevertheless, the conclusions regarding the
WOM effect across products with various popularities are not
consistent.
An earlier theoretical development by Bakos (1997) has predicted that online WOM recommendations would help consumers
find the less popular goods that nevertheless match their preferences. Online product feedback and recommendations have been
viewed as important demand side factors that can reduce consumer search costs in the pursuit of niche products (Brynjolfsson
et al. 2006). Therefore, most long tail studies regard the digital
WOM effect as an influence on information transformation, which
contributes to the long tail formation on the demand side
(Brynjolfsson et al. 2011). Hervas-Drane (2009) analytically
showed that a recommendation system functions as a taste matching mechanism to help consumers get product information from
others with similar preferences; as such, it reduces sales concentration. Oestreicher-Singer and Sundararajan (2009) provided
empirical evidence that a stronger and broader influence of recommendations on Amazon.com results in a flatter sales distribution.
In particular, online user reviews, as one of the most important
formats of online conversations for measuring WOM (Godes and
Mayzlin 2004), are also believed to contribute to the long tail formation as an important demand side factor in recent empirical
studies (Clemons et al. 2006, Duan et al. 2009, Hervas-Drane
2009, Maryanchyk 2008, Oestreicher-Singer and Sundararajan
2009, Zhu and Zhang 2010). For instance, Clemons et al. (2006)
showed that receiving the most positive reviews helps new products grow more quickly in the marketplace. Zhu and Zhang
(2010) found that online user reviews are more influential for less
popular video games whose players have more years of Internet
experience. By examining informational cascades in the context
of online software adoption, Duan et al. (2009) demonstrated that
online user reviews have an increasingly positive and significant
impact on the adoption of less popular products, although they
also found that popular products become more popular regardless
of user review ratings.
On the other hand, superstar proponents have a different voice
regarding the WOM effect across products with various popularities. The classic superstar effect proposed by Rosen (1981) argued
that communication technologies could result in a concentration of
the consumption of high quality supplies. Consumers are more
likely to infer product quality from previous users’ purchase decisions, which then results in only choosing from a few highly
ranked products. This superstar effect facilitated by communication technologies has also been shown to be strengthened by the
Internet and other digital distribution channels (Duan et al. 2009,
Goldmanis et al. 2009, Tucker and Zhang 2007). For example, in
opposition to the long tail prediction from Brynjolfsson et al.
(2006), Goldmanis et al. (2009) found that the decline in consumer
search costs due to the introduction of e-commerce leads to the
superstar phenomenon in market share of both travel agencies
and bookstores. Fleder and Hosanagar (2009), in considering the
lower consumer search costs resulting from e-commerce, conducted a simulation to show that recommendations based on sales
and ratings can lead to a reduction in sales diversity. Supporting
these findings, other studies predict that online user reviews
strengthen the superstar consumption pattern (Fleder and
Hosanagar 2009, Zhao et al. 2008). Zhao et al. (2008) found that
positive user reviews have a stronger impact on the hits than on
the niches and that negative user reviews hurt niche products
more, which suggests the superstar effect of online user reviews.
In contrast to online user reviews, increased product variety is
consistently viewed as an important long tail factor on the supply
side in the literature (Anderson 2006, Brynjolfsson et al. 2006). The
Internet has made it more feasible and efficient to provide consumers a much larger selection of products through online channels
than through brick-and-mortar stores (Brynjolfsson et al. 2003). Stocking
more products for an online channel requires much lower inventory
costs than for traditional brick-and-mortar stores. While physical
stores encounter space limitations and various costs, including logistics and holding costs, online stores can maintain a very large inventory in a centralized warehouse in a much less expensive
location. The lower inventory cost of online channels is even more
pronounced for digital products, for which offering one additional product
requires only adding one more line to the product database. Significantly more niche products, which may not have been available in physical stores before, are accessible in online channels.
Such availability of more varieties of products is more likely to meet more
consumers’ niche preferences and thus increase the sales of
niche products, leading to the long tail formation in the consumption
pattern.
2.3. Conceptual framework
Fig. 2 depicts our conceptual model, which illustrates the proposed effects of online user reviews, product variety, and their interaction on online consumers’ choices of products with various
popularities. We define the interaction effect between online user
reviews and product variety as a relationship in which the degree of a
consumer’s reliance on online user reviews to make product
choices depends on the level of product variety.
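One way to write this definition formally is as a conditional quantile function with a product term. The specification below is only an illustrative sketch: the coefficients $\beta_k(\tau)$, the outcome $y_{it}$, and the omission of control variables are assumptions made here, not the model estimated in this paper. The variable names follow Table 2.

```latex
% Illustrative tau-th conditional quantile of the outcome y_{it} (assumed form)
Q_{\tau}\left(y_{it} \mid \cdot\right)
  = \beta_0(\tau)
  + \beta_1(\tau)\,\mathrm{USERRATING}_{it}
  + \beta_2(\tau)\,\mathrm{WEEKLYVARIETY}_{it}
  + \beta_3(\tau)\,\mathrm{USERRATING}_{it}\times\mathrm{WEEKLYVARIETY}_{it}

% Marginal effect of user ratings, which varies with product variety
\frac{\partial Q_{\tau}}{\partial\,\mathrm{USERRATING}_{it}}
  = \beta_1(\tau) + \beta_3(\tau)\,\mathrm{WEEKLYVARIETY}_{it}
```

Under this assumed form, a nonzero $\beta_3(\tau)$ means that the effect of user ratings on the $\tau$-th quantile of demand depends on the level of product variety, and letting the coefficients vary with $\tau$ allows that dependence to differ between popular and niche products.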
Previous studies have identified the interaction effect between
online user reviews and contextual variables on product sales
(Cheema and Papatla 2010, Zhu and Zhang 2010). For example,
Zhu and Zhang (2010) applied the classic psychological choice
models proposed by Hansen (1976) to the e-commerce setting
and argued that consumers’ reliance on user reviews is affected
by contextual variables. Contextual variables are conventionally
defined as the factors describing the context where the brain processes occur, out of which consumers subsequently form their
behavioral responses. These variables may include store size, promotion event, product category, and so forth (Hansen 1976). In the
context of online video games, Zhu and Zhang (2010) identified
‘‘product characteristics’’ as the contextual variable that interacts
with online user reviews to influence sales. Cheema and Papatla
(2010) also echoed Hansen’s theory (1976) by pointing out that
the importance of online information for influencing Internet purchases depends on product category. Similar to these studies, we
argue that online user reviews and an important contextual variable (product variety) interact to influence user choices, potentially
having various impacts across products with various popularities.
Product variety has been recognized as the supply side factor
influencing the diversity of user choices. Product variety information is readily available in almost every online store and serves
as an important information resource for consumers’ decisionmaking. Brynjolfsson et al. (2011) suggested that product variety
should be considered together with demand side factors in examining changes in consumption patterns, implying the potential
interaction effect between online user reviews and product variety.
Their study highlighted the importance of keeping product variety
constant while examining the long tail effect of the demand side
factors resulting from lower consumer search costs enabled by
the Internet channel. Online user reviews, as one of the major demand side factors (Brynjolfsson et al. 2006), thus might influence
user choices, depending on the number of products offered. A
few recent long tail studies have controlled for product variety during their study periods, thus losing the opportunity to reveal the
possible influence of changes in product variety (Brynjolfsson
et al. 2011, Ghose and Gu 2007, Tucker and Zhang 2007). Although
extant research has not formally investigated it, these prior studies
nevertheless suggest that the influence of online user reviews may
depend on their intricate interplay with product variety. In addition, extant work suggests that online user reviews have different
impacts on products that have different levels of popularity (Duan
et al. 2009, Zhao et al. 2008, Zhu and Zhang 2010).
In this study, we investigate the impact of both online user reviews and product variety, along with their interaction effect, at
different locations of the demand distribution. The findings are expected to provide a more comprehensive interpretation of the impact of online user reviews and product variety on the
heterogeneity of user choices. We also expect our study to shed
light on the mixed results documented in previous online WOM
studies. In addition, we aim to provide a more thorough understanding of the role that product variety plays in consumers’ online
decision-making, thus contributing to resolution of the current disagreements between long tail and superstar advocates (Anderson
2006; Brynjolfsson et al. 2006, 2010b; Goldmanis et al. 2009).
Furthermore, the potentially very complex relationships between
the demand and supply side factors dictate the nature of this question to be primarily empirical, which we would like to take the first
initiative to investigate in this study.
Fig. 2. Conceptual framework.
Table 2
Description of key variables.
Variable
Description and measure
WEEKLYDOWNLOADit
TOTALDOWNLOADit
Weekly number of downloads of software i at week t
Cumulative number of downloads of software i at
week t
Average user rating for software i at week t (one to
five scale with half points)
A dummy variable measures if software i receives
CNET rating at week t
The rank of software i at week t by weekly downloads
The total number of software programs listed in the
category at week t
A dummy variable measures if software i is free-to try
at week t
Days since software i has been posted
USERRATINGit
3. Data
CNETRATINGDit
Our data are from CNET Download.com (CNETD), which is a collection of more than 30,000 free or free-to-try software programs
for Windows, Mac, mobile devices, and webware. CNETD, as a part
of CNET network, is a leading and representative online platform
for software download. CNETD lists approximately 20 large groups
of software programs with approximately 5–20 categories in each
group. In addition to providing the overall product description,
CNETD also shows download counts for each software program
posted and solicits user reviews. The user review system includes
detailed comments and an overall evaluation, using a five-star user
rating system. CNETD also provides editorial reviews for selected
software programs (usually the popular ones). Reviews are summarized by rating on a scale of one to five, with one being the lowest and five being the highest.
CNETD provides an ideal environment for this study. First, in
addition to the description of product features, it clearly provides
the information about the number of products offered for each category. We measure product variety by the number of software programs listed in a specific category. Product evaluation is also
readily available from user reviews for each product, with both detailed comments and a one to five star rating. The data set up the
foundation for our study, which allows us to test how online user
reviews, interacting with the level of product variety, influence
market demand concentration. Specifically, we use the cumulative
average user rating (on a scale of 1–5) as a measure for online user
reviews, which is the most prominent user feedback information
displayed on CNETD for each product. All the information on
CNETD is updated on a daily basis, which enables us to compile a
longitudinal dataset to analyze the dynamics of software downloads. Second, all the software programs listed on CNETD can be
downloaded without any charge, therefore, the price effect on
the demand side is controlled by default. This advantage comes
at a cost, however, because of the possibility that consumers’
download behavior on CNETD may be different from their actual
software purchase behavior. However, because users are often required to make significant commitments and considerable efforts
to use free and free-to-try software programs, free-to-download
software programs on CNETD might not substantially different
from other online products (Duan et al. 2009).
Our sample consists of four software categories, which include
popular download categories as well as a diversified coverage of software programs with different application purposes. The categories are Antivirus Software, Digital Media Player,
Download Manager, and File Compression. Our sample is composed
of weekly data for two periods: from December 2004 to July 2005
(period 1) and from August 2007 to February 2008 (period 2). The
time interval across the two periods offers us the unique opportunity to observe and compare the variations of software download
patterns. The two periods are fairly comparable. Both periods
encompass similar time frames: period 1 is eight months long
and period 2 is seven months long. In addition, during the two
periods, CNETD made no fundamental changes in terms of the
interface design, user rating system, CNET rating system, or search
options. The number of software programs listed in each category
differs considerably, from approximately 60 to 450, as shown in
Table 1. The difference reflects the distinctive environment in each
category, and we define each category as a single market (Duan
et al. 2009). The following information has been extracted for every
software program listed in each category: software name, date
added, total downloads, last week downloads, average user rating,
and CNET rating. We also collect software characteristics, including
operating system requirements, file size, publisher, license (free or
free-to-try), and price when its license is free-to-try. Table 2 presents the variable definitions, descriptions, and explanations of
measurement.
4. Empirical methodology and results
4.1. Descriptive analysis of software download distribution
To investigate the software download pattern on CNETD and its
change over time, we conduct a descriptive analysis of the
distribution of weekly downloads for each of the four categories across the two periods. We use the number of
weekly downloads to capture user demand in this study, which is
similar to prior studies that have also used consumers’ incremental demand in studying superstar and long tail phenomena
(Elberse and Oberholzer-Gee 2008). We define Qa as the lower
ath quantile of the weekly download distribution. As a result, the
top a% most popular products are simply those whose weekly
downloads exceed Q(1 − a) of the download distribution.
Table 1
Product variety for sample periods 1 and 2.

Category              Mean (P1|P2)     SD (P1|P2)     Min. (P1|P2)   Max. (P1|P2)
Antivirus software    106.47|226.96    16.45|26.09    82|132         137|249
Digital media player  174.78|437.46    13.36|50.83    157|242        203|466
Download manager      119.78|213.31    5.14|32.68     112|150        131|256
File compression      66.91|174.35     8.03|21.73     60|95          86|193

Notes: P1 denotes period 1 and P2 denotes period 2. The data are based on weekly aggregation.

Fig. 3. Distributions of average weekly number of downloads, with one panel per category (Antivirus Software, Digital Media Player, Download Manager, File Compression); x-axis: Popularity (0% to 100%); y-axis: Average Weekly Download Share (0% to 100%); curves: p1 and p2. Notes: P1 refers to period 1 and P2 refers to period 2.

To first examine whether the superstar phenomenon prevails
over time, we follow the conventional approach of using the
relative measure by examining the demand shares accounted for by
products with different popularities (Brynjolfsson et al. 2011;
Fleder and Hosanagar 2009). Specifically, we report the download
share of software titles in different quantiles Qa by calculating
the corresponding percentages of average weekly downloads in
each period (see footnote 3). The results suggest that a small number of very popular products dominate demand in both periods. For example, we
find that in each of the four categories in both periods, the top 10%
most popular products, whose weekly downloads exceed Q90 of
the weekly download distribution, account for more than 80% of
the overall downloads. Moreover, the superstar phenomenon seems
to be more pronounced in period 2, according to this measure. The
download shares accounted for by the hits in all four categories increase in period 2 compared to period 1. For instance, in the category
of Digital Media Player, the download share of the top 10% most popular products increases by 12.70 percentage points, from 83.04% in period 1 to 95.74%
in period 2. The increase in the download share of the top 1% most
popular products is even more dramatic, as large as 21.96 percentage points, from
32.82% in period 1 to 54.78% in period 2. For a better illustration,
Fig. 3 plots this distribution for both periods in each category against
the overall popularity, starting with the most popular software on
the left side (the ‘‘head’’ of the distribution) and the least popular
software on the right side (the ‘‘tail’’ of the distribution). This relative
measure plot seems to demonstrate a more significant superstar
download pattern, in which the distribution becomes more asymmetrical over time with a sharper peak.
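The relative measure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `top_share` and the toy download counts are assumptions made here, and the head size is simply rounded up, which is one of several reasonable conventions.

```python
import math


def top_share(downloads, top_pct):
    """Share of total downloads captured by the top `top_pct` percent of products.

    `downloads` holds one (average weekly) download count per product.
    The head size is rounded up, so the head is never empty.
    """
    if not downloads or not 0 < top_pct <= 100:
        raise ValueError("need a non-empty list and 0 < top_pct <= 100")
    ranked = sorted(downloads, reverse=True)  # most popular first
    # round() guards against floating-point noise before taking the ceiling
    head_size = math.ceil(round(len(ranked) * top_pct / 100, 9))
    total = sum(ranked)
    return sum(ranked[:head_size]) / total if total else 0.0


# Hypothetical toy category with ten products (not data from the study).
toy = [9000, 500, 200, 100, 80, 50, 30, 20, 15, 5]
share_top10 = top_share(toy, 10)  # one product in the head
share_top50 = top_share(toy, 50)  # five products in the head
```

Plotting `top_share` against a grid of percentages would give a concentration curve of the kind shown in Fig. 3, under the stated assumptions.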
We now turn to the question of whether the long tail emerges
in period 2. We first look into the longer tail attribute of the long tail
phenomenon by examining the changes in the number of software
programs in both head and tail over the two periods. For simplicity,
we denote the bottom a% least popular products, whose weekly
downloads do not exceed Qa of the weekly download distribution, as
TAILa. Similarly, the top (1 − a)% most popular products, whose
weekly downloads exceed Qa of the weekly download distribution,
are denoted by HEAD(1 − a). For example, the top 25% most popular products are denoted by HEAD25. Fig. 4 portrays the average weekly number of
software programs at a series of popularity levels, showing
the product variety in both head and tail across
the two periods in each category. Fig. 4 indicates that there are
more products in the tail in period 2 than in period 1 in each category, indicating a longer tail in the download pattern. For example, in the category of Antivirus Software, the average number of
the bottom 25% least popular products is 26 in period 1, and it increases to 57 in period 2. Among these tail products, many of the
software programs are virtually unknown (e.g., “Yes AntiVirusTool NetsKy-P” and “DiamondCS WormGuard”). Nevertheless,
even these more obscure products are chosen by some users.

3 The detailed report is too lengthy to be included in the paper but is available upon request.
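The TAIL and HEAD groups defined above can be illustrated with a short Python sketch. It is an assumption-laden example, not the authors' procedure: the helper names, the use of an order statistic (no interpolation) for the lower quantile, and the toy download counts are all introduced here for illustration.

```python
import math


def lower_quantile(values, pct):
    """Lower `pct`-th percentile taken as an order statistic (no interpolation)."""
    ordered = sorted(values)
    k = max(1, math.ceil(round(len(ordered) * pct / 100, 9)))
    return ordered[k - 1]


def split_head_tail(downloads, pct):
    """Return (tail, head): products at or below Q_pct, and products above it."""
    q = lower_quantile(list(downloads.values()), pct)
    tail = {name for name, d in downloads.items() if d <= q}
    head = {name for name, d in downloads.items() if d > q}
    return tail, head


# Hypothetical weekly downloads for eight products (not data from the study).
week = {"A": 12000, "B": 3000, "C": 900, "D": 400, "E": 120, "F": 40, "G": 9, "H": 2}
tail75, head25 = split_head_tail(week, 75)  # TAIL75 and HEAD25
tail25, head75 = split_head_tail(week, 25)  # TAIL25 and HEAD75
```

Counting the members of each set week by week, and averaging within a period, would give the kind of head and tail product counts that Fig. 4 reports.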
Similarly, to illustrate the relatively fatter tail attribute of the
long tail, we report the average weekly numbers of downloads at a
series of popularity levels in Fig. 5, after applying a natural log transformation to present results on a comparable scale. We find a substantial drop in downloads in both head and tail from period 1 to
period 2, which a two-sample t-test shows to be significant. This result implies that, on average, each individual software program gets fewer downloads over time, regardless of its
popularity. Therefore, to examine whether the relatively fatter tail
exists, we investigate the decrease in average weekly
downloads over the two periods for both the hits and the niches,
with results reported in Table 3. In each category, the decrease in
demand for the niches is shown to be much less pronounced than
the decrease in demand for the hits, indicating a shift in demand
from the hits to the niches. For example, in the category of Antivirus Software, the weekly downloads of the bottom 1% least popular
products decrease by 3 in period 2, on average, while the decrease
in weekly downloads for the top 1% most popular products is much
larger in period 2, as large as 909,768. Figs. 4 and 5,
along with Table 3, show that the software download pattern
exhibits a longer and relatively fatter tail in period 2, demonstrating the long tail phenomenon.

Fig. 4. Average weekly number of software programs in head and tail, with one panel per category (Antivirus Software, Digital Media Player, Download Manager, File Compression); x-axis: Week; y-axis: Number of Software Programs. Notes: PiWj refers to the jth week in period i.

Table 3
Decrease of average weekly downloads over two periods for both hits and niches.

Category              TAIL1  TAIL5  TAIL10  TAIL25  HEAD25  HEAD10   HEAD5    HEAD1
Antivirus software    3      11     20      54      82,522  192,691  349,704  909,768
Digital media player  2      9      13      31      8,333   16,527   26,408   80,840
Download manager      2      7      11      20      2,604   5,360    8,001    11,990
File compression      2      5      9       20      26,842  60,768   104,433  205,426
We also notice some interesting results at the individual product level from Fig. 4, which complement our superstar observation
demonstrated in Fig. 3 using the pure relative measure. Fig. 4
shows that in each category more products are not only in the tail
but also in the head. For example, in the category of Digital Media
Player, the average number of the top 10% most popular products
increases from 18 in period 1 to 52 in period 2. These findings,
along with the results in Fig. 5, suggest that a larger number of hits
receive fewer individual downloads in period 2. Therefore,
although overall the superstar effect seems to be more pronounced
over time as shown in Fig. 3, hit products are actually facing more
intense competition and becoming less popular individually. This result is consistent with our observation of the long tail phenomenon
that demand seems to be shifting to the niche products.
Overall, we observe the coexistence of superstar and long tail
download patterns. Similar to most extant studies, the first part
of our descriptive analysis for identifying the superstar effect
adopts the relative measure and looks into the classic distribution
curve of user choices shown in Fig. 3 (Brynjolfsson et al. 2011;
Fleder and Hosanagar 2009). However, our two long tail measures
to examine the longer and relatively fatter tail are slightly different
from either the relative measure or the absolute measure. Our
measures keep the format of the relative measure using the
relative popularity of products to separate them in head and tail.
This allows us to control for the largely increased product variety
in period 2 (see footnote 4). At the same time, our measures use the absolute values
of both the number of software programs (product variety) and the
number of downloads, instead of the consumption shares
commonly used in previous studies. Rather than choosing between
a pure relative measure and a pure absolute measure, our analysis of
the long tail phenomenon offers a more complete view from various
perspectives. Our analysis also provides support for the argument of
Brynjolfsson et al. (2010b) that one possible reason for the inconsistent observations on consumer demand patterns in the literature could
be the biased choices of different measures.
Although our initial descriptive analysis suggests the presence
of both the superstar and long tail phenomena, it has limited
statistical significance. In the next section, a more rigorous
empirical analysis is conducted to generate more insights for
understanding the factors that might drive such phenomena.
4
A two sample t-test conducted on a time series of weekly product variety for the
two periods (each period data as one sample) shows that the average weekly product
variety in period 2 is significantly greater than that in period 1. The detailed report on
the t-test results is available upon request.
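The two-sample t-test mentioned in footnote 4 can be illustrated as follows. The paper does not state which variant was used, so this sketch assumes Welch's unequal-variance form; the function name `welch_t` and the weekly product counts are hypothetical and serve only to show the calculation.

```python
import math
import statistics


def welch_t(sample1, sample2):
    """Welch's two-sample t statistic and approximate degrees of freedom.

    A positive statistic means the mean of `sample2` exceeds that of `sample1`.
    """
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = statistics.mean(sample1), statistics.mean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    se2 = v1 / n1 + v2 / n2  # squared standard error of the mean difference
    t = (m2 - m1) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Hypothetical weekly product counts for one category (not data from the study).
period1 = [100, 104, 108, 112, 116]
period2 = [220, 224, 228, 232, 236]
t_stat, dof = welch_t(period1, period2)
```

The statistic would then be compared with a t distribution with `dof` degrees of freedom; weekly observations are serially correlated in practice, so this is best read as a description of the arithmetic only.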
Fig. 5. Average weekly number of downloads in head and tail, with one panel per category (Antivirus Software, Digital Media Player, Download Manager, File Compression); x-axis: Week; y-axis: Average Weekly Downloads. Notes: PiWj refers to the jth week in period i.
4.2. Quantile regression model
To investigate how online user reviews and product variety
interact to influence user choices of software programs with different popularities, we use the quantile regression methodology. The
widely used least-squares regression, such as Ordinary Least Squares (OLS)
regression, models the conditional mean and assumes that the noise around the mean of the dependent
variable is normally distributed. Its estimation results can thus only
infer the mean effect of covariates on the dependent variable. In
contrast, quantile regression examines how covariates influence
the entire distribution of the dependent variable. However, it has
not been broadly implemented in IS research. In fact, the quantile
regression method fits well into most long tail studies by directly
coping with the highly skewed distribution of user choices. The
quantile regression model specifies the conditional quantile of
the dependent variable as a linear function of covariates. By examining a series of quantiles, researchers are able to assess and uncover the different impacts of covariates at various locations of
the dependent variable distribution. Our descriptive analysis has
verified a heavily skewed distribution of software downloads,
which suggests that employing the quantile regression technique
is more appropriate than the conventional least-squares regression
model. Specifically, quantile regression is particularly helpful in
this study to investigate empirically whether and how the heterogeneity of user choices is influenced by online user reviews (demand side factor) and product variety (supply side factor).

Table 4
Descriptive statistics of key variables.

Variable                  Mean     SD       Min.      Max.

Antivirus software (N = 6,162)
LOG(WEEKLYDOWNLOAD)it     4.24     2.55     0         14.07
USERRATINGRit             −1.61    1.74     −3        2
WEEKLYVARIETYCt           0        22.67    −96.80    20.20
CNETRATINGDit             0.07     0.25     0         1
LOGTOTALit                9.23     2.56     0         17.89
WEEKLYRANKit              115.61   67.61    1         243.50
AGEit                     428.47   453.41   0         2336
FREEPRICEDit              0.70     0.46     0         1

Digital media player (N = 11,810)
LOG(WEEKLYDOWNLOAD)it     3.57     2.28     0         12.31
USERRATINGRit             −2       1.54     −3        2
WEEKLYVARIETYCt           0        40.33    −200.24   23.76
CNETRATINGDit             0.09     0.29     0         1
LOGTOTALit                9.13     2.34     0         17.90
WEEKLYRANKit              222.09   129.91   1         452
AGEit                     617.70   536.08   0         3457
FREEPRICEDit              0.47     0.50     0         1

Download manager (N = 5,796)
LOG(WEEKLYDOWNLOAD)it     3.85     2.23     0         12.26
USERRATINGRit             −1.76    1.67     −3        2
WEEKLYVARIETYCt           0        31.29    −67.15    38.85
CNETRATINGDit             0.22     0.42     0         1
LOGTOTALit                9.18     2.27     0         17.87
WEEKLYRANKit              109.06   65.28    1         249.50
AGEit                     591.55   569.69   0         2365
FREEPRICEDit              0.66     0.48     0         1

File compression (N = 4,749)
LOG(WEEKLYDOWNLOAD)it     3.66     2.33     0         12.86
USERRATINGRit             −1.98    1.70     −3        1.50
WEEKLYVARIETYCt           0        17.40    −81.57    16.43
CNETRATINGDit             0.24     0.42     0         1
LOGTOTALit                9.04     2.43     0         19
WEEKLYRANKit              89.32    52.07    1         190.50
AGEit                     577.63   489.84   0         2546
FREEPRICEDit              0.61     0.49     0         1

The general form of the quantile regression is expressed as:

\[ Q_{a}(y \mid x) = x'\, b(a) \tag{1} \]

where Qa(y|x) denotes the ath quantile of the distribution of the
dependent variable y, and x denotes the vector of covariates. The
key observation of the dependent variable in our data is the number
of weekly downloads (WEEKLYDOWNLOADit), which captures user
demand as previously discussed. We apply a natural log transformation to the number of weekly downloads (Log(WEEKLYDOWNLOADit)) and use it as the dependent variable in our quantile
regression models. The log transformation has the advantage of
reducing the nonconstant variance and converting the value to a
magnitude that is comparable to other variables. In addition, the
“monotone equivariance” property of quantile regression, which
does not hold for least-squares regression, allows us to
re-interpret the fitted quantile regression models for the original
variable (WEEKLYDOWNLOADit) from the transformed variable
(Log(WEEKLYDOWNLOADit)) (Koenker and Gilbert 1978). Eq. (1) can be
estimated using the following linear form:

\[ \log(Q_{ait}) = b_{0}(a) + \sum_{j} b_{j}(a)\, x_{ijt} + e_{it}(a) \tag{2} \]
Qait denotes the ath quantile of weekly downloads of software i at
week t, and b0(a) is a constant term. xijt denotes the value of the jth
covariate for software i at week t, and eit is the error term. In our
context, average user ratings range from 1 to 5; thus a 3-star user
rating can be defined as a neutral review and all the other levels
can be defined as extreme reviews (either positive or negative).
We note that a minor linear transformation on user ratings would
help differentiate the rating level of user reviews, thus making the
interpretation of the coefficients more intuitive. Instead of including
USERRATINGit in the model, we consider (USERRATINGit − 3). For parsimony, we name the new variable USERRATINGRit. If the ratings
are neutral/positive/negative (i.e., equal to/above/below 3 points),
USERRATINGRit is zero/positive/negative, respectively. Chevalier and
Mayzlin (2006) found that 1-star user reviews hurt sales more than
5-star user reviews benefit sales. To assess the nonlinear impact of
user reviews of different rating levels, we also include a quadratic
term of USERRATINGRit, denoted by USERRATINGRSQit.
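The monotone equivariance property invoked above for the log-transformed dependent variable can be checked numerically. The sketch below is illustrative only: it treats the sample quantile as the data point minimizing the check (pinball) loss that underlies quantile regression, on hypothetical download counts, and the helper names are assumptions introduced here.

```python
import math


def check_loss(residual, a):
    """Check (pinball) loss for quantile level a in (0, 1)."""
    return residual * (a - (1 if residual < 0 else 0))


def quantile_by_check_loss(values, a):
    """Sample a-th quantile as the data point minimizing the total check loss."""
    return min(sorted(values), key=lambda q: sum(check_loss(y - q, a) for y in values))


# Hypothetical, heavily skewed weekly download counts (not data from the study).
downloads = [3, 5, 8, 13, 40, 90, 250, 1200, 9000]
logs = [math.log(d) for d in downloads]

a = 0.75
q_raw = quantile_by_check_loss(downloads, a)  # quantile on the original scale
q_log = quantile_by_check_loss(logs, a)       # quantile on the log scale

mean_raw = sum(downloads) / len(downloads)
mean_log = sum(logs) / len(logs)
```

In this toy case the quantile of the logs equals the log of the quantile, whereas exponentiating the mean of the logs does not recover the mean of the raw counts, which is why the re-interpretation is available for quantiles but not for least-squares fits.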
To test the interaction effect between online user
reviews and product variety, we include an interaction
term WEEKLYVARIETYCt × USERRATINGRit (Kenny et al. 1998).
WEEKLYVARIETYCt refers to the centered number of software programs at week t. We demean the number of software programs
(WEEKLYVARIETYt) to treat zero as a meaningful value of product
variety for better interpretations of both the simple effect and
the interaction effect (Aiken and West 1991, Judd and McClelland
1989). A single term, WEEKLYVARIETYCt, is thus also included
instead of WEEKLYVARIETYt. Therefore, the components related to
user ratings in our model can be expressed as
b1 USERRATINGRit + b2 USERRATINGRSQit + b3 WEEKLYVARIETYCt × USERRATINGRit. The estimation of b3 determines the significance of
the interaction effect between online user reviews and product
variety. Hence, if the interaction effect between online user
reviews and product variety is indeed present, the simple
effect of user reviews, measured by b1 USERRATINGRit + b2
USERRATINGRSQit, cannot represent the actual impact of user
reviews in most situations with various product variety
levels. In these cases, the total impact of user reviews should be
measured by b1 USERRATINGRit + b2 USERRATINGRSQit + b3
WEEKLYVARIETYCt × USERRATINGRit.
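The construction of the rating and variety terms, and the difference between the simple and total effects of user reviews, can be sketched as follows. The coefficient values and variety levels are hypothetical and chosen only to make the arithmetic visible; the function names are likewise assumptions introduced for this example.

```python
from statistics import mean


def build_covariates(user_rating, weekly_variety, mean_variety):
    """Recenter the rating at 3 stars and center product variety at its mean."""
    rating_r = user_rating - 3                 # USERRATINGR
    rating_r_sq = rating_r ** 2                # USERRATINGRSQ
    variety_c = weekly_variety - mean_variety  # centered WEEKLYVARIETY
    return rating_r, rating_r_sq, variety_c, variety_c * rating_r


def review_effects(b1, b2, b3, user_rating, weekly_variety, mean_variety):
    """Return (simple effect, total effect) of user reviews."""
    r, r_sq, _, interaction = build_covariates(user_rating, weekly_variety, mean_variety)
    simple = b1 * r + b2 * r_sq
    return simple, simple + b3 * interaction


# Hypothetical coefficients and variety levels, for illustration only.
b1, b2, b3 = 0.10, 0.05, -0.004
variety_by_week = [200, 220, 240]
avg_variety = mean(variety_by_week)

simple_at_mean, total_at_mean = review_effects(b1, b2, b3, 5, 220, avg_variety)
simple_high, total_high = review_effects(b1, b2, b3, 5, 240, avg_variety)
```

With variety at its mean the two effects coincide; with variety above its mean and a negative b3, the total effect of a 5-star rating is smaller than the simple effect.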
Following previous studies, we use the cumulative number of
downloads (TOTALDOWNLOADit) to control for network effects
(Brynjolfsson and Kemerer 1996, Duan et al. 2009, Gallaugher
and Wang 2002), which are considered to be particularly prominent
in the software industry. We also include product age AGEit and the
quadratic term of product age AGESQit to control for product
diffusion (Duan et al. 2009). AGEit captures the linear part of the
diffusion process, and AGESQit approximates the nonlinear component. Using these two variables allows us to reasonably control for
product diffusion while maintaining adequate degrees of
freedom for analysis (Duan et al. 2009). A dummy variable FREE-
Table 5
Correlation matrix of key variables.
Variable
Antivirus software
1. LOG(WEEKLYDOWNLOADit)
2. USERRATINGRit
3.
4.
5.
6.
7.
8.
WEEKLYVARIETY Ct
CNETRATINGDit
LOGTOALit
WEEKLYRANKit
AGEit
FREEPRICEDit
Digital media player
1. LOG(WEEKLYDOWNLOADit)
2. USERRATINGRit
3.
4.
5.
6.
7.
8.
WEEKLYVARIETY Ct
CNETRATINGDit
LOGTOALit
WEEKLYRANKit
AGEit
FREEPRICEDit
Download manager
1. LOG(WEEKLYDOWNLOADit)
2. USERRATINGRit
3.
4.
5.
6.
7.
8.
WEEKLYVARIETY Ct
CNETRATINGDit
LOGTOALit
WEEKLYRANKit
AGEit
FREEPRICEDit
File compression
1. LOG(WEEKLYDOWNLOADit)
2. USERRATINGRit
3.
4.
5.
6.
7.
8.
WEEKLYVARIETY Ct
CNETRATINGDit
LOGTOALit
WEEKLYRANKit
AGEit
FREEPRICEDit
1
2
3
1
0.656
1
0.004
0.026
1
0.438
0.811
0.947
0.374
0.271
0.276
0.732
0.661
0.104
0.224
0.007
0.014
0.181
0.004
0.002
1
0.650
1
0.004
0.004
1
0.378
0.803
0.926
0.040
0.228
0.341
0.688
0.604
0.248
0.181
0.003
0.002
0.163
0.010
0.002
1
0.647
1
0.034
0.012
1
0.159
0.677
0.941
0.344
0.223
0.269
0.675
0.621
0.134
0.107
0.011
0.021
0.244
0.032
2.3E04
1
0.630
1
0.01
0.018
1
0.283
0.823
0.935
0.179
0.169
0.172
0.677
0.582
0.086
0.130
0.010
0.003
0.177
0.012
0.020
4
5
6
7
8
1
0.427
0.335
0.068
0.070
1
0.764
0.025
0.205
1
0.372
0.267
1
0.050
1
1
0.457
0.285
0.213
0.105
1
0.731
0.319
0.138
1
0.027
0.221
1
0.022
1
0.349
0.170
0.307
3.14E04
1
0.611
0.119
0.019
1
0.340
0.198
1
0.157
1
0.374
0.240
0.302
0.027
1
0.744
0.133
0.032
1
0.143
0.191
1
0.035
1
Table 6
OLS and quantile regression estimations.
b1
USERRATINGR
b2
USERRATINGRSQ
b3
INTERACTION
b4
WEEKLYVARIETYC
b5
LOGTOTAL
Antivirus software
OLS
0.036
(0.008)***
Q10
0.022
(0.007)***
Q20
0.022
(0.007)***
Q30
0.020
(0.008)**
Q40
0.011
(0.008)
Q50
1.7E4
(0.008)
Q60
0.007
(0.009)
Q70
0.021
(0.011)**
Q80
0.079
(0.016)***
Q90
0.132
(0.024)***
Q99
0.226
(0.035)***
0.008
(0.004)**
0.006
(0.003)**
0.014
(0.003)***
0.015
(0.003)***
0.013
(0.004)***
0.010
(0.003)***
0.015
(0.003)***
0.021
(0.004)***
0.043
(0.005)***
0.073
(0.008)***
0.116
(0.014)***
0.004
(0.000)***
0.004
(0.000)***
0.004
(0.000)***
0.004
(0.000)***
0.004
(0.000)***
0.004
(0.000)***
0.004
(0.000)***
0.004
(0.000)***
0.005
(0.000)***
0.004
(0.000)***
0.004
(0.001)***
0.019
(0.000)***
0.018
(0.001)***
0.015
(0.001)***
0.013
(0.001)***
0.011
(0.001)***
0.010
(0.001)***
0.009
(0.001)***
0.008
(0.001)***
0.006
(0.001)***
0.006
(0.001)***
0.005
(0.001)***
Digital media player
OLS
0.054
(0.015)***
Q10
0.020
(0.011)*
Q20
0.054
(0.015)***
Q30
0.094
(0.014)***
Q40
0.096
(0.013)***
Q50
0.172
(0.022)***
Q60
0.343
(0.028)***
Q70
0.389
(0.019)***
Q80
0.460
(0.027)***
Q90
0.451
(0.038)***
Q99
0.821
(0.069)***
0.009
(0.006)
0.001
(0.004)
0.009
(0.006)
0.023
(0.006)***
0.020
(0.005)***
0.039
(0.008)***
0.087
(0.009)***
0.100
(0.006)***
0.120
(0.009)***
0.092
(0.013)***
0.167
(0.029)***
0.001
(0.000)***
0.001
(0.000)***
0.001
(0.000)***
0.001
(0.000)***
0.001
(0.000)***
0.001
(0.000)***
0.002
(0.000)***
0.002
(0.000)***
0.002
(0.000)***
0.002
(0.000)***
0.001
(0.001)
Download manager
OLS
0.035
(0.011)***
Q10
0.033
(0.011)*
Q20
0.013
(0.009)
Q30
0.018
(0.008)**
Q40
0.030
(0.009)***
Q50
0.048
(0.013)***
Q60
0.100
(0.016)***
Q70
0.168
(0.022)***
Q80
0.217
(0.029)***
Q90
0.194
(0.034)***
Q99
0.297
(0.144)**
3E4
(0.005)
0.004
(0.007)***
0.005
(0.004)
0.010
(0.003)***
0.017
(0.003)***
0.026
(0.005)***
0.044
(0.006)***
0.066
(0.008)***
0.071
(0.011)***
0.054
(0.011)***
0.011
(0.047)
File compression
OLS
0.131
(0.018)***
0.032
(0.007)***
b6
AGE
b7
AGESQ
b8
FREEPRICED
b9
CNETRATINGD
b10
WEEKLYRANK
Estimated parameter (N = 6162)
0.035
1E04
3E07
(0.004)***
(0.000)
(0.000)
0.029
7.1E05
1.4E08
(0.004)***
(0.000)**
(0.000)
0.048
7.8E05
4.8E10
(0.004)***
(0.000)**
(0.000)
0.057
1.4E4
2.1E08
(0.005)***
(0.000)***
(0.000)
0.067
1.8E4
3.6E08
***
***
(0.005)
(0.000)
(0.000)**
0.081
2.5E4
5.5E08
(0.005)***
(0.000)***
(0.000)***
0.094
2.9E4
7.4E08
(0.005)***
(0.000)***
(0.000)***
0.112
4E4
1.2E07
(0.006)***
(0.000)***
(0.000)***
0.136
0.001
1.7E07
(0.005)***
(0.000)***
(0.000)***
0.206
0.001
3.0E07
(0.010)***
(0.000)***
(0.000)***
0.246
0.001
5.7E07
(0.014)***
(0.000)***
(0.000)***
0.036
(0.014)***
0.014
(0.00)
0.017
(0.012)*
0.019
(0.011)*
0.016
(0.010)
0.011
(0.011)
0.014
(0.010)
0.027
(0.009)***
0.018
(0.014)
0.046
(0.020)**
0.054
(0.054)
0.031
(0.026)
0.059
(0.021)***
0.100
(0.030)***
0.187
(0.031) ***
0.301
(0.087)***
0.787
(0.091) ***
1.126
(0.107) ***
1.801
(0.111) ***
2.268
(0.193) ***
2.781
(0.094) ***
2.631
(0.099) ***
0.033
(0.000)***
0.033
(0.000)***
0.033
(0.000)***
0.032
(0.000)***
0.032
(0.000)***
0.031
(0.000)***
0.031
(0.000)***
0.030
(0.000)***
0.030
(0.000)***
0.030
(0.000)***
0.031
(0.001)***
0.006
(0.000)***
0.007
(0.000)***
0.006
(0.000)***
0.006
(0.000)***
0.005
(0.001)***
0.004
(0.001)***
0.003
(0.001)***
0.003
(0.001)***
0.002
(0.001)***
0.002
(0.001)***
0.002
(0.002)
Estimated parameter (N = 11,810)
0.093
4.0E04
2.55E07
(0.005)***
(0.000)***
(0.000)***
0.065
2.5E4
4.8E08
(0.006)***
(0.000)***
(0.000)***
0.093
4.2E4
9.0E08
(0.005)***
(0.000)***
(0.000)***
0.112
4.7E4
9.6E08
(0.006)***
(0.000)***
(0.000)***
0.130
0.001
1.3E07
(0.006)***
(0.000)***
(0.000)***
0.154
0.001
1.4E07
(0.009)***
(0.000)***
(0.000)***
0.185
0.001
1.3E07
(0.008)***
(0.000)***
(0.000)***
0.238
0.001
1.9E07
(0.010)***
(0.000)***
(0.000)***
0.303
0.001
2.4E07
(0.008)***
(0.000)***
(0.000)***
0.314
0.002
3.5E07
(0.008)***
(0.000)***
(0.000)***
0.354
0.002
5.9E07
(0.011)***
(0.000)***
(0.000)***
0.006
(0.009)***
0.010
(0.009)
0.006
(0.009)
0.048
(0.010)***
0.040
(0.010)***
0.034
(0.012)***
0.018
(0.011)
0.001
(0.010)
0.037
(0.013)***
0.060
(0.014)***
0.103
(0.026)***
0.152
(0.033)
0.070
(0.020)***
0.152
(0.033)***
0.251
(0.023)***
0.263
(0.022)***
0.358
(0.047)***
0.417
(0.061)***
0.757
(0.070)***
1.301
(0.069)***
1.236
(0.043)***
0.824
(0.148)***
0.013
(0.000)***
0.013
(0.000)***
0.013
(0.000)***
0.013
(0.000)***
0.013
(0.000)***
0.012
(0.000)***
0.012
(0.000)***
0.011
(0.000)***
0.01
(0.000)***
0.011
(0.000)***
0.010
(0.000)***
0.002
(0.000)***
0.002
(0.000)***
0.002
(0.000)***
0.002
(0.000)***
0.002
(0.000)***
0.002
(0.000)***
0.003
(0.000)***
0.003
(0.000)***
0.004
(0.000)***
0.005
(0.000)***
0.007
(0.001)***
0.013
(0.000)***
0.012
(0.000)***
0.011
(0.000)***
0.010
(0.000)***
0.009
(0.000)***
0.008
(0.000)***
0.007
(0.000)***
0.007
(0.001)***
0.005
(0.001)***
0.003
(0.001)***
0.002
(0.002)
Estimated parameter (N = 5796)
0.043
2E4
2.98E07
(0.006)***
(0.000)***
(0.000)***
0.044
1.1E4
3.6E08
(0.004)***
(0.000)***
(0.000)**
0.052
1.4E4
5.6E08
(0.005)***
(0.000)***
(0.000)***
0.058
2.0E4
7.9E08
(0.005)***
(0.000)***
(0.000)***
0.073
3.0E4
1.1E07
(0.005)***
(0.000)***
(0.000)***
0.095458
4.1E4
1.5E07
(0.006)***
(0.000)***
(0.000)***
0.111
5.2E4
1.9E07
(0.006)***
(0.000)***
(0.000)***
0.137
0.001
2.2E07
(0.010)***
(0.000)***
(0.000)***
0.165
0.001
2.7E07
(0.014)***
(0.000)***
(0.000)***
0.248
0.001
4.3E07
(0.008)***
(0.000)***
(0.000)***
0.095
0.001
2.5E07
(0.025)***
(0.000)***
(0.000)**
0.060
(0.015)***
0.057
(0.010)***
0.056
(0.010)***
0.074
(0.009)***
0.086
(0.009)***
0.096
(0.010)***
0.122
(0.013)***
0.135
(0.014)***
0.173
(0.020)***
0.165
(0.029)***
0.113
(0.072)
0.043
(0.019)**
0.020
(0.011)*
0.002
(0.013)
0.004
(0.010)
0.009
(0.010)
0.035
(0.013)***
0.047
(0.014)***
0.038
(0.020)*
0.047
(0.026)*
0.079
(0.028)***
0.160
(0.054)***
0.030
(0.000) ***
0.030
(0.000)***
0.030
(0.000)***
0.029
(0.000)***
0.029
(0.000)***
0.029
(0.000)***
0.028
(0.000)***
0.028
(0.000)***
0.028
(0.000)***
0.027
(0.000)***
0.035
(0.001)***
0.004
(0.000)***
0.016
(0.001)***
Estimated parameter (N = 4749)
0.049
3E4
3.35E07
(0.007)***
(0.000)***
(0.000)***
0.060
(0.015)***
0.116
(0.021)***
0.035
(0.000)***
Table 6 (continued)
Q10
Q20
Q30
Q40
Q50
Q60
Q70
Q80
Q90
Q99
b1
USERRATINGR
b2
USERRATINGRSQ
b3
INTERACTION
b4
WEEKLYVARIETYC
b5
LOGTOTAL
b6
AGE
b7
AGESQ
b8
FREEPRICED
b9
CNETRATINGD
b10
WEEKLYRANK
0.164
(0.014)***
0.183
(0.017)***
0.170
(0.016)***
0.189
(0.017)***
0.237
(0.018)***
0.282
(0.015)***
4.673
(0.016)***
4.573
(0.018)***
3.989
(0.039)***
3.868
(0.039)***
0.047
(0.006)***
0.060
(0.007)***
0.050
(0.007)***
0.059
(0.007)***
0.080
(0.007)***
0.100
(0.006)***
0.368
(0.006)***
0.396
(0.008)***
0.367
(0.016)***
0.185
(0.017)***
0.004
(0.000)***
0.003
(0.000)***
0.003
(0.000)***
0.003
(0.000)***
0.002
(0.000)***
0.003
(0.001)***
0.130
(0.001)***
0.141
(0.001)***
0.154
(0.001)***
0.050
(0.001)***
0.015
(0.001)***
0.013
(0.001)***
0.011
(0.001)***
0.010
(0.001)***
0.010
(0.001)***
0.008
(0.002)***
0.004
(0.001)***
0.003
(0.001)***
0.003
(0.002)
0.003
(0.001)
0.060
(0.008)***
0.078
(0.007)***
0.100
(0.007)***
0.125
(0.008)***
0.162
(0.008)***
0.195
(0.010)***
0.005
(0.009)***
0.005
(0.006)***
0.003
(0.014)***
0.001
(0.007)***
3.4E4
(0.000)***
3.3E4
(0.000)***
4.2E4
(0.000)***
0.001
(0.000)***
0.001
(0.000)***
0.001
(0.000)***
0.239
(0.000)***
0.264
(0.000)***
0.347
(0.000)***
0.468
(0.000)***
5.8E08
(0.000)***
4.0E08
(0.000)***
6.0E08
(0.000)**
9.6E08
(0.000)***
1.5E07
(0.000)***
2.0E07
(0.000)***
0.001
(0.000)***
0.001
(0.000)***
0.002
(0.000)***
0.002
(0.000)***
0.063
(0.013)***
0.056
(0.013)***
0.053
(0.013)***
0.058
(0.014)***
0.048
(0.016)***
0.041
(0.015)***
2.4E07
(0.020)**
2.9E07
(0.018)
3.6E07
(0.028)
5.7E07
(0.034)***
0.158
(0.021)***
0.167
(0.019)***
0.171
(0.016)***
0.200
(0.021)***
0.238
(0.020)***
0.272
(0.020)***
0.039
(0.022)***
0.027
(0.029)***
0.018
(0.045)***
0.154
(0.041)***
0.035
(0.000)***
0.035
(0.000)***
0.034
(0.000)***
0.033
(0.000)***
0.032
(0.000)***
0.031
(0.000)***
0.285
(0.000)***
0.281
(0.000)***
0.120
(0.001)***
0.301
(0.001)***
Notes: In each category, the first row presents the OLS estimation results and the remainder reports the quantile regression estimation results. Due to limited space, only the
results of selected quantiles are reported.
*
p < .10 standard errors in parentheses.
**
p < .05 standard errors in parentheses.
***
p < .01 standard errors in parentheses.
PRICEDit is used to control for the license difference of the software.
The CNETD editorial staff reviews some of the software programs
(less than 20%), with an emphasis on the more popular ones, and
the editors use a five-star rating system similar to the user review
system. We thus use a dummy variable CNETRATINGDit to control
for the impact of the availability of a CNET rating. Finally, WEEKLYRANKit is also included to control for the influence of product popularity information, as well as for the potential herding effect
(Duan et al. 2009, Tucker and Zhang 2007). Therefore, our final
quantile regression model can be expressed as the following:
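\[
\begin{aligned}
\log(\mathit{WEEKLYDOWNLOAD}_{ait}) ={}& b_{0}(a) + b_{1}(a)\,\mathit{USERRATINGR}_{it} + b_{2}(a)\,\mathit{USERRATINGRSQ}_{it} \\
&+ b_{3}(a)\,\mathit{WEEKLYVARIETY}^{C}_{t} \times \mathit{USERRATINGR}_{it} + b_{4}(a)\,\mathit{WEEKLYVARIETY}^{C}_{t} \\
&+ b_{5}(a)\,\log(\mathit{TOTALDOWNLOAD}_{it}) + b_{6}(a)\,\mathit{AGE}_{it} + b_{7}(a)\,\mathit{AGESQ}_{it} \\
&+ b_{8}(a)\,\mathit{FREEPRICED}_{it} + b_{9}(a)\,\mathit{CNETRATINGD}_{it} + b_{10}(a)\,\mathit{WEEKLYRANK}_{it} + e_{it}(a)
\end{aligned}
\tag{3}
\]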
The simple impact of product variety is measured by b4. If the
interaction effect indicated by the significance of b3 is present,
overall, one additional product added to CNETD would result in a
change of b4 + b3 × USERRATINGRit in log weekly downloads at the corresponding quantile. By looking into the
estimations of b1–b4 over a series of quantiles a, we are able
to derive the impacts of online user reviews and product variety on
the skewed shape of the software download distribution. Specifically, products that have more weekly downloads, namely those in higher quantiles (larger a) of the weekly download distribution, are more
popular. Therefore, by comparing bj(a) over quantiles, we can
determine whether the influence of xijt is stronger on the hits
than on the niches and can further determine whether this influence contributes to the superstar or the long tail phenomenon.
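The marginal effects discussed above follow directly from Eq. (3). As a worked step, write R for USERRATINGRit and V for the centered variety term (shorthand introduced here only for brevity), recall that USERRATINGRSQit is the square of R, and treat R and V as continuous:

```latex
\begin{aligned}
\frac{\partial \log(\mathit{WEEKLYDOWNLOAD}_{ait})}{\partial V} &= b_{4}(a) + b_{3}(a)\,R, \\
\frac{\partial \log(\mathit{WEEKLYDOWNLOAD}_{ait})}{\partial R} &= b_{1}(a) + 2\,b_{2}(a)\,R + b_{3}(a)\,V .
\end{aligned}
```

The first line reproduces the b4 + b3 × USERRATINGRit expression for product variety. The second line shows that the marginal effect of a rating change also depends on V and reduces to b1 + 2 b2 R when product variety is at its mean (V = 0).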
4.3. Quantile regression results
We estimate the quantile regression models using the aggregated weekly observations for a series of quantiles including
5th, 10th, . . . , 95th, 99th, in each of the four categories in period 2,
during which the coexistence of long tail and superstar phenomena
has been identified by our descriptive analysis (see footnote 5). Tables 4 and 5
present the descriptive statistics and correlation matrix of our
weekly CNETD data.
To facilitate the interpretation of the quantile regression estimation results shown in Table 6, Fig. 6 provides quantile plots
for some key independent variables for each category. We plot
each quantile regression estimator of these variables for $\alpha$ (the quantile), ranging from the 5th to the 99th, as the solid black curve. The gray
area illustrates the 95% confidence interval. For all categories, most
estimators are significant with narrow confidence intervals across
all quantiles, which shows good overall model fits.
We first examine the significance of the interaction effect between online user reviews and product variety because this interaction is closely related to our further inferences regarding the
impact of both online user reviews and product variety. As expected, the coefficient of $\mathit{WEEKLYVARIETY}^{C}_{t} \times \mathit{USERRATING}^{R}_{it}$ ($\beta_3$) is
significantly negative, which supports the proposed interaction effect between user reviews and product variety. In addition, we notice that this coefficient has a slightly downward slope across
quantiles. Given the significance of this interaction effect,
when $\mathit{WEEKLYVARIETY}^{C}_{t}$ is not zero (i.e., the mean), the overall
impact of online user reviews can be expressed by $\beta_1 \times \mathit{USERRATING}^{R}_{it} + \beta_2 \times \mathit{USERRATING}^{R\_SQ}_{it} + \beta_3 \times \mathit{WEEKLYVARIETY}^{C}_{t} \times \mathit{USERRATING}^{R}_{it}$. Thus, the magnitude of the impact of user reviews is
contingent on the level of product variety. The coefficients of
$\mathit{USERRATING}^{R}_{it}$ ($\beta_1$) and $\mathit{USERRATING}^{R\_SQ}_{it}$ ($\beta_2$) are both significantly
positive for most quantiles and increase over the quantiles.
This result suggests that the simple effect of positive user reviews
is positive and stronger for more popular products. However, the
opposite signs and opposite trends across quantiles of the simple
5 We also extend our empirical examination to additional categories, which generates qualitatively similar results. The results are available upon request.
Fig. 6. Quantile plots for key independent variables.
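effect ($\beta_1 \times \mathit{USERRATING}^{R}_{it} + \beta_2 \times \mathit{USERRATING}^{R\_SQ}_{it}$) and the interaction effect ($\beta_3 \times \mathit{WEEKLYVARIETY}^{C}_{t} \times \mathit{USERRATING}^{R}_{it}$) suggest that the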
total impact of positive user reviews, indicated by the summation
of the two terms, does not have a definite trend over product popularity. In other words, the effect depends on the level of product
variety. Without knowing the specific level of product variety, we
cannot determine whether positive user reviews reduce the heterogeneity of the consumption pattern, leading to the superstar phenomenon, or whether they help shift the demand from the hits to
the niches, leading to the long tail phenomenon. Similarly, in terms
of negative user reviews, we start by examining their simple effect at
each possible value of $\mathit{USERRATING}^{R}$ for negative user reviews (i.e.,
−0.5, −1, −1.5, −2), because of the opposite signs of the $\mathit{USERRATING}^{R}_{it}$
and $\mathit{USERRATING}^{R\_SQ}_{it}$ terms at these values. The simple effect of negative user reviews is
shown to be negative and becomes more negative at higher quantiles.
Nevertheless, the opposite signs and opposite trends across quantiles of the simple effect and the interaction effect with product
variety also suggest that the negative impact of negative user reviews on user choices of products does not have a definite trend
over different levels of product popularity.
In sum, these results show that the influence of online user reviews on the superstar and long tail phenomena cannot be determined from the reviews alone; it depends on the level of product variety. Omitting
the interaction effect between online user reviews and product
variety could result in misleading inferences because considering
the simple effect alone results in inaccurate conclusions. This
finding helps explain the mixed results regarding the impact of online user reviews on the skewed shape of user choices in extant
studies, because most works examine the impact in various
contexts but do not consider the influence of product variety (Duan
et al. 2009, Fleder and Hosanagar 2009, Hervas-Drane 2009,
Maryanchyk 2008, Oestreicher-Singer and Sundararajan 2009,
Zhao et al. 2008, Zhu and Zhang 2010).
In addition, the interaction effect between online user reviews
and product variety indicates that an increase in product variety
weakens the impact of online user reviews. In terms of
positive user reviews, the negative value of $\beta_3 \times \mathit{WEEKLYVARIETY}^{C}_{t} \times \mathit{USERRATING}^{R}_{it}$ combined with the positive simple effect
indicates that higher product variety leads to a less positive impact from positive user reviews. More product variety likewise weakens the impact of negative user reviews because, for negative ratings,
$\beta_3 \times \mathit{WEEKLYVARIETY}^{C}_{t} \times \mathit{USERRATING}^{R}_{it}$ is positive and offsets part of the negative simple effect. In other
words, higher product variety always dilutes the impact of user
reviews, regardless of their rating levels. The magnitude discrepancy between the impacts of positive and negative user reviews,
measured by $2 \times \mathit{USERRATING}_{it} \times (\beta_2 + \beta_3 \times \mathit{WEEKLYVARIETY}^{C}_{t})$, is
thus reduced by the increase in product variety. Moreover, the
slightly downward trend of the interaction effect ($\beta_3$) suggests
that the weakening effect of product variety on the influence
of user reviews is stronger for popular products than
for niche products. This result suggests that, when they have
more product choices, consumers may depend less on individual
product reviews, especially for popular products. This result is
somewhat counterintuitive because we expect consumers to be
more dependent on others’ feedback to make decisions when
facing a very large pool of choices. However, recent studies do
identify the herding effect among consumers for popular products, regardless of user reviews (Duan et al. 2009). Thus, a possible explanation for the result would be that when product
variety increases, consumers may simply follow previous consumers’ choices without resorting to user reviews.
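The dilution argument can be checked with a small numerical sketch. The code below is only an illustration under stated assumptions: ratings are taken to be centered (negative values are negative reviews), variety is mean-centered, and the coefficient values are hypothetical numbers that share the reported signs (positive for the rating and squared-rating terms, negative for the interaction). They are not estimates from Table 6.

```python
def review_effect(beta1, beta2, beta3, rating, variety):
    """Total effect of a rating on log weekly downloads.

    rating  : centred user rating (negative = negative review)
    variety : mean-centred product variety (0 = average variety)
    """
    simple = beta1 * rating + beta2 * rating ** 2
    interaction = beta3 * variety * rating
    return simple + interaction


# Hypothetical coefficients with the reported signs:
# beta1 > 0, beta2 > 0, beta3 < 0. They are NOT estimates from Table 6.
BETA1, BETA2, BETA3 = 0.30, 0.05, -0.002

for rating in (2.0, -2.0):
    at_mean = review_effect(BETA1, BETA2, BETA3, rating, variety=0)
    above_mean = review_effect(BETA1, BETA2, BETA3, rating, variety=50)
    print(f"rating {rating:+.1f}: mean variety {at_mean:+.2f}, "
          f"50 products above mean {above_mean:+.2f}")
```

With these illustrative values, the effect of a highly positive rating shrinks toward zero as variety rises, and so does the effect of a highly negative rating, which is the pattern described above.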
In terms of product variety, the coefficients of $\mathit{WEEKLYVARIETY}^{C}_{t}$
($\beta_4$) are significantly positive, with larger values at lower
quantiles. This result shows that the simple effect of product variety
has a stronger positive impact on niche products than on hit products. To evaluate the overall impact of product variety, we conduct
a series of estimations for $\beta_3 \times \mathit{USERRATING}^{R}_{it} + \beta_4$ at each possible
value of $\mathit{USERRATING}^{R}_{it}$, i.e., −2, −1.5, . . . , 1.5, 2. We find that the total
effect of product variety on software downloads is also positive
and shows a trend similar to its simple effect. In sum, enhanced
product variety leads to the development of a relatively fatter tail
in the download pattern by helping generate more demand for the
niches regardless of the rating level of user reviews, which contributes to the long tail formation. As proposed in many long tail theories (e.g., Brynjolfsson et al. 2011), providing more product
choices satisfies niche preferences of some consumers, who would
otherwise have to choose popular products despite their suboptimal match. Notably, for products with positive user reviews,
product variety might even have a negative impact on very popular
products, which echoes the “overchoice” effect argued by Gourville
and Soman (2005).
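The evaluation of the total variety effect at each rating value can be sketched as follows. This is an illustration only: it assumes centered ratings in half-star steps from −2 to 2, and the coefficients for the two quantile levels are hypothetical values picked to mimic the reported pattern (a positive variety coefficient that is larger at low quantiles, and a negative interaction coefficient that is slightly more negative at high quantiles). They are not the estimates in Table 6.

```python
def variety_effect(beta3, beta4, rating):
    """Change in log weekly downloads per additional product at a given rating."""
    return beta4 + beta3 * rating


# Centred ratings in half-star steps: -2.0, -1.5, ..., 1.5, 2.0.
RATINGS = [step / 2 for step in range(-4, 5)]

# Hypothetical coefficients per quantile level, chosen only to mimic the
# reported pattern (beta4 > 0 and larger at low quantiles; beta3 < 0 and
# slightly more negative at high quantiles). NOT estimates from Table 6.
COEFS = {
    0.10: {"beta3": -0.002, "beta4": 0.010},
    0.95: {"beta3": -0.003, "beta4": 0.004},
}

for alpha, coef in COEFS.items():
    effects = [variety_effect(coef["beta3"], coef["beta4"], r) for r in RATINGS]
    negative_at = [r for r, e in zip(RATINGS, effects) if e < 0]
    print(f"alpha={alpha}: effect turns negative at ratings {negative_at}")
```

Under these assumed values the total effect stays positive at every rating for the niche quantile, while for the very popular quantile it turns negative only at the highest ratings, matching the exception noted in the text.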
As a comparison, we also conduct the OLS regression estimation,
and the results of the key covariates are shown in the first row of
Table 6. The OLS estimators on $\mathit{USERRATING}^{R}_{it}$ ($\beta_1$) are significantly
different from the corresponding median estimators of the quantile
regression in all four categories, which suggests that the impact of
online user reviews on the download distribution ($\mathit{WEEKLYDOWNLOAD}_{it}$) is heavily skewed, and OLS regression might not be able to
provide a consistent estimation for the mean effect. Moreover, in
the categories of Digital Media Player and Download Manager, the
OLS estimators of the coefficients of $\mathit{USERRATING}^{R\_SQ}_{it}$ ($\beta_2$) are both
insignificant, whereas the corresponding quantile regression
estimates are significant. This OLS result could generate a misleading conclusion,
namely that the impact of user reviews with different rating levels is uniform, which contradicts the results from extant studies (Chevalier and Mayzlin 2006). Thus, the concern again is raised
that OLS regression might not be suitable when dealing with a
highly skewed distribution of the dependent variable. In addition,
OLS provides only one snapshot of the overall picture of how user
reviews and product variety influence software downloads, which
fails to explain the shape of the entire download distribution. Such
a comparison empirically highlights the advantages of using the
quantile regression over least-squares regression in long tail studies. Facilitated by this more systematic approach, the intertwined
and complicated relationships between online user reviews and
product variety and their impact on software downloads could offer
possible explanations and interpretations for the inconsistent findings in extant research regarding the long tail and superstar effects
of either demand side or supply side factors.
5. Discussion, limitations, and future research
The objective of this paper is to demonstrate how both demand
side and supply side factors in the online shopping environment
influence user choices, thus cumulatively contributing to changes
in online consumption patterns. We conduct our analysis in the
context of online software downloading that demonstrates the
overwhelming amount of product choices available in the current
electronic marketplaces. Our results provide a more comprehensive understanding of the underlying mechanism of long tail and
superstar phenomena by differentiating the levels of both user review ratings and product popularity, as well as recognizing the
interaction effect between user reviews and product variety.
The interaction effect identified between product variety and
online user reviews provides a new perspective for understanding
long tail and superstar effects. The interaction effect suggests a
more complicated relationship between online user reviews and
the heterogeneity of user choices. The increase in product variety
weakens the impact of both positive and negative user reviews,
and this weakening effect is more pronounced on popular products
than on niche products. Thus, suggesting and predicting a unified
trend in the impact of user reviews on consumption pattern across
various product popularity levels without considering the number
of products offered would lead to inaccuracies and biases. The
intricate and entangled mechanism, in which online user reviews
interact with product variety, contributes to the complexity of
and difficulty in the efforts to measure and understand the demand
pattern in this line of research. Therefore, the interaction effect
helps explain the coexistence of superstar and long tail phenomena
observed in this study. In contrast to the case of online user reviews, we are
able to identify a well-defined trend in the impact of product variety across product popularity levels. Overall, product variety
influences user choices in favor of the long tail formation, regardless of the rating level of user reviews. This result is consistent with
one of the key arguments of long tail advocates—that a more
extensive assortment of product choices could better match consumers’ diversified preferences (Anderson 2006, Brynjolfsson
et al. 2003). It also helps to explain the long tail phenomenon identified in our descriptive analysis.
Our results generate some new insights that help reconcile the
divergent findings regarding the different impacts of online user
reviews on the skewed distribution shape of user choices. One possible explanation for the mixed results on the impact of online user
reviews across different product popularity levels could be the failure to consider their interplay with product variety. Previous studies have been conducted in various contexts, with different levels
of product variety during their data collection periods. The interaction effect we identify between online user reviews and product
variety indicates that user reviews’ contribution to the skewed distribution of user choices depends on the specific product variety level within the study context.
In addition, this paper provides a potential explanation for different opinions regarding the mean impact of online user reviews
on user choices by pointing out that their impact varies with product popularity (Chen et al. 2004, Chevalier and Mayzlin 2006, Duan
et al. 2008, Liu 2006). Chevalier and Mayzlin (2006) found that
5-star reviews improve sales whereas 1-star reviews hurt sales,
and that the impact of 1-star reviews is greater than the impact of 5-star reviews. An increase in the valence of online user reviews thus results
in more sales. In contrast to this significant impact attributed to
online user reviews, Chen et al. (2004), Liu (2006) and Duan
et al. (2008) argued that user reviews do not influence user choices.
This inconsistency could be caused by a failure to analyze the impact of user reviews in a uniform way across the entire spectrum of
user choices.
This paper also contributes to the understanding of product
assortment strategies for online platforms. Most extant studies
agree that enhanced product variety benefits sales because of the
much lower inventory costs in online stores. However, Gourville
and Soman (2005) identified the “overchoice” effect, which refers
to the negative impact of an increased product assortment on
consumer choices. Their key argument is that large amounts of
complex information cause cognitive overload and anticipation of
regret. The impact is so profound that customers might decide
not to make any choice at all. Our results provide another perspective to interpret this potential negative impact of product variety.
The only case in which the increased product variety might result
in fewer downloads by consumers occurs in a small group of very
popular products that have received positive user reviews. In essence, this finding echoes the key argument of long tail advocates,
who claim that consumers with diversified preferences might
make suboptimal choices when facing a limited selection. In this
suboptimal case, very popular products with highly positive user
feedback are probably favored by some consumers as substitutes
for their best matches. If these consumers have preferences for certain niche products, as Anderson (2006) argued, overwhelmingly
high product variety would eventually shift these consumers’ final
choices from the hits to the niches that better satisfy their preferences. This finding also helps explain the observation identified in
our descriptive analysis that superstar products are individually
getting less popular over time.
Finally, the results of this study also provide some managerial
implications. Many companies are starting to consider leveraging
the impact of the emerging niche market. Schmidt (2005), Google’s
CEO, referenced the long tail in his statement about Google’s mission: serving both the ‘‘individual contributors, the small business,
the company where Joe or Bob is the CEO, the CIO, the CFO, and the
worker and the support person—a one person company, a two-person
company, a three-person company’’ and a very large number of customers as well. From the perspective of social planners, because
limitations on store space are less of a concern through Internet
channels, making more products available is always preferred to
increase consumer welfare (Brynjolfsson et al. 2003). However,
the observations from our descriptive analysis suggest that retailers should be cautious about switching their strategy from highlighting the hits to promoting the niches because superstars still
dominate the demand. Merchants need to pay careful attention
when making decisions about entering the niche market. Although
the niche products do meet some consumers’ demand, the evidence that they can gain a larger market share in this more competitive market is lacking. In addition, the identified interaction
effect suggests that retailers should adjust the product selection
criteria to their marketing strategies. Retailers that face capacity
limits and thus focus on marketing superstars should carefully select high-quality products or tilt their selections toward consumers’ mainstream tastes to stimulate positive user reviews. For online
retailers whose products seriously suffer from an excess of negative user feedback, providing more products would be a promising
solution to diminish the negative impact of the feedback.
One limitation of this study is that the data do not allow us to
quantify the potential benefits from the coexistence of long tail
and superstar phenomena. Bentley et al. (2009) analytically concluded that the optimal inventory for digital retailers is far from
infinite because of the diminishing sales as they proceed further
into the tail products. An interesting extension of the current
study is to examine whether the niche market improves at the
expense of the hit market. In addition, we use a dummy variable
to control for the existence of expert ratings but do not consider
the valence of expert reviews because of the lack of an adequate
sample. Only a very small portion of software products (less than
20%) is reviewed by CNETD. The quantile regression results in
each category show that the existence of expert reviews can have
either a positive influence or a negative influence on software
downloading. This influence is stronger with respect to the more
popular products. One interesting extension of this study in future research is to examine the impact of expert ratings on the
whole distribution of consumer choices. Extant studies focused
primarily on either expert reviews or user reviews. Future research should therefore also investigate more comprehensively the impact of online WOM information generated by
reviewers with different identities on consumers’ decision-making. In
addition, evidence from the literature seems to suggest that product types may play a role in explaining the superstar and long tail
phenomena. Of course, the data used in this study are in the online software downloading market; yet, our analyses may very
likely yield additional findings and implications in other industries, which definitely calls for extended investigations in future
research that include more product types in one study. Furthermore, the use of free or free-to-try products in this study may
lead to concerns about the generalizability of our results. Online
users might treat decisions about free or free-to-try software programs less seriously than decisions to purchase more
expensive products. However, most software programs require
users to learn new interfaces and explore new functionalities,
and even free or free-to-try software programs often require significant commitments from users (Duan et al. 2009). For example,
Digital Media Player software is often used to manage media files,
and users need to input extensive information about various files
and invest significant effort to process these files. From this point
of view, free-to-download software programs are not substantially different from other software products that online users
have to purchase. Finally, our study considers the informational effect
on consumer choices on only one third-party website, CNETD.
Consumers might search multiple information resources online
before making product choices, especially for purchase decisions,
which leads to the important future research direction of examining how information from multiple online resources (e.g., both retail and third-party websites) influences consumer choices and
ultimately product sales.
Acknowledgements
The authors thank Refik Soyer, the reviewers and conference
participants of the 2009 China Summer Workshop on Information
Management (CSWIM 2009), International Conference on Information Systems (ICIS 2009), and seminar participants at George
Washington University for valuable comments on this research.
All errors are our own.
References
Aiken, L. S., and West, S. G. Multiple Regression: Testing and Interpreting Interactions.
Sage Publications, Thousand Oaks, CA, 1991.
Anderson, C. The Long Tail: Why the Future of Business Is Selling Less of More. Hyperion
Press, New York, NY, 2006.
Bakos, Y. Reducing buyer search costs: implications for electronic marketplaces.
Management Science, 43, 12, 1997, 1676–1692.
Bentley, R. A., Ormerod, P., and Madsen, M. E. Shelf space strategy in long-tail
markets. Physica A: Statistical Mechanics and its Applications, 388, 5, 2009, 691–
696.
Brynjolfsson, E., and Kemerer, C. F. Network externalities in microcomputer
software: an econometric analysis of the spreadsheet market. Management
Science, 42, 12, 1996, 1627–1647.
Brynjolfsson, E., Hu, Y., and Smith, M. Consumer surplus in the digital economy:
estimating the values of increased product variety at online booksellers.
Management Science, 49, 11, 2003, 1580–1596.
Brynjolfsson, E., Hu, Y., and Smith, M. From niches to riches: the anatomy of the long
tail. Sloan Management Review, 47, 4, 2006, 67–71.
Brynjolfsson, E., Hu, Y., and Smith, M. The long tail: the changing shape of Amazon’s
sales distribution curve. Working paper, SSRN, 2010a. http://ssrn.com/abstract=167999.
Brynjolfsson, E., Hu, Y., and Smith, M. Long tails vs. superstars: the effect of
information technology on product variety and sales concentration patterns.
Information Systems Research, 21, 4, 2010b, 736–747.
Brynjolfsson, E., Hu, Y., and Simester, D. Goodbye Pareto principle, hello long tail:
the effect of search costs on the concentration of product sales. Management
Science, 57, 8, 2011, 1373–1386.
Cheema, A., and Papatla, P. Relative importance of online versus offline information
for internet purchases: the effect of product category and Internet experience.
Journal of Business Research, 63, 9–10, 2010, 979–985.
Chen, P. Y., Wu, S. Y., and Yoon, J. The impact of online recommendations and
consumer feedback on sales. In Proceedings of the 25th International Conference
on Information Systems (ICIS 2004), Washington, DC, December 12–14, 2004,
711–724.
Chevalier, J. A., and Mayzlin, D. The effect of word of mouth on sales: online book
reviews. Journal of Marketing Research, 43, 3, 2006, 345–354.
Clemons, E., Gao, G., and Hitt, L. When online reviews meet hyperdifferentiation: a
study of the craft beer industry. Journal of Management Information Systems, 23,
2, 2006, 149–171.
Duan, W., Gu, B., and Whinston, A. B. The dynamics of online word-of-mouth and
product sales—an empirical investigation of the movie industry. Journal of
Retailing, 84, 2, 2008, 233–242.
Duan, W., Gu, B., and Whinston, A. B. Informational cascades and software adoption
on the internet: an empirical investigation. MIS Quarterly, 33, 1, 2009, 23–48.
Elberse, A., and Oberholzer-Gee, F. Superstars and underdogs: an examination of the
long tail phenomenon in video sales. Working paper no.07-120, Harvard
Business School, Cambridge, MA, 2008.
Fleder, D., and Hosanagar, K. Blockbuster culture’s next rise or fall: the impact of
recommender systems on sales diversity. Management Science, 55, 5, 2009, 697–
712.
Frank, R. H., and Cook, P. J. The Winner-Take-All Society. The Free Press, New York,
NY, 1995.
Gallaugher, J. M., and Wang, Y. M. Understanding network effects in software
markets: evidence from web server pricing. MIS Quarterly, 26, 4, 2002, 303–327.
Ghose, A., and Gu, B. Search costs, demand structure and long tail in electronic
markets: theory and evidence. Working paper no. 06-19, NET Institute, 2007.
Godes, D., and Mayzlin, D. Using online conversations to study word of mouth
communication. Marketing Science, 23, 4, 2004, 545–560.
Goldmanis, M., Hortaçsu, A., Syverson, C., and Emre, O. E-commerce and the market
structure of retail industries. The Economic Journal, 120, 545, 2009, 651–682.
Gourville, J. T., and Soman, D. Overchoice and assortment type: when and why
variety backfires. Marketing Science, 24, 3, 2005, 382–395.
Hansen, F. Psychological theories of consumer choices. Journal of Consumer Research,
3, 3, 1976, 117–142.
Hervas-Drane, A. Word of mouth and taste matching: a theory of long tail.
Working paper no. 07-14, NET Institute, 2009.
Judd, C. M., and McClelland, G. H. Data Analysis: A Model Comparison Approach.
Harcourt Brace Jovanovich, San Diego, CA, 1989.
Kenny, D. A., Kashy, D. A., and Bolger, N. Data analysis in social psychology. In D.
Gilbert, S. Fiske, and G. Lindzey (eds.). Handbook of Social Psychology, Vol. 1,
McGraw-Hill, Boston, MA, 1998, 233–265.
Koenker, R., and Bassett, G. Regression quantiles. Econometrica, 46, 1, 1978, 33–50.
Liu, Y. Word of mouth for movies: its dynamics and impact on box office revenue.
Journal of Marketing, 70, 3, 2006, 74–89.
Maryanchyk, I. Are ratings informative signals? The analysis of the Netflix data.
Working paper no. 08-22, NET Institute, 2008.
Oestreicher-Singer, G., and Sundararajan, A. Recommendation networks and the
long tail of electronic commerce. Working paper no. 09-03, NET Institute, 2009.
Rosen, S. The economics of superstars. American Economic Review, 71, 5, 1981, 845–
858.
Schmidt, E. Presentation, Annual Stockholders’ meeting, Google Inc., May 12, 2005.
Tan, T. F., and Netessine, S. Is Tom Cruise threatened? Using Netflix prize data to
examine the long tail of electronic commerce. Working paper, Wharton
Business School, University of Pennsylvania, Philadelphia, PA, 2009.
Tucker, C., and Zhang, J. J. Long tail or steep tail? A field investigation into how
online popularity information affects the distribution of customer choices.
Working paper no. 4655-07, Massachusetts Institute of Technology, Cambridge,
MA, 2007.
Zhao, X., Gu, B., and Whinston, A. B. The influence of online word-of-mouth long tail
formation: an empirical analysis. In Proceedings of the Conference on Information
Systems and Technology (CIST 2008), Washington, DC, October 11–12, 2008.
Zhu, F., and Zhang, X. Impact of online consumer reviews on sales: the moderating
role of product and consumer characteristics. Journal of Marketing, 74, 2, 2010,
133–148.