A Bayesian Lifetime Model for the "Hot 100" Billboard Songs
Eric T. Bradlow and Peter S. Fader
Abstract
People have long been enamored by ranked lists of people (e.g., "best-dressed" lists), places (e.g., best cities to live in), things (e.g., most popular songs, books, and movies) and countless other entities. Likewise, people are equally interested in watching these ranks evolve over time, and in speculating about possible future changes (e.g., who will be #1 next week?). We focus on a popular, but fairly typical, ranked list (the Billboard "Hot 100" Songs) in order to explain and model the simultaneous movement of multiple items (songs) up and down the chart over time. While our interest in Billboard data partly reflects the glamour of the music industry, these charts provide a very rich and general data structure. Surprisingly little research has been done on time-series models for ranked objects. We further enrich the dataset by adding covariates (e.g., artist history) to capture additional sources of variation across songs and over time. We posit a model for the time-series of charts based on a latent lifetime (worth) process. Specifically, the latent popularity of each song is assumed to follow a generalized gamma (GG) lifetime curve with double exponentially (DE) distributed errors. The immense flexibility of the GG family allows the mean of a song's latent worth process to follow an exponential, Weibull, lognormal, or gamma curve (among others), reflecting the many possible paths it might take through the charts. The DE error structure is used for convenience as it leads to a well-established "exploding" multinomial-logit likelihood. This framework is embedded in a Bayesian structure in which parameters of the GG curve are song specific, with means related to observed covariates, and assumed to come from a multivariate lognormal prior. Inferences from the model are obtained from posterior samples using Markov chain Monte Carlo techniques.
KEY WORDS: Exploding Likelihood, Latent Worth, Ranked Data, Time Series
Eric T. Bradlow is Assistant Professor of Marketing and Statistics and Peter S. Fader is Associate Professor
of Marketing, The Wharton School of the University of Pennsylvania, Philadelphia, PA. This research was
supported by the Sol C. Snider Entrepreneurial Center, The Wharton School of the University of Pennsylvania.
1 Introduction
Whether it is former Wharton School Dean Thomas P. Gerrity standing on top of the goalposts at the University of Pennsylvania's Franklin Field celebrating the #1 position in the Business Week business school rankings (1999), rankings of economics journals reported in the American Economist (Davis, 1998), rankings of sales force personnel (Agency Sales Magazine, 1999), the New York Times best-seller list of books, or the Environmental Protection Agency's "Nifty Fifty List" of top chemical polluters (Pollution Engineering, 1998), people and institutions care about rankings. The psychological and economic impact of being "on the list" can be staggering, with far-reaching impact on awareness, perceptions, and profits. In many cases, ranked lists provide the most significant source of information that consumers utilize for a given product domain (e.g., Consumer Reports), and in such cases a firm's ranking can make or break its future prospects.
In and of itself, therefore, a single ranked list can generate much managerial interest; moreover, in those cases where the ranked list is constructed repeatedly over time, people care not only about their current ranking but also about their movements from one time point to the next. In many cases, perhaps including all of the examples above, a change in one's ranked position, either up or down, may be more impactful than the position itself.
Our expectation, when considering the widespread existence of time-series of ranked lists, was that a considerable body of literature, whether in statistics or the social sciences, would exist on this topic. Not surprisingly, our search efforts found much recent research on time-series models (Chatfield 1996; West and Harrison 1999) and probability models for ranked data (the Thurstone-Mosteller-Daniels model: Daniels, 1969; Stern, 1990; Fligner and Verducci, 1992), but in sharp contrast we uncovered virtually no research at the intersection of these two popular topics. In this respect, our research appears to be entirely novel.
Before describing our modeling efforts in mathematical detail (in Section 5), we provide an informal heuristic description of our model to highlight its basic features and scientific interest.
We posit a probability model for a time-series of ranked data based on a Bayesian latent lifetime "worth" process. That is, at every point in time, each unit (e.g., song, book, or business school) has a non-negative latent worth, determined by a stochastic utility model (McFadden, 1974), in which the latent worth is the sum of a deterministic (mean latent) component and a stochastic error term. The Bayesian nature of our model is reflected by the fact that the parameters governing each unit's latent worth function are assumed to come from a common population multivariate distribution. This allows for the borrowing of information across units, reflecting the commonalities likely to occur in the way that units move throughout the ranked lists over time. The mechanics, and hence the science and difficulty of the problem, therefore lie in the choice of the mean latent worth function and the distribution of the random stochastic errors. Our choices and rationale for each are described next.
When imagining the path that a unit could take through a set of ranked lists over time (i.e., its vector of observed ranks), one can imagine many possible "shapes". For instance, in lists of the Hot 100 Billboard Songs (our application in this research), some songs peak very soon after they enter the chart and decay thereafter, until they drop off the chart entirely. Others, in contrast, might gain popularity slowly over time (perhaps due to spreading word-of-mouth) and thus have more of a "bell-shaped" history. Naturally, our goal was to choose a latent worth function (curve) that was flexible enough to allow for these and many other shapes over time. To accommodate this wide variety of possible shapes, we chose a family of curves associated with a very flexible probability density function, namely the Generalized Gamma (McDonald and Butler, 1990), to capture the pattern of changes in the latent worth of each unit over time. The Generalized Gamma is an extension of the standard gamma family with two shape parameters, one scale parameter, and a shift parameter. For instance, the Exponential, Weibull, Gamma, and Lognormal distributions are all special cases of this family, thus indicating its flexibility. It is important to note, however, that we are not using the Generalized Gamma in its standard way as a density;
we are intentionally calling it a "curve," which will be used to represent the intensity function that a song's deterministic worth is assumed to follow over time. To keep this distinction clear, we will make reference to this time-varying intensity function as a GGC (Generalized Gamma Curve).

In addition, our Bayesian framework assumes that each unit's GGC parameters are drawn from a population distribution with mean related to observed unit covariates. In this manner, we are able to relate the shape of a unit's GGC to its known characteristics (e.g., a musical artist with many previous chart hits may have a GGC that rises quickly, thus affecting its shape parameters). Further details of the GGC specification are given in Section 5.
Our choice of stochastic error distribution, which was driven by mathematical convenience, follows that of Chapman and Staelin (1982). In their research they demonstrate that by assuming i.i.d. double exponentially distributed stochastic errors in a random utility model, the overall likelihood for an entire ranked list can be expressed as a so-called "exploding likelihood." That is, the probability of any given ranking can be computed by an "explosion" of conditional probabilities, where, for example, $P(ABC) = P(A \text{ is ranked } 1, B \text{ is ranked } 2, C \text{ is ranked } 3) = P(A=1)\,P(B=2 \mid A=1)\,P(C=3 \mid A=1, B=2)$; i.e., each term in the explosion is given by an appropriate multinomial choice probability. Thus, the essence of our model is the unique juxtaposition of the dynamic worth process (using a GGC) to capture each unit's latent popularity over time, along with an exploding-likelihood formulation to convert this set of latent worths into a set of observable ranks at any given point in time.
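To make the explosion concrete, the following sketch (ours, not the authors' code) computes the probability of one observed ordering from a set of hypothetical deterministic worths; each factor is a multinomial-logit choice probability over the units not yet ranked.

```python
import math

def exploded_logit_prob(worths):
    """Probability of the observed ordering via the Chapman-Staelin
    "explosion": `worths` lists deterministic worths in rank order, and
    each stage is a multinomial-logit choice among the units that remain
    unranked at that stage."""
    prob = 1.0
    for i in range(len(worths)):
        denom = sum(math.exp(v) for v in worths[i:])
        prob *= math.exp(worths[i]) / denom
    return prob

# P(ABC) with illustrative worths v_A = 2.0, v_B = 1.0, v_C = 0.5:
# P(A = 1) * P(B = 2 | A = 1) * P(C = 3 | A = 1, B = 2)
p_abc = exploded_logit_prob([2.0, 1.0, 0.5])
```

Note that summing this probability over all 3! orderings of the same three worths returns 1, so the explosion defines a proper distribution over rankings.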
The remainder of the paper is laid out as follows. In Section 2, we provide our motivating example, the set of all 52 weekly "Hot 100" Billboard Song Charts from 1993. An exploratory analysis of the Billboard data is presented in Section 3, in which we define a five-number summary of each song used to describe its features and provide a basis for evaluating model fit. In Section 4 we lay out the general data structure (and relevant notation) modeled by our approach. The generality of the structure is emphasized to indicate its applicability to many common datasets. A detailed specification of our Bayesian latent lifetime model approach, including a description of the GGC curves, stochastic error structure, and prior densities, is given in Section 5. In addition, a brief description of our computational approach using Markov chain Monte Carlo (MCMC) methods is given. Model results and inferences are presented in Section 6. A brief summary of our findings and areas for future research are described in Section 7.
2 The Hot 100 Billboard Song Charts
Since 1913, Billboard Magazine has chronicled the rise and fall of over 20,000 popular songs in America. Throughout the first half of the century, Billboard published a variety of ranked charts, including best-sellers in retail stores, top songs in terms of radio airplay, best-sellers in sheet music, and most-played songs in juke boxes. But while these different ranking methods covered different aspects of the music industry, none of them stood out as the single best gauge of overall song popularity. Thus, in 1958, Billboard combined two of these measures (retail sales and airplay) to create the Hot 100, which quickly became "the definitive weekly ranking of America's most popular songs" (Whitburn 1996). Soon afterwards, according to Whitburn, Billboard discontinued most of these other charts in favor of the all-encompassing Hot 100.[1]
The impact of the Hot 100 on the music industry and American popular culture has been quite dramatic. The concept of the "Top 40" (which is simply the set of songs occupying the top 40 slots in the Hot 100 in any given week) is deeply entrenched in our society. Furthermore, phrases such as "with a bullet" (used to describe a song that has moved up the charts very quickly) have become ubiquitous, and clearly indicate the importance of week-to-week changes in the Hot 100 rankings. In addition, the business press frequently comments on new and different marketing tactics that record labels use to try to improve a song's chart position (e.g., "'Chatting' a Singer Up the Pop Charts," Wall Street Journal, 1999).

[1] In 1984, Billboard began publishing separate sales and airplay charts once again, but neither of these charts, nor any of the dozens of other charts currently put out by Billboard, has visibility or impact within the music industry anywhere comparable to the Hot 100.
Over the years, Billboard has tinkered with different methods to combine the raw sales and airplay figures to derive the Hot 100 rankings, but the underlying formulas are generally kept confidential. In 1998, for instance, Billboard announced a dramatic change to its chart procedures, to reflect the fact that many of today's songs go "straight to radio" without ever being available for retail purchase as singles. Billboard's own description of the change is insightful and relevant to the point of our paper:

"Throughout the years, Billboard's charts have changed to better reflect marketplace conditions. With the Dec. 5, 1998 issue, the Billboard Hot 100 chart underwent a profound change that seeks to utilize new applications of modern technology to restore the eclectic flavor in which the chart was originally steeped. The goal was deceptively simple: to reveal the most popular songs in the United States. Period."
This bottom-line conclusion fits well with the objectives of our model. While the publishers of Billboard magazine are unwilling to formally quantify the precise magnitude of popularity that a particular song enjoys at a given moment in time, they are quite comfortable making statements about the relative popularity of a set of songs (i.e., producing a set of ordered ranks, "a chart"). This data constraint requires us to utilize an ordinal modeling approach (in this research, the exploded logit model described above, as we do not observe the underlying interval-scaled radio airplay and sales figures). Similarly, since the changes in popularity (or "worth") of each song over time are also not directly observable in any way, we rely on a latent "curve-fitting" process in order to connect the time-series of charts together.
As the basis for our empirical analysis, we use the 52 weekly Billboard Hot 100 charts from 1993. This dataset includes a total of 444 unique songs, i.e., 100 songs that comprised the chart in the first issue of Billboard magazine that year (1/9/93), and 344 additional songs that entered the chart during the subsequent 51 weeks. In the next section, we describe some summary statistics that offer some tangible examples of the different patterns evident in the dataset.
When running the model, we augment the rank (chart) data with a set of covariates that may help explain some of the heterogeneity in chart movements over time and across songs. A number of descriptive characteristics of songs can be gleaned from Top Pop Singles: 1955-1996 (Whitburn, 1997), a book that provides detailed information about the history of each artist, as well as information about each individual song. In a preliminary analysis, we observed that many candidate covariates are highly collinear with one another (e.g., number of previous #1 hits and number of previous Hot 100 songs by each artist). We therefore pared the set of covariates down to two reasonably independent measures: the number of previous Hot 100 songs by each artist, and a binary variable indicating whether or not each song appeared on a movie soundtrack. This is not a particularly comprehensive set of covariates; other measures (not available from Whitburn's book) might include record reviews, concert tour and/or other promotional information, and additional descriptive attributes of the artist (e.g., group/single, male/female, musical genre). But for this initial application, we rely on a more limited set of measures out of convenience, since our principal goal is simply to demonstrate the manner in which covariates can affect chart positions.
3 Summarizing Chart Movements
Given the ordinal nature of our data, there are limits on the types of summary statistics that we can create and meaningfully discuss. Nevertheless, the "birth-growth-decline-death" process that each song exhibits during its "life" on the chart suggests a straightforward set of summary measures. While every song may be unique in the exact path it follows, there are a number of
natural comparisons that can be made, including:
(i) DEBUT: Each song's debut position; i.e., its rank during its first week on the chart.
(ii) WKS2PEAK: The number of weeks from debut until the song reaches its peak rank on
the chart.
(iii) PEAK: The top ranking that each song attains during its life on the chart.
(iv) TOTWKS: The total number of weeks that the song spends on the chart.
(v) EXIT: The song's exit position; i.e., its rank during its last week on the chart.
This five-measure battery captures most of the useful and interesting information contained in each song's trajectory through the chart over time. To make these summaries as meaningful and comparable as possible, we omit 196 songs (from our presentation of results) that have incomplete information on one or more of these measures. More specifically, in chart week 1 (1/9/93), there were four new songs entering the chart, so we lack complete information about several measures (such as DEBUT) for the 96 songs that had at least one week of previous chart history. Likewise, at the end of our dataset, we cannot observe several measures (such as EXIT and TOTWKS) for all 100 songs in the chart. Rather than showing summary plots with a varying set of eligible songs, we choose instead to present results only for the 248 songs with complete data. Keep in mind, however, that we do not impose any such restrictions on the model itself; that is, we utilize all 52 x 100 observed chart positions (the 444 songs) to actually estimate the model parameters.
To help describe this subset of songs, we present a series of histograms for each of the five summary measures. Keep in mind that three of these histograms (DEBUT, PEAK, EXIT) are defined based on chart positions, so smaller numbers (ranks) indicate better performance on the chart. The other two measures are defined in terms of weeks.
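As a sketch of how these five measures follow from a song's observed trajectory (the rank vector below is a hypothetical example of ours; rank 1 is best), consider:

```python
def chart_summary(ranks):
    """Five-number summary of a song's chart life.  `ranks` gives the
    song's rank (1 = best) for each consecutive week it appeared on the
    chart, in chronological order."""
    peak = min(ranks)                       # best (lowest) rank attained
    return {
        "DEBUT":    ranks[0],               # rank in first charted week
        "WKS2PEAK": ranks.index(peak) + 1,  # weeks from debut to peak (debut week = 1)
        "PEAK":     peak,
        "TOTWKS":   len(ranks),             # total weeks on the chart
        "EXIT":     ranks[-1],              # rank in final charted week
    }

# Hypothetical trajectory: debut at #80, peak at #12 in week 4, exit at #95
s = chart_summary([80, 55, 30, 12, 20, 47, 95])
```

A song whose best rank comes in its debut week gets WKS2PEAK = 1 under this convention, matching the usage in the text.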
[INSERT FIGURE 1 HERE]
From these figures, it is evident that the vast majority of songs enter and exit from the bottom part of the chart. This is consistent with the lifetime metaphor discussed above, but at the same time, it is important to note that a small number of songs enter the chart at a rather high position. The best DEBUT in this particular dataset is associated with the song "That's the Way Love Goes," by Janet Jackson, which entered the chart at #14. (Every few years, there will be a song that manages to debut at #1.)
Among the many songs with poor DEBUT and poor EXIT positions, most have a short, uneventful life (i.e., low TOTWKS, poor PEAK, low WKS2PEAK), but several are able to successfully climb through the chart, reaching a high peak position and/or remaining on the chart for a long time. Across the 248 songs with complete information, one entry, "Dazzey Duks" by Duice, was able to stay on the chart for 40 weeks, although it never got higher than #12. This song also took a remarkably slow journey (27 weeks) to reach that peak position.
Before moving on, it is worth pointing out that 36 songs (15% of those with complete information) attained their top ranking in their debut week (i.e., WKS2PEAK=1). Many of these songs stayed on the chart for a very short time, but there are some exceptions, such as "Runaway Love" by En Vogue, which debuted at #51 and remained on the chart for nine more weeks, moving down each time.

These anecdotal observations suggest that there are various different patterns across the set of lifetime curves. These differences provide a strong motivation both for the flexible GG family of curves and for our Bayesian estimation procedure, which will let us leverage the limited (rank order) data across songs as much as possible.
After developing and estimating the model, we will revisit this set of five measures to help determine how well the model is capturing these key characteristics. An adequate representation of these features can help justify the model specification and/or provide some useful diagnostic information to understand how and when the model falls short.
4 The General Data Structure and Notation
Our research provides a probability model for the following general data structure. Consider a set of $T \ge 1$ time-ordered ranked lists measured at time points $1, 2, \ldots, t, \ldots, T$. At each time point, we observe the ranking of $I$ objects given by the vector $\pi^{-1}_t = (\pi^{-1}_{1,t}, \ldots, \pi^{-1}_{I,t})$, where $\pi^{-1}_{i,t}$ is the ranking of the $i$-th object at time $t$, $\pi^{-1}_{i,t} < \pi^{-1}_{i',t}$ indicates a better ranking of unit $i$ compared to unit $i'$ (i.e., a lower ranking indicates greater relative value), and $\pi^{-1}_{i,t} \in \{1, \ldots, R, \mathrm{NR}\}$ with $1 \le R \le I$ and NR indicating that the object is not ranked (in the top $R$) at time $t$. We further define $\Pi^{-1}$ with elements $\pi^{-1}_{i,t}$, the $I \times T$-dimensional observed ranking matrix.
As a stylized example, consider a time series of $T = 3$ lists, each containing the top $R = 5$ weekly ranked units, given by

$$
\Pi^{-1} = \begin{pmatrix}
1 & 2 & \mathrm{NR} \\
2 & 1 & 5 \\
3 & \mathrm{NR} & \mathrm{NR} \\
4 & 3 & 3 \\
5 & \mathrm{NR} & \mathrm{NR} \\
\mathrm{NR} & 4 & 2 \\
\mathrm{NR} & 5 & 4 \\
\mathrm{NR} & \mathrm{NR} & 1
\end{pmatrix}
$$

This data indicates, for example, that $I = 8$ unique units were ranked at some point during the three weeks, unit 1 was the highest ranked in week 1, 2nd highest in week 2, and not ranked in week 3, and unit 8 entered the charts in week 3 as the highest ranked. We further define
the inverse ranking matrix $\Pi$ with elements $\pi_{i,t}$, the unit ranked $i$-th at time $t$. Specifically, the corresponding inverse matrix to $\Pi^{-1}$ above is given by

$$
\Pi = \begin{pmatrix}
1 & 2 & 8 \\
2 & 1 & 6 \\
3 & 4 & 4 \\
4 & 6 & 7 \\
5 & 7 & 2 \\
\mathrm{NR} & \mathrm{NR} & \mathrm{NR} \\
\mathrm{NR} & \mathrm{NR} & \mathrm{NR} \\
\mathrm{NR} & \mathrm{NR} & \mathrm{NR}
\end{pmatrix}
$$
In this matrix, each row represents a chart position, and each entry in that row identifies the particular song that held that position from week to week. Note that we have no information to evaluate or compare the worth of the $I - R$ songs that are not included on the chart in any given week; all of these songs are lumped together with the indistinguishable NR label.
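The correspondence between the ranking matrix and its inverse can be sketched in code (our illustration, not part of the original analysis; NR is encoded as None, and unit/rank labels are 1-based as in the text):

```python
NR = None  # "not ranked" marker

def invert_rankings(pi_inv):
    """Convert a ranking matrix into its inverse.  pi_inv[i][t] is unit
    (i+1)'s rank at time t, or NR.  Returns pi of the same I x T shape,
    where pi[r][t] is the unit holding rank r+1 at time t; slots beyond
    the top R remain NR."""
    I, T = len(pi_inv), len(pi_inv[0])
    pi = [[NR] * T for _ in range(I)]
    for t in range(T):
        for i in range(I):
            r = pi_inv[i][t]
            if r is not NR:
                pi[r - 1][t] = i + 1
    return pi

# The stylized T = 3, R = 5, I = 8 example from the text
pi_inv = [
    [1,  2,  NR],
    [2,  1,  5 ],
    [3,  NR, NR],
    [4,  3,  3 ],
    [5,  NR, NR],
    [NR, 4,  2 ],
    [NR, 5,  4 ],
    [NR, NR, 1 ],
]
pi = invert_rankings(pi_inv)
```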
In addition to $\Pi$ and $\Pi^{-1}$, we further assume that for each unit $i$ we have available a fully observed covariate vector $x_i = (x_{i,1}, \ldots, x_{i,P})$, $P \ge 0$. Allowing for missing (partial) rankings in $\Pi$, $\Pi^{-1}$, or missing covariates in $X = (x_1, \ldots, x_I)$ is an area for future research. Thus, for our empirical example, $R = 100$ reflecting the size of the "Hot 100" charts, $T = 52$ indicating a year's worth of weekly data, $I = 444$ indicating the number of unique songs that appeared in 1993, and $P = 2$ corresponding to the two covariates (# previous hits, and movie soundtrack or not).
5 A Bayesian Latent Lifetime Model
Model Description:
The basis of our probability model for a given observed ranking matrix $\Pi^{-1}$ is a latent lifetime process. Each unit is hypothesized to have some latent lifetime "worth," which it begins to accumulate from the moment that it is "born" (defined below) and all the way through (and beyond) its "life" on the chart. This accumulation process will obviously vary over time, i.e., more is accumulated in periods in which it is ranked highly and less so in periods of poor or non-ranked status. That is, the ranking of unit $i$ at time $t$ is determined by the amount it accumulates, in relation to all other units that are "alive" (defined below) during that interval of time (i.e., a week).

More formally, the amount that each unit $i$ accumulates at time period $t$, denoted $w_{i,t}$, is assumed to be derived from a stochastic worth formulation given by

$$w_{i,t} = v_{i,t} + \epsilon_{i,t} \qquad (1)$$
where $w_{i,t}$ is the overall latent worth of unit $i$ at time $t$, $v_{i,t}$ is its deterministic component, and $\epsilon_{i,t}$ is a stochastic error term. As described in Section 1, we allow for each unit's accumulation pattern to follow a flexible family of possible shapes by specifying $v_{i,t}$ in accordance with a Generalized Gamma Curve (GGC). Specifically, this defines:

$$
v_{i,t} = \lambda_t \lambda_i \, \mathrm{GGC}_{i,t}
        = \lambda_t \lambda_i \, \frac{c_i \,(t - (z_i - \tau_i))^{r_i c_i - 1} \exp(-(t - (z_i - \tau_i))^{c_i})}{\Gamma(r_i)} \, \mathbf{1}(t \ge (z_i - \tau_i)) \qquad (2)
$$

where $\lambda_t$ and $\lambda_i$ are scale parameters for each week and song, $r_i$ and $c_i$ are GGC shape parameters, $\tau_i$ is a GGC shift parameter, $z_i$ is the list number (week) at which unit $i$ first appeared, $\Gamma(\cdot)$ is the gamma function, and $\mathbf{1}(\cdot)$ is an indicator function. To identify the model we set one of the $\lambda_i = 1$. An alternative way to view our stochastic worth formulation is that $\log(v_{i,t})$ has a standard additive linear model structure (with interaction term) equal to $f(t) + f(i) + f(i,t)$, where $f(t) = \log(\lambda_t)$, $f(i) = \log(\lambda_i)$, and $f(i,t) = \log(\mathrm{GGC}_{i,t})$.
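A minimal numerical sketch of the GGC intensity in (2) (ours; the parameter values are illustrative, with the scale parameters set to 1 by default):

```python
import math

def ggc_worth(t, r, c, z, tau, lam_i=1.0, lam_t=1.0):
    """Deterministic worth v_{i,t} from equation (2): a generalized gamma
    intensity in the song's age a = t - (z - tau), zero before birth."""
    a = t - (z - tau)
    if a <= 0:                       # indicator term: no worth before birth
        return 0.0
    return lam_t * lam_i * c * a ** (r * c - 1) * math.exp(-(a ** c)) / math.gamma(r)

# r*c <= 1 gives an exponential-like decline from birth (e.g., r = c = 1);
# r*c > 1 gives a bell shape peaking at age ((r*c - 1)/c)**(1/c).
r, c = 5.0, 0.5                      # illustrative shape parameters
peak_age = ((r * c - 1) / c) ** (1 / c)   # = 9.0 for these values
```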
A more detailed description of the features of this model, and the interpretation of its parameters, can be given via the following "life story" for a hypothetical unit. Without loss of generality, we utilize the terminology from our application to the Hot 100 Billboard Charts for ease of explanation. Each song (unit) $i$ has a chart (list) in which it first appears, $z_i$. In our application, $z_i = 1$ indicates songs that first appeared on the 1/9/93 chart, and thus all values of $z_i$ are defined relative to this anchor. Some number of weeks, $\tau_i \ge 0$, prior to $z_i$, unit $i$ was "born". So at week $t$, the age of song $i$ is $(t - (z_i - \tau_i))$, and therefore by definition (as per the indicator function in (2)), song $i$ cannot accumulate any of its worth prior to being born, $t < (z_i - \tau_i)$. The amount of song $i$'s worth that is accumulated at each week $t > (z_i - \tau_i)$ is then given by a song-specific GGC evaluated at discrete time points $t$ (a Bayesian formulation for the song-specific parameters is given below). As a simple example, if $v_{i,t}$ is estimated to be exponentially shaped (which would occur if $r_i = 1$ and $c_i = 1$), then that song would accumulate most of its worth early in its life, and little later on. On the other hand, the GGC also readily allows for songs that start near the bottom of the chart (debut rank near 100), move up through the chart to their respective peak positions, and decline thereafter. Furthermore, through the inclusion of $\tau_i$ an even more flexible set of paths is possible, as $\tau_i$ allows for songs to accumulate a portion of their worth prior to their appearance on the charts but after birth, $(z_i - \tau_i) < t < z_i$.
A set of GGCs is shown in Figure 2 for a range of values of $r$ and $c$ that roughly correspond to those observed in our empirical analysis. On the x-axis we utilize $a_{i,t} = (t - (z_i - \tau_i))$, the time since birth. For simplicity, we assume $\lambda_i = \lambda_t = 1$, and $z_i - \tau_i = 0$. Larger (smaller) values of $\lambda_i$ simply shift the entire curve upward (downward) for a given song, and likewise $\lambda_t$ will allow these upward/downward shifts to vary over time as well. Increasing the value of $z_i - \tau_i$ shifts the curve to the left, exposing only part of the curve.
[INSERT FIGURE 2 HERE]
It is interesting to note that the solution to $\partial \log(v_{i,t})/\partial a_{i,t} = 0$ is obtained at $r_i c_i = 1 + c_i a_{i,t}^{c_i}$. Thus a sufficient (but not necessary) condition for a downward sloping (exponential-like) worth curve is $r_i c_i \le 1$. Furthermore, the maximum worth is obtained at $a_{i,t} = ((r_i c_i - 1)/c_i)^{1/c_i}$, which is strictly increasing in $r_i$ when $r_i c_i > 1$. When we think about these characteristics of the GGC family (and examine Figure 2) in conjunction with the observed summary statistics discussed earlier, we can make the following conjectures about how these two sets of measures might relate to each other:
(i) Since the principal role of $c_i$ appears to be governing the peakedness of the GGC, we should expect a positive relationship between $c_i$ and observed PEAK.

(ii) From Figure 2, $r_i$ seems to dictate the "horizontal" shape of the GGC (i.e., the location of the mode and the thickness of the right tail), so we should expect a positive relationship between $r_i$ and both TOTWKS and WKS2PEAK.

(iii) For each song, $\lambda_i$ scales the entire GGC upward (or downward) without affecting its shape. Thus we should expect a positive relationship between $\lambda_i$ and both PEAK and TOTWKS.
These conjectures will aid in the interpretation of our findings in Section 6.

Finally, with respect to the $\lambda_t$ parameters, their role is to capture and reflect the model's ability to effectively discriminate among the observed rankings in each week. When the model is working well, i.e., when it is relatively easy to see differences in the GGCs for the songs ranked first, second, third, and so on, the value of $\lambda_t$ will be large, which basically amplifies these differences and enhances the probability of seeing that particular ordering of songs. In contrast, if the song-by-song differences are rather murky, then $\lambda_t$ will be relatively small, and the probability of seeing any particular ordering will shrink towards a uniform distribution. The specific mechanism that takes the GGC curves and these parameters and converts them into probabilities and an overall likelihood function is discussed next.
Likelihood Specification:
We describe the likelihood for the data in terms of the inverse-ranking matrix $\Pi$; the exact same likelihood is obtained for $\Pi^{-1}$ but is notationally more cumbersome to explicate. The observed inverse-ranking vector at time $t$ can be conceptualized as a sequence of "exploded" conditional multinomial logit evaluations (Chapman and Staelin 1982). In this formulation, the probability of observing a given ranking $\pi_t = (\pi_{1,t}, \ldots, \pi_{R,t})$ is equivalent to the probability that unit $\pi_{1,t}$ has greater worth than all other units (i.e., it is ranked first), $\pi_{2,t}$ has the second greatest worth, ..., and $\pi_{R,t}$ has the $R$-th greatest worth.

In standard ranking situations, i.e., where the set of units to be ranked is fixed in size and known in composition, the probability of ranking vector $\pi_t$ is well-known and given by:
$$
\begin{aligned}
\mathrm{Prob}(\pi_t) &= \mathrm{Prob}(w_{\pi_{1,t},t} > w_{\pi_{2,t},t} > \cdots > w_{\pi_{R,t},t}) \\
&= \mathrm{Prob}\Big(w_{\pi_{1,t},t} \ge \max_{j=2,\ldots,R} w_{\pi_{j,t},t}\Big)\, \mathrm{Prob}\Big(w_{\pi_{2,t},t} \ge \max_{j=3,\ldots,R} w_{\pi_{j,t},t}\Big) \cdots \mathrm{Prob}\big(w_{\pi_{(R-1),t},t} \ge w_{\pi_{R,t},t}\big)
\end{aligned} \qquad (3)
$$

which yields, by assuming i.i.d. double exponentially distributed errors for $\epsilon_{i,t}$,

$$
\mathrm{Prob}(\pi_t) = \prod_{i=1}^{R} \frac{\exp(v_{\pi_{i,t},t})}{\sum_{i'=i}^{R} \exp(v_{\pi_{i',t},t})}. \qquad (4)
$$

The full likelihood for the data, assuming conditional independence of the ranks across time, is then given by

$$
\mathrm{Prob}(\Pi) = \prod_{t=1}^{T} \mathrm{Prob}(\pi_t). \qquad (5)
$$
In essence, a given song $i$'s probability in the exploded likelihood function in (4) is its exponentiated deterministic worth divided by the total exponentiated deterministic worth of that song and its "competition set", i.e., all songs ranked equal to and below that song during week $t$. However, this simple definition of the competition set is incorrect (for our application): it assumes that this set is composed solely of other songs that were ranked at time $t$ (i.e., the denominator in (4)). But there are two other groups of (unranked) songs that could have been ranked, and thus they may compete with those songs actually on the chart at time $t$. Group 1 consists of songs that had been on the chart at some point in the past, but are no longer ranked ($t > (z_i - \tau_i)$, $\pi^{-1}_{i,\tilde{t}} \le 100$ for some $\tilde{t} < t$, and $\pi^{-1}_{i,t'} = \mathrm{NR}$ for $t' > \tilde{t}$). This could include a song that just dropped off the chart this week, but it also covers the possibility that an old song could pop up on the chart again (one such example is Bing Crosby's "White Christmas," which has made many repeat trips back to the chart). In contrast, Group 2 consists of those songs that have been "born" but have never actually been on the chart ($(z_i - \tau_i) < t < z_i$). This includes songs that have just been released but haven't yet accumulated much worth. Some of these songs may reach the chart in the next week, and some may never get there, but collectively, these songs must be factored into the competition set.[2]
Thus we need to modify the likelihood, specifically the denominator in (4), to account for those additional non-ranked (and unobserved) songs that compete with the $R$ songs that were actually observed to be on the chart in a given week. Unfortunately, since the songs in Groups 1 and 2 are unobservable, there is no direct information in our dataset to incorporate their worths. Accordingly, we modify (4) using the following assumptions. Let $g_{1,t}, \ldots, g_{n_{1,t},t}$ denote the deterministic worths (akin to $v_{i,t}$) of the Group 1 songs at time $t$ and $h_{1,t}, \ldots, h_{n_{2,t},t}$ denote the deterministic worths of the Group 2 songs at time $t$. We then assume the following hierarchical normal distribution structure for their worths:

$$
\begin{aligned}
&[g_{n,t} \mid \mu_{1,t}, \sigma^2_{1,t}] \sim N(\mu_{1,t}, \sigma^2_{1,t}), \quad n = 1, \ldots, n_{1,t}; \; t = 1, \ldots, T, \\
&[h_{n,t} \mid \mu_{2,t}, \sigma^2_{2,t}] \sim N(\mu_{2,t}, \sigma^2_{2,t}), \quad n = 1, \ldots, n_{2,t}; \; t = 1, \ldots, T, \qquad (6) \\
&[\mu_{1,t} \mid \mu_1, \sigma^2_1] \sim N(\mu_1, \sigma^2_1), \quad t = 1, \ldots, T, \\
&[\mu_{2,t} \mid \mu_2, \sigma^2_2] \sim N(\mu_2, \sigma^2_2), \quad t = 1, \ldots, T,
\end{aligned}
$$

[2] While it is theoretically possible that one or more songs in our sample might drop off the chart and then reappear several weeks later, this never occurs in our dataset.
where $N(a, b)$ denotes a normal distribution with mean a and variance b, and $[x \mid y]$ denotes the
conditional distribution of x given y. Utilizing the distributions given in (6), our approach is
then as follows.

(i) Obtain an estimate of $E(\mu_{1,t})$ and $E(\mu_{2,t})$, the expected worth of a randomly selected
Group 1 and Group 2 song. A method for obtaining these estimates is described in detail
below.

(ii) Compute, for each time period t, $n_{1,t}\exp(E(\mu_{1,t}))$ and $n_{2,t}\exp(E(\mu_{2,t}))$, the total
exponentiated expected worth of Group 1 and Group 2 at time t. We discuss below how we
chose $n_{1,t}$ and $n_{2,t}$.
(iii) Modify the likelihood given in (4) to incorporate the Group 1 and Group 2 songs such
that

$$\mathrm{Prob}(\Omega_t) = \prod_{i=1}^{R} \frac{\exp(v_{\pi_i,t})}{\sum_{i'=i}^{R} \exp(v_{\pi_{i'},t}) + n_{1,t}\exp(E(\mu_{1,t})) + n_{2,t}\exp(E(\mu_{2,t}))}. \qquad (7)$$

Note that to make our extended competition set approach operational, we need to determine
$n_{1,t}$, $n_{2,t}$, $E(\mu_{1,t})$, and $E(\mu_{2,t})$ for $t = 1, \ldots, T$.
Unfortunately, there is no information in the data regarding $n_{1,t}$ and $n_{2,t}$. An attempt
to fit values of $n_{1,t}$ and $n_{2,t}$ to maximize the likelihood function was useless, since this led to
$n_{1,t} = n_{2,t} = 0$ for all t, i.e., making the denominator as small as possible. Furthermore,
placing priors on $n_{1,t}$ and $n_{2,t}$ might yield apparently reasonable values, but they would be
solely determined by the prior, an unsatisfactory situation. Ultimately, we turned to a much
less formal approach, namely to set $n_{1,t} = n_{2,t} = 50$ for all t. This assumption states that the
number of units in the "unseen" competition set is equal in size to the observed ranked list, and
equally split between Groups 1 and 2. Sensitivity analyses (e.g., additional runs with 90/10
and 10/90 splits) indicated little variation in our inferences. In fact, this assumption is hardly
restrictive, as it is the products $n_{1,t}\exp(E(\mu_{1,t}))$ and $n_{2,t}\exp(E(\mu_{2,t}))$ that appear
in the likelihood; even after fixing the sizes of the two groups, their total worth, determined by
the expectations, is still entirely free to vary.
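The mechanics of the extended denominator in (7) can be sketched in a few lines. The following Python helper is our illustration (not the authors' Fortran implementation), with made-up numbers; it computes the log of (7) for one week, given the ranked worths and the aggregate exponentiated worth of the unseen competition set:

```python
import numpy as np

def chart_log_likelihood(v, outside_mass):
    """Log-probability of one week's observed chart under the exploded
    logit of equation (7).

    v            : worths v_{pi_1,t}, ..., v_{pi_R,t}, already sorted in
                   observed rank order (rank 1 first).
    outside_mass : n_{1,t}*exp(E(mu_{1,t})) + n_{2,t}*exp(E(mu_{2,t})),
                   the exponentiated worth of the unseen competition set.
    """
    expv = np.exp(v)
    # The denominator for rank i sums over songs ranked i..R plus the
    # outside mass; a reverse cumulative sum gives all R denominators.
    denom = np.cumsum(expv[::-1])[::-1] + outside_mass
    return np.sum(v - np.log(denom))

# Hypothetical numbers for illustration: a 5-song "chart" and an outside
# competition set with n1 = n2 = 50 and E(mu_1) = E(mu_2) = -1.
worths = np.array([2.0, 1.5, 1.0, 0.5, 0.0])
mass = 50 * np.exp(-1.0) + 50 * np.exp(-1.0)
ll = chart_log_likelihood(worths, mass)
```

Note that without `outside_mass` this reduces to the standard exploded-logit likelihood of (4), so setting the outside worth to zero recovers the unextended model.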
In contrast, there is significant information in the data about $E(\mu_{1,t})$ and $E(\mu_{2,t})$ given the
following plausible assumption. We assume that the worths of the songs from the observed data
that satisfy the conditions for Group 1 and Group 2 are random samples from $N(\mu_{1,t}, \sigma^2_{1,t})$ and
$N(\mu_{2,t}, \sigma^2_{2,t})$, respectively. That is, the worths of those songs which used to be ranked but are no
longer on the chart (Group 1), and those songs which have been born but have not yet made it
onto the chart (Group 2), from the observed T weeks of data are drawn from the same Group
1 and Group 2 distributions given in (6) as the "unseen" part of the competition set. In effect,
this assumes stationarity over time in the distribution of songs which have left the charts, and
those which are born but have not appeared.
Using this assumption, the notation $g_{n,t}$, $h_{n,t}$ for the Group 1 and Group 2 songs at time t,
and their distributions assumed in (6), we obtain the following equations for $E(\mu_{1,t})$ and $E(\mu_{2,t})$
in a Bayesian manner based on standard normal theory:

$$E(\mu_{1,t} \mid \sigma^2_{1,t}, \theta_1, \tau^2_1) = \left(\bar g_{\cdot,t}\, n_{1,t}/\sigma^2_{1,t} + \theta_1/\tau^2_1\right)\left(n_{1,t}/\sigma^2_{1,t} + 1/\tau^2_1\right)^{-1}, \text{ and} \qquad (8)$$
$$E(\mu_{2,t} \mid \sigma^2_{2,t}, \theta_2, \tau^2_2) = \left(\bar h_{\cdot,t}\, n_{2,t}/\sigma^2_{2,t} + \theta_2/\tau^2_2\right)\left(n_{2,t}/\sigma^2_{2,t} + 1/\tau^2_2\right)^{-1}.$$
The equations given in (8) state that the expected average worth of the unseen part of the
competition set is a precision-weighted average of the means observed in week t, $\bar g_{\cdot,t}$ or $\bar h_{\cdot,t}$ for
Group 1 and Group 2, and the prior means $\theta_1$ or $\theta_2$, respectively. Of course, the equations given
in (8) depend on the unknown parameters $\sigma^2_{1,t}, \theta_1, \tau^2_1$ for Group 1 and $\sigma^2_{2,t}, \theta_2, \tau^2_2$ for Group 2. We
estimate these quantities from the observed song worths as follows. To estimate the variances
$\sigma^2_{1,t}$ and $\sigma^2_{2,t}$, we utilize the sample variances of the $g_{n,t}$ and $h_{n,t}$ ($s^2_{g,t}$ and $s^2_{h,t}$, respectively).
Estimates of $\theta_1$ equal to $\bar g_{\cdot,\cdot}$ (the grand mean of all Group 1 g's) and $\theta_2$ equal to $\bar h_{\cdot,\cdot}$ (the grand
mean of the h's) are obtained, as $g_{n,t}$ and $h_{n,t}$ have marginal means $\theta_1$ and $\theta_2$, respectively.
That is, $[g_{n,t} \mid \theta_1, \sigma^2_{1,t}, \tau^2_1] \sim N(\theta_1, \sigma^2_{1,t} + \tau^2_1)$ and $[h_{n,t} \mid \theta_2, \sigma^2_{2,t}, \tau^2_2] \sim N(\theta_2, \sigma^2_{2,t} + \tau^2_2)$. These marginal distributions
also yield standard method-of-moments estimates, which we utilize, for $\tau^2_1$ and $\tau^2_2$, equal to
$\mathrm{Var}(g_{n,t}) - \sum_{t=1}^{T} s^2_{g,t}/T$ and $\mathrm{Var}(h_{n,t}) - \sum_{t=1}^{T} s^2_{h,t}/T$. These estimates are then substituted into
(8) to obtain $E(\mu_{1,t})$ and $E(\mu_{2,t})$.
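As a numerical illustration of equation (8) with the plug-in estimates just described, the following sketch (a hypothetical helper of ours, assuming the observed group worths are already in hand) computes the precision-weighted expectations for one group:

```python
import numpy as np

def expected_group_worth(worths_by_week):
    """Precision-weighted expected worth E(mu_t) per equation (8), with
    the hyperparameters estimated as in the text: theta-hat = grand mean,
    sigma^2_t-hat = within-week sample variance, and tau^2-hat = overall
    variance minus the average within-week variance (method of moments).

    worths_by_week : list of 1-D arrays, one array of observed group
                     worths (g_{n,t} or h_{n,t}) per week t.
    """
    all_w = np.concatenate(worths_by_week)
    theta_hat = all_w.mean()                                    # grand mean
    s2 = np.array([np.var(w, ddof=1) for w in worths_by_week])  # s^2_t
    tau2_hat = max(np.var(all_w, ddof=1) - s2.mean(), 1e-8)     # moments
    out = []
    for w, s2_t in zip(worths_by_week, s2):
        n_t = len(w)
        prec_data, prec_prior = n_t / s2_t, 1.0 / tau2_hat
        # Posterior mean: data precision weights the week mean, prior
        # precision weights the grand mean.
        out.append((w.mean() * prec_data + theta_hat * prec_prior)
                   / (prec_data + prec_prior))
    return np.array(out)
```

Because each $E(\mu_t)$ is a convex combination, it always lies between the week-t sample mean and the grand mean, shrinking sparse weeks more heavily toward the overall average.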
Prior Specification:

For the GGC structure specified in (2), in which each unit (song) has its own set of parameters
$(\lambda_i, c_i, r_i, \theta_i)$, there is "sparseness" of data available for some units (i.e., a song with a chart
history of only one or two weeks), there are available unit-specific covariates $x_i$, and
commonalities are likely to exist among different units' "shapes" over time. This strongly suggests
the appropriateness of a Bayesian approach to obtain the unit-level parameters (Gelman et
al. 1995). We specify the following hierarchical structure for the positive (latent worth) random
variables $(\lambda_i, c_i, r_i, \theta_i)$:
$$[\lambda_i, c_i, r_i, \theta_i \mid \mu, \beta, \Sigma] \sim MVLN(\mu + \beta x_i, \Sigma) \qquad (9)$$

where $MVLN(a, b)$ denotes a multivariate log-normal distribution with mean vector a and
covariance matrix b, $\beta$ is a $4 \times P$-dimensional matrix of covariate slopes, and $\Sigma$ is a $4 \times 4$-dimensional
parameter variance-covariance matrix. In this formulation, the parameter matrix
$\beta$ is of direct interest. For example, utilizing these results we can infer whether past
information about the artist $x_i$ affects the shape of the unit's GGC curve, possibly through its total
worth $\lambda_i$, its "spikedness" $c_i$, weeks to peak $r_i$, or time from birth to first chart appearance $\theta_i$.
Non-informative hyperpriors were utilized for $\mu$ and $\beta$. A slightly informative inverse-Wishart
hyperprior was utilized for $\Sigma$ to ensure a proper posterior (4 degrees of freedom, equal to the
dimension, and large variance terms). Sensitivity analyses indicated little impact of the exact
values of the hyperprior parameters for $\Sigma$.
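A draw of a song's parameter vector from the prior in (9) can be sketched as follows; all numbers (intercepts, slopes, covariances, covariate values) are invented purely for illustration and are not estimates from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions: P = 2 covariates (previous hits, soundtrack flag).
mu = np.array([0.5, -1.0, 1.5, 0.0])   # intercepts for (lambda, c, r, theta)
beta = np.array([[0.15, 0.00],         # assumed 4 x P slope matrix
                 [0.25, 0.20],
                 [0.09, -0.08],
                 [0.05, 0.01]])
Sigma = 0.1 * np.eye(4)                # 4 x 4 covariance on the log scale
x_i = np.array([3.0, 1.0])             # song i: 3 previous hits, soundtrack

# MVLN draw: exponentiate a multivariate normal draw, so all four
# parameters come out strictly positive, as the prior in (9) requires.
log_draw = rng.multivariate_normal(mu + beta @ x_i, Sigma)
lam, c, r, theta = np.exp(log_draw)
```

The log-scale formulation is what makes the covariate effects multiplicative on the original parameter scale.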
In addition, we might expect commonalities to occur across the week strength parameters
$\delta_t$. We incorporated a hierarchical structure for $\delta_t$ as well by utilizing

$$[\delta_t \mid \mu_\delta, \sigma^2_\delta] \sim LN(\mu_\delta, \sigma^2_\delta) \qquad (10)$$

where $LN(a, b)$ denotes a log-normal density with mean a and variance b. A non-informative
hyperprior was utilized for $\mu_\delta$, and a proper inverse chi-square distribution with 0.5 degrees of
freedom was used for $\sigma^2_\delta$.
Computational Approach:

Inferences from the model specified by the likelihood given in (5), the extended competition
set given in (7), estimates of the competition set worths given in (8), and priors given in (9) and
(10) were obtained from posterior samples generated by a Markov chain Monte Carlo (MCMC)
sampler (Gelfand and Smith, 1990). Direct sampling from the conditional distributions of
the vector of unit parameters $(\lambda_i, c_i, r_i, \theta_i)$ and week strength parameters $\delta_t$, needed to implement the
sampler straightforwardly, was unavailable in closed form due to the non-conjugate hierarchical
structure of the exploding multinomial likelihood and log-normal priors. To obtain samples from
these conditional distributions we implemented a Metropolis step within the MCMC sampler
(Hastings 1970). For all hierarchical-level parameters, the selected distributions yielded closed-form
conditional distributions derived from standard normal-inverse-Wishart theory. More
specific details of the sampler are available from the authors upon request.
For the unit-specific parameters we tried two different multivariate sampling densities for
the Metropolis step to obtain potential draws at the (t+1)-st iteration: (1) the prior evaluated
at the current hyperparameter draws, $MVLN(\mu^t + \beta^t x_i, \Sigma^t)$, and (2) an MVLN centered at the previous draw,
$MVLN((\lambda_i^t, c_i^t, r_i^t, \theta_i^t), S)$. We implemented the approach using a $4 \times 4$ diagonal matrix S with
a variety of different diagonal elements ranging from 0.001 to 1. As a result of the various
implementations, all results presented in Section 6 are those using proposals centered at the previous
draws. The Metropolis step for the week strength parameters was also similarly implemented.
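A generic version of such a Metropolis step, using a log-normal random-walk proposal centered at the previous draw (option (2)), can be sketched as follows. This is our illustration rather than the authors' Fortran implementation; `log_post` stands for the unit's log conditional posterior:

```python
import numpy as np

def metropolis_step(theta, log_post, step_sd, rng):
    """One Metropolis-Hastings update for a vector of positive parameters,
    proposing from a log-normal centered at the previous draw.

    theta    : current positive parameter vector
    log_post : function returning the log conditional posterior
    step_sd  : proposal standard deviation on the log scale
    """
    # Propose on the log scale so the draw stays strictly positive.
    prop = np.exp(np.log(theta) + rng.normal(0.0, step_sd, size=theta.shape))
    # The log-normal proposal is asymmetric in theta, so the Hastings
    # ratio picks up the Jacobian term prod(prop) / prod(theta).
    log_ratio = (log_post(prop) - log_post(theta)
                 + np.sum(np.log(prop)) - np.sum(np.log(theta)))
    if np.log(rng.uniform()) < log_ratio:
        return prop, True
    return theta, False
```

The Jacobian correction matters here: omitting it would silently target a tilted version of the intended posterior.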
To ensure convergence of the sampler, we present results based on the combined draws from
three separate runs of our MCMC sampler started at different starting values (Gelman and
Rubin, 1992). Convergence criteria suggested an initial burn-in period of 250 draws for each
run; however, we discarded the initial 500 from each stream. As each stream was run for 2000
iterations, this provided us with 4500 draws for estimation. The MCMC sampler was run in Fortran
on an HP 9000 Unix server, taking approximately 20 seconds per iteration.
One additional computational feature, which led to significant time savings, is that the
conditional distributions for the song unit parameters $(\lambda_i, c_i, r_i, \theta_i)$ may be "reduced"
in the number of exploding terms by noting that

$$[\lambda_i, c_i, r_i, \theta_i \mid \Omega, x_i, \mu, \beta, \Sigma] \propto \prod_{t=1}^{T} \prod_{i=1}^{R} \frac{\exp(v_{\pi_i,t})}{\sum_{i'=i}^{R} \exp(v_{\pi_{i'},t}) + n_{1,t}\exp(E(\mu_{1,t})) + n_{2,t}\exp(E(\mu_{2,t}))} \; [\lambda_i, c_i, r_i, \theta_i \mid x_i, \mu, \beta, \Sigma] \qquad (11)$$

$$\propto \prod_{t=\max(1,\, z_i - \theta_i)}^{T} \; \prod_{r=1}^{\min(\pi_{i,t},\, R)} \frac{\exp(v_{\pi_r,t})}{\sum_{i'=r}^{R} \exp(v_{\pi_{i'},t}) + n_{1,t}\exp(E(\mu_{1,t})) + n_{2,t}\exp(E(\mu_{2,t}))} \; [\lambda_i, c_i, r_i, \theta_i \mid x_i, \mu, \beta, \Sigma]. \qquad (12)$$

That is, we need not compute the conditional distribution for unit i over time periods t for
which the song has not yet been born, and secondly, the terms in the likelihood involving units
ranked lower than song i do not depend on song i's parameters. This fact led to significant
computational savings. The results of our analyses are presented next.
6 Model Results
We base our analyses here on the 4500 draws from the MCMC sampler, as just discussed. There
are several dierent types of diagnostics worth examining. First we look at the model's ability
to recover the actual week-by-week ranks that served as the input data. Second we will examine
the model parameters to see if they oer some insight by themselves, and then in conjunction
with the ve summary statistics described earlier in section 3. Finally, we look more closely at
these key summary statistics alone { specically, we investigate the model's ability to recover
the observed values across the 248 songs with complete information. Since these statistics
capture the principal characteristics of each song's \life" on the chart, this analysis will provide
a comprehensive assessment of the model's capabilities and shortcomings.
Recovering Ranks:
In Figure 3, we show the model's ability to capture the observed ranks in the dataset.
We display three separate cases: exact match, estimated rank within +/- 5 ranked positions,
and estimated rank within +/- 9 ranked positions. For clarity, we display the results only for
chart positions 1, 10, 20, 30, ..., 80, 90, 100. Each of these points represents an average taken
across the 234,000 (52 weekly charts x 4500 MCMC draws) observations we have for each chart
position.
INSERT FIGURE 3 HERE
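The recovery measure plotted in Figure 3 is straightforward to compute. The sketch below is a hypothetical helper (not part of the original analysis) that returns the proportion of chart slots recovered exactly and within +/- 5 and +/- 9 positions:

```python
import numpy as np

def rank_recovery_rates(observed, fitted, tolerances=(0, 5, 9)):
    """Fraction of chart slots whose fitted rank falls within each
    tolerance of the observed rank -- the diagnostic behind Figure 3.

    `observed` and `fitted` are equal-shape arrays of ranks (e.g.,
    weeks x positions, pooled over MCMC draws).
    """
    diff = np.abs(np.asarray(observed) - np.asarray(fitted))
    # One hit rate per tolerance band: exact match, within 5, within 9.
    return {tol: float(np.mean(diff <= tol)) for tol in tolerances}
```

In the paper's setting the arrays would pool all 52 weekly charts across the 4500 MCMC draws, and the rates would then be averaged within each displayed chart position.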
It should come as no surprise that obtaining an exact match is a rather difficult achievement,
occurring less than 10% of the time for most chart positions. This is because there are so many
sources of error in going from the observed ranks to the model and back to the observed ranks,
including possible misspecification of the lifetime curve (i.e., the GGC), randomness around that
deterministic curve, as well as errors due to randomness and possible misspecification in the
"exploded logit" part of the model. In fact, given all of these potential errors, it is encouraging
that the model appears to be so capable at capturing the true ranks within +/- 5 rank positions,
e.g., correctly fitting most Top 10 songs within the Top 10 positions. It is interesting to note
that the model's ability to recover these true ranks appears to be systematically related to the
chart positions, i.e., the ranks of more popular songs are easier to recover than those of weaker
songs. Perhaps this is due to the fact that we generally have a longer series of observations for
the better songs, and thus can get a better "fix" on their week-to-week movements through the
chart.

One rather obvious kink occurs right at position #40, suggesting some problem with the
model that occurs when songs pass that threshold. Indeed, as we move on to an analysis of the
more aggregated summary statistics, this concern, and the explanations underlying it, will
become increasingly clear.
Examining Parameter Estimates

In Figure 4, we show scatterplots of posterior means covering every pairwise combination
of the song-specific GGC parameters ($r_i$, $c_i$, $\lambda_i$, and $\theta_i$). The two shape parameters ($r_i$ and
$c_i$) and the scale parameter ($\lambda_i$) exhibit some significant (and logical) relationships with each
other, but none of the plots involving the "birth" parameter $\theta_i$ reveal any pattern whatsoever.
The strongest bivariate relationship exists between $r_i$ and $c_i$, with a correlation of 0.621. This
pattern reflects the fact that popular songs tend to reach high peak positions on the chart and
also tend to remain on the chart for a long time. The vast majority of songs in the upper-right
portion of the plot (i.e., those with $r_i > 5$ and $c_i > 0.5$) are quite successful, spending 20
or more weeks on the chart and generally peaking somewhere in the Top 20. (As additional
evidence, the correlation between PEAK and TOTWKS, among the 248 songs with complete
information, is remarkably strong at -0.913.)
INSERT FIGURE 4 HERE
It might be tempting to look at the most extreme song in the plot of r vs. c (i.e., the one
with the greatest value of $r_i c_i$) and view it as the most popular song in the dataset, but
that would be misleading. While this song, "Baby I'm Yours," by Shai, fared well on the chart
(TOTWKS=25, PEAK=10), it had a lower value of $\lambda_i$ than other songs in that region of the
plot, thereby scaling down the height of its life-curve relative to those other songs. This one
example conveys the tight interplay among these three parameters, and is also an exception to
the generally positive relationship between $r_i$ and $\lambda_i$ ($\rho = 0.567$). Unlike the pattern involving $r_i$
and $c_i$, there is no immediate intuition for this association, but a closer examination of Figure 2
reveals an explanation. Since the generalized gamma is a proper probability density function,
the height of the curve at its mode must become lower as the right tail grows thicker (notice
the change in scale of the y-axis as you look across the three plots in each row of Figure 2).
Thus, in the absence of the scale parameter ($\lambda_i$), a song with a long life on the chart would be
"penalized" with a lower mode. The positive relationship between $r_i$ and $c_i$ makes up for this to
some extent, but these long-lived songs apparently need an extra boost (higher $\lambda_i$) throughout
their lives on the chart to make their worths more comparable to songs with lower values of $r_i$.

Turning to the $\theta_i$ parameters, they show very little variation across the songs, either in a
univariate sense or in conjunction with the other model parameters (Figure 4). Several reasons
might explain this finding: (1) perhaps few songs accumulate any substantial popularity before
entering the chart, so they are basically "born" at the time of their chart debut; (2) the flexibility
of the GGC might make this extra parameter unnecessary; or (3) our method for handling songs
that have been born but haven't reached the chart (see equation (6) and surrounding discussion)
might make it difficult or unnecessary to uncover the "post-birth, pre-chart" popularity of any
given song. In any event, we keep this parameter in the model for its intuitive appeal, and
expect it might be of some use in other applications. But we can effectively ignore it for the
remainder of the paper.
The next set of model parameters are the covariate effects ($\beta$), which allow the song-specific
parameters to vary as a function of two external measures: whether or not the song comes from
a movie soundtrack, and the past popularity of each artist, as captured by the number of their
previous Hot 100 hits. As shown in Table 1, the shape and scale parameters of each song's
GGC curve are all substantially (and positively) affected by the number of previous chart hits
by that same artist. It makes sense that more popular artists will find it easier to move higher
in the chart more quickly, remain up high for a longer period of time, and have an overall longer
stay on the chart. The soundtrack effect is much less dramatic: songs drawn from movies tend
to have a greater degree of peakedness, but the other two parameters do not differ from zero to
any meaningful extent.

Table 1: Posterior Means (and Standard Deviations) for covariate effects on unit parameters.

                                        c                r                λ
  Number of previous Hot 100 hits   0.267 (0.084)    0.086 (0.030)    0.152 (0.054)
  Song from movie soundtrack        0.211 (0.062)   -0.076 (0.111)    0.009 (0.039)
Finally, we turn to the last set of parameters, the $\delta_t$, which capture week-specific scale
effects. As seen in Figure 5, there is a very clear pattern, which conveys the model's evolving
ability to fit the observed data and accurately discern the ranked positions (i.e., the relative
magnitude of each song's worth) over time. For obvious reasons, the model has a tough time
sorting out the ranks in week 1, and thus it chooses a low value of $\delta_1$ to try to dampen any
differences in the songs' worths. But as soon as we begin to see additional observations for
many of these songs, the model's discrimination ability improves quite dramatically.

At occasional times in the dataset the model suddenly becomes "confused" in its ability to
sort out the relative worths of the ranked songs (i.e., a non-monotone pattern in Figure 5). The
most striking example occurs in week 17 (5/1/93) of Figure 5, due to an unusually high level of
DEBUT activity (eight new songs entered the chart that week, including the highly successful
entrance of "That's the Way Love Goes," by Janet Jackson, at #14). But subsequently, the $\delta_t$
begin to grow once more as repeated observations begin to build up for the new set of songs.
In applied settings, this set of parameters might provide some useful diagnostics to help gauge
the impact that certain events (e.g., the launch of a controversial new album) might have on
customers' tastes and their buying/listening habits. It is interesting to observe that, within
this dataset at least, this curve plunges downward seemingly on a periodic basis (roughly every
three months). Perhaps the music distribution companies intentionally try to shake up the
charts from time to time, drawing consumers' attention to new artists, genres, and/or other
changes within the music industry.
INSERT FIGURE 5 HERE
Parameters and Summary Statistics

As discussed in the initial model description, we expected to see several systematic patterns
between the GGC parameter estimates and the five summary statistics. For each of
the conjectures mentioned earlier, we show the resulting scatterplots in Figure 6. As described
earlier, we utilize only the 248 songs for which we have complete information on all five measures.
Informal analyses conducted on the other 196 songs revealed no major departures from the
patterns discussed below.
INSERT FIGURE 6 HERE
The first panel confirms that there is a very clear relationship between the $c_i$ and PEAK.
With a correlation of $\rho = -0.806$, this is the strongest bivariate association we see between
any of the model parameters and any of the summary statistics. The second and third panels
show evidence supporting the notion that songs with higher values of $r_i$ tend to peak later
($\rho = 0.562$) and stay on the chart longer ($\rho = 0.536$). We also see that the $\lambda_i$, as expected, tend
to be higher for songs with better peak positions ($\rho = -0.470$) and those with longer lives on
the chart ($\rho = 0.481$). While all of these conjectured relationships seem to be supported quite well,
there are some peculiar patterns in some of these plots, for instance the two plots involving
TOTWKS. In the next section we will discuss some explanations for these patterns, as well as
the kink seen earlier in Figure 3, when we look carefully at our ability to recover these observed
summary statistics from the model's estimated parameters.
Observed vs. Fitted Summary Statistics

In Figure 7, we show scatterplots of the observed vs. fitted summary statistics. The solid
line in each plot represents a perfect fit (observed = fitted), while the dashed line shows the
best-fit (least-squares) line through the data. The fitted values plotted are the posterior means
averaged over the MCMC draws. Our ability to recover these summary statistics is quite varied.
Fortunately, for the two most interesting/important measures (TOTWKS and PEAK), the fit
appears to be very good. The number and nature of outliers appear to be of no major concern,
and the linear trend in each plot is quite strong ($\rho = 0.949$ for PEAK, and $\rho = 0.900$ for
TOTWKS).
INSERT FIGURE 7 HERE
Moving on to WKS2PEAK, we continue to see a reasonable linear trend ($\rho = 0.815$), but
the fitted weeks-to-peak are clearly biased downwards. In other words, the model is suggesting
that songs peak before they actually do, and this is happening in a fairly consistent manner.
One explanation behind these biases becomes evident in the earlier histogram for EXIT (Figure
1). If one visualizes the marginal distribution for the fitted values alone, one will see that the
vast majority of songs are expected to leave the chart from a low position. This makes intuitive
sense, in accordance with the biological metaphor of each song's "life," wherein each song goes
through a smooth decline after reaching its peak. But the distribution of observed EXIT values
(Figure 1) has far too many songs leaving from relatively high chart positions, and in particular,
many of them are tightly clustered between chart positions 40 and 50. Another interesting fact
about this cluster of songs, although not discernible in the histogram, is that virtually every
one of these songs peaked in the Top 20, and most made it all the way into the Top 10 on the
chart.
More careful consideration of the dataset, and of music industry behavior, reveals some
reasons why successful songs do not always go through the gradual decline phase that is
associated with the life-cycle metaphor. After reaching their peak positions high up on the chart,
many of these songs decline for a short period of time, but once they leave the Top 40, they
are effectively done accumulating popularity. In short, these songs are no longer fashionable.
Crossing the lower bound of the Top 40 seems to trigger a "death" process that is not captured
by the model. Furthermore, the GGC family, despite its flexibility, does not deal well with
this type of severe asymmetry. (Look back at Figure 2 to see how the GGC is characterized by
relatively smooth declines for all moderate or high values of r and c, which would be associated
with all of these highly successful songs.)
Striking evidence of this death process can be seen in a scatterplot of TOTWKS vs. EXIT
(Figure 8). Just as chart position #40 seems to represent a significant threshold that separates
out different types of EXIT behavior, so does a 20-week lifespan on the chart. First notice
that there is an unusual abundance of songs that stay on the chart for exactly 20 weeks (a
pattern that can also be detected in Figures 1 and 6). But more unusual than the frequency of
a 20-week lifespan is the dramatically different set of patterns we see for songs that exceed vs.
fall short of this hurdle. With only one exception, every song that falls off the chart before its
20th week will leave from chart position 70 or worse. But every song that stays on the chart
for 21 or more weeks will depart from position 60 or better. As noted above, the vast majority
of these songs drop off the chart immediately after they fall out of the Top 40.
INSERT FIGURE 8 HERE
The plot of EXIT vs. PEAK (Figure 8) shows that this so-called death process does not
apply to every successful (Top 20) song. There are a number of songs that peak in the Top 20
but remain on the chart for several weeks even after dropping out of the Top 40. Yet every one
of these songs has a chart life of 20 weeks or less. The overall conclusion here is that there is a
special (and highly non-linear) relationship between TOTWKS, PEAK, and EXIT that seems
to reflect a set of industry practices or norms that are not captured well by our model.

Beyond the clear evidence of this pattern in the observed vs. fitted plots for EXIT and
WKS2PEAK, traces of it can also be seen in the TOTWKS plot in Figure 7. Although the plot
seems fine overall, notice that the fitted values for most of the successful songs (TOTWKS >
20) tend to be too high. Once again, this is due to the fact that the songs are disappearing off
the chart much more rapidly than the model expects them to.

It might not seem obvious that this same bias would extend all the way back to DEBUT, but
in the remaining panel of Figure 7, we can see that this is indeed the case. The most significant
outliers on this graph, i.e., those songs to the lower right, with fitted values of DEBUT far lower
than their observed values, are all drawn from this same cluster of successful songs that drop off
the chart very suddenly. Apparently, when the model estimation process attempts to derive a
GGC curve to accommodate the rapid fall of these songs, the values of the $c_i$ parameter become
elevated, making the songs quite spiky, and thus they are forced to have a more dramatic rise
through the chart as well.
Summary of Empirical Results

In attempting to evaluate the proposed model across this varied set of empirical analyses,
we see generally encouraging results. The model parameters tend to be rather "well-behaved,"
and do an excellent job of capturing the essential elements of each song's movement through
the chart over time. The model's ability to fit and recover the observed ranks is generally
good, especially for certain critical aspects, such as a song's peak position on the chart. At
the same time, however, the abrupt drops are not captured well by the model (for this
dataset), especially because they embody a strange interaction with TOTWKS. Nevertheless, we
can put a positive spin on this potential shortcoming by noting that the model provides a very
strong benchmark against which this unusual "death process" can be contrasted and evaluated.
7 Conclusions and Future Research
This paper has oered two broad contributions to the statistical modeling literature. One is
the substantive notion of trying to capture and explain the patterns that can be observed for
a time-series of ranked data; the other is the means by which we address this phenomenon,
i.e., the marriage of the Generalized Gamma \lifetime curve" with the exploded-logit model
formulation. We have focused our attention almost exclusively on the Hot 100 music chart,
since it is a particularly rich dataset both in terms of its empirical characteristics as well as
its historical and social signicance. As noted at the outset of the paper, there are numerous
other datasets for which this type of modeling approach can be applied, and to the best of our
knowledge, no research has made any attempt to make inroads in this general area.
But while the Hot 100 chart may be a natural starting point for this type of research
endeavor, it might actually present a more challenging task than other similar datasets, for
several reasons: (1) We had to devote considerable eort to accomodate the \birth" of new
songs, an issue that would be of relatively little importance in other contexts (e.g., ranking
business schools); (2) The fact that music is, by its very nature, a fashion-oriented product
might add a degree of complexity that might not apply if we were to look at ranks of other
entities; and (3) Because so much attention and resources are focused on the Hot 100, there
might be more of a tendency for rms to try to manipulate the rankings in subtle ways, thus
making it tougher to get an accurate read on the \true worth" of each song. As statisticians
29
begin to pursue further work in this area, these and other factors should be taken into account
in order to understand or anticipate the success/failure of modeling exercises such as the one
discussed here.
From a methodological standpoint, our work here has highlighted both the strengths and
weaknesses that result from relying on the type of "curve-fitting" procedure that comprises the
heart of our model. The good news is that a flexible curve such as the GGC is simpler and
in some ways more elegant than using a more formal autoregressive approach for $v_{i,t}$ to try to
link each song's worth from one chart to the next. This is not to imply that our approach is
necessarily better (in terms of model performance) than a more traditional time-series model,
but the high degree of parsimony and parameter interpretability seem to imply that our method
is a reasonable way to analyze these data.

The downside of our model became clear as we worked all the way through the empirical
analysis. Specifically, the GGC family is ill-equipped to handle the sudden drop in latent worth
that seems to occur for songs that remain on the chart for over 20 weeks. Several paths can
be taken to try to address this problem. One would be to replace the GGC with an even more
flexible family of curves to try to allow for the highly asymmetric shape that is required to
accommodate such a rapid fall in the "life" curve over time. Such a need, we expect, is
problem specific. In many cases, the GGC curve will do quite well.
A second possibility is to introduce an explicit "death process" into the model. The
modularity of our approach can allow for such an addition, but a number of questions and difficult
decisions would need to be addressed if one wanted to pursue this angle. For instance, do we really
believe that a song can drop to zero worth forever, in which case it can never reenter the chart
again? (While we do not see any songs reentering the chart in our dataset, such an occurrence
happens with some regularity, especially for seasonal music, or songs that are reissued as part
of a movie soundtrack.) Furthermore, it might be difficult to properly model the critical hurdle
that exists at week 20 of a song's life on the chart. We feel that the degree of "engineering"
(ad hockery) required to capture these effects would be highly idiosyncratic to this type of
dataset, and might harm the model's ability to handle a broader class of
data structures.
One further shortcoming of this initial analysis, albeit a minor one, is the limited range
of covariates considered. As noted earlier, our principal objective was merely to show how
covariates could affect the latent parameters in sensible ways, but it would be useful to try to
incorporate effects that could be manipulated by managerial action.

Finally, a natural sequence of next steps would be to see how generalizable our findings
are as we move away from this dataset to others that are progressively further removed from
the 1993 Hot 100 chart, i.e., other music charts, other types of best-seller charts, and on
towards completely unrelated domains. The model proposed here is literally a first step in
this progression, and we believe that a number of substantive and methodological insights have
emerged.
References
Agency Sales Magazine (1999), "Ranking Sales Force Personnel", 29(2), p. 49.

Billboard Online (1998), "Understanding the 'New' Hot 100 Chart", posted at:
http://www.billboard.com/charts/newhot100.html.

Business Week (1998), "Top B-Schools", 10/19/98, pages 86-108.

Chapman, R.G., and Staelin, R. (1982), "Exploiting Rank Ordered Choice Set Data Within
the Stochastic Utility Model", Journal of Marketing Research, Vol. XIX, 288-301.

Chatfield, C. (1996), The Analysis of Time Series: An Introduction, 5th edition, Chapman &
Hall.

Daniels, H.E. (1969), "Round Robin Tournament Scores", Biometrika, 56, 295-299.

Davis, J.B. (1998), "Ranking of Economics Journals According to the Social Sciences Citation
Index", American Economist, 42(2), 59-64, Fall.

EPA Issues WMPT "Nifty 50 List" (1998), Pollution Engineering, 30(13 Part (1)): 27-29, Dec.

Fligner, M.A., and Verducci, J.S. (1992), Probability Models and Statistical Analyses for Ranking
Data, Springer-Verlag.

Gelfand, A.E., and Smith, A.F.M. (1990), "Sampling-Based Approaches to Calculating Marginal
Densities", Journal of the American Statistical Association, Vol. 85, 398-409.

Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B. (1995), Bayesian Data Analysis, Chapman
& Hall, London.

Gelman, A., and Rubin, D.B. (1992), "Inference from Iterative Simulation Using Multiple
Sequences", Statistical Science, Vol. 7, 457-511.

Hastings, W.K. (1970), "Monte Carlo Sampling Methods Using Markov Chains and Their
Applications", Biometrika, Vol. 57, 97-109.

McDonald, J.B., and Butler, R.J. (1990), Journal of Econometrics, 41, 227-251.

McFadden, D. (1974), "Conditional Logit Analysis of Qualitative Choice Behavior", Frontiers
in Econometrics, Zarembka, P. ed., New York, Academic Press, 105-42.

Stern, H.S. (1990), "Models for Distributions on Permutations", Journal of the American
Statistical Association, 85, 558-564.

Wall Street Journal (1999), "'Chatting' a Singer Up the Pop Charts", 10/5/99, p. B1.

West, M., and Harrison, J. (1997), Bayesian Forecasting and Dynamic Models, Springer-Verlag.

Whitburn, Joel (1996), The Billboard Book of Top 40 Hits, Billboard Publications, Inc., New
York.

——— (1997), Top Pop Singles, 1955-1996, Record Research, Inc., Menomonee Falls, WI.
33
Figure 1: Histograms for DEBUT (debut position), WKS2PEAK (# weeks to peak), PEAK (highest ranking obtained), TOTWKS (# weeks on charts), and EXIT (leaving position), for 248 Hot 100 Billboard songs from 1993 with full information.
Figure 2: Plot of the generalized gamma curve v_{i,t} versus weeks since birth, a_{i,t}, for r = 3, 5, 7 and c = 0.25, 0.50, 0.75 (one panel per combination).
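As a companion to Figure 2, the shape of such a curve is easy to evaluate numerically. The sketch below assumes the standard (Stacy) generalized gamma density with shape parameters r and c, and fixes the scale parameter theta at 1 for illustration; the paper's exact parameterization and scaling of v_{i,t} are not recoverable from the caption alone, so treat this as an illustrative stand-in rather than the authors' implementation.

```python
import math

def gg_curve(t, r, c, theta=1.0):
    """Generalized gamma density (Stacy form), used here as an illustrative
    stand-in for the latent-worth curve v_{i,t} plotted in Figure 2."""
    if t <= 0:
        return 0.0
    x = t / theta
    return (c / (theta * math.gamma(r))) * x ** (r * c - 1) * math.exp(-(x ** c))

# Evaluate over the parameter grid shown in Figure 2.
for r in (3, 5, 7):
    for c in (0.25, 0.5, 0.75):
        peak = max(range(1, 21), key=lambda t: gg_curve(t, r, c))
        print(f"r={r}, c={c}: highest value among weeks 1-20 at week {peak}")
```

When r*c <= 1 the curve declines from birth onward, while r*c > 1 produces a rise to an interior peak followed by decay, which is why this family can mimic exponential, Weibull, and gamma-shaped chart trajectories.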
Figure 3: Plot of estimated rank error by true rank position: the probability that the estimated rank falls within given limits of the true rank (bottom line = exact match, middle line = +/- 5, top line = +/- 9).
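The accuracy summary plotted in Figure 3 is, for each tolerance k, the fraction of songs whose estimated rank lands within k positions of the true rank. A minimal sketch of that computation, using made-up ranks rather than the paper's 248-song data:

```python
def rank_coverage(true_ranks, est_ranks, tol):
    """Fraction of items whose estimated rank falls within +/- tol of the
    true rank (tol = 0 is an exact match) -- the summary shown in Figure 3."""
    hits = sum(abs(t - e) <= tol for t, e in zip(true_ranks, est_ranks))
    return hits / len(true_ranks)

# Hypothetical example: a chart of 10 items with small estimation errors.
true_ranks = list(range(1, 11))
est_ranks = [1, 3, 2, 4, 8, 6, 7, 5, 9, 10]
for tol in (0, 5, 9):  # the three tolerance bands in Figure 3
    print(tol, rank_coverage(true_ranks, est_ranks, tol))
```

Plotting this quantity against the true rank, as in the figure, simply means computing the coverage within narrow bands of true-rank positions rather than over the whole list.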
Figure 4: Pairwise plots of the song parameters (r versus theta, tau versus theta, theta versus c, r versus c, tau versus c, and tau versus r). Each dot represents a posterior mean.
Figure 5: Plot of the posterior mean of lambda versus time (in weeks).
Figure 6: Plots of observed song features versus parameters: peak ranking (PEAK) v. c, weeks to peak (WKS2PEAK) v. r, # weeks on chart (TOTWKS) v. r, PEAK v. theta, and TOTWKS v. theta.
Figure 7: Plots of observed versus fitted curve summaries (DEBUT, EXIT, TOTWKS, WKS2PEAK, and PEAK). The fitted values are the posterior means averaged over all MCMC draws.
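The fitted summaries in Figure 7 come from averaging each predicted quantity over the MCMC draws. A schematic of that averaging step, assuming a hypothetical draws-by-songs layout for the posterior output (the actual format of the authors' sampler output is not specified here):

```python
def posterior_mean_fit(draws):
    """Average a predicted quantity over MCMC draws: draws[s][i] is the
    prediction for song i under posterior draw s (hypothetical layout)."""
    n_draws = len(draws)
    n_items = len(draws[0])
    return [sum(d[i] for d in draws) / n_draws for i in range(n_items)]

# Three hypothetical posterior draws of predicted peak rank for four songs.
draws = [[10, 42, 5, 77], [14, 38, 7, 81], [12, 40, 6, 79]]
print(posterior_mean_fit(draws))  # one fitted value per song
```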
Figure 8: Plot of total weeks on the charts (TOTWKS) versus rank when leaving the charts (EXIT), panel (a), and peak rank obtained (PEAK) versus rank when leaving the charts (EXIT), panel (b).