When the Ink is Gone: The Transition from
Print to Online Editions
Ignacio Franceschelli
Northwestern University
June 9, 2011
VERY PRELIMINARY. DO NOT CITE.
1 Introduction
Newspapers are acknowledged as the main providers of original news across the
different media platforms. In this context, the migration of readers from print
to online editions has spurred concern regarding how a future with only online
editions would affect news coverage.1
The Federal Trade Commission (2010) discusses the challenges faced by
journalism in the Internet age. In line with the existing consensus, the document
remains skeptical regarding the potential benefits the Internet might bring in
terms of news coverage. It argues that the new business model brought by online
news has produced a substantial decline in newspaper revenue. The document claims
that this financial downturn has forced newspapers to fire reporters, which
has caused a significant loss of news coverage. Finally, different policy proposals
such as copyrights for news, antitrust exemptions and financial support from the
government are discussed as possibilities to counteract the negative effect the
Internet has supposedly had on the quality of the news readers have access to.
This paper provides a first empirical assessment of the impact of the Internet on
news coverage. We are interested in comparing how fast news would be delivered to
readers in a world with only print editions relative to a world with only online
editions.
Ignacio Franceschelli, 2001 Sheridan Rd, Northwestern University, Department of Economics, Evanston, IL 60208, US. [email protected]. Tel: (847) 491 8247. I thank
Matthew Gentzkow and Igal Hendel for generous help and discussions as well as Pablo
Boczkowski, Sebastián Campanario, Peter DiCola, Rafael Di Tella, Gastón Roitberg, Jesse
Shapiro and participants at the IO lunch at Northwestern University.
1 Comscore points out that between April 08 and April 09 print only readers in the US
decreased by 12 percent from 48 to 42 million and print and online readers by 10 percent
from 38 to 34 million. Meanwhile, online only readers soared by 26 percent from 28
to 35 million. Overall, newspaper readers decreased by only 2 percent from 114 to 112
million.
We start, however, by analyzing the relevance of this issue. We evaluate the
extent to which readers value learning about news stories sooner. To perform
this analysis we combine a database that includes over one million news stories
with daily print and online readership numbers. We find that the average age
of the news stories reported in the newspaper edition has a negative effect on
readership. Readers value learning about news sooner. The effect is of similar
magnitude for online and print readers.
We then analyze the effect of the Internet on the time news takes to reach
readers. To conduct this analysis we reconstruct the typical timeline
of a news story in a print and in an online world. By timeline we refer to the
different milestones the story passes through from the moment the actual news
event takes place until the last newspaper reports on it.
We introduce a model to interpret the news story statistics in terms of the
parameters that give rise to the timeline. In our framework the speed at
which news is reported depends on how fast the newspaper gets the
story and how quickly it is able to publish it. Newspapers obtain news
stories from reporters, who genuinely discover stories, and from rewriters, who
simply copy stories from the rival. The only difference we impose between
the newspaper platforms is that while print editions have simultaneous daily
updates, online editions can be updated at any time.
Figure 3 portrays the results. As pointed out by the Federal Trade Commission
document, we observe that due to reduced reporter staff online editions
are slower at genuinely discovering news stories. However, there are two factors
downplayed by the document that allow online editions to prevail in the end
over their print counterparts. Continuous updates allow them to publish a
news story as soon as the newspaper discovers it. Also, rewriters play an
important role in online editions by allowing newspapers to reduce the time taken
to report a story after the rival has broken the news. The exercise shows that in a
world populated only by online editions the last reader to learn about a story
would do so sooner than the first reader to learn about it in a world populated
only by print editions.
In brief, we find that readers value learning about stories sooner and that
the Internet increases the efficacy of newspapers at performing this task. This
result brings a fresh perspective to the existing body of literature, which has so
far emphasized the negative aspects that the Internet has introduced into the
newspaper industry.2
2 Identifying news stories
The main novelty of the project is its focus on the news story
level. A news story refers to a particular event with which newspaper articles
can be associated (e.g. Saddam Hussein being captured). This approach
represents a departure from the existing literature, which has so far relied on more
aggregate measures when studying the newspaper industry.3
2 For example, Athey et al (2011) and Bergemann et al (2011) develop models where the
Internet negatively affects the level of revenues received by newspapers.
The news story level of analysis requires a substantial effort in terms of data
collection and article clustering. We consider every article published by the
two main newspapers in Argentina in their online (2003-2009) or print editions
(1998-2009). As we will see, studying the Argentine market presents several
advantages for the purposes of this research.
First, we developed a program that navigated through the millions of article
web pages and identified and downloaded the different article components such
as titles, text and time of publication. The program had to run for several
months.
Second, we implemented an algorithm based on word frequency to classify
a total of more than 1.2 million articles into the news events the articles are
about. The algorithm performed over 570 million article comparisons in order
to build the different news clusters.
This data process allowed us to reconstruct the exact timeline for each news
story reported by the media. It will allow us to determine which newspaper got
the scoop and how long it took the rival to publish the story.
2.1 Newspaper Industry in Argentina
We focus on the two main newspapers in Argentina: Clarin and La Nacion.
The availability of high frequency circulation data represents a first advantage
of studying the Argentine newspaper industry. The Instituto Verificador de
Circulaciones (IVC) provides circulation data per month for each of the seven
days of the week.4 We have 7 circulation numbers per newspaper per month, 84 per
newspaper per year and 1,008 per newspaper in our sample, which covers a
twelve-year period. This data is much more detailed than that available
for US newspapers, which is aggregated at the quarterly level. Daily data on
unique users is available for the online editions from Alexa.com. We aggregate
the online readership data to the same level as the print readership data for
consistency.
A second advantage of studying the Argentine market is that most circulation
comes from newsstands, not from subscriptions. In fact, kiosk sales account for
more than 76 percent of newspaper circulation. The high frequency variation
in circulation allows us to estimate the sales effect of publishing recent news.
Otherwise, the variation between newspapers in scoop generation would vanish
as we consider the longer periods associated with changes in the number of
subscriptions.
Finally, the two newspapers we study account for a large share of the print and
online readership market. Clarin's circulation represents 36 percent of the print
newspaper market and La Nacion has 14 percent of the market. Moreover,
together they represent over 70 percent of the non-yellow newspaper circulation.
clarin.com and lanacion.com are also the top online newspapers. clarin.com
leads with 40 percent of the online newspaper market while lanacion.com has
21 percent.
3 Di Tella and Franceschelli (2011) represents an exception. They study the effect of the
reporting of corruption on circulation in Argentina using monthly figures.
4 The IVC is Argentina's equivalent of the US Audit Bureau of Circulations.
2.2 News Clustering
In Argentina there is no digital database that contains all the information we
need regarding the articles published in newspapers. We need to obtain the
title, subtitles, abstract and text for every article ever published in the online or
print edition as well as the newspaper section, day and exact time of publication
(for online articles). To the best of our knowledge such a complete database does
not exist in any other media market in the world either.
However, both Clarin and La Nacion provide readers on their websites with
a historical archive that displays every article included in their print or online
edition. We developed a program based on the iMacros platform to automate
the download of all articles in the print edition in the 1998-2009 period and in
the online edition in the 2003-2009 period.5 This program navigated through
each of the millions of articles and recognized and collected the title, subtitles
and text of each article as well as the day, time, edition and section of the
newspaper in which it was published.
There are 1,006,685 articles published in the print editions and 240,110
articles published in the online editions. Overall, La Nacion accounts for 51.5
percent of the print articles and for 55.2 percent of the online articles. For each
of the 1,246,795 articles included in the data we want to identify the news event
the article refers to. This will allow us to analyze the timeline of each story.
We are particularly interested in determining which newspaper got the scoop.
Of course, it would not be feasible to consider all possible combinations of
articles in order to construct the news clusters. Instead, we decided to divide the
database into news cycles and compare the articles within each of these time units.
2.3 News Cycles
We define a news cycle as the 3 am-3 am twenty-four-hour period that includes
the print editions from two consecutive days as well as the online articles posted
within this period. Underlying this decision is the assumption that if a
newspaper were to copy a news story in the print edition it would do so at the
latest in the following edition. Print editions close at approximately 3 am. The
news cycle definition also assumes that if a newspaper were to copy a news story
in the online edition it would do so in the same 3 am-3 am window in which the
event was first reported. Figure 1 shows the time distribution of the first
appearance of news stories online. The news cycle for online news seems to fit
the 3 am-3 am time frame well. In particular, the lowest level
of activity is observed in the 2 am-4 am period.
As a consequence of the news cycle definition, the difference between
newspapers in publishing a certain news story will be by construction at most one day.
For print editions it can be either 0 days, if both newspapers publish the story
on the same day, or 1 day, if they do so on different days. For online editions
the time difference can take any value within the [0,1] interval.
Later in the paper we will study the conditions under which we should expect
the mean time difference to be the same under both platforms.
5 The historical archive for the Clarin online edition begins in 2003.
It is important to note that the definition of twenty-four-hour news cycles
does not imply that news stories can last at most one day. News stories can
go on for days, weeks or months as long as the story continues to show up in
every consecutive news cycle. However, if a certain event does not appear in the
newspapers' online or print editions during a news cycle, then when it reappears
it will be classified as a different news story. For example, if no reports on
the Iraq war appear for a week, when the newspapers return to the subject a
different news story will be created. We interpret news stories not as specific
topics but as different pieces of information that might or might not relate to a
common topic.
In the 1998-2002 period each news cycle contains an average of 456 articles.
The figure goes up to 556 articles in the 2003-2009 period, when online articles
are incorporated into the sample.
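As an illustration, the assignment of articles to news cycles can be sketched as follows. The function names and the convention of labelling a cycle by the date on which it starts are assumptions made for this sketch, not part of the original data process. An online article is mapped to the cycle that contains its timestamp; a print edition, which closes at about 3 am, is attached to the two cycles it borders.

```python
from datetime import date, datetime, timedelta

CYCLE_START_HOUR = 3  # print editions close at approximately 3 am


def online_cycle(published: datetime) -> date:
    """Start date of the 3 am-3 am cycle that contains an online timestamp."""
    return (published - timedelta(hours=CYCLE_START_HOUR)).date()


def print_cycles(edition_day: date) -> tuple:
    """Start dates of the two cycles that include a given print edition.

    Assumption: the edition dated `edition_day` closes the cycle that started
    on the previous day and opens the cycle that starts on `edition_day`.
    """
    return (edition_day - timedelta(days=1), edition_day)
```

Under this convention an article posted at 2:59 am still belongs to the cycle that started the day before, while one posted at 3:00 am opens a new cycle.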
2.4 News Events
Following the computer science literature (see Salton 1988) we run an algorithm
that relies on word frequency over each of these news cycles to identify the
different news stories. In this paper we decided to consider only the proper
nouns included in the title, subtitles or text of each article. Proper nouns
represent unique entities such as people, places or institutions. The dataset is
composed of more than 4,300 news cycles, each of which includes about 500
articles. Over 56,000 different proper nouns are used in the database and each
news cycle mentions about 1,200 of them.
The first article in the news cycle is assigned to the first news story. The
second article is compared to the first news story. If the similarity (which we
define later) in proper noun frequency between the first and second article
surpasses a certain threshold, then the second article is assigned to the first news
story. This news story will now include the proper nouns from both the first
and second article. However, if the similarity does not surpass the threshold, then
the second article is assigned to a second news story. We continue the process
for every article included in the news cycle. We then repeat the exercise for
every news cycle.
Article D is represented by the vector $D = (tf_1, \ldots, tf_p)$, where $tf_i$ is
the term frequency of proper noun i in article D. $tf_i$ is defined for all p proper
nouns that appear in any article in the news cycle. Vector D has an average size
of 1,200. We use the weight factor $\log(N/n_i)$ to weight the term $tf_i$. N is the
total number of articles in the database and $n_i$ the total number of articles in
the database in which proper noun i appears. This weight factor gives more
influence in the word frequency comparison between articles to those proper
nouns that are relatively rare in the news.
The similarity coefficient between article D and news story Q is normalized
to be in the [0, 1] interval:
\[
\mathrm{similarity}(Q, D) = \frac{\sum_{k=1}^{t} w_{qk}\, w_{dk}}{\left( \sum_{k=1}^{t} (w_{qk})^2 \, \sum_{k=1}^{t} (w_{dk})^2 \right)^{0.5}} \tag{1}
\]
Here we define $w_{dk}$ to be:
\[
w_{dk} = \frac{tf_k \cdot \log(N/n_k)}{\left( \sum_{k=1}^{t} \left( tf_k \cdot \log(N/n_k) \right)^2 \right)^{0.5}} \tag{2}
\]
We set the threshold value to be 0.5. If there is no news story Q such that the
similarity coefficient is greater than 0.5 then article D is assigned to a new news
story. To decrease potential mistakes we only compare articles that appear in
the same newspaper section (Sports, Entertainment, Argentina, International,
Opinion or General Information).
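The single-pass procedure described above can be sketched as follows. This is a minimal illustration rather than the program actually used: all function names are hypothetical, the proper nouns are assumed to have been extracted already, the restriction to articles in the same newspaper section is omitted, and the rule of assigning an article to the most similar story when several stories exceed the threshold is an assumption, since the text does not specify a tie-breaking rule.

```python
import math
from collections import Counter

THRESHOLD = 0.5  # similarity cut-off stated in the text


def document_frequencies(articles):
    """For each proper noun, count the articles in which it appears (n_k)."""
    df = Counter()
    for nouns in articles:
        df.update(set(nouns))
    return df


def weights(tf, df, n_docs):
    """Equation (2): tf_k * log(N / n_k), scaled to unit length."""
    raw = {k: f * math.log(n_docs / df[k]) for k, f in tf.items()}
    norm = math.sqrt(sum(v * v for v in raw.values()))
    return {k: v / norm for k, v in raw.items()} if norm > 0 else {}


def similarity(wq, wd):
    """Equation (1): cosine similarity between two weight vectors."""
    dot = sum(v * wd.get(k, 0.0) for k, v in wq.items())
    denom = math.sqrt(sum(v * v for v in wq.values())
                      * sum(v * v for v in wd.values()))
    return dot / denom if denom > 0 else 0.0


def cluster_cycle(articles, df, n_docs):
    """One pass over a news cycle; returns a story index for each article.

    `articles` is a list of proper-noun lists in order of publication.
    Every noun is assumed to have an entry in `df`.
    """
    stories = []  # pooled proper-noun counts, one Counter per story
    labels = []
    for nouns in articles:
        tf = Counter(nouns)
        wd = weights(tf, df, n_docs)
        best, best_sim = None, THRESHOLD
        for idx, story_tf in enumerate(stories):
            sim = similarity(weights(story_tf, df, n_docs), wd)
            if sim > best_sim:  # assumption: keep the most similar story
                best, best_sim = idx, sim
        if best is None:
            stories.append(tf)  # the article opens a new news story
            labels.append(len(stories) - 1)
        else:
            stories[best].update(tf)  # the story absorbs the article's nouns
            labels.append(best)
    return labels
```

In this sketch N and the document frequencies are passed in explicitly, so they can be computed over the whole database as in the text or, for a toy example, over the handful of articles being clustered.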
The algorithm identifies a total of 943,799 news stories out of the 1,246,795
articles included in the data. In the appendix we include several exercises to
check the reliability of the news clustering process. First, we rerun the
algorithm without imposing the restriction of comparing only articles within
the same newspaper section. The results show that articles from very dissimilar
sections are rarely assigned to the same news story. Second, we compute the
most important news events according to our database. We define the relevance
of a news story as the fraction of articles in an edition that cover the story. The
results show well defined news clusters built around very
important news events such as the Lehman Brothers bankruptcy, the inauguration
of Benedict XVI or the assassination of Benazir Bhutto.
Throughout the paper we will consider only those news stories that were
reported at some point in time by both online or both print editions. As a
consequence, a total of 95,961 stories and 299,455 articles will be considered. This
approach will allow us to define the time difference in reporting news stories
between both newspapers within each platform. It will also permit us to overcome
the obvious issue that a newspaper might not report a news event because it is
not interested in doing so and not because it does not yet have the story.6
Tables 1 and 2 provide statistics for the news stories by newspaper and
platform. Figure 2 shows the time difference in reporting news stories for online (in
minutes) and print editions (in days).7
6 It is hard to think of a reason why a newspaper would decide to postpone reporting
a story and thereby put at risk its chances of having the scoop.
7 In both figures negative values correspond to Clarin breaking the news and positive values
to La Nacion releasing the scoop.
3 The value of getting news faster
We do not know the time when the actual news events the stories are about
took place. In the next section we will implement an indirect approach to
compute the average time taken to report on stories after news events take
place. However, computing this same time span for each story included in the
data would require researching each story individually. Given the
scope of the project, which includes about one hundred thousand stories, this task
is infeasible.
We do know, however, for each article included in the data the time elapsed
since the story the article is about was first published on each platform. We will
then combine this information with daily data on online and print readership
to compute the value for readers of getting news faster.
We start by defining $age_a$ as the time elapsed since the news story reported in
article a was first published by either of the two newspapers within the platform.
\[
age_a = P_a - P_{a_0} \tag{3}
\]
$P_a$ represents the time of publication of article a and $P_{a_0}$ represents the time
of publication of the first article about the story a is about.
We further define age to be the average of $age_a$ across all articles included
in the newspaper edition.
\[
age = \frac{1}{N} \sum_{a} age_a \tag{4}
\]
If the newspaper only reports scoops then age takes a value of 0. Alternatively,
if the newspaper only reports stories that are one day old then age takes
a value of 1. By construction age belongs to the interval [0,1].
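The two definitions above can be illustrated with a short sketch. The tuple layout (newspaper, story identifier, publication time in days) and the function names are assumptions made for this example; the articles passed in are taken to belong to a single platform and a single edition day.

```python
from collections import defaultdict


def article_ages(articles):
    """Equation (3): time since the story was first published on the platform.

    `articles` is a list of (newspaper, story_id, time_in_days) tuples.
    """
    first_seen = {}
    for _, story, time in articles:
        first_seen[story] = min(time, first_seen.get(story, time))
    return [time - first_seen[story] for _, story, time in articles]


def edition_age(articles):
    """Equation (4): average article age for each newspaper's edition."""
    ages_by_paper = defaultdict(list)
    for (paper, _, _), age in zip(articles, article_ages(articles)):
        ages_by_paper[paper].append(age)
    return {paper: sum(ages) / len(ages) for paper, ages in ages_by_paper.items()}
```

A newspaper that breaks every story it carries gets an edition age of 0 in this sketch, since each of its articles coincides with the first publication of the story.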
We model the indirect utility of consumer i from reading newspaper j on
day t as a function, among other factors, of how old the articles he has
access to are.8
\[
u_{ijt} = \alpha\, age_{jt} + \beta\, type_{jt} + \delta_j + \delta_t + \xi_{jt} + \varepsilon_{ijt} \tag{5}
\]
\[
i = 1, \ldots, I_t; \qquad j = 1, 2; \qquad t = 1, \ldots, T
\]
The vector type accounts for the fraction of articles in each newspaper
section. $\delta_j$ and $\delta_t$ are dummy variables for newspaper j and day t. We assume
that the $\varepsilon_{ijt}$ are distributed according to a Type I extreme value distribution.
8 To perform a consistent comparison between online and print editions we define t per
month for each of the seven days of the week. We observe daily readership for the online
edition; circulation data, however, is observed per month for each of the seven days of the
week. This implies using data at a quasi-daily level: instead of having 365 observations per
newspaper per year we have 84. Therefore, each time we mention the day as the unit of time we
will actually be referring to the aggregate over each of the seven days of the week for each
month.
This assumption implies that the market share of newspaper j on day t is given
by:
\[
\ln(s_{jt}) - \ln(s_0) = \alpha\, age_{jt} + \beta\, type_{jt} + \delta_j + \delta_t + \xi_{jt} \tag{6}
\]
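For reference, the step from (5) to (6) is the standard logit inversion. The sketch below assumes that the utility of the outside option (reading neither newspaper) is normalized to zero, a normalization the text does not state explicitly, and writes $\bar{u}_{jt}$ for the part of $u_{ijt}$ that excludes the idiosyncratic term, with $s_0$ the share of the outside option.

```latex
\[
s_{jt} = \frac{\exp(\bar{u}_{jt})}{1 + \sum_{k=1}^{2} \exp(\bar{u}_{kt})},
\qquad
s_{0} = \frac{1}{1 + \sum_{k=1}^{2} \exp(\bar{u}_{kt})}
\quad \Longrightarrow \quad
\ln(s_{jt}) - \ln(s_{0}) = \bar{u}_{jt}
\]
```

Taking logs removes the common denominator, which is why the left-hand side of (6) is linear in the regressors and can be estimated by ordinary regression methods.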
We proceed to estimate (6) for online and print editions in two separate
regressions.9 The results are displayed in Table 3. We observe that the time
elapsed since the news story was first released has a negative effect on readership
for both online and print readers. Moreover, the effect is of similar magnitude
for both platforms. The effect of age on online readers is $\hat{\alpha}_o = -0.76$ and the
effect of age on print readers is $\hat{\alpha}_p = -0.67$. Readers value learning about news
stories sooner. These results confirm the relevance of the discussion regarding
the relative performance of newspapers in delivering news fast in an online and
print world.
The results are also, to the best of our knowledge, the first to link
daily newspaper content with daily readership. So far the existing literature on
newspapers has related yearly circulation to lower frequency measures such as
the political position of the news outlet. Unlike other industries, newspapers
present day-to-day variation in their content. Each day the newspaper staff
aims to include in the edition news stories that will interest their readers. This
daily process, which is at the core of the newspaper industry, has never been
studied before with the use of a large database.
Finally, the importance of fresh news to readers also gives us a motivation
to explore the role of scoops. This role has been discussed in relation to its
effect on the reputation of newspapers (see for example Gentzkow and Shapiro,
2008). In this paper we focus, instead, on an alternative mechanism by which
scoops generate value for readers: breaking news allows newspapers to provide
readers with fresh news. As long as readers value learning about news stories
sooner, as the results showed, breaking stories should have a positive daily
impact on readership. Moreover, the different update technologies available
suggest that scoops should have a larger effect on print readers than on online
readers.
On the one hand, daily updates imply that within the print platform not
getting the scoop means that the news story will be one day old when the
newspaper finally reports it. On the other hand, the data shows that non-scoop
stories will be only one tenth of a day old in online editions.
We define scoop as the fraction of articles that include a scoop in the newspaper
edition. As previously discussed, daily updates imply that for print editions
we have a perfect relationship between age and scoop (as can be observed in
Figure 2b).
\[
age = (1 - scoop) \tag{7}
\]
9 We have circulation numbers for print editions from the Instituto Verificador de
Circulaciones (IVC). These numbers are available for the 1998-2009 period. The data on unique
users for the online editions was obtained from Alexa.com. However, the historical data from
Alexa.com at the daily level is only available for the October 2007-2009 period. This explains
the lower number of observations for online editions.
Expression (7) implies that the effect of scoop on print readership is given by
$\hat{\gamma}_p = -\hat{\alpha}_p = 0.67$, the negative of the coefficient reported in Table 3.
Meanwhile, continuous updates imply that there is not a perfect relationship
between age and scoop for online editions (as can be observed in Figure
2a). The value of $age_a$ varies across non-scoop articles. Nevertheless, we
know from Table 2 that $age_a$ for non-scoop articles will be on average 0.10 days,
a mere 10 percent of the 1 day value we observe for print editions.
Therefore, we have that
\[
E(age) = (1 - scoop) \times 0.1 \tag{8}
\]
The disparity between age and scoop for online editions allows us to perform
an additional test to provide further evidence of the relevance of providing faster
access to news stories.
Using (8) it is possible to express equation (6) as a function of scoop for
online editions.
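\[
\begin{aligned}
\ln(s_{jt}) - \ln(s_0) &= \frac{\alpha}{10}\,(1 - scoop_{jt}) + \beta\, type_{jt} + \delta_j + \delta_t + \xi_{jt} \\
&= -\frac{\alpha}{10}\, scoop_{jt} + \beta\, type_{jt} + \delta_j + \delta_t + \xi_{jt} \\
&= \gamma\, scoop_{jt} + \beta\, type_{jt} + \delta_j + \delta_t + \xi_{jt}
\end{aligned} \tag{9}
\]
where the constant $\alpha/10$ is absorbed into the dummy variables and $\gamma = -\alpha/10$.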
We proceed to estimate (9) for online editions. The results are displayed in
Table 4. scoop has a lower impact on reducing age in comparison to what we
observe for print editions. Within the online platform scoops are not needed as
much in order to publish fresh news. Of course, this is not the case for print
editions, where the newspaper needs to publish the scoop or otherwise the story
will be one day old when reported. If we also consider that the novelty of news
has a similar effect on online and print readership (we have similar $\hat{\alpha}$
values), then the model predicts a smaller effect of scoop on online readers
($\gamma_o$) than on print readers ($\gamma_p$).
\[
\gamma_o = \frac{\gamma_p}{10} \tag{10}
\]
The results confirm this notion. The effect of scoop on print readers is
$\hat{\gamma}_p = 0.67$ (Table 3) and the effect of scoop on online readers is only
$\hat{\gamma}_o = 0.14$ (Table 4). Although the difference is not as big as predicted by
the model, it remains large.
Online and print readers prefer to learn about stories sooner. This result
gives relevance to the comparison of the speed of news delivery in a print and
in an online world that we will perform in the next section.
Nevertheless, it is also worth noting the importance of the documented
reduction in the value of scoops in the online era. This phenomenon, introduced
by a change in the updating technology available for newspaper editions, might
bring huge changes in newsroom composition. Indeed, anecdotal evidence
suggests that there are fewer reporters and more rewriters on the staff of online
editions than on the staff of print editions. One of the newspapers analyzed
in this paper has a staff of about 500 people in the print edition and a staff of
about 50 people in the online edition. More importantly, average wages in the
online edition are only 1/3 of those observed in the print edition. This issue will
also be studied in the next section.
4 News stories timeline
Our aim in this section is to compare the performance of online and print editions
in delivering news stories. To make such a comparison we will study the typical
timeline of a news story within each platform. The timeline will show the
different stages the news story goes through until the last newspaper publishes an
article about it. We start by presenting some figures obtained from the data and we
then introduce a framework to interpret these empirical findings in terms of the
parameters needed to reconstruct the news story timeline.
Tables 1 and 2 provide statistics for the news stories by newspaper and
platform. While we observe similar figures for both newspapers within each
platform, important differences arise once we compare across platforms.
Table 1 provides information regarding the first outlet to break a story. We
observe that 7 percent of the stories were first reported by the Clarin print edition,
7 percent by the La Nacion print edition and 19 percent by both print editions
on the same day. Meanwhile, 32 percent of the stories were first reported by
clarin.com, 34 percent by lanacion.com and 1 percent by both online editions
in the same minute. So, while we see similar success rates at breaking stories
for both newspapers within each platform, there is a huge disparity between the
33 percent of stories that first appeared in print and the 67 percent
that first appeared online.
A similar pattern is observed when we focus on Table 2, which shows the time
taken to publish stories broken by the rival. We observe that clarin.com takes 0.11
days to publish a story that was first reported by lanacion.com and lanacion.com
takes 0.09 days to publish a story that was first reported by clarin.com. The
disparity, however, arises when we compare the 0.31 day difference for print
editions with the 0.10 day difference for online editions.
We now introduce a model in order to interpret the results displayed in
Tables 1 and 2 in terms of the performance of online and print platforms in
delivering news. In line with the data, throughout the model we assume
that both newspapers share the same technology to obtain and publish stories
within each platform. Nevertheless, we allow for potential disparities to emerge
when we compare across platforms. We have two newspapers, Clarin and La
Nacion, each with a website and a paper edition. We treat the platforms within
the same newspaper as independent of each other. While each platform has its
own staff, we are aware that strategic decisions at the newspaper level might
undermine the independence assumption.
The time of publication of news stories will depend on two factors: how fast
the newspaper gets the story and how fast the newspaper can update its
edition to publish the acquired story.10
We start by assuming that online and print platforms share the same
technology to obtain stories. To this end we begin the analysis in a situation
where newspapers can only rely on reporters to obtain stories. Reporters, as
opposed to rewriters, genuinely discover stories and do not steal content from
the rival. As a consequence, online and print editions differ only in the updating
technology available. While print editions have simultaneous daily updates (at
midnight), continuous updates are available for online editions.
We assume that news events take place uniformly along the day, where the
day is represented by the [0, 1] interval.
The staff of reporters from each newspaper and platform takes $D_i$ days to
discover a story after the actual event takes place. Here i = o for online editions
and i = p for print editions. We define $D_i$ to have a uniform distribution that
depends on $\bar{D}_i$, which reflects the relative ability of each platform to get scoops
and which we associate with the number of reporters on its staff. Hence, we have
$D_i \sim U[0, \bar{D}_i]$. As previously mentioned, for the time being
$\bar{D}_o = \bar{D}_p = \bar{D}$. We also assume $\bar{D}_p < 1$.
It is important to note that even if online and print editions shared the same
technology to discover stories we should not expect them to have similar success
at breaking news. Naturally, continuous updates give online editions a huge
edge over print editions in being the first to report stories. This issue is
reflected in Proposition 1.
Proposition 1 If online and print editions share the same technology for obtaining stories then print editions should break less than 13 percent of the stories.
Proof. We have two newspapers within each platform. The time taken by each
platform to discover a story is given by $w_i = \min\{D_{1i}, D_{2i}\}$. It is possible to
check that $w_i$ has a triangular distribution given by the density function
\[
f(w_i) =
\begin{cases}
\dfrac{2}{\bar{D}} - \dfrac{2 w_i}{\bar{D}^2} & \text{if } 0 \le w_i \le \bar{D} \\
0 & \text{otherwise}
\end{cases} \tag{11}
\]
In this context, online editions will have the scoop on every story that
originates early in the day. The online editions will discover, and instantly
publish, the story before print editions even get the chance to update their
editions. This will happen for stories whose event time T satisfies $T < 1 - \bar{D}$.
Print editions can only hope to capture a fraction of those stories that occur
later in the day. For the print editions to get a scoop they need to discover the
story before midnight ($w_p + T < 1$) and also need the online editions to get it
after midnight ($w_o + T > 1$). The first (second) probability is captured by the
first (second) term in equation (12).
10 Getting the story implies not just learning about the news event but also writing an article
about it.
The fraction of stories broken by either of the two print editions is given by
\[
S_p = \int_{1-\bar{D}}^{1} \left\{ \int_{0}^{1-T} f(w_p)\, dw_p \right\} \left\{ \int_{1-T}^{\bar{D}} f(w_o)\, dw_o \right\} dT = \frac{2\bar{D}}{15} \tag{12}
\]
We observe that for events with T close to $1 - \bar{D}$ or 1 the chances of the
print edition getting the scoop are slim. In the first case, the chance that online
editions do not get the story before midnight tends to zero. In the second case,
the chance that print editions get the story before midnight tends to zero. So, the
probability of any print edition breaking the story increases for events that occur
away from both thresholds.
Given that $\bar{D} < 1$, equation (12) implies that print editions should get less
than 13 percent of the scoops.
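The closed form in equation (12) can be checked numerically with a small Monte Carlo sketch. The function name, the default sample size and the seed are arbitrary choices for this illustration; the simulation only encodes the assumptions stated above: events uniform on the day, two newspapers per platform, discovery times uniform on $[0, \bar{D}]$, a midnight deadline for print and instant publication online.

```python
import random


def print_scoop_share(d_bar, n_events=200_000, seed=0):
    """Simulated fraction of stories first published by a print edition."""
    rng = random.Random(seed)
    print_scoops = 0
    for _ in range(n_events):
        t = rng.random()  # time of the news event within the day
        # Each platform gets the story when its faster newspaper discovers it.
        w_print = min(rng.uniform(0, d_bar), rng.uniform(0, d_bar))
        w_online = min(rng.uniform(0, d_bar), rng.uniform(0, d_bar))
        # Print wins only if it discovers the story before the midnight
        # deadline while both websites discover it after midnight.
        if t + w_print < 1 and t + w_online > 1:
            print_scoops += 1
    return print_scoops / n_events
```

For any value of `d_bar` below 1 the simulated share should be close to `2 * d_bar / 15`, up to simulation noise.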
As shown in Table 1, although online editions
have most of the scoops, print editions break many more stories than the model
predicts. Indeed, 33 percent of the scoops are released by the print editions,
a value well above the 13 percent bound. Within the model framework this
disparity suggests that $\bar{D}_p < \bar{D}_o$, which would imply that print editions have a
larger reporter staff.
The second piece of evidence that we can use as a reference regarding the
technology available to each platform to acquire news concerns the time
difference in reporting stories.
We continue to consider a situation where reporters are the only source for
stories and where we have $D^o = D^p = D$. In this scenario, it is possible to
check that the expected time difference in publishing a story between Clarin
and La Nacion is the same for both the online and print platforms. We will
formalize this result. Nevertheless, it is useful to start by stating the
intuition behind this conclusion.
On the one hand, a news story that is discovered at, say, 11 pm and 1 am
respectively by the two newspapers would be published with a 2 hour difference
in the online editions and a 24 hour difference in the print editions (because of
the midnight deadline). On the other hand, a news story that is discovered at,
say, 9 pm and 11 pm would be published with a 2 hour difference in the online
editions and a 0 hour difference in the print editions. In the end, if news events
originate uniformly along the day then both effects cancel each other out and
the expected time difference in the reporting of news stories turns out to be the
same for print and online editions.
Proposition 2 If online and print editions share the same technology for
obtaining stories then the expected time difference between newspapers within
each platform in publishing a story should be the same.
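Proof. The time difference between the two newspapers within platform $i$ in
discovering a story is given by $v^i = \Delta P^i = |D_1^i - D_2^i|$. It is
possible to check that $v^i$ has a triangular distribution given by the density
function,
$$
f(v^i) =
\begin{cases}
\dfrac{2}{D} - \dfrac{2 v^i}{D^2} & \text{if } 0 \le v^i \le D \\
0 & \text{otherwise}
\end{cases}
\tag{13}
$$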
The difference between discovering and reporting a story only arises for print
editions, which have to wait until midnight to update their edition. For online
editions, continuous updating implies that the time difference in first publishing
a story is equal to the time difference in the discovery of the story by each
newspaper.
Therefore, for the newspaper websites we have,
$$
E(\Delta P^o) = \int_{0}^{1} \int_{0}^{D} v^o f(v^o)\,dv^o\,dT = \frac{D}{3} \tag{14}
$$
Meanwhile, given the midnight deadline for print editions we have a zero
difference if both news outlets discover the story before midnight (scenario A)
and a one day difference if they do it on different days (scenario B).
$$
\begin{aligned}
E(\Delta P^p) &= E(\Delta P^p \mid A)\,P(A) + E(\Delta P^p \mid B)\,P(B) \\
&= 0 \cdot P(A) + 1 \cdot P(B) \\
&= P(B)
\end{aligned}
\tag{15}
$$
Therefore, the expected time difference in publishing a news story for print
editions is going to be the probability that the news story is discovered on
different days.
It is possible to check that,
$$
P(B) = \int_{1-D}^{1} \int_{1-T}^{D} f(v^p)\,dv^p\,dT = \frac{D}{3} \tag{16}
$$
Then we have,
$$
E(\Delta P^p) = \frac{D}{3} \tag{17}
$$
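A small Monte Carlo run can illustrate why the two platforms end up with the same expected gap. This is a sketch under the assumptions stated in the text (events uniform over the day, each newspaper's discovery lag uniform on $[0, D]$ with $D < 1$, print editions publishing at the first midnight after discovery); the function name, sample size and seed are arbitrary choices.

```python
import math
import random


def simulate_gaps(d, n=200_000, seed=0):
    """Average publication gap between two newspapers, online and in print."""
    rng = random.Random(seed)
    online_gap = 0.0
    print_gap = 0.0
    for _ in range(n):
        t = rng.random()            # time of day at which the event occurs
        d1 = rng.uniform(0.0, d)    # discovery lag of newspaper 1
        d2 = rng.uniform(0.0, d)    # discovery lag of newspaper 2
        online_gap += abs(d1 - d2)  # websites publish as soon as they discover
        # Print gap in days: 0 if both discover on the same day, 1 otherwise.
        print_gap += abs(math.floor(t + d1) - math.floor(t + d2))
    return online_gap / n, print_gap / n


online, printed = simulate_gaps(d=0.93)
print(round(online, 3), round(printed, 3), round(0.93 / 3, 3))
```

Both simulated averages should land close to $D/3$, up to sampling noise.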
Once again the evidence contradicts the model prediction. Table 2 shows
that the observed average $\Delta P^p = 0.31$ is much larger than the observed
average $\Delta P^o = 0.10$. This piece of evidence suggests that $D^p > D^o$,
which would imply that online editions have a larger reporter staff.
So, the relatively high fraction of scoops released by print editions suggests
that print platforms have more reporters on their staff than online platforms.
However, the small time difference between online editions in reporting stories
suggests exactly the opposite.
We will then incorporate into the model the possibility that newspapers
also rely on rewriters to obtain stories as a way to reconcile these seemingly
contradictory empirical findings.
Hiring rewriters will give newspapers the possibility of copying stories from
the rival. Each rewriter sees the story in the rival outlet as soon as it is published
and takes $C \sim U[0, \bar{C}]$ days to rewrite it.
Within the model framework, hiring rewriters does not change either the
fraction of scoops released by print editions or the time difference in their
publication of stories. $D^p < 1$ implies that hiring rewriters will not allow the
newspaper to publish the story any sooner (because of daily updates). The same
is of course not true for online editions.
It is then possible to estimate $D^p$ by using the observed average
$\Delta P^p = 0.31$. By using equation (17) we compute $D^p = 0.93$, which
satisfies the restriction $D^p < 1$.
By definition, the existence of rewriters does not affect the fraction of scoops
released by online and print editions. These proportions depend only on the
presence of reporters. We now allow for $D^p \neq D^o$ and recalculate the
proportion of stories that would be broken by print editions in this new scenario.
It is possible to check that if $D^o > 1$ then the fraction of scoops for print
editions is given by,
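$$
\begin{aligned}
S^p ={}& \int_{0}^{1-D^p} \int_{1-T}^{D^o} f(w_o)\,dw_o\,dT \\
&+ \int_{1-D^p}^{1} \int_{0}^{1-T} f(w_p) \int_{1-T}^{2-T} f(w_o)\,dw_o\,dw_p\,dT \\
&+ \int_{1-D^p}^{1} \int_{2-T}^{D^o} f(w_o)\,dw_o\,dT
\end{aligned}
\tag{18}
$$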
The first term in equation (18) accounts for stories that will for sure be
discovered by the print edition, but that might not be discovered by the online
edition before midnight. The second term accounts for stories that might be
discovered before midnight by the print edition but not by the online edition.
The third term accounts for stories that might not be discovered by the online
edition even the day after they originated. All three scenarios constitute
instances where the print edition gets the scoop.
Setting expression (18) equal to the observed print scoop fraction $S^p = 0.33$
and solving the equation numerically gives a value $D^o = 1.80$ when we
use $D^p = 0.93$. This value implies that print editions are 48 percent faster
in genuinely discovering stories than their online counterparts. Nevertheless,
continuous updates allow online editions to have 67 percent of the scoops.
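One way to read the 48 percent figure, assuming it compares the upper bounds of the two discovery-time distributions (the text does not spell out the calculation), is the following worked arithmetic:

```latex
1 - \frac{D^p}{D^o} = 1 - \frac{0.93}{1.80} \approx 0.48
```

Under the same reading, the expected time to first discovery, $D/3$, falls by the same proportion, from 0.60 to 0.31 days.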
We will now use these findings to incorporate rewriters into the model.
Proposition 3 The existence of rewriters explains the lower time difference in
publishing stories observed for online editions.
Proof. For online editions we now have that the time difference in first reporting
a story is given by the minimum time taken either by the reporting staff or the
rewriting staff to get the story.
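$$
\begin{aligned}
E(\Delta P^o) ={}& E(\min\{\Delta D^o, C\}) \\
={}& \int_{0}^{\bar{C}} \int_{0}^{C} \frac{v}{\bar{C}}\, f^o(v)\,dv\,dC
+ \int_{0}^{\bar{C}} \int_{0}^{v} \frac{C}{\bar{C}}\, f^o(v)\,dC\,dv \\
&+ \int_{0}^{\bar{C}} \frac{C}{\bar{C}}\,dC \int_{\bar{C}}^{D^o} f^o(v)\,dv
\end{aligned}
\tag{19}
$$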
The value $\bar{C} = 0.22$ makes equation (19) equal to the observed
$\Delta P^o = 0.10$.
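The calibration can be checked with a few lines of code. The sketch below does not evaluate equation (19) term by term; it uses the equivalent identity $E[\min(V, C)] = \int_0^\infty P(V > t)\,P(C > t)\,dt$, assuming $V$ follows the triangular density (13) with $D = D^o$ and $C \sim U[0, \bar{C}]$. The function and parameter names are hypothetical.

```python
def expected_online_gap(d_online, c_bar, steps=100_000):
    """E[min(V, C)] with V triangular on [0, d_online] and C ~ U[0, c_bar].

    Integrates P(V > t) * P(C > t) over t with the midpoint rule.
    """
    upper = min(c_bar, d_online)  # beyond this point one survival function is zero
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        surv_v = (1.0 - t / d_online) ** 2  # P(V > t) implied by density (13)
        surv_c = 1.0 - t / c_bar            # P(C > t) for the uniform rewriting lag
        total += surv_v * surv_c * h
    return total


print(round(expected_online_gap(1.80, 0.22), 3))
```

With $D^o = 1.80$ and $\bar{C} = 0.22$ the result is about 0.101 days, in line with the observed 0.10.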
The estimated numbers predict that 94 percent of the stories published but
not broken by an online edition are taken from the rival by rewriters and only
6 percent of them are genuinely discovered by reporters. Of course, reporters
remain a valuable resource for online editions in terms of providing scoops.
We now have all the components required to reconstruct the typical timeline
of a news story in the online and print platforms. The estimated parameter
values are $D^p = 0.93$, $D^o = 1.80$ and $\bar{C} = 0.22$. With these numbers
it is possible to check that online editions are faster than print editions in
delivering news. Indeed, the last online edition to publish the story is faster
than the first print edition to do it.
Proposition 4 Online editions are faster at delivering news than print editions.
Proof. The time taken by the first online edition to discover and publish the
story is given by,
$$
E(\min\{D_1^o, D_2^o\}) = \frac{D^o}{3} = 0.60 \tag{20}
$$
And the extra time taken by the second (last) online edition to publish the
story is,
$$
E(\Delta P^o) = 0.10 \tag{21}
$$
Meanwhile, for print editions the time taken by the first newspaper to discover
the story is,
$$
E(\min\{D_1^p, D_2^p\}) = \frac{D^p}{3} = 0.31 \tag{22}
$$
This number differs, due to daily updates, from the time taken to publish
the story,
$$
\begin{aligned}
E(\min\{P_1^p, P_2^p\}) ={}& \int_{0}^{1-D^p} (1-T)\,dT
+ \int_{1-D^p}^{1} (1-T) \int_{0}^{1-T} f(w_p)\,dw_p\,dT \\
&+ \int_{1-D^p}^{1} (2-T) \int_{1-T}^{D^p} f(w_p)\,dw_p\,dT \\
={}& 0.81
\end{aligned}
\tag{23}
$$
And the extra time taken by the second (last) print edition to publish the
story is,
$$
E(\Delta P^p) = 0.31 \tag{24}
$$
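The timeline in equations (20) to (24) can be reproduced from the three estimated parameters. The following is a minimal sketch, assuming that $f$ in equation (23) is the density of the earlier of two independent $U[0, D^p]$ discovery times; the helper names are hypothetical and the online gap of 0.10 is taken from equation (21) as given.

```python
def min_cdf(w, d):
    """CDF of the earlier of two independent U[0, d] discovery times (assumed form of f)."""
    if w <= 0.0:
        return 0.0
    if w >= d:
        return 1.0
    return 1.0 - (1.0 - w / d) ** 2


def expected_first_print_publication(d_print, steps=100_000):
    """Midpoint-rule evaluation of equation (23) for d_print < 1."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        p_today = min_cdf(1.0 - t, d_print)  # discovered before tonight's deadline
        # Published at the first midnight (1 - t away) or the second (2 - t away).
        total += ((1.0 - t) * p_today + (2.0 - t) * (1.0 - p_today)) * h
    return total


d_print, d_online, online_gap = 0.93, 1.80, 0.10
first_online = d_online / 3                      # equation (20)
last_online = first_online + online_gap          # adds equation (21)
first_print_discovery = d_print / 3              # equation (22)
first_print = expected_first_print_publication(d_print)  # equation (23)
last_print = first_print + d_print / 3           # adds equation (24)

print(round(first_online, 2), round(last_online, 2))
print(round(first_print_discovery, 2), round(first_print, 2), round(last_print, 2))
```

Because the clipped CDF equals one whenever $1 - T \ge D^p$, integrating over the whole day covers all three terms of equation (23) at once. The integral also simplifies to $1/2 + D^p/3$, which makes the 0.50 day cost of the midnight deadline explicit.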
The results are displayed in Figure 3. Online editions are slower at
discovering stories. It takes only 0.31 days after a news event takes place for
the first print edition to discover it. The figure goes up to 0.60 for online editions.
Within the model framework this disparity is interpreted as a lower number of
reporters on the staff of online editions. Nevertheless, daily updates increase by
0.50 days the time taken by print editions to publish stories. As a consequence,
instant updates allow online editions to break news stories faster than their
print counterparts. The first online edition to report the story takes 0.60 days,
while the first print edition to do it takes 0.81 days. We also observe that the
time difference in the reporting of stories is much smaller for online editions.
The presence of rewriters coupled with instant updates allows online editions
to reduce this difference to only 0.10 days, compared with 0.31 days for the
print editions.
The results predict that in a world with only online editions the last reader
to learn about a story would do so faster than the first reader in a world
with only print editions. Contrary to popular belief, the Internet has enhanced
news coverage. Continuous updates and the existence of rewriters allow online
readers to have faster access to news despite the fact that online editions have
reduced the number of reporters on their staff.
Table 1
Fraction of stories by newspaper and platform that had the scoop (03-08)

Platform   CL first   LN first   Both first   All stories
Print      0.07       0.07       0.19         0.33
Online     0.32       0.34       0.01         0.67
Table 2
Time difference in reporting news stories within each platform (in days)

Platform   CL first   LN first   Both first   All stories
Print      1          1          0            0.31
           (0.00)     (0.00)     (0.00)       (0.46)
Online     0.11       0.09       0            0.10
           (0.13)     (0.11)     (0.00)       (0.12)

Standard errors in parentheses
Table 3
Age of News and Readership

                    Online (07-09)           Print (98-09)
Age                 -0.841***   -0.761***    -0.853***   -0.667***
                    (0.183)     (0.251)      (0.117)     (0.112)
Day FE              YES         YES          YES         YES
Newspaper FE        YES         YES          YES         YES
Type FE             NO          YES          NO          YES
Adjusted R2         0.97        0.98         0.93        0.94
N of Observations   406         406          2,016       2,016
N of Day Effects    203         203          1,008       1,008

Standard errors in parentheses
Table 4
Scoops and Readership

                    Online (07-09)
Scoop               0.193***    0.144***
                    (0.036)     (0.048)
Day FE              YES         YES
Newspaper FE        YES         YES
Type FE             NO          YES
Adjusted R2         0.97        0.98
N of Observations   406         406
N of Day Effects    203         203

Standard errors in parentheses
Figure 1 - Distribution of Stories by First Time of Publication (Online
Editions)
Figure 2a - Time Difference in the First Publication of News Stories (Online
Editions)
Figure 2b - Time Difference in the First Publication of News Stories (Print
Editions)
Figure 3 - Online and Print editions efficacy at delivering news
[Timelines in days, with the news event taking place at 0.
Online Editions: first online edition discovers/publishes the story at 0.6;
last online edition publishes the story at 0.7.
Print Editions (daily updates): first print edition discovers the story at 0.3;
first print edition publishes the story at 0.8; last print edition publishes
the story at 1.1.]
A Appendix
In order to check the reliability of the news clustering process we rerun the
algorithm for 1998 without imposing the restriction of comparing only articles
within the same newspaper section. If articles from very dissimilar sections (e.g.
Sports and any other section) are assigned to the same news story then chances
are it is an error. Table A1 reports the results for those news stories
with exactly two articles. We observe that 13 percent of the matches correspond
to articles from different sections. Still, some of these articles could be correctly
assigned to the same news story, since some sections such as Argentina and
General Information have blurred boundaries. Nevertheless, note that the figure
goes down to just 4 percent when we focus on the Sports section, for which links
to articles appearing in other sections are harder to justify.
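The two percentages can be recomputed from the counts in Table A1. The snippet below is only an illustration: it assumes the table is upper triangular, with each row listing two-article stories that pair that section with itself or with a section further to the right.

```python
sections = ["Sports", "International", "Argentina",
            "Entertainment", "General Information", "Opinion"]

# Counts from Table A1; entry [i][j] pairs section i with section j (j >= i).
counts = [
    [1813, 10,   24,  10,  34,  5],
    [0,    426,  52,   5,  64, 35],
    [0,    0,  1114,   8, 238, 87],
    [0,    0,     0, 306,  59,  5],
    [0,    0,     0,   0, 823, 44],
    [0,    0,     0,   0,   0, 35],
]

total = sum(sum(row) for row in counts)
same_section = sum(counts[i][i] for i in range(len(sections)))
cross_share = (total - same_section) / total

# Every match involving Sports sits in the first row of the table.
sports_total = sum(counts[0])
sports_cross_share = (sports_total - counts[0][0]) / sports_total

print(round(cross_share, 2), round(sports_cross_share, 2))
```

The shares come out at 680 of 5,197 matches overall and 83 of 1,896 matches involving Sports, consistent with the 13 and 4 percent quoted in the text.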
An additional exercise to check the reliability of the clustering process is to
compute the most important news events according to our database. For this we
need to define a criterion for news story importance. Each day dozens of news
stories are published by the media, but the importance of each of them varies
enormously. We use the fraction of articles in an edition that refer to the news
story as a measure of the relative importance of different news stories.
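The importance measure can be sketched in a few lines. The input format here is an assumption made for illustration (one pair of edition and story identifier per article), the function name is hypothetical, and the toy data are invented.

```python
from collections import Counter


def story_importance(articles):
    """Share of each edition's articles devoted to each news story.

    `articles` holds one (edition, story_id) pair per article, where an
    edition is any hashable key such as (newspaper, date).
    """
    articles = list(articles)
    per_edition = Counter(edition for edition, _ in articles)
    per_story = Counter(articles)
    return {
        (edition, story): count / per_edition[edition]
        for (edition, story), count in per_story.items()
    }


# Hypothetical toy edition with five articles, two of them on the same story.
edition = ("LN", "day_1")
toy = [
    (edition, "story_a"),
    (edition, "story_a"),
    (edition, "story_b"),
    (edition, "story_c"),
    (edition, "story_d"),
]
shares = story_importance(toy)
print(shares[(edition, "story_a")])  # 2 of 5 articles -> 0.4
```

Ranking the resulting shares across all editions would then give top 10 lists of the kind reported in Tables A2 to A5.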
Results are shown in Tables A2 and A3 for the print editions and in Tables
A4 and A5 for online editions. It is important to note that of the forty news
clusters included in the tables only two cannot be related to a specific event,
and therefore cannot be classified as news events.
The assassination of Benazir Bhutto, a former Prime Minister of Pakistan,
tops Table A2, having occupied 8.8 percent of the articles in the La Nacion print
edition on December 28, 2007. Elections in the US, the war in Iraq, the
Lehman Brothers bankruptcy, the inauguration of Benedict XVI and the release
of Clara Rojas by the FARC constitute the remaining top international stories.
Additional stories that make it to the top 10 in the online editions include the
death of Pinochet, violence in the Middle East and elections in Chile and Venezuela.
Table A3 provides equivalent figures for national news events, which
might not be familiar to those not acquainted with Argentine news. The
investiture of President Rodriguez Saá after De la Rua's resignation constitutes
the most important event. This story occupied 16.6 percent of the articles in
the La Nacion print edition on December 24, 2001. The election of presidents
and the change of Ministers of Economy take most of the remaining spots.
Regarding online editions, some sports stories also make it to the top 10.
Table A1
Section of News Stories with exactly two Articles (1998)

                      Spor    Intl   Arg     Ent   Gral   Opi
Sports                1,813   10     24      10    34     5
International                 426    52      5     64     35
Argentina                            1,114   8     238    87
Entertainment                                306   59     5
General Information                                823    44
Opinion                                                   35
Table A2
Top 10 International News Stories, Print Editions (98-08)

News Story                        Section  Paper  Date       Fraction
Benazir Bhutto Assassination      Intl     LN     28 Dec 07  0.08
Bush wins Reelection              Intl     LN     4 Nov 04   0.08
Lehman Brothers Bankruptcy        Intl     CL     16 Sep 08  0.08
Saddam Hussein is captured        Intl     LN     16 Dec 03  0.08
Baghdad is bombed                 Intl     LN     18 Dec 98  0.08
Benedict XVI assumes              Intl     LN     20 Apr 05  0.08
FARC frees Clara Rojas            Intl     LN     11 Jan 08  0.08
Saddam Hussein is captured        Intl     LN     15 Dec 03  0.07
Bush takes office                 Intl     LN     21 Jan 01  0.07
Bush wins Presidential Election   Intl     LN     15 Dec 00  0.07
Table A3
Top 10 National News Stories, Print Editions (98-08)

News Story                            Section  Paper  Date       Fraction
Rodriguez Saa chosen as President     Arg      LN     24 Dec 01  0.16
Cristina Kirchner Inauguration        Arg      LN     11 Dec 07  0.15
Yabran suicide                        Arg      LN     21 May 98  0.15
De la Rua - undefined story           Arg      CL     2 May 00   0.14
Presidential Elections in Argentina   Arg      CL     28 Apr 03  0.13
Cavallo assumes as Min. of Economy    Arg      LN     21 Mar 01  0.12
Lavagna resigns as Min. of Economy    Arg      LN     29 Nov 05  0.12
Kirchner to announce Cabinet          Arg      CL     16 May 03  0.12
Cobos unties voting in the Senate     Arg      CL     18 Jul 08  0.11
Cristina Kirchner Inauguration        Arg      CL     11 Dec 07  0.11
Table A4
Top 10 International News Stories, Online Editions (03-08)

News Story                        Section  Paper  Date       Fraction
Saddam Hussein is captured        Intl     LN     14 Dec 03  0.22
Dictator Pinochet dies in Chile   Intl     LN     10 Dec 06  0.19
Benedict XVI chosen as Pope       Intl     LN     19 Apr 05  0.19
Bush wins Presidential Election   Intl     CL     3 Nov 04   0.18
Benedict XVI chosen as Pope       Intl     CL     19 Apr 05  0.18
Israel attacks Gaza               Intl     LN     27 Dec 08  0.18
Plebiscite in Venezuela           Intl     LN     14 Aug 04  0.18
Violence in Middle East           Intl     LN     22 Mar 04  0.17
Bachelet wins Election in Chile   Intl     LN     15 Jan 06  0.15
Bush announces War in Iraq        Intl     CL     19 Mar 03  0.14
Table A5
Top 10 National News Stories, Online Editions (03-08)

News Story                            Section  Paper  Date       Fraction
Cobos unties voting in the Senate     Arg      LN     17 Jul 08  0.28
ONGs - undefined story                Gral     LN     25 Sep 03  0.23
Lavagna resigns as Min. of Economy    Arg      LN     28 Nov 05  0.22
Argentina plays against Peru          Spor     LN     8 Jul 07   0.21
Nalbandian wins Masters at Shanghai   Spor     LN     20 Nov 05  0.21
Argentina plays against Colombia      Spor     LN     1 Jul 07   0.21
Gaudio wins Roland Garros             Spor     LN     6 Jun 04   0.19
Cristina Kirchner wins Election       Arg      LN     28 Oct 07  0.19
Tragedy at Cromagnon Discotheque      Gral     CL     31 Dec 04  0.18
Lousteau resigns as Min. of Economy   Arg      LN     3 Nov 04   0.18
References
[1] Athey, Susan, Emilio Calvano and Joshua S. Gans (2011). “Will the Internet
Destroy the News Media? or Can Online Advertising Markets Save the
Media?”, mimeo.
[2] Bergemann, Dirk and Alessandro Bonatti (2011). “Targeting: Implications
for Offline vs. Online Media Competition”, mimeo.
[3] Di Tella, Rafael and Ignacio Franceschelli (2011). “Government Advertising
and Media Coverage of Corruption Scandals”, American Economic Journal:
Applied Economics, forthcoming.
[4] Federal Trade Commission (2010). “Staff Discussion Draft: Potential Policy
Recommendations to Support the Reinvention of Journalism”, February 3 2010,
http://www.ftc.gov/opp/workshops/news/jun15/docs/new-staff-discussion.pdf
[5] Gentzkow, Matthew (2007). “Valuing New Goods in a Model with
Complementarity: Online Newspapers”, American Economic Review, 97(3).
[6] Gentzkow, Matthew and Jesse Shapiro (2006). “Media Bias and Reputation”,
Journal of Political Economy, 114(2).
[7] Gentzkow, Matthew and Jesse Shapiro (2008). “Competition and Truth in
the Market for News”, Journal of Economic Perspectives, 22(2).