When the Ink is Gone: The Transition from Print to Online Editions

Ignacio Franceschelli
Northwestern University
June 9, 2011

VERY PRELIMINARY. DO NOT CITE.

1 Introduction

Newspapers are acknowledged as the main providers of original news across the different media platforms. In this context, the migration of readers from print to online editions has spurred concern about how a future with only online editions would affect news coverage.[1] The Federal Trade Commission (2010) discusses the challenges faced by journalism in the Internet age. In line with the existing consensus, the document remains skeptical about the potential benefits the Internet might bring in terms of news coverage. It argues that the new business model brought by online news has produced a substantial decline in newspaper revenue. The document claims that this financial downturn has forced newspapers to fire reporters, which has caused a significant loss of news coverage. Finally, different policy proposals such as copyrights for news, antitrust exemptions and financial support from the government are discussed as possible ways to counteract the negative effect the Internet has supposedly had on the quality of the news readers have access to.

This paper provides a first empirical assessment of the impact of the Internet on news coverage. We are interested in comparing how fast news would be delivered to readers in a world with only print editions relative to a world with only online editions.

Ignacio Franceschelli, 2001 Sheridan Rd, Northwestern University, Department of Economics, Evanston, IL 60208, US. [email protected]. Tel: (847) 491 8247. I thank Matthew Gentzkow and Igal Hendel for generous help and discussions as well as Pablo Boczkowski, Sebastián Campanario, Peter DiCola, Rafael Di Tella, Gastón Roitberg, Jesse Shapiro and participants at the IO lunch at Northwestern University.
[1] Comscore points out that between April 2008 and April 2009 print-only readers in the US decreased by 12 percent, from 48 to 42 million, and readers of both print and online editions decreased by 10 percent, from 38 to 34 million. Meanwhile, online-only readers soared by 26 percent, from 28 to 35 million. Overall, newspaper readers decreased by only 2 percent, from 114 to 112 million.

We start, however, by analyzing the relevance of this issue. We evaluate the extent to which readers value learning about news stories sooner. To perform this analysis we combine a database that includes over one million news stories with daily print and online readership numbers. We find that the average age of the news stories reported in a newspaper edition has a negative effect on readership. Readers value learning about news sooner. The effect is of similar magnitude for online and print readers.

We then analyze the effect of the Internet on the time it takes for news to reach readers. To conduct this analysis we reconstruct the typical timeline of a news story in a print world and in an online world. By timeline we refer to the different milestones the story goes through from the moment the actual news event takes place until the last newspaper reports on it. We introduce a model to interpret the news story statistics in terms of the parameters that give rise to the timeline. In our framework the speed at which news is reported depends on how fast the newspaper gets the story and how quickly it is able to publish it. Newspapers obtain news stories from reporters, who genuinely discover stories, and from rewriters, who simply copy stories from the rival. The only difference we impose between the newspaper platforms is that print editions have simultaneous daily updates while online editions can be updated at any time. Figure 3 portrays the results.
As pointed out by the Federal Trade Commission document, we observe that, due to their reduced reporter staff, online editions are slower at genuinely discovering news stories. However, two factors downplayed by the document allow online editions to prevail in the end over their print counterparts. Continuous updates allow them to publish a news story as soon as the newspaper discovers it. Also, rewriters play an important role in online editions by allowing newspapers to reduce the time taken to report a story after the rival has broken the news. The exercise shows that in a world populated only by online editions the last reader to learn about a story would do so sooner than the first reader in a world populated only by print editions.

In brief, we find that readers value learning about stories sooner and that the Internet increases the efficacy of newspapers at performing this task. This result brings a fresh perspective to the existing body of literature, which has so far emphasized the negative aspects that the Internet has introduced into the newspaper industry.[2]

[2] For example, Athey et al (2011) and Bergemann et al (2011) develop models where the Internet negatively affects the level of revenues received by newspapers.

2 Identifying news stories

The main novelty of the project is its focus on the news story level. A news story refers to a particular event with which newspaper articles can be associated (e.g. Saddam Hussein being captured). This approach represents a departure from the existing literature, which has so far relied on more aggregate measures when studying the newspaper industry.[3]

The news story level of analysis requires an important effort in terms of data collection and article clustering. We consider every article published by the two main newspapers in Argentina in their online (2003-2009) or print editions (1998-2009). As we will see, studying the Argentine market presents several advantages for the purposes of this research.
First, we developed a program that navigated through the millions of article web pages and identified and downloaded the different article components such as titles, text and time of publication. The program had to run for several months. Second, we implemented an algorithm based on word frequency to classify a total of more than 1.2 million articles into the news events the articles are about. The algorithm performed over 570 million article comparisons in order to build the different news clusters.

This data process allowed us to reconstruct the exact timeline for each news story reported by the media. It allows us to determine which newspaper got the scoop and how long it took the rival to publish the story.

2.1 Newspaper Industry in Argentina

We focus on the two main newspapers in Argentina: Clarin and La Nacion. The availability of high frequency circulation data is a first advantage of studying the Argentine newspaper industry. The Instituto Verificador de Circulaciones (IVC) provides circulation data per month for each of the seven weekdays.[4] We have 7 circulation numbers per newspaper per month, 84 per newspaper per year and 1,008 per newspaper in our sample, which covers a twelve-year period. This data is much more detailed than the data available for US newspapers, which is aggregated at the quarterly level. Daily data on unique users is available for the online editions from Alexa.com. We aggregate the online readership data at the same level as the print readership data for consistency.

A second advantage of studying the Argentine market is that most circulation comes from newsstands, not from subscriptions. In fact, kiosk sales account for more than 76 percent of newspaper circulation. The high frequency variation in circulation allows us to estimate the sales effect of publishing recent news.
Otherwise, the variation between newspapers in scoop generation would vanish as we consider the longer periods associated with changes in the number of subscriptions.

Finally, the two newspapers we study account for a large portion of the print and online readership market. Clarin's circulation represents 36 percent of the print newspaper market and La Nacion has 14 percent of the market. Moreover, together they represent over 70 percent of non-yellow newspaper circulation. clarin.com and lanacion.com are also the top online newspapers. clarin.com leads with 40 percent of the online newspaper market while lanacion.com has 21 percent.

[3] Di Tella and Franceschelli (2011) is an exception. They study the effect of the reporting of corruption on circulation in Argentina using monthly figures.

[4] The IVC is Argentina's equivalent to the US Audit Bureau of Circulations.

2.2 News Clustering

In Argentina there is no digital database that contains all the information we need regarding the articles published in newspapers. We need the title, subtitles, abstract and text for every article ever published in the online or print edition, as well as the newspaper section, day and exact time of publication (for online articles). To the best of our knowledge such a complete database does not exist in any other media market in the world either.

However, both Clarin and La Nacion provide readers on their websites with a historical archive that displays every article included in their print or online edition. We developed a program based on the iMacros platform to automate the download of all articles in the print edition in the 1998-2009 period and in the online edition in the 2003-2009 period.[5] This program navigated through each of the millions of articles and recognized and collected the title, subtitles and text of each article as well as the day, time, edition and section of the newspaper in which it was published.
There are 1,006,685 articles published in the print editions and 240,110 articles published in the online editions. Overall La Nacion accounts for 51.5 percent of the print articles and for 55.2 percent of the online articles.

For each of the 1,246,795 articles included in the data we want to identify the news event the article refers to. This will allow us to analyze the timeline of each story. We are particularly interested in determining which newspaper got the scoop. Of course, it would not be viable to consider all possible combinations in order to construct the news clusters. Instead, we decided to divide the database into news cycles and compare the articles within each of these time units.

[5] The historical archive for the Clarin online edition begins in 2003.

2.3 News Cycles

We define a news cycle as the 3 am-3 am twenty-four-hour period that includes the print editions from two consecutive days as well as the online articles posted within this period. Underlying this decision is the assumption that if a newspaper were to copy a news story in the print edition it would do so at the latest in the following edition. Print editions close at approximately 3 am.

The news cycle definition also assumes that if a newspaper were to copy a news story in the online edition it would do so in the same 3 am-3 am window in which the event was first reported. Figure 1 shows the time distribution of the first appearance of news stories online. The news cycle for online news seems to fit the 3 am-3 am time frame well. In particular, the lowest level of activity is observed in the 2 am-4 am period.

As a consequence of the news cycle definition, the difference between newspapers in publishing a given news story will by construction be at most one day. For print editions it can be either 0 days (if both newspapers publish the story on the same day) or 1 day (if they do so on different days).
For online editions the time difference can take any value within the [0,1] interval. Later in the paper we will study the conditions under which we should expect the mean time difference to be the same under both platforms.

It is important to note that the definition of twenty-four-hour news cycles does not imply that news stories can last at most one day. News stories can go on for days, weeks or months as long as the story continues to show up in every consecutive news cycle. However, if a certain event does not appear in the newspapers' online or print editions during a news cycle, then when it reappears it will be classified as a different news story. For example, if no reports on the Iraq war appear for a week, when the newspapers return to the subject a different news story will be created. We interpret news stories not as specific topics but as different pieces of information that might or might not relate to a common topic.

In the 1998-2002 period each news cycle contains an average of 456 articles. The figure goes up to 556 articles in the 2003-2009 period, when online articles are incorporated into the sample.

2.4 News Events

Following the computer science literature (see Salton 1988) we run an algorithm that relies on word frequency over each of these news cycles to identify the different news stories. In this paper we decided to consider only the proper nouns included in the title, subtitles or text of each article. Proper nouns represent unique entities such as people, places or institutions. The dataset is composed of more than 4,300 news cycles, each of which includes about 500 articles. Over 56,000 different proper nouns are used in the database and each news cycle mentions about 1,200 of them.

The first article in the news cycle is assigned to the first news story. The second article is compared to the first news story.
If the similarity (which we define later) in proper noun frequency between the first and second article surpasses a certain threshold, then the second article is assigned to the first news story. This news story will now include the proper nouns from both the first and second article. However, if the similarity does not surpass the threshold, then the second article is assigned to a second news story. We continue the process for every article included in the news cycle. We then repeat the exercise for every news cycle.

Article $D$ is represented by the vector $D = (tf_1, \dots, tf_p)$, where $tf_i$ is the term frequency of proper noun $i$ in article $D$. $tf_i$ is defined for all $p$ proper nouns that appear in any article in the news cycle. Vector $D$ has an average size of 1,200.

We use the weight factor $\log(N/n_i)$ to weight the term $tf_i$. $N$ is the total number of articles in the database and $n_i$ is the total number of articles in the database in which proper noun $i$ appears. This weight factor gives more influence in the word frequency comparison between articles to those proper nouns that are relatively rare in the news.

The similarity coefficient between article $D$ and news story $Q$ is normalized to lie in the $[0, 1]$ interval:

$$\mathrm{similarity}(Q, D) = \frac{\sum_{k=1}^{t} w_{qk}\, w_{dk}}{\left( \sum_{k=1}^{t} (w_{qk})^2 \, \sum_{k=1}^{t} (w_{dk})^2 \right)^{0.5}} \tag{1}$$

Here we define $w_{dk}$ to be:

$$w_{dk} = \frac{tf_k \log(N/n_k)}{\left( \sum_{j=1}^{t} \left( tf_j \log(N/n_j) \right)^2 \right)^{0.5}} \tag{2}$$

We set the threshold value to 0.5. If there is no news story $Q$ such that the similarity coefficient is greater than 0.5, then article $D$ is assigned to a new news story. To reduce potential mistakes we only compare articles that appear in the same newspaper section (Sports, Entertainment, Argentina, International, Opinion or General Information). The algorithm identifies a total of 943,799 news stories out of the 1,246,795 articles included in the data.

In the appendix we include several exercises to check the reliability of the news clustering process.
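The single-pass procedure described above, together with equations (1) and (2), can be sketched in a few lines of Python. This is only an illustrative sketch, not the program used for the paper: the function names are hypothetical, the weights $\log(N/n_i)$ are computed from whatever list of articles is passed in rather than from the full database, and, because the text does not say which story is chosen when several exceed the threshold, the sketch assumes the most similar one.

```python
import math
from collections import Counter


def idf_weights(articles):
    """Weight factor log(N / n_i) for each proper noun (hypothetical helper)."""
    n_articles = len(articles)
    doc_freq = Counter(noun for nouns in articles for noun in set(nouns))
    return {noun: math.log(n_articles / df) for noun, df in doc_freq.items()}


def weighted_vector(nouns, idf):
    """Normalized tf * log(N/n) weights, following equation (2)."""
    tf = Counter(nouns)
    raw = {k: tf[k] * idf.get(k, 0.0) for k in tf}
    norm = math.sqrt(sum(v * v for v in raw.values()))
    return {k: v / norm for k, v in raw.items()} if norm > 0 else {}


def similarity(q, d):
    """Similarity coefficient between two sparse weight vectors, equation (1)."""
    dot = sum(w * d.get(k, 0.0) for k, w in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    return dot / (norm_q * norm_d) if norm_q > 0 and norm_d > 0 else 0.0


def cluster_cycle(articles, idf, threshold=0.5):
    """Assign each article of one news cycle to a story, in order of appearance.

    Assumption: when several stories exceed the threshold, take the most similar.
    """
    stories = []  # each story pools the proper nouns of its member articles
    for index, nouns in enumerate(articles):
        d = weighted_vector(nouns, idf)
        best, best_sim = None, threshold
        for story in stories:
            sim = similarity(weighted_vector(story["nouns"], idf), d)
            if sim > best_sim:
                best, best_sim = story, sim
        if best is None:
            stories.append({"nouns": list(nouns), "members": [index]})
        else:
            best["nouns"].extend(nouns)
            best["members"].append(index)
    return [story["members"] for story in stories]
```

With four toy articles, two about Lehman Brothers and two about Benazir Bhutto, `cluster_cycle` returns two stories containing articles 0-1 and 2-3 respectively. Restricting comparisons to articles in the same newspaper section, as the paper does, would amount to calling the function separately for each section.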
First, we rerun the algorithm without imposing the restriction of comparing only articles within the same newspaper section. The results show that articles from very dissimilar sections are rarely assigned to the same news story. Second, we compute the most important news events according to our database. We define the relevance of a news story as the fraction of articles in an edition that the story covers. The results show well defined news clusters built around very important news events such as the Lehman Brothers bankruptcy, the inauguration of Benedict XVI or the assassination of Benazir Bhutto.

Throughout the paper we consider only those news stories that were reported at some point by both online editions or by both print editions. As a consequence, a total of 95,961 stories and 299,455 articles are considered. This approach allows us to define the time difference in reporting news stories between both newspapers within each platform. It also allows us to overcome the obvious issue that a newspaper might not report a news event because it is not interested in doing so, and not because it does not yet have the story.[6]

Tables 1 and 2 provide statistics for the news stories by newspaper and platform. Figure 2 shows the time difference in reporting news stories for online editions (in minutes) and print editions (in days).[7]

[6] It is hard to think of a reason why a newspaper would decide to postpone reporting a story and thereby put at risk its chances of having the scoop.

[7] In both figures negative values correspond to Clarin breaking the news and positive values to La Nacion releasing the scoop.

3 The value of getting news faster

We do not know the time at which the actual news events the stories are about took place. In the next section we will implement an indirect approach to compute the average time taken to report on stories after news events take place.
However, computing this same time span for each story included in the data would require researching each story individually. Given that the project includes about one hundred thousand stories, this task is infeasible.

We do know, however, for each article included in the data, the time elapsed since the story the article is about was first published in each platform. We then combine this information with daily data on online and print readership to compute the value to readers of getting news faster.

We start by defining $age_a$ as the time elapsed since the news story reported in article $a$ was first published by either of the two newspapers within the platform:

$$age_a = P_a - P_{a0} \tag{3}$$

$P_a$ represents the time of publication of article $a$ and $P_{a0}$ represents the time of publication of the first article about the story that $a$ is about. We further define $age$ to be the average of $age_a$ across all $N$ articles included in the newspaper edition:

$$age = \frac{1}{N} \sum_{a} age_a \tag{4}$$

If the newspaper only reports scoops then $age$ takes a value of 0. Alternatively, if the newspaper only reports stories that are one day old then $age$ takes a value of 1. By construction $age$ belongs to the interval [0,1].

We model the indirect utility of consumer $i$ from reading newspaper $j$ on day $t$ as a function, among other factors, of how old the articles he has access to are:[8]

$$u_{ijt} = \alpha\, age_{jt} + \beta\, type_{jt} + \gamma_j + \delta_t + \xi_{jt} + \varepsilon_{ijt}, \qquad i = 1, \dots, I_t, \quad j = 1, 2, \quad t = 1, \dots, T \tag{5}$$

The vector $type$ accounts for the fraction of articles in each newspaper section. $\gamma_j$ and $\delta_t$ are dummy variables for newspaper $j$ and day $t$. We assume that the $\varepsilon_{ijt}$ are distributed according to a Type I extreme value distribution.

[8] To perform a consistent comparison between online and print editions we define $t$ per month for each of the seven days of the week. We observe daily readership for the online edition; circulation data, however, is observed per month for each of the seven days of the week.
This implies using data at the quasi-daily level: instead of having 365 observations per newspaper per year we have 84. Therefore, each time we mention the day as the unit of time we are actually referring to the aggregate over each of the seven days of the week for each month.

This assumption implies that the market share of newspaper $j$ on day $t$ is given by:

$$\ln(s_{jt}) - \ln(s_0) = \alpha\, age_{jt} + \beta\, type_{jt} + \gamma_j + \delta_t + \xi_{jt} \tag{6}$$

We proceed to estimate (6) for online and print editions in two separate regressions.[9] The results are displayed in Table 3. We observe that the time elapsed since the news story was first released has a negative effect on readership for both online and print readers. Moreover, the effect is of similar magnitude for both platforms. The effect of $age$ on online readers is $\hat{\alpha}_o = -0.76$ and the effect of $age$ on print readers is $\hat{\alpha}_p = -0.67$.

Readers value learning about news stories sooner. These results confirm the relevance of the discussion regarding the relative performance of newspapers in delivering news fast in an online world and in a print world.

The results are also, to the best of our knowledge, the first to link daily newspaper content with daily readership. So far the existing literature on newspapers has related yearly circulation to lower frequency measures such as the political position of the news outlet. Unlike other industries, newspapers present day-to-day variation in their content. Each day the newspaper staff aims to include in the edition news stories that will interest their readers. This daily process, which is at the core of the newspaper industry, has never before been studied with a large database.

Finally, the importance of fresh news to readers also gives us a motivation to explore the role of scoops. This role has been discussed in relation to its effect on the reputation of newspapers (see for example Gentzkow and Shapiro, 2008).
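Because equation (6) is linear in its parameters, one simple way to recover the coefficient on $age$ is ordinary least squares with newspaper and day dummies. The sketch below is an assumption about how such a regression could be set up, not a description of the estimation behind Table 3: the function name, the argument layout and the treatment of the outside share $s_0$ as an observed input are all hypothetical, and the caller is assumed to drop one newspaper section from `type_frac` to avoid collinearity with the constant.

```python
import numpy as np


def age_coefficient(share, outside_share, age, type_frac, paper_id, day_id):
    """OLS estimate of the age coefficient in ln(s_jt) - ln(s_0) = alpha*age + ...

    All arguments are arrays with one row per newspaper-day observation.
    type_frac is a 2-D array of section fractions (one section omitted).
    """
    y = np.log(share) - np.log(outside_share)
    # Dummy variables, dropping the first category of each to avoid collinearity.
    paper_dummies = (paper_id[:, None] == np.unique(paper_id)[None, 1:]).astype(float)
    day_dummies = (day_id[:, None] == np.unique(day_id)[None, 1:]).astype(float)
    design = np.column_stack(
        [np.ones_like(y), age, type_frac, paper_dummies, day_dummies]
    )
    coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefficients[1]  # column 1 of the design matrix holds age
```

With only two newspapers per platform, the day dummies absorb everything common to both on a given day, so the age coefficient is identified from within-day differences between the two newspapers.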
In this paper we focus, instead, on an alternative mechanism by which scoops generate value for readers: breaking news allows newspapers to provide readers with fresh news. As long as readers value learning about news stories sooner, as the results showed, breaking stories should have a positive daily impact on readership.

Moreover, the different update technologies available suggest that scoops should have a larger effect on print readers than on online readers. On the one hand, daily updates imply that within the print platform not getting the scoop means that the news story will be one day old when the newspaper finally reports it. On the other hand, the data shows that non-scoop stories will be only one tenth of a day old in online editions.

We define $scoop$ as the fraction of articles in the newspaper edition that contain a scoop. As previously discussed, daily updates imply that for print editions we have a perfect relationship between $age$ and $scoop$ (as can be observed in Figure 2b):

$$age = (1 - scoop) \tag{7}$$

Expression (7) implies that the effect of $scoop$ on print readership is given by $-\hat{\alpha}_p = 0.67$, the negative of the $age$ coefficient reported in Table 3.

Meanwhile, continuous updates imply that there is not a perfect relationship between $age$ and $scoop$ for online editions (as can be observed in Figure 2a). The value of $age_a$ varies across non-scoop articles. Nevertheless, we know from Table 2 that $age_a$ for non-scoop articles will be on average 0.10 days, a mere 10 percent of the 1 day value we observe for print editions.

[9] We have circulation numbers for print editions from the Instituto Verificador de Circulaciones (IVC). These numbers are available for the 1998-2009 period. The data on unique users for the online editions was obtained from Alexa.com. However, the historical data from Alexa.com at the daily level is available only for the October 2007-2009 period. This explains the lower number of observations for online editions.
Therefore, we have that

$$E(age) = (1 - scoop) \cdot 0.1 \tag{8}$$

The disparity between $age$ and $scoop$ for online editions allows us to perform an additional test to provide further evidence of the relevance of providing faster access to news stories. Using (8) it is possible to express equation (6) as a function of $scoop$ for online editions. Writing $\alpha$ for the coefficient on $age$ in (6), absorbing the constant $\alpha/10$ into the dummies, and writing $\lambda$ for the resulting coefficient on $scoop$:

$$\begin{aligned}
\ln(s_{jt}) - \ln(s_0) &= \frac{\alpha}{10}\,(1 - scoop_{jt}) + \beta\, type_{jt} + \gamma_j + \delta_t + \xi_{jt} \\
&= -\frac{\alpha}{10}\, scoop_{jt} + \beta\, type_{jt} + \gamma_j + \delta_t + \xi_{jt} \\
&= \lambda\, scoop_{jt} + \beta\, type_{jt} + \gamma_j + \delta_t + \xi_{jt}
\end{aligned} \tag{9}$$

We proceed to estimate (9) for online editions. The results are displayed in Table 4.

$scoop$ has a lower impact on reducing $age$ in comparison to what we observe for print editions. Within the online platform scoops are not as necessary in order to publish fresh news. Of course, this is not the case for print editions, where the newspaper needs to publish the scoop or otherwise the story will be one day old when reported. If we also consider that the novelty of news has a similar effect on online and print readership (the estimated $age$ coefficients are similar), then the model predicts a smaller effect of $scoop$ on online readers:

$$\lambda_o = \frac{\lambda_p}{10} \tag{10}$$

The results confirm this notion. The effect of $scoop$ on print readers is $\hat{\lambda}_p = 0.67$ (Table 3) and the effect of $scoop$ on online readers is only $\hat{\lambda}_o = 0.14$ (Table 4). Although the difference is not as big as predicted by the model, it remains large.

Online and print readers prefer to learn about stories sooner. This result gives relevance to the comparison of the speed of news delivery in a print world and in an online world that we perform in the next section. Nevertheless, it is also worth noting the importance of the documented reduction in the value of scoops in the online era. This phenomenon, introduced by a change in the updating technology available for newspaper editions, might bring huge changes in newsroom composition. Indeed, anecdotal evidence suggests that the staff of online editions has fewer reporters and more rewriters than the staff of print editions.
One of the newspapers analyzed in this paper has a staff of about 500 people in the print edition and of 50 people in the online edition. More importantly, average wages in the online edition are only 1/3 of those observed in the print edition. This issue will also be studied in the next section.

4 News stories timeline

Our aim in this section is to compare the performance of online and print editions in delivering news stories. To make such a comparison we study the typical timeline of a news story within each platform. The timeline shows the different stages the news story goes through until the last newspaper publishes an article about it. We start by presenting some figures obtained from the data and then introduce a framework to interpret these empirical findings in terms of the parameters needed to reconstruct the news story timeline.

Tables 1 and 2 provide statistics for the news stories by newspaper and platform. While we observe similar figures for both newspapers within each platform, important differences arise once we compare across platforms.

Table 1 provides information regarding the first outlet to break a story. We observe that 7 percent of the stories were first reported by the Clarin print edition, 7 percent by the La Nacion print edition and 19 percent by both print editions on the same day. Meanwhile, 32 percent of the stories were first reported by clarin.com, 34 percent by lanacion.com and 1 percent by both online editions in the same minute. So, while we see similar success rates at breaking stories for both newspapers within each platform, there is a huge disparity between the 33 percent of stories that first appeared in a print platform and the 67 percent that first appeared online.

A similar pattern is observed when we focus on Table 2, which shows the time taken to publish stories broken by the rival.
We observe that clarin.com takes 0.11 days to publish a story that was first reported by lanacion.com and lanacion.com takes 0.09 days to publish a story that was first reported by clarin.com. The disparity, however, arises when we compare the 0.31 day difference for print editions with the 0.10 day difference for online editions.

We now introduce a model in order to interpret the results displayed in Tables 1 and 2 in terms of the performance of online and print platforms in delivering news. In line with the findings from the data, throughout the model we assume that both newspapers share the same technology to obtain and publish stories within each platform. Nevertheless, we allow for potential disparities to emerge when we compare across platforms.

We have two newspapers, Clarin and La Nacion, each with a website and a paper edition. We treat the platforms within the same newspaper as independent of each other. While each platform has its own staff, we are aware that strategic decisions at the newspaper level might undermine the independence assumption.

The time of publication of news stories will depend on two factors: how fast the newspaper gets the story and how fast the newspaper can update its edition to publish the acquired story.[10]

We start by assuming that online and print platforms share the same technology to obtain stories. For this to happen we begin the analysis in a situation where newspapers can only rely on reporters to obtain stories. Reporters, as opposed to rewriters, genuinely discover stories and do not steal content from the rival. As a consequence, online and print editions differ only in the updating technology available. While print editions have simultaneous daily updates (at midnight), continuous updates are available for online editions.

We assume that news events take place uniformly along the day, where the day is represented by the $[0, 1]$ interval.
The staff of reporters from each newspaper and platform takes $D^i$ days to discover a story after the actual event took place. Here $i = o$ for online editions and $i = p$ for print editions. We define $D^i$ to have a uniform distribution that depends on $\bar{D}^i$, which reflects the relative ability of each platform to get scoops and which we associate with the number of reporters on their staff. Hence, we have $D^i \sim U[0, \bar{D}^i]$. As previously mentioned, for the time being $\bar{D}^o = \bar{D}^p = \bar{D}$. We also assume $\bar{D}^p < 1$.

It is important to note that even if online and print editions shared the same technology to discover stories we should not expect them to have similar success at breaking news. Naturally, continuous updates give online editions a huge edge over print editions in being the first to report stories. This issue is reflected in Proposition 1.

Proposition 1 If online and print editions share the same technology for obtaining stories then print editions should break less than 13 percent of the stories.

Proof. We have two newspapers within each platform. The time taken by each platform to discover a story is given by $w^i = \min\{D_1^i, D_2^i\}$. It is possible to check that $w^i$ has a triangular distribution given by the density function

$$f(w^i) = \begin{cases} \dfrac{2}{\bar{D}} - \dfrac{2 w^i}{\bar{D}^2} & \text{if } 0 \le w^i \le \bar{D} \\ 0 & \text{otherwise} \end{cases} \tag{11}$$

In this context, online editions will have the scoop on every story that originates early in the day. The online editions will discover, and instantly publish, the story before print editions even get the chance to update their editions. Writing $T$ for the time of day at which the event takes place, this will happen for stories with $T < 1 - \bar{D}$.

Print editions can only hope to capture a fraction of those stories that occur later in the day. For the print editions to get a scoop they need to discover the story before midnight ($w^p + T < 1$) and also need the online editions to get it after midnight ($w^o + T > 1$). The first (second) probability is captured by the first (second) term in equation (12).
The fraction of stories broken by either of the two print editions is given by

$$S^p = \int_{1-\bar{D}}^{1} \left\{ \int_{0}^{1-T} f(w^p)\, dw^p \right\} \left\{ \int_{1-T}^{\bar{D}} f(w^o)\, dw^o \right\} dT = \frac{2\bar{D}}{15} \tag{12}$$

We observe that for events with $T$ close to $1 - \bar{D}$ or to 1 the chances of a print edition getting the scoop are slim. In the first case, the probability that online editions do not get the story before midnight tends to zero. In the second case, the probability that print editions get the story before midnight tends to zero. So, the probability of a print edition breaking the story increases for events that occur away from both thresholds. Given that $\bar{D} < 1$, equation (12) implies that print editions should get less than 13 percent of the scoops.

[10] Getting the story implies not just learning about the news event but also writing an article about it.

As shown in Table 1, although online editions have most of the scoops, print editions break many more stories than the model predicts. Indeed, 33 percent of the scoops are released by the print editions, a value well above the 13 percent bound. Within the model framework this disparity suggests that $\bar{D}^p < \bar{D}^o$, which would imply that print editions have a larger reporter staff.

The second piece of evidence that we can use as a reference regarding the technology available to each platform to acquire news concerns the time difference in reporting stories. We continue to consider a situation where reporters are the only source of stories and where we have $\bar{D}^o = \bar{D}^p = \bar{D}$. In this scenario, it is possible to check that the expected time difference in publishing a story between Clarin and La Nacion is the same for both the online and print platforms. We will formalize this result. Nevertheless, it is useful to start by stating the intuition behind this conclusion.
On the one hand, a news story that is discovered at, say, 11 pm and 1 am by the two newspapers would be published with a 2 hour difference in the online editions and a 24 hour difference in the print editions (because of the midnight deadline). On the other hand, a news story that is discovered at, say, 9 pm and 11 pm would be published with a 2 hour difference in the online editions and a 0 hour difference in the print editions. In the end, if news events originate uniformly along the day then both effects cancel each other out and the expected time difference in the reporting of news stories turns out to be the same for print and online editions.

Proposition 2 If online and print editions share the same technology for obtaining stories then the expected time difference between newspapers within each platform in publishing a story should be the same.

Proof. The time difference between the two newspapers of a platform in discovering a story is given by $v^i = \Delta D^i = |D_1^i - D_2^i|$. It is possible to check that $v^i$ has a triangular distribution with density function

$$f(v^i) = \begin{cases} \dfrac{2}{\bar{D}} - \dfrac{2 v^i}{\bar{D}^2} & \text{if } 0 \le v^i \le \bar{D} \\ 0 & \text{otherwise} \end{cases} \tag{13}$$

A difference between discovering and reporting a story only arises for print editions, which have to wait until midnight to update their edition. For online editions, continuous updating implies that the time difference in first publishing a story, $\Delta P^o$, is equal to the time difference in the discovery of the story by each newspaper. Therefore for the newspaper websites we have

$$E(\Delta P^o) = \int_{0}^{1} \int_{0}^{\bar{D}} v^o f(v^o)\,dv^o\,dT = \frac{\bar{D}}{3} \tag{14}$$

Meanwhile, given the midnight deadline for print editions, we have a zero difference if both news outlets discover the story before the same midnight deadline (scenario A) and a one day difference if they discover it on different days (scenario B).
$$E(\Delta P^p) = E(\Delta P^p \mid A)\,P(A) + E(\Delta P^p \mid B)\,P(B) = 0 \cdot P(A) + 1 \cdot P(B) = P(B) \tag{15}$$

Therefore, the expected time difference in publishing a news story for print editions is the probability that the news story is discovered on different days. It is possible to check that

$$P(B) = \int_{1-\bar{D}}^{1} \int_{1-T}^{\bar{D}} f(v^p)\,dv^p\,dT = \frac{\bar{D}}{3} \tag{16}$$

Then we have

$$E(\Delta P^p) = \frac{\bar{D}}{3} \tag{17}$$

Once again the evidence contradicts the model prediction. Table 2 shows that the observed average $\Delta P^p = 0.31$ is much larger than the observed average $\Delta P^o = 0.10$. This piece of evidence suggests that $\bar{D}^p > \bar{D}^o$, which would imply that online editions have a larger reporter staff.

So, the relatively high fraction of scoops released by print editions suggests that print platforms have more reporters on their staff than online platforms. However, the small time difference between online editions in reporting stories suggests exactly the opposite. To reconcile these seemingly contradictory empirical findings, we now incorporate into the model the possibility that newspapers also rely on rewriters to obtain stories.

Hiring rewriters gives newspapers the possibility of copying stories from the rival. Each rewriter sees the story in the rival outlet as soon as it is published and takes $C \sim U[0, \bar{C}]$ days to rewrite it. Within the model framework, hiring rewriters changes neither the fraction of scoops released by print editions nor the time difference in their publication of stories. $\bar{D}^p < 1$ implies that hiring rewriters will not allow a print newspaper to publish the story any sooner (because of daily updates). The same is of course not true for online editions. It is then possible to estimate $\bar{D}^p$ from the observed average $\Delta P^p = 0.31$. Using equation (17) we compute $\bar{D}^p = 3 \times 0.31 = 0.93$, which satisfies the restriction $\bar{D}^p < 1$.

By definition, the existence of rewriters does not affect the fraction of scoops released by online and print editions. These proportions depend only on the presence of reporters.
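Propositions 1 and 2 can also be checked by simulation. The following is a minimal sketch, not the paper's procedure: it assumes events originate uniformly over the day, two newspapers per platform, a common bound for every newsroom, and reporters as the only source of stories. The function name `simulate` and the value `d_bar = 0.9` are illustrative assumptions. Under these assumptions the simulated print scoop share should be close to $2\bar{D}/15$ and both expected publication gaps close to $\bar{D}/3$.

```python
import math
import random


def simulate(d_bar, n_stories=200_000, seed=0):
    """Monte Carlo sketch of the reporters-only model with equal technology.

    d_bar is the common upper bound of the uniform discovery time (in days).
    Returns (print scoop share, mean online gap, mean print gap).
    """
    rng = random.Random(seed)
    print_scoops = 0
    gap_online = 0.0
    gap_print = 0.0
    for _ in range(n_stories):
        t = rng.random()  # time of day at which the event takes place
        # Online newsrooms publish as soon as they discover the story.
        online = [t + rng.uniform(0.0, d_bar) for _ in range(2)]
        # Print newsrooms discover the story and publish at the next midnight.
        printed = [math.ceil(t + rng.uniform(0.0, d_bar)) for _ in range(2)]
        if min(printed) < min(online):
            print_scoops += 1
        gap_online += abs(online[0] - online[1])
        gap_print += abs(printed[0] - printed[1])
    return (print_scoops / n_stories,
            gap_online / n_stories,
            gap_print / n_stories)


if __name__ == "__main__":
    d_bar = 0.9  # illustrative value, not an estimate from the paper
    share, g_online, g_print = simulate(d_bar)
    print(f"print scoop share: {share:.3f} (2*D/15 = {2 * d_bar / 15:.3f})")
    print(f"online gap:        {g_online:.3f} (D/3 = {d_bar / 3:.3f})")
    print(f"print gap:         {g_print:.3f} (D/3 = {d_bar / 3:.3f})")
```

The print gap is an average of zeros and ones, so it is noisier than the online gap for the same number of simulated stories.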
We now allow for $\bar{D}^p \neq \bar{D}^o$ and recalculate the proportion of stories that would be broken by print editions in this new scenario. It is possible to check that if $\bar{D}^o > 1$ then the fraction of scoops for print editions is given by

$$S^p = \int_{0}^{1-\bar{D}^p} \int_{1-T}^{\bar{D}^o} f(w^o)\,dw^o\,dT + \int_{1-\bar{D}^p}^{1} \int_{0}^{1-T} f(w^p) \int_{1-T}^{2-T} f(w^o)\,dw^o\,dw^p\,dT + \int_{2-\bar{D}^o}^{1} \int_{2-T}^{\bar{D}^o} f(w^o)\,dw^o\,dT \tag{18}$$

The first term in equation (18) accounts for stories that will for sure be discovered by the print edition before midnight, but that might not be discovered by the online edition before midnight. The second term accounts for stories that might be discovered before midnight by the print edition but not by the online edition. The third term accounts for stories that might not be discovered by the online edition even the day after they originated. All three scenarios constitute instances where the print edition gets the scoop.

Setting expression (18) equal to the observed print scoop fraction $S^p = 0.33$ and solving the equation numerically gives a value $\bar{D}^o = 1.80$ when we use $\bar{D}^p = 0.93$. This value implies that print editions are 48 percent faster in genuinely discovering stories than their online counterparts. Nevertheless, continuous updates allow online editions to have 67 percent of the scoops.

We will now use these findings to incorporate rewriters into the model.

Proposition 3 The existence of rewriters explains the lower time difference in publishing stories observed for online editions.

Proof. For online editions the time difference in first reporting a story is now given by the minimum of the time taken by the reporting staff and the time taken by the rewriting staff to get the story. Writing $f^o$ for the density of $\Delta D^o$,

$$E(\Delta P^o) = E(\min\{\Delta D^o, C\}) = \int_{0}^{\bar{C}} \int_{0}^{C} \frac{v}{\bar{C}}\, f^o(v)\,dv\,dC + \int_{0}^{\bar{C}} \int_{0}^{v} \frac{C}{\bar{C}}\, f^o(v)\,dC\,dv + \int_{0}^{\bar{C}} \frac{C}{\bar{C}}\,dC \int_{\bar{C}}^{\bar{D}^o} f^o(v)\,dv \tag{19}$$

The value $\bar{C} = 0.22$ makes equation (19) equal to the observed $\Delta P^o = 0.10$.
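The calibration of the rewriting bound can be reproduced numerically. The sketch below is an illustration under stated assumptions rather than the paper's code: instead of evaluating the three integrals of equation (19) it uses the equivalent identity $E(\min\{X, Y\}) = \int_0^\infty P(X > x)\,P(Y > x)\,dx$ for independent non-negative variables, with $P(\Delta D^o > x) = (1 - x/\bar{D}^o)^2$ and $P(C > x) = 1 - x/\bar{C}$. The function names, the midpoint rule and the bisection solver are choices made for this example.

```python
def expected_online_gap(c_bar, d_bar_o, steps=5_000):
    """E(min{|D1 - D2|, C}) with D1, D2 ~ U[0, d_bar_o] and C ~ U[0, c_bar].

    Integrates the product of the two survival functions with a midpoint rule.
    """
    upper = min(c_bar, d_bar_o)  # both survival functions vanish beyond this
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        surv_reporters = (1.0 - x / d_bar_o) ** 2  # P(|D1 - D2| > x)
        surv_rewriters = 1.0 - x / c_bar           # P(C > x)
        total += surv_reporters * surv_rewriters
    return total * h


def solve_c_bar(target_gap, d_bar_o, tol=1e-6):
    """Bisection on c_bar so that the expected online gap matches target_gap."""
    low, high = 1e-9, d_bar_o
    while high - low > tol:
        mid = 0.5 * (low + high)
        if expected_online_gap(mid, d_bar_o) < target_gap:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


if __name__ == "__main__":
    d_bar_o = 1.80  # estimate reported in the text
    print(f"gap at c_bar = 0.22: {expected_online_gap(0.22, d_bar_o):.3f}")
    print(f"c_bar matching a gap of 0.10: {solve_c_bar(0.10, d_bar_o):.3f}")
```

Bisection is used because the expected gap increases monotonically in the rewriting bound, so a single root exists on the search interval.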
The estimated numbers predict that 94 percent of the stories published but not broken by an online edition are copied from the rival by rewriters and only 6 percent of them are genuinely discovered by reporters. Of course, reporters remain a valuable resource for online editions in terms of providing scoops.

We now have all the components required to reconstruct the typical timeline of a news story in the online and print platforms. The estimated parameter values are $\bar{D}^p = 0.93$, $\bar{D}^o = 1.80$ and $\bar{C} = 0.22$. With these numbers it is possible to check that online editions are faster than print editions in delivering news. Indeed, the last online edition to publish the story is faster than the first print edition to do so.

Proposition 4 Online editions are faster at delivering news than print editions.

Proof. The time taken by the first online edition to discover and publish the story is given by

$$E(\min\{D_1^o, D_2^o\}) = \frac{\bar{D}^o}{3} = 0.60 \tag{20}$$

And the extra time taken by the second (last) online edition to publish the story is

$$E(\Delta P^o) = 0.10 \tag{21}$$

Meanwhile, for print editions the time taken by the first newspaper to discover the story is

$$E(\min\{D_1^p, D_2^p\}) = \frac{\bar{D}^p}{3} = 0.31 \tag{22}$$

This number differs, due to daily updates, from the time taken to publish the story,

$$E(\min\{P_1^p, P_2^p\}) = \int_{0}^{1-\bar{D}^p} (1-T)\,dT + \int_{1-\bar{D}^p}^{1} (1-T) \int_{0}^{1-T} f(w^p)\,dw^p\,dT + \int_{1-\bar{D}^p}^{1} (2-T) \int_{1-T}^{\bar{D}^p} f(w^p)\,dw^p\,dT = 0.81 \tag{23}$$

And the extra time taken by the second (last) print edition to publish the story is

$$E(\Delta P^p) = 0.31 \tag{24}$$

The results are displayed in Figure 3. Online editions are slower at discovering stories. It takes only 0.31 days from the moment the news event takes place for the first print edition to discover it. The figure goes up to 0.60 days for online editions. Within the model framework this disparity is interpreted as a lower number of reporters on the staff of online editions. Nevertheless, daily updates increase by 0.50 days the time taken by print editions to publish stories.
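The timeline figures can be recomputed from the estimated parameters. The sketch below is only an illustrative check: it integrates equation (23) over the time of day with a midpoint rule and combines the result with equations (20) to (24). The function names and the dictionary keys are assumptions made for this example.

```python
def first_print_publication(d_bar_p, steps=10_000):
    """Expected days from the event to the first print publication, eq. (23)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h            # time of day at which the event occurs
        to_midnight = 1.0 - t
        if to_midnight >= d_bar_p:
            p_today = 1.0            # the story is surely found before midnight
        else:
            # P(min{D1, D2} < 1 - t) under the triangular density (11)
            p_today = 1.0 - (1.0 - to_midnight / d_bar_p) ** 2
        total += to_midnight * p_today + (2.0 - t) * (1.0 - p_today)
    return total * h


def timeline(d_bar_p=0.93, d_bar_o=1.80, online_gap=0.10):
    """Milestones in days since the event, using the estimates in the text."""
    online_first = d_bar_o / 3.0                  # eq. (20)
    print_first = first_print_publication(d_bar_p)
    return {
        "online_first": online_first,
        "online_last": online_first + online_gap,   # eq. (21)
        "print_discovery": d_bar_p / 3.0,           # eq. (22)
        "print_first": print_first,                 # eq. (23)
        "print_last": print_first + d_bar_p / 3.0,  # eq. (24)
    }


if __name__ == "__main__":
    for milestone, days in timeline().items():
        print(f"{milestone:16s} {days:.2f}")
```

With $\bar{D}^p < 1$ the integral in equation (23) reduces to $1/2 + \bar{D}^p/3$, which is where the half-day publication delay of print editions comes from.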
As a consequence of the midnight deadline, instant updates allow online editions to break news stories faster than their print counterparts. The first online edition to report the story takes 0.60 days, while the first print edition to do so takes 0.81 days. We also observe that the time difference in the reporting of stories is much smaller for online editions. The presence of rewriters coupled with instant updates allows online editions to reduce this difference to only 0.10 days, compared with 0.31 days for the print editions.

The results predict that in a world with only online editions the last reader to learn about a story would do so sooner than the first reader in a world with only print editions. Contrary to popular belief, the Internet has enhanced news coverage. Continuous updates and the existence of rewriters allow online readers to have faster access to news despite the fact that online editions have reduced the number of reporters on their staff.

Table 1 Fraction of stories by newspaper and platform that had the scoop (03-08)

           CL first   LN first   Both first   All stories
Print        0.07       0.07        0.19         0.33
Online       0.32       0.34        0.01         0.67

Table 2 Time difference in reporting news stories within each platform (in days)

           CL first   LN first   Both first   All stories
Print        1          1           0            0.31
            (0.00)     (0.00)      (0.00)       (0.46)
Online       0.11       0.09        0            0.10
            (0.13)     (0.11)      (0.00)       (0.12)

Standard errors in parentheses.

Table 3 Age of News and Readership

                     Online (07-09)           Print (98-09)
                     (1)         (2)          (1)         (2)
Age                 -0.841***   -0.761***    -0.853***   -0.667***
                    (0.183)     (0.251)      (0.117)     (0.112)
Day FE               YES         YES          YES         YES
Newspaper FE         YES         YES          YES         YES
Type FE              NO          YES          NO          YES
Adjusted R2          0.97        0.98         0.93        0.94
N of Observations    406         406          2,016       2,016
N of Day Effects     203         203          1,008       1,008

Standard errors in parentheses.

Table 4 Scoops and Readership

                     Online (07-09)
                     (1)         (2)
Scoop                0.193***    0.144***
                    (0.036)     (0.048)
Day FE               YES         YES
Newspaper FE         YES         YES
Type FE              NO          YES
Adjusted R2          0.97        0.98
N of Observations    406         406
N of Day Effects     203         203

Standard errors in parentheses.

Figure 1 - Distribution of Stories by First Time of Publication (Online Editions)

Figure 2a - Time Difference in the First Publication of News Stories (Online Editions)

Figure 2b - Time Difference in the First Publication of News Stories (Print Editions)

Figure 3 - Online and Print editions efficacy at delivering news. [Timelines in days, with the news event taking place at 0. Online editions: the first online edition discovers and publishes the story at 0.6; the last online edition publishes the story at 0.7. Print editions (daily updates): the first print edition discovers the story at 0.3 and publishes it at 0.8; the last print edition publishes the story at 1.1.]

A Appendix

In order to check the reliability of the news clustering process we rerun the algorithm for 1998 without imposing the restriction of comparing only articles within the same newspaper section. If articles from very dissimilar sections (e.g. Sports and any other section) are assigned to the same news story then chances are it is an error. Table A1 provides the results for those news stories with exactly two articles. We observe that 13 percent of the matches correspond to articles from different sections. Still, some of these articles could be correctly assigned to the same news story, since some sections, such as Argentina and General Information, have blurred boundaries. Nevertheless, note that the figure goes down to just 4 percent when we focus on the Sports section, for which links to articles appearing in other sections are harder to justify.

An additional exercise to check the reliability of the clustering process is to compute the most important news events according to our database. For this we need to define a criterion for news story importance. Each day dozens of news stories are published by the media, but the importance of each of them varies enormously.
We use the fraction of articles in an edition that refer to the news story as a measure of the relative importance of different news stories. Results are shown in Tables A2 and A3 for the print editions and in Tables A4 and A5 for the online editions. It is important to note that of the forty news clusters included in the tables only two cannot be related to a specific event, and therefore cannot be classified as news events.

The assassination of Benazir Bhutto, a former Prime Minister of Pakistan, tops Table A2, having occupied 8.8 percent of the articles in the La Nacion print edition on December 28, 2007. Elections in the US and the war in Iraq, as well as the Lehman Brothers bankruptcy, the inauguration of Benedict XVI and the liberation of Clara Rojas by the FARC, constitute the remaining top international stories. Additional stories that make it to the top 10 in the online editions include the death of Pinochet, violence in the Middle East and elections in Chile and Venezuela.

Table A3 provides equivalent figures for national news events, which might not be known to those not acquainted with Argentine news. The investiture of President Rodriguez Saá after the resignation of De la Rua constitutes the most important event. This story occupied 16.6 percent of the articles in the La Nacion print edition on December 24, 2001. The election of presidents and the change of Ministers of Economy take most of the remaining spots. In the online editions some sports stories also make it to the top 10.
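The importance measure described above can be sketched in a few lines. This is a hypothetical illustration, not the code used to build the tables: it assumes each article is recorded as an (edition, story) pair, where the edition identifies a newspaper, platform and date, and the story identifier comes from the clustering step. The sample data are invented.

```python
from collections import Counter


def story_importance(articles):
    """Share of each edition's articles that belong to each news story.

    articles: iterable of (edition, story_id) pairs, one pair per article.
    Returns a dict mapping (edition, story_id) to a fraction in (0, 1].
    """
    articles = list(articles)
    per_edition = Counter(edition for edition, _ in articles)
    per_story = Counter(articles)
    return {
        (edition, story): count / per_edition[edition]
        for (edition, story), count in per_story.items()
    }


if __name__ == "__main__":
    # Invented sample: one edition with five articles, three on the same story.
    sample = [
        ("LN print 2007-12-28", "story A"),
        ("LN print 2007-12-28", "story A"),
        ("LN print 2007-12-28", "story A"),
        ("LN print 2007-12-28", "story B"),
        ("LN print 2007-12-28", "story C"),
    ]
    ranked = sorted(story_importance(sample).items(),
                    key=lambda item: item[1], reverse=True)
    for (edition, story), share in ranked:
        print(f"{edition}  {story}  {share:.2f}")
```

Sorting the resulting shares in descending order across all editions would give a ranking of the kind reported in Tables A2 to A5.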
Table A1 Section of News Stories with exactly two Articles (1998)

                       Spor    Intl   Arg     Ent   Gral   Opi
Sports                 1,813   10     24      10    34     5
International                  426    52      5     64     35
Argentina                             1,114   8     238    87
Entertainment                                 306   59     5
General Information                                 823    44
Opinion                                                    35

Table A2 Top 10 International News Stories, Print Editions (98-08)

News Story                         Section   Paper   Date        Fraction
Benazir Bhutto Assassination       Intl      LN      28 Dec 07   0.08
Bush wins Reelection               Intl      LN      4 Nov 04    0.08
Lehman Brothers Bankruptcy         Intl      CL      16 Sep 08   0.08
Saddam Hussein is captured         Intl      LN      16 Dec 03   0.08
Baghdad is bombed                  Intl      LN      18 Dec 98   0.08
Benedict XVI takes office          Intl      LN      20 Apr 05   0.08
FARC frees Clara Rojas             Intl      LN      11 Jan 08   0.08
Saddam Hussein is captured         Intl      LN      15 Dec 03   0.07
Bush takes office                  Intl      LN      21 Jan 01   0.07
Bush wins Presidential Election    Intl      LN      15 Dec 00   0.07

Table A3 Top 10 National News Stories, Print Editions (98-08)

News Story                                Section   Paper   Date        Fraction
Rodriguez Saa chosen as President         Arg       LN      24 Dec 01   0.16
Cristina Kirchner Inauguration            Arg       LN      11 Dec 07   0.15
Yabran suicide                            Arg       LN      21 May 98   0.15
De la Rua - undefined story               Arg       CL      2 May 00    0.14
Presidential Elections in Argentina       Arg       CL      28 Apr 03   0.13
Cavallo takes office as Min. of Economy   Arg       LN      21 Mar 01   0.12
Lavagna resigns as Min. of Economy        Arg       LN      29 Nov 05   0.12
Kirchner to announce Cabinet              Arg       CL      16 May 03   0.12
Cobos breaks the tie in the Senate vote   Arg       CL      18 Jul 08   0.11
Cristina Kirchner Inauguration            Arg       CL      11 Dec 07   0.11

Table A4 Top 10 International News Stories, Online Editions (03-08)

News Story                         Section   Paper   Date        Fraction
Saddam Hussein is captured         Intl      LN      14 Dec 03   0.22
Dictator Pinochet dies in Chile    Intl      LN      10 Dec 06   0.19
Benedict XVI chosen as Pope        Intl      LN      19 Apr 05   0.19
Bush wins Presidential Election    Intl      CL      3 Nov 04    0.18
Benedict XVI chosen as Pope        Intl      CL      19 Apr 05   0.18
Israel attacks Gaza                Intl      LN      27 Dec 08   0.18
Plebiscite in Venezuela            Intl      LN      14 Aug 04   0.18
Violence in Middle East            Intl      LN      22 Mar 04   0.17
Bachelet wins Election in Chile    Intl      LN      15 Jan 06   0.15
Bush announces War in Iraq         Intl      CL      19 Mar 03   0.14

Table A5 Top 10 National News Stories, Online Editions (03-08)

News Story                                Section   Paper   Date        Fraction
Cobos breaks the tie in the Senate vote   Arg       LN      17 Jul 08   0.28
NGOs - undefined story                    Gral      LN      25 Sep 03   0.23
Lavagna resigns as Min. of Economy        Arg       LN      28 Nov 05   0.22
Argentina plays against Peru              Spor      LN      8 Jul 07    0.21
Nalbandian wins Masters in Shanghai       Spor      LN      20 Nov 05   0.21
Argentina plays against Colombia          Spor      LN      1 Jul 07    0.21
Gaudio wins Roland Garros                 Spor      LN      6 Jun 04    0.19
Cristina Kirchner wins Election           Arg       LN      28 Oct 07   0.19
Tragedy at Cromagnon Discotheque          Gral      CL      31 Dec 04   0.18
Lousteau resigns as Min. of Economy       Arg       LN      3 Nov 04    0.18

References

[1] Athey, Susan, Emilio Calvano and Joshua S. Gans (2011). "Will the Internet Destroy the News Media? or Can Online Advertising Markets Save the Media?", mimeo.

[2] Bergemann, Dirk and Alessandro Bonatti (2011). "Targeting: Implications for Offline vs. Online Media Competition", mimeo.

[3] Di Tella, Rafael and Ignacio Franceschelli (2011). "Government Advertising and Media Coverage of Corruption Scandals", American Economic Journal: Applied Economics, forthcoming.

[4] Federal Trade Commission (2010). "Staff Discussion Draft: Potential Policy Recommendations to Support the Reinvention of Journalism", February 3 2010, http://www.ftc.gov/opp/workshops/news/jun15/docs/newstaff-discussion.pdf

[5] Gentzkow, Matthew (2007). "Valuing New Goods in a Model with Complementarity: Online Newspapers", American Economic Review, 97(3).

[6] Gentzkow, Matthew and Jesse Shapiro (2006). "Media Bias and Reputation", Journal of Political Economy, 114(2).

[7] Gentzkow, Matthew and Jesse Shapiro (2008). "Competition and Truth in the Market for News", Journal of Economic Perspectives, 22(2).