How Does Digitization Affect Scholarship?

Mark McCabe, University of Michigan
Roger Schonfeld, Ithaka
Christopher Snyder, Dartmouth College
December 11, 2007

What Characteristics Are Important to Authors?

Journal Characteristics Important to an Author
When it comes to influencing decisions about journals in which to publish an article of yours, how important to you is each of the following possible characteristics of an academic journal?
a) The journal makes its articles freely available on the Internet, so there is no cost to purchase or to read.
b) The journal permits scholars to publish articles for free, without paying page or article charges.
c) Measures have been taken to ensure the protection and safeguarding of the journal’s content for the long term.
d) The current issues of the journal are circulated widely, and are well read by scholars in your field.
e) The journal is highly selective; only a small percentage of submitted articles are published.
f) The journal is available to readers not only in developed nations, but also in developing nations.

Preferences for Academic Journals, 2006
Percent of faculty who believe that each characteristic is “very important” in influencing the decisions where to publish their articles
[Bar chart, horizontal axis 0% to 100%. Categories: Wide circulation and reading; No cost to publish; Preservation is assured; Highly selective; Accessible in developing world; Available for free.]

Background on the Present Study

Objectives
• What are the scholarly impacts of various business models for journal publishing?
• How do various business models for journal publishing affect the value derived by authors and readers?

Natural Experiment
• Beginning in 1995 publishers and content aggregators began digitizing current and archival content and placing it online.
• However, as late as 2005 (the endpoint of our analysis) backfiles for many journals (and current content in some cases) remained offline.
• We exploit this heterogeneous chronology to explore the impact of online access.

Previous Studies
• Many previous studies of this relationship find large effects.
• Common flaws: these efforts do not adequately control for potential selection problems affecting article quality, do not use adequate statistical methods, or both.
• For example, did the best journals, at least in some disciplines, gain an online presence earlier?
• This study avoids these problems: variation in journal quality for content published prior to 1995 is unlikely to be related to online strategies adopted by publishers after 1995.

Some Empirical Questions
• What is the impact of online access on journal citation rates?
• Are the benefits greater for newer or older content?
• Are the effects discipline-specific?
• Which online “channels” have the greatest impact?
• Is the geographic and institutional distribution of citing authors influenced by online access?

People, Funding, and Timeline
• Researchers
  • Mark McCabe, Professor of Economics, University of Michigan – Principal Investigator
  • Chris Snyder, Professor of Economics, Dartmouth – Co-Principal Investigator
  • Roger Schonfeld, Manager of Research, Ithaka
• Funded by a grant from The Andrew W. Mellon Foundation
• Data collection is complete, analysis is underway, and full findings are expected to become available by mid-2008

Our Data
• Three Disciplines
  • History
  • Economics and Business
  • Biological and General Sciences
• Hundreds of publishers, aggregators, and archives provided data
• 100 journals in each discipline, comparing journal-year by journal-year
  • 50 that were digitized early on
  • 50 that were digitized only more recently or not at all
• Examine citations TO these journals that appeared in ANY journal from 1980 to 2005
• Complete citation databases obtained from ISI

Descriptive Statistics

ECONOMICS
                                                      Obs     Mean   Std dev    Min     Max
Year journal first published                           99   1956.8      27.8   1844    1988
Publication year                                    3,449   1985.9      12.9   1956    2005
Citation year                                      58,429   1994.9       7.1   1980    2005
Citations to journal-publication-year in a year    58,429     37.0      60.0      0     771

SCIENCE
                                                      Obs     Mean   Std dev    Min     Max
Year journal first published                           98   1936.2      51.4   1665    1991
Publication year                                    3,895   1977.6      23.3   1900    2005
Citation year                                      71,734   1994.4       7.3   1980    2005
Citations to journal-publication-year in a year    71,734    302.0   1,802.1      0  32,589

Skewed Distribution of Citations in Economics
• About 4,700 zeros, one had 771 cites
[Histogram: frequency (0 to 5,000) against citations to journal-publication-year in a year (0 to 900).]

Skewed Distribution of Citations in Science
• About 5,500 zeros, one had 32,500 cites
[Histogram: frequency (0 to 6,000) against citations to journal-publication-year in a year (0 to 35,000).]

Online Availability for 1980 Content
                       Titles     Mean   St Dev    Min    Max
Economics (82 journals published in 1980)
  JSTOR                    39   2000.8      2.6   1996   2005
  ProQuest                 14   2003.0      1.6   2001   2005
  Ebsco                    34   2002.2      1.5   2001   2005
  Publisher Website        19   2001.9      0.2   2001   2002
Science (74 journals published in 1980)
  JSTOR                    21   1999.4      1.2   1996   2000
  Ebsco                     9   2003.9      1.3   2001   2005
  PubMed Central            2   2004.0      0.0   2004   2004
  Publisher Website        22   2003.8      1.3   1999   2005

Geographic Distribution of First Authors of Articles that Cite Other Articles
                                         Science Cites (000)       %   Econ Cites (000)       %
English-Speaking Countries*                            9,187   59.19              1,308   77.53
Non-English-Speaking Western Europe**                  3,622   23.34                251   14.89
Rest of the World                                      2,711   17.47                128    7.57
Total Cites                                           15,521                      1,687
* US, England, Canada, Australia, Scotland, New Zealand, Wales, Ireland, Northern Ireland
** Germany, Netherlands, France, Spain, Italy, Sweden, Belgium, Norway, Switzerland, Denmark, Finland, Austria, Greece, Portugal, Czech Republic, Slovakia

Challenges
• ISI data requires extensive clean-up and quality control
• Many publishers and aggregators maintain poor records of their journals’ online histories
• First authors are confusing and require more consideration

Findings

Regression Outputs

. xtreg lncit1 age* cyr* d2* js2* ow2*, i(articlegroup) fe robust;

Fixed-effects (within) regression               Number of obs      =     54665
Group variable: articlegroup                    Number of groups   =        99

R-sq:  within  = 0.4435                         Obs per group: min =        52
       between = 0.0890                                        avg =     552.2
       overall = 0.2605                                        max =       975

                                                F(102,54464)       =    376.66
corr(u_i, Xb)  = -0.0774                        Prob > F           =    0.0000

                          (Std. Err. adjusted for clustering on articlegroup)
------------------------------------------------------------------------------
             |               Robust
      lncit1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        age1 |   .5925995   .0194082    30.53   0.000     .5545593    .6306398
        age2 |   .9779868    .019672    49.71   0.000     .9394294    1.016544
        age3 |   1.132953   .0197409    57.39   0.000      1.09426    1.171645
        age4 |   1.146659    .019613    58.46   0.000     1.108217      1.1851
        age5 |   1.148625   .0197884    58.05   0.000      1.10984    1.187411
        age6 |   1.118217   .0195578    57.18   0.000     1.079883     1.15655
        age7 |    1.05887   .0196327    53.93   0.000      1.02039    1.097351
        age8 |   1.026378   .0195007    52.63   0.000     .9881561    1.064599
        age9 |   .9633523   .0196253    49.09   0.000     .9248864    1.001818
       age10 |   .8995837   .0200633    44.84   0.000     .8602596    .9389078
       age11 |   .8377198   .0198925    42.11   0.000     .7987304    .8767093
       age12 |   .7902542    .020094    39.33   0.000     .7508698    .8296386
       age13 |   .7135656    .020046    35.60   0.000     .6742754    .7528558
       age14 |   .6518025   .0204853    31.82   0.000     .6116512    .6919538
       age15 |   .5977419    .020616    28.99   0.000     .5573344    .6381494
       age16 |   .5455455   .0207872    26.24   0.000     .5048025    .5862885
       age17 |   .5060501    .020825    24.30   0.000      .465233    .5468672
       age18 |   .4332353   .0211407    20.49   0.000     .3917994    .4746713
       age19 |   .3762387   .0215208    17.48   0.000     .3340578    .4184197
       age20 |   .3139517   .0219721    14.29   0.000     .2708861    .3570172
       age21 |   .3044858    .022119    13.77   0.000     .2611325    .3478392
       age22 |   .2190796   .0225092     9.73   0.000     .1749615    .2631978
       age23 |   .1970334   .0232404     8.48   0.000     .1514821    .2425847
       age24 |   .1424866   .0237271     6.01   0.000     .0959813    .1889918
       age25 |   .1347377   .0243322     5.54   0.000     .0870464     .182429
       age26 |   .0516184   .0250276     2.06   0.039      .002564    .1006727
       age27 |   .0225138   .0250947     0.90   0.370     -.026672    .0716997
       age28 |  -.0259718   .0253744    -1.02   0.306    -.0757059    .0237622
       age29 |  -.0632298   .0264435    -2.39   0.017    -.1150593   -.0114004
       age30 |  -.1099393   .0276293    -3.98   0.000    -.1640929   -.0557856

Science Journal Citations Peak in Year Three
[Figure: citations relative to age 49 (0 to 9) against years since publication (0 to 45), with a 95% confidence interval.]
Notes: Results from negative binomial regression with age dummies, digital dummy aggregated across channels for any presence, restricted to 1956-2005 publication years

Economics Journal Citations Peak in Year Five
[Figure: citations relative to age 49 (0 to 9) against years since publication (0 to 45), with 95% confidence intervals for science and for economics.]
Notes: Results from negative binomial regression with age dummies, digital dummy aggregated across channels for any presence, restricted to 1956-2005 publication years

Preliminary General Findings
• Citation levels more than double in both disciplines over the sample period, 1980-2005.
• Digitization and online availability increase citations. The effect is highly significant, both for pre-1995 content (digitized backfiles) and for born-digital periods.

Disciplinary Differences
• Citation rates peak earlier in science (3 years) than in economics (5 years); the subsequent decline in citations is more rapid in science.
• Online access is associated with an average increase in citations of about 10% for economics and 20% for science titles.
• However, the change in citations observed over time is an order of magnitude larger than the measured impact of online access.
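A minimal sketch of how these magnitudes relate to coefficients on a log scale, assuming the usual reading of a log-outcome or negative binomial coefficient b as a proportional change of exp(b) - 1. This is an illustration only, not the authors' code: the function names are hypothetical, the conversion is only approximate if lncit1 is something like ln(1 + citations), and the age coefficients are the first eight rows copied from the fixed-effects output above.

```python
import math


def coef_to_pct(coef: float) -> float:
    """Percent change in citations implied by a log-scale coefficient."""
    return (math.exp(coef) - 1.0) * 100.0


def pct_to_coef(pct: float) -> float:
    """Log-scale coefficient that corresponds to a given percent change."""
    return math.log(1.0 + pct / 100.0)


# Age coefficients (age1..age8) copied from the regression output above.
age_coefs = {
    1: 0.5925995, 2: 0.9779868, 3: 1.132953, 4: 1.146659,
    5: 1.148625, 6: 1.118217, 7: 1.05887, 8: 1.026378,
}

# The age with the largest coefficient is where citations peak.
peak_age = max(age_coefs, key=age_coefs.get)

# Coefficients that would correspond to the effects quoted on the slides.
online_econ = pct_to_coef(10.0)   # about 10% for economics
online_sci = pct_to_coef(20.0)    # about 20% for science
doubling = pct_to_coef(100.0)     # citation levels doubling over 1980-2005

print("peak age:", peak_age)
print("log-scale effects:", round(online_econ, 4), round(online_sci, 4), round(doubling, 4))
```

In the rows shown, the largest age coefficient is at age5, which is consistent with the year-five peak reported for economics. A doubling of citation levels is a 100% change, ten times the roughly 10% online effect for economics, which is one way to read the "order of magnitude" comparison.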
For Science, Online Access Boosts Citations 20% Overall
[Figure: citations relative to age 49 (0 to 9) against years since publication (0 to 45); separate curves for online and offline content.]
Notes: Results from negative binomial regression with age dummies, digital dummy aggregated across channels for any presence, restricted to 1956-2005 publication years

For Economics, Online Access Boosts Citations 10% Overall
[Figure: citations relative to age 49 (0 to 9) against years since publication (0 to 45); separate curves for online and offline content.]
Notes: Results from negative binomial regression with age dummies, digital dummy aggregated across channels for any presence, restricted to 1956-2005 publication years

Channel Effects
• For Science: JSTOR and publisher portals are important, but other third-party channels are not (except for the period 1995-97).
• For Economics: all types of channels have a significant impact.
• Longer embargo periods clearly decrease the ability of a given channel to increase citations.

HIGHLY PRELIMINARY: Geographic Effects on Citation Growth over Time
• Rate of citation growth for biology is much higher (double) in non-English-speaking countries.
• Rate of citation growth for economics is moderately higher in non-English-speaking countries.
• Implication: Are these disciplines growing faster in non-English-speaking countries?
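The figure notes above refer to a "digital dummy aggregated across channels for any presence", and the Channel Effects slide notes that embargoes weaken a channel. A minimal sketch of how such an indicator could be built follows. It is not the authors' procedure: the function names, the example journal, its launch years, and the embargo rule (content becomes readable once the channel has launched and the embargo has elapsed) are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical channel record: (launch_year, embargo_years).
# None means the journal is not carried on that channel.
Channel = Optional[Tuple[int, int]]


def available_year(channel: Channel, pub_year: int) -> Optional[int]:
    """First year in which content published in pub_year is readable on the channel."""
    if channel is None:
        return None
    launch_year, embargo_years = channel
    # Assumption: both the channel launch and the embargo must have passed.
    return max(launch_year, pub_year + embargo_years)


def online_any(channels: Dict[str, Channel], pub_year: int, citation_year: int) -> int:
    """1 if the content is online through at least one channel by citation_year, else 0."""
    years = [available_year(ch, pub_year) for ch in channels.values()]
    return int(any(y is not None and y <= citation_year for y in years))


# Hypothetical journal: on JSTOR from 1999 with a 5-year embargo,
# on an aggregator from 2001 with no embargo, and no publisher website.
journal = {"JSTOR": (1999, 5), "Aggregator": (2001, 0), "Publisher Website": None}

# Dummy by citation year for 1980 content and for 1998 content.
old = {cy: online_any(journal, 1980, cy) for cy in range(1997, 2003)}
recent = {cy: online_any(journal, 1998, cy) for cy in range(1997, 2003)}

print(old)
print(recent)
```

Under these assumptions the 1980 content switches on in 1999 through JSTOR, while the 1998 content stays offline until the aggregator launches in 2001 because the JSTOR embargo has not yet elapsed. Per-channel dummies would follow the same pattern with one indicator per channel instead of `any`.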
Impact of Digitization for Science – Publisher Website
[Chart: impact (0% to 50%) by year, 1998 to 2005, for USA, Other English, Non-English Europe, and Other Non-English.]

Impact of Digitization for Science – JSTOR
[Chart: impact (0% to 50%) by year, 1998 to 2005, for USA, Other English, Non-English Europe, and Other Non-English.]

Impact of Digitization for Science – Aggregators
[Chart: impact (0% to 50%) by year, 1998 to 2005, for USA, Other English, Non-English Europe, and Other Non-English.]

Impact of Digitization for Economics – Publisher Website
[Chart: impact (0% to 50%) by year, 1998 to 2005, for USA, Other English, Non-English Europe, and Other Non-English.]

Impact of Digitization for Economics – JSTOR
[Chart: impact (0% to 50%) by year, 1998 to 2005, for USA, Other English, Non-English Europe, and Other Non-English.]

Impact of Digitization for Economics – Aggregators
[Chart: impact (0% to 50%) by year, 1996 to 2005, for USA, Other English, Non-English Europe, and Other Non-English.]

HIGHLY PRELIMINARY: Geographic Effects on Citation Patterns
• Science: The channel impact is about twice as large in the non-English-speaking countries (e.g. overall a 30% increase versus 15%).
• Economics: The channel impact is about twice as large outside of the developed English-speaking countries (~20% increase versus less than 10%).
• There is much we can learn from various models for the distribution of content and their relative strengths over time.

Further Questions and Discussion

Further Questions
• Does year of source-item publication matter?
  • Will references to older articles increase more than references to more recently published articles?
• Have self-citation patterns changed?
  • Presumably we will find no effect, an important confirmation of our data and analytical framework.
Findings and Discussion
• We find a consistent, significant impact from digitization. At the same time, it is an order of magnitude smaller than the changes observed over time. Is the impact “large” or “small,” and what are the implications, if any?
• The impact is greater in science than in economics. Why? What are the implications?
• The impact is greater outside of the English-speaking countries. Why? What are the implications?
• Channel effects are dramatic. What are the implications?

How Does Digitization Affect Scholarship?
Roger C. Schonfeld
[email protected]
(212) 500-2338
www.ithaka.org/research/citation-analysis