Economics

How Does Digitization Affect Scholarship?
Mark McCabe
University of Michigan
Roger Schonfeld
Ithaka
Christopher Snyder
Dartmouth College
December 11, 2007
What Characteristics Are Important to Authors?
Journal Characteristics Important to an Author
When it comes to influencing decisions about journals in
which to publish an article of yours, how important to you
is each of the following possible characteristics of an
academic journal?
a) The journal makes its articles freely available on the Internet, so
there is no cost to purchase or to read.
b) The journal permits scholars to publish articles for free, without
paying page or article charges.
c) Measures have been taken to ensure the protection and
safeguarding of the journal’s content for the long term.
d) The current issues of the journal are circulated widely, and are
well read by scholars in your field.
e) The journal is highly selective; only a small percentage of
submitted articles are published.
f) The journal is available to readers not only in developed
nations, but also in developing nations.
Preferences for Academic Journals, 2006
Percent of faculty who believe that each characteristic is “very important”
in influencing the decisions where to publish their articles
[Figure: bar chart. Categories: Wide circulation and reading; No cost to publish; Preservation is assured; Highly selective; Accessible in developing world; Available for free. Axis: 0% to 100%.]
Background on the Present Study
Objectives
• What are the scholarly impacts of various business models for
journal publishing?
• How do various business models for journal publishing affect the
value derived by authors and readers?
Natural Experiment
• Beginning in 1995 publishers and content aggregators began
digitizing current and archival content and placing it online.
• However, as late as 2005 (the endpoint of our analysis)
backfiles for many journals (and current content in some cases)
remained offline.
• We exploit this heterogeneous chronology to explore the impact
of online access.
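The identification idea above can be sketched as an indicator that switches on once content is online through at least one channel. This is only an illustrative sketch: the function name, the channel names, and the example years are hypothetical and are not taken from the study's data.

```python
from typing import Dict, Optional


def online_any_channel(citation_year: int,
                       channel_start: Dict[str, Optional[int]]) -> int:
    """Return 1 if the content was online through any channel in citation_year.

    channel_start maps a channel name to the first year the content was
    available there, or None if it never went online through that channel.
    """
    return int(any(start is not None and citation_year >= start
                   for start in channel_start.values()))


# Hypothetical journal backfile: on JSTOR from 1999, on the publisher's
# site from 2002, never on an aggregator.
channels = {"jstor": 1999, "publisher": 2002, "aggregator": None}

for year in (1995, 1999, 2003):
    print(year, online_any_channel(year, channels))
```

Because journals went online in different years, the indicator varies across journals within the same citation year, which is the variation the analysis relies on.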
Previous Studies
• Many previous studies of this relationship find large effects
• Common flaws: these efforts do not adequately control for
potential selection problems affecting article quality, do not use
adequate statistical methods, or both
• For example, did the best journals, at least in some disciplines,
gain an online presence earlier?
• This study avoids these problems: Variation in journal quality for
content published prior to 1995 is unlikely to be related to online
strategies adopted by publishers after 1995.
Some Empirical Questions
• What is the impact of online access on journal citation rates?
• Are the benefits greater for newer or older content?
• Are the effects discipline-specific?
• Which online “channels” have the greatest impact?
• Is the geographic and institutional distribution of citing authors
influenced by online access?
People, Funding, and Timeline
• Researchers
  • Mark McCabe, Professor of Economics, University of Michigan (Principal Investigator)
  • Chris Snyder, Professor of Economics, Dartmouth (Co-Principal Investigator)
  • Roger Schonfeld, Manager of Research, Ithaka
• Funded by a grant from The Andrew W. Mellon Foundation
• Data collection is complete and analysis is underway; full findings
are expected to become available by mid-2008
Our Data
• Three Disciplines
  • History
  • Economics and Business
  • Biological and General Sciences
• Hundreds of publishers, aggregators, and archives provided data
• 100 journals in each discipline, comparing journal-year by
journal-year
  • 50 that were digitized early on
  • 50 that were digitized only more recently or not at all
• Examine citations TO these journals that appeared in ANY
journal from 1980 to 2005
• Complete citation databases obtained from ISI
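One way to picture the unit of observation (citations to a journal-publication-year in a given citation year) is the sketch below. It assumes a publication year is observed only from its own year onward and only within the 1980 to 2005 citation window; the class and function names are hypothetical, not the study's.

```python
from dataclasses import dataclass
from typing import List

FIRST_CITATION_YEAR = 1980  # start of the citation window in the slides
LAST_CITATION_YEAR = 2005   # end of the citation window in the slides


@dataclass
class Observation:
    journal: str
    publication_year: int
    citation_year: int
    age: int  # years since publication


def panel_rows(journal: str, publication_year: int) -> List[Observation]:
    """List one observation per citation year for a journal-publication-year."""
    # Assumption: content cannot be cited before it is published.
    start = max(publication_year, FIRST_CITATION_YEAR)
    return [Observation(journal, publication_year, cy, cy - publication_year)
            for cy in range(start, LAST_CITATION_YEAR + 1)]


rows = panel_rows("Hypothetical Journal", 2000)
print([(r.citation_year, r.age) for r in rows])
```

The `age` field corresponds to "years since publication" on the citation-age charts.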
Descriptive Statistics

ECONOMICS
Variable                                            Obs      Mean   Std dev    Min      Max
Year journal first published                         99    1956.8      27.8   1844     1988
Publication year                                  3,449    1985.9      12.9   1956     2005
Citation year                                    58,429    1994.9       7.1   1980     2005
Citations to journal-publication-year in a year  58,429      37.0      60.0      0      771

SCIENCE
Variable                                            Obs      Mean   Std dev    Min      Max
Year journal first published                         98    1936.2      51.4   1665     1991
Publication year                                  3,895    1977.6      23.3   1900     2005
Citation year                                    71,734    1994.4       7.3   1980     2005
Citations to journal-publication-year in a year  71,734     302.0   1,802.1      0   32,589
Skewed Distribution of Citations in Economics
• About 4,700 zeros, one had 771 cites
[Figure: histogram. Horizontal axis: citations to journal-publication-year in a year (0 to 900). Vertical axis: frequency (0 to 5,000).]
Skewed Distribution of Citations in Science
• About 5,500 zeros, one had 32,500 cites
[Figure: histogram. Horizontal axis: citations to journal-publication-year in a year (0 to 35,000). Vertical axis: frequency (0 to 6,000).]
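A rough way to quantify this skew is the variance-to-mean ratio of the citation counts, computed here from the means and standard deviations reported in the descriptive statistics. This is a back-of-the-envelope sketch on pooled, unconditional figures, not a formal overdispersion test; the function name is hypothetical.

```python
def variance_to_mean_ratio(mean: float, std_dev: float) -> float:
    """Return variance divided by mean; a Poisson count would give about 1."""
    return std_dev ** 2 / mean


# Mean and standard deviation of "citations to journal-publication-year
# in a year", as reported in the descriptive statistics.
economics_ratio = variance_to_mean_ratio(37.0, 60.0)
science_ratio = variance_to_mean_ratio(302.0, 1802.1)

print(f"Economics: {economics_ratio:.1f}")
print(f"Science:   {science_ratio:.1f}")
```

Ratios far above 1 are consistent with the use of negative binomial rather than Poisson regression in the chart notes.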
Online Availability for 1980 Content

Channel               Titles      Mean   St Dev    Min    Max
Economics (82 journals published in 1980)
  JSTOR                   39    2000.8      2.6   1996   2005
  ProQuest                14    2003.0      1.6   2001   2005
  Ebsco                   34    2002.2      1.5   2001   2005
  Publisher Website       19    2001.9      0.2   2001   2002
Science (74 journals published in 1980)
  JSTOR                   21    1999.4      1.2   1996   2000
  Ebsco                    9    2003.9      1.3   2001   2005
  PubMed Central           2    2004.0      0.0   2004   2004
  Publisher Website       22    2003.8      1.3   1999   2005
Geographic Distribution of First Authors of Articles that Cite Other Articles

Region                                   Science Cites (000)       %   Econ Cites (000)       %
English-Speaking Countries*                            9,187   59.19              1,308   77.53
Non-English-Speaking Western Europe**                  3,622   23.34                251   14.89
Rest of the World                                      2,711   17.47                128    7.57
Total Cites                                           15,521                      1,687
* US, England, Canada, Australia, Scotland, New Zealand, Wales, Ireland, Northern Ireland
** Germany, Netherlands, France, Spain, Italy, Sweden, Belgium, Norway, Switzerland, Denmark, Finland,
Austria, Greece, Portugal, Czech Republic, Slovakia.
Challenges
• ISI data requires extensive clean-up and quality control
• Many publishers and aggregators maintain poor records of their
journals’ online histories
• First authors are confusing and require more consideration
Findings
Regression Outputs
. xtreg lncit1 age* cyr* d2* js2* ow2*, i(articlegroup) fe robust;

Fixed-effects (within) regression               Number of obs      =     54665
Group variable: articlegroup                    Number of groups   =        99

R-sq:  within  = 0.4435                         Obs per group: min =        52
       between = 0.0890                                        avg =     552.2
       overall = 0.2605                                        max =       975

                                                F(102,54464)       =    376.66
corr(u_i, Xb)  = -0.0774                        Prob > F           =    0.0000

                           (Std. Err. adjusted for clustering on articlegroup)
------------------------------------------------------------------------------
             |               Robust
      lncit1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        age1 |   .5925995   .0194082    30.53   0.000     .5545593    .6306398
        age2 |   .9779868    .019672    49.71   0.000     .9394294    1.016544
        age3 |   1.132953   .0197409    57.39   0.000      1.09426    1.171645
        age4 |   1.146659    .019613    58.46   0.000     1.108217      1.1851
        age5 |   1.148625   .0197884    58.05   0.000      1.10984    1.187411
        age6 |   1.118217   .0195578    57.18   0.000     1.079883     1.15655
        age7 |    1.05887   .0196327    53.93   0.000      1.02039    1.097351
        age8 |   1.026378   .0195007    52.63   0.000     .9881561    1.064599
        age9 |   .9633523   .0196253    49.09   0.000     .9248864    1.001818
       age10 |   .8995837   .0200633    44.84   0.000     .8602596    .9389078
       age11 |   .8377198   .0198925    42.11   0.000     .7987304    .8767093
       age12 |   .7902542    .020094    39.33   0.000     .7508698    .8296386
       age13 |   .7135656    .020046    35.60   0.000     .6742754    .7528558
       age14 |   .6518025   .0204853    31.82   0.000     .6116512    .6919538
       age15 |   .5977419    .020616    28.99   0.000     .5573344    .6381494
       age16 |   .5455455   .0207872    26.24   0.000     .5048025    .5862885
       age17 |   .5060501    .020825    24.30   0.000      .465233    .5468672
       age18 |   .4332353   .0211407    20.49   0.000     .3917994    .4746713
       age19 |   .3762387   .0215208    17.48   0.000     .3340578    .4184197
       age20 |   .3139517   .0219721    14.29   0.000     .2708861    .3570172
       age21 |   .3044858    .022119    13.77   0.000     .2611325    .3478392
       age22 |   .2190796   .0225092     9.73   0.000     .1749615    .2631978
       age23 |   .1970334   .0232404     8.48   0.000     .1514821    .2425847
       age24 |   .1424866   .0237271     6.01   0.000     .0959813    .1889918
       age25 |   .1347377   .0243322     5.54   0.000     .0870464     .182429
       age26 |   .0516184   .0250276     2.06   0.039      .002564    .1006727
       age27 |   .0225138   .0250947     0.90   0.370     -.026672    .0716997
       age28 |  -.0259718   .0253744    -1.02   0.306    -.0757059    .0237622
       age29 |  -.0632298   .0264435    -2.39   0.017    -.1150593   -.0114004
       age30 |  -.1099393   .0276293    -3.98   0.000    -.1640929   -.0557856
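The age coefficients above can be read as a citation-age profile. The sketch below finds the age with the largest coefficient and converts it to an approximate multiple of the omitted baseline category. It assumes that `lncit1` is a log transform of citation counts and that the omitted age category is the reference; neither is stated on the slide, and the variable and function names are hypothetical.

```python
import math

# Coefficients on age1..age30, copied from the regression output above.
AGE_COEFS = [
    0.5925995, 0.9779868, 1.132953, 1.146659, 1.148625, 1.118217,
    1.05887, 1.026378, 0.9633523, 0.8995837, 0.8377198, 0.7902542,
    0.7135656, 0.6518025, 0.5977419, 0.5455455, 0.5060501, 0.4332353,
    0.3762387, 0.3139517, 0.3044858, 0.2190796, 0.1970334, 0.1424866,
    0.1347377, 0.0516184, 0.0225138, -0.0259718, -0.0632298, -0.1099393,
]


def relative_multiple(age: int) -> float:
    """Approximate citations at this age as a multiple of the baseline age."""
    # Assumption: the dependent variable is in logs, so exp(coef) is a ratio.
    return math.exp(AGE_COEFS[age - 1])


# Age (in years since publication) with the largest estimated coefficient.
peak_age = max(range(1, len(AGE_COEFS) + 1), key=lambda a: AGE_COEFS[a - 1])

print("Peak age:", peak_age)
print(f"Multiple of baseline at peak: {relative_multiple(peak_age):.2f}")
```

With 99 groups this output appears to be the economics sample, and the peak at age 5 matches the slide stating that economics citations peak in year five.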
Science Journal Citations Peak in Year Three
[Figure: citations relative to age 49 (vertical axis, 0 to 9) against years since publication (horizontal axis, 0 to 45), with a 95% confidence interval.]
Notes: Results from negative binomial regression with age dummies, digital dummy aggregated across channels for any
presence, restricted to 1956-2005 publication years
Economics Journal Citations Peak in Year Five
[Figure: citations relative to age 49 (vertical axis, 0 to 9) against years since publication (horizontal axis, 0 to 45), with 95% confidence intervals for science and for economics.]
Notes: Results from negative binomial regression with age dummies, digital dummy aggregated across channels for any
presence, restricted to 1956-2005 publication years
Preliminary General Findings
• Citation levels more than double in both disciplines over the
sample period, 1980-2005.
• Digitization and online availability increase citations. The effect
is highly significant both for pre-1995 content (digitized backfiles)
and for born-digital periods.
Disciplinary Differences
• Citation rates peak earlier in science (3 years) than in economics
(5 years); the subsequent decline in citations is more rapid in
science.
• Online access is associated with an average increase in citations
of about 10% for economics and 20% for science titles.
• However, the changes in citations observed over time are an order
of magnitude larger than the measured impact of online access.
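To relate these percentages to coefficients from a log-link model such as the negative binomial regressions named in the chart notes, a percent change p corresponds to a coefficient of ln(1 + p/100). The sketch below is generic arithmetic under that assumption, not the study's estimates; the function names are hypothetical.

```python
import math


def pct_to_log_coef(pct: float) -> float:
    """Log-scale coefficient implied by a given percent change."""
    return math.log(1 + pct / 100)


def log_coef_to_pct(coef: float) -> float:
    """Percent change implied by a log-scale coefficient."""
    return (math.exp(coef) - 1) * 100


for label, pct in [("Economics, online access", 10),
                   ("Science, online access", 20),
                   ("Doubling over the sample period", 100)]:
    print(f"{label}: {pct}% -> coefficient {pct_to_log_coef(pct):.3f}")
```

On this scale, a doubling of citations (a lower bound for the growth reported for 1980-2005) is a much larger shift than a 10% or 20% online-access effect.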
For Science, Online Access Boosts Citations 20% Overall
[Figure: Online and Offline curves. Vertical axis: citations relative to age 49 (0 to 9). Horizontal axis: years since publication (0 to 45).]
Notes: Results from negative binomial regression with age dummies, digital dummy aggregated across channels for any
presence, restricted to 1956-2005 publication years
For Economics, Online Access Boosts Citations 10% Overall
[Figure: Online and Offline curves. Vertical axis: citations relative to age 49 (0 to 9). Horizontal axis: years since publication (0 to 45).]
Notes: Results from negative binomial regression with age dummies, digital dummy aggregated across channels for any
presence, restricted to 1956-2005 publication years
Channel Effects
• For Science: JSTOR and publisher portals are important, but not
other third-party channels (except for the period 1995-97).
• For Economics, all types of channels have a significant impact.
• Longer embargo periods clearly decrease the ability of a given
channel to increase citations.
HIGHLY PRELIMINARY:
Geographic Effects on Citation Growth over Time
• Rate of citation growth for biology is much higher (double) in
non-English-speaking countries.
• Rate of citation growth for economics is moderately higher in
non-English-speaking countries.
• Implication: Are these disciplines growing faster in
non-English-speaking countries?
Impact of Digitization for Science – Publisher Website
[Figure: series for USA, Other English, Non-English Europe, and Other Non-English. Vertical axis: 0% to 50%. Horizontal axis: 1998 to 2005.]
Impact of Digitization for Science – JSTOR
[Figure: series for USA, Other English, Non-English Europe, and Other Non-English. Vertical axis: 0% to 50%. Horizontal axis: 1998 to 2005.]
Impact of Digitization for Science – Aggregators
[Figure: series for USA, Other English, Non-English Europe, and Other Non-English. Vertical axis: 0% to 50%. Horizontal axis: 1998 to 2005.]
Impact of Digitization for Economics – Publisher Website
[Figure: series for USA, Other English, Non-English Europe, and Other Non-English. Vertical axis: 0% to 50%. Horizontal axis: 1998 to 2005.]
Impact of Digitization for Economics – JSTOR
[Figure: series for USA, Other English, Non-English Europe, and Other Non-English. Vertical axis: 0% to 50%. Horizontal axis: 1998 to 2005.]
Impact of Digitization for Economics – Aggregators
[Figure: series for USA, Other English, Non-English Europe, and Other Non-English. Vertical axis: 0% to 50%. Horizontal axis: 1996 to 2005.]
HIGHLY PRELIMINARY:
Geographic Effects on Citation Patterns
• Science: The channel impact is about twice as large in the
non-English-speaking countries (e.g., overall a 30% increase versus
15%).
• Economics: The channel impact is about twice as large outside of
the developed English-speaking countries (~20% increase
versus less than 10%).
• There is much we can learn from various models for the
distribution of content and their relative strengths over time.
Further Questions and Discussion
Further Questions
• Does year of source-item publication matter?
• Will references to older articles increase more than references to
more recently published articles?
• Have self-citation patterns changed?
• Presumably we will find no effect, an important confirmation of our
data and analytical framework.
Findings and Discussion
• We find a consistent, significant impact from digitization. At the
same time, it is an order of magnitude smaller than the changes
observed over time. Is the impact “large” or “small,” and what are
the implications, if any?
• The impact is greater in science than in economics. Why? What
are the implications?
• The impact is greater outside of the English-speaking countries.
Why? What are the implications?
• Channel effects are dramatic. What are the implications?
How Does Digitization Affect Scholarship?
Roger C. Schonfeld
(212) 500 – 2338
www.ithaka.org/research/citation-analysis