Referencing patterns of individual researchers:
do top scientists rely on more extensive
information sources?
Rodrigo Costas, Thed N. van Leeuwen, and María Bordons
CWTS Working Paper Series
Paper number
CWTS-WP-2012-001
Publication date
January 25, 2012
Number of pages
36
Email address corresponding author
[email protected]
Address CWTS
Centre for Science and Technology Studies (CWTS)
Leiden University
P.O. Box 905
2300 AX Leiden
The Netherlands
www.cwts.leidenuniv.nl
Referencing patterns of individual researchers: do top scientists rely on
more extensive information sources?
This is a preprint of an article accepted for publication in Journal of the American
Society for Information Science and Technology copyright © 2012
Rodrigo Costas (1); Thed N. van Leeuwen (1); María Bordons (2)
(1) Leiden University, Centre for Science and Technology Studies (CWTS),
Wassenaarseweg 62A, 2333AL Leiden (the Netherlands)
(2) Instituto de Estudios Documentales sobre Ciencia y Tecnología (IEDCYT), Center of
Human and Social Sciences (CCHS), CSIC, Albasanz 26-28, 28037 Madrid (Spain)
Abstract
This study presents an analysis of the use of bibliographic references by individual scientists in three different research areas. The number and type of references that scientists include in their papers are analyzed; the relationship between the number of references and different impact-based indicators is studied from a multivariable perspective; and the referencing patterns of scientists are related to individual factors such as their age and their scientific performance. Our results show inter-area differences in the number, type and age of references. Within each area, the number of references per document increases with journal impact factor and paper length. Top-performance scientists include a higher number of references in their papers, and these references are more recent and more frequently covered by the Web of Science. Veteran researchers tend to rely more on older literature and non-Web of Science sources. The longer reference lists of top scientists can be explained by their tendency to publish in high impact factor journals, which have stricter reference and reviewing requirements. Long reference lists suggest a broader knowledge of the current literature in a field, which is important for becoming a top scientist. From the perspective of the "handicap principle" theory, the sustained use of a high number of references across an author's oeuvre is a costly behavior that may indicate a serious, comprehensive and solid research capacity, but one that only the best researchers can afford. Boosting a paper's citations by artificially inflating its number of references does not seem a feasible strategy.
Introduction
Bibliometric indicators describe properties of the scientific communication
process through the use of mathematical and statistical analyses. They are
frequently used to support research assessment at the macro, meso and micro
levels (see for example Braun et al, 1995; Vinkler, 2006; Costas & Bordons,
2005), but also to obtain a better understanding of the behavior and dynamics
followed by researchers when communicating new knowledge (Budd &
Magnuson, 2010).
This study focuses on the analysis of the referencing practices of scientists,
which may provide interesting information about the communication in their field
as well as about scientists themselves. Different aspects of the referencing
process have been studied in the literature, such as how researchers cite other
papers, the median age of their references and the different typologies of cited
literature according to the different fields (Clements & Wang, 2003; Amat &
Yegros-Yegros, 2009; Larivière et al, 2006); scientists’ ways of searching and
using bibliographic material (Shanmugam, 2009; Budd & Magnuson, 2010);
attitudes and reasons for citing (Oppenheim & Smith, 2001; Clarke & Oppenheim, 2006); and limitations and misuses of referencing practices (Roth & Cole, 2010; Kidd, 1990). However, there is still an important lack of knowledge
about the conceptual relationship between citations and references (Wouters,
1999) as well as about the referencing behavior of researchers at the individual
level (Nicolaisen, 2007).
Within the bibliometric scientific community there is also an important debate
about the reasons for the frequently observed correlation between the number of
references and the number of citations (Alimohammadi & Sajjadi, 2009). The discussion about this relationship has a long history in the bibliometric literature. In 1965,
Price claimed that “we know little about any relationship between the number of
times a paper is cited and the number of bibliographic references it contains”
(Price, 1965). In 1983, Stewart reported that articles in Geology and Plate
Tectonics were “more likely to be cited if they have more references or more
recent references” (Stewart, 1983). Since then several papers have addressed
the topic offering different theories and answers. For example, Steele & Stier
(2000) suggested that the higher level of references can be related to more
interdisciplinary approaches; Uzun (2006) related it to higher degrees of
authorship; and Abt (2000) and Abt & Garfield (2002) associated it to the length
of the articles among other aspects. According to Moed & Garfield (2004), the
reference conventions in a discipline, individual styles, the amount of information
contained in the papers, the paper’s length or the limits imposed by journals
editors may influence the frequency with which researchers cite other literature.
Also a possible “network effect” or “reciprocal altruism” according to which by
citing others you get cited by them has been suggested –“I cite you, you cite me”
(Webster et al, 2009)2. More recently it has been even argued that the impact of
papers could be “boosted” just by including more references in the bibliographic
list of publications (Corbyn, 2010).
From our perspective, another plausible hypothesis to explain this phenomenon is that a large reference list could be a characteristic of "top" researchers: they may have a broader knowledge of the literature in their disciplines and, as a result, document their papers more thoroughly, pass the stricter peer review processes of the best journals, and therefore gain more visibility and receive more citations.
[2] According to the hypothesis suggested by Webster et al (2009), if an author cites one of your papers "you might be more likely to cite [the papers of this author] in the future, provided it is on a related topic. Thus, the more references an author includes, the greater the likelihood that more authors will in turn cite his or her work."
As suggested by Moed (2005), a reference list in a paper marks the "socio-cognitive location" of that paper. Small (1978) also suggested that cited documents can be seen as "concept symbols" of the ideas contained in the cited works. Taking these ideas from a more general perspective, it can be assumed that the body of references used by individual authors in their "oeuvres" signals their "socio-cognitive location" or "socio-cognitive environment", as well as the set of "concepts" that they are using to develop their own research. In other words, the references used by scientists indicate what their conceptual framework is, what their influences are and what knowledge they are managing about their respective fields of work. From this point of view, longer reference lists in the oeuvres of researchers might suggest a broader knowledge of the field and a firm grounding in the preexisting literature.
In this context, the present paper addresses the study of referencing patterns at the individual level. A recent paper by Frandsen & Nicolaisen (2012) broke the first ground in this line of research, focusing on the effects of the experience and prestige of researchers on their citing behavior in the field of econometrics, and provided some initial results and hypotheses. Their paper concludes with a call for further empirical research and theoretical analyses on the topic, since only two journals in a single specialty were studied. In this paper we adhere to this research line by extensively analyzing different aspects related to the use of information by individual researchers in three different research areas and with a different methodology. Thus, this study represents a step forward in the analysis of the referencing patterns of scientists at the individual level, assuming the perspective that referencing (citing) is a human behavior that is best analyzed from the point of view of individuals, and with the aim of gaining new insights into the topic.
Objectives
The main objective of this paper is to analyze the use of references in the oeuvres of individual researchers, focusing on the number and type of references that they include in their papers, on how these vary across areas, and on whether they could be related to individual factors such as age and research performance.
The main research questions addressed can be summarized as follows:
- Are there inter-area differences in the use of scientific literature (cited references) by scientists?
- Does the use of references vary according to individual factors such as age and research performance? In other words, do "top" researchers use more references in their publications as compared to other scientists? What about more veteran researchers?
- Is there any relationship between the number of references that a scientist uses in his/her papers and other bibliometric indicators (scientific production, impact, collaboration)? If so, what are the most influential factors?
The answers to these questions will provide important insights into the
referencing behavior of researchers, useful for policy makers and research
managers, but also for library policies, editors of scientific journals and scientists
themselves.
Methodology
This study is based on the bibliometric analysis of the scientific publications of 1,064 researchers holding a permanent position ("civil servants") at the Spanish National Research Council (CSIC) in 2004. These researchers are organized within the institution into three main scientific areas: Biology & Biomedicine (388 scientists), Natural Resources (348) and Materials Science (327). The classification of the researchers into these three main scientific areas corresponds to the disciplinary organizational scheme of the CSIC, in which eight different scientific areas [3] are distinguished with a certain degree of homogeneity in their research profiles and scientists' behaviors.
For each researcher, the scientific production published in journals covered by the Web of Science (WoS) during the period 1994-2004 was downloaded and correctly assigned to its author (several methodologies for the proper matching of authors and documents were considered; see Costas & Bordons, 2006). Documents published from Spanish centers, as well as those published from abroad during research stays of the scientists in foreign countries, were considered to build the bibliometric profile of each person.
Indicators based on research performance
The bibliometric profile of every scientist comprises the number of publications, citation-related indicators (citations per publication, number of citations, % highly cited papers, h-index), journal impact factor-based indicators (median impact factor and normalized journal position) and relative measures of impact. A detailed description of these indicators is provided in Appendix I.
To assess whether there is a relationship between research performance and
use of references, scientists were classified following a classificatory
methodology (described in Costas et al, 2010) for the analysis and research
evaluation of individual scientists. Based on this methodology, scientists are
grouped into three classes (top, medium and low) according to their "balanced" performance across three bibliometric dimensions (Production, Observed Impact and Journal Quality) [4]. Top researchers are those with a high performance in at least two of the three dimensions; medium-class researchers present an intermediate performance in two of the three dimensions; and low-class researchers have a low performance in at least two of the three dimensions (cf. Costas et al, 2010).

[3] Agriculture; Biology & Biomedicine; Chemistry; Food Science & Technology; Materials Science; Natural Resources; Physics; and Social Sciences & Humanities.
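To make the grouping rule concrete, a minimal illustrative sketch of the classification logic is given below. It is not the actual procedure of Costas et al (2010), in which the dimensions are derived by factor analysis; the per-dimension ratings used here are assumed to be given.

# Illustrative sketch of the three-class grouping logic (simplified reading of
# Costas et al, 2010). The per-dimension ratings are assumed inputs.

def performance_class(ratings):
    """ratings: dict mapping dimension name -> 'high' | 'intermediate' | 'low'."""
    levels = list(ratings.values())
    if levels.count("high") >= 2:        # high performance in at least two dimensions
        return "top"
    if levels.count("low") >= 2:         # low performance in at least two dimensions
        return "low"
    return "medium"                      # intermediate performance otherwise

print(performance_class({"Production": "high",
                         "Observed Impact": "high",
                         "Journal Quality": "intermediate"}))   # -> "top"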
Indicators based on the cited references
For each scientist, a set of indicators based on the number of cited references included in their documents was obtained. Indicators based on cited references were calculated with a window of 11 years. This window is set considering the year of publication of the source papers and goes 11 years backwards in the cited references [5]. Thus, for papers published in 1994 only cited references from 1984-1994 are considered, from 1985-1995 for papers published in 1995, from 1986-1996 for papers published in 1996, and so on. With this reference window, possible biases due to differences in the age of researchers and/or in the years of publication of documents are minimized. In any case, it is also important to mention that almost all researchers under analysis (91%) already had publications in the years 1994-1995, so we can assume a quite homogeneous population in terms of publication age during the whole period of analysis (1994-2004).
- References per document: mean number of references included in the source documents of each researcher. It is calculated as the total number of references divided by the total number of publications.
- References per article: mean number of references per WoS document of the type "article".
- External references per document: mean number of references to documents that do not belong to any of the co-authors of the source documents (for this indicator only references to 1994-2004 WoS documents were considered, since only in these cases could they be identified without error).
- Total distinct references: total number of unique references cited by every researcher.
- Distinct references per document: for each scientist, the total number of unique references divided by the number of publications.
- Average publication year of the cited references: the mean value of the publication year of the cited references.
- Percentage of references to non-WoS literature: the percentage of references to documents not included as source documents in WoS (i.e. books, non-WoS journals, reports, theses, etc.).
[4] In the mentioned methodology, the nine variables described in Appendix I are grouped in three dimensions by means of factor analysis. The Production dimension includes number of publications, number of citations and the h-index. The Observed Impact dimension comprises CPP, %HCP and CPP/FCSm. Finally, the Journal Quality dimension groups IFmed, NJP and JCSm/FCSm.
[5] The percentage of cited references within the last eleven years per area is as follows: 80% in Biology & Biomedicine, 66% in Materials Science and 59% in Natural Resources.
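As a purely illustrative complement to the definitions above, the following sketch shows how the 11-year window and some of the per-researcher rates could be computed from a reference-level table; the column names and the toy data are hypothetical and do not come from the study.

import pandas as pd

# Hypothetical reference-level table: one row per (source document, cited reference).
refs = pd.DataFrame({
    "researcher": ["A", "A", "A", "A", "B", "B"],
    "doc_id":     [1, 1, 2, 2, 3, 3],
    "pub_year":   [1996, 1996, 2000, 2000, 1995, 1995],
    "ref_year":   [1990, 1984, 1999, 1970, 1994, 1980],
    "in_wos":     [True, True, True, False, True, False],
})

# 11-year backward window: keep references published in the 11 calendar years
# up to and including the citing paper's publication year (e.g. 1984-1994 for 1994).
win = refs[(refs["ref_year"] <= refs["pub_year"]) &
           (refs["ref_year"] >= refs["pub_year"] - 10)]

docs = refs.groupby("researcher")["doc_id"].nunique()

# References per document (within the window).
refs_per_doc = win.groupby("researcher").size() / docs

# Average publication year of the cited references (within the window).
avg_ref_year = win.groupby("researcher")["ref_year"].mean()

# Percentage of references to non-WoS literature.
pct_non_wos = 100 * (1 - refs.groupby("researcher")["in_wos"].mean())

print(refs_per_doc, avg_ref_year, pct_non_wos, sep="\n\n")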
Indicators based on paper length and coauthorship
In previous studies, the number of references per publication has also been linked to the degree of authorship (Uzun, 2006) and to the length of the articles (Abt, 2000; Abt & Garfield, 2002). In this study, the mean number of authors per document at the individual level (Authors/document) and the mean number of pages per document of individuals (Pages/document) have also been included in some analyses. This last indicator (Pages/document) has been considered only as a "proxy" for paper length, as the raw number of pages per document can vary depending on the page format of every journal [6].
Review papers indicator
One additional indicator is the total number of review papers [7] per researcher, assuming that this document type tends to reflect the state of the art in a particular field and that the presence of review papers in the profile of a researcher can be an indication of his/her expertise and esteem among his/her peers (Lewison, 2009), as well as evidence that the author has achieved a noteworthy level of recognition from her/his scientific papers (Ketcham & Crawford, 2007) as a knowledgeable scientist (Weed, 1997). In a way, authors of review papers can also be considered "trend setters" with respect to future research (Sagar et al, 2009), as they not only provide a comprehensive literature perspective but also establish some (new) order among the facts.
Scientists by age
Finally, researchers were also classified into three age groups (considering their age in 2004):
- Young: researchers between 32 and 43 years old.
- Senior: researchers between 44 and 56 years old.
- Veteran: researchers between 57 and 69 years old.
Age-group limits are determined by the percentile values of the distribution of scientists by age (P25 = 44 years old and P75 = 56 years old). We would like to remark that the "young", "senior" and "veteran" labels should be understood in relative terms. In this sense, one could argue that a 40-year-old scientist is not very young, but from the point of view of this study he/she is young, as belonging to the youngest cohort in the population under study. The purpose of categorizing age into a three-class category was to highlight differences along three distinct stages in the life of scientists.

[6] The number of words per paper could be a more accurate measure of paper length (as used, for example, in Frandsen & Nicolaisen, 2011). Unfortunately, since we did not have access to the full text of all the publications, we relied on the number of pages as a "proxy" measure of paper length.

[7] In the total set of publications, 3% of the publications are review papers. This percentage varies per area as follows: Biology & Biomedicine 6%, Materials Science 1%, and Natural Resources 2%.
Results
The researchers of the three areas account for a total of 24,982 publications:
9,660 in Materials Science, 9,318 in Biology & Biomedicine and 6,102 in Natural
Resources; receiving 80,546, 189,699 and 56,940 total citations respectively (in
this case, including self-citations). For additional results about this set of
scientists we refer to Costas et al (2009, 2010).
1. Inter-area differences in reference based indicators
As shown in Figure 1, there are clear inter-area differences in the rate of
references per document, percentage of references to non-WoS literature and
average year of references.
Figure 1. Inter-area differences in the reference based indicators
Biology & Biomedicine is the area where researchers use on average more
references per document as compared to the other two, followed by Natural
Resources and Materials Science. Statistically significant differences have been
found among the researchers of the three areas (Mann-Whitney U test, p<0.05). The references used in Biology & Biomedicine tend to be more recent and are more frequently covered by WoS than in the rest of the areas. At the other end of the spectrum is Natural Resources, where the highest percentage of non-WoS literature is observed and the references used are on average older than in the other two areas (differences statistically significant, p<0.05, in all cases).
These data show that the literature used by scientists is area-dependent and therefore the first element that determines the referencing behavior of a researcher comes from the area where she/he is working. Accordingly, the emphasis in the following sections is placed on differences among classes of scientists within each area, rather than on inter-area comparisons.
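For illustration only, a pairwise inter-area comparison of the kind reported above could be run as in the sketch below; the references-per-document values are invented and scipy is assumed to be available.

from scipy.stats import mannwhitneyu

# Invented references-per-document values for researchers in two areas.
biology_biomedicine = [38.2, 41.5, 35.0, 44.1, 39.7, 36.8, 42.3]
materials_science   = [18.4, 21.0, 17.2, 22.5, 19.9, 16.7, 20.8]

# Two-sided Mann-Whitney U test of the inter-area difference.
stat, p_value = mannwhitneyu(biology_biomedicine, materials_science,
                             alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.4f}")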
2. Do “top” researchers use more references in their publications as
compared to the other scientific performance classes?
As seen in the previous part of the analysis, the number of references per
document of individual researchers is clearly area-dependent. Within each area,
we hypothesize that top scientists may have a better knowledge of their research
topics than the rest of scientists; therefore we expect to find a higher number of
references in their papers. To examine this issue, Figure 2 presents the distributions of the rates of references per publication considering different elements of the scientific output of researchers.
Figure 2. Reference based indicators by scientific performance class and area
As expected, we observe in Figure 2 (top left graph) that top researchers present overall more references per document than the other two scientific performance classes in the three areas; these differences are statistically significant in all cases (Mann-Whitney U, p<0.000) [8].
The same analysis was performed including only the document type “article”
(graph on the top right of Figure 2) in order to avoid possible bias due to specific
document types such as “reviews” that will be studied later. Again, top
researchers present the highest number of references per article (p<0.000), thus showing that top researchers consistently use more references in their regular articles than medium- and low-class scientists.
In order to control for possible influences produced by the collaboration of
researchers (e.g. researchers with a high degree of collaboration might cite more
references in their papers as some of them are included by their co-authors), an
analysis based only on single-authored articles was performed (middle left graph in Figure 2) (a similar approach was also followed by Frandsen & Nicolaisen, 2012). In this analysis of single-authored articles it can fairly be assumed that the researchers only use the references and literature that they themselves know. We can see that top researchers tend to present more references
per article in two areas, although the differences are statistically significant only
in Natural Resources (p<0.05). In addition, low class researchers present the
lowest levels of cited references per article in all cases. The limitation of this approach is that single-authored articles are quite scarce in current scientific publishing (Ma & Guan, 2005; van Leeuwen, 2009), especially in the experimental sciences. As a consequence, the number of researchers involved in the analysis is lower (only 20% of the total in this study) and more than half of them present only one single-authored article, which means that in many cases we are relying on a single paper to characterize the referencing behavior of an author.
To avoid the potential influence of “self-citing” practices, the number of external
references used by the authors is shown (middle right graph in Figure 2). Again it
can be clearly seen how top researchers present the highest level of external
references in their publications (p<0.05).
Finally, the total number of unique references used by each researcher was
obtained and normalized by the total number of publications per person (bottom
graph in Figure 2). The distribution of distinct references per paper shows again
that top researchers present the highest rate (p<0.000 in all the cases), thus
indicating that top researchers use a broader range of literature in their oeuvres
as compared to the other classes of scientists.
[8] All the analyses included in Figure 2 were also performed considering the total number of cited references (i.e. without any reference window or document type restriction) and basically the same results as with the 11-year window were obtained (data not shown).
3. Does the number of references per document vary according to the age
of scientists?
We hypothesize that the age of scientists may also be an influencing factor on the number of references, since the longer experience and professional career of veteran scientists might initially be expected to result in a wider knowledge of their research field.
Figure 3. Number of references per document by age and scientific area
[Figure 3 plots the square root of references per document (11-year window) against researcher age (37-65) for Natural Resources, Biology & Biomedicine and Materials Science, with fitted linear trends: y = -0.0362x + 6.2983 (R² = 0.4842), y = -0.0272x + 3.9825 (R² = 0.3944) and y = -0.0185x + 4.7771 (R² = 0.1378); all three trends decrease with age.]
Contrary to the initial expectations, younger researchers tend to present more references per document as compared to their older counterparts (Figure 3) (left-hand side figure, p<0.05 in Natural Resources and Biology & Biomedicine). The decreasing trend in the average number of references per document as scientists get older is also observed in the right-hand graph in Figure 3, where age is maintained as an independent quantitative variable.
Following the scheme presented in Figure 2, the analysis of reference-based
indicators by age class and area was performed (data not shown). In general
terms, younger researchers presented more references per article, more external
references and more unique references per document than veteran researchers
(p <0.05 in nearly all cases). The only exception to this pattern was observed in
the distribution of references of single-authored publications, where no significant
differences by groups of age were observed.
4. Does the use of WoS-covered material vary by age or scientific
performance class of researchers?
The percentage of references to non-WoS literature (i.e. references to books,
non-WoS journals, PhD theses, scientific reports, etc.) is analyzed in relation to
the age and scientific performance class of researchers in Figure 4. In this figure a clear pattern emerges: top and younger researchers present the lowest percentages of references to non-WoS literature, while the opposite pattern is found for low-class and older researchers in the three areas analyzed (p<0.05).
Figure 4. Distribution of the percentage of references to non-WoS literature
5. Does the age of the references vary by age or scientific performance
class of researchers?
Assuming that top researchers probably work at the forefront of science, we would expect to find that they base their research on very recent literature in their fields. To explore this issue, the average ordinal age (within the 11-year reference window) of the cited references of the papers of every researcher has been computed, and the distribution of this variable for the different areas is presented in Figure 5.
Figure 5. Average age of references by age and scientific performance class of
researchers
Focusing on scientific performance classes and age groups (top graphs in Figure 5), we can observe how low-class (left graph) and older researchers (right graph) use older literature as compared to top-class and younger researchers (statistically significant differences have been found in almost all cases, p<0.05). This is also supported by the graph at the bottom, where a slight positive correlation between the age of researchers and the age of their references is detected.
6. Do researchers who write reviews also include more references in their
regular articles?
Writing review papers is usually considered a sign of the prestige and esteem of scientists, to the extent that authorship of review papers is positively assessed in the evaluation of researchers (Lewison, 2009). Assuming that scientists who write review papers have a comprehensive knowledge of their research field, sometimes even beyond the regions of their own expertise (Ketcham & Crawford, 2007), we wonder whether these researchers also have a higher number of references per paper in their regular articles (Figure 6).
Figure 6. Number of references per article for researchers with and without review papers
The latter hypothesis is confirmed in Figure 6. Researchers who write review papers also tend to present a broader use of references in their normal articles (p<0.05); this outcome supports the hypothesis on the usefulness of the number of references as a measure of the knowledge of an author in his/her field(s).
Moreover, our data support the importance of review papers in the academic
profile of scientists, since top class researchers present proportionally more
review papers than researchers in the other two scientific performance classes in
the three areas analyzed. In general, it can be stated that at least 50% of top
researchers have published one or more review papers, while lower reviewing
activity is found in the other classes (Table 1).
Table 1. Crosstab analysis of researchers with and without reviews by scientific performance class

                                 Top           Medium         Low            Total
Natural Resources
  Authors of reviews             39 (59.1%)    38 (19.9%)     15 (16.3%)     92 (26.4%)
  Authors without reviews        27 (40.9%)    153 (80.1%)    77 (83.7%)     257 (73.6%)
  Total                          66 (100%)     191 (100%)     92 (100%)      349 (100%)
Biology & Biomedicine
  Authors of reviews             53 (75.7%)    145 (62.8%)    29 (33.3%)     227 (58.5%)
  Authors without reviews        17 (24.3%)    86 (37.2%)     58 (66.7%)     161 (41.5%)
  Total                          70 (100%)     231 (100%)     87 (100%)      388 (100%)
Materials Science
  Authors of reviews             35 (50.0%)    38 (21.8%)     10 (12.0%)     83 (25.4%)
  Authors without reviews        35 (50.0%)    136 (78.2%)    73 (88.0%)     244 (74.6%)
  Total                          70 (100%)     174 (100%)     83 (100%)      327 (100%)

Note: percentages in columns. Pearson Chi-square p<0.000 in the three areas.
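The chi-square tests reported in the note of Table 1 can be illustrated directly from the tabulated counts; the sketch below uses the Natural Resources block as an example and assumes scipy is available.

from scipy.stats import chi2_contingency

# Natural Resources counts from Table 1: rows = authors with / without reviews,
# columns = top, medium and low performance classes.
observed = [[39, 38, 15],
            [27, 153, 77]]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.5f}")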
On the other hand, a somewhat surprising result is observed: the researchers who publish review papers tend to be relatively younger than those without reviews (Figure 7) (statistically significant differences in Natural Resources and Biology & Biomedicine, p<0.05). Therefore, it seems that it is not necessary to be a veteran scientist to gain the experience and recognition needed to write reviews in a field.
Figure 7. Distribution of the age of researchers with and without reviews
7. Does the number of references per document correlate with other
bibliometric indicators at the individual level?
Pearson correlations between the number of references per document and other bibliometric indicators at the individual level are presented in Appendix II for the three research areas separately. In this correlation matrix we can see that there is a moderate positive correlation (Pearson's r generally higher than 0.400) between the rate of references per document and almost all the other bibliometric indicators. The correlation coefficient is below 0.400 only for the number of publications (P), pages/document and authors/document.
In particular, a positive correlation between number of references per document
and a) observed impact indicators (C, CPP, CPP/FCSm, etc.); b) journal impact
indicators (Median Impact Factor, NJP and JCSm/FCSm); and c) indicators of
“citation density” of journals (JCSm) and fields (FCSm) is observed.
- 'Predictors' of the rate of references per document. Model 1
In order to analyze in more depth the relationship between bibliometric indicators and the rate of references per document of scientists, a linear regression analysis has been performed to obtain a model that could "predict" the rate of references/document (the "dependent variable") from the other bibliometric indicators (the "independent variables"). Two different models are shown in Table 2. In model 1, only the indicators related to the impact of journals (i.e. Median Impact Factor, NJP and JCSm/FCSm), the indicators related to the "citation density" of the publication journals of researchers (JCSm) and their fields (FCSm), and the number of pages/document and authors/document have been included [9]. Square root values were used for all the variables included in the linear regression analysis. The final solution for this model was accepted when the Durbin-Watson statistic was between 1.5 and 2.5. Redundant variables were omitted to avoid multicollinearity. To reject the presence of multicollinearity we examined the values of tolerance, the variance inflation factor (VIF) and the condition index [10], and only those variables not highly correlated were left in the model.
The best predictors of the rate of references per document of individual researchers are the median of the Impact Factor, together with the pages per document rate and the number of publications (in Natural Resources and Biology & Biomedicine). The citation density of the fields of the researchers (FCSm) also has some influence on the rate of references per document at the individual level. The relatively high adjusted R squares (0.41, 0.68 and 0.53) suggest that the model is reasonably good in the three areas for the "prediction" of the references per document of individual researchers (Table 2).

[9] The indicators of observed impact (C, CPP, CPP/FCSm) were not included in the model, as we considered that the number of references in a document may contribute to explain its subsequent citation rate, while the inverse relationship makes no sense. In a way, the number of citations is not a determining factor of the level of references per document, but a consequence of it.

[10] Durbin-Watson statistics between 1.5 and 2.5 generally indicate that autocorrelation is low enough to draw adequate conclusions from the regression analysis (see for example Milne, 1969). It is usually accepted that tolerance less than 0.10 and VIF greater than 10 suggest multicollinearity. Moderate to strong collinear relations are associated with condition indexes of 30 to 100 (see for example Belsley et al, 1980).
According to model 1, it is possible to calculate the "predicted" (or expected) values of references per document for every researcher and to calculate the difference between the observed and the expected rate of references per document ("Dif. O-E"). Based on this difference, it is possible to study which researchers have more references than expected according to the "predictions" derived from model 1.
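A minimal sketch of how a model of this kind and the Dif. O-E values could be computed with standard tools is shown below. This is not the authors' actual code: the data are synthetic, the column names are hypothetical stand-ins for the per-researcher indicators, and statsmodels is assumed to be available.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic per-researcher table standing in for the model-1 indicators.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "IF_median":     rng.uniform(0.5, 8.0, n),
    "pages_per_doc": rng.uniform(4.0, 15.0, n),
    "P":             rng.integers(5, 120, n),
    "FCSm":          rng.uniform(1.0, 12.0, n),
})
df["refs_per_doc"] = (3 + 4 * df["IF_median"] + 1.5 * df["pages_per_doc"]
                      + rng.normal(0, 5, n)).clip(lower=1)

# Model 1: square-root values for all variables, as described in the text.
X = np.sqrt(df[["IF_median", "pages_per_doc", "P", "FCSm"]])
X = sm.add_constant(X)
y = np.sqrt(df["refs_per_doc"])
model1 = sm.OLS(y, X).fit()

# Diagnostics mentioned in the text: Durbin-Watson (1.5-2.5) and VIF (< 10).
print("Durbin-Watson:", durbin_watson(model1.resid))
print("VIF:", [round(variance_inflation_factor(X.values, i), 2)
               for i in range(1, X.shape[1])])

# Dif. O-E: observed minus expected references per document (on the SQRT scale).
df["dif_oe"] = y - model1.fittedvalues
print(df["dif_oe"].describe())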
In the light of these observed-expected differences in references, the following
question can be raised: do top researchers still tend to include more references
than expected according to model 1? Figure 8 presents the Dif. O-E of scientists
by research performance class in each of the three areas under analysis. Here,
we can observe that top- and medium-class researchers tend to present more references per document than estimated by the models (statistically significant differences observed among the three classes in Materials Science, p<0.05, and also between the low class and the rest in the other two areas, p<0.05).
Figure 8. Distribution of the difference between the observed and the expected
reference rate (Dif. O-E) by scientific performance class
- 'Predictors' of the rate of references per document. Model 2
Model 2 is obtained by including new variables in the regression analysis: age of scientists, experience writing reviews and research performance class (top, medium, low). Concerning the research performance class, dummy variables are built to indicate whether the scientist is top ("top_dummy": 1=yes; 0=no) or medium ("medium_dummy": 1=yes; 0=no), with "low" defined as the reference category. The variable "review" indicates whether a given scientist has at least one review among his/her papers ("review"=yes). The number of reviews was not used because most of the scientists have no reviews at all. The presence of collinearity was ruled out by means of tolerance and VIF (variance inflation factor) tests.
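The construction of the model-2 variables can be illustrated with a small standalone sketch (hypothetical data; "low" is the reference category, so it gets no dummy of its own):

import pandas as pd

# Hypothetical per-researcher attributes used as model-2 predictors.
info = pd.DataFrame({
    "perf_class": ["top", "medium", "low", "medium"],
    "n_reviews":  [3, 1, 0, 0],
    "age":        [45, 52, 61, 39],
})

# Dummy coding of the performance class, with "low" as the reference category.
info["top_dummy"] = (info["perf_class"] == "top").astype(int)
info["medium_dummy"] = (info["perf_class"] == "medium").astype(int)

# "review" flag: at least one review paper (the raw count is not used,
# since most scientists have no reviews at all).
info["review"] = (info["n_reviews"] >= 1).astype(int)

print(info[["perf_class", "top_dummy", "medium_dummy", "review", "age"]])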
As observed in Table 2, research performance class and age are significant in the three areas. Medium- and top-class scientists tend to use a higher number of references than low-class scientists, and the greater standardized beta coefficient of top scientists indicates the greater weight of this variable on the final number of references for this class of scientists. As far as age is concerned, it is negatively correlated with the average number of references per document, which means that older scientists tend to use a lower number of references per document in the three areas.

The variable "review" is significant in Biology & Biomedicine and Materials Science, where the number of references per document tends to be higher for those scientists who have review experience. However, the variable is not significant in Natural Resources, where the effect of reviews is subsumed by the effect of other variables such as impact factor, research performance class and paper length.
Table 2. Regression analysis using number of references per document (SQRT) as dependent variable

                            Model 1                                      Model 2
                            Coef.        Std. error   Stand. coeff.      Coef.        Std. error   Stand. coeff.
Biology & Biomedicine
(Intercept)                 0.065        (0.425)                         2.210        (0.631)
IF median (SQRT)            0.803***     (0.091)      0.424              0.450***     (0.106)      0.238
Pages/doc (SQRT)            0.937***     (0.122)      0.306              0.807***     (0.113)      0.263
P (SQRT)                    0.083***     (0.021)      0.159              -0.054       (-0.227)     0.220
FCSm (SQRT)                 0.248***     (0.078)      0.153              0.174**      (0.073)      0.108
Age                                                                      -0.131*      (0.067)      -0.081
medium_dummy                                                             0.507***     (0.101)      0.269
top_dummy                                                                0.607***     (0.144)      0.256
Review                                                                   0.516***     (0.081)      0.274
Adjusted R-squared          0.413                                        0.519
F score                     67.86***                                     52.240***

Materials Science
(Intercept)                 -1.610***    (0.334)                         -0.374***    (0.413)
IF median (SQRT)            1.766***     (0.102)      0.710              1.413***     (0.111)      0.568
Pages/doc (SQRT)            0.810***     (0.100)      0.274              0.808***     (0.095)      0.273
P (SQRT)                    0.028***     (0.011)      0.088              -0.010       (0.011)      -0.033
FCSm (SQRT)                 0.357***     (0.088)      0.163              0.314***     (0.085)      0.143
Age                                                                      -0.115**     (0.038)      -0.094
medium_dummy                                                             0.295***     (0.066)      0.196
top_dummy                                                                0.493***     (0.095)      0.276
Review                                                                   0.222***     (0.055)      0.137
Adjusted R-squared          0.677                                        0.727
F score                     162.71***                                    103.56***

Natural Resources
(Intercept)                 -0.945**     (0.336)                         0.806        (0.602)
IF median (SQRT)            2.044***     (0.171)      0.521              1.179***     (0.202)      0.301
Pages/doc (SQRT)            0.563***     (0.071)      0.310              0.541***     (0.068)      0.298
P (SQRT)                    0.115***     (0.021)      0.220              0.024        (0.024)      0.046
FCSm (SQRT)                 0.318***     (0.099)      0.138              0.320**      (0.094)      0.139
Age                                                                      -0.131*      (0.063)      -0.082
medium_dummy                                                             0.681***     (0.104)      0.369
top_dummy                                                                0.965***     (0.151)      0.426
Review                                                                   0.075        (0.084)      0.038
Adjusted R-squared          0.527                                        0.591
F score                     91.51***                                     59.644***

***p<0.001; **p<0.01; *p<0.05
Data expressed as coefficient (standard error)
Some interesting findings emerge from the comparison of models 1 and 2. First, the addition of new variables in model 2 slightly improves the explanatory power of the model (adjusted R-squared). Secondly, we can observe that the impact factor, which is the most influential factor in model 1 (according to the values of the standardized coefficients, which allow comparison among variables expressed in different units), is less relevant in model 2. Moreover, the number of publications is significant in model 1, but it loses significance in model 2 in the three areas. The underlying reason for these changes is the weight of the new variables introduced in model 2. Top scientists usually have a high number of publications and/or tend to publish in high impact factor journals, to the extent that P is no longer needed in model 2. The fact that the impact factor remains in the second model suggests that even within the class of top scientists there are differences in this value that can be associated with differences in the references per document rate.
Discussion and conclusions
In this paper, different aspects related to the use of information by individual researchers have been analyzed, assuming that bibliographic references are key elements in the communication of scientific research and new ideas.
In the first place, we would like to make some comments regarding the
methodology followed in this study. As mentioned before, Frandsen & Nicolaisen (2012) first explored this line of research, focusing on the effects of the experience and prestige of researchers on their citing behavior in the field of econometrics. It is interesting to note that they analyzed the referencing behavior of authors through the study of their single-authored publications, in order to avoid the influence of co-authors on the referencing pattern of the studied authors. The limitation of this approach is that only those scientists who have single-authored publications can be studied, a fact that can be very restrictive in the hard sciences, where very few documents are single-authored. In this paper, we have adopted a novel approach by focusing on the total scientific journal publications of researchers during an 11-year period to analyze their referencing practices, since we consider the use of references as a characteristic of the behavior of authors in the production of their "oeuvres", rather than as a property of individual papers.
A limitation of our approach is that we cannot completely discard the potential influence of co-authors on the referencing practices of a given researcher, but we focus on the "average behavior" of scientists, which is drawn from the study of their individual bibliometric profiles, instead of relying only on their scarce single-authored publications [11]. In any case, further research on the influence of co-authors on the referencing practices of researchers would be needed in the future.

[11] In our study only 20% of scientists had single-authored publications and half of them presented only one, which clearly reduces the usefulness of single-authored-publication-based approaches.
On the other hand, while the main bibliometric indicators used by Frandsen & Nicolaisen (2012) include the total number of publications and citations (both size-dependent; see for example Franceschet, 2009; Waltman & van Eck, 2009; Costas et al, 2010), here a broader set of indicators has been used, including also some citation density indicators (citation density of fields and journals) as well as some indicators about the authors (age and performance level of the researchers).

It is also important to bear in mind that both approaches present limitations stemming from the populations studied: a sample of papers from two econometrics journals in the Frandsen & Nicolaisen (2012) paper, and the oeuvres of individual researchers in three different research areas in the present paper. Although the three areas analyzed here present quite consistent and similar patterns, some organizational or country influences could still play a role; therefore, the results presented here would benefit from further research on other populations of researchers.
Inter-area differences in cited references
From this study it can be concluded that individuals use references mainly as a
function of their journals and fields, something that has been previously
suggested in the literature (see for example, Moed 2005).
Our study shows that the distribution of the number of references per document of researchers varies per area, with Biology & Biomedicine scholars presenting the longest reference lists, followed by Natural Resources and finally by Materials Science researchers. The shorter reference lists in Materials Science are consistent with the claims of Kidd (1990), who observed that engineering fields have less comprehensive bibliographies in their works.

The three areas under study present differences in the use of non-WoS literature as well as in the age of the cited material. Inter-area differences have been previously
described in the literature. In particular, in the study of Butler & Visser (2006) on
Australian universities, the lowest use of WoS sources was observed in
Humanities and Social Sciences (less than 10% in some disciplines such as
Architecture or Law and close to 30% in the case of Economics) while the best
WoS coverage corresponded to Biology, Physics and Chemistry (80-90%). Main
inter-field differences were due to the different weight of non-journal sources (i.e.
books, book chapters, conference papers) in the dissemination of research.
In our study, Biology & Biomedicine researchers are the ones who rely more
heavily on WoS-covered material and also present the strongest focus on more
recent literature, which can be linked to the fact that this is a very internationally
oriented area. This is in line with the observation of Tenopir et al (2009) that
medical/health scholars tend to read more and more recent publications than
scholars from other fields.
At the other end of the spectrum are researchers in Natural Resources, who tend to cite more non-WoS publications as well as older literature than researchers in the other two areas, which is consistent with the results of previous studies in natural resources-related fields (Velho & Krige, 1984; Rey Rocha et al, 1999; Garg et al, 2006), and can be partly explained by the more local orientation of the area in some of its research topics (Costas & Bordons, 2005).
Relationship between cited references and other bibliometric indicators from an
individual level perspective
A positive correlation between the number of references per document and both
indicators of observed impact (CPP, CPP/FCSm, %HCP, C, etc.) and indicators
of journal impact (Median Impact Factor, NJP, JCSm/FCSm) is observed. The
relationship between references and citations is an issue of great current concern
(Alimohammadi & Sajjadi, 2009). A positive correlation between the number of
references per article and the number of times it was cited has been observed at
the document level (Uzun, 2006; Webster et al, 2009) and at the journal level
(Biglu, 2008). Dependence between reference frequency and impact factor was
described in different sciences by Abt (2000), who also observed a positive
relationship between the number of references and the normalized paper length.
This relation was steeper in high-impact factor journals, in which the same
increase in paper length produced a higher increase in number of references
(higher slope of the regression line). The “stricter” referencing requirements of
high impact factor journals can be the underlying reason (Bordons et al, 2002).
Our regression analysis (model 1) shows that the impact factor is the most
influential variable, followed by the number of pages per document. The influence
of the number of pages should be analyzed with caution because this variable
was not normalized according to the number of words/page, which can vary from
journal to journal. As a consequence, it only provides some indicative information in this study. In any case, it shows some "predictive" power on the number of references per document of researchers, in the sense that longer papers tend to include more references (Abt & Garfield, 2002). One potential explanation
for this relationship is that longer papers could have a more comprehensive
content and discuss more varied ideas; thus supporting the hypothesis that a
larger number of pages and references in the publications of an author can be an
indication of the greater amount of ideas, concept symbols or knowledge that
he/she is managing and discussing in his/her papers. In this line, it is interesting
to mention that both number of references and article length have been identified
as strong predictors of impact in the field of psychology (Haslam et al, 2008).
All in all, these findings are in agreement with the claim of Nicolaisen (2007) that
the “act of citing” is “embedded within the socio-cultural conventions of
collectives”. As seen in this study, the journals and the fields where an author is
working constitute important determinants of the number of references that he/she includes in his/her papers, which somehow establish the "sociocultural
convention” of the author as a member of a collective. However, the influence of
individual factors is also observed in model 2, where top performance and older
age have respectively a positive and negative effect on the number of references
per document.
Use of literature and age of researchers
From the individual point of view, younger researchers present a higher number
of references per document and tend to rely more on recent literature, while the
contrary holds for older researchers. A possible explanation for this pattern was provided by Frandsen & Nicolaisen (2012), who suggested that as authors gain experience and become more respected among their peers, they might feel less need to support all their claims with bibliographic references. In addition, the
idea of older scientists being more likely than younger ones to cite older
publications was initially mentioned by Zuckerman & Merton (1973) and Barnett
& Fink (2008) who suggested two potential explanations for this phenomenon:
first, an age bias in the receptivity of scholars to new ideas, with younger
scientists being more receptive than older ones to new ideas and publications;
and second, the possible accumulated knowledge of scientists who created their
base knowledge when they began their professional careers (i.e. when they were
younger). Another complementary explanation is that younger researchers are
more likely to access their readings from electronic sources (where normally the
newer publications are contained) than their older counterparts, who
proportionally read more printed sources (Tenopir et al, 2009). The increasingly
competitive environment of research may also contribute to explain the described
use of references by younger scientists, who are obliged to demonstrate an
important competence and excellence in internationally-oriented research topics
to obtain a permanent position at the CSIC (Costas et al, 2010). This relationship
between the use of more recent literature and the probability of being at the forefront of research has also been suggested by Gingras et al (2008).
It is important to highlight again that our conclusions are limited to the population
of scientists that we have studied. Although we distinguish between “young”,
“senior” and “veteran” scientists, these labels should be understood in relative
terms, since the existence of very young scientists (in absolute terms) is limited
because all the researchers considered in the study already have a permanent position, which means they have been able to demonstrate a relevant scientific track record and to compete for tenure. In this sense, exploring whether younger researchers (e.g. PhD students, recent postdoctoral researchers, etc.) show different patterns in their referencing behavior as compared to more established colleagues such as the ones studied here remains a pending matter for future study.
Use of literature and scientific performance class
An initial conclusion is that top researchers use in their papers a broader range of scientific literature as compared to other researchers, and also more than expected given their journals and fields. They cite more references per document, tend to cite more references in their single-authored publications, use more external references, and use a wider variety of unique references in their total set of publications. Moreover, top scientists use more recent literature and rely more heavily on WoS-covered material than the rest of the researchers.
A plausible explanation for this pattern is that top researchers have (or display) a
broader knowledge of the current literature existing in their respective fields. This
explanation can be supported by the fact that they tend to be also more
frequently authors of review papers and that the authors of review papers tend to
have more references in their regular publications. In this line, Ramesh Babu & Singh (1998) indicated that it is almost impossible to be a productive scientist without awareness of what others are doing in one's area of specialization, and that an acquaintance with recent trends of research in the context of the global situation is indispensable for raising one's own research output.
Besides, top researchers are also younger researchers (cfr. Costas et al, 2010)
and it has been confirmed in this study that younger scientists also tend to use
more references in their papers as compared to their older colleagues. These
results are in line with the findings of Tenopir & King (2000) and Tenopir et al
(2009) who observed that in general high achievers and younger researchers
read more articles than other scientists, thus suggesting that reading habits and
literature acquaintance are key elements in the success of scientific research.
These conclusions have important implications from the perspective of library and information access policies, which should provide tools and resources that facilitate access to the new knowledge published in researchers' fields, thus allowing them to keep up high standards of referencing in their scientific work and publications. Although electronic tools have notably improved the accessibility of scientific knowledge in the modern world, the call for the establishment of adequate information access services (Ramesh Babu & Singh, 1998) is still valid in order to allow researchers to be aware of the most important ongoing literature in their fields.
The authorship of review papers as a proxy for the knowledge of researchers in their fields
Authorship of reviews has been considered in the literature as an indicator of the high esteem in which a scientist is held or of his/her experience (Lewison, 2009; Frandsen & Nicolaisen, 2012), since reviews are frequently commissioned from experts who are supposed to have an especially broad and up-to-date knowledge of the literature in their fields. The fact that review authors present more references than the remaining authors in their non-review publications, and that top researchers are among those most prone to write review papers, suggests the relevance of the number of references per document in the oeuvres of scientists as an indication of a genuinely broader knowledge of the relevant literature in their disciplines.
An interesting finding in this study is that the authors of reviews are not
necessarily the most veteran researchers in their areas, something that was also observed by Gingras et al (2008), who suggested that the production of reviews increases until about age 50 and gradually decreases afterwards. This implies that
relatively younger scientists can also be experts in their fields and attain enough
esteem to become authors of reviews. As Squires (1989) stated about
biomedical review articles, “well-prepared descriptive and evaluative review
articles of important topics or questions are always welcome but involve research
and preparation efforts that many authors are unwilling to make”. In fact,
according to Ketcham & Crawford (2007), an editorial invitation to write a review is an important event for a younger or mid-career scientist, as reviews give the scientist an opportunity to provide his/her unique appraisal of knowledge in his/her area of expertise, while for more senior authors, reviews may or may not have
career value. In any case, the study of the determinants of review authorship is
an interesting topic which is beyond the objectives of the present paper and
deserves a specific and detailed analysis in the future.
Integrating data under the regression model
Our regression model shows that scientists who use a relatively high number of
references per document tend to be young, belong to the top or medium class as far as their research performance is concerned, publish relatively long documents in relatively high impact factor journals and work in fields of high citation density. In the areas of Biology & Biomedicine and Materials Science these scientists show some experience in writing reviews, while review experience is not significant in the case of Natural Resources. In summary, our results indicate that the number of references per
document is partly explained by the characteristics of the field (citation density),
characteristics of the paper (article length, review paper) and journal prestige
(impact factor). Moreover, some personal factors seem to have some
explanatory power, such as the age and the research performance of scientists.
Theoretical discussion
A general explanation for the positive correlation between cited references and
observed impact at individual level can be suggested: researchers who have a
comprehensive referencing behavior (and implicitly also an important knowledge
of the literature in their fields) get their papers published in the best journals - as they surpass the higher standards and more demanding peer review requirements of these journals (Bordons et al, 2002) - and as a result they gain a higher degree of visibility and receive more citations. From our point of view, a plausible hypothesis is that those scientists who present longer reference lists in their publications rely on more diverse sources of knowledge for their research and write more comprehensive and stronger studies. It can then be proposed that the correlation between cited references and observed impact at the individual level is mediated by the high impact of the journals that these researchers are targeting, which explains in turn the higher impact that they eventually achieve.
Nevertheless, following Corbyn’s “boosting” argument (Corbyn, 2010), it could
still be argued that some authors could simply include more references in their papers (in a somewhat manipulative or "perfunctory" way) in order to get them published in better journals and also obtain more citations. In this regard it must be taken into account that "stacking" masses of references is not sufficient to appear serious and strong (Latour, 1987). Authors need to "modalize" or "qualify" the references [12] to get them adequately attached to the argument of the citing paper (implying also that the citing authors need to "know" the cited papers to some degree). From this perspective, it seems quite unlikely that an author, just by "dropping" some more secondary or unrelated references, could get a not very relevant paper published in a high impact journal (and become highly cited afterwards). And this possibility appears even less likely if the oeuvres of researchers are considered (as done in this study), since the systematic manipulation of the referencing behavior in an oeuvre seems an unfeasible task.
The idea that top researchers genuinely present a higher rate of references in their oeuvres can be framed within the theory of the “handicap principle”, also known as the “theory of costly signaling” (Nicolaisen & Frandsen, 2007; Nicolaisen, 2007). This theory has recently been used in information science (e.g. Small, 201013), where its potential validity for understanding the referencing behavior of authors has been pointed out (Frandsen & Nicolaisen, 2012).
13 In the perception of Henry Small (2010), the “generosity” of citing may also entail differentiating one’s work from the work of others, in order to establish one’s own niche by showing that what one presents is original and unique in comparison with others who hold similar ideas.
This principle was described by Zahavi (2003) in the context of sociobiological studies and is useful for explaining the evolution of all communication systems. From this perspective, different aspects of social behavior can be explained, such as the relevance of social prestige, which is understood as the respect awarded to an individual who has demonstrated his/her strength and abilities. According to Zahavi, “if an individual is of high quality and its quality is not known, the individual may benefit from investing a part of his/her advantage in advertising that quality, by taking on a handicap, in a way that inferior individuals would not
be able to do, because for them, the investment would be too high”14. Besides,
“the selective process by which individuals develop their handicap increases their
fitness, rather than decreases it. Only cheaters would decrease their fitness if
they were to take on a handicap that does not match their qualities, hence the
efficacy of the handicap in discouraging dishonest signaling”. In this line of
reasoning, Nicolaisen (2007) suggests that references are a sign of confidence
and that a stack of references is a “handicap” that only an honest author can
afford. Authors who are uncertain of themselves will usually not risk the potential loss of reputation that the discovery of fraudulent citation habits would carry (especially if such habits were applied systematically across their oeuvres). Accordingly, without proposing that all references are always honest, Nicolaisen suggests that the handicap principle ensures that authors honestly credit their inspirations and sources to a tolerable degree. In other words, in the light of the results presented in this paper, it can be argued that the capacity to include more references in one’s papers is a costly signal primarily displayed by stronger (or top) researchers, who genuinely manage more ideas and knowledge and, as a result, are able to publish in higher impact journals, thus obtaining more visibility and citations from their colleagues. In addition, systematically dishonest long reference lists, such as those based on the mere (or random) stacking of references, should be avoided by scientists, as they can work to the authors’ own detriment, make the work of readers, reviewers and journal editors more difficult, and will hardly contribute to the final quality of documents.
Finally, further research on this topic is still necessary. First, this type of analysis should be extended to other sets of researchers (from other research organizations, countries and fields) in order to corroborate and/or discuss some of the results and conclusions presented here. Second, the analysis of other factors and variables that could also influence the type and amount of knowledge managed by researchers (e.g. interdisciplinarity, network effects, working conditions, etc.) will also be an important line of development in this area of research. Such analyses would contribute to improving our understanding of scientific communication, of the referencing patterns of researchers, and of how they transfer their knowledge and ideas through their scientific publications.
Acknowledgments
The authors are grateful to the two anonymous referees, whose comments contributed to improving the quality of the original manuscript, and especially for drawing our attention to the recent publication by Frandsen & Nicolaisen (2012).
14 Note that Zahavi’s observations initially referred to animals, although they were later extended to explain the social behaviour of humans.
References
Abt, H.A. (2000). The reference-frequency relation in the physical sciences.
Scientometrics, 49(3): 443-451.
Abt, H.A.; Garfield, E. (2002). Is the relationship between numbers of references
and paper lengths the same for all sciences? Journal of the American Society for
Information Science and Technology, 53(13): 1106-1112.
Alimohammadi, D.; Sajjadi, M. (2009). Correlation between references and
citations. Webology, 6(2): a71.
Amat, C.B.; Yegros-Yegros, A. (2009). Median age difference of reference as
indicator of information update of research groups: a case study in Spanish food
research. Scientometrics, 78(3): 447-465.
Barnett, G.A.; Fink, E.L. (2008). Impact of the Internet and scholar age distribution
on academic citation age. Journal of the American Society for Information
Science and Technology, 59(4): 526-534.
Belsley, D.A.; Kuh, E.; Welsch, R.E. (1980). Regression diagnostics: identifying
influential data and sources of collinearity. New Jersey: Wiley.
Biglu, M.H. (2008). The influence of references per paper in SCI to Impact
Factors and the Matthew Effect. Scientometrics, 74(3): 453-470.
Bordons, M.; Barrigón, S. (1992). Bibliometric analysis of publications of Spanish
pharmacologists in the SCI (1984-1989). Part II. Scientometrics, 25(3): 425-446.
Bordons, M.; Fernandez, M.T.; Gomez, I. (2002). Advantages and limitations in
the use of impact factor measures for the assessment of research performance
in a peripheral country. Scientometrics, 53(2): 195-206.
Braun, T.; Glanzel, W.; Grupp, H. (1995). The scientometric weight of 50 nations
in 27 science areas, 1989-1993. Part II. Life Sciences. Scientometrics, 34(2):
207-237.
Budd, J.M.; Magnuson, L. (2010). Higher education literature revisited: citation
patterns examined. Research in Higher Education, 51: 294-304.
Butler, L.; Visser, M.S. (2006). Extending citation analysis to non-source items.
Scientometrics, 66(2): 327-343.
Clarke, M.E.; Oppenheim, C. (2006). Citation behaviour of information science
students II: Postgraduate students. Education for Information, 24: 1-30.
Clements, K.W.; Wang, P. (2003). Who cites what? The Economic Record,
79(245): 229-244
Corbyn, Z. (2010). An easy way to boost a paper’s citations. Nature <
http://www.nature.com/news/2010/100813/full/news.2010.406.html> [Accessed
06-10-2010]
Costas, R.; Bordons, M. (2005). Bibliometric indicators at the micro-level: some
results in the area of natural resources at the Spanish CSIC. Research
Evaluation, 14(2): 110-120.
Costas, R.; Bordons, M. (2006). Algorithms to solve the lack of normalization in
author names in bibliometric studies. Investigacion Bibliotecologica, 21(42): 13-32.
Costas, R.; Bordons, M.; van Leeuwen, T.N.; van Raan, A.F.J. (2009). Scaling
rules in the science system: influence of field-specific citation characteristics on
the impact of individual researchers. Journal of the American Society for
Information Science and Technology, 60(4): 740-753.
Costas, R.; van Leeuwen, T.N.; Bordons, M. (2010). A bibliometric classificatory
approach for the study and assessment of research performance at the individual
level: the effect of age on productivity and impact. Journal of the American
Society for Information Science and Technology, 61(8): 1564-1581.
Franceschet, M. (2009). A cluster analysis of scholar and journal bibliometric
indicators. Journal of the American Society for Information Science and
Technology, 60(10): 1950-1964.
Frandsen, T.F.; Nicolaisen, J. (2012). Effects of academic experience and
prestige on researchers’ citing behavior. Journal of the American Society for
Information Science and Technology, 63(1): 64-71.
Garfield, E. (1955). Citation indexes for science: a new dimension in
documentation through association of ideas. Science, 122: 108-111.
Garg, K.C.; Kumar, S.; Lal, K. (2006). Scientometric profile of Indian agricultural
research as seen through Science Citation Index Expanded. Scientometrics,
68(1): 151-166.
Gingras, Y.; Larivière, V.; Macaluso, B.; Robitaille, J.-P. (2008). The effects of
aging on researchers’ publication and citation patterns. Plos ONE, 3(12): e4048.
Haslam, N.; Ban, L.; Kaufmann, L.; Loughnan, S.; Peters, K.; Whelan, J.; Wilson, S. (2008). What makes an article influential? Predicting impact in social and personality psychology. Scientometrics, 76(1): 169-185.
Hirsch, J.E. (2005). An index to quantify an individual’s scientific research output.
Proceedings of the National Academy of Sciences of the United States of
America, 102(46): 16569-16572.
Ketcham, C.M.; Crawford, J.M. (2007). The impact of review articles. Laboratory
Investigation, 87: 1174-1185.
Kidd, J.S. (1990). Measuring referencing practices. Journal of the American
Society for Information Science, 41(3), 157-163.
Larivière, V.; Archambault, É.; Gingras, Y.; Vignola-Gagné, É. (2006). The place
of serials in referencing practices: comparing natural sciences and engineering
with social sciences and humanities. Journal of the American Society for
Information Science and Technology, 57(8): 997-1004.
Latour, B. (1987). Science in action. Cambridge, MA: Harvard University Press.
Lewison, G. (2009). The percentage of reviews in research output: a simple
measure of research esteem. Research Evaluation, 18(1): 25-37.
Ma, N.; Guan, J. (2005). An exploratory study on collaboration profiles on
Chinese publications in Molecular Biology. Scientometrics, 65(3): 343-355.
Milne, R.D. (1969). Mathematics of growth – V. Financial Analysts Journal: 13-15.
Moed, H.F. (2005). Citation analysis in research evaluation. The Netherlands:
Springer.
Moed, H.F.; Garfield, E. (2004). In basic science the percentage of ‘authoritative’
references decreases as bibliographies become shorter. Scientometrics, 60(3):
295-303.
Nicolaisen, J. (2007). Citation analysis. Annual Review of Information Science
and Technology, 41: 609-641.
Nicolaisen, J.; Frandsen, T.F. (2007). The handicap principle: a new perspective
for library and information science research. Information
Research, 12(4).
Oppenheim, C.; Smith, R. (2001). Student citation practices in an Information
Science Department. Education for Information, 19: 299-323
Price, D.J.S. (1965). Networks of scientific papers: the pattern of bibliographic
references indicates the nature of the scientific front. Science, 149(3683): 510-515.
Ramesh Babu, A.; Singh, Y.P. (1998). Determinants of research productivity.
Scientometrics, 43(3): 309-329.
Rey Rocha, J.; Martin Sempere, M.J. (1999). The role of domestic journals in
geographically-oriented disciplines: the case of Spanish journals on earth
sciences. Scientometrics, 45(2): 203-216.
Roth, W.-M.; Cole, M. (2010). The referencing practices of Mind, Culture and
Activity: On Citing (Sighting?) and Being Cited (Sighted?). Mind, Culture, and
Activity, 17 (2), 93-101.
Sagar, A.; Kalyane, V.L.; Prakasan, E.R.; Garg, R.G.; Kumar, V. (2009).
Scientometric highlights on science and technology related review articles
affiliated to India. Malaysian Journal of Library & Information Science, 14(2): 83-99.
Shanmugam, A. (2009). Citation practices amongst trainee teachers as reflected
in their project papers. Malaysian Journal of Library & Information Science, 14(2):
1-16.
Small, H. (1978). Cited documents as concept symbols. Social Studies of
Science, 8(3): 327-340.
Small, H. (2010). Referencing through history: how the analysis of landmark
scholarly texts can inform citation theory. Research Evaluation, 19(3): 185-193.
Solari, A.; Magri, M.H. (2000). A new approach to the SCI Journal Citation
Reports, a system for evaluating scientific journals. Scientometrics, 47(3): 605-625.
Squires, B.P. (1989). Biomedical review articles: what editors want from authors
and peer reviewers. Canadian Medical Association Journal,141(1): 195-197.
Steele, T.W.; Stier, J.C. (2000). The impact of interdisciplinary research in the
environmental sciences: a forestry case study. Journal of the American Society
for Information Science, 51(5): 476-484.
Stewart, J.A. (1983). Achievement and ascriptive process in the recognition of
scientific articles. Social Forces, 62(1): 166-189
Tenopir, C.; King, D.W. (2000). Towards electronic journals: realities for
scientists, librarians and publishers. Washington, DC: Special Libraries
Association.
Tenopir, C.; King, D.W.; Spencer, J.; Wu, Lei. (2009). Variations in article
seeking and reading patterns of academics: what makes a difference? Library &
Information Science Research, 31: 139-148.
Uzun, A. (2006). Statistical relationship of some basic bibliometric indicators in
scientometrics research. In: Proceedings of the International Workshop on
Webometrics, Informetrics and Scientometrics. Nancy (France). Pp. 87-91.
<http://eprints.rclis.org/handle/10760/7432>. Accessed: 18/07/2011
van Leeuwen, T.N. (2009). Strength and weakness of national science systems:
A bibliometric analysis through cooperation patterns. Scientometrics, 79(2): 389-408.
van Raan, A.F.J. (2004). Measuring Science. In: Moed, H.F.; Glänzel, W., eds.
Handbook of Quantitative Science and Technology Research. The Netherlands:
Kluwer Academic Publishers: 19-50.
Velho, L.; Krige, J. (1984). Publication and citation practices of Brazilian
agricultural scientists. Social Studies of Science, 14, 45-62
Vinkler, P. (2006). Composite scientometric indicators for evaluation of publications of research institutes. Scientometrics, 68(3): 629-642.
Waltman, L.; van Eck, N.J. (2009). A taxonomy of bibliometric performance
indicators based on the property of consistency. Report series: Research in
Management. < http://repub.eur.nl/res/pub/15182/ERS-2009-014-LIS.pdf>
[Accessed: 12/01/2012]
Webster, G.D.; Jonason, P.K.; Schember, T.O. (2009). Hot topics and popular
papers in Evolutionary Psychology: analysis of title words and citation counts in
Evolution and Human Behavior, 1979-2008. Evolutionary Psychology, 7(3): 348-362.
Weed, D.L. (1997). Methodologic guidelines for review papers. Journal of the
National Cancer Institute, 89(1): 6-7.
Wouters, P. (1999). The citation culture. Dissertation. Amsterdam: University of Amsterdam.
<http://garfield.library.upenn.edu/wouters/wouters.pdf> [Accessed: 25/05/2011]
Zahavi, A. (2003) Indirect selection and individual selection in sociobiology: my
personal views on theories of a social behaviour. Animal behaviour, 65: 859-863.
Zuckerman, H.; Merton, R.K. (1973). Age, aging and age structure in science. In:
Merton, R.K., ed. The Sociology of Science. Chicago: University of Chicago Press.
Appendix I. Bibliometric indicators used for the analysis of scientists’
performance
For each individual researcher, a bibliometric profile comprising several indicators was
produced. Some of these indicators are based on the CWTS15 standard methodology (van
Raan, 2004).
(a) Total number of publications (P) during the period 1994-2004 considering only
articles, letters and reviews. Full counting has been used for the calculation of this
indicator when multi-authored papers are considered.
(b) Total number of citations (C) received by publications (P) during the period 1994-2004. Note that the citation window is variable and shorter for the most recent
publications (e.g. for publications in 1994, citations from 1994 to 2004 are considered;
while for publications in 2004, only citations in 2004 are taken into account).
(c) Citations per Publication (CPP). This is the citation-per-document rate for each
researcher. This indicator is again slightly different from the original CPP by CWTS as it
is based on C, as defined before (excluding self-citations), divided by P.
(d) Percentage of Highly-Cited Papers (%HCP). Highly-Cited Papers (HCP) are those
publications (only articles and reviews are included here) cited above the 80th percentile
in their respective CSIC research areas (Biology & Biomedicine, Materials Science and
Natural Resources). In other words, HCP are those papers among the 20% most cited
within each of the three CSIC areas.
(e) h-index. A scientist's h-index is the highest number of papers that he/she has
published which have each amassed at least the same number of citations (Hirsch,
2005).
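As an illustration of how indicators (d) and (e) can be operationalized, the following Python sketch computes the share of highly cited papers against an assumed area-wide 80th-percentile threshold and the h-index from a list of citation counts per paper. The function names and the exact percentile convention are our own illustrative assumptions, not the procedure actually applied at CWTS.

    # Minimal sketch: %HCP and h-index from lists of citation counts.
    import math

    def hcp_share(author_citations, area_citations):
        """Share (%) of an author's papers cited above the area's 80th percentile."""
        ranked = sorted(area_citations)
        threshold = ranked[math.ceil(0.8 * len(ranked)) - 1]  # assumed percentile convention
        hcp = sum(1 for c in author_citations if c > threshold)
        return 100.0 * hcp / len(author_citations)

    def h_index(author_citations):
        """Largest h such that h papers each have at least h citations (Hirsch, 2005)."""
        ranked = sorted(author_citations, reverse=True)
        return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

    print(h_index([10, 8, 5, 4, 3]))  # -> 4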
(f) Median Impact Factor of publications (IF med). Considering all the papers published
by each researcher, the median value of the publication journal Impact Factor (as
defined by Garfield 1955) distribution is calculated. The median has been preferred to
the mean due to the reported ‘skewness’ of this indicator (Solari & Magri, 2000). The
Impact Factor is obtained through the Journal Citation Reports (JCR) as published by
Thomson Reuters.
(g) Normalized Journal Position (NJP). This is a measure of the average position of the
publication journals in their scientific categories (Thomson subject categories) according
to their impact factor (Bordons & Barrigón, 1992). Unlike the IF med, it allows for inter-field comparisons as it is a field-normalized indicator.
(h) CPP/FCSm. This indicator measures the impact of a research unit (in this case,
individual researchers), compared to the world citation average in the fields in which the
unit is active (van Raan, 2004). The rate of citations per publication (CPP) (self-citations
removed) is compared with the Field Citation Score mean (FCSm) that is the field-based
worldwide average impact used as reference – this indicator (FCSm) also measures the
‘citation density’ of the field/s of publication of a given unit. Here again we use the definition of fields based on the classification of scientific journals into categories developed by Thomson Reuters. Although this classification is not perfect, it provides a clear, ‘fixed’ and consistent field definition suitable for automated procedures within any given data-system.
15 CWTS - Centre for Science and Technology Studies (Centrum voor Wetenschaps- en Technologie Studies), Leiden University.
(i) JCSm/FCSm. This indicator measures the impact of the publication journals within their scientific fields. The journal-based worldwide average impact of an individual researcher’s publication journals (Journal Citation Score mean, JCSm, which thus also measures the ‘citation density’ of the publication journals of a unit) is compared to the average citation score of the fields (FCSm).
Note that for the last indicators (CPP/FCSm, JCSm/FCSm, JCSm and FCSm), only
articles, letters and reviews (excluding book reviews) are considered, and only external
citations (citations that are not produced by any of the co-authors of the source
document) are taken into account.
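The following Python sketch illustrates one possible way of computing CPP/FCSm and JCSm/FCSm from a small, invented per-paper table in which the journal (JCS) and field (FCS) worldwide citation averages are already attached to each paper. It is a simplified reading of the definitions above, not the actual CWTS implementation, and the numbers are made up for illustration.

    # Simplified sketch of the normalized impact indicators.
    import pandas as pd

    papers = pd.DataFrame({
        "ext_citations": [12, 3, 0, 7],  # citations excluding author self-citations
        "jcs": [8.0, 4.5, 4.5, 6.0],     # journal citation score (world average of the journal)
        "fcs": [5.0, 5.0, 3.0, 6.5],     # field citation score (world average of the field)
    })

    cpp = papers["ext_citations"].mean()   # citations per publication
    jcsm = papers["jcs"].mean()            # mean journal citation score
    fcsm = papers["fcs"].mean()            # mean field citation score

    print(f"CPP/FCSm  = {cpp / fcsm:.2f}")   # impact relative to the world field average
    print(f"JCSm/FCSm = {jcsm / fcsm:.2f}")  # journal citation density relative to the field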
Appendix II. Correlation between the number of references per document and other bibliometric indicators
Pearson correlation coefficients between the bibliometric indicators, per research area (lower-triangular matrices; columns follow the same order as the rows).

Natural Resources
             Refs/Doc  P         C+sc      h-index   CPP       CPP/FCSm  %HCP      JCSm/FCSm IF Median NJP       JCSm      FCSm      Pag/Doc   Auth/Doc
Refs/Doc     1
P            .285**    1
C+sc         .517**    .882**    1
h-index      .508**    .887**    .949**    1
CPP          .640**    .288**    .635**    .542**    1
CPP/FCSm     .572**    .320**    .581**    .518**    .871**    1
%HCP         .485**    -.033     .424**    .301**    .846**    .668**    1
JCSm/FCSm    .475**    .317**    .481**    .470**    .544**    .555**    .287**    1
IF Median    .629**    .228**    .473**    .460**    .594**    .439**    .425**    .673**    1
NJP          .577**    .312**    .466**    .495**    .534**    .540**    .332**    .716**    .809**    1
JCSm         .524**    .256**    .518**    .460**    .704**    .436**    .495**    .790**    .725**    .612**    1
FCSm         .333**    .051      .270**    .211**    .518**    .076      .469**    .099      .442**    .160**    .664**    1
Pag/Doc      .272**    -.199**   -.137*    -.133*    -.004     -.014     .08       .065      -.007     .049      .022      -.031     1
Auth/Doc     .276**    .066      .174**    .187**    .198**    .182**    .148*     .220**    .193**    .288**    .209**    .071      .283**    1

Biology & Biomedicine
             Refs/Doc  P         C+sc      h-index   CPP       CPP/FCSm  %HCP      JCSm/FCSm IF Median NJP       JCSm      FCSm      Pag/Doc   Auth/Doc
Refs/Doc     1
P            .127*     1
C+sc         .423**    .731**    1
h-index      .431**    .852**    .891**    1
CPP          .528**    .015      .651**    .370**    1
CPP/FCSm     .447**    .058      .628**    .366**    .906**    1
%HCP         .406**    -.242**   .365**    .150**    .795**    .675**    1
JCSm/FCSm    .438**    -.103*    .361**    .196**    .696**    .708**    .613**    1
IF Median    .536**    -.122*    .337**    .223**    .651**    .502**    .645**    .767**    1
NJP          .540**    .107*     .356**    .360**    .505**    .465**    .529**    .609**    .725**    1
JCSm         .519**    -.106*    .401**    .218**    .776**    .555**    .714**    .827**    .856**    .600**    1
FCSm         .432**    -.028     .289**    .184**    .525**    .151**    .485**    .274**    .574**    .369**    .753**    1
Pag/Doc      .362**    .013      .028      .058      .067      .027      .140*     .172**    .141**    .122*     .193**    .152**    1
Auth/Doc     -.043     .153**    .144**    .096      .033      .111*     -.09      .044      -.045     -.02      -.05      -.125*    .08       1

Materials Science
             Refs/Doc  P         C+sc      h-index   CPP       CPP/FCSm  %HCP      JCSm/FCSm IF Median NJP       JCSm      FCSm      Pag/Doc   Auth/Doc
Refs/Doc     1
P            .200**    1
C+sc         .462**    .838**    1
h-index      .512**    .844**    .940**    1
CPP          .592**    .310**    .718**    .641**    1
CPP/FCSm     .433**    .285**    .642**    .568**    .922**    1
%HCP         .628**    .158**    .587**    .610**    .797**    .719**    1
JCSm/FCSm    .588**    .160**    .371**    .417**    .584**    .540**    .519**    1
IF Median    .769**    .212**    .457**    .496**    .590**    .437**    .595**    .764**    1
NJP          .488**    .240**    .345**    .415**    .439**    .374**    .481**    .752**    .764**    1
JCSm         .711**    .228**    .478**    .485**    .684**    .455**    .594**    .800**    .835**    .625**    1
FCSm         .563**    .253**    .420**    .409**    .542**    .241**    .426**    .385**    .605**    .364**    .844**    1
Pag/Doc      .084      -.199**   -.238**   -.224**   -.184**   -.150**   -.126*    -.193**   -.195**   -.237**   -.146**   -.034     1
Auth/Doc     .334**    .299**    .426**    .384**    .330**    .232**    .310**    .200**    .366**    .196**    .274**    .234**    -.389**   1

Note: SQRT normalization was applied to all the bibliometric indicators; in the original table, correlations > 0.40 with Refs/Doc were shown in bold.
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
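For completeness, the following minimal Python sketch shows the kind of correlation analysis summarized in this appendix: it assumes a hypothetical per-researcher table of indicators and, following the note above, applies a square-root transformation before computing Pearson correlations. The column names and the input file are illustrative only.

    # Sketch: Pearson correlations between SQRT-normalized indicators.
    import numpy as np
    import pandas as pd

    df = pd.read_csv("indicators_per_researcher.csv")  # hypothetical input file

    cols = ["refs_doc", "p", "c_sc", "h_index", "cpp", "if_median", "pag_doc", "auth_doc"]
    corr = df[cols].apply(np.sqrt).corr(method="pearson")
    print(corr.round(3))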