What is bibliometrics

Applying bibliometrics in research
assessment and management ...
It’s complicated !
Dr. Thed van Leeuwen
Presentation at the NARMA Meeting, 14th April 2015
Outline
• CWTS and Bibliometrics
• Detail and accuracy in bibliometric applications
• Normalization in bibliometrics
• Coverage in bibliometric studies
• Infamous bibliometric indicators – What to avoid
• CWTS methodology – basic indicators
• Advantages and disadvantages in bibliometric analysis
1
CWTS and
Bibliometrics
2
What is bibliometrics ?
• Quantitative analysis of science & technology, and the study of
cognitive and organizational structures in science and technology.
• Scientific communication between scientists through (mainly) journal
publications.
• Key concepts are output and impact, as measured through
publications and citations.
• Important starting point in bibliometrics: scientists express, through
citations in their scientific publications, a certain degree of influence
of others on their own work.
• By large scale quantification, citations indicate (inter)national
influence or (inter)national visibility of scientific activity, but should
not be interpreted as synonym for ‘quality’.
CWTS data system
• CWTS has a full bibliometric license from Thomson
Reuters to conduct evaluation studies using the Web
of Science.
• Our database covers the period 1981-2014/5.
• Some characteristics:
– Over 41.000.000 publications.
– Over 600.000.000 citation relations between source papers.
– Author disambiguation tools are applied, linked with acquired experience
– Continuous address cleaning tools being developed, related to the Leiden Ranking.
– Contains reference sets for journal and field citation data.
Detail and accuracy
in bibliometric
applications
5
Tension between detail and accuracy:
Duhem’s ‘Law of Cognitive Complementarity’ *
• An inverse relationship
exists between the
precision of our
information, and its’
substantiation
• Detail and security /
accuracy stand in a
competing relationship !
*
‘Epistemetrics’ by Nicolas Rescher (2006)
• We estimate the size of
the tree at around 8 mtr.
• We are quite sure that
the tree is between 6-12
mtr. high.
• We are virtually certain
that ist height is
between 3-18 mtr.
• But we can be completely
and absolutely sure that
its height is between 1
mtr and 56 mtr.
7
Levels of aggregation in bibliometric analysis
• We distinguish various levels of analysis:
– Macro-level, e.g. country and region comparison for
the EU, Dutch Observatory of S&T.
– Meso-level, e.g. research organizations, universities,
institutes.
– Micro-level, e.g. analysis of programs, groups, or
even, increasingly, individual researchers.
Bibliometrics can be applied on all three levels of analysis,
however, every level brings it’s own requirements !!!
Data collection in bibliometric analysis
• Roughly, we can distinguish three methods
for the collection of a set of publications:
– Based on a list of names of researchers
(verification through a website creates a valid dataset)
– Based on a list of publications of a unit
(the supplied lists form the authorized/verified dataset)
– Based on the address of an institute or unit
(this approach does not allow lower level insights and conclusions)
We work with various methods, macro-level studies
usually exclude the first two methods.
RESEARCH PROFILE
OUTPUT AND IMPACT PER FIELD
2005 - 2009/2010
Example of a so-called
Research Profile
FIELD
(MNCS)
ONCOLOGY (1.21)
CARD&CARDIOV SYS (1.78)
HEMATOLOGY (1.31)
ENDOCRIN&METABOL (1.16)
• Profile of Leiden University
Medical Center
IMMUNOLOGY (1.18)
RHEUMATOLOGY (2.00)
GENETICS&HEREDIT (1.77)
RAD,NUCL MED IM (1.24)
CLIN NEUROLOGY (1.37)
• In Immunology they’re not
as strong as in other
medical disciplines.
BIOCHEM&MOL BIOL (1.23)
PERIPHL VASC DIS (1.43)
MEDICINE,GEN&INT (3.63)
NEUROSCIENCES (1.21)
SURGERY (1.83)
OBSTETRICS&GYNEC (1.28)
UROLOGY&NEPHROL (1.66)
• However, this does not
automatically mean that the
Dept. of Immunology is
performing at that level !
PHARMACOL&PHARMA (1.21)
GASTROENTEROLOGY (1.47)
PEDIATRICS (1.49)
CELL BIOLOGY (1.32)
PSYCHIATRY (1.24)
MICROBIOLOGY (1.75)
RESPIRATORY SYST (2.13)
PUBL ENV OCC HLT (1.68)
VIROLOGY (1.10)
PATHOLOGY (1.43)
INFEC DISEASE (1.23)
MEDICINE,RES&EXP (1.64)
0
1
2
3
4
5
6
7
8
Share of the output (%)
IMPACT:
LOW
AVERAGE
HIGH
Disconnect between organizational
units & fields
Dept. of
Physics
Dept. of
Chemistry
C-IV
C-III
C-I
P-III
P-I
C-II
Chemical
Engineering
P-II
Chemistry
Physics
P-IV
Normalization in
bibliometrics
12
On normalization in bibliometric analysis
• The use of normalization is conditio sine qua non in applying
bibliometric techniques.
• The most used system is that Journal Subject Categories, which
fits the multidisciplinary nature of the Web of Science.
• However, the most applied system, that of Journal Subject
Categories, has serious drawbacks *
* Van Eck, N.J., et al (2013). Citation analysis may severely underestimate the impact of
13
clinical research as compared to basic research. PLoS ONE, 8(4), e62395. arXiv:1210.0442
Journal Subject Category
“Clinical Neurology”
Some conclusions on normalization
• Therefore, CWTS has developed methods to normalize in a
different way, avoiding these problems.
• However, normalization and level of aggregation remain in a
complex relationship.
• We have to remain aware of the other meaning of the word
normalization, and avoid that this becomes a straight jacket.
15
Coverage in
bibliometric
studies
16
Introduction
• The use of evaluative bibliometrics can only become meaningful
when used in a the right context.
• Publication culture of the unit(s) under assessment are shaping
that context.
• As such, any bibliometric study should start with an assessment
of the adequacy of metrics in that particular context.
• Therefore, CWTS has developed methods to assess that fit of
metrics in a certain context.
17
How to define adequate coverage ?
•
In order to determine whether metrics applied in an
assessment context are meaningful, one needs to
know what is represented through the metrics.
•
We distinguish two types of coverage:
– Internal (from inside the perspective of the WoS)
– External (from the perspective of a total output set)
18
Assessing the adequacy of WoS for bibliometrics:
The Internal coverage method
– Look at publications in WoS across fields,
– Use the references given by the authors of the publications,
– Analyze the communication channels referred to,
– Usage of WoS journals as share of the total number of references is an
indication of the relevance for the authors involved,
– Thereby constituting a basis for the usage of bibliometrics as evaluation tool !
Assessing the adequacy of WoS for bibliometrics:
The External coverage method
– Use the list of publications of an organization, subject of a bibliometric
analysis.
– Match the submitted list with the WoS.
– Degrees of covered scientific outlets indicate the relevance of WoS journals.
– Thereby constituting a basis for the usage of bibliometrics as an evaluation
tool !
Internal coverage
in bibliometric
studies
21
AU
Moed, HF; Garfield, E.
TI
In basic science the percentage of 'authoritative' references
decreases as bibliographies become shorter
SO
SCIENTOMETRICS 60 (3): 295-303, 2004
Y
RF
ABT HA, J AM SOC INF SCI T, v 53, p 1106, 2004
Y
GARFIELD, E. CITATION INDEXING, 1979 (BOOK!)
N
Not in WoS
S
GARFIELD E, ESSAYS INFORMATION S, v 8, p 403, 1985
N
GILBERT GN, SOC STUDIES SCI, v 7, p 113, 1977
Y
MERTON RK, ISIS, v 79, p 606, 1988
Y
WoS Coverage
= 5/7 = 71%
Discipline
(Publications in 2010)
in
WO
ROUSSEAU R, SCIENTOMETRICS, v 43, p 63, 1998
Y
ZUCKERMAN H, SCIENTOMETRICS, v 12, p 329, 1987
Y
BASIC LIFE SCIENCES (99,991)
WoS Coverage in 2010
across disciplines
BIOMEDICAL SCIENCES (105,156)
MULTIDISCIPLINARY JOURNALS (8,999)
CHEMISTRY AND CHEMICAL ENGINEERING (118,141)
CLINICAL MEDICINE (224,983)
ASTRONOMY AND ASTROPHYSICS (12,932)
PHYSICS AND MATERIALS SCIENCE (137,522)
BASIC MEDICAL SCIENCES (18,450)
•
Black=Excellent coverage (>80%)
•
Blue= Good coverage (between 60-80%)
•
Green= Moderate coverage (but above
50%)
•
Orange= Moderate coverage (below 50%,
but above 40%)
•
Red= Poor coverage (highly problematic,
below 40%)
BIOLOGICAL SCIENCES (60,506)
AGRICULTURE AND FOOD SCIENCE (26,709)
INSTRUMENTS AND INSTRUMENTATION (8,485)
EARTH SCIENCES AND TECHNOLOGY (33,160)
PSYCHOLOGY (24,244)
ENVIRONMENTAL SCIENCES AND TECHNOLOGY (42,705)
MECHANICAL ENGINEERING AND AEROSPACE (20,336)
HEALTH SCIENCES (29,213)
ENERGY SCIENCE AND TECHNOLOGY (15,021)
MATHEMATICS (27,873)
STATISTICAL SCIENCES (11,263)
GENERAL AND INDUSTRIAL ENGINEERING (8,756)
CIVIL ENGINEERING AND CONSTRUCTION (8,430)
ECONOMICS AND BUSINESS (16,243)
ELECTRICAL ENGINEERING AND TELECOMMUNICATION (...
MANAGEMENT AND PLANNING (7,201)
COMPUTER SCIENCES (23,687)
EDUCATIONAL SCIENCES (9,917)
INFORMATION AND COMMUNICATION SCIENCES (4,006)
SOCIAL AND BEHAVIORAL SCIENCES, INTERDISCIPLINARY...
SOCIOLOGY AND ANTHROPOLOGY (9,907)
LAW AND CRIMINOLOGY (5,299)
LANGUAGE AND LINGUISTICS (3,514)
POLITICAL SCIENCE AND PUBLIC ADMINISTRATION (6,423)
HISTORY, PHILOSOPHY AND RELIGION (11,753)
CREATIVE ARTS, CULTURE AND MUSIC (6,147)
LITERATURE (4,786)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100
%
% Coverage of references in WoS
External coverage
in bibliometric
studies
25
External coverage & journal literature (i)
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
(Bio)medicine
Economics & Management
All Publications
Humanities
WoS Publications
Law
Social sciences
• Production is spread across disciplines.
• In Web of Science, Biomedicine is dominant !
External coverage & journal literature (ii)
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
(Bio)medicine
Economics & management
BOOK
CASE
CHAP
CONF
GEN
Humanities
JOUR
MGZN
PAT
RPRT
THES
Law
Social sciences
• We observe a variety of types of output.
• Journal publishing is important in all disciplines !
Infamous
bibliometric
indicators:
JIF & H-index
28
Definitions of Journal Impact Factor & Hirsch Index
• Definition of JIF:
– The mean citation score of a journal, determined by dividing all
citations in year T by all citable documents in years T-1 and T-2.
• Definition of h-index:
– The ‘impact’ of a researcher, determined by the number of received
citations of an oeuvre, sorted by descending order, where the
number of received citations on that single paper equals the rank
position.
Problems with JIF
•
Methodological issues
– Was/is calculated erroneously
(Moed & van Leeuwen, 1996)
– Not field normalized
– Not document type normalized
– Underlying citation distributions are highly skewed
•
(Seglen, 1994)
Conceptual/general issues
– Inflation
(van Leeuwen & Moed, 2002)
– Availability promotes journal publishing
– Is based on expected values only
– Stimulates one-indicator thinking
– Ignores other scholarly virtues
Deconstructing the myth of the JIF…
•
•
•
Take the Dutch output
Similar journal impact classes
Focus on publications that belong to the top 10% of their field
40,0%
35,0%
30,0%
25,0%
A (0 >MNJS <=0.40)
20,0%
B (0.40 > MNJS <= 0.80)
C (0.80 > MNJS <= 1.20)
D (1.20 > MNJS <=1.60)
15,0%
10,0%
5,0%
0,0%
E (MNJS > 1.60)
Problems with H-index
•
Bibliometric-mathematical issues
– mathematically inconsistent
(Waltman & van Eck, 2012)
– conservative
– Not field normalized
•
(van Leeuwen, 2008)
Bibliometric-methodological issues
– How to define an author?
– In which bibliographic/metric environment?
•
Conceptual/general issues
– Favors age, experience, and high productivity
(Costas & Bordons, 2006)
– No relationship with research quality
– Ignores other elements of scholarly activity
– Promotes one-indicator thinking
The problem of fields and h-index …
•
•
Spinoza candidates, across all domains …
Use output, normalized impact, and h-index
60
7.00
6.00
Med
50
Phy
5.00
Med
40
Bio
Phy
Soc
Med
Psy
Med
H-index
CPP/FCSm
4.00
Bio Bio
Med
Med
Med Che
Psy
Che
Eng
Phy
Med
Eng
Soc
Med
2.00
Hum
Mat
Che
Psy
Phy
Bio
Psy
Che
Bio
20
Eng
Soc
Mat
Hum
Soc
Phy
10
Med
Med
1.00
Med
Med
Bio
0.00
Env
Phy
Psy
3.00
Env
Med
Phy
30
Phy
Eng
Psy
0
0
50
100
150
TOTAL PUBLICATIONS
200
250
0
50
100
150
TOTAL PUBLICATIONS
200
250
In what database context … ?
Selected my own publications in WoS and Scopus, Google Scholar
has a pre-set profile.
Database
H-index
Based upon …
Web of Science
14
Articles in journals
Scopus
25
Articles, book (chapters), and
conference proceedings papers
Google Scholar
33
All types, incl. Reports
34
CWTS methodology:
basic indicators
35
Indicators suitable for assessment (1)
p: the number of publications of a unit, in a certain period.
tcs: The total number of citations received in a certain period.
mcs: the mean citation score of the oeuvre of a unit.
% not cited: the share of that oeuvre that is not cited.
% self citations: the share of citations given by the (co-)authors.
36
Indicators suitable for assessment (2)
mncs: the comparison of the actual impact with expected field
average impact scores.
mnjs: comparison of the journals in which the unit published, with
the field average impact in which the output was published.
internal coverage: indicates relevance of the bibliometric analysis,
based on reference behavior of units themselves.
Top 10%: The share of the output that belongs to the top 10% most
highly cited in the fields the unit is active in.
37
Various additional types of analysis focus on …
• Research profiles: a break down of the output over various
fields of science.
• Scientific cooperation analysis: a break down of the output
over various types of scientific collaboration.
• Knowledge user analysis: a break down of the ‘responding’
output into citing fields, countries or institutions.
• Network analysis: how is the network of partners composed,
based on scientific cooperation?
Advantages and
disadvantages of
bibliometric
analysis
39
Some disadvantages of applying bibliometrics …
• Steers away from more qualitative considerations.
• Metrics shape as much as measure scientific activity.
• People tend to forget we are talking about ‘indicators’.
• Tends to stimulate one-dimensional thinking.
• It requires skills to calculate and interpret results.
• ….
Some advantages of applying bibliometrics …
• It offers insights into underlying structures and patterns.
• It is a strong complementary tool to peer review.
• It is relatively stable in time.
• ….
We have not dealt with …
• The historical-social sciences perspective on the origins of the rise
of bibliometrics in the nowadays science system.
• University rankings and all their problems.
• Bibliometric mapping and network methodologies.
• ‘Address’ and ‘Author’ issues when collecting data.
• Open Access and the ‘issues’ in relation to evaluation
• …
Some conclusions …
• Bibliometrics should always be combined with peer review,
• … and preferably conducted by skilled experts !
• Always contextualize the bibliometric scores !
• One better avoids the ‘Quick & Dirty’ indicators !
• Advanced bibliometrics can be very helpful in research
management, at various levels.
Thank you for your attention!
Any questions?
Ask me now, or mail me
[email protected]
44