
Developing arXivSI to Help Scientists
Explore the Research Papers in arXiv
Zhang Zhixiong, Qian Li, Shi Hongbo
National Science Library, Chinese Academy of Sciences
Sep 24, 2015
Outline

1. Introduction

2. Main Ideas & Implementation

3. Using arXivSI

4. Conclusion
1. Introduction

arXiv



An e-print service in the fields of physics, mathematics, computer science, etc.
Hosted by the Cornell University Library
Houses more than one million e-prints




arXiv hits 1 million submissions this year
200-300 new submissions each day
A driving force in scientific communication
Draws thousands of researchers to follow the latest developments in their fields

Grisha Perelman


Posted a proof of the Poincaré conjecture in three papers made available on arXiv in 2002 and 2003.
On December 22, 2006, Science honored Perelman's proof of the Poincaré conjecture as the scientific "Breakthrough of the Year".

The content in arXiv is excellent, but the platform for using it has some weaknesses:

The arXiv search platform (http://arxiv.org) provides users with only a plain list of search results.
It does not help users “analyze” what the result set contains, nor does it “reveal” clues that would help them better understand and use the results.
It offers no convenient way to explore the valuable research papers housed in arXiv.

Weaknesses of the search functions of the original arXiv platform:

Displays no more than 1,000 hits per query
No filtering aids to narrow the query and refine the results
No sorting aids to order the results in different ways (by title, submission time, subject, etc.)
No navigation tools to help users better understand and use the results



To help scientists efficiently discover the knowledge they need in arXiv, the National Science Library (NSL) of the Chinese Academy of Sciences (CAS), with the help of Cornell University Library (CUL), developed a system named arXiv Search Interface (arXivSI).
arXivSI helps scientists explore arXiv research papers in a more vivid way.
http://arxivsi.las.ac.cn
2. Main Ideas & Implementation

The main idea behind arXivSI:

Develop an exploratory system
Turn the arXiv search experience from information retrieval into knowledge exploration

To be a knowledge exploration system, arXivSI should:

analyze the arXiv search results
reveal useful patterns hidden in those results
visualize those patterns in a vivid way
provide a user-friendly interface that helps users explore the papers in arXiv easily and smoothly

Figure: Architecture of arXivSI
arXiv → (OAI-PMH protocol) → Automatic Harvester → Metadata Repository → Automatic Indexer (based on the Solr search platform) → arXiv Search Interface (arXivSI): Faceted Search, Guided Browsing, Search Conditions Navigation Bar, Patterns Visualization → Detail page of article, with a full-text link to arXiv.org by article ID (e.g., arXiv:1410.6143)
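The slides do not show the harvester's code. As an illustration only, here is a minimal Python sketch of the harvesting stage in the architecture above, assuming arXiv's public OAI-PMH endpoint (http://export.arxiv.org/oai2) and the standard `oai_dc` metadata format. It builds `ListRecords` request URLs and turns one response page into simple records plus the resumption token for the next request. The helper names `list_records_url` and `parse_list_records` are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: arXiv's public OAI-PMH base URL.
OAI_BASE = "http://export.arxiv.org/oai2"

# Namespaces defined by OAI-PMH 2.0 and the oai_dc / Dublin Core schemas.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(resumption_token=None, from_date=None, oai_set=None):
    """Build a ListRecords request URL (hypothetical helper name)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it is sent alone, without metadataPrefix/from/set.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if from_date:
            params["from"] = from_date  # incremental harvest, YYYY-MM-DD
        if oai_set:
            params["set"] = oai_set     # e.g. "physics"
    return OAI_BASE + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        dc = rec.find(".//oai_dc:dc", NS)  # absent for deleted records
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "title": dc.findtext("dc:title", namespaces=NS) if dc is not None else None,
            "subjects": [s.text for s in dc.findall("dc:subject", NS)] if dc is not None else [],
        })
    # An empty <resumptionToken/> (or none at all) marks the last page.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)
```

A full or incremental (`from_date`) harvest would fetch a URL, parse the page, and repeat with the returned token until it is `None`, pausing between requests. `oai_dc` is chosen here only for simplicity; arXiv also offers richer arXiv-specific metadata formats.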
Figure: From Information Retrieval to Knowledge Exploration
Information Retrieval Model (arXiv): search information → obtain full-text paper
Knowledge Exploration Model (arXivSI): faceted search features, guided browsing, search conditions navigation, patterns visualization → obtain full-text paper

(1) Faceted search features



Shows hit counts by category alongside the search results
Helps users “analyze” what the search results contain
“Reveals” clues that help users better understand and use the results
Figure: Faceted search features
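Conceptually, a facet is a hit count per distinct field value over the current result set. In arXivSI this counting is delegated to the Solr-based indexer shown in the architecture. The following Python sketch only illustrates the idea on in-memory records; the field names `domain`, `subjects`, and `year` are assumptions, not arXivSI's actual index schema.

```python
from collections import Counter


def facet_counts(results, fields):
    """Count hits per value of each facet field over a result set.

    A multi-valued field (a list) contributes one count per value, so a
    paper cross-listed in two subjects is counted under both.
    """
    facets = {}
    for field in fields:
        counter = Counter()
        for paper in results:
            value = paper.get(field)
            if value is None:
                continue  # paper has no value for this facet
            values = value if isinstance(value, (list, tuple, set)) else [value]
            counter.update(values)
        # Most frequent values first, as a facet panel usually lists them.
        facets[field] = counter.most_common()
    return facets


# Hypothetical records standing in for a small result set.
results = [
    {"id": "p1", "domain": "astro-ph", "subjects": ["astro-ph.CO", "hep-ph"], "year": 2014},
    {"id": "p2", "domain": "hep-ph", "subjects": ["hep-ph"], "year": 2014},
    {"id": "p3", "domain": "astro-ph", "subjects": ["astro-ph.GA"], "year": 2013},
]

print(facet_counts(results, ["domain", "year"]))
# {'domain': [('astro-ph', 2), ('hep-ph', 1)], 'year': [(2014, 2), (2013, 1)]}
```

In Solr, the same numbers come from a faceted query (`facet=true` with one `facet.field` per field) and are computed over all matching documents, not just the page being displayed. That is what lets facets describe result sets far larger than one screen.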

(2) Guided Browsing


Based on the faceted index, guided browsing lets users select and explore papers by domain, subject, submission time, or author, so they can easily and conveniently reach any paper of interest even within a large result set.
Figure: Guided Browsing
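Guided browsing can be read as repeated drill-down: choosing one facet value narrows the result set, and the remaining choices are recomputed on the narrower set. The following minimal Python sketch uses an in-memory list as a stand-in for the faceted index. The record layout (`domain`, `year`, `authors`) and the function names are hypothetical.

```python
from collections import Counter


def has_value(paper, field, value):
    """True if `field` equals `value`, or contains it when list-valued."""
    v = paper.get(field)
    if isinstance(v, (list, tuple, set)):
        return value in v
    return v == value


def drill_down(results, field, value):
    """One browsing step: keep only papers matching the chosen facet value."""
    return [p for p in results if has_value(p, field, value)]


def next_choices(results, field):
    """Facet values still on offer for `field`, with hit counts."""
    counter = Counter()
    for p in results:
        v = p.get(field)
        if v is not None:
            counter.update(v if isinstance(v, (list, tuple, set)) else [v])
    return counter.most_common()


# Hypothetical search result; field names are assumptions.
papers = [
    {"id": "p1", "domain": "astro-ph", "year": 2014, "authors": ["Author A", "Author B"]},
    {"id": "p2", "domain": "astro-ph", "year": 2013, "authors": ["Author A"]},
    {"id": "p3", "domain": "hep-ph", "year": 2014, "authors": ["Author C"]},
]

# Browse domain -> submission year -> author.
in_domain = drill_down(papers, "domain", "astro-ph")
print(next_choices(in_domain, "year"))    # years available within astro-ph
in_year = drill_down(in_domain, "year", 2014)
print(next_choices(in_year, "authors"))   # authors of the remaining papers
```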

(3) Search Conditions Navigation Bar


arXivSI implements a search conditions navigation bar that traces the user’s search actions.
Users can easily add or remove any search condition to narrow or broaden the results.
Figure: Search Conditions Navigation Bar
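One way to model the navigation bar is as an ordered list of conditions: adding a condition narrows the query, removing one widens it, and the whole list is re-sent to the index on every change. Because the architecture places a Solr index behind arXivSI, this Python sketch maps each condition to one Solr filter query (`fq`) parameter. The class name `ConditionBar`, the field names, and the exact query layout are assumptions, not arXivSI's code.

```python
from urllib.parse import urlencode


class ConditionBar:
    """Ordered, removable search conditions (hypothetical class name)."""

    def __init__(self, query):
        self.query = query      # the user's keyword query, e.g. "dark matter"
        self.conditions = []    # (field, value) pairs in the order added

    def add(self, field, value):
        """Narrow the results; adding the same condition twice is a no-op."""
        if (field, value) not in self.conditions:
            self.conditions.append((field, value))

    def remove(self, field, value):
        """Widen the results again by dropping one condition."""
        self.conditions = [c for c in self.conditions if c != (field, value)]

    def labels(self):
        """Text shown for each condition in the bar."""
        return [f"{field}: {value}" for field, value in self.conditions]

    def solr_params(self, facet_fields=("domain", "subject", "year")):
        """Query string for a Solr /select request: q, one fq per condition.

        Simplification: values are wrapped in double quotes but not escaped.
        """
        params = [("q", self.query)]
        params += [("fq", f'{field}:"{value}"') for field, value in self.conditions]
        params.append(("facet", "true"))
        params += [("facet.field", name) for name in facet_fields]
        return urlencode(params)


bar = ConditionBar("dark matter")
bar.add("domain", "astro-ph")
bar.add("year", 2014)
print(bar.labels())          # ['domain: astro-ph', 'year: 2014']
bar.remove("domain", "astro-ph")
print(bar.solr_params())
```

Putting each condition in its own `fq` rather than folding everything into `q` keeps conditions independently removable. In Solr it also lets each filter be cached separately.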

(4) Patterns Visualization


To reveal useful patterns hidden in the search results, arXivSI implements functions that visualize those patterns.
Users of arXivSI get a bird’s-eye view of the information related to their search terms and can explore the papers in arXiv easily and smoothly.
Figure: Visualization of submission count by domain
Figure: Visualization of submission rate by domain
Figure: Visualization of subject distribution patterns for one domain
Figure: Visualization of distribution patterns for search results
Figure: Distribution patterns of the search results for "dark matter"
Figure: Visualized navigation tools for the search results for "dark matter"
Figure: Distribution of research papers by time
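The slides show these visualizations only as screenshots. To illustrate the computation behind one of them, "distribution of research papers by time", the following Python sketch aggregates hits by submission year and renders a plain-text bar chart. arXivSI itself draws graphical charts; the record layout (a `year` field) and the function names here are assumptions.

```python
from collections import Counter


def distribution_by_year(papers):
    """(year, number of papers) pairs in chronological order."""
    counts = Counter(p["year"] for p in papers)
    return sorted(counts.items())


def text_bar_chart(pairs, width=40):
    """Render (label, count) pairs as '#' bars scaled to the largest count."""
    if not pairs:
        return ""
    peak = max(count for _, count in pairs)
    lines = []
    for label, count in pairs:
        # Keep every non-zero count visible with at least one '#'.
        bar = "#" * max(1, round(width * count / peak)) if count else ""
        lines.append(f"{label} | {bar} {count}")
    return "\n".join(lines)


# Hypothetical hits for one query.
papers = [{"year": 2012}, {"year": 2013}, {"year": 2013},
          {"year": 2014}, {"year": 2014}, {"year": 2014}, {"year": 2014}]
print(text_bar_chart(distribution_by_year(papers), width=8))
# 2012 | ## 1
# 2013 | #### 2
# 2014 | ######## 4
```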
3. Using arXivSI

The arXivSI service was released publicly on December 12, 2014.

http://arxivsi.las.ac.cn

Top institutional users of arXivSI in China

No.  Institution Name                                    Total access
1    Chinese Academy of Sciences                         63940
2    Zhejiang University                                 535
3    University of Science and Technology of China       253
4    Tsinghua University                                 230
5    The Hong Kong University of Science and Technology  162
7    Renmin University of China                          145
8    Beijing Normal University                           133
9    Shanghai Jiao Tong University                       94
10   Huazhong University of Science and Technology       92

Top institutional users of arXivSI outside China

No.  Institution Name                       Total access
1    Stanford University                    234
2    Nanyang Technological University       182
3    University of Oxford                   132
4    Columbia University                    105
5    Massachusetts Institute of Technology  101
6    CERN                                   95
7    The University of Tokyo                91
8    Universität Hannover                   69
9    University of Toronto                  67
10   Princeton University                   66
4. Conclusion




arXivSI provides an efficient way to explore and discover the knowledge housed in arXiv
It is used both inside and outside CAS, but is not yet widely used outside CAS
You are welcome to try it: http://arxivsi.las.ac.cn
Future work


Machine learning (ML) and natural language processing (NLP)
Deep knowledge exploration services
Thanks


Thank you for your attention!