Developing ArXivSI to Help Scientists to Explore the Research Papers in ArXiv
Zhang Zhixiong, Qian Li, Shi Hongbo
National Science Library, Chinese Academy of Sciences
Sep 24, 2015

Outline
1. Introduction
2. Main Ideas & Implementation
3. Using arXivSI
4. Conclusion

1. Introduction

arXiv
arXiv is an e-print service in the fields of physics, mathematics, computer science, and more, hosted by the Cornell University Library.
It houses more than one million e-prints (arXiv hit 1 million submissions this year) and receives 200-300 new submissions each day.
It is a driving force in scientific communication, drawing thousands of researchers to see the latest developments in their fields.

Grisha Perelman
Perelman's proof of the Poincaré Conjecture was made available on arXiv in three papers in 2002 and 2003. On December 22, 2006, Science honored his proof of the Poincaré conjecture as the scientific "Breakthrough of the Year".

The content in arXiv is very good, but it has some weaknesses in how it can be used. The arXiv search platform (http://arxiv.org) only provides a simple result set. It does not help users "analyze" what the result set contains, and it does not "reveal" clues that would help them understand and use the results. As a result, it cannot help users explore the valuable research papers housed in arXiv conveniently.

The weaknesses of the search functions of the original arXiv platform:
- No more than 1000 hits are displayed for one query.
- No filtering aid is provided to refine the search query and the search result.
- No sorting aid is provided to sort the search result in different ways (by title, submission time, subject, etc.).
- No navigation tool is provided to help users better understand and use the search result.

With the help of the Cornell University Library (CUL), the National Science Library (NSL) of the Chinese Academy of Sciences (CAS) developed a system named arXiv Search Interface (arXivSI). It helps scientists efficiently discover the knowledge they need in arXiv and explore arXiv research papers in a more vivid way: http://arxivsi.las.ac.cn

2. Main Ideas & Implementation

The main idea behind arXivSI is to build an exploratory system that turns the arXiv search experience from information retrieval into knowledge exploration. To be a knowledge exploration system, arXivSI should:
- analyze the arXiv search results;
- reveal useful patterns hidden in those results;
- visualize those patterns in a vivid way;
- provide a user-friendly interface that helps users explore the papers in arXiv easily and smoothly.

Architecture of arXivSI (diagram): An automatic harvester collects metadata from arXiv via the OAI-PMH protocol into a metadata repository. An automatic indexer then builds an index on the Solr search platform. The arXiv Search Interface (arXivSI) runs on top of this index and offers faceted search, guided browsing, a search condition navigation bar, and patterns visualization. Each article's detail page uses the article ID (e.g., arXiv:1410.6143) to link to the full text on arXiv.org.

From information retrieval to knowledge exploration (diagram):
Information retrieval model (arXiv): search → obtain the full-text paper.
Knowledge exploration model (arXivSI): search → faceted search features → guided browsing → search conditions navigation → patterns visualization → obtain the full-text paper.

(1) Faceted search features
arXivSI shows hit counts by category in addition to the search result. This helps users "analyze" what the search result contains and "reveals" clues that help them better understand and use it.
(Screenshots: faceted search features)
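The OAI-PMH harvesting step in the architecture above can be sketched as follows. This is a minimal illustration of `ListRecords` paging with Dublin Core (`oai_dc`) metadata, not arXivSI's actual harvester. The base URL `ARXIV_OAI_BASE` is an assumption, and the helpers `list_records_url`, `parse_list_records` and `harvest` are hypothetical names. The `fetch` callable is passed in so that the parsing and paging logic can be tried offline.

```python
"""Minimal OAI-PMH ListRecords harvesting sketch (not arXivSI's actual harvester)."""
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed endpoint: check arXiv's OAI-PMH documentation for the current base URL.
ARXIV_OAI_BASE = "http://export.arxiv.org/oai2"

# XML namespaces used by OAI-PMH responses carrying Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base=ARXIV_OAI_BASE, token=None, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request; a resumptionToken replaces all other arguments."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, next_token) for one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)  # None for deleted records
        title = dc.findtext("dc:title", default="", namespaces=NS) if dc is not None else ""
        subjects = [s.text for s in dc.iterfind("dc:subject", NS)] if dc is not None else []
        records.append({"id": identifier, "title": title.strip(), "subjects": subjects})
    token_el = root.find(".//oai:resumptionToken", NS)
    # An empty resumptionToken element marks the last page of the list.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return records, token


def harvest(fetch, **kwargs):
    """Yield records page by page; `fetch` maps a request URL to the response text."""
    token = None
    while True:
        records, token = parse_list_records(fetch(list_records_url(token=token, **kwargs)))
        yield from records
        if not token:
            break
```

In practice, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`. A production harvester would also need to handle rate limiting, retries and deleted records before passing the metadata on to the indexer.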
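Because the index is built on the Solr search platform, the hit counts by category described above could be obtained through Solr's faceting. The sketch below rests on stated assumptions and is not arXivSI's code. The field names in `FACET_FIELDS`, the endpoint URL and the helpers `SearchConditions`, `solr_select_params`, `select_url` and `facet_counts` are all hypothetical. Each active search condition becomes one Solr filter query (`fq`), written with the `{!term}` query parser so that values such as `astro-ph.CO` need no escaping. Adding a condition narrows the result and removing it broadens the result again, and `facet_counts` in the response supplies the per-category hit counts.

```python
"""Sketch: faceted search plus a search-condition stack on a Solr index.

All field names, URLs and helper names here are hypothetical illustrations.
"""
from urllib.parse import urlencode

# Hypothetical schema fields; the slides do not describe arXivSI's Solr schema.
FACET_FIELDS = ("domain", "subject", "year")


class SearchConditions:
    """Ordered, de-duplicated (field, value) filters, like a breadcrumb bar."""

    def __init__(self):
        self._items = []

    def add(self, field, value):
        """Adding a condition narrows the result set."""
        if (field, value) not in self._items:
            self._items.append((field, value))

    def remove(self, field, value):
        """Removing a condition broadens the result set again."""
        self._items = [c for c in self._items if c != (field, value)]

    def __iter__(self):
        return iter(self._items)


def solr_select_params(query, conditions=(), rows=20):
    """Parameters for Solr's /select handler: hits, facet counts, one fq per condition."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json"),
              ("facet", "true"), ("facet.mincount", "1")]
    params += [("facet.field", f) for f in FACET_FIELDS]
    # The {!term} query parser matches the raw value, so no query escaping is needed.
    params += [("fq", "{!term f=%s}%s" % (field, value)) for field, value in conditions]
    return params


def select_url(base, query, conditions=(), rows=20):
    """Full request URL; `base` is e.g. a hypothetical .../solr/arxiv/select endpoint."""
    return base + "?" + urlencode(solr_select_params(query, conditions, rows))


def facet_counts(solr_json, field):
    """Convert Solr's flat [value, count, value, count, ...] list into (value, count) pairs."""
    flat = solr_json["facet_counts"]["facet_fields"].get(field, [])
    return list(zip(flat[0::2], flat[1::2]))
```

For example, `select_url("http://localhost:8983/solr/arxiv/select", "dark matter", conds)` would request the first 20 hits for "dark matter" restricted to the selected conditions, together with per-field counts (the host and core name are placeholders). One design caveat applies: with plain `fq` filters, the facet counts describe only the filtered set. Multi-select facets would need Solr's tag/exclude local parameters.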
(2) Guided Browsing
Based on the faceted index, guided browsing lets users select and explore papers by domain, subject, submission time, or author. Users can therefore easily browse any paper of interest within a large search result.
(Screenshots: guided browsing)

(3) Search Conditions Navigation Bar
arXivSI implements a search conditions navigation bar that traces the user's search actions. Users can easily add or remove any search condition to narrow or broaden the results.
(Screenshots: search conditions navigation bar)

(4) Patterns Visualization
To reveal useful patterns hidden in the search results, arXivSI provides functions that visualize these patterns. Users get a bird's-eye view of the information related to their search terms and can explore the papers in arXiv easily and smoothly.
Example visualizations (screenshots):
- Submission count by domain
- Submission rate by domain
- Subject distribution patterns for one domain
- Distribution patterns for a search result
- Distribution pattern of the search result for "dark matter"
- Visualized navigation tools for the search result for "dark matter"
- Distribution of research papers by time

3. Using arXivSI

The arXivSI service was released publicly on December 12, 2014: http://arxivsi.las.ac.cn

Top institutional users of arXivSI in China

No.  Institution name                                    Total accesses
1    Chinese Academy of Sciences                         63940
2    Zhejiang University                                 535
3    University of Science and Technology of China       253
4    Tsinghua University                                 230
5    The Hong Kong University of Science and Technology  162
7    Renmin University of China                          145
8    Beijing Normal University                           133
9    Shanghai Jiao Tong University                       94
10   Huazhong University of Science and Technology       92

Top institutional users of arXivSI outside China

No.  Institution name                        Total accesses
1    Stanford University                     234
2    Nanyang Technological University        182
3    University of Oxford                    132
4    Columbia University                     105
5    Massachusetts Institute of Technology   101
6    CERN                                    95
7    The University of Tokyo                 91
8    Universität Hannover                    69
9    University of Toronto                   67
10   Princeton University                    66

4. Conclusion

arXivSI provides an efficient way to explore and discover the knowledge housed in arXiv.
It is used both inside and outside CAS, but it is not yet widely used outside CAS.
You are welcome to try it: http://arxivsi.las.ac.cn

Future work:
- Machine learning (ML) and natural language processing (NLP)
- Deeper knowledge exploration services

Thank you for your attention!