RoMEO and CRIS
Technical Issues & Efficiency Tips

Peter Millington
Centre for Research Communications
University of Nottingham

RoMEO and CRIS in Practice
Birmingham, 1st April 2011

Outline
• Patterns of usage
  – Do we have a crisis?
• Approaches to using RoMEO in CRIS
  – Real time queries
  – Caching and reusing RoMEO query results
• Rates of change – Reality Check
  – And their implications
• Other efficiency tips

Usage of Interactive RoMEO
[Figure (One Month): number of page views per day, 27/02/2011 to 27/03/2011; y-axis 0 to 6,000]
[Figure (Long Term): number of page views per month, Dec-2009 to Mar-2011; y-axis 0 to 120,000]
• Similar curve shapes for other measures
• Distinct weekly pattern
• ~4,500 page views per day
• ~1,000 visits per day
• ~700 unique visitors per day
• Seems to be a stable seasonal pattern

Usage of the RoMEO API – All Users
[Figure (One Month): number of IP addresses per day, 01/03/11 to 29/03/11; y-axis 0 to 100]
[Figure (Long Term): number of IP addresses per day, 01/11/2009 to 24/02/2011; y-axis 0 to 100]

Usage of the RoMEO API – Requests
[Figure (One Month): number of requests per day, 01/03/2011 to 29/03/2011; y-axis 0 to 600,000]
[Figure (Long Term): number of requests per day, 01/11/2009 to 24/02/2011; y-axis 0 to 600,000]

Usage of the RoMEO API
• Much more variable pattern
  – Weekly cycle of visits less distinct
  – Number of requests very highly variable
• More usage by fewer users
  – ~60 unique visitors per day
  – Over 250,000 hits per day (>50 times interactive)
• Significant growth
  – Steady growth in number of API users
  – Rapid growth in number of requests

Do we have a Crisis?
• Do you ever think RoMEO is slow?
  – Most API usage is by CRIS-like applications
• How can we improve things?
  – Higher capacity server?
    • Funding? Unnecessary?
  – Improve efficiency?
    • Optimise the API? More efficient usage?
  – Put a cap on number of requests per day?
    • What level? 1000? 2000?
  – Block commercial software users?
    • N.B. Creative Commons License

API approaches in CRIS applications
• Real time requests when displaying data
  – Acceptable for individual article displays
  – Latency too high for lists of articles
• Caching RoMEO data for rapid local re-use
  – Initial (bulk) checks against RoMEO
  – Store the results locally
  – Periodically recheck for updated policies
    • Whole bibliography
    • Additions and updates only

Real Time Usage Pattern
[Figure: number of requests per day, 01/11/2009 to 16/03/2011; y-axis 0 to 30,000]
[Figure: the same chart with a one-month inset, 01/03/2011 to 29/03/2011; inset y-axis 0 to 2,000]
• Levels vary day by day
  – Arguably high usage for one installation
• Occasional peaks
  – Special system jobs
  – Special end user projects

Caching with Monthly Updates
[Figure: number of requests per day, 01/11/2010 to 01/04/2011; y-axis 0 to 100,000]
• Rechecking the whole database each cycle
  – Seems to take three days. Low priority setting?
• Scheduled job – starts 1st of the month
  – Could it be a weekend instead?
    • Faster. Less intrusive.
• What is being checked?
  – Each reference?
  – Groups of records for each journal title?
• What about additions between cycles?
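
The question "Each reference? Groups of records for each journal title?" can be illustrated with a minimal sketch. It assumes each reference is a hypothetical (reference id, ISSN) pair and that `lookup` is whatever function the CRIS already uses to query RoMEO for one journal; neither name comes from the RoMEO API itself. The point is only that grouping by journal turns one request per reference into one request per journal.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical record shape: (reference_id, issn). Not a RoMEO data structure.
Reference = Tuple[str, str]


def group_by_issn(references: Iterable[Reference]) -> Dict[str, List[str]]:
    """Map each normalised ISSN to the ids of the references in that journal."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for ref_id, issn in references:
        groups[issn.strip().upper()].append(ref_id)
    return dict(groups)


def check_policies(
    references: Iterable[Reference],
    lookup: Callable[[str], Any],
) -> Dict[str, Any]:
    """Call lookup(issn) once per journal and copy the result to each reference.

    `lookup` is an assumed stand-in for the CRIS's own RoMEO query function.
    """
    results: Dict[str, Any] = {}
    for issn, ref_ids in group_by_issn(references).items():
        policy = lookup(issn)  # one request per journal, not per reference
        for ref_id in ref_ids:
            results[ref_id] = policy
    return results
```

With a bibliography in which many articles share a journal, the number of requests falls from the number of references to the number of distinct ISSNs.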

Caching with Daily Updates (1)
[Figure: number of requests per day, 01/06/2010 to 28/03/2011; y-axis 0 to 2,500]
[Figure: the same chart with a one-month inset, 01/03/2011 to 31/03/2011; inset y-axis 0 to 2,500]
• Whole database checked every day
  – Institutions can easily have lists of 50,000 items!
  – Lists constantly growing, slowing things down
• What is being checked?
  – Each reference? Probably
• Additions and updates between checks?
  – No accuracy problems
• Sledgehammer to crack a nut

Is the nut cracking the sledgehammer?
[Figure: number of requests per day, 01/05/2010 to 25/02/2011; y-axis 0 to 90,000]

Caching with Daily Updates (2)
[Figure: number of requests per day (log scale), 01/03/2010 to 24/02/2011; y-axis 1 to 100,000]
• Note the logarithmic scale
• Large initial check of the whole database
• Daily check of added & changed items only
• Welcome low loading on the API

Rates of Change – Reality Check
• Institutional Bibliographies
  – Up to 2,000 additions per year (<40 per week)
  – Few bibliographic changes after initial QA
• RoMEO Publishers’ Policies
  – c.25 additions or substantive changes per week
• Journal - Publisher Correlations
  – Change of publisher - infrequent - mostly January
  – Bulk changes - Business take-over or name change
• Expiry of archiving embargoes

RoMEO Implications of Change Rates
• Institutional Bibliographies
  – Only need to check additions & changes
  – Weekly check probably sufficient, or on first use
• RoMEO Publishers’ Policies
  – Recheck when the RoMEO record changes
  – Store RoMEO ID with article/journal for bulk updates
• Journal - Publisher Correlations
  – Full recheck annually on rolling cycle
  – Specific rechecks for known business/name changes
• Expiry of archiving embargoes
  – Scope for improvement in RoMEO

Caching of RoMEO Publisher Data
• Download the whole database with “?all=yes”
  – Relatively fast
  – Download as often as you wish
    • Suggest weekly
• And/Or…
  – Store key RoMEO data with bibliographic records
  – Provide links to interactive RoMEO
    • Full publisher records using RoMEO ID, or
    • Journal level data using ISSN

Caching Journal-level Data
• Schema/Organisation
  – Per journal (efficient)
  – Per article (probably inefficient)
• Fields
  – Journal title
  – ISSN and ESSN
  – RoMEO Persistent Publisher ID
  – RoMEO Colour and/or Version-specific permissions
    • Normal – i.e. at the time of publication
    • Adjusted after the completion of any embargo period

Most Efficient RoMEO Queries
• Journals
  – ISSN/ESSN or Exact Title
    • Unique or far fewer results, so faster
    • May avoid the overhead of needing to search Zetoc
• Publishers
  – RoMEO ID
    • Unique result. It gets no faster.
  – Exact publisher name
    • May sometimes find multiple results.

What to do with failed requests?
• Don’t just keep rechecking!
• Not a journal article?
  – Outside RoMEO’s scope. Prevent rechecking
• Data error (e.g. typo, bad abbreviation)?
  – Correct the source data, then recheck
• No publisher or no policy in RoMEO?
  – Feedback to RoMEO – if important
  – Recheck infrequently – say annually or quarterly

Any Questions?
RoMEO: http://www.sherpa.ac.uk/romeo
API: http://www.sherpa.ac.uk/romeo/api
Blog: http://romeoblog.jiscinvolve.org
E-mail: [email protected]
Twitter: @SHERPAServices

Peter Millington: [email protected]
0115 84 68481
http://www.sherpa.ac.uk/romeo/
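
The advice under "What to do with failed requests?" could be encoded in a CRIS roughly as follows. This is a sketch under stated assumptions, not part of RoMEO: the `Failure` categories, the `next_recheck` function and the 365-day interval are all hypothetical names and values chosen to mirror the three cases on the slide (the slide itself offers "annually or quarterly").

```python
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class Failure(Enum):
    """Hypothetical categories mirroring the three cases on the slide."""
    NOT_A_JOURNAL_ARTICLE = "not_a_journal_article"
    DATA_ERROR = "data_error"
    NOT_IN_ROMEO = "not_in_romeo"


# Assumption: "annually". The slide also suggests quarterly as an option.
RECHECK_NOT_IN_ROMEO = timedelta(days=365)


def next_recheck(
    failure: Failure,
    last_checked: date,
    source_corrected: bool = False,
) -> Optional[date]:
    """Return the earliest date to query RoMEO again, or None for 'do not recheck'."""
    if failure is Failure.NOT_A_JOURNAL_ARTICLE:
        # Outside RoMEO's scope: prevent rechecking altogether.
        return None
    if failure is Failure.DATA_ERROR:
        # Recheck only once the source data has been corrected;
        # returning last_checked means the recheck is due immediately.
        return last_checked if source_corrected else None
    # No publisher or no policy in RoMEO: recheck infrequently.
    return last_checked + RECHECK_NOT_IN_ROMEO
```

A scheduled job would then skip any record whose stored recheck date is None or still in the future, rather than resubmitting every failed query on each cycle.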