Securing Text and Data Mining Rights in Licensing Scholarly Materials
The NERL Perspective
Christine M. Stamison, MLIS
Director, NorthEast Research Libraries Consortium
Presentation for ALCTS CRS Interest Group
American Library Association Conference
Chicago, June 25, 2017

The NERL Consortium
• 30 core, dues-paying members
– Primarily located in the Northeast, with others scattered across the Midwest and West Coast
– Only core members have voting rights
• 116 affiliate libraries
– Located all over the US, plus two in Canada
– Affiliates can participate in a license only if the publisher allows it

The NERL Mission
• To foster and support the educational and research missions of its member institutions by coordinating, consolidating, and negotiating the best possible licensing terms and prices for electronic resources.

Text and Data Mining
• The practice of searching through large amounts of computerized data to find useful patterns or trends
– Merriam-Webster Dictionary, https://www.merriam-webster.com/dictionary/data%20mining
• Text mining usually involves the process of structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluation and interpretation of the output.
– Wikipedia, https://en.wikipedia.org/wiki/Text_mining

Why TDM Matters
• Researchers in all subject areas expect to be able to mine data
• Researchers are looking for answers not overtly expressed in the literature
• TDM classes are taught regularly in institutional settings
– They’re already doing it!
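The Wikipedia definition above describes text mining in three stages: structuring the input text, deriving patterns from the structured data, and evaluating the output. The Python sketch below illustrates those stages under simplifying assumptions. It treats plain-text input as the corpus and uses word-frequency counting as the "pattern", a deliberately simple stand-in for real TDM methods. The function names and the stopword list are hypothetical and are not drawn from any publisher's API.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list used only for illustration.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "for", "on"}


def structure(text: str) -> list[str]:
    """Stage 1: structure the input by lowercasing, tokenizing into
    alphabetic words, and removing stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def derive_patterns(documents: list[str]) -> Counter:
    """Stage 2: derive a simple pattern (aggregate term frequencies)
    across all documents in the corpus."""
    counts: Counter = Counter()
    for doc in documents:
        counts.update(structure(doc))
    return counts


def evaluate(counts: Counter, top_n: int = 3) -> list[tuple[str, int]]:
    """Stage 3: interpret the output. Only aggregate counts are returned;
    the original text is consumed, not reproduced."""
    return counts.most_common(top_n)


if __name__ == "__main__":
    corpus = [
        "Researchers mine the literature for patterns.",
        "Patterns in the literature reveal new research questions.",
    ]
    print(evaluate(derive_patterns(corpus)))
```

The output is a short list of terms and counts, the kind of data that feeds a graph or word cloud, rather than any verbatim passage from the mined articles.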

TDM Output
• The full text of copyrighted materials is consumed, not expressed
• Results are expressed as graphs, word clouds, and other types of representation
• In most cases, copyright governs the reproduction of materials, not the way they are analyzed (Peter Leonard, Director, Digital Scholarship Services, Yale University)

Licensing TDM
• Separate and lengthy TDM licenses are unnecessary
• Output is subject to the same terms and conditions as any other research undertaken with licensed resources
• Lawful use of the content when employing TDM is governed by the negotiated license agreement
• “Right to read is the right to mine” (Daniel Dollar, presentation at the Charleston Conference, 2015, “Text and Data Mining Licensing Issues”)

TDM Falls Under Fair Use
• Courts have upheld TDM as Fair Use in the following cases:
– Authors Guild v. HathiTrust, 755 F.3d 87 (2d Cir. 2014): HathiTrust digitized works for inclusion in a database that enabled data mining and textual analysis and made it easier to identify and locate sources of information.
– Authors Guild v. Google, 954 F. Supp. 2d 282 (S.D.N.Y. 2013): Google digitally scanned books in the collections of partner libraries and incorporated the works into a searchable database that could be used by scholars and researchers. The search results included “snippets” of text an eighth of a page long. The Southern District of New York explicitly referenced the benefit of Google Books to TDM, noting the project “transformed the book text into data for the purpose of substantive research, including data mining and text mining in new areas, thereby opening up new fields of research. Words in books are being used in a way that they have not been used before.”
(Association of Research Libraries, Issue Brief, Text and Data Mining and Fair Use in the United States, prepared by Krista L. Cox, director of public policy initiatives, June 5, 2015)

What to Include?
• A brief amendment/annex, or TDM terms within the license itself
• Process for TDM
– API, or delivery for use on an external-facing server
• What can be done with the output
• Any exceptions to authorized use
• Scholarly sharing?
• Eventual destruction of data?
• Consequences of breach

Tips for Licensing TDM
• Don’t accept anything that restricts Fair Use or copyright law
– “Licensee and Authorized Users may make all use of the Licensed Materials as consistent with the United States Copyright Act of 1976 as amended (17 U.S.C. §101, et seq.) including all limitations on and exceptions to the exclusive rights as granted therein.” (Wording from Joan Emmet, MLS, JD, Yale University)

Tips for Licensing TDM
• Try to define TDM the way you want it defined
Before:
“Text and Data Mining” means to perform extensive automated searches of subscribed Content, including data embodied therein, the sorting, parsing, addition or removal of linguistic structures, and the selection and inclusion of subscribed Content into an index or database for purposes of classification or recognition of relations and associations.
“TDM Output” means the result of any Text and Data Mining activity or operation, capable of fixation, reproduction and/or communication in any form, including without limitation the creation of an index, reference, abstract, relative or absolute description or representation of subscribed Content, an algorithm, formula, metrics, method, standard or taxonomy describing or based on subscribed Content, a relational expression or measurement, whether scalable or not, of subscribed Content, extraction, alternative representation or translation, expression or discussion of any extracts from mined subscribed Content, whether in the form of a direct extraction or a representation in any form which is based on subscribed Content.

Tips for Licensing TDM
• We tweaked the definitions and added a definition for TDM Materials:
After:
“Text and Data Mining” (“TDM”) means performing automated searches, selection of content, and structured analyses of subscribed Content including data embodied therein, the sorting, parsing, addition or removal of linguistic structures, and the selection and inclusion of discrete parts of subscribed Content into another form for purposes of classification or recognition of relations, patterns, and associations, the extraction, alternative representation or translation, expression or discussion of any extracts from mined subscribed Content, whether in the form of a direct extraction or a representation in any form.
“TDM Materials” means the materials, data and information created for or during the Text and Data Mining that are based on the subscribed Content.
“TDM Output” means the data and information which is the result of any Text and Data Mining, however excluding any verbatim duplication of the subscribed Content in whole or in part, except for de minimis use.

Tips for Licensing TDM
• Licenses often include language limiting externally facing output (to 200 characters, etc.); this limits Fair Use
Before:
“Where TDM results or conclusions are made available to non-Authorized Users on an externally facing website as a result of a search query, and where such outputs include original, copyright protected material, only a snippet of that original, copyright protected material may be displayed/presented. For the purposes of this ANNEX B, a snippet shall mean an extract that is no more than 150 characters. All snippets must cite the appropriate journal as the source of the material;”
Before:
make the results of any TDM Output available on an externally facing server or website as long as this inclusion consists only of a few lines of query-dependent text of individual full text items of subscribed Content (e.g. extracts from articles or book chapters) which shall be in any event shorter than 200 characters or 15 words or 1 complete sentence or limited to bibliographic metadata. In no event shall the TDM Output contain links to access substantial parts of a full-text work or database of subscribed Content beyond the above limitation;

Tips for Licensing TDM
• After our negotiations:
After:
“Where TDM results or conclusions are made available to non-Authorized Users on an externally facing website as a result of a search query, and where such outputs include original, copyright protected material, no more than that permitted by the limitations and exceptions of the U.S. Copyright Act (17 U.S.C. §101, et seq.) may be displayed/presented. Any quotations or portions of original article text must cite to the appropriate journal as the source of the material.”
After:
“Nothing in the language of Sections 1 and 2 of the TDM License shall be interpreted to inhibit any limitation or exception as provided by US copyright law (17 U.S.C. § 101, et seq.).”
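The two “Before” clauses set fixed thresholds: an extract of no more than 150 characters, or query-dependent text shorter than 200 characters or 15 words or 1 complete sentence. The hypothetical Python helpers below sketch how a project might test an extract against those thresholds. They assume the strictest reading of the second clause, in which all three limits apply at once, and they count sentences crudely by terminal punctuation. Neither assumption comes from the license text itself.

```python
import re


def within_150_char_snippet(extract: str) -> bool:
    """First 'Before' clause: a snippet is an extract of no more than 150 characters."""
    return len(extract) <= 150


def within_query_dependent_limit(extract: str) -> bool:
    """Second 'Before' clause under an assumed strict reading: the extract must be
    shorter than 200 characters AND shorter than 15 words AND at most one sentence."""
    words = extract.split()
    # Crude sentence split on ., ! or ? followed by whitespace (an assumption;
    # abbreviations such as "e.g." would be miscounted).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", extract.strip()) if s]
    return len(extract) < 200 and len(words) < 15 and len(sentences) <= 1


if __name__ == "__main__":
    quote = "Words in books are being used in a way that they have not been used before."
    print(within_150_char_snippet(quote), within_query_dependent_limit(quote))
```

Under this reading, the 16-word sentence quoted from the Google Books decision fits within the 150-character snippet but fails the 15-word limit. This shows how quickly fixed thresholds can cut off even a single quoted sentence, whereas the negotiated wording defers to the limitations and exceptions of the Copyright Act.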

Tips for Licensing TDM
• TDM financials
– Most libraries are already paying a premium for content
– TDM should be part of the cost of doing business
– Inability to mine is a type of embargo (restriction) on using the content that will increasingly undermine the value of the library’s investment in that content
– While most publishers will not charge extra for TDM rights via an API, some publishers will charge for delivering data to external servers
• Negotiate a price that covers only the Licensor’s work to prepare and deliver such copies, on a time-and-materials basis

Tips for Licensing TDM
• “Text and Data Mining. Authorized Users may use the Licensed Materials to perform and engage in text and/or data mining activities for academic research, scholarship, and other educational purposes, utilize and share the results of text and/or data mining in their scholarly work, and make the results available for use by others, so long as the purpose is not to create a product for use by third parties that would substitute for the Licensed Materials. Licensor will cooperate with Licensee and Authorized Users as reasonably necessary in making the Licensed Materials available in a manner and form most useful to the Authorized User. If Licensee or Authorized Users request the Licensor to deliver or otherwise prepare copies of the Licensed Materials for text and data mining purposes, any fees charged by Licensor shall be solely for preparing and delivering such copies on a time and materials basis.”
• Duke University Press and Opinion Archives have consented to this language for NERL.
• http://liblicense.crl.edu/licensing-information/model-license/

Tips for Licensing TDM
• Perpetual access for TDM downloads
– If your library paid for perpetual access to materials, you should also have perpetual access to TDM downloads/materials
– Researchers want to be able to run algorithms over the data
• Some suggested wording: “Removal of locally-loaded copies of subscribed Content downloaded for TDM or TDM Materials: Unless retention of subscribed Content is otherwise permitted under the License Agreement, upon termination of the TDM License under clause …”

Other Examples of TDM
• On the path to good TDM:
– Bloomsbury: “Text Mining. Authorized Users may use the licensed material to perform and engage in text mining/data mining activities for legitimate academic research and other educational purposes.”
– Sage: “Text Mining. Authorized Users may use the licensed material to perform and engage in text mining/data mining activities for legitimate academic research and other educational purposes. Those uses beyond educational use shall require SAGE’s permission.”

Some Final Thoughts
• Have substantive conversations with publishers about what you need and why
• Some publishers borrow wording from other publishers’ TDM policies because they don’t have one of their own yet
• Remember: publishers have a valid concern about their content and want to protect it from piracy
• These conversations can take a long time. Keep the conversation civil and keep the door open.

Some Final Thoughts
• Don’t agree to limits on Fair Use
• Don’t be afraid to change some definitions
• Protect your perpetual access
• We need NISO best practices for TDM (working on it now!)

Thank you for your attention. I welcome your feedback.
Christine M. Stamison
Director, NERL