Text and Data Mining - ALA Connect

Securing Text and Data Mining
Rights in Licensing Scholarly
Materials
The NERL Perspective
Christine M. Stamison, MLIS
Director, NorthEast Research Libraries Consortium
Presentation for ALCTS CRS Interest Group
American Library Association Conference
Chicago, June 25, 2017
The NERL Consortium
• 30 Core Dues-Paying Members
– Primarily located in the Northeast but
also scattered in the Midwest and on the
West Coast
– Only core members have voting rights
• 116 Affiliate Libraries
– All over the US and two in Canada
– Affiliates can participate in a license only if
the publisher allows it
American Library Association Conference
ALCTS CRS Interest Group
The NERL Mission
• To foster and support the educational
and research missions of its member
institutions by coordinating,
consolidating, and negotiating the
best possible licensing terms and
prices for electronic resources.
Text and Data Mining
• The practice of searching through large amounts
of computerized data to find useful patterns or
trends – Merriam-Webster Dictionary, https://www.merriam-webster.com/dictionary/data%20mining
• Text mining usually involves the process of structuring
the input text (usually parsing, along with the addition of
some derived linguistic features and the removal of
others, and subsequent insertion into a database),
deriving patterns within the structured data, and finally
evaluation and interpretation of the output. – Wikipedia,
https://en.wikipedia.org/wiki/Text_mining
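The three stages in the definition above (structure the input, derive patterns, interpret the output) can be sketched in a few lines. This is only an illustration under stated assumptions: the sample documents, the tiny stop list, and the function names are hypothetical, and real TDM would run over licensed full text with far richer linguistic processing.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical sample documents; real TDM would run over licensed full text.
DOCS = {
    "doc1": "Libraries license content. Libraries negotiate licenses.",
    "doc2": "Researchers mine licensed content for patterns.",
}

# Tiny illustrative stop list (an assumption, not a standard list).
STOPWORDS = {"for", "the", "a", "of", "and"}


def tokenize(text):
    """Parse: lowercase the text, split into word tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_index(docs):
    """Structure the input: store per-document term counts in a database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE term_counts (doc_id TEXT, term TEXT, n INTEGER)")
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        conn.executemany(
            "INSERT INTO term_counts VALUES (?, ?, ?)",
            [(doc_id, term, n) for term, n in counts.items()],
        )
    return conn


def top_terms(conn, limit=3):
    """Derive a pattern: the most frequent terms across all documents."""
    rows = conn.execute(
        "SELECT term, SUM(n) AS total FROM term_counts "
        "GROUP BY term ORDER BY total DESC, term ASC LIMIT ?",
        (limit,),
    )
    return rows.fetchall()


conn = build_index(DOCS)
print(top_terms(conn))  # [('content', 2), ('libraries', 2), ('license', 1)]
```

Note that the result is a table of counts rather than a copy of the text, which is the kind of derived output a researcher would go on to evaluate and interpret.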
Why TDM Matters
• Researchers in all subject areas
expect to be able to mine data
• Researchers looking for answers not
overtly expressed in the literature
• TDM classes are taught regularly in
institutional settings
– They're already doing it!
TDM Output
• The full text of copyrighted materials is
consumed, not expressed
• Output is expressed as graphs, word clouds,
and other types of representation
• In most cases, copyright governs the
reproduction of materials, not the way
they are analyzed
(Peter Leonard, Director, Digital Scholarship Services – Yale University)
Licensing TDM
• Separate and lengthy TDM licenses are
unnecessary
• Output is subject to same terms and
conditions as when undertaking any research
using licensed resources
• Making lawful use of the content when
employing TDM is subject to the negotiated
license agreement.
• “Right to read is the right to mine”
(Daniel Dollar presentation: Charleston Conference, 2015, “Text and Data Mining
Licensing Issues”)
Licensing TDM Falls Under
Fair Use
• Courts have upheld TDM as Fair Use in
the following:
– Authors Guild v. HathiTrust, 755 F.3d 87 (2d Cir. 2014) – HathiTrust digitized
works for inclusion in a database that enabled data mining and textual analysis
and made it easier to identify and locate sources of information.
– Authors Guild v. Google, 954 F.Supp.2d 282 (S.D.N.Y. 2013) – Google digitally
scanned books in the collections of partner libraries and incorporated the works
into a searchable database that could be used by scholars and researchers. The
search results included “snippets” of text an eighth of a page long. The Southern
District of New York explicitly referenced the benefit of Google Books to TDM,
noting the project “transformed the book text into data for the purpose of
substantive research, including data mining and text mining in new areas,
thereby opening up new fields of research. Words in books are being used in
a way that they have not been used before.”
(Association of Research Libraries, Issue Brief, Text and Data Mining and Fair Use in the United States, prepared
by Krista L. Cox, Director of Public Policy Initiatives, June 5, 2015)
What to Include?
• Brief amendment/annex or terms of the license
• Process for TDM – via API, or content delivered
for use on an externally facing server
• What can be done with the output
• Any type of exceptions to authorized use
• Scholarly sharing?
• Eventual destruction of data?
• Consequences of breach
Tips for Licensing TDM
• Don’t Accept Anything That Restricts
Fair Use/Copyright Law
– Licensee and Authorized Users may make all use of the Licensed
Materials as consistent with the United States Copyright Act of 1976 as
amended (17 U.S.C. §101, et seq.) including all limitations on and
exceptions to the exclusive rights as granted therein. (Wording from
Joan Emmet, MLS, JD – Yale University)
Tips for Licensing TDM
• Try to define TDM the way you wish
Before: “Text and Data Mining” means to perform extensive automated
searches of subscribed Content, including data embodied therein, the sorting,
parsing, addition or removal of linguistic structures, and the selection and
inclusion of subscribed Content into an index or database for purposes of
classification or recognition of relations and associations.
“TDM Output” means the result of any Text and Data Mining activity or
operation, capable of fixation, reproduction and/or communication in any form,
including without limitation the creation of an index, reference, abstract,
relative or absolute description or representation of subscribed Content, an
algorithm, formula, metrics, method, standard or taxonomy describing or based
on subscribed Content, a relational expression or measurement, whether
scalable or not, of subscribed Content, extraction, alternative representation or
translation, expression or discussion of any extracts from mined subscribed
Content, whether in the form of a direct extraction or a representation in any
form which is based on subscribed Content.
Tips for Licensing TDM
• We tweaked the definitions and added a definition for
TDM Materials:
After: “Text and Data Mining” (“TDM”) means performing automated
searches, selection of content, and structured analyses of subscribed Content
including data embodied therein, the sorting, parsing, addition or removal of
linguistic structures, and the selection and inclusion of discrete parts of
subscribed Content into another form for purposes of classification or
recognition of relations, patterns, and associations, the extraction, alternative
representation or translation, expression or discussion of any extracts from
mined subscribed Content, whether in the form of a direct extraction or a
representation in any form.
“TDM Materials” means the materials, data and information created for or during the
Text and Data Mining that are based on the subscribed Content.
“TDM Output” means the data and information which is the result of any Text
and Data Mining, however excluding any verbatim duplication of the
subscribed Content in whole or in part, except for de minimis use.
Tips for Licensing TDM
• Often licenses include language to limit externally facing
output (200 characters, etc.); this limits Fair Use
Before: “Where TDM results or conclusions are made available to non-Authorized
Users on an externally facing website as a result of a search query, and where such
outputs include original, copyright protected material, only a snippet of that original,
copyright protected material may be displayed/presented. For the purposes of this
ANNEX B, a snippet shall mean an extract that is no more than 150 characters.
All snippets must cite the appropriate journal as the source of the material;”
Before: make the results of any TDM Output available on an externally facing server
or website as long as this inclusion consists only of a few lines of query-dependent
text of individual full text items of subscribed Content (e.g. extracts
from articles or book chapters) which shall be in any event shorter than 200
characters or 15 words or 1 complete sentence or limited to bibliographic
metadata. In no event shall the TDM Output contain links to access substantial
parts of a full-text work or database of subscribed Content beyond the above
limitation;
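To make the character caps in the "Before" clauses concrete, here is a minimal sketch of what a 150-character snippet rule does to a passage. The function name and the sample sentence are hypothetical, and cutting at a word boundary is an assumption made for readability; neither clause says how an extract should be cut.

```python
def cap_snippet(text, max_chars=150):
    """Truncate text to at most max_chars characters.

    Assumption: prefer cutting at the last space so the snippet
    does not end mid-word; fall back to a hard cut if there is none.
    """
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    # If the cap falls inside a word, back up to the previous space.
    if text[max_chars] != " " and " " in cut:
        cut = cut[: cut.rfind(" ")]
    return cut.rstrip()


# Hypothetical passage standing in for licensed article text.
passage = (
    "Text and data mining lets researchers look for answers that are not "
    "overtly expressed in the literature, by analyzing patterns across "
    "thousands of articles rather than reading each one individually."
)

snippet = cap_snippet(passage)
print(len(passage), len(snippet))
print(snippet)
```

Passing `max_chars=200` approximates the character limit in the second clause; its alternative limits of 15 words or one complete sentence are not modeled here.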
Tips for Licensing TDM
• After our negotiations:
After: “Where TDM results or conclusions are made available to non-Authorized
Users on an externally facing website as a result of a search
query, and where such outputs include original, copyright protected
material, no more than that permitted by the limitations and
exceptions of the U.S. Copyright Act (17 U.S.C. §101, et seq.) may be
displayed/presented. Any quotations or portions of original article text
must cite to the appropriate journal as the source of the material.”
After: Nothing in the language of Sections 1 and 2 of the TDM License
shall be interpreted to inhibit any limitation or exception as provided by
US copyright law, (17 U.S.C. § 101, et seq.).
Tips for Licensing TDM
• TDM Financials
– Most libraries are paying a premium for content
– TDM should be part of the cost of doing business
– Inability to mine is a type of embargo (restriction) on
using the content that will increasingly undermine the
value of the library’s investment in that content
– While most publishers will not charge extra for TDM
rights when using an API, some publishers will charge for
delivering data for external servers
• Negotiate a price for Licensor solely to prepare and
deliver such copies on a time and materials basis
Tips for Licensing TDM
• “Text and Data Mining. Authorized Users may use the Licensed Materials to perform
and engage in text and/or data mining activities for academic research, scholarship,
and other educational purposes, utilize and share the results of text and/or data mining
in their scholarly work, and make the results available for use by others, so long as the
purpose is not to create a product for use by third parties that would substitute for the
Licensed Materials. Licensor will cooperate with Licensee and Authorized Users as
reasonably necessary in making the Licensed Materials available in a manner and form
most useful to the Authorized User. If Licensee or Authorized Users request the
Licensor to deliver or otherwise prepare copies of the Licensed Materials for text and
data mining purposes, any fees charged by Licensor shall be solely for preparing and
delivering such copies on a time and materials basis.”
• Duke University Press and Opinion Archives have consented to this language for
NERL.
• http://liblicense.crl.edu/licensing-information/model-license/
Tips for Licensing TDM
• Perpetual Access for TDM downloads
– If your library paid for perpetual access to materials, you
should also have perpetual access to TDM
downloads/materials
– Researchers want to be able to run algorithms through
data
• Some suggested wording:
“Removal of locally-loaded copies of subscribed Content downloaded for
TDM or TDM Materials: Unless retention of subscribed Content is
otherwise permitted under the License Agreement, upon termination
of the TDM License under clause …”
Other Examples of TDM
• On the path to good TDM:
• Bloomsbury
• “Text Mining. Authorized Users may use the licensed
material to perform and engage in text mining/data
mining activities for legitimate academic research and
other educational purposes.”
• Sage
• “Text Mining. Authorized Users may use the licensed
material to perform and engage in text mining/data
mining activities for legitimate academic research and
other educational purposes. Those uses beyond
educational use shall require SAGE’s permission.”
Some Final Thoughts
• Have substantive conversations with
publishers on what you need and why
• Some publishers use wording from other
publishers' TDM policies because they don't have one
of their own yet
• Remember: Publishers have a valid concern
about their content and want to protect it from
piracy
• These conversations can take a long time. Keep the
conversation civil and keep the door open.
Some Final Thoughts
• Don’t agree to limiting Fair Use
• Don’t be afraid to change some definitions
• Protect your perpetual access
• We need NISO best practices for TDM –
working on it now!
Thank you for your attention.
I welcome your feedback.
Christine M. Stamison
Director, NERL