Draft August 11, 2016
Research and Creative Projects: Studying the “undetermined” category in the Copyright
Review Management System (CRMS)
Cumulative CRMS-US Exported Determinations (January 2015)
PD Determinations: 164,983 (52.8%)
IC Determinations: 53,957 (17.3%)
UND/NFI Determinations: 93,738 (30.0%)
Total Determinations: 312,678
Introduction
The Copyright Review Management System (CRMS) is a project designed to determine the
copyright status of books in the HathiTrust digital library. Beginning in 2008, CRMS-US
focused on reviewing books published in the United States between 1923 and 1963. By late 2014, the
CRMS-US project had reviewed over 310,000 volumes and identified roughly 164,000 as public
domain. An additional significant category emerged during the CRMS-US project: “und/nfi”
(undetermined/need further investigation), covering volumes that present copyright
questions too complicated for the streamlined, production-oriented CRMS
process. Over the course of the CRMS-US project, over 90,000 volumes were designated
und/nfi, and reviews for approximately 45,000 of these volumes noted the presence of “inserts.”
When we talk about an insert, we may be referring to a range of materials. To cite a few
examples, inserts include individual photographs, illustrations, and articles or chapters that may
have been previously published in other works. The inserts issue greatly complicates copyright
review. At its most fundamental, the insert issue is frequently an information problem. We often
can make a copyright determination for a given volume, but the copyright status of component
parts may be impossible to determine or require extensive research. Rather than attempt to
determine the copyright status of individual inserts, CRMS-US set aside volumes containing
inserts.
Working with two student researchers, we conducted a closer examination of approximately
300 volumes that were initially determined to be und/nfi. Specifically, we sought to better
understand the likelihood that individual inserts were in copyright. We studied the independent
copyright status of individual inserts within a given volume, creating a representation of the
volume and its component parts. In studying inserts more closely, we’ve taken a first step
towards evaluating the costs and challenges represented by these materials in our collections. We
hope our observations will help inform reasonable solutions to the question of inserts and the
challenges they represent.
Key findings:

Based on the data we’ve collected, we can offer greater copyright determinacy estimates
for volumes reviewed by the CRMS-US project team.

Less than 10% of inserts in our sample appear to have been renewed.

Images are a particularly difficult area for copyright review. Copyright renewal records
do not include actual images and publicly available reverse image search technologies
(Google Image Search; TinEye) are not advanced enough for this type of research.
Without additional metadata from the volume being reviewed, or a more advanced image
search tool, image research will be very costly and is often likely to be inconclusive.

The absence of publication date information is the greatest obstacle to determining the
renewal status of individual inserts. For works published prior to 1950, this is directly
related to our limited ability to search the Catalog of Copyright Entries.

The high cost of renewal research for inserts makes it a poor solution to the inserts
problem; alternate solutions are necessary.
1. We can offer revised copyright determinacy estimates for the CRMS-US project.
Figure 1: Number of und/nfi volumes found to have been renewed at the volume level.
One feature of the CRMS-US review process was to first check for inserts prior to consulting the
renewal records found in the Stanford Copyright Renewal Database. If a volume was determined
“und/nfi” due to inserts, CRMS-US reviewers stopped the review and did not determine whether
the volume itself had been renewed.
To gauge the impact of this feature of the CRMS-US process, we instructed our reviewers to first
determine the copyright status of each volume in our sample. Of the 305 volumes we reviewed,
we found that 30% of the volumes determined to be und/nfi (“inserts”) had actually been
renewed at the volume level and were therefore in copyright. Our findings help us offer an
estimated revision of copyright determinacy for CRMS-US. If an estimated 30% of the
45,704 und/nfi (“inserts”) volumes (13,711 volumes) are in fact in copyright, our determinacy shifts as well,
reducing the und/nfi category by 4.4 percentage points:
Revised Estimate for CRMS-US Exported Determinations (January 2015)
PD Determinations: 164,983 (52.8%)
IC Determinations: 67,668 (21.6%)
UND/NFI Determinations: 80,027 (25.6%)
Total Determinations: 312,678
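The revised table above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the counts and the 30% share come from the text, while the variable names are our own, and it assumes the sample's 30% volume-level renewal rate generalizes to all 45,704 und/nfi (“inserts”) volumes.

```python
# Cumulative CRMS-US exported determinations (January 2015), from the text.
baseline = {"PD": 164_983, "IC": 53_957, "UND/NFI": 93_738}

inserts_volumes = 45_704   # und/nfi volumes set aside because of inserts
renewed_share = 0.30       # share of the sample renewed at the volume level

# Assumption: the sample rate applies to the whole und/nfi ("inserts") group.
moved_to_ic = round(inserts_volumes * renewed_share)

# Volumes renewed at the volume level move from UND/NFI to IC.
revised = dict(baseline)
revised["IC"] += moved_to_ic
revised["UND/NFI"] -= moved_to_ic

total = sum(revised.values())
for category, count in revised.items():
    print(f"{category}: {count:,} ({count / total:.1%})")
print(f"Total: {total:,}")
```

Changing `renewed_share` shows how sensitive the revised table is to the sample estimate.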
2. We can estimate the number of volumes with inserts that are likely in the public domain
[This will have to be revised if these numbers are different]
Figure 2: Percentage of non-renewed volumes with at least one renewed insert.
We’ve established that approximately 30% of the und/nfi (“inserts”) volumes were in fact in
copyright. During the course of the project, we collected information to estimate the percentage
of non-renewed volumes that contained at least one renewed insert. We also gave our reviewers
the ability to note when they simply did not have enough information to determine whether a
volume contained renewed inserts. With these options, our reviewers estimated that 47% of
non-renewed volumes did not contain even one renewed insert.
The data represented above is relevant to the estimated 70% of the 45,704 und/nfi (“inserts”)
volumes (31,993 volumes) that were not renewed at the volume level. Applying these
percentages, we can estimate that 6,397 und/nfi volumes would likely not contain enough
information to make a determination; 10,557 would contain at least one renewed insert; and
15,039 volumes would likely be in the public domain. Adding these estimates to our revised
estimates above, we arrive at the following determinacy rates:
Estimated CRMS-US Exported Determinations (January 2015)
PD Determinations: 180,022 (57.6%)
IC Determinations: 78,225 (25.0%)
UND/NFI Determinations: 54,431 (17.4%)
Total Determinations: 312,678
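As a check on the table above, the following sketch redistributes the 31,993 non-renewed und/nfi volumes using the volume counts given in the text. The dictionary keys and variable names are illustrative, and the implied shares (roughly 20%, 33%, and 47%) are derived here from those counts rather than quoted from Figure 2.

```python
# Revised estimate after moving volume-level renewals into IC (from the text).
revised = {"PD": 164_983, "IC": 67_668, "UND/NFI": 80_027}

# Estimated outcomes for the und/nfi volumes not renewed at the volume level.
not_renewed = 31_993
outcomes = {
    "not_enough_info": 6_397,   # would remain undetermined
    "renewed_insert": 10_557,   # at least one renewed insert, treated as IC
    "public_domain": 15_039,    # no renewed insert found
}
assert sum(outcomes.values()) == not_renewed

# Shares implied by the counts above (derived, not taken from the figure).
implied_shares = {name: count / not_renewed for name, count in outcomes.items()}

estimate = dict(revised)
estimate["PD"] += outcomes["public_domain"]
estimate["IC"] += outcomes["renewed_insert"]
estimate["UND/NFI"] -= outcomes["public_domain"] + outcomes["renewed_insert"]

total = sum(estimate.values())
for category, count in estimate.items():
    print(f"{category}: {count:,} ({count / total:.1%})")
```

Only the “not enough information” volumes stay in the UND/NFI row, alongside the und/nfi volumes that were never part of the inserts group.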
3. Less than 10% of the inserts we studied appear to have been renewed.
Figure 3: Breakdown of renewal findings for studied inserts.
Over the course of our examination of works containing inserts, we found that less than 10% of
the inserts we studied appeared to have been renewed. In arriving at this finding, we attempted to
err on the side of overestimating the number of renewed inserts. We directed our researchers to
limit their review of inserts to 15 inserts and focus their attention on those inserts that, in their
experience and estimation, were most likely to be renewed. Even with this focus on the inserts
most likely to have been renewed, our researchers found renewal records for less than 10% of the
inserts studied.
This renewal rate of under 10% for inserts is largely consistent with the renewal rates found in Barbara
Ringer’s 1961 study of copyright renewal.
In practice, this means that a very small percentage of potentially renewed inserts are currently
preventing access to a significant number of works. In most cases, the risk of a renewed insert is
the only thing keeping works closed. In many others, a single renewed insert, on one page of a
several-hundred-page work, is keeping the entire work closed.
It should be noted that the roughly 45,000 CRMS-US books currently closed due to inserts are the
tip of a very large iceberg. If we continue to keep entire volumes closed due only to the possible
presence of an in-copyright insert, we will continue to limit access to major portions of our
collections. This is an area where solutions are needed; any solution must recognize that the vast
majority of inserts are likely to be in the public domain.
4. Images are a particularly difficult area for copyright review.
Our findings suggest that copyright review of images is not likely to be successful. We found
that photographs, artwork, and maps are often poorly credited when compared to other types of
inserts. When examining image titles, it is often unclear whether the image was created by a
third party author or the author of the volume. Author names, an important component of
renewal research, are similarly unclear.
Of the 580 image inserts reviewed over the course of the study (photographs, artwork, and
maps), only 10 were marked as renewed, or 1.7%. This can be compared to the 10.4% renewal
rate for non-image inserts. In addition, “not enough information” determinations
accounted for 54.1% of image insert reviews, compared to only 9.8% for non-image inserts.
Image renewal research is undermined by several information problems. One problem is that
photography companies are frequently listed instead of specific photographers. While company
names can be used for copyright research, we generally cannot know whether company names or
specific authors would be listed in renewal records. To compound the problem, photography
companies are often no longer in business, making more detailed research extremely
difficult. Our research tools make this problem even worse: the Catalog of Copyright Entries,
one of our primary resources for renewal research, contains only text entries, so it can be
challenging or impossible to connect a renewal record with the actual image it represents.
Unfortunately, there are no clear avenues for improvement in this area. Advances in reverse
image search engines may help this situation, but the lack of quality in-volume
acknowledgements and the inconsistent credit provided with images are not problems that can be
easily solved. Until image databases are truly comprehensive and image search capabilities are
more advanced, renewal research for images is not a good use of limited resources.
5. The absence of publication date information is the greatest obstacle to determining the
renewal status of individual inserts.
The Catalog of Copyright Entries (CCE) is currently browsable and searchable only in a very
narrow sense: volume by volume, year by year. In order to perform renewal research in the CCE,
we need to know the publication date of the insert.
As shown above, the lack of an identifiable publication date accounted for the majority of our
“not enough information” determinations. Put simply, if we don’t know when a given insert was
published, we have very little way of knowing if the copyright in that insert was renewed.
In order to resolve this problem, a full-text searchable version of the CCE is necessary. The
creation of a full-text searchable CCE will be resource intensive, but it is vital for even a basic
understanding of insert renewal.
6. Copyright review is not a sustainable solution to the insert issue.
Our findings have made us even more certain that determining the copyright status of individual
inserts is not a good path forward for libraries. Alternate solutions are needed.
To begin with, performing individual copyright review of inserts is incredibly time consuming.
Even after we instructed our researchers to limit their review of volumes to 15 inserts per
volume, review time for each volume averaged nearly an hour (59.77 minutes). At a $12/hour
rate, reviewing only 45,000 volumes would cost approximately $538,000, simply to arrive at an
estimated public domain determination.
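The cost figure is a simple product of volume count, average review time, and hourly rate, which comes to roughly $537,900 for 45,000 volumes. The sketch below is a rough illustration, not a budgeting tool: the function name and parameters are our own, the inputs come from the text, and the per-volume figure at the end assumes the estimated 15,039 public domain volumes discussed in this report.

```python
def review_cost(volumes, minutes_per_volume=59.77, hourly_rate=12.0):
    """Estimated labor cost in dollars of reviewing `volumes` volumes."""
    hours = volumes * minutes_per_volume / 60
    return hours * hourly_rate

# Cost at the rounded scale used in the text.
cost_sample_scale = review_cost(45_000)

# Cost for every und/nfi ("inserts") volume.
cost_all = review_cost(45_704)

# Assumption: about 15,039 of those volumes end up in the public domain,
# so the cost of each newly opened volume is the total spread over them.
cost_per_pd_volume = cost_all / 15_039

print(f"45,000 volumes: ${cost_sample_scale:,.0f}")
print(f"45,704 volumes: ${cost_all:,.0f}")
print(f"Per public domain volume: ${cost_per_pd_volume:,.2f}")
```

The estimate covers reviewer time only; it leaves out training, supervision, and tooling, so actual costs would likely be higher.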
Even with additional copyright review, of the 45,704 und/nfi (“inserts”) volumes, only an
estimated 15,039 volumes would likely be found to be in the public domain. The high cost and
continued uncertainty of copyright review of inserts make this type of review a problematic and
costly approach for libraries.
Notes for Conclusion
As volumes become more complex (serials and newspapers are obvious examples), the number
of inserts in a given work often increases significantly.
Solutions to the insert issue should:

not treat inserts as a binary “on/off” switch for access to the volumes in which they are
found;

be based on an understanding that the vast majority of inserts are likely in the public
domain; and

account for high transaction costs in identifying in-copyright inserts (if there were a
licensing regime, it would be very difficult and costly for the Collective Management
Organization (CMO) to pay funds out properly).
Fair use may be the most appropriate solution, given the high costs of differentiating in-copyright
from public domain works and the poor fit between compensation and the incentives for
creating the inserts (these inserts were created long ago, and the connection between the insert
author’s incentive to create the insert and the use is very attenuated). It would be unreasonable and
unnecessary to create a market through extended collective licensing (ECL). There are other ways to
incentivize authorship and compensate rights holders, such as a capped tax deduction for
authors/rights holders who can establish that they hold rights in renewed inserts.