Draft August 11, 2016

Research and Creative Projects: Studying the “undetermined” category in the Copyright Review Management System (CRMS)

Cumulative CRMS-US Exported Determinations (January 2015)
PD Determinations: 164,983 (52.8%)
IC Determinations: 53,957 (17.3%)
UND/NFI Determinations: 93,738 (30.0%)
Total Determinations: 312,678

Introduction

The Copyright Review Management System (CRMS) is a project designed to determine the copyright status of books in the HathiTrust digital library. Beginning in 2008, CRMS-US focused on reviewing books published in the United States between 1923 and 1963. By late 2014, the CRMS-US project had reviewed over 310,000 volumes and identified nearly 165,000 as public domain.

An additional significant category emerged during the CRMS-US project: “und/nfi” (undetermined/need further investigation), covering volumes that presented copyright questions too complicated for the streamlined, production-oriented CRMS process. Over the course of the CRMS-US project, over 90,000 volumes were designated und/nfi, and reviews for approximately 45,000 of these volumes noted the presence of “inserts.” An insert may be any of a range of materials; examples include individual photographs, illustrations, and articles or chapters that may have been previously published in other works.

The inserts issue greatly complicates copyright review. At its most fundamental, the insert issue is frequently an information problem. We can often make a copyright determination for a given volume, but the copyright status of its component parts may be impossible to determine or may require extensive research. Rather than attempt to determine the copyright status of individual inserts, CRMS-US set aside volumes containing inserts.

Working with two student researchers, we have carried out a closer examination of approximately 300 volumes that were initially determined to be und/nfi.
Specifically, we sought to better understand the likelihood that individual inserts were in copyright. We studied the independent copyright status of individual inserts within a given volume, creating a representation of the volume and its component parts. In studying inserts more closely, we’ve taken a first step toward evaluating the costs and challenges these materials represent in our collections. We hope our observations will help inform reasonable solutions to the question of inserts.

Key findings:

- Based on the data we’ve collected, we can offer revised copyright determinacy estimates for volumes reviewed by the CRMS-US project team.
- Less than 10% of inserts in our sample appear to have been renewed.
- Images are a particularly difficult area for copyright review. Copyright renewal records do not include actual images, and publicly available reverse image search technologies (Google Image Search; TinEye) are not advanced enough for this type of research. Without additional metadata from the volume being reviewed, or a more advanced image search tool, image research will be very costly and is often likely to be inconclusive.
- The absence of publication date information is the greatest obstacle to determining the renewal status of individual inserts. For works published prior to 1950, this is directly related to our limited ability to search the Catalog of Copyright Entries.
- The high cost of renewal research for inserts makes it a poor solution to the inserts problem; alternate solutions are necessary.

1. We can offer revised copyright determinacy estimates for the CRMS-US project.

Figure 1: Number of und/nfi volumes found to have been renewed at the volume level.

One feature of the CRMS-US review process was to check for inserts before consulting the renewal records found in the Stanford Copyright Renewal Database.
If a volume was determined “und/nfi” due to inserts, CRMS-US reviewers stopped the review and did not determine whether the volume itself had been renewed. To gauge the impact of this feature of the CRMS-US process, we instructed our reviewers to first determine the copyright status of each volume in our sample. Of the 305 volumes we reviewed, we found that 30% of the volumes determined to be und/nfi (“inserts”) had actually been renewed at the volume level and were therefore in copyright.

Our findings let us offer an estimated revision of copyright determinacy for CRMS-US. If an estimated 30% of the 45,704 und/nfi volumes (13,711 volumes) are in fact in copyright, our determinacy shifts as well, reducing the und/nfi category by 4.4 percentage points (from 30.0% to 25.6%):

Revised Estimate for CRMS-US Exported Determinations (January 2015)
PD Determinations: 164,983 (52.8%)
IC Determinations: 67,668 (21.6%)
UND/NFI Determinations: 80,027 (25.6%)
Total Determinations: 312,678

2. We can estimate the number of volumes with inserts that are likely in the public domain

[This will have to be revised if these numbers are different]

Figure 2: Percentage of non-renewed volumes with at least one renewed insert.

We’ve established that approximately 30% of the und/nfi (“inserts”) volumes were in fact in copyright. During the course of the project, we collected information to estimate the percentage of non-renewed volumes that contained at least one renewed insert. We also gave our reviewers the ability to note when they simply did not have enough information to determine whether a volume contained renewed inserts. With these options, our reviewers estimated that 47% of non-renewed volumes did not contain even one renewed insert.

The data represented above apply to the estimated 70% of the 45,704 und/nfi (“inserts”) volumes (31,993 volumes) that were not renewed at the volume level.
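
The revised estimate in section 1, and the 31,993 non-renewed volumes cited above, can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, and it assumes the 30% volume-level renewal rate observed in the 305-volume sample applies uniformly to all 45,704 und/nfi (“inserts”) volumes.

```python
# Counts from the cumulative CRMS-US exported determinations (January 2015).
PD, IC, UND = 164_983, 53_957, 93_738
TOTAL = PD + IC + UND  # 312,678

INSERT_VOLUMES = 45_704      # und/nfi volumes set aside because of inserts
VOLUME_RENEWAL_RATE = 0.30   # assumed: sample rate applied to the whole set


def reallocate_renewed(ic, und, insert_volumes, rate):
    """Move the estimated volume-level renewals from und/nfi to in copyright."""
    renewed = round(insert_volumes * rate)
    return ic + renewed, und - renewed, renewed


def share(count, total=TOTAL):
    """Percentage of all determinations, rounded to one decimal place."""
    return round(100 * count / total, 1)


new_ic, new_und, renewed = reallocate_renewed(
    IC, UND, INSERT_VOLUMES, VOLUME_RENEWAL_RATE
)
not_renewed = INSERT_VOLUMES - renewed  # volumes still needing insert review

print(f"Renewed at the volume level: {renewed:,}")
print(f"IC:      {new_ic:,} ({share(new_ic)}%)")
print(f"UND/NFI: {new_und:,} ({share(new_und)}%)")
print(f"Not renewed at the volume level: {not_renewed:,}")
```

Keeping the rate as a named parameter makes it easy to see how sensitive the revised categories are to the sample-based 30% figure.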
Applying these percentages, we can estimate that 6,397 und/nfi volumes (roughly 20% of the 31,993) would likely not contain enough information to make a determination; 10,557 (roughly 33%) would contain at least one renewed insert; and 15,039 (roughly 47%) would likely be in the public domain. If we add these estimates to our revised estimates, we arrive at the following determinacy rates.

Estimated CRMS-US Exported Determinations (January 2015)
PD Determinations: 180,022 (57.6%)
IC Determinations: 78,225 (25.0%)
UND/NFI Determinations: 54,431 (17.4%)
Total Determinations: 312,678

3. Less than 10% of the inserts we studied appear to have been renewed.

Figure 3: Breakdown of renewal findings for studied inserts.

Over the course of our examination of works containing inserts, we found that less than 10% of the inserts we studied appeared to have been renewed. In arriving at this finding, we attempted to err on the side of overestimating the number of renewed inserts. We directed our researchers to limit their review to 15 inserts per volume and to focus their attention on those inserts that, in their experience and estimation, were most likely to be renewed. Even with this focus on the inserts most likely to have been renewed, our researchers found renewal records for less than 10% of the inserts studied. This renewal rate of under 10% is largely consistent with the renewal rates found in Barbara Ringer’s 1961 study of copyright renewal.

In practice, this means that a very small percentage of potentially renewed inserts are currently preventing access to a significant number of works. In most cases, the risk of a renewed insert is the only thing keeping works closed. In many others, a single renewed insert, on one page of a several-hundred-page work, is keeping the entire work closed.
It should be noted that the roughly 45,000 CRMS-US books currently closed due to inserts are the tip of a very large iceberg. If we continue to keep entire volumes closed due only to the possible presence of an in-copyright insert, we will continue to limit access to major portions of our collections. This is an area where solutions are needed; any solution must recognize that the vast majority of inserts are likely to be in the public domain.

4. Images are a particularly difficult area for copyright review.

Our findings suggest that copyright review of images is not likely to be successful. We found that photographs, artwork, and maps are often poorly credited when compared to other types of inserts. When examining image titles, it is often unclear whether the image was created by a third-party author or by the author of the volume. Author names, an important component of renewal research, are similarly unclear.

Of the 580 image inserts reviewed over the course of the study (photographs, artwork, and maps), only 10, or 1.7%, were marked as renewed. This can be compared to the 10.4% renewal rate for non-image inserts. In addition, reviews marked “not enough information” accounted for 54.1% of image insert reviews, compared to only 9.8% for non-image inserts.

Image renewal research is undermined by several information problems. One problem is that photography companies are frequently listed instead of specific photographers. While company names can be used for copyright research, we generally cannot know whether company names or specific authors would be listed in renewal records. To compound the problem, photography companies are often no longer in business, making more detailed research extremely difficult.
Our research tools make this problem even worse: the Catalog of Copyright Entries (CCE), one of our primary resources for renewal research, contains only text entries, so it can be challenging or impossible to connect a renewal record with the actual image it represents.

Unfortunately, there are no clear avenues for improvement in this area. Advances in reverse image search engines may help, but the lack of quality in-volume acknowledgements and the inconsistent credit provided with images are not problems that can be easily solved. Until image databases are truly comprehensive and image search capabilities are more advanced, renewal research for images is not a good use of limited resources.

5. The absence of publication date information is the greatest obstacle to determining the renewal status of individual inserts.

The CCE is currently browsable and searchable only in a very narrow sense: volume by volume, year by year. In order to perform renewal research in the CCE, we need to know the publication date of the insert. As shown above, the lack of an identifiable publication date accounted for the majority of our “not enough information” determinations. Put simply, if we don’t know when a given insert was published, we have very little way of knowing whether the copyright in that insert was renewed. To resolve this problem, a full-text searchable version of the CCE is necessary. Creating one will be resource intensive, but it is vital for even a basic understanding of insert renewal.

6. Copyright review is not a sustainable solution to the insert issue.

Our findings have made us even more certain that determining the copyright status of individual inserts is not a good path forward for libraries. Alternate solutions are needed.
To begin with, performing individual copyright review of inserts is extremely time-consuming. Even after we instructed our researchers to limit their review to 15 inserts per volume, review time for each volume averaged nearly an hour (59.77 minutes). At a $12/hour rate, reviewing only 45,000 volumes would cost approximately $537,000, simply to arrive at an estimated public domain determination.

Even with additional copyright review, of the 45,704 und/nfi (“inserts”) volumes, only an estimated 15,039 volumes would likely be found to be in the public domain. The high cost and continued uncertainty of copyright review of inserts make this type of review a problematic and costly approach for libraries.

Notes for Conclusion

As volumes become more complex (serials and newspapers are obvious examples), the number of inserts in a given work often increases significantly.

Solutions to the insert issue should:

- not treat inserts as a binary “on/off” switch for access to the volumes in which they are found;
- be based on an understanding that the vast majority of inserts are likely in the public domain;
- account for the high transaction costs of identifying in-copyright inserts (if there were a licensing regime, it would be very difficult and costly for the Collective Management Organization (CMO) to pay funds out properly).

Fair use may be the most appropriate solution, given the high costs of differentiating in-copyright from public domain works and the poor fit between compensation and the incentives for creating the inserts (these inserts were created long ago, and the connection between the insert author’s incentive to create the insert and the use is very attenuated).

It would be unreasonable and unnecessary to create a market through extended collective licensing (ECL).

There are other ways to incentivize authorship and compensate rights holders (for example, a capped tax deduction for authors/rights holders who can establish that they hold rights in renewed inserts).
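
The cost figure in section 6 is a simple product of volume count, average review time, and hourly rate. The sketch below reproduces it under the stated inputs (45,000 volumes, 59.77 minutes per volume, $12/hour). The function name `review_cost` is hypothetical, and the calculation assumes the sample's average review time holds across the whole set and ignores overhead such as training or supervision.

```python
def review_cost(volumes, minutes_per_volume, hourly_rate):
    """Estimated labor cost in dollars for reviewing `volumes` volumes."""
    hours = volumes * minutes_per_volume / 60
    return hours * hourly_rate


# Inputs taken from section 6 of the report.
MINUTES_PER_VOLUME = 59.77
HOURLY_RATE = 12.0

cost = review_cost(45_000, MINUTES_PER_VOLUME, HOURLY_RATE)
print(f"Estimated cost for 45,000 volumes: ${cost:,.0f}")

# The same calculation for the full set of 45,704 und/nfi ("inserts") volumes.
full_cost = review_cost(45_704, MINUTES_PER_VOLUME, HOURLY_RATE)
print(f"Estimated cost for 45,704 volumes: ${full_cost:,.0f}")
```

The first result comes out just under $538,000, in line with the approximate figure given in section 6.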