
MONETIZING MEDIA AND ENTERTAINMENT
CONTENT ARCHIVES
The case for disk-based object storage as a low-cost, low-maintenance alternative to tape-based libraries.
ABSTRACT
This white paper explains the value of object storage in media and entertainment archive
workflows. The paper compares the architectural requirements and 3-5 year maintenance
requirements for media content archives on robotic tape libraries and disk-based object storage.
The paper also reviews how cloud storage enables new revenue streams for content owners.
September 2016
WHITE PAPER
The information in this publication is provided “as is.” EMC Corporation makes no representations or warranties of any kind with
respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a
particular purpose.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
EMC2, EMC, the EMC logo, Elastic Cloud Storage, and ECS are registered trademarks or trademarks of EMC Corporation in the United
States and other countries. All other trademarks used herein are the property of their respective owners. © Copyright 2016 EMC
Corporation. All rights reserved. Published in the USA. 08/2016 white paper H15484.
EMC believes the information in this document is accurate as of its publication date. The information is subject to change without
notice.
EMC is now part of the Dell group of companies.
TABLE OF CONTENTS
EXECUTIVE SUMMARY
Audience
MEDIA AND ENTERTAINMENT INDUSTRY TRANSFORMATION
How Did We Get Here?
What Has Changed?
Bring The Cloud Home
Tape Challenges
Media Archive Transformation
CONCLUSION: WHAT IS THE TOTAL COST OF OWNERSHIP FOR TAPE?
EXECUTIVE SUMMARY
Magnetic tape was one of the earliest methods for storing digital data. Originally it was a great choice because other methods for long-term storage of
digital data weren’t really viable. Memory was extremely expensive and required constant power, disks were very large, had very low storage density
(and were extremely expensive and not that reliable), and other methods including flexible disks and paper tape were too limiting. Backing up and
archiving data to magnetic tape was the obvious choice—and for many years supported a thriving technology community.
Tape drive and media types proliferated and speeds and capacities grew rapidly. In addition to many different proprietary tape solutions that were
available, companies also began to adapt consumer tape technologies such as digital cassettes and 8mm helical-scan technologies for the backup and
archive market. At the high end, enterprise formats such as IBM 3480 and 3490 and the StorageTek SD-3 (Redwood) dominated the market.
Accompanying this booming tape backup and archive market was a plethora of backup and archive software—all designed with tape drives and tape
automation in mind. Most packages didn’t support other technologies, but that was soon to change.
Tape started feeling the pressure from competing technologies, particularly random access media. For a brief time, optical disc libraries looked like a
good replacement for tape, thanks to reasonable initial capacity points, automation with jukeboxes, and the promise of long media life. Unfortunately,
increases in density were slow to arrive and the technology ultimately proved too expensive.
In parallel there were huge advances in hard disk technology. Form factors shrank quickly, density increased dramatically, reliability improved by
orders of magnitude, and $/MB decreased to levels competitive with tape.
AUDIENCE
This white paper is intended for CIOs, CTOs, engineering staff, and business decision makers of media companies comparing tape and alternative
storage methods as a long-term retention format.
MEDIA AND ENTERTAINMENT INDUSTRY TRANSFORMATION
HOW DID WE GET HERE?
Tape has been the core medium for storing archival media content. This process started when videotape was used as the primary acquisition format
(replacing film). Subsequently the “master” and “copy master” videotapes were stored on shelves in archive vaults. In the past decade, the media and
entertainment industry has rapidly evolved to support digital production and file-based “tapeless” post-production workflows. In the rush to go “tapeless,”
why has the industry ended up with petabytes of archival and near-line media content stored on cantankerous robotic tape library archives? The answer
is simple, at first: tape is cheap. Maybe too cheap, because it is commonplace for many duplicates of the same content to occupy space in LTO
libraries today; it is often cheaper simply to archive content again than to compare it against what is already archived. While most media and entertainment business
stakeholders need to ensure that their data is always available for content repurposing, content revisions, or compliance reasons, they also need the
absolute lowest cost and highest density for their near-line archive tier.
For years, tape was the only answer. Servers were expensive. The disk arrays required for suitable data protection were also too expensive for media
content archive purposes. More importantly, the move to high-definition video content has pushed the archive requirements for large media businesses
into the multi-petabyte range. The power and cooling requirements for a multi-petabyte archive pushed the industry away from disk-based systems to
tape-based systems. For the first generation of near-line media archives, robotic LTO tape libraries were the only solution that made sense.
As the media and entertainment industry moves to 4K resolution and virtual/augmented reality content formats, the storage and archive requirements for
media content grow exponentially. However, as the storage requirements of the media and entertainment industry continue to grow, industry revenue
has not grown accordingly. Those in the media industry are constantly challenged to find innovative ways to stay ahead of the technology curve, find
new ways to monetize their content, and ensure that their content is preserved for future business opportunities. Effectively, the industry is challenged to
“do more with less”.
WHAT HAS CHANGED?
The public cloud has changed everything. The economy of scale found in the public cloud has changed the economics of multi-petabyte archives. The
shared cost of infrastructure, compute resources, power, cooling, and engineering is generally much lower in the public cloud compared to the operational
cost of maintaining a tape-based archive1. The public cloud brings the promise of “limitless” scale with minimal engineering and technical support
requirements. Specifically in the media industry, large and highly functional public video archives such as Google’s YouTube have proven what is
achievable with a cloud-scale media archive – now reported to be ingesting new content at a rate of greater than 1PB per day.
The public cloud promises far more than lower storage costs. By moving data to the public cloud, collaboration between business units operating in
different geographies of the world is simplified. Media companies with multiple geographic locations no longer need to orchestrate bandwidth-expensive
transfers of large media files over the WAN to collaborate and share media content – the entire media content archive is visible in all geographies and
content can be immediately retrieved by cloud-ready applications without waiting on a robotic tape library.
Of course, the economy of scale found in the public cloud comes at a cost. While it is quite cost effective to store content in the cloud, most media and
entertainment professionals find the cost of data retrieval back from the cloud prohibitive. For this reason, the public cloud is often compared to the
“Hotel California” or “Roach Motel” – it’s cheap and easy to get data in, easy to stay, but your data can never leave without paying significant data egress
fees. The public cloud is a cost-effective long-term media archive platform only if you give up your freedom to use the applications of your choice and
move your entire workflow to the cloud. The public cloud is often the only choice for start-up media and entertainment companies lacking a technical
infrastructure and competent engineering staff; however, it can be a problematic solution for media companies that want to maintain a competitive
advantage in the marketplace. Further, not all applications easily port to the cloud. While cloud-enabled video editing solutions have existed for years,
market acceptance for these solutions is relatively low due in part to issues such as latency and the inability to select shots that are in focus when
working with very low resolution proxy video files.
BRING THE CLOUD HOME
What options exist for media companies wishing to leverage cloud storage in their workflow using best-of-breed applications not found on the public
cloud? Enter Dell EMC’s Elastic Cloud Storage. ECS is a multi-purpose cloud storage platform with enterprise-grade capabilities that the public cloud
does not (or cannot) provide. ECS blends the best features of next-gen object storage with traditional storage features required for media workflows in
transition, such as native NFSv3 access support for objects created via the S3 API and strong consistency between geo-distributed data center
locations. ECS also offers advanced features not available in public cloud offerings, such as byte-range updates, atomic appends, rich ACLs, in-place
data analytics, and metadata search and management capabilities. With the introduction of the higher density, lower cost ECS D-Series, the media
and entertainment world finally has an on-premises cloud storage platform that is cost competitive with multi-petabyte LTO tape libraries.
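For illustration, the following is a minimal sketch of what this dual-protocol access could look like in practice. The endpoint URL, credentials, bucket name, object key, and NFS mount point are assumptions made for the example, not documented ECS defaults; it assumes the bucket has been created with an NFSv3 export enabled.

    # Hypothetical sketch: ingest a master file via the S3 API, then read the
    # same content back through an NFSv3 mount of the same ECS bucket.
    # Endpoint, credentials, bucket, key, and mount path are illustrative.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://ecs.example.com:9021",   # assumed ECS S3 endpoint
        aws_access_key_id="OBJECT_USER",
        aws_secret_access_key="SECRET_KEY",
    )

    # Write a finished master file into the archive bucket as an object.
    with open("/ingest/ep101_master.mxf", "rb") as src:
        s3.put_object(Bucket="media-archive", Key="show42/ep101/master.mxf", Body=src)

    # A file-based tool can later open the same object through the bucket's
    # NFSv3 export (the mount point below is an assumption for this sketch).
    with open("/mnt/media-archive/show42/ep101/master.mxf", "rb") as f:
        header = f.read(65536)
    print("Read", len(header), "bytes of the master via NFS")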
ECS is built from the ground up as a software-defined storage solution that can also be deployed on a turn-key commodity server infrastructure. The
unique architecture of ECS allows compute resources and storage resources to scale independently. Like the public cloud, ECS can span multiple
geographic locations with a single storage infrastructure. ECS can distribute parity data across multiple locations in order to provide enhanced data
availability and resiliency while reducing the WAN traffic between data center locations typically associated with replication of geo-distributed storage
systems.
Why object storage? Object storage is gaining rapid adoption, not simply because of its prevalence on the public cloud. The object storage architecture
is rapidly becoming mandatory for applications that must manage large, constantly growing repositories of data for long-term retention. Traditional
storage systems are like parking lots. You need to find an open space to park your data and when a volume is full, like a parking lot, you need to build
another one. When it’s time to retrieve your data, just like when you retrieve your car from a parking lot, you must remember where you parked it.
Object storage is less like a parking garage and more like valet parking. When you need to retrieve your data, you simply need the name of the object
and an authentication key; the object store tracks the active location of your data, regardless of an outage at the data center where the object was
ingested or any other event affecting data availability.
Traditional storage systems, like parking garages, require the creation of a hierarchy prior to storing millions or billions of media objects. Just as you can
only fit a certain number of cars on a single level of a parking garage, a media archive system requires the creation of an arbitrary folder structure to
break up the archive contents. The archive system must do this to avoid hitting client operating system limits on how many files can be stored in a
single directory before file enumeration and indexing become problems. In an object-based storage system, the application
simply provides the authentication key for accessing the object store. Like a valet, the object store maintains a pairing between the object ID provided
by the application at ingest and the disk location of the object data. The archive application simply requests the object by name to retrieve data, and
the object-based storage system independently handles data protection and lifecycle management of the data on disk. For more information on the
underpinnings of the ECS architecture, please see the Dell EMC Elastic Cloud Storage (ECS) Overview and Architecture White Paper.
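To make the “valet” comparison concrete, here is a short, illustrative sketch contrasting the two access models. The directory scheme, bucket, object key, endpoint, and credentials are hypothetical and exist only for this example.

    # "Parking lot": the archive application invents a hierarchy (for example
    # year/month/show) so no single directory accumulates millions of files,
    # and it must remember exactly where each asset was parked.
    import os
    import boto3

    asset_path = os.path.join("/archive", "2016", "09", "show42", "ep101_master.mxf")
    with open(asset_path, "rb") as f:
        data = f.read()

    # "Valet": the application keeps only the object name and its credentials;
    # the object store tracks where the data actually lives, across disks,
    # nodes, and sites.
    s3 = boto3.client(
        "s3",
        endpoint_url="https://ecs.example.com:9021",   # assumed endpoint
        aws_access_key_id="OBJECT_USER",
        aws_secret_access_key="SECRET_KEY",
    )
    obj = s3.get_object(Bucket="media-archive", Key="show42/ep101_master.mxf")
    data = obj["Body"].read()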
TAPE CHALLENGES
Data stored on tape is not accessible until the tape is retrieved by a robotic arm and loaded into an available tape drive, which must then seek to the
data's location on the tape. Data must be identified, located, and predictively restored from tape back to disk before it can be used by media applications,
so the data in the archive cannot actually be used directly “in-place”. The problem of data accessibility is only the tip of the iceberg when dealing with tape.
Tape-based multi-petabyte archives present a host of implementation and maintenance issues.

1 Source: IDC report, “Quantifying the Business Value of Amazon Web Services”
The most obvious problem with large tape libraries is the physical footprint of the library frame and real estate required for frame expansions. Tape
libraries are huge, the size of a bus in some cases. While robotic tape libraries are relatively energy efficient with low cooling requirements, the price of
real estate is on the rise. This problem is even more acute in desirable media center locations like Hollywood, Vancouver, New York, and Soho.
While a robotic tape library requires a large humidity-controlled room dedicated to the tape library frame and storage for mirrored tapes that will be
exported and eventually shipped off-site, the ECS D6200 packs over 6 PB of raw disk storage into a single rack and the capacity can be expanded by
adding additional racks at any location, in any data center. While tape libraries require inefficient mirrored writes to two tapes for data protection, ECS
uses a hybrid protection scheme of erasure coding, triple mirroring, and XOR data reduction that gains ever greater raw disk utilization
efficiency, availability, and data protection reliability as you expand ECS to more data center locations. Offsite transportation, storage, backup
administration, and audit of tapes add further complexity to tape archiving.
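As a rough, hedged illustration of the protection-overhead difference, assume an erasure coding layout of 12 data fragments plus 4 coding fragments (a common object storage scheme; actual ECS layouts depend on configuration): protecting 12 TB of content then consumes 16 TB of raw disk, an overhead of 16/12 ≈ 1.33x, while mirrored writes to two LTO tapes consume 2.0x raw capacity for the same content before any off-site handling costs are counted.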
Anyone familiar with operating a robotic tape library will tell you that the worst problem, and the most unpredictable expense, is maintenance. Errors on a
tape library can be commonplace. Robotic arm failures and robotic arm track alignment issues caused by floor sagging as the weight of the library
continues to grow are vexing issues to troubleshoot. Drive failures and the downtime required to fix robotics issues can lead to missed deadlines and
data availability issues that not only consume inordinate amounts of engineering time but can also result in lost revenue.
Tape degrades with every use. One major US-based broadcaster reports that over 90% of the tapes in their library that contain data have been read for
restores from tape in day to day operations. This high duty cycle on reads, effectively using tape as a near-line tier of storage, results in unexpected
expenses for tape replacement and excessive engineering support caused by restore failures. In many cases, restore failures can result in lost time for
operations staff forced to retrieve the original media content – should it still exist.
LTO tape has an aggressive roadmap – new tape formats with higher capacity and greater throughput are always on the horizon. When media
companies reach the maximum tape slot capacity on their tape libraries, they typically have the option to expand their library frame to accommodate
more tape slots. If the library has been in operation for 3-5 years or more than 2 generations of LTO have been introduced since the library was
purchased, a more economical option than a library frame expansion is a tape migration.
In order to maximize the capacity and performance of a tape library, media companies embark on a migration where petabytes of data are migrated from
an older LTO format to the most cost-effective LTO format available at the time. These migrations can take months or years depending on the volume of
the library and number of tape drives added to the library for the migration. Obviously, a background process to read every single tape in the library and
write it to a new tape stresses the system to the point where maintenance becomes a routine issue. The added load on the tape library can also affect
the performance of the archive system as it tries to keep up with growth in the media company’s current workflows.
Pulling mirror copy tapes back from a disaster recovery site due to data restore failures during a migration between LTO generations is also a common
issue. Tape drive failures during the high duty cycles of a tape migration are also a common occurrence. Any error in the asset tracking system where
the location of a media file on a tape is stored can result in lost media content. The myth of a 15-30 year lifecycle for LTO media does not ring true when
one looks at the economics of an always-online, constantly growing media content archive. Tape migrations every 3-5 years are a reality that media
companies should only have to endure once, and that’s when the data is migrated to disk!
Tape often requires a dedicated application environment for managing the robotics control interface and tracking the tape locations of petabytes worth of
media content. These applications can be costly to install and even more costly to maintain due to rising software license support costs. Media
companies are under increasing pressure to move their content to the cloud to escape these rising support costs. Luckily, ECS provides an option for
those who want to reap the benefits of cloud storage without risking the out-of-control expenses associated with data retrieval from the public cloud.
MEDIA ARCHIVE TRANSFORMATION
The greatest benefit for media and entertainment companies moving from a tape-based archive to a disk based archive is the instant availability of their
media content. Media stored on tape cannot be monetized without a planned and scheduled retrieval from a robotic tape library. Consider for a moment
the rapid proliferation of Over the Top (OTT) media content distribution services like Netflix and Hulu. A content owner wishing to monetize their
existing archive of media content must plan a large-scale transcoding and repackaging project every time a new content distributor, Video on Demand
(VOD) format, or disruptive media technology comes to market.
For a media content owner relying on a tape-based archive, the process of repackaging a large library of media content must start with the painstaking
process of restoring the relevant content from LTO to a near-line storage tier where it can be read by the transcoding and repackaging application
services. This process can be highly disruptive to the content owner’s normal workflow, delaying archive and restore requests required for day-to-day
operations and media production tasks. Further, any delay introduced by mechanical failure on the tape library will result in a delay in all downstream
tasks like transcoding and content delivery. While most mechanical failures in a tape library such as a robotic arm failure or robotic arm track alignment
issues can be resolved within 24 hours, the compounding delays in archival data availability can have dire financial consequences for content
owners. Contracts for VOD or OTT content distribution can be time sensitive, with differing revenue rates for content delivered within 24 hours of linear
broadcast versus 48 hours after the premiere linear broadcast.
Contrast the plight of the content owner with a legacy LTO content archive versus a content owner with an archive on ECS. Let’s assume a hypothetical
project where the content owner must go back to their archive to repurpose all their HDTV content for a new business initiative requiring 4K up
conversion and repackaging. The up conversion cannot rely on the mezzanine-level compression typically used for OTT and VOD repackaging tasks.
The content owner must predictively restore all their contribution or archival quality content in the anticipated order of importance for
file-based 4K up conversion. Restoring an entire library of content could delay the completion of this project for months. The content owner with an
archive on ECS can instantly access any of their content without the need for predictive planning and scheduling. The ECS-based archive exposes a
world of options for this new business initiative, such as just-in-time up conversion based on real-time customer demand for archival content. The
transcode services can immediately access archival content for repurposing via S3 Application Program Interface (API) connectivity or the native NFSv3
interface of ECS.
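As a hedged sketch of what “just-in-time” repurposing could look like, the snippet below generates a short-lived pre-signed URL for an archived master on ECS and hands it directly to a transcoder. The endpoint, bucket, key, output settings, and the use of ffmpeg are assumptions for illustration, not a prescribed workflow.

    # Illustrative only: stream an archived master straight from ECS into a
    # transcode job, with no predictive restore to a near-line tier first.
    import subprocess
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://ecs.example.com:9021",   # assumed ECS S3 endpoint
        aws_access_key_id="OBJECT_USER",
        aws_secret_access_key="SECRET_KEY",
    )

    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "media-archive", "Key": "show42/ep101/master.mxf"},
        ExpiresIn=3600,  # long enough for the transcode job to start reading
    )

    # Hand the URL to the transcoder; here a hypothetical 4K up-conversion.
    subprocess.run([
        "ffmpeg", "-i", url,
        "-vf", "scale=3840:2160",
        "-c:v", "libx265",
        "/render/ep101_uhd.mp4",
    ], check=True)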
ECS offers increased operational efficiency for day-to-day media operations as well. Consider a legacy broadcast or media production archival
workflow. As we see in Fig. 1, a legacy media workflow leveraging a robotic LTO tape library for its archival storage tier requires a significant investment
in supporting hardware, software, licensing, and support. In the example workflow shown, new content is logged in the Media Asset Management
(MAM) system and ingested to an online tier Network Attached Storage (NAS) platform from client workstations connected via Server Message Block
(SMB) or Network File System (NFS). The ingested media content is inspected by media Quality Control (QC) services under API-controlled automation
from the MAM system. Once the media successfully passes QC testing under MAM automation, a task is raised by the MAM system to move one copy
of the ingested content to an LTO tape that will remain in the tape library in perpetuity, and a second copy of the ingested content is mirrored to an LTO
tape that will be exported from the tape library and eventually transported off-site to a Disaster Recovery (DR) location.
The MAM system will retain the original copy of the ingested content, or media asset, on the online NAS tier for up to 90 days past the last automated
task requiring the asset. By retaining this online copy even when it is no longer needed, the MAM system avoids the time-consuming and potentially
business-impacting wait to restore the media asset from tape should a user of the MAM system raise an unexpected task that requires it. In order for
the media asset to move from the online NAS file system to the archive storage tier on LTO tape, it is quite
common for a MAM system to require API integration with a Content Storage Management (CSM) system. The CSM typically consists of a database
that is in many ways redundant to the database used by the MAM system, an application layer somewhat redundant to the MAM system, a robotic tape
library control interface, and an array of Fibre Channel (FC)-attached data movement servers that facilitate the copying of media assets from online
storage or video servers to the LTO tape subsystem.
The data movement servers can write to the LTO tapes using the standardized Linear Tape File System (LTFS) or, in the case of robotic tape libraries
implemented prior to the recent adoption of LTFS, the data movement servers may write to LTO tape in a format that is proprietary to the CSM vendor.
In larger CSM installations with multiple data movement servers, a near-line NAS or Storage Area Network (SAN) is used as a temporary cache between
the online storage or video servers and the LTO tape library. The temporary cache ensures that the tape drives in the robotic tape library can read or
write at a constant data rate that the online storage tier may not be able to sustain while it is under heavy load from services such as MAM-automated
ingest, QC, transcode, or manual video editing operations.
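To put numbers on that, a hedged example: an LTO-7 drive streams at roughly 300 MB/s native, so keeping even four drives busy requires on the order of 1.2 GB/s of sustained reads from the cache; an online tier that is simultaneously serving ingest, QC, transcode, and editing traffic may not reliably deliver that, which is exactly why the dedicated near-line cache exists.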
Clearly, the added complexity of the servers, FC network infrastructure, CSM applications, CSM database, software licensing, and additional database
licensing dramatically increases the cost and maintenance requirements for operating a robotic tape library. The annual maintenance costs of the CSM
software alone can add six figures of operating expense per year for larger media companies. Another significant expense incurred by maintaining a
robotic tape library and CSM system is the near-line NAS or SAN required to act as a temporary cache between the online storage tier and the data
movement servers.
Figure 1. Legacy media and entertainment active archive architecture
Much has changed since media companies started building their first file-based workflows and media content archives. Today, MAM vendors are rapidly
adopting the S3 API in response to customer demand for workflows that leverage the cloud as their long-term media archive. While the economics of
retrieving data from an archive in the public cloud may not work for everyone, this shift in the media and entertainment industry opens new opportunities
for companies interested in ending the painful cycle of migrating their media content archives from one generation of LTO technology to the next.
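As an illustration of what adopting the S3 API can mean on the archive side, the sketch below shows a MAM-style archive task pushing a large master file into an ECS bucket with multipart upload. The endpoint, bucket, key naming, thresholds, and metadata fields are assumptions for the example rather than any particular vendor's integration.

    # Hypothetical archive task: copy an asset from the online NAS tier to an
    # ECS bucket, using multipart upload for large media files.
    import boto3
    from boto3.s3.transfer import TransferConfig

    s3 = boto3.client(
        "s3",
        endpoint_url="https://ecs.example.com:9021",   # assumed ECS S3 endpoint
        aws_access_key_id="OBJECT_USER",
        aws_secret_access_key="SECRET_KEY",
    )

    # boto3 switches to multipart upload automatically above this threshold.
    config = TransferConfig(multipart_threshold=64 * 1024 * 1024,
                            multipart_chunksize=64 * 1024 * 1024)

    s3.upload_file(
        "/online/show42/ep101_master.mxf",   # asset on the online NAS tier
        "media-archive",                     # ECS bucket acting as the archive
        "show42/ep101/master.mxf",           # object key recorded in the MAM
        Config=config,
        ExtraArgs={"Metadata": {"title": "EP101", "status": "qc-passed"}},
    )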
As the next-gen media workflow illustrated in Fig. 2 demonstrates, ECS not only simplifies the archive infrastructure, it enables critical
new cloud-enabled workflows not possible with a legacy robotic tape library.
Figure 2. Next-gen media content archive architecture using ECS.
Removing tape from the workflow removes the cost and complexity of the CSM, FC networking infrastructure, and near-line NAS or SAN. The real value
of replacing your robotic tape library with the ECS D-Series is the increased efficiency and productivity in your workflow. Valuable engineering
resources that used to lose time to unexpected tape library maintenance tasks can be focused on new, revenue-generating projects. Unfettered access
to ALL your media content at ANY time allows rapid deployment of new workflows and new revenue opportunities. Moving your archive to an on-premises
cloud storage platform enables you to leverage your content in next-generation web-based applications. A media content library stored on
tape has limited value to your business beyond the archive itself, while media content stored on cloud storage can be monetized immediately.
ECS is more than a simple replacement for aging robotic tape library technology. ECS is a veritable Swiss-army knife of a cloud storage solution. First,
ECS allows you to consolidate all your backup and archive storage requirements into a single platform. ECS is certified with a wide array of backup and
archive software solutions. ECS can also be used as a high-density tier of storage to seamlessly extend the capacity of other storage products like Isilon
with CloudPools (Fig. 3), Data Domain with Cloud Tiering, VNX with CloudArray, and Data Protection Suite (Avamar and NetWorker) with CloudBoost.
Windows applications can take advantage of the limitless capacity of ECS by mapping ECS storage to a drive letter with the free CIFS-ECS Tool
available on the EMC Support Portal. ECS is the ideal storage platform for next-generation application development; DevOps teams can leverage the
same technologies found in the public cloud, with the added benefit of enterprise-grade features like multi-tenancy and charge-back, and, most importantly,
without the proprietary public cloud developer services that lock your applications and workflow into a single cloud service provider.
Figure 3. Isilon CloudPools tiering to ECS. Folder data is tiered from primary Isilon to ECS as sharded, encrypted data, freeing up capacity in OneFS.
Only metadata “stubs” from primary Isilon are replicated to the 2nd-site IsilonSD Edge filesystem using SyncIQ replication, providing read-only access
to data tiered to ECS.
CONCLUSION: WHAT IS THE TOTAL COST OF OWNERSHIP FOR TAPE?
Media companies must place a price tag on the risk of data unavailability during a mechanical failure in a robotic tape library. For a linear broadcast
company, the delay in data availability could result in a missed air date, advertiser revenue loss, and legal fees. One must consider the cost of labor for
engineering support to maintain the library, especially once every 3-5 years when it's time to migrate to the latest LTO format. One must consider
the cost of archive software, software support, and licensing that may be redundant in functionality to the primary MAM system. One must
consider the operational expense of transportation, off-site storage, audit, and recall of mirrored tape copies. One must consider the cost of expensive
near-line NAS and SAN volumes required to cache media content for reads and/or writes to tape.
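One simple way to frame the comparison, using only the cost components named above, is to sum them over the same 3-5 year horizon: for tape, the library hardware and frame expansions, media and media replacement, CSM software and database licensing and support, the near-line NAS or SAN cache, FC infrastructure, off-site transportation and storage, migration labor, and the priced risk of downtime; for ECS, the racks, software, support, power, and cooling. The exact figures will vary by site, but the list of line items on each side is itself a large part of the answer.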
While we can't quantify the opportunities lost by keeping your valuable media content hidden on tape, we can help you calculate the total
cost of ownership for your current tape library compared to ECS. Try the latest version of ECS software for free for non-production use by visiting our
Free & Frictionless download. For more information, visit ECS online and contact a sales representative for a TCO analysis.