Riding Moore’s Law to Scale Live OTT Compression Solutions

White Paper
Abstract
Commercial encoding solutions have traditionally been built on FPGAs, later transitioning to ASICs or a mix of the two,
in order to maximize performance while minimizing cost and power utilization. This methodology has worked
well for the traditional linear broadcast environment, in which there was usually one encoded stream per video input.
Over-the-top (OTT) distribution uses Adaptive Bitrate (ABR) streaming, in which the encoder must produce multiple streams for
each video input so that the end user’s device can dynamically select the best stream for its network connection and device
profile. This simple change makes the legacy hardware model of one device per output stream obsolete: it does not scale well
when each video service delivered could have five or more profiles.
This paper describes a software architecture running on server platforms and utilizing purpose-built hardware acceleration. By
riding the commoditization of server hardware and the ever-increasing processing capabilities, this architecture is able to provide
the critical advantage of scale and channel density for OTT networks.
Introduction
Video distribution has come a long way since the days of analog distribution. With the advances of MPEG-2, H.264 and now
High-Efficiency Video Coding (HEVC) compression, it has become possible to fit more services at higher quality into traditional
distribution mediums such as satellite, terrestrial and cable. At the same time, there have been significant improvements in
available Internet bandwidth to the end user, with the average Internet speed in the U.S. reaching 8.7 Mbps by the middle of
2013 [1]. The number of services available to consumers has ballooned from the 1990s to today, where there is the potential for
thousands of live services to be available at any time to a connected home.
The combination of shrinking the needed bandwidth per service with compression and the rise in available Internet bandwidth
to the end user is leading to significant growth in streaming media, thus enabling providers to reach their customers easily with
content.
Software compression solutions have historically been the core of streaming media delivery networks, while hardware
compression solutions have been the core of linear delivery networks. Both software and hardware compression solutions have
their benefits and drawbacks. As linear and streaming services converge, it makes sense to look at architectures that leverage the
benefits of each solution and reduce the potential drawbacks.
Evolution of Compression Codecs
Over the last two decades, there has been widespread adoption of video compression for the distribution of video and audio
to consumers. MPEG-2 truly kicked off the migration from analog-only services to digital television and dramatically increased
the number of services available to the consumers. A switch from an analog transmission over satellite to DVB-S using MPEG-2
compression for standard definition (SD) services allowed providers to transmit eight services in place of a single analog service.
With the introduction of H.264 video encoding, the distribution density was further doubled as H.264 required only half of the
bitrate of MPEG-2 compression for the same resolution. While a doubling in density for SD services was achieved, there was also
a transition to high-definition television (HDTV) that required 3-4 times more bandwidth than MPEG-2 SD video. By switching
to H.264, distributors were able to reduce the bitrate increase to 1.5 to 2 times that of MPEG-2 SD. Telco service providers took
Delivering the Moment
imaginecommunications.com
advantage of H.264 compression to enter the home video distribution market over their existing asymmetric digital
subscriber line (ADSL) infrastructure.
Entering the third decade of video compression distribution, a new codec has been developed: HEVC. As with H.264, HEVC
doubles the density of services being distributed for the same resolution. But this new codec also brings with it the promise
of delivery of even higher resolution video with support for Ultra High Definition Television (UHDTV), which for UHDTV-1 at
2160p60 is four times the resolution and bandwidth of a 1080p60 HDTV signal. While there is the promise for UHDTV services,
the key advantage of HEVC will be for streaming services: the ability to provide a 720p or 1080p video at half the bitrate dramatically
reduces the cost of distribution for over-the-top (OTT) providers, while improving the experience for the end users.
As new codecs reduce the overall bandwidth for services, there is a cost in terms of the complexity of encoding the services
with the new video compression algorithm. Starting from MPEG-2 NTSC video, using H.264 at the same video resolution
increases complexity approximately tenfold, and using HEVC increases it tenfold again. Resolution changes add an
approximate sixfold increase for encoding 1080i60 over NTSC, and approximately a further sixfold for UHDTV-1 at
2160p60. Figure 1 shows how the various codecs compare in complexity with MPEG-2 SD; multiplying the factors
(10 × 10 × 6 × 6), HEVC UHDTV-1 has an increased complexity of 3,600 times.
Figure 1 – Complexity of codecs over time
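The arithmetic behind these relative complexity figures can be reproduced with a short Python sketch. It uses only the approximate multipliers quoted above; the dictionary and function names are illustrative and do not come from any encoder product.

```python
# Approximate multipliers quoted in the text, relative to MPEG-2 SD (NTSC).
CODEC_FACTOR = {"MPEG-2": 1, "H.264": 10, "HEVC": 10 * 10}
RESOLUTION_FACTOR = {"SD": 1, "HD 1080i60": 6, "UHDTV-1 2160p60": 6 * 6}


def relative_complexity(codec: str, resolution: str) -> int:
    """Return encoding complexity as a multiple of MPEG-2 SD."""
    return CODEC_FACTOR[codec] * RESOLUTION_FACTOR[resolution]


if __name__ == "__main__":
    # Print the full codec-by-resolution grid.
    for codec in CODEC_FACTOR:
        for resolution in RESOLUTION_FACTOR:
            factor = relative_complexity(codec, resolution)
            print(f"{codec:7s} {resolution:16s} {factor:5d}x")
```

Under these assumptions, H.264 at 1080i60 comes out at 60 times MPEG-2 SD, and HEVC at 2160p60 at 3,600 times.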
Linear Services
Linear services are typically defined as live services carried throughout the distribution chain using MPEG-2 transport streams
(TS). They are usually distributed continuously as a dedicated channel. For example, the services a consumer would
watch over the air on a digital terrestrial television (DTT) system or through a cable, direct-to-home (DTH) or Internet protocol
television (IPTV) provider are typically linear services. Video on demand (VOD) and other on-demand services are not usually
considered linear services, as they are not served out in a linear fashion.
With the move from analog to digital, there was a dramatic increase in the number of channels available to the consumer, from
tens of channels in the early 1990s to hundreds of channels today. In 2008, the average American subscribed
to 118 channels over cable [2].
Providers often utilize a dedicated encoder per service, but with the increase in the number of services, denser modular
encoding solutions have become more popular. They use less rack space, they save on power and they can provide built-in
redundancy to ease operational complexity for the providers.
There has been a move toward using transcoding solutions to take the services received from content providers and transcode
them to the formats and bitrates required for their end customers. The advantage of transcoding technologies is they tend to
have greater density than encoders as they have no baseband input processing requirements.
With the availability of new video resolutions, there tends to be a significant amount of simulcast for the majority of services.
Today, some providers create a down-converted SD version of their HDTV services to support legacy customers. This simulcast
effect is pushing providers to the denser solutions, as they are not only dealing with an increase in services, but also with multiple
versions of each service.
Streaming Services
While linear services are considered dedicated services over MPEG-2 TS, streaming services are defined as programs destined for
the end user over IP. Streaming services used to be offered in multiple formats and codecs, and the user would select the codec
and format that worked with the combination of their PC and Internet connection. This method of selection was mostly trial and
error from the user’s perspective, resulting in a significant number of buffering messages and a fairly poor user experience.
Today, streaming services have improved dramatically, whereby the user no longer needs to try to figure out the best stream
for their connection. Depending on the provider, there can still be platform issues on the receiving side based on the DRM or
packaging selection. Overall, these changes have enhanced the end user experience.
With Adaptive Bitrate (ABR), a provider creates multiple profiles for each service, ranging from the lowest tier of service to the highest.
Some services have anywhere from four to eight or more profiles. The profiles vary depending on the provider, but can go from Quarter
Common Intermediate Format (QCIF) all the way up to 1080p. The client device tunes to the service and selects a default profile; using
its own algorithm, it then switches to the best-available quality given the available bandwidth and the bitrates of the profiles. It can move
from profile to profile near seamlessly, settling on the best-available profile for the end user.
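A minimal Python sketch of one possible client-side selection rule follows. Each client uses its own algorithm, so this is an illustration rather than any particular player's logic. The profile ladder, its bitrates and the 0.8 safety margin are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    bitrate_kbps: int  # hypothetical bitrate, for illustration only


# Hypothetical ladder; real providers choose their own resolutions and bitrates.
LADDER = [
    Profile("QCIF", 150),
    Profile("360p", 800),
    Profile("720p", 2500),
    Profile("1080p", 5000),
]


def select_profile(ladder, measured_kbps, safety=0.8):
    """Pick the highest-bitrate profile that fits the bandwidth budget.

    The budget is the measured bandwidth scaled by a safety margin
    (an assumed heuristic). If nothing fits, fall back to the lowest profile.
    """
    budget = measured_kbps * safety
    ordered = sorted(ladder, key=lambda p: p.bitrate_kbps)
    fitting = [p for p in ordered if p.bitrate_kbps <= budget]
    return fitting[-1] if fitting else ordered[0]


if __name__ == "__main__":
    for bandwidth in (100, 3000, 8700):
        print(bandwidth, "kbps ->", select_profile(LADDER, bandwidth).name)
```

A real client would re-run a rule like this as its bandwidth estimate changes, which is what lets it move between profiles during playback.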
With the additional profiles created using ABR to enhance the user experience, there is a greater demand on the encoding solution to create
four to eight times more encoded streams than traditional linear services. While a significant majority of the streams are lower resolution
and require less processing power to encode, a traditional linear service encoding model would still utilize one encoder
per profile. There is an added requirement of synchronization between the encoders to allow seamless switching between profiles. In such
a solution, the encoding load of a service with eight output profiles equates to approximately four HDTV encoders, depending on the
selection of profiles.
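The synchronization requirement can be made concrete with a small Python sketch. It assumes that seamless switching needs every profile to start its keyframes at the same presentation timestamps, here expressed in 90 kHz ticks, the clock used for MPEG-2 TS timestamps. The function names and sample timestamps are hypothetical.

```python
def is_abr_aligned(keyframes_by_profile):
    """Return True if every profile has keyframes at identical timestamps."""
    timestamp_sets = [set(ts) for ts in keyframes_by_profile.values()]
    return all(s == timestamp_sets[0] for s in timestamp_sets)


def shared_switch_points(keyframes_by_profile):
    """Return the sorted timestamps at which all profiles have a keyframe."""
    timestamp_sets = [set(ts) for ts in keyframes_by_profile.values()]
    if not timestamp_sets:
        return []
    return sorted(set.intersection(*timestamp_sets))


if __name__ == "__main__":
    # Hypothetical keyframe timestamps: one keyframe every 2 seconds at 90 kHz.
    aligned = {
        "1080p": [0, 180000, 360000],
        "720p": [0, 180000, 360000],
        "360p": [0, 180000, 360000],
    }
    # The 360p encoder has drifted, so only the first keyframe lines up.
    drifted = dict(aligned, **{"360p": [0, 90000, 270000]})
    print(is_abr_aligned(aligned), shared_switch_points(aligned))
    print(is_abr_aligned(drifted), shared_switch_points(drifted))
```

When the encoders drift apart, the set of shared switch points shrinks, and a client can no longer change profile cleanly at every segment boundary.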
Hardware Compression
Real-time encoders have typically used dedicated hardware to take in analog or SDI video and create an MPEG-2 or H.264
compressed video stream in an MPEG-2 TS. These encoders have historically been based on Application Specific Integrated
Circuits (ASICs) using dedicated silicon for the encoding and Field Programmable Gate Arrays (FPGAs) to support additional tasks
such as TS processing or video analysis. These are then controlled from an embedded processor. The disadvantage of this type
of design is that the ASICs are inflexible, and many key functions cannot be reprogrammed. The advantage of these designs is
that they tend to be lower cost for the performance they provide and require less power — both of which are key components in
the selection process for any encoding solution.
An alternative solution for hardware encoding is an all-FPGA encoder. The majority of the first real-time H.264 encoders were
FPGA based; this is also true for HEVC encoders. FPGA solutions provide the flexibility of being reprogrammable and have a faster
time to market. The key drawback of most FPGA solutions to date is that they have a higher cost and draw more power than a
comparable ASIC solution.
Hardware solutions work very well for single linear feeds, and they can often be adapted for streaming services if the design of
the ASIC or FPGA allows encoding based on the number of encoded blocks rather than on a single video. This will allow multiple
profiles to be encoded within the capacity of the encoding hardware. Most designs work for four to eight profiles depending on
the resolution of each profile. This design also overcomes any concerns with synchronization between encoders as they are all
managed out of the same hardware architecture. If there is a need for multiple high-resolution profiles, or more than eight, it
might be necessary to span beyond a single encoder design, thereby adding complexity in order to maintain synchronization
across encoders to support ABR alignment.
Software Compression
Software compression solutions are typically run on generic server platforms and have been historically utilized almost
exclusively for streaming or on-demand services. To encode a service, a server would have a video capture card and the software
encoding for the various output formats required. Such a solution provides a significant amount of flexibility as any component
of the software architecture can be upgraded and modified based on the application needs.
A software encoder is also straightforward to scale. To increase the number of services, providers purchase
the commodity server hardware required to run those services and load the appropriate software. A key advantage here is that
the commodity hardware is typically the same as that being used for a provider’s IT infrastructure. This provides flexibility in
launching new services as well as pricing breaks, because the market for server platforms is far larger than the market for
dedicated encoding hardware.
Software compression solutions have not typically been used for linear services because of performance and reliability
concerns with server-based solutions. Performance for linear services has lagged behind hardware solutions. The first
real-time software MPEG-2 HDTV encoders became available in 2003 [3] — almost a decade after the release of the MPEG-2
specification. Due to this, streaming services have been focused on lower video resolutions using codecs that are optimized for
high compression and low complexity for encoding.
Software solutions for real-time HDTV encoding of H.264 first became available around 2010, seven years after the first solutions were
available for MPEG-2. Moore’s Law observes that the transistor count on integrated circuits (ICs) doubles approximately every two years,
which can be roughly correlated to a doubling of computing power every two years. If we start at 2003 for a real-time MPEG-2 HD encoder,
and we accept that H.264 HDTV encoding takes 10 times as much computational power, then we would expect real-time H.264 encoders
to appear within log2(10) ≈ 3.3 generations, which translates to roughly 6.6 years. This predicts that servers should have had the
computational capacity for real-time software encoding of H.264 HDTV by 2009, which lines up well with the actual availability of
software encoders.
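The estimate can be checked with a few lines of Python. This sketch assumes, as the text does, a two-year doubling period and a tenfold complexity gap between MPEG-2 and H.264; the function name is illustrative.

```python
import math

DOUBLING_PERIOD_YEARS = 2.0  # Moore's Law doubling period, as stated in the text


def years_to_gain(factor: float) -> float:
    """Years of doubling needed for computing power to grow by `factor`."""
    return math.log2(factor) * DOUBLING_PERIOD_YEARS


generations = math.log2(10)    # about 3.3 doublings
years = years_to_gain(10)      # about 6.6 years
predicted_year = 2003 + years  # falls in the second half of 2009

if __name__ == "__main__":
    print(f"{generations:.2f} generations, {years:.2f} years, year {predicted_year:.1f}")
```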
Figure 2 – Predicted software encoded stream density
In Figure 2, we apply Moore’s Law to predict that servers without additional hardware acceleration should have the capacity to encode up
to five HDTV H.264 videos in real time today. There are commercial software solutions that meet these capabilities on commodity server
hardware.
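A rough Python sketch of this kind of projection is shown below. It assumes a baseline of one real-time H.264 HDTV encode per server in about 2009.6 (2003 plus roughly 6.6 years) and a doubling every two years. These assumptions are a simplification and may not match the exact model behind Figure 2.

```python
def predicted_hd_streams(year, first_realtime_year=2009.6, doubling_period=2.0):
    """Estimate real-time H.264 HDTV encodes per server in a given year.

    Assumes one encode at `first_realtime_year` and capacity doubling
    every `doubling_period` years (both are assumptions of this sketch).
    """
    return 2 ** ((year - first_realtime_year) / doubling_period)


if __name__ == "__main__":
    for year in (2010, 2012, 2014, 2016):
        print(year, round(predicted_hd_streams(year), 1))
```

For 2014, the year of publication, this gives between four and five encodes per server, in line with the figure quoted above.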
Hybrid Software and Hardware Compression
It has been shown that hardware compression solutions provide high-performance real-time encoding at the beginning of the
lifecycle for most codecs. Software compression solutions provide greater flexibility in compression offerings, but only meet
performance milestones later in the lifecycle of a codec. A method to have the best of both worlds is a software architecture
married with purpose-built hardware. Such a solution provides the flexibility of a software solution for generalized tasks such as
TS processing and table manipulation, while offloading the computational complexity of video encoding to attached hardware.
Mixing software and hardware for compression solutions can provide significant density improvements, while providing
the flexibility of software upgrades and easier development of new features. The addition of GPU-assisted encoding can
provide a two-and-a-half to three times improvement in encoding density for H.264 encoding, depending on the efficiency
of the software [4]. This is a considerable jump in performance and allows for significant flexibility in how the codec is used for
compression. This would bring the total real-time encodes in a server to approximately 12 to 15 HD streams, or three to four ABR
profile sets of eight.
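The GPU-assisted figures follow from numbers already given in this paper, as the Python sketch below shows. It assumes the software-only estimate of five HD encodes per server and the earlier approximation that a set of eight ABR profiles costs about four HD encodes. The constant and function names are illustrative.

```python
CPU_ONLY_HD_ENCODES = 5         # software-only estimate quoted for Figure 2
GPU_SPEEDUP_RANGE = (2.5, 3.0)  # improvement range cited in the text [4]
HD_ENCODES_PER_ABR_SET = 4      # eight profiles ~ four HD encodes, per the text


def gpu_assisted_density(cpu_hd=CPU_ONLY_HD_ENCODES, speedups=GPU_SPEEDUP_RANGE):
    """Return (low, high) HD encodes per server with GPU assistance."""
    return tuple(cpu_hd * s for s in speedups)


def abr_sets(hd_encodes, hd_per_set=HD_ENCODES_PER_ABR_SET):
    """Convert HD-equivalent encodes into sets of eight ABR profiles."""
    return hd_encodes / hd_per_set


low, high = gpu_assisted_density()

if __name__ == "__main__":
    print(low, high)                      # 12.5 15.0
    print(abr_sets(low), abr_sets(high))  # 3.125 3.75
```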
While GPUs provide additional capacity for encoding, they are still designed for general-purpose computation, albeit with a
specialized instruction set that can be useful for assisting H.264 and HEVC compression. An alternate proposal is to utilize ASICs
dedicated to compression on PCIe boards in a server. The advantage of an ASIC solution is that it has purpose-built
compression hardware with low power utilization and high performance for encoding only. In this model, you are able to use
software to perform all of the TS processing and rate control and ancillary features such as audio transcoding, while the ASICs
handle the heavy lifting of the video encoding.
This provides a flexible, high-performance and cost-effective solution. There are currently commercially available ASICs that
provide a density of four HDTV encodes [5] at a power level that allows 10 of them to fit on a PCIe board. It is therefore feasible
to achieve 40 HDTV encodes in a server using a single PCIe slot, which produces 10 sets of eight ABR profiles. Using four of these
PCIe boards with the same base CPU expands the density to 160 HDTV encodes or 40 sets of eight ABR profiles. Such an architecture
scales very quickly as the software processing load is minimized by offloading all of the difficult tasks of encoding to the PCIe board.
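The board-level arithmetic can be written as a short Python sketch. It uses the figures quoted above (four HDTV encodes per ASIC, 10 ASICs per PCIe board) together with the earlier approximation that a set of eight ABR profiles costs about four HD encodes. The function and parameter names are illustrative.

```python
def asic_density(encodes_per_asic=4, asics_per_board=10, boards=1,
                 hd_encodes_per_abr_set=4):
    """Return (HD encodes, sets of eight ABR profiles) for one server.

    Defaults follow the figures quoted in the text; the conversion of
    four HD encodes per eight-profile set is the text's approximation.
    """
    hd_encodes = encodes_per_asic * asics_per_board * boards
    return hd_encodes, hd_encodes // hd_encodes_per_abr_set


if __name__ == "__main__":
    print(asic_density(boards=1))  # (40, 10)
    print(asic_density(boards=4))  # (160, 40)
```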
Both GPU-based and ASIC-based architectures increase cost and power utilization at the per-device level, but both reduce them
on a per-service basis. Because the ASIC solution has almost 10 times the density, its per-service savings are considerably
greater.
Summary
Streaming media viewership has undergone tremendous growth over the last few years. With this growth, video quality and
quality of service have increased, due in large part to improving Internet access and advances in streaming technologies such
as ABR. These improvements have led to ever-increasing encoding complexity compared with traditional linear services. The
increase in encoding complexity will only accelerate with the addition of HEVC and the further need to simulcast different
resolutions and codecs to handle client compatibility.
While software encoding was key to the launch of streaming media, providers need to look at architectures that scale with
the additional complexity requirements of their customers. Hybrid software and hardware architectures are currently the best
options to provide the lower cost and greater flexibility and scalability needed to adapt to this changing market landscape.
Imagine Communications is a global leader in processing and compression solutions for media, broadcast, service provider, government,
and enterprise markets, offering an expansive portfolio of encoding and transcoding products. The company’s SelenioNext™ platform
provides up to 10 times the density and one-tenth the power consumption of competitive solutions, with the ability to replace an entire
headend of video processing in a single platform. SelenioNext is an all-in-one TV Everywhere solution that enables service providers to
ingest precompressed services and transcode, package, encrypt and stream multiscreen, multi-device video. Integrating multiple functions
within a commercial-off-the-shelf (COTS) server platform, it provides a highly dense, scalable and operationally efficient package designed
to meet the growing demand for live programming to mobile and connected screens.
Available in 1U and 2U appliances or in a 10U blade system, SelenioNext easily fits into optimal form factors for all online video applications
— from providing a handful of IPTV streams to thousands of multiscreen transcodes. Utilizing advanced Adaptive Bit Rate (ABR) technology,
the system is unmatched in its support of up to 320 HD ABR or 320 SD ABR profiles per 2U server and its ability to scale up to any number
of profiles per video program. SelenioNext also features an onboard broadcast management system for superior control and visibility into
network resource optimization.
References
[1] Akamai, “The State of the Internet”, Volume 6, Number 2, 2nd Quarter 2013 Report
[2] The Nielsen Company, “Average U.S. Home Now Receives A Record 118.6 TV Channels, According To Nielsen”, June 2008,
http://www.nielsen.com/us/en/pressroom/2008/average_u_s__home.html
[3] AMD, “Moonlight Launches the World’s First software based 720p MPEG-2 Real Time Encoding Solution for AMD64
Technology-Based Systems”, November 2003, http://www.amd.com/us/pressreleases/Pages/Press_Release_79092.aspx
[4] Main Concept, “NVIDIA speed results”, June 2010,
http://www.mainconcept.com/fileadmin/user_upload/download/product_sheets/CUDA-Sheets_06-2010.pdf
[5] ViXS Systems, http://www.vixs.com/indexee.php/products/features/xcode-pro-200
Originally published in the 2014 NAB Broadcast Engineering Conference proceedings.
© 2014 Imagine Communications