Volume 1 Issue 3 - Full Issue

IN THIS ISSUE:
The big picture: technology to meet the challenges of media fragmentation
Co-viewing on OTT devices: similarities and differences
Using machine learning to predict future TV ratings

VOL 1 ISSUE 3
FEBRUARY 2017
EDITOR-IN-CHIEF
SAUL ROSENBERG
MANAGING EDITOR
JEROME SAMSON
REVIEW BOARD
PAUL DONATO
EVP, Chief Research Officer
Watch R&D
KATHLEEN MANCINI
SVP, Communications
MAINAK MAZUMDAR
EVP, Chief Research Officer
Watch Data Science
FRANK PIOTROWSKI
EVP, Chief Research Officer
Buy Data Science
ARUN RAMASWAMY
Chief Engineer
ERIC SOLOMON
SVP, Product Leadership
WELCOME TO THE NIELSEN JOURNAL OF MEASUREMENT
SAUL ROSENBERG

The world of measurement is changing. Thanks to recent advances in data collection, transfer, storage and analysis, there's never been more data available to research organizations. But 'Big Data' does not guarantee good data, and robust research methodologies are more important than ever.

Measurement Science is at the heart of what we do. Behind every piece of data at Nielsen, behind every insight, there's a world of scientific methods and techniques in constant development. And we're constantly collaborating on ground-breaking initiatives with other scientists and thought leaders in the industry. All of this work happens under the hood, but it's no less important. In fact, it's absolutely fundamental in ensuring that the data our clients receive from us is of the utmost quality.

These developments are very exciting to us, and we created the Nielsen Journal of Measurement to share them with you.

The Nielsen Journal of Measurement will explore the following topic areas in 2017:
BIG DATA - Articles in this topic area will explore ways in which Big Data
may be used to improve research methods and further our understanding of
consumer behavior.
SURVEYS - Surveys are everywhere these days, but unfortunately science
is often an afterthought. Articles in this area highlight how survey research
continues to evolve to answer today’s demands.
NEUROSCIENCE - We now have reliable tools to monitor a consumer’s
neurological and emotional response to a marketing stimulus. Articles in this
area keep you abreast of new developments in this rapidly evolving field.
ANALYTICS - Analytics are part of every business decision today, and data
science is a rich field of exploration and development. Articles in this area
showcase new data analysis techniques for measurement.
PANELS - Panels are the backbone of syndicated measurement solutions
around the world today. Articles in this area pertain to all aspects of panel
design, management and performance monitoring.
TECHNOLOGY - New technology is created every day, and some of it is so
groundbreaking that it can fundamentally transform our behavior. Articles in
this area explore the measurement implications of those new technologies.
FOREWORD
Welcome to the 3rd issue of the Nielsen Journal of Measurement!
In this third edition of the journal, and the first issue of 2017, we’re featuring three papers
that relate to the fascinating world of television measurement.
It’s easy to think of television as an established medium—and television research as
an established practice—but nothing could be further from the truth: television sits at
the epicenter of today’s changing media habits, and the measurement systems we’ve
developed over the years need to keep pace. That’s the topic we’re exploring in this issue’s
first paper, “The big picture: technology to meet the challenges of media fragmentation.”
Authored by Nielsen’s chief engineer, it provides a review of past best practices and
offers a deep dive into the many pieces that make up modern television measurement.
Our second paper, “Co-viewing on OTT devices: similarities and differences,” examines
the dynamics of television viewing on over-the-top (OTT) devices, using census
impression data from Roku, one of our most progressive data partners. It’s an important
research topic: Television watching has traditionally been a social activity—something
we often do as a family unit—but the increasing use of small screens (smartphones and
tablets) to watch TV content is transforming that experience. Can OTT devices reverse
that trend?
The third paper in this edition, “Using machine learning to predict future TV ratings,”
explores innovative methods recently developed by data scientists at Nielsen to predict
ratings based on historical data. The practical implications are evident and far-reaching:
With most TV advertising still bought at “upfront” events well ahead of schedule, any
improvement in predictive accuracy can bring substantial financial benefits to the
industry.
As usual, we’re including four shorter pieces in this issue to give you a preview of some
exciting new work we’re engaged in at Nielsen: an advanced system to analyze the impact
of advertising on in-store sales one purchase at a time; an evolutionary algorithm to
test millions of product design options simultaneously; a fuzzy matching algorithm to
normalize coding inconsistencies in longitudinal surveys; and an overview of the role of
memory—and its decay—in advertising.
Enjoy this new issue of the journal!
JEROME SAMSON, MANAGING EDITOR
IN THIS ISSUE
SNAPSHOTS
In each issue of the Journal, we start with a few snapshots to introduce current measurement topics
in a summary format. We expect to develop many of these snapshots into full-length articles in
future issues of the Journal.
1. MEASURING THE IMPACT OF ADVERTISING ONE PURCHASE AT A TIME
2. SURVIVAL OF THE FITTEST: USING EVOLUTIONARY ALGORITHMS TO OPTIMIZE YOUR NEXT PRODUCT IDEA
3. FUZZY MATCHING TO THE RESCUE: ALIGNING SURVEY DESIGNS ACROSS TIME
4. UNDERSTANDING MEMORY IN ADVERTISING

FEATURED PAPERS
Full-length articles that illustrate how Nielsen is thinking about some of the most important measurement challenges and opportunities in the industry today.
1. THE BIG PICTURE: TECHNOLOGY TO MEET THE CHALLENGES OF MEDIA FRAGMENTATION
2. CO-VIEWING ON OTT DEVICES: SIMILARITIES AND DIFFERENCES
3. USING MACHINE LEARNING TO PREDICT FUTURE TV RATINGS
SNAPSHOTS
SNAPSHOT #1
MEASURING THE IMPACT OF
ADVERTISING ONE PURCHASE
AT A TIME
BY LESLIE WOOD, Chief Research Officer, Nielsen Catalina Solutions
In recent years, the creation of large single-source datasets
has been a major boon to the advertising research industry.
At Nielsen Catalina Solutions, we’re combining in-store
sales data from millions of households with information on
whether or not those households are exposed to any given
ad campaign. By examining the sales differential between
exposed and unexposed households, we’re able to compute
the sales lift generated by thousands of campaigns with great
accuracy.¹
The ANCOVA (analysis of covariance) model that forms
the basis of this test-and-control methodology has been
thoroughly tested, and it provides quick and reliable answers
to brand managers interested in measuring the effectiveness
of a campaign as a whole. But there are occasions when it
doesn’t quite fit the bill. Consider, for instance, the case of a
campaign that reached such a large audience that it’s nearly
impossible to find households that were not exposed to it
(see Fig. 1). Where would we find the control group?
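To make the test-and-control setup concrete, here is a minimal sketch of an ANCOVA-style lift estimate, assuming a toy dataset and covariate set; the variable names and data are hypothetical, not the NCS production model:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative ANCOVA-style test-control comparison (hypothetical data and
# covariates): regress household sales during the campaign on an exposure
# indicator, controlling for pre-period (baseline) sales.
df = pd.DataFrame({
    "sales":    [12.0, 9.5, 14.2, 8.1, 11.3, 7.9],
    "exposed":  [1, 0, 1, 0, 1, 0],                 # saw the campaign?
    "baseline": [10.1, 9.8, 12.5, 8.4, 10.0, 8.2],  # pre-period sales
})
model = smf.ols("sales ~ exposed + baseline", data=df).fit()
# The coefficient on 'exposed' estimates the covariate-adjusted sales lift.
print(model.params["exposed"])
```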
FIG. 1: STANDARD TEST-CONTROL METHODOLOGY (ANALYSIS LEVEL: HOUSEHOLDS AT CAMPAIGN LEVEL)
[diagram: exposed vs. unexposed households for your ad over the overall campaign measurement window]

¹ See details about this method in "Using single-source data to measure advertising effectiveness," Vol 1, Issue 2 of the Nielsen Journal of Measurement.
To address this challenge, we’ve developed a new
methodology called ‘Cognitive Advantics’ (CA). Instead of
examining sales lift in aggregate over the course of the entire
ad campaign, it analyzes the household sales data at the level
of each purchase occasion, and takes into account the timing
of ad exposures every step of the way—a much more granular
look at the data. After all, a household might see an ad, make
a purchase, see an ad again for the same brand, and we’d be
hard-pressed to say that the second ad had any influence on
that particular purchase. Conversely, a household might see
an ad, make a purchase two months later, and with so much
time in-between, it would be difficult to conclude that the ad
exposure was the determining factor behind that purchase.
By analyzing data at the purchase-occasion level, we're able
to take exposure 'recency' into consideration—the more
recent the ad, the greater the impact. While the effective
time window can vary from study to study, we generally look
back 28 days from the time of purchase to find one or more
exposures to which the purchase occasion can be attributed
(see Fig. 2). We're also able to solve the control group
problem: while there may not be many households
that haven't been exposed to the campaign at one point or
another, there are generally enough purchase occasions that
weren't influenced by an ad right before the purchase—even
among exposed households.
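To illustrate the recency rule, here is a minimal sketch of how purchase occasions might be split into influenced and uninfluenced groups under a 28-day look-back window; the record layout and helper names are assumptions for this sketch, not the CA production code:

```python
from datetime import timedelta

# Hypothetical illustration of the recency rule: a purchase occasion is
# treated as ad-influenced if the household saw at least one campaign
# exposure in the 28 days before the purchase.
ATTRIBUTION_WINDOW = timedelta(days=28)

def classify_occasions(purchases, exposures):
    """Split purchase occasions into 'influenced' and 'uninfluenced' groups.

    purchases: list of (household_id, purchase_date) tuples
    exposures: dict mapping household_id -> list of exposure dates
    """
    influenced, uninfluenced = [], []
    for household_id, purchase_date in purchases:
        window_start = purchase_date - ATTRIBUTION_WINDOW
        # Any exposure inside the look-back window attributes this
        # occasion to the campaign.
        hits = [d for d in exposures.get(household_id, [])
                if window_start <= d <= purchase_date]
        (influenced if hits else uninfluenced).append(
            (household_id, purchase_date, hits))
    return influenced, uninfluenced
```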
To perform the analysis, the CA methodology takes all
relevant variables (purchase history, media consumption,
demographics, location, category purchases, etc.), feeds them
into a collection of data modeling algorithms and allows the
data to pick and combine the models so that the results have
the best (i.e., most statistically sound) cross-validation. This
is the ‘cognitive’ part in the CA name. The end result is a very
powerful tool that relies very little on human intervention and
can be deployed at scale.
As the market moves toward real-time, push-button solutions,
this is the next evolution in the measurement of advertising
effectiveness. The early results are very promising, and we’re
looking forward to sharing details, examples and performance
benchmarks in a future edition of the journal.
FIG. 2: COGNITIVE ADVANTICS METHODOLOGY (ANALYSIS LEVEL: PURCHASE OCCASION AT EXPOSURE LEVEL)
[diagram: effective vs. ineffective advertising exposures, and purchase occasions influenced vs. not influenced by the advertising, within the maximum length of time during which an ad remains effective]
SNAPSHOT #2
SURVIVAL OF THE FITTEST:
USING EVOLUTIONARY
ALGORITHMS TO OPTIMIZE YOUR
NEXT PRODUCT IDEA
BY KAMAL MALEK, SVP Innovation Data Science, Nielsen
Imagine the following scenario: You’re the marketing manager
for a leading brand of household products, and you’re
considering a new line of eco-friendly, multi-purpose cleaners.
You’ve studied market trends, measured your competition,
and conducted exploratory focus groups and consumer
interviews. In the process, you’ve identified a number of
essential attributes for your new product, including key
features and benefits, scent varieties, package design, color
schemes and graphic elements. Once you've combined all
the top ideas from your internal teams and creative agency,
you end up with seven possible design options for each of
six distinct product attributes. That's 7⁶ = 117,649
possible combinations to sift through! Which of those
combinations are most likely to resonate with consumers and
lead to in-market success?
When faced with so many options, your first step might be
to use your best judgment to select a handful of product
versions and submit them to a wave of monadic concept
tests. In a monadic test, each version is presented to
a separate panel of representative consumers, who are
asked to rate the proposed product concept on a number
of dimensions (such as purchase intent, uniqueness, or
relevance) before everyone's scores are averaged to identify
the most promising version. The methodology behind
monadic tests is well understood and the technique is very
effective, but it only allows you to explore a very tiny fraction
of all possible alternatives. You need to pre-select the product
concepts that you believe are the most promising, and
that pre-selection is necessarily biased and often politically
charged. You’re most probably missing out on your best
options.
Modern choice-based conjoint analysis can help: In that
type of research, each respondent is presented a sequence
of product alternatives and asked to select their preferred
version in each of those side-by-side comparisons. The
collected responses are then used to build a choice model—
typically a hierarchical Bayesian logistic regression model—
which gives the probability of the respondent choosing
one concept over another as a function of the value of its
attributes. Unlike monadic concept tests, conjoint analysis
makes it possible to explore all values for all attributes, but
the models that result from this type of analysis are often too
simple to capture the holistic nature of how consumers react
to new consumer products. In most real-life situations, there
are important synergies and negative interactions between
attributes—especially when aesthetic elements are involved—
and those models are generally not good enough to reflect
them.
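To make the choice-model idea concrete, here is a toy sketch of how additive attribute part-worths turn into a probability of choosing one concept over another—a plain binary logit, not the hierarchical Bayesian model itself, with invented part-worth values:

```python
import math

# Toy choice model in the spirit of conjoint analysis. Part-worth values
# are invented for illustration only.
PART_WORTHS = {
    "scent":   {"citrus": 0.4, "pine": -0.1, "lavender": 0.2},
    "package": {"spray": 0.3, "refill": 0.1},
}

def utility(concept):
    # A concept's utility is the sum of the part-worths of its attributes.
    return sum(PART_WORTHS[attr][level] for attr, level in concept.items())

def p_choose(a, b):
    # Probability of choosing concept a over concept b (binary logit).
    return 1.0 / (1.0 + math.exp(utility(b) - utility(a)))

a = {"scent": "citrus", "package": "refill"}
b = {"scent": "pine", "package": "spray"}
print(round(p_choose(a, b), 2))  # ~0.57: concept a is slightly preferred
```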
We developed a new approach to address those limitations.
It’s based on the principles of genetic evolution: We start
with a quasi-random initial set of product versions, present
them to respondents and, based on their feedback, select the
better performing ones as parents for breeding purposes. The
algorithm then uses genetic crossover to combine traits from
two parents and breed new product candidates (offspring);
mutation to introduce traits that were not present in either
parent; and replacement to eliminate poor-performing
members of the population and make room for the offspring.
Step by step, in survival-of-the-fittest fashion, the population
of new product concepts evolves to reflect the preferences
of the respondents, and we end up with perhaps four or
five top concepts that can be further investigated. The
genetic algorithm is essentially a search and optimization
process that is guided by human feedback every step of
the way and acts as a learning system. It doesn’t require
modeling complex human behavior—and solving the difficult
mathematical problems that come with such models—and
yet it implicitly accounts for all that complexity.
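The following is a minimal sketch of such an evolutionary loop, assuming six attributes with seven options each; the population size, mutation rate and placeholder score function are illustrative assumptions, since in practice fitness comes from live respondent choices:

```python
import random

# Minimal evolutionary search over product concepts: six attributes with
# seven design options each (7**6 = 117,649 combinations). The `score`
# callable stands in for respondent feedback.
N_ATTRIBUTES, N_OPTIONS = 6, 7
POP_SIZE, MUTATION_RATE = 40, 0.05

def random_concept():
    return [random.randrange(N_OPTIONS) for _ in range(N_ATTRIBUTES)]

def crossover(parent_a, parent_b):
    # Each attribute of the offspring is inherited from one of the parents.
    return [random.choice(pair) for pair in zip(parent_a, parent_b)]

def mutate(concept):
    # Occasionally introduce a trait present in neither parent.
    return [random.randrange(N_OPTIONS) if random.random() < MUTATION_RATE
            else gene for gene in concept]

def evolve(score, generations=30):
    population = [random_concept() for _ in range(POP_SIZE)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: POP_SIZE // 2]            # survival of the fittest
        offspring = [mutate(crossover(*random.sample(parents, 2)))
                     for _ in range(POP_SIZE - len(parents))]
        population = parents + offspring             # replacement step
    return sorted(population, key=score, reverse=True)[:5]  # top concepts
```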
The Nielsen Optimizer service is based on this technique,
and we’ve used it for thousands of client projects already,
to great success. In fact, in an early comparative study,
we’ve measured that product concepts identified by Nielsen
Optimizer generate on average a lift of 38% in forecasted
revenue, compared to non-optimized (best-guess) concepts.
We typically need 500 to 1,000 respondents to conduct a
Nielsen Optimizer study and quickly reduce a set of 100,000
potential product versions down to its most promising
candidates—which can then be studied in greater detail with
monadic testing.
We will share more details on the genetic algorithm behind
Nielsen Optimizer, as well as relevant case studies, in
a future edition of the Nielsen Journal of Measurement.
While we have more work to do to improve the respondent
interface, fine-tune the analytics systems and shorten delivery
time, there’s no question that this technique is already
making it possible for brand managers to save time, explore
more ground, and bring their new products to market with
much more confidence.
SNAPSHOT #3
FUZZY MATCHING TO THE RESCUE:
ALIGNING SURVEY DESIGNS
ACROSS TIME
BY JENNIFER SHIN, Sr. Principal Data Scientist, Nielsen
GAN SONG, PhD candidate, Columbia University
Surveys are a valuable tool for any market research
company. As a leading global information and
measurement company, Nielsen has developed complex
models and methodologies that hinge on the accuracy of
the survey data we use in our products. Survey data not only
provides insights about what people watch, listen to, and
buy, but it also helps media companies define and reach
their target audiences.
Obtaining these insights is not without challenges. Over
time, surveys are typically modified to collect new data, or
to improve the quality of the information collected from
respondents. It’s not just that new questions get introduced,
but old questions might receive a new treatment, often with
new answer choices added to the mix. While this can greatly
improve the value of a survey, those changes can introduce
inconsistencies each time the survey is administered.
For instance, take this question: How frequently do you
purchase dental floss in your household? Respondents have
two predefined answer choices: ‘(1) 0-2 times in the past
month’; and ‘(2) 3+ times in the past month’. To help tabulate
the data and retain some meaning to the metadata, analysts
decide to create two variables: ’Dental Floss: Light Users: 0-2
Times/Last Month: Total Category’ and ‘Dental Floss: Users:
3+ Times/Last Month: Total Category’. Why Total Category?
Because there might be many variants in the market: waxed,
multifilament, mint-flavored, etc.
Now suppose that six months later, the same survey is
administered to a new group of respondents, with the same
exact question, but the variable names have been changed
to ‘Dental Floss: Times/Last Month: Light (0-2)’ and ‘Dental
Floss: Times/Last Month: Heavy (3+)’ because those names
are shorter, or we don’t care about different varieties after
all, or they make more sense according to a new survey-wide
naming convention. Wait another six months, and we might
add a medium tier: 'Dental Floss: Times/Last Month: Light (0-2)',
'Dental Floss: Times/Last Month: Medium (3-4)' and 'Dental
Floss: Times/Last Month: Heavy (5+)'.
In real life, naming conventions change all the time, either
on purpose or by accident. How then do we match that data
over time? With the right domain expertise, the solution
might be simple enough for one or two variables, but some
surveys have thousands of variables. For example, at Nielsen,
we’re working with one survey that contains attitudes, usage,
and purchasing information for over 6,000 products and
contains 20,000 variables across 26 categories. Every time
it gets refreshed—twice a year—approximately 80 percent of
the questions remain the same, and 20 percent involve new
questions and modified answer choices. That means that
4,000 variables need to be examined and lined up against
previous data.
Specifically, matching responses requires recognizing
changes in formatting, choices, questions, and categories as
well as identifying new additions and deletions. The manual
effort takes two weeks—just for that one survey—and is
prone to tabulation mistakes and errors of interpretation.
That’s where machine learning can help. In particular, a type
of algorithm that involves fuzzy string matching.
In string matching problems, the Levenshtein algorithm is
a natural place to start. It’s a simple and efficient dynamic
programming solution used to calculate the minimum
number of character substitutions, insertions and deletions
that are necessary to convert one word into another—that
is, to minimize the “distance” between those two words.
In our case, those words are the names of the survey labels
(data fields) that may have changed from one survey iteration
to the next, and need to be harmonized to allow analysts
to compute trends. Taking our solution one step further,
we developed a model that broke down each label into
separate sections—or cells—according to certain structural
characteristics, and computed the Levenshtein distance
within each of those cells. And because we’re dealing with
problems where thousands of such calculations need to take
place in short order, we parallelized the code to apply it more
efficiently to large problem sets.
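A simplified sketch of the cell-based idea, assuming labels are split into cells on their ':' delimiters and each cell pair is scored with a normalized Levenshtein distance (the splitting rule and scoring are illustrative assumptions, not our production model):

```python
# Cell-based fuzzy matching of survey labels: split each label into cells,
# score each cell pair with a normalized Levenshtein distance, and average.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def cell_similarity(label_a: str, label_b: str) -> float:
    """Average per-cell similarity between two survey labels (0 to 1)."""
    cells_a = [c.strip().lower() for c in label_a.split(":")]
    cells_b = [c.strip().lower() for c in label_b.split(":")]
    n = max(len(cells_a), len(cells_b))
    cells_a += [""] * (n - len(cells_a))   # pad missing cells
    cells_b += [""] * (n - len(cells_b))
    scores = []
    for ca, cb in zip(cells_a, cells_b):
        longest = max(len(ca), len(cb)) or 1
        scores.append(1.0 - levenshtein(ca, cb) / longest)
    return sum(scores) / n

# Example: the two variants of the 'light user' variable discussed earlier.
old = "Dental Floss: Light Users: 0-2 Times/Last Month: Total Category"
new = "Dental Floss: Times/Last Month: Light (0-2)"
print(round(cell_similarity(old, new), 2))
```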
Our innovative cell-based comparison model outperforms
the existing word-based comparison models by a substantial
margin, and we’re looking forward to sharing the details of
our approach in an upcoming issue of the journal.
SNAPSHOT #4
UNDERSTANDING MEMORY IN
ADVERTISING
BY DAVID BRANDT, EVP, Product Leadership, Nielsen
INGRID NIEUWENHUIS, Director, Neuroscience, Nielsen
Advertisers and those who measure the impact of advertising
are obsessed with memory. If advertising is to be successful,
it has to stick in the consumer’s memory—or so the saying
goes. But what exactly is that thing called memory, how long
does it linger, and how do we measure it?
At the first level, memory can be divided into two types:
Explicit memory, which refers to information we are aware of
(the facts and events we can consciously access), and implicit
memory, which refers to information we’re not consciously
aware of (it’s stored in our brain and can affect behavior, but
we can’t recall it). Explicit memory can be further divided into
episodic memory and semantic memory. Episodic memory is
the memory of an event in space and time—it includes other
contextual information present at that time. On the other
hand, semantic memory is a more structured record of facts,
meanings, concepts and knowledge that is divorced from
accompanying episodic details.
How do these various types of memories come into play in
advertising? Advertising memories that we retrieve through
standard recall and recognition cues are episodic. Here are
a few questions that researchers might use to retrieve those
memories: What brand of smartphone did you see advertised
on TV last night? Do you recall if it was a Samsung Galaxy S7
or an iPhone 7? What if I told you it aired during last night’s
episode of Madam Secretary? What if I told you it featured
a father shooting a video of his young daughter playing
a scene in Romeo and Juliet? But very often, consumers
cannot tell us exactly how they came to know what they know
about a brand. They know that Coca-Cola is refreshing, for
instance, but cannot tell us exactly how they first came by that
information. Was it an ad they saw, a word from a friend, a
personal experience? That memory is semantic. Unconscious
associations (such as a non-accessible childhood experience
of drinking Coca-Cola during a hot summer) create implicit
memories that can continue to affect brand preferences much
later in life.
Memory is a complex concept, with different types of
memories serving different roles, and the nature and
content of our memories changes over time. If consumers
can’t remember what they saw last night without a prompt,
but something they saw years ago still has an effect on
them, it’s important that we, as researchers, gain a better
understanding of the impact that time has on memory.
Research tells us that memories start to decay immediately
after they’re formed. That decay follows a curve that is very
steep at the beginning (the steepest rate of decay occurs in
the first 24 hours) and levels off over time. In a controlled
experiment, Nielsen tested the memorability of 49 video
ads immediately after consumers were exposed to them in
a clutter reel, and we tested that memorability again the day
after exposure (among a separate group of people). Levels of
branded recognition had fallen by nearly half overnight. This
isn’t just happening in the lab: Nielsen’s in-market tracking
data shows similar patterns.
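As a toy illustration of that shape—steep early decline, then a long-term plateau—one simple functional form is exponential decay toward a durable floor; the parameters below are invented, calibrated only to mimic a 50% drop in the first 24 hours:

```python
import math

# Toy forgetting curve: recall decays exponentially toward a long-term
# 'floor' of durable memory. Parameter values are invented for illustration.
FLOOR = 0.45   # share of the initial memory that proves durable
RATE = 2.38    # decay rate per day

def recall(t_days, initial=1.0):
    return initial * (FLOOR + (1 - FLOOR) * math.exp(-RATE * t_days))

for t in (0, 1, 5):
    print(t, round(recall(t), 2))   # ~1.0 at exposure, ~0.5 after a day,
                                    # then essentially flat by day 5
```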
Does this rapid decay of memory spell doom for the ad
industry? Not at all. The fact that a specific memory can’t
be recalled doesn’t mean that it’s fully gone. For one,
relearning explicit information that is almost fully forgotten
is much faster than learning it the first time around. Practice
(repetition) indeed makes perfect—and can help create
durable memories. In addition, the most striking revelation of
a decay curve is not the steep decline at the start, but rather
the leveling-off that occurs over the long term. We studied
brand memorability decay over a longer period of time for a
number of digital video ads recently, and while recall dropped
for all ads by 50 percent in the first 24 hours (as was the case
in our earlier study), it still stood at that same 50 percent
level five days later for half of the brands.
What does this tell us about measuring memory? First, that
time between exposure and measurement matters. The
24-hour mark is ideal because that's the point where the
memory curve starts to flatten. Second, that advertising
memories are encoded in context (asking questions about
the show in which the ad aired, for instance, is going to help
consumers remember that ad). Finally, that memories can
endure—either via repetition for explicit types of memories,
or via implicit internalization.
To help advertisers in today’s cluttered advertising
environment, researchers need to measure memory in all
its forms. At Nielsen, we capture important performance
metrics for ad memorability using carefully crafted surveys,
and those surveys are conducted in a way that produces
reliable benchmarks for the industry. And with the tools of
neuroscience³, we can now measure brain activity during
exposure and monitor both explicit and implicit memory
systems with second-by-second granularity. Together,
these different research techniques are helping us better
understand the nature of memories—and how memories and
advertising come to interact.
³ See "From theory to common practice: consumer neuroscience goes mainstream," Vol 1, Issue 2 of the Nielsen Journal of Measurement.
FEATURED PAPERS
THE BIG PICTURE: TECHNOLOGY TO
MEET THE CHALLENGES OF MEDIA
FRAGMENTATION
BY ARUN RAMASWAMY Chief Engineer, Nielsen
INTRODUCTION
It's a great time to be a media consumer, creator, or
distributor. New streaming technologies with over-the-top (OTT)
apps, connected devices and social media are
expanding the media landscape. While traditional linear TV
offers an increasing array of new channels and new features
(e.g., cloud-based DVR), OTT providers are making their
mark with curated and skinny bundles for live programming
choices. Exclusive content from OTT and subscription
video-on-demand (SVOD) providers is exploding. Consumers can
truly choose to watch anytime, anywhere and on any device.
On the technology side, data management platforms,
advertising exchanges and real-time programmatic
technologies are revolutionizing the ad industry with
data-driven and predictive ad delivery capabilities. These
technologies are making it possible to reach consumers or
preferred lifestyle segments with personalized ads.
While those changes are great for the consumer, they are
creating more complexity in the ecosystem, and thus more
challenges for media researchers. Those challenges can be
grouped into two broad categories: media fragmentation
(more content and channels that need to be measured) and
device fragmentation (media consumption on more diverse
digital platforms). To make the right business decisions in
this highly complex marketplace, content owners, publishers,
advertisers and their agencies need a reliable solution that
can address this two-pronged challenge. They need a full
picture of the consumption of both ads and content, piecing
together all devices and distribution channels to produce
what we at Nielsen call a ‘total audience’ measurement
solution.
This paper outlines the key technology developments that are
making it possible.
HOW LINEAR TV AUDIENCE
MEASUREMENT WORKS TODAY
Let’s set the stage by first reviewing how audience
measurement is performed in the U.S. for linear TV—the
oldest and still the most widely used media platform
available.¹ In linear TV, the same programming and the same
set of national commercials are broadcast to all viewing
audiences of a given channel. In that context, a panel that
is statistically sampled from the TV viewing universe is well
suited to collect the data and estimate audience figures for
the vast majority of programs and advertisements.
The major technical components of the ratings system
operated by Nielsen in the U.S. for linear TV are highlighted
in Fig. 1.
Content identification
Nielsen leverages dual engines for content identification:
watermarking and fingerprinting.
The Nielsen audio watermark is an inaudible signal that is
inserted in the content’s audio by a device called an encoder.
The signal is algorithmically hidden or masked so that it is
not audible to viewers. The information in the watermark
helps identify the source of the program along with the
time of broadcast. More than 3,000 Nielsen watermarking
encoders (hardware and software versions) are installed at
broadcast networks, cable networks and local TV stations in
the U.S., covering over 97% of all broadcast content on the
air. VOD content is encoded to carry the Nielsen watermark
too.
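As a toy illustration of the kind of information a watermark carries (the actual Nielsen watermark is a proprietary psychoacoustic scheme, and this sketch says nothing about how the bits are hidden in the audio):

```python
import struct

# Toy watermark payload: a small record identifying the broadcast source
# and the time of broadcast, serialized to bytes. Field layout is an
# invention for illustration, not the Nielsen format.

def encode_payload(source_id: int, timestamp: int) -> bytes:
    # Pack a 4-byte source id and an 8-byte Unix timestamp (big-endian).
    return struct.pack(">IQ", source_id, timestamp)

def decode_payload(payload: bytes):
    source_id, timestamp = struct.unpack(">IQ", payload)
    return source_id, timestamp

bits = encode_payload(3021, 1486224000)   # hypothetical station, Feb 2017
print(decode_payload(bits))
```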
Nielsen also identifies content via audio fingerprints
(sometimes called ‘signatures’). Fingerprinting is a popular
content identification technology. Around 900 media
monitoring sites collect fingerprints in all metered markets,
for all broadcast content, and store them in a central
reference library.
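Here is a minimal sketch of the matching step, assuming fingerprints have already been reduced to sets of hashable features; deriving robust features from the audio signal itself is the hard part and is deliberately glossed over:

```python
from collections import Counter

# Minimal sketch of matching an unknown clip against a reference library of
# fingerprints. A 'fingerprint' here is just a list of hashable features;
# real audio signatures are derived from the signal (e.g., spectral
# features), which this illustration does not attempt.
REFERENCE_LIBRARY = {
    "Network A, 8pm": ["f1", "f2", "f3", "f4"],
    "Network B, 8pm": ["f5", "f6", "f7", "f8"],
}

def best_match(clip_features):
    """Return the reference whose fingerprint shares the most features."""
    scores = Counter()
    for name, ref in REFERENCE_LIBRARY.items():
        scores[name] = len(set(clip_features) & set(ref))
    name, score = scores.most_common(1)[0]
    return name if score > 0 else None

print(best_match(["f2", "f3", "f9"]))   # -> "Network A, 8pm"
```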
In-home data collection
Once a home has been recruited and has agreed to be part of
a Nielsen panel, a Nielsen meter is installed at every TV site
in the household by Nielsen field technicians. In every home,
FIGURE 1: TECHNICAL STEPS FOR NIELSEN'S TRADITIONAL TV RATINGS SYSTEM IN THE U.S.
[diagram: content identification via watermarks encoded at TV networks and stations; in-home data collection using tuning-meters and people-meters in panel homes; content identification via fingerprints collected at media monitoring sites in metered markets; processing and ratings centralized at the Nielsen back-office]

¹ See: The Nielsen Total Audience Report, Q3 2016.
the meters capture two important measurement ingredients:
tuning (i.e., what is being watched); and audience (i.e., who
is watching).
The software in the Nielsen meter performs the following
key functions: it identifies which device is actively feeding
content to the TV (source detection); it decodes the Nielsen
watermark from the audio; it computes an audio
fingerprint; it determines the On/Off state of the TV;
and it communicates the collected data back to the
Nielsen back-office.
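In rough pseudocode terms, one cycle of that processing might look like the sketch below; every helper is an invented stand-in for proprietary firmware, shown only to make the division of labor concrete:

```python
# Hypothetical sketch of a meter's duty cycle. All helpers are invented
# stand-ins; the aim is only to show the five functions working together.

def tv_is_on(hdmi_state):                # On/Off detection (stub)
    return hdmi_state.get("power") == "on"

def detect_active_source(hdmi_state):    # source detection (stub)
    return hdmi_state.get("active_input", "unknown")

def decode_watermark(audio_frame):       # watermark decoding (stub)
    return audio_frame.get("watermark")  # None if no watermark found

def compute_fingerprint(audio_frame):    # fingerprinting fallback (stub)
    return hash(tuple(audio_frame.get("samples", ())))

def meter_cycle(audio_frame, hdmi_state, collected):
    if not tv_is_on(hdmi_state):
        return
    record = {"source": detect_active_source(hdmi_state)}
    watermark = decode_watermark(audio_frame)
    if watermark is not None:
        record["watermark"] = watermark          # preferred identification
    else:
        record["fingerprint"] = compute_fingerprint(audio_frame)
    collected.append(record)                     # queued for the back-office

collected = []
meter_cycle({"watermark": "sourceX|2017-02-01T20:00"},
            {"power": "on", "active_input": "set-top box"}, collected)
print(collected)
```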
Meters are required to perform at a high level of accuracy.
One metric that Nielsen monitors closely is the amount of
identification that comes from watermarks. High numbers
validate the efficacy of watermarked transmission and
detection. For example, in the past six months, GTAM meters
were able to credit 97.59% of all viewing using watermarks,
and the balance of 2.41% using fingerprints.
Nielsen's current portfolio of meters is built to meet market
needs. The GTAM (global television audience meter) is our
most comprehensive meter and is installed when a site's
measurement requirements are complex (i.e., with multiple
consumer devices, surround sound audio, etc.). We can also
deploy simpler versions (called GTAM Lite and Code Reader)
for simpler configurations and for smaller markets. These
various types of meters are shown in Fig. 2.

In panels where we wish to electronically capture the
audience (who is watching), an additional device is installed:
the people-meter. The people-meter has a text-based display
to communicate with the panelist, and a remote control for
the panelist to interact with the device (see Fig. 3).
FIGURE 2: VARIOUS TYPES OF NIELSEN METERS (GTAM, GTAM LITE, CODE READER)
The people-meter is installed near the TV and is fully visible
to the panelists. When the TV is on, this device prompts the
panelists to periodically log themselves in as active viewers.
The people-meter transfers the data it collects to the co-located
Nielsen TV meter, so that the tuning in the home can
be properly attributed to who is watching the content.
FIGURE 3: THE NIELSEN PEOPLE-METER

Processing and ratings computation

The data collected from panel homes is cleansed, credited
to distributors (a network or local station, for example) and
mapped to specific programs and commercials. It serves as
the basis for daily ratings computations.
MEETING THE CHALLENGE: MEDIA FRAGMENTATION

Now that we have a basic understanding of the traditional TV
measurement infrastructure, it's time to examine how today's
media realities are affecting the ratings environment, and
what type of technology development is underway to address
those challenges.

Today's media environment features many more distribution
channels than ever before. As the number of channels
multiplies (live as well as on-demand), there are instances,
for programs with very small audiences, where the ratings
derived from the panels are zero. Simply put, panels are not
large enough to capture the audiences of the long tail.

One solution to that problem is to leverage return path data
(RPD) from set-top boxes, Smart TVs and other devices.
These sources of big data, while missing demographics, can
fill specific volumetric gaps in panel data.² Another solution
is to increase the size of the panel by deploying more meters.
Nielsen has done just that many times in its history—after
all, the national TV measurement service in the U.S. relied on
5,000 households up until 2003, and it now includes nearly
40,000 households.

But panel expansion can create a strain on logistics and
maintenance operations. It's not just the total time needed to
install measurement equipment in those new homes that's at
stake. Once meters are installed, Nielsen ensures that panels
are maintained through regular field technician visits and
strict monitoring of key performance indicators. Visits are
also needed to coach and maintain contact with panelists,
service malfunctioning meters, or connect new devices. The
attention we pay to these field operations is one of the main
reasons why Nielsen's panels are so robust, and the data so
reliable.

These technical and operational realities gave us an
opportunity to rethink our metering technology. By leveraging
new low-energy processors (spurred by the IoT phenomenon)
and integrated components, we've developed next-generation
meters that combine measurement functions in a single
compact unit with a modern design. The physical interfaces
on those new devices are kept to a minimum in favor of
wireless interfaces, significantly reducing the amount of
wiring—and thus the amount of time spent on installation.
They can communicate with other elements around the
house—such as wearables, smartphones or even a new breed
of devices developed by Nielsen to capture over-the-top
(OTT) and broadband content delivery.

These new meters are making it possible for us to address the
measurement requirements of the modern connected home
and its cloud-based content delivery. And they come with
remote troubleshooting capabilities that can give technicians
a real-time view into the home's television environment—
reducing the need for a field visit in many cases.

Of course, as with any replacement to production equipment,
we need to make sure that these next-generation devices
deliver a level of data accuracy that's at least equivalent
to current benchmarks. One particular metric of interest
is the in-tabulation (in-tab) rate: the percentage of homes
in the panel with data that has passed stringent quality
tests, and that are therefore cleared to be part of the ratings
estimates for the day. We're in the process of evaluating the
performance of our next-generation meters, and the early
results are extremely encouraging. We're looking forward to
their rollout in the near future.

FIGURE 4: AN EARLY VERSION OF THE NEXT-GENERATION METER

² See "The value of panels in modeling big data," Vol 1, Issue 1 of the Nielsen Journal of Measurement.
MEETING THE CHALLENGE:
DEVICE FRAGMENTATION
Consider now the connected devices landscape—devices
like smartphones, tablets, connected TVs and other OTT
devices (Roku, Apple TV and more). There are myriads
of apps that offer content choices. That content could
have originated from TV (on-demand or live), or it could
be purely digital. What about the ad model? Some of
the content may have no ads, linear ads or ads that are
targeted dynamically. Fig. 5 helps visualize these various
combinations of digital content origin and ad model.
From a measurement perspective, even a large panel may be
statistically insufficient to capture all the variances in devices,
apps and ad models. To address this challenge, we use
census impressions from digital devices and calibrate those
impressions with data from our panels (where we know what
the demographics are). Census collection is a 360-degree
view of all impressions for all consumers from all digital
devices and apps (PCs, Macs, mobile, tablets, connected
devices). The overall measurement process is shown in
Fig. 6. It involves the familiar steps of content identification,
data collection and processing and ratings computation,
but with a few adjustments to meet the needs of the digital
infrastructure. Let’s review what those adjustments are.
FIGURE 5: VARIOUS CONFIGURATIONS OF DIGITAL CONTENT ORIGIN AND AD MODEL

CONTENT ORIGIN               AD MODEL
Originates from linear TV    Linear ad load (the ads are the same as when the content aired on linear TV)
Originates from linear TV    Dynamic ad load (the ads are not the same, and their insertion might be a function of a number of audience targeting criteria)
Native digital               Changes to number of ad spots and ad loads
FIGURE 6: TECHNICAL STEPS TO ADDRESS DEVICE FRAGMENTATION IN THE U.S.
[diagram: TV-originated watermarks and native digital tags (e.g., ID3 tags such as [14AF52BC1114398]) are captured by app and browser SDKs, flow into census collection and data enrichment, and produce digital TV ratings, digital content ratings and total ratings]
Content identification

Nielsen has created software that has been embedded in
most leading transcoders to extract the Nielsen watermark
from the audio and re-insert it as metadata in the digital
stream. This metadata tag (called ID3) is supported on most
leading streaming formats and is now easy to access from the
streaming content.

If there is no Nielsen watermark present (as is often the case
for native digital content), we leverage the client's metadata
(program name, title, length, type and more) to identify the
content. This metadata is provided directly by the client's
content management system (CMS). Note that video content
isn't the only media type that can benefit from this approach:
static media (e.g., banner ads, pop-ups, etc.) can be tagged in
exactly the same way.

Data collection

The next part of the puzzle is the meter equivalent. Rather
than physical meters in a select number of panel homes,
we have created a software library called the software
development kit (SDK) that's deployed to the universe of
digital viewers. The SDK is instrumented in publisher and
aggregator apps (e.g., apps from multichannel video program
distributors), as well as on browser pages that stream or
render content. Every time a consumer watches content,
the SDK captures the measurement data (impressions) and
transmits ID3 or CMS tag data back to Nielsen's collection
system. By having the same software handle both ID3 and
CMS tags, Nielsen clients have the flexibility to choose
between advertisement models (linear or dynamic) in order to
maximize their monetization objectives.
Processing and ratings computation
Processing census impressions is in the domain of very big
data, and we make use of all the relevant data storage and
processing technologies (such as Hadoop, Spark, NoSQL and
Kafka) on cloud-based platforms in order to process that data
at scale. Once impressions and demographics are combined,
we can proceed with ratings computations and produce
digital TV and digital content ratings.
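As a greatly simplified illustration of that combination step (the bucket names and proportional allocation rule are assumptions for this sketch, not the production methodology):

```python
from collections import Counter

# Illustrative calibration idea: census impressions carry no demographics,
# so panel data is used to estimate how a program's impressions split
# across demographic buckets before ratings are computed.

def demo_ratings(census_impressions, panel_events, universe_sizes):
    """census_impressions: list of program ids, one entry per impression
    panel_events: list of (program_id, demo_bucket) viewing events
    universe_sizes: dict demo_bucket -> population estimate
    """
    totals = Counter(census_impressions)
    panel_mix = Counter(panel_events)            # (program, demo) -> events
    panel_totals = Counter(p for p, _ in panel_events)
    ratings = {}
    for (program, demo), n in panel_mix.items():
        share = n / panel_totals[program]        # panel-estimated demo share
        impressions = totals[program] * share    # allocate census volume
        ratings[(program, demo)] = impressions / universe_sizes[demo]
    return ratings

# Toy example with two demo buckets:
print(demo_ratings(["show1"] * 1000,
                   [("show1", "A18-34"), ("show1", "A35-54")],
                   {"A18-34": 5000, "A35-54": 4000}))
```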
As a final step, the linear TV ratings and digital ratings
are combined for a total content ratings number, and the
complex, fragmented picture we started with at the beginning
of this paper is now complete.
A COMPREHENSIVE SOLUTION AND
A ROADMAP FOR THE FUTURE OF
MEASUREMENT
Recent innovations such as smarter meters and census
data collection are helping us solve the puzzle of today’s
fragmented media ecosystem (see Fig. 7). So, what lies
ahead?
The marketplace keeps evolving, of course, and we’re
already exploring exciting new developments to track
where consumers are going in the next few years. In
particular, the world of IoT is upon us. More devices and
in-home appliances are getting connected every day, and
becoming smarter. It’s a natural fit for us to envision ways
to integrate our meters with consumer IoT devices. We’re
also investigating wearables embedded with our modern
content recognition technology to create new person-based
measurement devices.
On the digital front, our focus is to increase the footprint
of our solutions and make it easier for clients to implement
our measurement technology. To that effect, the engineering
team at Nielsen is working on a new innovation, named cloud
API, that doesn’t require an integrated client library like the
SDK, but rather leverages web APIs to collect data. With
cloud control, it will be easier to take advantage of advances
in machine learning to make the systems cognitive and
intelligent.
There’s a whole world of developments ahead of us, and we
will expand on these new opportunities in a future paper. It’s
an exciting time to be a technologist at Nielsen!
FIGURE 7: A SUMMARY OF TECHNICAL SOLUTIONS TO ADDRESS TODAY'S MEASUREMENT CHALLENGES

Linear TV: watermarks, fingerprints
Media fragmentation: RPD, next-generation meters
Device fragmentation: SDK, census collection
CO-VIEWING ON OTT DEVICES:
SIMILARITIES AND DIFFERENCES
BY KUMAR RAO, KAMER YILDIZ AND MOLLY POPPIE Data Science Methods, Nielsen
INTRODUCTION
When we watch television, we often have someone else in our
household watching with us: a spouse, a child, a roommate,
even a family guest. That behavior is called ‘co-viewing,’ and
it’s been a topic of intense social research for as long as
television has been around.
Co-viewing has been a topic of commercial interest as well
ever since it was discovered that joint media attention could
improve learning¹, engage memory, and thus by extension
stimulate brand recall. Today, co-viewing is not limited to
traditional television viewing—what we refer to in the industry
as linear TV. With the emergence of digital technologies and
increased content streaming over the Internet, it's become
vital for media companies to understand consumers'
co-viewing patterns across different platforms.
While co-viewing trends on tablets and smartphones have
been studied2, co-viewing activity using over-the-top (OTT)
capabilities (connected devices like Roku and Apple TV, Smart
TVs, and game consoles) has received limited attention due
to a lack of accurate measurement solutions. However, with
programming content typically displayed on a regular-size
¹ See for instance the research conducted as early as 1967 by the Children's Television Workshop to launch and run the landmark TV series Sesame Street.
² Dan, O. (2014). M Marks the Spot: Audience Behaviors Across Mobile. Paper presented at the Advertising Research Foundation: Audience Measurement, New York, NY.
television screen and in a familiar household setting—the
hallmarks of traditional co-viewing activity—OTT devices are
probably the digital platform that should intuitively invite the
most immediate scrutiny.
Co-viewing of OTT content (programming content as well
as ads) presents an interesting challenge for audience
measurement. The viewing environment might be familiar
(the living room, the bedroom, the kitchen, etc.), yet the
OTT ecosystem has some unique characteristics (content
distribution, access, choice, viewer identification, etc.), and
measuring streaming activity in that new ecosystem involves
a few adjustments to traditional media research solutions.
In this paper, we present research on the dynamics of
co-viewing activity on OTT devices, and how they compare
to co-viewing benchmarks for standard television. The
preliminary findings from this study should be of interest to
researchers looking to better understand the media habits of
the population of viewers behind these devices, and to media
companies looking to make the most of OTT platforms for
programming and advertising applications.
BACKGROUND
Early co-viewing studies examined the effect of VCRs (in their
ability to facilitate family movie nights, for instance) when
they were first introduced, and the educational effects of
having a parent watch TV with their child (e.g., mentoring,
mediation, etc.). More recent studies have explored how
people are expanding the co-viewing experience via social
media (by tweeting about a live TV event, for instance). While
there are a few exceptions, the body of literature on the topic
leaves little doubt that the outcome is generally very positive:
Co-viewing adds context to the viewing experience, enhances
social interactions, and creates a stronger bond between
viewers and the content (programs and ads) they’re watching.
But today’s new technology is inviting a re-examination.
Programming options are proliferating and people are
consuming more media content than ever before3. Digital
video recorders (DVR), video-on-demand (VOD) services and
online streaming capabilities are empowering consumers
to watch television programming on their own schedule.
This means that in theory, people are increasingly watching
content that’s more aligned with their own individual tastes—
and thus quite possibly less aligned with the tastes of other
members of their family. In this new ecosystem, the media
industry sees an opportunity to target ads that are more
directly suited to those individuals, but is it worth the
trade-off if it comes at the expense of co-viewing?
Before we can answer that question, we need to size up
the problem: Is today’s streaming technology affecting
co-viewing, and if so, to what extent? Video streaming can
take place on a smartphone or a tablet, and it’s not difficult
to imagine that the size of those devices can be a physical
impediment to co-viewing. But video streaming via an
over-the-top device gets displayed on a 'regular' television screen.
How does co-viewing in that type of environment compare to
co-viewing on traditional television?
This is what we set out to find out in this paper. The view
in the industry is that co-viewing on OTT devices must be
largely similar to that observed on linear TV. This hypothesis
is reassuring for the media industry, of course, but we felt it
was important to validate it against statistically representative
data and use the industry-standard Nielsen ratings service
as the benchmark. This would not just allow us to accurately
quantify the key differences, but also examine more closely
the idiosyncratic behavior of certain demographic groups.
Nielsen recently partnered with Roku to deliver audience
measurement solutions on TV-connected devices. For
this paper, we used detailed campaign-level data from
this new service to take a closer look at OTT co-viewing
behavior and compare it to co-viewing incidence levels on
traditional television. Specifically, we conducted a post-facto
examination of a large volume of OTT campaign data in order
to understand the nuances and patterns in co-viewing of OTT
impressions. The combination of big data from Roku and
nationally-representative panel data from Nielsen gave us the
opportunity to develop a robust methodology to conduct this
research exploration.
³ See The Nielsen Total Audience Report: Q3 2016.
STUDY DESIGN
Data capture and calibration
To measure advertising audiences on digital platforms
(like Roku), Nielsen developed a census-based system that
leverages software plug-ins that are directly embedded in the
media player apps of those providers.⁴
Data used in this study
The empirical analysis in this study is based on OTT campaigns
measured during two different time periods. The first dataset
was a six-month dataset (Nov 2015 – May 2016) comprising 15
campaigns and involving 18 million impressions. The second
was a three-month dataset (May – July 2016) comprising 36
campaigns and involving 112 million impressions. The second
dataset was simply a temporal extension of the first one and was
used to drill down into data cross-sections in a way that wasn’t
possible with the first dataset.
The TV viewing data was based on six months (Dec 2015 –
May 2016) of live TV viewing from active Nielsen National
People Meter (NPM) panel households (N=34,831). Among
these households, around half (51%, N=17,817) viewed live TV
on sets connected to an OTT device. In this study, we used TV
viewing from that subset of panel households, as opposed to
viewing from all households in the panel.
A side-by-side comparison of TV and OTT viewing in a sample
can only be meaningful if the sampled units have access to
both TV and OTT. The presence of an OTT device in the home
implies certain distinct characteristics: age, income, access
to broadband internet service, etc. Figure 1 illustrates the
marginal distributions of demographic characteristics across
all NPM and OTT households. Limiting our TV data to that
coming from OTT-capable households allows us to minimize
that demographic bias and offer a fair comparison of
co-viewing activity between OTT and linear TV among people
living in similar types of households.
FIGURE 1: MARGINAL DISTRIBUTIONS OF DEMOGRAPHICS ACROSS ALL AND OTT NPM HOUSEHOLDS

                                 (A) OTT NPM HHs   (B) ALL NPM HHs   Index
                                 (n=17,817)        (n=34,831)        (A/B)
Head-of-Household (HOH) Age
  Age 16-24                      2.5%              2.4%              1.0
  Age 25-34                      17.9%             15.0%             1.2
  Age 35-44                      21.0%             17.2%             1.2
  Age 45-54                      22.8%             20.7%             1.1
  Age 55+                        35.7%             44.7%             0.8
Household Size
  HH Size: 1                     11.5%             17.7%             0.7
  HH Size: 2                     28.5%             30.3%             0.9
  HH Size: 3                     19.0%             17.7%             1.1
  HH Size: 4                     20.0%             16.6%             1.2
  HH Size: 5+                    21.0%             17.7%             1.2
Number of Kids
  No. of Kids: 0                 56.3%             61.4%             0.9
  No. of Kids: 1                 19.9%             16.6%             1.2
  No. of Kids: 2                 17.0%             13.4%             1.3
  No. of Kids: 3+                6.8%              8.6%              0.8
Hispanic HOH
  Yes                            85.0%             85.8%             1.0
  No                             15.0%             14.2%             1.1
Household Income
  < $25,000                      11.5%             18.2%             0.6
  $25,000 - <$50,000             21.9%             24.7%             0.9
  $50,000 - <$75,000             21.3%             20.5%             1.0
  $75,000 - <$100,000            16.9%             14.4%             1.2
  $100,000+                      28.5%             22.3%             1.3
⁴ For its ability to capture impressions from all devices, not just a sample, this measurement approach is referred to as 'census measurement.' See a full description of this method in "The big picture: technology to meet the challenges of media fragmentation" in this issue of the Nielsen Journal of Measurement.
Definition of co-viewing metrics

In this study, we define the OTT co-viewing rate as the
proportion of impressions that were viewed by two or more
viewers. That is, for a dimension d, the co-viewing rate is
expressed as:

\[ \text{OTT co-viewing rate}(d) = \frac{\text{OTT ad impressions in dimension } d \text{ viewed by two or more persons}}{\text{all OTT ad impressions in dimension } d} \]

The dimensions are demographic groups, defined for
instance by age and gender combinations (e.g., Males 18-24),
or by time periods (e.g., weekday, weekend, daytime,
evening). In the census data, each OTT ad impression is
recorded as a viewing transaction with a particular daypart
and the genre of the program that contained the ad.

Similarly, we define the TV co-viewing rate as the proportion
of viewing events that were viewed by two or more viewers⁵:

\[ \text{TV co-viewing rate}(d) = \frac{\text{TV viewing events in dimension } d \text{ with two or more viewers}}{\text{all TV viewing events in dimension } d} \]

Here, TV viewing events are aggregates of minute-level TV
data collected via meters in the NPM panel. The aggregation
is based on program, originator, household, viewing date,
daypart, and the age and gender of household members.
Each viewing event therefore corresponds to the viewing
of a program at the daypart level by a member of the panel
household for a particular program that aired live in the last 7
days.

The following limitations should be considered when
comparing the OTT and TV co-viewing rates: First, the
OTT data is based on ad exposures, whereas the TV data
is based on viewership of TV programs; second, the time
periods selected for OTT and TV are largely overlapping,
but they're not an exact match; third, we did not control for
the moderating effects of content type, timing, and genre;
and finally, the OTT data we used in this study is restricted
to Roku data, and to a limited number of campaigns run
on the Roku platform. Still, we believe that the data and
metrics are sufficiently well aligned to provide a good basis
of comparison for this exploratory analysis into the common
viewing patterns and behaviors of U.S. media consumers.

⁵ This definition of co-viewing for TV was created specifically for this research in order to closely align with the OTT definition. It differs from the definition of co-viewing used by Nielsen's traditional reporting systems (such as NPOWER).
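A minimal sketch of these rate computations, assuming each record has already been reduced to a (dimension, number-of-viewers) pair; the record layout is an assumption for illustration:

```python
from collections import defaultdict

# Co-viewing rate per dimension: the share of records viewed by two or
# more persons. In the study, viewer counts come from census impressions
# (OTT) and aggregated NPM viewing events (TV); here they are toy data.

def co_viewing_rates(records):
    """records: iterable of (dimension, number_of_viewers) pairs."""
    totals = defaultdict(int)
    co_viewed = defaultdict(int)
    for dimension, viewers in records:
        totals[dimension] += 1
        if viewers >= 2:
            co_viewed[dimension] += 1
    return {d: co_viewed[d] / totals[d] for d in totals}

print(co_viewing_rates([("prime time", 1), ("prime time", 3),
                        ("late fringe", 1), ("late fringe", 1)]))
```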
RESULTS

Overall co-viewing rate on OTT and linear TV

We measured an overall OTT co-viewing rate of 34%,
compared to 48% for linear TV. This difference isn’t entirely
surprising. OTT devices offer consumers many more viewing
options than linear TV does, and while that diversity gives
people a chance to find a program they can enjoy as a
group, it also gives them the option to pick a program that’s
uniquely tailored to them—and no one else in the household.
Linear TV also has the edge when it comes to live television
events (e.g., sports, awards shows, political debates, etc.)
that tend to be viewed with others.
Whether on TV or OTT, most of the co-viewing activity (70%
for TV and 76% for OTT) involves only two persons.
FIGURE 2: OTT AND TV CO-VIEWING DISTRIBUTION

               OTT    TV
1 viewer       66%    52%
2 viewers      26%    34%
3+ viewers     8%     14%
Co-viewing     34%    48%

Source: Nielsen Roku OTT measurement (15 campaigns; 18M impressions; Nov 2015 – April 2016). TV co-viewing rates are from Nielsen TV measurement data from OTT-connected TV sets; Dec 2015 – May 2016.
Co-viewing by daypart
There are certain parts of the day (and the week) that are
more conducive to co-viewing for linear television: prime
time and weekend daytime are good examples. Early fringe
leads up to prime time with solid co-viewing activity, but
co-viewing drops substantially in late fringe (night owls tend to
watch TV alone). Finally, co-viewing is at its lowest during the
week (both in the morning and afternoon) when one or more
members of the household are likely to be away at work or at
school.
Co-viewing activity for OTT follows the same patterns for each
daypart. The gap between OTT and linear TV is at its widest
during the day (both weekdays and weekends), which seems
to be a time when people are more likely to stream alone. The
narrowest gap between OTT and TV co-viewing is during late
fringe (33% for OTT vs. 40% for TV). Night owls might watch
regular TV alone, but an OTT device boosts their chances to
have some company.
FIGURE 3: OTT AND TV CO-VIEWING BY DAYPART, TIME OF DAY AND DAY OF WEEK
[bar chart: OTT vs. TV co-viewing rates for weekday daytime (Mon-Fri), early fringe (Mon-Sat 6-8pm and Sun 6-7pm), prime time (Mon-Sat 8-11pm and Sun 7-11pm), late fringe (Mon-Sun 11pm-6am), weekday morning (Mon-Fri 6-10am), weekend daytime (Sat-Sun 6am-6pm), daytime (6am-6pm), evening (6pm-midnight), weekday and weekend]

Source: Nielsen Roku OTT measurement (15 campaigns; 18M impressions; Nov 2015 – April 2016). TV co-viewing rates are from Nielsen TV measurement data from OTT-connected TV sets; Dec 2015 – May 2016.
Co-viewing by age
Children co-view much more than the rest of the population
(see figure 4). In fact, 70% of all the viewing done by children
of age 2-12 is done with someone else (a friend, a parent),
regardless of whether the viewing is done on an OTT device
or not. The co-viewing rate (OTT or not) is still well above
50% for teenagers (age 13-17). After a slight drop for people
of age 18-20, the linear TV co-viewing rate climbs back up
progressively for people in their 20s, and then declines steadily until it reaches the 40% mark around age 45.
For OTT, the drop is much more substantial at age 18, and co-viewing continues to drop for people in their 20s. It then stabilizes and gets reasonably close to TV levels for people who are 45 or older. At its widest (for people in their late 20s), the gap between TV and OTT is 26 percentage points; in fact, viewers in that age group are only half as likely to co-view on OTT as they are on regular television.
The ‘bulge’ between the curves from age 18 to 45
is particularly interesting. These are the ages when people are
most likely to be active (in school and in the workforce), and
thus have schedules that are more individualized. But these
are the years when people are at their most social too. It would
seem that people in that age range are using their OTT devices
for some ‘me-time,’ and that with age, their OTT behavior
comes back in line with how they’re watching linear television.
FIGURE 4: CO-VIEWING RATES BY AGE AND PLATFORM
[Line chart of OTT and TV co-viewing rates (0-80%) across age groups from 2-12 through 65+.]
FIGURE 5: CO-VIEWING RATES BY AGE AND GENDER
[Line chart of OTT and TV co-viewing rates (0-80%) by gender within each age group, from M13-17/F13-17 through M65+/F65+.]
Source: Nielsen Roku OTT measurement (15 campaigns; 18M impressions; Nov 2015 – April 2016)
Source: TV co-viewing rates are from Nielsen TV measurement data from any OTT connected TV sets; Dec 2015 – May 2016
When we take that analysis one step further and examine age
groups by gender (see figure 5), we notice that women in
general tend to co-view regular TV more than men, but that’s
not necessarily the case with OTT. Women co-view OTT as
much as men across all age groups, and perhaps even more
so among teenagers.
Co-viewing by household size
The more, the merrier: It would seem natural for co-viewing
to increase as a function of household size. After all, a
person living alone isn’t likely to have as many co-viewing
opportunities as someone living in a household with two
parents, three kids and two grandparents.
There is, however, a dip in co-viewing for people living in
households of size 3. It’s not so much a dip for regular TV
as it is an absence of what might have been expected to be
an increase, but for OTT it’s a discernible dip, from 42% (for
viewers in households of size 2) down to 34%. We looked at
these households more closely and found that they are mostly
single parent (mom/dad) households with two kids. One
potential theory for a lower overall co-viewing rate for these
households is that it’s simply due to an absence of adult co-viewing. Another theory stems from previous findings that media consumption in single-parent homes is different from that in two-parent homes6. It’s possible that viewing in these
homes is more individualized in nature due to less parental
mediation and involvement. As a result, viewers in these
homes are more likely to watch content that’s more aligned
with their own individual tastes. The fact that the dip is more
pronounced for OTT than linear TV seems to reinforce that
hypothesis.
FIGURE 6: CO-VIEWING RATES BY HOUSEHOLD SIZE
[Bar chart of OTT and TV co-viewing rates for household sizes 2, 3, 4 and 5+. OTT co-viewing dips from 42% (size 2) to 34% (size 3) before rising again in larger households; TV co-viewing holds at 45-46% for sizes 2 and 3 and reaches 60% for households of 5+.]
Source: Nielsen Roku OTT measurement (36 campaigns; 112M impressions; May 2016 – July 2016)
Source: TV co-viewing rates are from Nielsen TV measurement data from any OTT connected TV sets; Dec 2015 – May 2016
6 Gentile, D. A., & Walsh, D. A. (2002). A normative study of family media habits. Applied Developmental Psychology, 23, 157–178.
Co-viewing by number of kids in the household
Co-viewing is a direct function of the number of children
in the house. For linear TV, the rate increases by nearly ten
points with each child: from 39% in households with no kids
to 48% in households with one kid, 56% if there are two kids
around the house and 65% for three or more kids.
As with most comparative analyses in this paper, the OTT
rates are below their TV counterparts, but there’s a noticeable
difference here: the OTT co-viewing rate for households with
two kids is only marginally better than that for households
where only one kid is present (41% vs. 39%)—and a full 15
percentage points lower than the 56% TV benchmark for that
group. This is in line with the observation we made earlier
that single-parent households with two kids seem to exhibit
more personal viewing patterns.
FIGURE 7: CO-VIEWING RATES BY NUMBER OF KIDS

                   NO KIDS    1 KID    2 KIDS    3+ KIDS
  OTT CO-VIEWING      28%      39%       41%        53%
  TV CO-VIEWING       39%      48%       56%        65%
Source: Nielsen Roku OTT measurement (36 campaigns; 112M impressions; May 2016 – July 2016)
Source: TV co-viewing rates are from Nielsen TV measurement data from any OTT connected TV sets; Dec 2015 – May 2016
Co-viewing by content type
In figure 8, we illustrate co-viewing rates for a number of
popular programming genres. Notice that for the most part, co-viewing remains in a 40-50% range for TV and a 30-40% range for OTT, regardless of program genre, with one notable exception: children’s programming, for which TV co-viewing hits a high mark of 60% while OTT co-viewing stands at 38%, one of the highest co-viewing rates for OTT, but far behind its TV counterpart.
Since children co-view more than adults, it’s not surprising
to see children’s programming be one of the most co-viewed
genres on television, but we were expecting a higher OTT
co-viewing rate. It is possible that kids are still watching
children’s programming together when that programming is
on linear TV (e.g., on Saturday mornings), but are using the
OTT devices in their homes to watch different content. This is
an area for further exploration.
FIGURE 8: CO-VIEWING RATES BY PROGRAMMING GENRE
[Bar chart of OTT and TV co-viewing rates across a dozen genres (general documentary, feature film, children’s, science fiction, comedy variety, suspense/mystery, popular music, general variety, news, adventure, sports commentary, general drama). Most TV rates fall in the 40-50% range and most OTT rates in the 30-40% range; children’s programming is the outlier, at 60% on TV vs. 38% on OTT.]
Source: Nielsen Roku OTT measurement (15 campaigns; 18M impressions; Nov 2015 – April 2016)
Source: TV co-viewing rates are from Nielsen TV measurement data from any OTT connected TV sets; Dec 2015 – May 2016
IS OTT HELPING OR HURTING CO-VIEWING?
The impact of OTT devices on co-viewing behavior is
complex. On one hand, those devices offer many new
opportunities for people to find content that they can watch
together. But they also make it very easy to isolate oneself. It
wouldn’t be wrong to summarize our findings this way: OTT
co-viewing is generally lower than TV co-viewing, and it follows
the same patterns (kids do it more, it increases with household
size, it’s larger in the evening than in the daytime, etc.).
But we also found evidence that points to measurable differences: certain household dynamics (e.g., a single parent with two children) have a peculiar co-viewing profile that might be exaggerated by OTT activity; some age groups (18 to 45) seem to use OTT devices disproportionately for individual viewing; children’s programming isn’t co-viewed on OTT as much as one might expect; and OTT activity during daytime hours appears to be more personal.
The methods we developed for this research allow us to study co-viewing, but more fundamentally they let us put a face on OTT viewers, whether they’re co-viewing or not, and compare their behavior to that of regular TV viewers. This is of particular importance to advertisers eager
to use the OTT ecosystem to reach new and existing market
segments as efficiently as possible. Is OTT helping or hurting
co-viewing? We have some preliminary answers but not
the full picture yet. As OTT usage continues to grow, we’re
looking forward to building on the research and methodology
developed for this paper to improve our understanding of
OTT’s impact on society.
USING MACHINE LEARNING TO PREDICT FUTURE TV RATINGS
BY SCOTT SEREDAY AND JINGSONG CUI, Data Science, Nielsen
INTRODUCTION
Nielsen’s TV ratings have been a mainstay of the U.S.
media industry for over half a century. They’re used to
make programming decisions and have become part of our
popular culture1, but they are also the basis for billions of
dollars’ worth of advertising transactions every year between
marketers and media companies. They help measure the
success of TV shows, verify that their audience size and
composition are delivering against strict media-buy targets,
and provide a basis for make-goods if the numbers come up
short. From that point of view, TV ratings are metrics that
measure the past, or at best the present, of TV viewing.
But ratings are also used to predict the future. They set
expectations and affect programming decisions from one
season to the next, and they help set the cost of advertising
(advertising rates) well in advance of when a program
goes on the air. In the U.S. for instance, TV networks sell
the majority of their premium ad inventory for the year at
the “upfront,” a group of events that occur each spring. For each network, the upfront is a coming-out party
to introduce new programs and build up excitement for the
upcoming season, but behind the curtains, it’s very much
a marketplace for advertisers to buy commercial time on
television well ahead of schedule. Upfronts are effectively a futures market for television programming, and they provide networks with some stability in their financial forecasts.

1 See the weekly top-10s here: http://www.nielsen.com/us/en/top10s.html
As a result, media companies have invested considerable
effort to project future ratings. Reliable forecasts can
help industry players make faster, more accurate and less
subjective decisions, not just at the upfront, but also in
the scatter planning2 that occurs during the season. And if
reliable forecasts can be produced through an automated
system, they can be used to enable advanced targeting on
emerging programmatic TV platforms.
But ratings projections are challenging: They require a steady
inflow of rich, granular, reliable data, and the ability to adapt
and incorporate new data to account for the latest changes in
viewing behavior. Viewers are increasingly consuming media
on different devices and through different channels. Their
viewing is also increasingly likely to be time-shifted to work
conveniently around their own schedule. These changes are
making predictions more difficult. More difficult, but also
more crucial to the evolving TV ecosystem.
In this paper, we discuss a recent pilot project where Nielsen
worked with one of our key clients to innovate and improve
the practice of ratings projections. Through collaboration,
we aimed to develop a more accurate (better performance
metrics), more efficient (better cycle time) and more
consistent (reduced variability) system to improve their
existing practice and lay the foundation for an automated
forecasting infrastructure.
CHOOSING THE RIGHT DATA FEATURES
What were the parameters of this project? We were asked
to make projections for several TV networks. These
projections needed to include live and time-shifted program
and commercial viewing for more than 40 demographic
segments. They also needed to be supplied for each day of
the week and hour of the day. For upfront projections, we
were limited to utilizing data through the first quarter of the
year (Q1), because of the timing of the upfront, and needed
to project ratings for the fourth quarter (Q4) of that year all
the way to Q4 of the following year.
In every predictive modeling project, the type and quality of
the input data have a very significant impact on the success
of the model. We considered several factors during the design
stage to choose the most appropriate and effective data for
this research. It’s important to point out how some data,
while promising and completely suitable for other types of
research studies, can be inadequate or inefficient for our
purpose.
Consider, for example, the enthusiasm that a top executive
might have for a new program on the lineup. That enthusiasm
is hard to quantify. It introduces bias (the executive might
have played a larger role in bringing that program to life), and
even if we were able to express it mathematically, we couldn’t
obtain the same information for all the other programs on the
air. Domain expertise, in the form of subjective insights, can
be invaluable to help guide the design of a predictive model
and validate its results, but it often falls short as a direct
input variable.
We also needed to ensure that the data would be available on
a timely basis—so that it could be digested and utilized in the
execution of any future projections. Obviously, post-hoc data
(such as post-premiere fan reviews) can be highly indicative
of the enduring success of a program, but since it arrives after the program airs, it’s useless for projection purposes.
Finally, in order to develop a process that can scale to handle
all channels, programs, and dayparts, we decided to only
use data that is already stored and managed with some
level of automation in current practice. Future programming
schedules, for instance, could most certainly boost the
accuracy of our models, but they’re not currently standardized
nor universally available.
In the end, we decided to rely almost entirely on historical ratings data as input to our forecast model. Fortunately, at Nielsen, we’ve been collecting top-quality ratings data for decades, with rich, consistent and nationally representative demographics information. We included standard commercial and live ratings data in our input variables, as well as time-shifted viewing, unique viewers (reach), average audiences (AA%), persons or households using TV (PUT/HUT), and various deconstructed cuts of data. To supplement the TV ratings, we looked at ratings from Nielsen Social, marketing spend (from Nielsen Ad Intel) and other available program characteristics. Fig. 1 highlights some of the data we evaluated for upfront and scatter predictions:
2 Scatter Planning refers to a small percentage of ad inventory that is reserved by networks for last-minute use.
FIGURE 1: DATA VARIABLES EVALUATED FOR UPFRONT AND SCATTER PREDICTIONS

PROGRAM CHARACTERISTICS
Description: Known elements to assess and categorize a show
Example data: Genre; air date/time
Rationale: Differences in characteristics impact ratings

PROGRAM PERFORMANCE
Description: Performance on measurable dimensions
Example data: Historic ratings
Rationale: Past performance indicative of future ratings

PROMOTIONAL SUPPORT
Description: Investment in driving awareness among audience
Example data: Marketing spend; on/cross-air promos
Rationale: Greater promotion/spend lifts ratings

AUDIENCE ENGAGEMENT
Description: Audience interest and commitment to a show
Example data: Television Brand Effect
Rationale: Higher intent to watch/sustained engagement lifts ratings

SOCIAL/ON-LINE BEHAVIOR
Description: Social media information
Example data: Nielsen Social Content Ratings
Rationale: Inbound social media reflects program popularity and engagement
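As a loose illustration of how historical ratings like those in Fig. 1 might be turned into model inputs, the sketch below derives lagged features from a hypothetical quarterly ratings table; the column names and lags are assumptions for illustration, not the project’s actual feature set.

```python
# Hypothetical lagged-feature construction from a quarterly ratings table
# (one row per network/demo/hour-block/quarter with an AA% value).
import pandas as pd

def make_features(ratings: pd.DataFrame) -> pd.DataFrame:
    df = ratings.sort_values("quarter").copy()
    grp = df.groupby(["network", "demo", "hour_block"])["aa_pct"]
    df["aa_prior_qtr"] = grp.shift(1)    # most recent observed quarter
    df["aa_prior_year"] = grp.shift(4)   # same quarter one year earlier
    df["aa_yoy_trend"] = df["aa_prior_qtr"] - df["aa_prior_year"]
    return df.dropna(subset=["aa_prior_qtr", "aa_prior_year"])
```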
USING EXPLORATORY ANALYSIS TO GAIN INSIGHTS
It’s always a good idea to explore the data before building
a model. This preliminary analysis doesn’t need to be very
sophisticated, but it can be crucial to reveal the rich dynamics
of how people have watched TV in the past, and it can help
highlight some important and interesting factors that will
influence our final projections.
Fig. 2, for example, confirms that among the networks that are part of this project, primetime viewing is still by far the most popular daypart for television consumption. Not surprisingly, weekend usage in the daytime is higher than weekday usage. And over the course of the past five years, the overall percentage of persons watching traditional linear television has been trending downward. Note as well the seasonality of the metric.
In Fig. 3, we can see the differences in usage level by age
and gender for those same networks, with older viewers
much more likely to watch TV than younger generations, and
women in each age group typically watching more than their
male counterparts.
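As a sketch of how such an exploratory view might be produced, assuming a hypothetical long-format table of quarterly usage levels by daypart (all names illustrative):

```python
# Plot quarterly usage levels, one line per daypart, as in Fig. 2.
import matplotlib.pyplot as plt
import pandas as pd

def plot_usage_trend(usage: pd.DataFrame) -> None:
    # usage: long-format columns quarter, daypart, pct_viewing
    for daypart, sub in usage.groupby("daypart"):
        plt.plot(sub["quarter"], sub["pct_viewing"], label=daypart)
    plt.xlabel("quarter")
    plt.ylabel("% of persons using linear TV")
    plt.legend()
    plt.show()
```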
FIGURE 2: PERCENTAGE OF PERSONS USING LINEAR TV FROM 2011 TO 2016 (PERSONS 25-54, LIVE+7)
[Line chart, Q3 2011 through Q2 2016, one line per daypart: prime, Sat-Sun daytime, M-F daytime and early morning; y-axis 0-45%.]
FIGURE 3: PERSONS USING LINEAR TV BY AGE AND GENDER
[Line chart, Q3 2011 through Q2 2016, one line per group: household, F65+, M65+, F25-64, M25-64, F18-24 and M18-24; y-axis 0-45%.]
In another example (Fig. 4), preliminary analysis of time-shifted data for two specific networks (one broadcast network and one cable network) has allowed us to understand the rise of time-shifting activity over the years, and how much less seasonal that behavior has been in primetime for programs on the cable network, compared to programs on the broadcast network.

Those are just a few examples, but they illustrate the type of exploratory analysis that we performed to fully appreciate the scope, direction and overall quality of the data that we wanted to feed into our models.

A DEEPER DIVE INTO OUR METHODOLOGY

In developing our projections, we tested many models and machine learning algorithms, including linear regression, penalized regression, multiple adaptive regression splines, random decision forests, support vector machines, neural networks and gradient boosting machines (GBM)3. While each method has its own advantages and disadvantages, in the end, the GBM method (specifically, the xgboost optimized library) proved to offer the best combination of accuracy and scalability for our project.

Gradient boosting is typically an ensemble (a model comprised of many smaller models) that utilizes many decision trees to produce a prediction. The illustration in Fig. 5 shows a simplified example of how an individual tree might work, and Fig. 6 shows how multiple trees might be aggregated in an ensemble to make a prediction.

We opted for xgboost, a recent variant of GBM, because it penalizes overly aggressive models, i.e., models that fit the historical results too perfectly, a common mistake called “overfitting.” Xgboost has taken the competitive prediction world by storm in recent years and frequently proves to be the most accurate and effective method in Kaggle4 competitions. It’s notably fast, scalable and robust.
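As a minimal sketch, fitting such a model with the xgboost library might look like the following; the feature matrix, target and every parameter value here are illustrative assumptions, not the project’s actual configuration.

```python
# Illustrative xgboost configuration; none of these values are the
# project's actual parameters.
from xgboost import XGBRegressor

def fit_ratings_model(X_train, y_train):
    model = XGBRegressor(
        n_estimators=500,    # number of boosted trees in the ensemble
        max_depth=4,         # shallow trees keep each tree simple
        learning_rate=0.05,  # shrink each tree's contribution
        reg_lambda=1.0,      # L2 penalty against overly aggressive fits
        subsample=0.8,       # row subsampling as extra regularization
    )
    return model.fit(X_train, y_train)
```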
FIGURE 4: RISE IN TIME-SHIFTED ACTIVITY FOR TWO SEPARATE NETWORKS
[Line chart, Q3 2011 through Q2 2016: time-shifted activity for Cable Network B and Broadcast Network A; y-axis 0-60%.]
3 A discussion of the merits of each of these methods is beyond the scope of this paper. Interested readers will find a useful comprehensive resource in The Elements of Statistical Learning (by Hastie, Tibshirani, and Friedman).
4 Kaggle is a crowdsourcing platform where data mining specialists post problems and compete to produce the best models. More information can be found at kaggle.com.
FIGURE 5: A SIMPLE EXAMPLE OF A DECISION TREE
[Diagram of a single tree (“tree 1”) that splits on conditions such as prior rating > 1.1, prior rating > 0.5, prime time, Sunday, and last year’s rating > 1.0, ending in leaves with predicted ratings between 0.4 and 1.3.]
FIGURE 6: COMBINING MULTIPLE TREES INTO AN ENSEMBLE MODEL
[Diagram: tree 1, tree 2 and tree 3 predict ratings of 0.1, 0.3 and 0.2 respectively; the ensemble averages them into a final prediction of 0.2.]
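Read as code, Figs. 5 and 6 amount to something like the toy functions below. The tree’s exact topology is our assumption from the diagram, and real GBM trees are learned from data, not hand-written.

```python
# Toy version of Fig. 5: one hand-written decision tree.
def tree_1(show: dict) -> float:
    if show["prior_rating"] > 1.1:
        return 1.3 if show["last_year_rating"] > 1.0 else 1.2
    if show["prior_rating"] > 0.5:
        return 0.8 if show["prime_time"] else 0.6
    return 0.4

# Toy version of Fig. 6: average the predictions of several trees.
def ensemble_predict(show: dict, trees) -> float:
    preds = [tree(show) for tree in trees]  # e.g., 0.1, 0.3, 0.2
    return sum(preds) / len(preds)          # average: 0.2
```

Note that Fig. 6 shows a simple average; in gradient boosting proper, each successive tree is fit to the errors of the trees before it and its output is added to theirs rather than averaged.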
SPLITTING THE DATA TO TRAIN A WELL-BALANCED MODEL

We restricted our data to only that which would be available when projections are typically made. Since the upfront occurs in May and June, it’s technically possible for upfront projections to include some data from Q2, but for testing purposes, we decided to use only data through Q1 (and all relevant data from the preceding years, of course).

To be objective in assessing the accuracy of our projections, it was important to implement a fair and reliable process to develop our model and test our results along the way. Fig. 7 illustrates the iterative process we used to accomplish that goal.

Here are the main steps (see the sketch after this list):

• Our algorithm randomly split the data into training and cross-validation testing sets. The model learned by making predictions based on the training set, testing those predictions on the cross-validation testing set, and repeating the process multiple times using different parameters. The final parameters were selected with consideration to the results of the cross-validation, helping limit the tendency to overfit the model to the training set.

• We also held out some data that was never used in the buildup process, but served as another layer to test the validity of our model and protect against overfitting. Holdout validation testing data provides an additional measure of quality control in the overall process. Models still tend to overfit even when using cross-validation. In order to choose the parameters most appropriate to apply to a new dataset, it is usually better to choose results that are slightly conservative, even for the testing dataset. The holdout validation testing set helped us achieve that balance.

• Once everything checked out and the final parameters were set, we retrained the model using the best parameters to leverage the most complete information available. We then ran it on a new dataset and compared its performance to client projections, focusing on key demographic groups.
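A sketch of that discipline, under assumed split sizes and an assumed parameter grid, using scikit-learn utilities alongside xgboost (the paper does not specify its exact tooling):

```python
# Hold out data first; tune by cross-validation on the remainder.
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBRegressor

def tune_with_holdout(X, y):
    X_work, X_hold, y_work, y_hold = train_test_split(
        X, y, test_size=0.2, random_state=0)  # holdout never used in tuning

    search = GridSearchCV(
        XGBRegressor(),
        param_grid={"max_depth": [3, 4, 6], "learning_rate": [0.03, 0.1]},
        scoring="neg_mean_absolute_error",
        cv=5,  # repeated train/validation splits
    )
    search.fit(X_work, y_work)

    # Final check on data the search never saw; if acceptable, retrain
    # with the chosen parameters on all available data.
    holdout_error = abs(search.best_estimator_.predict(X_hold) - y_hold).mean()
    return search.best_params_, holdout_error
```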
FIGURE 7: AN ILLUSTRATION OF THE ITERATIVE PROCESS USED IN THE PROJECT
[Flowchart: start process → get data (excluding the holdout set) → generate random split of data between training and cross-validation testing sets → utilize xgboost and the training set to develop the model → compute cross-validation error → when the stopping criteria are met, get the holdout set and compute the holdout error → if the accuracy level is acceptable, test on new data against client projections.]
FIGURE 8: TRAINING ENOUGH WITHOUT OVERFITTING
[Chart: prediction error (low to high) against model complexity (low to high) for the training dataset and a test dataset, with regions marked under-trained, well-trained and over-fitted. Training error keeps falling as complexity grows, while test error bottoms out at the well-trained point and rises again once the model is over-fitted.]

We used cross-validation to build and evaluate our model. Cross-validation penalizes models that make predictions that fit too perfectly to past data, and thus are likely to reflect patterns that are too complex and unlikely to continue in the future. When training using cross-validation, we tried to find the point at which the model was able to capture important elements to make predictions, but ignored elements that were not powerful enough to offset the noise they created. The illustration in Fig. 8 can help visualize the point where a model starts to be too well trained for it to perform adequately on a new test dataset.

MEASURING THE PERFORMANCE OF OUR MODELS

As we evaluated our results, we focused on the following criteria (a sketch of the two error metrics follows this list):

• How close were our projections? We relied on a variant of WAPE (weighted mean absolute percentage error) to evaluate the accuracy of our models. WAPE is a statistical measure that helped us ensure that the way our model fit new data was reasonably consistent with how it fit historical data. We used WAPE to compare our model’s accuracy to our client’s model at two different levels. The first was at the channel level, which placed little emphasis on the ability to distinguish between programs, but was focused on getting the high-level trends right, such as overall TV viewership for each channel. We also compared WAPE at the hour-block or program level. The hour-block level looked at the model’s ability to distinguish between shows, as well as its ability to understand the high-level effects that influence all shows.

• How much information did the model explain? The metric of choice for this component was R-squared. R-squared is a statistical measure that represents the percentage of variation the model is able to explain. Unlike WAPE, R-squared did not evaluate if the high-level trends were captured appropriately. It was far more concerned with the ability to distinguish between programs, and was used to help establish the root of success or failure in our model at a more granular level.

• Was the model helpful? In addition to the hard evidence presented by WAPE and R-squared, we needed to consider the practical implications of our process. For example, the model must be feasible for the client to implement. In addition, it should complement the client’s existing framework. We also needed to identify where our projections could be trusted and when it might be more reasonable to lean on in-house knowledge. Finally, the accuracy of the model needs to be consistent enough to be trusted in the first place.
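In its common textbook form, WAPE is the sum of absolute errors divided by the sum of actuals; the paper uses a variant it does not fully specify. Together with R-squared, the two metrics can be sketched as:

```python
# Common textbook forms of the two evaluation metrics.
import numpy as np
from sklearn.metrics import r2_score

def wape(actual: np.ndarray, predicted: np.ndarray) -> float:
    # Absolute errors, weighted by the size of the ratings being missed.
    return np.abs(actual - predicted).sum() / np.abs(actual).sum()

def variation_explained(actual: np.ndarray, predicted: np.ndarray) -> float:
    # R-squared: share of the variation in actuals the model explains.
    return r2_score(actual, predicted)
```

Scoring the same projections once on channel-level aggregates and once on hour-block observations separates trend accuracy from program-level discrimination, mirroring the two comparisons described above.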
Our model was effective and produced several interesting findings. It held up close to expectations in terms of accuracy when evaluated using future testing dates. In addition, when we computed network performance using granular hour-block level data (we predicted 192 such observations for each network), our model’s improvement over the client model was substantial for almost every network (see Fig. 9).
However, when we used aggregate network-level data (rather
than hour-block level data) in our model, the results of our
projections were far less clear. For some networks, we were
closer, but for others, the client’s model was more accurate in
projecting the overall rating (see Fig. 10).
FIGURE 9: IMPROVEMENTS OVER CLIENT’S MODEL USING LOW-LEVEL OBSERVATIONS
[Bar chart, network by network, improvement 0-80%. Average improvement for R-squared: 41%; average improvement for WAPE: 16%.]
FIGURE 10: COMPARISON OF MODEL PERFORMANCE USING HIGH-LEVEL OBSERVATIONS
[Bar chart, network by network, WAPE 0-30%. Average WAPE error for client model: 9.1%; average WAPE error for Nielsen model: 8.3%.]
Why did the results look so different when rolled up to the
network level? One possibility is that the client’s model was
able to capture unique in-house knowledge that could explain
high-level effects that might have influenced all programs. It’s
also important to remember that a prediction at the network
level relies on fewer prediction points, and might as a result
be less reliable to begin with. We are probably very limited as
to the conclusions that can be gleaned from the model at that
level.
What is more interesting, however, is that when we look into the granular results for each network, we see some indications as to how our model and the client’s projections might be combined to complement each other. First, we found that a model consisting of 90% of our projection and 10% of our client’s projection outperformed each model individually in the two quarters that we tested. This was not isolated to just one case either: in fact, among the 11 regressions we ran for each of the channels, 10 suggested that both the client’s and Nielsen’s models should contribute to a combined rating. This 90%/10% balance may not be the most robust estimate going forward (it should be validated over time), but it is certainly evidence that each model contributes some unique knowledge.
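One way such a blend can be estimated, sketched under the assumption that the weight is fit by least squares on realized ratings (the paper does not detail its regression setup):

```python
# Fit a single blend weight w for: actual ~ w*nielsen + (1-w)*client.
import numpy as np

def blend_weight(actual, nielsen_pred, client_pred):
    diff = np.asarray(nielsen_pred, dtype=float) - np.asarray(client_pred, dtype=float)
    resid = np.asarray(actual, dtype=float) - np.asarray(client_pred, dtype=float)
    w = np.dot(resid, diff) / np.dot(diff, diff)  # least-squares solution
    return float(np.clip(w, 0.0, 1.0))

def blended(w, nielsen_pred, client_pred):
    return w * np.asarray(nielsen_pred) + (1 - w) * np.asarray(client_pred)
```

With the results reported above, the fitted weight would land near 0.9.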
Furthermore, there are some patterns that seem to emerge
when we look at how each model complements the other
from network to network. The network where a regression
suggested the client’s model should contribute the most was rebranded and relaunched just five months after the upfront. This was somewhat expected given our prior assumption that the client’s in-house knowledge should have more value when more significant changes are taking place. Lending further support to this theory, the network where a regression suggested that the client’s model should have the second-highest weight was rebranded and relaunched just before the upfront.
TOWARD A HYBRID MODEL
In the end, we were able to put together a robust model to
predict future ratings, based on modern machine learning
principles, and that model was particularly strong when
the input data—and projected ratings data—was granular.
However, for channels where we suspected in-house
knowledge could play a key role, we found that the client’s
in-house model performed reasonably well. We believe that
a hybrid model (one that can combine the raw power of our
approach with custom insights) might be the best approach
going forward.
There are additional benefits to combining forces. The time
and energy required to generate thousands of projections
are often beyond the resources of individual market research
departments, especially for the lower-rated programs and
time slots. An automated projection system can take care of
the vast majority of projection estimates, and allow in-house
experts to focus on the more important programs and factor
in additional insights for those estimates. An in-house expert
can also quickly evaluate the impact of unusual events and
identify specific projections that are likely to go astray.
Of course, this doesn’t mean that we shouldn’t try to improve our predictive model: we might add more demographic characteristics to the model (e.g., income, location of residence, internet usage); considering how much better our model performs with granular data than with high-level data, we could take the analysis one step further and use respondent-level data; we might even add more competitive viewing data into the mix.
But the human element will always play a key role in the
interpretation, so we might as well include that human
element in the modeling process. The media landscape is
changing fast, and those who are able to merge algorithms
and intuition will be best positioned to anticipate the coming
trends and capitalize on the opportunities.