Web Relevance Engine

Inside the Technology:
Web Relevance Engine
Table of Contents
Introduction
The Web Relevance Engine by the Numbers
Inside the Technology
How the Algorithms Work
Data
Algorithms
Natural Language Processing
People
Products
Demand
Applications
Conclusion
What is the Web Relevance Engine?
The Web Relevance Engine (WRE) is an intelligent machine that learns continuously and develops a deep
understanding of Web content and Web users’ intent. It relies on machine-learning and natural language processing
to create a discovery experience that is relevant, personalized and optimized for connecting consumers with the
information and products that they’re looking for. This white paper explores the WRE’s underlying technology.
For further reading on how companies are leveraging the WRE, visit: bloomreach.com/resources/
Introduction
The Internet is a key tool in everyday life and consumers expect it to
perform reliably -- the same way they expect the lights to come on when
they flip a switch or the water to stream out when they turn on a faucet.
Living in a world run on mobile devices and navigated through search
engines, texts, email alerts, product recommendations and
advertisements, consumers expect to effortlessly find what they are
looking for online when they describe those things in their own terms.
The problem for those who sell products or offer content on the Web is
that no two people are alike.
One person’s tight, red dress might be another’s crimson, form-fitting
Herve Leger. So, how does a website owner deliver on consumers’
expectations that their online searches will result in recommendations
that are relevant to them? How does a retailer understand users’ intent
well enough to tailor the search experience down to the individual?
[Figure labels: Sequin, Color, Belt, Frilly, Stretchy, Long]
Each visitor takes a different approach when they seek to discover
products or information on the Web. But each wants a memorable
discovery experience; one in which the site search and navigation feel
individually tailored to him or her; one in which an expressed interest
in one product yields recommendations of complementary or related
items. Each uses his or her own terms to describe products. Some prefer
mobile. Others prefer a desktop.
Some would rather navigate a site using the facets and filters that
appear on a landing page. Others rely on the search box on the site.
But every visitor wants to be presented with results, products and
recommendations that are relevant to them. How does a website owner
learn from every interaction on his or her site and the way consumers
use language on the greater Web? How can a team observe and act
on more than 100 million consumer interactions a day or process the
number of Web pages, another 100 million a day, that it needs to optimize a
customer’s discovery experience?
BloomReach approaches this challenge with its Web Relevance Engine,
a machine-learning technology that powers the Personalized Discovery
Platform, a set of tools that:
• guides consumers to the content that they seek.
• powers the search, navigation and personalization on sites,
  providing consumers with personalized results, categories and
  recommendations, while creating a connection among all the digital
  devices a single shopper might use during a search.
• gives merchants and site owners actionable insights into improving
  content performance based on the way visitors to their sites
  interact with content on the site.
The Web Relevance Engine by the Numbers
Number of Pages Processed Per Day: 100 million
The Amount of Data Processed Per Day: 5-10 terabytes
Number of Hadoop Jobs Per Day: 5,000-6,000
Number of Synonym Pairs in the WRE: 10 billion
Number of Consumer Interactions Seen Per Day: 150 million
The Number of Different Colors the WRE Understands: 1,077
Inside the Technology
The Web Relevance Engine means that for the first time website owners
are able to deliver relevant experiences for every visitor at scale, while
keeping up with their ever-changing inventory and the constantly
evolving ways that consumers express their intent. The engine sits at
the center of efforts to acquire customers and provide memorable and
engaging experiences for them, while informing data-driven strategic
decisions. This white paper will explain how that’s possible
by exploring in depth the way the WRE comes to
deeply understand people, products and Web-wide demand -- and the way
the three are intertwined. But first, let’s look at the data
and technology that power the WRE.
The garbage in, garbage out axiom is
fitting when it comes to platforms that
power product discovery. Sadly, many
of those platforms don’t even come
with garbage. They’re essentially just
empty bins that you’ll have to fill with your own data. When building the
WRE, we focused on gathering data from a variety of sources, some of
which benefit from network effects that a single e-commerce enterprise
could not collect on its own. Other data in the WRE is either difficult to
gather or difficult to analyze and act on in real-time at scale, especially
when that action relies on data from multiple sources.
The principal data used in the WRE comes from three
sources: pixels, feeds and crawling. This data is the
key to the WRE’s deep understanding of people,
products and demand, the interplay of which
we’ll cover in a later section of this white
paper.
How the Algorithms Work
The Web Relevance Engine applies a series of algorithms to the gathered
data in order to create a meaningful discovery experience for consumers
on the Web. The WRE is able to algorithmically assemble a meaningful
relevant set in response to a query. Then it is able to rank the large
number of results in order of relevance, based on the user’s intent.
BloomReach’s WRE relies on a database of 10 billion synonym pairs and
other signals involving product attributes to match a given query to the
relevant set of products in our customers’ product catalogs.
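The matching step above can be pictured with a small sketch. This is not BloomReach's implementation: the synonym pairs, the catalog and the function names `build_synonym_index` and `relevant_set` are all assumptions made for illustration, and a real system would learn the pairs rather than list them by hand.

```python
from collections import defaultdict

# Hypothetical synonym pairs; the paper says the WRE holds about 10 billion.
SYNONYM_PAIRS = [("red", "crimson"), ("tight", "form-fitting")]

# Hypothetical catalog: product ID -> set of attribute terms.
CATALOG = {
    "dress-1": {"crimson", "form-fitting", "dress"},
    "dress-2": {"blue", "loose", "dress"},
}


def build_synonym_index(pairs):
    """Map each term to the set of terms it is paired with (symmetric)."""
    index = defaultdict(set)
    for a, b in pairs:
        index[a].add(b)
        index[b].add(a)
    return index


def relevant_set(query, catalog, index):
    """Return product IDs whose attributes cover every query term or a synonym."""
    terms = query.lower().split()
    matches = []
    for product_id, attributes in catalog.items():
        if all(({t} | index.get(t, set())) & attributes for t in terms):
            matches.append(product_id)
    return matches


INDEX = build_synonym_index(SYNONYM_PAIRS)
print(relevant_set("tight red dress", CATALOG, INDEX))
```

Here the query "tight red dress" reaches a product described as crimson and form-fitting even though none of the shopper's adjectives appear in the product data.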
Unlike many search engines, including the out-of-the-box
Solr product, the WRE’s reliance on machine learning and
natural language processing means that there is no need
to manually build a synonym list or to rely on a massive
global dictionary that provides a one-size-fits-all solution to
a problem that is not a one-size-fits-all challenge.
Not only that, but the WRE learns from every interaction. Results and
rankings continuously improve. Consider a query including the word,
“Frozen.” Not long ago the attribute would refer to the adjective for
something that is very cold. “Frozen food container” might be a relevant
result, depending on many other factors. But since Disney released the
animated princess movie “Frozen” in 2013, the attribute has a whole new
meaning. Now, for department stores and general merchandisers, “frozen”
is quite likely to be a reference to a product with a tie-in to the movie.
Once a relevant set is identified, the WRE goes to work
ranking the results in a many-layered process that applies
metrics to thousands or even millions of products, including
assessing the products’ performance in terms of revenue
per visit and the products’ performance in the context of
the search query entered. The WRE’s underlying algorithms
next turn to personalizing the results and rankings. Consider
a consumer who has a propensity for converting on
medium-size shirts. A shirt that ranks high on color and
other product attributes would be knocked down in rank if it
were not available in medium. Another relevant shirt would
move up the list.
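A minimal sketch of that ranking-then-personalizing idea follows. The scoring weights, the penalty factor, the normalized scores and the `Product` fields are assumptions chosen for illustration; the paper does not describe the WRE's actual formula.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    revenue_per_visit: float  # assumed normalized to 0..1
    query_score: float        # assumed performance for this query, 0..1
    sizes_in_stock: frozenset


def rank(products, preferred_size=None, w_revenue=0.5, w_query=0.5, penalty=0.5):
    """Sort by a blended score; demote items missing the shopper's usual size."""
    def score(p):
        base = w_revenue * p.revenue_per_visit + w_query * p.query_score
        if preferred_size and preferred_size not in p.sizes_in_stock:
            base *= penalty  # hypothetical demotion factor
        return base
    return sorted(products, key=score, reverse=True)


SHIRTS = [
    Product("oxford", 0.9, 0.8, frozenset({"S", "L"})),
    Product("polo", 0.7, 0.7, frozenset({"S", "M", "L"})),
]

print([p.name for p in rank(SHIRTS)])                      # no personalization
print([p.name for p in rank(SHIRTS, preferred_size="M")])  # medium-size shopper
```

Without a size preference the oxford shirt ranks first; for a shopper who tends to buy medium, it drops below the polo because it is not stocked in that size.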
Data
Pixel
A listening pixel is added to each page of all BloomReach customer
sites. The pixel enables the Web Relevance Engine to understand the
specific behavior of customers on that site in relation to its content.
For example, what is the path for an individual shopper on the site?
Did she start on the homepage or land on a category page via organic
search? Did she click on a product page? Did she execute a site search?
Was she happy enough with the site search results to click on a
product? Or did she abandon her search? All this activity is placed in
context due to the fact that the WRE develops a deep understanding of
the content -- page, link, image -- that a given user visits.
The listening pixel not only tracks that data for a shopper, but compiles it
for the aggregate visitors to the site, revealing a tremendous number of
actionable insights.
The listening pixel also provides the WRE with performance data for
products and categories. By tracking views, clicks, add-to-carts and
conversions, the WRE understands which products are performing well
from a revenue perspective. This data can then be used to optimize the
product mix on a given page for the maximum revenue potential.
Feeds
Our customers provide us with product feeds for some BloomReach
Personalized Discovery Platform applications. Although there are
exceptions due to site-specific requirements, the product feed is
typically provided via file transfer protocol (FTP) daily, with intraday
updates for inventory and pricing changes. Platforms such as IBM
WebSphere Commerce have native integrations to the product database
to simplify and expedite sharing product data.
The Web Relevance Engine ingests a product feed, pulling out valuable
data about the products and how they relate to one another, both in their
known category structures and based on attributes within the WRE that
we use to enhance the feed data.
Crawl
For information beyond that contained in the product feed, including
creative marketing copy on product pages and customer reviews, the
Web Relevance Engine does a deep crawl of BloomReach customer
websites. The additional language on those pages helps the WRE
gain a further understanding of the context that the products fit into,
as well as the language that consumers use to describe them.
METADESC: Thin-Strap Bandage Dress & Strappy Leather Harness Belt
CRUMB: Women’s Apparel + Dresses
PRODUCT_ID: 37213
CRUMBURLS:
%2F%2Fapparel%2F%2Fapparel%2Fdresses%2F
%2Fapparel%2Fdresses%2F.asp
DESC: There’s no doubt you’ll be the red-hot ticket at your next event with this dress. This signature look…
HEADING: Red Bandage Dress with Leather Harness Belt
IMAGE: http://hautedress.com/is/image/hautedress/n/w/300/33212.jpg
* Crawling information example
In addition to BloomReach customer sites, the WRE also crawls a
myriad of other public sites such as Wikipedia, competitor sites
and blogs, to gain further understanding of language, context and
consumer intent.
For example, if we have a customer in the teen women’s apparel
business, we may crawl several other sites in that vertical to increase
the WRE’s understanding of the language used by those teens and the
marketers who speak to them.
Likewise, the WRE crawls Wikipedia to learn about the language
of landmarks and people of note. And to learn about consumer intent,
product descriptions and the context in which the two meet, the
WRE crawls the sites of our customers’ competitors.
Algorithms
Natural language processing drives many of the WRE’s capabilities.
So, the WRE compiles data gathered from the pixel, crawl, feeds and
queries into language models. For example, the WRE has 10 billion
synonym pairs. The WRE also understands 1,077 different colors by name and by how their various hues relate to one another.
This makes the data in the WRE extremely valuable as a starting point
for any of the BloomReach applications. The ongoing engagement with
BloomReach customer sites as well as the performance of our search
optimized pages enable the WRE to continuously learn about consumer
language and behavior.
The Web Relevance Engine has a tremendous and growing database of
consumer queries. The data is compiled from historic organic and paid
searches, as well as site searches on BloomReach customers’ sites.
Note that organic and paid search marketers no longer have as much
access to data at the query level due to changes in consumer privacy
protection by the major search engines.
For an idea of just how much there is to learn about consumer language
and behavior, consider an exercise BloomReach conducted, asking
participants to describe a rather fetching, red dress. The first 500 users
who took the quiz came up with 129 ways to describe the dress’ color,
194 ways to describe its neckline and 275 ways to describe its belt,
which some might say defies description. And that was just for starters.
[Figure: graph of tags such as “ceramic travel mug,” “reusable shopping bags,” “recycled glass vase,” “organic skinny jeans” and “solar radio.” Legend: product-tag, non-product-tag, node of interest.]
Natural Language Processing
The heart of the Web Relevance Engine is a natural language
processing system that relies on machine learning to constantly
improve its reliability. The two technologies together give the WRE the
ability to understand the relationship between a consumer and the
content on a BloomReach customer’s website.
This work goes on at a tremendous scale. The WRE processes five
to 10 terabytes of data a day. It is exposed to 150 million consumer
interactions a day. That scale is vital to the WRE’s effectiveness. The
more data the WRE ingests and processes, the more accurate its
understanding of natural human language becomes.
BloomReach uses natural language processing to build that understanding
by determining the semantic language that shoppers use and how that
language relates to the language that merchants, marketers and
manufacturers use to describe and classify their products and other
content. Natural language processing also helps the WRE determine
the context in which language is used. For instance, “red” often refers to
a color, but it can also mean a brand, as in REDValentino.
That’s how the WRE can understand all those synonyms and know
what all those colors are -- and their relationship to each other. That
allows the WRE to know that someone searching for a crimson,
silk dress for a night on the town would be delighted with the same
Herve Leger dress that someone hunting for a red, cotton dress for a
cocktail party would like.
The WRE constantly crawls the Web, conducting deep crawls of
BloomReach’s customers’ websites and the broader Web, and processes
100 million Web pages a day along with data feeds from customers’
catalogs. The WRE gathers data from BloomReach’s listening pixels,
queries and its crawls and compiles language models. It then develops
an understanding of language, the context in which words are used and
the way words relate to each other and to product attributes.
Automated Collaborative Filtering/Behavioral Modeling
You can learn a great deal about user intent with aggregate behavioral data.
For example, if a site search query has a high volume yet few conversions, it
can be extraordinarily valuable to look at what those shoppers did next. Did
a significant number of them navigate to a particular category or product
page? Did they refine their search query and try again? And when it comes
to showing similar products or products that might be complementary, the
“wisdom of the crowd” can be immensely valuable.
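The questions above amount to a simple aggregate analysis of session logs. The sketch below assumes a hypothetical log format (ordered `(event_type, value)` tuples per session) and hypothetical function names; it only illustrates the idea of measuring a query's conversion rate and counting what shoppers did immediately afterward.

```python
from collections import Counter, defaultdict

# Hypothetical session logs: each session is an ordered list of events.
SESSIONS = [
    [("search", "sofa"), ("view_category", "couches")],
    [("search", "sofa"), ("search", "couch"), ("purchase", "couch-12")],
    [("search", "sofa"), ("view_category", "couches")],
    [("search", "lamp"), ("purchase", "lamp-3")],
]


def conversion_rate(sessions, query):
    """Share of sessions containing the query that ended in a purchase."""
    with_query = [s for s in sessions if ("search", query) in s]
    converted = [s for s in with_query if any(e[0] == "purchase" for e in s)]
    return len(converted) / len(with_query) if with_query else 0.0


def next_actions(sessions):
    """For each search query, count the event that immediately followed it."""
    followups = defaultdict(Counter)
    for events in sessions:
        for current, following in zip(events, events[1:]):
            if current[0] == "search":
                followups[current[1]][following] += 1
    return followups


print(conversion_rate(SESSIONS, "sofa"))
print(next_actions(SESSIONS)["sofa"].most_common(1))
```

In this toy data, most shoppers who searched "sofa" went on to the "couches" category, which hints that the site's own vocabulary differs from the shoppers'.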
One of the techniques that the Web Relevance Engine uses is a process
called “collaborative filtering,” which utilizes data to cluster recommended
products. Collaborative filtering statistically clusters visitors around
shared preferences using vector mathematics and identifies specific
users who are within a natural cluster, but who have not yet seen
recommendations related to the cluster’s shared preferences. The WRE
uses our natural language processing algorithm to interpret the attributes
(and their synonyms) for the products in the cluster and identifies other
products that should be in the cluster based on their attributes. The
system then offers the recommendations. The visitor’s interaction with
the recommendation helps refine the statistical clustering.
It is important to recognize that collaborative filtering on its own can
create some very off-putting recommendations, particularly for
low-traffic or new products, since the sample size of behavior that’s related to
those products will be very low. So, the WRE takes additional steps, such
as limiting the recommendations to the same or related categories as the
original product, to ensure that they are high quality.
[Figure: Personalization gone wrong]
Behavioral modeling is also used to identify areas on a site where
demand is going unmet. For example, if the product information does
not contain a commonly used synonym for that product, yet that
synonym is used by shoppers as a site search query, the WRE can
learn that new term, use it for our recommendations and results
as well as reveal it to site optimizers and marketers. It is all about
connecting the dots that consumers reveal by their behavior. In the
case of a site search query that is not converting well, the WRE looks
at the products that are co-viewed with that query. This seemingly
simple act, done at a large scale, helps the WRE connect demand with
supply via better synonyms.
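The collaborative filtering described in this section can be sketched with plain vector math. This is a simplified user-based nearest-neighbor variant rather than the explicit statistical clustering the paper mentions, and the visitor vectors, the similarity threshold and the category guard are assumptions made for illustration.

```python
import math

# Hypothetical visitor -> {product: interaction strength} vectors.
VISITORS = {
    "v1": {"red-dress": 1.0, "black-heels": 1.0, "clutch": 1.0},
    "v2": {"red-dress": 1.0, "black-heels": 1.0},
    "v3": {"drill": 1.0, "toolbox": 1.0},
}
# Hypothetical product -> category mapping used as a quality guard.
CATEGORY = {"red-dress": "apparel", "black-heels": "apparel", "clutch": "apparel",
            "drill": "tools", "toolbox": "tools"}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(x * x for x in a.values()))
            * math.sqrt(sum(x * x for x in b.values())))
    return dot / norm if norm else 0.0


def recommend(visitor, visitors, category, min_similarity=0.5):
    """Suggest unseen products from similar visitors, within familiar categories."""
    mine = visitors[visitor]
    my_categories = {category[p] for p in mine}
    scores = {}
    for other, theirs in visitors.items():
        if other == visitor:
            continue
        sim = cosine(mine, theirs)
        if sim < min_similarity:
            continue
        for product in theirs.keys() - mine.keys():
            if category[product] in my_categories:  # same-category guard
                scores[product] = scores.get(product, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("v2", VISITORS, CATEGORY))
```

The category guard is a stand-in for the "same or related categories" restriction; with little behavioral data, it keeps a dress shopper from being shown power tools.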
Machine Learning
[Figure: Series of clicks by users (User 1, User 2, User 3) for the same keyword query (K1) sent to a search engine (S), shown as paths through category pages (C1, C2) and product pages (P1 to P4). Product page P1 was already tagged with a concept.]
At the end of the day, improving customer experiences is a math
problem. It’s not just the large number of customers and products, but
rather how to increase the probability that you are showing the right
product mix to each and every customer.
To do that with any speed and scale requires machine learning, which,
in the case of the Web Relevance Engine, crunches vast amounts of
data and then predicts the right product mix that suits a particular user’s
expressed intent. If you are familiar with statistics, you’ll recognize that
Bayesian probability models adjust their results as they ingest new
information. There is no conclusion or final certainty. Rather, these
models continue to learn and improve over time.
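As a small worked example of that kind of updating, the sketch below uses a Beta-Bernoulli model, a standard textbook choice for estimating a click or conversion rate. The paper does not say which Bayesian models the WRE uses, so the model, the uniform prior and the function names here are assumptions.

```python
def update(alpha, beta, clicked):
    """Beta-Bernoulli update: add one success or one failure to the counts."""
    return (alpha + 1, beta) if clicked else (alpha, beta + 1)


def expected_rate(alpha, beta):
    """Posterior mean of the click rate under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)


def learn(observations, alpha=1, beta=1):
    """Start from a uniform prior and fold in each observation in turn."""
    for clicked in observations:
        alpha, beta = update(alpha, beta, clicked)
    return alpha, beta


ALPHA, BETA = learn([True, False, True, True])
print(ALPHA, BETA, expected_rate(ALPHA, BETA))
```

Each new interaction shifts the estimate a little, and no observation is ever treated as final, which is the property the paragraph above describes.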
This is essentially how the machine learning capabilities of the WRE
work. The WRE analyzes new data - about people, products and demand
- to constantly iterate on the actions it takes and opportunities it
recommends. This self-tuning of the algorithms allows BloomReach to
improve our results continuously over time.
[Figure caption: We propagate the same tag to Category page C1, since it occurs with P1 often and within a few clicks. Legend: page-associated concept tag.]
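The tag propagation described above can be sketched as a co-occurrence count over click paths. The paths below are loosely modeled on the figure, and the thresholds `max_distance` and `min_sessions`, the tag name and the function name are hypothetical; the paper gives only the qualitative rule ("often and within a few clicks").

```python
from collections import Counter

# Hypothetical click paths for one keyword query (P = product page, C = category page).
CLICK_PATHS = [
    ["P1", "C1", "P2", "P3"],
    ["C1", "P1", "P4", "P1"],
    ["P1", "C2", "C1"],
]
TAGS = {"P1": {"concept-x"}}  # P1 was already tagged with a concept


def propagate(paths, tags, source, max_distance=2, min_sessions=2):
    """Copy the source page's tags to pages often seen within a few clicks of it."""
    near = Counter()
    for path in paths:
        positions = [i for i, page in enumerate(path) if page == source]
        nearby = {page for i, page in enumerate(path)
                  if page != source and any(abs(i - j) <= max_distance for j in positions)}
        near.update(nearby)  # each path counts a nearby page at most once
    updated = {page: set(t) for page, t in tags.items()}
    for page, count in near.items():
        if count >= min_sessions:
            updated.setdefault(page, set()).update(tags[source])
    return updated


RESULT = propagate(CLICK_PATHS, TAGS, "P1")
print(RESULT)
```

Only C1 appears near P1 in at least two of the three paths, so it alone inherits the concept tag.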
The WRE is a life-long learner
The WRE’s continuous learning means that the engine is able to
anticipate the complete phrase a user intends to type before he or
she is finished typing. The WRE builds this understanding based on
what users ultimately click on while broadly searching as well as
when they search on a website. For instance, consider a user who
types the letter ‘s’ on a department store site. The WRE might offer
“shoes” as a site search query. Type in ‘s’ on an athletic team gear
site and “Seattle Seahawks” is a more relevant query.
Similarly, machine learning provides the engine with the ability to
adapt to different seasons. Typing the letter ‘v’ on a florist’s site
in February is likely to produce “Valentine’s Day” as a query. Type
‘v’ a few months later and the auto suggestion is more likely to be
“violets.”
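A toy version of this context-sensitive autosuggest is sketched below. The query log, its per-site and per-month keying and the function name `suggest` are assumptions for illustration; a production system would learn these signals continuously rather than read them from a fixed table.

```python
# Hypothetical query counts keyed by (site, month).
QUERY_LOG = {
    ("florist", 2): {"valentine's day": 900, "violets": 40, "vases": 120},
    ("florist", 5): {"valentine's day": 5, "violets": 300, "vases": 110},
    ("team-gear", 2): {"seattle seahawks": 700, "socks": 90},
}


def suggest(prefix, site, month, log, limit=3):
    """Return the most frequent queries for this site and month that match the prefix."""
    counts = log.get((site, month), {})
    hits = [q for q in counts if q.startswith(prefix.lower())]
    # Most popular first; ties broken alphabetically for stable output.
    return sorted(hits, key=lambda q: (-counts[q], q))[:limit]


print(suggest("v", "florist", 2, QUERY_LOG))
print(suggest("v", "florist", 5, QUERY_LOG))
print(suggest("s", "team-gear", 2, QUERY_LOG))
```

The same letter yields different top suggestions depending on the site and the time of year, which is the behavior the examples above describe.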
People
[Figure: Two individuals, same demographic: female, 28 years old, college grad, Mountain View, CA, income >80k]
There are more than 3 billion Internet users on the planet and the number
is growing every minute. All of them have their own personalities, likes,
dislikes, needs and desires. There is a reason that the cereal aisle at the
supermarket is lined with dozens, or even hundreds, of different brands
and seems to stretch for miles. A significant portion of the population
likes cereal in the morning, but does a particular individual prefer Honey
Bunches of Oats with Almonds, Honey Bunches of Oats with Strawberries
or Honey Bunches of Oats Just Bunches? Or maybe you’re more of a
Kashi GoLean (or Kashi GoLean Crunch!, Kashi GoLean Crisp! Cinnamon
Crumble or Kashi GoLean Crisp! Toasted Berry Crumble) person.
The point is that each individual is slightly different and collectively
they have more options than any human could possibly contemplate.
And yet, each individual expects Web retailers and content providers
to know them; know what their preferences are, speak their language
and serve up recommendations that are relevant to them, whether
they’re just starting their research or returning to a site on a different
digital device altogether.
And not only are people different, their circumstances are different and
change minute-by-minute. They signal their intent by typing queries,
sure, but also on Facebook and Twitter. They display their intent by
clicking on email links or opening a page on their iPhones and iPads.
And they don’t stand still. It’s not just what a given individual is looking
for, it’s where they are and when they are looking for it. Context matters
-- more so than ever in the era of mobile. Someone in New York
looking for restaurants in Union Square is not going to be interested in
John’s Grill in San Francisco. But a searcher in San Francisco might be
delighted to know that John’s is just off that city’s Union Square.
Earlier in this paper, we described how the Web Relevance Engine
ingests pixel data. That pixel data is the foundation of the WRE’s
understanding of individual people. A website visitor’s engagement with
a site reveals a great deal about her tastes and intent in that moment.
Let’s start with an important piece of how the WRE is able to understand
a shopper’s intent. As we’ll discuss later in more detail, the WRE
understands products. The feed, crawl and language data enable the
WRE to truly know what is on a page. To the WRE, each product is a rich
series of attributes - from cut, color and category to context relating to
where that product would be used. And if you understand the attributes
of each product, you can also map the relationships between those
products in the catalog in ways that are not reliant on the category
tree and go far beyond typical analytics systems that simply view each
product page as a URL.
If you map each visitor’s engagement with a site - the products on the
pages they view, the products they engage with, click on, add-to-cart and
buy - you can learn a great deal about that individual shopper’s tastes.
The click path through the products is also a journey through those
products’ attributes. Given a small amount of engagement - say, three
product page views - the WRE can begin learning the affinities of an
individual shopper.
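To make the idea of a click path as "a journey through attributes" concrete, here is a minimal sketch. The catalog, the attribute names and the simple frequency-based profile are assumptions; the paper does not specify how the WRE weights affinities.

```python
from collections import Counter

# Hypothetical catalog: product ID -> attribute dictionary.
CATALOG = {
    "p1": {"color": "red", "cut": "fitted", "category": "dress"},
    "p2": {"color": "red", "cut": "a-line", "category": "dress"},
    "p3": {"color": "red", "cut": "fitted", "category": "top"},
}


def affinities(viewed_ids, catalog):
    """Share of viewed products carrying each (attribute, value) pair."""
    counts = Counter()
    for product_id in viewed_ids:
        counts.update(catalog[product_id].items())
    total = len(viewed_ids)
    return {pair: n / total for pair, n in counts.items()}


PROFILE = affinities(["p1", "p2", "p3"], CATALOG)
print(PROFILE[("color", "red")], PROFILE[("cut", "fitted")])
```

After just three page views, this shopper already shows a clear lean toward red and a weaker one toward fitted cuts, independent of where the products sit in the category tree.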
But as we all know, today’s consumer:
1. Uses multiple devices.
2. Does not think of those devices as disparate channels
   when engaging with a single site.
3. Is not logging in. In fact, BloomReach researched the
   rates at which people log into an e-commerce site and
   found it to be around 1% across desktop, tablets and
   smartphones (1.56%, 1.25% and 0.85% respectively).
To overcome these challenges, the WRE uses robust pattern
detection technologies to connect anonymous users across
multiple devices using a number of behavioral and technical
signals. For instance, if devices on one Wi-Fi network viewed the
same category page and three product pages over a two-day
period, there’s a high probability that it is the same person.
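The Wi-Fi example above can be expressed as a simple rule. The sketch assumes the observations have already been limited to a two-day window and uses anonymous device identifiers only; the data layout, the threshold of four shared pages (one category page plus three product pages) and the function names are hypothetical, and a real system would combine many more signals probabilistically.

```python
from itertools import combinations

# Hypothetical anonymous observations from a single two-day window.
OBSERVATIONS = {
    "phone-a":  {"network": "net-1", "pages": {"/dresses", "/p/1", "/p/2", "/p/3"}},
    "laptop-b": {"network": "net-1", "pages": {"/dresses", "/p/1", "/p/2", "/p/3", "/cart"}},
    "tablet-c": {"network": "net-1", "pages": {"/tools", "/p/9"}},
    "phone-d":  {"network": "net-2", "pages": {"/dresses", "/p/1", "/p/2", "/p/3"}},
}


def likely_same_person(a, b, min_shared_pages=4):
    """Same network plus enough shared page views suggests one shopper."""
    return a["network"] == b["network"] and len(a["pages"] & b["pages"]) >= min_shared_pages


def link_devices(observations):
    """Return device pairs that pass the heuristic, in sorted order."""
    return [(x, y) for x, y in combinations(sorted(observations), 2)
            if likely_same_person(observations[x], observations[y])]


print(link_devices(OBSERVATIONS))
```

Only the phone and laptop on the same network with the same four page views are linked; the device on another network and the tablet with different browsing are left alone.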
It is important to note that while the WRE does know quite a bit about
an individual visitor to one of our customers’ sites, it does not collect or
store any personally identifiable information (PII) about those visitors.
Nor does BloomReach track consumers between websites.
(See BloomReach’s privacy policy for details.)
This cross-device connection can be useful for subtly
personalizing the experience of a shopper and for proving the
“mobile influence” for shoppers who browse on the smartphone,
yet convert on the desktop.
Products
[Figure labels: Fabric, Color, Natural, Weird Name, Necklace, Material, Style; Category 1 through Category 5]
The Web Relevance Engine begins its deep understanding of product
catalogs by consuming product feeds and crawling the website.
Once the WRE has that initial product record, the attributes contained
within the data (which many merchants refer to as “tags”) are
supplemented with additional attributes. This robust attribute data
- which may include synonyms, contextual language about how the
products are used, relationships with other products, etc. - is how the
WRE determines the correct, relevant product mix to show on any Web
page or widget. In search engine parlance, returning the right documents
(products in this case) is known as “recall” and it is something the WRE
excels at. BloomReach applies this WRE capability for applications
including organic-search-optimized category pages, site search results
and identification of opportunities to improve a site to capture unmet
consumer expressions of demand.
But beyond recalling the right products for a given page, search
query or recommendation, the WRE also gathers performance data
at the product level via the listening pixel. For example, how well
a given dress performs when shown as a result for a “summer
dresses” query should determine where that dress is ranked on
the page. The combination of performance data with attributes is
also applied by the WRE to new products as they enter the catalog.
If a new dress is similar to a dress that’s performing well for the
“summer dresses” query, that new dress can automatically receive a
boost in the rankings for that query. The key here is the automated
attribute extraction being used to recognize similar products.
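The new-product boost described above can be sketched with a set-similarity measure. Jaccard similarity, the per-query performance numbers and the attribute sets are assumptions chosen for illustration; the paper does not name the similarity measure the WRE uses.

```python
def jaccard(a, b):
    """Overlap between two attribute sets, from 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical per-query performance (e.g., conversion rate) for established products.
PERFORMANCE = {"summer dresses": {"sundress-1": 0.08, "maxi-2": 0.03}}
# Hypothetical extracted attributes, including a product with no history yet.
ATTRIBUTES = {
    "sundress-1": {"dress", "floral", "sleeveless", "cotton"},
    "maxi-2": {"dress", "long", "jersey"},
    "sundress-new": {"dress", "floral", "sleeveless", "linen"},
}


def estimated_score(product, query, performance, attributes):
    """Use observed performance if available, else borrow from similar products."""
    known = performance[query]
    if product in known:
        return known[product]
    return max((jaccard(attributes[product], attributes[other]) * score
                for other, score in known.items()), default=0.0)


def rank_for_query(query, performance, attributes):
    """Order all products for a query by observed or estimated performance."""
    return sorted(attributes,
                  key=lambda p: estimated_score(p, query, performance, attributes),
                  reverse=True)


print(rank_for_query("summer dresses", PERFORMANCE, ATTRIBUTES))
```

The new sundress has no performance history, yet it ranks above the weaker maxi dress because its attributes closely match those of the best-performing product for the query.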
Finally, the personal affinities we described in the People section above
are applied to the product rankings to ensure that each individual sees
the products that best suit his or her tastes. The combination of vast
data and machine learning allows the WRE to provide personalized
recommendations, search results, category pages and more for each
individual Web user. It makes sure, in short, that fans of crimson form-fitting Herve Legers discover the dress most dear to their hearts.
Demand
BloomReach sees over 1 billion customer interactions a week and
through that broad experience, the Web Relevance Engine improves
relevance day after day. Similarly, the WRE sees how consumers
behave on individual customers’ sites, which gives it the data
it needs to personalize recommendations for consumers, while
providing merchants with insights into what products would benefit
from more prominent placement and which should be dropped lower
in the site’s hierarchy for lack of interest.
Now pair all of those insights into consumer demand with a deep
understanding of the products in a retailer’s catalog and you have
the ability to truly connect supply and demand. Economics 101. At its
core, this is one of the biggest challenges that e-commerce sites face.
They have the products and believe there is unmet demand for those
products. Yet the friction those consumers face in discovering those
products is a hurdle to meeting business goals.
It’s a big data problem that the WRE is uniquely capable of solving.
Applications
The Web Relevance Engine is the underlying engine of the BloomReach
Personalized Discovery Platform that powers BloomReach Compass,
SNAP (Search, Navigation and Personalization) and Organic Search.
[Figure: Personalized Discovery Platform. Labels: People, Products, Web-Wide Demand; Organic Search, SNAP, Compass; Customer Acquisition; Search, Navigation & Personalization; Site Optimization & Merchandising.]
BloomReach Organic Search uses the WRE to increase crawlability,
improve site content and cluster products on high quality thematic
pages. Each of these helps our customers capture and convert consumer
demand from search engines. The WRE also helps identify additional
opportunities, facilitates creating new category pages (which can also be
used for other marketing channels, like email, paid search or social), and
continuously monitors the health of those pages - both from a technical
and consumer perspective. Achieving quality coverage for organic search
at scale is a challenge that necessitates having technology like the WRE.
BloomReach SNAP leverages the WRE to optimize and personalize
product discovery for every single visitor to a website. A shopper’s path
to purchase may include site search, navigating to category pages and
engaging with product recommendations. And that journey may take
place across multiple devices. BloomReach SNAP learns from every
interaction with a shopper - across devices - to better tailor the product
mix, ranking and recommendations to suit her expressed affinities. The
results are an improved customer experience and revenue lift.
Using the WRE, BloomReach Compass identifies opportunities to
better optimize a site and digital marketing campaigns. Compass
prioritizes actionable recommendations to site merchants so they
can reduce friction in the buying journey. Compass is able to provide
recommendations like no other tool because the WRE understands the
products in the catalog, the behavior of each visitor and the signals that
indicate where those people and products are having difficulty finding one
another.
The cloud-based nature of the WRE also means that it is instantaneously
and infinitely scalable. Additional clusters can be deployed at a
moment’s notice, meaning new customers can be added efficiently,
without any disruption to existing customers.
Conclusion
The Web is truly one of the technological wonders of our time.
But we are still in the early days of taking advantage of the
opportunities that can arise from a world where everyone is
constantly connected to a nearly unfathomable amount of
content.
The Web Relevance Engine represents a giant leap forward,
harnessing the power of big data, machine learning and
natural language processing to deliver the relevant results
consumers have come to expect in the era of always-on
search and discovery. The WRE is driving business success,
while delivering the promise of a relevant and reliable Web.
About BloomReach:
The BloomReach Personalized Discovery Platform understands and matches your content to what people
are seeking, across marketing channels and devices. BloomReach makes your content and products more
discoverable with applications for organic search, site search (SNAP), site optimization and merchandising.
Further reading:
http://bloomreach.com/resources/