Inside the Technology: Web Relevance Engine

Table of Contents
Introduction
The Web Relevance Engine by the Numbers
Inside the Technology
How the Algorithms Work
Data
Algorithms
Natural Language Processing
People
Products
Demand
Applications
Conclusion

What is the Web Relevance Engine?
The Web Relevance Engine (WRE) is an intelligent machine that learns continuously and develops a deep understanding of Web content and Web users’ intent. It relies on machine-learning and natural language processing to create a discovery experience that is relevant, personalized and optimized for connecting consumers with the information and products that they’re looking for. This white paper explores the WRE’s underlying technology. For further reading on how companies are leveraging the WRE, visit: bloomreach.com/resources/

Introduction
The Internet is a key tool in everyday life and consumers expect it to perform reliably -- the same way they expect the lights to come on when they flip a switch or the water to stream out when they turn on a faucet.
Living in a world that runs on mobile devices and is navigated through search engines, texts, email alerts, product recommendations and advertisements, consumers expect to effortlessly find what they are looking for online when they describe those things in their own terms.

The problem for those who sell products or offer content on the Web is that no two people are alike. One person’s tight, red dress might be another’s crimson, form-fitting Herve Leger. So, how does a website owner deliver on consumers’ expectations that their online searches will result in recommendations that are relevant to them? How does a retailer understand users’ intent well enough to tailor the search experience down to the individual?

[Figure: a dress labeled with descriptive terms: sequin, color, belt, frilly, stretchy, long]

Each visitor takes a different approach when they seek to discover products or information on the Web. But each wants a memorable discovery experience: one in which the site search and navigation feel individually tailored to him or her, and one in which an expressed interest in one product yields recommendations of complementary or related items. Each uses his or her own terms to describe products. Some prefer mobile. Others prefer a desktop. Some would rather navigate a site using the facets and filters that appear on a landing page. Others rely on the search box on the site. But every visitor wants to be presented with results, products and recommendations that are relevant to them.

How does a website owner learn from every interaction on his or her site and from the way consumers use language on the greater Web? How can a team observe and act on more than 100 million consumer interactions a day, or process the Web pages, another 100 million a day, needed to optimize a customer’s discovery experience?
BloomReach approaches this challenge with its Web Relevance Engine, a machine-learning technology that powers the Personalized Discovery Platform, a set of tools that:
• guides consumers to the content that they seek.
• powers the search, navigation and personalization on sites, providing consumers with personalized results, categories and recommendations, while creating a connection among all the digital devices a single shopper might use during a search.
• gives merchants and site owners actionable insights into improving content performance based on the way visitors to their sites interact with content on the site.

The Web Relevance Engine by the Numbers
Number of Pages Processed Per Day: 100 million
The Amount of Data Processed Per Day: 5-10 terabytes
Number of Hadoop Jobs Per Day: 5,000-6,000
Number of Synonym Pairs in the WRE: 10 billion
Number of Consumer Interactions Seen Per Day: 150 million
The Number of Different Colors the WRE Understands: 1,077

Inside the Technology
The Web Relevance Engine means that for the first time website owners are able to deliver relevant experiences for every visitor at scale, while keeping up with their ever-changing inventory and the constantly evolving ways that consumers express their intent. The engine sits at the center of efforts to acquire customers and provide memorable and engaging experiences for them, while informing data-driven strategic decisions. This white paper will explain how that’s possible by exploring in depth the way the WRE comes to deeply understand people, products and Webwide demand -- and the way the three are intertwined. But first, let’s look at the data and technology that power the WRE.

The garbage in, garbage out axiom is fitting when it comes to platforms that power product discovery. Sadly, many of those platforms don’t even come with garbage. They’re essentially just empty bins that you’ll have to fill with your own data. When building the WRE, we focused on gathering data from a variety of sources, some of which benefit from network effects that a single e-commerce enterprise could not collect on its own. Other data in the WRE is either difficult to gather or difficult to analyze and act on in real time at scale, especially when that action relies on data from multiple sources. The principal data used in the WRE comes from three sources: pixels, feeds and crawling. This data is the key to the WRE’s deep understanding of people, products and demand, the interplay of which we’ll cover in a later section of this white paper.

How the Algorithms Work
The Web Relevance Engine applies a series of algorithms to the gathered data in order to create a meaningful discovery experience for consumers on the Web. The WRE first algorithmically assembles a relevant set of results in response to a query. Then it ranks that large number of results in order of relevance, based on the user’s intent.

BloomReach’s WRE relies on a database of 10 billion synonym pairs and other signals involving product attributes to match a given query to the relevant set of products in our customers’ product catalogs. Unlike many search engines, including the out-of-the-box Solr product, the WRE’s reliance on machine learning and natural language processing means that there is no need to manually build a synonym list or to rely on a massive global dictionary that provides a one-size-fits-all solution to a problem that is not a one-size-fits-all challenge. Not only that, but the WRE learns from every interaction. Results and rankings continuously improve.

Consider a query including the word “Frozen.” Not long ago the word would have been read as the adjective for something that is very cold. “Frozen food container” might be a relevant result, depending on many other factors. But since Disney released the animated princess movie “Frozen” in 2013, the attribute has a whole new meaning.
Now, for department stores and general merchandisers, “frozen” is quite likely to be a reference to a product with a tie-in to the movie.

Once a relevant set is identified, the WRE goes to work ranking the results in a many-layered process that applies metrics to thousands or even millions of products, including the products’ performance in terms of revenue per visit and their performance in the context of the search query entered.

The WRE’s underlying algorithms next turn to personalizing the results and rankings. Consider a consumer who has a propensity for converting on medium-size shirts. A shirt that ranks high on color and other product attributes would be knocked down in rank if it were not available in medium. Another relevant shirt would move up the list.

Data

Pixel
A listening pixel is added to each page of all BloomReach customer sites. The pixel enables the Web Relevance Engine to understand the specific behavior of customers on that site in relation to its content. For example, what is the path for an individual shopper on the site? Did she start on the homepage or land on a category page via organic search? Did she click on a product page? Did she execute a site search? Was she happy enough with the site search results to click on a product? Or did she abandon her search? All this activity is placed in context because the WRE develops a deep understanding of the content -- page, link, image -- that a given user visits. The listening pixel not only tracks that data for a shopper, but compiles it for the aggregate visitors to the site, revealing a tremendous number of actionable insights.

The listening pixel also provides the WRE with performance data for products and categories. By tracking views, clicks, add-to-carts and conversions, the WRE understands which products are performing well from a revenue perspective.
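The performance tracking described above can be pictured with a small sketch. It is illustrative only: the event tuples, product IDs and the `revenue_per_visit` function are hypothetical and are not the WRE's actual data model. The sketch assumes each pixel event carries a visit ID, a product ID, an event type and a revenue amount, and it computes revenue per visit for each product.

```python
from collections import defaultdict

# Hypothetical pixel events: (visit_id, product_id, event_type, revenue).
# The field layout is an assumption made for this sketch.
EVENTS = [
    ("v1", "dress-1", "view", 0.0),
    ("v1", "dress-1", "add_to_cart", 0.0),
    ("v1", "dress-1", "conversion", 120.0),
    ("v2", "dress-1", "view", 0.0),
    ("v2", "dress-2", "view", 0.0),
    ("v3", "dress-2", "view", 0.0),
    ("v3", "dress-2", "conversion", 60.0),
    ("v4", "dress-2", "view", 0.0),
]


def revenue_per_visit(events):
    """Return {product_id: revenue / number of distinct visits that touched it}."""
    visits = defaultdict(set)      # product -> set of visit ids
    revenue = defaultdict(float)   # product -> total converted revenue
    for visit_id, product_id, event_type, amount in events:
        visits[product_id].add(visit_id)
        if event_type == "conversion":
            revenue[product_id] += amount
    return {p: revenue[p] / len(v) for p, v in visits.items()}


if __name__ == "__main__":
    for product, rpv in sorted(revenue_per_visit(EVENTS).items()):
        print(product, round(rpv, 2))
```

Counting distinct visits rather than raw events keeps a shopper who views the same product several times in one visit from diluting the metric.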
This data can then be used to optimize the product mix on a given page for the maximum revenue potential.

Feeds
Our customers provide us with product feeds for some BloomReach Personalized Discovery Platform applications. Although there are exceptions due to site-specific requirements, the product feed is typically provided via file transfer protocol (FTP) daily, with intraday updates for inventory and pricing changes. Platforms such as IBM WebSphere Commerce have native integrations to the product database to simplify and expedite sharing product data. The Web Relevance Engine ingests a product feed, pulling out valuable data about the products and how they relate to one another, both in their known category structures and based on attributes within the WRE that we use to enhance the feed data.

Crawl
For information beyond that contained in the product feed, including creative marketing copy on product pages and customer reviews, the Web Relevance Engine does a deep crawl of BloomReach customer websites. The additional language on those pages helps the WRE gain a further understanding of the context that the products fit into, as well as the language that consumers use to describe them.

METADESC: Thin-Strap Bandage Dress & Strappy Leather Harness Belt
CRUMB: Women’s Apparel + Dresses
PRODUCT_ID: 37213
CRUMBURLS: %2F%2Fapparel%2F%2Fapparel%2Fdresses%2F %2Fapparel%2Fdresses%2F.asp
DESC: There’s no doubt you’ll be the red-hot ticket at your next event with this dress. This signature look…
HEADING: Red Bandage Dress with Leather Harness Belt
IMAGE: http://hautedress.com/is/image/hautedress/n/w/300/33212.jpg
* Crawling information example

In addition to BloomReach customer sites, the WRE also crawls a myriad of other public sites such as Wikipedia, competitor sites and blogs, to gain further understanding of language, context and consumer intent.
For example, if we have a customer in the teen women’s apparel business, we may crawl several other sites in that vertical to increase the WRE’s understanding of the language used by those teens and the marketers who speak to them. Similarly, the WRE crawls Wikipedia to learn about the language of landmarks and people of note. And to learn about consumer intent and product descriptions and the context in which the two meet, the WRE crawls the sites of our customers’ competitors.

Algorithms
Natural language processing drives many of the WRE’s capabilities. So, the WRE compiles data gathered from the pixel, crawl, feeds and queries into language models. For example, the WRE has 10 billion synonym pairs. The WRE also understands 1,077 different colors by name and by how their various hues relate to one another. This data in the WRE is therefore extremely valuable as a starting point for any of the BloomReach applications. The ongoing engagement with BloomReach customer sites, as well as the performance of our search-optimized pages, enables the WRE to continuously learn about consumer language and behavior.

The Web Relevance Engine has a tremendous and growing database of consumer queries. The data is compiled from historic organic and paid searches, as well as site searches on BloomReach customers’ sites. Note that organic and paid search marketers no longer have as much access to data at the query level due to changes in consumer privacy protection by the major search engines.

For an idea of just how much there is to learn about consumer language and behavior, consider an exercise BloomReach conducted, asking participants to describe a rather fetching red dress. The first 500 users who took the quiz came up with 129 ways to describe the dress’ color, 194 ways to describe its neckline and 275 ways to describe its belt, which some might say defies description. And that was just for starters.
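To make the role of synonym pairs concrete, here is a minimal sketch of query matching with synonym expansion. It is an assumption-laden toy, not the WRE's method: the three pairs in `SYNONYM_PAIRS`, the whitespace tokenizer and the function names are all invented for illustration, whereas the WRE learns its pairs from data at a vastly larger scale.

```python
# Hypothetical, hand-written synonym pairs (assumption for this sketch).
SYNONYM_PAIRS = {
    ("crimson", "red"),
    ("form-fitting", "tight"),
    ("gown", "dress"),
}


def build_index(pairs):
    """Map each term to the set of terms it is paired with (both directions)."""
    index = {}
    for a, b in pairs:
        index.setdefault(a, set()).add(b)
        index.setdefault(b, set()).add(a)
    return index


def expand(query, index):
    """Return one set of acceptable terms per query word."""
    return [{term} | index.get(term, set()) for term in query.lower().split()]


def matches(query, product_text, index):
    """True if every query word, or one of its synonyms, appears in the text."""
    words = set(product_text.lower().split())
    return all(group & words for group in expand(query, index))


if __name__ == "__main__":
    idx = build_index(SYNONYM_PAIRS)
    print(matches("tight red dress", "crimson form-fitting gown", idx))
    print(matches("blue dress", "crimson form-fitting gown", idx))
```

With these pairs, a shopper's "tight red dress" matches a product described as a "crimson form-fitting gown" even though the two descriptions share no words.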
[Figure: graph of product tags (such as “recycled glass vase” and “organic skinny jeans”) and non-product tags around a node of interest]

Natural Language Processing
The heart of the Web Relevance Engine is a natural language processing system that relies on machine learning to constantly improve its reliability. The two technologies together give the WRE the ability to understand the relationship between a consumer and the content on a BloomReach customer’s website. This work goes on at a tremendous scale. The WRE processes five to 10 terabytes of data a day. It is exposed to 150 million consumer interactions a day. That scale is vital to the WRE’s effectiveness. The more data the WRE ingests and processes, the more accurate its understanding of natural human language becomes.

BloomReach uses natural language processing to determine the language that shoppers use and how it relates to the language that merchants, marketers and manufacturers use to describe and classify their products and other content. Natural language processing also helps the WRE determine the context in which language is used; for instance, “red” often refers to a color, but it can also mean a brand, as in REDValentino. That allows the WRE to know that someone searching for a crimson, silk dress for a night on the town would be delighted with the same Herve Leger dress that someone hunting for a red, cotton dress for a cocktail party would like.

The WRE constantly crawls the Web, conducting deep crawls of BloomReach’s customers’ websites and the broader Web.
It processes data feeds from customers’ catalogs and 100 million Web pages a day. The WRE gathers data from BloomReach’s listening pixels, queries and its crawls and compiles language models. It then develops an understanding of language, the context in which words are used and the way words relate to each other and to product attributes. That’s how the WRE can understand all those synonyms and know what all those colors are -- and their relationship to each other.

Automated Collaborative Filtering/Behavioral Modeling
You can learn a great deal about user intent with aggregate behavioral data. For example, if a site search query has a high volume yet few conversions, it can be extraordinarily valuable to look at what those shoppers did next. Did a significant number of them navigate to a particular category or product page? Did they refine their search query and try again? And when it comes to showing similar products or products that might be complementary, the “wisdom of the crowd” can be immensely valuable.

One of the techniques that the Web Relevance Engine uses is a process called “collaborative filtering,” which utilizes data to cluster recommended products. Collaborative filtering statistically clusters visitors around shared preferences using vector mathematics and identifies specific users who are within a natural cluster, but who have not yet seen recommendations related to the cluster’s shared preferences. The WRE uses our natural language processing algorithm to interpret the attributes (and their synonyms) for the products in the cluster and identifies other products that should be in the cluster based on their attributes. The system then offers the recommendations. The visitor’s interaction with the recommendation helps refine the statistical clustering.

It is important to recognize that collaborative filtering on its own can create some very off-putting recommendations, particularly for low-traffic or new products, since the sample size of behavior that’s related to those products will be very low. So, the WRE takes additional steps, such as limiting the recommendations to the same or related categories as the original product, to ensure that they are high quality.

[Figure: Personalization gone wrong]

Behavioral modeling is also used to identify areas on a site where demand is going unmet. For example, if the product information does not contain a commonly used synonym for that product, yet that synonym is used by shoppers as a site search query, the WRE can learn that new term and use it for our recommendations and results, as well as reveal it to site optimizers and marketers. It is all about connecting the dots that consumers reveal by their behavior. In the case of a site search query that is not converting well, the WRE looks at the products that are co-viewed with that query. This seemingly simple act, done at a large scale, helps the WRE connect demand with supply via better synonyms.

Machine Learning
[Figure: series of clicks by three users (User 1, User 2, User 3) through product pages (P) and category pages (C) for the same keyword query K1 on a search engine. Product page P1 was already tagged with a concept.]

At the end of the day, improving customer experiences is a math problem. The challenge is not just the large number of customers and products, but how to increase the probability that you are showing the right product mix to each and every customer. To do that with any speed and scale requires machine learning, which, in the case of the Web Relevance Engine, crunches vast amounts of data and then predicts the right product mix that suits a particular user’s expressed intent.
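One simple way to picture "increasing the probability" of showing the right product is an estimate that is revised after every observation. The sketch below uses a Beta-Bernoulli model, a standard textbook technique chosen here for illustration; the paper does not say which models the WRE uses, and the `ClickEstimate` class and its prior values are assumptions.

```python
class ClickEstimate:
    """Running estimate of the probability that a shown product is clicked.

    Uses a Beta-Bernoulli model (an assumption for this sketch): the prior
    acts like pseudo-observations that real clicks and skips are added to.
    """

    def __init__(self, prior_clicks=1.0, prior_skips=1.0):
        self.clicks = prior_clicks
        self.skips = prior_skips

    def update(self, clicked):
        """Fold one new observation into the estimate."""
        if clicked:
            self.clicks += 1
        else:
            self.skips += 1

    @property
    def mean(self):
        """Current expected click probability."""
        return self.clicks / (self.clicks + self.skips)


if __name__ == "__main__":
    estimate = ClickEstimate()
    for clicked in [True, True, False, True]:
        estimate.update(clicked)
        print(round(estimate.mean, 3))
```

The estimate never reaches a final answer; each new interaction nudges it, which is the property that matters for a system that keeps learning.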
If you are familiar with statistics, you’ll recognize that Bayesian probability models adjust their results as they ingest new information. There is no conclusion or final certainty. Rather, these models continue to learn and improve over time. This is essentially how the machine learning capabilities of the WRE work. The WRE analyzes new data - about people, products and demand - to constantly iterate on the actions it takes and opportunities it recommends. This self-tuning of the algorithms allows BloomReach to improve our results continuously over time.

[Figure note: We propagate the same concept tag to category page C1, since it occurs with P1 often and within a few clicks.]

The WRE is a life-long learner
The WRE’s continuous learning means that the engine is able to anticipate the complete phrase a user intends to type before he or she is finished typing. The WRE builds this understanding based on what users ultimately click on while broadly searching as well as when they search on a website. For instance, consider a user who types the letter ‘s’ on a department store site. The WRE might offer “shoes” as a site search query. Type in ‘s’ on an athletic team gear site and “Seattle Seahawks” is a more relevant query. Similarly, machine learning provides the engine with the ability to adapt to different seasons. Typing the letter ‘v’ on a florist’s site in February is likely to produce “Valentine’s Day” as a query. Type ‘v’ a few months later and the auto suggestion is more likely to be “violets.”

People
[Figure: 2 individuals, same demographic: female, 28 years old, college grad, Mountain View, CA, income >80k]

There are more than 3 billion Internet users on the planet and the number is growing every minute. All of them have their own personalities, likes, dislikes, needs and desires. There is a reason that the cereal aisle at the supermarket is lined with dozens, or even hundreds, of different brands and seems to stretch for miles.
A significant portion of the population likes cereal in the morning, but does a particular individual prefer Honey Bunches of Oats with Almonds, Honey Bunches of Oats with Strawberries or Honey Bunches of Oats Just Bunches? Or maybe you’re more of a Kashi GoLean (or Kashi GoLean Crunch!, Kashi GoLean Crisp! Cinnamon Crumble or Kashi GoLean Crisp! Toasted Berry Crumble) person. The point is that each individual is slightly different and collectively they have more options than any human could possibly contemplate.

And yet, each individual expects Web retailers and content providers to know them; know what their preferences are, speak their language and serve up recommendations that are relevant to them, whether they’re just starting their research or returning to a site on a different digital device altogether.

And not only are people different, their circumstances are different and change minute-by-minute. They signal their intent by typing queries, sure, but also on Facebook and Twitter. They display their intent by clicking on email links or opening a page on their iPhones and iPads. And they don’t stand still. It’s not just what a given individual is looking for, it’s where they are and when they are looking for it. Context matters -- more so than ever in the era of mobile. Someone in New York looking for restaurants in Union Square is not going to be interested in John’s Grill in San Francisco. But a searcher in San Francisco might be delighted to know that John’s is just off that city’s Union Square.

Earlier in this paper, we described how the Web Relevance Engine ingests pixel data. That pixel data is the foundation of the WRE’s understanding of individual people. A website visitor’s engagement with a site reveals a great deal about her tastes and intent in that moment. Let’s start with an important piece of how the WRE is able to understand a shopper’s intent. As we’ll discuss later in more detail, the WRE understands products.
The feed, crawl and language data enable the WRE to truly know what is on a page. To the WRE, each product is a rich series of attributes - from cut, color and category to context relating to where that product would be used. And if you understand the attributes of each product, you can also map the relationships between those products in the catalog in ways that are not reliant on the category tree and go far beyond typical analytics systems that simply view each product page as a URL.

If you map each visitor’s engagement with a site - the products on the pages they view, the products they engage with, click on, add to cart and buy - you can learn a great deal about that individual shopper’s tastes. The click path through the products is also a journey through those products’ attributes. Given a small amount of engagement - say, three product page views - the WRE can begin learning the affinities of an individual shopper.

But as we all know, today’s consumer:
1. Uses multiple devices.
2. Does not think of those devices as disparate channels when engaging with a single site.
3. Is not logging in. In fact, BloomReach researched the rates at which people log into an e-commerce site and found it to be around 1% across desktop, tablets and smartphones (1.56%, 1.25% and 0.85% respectively).

To overcome these challenges, the WRE uses robust pattern detection technologies to connect anonymous users across multiple devices using a number of behavioral and technical signals. For instance, if devices on one Wi-Fi network viewed the same category page and three product pages over a two-day period, there’s a high probability that it is the same person.

It is important to note that while the WRE does know quite a bit about an individual visitor to one of our customers’ sites, it does not collect or store any personally identifiable information (PII) about those visitors. Nor does BloomReach track consumers between websites. (See BloomReach’s privacy policy for details.)
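The Wi-Fi example above can be sketched as a simple matching rule. Everything here is hypothetical: the `SESSIONS` data, the opaque network tokens, the page paths and the threshold of four shared pages (one category page plus three product pages) are assumptions for illustration, and a production system would weigh many more signals than this.

```python
from itertools import combinations

# Hypothetical anonymous device sessions observed over a two-day window.
# "network" is an opaque token, not an address or any personal identifier.
SESSIONS = {
    "phone-A": {
        "network": "net-1",
        "pages": {"/dresses", "/p/101", "/p/102", "/p/103"},
    },
    "laptop-B": {
        "network": "net-1",
        "pages": {"/dresses", "/p/101", "/p/102", "/p/103", "/cart"},
    },
    "tablet-C": {
        "network": "net-2",
        "pages": {"/dresses", "/p/101"},
    },
}


def likely_same_user(a, b, min_shared_pages=4):
    """Same network plus enough overlapping page views suggests one shopper."""
    if a["network"] != b["network"]:
        return False
    return len(a["pages"] & b["pages"]) >= min_shared_pages


def link_devices(sessions, min_shared_pages=4):
    """Return device-id pairs that the rule considers the same shopper."""
    return [
        (x, y)
        for x, y in combinations(sorted(sessions), 2)
        if likely_same_user(sessions[x], sessions[y], min_shared_pages)
    ]


if __name__ == "__main__":
    print(link_devices(SESSIONS))
```

A hard threshold like this stands in for what the paper describes as a probability; a fuller version would score each pair rather than return a yes-or-no answer.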
This cross-device connection can be useful for subtly personalizing the experience of a shopper and for proving the “mobile influence” for shoppers who browse on a smartphone, yet convert on a desktop.

Products
[Figure: product attributes such as fabric, color, material and style mapped across Categories 1 through 5]

The Web Relevance Engine begins its deep understanding of product catalogs by consuming product feeds and crawling the website. Once the WRE has that initial product record, the attributes contained within the data (which many merchants refer to as “tags”) are supplemented with additional attributes. This robust attribute data - which may include synonyms, contextual language about how the products are used, relationships with other products, etc. - is how the WRE determines the correct, relevant product mix to show on any Web page or widget. In search engine parlance, returning the right documents (products in this case) is known as “recall” and it is something the WRE excels at. BloomReach applies this WRE capability for applications including organic-search-optimized category pages, site search results and identification of opportunities to improve a site to capture unmet consumer expressions of demand.

But beyond recalling the right products for a given page, search query or recommendation, the WRE also gathers performance data at the product level via the listening pixel. For example, how well a given dress performs when shown as a result for a “summer dresses” query should determine where that dress is ranked on the page.

The combination of performance data with attributes is also applied by the WRE to new products as they enter the catalog. If a new dress is similar to a dress that’s performing well for the “summer dresses” query, that new dress can automatically receive a boost in the rankings for that query. The key here is the automated attribute extraction being used to recognize similar products.
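The new-product boost described above can be illustrated with a short sketch. It assumes, purely for illustration, that products are represented as sets of attribute strings, that similarity is measured with the Jaccard index, and that a new product borrows a similarity-weighted share of an existing product's score for a query. The names `jaccard` and `initial_score` and all the data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two attribute collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def initial_score(new_attrs, catalog, query_scores):
    """Starting score for a new product on one query.

    Takes the best similarity-weighted score among existing products, so a
    close match to a strong performer inherits most of that performance.
    """
    best = 0.0
    for product_id, attrs in catalog.items():
        weighted = jaccard(new_attrs, attrs) * query_scores.get(product_id, 0.0)
        best = max(best, weighted)
    return best


if __name__ == "__main__":
    # Hypothetical catalog and per-product scores for "summer dresses".
    catalog = {
        "dress-1": {"dress", "sleeveless", "cotton", "floral"},
        "coat-9": {"coat", "wool", "long"},
    }
    summer_dress_scores = {"dress-1": 0.8, "coat-9": 0.1}
    new_dress = {"dress", "sleeveless", "cotton", "striped"}
    print(round(initial_score(new_dress, catalog, summer_dress_scores), 2))
```

Because the boost is scaled by similarity, a new product that shares nothing with the strong performers starts at zero and must earn its rank from its own performance data.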
Finally, the personal affinities we described in the People section above are applied to the product rankings to ensure that each individual sees the products that best suit his or her tastes. The combination of vast data and machine learning allows the WRE to provide personalized recommendations, search results, category pages and more for each individual Web user. It makes sure, in short, that fans of crimson form-fitting Herve Legers discover the dress most dear to their hearts.

Demand
BloomReach sees over 1 billion customer interactions a week and through that broad experience, the Web Relevance Engine improves relevance day after day. Similarly, the WRE sees how consumers behave on individual customers’ sites, which gives it the data it needs to personalize recommendations for consumers, while providing merchants with insights into which products would benefit from more prominent placement and which should be dropped lower in the site’s hierarchy for lack of interest.

Now pair all of those insights into consumer demand with a deep understanding of the products in a retailer’s catalog and you have the ability to truly connect supply and demand. Economics 101. At its core, this is one of the biggest challenges that e-commerce sites face. They have the products and believe there is unmet demand for those products. Yet the friction those consumers face in discovering those products is a hurdle to meeting business goals. It’s a big data problem that the WRE is uniquely capable of solving.

Applications
[Figure: the Personalized Discovery Platform: SNAP, Compass and Organic Search built on people data, products and Web-wide demand]

The Web Relevance Engine is the underlying engine of the BloomReach Personalized Discovery Platform that powers BloomReach Compass, SNAP (Search, Navigation and Personalization) and Organic Search.

The cloud-based nature of the WRE also means that it is instantaneously and infinitely scalable. Additional clusters can be deployed at a moment’s notice, meaning new customers can be added efficiently, without any disruption to existing customers.

Customer Acquisition
BloomReach Organic Search uses the WRE to increase crawlability, improve site content and cluster products on high-quality thematic pages. Each of these helps our customers capture and convert consumer demand from search engines. The WRE also helps identify additional opportunities, facilitates creating new category pages (which can also be used for other marketing channels, like email, paid search or social), and continuously monitors the health of those pages - both from a technical and consumer perspective. Achieving quality coverage for organic search at scale is a challenge that necessitates having technology like the WRE.

Search, Navigation & Personalization
BloomReach SNAP leverages the WRE to optimize and personalize product discovery for every single visitor to a website. A shopper’s path to purchase may include site search, navigating to category pages and engaging with product recommendations. And that journey may take place across multiple devices. BloomReach SNAP learns from every interaction with a shopper - across devices - to better tailor the product mix, ranking and recommendations to suit her expressed affinities. The results are an improved customer experience and revenue lift.

Site Optimization & Merchandising
Using the WRE, BloomReach Compass identifies opportunities to better optimize a site and digital marketing campaigns. Compass prioritizes actionable recommendations to site merchants so they can reduce friction in the buying journey. Compass is able to provide recommendations like no other tool because the WRE understands the products in the catalog, the behavior of each visitor and the signals that indicate where those people and products are having difficulty finding one another.

Conclusion
The Web is truly one of the technological wonders of our time.
But we are still in the early days of taking advantage of the opportunities that can arise from a world where everyone is constantly connected to a nearly unfathomable amount of content. The Web Relevance Engine represents a giant leap forward, harnessing the power of big data, machine learning and natural language processing to deliver the relevant results consumers have come to expect in the era of always-on search and discovery. The WRE is driving business success, while delivering the promise of a relevant and reliable Web.

bloomreach.com

About BloomReach: The BloomReach Personalized Discovery Platform understands and matches your content to what people are seeking, across marketing channels and devices. BloomReach makes your content and products more discoverable with applications for organic search, site search (SNAP), site optimization and merchandising.

Further reading: http://bloomreach.com/resources/