
Functional Search™ Technical Overview
April 2013
Outline
1. What is Functional Search™?
2. How it works
3. The future
“nearby restaurant”
“when in a movie to pee”
“control my laptop”
Powered by Quixey
APP STORE SEARCH
PRELOADED APP SEARCH WIDGET
VOICE-ACTIVATED APP SEARCH
All-Platform Search Solution
• Mobile: iPhone, Android, Windows Phone
• Web-Based Platforms: HTML5, Facebook, LinkedIn, Salesforce, Twitter
• Desktop: Mac, PC
• Browser: Firefox Add-ons, Chrome Extensions, IE Add-ons
Data Sources
Apps as first-order objects
[Diagram: an App is a first-order object with multiple editions, one per platform/device. Developers contribute structured metadata fields; users contribute text metadata.]
Outline
1. What is Functional Search™?
2. How it works
3. The future
Machine-Learned Regression Search
(query, app) → <feature1, …, feature100> → score
1. Define features
(query, app) → <feature1, …, feature100>
2. Collect training points
(query, app) → score
3. Train machine-learned regression model
<feature1, …, feature100> → score
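A minimal sketch of this pipeline, with illustrative names and signatures (the deck specifies only the three stages, not the code):

    from typing import Callable, List, Sequence, Tuple

    FeatureVector = List[float]

    def rank(query: str,
             apps: Sequence[dict],
             featurize: Callable[[str, dict], FeatureVector],
             model_score: Callable[[FeatureVector], float]) -> List[Tuple[float, dict]]:
        """Score every candidate app for the query and sort best-first."""
        scored = [(model_score(featurize(query, app)), app) for app in apps]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)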
Types of Features
• Query Features
  – Word count
  – Popularity
  – Category classification
• Result Features (a.k.a. App Features)
  – Downloads
  – Star ratings
  – Avg. review positivity
  – Number of platforms
• Query-Result Features
  – tf-idf for app title
  – tf-idf for entire app metadata text corpus
  – Query-result category alignment
  – Matches for domain concepts like “free” and “iphone”
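A sketch of what step 1 (“define features”) could look like over features like the ones above; the app field names and formulas are assumptions for illustration, not Quixey’s implementation:

    import math

    def extract_features(query: str, app: dict) -> list:
        """Illustrative (query, app) -> feature-vector mapping.
        The app fields ("title", "downloads", ...) are hypothetical."""
        q = query.lower().split()
        title = app["title"].lower().split()
        title_overlap = len(set(q) & set(title)) / max(len(q), 1)
        return [
            # Query features
            float(len(q)),                         # word count
            # Result (app) features
            math.log1p(app["downloads"]),          # downloads, log-scaled
            float(app["stars"]),                   # average star rating
            float(app["num_platforms"]),           # number of platforms
            # Query-result features
            title_overlap,                         # crude stand-in for title tf-idf
            1.0 if ("free" in q and app["is_free"]) else 0.0,  # "free" concept match
        ]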
No single feature is sufficient
App                  | Title Text Match | Non-title freq of “games” | App Popularity | How good for query
Angry Birds          | low              | high                      | very high      | very high
Sudoku (genina.com)  | low              | low                       | high           | high
PacMan               | low              | high                      | high           | high
Cave Shooter         | low/medium       | medium                    | low            | medium
Stupid Maze Game     | very high        | medium                    | very low       | low
“coffee shop” – more popular isn’t always best
Metafeatures Reduce the ML Problem
• Instead of learning one huge regression <feature1, …, feature100> → score…
• We can define metafeatures:
  <feature1, …, feature10> → MetaFeature1 (e.g. “App Quality”)
  …
  <feature91, …, feature100> → MetaFeature10 (e.g. “Query-to-App Text Match”)
• Then do a smaller regression:
  <MetaFeature1, …, MetaFeature10> → score
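A minimal sketch of this factoring, assuming two hand-picked metafeatures and placeholder data, with scikit-learn standing in for the actual regression tooling:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    raw = np.random.rand(500, 100)   # placeholder <feature1..feature100> vectors
    scores = np.random.rand(500)     # placeholder rater scores

    # Stage 1: collapse hand-chosen feature groups into metafeatures.
    # The groupings (columns 0-9, 90-99) and the mean() are illustrative.
    app_quality = raw[:, 0:10].mean(axis=1)    # e.g. "App Quality"
    text_match = raw[:, 90:100].mean(axis=1)   # e.g. "Query-to-App Text Match"
    meta = np.column_stack([app_quality, text_match])

    # Stage 2: a much smaller regression over the metafeatures.
    small_model = LinearRegression().fit(meta, scores)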
Metafeatures: Pros and Con
• Con: Constrains what the ML can learn
  – Now it can’t learn facts like “a high Text Relevance score is a bad sign for apps that have lots of tweets”
  – The concept of “# of tweets” is screened off by the concept of “Quality”
  – But anticipated relationships can be addressed
[Diagram: “# of tweets” feeds into the Quality metafeature; Quality and Text Relevance both feed into the Overall Score.]
Metafeatures: Pros and Con
• Pro: Use our domain knowledge to factor the problem
  – Metafeatures introduce feature-independence assumptions
  – E.g. the “Quality Score” metafeature takes into account:
    • Store-specific star ratings (iTunes App Store, Google Play, Windows Phone Store, BlackBerry World, etc.)
    • Store-specific review counts
    • Avg. popularity of all apps by this developer
    • Number of recent tweets about an app
Metafeatures: Pros and Con
• Pro: Easier to get high-quality test data
  – Instead of asking our testers “How good overall is this app for this query?”
  – We can ask:
    • “How high-quality is this app, given these star ratings / reviews / tweets?”
    • “How textually relevant is this app to this query, given this selection of text?”
Machine-Learned Regression Search
(query, app) → <feature1, …, feature100> → score
1. Define features
(query, app) → <feature1, …, feature100>
2. Collect training points
(query, app) → score
3. Train machine-learned regression model
<feature1, …, feature100> → score
Collecting evaluation points
• Hire full-time paid testers
• (query, app) → score from 1 to 5
• Hundreds of points per tester per day
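In data terms, each judgment is just a labeled pair; a minimal sketch of the record format (field and app names assumed):

    from dataclasses import dataclass

    @dataclass
    class TrainingPoint:
        """One rater judgment: how good this app is for this query."""
        query: str
        app_id: str
        score: int   # 1 (bad) .. 5 (great), as on the slide

    points = [
        TrainingPoint("nearby restaurant", "some-restaurant-app", 5),
        TrainingPoint("nearby restaurant", "stupid-maze-game", 1),
    ]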
Machine-Learned Regression Search
(query, app) → <feature1, …, feature100> → score
1. Define features
(query, app) → <feature1, …, feature100>
2. Collect training points
(query, app) → score
3. Train machine-learned regression model
<feature1, …, feature100> → score
Q: What kind of regression model does Quixey use?
A: Commercial Gradient Boosted Decision Trees (GBDT) for search ranking
…and other ML stuff for query understanding, dynamic app classification, cross-platform app edition merging
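TreeNet itself is a commercial package, but scikit-learn’s GradientBoostingRegressor expresses the same step-3 training, with learning_rate / n_estimators playing the role of the learnRate / maxTrees settings mentioned on the next slide (data here is a placeholder):

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    X = np.random.rand(1000, 100)   # placeholder <feature1..feature100> vectors
    y = np.random.rand(1000)        # placeholder rater scores (1-5, rescaled)

    model = GradientBoostingRegressor(learning_rate=0.05, n_estimators=500)
    model.fit(X, y)                 # minimizes squared error on training points
    scores = model.predict(X[:5])   # per-(query, app) relevance scores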
Choosing the best model
• TreeNet outputs models that minimize mean-squared error on training points
  – a metric on (query, app, score) points
• Our real metric is Normalized Discounted Cumulative Gain (nDCG)
  – a metric on (query, <app1, …, app5>) multi-app rankings
• We might evaluate several TreeNet models before seeing a real nDCG improvement. We might retry regression with:
  – Different combinations of features
  – Different TreeNet settings like learnRate and maxTrees, and different splits of the data
  – More/better data (or fixes to errors in our training data)
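For reference, the standard nDCG computation the slide alludes to, sketched for top-5 rankings (the normalizer is the ideal ordering of all candidates):

    import math

    def dcg(relevances):
        """Discounted cumulative gain: position i is discounted by log2(i+2)."""
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

    def ndcg_at_5(ranked_relevances, all_relevances):
        """DCG of our top 5 divided by the DCG of the ideal top 5."""
        ideal = dcg(sorted(all_relevances, reverse=True)[:5])
        return dcg(ranked_relevances[:5]) / ideal if ideal > 0 else 0.0

    # Rater scores (1-5) of the apps we returned, in the order we ranked them:
    print(ndcg_at_5([5, 3, 4, 1, 2], [5, 4, 3, 2, 1, 1]))  # < 1.0: order isn't ideal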
Outline
1. What is Functional Search™?
2. How it works
3. The future
apps -vs- web
[Diagram, built up over several slides: people have Wants (e.g. “have a clean house”), which map to Functions (e.g. “list nearby karaoke places”), which are implemented by Technology. The Technological Web organizes technology by URL; the Functional Web organizes apps by Function.]
278 Castro Street
Mountain View, CA 94041
www.quixey.com
Liron Shapira, CTO
[email protected]