Under the Hood: Facebook Graph Search

A Survey
Xiangqian Lee
Websoft






Graph Search uses natural language to search the
huge social graph.
Users, pages, photos, places, posts, games…
are all nodes.
Relationships can be: like, friend of, follow,
check in, …
Querying by natural language is simple and natural.
Keyword-based search and form filling do not work well here.
Natural language queries can work well in a
domain-specific area.
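
As a rough illustration of the "everything is a node" model above, the following sketch stores typed nodes and labeled edges in memory. The class name, method names, and sample data are hypothetical and only meant to make the data model concrete; they are not taken from Facebook's implementation.

```python
from collections import defaultdict


class SocialGraph:
    """Toy typed graph: every entity is a node, every relationship a labeled edge."""

    def __init__(self):
        self.node_type = {}            # node id -> "user", "page", "place", ...
        self.edges = defaultdict(set)  # (relation, source id) -> set of target ids

    def add_node(self, node_id, kind):
        self.node_type[node_id] = kind

    def add_edge(self, relation, source, target):
        self.edges[(relation, source)].add(target)

    def neighbors(self, relation, source):
        # .get() avoids creating empty entries for unknown keys.
        return set(self.edges.get((relation, source), set()))


graph = SocialGraph()
graph.add_node("alice", "user")
graph.add_node("bob", "user")
graph.add_node("nanjing", "place")
graph.add_edge("friend", "alice", "bob")
graph.add_edge("check_in", "bob", "nanjing")
```

Keying edges by (relation, source) already resembles the "relation:id" terms that appear in the S-expression queries later in the deck.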

Two main challenges (by Lars Rasmussen)
◦ Parsing natural language into structured queries.
◦ Highly scalable indexes of nodes and relationships
that support frequent large-scale updates and
searches.

Natural Language Query:
◦ friend in nanjing

Semantic Language:
◦ Intersect(friend(me), residents(115073811842312))

S-expression Language (input to Unicorn):
◦ (and friend:232343, residents:115073811842312)

Grammar: Weighted Context Free Grammar (WCFG)
N-gram
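
The step from the semantic language to the S-expression language can be sketched as a small tree rewrite. The tuple representation and the `to_sexpr` function are assumptions made for illustration; the only facts taken from the slide are the two example expressions and the viewer id 232343 standing in for "me".

```python
def to_sexpr(tree, viewer_id):
    """Translate a nested (operator, args...) tuple into an S-expression string."""
    op, *args = tree
    if op == "Intersect":
        inner = ", ".join(to_sexpr(arg, viewer_id) for arg in args)
        return f"(and {inner})"
    # Otherwise treat the node as relation(entity), e.g. ("friend", "me").
    (entity,) = args
    if entity == "me":
        entity = viewer_id  # resolve the viewer to a concrete node id
    return f"{op}:{entity}"


semantic = ("Intersect", ("friend", "me"), ("residents", 115073811842312))
sexpr = to_sexpr(semantic, viewer_id=232343)
print(sexpr)
```

With the slide's ids this prints `(and friend:232343, residents:115073811842312)`.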


Detect all possible query segments that refer
to an entity or a relation.
For each segment:
◦ Find its possible categories, each with a probability.

Use Facebook Typeahead to resolve the entities
behind the query segments with high
confidence.
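
A minimal sketch of the segment detection described above: enumerate every n-gram of the query and look it up in a lexicon that returns candidate categories with probabilities. The `LEXICON` table, its category labels, and its probabilities are invented for illustration; in the real system this role is played by trained models and Typeahead, whose interfaces are not described in the slides.

```python
# Hypothetical lexicon: segment text -> {category: probability}.
LEXICON = {
    "friend": {"relation:friend": 0.9},
    "in": {"relation:residents": 0.6, "relation:check_in": 0.4},
    "nanjing": {"entity:city": 0.8, "entity:page": 0.2},
}


def candidate_segments(query, max_len=3):
    """Return (start, end, text, categories) for every n-gram found in LEXICON."""
    tokens = query.lower().split()
    found = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            text = " ".join(tokens[start:end])
            if text in LEXICON:
                found.append((start, end, text, LEXICON[text]))
    return found


segments = candidate_segments("friend in nanjing")
for start, end, text, categories in segments:
    print(start, end, text, categories)
```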

One intent can be expressed in various ways.
◦ “photos of my friends”
◦ “friend photos”
◦ “photos with my friends”
◦ “pictures of my friends”
◦ “photos of facebook friends”

Queries may not be grammatically correct.
Synonyms.
Unimportant terms in the trees.
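
To make the point about synonyms and unimportant terms concrete, here is a deliberately crude bag-of-words sketch that maps the five phrasings above to one canonical term set. The `SYNONYMS` and `UNIMPORTANT` tables are assumptions for this example only, and this is not the WCFG-based approach the slides describe; a real parser would, for instance, treat "with" as potentially meaningful.

```python
# Hypothetical normalization tables for this example only.
SYNONYMS = {"pictures": "photos", "friend": "friends"}
UNIMPORTANT = {"of", "my", "with", "facebook"}


def canonical_terms(query):
    """Drop unimportant tokens, map synonyms, and ignore word order."""
    tokens = query.lower().split()
    kept = [SYNONYMS.get(tok, tok) for tok in tokens if tok not in UNIMPORTANT]
    return frozenset(kept)


variants = [
    "photos of my friends",
    "friend photos",
    "photos with my friends",
    "pictures of my friends",
    "photos of facebook friends",
]
canonical = {canonical_terms(v) for v in variants}
print(canonical)
```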


Find all terminal rules that match the query.
Search:
◦ Generate candidate semantic trees, which map to
semantic-language expressions.
◦ Each tree is generated from a subset of terminal
rules whose matches form a sequence of consecutive,
non-overlapping tokens covering the whole query.
◦ Output a top-k list of semantic trees, which become
the natural language query suggestions.
◦ Apply semantic scoring to prevent near-duplicate
or semantically incorrect suggestions.
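
The search step above can be sketched as follows: match terminal rules against the token sequence, enumerate every tiling of the query by consecutive, non-overlapping matches, and keep the k highest-scoring tilings. The rules, symbols, and weights are hypothetical, the score is simply the product of rule weights, and the sketch omits non-terminal rules and semantic scoring, so it illustrates only the coverage constraint rather than the full WCFG parser.

```python
import heapq

# Hypothetical terminal rules: (token tuple, semantic symbol, weight in (0, 1]).
TERMINAL_RULES = [
    (("friend",), "friend(me)", 0.9),
    (("friend",), "friend-of-friend(me)", 0.3),
    (("in",), "residents", 0.7),
    (("in",), "visited", 0.4),
    (("nanjing",), "115073811842312", 0.8),
    (("in", "nanjing"), "residents(115073811842312)", 0.6),
]


def match_rules(tokens):
    """Index every rule match by the token position where it starts."""
    by_start = {}
    for phrase, symbol, weight in TERMINAL_RULES:
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == phrase:
                by_start.setdefault(i, []).append((i + n, symbol, weight))
    return by_start


def covers(tokens, by_start, pos=0):
    """Yield (score, symbols) for each rule sequence that tiles tokens[pos:]."""
    if pos == len(tokens):
        yield 1.0, []
        return
    for end, symbol, weight in by_start.get(pos, []):
        for score, rest in covers(tokens, by_start, end):
            yield weight * score, [symbol] + rest


def top_k_parses(query, k=3):
    tokens = query.lower().split()
    candidates = covers(tokens, match_rules(tokens))
    return heapq.nlargest(k, candidates, key=lambda parse: parse[0])


best = top_k_parses("friend in nanjing")
for score, symbols in best:
    print(round(score, 3), symbols)
```

Exhaustive enumeration is fine for a three-token query; a production parser would need dynamic programming or beam search to keep the candidate set manageable.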

NL query: friend in nanjing

URL:
https://www.facebook.com/search/me/friends/115073811842312/residents/present/intersect

Semantic Language:
Intersect(friend(me), present(residents(115073811842312)))
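
Comparing the URL with the semantic expression suggests that the path after /search/ reads as a postfix (operand first, operator after) encoding of the same tree. The sketch below decodes it with a stack. The operator tables, their arities, and the rule that "intersect" consumes everything on the stack are assumptions inferred from this single example, not a documented URL format.

```python
from urllib.parse import urlparse

# Assumed operators, inferred from this single example URL.
UNARY = {"friends": "friend", "residents": "residents", "present": "present"}
NARY = {"intersect": "Intersect"}


def decode_search_url(url):
    """Read the path after /search/ as postfix tokens and rebuild the expression."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if not parts or parts[0] != "search":
        raise ValueError("not a search URL")
    stack = []
    for token in parts[1:]:
        if token in UNARY:
            stack.append(f"{UNARY[token]}({stack.pop()})")
        elif token in NARY:
            operands, stack = stack, []
            stack.append(f"{NARY[token]}({', '.join(operands)})")
        else:
            stack.append(token)  # operand: "me" or a numeric entity id
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


url = ("https://www.facebook.com/search/me/friends/"
       "115073811842312/residents/present/intersect")
expression = decode_search_url(url)
print(expression)
```

For the slide's URL this reproduces `Intersect(friend(me), present(residents(115073811842312)))`.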

Unicorn
◦ Inverted-index framework: nodes and relations
◦ In-memory
◦ INPUT: an S-expression query language
◦ Uses Hive, Hadoop & HBase to update indexes.
◦ Update scale:
▪ 1 billion people,
▪ 240 billion photos,
▪ 1 trillion connections,
▪ thousands of connection types
per month.
◦ A series of query optimizations is adopted.
My friends who live in Beijing, China and like Friends (TV show)
Intersect(friend(me), residents(12345), like(67890))
(and friend:13579, live-in:12345, like:67890)
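
An `and` query over an inverted index boils down to intersecting posting lists. The sketch below uses the three terms from the example with made-up posting lists of user ids; the merge-based intersection and the shortest-list-first ordering are standard inverted-index techniques and are not claimed to be Unicorn's actual implementation.

```python
# Toy inverted index: term "relation:id" -> sorted posting list of user ids.
# The posting lists are invented for this example.
INDEX = {
    "friend:13579": [101, 102, 105, 110],
    "live-in:12345": [102, 103, 105, 110, 120],
    "like:67890": [100, 105, 110, 130],
}


def intersect_sorted(a, b):
    """Merge-intersect two sorted posting lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def and_query(terms):
    """Intersect the posting lists of all terms, shortest list first."""
    lists = sorted((INDEX.get(term, []) for term in terms), key=len)
    result = lists[0] if lists else []
    for postings in lists[1:]:
        result = intersect_sorted(result, postings)
    return result


hits = and_query(["friend:13579", "live-in:12345", "like:67890"])
print(hits)
```

Starting from the shortest list keeps intermediate results small, which is one simple example of the kind of query optimization the Unicorn slide alludes to.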
[Figure: graph with nodes Friends, me, Beijing]
(apply R: A): apply the binary relation R to the set A.
[Figure: graph with nodes Columbia University, Google, Me, Goldman, New York]
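
One way to read the `apply` operator is: for every id in the set A, look up the posting list for the term R:id and take the union. The sketch below shows this for a friends-of-friends query. The index contents and the function name are hypothetical, and the union semantics is an interpretation of the one-line definition above.

```python
# Toy friend index: term "friend:id" -> posting list of that user's friends.
FRIEND_INDEX = {
    "friend:1": [2, 3],
    "friend:2": [1, 4],
    "friend:3": [1, 4, 5],
}


def apply_relation(index, relation, ids):
    """Union of the posting lists relation:x for every x in ids."""
    result = set()
    for x in ids:
        result.update(index.get(f"{relation}:{x}", []))
    return sorted(result)


friends_of_1 = FRIEND_INDEX["friend:1"]
friends_of_friends = apply_relation(FRIEND_INDEX, "friend", friends_of_1)
print(friends_of_friends)
```

Note that the result still contains user 1, so a real friends-of-friends query would typically filter out the searcher and direct friends afterwards.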

For hundreds or thousands of results:
◦ Because the query is assumed to match your intent
closely, query relevance is less important.
◦ Rank by relevance to your social network.
◦ Example:
▪ Find restaurants.
▪ Restaurants liked by more people who are closely
related to you are ranked higher.
◦ Further ranking rules are also applied.
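
The restaurant example can be sketched as a score that sums, over everyone who liked a restaurant, how close that person is to the searcher. The closeness values, names, and the plain-sum scoring function are all assumptions for illustration; the slides do not specify the actual ranking formula.

```python
# Hypothetical closeness of each person to the searcher (0 = stranger, 1 = very close).
CLOSENESS = {"ann": 0.9, "bo": 0.6, "cy": 0.1, "di": 0.05}

# Hypothetical likes: restaurant -> people who liked it.
LIKES = {
    "Noodle House": ["ann", "bo"],
    "Pizza Place": ["cy", "di", "bo"],
    "Tea Room": ["di"],
}


def social_score(likers):
    """Sum the closeness of everyone who liked the restaurant."""
    return sum(CLOSENESS.get(person, 0.0) for person in likers)


def rank_restaurants(likes):
    """Order restaurants by descending social score."""
    return sorted(likes, key=lambda name: social_score(likes[name]), reverse=True)


ranking = rank_restaurants(LIKES)
print(ranking)
```

Here "Noodle House" outranks "Pizza Place" despite having fewer likes, because its likes come from people closer to the searcher.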

Facebook Engineering blog:
◦ Under the Hood: Building out the infrastructure for Graph Search (https://www.facebook.com/notes/facebook-engineering/under-the-hood-building-out-the-infrastructure-for-graph-search/10151347573598920)
◦ Under the Hood: Indexing and ranking in Graph Search (https://www.facebook.com/notes/facebook-engineering/under-the-hood-indexing-and-ranking-in-graph-search/10151361720763920)
◦ Under the Hood: The natural language interface of Graph Search (https://www.facebook.com/notes/facebook-engineering/under-the-hood-the-natural-language-interface-of-graph-search/10151432733048920)

Reddit Post:
◦ Ask Me Anything post on Reddit by Lars Rasmussen (http://www.reddit.com/r/IAmA/comments/18jb6d/i_am_the_pointyhaired_engineering_director_for/)