Building Watson A Brief Overview of the DeepQA Project

IBM Research
IBM Watson and Jeopardy:
Was it only a game?
Barclay Brown
IBM
1
© 2010 IBM Corporation
IBM Research
IBM Watson
The Enormity of the Challenge
The Evolution of DeepQA
How Watson Plays Jeopardy
What else can we do with this technology?
© 2010 IBM Corporation
IBM Research
IBM Watson
The Enormity of the Challenge
The Evolution of DeepQA
How Watson Plays Jeopardy
What else can we do with this technology?
© 2010 IBM Corporation
IBM Research
$400
Watson’s ancestor
defeated the world
champion at this in
1997, also a
Broadway musical.
© 2010 IBM Corporation
IBM Research
What is, “Chess” ?
Source: UPI
http://www.upi.com/News_Photos/view/6a0429fa1ada13e8ae7f5558b9deaf88/Champion-Garry-Kasparov-plays-IBMs-Deep-Blue/
© 2010 IBM Corporation
IBM Research
The Jeopardy! Challenge posed a unique and
compelling artificial intelligence question…
“Can a computer system be designed to compete
against the best humans at a task thought to
require high levels of human intelligence, and if
so, what kind of technology, algorithms, and
engineering is required?”
Ferrucci, et al, Building Watson, AI Magazine, Fall 2010
© 2010 IBM Corporation
IBM Research
Want to Play Chess or Just Chat?
Chess
–A finite, mathematically well-defined
search space
–Limited number of moves and states
–All the symbols are completely grounded in the
mathematical rules of the game
Human Language
–Words by themselves have no
meaning
–Only grounded in human cognition
–Words navigate, align and communicate an infinite space of
intended meaning
–Computers can not ground words to human experiences to derive
meaning
© 2010 IBM Corporation
IBM Research
Easy Questions?
ln((12,546,798 * π)) ^ 2 / 34,567.46 =
0.00885
Select Payment where Owner=“David Jones” and Type(Product)=“Laptop”,
Owner
Serial Number
David Jones
45322190-AK
Serial Number Type
Invoice #
45322190-AK
INV10895
LapTop
David Jones
David Jones
8
=
Invoice #
Vendor
Payment
INV10895
MyBuy
$104.56
Dave
Jones
David Jones
≠
© 2010 IBM Corporation
IBM Research
Hard Questions?
Computer programs are natively explicit, fast and exacting in their calculation over
numbers and symbols….But Natural Language is implicit, highly contextual,
ambiguous and often imprecise.
Person
Birth Place
A. Einstein
ULM
Where was X born?
Structured
Unstructured
One day, from among his city views of Ulm, Otto chose a water color to
send to Albert Einstein as a remembrance of Einstein´s birthplace.
 X ran this?
Person
Organization
J. Welch
GE
If leadership is an art then surely Jack Welch has proved himself a
master painter during his tenure at GE.
© 2010 IBM Corporation
IBM Research
The Jeopardy! Challenge: A compelling and notable way to drive and
measure the technology of automatic Question Answering along 5 Key Dimensions
Broad/Open
Domain
Complex
Language
$200
$1000
If you're standing, it's the
direction you should
look to check out the
wainscoting.
The first person
mentioned by name in
‘The Man in the Iron
Mask’ is this hero of a
previous book by the
same author.
$600
$2000
High
Precision
Accurate
Confidence
High
Speed
In cell division, mitosis
splits the nucleus &
cytokinesis splits this
liquid cushioning the
nucleus
Of the 4 countries in the
world that the U.S. does
not have diplomatic
relations with, the one
that’s farthest north
© 2010 IBM Corporation
IBM Research
Broad Domain
We do NOT attempt to anticipate all questions and build specialized databases.
3.00%
2.50%
2.00%
In a random sample of 20,000 questions we found
2,500 distinct types*. The most frequent occurring <3% of the time.
The distribution has a very long tail.
And for each these types 1000’s of different things may be asked.
1.50%
1.00%
Even going for the head of the tail will
barely make a dent
0.50%
he
film
group
capital
woman
song
singer
show
composer
title
fruit
planet
there
person
language
holiday
color
place
son
tree
line
product
birds
animals
site
lady
province
dog
substance
insect
way
founder
senator
form
disease
someone
maker
father
words
object
writer
novelist
heroine
dish
post
month
vegetable
sign
countries
hat
bay
0.00%
*13% are non-distinct (e.g., it, this, these or NA)
Our Focus is on reusable NLP technology for analyzing volumes of as-is text.
Structured sources (DBs and KBs) are used to help interpret the text.
11
© 2010 IBM Corporation
IBM Research
IBM Watson
The Enormity of the Challenge
The Evolution of DeepQA
How Watson Plays Jeopardy
What else can we do with this technology?
© 2010 IBM Corporation
IBM Research
$600
Watson’s spare-time
hobby--second only to
reading millions of
books (which it doesn’t
really understand).
© 2010 IBM Corporation
IBM Research
What is “parsing”?
Volumes of Text
Syntactic Frames
Semantic Frames
Inventors patent inventions (.8)
Officials Submit Resignations (.7)
People earn degrees at schools (0.9)
Fluid is a liquid (.6)
Liquid is a fluid (.5)
Vessels Sink (0.7)
People sink 8-balls (0.5) (in pool/0.8)
© 2010 IBM Corporation
Keyword Evidence
IBM Research
In May 1898 Portugal celebrated
the 400th anniversary of this
explorer’s arrival in India.
In May, Gary arrived in
India after he celebrated his
anniversary in Portugal.
arrived in
celebrated
In May
1898
Keyword Matching
400th
anniversary
This evidence
suggests “Gary” is
the answer BUT the
system must learn
that keyword
matching may be
weak relative to other
types of evidence
15
Keyword Matching
Portugal
celebrated
In May
Keyword Matching
anniversary
Keyword Matching
in Portugal
arrival in
India
explorer
Keyword Matching
India
Gary
© 2010 IBM Corporation
Deeper Evidence
IBM Research
In May 1898 Portugal celebrated
the 400th anniversary of this
explorer’s arrival in India.
On the 27th of May 1498, Vasco da
Gama landed in Kappad Beach
Search Far and Wide
Explore many hypotheses
celebrated
Find Judge Evidence
Portugal
May 1898
400th anniversary
landed in
Many inference algorithms
Temporal
Reasoning
27th May 1498
Date
Math
arrival
in
Stronger
evidence can
be much
harder to find
and score.
16
India
Statistical
Paraphrasing
GeoSpatial
Reasoning
Paraphrase
s
Kappad Beach
GeoKB
Vasco da Gama
explorer
The evidence is still not 100% certain.
© 2010 IBM Corporation
IBM Research
Not Just for Fun
Some Questions require
Decomposition and Synthesis
A long, tiresome speech delivered by a frothy pie topping
.
Diatribe
.
Harangue
.
.
.
Whipped Cream
.
.
Meringue
.
.
.
Answer: Meringue Harangue
Category: Edible Rhyme Time
17
© 2010 IBM Corporation
IBM Research
Divide and Conquer
(Typical in Final Jeopardy!)
Must identify and solve
sub-questions from different
sources to answer
the top level question
Lyndon B Johnson
In 1968
this man was U.S. president.
When "60 Minutes" premiered this man was U.S. president.
?
The DeepQA architecture attempts different decompositions and
recursively applies the QA algorithms
18
© 2010 IBM Corporation
IBM Research
The Missing Link
Buttons
Shirts,
TV remote controls, Telephones
Mt
Everest
Edmund
Hillary
He was first
On hearing of the discovery of George Mallory's body, he told reporters he still thinks he was first.
© 2010 IBM Corporation
IBM Research
IBM Watson
The Enormity of the Challenge
The Evolution of DeepQA
How Watson Plays Jeopardy
What else can we do with this technology?
© 2010 IBM Corporation
IBM Research
The Best Human Performance: Our Analysis Reveals the Winner’s Cloud
Each dot represents an actual historical human Jeopardy! game
Top human
players are
remarkably
good.
Winning Human
Performance
Grand Champion
Human Performance
Computers?
2007 QA Computer System
More Confident
21
Less Confident
© 2010 IBM Corporation
IBM Research
Some Growing Pains (early answers)
the People
THE AMERICAN DREAM
Decades before Lincoln, Daniel Webster spoke of government "made
for", "made by" & "answerable to" them
No One
WW I
NEW YORK TIMES HEADLINES
An exclamation point was warranted for the "end of" this! In 1918
a sentence
Apollo 11 moon landing
MILESTONES
In 1994, 25 years after this event, 1 participant said, "For one crowning
the Big Bang
moment, we were creatures of the cosmic ocean”
Call on the phone
THE QUEEN'S ENGLISH
Give a Brit a tinkle when you get into town & you've done this
urinate
Louis Pasteur
FATHERLY NICKNAMES
This Frenchman was "The Father of Bacteriology"
How Tasty Was My
Little Frenchman
© 2010 IBM Corporation
IBM Research
Grouping Features to produce Evidence Profiles
Clue: Chile shares its longest land border with this country.
1
0.8
0.6
Argentina
Bolivia
Bolivia is more
popular due to
a commonly
discussed
border dispute..
0.4
0.2
Positive Evidence
0
-0.2
Negative Evidence
© 2010 IBM Corporation
IBM Research
Evidence: Time, Popularity, Source, Classification etc.
Clue: You’ll find Bethel College and a Seminary in this “holy” Minnesota city.
Saint Paul
South Bend
There’s a Bethel College and a Seminary in both
cities. System is not weighing location evidence
high enough to give St. Paul the edge.
© 2010 IBM Corporation
IBM Research
Evidence: Puns
Clue: You’ll find Bethel College and a Seminary in this “holy” Minnesota city.
Saint Paul
South Bend
Humans may get this based on the pun since St.
Paul since is a “holy” city. We added a Pun Scorer
that discovers and scores Pun relationships.
© 2010 IBM Corporation
IBM Research
DeepQA: Incremental Progress in Answering Precision
on the Jeopardy Challenge: 6/2007-11/2010
IBM Watson
Playing in the Winners Cloud
100%
90%
v0.8 11/10
80%
V0.7 04/10
70%
v0.6 10/09
v0.5 05/09
Precision
60%
v0.4 12/08
50%
v0.3 08/08
v0.2 05/08
40%
v0.1 12/07
30%
20%
10%
Baseline 12/06
0%
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
% Answered
© 2010 IBM Corporation
IBM Research
Precision, Confidence & Speed
 Deep Analytics – Combining many analytics in a
novel architecture, we achieved very high levels of
Precision and Confidence over a huge variety of as-is
content.
 Speed – By optimizing Watson’s computation for
Jeopardy! on over 2,800 processing cores we went
from 2 hours per question on a single CPU to an
average of just 3 seconds.
 Results – in 55 real-time sparring games against
former Tournament of Champion Players last year,
Watson put on a very competitive performance in all
games -- placing 1st in 71% of the them!
© 2010 IBM Corporation
IBM Research
Watson Workload Optimized System (Power 750)
 90 x IBM Power 7501 servers
 2880 POWER7 cores
 POWER7 3.55 GHz chip
 500 GB per sec on-chip bandwidth
 10 Gb Ethernet network
 15 Terabytes of memory
 20 Terabytes of disk, clustered
 Can operate at 80 Teraflops
 Runs IBM DeepQA software
 Scales out with and searches vast amounts of unstructured
information with UIMA & Hadoop open source components
 SUSE Linux provides a cost-effective open platform which is
performance-optimized to exploit POWER 7 systems
 10 racks include servers, networking, shared disk system,
cluster controllers
1
Note that the Power 750 featuring POWER7 is a commercially available
server that runs AIX, IBM i and Linux and has been in market since Feb 2010
© 2010 IBM Corporation
IBM Research
$800
This computer does
NOT hear, see,
connect to the
Internet or watch
Jeopardy!
© 2010 IBM Corporation
IBM Research
What is “Watson” ?
Clue Grid
Insulated and
Self-Contained
Human Player
1
Decisions to
Buzz and Bet
Strategy
Watson’s
Game
Controller
Jeopardy!
Game
Control
System
Text-to-Speech
Clue &
Category
Answers &
Confidences
Watson’s
QA Engine
2,880
IBM Power750
Compute
Cores
15 TB of
Memory
Human Player
2
Clues, Scores & Other Game Data
Analysis of natural language content
equivalent to 1 Million Books
© 2010 IBM Corporation
IBM Research
Generalized DeepQA Reasoning Paradigm
Case/
Question/
Topic
Data
Learned Models
help combine and
weigh the Evidence
Evidence
Sources
Answer
Sources
Primary
Search
Question
/Case
Analysis
Candidate
Answer
Generation
Fact, Finding
and Question
Decomposition
Hypothesis
Scoring
Hypothesis
Generation
Hypothesis
Generation
Evidence
Retrieval
Hypothesis and
Evidence Scoring
Hypothesis and Evidence
Scoring
...
Deep
Evidence
Scoring
Synthesis
Models
Models
Models
Models
Models
Models
Final Merging
and Confidence
& Ranking
ConfidenceWeighted
Differential
Diagnosis
© 2010 IBM Corporation
IBM Research
IBM Watson
The Enormity of the Challenge
The Evolution of DeepQA
How Watson Plays Jeopardy
What else can we do with this technology?
© 2010 IBM Corporation
IBM Research
To leverage our lead we
must cross this trough
very quickly through
education and training
Growth
Dave: How do I get 8% return in this economy?
Watson: What is Toronto????
© 2010 IBM Corporation
35
IBM Research
© 2010 IBM Corporation
IBM Research
Watson’s real value proposition: Efficient decision support over
unstructured (and structured) content
Deeper Understanding,
Higher Precision and Broader,
Timely Coverage at lower costs
Jeopardy! Challenge
Shallow Understanding
Low Precision
Broad Coverage
Unstructured Data
 Broad, rich in context
 Rapidly growing, current
 Invaluable yet under utilized
Deeper Understanding but Brittle
High Precision at High Cost
Narrow Limited Coverage
Structured Data
 Precise, explicit
 Narrow, expensive
© 2010 IBM Corporation
37
IBM Research
Search vs. Database vs. Data Mining vs. Watson
(Watson uses unstructured and structured sources)
Search
Quickly
finding
documents
from large
volumes of
text based
on
keywords
Database
Search term: “Rudolph”
Finding
precise
information
based on
precise
queries
over tables
Document List
Based on Keywords
Query:
Select Payment where Owner=“David Jones” and
Type(Product)=“Laptop”,
Tables
Name
Serial Number
David Jones
45322190-AK
Serial Number
Type
Invoice #
45322190-AK
LapTop
INV10895
Invoice #
Vendor
Payment
INV10895
MyBuy
$104.56
Result: $104.56
Data Mining
I need to drop and
head I need to drop
and head
I need
I need
to to
drop and
drop and
head
head
I need to drop
I need to drop and
and head I need to
I need to drop and
head I need to drop
drop and head
head I need to drop
and head
I need
I need
to to
drop and
and head I need to
drop and
head
I need to drop and
head
I need to drop
drop and head
head I need to drop
and
head
I
need
to
I need to drop and
I need to drop and
and head I need to
drop
and
head
head I need to drop
I need to drop and
head I need to drop
drop and head
and head
I need
head I need to drop
I need to
I need
to to
drop andand head
I need to drop and
drop and
head
and head I need to
drop
and
head
head I need to drop
I need to drop and
head I need to drop
drop and head
and head I need to
head I need to drop
I need to
I need to drop andand head
I need to drop and
drop and head
and head I need to
drop
and
head
head I need to drop
I need to drop and
head I need to drop
drop and head
and head I need to
head I need to drop
I need to
I need to drop andand head
I need to drop and
drop and head
and head I need to
drop
and
head
head I need to drop
head I need to drop
drop and head
and head I need to
I need to
I need to drop andand head
I need to drop and
drop and head
drop
and
head
head I need to drop
head I need to drop
and head I need to
I need to drop andand head I need to
drop and head
head I need to dropdrop and head
Uncovering
knowledge
insights based on
searching for
correlations within
LDL > 220
large number of
and
Patient suffered a heart
cases
attack
Both in green
and head I need to
I need to drop and
drop and head
head I need to drop
and head I need to
drop and head
Watson
Efficiently apply
existing
knowledge to
help inform
decisionmaking
Name
Serial Number
David Jones
45322190-AK
Marion Barber
88769234-NY
I need to drop and
head I need to drop
Insights
from
other
analytics
and head
I need
to and
I need
to drop
drop and head
head I need to drop
I need to drop and
and head I need to
I need to drop and
head I need to drop
drop and
head
head I need to drop
and head
I need
to and
I need
to drop
and head I need to
drop and head
I need to drop and
head I need to drop
drop and head
and head I need to
head I need to drop
I need to drop and
I need to drop and
drop and head
and head I need to
head I need to drop
I need to drop and
head I need to drop
drop and head
and head
I need
to and
and head I need to
head I need to drop
I need
to drop
I need to drop and
drop and head
drop and head
and head I need to
head I need to drop
I need to drop and
head I need to drop
drop and head
and head I need to
and head I need to
head I need to drop
I need to drop and
I need to drop and
drop and head
drop and head
and head I need to
head I need to drop
I need to drop and
head I need to drop
drop and head
and head I need to
and head I need to
head I need to drop
I need to drop and
I need to drop and
drop and head
drop and head
and head I need to
head I need to drop
head I need to drop
drop and head
and head I need to
and head I need to
I need to drop and
I need to drop and
drop and head
drop and head
head I need to drop
head I need to drop
and head I need to
and head I need to
I need to drop and
drop and head
drop
and
head
head I need to drop
38
© 2010 IBM Corporation
and head I need to
I need to drop and
drop and head
head I need to drop
and head I need to
drop and head
IBM Research
Evidence Profiles from disparate data is a powerful idea
 Each dimension contributes to supporting or refuting hypotheses based on
– Strength of evidence
– Importance of dimension for diagnosis (learned from training data)
 Evidence dimensions are combined to produce an overall confidence
Overall Confidence
Positive
Evidence
Negative
Evidence
© 2010 IBM Corporation
IBM Research
What’s next for Watson? Some ideas from students.
IBM Announces Student Winners of Watson Case
Competition from Cornell University
 Faculty award winners from Nine Universities Also Announced; Professors to
Receive $10,000 Grants for Watson Curriculums
 First Place: Customer Service: Say Hello to Watson - Consumers with urgent
questions about their electronic devices, call, e-mail, post and chat with customer
service representatives. Watson could look at unstructured and structured
information to help consumer electronics firms answer inquiries with greater
accuracy and faster response times ultimately creating happy customers.
 Second Place: Watson Helps You Plan Your Next Beach Getaway Faster Planning business trips and vacations, including websites that aggregate flight
and hotel rates, tourism sites packed with things to do, and millions of online
reviews from fellow travelers.
 Third Place: You're Hired! Big Data Helping HR - Adaptive human capital
management model that calls upon Watson's ability to quickly comb through data
and make informed recommendations, to help businesses match open jobs with
the best candidates.
© 2010 IBM Corporation
IBM Research
© 2010 IBM Corporation