Background, Overview and What’s Next Christopher Codella, Ph.D. IBM Distinguished Engineer, CTO Public Sector, Watson Group © 2014 IBM Corporation Cognitive Systems… … learn and interact naturally with people to extend what humans could do on their own. They help us solve problems by penetrating the complexity of Big Data. Cognitive Systems Era Sensors & Devices Programmable Systems Era Tabulating Systems Era We are here Social Media VoIP Enterprise Data © 2014 IBM Corporation Informed Decision Making: Search vs. Expert Q&A Decision Maker Has Question Search Engine Distills to 2-3 Keywords Finds Documents containing Keywords Reads Documents, Finds Answers Delivers Documents based on Popularity Finds & Analyzes Evidence Expert Decision Maker Understands Question Asks NL Question Produces Possible Answers & Evidence Considers Answer & Evidence Analyzes Evidence, Computes Confidence Delivers Response, Evidence & Confidence © 2014 IBM Corporation Automatic Open-Domain Question Answering A Long-Standing Challenge in Artificial Intelligence to emulate human expertise Given – Rich Natural Language Questions – Over a Broad Domain of Knowledge Deliver – – – – 4 Precise Answers: Determine what is being asked & give precise response Accurate Confidences: Determine likelihood answer is correct Consumable Justifications: Explain why the answer is right Fast Response Time: Precision & Confidence in <3 seconds © 2014 IBM Corporation A Grand Challenge Opportunity Capture the imagination – The Next Deep Blue Engage the scientific community – Envision new ways for computers to impact society & science – Drive important and measurable scientific advances Be Relevant to IBM Customers – Enable better, faster decision making over unstructured and structured content – Business Intelligence, Knowledge Discovery and Management, Government, Compliance, Publishing, Legal, Healthcare, Busin ess Integrity, Customer Relationship Management, Web Self-Service, Product Support, etc. 5 © 2014 IBM Corporation Real Language is Real Hard Chess –A finite, mathematically well-defined search space –Limited number of moves and states –Grounded in explicit, unambiguous mathematical rules Human Language –Ambiguous, contextual and implicit –Grounded only in human cognition –Seemingly infinite number of ways to express the same meaning © 2014 IBM Corporation What It Takes to compete against Top Human Jeopardy! Players Our Analysis Reveals the Winner’s Cloud Each dot – actual historical human Jeopardy! games Top human players are remarkably good. Winning Human Performance Grand Champion Human Performance 2007 QA Computer System 7 © 2014 IBM Corporation What Computers Find Easy (and Hard) (ln(12,546,798 * π)) ^ 2 / 34,567.46 = 0.00885 Select Payment where Owner=“David Jones” and Type(Product)=“Laptop”, Owner Serial Number David Jones 45322190-AK Serial Number Type Invoice # 45322190-AK INV10895 LapTop David Jones David Jones 8 = IBM Confidential Invoice # Vendor Payment INV10895 MyBuy $104.56 Dave Jones David Jones ≠ © 2014 IBM Corporation What Computers Find Hard Computer programs are natively explicit, fast and exacting in their calculation over numbers and symbols….But Natural Language is implicit, highly contextual, ambiguous and often imprecise. Person Birth Place A. Einstein ULM Where was X born? Structured Unstructured One day, from among his city views of Ulm, Otto chose a water color to send to Albert Einstein as a remembrance of Einstein´s birthplace. X ran this? Person Organization J. Welch GE If leadership is an art then surely Jack Welch has proved himself a master painter during his tenure at GE. © 2014 IBM Corporation Different Types Of Evidence: Keyword Evidence In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India. In May, Gary arrived in India after he celebrated his anniversary in Portugal. arrived in celebrated In May 1898 Keyword Matching 400th anniversary Evidence suggests “Gary” is the answer BUT the system must learn that keyword matching may be weak relative to other types of evidence Portugal celebrated In May Keyword Matching anniversary Keyword Matching in Portugal arrival in India explorer 10 Keyword Matching Keyword Matching India Gary © 2014 IBM Corporation Different Types Of Evidence: Deeper Evidence In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India. On27th 27thMay May1498, 1498,Vasco Vascoda daGama Gama On Onlanded 27th May 1498, Vasco da Gama th of Kappad Beach Vasco da Onlanded the 27inin MayBeach 1498, Kappad landed in Kappad Beach Gama landed in Kappad Beach Search Far and Wide Explore many hypotheses celebrated Find Judge Evidence Portugal May 1898 400th anniversary landed in Many inference algorithms Temporal Reasoning 27th May 1498 Date Math Stronger evidence can be much harder to find and score. arrival in Statistical Paraphrasing Paraphrases India GeoSpatial Reasoning Kappad Beach Geo-KB Vasco da Gama explorer The evidence is still not 100% certain. 11 © 2014 IBM Corporation Some Basic Jeopardy! Clues This fish was thought to be extinct millions of years ago until one was found off South Africa in 1938 Category: ENDS IN "TH" Answer: coelacanth The type of thing being asked for is often indicated but can go from specific to very vague When hit by electrons, a phosphor gives off electromagnetic energy in this form Category: General Science Answer: light (or photons) Secy. Chase just submitted this to me for the third time--guess what, pal. This time I'm accepting it Category: Lincoln Blogs Answer: his resignation 12 © 2014 IBM Corporation Broad Domain We do NOT attempt to anticipate all questions and build databases. We do NOT try to build a formal model of the world 3.00% 2.50% In a random sample of 20,000 questions we found 2,500 distinct types*. The most frequent occurring <3% of the time. The distribution has a very long tail. 2.00% And for each these types 1000’s of different things may be asked. 1.50% 1.00% Even going for the head of the tail will barely make a dent 0.50% he film group capital woman song singer show composer title fruit planet there person language holiday color place son tree line product birds animals site lady province dog substance insect way founder senator form disease someone maker father words object writer novelist heroine dish post month vegetable sign countries hat bay 0.00% *13% are non-distinct (e.g, it, this, these or NA) Our Focus is on reusable NLP technology for analyzing vast volumes of as-is text. Structured sources (DBs and KBs) provide background knowledge for interpreting the text. 13 © 2014 IBM Corporation Automatic Learning For “Reading” Volumes of Text Syntactic Frames Semantic Frames Inventors patent inventions (.8) Officials Submit Resignations (.7) People earn degrees at schools (0.9) Fluid is a liquid (.6) Liquid is a fluid (.5) Vessels Sink (0.7) People sink 8-balls (0.5) (in pool/0.8) © 2014 IBM Corporation Evaluating Possibilities And Their Evidence In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus. Many candidate answers (CAs) are generated from many different searches Organelle Vacuole Each possibility is evaluated according to different dimensions of evidence. Cytoplasm Just One piece of evidence is if the CA is of the right type. In this case a “liquid”. Plasma Mitochondria Blood … “Cytoplasm is a fluid surrounding the nucleus…” Is(“Cytoplasm”, “liquid”) = 0.2↑ Is(“organelle”, “liquid”) = 0.1 Is(“vacuole”, “liquid”) = 0.2 Is(“plasma”, “liquid”) = 0.7 Wordnet Is_a(Fluid, Liquid) ? Learned Is_a(Fluid, Liquid) yes. © 2014 IBM Corporation What It Takes to compete against Top Human Jeopardy! Players Our Analysis Reveals the Winner’s Cloud Each dot – actual historical human Jeopardy! games Winning Human Performance In 2007, we committed to making a Huge Leap! Grand Champion Human Performance Computers? Not So Good. 16 2007 QA Computer System © 2014 IBM Corporation DeepQA: The Technology Behind Watson Massively Parallel Probabilistic Evidence-Based Architecture © 2014 IBM Corporation Deep QA: Incremental Progress in Precision and Confidence 6/2007-11/2010 Now Playing in the Winners Cloud 100% 90% 11/2010 80% 4/2010 Precision 70% 10/2009 5/2009 60% 12/2008 50% 8/2008 5/2008 40% 12/2007 30% 20% Baseline 10% 0% 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% % Answered © 2014 IBM Corporation One Jeopardy! question can take 2 hours on a single 2.6Ghz Core Optimized & Scaled out on 2880-Core IBM workload optimized POWER7 HPC using UIMA-AS, Watson answers in 2-6 seconds. Question 100s sources 1000’s of Pieces of Evidence 100s Possible Answers 100,000’s scores from many simultaneous Text Analysis Algorithms Multiple Interpretations Question & Topic Analysis Question Decomposition Hypothesis Generation Hypothesis Generation Hypothesis and Evidence Scoring Hypothesis and Evidence Scoring ... Synthesis Final Confidence Merging & Ranking Answer & Confidence © 2014 IBM Corporation Watson – a Workload Optimized System 90 x IBM Power 7501 servers 2880 POWER7 cores POWER7 3.55 GHz chip 500 GB per sec on-chip bandwidth 10 Gb Ethernet network 15 Terabytes of memory 20 Terabytes of disk, clustered Can operate at 80 Teraflops Runs IBM DeepQA software Scales out with and searches vast amounts of unstructured information with UIMA & Hadoop open source components Linux provides a scalable, open platform, optimized to exploit POWER7 performance 10 racks include servers, networking, shared disk system, cluster controllers 1 Note that the Power 750 featuring POWER7 is a commercially available server that runs AIX, IBM i and Linux and has been in market since Feb 2010 © 2014 IBM Corporation Watson’s real value proposition: Efficient decision support over unstructured (and structured) content Deeper Understanding, Higher Precision and Broader, Timely Coverage at lower costs Jeopardy! Challenge Shallow Understanding Low Precision Broad Coverage Unstructured Data Broad, rich in context Rapidly growing, current Invaluable yet under utilized Deeper Understanding but Brittle High Precision at High Cost Narrow Limited Coverage Structured Data Precise, explicit Narrow, expensive © 2014 IBM Corporation 21 Taking Watson beyond Jeopardy! Understanding Specific Questions Interacting Question-In/Answer-Out Explaining Precise Answers & Accurate Confidences Learning Batch Training Process The type of murmur associated with this condition is harsh, systolic, and increases in intensity with Valsalva From specific questions to rich, incomplete problem scenarios (e.g. EHR) Evidence analysis and look-ahead, drive interactive dialog to refine answers and evidence Move from quality answers to quality answers and evidence Answers, Corrections, Judgements Input, Responses Entire Medical Record Dialog Responses, Learning Questions Refined Answers, Follow-up Questions Rich Problem Scenarios Scale domain learning and adaptation rate and efficiency Interactive Dialog Teach Watson Comparative Evidence Profiles Continuous Training & Learning Process © 2014 IBM Corporation 22 Teach Watson technology generates questions to enhance understanding and acquire new knowledge – used in dialog or crowd-sourcing opportunities Watson considers… What would have to be true for seizure disorder to be correct? Patients with preexisting These disorders can stop the seizure disorder should not nerves and muscles in your use bupropion due to a esophagus from working higher-than-proportional right. This can cause food to increase in the possibility of move slowly or even get seizure as the dose is stuck in the esophagus. increased. Does contraindicates the use of bupropion mean should not use bupropion? Q Automatically generates Learning Questions… A What neurological condition contraindicates the use of bupropion? Does “contraindicates the use of” mean “should not use” in general? © 2014 IBM Corporation 23 Dialoguing to an answer Q: What diagnosis explains the patient’s condition? Present Factors Red, painful eye Blurred vision Family history of arthritis Behcet’s Disease Sarcoidosis Lyme Disease Medical Record Lyme Disease Sarcoidosis Behcet’s Disease Absent Factors Circular rash Fatigue Headache 63% 3% 1% 45% 32% 1% The first symptom of Lyme disease (also called Lyme disease can affect different Lyme’s disease) for about 50% of people is body a such as called the nervous small, redsystems, bull’s-eye rash, erythema joints, skin, and heart. Symptoms are migrans,system, at the site of an infected tick bite. … often described as happening three–stages Other early, acute Lyme symptoms are in flu-like (although not or everyone experiences all three): fatigue, achy muscles joints, fever, chills, stiff 1. A circular rash, typically within 1-2 weeks of neck, swollen glands, and a headache. infection, often is the first sign of infection. … Lyme disease is caused by the bacterium Borrelia 2. Along with the rash, a person may have fluburgdorferi and is transmitted to humans through the like symptoms such as swollen lymph bite of infected blacklegged ticks. Typical symptoms nodes, fatigue, headache, and muscle aches. include high temperature, headache, fatigue, and a characteristic skin rash called erythema migrans. © 2014 IBM Corporation 24 Navigating/Exploring the Information Space (with Watson) Hypothesis Reasoning Question Q Q H e Evidence H H e Q H Q Conclusion H e H e H e Q H e H e Q Q Report Q Q Q H Q H e H e Q H e Q Line of Inquiry H Q © 2014 IBM Corporation Analytics: the use of data and computation to make smart decisions Data Decision point Possible outcomes Data instances Historical Reports and queries on data aggregates Predictive models Option 2 Answers and confidence Simulated Feedback and learning Text Video, Images Audio © 2014 IBM Corporation Initial Watson Solutions Decisions Engagement Discovery Cases WellPoint Interactive Care Guide & Reviewer WellPoint Interactive Care & Insights for Oncology IBM Watson Engagement Advisor IBM Watson Discovery Advisor IBM Watson Case Advisor (Research) Health Plans & Providers Clinicians Customers Scientists Clinicians Streamline authorization of procedures Improve Diagnosis and Treatment Transform Customer Experiences Accelerate Research and Insights Summarize Cases and identify the relevant data © 2014 IBM Corporation IBM Watson Is Ushering In A New Era Of Computing System “Intelligence” Cognitive Programmatic Tabulation Punch cards Time card readers 1900 - ~1940 Search Deterministic Enterprise data Machine language Simple outputs ~1940 - 2010 Discovery Probabilistic Big Data Natural language Intelligent options 2011 © 2014 IBM Corporation THANK YOU © 2014 IBM Corporation
© Copyright 2026 Paperzz