
The next 20 minutes: an example of applying deep learning and why you should care.
Beuth Hochschule für Technik Berlin – www.datexis.com
Machine Reading is a Bonus Level for the Gaming Industry
Distributors: Understand Users, Optimize Game Production
Identify target groups with a stronger focus
Discover new trends first
Gamers: Personalize Environment
Find new games that match your preferences
Get helpful recommendations before you buy
Share your experience with the community
Developers/Publishers: More targeted Offers/Sales
Optimize game balance and flow based on user interaction
Engage users to identify with your game
Support: Lower Costs with Self-Services
Provide better help with intelligent CRM dialogue systems
Giana Sisters screenshots available under GFDL: http://www.gnu.org/copyleft/fdl.html
How Gamers Communicate
Large online communities contribute
Discussion Forums
Game Reviews
Walk-Throughs, Hints & Cheats
In-Game Communication
Steam: >125M active users contributing 400M pieces of user-generated content on about 4,500 games
http://www.anandtech.com/show/9003/valve-to-showcasesteamvr-hardware-steam-machines-more-at-gdc-2015/
Questions, Answers, Wishes, Trolls, Spam, …
DEMO: TASTY
http://dbl43.beuth-hochschule.de/demo/tasty/
What machines can read and when they fail
NATURAL LANGUAGE PROCESSING AND DEEP LEARNING
Shallow vs. Deep Learner
“Traditional” NLP investigated (among other things) which features are necessary to learn a (NER) model; a small sketch of such features follows below.
Local context: prefixes and suffixes, n-grams, surface features, etc.
Global features: windows of 200 words before or after the token
Context aggregation: how often a feature (such as nouns or digits) was observed in a context
Prediction history: did we already recognize a named entity earlier in the text?
Unlabeled text: which words are commonly observed in a large corpus?
Dictionaries and Gazetteers
Golem (Löser/Herta): Deep Learning: Maschinen, die wie Menschen lernen.
http://www.golem.de/news/deep-learning-maschinen-die-wie-menschen-lernen-1510-116468.html
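As an illustration, a minimal sketch in Python of such hand-crafted shallow features; the feature names and the one-word window are illustrative assumptions, not taken from a specific system.

def shallow_features(tokens, i):
    """Hand-crafted features for the token at position i, as a shallow NER learner might use them."""
    w = tokens[i]
    return {
        # local context: affixes and surface form
        "prefix_3": w[:3],
        "suffix_3": w[-3:],
        "is_capitalized": w[0].isupper(),
        "contains_digit": any(c.isdigit() for c in w),
        # neighbouring words as a small context window
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

# Example: features for "Giana" in "The Great Giana Sisters"
print(shallow_features(["The", "Great", "Giana", "Sisters"], 2))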
So does deep learning now capture all of these features automatically?
Specializing Deep Learning Networks
We avoid these pitfalls by stacking specialized neural networks; a sketch of the resulting per-word input encoding follows below.
Letter-trigram char-hashing “emulates” the human capability to form syllables
Surface features (SF): we lowercase all words but add surface features as a vector
Word2Vec embeddings capture paradigmatic local context
Bidirectional LSTMs read the sentence forwards and backwards
Label correction with a CRF
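A minimal sketch of how these specialized encodings could be stacked into one input vector per word; all dimensions below are illustrative assumptions, not figures from this talk.

import numpy as np

# Illustrative per-word encodings (dimensions are assumptions)
trigram_hash = np.zeros(15000)               # n-hot letter-trigram vector
surface_flags = np.array([0., 1., 0., 0.])   # e.g. the "first letter capitalized" flag is set
embedding = np.random.randn(300)             # pre-trained Word2Vec vector of the lowercased word

# The stacked network reads one concatenated vector x_t per word
x_t = np.concatenate([trigram_hash, surface_flags, embedding])
print(x_t.shape)  # (15304,)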
Word Encoding based on Context Distribution
Word embedding models, e.g. Word2Vec, are trained to encode the context of a given word.
The output is a vocabulary with a vector assigned to each item (e.g. of size 300)
Benefit: groups vectors of similar words in vector space, based on context
Mikolov et al. (2013). Efficient Estimation of Word Representations in Vector Space.
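A small sketch of training such embeddings with gensim's Word2Vec; the toy corpus and hyperparameters are placeholders, and the parameter name vector_size applies to gensim 4.x (older releases call it size).

from gensim.models import Word2Vec

# Tiny placeholder corpus of tokenized sentences (a real corpus has millions of sentences)
sentences = [
    ["giana", "sisters", "is", "a", "platform", "game"],
    ["super", "mario", "is", "a", "platform", "game"],
]

# Skip-gram model, 300-dimensional vectors, 5-word context window
model = Word2Vec(sentences, vector_size=300, window=5, min_count=1, sg=1)

vec = model.wv["game"]                  # the 300-dimensional vector assigned to "game"
print(model.wv.most_similar("giana"))   # neighbours in vector space, grouped by shared context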
Overcome Spelling Errors with Letter-Trigram Word Hashing
Word2Vec returns no results on unseen words. We investigate a robust encoding technique:
Tokenize a word into letter-trigrams
Generate n-hot hashing vector from the trigrams
Benefits:
Can encode unseen idiosyncratic words
Robust against spelling errors e.g. „Gianna“
Smaller word vector size (~15K instead of 250K)
What about surface forms, e.g. capitalization?
Don’t ignore surface form features!
But keep it simple: encode capitalization patterns as flags in the vector,
e.g. all uppercase, first letter capitalized, mixed capitalization, punctuation, ...
Huang et al. (2013): Learning Deep Structured Semantic Models for Web Search using Clickthrough Data.
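A minimal sketch of letter-trigram hashing plus capitalization flags; the '#' boundary marker and the fixed trigram index are assumptions for illustration.

import numpy as np

def letter_trigrams(word):
    """Split a word into letter-trigrams with boundary markers, e.g. "Giana" -> "#gi", "gia", "ian", "ana", "na#"."""
    padded = "#" + word.lower() + "#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def trigram_hash_vector(word, trigram_index):
    """n-hot vector over a fixed trigram vocabulary (~15K entries instead of a 250K word vocabulary)."""
    vec = np.zeros(len(trigram_index))
    for tg in letter_trigrams(word):
        if tg in trigram_index:      # unseen trigrams are simply skipped
            vec[trigram_index[tg]] = 1.0
    return vec

def surface_flags(word):
    """Capitalization pattern flags, kept separate from the lowercased trigram encoding."""
    return np.array([
        float(word.isupper()),                              # all uppercase
        float(word[:1].isupper() and not word.isupper()),   # first letter capitalized
        float(any(c.isupper() for c in word[1:])),          # mixed capitalization
        float(any(not c.isalnum() for c in word)),          # contains punctuation
    ])

# The misspelling "Gianna" still activates most trigrams of "Giana"
index = {tg: i for i, tg in enumerate(["#gi", "gia", "ian", "ana", "na#", "ann", "nna"])}
print(trigram_hash_vector("Gianna", index))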
Applications on News and Medical Domains
We conduct Named Entity Recognition for news and medical texts.
Models trained with only 4,000 gold-labeled sentences
Best results (91.1% F1 on CoNLL-2003) with stacked bidirectional LSTMs + letter-trigram encoding + embeddings
Strong generalization: can easily be retrained with annotated data from other domains
Where should it go?
Human brains work differently
Specialized areas plus large general-purpose areas
“Data generation” from examples, ethics,
interaction with other brains
Multimodal (not only text or relational data)
…
http://www.publicdomainpictures.net/view-image.php?image=130733&picture=stick-kids-border
Golem (Löser/Herta): Deep Learning: Maschinen, die wie Menschen lernen.
http://www.golem.de/news/deep-learning-maschinen-die-wie-menschen-lernen-1510-116468.html
So why should I care?
Selected Publications and Buzzwords
Deep Learning & Text Mining
Sebastian Arnold, Robert Dziuba, Alexander Löser: TASTY: Interactive Entity Linking As-You-Type. To appear in COLING 2016,
Osaka, Japan. 12/2016.
Sebastian Arnold, Felix A. Gers, Torsten Kilias, Alexander Löser: Robust Named Entity Recognition in Idiosyncratic Domains. CoRR
abs/1608.06757 (2016)
Stanford Course: CS224d: Deep Learning for Natural Language Processing, Richard Socher et al., http://cs224d.stanford.edu
Andrej Karpathy’s blog http://karpathy.github.io/
100 Years AI Stanford Report 2016
Peter Stone, Rodney Brooks, Erik Brynjolfsson, Ryan Calo, Oren Etzioni, Greg Hager, Julia Hirschberg, Shivaram Kalyanakrishnan, Ece
Kamar, Sarit Kraus, Kevin Leyton-Brown, David Parkes, William Press, AnnaLee Saxenian, Julie Shah, Milind Tambe, and Astro
Teller. "Artificial Intelligence and Life in 2030." One Hundred Year Study on Artificial Intelligence: Report of the 2015-2016 Study Panel,
Stanford University, Stanford, CA, September 2016. Doc: http://ai100.stanford.edu/2016-report. Accessed: September 6, 2016.
Machine Intelligence, Shivon Zilis, 12/2015 https://www.oreilly.com/ideas/the-current-state-of-machine-intelligence-2-0
Frameworks: KERAS, NEON (now with Intel), Tensorflow, Torch7, Caffe, DeepLearning4J, OpenAI Gym (Benchmarks)
Copyright
Giana Sisters Screenshots available under GFDL http://www.gnu.org/copyleft/fdl.html
Word Sequence Labeling with Stacked Bidirectional LSTMs
We train the LSTM to recognize entities in sentences.
Input sentence as a sequence of words
w = (w0, w1, ... , wn)
Encode words as vectors
xt = encode(wt)
LSTM: stacked layers for word and label context, reading the sentence forwards and backwards
Output Begin, Inside, Outside labels:
yt = softmax(Wht + bt)
Training: Fit weights of LSTM layers with stochastic
gradient descent and backpropagation through time.
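A sketch of such a stacked bidirectional LSTM tagger in Keras; the layer sizes, optimizer, and input dimension are illustrative assumptions, not the exact configuration of the system described here.

import tensorflow as tf
from tensorflow.keras import layers

input_dim = 15304   # assumed size of the concatenated word encoding x_t
num_labels = 3      # Begin, Inside, Outside

# A sentence arrives as a variable-length sequence of word vectors x_t
inputs = tf.keras.Input(shape=(None, input_dim))
# Stacked bidirectional layers: word context, then label context, reading forwards and backwards
h = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(inputs)
h = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(h)
# yt = softmax(W ht + bt): one B/I/O distribution per word
outputs = layers.TimeDistributed(layers.Dense(num_labels, activation="softmax"))(h)

model = tf.keras.Model(inputs, outputs)
# Stochastic gradient descent; backpropagation through time is handled by the framework
model.compile(optimizer="sgd", loss="categorical_crossentropy")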
Contextual Learning with Long Short-Term Memory (LSTM)
LSTMs are neural networks with recurrent connections.
Cells can “remember” information across time steps
Gates control the ability to “forget”
Lipton and Berkowitz (2015): A Critical Review of Recurrent Neural Networks for Sequence Learning.
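For reference, the standard LSTM cell update with input, forget, and output gates, in the formulation used in such reviews (σ is the logistic sigmoid, ⊙ the element-wise product):

$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$
$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$
$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$
$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$
$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$
$h_t = o_t \odot \tanh(c_t)$

The forget gate f_t decides how much of the previous cell state c_{t-1} is kept (“remembered”), while the input gate i_t controls how much new information enters the cell.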