The next 20 minutes: an example of applying deep learning and why you should care.
Beuth Hochschule für Technik Berlin – www.datexis.com

Machine Reading is a Bonus Level for the Gaming Industry
- Distributors: understand users, optimize game production
  - Identify target groups with a stronger focus
  - Discover new trends first
- Gamers: personalize the environment
  - Find new games that match your preferences
  - Get helpful recommendations before you buy
  - Share your experience with the community
- Developers/Publishers: more targeted offers and sales
  - Optimize game balance and flow based on user interaction
  - Engage users to identify with your game
- Support: lower costs with self-services
  - Provide better help with intelligent CRM dialogue systems
(Giana Sisters screenshots available under the GFDL, http://www.gnu.org/copyleft/fdl.html)

How Gamers Communicate
- Large online communities contribute: discussion forums, game reviews, walk-throughs, hints & cheats, in-game communication
- Steam: >125M active users contributing 400M pieces of user-generated content about roughly 4,500 games
  (http://www.anandtech.com/show/9003/valve-to-showcasesteamvr-hardware-steam-machines-more-at-gdc-2015/)
- Questions, answers, wishes, trolls, spam, …

Demo: TASTY – http://dbl43.beuth-hochschule.de/demo/tasty/

What machines can read and when they fail
NATURAL LANGUAGE PROCESSING AND DEEP LEARNING

Shallow vs. Deep Learner
"Traditional" NLP investigated (not only) which features are necessary to learn a (NER) model:
- Local context: pre- or suffixes, n-grams, surface features, etc.
- Global features: windows of 200 words before or after
- Context aggregation: how often a feature (such as nouns or digits) is observed in a context
- Prediction history: did we see a named entity before?
- Unlabeled text: which words are commonly observed in a large corpus?
- Dictionaries and gazetteers
Golem (Löser/Herta): Deep Learning: Maschinen, die wie Menschen lernen. http://www.golem.de/news/deep-learning-maschinen-die-wie-menschen-lernen-1510-116468.html

So is deep learning somehow capturing all these features now automatically?

Specializing Deep Learning Networks
We avoid pitfalls by stacking specialized neural networks:
- Tri-gram character hashing "emulates" the human capability to form syllables
- Surface features (SF): we lowercase all words but add surface features as a vector
- Word2Vec embeddings capture paradigmatic local context
- Bidirectional LSTMs read the sentence forwards and backwards
- Correcting labels with a CRF

Word Encoding based on Context Distribution
- Word embedding models, e.g. Word2Vec, are trained to encode the context of a given word.
- The output is a vocabulary with a vector assigned to each item (e.g. of size 300).
- Benefit: vectors of similar words are grouped in vector space, based on their contexts.
Mikolov et al. (2013): Efficient Estimation of Word Representations in Vector Space.
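As a minimal illustration of the embedding step, the sketch below trains Word2Vec on a toy corpus of tokenized game-forum sentences using gensim (the library, corpus, and hyperparameters are illustrative assumptions; the slides do not prescribe a specific implementation):

```python
# Minimal Word2Vec sketch with gensim 4.x on a toy, made-up corpus.
from gensim.models import Word2Vec

sentences = [
    ["giana", "jumps", "over", "the", "owl"],
    ["mario", "jumps", "over", "the", "turtle"],
    ["the", "boss", "fight", "is", "too", "hard"],
]

# vector_size=300 matches the example dimensionality mentioned above;
# sg=1 selects the skip-gram variant. All values are illustrative.
model = Word2Vec(sentences, vector_size=300, window=5, min_count=1, sg=1, epochs=50)

print(model.wv["giana"].shape)          # one 300-dimensional vector per vocabulary item
print(model.wv.most_similar("giana"))   # words with similar contexts end up nearby
```

On a real corpus, words that appear in similar contexts ("giana", "mario") receive nearby vectors, which is exactly the grouping property described above.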
Overcome Spelling Errors with Letter-Trigram Word Hashing
- Word2Vec returns no result for unseen words.
- We investigate a robust encoding technique:
  - Tokenize a word into letter trigrams
  - Generate an n-hot hashing vector from the trigrams
- Benefits:
  - Can encode unseen, idiosyncratic words
  - Robust against spelling errors, e.g. "Gianna"
  - Smaller word vector size (~15K instead of 250K)
- What about surface forms, e.g. capitalization? Don't ignore surface form features! But keep it simple: encode capitalization patterns as flags in the vector, e.g. all uppercase, first letter capitalized, mixed capitalization, punctuation, … (see the sketch below)
Huang et al. (2013): Learning Deep Structured Semantic Models for Web Search using Clickthrough Data.
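The following sketch shows letter-trigram word hashing in the spirit of Huang et al. (2013), combined with a few surface-form flags. The hash-space size, the '#' boundary marker, and the particular flags are assumptions for illustration, not the exact DATEXIS configuration:

```python
# Letter-trigram "word hashing" plus simple surface-form flags (illustrative sketch).
from zlib import crc32
import numpy as np

TRIGRAM_DIM = 15000  # assumed size of the trigram hash space (~15K as mentioned above)

def letter_trigrams(word: str):
    """Split a lowercased word into overlapping letter trigrams, e.g. '#gi', 'gia', ..."""
    padded = f"#{word.lower()}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def encode_word(word: str) -> np.ndarray:
    """Return an n-hot trigram vector concatenated with capitalization/punctuation flags."""
    vec = np.zeros(TRIGRAM_DIM, dtype=np.float32)
    for tri in letter_trigrams(word):
        # crc32 gives a deterministic hash; collisions are simply tolerated (n-hot).
        vec[crc32(tri.encode("utf-8")) % TRIGRAM_DIM] = 1.0
    surface = np.array([
        float(word.isupper()),                 # all uppercase
        float(word[:1].isupper()),             # first letter capitalized
        float(any(c.isdigit() for c in word)), # contains digits
        float(not word.isalnum()),             # contains punctuation
    ], dtype=np.float32)
    return np.concatenate([vec, surface])

# The misspelled "Gianna" still shares most trigrams with "Giana":
print(set(letter_trigrams("Giana")) & set(letter_trigrams("Gianna")))
```

Because the misspelled word shares most trigrams with the correct one, the two encodings stay close even though the misspelling never occurred in training data.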
Applications in News and Medical Domains
- We conduct named entity recognition on news and medical texts.
- Models are trained with only 4,000 gold-labeled sentences.
- Best results (91.1% F1 on CoNLL-2003) with stacked bidirectional LSTMs + letter-trigram encoding + embeddings.
- Strong generalization: can easily be retrained with annotated data from other domains.

Where should it go? Human brains work differently:
- Special-purpose plus large general-purpose areas
- "Data generation" from examples, ethics, interaction with other brains
- Multimodal (not only text or relational data)
- …
(Image: http://www.publicdomainpictures.net/view-image.php?image=130733&picture=stick-kids-border)
Golem (Löser/Herta): Deep Learning: Maschinen, die wie Menschen lernen. http://www.golem.de/news/deep-learning-maschinen-die-wie-menschen-lernen-1510-116468.html

So why should I care?

Selected Publications and Buzzwords
Deep learning & text mining:
- Sebastian Arnold, Robert Dziuba, Alexander Löser: TASTY: Interactive Entity Linking As-You-Type. To appear in COLING 2016, Osaka, Japan, 12/2016.
- Sebastian Arnold, Felix A. Gers, Torsten Kilias, Alexander Löser: Robust Named Entity Recognition in Idiosyncratic Domains. CoRR abs/1608.06757 (2016).
- Stanford course CS224d: Deep Learning for Natural Language Processing, Richard Socher et al., http://cs224d.stanford.edu
- Andrej Karpathy's blog: http://karpathy.github.io/
One Hundred Year Study on AI, Stanford Report 2016:
- Peter Stone, Rodney Brooks, Erik Brynjolfsson, Ryan Calo, Oren Etzioni, Greg Hager, Julia Hirschberg, Shivaram Kalyanakrishnan, Ece Kamar, Sarit Kraus, Kevin Leyton-Brown, David Parkes, William Press, AnnaLee Saxenian, Julie Shah, Milind Tambe, and Astro Teller: "Artificial Intelligence and Life in 2030." One Hundred Year Study on Artificial Intelligence: Report of the 2015–2016 Study Panel, Stanford University, Stanford, CA, September 2016. http://ai100.stanford.edu/2016-report (accessed September 6, 2016).
Machine intelligence:
- Shivon Zilis, 12/2015, https://www.oreilly.com/ideas/the-current-state-of-machine-intelligence-2-0
Frameworks:
- Keras, Neon (now with Intel), TensorFlow, Torch7, Caffe, Deeplearning4j, OpenAI Gym (benchmarks)

Copyright
Giana Sisters screenshots available under the GFDL, http://www.gnu.org/copyleft/fdl.html

Word Sequence Labeling with Stacked Bidirectional LSTMs
We train the LSTM to recognize entities in sentences:
- Input sentence as a sequence of words w = (w0, w1, …, wn)
- Encode words as vectors: xt = encode(wt)
- LSTM: stacked layers for word and label context, reading the sentence forwards and backwards
- Output Begin/Inside/Outside (BIO) labels: yt = softmax(W ht + bt)
- Training: fit the weights of the LSTM layers with stochastic gradient descent and backpropagation through time (see the Keras sketch at the end).

Contextual Learning with Long Short-Term Memory (LSTM)
- LSTMs are neural networks with recurrent connections.
- Cells can "remember" time steps.
- Gates control the ability to "forget".
Lipton and Berkowitz (2015): A Critical Review of Recurrent Neural Networks for Sequence Learning.
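To make "remember" and "forget" concrete, these are the standard LSTM cell equations in common textbook notation (as reviewed by Lipton and Berkowitz, 2015; the parameter names are a convention, not taken from the slides). The forget gate f_t scales how much of the previous cell state is kept, while the input gate i_t decides how much of the new candidate state is written:

```latex
\[
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \quad \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \quad \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \quad \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \quad \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \quad \text{(memory: keep vs. overwrite)} \\
h_t &= o_t \odot \tanh(c_t) \quad \text{(hidden output)}
\end{aligned}
\]
```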
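Finally, a minimal Keras sketch (Keras is listed among the frameworks above) of a stacked bidirectional LSTM tagger over pre-encoded word vectors. All sizes are illustrative assumptions, the inputs are random placeholders for encode(wt), and the CRF label-correction step from the architecture slide is omitted:

```python
# Stacked bidirectional LSTM for BIO sequence labeling (illustrative sketch).
import numpy as np
from tensorflow.keras import layers, models

MAX_LEN = 50          # sentence length after padding (assumption)
INPUT_DIM = 15304     # e.g. trigram hash + surface flags + embedding dims (assumption)
NUM_LABELS = 3        # Begin, Inside, Outside

# Each time step receives the concatenated word encoding x_t = encode(w_t).
inputs = layers.Input(shape=(MAX_LEN, INPUT_DIM))
# Two stacked bidirectional LSTM layers read the sentence forwards and backwards.
h = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(inputs)
h = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(h)
# Per-token BIO prediction: y_t = softmax(W h_t + b)
outputs = layers.TimeDistributed(layers.Dense(NUM_LABELS, activation="softmax"))(h)

model = models.Model(inputs, outputs)
# Stochastic gradient descent; Keras handles backpropagation through time internally.
model.compile(optimizer="sgd", loss="categorical_crossentropy")
model.summary()

# Toy training call with random data, only to show the expected tensor shapes.
X = np.random.rand(8, MAX_LEN, INPUT_DIM).astype("float32")
Y = np.eye(NUM_LABELS)[np.random.randint(NUM_LABELS, size=(8, MAX_LEN))]
model.fit(X, Y, epochs=1, batch_size=4)
```

In the architecture described above, a CRF layer would replace the plain softmax output to correct label sequences (e.g. forbidding an Inside label without a preceding Begin).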