185.A83 Machine Learning for Health Informatics
2016S, VU, 2.0 h, 3.0 ECTS
Week 18 ‐ 04.05.2016 17:00 ‐ 20:00
Human Learning vs. Machine Learning:
Decision Making under Uncertainty
and Reinforcement Learning
a.holzinger@hci‐kdd.org
http://hci‐kdd.org/machine‐learning‐for‐health‐informatics‐course
ML needs a concerted effort fostering integrated research
http://hci‐kdd.org/international‐expert‐network
Holzinger, A. 2014. Trends in Interactive Knowledge Discovery for Personalized Medicine: Cognitive Science meets Machine Learning. IEEE Intelligent Informatics Bulletin, 15, (1), 6‐14.
Standard Textbooks for RL
 Sutton, R. S. & Barto, A. G. 1998. Reinforcement learning: An introduction, Cambridge, MIT press
http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html
 Powell, W. B. 2007. Approximate Dynamic Programming: Solving the curses of dimensionality, John Wiley & Sons
http://adp.princeton.edu/
 Szepesvári, C. 2010. Algorithms for reinforcement learning. Synthesis lectures on artificial intelligence and machine learning, edited by R.J. Brachman and T. G. Dietterich, Morgan & Claypool.
http://www.ualberta.ca/~szepesva/RLBook.html
Red thread through this lecture
1) What is RL? Why is it interesting?
2) Decision Making under uncertainty
3) Roots of RL
4) Cognitive Science of RL
5) The Anatomy of an RL agent
6) Example: Multi‐Armed Bandits
7) RL‐Applications in health
8) Future Outlook
Quiz (Supervised S, Unsupervised U, Reinforcement R)
1) Given (x_i, y_i) pairs; find an f that maps a new x to a proper y (S/U/R?)
2) Finding similar points in high‐dim (S/U/R)?
3) Learning from interaction to achieve a goal (S/U/R)?
4) Human expert provides examples (S/U/R)?
5) Automatic learning by interaction with environment (S/U/R)?
6) The agent gets a scalar reward from the environment (S/U/R)?
Solution: see the top of the "Three main types of Machine Learning" slide below.
1) What is RL?
Why is it interesting?
"I want to understand intelligence and how minds work. My tools are computer science, statistics, mathematics, and plenty of thinking." Nando de Freitas, Univ. Oxford and Google.
Remember: Three main types of Machine Learning
 I) Supervised learning (classification)
1-S; 2-U; 3-R; 4-S; 5-R; 6-R (solution to the quiz above)
 Given (x_i, y_i) pairs, find an f that maps a new x to a proper y
 Regression, logistic regression, classification
Expert provides examples e.g. classification of clinical images
Disadvantage: Supervision can be expensive
 II) Unsupervised learning (clustering)
 Given x (features only), find an f that gives you a description of x
 Find similar points in a high-dimensional space
 E.g. clustering of medical images based on their content
Disadvantage: Not necessarily task relevant
 III) Reinforcement learning
 More general than supervised/unsupervised learning
 Learn from interaction to achieve a goal
 Learning by direct interaction with the environment (automatic ML)
 Disadvantage: broad, difficult approach; problems with high-dimensional data
Why is RL interesting?
 Reinforcement learning is the oldest approach, with the longest history, and can provide insight into understanding human learning [1]
 RL is the "AI problem in the microcosm" [2]
 Future opportunities are in Multi‐Agent RL (MARL), Multi‐Task Learning (MTL), Generalization and Transfer‐Learning [3], [4].
[1] Turing, A. M. 1950. Computing machinery and intelligence. Mind, 59, (236), 433‐460.
[2] Littman, M. L. 2015. Reinforcement learning improves behaviour from evaluative feedback. Nature, 521, (7553), 445-451, doi:10.1038/nature14540.
[3] Taylor, M. E. & Stone, P. 2009. Transfer learning for reinforcement learning domains: A survey. The Journal of Machine Learning Research, 10, 1633-1685.
[4] Pan, S. J. & Yang, Q. A. 2010. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, 22, (10), 1345‐1359, doi:10.1109/tkde.2009.191.
RL is key for ML according to Demis Hassabis
https://www.youtube.com/watch?v=XAbLn66iHcQ&index=14&list=PL2ovtN0KdWZiomydY2yWhh9‐QOn0GvrCR
Go to time 1:33:00
A very recent approach is combining RL with DL
 Combination of deep neural networks with reinforcement learning = Deep Reinforcement Learning
 Weakness of classical RL is that it is not good with high-dimensional sensory inputs
 Advantage of DRL: Learn to act from high‐dimensional sensory inputs
Volodymyr Mnih et al (2015), https://sites.google.com/a/deepmind.com/dqn/
https://www.youtube.com/watch?v=iqXKQf2BOSE
Learning to play an Atari Game
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S. & Hassabis, D. 2015. Human‐level control through deep reinforcement learning. Nature, 518, (7540), 529‐533, doi:10.1038/nature14236
Example Video Atari Game
Scientists in this area – selection, incomplete! Status as of 03.04.2016
Goal: Select actions to maximize total future reward
Image credit to David Silver, UCL
Standard RL-Agent Model goes back to Cybernetics 1950
Kaelbling, L. P., Littman, M. L. & Moore, A. W. 1996. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237‐285.
RL-Agent seeks to maximize rewards
Intelligent behavior arises from the actions of an individual seeking to maximize its received reward signals in a complex and changing world
Sutton, R. S. & Barto, A. G. 1998. Reinforcement learning: An introduction, Cambridge MIT press
RL – Types of Feedback (crucial!)
 Supervised: learner is told the best action
 Exhaustive: learner is shown every possible state
 One-shot: the current decision is independent of past decisions
Littman, M. L. 2015. Reinforcement learning improves behaviour from evaluative feedback. Nature, 521, (7553), 445-451.
Problem Formulation in an MDP
 Markov decision processes specify the setting and the task
 Planning methods use knowledge of the transition function T and the reward function R to compute a good policy
 The Markov decision process model captures both sequential feedback and the more specific one-shot feedback (when s′ is independent of both s and a)
Littman, M. L. 2015. Reinforcement learning improves behaviour from evaluative feedback. Nature, 521, (7553), 445‐451.
Agent observes the environmental state at each step t
At each step t, the agent:
 1) Observes state S_t
 2) Executes action A_t
 3) Receives reward R_t
 Notation: S_t^a = agent state; S_t^e = environment state; S_t = information state
 Markov decision process (MDP)
Image credit to David Silver, UCL
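This observe/execute/receive loop can be written down directly; a minimal Python sketch, where the Environment class and its reset/step methods are illustrative assumptions, not part of the slides:

class Environment:
    """Hypothetical stand-in for the environment described above."""
    def reset(self):
        # return the initial observation O_1
        return 0
    def step(self, action):
        # apply action A_t; return (next observation, scalar reward, episode done?)
        return 0, 1.0, True

def run_episode(env, policy):
    """One pass of the observe / execute / receive-reward loop."""
    obs = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(obs)                  # agent observes O_t and executes A_t
        obs, reward, done = env.step(action)  # environment returns O_{t+1}, R_{t+1}
        total_reward += reward                # agent receives a scalar reward
    return total_reward

# e.g. run_episode(Environment(), policy=lambda obs: 0)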
Environmental State is the current representation
 i.e. whatever data the environment uses to pick the next observation/reward
 The environment state S_t^e is not usually visible to the agent
 Even if S_t^e is visible, it may contain irrelevant information
 A state S_t is Markov if and only if: P[S_{t+1} | S_t] = P[S_{t+1} | S_1, …, S_t]
Agent State is the agent's internal representation
 i.e. whatever information the agent uses to pick the next action
 it is the information used by reinforcement learning algorithms
 It can be any function of the history: S_t^a = f(H_t)
Components of RL Agents and Policy of Agents
 RL agent components:
 Policy: agent's behaviour function
 Value function: how good is each state and/or action
 Model: agent's representation of the environment
 Policy as the agent's behaviour
 A policy is a map from state to action, e.g.:
 Deterministic policy: a = π(s)
 Stochastic policy: π(a|s) = P[A_t = a | S_t = s]
 Value function: a prediction of future reward
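A minimal Python illustration of these components; the toy states, actions, and numbers are assumptions made up for the example:

import random

# Deterministic policy a = pi(s): a plain lookup table
pi_det = {"s1": "left", "s2": "right"}

# Stochastic policy pi(a|s) = P[A_t = a | S_t = s]: one distribution per state
pi_stoch = {"s1": {"left": 0.8, "right": 0.2},
            "s2": {"left": 0.1, "right": 0.9}}

def sample_action(pi, s):
    """Draw an action from the stochastic policy for state s."""
    actions = list(pi[s])
    weights = [pi[s][a] for a in actions]
    return random.choices(actions, weights=weights)[0]

# Value function: predicted future (discounted) reward of each state
V = {"s1": 0.7, "s2": -0.1}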
What if the environment is only partially observable?
 Partial observability: the agent only indirectly observes the environment (e.g. a robot that is not aware of its current location; a good example is poker, where only the public cards are observable to the agent):
 Formally this is a partially observable Markov decision process (POMDP):
 The agent must construct its own state representation S_t^a, for example from the full history or from a belief over the environment state (a minimal sketch follows):
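One standard construction is a belief state updated by Bayes' rule. Below is a minimal sketch for a hidden two-state environment; the transition and observation probabilities are illustrative assumptions, and conditioning on the chosen action is omitted for brevity:

def belief_update(belief, obs, T, O):
    """POMDP belief update: b'(s') is proportional to O[s'][obs] * sum_s T[s][s'] * b(s)."""
    n = len(belief)
    unnorm = [O[s2][obs] * sum(T[s][s2] * belief[s] for s in range(n))
              for s2 in range(n)]
    z = sum(unnorm)
    return [p / z for p in unnorm]

# Toy example: 2 hidden states, 2 possible observations
T = [[0.9, 0.1], [0.2, 0.8]]   # T[s][s']: state transition probabilities
O = [[0.7, 0.3], [0.4, 0.6]]   # O[s'][o]: observation probabilities
b = belief_update([0.5, 0.5], obs=0, T=T, O=O)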
RL is multi-disciplinary and a bridge within ML
[Figure: reinforcement learning at the intersection of computer science, cognitive science, economics, and mathematics, all concerned with decision making under uncertainty]
2) Decision Making under uncertainty
Decision Making is central in Health Informatics
Source: Cisco (2008). Cisco Health Presence Trial at Aberdeen Royal Infirmary in Scotland
Reasoning Foundations of Medical Diagnosis
Clinical Medicine is Decision Making!
Hersh, W. (2010) Information Retrieval: A Health and Biomedical Perspective. New York, Springer.
Human Decision Making
Wickens, C. D. (1984) Engineering psychology and human performance. Columbus (OH), Charles Merrill.
Medical Action … is permanent decision making
History of DSS is a history of artificial intelligence
E. Feigenbaum, J. Lederberg, B. Buchanan, E. Shortliffe
Rheingold, H. (1985) Tools for thought: the history and future of mind‐expanding technology. New York, Simon & Schuster.
Buchanan, B. G. & Feigenbaum, E. A. (1978) DENDRAL and META-DENDRAL: their applications dimension. Artificial Intelligence, 11, 5-24.
3) Roots of RL
Pre-Historical Issues of RL
Ivan P. Pavlov (1849-1936), 1904 Nobel Prize in Physiology/Medicine
Edward L. Thorndike (1874-1949), 1911 Law of Effect
Burrhus F. Skinner (1904-1990), 1938 Operant Conditioning
Classical Experiment with Pavlov's Dog
Back to the rats … roots
[Cartoon: rat holding a sign "WILL PRESS LEVER FOR FOOD"]
 What if agent state = last 3 items in sequence?
 What if agent state = counts for lights, bells and levers?
 What if agent state = complete sequence?
Historical Issues of RL in Computer Science
https://webdocs.cs.ualberta.ca/~sutton/book/the‐book.html
Turing, A. M. 1950. Computing machinery and intelligence. Mind, 59, (236), 433‐460.
Richard Bellman 1961. Adaptive control processes: a guided tour. Princeton University Press.
Watkins, C. J. & Dayan, P. 1992. Q-learning. Machine Learning, 8, (3-4), 279-292.
Sutton, R. S. & Barto, A. G. 1998. Reinforcement learning: An introduction, Cambridge, MIT Press.
Littman, M. L. 2015. Reinforcement learning improves behaviour from evaluative feedback. Nature, 521, (7553), 445-451.
Excellent Review Paper:
Kaelbling, L. P., Littman, M. L. & Moore, A. W. 1996. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237‐285
This is still state-of-the-art in 2015
Chase, H. W., Kumar, P., Eickhoff, S. B. & Dombrovski, A. Y. 2015. Reinforcement learning models and their neural correlates: An activation likelihood estimation meta‐analysis. Cognitive, Affective & Behavioral Neuroscience, 15, (2), 435‐459, doi:10.3758/s13415‐015‐0338‐7.
2015 – the year of reinforcement learning
Deep Q‐networks (Q‐Learning is a model‐free RL approach) have successfully played Atari 2600 games at expert human levels
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S. & Hassabis, D. 2015. Human‐level control through deep reinforcement learning. Nature, 518, (7540), 529‐533, doi:10.1038/nature14236
Typical Reinforcement Learning Applications: aML
http://images.computerhistory.org/timeline/timeline_ai.robotics_1939_elektro.jpg
http://cyberneticzoo.com/robot-time-line/
Autonomous Robots
http://www.neurotechnology.com/res/Robot2.jpg
Is your home a complex domain?
https://royalsociety.org/events/2015/05/breakthrough‐science‐technologies‐machine‐learning
Kober, J., Bagnell, J. A. & Peters, J. 2013. Reinforcement Learning in Robotics: A Survey. The International Journal of Robotics Research.
Will this approach work here as well?
Nogrady, B. 2015. Q&A: Declan Murphy. Nature, 528, (7582), S132‐S133, doi:10.1038/528S132a.
4) Cognitive Science of RL
Human Information Processing
How does our mind get so much out of it …
Salakhutdinov, R., Tenenbaum, J. & Torralba, A. 2012. One‐shot learning with a hierarchical nonparametric Bayesian model. Journal of Machine Learning Research, 27, 195‐207.
Learning words for objects – concepts from examples
Quaxl
Quaxl
Quaxl
Salakhutdinov, R., Tenenbaum, J. & Torralba, A. 2012. One‐shot learning with a hierarchical nonparametric Bayesian model. Journal of Machine Learning Research, 27, 195‐207.
How do we understand our world …
Tenenbaum, J. B., Kemp, C., Griffiths, T. L. & Goodman, N. D. 2011. How to grow a mind: Statistics, structure, and abstraction. Science, 331, (6022), 1279‐1285.
One of the unsolved problems in human concept learning
 … which is highly relevant for ML research, concerns the factors that determine the subjective difficulty of concepts:
 Why are some concepts psychologically extremely simple and easy to learn,
 while others seem to be extremely difficult, complex, or even incoherent?
 These questions have been studied since the 1960s but are still unanswered …
Feldman, J. 2000. Minimization of Boolean complexity in human concept learning. Nature, 407, (6804), 630-633, doi:10.1038/35036586.
A few certainties
 Cognition as probabilistic inference
 Visual perception, language acquisition, motor learning, associative learning, memory, attention, categorization, reasoning, causal inference, decision making, theory of mind
 Learning concepts from examples
 Learning and applying intuitive theories
(balancing complexity vs. fit)
Modeling basic cognitive capacities as intuitive Bayes
Similarity
Representativeness and evidential support
Causal judgement
Coincidences and causal discovery
Diagnostic inference
Predicting the future
Tenenbaum, J. B., Griffiths, T. L. & Kemp, C. 2006. Theory‐based Bayesian models of inductive learning and reasoning. Trends in cognitive sciences, 10, (7), 309‐318.
Drawn by Human or Machine Learning Algorithm?
Human-Level concept learning – probabilistic induction
A Bayesian program learning (BPL) framework, capable of learning a large class of visual concepts from just a single example and generalizing in ways that are mostly indistinguishable from people
Lake, B. M., Salakhutdinov, R. & Tenenbaum, J. B. 2015. Human-level concept learning through probabilistic program induction. Science, 350, (6266), 1332-1338, doi:10.1126/science.aab3050.
How does our mind get so much out of so little?
Human Information Processing Model (A&S)
Atkinson, R. C. & Shiffrin, R. M. (1971) The control processes of short‐term memory (Technical Report 173, April 19, 1971). Stanford, Institute for Mathematical Studies in the Social Sciences, Stanford University.
General Model of Human Information Processing
[Figure: stages of human information processing, from physics via perception and cognition to motorics]
Wickens, C., Lee, J., Liu, Y. & Gordon‐Becker, S. (2004) Introduction to Human Factors Engineering: Second Edition. Upper Saddle River (NJ), Prentice‐Hall.
Alternative Model: Baddeley - Central Executive
Quinette, P., Guillery, B., Desgranges, B., de la Sayette, V., Viader, F. & Eustache, F. (2003) Working memory and executive functions in transient global amnesia. Brain, 126, 9, 1917‐1934.
Neural Basis for the "Central Executive System"
D'Esposito, M., Detre, J. A., Alsop, D. C., Shin, R. K., Atlas, S. & Grossman, M. (1995) The neural basis of the central executive system of working memory. Nature, 378, 6554, 279-281.
Central Executive – Selective Attention
Cowan, N. (1988) Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information‐processing system. Psychological Bulletin, 104, 2, 163.
Selective Attention Test
Note: The test does NOT work properly if you know it in advance or if you do not concentrate on counting
Simons, D. J. & Chabris, C. F. 1999. Gorillas in our midst: sustained inattentional blindness for dynamic events. Perception, 28, (9), 1059‐1074.
Human Attention is central for decision making
Wickens, C. D. (1984) Engineering psychology and human performance. Columbus (OH), Charles Merrill.
5) The Anatomy of an RL Agent
Why is this relevant for health informatics?
 Decision‐making under uncertainty
 Limited knowledge of the domain environment
 Unknown outcome – unknown reward
 Partial or unreliable access to “databases of interaction”
Russell, S. J. & Norvig, P. 2009. Artificial intelligence: a modern approach (3rd edition), Prentice Hall, Chapter 16, 17: Making Simple Decisions and Making Complex Decisions
Decision Making under uncertainty
[Figure: control loop in which an agent acts on an environment (states such as <position, speed>, contexts such as <carpet, chair>); a critic turns the delayed outcome (<new position, new speed>, advancement) into rewards, and the agent adapts its control policy and value function]
Image credit to Alessandro Lazaric
Taxonomy of RL agents 1/2: 1) Components
 Policy: the agent's behaviour function, e.g. a stochastic policy π(a|s)
 Value function: how good is each state and/or action, e.g. v(s)
 Model: the agent's representation of the environment; it predicts the next state and the next reward
Taxonomy of agents 2/2: 2) Categorization
 1) Value-Based (no policy, only a value function)
 2) Policy-Based (no value function, only a policy)
 3) Actor-Critic (both policy and value function)
 4) Model-Free (policy and/or value function, but no model)
 5) Model-Based (policy and/or value function, and a model)
Maze Example: Policy
Maze Example: Value Function
Maze Example: Model
Principle of an RL algorithm is simple
At each time step t:
 Observe the state s_t
 Take an action a_t (problem of exploration and exploitation)
 Observe the next state s_{t+1} and earn the reward r_{t+1}
 Update the policy π and the value function V
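One concrete instance of this loop is tabular Q-learning (listed among the example algorithms on the next slide); a minimal sketch, assuming a discrete environment that exposes reset(), step(a), and a list of actions:

import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    """Observe s_t, act (eps-greedy exploration/exploitation), observe s_{t+1}
    and r_{t+1}, then update the value estimate Q(s, a)."""
    Q = defaultdict(float)                       # Q[(s, a)]: estimated return
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < eps:            # explore: random action
                a = random.choice(actions)
            else:                                # exploit: best known action
                a = max(actions, key=lambda x: Q[(s, x)])
            s2, r, done = env.step(a)
            best_next = max(Q[(s2, x)] for x in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q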
Example RL Algorithms
Temporal difference learning (1988)
Q‐learning (1998)
BayesRL (2002)
RMAX (2002)
CBPI (2002)
PEGASUS (2002)
Least‐Squares Policy Iteration (2003)
Fitted Q‐Iteration (2005)
GTD (2009)
UCRL (2010)
REPS (2010)
DQN (2014)
6) Example: Multi-Armed Bandits (MAB)
Image credit to http://research.microsoft.com/en‐us/projects/bandits
Principle of the Multi-Armed Bandits problem (1/2)
 There are n slot machines ("einarmige Banditen")
 Each machine i returns a reward y ~ P(y; θ_i)
 Challenge: the machine parameter θ_i is unknown
 Which arm of an n-slot machine should a gambler pull to maximize his cumulative reward over a sequence of trials? (stochastic setting or adversarial setting)
Image credit and more information: http://research.microsoft.com/en‐us/projects/bandits
Principle of the Multi-Armed Bandits problem (2/2)
 Let a_t ∈ {1, …, n} be the choice of a machine at time t
 Let y_t ∈ ℝ be the outcome, with mean ⟨y_{a_t}⟩
 A policy maps the full history to a new choice: π: (a_1, y_1, …, a_{t-1}, y_{t-1}) → a_t
 The problem: find a policy π that maximizes the expected total reward E[Σ_t y_t]
 Two effects appear when choosing such a machine:
 You collect more data about the machine (= knowledge)
 You collect reward
 Exploration and Exploitation
 Exploration: choose the next action to reduce the uncertainty about the machines
 Exploitation: choose the next action to maximize the expected reward
 A MAB models an agent that simultaneously attempts to acquire new knowledge ("exploration") and to optimize its decisions based on existing knowledge ("exploitation"). The agent attempts to balance these competing tasks in order to maximize total value over the period of time considered. More information: http://research.microsoft.com/en-us/projects/bandits
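A minimal simulation of this exploration/exploitation trade-off with an epsilon-greedy gambler; the Bernoulli machines and their hidden success probabilities are assumptions for illustration:

import random

def play_bandit(true_probs, steps=1000, eps=0.1):
    """Pull one of n machines per step: explore with probability eps,
    otherwise exploit the machine with the best estimated mean reward."""
    n = len(true_probs)
    counts, means, total = [0] * n, [0.0] * n, 0.0
    for _ in range(steps):
        if random.random() < eps:
            a = random.randrange(n)                      # exploration
        else:
            a = max(range(n), key=lambda i: means[i])    # exploitation
        y = 1.0 if random.random() < true_probs[a] else 0.0
        counts[a] += 1
        means[a] += (y - means[a]) / counts[a]           # incremental mean
        total += y
    return total, means

# e.g. play_bandit([0.2, 0.5, 0.7])  # should converge on the third machine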
MAB Principle: "Optimism in the face of uncertainty"
Exploration: the higher the (theoretical) uncertainty of an action, the higher the chance to select it
Exploitation: the higher the (estimated) reward of an action, the higher the chance to select it
Auer, P., Cesa‐Bianchi, N. & Fischer, P. 2002. Finite‐time analysis of the multiarmed bandit problem. Machine learning, 47, (2‐3), 235‐256.
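The UCB1 rule analyzed by Auer et al. makes this principle concrete: select the arm maximizing the estimated reward plus an uncertainty bonus that shrinks as the arm is pulled more often. A minimal sketch:

import math

def ucb1_select(counts, means, t):
    """UCB1 (Auer et al. 2002): argmax_i  means[i] + sqrt(2 ln t / counts[i]).
    The first term rewards exploitation, the second is the exploration bonus."""
    for i, c in enumerate(counts):
        if c == 0:
            return i                  # pull each arm once before comparing
    return max(range(len(counts)),
               key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]))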
Knowledge Representation in MAB
 Knowledge can be represented in two ways:
 1) as the full history h_t = [(a_1, y_1), …, (a_t, y_t)], or
 2) as the belief b_t(θ) = P(θ | h_t), where θ are the unknown parameters of all machines
 The process can be modelled as a belief MDP (a sketch of the belief update follows):
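For Bernoulli machines the belief b_t(θ) has a convenient closed form: one Beta distribution per machine, updated after every pull. A minimal sketch (Bernoulli rewards are an assumption made for illustration):

# Belief over each machine's unknown parameter theta_i as Beta(alpha_i, beta_i);
# Beta(1, 1) is the uniform prior.
beliefs = [[1, 1] for _ in range(3)]          # three machines

def update_belief(beliefs, machine, reward):
    """Conjugate update: a success increments alpha, a failure increments beta."""
    if reward > 0:
        beliefs[machine][0] += 1
    else:
        beliefs[machine][1] += 1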
The optimal policies can be modelled as a belief MDP
Poupart, P., Vlassis, N., Hoey, J. & Regan, K. An analytic solution to discrete Bayesian reinforcement learning. Proceedings of the 23rd international conference on Machine learning, 2006. ACM, 697‐704.
Applications of MABs
 Clinical trials: potential treatments for a disease to select from, for new patients or a patient category at each round (see the Thompson sampling sketch after this list):
W. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, vol. 25, pp. 285-294, 1933.
 Games: Different moves at each round, e.g. GO
 Adaptive routing: finding alternative paths, also finding alternative roads for driving from A to B
 Advertisement placements: selection of an ad to display at the Webpage out of a finite set which can vary over time, for each new Web page visitor
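Thompson's 1933 rule, still popular for adaptive clinical trials, follows directly from such beliefs: sample a parameter from each arm's posterior and pick the arm whose sample is largest. A minimal sketch building on the Beta beliefs above:

import random

def thompson_select(beliefs):
    """Draw theta_i ~ Beta(alpha_i, beta_i) per arm (e.g. per candidate
    treatment) and choose the arm with the largest draw."""
    draws = [random.betavariate(a, b) for a, b in beliefs]
    return max(range(len(draws)), key=lambda i: draws[i])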
7) Applications in Health
Example for Health
https://www.youtube.com/watch?v=20sj7rRfzm4
Kusy, M. & Zajdel, R. 2014. Probabilistic neural network training procedure based on Q(0)-learning algorithm in medical data classification. Applied Intelligence, 41, (3), 837-854, doi:10.1007/s10489-014-0562-9.
Example
Bothe, M. K., Dickens, L., Reichel, K., Tellmann, A., Ellger, B., Westphal, M. & Faisal, A. A. 2013. The use of reinforcement learning algorithms to meet the challenges of an artificial pancreas. Expert Review of Medical Devices, 10, (5), 661‐673, doi:10.1586/17434440.2013.827515.
Example of a work with MABs
Joutsa, J. et al. 2012. Mesolimbic dopamine release is linked to symptom severity in pathological gambling. NeuroImage, 60, (4), 1992-1999, doi:10.1016/j.neuroimage.2012.02.006.
8) Future Outlook
Grand Challenge: Transfer Learning
[Figure: knowledge transfer from a source learning task to a target learning task]
 To design algorithms able to learn from experience and to transfer knowledge across different tasks and domains to improve their learning performance
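A minimal sketch of one simple transfer mechanism, warm-starting a target task's value table from a source task; it assumes the two tasks share their state/action space:

from collections import defaultdict

def warm_start(Q_source):
    """Initialize the target task's Q-table from a source task instead of
    from zeros, so learning begins from transferred knowledge."""
    Q_target = defaultdict(float)
    Q_target.update(Q_source)
    return Q_target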
Example for Transfer Learning
V. Mnih et al., "Human-level control through deep reinforcement learning", Nature (2015)
Rich Caruana, “Multi‐task Learning”, MLJ (1998)
Pan, S. J. & Yang, Q. A. 2010. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, 22, (10), 1345-1359, doi:10.1109/tkde.2009.191.
Transfer Learning has been studied for more than 100 years
 Thorndike & Woodworth (1901) explored how individuals would transfer in one context to another context that shares similar characteristics:
 They explored how individuals would transfer learning in one context to another, similar context,
 or how "improvement in one mental function" could influence a related one.
 Their theory implied that transfer of learning depends on how similar the learning task and the transfer task are,
 or where "identical elements are concerned in the influencing and influenced function", now known as the identical elements theory.
 Today's examples: C++ -> Java; Python -> Julia
 Mathematics ‐> Computer Science
 Physics ‐> Economics
Domain and Task
Pan, S. J. & Yang, Q. A. 2010. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, 22, (10), 1345‐1359, doi:10.1109/tkde.2009.191.
Transfer learning settings
 Feature space heterogeneous → Heterogeneous Transfer Learning
 Feature space homogeneous:
 Tasks identical → Single-Task Transfer Learning
 domain difference is caused by sample bias → Sample Selection Bias / Covariate Shift
 domain difference is caused by feature representations → Domain Adaptation
 Tasks different → Inductive Transfer Learning
 focus on optimizing a target task → Inductive Transfer Learning
 tasks are learned simultaneously → Multi-Task Learning
Is this a complex domain?
https://royalsociety.org/events/2015/05/breakthrough‐science‐technologies‐machine‐learning
Thank you!
Sample Questions
 Why is RL interesting for us in health informatics?
 What is a medical doctor in daily clinical routine doing most of the time?
 Please explain the human decision making process on the basis of the model by Wickens (1984) !
 What is the underlying principle of DQN?
 What is probabilistic inference? Give an example!
 Why is selective attention so important?
 Please describe the "anatomy" of an RL-agent!
 What does a policy-based RL-agent mean? Give an example!
 What is the underlying principle of a MAB? Why is it interesting for health informatics?
Keywords
Reinforcement Learning
Trial‐and‐Error Learning
Markov‐Decision‐Process
Utility‐based agent
Q‐Learning
Passive reinforcement learning
Adaptive dynamic programming
Temporal‐difference learning
Active reinforcement learning
Bandit problems
Advance Organizer (1)
 RL := a general problem, inspired by behaviorist psychology: how software agents learn to make decisions from success and failure, from reward and punishment in an environment, aiming to maximize cumulative reward
 RL is studied in game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, and genetic algorithms
 Also known as: approximate dynamic programming
 The problem has been studied in the theory of optimal control, though most studies are concerned with the existence of optimal solutions and their characterization, and not with the learning or approximation aspects
 In economics and game theory, reinforcement learning may be used to explain how equilibrium may arise under bounded rationality
Unsupervised – Supervised – Semi-supervised
A) Unsupervised ML: The algorithm is applied to the raw data and learns fully automatically; the human can check results at the end of the ML pipeline
B) Supervised ML: Humans provide the labels for the training data and/or select features to feed the algorithm; the more samples the better; the human can check results at the end of the ML pipeline
C) Semi-Supervised ML: A mixture of A and B, mixing labeled and unlabeled data so that the algorithm can assign labels according to a similarity measure to one of the given groups
Reinforcement Learning
D) Reinforcement Learning: Algorithm is continually trained by human input, and can be automated once maximally accurate
 Advantage: non-greedy nature
 Disadvantage: must learn a model of the environment
The difference to interactive ML
E) Interactive Machine Learning: The human is seen as an agent involved in the actual learning phase, step-by-step influencing measures such as distances or cost functions …
1. Input → 2. Preprocessing → 3. iML → 4. Check
Constraints of humans: Robustness, subjectivity, transfer?
Open Questions: Evaluation, replicability, …