Text S1.

Supplemental Materials
A. Learning rule derivation
The partial derivative of the cost function in equation (2) with respect to the preferred direction of the $i$th neuron, $\theta_{p_i}$, is

\[
\frac{\partial E}{\partial \theta_{p_i}} =
\begin{cases}
(\theta_e - \theta_d)\,\dfrac{\partial \theta_e}{\partial \theta_{p_i}} - \alpha'\cos(\theta_d - \theta_{p_i})\sin(\theta_d - \theta_{p_i}) & \text{if } |\theta_d - \theta_{p_i}| < \pi/2 \\[2ex]
(\theta_e - \theta_d)\,\dfrac{\partial \theta_e}{\partial \theta_{p_i}} + 0 & \text{if } |\theta_d - \theta_{p_i}| \geq \pi/2
\end{cases}
\tag{A.1}
\]

where $\alpha' = 2\alpha$. The second term appears only where $|\theta_d - \theta_{p_i}| < \pi/2$, because $y_i$ is differentiable in this range; otherwise this term equals zero.
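As a worked intermediate step (an assumption on our part: the unsupervised term of the cost in equation (2) is taken here to be of the form $-\alpha\cos^{2}(\theta_d-\theta_{p_i})$, which is the form consistent with $\alpha' = 2\alpha$ above):

\[
\frac{\partial}{\partial\theta_{p_i}}\left[-\alpha\cos^{2}(\theta_d-\theta_{p_i})\right]
= -2\alpha\cos(\theta_d-\theta_{p_i})\,\frac{\partial\cos(\theta_d-\theta_{p_i})}{\partial\theta_{p_i}}
= -2\alpha\cos(\theta_d-\theta_{p_i})\sin(\theta_d-\theta_{p_i}),
\]

since $\partial\cos(\theta_d-\theta_{p_i})/\partial\theta_{p_i} = \sin(\theta_d-\theta_{p_i})$; this is the second term of (A.1) with $\alpha' = 2\alpha$.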
Derivation of the first term in the right hand side of equation (A.1)
We used a geometrical approximation to $\partial\theta_e / \partial\theta_{p_i}$ (Figure S.A).

Figure S.A. Approximation of $\partial\theta_e / \partial\theta_{p_i}$. $V_i$ is a vector for preferred direction $\theta_{p_i}$, and $V_e$ a vector for the executed direction $\theta_e$.
A perturbation on the preferred direction $\theta_{p_i}$ also perturbed the cell's vector $V_i$ along this direction of magnitude $y_i$. This perturbation in the vector also perturbed the population vector $V_e$ on the executed direction. Because the amount of perturbation in the specific individual vector, $|V_i|\,\partial\theta_{p_i}$, and the amount of perturbation in the population vector, $|V_e|\,\partial\theta_e$, are equal,

\[
|V_i|\,\partial\theta_{p_i} = |V_e|\,\partial\theta_e
\;\;\Rightarrow\;\;
\frac{\partial\theta_e}{\partial\theta_{p_i}} = \frac{|V_i|}{|V_e|} = \frac{1}{|V_e|}\,y_i
\tag{A.2}
\]
Thus we approximate $\partial\theta_e / \partial\theta_{p_i}$ with $y_i$, the only available local (and thus biologically plausible) information. $1/|V_e|$ is a scaling factor and can be absorbed into the learning rate of supervised learning. We can then obtain the supervised learning rule in equation (2):

\[
(\theta_e - \theta_d)\,\frac{\partial\theta_e}{\partial\theta_{p_i}}
\approx (\theta_e - \theta_d)\,\frac{y_i}{|V_e|}
= \left(\frac{\theta_e - \theta_d}{|V_e|}\right) y_i
\tag{A.3}
\]
We verified both analytically and in simulations that the approximation $\partial\theta_e / \partial\theta_{p_i} \approx y_i / |V_e|$ is appropriate (please contact the corresponding author to obtain the exact derivation and corresponding simulation results).
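As an illustration only (not the authors' verification code), a minimal sketch of such a finite-difference check, assuming the truncated-cosine activation of Equation (1), i.e. $y_i = \max(0, \cos(\theta_d - \theta_{p_i}))$, and a standard population-vector readout; the function name, population size, and directions below are arbitrary:

import numpy as np

# Illustrative sketch only: finite-difference check of
# d(theta_e)/d(theta_p_i) ~= y_i / |V_e|, assuming truncated-cosine activations
# and a population-vector readout.

def executed_direction(theta_p, theta_d):
    # Each neuron contributes a vector of length y_i along its preferred direction.
    y = np.maximum(0.0, np.cos(theta_d - theta_p))
    vx, vy = np.sum(y * np.cos(theta_p)), np.sum(y * np.sin(theta_p))
    return np.arctan2(vy, vx), np.hypot(vx, vy), y

rng = np.random.default_rng(0)
theta_p = rng.uniform(-np.pi, np.pi, size=100)   # preferred directions (assumed population)
theta_d = 0.3                                    # desired direction
delta = 1e-5                                     # finite-difference step

theta_e, v_norm, y = executed_direction(theta_p, theta_d)
for i in np.argsort(-y)[:3]:                     # three most active neurons
    perturbed = theta_p.copy()
    perturbed[i] += delta
    theta_e_pert, _, _ = executed_direction(perturbed, theta_d)
    numeric = (theta_e_pert - theta_e) / delta   # numerical derivative of theta_e
    approx = y[i] / v_norm                       # geometric approximation (A.2)
    print(f"neuron {i}: numerical {numeric:.4f}  vs  y_i/|V_e| {approx:.4f}")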
Derivation of the second term in the right hand side of equation (A.1)
The second term can be approximated with

\[
-\alpha'\cos(\theta_d - \theta_{p_i})\sin(\theta_d - \theta_{p_i})
\approx -\alpha'\, y_i\,(\theta_d - \theta_{p_i})
\tag{A.4}
\]

(since $y_i = \cos(\theta_d - \theta_{p_i})$ and $\sin(x) \approx x$).
Note that equation (A.4) is valid where $|\theta_d - \theta_{p_i}| < \pi/2$; otherwise this term is equal to zero. The first-order Taylor expansion of the sine function is valid for $\theta_{p_i} \approx \theta_d$. However, due to the truncation of the neurons' activation rule (Equation (1)), if $\theta_{p_i}$ is far from $\theta_d$, where the approximation of the sine function is invalid, $y_i$ is zero or near zero. Thus, this approximation is valid for all directions. We verified in simulations that, compared to the non-approximated equation, the maximum error is about 13%.
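A small numerical illustration of this point (again a sketch under the same truncated-cosine assumption, not the authors' simulation; the error metric they used is not stated here, so the numbers below need not reproduce the 13% figure):

import numpy as np

# Compare the exact factor cos(x)*sin(x) of the second term with its
# approximation y_i * x, where x = theta_d - theta_p_i and y_i = max(0, cos(x));
# the common factor alpha' is omitted.

x = np.linspace(-np.pi, np.pi, 10001)                      # theta_d - theta_p_i
y = np.maximum(0.0, np.cos(x))                             # truncated activation, Equation (1)
exact = np.where(np.abs(x) < np.pi / 2, np.cos(x) * np.sin(x), 0.0)
approx = y * x                                             # sin(x) ~= x near theta_p_i ~= theta_d

gap = np.abs(exact - approx)
i = np.argmax(gap)
print(f"largest discrepancy {gap[i]:.3f} at x = {x[i]:.2f} rad "
      f"(peak of the exact term is {np.max(np.abs(exact)):.3f})")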
In summary, the weight update rule is

\[
\theta_{p_i} \leftarrow \theta_{p_i} - \varepsilon\,\frac{\partial E}{\partial\theta_{p_i}}
\approx \theta_{p_i} - \varepsilon\left(\frac{\theta_e - \theta_d}{|V_e|}\,y_i - \alpha'(\theta_d - \theta_{p_i})\,y_i\right)
\tag{A.5}
\]

that is,

\[
\Delta\theta_{p_i} = \varepsilon_{SL}(\theta_d - \theta_e)\,y_i + \varepsilon_{UL}(\theta_d - \theta_{p_i})\,y_i
\]

with $\varepsilon > 0$, $\varepsilon_{SL} = \varepsilon / |V_e|$ the learning rate of the supervised learning rule, and $\varepsilon_{UL} = \varepsilon\alpha'$ the learning rate of the unsupervised learning rule.
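For concreteness, a minimal sketch of applying this combined update over repeated trials (not the authors' implementation; the learning-rate values, helper names, and angle-wrapping choice are illustrative assumptions, and $1/|V_e|$ is absorbed into the supervised rate as described above):

import numpy as np

def wrap(a):
    # Wrap angles to (-pi, pi] so angular differences stay well-behaved.
    return (a + np.pi) % (2 * np.pi) - np.pi

def update(theta_p, theta_d, eps_sl=0.05, eps_ul=0.01):
    # Truncated-cosine activations (Equation (1)) and population-vector readout.
    y = np.maximum(0.0, np.cos(theta_d - theta_p))
    vx, vy = np.sum(y * np.cos(theta_p)), np.sum(y * np.sin(theta_p))
    theta_e = np.arctan2(vy, vx)                       # executed direction
    d_sl = eps_sl * wrap(theta_d - theta_e) * y        # supervised term of (A.5)
    d_ul = eps_ul * wrap(theta_d - theta_p) * y        # unsupervised term of (A.5)
    return wrap(theta_p + d_sl + d_ul), theta_e

rng = np.random.default_rng(1)
theta_p = rng.uniform(-np.pi, np.pi, 200)              # assumed population of 200 cells
theta_d = 1.0                                          # one trained (desired) direction
for _ in range(50):                                    # 50 training trials
    theta_p, theta_e = update(theta_p, theta_d)
print(f"directional error at the last trial: {wrap(theta_d - theta_e):.4f} rad")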
B. Effect of each learning process on bistability
Figure S1. Effects of supervised learning. (A) Directional error, (B) normalized population vector
(PV), and (C) spontaneous arm use after different durations of therapy followed by 0 free choice
trial (immediate) and 3000 free choice trials (follow-up) without supervised learning. Unlike in the
full model (see Figure 5), the bistable behavior is not present, as shown by the non-crossing of
the curves in the immediate and follow-up conditions.
Figure S2. Effects of unsupervised learning. (A) Directional error, (B) normalized population
vector (PV), and (C) spontaneous arm use after different durations of therapy followed by 0 free
choice trial (immediate) and 3000 free choice trials (follow-up) without unsupervised learning.
Unlike in the full model (see Figure 5), the bistable behavior is not present, as shown by the non-crossing of the curves in the immediate and follow-up conditions.
Figure S3. Effects of reinforcement learning. (A) Directional error, (B) normalized population
vector (PV), and (C) spontaneous arm use after different durations of therapy followed by 0 free
choice trial (immediate) and 3000 free choice trials (follow-up) without reinforcement learning.
Unlike in the full model (see Figure 5), the bistable behavior is not present, as shown by the non-crossing of the curves in the immediate and follow-up conditions. In these simulations, we first used a positive reinforcement learning rate (0.01) during the acute stroke phase (500 free choice
trials after lesion), before “turning off” reinforcement learning in the following trials. Due to
supervised learning and unsupervised learning, performance improved over time but
spontaneous arm use stayed low.