Supplemental Materials

A. Learning rule derivation

The partial derivative of the cost function in equation (2) with respect to the preferred direction of the $i$-th neuron, $\theta_{p_i}$, is

$$\frac{\partial E}{\partial \theta_{p_i}} = (\theta_e - \theta_d)\,\frac{\partial \theta_e}{\partial \theta_{p_i}} - \varepsilon'\,\cos(\theta_d - \theta_{p_i})\,\sin(\theta_d - \theta_{p_i}) \qquad \text{(A.1)}$$

where $\varepsilon' = 2\varepsilon$. This equation is valid where $|\theta_d - \theta_{p_i}| \le \pi/2$, because $y_i$ is differentiable in this range; otherwise the second term equals zero.

Derivation of the first term on the right-hand side of equation (A.1)

We used a geometrical approximation to $\partial\theta_e/\partial\theta_{p_i}$ (Figure S.A). $V_i$ is the vector for the preferred direction $\theta_{p_i}$, and $V_e$ the vector for the executed direction $\theta_e$. A perturbation of the preferred direction, $\partial\theta_{p_i}$, also perturbed the cell's vector $V_i$, of magnitude $y_i$, along this direction. This perturbation of the individual vector in turn perturbed the population vector $V_e$ around the executed direction. Because the amount of perturbation of the specific individual vector, $|V_i|\,\partial\theta_{p_i}$, and the amount of perturbation of the population vector, $|V_e|\,\partial\theta_e$, are equal,

$$|V_i|\,\partial\theta_{p_i} = |V_e|\,\partial\theta_e \quad\Longrightarrow\quad \frac{\partial\theta_e}{\partial\theta_{p_i}} = \frac{|V_i|}{|V_e|} = \frac{1}{|V_e|}\,y_i \qquad \text{(A.2)}$$

Figure S.A. Geometrical approximation of $\partial\theta_e/\partial\theta_{p_i}$.

Thus we approximate $\partial\theta_e/\partial\theta_{p_i}$ with $y_i$, the only available local (and thus biologically plausible) information. The factor $1/|V_e|$ is a scaling factor and can be absorbed into the learning rate of supervised learning. We can then obtain the supervised learning rule in equation (2):

$$(\theta_e - \theta_d)\,\frac{\partial\theta_e}{\partial\theta_{p_i}} \approx (\theta_e - \theta_d)\,\frac{y_i}{|V_e|} \qquad \text{(A.3)}$$

We verified both analytically and in simulations that the approximation $\partial\theta_e/\partial\theta_{p_i} \approx y_i/|V_e|$ is appropriate (please contact the corresponding author to obtain the exact derivation and corresponding simulation results).

Derivation of the second term on the right-hand side of equation (A.1)

The second term can be approximated by

$$\varepsilon'\,\cos(\theta_d - \theta_{p_i})\,\sin(\theta_d - \theta_{p_i}) \approx \varepsilon'\,y_i\,(\theta_d - \theta_{p_i}) \qquad \text{(A.4)}$$

since $y_i = \cos(\theta_d - \theta_{p_i})$ and $\sin(x) \approx x$. Note that equation (A.4) is valid where $|\theta_d - \theta_{p_i}| \le \pi/2$; otherwise this term equals zero. The first-order Taylor expansion of the sine function is valid for $\theta_{p_i} \approx \theta_d$. However, due to the truncation of the neurons' activation rule (equation (1)), if $\theta_{p_i}$ is far from $\theta_d$, where the approximation of the sine function is invalid, $y_i$ is zero or near zero. Thus, this approximation is valid for all directions. We verified in simulations that, compared to the non-approximated equation, the maximum error is about 13%.

In summary, the weight update rule is

$$\theta_{p_i} \leftarrow \theta_{p_i} - \alpha\,\frac{\partial E}{\partial\theta_{p_i}} = \theta_{p_i} - \alpha\left[(\theta_e - \theta_d)\,\frac{y_i}{|V_e|} - \varepsilon'\,(\theta_d - \theta_{p_i})\,y_i\right] = \theta_{p_i} + \alpha_{SL}\,(\theta_d - \theta_e)\,y_i + \alpha_{UL}\,(\theta_d - \theta_{p_i})\,y_i \qquad \text{(A.5)}$$

with $\alpha \ge 0$, $\alpha_{SL} = \alpha/|V_e|$ the learning rate of the supervised learning rule, and $\alpha_{UL} = \alpha\,\varepsilon'$ the learning rate of the unsupervised learning rule. Minimal numerical sketches of the approximation in equations (A.2)–(A.3) and of the combined update rule (A.5) are given at the end of this supplement.

B. Effect of each learning process on bistability

Figure S1. Effects of supervised learning. (A) Directional error, (B) normalized population vector (PV), and (C) spontaneous arm use after different durations of therapy followed by 0 free choice trials (immediate) and 3000 free choice trials (follow-up), without supervised learning. Unlike in the full model (see Figure 5), the bistable behavior is not present, as shown by the non-crossing of the curves in the immediate and follow-up conditions.

Figure S2. Effects of unsupervised learning. (A) Directional error, (B) normalized population vector (PV), and (C) spontaneous arm use after different durations of therapy followed by 0 free choice trials (immediate) and 3000 free choice trials (follow-up), without unsupervised learning. Unlike in the full model (see Figure 5), the bistable behavior is not present, as shown by the non-crossing of the curves in the immediate and follow-up conditions.
Figure S3. Effects of reinforcement learning. (A) Directional error, (B) normalized population vector (PV), and (C) spontaneous arm use after different durations of therapy followed by 0 free choice trials (immediate) and 3000 free choice trials (follow-up), without reinforcement learning. Unlike in the full model (see Figure 5), the bistable behavior is not present, as shown by the non-crossing of the curves in the immediate and follow-up conditions. In these simulations, we first used a positive reinforcement learning rate (0.01) during the acute stroke phase (500 free choice trials after the lesion), before "turning off" reinforcement learning in the following trials. Due to supervised and unsupervised learning, performance improved over time, but spontaneous arm use stayed low.
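C. Numerical sketches (illustrative)

The following sketch illustrates how the approximation $\partial\theta_e/\partial\theta_{p_i} \approx y_i/|V_e|$ of equations (A.2)–(A.3) can be checked numerically. It is not the authors' original verification code; the population size, desired direction, and perturbation size are assumptions made for illustration only.

```python
# Numerical check (sketch) of d(theta_e)/d(theta_p_i) ~ y_i / |V_e|, equations (A.2)-(A.3).
# All parameter values below are illustrative assumptions, not the paper's settings.
import numpy as np

N = 100                                                   # number of tuned neurons (assumed)
theta_p = np.linspace(-np.pi, np.pi, N, endpoint=False)   # preferred directions
theta_d = 0.3                                             # desired direction (radians, arbitrary)

def activity(theta_d, theta_p):
    """Truncated cosine tuning: y_i = cos(theta_d - theta_p_i) if |theta_d - theta_p_i| <= pi/2, else 0."""
    return np.maximum(np.cos(theta_d - theta_p), 0.0)

def population_vector(theta_d, theta_p):
    """Population vector V_e and executed direction theta_e."""
    y = activity(theta_d, theta_p)
    V_e = np.array([np.sum(y * np.cos(theta_p)), np.sum(y * np.sin(theta_p))])
    return V_e, np.arctan2(V_e[1], V_e[0])

V_e, theta_e = population_vector(theta_d, theta_p)
y = activity(theta_d, theta_p)

delta = 1e-5                                              # small perturbation of one preferred direction
for i in range(0, N, 10):                                 # sample a few neurons
    theta_p_pert = theta_p.copy()
    theta_p_pert[i] += delta
    _, theta_e_pert = population_vector(theta_d, theta_p_pert)
    finite_diff = (theta_e_pert - theta_e) / delta        # numerical d(theta_e)/d(theta_p_i)
    approx = y[i] / np.linalg.norm(V_e)                   # local approximation y_i / |V_e|
    print(f"neuron {i:3d}: finite difference {finite_diff:+.4f}, approximation {approx:+.4f}")
```

Only $y_i$ and $|V_e|$ enter the approximation, which is what makes the rule local; the finite-difference column shows how closely this tracks the true sensitivity of the executed direction for each sampled neuron.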
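A second sketch applies the combined update rule (A.5) to the same kind of toy population. The learning rates, trial count, and random sampling of target directions are illustrative assumptions; the full model additionally includes reinforcement learning of arm choice, which is not reproduced here.

```python
# Sketch of the combined update rule (A.5):
#   theta_p_i <- theta_p_i + a_SL*(theta_d - theta_e)*y_i + a_UL*(theta_d - theta_p_i)*y_i
# Learning rates, trial count, and target sampling are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N = 100
theta_p = rng.uniform(-np.pi, np.pi, N)   # preferred directions, random initialization (assumed)

a_SL = 0.05                               # supervised learning rate (assumed value)
a_UL = 0.01                               # unsupervised learning rate (assumed value)

def wrap(a):
    """Wrap an angle or angular difference to [-pi, pi)."""
    return (a + np.pi) % (2.0 * np.pi) - np.pi

def activity(theta_d, theta_p):
    # Truncated cosine tuning (equation (1)).
    return np.maximum(np.cos(theta_d - theta_p), 0.0)

def executed_direction(theta_d, theta_p):
    y = activity(theta_d, theta_p)
    V_e = np.array([np.sum(y * np.cos(theta_p)), np.sum(y * np.sin(theta_p))])
    return np.arctan2(V_e[1], V_e[0]), y

for trial in range(2001):
    theta_d = rng.uniform(-np.pi, np.pi)                  # desired direction for this trial
    theta_e, y = executed_direction(theta_d, theta_p)
    # Supervised term reduces the directional error; unsupervised term pulls
    # active neurons' preferred directions toward the trained direction.
    dtheta = a_SL * wrap(theta_d - theta_e) * y + a_UL * wrap(theta_d - theta_p) * y
    theta_p = wrap(theta_p + dtheta)
    if trial % 500 == 0:
        print(f"trial {trial:4d}: |directional error| = {abs(wrap(theta_e - theta_d)):.4f} rad")
```

Both terms are gated by the activity $y_i$, so only neurons tuned near the trained direction update their preferred directions, consistent with the locality argument in section A.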