P. Šanda: Speeding Up the Algorithm for Finding Optimal Kernel Bandwidth in Spike Train Analysis, en 73-75 en73 Speeding Up the Algorithm for Finding Optimal Kernel Bandwidth in Spike Train Analysis P. Šanda1 1 Institute of Physiology, Academy of Sciences of the Czech Republic Supervisor: Doc. RNDr. Petr Lánský, CSc. Summary One of the important tasks in the spike train analysis is to estimate the underlying firing rate function. The aim of this article is to improve the time performance of an algorithm which can be used for the estimation. As there is no unique way how to infer the firing rate function, several different methods have been proposed. A popular method how to estimate this function is the convolution of the spike train with Gaussian kernel with appropriate kernel bandwidth. The definition of what “appropriate” means remains a matter of discussion and a recent paper [1] proposes a method how to exactly compute optimal bandwidth under certain conditions. For large sets of spike train data the elementary version of the algorithm is unfortunately too inefficient in terms of computational time complexity. We present a refined version of the algorithm which in turn allows us to use the original method even for large data sets. The achieved performance improvement is demonstrated on a particular results and shows usability of proposed method. Keywords: action potential, spike train, neural coding, firing rate, convolution, Gaussian kernel, kernel bandwidth, Brent's minimization, parallel computing, MPI 1. Introduction Many neurophysiological studies are based on the assumption that the majority of information flow between neurons is provided by spikes. Spike trains are believed to form a neuronal code and many coding models successfully predict experimental stimuli features when only the resulting spike train is given. It has been shown that important aspects of the stimuli are coded by the neuron's firing rate, however, the exact procedure how to obtain such a rate from the experimental EJBI – Volume 6 (2010), Issue 1 data differs from paper to paper and various methods were proposed [2]. Here we consider the method based on the convolution of a spike train with a fixed (Gaussian) kernel, which in result leads to a smooth estimate of firing rate and has been widely used in the past decades [3], [4], [5], [6], [7], [8]. The most difficult part of this method is the selection of the kernel bandwidth, because it affects substantially the “quality” of the estimate, while there is no obvious clue how the optimal bandwidth should be chosen. In [1] authors propose a kernel density estimator based on the mean integrated squared error principle (MISE) and formulate a precise algorithm how to infer optimal (fixed) kernel bandwidth. For larger sets of recorded spike trains the time complexity of the straightforward version of the algorithm increases, so that it becomes unusable for online queries when studying the features of the method. Here we provide a solution, which improves the time complexity of the implementation. That at the end allows us to work with experimental data in a reasonable manner. It is also worth to note that the proposed solution does not interfere with the actual result of the original method - for the properties and comparison with other methods look in the original paper [1]. 2. Methods 2.1 Original method The firing rate is a non-negative function l for which the integral ňab l(t)dt gives the expected number of spikes during the time interval [a,b). In the experimental recording we get only one or more trials of spike train data. The problem is how to assess the Ů firing rate l , which will be as close as possible to the original l, which is believed to stand in the background of spike discharges. The method is to convolve the spike train with a specific kernel, thus obtaining a smooth estimate of l, for example see Fig. 1. In this case we use fixed Gaussian kernel and the problem is reduced to the question how to select the optimal bandwidth w, so Ů that the difference between l and l is minimal. The method itself is beyond the scope of this article, for details see [1]. What is important here is that the core of computation can be summarized in the following statement: find w*, such that the formula (1) is minimal: (1), where ti is the time of i-th spike, n is the number of trials, , defines the time range of the record and kw is as the kernel used. Since we will study the Gaussian kernel the equation (1) can be rewritten as (2), where N is the number of spikes. Note that the term 2Öpn2 is constant and has no effect on w*. Let us denote the inner term of the sum in (2) as E(w, t,i t)j . We will denote W the set of possible values of w, in which we are searching. We denote the size W = |W | and assume that its points are equidistant. © 2010 EuroMISE s.r.o. en74 P. Šanda: Speeding Up the Algorithm for Finding Optimal Kernel Bandwidth in Spike Train Analysis, en 73-75 The straightforward implementation will find the minimum value via evaluating this term in three nested loops: computational cost or (3) split (l2) into many small subtasks which are successively distributed to CPUs according to their load. In real-life implementation we have chosen (3) because (1) tends to produce high overhead of the parallelization engine and (2) assumes that the underlying CPUs are equivalent in performance and accessibility (that breaks in many distributed environments). (11) for wÎW (12) for jÎ[1,...N] (13) for iÎ[1,...,j - 1] Tmp=E(w, t,i t)j if (min>tmp)min=tmp,w*=w thus obtaining the time complexity O(N2W). The selection of W is dependent on the interval [a, b) and required precision of the optimal value. In a typical case w* << b - a we can select the upper bound of W to log(b - a), in the case of bisection (see below) the upper bound is not so vital. Fig. 2. Typical shape of the function for experimental spike train data. Splitting the task for the parallel execution at the loop (l1) level will not allow us to use parallelization in case of the bisection run, thus we will split the task on p parts at the level (l2). That will give us the final estimate 2 for the time complexity of O(N log(W)/p). 2.2 Bisecting Now we will use that in a typical case where C forms an unimodal function, see Fig 2., though this cannot be asserted in general (such a problem can, for example, occur when searching for bandwidths is smaller than the sampling resolution of input data). Having the unimodal function and a sensible estimate of lower and upper bounds we can use any of the extremum search algorithms based on sectioning the domain. This will reduce loop (l1) time complexity from a linear to a logarithmic factor and as a result we obtain the complexity of O(N2log(W)). As hinted above while this method helps a lot certain attention needs to be paid before its use. 2.3.1 Splitting Since the upper bound in the loop (l3) is not constant, trivial splitting of (l2) will produce p subtasks [1,...,N/p),[N/p,...,2N/p),..., [(p-1)N/p,...,N] with increasing time complexity of subtasks. At the end this would produce a situation where the first subtasks are completed having the relevant CPUs idling while the last subtasks would still be in computation. 2.3 Parallelization Because the evaluation of E(w, t,i t)j is independent of the previous computations, it is a natural target for parallelization. There are more ways how to solve it - (1) move the splitting of task into (l3), (2) splitting p tasks in (l2) in a proportional way, so that each subtask has the same Fig.1. Illustration of the problem. The thick line is the original l, the top line shows experimentally measured spike train generated Ů from this function, the thin line is firing rate l , which we try to optimize. EJBI – Volume 6 (2010), Issue 1 Let us stick with the implementation details now. 3. Results 3.1 Tuning parameters The algorithm was implemented based on the sections above, allowing all the strategies - exhaustive search or bisecting both in sequential and parallelized versions. The language used was C++, for parallelization openMPI implementation [9] of MPI standard was chosen, for bisection we used Brent minimization algorithm [10]. In order to find the proper splitting of the subtasks we analyzed measured time demands for a different fragmentation of the tasks, see Fig. 3. We ran the optimality search for two sets of 1000 and 18000 spikes. In the case of the larger set we could see that taking any value below ¦ = 500 gives approximately the same time demands. On the very beginning there is a visible peak caused by the growth of the load by the parallelization maintenance (i.e. the cost of distributing subtasks starts to be larger than computation of subtasks themselves). From¦ = 500 we could see gradual growth caused by the insufficient fragmentation (i.e. some CPUs are needlessly idle and waiting for other unfinished subtasks). Fig. 3. Figure shows how splitting affects time performance of the computation. The left panel presents the case, where the input data were 1000 spikes, while the input for the right panel was 18000 spikes. Both sets were taken from the real experimental data, bisection was used in this case. f is the size of one (fixed) subtask, that is the number of (l3) iterations. © 2010 EuroMISE s.r.o. P. Šanda: Speeding Up the Algorithm for Finding Optimal Kernel Bandwidth in Spike Train Analysis, en 73-75 Tab. 1. Comparison of time demands. Method Time (min:sec) Sequential search 58:41 Parallel search 4:14 Sequential bisection 2:43 Parallel bisection 0:09 As the number of spikes in the input set will decrease, this value will also decrease, as the results for the set of 1000 spikes show. Here we could see similar properties as far as the shape is concerned, but the total time needed for computation is now negligible. This leads to the final choice of ¦ = 100, which will be always sufficient for any larger input sets. As it can be seen in the left panel of Fig. 3, it is a reasonable value even for small sets, but that is not so important due to small total time demands. This value is, of course, dependent on the particular computational setting - in our case all tests have been done on a small cluster with 20 CPU cores. 3.2 Real time demands For the comparison of real-time improvements we offer the table below. The input data and parameters were the same for all the tasks: 18000 spikes, [1;400] ms range for bandwidth, precision of 1 ms (W=400/1). EJBI – Volume 6 (2010), Issue 1 4. Discussion We have proposed and implemented a parallel algorithm for optimal kernel bandwidth search which has better time performance than its “straightforward” version. Moreover when the function (1) is unimodal on the given range, we can use the bisecting version, which reduces the time even more drastically. To check the reasonable ranges, one can do the first trial run which uses only few sampling points (and in fact cannot be omitted even in normal case). This performance boost does not play an important role in the case of small input sets of spikes, however, it is significant in case of large sets. The whole work was motivated by real demands, when experimental sets of ~20000 spikes were evaluated and, moreover, their subsets also needed to be evaluated the approximate knowledge of the function (1) shape reduced the need for a slow version of the algorithm. At the end we proposed tuning parameters for an example cluster configuration and provided actual results of the performance improvement. Acknowledgments The work was supported by the grant SVV2010-265 513. References [1] Shimazaki H., Shinomoto S:. Kernel bandwidth optimization in spike rate estimation. Journal of Computational Neuroscience 2010; 29: 171182. [2] Cunningham J.P., Gilja V., Ryu S.I., Shenoy, K.V.: Methods for estimating neural firing rates, and their application to brainmachine interfaces. Neural Networks 2009; 22 (9): 12351246. en75 [3] Sanderson A.C.: Adaptive Filtering of Neuronal Spike Train Data. IEEE Transactions on Biomedical Engineering 1980; 27 (5): 271 274. [4] Richmond B.J., Optican L.M., Podell M., Spitzer H.: Temporal encoding of twodimensional patterns by single units in primate inferior temporal cortex. I. Response characteristics. J Neurophysiol 1987; 57 (1): 132146. [5] Richmond B.J., Optican L.M., Spitzer H.:. Temporal encoding of twodimensional patterns by single units in primate primary visual cortex. I. Stimulus-response relations. J Neurophysiol 1990; 64 (2): 351369. [6] Paulin M.G.: Digital filters for firing rate estimation. Biological cybernetics 1992; 66 (6): 525531. [7] Paulin M.G., Hoffman L.F.: Optimal firing rate estimation. Neural Networks 2001; 14 (6-7): 877 881. [8] Nawrot M., Aertsen A., Rotter S.: Single-trial estimation of neuronal firing rates: From single-neuron spike trains to population activity. Journal of Neuroscience Methods 1999; 94 (1): 81 92. [9] Gabriel E., Fagg G.E., Bosilca G., Angskun T., Dongarra J.J., Squyres J.M. et al:. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation. In: Proceedings, 11th European PVM/MPI Users' Group Meeting 2004; 97104. [10] Brent R.P.: Algorithms for minimization without derivatives. Dover Pubns; 2002. Contact Mgr. Pavel Šanda Institute of Physiology, Academy of Sciences of the Czech Republic Vídeňská 1083 142 20 Prague 4 Czech Republic e-mail: [email protected] © 2010 EuroMISE s.r.o.
© Copyright 2026 Paperzz