IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 44, NO. 11, NOVEMBER 1996

operation consists, in the general case, of adding the $N/2$ low-frequency components $k = 0, \ldots, N/2 - 1$ to the $N/2$ high-frequency components $k = N/2, \ldots, N - 1$. Then, according to Proposals 1 and 2, [equation illegible in the source] for $n = 0, \ldots, N/2 - 1$.

IV. CONCLUSION

We have shown that bandfolding in the DCT domain does not result in a downsampling of the signal. The result is two convolution operations that involve both the odd and even samples of the signal. The impulse response of the resulting filters depends on the parity of the samples and on the subband (low or high).

ACKNOWLEDGMENT

The authors would like to thank the reviewers both for their helpful comments, suggestions, and related references. Their criticisms helped the authors improve the quality of this paper. The authors wish to express their deep gratitude for the support they have received from Prof. R. Goutte of INSA.

Using Multirate Architectures in Realizing Quadratic Volterra Kernels

Vikram M. Gadre and R. K. Patney

Abstract-Multirate architectures have been used for realizing linear FIR digital filters with reduced computational complexity. The Volterra kernel can be represented as a generalized convolution. It would thus be expected that multirate architectures could be used to advantage in realizing Volterra kernels as well.
The quadratic Volterra kernel may be realized in the form of an "LDL structure." The LDL structure includes a set of FIR filters of increasing length, which may be realized in a computationally efficient manner using multirate architectures.

I. INTRODUCTION

For linear and circular convolution, it is possible to make use of short convolution algorithms given by Winograd [1, ch. 2] to achieve a reduction in computational complexity (CC). These short convolution algorithms have been used together with block processing for reducing the CC of running FIR filtering [4] and [6]. Multirate architectures offer a convenient framework for doing this, as has been illustrated in these references.

The quadratic filter involves a polynomial of second degree in the input process at a number of past samples, which may be represented as a "generalized convolution." In view of this, it would be expected that the principles that enable a reduction in CC for linear convolution have their counterparts for quadratic filters. In this correspondence, it is shown that this is indeed the case.

Quadratic kernels may be realized using an "LDL structure" [3] having an FIR filter in each of its parallel branches. The order of these FIR filters increases from one branch to the next. The basic idea in this correspondence is the following: Some of the longer FIR filters on the parallel paths in the LDL structure can be realized using multirate architectures that reduce the CC of the realization. By developing a mean-length lemma, it is shown that the realization of a set of FIR filters with increasing length can offer some additional advantages in multiplicative complexity (MC) over realizing an isolated filter, while leaving the additive complexity (AC) unaffected.

II. LDL STRUCTURES FOR QUADRATIC KERNELS

Consider a quadratic Volterra filter acting on the current sample and $M$ past samples of an input process $x$ to produce the output process $y$.
The form of the equation describing this relationship is [3]

$$y[n] = B_n^T X_n + X_n^T H_n X_n \tag{1}$$

where
$X_n$ is the vector $[x[n] \; \cdots \; x[n-M]]^T$;
$B_n$ is the vector of linear coefficients;
$H_n$ is the symmetric coefficient matrix associated with the quadratic form.

The $LDL^T$ decomposition of symmetric matrices may now be used to decompose $H_n$ into the product of a lower triangular, diagonal, and upper triangular matrix. Due to the symmetry of $H_n$, the upper and lower triangular matrices are related through transposition:

$$H_n = L_n D_n L_n^T \tag{2}$$

where $L_n$ is lower triangular with unit diagonal, and $D_n$ is diagonal. By doing so, it is possible to realize the quadratic kernel as a parallel "LDL structure" [3] shown in Fig. 1.

[Fig. 1. Realization of quadratic kernel based on $LDL^T$ decomposition. One branch is a linear FIR filter; the quadratic form is realized as a weighted sum of parallel FIR filters, each followed by a squarer. Each inner product is realized using architectures of the form of Fig. 2 and/or Fig. 3.]

Substituting (2) in (1) gives

$$y[n] = B_n^T X_n + u_n^T[n] D_n u_n[n] \tag{3}$$

where $u_n[n] = L_n^T X_n$. Denoting the $l$th element of the vector $u_n[n]$ by $u_{n,l}[n]$ and the $(i,l)$th element of $L_n$ by $[L_n]_{il}$, the unit diagonal of $L_n$ gives

$$u_{n,l}[n] = x[n-l+1] + \sum_{i=l+1}^{M+1} [L_n]_{il}\, x[n-i+1]. \tag{4}$$

Since $D_n$ is diagonal, with $l$th diagonal element $D_{n,l}$, (3) may be rewritten as

$$y[n] = B_n^T X_n + \sum_{l=1}^{M+1} D_{n,l} \left(u_{n,l}[n]\right)^2. \tag{5}$$

If $H_n$ and $D_n$ are both of full rank, the number of multiplications successively increases from 1 to $M$ as one goes down the branches. If $H_n$ and $D_n$ are not of full rank, the number of parallel paths gets reduced according to the rank.

Manuscript received February 24, 1994; revised December 13, 1995. The associate editor coordinating the review of this paper and approving it for publication was Prof. Roberto Bamberger.
V. M. Gadre is with the Department of Electrical Engineering, Indian Institute of Technology, Bombay, Powai, Mumbai, 400 076, India.
R. K. Patney is with the Department of Electrical Engineering, Indian Institute of Technology, Delhi, India.
Publisher Item Identifier S 1053-587X(96)08229-3.
1053-587X/96$05.00 © 1996 IEEE
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY BOMBAY. Downloaded on November 4, 2008 at 23:19 from IEEE Xplore. Restrictions apply.

III. MULTIRATE ARCHITECTURES FOR REDUCED CC

The filters in each of the parallel paths of Fig. 1 are linear. Some of them may be realized using reduced CC multirate architectures. Fig. 2 provides an example of a multirate architecture that realizes a given FIR transfer function $H(z)$ with reduced CC. This will henceforth be referred to as MR2, "2" being a mnemonic for the downsampling and upsampling factor involved. The ideas leading to this architecture are briefly reviewed from [4]. The input signal $X(z)$, output signal $Y(z)$, and filter $H(z)$ are each decomposed into their polyphase components of order 2 as follows:

$$X(z) = X_0(z^2) + z^{-1} X_1(z^2) \tag{6}$$
$$Y(z) = Y_0(z^2) + z^{-1} Y_1(z^2) \tag{7}$$
$$H(z) = H_0(z^2) + z^{-1} H_1(z^2) \tag{8}$$

From (6)-(8), it is clear that each of the terms on the right-hand sides of the equations may be regarded as a polynomial of degree 1, with "coefficients" equal to the polyphase components of the respective signals. Taking the product of the $z$-transforms $X(z)$ and $H(z)$ may then be regarded as taking a product of two polynomials, for which one may use an efficient Winograd polynomial multiplication algorithm. The "multiplications" of "coefficients" in this algorithm now translate into linear convolutions, which are implemented on the channels of MR2. The analysis segment combines the input polyphase components linearly for the purpose of providing appropriate input sequences to the channels, whereas the synthesis segment combines the results of these partial convolutions suitably to produce the output polyphase components.
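The analysis, channel, and synthesis steps described above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the architecture of Fig. 2 itself: it works on finite sequences in one block rather than as a running causal filter (so the one-sample delay of MR2 is not modeled), and it assumes the common three-product form of the 2-by-2 short convolution, namely $H_0X_0$, $H_1X_1$, and $(H_0+H_1)(X_0+X_1)$. The function names `conv` and `mr2_fir` are hypothetical.

```python
def conv(a, b):
    """Direct linear convolution of two finite, non-empty sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def mr2_fir(h, x):
    """Convolve h with x using three half-length channel convolutions.

    Assumed algorithm (not taken from the source): the three-product
    2-by-2 short convolution applied to the order-2 polyphase components.
    """
    n_out = len(h) + len(x) - 1
    # Zero-pad to even length so both polyphase components have equal size.
    h = list(h) + [0.0] * (len(h) % 2)
    x = list(x) + [0.0] * (len(x) % 2)
    h0, h1 = h[0::2], h[1::2]   # H(z) = H0(z^2) + z^-1 H1(z^2)
    x0, x1 = x[0::2], x[1::2]   # X(z) = X0(z^2) + z^-1 X1(z^2)

    # Analysis segment: linear combinations feeding the third channel.
    hs = [p + q for p, q in zip(h0, h1)]
    xs = [p + q for p, q in zip(x0, x1)]

    # Three channel filters, each working on half-rate sequences.
    a = conv(h0, x0)
    b = conv(h1, x1)
    c = conv(hs, xs)

    # Synthesis segment: recover the output polyphase components.
    y1 = [ci - ai - bi for ai, bi, ci in zip(a, b, c)]  # Y1 = H0 X1 + H1 X0
    y0 = a + [0.0]                                      # Y0 = H0 X0 + z^-1 H1 X1
    for k, bk in enumerate(b):
        y0[k + 1] += bk

    # Interleave even-indexed (y0) and odd-indexed (y1) output samples.
    y = [0.0] * (len(h) + len(x) - 1)
    y[0::2] = y0
    y[1::2] = y1
    return y[:n_out]
```

For a filter of length $L$, each of the three channel filters has length $L/2$ and produces one sample per two output samples, which gives $3L/4$ multiplications per output sample instead of $L$.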
[Fig. 2. Multirate system MR2 based on 2-by-2 point convolution algorithm. Input $x[n]$ (with $z$-transform $X(z)$), analysis segment, channel filters, synthesis segment, output $y[n]$ (with $z$-transform $Y(z)$). FIR filter $H(z)$ realized with a delay of one sample.]

[Fig. 3. Multirate architecture MR3 realizing a three-by-three point convolution algorithm. FIR filter $H(z)$ realized with a delay of two samples.]

Other examples of multirate architectures that reduce the CC of FIR filtering are given in [4]. An example of an architecture, henceforth referred to as MR3, based on a three-point by three-point algorithm [1, p. 85] is shown in Fig. 3. The channel filters in this figure are derived from the polyphase components of order 3 of the FIR filter $H(z)$ to be realized.

It can be verified, by writing $Y(z)$ in terms of $X(z)$ in MR2, that a delay of one sample is introduced, resulting in the overall system having a transfer function of $z^{-1}H(z)$ instead of $H(z)$. Similarly, MR3 introduces a delay of two samples. In general, a multirate causal system with a multirate factor of $N$ introduces a minimum delay of $N - 1$ [5]. Thus, MR2 and MR3 incur only the minimum delay of 1 and 2, respectively. However, this delay can be "absorbed" conveniently, as is shown in Section V.

The MC and AC of direct realization, as well as of realization with an arbitrary reduced CC multirate architecture, may be expressed in the following general "slope-intercept" form:

$$\mathrm{MC} = pL, \qquad \mathrm{AC} = pL + s \tag{9}$$

where $L$ is the length of the FIR filter segment being realized. The constants $p$ and $s$ have been tabulated in Table I for direct realization, realization with MR2, and with MR3.

Table I. Constants $p$ and $s$ of (9)

Realization          p      s
Using MR2            3/4    1/2
Using MR3            2/3    2
Direct realization   1      -1

If MC is the lone criterion, it is seen that it is always advantageous to use an architecture $A$ for which $p < 1$, as compared with direct realization. However, from the point of view of AC, the use of $A$ will be preferred to direct realization above a certain threshold length only. If $A$ is either MR2 or MR3, this threshold $L_T$ is derivable from Table I and is given by

$$L_T = \frac{s + 1}{1 - p}. \tag{10}$$

Substituting $p$ and $s$ for MR2 from Table I into (10), it is seen that MR2 would be preferred over direct realization for $L > 6$. It may also be inferred that MR3 is always advantageous as compared with MR2 if MC is the lone criterion, but if AC is also considered, then MR3 becomes advantageous over MR2 for $L > 18$.

IV. THE MEAN LENGTH LEMMA FOR REALIZING A SET OF FILTERS

This section develops a lemma that addresses the following situation: Two realizations $A_i$ and $A_j$ are considered for realizing each filter in any set of $N$ filters, keeping AC in mind. The set of filter lengths is $F_L = (L_1, L_2, \ldots, L_N)$. From (9), AC varies with $L$ according to

$$\mathrm{AC}_i(L) = p_i L + s_i. \tag{11}$$

It is assumed that the slope parameter of $A_i$, viz. $p_i$, is less than that of $A_j$, i.e., $p_i < p_j$.

1) The Mean Length Lemma: It is advantageous to use the realization $A_i$ as compared with $A_j$ in this situation if the mean of $F_L$ exceeds a threshold, i.e., provided

$$\frac{1}{N} \sum_{L \in F_L} L > \frac{s_i - s_j}{p_j - p_i}. \tag{12}$$

Proof: In order that $A_i$ should be preferred over $A_j$, it should be true that

$$\sum_{L \in F_L} (p_i L + s_i) < \sum_{L \in F_L} (p_j L + s_j)$$

i.e.,

$$(p_j - p_i) \sum_{L \in F_L} L > N (s_i - s_j)$$

and, since $p_j - p_i > 0$,

$$\frac{1}{N} \sum_{L \in F_L} L > \frac{s_i - s_j}{p_j - p_i}$$

which proves the lemma.

If the filter lengths are consecutive, then this lemma may be used to gain additionally in MC without losing in AC. This is shown in the next section.

V. MULTIRATE SYSTEMS AND QUADRATIC KERNELS

The FIR filters in the parallel paths of the LDL structure that are "long enough" to merit the use of a multirate architecture for reducing the CC may be realized using such architectures. In this section, both MC and AC will be kept in mind while making calculations. As mentioned before, MR3 becomes advantageous over MR2 only for $L \ge 18$. If the quadratic kernel involves fewer than 18 samples of the input, only MR2 need be used. That situation is considered first.

From (4), each FIR filter in the quadratic kernel of Fig. 1 has a leading coefficient equal to 1, which does not need a multiplication. If we consider the impulse response coefficients of the FIR filters other than the leading 1, they form an FIR filter in their own right. MR2 can be used to realize them with no additional delay incurred since they already have a delay of 1 incorporated. $L$ in (11) will then be taken to mean one less than the filter length since the leading unity coefficient has been omitted.

The special feature of the current situation is that it is possible to use the system MR2 from filter segment lengths less than six onwards, even if the overall AC is to be left unaffected. This would be useful if multiplications are much more cumbersome than additions in a given signal processing situation, where one may gain more in overall MC without losing in AC. Suppose MR2 is used for the set of filter segments with consecutive lengths $L = L_l, L_l + 1, \ldots, L_u$. The mean of the filter lengths in this set is easily calculated to be $(L_l + L_u)/2$. From the mean length lemma and Table I, the use of the architecture $A_i$ with given $p_i$ and $s_i$ is preferable to direct realization (for which $p_j = 1$, $s_j = -1$), provided

$$\frac{L_l + L_u}{2} > \frac{s_i + 1}{1 - p_i}. \tag{17}$$

In particular, $A_i$ could be the architecture MR2. For MR2, $s_i = 1/2$, $p_i = 3/4$, and hence, $L_l + L_u > 12$ is required. It is not meaningful to use MR2 for $L_l < 2$ anyway since each of the two polyphase components must include at least one filter coefficient. With $L_l = 2$, one would have $L_u > 10$. Of course, $L_u$ is constrained by the number of samples used in the Volterra kernel. If this number is only 9, for example, then, as (17) indicates, one can use MR2 for all filters with $L = 4$ to 9.

The gain in MC is enhanced by using MR2 for $L \ge L_l$ rather than $L \ge 6$, if $L_l < 6$. This is because one has availed of the MC advantage for $L_l \le L < 6$ as well while suffering from no loss in AC. The gain in MC as a function of $L$, which is denoted $\mathrm{MC}_{gain}(L)$, as compared with direct realization, is

$$\mathrm{MC}_{gain}(L) = L - \tfrac{3}{4} L = \tfrac{1}{4} L. \tag{18}$$

The total gain in MC, which is denoted $\mathrm{MC}_{tot}$, is therefore

$$\mathrm{MC}_{tot} = \sum_{L = L_l}^{L_u} \tfrac{1}{4} L = \frac{(L_u - L_l + 1)(L_l + L_u)}{8}. \tag{19}$$

For the specific example of $L_l = 4$, $L_u = 9$, $\mathrm{MC}_{tot} = 9.75$. The additional gain in MC due to the filters of length $L_l \le L < 6$ having been realized with MR2 is then obtained by putting $L_l = 4$, $L_u = 5$ in (19), and is 2.25. This computation may be repeated for any values of $L_u$ and $L_l$.

The situation is now considered when the number of samples involved in the quadratic Volterra kernel is greater than 18 and, hence, large enough to warrant the use of MR3. Suppose 22 samples, viz. $x[n], \ldots, x[n - 21]$, are involved in the quadratic kernel. A look at (4) reveals that none of the FIR filters of the LDL structure, except the longest one, involves the sample $x[n]$. In other words, all of them, barring the first one, have inbuilt delays, which increase with decreasing length. Therefore, in the current example, the shortest filter has an inbuilt delay of 21; the filter with length 2 (and hence $L = 1$) has a delay of 20, and so on. For using MR3, a delay of two samples would have to be taken care of.
This is implicitly provided for in all the filters of length less than or equal to 20. For the length 21 filter, there is an inbuilt delay of 1, and on omitting the leading coefficient of 1 as explained in the beginning of this section, an additional delay of 1 has automatically been provided for, as required by MR3. For the length 22 filter, however, one additional coefficient must be realized "loose" as explained earlier, other than the leading coefficient of 1. Thus, for the length 21 and 22 filters, 20 filter coefficients can be included in the realization employing MR3 while taking care of the delay incurred.

Were a single filter segment being considered in isolation, then it would be appropriate to use MR3 for $L > 18$. However, a consequence of the mean length lemma is that one may begin from a smaller length since all that one requires is that the mean length of the set of filter segments realized using MR3 be greater than 18. Thus, one may use MR3 to realize the filters with length 17 onwards in this example. From the preceding discussion, the set $F_L$ for the filter segments being realized with MR3 in this case is (16, 17, 18, 19, 20, 20) after considering coefficient omissions to take care of delays and leading unity coefficients. The mean is 18.33, which is greater than 18. The MC of MR2 is $(3/4)L$, and that of MR3 is $(2/3)L$. Thus, one has additionally gained by $(3/4 - 2/3) \cdot (16 + 17) = 11/4$ in MC by realizing the filter segments of length 16 and 17 using MR3.

VI. CONCLUSION

In this correspondence, the use of multirate architectures for the realization of quadratic Volterra kernels with reduced CC is investigated. A mean-length lemma is developed to explain the variations that arise when a set of FIR filters is being realized by using multirate architectures as opposed to an isolated filter.

REFERENCES

[1] R. E. Blahut, Fast Algorithms for Digital Signal Processing. Reading, MA: Addison-Wesley, 1985.
[2] N. K. Bose, Digital Filters: Theory and Applications. New York: North Holland, 1985.
[3] Y. Lou, C. L. Nikias, and A. N. Venetsanopoulos, "Efficient VLSI array processing structures for adaptive quadratic digital filters," Circ., Sys., Signal Processing, vol. 7, no. 2, pp. 253-273, 1988.
[4] Z. J. Mou and P. Duhamel, "Short-length FIR filters and their use in fast nonrecursive filtering," IEEE Trans. Signal Processing, vol. 39, no. 6, pp. 1322-1332, June 1991.
[5] M. Vetterli, "A theory of multirate filter banks," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 356-372, Mar. 1987.
[6] M. Vetterli, "Running FIR and IIR filtering using multirate filter banks," IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, no. 5, pp. 730-738, May 1988.

A Nonlinear Analytical Model for the Quantized LMS Algorithm-The Power-of-Two Step Size Case

Neil J. Bershad and José Carlos M. Bermudez

Abstract-This correspondence presents a study of the quantization effects in the finite precision LMS algorithm with power-of-two step sizes. Deterministic nonlinear recursions are presented for the mean and second-moment matrix of the weight vector about the Wiener weight for white Gaussian data models and small algorithm step size $\mu$. The numerical solutions of these recursions are shown to agree very closely with the Monte Carlo simulations during all phases of the adaptation process. Design examples demonstrate the selection of the number of quantizer bits and the adaptation step size $\mu$ to yield a desired transient behavior and cancellation performance. The results obtained indicate that previous models are too conservative in predicting the converged MSE for a given number of bits.

I. INTRODUCTION

The least mean squares (LMS) algorithm is very popular in implementations of real-time high-speed digital adaptive filters. Fixed-point arithmetic is prevalent in such applications. The effects of a finite word-length on the behavior of the LMS algorithm have been studied in [1]-[8].
In particular, [8] extended the conditional moment techniques developed in [4]-[7] to the study of the nonlinear behavior of the quantized LMS adaptation using arbitrary step sizes $\mu$. The reader is referred to [8] for further details of the problem.

For arbitrary $\mu$, the LMS updating equation requires two multiplications [9]. First, $\mu$ is multiplied by the error signal. The result is then quantized and multiplied by the input signal. Finally, a second quantization determines the updating term [8]. A different implementation is possible when $\mu$ is an exact power of two. In this case, multiplications by $\mu$ are usually realized as right shifts. The error and input signals are first multiplied in double precision. The result is then shifted (multiplication by $\mu$) and quantized to single precision. Compared with the arbitrary step size case [8], this implementation substantially modifies the algorithm behavior. The convergence becomes controlled by the quantized value of the entire weight update term. This was the problem studied in [1]-[3] using a linear model and in [4] using a continuous nonlinear function.

This note studies the nonlinear behavior of the quantized LMS algorithm when products by a power-of-two step size $\mu$ are implemented as right shifts. The results for the arbitrary step size case, which have been derived in [8], cannot be used because the operational order is different, and the quantizer input is a product of two unquantized signals. Furthermore, the mathematical approach used in [8] cannot be applied either. Instead, the quantizer operation is expressed as the sum of linear and periodic functions. Then, characteristic functions are used to evaluate conditional expectations in the adaptive weight recursion. A small $\mu$ approximation yields recursive equations for the mean and second moment matrix of the weight vector about the Wiener weight.
The recursions are solved numerically and shown to

Manuscript received September 1, 1994; revised April 2, 1996. This work was supported, in part, by the Brazilian National Council for Development of Science and Technology (CNPq) under grant No. 201532/93-0. The associate editor coordinating the review of this paper and approving it for publication was Prof. José M. F. Moura.
N. J. Bershad is with the Department of Electrical and Computer Engineering, University of California, Irvine, Irvine, CA 92692 USA.
J. C. M. Bermudez is with the Department of Electrical Engineering, Federal University of Santa Catarina, Florianopolis, SC 88040-900, Brazil.
Publisher Item Identifier S 1053-587X(96)08230-X.
1053-587X/96$05.00 © 1996 IEEE