A Fast RTL Power Estimator for Combinational Circuit¹

ZHAO Wen-qing (赵文庆), CUI Ming-dong (崔铭栋) and TANG Pu-shan (唐璞山)
(CAD Lab, Electronic Engineering Department, Fudan University, 200433, Shanghai)

Abstract: With the development of modern synthesis tools, VLSI design is moving toward higher levels of abstraction. Because of their computational complexity, gate-level power estimators are becoming less and less applicable to high-level modules. To estimate circuit power at an early design stage, RTL power analysis tools are needed. In this paper, we present a fast method for calculating the power of RTL combinational modules. Once the power library is built, the power dissipation of a module can be obtained for any input vector sequence. Our method uses a first-order Taylor expansion to establish an equation-based model, and Monte-Carlo simulation is used to build the library. Results on the ISCAS85 benchmarks show that the average relative error of our method is within about 5%.

Key words: RTL, power analysis

1. Introduction

Traditional IC design has mainly been concerned with area, speed and reliability. However, the much larger scale and higher speed of modern electronic systems have led to a series of power-related problems that are receiving increasing attention. General-purpose macros developed independently by third-party intellectual property (IP) providers are widely reused. In a power-constrained design (such as a consumer electronic device), the power dissipation of high-level modules must be predicted at an early design phase.
Thus, tools that allow designers to evaluate the power budget during the various design phases are in great demand.

Research on gate-level power estimation has been under way for a long time, and many techniques have been proposed ([1] gives a survey). Practical tools are already in use. These estimators give precise power dissipation for a circuit driven by given input vectors. However, because they rely on simulation, gate-level estimators are the slowest, and a more critical disadvantage is that the circuit netlist must be known before any simulation can be performed, which greatly hinders the advance of high-level design technology. Designers are no longer satisfied with such estimators; they need an RTL estimator that lets them evaluate power correctly in the RTL design phase so that they can choose a suitable architecture.

¹ This research is supported by the National High Technology Research and Development 863 Plan (863-SOC-Y-2-6-1, 863-SOC-Y-3-3), the NSFC overseas young scientist joint research project 69928402, NSFC project 69806004, the Doctoral Program Foundation of the Ministry of Education of China (2000024628), and the Foundation for University Key Teachers of the Ministry of Education.

In response to this need, research on RTL power estimation started in the 1990s, and a number of high-level estimation techniques have been proposed since 1995. Generally speaking, these techniques fall into two classes. The first class establishes a tabular or equation model based on the analysis of a range of circuits; these methods take as input the characteristics of the stimulus vectors, which are the decisive factor in module power dissipation.
For a given vector sequence, its probability and correlation are calculated first; a power table is then searched, or a power equation evaluated, using these parameters to produce the power dissipation. The second class goes deeper into a module: the signal transition status of the internal nodes is analyzed, and transformation methods are then applied to give the final power estimate. Both classes share a common procedure: 1) behavior-level simulation to obtain vector information as input parameters, and 2) use of specific models with these parameters to estimate power. The difference is that the first class does not need to know the internal structure of a module once the power library is established, while the second needs the internal structure for each specific application.

The method in [2] uses the power factor approximation technique, which treats all circuit input bits as 'digital white noise' and uses the product of an effective coefficient and an input vector feature to reflect power dissipation; this leads to a relative error of as much as 80%. The method in [3] gives more accurate results by treating different modules differently, but it is not a general method, since users are assumed to provide the equations for the different modules. In [4], a power table with four variables as its dimensions (average signal probability Pin, average signal transition density Din, spatial correlation SCin and temporal correlation TCin) is first built for each RTL module. The estimation process is then greatly simplified while good accuracy is preserved. Its disadvantage is that building the power library is very time-consuming. The inner-node-sampling method in [5] is quite different from the methods above: it focuses on the STG (state transition graph) and takes part of the internal node transition information as input parameters, and a transformation equation based on these parameters gives the final result.
However, this method requires a detailed circuit netlist in advance, which makes it unsuitable for practical RTL power estimation. The methods in [6] and [7] are general equation models; they use LMS (least mean square) or other fitting algorithms to determine the coefficients of their models. Their parameters are similar to those used in [4]. Compared with [4], they evaluate quickly and need much less time to build the library, at the cost of lower accuracy.

A careful study of the relationship between input vectors and module power dissipation shows that the properties of individual input ports should not be neglected. Based on this observation, we propose an equation model that uses the parameters of every input port instead of their averages. We avoid mathematical fitting algorithms, because they are too time-consuming. Instead, we adopt a new, dedicated simulation procedure to determine the coefficients of our model.

Section 2 gives our analysis of the properties of module power. Sections 3 and 4 describe the model and the simulation procedure that determines its coefficients. Section 5 evaluates the accuracy of our model, together with a time complexity comparison. The last Section is the conclusion.

2. Preliminaries

2.1 Analysis of RTL module power dissipation

The majority of current VLSI circuits are manufactured in a CMOS process. The four main sources of power dissipation in CMOS circuits are:

1) Node signal transitions, which charge and/or discharge parasitic capacitance.
2) Glitch power, which is due to the unnecessary transitions caused by glitches.
3) Short-circuit current: since any signal transition lasts for some time, a momentary current flows from VDD to GND during the transition.
4) Static leakage current or sub-threshold current.

Since 1), 2) and 3) are associated with dynamic transitions in the circuit, we call them dynamic power; 4) is independent of transitions, so we call it static power.
Component 2) is extremely difficult to calculate, because it can only be observed with real-delay simulators, and it can account for as much as 20% of total dynamic power. Component 3) is determined by the average signal slew rate. Dynamic power can be expressed as:

    P_{dynamic} = 0.5 V^2 \sum_{i=0}^{m} C_i^{eff} E_i^{avg}    (1)

where V is the operating voltage, C_i^{eff} is the effective capacitance on node i, and E_i^{avg} is the average transition probability of node i. Static power 4) is mainly determined by circuit scale and process, and can be described by a function directly proportional to the circuit scale.

In practice, RTL simulation is a black-box problem: the internal structure of each module is not visible to users, whether the module is a hard module, which contains low-level gate structure and interconnection information, or a soft module, which contains only VHDL code. For soft modules, there is no guarantee that the final implementation is exactly the same as the one in our power library, because different synthesis tools generate different implementations. If we stick to the module in our library, errors are inevitable. Fortunately, [9] has pointed out that the mismatches in the final implementation caused by technology, library and synthesis tools tend to have limited variance, although their absolute value can be significant. A library based on one technology can easily be adapted to other technologies; only a technology scaling parameter Stech is needed for this. Technology tuning is performed once and for all.

We build a model that is independent of internal structure; for generality, it handles different kinds of RTL modules in the same way. The only information available for evaluating a system-level design is the module list and its interconnection.
After behavioral-level simulation, the input and output vectors of each module are known, so a thorough study of these vectors is essential to determine their contribution to module power dissipation; this is discussed later in this Section.

2.2 Input vector property

In RTL power analysis, the average signal probability Pavg and the average signal transition density Davg are often used [7]. Take a circuit module with m input ports as an example. For a single input port i with input sequence V = [v1, ..., vn], where each vj is 0 or 1, we define the signal probability Pi and the signal transition density Di:

    P_i = \frac{\sum_{j=1}^{n} v_j}{n}, \qquad D_i = \frac{\sum_{j=1}^{n-1} v_j \oplus v_{j+1}}{n-1}    (def. 1)

Pi is the proportion of logic-high values in n vector cycles, and Di is the proportion of signal transitions over the n-1 pairs of adjacent vector cycles. Pavg and Davg are then defined as:

    P_{avg} = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} v_{i,j}}{mn}, \qquad D_{avg} = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n-1} v_{i,j} \oplus v_{i,j+1}}{m(n-1)}    (def. 2)

where v_{i,j} is the value of port i in cycle j. The two parameters are not independent; they are constrained by:

    \frac{D}{2} \le P \le 1 - \frac{D}{2}    (2)

Statistically, P is a function of D, and the graph of this function resembles an inverted bell, so we focus on the parameter D and expect the effect of P to be reflected indirectly.

2.3 Error evaluation

To evaluate the accuracy of our RTL model, we introduce two error measures, the average relative error REavg and the average total error TEavg, and we also define the maximum relative error Emax:

    RE_{avg} = \frac{1}{p} \sum_{i=1}^{p} \frac{|\hat{E}_i(x) - E_i(x)|}{E_i(x)}, \qquad TE_{avg} = \frac{\left| \sum_{i=1}^{p} \hat{E}_i(x) - \sum_{i=1}^{p} E_i(x) \right|}{\sum_{i=1}^{p} E_i(x)}, \qquad E_{max} = \max_{i=1}^{p} \frac{|\hat{E}_i(x) - E_i(x)|}{E_i(x)}    (def. 3)

where p is the total number of simulations, x is the input vector sequence used in a simulation, E_i is the result of the gate-level simulator and \hat{E}_i is the result of the RTL simulator.

2.4 Relation between power dissipation and input vectors

A large number of simulations with different input vectors show that the signal transition density D is what most strongly affects the power dissipation of a circuit, so it is the decisive factor in our equation-based model.
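As a minimal sketch of the definitions in Sections 2.2 and 2.3, the per-port statistics and the three error figures can be computed as shown below. The function names are illustrative and do not come from the paper, a transition is counted with XOR between adjacent cycles, and the error metrics assume absolute differences between the RTL and gate-level results.

```python
from typing import Sequence, Tuple


def signal_probability(bits: Sequence[int]) -> float:
    """P_i: fraction of the n cycles in which the port is at logic 1 (def. 1)."""
    return sum(bits) / len(bits)


def transition_density(bits: Sequence[int]) -> float:
    """D_i: fraction of the n-1 adjacent cycle pairs in which the port toggles (def. 1)."""
    toggles = sum(a ^ b for a, b in zip(bits, bits[1:]))
    return toggles / (len(bits) - 1)


def average_statistics(ports: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """P_avg and D_avg over m ports that all carry the same n cycles (def. 2)."""
    m = len(ports)
    p_avg = sum(signal_probability(bits) for bits in ports) / m
    d_avg = sum(transition_density(bits) for bits in ports) / m
    return p_avg, d_avg


def error_metrics(gate: Sequence[float], rtl: Sequence[float]) -> Tuple[float, float, float]:
    """RE_avg, TE_avg and E_max (def. 3).

    gate[i] is the gate-level result and rtl[i] the RTL estimate of run i.
    Absolute differences are an assumption of this sketch.
    """
    relative = [abs(r - g) / g for g, r in zip(gate, rtl)]
    re_avg = sum(relative) / len(relative)
    te_avg = abs(sum(rtl) - sum(gate)) / sum(gate)
    e_max = max(relative)
    return re_avg, te_avg, e_max
```

For the sequence [0, 1, 1, 0, 1], this gives P = 0.6 and D = 0.75, which satisfies constraint (2): 0.375 ≤ 0.6 ≤ 0.625.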
1) For a given circuit, power dissipation is proportional to the average Hamming distance of the input vectors. In FIG 1(a), each point is the simulation result of a vector sequence of length 100 in which the transition probability is evenly distributed over the input ports. The relationship is close to linear; very few points deviate from the line, which suggests that the error would be small if a linear function were used to describe it.

[FIG 1: Transition vs Hamming distance, panels (a) and (b)]

2) However, things are not so simple. As FIG 1(b) shows, if the transition probability is distributed unevenly over the input ports, the power dissipation moves away from the line. The power is not always the same even when the average input Hamming distance is the same, so a linear model in the average Hamming distance alone is inadequate. There must be other parameters besides the average Hamming distance that affect module power.

3) We found that signal transitions on different input ports contribute differently. This is because the number of circuit nodes driven directly and indirectly from a given input port varies greatly.

[FIG 2: Different contribution]

In FIG 2, the horizontal axis is the input port number and the vertical axis is the unitary power dissipation. To obtain the unitary power dissipation, we applied a 0/1 and a 1/0 transition to one input port while holding the other ports at a fixed random value, and recorded the total number of transitions on all circuit nodes. An average value was obtained by repeating this procedure 100 times. The contributions of the various ports clearly differ greatly.

3. Equation-based RTL power model

We consider it inadequate to use only the average transition density. If the information about individual input port contributions is neglected entirely, no later compensation based on signal spatial or temporal correlation can make up for it.
The power model is a complicated function of the Di:

    P_{rtl\_dynamic} = f(D_1, D_2, \ldots, D_n)    (3)

We therefore define the power contribution factor γ, which captures the relationship between a single input port and power dissipation:

    \gamma_i = \lim_{\Delta D_i \to 0} \frac{\Delta Power}{\Delta D_i} = \frac{\partial Power}{\partial D_i}    (4)

where Di denotes the signal transition density of the i-th input port. Our model is the first-order approximation of the complicated power surface f using Taylor's expansion:

    P_{rtl\_dynamic} = f(0,0,\ldots,0) + \sum_{i=1}^{n} \frac{\partial Power}{\partial D_i} D_i + \frac{1}{2!}(\ldots) + \ldots \approx \beta \sum_{i=1}^{n} \gamma_i D_i    (5)

where β is the general power factor, n is the number of input ports, γi is the unitary contribution of the i-th input port, and Di is the transition density of the i-th input port. Here f(0,0,...,0) = 0, because a vector sequence without any transition causes no dynamic power dissipation. The first part of the model is dynamic power, which is proportional to the sum of the contributions of all input ports. The second part is static power, which is proportional to the number of gates in the module. Since static power is almost proportional to circuit size, we have:

    P_{rtl\_static} = \lambda m    (6)

where λ is the static factor and m is the total number of nodes in the circuit.

4. Procedure of our algorithm

4.1 Coefficients of dynamic part

Instead of the traditional fitting algorithms (as in [6][7][8]), we developed a new method that needs no training vectors and calculates the model parameters relatively quickly. In our method, a transformation procedure turns a set of gate-level simulation results into model parameters, as follows:

a) For a given input port i of the circuit, set its signal transition density to 100%, i.e. the signal changes state in every cycle.
b) For the other input ports, set random vectors.
c) Perform gate-level simulation and calculate the power dissipation over 2 simulation cycles.
d) Repeat a) to c) N times and take the average power value Pi for input port i.
e) Move to another input port and repeat a) to d) until all input ports have been processed, which gives a set of power values.
f) Normalize these power values to obtain the coefficients γi:

    \gamma_i = \frac{P_i}{\sum_i P_i}

g) Generate a number of random test vectors, perform gate-level simulation to get the precise power value Preal, and calculate the ratio:

    \frac{P_{real}}{\sum_{i=1}^{n} \gamma_i D_i}

h) Repeat g) N times and take the average value of the ratio; this is the value of the coefficient β.

Here N is the minimum sample size. According to the central limit theorem, the distribution of the sample mean approaches the normal distribution for large N. Let α denote the significance level; to achieve a confidence level of (1-α) and an error tolerance of ε, the number of samples required is

    N \ge \left( \frac{t_{[N-1,\,\alpha/2]} \, S}{\varepsilon \bar{P}} \right)^2    (7)

where \bar{P} and S are the sample mean and sample standard deviation respectively, and the quantity t is obtained from the t-distribution with N-1 degrees of freedom. We set 1-α = 95% and ε = 5%, so after N or more iterations we have 95% confidence that the error of the sample mean is within ±5%.

4.2 Coefficients of static part

The static part is even simpler: we only need to use a power estimator to get the static power value, and then divide this value by the gate number to obtain the static coefficient λ.

5. Results

This Section presents the results of applying our new power model to the ISCAS85 benchmark circuits. In the power library, a set of parameters is built for each circuit. A dedicated algorithm generates input vector sequences of length 100 that cover the possible range of D. For each input port, we assign a random transition density and generate vectors according to this value, so that our results are closer to practical conditions. 1000 vectors are used to evaluate the accuracy of the equation model. The gate-level simulator we use to build the power library and to make comparisons is the unit-delay simulator FEC.
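The library-building steps a) to h) of Section 4.1 and the evaluation of the model in Equations (5) and (6) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `simulate` callback stands in for a gate-level simulator such as FEC and is hypothetical, the toy simulator exists only to make the sketch runnable, the other ports are held constant within a trial (one reading of step b), following the FIG 2 measurement), and `n_samples` is passed in directly rather than derived from Equation (7).

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[int]
# Hypothetical gate-level simulator: vector sequence -> average power per cycle.
Simulator = Callable[[Sequence[Vector]], float]


def transition_densities(vectors: Sequence[Vector]) -> List[float]:
    """Per-port transition density D_i of a vector sequence."""
    steps = len(vectors) - 1
    return [sum(a[i] ^ b[i] for a, b in zip(vectors, vectors[1:])) / steps
            for i in range(len(vectors[0]))]


def build_library(simulate: Simulator, n_ports: int, n_samples: int,
                  rng: random.Random, test_len: int = 100) -> Tuple[List[float], float]:
    """Return (gamma, beta) following steps a) to h)."""
    port_power = []
    for i in range(n_ports):                        # step e): loop over ports
        total = 0.0
        for _ in range(n_samples):                  # step d): repeat N times
            base = [rng.randint(0, 1) for _ in range(n_ports)]  # step b)
            seq = []
            for cycle in range(3):                  # 3 vectors = 2 simulation cycles
                v = list(base)
                v[i] = base[i] ^ (cycle & 1)        # step a): port i toggles every cycle
                seq.append(v)
            total += simulate(seq)                  # step c)
        port_power.append(total / n_samples)
    norm = sum(port_power)
    gamma = [p / norm for p in port_power]          # step f): normalize
    ratios = []
    for _ in range(n_samples):                      # steps g) and h)
        seq = [[rng.randint(0, 1) for _ in range(n_ports)] for _ in range(test_len)]
        weighted = sum(g * d for g, d in zip(gamma, transition_densities(seq)))
        if weighted > 0.0:                          # skip sequences with no transitions
            ratios.append(simulate(seq) / weighted)
    beta = sum(ratios) / len(ratios)
    return gamma, beta


def estimate_power(gamma: Sequence[float], beta: float, densities: Sequence[float],
                   lam: float = 0.0, n_nodes: int = 0) -> float:
    """Dynamic part of Eq. (5) plus static part of Eq. (6)."""
    return beta * sum(g * d for g, d in zip(gamma, densities)) + lam * n_nodes


def make_toy_simulator(weights: Sequence[float]) -> Simulator:
    """Toy stand-in: each toggle on port i costs weights[i]; not a real circuit."""
    def simulate(vectors: Sequence[Vector]) -> float:
        steps = len(vectors) - 1
        energy = sum(w * (a[i] ^ b[i])
                     for a, b in zip(vectors, vectors[1:])
                     for i, w in enumerate(weights))
        return energy / steps
    return simulate
```

With the toy simulator the power is exactly linear in the per-port densities, so the procedure recovers γ as the normalized weights and β as their sum. A real circuit is not linear, which is why the paper averages the ratio in step h) over N samples.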
FEC uses the timing-event sequence method [10] to simulate circuit operation and gives the precise number of transitions in a circuit stimulated by given input vectors; it is a relatively accurate gate-level simulator.

Table 1 gives the accuracy of our model for the ISCAS85 circuits. The hardware platform is a Pentium III 550 with 128M RAM.

Circuit   Gate     REavg    TEavg    Emax      Time to build
name      number                               library (s)
C1355      546     1.90%    0.42%     5.34%      85
C1908      880     3.17%    0.11%    11.31%      93
C2670     1193     2.27%    0.03%    10.76%     147
C3540     1669     5.29%    0.11%    13.85%     158
C432       160     1.57%    0.07%     7.76%       7
C499       202     0.41%    0.01%     2.59%      10
C5315     2184     1.87%    0.26%     6.53%     796
C6288     2406     4.24%    0.17%    13.47%    1265
C7552     3512     1.38%    0.26%     4.01%    1540
C880       383     3.41%    0.48%    12.04%      30

Table 1: Accuracy of power estimation

Table 2 compares the error with that of other simulators listed in the references, and Table 3 compares the time needed to build the library.

Average relative error REavg (%), values in source order; the last value of each row is OUR MODEL. Rows with four values lack one entry among the reference columns, and the source does not show which.

Circuit name: [4], [7], [8], [9], OUR MODEL
C1355: 4.91, 13.06, 12.1, 15.6, 1.90
C1908: 3.85, 4.66, 13.6, 3.17
C2670: 3.08, 6.09, 21.9, 2.73
C3540: 3.61, 4.66, 21.7, 5.32
C432: 5.56, 3.46, 18.5, 11.4, 2.40
C499: 4.05, 12.03, 4.5, 2.29
C6288: 3.75, 9.68, 21.7, 31.6, 4.24
C7552: 3.03, 8.23, 16.5, 5.3, 1.38
C880: 3.73, 6.4, 15.8, 8.8, 3.41

Table 2: Average error comparison with other methods

Time to build power library (s), values in source order; the last value of each row is OUR MODEL. Each row has three reference values under the four reference columns, and the source does not show which column is empty.

Circuit name: [4], [7], [8], [9], OUR MODEL
C1355: 7524, 1332, 5616, 85
C1908: 16524, 2168, 8676, 93
C2670: 51228, 8064, 28944, 147
C3540: 76428, 18288, 9684, 158
C432: 42300, 1351, 6300, 7
C499: 8280, 979, 5616, 10
C6288: 123840, 33228, 16524, 1265
C7552: 210240, 71100, 42228, 1540
C880: 22464, 1746, 14364, 30

Table 3: Time comparison with other methods
Comment: the platform used in [7][8] is a SUN UltraSparc 1 with 64M RAM.

6. Conclusion

In this paper, we present a new analytical method for RTL power estimation. All the parameters are obtained exclusively from input statistics; no extra information is needed.
After the power library is established, a relatively accurate power value can be predicted for each RTL module. Tables 1, 2 and 3 show the advantages of analyzing the signal transitions on each input port:

a) The accuracy is improved to some extent.
b) The power library is built in a very short time, without human intervention and without training vectors.
c) Only the input vectors are needed to predict module power dissipation once the power library is built.
d) The method is general and can handle all kinds of combinational modules.

In future work, we will focus on signal spatial correlation and its contribution, in order to improve model accuracy, and we will look for ways to predict the power of sequential RTL modules.

Reference

[1] F. Najm, "A survey of power estimation techniques in VLSI circuits," IEEE Trans. on VLSI Systems, Vol. 2, Dec. 1994, pp. 446-455.
[2] S. R. Powell and P. M. Chau, "Estimating power dissipation of VLSI signal processing chips: the PFA technique," in Proc. VLSI Signal Processing IV, 1990, pp. 250-259.
[3] P. Landman and J. M. Rabaey, "Architectural power analysis: the dual bit type method," IEEE Trans. on VLSI Systems, Vol. 3, June 1995, pp. 173-187.
[4] Subodh Gupta and Farid N. Najm, "Power modeling for high-level power estimation," IEEE Trans. on VLSI Systems, Vol. 8, No. 1, Feb. 2000, pp. 18-29.
[5] Jing-Yuan Lin, Wen-Zen Shen and Jing-Yang Jou, "A structure-oriented power modeling technique for macrocells," IEEE Trans. on VLSI Systems, Vol. 7, No. 3, Sept. 1999, pp. 380-391.
[6] Cheng-Ta Hsieh, Qing Wu, Chih-Shun Ding and Massoud Pedram, "Statistical sampling and regression analysis for RT-level power evaluation," IEEE Conf. on CAD, 1996, pp. 583-588.
[7] Subodh Gupta and Farid N. Najm, "Analytical model for high level power modeling of combinational and sequential circuits," IEEE Symposium on Low Power Design, 2000.
[8] Subodh Gupta and Farid N. Najm, "Energy-per-cycle estimation at RTL," ICCAD 97.
[9] Alessandro Bogliolo and Luca Benini, "Robust RTL power macromodels," IEEE Trans. on VLSI Systems, Vol. 6, No. 4, Dec. 1998, pp. 578-581.
[10] TANG Pu-shan, Theory and Methods of VLSI Computer Aided Design, Fudan University Publishing House, 1990 (in Chinese) [唐璞山, 《VLSI 计算机辅助设计理论和方法》, 复旦大学出版社, 1990].

About the authors:

ZHAO Wen-qing was born in 1950. He received the B.S. and M.S. degrees from the Physics Department of Fudan University in 1973 (1977 according to the Chinese-language biography) and 1983 respectively. He is now a professor in the E.E. Department of Fudan University. His research interests are R&D of VLSI CAD, including layout synthesis, verification and logic synthesis.

CUI Ming-dong was born in 1976. He received the B.S. degree from the Electronic Engineering Department of Fudan University in 1998. He is now studying for the M.S. degree in the CAD Lab, E.E. Department of Fudan University. His research interests are VLSI power estimation and heat dissipation analysis.

TANG Pu-shan was born in 1934. He graduated from the Physics Department of Fudan University, China, in 1953. Since then he has been working on research and teaching in semiconductor device physics, IC design and fabrication, and computer-aided design of ICs. He is now a professor and doctoral supervisor in the E.E. Department of Fudan University. His research interests are VLSI CAD methodology, including VLSI layout/verification algorithms and VLSI system design.
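Innovations of the algorithm:

The algorithm analyzes the properties of the input vectors and the relationship between power and input vectors, and reaches two conclusions: first, power is closely related to the average transition density of the input vectors; second, transitions on different input ports contribute differently to power. On this basis, a concise power equation model based on the first-order Taylor expansion is established. It introduces the power contribution factor and the power proportionality coefficient, and uses the signal transition density of each input port as the model parameters. When the library is built, the power contribution factors and the power proportionality coefficient are computed by simulation, using Monte-Carlo probabilistic simulation with a given confidence level and error tolerance. A special vector generation algorithm was designed. Randomly assigned per-port transition densities are added, so that the generated vectors are closer to practical conditions.

Detailed abstract:

With the development of CAD tools, designers today commonly design circuits at the RTL level and use suitable tools for synthesis. In current system design, different overall architectures may be functionally identical and yet differ greatly in power. To choose the better architecture with respect to power, designers used to rely on low-level power estimation after all circuit details had been designed. The two major drawbacks of this approach are that it takes too long and that iterating the design flow is too costly. Designers therefore urgently need to evaluate power correctly at a higher design level, the RTL level, and to choose a better design architecture on that basis.

In the paper "A Fast RTL Power Estimator for Combinational Circuit", the authors adopt a black-box approach, i.e. they estimate the power of an RTL module without knowing its internal structure. The estimation mode is to build the library once and evaluate many times. To this end, the properties of the input vectors and the relationship between power and input vectors are analyzed first, giving two conclusions: first, power is closely related, and roughly proportional, to the average transition density of the input vectors; second, transitions on different input ports contribute differently to power and must be treated separately. Based on these observations of the factors that affect power, a power equation model based on the first-order Taylor expansion is established. Its parameters are the signal transition densities of the individual input ports, and its coefficients are the power contribution factors and the power proportionality coefficient.

For building the library, a simulation-based method of computing the power contribution factors and the power proportionality coefficient is proposed. Borrowing the idea of Monte-Carlo simulation, probabilistic simulation is performed with a given confidence level and error tolerance, so that the computation finishes quickly and yields relatively accurate coefficients.

To make the verification more credible, the authors designed a special vector generation algorithm. Randomly assigned per-port transition densities are added, so that the generated vectors are closer to practical conditions.

Finally, the power model is used to build libraries and estimate power for the ISCAS85 benchmark circuits. A large number of input vectors are used to test and verify the model, and the results are compared with established international algorithms. At comparable accuracy, the time to build the library improves by orders of magnitude.

[Figures supplied separately: FIG 1 Transition vs Hamming distance, panels (a) and (b); FIG 2 Different contribution]

Figure and table captions (the Chinese translations supplied with the manuscript, rendered in English):

Table 1: Accuracy of power estimation
Table 2: Average error comparison with other methods
Table 3: Time comparison with other methods
FIG 1: Transition vs Hamming distance (relationship between the total number of node transitions and the Hamming distance)
FIG 2: Different contribution (different power contributions)

Response to the reviewers' revision comments

To the Editorial Office of 半导体学报 (Chinese Journal of Semiconductors):

We have read the review comments carefully and reply as follows:

1. Please see the attached pages of the paper.
2. The examples used in the paper are the ISCAS85 benchmarks, whose circuit types include priority encoders, ALUs, multipliers, adders and control logic, and which are highly representative. All the combinational circuit examples in the benchmark set were verified in this paper. On the one hand, we have not yet found better examples; on the other hand, results on the ISCAS85 benchmarks are the convention that most international power research uses for mutual comparison, so for the time being we use these circuits to make our case.
3. Please see the paper title.
4. For the detailed abstract and the authors' biographies in Chinese and English, please see the attached pages. A statement of funding support has been added; see the title footnote.
5. RTL power estimation is a relatively new problem and domestic research on it has only just begun. There has been little related literature in your journal in recent years, so most of the work we cite comes from abroad.
6. For the Chinese figure captions, table captions and references, please see the attached pages.
7. We have mailed the revised manuscript and all other required documents. All related documents will also be sent to your office by e-mail; please look out for them in the near future.

We agree to transfer the copyright of the paper; please send us the copyright transfer form.

Yours sincerely,
ZHAO Wen-qing, CUI Ming-dong, TANG Pu-shan
2001/5/31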