An Efficient Low-Power Instruction Scheduling Algorithm for Embedded Systems

Contents
• Introduction
• Bus Power Model
• Related Works
• Motivation
• Figure-of-Merit
• Algorithm Overview
• Random Scheduling
• Schedule Selection
• Experimental Result
• Conclusion

Introduction (1/3)
• The nomadic life-style is spreading widely these days thanks to the rapid progress in microelectronics technologies.
• Electronic equipment has become not only smaller but also smarter.
• Low-power electronics will play a key role in this nomadic age.
• The figure-of-merit for the nomadic age = (intelligence) / (size * cost * power).

[Figures: electronic equipment toward smaller size; nomadic tools]

Introduction (2/3)
• ASIPs combine high programmability with an application-specific hardware structure.
• Because ASIPs have high configurability and productivity, they offer a time-to-market advantage.
• A retargetable compiler is an essential tool for application analysis and code generation in the design of ASIPs.
• By equipping a retargetable compiler with an efficient scheduling algorithm, low-power code can be generated.

[Figure: compiler-in-the-loop architecture exploration]

Introduction (3/3)
• The power consumption of the ASIP instruction memory was found to be 30% or more of the entire processor power consumption.
• Minimizing power consumption on the instruction bus is therefore critical in low-power ASIP design.

[Figure: power distribution for ICORE]

Bus Power Model (1/2)
• Bit transitions on bus lines are one of the major contributors to power consumption.
• Traditional power models use only a self-capacitance model.
• With the development of nanometer technologies, coupling capacitance has become significant.
• As a result, handling the crosstalk problem on buses has become an important issue.

[Figures: self-capacitance model; self- and coupling-capacitance model]

Bus Power Model (2/2)
• Crosstalk types
• Bus power model

Instruction Recoding
• Instruction recoding analyzes the performance pattern of the application program and reassigns the binary encodings.
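The self- and coupling-capacitance bus power model above can be sketched numerically. The following is a minimal illustration only: the coupling weight `(d_i - d_j)^2` (opposite-direction switching on adjacent lines costs most, same-direction switching costs nothing) and the `coupling_ratio` default are illustrative assumptions, not the exact parameters used in the paper.

```python
def bus_transition_cost(prev_word, next_word, width=8, coupling_ratio=2.0):
    """Relative switching cost of driving next_word on the bus after prev_word."""
    prev = [(prev_word >> i) & 1 for i in range(width)]
    nxt = [(next_word >> i) & 1 for i in range(width)]
    delta = [n - p for p, n in zip(prev, nxt)]  # per-line change: -1, 0, or +1

    # Self-capacitance term: one unit of cost per toggled bus line.
    self_cost = sum(abs(d) for d in delta)

    # Coupling-capacitance term over adjacent line pairs: zero when both
    # lines switch the same way, maximal when they switch in opposition.
    coupling_cost = sum((delta[i] - delta[i + 1]) ** 2
                        for i in range(width - 1))

    return self_cost + coupling_ratio * coupling_cost
```

Under this model, scheduling consecutive instructions whose encodings differ in few bits (and avoid opposite toggles on neighboring lines) directly reduces the cost.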
• Histogram graphs are used to analyze the performance pattern of the application.
• Chattopadhyay et al. obtained an initial solution using MWP and then applied simulated annealing starting from that solution.

[Figure: histogram graph]

Cold Schedule
• C. Su et al. first proposed cold scheduling.
  – How is control dependency reflected in the SCG?
  – Why use MST and simulated annealing as post-processing?
  – TSP is a better choice.

Cold Schedule
• K. Choi et al. formulated cold scheduling as a TSP problem.
  – A reasonable approach.
• C. Lee et al. extended cold scheduling to VLIW architectures.

Motivation (1/2)

                        Recoding                    Cold-Scheduling
Input                   Instruction binary format   Instruction sequence
Output                  Recoded instruction binary  Instruction order
Optimization scope      Global                      Local
Considered inst. fields Partial fields              All fields

Comparison between recoding and cold scheduling

Motivation (2/2)

[Figures: (a) different scheduling results; (b) constructed histogram graphs; (c) optimal recoding results]

Figure-of-Merit
• Maximizing the variance of transition edge weights increases the efficiency of recoding.
• The larger the sum of self-loop edge weights, the greater the power-saving effect of a code sequence.
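The two figure-of-merit ingredients named above can be computed directly on a histogram graph stored as a mapping from opcode-pair edges to transition counts. Treating the FM as the plain pair (transition-weight variance, self-loop weight sum) is an assumption for illustration; the extracted slide text does not give the exact combining formula.

```python
def fom_components(histogram):
    """histogram: {(src_opcode, dst_opcode): transition_count}.

    Returns (variance of transition-edge weights, sum of self-loop weights).
    """
    # Self-loop edges model an instruction followed by itself: zero bus toggles.
    self_loop_sum = sum(w for (u, v), w in histogram.items() if u == v)

    # Population variance over the remaining (transition) edge weights.
    trans = [w for (u, v), w in histogram.items() if u != v]
    if not trans:
        return 0.0, self_loop_sum
    mean = sum(trans) / len(trans)
    variance = sum((w - mean) ** 2 for w in trans) / len(trans)
    return variance, self_loop_sum
```

A schedule with a higher variance concentrates transitions on few edges, which recoding can then make cheap; a higher self-loop sum means more repeated opcodes that cost nothing regardless of encoding.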
[Figure: figure-of-merit formula]

Algorithm Overview
• The presented FM is a global function.
• Global instruction scheduling is difficult to implement.
• We solve the optimization problem using random schedule gathering and schedule selection.

[Figure: random scheduling followed by schedule selection]

Random Scheduling

Make_Schedules_for_BBs (BB_SET[ ])
begin
  for each BB in BB_SET[ ] do
  begin
    list_schedule_solution = LIST_SCHEDULE (BB);
    latency_UB = LATENCY (list_schedule_solution);
    Insert list_schedule_solution into Schedules_for_BBs[BB];
    for i = 0 until ITERATION_COUNT (BB) do
    begin
      new_schedule = RANDOM_SCHEDULE (BB);
      acceptable = False;
      if (LATENCY (new_schedule) <= latency_UB) then
      begin
        acceptable = True;
        for each schedule solution s in Schedules_for_BBs[BB] do
          if (LATENCY (s) == LATENCY (new_schedule)) then
          begin
            similarity_measure = COMPARE (s, new_schedule);
            if (similarity_measure > Threshold * LATENCY (new_schedule)) then
            begin
              acceptable = False;
              break;
            end
          end
        if (acceptable) then
          Insert new_schedule into Schedules_for_BBs[BB];
      end
    end
  end
  return Schedules_for_BBs[ ];
end

• Considerations
  – Runtime performance
  – BB size and iteration count
  – Differences (similarity) between random schedules

Schedule Selection (1/3)
• Problem formulation
• The global histogram graph can be decomposed into local histograms.
• So we can consider a divide-and-conquer algorithm.

[Figure: merge of histogram graphs]

Schedule Selection (2/3)
• NP-hardness
  – To maximize the global variance, we must consider not only the sum of the local variances but also the covariance of every pair of local histograms.

Schedule Selection (3/3)
• We used a dynamic programming method to achieve local cost maximization via a bottom-up approach.
• For further optimization, we used simulated annealing.

[Figure: greedy selection algorithm]

Experimental Result
• We used PCC as our measure of performance.
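The NP-hardness argument above rests on the fact that the variance of the merged (global) histogram is not simply the sum of the per-basic-block variances; the cross terms (covariances) between basic blocks matter. A small numeric illustration, with invented edge weights chosen purely to show the effect:

```python
def variance(xs):
    """Population variance of a list of edge weights."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

# Each basic block contributes weights to the same three histogram edges;
# the global histogram is their element-wise sum.
bb1 = [4, 0, 2]                              # local weights, basic block 1
bb2 = [0, 4, 2]                              # local weights, basic block 2
merged = [a + b for a, b in zip(bb1, bb2)]   # global histogram: [4, 4, 4]

# Both local histograms have nonzero variance, yet their negative
# covariance cancels it entirely in the merged histogram, so picking
# schedules by local variance alone can be arbitrarily misleading.
```

This is why the selection step cannot score each basic block's candidate schedules independently, and why a dynamic-programming pass followed by simulated annealing is used instead of a purely local greedy choice.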
[Table: comparison of PCC values]

Conclusion
• We presented a new instruction scheduling algorithm for low-power code synthesis.
• It is an exhaustive method for generating low-power code in an application-specific domain.
• However, advances in computing power make our method practical.