Stochastic Approaches for Dynamic Thermal Management in High Performance Microprocessor Chips
A Stochastic Local Hot Spot Alerting Technique — In an ASPDAC-08 conference paper, we addressed the questions of how and when to identify and issue a hot spot alert in a microprocessor. These questions matter because temperature reports from thermal sensors may be erroneous or noisy, or may arrive too late for thermal management mechanisms to act in time to prevent chip failure. More precisely, we presented a stochastic technique for identifying and reporting local hot spots under probabilistic conditions induced by uncertainty in the chip junction temperature and the system power state. In particular, we introduced a stochastic framework for estimating the chip temperature and the power state of the system based on a combination of Kalman filtering (KF) and a Markov decision process (MDP) model. Experimental results demonstrated the effectiveness of the framework and showed that the proposed technique alerts about thermal threats accurately and in a timely fashion despite noisy or occasionally erroneous temperature sensor readings.
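To illustrate only the Kalman filtering half of such a framework, the following minimal Python sketch filters noisy sensor readings and raises an alert when the estimate plus an uncertainty margin reaches a threshold. The random-walk temperature model, the noise variances `q` and `r`, the `n_sigma` margin, and the names `ScalarKalman` and `hot_spot_alert` are all assumptions for illustration; the MDP power-state model is not represented.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """1-D Kalman filter for a random-walk junction-temperature model (assumed)."""
    x: float  # current temperature estimate (deg C)
    p: float  # variance of the estimate
    q: float  # process-noise variance per step (assumed)
    r: float  # sensor-noise variance (assumed)

    def update(self, z: float) -> float:
        # Predict: under a random walk the mean is unchanged; variance grows by q.
        p_pred = self.p + self.q
        # Correct: the Kalman gain weighs the prediction against the reading z.
        k = p_pred / (p_pred + self.r)
        self.x += k * (z - self.x)
        self.p = (1.0 - k) * p_pred
        return self.x


def hot_spot_alert(kf: ScalarKalman, z: float, threshold: float,
                   n_sigma: float = 2.0) -> bool:
    """Hypothetical rule: alert when estimate + n_sigma * std-dev reaches the limit."""
    est = kf.update(z)
    return est + n_sigma * kf.p ** 0.5 >= threshold
```

With `ScalarKalman(x=70.0, p=1.0, q=0.01, r=4.0)` and an 85 deg C threshold, a single spurious 100 deg C reading only moves the estimate to roughly 76 deg C and raises no alert, whereas a sustained run of 90 deg C readings eventually does. Filtering trades a short detection delay for robustness to isolated bad readings.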
Continuous Frequency Adjustment Technique Based on Dynamic Workload Prediction — In a VLSI Design-08 conference paper, we presented a continuous frequency adjustment (CFA) technique that adjusts the operating frequencies of the various functional blocks in the system at very fine granularity so as to minimize energy while meeting a performance constraint. A key feature of the technique is that the workload characteristics of the functional blocks are captured at runtime to generate a continuously adjusted frequency value, thereby eliminating the delay and energy penalties incurred by transitions between power-saving modes. The workload prediction is accomplished by solving an initial value problem (IVP). Applying CFA to a real-time system in 65nm CMOS technology, we demonstrated the effectiveness of the proposed technique, reporting 13.6% energy savings under a performance constraint.
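The text does not state which IVP is solved, so the Python sketch below is only a hypothetical example of the idea. It assumes the predicted normalized workload `w` of a block relaxes toward the latest observed utilization `u` according to dw/dt = -lam * (w - u), solves that IVP with forward Euler, and maps the result linearly onto a frequency range. The ODE, the rate `lam`, the linear mapping, and the function names are assumptions.

```python
def predict_workload(w0: float, u: float, lam: float, dt: float, steps: int) -> float:
    """Forward-Euler solution of the assumed IVP dw/dt = -lam * (w - u), w(0) = w0.

    w is the predicted normalized workload (0..1) of one functional block and
    u is the most recently observed utilization.
    """
    w = w0
    for _ in range(steps):
        w += dt * (-lam * (w - u))  # one explicit Euler step
    return w


def continuous_frequency(w: float, f_min: float, f_max: float) -> float:
    """Map the predicted workload to any frequency in [f_min, f_max] (linear, assumed)."""
    w = min(max(w, 0.0), 1.0)  # clip to the valid workload range
    return f_min + (f_max - f_min) * w
```

Because the returned frequency varies continuously with `w` rather than snapping to a few discrete modes, small workload changes produce small frequency changes, which is the property the paragraph attributes to CFA.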
A Unified Framework for System-level Design: Modeling and Performance Optimization of Scalable Networking System — In an ISQED-07 conference paper, we presented a new unified modeling framework, called the extended queuing Petri net (EQPN), which combines extended stochastic Petri net and G/M/1 queuing models. The framework supports the design of reliable systems at design time while improving the accuracy and robustness of power and temperature optimization for high-speed scalable networking systems. The EQPN model is used to represent the performance behavior of the system and, through mathematical programming formulations, to minimize its power consumption under performance constraints. Modeling the system with the EQPN thus enables designers to arrive at a reliable, optimized system at the beginning of the design cycle. The proposed model was compared with existing stochastic models using real simulation data.
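The G/M/1 ingredient can be illustrated with the standard textbook result: if A*(s) is the Laplace-Stieltjes transform of the interarrival-time distribution and mu the service rate, the mean sojourn time is 1/(mu(1 - sigma)), where sigma is the root in (0, 1) of sigma = A*(mu(1 - sigma)). The Python sketch below computes sigma by fixed-point iteration and then, as a toy stand-in for the mathematical programming step, picks the lowest-power service rate that meets a delay bound. The cubic power model, the candidate rates, and all function names are assumptions; the Petri net part of the EQPN is not modeled.

```python
import math
from typing import Callable, Iterable, Optional

# Laplace-Stieltjes transform A*(s) of the interarrival-time distribution.
LST = Callable[[float], float]


def gm1_sigma(lst: LST, mu: float, tol: float = 1e-12, max_iter: int = 100_000) -> float:
    """Root in (0, 1) of sigma = A*(mu * (1 - sigma)), found by fixed-point iteration."""
    sigma = 0.0  # starting at 0 converges to the smallest root (the one in (0, 1) if rho < 1)
    for _ in range(max_iter):
        nxt = lst(mu * (1.0 - sigma))
        if abs(nxt - sigma) < tol:
            return nxt
        sigma = nxt
    raise RuntimeError("fixed-point iteration did not converge")


def gm1_response_time(lst: LST, mu: float) -> float:
    """Mean sojourn time (waiting + service) in a G/M/1 queue."""
    return 1.0 / (mu * (1.0 - gm1_sigma(lst, mu)))


def min_power_rate(lst: LST, lam: float, rates: Iterable[float], delay_bound: float,
                   power: Callable[[float], float] = lambda mu: mu ** 3) -> Optional[float]:
    """Lowest-power stable rate meeting the delay bound (assumed power ~ mu^3)."""
    feasible = [mu for mu in rates
                if mu > lam and gm1_response_time(lst, mu) <= delay_bound]
    return min(feasible, key=power) if feasible else None


def mm1(lam: float) -> LST:
    """Exponential interarrivals: A*(s) = lam / (lam + s)."""
    return lambda s: lam / (lam + s)


def dm1(lam: float) -> LST:
    """Deterministic interarrivals: A*(s) = exp(-s / lam)."""
    return lambda s: math.exp(-s / lam)
```

As a sanity check, exponential interarrivals reduce the model to M/M/1, where sigma equals the utilization lam/mu and the sojourn time is 1/(mu - lam); deterministic arrivals at the same rate give a shorter sojourn time.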
Minimizing Power Dissipation during Write Operation to Register Files — In an ISLPED-07 conference paper, we introduced a power reduction mechanism for the write operation in register files (RegFiles), which adds a conditional charge-sharing structure to the pair of complementary bit-lines in each column of the RegFile. Because the read and write ports of the RegFile are implemented separately, it is possible to avoid pre-charging the bit-line pair for consecutive writes. More precisely, when consecutive writes store the same value into cells in the same column of the RegFile, the energy consumed by pre-charging the bit-line pair can be eliminated. Conversely, when consecutive writes store opposite values, the energy consumed in charging the bit-line pair can be reduced through charge sharing. Motivated by these observations, we modified the bit-line structure of the RegFile write ports by removing the per-cycle bit-line pre-charging and employing conditional, data-dependent charge sharing. Experimental results on a set of SPEC2000INT and MediaBench benchmarks showed average power savings of 61.5% with a 5.1% area overhead and a 16.2% increase in write access delay. The lower power dissipation also resulted in a lower substrate temperature in the RegFile.
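A first-order switched-capacitance model makes the two observations concrete. The Python sketch below assumes each bit-line has capacitance `c` and full swing `v`, and ignores driver, leakage, and control overheads. With per-cycle pre-charging, every write recharges one line from 0 to V and draws C*V^2 from the supply. Without pre-charging, a repeated value costs nothing, and for an opposite value, shorting the pair first equalizes both lines at V/2, so the supply only lifts one line from V/2 to V, drawing C*V^2/2. This is an idealized illustration, not the circuit model or the measured results of the paper, and the function names are hypothetical.

```python
from typing import Sequence


def conventional_write_energy(bits: Sequence[int], c: float, v: float) -> float:
    """Per-cycle pre-charge: each write recharges one bit-line from 0 to V (C*V^2)."""
    return len(bits) * c * v * v


def charge_sharing_write_energy(bits: Sequence[int], c: float, v: float,
                                initial: int = 0) -> float:
    """No per-cycle pre-charge, with conditional charge sharing on a value change.

    Same value as the previous write: the pair already holds the right levels, so 0.
    Opposite value: charge sharing brings both lines to V/2 at no supply cost, then
    the supply raises one line from V/2 to V, drawing C*(V/2)*V = C*V^2/2.
    """
    energy, prev = 0.0, initial  # `initial` is the value the pair holds at the start
    for b in bits:
        if b != prev:
            energy += 0.5 * c * v * v
        prev = b
    return energy
```

For the column write sequence 1, 1, 1, 0, 0, 1 starting from 0, this idealized model gives 1.5 C*V^2 with charge sharing versus 6 C*V^2 with per-cycle pre-charging. Real savings depend on the benchmarks' data patterns and the added circuitry.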
Active Bank Switching for Temperature Control of the Register File in a Microprocessor — In a GLS-VLSI-07 paper, we described an effective thermal management scheme, called active bank switching, for controlling the temperature of the register file in a microprocessor. The idea is to divide the physical register file into two equal-sized banks and to alternate between the two banks when allocating new registers to instruction operands. Experimental results showed that this periodic active bank switching scheme achieves a 3.4°C reduction in steady-state temperature with only a 0.75% average performance penalty.
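The allocation policy can be sketched as a register free list split into two banks, where new registers are taken from the currently active bank and the active bank flips periodically, letting the idle bank cool. In the Python sketch below, the switching period measured in cycles, the fallback to the inactive bank when the active one is exhausted, and the class name `BankedRegisterFile` are assumptions rather than details taken from the paper.

```python
from collections import deque
from typing import Optional


class BankedRegisterFile:
    """Physical register file split into two equal banks with periodic switching."""

    def __init__(self, num_regs: int, period: int) -> None:
        self.half = num_regs // 2
        # Bank 0 holds registers [0, half), bank 1 holds [half, 2*half).
        self.free = [deque(range(self.half)), deque(range(self.half, 2 * self.half))]
        self.active = 0
        self.period = period  # cycles between bank switches (assumed)
        self.cycle = 0

    def tick(self) -> None:
        """Advance one cycle; flip the active bank every `period` cycles."""
        self.cycle += 1
        if self.cycle % self.period == 0:
            self.active ^= 1

    def allocate(self) -> Optional[int]:
        """Take a free register, preferring the active bank; None means stall."""
        for bank in (self.active, self.active ^ 1):
            if self.free[bank]:
                return self.free[bank].popleft()
        return None

    def release(self, reg: int) -> None:
        """Return a register to the free list of the bank it belongs to."""
        self.free[0 if reg < self.half else 1].append(reg)
```

The fallback keeps the pipeline from stalling when one bank runs out of free registers, at the cost of occasionally writing to the bank that is supposed to be cooling. A stricter variant could stall instead.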
Dynamic Thermal Management for MPEG-2 Decoding — In an ISLPED-06 paper, we presented an effective dynamic thermal management (DTM) scheme for MPEG-2 decoding that allows some degree of spatiotemporal quality degradation. Given a target MPEG-2 decoding time, we dynamically select either an intra-frame spatial degradation or an inter-frame temporal degradation strategy to ensure that the microprocessor chip stays in a thermally safe state of operation, albeit with a certain amount of image/video quality loss. For our experiments, we used the MPEG-2 decoder program from MediaBench and modified and combined Wattch and HotSpot for the power and thermal simulations and measurements, respectively. Our experimental results demonstrated that a thermally safe state can be achieved with, on average, a spatial quality degradation of 0.12 RMSE and a frame drop rate of 12.5%.
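The selection step can be pictured as choosing the highest-quality decoding option that still fits the per-frame time budget at the current, possibly throttled, clock rate. The Python sketch below is a hypothetical policy, not the algorithm of the paper. It assumes that the clock is scaled by a fixed `throttle` factor whenever the chip is at or above its thermal limit and that per-frame cycle counts for full-quality and spatially degraded decoding are known in advance. All parameter and function names are illustrative.

```python
from enum import Enum


class Strategy(Enum):
    FULL = "full quality"
    SPATIAL = "intra-frame spatial degradation"
    TEMPORAL = "inter-frame temporal degradation (drop frame)"


def choose_strategy(full_cycles: float, spatial_cycles: float, freq_hz: float,
                    budget_s: float, temp_c: float, temp_limit_c: float,
                    throttle: float = 0.5) -> Strategy:
    """Pick the highest-quality option that meets the per-frame time budget.

    Assumes the clock is scaled by `throttle` while the chip is at or above
    its thermal limit and that cycle counts per option are known beforehand.
    """
    f = freq_hz * throttle if temp_c >= temp_limit_c else freq_hz
    if full_cycles / f <= budget_s:
        return Strategy.FULL
    if spatial_cycles / f <= budget_s:
        return Strategy.SPATIAL
    return Strategy.TEMPORAL  # even degraded decoding misses the deadline: drop frame
```

Ordering the checks from full quality down to frame dropping means temporal degradation is used only when spatial degradation alone cannot meet the target decoding time.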
Stochastic Dynamic Thermal Management: A Markovian Decision-based Approach — In an ICCD-06 paper, we introduced a stochastic DTM technique for high-performance VLSI systems, with special attention to the uncertainty in temperature observation. More specifically, we presented a stochastic thermal management framework that improves the accuracy of decision making in DTM; the framework performs dynamic voltage and frequency scaling to minimize total power dissipation and on-chip temperature. Multi-objective optimization with the aid of a mathematical programming solver was used to reduce the operating temperature. Experimental results on a 32-bit embedded RISC processor demonstrated the effectiveness of the technique and showed that the proposed algorithm ensures thermal safety under performance constraints.
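To show the general shape of a Markov decision-based DVFS policy, the Python sketch below solves a toy MDP by value iteration. This is a swapped-in solution method, not the mathematical programming formulation used in the paper. Thermal states are assumed to be fully observed, so the observation uncertainty that the paper emphasizes is not modeled; handling it would require, for example, a belief-state formulation. The two thermal states, the two voltage/frequency levels, and every probability and cost are made-up numbers.

```python
from typing import Dict, Tuple

STATES = ("cool", "hot")   # hypothetical discretized thermal states
ACTIONS = ("low", "high")  # hypothetical voltage/frequency levels
# P[s][a][s2]: probability of moving to thermal state s2 (toy numbers).
P = {
    "cool": {"low": {"cool": 0.9, "hot": 0.1}, "high": {"cool": 0.7, "hot": 0.3}},
    "hot": {"low": {"cool": 0.7, "hot": 0.3}, "high": {"cool": 0.1, "hot": 0.9}},
}
POWER = {"low": 1.0, "high": 2.0}
PERF_PENALTY = {"low": 3.0, "high": 0.0}     # cost of running slower
THERMAL_PENALTY = {"cool": 0.0, "hot": 5.0}  # cost of being in a hot state
COST = {s: {a: POWER[a] + PERF_PENALTY[a] + THERMAL_PENALTY[s] for a in ACTIONS}
        for s in STATES}


def q_value(s: str, a: str, V: Dict[str, float], gamma: float) -> float:
    """Immediate cost plus discounted expected cost-to-go."""
    return COST[s][a] + gamma * sum(p * V[t] for t, p in P[s][a].items())


def value_iteration(gamma: float = 0.9,
                    tol: float = 1e-9) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Minimize expected discounted cost; return state values and a greedy policy."""
    V = {s: 0.0 for s in STATES}
    while True:
        new_v = {s: min(q_value(s, a, V, gamma) for a in ACTIONS) for s in STATES}
        done = max(abs(new_v[s] - V[s]) for s in STATES) < tol
        V = new_v
        if done:
            break
    policy = {s: min(ACTIONS, key=lambda a: q_value(s, a, V, gamma)) for s in STATES}
    return V, policy
```

With these toy numbers the optimal policy runs at the high level when cool and throttles to the low level when hot, with values of about 38.9 and 45.9. Different cost weights shift this trade-off between power, performance, and temperature.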