1999-2004 Presentations

ICCAD-04 Half-Day Tutorial

"Best Practices in Low-Power Design, Part I," M. Pedram, ICCAD, Nov. 2004.

Abstract -- In the last decade, a huge effort has been invested in developing a wide range of design solutions that help solve the power consumption problem for different types of electronic devices, components and systems. Some of those solutions turned out to be very practical and effective, thus finding their way into commercial products of various kinds. Other approaches, which sounded promising on paper, showed too many limitations to attract the attention of real designers. The objective of this tutorial is to offer the attendees some well-established, yet innovative recipes for addressing the power problem in real life. The first part of the tutorial will describe basic techniques, applicable at different levels of abstraction, for dynamic frequency and voltage control, as well as solutions for leakage power management.

DATE-04 Embedded Tutorial

"Distributed Multimedia System Design: A Holistic Perspective - Part III," M. Pedram, DATE, Feb. 2004.

Abstract -- Multimedia systems play a central part in many human activities. Due to the significant advances in VLSI technology, there is an increasing demand for portable multimedia appliances capable of handling advanced algorithms required in all forms of communication. Over the years, we have witnessed a steady move from stand-alone (or desktop) multimedia to deeply distributed multimedia systems. Whereas desktop-based systems are mainly optimized based on the performance constraints, power consumption is the key design constraint for multimedia devices that draw their energy from batteries. The overall goal of successful design is then to find the best mapping of the target multimedia application onto the architectural resources, while satisfying an imposed set of design constraints (e.g. minimum power dissipation, maximum performance) and specified QoS metrics (e.g. end-to-end latency, jitter, loss rate) which directly impact the media quality. This tutorial addresses a few fundamental issues that make the design process particularly challenging and offers a holistic perspective towards a coherent design methodology.

ASP-DAC-04 Tutorial

"Design and Runtime Techniques for Leakage Control and Minimization of CMOS VLSI Circuits in Active and Sleep Modes," F. Fallah and M. Pedram, ASP-DAC, Jan. 2004.

Abstract -- In many new designs, the leakage component of power consumption is comparable to the dynamic component. Many reports indicate that 50% or more of the total power consumption is due to transistor leakage, and this percentage will increase with technology scaling unless effective techniques are used to bring leakage under control. This tutorial will focus on circuit techniques and design methods to accomplish this goal. We will start the tutorial by describing the main sources of leakage in CMOS VLSI circuits and how these sources will scale with technology. Next we will review a number of leakage current scenarios (ACTIVE and SLEEP mode), types of leakage control solutions (DESIGN vs. RUNTIME based solutions) and expected performance impacts. We will then present a few examples of DESIGN-based techniques for subthreshold leakage control. More precisely, we will explain how technology mapping can be modified to reduce the leakage through concurrent assignment of threshold voltages and transistor sizes as well as library cell selection. We will then describe a precomputation-based guarding technique, which reduces both the leakage and dynamic power, and show the tool flow that may be used for applying it to an industrial VLIW processor. We will show how the leakage of combinational gates, flip-flops and bus drivers found in ASIC cell libraries can be reduced through circuit design and layout optimization techniques. The tutorial will continue by presenting RUNTIME mechanisms for subthreshold leakage control. More specifically, we will talk about forward and backward biasing techniques and the transistor stacking technique. We will next present an algorithm for finding the minimum leakage vector of a circuit and show how the results can be improved by adding more controllability to a given circuit. One advantage of this method is that it can be applied to a sequential circuit without any delay overhead.
We will also describe proven techniques for power gating and how to avoid potential power plane integrity problems. Finally, we will show how the gate-tunneling leakage can be reduced by using high-threshold, thick-oxide sleep transistors. Another method for reducing the gate-tunneling leakage is using dual-oxide technology. This method is analogous to the dual-threshold technique for reducing the subthreshold leakage. We will discuss how this method can be combined with the dual-threshold technique to reduce both subthreshold and gate-tunneling leakage.
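
As a rough illustration of the minimum leakage vector idea mentioned in the abstract, the sketch below exhaustively searches the input vectors of a tiny three-NAND circuit for the one with the lowest total leakage. It is not the algorithm presented in the tutorial; the circuit, the function names, and the per-state leakage values in `NAND2_LEAKAGE_NA` are all hypothetical, chosen only to reflect the usual trend that a NAND2 gate leaks least when both stacked NMOS transistors are off.

```python
from itertools import product

# Hypothetical leakage (in nA) of a 2-input NAND for each input state.
# These numbers are assumptions for illustration, not measured data.
NAND2_LEAKAGE_NA = {
    (0, 0): 10.0,  # both stacked NMOS devices off: lowest leakage
    (0, 1): 25.0,
    (1, 0): 20.0,
    (1, 1): 45.0,  # pull-down path on, PMOS devices leak
}


def nand(a, b):
    """Logic value of a 2-input NAND gate."""
    return 0 if (a and b) else 1


def circuit_leakage(a, b, c):
    """Total leakage of the toy circuit y = NAND(NAND(a, b), NAND(b, c))."""
    n1 = nand(a, b)
    n2 = nand(b, c)
    # Input state seen by each of the three gates.
    gate_inputs = [(a, b), (b, c), (n1, n2)]
    return sum(NAND2_LEAKAGE_NA[state] for state in gate_inputs)


def min_leakage_vector():
    """Brute-force search over all 2^3 primary-input vectors."""
    return min(product((0, 1), repeat=3), key=lambda v: circuit_leakage(*v))


best = min_leakage_vector()
print(best, circuit_leakage(*best))
```

Exhaustive search is only practical for a handful of inputs, since the number of vectors doubles with every added input.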

ASP-DAC-03 Tutorial

"Energy-Aware Networked Multimedia Systems: Modeling, Analysis and Optimization," R. Marculescu and M. Pedram, ASP-DAC, Jan. 2003.

Abstract -- In this tutorial, we address the fundamental issues in the design and optimization of modern mobile multimedia systems (from both hardware and software perspectives) and illustrate the potential power/performance trade-offs and their impact on media quality. Most notably, the transition from desktop multimedia to portable multimedia based on heterogeneous design platforms brings concurrency and communication as key players in system-level modeling, analysis and optimization of these systems. For complex multimedia systems composed of many heterogeneous components that interact and communicate, early power/performance estimation and QoS-based power management are the critical steps for judicious allocation of the on-chip resources. This is particularly important since the on-chip resources are very limited compared to the available resources in desktop multimedia systems. As a practical means, we will use examples from the design of the Apollo Testbed to illustrate concepts and methods for dynamic voltage and frequency scaling, power management and power-efficient encoding techniques. MPEG-2 will be featured as the driver application to illustrate the impact of different design choices on multimedia systems where the QoS requirements vary considerably and power and buffering resources are very limited. For such systems, the ability to explore many application-architecture mappings using different computational resources and communication schemes, while trying to satisfy tight QoS requirements, becomes of crucial importance.

Apollo Testbed

"Overview of the Apollo Testbed II," M. Pedram, Dec. 2003.

"Apollo: Adaptive power optimization and control for the land warrior," M. Pedram, Apr. 2001.

Abstract -- In my talk, I will provide an overview of the power management work at USC and the Apollo Testbed. This testbed is used as a platform to develop and/or evaluate various power optimization and management techniques targeted toward embedded systems.

Leakage Modeling and Minimization

"Precomputation-based Guarding for Dynamic and Leakage Power Reduction," A. Abdollahi, F. Fallah and M. Pedram, Jun. 2003.

"Leakage Current Reduction in Sequential Circuits by Modifying the Scan Chains," A. Abdollahi, F. Fallah and M. Pedram, Mar. 2003.

"Runtime Mechanisms for Leakage Current Reduction in CMOS VLSI Circuits," A. Abdollahi, F. Fallah and M. Pedram, Aug. 2002.

Dynamic Power Management

"Battery-Aware Power Management Based on Markovian Decision Processes," P. Rong and M. Pedram, Nov. 2002.

"Dynamic power management in a mobile multimedia system with guaranteed quality-of-service," M. Pedram, June 2001.

"Stochastic modeling of a power-managed system: construction and optimization," Q. Qiu and M. Pedram, November 2000.

Dynamic Voltage and Frequency Scaling

"Dynamic Voltage and Frequency Scaling Under a Precise Energy Model Considering Variable and Fixed Components of the System Power Dissipation," K. Choi, W-B. Lee, R. Soma and M. Pedram, Nov. 2004.

"Dynamic Voltage and Frequency Scaling based on Workload Decomposition," K. Choi, R. Soma and M. Pedram, Aug. 2004.

"Off-chip Latency-Driven Dynamic Voltage and Frequency Scaling for MPEG Decoding," K. Choi, R. Soma and M. Pedram, Jun. 2004.

"Fine-Grained Dynamic Voltage and Frequency Scaling for Precise Energy and Performance Trade-off based on the Ratio of Off-chip Access to On-chip Computation Times," K. Choi, R. Soma and M. Pedram, Feb. 2004.

"Frame-Based Dynamic Voltage and Frequency Scaling for an MPEG Decoder," K. Choi, K. Dantu, W-C. Cheng and M. Pedram, Nov. 2002.

Dynamic Workload Distribution in Ad Hoc Networks

"An Energy-Aware Simulation Model and Transaction Protocol for Dynamic Workload Distribution in Mobile Ad Hoc Networks," F. Ghasemi, P. Rong, and M. Pedram, June 2003.

"Extending the Lifetime of a Network of Battery-Powered Mobile Devices by Remote Processing: A Markovian Decision-based Approach," P. Rong and M. Pedram, Jun. 2003.

"Energy-Aware MPEG-4 FGS Streaming," K. Choi, M. Pedram and K. Kim, Jun. 2003.

Dynamic TFT LCD Backlight Scaling and DVI Encoding

"Concurrent Contrast and Brightness Scaling for a Backlit TFT-LCD Display," W-C. Cheng, Y. Hou and M. Pedram, Feb. 2004.

"Chromatic Encoding: a Low Power Encoding Technique for Digital Visual Interface," W-C. Cheng and M. Pedram, Mar. 2003.

Physical Design and Logic Synthesis

"Interconnect Design for Memory IC's," C-S. Hwang and M. Pedram, Feb. 2004.

"Technology Mapping and Packing for Coarse-grained, Anti-fuse Based FPGAs," C-W. Kang, A. Iranli and M. Pedram, Feb. 2004.

"Optimizing the Energy-Delay-Ringing Product in On-Chip CMOS Line Drivers," S. Abbaspour, M. Pedram and P. Heydari, Mar. 2003.

"Low-power Synthesis of FSMs with Mixed D & T Flip-Flops," A. Iranli, P. Rezvani and M. Pedram, Jan. 2003.

"Effective Capacitance for the RC Interconnect in VDSM Technologies," S. Abbaspour and M. Pedram, Jan. 2003.

"Technology Mapping for Low Leakage Power and High Speed with Hot-Carrier Effect Consideration," C-W. Kang and M. Pedram, Jan. 2003.

"A Graph-Theoretic Approach to Algebraic Kernel Extraction in Boolean Networks," P. Rezvani and M. Pedram, January 2001.

"Analysis of Jitter due to Power-Supply Noise in Phase-Locked Loops," P. Heydari and M. Pedram, May 2000.

"Analysis and Optimization of Ground Bounce in Digital CMOS Circuits," P. Heydari and M. Pedram, September 2000.

Static Timing and Signal Integrity Analysis

"Gate Delay Calculation Considering the Crosstalk Capacitances," S. Abbaspour and M. Pedram, Feb. 2004.

Low Power Encoding

"BEAM: Bus Encoding Based on Instruction-Set-Aware Memories," Y. Aghaghiri, M. Pedram and F. Fallah, Jan. 2003.

"Low power address bus encoding techniques," M. Pedram, May 2001.

"Memory bus encoding for low power: a tutorial," W-C. Cheng and M. Pedram, April 2001.

Temperature-dependent Signal Integrity Analysis and Optimization

"Analysis of Substrate Thermal Gradient Effects on Optimal Buffer Insertion," A. H. Ajami, M. Pedram and K. Banerjee, Nov. 2001.

"Analysis of Non-uniform Temperature-Dependent Interconnect Performance in High Performance ICs," A. Ajami, M. Pedram, K. Banerjee and L. van Ginneken, June 2001.

"Effects of Non-uniform Substrate Temperature on the Clock Signal Integrity in High Performance Designs," A. Ajami, K. Banerjee and M. Pedram, May 2001.

"Analysis and Optimization of Thermal Issues in High Performance VLSI," K. Banerjee, M. Pedram and A. Ajami, April 2001.

Abstract -- In my talk, I will focus on the analysis and modeling of non-uniform chip temperature profiles and study their effects on different aspects of signal integrity and performance in very high performance VLSI interconnects. A fundamental understanding of the heat transfer process inside a chip is necessary in order to provide some of the critical thermal boundary conditions for the interconnect lines. Consequently, a review of methods to calculate the thermal profiles of VLSI interconnect lines is given first. A non-uniform temperature-dependent distributed RC interconnect delay model is proposed next. The model has been applied to a wide variety of interconnect layouts and temperature distributions to quantify the impact of these thermal non-uniformities on signal integrity issues. Using this model, it is shown that clock distribution networks are the signal nets most vulnerable to the thermal non-uniformities of the substrate. Subsequently, a thermally driven near-zero skew clock routing methodology is proposed. Moreover, it is shown that the non-uniform substrate thermal profile can affect the optimal buffer insertion routines. As a result, new design guidelines are provided to reduce these effects on the optimality of the buffer insertion. From these examples, it becomes evident that many EDA flow optimization steps are affected by the non-uniformity of the substrate temperature.
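
To make the effect of a non-uniform temperature profile concrete, the following sketch computes the Elmore delay of a discretized RC line whose per-unit-length resistance rises linearly with local temperature. This is a simplified stand-in, not the delay model proposed in the talk; the linear resistance model, the function names, and all numerical values (`R0`, `C0`, `BETA`, `T_REF`, and the temperature profiles) are assumptions for illustration.

```python
# Assumed line parameters (illustrative values only).
LENGTH = 2e-3   # line length in meters
R0 = 1e5        # resistance per unit length at T_REF, ohm/m
C0 = 2e-10      # capacitance per unit length, F/m
BETA = 3e-3     # assumed temperature coefficient of resistance, 1/degC
T_REF = 25.0    # reference temperature, degC


def elmore_delay(temps, length=LENGTH, r0=R0, c0=C0, beta=BETA, t_ref=T_REF):
    """Elmore delay to the far end of an RC line split into len(temps) segments.

    temps[i] is the temperature of segment i, with segment 0 at the driver.
    Each segment's capacitance is lumped at its far end, so the resistance of
    segment i charges the capacitance of segments i..n-1.
    """
    n = len(temps)
    seg = length / n
    c_seg = c0 * seg
    delay = 0.0
    for i, temp in enumerate(temps):
        r_seg = r0 * (1.0 + beta * (temp - t_ref)) * seg
        delay += r_seg * (n - i) * c_seg
    return delay


def linear_profile(t_start, t_end, n):
    """Temperature at the midpoint of each of n segments along a linear gradient."""
    return [t_start + (t_end - t_start) * (i + 0.5) / n for i in range(n)]


N = 100
uniform = elmore_delay([85.0] * N)
hot_driver = elmore_delay(linear_profile(125.0, 45.0, N))
hot_receiver = elmore_delay(linear_profile(45.0, 125.0, N))
print(uniform, hot_driver, hot_receiver)
```

Under these assumptions, the three profiles share the same average temperature yet give different delays: heating the driver end costs more, because the resistance there charges more downstream capacitance.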

Low Power Design Optimization

"Microprocessor Power Analysis by Labeled Simulation," C-T Hsieh, K. Chen and M. Pedram, January 2000.

"Architectural Power Optimization by Bus Splitting," C-T Hsieh and M. Pedram, January 2000.

"High-level design challenges and solutions for low power systems," M. Pedram, October 1999.

"Low power design methodologies and techniques: an overview," M. Pedram, March 1999.

"Power analysis and optimization," M. Pedram, November 1997.

Abstract -- Driven by increased levels of device integration and complexity, together with higher device speed, power dissipation has become a crucial design concern, limiting the number of devices that can be put on a chip and dramatically affecting the packaging and cooling costs associated with ASICs. Power dissipation is an even bigger concern for the class of battery-powered personal computing devices and wireless communication systems. Yet, today's design methodologies and CAD tools do not adequately address this critical problem. In these talks, I will provide an overview of the research work at USC which focuses on developing low-power design methodologies and CAD tools for power modeling, analysis and optimization to address the growing demand for power management within VLSI circuits and deep submicron technologies.

Integrated Logical/Physical Design for Deep Submicron Circuits

"Buffered Routing Tree Construction under Buffer Placement Blockages," W. Chen and M. Pedram, March 2001.

"Post-Layout Timing-Driven Cell Placement Using an Accurate Net Length Model with Movable Steiner Points," A. Ajami and M. Pedram, January 2001.

"Simultaneous Gate Sizing and Fanout Optimization," W. Chen, C-T. Hsieh and M. Pedram, March 2001.

"Performance-Driven Concurrent Placement and Gate Sizing for Deep Submicron Circuits," W. Chen and M. Pedram, December 2000.

"PILOT: Placement with Integrated Logic Optimization for Timing," J. Lou and M. Pedram, November 1999.

"Integrated logical/physical design for DSM Circuits," A. Salek and M. Pedram, April 1999.

"Linking layout to logic synthesis: a unification-based approach," M. Pedram, February 1998.

Abstract -- Deep-submicron (DSM) design realities, such as the dominance of interconnect delays and the rise of signal integrity concerns, have forced IC designers and EDA vendors to re-think the existing design methodologies and tools. Three distinct approaches are evolving. The first is based on a constraint-driven forward synthesis approach with local iterations between physical planning and logic optimization. The second is based on performance-driven physical design followed by post-layout local optimizations such as logic restructuring and remapping, rewiring, buffering, and resizing. The third approach is based on early circuit partitioning and floorplanning followed by a unified, constraint-driven logical and physical design of each part. In my talk, I will make the case for the unification-based approach and present algorithms and techniques for solving two problems: simultaneous gate sizing and placement, and concurrent fanout optimization and global routing. I will conclude my talk with a synopsis of the missing links and an overview of our ongoing research work at USC.

Model Order Reduction using Balanced Truncation

"Balanced Truncation with Spectral Shaping for RLC Interconnects," P. Heydari and M. Pedram, January 2001.

"Model Order Reduction of Large Circuits Using Balanced Truncation Via the Arnoldi Method," P. Rabiei and M. Pedram, February 1999.

Abstract -- This talk focuses on the problem of developing a numerically stable and efficient algorithm for model reduction of large RLC networks using the frequency-weighted balanced truncation technique. The salient features of this algorithm include guaranteed stability of the reduced transfer function as well as availability of provable frequency-weighted error bounds. Such frequency weighting is essential to provide better control over the time-domain error of the reduced system. The first k largest singular values of the system are obtained using the Lanczos algorithm, and the Lyapunov equations are solved by an iterative Lyapunov equation solver. Experimental results demonstrate the higher accuracy of our technique compared to Krylov-subspace-based model reduction techniques and other truncated balanced realizations that do not use spectral shaping. Based on MATLAB simulations, the run-time of our method is only 5% more than that of PRIMA.

Design of Battery-Powered CMOS Circuits

"An interleaved dual-battery power supply for battery-operated electronics," M. Pedram, January 2000.

"Battery-powered digital CMOS design," M. Pedram, May 1999.

Abstract -- This talk focuses on the problem of maximizing the battery service life in battery-powered CMOS circuits. In particular, we have recently proposed an integrated model of the VLSI hardware and the battery sub-system that powers it. We have shown that, under this model and for a fixed operating voltage, the battery efficiency (or utilization factor) decreases as the average discharge current from the battery increases. The implication is that the battery life is a super-linear function of the average discharge current. Furthermore, even if the average discharge current remains the same, different discharge current profiles (distributions) may result in very different battery lifetimes. The maximum battery life is achieved when the variance of the discharge current distribution is minimized. Finally, we have demonstrated that accounting for the dependence of battery capacity on the average discharge current changes the shape of the energy-delay trade-off curve and hence the value of the operating voltage that results in the optimum energy-delay product for the target circuit. Consequently, we have proposed a more accurate metric (i.e., the battery discharge rate times delay product as opposed to the energy-delay product) for comparing various low power optimization methodologies and techniques targeted toward battery-powered electronics. Analytical derivations as well as simulation results demonstrate the importance of correct modeling of the battery-hardware system as a whole.
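
The claim that discharge profiles with the same average current can give different lifetimes can be illustrated with a small numerical sketch. It assumes a hypothetical efficiency model in which the utilization factor falls linearly with discharge current; this is not the integrated battery-hardware model described in the talk, and the function names, the slope `k`, the capacity, and the current values are all made up for illustration.

```python
def efficiency(current_a, k=0.5):
    """Assumed battery efficiency: falls linearly with discharge current.

    Only meaningful for currents below 1/k amperes.
    """
    return 1.0 - k * current_a


def lifetime_hours(profile, capacity_ah=1.0, k=0.5):
    """Battery lifetime for a periodic discharge profile.

    profile is a list of (current_in_amperes, fraction_of_time) pairs whose
    fractions sum to 1. The charge actually drawn from the battery per hour
    is the load current divided by the efficiency at that current.
    """
    drain_a = sum(frac * cur / efficiency(cur, k) for cur, frac in profile)
    return capacity_ah / drain_a


# Two profiles with the same 0.5 A average current.
constant = [(0.5, 1.0)]
pulsed = [(0.1, 0.5), (0.9, 0.5)]

print(lifetime_hours(constant))  # zero-variance profile
print(lifetime_hours(pulsed))    # same average, higher variance
```

With this assumed model, the effective drain `I / efficiency(I)` is convex in the current, so the constant profile outlasts the pulsed one even though both draw 0.5 A on average.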

Design Implications and Challenges of CMOS Scaling

More Moore: Manufacturing Challenges, System Drivers, and Design Technology Solutions for Late-Silicon Age and Beyond

A key challenge in designing integrated circuits that use nanoscale devices is the increase in parameter variability and leakage currents. As a result of dopant fluctuations, statistical process variations and leakage currents, the available noise margins and design headroom are becoming smaller. Additionally, these circuits dissipate a considerable amount of power even when they are idle. Other scaling challenges for the Late-Silicon Age (2012-2018) include:

Implementation of advanced nanoscale CMOS devices with enhanced drive current
Device and circuit modeling and engineering for 10-100GHz operations
Engineering manufacturable high-performance interconnect structures and addressing global wiring scaling issues
Enabling test of increasingly complex devices and structures
System-on-chip integration of embedded memory and mixed-signal/RF components
Management of leakage power consumption
Yield enhancement in light of cost-performance tradeoffs.

In my talk, I will review a range of possible solutions, including:

Reliable implementation platforms, including multi-core designs with embedded memory and mixed-signal circuitry, as well as power-efficient regular/tile-based system-on-chip designs exploiting regularity, concurrency, and adaptability
Performance-constrained statistical design optimization engines along with leakage power reduction
HW/SW-based fault-tolerant and self-correcting/self-repairing designs that target implementation platforms which operate under very low SNR and are non-deterministic, unpredictable and unreliable
Energy efficiency, system-level power delivery and thermal design to effectively handle the dissipation of delivered power
Design productivity improvements through RT-level design for manufacturing, system-level design entry, and increasing power and efficiency of design automation methods and tools.

The presentation concludes with the paradigm shift necessitated by evolving non-CMOS molecular-scale devices and circuit fabrics, and its implications for design methodologies and tools.

Massoud Pedram