# Integrated Logical Physical Design for DSM Circuits

#### Massoud Pedram

Department of EE-Systems, University of Southern California, Los Angeles, California

- Motivation and Background
- **FANROUT Algorithm**
- Results and Discussion
- SCD Algorithm
- Results and Discussion
- Conclusions

#### **Prior Work**

- **Fanout Optimization**
  - Berman '89, Singh '90, Touati '90, van Ginneken '90, Vaishnav '93, Kung '98
- **Performance-driven Routing**
  - Rao '92, Boese '93, Cong '93, Vittal '94, Lillis '96, Cong '97
- Concurrent fanout optimization and Steiner tree routing
  - Okamoto '96

#### **Fanout Optimization Problem**

- Distribute a signal to a set of sinks with known loads and required times so as to maximize the required time at the root of the fanout tree
- A logic-level operation with little access to (or use of) routing information
- NP-Complete for the general case

# LT-Tree: Logical Structure

- Sinks: with given required times
- **Buffers: with given strengths**
- The connection topology must be determined
- **LT-Tree Type-I:**
  - Every buffer is connected to at most one other buffer
  - No buffer has a right sibling
  - Sinks with larger required times are placed further from the root of the tree

[Figure legend: Sink, Node]

#### LT-Tree Based Fanout Opt.
- If the sink loads are all equal, there exists an optimal LT-Tree such that the sinks with larger required times are placed further from the root
- The LT-TREE algorithm is based on dynamic programming; its complexity is $O(n^2)$

#### **Routing Tree Construction**

- Route a signal to a set of sinks with known loads, required times, and positions so as to maximize the required time at the driver
- A physical design operation with little power to change the logical structure of the circuit
- NP-Complete for the general case
- Sinks: with given required times, loads, and positions
- Lengths and connection topology must be determined
- There exists an RSMT (rectilinear Steiner minimal tree) for a terminal set where all Steiner points are located on the Hanan grid graph
- In the P-TREE algorithm, the branching factor at every Steiner point is exactly three

[Figure legend: Hanan Point, Sink, Node]

# P-Tree Based Routing Opt.

- For a given sink order, the dynamic programming-based P-TREE algorithm computes the set of all RSTs with non-dominated required time and total capacitance
- The complexity of the P-TREE algorithm is $O(n^5)$

# Solution Curves

- At every step of DP, a two-dimensional solution curve is generated for every subproblem rooted at some grid point
- Building and pruning the solution curves are the two major operations in FANROUT

[Figure: A Solution Curve (axis label: Required Time)]

# **Properties of FANROUT**

- Finds the solution with the maximum required time at the root, subject to the given sink order and the structure of LT-Tree and P-Tree
- Does not depend on a particular gate or wire delay model
- Has polynomial time complexity, albeit a high one

# Heuristic FANROUT

- FANROUT uses heuristics to achieve a lower runtime:
  - Limit the number of Hanan points to *g* points
  - Allow no more than *k* fanouts for every buffer
  - Use a heuristic implementation of the P-TREE algorithm
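
The two solution-curve operations, building and pruning, can be sketched as follows. This is a minimal illustration and not the FANROUT implementation: it assumes each point on a curve is a hypothetical `(required_time, capacitance)` pair, that a point is dominated when another point has at least as large a required time at no more capacitance, and that joining two subtrees takes the minimum of the required times and the sum of the capacitances. The function names `prune` and `combine` are made up for this example.

```python
from typing import List, Tuple

# Hypothetical representation of one point on a solution curve.
Solution = Tuple[float, float]  # (required_time, capacitance)


def prune(curve: List[Solution]) -> List[Solution]:
    """Drop dominated points: larger required time and smaller capacitance win."""
    # Sort by capacitance ascending; on ties, put the larger required time first.
    ordered = sorted(curve, key=lambda s: (s[1], -s[0]))
    kept: List[Solution] = []
    best_req = float("-inf")
    for req, cap in ordered:
        # A point survives only if it improves required time over all cheaper points.
        if req > best_req:
            kept.append((req, cap))
            best_req = req
    return kept


def combine(left: List[Solution], right: List[Solution]) -> List[Solution]:
    """Build the curve of a branch point from the curves of its two subtrees.

    Assumption: the required time at the branch is the minimum over the
    subtrees, and the capacitance seen at the branch is their sum.
    """
    merged = [(min(r1, r2), c1 + c2) for r1, c1 in left for r2, c2 in right]
    return prune(merged)
```

Pruning after every combination keeps each curve small, which is what makes a dynamic program over many subproblems tractable in practice.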
#### **SCD: Problem Definition**

- Given a mapped and placed circuit with an allowed range of gate sizes, find the best local displacement and size for each gate in the circuit so as to minimize the circuit delay

#### **Previous Work**

- In-Place Continuous Gate Sizing
  - Fishburn '85, Cirit '87, Berkelaar '90, Sapatnekar '93, Kung '96
- In-Place Discrete Gate Sizing
  - Chan '90, Li '93, Coudert '96
- Gate Sizing and Relocation
  - Chuang '94

# Delay Model, Cont'd

- Gate Sizing Model

$$dint_{i,j}(z_j) = \alpha 1_{i,j} \cdot z_j + \beta 1_{i,j}$$

$$rdr_{i,j}(z_j) = \frac{\alpha 2_{i,j}}{z_j} + \beta 2_{i,j}$$

$$cin_{i,j}(z_j) = \alpha 3_{i,j} \cdot z_j + \beta 3_{i,j}$$

- Wire Load Estimation

$$cnet_{i} = \rho \cdot [C_{hor}(xnet_{i,\max} - xnet_{i,\min}) + C_{ver}(ynet_{i,\max} - ynet_{i,\min})]$$

$$rnet_{i} = \rho \cdot [R_{hor}(xnet_{i,\max} - xnet_{i,\min}) + R_{ver}(ynet_{i,\max} - ynet_{i,\min})]$$

# **Detailed Delay Model**

- **Elmore Delay Model**
  - Non-convex
  - Nonlinear

$$\begin{aligned}
d_{i,j} &= dint_{i,j}(z_j) + rdr_{i,j}(z_j) \cdot \{ \rho \cdot C_{hor}(xnet_{j,\max} - xnet_{j,\min}) \\
&+ \rho \cdot C_{ver}(ynet_{j,\max} - ynet_{j,\min}) + \sum_{g_k \in fanout(g_j)} cin_{j,k}(z_k) \} \\
&+ \rho \cdot \{ R_{hor}(xnet_{j,\max} - xnet_{j,\min}) + R_{ver}(ynet_{j,\max} - ynet_{j,\min}) \} \\
&\cdot \sum_{g_k \in fanout(g_j)} cin_{j,k}(z_k)
\end{aligned}$$

#### Three Optimizations

#### Steps and Methods

- Reposition the cells directly driven by the cells on the *k* most-critical paths
  - Use linear programming (LP)
- Size down the cells directly driven by the cells on the *k* most-critical paths
  - Use geometric programming (GP)
- Simultaneously size and place the cells on the *k* most-critical paths
  - Use generalized geometric programming (GGP)

# **Neighbor Repositioning**

#### Linear Programming (LP)

minimize
$t_{cycle}$ s.t.

$$a_j \ge a_i + d_{i,j} \quad \forall (v_i, v_j) \in A$$

$$a_j \le t_{cycle} \quad \forall v_j \in \text{primary outputs and } v_j \in C(k)$$

$$a_j \le \gamma T_{critical} \quad \forall v_j \in \text{primary outputs and } v_j \notin C(k)$$

$$a_j \ge T_{start} \quad \forall v_j \in \text{primary inputs}$$

$$|x_i - x_i'| \le \Delta_x \quad \forall v_i \in Ne(1)$$

$$|y_i - y_i'| \le \Delta_y \quad \forall v_i \in Ne(1)$$

- $\gamma$: constant, $0 \le \gamma \le 1$
- $T_{critical}$: constant, longest path delay before this step

# **Neighbor Resizing**

#### Geometric Programming

minimize $t_{cycle}$ s.t.

$$a_j \ge a_i + d_{i,j} \quad \forall (v_i, v_j) \in A$$

$$a_j \le t_{cycle} \quad \forall v_j \in \text{primary outputs and } v_j \in C(k)$$

$$a_j \le \lambda T_{critical} \quad \forall v_j \in \text{primary outputs and } v_j \notin C(k)$$

$$a_j \ge T_{start} \quad \forall v_j \in \text{primary inputs}$$

$$|z_i - z_i'| \le \Delta_z \quad \forall v_i \in Ne(1)$$

- $\lambda$: constant, $0 \le \lambda \le 1$
- $T_{critical}$: constant, longest path delay before this step

# **Critical Path Sizing & Place**

## Generalized Geometric Programming

minimize $t_{cycle}$ s.t.
$$a_j \ge a_i + d_{i,j} \quad \forall (v_i, v_j) \in A, \; v_i, v_j \in C(k)$$

$$a_j \le t_{cycle} \quad \forall v_j \in \text{primary outputs, } v_j \in C(k)$$

$$a_j \ge T_{start} \quad \forall v_j \in \text{primary inputs, } v_j \in C(k)$$

$$|x_i - x_i'| \le \Delta_x \quad \forall v_i \in C(k)$$

$$|y_i - y_i'| \le \Delta_y \quad \forall v_i \in C(k)$$

$$|z_i - z_i'| \le \Delta_z \quad \forall v_i \in C(k)$$

# **GGP-Algorithm**

- Transform the original GGP problem into a sequence of (convex) GP problems
- The sequence of optimal solutions to the GP problems converges to a point satisfying the Kuhn-Tucker necessary conditions for optimality of the GGP

# Experimental Results

- Benchmark circuit: C499
- Critical path delay:
  - Before: 13.91 ns
  - In-Place Sizing: 6.89 ns
  - SCD: 6.04 ns

#### **Conclusions I**

- FANROUT builds buffered routing trees with maximum required time at the drivers
- The resulting structure is an LT-Tree from the logical viewpoint and a P-Tree from the physical viewpoint
- Future work will focus on deriving the initial sink order, using relaxed LT-Trees, and employing buffered P-Tree structures

#### **Conclusions II**

- SCD improves timing by balancing the path delays, i.e., longer-delay paths get shorter at the expense of shorter-delay paths getting longer
- 15% improvement over in-place gate sizing on average
- Future work will focus on combining other logic optimization techniques in the placement loop
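
The SCD delay model given earlier (the gate sizing model combined with the bounding-box wire load estimate) can be evaluated numerically as in the sketch below. It is an illustration under stated assumptions, not the authors' code: the class `GateCoeffs`, the helper `bbox_weighted`, the function `stage_delay`, and all numeric values are hypothetical, and the net is described simply by the coordinates of its pins.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pin position; hypothetical representation


@dataclass
class GateCoeffs:
    """Hypothetical container for the fitted coefficients of one gate arc."""
    a1: float  # alpha1: intrinsic delay slope versus size
    b1: float  # beta1: intrinsic delay offset
    a2: float  # alpha2: drive resistance numerator
    b2: float  # beta2: drive resistance offset
    a3: float  # alpha3: input capacitance slope versus size
    b3: float  # beta3: input capacitance offset

    def dint(self, z: float) -> float:
        return self.a1 * z + self.b1

    def rdr(self, z: float) -> float:
        return self.a2 / z + self.b2

    def cin(self, z: float) -> float:
        return self.a3 * z + self.b3


def bbox_weighted(pins: Sequence[Point], hor: float, ver: float) -> float:
    """Bounding-box extent of a net, weighted per direction."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return hor * (max(xs) - min(xs)) + ver * (max(ys) - min(ys))


def stage_delay(gate: GateCoeffs, z: float, pins: Sequence[Point],
                fanout_cin: Sequence[float], rho: float,
                c_hor: float, c_ver: float,
                r_hor: float, r_ver: float) -> float:
    """Delay through a gate of size z driving the net spanned by `pins`."""
    cnet = rho * bbox_weighted(pins, c_hor, c_ver)  # wire capacitance estimate
    rnet = rho * bbox_weighted(pins, r_hor, r_ver)  # wire resistance estimate
    cload = sum(fanout_cin)                         # input caps of fanout gates
    return gate.dint(z) + gate.rdr(z) * (cnet + cload) + rnet * cload


# Illustrative values only.
g = GateCoeffs(a1=1.0, b1=0.0, a2=2.0, b2=0.0, a3=0.5, b3=0.5)
delay = stage_delay(g, z=2.0, pins=[(0.0, 0.0), (3.0, 4.0)],
                    fanout_cin=[g.cin(1.0)], rho=1.0,
                    c_hor=1.0, c_ver=1.0, r_hor=0.5, r_ver=0.5)
```

In this model, enlarging $z_j$ lowers the drive resistance term but raises both the intrinsic delay and the input capacitance seen by the upstream gate, while moving a cell changes the bounding-box terms. Those coupled effects are what the sizing and placement steps trade off.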