# CSAM: A Clock Skew-aware Aging Mitigation Technique

Behzad Eghbalkhah<sup>1</sup>, Mehdi Kamal<sup>1</sup>, Ali Afzali-Kusha<sup>1</sup>, Mohammad Bagher Ghaznavi-Ghoushchi<sup>2</sup>, Massoud Pedram<sup>3</sup>

<sup>1</sup>School of Electrical and Computer Engineering, University of Tehran
 <sup>2</sup>Department of Electrical Engineering, Shahed University
 <sup>3</sup>Department of EE-Systems, University of Southern California
 {eghbalkhah, mehdikamal, afzali}@ut.ac.ir, ghaznavi@shahed.ac.ir, pedram@usc.edu

## ABSTRACT

In this work, we propose a clock skew-aware aging mitigation (CSAM) technique which considers the effect of asymmetric aging on both the logic paths and the clock tree. Simultaneous consideration of both parts in the design optimization problem enables us to reduce the area overhead while increasing the lifetime. For the aging mitigation of the logic paths, we make use of both internal node control (INC) and input vector control (IVC) techniques, while, for the clock tree circuits, a proper choice between NAND- and NOR-based integrated clock gating (ICG) cells is made. The optimization may be performed based on one of two objective functions: maximizing the lifetime, or minimizing the area overhead for a predetermined clock frequency and lifetime. To assess the efficacy of the proposed technique, we compared the lifetimes and area overheads for a set of circuits from the ISCAS89 and ITC99 benchmark suites when CSAM and conventional techniques are used. The results, obtained using SPICE simulations for the circuits in a 45-nm technology, reveal an average lifetime improvement of 34% and an average area overhead reduction of 25.7% for the two objective functions, respectively.

#### 1. INTRODUCTION

Bias Temperature Instability (BTI), Time Dependent Dielectric Breakdown (TDDB), and Hot Carrier Injection (HCI) are known as the main sources of circuit aging and temporal performance degradation [1] [2]. Among them, the BTI effect may be considered the dominant reliability concern as the gate oxide becomes thinner, especially in highly scaled technologies [3]. In the case of PMOS devices, the effect is induced by a negative bias voltage, and hence, is called negative BTI (NBTI). The effect makes the threshold voltage more negative over time, degrading the circuit performance and hence reducing the lifetime [4]. The results presented in [5] [6] [7] indicate that, in addition to the gate oxide thickness, the amount of NBTI-induced degradation depends exponentially on the operating temperature. In addition, the degradation is proportional to the amount of the negative bias voltage (stress). As the bias becomes more negative, the magnitude of the gate oxide field ( $E_{ox}$ ) increases. Finally, the inversion layer hole density also plays an important role [5].

During the actual operation of the circuit, the bias voltage changes dynamically, causing the PMOS device to undergo alternating stress and recovery periods (dynamic NBTI effect). In the stress condition, the magnitude of the threshold voltage increases due to the generation of interface traps at the Si-SiO<sub>2</sub> interface. During the recovery condition, where the negative bias is removed, some of the interface traps are annihilated, resulting in a partial recovery [4]. The recovery reduces the threshold voltage change ( $\Delta V_T$ ) for the AC (dynamic) stress compared to the case of the DC (static) stress, where the threshold voltage shift is not reduced over time. The amount of the threshold voltage shift recovery depends on the duty cycle and input patterns.

Conventional reliability analysis assumes either a DC stress condition or an average duty cycle if an AC stress condition is considered. Since during the actual operation different parts of the circuit may have different operation modes (such as a standby mode where clock gating may be invoked), even the AC stress condition with an average duty cycle cannot predict the impact of the NBTI effect with sufficient accuracy [4] [8]. During the standby mode, the input of a PMOS device may be held at a LOW voltage (corresponding to logic zero), in which case the transistor is under static NBTI stress. Consequently, the standby mode leads to asymmetric degradation of the devices in all frozen parts of the circuit. This type of degradation translates to more stress on some transistors, and hence, an increase in the absolute value of their threshold voltage (slower transistors). The speed reduction of these transistors leads to a pulse width shift or duty cycle modulation in the clock tree. It also increases the propagation delay of the combinational circuits. In general, the use of power management techniques, such as clock or power gating, causes asymmetric aging for different transistors on a chip. Since the amount of degradation caused by static NBTI is considerably larger for the stressed transistors than that caused by dynamic NBTI, the delay degradation in the critical paths of the combinational parts could potentially be high enough to violate the circuit timing constraint.

In addition to the delay degradation of the logic path circuits, the NBTI phenomenon may adversely affect the timing reference provided by the gated clock trees. The reason is that the PMOS devices used in the gating logic as well as in the clock buffers are subject to aging in the presence of NBTI stress. In the gated part of the clock tree, the transistors under stress experience static NBTI, while all the PMOS transistors in the non-gated part suffer dynamic NBTI stress. This causes an asymmetric aging rate, which induces a non-uniformity in the timing rendered by the clock tree at different parts of the chip. The effects of asymmetric NBTI-induced aging on the reliability (lifetime) of the logic path and clock tree circuits have been studied, and techniques to improve the reliability have been suggested (see, e.g., [9]-[10]). To the best of our knowledge, however, these techniques have focused either only on the logic path (neglecting the NBTI effect on the clock tree part) or only on the clock tree part (ignoring the NBTI effect on the logic path circuit).

In this paper, we propose a technique to increase the lifetime of the circuits by considering the NBTI-induced degradation of both the clock tree and logic path circuits. Although the effect of Positive Bias Temperature Instability (PBTI) becomes important for transistors which use high- $\kappa$ gate dielectrics and metal gates [11], in this work, we only focus on presenting the results for NBTI. The approach may be easily extended to high- $\kappa$ metal gate transistors by including the model for the delay degradation of the circuit due to the PBTI effect [11] [12]. The rest of the paper is organized as follows. In Section 2, related works are briefly reviewed, while the problem statement is presented in Section 3. The proposed design technique is described in Section 4 and the results are discussed in Section 5. Finally, Section 6 concludes the paper.

#### 2. PREVIOUS WORKS

As hinted previously, in circuits where power management techniques are used to lower the power consumption, the operating conditions (including the voltage and frequency) and the inputs of different parts of the circuit change based on the workload. This causes different delay degradations owing to asymmetric (non-uniform) stress and temperature distributions in various parts of the circuit. One of the widely used power management techniques in modern digital circuits is clock gating. For the parts whose clock is gated, the inputs of the logic path are frozen and the circuit only consumes leakage power. Using input vector control (IVC) and internal node control (INC) techniques, the optimum input values which minimize the leakage power can be applied to the gates. For the parts where the clock is not gated, devices alternately go through stress and recovery phases, while, when clock gating is invoked, frozen inputs cause a constant stress or recovery condition. This suggests that the non-uniform stress present in these cases could cause asymmetric aging due to the different rates of aging in the active and standby modes, making NBTI degradation a more serious problem. Also, as mentioned previously, the NBTI phenomenon may affect the clock tree skew. If the clock gating technique is not used, the degradation may be assumed symmetric (uniform) throughout the network, not inducing any additional skew. If, however, the gating technique is invoked, the asymmetric aging causes some additional clock skew in the network. The effects of asymmetric aging due to the use of clock gating schemes are discussed in [9] [10] [13] [14] [15].

The NBTI-induced effects have been extensively studied in recent years. These works include techniques for lifetime prediction (see, e.g., [4] [8]), NBTI-aware timing analysis (see, e.g., [4] [16] [17]), and reliability improvement of VLSI circuits such as memories and processors in the presence of NBTI. In this section, for the sake of brevity, we only review some works which focus on aging mitigation of circuits considering clock gating schemes. There are a number of circuit optimization techniques for mitigating the effect of NBTI on combinational logic paths and clock tree networks. A review of the techniques for combinational logic paths may be found in [18]. The IVC and INC techniques used for leakage reduction may also be used as effective methods for suppressing the NBTI-induced degradation (see, e.g., [19] [20] [21] [22] [23] [24]). Since the input vectors applied to a combinational logic block affect the NBTI-induced degradation, input vector control can be used to mitigate this phenomenon during idle cycles. The authors in [22] proposed the dynamic gate replacement (DGR) and divide and conquer-based gate replacement (DCBGR) algorithms as two INC schemes, together with an input vector selection method, to simultaneously reduce the leakage power and mitigate NBTI-induced degradation. In [23], a linear-time heuristic technique is presented for tree-structured circuits. The technique presented in [24] inserts a transmission gate in front of protected gates, which are identified by the proposed framework for estimating dynamic and static NBTI. None of these works presented NBTI mitigation techniques for the clock tree network.

The first work considering the effect of NBTI on the clock skew has been presented in [25]. In this technique, first, the clock skews induced by NBTI for different parts of the circuit are estimated. Then, half of the maximum value of these skews is used as a guardband for the clock tree generation tool. The use of the maximum value yields an overestimation of the clock frequency for all parts of the circuit. Also, when the skew degradation is large, the use of the technique may not be practical [15]. A technique for equalizing the signal probability (SP) of all clock tree trunks to balance the NBTI stress is presented in [26]. In addition, to estimate the NBTI effect, a compact formula for computing equivalent temperatures under a Gaussian temporal temperature variation is proposed. The key idea is to properly switch at runtime between gated-HIGH and gated-LOW for all of the gated clock tree trunks using a low-frequency secondary clock. The technique suffers from a considerable area overhead, the use of routing resources for the secondary clock, and possible logic failures caused by spurious clock pulses when the independent secondary clock switches [15].

The problem of asymmetric aging with a special focus on clock skew, pulse width, and aspects of burn-in is discussed in [9]. A timing analysis framework based on SSTA is presented for asymmetric aging analysis and mitigation of NBTI induced degradation on clock skew. An NBTI-aware skew management technique which also focused on the asymmetric aging of the gated clock trees was proposed in [15]. Choosing between NOR- or NAND-based integrated clock gating (ICG) cells for each trunk, the method modulated the signal probability of the clock tree to reduce the clock skew induced by NBTI. Similar to the last two works, this work did not consider the impact of the NBTI stress on the logic paths of the circuit.

In this work, we propose a design time technique to reduce the effects of the asymmetric aging on the circuit considering both the clock tree and the combinational logic paths simultaneously. It makes use of both INC and IVC methods for the logic paths and NOR- or NAND-based ICG cells for the skew management of the clock tree. The contributions of this work are given below:

- The optimization algorithm considers the clock skew between the launch and capture registers to decide for the use of the NAND- or NOR-based ICG cells for the clock tree and INC and IVC techniques for the logic paths. The algorithm increases or decreases the clock skew such that the cost of the NBTI mitigation is minimized.
- The decision for using the internal node control is made based on the signal probability of internal nodes during the lifetime in both the active and standby modes rather than only considering the internal nodes in standby mode.
- Modeling the delay degradation of the critical paths with non-linear functions of signal probabilities, we utilize non-linear non-integer programming for our optimization problem.

In the next section, we describe the problem statement more clearly.

## **3. PROBLEM STATEMENT**

The design constraints for VLSI circuits include power, speed, and area which are accompanied by stringent consideration of time-zero fabrication imperfections due to process variation and reliability issues during the circuit lifetime. The reliability is determined by phenomena which affect the circuit characteristics over the time including, *e.g.*, NBTI and dielectric wear out. The focus of this work is on NBTI-induced delay degradation where we introduce some techniques to guarantee the desired performance during the desired lifetime. Next, we describe the timing requirement of the pipelined stages, discuss NBTI aging effect on ICG cells, and give a motivational example.

## 3.1 Circuit Timing

The minimum clock period of the pipelined stage which determines the circuit performance is given by

$$\text{Clock Period} \geq T_{cq} + T_{pd} + T_{skew} + T_{setup} \tag{1}$$

where $T_{cq}$ is the clock-to-Q delay of the launch flip-flop, $T_{pd}$ is the propagation delay through the combinational block, $T_{setup}$ is the setup time of the capture flip-flop, and $T_{skew}$ is the clock skew defined as the maximum difference between the arrival times of the clock signals at the launch and capture flip-flops. Since aging changes the four terms on the right hand side (RHS) of (1), in order to make sure that the inequality is satisfied, one should either minimize the change of each term independently or, alternatively, minimize the sum of the changes. This way, one can guarantee that the inequality is not violated for a given clock period during the circuit lifetime. In this work, we focus on minimizing the sum of the second and third terms on the RHS of (1). As will be shown later, this approach provides a longer lifetime as well as less area overhead (and complexity) in the implementation of the mitigation techniques when compared to the case of minimizing these two terms individually. The impact of the NBTI effect on the delay degradation of the combinational block has been discussed extensively in the literature (see, e.g., [3] [4] [8] [27]). In the next subsection, we concentrate on the NBTI aging effect on ICG cells (or more specifically, NAND and NOR gates), which influences $T_{skew}$.

## 3.2 NBTI Aging Effect on ICG Cells

As mentioned previously, in order to minimize the leakage (static) power consumption, a clock gating scheme may be used to selectively gate the clock of the unused parts of the circuit. The gating is performed using integrated clock gating (ICG) cells, which are composed of a latch followed by a NAND or NOR gate (see Figure 1).



Figure 1. Clock Gating by a) NAND-based b) NOR-based Integrated Clock Gating Cell

The presence of the latch avoids glitches and premature ending of the clock signal [15]. When the NAND (NOR) gate is used, the clock tree trunk is frozen at a HIGH (LOW) signal level. Since the aging behavior of an inverter depends on its input value, the aging order of the following inverters in the clock tree buffers depends on the type of the gating element (cell). It should be noted that the time-zero delays, as well as the NBTI impacts on them, are different for the NAND and NOR gates, and hence, one may optimize the use of these two types of elements to control the amount of skew in the clock tree. The NBTI effect on the delay degradation of the gates is illustrated in Figure 2. The delay degradation corresponds to the increase in the delay after ten years versus the signal probabilities of both inputs of the gates. The FO4 delays of the NAND and NOR gates, which were equalized to the inverter delay by sizing, were obtained using the 45 nm Nangate library [28]. The figure reveals a higher delay degradation for the NOR gate compared to that of the NAND gate. It should be noted that when gating is used, the signal probability for the input connected to the clock signal is 0.5, while the probability for the other input equals the gating probability. Similar results for different ages have been obtained for all the standard cells of the library used in the circuit. These results have been utilized in solving the optimization problem of this work.



Figure 2. Delay degradation for NAND and NOR gates after 10 years for different SP of the inputs.

## **3.3 Motivational Example**

As mentioned before, to reduce the asymmetric aging of the logic path in the standby operating mode, INC and IVC are exploited to minimize the stress applied to the PMOS transistors. Since IVC is implemented through the set and preset inputs of the flip-flops, it has no logic gate overhead, but it does have a routing overhead. The implementation of INC, however, requires the insertion of extra hardware, which has some area overhead along with some routing complexity. With the objective of lowering these complexities, in this paper, we exploit the skew between the launch and capture flip-flops to reduce the overhead of using the INC technique. To elucidate the approach, we make use of the motivational example illustrated in Figure 3.



Figure 3. Negative and Positive Clock Skew.

In this example, there are two cases, one with negative and one with positive skew. In both cases, the time-zero delay of the logic path is assumed to be 1ns and the clock period is 1.075ns, i.e., a 75ps guardband is assumed in the design phase. Now, suppose that after ten years, aging increases the path delay by 100ps if no mitigation scheme is employed. Ignoring the skew, this corresponds to a 25ps timing violation, preventing the circuit from operating at the desired frequency (1/1.075 GHz). Using mitigation techniques, one can reduce the aging-induced delay degradation to less than the considered guardband. In the case of Figure 3 (a) (Figure 3 (b)), the clock signal reaches the capture flip-flop 15ps later (earlier) than the launch flip-flop, implying a negative (positive) clock skew. With the negative skew, the unmitigated degradation causes only a 10ps timing violation for this stage of the pipeline. Therefore, if we take the negative skew into account, the aging mitigation scheme needs to lower the delay degradation only from 100ps to 90ps. Clearly, the associated overhead is lower than in the case where the skew is not taken into account. In the case of positive skew, the mitigation technique should reduce the delay degradation from 100ps to 60ps, implying a higher overhead compared to the previous case. This example emphasizes the fact that the amount of the clock skew does influence the overhead associated with the mitigation.
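The arithmetic of this example can be reproduced with a short sketch. The function name and the sign convention of its last argument (a positive value means the capture clock arrives later than the launch clock) are assumptions made for illustration only.

```python
def allowed_degradation_ps(clock_period_ps, path_delay_ps, capture_minus_launch_ps):
    """Largest aging-induced delay increase (ps) that still meets timing.

    capture_minus_launch_ps > 0 means the capture clock arrives later than
    the launch clock, which gives the logic path extra time.
    """
    return clock_period_ps - path_delay_ps + capture_minus_launch_ps


CLOCK_PERIOD_PS = 1075  # 1 ns path delay plus the 75 ps guardband
PATH_DELAY_PS = 1000    # time-zero delay of the logic path
UNMITIGATED_PS = 100    # aging-induced delay increase after ten years

cases = (
    ("skew ignored", 0),
    ("capture 15 ps later (negative skew)", 15),
    ("capture 15 ps earlier (positive skew)", -15),
)
for label, offset_ps in cases:
    budget = allowed_degradation_ps(CLOCK_PERIOD_PS, PATH_DELAY_PS, offset_ps)
    violation = max(0, UNMITIGATED_PS - budget)
    print(f"{label}: budget {budget} ps, violation without mitigation {violation} ps")
```

The three budgets (75ps, 90ps, and 60ps) match the values quoted in the example, which shows why the same path needs less mitigation effort when the skew works in its favor.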

In this work, we use this fact to study two design scenarios based on clock skew-awareness. In the first scenario, the objective is to increase the lifetime by minimizing the RHS of the inequality given in (1) without any constraint on the overhead. In the second scenario, the aim is to minimize the overhead such that the operating frequency remains unchanged up to a given lifetime. As the results will show, the minimum clock period achieved in the first scenario using our approach is less than that of an approach which finds the minimum clock period without taking the clock skew degradation into account. Also, in the second scenario, our approach renders a substantial decrease in the overhead.

#### 4. PROPOSED TECHNIQUE

In this section, we describe our proposed aging mitigation design technique. In the first step, the paths of the circuit which could potentially become critical under aging are extracted using static timing analysis (STA). The analysis makes use of nominal gate delays. The potential critical paths are those whose delays may exceed the desired clock period as the circuit ages. We assume an upper bound of 50% for the delay degradation of each gate due to aging [4]. Based on this delay degradation assumption, the potential critical paths are determined and used in the optimization problem.
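A minimal sketch of this filtering step is given below. It assumes the nominal path delays are already available from an STA report, and it ignores flip-flop delays and skew for brevity; the path names and delay values are hypothetical.

```python
AGING_UPPER_BOUND = 0.5  # 50% worst-case delay degradation per gate, as assumed in the text


def potential_critical_paths(nominal_delays_ps, clock_period_ps, bound=AGING_UPPER_BOUND):
    """Return the names of paths whose worst-case aged delay exceeds the clock period.

    If every gate slows down by at most `bound`, the path delay (a sum of
    gate delays) grows by at most the same factor.
    """
    return [
        name
        for name, nominal in nominal_delays_ps.items()
        if nominal * (1.0 + bound) > clock_period_ps
    ]


# Hypothetical nominal path delays in ps.
paths = {"p0": 780.0, "p1": 540.0, "p2": 500.0, "p3": 610.0}
print(potential_critical_paths(paths, clock_period_ps=803.0))
```

Only the paths returned by this filter need to be carried into the optimization problem, which keeps the problem size small.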

For each critical path, a set of gates which impact the delay degradation of the path is determined. This set includes the gates in the critical path as well as any other gates whose outputs can affect the inputs of the gates in the critical path. The outputs of the latter gates influence the input signal probabilities of the former gates, affecting their NBTI-induced delay degradation. Hence, to find this set, a cone zone should be generated by backtracking from the output of the path to all related inputs of the circuit. Note that the signal probabilities of the inputs of the gates in the cone zone of each critical path will take part in the problem formulation. Figure 4 depicts a cone zone for a critical path where gates G1, G3, and G5 belong to this critical path, while gates G2 and G4 are included in the zone because of their impact on the delay degradation of the gates in the critical path.



Figure 4. A Critical Path and its Corresponding Cone Zone
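The backtracking step can be sketched as a plain graph traversal. The netlist representation (a dictionary mapping each gate to its fan-in nodes) and the wiring below are assumptions for illustration; the wiring is only loosely modeled on the gate names of Figure 4.

```python
def cone_zone(fanin, path_output):
    """Backtrack from the path output and collect every gate and primary input feeding it."""
    gates, primary_inputs = set(), set()
    stack = [path_output]
    while stack:
        node = stack.pop()
        if node in gates or node in primary_inputs:
            continue  # already visited
        if node in fanin:  # node is driven by a gate of the netlist
            gates.add(node)
            stack.extend(fanin[node])
        else:  # no driver in the netlist, so it is a primary input of the cone
            primary_inputs.add(node)
    return gates, primary_inputs


# Hypothetical wiring: G1 -> G3 -> G5 is the critical path, G2 and G4 are side gates.
fanin = {
    "G1": ["a", "b"],
    "G2": ["c", "d"],
    "G3": ["G1", "G2"],
    "G4": ["e", "f"],
    "G5": ["G3", "G4"],
    "G6": ["g", "h"],  # unrelated gate, stays outside the cone
}
gates, inputs = cone_zone(fanin, "G5")
print(sorted(gates))
print(sorted(inputs))
```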

Once the corresponding cone zone of each critical path is extracted, the signal probabilities (SPs) of the input/output nodes (or interchangeably named as wires) of the gates inside each cone should be formulated. Note that the SPs (defined here as the zero probability of the signal) of the (primary) inputs of the zones are extracted using gate-level simulations with random data. Next, we should determine the SPs of the nodes when the INC and IVC techniques are used. In addition, these values in the clock tree should be determined.

For the implementation of the INC technique, the structures shown in Figure 5 a) and b) are used to freeze the inputs of the gates to logic HIGH (type A) and LOW (type B), respectively. In this approach, the control signal *ctrl* determines whether the wire should be frozen to a specific value or not. When ctrl = 0, the actual wire signal is transmitted ( $W'_i = W_i$ ), and when ctrl = 1, $W'_i = LOW$ or HIGH regardless of $W_i$. Therefore, the impact of INC on the signal probability of a wire can be formulated as

$$SP_{W'_i} = \left(1 - SP_{CG_i}\right) \times SP_{W_i} + SP_{CG_i} \times \left(\overline{C_i} \times SP_{W_i} + C_i \times \overline{F_i}\right) \tag{2}$$

where $SP_{W'_i}$ ($SP_{W_i}$) is the signal probability of $W'_i$ ($W_i$), $SP_{CG_i}$ is the probability of the clock gating of the circuit to which the *i*<sup>th</sup> wire belongs, $C_i$ is a binary variable which determines the existence of the INC technique for the *i*<sup>th</sup> wire during the clock gating phase, and $F_i$ is a binary variable which determines the type of the INC structure used for the *i*<sup>th</sup> wire ( $F_i = 1$ corresponds to using type A). The values of $C_i$ and $F_i$ will be determined through the optimization process. It should be noted that, due to the overhead of INC, the proposed formulation provides the option of using ( $C_i = 1$ ) or not using ( $C_i = 0$ ) the technique for each internal wire separately.
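As a minimal sketch, (2) can be evaluated directly once the binary choices are fixed. The function and argument names are hypothetical; the SP values follow the convention of this section, i.e., the probability of the wire being at logic zero.

```python
def sp_with_inc(sp_w, sp_cg, c, f):
    """Signal (zero) probability of W'_i according to Eq. (2).

    sp_w  : zero-probability of the original wire W_i
    sp_cg : clock gating probability of the block the wire belongs to
    c     : 1 if an INC structure is inserted on the wire, 0 otherwise
    f     : 1 for type A (freeze to HIGH), 0 for type B (freeze to LOW)
    """
    if c not in (0, 1) or f not in (0, 1):
        raise ValueError("c and f must be binary")
    return (1.0 - sp_cg) * sp_w + sp_cg * ((1 - c) * sp_w + c * (1 - f))


# Illustrative values only: a wire at zero half of the time, gated 80% of the time.
for c, f in ((0, 0), (1, 1), (1, 0)):
    print(f"C={c}, F={f}: SP = {sp_with_inc(0.5, 0.8, c, f):.3f}")
```

Freezing the wire to HIGH (type A) pushes the zero-probability toward 0, and freezing it to LOW (type B) pushes it toward 1, in proportion to how often the clock is gated.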



Figure 5. Adding TG inside a Wire in the Two Cases of Freezing Node Value to (a) HIGH and (b) LOW.

In the case of the IVC technique, since the preset and reset signals exist for the flip-flops, we assume that the IVC technique may always be used during the clock gating phase, and hence $C_i$ is assumed to be 1 in (2). Therefore, similar to (2), the signal probability of the primary inputs of the cone zone circuit is formulated as

$$SP_{W_i'} = (1 - SP_{CG_i}) \times SP_{W_i} + SP_{CG_i} \times \overline{K}_i \tag{3}$$

where  $K_i$  is a binary variable which specifies the use of preset and reset signals for the *i*<sup>th</sup> primary input ( $K_i = 1$  corresponds to using preset). The value of  $K_i$  is determined through the optimization process.
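The IVC case in (3) reduces to a one-line computation. The sketch below uses hypothetical names and illustrative probabilities, with SP again taken as the zero-probability of the signal.

```python
def sp_with_ivc(sp_w, sp_cg, k):
    """Signal (zero) probability of a primary input of the cone zone, Eq. (3).

    k = 1 means the flip-flop is preset (output HIGH) while the clock is gated,
    k = 0 means it is reset (output LOW).
    """
    if k not in (0, 1):
        raise ValueError("k must be binary")
    return (1.0 - sp_cg) * sp_w + sp_cg * (1 - k)


# Illustrative values only: input at zero half of the time, clock gated 60% of the time.
for k in (0, 1):
    print(f"K={k}: SP = {sp_with_ivc(0.5, 0.6, k):.3f}")
```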

For each critical path, the signal probability is propagated from the corresponding primary inputs in the cone zone to the primary output using (2) and (3) as well as the logical function of the gate. By formulating the signal probabilities of the wires, the signal probability of the output of the  $i^{th}$  gate  $(SP_{OG_i})$  is determined as a function of the gate type.
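The propagation step can be sketched as follows for inverters and two-input NAND and NOR gates. This is a simplified illustration, not necessarily the formulation used in this work: it assumes statistically independent gate inputs (which does not hold under reconvergent fan-out), and the netlist format and gate names are hypothetical. SP is the zero-probability of a signal.

```python
def gate_output_sp(gate_type, input_sps):
    """Zero-probability of a gate output from the zero-probabilities of its inputs."""
    if gate_type == "INV":
        (a,) = input_sps
        return 1.0 - a
    a, b = input_sps
    if gate_type == "NAND":  # output is 0 only when both inputs are 1
        return (1.0 - a) * (1.0 - b)
    if gate_type == "NOR":  # output is 1 only when both inputs are 0
        return 1.0 - a * b
    raise ValueError(f"unsupported gate type: {gate_type}")


def propagate(netlist, primary_sps):
    """Compute the SP of every node; netlist maps a gate to (type, fan-in list)."""
    sps = dict(primary_sps)

    def sp_of(node):
        if node not in sps:
            gate_type, fanin = netlist[node]
            sps[node] = gate_output_sp(gate_type, [sp_of(n) for n in fanin])
        return sps[node]

    for node in netlist:
        sp_of(node)
    return sps


# Hypothetical three-gate cone with all primary inputs at SP = 0.5.
netlist = {
    "G1": ("NAND", ["a", "b"]),
    "G2": ("NOR", ["c", "d"]),
    "G3": ("NAND", ["G1", "G2"]),
}
sps = propagate(netlist, {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.5})
print(sps["G1"], sps["G2"], sps["G3"])
```

In the full flow, the primary-input SPs would come from (3) and the SPs of wires with an INC structure from (2), so the output SP of every gate becomes a function of the binary decision variables.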

Now, we concentrate on the impact of the NBTI effect on the clock tree during the clock gating phase. As mentioned before, we have the option of utilizing NAND or NOR based ICG cells. The activation of NAND-based and NOR-based ICG freezes the input clock signal of the trunk (branch) to logic HIGH and LOW, respectively. Based on this discussion, the output signal probability of the clock tree buffer (inverter or ICG cell) is formulated as

$$SP_{O_i} = (1 - [SP_{CG_i}]) \times (1 - SP_{I_i}) + [SP_{CG_i}] \times [J_i \times (SP_{CG_i} \times (1 - SP_{I_i})) + \overline{J_i} \times (1 - SP_{CG_i} \times SP_{I_i})] \tag{4}$$

where $SP_{I_i}$ ($SP_{O_i}$) is the signal probability of the clock input (output) of the *i*<sup>th</sup> buffer, $[SP_{CG_i}]$ (which indicates the use of clock gating for the *i*<sup>th</sup> buffer) is 0 when $SP_{CG_i}$ is 0 and is 1 otherwise, and $J_i$ is a binary variable which determines the type of the ICG cell used for the *i*<sup>th</sup> buffer. The optimization process specifies the value of $J_i$, which is 1 (0) if a NAND-based (NOR-based) ICG cell should be used. The first part on the RHS of (4) corresponds to the case where there is no clock gating and the buffer is an inverter. The second part is for the case of clock gating, when the inverter is replaced by an ICG cell. This signal probability is propagated through the remaining buffers (inverters or following ICG cells) of the branch.
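The sketch below is a direct transcription of (4) as written above, followed by the propagation through the subsequent inverters of a branch. The function names are hypothetical and the numeric values are purely illustrative.

```python
def clock_buffer_output_sp(sp_in, sp_cg, j):
    """Output signal probability of a clock tree buffer, transcribed from Eq. (4).

    sp_in : signal probability of the clock input of the buffer
    sp_cg : clock gating probability (0 means the buffer is a plain inverter)
    j     : 1 for a NAND-based ICG cell, 0 for a NOR-based ICG cell
    """
    gated = 0 if sp_cg == 0 else 1  # the indicator [SP_CG_i]
    nand_term = sp_cg * (1.0 - sp_in)
    nor_term = 1.0 - sp_cg * sp_in
    return (1 - gated) * (1.0 - sp_in) + gated * (j * nand_term + (1 - j) * nor_term)


def through_inverters(sp, count):
    """Propagate a signal probability through `count` ungated inverters."""
    for _ in range(count):
        sp = 1.0 - sp
    return sp


# Illustrative values: a clock with SP = 0.5 and a gating probability of 0.5.
for j, name in ((1, "NAND-based"), (0, "NOR-based")):
    sp_out = clock_buffer_output_sp(0.5, 0.5, j)
    print(name, sp_out, through_inverters(sp_out, 1))
```

The two ICG choices yield different SPs at the ICG output, and each following inverter complements the SP, so the choice of $J_i$ determines which inverters of the branch see the higher stress probability.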

Having found the SPs of the different nodes of the circuit, we can extract the delay degradation caused by the NBTI effect. In order to calculate the amount of delay degradation as a function of the SPs of the gate inputs, first, the transistor-level netlist of each gate is extracted from the standard cell library. Afterward, different values of the SP are mapped to the corresponding threshold voltage degradation using the model given in [4]. The overall NBTI effect on $V_{th}$ over time can be calculated as follows [4]:

$$\Delta V_{th} = A \cdot SP^n \cdot t^n \tag{5}$$

where *A* is a technology dependent factor which is a function of temperature, *n* is a constant which depends on the fabrication process (n=1/6 or n=1/4 based on the diffusion), *SP* is the duty cycle or signal probability of the signal applied to the gate of the transistor and *t* is the total time (age of the transistor).
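To get a feeling for (5), the sketch below compares static stress (SP = 1) with a 50% duty cycle. The value of `A_FACTOR` is an arbitrary placeholder rather than a value from [4] or from any technology file; only the ratio between the two cases is meaningful here.

```python
TEN_YEARS_S = 10 * 365 * 24 * 3600  # ten years expressed in seconds
A_FACTOR = 1.0e-3                   # hypothetical technology- and temperature-dependent factor


def delta_vth(a, sp, t, n=1.0 / 6.0):
    """Threshold voltage shift according to Eq. (5): A * SP^n * t^n."""
    return a * (sp ** n) * (t ** n)


static_shift = delta_vth(A_FACTOR, 1.0, TEN_YEARS_S)   # transistor always under stress
dynamic_shift = delta_vth(A_FACTOR, 0.5, TEN_YEARS_S)  # stressed half of the time
print(f"dynamic / static shift ratio: {dynamic_shift / static_shift:.3f}")
```

With $n = 1/6$, halving the stress probability lowers the shift by only about 11%, since the ratio is $0.5^{1/6} \approx 0.89$.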

Based on the number of inputs of each gate, to calculate the threshold voltage change of the PMOS transistors of the gate in the case of the NBTI effect (NMOS transistors in the case of the PBTI effect), all combinations of the SP values of the inputs (from 0 to 1 in steps of 0.01) are considered. The threshold voltage changes are applied in SPICE simulations to determine the delay degradation for each combination of the SP values. Finally, the simulation results are fitted to second-order polynomials of the input SPs for each gate using curve fitting. In this work, to implement the circuits, without loss of generality, we made use of gates with only one or two inputs.
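The curve-fitting step can be sketched with an ordinary least-squares fit of a second-order polynomial in the two input SPs. Everything below is illustrative: the samples are synthetic stand-ins for SPICE results, the coefficient values are made up, and a coarser SP grid than the 0.01 step of the text is used to keep the example small.

```python
def features(a, b):
    """Monomials of a second-order polynomial in the two input SPs."""
    return [1.0, a, b, a * a, a * b, b * b]


def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_quadratic(samples):
    """Least-squares fit; samples is a list of (sp_a, sp_b, delay_degradation)."""
    rows = [(features(a, b), y) for a, b, y in samples]
    k = 6
    ata = [[sum(f[i] * f[j] for f, _ in rows) for j in range(k)] for i in range(k)]
    aty = [sum(f[i] * y for f, y in rows) for i in range(k)]
    return solve_linear(ata, aty)  # normal equations (A^T A) x = A^T y


# Synthetic "measurements" generated from made-up coefficients.
true_coeffs = [2.0, 5.0, 3.0, -1.0, 0.5, -0.7]
grid = [i / 10 for i in range(11)]
samples = [
    (a, b, sum(c * f for c, f in zip(true_coeffs, features(a, b))))
    for a in grid
    for b in grid
]
coeffs = fit_quadratic(samples)
print([round(c, 6) for c in coeffs])
```

Once such coefficients are available per gate type, the delay degradation of a gate becomes a closed-form function of its input SPs, which is what allows the path delays to be written as non-linear functions of the decision variables.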

Next, we formulate the delays of the (potential critical) paths inside the combinational circuits as well as the clock tree. The path delay ( $D_{CP}$ ) is equal to the sum of delays of the gates and transmission gates wherever they exist. Therefore,

$$D_{CP_i} = \sum_{j} D_{G_j} + \sum_{j} C_j \times D_{TG} \tag{6}$$

where  $D_{G_j}$  is the delay of the *j*<sup>th</sup> gate in the *i*<sup>th</sup> critical path,  $D_{TG}$  is the delay of the transmission gate, and  $C_j$  is the binary variable defined in (2). It should be noted that the delay includes the change of the delay induced by the NBTI effect after *Y* years. For the clock tree, the delay of a buffer inside the clock tree depends on the type of the ICG cell as well as the signal probability of its input(s). Hence, the delay of the *i*<sup>th</sup> cell in the clock tree (*i.e.*,  $D_{CT,B_i}$ ) is obtained from

$$D_{CT,B_i} = \left(1 - \left[SP_{CG_i}\right]\right) \times D_{INV} + \left[SP_{CG_i}\right] \times \left(J_i \times D_{NAND} + \overline{J_i} \times D_{NOR}\right) \tag{7}$$

where $J_i$ is the aforementioned binary variable, and $D_{INV}$, $D_{NAND}$, and $D_{NOR}$ are the delays of the INV, NAND, and NOR gates, respectively. Again, the delay degradations have been included in these delays. Note that the delay of a path in the clock tree (*i.e.*, a path from the circuit clock input to a flip-flop) is equal to the sum of the delays of the buffers in the path. This delay may be used to find the clock skew at any point in the tree. Now, we can include the delays of the launch and capture flip-flops in the delay of the *i*<sup>th</sup> (potential) critical path. Therefore, the delay for the *i*<sup>th</sup> path ( $D_{CPF_i}$ ) is obtained from

$$D_{CPF_i} = D_{CP_i} + D_{FF,Launch} + D_{FF,Capture} + S_{Launch-Capture,i} \tag{8}$$

where $S_{Launch-Capture,i}$ is the clock skew between the launch and capture flip-flops, $D_{FF,Launch}$ stands for the clock-edge-to-output (*Clock-to-Q*) delay of the launch flip-flop, and $D_{FF,Capture}$ is the setup time of the capture flip-flop. Note that the clock period of the system should be larger than $D_{CPF_i}$ to avoid timing violations. To calculate the clock skew, we use (7) to calculate the clock arrival times at the launch and capture flip-flops. In formulating the optimization problem in our work, (8) is used for the delay modeling of critical paths in the presence of INC, IVC, and clock gating techniques.

Now, we discuss the two objective functions used in the optimization process, which concentrate only on the potential critical paths to reduce the problem size. The first objective function is based on increasing the circuit lifetime by minimizing the delay degradation after *Y* years. The function may be expressed as

$$\begin{aligned}
\text{Objective:}\quad & \text{Minimize } \max\left(\{D_{CPF_i} \mid CPF_i \text{ is a potential critical path}\}\right) \\
\text{Subject to:}\quad & \text{No non-critical path becomes a potential critical path}
\end{aligned} \tag{9}$$

Note that the constraint makes sure that the management of the clock tree skew does not convert a non-critical path into a critical path during the optimization. In this case, we only consider the non-critical paths which share a launch or capture flip-flop with the potential critical paths. Again, for these paths, we assume an upper bound of 50% for the delay degradation.
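The delay model of (6), (7), and (8) can be put together in a few lines. The sketch below uses hypothetical, already-aged delay values in ps. The skew is taken as the launch clock arrival time minus the capture clock arrival time, which is an assumption chosen so that a later capture clock relaxes the constraint, as in the motivational example.

```python
def path_delay(gate_delays, c, d_tg):
    """Eq. (6): sum of gate delays plus one transmission gate delay per inserted INC."""
    return sum(gate_delays) + sum(c) * d_tg


def clock_buffer_delay(gated, j, d_inv, d_nand, d_nor):
    """Eq. (7): gated is the indicator [SP_CG_i]; j = 1 selects the NAND-based ICG cell."""
    return (1 - gated) * d_inv + gated * (j * d_nand + (1 - j) * d_nor)


def clock_arrival(branch, d_inv, d_nand, d_nor):
    """Arrival time at a flip-flop: sum of the buffer delays along its clock branch."""
    return sum(clock_buffer_delay(g, j, d_inv, d_nand, d_nor) for g, j in branch)


def d_cpf(gate_delays, c, d_tg, d_clk_to_q, d_setup, launch_arrival, capture_arrival):
    """Eq. (8), with the skew assumed to be launch arrival minus capture arrival."""
    skew = launch_arrival - capture_arrival
    return path_delay(gate_delays, c, d_tg) + d_clk_to_q + d_setup + skew


# Hypothetical aged delays in ps.
D_INV, D_NAND, D_NOR = 30, 35, 40
launch = clock_arrival([(0, 0), (1, 1)], D_INV, D_NAND, D_NOR)   # inverter, NAND-based ICG
capture = clock_arrival([(0, 0), (1, 0)], D_INV, D_NAND, D_NOR)  # inverter, NOR-based ICG
total = d_cpf([120, 95, 140], [0, 1, 0], 12, 60, 40, launch, capture)
print(launch, capture, total)
```

Evaluating this expression for every potential critical path and taking the maximum gives the quantity minimized in (9).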

In the case of the second objective function, the objective is to minimize the area overhead provided that the delay degradation of the circuit does not violate the predefined clock period (*i.e.*, *CP*). Since the overhead is induced by the transmission gates used for the implementation of the INC structure, this objective function corresponds to minimizing the number of transmission gate insertions. The function may be written as

$$\begin{aligned}
\text{Objective:}\quad & \text{Minimize } \sum_{\text{each potential critical path}} \left(\sum C_{j}\right) \\
\text{Subject to:}\quad & D_{CPF_{i}} < CP \quad \text{for each potential critical path } CPF_{i}, \\
& \text{no non-critical path becomes a potential critical path}
\end{aligned}$$
(10)
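As a toy illustration of the second objective, the sketch below searches exhaustively for the smallest set of INC insertions on a single path that keeps its aged delay below *CP*. It is not the solver used in this work. The function name, the additive per-gate `savings` model (the delay reduction obtained when the corresponding $C_j$ is set to 1), and the sample numbers are assumptions made only for this example; the constraint on non-critical paths is omitted for brevity.

```python
from itertools import product
from typing import Optional, Sequence, Tuple


def min_inc_insertions(
    base_delay: float,
    savings: Sequence[float],
    clock_period: float,
) -> Optional[Tuple[int, ...]]:
    """Return the binary vector C with the fewest ones such that the delay < CP.

    base_delay: aged D_CPF_i without any INC structure, in ps.
    savings[j]: assumed delay reduction (ps) when an INC structure is
                inserted at gate j (C_j = 1). Hypothetical additive model.
    Returns None when no assignment meets the clock period.
    """
    best: Optional[Tuple[int, ...]] = None
    for choice in product((0, 1), repeat=len(savings)):
        delay = base_delay - sum(c * s for c, s in zip(choice, savings))
        if delay < clock_period and (best is None or sum(choice) < sum(best)):
            best = choice
    return best


# Hypothetical path: 850 ps aged delay, four candidate gates, CP = 803 ps.
selection = min_inc_insertions(850.0, [10.0, 25.0, 15.0, 30.0], 803.0)
inserted_gates = sum(selection) if selection is not None else None
```

The exhaustive search is exponential in the number of candidate gates, so it is only meant to show what the objective and the constraint express; a larger reduction in `base_delay` from clock skew management would directly lower the number of insertions needed.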

#### 5. RESULTS AND DISCUSSION

In this section, the efficacy of the CSAM technique is studied. The study is based on applying the technique to a set of ten benchmarks from the ISCAS'89 and ITC'99 suites. The corresponding gate and flip-flop counts for each benchmark are listed in Table I. The benchmarks were synthesized using the 45-nm Nangate standard cell library [28]. The corresponding critical paths were extracted after the place and route step in the physical design flow. The clock tree depth, number of potential critical paths, and the clock period for each benchmark are also reported in Table I. Note that the critical paths were extracted based on the description provided in Section 4, and the clock period was obtained based on the nominal delay of the gates without considering the NBTI impact. For the clock tree synthesis, we used inverter gates with three different drive strengths.

| Benchmark | Gates  | Flip-Flops | Clock Tree Depth | Potential Critical Paths | Clock Period (ps) |
|-----------|--------|------------|------------------|--------------------------|-------------------|
| b15       | 9,271  | 416        | 4                | 10                       | 803               |
| b17       | 27,323 | 1,314      | 4                | 35                       | 705               |
| b18       | 72,124 | 3,020      | 4                | 17                       | 2,304             |
| s838      | 472    | 32         | 2                | 32                       | 731               |
| s1488     | 988    | 6          | 2                | 6                        | 627               |
| s5378     | 1,993  | 176        | 2                | 23                       | 826               |
| s9234     | 1,665  | 145        | 2                | 11                       | 1,336             |
| s13207    | 2,767  | 627        | 4                | 24                       | 1,027             |
| s15850    | 3,736  | 524        | 4                | 20                       | 959               |
| s35932    | 14,681 | 1,728      | 4                | 13                       | 760               |

Table I. The Number of Gates and Flip-flops, Clock Tree Depth, Number of Potential Critical Paths, and Clock Period of the Considered Benchmarks.

The clock gating probabilities of the different nodes of the clock tree were determined randomly such that the gating probability of each node was larger than the probabilities of the nodes closer to the clock source (tree root). This was performed by defining maximum and minimum boundaries for the clock gating probability of the buffers in each clock tree depth level. These boundaries are obtained from

$$\left[ (i-1)\frac{MPCGP}{CT_{Depth}} \le RGP \le i\,\frac{MPCGP}{CT_{Depth}} \right] \quad \text{for } i = 1 \text{ to } CT_{Depth}$$
(11)

where MPCGP is the maximum predefined clock gating probability which was considered to be 70% in this work,  $CT_{Depth}$  is the depth of the clock tree of the circuit, and RGP is a uniform random number within the specified range.
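The sampling described by (11) can be sketched as follows. This is a minimal illustration rather than the script used for the experiments: the function names are hypothetical, and the sketch assumes that level $i = 1$ is the level closest to the clock source, so that the gating probability grows toward the flip-flops.

```python
import random
from typing import List, Optional, Tuple


def gating_probability_bounds(
    level: int, depth: int, mpcgp: float = 0.70
) -> Tuple[float, float]:
    """Lower and upper bounds of RGP for a clock tree level, as in (11)."""
    if not 1 <= level <= depth:
        raise ValueError("level must be between 1 and the clock tree depth")
    step = mpcgp / depth
    return (level - 1) * step, level * step


def sample_gating_probabilities(
    depth: int, mpcgp: float = 0.70, rng: Optional[random.Random] = None
) -> List[float]:
    """Draw one uniform random gating probability per clock tree level."""
    rng = rng or random.Random()
    return [
        rng.uniform(*gating_probability_bounds(level, depth, mpcgp))
        for level in range(1, depth + 1)
    ]


# Example for a clock tree of depth 4 with MPCGP = 70%; the seed is arbitrary.
probabilities = sample_gating_probabilities(4, rng=random.Random(2024))
```

Because the intervals of consecutive levels share an end point and do not otherwise overlap, the sampled probabilities are non-decreasing from the root toward the leaves.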

The proposed technique (CSAM) was evaluated under the two scenarios explained in Section 4. As the solver, we used NLopt, an open-source nonlinear optimization library [29]. In the first scenario, the objective function given in (9) was used to increase the lifetime. The minimum clock period (which should be larger than the maximum critical path delay) was obtained by using the IVC and INC techniques simultaneously under two conditions: with clock skew management (the proposed technique) and without clock skew management (the conventional approach). The time period considered for studying the impact of aging was 10 years.

The delay degradations of the two approaches for the different benchmarks are depicted in Figure 6. For all the benchmarks, the delay degradation of the proposed technique is smaller than that of the conventional approach. Next, to see the effect of this on the circuit lifetime, we assumed a guard band for the clock period of the circuit equal to the delay degradation of the proposed technique. This corresponds to a 10-year circuit lifetime for the suggested approach. The lifetimes of the circuits based on these guard bands are illustrated in Figure 7. The results show that, on average, the lifetime increases by 34% compared to the conventional technique, while the best and worst cases correspond to 77% and 4%, respectively.



Figure 6. Delay Degradation after 10 Years under the NBTI Effect for Different Benchmarks.



Figure 7. Lifetime Improvement Achieved by CSAM Technique in Comparison to Conventional Techniques.
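The guard-band argument above can be turned into a rough lifetime estimate. The sketch below is one possible reading, not the procedure used to produce Figure 7: it assumes that the delay degradation follows a power law $\Delta D(t) = A\,t^{n}$ with a time exponent of $n = 1/6$, a value often used with reaction-diffusion NBTI models, and it uses hypothetical function names and hypothetical 10-year degradations.

```python
def conventional_lifetime(
    deg_csam: float, deg_conv: float, years: float = 10.0, n: float = 1.0 / 6.0
) -> float:
    """Years until the conventional design uses up a guard band equal to deg_csam.

    deg_csam, deg_conv: delay degradations (same unit) after `years` for the
    CSAM and conventional designs. Assumes degradation = A * t**n.
    """
    a_conv = deg_conv / years ** n          # fit A for the conventional design
    return (deg_csam / a_conv) ** (1.0 / n)  # solve A * t**n = guard band for t


def lifetime_improvement(
    deg_csam: float, deg_conv: float, years: float = 10.0, n: float = 1.0 / 6.0
) -> float:
    """Relative lifetime gain of CSAM (lifetime = `years`) over the conventional design."""
    return years / conventional_lifetime(deg_csam, deg_conv, years, n) - 1.0


# Hypothetical 10-year degradations in ps; not read from Figure 6.
gain = lifetime_improvement(deg_csam=47.5, deg_conv=50.0)
```

Under this assumed power law, even a small reduction in the 10-year delay degradation translates into a much larger relative change in lifetime, because the lifetime scales with the degradation ratio raised to the power $1/n$.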

Next, we study the results for the second objective function formulated in (10). In this case, first, for each benchmark, we solved the optimization problem using the conventional approach. The minimum required guard band of the clock period for each benchmark was selected such that the delay degradation (minimized by using the IVC and INC techniques) did not exceed this guard band after the expected lifetime of ten years. Next, we used the determined guard band (clock period) as a constraint while minimizing the INC technique overhead. The number of transmission gates and areas of the circuits in the case of applying conventional and proposed techniques to the selected benchmarks are reported in Table II.

| Benchmark | Gates | Area (µm<sup>2</sup>) | Conventional: INC | Conventional: Area with INC (µm<sup>2</sup>) | Conventional: Area Overhead | CSAM: INC | CSAM: Area with INC (µm<sup>2</sup>) | CSAM: Area Overhead |
|-----------|-------|-------------|------------------------|-------------------------|----------|----------------|-------------------------|----------|
| b15       | 9271  | 14492       | 237                    | 15203                   | 4.9%     | 208            | 15116                   | 4.3%     |
| b17       | 27323 | 43432       | 309                    | 44359                   | 2.1%     | 270            | 44242                   | 1.9%     |
| b18       | 72124 | 108038      | 410                    | 109268                  | 1.1%     | 392            | 109214                  | 1.1%     |
| s838      | 472   | 633         | 11                     | 666                     | 5.2%     | 1              | 636                     | 0.5%     |
| s1488     | 988   | 633         | 56                     | 801                     | 26.5%    | 41             | 756                     | 19.4%    |
| s5378     | 1993  | 1107        | 306                    | 2025                    | 82.9%    | 248            | 1851                    | 67.2%    |
| s9234     | 1665  | 2532        | 425                    | 3807                    | 50.4%    | 371            | 3645                    | 44.0%    |
| s13207    | 2767  | 6266        | 331                    | 7259                    | 15.8%    | 301            | 7169                    | 14.4%    |
| s15850    | 3736  | 6780        | 368                    | 7884                    | 16.3%    | 297            | 7671                    | 13.1%    |
| s35932    | 14681 | 24722       | 354                    | 25784                   | 4.3%     | 347            | 25763                   | 4.2%     |

Table II. The Number of Gates, Number of INC Transmission Gates, and Areas of the Benchmarks in the Case of Applying the Conventional and CSAM Techniques.

Using the data presented in Table II, the overhead reductions achieved by the technique for the benchmarks are presented in Figure 8. The reduction in the number of INC structures varies between 2% and 90.9%, with an average of 25.7%.



Figure 8. Reduction in the Number of Transmission Gates Used to Implement the INC Technique.
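The per-benchmark reductions can be recomputed directly from the INC columns of Table II. The short sketch below does this; the variable names are illustrative, and the counts are copied from the table.

```python
# (conventional INC count, CSAM INC count) per benchmark, from Table II.
inc_counts = {
    "b15": (237, 208),
    "b17": (309, 270),
    "b18": (410, 392),
    "s838": (11, 1),
    "s1488": (56, 41),
    "s5378": (306, 248),
    "s9234": (425, 371),
    "s13207": (331, 301),
    "s15850": (368, 297),
    "s35932": (354, 347),
}

# Relative reduction in the number of transmission gates, in percent.
reduction_percent = {
    name: 100.0 * (conventional - csam) / conventional
    for name, (conventional, csam) in inc_counts.items()
}

smallest = min(reduction_percent, key=reduction_percent.get)
largest = max(reduction_percent, key=reduction_percent.get)
```

Here `smallest` and `largest` identify the benchmarks at the two ends of the reported range, s35932 and s838, respectively.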

As mentioned before, the NAND- and NOR-based ICG cells behave differently in the presence of the NBTI effect. In the proposed technique, by selecting a proper ICG cell, we either increase or decrease the clock skew such that the objective function is best satisfied (see (8), (9), and (10)). To show the approach, for the case of the first objective function, we have depicted the minimum and maximum clock skews for both the CSAM and conventional techniques in Figure 9 (a) and (b), respectively. As the results show, the proposed technique has increased the range of the clock skew to increase the lifetime. For some benchmarks, *e.g.*, b15, the maximum and minimum skews of the CSAM method are both negative. This implies that the clock periods for all potential critical paths have been stretched. For some other benchmarks, *e.g.*, s5378, the technique has reduced the clock period for some potential critical paths, inducing positive skews. Since the pipeline stages are in series, a positive skew for one stage may increase the clock period for the following pipeline stage. Note that a positive clock skew can be tolerated in the previous stage either because that stage has a shorter path or because the INC and IVC techniques have considerably reduced the delay degradation of that path.



Figure 9. a) Minimum and b) Maximum Clock Skew for both CSAM and Conventional NBTI Mitigation Techniques.

#### 6. CONCLUSION

In this paper, an NBTI-mitigation technique based on the simultaneous use of internal node control (INC), input vector control (IVC), and clock skew management techniques was suggested. In the clock skew management technique, both NAND- and NOR-based integrated clock gating (ICG) cells were invoked. The proposed technique (CSAM) increased the lifetime of the circuit while decreasing the overhead and complexity of the implementation compared to those of the conventional NBTI mitigation scheme. The NBTI-induced asymmetric aging created by the clock gating in both the logic path and clock tree circuit was formulated as a non-linear non-integer optimization problem which was solved by considering two objective functions. We evaluated the efficacy of the proposed method by solving the optimization problem for a set of 10 benchmark circuits from ISCAS89 and ITC99 for both the CSAM technique and the conventional approach, which did not invoke the clock skew management. The results indicated that the lifetime achieved by the proposed technique was, on average, 34% longer than that of the conventional method when there was no constraint on the overhead. Also, for the same lifetime, the suggested technique provided 25.7% lower area overhead. The achieved improvements showed the significance of the concurrent use of the clock skew management along with the INC and IVC techniques.

#### REFERENCES

- [1] S. Mahapatra, A. Islam, S. Deora, V. Maheta, K. Joshi and M. Alam, "Characterization and modeling of NBTI stress, recovery, material dependence and AC degradation using R-D framework," in *Proc. 18th IEEE Int. Symp. Physical and Failure Analysis of Integr. Circuits (IPFA)*, 2011.
- [2] J. Keane, X. Wang, D. Persaud and C. Kim, "An All-In-One Silicon Odometer for Separately Monitoring HCI, BTI, and TDDB," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 817-829, 2010.
- [3] K. Kang, S. P. Park, K. Roy and M. A. Alam, "Estimation of Statistical Variation in Temporal NBTI Degradation and its Impact on Lifetime Circuit Performance," in *Proc. IEEE/ACM Int. Conf. Comput.-Aided Des.*, 2007.
- [4] W. Wang, S. Yang, S. Bhardwaj, S. Vrudhula, F. Liu and Y. Cao, "The Impact of NBTI Effect on Combinational Circuit: Modeling, Simulation, and Analysis," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 1, no. 1, pp. 1-11, 2010.
- [5] S. Mahapatra, N. Goel, S. Desai, S. Gupta, B. Jose, S. Mukhopadhyay, K. Joshi, A. Jain, A. Islam and M. Alam, "A Comparative Study of Different Physics-Based NBTI Models," *IEEE Trans. Electron Devices*, vol. 60, no. 3, pp. 901-916, 2013.
- [6] S. Mahapatra, D. Saha, D. Varghese and P. Kumar, "On the generation and recovery of interface traps in MOSFETs subjected to NBTI, FN, and HCI stress," *IEEE Trans. Electron Devices*, vol. 53, no. 7, pp. 1583-1592, 2006.
- [7] S. Desai, S. Mukhopadhyay, N. Goel, N. Nanaware, B. Jose, K. Joshi and S. Mahapatra, "A comprehensive AC / DC NBTI model: Stress, recovery, frequency, duty cycle and process dependence," in *Proc. IEEE Int. Reliab. Phys. Symp. (IRPS)*, 2013.
- [8] S. Bhardwaj, W. Wang, R. Vattikonda, Y. Cao and S. Vrudhula, "Predictive Modeling of the NBTI Effect for Reliable Design," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, 2006.
- [9] P. Jain, F. Cano, B. Pudi and N. Arvind, "Asymmetric Aging: Introduction and Solution for Power-Managed Mixed-Signal SoCs," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 22, no. 3, pp. 691-695, 2014.
- [10] J. Velamala, K. Sutaria, V. Ravi and Y. Cao, "Failure Analysis of Asymmetric Aging Under NBTI," *IEEE Trans. Dev. Mat. Rel.*, vol. 13, no. 2, pp. 340-349, 2013.
- [11] M. W. a. K. Z. J. Stathis, "Reliability of advanced high-k/metal-gate n-FET devices," *Microelectronics Reliability*, vol. 50, no. 9, pp. 1199-1202, 2010.
- [12] S. Kumar, C. Kim and S. Sapatnekar, "Adaptive Techniques for Overcoming Performance Degradation Due to Aging in CMOS Circuits," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 19, no. 4, pp. 603-614, 2011.
- [13] J. Velamala, V. Ravi and Y. Cao, "Failure diagnosis of asymmetric aging under NBTI," in *Proc. IEEE/ACM Int. Conf. Comput.-Aided Des. (ICCAD)*, 2011.

- [14] M. Chen, V. Reddy, S. Krishnan, V. Srinivasan and Y. Cao, "Asymmetric Aging and Workload Sensitive Bias Temperature Instability Sensors," *IEEE Des. Test Comput.*, vol. 29, no. 5, pp. 18-26, 2012.
- [15] A. Chakraborty and D. Pan, "Skew Management of NBTI Impacted Gated Clock Trees," *IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.*, vol. 32, no. 6, pp. 918-927, 2013.
- [16] S. Han and J. Kim, "NBTI-aware statistical timing analysis framework," in *Proc. IEEE Int. SOC Conference (SOCC)*, 2010.
- [17] W. Wang, Z. Wei, S. Yang and Y. Cao, "An Efficient Method to Identify Critical Gates under Circuit Aging," in *Proc. IEEE/ACM Int. Conf. Comput.-Aided Des. (ICCAD)*, 2007.
- [18] X. Chen, Y. Wang, H. Yang, Y. Xie and Y. Cao, "Assessment of Circuit Optimization Techniques under NBTI," *IEEE Des. Test*, vol. 30, no. 6, pp. 40-49, 2013.
- [19] Y. Wang, H. Luo, K. He, R. Luo, H. Yang and Y. Xie, "Temperature-Aware NBTI Modeling and the Impact of Standby Leakage Reduction Techniques on Circuit Performance Degradation," *IEEE Trans. Dependable and Secure Computing*, vol. 8, no. 5, pp. 756-769, 2011.
- [20] F. Firouzi, S. Kiamehr and M. Tahoori, "Power-Aware Minimum NBTI Vector Selection Using a Linear Programming Approach," *IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.*, vol. 32, no. 1, pp. 100-110, 2013.
- [21] J. Abella, X. Vera and A. Gonzalez, "Penelope: The NBTI-Aware Processor," in *Proc. 40th Annu. IEEE/ACM Int. Symp. Microarchitecture (MICRO)*, 2007.
- [22] Y. Wang, X. Chen, W. Wang, Y. Cao, Y. Xie and H. Yang, "Leakage Power and Circuit Aging Cooptimization by Gate Replacement Techniques," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 19, no. 4, pp. 615-628, 2011.
- [23] D. R. Bild, R. P. Dick, and G. E. Bok, "Static NBTI Reduction Using Internal Node Control," *ACM Trans. Des. Autom. Electron. Syst.*, vol. 7, no. 4, pp. 45:1-45:30, 2012.
- [24] I.-C. Lin, C.-H. Lin and K.-H. Li, "Leakage and Aging Optimization Using Transmission Gate-Based Technique," *IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.*, vol. 32, no. 1, pp. 87-99, 2013.
- [25] J. M. Cohn, "Method for reducing design effect of wearout mechanisms on signal skew in integrated circuit designs," U.S. Patent 6651230, November 2003.
- [26] A. Chakraborty, G. Ganesan, A. Rajaram and D. Pan, "Analysis and optimization of NBTI induced clock skew in gated clock trees," in *Proc. Des., Autom. Test Eur.*, 2009.
- [27] B. Paul, K. Kang, H. Kufluoglu, M. Alam and K. Roy, "Impact of NBTI on the temporal performance degradation of digital circuits," *IEEE Electron Device Lett.*, vol. 26, no. 8, pp. 560-562, 2005.
- [28] "NanGate 45nm PDK Release v1.3," [Online]. Available: http://www.nangate.com.
- [29] "NLopt, Nonlinear Optimization Library," [Online]. Available: http://openopt.org/nlopt.