Optimization of CMOS Low Power High Speed Dual Edge Triggered Flip Flop
Harpreet Singh

ABSTRACT:
In recent years, demand has grown for high-speed digital circuits with low power consumption. Dual edge-triggered flip-flops allow the clock frequency to be halved relative to single edge-triggered flip-flops while maintaining the same data throughput, which translates into better performance in terms of both power dissipation and speed. Pulse-triggered flip-flops employ time borrowing across cycle boundaries, which results in zero or negative setup time. Moreover, the pulse generator can be shared among many flip-flops to reduce power dissipation and chip area. The pulse generator provides a narrow window to the latching stage, during which the flip-flop is in transparent mode. Reducing this pulse width reduces the setup time and hold time of the flip-flop. In this thesis, two dual edge-triggered D flip-flops are designed, one using the master-slave approach and the other using the pulsed-latch approach. One offers high performance while the other offers clock-skew tolerance. Compared with other flip-flops in recent publications, the Clock-to-Q delay, setup time, hold time and power consumption of this flip-flop are all smaller. In addition, the proposed design consists of only 15 transistors and thus requires less silicon area. The simulation results demonstrate that this dual edge-triggered flip-flop is a viable means to improve design performance and to ease tight timing budgets.

Keywords:- Dual Edge Triggered Flip Flop (DETFF), Dual Static Pulsed Edge Triggered Flip Flop (DSPFF), Flip Flop (FF), Power Delay Product (PDP)

I. INTRODUCTION
Over the past decades, Moore’s law has driven VLSI technology toward ever higher transistor densities. Chips today contain hundreds of millions or even billions of transistors, and as a result the power consumption of VLSI chips has been increasing steadily. The scaling trend of the last few years shows that the number of on-chip transistors increases by about 40% every year, while the operating frequency of VLSI systems increases by about 30% every year. Flip-flops and latches are two central categories of cells in cell libraries and are extremely important circuit elements in any synchronous VLSI chip. They are responsible for the correct timing, functionality, and performance of the chip, and their clocked devices also consume a significant portion of the total active power. Comparisons of the power breakdown among the different elements of VLSI chips show that latches and flip-flops are the major source of power consumption in synchronous systems. Because latches and flip-flops directly affect both the power consumption and the speed of VLSI systems, the study of low-power, high-performance latches and flip-flops is essential. A universal flip-flop with the best performance, lowest power consumption, and highest robustness against noise would be an ideal component for cell libraries. However, as this thesis shows, increasing the performance of flip-flops generally involves significant power and robustness trade-offs. A set of latches and flip-flops with different performance levels is therefore needed, so that the more power-hungry and noise-sensitive elements are used only in the small, performance-critical portions of the chip.

1.1 Flip-flops and Latches
Building a sequential machine requires memory elements that read a value, save it for some time, and then write the stored value somewhere else even if the element’s input value has subsequently changed. A Boolean logic gate can compute values, but its output changes shortly after its input changes. Each alternative circuit used as a memory element has its own advantages and disadvantages. A generic memory element has an internal memory and some circuitry to control access to it. Access to the internal memory is controlled by the clock input: the memory element reads its data input value when instructed by the clock and stores that value in its memory. The output reflects the stored value, typically after some delay. In CMOS circuits, memory is formed in two ways. The first approach uses positive feedback, or regeneration: one or more output signals are intentionally connected back to the inputs. This results in a class of elements called multivibrator circuits. The second approach uses charge storage to hold signal values. This approach, which is very popular in MOS design, requires regular refreshing because charge tends to leak away with time.

Memory elements differ in many key respects:
- Exactly what form of clock signal causes the input data value to be read.
- How the behavior of data around the read signal from clock affects the stored value.
- When the stored value is presented to the output.
- Whether there is ever a combinational path from the input to the output.

Introducing a terminology for memory elements requires caution. Many terms are used in slightly or grossly different ways by different people. However, in this thesis Dietmeyer’s convention is chosen, dividing memory elements into two major types [1]:
- **Latches** are transparent while the internal memory is being set from the data input and the possible changes of the input value can be transmitted to the output.
- **Flip-flops** are not transparent; reading the input value and changing the flip-flop’s output are two separate events. Figure 1.1 illustrates the differences at the output of a positive-edge-triggered flip-flop and an active-high latch. As it can be seen in this figure, possible changes of input can be seen at the output of the latch while it is transparent.

The factors which are desirable in flip-flops are as follows:
- High speed
- Low power consumption
- Noise stability
- Smaller area
- Supply voltage scalability
- Low glitch probability
- Less internal activity when data activity is low

Depending on the requirements of the system, the designer has to consider all these parameters when choosing a flip-flop structure. The decision is made harder by the fact that most of these parameters are not independent of each other. The trade-offs among them make the choice a multi-dimensional optimization problem for high-performance systems.

Flip-flops are extremely important circuit elements in all synchronous VLSI circuits. They are responsible for the correct timing, functionality and performance of the chip, and together with the clock distribution network they consume a significant portion of the total power of the circuit. The power consumption of the clock system, which consists of the clock distribution network and the storage elements, is estimated to be as high as 20%–45% of the total system power [2]. Compared with other elements in VLSI circuits, flip-flops are the primary source of power consumption in synchronous systems. Flip-flops also have a large impact on circuit speed: parameters such as the Clock-to-Q delay, setup time and hold time affect the performance of the whole circuit.

Several timing parameters are associated with a flip-flop. Some of these parameters are specific to the logic family to which the flip-flop belongs, while others differ among flip-flops within the same broad logic family. It is therefore important to consider these timing parameters before using a flip-flop in a given application. The most important ones are setup and hold times, propagation delay, clock pulse HIGH and LOW times, asynchronous input active pulse width, clock transition time and maximum clock frequency.

### 1.2 Power Consumption in CMOS Circuits

To reduce the power dissipation of a CMOS circuit, its various sources must be identified. Two types of power consumption are relevant to circuit design: average power and peak power. Peak power is related to the maximum instantaneous current drawn from the supply, which can cause large voltage drops or bounces on the resistive power and ground rails. This can badly affect circuit reliability and cause overheating of the devices, which degrades circuit performance. It is therefore essential to keep peak power under control. The average power dissipation determines the battery size and weight needed to operate the circuit for a given amount of time. For almost all low-power applications, minimizing average power is more critical than minimizing peak power. Fortunately, approaches that optimize average power also reduce peak power, as noted in [1].

In typical digital CMOS circuits, there are two main classes of power dissipation: dynamic power \( P_{\text{dyn}} \) and static power \( P_{\text{static}} \). The term dynamic arises from the transient switching behavior of the CMOS circuit. Dynamic power can be further divided into two main components: switching power \( P_{\text{switch}} \) and short-circuit power \( P_{\text{sc}} \).

Similarly, static power has two main elements: DC power \( P_{\text{dc}} \) and leakage power \( P_{\text{leakage}} \).
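
The decomposition described above can be summarized in a single line using the symbols already introduced. The name \( P_{\text{total}} \) for the sum is introduced here only for convenience and does not appear elsewhere in the text:

\[
P_{\text{total}} = \underbrace{P_{\text{switch}} + P_{\text{sc}}}_{P_{\text{dyn}}} + \underbrace{P_{\text{dc}} + P_{\text{leakage}}}_{P_{\text{static}}}
\]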

### 1.2.1 Dynamic Power

#### 1.2.1.1 Switching Power

When a data transition occurs in a CMOS gate, energy is drawn from the power supply to charge the load capacitance from 0 to \( V_{\text{dd}} \).

For example, in the CMOS inverter shown in figure 1.2, the load is charged through the PMOS transistor. The energy stored on the load capacitance at the end of the charging process, which equals the energy dissipated as heat in the PMOS transistor, is

\[
E_p = \int_0^{V_{\text{dd}}} C_L V_L \, dV_L = \frac{1}{2} V_{\text{dd}}^2 C_L \tag{1.1}
\]

Figure 0.2 Switching power dissipation in CMOS inverter

This \( \frac{1}{2} V_{\text{dd}}^2 C_L \) energy stored in the output capacitance is released when the load capacitance discharges, which occurs when the output of the inverter transitions from logic ‘1’ to logic ‘0’. The load capacitance of a CMOS logic gate consists of the output node capacitance of the gate, the effective interconnect capacitance, and the input node capacitance of the driven gate.
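
As a rough numerical illustration of Equation (1.1), the following Python sketch evaluates the stored energy and the resulting average switching power. The relation P = a · C_L · V_dd² · f used for the power is the standard textbook form and is not stated explicitly in this paper. The function names, the activity factor `activity`, and all numerical values are assumptions chosen only for illustration.

```python
def switching_energy(c_load, vdd):
    """Energy (J) stored on the load after charging from 0 to vdd: 0.5 * C * Vdd^2."""
    return 0.5 * c_load * vdd ** 2


def switching_power(c_load, vdd, freq, activity=1.0):
    """Average switching power (W), textbook form: activity * C * Vdd^2 * f.

    One full charge/discharge cycle draws C * Vdd^2 from the supply:
    half is dissipated while charging, half while discharging.
    """
    return activity * c_load * vdd ** 2 * freq


# Illustrative values only (assumed, not taken from the paper).
C_LOAD = 10e-15   # 10 fF load capacitance
VDD = 1.0         # supply voltage in volts
F_CLK = 1e9       # 1 GHz toggle rate

energy = switching_energy(C_LOAD, VDD)
power_full = switching_power(C_LOAD, VDD, F_CLK)
power_half = switching_power(C_LOAD, VDD, F_CLK / 2)

print(f"stored energy per transition: {energy:.3e} J")
print(f"switching power at f:         {power_full:.3e} W")
print(f"switching power at f/2:       {power_half:.3e} W")
```

Under this model, a node that toggles at half the frequency dissipates half the switching power, which is the motivation for halving the clock frequency with dual edge triggering.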

#### 1.2.1.2 Short Circuit Power

Short-circuit power is the second component of dynamic power. It arises because finite (non-zero) rise and fall times create a direct current path between the supply and ground during switching. This component is usually not significant in logic design, but it appears in transistors that drive large capacitances, such as bus wires and especially off-chip circuits. As on-chip wires have become narrower, long wires have more resistance, and CMOS gates at the end of those resistive wires see slow input transitions.

Consider the inverter in figure 1.2. Figure 1.3 shows the short-circuit current \( I_{\text{sc}} \) when the inverter is driven by a rising ramp input from time 0 to \( T_R \). While \( V_{\text{th}} < V_{\text{in}} < (V_{\text{dd}} - V_{\text{th}}) \) holds, there is a conductive path between \( V_{\text{dd}} \) and \( Gnd \) because both the NMOS and PMOS devices are on simultaneously.

Short-circuit power is usually estimated as:

\[
P_{\text{sc}} = \frac{1}{12} k \cdot \tau \cdot f \cdot (V_{\text{dd}} - 2V_{\text{th}})^3 \tag{1.2}
\]

where \( \tau \) is the rise and fall time (assumed equal), \( f \) is the switching frequency, and \( k \) is the gain factor of the transistor [2].

Figure 0.3 Short-circuit power dissipation in CMOS inverter

Figure 0.4 Short circuit current in CMOS inverter
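
Equation (1.2) can be evaluated numerically with a short Python sketch. It reads the equation as P = (k/12) · τ · f · (V_dd − 2V_th)³, where the frequency term is taken to be the switching frequency. That reading, the function name, and all numerical values are assumptions made for illustration only.

```python
def short_circuit_power(k, tau, freq, vdd, vth):
    """Short-circuit power estimate (W): (k / 12) * tau * f * (Vdd - 2 * Vth)^3.

    Returns 0 when Vdd <= 2 * Vth, because the NMOS and PMOS devices
    are then never on at the same time (equal thresholds assumed).
    """
    overdrive = vdd - 2.0 * vth
    if overdrive <= 0:
        return 0.0
    return (k / 12.0) * tau * freq * overdrive ** 3


# Illustrative values only (assumed, not taken from the paper).
K_GAIN = 1e-4    # transistor gain factor in A/V^2
TAU = 100e-12    # input rise/fall time in seconds
FREQ = 1e9       # switching frequency in Hz

p_sc = short_circuit_power(K_GAIN, TAU, FREQ, vdd=1.0, vth=0.3)
p_sc_slow_edges = short_circuit_power(K_GAIN, 2 * TAU, FREQ, vdd=1.0, vth=0.3)
p_sc_low_vdd = short_circuit_power(K_GAIN, TAU, FREQ, vdd=0.5, vth=0.3)

print(f"short-circuit power:            {p_sc:.3e} W")
print(f"with doubled rise/fall time:    {p_sc_slow_edges:.3e} W")
print(f"with Vdd below 2 * Vth:         {p_sc_low_vdd:.3e} W")
```

The estimate grows linearly with the edge time and with the cube of (V_dd − 2V_th), so slow input edges and high supply voltages are the two main contributors.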

### 1.2.2 Static Power

Unlike dynamic power, static power is consumed in steady state, when there is no input or output transition. Static power has two sources: DC power and leakage power. The first is an inherent property of some CMOS circuit styles, while the second results from the fact that a MOS transistor is not a perfect switch and therefore leaks some current.

**1.2.2.1 DC Power**

A conventional CMOS logic gate (figure 1.1) dissipates no DC power and consumes power only during logic transitions, as explained in the previous section. However, some CMOS circuit families, such as pseudo-NMOS, draw DC current when the output is at logic ‘0’, as shown in figure 1.4 for a pseudo-NMOS inverter. Note that in this case logic ‘0’ is not equal to $V_{ss}$ but depends on the current ratio between the PMOS and NMOS devices. For a good noise margin (low $V_{OL}$), the PMOS pull-up is made much weaker than the NMOS pull-down, which degrades the rise time. These drawbacks, together with the DC current dissipation, limit the applications that can benefit from this logic style.

![Figure 0.5 DC power dissipation in pseudo-NMOS inverter](image)

**1.2.2.2 Leakage Power**

The second source of static power is leakage power. Even when a transistor is in a stable logic state, simply because it is powered on it continues to leak small amounts of current at almost all junctions, due to effects such as reverse-biased diode leakage, gate-induced drain leakage and gate oxide tunneling. Figure 1.5 shows the various components of leakage in an NMOS transistor.

![Figure 0.6 Components of Leakage current in NMOS transistor](image)

**1.3 Subthreshold Leakage**

Subthreshold current flows from source to drain even when the gate-to-source voltage is below the threshold voltage of the device. This is due to the weak inversion effect: when the gate voltage is below $V_T$, carriers move by diffusion along the surface, similar to charge transport across the base of a bipolar transistor. Weak inversion current becomes significant when the gate-to-source voltage is smaller than but very close to the threshold voltage. The second prominent effect is Drain-Induced Barrier Lowering (DIBL), which is essentially the reduction of the threshold voltage of the transistor at higher drain voltages. As the drain voltage increases, the depletion region of the p-n junction between drain and body grows and extends underneath the gate, so the drain takes on a greater share of the burden of balancing the depletion region charge and leaves a smaller share for the gate. As a result, the charge on the gate maintains charge balance by attracting more carriers into the channel, an effect equivalent to lowering the device’s threshold voltage.
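
The two effects described above can be illustrated with a simplified exponential model of subthreshold current. This is a commonly used textbook approximation, not a model given in this paper. The function name, the pre-factor `i0`, the slope factor `n`, the DIBL coefficient `eta`, and all numerical values are assumptions chosen only to show the trend.

```python
import math


def subthreshold_current(vgs, vds, vt0, i0=1e-7, n=1.5, eta=0.1, temp_k=300.0):
    """Simplified subthreshold (weak inversion) drain current in amperes.

    I = i0 * exp((Vgs - Vt_eff) / (n * vT)) * (1 - exp(-Vds / vT))
    with Vt_eff = vt0 - eta * Vds modelling DIBL as a threshold reduction.
    All default parameters are illustrative assumptions.
    """
    k_over_q = 8.617333e-5            # Boltzmann constant / electron charge, V/K
    v_thermal = k_over_q * temp_k     # thermal voltage, about 26 mV at 300 K
    vt_eff = vt0 - eta * vds          # DIBL lowers the effective threshold
    return (i0 * math.exp((vgs - vt_eff) / (n * v_thermal))
            * (1.0 - math.exp(-vds / v_thermal)))


# Off-state leakage (Vgs = 0) at two drain voltages, and closer to threshold.
i_off_low_vds = subthreshold_current(vgs=0.0, vds=0.5, vt0=0.3)
i_off_high_vds = subthreshold_current(vgs=0.0, vds=1.0, vt0=0.3)
i_near_threshold = subthreshold_current(vgs=0.2, vds=1.0, vt0=0.3)

print(f"Vgs=0.0 V, Vds=0.5 V: {i_off_low_vds:.3e} A")
print(f"Vgs=0.0 V, Vds=1.0 V: {i_off_high_vds:.3e} A")
print(f"Vgs=0.2 V, Vds=1.0 V: {i_near_threshold:.3e} A")
```

In this sketch the leakage rises with drain voltage (the DIBL term) and rises steeply as the gate voltage approaches the threshold (the weak inversion term), matching the qualitative description in the text.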

**1.4 Low Power Design Approaches**

A low-power design methodology has been developed to minimize the power of integrated circuits. It spans from process-level technology modifications all the way up to high-level system design [7]. The designer can optimize at every level of the design space, and the optimizations have a cumulative effect on total system power reduction. Figure 1.6 shows a block diagram of low-power design techniques.

**1.4.1 Power reduction through gate design**

At the gate level, several approaches can be applied to the design, including:

- Use of a low-power CMOS standard cell library.
- Logic minimization.

1.4.2 Power reduction through circuit design

To minimize the power at circuit level, many techniques can be used such as:

- Optimum transistor sizing.
- Clever circuit design techniques that minimize internal swing.
- Reduce the switching activity by logic minimization.
- Reduced $V_{DD}$ in non-critical paths

1.4.3 Gate sizing

The power dissipated by a gate is directly proportional to its capacitive load $C_L$, whose main components are: i) the output capacitance of the gate itself (due to parasitics), ii) the wire capacitance, and iii) the input capacitance of the gates in its fanout. The output and input capacitances of gates are proportional to the gate size. Reducing the gate size reduces its capacitance but increases its delay. Hence, to preserve the timing behavior of the synchronous digital circuit, not all gates can be made smaller; only those that do not belong to a critical path can be slowed down.

1.4.4 Supply voltage and frequency scaling

Dynamic power is proportional to the square of the operating voltage, so reducing the voltage significantly reduces power consumption. Furthermore, since frequency is directly proportional to supply voltage, the frequency of the circuit can also be reduced, making a cubic power reduction possible. However, the delay of a circuit also depends on the supply voltage:

$$\tau = kC_L \frac{V_{dd}}{(V_{dd} - V_T)^2} \tag{1.11}$$

where $\tau$ is the circuit delay, $k$ is the gain factor, $C_L$ is the load capacitance, $V_T$ is the threshold voltage, and $V_{dd}$ is the supply voltage. Thus, although reducing the voltage can achieve a cubic power reduction, the execution time increases. The main challenge in voltage and frequency scaling is therefore to reduce power while still meeting all timing constraints.
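
The delay penalty of voltage scaling in Equation (1.11) can be made concrete with a small Python sketch. The function name and the values of the constant, the load capacitance and the threshold voltage are assumptions for illustration. Only the ratio of the two delays is meaningful here, since the constant and the capacitance cancel.

```python
def gate_delay(k, c_load, vdd, vt):
    """Circuit delay from Equation (1.11): k * C_L * Vdd / (Vdd - Vt)^2."""
    if vdd <= vt:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return k * c_load * vdd / (vdd - vt) ** 2


# Illustrative values only (assumed, not taken from the paper).
K_CONST = 1.0     # lumped constant; cancels in the ratio below
C_LOAD = 10e-15   # load capacitance in farads
V_T = 0.3         # threshold voltage in volts

delay_high = gate_delay(K_CONST, C_LOAD, vdd=1.2, vt=V_T)
delay_low = gate_delay(K_CONST, C_LOAD, vdd=0.8, vt=V_T)
slowdown = delay_low / delay_high

print(f"delay at 0.8 V relative to 1.2 V: {slowdown:.2f}x")
```

With these assumed values, scaling the supply from 1.2 V to 0.8 V roughly doubles the delay, which illustrates why voltage scaling must be checked against the timing constraints.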

1.4.5 Techniques for reducing average short circuit power

Short-circuit power is directly proportional to the rise and fall times at the gate inputs, so reducing the input transition times decreases the short-circuit current component. However, propagation delay requirements have to be considered while doing so. Short-circuit currents are significant in CMOS VLSI circuits when the rise/fall time at the input of a gate is much larger than the rise/fall time at its output, because the short-circuit path is then active for a longer period of time. To minimize the total average short-circuit current, it is advantageous to have equal input and output edge times. In this case, the power consumed by short-circuit current is typically less than 10% of the total dynamic power. An important point to note is that if the supply is lowered below the sum of the thresholds of the transistors, $V_{dd} < V_{Tn} + |V_{Tp}|$, short-circuit currents are eliminated because the two devices are never on at the same time for any input voltage.
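
The condition for eliminating short-circuit current can be checked with a few lines of Python. The sketch computes the range of input voltages over which both devices conduct; the function name and the example voltages are assumptions for illustration.

```python
def short_circuit_window(vdd, vtn, vtp):
    """Return the (low, high) input-voltage range in which both devices are on.

    The NMOS conducts for Vin > Vtn and the PMOS for Vin < Vdd - |Vtp|.
    Returns None when Vdd <= Vtn + |Vtp|, i.e. when the range is empty.
    """
    low = vtn
    high = vdd - abs(vtp)
    if high <= low:
        return None
    return (low, high)


# Illustrative thresholds only (assumed, not taken from the paper).
window_nominal = short_circuit_window(vdd=1.0, vtn=0.3, vtp=-0.3)
window_low_vdd = short_circuit_window(vdd=0.5, vtn=0.3, vtp=-0.3)

print("Vdd = 1.0 V:", window_nominal)
print("Vdd = 0.5 V:", window_low_vdd)
```

For the assumed thresholds, a 1.0 V supply leaves a conduction window between roughly 0.3 V and 0.7 V, while a 0.5 V supply leaves none.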

II. PROPOSED WORK

2.1 Dual Edge Trigger Static Pulsed Flip-flop (DSPFF)

The dual-edge triggered static pulsed flip-flop (DSPFF) is shown in Fig. 2.1. In its pulse generator, four inverters generate the inverted and delayed clock signals. These signals, together with two nMOS pass transistors, create a narrow sampling window at both the rising and falling edges of the clock. Once the PULS signal is generated, both pass transistors, $N_1$ and $N_2$, are turned on to capture the input data. A smaller delay is obtained because $DB$ and $D$ are fed directly to their respective nodes.
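
The behavior of the pulse generator can be sketched at the logic level in Python. The model below is a behavioral abstraction, not the transistor-level circuit: it assumes the sampling window is high whenever the clock differs from a delayed copy of itself, which produces one narrow pulse after every clock transition. The function name, the sample-based delay, and the test waveform are assumptions for illustration.

```python
def dual_edge_pulses(clk, delay):
    """Return PULS samples for a sampled clock waveform.

    PULS is 1 while the clock differs from a copy of itself delayed by
    `delay` samples, so a pulse of width `delay` follows every rising
    and every falling clock edge.
    """
    pulses = []
    for i, level in enumerate(clk):
        delayed = clk[i - delay] if i >= delay else clk[0]
        pulses.append(level ^ delayed)
    return pulses


# Two clock periods, four samples per half period (illustrative waveform).
clk = [0] * 4 + [1] * 4 + [0] * 4 + [1] * 4

narrow = dual_edge_pulses(clk, delay=1)
wide = dual_edge_pulses(clk, delay=2)

print("clk  :", clk)
print("PULS :", narrow)
print("pulse positions (delay=1):", [i for i, p in enumerate(narrow) if p])
print("pulse positions (delay=2):", [i for i, p in enumerate(wide) if p])
```

In this abstraction the pulse width equals the delay of the inverter chain, so a shorter delay gives a narrower transparency window.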

![Fig 2.1 Schematic Diagram Of DSPFF](image-url)

The DET flip-flop given in [2] is shown in figure 3.2. It is basically a master-slave flip-flop structure with two data paths. The upper data path consists of a negative-edge-triggered flip-flop implemented using transmission gates. The lower data path consists of a positive-edge-triggered flip-flop, also implemented using transmission gates. Both data paths have feedback loops connected such that, whenever the clock is stopped, the logic level at the output is retained. This flip-flop has 22 transistors, of which 12 are clocked.

The DET flip-flop proposed in [2] is shown in figure 3.3. It is similar to figure 1 except that the feedback has been changed: in the feedback path of both latches, the input data controls the passing of the clock signals. If $clk = 1$, $M1$ turns on. When $D = 0$, Node A discharges to 0 and Node B switches to 1, which turns $M2$ on. As a consequence, $M1$ and $M2$ attempt to write 0 and $(VDD - Vtn)$ simultaneously onto Node A. This voltage conflict persists until the clock returns to 0, so the structure allows a large current to flow at the input. Another problem with this circuit is the reduced noise margin. The degraded voltage level at Node A also causes a direct-path current in the subsequent inverters, which increases power consumption. Power consumption increases similarly in the other cases.

Fig. 2.3 Schematic Diagram Of DETFF2

2.3 Proposed Double-Edge Triggered Flip-Flop

The proposed design is shown in Fig. 3. It is basically a master-slave flip-flop structure consisting of two data paths. The upper data path consists of transmission gates TG1 and TG2 and inverter I1. The lower data path consists of transmission gates TG3 and TG4 and inverter I3. The input data is connected to TG1 and TG3, and the output is taken from inverter I5, whose input is connected to TG2 and TG4. Both data paths have feedback loops connected such that, whenever the clock is stopped, the logic level at the output is retained, which maintains static functionality. The transmission gates in the two data paths are clocked such that the upper data path works as a positive-edge-triggered flip-flop and the lower data path works as a negative-edge-triggered flip-flop. The feedback in the upper data path consists of inverter I2 and pass transistor P1, and the feedback in the lower data path consists of inverter I4 and pass transistor P2. Our design is identical to figure 1 except for the feedback. The feedback transmission gates of figure 3.1 are not on the critical paths, so we replaced them with pass transistors, which improves the power efficiency of the flip-flop. The novelty of the proposed flip-flop thus lies in the feedback strategy, which makes the design static using a pass transistor and a transmission gate. The main advantages of the proposed design are increased performance and low power consumption with a low transistor count.
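
The two-path operation described above can be illustrated with a small behavioral model in Python. This is a logic-level sketch only: it ignores the feedback network, signal polarity through the inverters, and all timing, and it assumes that each path holds one stored value and that the output always comes from the path that is currently holding. The class and attribute names are hypothetical.

```python
class DualEdgeFlipFlop:
    """Behavioral model of a two-path dual edge-triggered flip-flop."""

    def __init__(self):
        self.upper = 0  # upper path: follows D while clk == 0, holds while clk == 1
        self.lower = 0  # lower path: follows D while clk == 1, holds while clk == 0

    def step(self, clk, d):
        """Apply one (clk, d) sample and return the output Q."""
        if clk == 0:
            self.upper = d        # upper path samples; lower path drives Q
            return self.lower
        self.lower = d            # lower path samples; upper path drives Q
        return self.upper


# D is sampled just before each clock edge and appears at Q after the edge.
clk_wave = [0, 0, 1, 1, 0, 0, 1, 1]
d_wave = [1, 0, 0, 1, 1, 0, 0, 1]

ff = DualEdgeFlipFlop()
q_wave = [ff.step(c, d) for c, d in zip(clk_wave, d_wave)]

print("clk:", clk_wave)
print("D  :", d_wave)
print("Q  :", q_wave)
```

In this trace Q changes both after the falling edge and after the following rising edge, so one new data value is delivered per clock edge rather than per clock period.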

Fig. 2.4 Schematic Diagram Of DETFF3

III. SIMULATION AND RESULTS

We analyzed several conventional flip-flops and the new flip-flop in 65nm technology. Each flip-flop is optimized for power-delay product. Simulation results were captured while varying the clock frequency from 400MHz to 2GHz and the supply voltage from 0.8V to 1.2V. Transistor sizing was also varied from 65nm to 120nm.

Table 1 Comparison of power, delay and PDP of flip-flops

<table>
<thead>
<tr>
<th>Flip-Flops</th>
<th>Clk-Q Delay (ps)</th>
<th>Power (µW)</th>
<th>PDP (10⁻¹⁸ J)</th>
</tr>
</thead>
<tbody>
<tr>
<td>DSPFF</td>
<td>410.21</td>
<td>2.20</td>
<td>902.46</td>
</tr>
<tr>
<td>DETFF1</td>
<td>24.01</td>
<td>2.39</td>
<td>57.38</td>
</tr>
<tr>
<td>DETFF2</td>
<td>23.31</td>
<td>26.47</td>
<td>617.01</td>
</tr>
</tbody>
</table>
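
The PDP column of Table 1 can be reproduced from the delay and power columns. The short Python sketch below multiplies the Clock-to-Q delay in picoseconds by the power in microwatts, which yields a product in units of 10⁻¹⁸ J. The dictionary and function names are hypothetical; the numbers are the ones listed in Table 1.

```python
# (Clk-Q delay in ps, power in µW) as listed in Table 1.
TABLE_1 = {
    "DSPFF": (410.21, 2.20),
    "DETFF1": (24.01, 2.39),
    "DETFF2": (23.31, 26.47),
}


def power_delay_product(delay_ps, power_uw):
    """PDP in units of 1e-18 J: (1e-12 s) * (1e-6 W) = 1e-18 J per ps*µW."""
    return delay_ps * power_uw


pdp = {name: power_delay_product(*row) for name, row in TABLE_1.items()}

for name, value in pdp.items():
    print(f"{name:7s} PDP = {value:8.2f} x 1e-18 J")
```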
Table 2 comparison of power consumption at different supply voltages of flip flops

<table>
<thead>
<tr>
<th>VDD (V)</th>
<th>DETFF1 (µW)</th>
<th>DETFF2 (µW)</th>
<th>Proposed DETFF (µW)</th>
<th>Improvement % Over 1</th>
<th>Improvement % Over 2</th>
</tr>
</thead>
<tbody>
<tr>
<td>0.8</td>
<td>1.37</td>
<td>11.8</td>
<td>1.04</td>
<td>24.08</td>
<td>91.19</td>
</tr>
<tr>
<td>0.9</td>
<td>1.86</td>
<td>18.2</td>
<td>1.24</td>
<td>33.33</td>
<td>93.20</td>
</tr>
<tr>
<td>1.0</td>
<td>2.39</td>
<td>26.4</td>
<td>1.57</td>
<td>34.30</td>
<td>97.09</td>
</tr>
<tr>
<td>1.1</td>
<td>3.19</td>
<td>36.4</td>
<td>1.96</td>
<td>38.55</td>
<td>94.62</td>
</tr>
<tr>
<td>1.2</td>
<td>4.02</td>
<td>48.4</td>
<td>2.43</td>
<td>39.55</td>
<td>94.98</td>
</tr>
</tbody>
</table>
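
The improvement columns appear to be computed as the relative reduction with respect to each reference design. The Python sketch below applies that formula, which is an assumption inferred from the tabulated values, to the 0.8 V and 1.2 V rows of Table 2. The function name is hypothetical.

```python
def improvement_percent(reference, proposed):
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


# (DETFF1, DETFF2, proposed DETFF) power values from Table 2.
ROW_0V8 = (1.37, 11.8, 1.04)
ROW_1V2 = (4.02, 48.4, 2.43)

over_1_at_0v8 = improvement_percent(ROW_0V8[0], ROW_0V8[2])
over_2_at_0v8 = improvement_percent(ROW_0V8[1], ROW_0V8[2])
over_1_at_1v2 = improvement_percent(ROW_1V2[0], ROW_1V2[2])
over_2_at_1v2 = improvement_percent(ROW_1V2[1], ROW_1V2[2])

print(f"0.8 V: {over_1_at_0v8:.2f}% over DETFF1, {over_2_at_0v8:.2f}% over DETFF2")
print(f"1.2 V: {over_1_at_1v2:.2f}% over DETFF1, {over_2_at_1v2:.2f}% over DETFF2")
```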

Table 3 comparison of PDP at different supply voltages of flip flops

<table>
<thead>
<tr>
<th>VDD (V)</th>
<th>DETFF1 (10⁻¹⁸ J)</th>
<th>DETFF2 (10⁻¹⁸ J)</th>
<th>Proposed DETFF (10⁻¹⁸ J)</th>
<th>Improvement % over 1</th>
<th>Improvement % over 2</th>
</tr>
</thead>
<tbody>
<tr>
<td>0.8</td>
<td>46.3</td>
<td>420.</td>
<td>17.9</td>
<td>61.26</td>
<td>95.73</td>
</tr>
<tr>
<td>0.9</td>
<td>51.1</td>
<td>552.</td>
<td>17.4</td>
<td>65.81</td>
<td>96.83</td>
</tr>
<tr>
<td>1.0</td>
<td>57.3</td>
<td>617.</td>
<td>18.9</td>
<td>66.95</td>
<td>98.74</td>
</tr>
<tr>
<td>1.1</td>
<td>67.0</td>
<td>771.</td>
<td>21.0</td>
<td>68.64</td>
<td>97.27</td>
</tr>
<tr>
<td>1.2</td>
<td>81.0</td>
<td>1024.</td>
<td>23.9</td>
<td>70.48</td>
<td>97.66</td>
</tr>
</tbody>
</table>

Figure 3.1 Variation Of PDP

IV CONCLUSION & FUTURE WORK

4.1 Conclusion
In this thesis, a dual edge-triggered flip-flop with high performance and low power is designed and its timing analysis is performed. A dual edge-triggered flip-flop offers the same data throughput as a single edge-triggered flip-flop at half the clock frequency, which translates into better performance in terms of both power dissipation and speed. A dual edge-triggered static pulsed D flip-flop, consisting of a clock generator and a pulsed latch, is also designed. The latch stage employs time borrowing, which makes the design skew tolerant. The pulse generator can generate a very narrow pulse at both the rising and falling edges of the clock. In the latching stage of the flip-flop, the inputs D and Dbar help the outputs Q and Qbar charge or discharge directly, which reduces the Clock-to-Q delay. Compared with other flip-flops in recent publications, this flip-flop has an improved Clock-to-Q delay and reduced power consumption.

4.2 Future Scope:
This work can be extended by designing dual edge-triggered flip-flops with different topologies. FinFETs or nano devices, e.g. carbon nanotubes, could be used in place of CMOS. Performance parameters could also be improved by choosing different width-to-length ratios.

REFERENCES

[2] Imran Ahmed Khan, Danish Shaikh and Mirza Tariq Beg, “2 GHz Low Power Double Edge Triggered flip-flop in 65nm CMOS Technology”, IEEE 2012.


