# IMPLEMENTATION OF LOW POWER AND LOW ENERGY SYNCHRONOUS SAPT LOGIC

Chitambara Rao.K<sup>1</sup>, Nagendra.K<sup>2</sup>, Sreenivasa Rao.Ijjada<sup>3</sup>

<sup>1</sup>Department of ECE, AITAM College of Engineering, Tekkali, Srikakulam,India rao\_chidu@ymail.com <sup>2</sup>Department of ECE, GIT, GITAM University, Visakhapatnam, India nage.kotaru@gmail.com <sup>3</sup>Department of ECE, GIT, GITAM University, Visakhapatnam, India isnaidu2003@gmail.com

#### ABSTRACT

This paper presents the design and implementation of a low-energy synchronous, self-timed logic topology using sense amplifier-based pass transistor logic (SAPTL). The SAPTL structure enables very low power computation by confining leakage current to well-controlled networks and operating at reduced supply voltages. Introducing synchronous operation in SAPTL further improves energy-delay performance without a significant increase in hardware complexity. A simple XOR gate is implemented in the SAPTL architecture, and its power consumption is lower than that of the other logic styles considered.

#### **KEYWORDS**

Low-voltage low-power logic styles, pass-transistor logic, VLSI circuit design, low-leakage circuits, pass transistor, self-timing, sense amplifier-based pass transistor logic (SAPTL), high-speed circuits, MOSFET logic devices, 90nm CMOS

## **1. INTRODUCTION**

As technology continues to scale, the supply voltage and the device threshold voltage are scaled down together to achieve the required device performance. Reducing the supply voltage effectively reduces dynamic energy consumption, but it increases leakage energy because of the lower device threshold voltage needed to maintain performance [1]. As a result, for low-energy applications, the leakage energy that the system can tolerate ultimately limits the minimum device threshold voltage. Speed, therefore, benefits little from technology scaling. The continued scaling of transistor feature sizes also leads to an increase in integration density, which brings a corresponding increase in compute density.

This scaling also increases overall circuit power consumption, and that increase is preventing designers from fully harnessing the benefits of smaller transistor feature sizes. For applications that are severely energy limited, such as those using implantable electronics, the energy per operation must continue to decrease, allowing for years of battery life at relatively low operating frequencies and power levels. The best way to reduce the energy per operation is to reduce the supply voltage, Vdd.
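The dependence on supply voltage can be made explicit with the standard first-order expression for the switching energy of a CMOS node. The symbols $\alpha$ (activity factor) and $C_L$ (switched load capacitance) are notation introduced here for illustration and do not appear elsewhere in this paper:

$$E_{dyn} = \alpha \, C_L \, V_{dd}^{2}$$

Under this model, halving $V_{dd}$ reduces the dynamic energy per operation by a factor of four, which is why supply scaling is attractive despite the accompanying loss of speed.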

The SAPTL technique [1] is a novel circuit architecture that breaks this trade-off in order to achieve very low energy without affecting the speed. The initial SAPTL circuits were designed to operate synchronously but with the intent of being able to operate asynchronously with some minor modifications.

The SAPTL technique offers an efficient way to realize synchronous operation. Because of the differential signaling used, it is easy to determine when a logical operation completes. Therefore, SAPTL topology is a good method for reducing power consumption and improving speed in extremely low energy applications [1].

## 2. VARIOUS LOGIC IMPLEMENTATIONS

#### 2.1. CMOS logic

The most commonly used logic style is static complementary CMOS, which combines a pull-up network (PUN) and a pull-down network (PDN) as shown in Fig. 1 [2]. Static CMOS is essentially an extension of the static CMOS inverter to multiple inputs. As shown in the figure, every input of an N-input logic gate is distributed to both the pull-up and the pull-down networks. An N-input static CMOS gate therefore requires 2N transistors, which results in a significantly large implementation area. The function of the PUN is to connect the output to VDD whenever the output of the logic gate is meant to be 1 (based on the inputs). Similarly, the function of the PDN is to connect the output to VSS when the output of the logic gate is meant to be 0. Conventional static CMOS has been the technique of choice in most processor designs. Alternatively, static pass transistor circuits have also been suggested for low-power applications. Dynamic circuits, when clocked carefully, can also be used in low-power, high-speed systems.



Figure 1. N input Static CMOS circuit

The PUN and PDN networks are constructed in such a way that one and only one of the networks is conducting in steady state [3].

In this way, once the transient period is over, a path always exists between VDD node and the output node F, realizing a high output ("one"), or, alternatively, between VSS node and output F node for a low output ("zero"). Therefore the output node is always a low-impedance node in steady state.
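The complementary behaviour of the two networks can be checked with a small behavioural sketch. The example below assumes a 2-input NAND gate (chosen here only for illustration; it is not a circuit from this paper) and models the PUN and PDN as Boolean conduction functions. All function names are hypothetical.

```python
from itertools import product


def pdn_nand2(a: int, b: int) -> bool:
    """Pull-down network of a 2-input NAND: two NMOS devices in series."""
    return bool(a and b)


def pun_nand2(a: int, b: int) -> bool:
    """Pull-up network of a 2-input NAND: two PMOS devices in parallel."""
    return (not a) or (not b)


def static_cmos_output(pun_on: bool, pdn_on: bool) -> int:
    """Resolve the steady-state output from whichever network conducts."""
    if pun_on == pdn_on:
        # Both on would short the rails; both off would float the output.
        raise ValueError("PUN and PDN must be complementary in steady state")
    return 1 if pun_on else 0


# Build the truth table; the check inside static_cmos_output confirms that
# exactly one network conducts for every input combination.
nand_table = {
    (a, b): static_cmos_output(pun_nand2(a, b), pdn_nand2(a, b))
    for a, b in product((0, 1), repeat=2)
}

# An N-input static CMOS gate needs 2N transistors (here N = 2).
nand_transistors = 2 * 2
```

Because the helper raises an error whenever both networks conduct or neither does, building the table without an exception is itself a check of the "one and only one" property stated above.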

#### 2.2. Ratioed logic

Ratioed logic reduces the number of transistors required to implement a given logic function, at the cost of reduced robustness and extra power dissipation. The ratioed logic gate is shown in Figure 2 [4]; this style is used to reduce the circuit complexity of static CMOS gates. In complementary CMOS, the PUN provides a conditional path between VDD and the output when the PDN is turned off. In ratioed logic, the entire PUN is replaced with a single unconditional load device that pulls the output up for a high output. Instead of a combination of active pull-down and pull-up networks, the gate consists of an NMOS pull-down network that realizes the logic function and a simple load device. The clear advantage of pseudo-NMOS, in which the load is a grounded PMOS device, is the reduced number of transistors (N+1 versus 2N for complementary CMOS).

When the output is pulled high (assuming that VOL is below Vtn), the nominal high output voltage (VOH) for this gate is VDD since the pull-down devices are turned off. On the other hand, the nominal low output voltage is not 0 V since there is a fight between the devices in the PDN and the grounded PMOS load device.

This technique results in reduced noise margins and static power dissipation. The sizing of the load device relative to the pull-down devices can be used to trade off parameters such as noise margin, propagation delay, and power dissipation. Since the voltage swing on the output and the overall functionality of the gate depend on the ratio between the NMOS and PMOS sizes, the circuit is called ratioed. In ratioless logic styles, the low and high levels do not depend on transistor sizes.
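The transistor saving and the dependence of the low output level on device ratio can be illustrated with a rough sketch. It assumes a crude resistive-divider model in which the always-on load and the conducting PDN are treated as fixed resistors; the resistance values and function names are hypothetical and serve only to show the trend.

```python
def transistor_counts(n_inputs: int) -> dict:
    """Device counts for an N-input gate in each logic style."""
    return {"complementary_cmos": 2 * n_inputs, "pseudo_nmos": n_inputs + 1}


def estimate_vol(vdd: float, r_load: float, r_pdn: float) -> float:
    """Crude resistive-divider estimate of the low output level.

    Assumption: load and PDN are modeled as fixed resistors, which
    ignores the real transistor I-V characteristics.
    """
    return vdd * r_pdn / (r_pdn + r_load)


counts_4_input = transistor_counts(4)

# A weaker (more resistive) load gives a lower VOL but a slower pull-up.
vol_weak_load = estimate_vol(vdd=1.0, r_load=40e3, r_pdn=10e3)
vol_strong_load = estimate_vol(vdd=1.0, r_load=10e3, r_pdn=10e3)
```

In this simplified model the low level never reaches 0 V, and it moves with the ratio of the two resistances, which is the sizing trade-off described above.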



Figure 2. Ratioed logic

#### **2.3. DCVSL**

DCVSL builds on ratioed logic, in which the number of transistors required to implement a given logic function is reduced at the cost of reduced robustness and extra power dissipation. The purpose of the pull-up network (PUN) in complementary CMOS is to provide a conditional path between Vdd and the output when the pull-down network (PDN) is turned off. In ratioed logic, the entire PUN is replaced with a single unconditional load device that pulls the output up for a high output, so fewer transistors are needed to implement a design. However, the ratioed logic style draws static current and does not provide a rail-to-rail swing. DCVSL eliminates these disadvantages: each input is provided in complementary form, and the gate produces complementary outputs. A feedback mechanism ensures that the load device is turned off when it is not needed.



Figure 3. Differential Cascode Voltage Switch Logic

In Figure 3, the pull-down networks PDN1 and PDN2 use nMOS devices and are mutually exclusive, i.e., when PDN1 conducts, PDN2 is off, and vice versa, so that the required function and its inverse are implemented simultaneously. Suppose that PDN1 conducts while PDN2 does not, and that OUT and $\overline{OUT}$ are initially high and low, respectively. Turning on PDN1 causes OUT to be pulled down, though there is still a fight between M1 and PDN1. $\overline{OUT}$ is in a high-impedance state, as M2 and PDN2 are both turned off. PDN1 must be strong enough to bring OUT below Vdd-|Vtp|, the point at which M2 turns on and starts charging $\overline{OUT}$ to Vdd; this eventually turns off M1, so that OUT discharges to Gnd. Hence, in DCVSL both complementary and non-complementary inputs are supplied to obtain both a logic function and its inverse without static power dissipation, at the cost of increased complexity. Additionally, the dynamic power dissipation is high.
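The mutually exclusive pull-down networks can be sketched behaviourally. The example below assumes a DCVSL XOR gate (an illustrative choice) in which PDN1 is attached to OUT and PDN2 to $\overline{OUT}$, following the description above; only the steady state after the cross-coupled loads have settled is modeled, and all names are hypothetical.

```python
from itertools import product


def pdn1_conducts(a: int, b: int) -> bool:
    """PDN1 pulls OUT low, so it implements the inverse function (XNOR)."""
    return a == b


def pdn2_conducts(a: int, b: int) -> bool:
    """PDN2 pulls OUT-bar low, so it implements the function itself (XOR)."""
    return a != b


def dcvsl_steady_state(pdn1_on: bool, pdn2_on: bool) -> tuple:
    """Return (OUT, OUT_bar) once the cross-coupled PMOS loads have settled."""
    if pdn1_on == pdn2_on:
        raise ValueError("PDN1 and PDN2 must be mutually exclusive")
    out = 0 if pdn1_on else 1      # pulled low by PDN1, else held high by M1
    out_bar = 0 if pdn2_on else 1  # pulled low by PDN2, else held high by M2
    return out, out_bar


dcvsl_xor = {
    (a, b): dcvsl_steady_state(pdn1_conducts(a, b), pdn2_conducts(a, b))
    for a, b in product((0, 1), repeat=2)
}
```

The sketch does not capture the transient fight between M1 and PDN1; it only confirms that a function and its inverse are available simultaneously for every input combination.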

#### 2.4. DOMINO logic

Domino CMOS has become the prevailing logic family for high-performance CMOS applications and is extensively used in most state-of-the-art processors because of its high-speed capabilities [2]. The structure of this logic module consists of an n-type dynamic logic block followed by a static inverter. During the precharge mode, the output of the n-type dynamic gate is charged up to VDD, and the output of the inverter is set to 0. In evaluation mode, the dynamic gate conditionally discharges, and the output of the inverter makes a $0 \rightarrow 1$ transition. If all the inputs of a Domino gate are outputs of other Domino gates, then it is ensured that all inputs are set to 0 at the end of the precharge phase and that the only transitions during evaluation are $0 \rightarrow 1$ transitions [3]. The requirement that the inputs of a dynamic gate make only $0 \rightarrow 1$ transitions during evaluation is hence obeyed. The introduction of the static inverter has the additional advantage that the fan-out of the gate is driven by a static inverter with a low-impedance output, which increases noise immunity. The buffer furthermore reduces the capacitance of the dynamic output node by separating internal and load capacitances. A major limitation of Domino logic is that only non-inverting logic can be implemented. This requirement has limited the widespread use of pure Domino logic.
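The precharge and evaluate behaviour can be illustrated with a minimal behavioural sketch. It assumes a 2-input Domino AND gate (an illustrative choice, not a circuit from this paper); the class and method names are hypothetical.

```python
class DominoAnd2:
    """Behavioural model of a 2-input Domino AND gate."""

    def __init__(self) -> None:
        self.dynamic_node = 1  # start in the precharged state

    @property
    def output(self) -> int:
        """The static inverter drives the complement of the dynamic node."""
        return 0 if self.dynamic_node else 1

    def precharge(self) -> int:
        """Precharge phase: dynamic node goes to VDD, so the output is 0."""
        self.dynamic_node = 1
        return self.output

    def evaluate(self, a: int, b: int) -> int:
        """Evaluation phase: the series NMOS block conditionally discharges.

        Once discharged, the node stays low until the next precharge, so the
        output can only make a 0 -> 1 transition within one evaluation.
        """
        if a and b:
            self.dynamic_node = 0
        return self.output


gate = DominoAnd2()
after_precharge = gate.precharge()
after_evaluate = gate.evaluate(1, 1)
after_input_drop = gate.evaluate(0, 1)  # node cannot recharge by itself
```

The last call shows why the inputs must be monotonic: once the dynamic node has discharged, a later change of the inputs cannot restore it before the next precharge.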

#### 2.5. Pass Transistor Logic

One circuit alternative is to use pass transistor networks to reduce leakage current. Pass transistor logic is a simple and compact circuit topology and in some cases, outperforms static CMOS circuits. The pass transistor network itself does not have Vdd and ground connections, thus drastically reducing the number of leakage paths as shown in Figure 4.



Figure 4. Generic pass transistor network

In pass transistor logic (PTL), leakage is confined to the driving and level-restoring circuitry associated with the pass transistor network. These circuits are used to recover the voltage swing and to compensate for the delay degradation inherent in PTL circuits. Figure 5 shows a conventional pass transistor network that implements logic functions based on multiplexer or binary decision diagram (BDD) tree structures. The main drawback of these tree structures is that sneak paths exist, allowing leakage current to flow.



Figure 5. A conventional pass transistor tree showing sneak leakage paths

Pass transistor networks can be made more complex, reducing the total number of drivers and level restorers and hence the number of leakage paths. Unfortunately, the number of sneak paths in the pass transistor tree increases exponentially with the number of logic inputs, i.e., $2^N$ sneak paths for N levels. Note that the delay also depends on the number of levels and is proportional to N<sup>2</sup>. Series-connected pass transistors also increase the effective channel length (and thus the resistance of the leakage path) between the supply rails. However, PTL has the potential to offer more computational density for a given leakage path resistance than simply increasing the transistor channel length.
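The two scaling trends quoted above can be tabulated directly. The sketch below simply evaluates $2^N$ and $N^2$ for a few tree depths; the delay is expressed in arbitrary relative units, and the function names are hypothetical.

```python
def sneak_paths(levels: int) -> int:
    """Number of sneak paths in a conventional tree with the given depth."""
    return 2 ** levels


def relative_delay(levels: int) -> int:
    """Delay in arbitrary units, using the quadratic dependence on depth."""
    return levels ** 2


# (levels, sneak paths, relative delay) for tree depths 1 to 5
scaling = [(n, sneak_paths(n), relative_delay(n)) for n in range(1, 6)]

for levels, paths, delay in scaling:
    print(f"N={levels}: sneak paths={paths}, relative delay={delay}")
```

Beyond N = 4 the exponential term grows faster than the quadratic delay term, which is why sneak-path leakage limits the depth of conventional trees.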

## **3. SAPTL ARCHITECTURE**

The basic organization of the SAPTL circuit is shown in Figure 6. It consists of (1) the pass transistor tree, called the stack, which computes the required logic function, (2) a root node driver that injects signals into the stack, and (3) a sense amplifier that is used to recover both voltage swing and performance.



Figure 6. SAPTL architecture

The SAPTL reduces leakage energy in two ways: 1) by decoupling the subthreshold leakage current from the stack threshold voltage, allowing for increased performance without an increase in leakage energy, and 2) by confining subthreshold leakage to well-defined and controllable paths found only in the drivers and sense amplifiers.

Note that the total energy consumed by the SAPTL is composed of the following:

1) the energy used by the driver to energize the stack, 2) the energy used by the sense amplifier to resolve the correct logical levels and drive the inputs of the fan-out stacks, and 3) the energy needed to generate the appropriate timing information, either globally, such as clock distribution networks, or locally, as in handshaking circuits.
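The three contributions listed above can be summarized in a single expression. The symbol names are introduced here for convenience and are not taken from the original SAPTL work:

$$E_{SAPTL} = E_{driver} + E_{SA} + E_{timing}$$

Here $E_{driver}$ is the energy used to energize the stack, $E_{SA}$ is the energy used by the sense amplifier to resolve the output and drive the fan-out stacks, and $E_{timing}$ is the energy spent on clock distribution or handshaking.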

#### 3.1. The Stack & Driver

To mitigate the limitations of conventional multiplexer-based pass transistor trees due to sneak paths, and recognizing that pass transistors are inherently bidirectional, an inverted pass transistor tree, referred to as the stack, is used, as shown in Figure 7.

The stack still has no supply rail connections and has predictable delay paths, and in addition, has pseudo-differential outputs, where a signal or current is present in either S or Sbar, but not both at the same time. The input capacitances of the stack can be made equal by making the transistors closer to the root input larger.

This also has the effect of decreasing the delay of the stack by reducing the resistance of the signal path near the root of the tree. Since the input can only propagate from the root of the stack to the output, no sneak paths exist, and thus, to first order, reducing Vth to near zero is possible.

The reduction in threshold voltage also reduces the resistance, and thus the propagation delay from the root of the stack to the outputs S and Sbar, without any corresponding increase in leakage current drawn from the supply rails. The absence of sneak paths also allows the construction of deeper and more complex stacks, again without an increase in supply rail leakage. This Vth reduction and complexity increase, however, imposes stricter input resolution requirements on the sense amplifier because of the lower Ion/Ioff ratio at its inputs. Each path from the root node to the output of the stack represents a minterm of a logic function; thus, to program the stack, each branch representing a minterm contained in the desired logic function is connected to the output S, and each branch representing a maxterm is connected to Sbar. Figure 7 shows how a 2-input stack can be configured to generate a Boolean function of two variables.



Figure 7. Stack architecture

Here, the depth of the stack, $N_{stack}$, is defined as the number of transistors in series from the root node to the output; by the nature of the stack, it is the same for every path. Note that the input capacitance of the stack, $C_{in}$, is proportional to $2^{N_{stack}}$.
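The programming of the stack can be sketched as a lookup from input combinations to output rails. The example below assumes a 2-input XOR function, which is the gate implemented in this paper, and represents each root-to-leaf path by its input combination; the function names and the dictionary representation are hypothetical and purely behavioural.

```python
from itertools import product


def program_stack(truth_table: dict) -> dict:
    """Wire each root-to-leaf path to S (function is 1) or Sbar (function is 0)."""
    return {
        inputs: ("S" if value else "Sbar")
        for inputs, value in truth_table.items()
    }


def evaluate_stack(wiring: dict, inputs: tuple) -> tuple:
    """Return (S, Sbar) after the root driver energizes the selected path.

    Exactly one path conducts for any input combination, so only one of
    the two outputs is charged.
    """
    return (1, 0) if wiring[inputs] == "S" else (0, 1)


n_stack = 2  # stack depth: two transistors in series on every path
xor_table = {(a, b): a ^ b for a, b in product((0, 1), repeat=n_stack)}
xor_wiring = program_stack(xor_table)

num_paths = len(xor_wiring)  # grows as 2 ** n_stack
```

For example, `evaluate_stack(xor_wiring, (0, 1))` charges S, while `evaluate_stack(xor_wiring, (1, 1))` charges Sbar. The number of paths doubles with each added input, in line with the exponential growth of the input capacitance noted above.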

In this paper, a simple inverter is used as the root driver, which injects an evaluation current into the root of the stack. In operation, either S or Sbar, but not both, is charged toward the supply rail when the driver energizes the selected path through the stack [1]. After each computation and before every evaluation, both differential outputs are reset to ground (logical "0") to initialize the stack to a known state. This initialization is done by turning on all the transistors in the stack and draining the charge out through the root of the stack while the driver output is zero.

#### 3.2. Sense Amplifier

The sense amplifier, shown in Fig. 8, serves three purposes: 1) it amplifies the low-voltage stack output, restoring the signal to full voltage; 2) it serves as a buffer stage at the output of the stack, so as to improve overall speed; and 3) it precharges both its outputs to Vdd (logical "1"), allowing the reset of the driven fan-out stacks.



Figure 8. Sense amplifier circuit

The sense amplifier consists of two stages. The first stage is a preamplifier that reduces the impact of mismatch in the actual technology environment, and the second stage is a cross-coupled latch that retains the processed data even after the stack is reset. The sense amplifier is designed to detect small input voltages, thus reducing the performance degradation due to the low stack voltage swings and the absence of gain in the pass transistor network. By turning off the driver as soon as the sense amplifier makes a decision, the stack voltage swings are kept to a minimum, reducing the energy required to perform the desired logical operation.
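The sensing, latching, and precharging behaviour can be captured in a simplified behavioural model. The sketch below makes several assumptions that are not taken from the circuit in Figure 8: the input sensitivity `sensitivity_v` is a hypothetical parameter, the output polarity is chosen only for illustration, and mismatch and timing are ignored.

```python
class SenseAmplifier:
    """Behavioural model: differential sensing followed by a latch."""

    def __init__(self, sensitivity_v: float) -> None:
        # Hypothetical parameter: smallest differential input that resolves.
        self.sensitivity_v = sensitivity_v
        self.out, self.out_bar = 1, 1  # both outputs precharged to logical 1

    def precharge(self) -> tuple:
        """Return both outputs to logical 1, which resets the fan-out stacks."""
        self.out, self.out_bar = 1, 1
        return self.out, self.out_bar

    def sense(self, v_s: float, v_sbar: float) -> tuple:
        """Resolve the stack outputs; the latch holds until the next precharge."""
        if (self.out, self.out_bar) != (1, 1):
            return self.out, self.out_bar  # decision already latched
        diff = v_s - v_sbar
        if diff >= self.sensitivity_v:
            self.out, self.out_bar = 1, 0
        elif -diff >= self.sensitivity_v:
            self.out, self.out_bar = 0, 1
        return self.out, self.out_bar


amp = SenseAmplifier(sensitivity_v=0.1)
decision = amp.sense(0.15, 0.0)          # S is charged, Sbar stays low
held_after_reset = amp.sense(0.0, 0.0)   # stack reset, latch keeps the data
```

The second call illustrates the role of the cross-coupled latch: the decision survives the reset of the stack and is cleared only by the next precharge.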

#### 3.3. Synchronous Timing

One approach to providing timing information to the SAPTL is to use global two-phase non-overlapping clock signals, as shown in Figure 9. Because charge can build up within the non-energized stack paths, two-phase clocking is used to precondition all the internal nodes of the stack to ground before the root node drive signal is applied. A stack can be preconditioned by setting all the inputs to the stack to Vdd and the root node to ground, thus forcing all nodes to be discharged. This ensures that no unwanted charge-sharing events occur inside the stack during the evaluation phase that could cause the sense amplifier to make an incorrect decision.
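The two-phase sequence can be sketched as an alternating trace of preconditioning and evaluation steps. The example assumes the XOR function implemented in this paper; the phase labels `phi1_precondition` and `phi2_evaluate` and the function name are hypothetical, and analog effects such as charge sharing are not modeled.

```python
def synchronous_xor_trace(vectors: list) -> list:
    """Return the (phase, (S, Sbar)) sequence for a list of input pairs."""
    trace = []
    for a, b in vectors:
        # Phase 1: all stack inputs at Vdd and the root at ground, so every
        # internal node, including S and Sbar, is discharged.
        trace.append(("phi1_precondition", (0, 0)))
        # Phase 2: the real inputs are applied and the root is driven, so
        # exactly one of S and Sbar is charged.
        s = a ^ b
        trace.append(("phi2_evaluate", (s, 1 - s)))
    return trace


xor_trace = synchronous_xor_trace([(0, 1), (1, 1)])
```

Every evaluation in the trace starts from the same discharged state, which is the property that prevents leftover charge from one cycle from disturbing the next.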



Figure 9. Synchronous SAPTL block diagram for XOR gate

## 4. RESULTS

The synchronous operation of the SAPTL provides robustness in the presence of variability as well as performance. The output waveforms of the synchronous SAPTL XOR gate implementation are shown in Figure 10. All input combinations of the two-input gate were verified. The power consumption of the SAPTL XOR gate is lower than that of the other logic blocks.

## 5. CONCLUSION & FUTURE SCOPE

The synchronous operation of the SAPTL provides robustness in the presence of variability as well as performance. The low implementation cost of the synchronous operation makes the self-timed SAPTL family a very promising candidate for robust and low-energy computation. The self-timed SAPTL using the bundled data protocol can potentially achieve higher speed by overlapping the data evaluation and reset cycles. In conclusion, low-power design can be achieved using sense amplifier-based pass transistor logic (SAPTL). The above logic circuits can also be implemented in the subthreshold region to reduce power dissipation further [15].



Figure 10. Output waveforms of SAPTL XOR gate

## **6. REFERENCES**

- [1] Tsung-Te Liu, Louis P. Alarcón, Matthew D. Pierson, and Jan M. Rabaey, "Asynchronous Computing in Sense Amplifier-based Pass Transistor Logic," Berkeley Wireless Research Center, University of California, Berkeley, CA 94704.
- [2] Sreenivasa Rao.Ijjada, Ayyanna.G, G.Sekhar Reddy, Dr.V.Malleswara Rao, "Performance of Different CMOS Logic Styles for Low Power and High Speed," International Journal of VLSI design & Communication Systems (VLSICS), Vol. 2, No. 2, June 2011, pp. 66-76.
- [3] "Digital Integrated Circuits" by Rabaey, Scribd, chapter 1, http://www.scribd.com/doc/2190480/Digital-Integrated-Circuits-by-Rabaey
- [4] "Static CMOS Design," http://www.scribd.com/doc/38287154
- [5] T. Sakurai, "Perspectives on power-aware electronics," in *ISSCC Dig. Tech. Papers*, 2003, Vol. 1, pp. 26–29.
- [6] L. Alarcón, T.-T. Liu, M. Pierson, and J. Rabaey, "Exploring very low-energy logic: A case study," J. Low Power Electron., Vol. 3, no. 3, pp. 223–233, Dec. 2007.
- [7] J. Sparsø and S. Furber, *Principles of Asynchronous Circuit Design*. Norwell, MA: Kluwer, 2001.
- [8] J. Rabaey, A. Chandrakasan, and B. Nikolic, *Digital Integrated Circuits: A Design Perspective*, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 2003.
- [9] H. Li, S. Bhunia, Y. Chen, K. Roy, and T. Vijaykumar, "DCG: deterministic clock-gating for low-power microprocessor design," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, Vol. 12, no. 3, pp. 245–254, Mar. 2004.
- [10] N. Banerjee, K. Roy, H. Mahmoodi, and S. Bhunia, "Low power synthesis of dynamic logic circuits using fine-grained clock gating," in *Proc. DATE*, Mar. 2006, Vol. 1, pp. 1–6.
- [11] T.-T. Liu, L. Alarcón, M. Pierson, and J. Rabaey, "Asynchronous computing in sense amplifier-based pass transistor logic," in *Proc. 14<sup>th</sup> IEEE Int. Symp. ASYNC*, Apr. 2008, pp. 105–115.
- [12] T. Williams, "Performance of iterative computation in self-timed rings," J. VLSI Signal Process., Vol. 7, no. 1/2, pp. 17–31, Feb. 1994.
- [13] K. Stevens, R. Ginosar, and S. Rotem, "Relative timing," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, Vol. 11, no. 1, pp. 129–140, Feb. 2003.
- [14] I. Sutherland, "Micropipelines," Commun. ACM, Vol. 32, no. 6, pp. 720–738, Jun. 1989.
- [15] S. Narendra, "Scaling of stack effect and its application for leakage reduction," in *Proc. ISLPED*, Aug. 2001, pp. 195–200.
- [16] Louis Poblete Alarcon and Jan M. Rabaey, "Sense amplifier based pass transistor logic," Ph.D report, December 2010.

## 7. AUTHORS

1. Mr. Chitambara Rao.K completed his B.Tech in 2002. He worked for TPIST as an assistant professor from 2002 to 2005, for SISTAM from 2005 to 2006, and for PCE from 2006 to 2007. He received his M.Tech degree in 2009 from SATYABHAMA University, Chennai. Presently he is working at AITAM College of Engineering as a Sr. Assistant Professor.

2. Sreenivasa Rao.Ijjada received his AMIE degree from The Institution of Engineers (INDIA) in 2001 and his M.Tech degree in 2006 from J.N.T.U. Kakinada. He is a Ph.D scholar working at GITAM Institute of Technology, GITAM University, Visakhapatnam, as an Assistant Professor. He is a life member of AMIE. His research activities are related to low power VLSI design.



