# A Comparison of Via-programmable Gate Array Logic Cell Circuits

Thomas C.P. Chau, Philip H.W. Leong Department of Computer Science and Engineering The Chinese University of Hong Kong {cpchau, phwl}@cse.cuhk.edu.hk

Sam M.H. Ho, Brian P.W. Chan, Steve C.L. Yuen, Kong-Pang Pun, Oliver C.S. Choy Department of Electronic Engineering The Chinese University of Hong Kong {mhho, pwchan, clyuen, kppun, cschoy}@ee.cuhk.edu.hk

### ABSTRACT

Via-programmable gate arrays (VPGAs) offer a middle ground between application-specific integrated circuits and field-programmable gate arrays in terms of flexibility, manufacturing cost, speed, power and area. In this paper, we present a novel VPGA logic cell, the complementary universal logic gate (CULG), which can be used to implement both sequential and combinatorial elements. Its performance is compared with a number of other designs including transmission gate, differential cascode voltage switch with pass gate, and standard cell. The CULG is found to have comparable power-delay product and process variation sensitivity to the other designs while offering the lowest power consumption.

## 1. INTRODUCTION

FPGAs offer lower non-recurring engineering (NRE) costs and higher flexibility than application-specific integrated circuits (ASICs). However, FPGAs occupy 20 to 40 times more area, have a critical path delay 3 to 4 times larger and consume approximately 12 times more power [6]. On the other hand, the mask costs and manufacturing times of ASICs are far greater than those of FPGAs. Via-programmable gate arrays (VPGAs), where the device is customised using vias instead of static RAM as on an FPGA, combine advantages of both, offering greatly improved area, delay and power over FPGAs and reduced mask costs over ASICs. Control of variability is also expected to be improved over ASICs owing to the regular structure of a VPGA.

The area penalty in an FPGA is due to it being dominated by static RAM and multiplexer structures for implementing configuration bits, logic and programmable interconnect. Since both static RAMs and multiplexers can be implemented with single vias, much greater silicon efficiency can be expected in a VPGA.

VPGA logic cells can be classified into two groups, lookup table (LUT) based and logic-based. The former directly implements a K-input lookup table by multiplexing  $2^{K}$  memory elements and the latter implements a configurable logic gate which can implement different logic functions.
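As a minimal behavioural sketch of the LUT-based style (not taken from the paper), a K-input LUT can be modelled as a $2^{K}$-to-1 multiplexer over stored bits. The function name `lut_eval` and the MSB-first input ordering are assumptions made purely for illustration.

```python
from typing import Sequence


def lut_eval(memory: Sequence[int], inputs: Sequence[int]) -> int:
    """Select one of 2**K stored bits using K input bits (MSB first)."""
    k = len(inputs)
    if len(memory) != 2 ** k:
        raise ValueError("a K-input LUT needs exactly 2**K memory elements")
    index = 0
    for bit in inputs:
        # Build the multiplexer select index one input bit at a time.
        index = (index << 1) | (bit & 1)
    return memory[index]


# 3-input NAND stored as a truth table: only the all-ones entry is 0.
NAND3 = [1, 1, 1, 1, 1, 1, 1, 0]
```

In a VPGA the entries of `memory` would be fixed by vias to supply rails rather than held in static RAM.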

In this paper, a study of different LUT-based logic cell circuits for low power applications is made. This area is gaining importance due to the increasing demand for battery powered devices including wireless sensor networks, communications systems, digital signal processing and systems operated using energy-scavenging. The main contributions of this work are:

- We propose a universal logic cell which has improved power consumption compared with previous designs. Moreover, the same cell can be used to implement both logic and registers.
- We compare a number of previously reported logic cells for VPGAs. Although some of the cells have been studied in terms of power-delay product (PDP) and other metrics, our study includes low-voltage operation and process variations.

The remainder of the paper is organised as follows. In Section 2, we review the literature on via programmable logic cell (VPLC) designs. Section 3 describes the circuit design of two previously proposed VPLC designs and Section 4 describes the proposed complementary universal logic gate. In Section 5, we present simulation results comparing the different designs. Finally, conclusions are drawn in Section 6.

## 2. BACKGROUND

Designing a VPLC hinges on combining high speed, small area and low power consumption. Compared with an FPGA, VPLC designs can use via-defined pull-up and pull-down connections to define the values of a LUT, and the problem reduces to one of efficiently multiplexing them.

Tong et al. [11] proposed a number of LUT designs using different multiplexer circuits. They included an NMOS tree with weak keeper, transmission gates (TG) and a differential cascode voltage switch with pass-gate (DCVSPG). The LUTs are customised and interconnected through via masks above metal 2, the justification being that lower masks are more complex and expensive to manufacture. Their study found that the TG and DCVSPG designs had the best energy-delay performance and that 4-LUTs were most efficient in terms of area and delay. It was also noted that the DCVSPG cell could serve as a voltage level converter for dual-Vdd applications. Further work by this group showed that the addition of multiplexers and logic gates to a 3-LUT can achieve an area-delay product reduction of 48% [5]. Two via-configurable interconnect architectures are discussed by Patel et al. [8].

Figure 1: Via-programmable universal logic gate in MCML.

Ran and Marek-Sadowska [10] reported on a via-configurable functional cell (ViaCC) consisting of vertically aligned P and N diffusion strips. Both the logic function and interconnections are defined using metal1 to metal2 vias. A parameter n is used to define the number of transistor pairs available. For n = 3, the ViaCC can implement the xyz, x(y + z), x + yz and x + y + z functions. Since this cell does not implement a LUT or a functional equivalent, it is not considered in our study.

Brauer et al. [2] proposed a via-programmable universal logic gate (ULG) in MOS current mode logic which can implement any 3-input function as well as some 4- and 5-input functions (shown in Figure 1). The gate function is defined using the first via mask, with metal 3 and above used for cell-to-cell connections. Note that this cell has static power consumption: vn is connected to a fixed bias voltage, so current flows at all times, not only when switching. For this reason, the ULG is not considered in this study.

The commercial eASIC Nextreme product [3] employs maskless direct-write e-beam processing for prototypes and a single via mask for production. A 3-LUT is used for the underlying logic function and the fabric has additional embedded blocks including block RAMs, phase-locked loops and delay-locked loops.

Altera's structured ASIC platform [9], HardCopy II, is based on the "HCell" which can also implement a 3-LUT.

Figure 2: Transmission gate schematic.

Figure 3: Implementation of a NAND gate using a transmission gate VPLC.

## 3. VIA-PROGRAMMABLE LOGIC CELLS

Our study concentrates on 3-LUTs as used in other reported VPLC designs by Tong et al. [11], Nextreme [3], Brauer et al. [2] and Phoon et al. [9]. In this section, the circuit designs of the different types of VPLCs are presented.

#### 3.1 Transmission Gate (TG)

The transmission gate based 3-LUT is implemented as a 2-input multiplexer with inputs being selectable from c, cx, VDD, or GND as shown in Figure 2. There are four paths in the multiplexer, one of which is chosen at a particular time.

As an example, consider implementing a 3-input NAND gate with the cell in Figure 2. When any of paths i1–i3 is selected, the output should be 1. Since there is an inverter serving as a buffer at the output of the LUT, the input of the inverter should be logic 0, and hence nodes i1–i3 need to be connected to GND. When i4 is selected, the output of the LUT should be cx; therefore i4 is connected to c. The resulting implementation is shown in Figure 3. Table 1 shows how some other logic functions can be implemented using the TG cell; its columns i0–i3 correspond to the four paths numbered i1–i4 here.

| f        | i0 | i1 | i2 | i3 |
|----------|----|----|----|----|
| XOR      | cx | c  | c  | cx |
| NAND     | 0  | 0  | 0  | c  |
| NOR      | c  | 1  | 1  | 1  |
| FA-carry | 1  | cx | cx | 0  |

Table 1: Transmission gate implementation of some logic functions.
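The rows of Table 1 follow mechanically from the truth table of each function. The sketch below is an illustration under stated assumptions rather than the authors' tool flow: the helper name `tg_via_assignment` is hypothetical, inputs a and b are assumed to select the path in the order 00, 01, 10, 11, and the values returned are those at the inverter input, so they are the complement of the desired cell output.

```python
from typing import Callable, List


def tg_via_assignment(f: Callable[[int, int, int], int]) -> List[str]:
    """Return the inverter-input connection for paths (a, b) = 00, 01, 10, 11."""
    # Keyed by (f at c=0, f at c=1). The values are already complemented
    # because the cell output passes through an inverting buffer.
    pre_inverter = {(0, 0): "1", (1, 1): "0", (0, 1): "cx", (1, 0): "c"}
    return [
        pre_inverter[(f(a, b, 0) & 1, f(a, b, 1) & 1)]
        for a in (0, 1)
        for b in (0, 1)
    ]


def nand3(a: int, b: int, c: int) -> int:
    return 1 - (a & b & c)


def xor3(a: int, b: int, c: int) -> int:
    return a ^ b ^ c


def nor3(a: int, b: int, c: int) -> int:
    return 1 - (a | b | c)


def fa_carry(a: int, b: int, c: int) -> int:
    return (a & b) | (b & c) | (a & c)
```

Calling `tg_via_assignment(nand3)` yields the three GND connections and one connection to c described in the NAND example above. All four functions are symmetric in a and b, so the assumed ordering of the two middle paths does not affect these results.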

| f        | i0 | i1 | i2 | i3 | i4 | i5 | i6 | i7 |
|----------|----|----|----|----|----|----|----|----|
| XOR      | c  | cx | cx | c  | cx | c  | c  | cx |
| NAND     | 1  | 1  | 1  | cx | 0  | 0  | 0  | c  |
| NOR      | cx | 0  | 0  | 0  | c  | 1  | 1  | 1  |
| FA-carry | 0  | c  | c  | 1  | 1  | cx | cx | 0  |

Table 2: DCVSPG implementation of some logic functions.

#### 3.2 Differential Cascode Voltage Switch with Pass Gate (DCVSPG)

The schematic of a 3-LUT [11] in differential cascode voltage switch with pass gate (DCVSPG) logic [7] is shown in Figure 4. Both the logic function and its complement are generated, the former using the right-hand NMOS network and the latter the left. When one side pulls down, the other pulls up with a "weak 1" (due to the threshold voltage drop from gate to source across the NMOS pass transistor network). The cross coupled connections to the PMOS transistors allow the logic to achieve full swing and avoid static power dissipation. As only NMOS transistors are used to implement the logic, input capacitance is reduced leading to improved speed. Output buffers are used to isolate the DCVSPG output from the load.

As for the transmission gate case, the logic function is customised by connecting the bottom ports, i0-i7, to the appropriate values of 0, 1, c or cx. Table 2 shows how several different logic functions are implemented. One other advantage of this scheme is that the DCVSPG can operate as a level shifter for multiple supply configurations [11].
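The entries of Table 2 can be sanity-checked with a small switch-level sketch. This is an illustration, not the authors' simulation: it assumes that inputs a and b select leaf `2*a + b` within each four-leaf tree, that i0–i3 and i4–i7 feed the two complementary trees, and that the output buffers invert as in the TG cell. The last XOR entry is taken as cx, the value that keeps the two trees complementary.

```python
from itertools import product
from typing import List, Tuple

# Leaf connections i0..i7 as listed in Table 2 ("c" = third input, "cx" = its complement).
TABLE2 = {
    "XOR":      ["c", "cx", "cx", "c", "cx", "c", "c", "cx"],  # i7 = cx is an assumption
    "NAND":     ["1", "1", "1", "cx", "0", "0", "0", "c"],
    "NOR":      ["cx", "0", "0", "0", "c", "1", "1", "1"],
    "FA-carry": ["0", "c", "c", "1", "1", "cx", "cx", "0"],
}


def leaf_value(symbol: str, c: int) -> int:
    """Logic value of a via-programmed leaf for a given value of input c."""
    return {"0": 0, "1": 1, "c": c, "cx": 1 - c}[symbol]


def tree_nodes(leaves: List[str], a: int, b: int, c: int) -> Tuple[int, int]:
    """Values passed by the two NMOS trees (assumed leaf order: 2*a + b)."""
    sel = 2 * a + b
    return leaf_value(leaves[sel], c), leaf_value(leaves[4 + sel], c)


def always_complementary(leaves: List[str]) -> bool:
    """True if the two trees carry opposite values for every input pattern."""
    return all(
        x != y
        for x, y in (tree_nodes(leaves, a, b, c)
                     for a, b, c in product((0, 1), repeat=3))
    )


def buffered_output(leaves: List[str], a: int, b: int, c: int) -> int:
    """Output after an (assumed) inverting buffer on the i4..i7 side."""
    return 1 - tree_nodes(leaves, a, b, c)[1]
```

Under these assumptions every row drives the two trees with opposite values, which is the condition the cross-coupled PMOS pair relies on to restore full swing.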

## 4. COMPLEMENTARY UNIVERSAL LOGIC GATE (CULG)

#### 4.1 Circuit

The complementary universal logic gate (CULG) is constructed using cross-coupled PMOS pull-up loads, two complementary NMOS pull-down logic networks, and two output inverters. One of the NMOS networks will pull down whereas the other will be in a high impedance state. The pull-down network will turn on the opposite side's PMOS transistor, forcing the other side's output to a high value and performing voltage level restoration. The output inverters buffer the complementary output signals.

The logic style employed is known as cascode voltage switch logic (CVSL) [4] and we employ the universal logic gate arrangement of Figure 1 [2] to customise the logic. CVSL is slower than DCVSPG since the PMOS pull-up only occurs after the opposite NMOS network has pulled down. However, as described later in this subsection, CULG has additional flexibility over DCVSPG as one of the pairs of inputs can be connected to either the gate or the source of the logic network. For certain logic circuits, in particular XOR and multiplexers, the latter scheme improves speed.

Figure 5: Complementary universal logic gate (CULG) schematic.

| K | #NMOS in CULG | #NMOS in DCVSPG |
|---|---------------|-----------------|
| 1 | 2             | 0               |
| 2 | 6             | 4               |
| 3 | 10            | 12              |
| 4 | 22            | 28              |

Table 3: NMOS count of CULG and DCVSPG for different numbers of logic inputs.

The CULG consists of 3 levels of logic, indicated as level a, b, c in Figure 5. Both level a and b have 2 pairs of transistors, gates being connected to complementary inputs. Level c is implemented using one pair of transistors. For example, a1 and a1x are the first set of non-inverting and inverting inputs at level a. By connecting the drains of the NMOS pairs (indicated by dark squares) to the appropriate source coupled pairs (indicated by dark circles), the CULG can implement all 3-input functions and some 4-input or 5-input functions [2]. Complementary outputs are produced at f and fx. In the CULG, inputs are connected to transistor gates only, and the levels of NMOS pull-down logic can be optimised for each function. Thus, the CULG has the potential for improved performance over the DCVSPG.

The CULG also requires fewer transistors than TG and DCVSPG for K-LUTs where  $K \geq 3$ , as the number of NMOS transistors required in the logic network is  $\sum_{i=1}^{K-1} 2^i + 2^{K-1}$ , compared with  $\sum_{i=1}^{K-1} 2^{i+1}$  for DCVSPG. Table 3 shows the number of NMOS transistors required for the CULG and DCVSPG designs for different numbers of inputs.
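As a worked check of these two expressions (the closed forms are derived here from the sums above and are not stated in this form in the original), both are geometric series:

$$
N_{\mathrm{CULG}}(K) = \sum_{i=1}^{K-1} 2^{i} + 2^{K-1} = \left(2^{K} - 2\right) + 2^{K-1} = 3 \cdot 2^{K-1} - 2
$$

$$
N_{\mathrm{DCVSPG}}(K) = \sum_{i=1}^{K-1} 2^{i+1} = 2^{K+1} - 4
$$

$$
N_{\mathrm{DCVSPG}}(K) - N_{\mathrm{CULG}}(K) = 2^{K-1} - 2
$$

For $K = 3$ this gives 10 and 12 transistors, and for $K = 4$ it gives 22 and 28, in agreement with the corresponding rows of Table 3. The difference $2^{K-1} - 2$ is strictly positive for $K \geq 3$, which is the range over which the text claims a saving.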

XOR is an important logic function for arithmetic operations, and is extensively used in adders and multipliers. The CULG implementation of a 3-input XOR logic gate can be optimised to a complementary pass transistor logic (CPL) [12] circuit as shown in Figure 8. Compared to connecting the third input to the gate of a level c transistor, driving the sources of level b directly results in a reduction in both delay and power. Figure 6 shows a transient analysis of the standard and optimised cases. It can be seen that the 10-90% rise-time is improved by 25% when the third input drives the source of level b.

Figure 4: DCVSPG schematic.

Figure 6: Rise-time of XOR, with and without optimisation.

Figure 9 shows the circuit for a transparent latch using the CULG. A two-phase non-overlapping clock scheme is used ( $\phi 1$  and  $\phi 2$  in Figure 9). An edge-triggered D-type flip-flop (DFF) can be made from two transparent latches. The CULG's ability to realize a DFF is an important advantage over the plain LUT used in the TG and DCVSPG cells, because the same cells can be configured to perform both combinational and sequential functions, resulting in better resource utilization.
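The construction of a flip-flop from two transparent latches can be sketched behaviourally as follows. This is an illustrative model only, not the circuit of Figure 9: the class names are hypothetical, and the assignment of the master latch to $\phi 1$ and the slave latch to $\phi 2$ is an assumption.

```python
class TransparentLatch:
    """Level-sensitive latch: the output follows d while enable is high."""

    def __init__(self, q: int = 0) -> None:
        self.q = q

    def update(self, d: int, enable: int) -> int:
        if enable:
            self.q = d
        return self.q


class TwoPhaseDFF:
    """Master latch on phi1, slave latch on phi2 (assumed phase assignment)."""

    def __init__(self) -> None:
        self.master = TransparentLatch()
        self.slave = TransparentLatch()

    def step(self, d: int, phi1: int, phi2: int) -> int:
        if phi1 and phi2:
            raise ValueError("phi1 and phi2 must not overlap")
        self.master.update(d, phi1)                      # sample the input
        return self.slave.update(self.master.q, phi2)   # transfer to the output
```

Because the two phases never overlap, there is no moment at which both latches are transparent, so a change on `d` cannot race through to the output within one clock cycle.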

#### 4.2 Layout

A layout of the CULG in a standard  $0.18\,\mu\mathrm{m}$  CMOS process is shown in Figure 10. Nodes that are via-programmable are connected to Metal3 and Metal4, and the Via4 layer is used for configuring the logic cell. The cell size is  $12\,\mu\mathrm{m} \times 8\,\mu\mathrm{m}$  ( $96\,\mu\mathrm{m}^2$ ).

A VPGA utilizing a VPLC cell resembles an "island-style" FPGA architecture [1], with multiplexers implemented using vias. N basic logic elements (BLEs) are grouped into a cluster, and clusters are connected through routing channels and switch boxes. A BLE can be composed of 3 CULGs, one for the LUT and 2 for the DFF. The local routing area of the cluster is much smaller than for FPGAs. Moreover, since two CULGs can be configured as a DFF, a cluster with only CULGs as unit cells is possible. As such, a cluster with 3N CULGs can be configured into N logic elements all registered, 3N unregistered logic elements, or any combination in between. This provides greater area efficiency compared to FPGAs, in which the area of unused latches is wasted.

Figure 7: CULG implementation of a 3-input NAND.

Figure 8: Optimised CULG implementation of a 3-input XOR.

Figure 9: Implementation of a transparent latch.

Figure 10: Layout of CULG logic cell.

Figure 11: Island-style FPGA.
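The cluster configurations described in Section 4.2 amount to a simple resource budget: each registered logic element uses three CULGs (one LUT plus two latches) and each unregistered element uses one. The helper below is a hypothetical illustration of that bookkeeping, not part of any tool described in the paper.

```python
def max_unregistered(total_culgs: int, registered: int) -> int:
    """CULGs left for unregistered logic when each registered element uses 3."""
    needed = 3 * registered
    if registered < 0 or needed > total_culgs:
        raise ValueError("registered elements do not fit in the cluster")
    return total_culgs - needed


# Example cluster with N = 4, i.e. 3N = 12 CULGs.
N = 4
CONFIGS = [(r, max_unregistered(3 * N, r)) for r in range(N + 1)]
```

For N = 4 the list runs from 0 registered and 12 unregistered elements to 4 registered and 0 unregistered, covering the two extremes named in the text.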

## 5. RESULTS

In order to compare the VPLCs, process parameters for a standard  $0.18\mu$m CMOS process with nominal Vdd of 1.8 V were used ( $V_{tp} = -0.54$  V,  $V_{tn} = 0.34$  V). The circuits were simulated using Cadence SPECTRE, a SPICE circuit simulator, and UltraSim, a full-chip simulator. The BSIM3 MOSFET model was used, and minimum-sized transistors were used throughout except for inverters and buffers, which were sized to give equal rise and fall times.

The VPLCs were compared with standard cell designs of the same circuits, made using the vendor's standard cell library. These are indicated as "SCELL" in the results below.

#### 5.1 Area Comparison

Table 4 summarises the properties of TG, DCVSPG and CULG logic cells for implementing different functions. For the logic cells implemented by TG, DCVSPG and CULG, 2 output inverters are included to provide complementary outputs and differential wiring is assumed. The transistor counts are shown in Table 4, indicated by (the number of PMOS / the number of NMOS), output inverters being included. For the CULG, the number in brackets is the actual number of NMOS transistors used to implement the function.

Layouts were made manually for each of the cells except SCELL, which was obtained from the vendor's standard cell library. The TG, DCVSPG and CULG cells occupy similar area. Compared with SCELL, the area overhead of the VPLCs ranges from a factor of 6.5 (ring oscillator) down to 2 (adder).

#### 5.2 Test Circuits

Several circuits were designed and implemented on the VPLC to evaluate different logic families. In Figure 12, 9 stages of NAND3 are cascaded and configured as a ring oscillator in order to find the combinatorial delay. A transient analysis was performed and propagation delay and average power consumption recorded over different conditions (variations of supply voltage, temperature and simulation parameters). Since the ring oscillator may not reflect the properties of complex combinatorial logic, an 8-bit counter and an 8×8 multiplier were also simulated. The counter is made from an 8-bit ripple carry adder and an 8-bit register, as shown in Figure 13. The multiplier is ripple-carry based, with an array of 64 AND gates, 48 full adders and 8 half adders, as shown in Figure 14.

| Design | 3-input NAND dimension ($\mu\mathrm{m}^2$) | 3-input NAND #transistors, P/N (used) | 1-bit full adder dimension ($\mu\mathrm{m}^2$) | 1-bit full adder #transistors, P/N (used) | $8 \times 8$ multiplier dimension ($\mu\mathrm{m}^2$) | $8 \times 8$ multiplier #transistors, P/N (used) | Flip-flop type |
|--------|-----------|---------|-------------|----------|-----------|----------------|---------|
| TG     | 14 × 7    | 8/8     | 14 × 14     | 16/16    | 112 × 154 | 704/2464       | SCELL   |
| DCVSPG | 12 × 8    | 4/14    | 12 × 16     | 8/28     | 96 × 176  | 704/2464       | SCELL   |
| CULG   | 12 × 8    | 4/12(8) | 12 × 16     | 8/24(20) | 96 × 176  | 704/2112(1568) | generic |
| SCELL  | 2.6 × 5.7 | 3/3     | 10.7 × 5.7  | 14/14    | 40 × 106  | 920/920        | SCELL   |

Table 4: Properties of the TG, DCVSPG, CULG and SCELL-based designs. The numbers in parentheses indicate the number of NMOS transistors actually used to implement the logic function.

Figure 12: 9-stage ring oscillator schematic.

Figure 13: 8-bit counter schematic.

A summary of the power, delay, and power-delay product (PDP) of the different test circuits for different VPLCs is given in Figure 15, Figure 16, and Figure 17 respectively. Circuits operating at the nominal supply voltage of 1.8 V are indicated by HV and those operating at the low supply voltage of 0.9 V are indicated by LV. All results are normalised to the SCELL result.
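The normalisation used in these figures can be sketched as below. The `Measurement` class and `normalise` function are hypothetical names, and the numbers in the example are placeholders chosen for easy arithmetic; they are not results from this study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Measurement:
    power_w: float   # average power in watts
    delay_s: float   # propagation delay in seconds

    @property
    def pdp(self) -> float:
        """Power-delay product in joules."""
        return self.power_w * self.delay_s


def normalise(cell: Measurement, scell: Measurement) -> Dict[str, float]:
    """Express a cell's power, delay and PDP relative to the SCELL baseline."""
    return {
        "power": cell.power_w / scell.power_w,
        "delay": cell.delay_s / scell.delay_s,
        "pdp": cell.pdp / scell.pdp,
    }


# Placeholder values only (not measured data).
EXAMPLE = normalise(Measurement(2e-3, 4e-9), Measurement(1e-3, 2e-9))
```

A normalised value below 1 means the cell outperforms the standard cell implementation on that metric, and the normalised PDP is the product of the normalised power and the normalised delay.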

It can be seen that CULG has the lowest overall power consumption of all of the VPLCs. In terms of delay, TG is the slowest in all cases, DCVSPG is the fastest and CULG lies in between. DCVSPG and CULG both have lower PDP than TG, their ring oscillator and counter values being similar and CULG being significantly better for the multiplier circuit. In summary, the CULG has excellent power consumption and PDP compared with the other designs.

The CULG has a much higher delay in the ring oscillator circuit than SCELL. This is expected since it employs a minimum size pull-up PMOS and has an additional buffer to drive per stage. For more representative circuits such as the adder and multiplier, the delay, power and PDP compare quite favourably, the normalised PDP ranging between 0.5 and 2.

Figure 14:  $8 \times 8$  multiplier schematic.

Figure 15: Normalized power consumption of the three VPLCs at nominal voltage.

#### Ring Oscillator

Figure 18 summarises the experimental results for the ring oscillator. For the computer simulation to operate correctly, the supply was set to rise from 0 to Vdd in a few ns. The delay was taken as 1/(2f), where f is the frequency of the output square wave, and the power was measured as the supply voltage  $\times$  the average switching current over one period. CULG is always faster than TG and consumes less power. DCVSPG consumes 73% more power than CULG for a 44% improvement in speed.
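The two measurements defined above can be written as a short post-processing sketch. The function names are hypothetical, and the assumption that the supply current is available as uniformly spaced samples covering exactly one period is made here for illustration; it is not a description of the simulator setup.

```python
from typing import Sequence


def ring_delay(frequency_hz: float) -> float:
    """Delay as defined in the text: 1 / (2 f)."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / (2.0 * frequency_hz)


def average_power(vdd: float, supply_current_a: Sequence[float]) -> float:
    """Supply voltage times the mean of uniformly spaced current samples."""
    if not supply_current_a:
        raise ValueError("need at least one current sample")
    return vdd * sum(supply_current_a) / len(supply_current_a)
```

For instance, an oscillation frequency of 100 MHz corresponds to a delay of 5 ns under this definition.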

#### Counter

Figure 19 summarises the results for the 8-bit counter. The clock and reset signals are buffered through inverters. These inverters are part of the input stimulus circuitry, so their power consumption was not included in the results. The sum bits drive the input of an 8-bit register, and the most significant bit (MSB) carry-out is connected to the input of a CMOS inverter, which is used as the output load. The power was averaged for the counter counting from 01 to FF and back to 01. Delay was measured as the propagation delay of the 8th-bit carry-out when the sum switches from FF to 00. CULG is 18% faster than TG and consumes fairly similar power. TG is much slower at supply voltages below 0.75 V, and is unable to operate at a 0.5 V supply voltage. DCVSPG consumes 80% more power than CULG on average, in return for being 50% faster.

Figure 16: Normalized delay of the three VPLCs at nominal voltage.

Figure 17: Normalized PDP of the three VPLCs at nominal voltage.

#### Multiplier

Figure 20 summarises the results for the 8×8 multiplier. The input stimulus circuit contains a number of inverters, which buffer the input signals before they are applied to the tested circuits, and their power consumption was not included in our measurements. The power was averaged over an output switching event, and the delay was measured as the time for a valid output to be observed after any input changes. It is observed that CULG has lower power consumption than TG and DCVSPG, except when the supply voltage is below 0.75 V. TG is slower than CULG by 180%. CULG has 20% higher delay than DCVSPG, but DCVSPG consumes approximately 130% more power than CULG.

#### 5.3 Process Variation



Figure 21: Delay deviation of ring oscillator over temperature (log scale).



Figure 22: Frequency variation over PMOS width (log scale).

We performed 3 sets of experiments to explore the sensitivity of each VPLC's delay to different process parameters. The percentage deviation of delay using typical mean (TM), worst-case speed (WS), and worst-case power (WP) simulation parameters was measured over different operating temperatures. Figure 21 summarises the results, and shows that TG has the worst delay deviation. DCVSPG is the best and CULG is in between. This result implies that DCVSPG may offer better yield compared with the other circuits.

#### 5.4 Further Investigations

#### Effect of PMOS Width

An additional simulation was performed to investigate the effect of PMOS transistor size for CULG and DCVSPG. Using the circuit in Figure 12 at the nominal supply voltage, the width of the PMOS load transistors was varied and the speed and power performance measured. From Figure 22 and Figure 23, it is observed that CULG's performance is very sensitive to PMOS width. As the width of the PMOS increases, the frequency decreases sharply while the power consumption remains steady. DCVSPG is less sensitive to the change in PMOS width, but its frequency also decreases. Thus, it is concluded that the minimum PMOS sizing, as used in all of our other simulations, is an appropriate choice.

#### Multiplier Area Optimisation



Figure 18: Power and delay of ring oscillator (log scale).



Figure 19: Power and delay of 8-bit counter (log scale).



Figure 20: Power and delay of 8x8 multiplier (log scale).



Figure 23: Power consumption over PMOS load (log scale).

A basic multiplier cell can be formed from a full adder and a 2-input AND gate. Since the cell would require three 3-LUTs, a large improvement in circuit area can be achieved if a full adder and a 2-input AND gate can be combined in one VPLC. This requires the addition of two NMOS transistors as shown in Figure 24. The multiplier cell can then be implemented using two VPLCs instead of three.

Figure 25 shows the effect of this modification. The optimization causes an average 40% increase in power and 90% increase in delay. These are caused by the increased number of series NMOS transistors in the logic network. Moreover, we are not able to use the optimised 3-input XOR described in Section 4.1 where the source rather than gate is driven (this can be seen by comparing Figure 8 and Figure 24). These factors combine together to cause a longer switching delay, and greater dynamic power consumption. This area optimisation thus involves a trade-off with PDP performance.
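As a rough worked estimate of the PDP cost (an approximation that simply multiplies the two average factors quoted above, which the original does not do explicitly):

$$
\frac{\mathrm{PDP}_{\text{area-optimised}}}{\mathrm{PDP}_{\text{PDP-optimised}}} \approx (1 + 0.40)\,(1 + 0.90) = 1.4 \times 1.9 = 2.66
$$

On this estimate the area-optimised cell has roughly 2.7 times the PDP of the PDP-optimised cell, in exchange for using two VPLCs per multiplier cell instead of three.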

## 6. CONCLUSION



Figure 24: Area-optimised CULG for multipliers.



Figure 25: Power consumption and delay of multiplier optimized for PDP and area (log scale).

In this work a novel VPLC, the CULG, was described and a comparison of different designs given. On the set of test circuits used, DCVSPG and CULG had significantly better delay than the TG-based design. DCVSPG was better overall in terms of speed, CULG in terms of power consumption, and both DCVSPG and CULG had similar power-delay product. The TG scheme had the highest sensitivity to process variation, with CULG second and DCVSPG performing particularly well in this metric. In terms of functionality, CULG allows the same cell to be used to implement both latches and logic, improving area utilization. We believe that the CULG design is a promising choice for VPGA applications where area and power consumption are important factors.

In future work we intend to fully characterise the CULG cell, explore other VPLC circuits, routing structures for VPGAs and extend our comparisons over larger circuits involving multiple VPLCs. Test chips will also be fabricated.

#### Acknowledgements

The authors gratefully acknowledge support from the Research Grants Council (Earmarked Grant CUHK413707) and the Innovation and Technology Fund (Grant GHP/028/07SZ) of the Hong Kong Special Administrative Region, China.

Any opinions, findings, conclusions or recommendations expressed in this material/event (or by members of the project team) do not reflect the views of the Government of the Hong Kong Special Administrative Region or the Innovation and Technology Commission.

## 7. REFERENCES

- [1] V. Betz, J. Rose, and A. Marquardt, editors. *Architecture and CAD for Deep-Submicron FPGAs*. Kluwer Academic Publishers, Norwell, MA, USA, 1999.
- [2] E. J. Brauer, I. Hatirnaz, S. Badel, and Y. Leblebici. Via-programmable expanded universal logic gate in MCML for structured ASIC applications: circuit design. In *International Symposium on Circuits and Systems (ISCAS)*, 2006.
- [3] eASIC. http://www.easic.com, 2008.
- [4] L. Heller, W. Griffin, J. Davis, and N. Thoma. Cascode voltage switch logic: A differential CMOS logic family. *IEEE International Solid-State Circuits Conference, Digest of Technical Papers*, XXVII:16–17, Feb. 1984.
- [5] A. Koorapaty, V. Chandra, K. Y. Tong, C. Patel, L. Pileggi, and H. Schmit. Heterogeneous programmable logic block architectures. In *Design, Automation and Test in Europe Conference*, 2003.
- [6] I. Kuon and J. Rose. Measuring the gap between FPGAs and ASICs. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 26(2):203–215, Feb. 2007.
- [7] F. Lai and W. Hwang. Differential cascode voltage switch with the pass-gate (DCVSPG) logic tree for high performance CMOS digital systems. In *Proceedings of the International Symposium on VLSI Technology, Systems, and Applications*, pages 358–362, 1993.
- [8] C. Patel, A. Cozzie, H. Schmit, and L. Pileggi. An architectural exploration of via patterned gate arrays. In *Proceedings of the 2003 International Symposium on Physical Design*, pages 184–189, 2003.
- [9] H. K. Phoon, M. Yap, and C. K. Chai. A highly compatible architecture design for optimum FPGA to structured-ASIC migration. In *IEEE International Conference on Semiconductor Electronics (ICSE)*, pages 506–510, 2006.
- [10] Y. Ran and M. Marek-Sadowska. Designing via-configurable logic blocks for regular fabric. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 14(1):1–14, Jan. 2006.
- [11] K. Y. Tong, V. Kheterpal, V. Rovner, L. Pileggi, H. Schmit, and R. Puri. Regular logic fabrics for a via patterned gate array (VPGA). In *Proceedings of the Custom Integrated Circuits Conference*, pages 4.3.1–4.3.4, 2003.
- [12] K. Yano, T. Yamanaka, T. Nishida, M. Saito, K. Shimohigashi, and A. Shimizu. A 3.8-ns CMOS 16×16-b multiplier using complementary pass-transistor logic. *IEEE Journal of Solid-State Circuits*, 25(2):388–395, Apr. 1990.