# Large-Scale Power-Combining and Linearization in Watt-Class mmWave CMOS Power Amplifiers

Ritesh Bhat, Anandaroop Chakrabarti and Harish Krishnaswamy

Department of Electrical Engineering, Columbia University, New York, NY 10027

Abstract—Switching-class PAs employing device stacking have recently been explored to meet the challenge of efficient power amplification at mmWave frequencies at moderate power levels of around 20dBm. In this paper, we propose the use of a single-step, large-scale (8-way), 75%-efficient lumped quarter-wave power combiner that is co-designed with stacked Class-E-like PA unit cells to enable a Q-band 45nm SOI CMOS PA with a peak $P_{sat}$ of 27.2dBm (>0.5W), a peak PAE of 10.7%, and 1dB flatness in $P_{sat}$ over nearly the entire Q-band (33-46GHz). This measured output power is approximately $5\times$ higher than that of previously reported mmWave silicon PAs. To support complex modulations with high average efficiency, we also propose a novel linearizing architecture that combines large-scale power combining, supply switching for efficiency under back-off, and dynamic load modulation for linearization. A second, fully integrated 42.5GHz 45nm SOI CMOS PA based on this architecture achieves 60% of its peak efficiency at 6dB back-off.

### I. INTRODUCTION

Millimeter-wave (mmWave) frequencies dictate the use of deeply scaled CMOS technologies, which have low breakdown voltages. To counter this and increase saturated output power  $(P_{sat})$ , techniques such as power combining and device stacking have been investigated [1], [2]. Large-scale power combining is usually associated with a degradation in efficiency and/or bandwidth. While device stacking has enabled moderate output powers ( $\approx 20$ dBm), watt-level output power is yet to be achieved in CMOS at mmWave frequencies. To address these issues, we propose the use of a lumped-element quarter-wave combiner that enables 8-way power combining with 75% measured efficiency. The use of this combiner in conjunction with stacked Q-band Class-E-like SOI CMOS PAs [2] results in watt-class operation ( $P_{sat} > 0.5W$ ) from a 45nm SOI CMOS PA array with a 1dB bandwidth spanning 33-46GHz owing to combiner-PA co-design.

A second major challenge is the trade-off between efficiency and linearity. Furthermore, conventional PAs are most efficient near $P_{sat}$ and exhibit poor efficiency under back-off. Linearizing PA architectures such as Doherty [3] and outphasing [4] have been proposed that enable nonlinear PAs to be used with complex modulations while achieving high efficiency under back-off. This paper introduces the first linearizing PA architecture at mmWave frequencies that simultaneously employs large-scale power combining for higher output power ($P_{out}$), PA supply switching for higher efficiency under back-off, and load modulation for linearization of switching power amplifiers. A 45nm SOI CMOS, 42.5GHz, 8-way-combined, stacked Class-E-like PA array prototype employing this architecture achieves 60% of its peak efficiency at 6dB back-off.



Fig. 1. Schematic of the 33-46GHz 45nm SOI CMOS Watt-Class PA Array



Fig. 2. Schematic of each unit cell PA comprising a Class-E-like driver PA with 2 devices stacked followed by a Class-E-like main PA with 4 devices stacked used in the 33-46GHz 45nm SOI CMOS watt-class PA array.

### II. 33-46GHz 45nm SOI CMOS WATT-CLASS PA ARRAY

Large-scale, low-loss power combining on silicon is fraught with challenges. Transformer-based series power combining is limited by the asymmetry that results from parasitic inter-winding capacitance, which causes non-constructive addition of the individual PA voltages as well as stability challenges [5]. With Wilkinson power combiners, the maximum number of PAs that can be combined in a single Wilkinson is restricted to 2-4 by the highest transmission-line $Z_0$ that can be achieved in the back end of line (BEOL); the required value is $Z_0 = 50\sqrt{n}$ ohms, where $n$ is the number of PAs combined. Cascading 2:1 Wilkinsons results in a severe increase in combiner loss. In this work, a lumped-element $\lambda/4$ combiner is pursued, which is essentially a Wilkinson combiner without the isolation resistors. To enable 8-way combining in a single structure, a lumped $\pi$-section equivalent of a quarter-wave transmission line is employed in each path [6], with the $\pi$-section realized as a single spiral inductor (Fig. 1). To achieve the desired $Z_0 = 50\sqrt{n}$ ohms and a quarter-wavelength electrical length, the spiral must achieve an inductance of $L = 50\sqrt{n}/\omega_o$ (=500pH for n=8) and a parasitic capacitance of $C = 1/(50\sqrt{n}\,\omega_o)$ (=25fF for n=8) on either side. Therefore, the number of elements that can be combined in a single step is limited by the achievable self-resonant frequency (SRF) of spirals in the BEOL. We have found that up to 12 elements may be combined based on achievable SRF, but have pursued 8-way combining in this work due to floor-planning considerations.

Fig. 3. (a) Measured large-signal performance at 37GHz and (b) Measured $P_{sat}$, drain efficiency and PAE of the 33-46GHz power-combined PA array across frequency.

The key insight is that larger-scale one-step power combining is possible than with Wilkinson combining because spirals achieve a higher $Z_0$ (greater inductance for a given parasitic capacitance) than transmission lines through their self magnetic coupling. Loss is also reduced because spirals can use wider lines than thin high-$Z_0$ transmission lines. The combiner presents a 50$\Omega$ load to each PA. It is implemented in the topmost metal layer, which has a thickness of $2.225\mu m$ and is $9.5\mu m$ above the substrate. Measured breakouts show a spiral Q of 25 and an 8-way combining efficiency of 75% at 45GHz, in excellent agreement with EM simulations (78% at 43GHz). For comparison, 8-way combining via a 3-layer cascade of 2:1 Wilkinsons is simulated to have an efficiency of only 63%.
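The $\pi$-section relations above can be checked numerically. The sketch below evaluates $Z_0 = 50\sqrt{n}$, $L = Z_0/\omega_o$ and $C = 1/(Z_0\omega_o)$ for a few combining ratios. The design frequency is not stated explicitly in the text; we assume $f_o = 45$GHz, which reproduces the quoted 500pH and 25fF for n = 8. It also backs out the average per-stage efficiency implied by the simulated 63% for a 3-level Wilkinson cascade. The function name `pi_section` is illustrative only.

```python
import math

F_O = 45e9      # ASSUMED design frequency (Hz); reproduces the quoted L and C values
R_LOAD = 50.0   # load at the combining point (ohms)


def pi_section(n: int, f_o: float = F_O, r_load: float = R_LOAD) -> tuple[float, float, float]:
    """Lumped pi-section equivalent of a quarter-wave branch in an n-way combiner.

    Returns (Z0 in ohms, series L in henries, shunt C on each side in farads).
    """
    z0 = r_load * math.sqrt(n)   # Z0 = 50*sqrt(n): each PA then sees r_load
    w_o = 2 * math.pi * f_o
    l_series = z0 / w_o          # L = Z0 / w_o
    c_shunt = 1.0 / (z0 * w_o)   # C = 1 / (Z0 * w_o)
    return z0, l_series, c_shunt


for n in (2, 4, 8, 12):
    z0, l_h, c_f = pi_section(n)
    print(f"n={n:2d}: Z0={z0:6.1f} ohm  L={l_h * 1e12:6.1f} pH  C={c_f * 1e15:5.1f} fF")

# A 3-level cascade of 2:1 Wilkinsons multiplies the per-stage efficiencies,
# so the simulated 63% for the 8-way cascade implies this average per-stage value.
eta_cascade = 0.63
eta_stage = eta_cascade ** (1 / 3)
print(f"implied per-stage efficiency: {eta_stage:.3f} ({-10 * math.log10(eta_stage):.2f} dB)")
```

As $n$ grows, the required inductance rises while the capacitance the spiral is allowed to have falls, which is consistent with the statement that spiral SRF bounds the combining ratio. At the same assumed frequency, $Z_0 = 100\Omega$ (which happens to be the n = 4 row) gives the 353pH and 35.4fF quoted in Section III.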

Fig. 2 shows the circuit diagram of each individual PA and its driver. The Class-E-like PA employs four stacked nMOS devices, and the Class-E-like driver employs two stacked devices [2]. A breakout of a unit-cell PA comprising the 2-stack driver and the 4-stack PA achieves a peak $P_{sat}$ of 20dBm and a peak PAE of 16%. The input is split between the PAs using a $3\lambda/4$ splitter. A broad bandwidth is achieved in the array through a stagger-tuned design of the unit cell and the combiner: the unit cell's output power decreases with increasing frequency across Q-band when loaded with the combiner's input impedance, while the combiner's efficiency peaks at around 45GHz.

The $3.2mm \times 1.3mm$ power-combined PA array (Fig. 1) is probed in a chip-on-board configuration. The large-signal performance vs. output power at 37GHz and the drain efficiency, PAE and $P_{sat}$ across frequency are summarized in Fig. 3(a) and (b), respectively. The PA maintains 1dB flatness in saturated output power (26-27dBm) and $P_{-1dB}$ (20-21dBm) from 33-46GHz. The PA achieves a peak $P_{sat}$ of 27.2dBm at 35GHz with a peak PAE of 10.7%. Small-signal S-parameters were also measured and show a similar bandwidth, but are not presented here for brevity. Large-signal measurement below 33GHz is limited by the experimental setup, but the PA's bandwidth is expected to extend significantly below Q-band.
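A first-order power budget helps put the 27.2dBm result in context. The sketch below is our own illustration, not an analysis from the paper: it adds $10\log_{10}(8)$ and the combiner loss to the 20dBm unit-cell breakout $P_{sat}$, assuming every cell delivers its breakout power into the combiner. Because the 20dBm, 75% and 27.2dBm figures are peak values taken at different frequencies and load conditions, only rough agreement should be expected. The helper names `dbm_to_watts` and `ideal_array_psat_dbm` are hypothetical.

```python
import math


def dbm_to_watts(p_dbm: float) -> float:
    """Convert a power level in dBm to watts."""
    return 10 ** (p_dbm / 10) / 1000


def ideal_array_psat_dbm(unit_psat_dbm: float, n: int, combiner_eff: float) -> float:
    """First-order estimate: n identical unit cells summed through a lossy combiner.

    ASSUMPTION: each cell delivers its 50-ohm breakout P_sat into the combiner.
    """
    return unit_psat_dbm + 10 * math.log10(n) + 10 * math.log10(combiner_eff)


measured_w = dbm_to_watts(27.2)                     # measured peak P_sat of the array
estimate_dbm = ideal_array_psat_dbm(20.0, 8, 0.75)  # unit-cell and combiner numbers from the text
print(f"measured peak P_sat: {measured_w:.3f} W")
print(f"first-order estimate: {estimate_dbm:.2f} dBm")
```

The estimate (about 27.8dBm) lies within roughly 0.6dB of the measured peak, and the conversion confirms that 27.2dBm is just over 0.5W.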



Fig. 4. Schematic of the 42.5GHz digitally-controlled quarter-wave-load-modulated switching PA array. The state of the array for $n = 8, m = 5, R_l = 25\Omega$ is shown for illustration. The red digital paths are for the OFF PAs (shaded in grey) while the green digital paths are for the ON PAs.

## III. 42.5GHz DIGITALLY-CONTROLLED QUARTER-WAVE LOAD-MODULATED SWITCHING PA ARRAY

In the digitally-controlled quarter-wave load-modulated switching PA array architecture (Fig. 4), several ($n$) Class-E-like mmWave PAs [2] are combined using the lumped-element quarter-wave combiner described earlier. Each Class-E-like PA can be turned OFF through its own digital control, by means of a supply switch, to save dc power under back-off. A key feature of the combiner is its load-modulation behavior as PAs are turned OFF. Assume that $n-m$ PAs are turned OFF and $m$ PAs are kept ON. Each PA is designed to present a short-circuit output impedance to the combiner when turned OFF. The $\lambda/4$ branch transforms this short circuit into an open circuit at the combining point. Consequently, the impedance seen by each of the $m$ ON PAs is $Z_0^2/(50m)$ (=200/$m$ ohms in this implementation, as $Z_0 = 100\Omega$, L=353pH and C=35.4fF). Switching-class PAs are essentially voltage-source-like PAs, whose output power is inversely proportional to the load resistance. Consequently, the output power of each PA is $P_{unit} \propto V_{DD}^2/(200/m) \propto m$, and the total output power of the $m$ ON PAs is $P_{out} = m P_{unit} \propto m^2$. Since the output amplitude scales as $\sqrt{P_{out}}$, the load modulation makes the output amplitude linear with $m$.
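The load-modulation argument can be checked numerically under the same idealizations the text makes: a lossless combiner, OFF PAs that present perfect short circuits, and ON PAs that behave as ideal voltage sources so that $P_{unit} \propto 1/Z_{in}$. The sketch below (function names are illustrative) tabulates $Z_{in}(m) = Z_0^2/(50m)$, the resulting $P_{out} \propto m^2$ and the normalized amplitude; $m = 0$ is excluded because no PA then drives the load.

```python
import math

N = 8        # number of PAs in the array
Z0 = 100.0   # combiner branch impedance (ohms)
R_L = 50.0   # load at the combining point (ohms)


def z_in(m: int) -> float:
    """Resistance seen by each ON PA when m PAs are ON and n-m present short circuits."""
    return Z0 ** 2 / (R_L * m)


def relative_output(m: int) -> tuple[float, float]:
    """Ideal voltage-source model: P_unit ~ 1/Z_in ~ m, so P_out = m * P_unit ~ m^2.

    Returns (P_out, amplitude), both normalized to the all-ON case.
    """
    p_unit = 1.0 / z_in(m)   # common V_DD^2 factor dropped: only ratios matter
    p_out = m * p_unit
    p_full = N * (1.0 / z_in(N))
    return p_out / p_full, math.sqrt(p_out / p_full)


for m in range(1, N + 1):
    p, a = relative_output(m)
    print(f"m={m}: Z_in={z_in(m):6.1f} ohm  P_out={10 * math.log10(p):6.2f} dB  amplitude={a:.3f}")
```

In this ideal model each PA turned OFF lowers the output amplitude by exactly $1/8$ of full scale. The simulated curves in Fig. 7 are described as only almost linear, so this table is best read as the ideal reference.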

Three possible usage scenarios may be envisioned for this architecture: (i) as a 3-bit mmWave power DAC, where the input is maintained at a Class-E drive level and the output amplitude is controlled digitally by turning PAs ON and OFF; (ii) as a power amplifier with the digital control exercised purely as a means of efficient static output power control (i.e., to support low-power modes with high efficiency); and (iii) as a power amplifier where the output modulation is constructed through a combination of input modulation and digital control for efficiency under back-off. Options (ii) and (iii) are enabled by the fact that the mmWave Class-E-like PAs do possess linearity and small-signal gain due to soft-switching at mmWave frequencies [2]. The third option bears some resemblance to a multi-step Doherty architecture [7], but is distinct in the nature of the output combiner and the load-modulation mechanism.
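For usage scenario (i), the digital word simply selects how many PAs are ON. A hypothetical mapping from a normalized target amplitude to the number of ON PAs, assuming the ideal amplitude $\propto m/n$ law, might look like the sketch below. `amplitude_to_control` is an illustrative name and not part of the design, and which particular PAs are switched ON is an arbitrary choice in this ideal model.

```python
N = 8  # PAs in the array


def amplitude_to_control(target: float, n: int = N) -> tuple[int, list[int]]:
    """Quantize a normalized output amplitude (0..1) to a number of ON PAs.

    ASSUMES the ideal relation amplitude ~ m / n. Returns m and one possible set of
    per-PA control bits b_1..b_n (1 = ON); which PAs are ON is arbitrary here.
    """
    if not 0.0 <= target <= 1.0:
        raise ValueError("target amplitude must be in [0, 1]")
    m = round(target * n)               # nearest available amplitude level
    bits = [1] * m + [0] * (n - m)
    return m, bits


for target in (1.0, 0.62, 0.5, 0.1):
    m, bits = amplitude_to_control(target)
    print(target, m, bits)
```

In practice, a lookup table built from measured output voltages such as those in Fig. 8(b) would likely replace the ideal $m/n$ rule. Note also that Python's `round` resolves exact .5 ties to the nearest even integer.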



Fig. 5. Circuit schematic of a breakout of the supply-switched Class-E-like unit-cell PA in the 42.5GHz digitally-controlled quarter-wave-load-modulated switching PA array.



Fig. 6. Chip microphotographs of (a) the 8-way combined wideband watt-class power amplifier and (b) the digitally-controlled quarter-wave load-modulated switching PA array.

The circuit schematic of the supply-switched unit-cell PA is shown in Fig. 5. In 45nm SOI technology, the dc $V_{ds,max}$ is 1.2V, and the peak RF swing across any two device junctions must be kept below $2 \times V_{ds,max}$ for the 40nm floating-body (FB) devices for long-term reliable operation. A thick-oxide pMOS device $M_8$ with a dc $V_{ds,max}$ of 2.4V is used as the supply switch. Therefore, at most two FB devices can be stacked in the main PA to prevent breakdown of the pMOS supply switch in both the ON and OFF states. To increase output power, the main PA is designed for an optimal load of $25\Omega$ when all PAs are ON ($Z_0 = 100\Omega$). Each unit cell also has its own digital control bit ($b_n$) and accompanying high-speed digital circuitry to ensure fast turn-ON/OFF [8]. When the main PA is turned OFF, $M_8$ is turned OFF, while $M_2$ shorts the gate of the driver's input device ($M_3$) to ground. Together, these two techniques conserve dc power. To ensure that an OFF PA presents a short-circuit impedance to the combiner, switch $M_5$ is turned ON, which applies a high gate bias to $M_6$. $M_1$ is also turned ON, which, in conjunction with the series $44\Omega$ resistor, preserves the input match of the OFF driver. A breakout of the unit-cell PA achieves a measured $P_{sat}$ of 18dBm and a peak PAE of 18% under static testing.

Fig. 7(a) shows the simulated input impedance $Z_{in}$ of the 8-way lumped-element quarter-wave combiner seen by the ON PAs as PAs are turned OFF. $Re(Z_{in})$ shows the desired 200/m dependence, while the imaginary part remains small across all settings. The simulated efficiencies for the combiner, the unit cell and the system are summarised in Fig. 7(c) as a function of m. While the combiner's efficiency backs off better than a Class-B PA as its load is modulated. A detailed theoretical treatment of this behavior of mmWave Class-E-like PAs is beyond the scope of this paper and will be included in a future journal publication. The simulated output power of the system closely follows the expected quadratic behaviour with m, as shown in Fig. 7(b). This results in an almost-linear output amplitude with m (Fig. 7(d)), demonstrating the linearizing feature of this architecture and hence its utility as a 3-bit power DAC.

Fig. 7. (a) Simulated input impedance $Z_{in}(m)$ vs *m* of the combiner (b) Simulated unit-cell $P_{sat}$ with $Z_{load} = Z_{in}(m)$ and comparison of simulated and ideal system $P_{sat}$ vs m (c) Simulated combiner, unit-cell and system efficiencies (d) Simulated system output voltage

Fig. 8. (a) Measured output power versus input power and (b) measured output voltage for different digital control settings (i.e. different number of PAs on) at 42.5GHz for the digitally-controlled quarter-wave-load-modulated switching PA array for usage scenario (i).

Small-signal S-parameter measurements indicate that input and output match are maintained across the digital control settings. The large-signal $P_{out}$ vs. $P_{in}$ profile can be seen in Fig. 8(a). A $P_{sat}$ of 23.4dBm is achieved at 42.5GHz when all PAs are ON. Fig. 9 shows drain efficiency and PAE as a function of output power at 42.5GHz across digital control settings. The optimal drain efficiency and PAE contours depict how the digital control should be exercised in conjunction with input modulation for optimal average efficiency in usage scenario (iii). Our measurements show a $2.25\times$ improvement in drain efficiency and a $1.75\times$ improvement in PAE at 6dB back-off over the baseline case where all PAs are always kept ON. Usage scenario (i) is a subset of scenario (iii) in which the input power is held constant at its peak value and the PA behaves as a 3-bit quantizer; its efficiency vs. $P_{out}$ characteristic is therefore the set of discrete points in Fig. 9 at constant $P_{in}$ across the m settings. The peak PAE ($\approx 7\%$) and output power (23.4dBm) are lower than expected (12% and 25dBm) due to frequency mismatch between the PAs and the combiner.
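The Class-B benchmark used in Table I and in the conclusion follows from simple arithmetic: ideal Class-B efficiency scales with output amplitude, so at 6dB back-off it drops to about half of its peak. Assuming the ideal $P_{out} \propto m^2$ law from Section III, the same 6dB back-off corresponds to turning OFF half of the eight PAs. The sketch below (our illustration; helper names are hypothetical) computes both numbers.

```python
import math


def backoff_amplitude_ratio(backoff_db: float) -> float:
    """Output-voltage ratio corresponding to a power back-off in dB."""
    return 10 ** (-backoff_db / 20)


def class_b_efficiency_ratio(backoff_db: float) -> float:
    """Ideal Class-B: efficiency scales linearly with output amplitude."""
    return backoff_amplitude_ratio(backoff_db)


n = 8
ratio = backoff_amplitude_ratio(6.0)
m_ideal = n * ratio   # ideal array (assumed): amplitude ~ m / n
print(f"Class-B efficiency at 6dB back-off: {100 * class_b_efficiency_ratio(6.0):.1f}% of peak")
print(f"PAs ON for 6dB back-off (ideal model): {m_ideal:.2f} -> m = {round(m_ideal)}")
print(f"back-off with m=4 of 8: {-20 * math.log10(4 / n):.2f} dB")
```

Against this roughly 50% Class-B reference, Table I lists a measured $DE_{-6dB}/DE_{max}$ of 60% for the load-modulated array.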

| Work | [1] | [3] | [4] | [9] | [10] | [11] | [12] | This Work | This Work |
|---|---|---|---|---|---|---|---|---|---|
| Technology | 45nm SOI CMOS | 45nm SOI CMOS | 40nm CMOS | 130nm SiGe | 130nm SiGe | 130nm SiGe | 90nm CMOS | 45nm SOI CMOS | 45nm SOI CMOS |
| Freq. (GHz), $\Delta f/f_0$ | 41.5-48.5, 16% | 42<sup>1</sup> | 60<sup>1</sup> | 58-62, 6.7% | 59-64, 8.1% | 80-90, 12% | 60<sup>1</sup> | **33-46, 33%**<sup>2</sup> | 42.5 |
| Gain<sub>max</sub> (dB) | 18.36<sup>3,4</sup> | 7 | N/R | 18<sup>4</sup> | 20<sup>4</sup> | 8 | 20.6 | 19.4 | 11.9 |
| P<sub>sat,max</sub> (dBm) | 24.3<sup>3,4</sup> | 18 | 15.6 | 20<sup>4</sup> | 23.1<sup>4</sup> | 21 | 19.9 | 27.2 | 23.4 |
| DE<sub>max</sub> (%) | 21.3<sup>3,4</sup> | 33 | N/R | 15<sup>4</sup> | N/R | 4 | N/R | 11.7 | 8.2 |
| PAE<sub>max</sub> (%) | 14.6<sup>3,4</sup> | 23 | 25 | 12.7<sup>4</sup> | 6.3<sup>4</sup> | N/R | 14.2 | 10.7 | 6.7 |
| DE<sub>-6dB</sub>/DE<sub>max</sub> (%) | N/R | 72.7 | N/R | N/A | N/A | 23 | N/R | 42.7 | 60 |
| Architecture | 2-bit Power DAC | Doherty | Outphasing | N/A | N/A | N/A | 4-way Power Combined | 8-way Power Combined | $\lambda/4$ Load-Modulated Switched PA (option III) |
| Fully Integrated? | No<sup>3,4</sup> | Yes | Yes | Yes<sup>4</sup> | Yes<sup>4</sup> | Yes | Yes | Yes | Yes |

TABLE I. Comparison with State-of-the-Art CMOS and SiGe mmWave PAs with $P_{sat} > 20$dBm or Employing Linearizing Architectures

 $^{1}$ Large-signal performance across frequency is not reported.  $^{2}$ Measurement below 33GHz is limited by equipment.  $^{3}$ Does not have an on-chip supply inductor (biased using external bias-Ts).  $^{4}$ Assumes 3dB external differential-to-single-ended converter.



Fig. 9. Drain efficiency vs. $P_{out}$ and PAE vs. $P_{out}$ for different digital control settings (i.e. different numbers of PAs ON) at 42.5GHz for the digitally-controlled quarter-wave-load-modulated switching PA array. \*Curves are offset for clarity.

The PAE is also expected to be higher in an SoC transmitter implementation, either by eliminating the input $50\Omega$ terminations presented by OFF PAs (which degrade efficiency under back-off) through co-design with the preceding driver stage, or by adding another driver stage within each supply-switched unit cell. The measured output voltage (Fig. 8(b)) displays the expected linear profile with *m*. A supply-switched unit-cell breakout is measured to have $\approx$225ps rise/fall times on turn ON/OFF and can support Gbps OOK modulation speeds, which indicates that the array can sustain high-speed complex modulations as well.

## IV. CONCLUSION

Table I compares the measured performance of both prototypes to state-of-the-art CMOS and SiGe mmWave PAs with either output powers in excess of 20dBm or linearizing architectures. The 33-46GHz 27dBm PA achieves the highest output power, by almost a factor of 5, with efficiency comparable to prior works, when the ideal 3dB differential-to-single-ended conversion assumed for prior differential works is taken into account. It also exhibits the highest fractional bandwidth, by a factor of 2. This is despite the fact that a typical 130nm SiGe process has a higher $f_{max}$ (240GHz for IBM SiGe8HP [9]) than a typical 45nm SOI CMOS process (200GHz for IBM 12SO [2]), as well as a significantly higher breakdown voltage. The digitally-controlled quarter-wave load-modulated switching PA array is the only implementation to achieve back-off characteristics better than Class B (i.e. efficiency at 6dB back-off >50% of peak efficiency), peak output power >20dBm *and* linearization. Measurements of this PA under modulation are ongoing.

## ACKNOWLEDGEMENTS

The authors acknowledge the DARPA ELASTx program, DARPA program manager Dr. S. Raman, and AFRL (Drs. R. Worley and P. Watson) for their support.

#### REFERENCES

- [1] A. Balteanu *et al.*, "A 45-GHz, 2-bit power DAC with 24.3 dBm output power, >14 V<sub>pp</sub> differential swing, and 22% peak PAE in 45-nm SOI CMOS," in 2012 IEEE RFIC Symp., June 2012, pp. 319–322.
- [2] A. Chakrabarti and H. Krishnaswamy, "High power, high efficiency stacked mmWave Class-E-like power amplifiers in 45nm SOI CMOS," in 2012 IEEE CICC, Sep. 2012, pp. 1–4.
- [3] A. Agah *et al.*, "A 45GHz Doherty power amplifier with 23% PAE and 18dBm output power, in 45nm SOI CMOS," in 2012 IEEE IMS, June 2012, pp. 1–3.
- [4] D. Zhao *et al.*, "A 60GHz outphasing transmitter in 40nm CMOS with 15.6dBm output power," in 2012 IEEE ISSCC, Feb. 2012, pp. 170–172.
- [5] J.-W. Lai and A. Valdes-Garcia, "A 1V 17.9dBm 60GHz power amplifier in standard 65nm CMOS," in 2010 IEEE ISSCC, Feb. 2010, pp. 424–425.
- [6] J.-G. Kim and G. Rebeiz, "Miniature Four-Way and Two-Way 24 GHz Wilkinson Power Dividers in 0.13 µm CMOS," *IEEE MWCL*, vol. 17, no. 9, pp. 658–660, Sep. 2007.
- [7] L. Piazzon *et al.*, "New generation of multi-step Doherty amplifier," in 2011 IEEE EuMIC, Oct. 2011, pp. 116–119.
- [8] A. Chakrabarti and H. Krishnaswamy, "Design Considerations for Stacked Class-E-like mmWave Power DACs in CMOS," in 2013 IEEE IMS, submitted for publication.
- [9] U. Pfeiffer and D. Goren, "A 20 dBm Fully-Integrated 60 GHz SiGe Power Amplifier With Automatic Level Control," *IEEE JSSC*, vol. 42, no. 7, pp. 1455–1463, July 2007.
- [10] U. R. Pfeiffer and D. Goren, "A 23-dBm 60-GHz Distributed Active Transformer in a Silicon Process Technology," *IEEE T-MTT*, vol. 55, no. 5, pp. 857–865, May 2007.
- [11] E. Afshari *et al.*, "Electrical funnel: A broadband signal combining method," in 2006 IEEE ISSCC, Feb. 2006, pp. 751–760.
- [12] C. Law and A.-V. Pham, "A high-gain 60GHz power amplifier with 20dBm output power in 90nm CMOS," in 2010 IEEE ISSCC, Feb. 2010, pp. 426–427.