Flexible on-chip power and clock
11243559 · 2022-02-08
Assignee
Inventors
Cpc classification
H03K3/012
ELECTRICITY
Y02D10/00
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
International classification
Abstract
Modern integrated circuits have an increasing need for various levels of both supply voltage (V) and operating frequency (f) available at fine spatial and temporal granularity. This work introduces a solution that provides a number and quality of locally distributed V/f domains through FOPAC. Opportunistically sharing design resources and features between multi-phase voltage regulators (MPVRs) and resonant rotary clocks (ReRoCs) enables i) scalability to hundreds of domains, ii) fast switching times for both voltage and frequency, leading to temporal flexibility, and iii) locally distributed designs, leading to spatial flexibility.
Claims
1. A flexible on-chip power and clock (FOPAC) architecture comprising: multi-phase voltage regulators (MPVRs) that share a fly capacitor with a resonant rotary clock (ReRoC), wherein the ReRoC provides multiple clock phases for an interleaved operation; and wherein comparators that provide high gain are driven by a higher frequency clock signal (CLKHf) and comparators that provide nominal gain are driven by a lower frequency clock signal (CLKLf).
2. The flexible on-chip power and clock (FOPAC) architecture of claim 1, wherein the ReRoC includes IC interconnects for transmission lines.
3. The flexible on-chip power and clock (FOPAC) architecture of claim 2, further comprising inverter pairs that are uniformly distributed along the transmission lines.
4. The flexible on-chip power and clock (FOPAC) architecture of claim 3, wherein the inverter pairs are uniformly distributed along the transmission lines in anti-parallel fashion.
5. The flexible on-chip power and clock (FOPAC) architecture of claim 4, wherein a traveling wave along the transmission line of the ReRoC provides multiple phases with a duty cycle of 50%.
6. The flexible on-chip power and clock (FOPAC) architecture of claim 5, wherein the phase delay of the ReRoC is evenly distributed in the direction of wave propagation.
7. The flexible on-chip power and clock (FOPAC) architecture of claim 1, wherein the two different frequencies of the clock signals are provided from the ReRoC.
8. The flexible on-chip power and clock (FOPAC) architecture of claim 1, wherein the comparators are of a double-tail latch type for speed and low kickback noise.
9. The flexible on-chip power and clock (FOPAC) architecture of claim 1, wherein the MPVR is an integrated switched capacitor voltage regulator (SCVR).
10. The flexible on-chip power and clock (FOPAC) architecture of claim 9, wherein the SCVR comprises at least one fly capacitor that generates a DC output voltage.
11. The flexible on-chip power and clock (FOPAC) architecture of claim 9, wherein the SCVR has non-overlapping signals that operate at the MHz frequency range.
12. The flexible on-chip power and clock (FOPAC) architecture of claim 1, wherein the ReRoC comprises a frequency divider with a main loop of MOSABs and a sub-loop of MISABs.
Description
DETAILED DESCRIPTION OF THE EMBODIMENTS
(21) 1. Flexible On-Chip Power and Clock
(22) 1.1 Introduction
(23) Modern integrated circuits have an increasing need for various levels of both supply voltage (V) and operating frequency (f) available at fine spatial and temporal granularity. This work introduces a solution that provides a number and quality of locally distributed V/f domains through FOPAC, as shown in
(24) The performance improvements and power savings enabled by flexible on-chip power and clocks (FOPAC) are motivated in
(25) FOPAC provides spatially and temporally flexible power/clock domains that are fine-tuned for each individual unit and collectively designed with shared overhead with superior performance. This flexibility comes with significant savings in power, performance, area, and accuracy, thanks to the opportunistic design of the MPVR and the ReRoC, leading to the following features: (1) fast switching between different V/f pairs, (2) symmetric switching between V/f pairs to improve power savings and performance, as illustrated in
(26) 1.2. Proposed FOPAC Architecture
(27) Flexible on-chip power and clock (FOPAC) architecture encompasses the FOPAC circuit building blocks, as illustrated in
(28) 1.2.1 Dynamic Frequency Scaling of ReRoC
(29) In the dynamic ReRoC frequency divider illustrated in
(30)
(31) The m phases of the ReRoCs are used to produce the frequency divider output. When m=8, the phase delay between adjacent SABs is (7/8)·2π in the main loop and (6/8)·2π in the sub-loop. From Eq. 1.1, to perform a frequency division of r=9, n1=6 and n2=5 connections are required. This can be achieved with the circuit topology in
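The worked numbers above can be checked arithmetically. Eq. 1.1 is not reproduced in this text, so the sketch below assumes that the division ratio is the sum of the per-block phase delays expressed in clock periods, r = n1·(m−1)/m + n2·(m−2)/m. That form is an assumption chosen because it reproduces the stated values, and the function name is hypothetical.

```python
from fractions import Fraction


def division_ratio(m: int, n1: int, n2: int) -> Fraction:
    """Assumed form of Eq. 1.1 (not confirmed by the source text)."""
    main_delay = Fraction(m - 1, m)  # main-loop delay per SAB, in units of 2*pi
    sub_delay = Fraction(m - 2, m)   # sub-loop delay per SAB, in units of 2*pi
    return n1 * main_delay + n2 * sub_delay


# m = 8 phases, n1 = 6 main-loop and n2 = 5 sub-loop connections
print(division_ratio(m=8, n1=6, n2=5))  # 9
```

Under this assumed form, 6·(7/8) + 5·(6/8) = 72/8 = 9, which matches the division ratio stated in the text.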
(32) The power consumption of the frequency dividers for division ratios 3 to 44 with a master clock of 3.3 GHz is shown in
(33) 1.2.2 Dynamic Voltage Scaling With MPVR
(34) The circuit topology for the proposed MPVR, which is an integrated SCVR with ReRoC for FOPAC, as illustrated in
(35) Overall, one ReRoC structure with k dividers can provide the k distinct clock frequencies for comparator operation, and the m multi-phase signals, shown in
(36) Two sets of results are presented to validate and measure the effectiveness of the DVS operation with MPVR. The first set of results is the symmetric and fast response to both step-up and step-down changes in the reference voltage of the MPVR with an output current of 50 mA (arbitrarily selected) shown in
(37) The second set of experimental results for DVS with MPVR concerns the effect of fly capacitor selection on i) SCVR conversion efficiency and ii) the opportunistic design assisting DFS. In
(38)
(39) In EQ. 1.2, Pout, Psw, Pbuff, Ppar, and Pcontrol are the output power, switching power, buffer power, parasitic power, and control and reference circuit related power, respectively. These values are obtained from SPICE simulations of extracted layouts of a FOPAC designed in a 65 nm technology. The (known) impact of the fly capacitor size on the maximum power efficiency for varying load current is shown in
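As an illustration of how these power terms combine, the sketch below assumes the conventional efficiency definition, output power divided by output power plus all loss terms. EQ. 1.2 itself is not reproduced in this text, so this form is an assumption; the function name and the numeric values are hypothetical and are not simulation results.

```python
def scvr_efficiency(p_out, p_sw, p_buff, p_par, p_control):
    """Assumed efficiency: P_out / (P_out + sum of loss terms).

    All arguments must use the same power unit (for example mW).
    """
    losses = p_sw + p_buff + p_par + p_control
    return p_out / (p_out + losses)


# Hypothetical values in mW, for illustration only
eta = scvr_efficiency(p_out=50.0, p_sw=5.0, p_buff=3.0, p_par=4.0, p_control=3.0)
print(f"efficiency = {eta:.1%}")
```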
(40) In the FOPAC architecture fly capacitors connected to the ReRoCs can optionally be used to lower the frequency. Frequency scaling with different fly capacitor values on an arbitrarily selected ReRoC frequency of 4.7 GHz is shown in
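The direction of this effect can be sketched with the first-order, lossless model commonly used for rotary traveling-wave oscillators, f = 1/(2·√(L_total·C_total)). The sketch assumes that model applies here; the loop inductance and the fly capacitor values are hypothetical, and the baseline capacitance is simply back-calculated so that the unloaded ring runs at 4.7 GHz. The reported results come from SPICE simulation, not from this formula.

```python
import math


def rotary_clock_frequency(l_total, c_total):
    """First-order rotary clock frequency in Hz (lossless model, an assumption)."""
    return 1.0 / (2.0 * math.sqrt(l_total * c_total))


L_TOTAL = 1e-9   # hypothetical total loop inductance, in henries
F_BASE = 4.7e9   # unloaded ReRoC frequency from the text, in hertz
C_BASE = 1.0 / (4.0 * F_BASE**2 * L_TOTAL)  # capacitance implied by the model

# Adding fly capacitance to the ring increases C_total and lowers the frequency.
for c_fly in (0.0, 1e-12, 2e-12, 5e-12):  # hypothetical fly capacitor values
    f = rotary_clock_frequency(L_TOTAL, C_BASE + c_fly)
    print(f"C_fly = {c_fly * 1e12:.1f} pF -> f = {f / 1e9:.2f} GHz")
```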
(41) 1.2.3 FOPAC Methodology
(42) The FOPAC methodology, compliant with the traditional ASIC flow, is illustrated in
(43) 1.2.3.1 ReRoC design: The custom ReRoC clock distribution network synthesis has five steps, illustrated in
(44) 1.2.3.2 Power generation: The power generation (and distribution) has five steps, illustrated in
(45) The input to Step 1 is the topology of the ReRoC rings, including the number of ReRoC rings and the tapping points for the multiple phases required for the MPVRs from the ReRoC design flow. For a given power budget, an SCVR topology may be designed with the goal of achieving the desired target efficiency by distributing the power budget over multiple SCVRs. The number of SCVRs required may be divided such that each ReRoC ring has a voltage regulator (with load balancing) and the rest of the design has the appropriate number of voltage regulators necessary to operate during the low performance mode. Similar to the use of PVT analysis in ReRoC design for DFS, the power generation stage may use SPICE simulations in Step 5 for signal integrity analysis of DVS operation.
(46) 1.3. FOPAC DVFS Evaluation
(47) FOPAC is demonstrated on three different industrial designs that are publicly available: 1) AES encryption core, 2) Arm core—CORTEX M1, and 3) VSCALE RISC-V. The designs are placed and routed (P&R) and subjected to STA in order to verify the timing of the ASIC flow at the system level. The timing and power characteristics of the FOPAC components (ReRoC and MPVR) are analyzed in deeper detail through SPICE simulations of layout-extracted models.
(48) In particular, the transmission line interconnect parasitics are extracted using the High Frequency Structure Simulator (HFSS). The algorithms are implemented in C++ and MATLAB. An industrial 65 nm technology library is used for the design and routing.
(49) An arbitrary ReRoC frequency of 3.3 GHz (FM) is chosen to evaluate the FOPAC methodology. Two sets of dynamic resonant frequency dividers to perform frequency division in integer ratios 3 to 9 are designed for the core clock source (Fcore) and the MPVR clock source (Fmpvr). In the PVT stage, the geometries of the ReRoC rings along with the frequency dividers are varied ±10% to represent the worst case scenarios. The deviation from the target frequency of 3.3 GHz with PVT variations for 500 Monte-Carlo runs are presented in Table 1.2,
(50) Worst-case static IR and L·di/dt analyses are performed on the layout-extracted industrial designs (RLC models). The average worst-case voltage drop across the three industrial designs is 2.1% of Vdd.
(51) A sample operation of FOPAC DVFS operation of the RISC-V core is presented in
(52) The high gain comparators are clocked at FM and nominal gain comparators are clocked at Fmpvr. The frequency Fcore and voltage are scaled between different levels to validate the accuracy of the switching speed. To enable the fly capacitor reuse mode (RUmode), 10 of the SCVRs are shut down and the fly capacitor is loaded to the ReRoC rings in the RISC-V core. In the RUmode, it takes ~3 ns for the frequency of the ReRoC FM to stabilize to 3.14 GHz (with voltage scaling). Then, it takes 0.97 ns to scale Fcore to 624 MHz and Fmpvr to 348 MHz to operate the RISC-V core in the RUmode with Vdd=0.91 V. In total, it takes 3.97 ns to scale the frequency to the RUmode at run time by utilizing the fly capacitor of the SCVRs.
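The timing budget and the implied division ratios can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this paragraph; the nearest-integer division ratios it prints are an inference from those figures and are not stated in the text.

```python
# Figures quoted in the paragraph above
T_STABILIZE_NS = 3.0   # approximate time for FM to settle at 3.14 GHz
T_DIVIDE_NS = 0.97     # time to scale Fcore and Fmpvr
F_M_MHZ = 3140.0       # ReRoC frequency in RUmode
F_CORE_MHZ = 624.0     # core clock in RUmode
F_MPVR_MHZ = 348.0     # MPVR clock in RUmode

total_ns = T_STABILIZE_NS + T_DIVIDE_NS
ratio_core = F_M_MHZ / F_CORE_MHZ
ratio_mpvr = F_M_MHZ / F_MPVR_MHZ

print(f"total switching time = {total_ns:.2f} ns")
print(f"FM / Fcore = {ratio_core:.2f} (nearest integer {round(ratio_core)})")
print(f"FM / Fmpvr = {ratio_mpvr:.2f} (nearest integer {round(ratio_mpvr)})")
```

Both inferred ratios fall inside the integer range 3 to 9 that the dividers are designed for.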
(53) The power consumption of the FOPAC based designs versus PLL based designs operating at 825 MHz is presented in
(54) FOPAC is compared to prior resonant works in Table 1.4. The numbers reported in Table 1.4 are from SPICE simulations of the sweep of the V/f range (0.9 V to 1.2 V and 348 MHz to 1.1 GHz) over SS, FF, FS, and SF corners, and not only the results reported in Table 3. Overall, FOPAC delivers power with 77% efficiency, and achieves 25%-39% power reduction thanks to 64%-74% reduction in clock power.
(55) The voltage scaling within FOPAC is symmetric and robust with a (worst-case) response time t_response of 89 ps. The DFS switching time within FOPAC utilizing a 3.3 GHz ReRoC is 0.9 ns. Overall, FOPAC demonstrates scaling of the voltage-frequency over a wide range without the need for on/off-chip inductors while re-utilizing (RUmode) the fly capacitor for higher frequency tuning.
(56) 1.4 Conclusions
(57) In this work, the fusion of a resonant rotary clock with on-chip voltage regulators, enabling flexible on-chip power and clock, is presented. FOPAC is designed and evaluated on three different industrial designs to validate the architecture. FOPAC can switch between different V/f domains in 1.9 ns with a ReRoC clock operating at 3.3 GHz. FOPAC achieves 25%-39% power savings while offering fly capacitance re-usability to tune the ReRoC frequency at run time without any negative implications. FOPAC can provide a high number of V/f domains with fast DVFS capability while consuming low power and operating reliably, as justified by the evaluation on industrial designs in this work.
(58) 2. Distributed Digital Low-Dropout Regulators With Phase Interleaving For On-Chip Voltage Noise Mitigation
(59) 2.1 Introduction
(60) The digital low-dropout regulator (DLDO), in which multiple parallel power transistors are controlled by a digital loop consisting of a shift register and a clocked comparator, has been drawing attention in modern integrated systems such as processors and Internet of Things (IoT) applications. Due to the adoption of a digital control loop, low design complexity, low voltage operation capability, and process scalability can be easily achieved. Various techniques such as adaptive control and reduced dynamic stability, coarse-fine tuning and burst-mode operation, event-driven PI control, and computational control have been implemented to improve the transient response. Meanwhile, reliability enhancement techniques for DLDOs have also been investigated recently.
(61) On the other hand, the on-chip power delivery network becomes larger and more sophisticated with technology scaling to support the increased functionality of the load circuits. Distributed on-chip voltage regulation, with multiple tiny voltage regulators deployed at different locations of the chip, emerges as a promising solution to achieve localized voltage regulation and superb voltage noise performance. With an increased number of voltage regulators distributed across the chip, load current is supplied by nearby voltage regulators, and therefore less IR drop can be realized as compared with a centralized power delivery scheme utilizing a single voltage regulator.
(62) DLDO is an ideal candidate for distributed on-chip voltage regulation due to the aforementioned advantages. Similar to a multiphase buck or switched-capacitor converter, the clock distribution network of distributed DLDOs also needs to be carefully considered.
(63) Despite the advantages of DLDOs, limit cycle oscillation due to inherent quantization error can occur, which leads to unwanted output voltage ripple at steady state. In this work, the benefits of distributed DLDOs with phase interleaving for on-chip voltage noise mitigation are investigated, both theoretically and through extensive simulations, for different numbers of DLDOs. The design of the corresponding clock distribution network is also addressed. Instead of utilizing a ring oscillator, a resonant rotary clock (ReRoC) is proposed for robust clock generation and distribution.
(64) 2.2 Distributed Digital Low-Dropout Regulators With Phase Interleaving
(65) The amplitude and mode of limit cycle oscillation can be investigated utilizing a nonlinear sampled feedback model for DLDO steady state operation shown in
(66)
(67) where T, D, M, Fl, and KOUT are the clock period, amplitude of the clocked comparator output, mode of limit cycle oscillation, load pole, and gain of P(z), respectively. KOUT=KDC·IpMOS and Fl=1/((RL∥Rp)·C). KDC, IpMOS, RL, Rp, and C are the DC constant, current provided by a single power transistor, load resistance, power transistor array resistance, and output capacitance, respectively.
(68) The following Nyquist condition needs to be satisfied for a certain mode of limit cycle oscillation to exist
N(A,φ)·P(e^{jωT})·S(e^{jωT}) = 1∠(−π)   EQ. 2.4
(69) where the angular limit cycle oscillation frequency is ω=π/(TM). Determined by EQ. 2.4,
(70)
(71) Considering distributed on-chip voltage regulation with N DLDOs, under a certain load current condition Iload, if all N DLDOs operate with the same clock signal, the equivalent IpMOS becomes N·IpMOS as N power transistors are turned on/off at the rising edge of each clock cycle. Fl and T remain unchanged as compared with the case of a single DLDO. For this case, φ becomes
(72)
(73) When N distributed DLDOs operate in a phase-interleaved fashion, Fl and IpMOS remain unchanged while the equivalent T becomes T/N. Therefore, the corresponding φ becomes
(74)
(75) Compared with φ1, φ2 is more likely to satisfy the condition φ∈(0,π/M) with a larger value of M due to the added factor N, which means a larger limit cycle oscillation mode. However, the amplitude of the output voltage ripple has an approximately linear relationship with the equivalent size of a single power transistor, while being less affected by the equivalent switching frequency. A smaller output voltage ripple is therefore typically introduced with phase-interleaved clocks.
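The statement that N phase-interleaved DLDOs behave like a single regulator with an equivalent clock period of T/N can be illustrated by listing the rising-edge times of the individual clocks. The sketch below is a minimal illustration rather than the analysis used in this work; the function name is hypothetical, and the 100 ns period and N=6 are example values.

```python
from fractions import Fraction


def interleaved_edges(period, n_dldos, n_cycles):
    """Rising-edge times of n_dldos clocks, each offset by period / n_dldos."""
    step = Fraction(period) / n_dldos
    edges = [cycle * period + i * step
             for cycle in range(n_cycles)
             for i in range(n_dldos)]
    return sorted(edges)


T_NS = 100  # example clock period in ns (a 10 MHz clock)
edges = interleaved_edges(T_NS, n_dldos=6, n_cycles=2)

# With interleaving, some DLDO updates every T/N; the gap between
# consecutive edges is therefore a single value equal to T/N.
gaps = {later - earlier for earlier, later in zip(edges, edges[1:])}
print(gaps)
```

Each update still switches only one power transistor of size IpMOS, which is why the ripple amplitude does not grow with N in the interleaved case.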
(76) 2.3 Clock Generation and Distribution
(77) ReRoC is proposed to serve as the clock circuitry 2300 for distributed DLDOs as shown in
θ/(2π) = t/T   EQ. 2.8
(78) T is the clock period in Eq. 2.8. A power optimal solution can be generated by designing GHz frequency range ReRoCs and then performing frequency division to generate the target frequency. The resonant frequency divider is designed with spot-advancing blocks (SABs) to perform frequency division. Multiple SABs along with the m phases of the ReRoC are utilized to perform frequency division. By cascading multiple SABs, the desired frequency division ratio can be achieved. Each SAB block in the divider topology phase shifts the clock signal by ((m−1)/m)·2π. Therefore, each SAB block employed for the frequency division generates a phase shifted signal. The same phase clock signals can be tapped from the first SAB block of each divider topology. Overall, one ReRoC structure with k dividers can provide the k distinct clock frequencies for comparator operation, and the multi-phase signals. ReRoCs are designed in the GHz frequency range and the clock for the multi-phase distributed DLDOs is generated after frequency division and duty cycle conversion.
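The phase-to-time relation of Eq. 2.8 and the per-SAB phase shift can be turned into numbers as follows. This is a minimal sketch; the function names are hypothetical, and the choices m=8 and a 3.3 GHz ReRoC are example values.

```python
import math


def phase_to_delay(theta, period):
    """Eq. 2.8 solved for t: t = (theta / (2*pi)) * T."""
    return theta / (2.0 * math.pi) * period


def sab_phase_shift(m):
    """Phase shift of one SAB for an m-phase ReRoC: ((m - 1) / m) * 2*pi."""
    return (m - 1) / m * 2.0 * math.pi


m = 8                   # example number of ReRoC phases
period_ns = 1.0 / 3.3   # period of an example 3.3 GHz ReRoC, in ns

shift = sab_phase_shift(m)
print(f"SAB phase shift = {shift:.4f} rad")
print(f"equivalent delay = {phase_to_delay(shift, period_ns):.4f} ns")
```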
(79) It is straightforward to tap the m multi-phase clock sources for the distributed DLDOs since the tapping locations are accurately known on the frequency divider, and the routing is not as complex as in clock distribution networks accomplished in earlier approaches. Consider the tapping point for a particular phase ΘPi to be located at (x,y). The distributed DLDO clock source taps onto the ΘPi that satisfies the phase requirement. The clock phase depends on: i) ΘDLDOp, the phase required for the DLDO, and ii) Θli, the phase attributed to the tapping wire li, where ΘDLDOp=Θli+ΘPi. The ReRoC can be designed to fit the locations of the DLDOs or vice versa.
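The tapping relation, in which the phase required at the DLDO equals the tapping-wire phase plus the tap phase, can be rearranged to find which tap to use. The sketch below assumes, purely for illustration, that the m tap phases are evenly spaced around the ring; phases are expressed in turns (1 turn = 2π) so that the arithmetic is exact. The function names and example values are hypothetical.

```python
from fractions import Fraction


def required_tap_phase(theta_dldo, theta_wire):
    """Tap phase = DLDO phase - wire phase, wrapped into [0, 1) turns."""
    return (theta_dldo - theta_wire) % 1


def nearest_tap(theta_tap, m):
    """Index of the closest of m evenly spaced taps (even spacing is an assumption)."""
    return round(theta_tap * m) % m


# Example: the DLDO needs a phase of 1/2 turn and the wire adds 1/8 turn.
tap_phase = required_tap_phase(Fraction(1, 2), Fraction(1, 8))
print(tap_phase, nearest_tap(tap_phase, m=8))  # 3/8 3
```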
(80) ReRoCs, through self-resonance and being locked in-phase across ReRoC rings in an ReRoC oscillatory array (ROA) topology, are robust to local variations through self-tuning. In other words, a local variation affecting a local ReRoC ring gets compensated across the ROA, as each local ReRoC ring is locked in phase, and will quickly recover back to the stable resonance mode (or stabilize at a new resonance mode). This structure provides for robustness to variations and limits jitter and skew variations. The self-resonance, on the other hand, requires algorithmic solutions to balance the load across the ROA. The ReRoCs are known to be stable under process variations in terms of frequency matching across large floorplans.
(81) 2.4 Evaluation
(82) To validate the effectiveness of distributed DLDOs with phase interleaving operation, DLDOs with a supply voltage of 1.1V, which is typical for processor applications, are implemented using a 32 nm CMOS process. Output voltage level of 1V is considered. Resistive mesh model is used for the power grid with model parameters. Current provided by a single power transistor IpMOS is approximately 2 mA. 10 MHz clock frequency is utilized for steady state operation. Distributed on-chip voltage regulation with, respectively, 3, 6, and 9 DLDOs is considered. A single DLDO in each case has, respectively, 96, 48, and 32 power transistors such that the total maximum current capability for each case is the same. Uniform distribution of DLDOs is considered. Load currents of 50 mA, 150 mA, and 300 mA are considered for each case. Simulation results demonstrating the output voltage ripple waveforms of 6 distributed DLDOs under different load current with and without phase interleaving operation are shown in
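The sizing of the three configurations can be checked directly. The sketch below uses the DLDO counts, transistor counts, and the approximate 2 mA per-transistor current quoted above; the maximum current it prints is therefore approximate as well.

```python
I_PMOS_MA = 2  # approximate current of a single power transistor, in mA

# number of DLDOs -> power transistors per DLDO, as quoted in the text
CONFIGS = {3: 96, 6: 48, 9: 32}

for n_dldos, per_dldo in CONFIGS.items():
    total = n_dldos * per_dldo
    print(f"{n_dldos} DLDOs x {per_dldo} transistors = {total} transistors, "
          f"about {total * I_PMOS_MA} mA maximum")
```

All three cases total 288 power transistors, or roughly 576 mA, which covers the largest load current of 300 mA considered.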
(83) Under a load current of 50 mA and 9 DLDOs, phase interleaving operation reduces voltage ripple from 179.41 mV to 97.42 mV, which is within 10% of Vout. Furthermore, up to 46% output voltage ripple reduction is demonstrated.
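The percentages follow from the two ripple values quoted above, as the short check below shows. It uses only numbers stated in the text, with the 1 V output level taken from the evaluation setup.

```python
RIPPLE_NO_INTERLEAVE_MV = 179.41  # ripple without phase interleaving
RIPPLE_INTERLEAVE_MV = 97.42      # ripple with phase interleaving
V_OUT_MV = 1000.0                 # 1 V output level

reduction = 1 - RIPPLE_INTERLEAVE_MV / RIPPLE_NO_INTERLEAVE_MV
fraction_of_vout = RIPPLE_INTERLEAVE_MV / V_OUT_MV

print(f"ripple reduction = {reduction:.1%}")         # 45.7%
print(f"ripple / Vout    = {fraction_of_vout:.1%}")  # 9.7%
```

This case gives a reduction of about 45.7%, consistent with the "up to 46%" figure.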
(84) 2.5 Conclusions
(85) Distributed on-chip voltage regulation demonstrates advantages as compared with a centralized scheme with only a single voltage regulator. The DLDO is an ideal candidate for distributed on-chip voltage regulation. When DLDOs are distributed, the clock generation and distribution need to be carefully considered. This work shows, through both theoretical analysis and extensive simulations, that distributed DLDOs with phase interleaving operation help to mitigate on-chip voltage noise at steady state. Corresponding clock circuitry leveraging a resonant rotary clock, which is highly resilient to process variation, is also proposed. Distributed DLDOs with phase interleaving operation demonstrate up to 46% on-chip voltage noise mitigation.
(86) While the invention has been described with reference to the embodiments above, a person of ordinary skill in the art would understand that various changes or modifications may be made thereto without departing from the scope of the claims.