Abstract
A system and method for high-speed, low-power cryogenic computing are presented, comprising ultrafast energy-efficient RSFQ superconducting computing circuits, and hybrid magnetic/superconducting memory arrays and interface circuits, operating together in the same cryogenic environment. An arithmetic logic unit and register file with an ultrafast asynchronous wave-pipelined datapath is also provided. The superconducting circuits may comprise inductive elements fabricated using both a high-inductance layer and a low-inductance layer. The memory cells may comprise superconducting tunnel junctions that incorporate magnetic layers. Alternatively, the memory cells may comprise superconducting spin transfer magnetic devices (such as orthogonal spin transfer and spin-Hall effect devices). Together, these technologies may enable the production of an advanced superconducting computer that operates at clock speeds up to 100 GHz.
Claims
1. A cryogenic memory array chip configured to operate at below 10 K, comprising: a plurality of magnetic Josephson junctions; a readout circuit configured to read a magnetic state of the plurality of magnetic Josephson junctions; and addressing logic configured to select a subset of the plurality of magnetic Josephson junctions, having a parallel data interface configured to transfer parallel data from the subset of the plurality of magnetic Josephson junctions at a rate of at least 20 GHz.
2. The cryogenic memory array chip according to claim 1, wherein the plurality of magnetic Josephson junctions are serially biased.
3. The cryogenic memory array chip according to claim 1, wherein the cryogenic memory array chip comprises a 64-bit wide data interface.
4. The cryogenic memory array chip according to claim 1, configured to operate at 4 K.
5. The cryogenic memory array chip according to claim 1, comprising memory cells selected from the group consisting of magnetic Josephson junction memory cells, cryogenic orthogonal spin transfer memory cells, and cryogenic spin-Hall effect memory cells.
6. The cryogenic memory array chip according to claim 1, wherein the plurality of magnetic Josephson junctions comprises at least one of a plurality of superconducting ferromagnetic transistors and a plurality of superconducting nanowire devices.
7. The cryogenic memory array chip according to claim 1, wherein the cryogenic memory array chip comprises a first layer formed of a first material having a first inductance and a second layer formed of a second material having a second inductance, the first material having an inductance which is predominantly a kinetic inductance which does not couple with magnetic fields external to the first layer, and the second material having an inductance which is predominantly a magnetic inductance which interacts with magnetic fields external to the second layer, the first inductance being higher than the second inductance.
8. The cryogenic memory array chip according to claim 7, wherein the first material is niobium nitride and the second material is niobium.
9. The cryogenic memory array chip according to claim 1, in combination with: a carrier; a cryogenic digital processor chip on the carrier, configured for interchip communication with the cryogenic memory array chip at a rate of at least 20 GHz; and an electrical interface configured to communicate data between the cryogenic digital processor chip and the external room-temperature electronic system.
10. The cryogenic memory array chip according to claim 9, wherein the cryogenic digital processor chip comprises a serial data communication interface.
11. The cryogenic memory array chip according to claim 9, wherein the cryogenic memory array chip is reworkably attached to the carrier.
12. The cryogenic memory array chip according to claim 11, wherein the cryogenic digital processor chip is attached to the carrier with epoxy.
13. The cryogenic memory array chip according to claim 9, wherein the cryogenic digital processor chip comprises a shift register, a clock controller, an address buffer, an input buffer, an output buffer, and a time-to-digital converter.
14. The cryogenic memory array chip according to claim 9, wherein the cryogenic memory array chip communicates with the cryogenic digital processor chip through a parallel set of passive transmission lines, wherein multi-bit data from the cryogenic memory array chip is converted to a dual-rail form, with each bit and its inverted state propagating along respective parallel passive transmission lines to a first-in, first-out buffer released based on a recovered clock signal, wherein the dual rail form data are merged via a binary tree of Muller C-elements, to produce the recovered clock signal and the multi-bit data.
15. A method of storing and retrieving information, comprising: providing a cryogenic memory array chip configured to operate at below 10 K, comprising a plurality of magnetic Josephson junctions, a readout circuit configured to read a magnetic state of the plurality of magnetic Josephson junctions, and addressing logic configured to select a subset of the plurality of magnetic Josephson junctions, having a parallel data interface configured to transfer parallel data from the subset of the plurality of magnetic Josephson junctions at a rate of at least 20 GHz; receiving first data to be stored; addressing a first subset of the plurality of magnetic Josephson junctions; selectively altering a magnetic state of the first subset of the plurality of magnetic Josephson junctions in dependence on the first data to be stored; receiving second data to be stored; addressing a second subset of the plurality of magnetic Josephson junctions; selectively altering a magnetic state of the second subset of the plurality of magnetic Josephson junctions in dependence on the second data to be stored; addressing the first subset of the plurality of magnetic Josephson junctions; selectively reading the magnetic state of the first subset of the plurality of magnetic Josephson junctions with the readout circuitry corresponding to the first data; and presenting the first data at a readout port.
16. The method according to claim 15, wherein the cryogenic memory array chip comprises a first layer formed of a first material having a first inductance and a second layer formed of a second material having a second inductance, the first material having an inductance which is predominantly a kinetic inductance which does not couple with magnetic fields external to the first layer, and the second material having an inductance which is predominantly a magnetic inductance which interacts with magnetic fields external to the second layer, the first inductance being higher than the second inductance.
17. The method according to claim 15, wherein the cryogenic memory array chip is provided on a carrier with a cryogenic digital processor chip configured for interchip communication with the cryogenic memory array chip at a rate of at least 20 GHz, further comprising communicating data between the cryogenic digital processor chip and the external room-temperature electronic system through a thermal and electrical interface.
18. The method according to claim 17, further comprising: buffering an address corresponding to the selected subset of the plurality of magnetic Josephson junctions; buffering the first data to be stored; buffering the second data to be stored; buffering the first data at the readout port; serializing the first data at the readout port; and recovering a clock associated with the serialized first data.
19. A cryogenic memory array chip configured to operate at below 10 K, comprising: a first layer formed of a first material having a first inductance which is predominantly a kinetic inductance which does not couple with magnetic fields external to the first layer; a second layer formed of a second material having a second inductance which is predominantly a magnetic inductance which interacts with magnetic fields external to the second layer, the first inductance being higher than the second inductance; a plurality of magnetic Josephson junctions comprising serial bias circuitry formed of the first layer and the second layer, each magnetic Josephson junction having a data storage state corresponding to a magnetic state; a readout circuit configured to read the data storage state of the plurality of magnetic Josephson junctions; and addressing logic configured to select a subset of the plurality of magnetic Josephson junctions, and to transfer parallel data from the subset of the plurality of magnetic Josephson junctions at a rate of at least 20 GHz.
20. The cryogenic memory array chip according to claim 19, further comprising: a carrier supporting the cryogenic memory array chip; a cryogenic digital processor chip; an interchip communication bus configured to communicate data between the cryogenic digital processor chip and the cryogenic memory array chip; and an external communication port configured to communicate data between the cryogenic digital processor chip and an external room-temperature electronic system.
Description
BRIEF DESCRIPTION OF THE FIGURES
(1) FIG. 1 shows a conceptual diagram of a high-speed energy-efficient superconducting microprocessor, comprising an arithmetic logic unit (ALU) and register file, with a wave-pipelined datapath and timing scheme exhibiting skewed words that represent asynchronous propagation of carry bits through the processor.
(2) FIG. 2 shows the modular structure of the register file, with serial DC biasing of successive registers.
(3) FIG. 3 shows an example energy-efficient RSFQ circuit, with bias inductor and quantizing inductor that might be fabricated from a special high-inductance superconducting layer.
(4) FIG. 4 shows thin-film stacks for several types of MJJ, with the circuit symbol.
(5) FIGS. 5A and 5B show graphs representing a prototype SIsFS MJJ switched repeatedly back and forth between the zero-voltage and the finite voltage states using an external weak magnetic field.
(6) FIGS. 6A-6B show a 3-terminal MJJ structure, the superconductor-ferromagnet transistor (SFT), with a stack SFIFSIS, where each of the three superconductor layers is a separate terminal, together with its circuit symbol.
(7) FIGS. 7A-7C show a single MJJ with a ballistic SFQ readout (SFQ-MJJ), together with a schematic of a memory array.
(8) FIGS. 8A-8C show an alternative single-MJJ cell combined with a three-terminal SFT cell selector, and a memory array of such cells.
(9) FIGS. 9A-9B show a further alternative memory cell comprising a COST junction connected in parallel with an unshunted SQUID via an inductance in a configuration known as a Relaxation Oscillator (RO).
(10) FIGS. 10A and 10B show a superconductor NanoWire Device (NWD), comprising a narrow superconducting channel (width less than 100 nm) modulated by injection current from a superconducting gate (FIG. 10B), together with its symbol (FIG. 10A).
(11) FIGS. 11A and 11B show a hybrid CSHE-NWD cell, where the NWD is the selection element for Write operations, together with a proposed cell array architecture.
(12) FIGS. 12A and 12B show a COST-NWD memory cell and array.
(13) FIG. 13 shows a block diagram for a prototype test system, comprising a cryogenic (4 K) testbed multi-chip module (MCM), and a room-temperature FPGA-based memory test controller (MTC).
(14) FIG. 14 shows a functional block diagram of the key components of the cryogenic test control and acquisition chip (TCA), which communicates at high speed with the MRAM chip but at lower speed with the room-temperature controller.
(15) FIG. 15A shows a functional block diagram of bit-parallel chip-to-chip communication on an MCM from a transmitter (Tx) on the left via PTLs to a receiver (Rx) on the right, with clock recovery at the receiver.
(16) FIG. 15B shows the RSFQ circuit schematic for a Muller-C element.
(17) FIG. 16 shows a cross-sectional view of a patterned circuit that comprises both a standard SIS Josephson junction, and also an array of MJJ memory cells.
(18) FIGS. 17A-17D show the steps to fabricate a circuit with both an MJJ and an SFT.
(19) FIG. 18 shows a chip cross section for a circuit in which NWD drivers and COST-MRAM memory cells are fabricated on top of pre-fabricated planarized Josephson junction (RSFQ) circuits.
(20) FIGS. 19A-19B show the schematic circuit and functional behavior of an ERSFQ half-adder cell.
(21) FIG. 20 shows the block diagram of an energy-efficient ALU.
(22) FIGS. 21A and 21B show how a simple RSFQ circuit (21A) is modified to become an energy-efficient ERSFQ circuit (21B) with zero static power dissipation.
(23) FIGS. 22A and 22B show the circuit layout and block diagram of an ERSFQ-8-bit wave-pipelined adder.
(24) FIGS. 23A and 23B show a cross section of a superconducting multilayer process with a low inductance that is predominantly magnetic inductance (23A, left) and an alternative process with a high-inductance top layer that is predominantly kinetic inductance (23B, right).
(25) FIGS. 24A-24C show an MJJ structure with one magnetic layer (24A), an alternate MJJ structure with two magnetic layers (24B), and the electrical behavior of a corresponding junction in a magnetic field (24C).
(26) FIGS. 25A and 25B show the ballistic memory readout architecture for the SFQ-MJJ MRAM of FIG. 7 for a “0” and “1” state, respectively.
(27) FIGS. 26A and 26B show a schematic for a bit-line driver of a Write circuit for an MRAM array.
(28) FIGS. 27A and 27B show a schematic of a row of an ERSFQ address decoder for an MRAM array, and a layout of a full decoder.
(29) FIGS. 28A and 28B show an example of a cryogenic system on a cryocooler that may support a hybrid superconducting/magnetic memory array and digital processor: a system overview (28A, left) and a detail of the cryogenic stage, including active magnetic shielding (28B, right).
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
(30) I. Superconducting Energy-Efficient Wave-Pipelined Digital Processor
(31) FIG. 1 shows a conceptual diagram of the wave-pipelined datapath and timing scheme for a proposed high-speed energy-efficient superconducting microprocessor, comprising an arithmetic logic unit (ALU) and register file, with skewed words that represent asynchronous propagation of carry bits through the processor. This multibit ALU comprises a cascade of 1-bit ALUs, and one of its unique features is that both the instruction and the carry bit cascade asynchronously through the ALU. This architecture combines the timing advantage and modular scaling found in a ripple adder with the speed advantage of asynchronous circuitry.
(32) A block diagram of the architecture of the register file is shown in FIG. 2. The register file consists of an array of 128 registers, each with an identical modular structure, designed for a 64-bit word. Since the bias current for each register is the same, the current bias can be supplied in series from one register to the next, a scheme known as “current recycling”. The bias current for each register enters through a power line and drains to the ground plane, so this requires that the ground plane for a given register be connected through a via to the power line of the succeeding register. In addition to reducing the total bias current for the register file by a factor of 128, this scheme decouples the phases of each register's ERSFQ bias line, so that only the accessed register will dissipate power, thus also reducing the total power by a factor of 128. Note also that SFQ pulses may freely traverse from one register to the next, despite the offset in DC voltage.
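The bias-current saving from serial biasing can be illustrated with a short calculation. The sketch below is not taken from the disclosure: the per-register bias current is a hypothetical placeholder, and only the register count of 128 comes from the text.

```python
# Sketch: supply current for parallel versus serial ("current recycling") biasing.
N_REGISTERS = 128        # number of registers, from the text
I_PER_REGISTER_A = 0.1   # hypothetical bias current per register, in amperes (assumed)


def supply_current(n_registers: int, i_per_register: float, serial: bool) -> float:
    """Current the external supply must deliver to bias all registers."""
    if serial:
        # The same current passes through every register in turn.
        return i_per_register
    # Each register draws its own current from the supply.
    return n_registers * i_per_register


parallel_a = supply_current(N_REGISTERS, I_PER_REGISTER_A, serial=False)
serial_a = supply_current(N_REGISTERS, I_PER_REGISTER_A, serial=True)
reduction_factor = parallel_a / serial_a

print(f"parallel: {parallel_a:.1f} A, serial: {serial_a:.1f} A, "
      f"reduction: {reduction_factor:.0f}x")
```

The trade-off is that successive registers sit at offset DC potentials, which, as noted above, does not prevent SFQ pulses from passing between them.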
(33) The basic element of the ALU is an ERSFQ half-adder cell (see FIGS. 19A-19B). Here the addend bits A and B are added, together with the Carry bit. The output is triggered by the arrival of the Clock pulse, generating the Sum output at the bottom. A key feature of this cell is its asynchronous Carry signal, which is not latched to a clock signal, and is therefore produced as soon as both ‘1’ arguments arrive at the ALU. This property allows Carry signals to propagate in the form of a wave (wave-pipelining).
(34) A portion of the detailed block diagram of the 8-bit energy-efficient ALU is shown in FIG. 20, comprising repeated half-adder units (HA). The instruction select is implemented through a switch cell (Sw) that relays Sum and Carry signals from the first-stage half adder to the second stage, and provides for execution of such instructions as ADD, XOR, AND, and OR. In combination with selective inversion of the operands, this results in a broad instruction list. This novel ALU architecture exploits the advantages of local timing in an ERSFQ ALU by propagating an instruction code and a clock signal together from LSB to MSB of the operands.
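The two-stage half-adder arrangement can be sketched as a purely behavioral (logic-level) model. This is an illustration only: the mapping of each instruction onto the switch cell is an assumption, since the text does not give the switch logic, and the function names `half_adder`, `alu_bit`, and `alu` are hypothetical. Timing and SFQ pulse behavior are not modeled.

```python
# Behavioral sketch of a bit-sliced ALU built from two half-adder stages.
# The switch-cell routing per instruction is an assumed mapping, not the patent's circuit.

def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) for two input bits."""
    return a ^ b, a & b


def alu_bit(op: str, a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One-bit slice: first-stage half adder, switch, second-stage half adder."""
    s1, c1 = half_adder(a, b)
    if op == "ADD":
        # The switch relays the first-stage sum to the second stage.
        s2, c2 = half_adder(s1, carry_in)
        return s2, c1 | c2
    if op == "XOR":
        return s1, 0          # first-stage sum only, carry blocked
    if op == "AND":
        return c1, 0          # first-stage carry only
    if op == "OR":
        return s1 | c1, 0     # a|b equals (a^b)|(a&b)
    raise ValueError(f"unsupported instruction: {op}")


def alu(op: str, x: int, y: int, width: int = 8) -> tuple[int, int]:
    """Apply `op` bit by bit from LSB to MSB, rippling the carry upward."""
    result, carry = 0, 0
    for i in range(width):
        bit, carry = alu_bit(op, (x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry


print(alu("ADD", 200, 100))        # 8-bit result and carry-out
print(alu("XOR", 0b1100, 0b1010))
```

The loop order mirrors the LSB-to-MSB propagation of the instruction and carry described above. In the hardware each slice starts as soon as its inputs arrive, rather than waiting for a loop iteration.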
(35) This same “skewed word” approach (see FIG. 1) is used in reading from and writing to a register file as well, providing extremely high throughput. The short vertical dimension of the ALU provides a very low latency (˜80 ps in simulation), where latency is defined as the “turnaround time” between the start of loading the LSBs of the operands, and receiving an output LSB. The wave propagation time from LSB to MSB is simulated to be ˜400 ps for the 8-bit ALU, but this does not affect the performance of the wave-pipelined datapath.
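The reason the LSB-to-MSB propagation time does not limit throughput can be seen in a simple timing sketch. The model below makes two assumptions that are not in the text: that the skew between adjacent bits is uniform, and that the 44 GHz simulated clock rate reported for the 8-bit ALU applies to this datapath. The latency and propagation figures are the simulated values quoted above.

```python
# Sketch: output-bit arrival times in a skewed-word (wave-pipelined) datapath.
WIDTH = 8                # bits, from the text
LATENCY_PS = 80.0        # LSB turnaround time, from the text
LSB_TO_MSB_PS = 400.0    # wave propagation time across the word, from the text
CLOCK_GHZ = 44.0         # simulated clock rate for the 8-bit ALU (assumed to apply here)

PERIOD_PS = 1000.0 / CLOCK_GHZ
SKEW_PER_BIT_PS = LSB_TO_MSB_PS / (WIDTH - 1)   # assumption: uniform skew per bit


def output_time_ps(word: int, bit: int) -> float:
    """Time at which output bit `bit` of word number `word` appears."""
    return word * PERIOD_PS + LATENCY_PS + bit * SKEW_PER_BIT_PS


# A new word enters every clock period, before the previous wave reaches the MSB.
words_in_flight = LSB_TO_MSB_PS / PERIOD_PS
word_spacing_ps = output_time_ps(1, 0) - output_time_ps(0, 0)

print(f"clock period: {PERIOD_PS:.1f} ps")
print(f"spacing between successive words at any bit: {word_spacing_ps:.1f} ps")
print(f"words simultaneously in flight: {words_in_flight:.1f}")
```

Under these assumptions several words occupy the datapath at once, each skewed in time across its bits, and results emerge at the clock rate regardless of the total propagation time.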
(36) Both the throughput and the energy performance of this ALU are orders of magnitude superior to ALUs in other technologies. For example, an 8-bit version of the ALU based on current fabrication technology (not fully optimized) was simulated on a circuit level, and found to operate at a clock frequency of 44 GHz, giving a throughput of 3.5×10.sup.11 bit-ops/sec. The bias current drawn by a one-bit slice of this design is 50 I.sub.Cmin, where I.sub.Cmin is the critical current of the smallest Josephson junction in the design. Taking the switching energy to be I.sub.CΦ.sub.0 and using a minimum I.sub.Cmin=38 μA gives 2.5×10.sup.17 bit-ops/Joule as the energy performance of this ALU. By virtue of the modular architecture, the speed and energy per bit are independent of the word size, enabling scaling to 64 bits.
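The throughput and energy figures above can be reproduced with a few lines of arithmetic. The sketch assumes, as the text does, that each bit-operation dissipates the bias current of a one-bit slice multiplied by the flux quantum. The variable names are illustrative.

```python
# Sketch: reproduce the quoted throughput and energy-efficiency estimates.
PHI_0_WB = 2.067833848e-15   # magnetic flux quantum h/2e, in webers
WORD_BITS = 8                # ALU width, from the text
CLOCK_HZ = 44e9              # simulated clock frequency, from the text
I_CMIN_A = 38e-6             # smallest junction critical current, from the text
BIAS_IN_UNITS_OF_ICMIN = 50  # bias current per one-bit slice, from the text

throughput_bit_ops_per_s = WORD_BITS * CLOCK_HZ

# Assumption (following the text): energy per bit-op = slice bias current x flux quantum.
energy_per_bit_op_j = BIAS_IN_UNITS_OF_ICMIN * I_CMIN_A * PHI_0_WB
bit_ops_per_joule = 1.0 / energy_per_bit_op_j
power_w = throughput_bit_ops_per_s * energy_per_bit_op_j

print(f"throughput: {throughput_bit_ops_per_s:.2e} bit-ops/s")
print(f"energy per bit-op: {energy_per_bit_op_j:.2e} J")
print(f"efficiency: {bit_ops_per_joule:.2e} bit-ops/J")
print(f"implied dynamic power: {power_w:.2e} W")
```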
(37) FIGS. 21A-21B show how a simple RSFQ circuit (a unit of a Josephson transmission line or JTL) on the left transforms to an energy-efficient ERSFQ design on the right. The junction switching dynamics (with SFQ switching energy ˜I.sub.bΦ.sub.0) and dynamic power dissipation are the same in both cases. The only difference is that the static power dissipation P.sub.S in the bias line is eliminated by replacing the bias resistor R.sub.bias by a series combination of a Josephson junction (with critical current I.sub.c=I.sub.b) and an inductance L.sub.bias. This enables the bias voltage to be reduced by a factor ˜50, reducing the system power dissipation by a similar factor. Switching of the current-limiting bias junction will compensate for imbalance of average voltages across different bias terminals. Standard RSFQ cells which have already been developed can be modified to ERSFQ circuits by this simple resistor replacement.
(38) FIGS. 22A and 22B show the circuit layout and block diagram of a prototype ERSFQ 8-bit wave-pipelined adder. The adder comprises multiple identical half-adder modules of FIGS. 19A-19B (symbolized by HA in the schematic) connected in a tree structure as shown. The circuit contains ˜2000 Josephson junctions, and the prototype operated at a clock frequency of 20 GHz with a dissipation of 0.36 fJ per operation, demonstrating the viability of this wave-pipelined, low-power approach for larger and faster superconducting processors.
(39) II. High-Inductance Wiring Layer for Energy-Efficient RSFQ Circuits
(40) RSFQ electronics deals with the storage and transfer of magnetic single flux quanta (SFQ) with flux Φ.sub.0=2 mV-ps=2 mA-pH. A loop comprising two Josephson junctions and an inductor L can store a flux quantum if LI.sub.C˜Φ.sub.0, where I.sub.C is the critical current of the Josephson junctions. In transporting an SFQ from one portion of the circuit to another, it is critical that the SFQ not be trapped in unintended inductors, so that normally LI.sub.C<<Φ.sub.0. In contrast, some loops are designed as storage elements, in which case we want a quantizing inductance L.sub.q=Φ.sub.0/I.sub.C˜20 pH if I.sub.C˜0.1 mA.
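The quantizing condition L.sub.q=Φ.sub.0/I.sub.C can be checked numerically. The sketch below uses the flux quantum value h/2e. The function name and the second example critical current are illustrative choices, not values from the text.

```python
# Sketch: quantizing inductance L_q = Phi_0 / I_C for a storage loop.
PHI_0_WB = 2.067833848e-15   # magnetic flux quantum h/2e, in webers (= V*s = A*H)


def quantizing_inductance_h(i_c_a: float) -> float:
    """Inductance at which L * I_C equals one flux quantum."""
    return PHI_0_WB / i_c_a


l_q_h = quantizing_inductance_h(0.1e-3)       # I_C = 0.1 mA, as in the text
l_q_small_h = quantizing_inductance_h(38e-6)  # illustrative smaller junction

print(f"Phi_0 = {PHI_0_WB * 1e15:.2f} mV*ps = {PHI_0_WB * 1e15:.2f} mA*pH")
print(f"L_q at 0.1 mA: {l_q_h * 1e12:.1f} pH")
print(f"L_q at 38 uA:  {l_q_small_h * 1e12:.1f} pH")
```

Because L.sub.q scales inversely with I.sub.C, reducing the critical current to save energy requires proportionally larger storage inductors.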
(41) FIG. 23A (left) shows the cross-section of a typical superconducting connecting layer above a superconducting ground plane in an integrated circuit process, where we assume that both superconductors comprise Nb, which has the highest critical temperature (9.2 K) of any simple elemental metal, and operates well at 4-5 K. Each film may be t˜200 nm thick, and they are separated by an insulator (such as SiO.sub.2) of thickness d˜200 nm. The magnetic penetration depth λ of Nb is ˜100 nm; this is the thickness on the surface of a superconductor in which the currents flow. Consider the inductance of a parallel-plane structure of length ℓ and width w, where we assume that w>>d so that edge effects may be neglected. Then the inductance can be given as L=(ℓ/w)L.sub.s, where L.sub.s=μ.sub.0(λ.sub.1+λ.sub.2+d) is the sheet inductance or inductance per square of the line. Here λ.sub.1 is the penetration depth of the top superconductor and λ.sub.2 that of the bottom superconductor. The magnetic field produced by the current lies in the insulator, and penetrates into the superconductor within λ of the surface. The contribution μ.sub.0d to L.sub.s is purely magnetic inductance, while the contributions μ.sub.0λ corresponding to the superconducting films (much thicker than λ) are half magnetic and half kinetic inductance. In the example here, L.sub.s=0.5 pH/square, of which about 75% is magnetic inductance and 25% kinetic inductance. Taking I.sub.C˜0.1 mA, the quantizing inductance would be L.sub.q˜20 pH. Short lengths of the line will have L<<L.sub.q, so that it is easy to lay out non-quantizing loops. Quantizing loops, however, will tend to be long. In prior art superconducting integrated circuits, there may be two or more superconductor wiring layers, each separated by insulators of various thicknesses. Various combinations of these layers will lead to different values of L.sub.s, but they are all fairly small and predominantly magnetic.
(42) In contrast, consider the cross-section in FIG. 23B (right), where the top superconductor wiring layer now comprises a thin film with thickness t<<λ. In this limit, the sheet inductance due to this top superconductor is given by μ.sub.0λ.sup.2/t, virtually all of it kinetic, and the current flows uniformly within the film. The total sheet inductance is then L.sub.s=μ.sub.0(λ.sub.1.sup.2/t.sub.1+λ.sub.2+d). If this top film is made from NbN with λ.sub.1=500 nm (depending on deposition conditions) and t.sub.1=50 nm, then L.sub.s˜6.7 pH/square, of which about 95% is kinetic and 5% magnetic inductance. Furthermore, this thin top layer will also be able to carry a sufficiently large superconducting current at 4 K, since the superconducting critical temperature of NbN is somewhat higher than that of Nb (10-15 K, depending on deposition conditions). An inductor made with this layer will be ideal for designing a compact quantizing inductor L.sub.q, as well as a compact bias inductor L.sub.b, which may have a value that is comparable to or larger than this quantizing inductor.
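The sheet-inductance values for the two cross-sections can be reproduced from the formulas given in the text. The sketch below treats the thin-film term μ.sub.0λ.sup.2/t as entirely kinetic (the text says "virtually all") and splits each thick-film term μ.sub.0λ equally between magnetic and kinetic, as stated above. The function names and the 20 pH target are illustrative.

```python
# Sketch: sheet inductance of the standard (FIG. 23A) and high-inductance (FIG. 23B) stacks.
import math

MU_0 = 4e-7 * math.pi   # vacuum permeability, H/m (equivalently H per square per metre)


def sheet_inductance_thick(lam1_m: float, lam2_m: float, d_m: float) -> tuple[float, float]:
    """Both films much thicker than lambda. Returns (L_s in H/square, kinetic fraction)."""
    total = MU_0 * (lam1_m + lam2_m + d_m)
    kinetic = MU_0 * (lam1_m + lam2_m) / 2.0   # each mu_0*lambda term is half kinetic
    return total, kinetic / total


def sheet_inductance_thin_top(lam1_m: float, t1_m: float, lam2_m: float,
                              d_m: float) -> tuple[float, float]:
    """Top film much thinner than lambda_1. Returns (L_s in H/square, kinetic fraction)."""
    total = MU_0 * (lam1_m ** 2 / t1_m + lam2_m + d_m)
    # Assumption: the thin-film term is treated as fully kinetic.
    kinetic = MU_0 * (lam1_m ** 2 / t1_m + lam2_m / 2.0)
    return total, kinetic / total


ls_nb, frac_nb = sheet_inductance_thick(100e-9, 100e-9, 200e-9)
ls_nbn, frac_nbn = sheet_inductance_thin_top(500e-9, 50e-9, 100e-9, 200e-9)

L_Q_H = 20e-12   # quantizing inductance for I_C ~ 0.1 mA
for name, ls, frac in (("Nb/Nb", ls_nb, frac_nb), ("NbN/Nb", ls_nbn, frac_nbn)):
    print(f"{name}: L_s = {ls * 1e12:.2f} pH/sq, kinetic {frac:.0%}, "
          f"squares for 20 pH: {L_Q_H / ls:.1f}")
```

Under these assumptions a 20 pH quantizing inductor needs roughly 40 squares of the standard Nb line but only about 3 squares of the NbN line, which is what makes the high-inductance layer attractive for compact layouts.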
(43) A further advantage of the use of an inductor that is primarily kinetic inductance is that it will have substantially reduced magnetic mutual inductance with other lines and with external fields, as compared with a predominantly magnetic inductance of the same value. This is particularly important for energy-efficient RSFQ, where the bias current in a given line is set by an inductor (rather than by a resistor as with conventional RSFQ), and parasitic mutual inductance may alter the bias current.
(44) A further aspect of the availability of a high-inductance layer is that one may design a passive transmission line (PTL) with a higher characteristic impedance Z.sub.0 for the same dimensions. Since Z.sub.0=(L/C).sup.1/2, increasing L by a factor of 13 increases Z.sub.0 by a factor of about 3.6. This may offer additional flexibility in design of PTLs, which are used in energy-efficient RSFQ to transport signals over significant distances on chip with negligible dissipation. Further, one can take advantage of such a difference in Z.sub.0 to deliberately introduce a mismatch that prevents launching of a pulse on a PTL. For example, bias lines are essentially PTLs, but in conventional RSFQ, a bias resistor near the bias current insertion point acts to block the launching of an SFQ pulse onto the bias line (see FIG. 21A). In contrast, in energy-efficient RSFQ, a bias inductor is used instead of a bias resistor. If a compact bias inductor is also located near the bias current insertion point (see FIG. 21B), the impedance mismatch can also act to block the SFQ pulse from being launched onto the bias line. This may help to avoid possible crosstalk via bias line coupling.
(45) A further advantage to a high-inductance layer is that it may be used to construct other superconducting devices that may be integrated with RSFQ digital circuits. For example, a superconducting nanowire single photon detector (SNSPD, also called SSPD or SNAP) is typically constructed from a thin NbN layer with a very high sheet inductance. See D. Gupta, “Single photon counting hotspot detector with integrated RSFQ readout electronics,” IEEE Trans. Appl. Supercond., vol. 9, p. 4487 (1999), expressly incorporated herein by reference; see also U.S. Pat. Nos. 6,812,464; 7,049,593; 8,565,844; 2012/0077689; 2013/0143744, expressly incorporated herein by reference. Further, a similar NbN layer may be used to construct a three-terminal NanoWire Device (NWD; FIGS. 10A and 10B), which operates as a transistor and may be used as a cell selector for spintronic memory cells interfaced with RSFQ circuits (FIGS. 11A, 11B and 12A-12B). The availability of such a layer permits these and other essentially analog devices to be closely integrated with RSFQ digital circuits.
(46) III. Hybrid Superconducting-Magnetic Memories Based on Magnetic Josephson Junctions
(47) One preferred embodiment comprises a class of hybrid superconducting-magnetic memories based on magnetic Josephson junctions (MJJ) and superconductor-ferromagnetic transistors (SFT). This memory technology has been called “SPEED-MRAM”, for SuPerconducting Energy-Efficient Dense MRAM. For its Read and Write functions, SPEED-MRAM comprises memory cells that are integrated with eSFQ or ERSFQ energy-efficient peripheral circuitry. To fabricate SPEED-MRAM, a new fabrication process integrates MJJs, SFTs, and SFQ digital processor circuits and periphery circuits in the same fabrication cycle.
(48) SPEED-MRAM is dense, scalable, and operates at high speed. A memory cell consists of a single small MJJ, with optional cell selector, so that density scales with the microfabrication technology. There are no poorly scalable elements, such as SQUIDs. Furthermore, SPEED-MRAM is architecturally compatible with SFQ technology, since signal levels and impedances are similar. Finally, SPEED-MRAM is energy-efficient; the Read operation is performed with an SFQ pulse, and consumes energy only when ‘1’ is read out. A low Write energy is achieved by employing a magnetic junction barrier that is a soft magnetic material with a low coercivity. Periphery circuits are realized with energy-efficient SFQ logic.
(49) A preferred memory element in SPEED-MRAM is a magnetic Josephson junction (MJJ) comprising a vertical stack of superconducting, magnetic, and insulating layers (S, F, and I), such that there is a superconducting critical current I.sub.C that is ˜0.1-0.5 mA (or even smaller), and a normal-state junction resistance R.sub.n such that I.sub.CR.sub.n˜0.5 mV, similar to that of Josephson junctions in conventional RSFQ. Since Φ.sub.0=2 mV-ps, the switching time Φ.sub.0/(I.sub.CR.sub.n) is ˜4 ps. Preferred stacks are SIsFS, SIsFsFS, and SF.sub.1IF.sub.2S (see FIG. 4), where small s represents a very thin superconducting layer that is weakly superconducting due to the proximity of a magnetic layer. A preferred superconducting layer is Nb; a preferred I layer is a tunnel barrier Al.sub.2O.sub.3, which may be produced by oxidizing a thin layer of Al, and may be only ˜1-2 nm thick. The magnetic layer is preferably ferromagnetic, with one preferred composition (which is not unique) comprising magnetically soft dilute Pd.sub.0.99Fe.sub.0.01 alloy. The MJJ critical current I.sub.C can change reversibly due to the magnetic state of the F layer(s), which constitutes a memory cell. A non-destructive readout of such a cell is obtained using SFQ switching. The state of the magnetic layer can be rewritten using a somewhat larger current pulse.
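The switching-time estimate follows from dividing the flux quantum by the I.sub.CR.sub.n product. The sketch below reproduces it and also lists the normal-state resistance implied by I.sub.CR.sub.n˜0.5 mV across the quoted range of critical currents. The function names are illustrative.

```python
# Sketch: SFQ switching time and implied normal-state resistance of an MJJ.
PHI_0_WB = 2.067833848e-15   # magnetic flux quantum h/2e, in V*s
IC_RN_V = 0.5e-3             # characteristic voltage I_C * R_n, from the text


def switching_time_s(ic_rn_v: float) -> float:
    """Approximate SFQ pulse duration: Phi_0 / (I_C * R_n)."""
    return PHI_0_WB / ic_rn_v


def normal_resistance_ohm(i_c_a: float, ic_rn_v: float = IC_RN_V) -> float:
    """Normal-state resistance that gives the stated I_C * R_n product."""
    return ic_rn_v / i_c_a


t_switch_s = switching_time_s(IC_RN_V)
print(f"switching time: {t_switch_s * 1e12:.1f} ps")
for i_c in (0.1e-3, 0.5e-3):   # range of critical currents quoted in the text
    print(f"I_C = {i_c * 1e3:.1f} mA -> R_n = {normal_resistance_ohm(i_c):.1f} ohm")
```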
(50) Note that an SIsFS MJJ comprises a series combination of an SIs junction and an sFS junction, but the entire structure behaves as a single junction with a single value of I.sub.C. The magnetization of the F layer produces magnetic flux Φ which is preferably parallel to the plane of the junction, and modulates I.sub.C of the junction. FIGS. 5A and 5B show an experimental prototype comprising such an SIsFS junction where S and s are Nb, I is Al.sub.2O.sub.3, and F is Pd.sub.0.99Fe.sub.0.01, with data from T. Larkin et al., “Ferromagnetic Josephson Switching Device with High Characteristic Voltage”, Appl. Phys. Lett., vol. 100, 222601 (2012), expressly incorporated herein by reference. Here the V(I) curves are for an MJJ (open circles) and for a similar junction but without the F layer. The switching data on the right show the voltage for an MJJ which exhibits two critical currents I.sub.C0>I.sub.C1, depending on the magnetization of the F layer. The junction is biased at a current I between I.sub.C0 and I.sub.C1, so that if the MJJ has the higher value of I.sub.C, its voltage is zero (‘0’ state), while if the MJJ has the lower value of I.sub.C, its voltage is nonzero (‘1’ state). Therefore, a weak magnetic field pulse can switch the MJJ between the ‘0’ state and the ‘1’ state, repeatedly and reproducibly. Specifically, a positive field pulse switches the MJJ from the ‘0’ to the ‘1’ state, while a negative field pulse switches the MJJ from the ‘1’ state to the ‘0’ state. This junction is hysteretic, but it can be converted to a non-hysteretic junction more appropriate to RSFQ circuits, by shunting with a small resistor, as is known in the prior art.
(51) A detailed theory of the critical current of similar SIsFS structures was recently presented in Bakurskiy et al., “Theoretical model of superconducting spintronic SIsFS devices”, Appl. Phys. Lett., vol. 102, 192603 (2013); and in Vernik et al., “Magnetic Josephson junctions with superconducting interlayer for cryogenic memory”, IEEE Trans. Appl. Supercond., vol. 23, 1701208 (2013), expressly incorporated herein by reference. Other recent research (see L. Uspenskaya, et al., “Magnetic patterns and flux pinning in PdFe-Nb hybrid structures”, JETP Lett., vol. 97, p. 155 (2013), expressly incorporated herein by reference) has shown that the effective magnetization in the dilute ferromagnetic layer is controlled by the presence of Fe-rich Pd.sub.3Fe nanoclusters, which can be easily reordered by a weak magnetic field. This suggests possible scalability issues of SIsFS memory elements in submicron junctions. Further, the contribution to the net magnetic flux inside the junction becomes smaller with decreasing cross-sectional area of the junction. In order to maintain a flux ˜Φ.sub.0/2, the composition and thickness of the F layer may need to be changed in smaller junctions. For example, the Fe content in the dilute PdFe alloy may need to be increased, or alternatively, one could split the F layer into two layers separated by another s layer, creating an SIsFsFS stack. This memory layer progression is shown in FIG. 4. These two F sublayers could be either parallel or antiparallel in their magnetization, corresponding to different values of flux in the junction.
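The two-state behavior described above can be captured in a minimal behavioral model. This is a sketch only: the class name, the numerical critical currents, and the bias value are hypothetical, and the model ignores hysteresis, noise, and the magnitude of the field pulse.

```python
# Behavioral sketch of an MJJ memory cell with two critical currents I_C0 > I_C1.
from dataclasses import dataclass


@dataclass
class MagneticJosephsonJunction:
    i_c0_a: float      # higher critical current ('0' state)
    i_c1_a: float      # lower critical current ('1' state)
    state: int = 0     # stored bit, set by the F-layer magnetization

    def apply_field_pulse(self, polarity: int) -> None:
        """Positive pulse writes '1'; negative pulse writes '0'."""
        if polarity > 0:
            self.state = 1
        elif polarity < 0:
            self.state = 0

    def critical_current_a(self) -> float:
        return self.i_c0_a if self.state == 0 else self.i_c1_a

    def read(self, i_bias_a: float) -> int:
        """Return 1 if the junction develops a voltage at this bias, else 0."""
        if not self.i_c1_a < i_bias_a < self.i_c0_a:
            raise ValueError("bias must lie between I_C1 and I_C0")
        # Reading leaves the magnetic state unchanged (non-destructive).
        return 1 if i_bias_a > self.critical_current_a() else 0


# Hypothetical values chosen only for illustration.
cell = MagneticJosephsonJunction(i_c0_a=0.20e-3, i_c1_a=0.10e-3)
I_BIAS_A = 0.15e-3

cell.apply_field_pulse(+1)
first_read = cell.read(I_BIAS_A)
second_read = cell.read(I_BIAS_A)   # same result: the read did not alter the state
cell.apply_field_pulse(-1)
after_erase = cell.read(I_BIAS_A)
print(first_read, second_read, after_erase)
```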
(52) Another preferred embodiment of the MJJ is shown in FIGS. 24A-24C which incorporates two ferromagnetic layers F.sub.1 and F.sub.2 in a basic structure SF.sub.1IF.sub.2S. In some cases, a thin normal (N) layer (such as Cu) may also be introduced between F layers to decouple them.
(53) FIGS. 26A-26B also show the current-voltage characteristic of a prototype Nb/Ni/AlOx/Ni/Nb MJJ device, 10 μm square, at 4 K. The critical current is strongly modulated with a relatively weak magnetic field, due to the magnetization in the Ni layers. See Prokopenko, et al., “DC and RF measurements of superconducting-ferromagnetic multi-terminal devices”, Proc. IEEE 14.sup.th Int. Superconductive Electronics Conf. (2013), expressly incorporated herein by reference.
(54) The functioning of this MJJ embodiment is believed to be due to rotation of the magnetization of one F layer relative to the other. For example, the bottom F layer (F.sub.1 in FIGS. 24A and 24B) may have a fixed magnetization direction, which may be produced by applying a magnetic field of about 5 mT during layer deposition to establish an easy axis of magnetization parallel to the field. In contrast, the magnetization of the upper F layer (F.sub.2) may be able to rotate relative to that of F.sub.1. Note that an antiparallel arrangement will correspond to a smaller flux in the junction and hence a higher critical current, as compared with a parallel arrangement.
(55) This rotation of magnetization in one of two magnetic films is similar to the behavior of conventional magnetic spin valves. See, e.g., en.wikipedia.org/wiki/Spin_valve, expressly incorporated herein by reference. Spin valves typically incorporate an extra antiferromagnetic (AF) layer to pin the magnetization of an F layer using the exchange bias effect. An alternative strategy without an AF layer is preferred, whereby F.sub.1 is designed to have a higher coercive force than that of the free F.sub.2 layer. Hence for a magnetic field exceeding the coercive force of the F.sub.2 layer but less than that of F.sub.1, the former will switch, leaving the latter unaffected. For example, if a CuNi alloy is used for the F layers, a thin permalloy (Py) layer on the top CuNi layer may lead to a coupled film with reduced coercive field.
(56) A further preferred embodiment of an MJJ comprises a double-tunnel-junction structure that functions as a three-terminal superconducting device, with an injector junction that modulates the critical current of a Josephson junction. The critical current of a conventional Josephson junction may be modulated by an external magnetic field, but that inductive coupling may not be fully scalable to small submicron junctions. The SISFIFS device of FIGS. 6A-6B, also known as a Superconductor-Ferromagnet Transistor or SFT, provides scalable modulation with good input/output isolation. See Nevirkovets, “Hybrid superconductor-ferromagnet transistor-like device”, Supercond. Sci. Technol., vol. 24, 024009 (2011), and Prokopenko (2013), expressly incorporated herein by reference. The SFIFS junction represents the injector junction, whereby the introduction of the thin F layer substantially improves the isolation from the SIS acceptor junction. The injector junction may have zero critical current (if the thicknesses of the F layers are large enough), but the acceptor junction may be a standard Josephson junction, which may be non-hysteretic or hysteretic in its V(I) relation. If it is hysteretic, it may be made non-hysteretic using a resistive shunt as is known in the prior art. Very recently, Prokopenko et al (2013) showed 30 dBV input/output isolation and a gain of 1.25.
(57) FIGS. 6A-6B show the SFQ-MJJ memory cell and a cell array organization in model and schematic form, which may be based on SIsFS junctions. The entire memory cell comprises just a single MJJ. The cells are serially connected to form a bit array column. The key feature of this design is the column layout implemented as a microstrip passive transmission line (PTL) formed by the connected superconducting electrodes of the MJJs over a superconducting ground plane. FIG. 7B shows a perspective view of the layer structure of the SFQ-MJJ cell. This shows the Word Select Line (WL) and Bit Line-Write (BL-W) on top for clarity, although these might be underneath the junction in a real device. The WL is a current line controlled by a JJ-based current-loop line driver; see FIGS. 26A and 26B, described in more detail below. For the Write operation, BL-W and WL current lines intersect with current pulse shapes indicated in FIG. 7C.
(58) A key innovation of this preferred embodiment of an SFQ-MJJ memory cell is the ballistic SFQ readout (FIGS. 25A and 25B), in which interrogating SFQ pulses propagate along the bit column PTL. In the superconducting state, an MJJ is equivalent to a nonlinear inductor with a Josephson inductance L.sub.J˜(Φ.sub.0/2π)(I.sub.C.sup.2−I.sub.b.sup.2).sup.−1/2. Each readout column is a PTL comprising the distributed inductance of the junctions and their electrodes, together with the distributed capacitance between microstrips and the ground plane. For the word (row) selection, a WL current is applied to induce a reference magnetic field in the intersecting MJJs. This puts the selected MJJs into a state with two clearly distinguishable values of I.sub.C, depending on the MJJ magnetization state, while all other rows of MJJs remain in the high-I.sub.C state. If the I.sub.C of the MJJ is high (stored ‘0’), then the Read SFQ pulse will traverse the MJJ and will continue its propagation down the column PTL to the Sense Circuit on the bottom. If the I.sub.C of the MJJ is low (stored ‘1’), then the Read SFQ pulse will exceed the I.sub.C of the MJJ, causing the junction to switch, and the SFQ pulse will escape from the PTL. This is equivalent to the PTL temporarily opening, causing the propagating pulse to be destroyed. Simulations show that this process is quite robust, although it will create weak reflections and ripples at the PTL output and input, which can be easily discriminated by the Sense Circuit, which comprises a one-junction SFQ receiver. The Read process is entirely ballistic and is free of half-select problems; energy is dissipated only during reading out a ‘1’. The critical current of MJJs can be somewhat lower than typical for RSFQ circuits (˜100 μA), since the bit-error-rate (BER) of SFQ transmission is quite low. The critical current of the MJJs is projected to be I.sub.C˜10 μA or even lower, which makes the read energy of a ‘1’ E=Φ.sub.0I.sub.C˜10.sup.−20 J.
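The two expressions quoted above, the Josephson inductance and the read energy of a ‘1’, can be evaluated numerically. The following Python sketch uses the standard value of the flux quantum; the function names are illustrative, and the 10 μA critical current is the projected value from the text.

```python
import math

PHI_0 = 2.067833848e-15  # magnetic flux quantum h/(2e), in webers


def josephson_inductance(i_c: float, i_b: float) -> float:
    """L_J = (PHI_0 / 2*pi) * (i_c**2 - i_b**2) ** -0.5, in henries."""
    if abs(i_b) >= i_c:
        raise ValueError("bias current must be below the critical current")
    return PHI_0 / (2 * math.pi * math.sqrt(i_c**2 - i_b**2))


def read_energy(i_c: float) -> float:
    """Energy dissipated when a junction with critical current i_c switches (J)."""
    return PHI_0 * i_c


I_C = 10e-6  # projected MJJ critical current from the text, in amperes
energy_joules = read_energy(I_C)
inductance_henries = josephson_inductance(I_C, i_b=0.0)
```

For I.sub.C = 10 μA this gives a read energy of about 2×10.sup.−20 J, consistent with the order of magnitude stated above, and a zero-bias Josephson inductance of about 33 pH.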
(59) The cell area of an SFQ-MJJ is very small, less than 1 μm.sup.2. Even accounting for a larger pitch to avoid intercell crosstalk in an MRAM array, the resulting density should exceed 10.sup.7 bits/cm.sup.2. These superconducting PTLs should be practically free of loss and dispersion, but if necessary one could include periodic Josephson junction repeaters to regenerate the Read SFQ pulse. For example, one could include two Josephson junctions every 16 MJJs in the column. This would not substantially reduce the MRAM memory density.
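The density estimate can be checked with simple arithmetic. The sketch below is illustrative only: the 2 μm pitch is an assumed example value, and it assumes each repeater junction occupies the same footprint as one memory cell, which the text does not specify.

```python
UM2_PER_CM2 = 1e8  # square micrometers in one square centimeter


def bits_per_cm2(pitch_um: float, mjj_per_group: int = 16,
                 repeaters_per_group: int = 0) -> float:
    """Bit density for a square cell pitch, with optional repeater overhead."""
    sites_per_cm2 = UM2_PER_CM2 / pitch_um**2
    # Fraction of sites in a column that actually store data.
    storage_fraction = mjj_per_group / (mjj_per_group + repeaters_per_group)
    return sites_per_cm2 * storage_fraction


dense_limit = bits_per_cm2(1.0)                            # 1 um pitch, no repeaters
with_repeaters = bits_per_cm2(2.0, repeaters_per_group=2)  # assumed 2 um pitch
```

Under these assumptions, two repeaters per 16 MJJs reduce the density by about 11%, and a 2 μm pitch still yields roughly 2×10.sup.7 bits/cm.sup.2.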
(60) The line drivers for the BL-W bit lines are shown schematically in FIGS. 26A and 26B. These are designed using a new energy-efficient current steering technique. The current steering is accomplished using two SQUID self-resetting switches, steering DC current to either of two superconducting bit lines. All bit lines are connected serially and share the same DC bias, with energy dissipated per switching event ˜LI.sub.b.sup.2/2. The power dissipation occurs only during a switching event from ‘1’ to ‘0’ or ‘0’ to ‘1’, and power is not dissipated once the switching process is completed. This approach results in substantial power savings compared to prior-art SQUID stack drivers, which dissipated power while the state was ‘1’.
(61) Another important RSFQ periphery circuit for the MRAM is an address decoder, shown in FIGS. 27A and 27B, which shows the detailed circuit schematic of a single row in FIG. 27A on the left, and the layout of a prototype 4-bit decoder in FIG. 27B on the right. This N-to-2.sup.N decoder was designed using energy-efficient ERSFQ logic, with special attention towards reduction of its circuit complexity and layout area. The required bit decoding function is achieved with only 3 junctions per bit line. The layout of the 4-bit decoder made use of Gray-code addressing, and used only 140 aJ of power. Both power and area are expected to be reduced further in a fully optimized process.
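The logical function of an N-to-2.sup.N decoder with Gray-code addressing can be sketched as follows. This is a behavioral illustration in Python, not a description of the ERSFQ circuit; the function names are hypothetical, and the interpretation of Gray-code addressing used here (consecutive rows have addresses that differ in exactly one bit) is an assumption.

```python
def binary_to_gray(n: int) -> int:
    """Convert a row index to its reflected-binary Gray code."""
    return n ^ (n >> 1)


def gray_to_binary(g: int) -> int:
    """Invert binary_to_gray by folding in successively shifted copies."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def decode_row(gray_address: int, n_bits: int) -> list[int]:
    """Return a one-hot list of 2**n_bits word-line selects."""
    if not 0 <= gray_address < 2**n_bits:
        raise ValueError("address out of range")
    row = gray_to_binary(gray_address)
    return [1 if i == row else 0 for i in range(2**n_bits)]


# Addresses of the 16 rows of a 4-bit decoder, in row order.
gray_codes = [binary_to_gray(i) for i in range(16)]
```

Stepping from one row to the next changes a single address bit, which is the usual motivation for Gray coding.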
(62) FIGS. 8A-8C show an alternative MRAM memory architecture that uses SFT-MJJs, where the SFT acts as a cell selector. The SFT-MJJ cell is formed by a single MJJ and an SFT cell selector connected in parallel (see FIG. 8A). These cells are serially connected to form an array column. As the SFT has shown excellent I/O isolation, this cell selector may be functionally similar to an FET in conventional room-temperature MRAM cells. FIG. 8B shows the layer structure of an SFT-MJJ memory cell, in which both devices are fabricated in the same in-situ process and arranged vertically. (Further details on the fabrication are given below.) It is important to note that the SFT-MJJ is not a SQUID, in that the loop inductance is very small, so that the two branches of the Josephson junction are essentially in phase. When the Read current is applied to the BL-Read line, the current splits and distributes in each cell in accordance with the I.sub.C of each branch. Therefore, any reduction in I.sub.C of one branch will redistribute some current to the other branch.
(63) During the Read process, a WL-Read current is applied to the SFT injector in selected Word cells. This action suppresses I.sub.C of the SFT acceptor junction, and increases the BL current portion flowing through the MJJ branch. This current increase is designed so as to trigger the MJJ into a resistive mode if ‘1’ is stored (low I.sub.C of the MJJ), or it will stay in the superconducting state if ‘0’ is stored (high I.sub.C of the MJJ). One can read the corresponding voltages at the top of the BL-Read line using a simple voltage sense JJ circuit. Simulations show that the optimum ratio between the nominal I.sub.C of the MJJ and SFT acceptor is 5:1. This leads to 50% modulation of the I.sub.C of the MJJ, which in turn leads to ±30% margins in the BL-Read current. The voltage across the MJJ will not leak to other word cells (half-selected), nor to any other columns due to the isolating properties of the SFT. The line drivers are identical to those described above for the SFQ-MJJ cell arrays. For the Write operation, intersecting BL-W and WL current lines with current pulse shapes are used, as shown in FIG. 7C.
(64) In this current Readout mode, the energy consumed is somewhat larger than that of the SFQ-MJJ cells using the ballistic SFQ readout, by a factor ˜10, but still quite small. The Write energy is essentially the same as for the SFQ-MJJ cells. The cell area for the SFT-MJJ cell will be somewhat larger than that of the SFQ-MJJ cell, if they are fabricated side-by-side as shown in FIG. 7B. However, an alternative fabrication may allow them to be stacked vertically, yielding a very similar bit density ˜10.sup.7 bits/cm.sup.2.
(65) IV. Superconducting Interface Circuits for Spintronic Memory Cells
(66) As an alternative embodiment to MJJ-based memory cells described above, one can use spintronic MRAMs (based on electron spin transfer in magnetic materials) that are specially designed to operate at cryogenic temperatures of 4 K and be compatible with superconducting interface circuits. Preferred embodiments are cryogenic implementations of orthogonal spin transfer (OST) and spin-Hall effect (SHE), and are referred to as COST and CSHE.
(67) FIGS. 9A-9B show a memory cell that comprises a COST junction (called OSTJ in FIG. 9A) connected in parallel with an unshunted SQUID (with hysteretic Josephson junctions) and an inductance L. When the current from the word line selects the corresponding SQUID, its unshunted JJs switch to the voltage state, forcing the bit line current to flow through the OSTJ. By changing the polarity of the bit line current, we can magnetize (Write ‘1’) or demagnetize (Write ‘0’) the OSTJ. In order to read out, we apply approximately half of the Write current to the bit line and excite the SQUID with a short pulse through the Word line. A SQUID shunted with an R-L branch is called a Relaxation Oscillator SQUID (RO-SQUID). See U.S. Pat. No. 5,406,201, expressly incorporated herein by reference. At a value of OSTJ resistance R close to that for critical damping of the SQUID (with McCumber-Stewart parameter β.sub.C˜1), the voltage across the SQUID will be either in the shape of an oscillatory relaxation pulse, or a continuous DC offset, depending on the values of R and L. Because of resonance conditions, even a small increase in resistance R stops the SQUID relaxation. This phenomenon can be applied to readout of the memory cell, without any other circuitry. The relatively small resistance of an OSTJ makes this readout quite feasible. Read and Write currents are summarized in Table I below. A drawback to this RO-SQUID readout is the relatively large area associated with the SQUID and inductor, which may be as large as 100 μm.sup.2.
(68) TABLE I. Current Parameters for COST cell with RO-SQUID
             Write ‘0’       Write ‘1’       Read
Bit Line     −I.sub.write    +I.sub.write    ~I.sub.write/2
Word Line    I.sub.select    I.sub.select    I.sub.select (short pulse)
(69) A much more compact superconducting interface circuit for COST and CSHE cells than the RO-SQUID is a three-terminal nanowire device (NWD), illustrated in FIG. 9A. The scale is deep submicron, with a typical channel width ˜100 nm or less. An NWD is functionally similar to a traditional FET in semiconductor technology, although it exploits a very different physical phenomenon to achieve switching. It is fabricated in a 2D geometry from a single thin film of superconducting material, typically an ultrathin film of NbN that is highly resistive in its normal state. The three-terminal device is separated into two distinct regions: the gate and the channel. Similar to a non-inverting transistor amplifier, when a logical LOW is fed into the gate input, the channel remains superconducting, and when a logical HIGH is fed into the gate, the channel becomes highly resistive (typically >2 kΩ). Unlike an FET, however, the NWD switching action is controlled by modulating the gate and channel between the superconducting and resistive states. The resistive transition in the channel is induced by locally exceeding the critical current density in the channel, causing current that would otherwise freely drain through the channel to be diverted to the output.
(70) Tests of a prototype device have shown operation for frequencies>100 MHz, with an output impedance of 100Ω, and given the design similarity to superconducting nanowire single-photon detectors (SNSPDs) mentioned above, the device should be capable of approaching at least 1 GHz. Further, previous work on SNSPDs has shown that the device jitter is less than 40 ps, suggesting similar jitter performance for an NWD. This prototype NWD was capable of driving devices with impedances between 10Ω and 10 kΩ, taking a 10 μA signal into the gate and outputting 40-80 μA, depending on the output impedance.
(71) Integration of nanowire superconducting logic will expand the domain of RSFQ, particularly in the area of memories. The device's ability to drive high output impedances will be of particular value to RSFQ integration. NWDs are used here as high-impedance line drivers for connecting RSFQ digital circuits and spintronic memories. Their large current gain may also be used as a way to generate SFQ fanout pulses in RSFQ circuits. The superconducting layer for NWDs may be integrated into a standard RSFQ process, as described below. The same superconducting layer may also function as a high-inductance layer for RSFQ circuits.
(72) Spin-Hall-effect (SHE) memories are being developed for room-temperature operations, see U.S. Pat. No. 7,839,675; 2014/0001524; also WO2014/025838, all expressly incorporated herein by reference. The present application uses versions of these memory cells operating at cryogenic temperatures, known as cryogenic SHE or CSHE.
(73) FIG. 10 shows a symbol and simplified block diagram of memory element based on CSHE, with an NWD-driven select line. The CSHE has low write resistance and exhibits high magnetoresistance. The latter prevents the use of JJs for readout, so an alternative is a “cross-talk” readout scheme described below. The CSHE is a three-terminal device that allows one to decouple Read and Write operations. The NWD may be used as a selection element for Write. The Read operation requires a separate grid of impedance-matched lines for transmitting voltage pulses along a row, while sensing their responses along all columns, thus providing word-parallel memory readout. All of these lines are superconducting passive transmission lines, assuring lossless, dispersionless transmission of pulses, and enabling large memory arrays with low power dissipation.
(74) A similar NWD device may be used as the driver for a COST memory cell, which is a two-terminal device as shown in FIG. 12A. This shows the design of an MRAM memory cell and array based on a combination of COST and NWD devices, together with superconducting read and write lines. The NWD acts as a cell-selecting device functionally similar to an FET in a typical room-temperature spin-torque transfer (STT) MRAM cell. See, for example, U.S. Pat. Nos. 7,170,778; 8,611,117; 8,116,122; 2014/0035617; 2014/0015074, expressly incorporated herein by reference. Once the NWD switches from superconducting to resistive upon activation of the Word Line (WL) Select current, it redirects Read or Write currents to the COST element. Once the WL current is turned off, the NWD selector returns to the superconducting state. The power is dissipated only at the selected cells during Read or Write operations. Since the NWD has a significant power gain, only a very small current is needed to activate the selected NWD. Furthermore, an NWD can be designed with an output impedance that closely matches an optimized COST cell, although it is much higher than the typical impedance for RSFQ. This corresponds to a very high magnetoresistance (MR) close to 100% or above, which in turn leads to a robust memory cell design with high parameter margins. The NWD/COST memory cell circuit area is defined by the COST pillar and the NWD. It is straightforward to integrate both of these side by side within a 2 μm×2 μm area, sufficient to achieve a memory density >10.sup.7 bits/cm.sup.2. It is possible to reduce the cell size even further by fabricating the NWD selector under the COST pillar.
(75) FIG. 12B shows the simplified block diagram of an NWD-COST MRAM cell array. In order to match the higher impedance of the NWD, it is preferable to use line drivers based on similar NWDs biased with AC current. The NWD drivers will redirect the bias current into Word and Bit lines once an SFQ pulse arrives from the RSFQ periphery circuits. The NWD returns to the OFF state once the bias current is reduced to zero. The Read driver can be constructed as a single NWD. For the Write driver, one can use the same driver switched ON for a longer period, or having a larger current amplitude. For a bipolar Write driver, one can use a differential push-pull scheme. The main challenge in the design of NWD-COST MRAM will be to optimize the NWD to respond to an SFQ pulse input. This will be done by designing an SFQ/NWD signal converter.
(76) V. Cryogenic Multi-Chip Module (MCM) for Hybrid Technology Computing System
(77) In order to communicate between a cryogenic high-speed processor or memory array on the one hand, and a room-temperature system controller on the other, one needs to address an interface problem of sending multiple N-bit words (where N may be 64 bits for an advanced processor), addresses, and control signals between the room-temperature and cryogenic systems. FIG. 13 shows a block diagram of a system to test a prototype superconducting MRAM chip. This requires several key technologies: cryogenic high-speed multi-chip modules (MCMs), cryocoolers and cryogenic system integration, superconducting and semiconducting circuits for multi-rate data and clock operation, interfacing between hybrid electronic technologies, and high-speed data processing on FPGAs. These are many of the same technologies that will be needed to develop a hybrid-technology superconducting supercomputer.
(78) FIG. 13 shows a block diagram of a system for testing high-speed functional operation of a 64×64 bit MRAM chip under test (CUT). The system comprises a cryogenic Testbed MCM (comprising the MRAM chip and a Test Control and Acquisition chip, TCA) linked to a room-temperature FPGA-based memory test controller (MTC). The communications between the MRAM and the TCA on the MCM comprise 64 parallel bits at high speeds (20 GHz or above), while the communications between the TCA and the MTC are at much lower speeds and consist mostly of serialized data.
(79) The intention here is to test the performance and yield of multiple MRAM chips, on the same MCM with the same TCA. This requires the use of a reworkable MCM bonding technology for cryogenic chips with multi-GHz signals. See U.S. Pat. No. 8,159,825, expressly incorporated herein by reference. This allows one to successively test multiple MRAM chips by dismounting the tested memory chip without damaging the contact pads of the Testbed MCM. The TCA chip will be mounted using permanent bonding epoxy, as it will not need to be changed.
(80) The FPGA-based MTC is programmed to generate pseudorandom 64-b words and send them to specific addresses in the 64-word MRAM array, and later to retrieve the same words and determine whether there are any bit errors. In more detail, the MTC comprises an algorithm-based pattern generator (to generate the words and the addresses), a verification module (to check for bit errors), and a control block that provides an interface to an external control computer for test summary and evaluation.
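The write-then-verify procedure described above can be sketched in software. The following Python model is a hypothetical stand-in for the FPGA logic and the chip under test: the class `MemoryModel`, its `flip_masks` fault-injection parameter, and the function `run_memory_test` are all illustrative names that do not appear in the original design.

```python
import random

WORD_BITS = 64
NUM_WORDS = 64
WORD_MASK = (1 << WORD_BITS) - 1


class MemoryModel:
    """Hypothetical stand-in for the 64x64 MRAM chip under test."""

    def __init__(self, flip_masks=None):
        self.cells = [0] * NUM_WORDS
        # Optional fault injection: address -> mask of bits inverted on read.
        self.flip_masks = dict(flip_masks or {})

    def write(self, address: int, word: int) -> None:
        self.cells[address] = word & WORD_MASK

    def read(self, address: int) -> int:
        return self.cells[address] ^ self.flip_masks.get(address, 0)


def run_memory_test(memory: MemoryModel, seed: int = 0) -> int:
    """Write pseudorandom words to every address, read back, count bit errors."""
    rng = random.Random(seed)            # pattern generator
    addresses = list(range(NUM_WORDS))
    rng.shuffle(addresses)               # visit the addresses in random order
    expected = {a: rng.getrandbits(WORD_BITS) for a in addresses}
    for address, word in expected.items():
        memory.write(address, word)
    # Verification: XOR exposes mismatched bits, which are then counted.
    return sum(bin(word ^ memory.read(address)).count("1")
               for address, word in expected.items())
```

A fault-free model returns zero errors, while `MemoryModel({5: 0b101})` reports two bit errors, one for each bit inverted at address 5.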
(81) The TCA chip (with block diagram shown in FIG. 14) communicates 64-bit data words and addresses serially at low speeds (MHz to 1 GHz) with the MTC module, and communicates the same data in parallel at high speeds (tens of GHz) to/from the MRAM chip. The TCA chip comprises:
High-frequency (HF) clock controller: An SFQ device that produces 8 high-speed SFQ pulses for one test cycle at the trigger from the MTC module.
Input data buffer: A latch-based buffer capable of storing eight 64-bit words. The data are serially loaded via a deserializer at low speed. At the signal from the HF clock controller, all 8 words of the test data are sent to the chip under test.
Address buffer: A latch-based buffer capable of storing eight 6-bit addresses and a 1-bit control signal (read/write). As with the input data buffer, it has a serial interface to the MTC module.
Output data buffer: A latch-based buffer capable of storing eight 64-bit words. The 8-word data block read from the memory chip under test is recorded at high speed and then serially uploaded to the MTC module via a serializer at low speed.
Time-to-digital converter: An RSFQ TDC circuit (see U.S. Pat. No. 6,653,952, expressly incorporated herein by reference) for measurement of the MRAM access time (5 ps time resolution).
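The serial/parallel conversion performed by the input and output data buffers can be illustrated with a short Python sketch. The bit ordering (most significant bit first, words in sequence) is an assumption made for illustration; the text does not specify the serial format, and the function names are hypothetical.

```python
WORD_BITS = 64
BLOCK_WORDS = 8


def serialize(words: list[int]) -> list[int]:
    """Flatten a block of words into a bit stream, MSB first (assumed order)."""
    bits = []
    for word in words:
        for i in range(WORD_BITS - 1, -1, -1):
            bits.append((word >> i) & 1)
    return bits


def deserialize(bits: list[int]) -> list[int]:
    """Rebuild eight 64-bit words from a 512-bit serial stream."""
    if len(bits) != WORD_BITS * BLOCK_WORDS:
        raise ValueError("expected exactly one 8-word block")
    words = []
    for k in range(BLOCK_WORDS):
        word = 0
        for bit in bits[k * WORD_BITS:(k + 1) * WORD_BITS]:
            word = (word << 1) | bit
        words.append(word)
    return words


block = [0x0123456789ABCDEF + k for k in range(BLOCK_WORDS)]  # arbitrary test data
stream = serialize(block)
```

One test cycle then corresponds to loading a 512-bit stream at low speed, releasing it as eight parallel words at high speed, and uploading the read-back block the same way in reverse.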
(82) This test setup will provide flexibility in MRAM testing, allowing test programs to investigate such things as critical test patterns and pattern sensitive faults. In general, there are three classes of errors: bit cell soft errors, hard errors, and transmission errors. Since a cryogenic memory system cannot be tested without the interface link, it is very likely that transmission errors, especially at high data rates, are inseparable from other errors in the system. This system will also permit direct measurement of all memory performance parameters such as cycle time, access time, and access power.
(83) FIGS. 28A and 28B show a configuration of a recent cryogenic test system (for superconducting high-speed digital receiver systems) that may provide a similar cryogenic package to the proposed superconducting MRAM test system. See, e.g., D. Gupta et al., “Modular Multi-function Digital-RF Receiver Systems,” IEEE Trans. Appl. Supercond., vol. 21, p. 883 (2011), expressly incorporated herein by reference. The illustrated system was built around a Sumitomo two-stage cryocooler, with a 4 K cold stage and a 50 K intermediate temperature stage. The cryogenic system may use a combination of active and passive magnetic shielding of the MRAM chips and RSFQ circuits. See, e.g., Y. Polyakov, “3D active demagnetization of cold magnetic shields”, IEEE Trans. Appl. Supercond., vol. 21, p. 724 (2011), expressly incorporated herein by reference.
(84) Proper high-speed testing of MRAM chips requires data exchange at the level of 64-bit words at full speeds, which may ultimately be as fast as 100 GHz. In general, bit errors of all types increase at high frequencies. RSFQ circuits are characterized by SFQ voltage pulses, with integrated voltage of 2 mV-ps, typically corresponding to a signal ˜1 mV high with a pulsewidth of 2 ps. These pulses pass between chips on an MCM, using passive microstrip transmission lines (PTLs), over distances of up to 10 cm or more. This is especially challenging when a parallel word of 64 bits is sent simultaneously. It is virtually impossible to maintain fully synchronous signals over these distances.
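The figures in the preceding paragraph can be checked with a short calculation. The Python sketch below computes the SFQ pulse area from the flux quantum and estimates how many clock periods elapse while a pulse crosses a 10 cm line. The rectangular pulse shape and the propagation velocity of one third of the speed of light are assumptions made only for illustration; the actual PTL velocity depends on the line geometry and is not given in the text.

```python
PLANCK_H = 6.62607015e-34            # J*s (exact SI value)
ELEMENTARY_CHARGE = 1.602176634e-19  # C (exact SI value)
SPEED_OF_LIGHT = 299_792_458.0       # m/s
PHI_0 = PLANCK_H / (2 * ELEMENTARY_CHARGE)  # flux quantum, in V*s


def sfq_pulse_area_mv_ps() -> float:
    """Time-integrated voltage of one SFQ pulse, in mV*ps."""
    return PHI_0 * 1e3 * 1e12


def pulse_height_mv(width_ps: float) -> float:
    """Height of an idealized rectangular pulse of the given width."""
    return sfq_pulse_area_mv_ps() / width_ps


def clock_periods_in_flight(length_cm: float, clock_ghz: float,
                            velocity_fraction: float) -> float:
    """Number of clock periods that elapse during one line traversal."""
    flight_time_ps = (length_cm * 1e-2) / (velocity_fraction * SPEED_OF_LIGHT) * 1e12
    return flight_time_ps / (1e3 / clock_ghz)


periods = clock_periods_in_flight(10.0, 100.0, velocity_fraction=1 / 3)  # assumed velocity
```

Under these assumptions a 10 cm line holds about 100 clock periods at 100 GHz, so a length mismatch of even a fraction of a percent between bit lines corresponds to a sizable part of a clock period.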
(85) FIG. 15A presents a preferred embodiment of a method for clock recovery when a parallel set of SFQ pulses is sent across PTLs from one chip to another. At the transmit chip on the left, each of the 64 bits has a clocked destructive memory cell, a DFFC, a standard RSFQ cell which is a D-flip-flop with complementary outputs (see pavel.physics.sunysb.edu/RSFQ/Lib/dffc.html). The DFFC has one data input, a clock input, and two outputs, the regular (non-inverting) output (top) and the inverting output (bottom). If the data stored is a ‘1’, the DFFC generates an SFQ pulse from its non-inverting output when triggered. If the data stored is a ‘0’, the DFFC generates an SFQ pulse from its inverting output when triggered. This lends itself naturally to dual rail data propagation, where each DFFC always sends an SFQ pulse on one of its two output lines (never both), regardless of the data. At the receive end, the non-inverting output lines are sent to FIFO (first-in, first-out) memory buffers. (See, e.g., Herr & Bunyk, “Implementation and application of first-in, first-out buffers”, IEEE Trans. Appl. Supercond., vol. 13, p. 563, 2002, expressly incorporated herein by reference.) Further, the 64 bit signals from both ‘0’ and ‘1’ lines are sent to a tree of Muller C-elements (the element with a C, having a schematic shown in FIG. 15B). The C-element, also known as a confluence buffer, is another standard RSFQ cell (pavel.physics.sunysb.edu/RSFQ/Lib/c.html, expressly incorporated herein by reference) which in this case acts essentially as an asynchronous AND. The final root of the tree generates the new clock which triggers the FIFO buffers, and releases the data to the receiving circuit. This approach ensures that if there is some dispersion in bit arrival, the latching clock is not released until all bits have successfully arrived.
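The timing behavior of this clock-recovery scheme can be modeled in a few lines of Python. The sketch is behavioral only: it assumes ideal, zero-delay C-elements whose output fires when the later of the two inputs arrives, and it assumes a pairwise reduction tree. The function names and the example arrival times are hypothetical.

```python
def dual_rail_encode(word: int, n_bits: int) -> list[tuple[bool, bool]]:
    """Per bit (LSB first): (pulse on the '1' rail, pulse on the '0' rail)."""
    return [(((word >> i) & 1) == 1, ((word >> i) & 1) == 0)
            for i in range(n_bits)]


def c_element(t_a: float, t_b: float) -> float:
    """Ideal C-element: the output fires once both inputs have arrived."""
    return max(t_a, t_b)


def recovered_clock_time(arrival_times: list[float]) -> float:
    """Reduce per-bit arrival times through a pairwise C-element tree."""
    if not arrival_times:
        raise ValueError("at least one bit line is required")
    level = list(arrival_times)
    while len(level) > 1:
        next_level = [c_element(level[i], level[i + 1])
                      for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:               # an odd line passes up unchanged
            next_level.append(level[-1])
        level = next_level
    return level[0]


rails = dual_rail_encode(0b101, 3)
clock_time = recovered_clock_time([3.0, 1.0, 7.5, 2.0])  # assumed arrival times, ps
```

Because every bit produces exactly one pulse regardless of its value, each bit line contributes an arrival time, and the root of the tree fires at the latest arrival (7.5 ps in this example).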
(86) A rapid train of SFQ pulses may maintain its integrity when propagated on lossless superconducting lines at 4K, but these pulses must be substantially amplified to avoid bit errors when propagated on conventional lines at room temperature. This is necessary, for example, in the data sent from the TCA to the MTC. One preferred approach is to provide a cascade of broadband semiconductor amplifiers sending signal on low-loss transmission lines, taking care not to introduce significant noise or heat into the cryogenic system. These transmission lines may comprise high-temperature superconducting electrodes over the colder parts of the data path to room temperature. An alternative preferred approach is to switch to the optical domain at a convenient point, and transmit the signal further via infrared pulses on low-loss optical fibers. Optical signals are well known for the ability to multiplex many signals on the same optical fiber without loss or crosstalk. Optical fibers are also quite compatible with cryogenics, and provide high data throughput with very little heating. Semiconductor laser diodes (such as VCSELs) may be the source of such electro-optical transducers, and fast semiconductor photodiodes may be optoelectronic receivers that convert optical signals back to electrical pulses.
(87) VI. Integrated Circuit Process with Both Superconducting Circuits and MRAM Cells
(88) To manufacture hybrid superconducting/MRAM circuits, it is essential to combine the integrated circuit processes for both technologies. This builds on the superconducting IC foundry previously developed at Hypres for Nb-based circuits with a complexity ˜10 k Josephson junctions per 1 cm.sup.2 chip. Recently, Hypres developed a fabrication process with 6 superconducting layers and planarization using chemical-mechanical polishing (CMP), and adopted a CALDERA process for performing pattern-independent planarization. See U.S. Pat. Nos. 8,301,214; 8,473,818; 8,383,426; 2011/0089405; all expressly incorporated herein by reference. The process involves one CMP step per layer, planarizing the layer as well as the via that connects it to the next layer. The process is integrated with the previous standard process by adding the new layers below the ground plane, thereby enabling extension of the number of layers to 4+n, where n is the number of additional planarized layers. The fact that there is only one CMP step per layer makes the process ˜20% faster per layer to implement, and integration by extending the number of layers has led us to name the process RIPPLE (Rapid Integration of Planarized Process for Layer Extension). See U.S. Provisional Patent application 61/887,919, “Method for increasing the integration level of superconducting electronic circuits, and a resulting circuit”, expressly incorporated herein by reference. The present RIPPLE-2 process with 6 superconducting layers is being extended to a RIPPLE-4 process with 8 superconducting layers, followed by a RIPPLE-6 process with 10 superconducting layers.
(89) In one preferred embodiment, the MJJ/SFT fabrication can be integrated with one of these RIPPLE processes. In order to fabricate MJJ and SFT devices, an existing deposition module with four 4″ magnetron sputtering cathodes is fitted with two types of ferromagnetic materials: a PdFe alloy (99% Pd/1% Fe) and Permalloy (80.2% Ni/14.7% Fe/4.6% Mo/0.5% Mn). The magnets on the 4″ cathodes are upgraded to high-strength magnets in order to enable sputtering of ferromagnetic materials.
(90) FIG. 16 shows a cross-sectional view of a process that integrates existing planarized superconducting layers with MJJ cells, with additional superconducting wiring layers on top. In order to simplify the fabrication process, it is important to make the MJJ (SFIFS) and SFT (SFIFSIS) multilayer structures in the same deposition run. This can be done by depositing a stack of an MJJ on top of an SFT, i.e., depositing an SFIFSISFIFS multilayer structure in-situ, as shown in FIG. 17A. (The S layers are all Nb, and the I layers may all be AlOx, but the various F layers may be different as discussed earlier.) This stack is processed first to produce an SFT, and afterwards the larger area MJJ, FIG. 17B. Note that the bottom electrode of the SFT is the SFIFS structure. Since the current flows along the top S layer of the SFIFS structure, the F layers and the AlOx tunnel barrier do not play any role here. In the same way, one can combine other device structures in the same deposition run, such as SIsFS and SIsFsFS. FIGS. 17C and 17D show two ways to integrate an MJJ and an SFT in a memory cell where the MJJ and SIS acceptor junction are connected in parallel. In the first case, FIG. 17C, the MJJ and SFT are situated in-plane next to each other. The wiring layers connect the bottom electrodes of the MJJ and SFT, and the top electrode of the MJJ is connected to the middle layer of the SFT. In the second case, FIG. 17D, the SFT and MJJ are integrated in a stacked geometry. Here the MJJ and SIS acceptor of the SFT naturally share one electrode. The bottom electrode of the MJJ can be connected to the top electrode of the SIS as shown by the slanted via contact in FIG. 17D. This design makes for a very small memory cell, enabling very dense MRAM. Note that FIGS. 17A-17D do not show control lines needed for flipping the magnetization in one of the F layers of the MJJ. These lines would be fabricated using the RIPPLE process, in which the control lines run beneath the memory cell.
(91) In an alternative preferred embodiment, either the COST or the CSHE cells may be integrated with the Josephson junction circuits of the RIPPLE process, and also the NWDs. This is analogous to the proven development path for room-temperature MRAM, in which magnetoresistive devices (such as OST) are integrated on top of prefabricated CMOS wafers.
(92) FIG. 18 shows an example of COST and NWD devices grown on top of planarized superconducting devices, with an additional superconducting wiring layer on top. The NWD layer may comprise an extra thin Nb or NbN layer. A similar process would be used for integrated fabrication of MRAM based on CSHE devices.
(93) The proposed integrated fabrication process is compatible in temperatures, materials, and equipment. Specifically, JJ circuits are sensitive to degradation if the temperature is raised above 200° C. for any part of the subsequent process. Fortunately, the COST fabrication steps do not involve annealing, and no steps require more than 150° C. Furthermore, both JJ and COST devices use transition metals, ensuring compatibility of process materials, process rates and conditions, and equipment. Although contamination of the Nb superconducting process by ferromagnetic materials is possible (and could degrade the superconductivity), this is practically manageable and presents a low risk, as demonstrated in preliminary efforts at process integration.
(94) These detailed examples of preferred embodiments do not imply that this invention is limited only to these examples. Other embodiments of energy-efficient superconducting computers with hybrid memory arrays may also follow from the principles herein disclosed.