Patent classifications
H03K19/17796
Neural network processing with chained instructions
Hardware and methods for neural network processing are provided. A method in a hardware node including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the matrix vector unit, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit is provided. The method includes performing using the MVU a first type of instruction that can only be performed by the MVU to generate a first result. The method further includes performing a second type of instruction that can only be performed by one of the multifunction units and generating a second result and without storing the any of the two results in a global register, passing the second result to the second multifunction and the third multifunction unit.
SYSTEMS AND METHODS FOR MODULAR DISAGGREGATED INTEGRATED CIRCUIT SYSTEMS
Systems and methods are provided for system circuitry disaggregation into an integrated circuit system with multiple chiplets having disaggregated components. A system may include a first programmable logic fabric die that includes programmable logic circuitry and a number of supporting chiplets that include disaggregated field programmable gate array (FPGA) circuitry. The chiplets are connected to the first programmable logic fabric die in a three-dimensional arrangement.
SYSTEMS AND METHODS FOR MODULAR DISAGGREGATED INTEGRATED CIRCUIT SYSTEMS
Systems and methods are provided for system circuitry disaggregation into an integrated circuit system with multiple chiplets having disaggregated components. A system may include a first programmable logic fabric die that includes programmable logic circuitry and a number of supporting chiplets that include disaggregated field programmable gate array (FPGA) circuitry. The chiplets are connected to the first programmable logic fabric die in a three-dimensional arrangement.
LOGIC DRIVE BASED ON STANDARDIZED COMMODITY PROGRAMMABLE LOGIC SEMICONDUCTOR IC CHIPS
A chip package includes an interposer comprising a silicon substrate, multiple metal vias passing through the silicon substrate, a first interconnection metal layer over the silicon substrate, a second interconnection metal layer over the silicon substrate, and an insulating dielectric layer over the silicon substrate and between the first and second interconnection metal layers; a field-programmable-gate-array (FPGA) integrated-circuit (IC) chip over the interposer; multiple first metal bumps between the interposer and the FPGA IC chip; a first underfill between the interposer and the FPGA IC chip, wherein the first underfill encloses the first metal bumps; a non-volatile memory (NVM) IC chip over the interposer; multiple second metal bumps between the interposer and the NVM IC chip; and a second underfill between the interposer and the NVM IC chip, wherein the second underfill encloses the second metal bumps.
LOGIC DRIVE BASED ON STANDARDIZED COMMODITY PROGRAMMABLE LOGIC SEMICONDUCTOR IC CHIPS
A chip package includes an interposer comprising a silicon substrate, multiple metal vias passing through the silicon substrate, a first interconnection metal layer over the silicon substrate, a second interconnection metal layer over the silicon substrate, and an insulating dielectric layer over the silicon substrate and between the first and second interconnection metal layers; a field-programmable-gate-array (FPGA) integrated-circuit (IC) chip over the interposer; multiple first metal bumps between the interposer and the FPGA IC chip; a first underfill between the interposer and the FPGA IC chip, wherein the first underfill encloses the first metal bumps; a non-volatile memory (NVM) IC chip over the interposer; multiple second metal bumps between the interposer and the NVM IC chip; and a second underfill between the interposer and the NVM IC chip, wherein the second underfill encloses the second metal bumps.
DDR COMPATIBLE OPEN ARRAY ACHITECTURES FOR RESISTIVE CHANGE ELEMENT ARRAYS
A high-speed memory circuit architecture for arrays of resistive change elements is disclosed. An array of resistive change elements is organized into rows and columns, with each column serviced by a word line and each row serviced by two bit lines. Each row of resistive change elements includes a pair of reference elements and a sense amplifier. The reference elements are resistive components with electrical resistance values between the resistance corresponding to a SET condition and the resistance corresponding to a RESET condition within the resistive change elements being used in the array. A high speed READ operation is performed by discharging one of a row's bit lines through a resistive change element selected by a word line and simultaneously discharging the other of the row's bit lines through of the reference elements and comparing the rate of discharge on the two lines using the row's sense amplifier. Storage state data are transmitted to an output data bus as high speed synchronized data pulses. High speed data is received from an external synchronized data bus and stored by a PROGRAM operation within resistive change elements in a memory array configuration.
Methods and apparatuses for sub-threhold clock tree design for optimal power
A method and flow for implementing a “clock tree” inside an ASIC using Sub-threshold or Near-threshold technology with optimal power. The invention may also implement concurrently use of two voltage domains inside a single place and route block. One voltage domain for the “clock tree” buffers and one voltage domain for the other cells at the block. The voltage domain for the “clock tree” buffers that is used is slightly higher than the voltage domain which is used for the other cells. The higher voltage ensures a large reduction of the total number of buffers inside the “clock tree” and the dynamic and static power are reduced dramatically despite the use of slightly higher operating voltage.
MULTI-FUNCTION UNIT FOR PROGRAMMABLE HARDWARE NODES FOR NEURAL NETWORK PROCESSING
Processors and methods for neural network processing are provided. A method in a processor including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the MVU, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit is provided. The method includes decoding instructions including a first type of instruction for processing by only the MVU and a second type of instruction for processing by only one of the multifunction units. The method includes mapping a first instruction for processing by the matrix vector unit or to any one of the first multifunction unit, the second multifunction unit, or the third multifunction unit depending on whether the first instruction is the first type of instruction or the second type of instruction.
FPGA chip with distributed multifunctional layer structure
An FPGA chip includes one functional unit, one pre-allocation manager, and wiring segments. The functional unit includes a first module CPE and a second module PLF. The pre-allocation manager may be connected by means of one of the wiring segments. By configuring one pre-allocation manager, data transmission directions of the wiring segments may be changed. The functional unit is connected to one pre-allocation manager by means of a conventional line. The first module CPE and the second module PLF which are adjacent in the same functional unit are connected by means of a cross-connection line. The second functional modules are interconnected by means of a conventional routing system. Different functional blocks can be connected to each other from any position of a circuit.
FPGA chip with distributed multifunctional layer structure
An FPGA chip includes one functional unit, one pre-allocation manager, and wiring segments. The functional unit includes a first module CPE and a second module PLF. The pre-allocation manager may be connected by means of one of the wiring segments. By configuring one pre-allocation manager, data transmission directions of the wiring segments may be changed. The functional unit is connected to one pre-allocation manager by means of a conventional line. The first module CPE and the second module PLF which are adjacent in the same functional unit are connected by means of a cross-connection line. The second functional modules are interconnected by means of a conventional routing system. Different functional blocks can be connected to each other from any position of a circuit.