Patent classifications
G06F15/803
Faulty core recovery mechanisms for a three-dimensional network on a processor array
Embodiments of the invention relate to faulty core recovery mechanisms for a three-dimensional (3-D) network on a processor array. One embodiment comprises a multidimensional switch network for a processor array. The switch network comprises multiple switches for routing packets between multiple core circuits of the processor array. The switches are organized into multiple planes. The switch network further comprises a redundant plane including multiple redundant switches. Multiple data paths interconnect the switches. The redundant plane is used to facilitate full operation of the processor array in the event of one or more component failures.
Multi-processor core three-dimensional (3D) integrated circuits (ICs) (3DICs), and related methods
Multi-processor core three-dimensional (3D) integrated circuits (ICs) (3DICs) and related methods are disclosed. In aspects disclosed herein, ICs are provided that include a central processing unit (CPU) having multiple processor cores (cores) to improve performance. To further improve CPU performance, the multiple cores can also be designed to communicate with each other to offload workloads and/or share resources for parallel processing, but at a communication overhead associated with passing data through interconnects which have an associated latency. To mitigate this communication overhead inefficiency, aspects disclosed herein provide the CPU with its multiple cores in a 3DIC. Because 3DICs can overlap different IC tiers and/or align similar components in the same IC tier, the cores can be designed and located between or within different IC tiers in a 3DIC to reduce communication distance associated with processor core communication to share workload and/or resources, thus improving performance of the multi-processor CPU design.
Apparatus and Methods of Providing Efficient Data Parallelization for Multi-Dimensional FFTs
In some embodiments, an apparatus may include a memory configured to store data at a plurality of addresses and a processor circuit including a plurality of processor cores. Each processor core may include multiple threads. The processor circuit may be configured to subdivide an input data stream into a plurality of three-dimensional matrices corresponding to a number of processor cores of the processor circuit. The processor circuit may be further configured to associate each matrix with a respective one of the plurality of processor cores and determine concurrently a three-dimensional FFT for each matrix of the plurality of three-dimensional matrices within the respective one of the plurality of processor cores to produce an FFT output.
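The subdivide-then-transform flow described in this abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the helper names `split_stream`, `dft3` and `parallel_fft3`, the flat row-major layout of the input stream, and the use of a thread pool as a stand-in for processor cores. A naive discrete Fourier transform replaces a real FFT to keep the example dependency-free, and CPython threads do not run this arithmetic truly in parallel.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def dft3(block):
    """Naive 3-D discrete Fourier transform of a nested list block[x][y][z]."""
    nx, ny, nz = len(block), len(block[0]), len(block[0][0])
    out = [[[0j] * nz for _ in range(ny)] for _ in range(nx)]
    for u in range(nx):
        for v in range(ny):
            for w in range(nz):
                total = 0j
                for x in range(nx):
                    for y in range(ny):
                        for z in range(nz):
                            phase = u * x / nx + v * y / ny + w * z / nz
                            total += block[x][y][z] * cmath.exp(-2j * cmath.pi * phase)
                out[u][v][w] = total
    return out


def split_stream(stream, num_cores, shape):
    """Cut a flat input stream into num_cores blocks of the given 3-D shape."""
    nx, ny, nz = shape
    size = nx * ny * nz
    if len(stream) != num_cores * size:
        raise ValueError("stream length must equal num_cores * nx * ny * nz")
    blocks = []
    for c in range(num_cores):
        flat = stream[c * size:(c + 1) * size]
        # Rebuild the flat slice as block[x][y][z], assuming row-major order.
        blocks.append([[flat[(x * ny + y) * nz:(x * ny + y) * nz + nz]
                        for y in range(ny)] for x in range(nx)])
    return blocks


def parallel_fft3(stream, num_cores, shape):
    """Associate one block with each worker and transform the blocks concurrently."""
    blocks = split_stream(stream, num_cores, shape)
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        return list(pool.map(dft3, blocks))
```

Calling `parallel_fft3([1.0] * 16, 2, (2, 2, 2))` yields two 2x2x2 spectra, one per simulated core. A practical implementation would swap `dft3` for an optimized FFT routine and use real hardware threads or processes.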
DIE AND PACKAGE
Provided, efficiently and at low cost, are a package offering core-number ratios appropriate for all types of computers and the dies included in the package.
This package includes at least one die provided with: at least one of a first core formed of a CPU core or a latency core and a second core formed of an accelerator core or a throughput core; an external interface; memory interfaces 24 to 26; and a die interface 23 which is connected to another die.
The dies include a first-type die and a second-type die, each including both the first core and the second core, and the ratio of the number of first cores to the number of second cores in the first-type die differs from that in the second-type die.
Moreover, the memory interfaces include an interface conforming to TCI.
In addition, the memory interfaces further include an interface conforming to HBM.
DIE AND PACKAGE, AND MANUFACTURING METHOD FOR DIE AND PRODUCING METHOD FOR PACKAGE
To provide, efficiently and at low cost, a package offering core-number ratios appropriate for all types of computers, and the dies included in the package.
A set of the dies and the package are provided with a plurality of dies each including at least an accelerator core 21 or a CPU core 22, an external interface, memory interfaces 24 to 26, and a die interface 23 which is connected to another die.
The dies include a first-type die and a second-type die, each including both the accelerator core and the CPU core, and the ratio of the number of accelerator cores to the number of CPU cores in the first-type die differs from that in the second-type die.
Moreover, the memory interfaces include an interface conforming to TCI.
In addition, the memory interfaces further include an interface conforming to HBM.
PARALLEL PROCESSING APPARATUS, PARALLEL COMPUTING METHOD, AND RECORDING MEDIUM STORING PARALLEL COMPUTING PROGRAM
A parallel processing apparatus includes: processors; and a network switch, wherein a first processor: generates divided matrix data by dividing matrix data in such a manner that the divided pieces overlap one another; transmits the divided matrix data to a second processor; generates first evaluation-value matrix data from the divided matrix data; transmits, to the second processor, first elements in a first overlapping portion of the first evaluation-value matrix data; receives, from the second processor, second elements of a second overlapping portion of second evaluation-value matrix data; calculates first added evaluation data by adding the second elements to the first elements; transmits the first added evaluation data to the second processor; receives, from the second processor, second added evaluation data; and calculates a first C point or a first F point based on the first evaluation-value matrix data which is updated using the second added evaluation data.
NEURAL NETWORK ACCELERATOR
A computing element array system includes an array of computing elements connected by connections. Each computing element has a control circuit, a storage circuit, and an operation circuit and the connections each connect two computing elements. The storage circuit can input and store a data packet comprising a data value and a target-tag from one of the connections. The operation circuit can perform an operation on the data value to form a processed data value. The target-tag specifies a computing element to perform the operation on the data value. The control circuit can identify a computing element from the target-tag, enable the operation circuit to process the data value if the identified computing element matches the computing element, modify the data packet to comprise the processed data value, and enable the output of the modified data packet on one of the connections.
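The target-tag behavior described in this abstract can be sketched as follows. This is a simplified model under stated assumptions, not the patented circuit: the class names `Packet` and `ComputingElement`, the helper `run_chain`, the integer tags, and the linear chain of connections are all hypothetical. The sketch also assumes the tag is left unchanged after processing, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Packet:
    value: float
    target_tag: int  # identifies the element that should operate on the value


@dataclass
class ComputingElement:
    element_id: int
    operation: Callable[[float], float]

    def step(self, packet: Packet) -> Packet:
        # Control circuit: compare the tag with this element's own identity.
        if packet.target_tag == self.element_id:
            # Operation circuit: the packet now carries the processed value.
            # Assumption: the tag itself is kept as-is.
            return Packet(self.operation(packet.value), packet.target_tag)
        # Not addressed to this element: forward the packet unchanged.
        return packet


def run_chain(elements: List[ComputingElement], packet: Packet) -> Packet:
    """Pass a packet through a hypothetical linear chain of elements."""
    for element in elements:
        packet = element.step(packet)
    return packet
```

For example, with three elements tagged 0, 1 and 2, a packet carrying `target_tag=1` is modified only by the middle element and passes through the other two untouched.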
SYSTEMS, DEVICES, ARTICLES, AND METHODS FOR QUANTUM PROCESSOR ARCHITECTURE
A topology or hardware graph of a quantum processor is modifiable, for example prior to embedding of a problem, for instance by creating chains of qubits, where each chain operates as a single or logical qubit to impose a logical graph on the quantum processor. A user interface (UI) allows a user to select a topology suited for embedding a particular problem or type of problem, to supply parameters that define the desired topology, or to supply or specify a problem graph or problem definition from which a processor-based system determines or selects an appropriate topology or logical graph to impose. Topologies may have regularity and/or self-similarity over the quantum processor or portions thereof, which portions may constitute unit cells. Logical graphs imposed on the quantum processor may take the form of a hypercube graph. A UI allows the user to specify a desired dimension of the hypercube graph.
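As an illustration of the hypercube logical graph mentioned above, the following minimal sketch builds the edge list for a user-specified dimension. The function name `hypercube_edges` and the integer node labels are assumptions for illustration. The sketch covers only the logical graph and does not model how chains of physical qubits would be formed to embed it on a quantum processor.

```python
def hypercube_edges(dimension):
    """Return the edges of a hypercube graph with 2**dimension nodes.

    Nodes are labeled 0 .. 2**dimension - 1; two nodes are adjacent when
    their binary labels differ in exactly one bit.
    """
    if dimension < 0:
        raise ValueError("dimension must be non-negative")
    edges = []
    for node in range(1 << dimension):
        for bit in range(dimension):
            neighbor = node ^ (1 << bit)  # flip one bit to reach a neighbor
            if node < neighbor:  # record each undirected edge once
                edges.append((node, neighbor))
    return edges
```

A dimension of 3 gives the familiar cube with 8 nodes and 12 edges; in general the graph has `dimension * 2**(dimension - 1)` edges.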
INTERCONNECT CIRCUITS AT THREE-DIMENSIONAL (3-D) BONDING INTERFACES OF A PROCESSOR ARRAY
Embodiments of the invention relate to processor arrays, and in particular, a processor array with interconnect circuits for bonding semiconductor dies. One embodiment comprises multiple semiconductor dies and at least one interconnect circuit for exchanging signals between the dies. Each die comprises at least one processor core circuit. Each interconnect circuit corresponds to a die of the processor array. Each interconnect circuit comprises one or more attachment pads for interconnecting a corresponding die with another die, and at least one multiplexor structure configured for exchanging bus signals in a reversed order.
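The multiplexor structure that exchanges bus signals in a reversed order can be modeled in software as a selectable pass-through or reversal. This is a behavioral sketch only: the function name `mux_bus`, the `reverse` select input, and the list-of-bits bus representation are hypothetical. The motivating scenario in the usage note (a mirrored pad order on a bonded die) is an assumption that the abstract does not state.

```python
def mux_bus(signals, reverse):
    """Pass a bus straight through, or with its signal order reversed.

    signals: sequence of bus signal values, index 0 being the first pad.
    reverse: select input; True routes signal i to position n - 1 - i.
    """
    return list(reversed(signals)) if reverse else list(signals)
```

Under the assumed scenario, if bonding presents a bus to the neighboring die in mirrored order, applying `mux_bus(bus, True)` on the receiving side restores the original order, since reversing twice is the identity.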