G06F15/803

Discrete three-dimensional processor

A discrete three-dimensional (3-D) processor comprises a plurality of storage-processing units (SPU's), each of which comprises a non-memory circuit and more than one 3-D memory (3D-M) array. The preferred 3-D processor further comprises communicatively coupled first and second dice. The first die comprises the 3D-M arrays and the in-die peripheral-circuit components thereof; whereas, the second die comprises the non-memory circuits and off-die peripheral-circuit components of the 3D-M arrays.

Parallel processing apparatus, parallel computing method, and recording medium storing parallel computing program

A parallel processing apparatus includes: processors; and a network switch, wherein a first processor: generates divided matrix data by dividing matrix data in such a manner that the divided portions overlap each other; transmits the divided matrix data to a second processor; generates first evaluation-value matrix data from the divided matrix data; transmits, to the second processor, first elements in a first overlapping portion of the first evaluation-value matrix data; receives, from the second processor, second elements of a second overlapping portion of second evaluation-value matrix data; calculates first added evaluation data by adding the second elements to the first elements; transmits the first added evaluation data to the second processor; receives, from the second processor, second added evaluation data; and calculates a first C point or a first F point based on the first evaluation-value matrix data which is updated using the second added evaluation data.
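The exchange-and-add step on the overlapping portion can be illustrated with a minimal sketch. It assumes one-dimensional data, two blocks that share a single element, and a made-up evaluation value (each adjacent pair adds its absolute difference to both endpoints). The function names and the evaluation rule are hypothetical and are not taken from the abstract; the C point / F point selection itself is not modeled.

```python
def split_with_overlap(values, mid):
    """Divide the data into two blocks that share the element at index `mid`."""
    return values[:mid + 1], values[mid:]


def partial_evaluation(block):
    """Hypothetical evaluation values: each adjacent pair inside the block
    adds its absolute difference to both of its endpoints."""
    ev = [0] * len(block)
    for i in range(len(block) - 1):
        weight = abs(block[i] - block[i + 1])
        ev[i] += weight
        ev[i + 1] += weight
    return ev


def exchange_and_add(ev_first, ev_second):
    """Each side sends its overlapping element to the other, and both add
    the received element to their own, so the overlap ends up identical."""
    added = ev_first[-1] + ev_second[0]
    return ev_first[:-1] + [added], [added] + ev_second[1:]


values = [4, 1, 3, 7, 2]
first, second = split_with_overlap(values, 2)
ev_first, ev_second = exchange_and_add(partial_evaluation(first),
                                       partial_evaluation(second))
# Stitching the two updated blocks together reproduces the values that a
# single processor would compute on the undivided data.
merged = ev_first + ev_second[1:]
```

In this toy setup the summed overlap value equals what an undivided computation would give for the shared element, which is the property the added evaluation data is meant to restore.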

Fast determination of workgroup batches from multi-dimensional kernels
10467724 · 2019-11-05

Techniques are disclosed relating to dispatching compute work from a compute stream. In some embodiments, workgroup batch circuitry is configured to select (e.g., in a single clock cycle) multiple workgroups to be distributed to different shader circuitry. In some embodiments, iterator circuitry is configured to determine next positions in different dimensions at least partially in parallel. For example, in some embodiments, first circuitry is configured to determine a next position in a first dimension and an increment amount for a second dimension. In some embodiments, second circuitry is configured to determine at least partially in parallel with the determination of the next position in the first dimension, next positions in the second dimension for multiple possible increment amounts in the second dimension. In some embodiments, this may facilitate a configurable number of workgroups per batch and may increase performance, e.g., by increasing the overall number of workgroups dispatched per clock cycle.
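The two-dimension iterator described above can be sketched in software as follows. This is only an illustrative model, assuming a 2-D kernel of `size_x` by `size_y` workgroups and a fixed batch size; the function and parameter names are hypothetical, and the list of candidates stands in for hardware that would evaluate them in parallel.

```python
def next_batch_position(x, y, batch, size_x, size_y):
    """Advance a 2-D workgroup iterator by `batch` workgroups.

    Returns (next_x, next_y, done), where `done` is True once the
    iterator has moved past the last row of the kernel.
    """
    # "First circuitry": next position in the first dimension, plus the
    # increment amount (carry) to apply to the second dimension.
    total = x + batch
    next_x = total % size_x
    increment = total // size_x

    # "Second circuitry": next positions in the second dimension for every
    # possible increment amount. None of these depends on `increment`, so
    # they could be computed at the same time as the step above.
    max_increment = (size_x - 1 + batch) // size_x
    candidates = [y + inc for inc in range(max_increment + 1)]

    # Final selection once the increment amount is known.
    next_y = candidates[increment]
    return next_x, next_y, next_y >= size_y


# Walk a 3 x 4 kernel (12 workgroups) in batches of 4.
positions = []
x, y, done = 0, 0, False
while not done:
    positions.append((x, y))
    x, y, done = next_batch_position(x, y, 4, 3, 4)
```

The point of the split is that the candidate list removes the dependency between the two dimensions: only a final selection waits on the first-dimension result.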

Processor for Calculating Mathematical Functions in Parallel

A three-dimensional processor (3D-processor) for calculating mathematical functions in parallel comprises a large number (e.g. at least one thousand) of computing elements, with each computing element comprising at least one three-dimensional memory (3D-M) array for storing at least a portion of a look-up table (LUT) for a mathematical function and an arithmetic logic circuit (ALC) for performing arithmetic operations on the LUT data. Even though each individual 3D-M cell is slower than a conventional two-dimensional memory (2D-M) cell, this deficiency in speed is offset by a significantly larger scale of parallelism.
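As a rough software analogy for one computing element, the sketch below stores samples of a function in a look-up table and uses a little arithmetic on the table entries to produce a value. Linear interpolation is an assumption chosen for illustration; the abstract does not say which arithmetic operations the ALC performs. The names `build_lut` and `lut_eval` are hypothetical.

```python
import math


def build_lut(func, lo, hi, entries):
    """Sample `func` at `entries` evenly spaced points on [lo, hi]."""
    step = (hi - lo) / (entries - 1)
    return [func(lo + i * step) for i in range(entries)], step


def lut_eval(lut, step, lo, x):
    """Approximate func(x) from the table with linear interpolation.

    The two table reads model the memory array; the subtract, multiply
    and add model the arithmetic performed on the LUT data.
    """
    pos = (x - lo) / step
    i = max(0, min(int(pos), len(lut) - 2))  # clamp to a valid segment
    frac = pos - i
    return lut[i] + frac * (lut[i + 1] - lut[i])


# One table for sin(x) on [0, pi/2]; each input would go to its own
# computing element in the parallel design, modeled here by a plain loop.
lut, step = build_lut(math.sin, 0.0, math.pi / 2, 1024)
inputs = [0.1, 0.5, 1.0, 1.5]
outputs = [lut_eval(lut, step, 0.0, x) for x in inputs]
```

With 1024 entries the interpolation error for sine on this interval is well below 1e-6, which shows how table size trades against the amount of arithmetic needed.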

HETEROGENEOUS MINIATURIZATION PLATFORM

A method of forming an electrical device is provided that includes forming microprocessor devices on a microprocessor die; forming memory devices on a memory device die; forming component devices on a component die; and forming a plurality of packing devices on a packaging die. The method further includes transferring a plurality of each of said microprocessor devices, memory devices, component devices and packaging components to a supporting substrate, wherein the packaging components electrically interconnect the memory devices, component devices and microprocessor devices in individualized groups. The supporting substrate is then sectioned to provide said individualized groups of memory devices, component devices and microprocessor devices that are interconnected by a packaging component.

Parallel computing system and communication control program

A parallel computing system includes a plurality of processors multi-dimensionally connected by an interconnection network, wherein each of the processors in the parallel computing system determines, in dimensional order, communication channels to other processors in the interconnection network, each of the processors sets, as relative coordinates of destination processors with respect to the plurality of processors in data communications performed at a same timing, relative coordinates common to all of the processors, and each of the processors performs data communications with destination processors having the set relative coordinates.
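The combination of dimension-order channel selection and a relative coordinate shared by all processors can be sketched as below. The sketch assumes a torus (wraparound links) and single-direction hops, neither of which is stated in the abstract; `destination` and `dimension_order_route` are hypothetical helper names.

```python
def destination(coord, offset, dims):
    """Destination reached by applying the common relative coordinate
    `offset` to a processor at `coord` (wraparound is an assumption)."""
    return tuple((c + o) % d for c, o, d in zip(coord, offset, dims))


def dimension_order_route(src, dst, dims):
    """List the hops from `src` to `dst`, fully resolving dimension 0
    before moving on to dimension 1, and so on."""
    path = [src]
    cur = list(src)
    for axis, size in enumerate(dims):
        while cur[axis] != dst[axis]:
            cur[axis] = (cur[axis] + 1) % size
            path.append(tuple(cur))
    return path


dims = (2, 3)
offset = (1, 2)  # the same relative coordinate for every processor
nodes = [(a, b) for a in range(dims[0]) for b in range(dims[1])]
targets = {node: destination(node, offset, dims) for node in nodes}
route = dimension_order_route((0, 0), targets[(0, 0)], dims)
```

In this model, a shared offset means every processor has a different destination in a given communication step, so no processor receives from two senders at once.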

HIGH PERFORMANCE COMPUTING (HPC) NODE HAVING A PLURALITY OF SWITCH COUPLED PROCESSORS
20190294576 · 2019-09-26

A High Performance Computing (HPC) node comprises a motherboard, a switch comprising eight or more ports integrated on the motherboard, and at least two processors operable to execute an HPC job, with each processor communicably coupled to the integrated switch and integrated on the motherboard.

Heterogeneous miniaturization platform

A method of forming an electrical device is provided that includes forming microprocessor devices on a microprocessor die; forming memory devices on a memory device die; forming component devices on a component die; and forming a plurality of packing devices on a packaging die. The method further includes transferring a plurality of each of said microprocessor devices, memory devices, component devices and packaging components to a supporting substrate, wherein the packaging components electrically interconnect the memory devices, component devices and microprocessor devices in individualized groups. The supporting substrate is then sectioned to provide said individualized groups of memory devices, component devices and microprocessor devices that are interconnected by a packaging component.

TECHNOLOGIES FOR PROVIDING A SCALABLE ARCHITECTURE FOR PERFORMING COMPUTE OPERATIONS IN MEMORY

Technologies for providing a scalable architecture to efficiently perform compute operations in memory include a memory having media access circuitry coupled to a memory media. The media access circuitry is to access data from the memory media to perform a requested operation, perform, with each of multiple compute logic units included in the media access circuitry, the requested operation concurrently on the accessed data, and write, to the memory media, resultant data produced from execution of the requested operation.
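The access, concurrent compute, and write-back sequence can be modeled loosely in software. In the sketch below a Python list stands in for the memory media and worker threads stand in for the compute logic units; the class name, method signature, and slicing policy are all assumptions made for illustration, and the threads only model the concurrency rather than deliver real hardware parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


class MediaAccessCircuitry:
    """Hypothetical model of media access circuitry with several
    compute logic units sitting next to the memory media."""

    def __init__(self, media, num_units):
        self.media = media          # the "memory media": a plain list here
        self.num_units = num_units  # number of compute logic units

    def execute(self, operation, start, length, dest):
        # 1. Access the data from the memory media.
        data = self.media[start:start + length]
        # 2. Hand each compute logic unit one contiguous slice.
        chunk = max(1, -(-length // self.num_units))  # ceiling division
        slices = [data[i:i + chunk] for i in range(0, length, chunk)]
        # 3. Every unit performs the requested operation on its slice.
        with ThreadPoolExecutor(max_workers=self.num_units) as pool:
            results = list(pool.map(
                lambda part: [operation(v) for v in part], slices))
        # 4. Write the resultant data back to the memory media.
        flat = [v for part in results for v in part]
        self.media[dest:dest + len(flat)] = flat


media = list(range(8)) + [0] * 8
circuitry = MediaAccessCircuitry(media, num_units=4)
circuitry.execute(lambda v: v * v, start=0, length=8, dest=8)
```

Adding compute logic units in this model only changes how the accessed data is sliced, which is one way to read the scalability claim.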

Network topology system and method

A network topology system comprises a plurality of nodes, each of the plurality of nodes having a set of connection rules which is built by the steps of: generating a series of prime number differences; generating a series of communication strategy numbers; extracting as many terms as the number of connecting nodes from a recursive sequence to serve as an index series; generating a series of connection strategy numbers by extracting the Nth terms from the series of communication strategy numbers, wherein N stands for each number of the index series; and generating a series of connecting node numbers by calculating the sum of each odd number and each term of the series of connection strategy numbers so as to build the connection rules for each odd-numbered node to connect the nodes numbered in correspondence with the numbers in the series of connecting node numbers.