Patent classifications
G06F15/7896
Apparatus and mechanism for processing neural network tasks using a single chip package with multiple identical dies
Apparatus and methods for processing neural network models are provided. The apparatus can comprise a plurality of identical artificial intelligence processing dies. Each artificial intelligence processing die among the plurality of identical artificial intelligence processing dies can include at least one inter-die input block and at least one inter-die output block. Each artificial intelligence processing die among the plurality of identical artificial intelligence processing dies is communicatively coupled to another artificial intelligence processing die among the plurality of identical artificial intelligence processing dies by way of one or more communication paths from the at least one inter-die output block of the artificial intelligence processing die to the at least one inter-die input block of the artificial intelligence processing die. Each artificial intelligence processing die among the plurality of identical artificial intelligence processing dies corresponds to at least one layer of a neural network.
FLEXIBLE, SCALABLE GRAPH-PROCESSING ACCELERATOR
An accelerator device includes a first processing unit to access a structure of a graph dataset, and a second processing unit coupled with the first processing unit to perform computations based on data values in the graph dataset.
Photoelectric computing unit, photoelectric computing array, and photoelectric computing method
A photoelectric computing unit, a photoelectric computing array and a photoelectric computing method. The photoelectric computing unit includes a semiconductor multifunctional region structure, which includes at least one carrier control region, at least one coupling region, and at least one photon-generated carrier collection region and readout region.
APPARATUS AND MECHANISM FOR PROCESSING NEURAL NETWORK TASKS USING A SINGLE CHIP PACKAGE WITH MULTIPLE IDENTICAL DIES
Apparatus and methods for processing neural network models are provided. The apparatus can comprise a plurality of identical artificial intelligence processing dies. Each artificial intelligence processing die among the plurality of identical artificial intelligence processing dies can include at least one inter-die input block and at least one inter-die output block. Each artificial intelligence processing die among the plurality of identical artificial intelligence processing dies is communicatively coupled to another artificial intelligence processing die among the plurality of identical artificial intelligence processing dies by way of one or more communication paths from the at least one inter-die output block of the artificial intelligence processing die to the at least one inter-die input block of the artificial intelligence processing die. Each artificial intelligence processing die among the plurality of identical artificial intelligence processing dies corresponds to at least one layer of a neural network.
Flexible, scalable graph-processing accelerator
An accelerator device includes a first processing unit to access a structure of a graph dataset, and a second processing unit coupled with the first processing unit to perform computations based on data values in the graph dataset.
OVERLAYS FOR SOFTWARE AND HARDWARE VERIFICATION
Disclosed are various approaches for updating a processor, such as a field programmable gate array (FPGA), to execute in a verifiable manner without having to reprogram or reconfigure the processor after initial configuration. A sequence of values describing an order of execution for a plurality of function blocks within a task lane that represents a task is generated. Then, a list of operation codes and register locations for inputs and outputs of each function block in the task lane is generated. Next, a sequence of function blocks for the task lane based at least in part on the sequence of values and the list of operation codes and register locations is generated. Then, the sequence of function blocks is stored in a memory of the processor. Finally, the sequence of values and the list of operation codes and register locations can be executed with the processor.
Performance islands for CPU clusters
Embodiments include an asymmetric multiprocessing (AMP) system having two or more central processing unit (CPU) clusters of a first core type and a CPU cluster of a second core type. Some embodiments include determining a control effort for an active thread group, and assigning the thread group to a first performance island according to the control effort range of the first performance island. The first performance island can include a first CPU cluster of the first core type, where a second performance island includes a second CPU cluster of the first core type, where the second performance island corresponds to a different control effort range than the first performance island. Some embodiments include assigning the first CPU cluster as a preferred CPU cluster of the first thread group, and transmitting a first signal identifying the first CPU cluster as the preferred CPU cluster assigned to the first thread group.
Digital signal processor (DSP) with global and local interconnect architecture and reconfigurable hardware accelerator core
A digital signal processor (DSP) with a global and local interconnection architecture includes a plurality of DSP hardware accelerator cores each including at least one DSP module for performing a DSP operation. The DSP further includes a global interconnect for data transfer between non-adjacent ones of the DSP hardware accelerator cores and between the DSP hardware accelerator cores and processing elements external to the DSP hardware accelerator cores. The DSP further includes a local stream interconnect associated with each of the DSP hardware accelerator cores for data transfer within each core and for data transfer between adjacent ones of the DSP hardware accelerator cores.
Method of notifying a process or programmable atomic operation traps
Disclosed in some examples, are methods, systems, programmable atomic units, and machine-readable mediums that provide an exception as a response to the calling processor. That is, the programmable atomic unit will send a response to the calling processor. The calling processor will recognize that the exception has been raised and will handle the exception. Because the calling processor knows which process triggered the exception, the calling processor (e.g., the Operating System) can take appropriate action, such as terminating the calling process. The calling processor may be a same processor as that executing the programmable atomic transaction, or a different processor (e.g., on a different chiplet).
APPARATUS AND MECHANISM FOR PROCESSING NEURAL NETWORK TASKS USING A SINGLE CHIP PACKAGE WITH MULTIPLE IDENTICAL DIES
Apparatus and methods for processing neural network models are provided. The apparatus can comprise a plurality of identical artificial intelligence processing dies. Each artificial intelligence processing die among the plurality of identical artificial intelligence processing dies can include at least one inter-die input block and at least one inter-die output block. Each artificial intelligence processing die among the plurality of identical artificial intelligence processing dies is communicatively coupled to another artificial intelligence processing die among the plurality of identical artificial intelligence processing dies by way of one or more communication paths from the at least one inter-die output block of the artificial intelligence processing die to the at least one inter-die input block of the artificial intelligence processing die. Each artificial intelligence processing die among the plurality of identical artificial intelligence processing dies corresponds to at least one layer of a neural network.