Patent classifications
G06F15/8023
Reconfigurable parallel processing
Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.
Reconfigurable parallel processing with various reconfigurable units to form two or more physical data paths and routing data from one physical data path to a gasket memory to be used in a future physical data path as input
Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.
EXECUTION ENGINE FOR EXECUTING SINGLE ASSIGNMENT PROGRAMS WITH AFFINE DEPENDENCIES
The execution engine is a new organization for a digital data processing apparatus, suitable for highly parallel execution of structured fine-grain parallel computations. The execution engine includes a memory for storing data and a domain flow program, a controller for requesting the domain flow program from the memory, and further for translating the program into programming information, a processor fabric for processing the domain flow programming information and a crossbar for sending tokens and the programming information to the processor fabric.
NEURAL PROCESSING DEVICE AND LOAD/STORE METHOD OF NEURAL PROCESSING DEVICE
A neural processing device is provided. The neural processing device comprises: a processing unit configured to receive an input activation and a weight and perform a two-dimensional matrix calculation with the input activation and the weight to generate an output activation, a first memory, and a load-store unit (LSU) configured to perform memory access operations between the first memory and a second memory. The memory access operations include a main memory access operation for a current processing operation that is performed by the processing unit, and a standby memory access operation for a standby processing operation that is performed by the processing unit after the current processing operation. A level of the first memory is equal to a level of the processing unit, and a level of the second memory is different from the level of the first memory.
General-purpose parallel computing architecture
An apparatus includes multiple computing cores, where each computing core is configured to perform one or more processing operations and generate input data. The apparatus also includes multiple coprocessors associated with each computing core, where each coprocessor is configured to receive the input data from at least one of the computing cores, process the input data, and generate output data. The apparatus further includes multiple reducer circuits, where each reducer circuit is configured to receive the output data from each of the coprocessors of an associated computing core, apply one or more functions to the output data, and provide one or more results to the associated computing core. In addition, the apparatus includes multiple communication links communicatively coupling the computing cores and the coprocessors associated with the computing cores.
FUSELOAD ARCHITECTURE FOR SYSTEM-ON-CHIP RECONFIGURATION AND REPURPOSING
Methods, systems, and devices that support fuseload architectures for system-on-chip (SoC) reconfiguration and repurposing are described. Trim data may be loaded from fuses to registers on a die based on a fuse header. For example, a set of registers coupled with a set of fuses on the die may be identified, where the set of fuses may store trim data to be copied to the registers as part of a fuseload procedure. In such cases, one or more fuse headers may be identified within the trim data, and each fuse header may correspond to a fuse group that includes a subset of fuses. Based on one or more subfields within a fuse header, a mapping between fuse addresses and register addresses may be determined, and the trim data from each fuse group may be copied into a set of registers based on the mapping.
Vector computational unit
A microprocessor system comprises a computational array and a vector computational unit. The computational array includes a plurality of computation units. The vector computational unit is in communication with the computational array and includes a plurality of processing elements. The processing elements are configured to receive output data elements from the computational array and process in parallel the received output data elements.
Processor element matrix performing maximum/average pooling operations
A processor is provided. The processor includes a plurality of processing elements configured to be arranged in a matrix form, and a controller configured to control the plurality of processing elements during a plurality of cycles to process a target data, control first processing elements so that each of the first processing elements operates data provided from adjacent first processing elements and the input first element and inputs each of second elements included in a second row among the plurality of elements to second processing elements arranged in the second row among the plurality of processing elements, control the second processing elements so that each of the second processing elements operates data provided from adjacent second processing elements and the input second element, and operates data provided from the adjacent first processing elements in the same column among the first processing elements and pre-stored operation data.
DATA PROCESSING SYSTEMS
A data processing system comprising a plurality of processing units. Each processing unit comprises a set of plural functional units and an internal communications network that routes communications between the functional units in a particular sequence order of the functional units. Each processing unit is connected to at least one other processing unit via a communications bridge that has at least two connections, a first connection that routes communications between a first pair of network nodes of the pair of processing units, and a separate, second connection that routes communications between a second, different pair of network nodes of the pair of processing units. Each connected pair of network nodes comprises network nodes having different positions in the internal communications network sequence order of the network nodes and/or network nodes associated with functional units of different types.
Reconfigurable Parallel Processing
Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.