G06F15/8038

System of Heterogeneous Reconfigurable Processors for the Data-Parallel Execution of Applications

A system for a data-parallel execution of at least two implementations of an application on reconfigurable processors with different layouts is presented. The system comprises a pool of reconfigurable data flow resources with data transfer resources that interconnect first and second reconfigurable processors having first and second layouts that impose respective first and second constraints for the data-parallel execution of the application. The system further comprises an archive of configuration files and a host system that is operatively coupled to the first and second reconfigurable processors. The host system comprises first and second compilers that generate for the application, based on the respective first and second constraints, first and second configuration files that are stored in the archive of configuration files and adapted to be executed data-parallel compatible on respective first and second reconfigurable processors.

COMPILER FOR A PARALLEL PROCESSOR
20230035474 · 2023-02-02 ·

A method for concurrently performing multiple computations in an associative processing unit (APU) includes having data in two matrices, representing data in two portions of a memory array of the APU, creating a Tartan matrix by computing an outer product between a first bit vector indicating selected rows and a second bit vector indicating selected columns, the Tartan matrix representing data stored in a third portion of the memory array wherein all cells having a value 1 in the Tartan matrix indicate selected cells, concurrently activating all cells of the matrices and storing a result of Boolean operations therebetween in one of the two matrices, wherein a new value is obtained on cells located at a same row and a same column as the selected cells in the Tartan matrix and an original value remains on other cells.

PROCESSING SYSTEM FOR A VEHICLE

A processing system for use in a vehicle comprises numerous simple processors, numerous parallel processors, an interface for connecting to a communication bus in the vehicle, and a monitoring device that is connected to the interface and each of the processors. The monitoring device is designed for redundant configuration of the simple processors and/or the parallel processors in relation to one another.

ACCELERATOR AND ELECTRONIC DEVICE INCLUDING THE SAME

An accelerator includes: a memory configured to store input data; a plurality of shift buffers each configured to shift input data received sequentially from the memory in each cycle, and in response to input data being stored in each of internal elements of the shift buffer, output the stored input data to a processing element (PE) array; a plurality of backup buffers each configured to store input data received sequentially from the memory and transfer the stored input data to one of the shift buffers; and the PE array configured to perform an operation on input data received from one or more of the shift buffers and on a corresponding kernel.

Processing system with interspersed processors with multi-layer interconnection

Embodiments of a multi-processor array are disclosed that may include a plurality of processors and configurable communication elements coupled together in a interspersed arrangement. Each configurable communication element may include a local memory and a plurality of routing engines. The local memory may be coupled to a subset of the plurality of processors. Each routing engine may be configured to receive one or more messages from a plurality of sources, assign each received message to a given destination of a plurality of destinations dependent upon configuration information, and forward each message to assigned destination. The plurality of destinations may include the local memory, and routing engines included in a subset of the plurality of configurable communication elements.

Mainboard and server

A mainboard and a server are provided. The mainboard includes: a board body, a preset number of Purley platform central processors, and one or more memories. The preset number of Purley platform central processors and the one or more memories are installed on the board body. The Purley platform central processors are sequentially connected with each other, and each of the memories is connected to one of the Purley platform central processors. Each of the memories is configured to receive to-be-burned data inputted from outside and transmit the to-be-burned data to the Purley platform central processor connected with the memory. Each of the Purley platform central processors is configured to burn the to-be-burned data when receiving the to-be-burned data transmitted by the connected memory connected with the Purley platform central processor, to have a function corresponding to the to-be-burned data.

Single-chip multi-processor communication

A heterogeneous multi-core integrated circuit comprising two or more processors, at least one of the processors being a general purpose CPU and at least one of the processors being a specialized hardware processing engine, the processors being connected by a processor local bus on the integrated circuit, wherein the general purpose CPU is configured to generate a first instruction for an atomic operation to be performed by a second processor, different from the general purpose CPU, the first instruction comprising an address of the second processor and a first command indicating a first action to be executed by the second processor, and transmit the first instruction to the second processor over the processor local bus. The first command may include the first action, or may be a descriptor of the first action or a pointer to where the first action may be found in a memory.

MEMORY-BASED DISTRIBUTED PROCESSOR ARCHITECTURE
20210365334 · 2021-11-25 · ·

Distributed processors and methods for compiling code for execution by distributed processors are disclosed. In one implementation, a distributed processor may include a substrate; a memory array disposed on the substrate; and a processing array disposed on the substrate. The memory array may include a plurality of discrete memory banks, and the processing array may include a plurality of processor subunits, each one of the processor subunits being associated with a corresponding, dedicated one of the plurality of discrete memory banks. The distributed processor may further include a first plurality of buses, each connecting one of the plurality of processor subunits to its corresponding, dedicated memory bank, and a second plurality of buses, each connecting one of the plurality of processor subunits to another of the plurality of processor subunits.

System for Executing an Application on Heterogeneous Reconfigurable Processors

A system for executing an application on a pool of reconfigurable processors with first and second reconfigurable processors having first and second architectures that are different from each other is presented. The system comprises an archive of configuration files with first and second configuration files for executing the application on the first and second reconfigurable processors, respectively, and a host system that is operatively coupled to the first and second reconfigurable processors. The host system comprises a runtime processor that allocates reconfigurable processors for executing the application and an auto-discovery module that is configured to perform discovery of whether the reconfigurable processors include at least one of the first reconfigurable processors and whether the reconfigurable processors include at least one of the second reconfigurable processors. The runtime processor starts execution of the first and second configuration files in dependence upon the discovery of the auto-discovery module.

APPARATUS AND METHOD FOR DYNAMIC CONTROL OF MICROPROCESSOR CONFIGURATION

An apparatus and method for intelligently scheduling threads across a plurality of logical processors. For example, one embodiment of a processor comprises: a plurality of cores; one or more peripheral component interconnects to couple the plurality of cores to memory, and in response to a core configuration command to deactivate a core of the plurality of cores, a region within the memory is updated with an indication of deactivation of the core.