G06F15/8038

Accelerator and electronic device including the same

An accelerator includes: a memory configured to store input data; a plurality of shift buffers each configured to shift input data received sequentially from the memory in each cycle, and in response to input data being stored in each of internal elements of the shift buffer, output the stored input data to a processing element (PE) array; a plurality of backup buffers each configured to store input data received sequentially from the memory and transfer the stored input data to one of the shift buffers; and the PE array configured to perform an operation on input data received from one or more of the shift buffers and on a corresponding kernel.

HIGH PERFORMANCE PROCESSOR FOR LOW-WAY AND HIGH-LATENCY MEMORY INSTANCES
20240143457 · 2024-05-02 · ·

Distributed processors and methods for compiling code for execution by distributed processors are disclosed. In one implementation, a distributed processor may include a substrate; a memory array disposed on the substrate; and a processing array disposed on the substrate. The memory array may include a plurality of discrete memory banks, and the processing array may include a plurality of processor subunits, each one of the processor subunits being associated with a corresponding, dedicated one of the plurality of discrete memory banks. The distributed processor may further include a first plurality of buses, each connecting one of the plurality of processor subunits to its corresponding, dedicated memory bank, and a second plurality of buses, each connecting one of the plurality of processor subunits to another of the plurality of processor subunits.

Write-through detection for a memory circuit with an analog bypass portion

The circuit includes a memory array arranged as rows and columns of memory cells. An array portion stores a respective memory word in a given one of the rows in response to a word-write signal corresponding to a write address of the given one of the rows and in response to a plurality of bit-write signals associated with the plurality of columns, and reads a respective memory word from a given one of the rows in response to a word-read signal corresponding to a read address of the given one of the rows and in response to a plurality of bit-read signals associated with the plurality of columns. The circuit also includes a write-through detection system that activates an analog bypass portion to read the memory word from the analog bypass portion in response to the read address being the same as the write address.

System for executing an application on heterogeneous reconfigurable processors

A system for executing an application on a pool of reconfigurable processors with first and second reconfigurable processors having first and second architectures that are different from each other is presented. The system comprises an archive of configuration files with first and second configuration files for executing the application on the first and second reconfigurable processors, respectively, and a host system that is operatively coupled to the first and second reconfigurable processors. The host system comprises a runtime processor that allocates reconfigurable processors for executing the application and an auto-discovery module that is configured to perform discovery of whether the reconfigurable processors include at least one of the first reconfigurable processors and whether the reconfigurable processors include at least one of the second reconfigurable processors. The runtime processor starts execution of the first and second configuration files in dependence upon the discovery of the auto-discovery module.

Multi-core processor including a master core performing tasks involving operating system kernel-related features on behalf of slave cores

A multi-core processor comprises a plurality of slave cores, the slave cores being without operating system kernel-related features, and the slave cores to execute respective instructions. A master core configures the operating system kernel-related features on behalf of the slave cores. The master core is to control usage of the operating system kernel-related features during execution of the instructions on the respective slave cores.

MEMORY CIRCUIT WITH ANALOG BYPASS PORTION

One example includes a memory circuit. The circuit includes a memory array arranged as rows and columns of memory cells. An array portion stores a respective memory word in a given one of the rows in response to a word-write signal corresponding to a write address of the given one of the rows and in response to a plurality of bit-write signals associated with the plurality of columns, and reads a respective memory word from a given one of the rows in response to a word-read signal corresponding to a read address of the given one of the rows and in response to a plurality of bit-read signals associated with the plurality of columns. The circuit also includes a write-through detection system that activates an analog bypass portion to read the memory word from the analog bypass portion in response to the read address being equal to the write address.

In-memory associative processing for vectors

Methods, systems, and devices for in-memory associative processing for vectors are described. A device may perform a computational operation on a first set of contiguous bits of a first vector and a first set of contiguous bits of a second vector. The first sets of contiguous bits may be stored in a first plane of a memory die and the computational operation may be based on a truth table for the computational operation. The device may perform a second computational operation on a second set of contiguous bits of the first vector and a second set of contiguous bits of the second vector. The second sets of contiguous bits may be stored in a second plane of the memory die and the computational operation based on the truth table for the computational operation.

Processing System With Interspersed Processors With Multi-Layer Interconnection
20190155666 · 2019-05-23 ·

Embodiments of a multi-processor array are disclosed that may include a plurality of processors and configurable communication elements coupled together in a interspersed arrangement. Each configurable communication element may include a local memory and a plurality of routing engines. The local memory may be coupled to a subset of the plurality of processors. Each routing engine may be configured to receive one or more messages from a plurality of sources, assign each received message to a given destination of a plurality of destinations dependent upon configuration information, and forward each message to assigned destination. The plurality of destinations may include the local memory, and routing engines included in a subset of the plurality of configurable communication elements.

PROCESSORS, METHODS, AND SYSTEMS FOR DEBUGGING A CONFIGURABLE SPATIAL ACCELERATOR
20190095383 · 2019-03-28 ·

Systems, methods, and apparatuses relating to debugging a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform an operation by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements. At least a first of the plurality of processing elements is to enter a halted state in response to being represented as a first of the plurality of dataflow operators.

Processing system with interspersed processors with multi-layer interconnection

Embodiments of a multi-processor array are disclosed that may include a plurality of processors and configurable communication elements coupled together in a interspersed arrangement. Each configurable communication element may include a local memory and a plurality of routing engines. The local memory may be coupled to a subset of the plurality of processors. Each routing engine may be configured to receive one or more messages from a plurality of sources, assign each received message to a given destination of a plurality of destinations dependent upon configuration information, and forward each message to assigned destination. The plurality of destinations may include the local memory, and routing engines included in a subset of the plurality of configurable communication elements.