G06F15/80

METHODS AND APPARATUS TO FACILITATE READ-MODIFY-WRITE SUPPORT IN A COHERENT VICTIM CACHE WITH PARALLEL DATA PATHS

Methods, apparatus, systems and articles of manufacture are disclosed facilitate read-modify-write support in a coherent victim cache with parallel data paths. An example apparatus includes a random-access memory configured to be coupled to a central processing unit via a first interface and a second interface, the random-access memory configured to obtain a read request indicating a first address to read via a snoop interface, an address encoder coupled to the random-access memory, the address encoder to, when the random-access memory indicates a hit of the read request, generate a second address corresponding to a victim cache based on the first address, and a multiplexer coupled to the victim cache to transmit a response including data obtained from the second address of the victim cache.

AGGRESSIVE WRITE FLUSH SCHEME FOR A VICTIM CACHE
20230004500 · 2023-01-05 ·

A caching system including a first sub-cache and a second sub-cache in parallel with the first sub-cache, wherein the second sub-cache includes: line type bits configured to store an indication that a corresponding cache line of the second sub-cache is configured to store write-miss data, and an eviction controller configured to evict a cache line of the second sub-cache storing write-miss data based on an indication that the cache line has been fully written.

COMPUTATIONAL MEMORY WITH COOPERATION AMONG ROWS OF PROCESSING ELEMENTS AND MEMORY THEREOF
20230004522 · 2023-01-05 ·

A computing device includes an array of processing elements mutually connected to perform single instruction multiple data (SIMD) operations, memory cells connected to each processing element to store data related to the SIMD operations, and a cache connected to each processing element to cache data related to the SIMD operations. Caches of adjacent processing elements are connected. The same or another computing device includes rows of mutually connected processing elements to share data. The computing device further includes a row arithmetic logic unit (ALU) at each row of processing elements. The row ALU of a respective row is configured to perform an operation with processing elements of the respective row.

MAPPING LOGICAL AND PHYSICAL PROCESSORS AND LOGICAL AND PHYSICAL MEMORY
20230237011 · 2023-07-27 ·

A mapping may be made between an array of physical processors and an array of functional logical processors. Also, a mapping may be made between logical memory channels (associated with the logical processors) and functional physical memory channels (associated with the physical processors). These mappings may be stored within one or more tables, which may then be used to bypass faulty processors and memory channels when implementing memory accesses, while optimizing locality (e.g., by minimizing the proximity of memory channels to processors).

MAPPING LOGICAL AND PHYSICAL PROCESSORS AND LOGICAL AND PHYSICAL MEMORY
20230237011 · 2023-07-27 ·

A mapping may be made between an array of physical processors and an array of functional logical processors. Also, a mapping may be made between logical memory channels (associated with the logical processors) and functional physical memory channels (associated with the physical processors). These mappings may be stored within one or more tables, which may then be used to bypass faulty processors and memory channels when implementing memory accesses, while optimizing locality (e.g., by minimizing the proximity of memory channels to processors).

3D Convolutional Neural Network (CNN) Implementation on Systolic Array-Based FPGA Overlay CNN Accelerator

Integrated circuit devices, methods, and circuitry are provided for enabling FPGA-based two-dimensional (2D) systolic array CNN accelerators to operate on three-dimensional (3D) input data having an extra dimension in temporal or spatial dimension. Technology, methods, and circuity for three-dimensional (3D) convolution, 3D folding, and 3D pooling are provided for the 3D CNN accelerators. A depth counter is provided to feed 3D input data and filter data through the 2D CNN accelerator to produce a 3D CNN accelerator that can efficiently operate on 3D input data.

System of Heterogeneous Reconfigurable Processors for the Data-Parallel Execution of Applications

A system for a data-parallel execution of at least two implementations of an application on reconfigurable processors with different layouts is presented. The system comprises a pool of reconfigurable data flow resources with data transfer resources that interconnect first and second reconfigurable processors having first and second layouts that impose respective first and second constraints for the data-parallel execution of the application. The system further comprises an archive of configuration files and a host system that is operatively coupled to the first and second reconfigurable processors. The host system comprises first and second compilers that generate for the application, based on the respective first and second constraints, first and second configuration files that are stored in the archive of configuration files and adapted to be executed data-parallel compatible on respective first and second reconfigurable processors.

System of Heterogeneous Reconfigurable Processors for the Data-Parallel Execution of Applications

A system for a data-parallel execution of at least two implementations of an application on reconfigurable processors with different layouts is presented. The system comprises a pool of reconfigurable data flow resources with data transfer resources that interconnect first and second reconfigurable processors having first and second layouts that impose respective first and second constraints for the data-parallel execution of the application. The system further comprises an archive of configuration files and a host system that is operatively coupled to the first and second reconfigurable processors. The host system comprises first and second compilers that generate for the application, based on the respective first and second constraints, first and second configuration files that are stored in the archive of configuration files and adapted to be executed data-parallel compatible on respective first and second reconfigurable processors.

Fuseload architecture for system-on-chip reconfiguration and repurposing

Methods, systems, and devices that support fuseload architectures for system-on-chip (SoC) reconfiguration and repurposing are described. Trim data may be loaded from fuses to registers on a die based on a fuse header. For example, a set of registers coupled with a set of fuses on the die may be identified, where the set of fuses may store trim data to be copied to the registers as part of a fuseload procedure. In such cases, one or more fuse headers may be identified within the trim data, and each fuse header may correspond to a fuse group that includes a subset of fuses. Based on one or more subfields within a fuse header, a mapping between fuse addresses and register addresses may be determined, and the trim data from each fuse group may be copied into a set of registers based on the mapping.

HYBRID MACHINE LEARNING ARCHITECTURE WITH NEURAL PROCESSING UNIT AND COMPUTE-IN-MEMORY PROCESSING ELEMENTS
20230025068 · 2023-01-26 ·

Methods and apparatus for performing machine learning tasks, and in particular, a hybrid architecture that includes both neural processing unit (NPU) and compute-in-memory (CIM) elements. One example neural-network-processing circuit generally includes a plurality of CIM processing elements (PEs), a plurality of neural processing unit (NPU) PEs, and a bus coupled to the plurality of CIM PEs and to the plurality of NPU PEs. One example method for neural network processing generally includes processing data in a neural-network-processing circuit comprising a plurality of CIM PEs, a plurality of NPU PEs, and a bus coupled to the plurality of CIM PEs and to the plurality of NPU PEs; and transferring the processed data between at least one of the plurality of CIM PEs and at least one of the plurality of NPU PEs via the bus.