G06F9/3897

Hybrid block-based processor and custom function blocks

Apparatus and methods are disclosed for implementing block-based processors having custom function blocks, including field-programmable gate array (FPGA) implementations. In some examples of the disclosed technology, a dynamically configurable scheduler is configured to issue at least one block-based processor instruction. A custom function block is configured to receive input operands for the instruction and generate ready state data indicating completion of a computation performed for the instruction by the respective custom function block.

DISTRIBUTED ERROR DETECTION AND CORRECTION WITH HAMMING CODE HANDOFF
20220283942 · 2022-09-08

A device includes a data path, a first interface connected to the data path and configured to receive a request from a processor package to write a data value to a memory address, and a controller connected to the data path and configured to receive the request to write the data value to the memory address and to calculate a Hamming code of the data value. The controller is configured to transmit the data value and the Hamming code on the data path. The device includes an external memory interleave connected to the data path. The external memory interleave is configured to receive the data value and calculate a test Hamming code of the data value and to determine whether to send the data value to an external memory interface to be written to the memory address based on a comparison of the Hamming code and the test Hamming code.
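The handoff described above can be sketched as two independent Hamming calculations whose results are compared before the write proceeds. This is a minimal illustration only: the function names, the 8-bit data width, and the choice to represent the Hamming code as the XOR of codeword positions are assumptions, not details taken from the abstract.

```python
def hamming_code(value: int, width: int = 8) -> int:
    """Return Hamming check bits for `value` (assumed layout).

    Data bits are placed at the 1-indexed codeword positions that are not
    powers of two (3, 5, 6, 7, 9, ...). The check bits equal the XOR of the
    positions that hold a set data bit.
    """
    code = 0
    position = 1
    for bit in range(width):
        position += 1
        # Skip positions reserved for check bits (powers of two).
        while position & (position - 1) == 0:
            position += 1
        if (value >> bit) & 1:
            code ^= position
    return code


def controller_send(value: int):
    """Hypothetical controller step: attach a Hamming code to the data value."""
    return value, hamming_code(value)


def interleave_should_write(value: int, code: int) -> bool:
    """Hypothetical interleave step: recompute a test code and compare."""
    return hamming_code(value) == code


# Usage: an intact value passes the check; a value corrupted on the data
# path by a single bit flip does not.
value, code = controller_send(0b10110010)
intact_ok = interleave_should_write(value, code)
corrupted_ok = interleave_should_write(value ^ 0b100, code)
```

Because every data bit maps to a distinct nonzero position, any single-bit flip in the data value changes the recomputed code, so the comparison fails and the write can be withheld.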

Methods, systems and apparatus for adjusting a data path element of a neural network accelerator from convolution mode to pooling mode

Methods, apparatus, systems, and articles of manufacture are disclosed to improve convolution efficiency of a convolution neural network (CNN) accelerator. An example hardware accelerator includes a hardware data path element (DPE) in a DPE array, the hardware DPE including an accumulator, and a multiplier coupled to the accumulator, the multiplier to multiply first inputs including an activation value and a filter coefficient value to generate a first convolution output when the hardware DPE is in a convolution mode, and a controller coupled to the DPE array, the controller to adjust the hardware DPE from the convolution mode to a pooling mode by causing at least one of the multiplier or the accumulator to generate a second convolution output based on second inputs, the second inputs including an output location value of a pool area, at least one of the first inputs different from at least one of the second inputs.
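One way to picture a data path element that serves both modes is a single multiply-accumulate unit whose inputs change with the mode. The sketch below is an assumption-laden illustration: the class and function names are hypothetical, average pooling is chosen arbitrarily as the pooling operation, and the output location value mentioned in the abstract is not modeled.

```python
class DataPathElement:
    """Hypothetical DPE: one multiplier feeding one accumulator."""

    def __init__(self):
        self.mode = "convolution"
        self.accumulator = 0.0

    def set_mode(self, mode):
        """Controller action: switch mode and clear the accumulator."""
        if mode not in ("convolution", "pooling"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.accumulator = 0.0

    def mac(self, a, b):
        """Multiply two inputs and add the product to the accumulator."""
        self.accumulator += a * b
        return self.accumulator


def convolve(dpe, activations, coefficients):
    """Convolution mode: multiply activations by filter coefficients."""
    dpe.set_mode("convolution")
    for activation, coefficient in zip(activations, coefficients):
        dpe.mac(activation, coefficient)
    return dpe.accumulator


def average_pool(dpe, pool_values):
    """Pooling mode (assumed to be average pooling): reuse the same
    multiplier and accumulator, scaling each value by 1 / pool size."""
    dpe.set_mode("pooling")
    scale = 1.0 / len(pool_values)
    for value in pool_values:
        dpe.mac(value, scale)
    return dpe.accumulator
```

The point of the sketch is only that the same hardware resources produce both outputs; the inputs fed to the multiplier differ between the two modes.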

MULTI-PROCESSOR, MULTI-DOMAIN, MULTI-PROTOCOL, CACHE COHERENT, SPECULATION AWARE SHARED MEMORY AND INTERCONNECT

A device includes an interconnect and a plurality of devices connected to the interconnect. The plurality of devices includes a first interface connected to the interconnect and a second interface connected to the interconnect. The plurality of devices further includes a first memory bank connected to the interconnect and a second memory bank connected to the interconnect. The plurality of devices further includes an external memory interface connected to the interconnect and a controller configured to establish virtual channels among the plurality of devices connected to the interconnect.

Credit aware central arbitration for multi-endpoint, multi-core system

A device includes a data path, a first interface configured to receive a first memory access request from a first peripheral device, and a second interface configured to receive a second memory access request from a second peripheral device. The device further includes an arbiter circuit configured to determine a first destination device connected to the data path and associated with the first memory access request and a first credit threshold corresponding to the first memory access request. The arbiter circuit is further configured to determine a second destination device connected to the data path and associated with the second memory access request and a second credit threshold corresponding to the second memory access request. The arbiter circuit is configured to arbitrate access to the data path by the first memory access request and the second memory access request based on the first credit threshold and the second credit threshold.
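A rough sketch of credit-aware arbitration follows. It assumes that a request is eligible only when its destination currently holds at least the request's credit threshold, and that ties are broken by request order. The `Request` type, the `arbitrate` function, and the device names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Request:
    source: str            # peripheral that issued the request
    destination: str       # destination device on the data path
    credit_threshold: int  # credits the destination must have available


def arbitrate(requests: List[Request],
              credits: Dict[str, int]) -> Optional[Request]:
    """Grant the data path to the first request whose destination has
    enough credits (assumed policy); return None if none is eligible."""
    for request in requests:
        if credits.get(request.destination, 0) >= request.credit_threshold:
            return request
    return None


# Usage: the first request is skipped because its destination is short
# of credits, so the second request wins the data path.
credits = {"bank0": 1, "ext_mem": 4}
pending = [
    Request("peripheral_a", "bank0", credit_threshold=2),
    Request("peripheral_b", "ext_mem", credit_threshold=3),
]
winner = arbitrate(pending, credits)
```

Checking credits centrally, before granting the data path, keeps a request bound for a congested destination from blocking one whose destination is ready.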

Adaptive credit-based replenishment threshold used for transaction arbitration in a system that supports multiple levels of credit expenditure
11429527 · 2022-08-30

A device includes an arbiter circuit configured to receive a first request for a resource. The first request is associated with a first credit cost. The arbiter circuit is further configured to receive a second request for the resource. The second request is associated with a second credit cost. The arbiter circuit is further configured to select the first request for the resource as an arbitration winner. The arbiter circuit is further configured to decrement a number of available credits associated with the resource by the first credit cost. The arbiter circuit is further configured to, in response to the number of available credits associated with the resource falling to a lower credit threshold, wait until the number of available credits associated with the resource reaches an upper credit threshold to select an additional arbitration winner for the resource.
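The lower and upper thresholds form a hysteresis loop, which can be sketched as follows. The class name, the method names, and the first-come selection policy are assumptions made for illustration; the abstract does not say how the winner is chosen or how the thresholds are adapted.

```python
class CreditArbiter:
    """Hypothetical arbiter with lower/upper credit thresholds."""

    def __init__(self, credits, lower, upper):
        self.credits = credits
        self.lower = lower
        self.upper = upper
        self.waiting = False  # True while waiting for replenishment

    def replenish(self, amount):
        """Return credits to the pool, e.g. when the resource frees up."""
        self.credits += amount

    def select(self, requests):
        """Pick a winner from (name, credit_cost) pairs, or None.

        Assumed policy: the first request wins. Once credits fall to the
        lower threshold, no winner is selected until credits climb back
        to the upper threshold.
        """
        if self.waiting:
            if self.credits < self.upper:
                return None
            self.waiting = False
        if not requests:
            return None
        name, cost = requests[0]
        self.credits -= cost
        if self.credits <= self.lower:
            self.waiting = True
        return name
```

Waiting for the upper threshold, rather than resuming as soon as any credit returns, ensures that a later high-cost request can be served without immediately exhausting the pool again.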

Multicore, multibank, fully concurrent coherence controller

A system includes a multi-core shared memory controller (MSMC). The MSMC includes a snoop filter bank, a cache tag bank, and a memory bank. The cache tag bank is connected to both the snoop filter bank and the memory bank. The MSMC further includes a first coherent slave interface connected to a data path that is connected to the snoop filter bank. The MSMC further includes a second coherent slave interface connected to the data path that is connected to the snoop filter bank. The MSMC further includes an external memory master interface connected to the cache tag bank and the memory bank. The system further includes a first processor package connected to the first coherent slave interface and a second processor package connected to the second coherent slave interface. The system further includes an external memory device connected to the external memory master interface.

CONFIGURABLE CACHE FOR MULTI-ENDPOINT HETEROGENEOUS COHERENT SYSTEM
20220229779 · 2022-07-21

A device includes a memory bank. The memory bank includes data portions of a first way group. The data portions of the first way group include a data portion of a first way of the first way group and a data portion of a second way of the first way group. The memory bank further includes data portions of a second way group. The device further includes a configuration register and a controller configured to individually allocate, based on one or more settings in the configuration register, the first way and the second way to one of an addressable memory space and a data cache.
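Per-way allocation from a configuration register might look like the sketch below. The register layout is an assumption: one bit per way, where 1 selects the data cache and 0 selects addressable memory. The function name is hypothetical.

```python
from typing import List


def allocate_ways(config_register: int, num_ways: int) -> List[str]:
    """Decode one configuration bit per way (assumed layout).

    Bit w set   -> way w serves as data cache.
    Bit w clear -> way w is exposed as addressable memory.
    """
    return [
        "data cache" if (config_register >> way) & 1 else "addressable memory"
        for way in range(num_ways)
    ]


# Usage: within a two-way group, way 0 is cache and way 1 is addressable.
first_way_group = allocate_ways(0b01, num_ways=2)
```

Decoding each way independently is what allows two ways of the same way group to end up in different roles.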

Cognitively adaptable front-end with FPNA enabled integrated network executive
11387863 · 2022-07-12

A system-in-package or multi-chip module architecture includes a field programmable neural array that instantiates a neural network model trained to receive internal observation signals from a digitally controlled integrated circuit and rapidly generate corresponding system settings to optimize desired characteristics. The system continuously adapts system settings to environmental conditions via a feedback loop of observation signals. The field programmable neural array may also receive observation signals external to the module to generate system settings based on factors not otherwise definable by internal signal characteristics.

MULTICORE SHARED CACHE OPERATION ENGINE

Techniques include receiving configuration information for a trigger control channel of one or more trigger control channels, the configuration information defining a first one or more triggering events; receiving a first memory management command; storing the first memory management command; detecting the first one or more triggering events; and triggering the stored first memory management command based on the detected first one or more triggering events.