IPIQ

G06F15/80

Control barrier network for reconfigurable data processors

11580056 · 2023-02-14 ·

SambaNova Systems, Inc.

A processing system comprises a control bus and a plurality of logic units. The control bus is configurable by configuration data to form signal routes in a control barrier network coupled to processing units in an array of processing units. The plurality of logic units has inputs and outputs connected to the control bus and to the array of processing units. A logic unit in the plurality of logic units is operatively coupled to a processing unit in the array of processing units and is configurable by the configuration data to consume source tokens and a status signal from the processing unit on the inputs and to produce barrier tokens and an enable signal on the outputs based on the source tokens and the status signal on the inputs.

Multi-port memory architecture for a systolic array

11580059 · 2023-02-14 ·

Marvell Asia Pte. Ltd.

A memory architecture and a processing unit that incorporates the memory architecture and a systolic array. The memory architecture includes: memory array(s) with multi-port (MP) memory cells; first wordlines connected to the cells in each row; and, depending upon the embodiment, second wordlines connected to diagonals of cells or diagonals of sets of cells. Data from a data input matrix is written to the memory cells during first port write operations using the first wordlines and read out from the memory cells during second port read operations using the second wordlines. Due to the diagonal orientation of the second wordlines and due to additional features (e.g., additional rows of memory cells that store static zero data values or read data mask generators that generate read data masks), data read from the memory architecture and input directly into a systolic array is in the proper order, as specified by a data setup matrix.

System having a hybrid threading processor, a hybrid threading fabric having configurable computing elements, and a hybrid interconnection network

11579887 · 2023-02-14 ·

Micron Technology, Inc.

Tony M. Brewer

Representative apparatus, method, and system embodiments are disclosed for configurable computing. In a representative embodiment, a system includes an interconnection network, a processor, a host interface, and a configurable circuit cluster. The configurable circuit cluster may include a plurality of configurable circuits arranged in an array; an asynchronous packet network and a synchronous network coupled to each configurable circuit of the array; and a memory interface circuit and a dispatch interface circuit coupled to the asynchronous packet network and to the interconnection network. Each configurable circuit includes instruction or configuration memories for selection of a current data path configuration, a master synchronous network input, and a data path configuration for a next configurable circuit.

System having a hybrid threading processor, a hybrid threading fabric having configurable computing elements, and a hybrid interconnection network

11579887 · 2023-02-14 ·

Micron Technology, Inc.

Tony M. Brewer

Technologies for providing shared memory for accelerator sleds

11579788 · 2023-02-14 ·

Intel Corporation

Technologies for providing shared memory for accelerator sleds includes an accelerator sled to receive, with a memory controller, a memory access request from an accelerator device to access a region of memory. The request is to identify the region of memory with a logical address. Additionally, the accelerator sled is to determine from a map of logical addresses and associated physical address, the physical address associated with the region of memory. In addition, the accelerator sled is to route the memory access request to a memory device associated with the determined physical address.

Discrete Three-Dimensional Processor

20230038812 · 2023-02-09 ·

HangZhou HaiCun Information Technology Co., Ltd.

Guobiao Zhang

A discrete three-dimensional (3-D) processor comprises stacked first and second dice. The first die comprises 3-D memory (3D-M) arrays, whereas the second die comprises logic circuits and at least an off-die peripheral-circuit component of the 3D-M array(s). In one preferred embodiment, the first and second dice are face-to-face bonded. In another preferred embodiment, the first and second dice have a same die size.

Discrete Three-Dimensional Processor

20230039565 · 2023-02-09 ·

HangZhou HaiCun Information Technology Co., Ltd.

Guobiao Zhang

A discrete three-dimensional (3-D) processor comprises first and second dice. The first die comprises 3-D memory (3D-M) arrays, whereas the second die comprises logic circuits and at least an off-die peripheral-circuit component of the 3D-M array(s). Typical off-die peripheral-circuit component could be an address decoder, a sense amplifier, a programming circuit, a read-voltage generator, a write-voltage generator, a data buffer, or a portion thereof.

Discrete Three-Dimensional Processor

20230041616 · 2023-02-09 ·

HangZhou HaiCun Information Technology Co., Ltd.

Guobiao Zhang

A discrete three-dimensional (3-D) processor comprises stacked first and second dice. The first die comprises three-dimensional memory (3D-M) arrays, whereas the second die comprises at least a portion of a logic/processing circuit and an off-die peripheral-circuit component of the 3D-M array(s). The preferred 3-D processor can be used to compute non-arithmetic function/model. In other applications, the preferred 3-D processor may also be a 3-D configurable computing array, a 3-D pattern processor, or a 3-D neuro-processor.

OFFLOADING PROCESSING TASKS TO DECOUPLED ACCELERATORS FOR INCREASING PERFORMANCE IN A SYSTEM ON A CHIP

20230042858 · 2023-02-09 ·

In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

BUILT-IN SELF-TEST FOR A PROGRAMMABLE VISION ACCELERATOR OF A SYSTEM ON A CHIP

20230037738 · 2023-02-09 ·

Patent classifications

G06F15/80