G06F13/1673

Data through gateway

A gateway for use in a computing system to interface a host with a subsystem that acts as a work accelerator to the host, the gateway having: an accelerator interface for connection to the subsystem to enable transfer of batches of data between the subsystem and the gateway; a data connection interface for connection to external storage for exchanging data between the gateway and the storage; a gateway interface for connection to at least one second gateway; a memory interface connected to a local memory associated with the gateway; and a streaming engine for controlling the streaming of batches of data into and out of the gateway in response to pre-compiled data exchange synchronisation points attained by the subsystem, wherein the batches of data are selectively streamed via at least one of the accelerator interface, data connection interface, gateway interface and memory interface.

Semiconductor memory systems with on-die data buffering

A semiconductor memory system includes a first semiconductor memory die and a second semiconductor memory die. The first semiconductor memory die includes a primary data interface to receive an input data stream during write operations and to deserialize the input data stream into a first plurality of data streams, and also includes a secondary data interface, coupled to the primary data interface, to transmit the first plurality of data streams. The second semiconductor memory die includes a secondary data interface, coupled to the secondary data interface of the first semiconductor memory die, to receive the first plurality of data streams.

State buffer memloc reshaping

A computer-implemented method includes identifying, from instruction code for executing by a computing system to implement a neural network, a first instruction for allocating a first region of a local memory of an accelerator of the computing system to a tensor, and a first direct memory access (DMA) load instruction for loading the tensor from a location of a system memory of the computing system to a second region of the local memory; adding a first tensor copy instruction in the instruction code to save the tensor in the first region of the local memory to a third region of the local memory that has dimensions different from dimensions of the first region; and replacing the first DMA load instruction with a second tensor copy instruction for saving data in the third region of the local memory to the second region of the local memory.

Data storage device and method for adaptive command completion posting
11487434 · 2022-11-01

Systems and methods for dynamic and adaptive interrupt coalescing are disclosed. NVM Express (NVMe) implements a paired submission queue and completion queue mechanism, with host software on the host device placing commands into the submission queue. The memory device notifies the host device, via an interrupt, of entries on the completion queue. Responsive to receiving the interrupt, the host device accesses the completion queue to retrieve the entries placed there by the memory device. The host device may take a certain amount of time to service the interrupt, resulting in host latency. Given knowledge of the host latency, the memory device times the sending of the interrupt so that the memory device may post the entry to the completion queue in a timely manner.
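
The timing idea can be illustrated with a minimal sketch. It assumes one reading of the abstract: the device raises the interrupt early by the known host latency, so that the completion entry is posted by the time the host starts reading the queue. The function name, parameters, and time units are hypothetical and do not come from the patent.

```python
def interrupt_send_time(entry_ready_at: float, host_latency: float, now: float = 0.0) -> float:
    """Return the time at which the device could raise the interrupt.

    Assumption (not from the source): the host needs `host_latency` time
    units to service the interrupt, so the interrupt is sent that much
    earlier than the moment the completion entry will be posted.
    The result is never earlier than the current time `now`.
    """
    return max(now, entry_ready_at - host_latency)


# Entry will be posted at t=100 and the host takes 30 units to react,
# so the interrupt can go out at t=70.
early = interrupt_send_time(entry_ready_at=100.0, host_latency=30.0)

# If the entry is ready sooner than the host can react, send right away.
immediate = interrupt_send_time(entry_ready_at=10.0, host_latency=30.0, now=5.0)
```

How the device learns the host latency is left out of this sketch; the abstract only states that the latency is known.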

Systems, methods, and apparatus to enable data aggregation and adaptation in hardware acceleration subsystems

Methods, apparatus, systems, and articles of manufacture are disclosed herein to enable data aggregation and pattern adaptation in hardware acceleration subsystems. In some examples, a hardware acceleration subsystem includes a first scheduler, a first hardware accelerator coupled to the first scheduler to process at least a first data element and a second data element, and a first load store engine coupled to the first hardware accelerator, the first load store engine configured to communicate with the first scheduler at a superblock level by sending a done signal to the first scheduler in response to determining that a block count is equal to a first BPR value and aggregate the first data element and the second data element based on the first BPR value to generate a first aggregated data element.

Data storage system capable of performing interleaving scatter transmissions or interleaving gather transmissions
11487659 · 2022-11-01

A data storage system includes a first memory, a second memory, and a memory controller. The memory controller transmits a first data segment from the first memory to the second memory according to an initial address, adds a first interval value to the initial address to generate a succeeding address, and updates a stream number. When the stream number has not reached a target stream number, the memory controller transmits a second data segment from the first memory to the second memory according to the succeeding address, and updates the stream number. When the stream number has reached the target stream number, the memory controller sets the stream number to an initial value, adds an offset value to the initial address to update the succeeding address, and transmits a third data segment from the first memory to the second memory according to the updated succeeding address.
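
The address sequence described above can be sketched as follows. This is an illustrative model only: it assumes the interval is added again for each further stream within a round, and that each new round starts one more offset past the previous round's start. All names are hypothetical.

```python
def interleaved_addresses(initial_address: int, interval: int, offset: int,
                          target_stream_number: int, segment_count: int) -> list[int]:
    """Generate source addresses for an interleaved scatter/gather transfer."""
    addresses = []
    round_base = initial_address   # start address of the current round
    address = initial_address      # address of the next segment to transmit
    stream_number = 0              # initial value of the stream number
    for _ in range(segment_count):
        addresses.append(address)  # "transmit" one data segment
        stream_number += 1         # update the stream number
        if stream_number < target_stream_number:
            # Target not reached: step to the next stream.
            address += interval
        else:
            # Target reached: reset the counter and shift the base by the offset.
            stream_number = 0
            round_base += offset
            address = round_base
    return addresses


# Three streams spaced 0x100 apart, advancing 0x10 per round, six segments.
sequence = interleaved_addresses(0x1000, 0x100, 0x10, 3, 6)
```

With these inputs the first round visits 0x1000, 0x1100, 0x1200 and the second round visits 0x1010, 0x1110, 0x1210.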

APPARATUS AND METHOD FOR DATA COMMUNICATIONS BETWEEN NON-VOLATILE MEMORY DEVICES AND A MEMORY CONTROLLER
20220350762 · 2022-11-03

A data communication apparatus includes a transceiver coupled to a data path and configured to transmit or receive data through the data path; and an interrupt circuit coupled to an interrupt path corresponding to the data path and configured to determine whether to allow any apparatus to occupy the data path. The interrupt circuit generates an interrupt signal for preventing another apparatus from accessing the data path, in response to an activation signal for transmitting or receiving the data through the transceiver.

Sideband Information Over Host Interface Considering Link States
20230090103 · 2023-03-23

A data storage device includes a memory device and a controller coupled to the memory device. The controller is configured to create one or more thresholds for sending sideband information to a host device, determine that a link state is in a state other than L0, retain sideband information until the one or more thresholds are reached, and send the sideband information to the host device upon reaching the one or more thresholds for a corresponding link state. The one or more thresholds correspond to a link state between the host device and the data storage device. The thresholds are based on an amount of sideband information retained, a time of retaining sideband information, or a combination of the two. The sideband information is retained and sent in a first-in first-out order.
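
A minimal sketch of the retention logic, under stated assumptions: thresholds are kept per link state as a maximum item count and a maximum age, reaching either one triggers a flush, and in L0 the information is sent immediately. The class names, field names, and the "L1" example state are hypothetical illustrations, not the patented design.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Threshold:
    max_items: int   # amount of retained sideband information that triggers a send
    max_age: float   # retention time of the oldest item that triggers a send


class SidebandBuffer:
    def __init__(self, thresholds: dict[str, Threshold]):
        self.thresholds = thresholds   # one threshold set per non-L0 link state
        self.queue = deque()           # FIFO of (timestamp, message)

    def submit(self, message: str, link_state: str, now: float) -> list[str]:
        """Retain a message, then send whatever the thresholds allow."""
        self.queue.append((now, message))
        return self.poll(link_state, now)

    def poll(self, link_state: str, now: float) -> list[str]:
        """Return the messages to send now, oldest first (may be empty)."""
        if not self.queue:
            return []
        if link_state != "L0":
            limit = self.thresholds[link_state]
            oldest_age = now - self.queue[0][0]
            if len(self.queue) < limit.max_items and oldest_age < limit.max_age:
                return []              # neither threshold reached: keep retaining
        sent = [message for _, message in self.queue]
        self.queue.clear()
        return sent


buf = SidebandBuffer({"L1": Threshold(max_items=3, max_age=10.0)})
first = buf.submit("a", "L1", now=0.0)    # retained
second = buf.submit("b", "L1", now=1.0)   # retained
third = buf.submit("c", "L1", now=2.0)    # count threshold reached, flush in FIFO order
```

Keying the thresholds by link state lets a deeper low-power state use a larger batch or a longer hold time than a shallow one.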

ARITHMETIC PROCESSING DEVICE AND MEMORY ACCESS METHOD
20230089332 · 2023-03-23

A memory is accessed based on a plurality of memory access requests that have different data read sizes. A memory access method includes outputting each of read commands corresponding to the plurality of memory access requests to a memory at a timing that avoids conflict of read data output from the memory; generating an output start timing of the data read from the memory to an outside; retaining the data read from the memory in each of a plurality of buffers, and causing any of the plurality of buffers to output data based on the output start timing; and delaying, in a case of receiving a subsequent memory access request during execution of memory access corresponding to a preceding memory access request, the output start timing of data from the buffer corresponding to the subsequent memory access request from the output start timing of data from the buffer corresponding to the preceding memory access request.

METHOD AND APPARATUS FOR PAGE VALIDITY MANAGEMENT AND RELATED STORAGE SYSTEM
20220342811 · 2022-10-27

A method of performing a garbage collection operation on a source block includes: performing a plurality of partial page clean operations during a series of host write operations. Each partial page clean operation includes: performing a validity check process within a partitioned searching range of the source block to obtain valid page information; and performing a page clean process according to the valid page information and a target clean page number to read valid pages indicated by the valid page information.
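
One partial page clean step might be modelled as below. This is a sketch under assumptions: page validity is given as a list of booleans, the partitioned searching range is a fixed-size window that advances on each call, and valid pages found beyond the target clean page number are carried over to the next call. None of these names or choices come from the patent.

```python
def partial_page_clean(valid_bitmap: list[bool], search_start: int, range_size: int,
                       target_clean_pages: int, pending: list[int]):
    """Run one partial clean step on a source block.

    Returns (pages_to_read, still_pending, next_search_start).
    """
    # Validity check restricted to the partitioned searching range.
    search_end = min(search_start + range_size, len(valid_bitmap))
    found = [page for page in range(search_start, search_end) if valid_bitmap[page]]
    candidates = pending + found
    # Page clean: read at most the target number of valid pages in this step.
    pages_to_read = candidates[:target_clean_pages]
    still_pending = candidates[target_clean_pages:]
    return pages_to_read, still_pending, search_end


# An 8-page source block; pages 0, 2, 3, 6 and 7 still hold valid data.
bitmap = [True, False, True, True, False, False, True, True]

# Two partial steps, each of which would sit between host write operations.
step1 = partial_page_clean(bitmap, search_start=0, range_size=4,
                           target_clean_pages=2, pending=[])
step2 = partial_page_clean(bitmap, search_start=step1[2], range_size=4,
                           target_clean_pages=2, pending=step1[1])
```

Bounding both the search window and the number of pages read per step keeps each step short, which is what allows the work to be spread across a series of host writes.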