G06F13/1621

State buffer memloc reshaping

A computer-implemented method includes identifying, from instruction code for executing by a computing system to implement a neural network, a first instruction for allocating a first region of a local memory of an accelerator of the computing system to a tensor, and a first direct memory access (DMA) load instruction for loading the tensor from a location of a system memory of the computing system to a second region of the local memory; adding a first tensor copy instruction in the instruction code to save the tensor in the first region of the local memory to a third region of the local memory that has dimensions different from dimensions of the first region; and replacing the first DMA load instruction with a second tensor copy instruction for saving data in the third region of the local memory to the second region of the local memory.
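
The rewrite described above can be sketched as a small compiler pass. This is a minimal illustration, not the claimed implementation: the `Instr` record, the region names, the `shapes` table, and the choice to place the first copy immediately before the replaced load are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    op: str        # "alloc", "dma_load", or "tensor_copy" (hypothetical opcodes)
    tensor: str    # tensor name
    src: str = ""  # source region or system-memory location
    dst: str = ""  # destination region in local memory

def replace_dma_with_copies(code, shapes, scratch):
    """Rewrite DMA loads of tensors that already live in local memory.

    shapes maps region name -> (rows, cols); scratch is the third region.
    """
    resident = {}  # tensor -> region named by its allocation instruction
    out = []
    for ins in code:
        if ins.op == "alloc":
            resident[ins.tensor] = ins.dst
            out.append(ins)
        elif (ins.op == "dma_load" and ins.tensor in resident
              and shapes[scratch] != shapes[resident[ins.tensor]]):
            first = resident[ins.tensor]
            # First copy: first region -> third region with different dimensions.
            out.append(Instr("tensor_copy", ins.tensor, src=first, dst=scratch))
            # Second copy stands in for the DMA load: third region -> second region.
            out.append(Instr("tensor_copy", ins.tensor, src=scratch, dst=ins.dst))
        else:
            out.append(ins)
    return out

code = [
    Instr("alloc", "t0", dst="r1"),
    Instr("dma_load", "t0", src="sysmem:0x1000", dst="r2"),
]
shapes = {"r1": (128, 64), "r2": (64, 128), "r3": (64, 128)}
rewritten = replace_dma_with_copies(code, shapes, scratch="r3")
```

After the pass, the instruction list no longer touches system memory for `t0`; the load is served by two local copies.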

IMMEDIATE OFFSET OF LOAD STORE AND ATOMIC INSTRUCTIONS

One embodiment provides a graphics processor including a processing resource including a register file, memory, a cache memory, and load/store/cache circuitry to process load, store, and prefetch messages from the processing resource. The circuitry includes support for an immediate address offset that will be used to adjust the address supplied for a memory access to be requested by the circuitry. Including support for the immediate address offset removes the need to execute additional instructions to adjust the address to be accessed prior to execution of the memory access instruction.
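
The saving can be illustrated with a toy interpreter. The instruction tuples, register names, and memory contents below are hypothetical and stand in for the graphics processor's actual message format, which the abstract does not specify.

```python
def run(program, regs, memory):
    """Execute (op, dst, src, imm) tuples; return how many instructions ran."""
    executed = 0
    for op, dst, src, imm in program:
        if op == "add":
            regs[dst] = regs[src] + imm          # separate address adjustment
        elif op == "load":
            regs[dst] = memory[regs[src] + imm]  # imm is the immediate address offset
        else:
            raise ValueError(f"unknown op: {op}")
        executed += 1
    return executed

memory = list(range(100, 200))

# Without immediate-offset support: adjust the address, then load.
regs_a = {"base": 10}
count_a = run([("add", "tmp", "base", 4), ("load", "val", "tmp", 0)], regs_a, memory)

# With immediate-offset support: the load carries the offset itself.
regs_b = {"base": 10}
count_b = run([("load", "val", "base", 4)], regs_b, memory)
```

Both sequences read the same location, but the second needs one instruction instead of two.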

HIGH-PERFORMANCE ON-CHIP MEMORY CONTROLLER
20230090429 · 2023-03-23

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling, by an on-chip memory controller, a plurality of hardware components that are configured to perform computations to access a shared memory. One such on-chip memory controller includes at least one backside arbitration controller communicatively coupled with a memory bank group and a first hardware component, wherein the at least one backside arbitration controller is configured to perform bus arbitrations to determine whether the first hardware component can access the memory bank group using a first memory access protocol; and a frontside arbitration controller communicatively coupled with the memory bank group and a second hardware component, wherein the frontside arbitration controller is configured to perform bus arbitrations to determine whether the second hardware component can access the memory bank group using a second memory access protocol different from the first memory access protocol.

DYNAMIC COMPRESSION FOR MULTIPROCESSOR PLATFORMS AND INTERCONNECTS

The present disclosure provides an interconnect for a non-uniform memory architecture platform to provide remote access where data can dynamically and adaptively be compressed and decompressed at the interconnect link. A requesting interconnect link can add a delay before transmitting requested data onto an interconnect bus, compress the data before transmission, or packetize and compress data before transmission. Likewise, a remote interconnect link can decompress request data.
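
One way to picture the adaptive behaviour is a sender that compresses only when the link is busy. The abstract does not state the decision rule, so the utilization threshold, the `(flag, body)` message shape, and the use of `zlib` below are assumptions chosen for illustration.

```python
import zlib

def send(payload: bytes, link_utilization: float, threshold: float = 0.7):
    """Return (compressed_flag, body); compress only on a busy link."""
    if link_utilization >= threshold:
        packed = zlib.compress(payload)
        if len(packed) < len(payload):  # keep compression only if it helps
            return (True, packed)
    return (False, payload)

def receive(message):
    """Remote side: undo compression when the flag says it was applied."""
    compressed, body = message
    return zlib.decompress(body) if compressed else body

data = b"a" * 4096
busy = send(data, link_utilization=0.9)
idle = send(data, link_utilization=0.1)
```

The flag travels with the data so the remote link knows whether to decompress.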

Hierarchical arbitration structure
11636056 · 2023-04-25

An apparatus including a plurality of set arbitration circuits and a die arbitration circuit. The set arbitration circuits may each be configured to receive first commands and second commands and comprise a bank circuit configured to queue bank data in response to client requests and a set arbitration logic configured to queue the second commands in response to the bank data. The die arbitration circuit may be configured to receive the commands from the set arbitration circuits and comprise a die-bank circuit configured to queue die data in response to the client requests and a die arbitration logic configured to queue the second commands in response to the die data. Queuing the bank data and the die data for the second commands may maintain an order of the client requests and prioritize the first commands corresponding to a current controller over the first commands corresponding to a non-current controller.

PREFETCHER TRAINING

An apparatus comprises a cache to store information, items of information in the cache being associated with addresses; cache lookup circuitry to perform lookups in the cache; and a prefetcher to prefetch items of information into the cache in advance of an access request being received for said items of information. The prefetcher selects addresses to train the prefetcher. In response to determining that a cache lookup specifying a given address has resulted in a hit and determining that a cache lookup previously performed in response to a prefetch request issued by the prefetcher for the given address resulted in a hit, the prefetcher selects the given address as an address to be used to train the prefetcher.
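
The selection condition can be sketched as follows. This is a simplified model under stated assumptions: the cache is an unbounded set with no eviction, a missing prefetch lookup fills the line, and the class and method names are hypothetical.

```python
class TrainingFilter:
    """Tracks which addresses qualify as prefetcher training inputs."""

    def __init__(self):
        self.cache = set()         # addresses currently cached (no eviction modelled)
        self.prefetch_hit = set()  # addresses whose prefetch-triggered lookup hit

    def prefetch_lookup(self, addr):
        """Lookup performed in response to a prefetch request."""
        hit = addr in self.cache
        if hit:
            self.prefetch_hit.add(addr)
        else:
            self.cache.add(addr)   # assumed: a missing prefetch fills the line
            self.prefetch_hit.discard(addr)
        return hit

    def demand_lookup(self, addr):
        """Return (hit, train): train only if both lookups for addr hit."""
        hit = addr in self.cache
        train = hit and addr in self.prefetch_hit
        if not hit:
            self.cache.add(addr)
        return hit, train

f = TrainingFilter()
f.cache.add(0x40)                     # line already present before any prefetch
first_prefetch = f.prefetch_lookup(0x40)
second_prefetch = f.prefetch_lookup(0x80)
```

In this model, a later `f.demand_lookup(0x40)` selects the address for training, while `f.demand_lookup(0x80)` hits without being selected because its prefetch lookup missed.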

METHODS, SYSTEMS AND COMPUTER READABLE MEDIA FOR IMPROVING REMOTE DIRECT MEMORY ACCESS PERFORMANCE

The subject matter described herein includes methods, systems, and computer readable media for improving remote direct memory access (RDMA) performance. A method for improving RDMA performance occurs at an RDMA node utilizing a user space and a kernel space for executing software. The method includes posting, by an application executing in the user space, an RDMA work request including a data element indicating a plurality of RDMA requests associated with the RDMA work request to be generated by software executing in the kernel space; and generating and sending, by the software executing in the kernel space, the plurality of RDMA requests to or via a system under test (SUT).
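
The division of labour between user space and kernel space can be sketched with plain functions. The field names and the rule that successive requests target consecutive remote addresses are assumptions for illustration; the abstract only says that one work request carries a data element indicating how many RDMA requests to generate.

```python
def post_work_request(remote_addr, length, count):
    """User space: describe many RDMA requests with a single work request."""
    return {"remote_addr": remote_addr, "length": length, "count": count}

def kernel_generate(work_request):
    """Kernel space: expand the work request into individual RDMA requests."""
    wr = work_request
    return [
        {"remote_addr": wr["remote_addr"] + i * wr["length"], "length": wr["length"]}
        for i in range(wr["count"])
    ]

requests = kernel_generate(post_work_request(0x1000, 256, count=4))
```

The application posts once, and the per-request work happens in the kernel-space code.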

Memory system for determining whether to control a point of execution time of a command based on valid page counts of target memory blocks and operating method thereof
11604734 · 2023-03-14

Embodiments of the disclosure relate to a memory system and an operating method thereof. The memory system is configured to select, among the plurality of memory blocks, one or more target memory blocks operable to store user data to be accessed by a host which requests the memory system to write data, and determine whether to control a point of execution time of a command received from the host, based on valid page counts of respective target memory blocks.

TRANSMISSION OF USB DATA IN A DATA STREAM
20230104594 · 2023-04-06

In a method for transferring USB data in a data stream that includes streaming data, the streaming data and the USB data, which includes a number of USB packets with a first number of bits, are received. The USB data is divided to create a number of transfer packets with a second number of bits of USB data. A transfer packet is inserted into the data stream, and the assembled data stream is transferred. An interruption pattern is added to at least an initial transfer packet to signal that USB data is present in the data stream.
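
The steps can be sketched at byte granularity. The marker value, the chunk size, and the alternating interleave order are hypothetical, and the sketch concatenates the USB packets before dividing them, which the abstract does not require.

```python
MARKER = b"\xA5\x5A"  # hypothetical interruption pattern

def split_usb(usb_packets, chunk_size):
    """Divide USB data into transfer packets; mark the initial one."""
    data = b"".join(usb_packets)
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if chunks:
        chunks[0] = MARKER + chunks[0]  # signals that USB data is present
    return chunks

def interleave(stream_frames, transfer_packets):
    """Insert one transfer packet after each streaming frame, then any leftovers."""
    pending = list(transfer_packets)
    out = []
    for frame in stream_frames:
        out.append(("stream", frame))
        if pending:
            out.append(("usb", pending.pop(0)))
    out.extend(("usb", p) for p in pending)
    return out

packets = split_usb([b"abcdefgh", b"ijkl"], chunk_size=5)
assembled = interleave([b"f0", b"f1"], packets)
```

Here 12 bytes of USB data become three transfer packets, and only the first carries the marker.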

Memory controller and operating method thereof

A memory controller may include: a request checker identifying memory devices corresponding to requests received from a host among the plurality of memory devices and generating device information on the identified memory devices to perform operations corresponding to the requests; a dummy manager outputting a request for controlling a dummy pulse to be applied to channels of selected memory devices according to the device information among the plurality of channels; and a dummy pulse generator sequentially applying the dummy pulse to the channels coupled to the selected memory devices, based on the request for controlling the dummy pulse. A memory controller may include an idle time monitor outputting an idle time interval of the memory device and a clock signal generator generating a clock signal based on the idle time interval and outputting the clock signal to the memory device through the channel to perform a current operation.