G06F2212/621

ARITHMETIC PROCESSOR AND METHOD FOR OPERATING ARITHMETIC PROCESSOR
20230010353 · 2023-01-12 · ·

An arithmetic processor including a plurality of core groups each including a plurality of cores and a cache unit, a plurality of home agents each including a tag directory and a store command queue and a store command queue. The store command queue enters the received store request to the entry queue in order of reception, the cache unit stores the data of the store request in a data RAM. The store command queue sets a data ownership acquisition flag of the store request to valid when obtaining a data ownership of the store request and issues a top-of-queue notification to the cache control unit when the flag of the top-of-queue entry is valid. In response to the top-of-queue notification, the cache unit update a cache tag to modified state and issue a store request completion notification.

Computer Memory Expansion Device and Method of Operation

A memory expansion device operable with a host computer system (host) comprises a non-volatile memory (NVM) subsystem, cache memory, and control logic configurable to receive a submission from the host including a read command and specifying a payload in the NVM subsystem and demand data in the payload. The control logic is configured to request ownership of a set of cache lines corresponding to the payload, to indicate completion of the submission after acquiring ownership of the cache lines, and to load the payload to the cache memory. The set of cache lines correspond to a set of cache lines in a coherent destination memory space accessible by the host. The control logic is further configured to, after indicating completion of the submission and in response to a request from the host to read demand data in the payload, return the demand data after determining that the demand data is in the cache memory.

Processing host write transactions using a non-volatile memory express controller memory manager
11550477 · 2023-01-10 · ·

Embodiments of the present disclosure generally relate to an NVMe storage device having a controller memory manager and a method of accessing an NVMe storage device having a controller memory manager. In one embodiment, a storage device comprises a non-volatile memory, a volatile memory, and a controller memory manager. The controller memory manager is operable to store one or more NVMe data structures within the non-volatile memory and the volatile memory.

Handling memory requests

A converter module is described which handles memory requests issued by a cache (e.g. an on-chip cache), where these memory requests include memory addresses defined within a virtual memory space. The converter module receives these requests, issues each request with a transaction identifier and uses that identifier to track the status of the memory request. The converter module sends requests for address translation to a memory management unit and where there the translation is not available in the memory management unit receives further memory requests from the memory management unit. The memory requests are issued to a memory via a bus and the transaction identifier for a request is freed once the response has been received from the memory. When issuing memory requests onto the bus, memory requests received from the memory management unit may be prioritized over those received from the cache.

Method for using victim buffer in cache coherent systems

In accordance with various aspects of the invention, a recall transaction is issued if a tag filter entry needs to be freed up for an incoming transaction. Directory entries chosen for a recall transaction are pushed into a fully associative structure called victim buffer. If this structure gets full, then an entry is selected from entries inside the victim buffer for the recall.

STREAMING ENGINE WITH FLEXIBLE STREAMING ENGINE TEMPLATE SUPPORTING DIFFERING NUMBER OF NESTED LOOPS WITH CORRESPONDING LOOP COUNTS AND LOOP OFFSETS
20230053842 · 2023-02-23 ·

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template specifies loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently enabling trade off between the number of loops supported and the size of the loop counts and loop dimensions.

CONTROLLER WITH CACHING AND NON-CACHING MODES

An apparatus includes a CPU core, a first cache subsystem coupled to the CPU core, and a second memory coupled to the cache subsystem. The first cache subsystem includes a configuration register, a first memory, and a controller. The controller is configured to: receive a request directed to an address in the second memory and, in response to the configuration register having a first value, operate in a non-caching mode. In the non-caching mode, the controller is configured to provide the request to the second memory without caching data returned by the request in the first memory. In response to the configuration register having a second value, the controller is configured to operate in a caching mode. In the caching mode the controller is configured to provide the request to the second memory and cache data returned by the request in the first memory.

Memory access request for a memory protocol

A computer-implemented method includes identifying two or more memory locations and referencing, by a memory access request, the two or more memory locations. The memory access request is a single action pursuant to a memory protocol. The computer-implemented method further includes sending the memory access request from one or more processors to a node and fetching, by the node, data content from each of the two or more memory locations. The computer-implemented method further includes packaging, by the node, the data content from each of the two or more memory locations into a memory package, and returning the memory package from the node to the one or more processors. A corresponding computer program product and computer system are also disclosed.

Hybrid hardware-software coherent framework
11586369 · 2023-02-21 · ·

Examples herein describe an accelerator device that shares the same coherent domain as hardware elements in a host computing device. The embodiments herein describe a mix of hardware and software coherency which reduces the overhead of managing data when large chunks of data are moved from the host into the accelerator device. In one embodiment, an accelerator application executing on the host identifies a data set it wishes to transfer to the accelerator device to be processed. The accelerator application transfers ownership from a home agent in the host to the accelerator device. A slave agent can then take ownership of the data. As a result, any memory operation requests received from a requesting agent in the accelerator device can gain access to the data set in local memory via the slave agent without the slave agent obtaining permission from the home agent in the host.

System, device and method for accessing device-attached memory

A device connected to a host processor via a bus includes: an accelerator circuit configured to operate based on a message received from the host processor; and a controller configured to control an access to a memory connected to the device, wherein the controller is further configured to, in response to a read request received from the accelerator circuit, provide a first message requesting resolution of coherence to the host processor and prefetch first data from the memory.