G06F13/1605

Reducing save restore latency for power control based on write signals

A method of save-restore operations includes monitoring, by a power controller of a parallel processor (such as a graphics processing unit), of a register bus for one or more register write signals. The power controller determines that a register write signal is addressed to a state register that is designated to be saved prior to changing a power state of the parallel processor from a first state to a second state having a lower level of energy usage. The power controller instructs a copy of data corresponding to the state register to be written to a local memory module of the parallel processor. Subsequently, the parallel processor receives a power state change signal and writes state register data saved at the local memory module to an off-chip memory prior to changing the power state of the parallel processor.

Memory system
11593285 · 2023-02-28 · ·

A memory system includes a memory device, a memory controller configured to control the memory device, and an interface device configured to perform an interfacing operation for transmission of a control signal and data between the memory device and the memory controller. The interface device activates a blocking function for the interfacing operation in response to a configuration command of the memory controller including a blocking activation signal and performs an interface configuration operation in response to an interface configuration command of the memory controller while the blocking function is activated.

APPARATUS TO OPTIMIZE GPU THREAD SHARED LOCAL MEMORY ACCESS

One embodiment provides for a graphics processor comprising first logic coupled with a first execution unit, the first logic to receive a first single instruction multiple data (SIMD) message from the first execution unit; second logic coupled with a second execution unit, the second logic to receive a second SIMD message from the second execution unit; and third logic coupled with a bank of shared local memory (SLM), the third logic to receive a first request to access the bank of SLM from the first logic, a second request to access the bank of SLM from the second logic, and in a single access cycle, schedule a read access to a read port for the first request and a write access to a write port for the second request.

METHODS AND APPARATUS TO FACILITATE READ-MODIFY-WRITE SUPPORT IN A COHERENT VICTIM CACHE WITH PARALLEL DATA PATHS

Methods, apparatus, systems and articles of manufacture are disclosed facilitate read-modify-write support in a coherent victim cache with parallel data paths. An example apparatus includes a random-access memory configured to be coupled to a central processing unit via a first interface and a second interface, the random-access memory configured to obtain a read request indicating a first address to read via a snoop interface, an address encoder coupled to the random-access memory, the address encoder to, when the random-access memory indicates a hit of the read request, generate a second address corresponding to a victim cache based on the first address, and a multiplexer coupled to the victim cache to transmit a response including data obtained from the second address of the victim cache.

AGGRESSIVE WRITE FLUSH SCHEME FOR A VICTIM CACHE
20230004500 · 2023-01-05 ·

A caching system including a first sub-cache and a second sub-cache in parallel with the first sub-cache, wherein the second sub-cache includes: line type bits configured to store an indication that a corresponding cache line of the second sub-cache is configured to store write-miss data, and an eviction controller configured to evict a cache line of the second sub-cache storing write-miss data based on an indication that the cache line has been fully written.

ASYNCHRONOUS PIPELINE MERGING USING LONG VECTOR ARBITRATION
20230004398 · 2023-01-05 ·

Devices and techniques for asynchronous pipeline merging are described herein. An apparatus, includes a memory controller, which includes merge circuitry; where the memory controller chiplet is configured to perform operations including those to: perform a bitwise logical operation on a first logging bit vector and a second logging bit vector to obtain a result vector, wherein the first logging bit vector is associated with a first pipeline and the second logging bit vector is associated with a second pipeline, and wherein bits in respective index positions of the first and second logging bit vectors represent transactions; select a completed transaction from the result vector using a round-robin technique; and forward the completed transaction from the set of completed transactions to an output pipeline.

Arbitration control for pseudostatic random access memory device

An arbitration control circuit in a pseudo-static random access memory (PSRAM) device includes a first arbiter circuit and a second arbiter circuit. The first arbiter circuit receives a normal access request signal and a refresh access request signal and generates a first output signal in response to a logical operation to arbitrate between the normal access reqeuest signal and the refresh access request signal. The second arbiter circuit configured to receive the first output signal and a delayed signal of the first output signal, and to generate a second output signal in response to a logical operation of the first output signal and the delayed signal. The second output signal has a first logical state indicative of granting the read or write access request and a second logical state indicative of granting the refresh access request to the memory cells of the PSRAM device.

SYSTEMS, DEVICES AND METHODS WITH OFFLOAD PROCESSING DEVICES
20230231811 · 2023-07-20 ·

A method can include receiving network packets including forwarding plane packets; evaluating header information of the network packets to map network packets to any of a plurality of destinations on the module, each destination corresponding to any of a plurality of services executed by offload processors of the module; configuring operations of the offload processors; and in response to forwarding plane packets, executing operations on the forwarding plane packets; wherein the receiving, evaluation and processing of the forwarding plane packets are performed independent of the host processor. Corresponding systems and methods are also disclosed.

PROGRAMMABLE ATOMIC OPERATOR RESOURCE LOCKING
20230215500 · 2023-07-06 ·

Devices and techniques for programmable atomic operator resource locking are described herein. A request for a programmable atomic operator (PAO) can be received at a memory controller that includes a programmable atomic unit (PAU). Here, the request includes an identifier for the PAO and a memory address. The memory addressed is processed to identify a lock value. A verification can be performed to determine that the lock value indicates that there is no lock corresponding to the memory address. Then, the lock value is set to indicate that there is now a lock corresponding to the memory address and the PAO is invoked based on the identifier for the PAO. In response to completion of the PAO, the lock value is set to indicate that there is no longer a lock corresponding to the memory address.

Victim cache that supports draining write-miss entries

A caching system including a first sub-cache and a second sub-cache in parallel with the first sub-cache, wherein the second sub-cache includes a set of cache lines, line type bits configured to store an indication that a corresponding cache line of the set of cache lines is configured to store write-miss data, and an eviction controller configured to flush stored write-miss data based on the line type bits.