Patent classifications
G06F12/0857
Method and system for accelerating storage of data in write-intensive computer applications
A method of optimising a service rate of a buffer in a computer system having memory stores of first and second type is described. The method selectively services the buffer by routing data to each of the memory store of the first type and the second type based on read/write capacity of the memory store of the first type.
REVERSE ORDER QUEUE UPDATES BY VIRTUAL DEVICES
A system includes a memory including a ring buffer having a plurality of slots, a processor in communication with the memory, a guest operating system, and a hypervisor. The hypervisor is configured to detect a request associated with a memory entry, retrieve up to a predetermined quantity of memory entries in the ring buffer from an original slot to an end slot, and test a respective descriptor of each successive slot from the original slot through the end slot while the respective descriptor of each successive slot in the ring buffer remains unchanged. Additionally, the hypervisor is configured to execute the request associated with the memory entries and respective valid descriptors. The hypervisor is also configured to walk the ring buffer backwards from the end slot to the original slot while clearing the valid descriptors.
Dynamically coalescing atomic memory operations for memory-local computing
Dynamically coalescing atomic memory operations for memory-local computing is disclosed. In an embodiment, it is determined whether a first atomic memory access and a second atomic memory access are candidates for coalescing. In response to a triggering event, the atomic memory accesses that are candidates for coalescing are coalesced in a cache prior to requesting memory-local processing by a memory-local compute unit. The atomic memory accesses may be coalesced in the same cache line or atomic memory accesses in different cache lines may be coalesced using a multicast memory-local processing command.
Executing an atomic primitive in a multi-core processor system
The present disclosure relates to a method for a computer system comprising a plurality of processor cores, including a first processor core and a second processor core, wherein a cached data item is assigned to a first processor core, of the plurality of processor cores, for exclusively executing an atomic primitive. The method includes receiving, from a second processor core at a cache controller, a request for accessing the data item, and in response to determining that the execution of the atomic primitive is not completed by the first processor core, returning a rejection message to the second processor core.
Interconnected systems fence mechanism
An apparatus to facilitate memory barriers is disclosed. The apparatus comprises an interconnect, a device memory, a plurality of processing resources, coupled to the device memory, to execute a plurality of execution threads as memory data producers and memory data consumers to a device memory and a system memory and fence hardware to generate fence operations to enforce data ordering on memory operations issued to the device memory and a system memory coupled via the interconnect.
Configurable cache for multi-endpoint heterogeneous coherent system
A device includes a memory bank. The memory bank includes data portions of a first way group. The data portions of the first way group include a data portion of a first way of the first way group and a data portion of a second way of the first way group. The memory bank further includes data portions of a second way group. The device further includes a configuration register and a controller configured to individually allocate, based on one or more settings in the configuration register, the first way and the second way to one of an addressable memory space and a data cache.
High-throughput software-defined convolutional interleavers and de-interleavers
High-throughput software-defined convolutional interleavers and de-interleavers are provided herein. In some examples, a method for generating convolutionally interleaved samples on a general purpose processor with cache is provided. Memory is represented as a three dimensional array, indexed by block number, row, and column. Input samples may be written to the cache according to an indexing scheme. Output samples may be generated every MN samples by reading out the samples from the cache in a transposed and vectorized order.
Arithmetic processing device and arithmetic processing method
An arithmetic processing device including: request issuing units configured to issue an access request to a storage; and banks each of which includes: a first cache area including first entries; a second cache area including second entries; a control unit; and a determination unit that determines a cache hit or a cache miss for each of the banks, wherein the control unit performs: in response that the access requests simultaneously received from the request issuing units make the cache miss, storing the data, which is read from the storage device respectively by the access requests, in one of the first entries and one of the second entries; and in response that the access requests simultaneously received from the request issuing units make the cache hit in the first and second cache areas, outputting the data retained in the first and second entries, to each of issuers of the access requests.
MEMORY CONTROLLER WITH TRANSACTION-QUEUE-DEPENDENT POWER MODES
A memory controller component of a memory system stores memory access requests within a transaction queue until serviced so that, over time, the transaction queue alternates between occupied and empty states. The memory controller transitions the memory system to a low power mode in response to detecting the transaction queue is has remained in the empty state for a predetermined time. In the transition to the low power mode, the memory controller disables oscillation of one or more timing signals required to time data signaling operations within synchronous communication circuits of one or more attached memory devices and also disables one or more power consuming circuits within the synchronous communication circuits of the one or more memory devices.
COMMUNICATING A PROGRAMMABLE ATOMIC OPERATOR TO A MEMORY CONTROLLER
Devices and techniques for communicating a programmable atomic operator to a memory controller are described herein. A memory controller can receive a memory request and extract a command indicator that indicates a programmable atomic operator (PAO) command from the memory request. The memory controller can then extract a PAO index from the request and invoke the PAO based on the PAO index.