G06F12/0846

Adaptive Address Tracking
20230052043 · 2023-02-16 · ·

Described apparatuses and methods track access metadata pertaining to activity within respective address ranges. The access metadata can be used to inform prefetch operations within the respective address ranges. The prefetch operations may involve deriving access patterns from access metadata covering the respective ranges. Suitable address range sizes for accurate pattern detection, however, can vary significantly from region to region of the address space based on, inter alia, workloads produced by programs utilizing the regions. Advantageously, the described apparatuses and methods can adapt the address ranges covered by the access metadata for improved prefetch performance. A data structure may be used to manage the address ranges in which access metadata are tracked. The address ranges can be adapted to improve prefetch performance through low-overhead operations implemented within the data structure. The data structure can encode hierarchical relationships that ensure the resulting address ranges are distinct.

VOLATILE MEMORY TO NON-VOLATILE MEMORY INTERFACE FOR POWER MANAGEMENT
20230046808 · 2023-02-16 ·

Systems, methods, and apparatus related to a memory system that manages an interface for a volatile memory device and a non-volatile memory device to control memory system power. In one approach, a controller evaluates a demand on memory performance. If the demand of a current computation task needed by the host is high, a DRAM device is powered-up to meet the demand. Otherwise, if the non-volatile memory device is adequate to meet the demand, the DRAM memory is partially or fully-powered down to save power. In another approach, a task performed for a host device uses one or more resources of a first memory device (e.g., DRAM). A performance capability of a second memory device (e.g., NVRAM) is determined. A controller of the memory system determines whether the performance capability of the second memory device is adequate to service the task. In response to determining that the performance capability is adequate, the controller changes a mode of operation of the memory system so that one or more resources of the second memory device are used to service the task.

LOOK-UP TABLE READ

A digital data processor includes an instruction memory storing instructions specifying data processing operations and a data operand field, an instruction decoder coupled to the instruction memory for recalling instructions from the instruction memory and determining the operation and the data operand, and an operational unit coupled to a data register file and an instruction decoder to perform an operation upon an operand corresponding to an instruction decoded by the instruction decoder and storing results of the operation. The operational unit is configured to perform a table recall in response to a look up table read instruction by recalling data elements from a specified location and adjacent location to the specified location, in a specified number of at least one table and storing the recalled data elements in successive slots in a destination register. Recalled data elements include at least one interpolated data element in the adjacent location.

Transparent interleaving of compressed cache lines

Low latency in a non-uniform cache access (“NUCA”) cache in a computing environment is provided. A first compressed cache line is interleaved with a second compressed cache line into a single cache line of the NUCA cache, where data of the first compressed cache line is stored in one or more even sectors in the single cache line and stored in zero or more odd sectors in the single cache line after the data fills the one or more even sectors, and data of the second compressed cache line is stored in the one or more odd sectors in the single cache line and stored in zero or more even sectors in the single cache line after the data fills the one or more odd sectors.

DETERMINISTIC MIXED LATENCY CACHE
20230101038 · 2023-03-30 · ·

A method and processing device for accessing data is provided. The processing device comprises a cache and a processor. The cache comprises a first data section having a first cache hit latency and a second data section having a second cache hit latency that is different from the first cache hit latency of the first data section. The processor is configured to request access to data in memory, the data corresponding to a memory address which includes an identifier that identifies the first data section of the cache. The processor is also configured to load the requested data, determined to be located in the first data section of the cache, according to the first cache hit latency of the first data section of the cache.

METHOD FOR EXECUTING ATOMIC MEMORY OPERATIONS WHEN CONTESTED
20230033550 · 2023-02-02 ·

Described are methods and a system for atomic memory operations with contended cache lines. A processing system includes at least two cores, each core having a local cache, and a lower level cache in communication with each local cache. One local cache configured to request a cache line to execute an atomic memory operation (AMO) instruction, receive the cache line via the lower level cache, receive a probe downgrade due to other local cache requesting the cache line prior to execution of the AMO, and send the AMO instruction to the lower level cache for remote execution in response to the probe downgrade.

INTEGRATED CIRCUITS (IC) EMPLOYING SUBSYSTEM SHARED CACHE MEMORY FOR FACILITATING EXTENSION OF LOW-POWER ISLAND (LPI) MEMORY AND RELATED METHODS

Integrated circuits (ICs) employ subsystem shared cache memory for facilitating extension of low-power island (LPI) memory. An LPI subsystem and primary subsystems access a memory subsystem on a first access interface in a first power mode and the LPI subsystem accesses the memory subsystem by a second access interface in the low power mode. In the first power mode, the primary subsystems and the LPI subsystem may send a subsystem memory access request including a virtual memory address to a subsystem memory interface of the memory subsystem to access either data stored in an external memory or a version of the data stored in a shared memory circuit. In the low-power mode, the LPI subsystem sends an LPI memory access request including a direct memory address to an LPI memory interface of the memory subsystem to access the shared memory circuit to extend the LPI memory.

Managing workload of programming sets of pages to memory device

A system includes a memory device having multiple dice and a processing device operatively coupled to the memory device. The processing device is to perform operations, including receiving a memory operation to program a set of pages of data across at least a subset of the plurality of dice. The operations further include partitioning the set of pages into a set of partitions, programming the set of partitions to the plurality of dice, and storing, in a metadata table, at least one bit to indicate that the set of pages is partitioned.

Techniques for generating a system cache partitioning policy
11609860 · 2023-03-21 · ·

In various embodiments, a computing system includes, for example, a plurality of processing units that share access to a system cache. A cache management application receives, for example, resource savings information for each processing unit. The resource savings information indicates, for example, amounts of a resource (e.g., power) that are saved when different units of the system cache are allocated to a processing unit. The cache management application determines, for example, the number of units of system cache to allocate to each processing unit based on the received resource savings information.

USING IDLE CACHES AS A BACKING STORE FOR BOOT CODE

The present disclosure may include a processor that uses idle caches as a backing store for a boot code. The processor designates a boot core and an active cache from a plurality of cores and a plurality of caches. The processor configures remaining caches from the plurality of caches to act as a backing store memory. The processor modifies the active cache to convert cast outs to a system memory into lateral cast outs to the backing store memory. The processor copies a boot image to the backing store memory and executes the boot image by the boot core.