G06F2212/1041

Communicating a programmable atomic operator to a memory controller
11614891 · 2023-03-28

Devices and techniques for communicating a programmable atomic operator to a memory controller are described herein. A memory controller can receive a memory request and extract a command indicator that indicates a programmable atomic operator (PAO) command from the memory request. The memory controller can then extract a PAO index from the request and invoke the PAO based on the PAO index.
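The decode-and-dispatch flow in this abstract can be sketched as follows. This is a minimal illustration only: the request bit layout, the `CMD_PAO` opcode value, and the `MemoryController` class with its methods are hypothetical and are not taken from the patent.

```python
from typing import Callable, Dict

# Hypothetical request layout (an assumption, not from the source):
#   bits [7:0]   command indicator
#   bits [15:8]  PAO index
#   bits [63:16] target address
CMD_PAO = 0x2A  # assumed opcode marking a programmable atomic operator command


def extract_command(request: int) -> int:
    """Return the command indicator field of a request."""
    return request & 0xFF


def extract_pao_index(request: int) -> int:
    """Return the PAO index field of a request."""
    return (request >> 8) & 0xFF


def extract_address(request: int) -> int:
    """Return the target address field of a request."""
    return request >> 16


class MemoryController:
    """Toy controller that dispatches PAO commands through an index table."""

    def __init__(self) -> None:
        self.memory: Dict[int, int] = {}
        self.pao_table: Dict[int, Callable[[int], int]] = {}

    def register_pao(self, index: int, operator: Callable[[int], int]) -> None:
        self.pao_table[index] = operator

    def handle(self, request: int) -> int:
        # Step 1: check that the command indicator marks a PAO command.
        if extract_command(request) != CMD_PAO:
            raise ValueError("not a PAO command")
        # Step 2: extract the PAO index and look up the operator.
        operator = self.pao_table[extract_pao_index(request)]
        # Step 3: invoke the operator on the addressed value.
        address = extract_address(request)
        old_value = self.memory.get(address, 0)
        self.memory[address] = operator(old_value)
        return old_value
```

A caller would register an operator under an index, then build a request whose low byte is `CMD_PAO` and whose next byte is that index.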

Adaptive Address Tracking
20230088638 · 2023-03-23

Described apparatuses and methods track access metadata pertaining to activity within respective address ranges. The access metadata can be used to inform prefetch operations within the respective address ranges. The prefetch operations may involve deriving access patterns from access metadata covering the respective ranges. Suitable address range sizes for accurate pattern detection, however, can vary significantly from region to region of the address space based on, inter alia, workloads produced by programs utilizing the regions. Advantageously, the described apparatuses and methods can adapt the address ranges covered by the access metadata for improved prefetch performance. A data structure may be used to manage the address ranges in which access metadata are tracked. The address ranges can be adapted to improve prefetch performance through low-overhead operations implemented within the data structure. The data structure can encode hierarchical relationships that ensure the resulting address ranges are distinct.
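One way to picture adaptable, non-overlapping address ranges is a tracker that splits a range into two halves when finer-grained metadata is wanted. The sketch below is an assumption for illustration: the `RangeTracker` class, the halving policy, and the use of an access count as the metadata are all hypothetical and do not reproduce the data structure claimed in the application.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]  # (base address, size in bytes)


class RangeTracker:
    """Tracks a per-range access count over disjoint address ranges."""

    def __init__(self, base: int, size: int) -> None:
        # Start with a single range covering the whole tracked region.
        self.ranges: Dict[Range, int] = {(base, size): 0}

    def _find(self, addr: int) -> Range:
        for base, size in self.ranges:
            if base <= addr < base + size:
                return (base, size)
        raise KeyError(addr)

    def record(self, addr: int) -> None:
        """Update the metadata of the range that contains addr."""
        self.ranges[self._find(addr)] += 1

    def split(self, addr: int) -> None:
        """Replace the range containing addr with its two halves.

        Each child range is carved out of its parent, so the resulting
        ranges never overlap.
        """
        base, size = self._find(addr)
        del self.ranges[(base, size)]
        half = size // 2
        self.ranges[(base, half)] = 0
        self.ranges[(base + half, size - half)] = 0
```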

DRAM-AWARE CACHING

Data caching may include storing data associated with DRAM transaction requests in data storage structures organized in a manner corresponding to the DRAM bank, bank group and rank organization. Data may be selected for transfer to the DRAM by selecting among the data storage structures.
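As a rough sketch of this organization, the buffer below keeps one queue per (rank, bank group, bank) tuple and selects among the queues when draining. The `DramAwareBuffer` class and its selection policy (prefer a bank group different from the one used last) are assumptions made for illustration; the abstract does not specify a policy.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

BankKey = Tuple[int, int, int]  # (rank, bank_group, bank)


class DramAwareBuffer:
    """Holds pending data in queues that mirror the DRAM organization."""

    def __init__(self) -> None:
        self.queues: Dict[BankKey, Deque[str]] = defaultdict(deque)
        self.last_key: Optional[BankKey] = None

    def store(self, rank: int, bank_group: int, bank: int, data: str) -> None:
        self.queues[(rank, bank_group, bank)].append(data)

    def select(self) -> Optional[Tuple[BankKey, str]]:
        """Pick the next item to transfer, or None if nothing is pending."""
        candidates = [key for key, queue in self.queues.items() if queue]
        if not candidates:
            return None
        # Assumed policy: prefer a different bank group than the last transfer.
        preferred = [
            key for key in candidates
            if self.last_key is None or key[1] != self.last_key[1]
        ]
        key = (preferred or candidates)[0]
        self.last_key = key
        return key, self.queues[key].popleft()
```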

MEMORY SYSTEM
20230093251 · 2023-03-23

A memory system includes a first volatile memory having an access unit of a first bit width; a second volatile memory having an access unit of the first bit width and having a capacity larger than the first volatile memory; and a controller connected to the first and second volatile memories. The controller allocates a first address space having the first bit width as a unit to the first volatile memory, allocates a second address space having the first bit width as a unit to the second volatile memory, selects at least one of the first and second volatile memories based on a first address indicating a position in a third address space having a second bit width as a unit, calculates a second address in the address space allocated to the selected volatile memory, and accesses a position corresponding to the second address of the selected volatile memory.

DATA TRANSFORMATION AND QUALITY CHECKING

Data transformation and data quality checking are provided by reading data from a source datastore and storing the data into memory, and then performing in-memory processing of the data, where the data is maintained in memory for the duration of that processing. The in-memory processing includes performing one or more transformations on the data stored in memory, in which the data is transformed and stored back into the memory, and applying one or more data quality rules to the data stored in memory. Based on performing the in-memory processing, at least some of the processed data is loaded to a target datastore.
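The read, transform, check, and load sequence can be sketched in a few lines. The `run_pipeline` function, the row-as-dictionary representation, and the choice to load only rows that pass every rule are assumptions for illustration; the abstract states only that at least some of the processed data is loaded.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def run_pipeline(
    source_rows: Iterable[Row],
    transforms: List[Callable[[Row], Row]],
    rules: List[Callable[[Row], bool]],
) -> List[Row]:
    """Return the rows that would be loaded to the target datastore."""
    # Read from the source and keep the data in memory.
    data = [dict(row) for row in source_rows]
    # Apply each transformation, storing the result back in memory.
    for transform in transforms:
        data = [transform(row) for row in data]
    # Apply the data quality rules to the in-memory data.
    # Assumption: only rows that pass every rule are loaded.
    return [row for row in data if all(rule(row) for rule in rules)]
```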

Techniques for configuring parallel processors for different application domains

In various embodiments, a parallel processor includes a parallel processor module implemented within a first die and a memory system module implemented within a second die. The memory system module is coupled to the parallel processor module via an on-package link. The parallel processor module includes multiple processor cores and multiple cache memories. The memory system module includes a memory controller for accessing a DRAM. Advantageously, the performance of the parallel processor module can be effectively tailored for memory bandwidth demands that typify one or more application domains via the memory system module.

ACCESS CONTROL CONFIGURATIONS FOR INTER-PROCESSOR COMMUNICATIONS
20220342729 · 2022-10-27

Methods, systems, and devices for access control configurations for inter-processor communications are described to support reconfiguration of a dynamic access control configuration at a device. For example, additional configuration fields may be added to existing access control rules of the device, where these additional fields may be configured by a processor sending information to a receiving processor, via a shared memory resource or region of the device. The additional fields may include a read-only value which may specify a processor which has exclusive write permission for a memory region of the shared memory. This value may indicate the sending processor of the memory region, and the value may be set by access control hardware when the additional field is changed. Other processors of the device may be prevented from writing to the memory region.

CACHE REFRESH SYSTEM AND PROCESSES
20230081780 · 2023-03-16

The present disclosure relates generally to computer systems and, more particularly, to a cache refresh system and related processes and methods of use. The method of refreshing data in cache memory includes: setting, by a computer system, a refresh indicator to “true”; refreshing data in the cache memory, by the computer system, upon a determination that the refresh indicator is set to “true”; and setting, by the computer system, the refresh indicator to “false” after the refreshing of the cache memory.
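The three steps of the method (set the indicator, refresh when it is set, clear it afterwards) can be sketched as below. The `RefreshableCache` class and its `loader` callback are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, Optional


class RefreshableCache:
    """Cache whose contents are reloaded only when a refresh flag is set."""

    def __init__(self, loader: Callable[[], Dict[str, str]]) -> None:
        self._loader = loader  # assumed callback that fetches fresh data
        self._data: Dict[str, str] = {}
        self.refresh_indicator = False

    def request_refresh(self) -> None:
        # Step 1: set the refresh indicator to "true".
        self.refresh_indicator = True

    def refresh_if_needed(self) -> bool:
        # Step 2: refresh only when the indicator is "true".
        if not self.refresh_indicator:
            return False
        self._data = dict(self._loader())
        # Step 3: set the indicator back to "false" after refreshing.
        self.refresh_indicator = False
        return True

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)
```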

PROCESSING METHOD AND ACCELERATING DEVICE
20220335299 · 2022-10-20

The present disclosure provides a processing device including a coarse-grained pruning unit configured to perform coarse-grained pruning on a weight of a neural network to obtain a pruned weight, and an operation unit configured to train the neural network according to the pruned weight. The coarse-grained pruning unit is specifically configured to select M weights from the weights of the neural network through a sliding window, and when the M weights meet a preset condition, all or part of the M weights may be set to 0. The processing device can reduce the memory access while reducing the amount of computation, thereby obtaining an acceleration ratio and reducing energy consumption.
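A one-dimensional sketch of sliding-window pruning follows. The abstract leaves the preset condition open; here it is assumed to be "mean absolute value of the window is below a threshold", and all M weights in a qualifying window are zeroed. The function name, the `threshold` parameter, and the default stride equal to M are hypothetical.

```python
from typing import List, Optional


def coarse_grained_prune(
    weights: List[float],
    m: int,
    threshold: float,
    stride: Optional[int] = None,
) -> List[float]:
    """Zero out whole windows of m weights whose mean magnitude is small."""
    step = m if stride is None else stride
    pruned = list(weights)
    # Slide a window of m weights over the original (unpruned) weights.
    for start in range(0, len(weights) - m + 1, step):
        window = weights[start:start + m]
        # Assumed preset condition: mean absolute value below the threshold.
        if sum(abs(w) for w in window) / m < threshold:
            for i in range(start, start + m):
                pruned[i] = 0.0
    return pruned
```

Pruning in blocks rather than weight by weight keeps the zeros contiguous, which is what lets hardware skip whole groups of memory accesses and multiplications.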

Configuration cache for the ARM SMMUv3
11474953 · 2022-10-18

A method of translating a virtual address into a physical memory address in an ARM System Memory Management Unit version 3 (SMMUv3) system includes searching a Configuration Cache memory for a matching tag that matches an associated tag upon receiving the virtual address and the associated tag, and extracting, in a single memory lookup cycle, a matching data field associated with the matching tag when the matching tag is found in the Configuration Cache memory. A matching data field of the Configuration Cache memory includes a matching Stream Table Entry (STE) and a matching Context Descriptor (CD), both associated with the matching tag. The Configuration Cache memory may be configured as a content-addressable memory. The method further includes storing entries associated with a multiple memory lookup cycle virtual address-to-physical address translation into the Configuration Cache memory, each of the entries including a tag, an associated STE and an associated CD.
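The lookup-then-fill behavior can be modeled in software with a dictionary standing in for the content-addressable memory. This is only a behavioral sketch: the `ConfigCache` class, the `slow_walk` callback representing the multi-cycle table walk, and the string placeholders for the STE and CD are hypothetical, and real entries are hardware-defined structures.

```python
from typing import Callable, Dict, Hashable, Tuple

Entry = Tuple[str, str]  # (Stream Table Entry, Context Descriptor) placeholders


class ConfigCache:
    """Maps a tag to its (STE, CD) pair, filling entries on a miss."""

    def __init__(self, slow_walk: Callable[[Hashable], Entry]) -> None:
        self._entries: Dict[Hashable, Entry] = {}
        self._slow_walk = slow_walk  # assumed multi-lookup fallback path
        self.hits = 0
        self.misses = 0

    def lookup(self, tag: Hashable) -> Entry:
        entry = self._entries.get(tag)
        if entry is not None:
            # Hit: both the STE and the CD come back from one lookup.
            self.hits += 1
            return entry
        # Miss: perform the slower walk, then store the tag with its
        # STE and CD so the next lookup with this tag hits.
        self.misses += 1
        entry = self._slow_walk(tag)
        self._entries[tag] = entry
        return entry
```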