G06F2212/6028

Throttling Schemes in Multicore Microprocessors
20220365879 · 2022-11-17

An electronic device includes a cache, a processing cluster having one or more processors, and prefetch throttling circuitry that determines a congestion level of the processing cluster based on the extent to which data retrieval requests sent from the processors to the cache are not satisfied by the cache. Congestion criteria require that the congestion level of the cluster be above a cluster congestion threshold. In accordance with a determination that the congestion level of the cluster satisfies the congestion criteria, the prefetch throttling circuitry causes one of the processors to limit prefetch requests to the cache to prefetch requests of at least a threshold quality. In accordance with a determination that the congestion level of the cluster does not satisfy the congestion criteria, the prefetch throttling circuitry forgoes causing the processors to limit prefetch requests to the cache to prefetch requests of at least the threshold quality.
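The following C sketch illustrates the throttling decision described above; the congestion metric (miss fraction), structure names, thresholds, and the notion of a per-prefetch "quality" score are assumptions for illustration, not details taken from the patent.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative model: per-cluster congestion is measured as the fraction of
 * demand requests that the cache does not satisfy; all names and constants
 * here are invented for the sketch. */
typedef struct {
    uint64_t requests_sent;    /* demand requests from all processors in the cluster */
    uint64_t requests_missed;  /* requests not satisfied by the cache                */
} cluster_stats_t;

#define CLUSTER_CONGESTION_THRESHOLD 0.40  /* hypothetical cluster congestion threshold */
#define MIN_PREFETCH_QUALITY         0.75  /* hypothetical quality floor when throttled */

static double congestion_level(const cluster_stats_t *s)
{
    if (s->requests_sent == 0)
        return 0.0;
    return (double)s->requests_missed / (double)s->requests_sent;
}

/* Decide whether a prefetch of the given predicted quality may be sent to the cache. */
bool allow_prefetch(const cluster_stats_t *s, double prefetch_quality)
{
    if (congestion_level(s) > CLUSTER_CONGESTION_THRESHOLD)
        return prefetch_quality >= MIN_PREFETCH_QUALITY;  /* throttle: high-quality prefetches only */
    return true;  /* congestion criteria not met: do not limit prefetches */
}

int main(void)
{
    cluster_stats_t stats = { .requests_sent = 100, .requests_missed = 55 };
    printf("allow low-quality prefetch:  %d\n", allow_prefetch(&stats, 0.5));
    printf("allow high-quality prefetch: %d\n", allow_prefetch(&stats, 0.9));
    return 0;
}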

System, device and method for accessing device-attached memory

A device connected to a host processor via a bus includes: an accelerator circuit configured to operate based on a message received from the host processor; and a controller configured to control access to a memory connected to the device, wherein the controller is further configured to, in response to a read request received from the accelerator circuit, provide a first message requesting resolution of coherence to the host processor and prefetch first data from the memory.
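A minimal C sketch of the controller flow described above: on an accelerator read, the controller asks the host to resolve coherence and prefetches the data from device-attached memory in parallel. The function names, the grant model, and the stub bodies are assumptions, not the patented implementation.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

static void host_request_coherence_resolution(uint64_t addr)
{
    printf("-> host: resolve coherence for 0x%llx\n", (unsigned long long)addr);
}

static void device_memory_prefetch(uint64_t addr, size_t len)
{
    printf("prefetch %zu bytes at 0x%llx from device-attached memory\n",
           len, (unsigned long long)addr);
}

static int host_wait_for_grant(uint64_t addr)
{
    (void)addr;
    return 0;  /* assume the host reports coherence resolved */
}

void controller_handle_read(uint64_t addr, size_t len)
{
    host_request_coherence_resolution(addr);  /* first message to the host       */
    device_memory_prefetch(addr, len);        /* overlap the device memory access */
    if (host_wait_for_grant(addr) == 0)
        printf("return prefetched data to accelerator\n");
}

int main(void)
{
    controller_handle_read(0x1000, 64);
    return 0;
}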

Link stack based instruction prefetch augmentation

A computer-implemented method of performing link stack based prefetch augmentation using sequential prefetching includes observing a call instruction in a program being executed and pushing a return address onto a link stack for processing the next instruction. A stream of instructions is prefetched starting from the cache line address of the next instruction and is stored in an instruction cache.
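A C sketch of that flow: on a call, the return address is pushed onto a link stack and a short sequential stream of instruction cache lines is prefetched starting at the line of the next instruction. The stack depth, line size, stream length, and instruction-size handling are assumptions for illustration.

#include <stdint.h>
#include <stdio.h>

#define LINK_STACK_DEPTH 16
#define CACHE_LINE_BYTES 64
#define PREFETCH_LINES    4   /* hypothetical length of the prefetched stream */

static uint64_t link_stack[LINK_STACK_DEPTH];
static int      link_top = 0;

static void icache_prefetch_line(uint64_t line_addr)
{
    printf("prefetch I-cache line 0x%llx\n", (unsigned long long)line_addr);
}

void on_call_instruction(uint64_t call_pc, uint64_t insn_bytes)
{
    uint64_t return_addr = call_pc + insn_bytes;   /* address of the next instruction */
    if (link_top < LINK_STACK_DEPTH)
        link_stack[link_top++] = return_addr;      /* push return address onto the link stack */

    uint64_t line = return_addr & ~(uint64_t)(CACHE_LINE_BYTES - 1);
    for (int i = 0; i < PREFETCH_LINES; i++)       /* sequential instruction-stream prefetch */
        icache_prefetch_line(line + (uint64_t)i * CACHE_LINE_BYTES);
}

int main(void)
{
    on_call_instruction(0x400100, 4);
    return 0;
}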

NO-LOCALITY HINT VECTOR MEMORY ACCESS PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS
20230052652 · 2023-02-16

A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode a no-locality hint vector memory access instruction. The no-locality hint vector memory access instruction is to indicate a packed data register of the plurality of packed data registers that is to have source packed memory indices. The source packed memory indices are to have a plurality of memory indices. The no-locality hint vector memory access instruction is to provide a no-locality hint to the processor for data elements that are to be accessed with the memory indices. The processor also includes an execution unit coupled with the decode unit and the plurality of packed data registers. The execution unit, in response to the no-locality hint vector memory access instruction, is to access the data elements at memory locations that are based on the memory indices.
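The abstract describes a single vector (gather-style) instruction; the C sketch below only illustrates the semantics scalar-wise, touching each indexed element with a "no temporal locality" hint via the GCC/Clang __builtin_prefetch builtin (locality 0), which tells the cache hierarchy not to retain the fetched lines. This is an analogy, not the patented instruction.

#include <stddef.h>
#include <stdio.h>

float gather_no_locality(const float *base, const int *indices, size_t n, float *out)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        const float *p = &base[indices[i]];
        __builtin_prefetch(p, /*rw=*/0, /*locality=*/0);  /* no-locality hint for this element */
        out[i] = *p;                                      /* gather the data element           */
        sum += out[i];
    }
    return sum;
}

int main(void)
{
    float data[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    int   idx[4]  = {7, 2, 5, 0};
    float out[4];
    printf("%f\n", gather_no_locality(data, idx, 4, out));
    return 0;
}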

Data write system and method with registers defining address range
11500776 · 2022-11-15

A data write system includes a processor circuit, a first memory, at least one register, and a second memory. The first memory is coupled to the processor circuit. The at least one register is configured to define at least one range. The second memory is coupled to the first memory. If a cache miss occurs and an access address of a read command is in the at least one range in the second memory, a predetermined amount of data corresponding to the access address is written from the second memory into at least one first way of the first memory.
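A C sketch of that range-gated fill policy: on a read miss whose address falls inside a register-defined range, a fixed-size block around the address is copied from the second memory into a designated way of the first memory. The block size, structures, and names are assumptions for illustration.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define FILL_BYTES 256   /* hypothetical "predetermined amount of data" */

typedef struct { uint64_t start; uint64_t end; } range_reg_t;

static bool in_range(const range_reg_t *r, uint64_t addr)
{
    return addr >= r->start && addr < r->end;
}

/* first_way models the designated way of the first memory; second_mem models
 * the second memory addressed from 0. */
void on_read_miss(uint64_t addr, const range_reg_t *r,
                  const uint8_t *second_mem, uint8_t *first_way)
{
    if (!in_range(r, addr))
        return;                                         /* outside range: normal miss handling */
    uint64_t base = addr & ~(uint64_t)(FILL_BYTES - 1);
    memcpy(first_way, second_mem + base, FILL_BYTES);   /* write block into the first way */
    printf("filled %d bytes from 0x%llx\n", FILL_BYTES, (unsigned long long)base);
}

int main(void)
{
    static uint8_t second_mem[1024];
    static uint8_t first_way[FILL_BYTES];
    range_reg_t r = { .start = 0x100, .end = 0x400 };
    on_read_miss(0x140, &r, second_mem, first_way);
    return 0;
}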

Hardware compression and decompression engine

A method and system for compressing and decompressing data is disclosed. A compression command may initiate the prefetching of first data, which may be stored in a first buffer. Multiple words of the first data may be read from the first buffer and used to generate a plurality of compressed packets, each of which includes a command specifying a type of packet. The compressed packets may be combined into a group and multiple groups may be combined and stored in a second buffer. A decompression command may initiate the prefetching of second data, which is stored in the first buffer. A portion of the second data may be read from the first buffer and used to generate a group of compressed packets. Multiple output words may be generated dependent upon the group of compressed packets.
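A small C data-layout sketch of the packet and group structure described above: each compressed packet carries a command identifying its type, and packets are gathered into groups. The packet types, field names, and group size are assumptions; the patent does not specify them in this abstract.

#include <stdint.h>

typedef enum {
    PKT_RAW   = 0,   /* literal word, stored uncompressed       */
    PKT_ZERO  = 1,   /* all-zero word                           */
    PKT_MATCH = 2,   /* word matching a dictionary/recent entry */
} packet_cmd_t;

typedef struct {
    packet_cmd_t cmd;      /* command specifying the type of packet */
    uint32_t     payload;  /* literal bits or match index           */
} compressed_packet_t;

#define PACKETS_PER_GROUP 8

typedef struct {
    compressed_packet_t packets[PACKETS_PER_GROUP];  /* packets combined into a group */
    uint8_t             count;
} packet_group_t;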

Vector prefetching for computing systems
11500779 · 2022-11-15

Described is a computing system for vector prefetching that includes a hierarchical memory with multiple caches, a missing address storage unit (MASU) associated with each cache that stores prefetch requests suffering a cache miss, a prefetcher that sends prefetch requests toward the hierarchical memory, and a vector prefetch unit. The vector prefetch unit determines the existence of a relationship between the cache block associated with a prefetch request and cache blocks associated with one or more entries in a MASU, or a relationship between cache blocks associated with different entries in a MASU, and sends to a higher memory level a vector prefetch request based on the related prefetch requests, including indicators of a starting cache block and a number of related cache blocks, to obtain the data associated with each cache block. The hierarchical memory stores the data received in at least one response message from the higher memory level, if available.
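A C sketch of the coalescing idea: when a new prefetch miss is related (here, contiguous) to blocks already pending in the MASU, a single vector prefetch request carrying a starting cache block and a count is sent upward. The MASU layout, the contiguity-based notion of "related", and all names are assumptions for illustration.

#include <stdint.h>
#include <stdio.h>

#define MASU_ENTRIES 8

typedef struct {
    uint64_t block[MASU_ENTRIES];  /* cache-block numbers of outstanding prefetch misses */
    int      count;
} masu_t;

static void send_vector_prefetch(uint64_t start_block, uint32_t num_blocks)
{
    printf("vector prefetch: start block %llu, %u blocks\n",
           (unsigned long long)start_block, num_blocks);
}

/* On a new prefetch miss, look for a contiguous run with blocks already pending
 * in the MASU and issue one vector request covering the run. */
void on_prefetch_miss(masu_t *m, uint64_t new_block)
{
    uint64_t start = new_block, end = new_block;
    for (int i = 0; i < m->count; i++) {
        if (m->block[i] + 1 == start)      start = m->block[i];  /* extends the run downward */
        else if (m->block[i] == end + 1)   end   = m->block[i];  /* extends the run upward   */
    }
    if (end > start)
        send_vector_prefetch(start, (uint32_t)(end - start + 1));
    else if (m->count < MASU_ENTRIES)
        m->block[m->count++] = new_block;  /* unrelated: record the miss and wait */
}

int main(void)
{
    masu_t m = { .count = 0 };
    on_prefetch_miss(&m, 100);   /* recorded, no relation yet              */
    on_prefetch_miss(&m, 101);   /* coalesced into one vector request      */
    return 0;
}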

Flash memory devices and prefetch methods thereof
11494312 · 2022-11-08

A storage device includes a flash memory array and a controller. The flash memory array stores a plurality of user data. After the controller finishes initialization, the controller accesses the user data stored in the flash memory array according to a plurality of host commands and an H2F mapping table, and records a plurality of address information about the user data in a powered-ON access table.
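A C bookkeeping sketch of the two tables mentioned above: host logical addresses are translated through an H2F (host-to-flash) mapping table, and the address information of each access in the current power cycle is recorded in a powered-ON access table. Table sizes, field names, and the lookup scheme are assumptions for illustration.

#include <stdint.h>
#include <stdio.h>

#define H2F_ENTRIES 1024
#define ACCESS_LOG   256

static uint32_t h2f_table[H2F_ENTRIES];        /* logical page -> flash page                  */
static uint32_t powered_on_access[ACCESS_LOG]; /* logical pages accessed in this power cycle  */
static int      access_count = 0;

uint32_t handle_host_read(uint32_t logical_page)
{
    uint32_t flash_page = h2f_table[logical_page % H2F_ENTRIES];  /* H2F mapping lookup       */
    if (access_count < ACCESS_LOG)
        powered_on_access[access_count++] = logical_page;         /* record address information */
    printf("read logical page %u -> flash page %u\n", logical_page, flash_page);
    return flash_page;
}

int main(void)
{
    handle_host_read(42);
    return 0;
}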

SHARED PREFETCH INSTRUCTION AND SUPPORT

Techniques for shared data prefetch are described. An exemplary instruction for shared data prefetch includes at least one field for an opcode and at least one field for a source operand to provide a memory address of at least a byte of data, wherein the opcode is to indicate that circuitry is to fetch a line of data from memory at the provided address that contains the byte specified with the source operand and store that byte in at least a cache local to a requester, wherein the byte of data is to be stored in a shared state.
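The patent defines a dedicated instruction; as a conventional analog only, the C sketch below issues a read prefetch with the GCC/Clang __builtin_prefetch builtin (rw=0), which asks the cache to install the line containing the addressed byte for reading and, under a MESI-style protocol, typically leaves it in a shared rather than exclusive state. This approximates, rather than implements, the described semantics.

#include <stdio.h>

void shared_prefetch(const void *addr)
{
    __builtin_prefetch(addr, /*rw=*/0, /*locality=*/3);  /* read intent: line fetched for sharing */
}

int main(void)
{
    int data[16] = {0};
    shared_prefetch(&data[0]);   /* warm the line locally before other readers touch it */
    printf("%d\n", data[0]);
    return 0;
}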

TRANSLATION HINTS
20230102006 · 2023-03-30

A hinter data processing apparatus is provided with processing circuitry that determines that an execution context to be executed on a hintee data processing apparatus will require a virtual-to-physical address translation. Hint circuitry transmits a hint to the hintee data processing apparatus to prefetch a virtual-to-physical address translation in respect of an execution context of the hintee data processing apparatus. A hintee data processing apparatus is also provided with receiving circuitry that receives a hint from a hinter data processing apparatus to prefetch a virtual-to-physical address translation in respect of an execution context of the hintee data processing apparatus. Processing circuitry determines whether to follow the hint and, in response to determining that the hint is to be followed, causes the virtual-to-physical address translation to be prefetched for the execution context of the hintee data processing apparatus. In both cases, the hint comprises an identifier of the execution context.
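A message-level C sketch of the hintee side: the hint carries an identifier of the execution context and the virtual address whose translation will be needed, and the receiver may follow the hint by pre-loading the translation (for example, triggering a TLB fill) or ignore it. The message fields, the "busy" policy, and all names are assumptions for illustration.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t context_id;   /* identifier of the execution context            */
    uint64_t virt_addr;    /* virtual address whose translation to prefetch  */
} translation_hint_t;

static void prefetch_translation(uint32_t context_id, uint64_t virt_addr)
{
    printf("prefetch V->P translation for context %u, VA 0x%llx\n",
           context_id, (unsigned long long)virt_addr);  /* e.g., trigger a TLB fill */
}

/* Hints are advisory: the hintee decides whether to follow each one. */
void on_hint_received(const translation_hint_t *h, bool translation_unit_busy)
{
    if (translation_unit_busy)
        return;                                      /* decide not to follow the hint */
    prefetch_translation(h->context_id, h->virt_addr);
}

int main(void)
{
    translation_hint_t h = { .context_id = 7, .virt_addr = 0x7f0000001000ULL };
    on_hint_received(&h, false);
    return 0;
}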