G06F2212/68

IMPLEMENTING MAPPING DATA STRUCTURES TO MINIMIZE SEQUENTIALLY WRITTEN DATA ACCESSES
20230048104 · 2023-02-16 ·

A system includes a memory device, and a processing device, operatively coupled to the memory device, to perform operations including receiving a request to sequentially write data to a block of a memory device, in response to receiving the request, writing the data to the block to obtain sequentially written data, initiating accumulation of logical-to-physical (L2P) mapping data corresponding to the sequentially written data, determining that a criterion for terminating the accumulation of the L2P mapping data is satisfied, in response to determining that the criterion is satisfied, terminating the accumulation of the L2P mapping data to obtain accumulated L2P mapping data, and updating an L2P mapping data structure based on the accumulated L2P mapping data.

METHOD AND APPARATUS FOR VECTOR SORTING USING VECTOR PERMUTATION LOGIC
20230037321 · 2023-02-09 ·

A method for sorting of a vector in a processor is provided that includes performing, by the processor in response to a vector sort instruction, generating a control input vector for vector permutation logic comprised in the processor based on values in lanes of the vector and a sort order for the vector indicated by the vector sort instruction and storing the control input vector in a storage location.

APPARATUS AND METHOD FOR PERFORMING ADDRESS TRANSLATION

An apparatus, system, and method for address translation are provided. Physical address information corresponding to virtual addresses is prefetched and stored, where at least some sequences of the virtual addresses are in a predefined order. The physical address information is prefetched based on identification information provided by a data processing activity, comprising at least a segment identifier and a portion of a virtual address to be translated. The storage has segments of entries, wherein each segment stores physical address information which corresponds to virtual addresses in a predefined order. This predefined order means that it is not necessary to store virtual addresses in the storage. Storage capacity and response speed are therefore gained.

SUPPORTING DATA COMPRESSION USING MATCH SCORING

A processing system is provided that includes a memory for storing an input bit stream and a processing logic, operatively coupled to the memory, to generate a first score based on: a first set of matching data related to a match between a first bit subsequence and a candidate bit subsequence within the input bit stream, and a first distance of the candidate bit subsequence from the first set of matching data. A second score is generated based on a second set of matching data related to a match between a second bit subsequence and the candidate bit subsequence, and a second distance of the candidate bit subsequence from the second set of matching data. A code to replace the first or second bit subsequence in an output bit stream is identified. Selection of the one of the bit subsequences to replace is based on a comparison of the scores.

Method and Apparatus for Shared Virtual Memory to Manage Data Coherency in a Heterogeneous Processing System
20180011792 · 2018-01-11 · ·

One embodiment provides for a heterogeneous computing device comprising a first processor coupled with a second processor, wherein one or more of the first or second processor includes graphics processing logic; wherein each of the first processor and the second processor includes first logic to perform virtual to physical memory address translation; and wherein the first logic includes cache coherency state for a block of memory associated with a virtual memory address.

TECHNOLOGIES FOR ADDRESS TRANSLATION CACHE RESERVATION IN OFFLOAD DEVICES

Techniques for address translation cache (ATC) reservation in offload devices are disclosed. In the illustrative embodiment, a processor of a compute device sends a start ATC reservation descriptor to an offload device. The start ATC reservation descriptor includes an identifier associated with a virtual machine for which at least part of an address translation cache of the offload device should be reserved. The offload device establishes a zone in the ATC of the offload device that is reserved for address translations associated with the identifier. Such cache reservation may be used when, e.g., a priority of a task is high or there is a need for critical or important workload to have lower latency and higher throughput.

Memory address translation

Circuitry comprises a translation lookaside buffer to store memory address translations, each memory address translation being between an input memory address range defining a contiguous range of one or more input memory addresses in an input memory address space and a translated output memory address range defining a contiguous range of one or more output memory addresses in an output memory address space; in which the translation lookaside buffer is configured selectively to store the memory address translations as a cluster of memory address translations, a cluster defining memory address translations in respect of a contiguous set of input memory address ranges by encoding one or more memory address offsets relative to a respective base memory address; memory management circuitry to retrieve data representing memory address translations from a memory, for storage by the translation lookaside buffer, when a required memory address translation is not stored by the translation lookaside buffer; detector circuitry to detect an action consistent with access, by the translation lookaside buffer, to a given cluster of memory address translations; and prefetch circuitry, responsive to a detection of the action consistent with access to a cluster of memory address translations, to prefetch data from the memory representing one or more further memory address translations of a further set of input memory address ranges adjacent to the contiguous set of input memory address ranges for which the given cluster defines memory address translations.

METHOD AND APPARATUS TO SORT A VECTOR FOR A BITONIC SORTING ALGORITHM
20230229448 · 2023-07-20 ·

A method is provided that includes performing, by a processor in response to a vector sort instruction, sorting of values stored in lanes of the vector to generate a sorted vector, wherein the values in a first portion of the lanes are sorted in a first order indicated by the vector sort instruction and the values in a second portion of the lanes are sorted in a second order indicated by the vector sort instruction; and storing the sorted vector in a storage location.

Retaining cache entries of a processor core during a powered-down state

A processor core associated with a first cache initiates entry into a powered-down state. In response, information representing a set of entries of the first cache are stored in a retention region that receives a retention voltage while the processor core is in a powered-down state. Information indicating one or more invalidated entries of the set of entries is also stored in the retention region. In response to the processor core initiating exit from the powered-down state, entries of the first cache are restored using the stored information representing the entries and the stored information indicating the at least one invalidated entry.

Memory Management Unit for Multi-Threaded Architecture

An exemplary multi-threaded memory management system comprises a memory management unit (MMU) configured with a plurality of physical address (PA) output ports individually dedicated to a respective plurality of threads, wherein the MMU is configured to adjust scheduling of the plurality of threads based on the status of an item requested from a cache. The MMU may be configured to translate a virtual address (VA) input from an individual thread to a PA output on the respective PA output port. The cache may be a translation look-aside buffer. The item requested from the cache may be in transient status when a response is expected or valid status when the response is received. The MMU may signal a thread scheduler to run a thread when a requested item's status becomes valid, permitting stalling individual threads without blocking other threads that continue running using the PA output port dedicated to each thread.