G06F2212/654

APPARATUS AND METHOD FOR PERFORMING ADDRESS TRANSLATION

An apparatus, system, and method for address translation are provided. Physical address information corresponding to virtual addresses is prefetched and stored, where at least some sequences of the virtual addresses are in a predefined order. The prefetching is based on identification information provided by a data processing activity, comprising at least a segment identifier and a portion of a virtual address to be translated. The storage is divided into segments of entries, each segment storing physical address information corresponding to virtual addresses in a predefined order. Because of this predefined order, the virtual addresses themselves need not be stored, saving storage capacity and improving response speed.
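
A minimal sketch of how such predefined-order storage can avoid virtual-address tags: the segment identifier and a portion of the virtual address index the storage directly, so no tag comparison is needed. The sizes and names below are illustrative assumptions, not the patent's implementation.

```c
#include <stdint.h>
#include <stdbool.h>

#define SEGMENTS        16
#define ENTRIES_PER_SEG 64   /* consecutive virtual pages per segment (assumed) */

/* Each segment holds physical page numbers for ENTRIES_PER_SEG
 * consecutive virtual pages, so no virtual tag is stored. */
typedef struct {
    uint64_t phys_page[ENTRIES_PER_SEG];
    bool     valid[ENTRIES_PER_SEG];
} segment_t;

static segment_t storage[SEGMENTS];

/* Lookup: the segment id plus the low bits of the virtual page
 * number index the storage directly, because entries are laid out
 * in virtual-address order. */
bool translate(unsigned seg_id, uint64_t vpage, uint64_t *ppage)
{
    unsigned idx = vpage % ENTRIES_PER_SEG;   /* position follows VA order */
    if (seg_id >= SEGMENTS || !storage[seg_id].valid[idx])
        return false;                         /* not prefetched yet */
    *ppage = storage[seg_id].phys_page[idx];
    return true;
}
```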

ZERO-COPY SPARSE MATRIX FACTORIZATION SYNTHESIS FOR HETEROGENEOUS COMPUTE SYSTEMS
20230024035 · 2023-01-26

A system, method, and computer-readable medium for synthesizing zero-copy sparse matrix factorization operations in heterogeneous compute systems are provided. The system includes a host device and an accelerator device. The host device is configured to divide an input matrix into a plurality of blocks, which are transferred to a memory of the accelerator device. The host device is also configured to generate at least one index buffer that includes pointers to the blocks in the accelerator device's memory, where each index buffer represents a frontal matrix associated with a matrix decomposition algorithm. The host device is further configured to receive one or more kernels configured to process the index buffer(s) on the accelerator device. The index buffers are processed by the accelerator device, and the modified block data is written back to a memory of the host device to generate a factorized output matrix.
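
The index-buffer idea can be sketched as follows: the host records the device address of each transferred block and builds, per frontal matrix, an array of those pointers for kernels to consume in place, so no block data is copied while factorization proceeds. `dev_ptr_t` and the helper below are hypothetical stand-ins, not a real device API.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical handle to a matrix block already resident in
 * accelerator memory (e.g. returned by a device allocator). */
typedef void *dev_ptr_t;

/* One index buffer per frontal matrix: pointers into device
 * memory, so kernels touch the blocks without copying them. */
typedef struct {
    dev_ptr_t *blocks;   /* device addresses of the blocks of this front */
    size_t     nblocks;
} index_buffer_t;

/* Build an index buffer for one frontal matrix from the ids of the
 * blocks it touches. dev_blocks[] holds the device address recorded
 * when the host transferred each block. */
index_buffer_t make_index_buffer(dev_ptr_t dev_blocks[],
                                 const size_t block_ids[], size_t n)
{
    index_buffer_t ib = { malloc(n * sizeof(dev_ptr_t)), n };
    for (size_t i = 0; i < n; i++)
        ib.blocks[i] = dev_blocks[block_ids[i]];
    return ib;
}
```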

Memory address translation

Circuitry comprises a translation lookaside buffer (TLB) to store memory address translations, each translation being between an input memory address range defining a contiguous range of one or more input memory addresses in an input memory address space and a translated output memory address range defining a contiguous range of one or more output memory addresses in an output memory address space. The TLB is configured to selectively store the memory address translations as clusters, a cluster defining memory address translations for a contiguous set of input memory address ranges by encoding one or more memory address offsets relative to a respective base memory address. Memory management circuitry retrieves data representing memory address translations from a memory, for storage by the TLB, when a required translation is not stored by the TLB. Detector circuitry detects an action consistent with access, by the TLB, to a given cluster of memory address translations. Prefetch circuitry, responsive to such a detection, prefetches data from the memory representing one or more further memory address translations for a further set of input memory address ranges adjacent to the contiguous set of input memory address ranges covered by the given cluster.
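
A rough sketch, under assumed field sizes, of a clustered entry and the access detection that drives prefetching of the adjacent ranges. Treating "access near the end of the cluster" as the trigger action is an assumption for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define CLUSTER_SIZE 4   /* contiguous input ranges per cluster (assumed) */

/* One cluster covers CLUSTER_SIZE contiguous input ranges; each
 * translation is stored as a small offset from a shared base
 * output address instead of a full output address. */
typedef struct {
    uint64_t base_in;                 /* first input page of the cluster */
    uint64_t base_out;                /* base output page */
    int16_t  offset[CLUSTER_SIZE];    /* output = base_out + offset[i] */
    bool     valid[CLUSTER_SIZE];
} tlb_cluster_t;

bool cluster_lookup(const tlb_cluster_t *c, uint64_t in_page,
                    uint64_t *out_page, bool *prefetch_adjacent)
{
    uint64_t i = in_page - c->base_in;    /* wraps huge if below base */
    if (i >= CLUSTER_SIZE || !c->valid[i])
        return false;
    *out_page = c->base_out + c->offset[i];
    /* Detector: an access at the last range of the cluster suggests
     * the adjacent set of ranges will be needed next. */
    *prefetch_adjacent = (i == CLUSTER_SIZE - 1);
    return true;
}
```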

Retaining cache entries of a processor core during a powered-down state

A processor core associated with a first cache initiates entry into a powered-down state. In response, information representing a set of entries of the first cache is stored in a retention region that receives a retention voltage while the processor core is in the powered-down state. Information indicating one or more invalidated entries of the set is also stored in the retention region. In response to the processor core initiating exit from the powered-down state, entries of the first cache are restored using the stored information representing the entries and the stored information indicating the invalidated entries.
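
A sketch of the retain-and-restore flow, assuming a simple invalidation bitmap alongside the saved entries; the layout of the retention region is illustrative.

```c
#include <stdint.h>
#include <string.h>

#define NENTRIES 256

typedef struct { uint64_t tag, data; } cache_entry_t;

/* Retention region: powered by the retention voltage while the
 * core is down, so its contents survive the powered-down state. */
static struct {
    cache_entry_t entries[NENTRIES];
    uint8_t invalid[NENTRIES / 8];   /* bitmap of entries invalidated
                                        while the core was down */
} retention;

void on_power_down(const cache_entry_t cache[NENTRIES])
{
    memcpy(retention.entries, cache, sizeof(retention.entries));
    memset(retention.invalid, 0, sizeof(retention.invalid));
}

/* An invalidation arriving during the powered-down state only marks
 * the bitmap; the saved entry itself stays in place. */
void on_invalidate(unsigned i) { retention.invalid[i / 8] |= 1u << (i % 8); }

void on_power_up(cache_entry_t cache[NENTRIES])
{
    for (unsigned i = 0; i < NENTRIES; i++)
        if (!(retention.invalid[i / 8] & (1u << (i % 8))))
            cache[i] = retention.entries[i];   /* restore valid entries only */
}
```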

Performing speculative address translation in processor-based devices

Performing speculative address translation in processor-based devices is disclosed herein. In one exemplary embodiment, a processor-based device provides a processing element (PE) that defines a speculative translation instruction, such as an enqueue instruction for offloading operations to a peripheral device. The speculative translation instruction references a plurality of bytes including one or more virtual memory addresses. After receiving the speculative translation instruction, an instruction decode stage of an execution pipeline circuit of the PE transmits a request for address translation of the virtual memory address to a memory management unit (MMU) of the PE. The MMU then performs speculative address translation of the virtual memory address into a corresponding translated memory address. In some embodiments, any address translation errors encountered are raised to an appropriate exception level, and may be raised synchronously or asynchronously with respect to an operation performed when the speculative translation instruction is executed.
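
A sketch of the decode-stage behavior, with a stub MMU standing in for the real translation hardware; faults are only recorded here, to be raised synchronously or asynchronously later. All names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint64_t pa; bool fault; int fault_level; } xlate_t;

/* Stub MMU standing in for the real hardware walk: identity-map,
 * fault on a null address. */
static xlate_t mmu_translate(uint64_t va)
{
    xlate_t r = { va, va == 0, va == 0 ? 1 : -1 };
    return r;
}

/* Decode-stage handling of a speculative translation instruction
 * (e.g. an enqueue of work for a peripheral): the referenced virtual
 * addresses are sent to the MMU before the operation itself runs. */
void decode_speculative_xlate(const uint64_t vas[], size_t n,
                              uint64_t pas[], int pending_fault_level[])
{
    for (size_t i = 0; i < n; i++) {
        xlate_t r = mmu_translate(vas[i]);
        pas[i] = r.fault ? 0 : r.pa;
        pending_fault_level[i] = r.fault_level;  /* -1 means no fault */
    }
}
```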

LEVEL-AWARE CACHE REPLACEMENT
20230012880 · 2023-01-19

An electronic device includes one or more processors and a cache that stores data entries. The electronic device transmits a request for translation of a first address to the cache. In accordance with a determination that the request is not satisfied by the data entries in the cache, the electronic device transmits the request to memory that is distinct from the cache, and receives data including a second address corresponding to the first address. In accordance with a determination that the data does not satisfy cache promotion criteria, the electronic device replaces an entry at a first priority level in the cache with the data. In accordance with a determination that the data satisfies the cache promotion criteria, the electronic device instead replaces an entry at a second, higher priority level in the cache with the data including the second address.
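
One plausible reading, sketched below with an assumed promotion criterion: translations fetched from deeper (slower) page-table levels are costlier to re-fetch, so they are inserted at the higher priority level. The criterion and the replacement policy are illustrative assumptions.

```c
#include <stdint.h>
#include <stdbool.h>

enum { LOW_PRIORITY = 0, HIGH_PRIORITY = 1 };

typedef struct { uint64_t va, pa; int priority; bool valid; } entry_t;

/* Assumed promotion criterion: deeper page-table levels are more
 * expensive to walk again, so promote those translations. */
static bool meets_promotion_criteria(int pagetable_level)
{
    return pagetable_level >= 3;
}

/* On a miss the translation comes back from memory; choose which
 * priority level's victim it replaces. */
void fill_after_miss(entry_t set[], int nways,
                     uint64_t va, uint64_t pa, int level)
{
    int target = meets_promotion_criteria(level) ? HIGH_PRIORITY
                                                 : LOW_PRIORITY;
    for (int w = 0; w < nways; w++) {
        if (!set[w].valid || set[w].priority == target) {
            set[w] = (entry_t){ va, pa, target, true };
            return;
        }
    }
    set[0] = (entry_t){ va, pa, target, true };  /* fallback victim */
}
```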

Method, system, and apparatus for supporting multiple address spaces to facilitate data movement

Methods, systems, and apparatuses provide support for multiple address spaces in order to facilitate data movement. One apparatus includes an input/output memory management unit (IOMMU) comprising a plurality of memory-mapped input/output (MMIO) registers that map memory address spaces belonging to the IOMMU and to at least a second IOMMU, together with hardware control logic. The hardware control logic is operative to: synchronize the plurality of MMIO registers with those of the at least second IOMMU; receive, from a peripheral component endpoint coupled to the IOMMU, a direct memory access (DMA) request directed to a memory address space belonging to the at least second IOMMU; access the plurality of MMIO registers of the IOMMU based on context data of the DMA request; and access, from the IOMMU, a function assigned to the memory address space belonging to the at least second IOMMU based on the accessed MMIO registers.
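
A sketch of the synchronized-register routing, with illustrative base/limit registers standing in for the MMIO layout: because every IOMMU holds a synchronized copy of every IOMMU's address-space mapping, a DMA request arriving at one IOMMU can be routed to a space owned by another using only local state.

```c
#include <stdint.h>

#define NIOMMUS 2

/* Illustrative MMIO view: each IOMMU keeps a copy of the base/limit
 * of every IOMMU's address space. Field names are assumptions. */
typedef struct { uint64_t base, limit; } mmio_map_t;

static mmio_map_t mmio[NIOMMUS][NIOMMUS];  /* [local iommu][owner] */

/* Synchronize one IOMMU's register copies from another's. */
void sync_mmio(int src, int dst)
{
    for (int i = 0; i < NIOMMUS; i++)
        mmio[dst][i] = mmio[src][i];
}

/* Route a DMA request: find which IOMMU's space the address falls
 * in, using only the local (synchronized) register copies. */
int route_dma(int local, uint64_t dma_addr)
{
    for (int owner = 0; owner < NIOMMUS; owner++)
        if (dma_addr >= mmio[local][owner].base &&
            dma_addr <  mmio[local][owner].limit)
            return owner;   /* access the function assigned there */
    return -1;              /* no mapped space */
}
```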

Support for encrypted memory in nested virtual machines
11550612 · 2023-01-10

A method includes receiving a memory access request comprising a first memory address and translating the first memory address to a second memory address using a first page table associated with a first virtual machine. The first page table indicates whether the memory of the first virtual machine is encrypted. The method further includes determining that the first virtual machine is nested within a second virtual machine and translating the second memory address to a third memory address using a second page table associated with the second virtual machine. The second page table indicates whether the memory of the second virtual machine is encrypted.
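
The two-stage walk can be sketched as follows, with an encryption flag carried by each virtual machine's page table; the table representation is an assumption made for the sketch.

```c
#include <stdint.h>
#include <stdbool.h>

/* Per-VM page table for the sketch: each walk returns the translated
 * address and reports whether that VM's memory is encrypted. */
typedef struct {
    uint64_t (*walk)(uint64_t addr, bool *encrypted);
} page_table_t;

/* Nested translation: nested-VM address -> enclosing-VM address ->
 * final address, consulting the encryption flag at each stage. */
uint64_t translate_nested(const page_table_t *pt_first,
                          const page_table_t *pt_second,
                          uint64_t addr, bool *any_encrypted)
{
    bool enc1, enc2;
    uint64_t a2 = pt_first->walk(addr, &enc1);   /* first VM's table */
    uint64_t a3 = pt_second->walk(a2, &enc2);    /* enclosing VM's table */
    *any_encrypted = enc1 || enc2;
    return a3;
}
```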

Prefetch mechanism for a cache structure

An apparatus and method are provided, the apparatus comprising a processor pipeline to execute instructions, a cache structure to store information for reference by the processor pipeline when executing said instructions, and prefetch circuitry to issue prefetch requests to the cache structure to cause it to prefetch information in anticipation of a demand request for that information being issued to the cache structure by the processor pipeline. The processor pipeline is arranged to issue a trigger to the prefetch circuitry on detection of a given event that will result in a reduced level of demand requests being issued by the processor pipeline, and the prefetch circuitry is configured to control the issuing of prefetch requests in dependence on reception of the trigger.
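
A sketch of the trigger mechanism, assuming the prefetcher simply widens its in-flight budget while the trigger is latched; the specific policy is an illustrative assumption.

```c
#include <stdbool.h>

/* When the pipeline detects an event that will lower its
 * demand-request rate, it latches a trigger in the prefetcher,
 * which then uses the idle cache bandwidth more aggressively. */
typedef struct {
    bool demand_reduced;        /* latched trigger from the pipeline */
    int  inflight, max_inflight;
} prefetcher_t;

void pipeline_trigger(prefetcher_t *pf) { pf->demand_reduced = true; }

bool may_issue_prefetch(const prefetcher_t *pf)
{
    /* With the trigger set, allow the full prefetch budget;
     * otherwise stay conservative to avoid contending with
     * demand requests for cache bandwidth. */
    int budget = pf->demand_reduced ? pf->max_inflight
                                    : pf->max_inflight / 4;
    return pf->inflight < budget;
}
```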

TLB device supporting multiple data streams and updating method for TLB module

Aspects of managing Translation Lookaside Buffer (TLB) units are described herein. The aspects may include a memory management unit (MMU) that includes one or more TLB units and a control unit. The control unit may be configured to identify one of the one or more TLB units based on a stream identification (ID) included in a received virtual address and, further, to identify a frame number in the identified TLB unit. A physical address may be generated by the control unit based on the frame number and an offset included in the virtual address.
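
A sketch of the stream-ID lookup, under an assumed virtual-address layout with the stream ID in the top four bits; the field widths and fully associative search are illustrative assumptions.

```c
#include <stdint.h>
#include <stdbool.h>

#define NTLBS      4
#define PAGE_SHIFT 12

typedef struct { uint64_t vpn, frame; bool valid; } tlb_entry_t;
typedef struct { tlb_entry_t e[64]; } tlb_t;

static tlb_t tlbs[NTLBS];   /* one TLB unit per data stream */

/* Assumed VA layout: [63:60] stream ID | virtual page | page offset.
 * The control unit picks a TLB by stream ID, finds the frame number,
 * and concatenates it with the page offset. */
bool mmu_lookup(uint64_t va, uint64_t *pa)
{
    unsigned stream = (va >> 60) & (NTLBS - 1);        /* stream ID field */
    uint64_t vpn    = (va << 4) >> (4 + PAGE_SHIFT);   /* drop ID and offset */
    uint64_t offset = va & ((1u << PAGE_SHIFT) - 1);

    tlb_t *t = &tlbs[stream];                          /* identified TLB unit */
    for (unsigned i = 0; i < 64; i++)
        if (t->e[i].valid && t->e[i].vpn == vpn) {
            *pa = (t->e[i].frame << PAGE_SHIFT) | offset;
            return true;
        }
    return false;   /* TLB miss */
}
```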