G06F2212/681

METHOD AND APPARATUS FOR DETECTING ATS-BASED DMA ATTACK
20220114107 · 2022-04-14

Embodiments are directed to providing a secure address translation service. An embodiment of a system includes a computer-readable memory for storage of data, the computer-readable memory comprising a first memory buffer and a second memory buffer, and an attack discovery unit device comprising processing circuitry to perform operations comprising: receiving a direct memory access (DMA) request from a remote device via a Peripheral Component Interconnect Express (PCIe) link, the DMA request comprising a host physical address and a header indicating that the target memory address has previously been translated to a host physical address (HPA); and blocking the direct memory access in response to a determination of at least one of the following: that the remote device has not obtained a valid address translation from a translation agent, or that the remote device has not obtained a valid translation for the target memory address from the translation agent.
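The blocking rule in this abstract can be sketched as a small model. This is an illustration only, not the claimed circuitry: the class name `AttackDiscoveryUnit`, the per-device set of granted pages, the 4 KiB page size, and the choice to pass untranslated requests through are all assumptions made for the example.

```python
from dataclasses import dataclass, field

PAGE_SHIFT = 12  # assumption: 4 KiB pages


@dataclass
class AttackDiscoveryUnit:
    # Hypothetical bookkeeping: device id -> set of host-physical page
    # numbers that the translation agent has returned to that device.
    granted: dict = field(default_factory=dict)

    def record_translation(self, device_id, hpa):
        """Note that the translation agent gave `device_id` a translation."""
        self.granted.setdefault(device_id, set()).add(hpa >> PAGE_SHIFT)

    def allow_dma(self, device_id, hpa, pretranslated):
        """Return False when a pre-translated DMA request must be blocked."""
        if not pretranslated:
            # Assumption: untranslated requests follow the normal
            # translation path, which this sketch does not model.
            return True
        pages = self.granted.get(device_id)
        if not pages:
            return False  # device never obtained any valid translation
        # Block if this particular target address was never translated.
        return (hpa >> PAGE_SHIFT) in pages
```

A device that claims "already translated" for a page it was never given is rejected, as is a device with no recorded translations at all.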

Range based flushing mechanism

An apparatus to facilitate memory flushing is disclosed. The apparatus comprises a cache memory; one or more processing resources; tracker hardware to dispatch workloads for execution at the processing resources and to monitor the workloads to track completion of the execution; range based flush (RBF) hardware to process RBF commands and generate a flush indication to flush data from the cache memory; and a flush controller to receive the flush indication and perform a flush operation to discard data from the cache memory at an address range provided in the flush indication.
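A rough software analogy of the range based flush flow follows. The abstract does not say how workload completion and RBF command processing are linked, so the coupling here (a flush is issued when the tracked workload completes) is an assumption, as are the names `FlushController`, `RangeBasedFlush`, and the dictionary standing in for the cache.

```python
class FlushController:
    """Receives a flush indication and discards cached data in its range."""

    def __init__(self):
        self.cache = {}  # address -> cached data (stand-in for cache lines)

    def flush(self, start, end):
        # Discard every cached entry whose address lies in [start, end).
        victims = [addr for addr in self.cache if start <= addr < end]
        for addr in victims:
            del self.cache[addr]
        return len(victims)


class RangeBasedFlush:
    """Holds RBF commands until the workload they belong to has completed."""

    def __init__(self, controller):
        self.controller = controller
        self.pending = {}  # workload id -> list of (start, end) ranges

    def submit(self, workload_id, start, end):
        self.pending.setdefault(workload_id, []).append((start, end))

    def on_workload_complete(self, workload_id):
        # Assumption: completion reported by the tracker triggers the flush.
        flushed = 0
        for start, end in self.pending.pop(workload_id, []):
            flushed += self.controller.flush(start, end)
        return flushed
```

Only addresses inside the submitted range are discarded; entries outside it stay cached.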

METHOD AND APPARATUS FOR REDUCING THE LATENCY OF LONG LATENCY MEMORY REQUESTS
20220091986 · 2022-03-24

Systems, apparatuses, and methods for efficiently processing memory requests are disclosed. A computing system includes at least one processing unit coupled to a memory. Circuitry in the processing unit determines that a memory request has become a long-latency request upon detecting that a translation lookaside buffer (TLB) miss, a branch misprediction, a memory dependence misprediction, or a precise exception has occurred. The circuitry marks the memory request as a long-latency request, for example by storing an indication of a long-latency request in an instruction tag of the memory request. The circuitry uses weighted criteria for scheduling out-of-order issue and servicing of memory requests. However, the indication of a long-latency request is not combined with other criteria in a weighted sum. Rather, the indication of the long-latency request is a separate value. The circuitry prioritizes memory requests marked as long-latency requests over memory requests not marked as long-latency requests.
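The distinction between a weighted sum and a separate long-latency indication can be illustrated with a two-part sort key. This is a sketch under assumptions: the criteria (`age`, `is_load`) and their weights are invented for the example and do not come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class MemRequest:
    name: str
    age: int
    is_load: bool
    long_latency: bool = False  # models the indication in the instruction tag


# Hypothetical weights for the ordinary scheduling criteria.
WEIGHTS = {"age": 1.0, "is_load": 4.0}


def weighted_score(req):
    """Weighted sum of the ordinary criteria; the flag is NOT part of it."""
    return WEIGHTS["age"] * req.age + WEIGHTS["is_load"] * req.is_load


def issue_order(requests):
    # The long-latency flag is compared first, as a separate value.
    # The weighted score only orders requests within each group.
    return sorted(requests, key=lambda r: (not r.long_latency, -weighted_score(r)))
```

Because the flag is compared before the score, a marked request is issued ahead of an unmarked one regardless of how large the unmarked request's weighted sum is.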

TECHNIQUES TO IMPROVE TRANSLATION LOOKASIDE BUFFER REACH BY LEVERAGING IDLE RESOURCES

Techniques are disclosed for processing address translations. The techniques include: detecting a first miss for a first address translation request for a first address translation in a first translation lookaside buffer; in response to the first miss, fetching the first address translation into the first translation lookaside buffer and evicting a second address translation from the first translation lookaside buffer into an instruction cache or local data share memory; detecting a second miss in the first translation lookaside buffer for a second address translation request referencing the second address translation; and in response to the second miss, fetching the second address translation from the instruction cache or the local data share memory.
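The eviction-and-refetch sequence can be modelled as a TLB backed by a spill area. This is a simplified sketch: the LRU replacement policy, the unbounded `spare` dictionary standing in for idle instruction cache or local data share capacity, and the class name are assumptions, not details from the abstract.

```python
from collections import OrderedDict


class VictimBackedTLB:
    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table  # virtual page -> physical page
        self.tlb = OrderedDict()      # assumption: LRU replacement
        self.spare = {}               # stand-in for idle I-cache / LDS space
        self.walks = 0                # counts full page-table lookups

    def translate(self, vpn):
        if vpn in self.tlb:
            self.tlb.move_to_end(vpn)
            return self.tlb[vpn]
        # TLB miss: try the spilled translations before a page walk.
        if vpn in self.spare:
            ppn = self.spare.pop(vpn)
        else:
            self.walks += 1
            ppn = self.page_table[vpn]
        if len(self.tlb) >= self.capacity:
            # Evict the least recently used entry into the spare memory
            # instead of dropping it.
            old_vpn, old_ppn = self.tlb.popitem(last=False)
            self.spare[old_vpn] = old_ppn
        self.tlb[vpn] = ppn
        return ppn
```

With a one-entry TLB, translating page 1, then page 2, then page 1 again costs two page walks rather than three, because the second miss on page 1 is served from the spare memory.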

Quality of service for input/output memory management unit

A data processing system includes a memory, a group of input/output (I/O) devices, and an input/output memory management unit (IOMMU). The IOMMU is connected to the memory and adapted to allocate a hardware resource from among a group of hardware resources to receive an address translation request for a memory access from an I/O device. The IOMMU detects address translation requests from the group of I/O devices. The IOMMU reorders the address translation requests such that the order of dispatching an address translation request is based on a policy associated with the I/O device that is requesting the memory access. The IOMMU selectively allocates a hardware resource to the I/O device, based on the policy that is associated with the I/O device, in response to the reordering.
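One way to picture policy-based reordering is a priority queue in front of a limited pool of translation resources. The sketch below assumes the policy is a single integer priority per device (lower is served first) and that ties keep arrival order; the abstract does not specify either, and the names are hypothetical.

```python
import heapq
from itertools import count


class QosIommu:
    def __init__(self, policies, num_resources):
        self.policies = policies   # hypothetical: device -> priority (lower first)
        self.free = num_resources  # hardware resources still unallocated
        self.queue = []            # heap of (priority, arrival, device, address)
        self.arrival = count()     # tie-breaker that preserves arrival order

    def request(self, device, address):
        """Queue an address translation request from an I/O device."""
        entry = (self.policies[device], next(self.arrival), device, address)
        heapq.heappush(self.queue, entry)

    def dispatch(self):
        """Allocate free resources to queued requests in policy order."""
        granted = []
        while self.free > 0 and self.queue:
            _, _, device, address = heapq.heappop(self.queue)
            self.free -= 1
            granted.append((device, address))
        return granted
```

With two resources and three queued requests, a later request from a higher-priority device is dispatched before earlier requests from a lower-priority one, and the remaining request waits.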

APPARATUS AND METHOD FOR EFFICIENT PROCESS-BASED COMPARTMENTALIZATION
20210311883 · 2021-10-07

An apparatus and method for efficient process-based compartmentalization. For example, one embodiment of a processor comprises: execution circuitry to execute instructions and process data; memory management circuitry coupled to the execution circuitry, the memory management circuitry to manage access to a system memory by a plurality of related processes using one or more process-specific translation structures and one or more shared translation structures to be shared by the related processes; and one or more control registers to store a process-specific base address pointer associated with a first process of the plurality of related processes and to store a shared base address pointer to identify the shared translation structures; wherein the memory management circuitry is to use the process-specific base address pointer in combination with a first linear address provided by the first process to walk the process-specific translation structures to identify any permissions and/or physical address associated with the first linear address, wherein if permissions are identified, the memory management circuitry is to use the permissions in place of any permissions specified in the shared translation structures.
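The permission-override rule can be shown with flat lookup tables in place of multi-level translation structures. This is a sketch only: the dictionary layout, the `frame` and `perms` keys, and the function name are assumptions chosen for illustration.

```python
def translate(page, process_table, shared_table):
    """Resolve a page using process-specific entries first.

    Both tables map a page number to a dict that may hold "frame"
    and/or "perms" (hypothetical layout). Permissions found in the
    process-specific table replace those in the shared table.
    """
    specific = process_table.get(page, {})
    shared = shared_table.get(page, {})
    frame = specific.get("frame", shared.get("frame"))
    if "perms" in specific:
        perms = specific["perms"]  # process-specific permissions win
    else:
        perms = shared.get("perms")
    return frame, perms
```

Two related processes can then share one mapping while seeing different permissions: a process whose own table marks the page read-only gets `"r"`, while a process with no entry falls back to the shared `"rw"`.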

Facilitating access to memory locality domain information

Processing within a computing environment is facilitated by ascertaining locality domain information of a unit of memory to processing capability within the computing environment. Once ascertained, the locality domain information of the unit of memory may be cached in a data structure to facilitate one or more subsequent lookups of the locality domain information associated with one or more affinity evaluations of the unit of memory to processing capability of the computing environment.
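The caching step amounts to memoizing a slow locality lookup. In the sketch below the slow path is a caller-supplied function, and the class and method names are hypothetical; the abstract does not describe how the locality domain is ascertained or how the data structure is organized.

```python
class LocalityCache:
    def __init__(self, lookup):
        self.lookup = lookup    # hypothetical slow path: memory unit -> domain
        self.cache = {}         # memory unit -> cached locality domain
        self.slow_lookups = 0

    def domain_of(self, unit):
        """Return the locality domain of a memory unit, caching the result."""
        if unit not in self.cache:
            self.slow_lookups += 1
            self.cache[unit] = self.lookup(unit)
        return self.cache[unit]

    def is_local(self, unit, cpu_domain):
        """Affinity evaluation: is the unit in the same domain as the CPU?"""
        return self.domain_of(unit) == cpu_domain
```

Repeated affinity evaluations of the same memory unit then hit the cache instead of repeating the slow lookup.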

Processors, methods, systems, and instructions to load multiple data elements to destination storage locations other than packed data registers

A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers. The execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that are each to correspond to a different one of the plurality of memory address information data elements, and store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.
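The data movement can be sketched as a gather whose result lands in memory rather than in a packed register. The flat list standing in for memory, the contiguous destination region, and the function name are assumptions; the abstract only says the destination is not one of the packed data registers.

```python
def gather_to_memory(memory, address_vector, dest):
    """Load one element per address in `address_vector` and store the
    results contiguously at `dest` in the same memory (assumed layout)."""
    # Load everything first so a destination that overlaps a source
    # address cannot change the values being gathered.
    loaded = [memory[addr] for addr in address_vector]
    memory[dest:dest + len(loaded)] = loaded
    return loaded
```

Here `address_vector` plays the role of the source packed register holding the memory address information data elements.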

ADDRESS TRANSLATION PREFETCHING FOR INPUT/OUTPUT DEVICES
20230401160 · 2023-12-14

In one example of the present technology, an input/output memory management unit (IOMMU) of a computing device is configured to: receive a prefetch message including a virtual address from a central processing unit (CPU) core of a processor of the computing device; perform a page walk on the virtual address through a page table stored in a main memory of the computing device to obtain a prefetched translation of the virtual address to a physical address; and store the prefetched translation of the virtual address to the physical address in a translation lookaside buffer (TLB) of the IOMMU.
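The effect of the prefetch message can be seen in a toy IOMMU model. The single-level page table, the 4 KiB page size, the unbounded TLB, and the method names are assumptions made to keep the sketch short.

```python
PAGE_SHIFT = 12                    # assumption: 4 KiB pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class Iommu:
    def __init__(self, page_table):
        self.page_table = page_table  # virtual page -> physical page
        self.tlb = {}                 # IOMMU translation lookaside buffer
        self.walks = 0

    def _page_walk(self, vpn):
        self.walks += 1
        return self.page_table[vpn]

    def prefetch(self, vaddr):
        """Handle a prefetch message from a CPU core: walk and fill the TLB."""
        vpn = vaddr >> PAGE_SHIFT
        if vpn not in self.tlb:
            self.tlb[vpn] = self._page_walk(vpn)

    def translate(self, vaddr):
        """Translate a device access; returns (physical address, tlb_hit)."""
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & OFFSET_MASK
        hit = vpn in self.tlb
        if not hit:
            self.tlb[vpn] = self._page_walk(vpn)
        return (self.tlb[vpn] << PAGE_SHIFT) | offset, hit
```

If the CPU core sends the prefetch before the device issues its access, the later translation is a TLB hit and no page walk sits on the device's critical path.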

APPARATUS AND METHOD FOR EFFICIENT PROCESS-BASED COMPARTMENTALIZATION
20210200687 · 2021-07-01

An apparatus and method for efficient process-based compartmentalization. For example, one embodiment of a processor comprises: execution circuitry to execute instructions and process data; memory management circuitry coupled to the execution circuitry, the memory management circuitry to manage access to a system memory by a plurality of related processes using one or more process-specific translation structures and one or more shared translation structures to be shared by the related processes; and one or more control registers to store a process-specific base address pointer associated with a first process of the plurality of related processes and to store a shared base address pointer to identify the shared translation structures; wherein the memory management circuitry is to use the process-specific base address pointer in combination with a first linear address provided by the first process to walk the process-specific translation structures to identify any permissions and/or physical address associated with the first linear address, wherein if permissions are identified, the memory management circuitry is to use the permissions in place of any permissions specified in the shared translation structures.