G06F2212/682

DYNAMICAL SWITCHING BETWEEN EPT AND SHADOW PAGE TABLES FOR RUNTIME PROCESSOR VERIFICATION
20200097313 · 2020-03-26 ·

Implementations disclosed describe a system and a method to execute a virtual machine on a processing device, receive a request to access a memory page identified by a guest virtual memory address (GVA) in an address space of the virtual machine, translate the GVA to a guest physical memory address (GPA) using a guest page table (GPT) comprising a GPT entry mapping the GVA to the GPA, translate the GPA to a host physical address (HPA) of the memory page, store, in a translation lookaside buffer (TLB), a TLB entry mapping the GVA to the HPA, modify the GPT entry to designate the memory page as accessed, detect an attempt by an application to modify the GPT entry; generate, in response to the attempt to modify the GPT entry, a page fault; and flush, in response to the page fault, the TLB entry.

Nested hypervisor memory virtualization

This disclosure generally relates to hypervisor memory virtualization. In an example, multiple page table stages may be used to provide a page table that may be used by a processor when processing a workload for a nested virtual machine. An intermediate (e.g., nested) hypervisor may request an additional page table stage from a parent hypervisor, which may be used to virtualize memory for one or more nested virtual machines managed by the intermediate hypervisor. Accordingly, a processor may use the additional page table stages to ultimately translate a virtual memory address for a nested virtual machine to a physical memory address.

Range-based memory system

A mechanism is provided for efficient coherence state modification of cached data stored in a range of addresses in a coherent data processing system in which data coherency is maintained across multiple caches. A tag search structure is maintained that identifies address tags and coherence states of cached data indexed by address tags. In response to a request from a device internal to or external from the coherence network, the tag search structure is searched to identify address tags of cached data for which the coherence state is to be modified and requests are issued in the data processing system to modify a coherence state of cached lines with the identified address tags. The request from the external device may specify a range of addresses for which a coherence state change is sought. The tag search structure may be implemented as search tree, for example.

Effective address based instruction fetch unit for out of order processors

Aspects of the invention include a computer-implemented method for executing one or more instructions by a processing unit. The method includes receiving, by an instruction fetch unit (IFU), a request to fetch an instruction for execution, wherein the instruction includes an effective address (EA). The IFU can further access an instruction cache directory (I-directory) using the EA of the requested instruction to determine whether the EA of the requested instruction matches an EA stored in an associated instruction cache (I-cache). An instruction cache (I-cache) can output the requested instruction in response to or based at least in part on determining that the requested instruction EA matches an entry in the I-cache. A decode unit can decode the requested instruction output by the I-cache.

Dynamically adapting mechanism for translation lookaside buffer shootdowns

An operating system (OS) of a processing system having a plurality of processor cores determines a cost associated with different mechanisms for performing a translation lookaside buffer (TLB) shootdown in response to, for example, a virtual address being remapped to a new physical address, and selects a TLB shootdown mechanism to purge outdated or invalid address translations from the TLB based on the determined cost. In some embodiments, the OS selects an inter-processor interrupt (IPI) as the TLB shootdown mechanism if the cost associated with sending an IPI is less than a threshold cost. In some embodiments, the OS compares the cost of using an IPI as the TLB shootdown mechanism versus the cost of sending a hardware broadcast to all processor cores of the processing system as the shootdown mechanism and selects the shootdown mechanism having the lower cost.

Method and apparatus for an efficient TLB lookup

The present disclosure relates to a method of operating a translation lookaside buffer (TLB) arrangement for a processor supporting virtual addressing, wherein multiple translation engines are used to perform translations on request of one of a plurality of dedicated processor units. The method comprises: maintaining by a cache unit a dependency matrix for the engines to track for each processing unit if an engine is assigned to the each processing unit for a table walk. The cache unit may block a processing unit from allocating an engine to a translation request when the engine is already assigned to the processing unit in the dependency matrix.

APPARATUS, METHODS, AND SYSTEMS FOR LOW LATENCY COMMUNICATION IN A CONFIGURABLE SPATIAL ACCELERATOR
20200004690 · 2020-01-02 ·

Systems, methods, and apparatuses relating to low latency communications in a configurable spatial accelerator are described. In one embodiment, a processor includes a spatial array of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, a plurality of request address file circuits coupled to the spatial array of processing elements and a cache memory, each request address file circuit of the plurality of request address file circuits to access data in the cache memory in response to a request for data access from the spatial array of processing elements, a plurality of translation lookaside buffers comprising a translation lookaside buffer in each of the plurality of request address file circuits to provide an output of a physical address for an input of a virtual address, and a function controller to receive an interrupt that includes a first field, that when set to a first value, causes a shootdown message to be broadcast to the plurality of translation lookaside buffers to cause a shootdown in the plurality of translation lookaside buffers

Method and apparatus for shared virtual memory to manage data coherency in a heterogeneous processing system
11921635 · 2024-03-05 · ·

Embodiments described herein provide a scalable coherency tracking implementation that utilizes shared virtual memory to manage data coherency. In one embodiment, coherency tracking granularity is reduced relative to existing coherency tracking solutions, with coherency tracking storage memory moved to memory as a page table metadata. For example and in one embodiment, storage for coherency state is moved from dedicated hardware blocks to system memory, effectively providing a directory structure that is limitless in size.

DYNAMICALLY ADAPTING MECHANISM FOR TRANSLATION LOOKASIDE BUFFER SHOOTDOWNS
20190377688 · 2019-12-12 ·

An operating system (OS) of a processing system having a plurality of processor cores determines a cost associated with different mechanisms for performing a translation lookaside buffer (TLB) shootdown in response to, for example, a virtual address being remapped to a new physical address, and selects a TLB shootdown mechanism to purge outdated or invalid address translations from the TLB based on the determined cost. In some embodiments, the OS selects an inter-processor interrupt (IPI) as the TLB shootdown mechanism if the cost associated with sending an IPI is less than a threshold cost. In some embodiments, the OS compares the cost of using an IPI as the TLB shootdown mechanism versus the cost of sending a hardware broadcast to all processor cores of the processing system as the shootdown mechanism and selects the shootdown mechanism having the lower cost.

System, apparatus and method for user space object coherency in a processor

In an embodiment, a processor comprises: an execution circuit to execute instructions; at least one cache memory coupled to the execution circuit; and a table storage element coupled to the at least one cache memory, the table storage element to store a plurality of entries each to store object metadata of an object used in a code sequence. The processor is to use the object metadata to provide user space multi-object transactional atomic operation of the code sequence. Other embodiments are described and claimed.