G06F2212/682

Translation entry invalidation in a multithreaded data processing system

In a multithreaded data processing system including a plurality of processor cores, storage-modifying requests, including a translation invalidation request of an initiating hardware thread, are received in a shared queue. The translation invalidation request is broadcast so that it is received and processed by the plurality of processor cores. In response to confirmation of the broadcast, the address translated by the translation entry is stored in a queue. Once the address is stored, the initiating processor core resumes dispatch of instructions within the initiating hardware thread. In response to a request from one of the plurality of processor cores, an effective address translated by a translation entry being invalidated is accessed in the queue. A synchronization request for the address is broadcast to ensure completion of processing of any translation invalidation request for the address.

APPARATUS AND METHOD FOR LOW-OVERHEAD SYNCHRONOUS PAGE TABLE UPDATES
20170286300 · 2017-10-05 ·

An apparatus and method are described for low overhead synchronous page table updates. For example, one embodiment of a processor comprises: a set of one or more cores to execute instructions and process data; a translation lookaside buffer (TLB) comprising a plurality of entries to cache virtual-to-physical address translations usable by the set of one or more cores when executing the instructions; locking circuitry to allow a thread to lock a first page table entry (PTE) in the TLB to ensure that only one thread can modify the first PTE at a time, wherein the TLB is to modify the first PTE upon the thread acquiring the lock; a PTE invalidation circuit to execute a PTE invalidate instruction on a first core to invalidate the first PTE in other TLBs of other cores, the PTE invalidation circuit, responsive to execution of the PTE invalidate instruction, to responsively determine a number of other TLBs of other cores which need to be notified of the PTE invalidation, transmit PTE invalidate messages to the other TLBs, and wait for responses; and the locking circuitry to release the lock on the first PTE responsive to receiving responses from all of the other TLBs.

APPARATUS AND METHOD FOR LAZY TRANSLATION LOOKASIDE BUFFER (TLB) COHERENCE
20170286316 · 2017-10-05 ·

An apparatus and method are described for managing TLB coherence. For example, one embodiment of a processor comprises: one or more cores to execute instructions and process data; one or more translation lookaside buffers (TLBs) each comprising a plurality of entries to cache virtual-to-physical address translations usable by the set of one or more cores when executing the instructions; one or more epoch counters each programmed with a specified epoch value; and TLB validation logic to validate a specified set of TLB entries at intervals specified by the epoch value.

Managing translation invalidation
09779028 · 2017-10-03 · ·

Managing translation invalidation includes: in response to determining that a first invalidation message (IM) applies to a subset of virtual addresses (VAs) consisting of fewer than all VAs associated with a first set of translation context (TC) values, searching VA-indexed structure(s) to find and invalidate any entries that correspond to a VA in the subset; in response to determining that a second IM applies to all VAs associated with a second set of TC values and that no entry exists in invalidation-tracking structure(s) corresponding to the second set, bypassing searching any VA-indexed structure(s); and in response to determining that a third IM applies to all VAs associated with a third set of TC values and that at least one entry exists in the invalidation-tracking structure(s) corresponding to the third set, storing invalidation information in the invalidation-tracking structure(s) to invalidate the third set and delaying searching any VA-indexed structure(s).

Power Aware Translation Lookaside Buffer Invalidation Optimization

One disclosed embodiment includes a method for memory management. The method includes receiving a first request to clear one or more entries of a translation lookaside buffer (TLB), receiving a second request to clear one or more entries of the TLB, bundling the first request with the second request, determining that a processor associated with the TLB transitioned to an inactive mode, and dropping the bundled first and second requests based on the determination.

Translation entry invalidation in a multithreaded data processing system

In a multithreaded data processing system including a plurality of processor cores, storage-modifying requests, including a translation invalidation request of an initiating hardware thread, are received in a shared queue. The translation invalidation request is broadcast so that it is received and processed by the plurality of processor cores. In response to confirmation of the broadcast, the address translated by the translation entry is stored in a queue. Once the address is stored, the initiating processor core resumes dispatch of instructions within the initiating hardware thread. In response to a request from one of the plurality of processor cores, an effective address translated by a translation entry being invalidated is accessed in the queue. A synchronization request for the address is broadcast to ensure completion of processing of any translation invalidation request for the address. Subsequent memory referent instructions can be ordered with respect to the broadcast synchronization request by a synchronization instruction.

METHOD AND APPARATUS TO ENABLE A CACHE (DEVPIC) TO STORE PROCESS SPECIFIC INFORMATION INSIDE DEVICES THAT SUPPORT ADDRESS TRANSLATION SERVICE (ATS)
20210406195 · 2021-12-30 ·

Embodiments described herein may include apparatus, systems, techniques, or processes that are directed to PCIe Address Translation Service (ATS) to allow devices to have a DevTLB that caches address translation (per page) information in conjunction with a Device ProcessInfoCache (DevPIC) that will store process specific information. Other embodiments may be described and/or claimed.

Packet processing device, packet processing method, and recording medium
11194734 · 2021-12-07 · ·

In order to achieve a packet processing device which make it possible to process a packet at high speed, a bus that transfers a communication packet, and a plurality of processors and executes at least one task including either of a first task and a second task are included, wherein the first task performs processing when a first task identifier given to the first task and a second task identifier added to the communication packet received from the bus coincide with each other, the second task performs the processing for the communication packet that is not added with the second task identifier, and the processing executes first processing, based on the packet identifier, and thereafter, adds, to the communication packet, the second task identifier indicating the different first task that executes second processing subsequent to the first processing, and transmits the communication packet to the bus.

Method and apparatus for an efficient TLB lookup

The present disclosure relates to a method of operating a translation lookaside buffer (TLB) arrangement for a processor supporting virtual addressing, wherein multiple translation engines are used to perform translations on request of one of a plurality of dedicated processor units. The method comprises: maintaining by a cache unit a dependency matrix for the engines to track for each processing unit if an engine is assigned to the each processing unit for a table walk. The cache unit may block a processing unit from allocating an engine to a translation request when the engine is already assigned to the processing unit in the dependency matrix.

HETEROGENOUS-LATENCY MEMORY OPTIMIZATION

Memory pages are background-relocated from a low-latency local operating memory of a server computer to a higher-latency memory installation that enables high-resolution access monitoring and thus access-demand differentiation among the relocated memory pages. Higher access-demand memory pages are background-restored to the low-latency operating memory, while lower access-demand pages are maintained in the higher latency memory installation and yet-lower access-demand pages are optionally moved to yet higher-latency memory installation.