G06F2212/6028

PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS TO FETCH DATA TO INDICATED CACHE LEVEL WITH GUARANTEED COMPLETION
20170286118 · 2017-10-05

A processor of an aspect includes a plurality of caches at a plurality of different cache levels. The processor also includes a decode unit to decode a fetch instruction. The fetch instruction is to indicate address information for a memory location, and the fetch instruction is to indicate a cache level of the plurality of different cache levels. The processor also includes a cache controller coupled with the decode unit, and coupled with a cache at the indicated cache level. The cache controller, in response to the fetch instruction, is to store data associated with the memory location in the cache, wherein the fetch instruction is architecturally guaranteed to be completed. Other processors, methods, systems, and machine-readable storage mediums storing instructions are disclosed.
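
The mechanism can be modeled in a few lines: unlike a prefetch hint, which a processor may drop, the fetch here always installs the line at the indicated level. A minimal Python sketch (all names are illustrative, not from the patent; the real logic lives in the cache controller):

```python
class CacheHierarchy:
    """Toy model of a multi-level cache hierarchy."""
    def __init__(self, levels=("L1", "L2", "L3")):
        # each level maps a memory address to its cached data
        self.caches = {lvl: {} for lvl in levels}

    def fetch_to_level(self, memory, addr, level):
        """Architecturally guaranteed fetch: always completes by
        storing the data at the indicated cache level."""
        data = memory[addr]               # read the memory location
        self.caches[level][addr] = data   # install at the indicated level
        return data

memory = {0x1000: "line-A"}
hierarchy = CacheHierarchy()
fetched = hierarchy.fetch_to_level(memory, 0x1000, "L2")
```

Note that the line lands only at the level the instruction indicated; the other levels are untouched.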

PROCESSORS, METHODS, AND SYSTEMS TO ALLOCATE LOAD AND STORE BUFFERS BASED ON INSTRUCTION TYPE

A processor of an aspect includes a decode unit to decode memory access instructions of a first type and to output corresponding memory access operations, and to decode memory access instructions of a second type and to output corresponding memory access operations. The processor also includes a load store queue coupled with the decode unit. The load store queue includes a load buffer that is to have a plurality of load buffer entries, and a store buffer that is to have a plurality of store buffer entries. The load store queue also includes a buffer entry allocation controller coupled with the load buffer and coupled with the store buffer. The buffer entry allocation controller is to allocate load and store buffer entries based at least in part on whether memory access operations correspond to memory access instructions of the first type or of the second type. Other processors, methods, and systems are also disclosed.
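
One way such type-aware allocation could work is a reservation floor: operations from one instruction type may not consume entries reserved for the other type. A hedged Python sketch (the policy and names are illustrative; the patent does not specify this exact scheme):

```python
class BufferEntryAllocator:
    """Allocates entries from a shared buffer, reserving a floor of
    entries for second-type memory operations."""
    def __init__(self, total_entries=16, reserved_for_type2=4):
        self.free = total_entries
        self.reserved_for_type2 = reserved_for_type2

    def allocate(self, instr_type):
        # type-1 operations may not dip into the reserved entries
        floor = self.reserved_for_type2 if instr_type == 1 else 0
        if self.free > floor:
            self.free -= 1
            return True
        return False   # no entry available for this type: stall

alloc = BufferEntryAllocator(total_entries=6, reserved_for_type2=2)
grants_type1 = [alloc.allocate(1) for _ in range(6)]
grant_type2 = alloc.allocate(2)
```

With 6 entries and 2 reserved, type-1 operations get only 4 grants, while a type-2 operation can still allocate from the reserve.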

User-level fork and join processors, methods, systems, and instructions

A processor of an aspect includes a plurality of processor elements, and a first processor element. The first processor element may perform a user-level fork instruction of a software thread. The first processor element may include a decoder to decode the user-level fork instruction. The user-level fork instruction is to indicate at least one instruction address. The first processor element may also include a user-level thread fork module. The user-level thread fork module, in response to the user-level fork instruction being decoded, may configure each of the plurality of processor elements to perform instructions in parallel. Other processors, methods, systems, and instructions are disclosed.
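
The fork-then-join shape of the instruction can be modeled in software with threads: each "processor element" begins executing at the same indicated address, and the forking element waits for all of them. A minimal sketch, assuming a simple per-element worker (names are illustrative):

```python
import threading

def user_level_fork(worker, num_elements, results):
    """Model of the user-level fork: configure each processor element
    to begin executing at the indicated instruction address in parallel."""
    threads = [threading.Thread(target=worker, args=(i, results))
               for i in range(num_elements)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()          # join back before continuing the software thread

results = [0] * 4
def worker(i, out):
    out[i] = i * i        # per-element work at the forked address
user_level_fork(worker, 4, results)
```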

Control flow guided lock address prefetch and filtering

A method of prefetching target data includes, in response to detecting a lock-prefixed instruction for execution in a processor, determining a predicted target memory location for the lock-prefixed instruction based on control flow information associating the lock-prefixed instruction with the predicted target memory location. Target data is prefetched from the predicted target memory location to a cache coupled with the processor, and after completion of the prefetching, the lock-prefixed instruction is executed in the processor using the prefetched target data.
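
The key data structure is the control-flow association: a table mapping a lock-prefixed instruction to its predicted target memory location, consulted before the instruction executes. A hedged Python sketch of that flow (table contents and names are illustrative):

```python
# PC of a lock-prefixed instruction -> predicted target memory location,
# learned from prior control flow (values illustrative)
control_flow_table = {0x401A: 0x9000}
cache = set()

def run_lock_instruction(pc, memory):
    """Prefetch the predicted target, then execute using that data."""
    target = control_flow_table.get(pc)
    if target is not None:
        cache.add(target)        # prefetch completes before execution
    # execute the atomic operation using the (now cached) target data
    return memory.get(target)

memory = {0x9000: 42}
value = run_lock_instruction(0x401A, memory)
```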

Memory management for graphics processing unit workloads

A method, a device, and a non-transitory computer readable medium for performing memory management in a graphics processing unit are presented. Hints about the memory usage of an application are provided to a page manager. At least one runtime memory usage pattern of the application is sent to the page manager. Data is swapped into and out of a memory by analyzing the hints and the at least one runtime memory usage pattern.
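
A page manager of this kind has two inputs: static hints from the application and the runtime usage pattern it observes. One plausible policy — not specified by the abstract, so treat it as an assumption — is to prefer evicting streaming-hinted, lightly used pages first:

```python
class PageManager:
    """Toy page manager combining application hints with observed
    runtime usage to pick swap victims (policy is illustrative)."""
    def __init__(self, vram_pages):
        self.vram_pages = vram_pages
        self.resident = []
        self.hints = {}        # page -> hint from the application
        self.use_count = {}    # observed runtime memory-usage pattern

    def hint(self, page, kind):
        self.hints[page] = kind

    def touch(self, page):
        self.use_count[page] = self.use_count.get(page, 0) + 1
        if page in self.resident:
            return
        if len(self.resident) >= self.vram_pages:
            # swap out: prefer streaming-hinted, then least-used pages
            victim = min(self.resident,
                         key=lambda p: (self.hints.get(p) != "streaming",
                                        self.use_count.get(p, 0)))
            self.resident.remove(victim)
        self.resident.append(page)

pm = PageManager(vram_pages=2)
pm.hint("A", "streaming")
for page in ["A", "B", "B", "C"]:
    pm.touch(page)
```

When "C" arrives with memory full, the streaming-hinted page "A" is swapped out even though "B" was touched more recently.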

SECURE ADDRESS TRANSLATION SERVICES USING A PERMISSION TABLE

Embodiments are directed to providing a secure address translation service. An embodiment of a system includes memory for storage of data, an IOMMU coupled to the memory, and a host-to-device link to couple the IOMMU with one or more devices. The IOMMU is to operate as a translation agent on behalf of the one or more devices in connection with memory operations relating to the memory, including receiving a translated request from a discrete device via the host-to-device link specifying a memory operation and a physical address within the memory pertaining to the memory operation, determining page access permissions assigned to a context of the discrete device for a physical page of the memory within which the physical address resides, allowing the memory operation to proceed when the page access permissions permit the memory operation, and blocking the memory operation when the page access permissions do not permit the memory operation.
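
The permission check reduces to a lookup keyed by (device context, physical page), compared against the requested operation. A minimal sketch, with an assumed 4 KiB page size and illustrative table contents:

```python
READ, WRITE = 0x1, 0x2
PAGE_SHIFT = 12   # assumed 4 KiB pages

# (device context, physical page number) -> permitted operations
permission_table = {("ctx7", 0x2000 >> PAGE_SHIFT): READ}

def handle_translated_request(context, phys_addr, op):
    """Allow the operation only if the device context holds the
    permission for the physical page containing phys_addr."""
    page = phys_addr >> PAGE_SHIFT
    perms = permission_table.get((context, page), 0)
    return bool(perms & op)   # True: allow, False: block

allowed_read = handle_translated_request("ctx7", 0x2004, READ)
blocked_write = handle_translated_request("ctx7", 0x2004, WRITE)
```

A context with only READ permission on the page can read but is blocked from writing, even though it supplied an already-translated physical address.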

DYNAMIC CACHE MEMORY MANAGEMENT WITH CACHE POLLUTION AVOIDANCE

A computer-implemented method for managing a cache memory includes fetching, via a processor, a data portion, identifying, via the processor, a transiency classification of the data portion based on its memory address range, saving, via the processor, the data portion to a first level (L1) cache memory, evaluating, via the processor, whether the data portion should be copied to at least one other cache memory of a plurality of cache memories based on the transiency classification of the data portion, and selectively saving, via the processor, the data portion to one or more of the plurality of cache memories based on the transiency classification of the data portion.
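
The pollution-avoidance idea is that transient data is saved to L1 only, while non-transient data is also copied outward. A hedged Python sketch, classifying by address range as the abstract describes (ranges and names are illustrative):

```python
transient_ranges = [(0x8000, 0x9000)]   # ranges classified as transient
l1_cache, l2_cache = {}, {}

def classify_transient(addr):
    """Transiency classification keyed off the memory address range."""
    return any(lo <= addr < hi for lo, hi in transient_ranges)

def fetch(memory, addr):
    data = memory[addr]
    l1_cache[addr] = data             # always saved to L1
    if not classify_transient(addr):  # transient data skips outer levels
        l2_cache[addr] = data         # avoid polluting L2 and beyond
    return data

memory = {0x8004: "short-lived", 0x4000: "reused"}
fetch(memory, 0x8004)
fetch(memory, 0x4000)
```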

MANIPULATION OF VIRTUAL MEMORY PAGE TABLE ENTRIES TO FORM VIRTUALLY-CONTIGUOUS MEMORY CORRESPONDING TO NON-CONTIGUOUS REAL MEMORY ALLOCATIONS
20170220482 · 2017-08-03

Systems and methods for managing contiguous addressing via virtual paging registers in a page table used in a high-performance computing platform. One embodiment commences upon initializing a first paging register with a first virtual address of a first virtual address length to form a first virtual address space, then receiving a request from a process to allocate physical memory corresponding to a second virtual address request. A memory allocator allocates the requested physical memory from a physical memory location determined by the memory allocator. An operating system or other sufficiently privileged agent identifies a second paging register that is contiguously adjacent to the first paging register. If the second paging register is already in use, then the method identifies an unused (third) paging register into which the contents of the second paging register can be relocated. The method stores the second virtual address into the now freed-up second paging register.
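
The register shuffle described above can be sketched directly: to extend a contiguous virtual region, the occupant of the adjacent register (if any) is relocated to an unused register first. A minimal model, assuming at least one free register exists (names illustrative):

```python
paging_registers = [None] * 8    # index -> virtual-region mapping

def extend_contiguous(first_idx, new_mapping):
    """Place new_mapping in the register adjacent to first_idx,
    relocating any existing occupant to an unused register."""
    second = first_idx + 1
    if paging_registers[second] is not None:
        third = paging_registers.index(None)   # find an unused register
        paging_registers[third] = paging_registers[second]
    paging_registers[second] = new_mapping     # now freed-up register

paging_registers[0] = "region-A"
paging_registers[1] = "region-B"   # adjacent register already in use
extend_contiguous(0, "region-A-part2")
```

After the call, "region-A" and "region-A-part2" sit in contiguously adjacent registers, and the displaced "region-B" mapping survives in a previously unused register.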

Method of establishing pre-fetch control information from an executable code and an associated NVM controller, a device, a processor system and computer program products

A method of establishing pre-fetch control information from an executable code is described. The method comprises inspecting the executable code to find one or more instructions corresponding to an unconditional change in program flow during an execution of the executable code when the executable code is retrieved from a non-volatile memory comprising a plurality of NVM lines. For each unconditional change of flow instruction, the method comprises establishing a NVM line address of the NVM line containing said unconditional change of flow instruction; establishing a destination address associated with the unconditional change of flow instruction; determining whether the destination address is in an address range corresponding to a NVM-pre-fetch starting from said NVM line address; establishing a pre-fetch flag indicating whether the destination address is in the address range corresponding to a NVM-pre-fetch starting from said NVM line address; and recording the pre-fetch flag in a pre-fetch control information record.
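
Those steps amount to a one-pass scan of the code: for each unconditional jump, compute its NVM line address, test whether the destination falls inside the pre-fetch window starting at that line, and record the flag. A hedged sketch with assumed line and window sizes (the patent does not fix these values):

```python
NVM_LINE = 16          # bytes per NVM line (illustrative)
PREFETCH_LINES = 2     # lines covered by one NVM pre-fetch (illustrative)

def build_prefetch_records(code):
    """code: (addr, opcode, dest) tuples; 'jmp' marks an unconditional
    change of flow. Returns one pre-fetch control record per jump."""
    records = []
    for addr, op, dest in code:
        if op != "jmp":
            continue
        line_addr = addr - (addr % NVM_LINE)   # NVM line of the jump
        window_end = line_addr + PREFETCH_LINES * NVM_LINE
        records.append({"line_addr": line_addr,
                        "dest": dest,
                        "prefetch_flag": line_addr <= dest < window_end})
    return records

code = [(0x10, "add", None), (0x14, "jmp", 0x24), (0x40, "jmp", 0x200)]
records = build_prefetch_records(code)
```

The first jump's destination lands inside the pre-fetch window, so its flag is set; the second jumps far away, so its flag is clear and the controller would know a pre-fetch from that line is wasted.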

Facilitating efficient prefetching for scatter/gather operations

The disclosed embodiments relate to a computing system that facilitates performing prefetching for scatter/gather operations. During operation, the system receives a scatter/gather prefetch instruction at a processor core, wherein the scatter/gather prefetch instruction specifies a virtual base address, and a plurality of offsets. Next, the system performs a lookup in a translation-lookaside buffer (TLB) using the virtual base address to obtain a physical base address that identifies a physical page for the base address. The system then sends the physical base address and the plurality of offsets to a cache. This enables the cache to perform prefetching operations for the scatter/gather instruction by adding the physical base address to the plurality of offsets to produce a plurality of physical addresses, and then prefetching cache lines for the plurality of physical addresses into the cache.
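
The efficiency comes from doing the TLB lookup once for the base address and letting the cache add each offset itself. A minimal Python sketch with assumed 4 KiB pages and 64-byte lines (the offsets are presumed to stay within the translated page):

```python
PAGE_SIZE = 0x1000     # assumed 4 KiB pages
LINE_SIZE = 64         # assumed 64-byte cache lines
tlb = {0x7: 0x3}       # virtual page number -> physical page number
prefetched_lines = set()

def scatter_gather_prefetch(virtual_base, offsets):
    """One TLB lookup for the base; the cache adds each offset to the
    physical base and prefetches the containing cache lines."""
    vpn = virtual_base // PAGE_SIZE
    page_off = virtual_base % PAGE_SIZE
    phys_base = tlb[vpn] * PAGE_SIZE + page_off   # single TLB lookup
    for off in offsets:
        pa = phys_base + off
        prefetched_lines.add(pa - pa % LINE_SIZE)  # line-granular prefetch

scatter_gather_prefetch(0x7100, [0, 8, 200])
```

Offsets 0 and 8 fall in the same cache line, so only two distinct lines are prefetched for the three elements.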