Patent classifications
G06F12/1027
Memory address translation
Circuitry comprises a translation lookaside buffer to store memory address translations, each memory address translation being between an input memory address range defining a contiguous range of one or more input memory addresses in an input memory address space and a translated output memory address range defining a contiguous range of one or more output memory addresses in an output memory address space; in which the translation lookaside buffer is configured selectively to store the memory address translations as a cluster of memory address translations, a cluster defining memory address translations in respect of a contiguous set of input memory address ranges by encoding one or more memory address offsets relative to a respective base memory address; memory management circuitry to retrieve data representing memory address translations from a memory, for storage by the translation lookaside buffer, when a required memory address translation is not stored by the translation lookaside buffer; detector circuitry to detect an action consistent with access, by the translation lookaside buffer, to a given cluster of memory address translations; and prefetch circuitry, responsive to a detection of the action consistent with access to a cluster of memory address translations, to prefetch data from the memory representing one or more further memory address translations of a further set of input memory address ranges adjacent to the contiguous set of input memory address ranges for which the given cluster defines memory address translations.
METHOD AND DEVICE WITH MEMORY ACCESS REQUEST PROCESSING
Embodiments may relate to processing an access request to a memory. The access request to the memory may be based on a first virtual memory address. If a first physical memory address corresponding to the first virtual memory address is determined to be not acquired (e.g., based on a page table), it may be determined whether the first virtual memory address is a valid address. If the first virtual memory address is a valid address, a target virtual memory space or a target physical memory space for the access request may be allocated based on a free memory pool for the target kernel. The free memory pool may include currently allocated virtual and/or physical memory. The access request may be processed based on the allocated target virtual memory space or target physical memory space.
METHOD AND DEVICE WITH MEMORY ACCESS REQUEST PROCESSING
Embodiments may relate to processing an access request to a memory. The access request to the memory may be based on a first virtual memory address. If a first physical memory address corresponding to the first virtual memory address is determined to be not acquired (e.g., based on a page table), it may be determined whether the first virtual memory address is a valid address. If the first virtual memory address is a valid address, a target virtual memory space or a target physical memory space for the access request may be allocated based on a free memory pool for the target kernel. The free memory pool may include currently allocated virtual and/or physical memory. The access request may be processed based on the allocated target virtual memory space or target physical memory space.
Retaining cache entries of a processor core during a powered-down state
A processor core associated with a first cache initiates entry into a powered-down state. In response, information representing a set of entries of the first cache are stored in a retention region that receives a retention voltage while the processor core is in a powered-down state. Information indicating one or more invalidated entries of the set of entries is also stored in the retention region. In response to the processor core initiating exit from the powered-down state, entries of the first cache are restored using the stored information representing the entries and the stored information indicating the at least one invalidated entry.
Retaining cache entries of a processor core during a powered-down state
A processor core associated with a first cache initiates entry into a powered-down state. In response, information representing a set of entries of the first cache are stored in a retention region that receives a retention voltage while the processor core is in a powered-down state. Information indicating one or more invalidated entries of the set of entries is also stored in the retention region. In response to the processor core initiating exit from the powered-down state, entries of the first cache are restored using the stored information representing the entries and the stored information indicating the at least one invalidated entry.
Memory management device capable of managing memory address translation table using heterogeneous memories and method of managing memory address thereby
Provided are a GPU and a method of managing a memory address thereby. TLBs are configured using an SRAM and an STT-MRAM so that a storage capacity is significantly improved compared to a case where a TLB is configured using an SRAM. Accordingly, the page hit rate of a TLB can be increased, thereby improving the throughput of a device. Furthermore, after a PTE is first written in the SRAM, a PTE having high frequency of use is selected and moved to the STT-MRAM. Accordingly, an increase in TLB update time which may occur due to the use of the STT-MRAM having a low write speed can be prevented. Furthermore, a read time and read energy consumption can be reduced.
RESILIENCY AND PERFORMANCE FOR CLUSTER MEMORY
Disclosed are various embodiments for improving resiliency and performance of clustered memory. A computing device can acquire a chunk of byte-addressable memory from a cluster memory host. The computing device can then identify an active set of allocated memory pages and an inactive set of allocated memory pages for a process executing on the computing device. Next, the computing device can store the active set of allocated memory pages for the process in the memory of the computing device. Finally, the computing device can store the inactive set of allocated memory pages for the process in the chunk of byte-addressable memory of the cluster memory host.
HANDLING OF SINGLE-COPY-ATOMIC LOAD/STORE INSTRUCTION
In response to a single-copy-atomic load/store instruction for requesting an atomic transfer of a target block of data between the memory system and the registers, where the target block has a given size greater than a maximum data size supported for a single load/store micro-operation by a load/store data path, instruction decoding circuitry maps the single-copy-atomic load/store instruction to two or more mapped load/store micro-operations each for requesting transfer of a respective portion of the target block of data. In response to the mapped load/store micro-operations, load/store circuitry triggers issuing of a shared memory access request to the memory system to request the atomic transfer of the target block of data of said given size to or from the memory system, and triggers separate transfers of respective portions of the target block of data over the load/store data path.
Poison Mechanisms for Deferred Invalidates
An apparatus includes multiple processors including respective cache memories, the cache memories configured to cache cache-entries for use by the processors. At least a processor among the processors includes cache management logic that is configured to (i) receive, from one or more of the other processors, cache-invalidation commands that request invalidation of specified cache-entries in the cache memory of the processor (ii) mark the specified cache-entries as intended for invalidation but defer actual invalidation of the specified cache-entries, and (iii) upon detecting a synchronization event associated with the cache-invalidation commands, invalidate the cache-entries that were marked as intended for invalidation.
VIRTUAL ACCELERATORS IN A VIRTUALIZED COMPUTING SYSTEM
An example method of virtualizing a hardware accelerator in a host cluster of a virtualized computing system includes: commanding, at an initiator host in the host cluster, a programmable expansion bus device to reconfigure as a virtual accelerator based on specifications of a hardware accelerator in a target host of the host cluster; executing, in the programmable expansion bus device, software to emulate the virtual accelerator as connected to an expansion bus of the initiator host; receiving, at the programmable expansion bus device, compute tasks from an application executing in the initiator host; and sending, to the target host, the compute tasks for processing by the hardware accelerator.