G06F2212/1008

MEMORY SHARING METHOD OF VIRTUAL MACHINES BASED ON COMBINATION OF KSM AND PASS-THROUGH

A memory sharing method for virtual machines based on the combination of KSM and pass-through, including: a virtual machine manager judging whether the operating systems of guests use an IOMMU; if not, the guests do not participate in the shared mapping of the KSM technology; if yes, judging the memory pages of each guest to confirm whether the pages are mapping pages; if yes, retaining the mapping pages in the host; and if not, on the premise of keeping the properties of pass-through, using the KSM technology on all non-mapping pages to merge memory pages with the same contents among the various virtual machines while performing write protection at the same time. The guest memory pages are divided into those used for DMA and those used for non-DMA purposes, and the KSM technology is selectively applied only to the non-DMA pages, so that memory resources are saved while the properties of pass-through are preserved.
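A minimal C sketch of the selective-merge step, with hypothetical structures (guest_page, ksm_merge, and ksm_scan are illustrative names, not from the patent): DMA-mapped pages are skipped so pass-through mappings stay intact, while identical non-DMA pages are merged and write-protected for copy-on-write.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    struct guest_page {
        uint8_t data[PAGE_SIZE];
        bool dma_mapped;                /* pinned for pass-through DMA */
        bool write_protected;           /* set after a KSM merge (copy-on-write) */
        struct guest_page *shared_with; /* canonical page after merging */
    };

    /* Merge page b into canonical page a: both hold identical non-DMA
     * contents.  Both are write-protected, so a later store triggers
     * copy-on-write and breaks the sharing. */
    static void ksm_merge(struct guest_page *a, struct guest_page *b)
    {
        b->shared_with = a;
        a->write_protected = true;
        b->write_protected = true;
    }

    /* Selectively apply KSM: DMA-mapped pages keep their host mapping
     * untouched so pass-through devices still see stable addresses. */
    static void ksm_scan(struct guest_page *pages, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            if (pages[i].dma_mapped || pages[i].shared_with)
                continue;               /* skip DMA and already-merged pages */
            for (size_t j = i + 1; j < n; j++) {
                if (pages[j].dma_mapped || pages[j].shared_with)
                    continue;
                if (memcmp(pages[i].data, pages[j].data, PAGE_SIZE) == 0)
                    ksm_merge(&pages[i], &pages[j]);
            }
        }
    }

    int main(void)
    {
        struct guest_page pages[3] = {0};
        memset(pages[0].data, 0xAB, PAGE_SIZE);
        memset(pages[1].data, 0xAB, PAGE_SIZE);  /* duplicate of page 0 */
        memset(pages[2].data, 0xAB, PAGE_SIZE);  /* duplicate, but DMA-mapped */
        pages[2].dma_mapped = true;

        ksm_scan(pages, 3);
        printf("page1 shared: %d, page2 shared: %d\n",
               pages[1].shared_with != NULL, pages[2].shared_with != NULL);
        return 0;
    }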

DATA STORAGE DEVICE AND DATA STORAGE METHOD FOR DETECTING CURRENTLY-USED LOGICAL PAGES
20180011646 · 2018-01-11 ·

A data storage device for storing a plurality of data includes a memory and a controller. The memory includes a plurality of blocks, and each of the blocks includes a plurality of physical pages. The controller is coupled to the memory and maps logical pages to the physical pages of the memory. When the controller detects that a first logical page of the logical pages is a currently-used logical page, it detects whether or not a second logical page, which follows the first logical page, is also a currently-used logical page, in order to determine which logical page is truly the last currently-used logical page.
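A minimal C sketch of the detection loop, assuming that "last" means the next higher-numbered logical page and that a logical page counts as currently used when it has an entry in a logical-to-physical table; all names are illustrative, not the device's firmware interface.

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical lookup: a logical page is "currently used" when it
     * is mapped to a physical page in the L2P table (-1 = unmapped). */
    static bool is_currently_used(const int *l2p, int lpage)
    {
        return l2p[lpage] >= 0;
    }

    /* Starting from a first logical page known to be in use, keep
     * checking whether the following logical page is also in use, so
     * the truly last currently-used logical page is found. */
    static int find_last_used(const int *l2p, int n_pages, int first)
    {
        int last = first;
        while (last + 1 < n_pages && is_currently_used(l2p, last + 1))
            last++;
        return last;
    }

    int main(void)
    {
        /* logical-to-physical table: pages 2..5 are mapped */
        int l2p[8] = { -1, -1, 10, 11, 12, 13, -1, -1 };
        printf("last currently-used logical page: %d\n",
               find_last_used(l2p, 8, 2));   /* prints 5 */
        return 0;
    }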

Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format

Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.
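The arithmetic can be sketched in plain C: bfloat16 is the upper 16 bits of an IEEE-754 single-precision float, so conversion is a shift. The names below (bf16_dpa and friends) are illustrative, not the GPU's instruction interface.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    typedef uint16_t bf16;  /* bfloat16: 1 sign, 8 exponent, 7 mantissa bits */

    /* bfloat16 occupies the upper half of a float, so widening is a
     * 16-bit shift. */
    static float bf16_to_f32(bf16 v)
    {
        uint32_t bits = (uint32_t)v << 16;
        float f;
        memcpy(&f, &bits, sizeof f);
        return f;
    }

    static bf16 f32_to_bf16(float f)  /* truncating conversion (no rounding) */
    {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        return (bf16)(bits >> 16);
    }

    /* Dot-product-accumulate step: the multiplier multiplies BF16 src1
     * and src2 while the accumulator adds FP32 src0 to the product. */
    static float bf16_dpa(float src0, bf16 src1, bf16 src2)
    {
        return src0 + bf16_to_f32(src1) * bf16_to_f32(src2);
    }

    int main(void)
    {
        bf16 a = f32_to_bf16(1.5f), b = f32_to_bf16(2.0f);
        printf("%f\n", bf16_dpa(0.25f, a, b));  /* 0.25 + 1.5*2.0 = 3.25 */
        return 0;
    }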

VIRTUALIZED CACHES
20230004494 · 2023-01-05 ·

Systems and methods are disclosed for virtualized caches. For example, an integrated circuit (e.g., a processor) for executing instructions includes a virtually indexed physically tagged first-level (L1) cache configured to output to an outer memory system one or more bits of a virtual index of a cache access as one or more bits of a requestor identifier. For example, the L1 cache may be configured to operate as multiple logical L1 caches with a cache way of a size less than or equal to a virtual memory page size. For example, the integrated circuit may include an L2 cache of the outer memory system that is configured to receive the requestor identifier and implement a cache coherency protocol to disambiguate an L1 synonym occurring in multiple portions of the virtually indexed physically tagged L1 cache associated with different requestor identifier values.
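A minimal C sketch of deriving the requestor identifier, assuming an illustrative cache geometry (64-byte lines, 256 sets, 4 KiB pages); the function name and constants are hypothetical, not from the patent. Only the virtual-index bits above the page offset can create synonyms, so those are the bits exported with each request.

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT   12   /* 4 KiB virtual memory page */
    #define L1_SET_BITS  8    /* 256 sets (illustrative geometry) */
    #define LINE_SHIFT   6    /* 64-byte cache lines */

    /* A virtually indexed cache whose way exceeds the page size uses
     * virtual-address bits above bit 11 for indexing; those bits are
     * output as the requestor identifier of the cache access. */
    static unsigned requestor_id(uint64_t vaddr)
    {
        unsigned index = (vaddr >> LINE_SHIFT) & ((1u << L1_SET_BITS) - 1);
        /* keep only the set-index bits that lie above the page offset */
        return index >> (PAGE_SHIFT - LINE_SHIFT);
    }

    int main(void)
    {
        /* Two virtual aliases of one physical page, differing in bits
         * above the page offset: */
        uint64_t va1 = 0x0000000000001040ull;
        uint64_t va2 = 0x0000000000003040ull;
        /* Distinct requestor IDs let the L2 treat the index regions as
         * separate logical L1 caches and disambiguate the synonym. */
        printf("id1=%u id2=%u\n", requestor_id(va1), requestor_id(va2));
        return 0;
    }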

SYSTEMS AND METHODS FOR ADAPTIVE HYBRID HARDWARE PRE-FETCH

An apparatus includes a processor core and a memory hierarchy. The memory hierarchy includes main memory and one or more caches between the main memory and the processor core. A plurality of hardware pre-fetchers are coupled to the memory hierarchy and a pre-fetch control circuit is coupled to the plurality of hardware pre-fetchers. The pre-fetch control circuit is configured to compare changes in one or more cache performance metrics over two or more sampling intervals and control operation of the plurality of hardware pre-fetchers in response to a change in one or more performance metrics between at least a first sampling interval and a second sampling interval.
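One way to read the control loop, as a minimal C sketch; the miss-ratio metric and the all-or-nothing enable policy are illustrative assumptions, since the claim covers any cache performance metric and finer-grained control of the pre-fetchers.

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical per-interval cache performance sample. */
    struct sample { unsigned hits, misses; };

    struct prefetcher { const char *name; bool enabled; };

    /* Compare the miss ratio across two sampling intervals; throttle
     * the pre-fetchers when it worsened, re-enable them otherwise. */
    static void prefetch_control(struct prefetcher *pf, size_t n,
                                 struct sample prev, struct sample cur)
    {
        double prev_ratio = (double)prev.misses / (prev.hits + prev.misses);
        double cur_ratio  = (double)cur.misses  / (cur.hits + cur.misses);
        bool enable = cur_ratio <= prev_ratio;   /* metric did not degrade */
        for (size_t i = 0; i < n; i++)
            pf[i].enabled = enable;
    }

    int main(void)
    {
        struct prefetcher pf[] = { {"stride", true}, {"stream", true} };
        struct sample first  = { 900, 100 };
        struct sample second = { 700, 300 };   /* miss ratio degraded */
        prefetch_control(pf, 2, first, second);
        printf("stride=%d stream=%d\n", pf[0].enabled, pf[1].enabled);
        return 0;
    }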

Feature map and weight selection method and accelerating device

The present disclosure provides a processing device including: a coarse-grained pruning unit configured to perform coarse-grained pruning on the weights of a neural network to obtain pruned weights, and an operation unit configured to train the neural network according to the pruned weights. The coarse-grained pruning unit is specifically configured to select M weights from the weights of the neural network through a sliding window, and when the M weights meet a preset condition, to set all or part of the M weights to 0. The processing device can reduce memory accesses while reducing the amount of computation, thereby achieving an acceleration ratio and reducing energy consumption.
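A minimal C sketch of the sliding-window selection, assuming non-overlapping windows and a mean-absolute-value threshold as the preset condition (both illustrative choices; the patent leaves the condition open):

    #include <math.h>
    #include <stdio.h>

    /* Slide a window of M weights across the array; when the group's
     * mean absolute value falls below the threshold (one possible
     * "preset condition"), zero the whole group, pruning the weights
     * at coarse granularity. */
    static void coarse_prune(float *w, size_t n, size_t m, float thresh)
    {
        for (size_t i = 0; i + m <= n; i += m) {
            float sum = 0.0f;
            for (size_t j = 0; j < m; j++)
                sum += fabsf(w[i + j]);
            if (sum / m < thresh)
                for (size_t j = 0; j < m; j++)
                    w[i + j] = 0.0f;   /* set all M weights in the group to 0 */
        }
    }

    int main(void)
    {
        float w[8] = { 0.9f, 0.8f, 1.1f, 1.0f, 0.01f, 0.02f, 0.01f, 0.03f };
        coarse_prune(w, 8, 4, 0.1f);   /* second group gets pruned */
        for (int i = 0; i < 8; i++)
            printf("%g ", w[i]);
        printf("\n");
        return 0;
    }

Zeroing whole groups rather than individual weights is what makes the sparsity coarse-grained: entire blocks can then be skipped, which is what saves memory accesses as well as computation.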

APPROACH FOR SUPPORTING MEMORY-CENTRIC OPERATIONS ON CACHED DATA
20230021492 · 2023-01-26 ·

A technical solution to the technical problem of how to support memory-centric operations on cached data uses a novel memory-centric memory operation that invokes write back functionality on cache controllers and memory controllers. The write back functionality enforces selective flushing of dirty, i.e., modified, cached data that is needed for memory-centric memory operations from caches to the completion level of the memory-centric memory operations, and updates the coherence state appropriately at each cache level. The technical solution ensures that commands to implement the selective cache flushing are ordered before the memory-centric memory operation at the completion level of the memory-centric memory operation.
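A toy C model of the selective write back, assuming a single tracked line per cache level and main memory as the completion level; names such as writeback_for_mc_op are illustrative, not the patent's command set.

    #include <stdbool.h>
    #include <stdio.h>

    #define LEVELS 2   /* L1 and L2 above the completion level (memory) */

    /* One cache line per level for the address of interest. */
    struct line { bool valid, dirty; int data; };

    static int memory = 0;

    /* Selective flush: before a memory-centric operation executes at
     * the memory controller, each level writes back its dirty copy and
     * updates its coherence state (here: plain invalidation), ordered
     * ahead of the operation itself. */
    static void writeback_for_mc_op(struct line caches[LEVELS])
    {
        for (int lvl = 0; lvl < LEVELS; lvl++) {
            if (caches[lvl].valid && caches[lvl].dirty) {
                memory = caches[lvl].data;   /* flush modified data */
                caches[lvl].dirty = false;
            }
            caches[lvl].valid = false;       /* coherence state update */
        }
    }

    int main(void)
    {
        struct line caches[LEVELS] = { { true, true, 42 }, { true, false, 7 } };
        writeback_for_mc_op(caches);
        memory += 1;   /* memory-centric op (e.g., near-memory increment) */
        printf("memory = %d\n", memory);   /* operates on the fresh value 43 */
        return 0;
    }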

Encoded inline capabilities

Disclosed embodiments relate to encoded inline capabilities. In one example, a system includes a trusted execution environment (TEE) to partition an address space within a memory into a plurality of compartments each associated with code to execute a function, the TEE further to assign a message object in a heap to each compartment, receive a request from a first compartment to send a message block to a specified destination compartment, respond to the request by authenticating the request, generating a corresponding encoded capability, conveying the encoded capability to the destination compartment, and scheduling the destination compartment to respond to the request, and subsequently, respond to a check capability request from the destination compartment by checking the encoded capability and, when the check passes, providing a memory address to access the message block, and, otherwise, generating a fault, wherein each compartment is isolated from other compartments.
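A toy C sketch of the grant/check flow; the keyed checksum below stands in for whatever authentication a real TEE would use and is not secure, and all structure and function names are hypothetical.

    #include <stdint.h>
    #include <stdio.h>

    /* Toy keyed checksum standing in for a cryptographic MAC --
     * illustrative only, not secure. */
    static uint32_t toy_mac(uint32_t key, uint64_t addr, uint32_t len,
                            uint32_t dest)
    {
        return key ^ (uint32_t)(addr * 2654435761u) ^ (len << 8) ^ dest;
    }

    struct capability {          /* encoded inline capability */
        uint64_t addr;           /* base of the message block */
        uint32_t len;            /* block size */
        uint32_t dest;           /* destination compartment id */
        uint32_t tag;            /* authentication tag */
    };

    static uint32_t tee_key = 0xC0FFEE42;

    /* TEE side: authenticate a send request and generate the encoded
     * capability conveyed to the destination compartment. */
    static struct capability grant(uint64_t addr, uint32_t len, uint32_t dest)
    {
        struct capability c = { addr, len, dest,
                                toy_mac(tee_key, addr, len, dest) };
        return c;
    }

    /* check-capability: re-derive the tag; a mismatch means forgery or
     * tampering, so the access faults instead of yielding an address. */
    static int check(const struct capability *c, uint32_t caller)
    {
        if (c->dest != caller ||
            c->tag != toy_mac(tee_key, c->addr, c->len, c->dest))
            return -1;           /* fault */
        return 0;                /* caller may access [addr, addr+len) */
    }

    int main(void)
    {
        struct capability cap = grant(0x10000, 256, /*dest=*/3);
        printf("dest check: %d\n", check(&cap, 3));   /*  0: access granted */
        cap.len = 4096;                               /* tamper with bounds */
        printf("tampered:   %d\n", check(&cap, 3));   /* -1: fault */
        return 0;
    }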

Read level calibration in memory devices using embedded servo cells

An example memory sub-system includes a memory device and a processing device, operatively coupled to the memory device. The processing device is configured to identify a set of embedded servo cells stored on the memory device; determine a read voltage offset by performing read level calibration based on the set of embedded servo cells; and apply the read voltage offset for reading a memory page associated with the set of embedded servo cells.
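A minimal C sketch of the calibration sweep, assuming the servo cells were written with a known all-ones pattern and stubbing the device read; read_servo and the offset range are illustrative assumptions, not the sub-system's interface.

    #include <stdio.h>
    #include <stdlib.h>

    #define SERVO_CELLS 16

    /* Hypothetical read of servo cell i at a given voltage offset.
     * The stub models a device whose optimal offset is +2. */
    static int read_servo(int i, int offset)
    {
        (void)i;
        return offset == 2 ? 1 : (rand() % 4 != 0);  /* fewest errors at +2 */
    }

    /* The servo cells hold a known pattern (all ones here), so the
     * calibration sweeps candidate offsets and keeps the one with the
     * fewest mismatches; that offset is then applied when reading the
     * memory page associated with these servo cells. */
    static int calibrate(void)
    {
        int best_offset = 0, best_errors = SERVO_CELLS + 1;
        for (int off = -3; off <= 3; off++) {
            int errors = 0;
            for (int i = 0; i < SERVO_CELLS; i++)
                if (read_servo(i, off) != 1)
                    errors++;
            if (errors < best_errors) {
                best_errors = errors;
                best_offset = off;
            }
        }
        return best_offset;
    }

    int main(void)
    {
        printf("read voltage offset: %+d\n", calibrate());
        return 0;
    }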

Poison Mechanisms for Deferred Invalidates

An apparatus includes multiple processors including respective cache memories, the cache memories configured to cache cache-entries for use by the processors. At least one processor among the processors includes cache management logic that is configured to (i) receive, from one or more of the other processors, cache-invalidation commands that request invalidation of specified cache-entries in the cache memory of the processor, (ii) mark the specified cache-entries as intended for invalidation but defer actual invalidation of the specified cache-entries, and (iii) upon detecting a synchronization event associated with the cache-invalidation commands, invalidate the cache-entries that were marked as intended for invalidation.
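A minimal C sketch of the defer-then-invalidate flow; the poisoned flag and the function names are illustrative, and a simple tag match stands in for however the commands specify cache-entries.

    #include <stdbool.h>
    #include <stdio.h>

    #define ENTRIES 4

    struct cache_entry {
        bool valid;
        bool poisoned;   /* marked for invalidation but still present */
        int  tag;
    };

    /* On a cache-invalidation command, only mark the entry; the actual
     * invalidation work is deferred. */
    static void invalidate_cmd(struct cache_entry *c, int n, int tag)
    {
        for (int i = 0; i < n; i++)
            if (c[i].valid && c[i].tag == tag)
                c[i].poisoned = true;
    }

    /* On a synchronization event associated with the commands, actually
     * invalidate everything that was marked. */
    static void sync_event(struct cache_entry *c, int n)
    {
        for (int i = 0; i < n; i++)
            if (c[i].poisoned) {
                c[i].valid = false;
                c[i].poisoned = false;
            }
    }

    int main(void)
    {
        struct cache_entry cache[ENTRIES] = {
            { true, false, 1 }, { true, false, 2 },
            { true, false, 3 }, { true, false, 4 },
        };
        invalidate_cmd(cache, ENTRIES, 2);   /* deferred: entry stays valid */
        printf("after cmd:  valid=%d\n", cache[1].valid);
        sync_event(cache, ENTRIES);          /* now the entry is invalidated */
        printf("after sync: valid=%d\n", cache[1].valid);
        return 0;
    }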