G06F12/0882

DETECTION OF MEMORY ACCESSES
20230044342 · 2023-02-09 ·

Examples described herein relate to dynamically adjust a manner of identifying hot pages in a remote memory pool based on adjustment of parameters of a data structure. In some examples, the parameters of the data structure include a range of number of access counts and a number of pages associated with the range.

DETECTION OF MEMORY ACCESSES
20230044342 · 2023-02-09 ·

Examples described herein relate to dynamically adjust a manner of identifying hot pages in a remote memory pool based on adjustment of parameters of a data structure. In some examples, the parameters of the data structure include a range of number of access counts and a number of pages associated with the range.

SUPPORTING FAULT INFORMATION DELIVERY

A processor implementing techniques to supporting fault information delivery is disclosed. In one embodiment, the processor includes a memory controller unit to access an enclave page cache (EPC) and a processor core coupled to the memory controller unit. The processor core to detect a fault associated with accessing the EPC and generate an error code associated with the fault. The error code reflects an EPC-related fault cause. The processor core is further to encode the error code into a data structure associated with the processor core. The data structure is for monitoring a hardware state related to the processor core.

SUPPORTING FAULT INFORMATION DELIVERY

A processor implementing techniques to supporting fault information delivery is disclosed. In one embodiment, the processor includes a memory controller unit to access an enclave page cache (EPC) and a processor core coupled to the memory controller unit. The processor core to detect a fault associated with accessing the EPC and generate an error code associated with the fault. The error code reflects an EPC-related fault cause. The processor core is further to encode the error code into a data structure associated with the processor core. The data structure is for monitoring a hardware state related to the processor core.

Mitigating read disturb effects in memory devices

A die read counter and a block read counter are maintained for a specified block of a memory device. An estimated number of read events associated with the specified block is determined based on a value of the block read counter and a value of the die read counter. Responsive to determining that the estimated number of read events satisfies a criterion, a media management operation of one or more pages associated with the specified block is performed.

Mitigating read disturb effects in memory devices

A die read counter and a block read counter are maintained for a specified block of a memory device. An estimated number of read events associated with the specified block is determined based on a value of the block read counter and a value of the die read counter. Responsive to determining that the estimated number of read events satisfies a criterion, a media management operation of one or more pages associated with the specified block is performed.

Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format

Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.

Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format

Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.

VIRTUALIZED CACHES
20230004494 · 2023-01-05 ·

Systems and methods are disclosed for virtualized caches. For example, an integrated circuit (e.g., a processor) for executing instructions includes a virtually indexed physically tagged first-level (L1) cache configured to output to an outer memory system one or more bits of a virtual index of a cache access as one or more bits of a requestor identifier. For example, the L1 cache may be configured to operate as multiple logical L1 caches with a cache way of a size less than or equal to a virtual memory page size. For example, the integrated circuit may include an L2 cache of the outer memory system that is configured to receive the requestor identifier and implement a cache coherency protocol to disambiguate an L1 synonym occurring in multiple portions of the virtually indexed physically tagged L1 cache associated with different requestor identifier values.

VIRTUALIZED CACHES
20230004494 · 2023-01-05 ·

Systems and methods are disclosed for virtualized caches. For example, an integrated circuit (e.g., a processor) for executing instructions includes a virtually indexed physically tagged first-level (L1) cache configured to output to an outer memory system one or more bits of a virtual index of a cache access as one or more bits of a requestor identifier. For example, the L1 cache may be configured to operate as multiple logical L1 caches with a cache way of a size less than or equal to a virtual memory page size. For example, the integrated circuit may include an L2 cache of the outer memory system that is configured to receive the requestor identifier and implement a cache coherency protocol to disambiguate an L1 synonym occurring in multiple portions of the virtually indexed physically tagged L1 cache associated with different requestor identifier values.