G06F2212/302

Apparatus and method for memory management in a graphics processing environment

An apparatus and method are described for implementing memory management in a graphics processing system. For example, one embodiment of an apparatus comprises: a first plurality of graphics processing resources to execute graphics commands and process graphics data; a first memory management unit (MMU) to communicatively couple the first plurality of graphics processing resources to a system-level MMU to access a system memory; a second plurality of graphics processing resources to execute graphics commands and process graphics data; a second MMU to communicatively couple the second plurality of graphics processing resources to the first MMU; wherein the first MMU is configured as a master MMU having a direct connection to the system-level MMU and the second MMU comprises a slave MMU configured to send memory transactions to the first MMU, the first MMU either servicing a memory transaction or sending the memory transaction to the system-level MMU on behalf of the second MMU.

LOCAL MEMORY TRANSLATION TABLE

Embodiments described herein provide techniques to facilitate access to local memory of a graphics processor by a guest software domain. The guest software domain can access the local memory via an address translation system that includes a local memory translation table.

LOCAL MEMORY TRANSLATION TABLE ACCESSED AND DIRTY FLAGS

Embodiments described herein provide techniques to facilitate access to local memory of a graphics processor by a guest software domain. The guest software domain can access the local memory via an address translation system that includes a local memory translation table. In one embodiment, accessed and/or dirty bits are enabled in the local memory translation table, which may be used to accelerate the GPU local memory portion of VM Migration for a VM that includes a vGPU.

Techniques for an efficient fabric attached memory

Fabric Attached Memory (FAM) provides a pool of memory that can be accessed by one or more processors, such as a graphics processing unit(s) (GPU)(s), over a network fabric. In one instance, a technique is disclosed for using imperfect processors as memory controllers to allow memory, which is local to the imperfect processors, to be accessed by other processors as fabric attached memory. In another instance, memory address compaction is used within the fabric elements to fully utilize the available memory space.

Data transfer system
11226915 · 2022-01-18 · ·

A data transfer system including a first memory and a processor includes a second memory and a DMA controller. The processor performs RMW on data which has a size less than a cache line size and in which a portion of a cache line (a unit area of the first memory) is a write destination. Output target data is transferred from an I/O device to the second memory. Thereafter, the DMA controller transfers the output target data from the second memory to the first memory in one or a plurality of transfer unit sizes by which the number of occurrences of RMW is minimized.

Shared local memory tiling mechanism

An apparatus to facilitate memory tiling is disclosed. The apparatus includes a memory, one or more execution units (EUs) to execute a plurality of processing threads via access to the memory and tiling logic to apply a tiling pattern to memory addresses for data stored in the memory.

Machine learning sparse computation mechanism

An apparatus to facilitate processing of a sparse matrix is disclosed. The apparatus includes a plurality of processing units each comprising one or more processing elements, including logic to read operands, a multiplication unit to multiply two or more operands and a scheduler to identify operands having a zero value and prevent scheduling of the operands having the zero value at the multiplication unit.

Computing accelerator for processing multiple-type instruction and operation method thereof

Disclosed is a general-purpose computing accelerator which includes a memory including an instruction cache, a first executing unit performing a first computation operation, a second executing unit performing a second computation operation, an instruction fetching unit fetching an instruction stored in the instruction cache, a decoding unit that decodes the instruction, and a state control unit controlling a path of the instruction depending on an operation state of the second executing unit. The decoding unit provides the instruction to the first executing unit when the instruction is of a first type and provides the instruction to the state control unit when the instruction is of a second type. Depending on the operation state of the second executing unit, the state control unit provides the instruction of the second type to the second executing unit or stores the instruction of the second type as a register file in the memory.

Trusted local memory management in a virtualized GPU

Embodiments are directed to trusted local memory management in a virtualized GPU. An embodiment of an apparatus includes one or more processors including a trusted execution environment (TEE); a GPU including a trusted agent; and a memory, the memory including GPU local memory, the trusted agent to ensure proper allocation/deallocation of the local memory and verify translations between graphics physical addresses (PAs) and PAs for the apparatus, wherein the local memory is partitioned into protection regions including a protected region and an unprotected region, and wherein the protected region to store a memory permission table maintained by the trusted agent, the memory permission table to include any virtual function assigned to a trusted domain, a per process graphics translation table to translate between graphics virtual address (VA) to graphics guest PA (GPA), and a local memory translation table to translate between graphics GPAs and PAs for the local memory.

Virtualization and multi-tenancy support in graphics processors

Graphics processing systems and methods are described. A graphics processing apparatus may comprise one or more graphics processing engines, a memory, a memory management unit (MMU) including a GPU second level page table and GPU dirty bit tracking, and a provisioning agent to receive a request from a virtual machine monitor (VMM) to provision a subcluster of graphics processing apparatuses, the subcluster including a plurality of graphics processing engines from a plurality of graphics processing apparatuses connected using a scale-up fabric, provision the scale-up fabric to route data within the subcluster of graphics processing apparatuses, and provision a plurality of resources on the graphics processing apparatus for the subcluster based on the request from the VMM.