Patent classifications
G06F2212/302
Method and Apparatus for Shared Virtual Memory to Manage Data Coherency in a Heterogeneous Processing System
One embodiment provides for a heterogeneous computing device comprising a first processor coupled with a second processor, wherein at least one of the first processor or the second processor includes graphics processing logic; wherein each of the first processor and the second processor includes first logic to perform virtual-to-physical memory address translation; and wherein the first logic includes a cache coherency state for a block of memory associated with a virtual memory address.
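The mechanism described above amounts to a translation entry that carries a coherency state alongside the virtual-to-physical mapping, so a single lookup yields both. Below is a minimal Python sketch of that idea; the class names, the MESI-style states, and the 4 KB page size are illustrative assumptions, not details taken from the patent.

```python
# Minimal sketch (not the patented implementation): a translation entry that
# carries a MESI-style coherency state alongside the virtual-to-physical mapping,
# so a lookup yields both the physical address and the block's sharing status.
from dataclasses import dataclass
from enum import Enum

class Coherency(Enum):
    MODIFIED = "M"
    SHARED = "S"
    INVALID = "I"

@dataclass
class TranslationEntry:
    physical_page: int
    state: Coherency

PAGE_SIZE = 4096  # assumed page size for the sketch

class TranslationUnit:
    """Per-processor translation logic; all names here are illustrative only."""
    def __init__(self):
        self.entries: dict[int, TranslationEntry] = {}

    def map(self, virtual_page: int, physical_page: int, state: Coherency) -> None:
        self.entries[virtual_page] = TranslationEntry(physical_page, state)

    def translate(self, virtual_addr: int) -> tuple[int, Coherency]:
        vpage, offset = divmod(virtual_addr, PAGE_SIZE)
        entry = self.entries[vpage]
        return entry.physical_page * PAGE_SIZE + offset, entry.state

# Usage: CPU and GPU share the same virtual page but track coherency per side.
cpu_mmu, gpu_mmu = TranslationUnit(), TranslationUnit()
cpu_mmu.map(0x10, 0x80, Coherency.SHARED)
gpu_mmu.map(0x10, 0x80, Coherency.SHARED)
print(cpu_mmu.translate(0x10 * PAGE_SIZE + 0x20))  # (physical address, Coherency.SHARED)
```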
Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format
Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating-point operations, the floating-point unit configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand to the output of the multiplier.
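The instruction described is effectively a multiply-accumulate of the form dst = src0 + src1 * src2 with bfloat16 operands. A small Python sketch that emulates the arithmetic follows; treating BF16 as a float32 with its low 16 bits truncated is an assumption about rounding (hardware may round to nearest), and the function names are invented.

```python
# Illustrative sketch only: emulating the described BF16 multiply-accumulate
# (dst = src0 + src1 * src2) by truncating float32 values to bfloat16 precision.
import struct

def to_bf16(x: float) -> float:
    """Truncate a float toward bfloat16 by keeping the top 16 bits of its float32 form."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def bf16_mad(src0: float, src1: float, src2: float) -> float:
    # The multiplier consumes the second and third source operands ...
    product = to_bf16(src1) * to_bf16(src2)
    # ... and the accumulator adds the first source operand to the multiplier output.
    return to_bf16(src0) + product

print(bf16_mad(1.0, 3.14159, 2.71828))  # accumulate with reduced-precision inputs
```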
MEMORY MODULE, SYSTEM INCLUDING THE SAME, AND OPERATION METHOD OF MEMORY MODULE
A memory module includes a device memory configured to store data and including a first memory area and a second memory area, and a controller including an accelerator circuit. The controller is configured to control the device memory, transmit, to a host processor, a command to exclude the first memory area from the system memory map in response to a mode change request, and modify a memory configuration register to exclude the first memory area from the memory configuration register. The accelerator circuit is configured to use the first memory area to perform an acceleration operation.
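The flow can be pictured as a handshake: on a mode-change request the controller tells the host to drop the first memory area from the system memory map, removes that area from its own configuration register, and hands it to the accelerator. A hedged Python sketch of that sequence; all class names, method names, and address ranges are hypothetical.

```python
# A hedged sketch of the described flow: on a mode-change request the controller
# asks the host to drop the first memory area from the system memory map, clears
# that area from its own configuration register, and hands it to the accelerator.
class MemoryModuleController:
    def __init__(self, area1, area2):
        self.config_register = {"area1": area1, "area2": area2}  # exposed to the host
        self.accelerator_region = None

    def handle_mode_change(self, host) -> None:
        area1 = self.config_register.pop("area1")   # exclude from the configuration register
        host.exclude_from_memory_map(area1)         # command sent to the host processor
        self.accelerator_region = area1             # accelerator circuit now owns this area

class Host:
    def __init__(self):
        self.memory_map = set()
    def add_to_memory_map(self, area): self.memory_map.add(area)
    def exclude_from_memory_map(self, area): self.memory_map.discard(area)

host = Host()
ctrl = MemoryModuleController(area1=(0x0, 0x4000_0000), area2=(0x4000_0000, 0x8000_0000))
for area in ctrl.config_register.values():
    host.add_to_memory_map(area)
ctrl.handle_mode_change(host)
print(ctrl.accelerator_region in host.memory_map)   # False: host no longer maps area 1
```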
IN-MEMORY DATABASE (IMDB) ACCELERATION THROUGH NEAR DATA PROCESSING
An accelerator is disclosed. The accelerator may include an on-chip memory to store data from a database. The on-chip memory may include a first memory bank and a second memory bank. The first memory bank may store the data, which may include a first value and a second value. A computational engine may execute, in parallel, a command on the first value in the data and the command on the second value in the data in the on-chip memory. The on-chip memory may be configured to load second data from the database into the second memory bank in parallel with the computational engine executing the command on the first value in the data and executing the command on the second value in the data.
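This is a double-buffering pattern: the computational engine works through every value in one bank while the other bank is filled from the database. A rough Python sketch of that overlap, with a thread pool standing in for the hardware load path and a trivial placeholder command; none of these names come from the patent.

```python
# Sketch under stated assumptions: the compute engine applies the same command
# to every value in one bank while the other bank is being filled from the
# database (classic double buffering). The database and command are placeholders.
from concurrent.futures import ThreadPoolExecutor

def load_bank(database, chunk_index):
    return database[chunk_index]          # fetch the next chunk into the idle bank

def run_command(bank, command):
    return [command(v) for v in bank]     # same command applied to each value

database = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
command = lambda v: v * 10
results = []
bank = load_bank(database, 0)
with ThreadPoolExecutor(max_workers=2) as pool:
    for i in range(1, len(database) + 1):
        prefetch = pool.submit(load_bank, database, i) if i < len(database) else None
        results.extend(run_command(bank, command))      # compute on the current bank
        bank = prefetch.result() if prefetch else None  # swap banks for the next round
print(results)  # [10, 20, ..., 90]
```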
DYNAMIC ASSIGNMENT OF DOWN SAMPLING INTERVALS FOR DATA STREAM PROCESSING
- Joydeep Ray
- Ben Ashbaugh
- Prasoonkumar Surti
- Pradeep Ramani
- Rama Harihara
- Jerin C. Justin
- Jing Huang
- Xiaoming Cui
- Timothy B. Costa
- Ting Gong
- Elmoustapha Ould-Ahmed-Vall
- Kumar Balasubramanian
- Anil Thomas
- Oguz H. Elibol
- Jayaram Bobba
- Guozhong Zhuang
- Bhavani Subramanian
- Gokce Keskin
- Chandrasekaran Sakthivel
- Rajesh Poornachandran
Embodiments are generally directed to compression in machine learning and deep learning processing. An embodiment of an apparatus for compression of untyped data includes a graphics processing unit (GPU) including a data compression pipeline, the data compression pipeline including a data port coupled with one or more shader cores, wherein the data port is to allow transfer of untyped data without format conversion, and a 3D compression/decompression unit to provide for compression of untyped data to be stored to a memory subsystem and decompression of untyped data from the memory subsystem.
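The pipeline described stores compressed data in the memory subsystem while the data port moves raw, untyped bytes with no format conversion. A small Python sketch of that round trip, using zlib purely as a stand-in for the hardware 3D compression/decompression unit; the class structure is an illustration, not the patented design.

```python
# A minimal stand-in for the described pipeline, with zlib substituting for the
# hardware 3D compression/decompression unit: untyped bytes pass through the data
# port unchanged and are compressed only when written to the memory subsystem.
import zlib

class MemorySubsystem:
    def __init__(self):
        self.storage: dict[int, bytes] = {}

class CompressionPipeline:
    def write(self, mem: MemorySubsystem, address: int, untyped: bytes) -> None:
        mem.storage[address] = zlib.compress(untyped)      # compress on store

    def read(self, mem: MemorySubsystem, address: int) -> bytes:
        return zlib.decompress(mem.storage[address])       # decompress on load

mem, pipe = MemorySubsystem(), CompressionPipeline()
payload = bytes(range(64)) * 16                            # raw, format-agnostic data
pipe.write(mem, 0x1000, payload)
assert pipe.read(mem, 0x1000) == payload                   # round-trips without format conversion
```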
Policy-based system interface for a real-time autonomous system
Tile-based graphics
A tile-based graphics system has a rendering space sub-divided into a plurality of tiles which are to be processed. Graphics data items, such as parameters or texels, are fetched into a cache for use in processing one of the tiles. Indicators are determined for the graphics data items, whereby the indicator for a graphics data item indicates the number of tiles with which that graphics data item is associated. The graphics data items are evicted from the cache in accordance with the indicators of the graphics data items. For example, the indicator for a graphics data item may be a count of the number of tiles with which that graphics data item is associated, whereby the graphics data item(s) with the lowest count(s) is (are) evicted from the cache.
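Concretely, the eviction policy keys on how many tiles still need each cached item. A brief Python sketch of that policy; the cache structure and method names are assumptions for illustration.

```python
# Sketch of the eviction idea described above (illustrative only): each cached
# graphics data item carries a count of tiles associated with it, and the item
# with the lowest count is chosen as the eviction victim when space is needed.
class TileAwareCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: dict[str, int] = {}   # item key -> number of associated tiles

    def insert(self, key: str, tile_count: int) -> None:
        if len(self.items) >= self.capacity:
            victim = min(self.items, key=self.items.get)   # fewest associated tiles
            del self.items[victim]
        self.items[key] = tile_count

    def tile_done(self, key: str) -> None:
        self.items[key] -= 1              # one fewer tile needs this item

cache = TileAwareCache(capacity=2)
cache.insert("texelsA", tile_count=4)
cache.insert("paramsB", tile_count=1)
cache.insert("texelsC", tile_count=3)    # evicts paramsB, the least-shared item
print(cache.items)                        # {'texelsA': 4, 'texelsC': 3}
```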
METHODS AND APPARATUSES FOR DYNAMICALLY CHANGING DATA PRIORITY IN A CACHE
Embodiments are generally directed to methods and apparatuses for dynamically changing data priority in a cache. An embodiment of an apparatus comprises a priority controller to: receive a memory access request for data; and set a priority flag for the memory access request, based on an accumulated access amount of the data stored in the memory block to be accessed by the memory access request, to dynamically change a priority level of the requested data.
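One way to read this is as a threshold policy: the controller keeps a running total of how much data each memory block has served and flags requests as high priority once the total crosses a limit. A minimal Python sketch under that assumption; the threshold value and its units are invented.

```python
# A minimal sketch, assuming a simple threshold policy: the controller accumulates
# how much data each memory block has served and flags a request as high priority
# once that block's accumulated access amount crosses a (hypothetical) threshold.
class PriorityController:
    def __init__(self, threshold_bytes: int):
        self.threshold = threshold_bytes
        self.accumulated: dict[int, int] = {}   # memory block -> bytes accessed so far

    def handle_request(self, block: int, size: int) -> bool:
        total = self.accumulated.get(block, 0) + size
        self.accumulated[block] = total
        return total >= self.threshold           # priority flag for this request

ctrl = PriorityController(threshold_bytes=1024)
print(ctrl.handle_request(block=7, size=512))    # False: block not yet hot
print(ctrl.handle_request(block=7, size=512))    # True: priority dynamically raised
```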
Trusted local memory management in a virtualized GPU
Embodiments are directed to trusted local memory management in a virtualized GPU. An embodiment of an apparatus includes one or more processors including a trusted execution environment (TEE); a GPU including a trusted agent; and a memory, the memory including GPU local memory, the trusted agent to ensure proper allocation and deallocation of the local memory and to verify translations between graphics physical addresses (PAs) and PAs for the apparatus, wherein the local memory is partitioned into protection regions including a protected region and an unprotected region, and wherein the protected region is to store a memory permission table maintained by the trusted agent (the memory permission table to include any virtual function assigned to a trusted domain), a per-process graphics translation table to translate between graphics virtual addresses (VAs) and graphics guest PAs (GPAs), and a local memory translation table to translate between graphics GPAs and PAs for the local memory.
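The translation path described is two-level: a per-process table maps graphics VAs to guest PAs, and a trusted-agent-owned table maps guest PAs to device PAs, gated by the permission table. A compact Python sketch of that check-then-translate step; the page numbers, names, and single-owner permission model are simplifying assumptions.

```python
# Illustrative sketch of the two-level translation the abstract describes: a
# per-process table maps graphics VAs to guest PAs, and a trusted-agent-owned
# table maps guest PAs to device PAs, checked against a permission table first.
class TrustedAgent:
    def __init__(self):
        self.permission_table = {}    # guest PA page -> virtual function allowed to use it
        self.gpa_to_pa = {}           # local memory translation table (GPA -> PA)

    def grant(self, vf: str, gpa_page: int, pa_page: int) -> None:
        self.permission_table[gpa_page] = vf
        self.gpa_to_pa[gpa_page] = pa_page

    def translate(self, vf: str, gpa_page: int) -> int:
        if self.permission_table.get(gpa_page) != vf:
            raise PermissionError("guest PA not assigned to this virtual function")
        return self.gpa_to_pa[gpa_page]

per_process_gtt = {0x10: 0x200}       # graphics VA page -> guest PA page
agent = TrustedAgent()
agent.grant(vf="vf0", gpa_page=0x200, pa_page=0x5000)
gpa = per_process_gtt[0x10]
print(hex(agent.translate("vf0", gpa)))   # 0x5000: verified local-memory physical page
```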
METHODS AND APPARATUS FOR WAVE SLOT RETIREMENT PROCEDURES
The present disclosure relates to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may receive a plurality of workloads based on a workload order, the plurality of workloads including at least a first workload and a second workload, each received in the workload order. The apparatus may also allocate one or more workloads of the plurality of workloads to one or more wave slots. Additionally, the apparatus may execute the one or more allocated workloads at the one or more wave slots, such that at least the first workload is executed at a first wave slot and the second workload is executed at a second wave slot. The apparatus may also allocate at least one other workload of the plurality of workloads to at least one previously-allocated wave slot of the one or more wave slots.
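The allocation scheme can be sketched as a small in-order scheduler: workloads arrive in order, take free wave slots, and a retired slot is reused for a later workload. A Python sketch under those assumptions; slot retirement here is simplified to oldest-first, which the abstract does not specify.

```python
# A small sketch (names are assumptions) of the allocation flow described above:
# workloads arrive in order, are assigned to free wave slots, execute, and a
# retired slot is then reused for a later workload in the queue.
from collections import deque

class WaveSlots:
    def __init__(self, num_slots: int):
        self.free = deque(range(num_slots))
        self.busy = deque()                          # slots in the order they were allocated
        self.executed = []

    def run(self, workloads):
        for workload in workloads:                   # received in workload order
            if not self.free:
                self.free.append(self.busy.popleft())   # retire the oldest busy slot
            slot = self.free.popleft()
            self.busy.append(slot)
            self.executed.append((slot, workload))   # execute the workload at this slot

slots = WaveSlots(num_slots=2)
slots.run(["first", "second", "third"])
print(slots.executed)   # [(0, 'first'), (1, 'second'), (0, 'third')] - slot 0 reused
```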