Patent classifications
G06F15/7839
GRAPHICS PROCESSOR DATA ACCESS AND SHARING
- Altug Koker ,
- Varghese George ,
- Aravindh Anantaraman ,
- Valentin Andrei ,
- Abhishek R. Appu ,
- Niranjan Cooray ,
- Nicolas Galoppo von Borries ,
- Mike Macpherson ,
- Subramaniam Maiyuran ,
- Elmoustapha Ould-Ahmed-Vall ,
- David Puffer ,
- Vasanth Ranganathan ,
- Joydeep Ray ,
- Ankur N. Shah ,
- Lakshminarayanan Striramassarma ,
- Prasoonkumar Surti ,
- Saurabh Tangri
Embodiments are generally directed to graphics processor data access and sharing. An embodiment of an apparatus includes a circuit element to produce a result in processing of an application; a load-store unit to receive the result and generate pre-fetch information for a cache utilizing the result; and a prefetch generator to produce prefetch addresses based at least in part on the pre-fetch information; wherein the load-store unit is to receive software assistance for prefetching, and wherein generation of the pre-fetch information is based at least in part on the software assistance.
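The software-assisted prefetch flow described above can be sketched in software. This is an illustrative model only, not the patented implementation; the `stride` and `depth` hint fields are hypothetical stand-ins for whatever software assistance the load-store unit actually receives:

```python
# Sketch: a load-store unit combines a produced result (here, a base
# address) with software-supplied hints to generate prefetch addresses.
# "stride" and "depth" are assumed hint fields, not from the patent text.

def generate_prefetch_addresses(result_addr, sw_hint):
    """Produce prefetch addresses from a computed result and a software hint."""
    stride = sw_hint.get("stride", 64)   # assumed cache-line stride in bytes
    depth = sw_hint.get("depth", 4)      # how many lines ahead to prefetch
    return [result_addr + i * stride for i in range(1, depth + 1)]

addrs = generate_prefetch_addresses(0x1000, {"stride": 64, "depth": 3})
# addrs -> [0x1040, 0x1080, 0x10C0]
```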
MEMORY CONTROLLER MANAGEMENT TECHNIQUES
Methods and apparatus relating to memory controller techniques. In an example, an apparatus comprises a cache memory, a high-bandwidth memory, and a processor communicatively coupled to the cache memory and the high-bandwidth memory, the processor to manage data transfer between the cache memory and the high-bandwidth memory for memory access operations directed to the high-bandwidth memory. Other embodiments are also disclosed and claimed.
CACHE STRUCTURE AND UTILIZATION
Embodiments are generally directed to cache structure and utilization. An embodiment of an apparatus includes one or more processors including a graphics processor; a memory for storage of data for processing by the one or more processors; and a cache to cache data from the memory; wherein the apparatus is to provide for dynamic overfetching of cache lines for the cache, including receiving a read request and accessing the cache for the requested data, and upon a miss in the cache, overfetching data from memory or a higher level cache in addition to fetching the requested data, wherein the overfetching of data is based at least in part on a current overfetch boundary, and provides for data to be prefetched extending to the current overfetch boundary.
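The overfetch behavior on a miss can be modeled simply: fetch the requested line, plus neighboring lines extending to the current overfetch boundary. This is a minimal sketch under the assumption that the boundary is an address and that lines are 64 bytes; how the hardware adapts the boundary is not modeled:

```python
# Sketch of dynamic overfetch on a cache miss. The 64-byte line size and
# the boundary-as-address interpretation are assumptions for illustration.

def lines_to_fetch(miss_addr, boundary_addr, line_size=64):
    """Return line-aligned addresses to fetch on a miss: the requested
    line plus overfetched lines extending to the overfetch boundary."""
    first = (miss_addr // line_size) * line_size
    last = max(first, ((boundary_addr - 1) // line_size) * line_size)
    return list(range(first, last + line_size, line_size))

fetched = lines_to_fetch(0x1010, 0x1100)
# fetched -> [0x1000, 0x1040, 0x1080, 0x10C0]
```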
COMPUTE OPTIMIZATION IN GRAPHICS PROCESSING
Embodiments are generally directed to compute optimization in graphics processing. An embodiment of an apparatus includes one or more processors including a multi-tile graphics processing unit (GPU) to process data, the multi-tile GPU including multiple processor tiles; and a memory for storage of data for processing, wherein the apparatus is to receive compute work for processing by the GPU, partition the compute work into multiple work units, assign each of multiple work units to one of the processor tiles, and process the compute work using the processor tiles assigned to the work units.
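The partition-and-assign step above can be sketched as a simple range split across tiles. This is an illustrative scheme (contiguous ranges, remainder spread over the first tiles), not the scheduling policy the patent claims:

```python
# Sketch: partition compute work into per-tile work units. The contiguous
# range-split policy is an assumption chosen for clarity.

def partition_work(num_items, num_tiles):
    """Split num_items work items into work units, one per processor tile.
    Returns (tile_id, start, end) tuples covering all items."""
    base, rem = divmod(num_items, num_tiles)
    units, start = [], 0
    for tile in range(num_tiles):
        size = base + (1 if tile < rem else 0)  # spread remainder evenly
        units.append((tile, start, start + size))
        start += size
    return units

units = partition_work(10, 4)
# units -> [(0, 0, 3), (1, 3, 6), (2, 6, 8), (3, 8, 10)]
```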
Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format
Described herein is a graphics processing unit (GPU) comprising a single instruction, multiple thread (SIMT) multiprocessor comprising an instruction cache, a shared memory coupled with the instruction cache, and circuitry coupled with the shared memory and the instruction cache, the circuitry including multiple texture units, a first core including hardware to accelerate matrix operations, and a second core configured to receive an instruction having multiple operands in a bfloat16 (BF16) number format, wherein the multiple operands include a first source operand, a second source operand, and a third source operand, and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent, and to process the instruction, wherein to process the instruction includes to multiply the second source operand by the third source operand and add the first source operand to a result of the multiply.
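The BF16 multiply-accumulate behavior can be modeled in software. This sketch emulates bfloat16 by truncating a float32 to its top 16 bits (sign, 8-bit exponent, 7-bit mantissa); the actual hardware rounding mode and accumulator precision are not specified here and are assumptions:

```python
import struct

def to_bf16(x):
    """Emulate bfloat16 by truncating a float32 to its upper 16 bits.
    (Truncation is an assumption; hardware may round-to-nearest.)"""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def bf16_fma(src0, src1, src2):
    """dst = src1 * src2 + src0, with src1/src2 quantized to BF16 and the
    accumulation kept at higher precision, as hybrid-format MAC units do."""
    return to_bf16(src1) * to_bf16(src2) + src0

# BF16 has only 7 mantissa bits, so fine-grained values collapse:
print(to_bf16(1.00390625))     # 1 + 2^-8 truncates to 1.0
print(bf16_fma(1.0, 2.0, 3.0)) # 2 * 3 + 1 = 7.0
```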
MULTI-TILE ARCHITECTURE FOR GRAPHICS OPERATIONS
- Altug Koker ,
- Ben Ashbaugh ,
- Scott Janus ,
- Aravindh Anantaraman ,
- Abhishek R. Appu ,
- Niranjan Cooray ,
- Varghese George ,
- Arthur Hunter ,
- Brent E. Insko ,
- Elmoustapha Ould-Ahmed-Vall ,
- Selvakumar Panneer ,
- Vasanth Ranganathan ,
- Joydeep Ray ,
- Kamal Sinha ,
- Lakshminarayanan Striramassarma ,
- Prasoonkumar Surti ,
- Saurabh Tangri
Embodiments are generally directed to a multi-tile architecture for graphics operations. An embodiment of an apparatus includes a multi-tile graphics processor, the multi-tile processor including one or more dies; multiple processor tiles installed on the one or more dies; and a structure to interconnect the processor tiles on the one or more dies, wherein the structure is to enable communications between the processor tiles.
Multi-tile Memory Management for Detecting Cross Tile Access, Providing Multi-Tile Inference Scaling, and Providing Page Migration
- Lakshminarayanan Striramassarma ,
- Prasoonkumar Surti ,
- Varghese George ,
- Ben Ashbaugh ,
- Aravindh Anantaraman ,
- Valentin Andrei ,
- Abhishek Appu ,
- Nicolas Galoppo von Borries ,
- Altug Koker ,
- Mike Macpherson ,
- Subramaniam Maiyuran ,
- Nilay Mistry ,
- Elmoustapha Ould-Ahmed-Vall ,
- Selvakumar Panneer ,
- Vasanth Ranganathan ,
- Joydeep Ray ,
- Ankur Shah ,
- Saurabh Tangri
Multi-tile Memory Management for Detecting Cross Tile Access, Providing Multi-Tile Inference Scaling with multicasting of data via copy operation, and Providing Page Migration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a memory and a memory controller, a second graphics processing unit (GPU) having a memory, and a cross-GPU fabric to communicatively couple the first and second GPUs. The memory controller is configured to determine whether frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU in the multi-GPU configuration and to send a message to initiate a data transfer mechanism when such accesses occur.
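The cross-tile detection step can be sketched as a per-GPU-pair access counter that signals the data transfer mechanism once a threshold is crossed. The counter scheme and threshold value are assumptions; the patent does not define what "frequent" means numerically:

```python
from collections import Counter

class CrossTileMonitor:
    """Sketch: count remote memory accesses per (source GPU, destination
    GPU) pair; signal data migration when a pair crosses a threshold.
    The threshold value is a hypothetical parameter."""

    def __init__(self, threshold=1000):
        self.threshold = threshold
        self.counts = Counter()

    def record_access(self, src_gpu, dst_gpu):
        """Record one access; return True when migration should be initiated."""
        if src_gpu == dst_gpu:
            return False  # local access, not cross-tile
        self.counts[(src_gpu, dst_gpu)] += 1
        # True models "send a message to initiate a data transfer mechanism"
        return self.counts[(src_gpu, dst_gpu)] >= self.threshold
```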
MULTI-TILE MEMORY MANAGEMENT
- Abhishek R. Appu ,
- Altug Koker ,
- Aravindh Anantaraman ,
- Elmoustapha Ould-Ahmed-Vall ,
- Valentin Andrei ,
- Nicolas Galoppo von Borries ,
- Varghese George ,
- Mike Macpherson ,
- Subramaniam Maiyuran ,
- Joydeep Ray ,
- Lakshminarayanan Striramassarma ,
- Scott Janus ,
- Brent Insko ,
- Vasanth Ranganathan ,
- Kamal Sinha ,
- Arthur Hunter ,
- Prasoonkumar Surti ,
- David Puffer ,
- James Valerio ,
- Ankur N. Shah
Methods and apparatus relating to techniques for multi-tile memory management. In an example, an apparatus comprises a cache memory, a high-bandwidth memory, a shader core communicatively coupled to the cache memory and comprising a processing element to decompress a first data element extracted from an in-memory database in the cache memory and having a first bit length to generate a second data element having a second bit length, greater than the first bit length, and an arithmetic logic unit (ALU) to compare the second data element to a target value provided in a query of the in-memory database. Other embodiments are also disclosed and claimed.
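The decompress-then-compare path can be illustrated with bit-packed integers: narrow fields are expanded to full-width values and each is tested against the query target. The packing layout (little-endian, fixed width) is an assumption for illustration:

```python
# Sketch: decompress bit-packed database elements (first_bits wide) into
# wider integers, then compare each against a query target value, as the
# shader core's processing element and ALU would.

def unpack_and_match(packed, first_bits, count, target):
    """Unpack `count` fields of `first_bits` bits each from `packed`
    (little-endian layout assumed) and test each against `target`."""
    mask = (1 << first_bits) - 1
    elems = [(packed >> (i * first_bits)) & mask for i in range(count)]
    return [e == target for e in elems]

# Three 4-bit values [5, 3, 5] packed into one integer: 5 | 3<<4 | 5<<8
matches = unpack_and_match(0x535, 4, 3, 5)
# matches -> [True, False, True]
```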
GRAPHICS PROCESSOR OPERATION SCHEDULING FOR DETERMINISTIC LATENCY
Embodiments described herein include software, firmware, and hardware that provide techniques to enable deterministic scheduling across multiple general-purpose graphics processing units. One embodiment provides a multi-GPU architecture with uniform latency. One embodiment provides techniques to distribute memory output based on memory chip thermals. One embodiment provides techniques to enable thermally aware workload scheduling. One embodiment provides techniques to enable end-to-end contracts for workload scheduling on multiple GPUs.
COMPRESSION TECHNIQUES
Methods and apparatus relating to techniques for data compression. In an example, an apparatus comprises a processor to receive a data compression instruction for a memory segment and, in response to the data compression instruction, compress a sequence of identical memory values upon a determination that the sequence has a length which exceeds a threshold. Other embodiments are also disclosed and claimed.
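The thresholded compression of identical-value runs amounts to conditional run-length encoding: only runs longer than the threshold are collapsed, shorter ones are stored literally. This sketch uses a hypothetical threshold and tuple encoding; the patent does not specify either:

```python
# Sketch: run-length-compress only those runs of identical values whose
# length exceeds a threshold. Threshold and output encoding are assumed.

def compress_runs(values, threshold=4):
    """Return a list of ("run", value, length) for long runs and
    ("lit", value) entries for everything else."""
    out, i = [], 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1  # extend the run of identical values
        run = j - i
        if run > threshold:
            out.append(("run", values[i], run))
        else:
            out.extend(("lit", v) for v in values[i:j])
        i = j
    return out

encoded = compress_runs([7] * 6 + [1, 2], threshold=4)
# encoded -> [("run", 7, 6), ("lit", 1), ("lit", 2)]
```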