Patent classifications
G06F2212/455
OPPORTUNISTIC LATE DEPTH TESTING TO PREVENT STALLING FOR OVERLAPPING CACHE LINES
Methods, systems and apparatuses provide for graphics processor technology that determines whether a first cache line allocated for early depth testing overlaps a second cache line allocated for late depth testing, and when the first cache line overlaps the second cache line, switches the first cache line to be allocated for late depth testing, and bypasses an early depth test for the first cache line. The technology can also compare coordinates of the first cache line with the coordinates of the second cache line, where an overlap is determined when coordinates for at least one pixel in the first cache line match coordinates for at least one pixel in the second cache line. Additionally, the technology can also perform early depth testing on each pixel in the first cache line when the first cache line does not overlap any existing cache lines allocated for late depth testing.
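As a rough illustration only (the patent publishes no code), the overlap check described above can be sketched in C++ as a coordinate comparison between a cache line pending early depth testing and the cache lines already allocated for late depth testing; the names CacheLine, DepthPhase and allocateForDepthTest are hypothetical.

#include <cstdint>
#include <vector>

enum class DepthPhase { Early, Late };

struct Pixel { uint32_t x, y; };

struct CacheLine {
    std::vector<Pixel> pixels;   // screen coordinates covered by this line
    DepthPhase phase;            // which depth test the line is allocated for
};

// An overlap exists when at least one pixel coordinate appears in both lines.
bool overlaps(const CacheLine& a, const CacheLine& b) {
    for (const Pixel& pa : a.pixels)
        for (const Pixel& pb : b.pixels)
            if (pa.x == pb.x && pa.y == pb.y)
                return true;
    return false;
}

// If the incoming early-test line overlaps any existing late-test line, switch
// it to late depth testing and bypass the early test; otherwise early-test it.
void allocateForDepthTest(CacheLine& incoming,
                          const std::vector<CacheLine>& lateLines) {
    for (const CacheLine& late : lateLines) {
        if (overlaps(incoming, late)) {
            incoming.phase = DepthPhase::Late;   // bypass the early depth test
            return;
        }
    }
    incoming.phase = DepthPhase::Early;          // early-test every pixel
}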
DATA COMMUNICATION METHOD, COMMUNICATION SYSTEM AND COMPUTER-READABLE STORAGE MEDIUM
The present application provides a data communication method, a communication system and a computer-readable storage medium. The method comprises: acquiring, by a data production module, target data to be sent to a data consumption module; determining in a preset GPU shared memory, by the data production module, a target memory block into which the target data is to be written, wherein the GPU shared memory is a predetermined GPU memory for data communication between the data production module and the data consumption module; writing, by the data production module, the target data into the target memory block to obtain memory address information corresponding to the target data; and sending, by the data production module, the memory address information to the data consumption module so that the data consumption module is operable to access the target data based on the memory address information.
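A minimal host-side sketch of the exchange described above, assuming a pre-allocated pool of fixed-size memory blocks standing in for the GPU shared memory and a simple queue carrying only address information; the names SharedPool, AddressInfo, produce and consume are illustrative and not taken from the application.

#include <cstddef>
#include <cstring>
#include <queue>
#include <vector>

// Address information handed from producer to consumer instead of the data itself.
struct AddressInfo {
    std::size_t blockIndex;   // which block of the shared memory holds the data
    std::size_t length;       // number of bytes written
};

// Stand-in for a predetermined region of GPU memory divided into fixed blocks.
struct SharedPool {
    std::vector<std::vector<char>> blocks;
    SharedPool(std::size_t count, std::size_t size)
        : blocks(count, std::vector<char>(size)) {}
};

// Producer: write the target data into the chosen block, return its address info.
AddressInfo produce(SharedPool& pool, std::size_t block,
                    const char* data, std::size_t len) {
    std::memcpy(pool.blocks[block].data(), data, len);
    return AddressInfo{block, len};
}

// Consumer: use the received address information to access the data in place.
std::vector<char> consume(const SharedPool& pool, const AddressInfo& info) {
    const auto& blk = pool.blocks[info.blockIndex];
    return std::vector<char>(blk.begin(), blk.begin() + info.length);
}

int main() {
    SharedPool pool(8, 256);                   // 8 blocks of 256 bytes each
    std::queue<AddressInfo> channel;           // carries addresses, not payloads
    const char msg[] = "target data";
    channel.push(produce(pool, 0, msg, sizeof(msg)));
    std::vector<char> received = consume(pool, channel.front());
    (void)received;
}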
DISPLAY ENGINE INITIATED PREFETCH TO SYSTEM CACHE TO TOLERATE MEMORY LONG BLACKOUT
A disclosed technique includes prefetching display data into a cache memory, wherein the display data includes data to be displayed on a display during a memory black-out period for a memory; triggering the memory black-out period; and during the black-out period, reading from the cache memory to obtain data to be displayed on the display.
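The three steps of the technique can be modelled very loosely as below, with SystemCache and Dram as hypothetical stand-ins for the system cache and the memory being blacked out; this is a sketch of the control flow only, not of any real display engine interface.

#include <cstddef>
#include <cstdint>
#include <vector>

struct SystemCache { std::vector<uint32_t> lines; };
struct Dram        { std::vector<uint32_t> frame; bool available = true; };

// Step 1: the display engine prefetches the display data it will need into the cache.
void prefetchFrame(SystemCache& cache, const Dram& dram) {
    cache.lines = dram.frame;
}

// Step 2: trigger the memory black-out period (DRAM can no longer be read).
void enterBlackout(Dram& dram) { dram.available = false; }

// Step 3: during the black-out period, scan-out is served from the cache alone.
uint32_t readPixel(const SystemCache& cache, const Dram& dram, std::size_t i) {
    return dram.available ? dram.frame[i] : cache.lines[i];
}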
METHODS AND APPARATUSES FOR DYNAMICALLY CHANGING DATA PRIORITY IN A CACHE
Embodiments are generally directed to methods and apparatuses for dynamically changing data priority in a cache. An embodiment of an apparatus comprises a priority controller to: receive a memory access request requesting data; and set a priority flag for the memory access request based on an accumulated access amount of data stored in a memory block to be accessed by the memory access request, so as to dynamically change a priority level of the requested data.
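The per-block accounting implied by the abstract might look like the following sketch, where the threshold value and the structure name PriorityController are assumptions for illustration.

#include <cstdint>
#include <unordered_map>

struct PriorityController {
    std::unordered_map<uint64_t, uint64_t> accessedBytes;   // block id -> bytes accessed so far
    uint64_t highPriorityThreshold = 1 << 20;                // illustrative threshold only

    // Returns the priority flag to attach to the memory access request: blocks
    // whose accumulated access amount crosses the threshold are marked high
    // priority so the cache retains their data longer.
    bool setPriorityFlag(uint64_t blockId, uint64_t requestBytes) {
        accessedBytes[blockId] += requestBytes;              // accumulate access amount
        return accessedBytes[blockId] >= highPriorityThreshold;
    }
};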
Memory sharing via a unified memory architecture
A method and system for sharing memory between a central processing unit (CPU) and a graphics processing unit (GPU) of a computing device are disclosed herein. The method includes allocating a surface within a physical memory and mapping the surface to a plurality of virtual memory addresses within a CPU page table. The method also includes mapping the surface to a plurality of graphics virtual memory addresses within an I/O device page table.
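A simplified sketch of the dual mapping described above, assuming flat page tables keyed by virtual address; PageTable and mapSurface are hypothetical names, and real CPU and I/O device page tables are of course hardware-defined structures.

#include <cstddef>
#include <cstdint>
#include <unordered_map>

using PhysAddr = uint64_t;
using VirtAddr = uint64_t;

// Hypothetical page table: virtual page address -> physical page address.
struct PageTable { std::unordered_map<VirtAddr, PhysAddr> entries; };

// Map every page of the allocated surface into both the CPU page table and the
// I/O device (graphics) page table so CPU and GPU address the same physical memory.
void mapSurface(PhysAddr surfaceBase, std::size_t pages, std::size_t pageSize,
                VirtAddr cpuBase, PageTable& cpuTable,
                VirtAddr gpuBase, PageTable& gpuTable) {
    for (std::size_t i = 0; i < pages; ++i) {
        PhysAddr phys = surfaceBase + i * pageSize;
        cpuTable.entries[cpuBase + i * pageSize] = phys;   // CPU virtual mapping
        gpuTable.entries[gpuBase + i * pageSize] = phys;   // graphics virtual mapping
    }
}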
Data structure-aware prefetching method and device on graphics processing unit
The invention discloses a data structure-aware prefetching method and device on a graphics processing unit. The method comprises the steps of: monitoring, by a memory access monitor, the memory access requests with which the processor reads a graph data structure, and acquiring information about the data read; and using the data-structure access pattern defined by breadth-first search together with the graph data structure information to generate four corresponding vector prefetch requests and store them in a prefetch request queue. The device comprises a data prefetching unit distributed into each processing unit; each data prefetching unit is connected to a memory access monitor, a response FIFO and the primary cache of a load/store unit, and comprises an address space classifier, a runtime information table, prefetch request generation units and the prefetch request queue. According to the present invention, the data required by graph traversal can be prefetched more accurately and efficiently using breadth-first search, thereby improving the performance of the GPU in solving graph computation problems.
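Purely as an illustration of the breadth-first-search access pattern the abstract refers to, the sketch below generates one prefetch request per graph array commonly touched when a frontier vertex is expanded in a CSR representation (row offsets, neighbor list, visited flags, next frontier); the abstract does not enumerate the four requests, so this breakdown and the names GraphInfo and generatePrefetches are assumptions.

#include <cstdint>
#include <queue>

struct PrefetchRequest { uint64_t baseAddr; uint32_t bytes; };

// Hypothetical runtime information describing where the graph arrays live.
struct GraphInfo {
    uint64_t rowOffsets, neighbors, visited, frontier;      // base addresses
    uint32_t offsetStride, edgeStride, flagStride, queueStride;
};

// For a vertex about to be expanded, queue vectorized prefetches for the data
// the BFS traversal will read next.
void generatePrefetches(const GraphInfo& g, uint32_t vertex, uint32_t firstEdge,
                        std::queue<PrefetchRequest>& out) {
    out.push({g.rowOffsets + uint64_t(vertex) * g.offsetStride, g.offsetStride});
    out.push({g.neighbors  + uint64_t(firstEdge) * g.edgeStride, 4 * g.edgeStride});
    out.push({g.visited    + uint64_t(vertex) * g.flagStride, g.flagStride});
    out.push({g.frontier, g.queueStride});
}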
Data Processing Method and Device
A data processing method is applied to a digital interface and includes: reading data cached by a data source, where the data source includes a video source and an auxiliary data source; outputting video data if the video data cached by the video source is not empty, where, when the video data is output, corresponding position marks are placed at the start and end positions of a frame structure of the video data and at the start and end positions of a row structure of the video data; and outputting auxiliary data if the video data cached by the video source is empty, the auxiliary data cached by the auxiliary data source is not empty, and the frame structure or the row structure of the video data has been output, where, when the auxiliary data is output, corresponding position marks are placed at a start position and an end position of the auxiliary data.
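The priority rule above (video first, auxiliary data only when the video cache is empty and a frame or row structure has already gone out) can be sketched as a small state machine; the Interface structure and the textual position marks below are illustrative stand-ins for the real link format.

#include <deque>
#include <string>
#include <vector>

struct Interface {
    std::deque<std::string> videoCache, auxCache;
    bool structureOutput = false;       // a frame or row structure has been output
    std::vector<std::string> wire;      // what is sent on the digital interface

    void tick() {
        if (!videoCache.empty()) {                        // video data has priority
            wire.push_back("VIDEO_START");                // marks at start of row/frame
            wire.push_back(videoCache.front());
            wire.push_back("VIDEO_END");                  // marks at end of row/frame
            videoCache.pop_front();
            structureOutput = true;
        } else if (!auxCache.empty() && structureOutput) {
            wire.push_back("AUX_START");                  // marks around auxiliary data
            wire.push_back(auxCache.front());
            wire.push_back("AUX_END");
            auxCache.pop_front();
        }
    }
};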
Cache arrangement for graphics processing systems
A graphics processing system is disclosed having a cache system (24) arranged between memory (23) and the graphics processor (20), the cache system comprising a first cache (53) for transferring data to and from the graphics processor (20) and a second cache (54) arranged and configured to transfer data between the first cache (53) and memory (23). When data is to be written from the first cache (53) to memory (23), a cache controller (55) determines a data type of the data and, in dependence on the data type, either causes the data to be written into the second cache (54) without writing the data to memory (23), or causes the data to be written to memory (23) without storing the data in the second cache (54). In embodiments the second cache (54) is write-only allocated.
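The routing decision made by the cache controller (55) can be expressed compactly as below; the particular data types and the names writeBack, Cache and Memory are placeholders for illustration, not terms from the patent.

#include <cstdint>
#include <vector>

enum class DataType { Framebuffer, Texture };    // illustrative data types only

struct Cache  { std::vector<uint8_t> storage; };
struct Memory { std::vector<uint8_t> storage; };

// When a line leaves the first cache, route it by data type: some types are
// retained in the second (write-only-allocated) cache without touching memory,
// while others are written to memory and bypass the second cache entirely.
void writeBack(DataType type, const std::vector<uint8_t>& line,
               Cache& secondCache, Memory& memory) {
    if (type == DataType::Framebuffer) {
        secondCache.storage = line;               // kept on chip, no memory traffic
    } else {
        memory.storage = line;                    // bypasses the second cache
    }
}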
Register spill/fill using shared local memory space
A mechanism is described for facilitating using of a shared local memory for register spilling/filling relating to graphics processors at computing devices. A method of embodiments, as described herein, includes reserving one or more spaces of a shared local memory (SLM) to perform one or more of spilling and filling relating to registers associated with a graphics processor of a computing device.
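A toy model of the reserved-space layout: each thread gets a fixed region of the shared local memory, and spilling or filling a register is an indexed store or load into that region. SharedLocalMemory, spill and fill are hypothetical names, and the real mechanism operates in compiled shader code rather than host C++.

#include <cstdint>
#include <vector>

struct SharedLocalMemory {
    std::vector<uint32_t> words;
    uint32_t slotsPerThread;

    uint32_t slotIndex(uint32_t threadId, uint32_t spillSlot) const {
        return threadId * slotsPerThread + spillSlot;   // per-thread reserved region
    }
    // Spill: copy a live register value out to the thread's reserved SLM space.
    void spill(uint32_t threadId, uint32_t slot, uint32_t regValue) {
        words[slotIndex(threadId, slot)] = regValue;
    }
    // Fill: restore the value from SLM back into a register.
    uint32_t fill(uint32_t threadId, uint32_t slot) const {
        return words[slotIndex(threadId, slot)];
    }
};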
Systems for performing instructions for fast element unpacking into 2-dimensional registers
Disclosed embodiments relate to instructions for fast element unpacking. In one example, a processor includes: fetch circuitry to fetch an instruction whose format includes fields to specify an opcode and locations of an Array-of-Structures (AOS) source matrix and one or more Structure-of-Arrays (SOA) destination matrices, wherein the specified opcode calls for unpacking elements of the specified AOS source matrix into the specified SOA destination matrices, the AOS source matrix is to contain N structures each containing K elements of different types, with same-typed elements in consecutive structures separated by a stride, and the SOA destination matrices together contain K segregated groups, each containing N same-typed elements; decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, to unpack each element of the specified AOS matrix into one of the K element types of the one or more SOA matrices.
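A software reference for what the execution circuitry computes: N structures of K elements laid out contiguously (AOS) are rearranged into K groups of N same-typed elements (SOA). Element types are collapsed to int for brevity, and unpackAosToSoa is an illustrative name rather than the instruction itself.

#include <cstddef>
#include <vector>

// Unpack an Array-of-Structures buffer of n structures, k elements each, into
// k Structure-of-Arrays groups of n elements; same-typed elements in the AOS
// buffer are separated by a stride of k.
std::vector<std::vector<int>> unpackAosToSoa(const std::vector<int>& aos,
                                             std::size_t n, std::size_t k) {
    std::vector<std::vector<int>> soa(k, std::vector<int>(n));
    for (std::size_t s = 0; s < n; ++s)          // structure index
        for (std::size_t e = 0; e < k; ++e)      // element (type) index
            soa[e][s] = aos[s * k + e];
    return soa;
}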