G06F12/0875

STREAMING ENGINE WITH STREAM METADATA SAVING FOR CONTEXT SWITCHING
20230004391 · 2023-01-05 ·

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A steam head register stores data elements next to be supplied to functional units for use as operands. Stream metadata is stored in response to a stream store instruction. Stored stream metadata is restored to the stream engine in response to a stream restore instruction. An interrupt changes an open stream to a frozen state discarding stored stream data. A return from interrupt changes a frozen stream to an active state.

TECHNOLOGY FOR EARLY ABORT OF COMPRESSION ACCELERATION

An integrated circuit includes a compression accelerator to process a request from software to compress source data into an output file. The compression accelerator includes early-abort circuitry to provide for early abort of compression operations. In particular, the compression accelerator uses a predetermined sample size to compute an estimated size for a portion of the output file. The sample size specifies how much of the source data is to be analyzed before computing the estimated size. The compression accelerator also determines whether the estimated size reflects an acceptable amount of compression, based on a predetermined early-abort threshold. The compression accelerator aborts the request if the estimated size does not reflect the acceptable amount of compression. The compression accelerator may complete the request if the estimated size reflects the acceptable amount of compression. Other embodiments are described and claimed.

TECHNOLOGY FOR EARLY ABORT OF COMPRESSION ACCELERATION

An integrated circuit includes a compression accelerator to process a request from software to compress source data into an output file. The compression accelerator includes early-abort circuitry to provide for early abort of compression operations. In particular, the compression accelerator uses a predetermined sample size to compute an estimated size for a portion of the output file. The sample size specifies how much of the source data is to be analyzed before computing the estimated size. The compression accelerator also determines whether the estimated size reflects an acceptable amount of compression, based on a predetermined early-abort threshold. The compression accelerator aborts the request if the estimated size does not reflect the acceptable amount of compression. The compression accelerator may complete the request if the estimated size reflects the acceptable amount of compression. Other embodiments are described and claimed.

OPPORTUNISTIC LATE DEPTH TESTING TO PREVENT STALLING FOR OVERLAPPING CACHE LINES

Methods, systems and apparatuses provide for graphics processor technology that determines whether a first cache line allocated for early depth testing overlaps a second cache line allocated for late depth testing, and when the first cache line overlaps the second cache line, switches the first cache line to be allocated for late depth testing, and bypasses an early depth test for the first cache line. The technology can also compare coordinates of the first cache line with the coordinates of the second cache line, where an overlap is determined when coordinates for at least one pixel in the first cache line match coordinates for at least one pixel in the second cache line. Additionally, the technology can also perform early depth testing on each pixel in the first cache line when the first cache line does not overlap any existing cache lines allocated for late depth testing.

CACHE SUPPORT FOR INDIRECT LOADS AND INDIRECT STORES IN GRAPH APPLICATIONS

Techniques for operating on an indirect memory access instruction, where the instruction accesses a memory location via at least one indirect address. A pipeline processes the instruction and a memory operation engine generates a first access to the at least one indirect address and a second access to a target address determined by the at least one indirect address. A cache memory used with the pipeline and the memory operation engine caches pointers. In response to a cache hit when executing the indirect memory access instruction, operations dereference a pointer to obtain the at least one indirect address, not set a cache bit, and return data for the instruction without storing the data in the cache memory; and in response to a cache miss, operations set the cache bit, obtain, and store a cache line for a missed pointer, and return data without storing the data in the cache memory.

CACHE SUPPORT FOR INDIRECT LOADS AND INDIRECT STORES IN GRAPH APPLICATIONS

Techniques for operating on an indirect memory access instruction, where the instruction accesses a memory location via at least one indirect address. A pipeline processes the instruction and a memory operation engine generates a first access to the at least one indirect address and a second access to a target address determined by the at least one indirect address. A cache memory used with the pipeline and the memory operation engine caches pointers. In response to a cache hit when executing the indirect memory access instruction, operations dereference a pointer to obtain the at least one indirect address, not set a cache bit, and return data for the instruction without storing the data in the cache memory; and in response to a cache miss, operations set the cache bit, obtain, and store a cache line for a missed pointer, and return data without storing the data in the cache memory.

PROCESSING DEVICE AND METHOD OF USING A REGISTER CACHE
20220413858 · 2022-12-29 · ·

A processing device is provided which comprises memory, a plurality of registers and a processor. the processor is configured to execute a plurality of portions of a program, allocate a number of the registers per portion of the program such that a number of remaining registers are available as a register cache and transfer data between the number of registers, which are allocated per portion of the program, and the register cache. The processor loads data to the allocated registers to execute a portion of the program, stores data, resulting from execution of the portion, in the register cache, reloads the data in the allocated registers and executes another portion of the program using the data reloaded to the allocated registers and A called function uses the number of allocated registers, which is less than an architectural limit of registers allocated per portion of the program.

MEMORY MANAGEMENT FOR OVERLAP DATA BETWEEN TILES OF NEURAL NETWORKS
20220413752 · 2022-12-29 ·

Techniques for providing an overlap data buffer to store portions of tiles between passes of chained layers of a neural network are described. One accelerator circuit includes one or more processing units to execute instructions corresponding to the chained layers in multiple passes. In a first pass, the processing unit(s) receives a first input tile of an input feature map from a primary buffer and performs a first operation on the first input tile to obtain a first output tile. The processing unit stores the first output tile in the primary buffer and identifies a portion of the first output tile as corresponding to overlap data between tiles of the input feature map. The processing unit stores the portion in a secondary buffer. In a second pass, the processing unit retrieves the portion to avoid fetching the portion that overlaps and computing the overlap data again.

Data sharing system and data sharing method therefor

The application provides an information processing device, system and method. The information processing device mainly includes a storage module and a data processing module, where the storage module is configured to receive and store input data, instruction and output data, and the input data includes one or more key features; the data processing module is configured to identify the key features included in the input data and score the input data in the storage module according to a judgment result. The information processing device, system and method provided by the application automatically scores text, pictures, audio, video, and the like instead of manually scoring, which is more accurate and faster.

Data sharing system and data sharing method therefor

The application provides an information processing device, system and method. The information processing device mainly includes a storage module and a data processing module, where the storage module is configured to receive and store input data, instruction and output data, and the input data includes one or more key features; the data processing module is configured to identify the key features included in the input data and score the input data in the storage module according to a judgment result. The information processing device, system and method provided by the application automatically scores text, pictures, audio, video, and the like instead of manually scoring, which is more accurate and faster.