G06T1/20

PERFORMING MULTIPLE POINT TABLE LOOKUPS IN A SINGLE CYCLE IN A SYSTEM ON CHIP

In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two-point and two-by-two-point lookups, and per-memory-bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce the programming complexity of the VPU and the DMA system. The DMA system and VPU may execute a VPU configuration mode that allows the VPU and DMA system to operate without a processing controller when performing dynamic region-based data movement operations.
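The abstract names two-point and two-by-two-point lookups but does not describe the memory layout that lets them complete in a single cycle. The Python sketch below emulates the idea under an assumed layout in which each table entry also stores its right-hand neighbor (or its full 2x2 neighborhood), so one read returns every point a linear or bilinear interpolation needs. The function names and the duplicated-neighbor layout are hypothetical illustrations; actual hardware may achieve the same effect with memory banking rather than duplication.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]
Quad = Tuple[float, float, float, float]


def build_paired_table(table: Sequence[float]) -> List[Pair]:
    # Hypothetical layout: entry i stores (T[i], T[i + 1]) so one read
    # returns both points of a two-point lookup. The last entry repeats
    # the final value so every index has a right-hand neighbor.
    padded = list(table) + [table[-1]]
    return [(padded[i], padded[i + 1]) for i in range(len(table))]


def build_quad_table(grid: Sequence[Sequence[float]]) -> List[List[Quad]]:
    # Hypothetical 2x2 layout: entry (r, c) stores the edge-clamped
    # neighborhood (G[r][c], G[r][c + 1], G[r + 1][c], G[r + 1][c + 1]).
    rows, cols = len(grid), len(grid[0])

    def at(r: int, c: int) -> float:
        return grid[min(r, rows - 1)][min(c, cols - 1)]

    return [
        [(at(r, c), at(r, c + 1), at(r + 1, c), at(r + 1, c + 1)) for c in range(cols)]
        for r in range(rows)
    ]


def lerp_lookup(paired: List[Pair], x: float) -> float:
    # Linear interpolation from a single two-point read (assumes 0 <= x < len).
    i = int(x)
    lo, hi = paired[i]
    return lo + (x - i) * (hi - lo)


def bilinear_lookup(quad: List[List[Quad]], y: float, x: float) -> float:
    # Bilinear interpolation from a single 2x2-point read.
    r, c = int(y), int(x)
    fy, fx = y - r, x - c
    p00, p01, p10, p11 = quad[r][c]
    top = p00 + fx * (p01 - p00)
    bottom = p10 + fx * (p11 - p10)
    return top + fy * (bottom - top)


print(lerp_lookup(build_paired_table([0.0, 10.0, 20.0, 30.0]), 1.5))  # 15.0
print(bilinear_lookup(build_quad_table([[0.0, 1.0], [2.0, 3.0]]), 0.5, 0.5))  # 1.5
```

The trade-off in this assumed layout is memory: the paired table doubles storage and the 2x2 table quadruples it, in exchange for one access per lookup.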

Logical Slot to Hardware Slot Mapping for Graphics Processors

Disclosed techniques relate to work distribution in graphics processors. In some embodiments, an apparatus includes circuitry that implements a plurality of logical slots and a set of graphics processor sub-units that each implement multiple distributed hardware slots. The circuitry may determine different distribution rules for first and second sets of graphics work and map logical slots to distributed hardware slots based on the distribution rules. In various embodiments, disclosed techniques may advantageously distribute work efficiently across distributed shader processors for graphics kicks of various sizes.
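The abstract does not state the distribution rules themselves. As a minimal sketch, assume one rule for small kicks (back a logical slot with a single hardware slot on one sub-unit) and another for large kicks (back it with one hardware slot on every sub-unit that has capacity). The class `SlotMapper`, the `small_kick_threshold` cutoff, and the first-free slot policy are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

HwSlot = Tuple[int, int]  # (sub-unit index, hardware slot index in that sub-unit)


@dataclass
class SlotMapper:
    num_subunits: int
    slots_per_subunit: int
    small_kick_threshold: int  # hypothetical cutoff between the two rules
    busy: Set[HwSlot] = field(default_factory=set)
    mapping: Dict[int, List[HwSlot]] = field(default_factory=dict)

    def choose_rule(self, kick_size: int) -> str:
        # Small kicks stay on one sub-unit; large kicks spread across all.
        return "single" if kick_size <= self.small_kick_threshold else "all"

    def _free_slot(self, subunit: int) -> Optional[HwSlot]:
        # First unoccupied hardware slot on this sub-unit, if any.
        for s in range(self.slots_per_subunit):
            if (subunit, s) not in self.busy:
                return (subunit, s)
        return None

    def map_logical_slot(self, logical_slot: int, kick_size: int) -> List[HwSlot]:
        # One candidate hardware slot per sub-unit that still has capacity.
        free = [self._free_slot(u) for u in range(self.num_subunits)]
        candidates = [s for s in free if s is not None]
        if not candidates:
            raise RuntimeError("no free hardware slots")
        chosen = candidates[:1] if self.choose_rule(kick_size) == "single" else candidates
        self.busy.update(chosen)
        self.mapping[logical_slot] = chosen
        return chosen

    def release(self, logical_slot: int) -> None:
        # Free every hardware slot that backed this logical slot.
        for slot in self.mapping.pop(logical_slot, []):
            self.busy.discard(slot)


mapper = SlotMapper(num_subunits=4, slots_per_subunit=2, small_kick_threshold=64)
print(mapper.map_logical_slot(0, kick_size=16))    # [(0, 0)]
print(mapper.map_logical_slot(1, kick_size=1024))  # [(0, 1), (1, 0), (2, 0), (3, 0)]
```

Keeping small kicks on one sub-unit avoids paying cross-unit distribution overhead for little work, while large kicks benefit from using every shader sub-unit.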

Affinity-based Graphics Scheduling

Techniques are disclosed relating to affinity-based scheduling of graphics work. In disclosed embodiments, first and second groups of graphics processor sub-units may share respective first and second caches. Distribution circuitry may receive a software-specified set of graphics work and a software-indicated mapping of portions of the set of graphics work to groups of graphics processor sub-units. The distribution circuitry may assign subsets of the set of graphics work based on the mapping. This may improve cache efficiency, in some embodiments, by allowing graphics work that accesses the same memory areas to be assigned to the same group of sub-units that share a cache.
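A minimal Python sketch of the assignment step, assuming software supplies a dictionary from each work item to a group id and that every item has an entry. Within a group, items are spread round-robin over the sub-units that share that group's cache. The function name, the data shapes, and the round-robin policy are hypothetical illustrations, not the disclosed circuitry.

```python
from collections import defaultdict
from typing import Dict, List


def assign_with_affinity(
    work_items: List[str],
    affinity: Dict[str, int],
    groups: Dict[int, List[str]],
) -> Dict[str, List[str]]:
    # affinity: software-indicated mapping of work item -> group id.
    # groups: group id -> sub-units that share one cache.
    assignment: Dict[str, List[str]] = defaultdict(list)
    next_index = {g: 0 for g in groups}
    for item in work_items:
        g = affinity[item]
        members = groups[g]
        # Round-robin within the group keeps the item on its group's cache.
        sub_unit = members[next_index[g] % len(members)]
        next_index[g] += 1
        assignment[sub_unit].append(item)
    return dict(assignment)


groups = {0: ["su0", "su1"], 1: ["su2", "su3"]}
affinity = {"tile_a": 0, "tile_b": 0, "tile_c": 1, "tile_d": 0}
print(assign_with_affinity(["tile_a", "tile_b", "tile_c", "tile_d"], affinity, groups))
# {'su0': ['tile_a', 'tile_d'], 'su1': ['tile_b'], 'su2': ['tile_c']}
```

Here `tile_a`, `tile_b`, and `tile_d` stand for work that touches the same memory region, so software maps them to group 0 and they all land on sub-units sharing one cache.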

METHOD AND APPARATUS FOR OPERATING IMAGE DATA

The disclosure relates to a method and apparatus for operating on image data. The method includes: reading matrix data from the image data based on a matrix size, M rows and N columns, of an image operator (220); calculating column data in the matrix data with a single calculation instruction corresponding to the image operator, to obtain an intermediate calculation result (240); multiplexing and rearranging the intermediate calculation result into N rows of cached data (260); calculating matrix elements of a target column in the N rows of cached data with the single calculation instruction, to obtain a calculation result of the matrix data under the single calculation instruction (280); and outputting the calculation result as the image processing result of the image operator for the matrix data (300).
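One way to read steps (220) to (300) is as a two-pass sliding-window reduction: reduce each column of an M-row block, lay the per-column results out as N shifted rows, then reduce down those rows to get each output. The Python sketch below follows that interpretation under the assumption that the "single calculation instruction" is an associative binary operation such as `max` (dilation) or addition (box sum). The function `apply_operator` is hypothetical, and the per-element Python loops stand in for what would be SIMD instructions.

```python
import operator
from functools import reduce
from typing import Callable, List

Op = Callable[[float, float], float]


def apply_operator(image: List[List[float]], m: int, n: int, op: Op) -> List[List[float]]:
    # Sliding M x N window over the valid region; `op` must be associative
    # (e.g. max for dilation, operator.add for a box sum).
    height, width = len(image), len(image[0])
    out_width = width - n + 1
    result = []
    for top in range(height - m + 1):
        # (220) Read M rows of matrix data.
        block = image[top:top + m]
        # (240) Reduce each column with the single op -> intermediate result.
        column_results = [reduce(op, (block[r][c] for r in range(m))) for c in range(width)]
        # (260) Rearrange into N cached rows; row k is shifted left by k.
        cached = [column_results[k:k + out_width] for k in range(n)]
        # (280) Reduce each target column across the N cached rows.
        result.append([reduce(op, (cached[k][j] for k in range(n))) for j in range(out_width)])
    # (300) Output the per-window results.
    return result


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(apply_operator(image, 2, 2, max))           # [[5, 6], [8, 9]]
print(apply_operator(image, 2, 2, operator.add))  # [[12, 16], [24, 28]]
```

Each column result is computed once and reused by N neighboring windows, which is where the savings over reducing every M x N window from scratch come from.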

GRAPHICS PROCESSOR SWITCHING BASED ON COUPLED DISPLAY DEVICES

In one example in accordance with the present disclosure, a computing device is described. The computing device includes a number of ports, each of which receives a connection to a display device. A first port is coupled to a first graphics processor, which supports a number of display devices, and to a second graphics processor. The computing device also includes a controller. The controller determines when the number of coupled display devices is greater than the number of display devices supported by the first graphics processor and, in that case, switches the first port from being driven by the first graphics processor to being driven by the second graphics processor.
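The controller's decision can be sketched in a few lines of Python. The class `PortController` and its method are hypothetical; the abstract only describes switching the first port to the second graphics processor when the display count exceeds what the first one supports, so the switch back when the count drops again is an added assumption.

```python
from dataclasses import dataclass


@dataclass
class PortController:
    first_gpu_max_displays: int  # displays the first GPU can drive
    first_port_driver: str = "first_gpu"

    def on_display_count_changed(self, coupled_displays: int) -> str:
        if coupled_displays > self.first_gpu_max_displays:
            # Too many displays for the first GPU: hand the first port to the second GPU.
            self.first_port_driver = "second_gpu"
        else:
            # Assumption (not in the abstract): switch back once the count fits again.
            self.first_port_driver = "first_gpu"
        return self.first_port_driver


controller = PortController(first_gpu_max_displays=3)
print(controller.on_display_count_changed(2))  # first_gpu
print(controller.on_display_count_changed(4))  # second_gpu
```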

METHOD AND DEVICE FOR LATENCY REDUCTION OF AN IMAGE PROCESSING PIPELINE
20230053205 · 2023-02-16

In some implementations, a method includes: determining a complexity value for first image data associated with a physical environment that corresponds to a first time period; determining an estimated composite setup time based on the complexity value for the first image data and on virtual content for compositing with the first image data; and, in accordance with a determination that the estimated composite setup time exceeds a threshold time: forgoing rendering the virtual content from the perspective that corresponds to the camera pose of the device relative to the physical environment during the first time period; and compositing a previous render of the virtual content for a previous time period with the first image data to generate the graphical environment for the first time period.
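A minimal Python sketch of the decision described above. The linear cost model in `estimate_setup_time_ms`, the string stand-ins for images and renders, and the rule that a fresh render is made whenever no previous render exists are all hypothetical; the abstract specifies neither the estimator nor the first-frame behavior.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class FrameInput:
    time_period: int
    image: str         # stand-in for camera image data
    complexity: float  # complexity value for this image data


def estimate_setup_time_ms(complexity: float, content_cost_ms: float) -> float:
    # Hypothetical linear model; the abstract does not specify one.
    return 0.5 * complexity + content_cost_ms


def compose(
    frame: FrameInput,
    content_cost_ms: float,
    threshold_ms: float,
    render_fn: Callable[[int], str],
    previous_render: Optional[str],
) -> Tuple[str, str]:
    # Returns (composited frame, render that was used).
    estimate = estimate_setup_time_ms(frame.complexity, content_cost_ms)
    if estimate > threshold_ms and previous_render is not None:
        # Forgo rendering from the current camera pose; reuse the last render.
        used = previous_render
    else:
        used = render_fn(frame.time_period)
    return f"{frame.image}+{used}", used


def render_content(time_period: int) -> str:
    # Stand-in for rendering virtual content from the current camera pose.
    return f"render{time_period}"


print(compose(FrameInput(1, "cam1", 10.0), 2.0, 16.0, render_content, "render0"))
# ('cam1+render1', 'render1')
print(compose(FrameInput(2, "cam2", 40.0), 2.0, 16.0, render_content, "render1"))
# ('cam2+render1', 'render1')
```

In the second call the estimate (22 ms) exceeds the 16 ms threshold, so the frame is composited with the render from the previous time period instead of waiting for a new one.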