Patent classifications
G06T1/60
PERFORMING MULTIPLE POINT TABLE LOOKUPS IN A SINGLE CYCLE IN A SYSTEM ON CHIP
In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Logical Slot to Hardware Slot Mapping for Graphics Processors
Disclosed techniques relate to work distribution in graphics processors. In some embodiments, an apparatus includes circuitry that implements a plurality of logical slots and a set of graphics processor sub-units that each implement multiple distributed hardware slots. The circuitry may determine different distribution rules for first and second sets of graphics work and map logical slots to distributed hardware slots based on the distribution rules. In various embodiments, disclosed techniques may advantageously distribute work efficiently across distributed shader processors for graphics kicks of various sizes.
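The mapping step can be pictured with a small software model. This is only a sketch under stated assumptions: the rule names `single_subunit` and `all_subunits`, the size threshold, and the free-slot bookkeeping are hypothetical and are not taken from the abstract, which describes circuitry rather than software.

```python
def choose_rule(work_size, small_threshold):
    """Pick a distribution rule for a set of graphics work (hypothetical policy)."""
    return "single_subunit" if work_size <= small_threshold else "all_subunits"


def map_logical_slot(free_slots, rule):
    """Map one logical slot to distributed hardware slots.

    free_slots: dict of sub-unit id -> list of free hardware slot indices.
    Returns a list of (sub-unit, hardware slot) pairs, or None if the rule
    cannot be satisfied right now. Chosen slots are removed from free_slots.
    """
    if rule == "single_subunit":
        # Small work: occupy one hardware slot in the least-loaded sub-unit.
        candidates = [s for s in free_slots if free_slots[s]]
        if not candidates:
            return None
        subunit = max(candidates, key=lambda s: len(free_slots[s]))
        return [(subunit, free_slots[subunit].pop(0))]
    if rule == "all_subunits":
        # Large work: occupy one hardware slot in every sub-unit.
        if not all(free_slots.values()):
            return None
        return [(s, free_slots[s].pop(0)) for s in free_slots]
    raise ValueError(f"unknown distribution rule: {rule}")
```

In this model a small kick leaves the other sub-units free for unrelated work, while a large kick is spread across every sub-unit.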
METHOD AND APPARATUS FOR OPERATING IMAGE DATA
The disclosure relates to a method and an apparatus for operating image data. The method includes: reading matrix data from the image data based on a matrix size, M rows and N columns, of an image operator (220); calculating column data in the matrix data with a single calculation instruction corresponding to the image operator, to obtain an intermediate calculation result (240); multiplexing and rearranging the intermediate calculation result into N rows of cached data (260); calculating matrix elements of a target column in the N rows of cached data with the single calculation instruction, to obtain a calculation result of the matrix data under the single calculation instruction (280); and outputting the calculation result as an image processing result of the matrix data by the image operator (300).
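One way to read these steps is as a two-pass reduction over each M x N window. The sketch below assumes that the single calculation instruction is an associative binary operation such as addition or maximum, and that "rearranging into N rows" means building N shifted copies of the intermediate row. The function name `apply_operator` and both of those interpretations are assumptions, not details from the abstract.

```python
from functools import reduce
import operator


def apply_operator(image, m, n, op):
    """Apply `op` over every m x n window of `image` in two passes."""
    height, width = len(image), len(image[0])
    out_w = width - n + 1
    result = []
    for top in range(height - m + 1):
        # Column pass: reduce m vertically adjacent values in each column.
        intermediate = [
            reduce(op, (image[top + k][c] for k in range(m)))
            for c in range(width)
        ]
        # Rearrange the intermediate row into n shifted rows of cached data.
        cached = [intermediate[k:k + out_w] for k in range(n)]
        # Apply the same operation down each column of the cached rows.
        result.append([
            reduce(op, (cached[k][c] for k in range(n)))
            for c in range(out_w)
        ])
    return result


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
window_sums = apply_operator(image, 2, 2, operator.add)  # [[12, 16], [24, 28]]
window_max = apply_operator(image, 2, 2, max)            # [[5, 6], [8, 9]]
```

Because both passes use the same operation, a vector unit could in principle issue one instruction type for the whole operator, which appears to be the point of the rearrangement step.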
SUPER RESOLUTION USING CONVOLUTIONAL NEURAL NETWORK
An apparatus for super resolution imaging includes a convolutional neural network (104) to receive a low resolution frame (102) and generate a high resolution illuminance component frame. The apparatus also includes a hardware scaler (106) to receive the low resolution frame (102) and generate a second high resolution chrominance component frame. The apparatus further includes a combiner (108) to combine the high resolution illuminance component frame and the high resolution chrominance component frame to generate a high resolution frame (110).
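The data flow between the three components can be sketched as follows. This is an illustration only: it assumes the frame is stored as separate Y, U, and V planes, it uses nearest-neighbour upscaling as a stand-in for both the hardware scaler and the convolutional neural network, and the names `upscale_nearest` and `super_resolve` are hypothetical.

```python
def upscale_nearest(plane, factor):
    """Nearest-neighbour upscaling of a 2D plane by an integer factor."""
    return [
        [value for value in row for _ in range(factor)]
        for row in plane
        for _ in range(factor)
    ]


def super_resolve(frame, factor, luma_model=None):
    """Combine a model-upscaled Y plane with scaler-upscaled chroma planes.

    frame: dict with "y", "u", "v" planes (assumed layout).
    luma_model: callable (plane, factor) -> plane standing in for the CNN;
    defaults to nearest-neighbour so the sketch runs without a trained model.
    """
    luma_model = luma_model or upscale_nearest
    return {
        "y": luma_model(frame["y"], factor),      # CNN path
        "u": upscale_nearest(frame["u"], factor),  # scaler path
        "v": upscale_nearest(frame["v"], factor),  # scaler path
    }
```

The split reflects the abstract's structure: only the illuminance component goes through the network, while the chrominance components take the cheaper scaler path before the combiner merges them.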
Kickslot Manager Circuitry for Graphics Processors
Disclosed embodiments relate to controlling sets of graphics work (e.g., kicks) assigned to graphics processor circuitry. In some embodiments, tracking slot circuitry implements entries for multiple tracking slots. Slot manager circuitry may store, using an entry of the tracking slot circuitry, software-specified information for a set of graphics work, where the information includes: type of work, dependencies on other sets of graphics work, and location of data for the set of graphics work. The slot manager circuitry may prefetch, from the location and prior to allocating shader core resources for the set of graphics work, configuration register data for the set of graphics work. Control circuitry may program configuration registers for the set of graphics work using the prefetched data and initiate processing of the set of graphics work by the graphics processor circuitry according to the dependencies. Disclosed techniques may reduce kick-to-kick transition time, in some embodiments.
Viewpoint dependent brick selection for fast volumetric reconstruction
A method of culling parts of a 3D reconstruction volume is provided. The method makes fresh, accurate, and comprehensive 3D reconstruction data available to a wide variety of mobile XR applications with low usage of computational resources and storage space. The method includes culling parts of the 3D reconstruction volume against a depth image. The depth image has a plurality of pixels, each of which represents a distance to a surface in a scene. In some embodiments, the method includes culling parts of the 3D reconstruction volume against a frustum. The frustum is derived from the field of view of an image sensor from which the image data used to create the 3D reconstruction is obtained.
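The two culling tests can be sketched with a pinhole camera model. Everything specific here is an assumption made for illustration: bricks are reduced to their center points in camera coordinates, the intrinsics `fx`, `fy`, `cx`, `cy` and the depth `band` are hypothetical parameters, and the rule of keeping only bricks near the observed surface is one plausible reading of "culling against a depth image".

```python
def select_bricks(bricks, depth_image, fx, fy, cx, cy, band):
    """Return names of bricks that survive frustum and depth-image culling.

    bricks: dict of name -> (x, y, z) brick center in camera coordinates.
    depth_image: 2D list where each pixel is the distance to a surface.
    band: maximum allowed gap between brick depth and observed surface depth.
    """
    height, width = len(depth_image), len(depth_image[0])
    kept = []
    for name, (x, y, z) in bricks.items():
        if z <= 0:
            continue  # behind the camera, so outside the frustum
        # Project the brick center onto the image plane.
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if not (0 <= u < width and 0 <= v < height):
            continue  # outside the field of view
        if abs(depth_image[v][u] - z) > band:
            continue  # too far in front of or behind the observed surface
        kept.append(name)
    return kept
```

Only the surviving bricks would then need to be updated from the new depth image, which is where the savings in computation and storage would come from.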