Patent classifications
G06T2210/52
NOISE SYNTHESIS FOR DIGITAL IMAGES
Apparatus and methods for providing software and hardware based solutions to the problem of synthesizing noise for a digital image. According to one aspect, a probability image is generated and noise blocks are randomly placed at locations in the probability image where the locations have probability values that are compared to a threshold criterion, creating a synthesized noise image. Embodiments include generating synthesized film grain images and synthesized digital camera noise images.
Compute cluster preemption within a general-purpose graphics processing unit
Embodiments described herein provide techniques enable a graphics processor to continue processing operations during the reset of a compute unit that has experienced a hardware fault. Threads and associated context state for a faulted compute unit can be migrated to another compute unit of the graphics processor and the faulting compute unit can be reset while processing operations continue.
Position-based rendering apparatus and method for multi-die/GPU graphics processing
Position-based rendering apparatus and method for multi-die/GPU graphics processing. For example, one embodiment of a method comprises: distributing a plurality of graphics draws to a plurality of graphics processors; performing position-only shading using vertex data associated with tiles of a first draw on a first graphics processor, the first graphics processor responsively generating visibility data for each of the tiles; distributing subsets of the visibility data associated with different subsets of the tiles to different graphics processors; limiting geometry work to be performed on each tile by each graphics processor using the visibility data, each graphics processor to responsively generate rendered tiles; and wherein the rendered tiles are combined to generate a complete image frame.
PROCESSING WORK ITEMS IN PROCESSING LOGIC
A plurality of work items are processed through a processing pipeline comprising a plurality of stages in processing logic. The processing of a work item includes: (i) reading data in accordance with a memory address associated with the work item, (ii) updating the read data, and (iii) writing the updated data in accordance with the memory address associated with the work item. The method includes processing a first work item and a second work item through the processing pipeline, wherein the processing of the first work item through the pipeline is initiated earlier than the processing of the second work item, and where it is determined that the first and second work items are associated with the same memory address, first updated data of the first work item is written to a register in the processing logic, and the processing of the second work item comprises reading the first updated data from the register instead of reading data from the memory.
FULLY-FUSED NEURAL NETWORK EXECUTION
A fully-connected neural network may be configured for execution by a processor as a fully-fused neural network by limiting slow global memory accesses to reading and writing inputs to and outputs from the fully-connected neural network. The computational cost of fully-connected neural networks scale quadratically with its width, whereas its memory traffic scales linearly. Modern graphics processing units typically have much greater computational throughput compared with memory bandwidth, so that for narrow, fully-connected neural networks, the linear memory traffic is the bottleneck. The key to improving performance of the fully-connected neural network is to minimize traffic to slow “global” memory (off-chip memory and high-level caches) and to fully utilize fast on-chip memory (low-level caches, “shared” memory, and registers), which is achieved by the fully-fused approach. A real-time neural radiance caching technique for path-traced global illumination is implemented using the fully-fused neural network for caching scattered radiance components of global illumination.
Accelerated processing via a physically based rendering engine
One embodiment of a computer-implemented method for processing ray tracing operations in parallel includes receiving a plurality of rays and a corresponding set of importance sampling instructions for each ray included in the plurality of rays for processing, wherein each ray represents a path from a light source to at least one point within a three-dimensional (3D) environment, and each corresponding set of importance sampling instruction is based at least in part on one or more material properties associated with at least one surface of at least one object included in the 3D environment; assigning each ray included in the plurality of rays to a different processing core included in a plurality of processing cores; and for each ray included in the plurality of rays, causing the processing core assigned to the ray to execute the corresponding set of importance sampling instructions on the ray to generate a direction for a secondary ray that is produced when the ray intersects a surface of an object within the 3D environment.
Multistage collector for outputs in multiprocessor systems
Aspects include a multistage collector to receive outputs from plural processing elements. Processing elements may comprise (each or collectively) a plurality of clusters, with one or more ALUs that may perform SIMD operations on a data vector and produce outputs according to the instruction stream being used to configure the ALU(s). The multistage collector includes substituent components each with at least one input queue, a memory, a packing unit, and an output queue; these components can be sized to process groups of input elements of a given size, and can have multiple input queues and a single output queue. Some components couple to receive outputs from the ALUs and others receive outputs from other components. Ultimately, the multistage collector can output groupings of input elements. Each grouping of elements (e.g., at input queues, or stored in the memories of component) can be formed based on matching of index elements.
Method and apparatus for providing 360 stitching workflow and parameter
Disclosed herein is a method of creating an image stitching workflow including acquiring 360-degree virtual reality (VR) image parameters necessary to makes a request for image stitching and create the image stitching workflow, acquiring a list of functions applicable to the image stitching workflow, creating the image stitching workflow based on functions selected from the list of functions, determining the number of media processing entities necessary to perform tasks configuring the image stitching workflow and generating a plurality of media processing entities according to the determined number of media processing entities, and allocating the tasks configuring the image stitching workflow to the plurality of media processing entities.
Data structures, methods and tiling engines for storing tiling information in a graphics processing system
Data structures, methods and tiling engines for storing tiling data in memory wherein the tiles are grouped into tile groups and the primitives are grouped into primitive blocks. The methods include, for each tile group: determining, for each tile in the tile group, which primitives of each primitive block intersect that tile; storing in memory a variable length control data block for each primitive block that comprises at least one primitive that intersects at least one tile of the tile group; and storing in memory a control stream comprising a fixed sized primitive block entry for each primitive block that comprises at least one primitive that intersects at least one tile of the tile group, each primitive block entry identifying a location in memory of the control data block for the corresponding primitive block. Each primitive block entry may comprise valid tile information identifying which tiles of the tile group are valid for the corresponding primitive block. A tile is a valid tile for a primitive block if at least one primitive in the primitive block intersects that tile.
GRAPHICS PROCESSING
An instruction (or set of instructions) that can be included in a program to perform a ray tracing acceleration data structure traversal, with individual execution threads in a group of execution threads executing the program performing a traversal operation for a respective ray in a corresponding group of rays such that the group of rays performing the traversal operation together. The instruction(s), when executed by the execution threads in respect of a node of the ray tracing acceleration data structure, cause one or more rays from the group of plural rays that are performing the traversal operation together to be tested for intersection with the one or more volumes associated with the node being tested. A result of the ray-volume intersection testing can then be returned for the traversal operation.