IPIQ

G06F9/3888

Programmable graphics processor for multithreaded execution of programs

10217184 · 2019-02-26

Nvidia Corporation

A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.

CONVERSION OF UNORM INTEGER VALUES TO FLOATING-POINT VALUES IN LOW POWER

20190042246 · 2019-02-07

Intel Corporation

Methods and apparatus relating to conversion of unsigned normalized (unorm) integer values to floating-point (float) values in low power are described. In an embodiment, conversion logic converts a unorm integer value to a floating-point value based on detection of whether the unorm integer value matches one of three cases, wherein the unorm integer value comprises n bits. Memory stores a count value corresponding to n-1 bits of the unorm integer value after detection of a leading 1 in the unorm integer value. The three cases include: a first case with all zeros, a second case with all ones, and a third case with a combination of one or more zeros and one or more ones. Other embodiments are also disclosed and claimed.
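For reference, the standard numeric mapping from an n-bit unorm integer x to a float is x / (2^n - 1), so the all-zeros and all-ones cases map exactly to 0.0 and 1.0 with no arithmetic. The sketch below, assuming that standard mapping, shows only the three-way case split and the reference result; it does not model the low-power circuit, the leading-1 detection, or the stored count value. The function names are hypothetical.

```python
def classify_unorm(value: int, n: int) -> str:
    """Return which of the three cases an n-bit unorm value falls into."""
    all_ones = (1 << n) - 1
    if not 0 <= value <= all_ones:
        raise ValueError(f"{value} does not fit in {n} bits")
    if value == 0:
        return "all_zeros"
    if value == all_ones:
        return "all_ones"
    return "mixed"


def unorm_to_float(value: int, n: int) -> float:
    """Map an n-bit unorm integer to [0.0, 1.0] as value / (2**n - 1)."""
    case = classify_unorm(value, n)
    if case == "all_zeros":
        return 0.0  # exact result, no division needed
    if case == "all_ones":
        return 1.0  # exact result, no division needed
    # Mixed case: plain division as the reference result. Hardware would
    # avoid a full divider; this sketch does not attempt to model that.
    return value / ((1 << n) - 1)
```

Only the mixed case needs real arithmetic, which is why separating out the two trivial cases first is attractive.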

APPARATUS AND METHOD FOR GANG INVARIANT OPERATION OPTIMIZATIONS

20190042269 · 2019-02-07

An apparatus and method for efficiently processing invariant operations on a parallel execution engine. For example, one embodiment of a processor comprises: a plurality of parallel execution lanes comprising execution circuitry and registers to concurrently execute a plurality of threads; front end circuitry coupled to the plurality of parallel execution lanes, the front end circuitry to arrange the threads into parallel execution groups and schedule operations of the threads to be executed across the parallel execution lanes, wherein the front end circuitry is to dynamically evaluate one or more variables associated with the operations to determine if one or more conditionally invariant operations will be invariant across threads of a parallel execution group and/or across the parallel execution lanes; a scheduler of the front end circuitry to responsively schedule a shared thread upon a determination that a conditionally invariant operation will be invariant across threads of a parallel execution group and/or across the parallel execution lanes.
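The run-time invariance check can be illustrated in software: if every lane of a group presents identical operands for an operation, the result is computed once (the "shared thread" path) and broadcast; otherwise each lane computes its own result. This is a minimal sketch assuming operands arrive as per-lane tuples; `execute_across_lanes` is a hypothetical name and does not model the front end circuitry or scheduler.

```python
from typing import Any, Callable, List, Sequence, Tuple


def execute_across_lanes(
    op: Callable[..., Any], lane_operands: Sequence[Tuple[Any, ...]]
) -> Tuple[List[Any], bool]:
    """Run `op` once per lane, or once in total if operands are lane-invariant.

    Returns the per-lane results and whether the shared path was taken.
    """
    if not lane_operands:
        return [], False
    first = lane_operands[0]
    if all(operands == first for operands in lane_operands):
        # Invariant at run time: compute once and broadcast to every lane.
        shared = op(*first)
        return [shared] * len(lane_operands), True
    # Operands differ: fall back to ordinary per-lane execution.
    return [op(*operands) for operands in lane_operands], False
```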

Processing Circuitry for Encoded Fields of Related Threads

20190034166 · 2019-01-31

Techniques are disclosed relating to performing arithmetic operations to generate values for different related threads. In some embodiments, the threads are graphics threads and the values are operand locations. In some embodiments, an apparatus includes circuitry configured to generate results for multiple threads by performing a plurality of arithmetic operations indicated by an instruction. In some embodiments, the instruction specifies: an input value that is common to the multiple threads and, for at least one of the multiple threads, a type value that indicates whether to generate a result for the thread by performing an arithmetic operation based on a first input that is a result of an arithmetic operation from another thread of the multiple threads or to generate a result for the thread using the input value that is common to the multiple threads. In some embodiments, the circuitry is configured to generate a result for the at least one of the multiple threads by selectively performing the arithmetic operation or using the input value that is common to the multiple threads based on the type value.
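The per-thread type value can be pictured as a selector: a thread either takes the input value common to all threads, or derives its result from another thread's result. The sketch below assumes, for illustration only, that "another thread" is the preceding thread and that the arithmetic operation is adding a fixed stride (as when generating consecutive operand locations). The type names "common" and "chain" and the `stride` parameter are hypothetical.

```python
from typing import List, Sequence


def generate_thread_values(common: int, types: Sequence[str], stride: int) -> List[int]:
    """Compute one value per related thread.

    types[i] == "common": thread i takes the value shared by all threads.
    types[i] == "chain":  thread i adds `stride` to thread i-1's result.
    """
    results: List[int] = []
    for i, kind in enumerate(types):
        if kind == "common":
            results.append(common)
        elif kind == "chain":
            if i == 0:
                raise ValueError("thread 0 has no preceding thread to chain from")
            results.append(results[i - 1] + stride)
        else:
            raise ValueError(f"unknown type value {kind!r}")
    return results
```

For example, `generate_thread_values(100, ["common", "chain", "chain", "common", "chain"], 4)` restarts from the common value at thread 3 and yields `[100, 104, 108, 100, 104]`.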

MONITOR SUPPORT ON ACCELERATED PROCESSING DEVICE

20190034151 · 2019-01-31

Advanced Micro Devices, Inc.

A technique for implementing synchronization monitors on an accelerated processing device (APD) is provided. Work on an APD includes workgroups that include one or more wavefronts. All wavefronts of a workgroup execute on a single compute unit. A monitor is a synchronization construct that allows workgroups to stall until a particular condition is met. Responsive to all wavefronts of a workgroup executing a wait instruction, the monitor coordinator records the workgroup in an entry queue. The workgroup begins saving its state to a general APD memory and, when such saving is complete, the monitor coordinator moves the workgroup to a condition queue. When the condition specified by the wait instruction is met, the monitor coordinator moves the workgroup to a ready queue, and, when sufficient resources are available on a compute unit, the APD schedules the ready workgroup for execution on a compute unit.
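The queue transitions described above (entry queue, then condition queue once state is saved, then ready queue once the condition holds, then dispatch when resources allow) can be sketched as a small state machine. This is an illustrative model only: the class and method names are hypothetical, the condition is passed in as a predicate, and "resources" are reduced to a count of free slots.

```python
from collections import deque
from typing import Callable, Dict, Hashable, List


class MonitorCoordinator:
    """Moves workgroups through entry -> condition -> ready queues."""

    def __init__(self) -> None:
        self.entry: deque = deque()
        self.condition: deque = deque()
        self.ready: deque = deque()
        self._waits: Dict[Hashable, int] = {}

    def on_wavefront_wait(self, wg: Hashable, wavefronts_in_wg: int) -> None:
        # Record the workgroup only once every wavefront has executed the wait.
        self._waits[wg] = self._waits.get(wg, 0) + 1
        if self._waits[wg] == wavefronts_in_wg:
            del self._waits[wg]
            self.entry.append(wg)

    def on_state_saved(self, wg: Hashable) -> None:
        # State saved to general memory: the workgroup now waits on its condition.
        self.entry.remove(wg)
        self.condition.append(wg)

    def on_condition_check(self, is_met: Callable[[Hashable], bool]) -> None:
        # Move every workgroup whose wait condition now holds to the ready queue.
        waiting: deque = deque()
        for wg in self.condition:
            (self.ready if is_met(wg) else waiting).append(wg)
        self.condition = waiting

    def schedule(self, free_slots: int) -> List[Hashable]:
        # Dispatch ready workgroups while compute-unit resources remain.
        dispatched = []
        while self.ready and free_slots > 0:
            dispatched.append(self.ready.popleft())
            free_slots -= 1
        return dispatched
```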

Compiler-based instruction scoreboarding

10191724 · 2019-01-29

Intel Corporation

Methods and apparatus relating to techniques for compiler-based instruction scoreboarding. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to remove unnecessary dependence edges from a data dependency graph, partition the data dependency graph into a plurality of sub-graphs, determine a live range for each of the plurality of sub-graphs, and assign a scoreboard entry to each of the plurality of sub-graphs, wherein sub-graphs which have interfering live ranges are assigned different scoreboard entries. Other embodiments are also disclosed and claimed.
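The final step, assigning scoreboard entries so that sub-graphs with interfering live ranges never share one, is an interval-coloring problem. The sketch below uses a greedy linear-scan style allocation over live ranges, which is a common compiler technique swapped in for illustration; it is not necessarily the claimed method. It assumes the edge removal and partitioning have already produced named sub-graphs with inclusive (first, last) instruction ranges. All names are hypothetical.

```python
from typing import Dict, List, Tuple


def assign_scoreboard_entries(
    live_ranges: Dict[str, Tuple[int, int]], num_entries: int
) -> Dict[str, int]:
    """Give each sub-graph an entry; overlapping live ranges never share one.

    live_ranges maps a sub-graph name to (first, last) instruction index, inclusive.
    """
    assignment: Dict[str, int] = {}
    active: List[Tuple[int, int]] = []  # (last index, entry) still live
    free = list(range(num_entries))
    for name in sorted(live_ranges, key=lambda s: live_ranges[s][0]):
        first, last = live_ranges[name]
        # Release entries whose live range ended before this one starts.
        still_live = []
        for end, entry in active:
            if end < first:
                free.append(entry)
            else:
                still_live.append((end, entry))
        active = still_live
        if not free:
            raise RuntimeError("not enough scoreboard entries")
        free.sort()
        entry = free.pop(0)
        assignment[name] = entry
        active.append((last, entry))
    return assignment
```

With ranges A = (0, 3), B = (1, 5), C = (4, 6) and two entries, A and B overlap and get different entries, while C reuses A's entry once A's range has ended.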

Data processing systems

10176546 · 2019-01-08

Arm Limited

Jorn Nystad

A data processing system determines, for a stream of instructions to be executed, whether there are any instructions that can be re-ordered in the instruction stream, assigns each such instruction to an instruction completion tracker, and includes in the encoding for the instruction an indication of the instruction completion tracker it has been assigned to. For each instruction in the instruction stream, an indication of which instruction completion trackers, if any, the instruction depends on is also provided. Then, when an instruction that is indicated as being dependent on an instruction completion tracker is to be executed, the status of the relevant instruction completion tracker is checked before the instruction is executed.
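One way to picture the tracker mechanism: each re-orderable instruction carries the index of the tracker it signals, each instruction carries the set of trackers it depends on, and the executor checks those trackers before issuing it. The sketch assumes, as a hypothetical choice, that a tracker is a counter of outstanding instructions and is "clear" when the count reaches zero; the class names are also hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Instruction:
    name: str
    tracker: Optional[int] = None                      # tracker this instruction signals
    depends_on: Set[int] = field(default_factory=set)  # trackers it must wait for


class CompletionTrackers:
    """Counts outstanding re-orderable instructions per tracker."""

    def __init__(self, count: int) -> None:
        self.outstanding: List[int] = [0] * count

    def issue(self, instr: Instruction) -> None:
        if instr.tracker is not None:
            self.outstanding[instr.tracker] += 1

    def complete(self, instr: Instruction) -> None:
        if instr.tracker is not None:
            self.outstanding[instr.tracker] -= 1

    def may_execute(self, instr: Instruction) -> bool:
        # Check the status of every tracker the instruction depends on.
        return all(self.outstanding[t] == 0 for t in instr.depends_on)
```

For example, a load assigned to tracker 0 keeps a dependent add (with `depends_on={0}`) from executing until the load completes, while instructions with no dependencies may proceed immediately.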

REGISTER PARTITION AND PROTECTION FOR VIRTUALIZED PROCESSING DEVICE

20190004840 · 2019-01-03

Ati Technologies Ulc

A register protection mechanism for a virtualized accelerated processing device (APD) is disclosed. The mechanism protects registers of the accelerated processing device designated as physical-function-or-virtual-function registers (PF-or-VF* registers), which are single architectural instance registers that are shared among different functions that share the APD in a virtualization scheme whereby each function can maintain a different value in these registers. The protection mechanism for these registers comprises comparing the function associated with the memory address specified by a particular register access request to the currently active function for the APD and disallowing the register access request if a match does not occur.
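The protection check reduces to comparing the function that owns the addressed register with the currently active function. The sketch below assumes a hypothetical address layout in which each function (PF, VF0, VF1, ...) has its own fixed-size aperture of register addresses, and keeps a separate value per function so each can hold a different value in the shared register. Disallowed writes return False and disallowed reads return None; all names are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


class PfOrVfRegisters:
    """Single architectural register set whose value is kept per function."""

    def __init__(self, functions: Sequence[str], aperture_size: int) -> None:
        self.functions = list(functions)
        self.aperture_size = aperture_size
        self.active = self.functions[0]
        self._values: Dict[Tuple[str, int], int] = {}

    def _decode(self, addr: int) -> Tuple[str, int]:
        # Hypothetical layout: one aperture of register addresses per function.
        index, offset = divmod(addr, self.aperture_size)
        if index >= len(self.functions):
            raise ValueError(f"address {addr:#x} is outside every aperture")
        return self.functions[index], offset

    def write(self, addr: int, value: int) -> bool:
        function, offset = self._decode(addr)
        if function != self.active:
            return False  # disallowed: request targets a non-active function
        self._values[(function, offset)] = value
        return True

    def read(self, addr: int) -> Optional[int]:
        function, offset = self._decode(addr)
        if function != self.active:
            return None  # disallowed
        return self._values.get((function, offset), 0)
```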

Page faulting and selective preemption

12067641 · 2024-08-20

Intel Corporation

One embodiment provides a parallel processor comprising a memory interface and a processing array coupled with the memory interface. The processing array, which includes multiple compute blocks, is configured to address memory accessed via the memory interface via a virtual address mapping and includes circuitry to resolve a page fault for the virtual address mapping, wherein each of the multiple compute blocks is separately preemptable.

Graphics processing

12067668 · 2024-08-20

Arm Limited

There is provided an instruction, or instructions, that can be included in a program to perform a ray tracing operation, with individual execution threads in a group of execution threads executing the program performing the ray tracing operation for a respective ray in a corresponding group of rays, such that the group of rays performs the ray tracing operation together. The instruction(s), when executed by the execution threads, cause one or more rays from the group of plural rays to be tested for intersection with a set of primitives. A result of the ray-primitive intersection testing can then be returned for the traversal operation.
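As a software analogue, the group operation can be modeled as one loop iteration per ray (standing in for one execution thread per ray), each testing its ray against the same set of triangle primitives and returning the nearest hit. The per-triangle test below is the well-known Moller-Trumbore algorithm, swapped in for illustration; the source does not say which intersection test is used. Function names and the (distance, primitive index) result format are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def intersect_triangle(origin: Vec, direction: Vec, tri: Tuple[Vec, Vec, Vec],
                       eps: float = 1e-9) -> Optional[float]:
    """Moller-Trumbore test: return hit distance t along the ray, or None."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None  # ignore hits behind the origin


def trace_group(rays: Sequence[Tuple[Vec, Vec]],
                primitives: Sequence[Tuple[Vec, Vec, Vec]]) -> List[Optional[Tuple[float, int]]]:
    """For each ray in the group, return (nearest t, primitive index) or None."""
    results: List[Optional[Tuple[float, int]]] = []
    for origin, direction in rays:  # one iteration per ray, like one thread per ray
        best: Optional[Tuple[float, int]] = None
        for index, tri in enumerate(primitives):
            t = intersect_triangle(origin, direction, tri)
            if t is not None and (best is None or t < best[0]):
                best = (t, index)
        results.append(best)
    return results
```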
