G06F8/4442

ORDERING OF SHADER CODE EXECUTION

20210374895 · 2021-12-02 ·

Examples described herein relate to a graphics processing apparatus that includes a memory device and a graphics processing unit (GPU). In some examples, the GPU is configured to execute a shader program that is to identify at least two code blocks that are independent from each other and cause execution of an unexecuted independent code block with available data based on use of a scoreboard to track data availability for independent code blocks. In some examples, execution of the shader program is to cause the GPU to select a first code block identifier for tracking completion of a dependency of the first independent code block. In some examples, execution of the shader program is to cause the GPU to identify an offset to a first instruction position in a sequence of instructions of the first independent code block in an instruction queue.
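The scoreboard idea in this abstract can be illustrated with a small software model. This is only a sketch under assumptions: the names `run_independent_blocks`, `blocks`, and `arrival_order` are hypothetical, and the abstract describes a GPU mechanism, not this Python loop. Each independent code block has an identifier, the scoreboard records which data it still waits on, and any unexecuted block whose data is available runs next.

```python
def run_independent_blocks(blocks, arrival_order):
    """Model of scoreboard-driven execution (hypothetical, for illustration).

    blocks: dict mapping a code block identifier to the set of data items
            that block waits on.
    arrival_order: list of data items in the order they become available.
    Returns the block identifiers in the order the blocks execute.
    """
    # Scoreboard: outstanding dependencies per unexecuted block.
    scoreboard = {bid: set(deps) for bid, deps in blocks.items()}
    executed = []

    def drain():
        # Execute every unexecuted block whose data is fully available.
        for bid in list(scoreboard):
            if not scoreboard[bid]:
                executed.append(bid)
                del scoreboard[bid]

    drain()  # blocks with no dependencies can run immediately
    for item in arrival_order:
        for pending in scoreboard.values():
            pending.discard(item)  # mark this dependency as complete
        drain()
    return executed


if __name__ == "__main__":
    order = run_independent_blocks(
        {0: {"texA"}, 1: {"texB"}, 2: set()},
        ["texB", "texA"],
    )
    print(order)
```

In this model, block 1 runs before block 0 because its data arrives first, which is the reordering the abstract attributes to the scoreboard.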

Efficient scheduling of load instructions

11372677 · 2022-06-28 ·

Amazon Technologies, Inc.

Robert Geva

When scheduling instructions for execution on a computing device, load instructions are processed before their dependent computational instructions. This can result in the load instructions being scheduled in a non-optimal order. To schedule the load instructions in a preferred order, a scheduler can speculatively schedule the load instructions without committing to their order. Subsequently, when the scheduler encounters the dependent computational instructions, the scheduler can reorder the speculatively scheduled load instructions according to the execution order of the dependent computational instructions.
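A rough sketch of the reordering step follows, assuming a much simpler model than a real scheduler. The function `reorder_loads` and its argument shapes are hypothetical. Loads are first placed in program order without commitment. Once the execution order of the dependent computational instructions is known, the loads are reordered to match it.

```python
def reorder_loads(loads, computes):
    """Reorder speculatively scheduled loads (hypothetical illustration).

    loads: load names in program order, i.e. the speculative order.
    computes: list of (compute_name, [load names it depends on]) given in
              the execution order chosen for the computational instructions.
    Returns the loads ordered by when a compute first needs them.
    """
    speculative = list(loads)  # tentative order, not yet committed
    committed = []
    seen = set()
    for _name, deps in computes:
        for load in deps:
            if load in speculative and load not in seen:
                committed.append(load)
                seen.add(load)
    # Loads that no compute consumed keep their speculative relative order.
    committed.extend(load for load in speculative if load not in seen)
    return committed


if __name__ == "__main__":
    print(reorder_loads(
        ["ld_a", "ld_b", "ld_c"],
        [("mul", ["ld_c"]), ("add", ["ld_a", "ld_c"])],
    ))
```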

Tuning of loop orders in blocked dense basic linear algebra subroutines

11354564 · 2022-06-07 ·

Intel Corporation

An example includes a sequence generator to generate a plurality of sequence pairs, a first one of the sequence pairs including: (i) a first input sequence representing first accesses to first tensors in a first loop nest of a first computer program, and (ii) a first output sequence representing a first tuned loop nest corresponding to the first accesses to the first tensors in the first loop nest; a model trainer to train a recurrent neural network based on the sequence pairs as training data, the recurrent neural network to be trained to tune loop ordering of a second computer program based on a second input sequence representing second accesses to a second tensor in a second loop nest of the second computer program; and a memory interface to store, in memory, a trained model corresponding to the recurrent neural network.

Methods and systems for nested stream prefetching for general purpose central processing units

11740906 · 2023-08-29 ·

Huawei Technologies Co., Ltd.

A method and hardware system to remove the overhead caused by having stream handling instructions in nested loops. Where code contains inner loops, nested in outer loops, a compiler pass identifies qualified nested streams and generates ISA specific instructions for transferring stream information linking an inner loop stream with an outer loop stream, to hardware components of a co-designed prefetcher. The hardware components include a frontend able to decode and execute instructions for a stream linking information transfer mechanism, a stream engine unit with a streams configuration table (SCT) having a field for allowing a subordinate stream to stay pending for values from its master stream, and a stream prefetch manager with buffers for storing values of current elements of a master stream, and with a nested streams control unit for reconfiguring and iterating the streams.

METHOD FOR THE EXECUTION OF A COMPUTER PROGRAM BY AN ELECTRONIC COMPUTING DEVICE COMPRISING A MAIN MEMORY AND A SECONDARY MEMORY

20220147442 · 2022-05-12 ·

COMMISSARIAT À L'ENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES

A computing device divides an area of a main memory, in which a data structure is saved, into $NbS1$ subdivisions. The computing device then computes a weight $w_{S,NbS1}(k)$ for each of the $NbS1$ subdivisions using the relationship $w_{S,NbS1}(k) = P_S\left(1 + (k-1)\times\frac{NbS0-1}{NbS1-1}\right)$, where $k$ is the order number of one of the $NbS1$ subdivisions and $P_S(\cdot)$ is a predetermined function that is continuous over the interval $[1; NbS0]$ and defined over each interval $[k_0, k_0+1]$ by a polynomial of order less than four, $k_0$ being an integer order number contained in the interval $[1; NbS0]$. When a datum $D_{k,n}$ contained in a subdivision $k$ of the main memory has to be transferred to a secondary memory, the computing device transfers a block of $w_{S,NbS1}(k)$ data containing the datum $D_{k,n}$, where $w_{S,NbS1}(k)$ is the weight computed for this subdivision $k$.
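The weight relationship can be evaluated numerically. In the sketch below, the choice of a piecewise-linear P_S (order one, which is less than four) built from a table of reference values is an assumption made for illustration. The abstract only requires a continuous piecewise polynomial. The helper names are hypothetical, and both NbS0 and NbS1 are assumed to be at least 2.

```python
import math


def make_piecewise_linear(values):
    """Return P(x) on [1, len(values)] with P(k0) = values[k0 - 1].

    Assumed form of P_S: linear on each interval [k0, k0 + 1], hence
    continuous and of polynomial order less than four.
    """
    n = len(values)

    def p(x):
        if not 1 <= x <= n:
            raise ValueError("x outside [1, NbS0]")
        k0 = min(math.floor(x), n - 1)  # left end of the interval holding x
        t = x - k0
        return values[k0 - 1] * (1 - t) + values[k0] * t

    return p


def subdivision_weights(p_s, nbs0, nbs1):
    """Weights for k = 1..NbS1 from P_S(1 + (k-1)*(NbS0-1)/(NbS1-1))."""
    return [p_s(1 + (k - 1) * (nbs0 - 1) / (nbs1 - 1))
            for k in range(1, nbs1 + 1)]


if __name__ == "__main__":
    p_s = make_piecewise_linear([4, 8, 16])  # hypothetical reference values
    print(subdivision_weights(p_s, nbs0=3, nbs1=5))
```

The formula maps subdivision 1 to the point 1 and subdivision NbS1 to the point NbS0, so the same function P_S can be resampled for any number of subdivisions. The abstract does not say how a non-integer weight becomes a block size, so the sketch leaves the values unrounded.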

Shared compilation cache verification system

11726756 · 2023-08-15 ·

Google LLC

Example embodiments of the present disclosure provide, in one example aspect, an example computer-implemented method for verification of a shared cache. The example method can include retrieving a precompiled shared cache entry corresponding to a shared cache key, the shared cache key being associated with an operation request. The example method can include obtaining a directly compiled resource associated with the operation request. The example method can include certifying one or more portions of the shared cache based at least in part on a comparison of the precompiled shared cache entry and the directly compiled resource.
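The comparison step can be sketched as follows. This is a minimal illustration, not the patented method: `verify_entry`, the dictionary-backed cache, and the use of SHA-256 digests for the comparison are all assumptions.

```python
import hashlib


def verify_entry(cache, key, compile_directly):
    """Check one shared-cache entry against a fresh compile (hypothetical).

    cache: dict mapping a shared cache key to precompiled bytes.
    key: the shared cache key associated with the operation request.
    compile_directly: zero-argument callable returning the directly
                      compiled resource as bytes.
    Returns True only if the cached entry exists and matches.
    """
    cached = cache.get(key)
    if cached is None:
        return False  # nothing to certify for this key
    fresh = compile_directly()
    # Compare digests; an assumed comparison, byte equality would also work.
    return hashlib.sha256(cached).digest() == hashlib.sha256(fresh).digest()


if __name__ == "__main__":
    shared_cache = {"op-123": b"compiled-bytes"}
    print(verify_entry(shared_cache, "op-123", lambda: b"compiled-bytes"))
```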

REMOVING BRANCHING PATHS FROM A COMPUTER PROGRAM

20230244455 · 2023-08-03 ·

Nicolas Toper

Methods and systems are described for removing branches from a computer program. The system receives code for a computer program, with the code including a number of branches. Each branch is part of a branching path and includes a jump instruction. The system executes the code, and upon encountering a branching path at runtime, the system proceeds with a number of steps. First, the system computes the result of the branch, then prefetches independent instructions outside of the branch to be executed. The system then executes one or more of the prefetched independent instructions and removes an if statement within the jump instruction of the branch at the computed result of the branching path. The system then executes the jump instruction of the branch at the computed result of the branching path.

Thread prefetch mechanism

11232536 · 2022-01-25 ·

Intel Corporation

An apparatus to facilitate data prefetching is disclosed. The apparatus includes a memory, one or more execution units (EUs) to execute a plurality of processing threads and prefetch logic to prefetch pages of data from the memory to assist in the execution of the plurality of processing threads.

Information processing apparatus, computer-readable recording medium storing therein compiler program, and compiling method

11231917 · 2022-01-25 ·

Fujitsu Limited

An information processing apparatus includes a memory and a processor coupled to the memory. The processor is configured to: when source code includes an instruction for storing units of data in an area of an N-dimensional variable-length array (N being an integer equal to or greater than 2), generate object code in the memory to cause the units of data to be stored in an area of an N-dimensional fixed-length array instead of the area of the N-dimensional variable-length array; and, when the source code includes an instruction for successively accessing the units of data stored in the area of the N-dimensional variable-length array, generate the object code in the memory to cause the units of data stored in the area of the N-dimensional fixed-length array to be stored contiguously in an area of a one-dimensional fixed-length array.

REMOVING BRANCHING PATHS FROM A COMPUTER PROGRAM

20220019416 · 2022-01-20 ·

Nicolas Toper
