G06F8/451

ARITHMETIC LOGIC UNIT LAYOUT FOR A PROCESSOR
20220129409 · 2022-04-28

A processor has first, second and third ALUs. The first ALU has on a first side an input and an output. The second ALU has a first side facing the first side of the first ALU, an input and an output on the first side of the second ALU and being in a rotated orientation relative to the input and the output of the first side of the first ALU, and an output on a second side of the second ALU. The third ALU has a first side facing the second side of the second ALU, and an input and an output on the first side of the third ALU. The input of the first side of the first ALU is logically directly connected to the output of the first side of the second ALU.

Compiling a program from a graph
11720332 · 2023-08-08

A method for generating an executable program to run on one or more processor modules. The method comprises: receiving a graph comprising a plurality of data nodes, compute vertices and edges; and compiling the graph into an executable program including one or more types of multi-access instruction each of which performs at least two memory access (load and/or store) operations in a single instruction. The memory on each processor module comprises multiple memory banks whereby the same bank cannot be accessed by different load or store operations in the same instruction. The compilation comprises assigning instances of the multi-access instructions to implement at least some of the graph edges, and allocating the data to memory addresses within different ones of the banks. The allocating is performed subject to one or more constraints, including at least that different load/store operations should not access the same memory bank in the same instruction.
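
The bank constraint described above can be read as a graph-coloring problem: data items touched by the same multi-access instruction must land in different banks. The following is a minimal sketch under that reading, not the patented allocator. The function name `allocate_banks`, the greedy heuristic, and the sample instruction list are all assumptions made for illustration.

```python
from collections import defaultdict


def allocate_banks(instructions, num_banks):
    """Greedily assign each variable to a bank so that no two variables
    touched by the same multi-access instruction share a bank."""
    # Build a conflict graph: an edge joins variables used together.
    conflicts = defaultdict(set)
    for operands in instructions:
        for a in operands:
            conflicts[a]  # ensure every variable has an entry
            for b in operands:
                if a != b:
                    conflicts[a].add(b)

    banks = {}
    # Place the most constrained variables first (a common heuristic).
    for var in sorted(conflicts, key=lambda v: (-len(conflicts[v]), v)):
        taken = {banks[n] for n in conflicts[var] if n in banks}
        free = [b for b in range(num_banks) if b not in taken]
        if not free:
            raise ValueError(f"no conflict-free bank for {var!r}")
        banks[var] = free[0]
    return banks


# Hypothetical input: each tuple lists the variables that one
# multi-access instruction loads or stores.
instructions = [("x", "w"), ("w", "y"), ("x", "y")]
banks = allocate_banks(instructions, num_banks=3)
```

A greedy pass can fail even when a valid assignment exists, so a real compiler would likely fall back to a constraint solver or insert single-access instructions; the sketch simply raises an error in that case.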

Method for realizing nGraph framework supporting FPGA rear-end device

Disclosed are a method for realizing an nGraph framework supporting an FPGA backend device, and a related apparatus. The method includes: integrating an OpenCL standard API library into an nGraph framework; creating, in the nGraph framework, an FPGA backend device creation module for registering an FPGA backend device, initializing an OpenCL environment and acquiring the FPGA backend device; creating, in the nGraph framework, an FPGA buffer space processing module for allocating an FPGA buffer space and for reading and writing an FPGA cache; creating, in the nGraph framework, an OP kernel implementation module for creating an OP kernel and compiling the OP kernel; and creating, in the nGraph framework, an FPGA compiling execution module for registering, scheduling and executing the OP kernel.

Classical artificial intelligence (AI) and probability based code infusion

A method, a computer system, and a computer program product for parallel conversion are provided. Embodiments of the present invention may include analyzing raw classical code using a code embedded deep learning model. Embodiments of the present invention may include analyzing running classical code using a deep learning model. Embodiments of the present invention may include marking a location of the raw classical code for a first quantum conversion. Embodiments of the present invention may include suggesting a memory size of the running classical code for a second quantum conversion. Embodiments of the present invention may include aggregating the raw classical code for the first quantum conversion. Embodiments of the present invention may include aggregating the running classical code for the second quantum conversion.

Compiler for translating between a virtual image processor instruction set architecture (ISA) and target hardware having a two-dimensional shift array structure
11182138 · 2021-11-23

A method is described that includes translating higher level program code including higher level instructions having an instruction format that identifies pixels to be accessed from a memory with first and second coordinates from an orthogonal coordinate system into lower level instructions that target a hardware architecture having an array of execution lanes and a shift register array structure that is able to shift data along two different axes. The translating includes replacing the higher level instructions having the instruction format with lower level shift instructions that shift data within the shift register array structure.
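
To illustrate the idea of replacing coordinate-based pixel loads with shifts, the sketch below models the shift register array as a list of rows. A load of the neighbour at relative offset (dx, dy) becomes a whole-array shift, after which every lane reads its own register. The names `shift` and `stencil_sum`, the zero fill at the borders, and the sample image are assumptions for illustration, not the instruction set described in the abstract.

```python
def shift(plane, dx, dy, fill=0):
    """Shift a 2D register plane so that lane (x, y) ends up holding the
    value that was at (x + dx, y + dy); out-of-range lanes get `fill`."""
    h, w = len(plane), len(plane[0])
    return [
        [
            plane[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else fill
            for x in range(w)
        ]
        for y in range(h)
    ]


def stencil_sum(plane, offsets):
    """Sum the neighbours named by `offsets` in every lane at once.
    Each (dx, dy) load is lowered to one shift of the whole array."""
    acc = [[0] * len(plane[0]) for _ in plane]
    for dx, dy in offsets:
        shifted = shift(plane, dx, dy)
        acc = [[a + s for a, s in zip(arow, srow)] for arow, srow in zip(acc, shifted)]
    return acc


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
# Horizontal 3-tap sum: left neighbour, the pixel itself, right neighbour.
result = stencil_sum(image, [(-1, 0), (0, 0), (1, 0)])
# result == [[3, 6, 5], [9, 15, 11], [15, 24, 17]]
```

Each shift here starts from the original plane for simplicity; real hardware would typically shift incrementally and rely on a halo region around the array to avoid losing border data.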

Optimizing hardware FIFO instructions

Methods, systems, and apparatus for scheduling first-in-first-out instructions are described. In one aspect, a method includes receiving data representing code of a program to be executed by a processing unit comprising hardware processors. For each of one or more of the hardware processors, an order of independent groups of first-in-first-out (FIFO) instructions for execution by the hardware processor is identified in the data representing the code of the program. For each independent group of FIFO instructions for execution by the hardware processor, a path length metric that represents how long it will take to reach an end of the program from the independent group of FIFO instructions is determined. A new order of the independent groups of FIFO instructions for execution by the hardware processor is generated based at least on the path length metric for each independent group of FIFO instructions for execution by the hardware processor.
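
One plausible reading of the path length metric is the longest remaining path from a group to the end of the program. The sketch below computes that over a small dependency graph and orders the independent FIFO groups so the group with the longest remaining path comes first. The function name `reorder_fifo_groups`, the cost model, and the example graph are hypothetical; the abstract does not specify how the metric is computed.

```python
def reorder_fifo_groups(fifo_groups, cost, successors):
    """Return the FIFO groups sorted by descending path length, where
    path length is the node's own cost plus the longest path through its
    successors to the end of the program."""
    memo = {}

    def path_length(node):
        if node not in memo:
            tail = max((path_length(s) for s in successors.get(node, ())), default=0)
            memo[node] = cost[node] + tail
        return memo[node]

    # sorted() is stable, so groups with equal metrics keep their old order.
    return sorted(fifo_groups, key=lambda g: -path_length(g))


# Hypothetical program: three independent FIFO groups feeding other work.
cost = {"fifo_a": 2, "fifo_b": 1, "fifo_c": 1, "matmul": 8, "add": 1}
successors = {
    "fifo_a": ["add"],
    "fifo_b": ["matmul"],
    "fifo_c": [],
    "matmul": ["add"],
}
new_order = reorder_fifo_groups(["fifo_a", "fifo_b", "fifo_c"], cost, successors)
# new_order == ["fifo_b", "fifo_a", "fifo_c"]
```

Here `fifo_b` moves to the front because the expensive `matmul` depends on it, even though it was second in the original order.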

GPU PROGRAM MULTI-VERSIONING FOR HARDWARE RESOURCE UTILIZATION
20220004438 · 2022-01-06

The present disclosure relates to methods and apparatus for graphical processing. A processing unit may generate or utilize different versions of a GPU program based on hardware resources allocated to the GPU program at runtime. The processing unit may be configured to generate a first version of a GPU program that accesses a resource from a global memory of the processing unit and a second version of the GPU program that accesses the resource from a fast shared resource of the processing unit. The processing unit may utilize the first version of the GPU program if the resource cannot be stored on the fast shared resource allocated to the GPU program at runtime, and may utilize the second version of the GPU program if the resource can be stored on the fast shared resource allocated to the GPU program at runtime.
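
The runtime choice between the two versions can be sketched as a simple capacity check. The code below is a CPU-side illustration only: `kernel_global`, `kernel_shared`, and `select_version` are hypothetical names, and the "fast shared resource" is modeled as a local copy of a lookup table rather than real GPU shared memory.

```python
def kernel_global(data, table):
    # Version 1: reads the lookup table from (slow) global memory each time.
    return [table[i] for i in data]


def kernel_shared(data, table):
    # Version 2: first copies the table into the fast shared resource,
    # then serves every lookup from that copy.
    local = list(table)
    return [local[i] for i in data]


def select_version(table_size, shared_capacity):
    """Pick the version based on the shared capacity granted at runtime."""
    return kernel_shared if table_size <= shared_capacity else kernel_global


table = [10, 20, 30, 40]
data = [3, 0, 2]
kernel = select_version(len(table), shared_capacity=8)
output = kernel(data, table)
```

Both versions compute the same result; only the memory they read from differs, which is what makes swapping them at runtime safe.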

Engineering apparatus, control method of engineering apparatus, and program for generating executable code for controlling target hardware
11215960 · 2022-01-04

An engineering apparatus according to the present disclosure generates executable code, which causes target hardware to operate, from a control application. The engineering apparatus includes an algorithm converter that converts control logic included in the control application into control logic code, a type management unit that outputs a type definition code corresponding to a data block structure of data held by a function block included in the control application, an instance management unit that outputs a memory allocation code that allocates an instance of the function block to memory, and a build controller that generates the executable code based on the control logic code, the type definition code, and the memory allocation code. Executable code for execution by target hardware is debugged while the executable code is in the form of a control application before conversion to a high-level language.

Performing multiple functions in single accelerator program without reload overhead in heterogenous computing system

Examples herein describe compiling source code for a heterogeneous computing system that contains jump logic for executing multiple accelerator functions. The jump logic instructs the accelerator to execute different functions without the overhead of reconfiguring the accelerator by, e.g., providing a new configuration bitstream to the accelerator. At start up when a host program is first executed, the host configures the accelerator to perform the different functions. The methods or system calls in the host program corresponding to the different functions then use jump logic to pass function selection values to an accelerator program in the accelerator that inform the accelerator program which function it is being instructed to perform. This jump logic can be generated by an accelerator compiler and then inserted into the host program as a host compiler generates the executable (e.g., the compiled binary) for the host program.
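
The jump logic can be pictured as a dispatch table: the accelerator is configured once with all functions, and each host call passes only a function selection value. The sketch below is a software stand-in, not a real accelerator API. The `Accelerator` class, the selector constants `SCALE` and `OFFSET`, and the host stubs are all hypothetical.

```python
class Accelerator:
    """Toy stand-in for an accelerator that is configured once."""

    def __init__(self):
        self.configure_count = 0
        self._table = {}

    def configure(self, table):
        # Stands in for loading a configuration bitstream (the costly step).
        self.configure_count += 1
        self._table = dict(table)

    def run(self, selector, *args):
        # Jump logic on the accelerator side: the selection value picks
        # which already-loaded function to execute.
        return self._table[selector](*args)


# Hypothetical function selection values shared by host and accelerator.
SCALE, OFFSET = 0, 1

accelerator = Accelerator()
accelerator.configure({
    SCALE: lambda xs, k: [x * k for x in xs],
    OFFSET: lambda xs, k: [x + k for x in xs],
})


# Host-side stubs: each call only passes a selection value plus arguments.
def scale(xs, k):
    return accelerator.run(SCALE, xs, k)


def offset(xs, k):
    return accelerator.run(OFFSET, xs, k)


result = offset(scale([1, 2, 3], 10), 1)
# result == [11, 21, 31], with accelerator.configure_count still 1
```

Switching between `scale` and `offset` never calls `configure` again, which mirrors the claim that different functions run without reload overhead.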

Compilation method and apparatus with neural network
11789710 · 2023-10-17

A compile method for a neural network includes receiving data related to the neural network, generating a grouped layer by grouping layers comprised in the neural network based on the data, generating a set of passes executable in parallel based on a dependency between a plurality of passes to process the neural network, generating a set of threads performing a plurality of optimization functions based on whether optimization operations performed by the optimization functions are performed independently for the layers, respectively, or sequentially based on a dependency between the layers, and performing compilation in parallel based on the grouped layer, the set of passes, and the set of threads.
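
Deriving sets of passes that can run in parallel from their dependencies can be sketched with a topological sort that emits one batch per dependency level. The example uses Python's standard `graphlib.TopologicalSorter`; the pass names and the dependency map are hypothetical, and the abstract does not say that the method works this way.

```python
from graphlib import TopologicalSorter


def parallel_pass_sets(deps):
    """Group compiler passes into batches; every pass in a batch has all
    of its prerequisites in earlier batches, so a batch can run in parallel.
    `deps` maps each pass name to the set of passes it depends on."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical pass dependencies.
deps = {
    "fold_constants": set(),
    "fuse_layers": {"fold_constants"},
    "dead_code": {"fold_constants"},
    "layout": {"fuse_layers", "dead_code"},
}
batches = parallel_pass_sets(deps)
# batches == [["fold_constants"], ["dead_code", "fuse_layers"], ["layout"]]
```

Each inner list could be handed to a thread pool, with a barrier between batches.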