G06F8/456

SYSTEMS AND METHODS FOR AUTOMATICALLY PARALLELIZING SEQUENTIAL CODE

Systems, methods, and apparatus for automatically parallelizing code segments are provided. For example, an environment includes a profiling agent, a parallelization agent, and a verification agent. The profiling agent executes a code segment and generates a profile of the executed code segment. The parallelization agent analyzes the code segment to determine whether a parallelizable portion is present in the code segment. When a parallelizable portion is present, the parallelization agent determines, based on the profile of the executed code segment, whether to parallelize the parallelizable portion of the code segment. If it is determined to parallelize the parallelizable portion of the code segment, the parallelization agent automatically parallelizes the parallelizable portion of the code segment. The verification agent verifies the functionality and/or correctness of the parallelized code segment.
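The profile, decide, parallelize, verify flow described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names, the wall-clock threshold `threshold_s`, and the use of a thread pool are all assumptions, and the caller is assumed to have already established that the loop iterations are independent.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def profile(func, items):
    """Hypothetical profiling agent: run the sequential segment once and time it."""
    start = time.perf_counter()
    result = [func(x) for x in items]
    return result, time.perf_counter() - start

def parallelize(func, items, workers=4):
    """Hypothetical parallelization agent: run the iterations in a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))

def auto_parallelize(func, items, threshold_s=0.0):
    """Return (result, used_parallel_version)."""
    items = list(items)
    baseline, elapsed = profile(func, items)
    # Decision step (assumed policy): only parallelize segments that the
    # profile shows to be slower than the threshold.
    if elapsed <= threshold_s:
        return baseline, False
    candidate = parallelize(func, items)
    # Hypothetical verification agent: the parallel output must match the
    # sequential output, otherwise fall back to the sequential result.
    if candidate != baseline:
        return baseline, False
    return candidate, True
```

In this sketch the verification step simply compares outputs on the profiled input; a real verification agent would likely need broader checks.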

Automatic conversion of sequential array-based programs to parallel map-reduce programs

The present disclosure relates generally to the field of automatic conversion of sequential array-based programs to parallel MapReduce programs. In various examples, automatic conversion of sequential array-based programs to parallel MapReduce programs may be implemented in the form of systems, methods and/or algorithms.
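As an illustration of the kind of conversion involved, the sketch below shows a sequential array loop next to a hand-written map-reduce equivalent. The abstract does not describe the conversion algorithm, so this only shows the before and after shapes; the function names and the chunking scheme are assumptions.

```python
import operator
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def sum_squares_sequential(values):
    """Sequential array-based form: one loop with an accumulator."""
    total = 0
    for v in values:
        total += v * v
    return total

def sum_squares_mapreduce(values, chunks=4):
    """Map-reduce form: map each chunk to a partial sum, then reduce."""
    values = list(values)
    size = max(1, -(-len(values) // chunks))  # ceiling division
    parts = [values[i:i + size] for i in range(0, len(values), size)]

    def map_part(part):
        return sum(v * v for v in part)

    with ThreadPoolExecutor(max_workers=chunks) as pool:
        partials = list(pool.map(map_part, parts))
    # The reduce step is valid here because addition is associative.
    return reduce(operator.add, partials, 0)
```

The rewrite is only safe because the loop body reads one element per iteration and the accumulator is combined with an associative operator.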

Optimizing parallel build of application

Optimizing a parallel build of an application includes recording, during parallel execution of commands, the command sequence numbers and access information of the commands, and detecting, using a processor, an execution conflict based on the command sequence numbers and the access information. Commands involved in the execution conflict are re-executed serially.
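A minimal sketch of conflict detection over recorded access information follows. The abstract does not state the conflict rule, so the rule used here (two commands conflict when one writes a file the other reads or writes) is an assumption, as are the `AccessRecord` type and the `find_conflicts` helper.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class AccessRecord:
    """Hypothetical record kept for each command run in parallel."""
    seq: int                       # command sequence number
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()

def find_conflicts(records):
    """Return the sequence numbers of conflicting commands, in serial order."""
    involved = set()
    for a, b in combinations(records, 2):
        # Assumed rule: write/read, read/write, or write/write on the same file.
        if a.writes & (b.reads | b.writes) or b.writes & a.reads:
            involved.update((a.seq, b.seq))
    # Sorting by sequence number gives the order for serial re-execution.
    return sorted(involved)
```

For example, if command 1 writes `a.o` while command 2 reads it, both are returned and would be re-executed in sequence order; a command touching only unrelated files is left alone.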

Unified intermediate representation

A system decouples the source code language from the eventual execution environment by compiling the source code language into a unified intermediate representation that conforms to a language model allowing both parallel graphical operations and parallel general-purpose computational operations. The intermediate representation may then be distributed to end-user computers, where an embedded compiler can compile the intermediate representation into an executable binary targeted for the CPUs and GPUs available in that end-user device. The intermediate representation is sufficient to define both graphics and non-graphics compute kernels and shaders. At install-time or later, the intermediate representation file may be compiled for the specific target hardware of the given end-user computing system. The CPU or other host device in the given computing system may compile the intermediate representation file to generate an instruction set architecture binary for the hardware target, such as a GPU, within the system.

Method of achieving intra-machine workload balance for task chunks for distributed graph-processing systems

Techniques are provided for efficiently distributing graph data to multiple processor threads located on a server node. The server node receives graph data to be processed by the server node of a graph processing system. The received graph data is a portion of a larger graph to be processed by the graph processing system. In response to receiving the graph data, the server node compiles a list of vertices and the attributes of each vertex from the received graph data. The server node then creates task chunks of work based upon the compiled list of vertices and their corresponding attribute data. The server node then distributes the task chunks to a plurality of threads available on the server node.
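The chunking and distribution steps can be sketched as below. This assumes that the vertex attribute used for balancing is the vertex degree and that a chunk is closed once its total degree reaches a target; neither detail is given in the abstract, and the helper names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def make_chunks(degrees, target_work):
    """Group vertices into task chunks of roughly equal total degree."""
    chunks, current, work = [], [], 0
    for vertex, degree in degrees.items():
        current.append(vertex)
        work += degree
        if work >= target_work:      # chunk is "full": close it
            chunks.append(current)
            current, work = [], 0
    if current:                      # leftover vertices form a final chunk
        chunks.append(current)
    return chunks

def process_chunks(chunks, degrees, workers=4):
    """Distribute the chunks to worker threads; each returns its chunk's work."""
    def run(chunk):
        # Placeholder computation: sum of degrees stands in for real work.
        return sum(degrees[v] for v in chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, chunks))
```

Sizing chunks by degree rather than by vertex count keeps a single high-degree vertex from dominating one thread's share of the work.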

Loop parallelization analyzer for data flow programs

System and method for automatically parallelizing iterative functionality in a data flow program. A data flow program is stored that includes a first data flow program portion, where the first data flow program portion is iterative. Program code implementing a plurality of second data flow program portions is automatically generated based on the first data flow program portion, where each of the second data flow program portions is configured to execute a respective one or more iterations. The plurality of second data flow program portions are configured to execute at least a portion of iterations concurrently during execution of the data flow program. Execution of the plurality of second data flow program portions is functionally equivalent to sequential execution of the iterations of the first data flow program portion.

Information processing device and method for assigning task
09733982 · 2017-08-15

A computer calculates memory access rates for respective tasks on the basis of hardware monitor information obtained by monitoring operating states of hardware during execution of an application program. The tasks correspond to respective syntax units specified in the application program. The computer assigns, on the basis of the calculated memory access rates, a first task to a socket in a processor in response to an instruction for executing the first task.
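A rough sketch of rate-based socket assignment follows. The abstract does not define the rate formula or the assignment policy, so both are assumptions here: the rate is taken as memory accesses per second from a hypothetical hardware counter, and each task goes to the socket with the lowest accumulated rate.

```python
def memory_access_rate(accesses, seconds):
    """Assumed definition: memory accesses per second of task execution."""
    return accesses / seconds

def assign_socket(task, rates, socket_load):
    """Assign a task to the socket with the least accumulated access rate.

    `socket_load` maps socket id to the summed rates of tasks already placed
    there and is updated in place.
    """
    socket = min(socket_load, key=socket_load.get)
    socket_load[socket] += rates[task]
    return socket
```

The intent of this policy is to spread memory-bound tasks across sockets so that no single memory controller is saturated; other policies would fit the abstract equally well.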

Reducing the scan cycle time of control applications through multi-core execution of user programs

A method for pipeline parallelizing a control program for multi-core execution includes using (12) data dependency analysis on a control program to identify tasks that can be performed in parallel, identifying (13) a largest task Tmax requiring the most execution time of the identified tasks, identifying (14) cut-points in the largest task Tmax where data dependency delays decouple the task, inserting (15) delayed data dependencies into cut-points of the largest task Tmax to create N pipeline sub-tasks, in which N is a number of cores available to a processor on which the control program will be executed, and scheduling (16) the tasks and pipeline sub-tasks to the available processor cores.
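The splitting and scheduling steps (13) to (16) can be sketched as below. This is a simplified model under stated assumptions: task execution times and the per-stage times produced by the cut-points are given as inputs, the data dependency analysis of step (12) is not modeled, and a greedy longest-first placement stands in for whatever scheduler the method actually uses.

```python
import heapq

def schedule(tasks, stage_times, n_cores):
    """Split the largest task into pipeline stages and place work on cores.

    tasks:       task name -> execution time
    stage_times: execution times of the sub-tasks cut from the largest task
    Returns (name of Tmax, list of (load, core, task names) sorted by core).
    """
    t_max = max(tasks, key=tasks.get)            # largest task Tmax
    units = {name: t for name, t in tasks.items() if name != t_max}
    for i, t in enumerate(stage_times):          # pipeline sub-tasks of Tmax
        units[f"{t_max}.stage{i}"] = t
    # Min-heap keyed on current core load; core ids are unique, so the
    # name lists are never compared.
    cores = [(0.0, core, []) for core in range(n_cores)]
    heapq.heapify(cores)
    # Greedy placement: longest unit first, onto the least-loaded core.
    for name, t in sorted(units.items(), key=lambda kv: -kv[1]):
        load, core, names = heapq.heappop(cores)
        heapq.heappush(cores, (load + t, core, names + [name]))
    return t_max, sorted(cores, key=lambda c: c[1])
```

With tasks of 2, 8, and 1 time units on two cores, the unsplit 8-unit task alone bounds the scan cycle at 8; cutting it into two 4-unit stages lets this sketch reach a per-cycle load of 6 on the busiest core. The one-cycle latency introduced by the delayed data dependencies is not represented.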

General purpose distributed data parallel computing using a high level language

General-purpose distributed data-parallel computing using a high-level language is disclosed. Data-parallel portions of a sequential program that is written by a developer in a high-level language are automatically translated into a distributed execution plan. The distributed execution plan is then executed on large compute clusters. Thus, the developer is allowed to write the program using familiar programming constructs in the high-level language. Moreover, developers without experience with distributed compute systems are able to take advantage of such systems.

Detecting and selecting two processing modules to execute code having a set of parallel executable parts

The execution of an executable code by a set of processing modules is provided, wherein the executable code is executed by at least one first processing module of the set of processing modules, wherein said executable code comprises a set of parallel executable parts, wherein each parallel executable part of the executable code comprises at least two parallel executable steps, and wherein said executing comprises: detecting by the at least one first processing module a parallel executable part of the set of parallel executable parts of the executable code to be executed; selecting by the at least one first processing module at least two second processing modules of the set of processing modules; and commanding by the at least one first processing module the selected at least two second processing modules to perform the at least two parallel executable steps of the detected parallel executable part of the executable code.