Patent classifications
G06F8/456
COMPILING GRAPH-BASED PROGRAM SPECIFICATIONS
A graph-based program specification includes: a plurality of components, each corresponding to a processing task and including one or more ports for sending or receiving one or more data elements; and one or more links, each connecting an output port of an upstream component of the plurality of components to an input port of a downstream component of the plurality of components. Prepared code is generated representing subsets of the plurality of components, including: identifying a plurality of subset boundaries between components in different subsets based at least in part on characteristics of linked components; forming the subsets based on the identified subset boundaries; and generating prepared code for each formed subset that when used for execution by a runtime system causes processing tasks corresponding to the components in that formed subset to be performed according to information embedded in the prepared code for that formed subset.
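The subset-forming step described above can be sketched roughly as follows. This is a minimal illustration, not the patented method: it assumes each component carries a single hypothetical characteristic (called `kind` here), treats any link whose endpoints differ in that characteristic as a subset boundary, and groups the remaining linked components into subsets. The names `form_subsets`, `kinds`, and `links` are invented for this example.

```python
def form_subsets(kinds, links):
    """Split components into subsets along links whose endpoints differ.

    kinds: dict mapping component name -> characteristic (hypothetical).
    links: list of (upstream, downstream) component-name pairs.
    Returns (boundaries, subsets).
    """
    parent = {c: c for c in kinds}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    boundaries = []
    for up, down in links:
        if kinds[up] != kinds[down]:
            # Assumed rule: differing characteristics mark a subset boundary.
            boundaries.append((up, down))
        else:
            # Same characteristic: keep both components in one subset.
            parent[find(up)] = find(down)

    groups = {}
    for c in kinds:
        groups.setdefault(find(c), set()).add(c)
    return boundaries, sorted(groups.values(), key=sorted)


# Hypothetical four-component pipeline.
kinds = {"read": "serial", "parse": "serial", "score": "parallel", "write": "serial"}
links = [("read", "parse"), ("parse", "score"), ("score", "write")]
boundaries, subsets = form_subsets(kinds, links)
```

Prepared code would then be generated once per returned subset; that generation step is outside the scope of this sketch.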
SYSTEMS AND METHODS FOR EAGER SOFTWARE BUILD
A method and apparatus of a device that builds a target using a plurality of processing units are described. In an exemplary embodiment, the device receives a build file for a first target, where the build file identifies a plurality of dependencies and the first target depends on a second target. In addition, the device generates a directed acyclic graph for the first target from the plurality of dependencies. Furthermore, the device transforms the directed acyclic graph by transforming a first set of dependencies into a second set of dependencies, where the second set of dependencies includes a first dependency from a first node in the first target to a second node of the second target. The device additionally identifies a plurality of independent executable tasks, where each of the plurality of independent executable tasks is executable without an unresolved dependency and at least one of the plurality of independent executable tasks is associated with the second set of dependencies of the transformed directed acyclic graph. The device further schedules the plurality of independent executable tasks on the plurality of processing units. In addition, the device concurrently executes the plurality of independent executable tasks.
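The scheduling part of this abstract (find tasks with no unresolved dependency, run them concurrently) can be illustrated with a short sketch. It assumes the dependency graph has already been built and transformed to node-level dependencies; the transformation itself is not shown. The function `run_build`, the `action` callback, and the task names are hypothetical, and a thread pool stands in for the plurality of processing units.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait


def run_build(deps, action, workers=4):
    """Run each task once all of its dependencies have finished.

    deps maps a task name to the set of task names it depends on.
    Returns the task names in the order in which they completed.
    """
    remaining = {task: set(d) for task, d in deps.items()}
    finished = []
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while remaining or running:
            # A task is independent once it has no unresolved dependency.
            ready = [t for t, d in remaining.items() if not d]
            for task in ready:
                del remaining[task]
                running[pool.submit(action, task)] = task
            if not running:
                raise ValueError("dependency cycle or unknown dependency")
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                task = running.pop(future)
                future.result()  # re-raise any failure from the task
                finished.append(task)
                for d in remaining.values():
                    d.discard(task)
    return finished


# Hypothetical node-level graph: linking the app waits only on the two
# object files, not on every step of the library target.
deps = {
    "lib/compile": set(),
    "app/compile": set(),
    "app/link": {"lib/compile", "app/compile"},
}
order = run_build(deps, action=lambda task: None)
```

In this graph the two compile steps are submitted together, and `app/link` starts only after both have completed.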
Method, apparatus, and electronic device for improving parallel performance of CPU
Implementations of the present specification provide a method, an apparatus, and an electronic device for improving parallel performance of a CPU. The method includes: attempting to acquire data requests that are of a same type and that are allocated to the CPU core; determining a number of requests that are specified by the acquired one or more data requests; and in response to determining that the number of requests is greater than or equal to a maximum degree of parallelism: executing executable codes corresponding to the maximum degree of parallelism, wherein the maximum degree of parallelism is a maximum number of parallel threads executable by the CPU, and wherein the executable codes comprise code programs that are compiled and linked based on the maximum degree of parallelism at a time that is prior to a time of the executing.
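The dispatch idea in this abstract can be sketched as follows. The abstract only states what happens when the number of requests reaches the maximum degree of parallelism; handling smaller batches with a narrower prebuilt handler is an assumption made here for completeness. `MAX_PARALLELISM`, `make_batch_handler`, `HANDLERS`, and `serve` are hypothetical names, and ordinary Python closures stand in for code that would be compiled and linked ahead of time.

```python
MAX_PARALLELISM = 3  # assumed maximum number of parallel threads on the CPU


def make_batch_handler(width):
    """Stand-in for code compiled and linked in advance for `width` requests."""
    def handler(requests):
        assert len(requests) == width
        return [r * 2 for r in requests]  # placeholder for the real work
    return handler


# Built once, before any request is executed.
HANDLERS = {w: make_batch_handler(w) for w in range(1, MAX_PARALLELISM + 1)}


def serve(queue):
    """Drain same-type requests, always using the widest handler that fits."""
    results = []
    while queue:
        # At or above the maximum degree of parallelism, use the
        # maximum-width handler; otherwise (assumption) use a narrower one.
        width = min(len(queue), MAX_PARALLELISM)
        batch, queue = queue[:width], queue[width:]
        results.extend(HANDLERS[width](batch))
    return results
```

With five queued requests and a maximum of three, `serve` would run one batch of three followed by one batch of two.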
Assisting parallelization of a computer program
A parallelization assistant tool system to assist in parallelization of a computer program is disclosed. The system directs the execution of instrumented code of the computer program to collect performance statistics information relating to execution of loops within the computer program. The system provides a user interface for presenting to a programmer the performance statistics information collected for a loop within the computer program so that the programmer can prioritize efforts to parallelize the computer program. The system generates inlined source code of a loop by aggressively inlining functions substantially without regard to compilation performance, execution performance, or both. The system analyzes the inlined source code to determine the data-sharing attributes of the variables of the loop. The system may generate compiler directives to specify the data-sharing attributes of the variables.
SYSTEMS AND METHODS FOR MAPPING SOFTWARE APPLICATIONS INTERDEPENDENCIES
Systems and methods for mapping between function calls and entities of a computer program are disclosed. The method includes: executing a computer program in a first computing environment; determining a first entity of the computer program to track; assigning an identifier to the first entity; determining that the first entity has been accessed by at least one function call; mapping the at least one function call with the identifier of the first entity; and generating a cluster including the at least one function, wherein the cluster may be executed independently from the rest of the computer program.
AUTOMATIC COMPILER DATAFLOW OPTIMIZATION TO ENABLE PIPELINING OF LOOPS WITH LOCAL STORAGE REQUIREMENTS
Systems, apparatuses and methods may provide for technology that detects one or more local variables in source code, wherein the local variable(s) lack dependencies across iterations of a loop in the source code, automatically generates pipeline execution code for the local variable(s), and incorporates the pipeline execution code into an output of a compiler. In one example, the pipeline execution code includes an initialization of a pool of buffer storage for the local variable(s).
Programmable state machine controller in a parallel processing system
A method and system are disclosed for a programmable state machine controller in a parallel processing system. The programmable state machine controller includes a set of control registers configured to serve a set of application specific engines; a set of task engines configured to access a plurality of application resources in parallel; one or more processors configured to: receive multiple requests from the set of application specific engines, determine availability of the set of task engines and the plurality of application resources being requested, assign the set of task engines to serve the set of application specific engines based on the availability of the set of task engines and the availability of the plurality of application resources being requested, and serve the multiple requests from the set of application specific engines in parallel.
Systems and methods for automatically parallelizing sequential code
Systems, methods, and apparatus for automatically parallelizing code segments are provided. For example, an environment includes a profiling agent, a parallelization agent, and a verification agent. The profiling agent executes a code segment and generates a profile of the executed code segment. The parallelization agent analyzes the code segment to determine whether a parallelizable portion is present in the code segment. When a parallelizable portion is present, the parallelization agent determines, based on the profile of the executed code segment, whether to parallelize the parallelizable portion of the code segment. If it is determined to parallelize the parallelizable portion of the code segment, the parallelization agent automatically parallelizes the parallelizable portion of the code segment. The verification agent verifies the functionality and/or correctness of the parallelized code segment.
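The profile, parallelize, and verify flow described above might look roughly like the sketch below. It is deliberately simplified: the "parallelizable portion" is assumed to be a map of an independent function over a list, so no dependence analysis is performed, and the decision rule (a wall-clock threshold) is an assumption. The names `profile`, `auto_parallelize`, and `threshold_s` are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def profile(func, items):
    """Profiling agent: run the segment sequentially and time it."""
    start = time.perf_counter()
    result = [func(x) for x in items]
    return result, time.perf_counter() - start


def auto_parallelize(func, items, threshold_s=0.0, workers=4):
    """Return (result, parallelized) for applying func to every item."""
    baseline, elapsed = profile(func, items)
    # Parallelization agent: only parallelize when the profile suggests
    # the segment is expensive enough (assumed criterion).
    if elapsed < threshold_s:
        return baseline, False
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parallel = list(pool.map(func, items))
    # Verification agent: the parallel result must match the sequential one.
    if parallel != baseline:
        raise RuntimeError("parallel result differs from sequential result")
    return parallel, True
```

A call such as `auto_parallelize(lambda x: x * x, [1, 2, 3])` runs the sequential profile first and then the verified parallel version; raising `threshold_s` makes the sketch keep cheap segments sequential.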
Parallel program generating method and parallelization compiling apparatus
There is provided a parallel program generating method capable of generating a parallel program that allows static scheduling without undermining the possibility of extracting parallelism. The parallel program generating method executed by the parallelization compiling apparatus 100 includes a fusion step (FIG. 2/STEP026) of fusing, as a new task, a task group that includes a reference task, which is a task having a conditional branch, and subsequent tasks, which are tasks that are control dependent, extended-control dependent, or indirect control dependent on the respective branch directions of the conditional branch included in the reference task.