Patent classifications
G06F8/456
Systems and methods for task parallelization
Various embodiments of the present disclosure can include systems, methods, and non-transitory computer readable media configured to obtain at least one script and at least one document, wherein the script includes one or more instructions to be translated for execution in a parallelized computing environment, and wherein the document includes data that is referenced by the script. A syntax tree for the script can be determined. At least one approach for optimizing the syntax tree can be applied. Parallelized code for execution in the parallelized computing environment can be generated. A binary representation of the document can be determined. The parallelized code can be processed based at least in part on the binary representation of the document.
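The pipeline above can be sketched in Python under several assumptions. The script is taken to be a single per-record expression. The document is taken to be already decoded into a list of dictionaries, so the binary-representation step is omitted. Constant folding stands in for the unspecified optimization approach, and a thread pool stands in for the parallelized computing environment. The names `ConstantFolder`, `compile_script`, and `run_parallel` are hypothetical, and `ast.unparse` requires Python 3.9+.

```python
import ast
import operator
from concurrent.futures import ThreadPoolExecutor

# Binary operators this hypothetical optimizer folds; all others are left untouched.
_FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    """One possible syntax-tree optimization: fold constant arithmetic bottom-up."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        fold = _FOLDABLE.get(type(node.op))
        if fold and isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            return ast.copy_location(ast.Constant(fold(node.left.value, node.right.value)), node)
        return node


def compile_script(source):
    """Parse a per-record expression (record bound to `r`), optimize it, and compile it."""
    tree = ast.parse(source, mode="eval")  # build the syntax tree
    tree = ast.fix_missing_locations(ConstantFolder().visit(tree))
    code = compile(tree, "<script>", "eval")
    # Return the executable function plus the optimized source for inspection.
    return (lambda r: eval(code, {}, {"r": r})), ast.unparse(tree.body)


def run_parallel(fn, records, workers=4):
    """Apply the compiled script to every record of the document in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, records))
```

For example, `compile_script("r['price'] * (2 + 3)")` yields the optimized expression `r['price'] * 5`. Running it over `[{"price": 1}, {"price": 2}]` gives `[5, 10]`.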
Method and apparatus for remote field programmable gate array processing
In one embodiment, an apparatus comprises a fabric controller of a first computing node. The fabric controller is to receive, from a second computing node via a network fabric that couples the first computing node to the second computing node, a request to execute a kernel on a field-programmable gate array (FPGA) of the first computing node; instruct the FPGA to execute the kernel; and send a result of the execution of the kernel to the second computing node via the network fabric.
System and method for selectively delaying execution of an operation based on a search for uncompleted predicate operations in processor-associated queues
A system and method of parallelizing programs employs runtime instructions to identify the data accessed by program portions and to assign those program portions to particular processors based on potential overlap between the accessed data. Data dependences between different program portions may be identified and used to look for pending “predicate” program portions that could create data dependencies, and to postpone program portions that may be dependent while permitting parallel execution of other program portions.
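The postponement idea can be illustrated with a simplified wave scheduler. This is a sketch under stated assumptions, not the patented method. Each program portion is assumed to declare its read and write sets up front. Per-processor queues are collapsed into sequential "waves" whose members may run in parallel. `Task`, `may_depend`, and `schedule_waves` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    reads: frozenset
    writes: frozenset


def may_depend(earlier, later):
    """True if `later` could depend on `earlier` (read-after-write, write-after-read, or write-after-write)."""
    return bool(earlier.writes & (later.reads | later.writes) or earlier.reads & later.writes)


def schedule_waves(tasks):
    """Group tasks (given in program order) into waves that may run in parallel.

    A task is postponed while any earlier, still-uncompleted task (a pending "predicate")
    may create a dependence; independent tasks proceed in the current wave.
    """
    pending, waves = list(tasks), []
    while pending:
        ready, postponed = [], []
        for i, task in enumerate(pending):
            blocked = any(may_depend(p, task) for p in pending[:i])
            (postponed if blocked else ready).append(task)
        waves.append([t.name for t in ready])
        pending = postponed  # the ready wave is treated as completed
    return waves
```

Suppose A writes `x`, B reads `x`, C writes `z`, and D reads `z`. The result is `[["A", "C"], ["B", "D"]]`: B and D wait for their predicates, while C runs alongside A.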
Compiling graph-based program specifications
A graph-based program specification includes: a plurality of components, each corresponding to a processing task and including one or more ports, including scalar data ports for sending or receiving a single data element and collection data ports for sending or receiving a collection of multiple data elements; and one or more links, each connecting an output port of an upstream component to an input port of a downstream component. Prepared code is generated representing subsets of the plurality of components, including: identifying one or more subset boundaries, including identifying one or more links connecting a collection data port of a component to a scalar data port of a component; forming the subsets based on the identified subset boundaries; and generating prepared code for each formed subset that when used for execution by a runtime system causes processing tasks corresponding to the components in each formed subset to be performed.
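Boundary identification and subset formation might look like the following sketch. It treats every collection-to-scalar link as a subset boundary. Subsets are the connected groups of components left after those links are cut, found here with union-find. The data layout (a port-kind table and link tuples) and the function name `form_subsets` are assumptions. Generating prepared code for each subset is not shown.

```python
def form_subsets(components, port_kind, links):
    """components: component names; port_kind: {(component, port): "scalar" | "collection"};
    links: (upstream, output_port, downstream, input_port) tuples.
    Returns (sorted subsets, boundary links)."""
    parent = {c: c for c in components}

    def find(c):
        # Union-find root lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    boundaries = []
    for up, out_port, down, in_port in links:
        if port_kind[(up, out_port)] == "collection" and port_kind[(down, in_port)] == "scalar":
            boundaries.append((up, down))  # collection -> scalar: cut here
        else:
            parent[find(up)] = find(down)  # same subset

    groups = {}
    for c in components:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values()), boundaries
```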
Generating object code from intermediate code that includes hierarchical sub-routine information
Examples are described for a device to receive intermediate code that was generated from compiling source code of an application. The intermediate code includes information generated from the compiling that identifies a hierarchical structure of lower level sub-routines in higher level sub-routines, and the lower level sub-routines are defined in the source code of the application to execute more frequently than the higher level sub-routines that identify the lower level sub-routines. The device is configured to compile the intermediate code to generate object code based on the information that identifies lower level sub-routines in higher level sub-routines, and store the object code.
Method and apparatus for compiling code based on a dependency tree
A compiling apparatus generates a dependency tree representing dependency relations among a plurality of instructions included in first code. The compiling apparatus detects, from the dependency tree, a partial tree including a first instruction, a second instruction, and a third instruction that depends on the operation results of the first and second instructions, and rewrites the instructions corresponding to the partial tree to a set of instructions including a plurality of complex instructions each of which causes a processor to perform a complex operation including a plurality of operations. The compiling apparatus generates second code on the basis of the dependency tree and the set of instructions.
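The partial-tree rewrite can be sketched on a tiny SSA-style instruction list. The complex instruction is assumed to be a multiply-add, where `fma(c, d, t1)` computes `c * d + t1`. This simplified variant folds one of the two multiplications into the addition. The instruction format and the name `fuse_mul_add` are hypothetical.

```python
from collections import Counter


def fuse_mul_add(code):
    """code: list of (dest, op, lhs, rhs) in SSA form with ops "mul" and "add".

    Rewrites  t1 = a*b; t2 = c*d; t3 = t1 + t2  into  t1 = a*b; t3 = fma(c, d, t1).
    """
    defs = {ins[0]: ins for ins in code}  # dependency tree: value -> defining instruction
    uses = Counter(src for ins in code for src in ins[2:])
    removed, replaced = set(), {}
    for dest, op, lhs, rhs in code:
        if op != "add":
            continue
        left, right = defs.get(lhs), defs.get(rhs)
        # Partial tree: an add whose two operands each come from a single-use mul.
        if (left and right and left[1] == "mul" and right[1] == "mul"
                and uses[lhs] == 1 and uses[rhs] == 1):
            removed.add(rhs)  # the second mul is absorbed into the fma
            replaced[dest] = (dest, "fma", right[2], right[3], lhs)
    return [replaced.get(ins[0], ins) for ins in code if ins[0] not in removed]
```

On real hardware, a fused multiply-add rounds only once, so floating-point results can differ slightly from a separate multiply and add.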
Dynamic Stream Operator Fission and Fusion with Platform Management Hints
Methods and apparatus, including computer program products, implementing and using techniques for data stream processing in a runtime data processing environment. A stream processing graph that includes several connected operators is received. Source code of the operators is analyzed to identify hints describing whether an operator contains data structures, method parameters, or other data that can be applied in a parallelized data processing environment. Performance metrics of the data processing environment within parallel regions are evaluated to determine whether data processing resources can be dynamically scaled up or down. In response to determining that the data processing resources can be dynamically scaled up, one or more operators are split to be processed on two or more parallel processing resources. In response to determining that the data processing resources can be dynamically scaled down, one or more operators are combined to be processed on a single parallel processing resource.
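A toy planning step shows how the scale-up and scale-down decisions might translate into fission and fusion. Several inputs are assumptions: the utilization figures, the `parallel_ok` hints (standing in for the result of source-code analysis), the thresholds, and the fixed fission factor of two. `plan_operators` is a hypothetical name.

```python
def plan_operators(pipeline, utilization, parallel_ok, high=0.8, low=0.3):
    """pipeline: operator names in stream order.
    Returns [(operators placed on one resource, replica count), ...] in stream order."""
    groups = []
    for name in pipeline:
        load = utilization[name]
        if load > high and parallel_ok[name]:
            # Fission: the hint says the operator is safe to parallelize, so split it in two.
            groups.append(([name], 2))
        elif (load < low and groups and groups[-1][1] == 1
              and sum(utilization[n] for n in groups[-1][0]) + load < high):
            # Fusion: a lightly loaded operator joins the preceding single-resource group
            # as long as the combined load still fits on one resource.
            groups[-1][0].append(name)
        else:
            groups.append(([name], 1))
    return groups
```

Suppose `src`, `parse`, `filter`, and `sink` have utilizations of 0.2, 0.9, 0.1, and 0.15. The resulting plan is `[(["src"], 1), (["parse"], 2), (["filter", "sink"], 1)]`.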
Automatic generation of efficient vector code with low overhead in a time-efficient manner independent of vector width
A computing system includes a compatibility graph builder to generate a compatibility graph based on a dependency graph representing program source code, where the compatibility graph indicates compatibility relationships between operations represented in the dependency graph, a clique generator coupled with the compatibility graph builder to generate a set of candidate vector packings based on the compatibility relationships indicated in the compatibility graph, a set cover generator coupled with the clique generator to select a subset of vector packings from the set of candidate vector packings, and a vector code generator coupled with the set cover generator to generate the vector code based on the selected subset of vector packings.
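The four stages (compatibility graph, cliques, set cover, and code generation) can be sketched compactly under several assumptions:

- Two operations are compatible when they share an opcode and neither depends on the other.
- Cliques are enumerated by brute force up to the vector width.
- A greedy disjoint selection stands in for the set cover generator.
- The emitted "vector code" is pseudo-IR text.

`plan_vector_code` is a hypothetical name, and the sketch ignores scheduling cycles between packs.

```python
from itertools import combinations


def plan_vector_code(ops, deps, width):
    """ops: {name: opcode}; deps: set of (a, b) edges meaning b depends on a."""
    names = sorted(ops)
    # Reachability over the dependency graph (Warshall's algorithm), so indirect deps count.
    reach = {n: {b for a, b in deps if a == n} for n in names}
    for k in names:
        for i in names:
            if k in reach[i]:
                reach[i] |= reach[k]

    def compatible(x, y):
        return ops[x] == ops[y] and y not in reach[x] and x not in reach[y]

    # Cliques of size 2..width in the compatibility graph are the candidate packings.
    candidates = [g for size in range(width, 1, -1)
                  for g in combinations(names, size)
                  if all(compatible(x, y) for x, y in combinations(g, 2))]

    # Greedy cover: take larger packings first, never reusing an operation.
    covered, chosen = set(), []
    for g in candidates:
        if covered.isdisjoint(g):
            chosen.append(g)
            covered.update(g)

    code = [f"{ops[g[0]]}<{len(g)}>({', '.join(g)})" for g in chosen]
    code += [f"{ops[n]}({n})" for n in names if n not in covered]  # leftover scalar ops
    return chosen, code
```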
Dynamic distribution of loads across heterogeneous computing structures in computational rendering
Embodiments are provided for dynamically distributing loads in computational rendering in a computing environment. A computational rendering model is used in computational rendering to exploit nested recursive parallelism within a heterogeneous computing architecture, enabling communication overlap, memory transfer, and data and task management. The computational rendering model is developed for the heterogeneous computing architecture.
Computer-readable recording medium storing conversion program and conversion method
A recording medium stores a program that causes a computer to execute a process including: generating, based on the dependency relationships between statements in a program, a directed graph in which each statement is a node and each dependency relationship is an edge; detecting from the directed graph, based on the dependency relationships represented by the edges, a node in which part of a loop process has a dependency relationship with another preceding or following node; updating the directed graph by dividing the detected node into a first node containing that part of the loop process and a second node containing the rest of the loop process, fusing the first node with the other node, and assigning dependency information based on a data access pattern to the node produced by the fusion; and converting the program based on the updated directed graph.
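The first two steps can be sketched as follows: building the statement dependency graph, and finding which part of a loop conflicts with a neighboring node. Statements are assumed to carry explicit read and write sets. `build_graph` and `dependent_part` are hypothetical names, and the split, fusion, and annotation steps are not shown.

```python
def conflicts(reads_a, writes_a, reads_b, writes_b):
    """Flow, anti, or output dependence between two statements."""
    return bool(writes_a & reads_b or reads_a & writes_b or writes_a & writes_b)


def build_graph(stmts):
    """stmts: {name: (reads, writes)} in program order.
    Returns directed edges (a, b), meaning statement b depends on statement a."""
    names = list(stmts)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if conflicts(*stmts[a], *stmts[b])]


def dependent_part(loop_body, neighbor):
    """loop_body: {name: (reads, writes)} for the statements inside one loop node.
    Returns the body statements that conflict with the neighboring node; these would
    form the first node to split out and fuse with that neighbor."""
    n_reads, n_writes = neighbor
    return [s for s, (r, w) in loop_body.items() if conflicts(r, w, n_reads, n_writes)]
```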