G06F8/314

Automatic generation of multi-source breadth-first search from high-level graph language for distributed graph processing systems

Techniques are described herein for automatic generation of multi-source breadth-first search (MS-BFS) from high-level graph processing language that can be executed in a distributed computing environment. In an embodiment, a method involves a computer analyzing original software instructions. The original software instructions are configured to perform multiple breadth-first searches to determine a particular result. Each breadth-first search originates at each of a subset of vertices of a graph. Each breadth-first search is encoded for independent execution. Based on the analyzing, the computer generates transformed software instructions configured to perform a MS-BFS to determine the particular result. Each of the subset of vertices is a source of the MS-BFS. In an embodiment, the second plurality of software instructions comprises a node iteration loop and a neighbor iteration loop, and the plurality of vertices of the distributed graph comprise active vertices and neighbor vertices. The node iteration loop is configured to iterate once per each active vertex of the plurality of vertices of the distributed graph, and the node iteration loop is configured to determine the particular result. The neighbor iteration loop is configured to iterate once per each active vertex of the plurality of vertices of the distributed graph, and each iteration of the neighbor iteration loop is configured to activate one or more neighbor vertices of the plurality of vertices for the following iteration of the neighbor iteration loop.

APPLICATION INTERFACE ON MULTIPLE PROCESSORS
20200285521 · 2020-09-10 ·

A method and an apparatus that execute a parallel computing program in a programming language for a parallel computing architecture are described. The parallel computing program is stored in memory in a system with parallel processors. The parallel computing program is stored in a memory to allocate threads between a host processor and a GPU. The programming language includes an API to allow an application to make calls using the API to allocate execution of the threads between the host processor and the GPU. The programming language includes host function data tokens for host functions performed in the host processor and kernel function data tokens for compute kernel functions performed in one or more compute processors, e.g., GPUs or CPUs, separate from the host processor.

Assisting parallelization of a computer program

A parallelization assistant tool system to assist in parallelization of a computer program is disclosed. The system directs the execution of instrumented code of the computer program to collect performance statistics information relating to execution of loops within the computer program. The system provides a user interface for presenting to a programmer the performance statistics information collected for a loop within the computer program so that the programmer can prioritize efforts to parallelize the computer program. The system generates inlined source code of a loop by aggressively inlining functions substantially without regard to compilation performance, execution performance, or both. The system analyzes the inlined source code to determine the data-sharing attributes of the variables of the loop. The system may generate compiler directives to specify the data-sharing attributes of the variables.

SYSTEM AND METHOD TO PERFORM PARALLEL PROCESSING ON A DISTRIBUTED DATASET

Disclosed is a system to perform parallel processing on a distributed dataset. A receiving module, for receiving a dataset along with a set of functions. A partitioning module, for partitioning the dataset into a set of distributed datasets. A distributing module, for distributing the set of distributed datasets amongst a set of computing nodes. A determining module, for determining an applicability of the function on the distributed dataset. An executing module, for executing one or more functions applicable on the distributed dataset. A generating module, for generating processed data for the distributed dataset based upon the executing of the one or more functions.

GENERATING SYNCHRONOUS DIGITAL CIRCUITS FROM SOURCE CODE CONSTRUCTS THAT MAP TO CIRCUIT IMPLEMENTATIONS
20200225920 · 2020-07-16 ·

A multi-threaded imperative programming language includes language constructs that map to circuit implementations. The constructs can include a condition statement that enables a thread in a hardware pipeline to wait for a specified condition to occur, identify the start and end of a portion of source code instructions that are to be executed atomically, or indicate that a read-modify-write memory operation is to be performed atomically. Source code that includes one or more constructs mapping to a circuit implementation can be compiled to generate a circuit description. The circuit description can be expressed using hardware description language (HDL), for instance. The circuit description can, in turn, be used to generate a synchronous digital circuit that includes the circuit implementation. For example, HDL might be utilized to generate an FPGA image or bitstream that can be utilized to program an FPGA that includes the circuit implementation associate with the language construct.

GENERATING A SYNCHRONOUS DIGITAL CIRCUIT FROM A SOURCE CODE CONSTRUCT DEFINING A FUNCTION CALL
20200225919 · 2020-07-16 ·

A multi-threaded imperative programming language includes a language construct defining a function call. A circuit implementation for the construct includes a first pipeline, a second pipeline, and a third pipeline. The first hardware pipeline outputs variables to a first queue and outputs parameters for the function to a second queue. The second hardware pipeline obtains the function parameters from the second queue, performs the function, and stores the results of the function in a third queue. The third hardware pipeline retrieves the results generated by the second pipeline from the second queue and retrieves the variables from the first queue. The third hardware pipeline performs hardware operations specified by the source code using the variables and the results of the function. A single instance of the circuit implementation can be utilized to implement calls to the same function made from multiple locations within source code.

AUTOMATIC GENERATION OF MULTI-SOURCE BREADTH-FIRST SEARCH FROM HIGH-LEVEL GRAPH LANGUAGE FOR DISTRIBUTED GRAPH PROCESSING SYSTEMS
20200133663 · 2020-04-30 ·

Techniques are described herein for automatic generation of multi-source breadth-first search (MS-BFS) from high-level graph processing language that can be executed in a distributed computing environment. In an embodiment, a method involves a computer analyzing original software instructions. The original software instructions are configured to perform multiple breadth-first searches to determine a particular result. Each breadth-first search originates at each of a subset of vertices of a graph. Each breadth-first search is encoded for independent execution. Based on the analyzing, the computer generates transformed software instructions configured to perform a MS-BFS to determine the particular result. Each of the subset of vertices is a source of the MS-BFS. In an embodiment, the second plurality of software instructions comprises a node iteration loop and a neighbor iteration loop, and the plurality of vertices of the distributed graph comprise active vertices and neighbor vertices. The node iteration loop is configured to iterate once per each active vertex of the plurality of vertices of the distributed graph, and the node iteration loop is configured to determine the particular result. The neighbor iteration loop is configured to iterate once per each active vertex of the plurality of vertices of the distributed graph, and each iteration of the neighbor iteration loop is configured to activate one or more neighbor vertices of the plurality of vertices for the following iteration of the neighbor iteration loop.

Systems and methods for creating and using a data structure for parallel programming
10585845 · 2020-03-10 · ·

System and method embodiments are provided for creating data structure for parallel programming. A method for creating data structures for parallel programming includes forming, by one or more processors, one or more data structures, each data structure comprising one or more global containers and a plurality of local containers. Each of the global containers is accessible by all of a plurality of threads in a multi-thread parallel processing environment. Each of the plurality of local containers is accessible only by a corresponding one of the plurality of threads. A global container is split into a second plurality of local containers when items are going to be processed in parallel and two or more local containers are merged into a single global container when a parallel process reaches a synchronization point.

Method of dynamically renumbering ports and an apparatus thereof
10560399 · 2020-02-11 · ·

Embodiments of the apparatus of dynamically renumbering ports relate to a network chip that minimizes the total logic on the network chip by limiting the number of states that needs to be preserved for all ports on the network chip. Each pipe on the network chip implements a dynamic port renumbering scheme that dynamically assigns a relative port number for each port assigned to that pipe. The dynamic port renumbering scheme allows for internal parallelism without increasing the total amount of state space required for the ports on the network chip.

Application interface on multiple processors
10534647 · 2020-01-14 · ·

A method and an apparatus that execute a parallel computing program in a programming language for a parallel computing architecture are described. The parallel computing program is stored in memory in a system with parallel processors. The parallel computing program is stored in a memory to allocate threads between a host processor and a GPU. The programming language includes an API to allow an application to make calls using the API to allocate execution of the threads between the host processor and the GPU. The programming language includes host function data tokens for host functions performed in the host processor and kernel function data tokens for compute kernel functions performed in one or more compute processors, e.g., GPUs or CPUs, separate from the host processor.