Patent classifications
G06F9/5066
Dynamic placement of computation sub-graphs
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for assigning operations of a computational graph to a plurality of computing devices are disclosed. Data characterizing a computational graph is obtained. Context information for a computational environment in which to perform the operations of the computational graph is received. A model input is generated, which includes at least the context information and the data characterizing the computational graph. The model input is processed using a machine learning model to generate an output defining placement assignments of the operations of the computational graph to the plurality of computing devices. The operations of the computational graph are assigned to the plurality of computing devices according to the defined placement assignments.
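The placement flow described above can be sketched as follows. This is a hypothetical illustration, not the patented model: the "model" here is a toy scorer over (operation, device) pairs, and all names (`score`, `place_graph`, `mem_estimate`, `free_memory`) are assumptions for the example.

```python
# Hypothetical sketch: a stand-in "model" scores each (operation, device)
# pair from context information, and each operation is assigned to its
# highest-scoring device. All names and features are illustrative.

def score(op, device, context):
    # Toy learned-model stand-in: prefer devices whose free memory most
    # exceeds the operation's estimated memory need.
    return context["free_memory"][device] - op["mem_estimate"]

def place_graph(ops, devices, context):
    """Return {operation name: device} placement assignments."""
    placement = {}
    for op in ops:
        best = max(devices, key=lambda d: score(op, d, context))
        placement[op["name"]] = best
        # Update context so later operations see the reduced capacity.
        context["free_memory"][best] -= op["mem_estimate"]
    return placement

ops = [{"name": "matmul", "mem_estimate": 4}, {"name": "relu", "mem_estimate": 1}]
context = {"free_memory": {"gpu:0": 6, "gpu:1": 3}}
print(place_graph(ops, ["gpu:0", "gpu:1"], context))
```

A real placement model would be trained end to end; the greedy scorer above only shows how model output translates into per-operation device assignments.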
APPARATUS AND METHOD FOR TREE STRUCTURE DATA REDUCTION
Apparatus and method for tree structure data reduction. For example, one embodiment of an apparatus comprises: a plurality of compute units; bounding volume hierarchy (BVH) processing logic to update a BVH responsive to changes associated with leaf nodes of the BVH, the BVH processing logic comprising: treelet generation logic to arrange nodes of the BVH into a plurality of treelets, the treelets including a plurality of bottom treelets and a tip treelet, each treelet having a number of nodes selected based on workgroup processing resources of the compute units; a dispatcher to dispatch workgroups to compute units to process the treelets, wherein a separate workgroup comprising a separate plurality of threads is dispatched to process each treelet.
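The treelet arrangement can be illustrated in software terms. This is a minimal sketch, not the patented hardware logic: the BVH is modeled as an implicit binary heap array, and treelet size is capped so each bottom treelet could be handed to one workgroup; `build_treelets` and its parameters are assumed names.

```python
import math

# Illustrative sketch: partition the nodes of a BVH stored as an implicit
# binary heap (node i has children 2i+1 and 2i+2) into bottom treelets of
# at most `treelet_size` nodes plus one tip treelet, so each bottom
# treelet fits the threads of a single workgroup.

def build_treelets(num_nodes, treelet_size):
    """Split node indices 0..num_nodes-1 into (tip_treelet, bottom_treelets)."""
    levels = math.ceil(math.log2(num_nodes + 1))
    sub_levels = int(math.log2(treelet_size + 1))  # levels per bottom treelet
    cut = levels - sub_levels                      # depth where the tip ends
    tip = list(range(2 ** cut - 1))                # nodes above the cut
    bottoms = []
    for root in range(2 ** cut - 1, min(2 ** cut * 2 - 1, num_nodes)):
        # Collect the subtree rooted at `root` down to the leaves.
        frontier, subtree = [root], []
        while frontier:
            n = frontier.pop()
            if n < num_nodes:
                subtree.append(n)
                frontier += [2 * n + 1, 2 * n + 2]
        if subtree:
            bottoms.append(sorted(subtree))
    return tip, bottoms
```

For a 15-node BVH with 7-node treelets, this yields one single-node tip treelet and two 7-node bottom treelets, each of which a dispatcher could hand to a separate workgroup.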
Sharing preprocessing, computations, and hardware resources between multiple neural networks
A method for training a Neural-Network (NN) includes receiving a plurality of NN training tasks, each training task including (i) a respective preprocessing phase that preprocesses data to be provided as input data to the NN, and (ii) a respective computation phase that trains the NN using the preprocessed data. The plurality of NN training tasks is executed, including: (a) a commonality is identified between the input data required by computation phases of two or more of the training tasks, and (b) in response to identifying the commonality, one or more preprocessing phases are executed that produce the input data jointly for the two or more training tasks.
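The joint-preprocessing idea can be sketched as below. This is a hedged illustration, not the patent's mechanism: tasks are plain dictionaries, and the `preprocess_key` field standing in for commonality detection is an assumption.

```python
# Hypothetical sketch: when two training tasks require the same input
# data (identified here by an assumed `preprocess_key`), the shared
# preprocessing phase is executed once and its output is reused.

def run_tasks(tasks):
    cache = {}      # preprocess key -> preprocessed input data
    runs = 0        # how many preprocessing phases actually executed
    results = {}
    for task in tasks:
        key = task["preprocess_key"]
        if key not in cache:
            cache[key] = task["preprocess"]()
            runs += 1                 # joint preprocessing, executed once
        results[task["name"]] = task["train"](cache[key])
    return results, runs

tasks = [
    {"name": "a", "preprocess_key": "resize-224",
     "preprocess": lambda: [1, 2, 3], "train": lambda d: sum(d)},
    {"name": "b", "preprocess_key": "resize-224",
     "preprocess": lambda: [1, 2, 3], "train": lambda d: max(d)},
]
results, runs = run_tasks(tasks)   # both tasks train; preprocessing runs once
```

Both computation phases consume the same cached input, so `runs` stays at 1 despite two training tasks.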
Parallel solution generation
The application describes parallel solution generation. A data processing apparatus includes a memory storing computer program code, and at least two processors configured to execute the computer program code. The computer program code includes a component program run in parallel on the at least two processors to generate solution components of points in parallel, and to store the generated solution components in the memory; and a solution program run in parallel on the at least two processors to generate a solution by adding one solution component at a time, read from the memory, to the solution based on a key point, and to store the solution with the added solution component in the memory.
Intra-shard parallelization of data stream processing using virtual shards
A data stream may include a plurality of records that are ordered, and the plurality of records may be assigned to a processing shard. A first set of virtual shards may be formed, the first set of virtual shards having a first quantity of virtual shards that perform parallel processing operations on behalf of the processing shard. First records of the plurality of records may be processed using the first set of virtual shards. The first quantity of virtual shards may be modified, based at least in part on an observed record age, to a second quantity of virtual shards that perform parallel processing operations on behalf of the processing shard. A second set of virtual shards may be formed having the second quantity of virtual shards. Second records of the plurality of records may be processed using the second set of virtual shards.
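The virtual-shard scaling loop can be sketched as follows. This is an illustrative reading, not the patented implementation: routing here is simple round-robin, and the age thresholds are assumptions.

```python
# Illustrative sketch: ordered records of one processing shard are fanned
# out across N "virtual shards" (round-robin here), and N is adjusted
# based on an observed record age versus a target. Thresholds are assumed.

def route(records, num_virtual):
    """Assign each ordered record to a virtual shard, round-robin."""
    shards = [[] for _ in range(num_virtual)]
    for i, rec in enumerate(records):
        shards[i % num_virtual].append(rec)
    return shards

def adjust(num_virtual, observed_age, target_age):
    """Scale the virtual-shard count from the observed record age."""
    if observed_age > target_age:                 # falling behind: fan out more
        return num_virtual + 1
    if observed_age < target_age / 2 and num_virtual > 1:
        return num_virtual - 1                    # well ahead: shrink
    return num_virtual

first = route(list(range(6)), 2)      # first set: 2 virtual shards
n = adjust(2, observed_age=30, target_age=10)
second = route(list(range(6)), n)     # second set after scaling up
```

Processing the first records with two virtual shards, then the second records with three, mirrors the claim's first and second sets of virtual shards.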
Automatic scaling of microservices applications
A device may receive information identifying a set of tasks to be executed by a microservices application that includes a plurality of microservices. The device may determine an execution time of the set of tasks based on a set of parameters and a model. The set of parameters may include a first parameter that identifies a first number of instances of a first microservice of the plurality of microservices, and a second parameter that identifies a second number of instances of a second microservice of the plurality of microservices. The device may compare the execution time and a threshold. The threshold may be associated with a service level agreement. The device may selectively adjust the first number of instances or the second number of instances based on comparing the execution time and the threshold.
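The execution-time model and scaling decision can be sketched as below. The linear per-instance cost model is purely an assumption for illustration; the patent leaves the model unspecified.

```python
# Toy sketch of the described loop: predict execution time of a task set
# from the instance counts of two microservices, compare against an SLA
# threshold, and add instances to the bigger bottleneck until it fits.
# The cost model (per-task work divided across instances) is assumed.

COST1, COST2 = 4.0, 2.0   # assumed per-task cost of each microservice

def execution_time(num_tasks, n1, n2):
    """Simplistic model: each microservice's work is split across instances."""
    return num_tasks * (COST1 / n1 + COST2 / n2)

def autoscale(num_tasks, n1, n2, sla_threshold):
    while execution_time(num_tasks, n1, n2) > sla_threshold:
        # Grow whichever microservice currently dominates the time.
        if COST1 / n1 >= COST2 / n2:
            n1 += 1
        else:
            n2 += 1
    return n1, n2
```

Starting from one instance of each microservice, the loop adds instances only until the modeled execution time meets the SLA-derived threshold, which matches the claim's "selectively adjust" language.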
System, method, and computer program product for processing large data sets by balancing entropy between distributed data segments
Systems, methods, and computer program products are provided for load balancing for processing large data sets. The method includes identifying a number of segments and a transaction data set comprising transaction data for a plurality of transactions, the transaction data for each transaction of the plurality of transactions comprising a transaction value, determining an entropy of the transaction data set based on the transaction value of each transaction of the plurality of transactions, segmenting the transaction data set into the number of segments based on the entropy of the transaction data set and balancing respective entropies of each segment of the number of segments, and distributing processing tasks associated with each segment of the number of segments to at least one processor of a plurality of processors to process each transaction in each respective segment.
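One plausible reading of entropy-balanced segmentation can be sketched as follows. This is a hedged interpretation: each transaction is weighted by the surprisal of its value, and a greedy rule balances those weights across segments; the patent's exact balancing criterion may differ.

```python
import math
from collections import Counter

# Hedged sketch: weight each transaction by the information content
# (surprisal, -log2 p) of its transaction value within the data set, then
# deal transactions greedily into segments so the segments' total entropy
# contributions stay balanced.

def segment_by_entropy(values, num_segments):
    counts = Counter(values)
    total = len(values)
    weight = {v: -math.log2(c / total) for v, c in counts.items()}
    segments = [[] for _ in range(num_segments)]
    loads = [0.0] * num_segments
    # Heaviest-first greedy assignment to the currently lightest segment.
    for v in sorted(values, key=lambda x: weight[x], reverse=True):
        i = loads.index(min(loads))
        segments[i].append(v)
        loads[i] += weight[v]
    return segments, loads
```

Each resulting segment can then be handed to a different processor; balancing surprisal rather than raw count keeps rare, information-heavy transactions from piling into one segment.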
Logical node layout method and apparatus, computer device, and storage medium
The disclosed method is applicable to a many-core system. The method includes: acquiring multiple pieces of routing information, each of which includes two logical nodes and a data transmission amount between the two logical nodes; determining a piece of unprocessed routing information with a maximum data transmission amount as current routing information; mapping each unlocked logical node of the current routing information to one unlocked processing node, and locking the mapped logical node and processing node, wherein if there is an unlocked edge processing node, the unlocked logical node is mapped to the unlocked edge processing node; and returning, if there is at least one unlocked logical node, to the step of determining the piece of unprocessed routing information with the maximum data transmission amount as the current routing information.
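The greedy layout procedure can be sketched as follows. This is an illustrative simplification: node naming and the flat edge/inner pools are assumptions, not the patent's many-core topology.

```python
# Illustrative sketch of the greedy layout: process routing entries in
# decreasing order of data transmission amount; map each still-unmapped
# (unlocked) logical node to a free processing node, preferring edge
# processing nodes while any remain, then treat both as locked.

def layout(routing, processing_nodes, edge_nodes):
    """routing: list of (logical_a, logical_b, transmission_amount)."""
    free_edges = [n for n in processing_nodes if n in edge_nodes]
    free_inner = [n for n in processing_nodes if n not in edge_nodes]
    mapping = {}                        # logical node -> processing node
    for a, b, _ in sorted(routing, key=lambda r: r[2], reverse=True):
        for logical in (a, b):
            if logical in mapping:
                continue                # already mapped, i.e. locked
            pool = free_edges if free_edges else free_inner
            mapping[logical] = pool.pop(0)
    return mapping
```

Sorting by transmission amount first means the heaviest-communicating logical pairs get placed (and locked) before lighter traffic is considered, matching the "maximum data transmission amount" selection step.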
Graphics processing unit systems for performing data analytics operations in data science
Systems and methods are provided for efficiently performing processing intensive operations, such as those involving large volumes of data, that enable accelerated processing time of these operations. In at least one embodiment, a system includes a graphics processing unit (GPU) including a memory and a plurality of cores. The plurality of cores perform a plurality of data analytics operations on a respectively allocated portion of a dataset, each of the plurality of cores using only the memory to store data input for each of the plurality of data analytics operations performed by the plurality of cores. The data storage for the plurality of data analytics operations performed by the plurality of cores is also provided solely by the memory.
System and method for generating size-based splits in a massively parallel or distributed database environment
A system and method are described for database split generation in a massively parallel or distributed database environment including a plurality of databases and a data warehouse layer providing data summarization and querying functionality. A database table accessor of the system obtains, from an associated client application, a query for data in a table of the data warehouse layer, wherein the query includes a user preference. The system obtains table data representative of properties of the table, and determines a splits generator in accordance with one or more of the user preference or the properties of the table. The system generates, by the determined splits generator, table splits dividing the user query into a plurality of query splits, and outputs the plurality of query splits to an associated plurality of mappers for execution by the associated plurality of mappers of each of the plurality of query splits against the table.
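The split-generation step can be sketched as below. This is a hedged simplification: of the several splitter strategies the patent allows, only a key-range splitter is shown, and the function and column names are assumptions.

```python
# Hypothetical sketch of size-based split generation: a user query over a
# table is divided into N query splits by partitioning the table's key
# range, so N mappers can each execute their split independently.

def generate_splits(query, key_column, min_key, max_key, num_splits):
    """Return `num_splits` range-restricted variants of `query`."""
    step = (max_key - min_key + 1) // num_splits
    splits, lo = [], min_key
    for i in range(num_splits):
        # Last split absorbs any remainder so the full range is covered.
        hi = max_key if i == num_splits - 1 else lo + step - 1
        splits.append(f"{query} WHERE {key_column} BETWEEN {lo} AND {hi}")
        lo = hi + 1
    return splits
```

A real splits generator would be chosen from the table's properties (row counts, partitioning, user preference); the range strategy above only shows the shape of the output handed to the mappers.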