Patent classifications
G06F8/453
MAPPING COMPONENTS OF A NON-DISTRIBUTED ENVIRONMENT TO A DISTRIBUTED ENVIRONMENT
Embodiments of the present invention disclose a method, a computer program product, and a computer system for mapping components of non-distributed environments to distributed environments. A computer receives a data pipeline configured for a non-distributed environment and identifies one or more bottleneck components of the data pipeline. In addition, the computer converts data used in the pipeline to a format compatible with a distributed environment and installs the computing libraries necessary for operating the pipeline within the distributed environment. The computer further converts the pipeline's code to code compatible with the distributed environment and optimizes components of the pipeline for use in the distributed environment.
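The bottleneck-identification step can be sketched as a simple profiling pass over the pipeline's stages. Everything here (the stage list, the 50% runtime threshold, the toy pipeline) is an illustrative assumption, not the patented method.

```python
import time

def profile_pipeline(stages, sample, threshold=0.5):
    """Run each pipeline stage on sample data, timing it, and flag
    stages whose share of total runtime exceeds `threshold` as
    bottleneck candidates for distribution."""
    timings = []
    data = sample
    for name, fn in stages:
        start = time.perf_counter()
        data = fn(data)
        timings.append((name, time.perf_counter() - start))
    total = sum(t for _, t in timings) or 1e-12
    return [name for name, t in timings if t / total > threshold]

# Hypothetical single-machine pipeline: the deliberately slow middle
# stage is the candidate for moving into the distributed environment.
stages = [
    ("load", lambda xs: list(xs)),
    ("heavy_transform", lambda xs: [sum(range(2000)) + x for x in xs]),
    ("reduce", lambda xs: sum(xs)),
]
bottlenecks = profile_pipeline(stages, range(200))
print(bottlenecks)
```

With a 50% threshold at most one stage is flagged; a real implementation would also weigh data-conversion and library-installation costs before deciding to distribute a stage.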
SYSTEM AND METHOD TO PERFORM PARALLEL PROCESSING ON A DISTRIBUTED DATASET
Disclosed is a system to perform parallel processing on a distributed dataset. A receiving module receives a dataset along with a set of functions. A partitioning module partitions the dataset into a set of distributed datasets. A distributing module distributes the set of distributed datasets amongst a set of computing nodes. A determining module determines the applicability of each function to a distributed dataset. An executing module executes the one or more functions applicable to the distributed dataset. A generating module generates processed data for the distributed dataset based upon the execution of the one or more functions.
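The module chain above can be sketched end to end, using worker threads as stand-ins for computing nodes; the try-call applicability check and the chunking scheme are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(dataset, n):
    """Partitioning module: split the dataset into n near-equal chunks."""
    k, r = divmod(len(dataset), n)
    out, i = [], 0
    for p in range(n):
        size = k + (1 if p < r else 0)
        out.append(dataset[i:i + size])
        i += size
    return out

def applicable(fn, chunk):
    """Determining module: a function applies if it accepts a sample of
    the chunk without raising (a deliberately simple check)."""
    try:
        fn(chunk[:1])
        return True
    except Exception:
        return False

def process(dataset, functions, nodes=4):
    """Receive a dataset and functions, partition and distribute it
    across worker threads (standing in for computing nodes), execute
    the applicable functions, and generate processed data per chunk."""
    chunks = partition(dataset, nodes)

    def run(chunk):
        result = chunk
        for fn in functions:
            if applicable(fn, result):
                result = fn(result)
        return result

    with ThreadPoolExecutor(max_workers=nodes) as pool:
        return list(pool.map(run, chunks))

parts = process(list(range(10)), [lambda c: [x * x for x in c]], nodes=3)
print(parts)  # [[0, 1, 4, 9], [16, 25, 36], [49, 64, 81]]
```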
SYSTEMS AND METHODS FOR MINIMIZING COMMUNICATIONS
A system for allocating one or more data structures used in a program across a number of processing units takes into account the memory access pattern of each data structure and the total memory available for duplication across the several processing units. Using these parameters, duplication factors are determined for the one or more data structures such that the cost of remote communication is minimized when the data structures are duplicated according to their respective duplication factors, while still allowing parallel execution of the program.
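One way duplication factors could be chosen is a greedy search over per-replica savings. The cost model below (remote accesses served locally in proportion to replicas, a single shared memory budget) is an assumption for illustration, not the patent's procedure.

```python
def choose_duplication_factors(structures, units, mem_budget):
    """Greedy sketch: each structure maps to (size, remote_accesses).
    Raise duplication factors one replica at a time, always picking the
    structure with the best communication saving per byte of extra
    memory, until the budget or the unit count is exhausted."""
    factors = {name: 1 for name in structures}
    used = sum(size for size, _ in structures.values())
    while True:
        best, best_gain = None, 0.0
        for name, (size, remote) in structures.items():
            if factors[name] >= units or used + size > mem_budget:
                continue
            # One more replica serves roughly remote/units accesses
            # locally instead of over the network.
            gain = (remote / units) / size
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:
            return factors
        factors[best] += 1
        used += structures[best][0]

# Hypothetical inputs: "A" is small and hot, "B" is large and cold.
factors = choose_duplication_factors(
    {"A": (10, 1000), "B": (100, 50)}, units=4, mem_budget=140)
print(factors)  # {'A': 4, 'B': 1}
```

The greedy rule fully replicates the small, frequently accessed structure and leaves the large, rarely accessed one distributed, which is the trade-off the abstract describes.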
Shared local memory tiling mechanism
An apparatus to facilitate memory tiling is disclosed. The apparatus includes a memory, one or more execution units (EUs) to execute a plurality of processing threads via access to the memory and tiling logic to apply a tiling pattern to memory addresses for data stored in the memory.
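Tiling logic of this kind can be sketched as an address remap from a row-major surface into contiguous tiles; the tile dimensions and the row-major source layout below are assumptions for illustration.

```python
def tiled_address(row, col, width, tile_w, tile_h):
    """Sketch of tiling logic: remap element (row, col) of a row-major
    surface of `width` columns into a layout where each tile_h x tile_w
    tile is stored contiguously, improving locality for execution
    threads that touch 2-D neighbourhoods."""
    tiles_per_row = width // tile_w
    tile_idx = (row // tile_h) * tiles_per_row + (col // tile_w)
    within = (row % tile_h) * tile_w + (col % tile_w)
    return tile_idx * (tile_w * tile_h) + within

# 8-wide surface, 4x4 tiles: vertical neighbours land close together.
print(tiled_address(1, 0, 8, 4, 4))  # 4: same tile as (0, 0)
print(tiled_address(0, 4, 8, 4, 4))  # 16: first element of the next tile
```

Note that in plain row-major order element (1, 0) would sit at linear address 8; the tiling pattern pulls it to address 4, next to the rest of its tile.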
Iterative evaluation of data through SIMD processor registers
Executable code is generated for processing a data set in an in-memory database system. The executable code is based on program instructions including a predicate associated with a first part of the data set. The first part of the data set is divided into data sections. A data section comprises a number of data elements corresponding to the number of bit values to be allocated into a register at a processor. The register is associated with performing single-instruction, multiple-data (SIMD) operations. At the processor, the data sections are evaluated iteratively to determine bit vectors to be stored iteratively into the SIMD register. Based on the bit vectors iteratively stored in the SIMD register, result data sets are iteratively determined by invoking data from the data set. The result data sets are provided through the processor for further consumption.
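The iteration scheme can be sketched in scalar code, packing predicate outcomes into an integer that stands in for the SIMD register's bit vector; the 8-bit register width is an illustrative assumption (real SIMD registers evaluate all lanes in one compare instruction).

```python
REGISTER_BITS = 8  # stand-in for a SIMD register width

def evaluate(data, predicate):
    """Iterate over sections of REGISTER_BITS elements, pack the
    predicate outcome for each element into a bit vector (one SIMD
    compare in hardware), then consume the bits to gather the matching
    elements as the per-iteration result set."""
    results = []
    for base in range(0, len(data), REGISTER_BITS):
        section = data[base:base + REGISTER_BITS]
        bits = 0
        for i, x in enumerate(section):
            if predicate(x):
                bits |= 1 << i
        # Invoke the data elements whose bits are set.
        results.extend(section[i] for i in range(len(section))
                       if bits >> i & 1)
    return results

print(evaluate(list(range(20)), lambda x: x % 3 == 0))
# [0, 3, 6, 9, 12, 15, 18]
```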
Incremental parallel processing of data
One example method includes identifying synchronous code including instructions specifying a computing operation to be performed on a set of data; transforming the synchronous code into a pipeline application including one or more pipeline objects; identifying a first input data set on which to execute the pipeline application; executing the pipeline application on a first input data set to produce a first output data set; after executing the pipeline application on the first input data set, identifying a second input data set on which to execute the pipeline application; determining a set of differences between the first input data set and second input data set; and executing the pipeline application on the set of differences to produce a second output data set.
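The difference-driven re-execution can be sketched for an element-wise stage; the per-element cache keyed by input value is an assumption standing in for the pipeline objects' real change tracking.

```python
def make_incremental(fn):
    """Wrap an element-wise pipeline stage so that a later run executes
    only on the differences between the new input set and previously
    seen inputs, merging cached outputs for everything else."""
    cache = {}

    def run(inputs):
        diff = [x for x in inputs if x not in cache]
        for x in diff:          # execute only on the differences
            cache[x] = fn(x)
        return [cache[x] for x in inputs], diff

    return run

stage = make_incremental(lambda x: x * x)
first, diff1 = stage([1, 2, 3])        # first run computes everything
second, diff2 = stage([1, 2, 3, 4])    # second run computes only 4
print(first, second, diff2)  # [1, 4, 9] [1, 4, 9, 16] [4]
```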
Hardware Acceleration Method, Compiler, and Device
A hardware acceleration method includes: obtaining compilation policy information and a source code, where the compilation policy information indicates that a first code type matches a first processor and a second code type matches a second processor, analyzing a code segment in the source code according to the compilation policy information, determining a first code segment belonging to the first code type or a second code segment belonging to the second code type, compiling the first code segment into a first executable code, sending the first executable code to the first processor, compiling the second code segment into a second executable code, and sending the second executable code to the second processor.
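The policy-driven dispatch can be sketched as follows; the policy table, the `for`-substring heuristic for spotting data-parallel segments, and the processor names are toy assumptions standing in for real code-type analysis.

```python
# Hypothetical compilation policy: which code type matches which processor.
POLICY = {"scalar": "cpu", "data_parallel": "fpga"}

def classify(segment):
    """Toy analysis: loop-carrying segments count as data-parallel and
    match the accelerator; everything else matches the CPU."""
    return "data_parallel" if "for " in segment else "scalar"

def dispatch(source_segments, policy=POLICY):
    """Analyze each code segment against the compilation policy and
    return, per processor, the segments to compile and send to it."""
    plan = {proc: [] for proc in set(policy.values())}
    for seg in source_segments:
        plan[policy[classify(seg)]].append(seg)
    return plan

plan = dispatch(["x = a + b", "for i in range(n): y[i] = x[i] * 2"])
print(plan)
```

A real implementation would compile each segment with the matching processor's toolchain before sending; here the plan only records the routing decision.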
Computer system and method for parallel program code optimization and deployment
A compiler system, method, and computer program product for optimizing a program is disclosed. The compiler includes an extractor module configured to extract, from an initial program code, a hierarchical task representation in which each node corresponds to a potential unit of execution. The root node of the hierarchical task representation represents the entire initial program code, and each child node represents a subset of the units of execution of its respective parent node. The compiler further includes a parallelizer module configured to apply pre-defined parallelization rules associated with the processing device to the hierarchical task representation, automatically adjusting it by assigning particular units of execution to particular processing units of the processing device and by inserting communication and/or synchronization operations, so that the adjusted hierarchical task representation reflects parallel program code for the processing device.
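The hierarchical task representation and one parallelization rule can be sketched as a tree walk; the node class and the round-robin rule are illustrative assumptions (a real parallelizer would also insert the communication and synchronization the assignment implies).

```python
class TaskNode:
    """One node of the hierarchical task representation: the root is
    the whole program; children are subsets of its units of execution."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.unit = None  # processing unit assigned by the parallelizer

def assign_round_robin(root, n_units):
    """A minimal parallelization rule: collect the leaf units of
    execution depth-first and assign them to processing units
    round-robin."""
    leaves = []

    def walk(node):
        if not node.children:
            leaves.append(node)
        for child in node.children:
            walk(child)

    walk(root)
    for i, leaf in enumerate(leaves):
        leaf.unit = i % n_units
    return {leaf.name: leaf.unit for leaf in leaves}

tree = TaskNode("program", [
    TaskNode("loop", [TaskNode("iter0"), TaskNode("iter1"), TaskNode("iter2")]),
    TaskNode("io"),
])
assignments = assign_round_robin(tree, 2)
print(assignments)  # {'iter0': 0, 'iter1': 1, 'iter2': 0, 'io': 1}
```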
Minibatch Parallel Machine Learning System Design
The disclosure is directed to optimizing parallel machine learning system design and performance using minibatches. A system for allocating data center resources according to embodiments includes: a machine learning process; a machine learning data set; a processing system including P parallel processing elements for training the machine learning process using the machine learning data set, wherein the machine learning data set is split into a plurality of batches with a batch size M; and a resource manager for (1) minimizing a training time T=T(M,P) of the machine learning process over M for each value of P, and (2) efficient system design.
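The resource manager's first task, minimizing T(M, P) over M for each P, can be sketched as a grid search over candidate batch sizes. The cost model below (per-batch synchronization cost growing with P, a convergence penalty growing with M) is a hypothetical stand-in, not the patent's model.

```python
def best_batch_size(T, batch_sizes, P):
    """Resource-manager sketch: minimize training time T(M, P) over
    candidate minibatch sizes M for a fixed degree of parallelism P."""
    return min(batch_sizes, key=lambda M: T(M, P))

def T(M, P, sync=1.0, steps=100.0):
    """Hypothetical training-time model: fewer, larger batches amortize
    the P-proportional synchronization cost, but larger minibatches
    slow convergence (the trailing M term)."""
    return steps * sync * P / M + M

for P in (1, 4, 16):
    print(P, best_batch_size(T, [4, 8, 16, 32, 64], P))
# 1 8
# 4 16
# 16 32
```

Under this model the optimal minibatch size grows with the number of processing elements, which is the kind of M-versus-P trade-off the resource manager is meant to navigate.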