Patent classifications
G06F8/453
METHOD OF DISTRIBUTED GRAPH LOADING FOR MINIMAL COMMUNICATION AND GOOD BALANCE VIA LAZY MATERIALIZATION AND DIRECTORY INDIRECTION USING INDEXED TABULAR REPRESENTATION
Techniques herein minimally communicate between computers to repartition a graph. In embodiments, each computer receives a partition of edges and vertices of the graph. For each of its edges or vertices, each computer stores an intermediate representation into an edge table (ET) or vertex table. Different edges of a vertex may be loaded by different computers, which may cause a conflict. Each computer announces that a vertex resides on the computer to a respective tracking computer. Each tracking computer makes assignments of vertices to computers and publicizes those assignments. Each computer that loaded conflicted vertices transfers those vertices to computers of the respective assignments. Each computer stores a materialized representation of a partition based on: the ET and vertex table of the computer, and the vertices and edges that were transferred to the computer. Edges stored in the materialized representation are stored differently than edges stored in the ET.
Sparsity Uniformity Enforcement for Multicore Processor
Methods and systems relating to the field of parallel computing are disclosed herein. The methods and systems disclosed include approaches for sparsity uniformity enforcement for a set of computational nodes which are used to execute a complex computation. A disclosed method includes determining a sparsity distribution in a set of operand data, and generating, using a compiler, a set of instructions for executing, using the set of operand data and a set of processing cores, a complex computation. Alternatively, the method includes altering the operand data. The method also includes distributing the set of operand data to the set of processing cores for use in executing the complex computation in accordance with the set of instructions. Either the altering is conducted to, or the compiler is programmed to, balance the sparsity distribution among the set of processing cores.
Remote application modernization
Various embodiments of the present technology generally relate to the characterization and improvement of software applications. More specifically, some embodiments relate to systems and methods for modeling code behavior and generating new versions of the code based on the code behavior models. In some embodiments, a method of improving a codebase includes recording a run of the existing code, characterizing the code behavior via one or more models, prototyping new code according to a target language and target environment, deploying the new code to the target environment, and comparing the behavior of the new code to the behavior of the existing code. In some implementations, generating new code based on the behavior models includes using one or more machine learning techniques for code generation based on the target language and environment.
COMPILER-BASED INPUT SYNCHRONIZATION FOR PROCESSOR WITH VARIANT STAGE LATENCIES
The technology disclosed provides a system that comprises a processor with computing units on an integrated circuit substrate. The processor is configured to map a program across multiple hardware stages with each hardware stage executing a corresponding operation of the program at a different stage latency dependent on an operation type and an operand format. The system further comprises a runtime logic that configures the compute units with configuration data. The configuration data causes first and second producer hardware stages in a given compute unit to execute first and second data processing operations and produce first and second outputs at first and second stage latencies, and synchronizes consumption of the first and second outputs by a consumer hardware stage in the given compute unit for execution of a third data processing operation by introducing a register storage delay that compensates for a difference between the first and second stage latencies.
Information processing apparatus, communication method and information processing system for communication of global data shared by information processing apparatuses
An information processing apparatus, among a plurality of information processing apparatuses, to which one of pieces of local data is assigned, the pieces of local data having been obtained by dividing global data shared by the plurality of information processing apparatuses, includes: a storage unit that includes a first storage area sectioned into prescribed units, and stores local data; a processor that executes a process including: detecting a plurality of continuous sections to which the target local data is to be written in a second storage area that is sectioned into the prescribed units in the different information processing apparatus, on the basis of storage area information that identifies data to which the target local data corresponds in the global data; and extracting as many pieces of local data as specified by the number of the continuous sections and transmitting the data to the different information processing apparatus.
Apparatus and method for handling registers in pipeline processing
An apparatus stores a program including a description of loop processing of iterating a plurality of instructions, and rearranges an execution sequence of the plurality of instructions in the program such that the loop processing is pipelined by software pipeline. The apparatus inserts an instruction to use a register for single instruction multiple data (SIMD) extension instruction, into the description of the loop processing in the program.
VOICE COMMAND INTEGRATION FOR LOCAL NETWORK CONNECTED DEVICES
Various arrangements for facilitating smart television content receivers in a local network are provided. In an example, a secondary television receiver receives audio data, converts the audio data into voice command data, and transmits the voice command data to a primary television receiver. In response, the primary television receiver transmits the voice command data to a voice processing server via the Internet, receives a command generated based on the voice command data, and transmits the command to the secondary television receiver. Based on the command, an operation of the secondary television receiver is controlled.
Methods and apparatus to configure heterogenous components in an accelerator
Methods, apparatus, systems and articles of manufacture are disclosed to configure heterogenous components in an accelerator. An example apparatus includes a graph compiler to identify a workload node in a workload and generate a selector for the workload node, and the selector to identify an input condition and an output condition of a compute building block, wherein the graph compiler is to, in response to obtaining the identified input condition and output condition from the selector, map the workload node to the compute building block.
Methods and apparatus for automatic communication optimizations in a compiler based on a polyhedral representation
Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least one local memory unit that allows for data reuse opportunities. The first custom computing apparatus optimizes the code for reduced communication execution on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
Systems and methods for energy proportional scheduling
A compilation system using an energy model based on a set of generic and practical hardware and software parameters is presented. The model can represent the major trends in energy consumption spanning potential hardware configurations using only parameters available at compilation time. Experimental verification indicates that the model is nimble yet sufficiently precise, allowing efficient selection of one or more parameters of a target computing system so as to minimize power/energy consumption of a program while achieving other performance related goals. A voltage and/or frequency optimization and selection is presented which can determine an efficient dynamic hardware configuration schedule at compilation time. In various embodiments, the configuration schedule is chosen based on its predicted effect on energy consumption. A concurrency throttling technique based on the energy model can exploit the power-gating features exposed by the target computing system to increase the energy efficiency of programs.