Patent classifications
G06F8/4432
METHOD AND APPARATUS FOR INSTRUCTION CHECKPOINTING IN A DATA PROCESSING DEVICE POWERED BY AN UNPREDICTABLE POWER SOURCE
A computer-implemented method comprises generating computer executable code as one or more code portions; detecting a number of processing operations required to reach one or more predetermined stages in execution of each code portion; and associating with each code portion one or more progress indicators, each representing a respective execution stage of the one or more predetermined stages within execution of that code portion. The code portions are executed by a processor powered by an unpredictable power source. When the processor detects an energy condition indicating that no more than a reserve quantity of electrical energy is available, the progress indicators are used to determine whether or not to perform a checkpoint.
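The checkpoint decision the abstract describes can be sketched as follows. This is a minimal illustration, not the patented method: the function name, the energy units, and the per-operation cost constant are all assumptions. A code portion carries progress indicators (operation counts at which predetermined stages are reached); once available energy falls to the reserve level, a checkpoint is taken only if the reserve cannot carry execution to the next stage.

```python
# Hypothetical constants standing in for the reserve quantity of energy
# and the assumed energy cost of one processing operation.
RESERVE_ENERGY = 10.0
ENERGY_PER_OP = 0.5

def should_checkpoint(ops_done, progress_indicators, available_energy):
    """Decide whether to checkpoint before power is lost.

    progress_indicators: sorted operation counts at which each
    predetermined execution stage of the code portion is reached.
    """
    if available_energy > RESERVE_ENERGY:
        return False  # no low-energy condition detected yet
    # Find the next stage boundary ahead of the current position.
    next_stage = next((p for p in progress_indicators if p > ops_done), None)
    if next_stage is None:
        return False  # past the last stage; let the portion finish
    ops_needed = next_stage - ops_done
    # Checkpoint if the remaining energy cannot reach the next stage.
    return ops_needed * ENERGY_PER_OP > available_energy
```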
Compiler-optimized context switching with compiler-inserted data table for in-use register identification at a preferred preemption point
Compiler-optimized context switching may include receiving an instruction indicating a preferred preemption point comprising an instruction address; storing the preferred preemption point in a data structure; determining, based on the data structure, that the preferred preemption point has been reached by a first thread; determining that preemption of the first thread for a second thread has been requested; and performing a context switch to the second thread.
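The deferral scheme in the abstract can be illustrated with a small sketch (class and method names are hypothetical, not from the patent): preferred preemption points are stored in a data structure, and a requested context switch is honored only when the running thread reaches one of those instruction addresses.

```python
class PreemptionController:
    """Defers a requested context switch to a preferred preemption point."""

    def __init__(self):
        self.preferred_points = set()   # instruction addresses
        self.preemption_requested = False

    def register_preferred_point(self, address):
        self.preferred_points.add(address)

    def request_preemption(self):
        self.preemption_requested = True

    def on_instruction(self, address):
        """Called as the first thread executes; returns True when the
        context switch to the second thread should be performed."""
        if self.preemption_requested and address in self.preferred_points:
            self.preemption_requested = False
            return True   # switch here, where few registers are in use
        return False
```

The point of checking only at registered addresses is that the compiler chose them as places where the set of in-use registers (and hence the state to save) is small.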
METHOD AND DEVICE FOR MANAGING ACCESSES OF MULTIPLE SOFTWARE COMPONENTS TO SOFTWARE INTERFACES
A method for managing accesses of multiple software components to software interfaces. In the method, a temporal allocation of the software components to the software interfaces is calculated statically based on requirements of the software components with respect to the software interfaces. The allocation is optimized continuously in light of an observed runtime behavior of the software components.
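The two phases in the abstract, a static allocation computed from declared requirements and a continuous runtime refinement, can be sketched as below. Everything here is an illustrative assumption: the abstract does not specify proportional allocation or an exponential moving average.

```python
def static_allocation(requirements, slots_per_period):
    """requirements: {component: declared interface slots needed per period}.
    Returns {component: allocated slots}, proportional to requirements."""
    total = sum(requirements.values())
    return {c: round(slots_per_period * r / total)
            for c, r in requirements.items()}

def refine(allocation, observed, alpha=0.3):
    """Continuously blend the static allocation toward observed runtime
    usage (exponential-moving-average style)."""
    return {c: (1 - alpha) * allocation[c] + alpha * observed.get(c, 0)
            for c in allocation}
```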
Compiler for implementing memory shutdown for neural network implementation configuration
Some embodiments provide a compiler for optimizing the implementation of a machine-trained network (e.g., a neural network) on an integrated circuit (IC). The compiler of some embodiments receives a specification of a machine-trained network including multiple layers of computation nodes and generates a graph representing options for implementing the machine-trained network in the IC. In some embodiments, the graph includes nodes representing options for implementing each layer of the machine-trained network and edges between nodes for different layers representing different implementations that are compatible. The compiler of some embodiments is also responsible for generating instructions relating to shutting down (and waking up) memory units of cores. In some embodiments, the memory units to shut down are determined by the compiler based on the data that is stored or will be stored in the particular memory units.
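The shutdown/wake instruction generation can be sketched with a simple liveness-based schedule (the data layout and instruction format are assumptions): a memory unit is woken just before the first layer that uses its data and shut down right after the last.

```python
def memory_power_schedule(usage):
    """usage: {memory_unit: layer indices at which the unit's data is live}.
    Returns a sorted list of (layer, action, unit) instructions: wake each
    unit before its first use, shut it down after its last use."""
    schedule = []
    for unit, layers in usage.items():
        if not layers:
            continue  # never used: leave the unit powered down
        schedule.append((min(layers), "wake", unit))
        schedule.append((max(layers) + 1, "shutdown", unit))
    return sorted(schedule)
```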
Program rewrite device, storage medium, and program rewrite method
A program rewrite method executed by a computer, the method includes rewriting a program to output a first output group by performing operations for a first variable among a plurality of variables with a plurality of data types; rewriting the program to output a second output group by performing operations for a second variable among the plurality of variables with a plurality of data types; identifying, from the first output group and the second output group, a third output group that satisfies a predetermined criterion as a result of executing the rewritten programs; determining a data type that corresponds to the third output group as a use data type; and outputting a program in which the use data type is set for each of the plurality of variables.
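The selection loop can be illustrated with a small sketch. Since Python has no native narrow float types, reduced precision is simulated here by rounding to a number of significant digits; the candidates, the accuracy criterion, and all names are assumptions standing in for the patented data-type search.

```python
def run_with_precision(func, inputs, digits):
    """Simulate a lower-precision data type by rounding every input
    (and the result) to a fixed number of significant digits."""
    q = lambda x: float(f"%.{digits}g" % x)
    return q(func([q(x) for x in inputs]))

def choose_use_precision(func, inputs, candidates=(3, 7), tol=1e-4):
    """Run the program once per candidate precision (cheapest first) and
    return the first whose output satisfies the accuracy criterion."""
    reference = func(inputs)                   # full double precision
    for digits in candidates:
        out = run_with_precision(func, inputs, digits)
        if abs(out - reference) <= tol * max(1.0, abs(reference)):
            return digits                      # output satisfies criterion
    return None                                # no candidate is acceptable
```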
Compiler for optimizing filter sparsity for neural network implementation configuration
Some embodiments provide a compiler for optimizing the implementation of a machine-trained network (e.g., a neural network) on an integrated circuit (IC). In some embodiments, the compiler determines whether sparsity requirements of channels implemented on individual cores are met on each core. If the sparsity requirement is not met, the compiler, in some embodiments, determines whether the channels of the filter can be rearranged to meet the sparsity requirements on each core and, based on the determination, either rearranges the filter channels or implements a solution to non-sparsity.
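The check-then-rearrange flow can be sketched as below, under stated assumptions: channels are flat weight lists, the sparsity requirement is a minimum zero-weight fraction per core, and the rearrangement is a greedy rebalance by nonzero count (the patent does not specify these details).

```python
def sparsity(channels):
    """Fraction of zero weights across a core's assigned channels."""
    weights = [w for ch in channels for w in ch]
    return weights.count(0) / len(weights) if weights else 1.0

def assign_channels(channels, num_cores, min_sparsity):
    # Initial round-robin assignment of filter channels to cores.
    cores = [channels[i::num_cores] for i in range(num_cores)]
    if all(sparsity(c) >= min_sparsity for c in cores):
        return cores, "ok"
    # Rearrange: place the densest channels on the cores with the
    # fewest nonzero weights so far.
    ordered = sorted(channels, key=lambda ch: ch.count(0))
    cores = [[] for _ in range(num_cores)]
    for ch in ordered:
        target = min(cores, key=lambda c: sum(len(x) - x.count(0)
                                              for x in c))
        target.append(ch)
    if all(sparsity(c) >= min_sparsity for c in cores):
        return cores, "rearranged"
    return cores, "non-sparsity fallback"   # implement other solution
```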
AUTOMATED USE OF COMPUTATIONAL MOTIFS VIA DEEP LEARNING DETECTION
A system and method are described for efficiently utilizing optimized implementations of computational patterns in an application. In various implementations, a computing system includes at least one or more processors, and these one or more processors and other hardware resources of the computing system process a variety of applications. Sampled, dynamic values of hardware performance counters are sent to a trained data model. The data model provides characterization of the computational patterns being used and the types of workloads being processed. The data model also indicates whether the identified computational patterns already use an optimized version. Later, a selected processor determines that a given unoptimized computational pattern is no longer running and replaces this computational pattern with an optimized version. Although the application is still running, the processor performs a static replacement. On a next iteration of the computational pattern, the optimized version is run.
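The replacement step can be sketched with a dispatch-table stand-in for the binary patching implied by the abstract (the class and the in-flight counter are illustrative assumptions): the unoptimized pattern is swapped only once it is observed not to be executing, so the next iteration runs the optimized version.

```python
class MotifDispatcher:
    """Routes calls to the current implementation of a computational
    pattern and allows a safe swap while the application keeps running."""

    def __init__(self, impl):
        self.impl = impl     # currently dispatched implementation
        self.active = 0      # in-flight invocations of the pattern

    def __call__(self, *args):
        self.active += 1
        try:
            return self.impl(*args)
        finally:
            self.active -= 1

    def try_replace(self, optimized):
        """Static replacement: only patch when the pattern is not running."""
        if self.active == 0:
            self.impl = optimized
            return True
        return False
```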
PROPAGATING REDUCED-PRECISION ON COMPUTATION GRAPHS
Methods, systems, and apparatus for propagating reduced-precision on computation graphs are described. In one aspect, a method includes receiving data specifying a directed graph that includes operators for a program. The operators include first operators that each represent a numerical operation performed on numerical values having a first level of precision and second operators that each represent a numerical operation performed on numerical values having a second level of precision. One or more downstream operators are identified for a first operator. A determination is made whether each downstream operator represents a numerical operation that is performed on input values having the second level of precision. Whenever each downstream operator represents a numerical operation that is performed on input values having the second level of precision, a precision of numerical values output by the operation represented by the first operator is adjusted to the second level of precision.
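The propagation rule can be sketched directly from the abstract: whenever every downstream operator of a first-precision operator consumes second-precision values, the operator's output precision is lowered, and the process repeats until a fixed point. The graph encoding and the "high"/"low" labels are illustrative.

```python
def propagate_reduced_precision(precision, consumers):
    """precision: {op: "high" or "low"}.
    consumers: {op: list of downstream ops consuming op's output}.
    Repeatedly lower any high-precision op all of whose downstream
    consumers operate on low-precision inputs."""
    changed = True
    while changed:
        changed = False
        for op, level in precision.items():
            downstream = consumers.get(op, [])
            if (level == "high" and downstream
                    and all(precision[d] == "low" for d in downstream)):
                precision[op] = "low"   # adjust output precision
                changed = True
    return precision
```

Note the fixed-point loop: lowering one operator can make its own producers eligible on the next pass, so reduced precision propagates upstream through the graph.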
Systems and methods for energy proportional scheduling
A compilation system using an energy model based on a set of generic and practical hardware and software parameters is presented. The model can represent the major trends in energy consumption spanning potential hardware configurations using only parameters available at compilation time. Experimental verification indicates that the model is nimble yet sufficiently precise, allowing efficient selection of one or more parameters of a target computing system so as to minimize power/energy consumption of a program while achieving other performance related goals. A voltage and/or frequency optimization and selection is presented which can determine an efficient dynamic hardware configuration schedule at compilation time. In various embodiments, the configuration schedule is chosen based on its predicted effect on energy consumption. A concurrency throttling technique based on the energy model can exploit the power-gating features exposed by the target computing system to increase the energy efficiency of programs.
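The compile-time voltage/frequency selection can be sketched with a toy analytic energy model (the C·V²·f dynamic-power form is a standard approximation; the constants and the deadline constraint are assumptions, not the patented model): among candidate configurations that meet a performance goal, pick the one with the lowest predicted energy.

```python
def predicted_energy(freq_ghz, volt, cycles, static_power=0.5):
    """Toy energy model: (dynamic + static power) * execution time."""
    time_s = cycles / (freq_ghz * 1e9)
    dynamic_power = 2.0 * volt ** 2 * freq_ghz   # ~ C * V^2 * f
    return (dynamic_power + static_power) * time_s

def select_config(configs, cycles, deadline_s):
    """configs: list of (freq_ghz, volt) pairs. Returns the feasible
    configuration with the lowest predicted energy, or None if no
    configuration meets the deadline."""
    feasible = [(f, v) for f, v in configs
                if cycles / (f * 1e9) <= deadline_s]
    return min(feasible, key=lambda fv: predicted_energy(*fv, cycles),
               default=None)
```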
SYSTEMS AND METHODS FOR MINIMIZING COMMUNICATIONS
A system for allocation of one or more data structures used in a program across a number of processing units takes into account the memory access pattern of each data structure, and the amount of total memory available for duplication across the several processing units. Using these parameters, duplication factors are determined for the one or more data structures such that the cost of remote communication is minimized when the data structures are duplicated according to the respective duplication factors while allowing parallel execution of the program.
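The trade-off in the abstract can be sketched with a greedy heuristic (an illustrative stand-in, not the patented formulation): structures with the most remote-read traffic per byte are duplicated on every unit while the memory budget for extra copies lasts, eliminating their remote communication.

```python
def choose_duplication(structures, num_units, memory_budget):
    """structures: {name: (size_bytes, remote_reads)}.
    Returns {name: duplication factor}, where the factor is the number
    of processing units holding a copy (1 = no duplication)."""
    factors = {name: 1 for name in structures}
    # Rank by remote-read traffic per byte: best savings per unit of memory.
    ranked = sorted(structures.items(),
                    key=lambda kv: kv[1][1] / kv[1][0], reverse=True)
    for name, (size, _) in ranked:
        extra = size * (num_units - 1)     # memory for the extra copies
        if extra <= memory_budget:
            factors[name] = num_units      # duplicate on every unit
            memory_budget -= extra
    return factors
```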