G06F8/4432

Reducing minimum operating voltage through heterogeneous codes

Preferred embodiments of systems and methods are disclosed to reduce a minimal working voltage, Vmin, and/or increase the frequency of Vmin while executing multithreaded computer programs with better reliability, efficiency, and performance. A computer compiler compiles multiple copies of high-level code, each with a different set of resource allocators, so that system resources are allocated during simultaneous execution of multiple threads in a way that allows reducing Vmin at a given reference voltage frequency and/or increasing the frequency of Vmin at a given Vmin value.

Sequence optimizations in a high-performance computing environment
10776087 · 2020-09-15

Embodiments are directed to techniques to determine dataflow graph instructions comprising one or more pick/switch instruction pairs and generate a reverse static single assignment graph based on the dataflow graph instructions, the reverse static single assignment graph comprising strongly connected components, each of the strongly connected components associated with at least one of the one or more pick/switch instruction pairs. Embodiments also include traversing the reverse static single assignment graph depth-first, and replacing pick/switch instructions associated with strongly connected components having configuration values with compound instructions.

Dynamic generation of CPU instructions and use of the CPU instructions in generated code for a softcore processor
10768916 · 2020-09-08

In one embodiment, a method may include receiving, by a compiler of a host computing system, source code for a computer application. The method may also include separating a first portion of the source code and a second portion of the source code that are to be compiled for execution by an accelerator operatively coupled to the host computing system. The method may also include compiling the first portion of the source code to generate hardware description language code. A logic block is to be generated on the accelerator in view of the hardware description language code. The method also includes compiling the second portion of the source code to generate softcore processor code, and adding instructions to the softcore processor code to cause the softcore processor code to interact with the logic block during execution of the softcore processor code and the logic block.

COMPILER-OPTIMIZED CONTEXT SWITCHING
20200264880 · 2020-08-20

Compiler-optimized context switching may include receiving an instruction indicating a preferred preemption point comprising an instruction address; storing the preferred preemption point in a data structure; determining, based on the data structure, that the preferred preemption point has been reached by a first thread; determining that preemption of the first thread for a second thread has been requested; and performing a context switch to the second thread.
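The steps in this abstract can be illustrated with a minimal Python sketch. It is a toy model, not the claimed implementation: the class name `PreemptionScheduler`, its method names, the thread labels, and the use of a set as the data structure are all assumptions made for illustration.

```python
class PreemptionScheduler:
    """Toy model: a thread is switched out only at a compiler-marked address.

    All names here are hypothetical; the abstract does not specify them.
    """

    def __init__(self):
        # Data structure holding the preferred preemption points (assumed: a set).
        self.preferred_points = set()
        self.preempt_requested = False
        self.current = "thread-1"
        self.waiting = "thread-2"

    def register_preemption_point(self, address):
        # Models receiving the instruction that names a preferred preemption point.
        self.preferred_points.add(address)

    def request_preemption(self):
        # Models a request to preempt the first thread for the second thread.
        self.preempt_requested = True

    def on_instruction(self, address):
        """Called when the current thread reaches `address`.

        Returns True if a context switch was performed.
        """
        if address in self.preferred_points and self.preempt_requested:
            self.current, self.waiting = self.waiting, self.current
            self.preempt_requested = False
            return True
        return False


scheduler = PreemptionScheduler()
scheduler.register_preemption_point(0x40)
scheduler.request_preemption()
scheduler.on_instruction(0x10)  # not a preferred point, no switch
scheduler.on_instruction(0x40)  # preferred point reached, switch performed
```

In this sketch a pending preemption request is deferred until the running thread reaches a registered address, which is the behaviour the abstract describes.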

SYSTEMS AND METHODS FOR MINIMIZING COMMUNICATIONS

A system for allocation of one or more data structures used in a program across a number of processing units takes into account a memory access pattern of the data structure, and the amount of total memory available for duplication across the several processing units. Using these parameters duplication factors are determined for the one or more data structures such that the cost of remote communication is minimized when the data structures are duplicated according to the respective duplication factors while allowing parallel execution of the program.

PROPAGATING REDUCED-PRECISION ON COMPUTATION GRAPHS
20200249924 · 2020-08-06

Methods, systems, and apparatus for propagating reduced-precision on computation graphs are described. In one aspect, a method includes receiving data specifying a directed graph that includes operators for a program. The operators include first operators that each represent a numerical operation performed on numerical values having a first level of precision and second operators that each represent a numerical operation performed on numerical values having a second level of precision. One or more downstream operators are identified for a first operator. A determination is made whether each downstream operator represents a numerical operation that is performed on input values having the second level of precision. Whenever each downstream operator represents a numerical operation that is performed on input values having the second level of precision, a precision of numerical values output by the operation represented by the first operator is adjusted to the second level of precision.
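The propagation rule described above can be sketched in a few lines of Python. This is an illustrative single pass under stated assumptions: the graph is given as plain dictionaries, the function name `propagate_reduced_precision` and the precision labels `"fp32"` and `"bf16"` are hypothetical, and operators with no downstream consumers are left unchanged (a choice the abstract does not address).

```python
def propagate_reduced_precision(consumers, input_precision, output_precision, low="bf16"):
    """Lower an operator's output precision when every consumer reads `low` inputs.

    consumers:        operator -> list of downstream operators
    input_precision:  operator -> precision of the values it consumes
    output_precision: operator -> precision of the values it produces
    Returns a new output-precision mapping; the inputs are not modified.
    """
    result = dict(output_precision)
    for op, downstream in consumers.items():
        # Assumption: an operator with no consumers keeps its precision.
        if downstream and all(input_precision[d] == low for d in downstream):
            result[op] = low
    return result


# Hypothetical graph: `matmul` feeds only bf16 consumers, `add` also feeds an fp32 one.
consumers = {
    "matmul": ["cast_a", "cast_b"],
    "add": ["cast_a", "log"],
    "cast_a": [],
    "cast_b": [],
    "log": [],
}
input_precision = {
    "matmul": "fp32", "add": "fp32",
    "cast_a": "bf16", "cast_b": "bf16", "log": "fp32",
}
output_precision = {
    "matmul": "fp32", "add": "fp32",
    "cast_a": "bf16", "cast_b": "bf16", "log": "fp32",
}

adjusted = propagate_reduced_precision(consumers, input_precision, output_precision)
```

Here `matmul` is lowered to the second precision level because all of its downstream operators consume that precision, while `add` is left alone because one of its consumers still expects the first level.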

Local optimization of quantum circuits

Techniques facilitating local optimization of quantum circuits are provided. In one example, a computer-implemented method comprises applying, by a device operatively coupled to a processor, respective weights to matrix elements of a first matrix corresponding to a quantum circuit according to respective numbers of quantum gates between respective pairs of qubits in the quantum circuit; transforming, by the device, the first matrix into a second matrix based on the respective weights of the matrix elements; and permuting, by the device, respective qubits in the quantum circuit according to the second matrix, resulting in a permuted quantum circuit.

Data Polarization
20200210160 · 2020-07-02

The Data Polarization process runs on computer systems to make binary data streams more efficient. It does this by polarizing binary segments and adding a signature that indicates how each segment was polarized, so that it can be unpackaged. Polarizing here means that, within a binary segment, all of the zeros are turned into ones and all of the ones into zeros. After computation on or transmission of the data package, the signature allows the information to be correctly interpreted and unpackaged. This helps computer systems use less energy in transmission and computation, because the optimized segments contain fewer ones, or bursts of energy, overall. The process has uses in a variety of computer systems, including undersea cable relays, quantum computers, and Bitcoin miners.
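A minimal Python sketch of the idea follows. The abstract does not state when a segment should be inverted, so the rule used here (invert only when ones outnumber zeros) is an assumption inferred from the stated goal of using fewer ones. The function names `polarize` and `unpolarize` and the one-character signature are likewise hypothetical.

```python
def _invert(bits):
    # Turn every '0' into '1' and every '1' into '0'.
    return "".join("1" if b == "0" else "0" for b in bits)


def polarize(segment):
    """Return (signature, payload) for a string of '0'/'1' characters.

    Assumed rule: invert the segment only when it holds more ones than zeros.
    The signature is '1' if the segment was inverted, '0' otherwise.
    """
    ones = segment.count("1")
    zeros = len(segment) - ones
    if ones > zeros:
        return "1", _invert(segment)
    return "0", segment


def unpolarize(signature, payload):
    """Recover the original segment using the signature."""
    return _invert(payload) if signature == "1" else payload


signature, payload = polarize("11101101")   # six ones, two zeros, so it is inverted
restored = unpolarize(signature, payload)   # round-trips to the original segment
```

The signature is what makes the transformation reversible: the receiver inverts the payload again only when the signature says the sender did so.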

REDUCING MINIMUM OPERATING VOLTAGE THROUGH HETEROGENEOUS CODES

Preferred embodiments of systems and methods are disclosed to reduce a minimal working voltage, Vmin, and/or increase the frequency of Vmin while executing multithreaded computer programs with better reliability, efficiency, and performance. A computer compiler compiles multiple copies of high-level code, each with a different set of resource allocators, so that system resources are allocated during simultaneous execution of multiple threads in a way that allows reducing Vmin at a given reference voltage frequency and/or increasing the frequency of Vmin at a given Vmin value.

POWER OPTIMIZATION IN AN ARTIFICIAL INTELLIGENCE PROCESSOR
20200183476 · 2020-06-11

In one embodiment, the present disclosure includes a method of reducing power in an artificial intelligence processor. For each cycle, over a plurality of cycles, an AI model is translated into operations executable on an artificial intelligence processor. The translating is based on power parameters that correspond to power consumption and performance of the artificial intelligence processor. The AI processor is configured with the executable operations, and input activation data sets are processed. Accordingly, result sets, power consumption data, and performance data are generated and stored over the plurality of cycles. The method further includes training an AI algorithm using the stored parameters, the power consumption data, and the performance data. A trained AI algorithm outputs a plurality of optimized parameters to reduce power consumption of the AI processor. The AI model is then translated into optimized executable operations based on the plurality of optimized parameters.