IPIQ

G06F8/4432

Reducing minimum operating voltage through heterogeneous codes

10831535 · 2020-11-10 ·

International Business Machines Corporation

Preferred embodiments of systems and methods are disclosed to reduce a minimal working voltage, Vmin, and/or increase the frequency of Vmin while executing multithreaded computer programs with better reliability, efficiency, and performance. A computer compiler compiles multiple copies of high-level code, each with a different set of resource allocators, so that system resources are allocated during simultaneous execution of multiple threads in a way that allows reducing Vmin at a given reference voltage frequency and/or increasing the frequency of Vmin at a given Vmin value.

Sequence optimizations in a high-performance computing environment

10776087 · 2020-09-15 ·

Intel Corporation

Yongzhi Zhang

Embodiments are directed to techniques to determine dataflow graph instructions comprising one or more pick/switch instruction pairs and generate a reverse static single assignment graph based on the dataflow graph instructions, the reverse static single assignment graph comprising strongly connected components, each of the strongly connected components associated with at least one of the one or more pick/switch instruction pairs. Embodiments also include traversing the reverse static single assignment graph depth-first and replacing pick/switch instructions associated with strongly connected components having configuration values with compound instructions.

Dynamic generation of CPU instructions and use of the CPU instructions in generated code for a softcore processor

10768916 · 2020-09-08 ·

Red Hat, Inc.

Ulrich Drepper

In one embodiment, a method may receive, by a compiler of a host computing system, source code for a computer application. The method may also include separating a first portion of the source code and a second portion of the source code that are to be compiled for execution by an accelerator operatively coupled to the host computing system. The method may also include compiling the first portion of the source code to generate hardware description language code. A logic block is to be generated on the accelerator in view of the hardware description language code. The method also includes compiling the second portion of the source code to generate softcore processor code, and adding instructions to the softcore processor code to cause the softcore processor code to interact with the logic block during execution of the softcore processor code and the logic block.

COMPILER-OPTIMIZED CONTEXT SWITCHING

20200264880 · 2020-08-20 ·

Kelvin D. Nilsen

Compiler-optimized context switching may include receiving an instruction indicating a preferred preemption point comprising an instruction address; storing the preferred preemption point in a data structure; determining, based on the data structure, that the preferred preemption point has been reached by a first thread; determining that preemption of the first thread for a second thread has been requested; and performing a context switch to the second thread.
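The steps in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class `PreemptionController` and its method names are hypothetical, the data structure is assumed to be a plain set of instruction addresses, and a real implementation would live in a runtime or kernel rather than in Python.

```python
class PreemptionController:
    """Hypothetical model of preferred preemption points (names are illustrative)."""

    def __init__(self):
        self.preferred_points = set()   # instruction addresses marked by the compiler
        self.preempt_requested = False  # set when a second thread wants to run

    def register_point(self, address):
        # Store a preferred preemption point in the data structure.
        self.preferred_points.add(address)

    def request_preemption(self):
        # Record that preemption of the running thread has been requested.
        self.preempt_requested = True

    def should_switch(self, pc):
        # Switch only when a request is pending AND the running thread
        # has reached one of the preferred preemption points.
        if self.preempt_requested and pc in self.preferred_points:
            self.preempt_requested = False
            return True
        return False
```

In this model a thread that reaches a registered address without a pending request simply keeps running, and a pending request is deferred until the next preferred point is reached.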

SYSTEMS AND METHODS FOR MINIMIZING COMMUNICATIONS

20200249855 · 2020-08-06 ·

A system for allocation of one or more data structures used in a program across a number of processing units takes into account a memory access pattern of the data structure, and the amount of total memory available for duplication across the several processing units. Using these parameters duplication factors are determined for the one or more data structures such that the cost of remote communication is minimized when the data structures are duplicated according to the respective duplication factors while allowing parallel execution of the program.

PROPAGATING REDUCED-PRECISION ON COMPUTATION GRAPHS

20200249924 · 2020-08-06 ·

Yuanzhong Xu

Methods, systems, and apparatus for propagating reduced-precision on computation graphs are described. In one aspect, a method includes receiving data specifying a directed graph that includes operators for a program. The operators include first operators that each represent a numerical operation performed on numerical values having a first level of precision and second operators that each represent a numerical operation performed on numerical values having a second level of precision. One or more downstream operators are identified for a first operator. A determination is made whether each downstream operator represents a numerical operation that is performed on input values having the second level of precision. Whenever each downstream operator represents a numerical operation that is performed on input values having the second level of precision, a precision of numerical values output by the operation represented by the first operator is adjusted to the second level of precision.
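The rule described in this abstract can be sketched as a single pass over a small graph. The `Op` class, its field names, and the precision labels `"f32"` and `"bf16"` are assumptions made for illustration and are not taken from the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Op:
    """Hypothetical operator node; field names are illustrative."""
    name: str
    input_precision: str   # precision the operator expects on its inputs
    output_precision: str  # precision of the values the operator produces
    consumers: List[str] = field(default_factory=list)  # downstream operator names


def propagate_reduced_precision(graph: Dict[str, Op], low: str = "bf16") -> List[str]:
    """Lower an operator's output precision when every downstream
    operator consumes inputs at the lower precision. Single pass only."""
    changed = []
    for op in graph.values():
        if op.output_precision == low or not op.consumers:
            continue  # already low precision, or nothing downstream to check
        if all(graph[c].input_precision == low for c in op.consumers):
            op.output_precision = low
            changed.append(op.name)
    return changed
```

An operator with even one full-precision consumer is left unchanged, which matches the "each downstream operator" condition in the abstract.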

Local optimization of quantum circuits

10706365 · 2020-07-07 ·

International Business Machines Corporation

Paul Nation

Techniques facilitating local optimization of quantum circuits are provided. In one example, a computer-implemented method comprises applying, by a device operatively coupled to a processor, respective weights to matrix elements of a first matrix corresponding to a quantum circuit according to respective numbers of quantum gates between respective pairs of qubits in the quantum circuit; transforming, by the device, the first matrix into a second matrix based on the respective weights of the matrix elements; and permuting, by the device, respective qubits in the quantum circuit according to the second matrix, resulting in a permuted quantum circuit.

Data Polarization

20200210160 · 2020-07-02 ·

Yelizar Aleksandr Dergachev

The Data Polarization process runs on computer systems to make binary data streams more efficient. It does this by polarizing binary segments and adding a signature that indicates how the segments were polarized, for use when unpackaging. Polarizing in Data Polarization means that, within a binary information segment, all of the zeros are turned into ones and all of the ones are turned into zeros. After computations or transmissions involving the data package, the signature allows the information to be correctly interpreted and unpackaged. This helps computer systems use less energy in transmission and computation, because the optimized segments mean that fewer ones, or bursts of energy, are used overall in the system. The process has many uses in a variety of computer systems, including undersea cable relays, quantum computers, or Bitcoin miners.
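A minimal sketch of the idea follows. It assumes that a segment is inverted only when it holds more ones than zeros and that the signature is a single flag per segment. Neither detail is stated in the abstract, and the function names `polarize` and `unpolarize` are hypothetical.

```python
from typing import Tuple

# Translation table that swaps the characters '0' and '1'.
_INVERT = str.maketrans("01", "10")


def polarize(segment: str) -> Tuple[str, bool]:
    """Take a string of '0'/'1' characters and return (segment, flipped).

    Assumption: invert only when ones outnumber zeros, so the stored
    segment never has more ones than zeros.
    """
    ones = segment.count("1")
    flipped = ones > len(segment) - ones
    if flipped:
        segment = segment.translate(_INVERT)
    return segment, flipped


def unpolarize(segment: str, flipped: bool) -> str:
    """Use the signature flag to recover the original segment."""
    return segment.translate(_INVERT) if flipped else segment
```

For example, `polarize("1101")` yields `("0010", True)`, and passing that result to `unpolarize` restores `"1101"`.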

REDUCING MINIMUM OPERATING VOLTAGE THROUGH HETEROGENEOUS CODES

20200210229 · 2020-07-02 ·

POWER OPTIMIZATION IN AN ARTIFICIAL INTELLIGENCE PROCESSOR

20200183476 · 2020-06-11 ·

Sushma Honnavara-Prasad

In one embodiment, the present disclosure includes a method of reducing power in an artificial intelligence processor. For each cycle, over a plurality of cycles, an AI model is translated into operations executable on an artificial intelligence processor. The translating is based on power parameters that correspond to power consumption and performance of the artificial intelligence processor. The AI processor is configured with the executable operations, and input activation data sets are processed. Accordingly, result sets, power consumption data, and performance data are generated and stored over the plurality of cycles. The method further includes training an AI algorithm using the stored parameters, the power consumption data, and the performance data. A trained AI algorithm outputs a plurality of optimized parameters to reduce power consumption of the AI processor. The AI model is then translated into optimized executable operations based on the plurality of optimized parameters.
