Patent classifications
G06F8/4432
DYNAMIC GENERATION OF CPU INSTRUCTIONS AND USE OF THE CPU INSTRUCTIONS IN GENERATED CODE FOR A SOFTCORE PROCESSOR
In one embodiment, a method may include receiving, by a compiler of a host computing system, source code for a computer application. The method may also include separating a first portion of the source code and a second portion of the source code that are to be compiled for execution by an accelerator operatively coupled to the host computing system. The method may also include compiling the first portion of the source code to generate hardware description language code. A logic block is to be generated on the accelerator in view of the hardware description language code. The method also includes compiling the second portion of the source code to generate softcore processor code, and adding instructions to the softcore processor code to cause the softcore processor code to interact with the logic block during execution of the softcore processor code and the logic block.
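The partitioning step described above can be sketched as follows. This is a minimal illustration, not the patented compiler: the tag-based split, the `WAIT_READY` instruction mnemonic, and the function names are all assumptions invented for the example.

```python
def partition(functions):
    """Split (name, tags) pairs into an HDL portion and a softcore portion."""
    hdl, softcore = [], []
    for name, tags in functions:
        # functions tagged "hdl" become hardware description language code;
        # the rest become softcore processor code (hypothetical convention)
        (hdl if "hdl" in tags else softcore).append(name)
    return hdl, softcore

def add_interaction_stubs(softcore, logic_blocks):
    """Prepend instructions that let the softcore code interact with
    each generated logic block (illustrative mnemonic)."""
    return [f"WAIT_READY {block}" for block in logic_blocks] + softcore

hdl, soft = partition([("fft", {"hdl"}), ("main", set()), ("dma", {"hdl"})])
program = add_interaction_stubs(soft, hdl)
# program == ["WAIT_READY fft", "WAIT_READY dma", "main"]
```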
LOCAL OPTIMIZATION OF QUANTUM CIRCUITS
Techniques facilitating local optimization of quantum circuits are provided. In one example, a computer-implemented method comprises applying, by a device operatively coupled to a processor, respective weights to matrix elements of a first matrix corresponding to a quantum circuit according to respective numbers of quantum gates between respective pairs of qubits in the quantum circuit; transforming, by the device, the first matrix into a second matrix based on the respective weights of the matrix elements; and permuting, by the device, respective qubits in the quantum circuit according to the second matrix, resulting in a permuted quantum circuit.
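The weighting-and-permutation idea can be illustrated in a few lines: count two-qubit gates per qubit pair, build a coupling matrix weighted by those counts, and reorder qubits accordingly. The greedy ordering by total interaction weight is a stand-in assumption, not the transformation claimed in the patent.

```python
def weight_matrix(n_qubits, gates):
    """gates: iterable of (q1, q2) pairs, one per two-qubit gate.
    Returns a symmetric matrix of gate counts between qubit pairs."""
    m = [[0] * n_qubits for _ in range(n_qubits)]
    for a, b in gates:
        m[a][b] += 1
        m[b][a] += 1
    return m

def permute_by_weight(m):
    """Order qubits by descending total interaction weight
    (illustrative heuristic)."""
    totals = [(sum(row), i) for i, row in enumerate(m)]
    return [i for _, i in sorted(totals, reverse=True)]

m = weight_matrix(3, [(0, 2), (0, 2), (1, 2)])
perm = permute_by_weight(m)   # qubit 2 interacts most, so it comes first
# perm == [2, 0, 1]
```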
MANAGING CONTACT STATUS UPDATES IN A PRESENCE MANAGEMENT SYSTEM
A system is configured to perform operations to receive, via a network communication interface, an indication of a power event occurring at a first device. The first device is for an online identity. The power event causes the first device to switch from an external power source to an internal battery. The first device represents that the online identity is online while the first device receives power from the internal battery. The system is further configured to perform operations to hold, at a second device, at least one status update for an online contact of the online identity while the first device receives power from the internal battery. Furthermore, the system is configured to perform operations to release, for transmission to the first device, the at least one status update in response to determining that the first device switches back to the external power source.
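The hold-and-release behavior can be sketched as a small state machine at the second device: updates are queued while the first device reports running on battery and flushed when it reports external power again. The class and method names are assumptions for illustration.

```python
class StatusRelay:
    """Second-device logic: hold contact status updates while the
    first device is on internal battery, release them on restore."""

    def __init__(self):
        self.on_battery = False
        self.held = []

    def power_event(self, on_battery):
        """Returns any updates released for transmission."""
        self.on_battery = on_battery
        if not on_battery:
            released, self.held = self.held, []
            return released        # flush held updates to the first device
        return []

    def status_update(self, update):
        """Returns updates to deliver immediately (may be none)."""
        if self.on_battery:
            self.held.append(update)   # hold while on battery
            return []
        return [update]

relay = StatusRelay()
relay.power_event(True)                # first device switches to battery
relay.status_update("alice: away")     # held, not delivered
released = relay.power_event(False)    # back on external power
# released == ["alice: away"]
```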
Game Rendering Method, Terminal Device, and Non-Transitory Computer-Readable Storage Medium
The present disclosure discloses a game rendering method and a terminal device. The terminal device includes a JS layer, a bridge layer, and a system framework layer. The method includes the following. The JS layer transmits drawing instructions cached in an instruction set to the bridge layer when the number of drawing instructions cached in the instruction set is greater than or equal to a first threshold. The bridge layer obtains a rendering result by using an OpenGL capability to process the drawing instructions, and transmits the rendering result to the system framework layer. The system framework layer performs rendering based on the rendering result.
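The threshold-triggered flush from the JS layer to the bridge layer can be sketched as below. The OpenGL processing step is stubbed out, and the threshold value is an illustrative assumption; the abstract leaves it unspecified.

```python
FIRST_THRESHOLD = 3  # illustrative value for the "first threshold"

class BridgeLayer:
    def __init__(self):
        self.rendered = []

    def process(self, batch):
        # stand-in for processing the batch via the OpenGL capability
        # and forwarding the rendering result to the system framework layer
        self.rendered.append(batch)

class JsLayer:
    def __init__(self, bridge):
        self.cache = []       # the instruction set caching drawing instructions
        self.bridge = bridge

    def draw(self, instruction):
        self.cache.append(instruction)
        # transmit the batch once the cache reaches the threshold
        if len(self.cache) >= FIRST_THRESHOLD:
            batch, self.cache = self.cache, []
            self.bridge.process(batch)

bridge = BridgeLayer()
js = JsLayer(bridge)
for op in ["rect", "line", "text"]:
    js.draw(op)
# bridge.rendered == [["rect", "line", "text"]] and js.cache is empty again
```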
Computer system and method for multi-processor communication
A compiler system, computer-implemented method, and computer program product for optimizing a program for multi-processor system execution. The compiler includes an interface component configured to load, from a storage component, program code to be executed by one or more processors (P1 to Pn) of a multi-processor system. The compiler further includes a static analysis component configured to determine data dependencies within the program code and to determine all basic blocks of the control flow graph that provide potential insertion positions along paths where communication statements can be inserted to enable data flow between different processors at runtime. An evaluation function component of the compiler is configured to evaluate each potential insertion position with regards to its impact on program execution on the multi-processor system at runtime by using a predefined execution evaluation function.
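Candidate insertion positions can be enumerated as the basic blocks lying on some path from the block defining a value to the block using it. The sketch below does exactly that on a toy control flow graph; the evaluation function (loop depth as a runtime-cost proxy) is an illustrative assumption, not the patent's predefined function.

```python
def blocks_on_paths(cfg, src, dst):
    """cfg maps each basic block to its successor list. Returns every
    block that lies on some path from src to dst: blocks reachable
    from src that can also reach dst."""
    def reach(start, edges):
        seen, stack = set(), [start]
        while stack:
            b = stack.pop()
            if b not in seen:
                seen.add(b)
                stack.extend(edges.get(b, []))
        return seen

    rev = {}  # reversed edges, to compute "can reach dst"
    for b, succs in cfg.items():
        for s in succs:
            rev.setdefault(s, []).append(b)
    return reach(src, cfg) & reach(dst, rev)

def best_insertion(candidates, loop_depth):
    """Toy evaluation function: prefer the shallowest loop nesting,
    breaking ties alphabetically for determinism."""
    return min(sorted(candidates), key=lambda b: loop_depth.get(b, 0))

cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
cands = blocks_on_paths(cfg, "A", "D")   # all four blocks qualify here
pick = best_insertion(cands, {"A": 0, "B": 1, "C": 1, "D": 0})
# pick == "A"
```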
Systems and methods for energy proportional scheduling
A compilation system using an energy model based on a set of generic and practical hardware and software parameters is presented. The model can represent the major trends in energy consumption spanning potential hardware configurations using only parameters available at compilation time. Experimental verification indicates that the model is nimble yet sufficiently precise, allowing efficient selection of one or more parameters of a target computing system so as to minimize power/energy consumption of a program while achieving other performance-related goals. A voltage and/or frequency optimization and selection technique is presented which can determine an efficient dynamic hardware configuration schedule at compilation time. In various embodiments, the configuration schedule is chosen based on its predicted effect on energy consumption. A concurrency throttling technique based on the energy model can exploit the power-gating features exposed by the target computing system to increase the energy efficiency of programs.
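A toy instance of compile-time frequency selection shows the shape of such an energy model: predicted energy is E(f) = P(f) · T(f), with dynamic power growing roughly cubically in frequency (P ≈ C·V²·f with V scaling with f) and runtime split into a compute part that scales with 1/f and a memory-bound part that does not. All constants below are illustrative assumptions, not values from the patent.

```python
def energy(f, compute_cycles=1e9, mem_seconds=0.5,
           static_watts=3.0, k_dynamic=5e-28):
    """Predicted energy (joules) for running the program at frequency f (Hz)."""
    runtime = compute_cycles / f + mem_seconds   # only the compute part scales
    power = static_watts + k_dynamic * f ** 3    # dynamic power ~ C * V^2 * f, V ~ f
    return power * runtime

def pick_frequency(frequencies):
    """Select the configuration the model predicts is most energy efficient."""
    return min(frequencies, key=energy)

best = pick_frequency([8e8, 1.2e9, 1.6e9, 2.0e9, 2.4e9])
# an interior frequency wins: slower runs pay static power longer,
# faster runs pay the cubic dynamic-power term
```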
Compiler for performing zero-channel removal
Some embodiments provide a compiler for optimizing the implementation of a machine-trained network (e.g., a neural network) on an integrated circuit (IC). The compiler of some embodiments receives a specification of a machine-trained network including multiple layers of computation nodes and generates a graph representing options for implementing the machine-trained network in the IC. The compiler, as part of generating the graph, in some embodiments, determines whether any set of channels contains no non-zero values (i.e., contains only zero values). For sets of channels that include no non-zero values, some embodiments perform a zero channel removal operation to remove all-zero channels wherever possible. In some embodiments, zero channel removal operations include removing input channels, removing output channels, forward propagation, and backward propagation of channels and constants.
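A simplified illustration of zero-channel removal with forward propagation, using dense-layer weights stored as nested lists (`weights[out][in]`): output channels containing only zeros are dropped, and the removal propagates forward by deleting the matching input channels of the next layer. This sketch covers only one of the operations listed in the abstract.

```python
def zero_output_channels(weights):
    """Indices of output channels whose weights are all zero."""
    return [i for i, row in enumerate(weights) if all(w == 0 for w in row)]

def remove_channels(weights, next_weights):
    """Drop all-zero output channels and forward-propagate the removal
    to the corresponding input channels of the next layer."""
    dead = set(zero_output_channels(weights))
    pruned = [row for i, row in enumerate(weights) if i not in dead]
    pruned_next = [[w for i, w in enumerate(row) if i not in dead]
                   for row in next_weights]
    return pruned, pruned_next

layer1 = [[1, 2], [0, 0], [3, 4]]    # output channel 1 is all zeros
layer2 = [[5, 6, 7], [8, 9, 10]]     # consumes layer1's three channels
l1, l2 = remove_channels(layer1, layer2)
# l1 == [[1, 2], [3, 4]]; l2 == [[5, 7], [8, 10]]
```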
Systems and methods for minimizing communications
A system for allocation of one or more data structures used in a program across a number of processing units takes into account a memory access pattern of the data structure and the amount of total memory available for duplication across the several processing units. Using these parameters, duplication factors are determined for the one or more data structures such that the cost of remote communication is minimized when the data structures are duplicated according to the respective duplication factors, while allowing parallel execution of the program.
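A greedy sketch of the trade-off: each data structure has a size and a remote-communication cost that duplication would avoid, and structures are duplicated in order of savings per byte until the memory budget runs out. The patent determines per-structure duplication factors; the all-or-nothing duplication here is a simplifying assumption.

```python
def choose_duplicated(structures, budget, n_units):
    """structures: list of (name, size, remote_cost) tuples.
    Returns the names to duplicate on all n_units processing units,
    chosen greedily by avoided remote cost per byte."""
    ranked = sorted(structures, key=lambda s: s[2] / s[1], reverse=True)
    chosen, used = [], 0
    for name, size, _cost in ranked:
        extra = size * (n_units - 1)   # memory for copies beyond the original
        if used + extra <= budget:
            chosen.append(name)
            used += extra
    return chosen

picked = choose_duplicated(
    [("lookup", 10, 100),    # small, hot: 10 units of savings per byte
     ("matrix", 80, 120),    # large, warm
     ("log", 50, 5)],        # rarely accessed remotely
    budget=120, n_units=2)
# picked == ["lookup", "matrix"]; "log" does not fit and is not worth it
```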
Power optimization in an artificial intelligence processor
In one embodiment, the present disclosure includes a method of reducing power in an artificial intelligence processor. For each cycle, over a plurality of cycles, an AI model is translated into operations executable on an artificial intelligence processor. The translating is based on power parameters that correspond to power consumption and performance of the artificial intelligence processor. The AI processor is configured with the executable operations, and input activation data sets are processed. Accordingly, result sets, power consumption data, and performance data are generated and stored over the plurality of cycles. The method further includes training an AI algorithm using the stored power parameters, the power consumption data, and the performance data. The trained AI algorithm outputs a plurality of optimized parameters to reduce power consumption of the AI processor. The AI model is then translated into optimized executable operations based on the plurality of optimized parameters.
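The measure-train-optimize loop can be reduced to a toy version: over several cycles, candidate power parameters are applied and measured, and the observations are used to pick parameters that minimize power while keeping performance above a floor. The measurement models and the exhaustive "training" step are deliberately simplistic stand-ins for the patent's AI algorithm.

```python
def run_cycle(params):
    """Stand-in for configuring the AI processor with executable
    operations and measuring one cycle (assumed analytic models)."""
    clock = params["clock_ghz"]
    power = 2.0 * clock ** 2       # assumed power model
    perf = 100.0 * clock           # assumed throughput model
    return power, perf

def optimize(candidates, min_performance):
    """Measure each candidate over the cycles, then return the
    lowest-power parameters that still meet the performance floor."""
    history = [(p, *run_cycle(p)) for p in candidates]
    feasible = [(power, params) for params, power, perf in history
                if perf >= min_performance]
    return min(feasible, key=lambda t: t[0])[1]

best = optimize([{"clock_ghz": c} for c in (0.8, 1.2, 1.6, 2.0)],
                min_performance=100.0)
# best == {"clock_ghz": 1.2}: 0.8 GHz misses the floor, higher clocks cost more
```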