G06F8/453

Hardware acceleration method, compiler, and device

A hardware acceleration method, a compiler, and a device are provided to improve code execution efficiency and implement hardware acceleration. The method includes: obtaining, by a compiler, compilation policy information and source code, where the compilation policy information indicates that a first code type matches a first processor and a second code type matches a second processor; analyzing, by the compiler, code segments in the source code according to the compilation policy information to determine a first code segment belonging to the first code type or a second code segment belonging to the second code type; and compiling, by the compiler, the first code segment into first executable code and sending the first executable code to the first processor, and compiling the second code segment into second executable code and sending the second executable code to the second processor.
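A minimal sketch of the policy-driven dispatch step, assuming the compilation policy is a plain mapping from code type to processor name and that each code segment already carries a code-type label. The names CodeSegment, classify, compile_segment, and compile_and_dispatch, and the labels "scalar", "vector", "CPU", and "GPU", are hypothetical; compile_segment is a placeholder that stands in for a real backend.

```python
from dataclasses import dataclass


@dataclass
class CodeSegment:
    name: str
    code_type: str  # hypothetical label, e.g. "scalar" or "vector"
    body: str


def classify(segments, policy):
    """Group segments by the processor that the policy matches to their code type."""
    routed = {proc: [] for proc in set(policy.values())}
    for seg in segments:
        proc = policy.get(seg.code_type)
        if proc is None:
            raise ValueError(f"no processor matches code type {seg.code_type!r}")
        routed[proc].append(seg)
    return routed


def compile_segment(seg, proc):
    # Placeholder backend: tag the segment with its target instead of emitting machine code.
    return f"{proc}:{seg.name}"


def compile_and_dispatch(segments, policy):
    """Compile each segment for its matched processor; return per-processor executables."""
    routed = classify(segments, policy)
    return {proc: [compile_segment(s, proc) for s in segs] for proc, segs in routed.items()}
```

With policy = {"scalar": "CPU", "vector": "GPU"}, a "scalar" segment named init and a "vector" segment named matmul are routed to separate processors, giving {"CPU": ["CPU:init"], "GPU": ["GPU:matmul"]}.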

Systems and methods for energy proportional scheduling

A compilation system using an energy model based on a set of generic and practical hardware and software parameters is presented. The model can represent the major trends in energy consumption across potential hardware configurations using only parameters available at compilation time. Experimental verification indicates that the model is lightweight yet sufficiently precise, allowing efficient selection of one or more parameters of a target computing system so as to minimize the power/energy consumption of a program while achieving other performance-related goals. A voltage and/or frequency selection technique is presented that can determine an efficient dynamic hardware configuration schedule at compilation time. In various embodiments, the configuration schedule is chosen based on its predicted effect on energy consumption. A concurrency throttling technique based on the energy model can exploit the power-gating features exposed by the target computing system to increase the energy efficiency of programs.
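To make compile-time voltage/frequency selection concrete, here is a minimal sketch. It does not use the patented energy model. It substitutes a textbook approximation: runtime is a compute-bound part that scales with 1/f plus a memory-bound part assumed to be frequency-independent, and power is CMOS dynamic power C·V²·f plus a constant static term. The operating point with the lowest predicted energy that still meets a deadline is chosen. All names and parameters (OperatingPoint, c_eff, p_static, deadline) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_hz: float
    volt: float


def predict_time(op, compute_cycles, memory_seconds):
    # Compute-bound portion scales with 1/f; memory-bound portion is assumed frequency-independent.
    return compute_cycles / op.freq_hz + memory_seconds


def predict_power(op, c_eff, p_static):
    # Textbook CMOS dynamic power C * V^2 * f plus a constant static term.
    return c_eff * op.volt ** 2 * op.freq_hz + p_static


def select_operating_point(points, compute_cycles, memory_seconds, c_eff, p_static, deadline):
    """Return (point, predicted_energy) with minimum energy meeting the deadline, or None."""
    best = None
    for op in points:
        t = predict_time(op, compute_cycles, memory_seconds)
        if t > deadline:
            continue  # this configuration would miss the performance goal
        energy = predict_power(op, c_eff, p_static) * t
        if best is None or energy < best[1]:
            best = (op, energy)
    return best
```

Under these assumptions, a partly memory-bound program often favors a lower frequency: the memory time does not shrink at higher clocks, while dynamic power grows with V²·f.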

SHARED LOCAL MEMORY TILING MECHANISM

An apparatus to facilitate memory tiling is disclosed. The apparatus includes a memory, one or more execution units (EUs) to execute a plurality of processing threads via access to the memory and tiling logic to apply a tiling pattern to memory addresses for data stored in the memory.
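The abstract does not say which tiling pattern is used. As one hedged illustration of "applying a tiling pattern to memory addresses", the sketch below remaps a (row, col) element of a row-major matrix to an offset in a tiled layout. In that layout each tile_h × tile_w tile is stored contiguously and tiles are ordered row-major. The function name tiled_offset and its parameters are hypothetical.

```python
def tiled_offset(row, col, width, tile_h, tile_w):
    """Map (row, col) of a row-major matrix to its element offset in a tiled layout.

    Each tile_h x tile_w tile is stored contiguously and tiles are ordered row-major.
    Assumes width is a multiple of tile_w.
    """
    if width % tile_w:
        raise ValueError("width must be a multiple of tile_w")
    tiles_per_row = width // tile_w
    tile_row, in_row = divmod(row, tile_h)
    tile_col, in_col = divmod(col, tile_w)
    tile_index = tile_row * tiles_per_row + tile_col
    return tile_index * tile_h * tile_w + in_row * tile_w + in_col
```

For a 4-wide matrix with 2×2 tiles, elements (0, 0), (0, 1), (1, 0), and (1, 1) map to offsets 0 to 3, so a 2×2 block accessed by neighboring threads occupies one contiguous chunk of memory.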

Vehicle master device, rewrite target group administration method, computer program product and data structure of specification data

A vehicle master device includes a rewrite specification data acquisition unit configured to acquire rewrite specification data from outside; a rewrite specification data analysis unit configured to analyze the rewrite specification data acquired by the rewrite specification data acquisition unit; a group generation unit configured to divide a plurality of rewrite target ECUs into a plurality of groups based on the rewrite specification data analyzed by the rewrite specification data analysis unit; and an instruction execution unit configured to instruct the rewrite target ECUs, group by group for the groups generated by the group generation unit, to perform at least one of installation, rollback, and activation.
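A minimal sketch of the group generation and per-group instruction steps. It assumes a hypothetical specification format in which every ECU entry names its group. It also assumes one possible per-group policy, which is not taken from the abstract: install on all ECUs of a group, then activate the group only if every install succeeded, otherwise roll the whole group back. The names generate_groups, run_group, and send are hypothetical. send stands in for the in-vehicle messaging layer and returns True on success.

```python
from collections import defaultdict
from typing import Callable


def generate_groups(spec: dict) -> dict:
    """Divide rewrite target ECUs into groups using the (hypothetical) 'group' field."""
    groups = defaultdict(list)
    for entry in spec["ecus"]:
        groups[entry["group"]].append(entry["ecu_id"])
    return dict(groups)


def run_group(ecus: list, send: Callable[[str, str], bool]) -> str:
    """Install on every ECU in a group; activate if all succeeded, otherwise roll back."""
    results = [send(ecu, "install") for ecu in ecus]  # list, so every ECU is attempted
    action = "activate" if all(results) else "rollback"
    for ecu in ecus:
        send(ecu, action)
    return action


def instruct_all(groups: dict, send: Callable[[str, str], bool]) -> dict:
    """Process groups one at a time and report the final action taken for each."""
    return {gid: run_group(groups[gid], send) for gid in sorted(groups)}
```

Treating a group as the unit of activation or rollback keeps ECUs that must agree on a software version from ending up with mismatched versions.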

Systems and methods for minimizing communications

A system for allocating one or more data structures used in a program across a number of processing units takes into account the memory access pattern of each data structure and the total amount of memory available for duplication across the processing units. Using these parameters, duplication factors are determined for the data structures such that the cost of remote communication is minimized when the data structures are duplicated according to their respective duplication factors, while still allowing parallel execution of the program.
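A hedged sketch of choosing duplication factors under a memory budget. The cost model is an illustrative assumption, not the one in the patent. Reads are spread uniformly over all units, so a structure with d copies on P units sees a fraction (P − d)/P of its reads go remote, and every write must update the d − 1 extra copies. Under this model each added copy saves a constant reads/P − writes. The extra copies are then allocated greedily by saving per byte. This greedy rule is a heuristic and is not guaranteed to be optimal for integer budgets. The names DataStructure, remote_cost, and duplication_factors are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataStructure:
    name: str
    size: int    # bytes per copy
    reads: int   # total reads, assumed spread uniformly over all units
    writes: int  # total writes; each must update every extra copy


def remote_cost(ds, copies, units):
    # Reads from units without a local copy are remote; writes must reach the other copies.
    return ds.reads * (units - copies) / units + ds.writes * (copies - 1)


def duplication_factors(structures, units, budget):
    """Greedily spend the extra-memory budget on the copies that save the most per byte."""
    factors = {ds.name: 1 for ds in structures}  # every structure keeps one home copy
    gain = {ds.name: ds.reads / units - ds.writes for ds in structures}  # saving per extra copy
    for ds in sorted(structures, key=lambda d: gain[d.name] / d.size, reverse=True):
        if gain[ds.name] <= 0:
            break  # remaining structures would cost more to keep coherent than they save
        extra = min(units - 1, budget // ds.size)
        factors[ds.name] += extra
        budget -= extra * ds.size
    return factors
```

A read-heavy structure is replicated first, up to one copy per unit. A write-heavy structure stays at a factor of 1, because each extra copy adds more write traffic than it removes in remote reads.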

Accelerating application modernization

Various embodiments of the present technology generally relate to the characterization and improvement of software applications. More specifically, some embodiments relate to systems and methods for modeling code behavior and generating new versions of the code based on the code behavior models. In some embodiments, a method of improving a codebase includes recording a run of the existing code, characterizing the code behavior via one or more models, prototyping new code according to a target language and target environment, deploying the new code to the target environment, and comparing the behavior of the new code to the behavior of the existing code. In some implementations, generating new code based on the behavior models includes using one or more machine learning techniques for code generation based on the target language and environment.
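The final step, comparing the behavior of the new code with that of the existing code, can be sketched by recording input/output pairs from a run of the existing code and replaying the same inputs against the new code. This is a deliberate simplification: a recorded trace of pure-function calls stands in for the richer behavior models the abstract describes. The names record_run and compare_behavior are hypothetical.

```python
def record_run(func, inputs):
    """Record (input, output) pairs from a run of the existing code."""
    return [(x, func(x)) for x in inputs]


def compare_behavior(trace, new_func):
    """Replay recorded inputs through the new code and collect any mismatches."""
    mismatches = []
    for x, expected in trace:
        actual = new_func(x)
        if actual != expected:
            mismatches.append((x, expected, actual))
    return mismatches
```

For example, a trace recorded from a loop-based sum 0 + 1 + ... + n yields no mismatches against the closed form n * (n + 1) // 2. An incorrect rewrite such as n * n // 2 is flagged with the first differing input, n = 1.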

METHODS AND APPARATUS TO CONFIGURE HETEROGENOUS COMPONENTS IN AN ACCELERATOR

Methods, apparatus, systems, and articles of manufacture are disclosed to configure heterogenous components in an accelerator. An example apparatus includes a graph compiler to identify a workload node in a workload and to generate a selector for the workload node. The selector is to identify an input condition and an output condition of a compute building block, and the graph compiler is to map the workload node to the compute building block in response to obtaining the identified input condition and output condition from the selector.
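A minimal sketch of the selector-based mapping, assuming that "input condition" and "output condition" can be modeled as simple data-format strings. The graph compiler maps a workload node to the first compute building block whose conditions match what the node requires. The classes, the format strings, and the block names conv_engine and dsp are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeBuildingBlock:
    name: str
    input_condition: str   # hypothetical: accepted input data format
    output_condition: str  # hypothetical: produced output data format


@dataclass(frozen=True)
class WorkloadNode:
    name: str
    input_format: str
    output_format: str


class Selector:
    """Generated per workload node; reports a block's conditions and checks them."""

    def __init__(self, node):
        self.node = node

    def conditions(self, cbb):
        return cbb.input_condition, cbb.output_condition

    def matches(self, cbb):
        inp, out = self.conditions(cbb)
        return inp == self.node.input_format and out == self.node.output_format


def map_workload(nodes, cbbs):
    """Map each workload node to the first compute building block its selector accepts."""
    mapping = {}
    for node in nodes:
        selector = Selector(node)
        match = next((c for c in cbbs if selector.matches(c)), None)
        if match is None:
            raise LookupError(f"no compute building block for {node.name}")
        mapping[node.name] = match.name
    return mapping
```

Keeping the condition checks inside a per-node selector lets the compiler query heterogeneous blocks uniformly, without knowing each block's internals.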

PROMETHEUS: PROCESSING-IN-MEMORY HETEROGENOUS ARCHITECTURE DESIGN FROM A MULTI-LAYER NETWORK THEORETIC STRATEGY
20190370269 · 2019-12-05

With increasing demand for distributed intelligent physical systems that perform big data analytics in the field and in real time, processing-in-memory (PIM) architectures that integrate 3D-stacked memory and logic layers could provide higher performance and energy efficiency. Toward this end, PIM design requires principled and rigorous optimization strategies to identify interactions and manage data movement across different vaults.

Methods and apparatus to convert a non-series-parallel control flow graph to data flow
10496383 · 2019-12-03

Methods and apparatus are disclosed to convert a non-series-parallel control flow graph to data flow. An example apparatus includes a node analyzer to detect a non-series-parallel node in sequential code, and an instruction generator to: generate instructions, including a consumption operand, for prior nodes associated with the detected non-series-parallel node; generate a combination instruction to combine the results of the instructions generated for the prior nodes; and output the combination instruction and the instructions generated for the prior nodes to generate data flow code.
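A simplified sketch of the instruction-generation step. Detection is deliberately reduced to a stand-in: any node with more than one predecessor is treated as the detected node. Real non-series-parallel detection, for example by series-parallel graph reduction, is not shown. For that node, one instruction is emitted per prior node, producing a token that the node consumes (the consumption operand), followed by one combination instruction that merges those tokens. The Instruction record, the opcode names emit and combine, and the token naming scheme are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Instruction:
    op: str
    dest: str
    operands: list = field(default_factory=list)


def join_nodes(preds):
    """Simplified stand-in for detection: nodes with more than one predecessor."""
    return [n for n, ps in preds.items() if len(ps) > 1]


def lower_join(node, preds):
    """Emit one token-producing instruction per prior node, then a combination instruction."""
    instrs, tokens = [], []
    for p in preds[node]:
        token = f"{p}->{node}"  # consumption operand read by the join node
        instrs.append(Instruction("emit", token, [f"{p}.result"]))
        tokens.append(token)
    instrs.append(Instruction("combine", f"{node}.in", tokens))
    return instrs
```

For a diamond-shaped graph A → B, A → C, B → D, C → D, node D is selected. Two emit instructions (B->D and C->D) are generated, followed by a single combine into D.in, so D runs only once both tokens have arrived.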