G06F8/456

Parallelizing compile method, parallelizing compiler, parallelizing compile apparatus, and onboard apparatus

A parallelizing compile method includes dividing a sequential program for an embedded system into multiple macro tasks; specifying (i) a starting end task and (ii) a termination end task; fusing (i) the starting end task, (ii) the termination end task, and (iii) a group of the macro tasks into new macro tasks; extracting, based on a data dependency, a group of new macro tasks from the multiple new macro tasks produced by the fusing; performing static scheduling that assigns the new macro tasks to multiple processor units so that the extracted group of new macro tasks can be executed in parallel by the processor units; and generating a parallelizing program. In addition, a parallelizing compiler, a parallelizing compile apparatus, and an onboard apparatus are provided.
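The scheduling step can be pictured with a small list scheduler. This is a minimal sketch, not the patented method: it assumes each (already fused) macro task has a known cost estimate, takes the data dependencies as given, and uses a hypothetical greedy rule that places each task on the processor unit where it can start earliest. The function name `static_schedule` and the example costs are illustrative.

```python
from graphlib import TopologicalSorter


def static_schedule(costs, deps, num_procs):
    """Assign macro tasks to processor units at compile time (list scheduling).

    costs: dict task -> estimated execution cost (assumed known statically)
    deps:  dict task -> set of tasks whose data it consumes
    Returns dict task -> (processor, start, finish).
    """
    proc_free = [0] * num_procs  # time at which each processor unit becomes idle
    schedule = {}
    # Visit tasks so that every task comes after all of its dependencies.
    for task in TopologicalSorter(deps).static_order():
        # A task may start only once all of its producers have finished.
        ready = max((schedule[d][2] for d in deps.get(task, ())), default=0)
        # Greedy rule (assumption): choose the processor where it starts earliest.
        proc = min(range(num_procs), key=lambda p: max(proc_free[p], ready))
        start = max(proc_free[proc], ready)
        finish = start + costs[task]
        proc_free[proc] = finish
        schedule[task] = (proc, start, finish)
    return schedule


# Hypothetical example: B and C both depend on A, and D joins them.
costs = {"A": 2, "B": 3, "C": 3, "D": 1}
deps = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
for task, (proc, start, finish) in sorted(static_schedule(costs, deps, 2).items()):
    print(f"{task}: processor {proc}, cycles {start}-{finish}")
```

B and C land on different processor units and overlap in time, which is the kind of parallelism the extracted group of macro tasks is meant to expose.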

Pre-instruction scheduling rematerialization for register pressure reduction

Examples are disclosed herein that relate to performing rematerialization operation(s) on program source code prior to instruction scheduling. In one example, a method includes, prior to performing instruction scheduling on program source code, the following steps for each basic block of the program source code: determining a register pressure at a boundary of the basic block; determining whether the register pressure at the boundary is greater than a target register pressure; if it is, identifying one or more candidate instructions in the basic block that are suitable for rematerialization to reduce the register pressure at the boundary; and performing a rematerialization operation on at least one of the candidate instructions to reduce the register pressure at the boundary below the target register pressure.
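A toy version of the per-block check is sketched below. It is illustrative only: it assumes the register pressure at a block's exit boundary is the number of values live out of the block, and it treats only operand-free instructions (a hypothetical `const`/`addr` op set) as cheap enough to rematerialize. Chosen candidates are dropped from the live-out set, and the successor that uses them re-emits them instead of keeping them in registers across the boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str             # value defined by this instruction
    op: str
    operands: tuple = ()

    def is_rematerializable(self):
        # Assumption: operand-free constant/address ops are cheap to recompute.
        return self.op in {"const", "addr"} and not self.operands


def remat_at_boundary(block, live_out, target):
    """Run before instruction scheduling, once per basic block.

    Returns (new_live_out, remat): values still held in registers across the
    block boundary, and instructions a successor must re-emit instead.
    Pressure may stay at or above target if too few candidates exist.
    """
    if len(live_out) <= target:          # pressure not above target: nothing to do
        return set(live_out), []
    candidates = [i for i in block if i.dest in live_out and i.is_rematerializable()]
    new_live, remat = set(live_out), []
    for instr in candidates:
        if len(new_live) < target:       # already reduced below the target
            break
        new_live.discard(instr.dest)
        remat.append(instr)
    return new_live, remat


block = [Instr("a", "const"), Instr("b", "load", ("p",)),
         Instr("c", "addr"), Instr("d", "add", ("a", "b"))]
live, remat = remat_at_boundary(block, {"a", "b", "c", "d"}, target=3)
print(sorted(live), [i.dest for i in remat])  # ['b', 'd'] ['a', 'c']
```

Doing this before scheduling means the scheduler sees the recomputed instructions as ordinary work in the successor block rather than as long-lived register values.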

Optimization of execution of smart contracts

An example operation includes one or more of: receiving smart contract code at an analyzer node; building, by the analyzer node, a control-flow graph comprising a plurality of basic code blocks based on the smart contract code; computing, by the analyzer node, a read set and a write set for each of the basic code blocks; and determining, by the analyzer node, at least two of the basic code blocks that may be executed in parallel.
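Given read and write sets per basic block, the final step reduces to a pairwise conflict check. A minimal sketch, assuming the control-flow graph has already been built and each block's read/write sets are available as sets of storage keys (the key names below are hypothetical); it checks only data conflicts, not control-flow ordering.

```python
from itertools import combinations


def conflicts(a, b):
    """Two blocks conflict if either writes state the other reads or writes."""
    return bool(a["writes"] & (b["reads"] | b["writes"]) or b["writes"] & a["reads"])


def parallel_pairs(rw_sets):
    """Return every pair of blocks whose read/write sets do not conflict."""
    return [(x, y) for x, y in combinations(sorted(rw_sets), 2)
            if not conflicts(rw_sets[x], rw_sets[y])]


rw = {
    "B1": {"reads": {"balance_A"}, "writes": {"balance_A"}},
    "B2": {"reads": {"balance_B"}, "writes": {"balance_B"}},
    "B3": {"reads": {"balance_A"}, "writes": set()},
}
print(parallel_pairs(rw))  # [('B1', 'B2'), ('B2', 'B3')]
```

B1 and B3 are excluded because B1 writes `balance_A`, which B3 reads; two blocks that only read the same key would still be allowed to run in parallel.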

Systems and methods for tensor scheduling

A technique for efficient scheduling of operations in a program for parallelized execution thereof using a multi-processor runtime environment having two or more processors includes constraining the type or number of loop optimization transforms that may be explored such that memory and processing capacity available for the scheduling task are not exceeded, while facilitating a tradeoff between memory locality, parallelization, and/or data communication between memory modules of the multi-processor runtime environment.
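One way to picture the constraint is a search over transform sequences whose size is capped up front. This sketch is hypothetical throughout: the four transforms, their additive effects on locality, parallelism, and communication, and the `max_depth`/`max_candidates` budget are stand-ins for a real cost model and a real memory bound, chosen only to show how capping exploration and weighting the three objectives interact.

```python
from itertools import permutations

# Hypothetical per-transform effects: (locality gain, parallelism gain, extra communication).
TRANSFORMS = {
    "tile": (3, 0, -1),
    "interchange": (1, 1, 0),
    "fuse": (2, -1, -2),
    "parallelize": (0, 4, 2),
}


def score(seq, weights):
    """Toy additive cost model trading off locality, parallelism, and communication."""
    w_loc, w_par, w_comm = weights
    loc = sum(TRANSFORMS[t][0] for t in seq)
    par = sum(TRANSFORMS[t][1] for t in seq)
    comm = sum(TRANSFORMS[t][2] for t in seq)
    return w_loc * loc + w_par * par - w_comm * comm


def bounded_search(max_depth, max_candidates, weights=(1.0, 1.0, 1.0)):
    """Explore transform sequences, never evaluating more than max_candidates."""
    best, best_score, explored = (), score((), weights), 0
    for depth in range(1, max_depth + 1):            # limit on sequence length
        for seq in permutations(TRANSFORMS, depth):  # each transform used at most once
            if explored == max_candidates:           # exploration budget exhausted
                return best, explored
            explored += 1
            s = score(seq, weights)
            if s > best_score:
                best, best_score = seq, s
    return best, explored


print(bounded_search(max_depth=2, max_candidates=1))     # (('tile',), 1)
print(bounded_search(max_depth=4, max_candidates=1000))  # (('tile', 'interchange', 'fuse', 'parallelize'), 64)
```

Sequences are enumerated as permutations because transform order matters in practice, even though this toy additive model ignores it; changing `weights` shifts the balance between locality, parallelism, and communication.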

METHOD AND APPARATUS FOR RETAINING OPTIMAL WIDTH VECTOR OPERATIONS IN ARBITRARY/FLEXIBLE VECTOR WIDTH ARCHITECTURE

A method and apparatus to optimize a list of vector instructions using dynamic programming, in particular memoization, by generating a table containing instruction subvectors having individual (parts), contiguous (superparts) and repeated (broadcasts) lanes. Because the instructions in the table are subvectors selected to have individual, contiguous and repeated lanes in the registers, compiler optimizations can be enhanced. Introduction of such dynamic programming allows for speculative lane optimizations, as well as improved analysis-guided optimizations, either of which can be performed alone or in combination with other optimizations, whether or not they make use of dynamic programming.
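The memoization idea can be illustrated on a single shuffle mask, written as the source-lane index for each destination lane. The sketch below is an assumption-laden stand-in, not the patented table: it labels a run of lanes a part (one lane), a superpart (consecutive source lanes), or a broadcast (one source lane repeated), and uses a memoized recursion to split the mask into the fewest such pieces.

```python
from functools import lru_cache


def classify(seg):
    """Label a run of source-lane indices (hypothetical lane model)."""
    if len(seg) == 1:
        return "part"
    if all(x == seg[0] for x in seg):
        return "broadcast"
    if all(b == a + 1 for a, b in zip(seg, seg[1:])):
        return "superpart"
    return None


def decompose(mask):
    """Split a lane mask into the fewest part/superpart/broadcast pieces."""
    mask = tuple(mask)

    @lru_cache(maxsize=None)  # memo table: suffix start index -> best split
    def best(i):
        if i == len(mask):
            return ()
        options = []
        for j in range(i + 1, len(mask) + 1):
            label = classify(mask[i:j])
            if label is not None:
                options.append(((label, mask[i:j]),) + best(j))
        return min(options, key=len)  # a single lane is always a "part"

    return list(best(0))


print(decompose([0, 1, 2, 3, 5, 5, 5, 5]))
# [('superpart', (0, 1, 2, 3)), ('broadcast', (5, 5, 5, 5))]
```

Each suffix is solved once and reused, which is what keeps the search over all split points from repeating work.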

Automatic compiler dataflow optimization to enable pipelining of loops with local storage requirements

Systems, apparatuses and methods may provide for technology that detects one or more local variables in source code, wherein the local variable(s) lack dependencies across iterations of a loop in the source code, automatically generates pipeline execution code for the local variable(s), and incorporates the pipeline execution code into an output of a compiler. In one example, the pipeline execution code includes an initialization of a pool of buffer storage for the local variable(s).
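A rough Python analogue of such generated code is shown below. It assumes a loop whose body uses a scratch array `tmp` that is written and read within one iteration only, so iterations can overlap if each has its own copy; the pool depth, buffer length, and use of a thread pool to stand in for pipelined hardware are hypothetical choices for illustration (CPython threads do not run this CPU-bound body truly in parallel).

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def pipelined_loop(inputs, depth=4, buf_len=8):
    """Overlap loop iterations that each need private local storage.

    Sequential original (assumed):
        for x in inputs:
            tmp = [0] * buf_len        # local: no dependency across iterations
            ...fill tmp, reduce it...
    """
    # Pipeline set-up: initialize a pool of buffer storage, one per in-flight iteration.
    pool = queue.Queue()
    for _ in range(depth):
        pool.put([0] * buf_len)

    def body(x):
        tmp = pool.get()               # claim a private buffer for this iteration
        try:
            for k in range(buf_len):
                tmp[k] = x * k
            return sum(tmp)
        finally:
            pool.put(tmp)              # recycle the buffer for a later iteration

    # At most `depth` iterations are in flight, so pool.get() never waits forever.
    with ThreadPoolExecutor(max_workers=depth) as executor:
        return list(executor.map(body, inputs))


print(pipelined_loop(range(5)))  # [0, 28, 56, 84, 112]
```

Sizing the pool to the number of in-flight iterations is the design point: fewer buffers would serialize iterations, while more would waste local storage.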

OFFLOAD COMPUTING PROTOCOL
20220188165 · 2022-06-16

Systems and methods are provided for offloading computing tasks from constrained devices. An example apparatus includes an offload computing protocol (OCP) enabled device. The OCP enabled device includes OCP extensions to the operating system to enable the offloading of computing tasks. A proximity locator may use a radio transceiver to locate an OCP device that can accept a computing task. The OCP enabled device may include an OCP bundle comprising code and data, wherein the OCP bundle is to be sent to the OCP device.
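The two pieces the abstract names, a proximity locator and an OCP bundle of code plus data, can be sketched as plain data structures. Everything here is hypothetical: the field names, the use of received signal strength (RSSI) as a proximity proxy, the `accepts_tasks` advertisement flag, and JSON as the wire format are assumptions, not part of any published OCP specification.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class OcpBundle:
    """Hypothetical OCP bundle: the code to run remotely plus its input data."""
    task_name: str
    code: str      # e.g. source text or a module reference (assumption)
    data: dict

    def serialize(self) -> bytes:
        # JSON is an assumed wire format chosen for readability.
        return json.dumps(asdict(self)).encode("utf-8")


@dataclass
class DiscoveredDevice:
    device_id: str
    rssi_dbm: int        # signal strength seen by the radio; closer to 0 is nearer
    accepts_tasks: bool  # whether the device advertises spare capacity


def locate_ocp_device(devices):
    """Proximity locator: nearest device that will accept a computing task."""
    willing = [d for d in devices if d.accepts_tasks]
    return max(willing, key=lambda d: d.rssi_dbm) if willing else None


devices = [
    DiscoveredDevice("phone", -60, True),
    DiscoveredDevice("tv", -40, False),
    DiscoveredDevice("laptop", -55, True),
]
bundle = OcpBundle("resize", "def run(d): return d['w'] // 2", {"w": 640})
target = locate_ocp_device(devices)
print(target.device_id)  # laptop
```

The TV is nearest but does not advertise capacity, so the locator falls back to the nearest willing device; the serialized bundle would then be sent to it over whatever transport the radio provides.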

AUTOMATED DESIGN OF FIELD PROGRAMMABLE GATE ARRAY OR OTHER LOGIC DEVICE BASED ON ARTIFICIAL INTELLIGENCE AND VECTORIZATION OF BEHAVIORAL SOURCE CODE
20220164510 · 2022-05-26

A method includes obtaining behavioral source code defining logic to be performed using at least one logic device, hardware information associated with the at least one logic device, and constraints identifying user requirements associated with the at least one logic device. The method also includes generating a design for the at least one logic device using the behavioral source code, the hardware information, and the constraints. The design enables the at least one logic device to execute the logic while satisfying the user requirements. The design is generated using a machine learning/artificial intelligence (ML/AI) algorithm that iteratively modifies potential designs to meet the user requirements.
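The iterative loop can be illustrated with a deliberately simple stand-in. This sketch swaps in a greedy search for the ML/AI algorithm the abstract describes, and its two design knobs (an unroll factor and a pipelining flag), the analytic latency/area model, and the constraint values are all hypothetical; a real flow would obtain these numbers from synthesis tools and learn which modifications to try.

```python
def evaluate(unroll, pipeline):
    """Toy analytic model (assumption): returns (latency_cycles, area_units)."""
    latency = 1024 / (unroll * (2 if pipeline else 1))
    area = 100 * unroll + (50 if pipeline else 0)
    return latency, area


def refine_design(max_latency, max_area, max_unroll=16):
    """Iteratively modify a candidate design until it meets both constraints."""
    design = (1, False)  # (unroll factor, pipelining enabled?)
    while True:
        latency, area = evaluate(*design)
        if latency <= max_latency and area <= max_area:
            return {"unroll": design[0], "pipeline": design[1]}
        unroll, pipeline = design
        moves = []
        if unroll * 2 <= max_unroll:
            moves.append((unroll * 2, pipeline))
        if not pipeline:
            moves.append((unroll, True))
        # Keep only modifications that stay within the area budget.
        feasible = [m for m in moves if evaluate(*m)[1] <= max_area]
        if not feasible:
            return None  # user requirements cannot be met in this design space
        # Prefer the lowest latency, breaking ties by smaller area.
        design = min(feasible, key=lambda m: evaluate(*m))


print(refine_design(max_latency=128, max_area=900))  # {'unroll': 4, 'pipeline': True}
print(refine_design(max_latency=128, max_area=300))  # None
```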

Instruction set

The invention relates to a computer program comprising a sequence of instructions for execution on a processing unit that has instruction storage for holding the computer program, an execution unit for executing the computer program, and data storage for holding data. The computer program comprises one or more computer-executable instructions which, when executed, implement: a send function which causes a data packet destined for a recipient processing unit to be transmitted on a set of connection wires connected to the processing unit, the data packet having no destination identifier but being transmitted at a predetermined transmit time; and a switch control function which causes the processing unit to control switching circuitry to connect a set of connection wires of the processing unit to a switching fabric to receive a data packet at a predetermined receive time.
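The essence of the scheme is that routing is decided by timing alone. A small simulation, under assumptions that are ours rather than the source's (a single fixed fabric latency, cycle-numbered instructions, and a hypothetical `Op` record for the send and switch-control functions), shows a packet with no destination field reaching only the processing unit whose switch is set to the sender at the right cycle.

```python
from dataclasses import dataclass


@dataclass
class Op:
    cycle: int   # predetermined (compile-time) cycle at which the op executes
    kind: str    # "send" or "switch"
    arg: object  # payload for "send"; source tile id for "switch"


def run_exchange(programs, latency):
    """Simulate time-scheduled exchange between processing units (tiles).

    programs: dict tile id -> list of Op
    latency:  fixed cycles from transmission to arrival (assumption)
    Returns dict tile id -> payloads received.
    """
    # (source tile, arrival cycle) -> payload. Sends carry no destination id:
    # the packet is simply driven onto the sender's own connection wires.
    on_wires = {}
    for tile, ops in programs.items():
        for op in ops:
            if op.kind == "send":
                on_wires[(tile, op.cycle + latency)] = op.arg

    received = {tile: [] for tile in programs}
    for tile, ops in programs.items():
        for op in ops:
            if op.kind == "switch":
                # Switch control: connect to the source's wires at the receive cycle.
                payload = on_wires.get((op.arg, op.cycle))
                if payload is not None:
                    received[tile].append(payload)
    return received


programs = {
    0: [Op(10, "send", "x")],
    1: [Op(13, "switch", 0)],  # 10 + latency 3: receives "x"
    2: [Op(12, "switch", 0)],  # one cycle early: sees nothing
}
print(run_exchange(programs, latency=3))  # {0: [], 1: ['x'], 2: []}
```

Because the compiler fixes both the transmit and receive cycles, the fabric needs no address decoding; correctness depends entirely on the schedule matching the fabric latency.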