G06F8/458

COMPILATION METHOD
20200218523 · 2020-07-09

A method for generating a program to run on multiple tiles. The method comprises: receiving an input graph comprising data nodes, compute vertices and edges; receiving an initial tile-mapping specifying which data nodes and vertices are allocated to which tile; and determining a subgraph of the input graph that meets one or more heuristic rules. The rules comprise: the subgraph includes at least one data node; the subgraph spans no more than a threshold number of tiles in the initial tile-mapping; and the subgraph has at least a minimum number of edges outputting to one or more vertices on one or more other tiles. The method further comprises adapting the initial mapping to migrate the data nodes and any vertices of the determined subgraph to said one or more other tiles, and compiling the executable program from the graph with the vertices and data nodes allocated by the adapted mapping.
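The three heuristic rules can be read as a predicate over a candidate subgraph. The following is a minimal sketch, not the patented implementation: it assumes the graph is given as a set of data-node names, a list of (source, destination) edges and a dict mapping each node to a tile index. The function name `meets_rules` and the default thresholds are hypothetical.

```python
def meets_rules(subgraph, data_nodes, edges, tile_of, max_tiles=1, min_out_edges=2):
    """Hypothetical check of the three heuristic rules for a candidate subgraph."""
    # Rule 1: the subgraph contains at least one data node.
    if not subgraph & data_nodes:
        return False
    # Rule 2: the subgraph spans no more than max_tiles tiles in the initial mapping.
    tiles = {tile_of[n] for n in subgraph}
    if len(tiles) > max_tiles:
        return False
    # Rule 3: at least min_out_edges edges leave the subgraph towards
    # vertices (not data nodes) that sit on other tiles.
    outgoing = [
        (src, dst) for src, dst in edges
        if src in subgraph and dst not in subgraph
        and dst not in data_nodes and tile_of[dst] not in tiles
    ]
    return len(outgoing) >= min_out_edges


# Data node "w" lives on tile 0 but feeds two vertices on tile 1.
data_nodes = {"w"}
edges = [("w", "v1"), ("w", "v2")]
tile_of = {"w": 0, "v1": 1, "v2": 1}
print(meets_rules({"w"}, data_nodes, edges, tile_of))  # True
```

A subgraph that passes all three rules is a candidate for migration towards the tile(s) its outgoing edges feed.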

Compiler-Generated Asynchronous Enumerable Object
20200210156 · 2020-07-02

A single asynchronous enumerable object is generated that contains the data and methods needed to iterate through an enumerable asynchronously. The asynchronous enumerable object contains the code for traversing the enumerable one step at a time and the operations needed to suspend an iteration to await completion of an asynchronous operation and to resume the iteration upon completion of the asynchronous operation. The allocation of a single object to perform all of these tasks reduces the memory consumption needed to execute an asynchronous enumeration.
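As a rough analogue of the idea, the Python sketch below hand-writes one object that carries both the iteration state and the suspend/resume step around each await. It assumes Python's async-iterator protocol (`__aiter__`/`__anext__`) as a stand-in for compiler-generated code; the class name `AsyncCountdown` is hypothetical, and the sketch shows the single-object shape rather than the memory savings the abstract describes.

```python
import asyncio


class AsyncCountdown:
    """One object holds the iteration state and the asynchronous step logic."""

    def __init__(self, start, delay=0.0):
        self.current = start  # iteration state lives on the object itself
        self.delay = delay

    def __aiter__(self):
        return self  # the same object serves as its own async iterator

    async def __anext__(self):
        if self.current <= 0:
            raise StopAsyncIteration
        # Suspend until the asynchronous operation completes, then resume.
        await asyncio.sleep(self.delay)
        value = self.current
        self.current -= 1
        return value


async def collect():
    return [x async for x in AsyncCountdown(3)]


print(asyncio.run(collect()))  # [3, 2, 1]
```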

Synchronisation of execution threads on a multi-threaded processor
10698690 · 2020-06-30

Method and apparatus are provided for synchronising execution of a plurality of threads on a multi-threaded processor. A program executed by a thread can have a number of synchronisation points corresponding to points where execution is to be synchronised with another thread. Execution of a thread is paused when it reaches a synchronisation point until at least one other thread with which it is intended to be synchronised reaches a corresponding synchronisation point. Execution is subsequently resumed. A control core maintains status data for threads and can cause a thread that is ready to run to use execution resources that were occupied by a thread that is waiting for a synchronisation event.
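The pause-and-resume behaviour can be simulated deterministically. In this sketch, which is an illustration under stated assumptions rather than the claimed apparatus, each thread is modelled as a Python generator that yields a sync-point name, a hypothetical `control_core` function keeps per-thread status data, and a fixed number of execution `slots` stands in for execution resources. A waiting thread gives up its slot, so another ready thread can run in it.

```python
from collections import deque


def control_core(bodies, partners, slots=1):
    """Run generator-based threads on `slots` execution slots.

    bodies:   thread id -> generator yielding a sync-point name each time it
              reaches a synchronisation point
    partners: sync-point name -> set of thread ids that must all reach it
    """
    status = {tid: "ready" for tid in bodies}  # status data kept by the control core
    arrived = {}                               # sync point -> ids already waiting there
    ready = deque(bodies)
    while ready:
        # Give each free execution slot to a thread that is ready to run.
        batch = [ready.popleft() for _ in range(min(slots, len(ready)))]
        for tid in batch:
            try:
                point = next(bodies[tid])  # execute until the next sync point
            except StopIteration:
                status[tid] = "done"
                continue
            waiting = arrived.setdefault(point, set())
            waiting.add(tid)
            if waiting == partners[point]:
                # Last partner has arrived: every thread at this point resumes.
                del arrived[point]
                for other in sorted(waiting):
                    status[other] = "ready"
                    ready.append(other)
            else:
                # Pause; the slot is released for another ready thread.
                status[tid] = "waiting"
    return status


log = []


def worker(name, rounds):
    for r in range(rounds):
        log.append(f"{name}{r}")  # work done before sync point r
        yield f"sync{r}"


bodies = {"A": worker("A", 2), "B": worker("B", 2)}
partners = {"sync0": {"A", "B"}, "sync1": {"A", "B"}}
status = control_core(bodies, partners)
print(log)  # ['A0', 'B0', 'A1', 'B1']
```

Thread A never starts round 1 before B has finished round 0, even though only one slot is available.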

Compilation method
10691432 · 2020-06-23

A method for generating a program to run on multiple tiles. The method comprises: receiving an input graph comprising data nodes, compute vertices and edges; receiving an initial tile-mapping specifying which data nodes and vertices are allocated to which tile; and determining a subgraph of the input graph that meets one or more heuristic rules. The rules comprise: the subgraph includes at least one data node; the subgraph spans no more than a threshold number of tiles in the initial tile-mapping; and the subgraph has at least a minimum number of edges outputting to one or more vertices on one or more other tiles. The method further comprises adapting the initial mapping to migrate the data nodes and any vertices of the determined subgraph to said one or more other tiles, and compiling the executable program from the graph with the vertices and data nodes allocated by the adapted mapping.
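The mapping-adaptation step might look like the sketch below once a subgraph has been selected. It assumes, for illustration only, that the whole subgraph moves to the single tile receiving the most edges from it; the abstract allows "one or more other tiles", and the function name `migrate` is hypothetical.

```python
from collections import Counter


def migrate(subgraph, edges, tile_of):
    """Return an adapted mapping with every node of `subgraph` moved to one tile."""
    # Count edges leaving the subgraph, grouped by the destination's tile.
    targets = Counter(
        tile_of[dst] for src, dst in edges if src in subgraph and dst not in subgraph
    )
    adapted = dict(tile_of)  # leave the initial mapping untouched
    if not targets:
        return adapted
    dest, _count = targets.most_common(1)[0]
    for node in subgraph:
        adapted[node] = dest
    return adapted


tile_of = {"w": 0, "v1": 1, "v2": 1, "v3": 2}
edges = [("w", "v1"), ("w", "v2"), ("w", "v3")]
print(migrate({"w"}, edges, tile_of)["w"])  # 1
```

Copying the mapping keeps the initial tile-mapping available, for example to compare alternative migrations before compiling.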

SUBSCRIPTION HANDLING AND IN-MEMORY ALIGNMENT OF UNSYNCHRONIZED REAL-TIME DATA STREAMS
20200192902 · 2020-06-18

Methods for subscription handling and in-memory alignment of unsynchronized real-time data streams. A method (500) includes receiving unsynchronized data (640) and a subscription (631) containing a signal identifier (626). The method also includes detecting whether the unsynchronized data for an actual time of measurement (ATM) timestamp (615) has completely arrived, and aligning (505) the unsynchronized data in predefined time slots (610). The method further includes filling (510) data gaps (805) in the unsynchronized data for the ATM timestamp, handling (520) the subscription using values (642) from the unsynchronized data for the ATM timestamp, and performing (515) memory protection when subscription handling becomes inefficient.
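The align / check-completeness / fill / serve / protect steps can be sketched on a toy in-memory store. Several choices here are assumptions rather than details from the abstract: the 10 ms slot width, forward-filling as the gap-filling policy, and evicting the oldest slots as the memory-protection measure. All function names are hypothetical.

```python
def align(samples, slot_ms=10):
    """Group (signal_id, atm_ms, value) samples, arriving in any order, by time slot."""
    slots = {}
    for signal, atm_ms, value in samples:
        slot = atm_ms - atm_ms % slot_ms  # snap the ATM timestamp to its slot start
        slots.setdefault(slot, {})[signal] = value
    return slots


def is_complete(row, signals):
    """True once every expected signal has arrived for this slot."""
    return set(signals).issubset(row)


def fill_gaps(slots, signals):
    """Forward-fill missing signals from the most recent earlier slot (assumed policy)."""
    last, filled = {}, {}
    for slot in sorted(slots):
        row = dict(slots[slot])
        for signal in signals:
            if signal in row:
                last[signal] = row[signal]
            elif signal in last:
                row[signal] = last[signal]
        filled[slot] = row
    return filled


def handle(subscription, filled, slot):
    """Serve a subscription (a list of signal ids) from one aligned slot."""
    return {signal: filled[slot].get(signal) for signal in subscription}


def protect(slots, max_slots):
    """Bound memory by evicting the oldest slots beyond max_slots (max_slots >= 1)."""
    for slot in sorted(slots)[:-max_slots]:
        del slots[slot]
    return slots


samples = [("v", 1005, 1.0), ("i", 1012, 2.0), ("v", 1020, 1.1)]
slots = align(samples)
filled = fill_gaps(slots, ["v", "i"])
print(handle(["v", "i"], filled, 1020))  # {'v': 1.1, 'i': 2.0}
```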

Improving emulation and tracing performance using compiler-generated emulation optimization metadata
10684835 · 2020-06-16

An emulator can use compiler metadata to efficiently emulate execution of executable machine code compiled from source code. Based on accessing compiler metadata associated with machine code, an emulator can identify behavior(s) of the source code from which the machine code is compiled that are not implied by the machine code. From these behaviors, the emulator can identify emulator optimization(s) that can be applied, during emulation of execution of a thread, to reduce the number of steps needed to emulate execution of the machine code, while preserving any externally-visible side-effects. These optimizations can reduce the number of emulator operations needed to emulate execution of the machine code, or elide one or more machine code instructions from emulation. These optimizations can then be applied while emulating execution of the thread. The emulated execution could be recorded to a trace that is equivalent to a trace recorded without these optimizations.
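A toy emulator shows how source-level knowledge can reduce emulator steps. In this sketch the machine code is a zero-fill loop, and a hypothetical `hints` table, standing in for compiler metadata, records that the loop as a whole just zeroes a buffer. The emulator applies that effect in one step and resumes after the loop, and the final memory and register state match plain emulation. The instruction set and hint format are invented for illustration.

```python
def emulate(program, mem, hints=None):
    """Toy emulator returning (registers, steps).

    `hints` maps a pc to (bulk_effect, resume_pc), standing in for
    compiler-generated metadata about what a code region does overall.
    """
    hints = hints or {}
    regs = {"i": 0}
    pc = steps = 0
    while pc < len(program):
        steps += 1
        if pc in hints:
            # Apply the region's known overall effect in a single emulator step.
            effect, pc = hints[pc]
            effect(regs, mem)
            continue
        op, *args = program[pc]
        if op == "store0":    # mem[reg] = 0
            mem[regs[args[0]]] = 0
            pc += 1
        elif op == "inc":     # reg += 1
            regs[args[0]] += 1
            pc += 1
        elif op == "blt":     # if reg < limit: goto target
            reg, limit, target = args
            pc = target if regs[reg] < limit else pc + 1
        else:
            raise ValueError(f"unknown op {op!r}")
    return regs, steps


# Machine code for: for (i = 0; i < 4; i++) mem[i] = 0;
program = [("store0", "i"), ("inc", "i"), ("blt", "i", 4, 0)]


def zero_fill(regs, mem):
    # Same externally visible result as the loop: mem[i..3] zeroed, i == 4.
    for addr in range(regs["i"], 4):
        mem[addr] = 0
    regs["i"] = 4


plain_mem, fast_mem = [9] * 4, [9] * 4
plain = emulate(program, plain_mem)
fast = emulate(program, fast_mem, hints={0: (zero_fill, 3)})
print(plain[1], fast[1])  # 12 1
```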

IMPROVING EMULATION AND TRACING PERFORMANCE USING COMPILER-GENERATED EMULATION OPTIMIZATION METADATA
20200183669 · 2020-06-11

An emulator can use compiler metadata to efficiently emulate execution of executable machine code compiled from source code. Based on accessing compiler metadata associated with machine code, an emulator can identify behavior(s) of the source code from which the machine code is compiled that are not implied by the machine code. From these behaviors, the emulator can identify emulator optimization(s) that can be applied, during emulation of execution of a thread, to reduce the number of steps needed to emulate execution of the machine code, while preserving any externally-visible side-effects. These optimizations can reduce the number of emulator operations needed to emulate execution of the machine code, or elide one or more machine code instructions from emulation. These optimizations can then be applied while emulating execution of the thread. The emulated execution could be recorded to a trace that is equivalent to a trace recorded without these optimizations.
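Instruction elision and trace equivalence can be sketched the same way. Here a hypothetical `elide` set plays the role of metadata stating that certain instructions only compute scratch values the source never reads. The trace records only memory stores, taken here as the externally visible side effects, and is identical with or without elision. The toy instruction set is an assumption.

```python
def emulate_traced(program, elide=frozenset()):
    """Toy emulator returning (trace, steps); the trace lists memory stores only."""
    regs, trace, steps = {}, [], 0
    for pc, (op, *args) in enumerate(program):
        if pc in elide:
            continue  # metadata says this instruction's result is never used
        steps += 1
        if op == "mov":
            dst, imm = args
            regs[dst] = imm
        elif op == "add":
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        elif op == "store":
            addr, src = args
            trace.append((addr, regs[src]))  # externally visible side effect
        else:
            raise ValueError(f"unknown op {op!r}")
    return trace, steps


program = [
    ("mov", "t", 7),         # scratch value the source never reads again
    ("add", "t", "t", "t"),  # also dead
    ("mov", "a", 5),
    ("store", 100, "a"),
]
full = emulate_traced(program)
fast = emulate_traced(program, elide={0, 1})
print(full, fast)  # ([(100, 5)], 4) ([(100, 5)], 2)
```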

Distributed computing architecture

A distributed computing system may be implemented on a codelet-based execution model, where a codelet is a high-level dataflow element. In addition to supporting codelets, the system may provide support for datalets, an extension of codelets that offers better built-in support for static dataflow programming. Such a codelet-based distributed computing system may incorporate an implementation of an execution model, locality management schemes, scheduling schemes, a type system, and/or management of heterogeneous systems.
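The basic dataflow firing rule behind codelets can be sketched in a few lines. The model here is assumed and deliberately small: each hypothetical `Codelet` fires once all of its named inputs are available, a dependency counter tracks missing inputs, and its result becomes a new named value. Datalets, locality management and distribution across machines are not modelled.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Codelet:
    """Hypothetical codelet: fires once all named inputs are available."""
    name: str
    fn: Callable
    inputs: list


def run_dataflow(codelets, initial):
    values = dict(initial)
    consumers = defaultdict(list)
    pending = {}  # codelet name -> number of inputs still missing
    for c in codelets:
        pending[c.name] = sum(1 for i in c.inputs if i not in values)
        for i in c.inputs:
            consumers[i].append(c)
    ready = deque(c for c in codelets if pending[c.name] == 0)
    fired = []
    while ready:
        c = ready.popleft()
        values[c.name] = c.fn(*(values[i] for i in c.inputs))
        fired.append(c.name)
        # Producing a value may make downstream codelets ready to fire.
        for d in consumers[c.name]:
            pending[d.name] -= 1
            if pending[d.name] == 0:
                ready.append(d)
    return values, fired


codelets = [
    Codelet("s", lambda x, y: x + y, ["x", "y"]),
    Codelet("sq", lambda s: s * s, ["s"]),
    Codelet("p", lambda x, y: x * y, ["x", "y"]),
    Codelet("out", lambda sq, p: sq + p, ["sq", "p"]),
]
values, fired = run_dataflow(codelets, {"x": 2, "y": 3})
print(values["out"], fired)  # 31 ['s', 'p', 'sq', 'out']
```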

Computer system and method for parallel program code optimization and deployment

A compiler system, method and computer program product for optimizing a program is disclosed. The compiler includes an extractor module configured to extract, from initial program code, a hierarchical task representation in which each node corresponds to a potential unit of execution. The root node of the hierarchical task representation represents the entire initial program code, and each child node represents a subset of the units of execution of its parent node. The compiler further includes a parallelizer module configured to apply, to the hierarchical task representation, pre-defined parallelization rules associated with a target processing device. These rules automatically adjust the hierarchical task representation by assigning particular units of execution to particular processing units of the processing device and by inserting communication and/or synchronization, such that the adjusted hierarchical task representation reflects parallel program code for the processing device.
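A hierarchical task representation and one parallelization rule might be sketched as follows. The sketch assumes that sibling tasks are independent and that leaf costs are known. It substitutes a single greedy rule, longest task first onto the least-loaded unit, for the patent's unspecified pre-defined rules, and then inserts a synchronization point on every unit. `Task` and `parallelize` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Node of a hypothetical hierarchical task representation."""
    name: str
    cost: int = 0  # estimated cost of a leaf unit of execution
    children: list = field(default_factory=list)

    def total_cost(self):
        if not self.children:
            return self.cost
        return sum(c.total_cost() for c in self.children)


def parallelize(root, n_units):
    """Assign root's children to processing units, longest first, then add a sync."""
    loads = [0] * n_units
    schedule = {u: [] for u in range(n_units)}
    for child in sorted(root.children, key=Task.total_cost, reverse=True):
        unit = min(range(n_units), key=lambda u: loads[u])  # least-loaded unit
        schedule[unit].append(child.name)
        loads[unit] += child.total_cost()
    for unit in schedule:
        schedule[unit].append("sync")  # inserted synchronization point
    return schedule, loads


root = Task("program", children=[
    Task("a", 5),
    Task("b", 3),
    Task("c", children=[Task("c1", 1), Task("c2", 1)]),
    Task("d", 2),
])
schedule, loads = parallelize(root, 2)
print(schedule, loads)  # {0: ['a', 'd', 'sync'], 1: ['b', 'c', 'sync']} [7, 5]
```

A fuller version would recurse into children (such as `c`) when a subtree is large enough to be split across units.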

Reducing compiler type check costs through thread speculation and hardware transactional memory

Systems, apparatuses and methods may provide for technology that generates a first compiler output based on input code that includes dynamically typed variable information and generates a second compiler output based on the input code, wherein the second compiler output includes type check code to verify one or more type inferences associated with the first compiler output. The technology may also execute the first compiler output and the second compiler output in parallel via different threads.
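The two-output scheme can be sketched with two threads. A specialized version, compiled as if a type inference held, runs alongside a type-check version, and its result is committed only if the check passes. Python cannot express hardware transactional memory, so the commit-or-discard step below only stands in for it. Under CPython's GIL the threads interleave rather than run truly in parallel, so the sketch shows the structure rather than any speedup. All function names are hypothetical.

```python
import operator
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def specialized_sum(xs):
    """First output: compiled as if every element were an int (no checks)."""
    total = 0
    for x in xs:
        total += x
    return total


def type_check(xs):
    """Second output's check: verifies the 'all ints' type inference."""
    return all(type(x) is int for x in xs)


def generic_sum(xs):
    """Fallback: fully dynamic addition for any types supporting +."""
    return reduce(operator.add, xs)


def speculative_sum(xs):
    with ThreadPoolExecutor(max_workers=2) as pool:
        fast = pool.submit(specialized_sum, xs)  # speculative execution
        check = pool.submit(type_check, xs)      # runs on a different thread
        if check.result():
            return fast.result()  # inference held: commit the speculative result
    # Inference failed: discard the speculative result (any exception it
    # raised is never retrieved) and re-execute generically.
    return generic_sum(xs)


print(speculative_sum([1, 2, 3]), speculative_sum(["a", "b"]))  # 6 ab
```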