Patent classifications
G06F8/45
Reducing the scan cycle time of control applications through multi-core execution of user programs
A method for pipeline parallelizing a control program for multi-core execution includes using (12) data dependency analysis on a control program to identify tasks that can be performed in parallel, identifying (13) a largest task Tmax requiring the most execution time of the identified tasks, identifying (14) cut-points in the largest task Tmax where data dependency delays decouple the task, inserting (15) delayed data dependencies into cut-points of the largest task Tmax to create N pipeline sub-tasks, in which N is a number of cores available to a processor on which the control program will be executed, and scheduling (16) the tasks and pipeline sub-tasks to the available processor cores.
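The steps in the abstract above can be sketched in a few lines. This is a minimal illustration, not the patented method: it assumes each task is given as a list of per-segment execution times, that every boundary between segments is a legal cut-point, and that a greedy longest-first assignment is an acceptable scheduler. The names `largest_task`, `split_at_cut_points`, `schedule`, and `pipeline_parallelize` are hypothetical.

```python
from typing import Dict, List


def largest_task(tasks: Dict[str, List[float]]) -> str:
    """Return the task with the greatest total execution time (Tmax)."""
    return max(tasks, key=lambda name: sum(tasks[name]))


def split_at_cut_points(segments: List[float], n_stages: int) -> List[List[float]]:
    """Group consecutive segments into at most n_stages pipeline sub-tasks.

    Assumption: every boundary between two segments is a valid cut-point.
    """
    target = sum(segments) / n_stages
    stages: List[List[float]] = [[]]
    for i, cost in enumerate(segments):
        remaining_segments = len(segments) - i
        remaining_stages = n_stages - len(stages)
        current = stages[-1]
        # Open a new stage when the current one has reached the target load,
        # or when the leftover segments are needed to fill the leftover stages.
        if current and remaining_stages > 0 and (
            sum(current) >= target or remaining_segments <= remaining_stages
        ):
            stages.append([])
        stages[-1].append(cost)
    return stages


def schedule(loads: Dict[str, float], n_cores: int) -> List[List[str]]:
    """Greedy longest-first assignment of work items to the least-loaded core."""
    cores: List[List[str]] = [[] for _ in range(n_cores)]
    totals = [0.0] * n_cores
    for name in sorted(loads, key=loads.get, reverse=True):
        idx = totals.index(min(totals))
        cores[idx].append(name)
        totals[idx] += loads[name]
    return cores


def pipeline_parallelize(tasks: Dict[str, List[float]], n_cores: int) -> List[List[str]]:
    """Split Tmax into N sub-tasks, then schedule all tasks and sub-tasks."""
    tmax = largest_task(tasks)
    stages = split_at_cut_points(tasks[tmax], n_cores)
    loads = {name: sum(segs) for name, segs in tasks.items() if name != tmax}
    for k, stage in enumerate(stages):
        loads[f"{tmax}.stage{k}"] = sum(stage)
    return schedule(loads, n_cores)
```

In this sketch the delayed data dependencies are implicit: each stage is assumed to consume the previous stage's output from the preceding scan cycle, which is what allows the stages to run on different cores within the same cycle.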
Methods and systems for detection in a state machine
A device including a data analysis element including a plurality of memory cells. The memory cells analyze at least a portion of a data stream and output a result of the analysis. The device also includes a detection cell. The detection cell includes an AND gate. The AND gate receives the result of the analysis as a first input. The detection cell also includes a D flip-flop including an output coupled to a second input of the AND gate.
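The detection cell described above can be modeled behaviorally as follows. This is a rough sketch under stated assumptions: the abstract does not say what drives the flip-flop's D input, so the `set_enable` method and the idea that the flip-flop acts as an enable bit are illustrative guesses, and the class names are hypothetical.

```python
class DFlipFlop:
    """Minimal D flip-flop model: Q takes the value of D on each clock."""

    def __init__(self, q: bool = False) -> None:
        self.q = q

    def clock(self, d: bool) -> None:
        self.q = bool(d)


class DetectionCell:
    """AND gate whose second input is the output of a D flip-flop."""

    def __init__(self, enabled: bool = False) -> None:
        self.ff = DFlipFlop(enabled)

    def set_enable(self, d: bool) -> None:
        # Assumption: the D input is driven by some enable signal.
        self.ff.clock(d)

    def output(self, analysis_result: bool) -> bool:
        # First input: result from the memory cells; second input: flip-flop Q.
        return bool(analysis_result) and self.ff.q
```

With this model, a match reported by the memory cells only propagates through the detection cell while the flip-flop holds a logical 1.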
FAST ACCESS AND USE OF COMMON DATA VALUES RELATING TO APPLICATIONS IN PARALLEL COMPUTING ENVIRONMENTS
A mechanism is described for facilitating fast access and use of common data values relating to applications in parallel computing environments. A method of embodiments, as described herein, includes detecting a software application being hosted by a computing device, where the software application is further detected as accessing common data values. The method may further include determining whether access to the common data values is slow, and accessing an existing compiled program specific to the common data values at a database, if the access to the common data values is slow. The method may further include loading the existing compiled program to be executed by a processor at the computing device, where the existing compiled program is to replace an originally compiled program.
REMOTE DIRECT MEMORY ACCESS-BASED ON STATIC ANALYSIS OF ASYNCHRONOUS BLOCKS
Described herein are methods of transferring arrays of data information by remote direct memory access (RDMA). The method may include identifying data arrays in a local place that are to be copied to a remote place; and determining whether the data arrays are to be overwritten by analyzing asynchronous blocks from the data arrays in the local place at a start compilation time using a static compiler. The method may further include executing transfer of the data arrays from the local place to the remote place with a pull type RDMA.
Parallelizing compile method, parallelizing compiler, parallelizing compile apparatus, and onboard apparatus
A parallelizing compile method includes dividing a sequential program for an embedded system into multiple macro tasks, specifying (i) a starting end task and (ii) a termination end task, fusing (i) the starting end task, (ii) the termination end task, and (iii) a group of the multiple macro tasks, extracting a group of multiple new macro tasks from the multiple new macro tasks fused in the fusing based on a data dependency, performing a static scheduling assigning the multiple new macro tasks to the multiple processor units, so that the group of the multiple new macro tasks is parallelly executable by the multiple processor units, and generating a parallelizing program. In addition, a parallelizing compiler, a parallelizing compile apparatus and an onboard apparatus are provided.
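The static scheduling step above can be illustrated with a simple list scheduler. This is only a sketch, not the compiler's actual algorithm: it assumes known per-task execution times, ignores the fusing step entirely, and places each macro task on whichever processor unit lets it start earliest. The function name `static_schedule` and its inputs are hypothetical; `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set, Tuple


def static_schedule(
    costs: Dict[str, float],
    deps: Dict[str, Set[str]],
    n_units: int,
) -> Dict[str, Tuple[int, float]]:
    """Map each macro task to (processor unit, start time).

    costs: task -> execution time
    deps:  task -> set of tasks whose results it needs (data dependency)
    """
    finish: Dict[str, float] = {}
    unit_free = [0.0] * n_units
    assignment: Dict[str, Tuple[int, float]] = {}
    graph = {task: deps.get(task, set()) for task in costs}
    # Visit tasks so that every predecessor is scheduled before its successors.
    for task in TopologicalSorter(graph).static_order():
        ready = max((finish[p] for p in graph[task]), default=0.0)
        # Choose the unit on which this task can start earliest.
        unit = min(range(n_units), key=lambda u: max(unit_free[u], ready))
        start = max(unit_free[unit], ready)
        finish[task] = start + costs[task]
        unit_free[unit] = finish[task]
        assignment[task] = (unit, start)
    return assignment
```

For a diamond-shaped dependency graph (one starting task, two independent middle tasks, one terminating task) on two units, the two middle tasks end up on different units and start at the same time.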
Dataflow graph programming environment for a heterogenous processing system
Examples herein describe techniques for generating dataflow graphs using source code for defining kernels and communication links between those kernels. In one embodiment, the graph is formed using nodes (e.g., kernels) which are communicatively coupled by edges (e.g., the communication links between the kernels). A compiler converts the source code into a bit stream and/or binary code which configure a heterogeneous processing system of a SoC to execute the graph. The compiler uses the graph expressed in source code to determine where to assign the kernels in the heterogeneous processing system. Further, the compiler can select the specific communication techniques to establish the communication links between the kernels and whether synchronization should be used in a communication link. Thus, the programmer can express the dataflow graph at a high level (using source code) without understanding how the operator graph is implemented using the heterogeneous hardware in the SoC.
ORDERING OF SHADER CODE EXECUTION
Examples described herein relate to a graphics processing apparatus that includes a memory device and a graphics processing unit (GPU). In some examples, the GPU is configured to execute a shader program that is to identify at least two code blocks that are independent from each other and cause execution of an unexecuted independent code block with available data based on use of a scoreboard to track data availability for independent code blocks. In some examples, execution of the shader program is to cause the GPU to select a first code block identifier for tracking completion of a dependency of the first independent code block. In some examples, execution of the shader program is to cause the GPU to identify an offset to a first instruction position in a sequence of instructions of the first independent code block in an instruction queue.
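The scoreboard idea above can be approximated in software as follows. This is a loose sketch rather than the GPU mechanism itself: it assumes each independent code block declares the data items it needs, models the scoreboard as a set of available items, and runs any unexecuted block as soon as all of its data have arrived. The name `run_with_scoreboard` and the block and data identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def run_with_scoreboard(
    blocks: Dict[str, Set[str]],
    arrival_order: Iterable[str],
) -> List[str]:
    """Return the order in which independent code blocks execute.

    blocks:        block identifier -> data items the block depends on
    arrival_order: data items in the order they become available
    """
    scoreboard: Set[str] = set()   # data items known to be available
    pending = dict(blocks)         # blocks that have not executed yet
    executed: List[str] = []
    for item in arrival_order:
        scoreboard.add(item)
        # Run every pending block whose dependencies are all satisfied.
        ready = [b for b, needs in pending.items() if needs <= scoreboard]
        for block_id in ready:
            executed.append(block_id)
            del pending[block_id]
    return executed
```

If the data for the second block arrives first, the second block runs first, which is the reordering the abstract describes.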
Efficient scheduling of load instructions
When scheduling instructions for execution on a computing device, load instructions are processed before their dependent computational instructions. This can result in the load instructions being scheduled in a non-optimal order. To schedule the load instructions in a preferred order, a scheduler can speculatively schedule the load instructions without committing to their order. Subsequently, when the scheduler encounters the dependent computational instructions, the scheduler can reorder the speculatively scheduled load instructions according to the execution order of the dependent computational instructions.
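The reordering described above can be sketched as follows. This is a simplified illustration, not the scheduler from the patent: it assumes loads are first held in a tentative (speculative) list in program order, and that their final order is fixed only when a dependent computational instruction that uses them is encountered. The function name `schedule_loads` and the instruction names are hypothetical.

```python
from typing import List, Sequence, Tuple


def schedule_loads(
    loads: Sequence[str],
    computes: Sequence[Tuple[str, Sequence[str]]],
) -> List[str]:
    """Order loads by the execution order of the instructions that use them.

    loads:    load instructions in program order
    computes: (instruction, loads it depends on) in execution order
    """
    speculative = list(loads)   # tentative order, not yet committed
    committed: List[str] = []
    for _, used in computes:
        for load in used:
            if load in speculative:
                # Commit the load at the position of its first use.
                speculative.remove(load)
                committed.append(load)
    # Loads with no dependent instruction keep their speculative order.
    return committed + speculative
```

For example, with loads `ld_a`, `ld_b`, `ld_c` and a first computational instruction that uses only `ld_c`, the sketch moves `ld_c` to the front of the final load order.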
Method, a device, and a computer program product for determining a resource required for executing a code segment
A method comprises: compiling the code segment with a compiler; and determining, based on an intermediate result of the compiling, a resource associated with a dedicated processing unit and for executing the code segment. As such, the resource required for executing a code segment may be determined quickly without actually executing the code segment and allocating or releasing the resource, which helps subsequent resource allocation and further brings about a better user experience.
METHODS AND APPARATUS FOR INTENTIONAL PROGRAMMING FOR HETEROGENEOUS SYSTEMS
Methods, apparatus, systems and articles of manufacture are disclosed for intentional programming for heterogeneous systems. An example non-transitory computer readable storage medium includes instructions that, when executed, cause processor circuitry to at least identify a first code block having a first algorithmic purpose based on a second code block having a second algorithmic purpose, the second algorithmic purpose corresponding to the first algorithmic purpose, translate the first code block into executable domain specific language code, and output the executable domain specific language code.