Patent classifications
G06F8/451
PARALLELIZATION METHOD, PARALLELIZATION TOOL, AND IN-VEHICLE DEVICE
A computer generates a parallel program, based on an analysis of a single program that includes a plurality of tasks written for a single-core microcomputer, by parallelizing parallelizable tasks for a multi-core processor having multiple cores. The computer includes a macro task (MT) group extractor that analyzes, or finds, a commonly-accessed resource commonly accessed by the plurality of tasks, and extracts a plurality of MTs showing access to such commonly-accessed resource. Then, the computer uses an allocation restriction determiner to allocate the extracted plural MTs to the same core in the multi-core processor. By devising a parallelization method described above, an overhead in an execution time of the parallel program by the multi-core processor is reduced, and an in-vehicle device is enabled to execute each of the MTs in the program optimally.
Method for Managing a Runtime System for a Hybrid Computing Architecture, Managed Runtime System, Apparatus and Computer Program
Examples relate to a method for managing a runtime system for a hybrid computing architecture, a device, an apparatus and to a corresponding computer program. The apparatus is configured to create a thread pool for each work thread of a computer program, generate native code for at least one of the at least two ISAs for code segments of the computer program, assign native code sequences to a corresponding thread in the thread pool for execution, with the native code sequences comprising the native code of the code segments, and execute the native code sequences.
VOICE COMMAND INTEGRATION FOR LOCAL NETWORK CONNECTED DEVICES
Various arrangements for facilitating smart television content receivers in a local network are provided. In an example, a secondary television receiver receives audio data, converts the audio data into voice command data, and transmits the voice command data to a primary television receiver. In response, the primary television receiver transmits the voice command data to a voice processing server via the Internet, receives a command generated based on the voice command data, and transmits the command to the secondary television receiver. Based on the command, an operation of the secondary television receiver is controlled.
DYNAMIC TASK ALLOCATION FOR NEURAL NETWORKS
The subject technology provides for dynamic task allocation for neural network models. The subject technology determines an operation performed at a node of a neural network model. The subject technology assigns an annotation to indicate whether the operation is better performed on a CPU or a GPU based at least in part on hardware capabilities of a target platform. The subject technology determines whether the neural network model includes a second layer. The subject technology, in response to determining that the neural network model includes a second layer, for each node of the second layer of the neural network model, determines a second operation performed at the node. Further the subject technology assigns a second annotation to indicate whether the second operation is better performed on the CPU or the GPU based at least in part on the hardware capabilities of the target platform.
LEARNED GRAPH OPTIMIZATIONS FOR COMPILERS
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for compiler optimizations using a compiler optimization network. One of the methods includes receiving an input program, wherein the input program defines a graph of operation modules, wherein each node in the graph is a respective operation module, and each edge between nodes in the graph represents one operation module receiving the output generated by another operation module. The input program is processed by a compiler optimization network comprising a graph-embedding network that is configured to encode operation features and operation dependencies of the operation modules of the input program into a graph embedding representation and a policy network that is configured to generate an optimization action for each of one or more nodes encoded in the graph embedding representation. The compiler optimization network generates an output optimization plan comprising one or more optimization actions for the input program.
Dynamic distribution of loads across heterogeneous computing structures in computational rendering
Embodiments for dynamically distributing loads in computational rendering in a computing environment. A computational rendering model on a computational rendering to exploit nested recursive parallelism within a heterogenous computing architecture to enable communication overlap, memory transfer, and data and task management, wherein the computational rendering model is developed for the heterogenous computing architecture.
DATAFLOW GRAPH PROGRAMMING ENVIRONMENT FOR A HETEROGENOUS PROCESSING SYSTEM
Examples herein describe techniques for generating dataflow graphs using source code for defining kernels and communication links between those kernels. In one embodiment, the graph is formed using nodes (e.g., kernels) which are communicatively coupled by edges (e.g., the communication links between the kernels). A compiler converts the source code into a bit stream and/or binary code which configure a heterogeneous processing system of a SoC to execute the graph. The compiler uses the graph expressed in source code to determine where to assign the kernels in the heterogeneous processing system. Further, the compiler can select the specific communication techniques to establish the communication links between the kernels and whether synchronization should be used in a communication link. Thus, the programmer can express the dataflow graph at a high-level (using source code) without understanding about how the operator graph is implemented using the heterogeneous hardware in the SoC.
COMPILER FOR TRANSLATING BETWEEN A VIRTUAL IMAGE PROCESSOR INSTRUCTION SET ARCHITECTURE (ISA) AND TARGET HARDWARE HAVING A TWO-DIMENSIONAL SHIFT ARRAY STRUCTURE
A method is described that includes translating higher level program code including higher level instructions having an instruction format that identifies pixels to be accessed from a memory with first and second coordinates from an orthogonal coordinate system into lower level instructions that target a hardware architecture having an array of execution lanes and a shift register array structure that is able to shift data along two different axis. The translating includes replacing the higher level instructions having the instruction format with lower level shift instructions that shift data within the shift register array structure.
Reducing the scan cycle time of control applications through multi-core execution of user programs
A method for pipeline parallelizing a control program for multi-core execution includes using (12) data dependency analysis on a control program to identify tasks that can be performed in parallel, identifying (13) a largest task Tmax requiring the most execution time of the identified tasks, identifying (14) cut-points in the largest task Tmax where data dependency delays decouple the task, inserting (15) delayed data dependencies into cut-points of the largest task Tmax to create N pipeline sub-tasks, in which N is a number of cores available to a processor on which the control program will be executed, and scheduling (16) the tasks and pipeline sub-tasks to the available processor cores.
COMPILER-INITIATED TILE REPLACEMENT TO ENABLE HARDWARE ACCELERATION RESOURCES
A processing system includes a compiler that automatically identifies sequences of instructions of tileable source code that can be replaced with tensor operations. The compiler generates enhanced code that replaces the identified sequences of instructions with tensor operations that invoke a special-purpose hardware accelerator. By automatically replacing instructions with tensor operations that invoke the special-purpose hardware accelerator, the compiler makes the performance improvements achievable through the special-purpose hardware accelerator available to programmers using high-level programming languages.