G06F8/453

Seamless place and route for heterogenous network of processor cores

Methods and systems related to parallel computing using heterogeneous networks of computational nodes are disclosed herein. A method for executing a complex computation on a heterogeneous set of computational nodes linked together by a set of links in a network is disclosed. The method includes compiling, using a table of bandwidth values for the set of links in the network, a set of instructions for routing data for the execution of the complex computation. The method also includes configuring a set of programmable controllers on the heterogeneous set of computational nodes with the set of instructions. The method also includes executing the set of instructions using the set of programmable controllers. The method also includes routing data through the network to facilitate the execution of the complex computation by the heterogeneous set of computational nodes and in response to the execution of the instructions.
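
The abstract does not say how the table of bandwidth values is used during compilation. One plausible reading is that the compiler picks, for each data transfer, the route whose slowest link is fastest. The sketch below illustrates that idea with a widest-path search; the function name `widest_route`, the dictionary layout of the table, and the choice of algorithm are all assumptions for illustration, not the disclosed method.

```python
import heapq

def widest_route(bandwidth, source, target):
    """Return (route, bottleneck) maximizing the minimum link bandwidth.

    bandwidth: dict mapping a directed link (u, v) to its bandwidth value.
    This table layout is a hypothetical stand-in for the patent's table.
    """
    neighbors = {}
    for (u, v), bw in bandwidth.items():
        neighbors.setdefault(u, []).append((v, bw))

    best = {source: float("inf")}   # best bottleneck found so far per node
    previous = {}                   # back-pointers for route reconstruction
    heap = [(-float("inf"), source)]  # max-heap via negated widths

    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if width < best.get(node, 0):
            continue  # stale heap entry
        if node == target:
            break
        for nxt, bw in neighbors.get(node, []):
            candidate = min(width, bw)
            if candidate > best.get(nxt, 0):
                best[nxt] = candidate
                previous[nxt] = node
                heapq.heappush(heap, (-candidate, nxt))

    if target not in best:
        return None, 0  # no route exists

    route = [target]
    while route[-1] != source:
        route.append(previous[route[-1]])
    route.reverse()
    return route, best[target]

# Hypothetical four-node network: A-B-D has a 2-unit bottleneck, A-C-D has 4.
links = {("A", "B"): 10, ("B", "D"): 2, ("A", "C"): 5, ("C", "D"): 4}
route, bottleneck = widest_route(links, "A", "D")
```

Here `route` is `["A", "C", "D"]` with a bottleneck of 4, even though the first hop toward B is faster. In a real compiler the resulting routes would then be lowered into the instructions loaded onto the programmable controllers, a step this sketch leaves out.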

Voice command integration for local network connected devices
11956494 · 2024-04-09

Various arrangements for facilitating smart television content receivers in a local network are provided. In an example, a secondary television receiver receives audio data, converts the audio data into voice command data, and transmits the voice command data to a primary television receiver. In response, the primary television receiver transmits the voice command data to a voice processing server via the Internet, receives a command generated based on the voice command data, and transmits the command to the secondary television receiver. Based on the command, an operation of the secondary television receiver is controlled.

MAPPING COMPONENTS OF A NON-DISTRIBUTED ENVIRONMENT TO A DISTRIBUTED ENVIRONMENT

Embodiments of the present invention disclose a method, a computer program product, and a computer system for mapping components of non-distributed environments to distributed environments. A computer receives a data pipeline configured for a non-distributed environment and identifies one or more bottleneck components of the data pipeline. In addition, the computer converts data used in the pipeline to a format compatible with a distributed environment and installs the computing libraries necessary for operating the pipeline within the distributed environment. The computer further converts the code of the pipeline to code that is compatible with the distributed environment and optimizes components of the pipeline for use in the distributed environment.

CODE CONVERSION APPARATUS AND METHOD FOR IMPROVING PERFORMANCE IN COMPUTER OPERATIONS
20190317767 · 2019-10-17

A code conversion apparatus includes a memory and a processor coupled to the memory. The memory is configured to store therein a first code including a first data definition of a plurality of arrays, a first operation for the plurality of arrays, and a second data definition of an array indicating a result of the first operation. The processor is configured to convert the first data definition and the second data definition included in the first code into a data definition of an array of structures. The processor is configured to convert the first operation included in the first code into a second operation for the array of structures. The processor is configured to generate a second code including a predetermined instruction to perform the second operation on different pieces of data of the plurality of arrays in parallel with one another.
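
The data-layout change described above can be illustrated with a small sketch. It assumes the first code holds two input arrays `a` and `b` plus a result array, and that the first operation is element-wise addition; the `Element` class and both function names are hypothetical. The parallel instruction that the generated second code would use (for example a SIMD instruction) is not modeled here, only the conversion from separate arrays to an array of structures.

```python
from dataclasses import dataclass

@dataclass
class Element:
    """One structure holding the fields that were separate arrays before."""
    a: float
    b: float
    c: float = 0.0  # result field, replacing the separate result array

def to_array_of_structures(a, b):
    """Convert two parallel input arrays into one array of structures."""
    if len(a) != len(b):
        raise ValueError("input arrays must have the same length")
    return [Element(x, y) for x, y in zip(a, b)]

def add_fields(elements):
    """The converted operation: each structure is updated independently,
    so the iterations could run in parallel on suitable hardware."""
    for e in elements:
        e.c = e.a + e.b
    return elements

# First code (structure of arrays): c[i] = a[i] + b[i]
a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]

# Second code (array of structures)
elements = add_fields(to_array_of_structures(a, b))
results = [e.c for e in elements]
```

Grouping the operands and the result of each iteration in one structure keeps them adjacent in memory, which is the usual motivation for this kind of conversion.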

SYSTEMS AND METHODS FOR FACILITATING STREAMING IN A LOCAL NETWORK WITH MULTIPLE SUBNETS
20240146993 · 2024-05-02

Systems, methods, and non-transitory, machine-readable media to facilitate streaming in a local network are disclosed. A primary media device may be configured to: operate as a server in a local network, receive audio/video (A/V) content, and provide the A/V content to a first display. A secondary media device may be communicatively connected to the primary media device and may be configured to: operate as a client with respect to the primary media device in the local network, receive the A/V content from the primary media device, and provide the A/V content to a second display. The primary media device and the secondary media device may use multiple subnets in the local network. The primary media device and/or the secondary media device may select a first subnet of the multiple subnets to use based at least in part on a type of content to communicate via the first subnet.

OPTIMIZE CONTROL-FLOW CONVERGENCE ON SIMD ENGINE USING DIVERGENCE DEPTH
20190294444 · 2019-09-26

There are provided a system, a method and a computer program product for selecting an active data stream (a lane) while running Single Program Multiple Data code on a Single Instruction Multiple Data machine. The machine runs an instruction stream over input data streams, increments the lane depth counters of all active lanes when the thread-PC reaches a branch operation, and updates the lane-PC of each active lane according to the targets of the branch operation. An instruction of the instruction stream includes a barrier indicating a convergence point for all lanes to join. In response to a lane reaching a barrier, the machine evaluates whether all lane-PCs are set to the same thread-PC; if they are not, it selects an active lane from the plurality of lanes; otherwise, it increments the lane-PCs of all the lanes and then selects an active lane from the plurality of lanes.

Compiling a parallel loop with a complex access pattern for writing an array for GPU and CPU

Computer-implemented methods are provided for compiling a parallel loop and generating Graphics Processing Unit (GPU) code and Central Processing Unit (CPU) code for writing an array for the GPU and the CPU. A method includes compiling the parallel loop by (i) checking, based on a range of array elements to be written, whether the parallel loop can update all of the array elements and (ii) checking whether an access order of the array elements that the parallel loop reads or writes is known at compilation time. The method further includes determining an approach, from among a plurality of available approaches, to generate the CPU code and the GPU code based on (i) the range of the array elements to be written and (ii) the access order to the array elements in the parallel loop.

Compiling a parallel loop with a complex access pattern for writing an array for GPU and CPU

Computer-implemented methods are provided for compiling a parallel loop and generating Graphics Processing Unit (GPU) code and Central Processing Unit (CPU) code for writing an array for the GPU and the CPU. A method includes compiling the parallel loop by (i) checking, based on a range of array elements to be written, whether the parallel loop can update all of the array elements and (ii) checking whether an access order of the array elements that the parallel loop reads or writes is known at compilation time. The method further includes determining an approach, from among a plurality of available approaches, to generate the CPU code and the GPU code based on (i) the range of the array elements to be written and (ii) the access order to the array elements in the parallel loop.

Shared local memory tiling mechanism

An apparatus to facilitate memory tiling is disclosed. The apparatus includes a memory, one or more execution units (EUs) to execute a plurality of processing threads via access to the memory and tiling logic to apply a tiling pattern to memory addresses for data stored in the memory.
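
The abstract does not specify the tiling pattern. As a rough illustration of what applying a tiling pattern to memory addresses can mean, the sketch below maps a 2D coordinate to a linear address so that each rectangular tile occupies one contiguous range. The function name, the row-major tile ordering, and the parameters are assumptions, not the apparatus's actual pattern.

```python
def tiled_address(row, col, width, tile_w, tile_h):
    """Map (row, col) in a 2D buffer to a tiled linear address.

    Assumed pattern: tiles are stored one after another in row-major
    order, and elements inside a tile are also row-major.
    """
    if width % tile_w != 0:
        raise ValueError("width must be a multiple of tile_w")
    tiles_per_row = width // tile_w
    tile_index = (row // tile_h) * tiles_per_row + (col // tile_w)
    offset_in_tile = (row % tile_h) * tile_w + (col % tile_w)
    return tile_index * (tile_w * tile_h) + offset_in_tile

# An 8-element-wide buffer split into tiles 4 wide and 2 high.
first_row_second_tile = tiled_address(0, 4, width=8, tile_w=4, tile_h=2)  # 8
second_row_first_tile = tiled_address(1, 0, width=8, tile_w=4, tile_h=2)  # 4
```

With this mapping, vertically adjacent elements in the same tile land 4 addresses apart instead of 8, so threads working on a small 2D block touch a compact address range.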

Method and apparatus for performing machine learning operations in parallel on machine learning hardware

A method includes receiving a first set of data. The method also includes receiving an instruction to determine a largest value within the first set of data. The first set of data is divided into a first plurality of data portions based on a hardware architecture of a first plurality of processing elements. The first plurality of data portions is mapped to the first plurality of processing elements. Each data portion of the first plurality of data portions is mapped exclusively to a processing element of the first plurality of processing elements. Each data portion of the first plurality of data portions is processed by its respective processing element to identify a largest value from each data portion of the first plurality of data portions, wherein the processing forms a first output data comprising the largest value from each data portion of the first plurality of data portions.
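
The divide, map, and per-portion maximum steps above can be sketched as follows. Worker threads stand in for the hardware processing elements, and the portion size is derived only from the number of elements; both are simplifying assumptions, as are the function names. The final reduction in `largest_value` is also an assumption, since the abstract stops at the first output data.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_portions(data, num_elements):
    """Divide data into at most num_elements contiguous, non-empty portions."""
    if not data:
        raise ValueError("data must be non-empty")
    size = -(-len(data) // num_elements)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def partial_maxima(data, num_elements):
    """Map each portion to exactly one worker and collect its largest value.

    The returned list corresponds to the 'first output data' in the abstract.
    """
    portions = split_into_portions(data, num_elements)
    with ThreadPoolExecutor(max_workers=num_elements) as pool:
        return list(pool.map(max, portions))

def largest_value(data, num_elements):
    """Assumed final step: reduce the per-portion maxima to one value."""
    return max(partial_maxima(data, num_elements))

first_output = partial_maxima([3, 9, 1, 7, 4, 8, 2], 3)  # [9, 8, 2]
overall = largest_value([3, 9, 1, 7, 4, 8, 2], 3)        # 9
```

With seven values and three workers, the portions are `[3, 9, 1]`, `[7, 4, 8]`, and `[2]`, so the first output holds one maximum per portion.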