G06F8/45

Systems and methods for facilitating streaming in a local network with multiple subnets

Systems, methods, and non-transitory, machine-readable media to facilitate streaming in a local network are disclosed. A primary media device may be configured to: operate as a server in a local network, receive audio/video (A/V) content, and provide the A/V content to a first display. A secondary media device may be communicatively connected to the primary media device and may be configured to: operate as a client with respect to the primary media device in the local network, receive the A/V content from the primary media device, and provide the A/V content to a second display. The primary media device and the secondary media device may use multiple subnets in the local network. The primary media device and/or the secondary media device may select a first subnet of the multiple subnets to use based at least in part on a type of content to communicate via the first subnet.

COOPERATIVE PARALLEL MEMORY ALLOCATION
20210279837 · 2021-09-09

Apparatuses, systems, and techniques to perform multi-threaded memory allocation in parallel by one or more software programs being performed on a parallel processing unit (PPU), such as a graphics processing unit (GPU), or any other processing unit capable of supporting multi-threaded software execution. In at least one embodiment, one or more software programs expressed in part by code using an application programming interface for parallel computing, such as CUDA, perform allocation, search, and deallocation of memory efficiently and in parallel on a GPU.

Compilation and execution of parallel code fragments

Systems and methods for executing compiled code having parallel code fragments are provided. One method includes storing executable code having a plurality of parallel code fragments, each of the plurality of parallel code fragments representing alternative executable paths through a code stream. The method further includes determining a code level supported by a processor executable at a computing system, the processor executable supporting a hosted computing environment. The method also includes translating the executable code into machine-readable code executable by a processor of the computing system. Translating the executable code includes selecting a code fragment from among the plurality of parallel code fragments for execution based on the code level supported by the processor executable. The method includes executing the machine-readable code within the hosted computing environment.
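
The selection step can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `Fragment` type, the integer `min_code_level` field, and the policy of choosing the most advanced supported fragment are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Fragment:
    """One alternative path through the code stream (hypothetical type)."""
    min_code_level: int        # lowest code level able to run this fragment
    run: Callable[[int], int]  # the alternative implementation


def select_fragment(fragments: List[Fragment], supported_level: int) -> Fragment:
    """Pick the most advanced fragment the supported code level allows.

    Assumption: a higher min_code_level means a preferable fragment.
    """
    usable = [f for f in fragments if f.min_code_level <= supported_level]
    if not usable:
        raise ValueError("no fragment supports this code level")
    return max(usable, key=lambda f: f.min_code_level)


# Two alternative paths that compute the same result: the sum of 0..n-1.
fragments = [
    Fragment(min_code_level=1, run=lambda n: sum(range(n))),
    Fragment(min_code_level=3, run=lambda n: n * (n - 1) // 2),
]

chosen = select_fragment(fragments, supported_level=2)
result = chosen.run(10)
```

With a supported level of 2, only the level-1 fragment is usable, so the loop-based path runs; raising the supported level to 3 would select the closed-form path instead.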

Constraints for applications in a heterogeneous programming environment

Examples herein describe techniques for generating dataflow graphs using source code for defining kernels and communication links between those kernels. In one embodiment, the graph is formed using nodes (e.g., kernels) which are communicatively coupled by edges (e.g., the communication links between the kernels). A compiler converts the source code into a bitstream and/or binary code which configures programmable and non-programmable logic in a heterogeneous processing environment of a SoC to execute the graph. The compiler can also consider user-defined constraints when compiling the source code. The constraints can dictate where the kernels and buffers should be placed in the heterogeneous processing environment, performance requirements, data communication routes through the SoC, type of data path, delays, and the like.
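
As a rough illustration of kernels, communication links, and user-defined placement constraints, the following Python sketch builds a small graph and applies constraints to it. The class and function names, the kernel names, and the region names `dsp_engine` and `programmable_logic` are hypothetical; the sketch does not model bitstream generation or routing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DataflowGraph:
    """Nodes are kernels; edges are communication links (hypothetical type)."""
    kernels: List[str] = field(default_factory=list)
    links: List[Tuple[str, str]] = field(default_factory=list)

    def add_kernel(self, name: str) -> None:
        self.kernels.append(name)

    def connect(self, src: str, dst: str) -> None:
        # A link may only join kernels that already exist in the graph.
        if src not in self.kernels or dst not in self.kernels:
            raise ValueError(f"unknown kernel in link {src} -> {dst}")
        self.links.append((src, dst))


def place_kernels(graph: DataflowGraph,
                  constraints: Dict[str, str],
                  default_region: str = "programmable_logic") -> Dict[str, str]:
    """Honor user placement constraints; unconstrained kernels get a default."""
    unknown = set(constraints) - set(graph.kernels)
    if unknown:
        raise ValueError(f"constraints name unknown kernels: {sorted(unknown)}")
    return {k: constraints.get(k, default_region) for k in graph.kernels}


graph = DataflowGraph()
for name in ("reader", "filter", "writer"):
    graph.add_kernel(name)
graph.connect("reader", "filter")
graph.connect("filter", "writer")

# The user pins one kernel to a specific region (assumed region name).
placement = place_kernels(graph, {"filter": "dsp_engine"})
```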

Build time optimization using thread object variables
11119743 · 2021-09-14

A system includes a memory and a processor, where the processor is in communication with the memory. The processor is configured to retrieve data structure metadata from a source code of an application. Each of the complex thread variables is registered and an object is generated that is accessible from a thread initiated during execution of the application. At least one thread object implementation is generated within the object, where each of the thread object implementations corresponds to a respective one of the complex thread variables referenced within the source code. Next, the processor is configured to modify an implementation of the source code of the application to call the at least one thread object implementation when attempting to access one or more complex thread variables referenced within the source code. Next, the source code is compiled into an object code corresponding to the application, where the object code includes the object.

Self-Optimizing Computation Graphs
20210200575 · 2021-07-01

A method includes receiving code of an application, the code structured as a plurality of instructions in a computation graph that corresponds to operational logic of the application. The method also includes processing the code according to an iterative learning process. The iterative learning process includes determining whether to adjust an exploration rate associated with the iterative learning process based on a state of a computing environment. Additionally, the process includes executing the plurality of instructions of the computation graph according to an execution policy that indicates certain instructions to be executed in parallel. The process also includes determining an execution time for executing the plurality of instructions of the computation graph according to the execution policy and, based on the execution time and the exploration rate, adjusting the execution policy to reduce the execution time in a subsequent iteration.
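
One way to picture such a loop is an epsilon-greedy search over candidate execution policies, sketched below in Python. This is an assumed stand-in, not the method of the application: the policy is reduced to a worker count, `simulated_time` is a made-up cost model replacing real measurement, and the decay factor of 0.9 is arbitrary.

```python
import random
from typing import Callable, Dict, Sequence


def tune_policy(policies: Sequence[int],
                measure: Callable[[int], float],
                iterations: int = 20,
                exploration_rate: float = 0.5,
                env_stable: Callable[[], bool] = lambda: True,
                seed: int = 0) -> int:
    """Return the policy with the lowest observed execution time."""
    rng = random.Random(seed)
    # Try every policy once so each has a baseline time.
    best: Dict[int, float] = {p: measure(p) for p in policies}
    for _ in range(iterations):
        # Adjust the exploration rate based on the environment state:
        # explore less while the environment is stable (assumed rule).
        if env_stable():
            exploration_rate *= 0.9
        if rng.random() < exploration_rate:
            policy = rng.choice(list(policies))   # explore
        else:
            policy = min(best, key=best.get)      # exploit the fastest so far
        best[policy] = min(best[policy], measure(policy))
    return min(best, key=best.get)


def simulated_time(workers: int) -> float:
    """Made-up cost model: parallel speedup plus per-worker overhead."""
    return 8.0 / workers + 0.5 * workers


fastest = tune_policy([1, 2, 4], simulated_time)
```

Under this cost model the three policies take 8.5, 5.0, and 4.0 time units, so the loop settles on four workers.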

Memory-based distributed processor architecture
11023336 · 2021-06-01

Distributed processors and methods for compiling code for execution by distributed processors are disclosed. In one implementation, a distributed processor may include a substrate; a memory array disposed on the substrate; and a processing array disposed on the substrate. The memory array may include a plurality of discrete memory banks, and the processing array may include a plurality of processor subunits, each one of the processor subunits being associated with a corresponding, dedicated one of the plurality of discrete memory banks. The distributed processor may further include a first plurality of buses, each connecting one of the plurality of processor subunits to its corresponding, dedicated memory bank, and a second plurality of buses, each connecting one of the plurality of processor subunits to another of the plurality of processor subunits.

System and method for executing instructions
11016776 · 2021-05-25

The present disclosure provides systems and methods for executing instructions. The system can include: a processing unit having a core configured to execute instructions; and a host unit configured to: compile computer code into a plurality of instructions that includes a set of instructions that are determined to be executed in parallel on the core, wherein each instruction of the set includes an operation instruction and an indication bit and wherein the indication bit is set to identify the last instruction of the set of instructions, and provide the set of instructions to the core.
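
The role of the indication bit can be sketched in Python as follows. The `Instruction` type, the opcode strings, and the choice to reject a stream that ends without a set bit are assumptions for illustration, not details taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Instruction:
    op: str            # the operation instruction (hypothetical opcode name)
    last_in_set: bool  # the indication bit: set only on the last of a set


def split_parallel_sets(stream: Iterable[Instruction]) -> List[List[str]]:
    """Group a flat instruction stream into sets meant to run in parallel."""
    sets: List[List[str]] = []
    current: List[str] = []
    for ins in stream:
        current.append(ins.op)
        if ins.last_in_set:
            # The indication bit closes the current parallel set.
            sets.append(current)
            current = []
    if current:
        # Assumption: a stream must end on an instruction with the bit set.
        raise ValueError("stream ended inside an unterminated set")
    return sets


stream = [
    Instruction("load", False),
    Instruction("mul", False),
    Instruction("add", True),    # closes the first set
    Instruction("store", True),  # a set of one
]
parallel_sets = split_parallel_sets(stream)
```

Here the first three instructions form one parallel set and the final `store` forms a set on its own.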

Apparatus and method for compiler hints for inter-core offload
11016766 · 2021-05-25

Apparatus and method for inserting offload hints for core-to-core offload operations. For example, one embodiment of a method comprises: evaluating an instruction sequence for potential parallelization to determine if an adequate level of parallelization exists for core-to-core offload work; if an adequate level of parallelization exists, then selectively inserting offload hint instructions to offload work from a parent core to a helper core; processing the instruction sequence on a first core including the offload hint instructions; and responsive to a first offload hint instruction, the first core offloading work to a second core without operating system (OS) intervention.

Reordering condition checks within code

Described is a computer-implemented method of reordering condition checks. Two or more condition checks in computer code that may be reordered within the code are identified. It is determined that a later one of the condition checks is satisfied at a greater frequency than a preceding one of the condition checks. It is determined that there is an absence of side effects in the two or more condition checks. The values of the condition checks are propagated and abstract interpretation is performed on the values that are propagated. It is determined that the condition checks are exclusive of each other, and the condition checks are reordered within the computer code.
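
The safety conditions and the reordering step can be approximated by the Python sketch below. It is a simplification under stated assumptions: the `Check` type and its profiled `hits` count are hypothetical, and exclusivity is tested against sample inputs rather than established by value propagation and abstract interpretation, which makes it a weaker check than the one the abstract describes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Check:
    name: str
    predicate: Callable[[int], bool]
    hits: int                      # profiled count of times the check held
    side_effect_free: bool = True  # assumed to be known from prior analysis


def mutually_exclusive(checks: List[Check], samples: List[int]) -> bool:
    """True if no sample input satisfies more than one check."""
    return all(sum(bool(c.predicate(x)) for c in checks) <= 1 for x in samples)


def reorder_checks(checks: List[Check], samples: Iterable[int]) -> List[Check]:
    """Move frequently satisfied checks first, but only when it is safe."""
    samples = list(samples)
    if not all(c.side_effect_free for c in checks):
        return list(checks)  # side effects: original order must be kept
    if not mutually_exclusive(checks, samples):
        return list(checks)  # overlapping checks: order affects the outcome
    return sorted(checks, key=lambda c: c.hits, reverse=True)


checks = [
    Check("negative", lambda x: x < 0, hits=5),
    Check("zero", lambda x: x == 0, hits=1),
    Check("positive", lambda x: x > 0, hits=94),
]
ordered = [c.name for c in reorder_checks(checks, range(-3, 4))]
```

Because the three checks never hold for the same input and none has side effects, the most frequently satisfied check, `positive`, moves to the front, followed by `negative` and `zero`.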