Patent classifications
G06F8/45
MULTI-VERSION SHADERS
Described herein are techniques for generating a stitched shader program. The techniques include identifying a set of shader programs to include in the stitched shader program, wherein the set includes at least one multiversion shader program that includes a first version of instructions and a second version of instructions, wherein the first version of instructions uses a first number of resources that is different from a second number of resources used by the second version of instructions. The techniques also include combining the set of shader programs to form the stitched shader program. The techniques further include determining a number of resources for the stitched shader program. The techniques also include, based on the determined number of resources, modifying the instructions corresponding to the multiversion shader program to, when executed, execute either the first version of instructions or the second version of instructions.
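The selection step can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the abstract: the `Shader` type, the use of a register count as the "resource", and the policy that the stitched program's budget is the largest minimum requirement among its shaders, with each multiversion shader then running the largest version that fits.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Shader:
    """Hypothetical shader record: one register count per available version."""
    name: str
    versions: Tuple[int, ...]  # a single entry means the shader is not multiversion


def stitch(shaders: List[Shader]) -> Tuple[int, Dict[str, int]]:
    """Return (register budget, chosen version per shader) for the stitched program.

    Assumed policy: every shader must be able to run, so the budget is the
    largest of the per-shader minimum requirements. Each shader then uses the
    most register-hungry version that still fits within that budget.
    """
    budget = max(min(s.versions) for s in shaders)
    chosen = {s.name: max(v for v in s.versions if v <= budget) for s in shaders}
    return budget, chosen


# Stitching next to a small shader forces the low-register versions.
small = [Shader("raygen", (64,)), Shader("hit", (32, 96)), Shader("miss", (48, 128))]
# Stitching next to a large shader leaves room for a high-register version.
large = [Shader("raygen", (128,)), Shader("hit", (32, 96))]

print(stitch(small))
print(stitch(large))
```

In this sketch the same `hit` shader resolves to a different version depending on what it is stitched with, which is the behavior the abstract attributes to the modified instructions.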
LOGIC FABRIC BASED ON MICROSECTOR INFRASTRUCTURE WITH DATA REGISTER HAVING SCAN REGISTERS
Systems and methods described herein may relate to providing dynamically configurable circuitry able to be programmed using a microsector granularity. Furthermore, selective partial reconfiguration operations may be performed using write operations to write a new configuration over existing configurations to selectively reprogram a portion of programmable logic. A quasi-delay insensitive (QDI) shift register and/or control circuitry receiving data and commands from an access register disposed between portions of programmable logic may enable at least some of the operations described.
Software Acceleration Platform for Supporting Decomposed, On-Demand Network Services
An example embodiment may involve obtaining one or more blueprint files. The blueprint files may collectively define a system of processing nodes, a call flow involving a sequence of messages exchanged by the processing nodes, and message formats of the messages exchanged by the processing nodes. The example embodiment may also involve compiling the blueprint files into machine executable code. The machine executable code may be capable of: representing the processing nodes as decomposed, dynamically invoked units of logic, and transmitting the sequence of messages between the units of logic in accordance with the message formats. The units of logic may include a respective controller and one or more respective workers for each type of processing node.
Memory-based distributed processor architecture
Distributed processors and methods for compiling code for execution by distributed processors are disclosed. In one implementation, a distributed processor may include a substrate; a memory array disposed on the substrate; and a processing array disposed on the substrate. The memory array may include a plurality of discrete memory banks, and the processing array may include a plurality of processor subunits, each one of the processor subunits being associated with a corresponding, dedicated one of the plurality of discrete memory banks. The distributed processor may further include a first plurality of buses, each connecting one of the plurality of processor subunits to its corresponding, dedicated memory bank, and a second plurality of buses, each connecting one of the plurality of processor subunits to another of the plurality of processor subunits.
APPARATUS AND METHOD FOR COMPILER HINTS FOR INTER-CORE OFFLOAD
Apparatus and method for inserting offload hints for core-to-core offload operations. For example, one embodiment of a method comprises: evaluating an instruction sequence for potential parallelization to determine if an adequate level of parallelization exists for core-to-core offload work; if an adequate level of parallelization exists, then selectively inserting offload hint instructions to offload work from a parent core to a helper core; processing the instruction sequence on a first core including the offload hint instructions; and responsive to a first offload hint instruction, the first core offloading work to a second core without operating system (OS) intervention.
COMPUTE UNIT SORTING FOR REDUCED DIVERGENCE
Described herein are techniques for reducing divergence of control flow in a single-instruction-multiple-data processor. The method includes, at a point of divergent control flow, identifying control flow targets for different execution items, sorting the execution items based on the control flow targets, reorganizing the execution items based on the sorting, and executing after the point of divergent control flow, with the reorganized execution items.
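The sort-and-reorganize step can be sketched in Python as follows. The wave size, the representation of control flow targets as labels, and the helper names are assumptions made for illustration; the abstract does not specify how the sorting is carried out in hardware.

```python
from typing import Hashable, List, Sequence


def make_waves(order: Sequence[int], wave_size: int) -> List[List[int]]:
    """Group execution-item indices into consecutive waves of wave_size."""
    return [list(order[i:i + wave_size]) for i in range(0, len(order), wave_size)]


def reorganize(targets: Sequence[Hashable], wave_size: int) -> List[List[int]]:
    """Sort execution items by control flow target, then regroup them into waves.

    sorted() is stable, so items with the same target keep their relative order.
    """
    order = sorted(range(len(targets)), key=lambda i: targets[i])
    return make_waves(order, wave_size)


def divergent_waves(targets: Sequence[Hashable], waves: List[List[int]]) -> int:
    """Count waves whose items do not all branch to the same target."""
    return sum(1 for wave in waves if len({targets[i] for i in wave}) > 1)


# Eight execution items alternating between two branch targets.
targets = ["A", "B", "A", "B", "A", "B", "A", "B"]
before = make_waves(range(len(targets)), wave_size=4)
after = reorganize(targets, wave_size=4)

print(divergent_waves(targets, before), divergent_waves(targets, after))
```

Before sorting, both waves mix the two targets; after sorting, each wave holds items with a single target, so execution past the divergence point proceeds without divergence inside a wave.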
BUILDING SYSTEM WITH A BUILDING MODEL EDITOR
A building management system for generating a building model for a building and operating building equipment of the building based on the building model. The system includes a processing circuit configured to receive a context, wherein the context includes metadata defining the building model for the building and generate a building model editor interface for viewing and editing the received context, wherein the building model interface includes building elements for the building model, wherein the building elements are based on the received context and represent the building equipment. The processing circuit is configured to receive user edits of the context via the building model interface, wherein the user edits include edits to the building elements, generate an updated context based on the user edits of the context, and deploy the updated context to control environmental conditions of the building with the building equipment based on the updated context.
APPLICATION DIVISION DEVICE, METHOD AND PROGRAM
A function defined in source code of an application is further partitioned into a plurality of logics without depending on the function definitions made by a developer. An application partitioning apparatus (1) for partitioning an application distributively processed by a plurality of information processing apparatuses into a plurality of logics includes an acquisition unit (121) which acquires source code of the application, a first partitioning unit (122) which identifies a plurality of functions defined in the source code and partitions the source code into the plurality of functions, a determination unit (123) which determines whether each of the partitioned functions can be further partitioned according to rules set in advance, and a second partitioning unit (124) which, when it is determined that a partitioned function can be partitioned, partitions the function into a plurality of functions including one or a plurality of rows.
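The two-stage partitioning can be sketched with Python's standard `ast` module (the sketch needs Python 3.9 or later for `ast.unparse`). The rule used here, that a function with more than `max_rows` top-level statements is split into chunks of at most `max_rows` rows, is an assumption chosen for illustration; the abstract only says the rules are set in advance. The sketch also ignores data dependencies between the resulting chunks.

```python
import ast
from typing import Dict, List


def split_functions(source: str) -> List[ast.FunctionDef]:
    """First partitioning: identify the top-level functions in the source code."""
    return [node for node in ast.parse(source).body if isinstance(node, ast.FunctionDef)]


def can_split(func: ast.FunctionDef, max_rows: int) -> bool:
    """Determination step (assumed rule): splittable if it has more than max_rows statements."""
    return len(func.body) > max_rows


def partition(source: str, max_rows: int = 2) -> Dict[str, List[List[str]]]:
    """Second partitioning: break each splittable function into chunks of rows."""
    result = {}
    for func in split_functions(source):
        rows = [ast.unparse(stmt) for stmt in func.body]
        if can_split(func, max_rows):
            chunks = [rows[i:i + max_rows] for i in range(0, len(rows), max_rows)]
        else:
            chunks = [rows]
        result[func.name] = chunks
    return result


SOURCE = (
    "def f(x):\n"
    "    a = x + 1\n"
    "    b = a * 2\n"
    "    c = b - 3\n"
    "    return c\n"
    "\n"
    "def g(y):\n"
    "    return y\n"
)

print(partition(SOURCE))
```

Here `f` is split into two two-row pieces while `g` is left whole, independently of how the developer originally drew the function boundaries.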
DATAFLOW GRAPH PROGRAMMING ENVIRONMENT FOR A HETEROGENOUS PROCESSING SYSTEM
Examples herein describe techniques for generating dataflow graphs using source code for defining kernels and communication links between those kernels. In one embodiment, the graph is formed using nodes (e.g., kernels) which are communicatively coupled by edges (e.g., the communication links between the kernels). A compiler converts the source code into a bit stream and/or binary code which configure a heterogeneous processing system of a SoC to execute the graph. The compiler uses the graph expressed in source code to determine where to assign the kernels in the heterogeneous processing system. Further, the compiler can select the specific communication techniques to establish the communication links between the kernels and whether synchronization should be used in a communication link. Thus, the programmer can express the dataflow graph at a high level (using source code) without understanding how the operator graph is implemented using the heterogeneous hardware in the SoC.
Adaptive locking in elastic threading systems
A multithreading system that performs elastic threading and dynamic patching is provided. The system receives a compiled object of a computing process, the compiled object comprising a set of locking instructions for ensuring exclusive access of a resource by the computing process. The system determines a thread count for the computing process. When the thread count indicates that a single thread is allocated to execute the computing process, the system patches the compiled object with a set of no-operation (NOP) instructions in place of the set of locking instructions. When the thread count indicates that two or more threads are allocated to execute the computing process, the system patches the compiled object with the set of locking instructions in place of the set of NOP instructions. The system executes the computing process according to the patched compiled object.
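The patching rule can be sketched in Python by modelling the compiled object as a list of instruction mnemonics. The mnemonics, the `lock_sites` table that records where the locking instructions live, and the function name are assumptions made for illustration; a real system would patch machine code in place.

```python
from typing import Dict, List

NOP = "NOP"


def patch(program: List[str], lock_sites: Dict[int, str], thread_count: int) -> List[str]:
    """Return a patched copy of the program for the given thread count.

    lock_sites maps an instruction index to the original locking instruction
    at that index, so the locks can be restored after they were replaced.
    """
    patched = list(program)
    for index, locking_instruction in lock_sites.items():
        # A single thread needs no mutual exclusion, so the lock becomes a NOP.
        patched[index] = NOP if thread_count == 1 else locking_instruction
    return patched


program = ["LOAD", "LOCK", "ADD", "UNLOCK", "STORE"]
lock_sites = {1: "LOCK", 3: "UNLOCK"}

single = patch(program, lock_sites, thread_count=1)
scaled_up = patch(single, lock_sites, thread_count=4)

print(single)
print(scaled_up)
```

Patching in both directions from the same site table is what lets the thread count change elastically: the NOPs are swapped back for the locking instructions as soon as a second thread is allocated.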