Patent classifications
G06F8/458
Locking data structures with locking structures in flash memory by setting bits in the locking structures
Systems and methods for managing content in a flash memory. A locking data structure, implemented in flash memory, is used to control access to data structures. The locking data structure is updated by overwriting bits in it such that the associated data structure is identified as locked or unlocked.
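A minimal sketch of one plausible scheme along these lines (names and the two-bit layout are assumptions, not taken from the patent): because flash can typically only clear bits in place (1 to 0) without an erase, each structure gets a lock bit and a release bit, allowing one lock/unlock cycle per erased word.

```java
// Simulation of a flash-style lock table: bits may only be cleared in place (1 -> 0),
// never set back to 1 without an erase, so lock state is encoded by which bits are cleared.
public class FlashLockTable {
    private final long[] lockBits;      // bit cleared => structure has been locked
    private final long[] releaseBits;   // bit cleared => that lock has been released

    public FlashLockTable(int structures) {
        int words = (structures + 63) / 64;
        lockBits = new long[words];
        releaseBits = new long[words];
        java.util.Arrays.fill(lockBits, ~0L);      // all ones, as after an erase
        java.util.Arrays.fill(releaseBits, ~0L);
    }

    /** Lock structure id by clearing its bit; only a 1 -> 0 transition, as flash allows. */
    public void lock(int id) {
        lockBits[id / 64] &= ~(1L << (id % 64));
    }

    /** Unlock by clearing the matching release bit. */
    public void unlock(int id) {
        releaseBits[id / 64] &= ~(1L << (id % 64));
    }

    /** Locked if the lock bit is cleared but the release bit is still set. */
    public boolean isLocked(int id) {
        long mask = 1L << (id % 64);
        return (lockBits[id / 64] & mask) == 0 && (releaseBits[id / 64] & mask) != 0;
    }
}
```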
Annotation-driven framework for generating state machine updates
Embodiments of the present disclosure relate to techniques for maintaining the state of a distributed system. In particular, certain embodiments relate to identifying a function and, upon determining that the function comprises an annotation indicating that it is capable of modifying the state of the distributed system, transforming the function so that it generates updates to a state machine.
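A rough sketch of the idea, using invented names (the patent does not publish an API): a runtime-retained annotation marks state-modifying functions, and a wrapper transforms calls to annotated functions so that each call also emits an update for a replicated state machine.

```java
import java.lang.annotation.*;
import java.lang.reflect.Proxy;
import java.util.Arrays;

// Hypothetical annotation marking functions that may modify distributed state.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ModifiesState {}

interface StateMachine {
    void apply(String update);          // e.g. append to a replicated log and apply it
}

class StateUpdateWrapper {
    /** Wraps a service interface so that annotated methods also generate state-machine updates. */
    @SuppressWarnings("unchecked")
    static <T> T wrap(T service, Class<T> iface, StateMachine stateMachine) {
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[]{iface},
            (proxy, method, args) -> {
                Object result = method.invoke(service, args);
                // Only functions annotated on the interface are transformed to emit updates.
                if (method.isAnnotationPresent(ModifiesState.class)) {
                    stateMachine.apply(method.getName()
                            + Arrays.toString(args == null ? new Object[0] : args));
                }
                return result;
            });
    }
}
```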
OPTIMIZING ACCESSES TO READ-MOSTLY VOLATILE VARIABLES
A computer-implemented method, computer program product, and computer processing system are provided for eliminating a memory fence for reading a read-mostly volatile variable of a computer system. The read-mostly volatile variable is read more often than it is written to. The method includes writing data to the read-mostly volatile variable only during a Stop-The-World (STW) state of the computer system. The method further includes executing the memory fence in any mutator threads and thereafter exiting the STW state. The method also includes reading the read-mostly volatile variable by the mutator threads without executing the memory fence after the STW state.
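A protocol-level sketch only: the real mechanism lives inside a runtime that can actually stop the world, so the class and method names below are illustrative assumptions. Writes happen only while mutators are stopped, every mutator runs one full fence before resuming, and ordinary reads afterwards need no fence.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class ReadMostlyFlag {
    private int flag;   // deliberately non-volatile: the fast-path read carries no fence
    private static final VarHandle FLAG;
    static {
        try {
            FLAG = MethodHandles.lookup().findVarHandle(ReadMostlyFlag.class, "flag", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Called only while all mutator threads are stopped (the STW state). */
    void writeDuringStw(int value) {
        FLAG.set(this, value);          // plain store; no reader can run concurrently
    }

    /** Each mutator executes this once before leaving the STW state. */
    static void fenceBeforeResuming() {
        VarHandle.fullFence();          // makes the STW store visible before any later read
    }

    /** Fast-path read outside STW: no memory fence required. */
    int read() {
        return (int) FLAG.get(this);    // plain load
    }
}
```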
Application interface on multiple processors
A method and an apparatus that execute a parallel computing program in a programming language for a parallel computing architecture are described. The parallel computing program is stored in memory in a system with parallel processors and allocates threads between a host processor and a GPU. The programming language includes an API that allows an application to make calls to allocate execution of the threads between the host processor and the GPU. The programming language includes host function data tokens for host functions performed in the host processor and kernel function data tokens for compute kernel functions performed in one or more compute processors, e.g., GPUs or CPUs, separate from the host processor.
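A hypothetical API sketch of the host/kernel split (interface and method names are invented, not from the patent): host functions are scheduled on host-processor threads, while kernel functions are dispatched to an attached compute device such as a GPU, with a CPU fallback.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Stand-in for a compute device that can run named kernel functions.
interface ComputeDevice {
    Future<?> enqueueKernel(String kernelName, float[] data);
}

final class ParallelRuntime {
    private final ExecutorService hostPool = Executors.newWorkStealingPool();
    private final ComputeDevice gpu;    // may be null when no device is attached

    ParallelRuntime(ComputeDevice gpu) { this.gpu = gpu; }

    /** Host function: always executed on host-processor threads. */
    Future<?> runOnHost(Runnable hostFunction) {
        return hostPool.submit(hostFunction);
    }

    /** Kernel function: dispatched to the GPU if present, otherwise run on the host. */
    Future<?> runKernel(String kernelName, float[] data) {
        if (gpu != null) {
            return gpu.enqueueKernel(kernelName, data);
        }
        return hostPool.submit(() -> { /* CPU fallback for the kernel would run here */ });
    }
}
```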
CODE COMPILATION FOR SCALING ACCELERATORS
A computer system comprises a work accelerator and a gateway enabling the transfer of data to the accelerator from external storage. The accelerator executes a first compiled code sequence to perform computations on data transferred to it from the gateway. The first compiled code sequence comprises a synchronisation instruction indicating a barrier between a compute phase, in which the compute instructions are executed, and an exchange phase; execution of the synchronisation instruction causes an indication of a pre-compiled data exchange synchronisation point to be transferred to the gateway. The gateway comprises a streaming engine storing a second compiled code sequence in the form of a set of data transfer instructions executable by the streaming engine to perform data transfer operations that stream data through the gateway in the exchange phase. The first and second compiled code sequences are generated as a related set at compile time.
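An illustrative sketch, not the patented design: the accelerator side runs its compute instructions and then signals a pre-compiled sync point, and the gateway's streaming engine waits on that signal before running its own transfer instructions for the exchange phase.

```java
import java.util.List;
import java.util.concurrent.SynchronousQueue;

class GatewaySketch {
    private final SynchronousQueue<Integer> syncPoints = new SynchronousQueue<>();

    /** Accelerator side: compute phase, then a barrier expressed as a sync indication. */
    void acceleratorProgram(List<Runnable> computeInstructions, int syncPointId)
            throws InterruptedException {
        computeInstructions.forEach(Runnable::run);   // compute phase
        syncPoints.put(syncPointId);                  // "sync" instruction notifies the gateway
    }

    /** Gateway streaming engine: waits for the sync point, then streams data for it. */
    void streamingEngine(List<Runnable> transferInstructions) throws InterruptedException {
        syncPoints.take();                            // barrier between compute and exchange
        transferInstructions.forEach(Runnable::run);  // exchange phase, compiled as a related set
    }
}
```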
SYSTEMS AND METHODS FOR ACCELERATING DATA OPERATIONS BY UTILIZING DATAFLOW SUBGRAPH TEMPLATES
Methods and systems are disclosed for accelerating big data operations by utilizing subgraph templates. In one example, a data processing system includes a hardware processor and a hardware accelerator coupled to the hardware processor. The hardware accelerator is configured, with a compiler of an accelerator functionality, to generate an execution plan, to generate computations for nodes including subgraphs in a distributed system for an application program based on the execution plan, and to execute a matching algorithm to determine similarities between the subgraphs and unique templates from an available library of templates.
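A rough sketch of the matching step under invented names: a subgraph and a template are both reduced to a list of operator names, and a simple Jaccard similarity over operator sets stands in for whatever structural matching the patented system uses.

```java
import java.util.*;

class TemplateMatcher {
    record Template(String name, List<String> ops) {}

    /** Returns the best-matching template, or empty if nothing reaches the threshold. */
    static Optional<Template> match(List<String> subgraphOps, List<Template> library, double threshold) {
        Template best = null;
        double bestScore = 0;
        for (Template t : library) {
            double score = similarity(subgraphOps, t.ops());
            if (score > bestScore) { bestScore = score; best = t; }
        }
        return bestScore >= threshold ? Optional.ofNullable(best) : Optional.empty();
    }

    /** Jaccard similarity over operator sets: a stand-in for a real structural comparison. */
    static double similarity(List<String> a, List<String> b) {
        Set<String> inter = new HashSet<>(a); inter.retainAll(b);
        Set<String> union = new HashSet<>(a); union.addAll(b);
        return union.isEmpty() ? 0 : (double) inter.size() / union.size();
    }
}
```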
CONTROL OF SCHEDULING DEPENDENCIES BY A NEURAL NETWORK COMPILER
A compiler receives a graph describing a neural network and accesses data describing a target computing device that is to implement the neural network. The compiler generates an intermediate representation from the graph and the data, and determines dependencies between operations identified in the intermediate representation. Based on the dependencies, a set of barrier tasks is determined to control the flow of the set of operations, where the barrier tasks are to be performed using hardware barrier components on the target computing device. Indications of the barrier tasks are inserted into the intermediate representation. The compiler generates a binary executable from the intermediate representation to enable performance of the barrier tasks to control performance of the set of operations at the target computing device.
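A simplified sketch with invented structures: operations are grouped by dependency depth, and one barrier task is inserted between consecutive depth levels, so a hardware barrier rather than per-edge waits enforces ordering. The real compiler's dependency analysis is certainly finer grained than this.

```java
import java.util.*;

class BarrierScheduler {
    /** deps maps each op to the ops it depends on; returns alternating op levels and barrier tasks. */
    static List<List<String>> schedule(Map<String, List<String>> deps) {
        Map<String, Integer> level = new HashMap<>();
        for (String op : deps.keySet()) computeLevel(op, deps, level);
        int maxLevel = level.values().stream().max(Integer::compare).orElse(0);

        List<List<String>> plan = new ArrayList<>();
        for (int l = 0; l <= maxLevel; l++) {
            final int cur = l;
            List<String> ops = level.entrySet().stream()
                    .filter(e -> e.getValue() == cur)
                    .map(Map.Entry::getKey).sorted().toList();
            plan.add(ops);
            if (l < maxLevel) plan.add(List.of("BARRIER_" + l));   // barrier task between levels
        }
        return plan;
    }

    /** Dependency depth of an op: one more than the deepest op it depends on. */
    private static int computeLevel(String op, Map<String, List<String>> deps, Map<String, Integer> memo) {
        Integer cached = memo.get(op);
        if (cached != null) return cached;
        int lvl = 0;
        for (String d : deps.getOrDefault(op, List.of())) {
            lvl = Math.max(lvl, computeLevel(d, deps, memo) + 1);
        }
        memo.put(op, lvl);
        return lvl;
    }
}
```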
Compiler-generated asynchronous enumerable object
A single asynchronous enumerable object is generated that contains the data and methods needed to iterate through an enumerable asynchronously. The asynchronous enumerable object contains the code for traversing the enumerable one step at a time and the operations needed to suspend an iteration to await completion of an asynchronous operation and to resume the iteration upon completion of the asynchronous operation. The allocation of a single object to perform all of these tasks reduces the memory consumption needed to execute an asynchronous enumeration.
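A loose analogy in Java (the patent's setting is a different language and runtime, so this is only an illustration of the single-object idea): one object carries both the iteration state and the logic to suspend at an asynchronous fetch and resume afterwards, instead of allocating separate enumerable, enumerator, and awaiter objects.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

class AsyncRangeEnumerator {
    private final int count;
    private final IntFunction<CompletableFuture<Integer>> fetchAsync;  // the awaited operation
    private int index = 0;        // iteration state carried across suspensions
    private Integer current;

    AsyncRangeEnumerator(int count, IntFunction<CompletableFuture<Integer>> fetchAsync) {
        this.count = count;
        this.fetchAsync = fetchAsync;
    }

    /** Advances one step; completes with false once the sequence is exhausted. */
    CompletableFuture<Boolean> moveNextAsync() {
        if (index >= count) return CompletableFuture.completedFuture(false);
        int i = index++;
        return fetchAsync.apply(i)              // suspend until the asynchronous operation finishes
                .thenApply(value -> {           // resume: record the element, report another item
                    current = value;
                    return true;
                });
    }

    Integer current() { return current; }
}
```

A caller would loop by chaining moveNextAsync() calls, reading current() after each one completes with true.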
Data flow processing method and related device
The present disclosure relates to data flow processing methods and devices. One example method includes obtaining a dependency relationship and an execution sequence of operating a data flow by a plurality of processing units, generating synchronization logic based on the dependency relationship and the execution sequence, and inserting the synchronization logic into an operation pipeline of each of the plurality of processing units to generate executable code.
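A simplified sketch with invented instruction names: for every dependency that crosses processing units, a set-event instruction is appended after the producer in its pipeline and a wait-event instruction is inserted before the consumer in its pipeline, approximating compiler-generated synchronization logic.

```java
import java.util.*;

class SyncInserter {
    record Op(String name, int unit) {}
    record Dep(Op producer, Op consumer) {}

    /** executionOrder lists each unit's pipeline in order; returns pipelines with sync logic inserted. */
    static Map<Integer, List<String>> insert(Map<Integer, List<Op>> executionOrder, List<Dep> deps) {
        Map<Integer, List<String>> pipelines = new HashMap<>();
        executionOrder.forEach((unit, ops) -> {
            List<String> code = new ArrayList<>();
            for (Op op : ops) {
                for (Dep d : deps) {                       // wait on producers from other units
                    if (d.consumer().equals(op) && d.producer().unit() != unit) {
                        code.add("wait_event(" + d.producer().name() + ")");
                    }
                }
                code.add(op.name());                        // the operation itself
                for (Dep d : deps) {                        // signal consumers on other units
                    if (d.producer().equals(op) && d.consumer().unit() != unit) {
                        code.add("set_event(" + op.name() + ")");
                    }
                }
            }
            pipelines.put(unit, code);
        });
        return pipelines;
    }
}
```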