Patent classifications
G06F8/458
LOCKING STRUCTURES IN FLASH MEMORY
Systems and methods for managing content in a flash memory. A locking data structure is used to control access to data structures and the locking data structure is implemented in flash memory. The locking data structure is updated by overwriting the data such that the associated data structure is identified as locked or unlocked.
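As an illustration only, the following Python sketch simulates one way a lock word could be updated purely by overwriting. It relies on the general property that flash programming can change bits from 1 to 0 but not back without an erase. The class name `FlashLockWord`, the 8-bit width, and the parity encoding (odd number of cleared bits means locked) are assumptions made for this example and are not taken from the abstract.

```python
class FlashLockWord:
    """Simulated flash word: an overwrite may only change bits from 1 to 0."""

    def __init__(self, bits=8):
        self.bits = bits
        self.value = (1 << bits) - 1  # erased state: all ones

    def _program(self, new_value):
        # Reject any overwrite that would need a 0 -> 1 transition.
        if new_value & ~self.value:
            raise ValueError("overwrite cannot set bits without an erase")
        self.value = new_value

    def _cleared(self):
        # Number of bits already programmed to 0.
        return self.bits - bin(self.value).count("1")

    def is_locked(self):
        # Hypothetical encoding: an odd count of cleared bits means "locked".
        return self._cleared() % 2 == 1

    def _toggle(self):
        cleared = self._cleared()
        if cleared == self.bits:
            raise RuntimeError("lock word exhausted; an erase is required")
        # Clear the lowest bit that is still set.
        self._program(self.value & ~(1 << cleared))

    def lock(self):
        if self.is_locked():
            raise RuntimeError("already locked")
        self._toggle()

    def unlock(self):
        if not self.is_locked():
            raise RuntimeError("not locked")
        self._toggle()
```

Each lock or unlock consumes one bit, so an 8-bit word in this sketch supports eight state changes before the containing block would have to be erased.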
EFFICIENT PROFILING-BASED LOCK MANAGEMENT IN JUST-IN-TIME COMPILERS
Aspects of the present disclosure describe techniques for managing locks in just-in-time (JIT) compiled code in a software application. An example method generally includes profiling locks during execution of the JIT compiled code. Locks are generally profiled by identifying locks on resources accessed by the JIT compiled code, and recording access information for each of the identified locks. When a safepoint is reached during execution of the JIT compiled code, one or more locks eligible for conversion to a biased lock are identified based on the recorded access information for each of the identified locks. Each respective lock of the one or more eligible locks is converted to a biased lock based on a current lock status of the respective lock.
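The profile-then-convert flow described above might be sketched in Python as follows. This is a simplified model, not the claimed method: the names `ProfiledLock`, `eligible_for_bias`, and `on_safepoint`, the thresholds `min_accesses` and `min_share`, and the rule that a lock is converted only when it is free or held by the dominant thread are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProfiledLock:
    """A lock together with the access information recorded for it."""
    name: str
    owner: Optional[int] = None      # thread currently holding the lock
    biased_to: Optional[int] = None  # thread the lock is biased toward
    accesses: Counter = field(default_factory=Counter)

    def record_access(self, thread_id: int) -> None:
        self.accesses[thread_id] += 1


def eligible_for_bias(lock: ProfiledLock,
                      min_accesses: int = 100,
                      min_share: float = 0.95) -> Optional[int]:
    """Return the dominant thread if the profile suggests biasing, else None."""
    total = sum(lock.accesses.values())
    if lock.biased_to is not None or total < max(min_accesses, 1):
        return None
    thread_id, count = lock.accesses.most_common(1)[0]
    return thread_id if count / total >= min_share else None


def on_safepoint(locks: List[ProfiledLock]) -> List[str]:
    """Convert eligible locks, taking each lock's current status into account."""
    converted = []
    for lock in locks:
        candidate = eligible_for_bias(lock)
        if candidate is None:
            continue
        # Assumed status rule: bias only if free or held by the candidate.
        if lock.owner is None or lock.owner == candidate:
            lock.biased_to = candidate
            converted.append(lock.name)
    return converted
```

In this model a lock acquired 200 times by a single thread and currently free would be converted, while a lock shared evenly between two threads, or one currently held by a different thread, would be left unchanged.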
Optimize control-flow convergence on SIMD engine using divergence depth
There are provided a system, a method and a computer program product for selecting an active data stream (a lane) while running Single Program Multiple Data code on a Single Instruction Multiple Data machine. The machine runs an instruction stream over input data streams, increments the lane depth counters of all active lanes when the thread-PC reaches a branch operation, and updates the lane-PC of each active lane according to the targets of the branch operation. An instruction of the instruction stream includes a barrier indicating a convergence point for all lanes to join. In response to a lane reaching a barrier, the machine evaluates whether all lane-PCs are set to the same thread-PC; if they are not, it selects an active lane from the plurality of lanes; otherwise, it increments the lane-PCs of all the lanes and then selects an active lane from the plurality of lanes.
SYSTEM AND METHODS WITH REDUCED COMPLEXITY IN THE INTEGRATION OF EXPOSED INFORMATION MODELS WITH APPLICATIONS
A computer system for automated model integration of an information model with a corresponding application includes: an information model server for exposing an information model to a consumer, the exposed information model including model-elements for exposing types or classes, and for exposing instances of types or classes and their member-values; an application component for providing application code augmented with mapping descriptions defining how an internal information model of the application is mapped to the exposed information model; and a model integration component that: registers internal information model-elements to be exposed; maps the registered internal information model-elements to exposed information model-elements in accordance with the mapping descriptions; and updates an information model-element by: detecting a change of an internal or exposed information model-element; determining a synchronization direction; and performing match-making to determine a model-element corresponding to the changed model-element by using signatures of the corresponding information model-elements.
Subscription handling and in-memory alignment of unsynchronized real-time data streams
Methods for subscription handling and in-memory alignment of unsynchronized real-time data streams. A method (500) includes receiving a subscription (631) containing a signal identifier (626), and unsynchronized data (640). The method also includes detecting whether the unsynchronized data for an actual time of measurement (ATM) timestamp (615) has completely arrived, and aligning (505) the unsynchronized data in predefined time slots (610). The method further includes filling (510) in data gaps (805) in the unsynchronized data for the ATM timestamp, handling (520) the subscription using values (642) from the unsynchronized data for the ATM timestamp, and performing (515) memory protection when the subscription is handled inefficiently.
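The aligning and gap-filling steps could look roughly like the Python sketch below. The function names, the "last sample in a slot wins" rule, and the forward-fill strategy are assumptions chosen for illustration; the abstract does not say how gaps are filled.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp, value)


def align_to_slots(samples: Sequence[Sample], start: float,
                   slot_width: float, slot_count: int) -> List[Optional[float]]:
    """Place samples into fixed-width time slots; the latest sample in a slot wins."""
    slots: List[Optional[float]] = [None] * slot_count
    for timestamp, value in sorted(samples, key=lambda s: s[0]):
        index = int((timestamp - start) // slot_width)
        if 0 <= index < slot_count:  # ignore samples outside the window
            slots[index] = value
    return slots


def fill_gaps(slots: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill each empty slot with the most recent earlier value (forward fill)."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in slots:
        if value is not None:
            last = value
        filled.append(last)
    return filled
```

For example, samples arriving out of order at 0.2 s, 2.7 s, and 0.9 s with one-second slots land in slots 0, 2, and 0 respectively; slots 1 and 3 stay empty until `fill_gaps` copies the preceding values forward.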
System and method to accelerate reduce operations in graphics processor
Embodiments described herein provide a system, method, and apparatus to accelerate reduce operations in a graphics processor. One embodiment provides an apparatus including one or more processors, the one or more processors including a first logic unit to perform a merged write, barrier, and read operation in response to a barrier synchronization request from a set of threads in a work group, synchronize the set of threads, and report a result of an operation specified in association with the barrier synchronization request.
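As a software analogy only, the merged write, barrier, and read can be modeled in Python with `threading.Barrier`: each thread contributes a value, waits for the group, and receives the reduced result from a single call. The class `ReduceBarrier` and its `arrive` method are hypothetical names for this sketch and say nothing about how the hardware logic unit is built.

```python
import threading


class ReduceBarrier:
    """Barrier whose arrive() call writes a value, synchronizes, and reads the result."""

    def __init__(self, parties, op, initial):
        self._op = op
        self._initial = initial
        self._acc = initial
        self._result = None
        self._lock = threading.Lock()
        # The action runs once, in one thread, before any waiter is released.
        self._barrier = threading.Barrier(parties, action=self._publish)

    def _publish(self):
        self._result = self._acc
        self._acc = self._initial  # reset so the barrier can be reused

    def arrive(self, value):
        with self._lock:            # write: fold this thread's contribution in
            self._acc = self._op(self._acc, value)
        self._barrier.wait()        # barrier: wait for the whole work group
        return self._result         # read: every thread sees the same result


def reduce_in_work_group(values):
    """Run one thread per value and return the result each thread observed."""
    barrier = ReduceBarrier(len(values), lambda a, b: a + b, 0)
    seen = [None] * len(values)

    def worker(i):
        seen[i] = barrier.arrive(values[i])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(values))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Here `reduce_in_work_group([1, 2, 3, 4])` gives every thread the same sum, which mirrors the idea of reporting the reduction result as part of the barrier itself instead of through a separate write and read around it.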
Reordering condition checks within code
Described is a computer-implemented method of reordering condition checks. Two or more condition checks in computer code that may be reordered within the code are identified. It is determined that a later one of the condition checks is satisfied at a greater frequency than a preceding one of the condition checks. It is determined that there is an absence of side effects in the two or more condition checks. The values of the condition checks are propagated and abstract interpretation is performed on the values that are propagated. It is determined that the condition checks are exclusive of each other, and the condition checks are reordered within the computer code.
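A runtime analogue of this idea can be sketched in Python. The `CheckChain` class below counts how often each check is satisfied and moves frequently satisfied checks earlier, but only after confirming on a set of sample inputs that no two checks hold at once. The sample-based exclusivity test is a stand-in for the abstract interpretation mentioned in the abstract, the predicates are assumed to be free of side effects, and all names are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Check = Tuple[str, Callable[[int], bool]]


def mutually_exclusive(checks: List[Check], samples: Iterable[int]) -> bool:
    """True if at most one check holds for every sample value."""
    return all(sum(1 for _, pred in checks if pred(s)) <= 1 for s in samples)


class CheckChain:
    """An if/elif chain whose order can be adjusted from profile data."""

    def __init__(self, checks: List[Check]):
        self.checks = list(checks)
        self.hits = {name: 0 for name, _ in self.checks}

    def classify(self, value: int) -> Optional[str]:
        # Evaluate checks in order and record which one was satisfied.
        for name, pred in self.checks:
            if pred(value):
                self.hits[name] += 1
                return name
        return None

    def reorder(self, samples: Iterable[int]) -> bool:
        """Move frequently satisfied checks first, if reordering is safe."""
        if not mutually_exclusive(self.checks, samples):
            return False
        # Stable sort: checks with equal hit counts keep their relative order.
        self.checks.sort(key=lambda c: self.hits[c[0]], reverse=True)
        return True
```

Because the checks are exclusive, reordering changes only how many predicates are evaluated before a match, not which branch is taken.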
Code compilation for scaling accelerators
A computer system comprises a work accelerator and a gateway for the transfer of data to the accelerator from external storage. The accelerator executes a first compiled code sequence to perform computations on data transferred to the accelerator from the gateway. The first compiled code sequence comprises a synchronisation instruction indicating a barrier between a compute phase, in which the compute instructions are executed, and an exchange phase, wherein execution of the synchronisation instruction causes an indication of a pre-compiled data exchange synchronisation point to be transferred to the gateway. The gateway comprises a streaming engine storing a second compiled code sequence in the form of a set of data transfer instructions executable by the streaming engine to perform data transfer operations to stream data through the gateway in the exchange phase, wherein the first and second compiled code sequences are generated as a related set at compile time.
Synchronization of concurrent computation engines
Provided are systems and methods for synchronizing program code execution for a plurality of execution engines in an integrated circuit device. In some cases, the operation of one execution engine may be dependent on the operation of another execution engine. To accommodate this dependency, the instructions for the first execution engine can include a set-event instruction and the instructions for the second execution engine can include a wait-on-event instruction. The wait-on-event instruction can cause the second execution engine to wait for the first execution engine to reach the set-event instruction. In this way, the two execution engines can be synchronized around the data or resource dependency.
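The set-event and wait-on-event pairing resembles the behaviour of Python's `threading.Event`, which the sketch below uses to model two engines as threads. This is an analogy rather than the hardware mechanism; the function and variable names are chosen for illustration.

```python
import threading


def run_engines():
    """Model two dependent engines as threads synchronized by one event."""
    data_ready = threading.Event()
    shared = {}
    log = []

    def first_engine():
        shared["tile"] = [1, 2, 3]   # result the second engine depends on
        log.append("first: set-event")
        data_ready.set()             # plays the role of the set-event instruction

    def second_engine():
        data_ready.wait()            # plays the role of the wait-on-event instruction
        log.append("second: sum=%d" % sum(shared["tile"]))

    # Start the waiting engine first to show that start order does not matter.
    threads = [threading.Thread(target=second_engine),
               threading.Thread(target=first_engine)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

The second engine cannot read `shared["tile"]` until the first engine has written it and set the event, so the log order is the same on every run.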
Deep neural networks compiler for a trace-based accelerator
A method of compiling neural network code to executable instructions for execution by a computational acceleration system having a memory circuit and one or more acceleration circuits having a maps data buffer and a kernel data buffer is disclosed, such as for execution by an inference engine circuit architecture which includes a matrix-matrix (MM) accelerator circuit having multiple operating modes to provide a complete matrix multiplication. A representative compiling method includes generating a list of neural network layer model objects; fusing available functions and layers in the list; selecting a cooperative mode, an independent mode, or a combined cooperative and independent mode for execution; selecting a data movement mode and an ordering of computations which reduces usage of the memory circuit; generating an ordered sequence of load objects, compute objects, and store objects; and converting the ordered sequence of load objects, compute objects, and store objects into the executable instructions.