G06F9/522

PARALLEL DATA PROCESSING IN EMBEDDED SYSTEMS
20220342722 · 2022-10-27

The invention claims a computer-implemented method of lock-free parallel data processing in autonomous embedded systems comprising: one or more producers providing data, a smart object pool, an asynchronous publisher object capable of cloning event objects, a circular batching queue, an event object handler, and a subscriber arranged for sending the event objects to one or more consumers and for returning each event object to the smart object pool once it determines that no consumer needs the event any longer.
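
The pool-and-subscriber part of this abstract can be illustrated with a minimal Python sketch. It is single-threaded and therefore not lock-free; it only shows the recycling rule, namely that an event goes back to the pool once no consumer still needs it. The names `Event`, `SmartObjectPool`, `Subscriber`, `pending`, and `dispatch` are hypothetical and do not come from the patent, and the publisher, cloning, and circular batching queue are left out.

```python
from collections import deque


class Event:
    """A reusable event object; `pending` counts consumers that still need it."""

    def __init__(self):
        self.payload = None
        self.pending = 0


class SmartObjectPool:
    """Hands out recycled Event objects instead of allocating new ones."""

    def __init__(self, size):
        self._free = deque(Event() for _ in range(size))

    def acquire(self):
        return self._free.popleft()  # raises IndexError if the pool is exhausted

    def release(self, event):
        event.payload = None
        self._free.append(event)

    def available(self):
        return len(self._free)


class Subscriber:
    """Delivers an event to every consumer, then returns it to the pool."""

    def __init__(self, pool, consumers):
        self._pool = pool
        self._consumers = consumers

    def dispatch(self, event):
        event.pending = len(self._consumers)
        for consume in self._consumers:
            consume(event.payload)
            event.pending -= 1
        if event.pending == 0:  # no consumer needs the event any more
            self._pool.release(event)


pool = SmartObjectPool(size=2)
seen = []
subscriber = Subscriber(pool, [seen.append, lambda p: seen.append(p * 2)])

event = pool.acquire()
event.payload = 21
subscriber.dispatch(event)
```

After `dispatch` returns, both consumers have seen the payload and the pool is back to its full size, so the producer can reuse the same object for the next event.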

System and Method for Efficient Snapshots Barrier Mechanism for System With Presorted Container-Based Log
20220342721 · 2022-10-27

A method, computer program product, and computer system for permitting, by a computing device, entering of a barrier object of a plurality of barrier objects with a first set of one or more Application Programming Interfaces (APIs) only when the barrier object is not set. The first set of the one or more APIs on the barrier object may wait until the barrier object is reset. A second set of the one or more APIs may set the barrier object. Waiting may occur until there are no longer any flows in the barrier object.
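
The enter/set/reset behaviour described here can be sketched in Python with a condition variable. This is an illustration under assumptions, not the claimed implementation: the class name `SnapshotBarrier` and the methods `enter`, `leave`, `set_and_drain`, and `reset` are hypothetical stand-ins for the two API sets in the abstract.

```python
import threading


class SnapshotBarrier:
    """Flows may enter only while the barrier is not set."""

    def __init__(self):
        self._cond = threading.Condition()
        self._is_set = False
        self._flows = 0

    def enter(self):
        # First API set: wait until the barrier is reset, then enter.
        with self._cond:
            while self._is_set:
                self._cond.wait()
            self._flows += 1

    def leave(self):
        with self._cond:
            self._flows -= 1
            self._cond.notify_all()

    def set_and_drain(self):
        # Second API set: set the barrier, then wait until no flows remain inside.
        with self._cond:
            self._is_set = True
            while self._flows > 0:
                self._cond.wait()

    def reset(self):
        with self._cond:
            self._is_set = False
            self._cond.notify_all()


barrier = SnapshotBarrier()
barrier.enter()
barrier.leave()
barrier.set_and_drain()  # returns at once because no flow is inside

entered = threading.Event()


def flow():
    barrier.enter()  # blocks while the barrier is set
    entered.set()
    barrier.leave()


worker = threading.Thread(target=flow)
worker.start()
blocked = not entered.wait(timeout=0.1)  # the worker cannot enter yet
barrier.reset()
worker.join()
```

The worker thread stays blocked in `enter` for as long as the barrier is set and proceeds only after `reset`, which is the ordering a snapshot operation would rely on.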

Task graph scheduling for workload processing

Techniques for scheduling operations for a task graph on a processing device are provided. The techniques include receiving a task graph that specifies one or more passes, one or more resources, and one or more directed edges between passes and resources; identifying independent passes and dependent passes of the task graph; based on performance criteria of the processing device, scheduling commands to execute the passes; and transmitting scheduled commands to the processing device for execution as scheduled.
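
A small Python sketch can make the split into independent and dependent passes concrete. It assumes a hypothetical graph in which a pass depends on whichever pass writes a resource it reads, and it uses the standard-library `graphlib.TopologicalSorter` to group passes into waves that could run concurrently. The pass and resource names are invented for illustration, and the device-specific performance criteria mentioned in the abstract are not modelled.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each pass lists the resources it reads and writes.
passes = {
    "shadow": {"reads": [], "writes": ["shadow_map"]},
    "gbuffer": {"reads": [], "writes": ["gbuffer"]},
    "lighting": {"reads": ["shadow_map", "gbuffer"], "writes": ["hdr"]},
    "tonemap": {"reads": ["hdr"], "writes": ["backbuffer"]},
}


def schedule(passes):
    """Return waves of passes; passes in the same wave are independent."""
    # Map every resource to the pass that produces it.
    writer = {res: name for name, p in passes.items() for res in p["writes"]}
    # A pass depends on the writers of the resources it reads.
    deps = {
        name: {writer[res] for res in p["reads"] if res in writer}
        for name, p in passes.items()
    }
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = schedule(passes)
# waves == [["gbuffer", "shadow"], ["lighting"], ["tonemap"]]
```

A real scheduler would additionally weigh each wave against the capabilities of the target device before emitting commands; this sketch stops at the dependency analysis.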

SYNCHRONIZATION MECHANISMS FOR A MULTI-CORE PROCESSOR
20230077301 · 2023-03-09

Systems, apparatuses and methods suitable for optimizing synchronization mechanisms for multi-core processors are provided. The synchronizing mechanisms may be optimized by receiving a command stream which comprises a plurality of commands including one or more wait commands, wherein each wait command has an associated state and one or more associated conditions; sequentially processing each command in the command stream until a wait command is reached; checking the state associated with the wait command to be processed, wherein if said state is a blocking state, further processing of commands in the command stream is paused until each of said wait command's associated conditions is met, and wherein if said state is a non-blocking state, the next command in the command stream is retrieved and processed.
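
The blocking versus non-blocking handling of wait commands can be sketched as a small interpreter loop. This is a simplified software model under assumptions: the `Command` and `Wait` classes, the `run` function, and the `dma_done` flag are hypothetical names, and a paused stream is modelled by returning the index of the blocking wait so the caller can resume later.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Command:
    name: str


@dataclass
class Wait(Command):
    blocking: bool = True
    conditions: List[Callable[[], bool]] = field(default_factory=list)


def run(stream, start=0):
    """Process commands in order; return (executed names, pause index or None)."""
    executed = []
    for i in range(start, len(stream)):
        cmd = stream[i]
        if isinstance(cmd, Wait):
            unmet = not all(cond() for cond in cmd.conditions)
            if cmd.blocking and unmet:
                return executed, i  # pause until every condition is met
            continue  # non-blocking (or satisfied) wait: move to the next command
        executed.append(cmd.name)
    return executed, None


flags = {"dma_done": False}
stream = [
    Command("load"),
    Wait("w0", blocking=False, conditions=[lambda: flags["dma_done"]]),
    Command("compute"),
    Wait("w1", blocking=True, conditions=[lambda: flags["dma_done"]]),
    Command("store"),
]

first, paused_at = run(stream)       # stops at the blocking wait
flags["dma_done"] = True             # the awaited condition becomes true
rest, end = run(stream, paused_at)   # resumes from the wait and finishes
```

The non-blocking wait `w0` is passed over even though its condition is unmet, while the blocking wait `w1` holds back `store` until the flag is set.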

Gateway fabric ports

A gateway for interfacing a host with a subsystem for acting as a work accelerator to the host. The gateway enables the transfer of batches of data to the subsystem at precompiled data exchange synchronisation points. The gateway acts to route data between accelerators which are connected in a scaled system of multiple gateways and accelerators using a global address space set up at compile time of an application to run on the computer system.

System and Method for Lock-free Shared Data Access for Processing and Management Threads
20230128503 · 2023-04-27

A method, computer program product, and computing system for defining a first flow for one or more processing threads with access to shared data within the storage system. The one or more processing threads may be executed using the first flow. A processing thread reference count may be determined for the one or more processing threads being executed using the first flow. One or more management threads may be executed on the shared data within the storage system based upon, at least in part, the processing thread reference count.

Communication in a computer having multiple processors
11599363 · 2023-03-07

A computer comprising a plurality of processors, each of which is configured to perform operations on data during a compute phase for the computer and, following a pre-compiled synchronisation barrier, exchange data with at least one other of the processors during an exchange phase for the computer, wherein each of the processors in the computer is indexed and the data exchange operations carried out by each processor in the exchange phase depend upon its index value.
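
A sequential Python simulation can illustrate the compute phase, the barrier, and an index-dependent exchange. The ring pattern used here, in which processor `index` sends its result to processor `(index + 1) % n`, is an assumption chosen for illustration; the abstract only states that the exchange depends on the index value. The function name `superstep` is likewise hypothetical.

```python
def superstep(values):
    """Simulate one compute phase followed by one exchange phase."""
    n = len(values)
    # Compute phase: each processor works only on its own local data.
    computed = [v * v for v in values]
    # Synchronisation barrier: in this sequential model, all compute work
    # is finished before any exchange starts.
    # Exchange phase: the destination is derived from the sender's index.
    received = [None] * n
    for index, result in enumerate(computed):
        received[(index + 1) % n] = result
    return received


result = superstep([1, 2, 3, 4])
# result == [16, 1, 4, 9]
```

Because every processor derives its partner from its own index, the same program can be loaded on all processors without any per-processor routing table.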

Extended sync network

An apparatus is provided for converting the form in which a synchronisation request for a barrier synchronisation is provided. The synchronisation request is provided from a first synchronisation circuitry to a second synchronisation circuitry by asserting one of a set of separate signals that may each correspond to a bit in a register or a signal on a wire. The second synchronisation circuitry provides for the packetisation of the sync request by sending a packet comprising the sync request over a network to be received at a further subsystem.

Storage device write barriers

Technologies are provided for supporting storage device write barriers. A host computer can be configured to transmit a write barrier command to a storage device to indicate that one or more data access commands should be processed before one or more other data access commands are processed. For example, a host computer can transmit one or more data access commands to a storage device. The host computer can then transmit a write barrier command to the storage device. The storage device can be configured to receive the write barrier command and to associate a write barrier with the one or more data access commands. The host computer can continue to transmit additional data access commands to the storage device. However, the storage device will not process the additional data access commands until after the one or more data access commands associated with the write barrier have been processed.
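
The ordering guarantee can be modelled in a few lines of Python. This sketch assumes a hypothetical `StorageDevice` that groups commands into epochs separated by write barriers; sorting the commands inside an epoch merely stands in for whatever reordering a real device might perform, and none of the names correspond to an actual storage protocol.

```python
class StorageDevice:
    """Toy device that may reorder commands, but never across a write barrier."""

    def __init__(self):
        self._epochs = [[]]  # commands grouped by the barrier that follows them
        self.log = []        # order in which commands were actually processed

    def submit(self, command):
        self._epochs[-1].append(command)

    def write_barrier(self):
        # Commands submitted so far must complete before any later command.
        self._epochs.append([])

    def process_all(self):
        for epoch in self._epochs:
            for command in sorted(epoch):  # free reordering inside one epoch
                self.log.append(command)
        self._epochs = [[]]


device = StorageDevice()
device.submit("write B")
device.submit("write A")
device.write_barrier()
device.submit("commit")
device.process_all()
# device.log == ["write A", "write B", "commit"]
```

Without the barrier, the toy reordering would place `commit` first; with it, both writes are processed before the commit record, which is the property a journaling host would depend on.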

Data synchronization for image and vision processing blocks using pattern adapters

A hardware thread scheduler (HTS) is provided for a multiprocessor system. The HTS is configured to schedule processing of multiple threads of execution by resolving data dependencies between producer modules and consumer modules for each thread. Pattern adapters may be provided in the scheduler that allow mixing of multiple data patterns across blocks of data. Transaction aggregators may be provided that allow re-using the same image data by multiple threads of execution while the image data remains in a given data buffer. Bandwidth control may be provided using programmable delays on initiation of thread execution. Failure and hang detection may be provided using multiple watchdog timers.