G06F9/30087

System and method for asynchronous distribution of operations that require synchronous execution

A system and method may manage traffic to software applications that ingest operations into an asynchronous queue when those operations are required to execute in a synchronous manner. An identifier may be retrieved from data corresponding to each client operation. A process distribution module may be placed in front of the two incompatible systems/applications to inspect each data payload and intelligently distribute the transactions to each instance based on a well-defined algorithm (e.g., even/odd, last digit, etc.). Synchronous execution may then occur according to a timestamp for each operation.

Processor with Split Read
20230004392 · 2023-01-05 ·

An apparatus includes a processor and split-read control circuitry (SRCC). The processor is to issue a set of one or more split-read requests for loading one or more data values to one or more respective local registers of the processor. The SRCC is to receive the set of one or more split-read requests, to read the one or more data values on behalf of the processor, and to write the data values into the one or more respective local registers. The processor and the SRCC are to coordinate a status of the split-read requests via a split-read-status indication.

SYNCHRONIZATION BARRIER

Apparatuses, systems, and techniques to implement a barrier operation. In at least one embodiment, a memory barrier operation causes accesses to memory by a plurality of groups of threads to occur in an order indicated by the memory barrier operation.

System, Apparatus And Methods For Minimum Serialization In Response To Non-Serializing Register Write Instruction
20220413860 · 2022-12-29 ·

In one embodiment, a processor includes: a plurality of registers; a front end circuit to fetch and decode a non-serializing register write instruction, the non-serializing register write instruction to cause a value to be stored in a first register of the plurality of registers; and an execution circuit coupled to the front end circuit. The execution circuit, in response to the non-serializing register write instruction, is to determine an amount of serialization for the non-serializing register write instruction and execute the non-serializing register write instruction according to the amount of serialization. Other embodiments are described and claimed.

Automatic vision guided intelligent fruits and vegetables processing system and method

Intelligence guided system and method for fruits and vegetables processing includes a conveyor for carrying produces, various image acquiring and processing hardware and software, water and air jets for cutting and controlling the position and orientation of the produces, and a networking hardware and software, operating in synchronism in an efficient manner to attain speed and accuracy of the produce cutting and high yield and low waste produces processing. The 2nd generation strawberry decalyxing system (AVID2) uniquely utilizes a convolutional neural network (AVIDnet) supporting a discrimination network decision, specifically, on whether a strawberry is to be cut or rejected, and computing a multi-point cutline curvature to be cut along by rapid robotic cutting tool.

Prefetch mechanism for a cache structure

An apparatus and method is provided, the apparatus comprising a processor pipeline to execute instructions, a cache structure to store information for reference by the processor pipeline when executing said instructions; and prefetch circuitry to issue prefetch requests to the cache structure to cause the cache structure to prefetch information into the cache structure in anticipation of a demand request for that information being issued to the cache structure by the processor pipeline. The processor pipeline is arranged to issue a trigger to the prefetch circuitry on detection of a given event that will result in a reduced level of demand requests being issued by the processor pipeline, and the prefetch circuitry is configured to control issuing of prefetch requests in dependence on reception of the trigger.

Monitoring Apparatus, Device, Method, and Computer Program and Corresponding System

Examples relate to a monitoring apparatus, a monitoring device, a monitoring method, and to a corresponding computer program and system. The monitoring apparatus is configured to obtain a first compute kernel to be monitored and to obtain one or more second compute kernels. The monitoring apparatus is configured to provide instructions, using interface circuitry, to control circuitry of a computing device comprising a plurality of execution units, to instruct the control circuitry to execute the first compute kernel using a first slice of the plurality of execution units and to execute the one or more second compute kernels concurrently with the first compute kernel using one or more second slices of the plurality of execution units, and to instruct the control circuitry to provide information on a change of a status of at least one hardware counter associated with the first slice that is caused by the execution of the first compute kernel. The monitoring apparatus is configured to determine information on the execution of the first compute kernel based on the information on the change of the status of the at least one hardware counter.

Implementation of load acquire/store release instructions using load/store operation with DMB operation

A system and method are provided for simplifying load acquire and store release semantics that are used in reduced instruction set computing (RISC). Translating the semantics into micro-operations, or low-level instructions used to implement complex machine instructions, can avoid having to implement complicated new memory operations. Using one or more data memory barrier operations in conjunction with load and store operations can provide sufficient ordering as a data memory barrier ensures that prior instructions are performed and completed before subsequent instructions are executed.

Thread creation on local or remote compute elements by a multi-threaded, self-scheduling processor
11513840 · 2022-11-29 · ·

Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.

Synchronization in a multi-tile processing arrangement

A processing system comprising multiple tiles and an interconnect between the tiles. The interconnect is used to communicate between a group of some or all of the tiles according to a bulk synchronous parallel scheme, whereby each tile in the group performs an on-tile compute phase followed by an inter-tile exchange phase with the exchange phase being held back until all tiles in the group have completed the compute phase. Each tile in the group has a local exit state upon completion of the compute phase. The instruction set comprises a synchronization instruction for execution by each tile upon completion of its compute phase to signal a sync request to logic in the interconnect. In response to receiving the sync request from all the tiles in the group, the logic releases the next exchange phase and also makes available an aggregated a state of all the tiles in the group.