IPIQ

G06F9/522

Prefetch mechanism for a cache structure

11526356 · 2022-12-13 ·

Arm Limited

An apparatus and method is provided, the apparatus comprising a processor pipeline to execute instructions, a cache structure to store information for reference by the processor pipeline when executing said instructions; and prefetch circuitry to issue prefetch requests to the cache structure to cause the cache structure to prefetch information into the cache structure in anticipation of a demand request for that information being issued to the cache structure by the processor pipeline. The processor pipeline is arranged to issue a trigger to the prefetch circuitry on detection of a given event that will result in a reduced level of demand requests being issued by the processor pipeline, and the prefetch circuitry is configured to control issuing of prefetch requests in dependence on reception of the trigger.

THREAD SYNCHRONIZATION APPARATUS, THREAD SYNCHRONIZATION METHOD, AND PROGRAM

20220391265 · 2022-12-08 ·

Nippon Telegraph And Telephone Corporation

Execution speed of a thread is dynamically restored while reducing influence on processing accuracy of the whole system. A thread synchronization device (10) achieves synchronization among N threads (1-1, . . . , 1-N) to perform parallel processing by dividing time-series data into a plurality of pieces of data. Environment variables which achieve a trade-off between processing accuracy and execution speed are individually set at the threads (1-1, . . . , 1-N). An execution speed calculation unit (2) calculates execution speed of each of the threads. An environment variable update unit (3) updates the environment variables in accordance with the execution speed of the threads.

TECHNIQUES FOR EFFICIENTLY SYNCHRONIZING MULTIPLE PROGRAM THREADS

20220391264 · 2022-12-08 ·

Various embodiments include a parallel processing computer system that enables parallel instances of a program to synchronize at disparate addresses in memory. When the parallel program instances need to exchange data, the program instances synchronize based on a mask that identifies the program instances that are synchronizing. As each program instance reaches the point of synchronization, the program instance blocks and waits for all other program instances to reach the point of synchronization. When all program instances have reached the point of synchronization, at least one program instance executes a synchronous operation to exchange data. The program instances then continue execution at respective and disparate return addresses.

Synchronization between processes in a coordination namespace

11513867 · 2022-11-29 ·

International Business Machines Corporation

A system and method of supporting point-to-point synchronization among processes/nodes implementing different hardware barriers in a tuple space/coordinated namespace (CNS) extended memory storage architecture. The system-wide CNS provides an efficient means for storing data, communications, and coordination within applications and workflows implementing barriers in a multi-tier, multi-nodal tree hierarchy. The system provides a hardware accelerated mechanism to support barriers between the participating processes. Also architected is a tree structure for a barrier processing method where processes are mapped to nodes of a tree, e.g., a tree of degree k, to provide an efficient way of scaling the number of processes in a tuple space/coordination namespace.

Synchronization in a multi-tile processing arrangement

11593185 · 2023-02-28 ·

GRAPHCORE LIMITED

A processing system comprising multiple tiles and an interconnect between the tiles. The interconnect is used to communicate between a group of some or all of the tiles according to a bulk synchronous parallel scheme, whereby each tile in the group performs an on-tile compute phase followed by an inter-tile exchange phase with the exchange phase being held back until all tiles in the group have completed the compute phase. Each tile in the group has a local exit state upon completion of the compute phase. The instruction set comprises a synchronization instruction for execution by each tile upon completion of its compute phase to signal a sync request to logic in the interconnect. In response to receiving the sync request from all the tiles in the group, the logic releases the next exchange phase and also makes available an aggregated a state of all the tiles in the group.

Gateway pull model

11507416 · 2022-11-22 ·

GRAPHCORE LIMITED

Brian Manula

A computer system comprising: (i) a computer subsystem configured to act as a work accelerator, and (ii) a gateway connected to the computer subsystem, the gateway enabling the transfer of data to the computer subsystem from external storage at pre-compiled data exchange synchronization points attained by the computer subsystem, which act as a barrier between a compute phase and an exchange phase of the computer subsystem, wherein the computer subsystem is configured to pull data from a gateway transfer memory of the gateway in response to the pre-compiled data exchange synchronization point attained by the subsystem, wherein the gateway comprises at least one processor configured to perform at least one operation to pre-load at least some of the data from a first memory of the gateway to the gateway transfer memory in advance of the pre-compiled data exchange synchronization point attained by the subsystem.

Gateway Fabric Ports

20230054059 · 2023-02-23 ·

A gateway for interfacing a host with a subsystem for acting as a work accelerator to the host. The gateway enables the transfer of batches of data to the subsystem at precompiled data exchange synchronisation points. The gateway acts to route data between accelerators which are connected in a scaled system of multiple gateways and accelerators using a global address space set up at compile time of an application to run on the computer system.

MECHANISM TO PROVIDE RELIABLE RECEIPT OF EVENT MESSAGES

20230056665 · 2023-02-23 ·

Devices and techniques for providing receipts for event messages in a processor are described herein. A system includes multiple memory-compute nodes coupled to one another over a scale fabric; a set of registers; and an event manager hardware circuitry to: receive an event message corresponding to an event, and the event associated with an event mode; track a counter value representing a number of received event messages related to the event, the counter value stored in the set of registers; compare the number of received event messages to a trigger value; and in response to the number of received event messages equaling the trigger value: use an atomic operation to reset the counter value in the set of registers while maintaining the event mode; and alert a thread of the event.

Synchronization amongst processor tiles

11586483 · 2023-02-21 ·

GRAPHCORE LIMITED

A processing system comprising an arrangement of tiles and an interconnect between the tiles. The interconnect comprises synchronization logic for coordinating a barrier synchronization to be performed between a group of the tiles. The instruction set comprises a synchronization instruction taking an operand which selects one of a plurality of available modes each specifying a different membership of the group. Execution of the synchronization instruction cause a synchronization request to be transmitted from the respective tile to the synchronization logic, and instruction issue to be suspended on the respective tile pending a synchronization acknowledgement being received back from the synchronization logic. In response to receiving the synchronization request from all the tiles in the group as specified by the operand of the synchronization instruction, the synchronization logic returns the synchronization acknowledgment to the tiles in the specified group.

Synchronizing scheduling tasks with atomic ALU

11500677 · 2022-11-15 ·

Imagination Technologies Limited

A method of synchronizing a group of scheduled tasks within a parallel processing unit into a known state is described. The method uses a synchronization instruction in a scheduled task which triggers, in response to decoding of the instruction, an instruction decoder to place the scheduled task into a non-active state and forward the decoded synchronization instruction to an atomic ALU for execution. When the atomic ALU executes the decoded synchronization instruction, the atomic ALU performs an operation and check on data assigned to the group ID of the scheduled task and if the check is passed, all scheduled tasks having the particular group ID are removed from the non-active state.

Patent classifications

G06F9/522