G06F15/17325

MEMORY MANAGEMENT IN A MULTIPLE PROCESSOR SYSTEM

Methods and apparatus for memory management are described. In one example, this disclosure describes a method that includes executing, by a first processing unit, first work unit operations specified by a first work unit message, wherein execution of the first work unit operations includes accessing data from shared memory included within the computing system, modifying the data, and storing the modified data in a first cache associated with the first processing unit; identifying, by the computing system, a second work unit message that specifies second work unit operations that access the shared memory; updating, by the computing system, the shared memory by storing the modified data in the shared memory; receiving, by the computing system, an indication that updating the shared memory with the modified data is complete; and enabling the second processing unit to execute the second work unit operations.

DATA PROCESSING ENGINE TILE ARCHITECTURE FOR AN INTEGRATED CIRCUIT

An example data processing engine (DPE) for a DPE array in an integrated circuit (IC) includes: a core; a memory including a data memory and a program memory, the program memory coupled to the core, the data memory coupled to the core and including at least one connection to a respective at least one additional core external to the DPE; support circuitry including hardware synchronization circuitry and direct memory access (DMA) circuitry each coupled to the data memory; streaming interconnect coupled to the DMA circuitry and the core; and memory-mapped interconnect coupled to the core, the memory, and the support circuitry.

SYSTEMS AND METHODS FOR IMPLEMENTING AN INTELLIGENCE PROCESSING COMPUTING ARCHITECTURE

A system and method for automated data propagation and automated data processing within an integrated circuit includes an intelligence processing integrated circuit comprising at least one intelligence processing pipeline, wherein the at least one intelligence processing pipeline includes: a main data buffer that stores input data; a plurality of distinct intelligence processing tiles, wherein each distinct intelligence processing tile includes a computing circuit and a local data buffer; a token-based governance module, the token-based governance module implementing: a first token-based control data structure; a second token-based control data structure, wherein the first token-based control data structure and the second-token based control data operate in cooperation to control an automated flow of the input data and/or an automated processing of the input data through the at least one intelligence processing pipeline.

PERSISTENT KERNEL FOR GRAPHICS PROCESSING UNIT DIRECT MEMORY ACCESS NETWORK PACKET PROCESSING
20220261367 · 2022-08-18 ·

A graphics processing unit may, in accordance with a kernel, determine that at least a first packet is written to a memory buffer of the graphics processing unit by a network interface card via a direct memory access, process the at least the first packet in accordance with the kernel, and provide a first notification to a central processing unit that the at least the first packet is processed in accordance with the kernel. The graphics processing unit may further determine that at least a second packet is written to the memory buffer by the network interface card via the direct memory access, process the at least the second packet in accordance with the kernel, where the kernel comprises a persistent kernel, and provide a second notification to the central processing unit that the at least the second packet is processed in accordance with the kernel.

Interconnected Dies, Interconnected Microcomponents, Interconnected Microsystems and Their Communication Methods

The invention discloses connections among dies, in particular to interconnected dies, comprising: protocol conversion circuits, external interconnected interfaces and networks on die. The protocol conversion circuits comprise multiple protocol conversion modules which are used to communicate with various standard mainstream protocol interfaces. The external interconnected interfaces comprise several synchronization controllers which are used to communicate with other interconnected dies. The networks on die comprise transmission buses and routers which are used to transmit data packets from the interfaces or other interconnected dies. At the same time, the synchronization controllers and the protocol conversion modules are respectively connected with boundary nodes of the networks on die. The interconnected dies support interface expansion and inter-chip cascades, hardware circuits are simple and functional levels are clearly divided, with good expandability, which overcomes the defects of closed technology, numerous and jumbled systems and bad expandability of the current multi-die systems.

Instruction Set

The invention relates to a computer program comprising a sequence of instructions for execution on a processing unit having instruction storage for holding the computer program, an execution unit for executing the computer program and data storage for holding data, the computer program comprising one or more computer executable instruction which, when executed, implements: a send function which causes a data packet destined for a recipient processing unit to be transmitted on a set of connection wires connected to the processing unit, the data packet having no destination identifier but being transmitted at a predetermined transmit time; and a switch control function which causes the processing unit to control switching circuitry to connect a set of connection wires of the processing unit to a switching fabric to receive a data packet at a predetermined receive time.

Controlling timing in computer processing

A computer program comprising a sequence of instructions for execution on a processing unit having instruction storage for holding the computer program, an execution unit for executing the computer program and data storage for holding data, the computer program comprising: a switch control instruction which when executed causes the processing unit to control switching circuitry to connect a set of connection wires of the processing unit to a switching fabric to receive a data packet at a predetermined received time, the switch control instruction comprising a delay control field which holds a value defining a delay between issuance of the instruction in the sequence of instructions and its execution by the execution unit.

Methods and apparatus for correcting out-of-order data transactions between processors
11379278 · 2022-07-05 · ·

Methods and apparatus for correcting out-of-order data transactions over an inter-processor communication (IPC) link between two (or more) independently operable processors. In one embodiment, a peripheral-side processor receives data from an external device and stores it to memory. The host processor writes data structures (transfer descriptors) describing the received data, regardless of the order the data was received from the external device. The transfer descriptors are written to a memory structure (transfer descriptor ring) in memory shared between the host and peripheral processors. The peripheral reads the transfer descriptors and writes data structures (completion descriptors) to another memory structure (completion descriptor ring). The completion descriptors are written to enable the host processor to retrieve the stored data in the correct order. In optimized variants, a completion descriptor describes groups of transfer descriptors. In some variants, the peripheral processor caches the transfer descriptors to offload them from the transfer descriptor ring.

Control flow barrier and reconfigurable data processor

A reconfigurable data processor comprises an array of processing units arranged to perform execution fragments of a data processing operation. A control barrier network is coupled to processing units in the array. The control barrier network comprises a control bus configurable to form signal routes in the control barrier network, and a plurality of control barrier logic units having inputs and outputs connected to the control bus and to the array of processing units. The logic units in the plurality of logic units are configurable to consume source tokens and status signals on the inputs and produce barrier tokens on the outputs based on the source tokens and status signals on the inputs. Also, the logic units can produce enable signals for the array of processing units based on the source tokens and status signals on the inputs.

DISTRIBUTING A GLOBAL COUNTER VALUE IN A MULTI-SOCKET SYSTEM-ON-CHIP COMPLEX

Apparatuses, systems, and methods for distributing a global counter value in a multi-socket SoC complex. In exemplary aspects, an apparatus comprises a first system-on-a-chip (SoC) in a first socket and a second SoC in a second socket. The apparatus further comprises a reset circuit coupled to the first SoC and the second SoC, a reset synchronization circuit coupled to the reset circuit, the first SoC, and the second SoC, and a global counter clock signal coupled to the reset synchronization circuit, the first SoC, and the second SoC. The reset synchronization circuit is configured to generate a global counter reset signal in response to a reset signal received from the reset circuit and to distribute the global counter reset signal to the first SoC and the second SoC substantially simultaneously.