Patent classifications
G06F15/17325
BROADCAST SYNCHRONIZATION FOR DYNAMICALLY ADAPTABLE ARRAYS
An array processor includes processor element arrays (PEAs) distributed in rows and columns. The PEAs are configured to perform operations on parameter values. A first sequencer receives a first direct memory access (DMA) instruction that includes a request to read data from at least one address in memory. A texture address (TA) engine requests the data from the memory based on the at least one address and a texture data (TD) engine provides the data to the PEAs. The PEAs provide first synchronization signals to the TD engine to indicate availability of registers for receiving the data. The TD engine provides second synchronization signals to the first sequencer in response to receiving acknowledgments that the PEAs have consumed the data.
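As an illustration only (not the patented implementation), the handshake described above can be modeled in software: PEAs signal register availability, the TD engine broadcasts data and waits for consumption acknowledgments, then notifies the sequencer. All class and method names below are hypothetical.

```python
# Minimal software model of the described handshake; names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class PEA:
    """Processor element array cell with one receive register."""
    register: object = None

    def register_available(self) -> bool:
        return self.register is None           # first sync signal: register free

    def consume(self) -> bool:
        data, self.register = self.register, None
        return data is not None                # acknowledgment of consumption

@dataclass
class TDEngine:
    """Texture data engine broadcasting fetched data to the PEAs."""
    peas: list = field(default_factory=list)

    def broadcast(self, data) -> bool:
        if not all(p.register_available() for p in self.peas):
            return False                       # wait for first sync signals
        for p in self.peas:
            p.register = data
        acks = all(p.consume() for p in self.peas)
        return acks                            # second sync signal to sequencer

peas = [PEA() for _ in range(4)]
td = TDEngine(peas)
print("sequencer notified:", td.broadcast(data=[1, 2, 3]))
```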
System, board card and electronic device for data accelerated processing
The present disclosure relates to a system, a computing apparatus, a board card, and an electronic device for data accelerated processing. The computing apparatus may be included in a combined processing apparatus. The combined processing apparatus may also include a universal interconnection interface and other processing apparatuses. The computing apparatus interacts with the other processing apparatuses to jointly complete computing operations specified by the user. The combined processing apparatus may also include a storage apparatus that is connected to both the computing apparatus and the other processing apparatuses and stores data for each of them. The solution of the present disclosure can be applied to various electronic devices.
Relay consistent memory management in a multiple processor system
Methods and apparatus for memory management are described. In one example, this disclosure describes a method that includes executing, by a first processing unit, first work unit operations specified by a first work unit message, wherein execution of the first work unit operations includes accessing data from shared memory included within a computing system, modifying the data, and storing the modified data in a first cache associated with the first processing unit; identifying, by the computing system, a second work unit message that specifies second work unit operations that access the shared memory; updating, by the computing system, the shared memory by storing the modified data in the shared memory; receiving, by the computing system, an indication that updating the shared memory with the modified data is complete; and enabling a second processing unit to execute the second work unit operations.
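A toy sequence illustrating the described ordering (hedged: the work-unit format, cache model, and function names are invented for illustration): the second work unit runs only after the first unit's cached modifications have been flushed back to shared memory and completion has been indicated.

```python
# Illustrative ordering only; data structures and names are hypothetical.
shared_memory = {"buf": 0}

def run_first_work_unit(cache: dict) -> None:
    # First processing unit reads shared data, modifies it in its local cache.
    cache["buf"] = shared_memory["buf"] + 1

def flush(cache: dict) -> bool:
    # Update shared memory with the modified data; return completion indication.
    shared_memory.update(cache)
    return True

def run_second_work_unit() -> int:
    # Second work unit sees the updated shared memory.
    return shared_memory["buf"]

cache0 = {}
run_first_work_unit(cache0)          # modified data lives only in cache0
if flush(cache0):                    # indication that the update is complete
    print(run_second_work_unit())    # enabled only after the flush -> prints 1
```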
Systems and methods for implementing an intelligence processing computing architecture
A system and method for automated data propagation and automated data processing within an integrated circuit includes an intelligence processing integrated circuit comprising at least one intelligence processing pipeline, wherein the at least one intelligence processing pipeline includes: a main data buffer that stores input data; a plurality of distinct intelligence processing tiles, wherein each distinct intelligence processing tile includes a computing circuit and a local data buffer; and a token-based governance module, the token-based governance module implementing: a first token-based control data structure; and a second token-based control data structure, wherein the first token-based control data structure and the second token-based control data structure operate in cooperation to control an automated flow of the input data and/or an automated processing of the input data through the at least one intelligence processing pipeline.
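As a rough sketch (the token semantics below are assumed, not taken from the patent), two token counters can cooperate to gate data flow into a tile: one tracking data available in the main buffer and one tracking free space in the tile's local buffer.

```python
# Hypothetical token-gated flow control between a main buffer and one tile.
from collections import deque

class TokenGate:
    def __init__(self, tokens: int):
        self.tokens = tokens
    def acquire(self) -> bool:
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False
    def release(self) -> None:
        self.tokens += 1

main_buffer = deque([10, 20, 30])
tile_buffer = deque()

data_ready = TokenGate(tokens=len(main_buffer))  # first control structure
space_free = TokenGate(tokens=2)                 # second control structure

# Data propagates automatically only while both token gates permit it.
while data_ready.acquire():
    if not space_free.acquire():
        data_ready.release()
        break
    tile_buffer.append(main_buffer.popleft())

print(list(tile_buffer))  # [10, 20] with the assumed local-buffer capacity of 2
```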
PIPELINE COMPUTING APPARATUS, PROGRAMMABLE LOGIC CONTROLLER, AND PIPELINE PROCESSING EXECUTION METHOD
A pipeline computing apparatus (110) comprises: a computing unit (120) configured as a pipeline; a node monitoring unit (161) that obtains a node processing time; a queue monitoring unit (162) that obtains an accumulated message amount; a priority variable calculating unit (163) that calculates a priority variable for a node on the basis of the node processing time and the accumulated message amount in the reception queue of the stage preceding that node; and a time allocating unit (164) that allocates operating time to each of the nodes in accordance with the priority variable.
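One plausible (but assumed) way to combine the two monitored quantities is to weight each node's priority by its processing time times its upstream queue backlog and then allocate operating time proportionally; the abstract does not specify the exact formula, so the combination rule below is illustrative only.

```python
# Hypothetical priority calculation; the real combination rule is not given here.
nodes = {
    "nodeA": {"processing_time": 2.0, "queued_messages": 10},
    "nodeB": {"processing_time": 1.0, "queued_messages": 40},
    "nodeC": {"processing_time": 4.0, "queued_messages": 5},
}

def priority(stats: dict) -> float:
    # Assumed: slower nodes with heavier backlogs get a larger priority variable.
    return stats["processing_time"] * stats["queued_messages"]

total = sum(priority(s) for s in nodes.values())
cycle_time_ms = 100.0

allocation = {name: cycle_time_ms * priority(s) / total for name, s in nodes.items()}
print(allocation)  # nodeA: 25.0 ms, nodeB: 50.0 ms, nodeC: 25.0 ms
```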
SYNCHRONIZATION CIRCUIT AND SYNCHRONIZATION CHIP
The present disclosure provides a synchronization circuit including M group synchronization signal generating circuits and a node synchronization signal generating circuit. In the synchronization circuit provided by the embodiments of the present disclosure, synchronization indication signals can be generated separately by the plurality of group synchronization signal generating circuits so as to drive the node synchronization signal generating circuit to generate synchronization signals, thereby efficiently implementing synchronization control over a plurality of nodes in a multi-node environment.
SYNCHRONIZATION SIGNAL GENERATING CIRCUIT, CHIP AND SYNCHRONIZATION METHOD AND DEVICE, BASED ON MULTI-CORE ARCHITECTURE
The present disclosure provides a synchronization signal generating circuit, a chip, and a synchronization method and device based on a multi-core architecture, configured to generate a synchronization signal for M node groups, wherein each of the node groups includes at least one node, and M is an integer greater than or equal to 1. The synchronization signal generating circuit includes a synchronization signal generating sub-circuit and M group ready signal generating sub-circuits. The M group ready signal generating sub-circuits are in one-to-one correspondence with the M node groups. The synchronization signal generating sub-circuit generates a first synchronization signal based on a first to-be-started signal, wherein the first synchronization signal is configured to instruct K nodes in a first node group to start synchronization.
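The gating logic behind this kind of circuit (and the group-level circuit in the preceding abstract) can be pictured as follows. This is a minimal sketch; the signal names and the all-groups-ready combination are assumptions, not the claimed circuit.

```python
# Illustrative model of ready-signal gating; all names are hypothetical.
node_groups = {
    "group0": {"node0": True, "node1": True},   # K = 2 nodes in the first group
    "group1": {"node2": True},
}

def group_ready(group: dict) -> bool:
    # One group ready signal generating sub-circuit per node group.
    return all(group.values())

def to_be_started(groups: dict) -> bool:
    # Assumed: asserted once every group reports ready.
    return all(group_ready(g) for g in groups.values())

def first_synchronization_signal(groups: dict, target: str) -> list:
    # Instructs the nodes of the target group to start synchronization.
    if to_be_started(groups):
        return list(groups[target].keys())
    return []

print(first_synchronization_signal(node_groups, "group0"))  # ['node0', 'node1']
```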
NEURAL PROCESSING UNIT SYNCHRONIZATION SYSTEMS AND METHODS
Systems and methods for exchanging synchronization information between processing units using a synchronization network are disclosed. The disclosed systems and methods include a device including a host and associated neural processing units. Each of the neural processing units can include a command communication module and a synchronization communication module. The command communication module can include circuitry for communicating with the host over a host network. The synchronization communication module can include circuitry enabling communication between neural processing units over a synchronization network. The neural processing units can be configured to each obtain a synchronized update for a machine learning model. This synchronized update can be obtained at least in part by exchanging synchronization information using the synchronization network. The neural processing units can each maintain a version of the machine learning model and can synchronize it using the synchronized update.
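A common way to realize such a synchronized update is an all-reduce style average exchanged over a separate synchronization channel; the sketch below assumes that approach, which the abstract does not mandate, and every name in it is invented for illustration.

```python
# Hypothetical all-reduce-style synchronization of per-NPU model updates.
from statistics import mean

class NeuralProcessingUnit:
    def __init__(self, weight: float):
        self.weight = weight          # local version of a one-parameter model

    def local_update(self) -> float:
        # Each NPU computes its own candidate update (e.g., a gradient step).
        return self.weight * 0.9

    def apply(self, synchronized_update: float) -> None:
        self.weight = synchronized_update

npus = [NeuralProcessingUnit(w) for w in (1.0, 2.0, 3.0)]

# Exchange over the synchronization network, modeled as a gather plus average.
synchronized = mean(npu.local_update() for npu in npus)
for npu in npus:
    npu.apply(synchronized)

print([npu.weight for npu in npus])   # every NPU now holds the same model version
```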
Sync group selection
Implicit sync group selection is performed by having dual interfaces to a gateway. A subsystem coupled to the gateway selects the sync group to be used for an upcoming exchange by selecting the interface to which a sync request is written. The gateway propagates the sync requests and/or acknowledgments in dependence upon configuration settings for the sync group that is associated with the interface to which the sync request was written.
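Conceptually, the interface choice acts as the sync-group selector. The mapping below is a hypothetical sketch of that idea; the interface names, group names, and propagation targets are all assumptions.

```python
# Hypothetical model: which gateway interface receives the sync request
# implicitly selects the sync group whose configuration is then applied.
SYNC_GROUP_CONFIG = {
    "interface_a": {"sync_group": "GW1", "propagate_to": ["local_accelerators"]},
    "interface_b": {"sync_group": "GW2", "propagate_to": ["local_accelerators",
                                                          "remote_gateways"]},
}

def write_sync_request(interface: str) -> str:
    # The gateway propagates the request/acknowledgment per the selected group.
    config = SYNC_GROUP_CONFIG[interface]
    targets = ", ".join(config["propagate_to"])
    return f"sync group {config['sync_group']} selected; propagating to {targets}"

print(write_sync_request("interface_a"))
print(write_sync_request("interface_b"))
```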
Control of data sending from a multi-processor device
A method for controlling the sending of data by a plurality of processors belonging to a device, the method comprising: sending a first message to a first processor of the plurality of processors to grant permission to the first processor to send a first set of data packets over at least one external interface of the device; receiving, from the first processor, an identifier of a second processor of the plurality of processors; and, in response to receipt of the identifier of the second processor, sending a second message to the second processor to grant permission to the second processor to send a second set of data packets over the at least one external interface.
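The grant-and-handoff flow can be illustrated as follows. This is a sketch under assumed names (the controller loop, packet lists, and identifiers are invented): each processor, once granted permission, sends its packets and names the processor that should receive permission next.

```python
# Hypothetical permission-passing loop between processors sharing one interface.
class Processor:
    def __init__(self, name: str, packets: list, next_sender: str):
        self.name = name
        self.packets = packets
        self.next_sender = next_sender      # identifier returned after sending

    def send_on_grant(self) -> str:
        for packet in self.packets:
            print(f"{self.name} -> external interface: {packet}")
        return self.next_sender             # tells the controller who sends next

processors = {
    "P0": Processor("P0", ["pkt0", "pkt1"], next_sender="P1"),
    "P1": Processor("P1", ["pkt2"], next_sender="P0"),
}

# Controller grants permission to P0, then to whichever processor P0 names.
granted = "P0"
for _ in range(2):                          # two grant messages in this example
    granted = processors[granted].send_on_grant()
```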