ON-CHIP NETWORK DESIGN METHOD FOR DISTRIBUTED PARALLEL OPERATION ALGORITHM
20230269200 · 2023-08-24
Inventors
CPC classification
Y02D30/70
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
International classification
Abstract
The present disclosure relates to an on-chip network design method for a distributed parallel operation algorithm. According to a distributed parallel operation algorithm of an on-chip network, the on-chip network is divided into two layers, including a unicast network and a multicast network, where the unicast network is configured to implement point-to-point propagation between nodes and transmit independent operation data required by operation nodes to each operation node in a form of unicast; and the multicast network is a customized multicast network for the distributed parallel operation algorithm and configured to transmit common operation data to all the operation nodes, such that a data packet in the network is efficiently transmitted through a combination of the unicast network and the multicast network. A multicast tree transmission architecture is designed for the distributed parallel operation algorithm, with a bidirectional replication node or a receiving node disposed in each operation node.
Claims
1. An on-chip network design method for a distributed parallel operation algorithm, the method comprising: according to a distributed parallel operation algorithm of an on-chip network, dividing the on-chip network into two layers, comprising a unicast network and a multicast network; wherein the unicast network is configured to implement point-to-point propagation between nodes and transmit independent operation data required by operation nodes to each operation node in a form of unicast; and wherein the multicast network is a customized multicast network for the distributed parallel operation algorithm and configured to transmit common operation data to all the operation nodes, such that a data packet in the network is efficiently transmitted through a combination of the unicast network and the multicast network.
2. The on-chip network design method according to claim 1, wherein the multicast network comprises two types of nodes, namely, bidirectional replication nodes and receiving nodes, wherein a next level of each of the bidirectional replication nodes is connected to two bidirectional replication nodes or receiving nodes, all the nodes in the multicast network jointly form a tree node graph, each multicast operation is transmitted from a top node of the tree to all bottom nodes of the tree, and reasonable design of the bidirectional replication node and the receiving node ensures better performance when resource usage is relatively small.
3. The on-chip network design method according to claim 2, wherein the bidirectional replication node decodes and stores data in a multicast packet sent by a previous level and copies and transmits the data packet to two nodes at a next level, and a node at the last level is the receiving node that receives and decodes the multicast packet and stores the data.
4. The on-chip network design method according to claim 1, wherein a running process of the entire on-chip network is as follows: s1: when an algorithm operation starts, a data input node receives multicast data and unicast data sent by a sensor, and then the node packs the multicast data and performs a multicast operation on the multicast data by using the multicast network, sends the multicast data to each operation node, and then sequentially packs and sends the unicast data to a corresponding operation node in the unicast network by using a unicast operation; and s2: each operation node starts an operation after receiving the corresponding multicast data and the corresponding unicast data, and continuously packs and sends an operation result to a storage node during the operation until all distributed parallel operations are completed, and a RISC-V processor node accesses the stored data by using the unicast network.
5. The on-chip network design method according to claim 2, wherein a running process of the entire on-chip network is as follows: s1: when an algorithm operation starts, a data input node receives multicast data and unicast data sent by a sensor, and then the node packs the multicast data and performs a multicast operation on the multicast data by using the multicast network, sends the multicast data to each operation node, and then sequentially packs and sends the unicast data to a corresponding operation node in the unicast network by using a unicast operation; and s2: each operation node starts an operation after receiving the corresponding multicast data and the corresponding unicast data, and continuously packs and sends an operation result to a storage node during the operation until all distributed parallel operations are completed, and a RISC-V processor node accesses the stored data by using the unicast network.
6. The on-chip network design method according to claim 3, wherein a running process of the entire on-chip network is as follows: s1: when an algorithm operation starts, a data input node receives multicast data and unicast data sent by a sensor, and then the node packs the multicast data and performs a multicast operation on the multicast data by using the multicast network, sends the multicast data to each operation node, and then sequentially packs and sends the unicast data to a corresponding operation node in the unicast network by using a unicast operation; and s2: each operation node starts an operation after receiving the corresponding multicast data and the corresponding unicast data, and continuously packs and sends an operation result to a storage node during the operation until all distributed parallel operations are completed, and a RISC-V processor node accesses the stored data by using the unicast network.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0024] To describe the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the following briefly describes the accompanying drawings required for describing the embodiments or the prior art. Evidently, the accompanying drawings in the following description show merely some embodiments of the present disclosure, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
[0025]
[0026]
[0027]
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0028] The on-chip network design method for a distributed parallel operation algorithm in the present disclosure is further described in detail below with reference to the accompanying drawings and the embodiments.
[0029] An on-chip network design method for a distributed parallel operation algorithm is provided, where the method includes: according to a distributed parallel operation algorithm of an on-chip network, dividing the on-chip network into two layers, including a unicast network and a multicast network, where the unicast network is configured to implement point-to-point propagation between nodes and transmit independent operation data required by operation nodes to each operation node in a form of unicast; and the multicast network is a customized multicast network for the distributed parallel operation algorithm and configured to transmit common operation data to all the operation nodes, such that a data packet in the network is efficiently transmitted through a combination of the unicast network and the multicast network.
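As a rough illustration of why the two-layer split can help, the following sketch counts how many packets the data source must inject when common data is sent once into a multicast layer, compared with repeating it for every operation node over unicast. The function names and the one-packet-per-data-item traffic model are assumptions made for illustration only; they do not appear in the disclosure.

```python
def injections_unicast_only(num_nodes: int, common_items: int,
                            private_items_per_node: int) -> int:
    """Source-side packet count when common data is repeated per node over unicast."""
    return num_nodes * (common_items + private_items_per_node)


def injections_two_layer(num_nodes: int, common_items: int,
                         private_items_per_node: int) -> int:
    """Source-side packet count when common data is injected once into a multicast layer."""
    return common_items + num_nodes * private_items_per_node


if __name__ == "__main__":
    n, common, private = 16, 8, 2  # hypothetical workload sizes
    print(injections_unicast_only(n, common, private))  # 16 * (8 + 2) = 160
    print(injections_two_layer(n, common, private))     # 8 + 16 * 2 = 40
```

Under this simplified model the saving grows with both the number of operation nodes and the share of data that is common to all of them.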
[0030] Further, the multicast network includes two types of nodes, namely, bidirectional replication nodes and receiving nodes, where a next level of each of the bidirectional replication nodes is connected to two bidirectional replication nodes or receiving nodes, all the nodes in the multicast network jointly form a tree node graph, each multicast operation is transmitted from a top node of the tree to all bottom nodes of the tree, and a suitable design of the bidirectional replication nodes and the receiving nodes provides better performance at relatively low resource usage.
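The size of such a tree can be estimated with a short calculation. The following assumes, as a simplification not stated in the disclosure, a perfect binary tree with $d$ levels below the top node, in which every node above the last level is a bidirectional replication node and every node at the last level is a receiving node. The symbols $d$, $N_{\text{rep}}$, $N_{\text{recv}}$, and $N_{\text{total}}$ are introduced here for illustration.

```latex
\begin{align}
N_{\text{recv}}  &= 2^{d}, \\
N_{\text{rep}}   &= \sum_{k=0}^{d-1} 2^{k} = 2^{d} - 1, \\
N_{\text{total}} &= N_{\text{rep}} + N_{\text{recv}} = 2^{d+1} - 1 .
\end{align}
```

For example, $d = 3$ gives 7 replication nodes and 8 receiving nodes, 15 nodes in total. Under the same assumption, a multicast packet passes through $d$ replication steps on its way from the top node to the last level, so the number of steps grows logarithmically with the number of nodes.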
[0031] Further, the bidirectional replication node decodes and stores data in a multicast packet sent by a previous level and copies and transmits the data packet to two nodes at a next level, and a node at the last level is the receiving node that receives and decodes the multicast packet and stores the data.
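The replicate-and-forward behavior described above can be sketched in software as follows. This is a minimal model under stated assumptions, not the micro architecture of the disclosure: the `TreeNode` class, the `deliver` and `build_tree` names, the perfect-binary-tree shape, and the treatment of "decoding" as a plain copy are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TreeNode:
    """One multicast-tree node (hypothetical software model)."""
    node_id: int
    children: List["TreeNode"] = field(default_factory=list)
    stored: Optional[bytes] = None

    @property
    def is_receiving(self) -> bool:
        # A node with no next level only receives; otherwise it replicates.
        return not self.children

    def deliver(self, packet: bytes) -> int:
        """Store the payload locally, then copy the packet to each child.

        Returns the number of nodes that stored the data.
        """
        self.stored = packet  # "decode and store" modeled as a plain copy
        return 1 + sum(child.deliver(packet) for child in self.children)


def build_tree(depth: int, next_id: Optional[List[int]] = None) -> TreeNode:
    """Build a perfect binary tree with `depth` levels below the top node."""
    if next_id is None:
        next_id = [0]  # mutable counter shared across recursive calls
    node = TreeNode(node_id=next_id[0])
    next_id[0] += 1
    if depth > 0:
        node.children = [build_tree(depth - 1, next_id) for _ in range(2)]
    return node
```

For example, `build_tree(2).deliver(b"common")` reaches 7 nodes in this model: 3 replication nodes and 4 receiving nodes.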
[0032] Further, a running process of the entire on-chip network is as follows:
[0033] s1: when an algorithm operation starts, a data input node receives multicast data and unicast data sent by a sensor, and then the node packs the multicast data and performs a multicast operation on the multicast data by using the multicast network, sends the multicast data to each operation node, and then sequentially packs and sends the unicast data to a corresponding operation node in the unicast network by using a unicast operation; and
[0034] s2: each operation node starts an operation after receiving the corresponding multicast data and the corresponding unicast data, and continuously packs and sends an operation result to a storage node during the operation until all distributed parallel operations are completed, and a RISC-V processor node accesses the stored data by using the unicast network.
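The running process in steps s1 and s2 can be sketched as a small sequential simulation. Everything named here is an assumption for illustration: the functions `run_network` and `processor_read`, the integer node IDs, and the dot product that stands in for the operation, which the disclosure does not specify. The sketch also sends a single result per operation node, whereas the disclosure describes results being sent continuously during the operation.

```python
from typing import Dict, List


def run_network(common: List[int], private: Dict[int, List[int]]) -> Dict[int, int]:
    """Simulate steps s1 and s2 for operation nodes keyed by hypothetical integer IDs."""
    # s1, first part: the data input node multicasts the common data to every node.
    multicast_buf = {node: list(common) for node in private}
    # s1, second part: it then sends each node's independent data, one node at a time.
    unicast_buf: Dict[int, List[int]] = {}
    for node, data in private.items():
        unicast_buf[node] = list(data)

    # s2: each node computes once both inputs are present and sends its result
    # to the storage node. A dot product is a placeholder for the real operation.
    storage: Dict[int, int] = {}
    for node in private:
        result = sum(c * p for c, p in zip(multicast_buf[node], unicast_buf[node]))
        storage[node] = result  # modeled as a unicast packet to the storage node
    return storage


def processor_read(storage: Dict[int, int], node: int) -> int:
    """Model the RISC-V processor node reading a stored result over unicast."""
    return storage[node]


if __name__ == "__main__":
    stored = run_network([1, 2, 3], {0: [1, 0, 0], 1: [0, 1, 0], 2: [1, 1, 1]})
    print(processor_read(stored, 2))  # 1*1 + 2*1 + 3*1 = 6
```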
[0035] Working principle: As shown in
[0036] The multicast network includes bidirectional replication nodes and receiving nodes, where a next level of each of the bidirectional replication nodes is connected to two bidirectional replication nodes or receiving nodes, and all the nodes in the multicast network jointly form a tree node graph. Each multicast operation is transmitted from a top node of the tree to all bottom nodes of the tree. A micro architecture of a bidirectional replication node is shown in
[0037] The foregoing embodiments are only intended to describe the preferred implementations of the present disclosure, rather than to limit the concept and scope of the present disclosure. Various modifications and improvements made to the technical solution of the present disclosure by those of ordinary skill in the art without departing from the design concept of the present disclosure shall fall within the claimed scope of the present disclosure. The technical content claimed by the present disclosure has been fully recorded in the claims.