ON-CHIP NETWORK DESIGN METHOD FOR DISTRIBUTED PARALLEL OPERATION ALGORITHM

20230269200 · 2023-08-24

    Abstract

    The present disclosure relates to an on-chip network design method for a distributed parallel operation algorithm. According to the distributed parallel operation algorithm, the on-chip network is divided into two layers: a unicast network and a multicast network. The unicast network implements point-to-point propagation between nodes and transmits the independent operation data required by each operation node to that node in the form of unicast. The multicast network is customized for the distributed parallel operation algorithm and transmits common operation data to all the operation nodes, such that data packets are transmitted efficiently through a combination of the two networks. A multicast tree transmission architecture is designed for the distributed parallel operation algorithm, in which a bidirectional replication node or a receiving node is disposed in each operation node.

    Claims

    1. An on-chip network design method for a distributed parallel operation algorithm, the method comprising: according to a distributed parallel operation algorithm of an on-chip network, dividing the on-chip network into two layers, comprising a unicast network and a multicast network; wherein the unicast network is configured to implement point-to-point propagation between nodes and transmit independent operation data required by operation nodes to each operation node in a form of unicast; and wherein the multicast network is a customized multicast network for the distributed parallel operation algorithm and configured to transmit common operation data to all the operation nodes, such that a data packet in the network is efficiently transmitted through a combination of the unicast network and the multicast network.

    2. The on-chip network design method according to claim 1, wherein the multicast network comprises two types of nodes, namely, bidirectional replication nodes and receiving nodes, wherein a next level of each of the bidirectional replication nodes is connected to two bidirectional replication nodes or receiving nodes, all the nodes in the multicast network jointly form a tree node graph, each multicast operation is transmitted from a top node of the tree to all bottom nodes of the tree, and reasonable design of the bidirectional replication node and the receiving node ensures better performance when resource usage is relatively small.

    3. The on-chip network design method according to claim 2, wherein the bidirectional replication node decodes and stores data in a multicast packet sent by a previous level and copies and transmits the data packet to two nodes at a next level, and a node at the last level is the receiving node that receives and decodes the multicast packet and stores the data.

    4. The on-chip network design method according to claim 1, wherein a running process of the entire on-chip network is as follows: s1: when an algorithm operation starts, a data input node receives multicast data and unicast data sent by a sensor, and then the node packs the multicast data and performs a multicast operation on the multicast data by using the multicast network, sends the multicast data to each operation node, and then sequentially packs and sends the unicast data to a corresponding operation node in the unicast network by using a unicast operation; and s2: each operation node starts an operation after receiving the corresponding multicast data and the corresponding unicast data, and continuously packs and sends an operation result to a storage node during the operation until all distributed parallel operations are completed, and an RISC-V processor node accesses the stored data by using the unicast network.

    5. The on-chip network design method according to claim 2, wherein a running process of the entire on-chip network is as follows: s1: when an algorithm operation starts, a data input node receives multicast data and unicast data sent by a sensor, and then the node packs the multicast data and performs a multicast operation on the multicast data by using the multicast network, sends the multicast data to each operation node, and then sequentially packs and sends the unicast data to a corresponding operation node in the unicast network by using a unicast operation; and s2: each operation node starts an operation after receiving the corresponding multicast data and the corresponding unicast data, and continuously packs and sends an operation result to a storage node during the operation until all distributed parallel operations are completed, and an RISC-V processor node accesses the stored data by using the unicast network.

    6. The on-chip network design method according to claim 3, wherein a running process of the entire on-chip network is as follows: s1: when an algorithm operation starts, a data input node receives multicast data and unicast data sent by a sensor, and then the node packs the multicast data and performs a multicast operation on the multicast data by using the multicast network, sends the multicast data to each operation node, and then sequentially packs and sends the unicast data to a corresponding operation node in the unicast network by using a unicast operation; and s2: each operation node starts an operation after receiving the corresponding multicast data and the corresponding unicast data, and continuously packs and sends an operation result to a storage node during the operation until all distributed parallel operations are completed, and an RISC-V processor node accesses the stored data by using the unicast network.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0024] To describe the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly described below. The drawings described below show merely some embodiments of the present disclosure, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

    [0025] FIG. 1 is a structural diagram of a typical mesh on-chip network.

    [0026] FIG. 2 is an architecture diagram of a dual-layer on-chip network.

    [0027] FIG. 3 shows the microarchitecture of a bidirectional replication node.

    DETAILED DESCRIPTION OF THE EMBODIMENTS

    [0028] The on-chip network design method for a distributed parallel operation algorithm in the present disclosure is described in further detail below with reference to the accompanying drawings and the embodiments.

    [0029] An on-chip network design method for a distributed parallel operation algorithm is provided, where the method includes: according to a distributed parallel operation algorithm of an on-chip network, dividing the on-chip network into two layers, including a unicast network and a multicast network, where the unicast network is configured to implement point-to-point propagation between nodes and transmit independent operation data required by operation nodes to each operation node in a form of unicast; and the multicast network is a customized multicast network for the distributed parallel operation algorithm and configured to transmit common operation data to all the operation nodes, such that a data packet in the network is efficiently transmitted through a combination of the unicast network and the multicast network.

    [0030] Further, the multicast network includes two types of nodes, namely, bidirectional replication nodes and receiving nodes, where a next level of each of the bidirectional replication nodes is connected to two bidirectional replication nodes or receiving nodes, all the nodes in the multicast network jointly form a tree node graph, each multicast operation is transmitted from a top node of the tree to all bottom nodes of the tree, and reasonable design of the bidirectional replication node and the receiving node ensures better performance when resource usage is relatively small.

    [0031] Further, the bidirectional replication node decodes and stores data in a multicast packet sent by a previous level and copies and transmits the data packet to two nodes at a next level, and a node at the last level is the receiving node that receives and decodes the multicast packet and stores the data.
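
    The tree described in the two paragraphs above can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the tree is laid out in heap order (node i has children 2i+1 and 2i+2), the number of nodes is odd so that every bidirectional replication node has exactly two children, and the names `TreeNode`, `build_multicast_tree`, and `multicast` are hypothetical. The disclosure does not specify how nodes are numbered or whether the tree is balanced.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TreeNode:
    """One node of the multicast tree (hypothetical model, not from the source)."""
    node_id: int
    children: List["TreeNode"] = field(default_factory=list)
    received: List[bytes] = field(default_factory=list)

    @property
    def is_replication_node(self) -> bool:
        # A bidirectional replication node feeds two nodes at the next level;
        # a receiving node (leaf) has no children.
        return len(self.children) == 2


def build_multicast_tree(num_nodes: int) -> List[TreeNode]:
    """Return all nodes in heap order; element 0 is the top node of the tree."""
    if num_nodes < 1 or num_nodes % 2 == 0:
        # Assumption: an odd node count keeps every internal node at two children.
        raise ValueError("num_nodes must be a positive odd number")
    nodes = [TreeNode(i) for i in range(num_nodes)]
    for i, node in enumerate(nodes):
        left, right = 2 * i + 1, 2 * i + 2
        if right < num_nodes:
            node.children = [nodes[left], nodes[right]]
    return nodes


def multicast(root: TreeNode, packet: bytes) -> int:
    """Store the packet at every node, level by level; return link transfers used."""
    root.received.append(packet)  # delivered to the top node by the data input node
    transfers = 0
    frontier = [root]
    while frontier:
        next_level = []
        for node in frontier:
            for child in node.children:
                child.received.append(packet)  # the copy sent down one link
                transfers += 1
                next_level.append(child)
        frontier = next_level
    return transfers
```

    In this model a multicast over a tree of 7 nodes uses 6 link transfers, one per tree edge, and each replication node stores its own copy of the data before the copies reach the nodes below it.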

    [0032] Further, a running process of the entire on-chip network is as follows:

    [0033] s1: when an algorithm operation starts, a data input node receives multicast data and unicast data sent by a sensor, and then the node packs the multicast data and performs a multicast operation on the multicast data by using the multicast network, sends the multicast data to each operation node, and then sequentially packs and sends the unicast data to a corresponding operation node in the unicast network by using a unicast operation; and

    [0034] s2: each operation node starts an operation after receiving the corresponding multicast data and the corresponding unicast data, and continuously packs and sends an operation result to a storage node during the operation until all distributed parallel operations are completed, and an RISC-V processor node accesses the stored data by using the unicast network.
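
    The two steps s1 and s2 can be walked through with a small Python sketch. It abstracts both networks away as dictionary copies and uses a dot product as a stand-in for the operation performed by each operation node; the function name `run_network`, its parameters, and the choice of a dot product are assumptions made for illustration and do not come from the disclosure.

```python
from typing import Dict, List, Sequence


def run_network(common: Sequence[float],
                private: Dict[int, Sequence[float]]) -> Dict[int, float]:
    """Model one run: multicast common data, unicast private data, compute, store."""
    # s1 (multicast): the data input node sends the same common data to every
    # operation node through the multicast network.
    multicast_inbox: Dict[int, List[float]] = {
        node_id: list(common) for node_id in private
    }

    # s1 (unicast): the data input node then sends each node its own data,
    # one node after another, through the unicast network.
    unicast_inbox: Dict[int, List[float]] = {}
    for node_id, data in private.items():
        unicast_inbox[node_id] = list(data)

    # s2: each operation node computes once it holds both packets and sends
    # its result to the storage node (modelled here as a dictionary).
    storage: Dict[int, float] = {}
    for node_id in private:
        shared = multicast_inbox[node_id]
        own = unicast_inbox[node_id]
        storage[node_id] = sum(a * b for a, b in zip(shared, own))

    # The processor node would later read these results over the unicast network.
    return storage


results = run_network([1, 2, 3], {0: [1, 1, 1], 1: [0, 1, 0]})
```

    The sketch makes the division of labor visible: the common data crosses the multicast network once, while only the per-node data is sent individually.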

    [0035] Working principle: As shown in FIG. 2, the unicast network is an on-chip network with an N×N mesh topology. The unicast network contains the following types of nodes:

    1. Data input node. This node receives newly sensed data transmitted by a sensor or by an upper level of the network, packs the data into unicast packets and multicast packets, and sends the packets to the corresponding operation nodes through the unicast network and the multicast network, respectively.

    2. Node including operation units (operation node). After receiving the unicast packet and the multicast packet addressed to it, this node unpacks and stores the data. It then invokes the data from both packets to perform the operation, and packs and sends the computation result to the corresponding storage unit.

    3. Node responsible only for receiving and sending packets. This type of node only propagates packets in the unicast network toward their destination nodes in the X direction or the Y direction, without unpacking the data or storing it in a storage unit.

    4. Node including a storage unit. This type of node stores all valid results and accepts requests from other nodes. After receiving a request, it returns a packet containing the requested data to the requesting node.

    5. Node including an RISC-V processor. An RISC-V processor is mounted on this node and is configured to run the parts of the algorithm that are not handled by the operation units of the on-chip network. For example, after the on-chip network completes a convolution operation in a deep learning algorithm, the RISC-V processor invokes data in a storage node to complete subsequent operations such as pooling and full connection.
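
    The forwarding behavior of the third node type can be illustrated with a short Python sketch of routing in an N×N mesh. The disclosure only states that packets are propagated in the X direction or the Y direction; the sketch assumes dimension-order routing (all X hops first, then all Y hops), which is a common choice for mesh networks but is not confirmed by the source. The function name `xy_route` and the coordinate convention are likewise hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a node in the mesh


def xy_route(src: Coord, dst: Coord, n: int) -> List[Coord]:
    """Return the nodes visited from src to dst in an n x n mesh, X first, then Y."""
    for x, y in (src, dst):
        if not (0 <= x < n and 0 <= y < n):
            raise ValueError(f"node ({x}, {y}) is outside the {n}x{n} mesh")

    x, y = src
    path = [(x, y)]
    # Move along the X direction until the destination column is reached.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Then move along the Y direction until the destination row is reached.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1), n=4)
```

    Every intermediate node on the returned path only forwards the packet; the hop count equals the Manhattan distance between source and destination.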

    [0036] The multicast network includes bidirectional replication nodes and receiving nodes, where a next level of each of the bidirectional replication nodes is connected to two bidirectional replication nodes or receiving nodes, and all the nodes in the multicast network jointly form a tree node graph. Each multicast operation is transmitted from a top node of the tree to all bottom nodes of the tree. The microarchitecture of a bidirectional replication node is shown in FIG. 3 and includes two parts: control logic and a dual-port buffer. A Start_In signal received by the control logic indicates that port B of the dual-port buffer at the previous level has started transmitting data. The control logic at this level then sends a write address and a write enable signal to port A of its dual-port buffer and stores the data sent by the previous level until the previous level sends a Finish_In signal, at which point all the data has been stored. The control logic at this level then sends a Start_Out signal, starts sending a read address and a read enable signal to port B of the dual-port buffer, and sends a Finish_Out signal once all the data received from the previous level has been sent. After this level completes the multicast operation, the control logic invokes a read operation on port A to read the valid data in the multicast packet, and invokes the operation unit with this data and the data in the unicast packet to complete the operation.
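
    The handshake described above can be modelled as a small store-and-forward state machine. The following Python sketch is a sequential behavioral model only, not a cycle-accurate or hardware description: the signal names Start_In, Finish_In, Start_Out, and Finish_Out come from the text, while the class `MulticastNode`, its methods, and the use of a Python list in place of the dual-port buffer are assumptions for illustration.

```python
from enum import Enum, auto
from typing import Iterable, List


class State(Enum):
    IDLE = auto()
    RECEIVING = auto()
    DONE = auto()


class MulticastNode:
    """Replication node if it has children, receiving node if it has none."""

    def __init__(self, name: str, children: Iterable["MulticastNode"] = ()):
        self.name = name
        self.children = list(children)
        self.buffer: List[int] = []  # stands in for the dual-port buffer
        self.state = State.IDLE

    def start_in(self) -> None:
        # Start_In: the previous level begins transmitting.
        self.buffer = []
        self.state = State.RECEIVING

    def data_in(self, word: int) -> None:
        # Port A write: store each incoming word at the next write address.
        if self.state is not State.RECEIVING:
            raise RuntimeError(f"{self.name}: data received before Start_In")
        self.buffer.append(word)

    def finish_in(self) -> None:
        # Finish_In: all data is stored, so forwarding to the next level can begin.
        if self.state is not State.RECEIVING:
            raise RuntimeError(f"{self.name}: Finish_In without Start_In")
        for child in self.children:
            child.start_in()              # Start_Out of this level
        for word in self.buffer:          # port B read, word by word
            for child in self.children:
                child.data_in(word)
        for child in self.children:
            child.finish_in()             # Finish_Out of this level
        self.state = State.DONE

    def read_valid_data(self) -> List[int]:
        # Port A read after the multicast completes, feeding the operation unit.
        if self.state is not State.DONE:
            raise RuntimeError(f"{self.name}: multicast not finished")
        return list(self.buffer)


# Build a three-level tree and push one multicast packet through it.
leaves = [MulticastNode(f"leaf{i}") for i in range(4)]
mid = [MulticastNode("mid0", leaves[:2]), MulticastNode("mid1", leaves[2:])]
root = MulticastNode("root", mid)
root.start_in()
for word in (10, 20, 30):
    root.data_in(word)
root.finish_in()
```

    After the run, every node in the tree, including the replication nodes themselves, holds its own copy of the packet data and can pass it to its operation unit.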

    [0037] The foregoing embodiments are only intended to describe the preferred implementations of the present disclosure, rather than to limit its concept and scope. Various modifications and improvements made to the technical solution of the present disclosure by those of ordinary skill in the art without departing from the design concept of the present disclosure shall fall within the claimed scope of the present disclosure. The technical content claimed by the present disclosure has been fully recorded in the claims.