MULTICHIP SYSTEM AND DATA PROCESSING METHOD ADAPTED TO THE SAME FOR IMPLEMENTING NEURAL NETWORK APPLICATION
20220004856 · 2022-01-06
Assignee
Inventors
Cpc classification
G06F7/53
PHYSICS
International classification
Abstract
A data processing method, a multichip system, and a non-transitory computer-readable medium for implementing a neural network application are provided. The data processing method includes: allocating corresponding chips to process a corresponding part of a first stage data and a corresponding part of a second stage data; transmitting, by a first chip, a first part of the first stage data to a second chip through a channel; transmitting, by the second chip, a second part of the first stage data to the first chip through the channel; computing, by the first chip, the first stage data with a first part of weight values to obtain a first result; and computing, by the second chip, the first stage data with a second part of weight values to obtain a second result, where each of the first result and the second result is one of the second stage data.
Claims
1. A data processing method adapted to a multichip system for implementing a neural network application, wherein the multichip system comprises a channel, a first chip and a second chip connecting with the channel, wherein the neural network application comprises a first stage data, a second stage data, a third stage data, and a plurality of weight values, wherein the data processing method comprises: allocating the first chip to process a first part of the first stage data, a first part of the second stage data, and a first part of the third stage data, and allocating the second chip to process a second part of the first stage data, a second part of the second stage data, and a second part of the third stage data; acquiring, by the first chip, a first part of the plurality of weight values corresponding the second stage data; acquiring, by the second chip, a second part of the plurality of weight values corresponding the second stage data; acquiring, by the first chip, the first part of the first stage data; transmitting, by the first chip, the first part of the first stage data to the second chip through the channel; receiving, by the second chip, the first part of the first stage data; acquiring, by the second chip, the second part of the first stage data; transmitting, by the second chip, the second part of the first stage data to the first chip through the channel; receiving, by the first chip, the second part of the first stage data; computing, by the first chip, the first stage data with the first part of the plurality of weight values to obtain a first result, wherein the first result is one of the second stage data; and computing, by the second chip, the first stage data with the second part of the plurality of weight values to obtain a second result, wherein the second result is one of the second stage data.
2. The data processing method as claimed in claim 1, wherein after obtaining the first result and the second result, the data processing method further comprises: acquiring, by the first chip, a third part of the plurality of weight values corresponding the second stage data; acquiring, by the second chip, a fourth part of the plurality of weight values corresponding the second stage data; acquiring, by the first chip, the first part of the first stage data; transmitting, by the first chip, the first part of the first stage data to the second chip through the channel; receiving, by the second chip, the first part of the first stage data; acquiring, by the second chip, the second part of the first stage data; transmitting, by the second chip, the second part of the first stage data to the first chip through the channel; receiving, by the first chip, the second part of the first stage data; computing, by the first chip, the first stage data with the third part of the plurality of weight values to obtain a third result, wherein the third result is one of the second stage data; and computing, by the second chip, the first stage data with the fourth part of the plurality of weight values to obtain a fourth result, wherein the fourth result is one of the second stage data.
3. The data processing method as claimed in claim 2, wherein after obtaining the first result, the second result, the third result, and the fourth result, the data processing method further comprises: sequentially assigning the first result, the third result, the second result, and the fourth result as input data of the second stage data.
4. The data processing method as claimed in claim 1, wherein the multichip system further comprises a first memory and a second memory, the first memory is connected with the first chip, the second memory is connected with the second chip; wherein the first memory comprises a first zone and a second zone, and the second memory comprises a third zone and a fourth zone; and wherein the first part of the first stage data is stored in the first zone of the first memory and the first part of the second stage data is stored in the second zone of the first memory, and the second part of the first stage data is stored in the third zone of the second memory and the second part of the second stage data is stored in the fourth zone of the second memory.
5. The data processing method as claimed in claim 4, wherein the data processing method further comprises: erasing the first part of the first stage data from the first memory and erasing the second part of the first stage data from the second memory; and converting the second zone of the first memory and the fourth zone of the second memory into input data storage area.
6. The data processing method as claimed in claim 1, wherein the multichip system further comprises a memory connected with the first and second chips and a plurality of transmitting lines configured to connect the first and second chips; wherein the memory comprises a first zone and a second zone; and wherein the first stage data are stored in the first zone of the memory and the second stage data are stored in the second zone of the memory.
7. A multichip system for implementing a neural network application, wherein the neural network application comprises a first stage data, a second stage data, a third stage data, and a plurality of weight values, the multichip system comprises: a data channel; a first chip and a second chip connecting with the data channel; a storage; a processor, wherein computerized codes of the multichip system are stored in the storage and configured to be executed by the processor to perform a data processing method, the data processing method comprising: allocating the first chip to process a first part of the first stage data, a first part of the second stage data, and a first part of the third stage data, and allocating the second chip to process a second part of the first stage data, a second part of the second stage data, and a second part of the third stage data; acquiring, by the first chip, a first part of the plurality of weight values corresponding the second stage data; acquiring, by the second chip, a second part of the plurality of weight values corresponding the second stage data; acquiring, by the first chip, the first part of the first stage data; transmitting, by the first chip, the first part of the first stage data to the second chip through the data channel; receiving, by the second chip, the first part of the first stage data; acquiring, by the second chip, the second part of the first stage data; transmitting, by the second chip, the second part of the first stage data to the first chip through the data channel; receiving, by the first chip, the second part of the first stage data; computing, by the first chip, the first stage data with the first part of the plurality of weight values to obtain a first result, wherein the first result is one of the second stage data; and computing, by the second chip, the first stage data with the second part of the plurality of weight values to obtain a second result, wherein the second result is one of the second stage data.
8. The multichip system as claimed in claim 7, wherein the data processing method further comprises: acquiring, by the first chip, a third part of the plurality of weight values corresponding the second stage data; acquiring, by the second chip, a fourth part of the plurality of weight values corresponding the second stage data; acquiring, by the first chip, the first part of the first stage data; transmitting, by the first chip, the first part of the first stage data to the second chip through the channel; receiving, by the second chip, the first part of the first stage data; acquiring, by the second chip, the second part of the first stage data; transmitting, by the second chip, the second part of the first stage data to the first chip through the channel; receiving, by the first chip, the second part of the first stage data; computing, by the first chip, the first stage data with the third part of the plurality of weight values to obtain a third result, wherein the third result is one of the second stage data; and computing, by the second chip, the first stage data with the fourth part of the plurality of weight values to obtain a fourth result, wherein the fourth result is one of the second stage data.
9. The multichip system as claimed in claim 8, wherein the data processing method further comprises: sequentially assigning the first result, the third result, the second result, and the fourth result as input data of the second stage data.
10. The multichip system as claimed in claim 7, wherein the multichip system further comprises a first memory and a second memory, the first memory is connected with the first chip, the second memory is connected with the second chip; wherein the first memory comprises a first zone and a second zone, and the second memory comprises a third zone and a fourth zone; and wherein the first part of the first stage data is stored in the first zone of the first memory and the first part of the second stage data is stored in the second zone of the first memory, and the second part of the first stage data is stored in the third zone of the second memory and the second part of the second stage data is stored in the fourth zone of the second memory.
11. The multichip system as claimed in claim 10, wherein the data processing method further comprises: erasing the first part of the first stage data from the first memory and erasing the second part of the first stage data from the second memory; and converting the second zone of the first memory and the fourth zone of the second memory into input data storage area.
12. The multichip system as claimed in claim 7, wherein the multichip system further comprises a memory connected with the first and second chips and a plurality of transmitting lines configured to connect the first and second chips; wherein the memory comprises a first zone and a second zone; wherein the first stage data are stored in the first zone of the memory and the second stage data are stored in the second zone of the memory; and wherein each of the first and second chips acquires the first stage data from the memory through at least one of the transmitting lines.
13. A non-transitory computer-readable medium for implementing a neural network application in a multichip system, the non-transitory computer-readable medium having program codes recorded thereon, the program codes being executed by a processor and comprising: A, setting up input neurons and output neurons of the neural network, wherein each of the output neurons is connected to the input neurons via synapses for weighting outputs from the input neurons depending on weight values; B, waiting for first stage data corresponding to the input neurons over a channel; C, computing partial first stage data with corresponding weight values; D, simultaneously computing second stage data corresponding to the output neurons; E, determining whether all of the weight values have been computed, if yes, proceeding to F, if not, returning to B; F, keeping the second stage data in a memory; G, setting up the second stage data for the output neurons; and H, determining whether all assigned output neurons are completed, if yes, switching to a next layer application, if not, calling a new channel task and returning to A.
14. The non-transitory computer-readable medium for implementing a neural network application in a multichip system as claimed in claim 13, wherein the new channel task comprises: I, loading the first stage data from the memory; J, broadcasting the first stage data through the channel; K, determining whether the first stage data are completely broadcast, if yes, proceeding to L, if not, returning to J; and L, determining whether all of the first stage data have been computed, if yes, ending the new channel task, if not, returning to I.
15. The non-transitory computer-readable medium for implementing a neural network application in a multichip system as claimed in claim 13, wherein the switching to the next layer application comprises: M, setting up input points of the memory as output points; and N, setting up output points of the memory as input points.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0039]
[0040]
[0041]
[0042]
[0043]
[0044]
DETAILED DESCRIPTION
[0045] The structure and the technical means adopted by the present disclosure to achieve the above and other objects can be best understood by referring to the following detailed description of the preferred embodiments and the accompanying drawings.
[0046] Referring to
[0047] Referring to
[0048] As shown in
[0049] Referring to
[0050] In the present disclosure, one of the chips C1-CN is designated as a master (i.e., operates in a master mode) by a predefined protocol, occupies the broadcasting channel 110, and performs the data bus operation. All the remaining chips operate in a slave mode and receive the data. Specifically, when the first stage data A.sub.0-A.sub.C are sequentially transmitted through the broadcasting channel 110, an operating protocol of the broadcasting channel 110 causes one of the chips to become the master and the other chips to operate as slaves. The master mode is an operation mode in which a chip maintains control of the computing chips. In one implementation, a chip operating in the master mode can further control and manage the other chips operating in the slave mode. The slave mode is an operation mode in which a chip allows another chip, operating in the master mode, to control and manage it.
[0051] Referring to
[0052] As shown in
[0053] As shown in
[0054] Then, each of the chips C1-CN sequentially acquires its corresponding part of the first stage data A.sub.0-A.sub.C and transmits it to the other chips through the broadcasting channel 110. After a master chip has sequentially transmitted all of its data, the next chip becomes the master and performs the same operation, and the remaining chips become slaves that receive the data. That is, once the entire corresponding part of the first stage data of the master chip has been shared with the other chips, the next chip, which holds another corresponding part of the first stage data, becomes the master chip until its first stage data is exhausted. For example, if the first chip C1 is the master chip, the first chip C1 acquires the first part of the first stage data A.sub.0-A.sub.2 and transmits it to the second chip C2 through the broadcasting channel 110, such that the second chip C2 receives the first part of the first stage data A.sub.0-A.sub.2. Similarly, the first chip C1 sequentially transmits the first part of the first stage data A.sub.0-A.sub.2 to the other chips C3-CN, such that the other chips C3-CN sequentially receive the first part of the first stage data A.sub.0-A.sub.2. After the first part of the first stage data A.sub.0-A.sub.2 of the first chip C1 has been shared with the other chips C2-CN, the next chip, i.e., the second chip C2, which holds the second part of the first stage data A.sub.3-A.sub.5, becomes the master chip. Then, the second chip C2 acquires the second part of the first stage data A.sub.3-A.sub.5 and transmits it to the first chip C1 through the broadcasting channel 110, such that the first chip C1 receives the second part of the first stage data A.sub.3-A.sub.5. Then, the second chip C2 sequentially transmits the second part of the first stage data A.sub.3-A.sub.5 to the other chips C3-CN, such that the other chips C3-CN sequentially receive the second part of the first stage data A.sub.3-A.sub.5. Therefore, the first chip C1 acquires all the first stage data A.sub.0-A.sub.C first, followed by the second chip C2, and so on.
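The sequential mastering described above can be illustrated with a minimal Python sketch. It is only a software model under stated assumptions, not the disclosed hardware: the function name `broadcast_rounds`, the list-based channel log, and the choice to let the master also keep its own values are hypothetical conveniences.

```python
def broadcast_rounds(local_parts):
    """Model sequential mastering over one shared broadcasting channel.

    local_parts[i] is the part of the first stage data held by chip i
    (hypothetical representation). Returns (received, log): the data each
    chip holds afterwards, and the (master, value) order seen on the channel.
    """
    n_chips = len(local_parts)
    received = [[] for _ in range(n_chips)]
    log = []
    for master in range(n_chips):          # chips take the master role in order
        for value in local_parts[master]:  # the master sends its part sequentially
            log.append((master, value))
            for chip in range(n_chips):    # every chip, master included, keeps the value
                received[chip].append(value)
    return received, log


# Two chips: C1 holds A0-A2 and C2 holds A3-A5, as in the example above.
parts = [["A0", "A1", "A2"], ["A3", "A4", "A5"]]
received, log = broadcast_rounds(parts)
```

After the two rounds, every entry of `received` contains all six values, even though no chip ever read another chip's memory directly.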
[0055] After one of the chips C1-CN receives one of the first stage data A.sub.0-A.sub.C, that chip computes the received data with a corresponding synapse weight value to generate a weight value output. That is, the plurality of chips C1-CN compute the first stage data A.sub.0-A.sub.C in parallel to obtain a total of the weight value outputs from the input neurons in accordance with its output function. For example, the first chip C1 computes the first stage data A.sub.0-A.sub.C with the first part of the weight values (e.g., W.sub.00 and so on) by the computing array 1201 to obtain a first result N.sub.0, where the first result N.sub.0 is one of the second stage data N.sub.0-N.sub.f. Then, the second chip C2 computes the first stage data A.sub.0-A.sub.C with the second part of the weight values (e.g., W.sub.02 and so on) to obtain a second result N.sub.2, where the second result N.sub.2 is one of the second stage data N.sub.0-N.sub.f.
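As an illustration only, the per-chip computation can be sketched as a weighted sum over the received first stage data. The sketch assumes that `weights[i][j]` plays the role of W.sub.ij (input neuron i to output neuron j) and that the output function is the identity; the disclosure does not specify either detail, and the numeric values are made up.

```python
def compute_output(inputs, weights, j, activation=lambda x: x):
    """Weighted sum for output neuron j, followed by an output function.

    inputs     -- first stage data A_0..A_C as received over the channel
    weights    -- weights[i][j] stands in for W_ij (assumed layout)
    activation -- identity by default (an assumption, not from the source)
    """
    total = sum(a * weights[i][j] for i, a in enumerate(inputs))
    return activation(total)


inputs = [1.0, 2.0, 3.0]              # illustrative values for A_0..A_2
weights = [
    [1.0, 0.0, 2.0, 0.0],             # W_00 .. W_03
    [0.0, 1.0, 0.0, 2.0],             # W_10 .. W_13
    [1.0, 1.0, 1.0, 1.0],             # W_20 .. W_23
]
n0 = compute_output(inputs, weights, 0)  # first chip C1, using W_00 and so on
n2 = compute_output(inputs, weights, 2)  # second chip C2, using W_02 and so on
```

Because each chip only needs its own columns of the weight matrix, the two calls are independent and could run at the same time on different chips.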
[0056] The chips C1-CN repeat the above acquisition and sequential transmission until all of the chips C1-CN have transmitted the first stage data A.sub.0-A.sub.C to each other through the broadcasting channel 110, and thus their second stage data N.sub.0-N.sub.f are completed. Specifically, after the first result N.sub.0 and the second result N.sub.2 of the second stage data N.sub.0-N.sub.f are obtained, the first chip C1 acquires a third part of the weight values (e.g., W.sub.01 and so on) corresponding to the second stage data N.sub.0-N.sub.f, and the second chip C2 acquires a fourth part of the weight values (e.g., W.sub.03 and so on) corresponding to the second stage data N.sub.0-N.sub.f. Then, each of the chips C1-CN again sequentially acquires its corresponding part of the first stage data A.sub.0-A.sub.C and transmits it to the other chips through the broadcasting channel 110. After a master chip has sequentially transmitted all of its data, the next chip becomes the master and performs the same operation, and the remaining chips become slaves that receive the data. That is, once the entire corresponding part of the first stage data of the master chip has been shared with the other chips, the next chip, which holds another corresponding part of the first stage data, becomes the master chip until its first stage data is exhausted. For example, if the first chip C1 is the master chip, the first chip C1 acquires the first part of the first stage data A.sub.0-A.sub.2 and transmits it to the second chip C2 through the broadcasting channel 110, such that the second chip C2 receives the first part of the first stage data A.sub.0-A.sub.2. Similarly, the first chip C1 sequentially transmits the first part of the first stage data A.sub.0-A.sub.2 to the other chips C3-CN, such that the other chips C3-CN sequentially receive the first part of the first stage data A.sub.0-A.sub.2. After the first part of the first stage data A.sub.0-A.sub.2 of the first chip C1 has been shared with the other chips C2-CN, the next chip, i.e., the second chip C2, which holds the second part of the first stage data A.sub.3-A.sub.5, becomes the master chip. Then, the second chip C2 acquires the second part of the first stage data A.sub.3-A.sub.5 and transmits it to the first chip C1 through the broadcasting channel 110, such that the first chip C1 receives the second part of the first stage data A.sub.3-A.sub.5. Then, the second chip C2 sequentially transmits the second part of the first stage data A.sub.3-A.sub.5 to the other chips C3-CN, such that the other chips C3-CN sequentially receive the second part of the first stage data A.sub.3-A.sub.5. Therefore, the first chip C1 acquires all the first stage data A.sub.0-A.sub.C first, followed by the second chip C2, and so on.
[0057] After one of the chips C1-CN receives one of the first stage data A.sub.0-A.sub.C, that chip computes the received data with a corresponding synapse weight value to generate a weight value output. That is, the plurality of chips C1-CN compute the first stage data A.sub.0-A.sub.C in parallel to obtain a total of the weight value outputs from the input neurons in accordance with its output function. For example, the first chip C1 computes the first stage data A.sub.0-A.sub.C with the third part of the weight values (e.g., W.sub.01 and so on) by the computing array 1201 to obtain a third result N.sub.1, where the third result N.sub.1 is one of the second stage data N.sub.0-N.sub.f. Then, the second chip C2 computes the first stage data A.sub.0-A.sub.C with the fourth part of the weight values (e.g., W.sub.03 and so on) to obtain a fourth result N.sub.3, where the fourth result N.sub.3 is one of the second stage data N.sub.0-N.sub.f. This sequential mastering is needed because the input neurons of the chips C1-CN are only partially localized, i.e., each chip initially holds only its own part of the first stage data. It is made possible by the fact that each chip computes with different synapses 4 and a different target output neuron 3, even though the computed result of each chip is later stored as an output feature value. Furthermore, the first result N.sub.0, the third result N.sub.1, the second result N.sub.2, and the fourth result N.sub.3 are sequentially assigned as input data of the second stage data N.sub.0-N.sub.f.
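The order in which the results are produced and then assigned can be sketched as follows. The sketch assumes that each chip is assigned a contiguous block of output neurons (C1: N.sub.0 and N.sub.1, C2: N.sub.2 and N.sub.3), which matches the two-chip example above but is not stated as a general rule in the disclosure; the function name `schedule` is hypothetical.

```python
def schedule(n_chips, outputs_per_chip):
    """Return, for each broadcast pass, the output-neuron index each chip computes.

    Assumes chip k owns the contiguous block of outputs
    [k * outputs_per_chip, (k + 1) * outputs_per_chip) (illustrative assumption).
    """
    passes = []
    for p in range(outputs_per_chip):
        passes.append([chip * outputs_per_chip + p for chip in range(n_chips)])
    return passes


# Two chips, two output neurons each.
passes = schedule(2, 2)
# Pass 1 yields N0 (first result) and N2 (second result);
# pass 2 yields N1 (third result) and N3 (fourth result).
ordered = sorted(j for row in passes for j in row)  # sequential assignment N0, N1, N2, N3
```

Reading `ordered` against the pass structure gives the sequence first result, third result, second result, fourth result.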
[0058] Once all the first stage data A.sub.0-A.sub.C have been exhausted, all chips C1-CN store their second stage data N.sub.0-N.sub.f in their memories S1-SN. For example, as shown in
[0059] In the next layer application of the neural network 1, the second stage data N.sub.0-N.sub.f, which were stored as outputs, serve as input feature values, such that the second stage data N.sub.0-N.sub.f stored in the memories S1-SN are switched to be the input feature values of the next layer of the neural network 1. At this time, the first stage data A.sub.0-A.sub.C are erased from the memories S1-SN. For example, the first part of the first stage data A.sub.0-A.sub.2 is erased from the first zone Z1 of the first memory S1, and the second part of the first stage data A.sub.3-A.sub.5 is erased from the third zone Z3 of the second memory S2. Then, the second zone Z2 of the first memory S1 and the fourth zone Z4 of the second memory S2 are converted into input data storage areas for storing the corresponding second stage data N.sub.0-N.sub.f, and the first zone Z1 of the first memory S1 and the third zone Z3 of the second memory S2 are converted into output data storage areas for storing the corresponding third stage data B.sub.0-B.sub.c.
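The zone conversion described above resembles a two-buffer swap, which the following sketch models for a single memory. The class `ChipMemory`, its method names, and the dictionary-based zones are hypothetical; the sketch only mirrors the erase-then-swap behavior described for zones Z1 and Z2 of the first memory S1.

```python
class ChipMemory:
    """Toy model of one local memory with two zones whose roles alternate."""

    def __init__(self, first_stage_part):
        self.zones = {"Z1": list(first_stage_part), "Z2": []}
        self.input_zone, self.output_zone = "Z1", "Z2"

    def store_outputs(self, values):
        # Results of the current layer go to the current output zone.
        self.zones[self.output_zone] = list(values)

    def switch_to_next_layer(self):
        # Erase the consumed inputs, then swap the roles of the two zones.
        self.zones[self.input_zone] = []
        self.input_zone, self.output_zone = self.output_zone, self.input_zone


s1 = ChipMemory(["A0", "A1", "A2"])   # first part of the first stage data in Z1
s1.store_outputs(["N0", "N1"])        # first part of the second stage data in Z2
s1.switch_to_next_layer()             # Z2 becomes input, Z1 becomes output
```

Swapping roles instead of copying data means the second stage data never move; only the interpretation of each zone changes between layers.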
[0060] As shown in
[0061] In the first embodiment, the chips of the multichip system 10 do not share actual memory resources. In other words, a chip cannot directly access the local memories S1-SN of the other chips; instead, each chip shares the necessary input feature values (e.g., the first stage data A.sub.0-A.sub.C or the second stage data N.sub.0-N.sub.f) through the common broadcasting channel 110 and uses the parts it needs for calculation. The multichip system 10 is therefore made possible by the operation mechanism of the present disclosure, which is preferably applied to an application system requiring better performance.
[0062] Referring to
[0063] As shown in
[0064] As shown in
[0065] In certain embodiments, one or more process steps described herein may be performed by one or more processors (e.g., a computer processor) executing program codes recorded on a non-transitory computer-readable medium. For example, a process of implementing a neural network application in a multichip system, as shown in
[0066]
[0067] Furthermore, when the computer system 30 performs the new channel task, the program codes executed by the processor 310 include: program code I, loading the first stage data A.sub.0-A.sub.C from the memory; program code J, broadcasting the first stage data A.sub.0-A.sub.C through the channel 110; program code K, determining whether the first stage data A.sub.0-A.sub.C have been completely broadcast, and if yes, proceeding to program code L, if not, returning to program code J; and program code L, determining whether all of the first stage data A.sub.0-A.sub.C have been computed, and if yes, ending the new channel task, if not, returning to program code I.
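The control flow of program codes I to L can be sketched as two nested loops. This is an interpretation rather than the disclosed implementation: the sketch assumes that data are loaded from memory in chunks of a caller-chosen size and that the channel is represented by a `send` callback, neither of which is specified in the disclosure.

```python
def new_channel_task(memory, chunk_size, send):
    """Loop structure of program codes I-L (illustrative interpretation).

    memory     -- locally stored first stage data (list)
    chunk_size -- values loaded per step I (hypothetical parameter)
    send       -- callable standing in for the broadcasting channel
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while True:
        chunk = memory[offset:offset + chunk_size]  # I: load from the memory
        sent = 0
        while sent < len(chunk):                    # K: chunk completely broadcast?
            send(chunk[sent])                       # J: broadcast through the channel
            sent += 1
        offset += len(chunk)
        if offset >= len(memory):                   # L: all first stage data handled?
            return                                  # the new channel task ends


channel = []  # records what appears on the channel, in order
new_channel_task(["A0", "A1", "A2"], 2, channel.append)
```

With a chunk size of two, the task performs two loads (A0 and A1, then A2) and the channel sees the three values in their original order.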
[0068] Furthermore, when the computer system 30 switches to perform the next layer application, program codes being executed by the processor 310 include: program code M, setting-up input points of the memory as output points; and program code N, setting-up output points of the memory into input points. Specifically, as shown in
[0069] In some embodiments, the computer system 30 may include more than one processor. Moreover, the processor 310 may include one or more processors or one or more processor cores. The processor 310 may be coupled to the storage medium 320 and the peripheral devices 330 in any desired fashion. For example, in some embodiments, the processor 310 may be coupled to the storage medium 320 and/or the peripheral devices 330 via various interconnects. Alternatively or in addition, one or more bridge chips may be used to couple the processor 310, the storage medium 320, and the peripheral devices 330. The storage medium 320 may include any type of memory system. For example, the storage medium 320 may include DRAM, and more particularly double data rate (DDR) SDRAM, RDRAM, etc. A memory controller may be included to interface to the storage medium 320, and/or the processor 310 may include a memory controller. The storage medium 320 may store the program codes to be executed by the processor 310 during use, data to be operated upon by the processor 310 during use, etc. The peripheral devices 330 may represent any sort of hardware devices that may be included in the computer system 30 or coupled thereto.
[0070] The storage medium 320 may include one or more program codes representative of the multichip system 10 (depicted in
[0071] In summary, in the present disclosure, the multichip system is capable of parallel operation. In order to improve the performance of a machine learning accelerating chip, the present disclosure provides a broadcasting channel for multichip system operation. This structural design is intended to meet market demand for such a function. To realize this, in the present disclosure, the input feature values of each chip are partially transmitted and shared so that the other chips can use them for computation at the same time; the calculated results finally become output neuron values, which in turn act as input data of the next layer. This makes it possible to achieve a high-performance, low-cost system with the multichip system to meet market demands.
[0072] The above descriptions are merely preferred embodiments of the present disclosure. Any modification or replacement made by those skilled in the art without departing from the principle of the present disclosure should fall within the protection scope of the present disclosure.