Parallel computing system and communication control program
10430375 ยท 2019-10-01
Assignee
Inventors
Cpc classification
International classification
Abstract
A parallel computing system includes a plurality of processors multi-dimensionally commented by an interconnection network, wherein each of the processors in the parallel computing system determines, in dimensional order, communication channels to other processors in the interconnection network, each of the processors sets, as relative coordinates of destination processors with respect to the plurality of processors in data communications performed at a same timing, relative coordinates common to all of the processors, and each of the processors performs data communications with destination processors having the set relative coordinates.
Claims
1. A parallel computing system comprising: a plurality of processors multi-dimensionally coupled via an interconnection network having torus or mesh topology, each of the plurality of processors in the parallel computing system being configured to: determine, in dimensional order, communication channels to other processors in the interconnection network, be provided with a relative coordinate system whose origin is set as the each processor, the relative coordinate system being common to all of the plurality of processors, perform a simultaneous data-transmission including: determining a distinct set of plural destination relative coordinates whose values are symmetric about the origin of the relative coordinate system provided for the each processor and whose distances in dimensional order from the origin of the relative coordinate system provided for the each processor are identical, selecting, from among the plurality of processors, based on the relative coordinate system provided for the each processor, a different group of plural destination processors having the determined distinct set of plural destination relative coordinates, and simultaneously transmitting data from the each processor to the selected different group of plural destination processors through the communication channels determined in dimensional order, wherein when performing all-to-all communication in the parallel computing system, each of the plurality of processors repeats the simultaneous data-transmission by changing the distinct set of plural destination relative coordinates so that each of all the plurality of processors is selected once as a destination processor included in one of the different groups of plural destination processors; and for any pair of first and second processors among the plurality of processors, first different groups of plural destination processors selected by the first processor have first distinct sets of plural destination relative coordinates that are respectively identical to second distinct sets of plural destination relative coordinates for second different groups of plural destination processors selected by the second processor, and identical transmission control including the simultaneous data-transmission is performed on both the first and second processors in the parallel computing system, based on the first and second distinct sets of plural destination relative coordinates, so that a number of inter processor communications routed through respective links is equalized among all links in the parallel computing system.
2. The parallel computing system according to claim 1, wherein each of the plurality of processors in the parallel computing system is configured to simultaneously transmit data to a maximum of four processors, wherein, when L represents the maximum value of the lengths in respective dimensions of the interconnection network and Xn and Yn respectively represent the relative coordinate value in a first dimension and the relative coordinate value in a second dimension of the relative coordinates of an n-th destination processor with respect to a source processor, each of the plurality of processors performs: first transmission processing of transmitting data to four destination processors located at respective positions of relative coordinates (X1, Y1) in which the absolute value of X1 is set to be different from the absolute value of Y1, and in which the absolute value of X1 and the absolute value of Y1 are set to be different from half the L value, relative coordinates (X2, Y2) in which X2 is set to the sign-inverted value of X1 and Y2 is set to the sign-inverted value of Y1, relative coordinates (X3, Y3) in which X3 is set to Y1 and Y3 is set to X1, and relative coordinates (X4, Y4) in which X4 is set to the sign-inverted value of Y1 and Y4 is set to the sign-inverted value of X1; second transmission processing of transmitting data to four destination processors located at the respective positions of relative coordinates (X1, Y1) in which the absolute value of X1 is set to be equal to the absolute value of Y1, relative coordinates (X2, Y2) in which X2 is set to the sign-inverted value of X1 and Y2 is set to the sign-inverted value of Y1, relative coordinates (X3, Y3) in which X3 is set to the sign-inverted value of X1 and Y3 is set to Y1, and relative coordinates (X4, Y4) in which X4 is set to X1 and Y4 is set to the sign-inverted value of Y1; and third transmission processing of transmitting data to four destination processors located at the respective positions of relative coordinates (X1, 0), relative coordinates (0, Y1) in which the absolute value of Y1 is set to be equal to the absolute value of X1, relative coordinates (X2, 0) in which X2 is set to the sign-inverted value of X1, and relative coordinates (0, Y2) in which Y2 is set to the sign-inverted value of Y1, and wherein the plurality of processors in the parallel computing system perform, at the same timing, data communications using a same transmission processing with destination processors having the same relative coordinates.
3. The parallel computing system according to claim 1, wherein each of the plurality of processors in the parallel computing system is configured to simultaneously transmit data to a maximum of four processors, wherein, when the interconnection network is a two-dimensional torus and is different in length between a first dimension and a second dimension thereof, and when L represents the maximum value of the length in the first dimension and the length in the second dimension of the interconnection network and Xn and Yn respectively represent the relative coordinate value in the first dimension and the relative coordinate value in the second dimension of the relative coordinates of an n-th destination processor with respect to a source processor, each of the plurality of processors perform: first transmission processing of transmitting data to four destination processors located at respective positions of relative coordinates (X1, Y1) in which the absolute value of X1 is set to be different from the absolute value of Y1, and in which the absolute value of X1 and the absolute value of Y1 are set to be different from half the L value, relative coordinates (X2, Y2) in which X2 is set to the sign-inverted value of X1 and Y2 is set to the sign-inverted value of Y1, relative coordinates (X3, Y3) in which X3 is set to Y1 and Y3 is set to X1, and relative coordinates (X4, Y4) in which X4 is set to the sign-inverted value of Y1 and Y4 is set to the sign-inverted value of X1, second transmission processing of transmitting data to four destination processors located at the respective positions of relative coordinates (X1, Y1) in which the absolute value of X1 is set to be equal to the absolute value of Y1, relative coordinates (X2, Y2) in which X2 is set to the sign-inverted value of X1 and Y2 is set to the sign-inverted value of Y1, relative coordinates (X3, Y3) in which X3 is set to the sign-inverted value of X1 and Y3 is set to Y1, and relative coordinates (X4, Y4) in which X4 is set to X1 and Y4 is set to the sign-inverted value of Y1, and third transmission processing of transmitting data to four destination processors located at the respective positions of relative coordinates (X1, 0), relative coordinates (0, Y1) in which the absolute value of Y1 is set to be equal to the absolute value of X1, relative coordinates (X2, 0) in which X2 is set to the sign-inverted value of X1, and relative coordinates (0, Y2) in which Y2 is set to the sign-inverted value of Y1, and wherein the plurality of processors in the parallel computing system perform, at the same timing, data communications using a same transmission processing with destination processors having the same relative coordinates.
4. The parallel computing system according to claim 1, wherein each of the plurality of processors in the parallel computing system is configured to simultaneously transmit data to a maximum of four processors, wherein, when the interconnection network is a two-dimensional mesh, and when L represents the maximum value of the length in a first dimension and the length in a second dimension of the interconnection network and Xn and Yn respectively represent the relative coordinate value in the first dimension and the relative coordinate value in the second dimension of the relative coordinates of an n-th destination processor with respect to a source processor, each of the plurality of processors performs: first transmission processing of transmitting data to four destination processors located at respective positions of relative coordinates (X1, Y1) in which the absolute value of X1 is set to be different from the absolute value of Y1, and in which the absolute value of X1 and the absolute value of Y1 are set to be different from half the L value, relative coordinates (X2, Y2) in which X2 is set to the sign-inverted value of X1 and Y2 is set to the sign-inverted value of Y1, relative coordinates (X3, Y3) in which X3 is set to Y1 and Y3 is set to X1, and relative coordinates (X4, Y4) in which X4 is set to the sign-inverted value of Y1 and Y4 is set to the sign-inverted value of X1, second transmission processing of transmitting data to four destination processors located at the respective positions of relative coordinates (X1, Y1) in which the absolute value of X1 is set to be equal to the absolute value of Y1, relative coordinates (X2, Y2) in which X2 is set to the sign-inverted value of X1 and Y2 is set to the sign-inverted value of Y1, relative coordinates (X3, Y3) in which X3 is set to the sign-inverted value of X1 and Y3 is set to Y1, and relative coordinates (X4, Y4) in which X4 is set to X1 and Y4 is set to the sign-inverted value of Y1, and third transmission processing of transmitting data to four destination processors located at the respective positions of relative coordinates (X1, 0), relative coordinates (0, Y1) in which the absolute value of Y1 is set to be equal to the absolute value of X1, relative coordinates (X2, 0) in which X2 is set to the sign-inverted value of X1, and relative coordinates (0, Y2) in which Y2 is set to the sign-inverted value of Y1, and wherein the plurality of processors in the parallel computing system performs, at the same timing, data communications using a same transmission processing with destination processors having the same relative coordinates.
5. The parallel computing system according to claim 2, wherein, when the L value is an odd number, each of the plurality of processors repeats the simultaneous data-transmission to four destination processors (L1).sup.2/4 times with the absolute value of X1 and the absolute value of Y1 set to be different from each other and performing repeats the simultaneous transmission to four destination processors (L1)/2 times with the absolute value of X1 and the absolute value of Y1 set to be equal to each other, while changing the relative coordinate values in each of the simultaneous data-transmissions, and wherein, when the L value is an even number, each of the plurality of processors repeats simultaneous transmission to four destination processors the (L1) 2/4 times with the absolute value of X1 and the absolute value of Y1 set to be different from each other, repeats the simultaneous data-transmission to four destination processors (L/21) times with the absolute value of X1 and the absolute value of Y1 set to be equal to each other, and performs once the simultaneous data-transmission to three destination processors with the absolute value of X1 and the absolute value of Y1 set to be equal to each other and set to half the L value, while changing the relative coordinate values in each of the simultaneous data-transmissions.
6. The parallel computing system according to claim 1, wherein, when the interconnection network is a three-dimensional torus and is equal in length among a first dimension, a second dimension, and a third dimension thereof, and when L represents the length in each of the dimensions of the interconnection network and Xn, Yn, and Zn respectively represent the relative coordinate value in the first dimension, the relative coordinate value in the second dimension, and the relative coordinate value in the third dimension of the relative coordinates of an n-th destination processor with respect to a source processor, each of the plurality of processors performs: first transmission processing of transmitting data to six destination processors located at respective positions of: relative coordinates (X1, Y1, Z1) in which at least one of the absolute value of X1, the absolute value of Y1, and the absolute value of Z1 is different from the other absolute values, and in which the relative coordinates correspond to one of conditions of: 1) each of the absolute values X1, Y1, and Z1 is set to be different from zero and half the L value, 2) X1 is set to zero and each of the absolute values X1 and Z1 is set to be different from zero and half the L value, and 3) X1 and Y1 are set to zero and the absolute value of Z1 is set to be different from zero and half the L value, relative coordinates (X2, Y2, Z2) in which X2 is set to the sign-inverted value of X1, Y2 is set to the sign-inverted value of Y1, and Z2 is set to the sign-inverted value of Z1, relative coordinates (X3, Y3, Z3) in which X3 is set to Z1, Y3 is set to X1, and Z3 is set to Y1, relative coordinates (X4, Y4, Z4) in which X4 is set to the sign-inverted value of Z1, Y4 is set to the sign-inverted value of X1, and Z4 is set to the sign-inverted value of Y1, relative coordinates (X5, Y5, Z5) in which X5 is set to Y1, Y5 is set to Z1, and Z5 is set to X1, and relative coordinates (X6, Y6, Z6) in which X6 is set to the sign-inverted value of Y1, Y6 is set to the sign-inverted value of Z1, and Z6 is set to the sign-inverted value of X1; and second transmission processing of transmitting data to four destination processors located at the respective positions of relative coordinates (X1, Y1, Z1) in which the absolute values of X1, Y1, and Z1 are set to the same value different from zero and half the L value, relative coordinates (X2, Y2, Z2) in which X2 is set to the sign-inverted value of X1, Y2 is set to the sign-inverted value of Y1, and Z2 is set to the sign-inverted value of Z1, relative coordinates (X3, Y3, Z3) in which X3 is set to X1, Y3 is set to Y1, and Z3 is set to the sign-inverted value of Z1, and relative coordinates (X4, Y4, Z4) in which X4 is set to the sign-inverted value of X1, Y4 is set to the sign-inverted value of Y1, and Z4 is set to Z1, wherein the plurality of processors in the parallel computing system performs, at the same timing, data communications using a same transmission processing with destination processors having the same relative coordinates.
7. The parallel computing system according to claim 1, wherein, when the interconnection network is a three-dimensional torus with one of a length in a first dimension, a length in a second dimension, and a length in a third dimension of the interconnection network different from the other lengths, and when L represents a maximum value of the length in the first dimension, the length in the second dimension, and the length in the third dimension of the interconnection network and Xn, Yn, and Zn respectively represent the relative coordinate value in the first dimension, the relative coordinate value in the second dimension, and the relative coordinate value in the third dimension of the relative coordinates of an n-th destination processor with respect to a source processor, each of the plurality of processors performs: first transmission processing of transmitting data to six destination processors located at respective positions of: relative coordinates (X1, Y1, Z1) in which at least one of the absolute value of X1, the absolute value of Y1, and the absolute value of Z1 is different from the other absolute values, and in which the relative coordinates correspond to one of conditions of: 1) each of the absolute values X1, Y1, and Z1 is set to be different from zero and half the L value, 2) X1 is set to zero and each of the absolute values X1 and Z1 is set to be different from zero and half the L value, 3) X1 and Y1 are set to zero and the absolute value of Z1 is set to be different from zero and half the L value, relative coordinates (X2, Y2, Z2) in which X2 is set to the sign-inverted value of X1, Y2 is set to the sign-inverted value of Y1, and Z2 is set to the sign-inverted value of Z1, relative coordinates (X3, Y3, Z3) in which X3 is set to Z1, Y3 is set to X1, and Z3 is set to Y1, relative coordinates (X4, Y4, Z4) in which X4 is set to the sign-inverted value of Z1, Y4 is set to the sign-inverted value of X1, and Z4 is set to the sign-inverted value of Y1, relative coordinates (X5, Y5, Z5) in which X5 is set to Y1, Y5 is set to Z1, and Z5 is set to X1, and relative coordinates (X6, Y6, Z6) in which X6 is set to the sign-inverted value of Y1, Y6 is set to the sign-inverted value of Z1, and Z6 is set to the sign-inverted value of X1; and second transmission processing of transmitting data to four destination processors located at the respective positions of: relative coordinates (X1, Y1, Z1) in which the absolute values of X1, Y1, and Z1 are set to the same value different from zero and half the L value, relative coordinates (X2, Y2, Z2) in which X2 is set to the sign-inverted value of X1, Y2 is set to the sign-inverted value of Y1, and Z2 is set to the sign-inverted value of Z1, relative coordinates (X3, Y3, Z3) in which X3 is set to X1, Y3 is set to Y1, and Z3 is set to the sign-inverted value of Z1, and relative coordinates (X4, Y4, Z4) in which X4 is set to the sign-inverted value of X1, Y4 is set to the sign-inverted value of Y1, and Z4 is set to Z1, wherein the plurality of processors in the parallel computing system performs, at the same timing, data communications using a same transmission processing with destination processors having the same relative coordinates.
8. The parallel computing system according to claim 1, wherein, when the interconnection network is a three-dimensional mesh, and when L represents a maximum value of a length in a first dimension, a length in a second dimension, and a length in a third dimension of the interconnection network and Xn, Yn, and Zn respectively represent the relative coordinate value in the first dimension, the relative coordinate value in the second dimension, and the relative coordinate value in the third dimension of the relative coordinates of an n-th destination processor with respect to a source processor, each of the plurality of processors performs: first transmission processing of transmitting data to six destination processors located at respective positions of: relative coordinates (X1, Y1, Z1) in which at least one of the absolute value of X1, the absolute value of Y1, and the absolute value of Z1 is different from the other absolute values, and in which the relative coordinates correspond to one of the following conditions of: 1) X1 is set to zero and one each of the absolute values X1, Y1, and Z1 is set to be different from zero and half the L value, 2) X1 is set to zero and each of the absolute values X1 and Z1 is set to be different from zero and half the L value, 3) X1 and Y1 are set to zero and the absolute value of Z1 is set to be different from zero and half the L value, relative coordinates (X2, Y2, Z2) in which X2 is set to the sign-inverted value of X1, Y2 is set to the sign-inverted value of Y1, and Z2 is set to the sign-inverted value of Z1, relative coordinates (X3, Y3, Z3) in which X3 is set to Z1, Y3 is set to X1, and Z3 is set to Y1, relative coordinates (X4, Y4, Z4) in which X4 is set to the sign-inverted value of Z1, Y4 is set to the sign-inverted value of X1, and Z4 is set to the sign-inverted value of Y1, relative coordinates (X5, Y5, Z5) in which X5 is set to Y1, Y5 is set to Z1, and Z5 is set to X1, and relative coordinates (X6, Y6, Z6) in which X6 is set to the sign-inverted value of Y1, Y6 is set to the sign-inverted value of Z1, and Z6 is set to the sign-inverted value of X1; and second transmission processing of transmitting data to four destination processors located at the respective positions of relative coordinates (X1, Y1, Z1) in which the absolute values of X1, Y1, and Z1 are set to the same value different from zero and half the L value, relative coordinates (X2, Y2, Z2) in which X2 is set to the sign-inverted value of X1, Y2 is set to the sign-inverted value of Y1, and Z2 is set to the sign-inverted value of Z1, relative coordinates (X3, Y3, Z3) in which X3 is set to X1, Y3 is set to Y1, and Z3 is set to the sign-inverted value of Z1, and relative coordinates (X4, Y4, Z4) in which X4 is set to the sign-inverted value of X1, Y4 is set to the sign-inverted value of Y1, and Z4 is set to Z1, wherein the plurality of processors in the parallel computing system performs, at the same timing, data communications using a same transmission processing with destination processors having the same relative coordinates.
9. The parallel computing system according to claim 6, wherein, when the L value is an odd number, each of the plurality of processors repeats the simultaneous data-transmission to six destination processors (L.sup.34L+3)/4 times, and repeats the simultaneous data-transmission to four destination processors L1 times, while changing the relative coordinate values in each of the simultaneous data-transmissions, and wherein, when the L value is an even number, each of the plurality of processors repeats the simultaneous data-transmission to six destination processors (L.sup.34L+1)/4 times, repeats the simultaneous data-transmission to four destination processors L2 times, and performs once the simultaneous data-transmission to seven destination processors, while changing the relative coordinate values in each of the simultaneous data-transmissions.
10. A method performed by each of a plurality of processors in a parallel computing system, the plurality of processors being multi-dimensionally coupled to each other in an interconnection network having multi-dimensional torus or mesh topology, each of the plurality of processors having a relative coordinate system common to all of the plurality of processors so that any one of the plurality of processors in the parallel computing system is uniquely identified by a set of relative coordinates based on the relative coordinate system, the method comprising: determining, in dimensional order, communication channels to other processors in the interconnection network, determining a plurality of distinct combinations each including plural sets of distinct destination relative coordinates so that the plurality of distinct combinations cover all the plurality of processors except the each processor in the parallel computing system, performing a transmission process to transmit data to all the plurality of processors in the parallel computing system, the transmission process including transmitting data, through the communication channels determined in dimensional order, simultaneously from the each processor to the plural sets of distinct destination relative coordinates included in each of the determined plurality of distinct combinations, wherein the plural sets of distinct destination relative coordinates included in each of the plurality of distinct combinations are determined to be symmetric about the origin of the relative coordinate system and to have identical distances in dimensional order from the origin of the relative coordinate system, so that a number of inter processor communications routed through respective links is equalized among all links in the parallel computing system.
11. The method of claim 10, wherein in a case where the interconnection network has a two-dimensional torus or mesh topology with two sides having a same length of 2n+1 where n is a natural number, and the relative coordinate system is set such that a set of relative coordinates of any one of the plurality of processors in the parallel computing system is represented as a set of relative coordinates (x, y) where x is an integer whose sign indicates a direction along an x-axis on a dimensional-order path from the origin (0, 0) and whose absolute value indicates a number of links of processors along the x-axis on the dimensional-order path, and y is an integer whose sign indicates a direction along a y-axis on the dimensional-order path from the origin (0, 0) and whose absolute value indicates a number of links of processors along the y-axis on the dimensional-order path from the origin (0, 0), the transmission process includes: a first transmission that transmits, for each of all pairs of natural numbers i and j satisfying that 0<j<i=<n, data simultaneously to a distinct combination of four sets of destination relative coordinates: (i, j), (i, j), (j, i), and (j, i), a second transmission that transmits, for each of all pairs of natural numbers i and j satisfying that 0<j<i=<n, data simultaneously to a distinct combination of four sets of destination relative coordinates: (i, j), (i, j), (j, i), and (j, i); a third transmission that transmits, for each of all natural numbers i satisfying that 0<i=<n, data simultaneously to a distinct combination of four sets of destination relative coordinates: (i, 0), (i, 0), (0, i), and (0, i); and a fourth transmission that transmits, for each of all natural numbers i satisfying that 0<i=<n, data simultaneously to a distinct combination of four sets of destination relative coordinates: (i, i), (i, i), (i, i), and (i, i).
12. The method of claim 10, wherein in a case where the interconnection network has a two-dimensional torus or mesh topology with two sides having a same length of 2n+2 where n is a natural number, and the relative coordinate system is set such that a set of relative coordinates of any one of the plurality of processors in the parallel computing system is represented as a set of relative coordinates (x, y) where x is an integer whose sign indicates a direction along an x-axis on a dimensional-order path from the origin (0, 0) and whose absolute value indicates a number of links of processors along the x-axis on the dimensional-order path, and y is an integer whose sign indicates a direction along a y-axis on the dimensional-order path from the origin (0, 0) and whose absolute value indicates a number of links of processors along the y-axis on the dimensional-order path from the origin (0, 0), the transmission process includes: a first transmission that transmits, for each of all pairs of natural numbers i and j satisfying that 0<j<i=<n, data simultaneously to a distinct combination of four sets of destination relative coordinates: (i, j), (i, j), (j, i), and (j, i); a second transmission that transmits, for each of all pairs of natural numbers i and j satisfying that 0<j<i=<n, data simultaneously to a distinct combination of four sets of destination relative coordinates: (i, j), (i, j), (j, i), and (j, i); a third transmission that transmits, for each of all natural numbers i satisfying that 0<i=<n, data simultaneously to a distinct combination of four sets of destination relative coordinates: (i, 0), (i, 0), (0, i), and (0, i); a fourth transmission that transmits, for each of all natural numbers i satisfying that 0<i=<n, data simultaneously to a distinct combination of four sets of destination relative coordinates: (i, i), (i, i), (i, i), and (i, i); a fifth simultaneous transmission that transmits, for each of all natural numbers i satisfying that 0<i=<n, data simultaneously to a distinct combination of four sets of destination relative coordinates: (n+1, i), (n1, i), (i, n+1), and (i, n1); and a sixth transmission that transmits data simultaneously to a distinct combination of three sets of destination relative coordinates: (n+1, 0), (0, n+1), and (n1, n1).
13. The method of claim 10, wherein in a case where the interconnection network has a three-dimensional torus or mesh topology with three sides having a same length of 2n+1 where n is a natural number, and the relative coordinate system is set such that a set of relative coordinates of any one of the plurality of processors in the parallel computing system is represented as a set of relative coordinates (x, y, z) where x is an integer whose sign indicates a direction along an x-axis on a dimensional-order path from the origin (0, 0, 0) and whose absolute value indicates a number of links of processors along the x-axis on the dimensional-order path, y is an integer whose sign indicates a direction along a y-axis on the dimensional-order path from the origin (0, 0, 0) and whose absolute value indicates a number of links of processors along the y-axis on the dimensional-order path from the origin (0, 0, 0), and z is an integer whose sign indicates a direction along an z-axis on the dimensional-order path from the origin (0, 0, 0) and whose absolute value indicates a number of links of processors along the z-axis on the dimensional-order path, the transmission process includes: a first transmission that performs, for each of all combinations of three natural numbers i, j, and k satisfying that 0<k<j<i=<n, transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, j, k), (i, j, k), (k, i, j), (k, i, j), (j, k, i), and (j, k, i), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, j, k), (i, j, k), (k, i, j), (k, i, j), (j, k, i), and (j, k, i), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, j, k), (i, j, k), (k, i, j), (k, i, j), (j, k, i), and (j, k, i), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, j, k), (i, j, k), (k, i, j), (k, i, j), (j, k, i), and (j, k, i), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, k, j), (i, k, j), (j, i, k), (j, i, k), (k, j, i), and (k, j, i), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, k, j), (i, k, j), (j, i, k), (j, i, k), (k, j, i), and (k, j, i), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, k, j), (i, k, j), (j, i, k), (j, i, k), (k, j, i), and (k, j, i), and transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, k, j), (i, k, j), (j, i, k), (j, i, k), (k, j, i), and (k, j, i); a second transmission that performs, for each of all combinations of three natural numbers i, and j satisfying that 0<j<i=<n, transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, i, j), (i, i, j), (j, i, i), (j, i, i), (i, j, i), and (i, j, i), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, i, j), (i, i, j), (j, i, i), (j, i, i), (i, j, i), and (i, j, i), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, i, j), (i, i, j), (j, i, i), (j, i, i), (i, j, i), and (i, j, i), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, i, j), (i, i, j), (j, i, i), (j, i, i), (i, j, i), and (i, j, i), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (j, j, i), (j, j, i), (i, j, j), (i, j, j), (j, i, j), and (j, i, j), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (j, j, i), (j, j, i), (i, j, j), (i, j, j), (j, i, j), and (j, i, j), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (j, j, i), (j, j, i), (i, j, j), (i, j, j), (j, i, j), and (j, i, j), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (j, j, i), (j, j, i), (i, j, j), (i, j, j), (j, i, j), and (j, i, j); transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (0, i, j), (0, i, j), (j, 0, i), (j, 0, i), (i, j, 0), and (i, j, 0); transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (0, i, j), (0, i, j), (j, 0, i), (j, 0, i), (i, j, 0), and (i, j, 0), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (0, j, i), (0, j, i), (i, 0, j), (i, 0, j), (j, i, 0), and (j, i, 0), and transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (0, j, i), (0, j, i), (i, 0, j), (i, 0, j), (j, i, 0), and (j, i, 0); and a third transmission that performs, for each of all natural numbers i satisfying that 0<i=<n, transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (0, i, i), (0, i, i), (i, 0, i), (i, 0, i), (i, i, 0), and (i, i, 0), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, 0, 0), (i, 0, 0), (0, 0, i), (0, 0, i), (0, i, 0), and (0, i, 0), transmitting data simultaneously to a distinct combination of four sets of destination relative coordinates: (i, i, i), (i, i, i), (i, i, i), and (i, i, i), and transmitting data simultaneously to a distinct combination of four sets of destination relative coordinates: (i, i, i), (i, i, i), (i, i, i), and (i, i, i).
14. The method of claim 10, wherein in a case where the interconnection network has a three-dimensional torus or mesh topology with three sides having a same length of 2n+2 where n is a natural number, and the relative coordinate system is set such that a set of relative coordinates of any one of the plurality of processors in the parallel computing system is represented as a set of relative coordinates (x, y, z) where x is an integer whose sign indicates a direction along an x-axis on a dimensional-order path from the origin (0, 0, 0) and whose absolute value indicates a number of links of processors along the x-axis on the dimensional-order path, y is an integer whose sign indicates a direction along a y-axis on the dimensional-order path from the origin (0, 0, 0) and whose absolute value indicates a number of links of processors along the y-axis on the dimensional-order path from the origin (0, 0, 0), and z is an integer whose sign indicates a direction along an z-axis on the dimensional-order path from the origin (0, 0, 0) and whose absolute value indicates a number of links of processors along the z-axis on the dimensional-order path, the transmission process includes: a first transmission that performs, for each of all combinations of three natural numbers i, j, and k satisfying that 0<k<j<i=<n, transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, j, k), (i, j, k), (k, i, j), (k, i, j), (j, k, i), and (j, k, i), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, j, k), (i, j, k), (k, i, j), (k, i, j), (j, k, i), and (j, k, i), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, j, k), (i, j, k), (k, i, j), (k, i, j), (j, k, i), and (j, k, i), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, j, k), (i, j, k), (k, i, j), (k, i, j), (j, k, i), and (j, k, i), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, k, j), (i, k, j), (j, i, k), (j, i, k), (k, j, i), and (k, j, i), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, k, j), (i, k, j), (j, i, k), (j, i, k), (k, j, i), and (k, j, i), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, k, j), (i, k, j), (j, i, k), (j, i, k), (k, j, i), and (k, j, i), and transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, k, j), (i, k, j), (j, i, k), (j, i, k), (k, j, i), and (k, j, i); a second transmission that performs, for each of all combinations of three natural numbers i, and j satisfying that 0<j<i=<n, transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, i, j), (i, i, j), (j, i, i), (j, i, i), (i, j, i), and (i, j, i), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, i, j), (i, i, j), (j, i, i), (j, i, i), (i, j, i), and (i, j, i), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, i, j), (i, i, j), (j, i, i), (j, i, i), (i, j, i), and (i, j, i), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, i, j), (i, i, j), (j, i, i), (j, i, i), (i, j, i), and (i, j, i), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (j, j, i), (j, j, i), (i, j, j), (i, j, j), (j, i, j), and (j, i, j), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (j, j, i), (j, j, i), (i, j, j), (i, j, j), (j, i, j), and (j, i, j), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (j, j, i), (j, j, i), (i, j, j), (i, j, j), (j, i, j), and (j, i, j), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (j, j, i), (j, j, i), (i, j, j), (i, j, j), (j, i, j), and (j, i, j); transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (0, i, j), (0, i, j), (j, 0, i), (j, 0, i), (i, j, 0), and (i, j, 0); transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (0, i, j), (0, i, j), (j, 0, i), (j, 0, i), (i, j, 0), and (i, j, 0), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (0, j, i), (0, j, i), (i, 0, j), (i, 0, j), (j, i, 0), and (j, i, 0), and transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (0, j, i), (0, j, i), (i, 0, j), (i, 0, j), (j, i, 0), and (j, i, 0); a third transmission that performs, for each of all natural numbers i satisfying that 0<i=<n, transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (0, i, i), (0, i, i), (i, 0, i), (i, 0, i), (i, i, 0), and (i, i, 0), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, 0, 0), (i, 0, 0), (0, 0, i), (0, 0, i), (0, i, 0), and (0, i, 0), transmitting data simultaneously to a distinct combination of four sets of destination relative coordinates: (i, i, i), (i, i, i), (i, i, i), and (i, i, i), and transmitting data simultaneously to a distinct combination of four sets of destination relative coordinates: (i, i, i), (i, i, i), (i, i, i), and (i, i, i); a fourth transmission that performs, for each of all combinations of two natural numbers i and j satisfying that 0<j<i=<n, transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (n+1, i, j), (n1, i, j), (j, n+1, i), (j, n1, i), (i, j, n+1), and (i, j, n1), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (n+1, i, j), (n1, i, j), (j, n+1, i), (j, n1, i), (i, j, n+1), and (i, j, n1), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (n+1, j, i), (n1, j, i), (i, n+1, j), (i, n1, j), (j, i, n+1), and (j, i, n1), and transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (n+1, j, i), (n1, j, i), (i, n+1, j), (i, n1, j), (j, i, n+1), and (j, i, n1); a fifth transmission that performs, for each of all natural numbers i satisfying that 0<i=<n, transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (n+1, i, i), (n1, i, i), (i, n+1, i), (i, n1, i), (i, i, n+1), and (i, i, n1), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (n+1, i, 0), (n1, i, 0), (0, n+1, i), (0, n1, i), (i, 0, n+1), and (i, 0, n1), transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (n+1, 0, i), (n1, 0, i), (i, n+1, 0), (i, n1, 0), (0, i, n+1), and (0, i, n1), and transmitting data simultaneously to a distinct combination of six sets of destination relative coordinates: (i, n+1, n+1), (i, n1, n1), (n+1, i, n+1), (n1, i, n1), (n+1, n+1, i), and (n1, n1, i); and a sixth transmission that performs: transmitting data simultaneously to a distinct combination of four sets of destination relative coordinates: (n+1, n+1, 0), (0, 0, n+1), (n1, 0, n1), and (0, n1, 0), and transmitting data simultaneously to a distinct combination of three sets of destination relative coordinates: (0, n+1, n+1), (n+1, 0, 0), and (n1, n1, n1).
Description
BRIEF DESCRIPTION OF DRAWINGS
(1)
(2)
(3)
(4)
DESCRIPTION OF EMBODIMENTS
(5) In the figures, dimensions and/or proportions may be exaggerated for clarity of illustration. It will also be understood that when an element is referred to as being connected to another element, it may be directly connected or indirectly connected, i.e., intervening elements may also be present. Further, it will be understood that when an element is referred to as being between two elements, it may be the only element layer between the two elements, or one or more intervening elements may also be present.
(6) Embodiments of the present invention will be described below.
First Embodiment
(7) A parallel computing system according to an embodiment of the present invention connects a plurality of processors by using an interconnection network. In the present embodiment, the interconnection network is a two-dimensional torus, and the communication channels of the interconnection network are determined in dimensional order. Further, the length in a first dimension and the length in a second dimension of the interconnection network are equal to each other. See
(8) Each of the processors included in the parallel computing system of the present embodiment is capable of substantially simultaneously transmitting data to a maximum of four processors. Further, in the substantially simultaneous data transmission from each source processor to the respective destination processors, the relative coordinates of the destination processors with respect to the source processor are set to be common to all of the processors included in the parallel computing system.
(9) In the present embodiment, the length in the first dimension and the length in the second dimension of the interconnection network are represented as L. Herein, the length L indicates the number of links in the direction of each dimension, and does not necessarily refer to the physical length. This definition similarly applies to the other embodiments.
(10) Further, in the relative coordinates of the n-th destination processor with respect to a source processor, the relative coordinate value in the first dimension is represented as Xn, and the relative coordinate value in the second dimension is represented as Yn. In the present embodiment, an arbitrary source processor in the parallel computing system substantially simultaneously transmits data to destination processors by using all links of the two-dimensional torus interconnection network with substantially equal loads placed on the links.
(11) In the parallel computing system according to the present embodiment, each of the processors performs the following operations.
(12) The relative coordinates of the first destination processor with respect to an arbitrary source processor are now represented as X1, Y1. X denotes coordinate in X axis, and Y denotes coordinate in Y axis which is perpendicular to X axis. According to a first relative coordinate value determination algorithm, each of the processors determines the relative coordinate values of the relative coordinates X1, Y1 as follows:
(13) 1) Set the absolute value of X1 and the absolute value of Y1 to be different from each other; and
(14) 2) Set X1 to zero and set the absolute value of Y1 to be different from half the L value, or set Y1 to zero and set the absolute value of X1 to be different from half the L value.
(15) In this case, the relative coordinate values of the relative coordinates indicating three other destination processors are determined as follows.
(16) 1) In relative coordinates X2, Y2, set X2 to the sign-inverted value of X1, and set Y2 to the sign-inverted value of Y1.
(17) 2) In relative coordinates X3, Y3, set X3 to Y1, and set Y3 to X1.
(18) 3) In relative coordinates X4, Y4, set X4 to the sign-inverted value of Y1, and set Y4 to the sign-inverted value of X1.
(19) Each of the processors in the parallel computing system substantially simultaneously transmits data to four destination processors located at the respective positions of the above-determined relative coordinates X1, Y1, X2, Y2, X3, Y3, and X4, Y4. In this process, all of the processors in the parallel computing system perform, at substantially the same timing, substantially simultaneous data transmission to four destination processors indicated by substantially the same relative coordinates. Therefore, the present embodiment is capable of transmitting data by using all links of the two-dimensional torus interconnection network with substantially equal loads placed on the links.
(20) Further, according to a second relative coordinate value determination algorithm, each of the processors sets the relative coordinate values of relative coordinates X1, Y1 of a destination processor as follows:
(21) 1) Set the absolute value of X1 to be equal to the absolute value of Y1; and
(22) 2) Set the absolute value of X1 and the absolute value of Y1 to be different from half the L value.
(23) In this case, each of the processors determines the relative coordinate values of the relative coordinates indicating three other destination processors as follows.
(24) 1) In relative coordinates X2, Y2, set X2 to the sign-inverted value of X1, and set Y2 to the sign-inverted value of Y1.
(25) 2) In relative coordinates X3, Y3, set X3 to Y1, and set Y3 to X1.
(26) 3) In relative coordinates X4, Y4, set X4 to the sign-inverted value of Y1, and set Y4 to the sign-inverted value of X1.
(27) Each of all processors in the parallel computing system substantially simultaneously transmits data to four destination processors located at the respective positions of the above-determined relative coordinates X1, Y1, X2, Y2, X3, Y3, and X4, Y4.
(28) Meanwhile, according to a third relative coordinate value determination algorithm, each of the processors sets the relative coordinate values of relative coordinates X1, Y1 of a destination processor as follows:
(29) 1) Set the absolute value of X1 to be equal to the absolute value of Y1; and
(30) 2) Set the absolute value of X1 and the absolute value of Y1 to half the L value.
(31) In this case, each of the processors determines the relative coordinate values of the relative coordinates of two other destination processors as follows.
(32) 1) In relative coordinates X2, Y2, set X2 to the sign-inverted value of X1, and set Y2 to zero.
(33) 2) In relative coordinates X3, Y3, set X3 to zero, and set Y3 to the sign-inverted value of Y1.
(34) Each of the processors in the parallel computing system substantially simultaneously transmits data to three destination processors located at the respective positions of the above-determined relative coordinates X1, Y1, X2, Y2, and X3, Y3 by using all links of the two-dimensional torus interconnection network with substantially equal loads placed on the links.
(35) In the present embodiment, the above-described three types of relative coordinate value determination algorithms are combined as specified to determine the destination processors to which data should be substantially simultaneously transmitted.
(36) Herein, if the L value is an odd number, the number of the processors included in the parallel computing system is also an odd number. With respect to an arbitrary source processor, therefore, the number of the destination processors to be subjected to all-to-all communication is an even number. In this case, each of the processors performs substantially simultaneous transmission to four destination processors by performing the substantially simultaneous transmission the (L1).sup.2/4 times with the absolute value of X1 and the absolute value of Y1 set to be different from each other and performing the substantially simultaneous transmission the (L1)/2 times with the absolute value of X1 and the absolute value of Y1 set to be equal to each other, while changing the relative coordinate values for each of the substantially simultaneous transmissions and determining the relative coordinate values of the destination processors so that data is not transmitted to the same processor more than once.
(37) Meanwhile, if the L value is an even number, the number of the processors included in the parallel computing system is an even number. With respect to an arbitrary source processor, therefore, the number of the destination processors to be subjected to all-to-all communication is an odd number. In this case, therefore, each of the processors performs substantially simultaneous transmission to four destination processors the (L1).sup.2/4 times with the absolute value of X1 and the absolute value of Y1 set to be different from each other, performs substantially simultaneous transmission to four destination processors the (L/21) times with the absolute value of X1 and the absolute value of Y1 set to be equal to each other, and performs one substantially simultaneous transmission to three destination processors with the absolute value of X1 and the absolute value of Y1 set to be equal to each other and set to half the L value, while changing the relative coordinate values for each of the simultaneous transmissions and determining the relative coordinate values of the destination processors so that the overlapping of destination processors does not occur.
(38) According to the above-described operations, if the transfer distances of data substantially simultaneously transmitted from each of the processors are added up for each of the dimensions and directions, the sum of the transfer distances from each of the processors is equal among all dimensions and directions. Further, all of the processors perform data transfer to the same relative coordinates. It is therefore possible to achieve all-to-all communication in the parallel computing system with equal loads placed on the links. If the L value is an even number, however, the parallel computing system may arbitrarily determine the transfer direction for each data transfer in a dimension in which the absolute value of the corresponding relative coordinate is half the L value.
(39) In the parallel computing system of the present embodiment, each of the processors performs substantially simultaneous transmission to a plurality of destination processors a plurality of times. Herein, simultaneous transmission performed by all of the processors in a similar manner is a condition for equalizing the loads on the links. In some cases, however, the processors have different processing times due to disturbance factors. As a result, asymmetry may occur in which, during a substantially simultaneous communication by a processor, another processor starts the next simultaneous communication.
(40) If all of the processors in the parallel computing system are synchronized between a substantially simultaneous transmission and the next transmission communication, the occurrence of asymmetry due to disturbance factors is inhibited. It is therefore possible that all of the processors perform substantially simultaneous transmission in the same phase at substantially the same time. This feature similarly applies to the other embodiments described later. The synchronization of the processors is performed with the use of a synchronization mechanism by hardware, which is generally provided in a parallel computing system. Even if a parallel computing system does not include a synchronization mechanism, as in the case of a PC (Personal Computer) cluster or the like, the parallel computing system may use a synchronization library usually provided by software.
(41) The processing of determining the relative coordinate values of the destination processors may be performed by each of the processors every time the data transmission processing is performed. Further, the relative coordinate values of the destination processors previously determined by an arbitrary algorithm may be set in a table or the like of each of the processors, and the relative coordinate values set in the table or the like may be read by each of the processors every time the data transmission processing is performed.
Second Embodiment
(42) A parallel computing system according to a second embodiment of the present invention connects a plurality of processors by using an interconnection network. In the present embodiment, the interconnection network is a two-dimensional torus, and the communication channels of the interconnection network are determined in dimensional order. Further, the length in a first dimension and the length in a second dimension of the interconnection network are different from each other.
(43) Each of the processors in the parallel computing system according to the second embodiment is capable of substantially simultaneously transmitting data to a maximum of four processors. The relative coordinates used when data is substantially simultaneously transmitted from each of the processors to a plurality of destination processors are common to all of the processors included in the parallel computing system.
(44) In the second embodiment, the maximum value of the length in the first dimension and the length in the second dimension of the interconnection network is represented as L. Further, in the relative coordinates of the n-th destination processor with respect to a source processor, the relative coordinate value in the first dimension is represented as Xn, and the relative coordinate value in the second dimension is represented as Yn. If any of the length in the first dimension and the length in the second dimension is less than the L value, however, a transmittable range corresponding to the length in the dimension is preset, and the transmission operation is not performed to relative coordinates exceeding the transmittable range.
(45) In the present embodiment, each of the processors in the parallel computing system substantially simultaneously transmits data to destination processors having the relative coordinates determined as follows, by using the respective links of the two-dimensional torus interconnection network with a load of a specified value or less placed on each of the links.
(46) According to a first relative coordinate value determination algorithm, each of the processors first determines the relative coordinate values of relative coordinates X1, Y1 of a destination processor as follows:
(47) 1) Set the absolute value of X1 to be different from the absolute value of Y1; and
(48) 2) Set X1 to zero and set the absolute value of Y1 to be different from half the L value, or set Y1 to zero and set the absolute value of X1 to be different from half the L value.
(49) In this case, each of the processors determines the relative coordinate values of the relative coordinates of three other destination processors as follows.
(50) 1) In relative coordinates X2, Y2, set X2 to the sign-inverted value of X1, and set Y2 to the sign-inverted value of Y1.
(51) 2) In relative coordinates X3, Y3, set X3 to Y1, and set Y3 to X1.
(52) 3) In relative coordinates X4, Y4, set X4 to the sign-inverted value of Y1, and set Y4 to the sign-inverted value of X1.
(53) Each of the processors in the parallel computing system substantially simultaneously transmits data to four destination processors located at the respective positions of the above-determined relative coordinates X1, Y1, X2, Y2, X3, Y3, and X4, Y4 by using all links of the two-dimensional torus interconnection network with substantially equal loads placed on the links.
(54) Further, according to a second relative coordinate value determination algorithm, each of the processors sets the relative coordinate values of relative coordinates X1, Y1 of a destination processor as follows:
(55) 1) Set the absolute value of X1 to be equal to the absolute value of Y1; and
(56) 2) Set the absolute value of X1 and the absolute value of Y1 to be different from half the L value.
(57) In this case, each of the processors determines the relative coordinate values of the relative coordinates of three other destination processors as follows.
(58) 1) In relative coordinates X2, Y2, set X2 to the sign-inverted value of X1, and set Y2 to the sign-inverted value of Y1.
(59) 2) In relative coordinates X3, Y3, set X3 to Y1, and set Y3 to X1.
(60) 3) In relative coordinates X4, Y4, set X4 to the sign-inverted value of Y1, and set Y4 to the sign-inverted value of X1.
(61) Each of the processors in the parallel computing system substantially simultaneously transmits data to four destination processors located at the above-determined relative coordinates X1, Y1, X2, Y2, X3, Y3, and X4, Y4 by using the respective links of the two-dimensional torus interconnection network with a load of a specified value or less placed on each of the links.
(62) Further, according to a third relative coordinate value determination algorithm, each of the processors sets the relative coordinate values of relative coordinates X1, Y1 of a destination processor as follows:
(63) 1) Set the absolute value of X1 to be equal to the absolute value of Y1; and
(64) 2) Set the absolute value of X1 and the absolute value of Y1 to half the L value.
(65) In this case, each of the processors determines the relative coordinate values of the relative coordinates of two other destination processors as follows.
(66) 1) In relative coordinates X2, Y2, set X2 to the sign-inverted value of X1, and set Y2 to zero.
(67) 2) In relative coordinates X3, Y3, set X3 to zero, and set Y3 to the sign-inverted value of Y1.
(68) Each of the processors in the parallel computing system substantially simultaneously transmits data to three destination processors located at the above-determined relative coordinates X1, Y1, X2, Y2, and X3, Y3 by using the respective links of the two-dimensional torus interconnection network with a load of a specified value or less placed on each of the links.
(69) If the L value is an odd number, each of the processors performs substantially simultaneous transmission to four destination processors by performing the substantially simultaneous transmission the (L1).sup.2/4 times with the absolute value of X1 and the absolute value of Y1 set to be different from each other and performing the substantially simultaneous transmission the (L1)/2 times with the absolute value of X1 and the absolute value of Y1 set to be equal to each other, while changing the relative coordinate values of the destination processors in each of the simultaneous transmissions so that the overlapping of destination processors does not occur.
(70) Meanwhile, if the L value is an even number, each of the processors performs substantially simultaneous transmission to four destination processors the (L1).sup.2/4 times with the absolute value of X1 and the absolute value of Y1 set to be different from each other, performs substantially simultaneous transmission to four destination processors the (L/21) times with the absolute value of X1 and the absolute value of Y1 set to be equal to each other, and performs one substantially simultaneous transmission to three destination processors with the absolute value of X1 and the absolute value of Y1 set to be equal to each other and set to half the L value so that the overlapping of destination processors does not occur.
Third Embodiment
(71) In a third embodiment of the present invention, a parallel computing system connects a plurality of processors by using an interconnection network. The interconnection network is a two-dimensional mesh, and the communication channels of the interconnection network are determined in dimensional order. Further, each of the processors in the parallel computing system substantially simultaneously transmits data to a maximum of four processors. Herein, the relative coordinates of a plurality of destination processors with respect to an arbitrary processor are common to all of the processors included in the parallel computing system.
(72) In the present embodiment, the maximum value of the length in a first dimension and the length in a second dimension of the interconnection network is represented as L. Further, in the relative coordinates of the n-th destination processor with respect to a source processor, the relative coordinate value in the first dimension is represented as Xn, and the relative coordinate value in the second dimension is represented as Yn. If the relative coordinates of a destination processor with respect to an arbitrary processor are located outside the interconnection network, however, the transmission from the processor to the destination processor is not performed.
(73) Herein, each of the processors determines the relative coordinate values of relative coordinates X1, Y1 of a destination processor as follows:
(74) 1) Set the absolute value of X1 to be different from the absolute value of Y1; and
(75) 2) Set X1 to zero and set the absolute value of Y1 to be different from half the L value, or set Y1 to zero and set the absolute value of X1 to be different from half the L value.
(76) In this case, the relative coordinate values of the relative coordinates of three other destination processors are determined as follows.
(77) 1) In relative coordinates X2, Y2, set X2 to the sign-inverted value of X1, and set Y2 to the sign-inverted value of Y1.
(78) 2) In relative coordinates X3, Y3, set X3 to Y1, and set Y3 to X1.
(79) 3) In relative coordinates X4, Y4, set X4 to the sign-inverted value of Y1, and set Y4 to the sign-inverted value of X1.
(80) Each of the processors in the parallel computing system substantially simultaneously transmits data to four destination processors located at the respective positions of the above-determined relative coordinates X1, Y1, X2, Y2, X3, Y3, and X4, Y4 by using the respective links of the two-dimensional mesh interconnection network with a load of a specified value or less placed on each of the links.
(81) Then, each of the processors determines the relative coordinate values of relative coordinates X1, Y1 of a destination processor as follows:
(82) 1) Set the absolute value of X1 to be equal to the absolute value of Y1; and
(83) 2) Set the absolute value of X1 and the absolute value of Y1 to be different from half the L value.
(84) In this case, each of the processors determines the relative coordinate values of three other destination processors as follows.
(85) 1) In relative coordinates X2, Y2, set X2 to the sign-inverted value of X1, and set Y2 to the sign-inverted value of Y1.
(86) 2) In relative coordinates X3, Y3, set X3 to Y1, and set Y3 to X1.
(87) 3) In relative coordinates X4, Y4, set X4 to the sign-inverted value of Y1, and set Y4 to the sign-inverted value of X1.
(88) Each of all processors in the parallel computing system substantially simultaneously transmits data to four destination processors located at the respective positions of the above-determined relative coordinates X1, Y1, X2, Y2, X3, Y3, and X4, Y4 by using the respective links of the two-dimensional mesh interconnection network with a load of a specified value or less placed on each of the links.
(89) Further, each of the processors determines the relative coordinate values of relative coordinates X1, Y1 of a destination processor as follows:
(90) 1) Set the absolute value of X1 to be equal to the absolute value of Y1; and
(91) 2) Set the absolute value of X1 and the absolute value of Y1 to half the L value.
(92) In this case, each of the processors determines the relative coordinate values of other destination processors as follows.
(93) 1) In relative coordinates X2, Y2, set X2 to the sign-inverted value of X1, and set Y2 to zero.
(94) 2) In relative coordinates X3, Y3, set X3 to zero, and set Y3 to the sign-inverted value of Y1.
(95) Each of all processors in the parallel computing system substantially simultaneously transmits data to three destination processors located at the above-determined relative coordinates X1, Y1, X2, Y2, and X3, Y3 by using the respective links of the two-dimensional mesh interconnection network with a load of a specified value or less placed on each of the links.
(96) Herein, if the L value is an odd number, an arbitrary processor performs substantially simultaneous transmission to four destination processors by performing the substantially simultaneous transmission the (L1).sup.2/4 times with the absolute value of X1 and the absolute value of Y1 in the relative coordinates of the destination processor set to be different from each other and performing the simultaneous transmission the (L1)/2 times with the absolute value of X1 and the absolute value of Y1 set to be equal to each other so that the overlapping of destination processors does not occur.
(97) Meanwhile, if the L value is an even number, an arbitrary processor performs substantially simultaneous transmission to four destination processors the (L1).sup.2/4 times with the absolute value of X1 and the absolute value of Y1 set to be different from each other, performs substantially simultaneous transmission to four destination processors the (L/21) times with the absolute value of X1 and the absolute value of Y1 set to be equal to each other, and performs one substantially simultaneous transmission to three destination processors so that the overlapping of destination processors does not occur.
Fourth Embodiment
(98) In a parallel computing system according to a fourth embodiment of the present invention, the interconnection network is a three-dimensional torus, and the communication channels of the interconnection network are determined in dimensional order. Further, the length in a first dimension, the length in a second dimension, and the length in a third dimension of the interconnection network are substantially equal to one another.
(99) Each of the processors in the parallel computing system of the present embodiment substantially simultaneously transmits data to a maximum of six processors. Further, the relative coordinates of a plurality of destination processors with respect to an arbitrary processor are common to all of the processors included in the parallel computing system. In the present embodiment, an arbitrary source processor in the parallel computing system simultaneously transmits data to the destination processors in accordance with the following procedure by using all links of the three-dimensional torus interconnection network with substantially equal loads placed on the links.
(100) Herein, each of the length in the first dimension, the length in the second dimension, and the length in the third dimension of the interconnection network is represented as L. Further, in the relative coordinates of the n-th destination processor with respect to a source processor, the relative coordinate value in the first dimension is represented as Xn, the relative coordinate value in the second dimension is represented as Yn, and the relative coordinate value in the third dimension is represented as Zn.
(101) Herein, a case is assumed in which each of the processors sets the relative coordinate values of relative coordinates X1, Y1, Z1 of an arbitrary destination processor in the following procedure. In this case, at least one of the absolute value of X1, the absolute value of Y1, and the absolute value of Z1 is different from the other absolute values, and the relative coordinates correspond to one of the following conditions:
(102) 1) Set X1 to zero, and set one of the absolute value of Y1 and the absolute value of Z1 to be different from half the L value;
(103) 2) Set Y1 to zero, and set one of the absolute value of X1 and the absolute value of Z1 to be different from half the L value;
(104) 3) Set Z1 to zero, and set one of the absolute value of X1 and the absolute value of Y1 to be different from half the L value;
(105) 4) Set X1 and Y1 to zero, and set the absolute value of Z1 to be different from half the L value;
(106) 5) Set X1 and Z1 to zero, and set the absolute value of Y1 to be different from half the L value; or
(107) 6) Set Y1 and Z1 to zero, and set the absolute value of X1 to be different from half the L value.
(108) In this case, each of the processors determines the relative coordinates of five other destination processors as follows.
(109) 1) In relative coordinates X2, Y2, Z2, set X2 to the sign-inverted value of X1, set Y2 to the sign-inverted value of Y1, and set Z2 to the sign-inverted value of Z1.
(110) 2) In relative coordinates X3, Y3, Z3, set X3 to Z1, set Y3 to X1, and set Z3 to Y1.
(111) 3) In relative coordinates X4, Y4, Z4, set X4 to the sign-inverted value of Z1, set Y4 to the sign-inverted value of X1, and set Z4 to the sign-inverted value of Y1.
(112) 4) In relative coordinates X5, Y5, Z5, set X5 to Y1, set Y5 to Z1, and set Z5 to X1.
(113) 5) In relative coordinates X6, Y6, Z6, set X6 to the sign-inverted value of Y1, set Y6 to the sign-inverted value of Z1, and set Z6 to the sign-inverted value of X1.
(114) Each of all processors in the parallel computing system substantially simultaneously transmits data to six destination processors located at the respective positions of the above-determined relative coordinates X1, Y1, Z1, X2, Y2, Z2, X3, Y3, Z3, X4, Y4, Z4, X5, Y5, Z5, and X6, Y6, Z6 by using all links of the three-dimensional torus interconnection network with substantially equal loads placed on the links.
(115) Further, all of the processors perform, at substantially the same timing, simultaneous data transmission to destination processors having the same relative coordinate values.
(116) Further, each of the processors determines the relative coordinate values of relative coordinates X1, Y1, Z1 of a destination processor as follows:
(117) 1) Set the absolute value of X1, the absolute value of Y1, and the absolute value of Z1 to be equal to one another; and
(118) 2) Set the absolute value of X1, the absolute value of Y1, and the absolute value of Z1 to be different from half the L value.
(119) In this case, each of the processors determines the relative coordinates of three other destination processors as follows.
(120) 1) In relative coordinates X2, Y2, Z2, set X2 to the sign-inverted value of X1, set Y2 to the sign-inverted value of Y1, and set Z2 to the sign-inverted value of Z1.
(121) 2) In relative coordinates X3, Y3, Z3, set X3 to X1, set Y3 to Y1, and set Z3 to the sign-inverted value of Z1.
(122) 3) In relative coordinates X4, Y4, Z4, set X4 to the sign-inverted value of X1, set Y4 to the sign-inverted value of Y1, and set Z4 to Z1.
(123) Each of all processors in the parallel computing system substantially simultaneously transmits data to four destination processors located at the above-determined relative coordinates X1, Y1, Z1, X2, Y2, Z2, X3, Y3, Z3, and X4, Y4, Z4 by using all links of the three-dimensional torus interconnection network with substantially equal loads placed on the links.
(124) Further, each of the processors determines the relative coordinate values of relative coordinates X1, Y1, Z1 of a destination processor as follows:
(125) 1) Set the absolute value of X1, the absolute value of Y1, and the absolute value of Z1 to be equal to one another; and
(126) 2) Set the absolute value of X1, the absolute value of Y1, and the absolute value of Z1 to half the L value.
(127) In this case, each of the processors determines the relative coordinates of six other destination processors as follows.
(128) 1) In relative coordinates X2, Y2, Z2, set X2 to the sign-inverted value of X1, and set Y2 and Z2 to zero.
(129) 2) In relative coordinates X3, Y3, Z3, set X3 to zero, set Y3 to the sign-inverted value of Y1, and set Z3 to the sign-inverted value of Z1.
(130) 3) In relative coordinates X4, Y4, Z4, set X4 and Z4 to zero, and set Y4 to Y1.
(131) 4) In relative coordinates X5, Y5, Z5, set X5 to X1, set Y5 to zero, and set Z5 to Z1.
(132) 5) In relative coordinates X6, Y6, Z6, set X6 and Y6 to zero, and set Z6 to the sign-inverted value of Z1.
(133) 6) In relative coordinates X7, Y7, Z7, set X7 to the sign-inverted value of X1, set Y7 to the sign-inverted value of Y1, and set Z7 to zero.
(134) Each of all processors in the parallel computing system performs, as the first simultaneous transmission, substantially simultaneous data transmission to the first destination processor corresponding to the relative coordinates X1, Y1, Z1, the second destination processor corresponding to the relative coordinates X2, Y2, Z2, and the third destination processor corresponding to the relative coordinates X3, Y3, Z3 by using all links of the three-dimensional torus interconnection network with substantially equal loads placed on the links. Then, each of the source processors performs, as the second simultaneous transmission, substantially simultaneous data transmission to the fourth destination processor corresponding to the relative coordinates X4, Y4, Z4, the fifth destination processor corresponding to the relative coordinates X5, Y5, Z5, the sixth destination processor corresponding to the relative coordinates X6, Y6, Z6, and the seventh destination processor corresponding to the relative coordinates X7, Y7, Z7 by using all links of the three-dimensional torus interconnection network with substantially equal loads placed on the links. That is, in the present example, the data transmission to seven destination processors in total is performed in two sessions.
(135) The parallel computing system according to the present embodiment determines the relative coordinates of the destination processors by combining, as required, the above-described three types of relative coordinate value determination algorithms, and performs the substantially simultaneous data transmission to the destination processors.
(136) Herein, if the L value is an odd number, an arbitrary processor performs substantially simultaneous transmission to six destination processors the (L.sup.34L+3)/4 times, and performs substantially simultaneous transmission to four destination processors the L1 times such that the overlapping of destination processors does not occur.
(137) Meanwhile, if the L value is an even number, an arbitrary processor performs simultaneous transmission to six destination processors the (L.sup.34L+1)/4 times, performs substantially simultaneous transmission to four destination processors the L2 times, and performs substantially simultaneous transmission to seven destination processors with one substantially simultaneous transmission to three destination processors and one substantially simultaneous transmission to four destination processors so that the overlapping of destination processors does not occur.
Fifth Embodiment
(138) A parallel computing system according to a fifth embodiment of the present invention connects a plurality of processors by using an in interconnection network. The interconnection network according to the fifth embodiment is a three-dimensional torus, and the communication channels of the interconnection network are determined in dimensional order. Further, at least one of the length in a first dimension, the length in a second dimension, and the length in a third dimension of the interconnection network is different from the lengths in the other dimensions. Further, each of all processors in the parallel computing system substantially simultaneously transmits data to a maximum of six processors. The relative coordinates of a plurality of destination processors used when data is substantially simultaneously transmitted from an arbitrary processor to the destination processors are common to all of the processors included in the parallel computing system.
(139) Herein, the maximum value of the length in the first dimension, the length in the second dimension, and the length in the third dimension of the interconnection network is represented as L. Further, in the relative coordinates of the n-th destination processor with respect to a source processor, the relative coordinate value in the first dimension is represented as Xn, the relative coordinate value in the second dimension is represented as Yn, and the relative coordinate value in the third dimension is represented as Zn. If any of the length in the first dimension, the length in the second dimension, and the length in the third dimension is less than the L value, however, a transmittable range corresponding to the length in the dimension is preset, and the transmission processing is not performed to a processor located at relative coordinates exceeding the transmittable range.
(140) Herein, each of the processors determines the relative coordinate values of relative coordinates X1, Y1, Z1 of a destination processor as follows.
(141) Herein, if at least one of the absolute value of X1, the absolute value of Y1, and the absolute value of Z1 is different from the other absolute values,
(142) 1) Set X1 to zero, and set one of the absolute value of Y1 and the absolute value of Z1 to be different from half the L value,
(143) 2) Set Y1 to zero, and set one of the absolute value of X1 and the absolute value of Z1 to be different from half the L value,
(144) 3) Set Z1 to zero, and set one of the absolute value of X1 and the absolute value of Y1 to be different from half the L value,
(145) 4) Set X1 and Y1 to zero, and set the absolute value of Z1 to be different from half the L value,
(146) 5) Set X1 and Z1 to zero, and set the absolute value of Y1 to be different from half the L value, or
(147) 6) Set Y1 and Z1 to zero, and set the absolute value of X1 to be different from half the L value.
(148) In this case, each of the processors determines the relative coordinates of five other destination processors as follows.
(149) 1) In relative coordinates X2, Y2, Z2, set X2 to the sign-inverted value of X1, set Y2 to the sign-inverted value of Y1, and set Z2 to the sign-inverted value of Z1.
(150) 2) In relative coordinates X3, Y3, Z3, set X3 to Z1, set Y3 to X1, and set Z3 to Y1.
(151) 3) In relative coordinates X4, Y4, Z4, set X4 to the sign-inverted value of Z1, set Y4 to the sign-inverted value of X1, and set Z4 to the sign-inverted value of Y1.
(152) 4) In relative coordinates X5, Y5, Z5, set X5 to Y1, set Y5 to Z1, and set Z5 to X1.
(153) 5) In relative coordinates X6, Y6, Z6, set X6 to the sign-inverted value of Y1, set Y6 to the sign-inverted value of Z1, and set Z6 to the sign-inverted value of X1
(154) Each of all processors in the parallel computing system substantially simultaneously transmits data to six destination processors located at the above-determined relative coordinates X1, Y1, Z1 to X6, Y6, Z6 by using the respective links of the three-dimensional torus interconnection network with a load of a specified value or less placed on each of the links.
(155) Further, it is now assumed that each of the processors determines the relative coordinate values of relative coordinates X1, Y1, Z1 of a destination processor as follows:
(156) 1) Set the absolute value of X1, the absolute value of Y1, and the absolute value of Z1 to be equal to one another; and
(157) 2) Set the absolute value of X1, the absolute value of Y1, and the absolute value of Z1 to be different from half the L value.
(158) In this case, each of the processors determines the relative coordinates of three other destination processors as follows.
(159) 1) In relative coordinates X2, Y2, Z2, set X2 to the sign-inverted value of X1, set Y2 to the sign-inverted value of Y1, and set Z2 to the sign-inverted value of Z1.
(160) 2) In relative coordinates X3, Y3, Z3, set X3 to X1, set Y3 to Y1, and set Z3 to the sign-inverted value of Z1.
(161) 3) In relative coordinates X4, Y4, Z4, set X4 to the sign-inverted value of X1, set Y4 to the sign-inverted value of Y1, and set Z4 to Z1.
(162) Each of all processors in the parallel computing system substantially simultaneously transmits data to four destination processors located at the above-determined relative coordinates X1, Y1, Z1 to X4, Y4, Z4 by using the respective links of the three-dimensional torus interconnection network with a load of a specified value or less placed on each of the links.
(163) Meanwhile, each of the processors determines the relative coordinate values of relative coordinates X1, Y1, Z1 of a destination processor as follows:
(164) 1) Set the absolute value of X1, the absolute value of Y1, and the absolute value of Z1 to be equal to one another; and
(165) 2) Set the absolute value of X1, the absolute value of Y1, and the absolute value of Z1 to half the L value.
(166) In this case, each of the processors determines the relative coordinates of six other destination processors as follows.
(167) 1) In relative coordinates X2, Y2, Z2, set X2 to the sign-inverted value of X1, and set Y2 and Z2 to zero.
(168) 2) In relative coordinates X3, Y3, Z3, set X3 to zero, set Y3 to the sign-inverted value of Y1, and set Z3 to the sign-inverted value of Z1.
(169) 3) In relative coordinates X4, Y4, Z4, set X4 and Z4 to zero, and set Y4 to Y1.
(170) 4) In relative coordinates X5, Y5, Z5, set X5 to X1, set Y5 to zero, and set Z5 to Z1.
(171) 5) In relative coordinates X6, Y6, Z6, set X6 and Y6 to zero, and set Z6 to the sign-inverted value of Z1.
(172) 6) In relative coordinates X7, Y7, Z7, set X7 to the sign-inverted value of X1, set Y7 to the sign-inverted value of Y1, and set Z7 to zero.
(173) Each of all processors in the parallel computing system performs, as the first simultaneous transmission, substantially simultaneous data transmission to the first destination processor corresponding to the relative coordinates X1, Y1, Z1, the second destination processor corresponding to the relative coordinates X2, Y2, Z2, and the third destination processor corresponding to the relative coordinates X3, Y3, Z3 by using all links of the three-dimensional torus interconnection network with substantially equal loads placed on the links. Then, each of the source processors performs, as the second simultaneous transmission, substantially simultaneous data transmission to the fourth destination processor corresponding to the relative coordinates X4, Y4, Z4, the fifth destination processor corresponding to the relative coordinates X5, Y5, Z5, the sixth destination processor corresponding to the relative coordinates X6, Y6, Z6, and the seventh destination processor corresponding to the relative coordinates X7, Y7, Z7 by using the respective links of the three-dimensional torus interconnection network with a load of a specified value or less placed on each of the links.
(174) The parallel computing system according to the present embodiment determines the relative coordinates of the destination processors by combining, as required, the above-described three types of relative coordinate value determination algorithms, and performs the substantially simultaneous data transmission to the destination processors.
(175) Herein, if the L value is an odd number, each of the processors performs substantially simultaneous transmission to six destination processors the (L.sup.34L+3)/4 times, and performs substantially simultaneous transmission to four destination processors the L1 times so that the overlapping of destination processors does not occur.
(176) Further, if the L value is an even number, each of the processors performs substantially simultaneous transmission to six destination processors the (L.sup.34L+1)/4 times, performs substantially simultaneous transmission to four destination processors the L2 times, and performs substantially simultaneous transmission to seven destination processors with one substantially simultaneous transmission to three destination processors and one substantially simultaneous transmission to four destination processors so that the overlapping of destination processors does not occur.
Sixth Embodiment
(177) A parallel computing system according to a sixth embodiment of the present invention connects a plurality of processors by using an interconnection network. The interconnection network is a three-dimensional mesh, and the communication channels of the interconnection network are determined in dimensional order. Further, each of all processors in the parallel computing system substantially simultaneously transmits data to a maximum of six processors. The relative coordinates of a plurality of destination processors with respect to each of the processors are common to all of the processors included in the parallel computing system.
(178) Herein, the maximum value of the length in a first dimension, the length in a second dimension, and the length in a third dimension of the interconnection network is represented as L. Further, in the relative coordinates of the n-th destination processor with respect to a source processor, the relative coordinate value in the first dimension is represented as Xn, the relative coordinate value in the second dimension is represented as Yn, and the relative coordinate value in the third dimension is represented as Zn. If a destination processor is located outside the interconnection network, however, the transmission to the destination processor is not performed.
(179) Each of the processors determines the relative coordinate values of relative coordinates X1, Y1, Z1 of a destination processor as follows.
(180) In this case, each of the processors determines the relative coordinate values so that at least one of the absolute value of X1, the absolute value of Y1, and the absolute value of Z1 is different from the other absolute values, and that the relative coordinate values correspond to one of the following conditions:
(181) 1) Set X1 to zero, and set one of the absolute value of Y1 and the absolute value of Z1 to be different from half the L value;
(182) 2) Set Y1 to zero, and set one of the absolute value of X1 and the absolute value of Z1 to be different from half the L value;
(183) 3) Set Z1 to zero, and set one of the absolute value of X1 and the absolute value of Y1 to be different from half the L value;
(184) 4) Set X1 and Y1 to zero, and set the absolute value of Z1 to be different from half the L value;
(185) 5) Set X1 and Z1 to zero, and set the absolute value of Y1 to be different from half the L value; or
(186) 6) Set Y1 and Z1 to zero, and set the absolute value of X1 to be different from half the L value.
(187) In this case, each of the processors determines the relative coordinates of five other destination processors as follows.
(188) 1) In relative coordinates X2, Y2, Z2, set X2 to the sign-inverted value of X1, set Y2 to the sign-inverted value of Y1, and set Z2 to the sign-inverted value of Z1.
(189) 2) In relative coordinates X3, Y3, Z3, set X3 to Z1, set Y3 to X1, and set Z3 to Y1.
(190) 3) In relative coordinates X4, Y4, Z4, set X4 to the sign-inverted value of Z1, set Y4 to the sign-inverted value of X1, and set Z4 to the sign-inverted value of Y1.
(191) 4) In relative coordinates X5, Y5, Z5, set X5 to Y1, set Y5 to Z1, and set Z5 to X1.
(192) 5) In relative coordinates X6, Y6, Z6, set X6 to the sign-inverted value of Y1, set Y6 to the sign-inverted value of Z1, and set Z6 to the sign-inverted value of X1.
(193) Each of all processors in the parallel computing system substantially simultaneously transmits data to six destination processors located at the above-determined relative coordinates X1, Y1, Z1 to X6, Y6, Z6 by using the respective links of the three-dimensional mesh interconnection network with a load of a specified value or less placed on each of the links.
(194) Further, each of the processors determines the relative coordinate values of relative coordinates X1, Y1, Z1 of a destination processor as follows:
(195) 1) Set the absolute value of X1, the absolute value of Y1, and the absolute value of Z1 to be equal to one another; and
(196) 2) Set the absolute value of X1, the absolute value of Y1, and the absolute value of Z1 to be different from half the L value.
(197) In this case, each of the processors determines the relative coordinates of three other destination processors as follows.
(198) 1) In relative coordinates X2, Y2, Z2, set X2 to the sign-inverted value of X1, set Y2 to the sign-inverted value of Y1, and set Z2 to the sign-inverted value of Z1.
(199) 2) In relative coordinates X3, Y3, Z3, set X3 to X1, set Y3 to Y1, and set Z3 to the sign-inverted value of Z1.
(200) 3) In relative coordinates X4, Y4, Z4, set X4 to the sign-inverted value of X1, set Y4 to the sign-inverted value of Y1, and set Z4 to Z1.
(201) Each of all processors in the parallel computing system substantially simultaneously transmits data to four destination processors located at the above-determined relative coordinates X1, Y1, Z1 to X4, Y4, Z4 by using the respective links of the three-dimensional mesh interconnection network with a load of a specified value or less placed on each of the links.
(202) Further, it is now assumed that each of the processors determines the relative coordinate values of relative coordinates X1, Y1, Z1 of a destination processor as follows:
(203) 1) Set the absolute value of X1, the absolute value of Y1, and the absolute value of Z1 to be equal to one another; and
(204) 2) Set the absolute value of X1, the absolute value of Y1, and the absolute value of Z1 to half the L value.
(205) In this case, each of the processors determines the relative coordinate values of six other destination processors as follows.
(206) 1) In relative coordinates X2, Y2, Z2, set X2 to the sign-inverted value of X1, and set Y2 and Z2 to zero.
(207) 2) In relative coordinates X3, Y3, Z3, set X3 to zero, set Y3 to the sign-inverted value of Y1, and set Z3 to the sign-inverted value of Z1.
(208) 3) In relative coordinates X4, Y4, Z4, set X4 and Z4 to zero, and set Y4 to Y1.
(209) 4) In relative coordinates X5, Y5, Z5, set X5 to X1, set Y5 to zero, and set Z5 to Z1.
(210) 5) In relative coordinates X6, Y6, Z6, set X6 and Y6 to zero, and set Z6 to the sign-inverted value of Z1.
(211) 6) In relative coordinates X7, Y7, Z7, set X7 to the sign-inverted value of X1, set Y7 to the sign-inverted value of Y1, and set Z7 to zero.
(212) Each of all processors in the parallel computing system performs, as the first simultaneous transmission, substantially simultaneous data transmission to the first destination processor corresponding to the relative coordinates X1, Y1, Z1, the second destination processor corresponding to the relative coordinates X2, Y2, Z2, and the third destination processor corresponding to the relative coordinates X3, Y3, Z3 by using all links of the three-dimensional mesh interconnection network with substantially equal loads placed on the links. Then, each of the source processors performs, as the second simultaneous transmission, substantially simultaneous data transmission to the fourth destination processor corresponding to the relative coordinates X4, Y4, Z4, the fifth destination processor corresponding to the relative coordinates X5, Y5, Z5, the sixth destination processor corresponding to the relative coordinates X6, Y6, Z6, and the seventh destination processor corresponding to the relative coordinates X7, Y7, Z7 by using the respective links of the three-dimensional mesh interconnection network with a load of a specified value or less placed on each of the links.
(213) Herein, if the L value is an odd number, each of the processors performs substantially simultaneous transmission to six destination processors the (L.sup.34L+3)/4 times, and performs substantially simultaneous transmission to four destination processors the L1 times so that the overlapping of destination processors does not occur.
(214) Meanwhile, if the L value is an even number, each of the processors performs substantially simultaneous transmission to six destination processors the (L.sup.34L+1)/4 times, performs substantially simultaneous transmission to four destination processors the L2 times, and performs substantially simultaneous transmission to seven destination processors with one substantially simultaneous transmission to three destination processors and one substantially simultaneous transmission to four destination processors so that the overlapping of destination processors does not occur.
(215) In the above-described example of three-dimensional torus interconnection network, all of the processors in the parallel computing system are synchronized between a substantially simultaneous transmission and the next substantially simultaneous transmission. Accordingly, even if asymmetry occurs in the loads on the links of the three-dimensional torus interconnection network, all of the processors perform substantially the same simultaneous transmission at substantially the same time.
(216) In the all-to-all communication method according to the embodiments described above, each of the processors transmits and receives a plurality of messages in parallel in accordance with a plurality of communication controllers provided in the corresponding node. In this process, the processors serving as communication destinations are arranged in order. Thereby, the relative coordinates of the destination processors with respect to each of nodes are set to be the same among all of the nodes, and the number of inter-processor communications routed through all links between network routers of the system is equalized among the links.
Seventh Embodiment
(217) In a parallel computing system using a two-dimensional torus which connects nodes 1 each configured to include a processor 2, four communication controllers 3, and a network router 4, as illustrated in
(218)
(219) In the two-dimensional torus topology of
(220) In the two-dimensional torus topology according to the present embodiment, for all values of natural numbers i and j represented as 0<j<in, each of all processors first performs substantially simultaneous communication with four processors respectively located at relative coordinates (i, j), (i, j), (j, i), and (j, i) with respect to the processor (
(221) Then, each of the processors performs substantially simultaneous communication with four processors respectively located at relative coordinates (i, j), (i, j), (j, i), and (j, i) with respect to the processor (
(222) Further, for all values of a natural number i represented as 0<in, each of the processors performs substantially simultaneous communication with four processors respectively located at relative coordinates (i, 0), (i, 0), (0, i), and (0, i) with respect to the processor (
(223) Then, each of the processors performs substantially simultaneous communication with four processors respectively located at relative coordinates (i, i), (i, i), (i, i), and (i, i) with respect to the processor (
Eighth Embodiment
(224)
(225) In the example of
(226) For all values of a natural number i represented as 0<in, each of all processors first performs substantially simultaneous communication with four processors respectively located at relative coordinates (n+1, i), (n1, i), (i, n+1), and (i, n1) with respect to the processor (
(227) Then, each of all processors performs substantially simultaneous communication with three processors respectively located at relative coordinates (n+1, 0), (0, n+1), and (n1, n1) with respect to the processor (
Ninth Embodiment
(228) In a parallel computing system using a three-dimensional torus, in which each of the processors has six communication controllers, the number of inter-processor communications routed through all links is equalized among the links in accordance with the following algorithms.
(229) The following description will be made of an example of a three-dimensional torus topology which has three sides having the same length corresponding to an odd number. Herein, the length of each of the sides is represented as 2n+1. For all values of natural numbers i, j, and k represented as 0<k<j<in, each of the processors first performs simultaneous communication with six processors in accordance with the following combinations of relative coordinates. Here, i, J and k denotes coordinate values of X axis, Y axis and Z axis of a three-dimensional torus topology. In the following, six sets of relative coordinates form one group. This feature similarly applies to the subsequent examples.
(230) 1) (i, j, k), (i, j, k), (k, i, j), (k, i, j), (j, k, i), (j, k, i)
(231) 2) (i, j, k), (i, j, k), (k, i, j), (k, i, j), (j, k, i), (j, k, i)
(232) 3) (i, j, k), (i, j, k), (k, i, j), (k, i, j), (j, k, i), (j, k, i)
(233) 4) (i, j, k), (i, j, k), (k, i, j), (k, i, j), (j, k, i), (j, k, i)
(234) 5) (i, k, j), (i, k, j), (j, i, k), (j, i, k), (k, j, i), (k, j, i)
(235) 6) (i, k, j), (i, k, j), (j, i, k), (j, i, k), (k, j, i), (k, j, i)
(236) 7) (i, k, j), (i, k, j), (j, i, k), (j, i, k), (k, j, i), (k, j, i) 8) (i, k, j), (i, k, j), (j, i, k), (j, i, k), (k, j, i), (k, j, i)
(237) Then, for all values of natural numbers i and j represented as 0<j<in, each of the processors performs substantially simultaneous communication with six processors corresponding to the following combinations of relative coordinates with respect to the processor. Also in the following example, six sets of relative coordinates form one group.
(238) 1) (i, i, j), (i, i, j), (j, i, i), (j, i, i), (i, j, i), (i, j, i)
(239) 2) (i, i, j), (i, i, j), (j, i, i), (j, i, i), (i, j, i), (i, j, i)
(240) 3) (i, i, j), (i, i, j), (j, i, i), (j, i, i), (i, j, i), (i, j, i)
(241) 4) (i, i, j), (i, i, j), (j, i, i), (j, i, i), (i, j, i), (i, j, i)
(242) 5) (j, j, i), (j, j, i), (i, j, j), (i, j, j), (j, i, j), (j, i, j)
(243) 6) (j, j, i), (j, j, i), (i, j, j), (i, j, j), (j, i, j), (j, i, j)
(244) 7) (j, j, i), (j, j, i), (i, j, j), (i, j, j), (j, i, j), (j, i, j)
(245) 8) (j, j, i), (j, j, i), (i, j, j), (i, j, j), (j, i, j), (j, i, j)
(246) 9) (0, i, j), (0, i, j), (j, 0, i), (j, 0, i), (i, j, 0), (i, j, 0)
(247) 10) (0, i, j), (0, i, j), (j, 0, i), (j, 0, i), (i, j, 0), (i, j, 0)
(248) 11) (0, j, i), (0, j, i), (i, 0, j), (i, 0, j), (j, i, 0), (j, i, 0)
(249) 12) (0, j, i), (0, j, i), (i, 0, j), (i, 0, j), (j, i, 0), (j, i, 0)
(250) Further, for all values of a natural number i represented as 0<in, each of the processors performs substantially simultaneous communication with a plurality of processors corresponding to the following combinations of relative coordinates with respect to the processor.
(251) 1) (0, i, i), (0, i, i), (i, 0, i), (i, 0, i), (i, i, 0), (i, i, 0)
(252) 2) (i, 0, 0), (i, 0, 0), (0, 0, i), (0, 0, i), (0, i, 0), (0, i, 0)
(253) 3) (i, i, i), (i, i, i), (i, i, i), (i, i, i)
(254) 4) (i, i, i), (i, i, i), (i, i, i), (i, i, i)
Tenth Embodiment
(255) Description will be made of a three-dimensional torus topology which has three sides having the same length corresponding to an even number. In the present example, the length of each of the sides is represented as 2n+2. In a similar manner as in the example of three-dimensional torus topology, in which the topology has three sides having the same length corresponding to an odd number, an arbitrary processor communicates with a processor having a value of at least n and at most n as each of the relative coordinates thereof on the respective coordinate axes (x-axis, y-axis, and z-axis) with respect to the arbitrary processor. Then, in accordance with the following combinations, the arbitrary processor communicates with, among the remaining processors, the processors having a value of n1 or n+1 as the relative coordinate thereof on at least one of the three coordinate axes.
(256) Subsequently, for all values of natural numbers i and j represented as 0<j<in, each of the processors performs substantially simultaneous communication with six processors corresponding to the following combinations of relative coordinates with respect to the processor.
(257) 1) (n+1, i, j), (n1, i, j), (j, n+1, i), (j, n1, i), (i, j, n+1), (j, j, n1)
(258) 2) (n+1, i, j), (n1, i, j), (j, n+1, i), (j, n1, i), (i, j, n+1), (i, j, n1)
(259) 3) (n+1, j, i), (n1, j, i), (i, n+1, j), (i, n1, j), (j, i, n+1), (j, i, n1)
(260) 4) (n+1, j, i), (n1, j, i), (i, n+1, j), (i, n1, j), (j, i, n+1), (j, i, n1)
(261) Further, for all values of a natural number i represented as 0<in, each of the processors performs substantially simultaneous communication with six processors corresponding to the following combinations of relative coordinates with respect to the processor.
(262) 1) (n+1, i, i), (n1, i, i), (i, n+1, i), (i, n1, i), (i, i, n+1), (i, i, n1)
(263) 2) (n+1, i, i), (n1, i, i), (i, n+1, i), (i, n1, i), (i, i, n+1), (i, i, n1)
(264) 3) (n+1, i, 0), (n1, i, 0), (0, n+1, i), (0, n1, i), (i, 0, n+1), (i, 0, n1)
(265) 4) (n+1, 0, i), (n1, 0, i), (i, n+1, 0), (i, n1, 0), (0, i, n+1), (0, i, n1)
(266) 5) (i, n+1, n+1), (i, n1, n1), (n+1, i, n+1), (n1, i, n1), (n+1, n+1, i), (n1, n1, i)
(267) Finally, each of the processors performs substantially simultaneous communication with a plurality of processors corresponding to the following combinations of relative coordinates with respect to the processor.
(268) 1) (n+1, n+1, 0), (0, 0, n+1), (n1, 0, n1), (0, n1, 0)
(269) 2) (0, n+1, n+1), (n+1, 0, 0), (n1, n1, n1)
(270) In the above-described embodiments, the processing in which each of the processors determines the destination processors is performed as a program recorded in a recording medium is executed by each of the processors. The recording medium for recording the program may include a storage device, e.g., as a magnetic disk device included in a node provided with the processor. Further, the program may be stored in a portable recording medium, e.g., an optical disk and a magnetic disk, read from the portable recording medium with the use of a storage medium reading device provided to the node or the like, and executed by the processor. Various types of semiconductor recording devices may be used as the recording medium for recording the program.
(271) All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the principles of the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiment(s) of the present invention(s) has (have) been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.