Construction of MBR (minimum bandwidth regenerating) codes and a method to repair the storage nodes

Abstract

This invention gives a coding method of MBR (Minimum Bandwidth Regenerating) codes. The related method includes the following steps: equally divide the original file of size B into k(k+1)/2 blocks, obtaining the first packets; construct a symmetrical k×k system matrix S with these first packets; generate k ID codes, wherein each ID code contains k elements; obtain the coded packet through operations between one column of the system matrix and the ID code; repeat the above steps with (n−k) different columns of the system matrix separately to get the (n−k) coded packets; construct the (n−k)×k check matrix P with the column number g which is the serial number of the ID codes in the coded packet set P.sub.g; store the rows of the system matrix and coded matrix to n nodes, each node stores one row. The present invention also involves a method to repair the failed nodes of the above coding scheme.

Claims

1. A coding method of Minimum Bandwidth Regenerating codes and repairing storage nodes in a distributed data storage system comprising the following steps: A) equally dividing an original file in a distributed data storage system of size B averagely into k(k+1)/2 blocks, obtaining first packets denoted as c.sub.i=b.sub.i,1 b.sub.i,2 . . . b.sub.i,L, i=1, 2, . . . , k(k+1)/2; B) constructing a symmetrical k×k system matrix S by using the first packets chosen sequentially according to serial number, and generating an upper-triangle of a distributed data storage system matrix by filling up the upper-triangle with the chosen first packets line by line successively according to an order of a column where elements of the system matrix are stored; C) generating k ID codes, each of which contains k elements; adding a given number of zeros to a head or end of the first packet in a column of the system matrix, wherein the given number of zeroes is equal to a first element of a respective ID code, to get k second packets; operating on the k second packets to get a coded packet; repeating the above steps with the columns of the system matrix by using different ID codes to obtain k coded packets; arranging the k coded packets according to the serial number to get a coded packet set P.sub.g=p.sub.g,1 p.sub.g,2 . . . p.sub.g,k, where g=1, 2, . . . , n−k, p.sub.g,k is the coded packet obtained from the k-th ID code and the g-th column of the system; repeating the above steps with (n−k) different columns of the system matrix, to obtain (n−k) coded packet sets; D) constructing a (n−k)×k check matrix P, each row of which is the successively arranged coded packet set P.sub.g; E) storing the first packets contained in each row of the system matrix to a storage node respectively, to get k system nodes; storing each row of the check matrix to a storage node, to get (n−k) check nodes, wherein n is the total amount of storage nodes; F) confirming invalidation of the storage nodes, and deciding whether each failed node is a system node, if yes, executing step G); otherwise, executing step H); G) downloading the f-th packet stored in each remained surviving system node i.e. the packet of the system node in the f-th column of the system matrix, to get (k−1) packets of the failed node, wherein f is the row number of the system matrix in which the failed node lies; f=1, 2, . . . , k; choosing the check node corresponding to the column and downloading data from the chosen node, operating on the data downloaded from the check node according to the ID codes; and then obtaining all the data of the failed node through combining the obtained data in the column of system matrix stored in the failed system node; storing the obtained data in a new node to replace the failed system node; and H) getting the column number of the system matrix corresponding to the failed node, then downloading a packet from each system node to get the above mentioned column; coding the downloaded data with all the ID codes to get data stored in the failed node; storing the obtained data into a new node to replace the failed node.

2. The method of claim 1, wherein the step C) further comprises the following steps: C1) obtaining k ID codes; C2) choosing an ID code and a column of the system matrix, adding the given number of zeros respectively to the heads or ends of the k first packets in the chosen column according to the maximum value of the elements in the chosen ID code and the element values of the ID code corresponding to the row number in which each first packet lies, to get k second packets; and then obtaining a coded packet through operations on the k second packets; C3) repeating step C2) respectively for the chosen columns of the system matrix successively, by using different ID codes, until (n−k) coded packets are obtained; arranging the obtained coded packets to get a coded packet set; and C4) choosing k different columns of the system matrix successively, and then repeating step C2) and step C3) by using the ID codes, to get (n−k) coded packet sets.

3. The method of claim 2, wherein the step C1) further comprises the following steps: C11) checking whether k is a prime number; if yes, executing step C12); otherwise, executing step C13); C12) according to
(r.sub.1.sup.a,r.sub.2.sup.a, . . . ,r.sub.k.sup.a)=(0,a,2a, . . . ,(n−1)a)mod k, a=1,2, . . . ,k substituting a=1, 2, . . . , k into the sequence (0, a, 2a, . . . , (n−1)a) with a modulus of k respectively, obtaining (n−k) ID codes; C13) finding the minimum prime number p that is larger than k, and according to
(r.sub.1.sup.a,r.sub.2.sup.a, . . . ,r.sub.k.sup.a)=(a−1,2a−1, . . . ,ka−1)mod p, a=1,2, . . . ,p−1, substituting a=1, 2, . . . , p−1 respectively into the sequence (a−1, 2a−1, 2a, . . . , ka−1) with a modulus of p, obtaining (n−k) ID codes.

4. The method of claim 3, wherein the step C2) further comprises: C21) finding the maximum value of the ID codes, denoting as
r.sub.max=max(r.sub.1.sup.a,r.sub.2.sup.a, . . . ,r.sub.n.sup.a) C22) adding a given number of zeros to the head of the y-th first packet in the chosen column of the system matrix, wherein the given number equals to the y-th element value in the current ID code, while adding r.sub.max−r.sub.y.sup.a zeros to the end, to get a second packet, wherein y=1, 2, . . . , k; repeating the above steps successively for the other (k−1) first packets in the column according to a y equal to the packet's row number respectively, to get k second packets; g is the chosen column in the system matrix, and g is among 1, 2, . . . , n−k; and C23) adding the k second packets together to get a coded packet p.sub.g,j which is generated by the current ID code; and p.sub.g,j denoting the coded packet obtained through operations between the g-th column in the system matrix and the j-th ID code.

5. The method of claim 4, wherein the step C4) further comprises: C41) choosing the next ID code which is adjacent to the ID code in the step C2); C42) regarding the obtained ID code which step C41) chose as the current ID code, and then repeating step C2) and C3) until all the ID codes are used.

6. The methods of claim 5, wherein the step B) further comprises: B1) taking out the obtained first packets according to their serial number, then filling up the upper-triangle of the system matrix S line by line successively, according to the order of the columns which the elements in the system matrix lie in, to generate the upper-triangle $[\begin{matrix} c_{1} & c_{2} & .Math. & c_{k} \\ c_{k + 1} & .Math. & c_{2 k - 1} \\ .Math. & .Math. \\ c_{B} \end{matrix}];$ wherein B=k(k+1)/2; B2) folding the obtained upper-triangle along a diagonal line to get the lower-triangle so that the system matrix is denoted as $S = [\begin{matrix} c_{1} & c_{2} & .Math. & c_{k} \\ c_{2} & c_{k + 1} & .Math. & c_{2 k - 1} \\ .Math. & .Math. & .Math. & .Math. \\ c_{k} & c_{2 k - 1} & .Math. & c_{B} \end{matrix}];$ the check matrix is denoted as $P = [\begin{matrix} P_{1} \\ P_{2} \\ .Math. \\ P_{n - k} \end{matrix}] = [\begin{matrix} p_{1, 1} & p_{1, 2} & .Math. & p_{1, k} \\ p_{2, 1} & p_{2, 2} & .Math. & p_{2, k} \\ .Math. & .Math. & .Math. & .Math. \\ p_{n - k, 1} & p_{n - k, 2} & .Math. & p_{n - k, k} \end{matrix}];$ wherein P.sub.1, P.sub.2, . . . , P.sub.n−k are the coded packet sets obtained respectively while step C3) is repeated; and the step D) further comprises the following steps: combining the system matrix and the check matrix together to get a data matrix, wherein each row is stored in a storage node; the data matrix is denoted as $M = [\begin{matrix} S \\ P \end{matrix}] .$

7. The method of claim 1, wherein the step G) further comprises: G1) acquiring row number f of the failed system node, and for all surviving system node, downloading their first packets which lie in the f-th column of the system matrix; G2) downloading coded packets stored in the check node which are generated by the f-th column of the system matrix; doing inverse operations between the downloaded coded packets according to the ID codes, to get the first packets in the f-th column of the system matrix; and G3) obtaining the first packets stored in the failed nodes through the relation between the columns and rows of the system matrix.

8. The method of claim 7, wherein the step H) further comprises: H1) finding out the row number e of the coded matrix which the failed check node lies in, e=1, 2, . . . , n−k; getting k ID codes; H2) downloading the e-th first packet from k system nodes respectively, to obtain data of the e-th column of the system matrix; getting the maximum value of the ID codes, r.sub.max=max(r.sub.1.sup.a, r.sub.2.sup.a, . . . , r.sub.n.sup.a); and H3) for the obtained data in the e-th column of the system matrix, coding the data according to the acquired ID codes, to obtain the packets stored in the failed node.

9. The method of claim 8 wherein the step H3) further comprises: H31) getting the maximum value of the ID codes, r.sub.max=max(r.sub.1.sup.a, r.sub.2.sup.a, . . . , r.sub.n.sup.a); H32) adding a given number of zeros to the head of the y-th first packet in the chosen column of the system matrix, wherein the given number equals to the y-th element value in the current ID code, while adding r.sub.max−r.sub.y.sup.a zeros to the end, to get a second packet, wherein y=1, 2, . . . , k; repeating the above steps successively for the other (k−1) first packets in the column according to a y equal to the packet's row number respectively, to get k second packets; g is the chosen column in the system matrix, and g is among 1, 2, . . . , n−k; and H33) adding the k second packets together to get a coded packet p.sub.g,j which is generated by the current ID code; and p.sub.g,j denoting the coded packet obtained through operations between the g-th column in the system matrix and the j-th ID code.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 illustrates the schematic diagram of data reconstruction of EC codes in the existing technology;

(2) FIG. 2 illustrates the schematic diagram of the repair process of failed storage nodes in EC codes in the existing technology;

(3) FIG. 3 illustrates the schematic diagram of the regeneration of RGC codes in the existing technology;

(4) FIG. 4 illustrates the flow chart of the coding scheme in the implementation case of MBR codes and the node repair method in this invention;

(5) FIG. 5 illustrates the flow chart of the obtaining of the coded packets in the present invention;

(6) FIG. 6 illustrates the flow chart of the obtaining of the ID codes in the present invention;

(7) FIG. 7 illustrates the flow chart of repair of the storage nodes in the present invention; and

(8) FIG. 8 illustrates the schematic diagram of the obtaining of the coded packets in the present invention.

DESCRIPTION OF PREFERRED EMBODIMENTS

(9) The invention case will be further described in conjunction with the figures.

(10) As shown in FIG. 4, in this invention case of the coding scheme and node repair method of MSR codes, the coding method of the MSR code comprises the following steps:

(11) Step S41 equally divides the original file, obtaining the first packets: in this step, we divide the original file of size B averagely into k (k+1)/2 blocks, each of L bits, obtaining the first packets, denoted as
c.sub.i=b.sub.i,1b.sub.i,2 . . . b.sub.i,L, i=1,2, . . . ,k(k+1)/2;
in this invention case, for the sake of simplicity, as an example, we set the block size L to be 1 bit, so that there is one bit in each first packet, i.e. c.sub.i=b.sub.i,1; therefore, in this invention case, the above parameter B is equal to k (k+1)/2, i.e. B=k (k+1)/2.

(12) Step S42 constructs the system matrix with the obtained first packets: in this step, we construct a symmetrical k×k system matrix S with the above-obtained first packets, where we fill up the upper-triangular line by line successively with the sequentially obtained first packets, to generate the upper-triangular. That is to say, in this step, we get a system matrix S through the first packets obtained in the above step. In this invention case, the system matrix is symmetric, i.e. if the diagonal line of the matrix is set to be the axis, elements on each side of the axis are symmetric. Thus, in this invention, we firstly obtain the upper-triangle, and then fold it to get the whole system matrix. The method to get the upper-triangle of the system matrix is: take out the first packets successively according to their serial number (i.e. its subscript i in the formula), then put the first packet whose i=1 in the position of Row 1 Column 1 in the system matrix, put the first packet whose i=2 in the position of Row 1 Column 2 in the system matrix, put the first packet whose i=3 in the position of Row 1 Column 3 in the system matrix, and so on, we get the first column of the system matrix till the k-th first packet; because we are constructing the upper-triangle of the matrix, the first packet is put in the position of Row 2 Column 2 of the system matrix S when i=k+1, according to this method, the (k+2)-th first packet is put in Row 2 Column 3, . . . , Row 2 Column k for the (2k−1)-th first packet. In short, when constructing the upper-triangle, we fill in the first column of the system matrix S with k first packets (c.sub.1 to c.sub.k), the second column with (k−1) first packets (c.sub.k+1 to c.sub.2k−1), the third column with (k−2) first packets (c.sub.2k to c.sub.3k−2), . . . , the k-th column with one first packet (c.sub.B), therefore, there are totally k+(k−1)+(k−2)+(k−3)+ . . . +2+1=k(k+1)/2 first packets in the upper-triangle of the system matrix S, which exactly uses up the first packets obtained in the above step. We then get the system matrix by folding the obtained upper-triangle along the diagonal. That is to say, in this invention, we take out the obtained first packets in order, with which we fill up the upper-triangular of the system matrix S line by line successively, to generate the upper-triangular

(13) $[\begin{matrix} c_{1} & c_{2} & .Math. & c_{k} \\ c_{k + 1} & .Math. & c_{2 k - 1} \\ .Math. & .Math. \\ c_{B} \end{matrix}];$
wherein B=k(k+1)/2;
then fold the above-obtained upper-triangular to get the lower-triangular so that the system matrix is denoted as

(14) 0 $S = [\begin{matrix} c_{1} & c_{2} & .Math. & c_{k} \\ c_{2} & c_{k + 1} & .Math. & c_{2 k - 1} \\ .Math. & .Math. & .Math. & .Math. \\ c_{k} & c_{2 k - 1} & .Math. & c_{B} \end{matrix}];$

(15) Step S43 constructs ID codes, and gets the coded packet set with the ID codes: in this step, we generate k ID codes, each of which contains k elements (the construction of ID codes will be expatiated later on); then, add given number of zeros to the head of the first packet in a column of the system matrix S in accordance with the corresponding elements of the ID code, to get k second packets; operate on the k second packets to get a coded packet; repeat the above steps with the above-mentioned column of the system matrix and different ID codes to obtain k coded packets; arrange the k coded packets according to the serial number to get a coded packet set P.sub.g=p.sub.g,1 p.sub.g,2 . . . p.sub.g,k, where g=1, 2, . . . , n−k and p.sub.g,k is the coded packet obtained from the g-th ID code and the k-th column of the system; then repeat the above steps with (n−k) different columns of the system matrix to obtain (n−k) coded packet sets. In other words, in this invention case, add zeros to the first packets in a certain row according to a certain ID code (e.g. the ID code with the serial number 1) to get k second packets, and we can obtain a coded packets through XOR operations between the k second packets; deal as the above steps with the same column according to successively different ID codes to obtain k coded packets; sequentially arrange the coded packets in accordance with the corresponding ID codes to obtain a coded packet set. Repeat the above steps with other (n−k−1) columns in the system matrix to obtain (n−k) coded packet sets totally. It deserves to be mentioned that in this invention we can choose the first to the (n−k)-th columns successively to realize the above steps.

(16) Step S44 constructs the check matrix with the coded packet sets: in this step, we construct a (n−k)×k check matrix P, whose rows are the successively arranged coded packet sets P.sub.g; i.e. each row of the check matrix P is a coded packet set obtained in the above steps; the first columns in each row are the coded packets obtained from the first ID code, the second columns in each row are the coded packets obtained from the second ID code, and so on, the k-th columns in each row are the coded packets obtained from the k-th ID code; different rows in the check matrix are the coded packet sets obtained from different columns of the system matrix S; generally, the rows of the check matrix are arranged in accordance with the order of the column number of the system matrix S; for instance, the coded packet set obtained from the first column of S is the first row in the check matrix P, the coded packet set obtained from the second column of S is the second row in the check matrix P, the coded packet set obtained from the third column of S is the third row in the check matrix P, and so on.

(17) Step S45 stores each row of the system matrix and check matrix in different storage nodes: in this step, store the first packets in each row of the system matrix in a storage node respectively, to get k system nodes; store each row of the check matrix in a storage node to obtain (n−k) check nodes, through which we realized coding and storing. n is the total number of storage nodes.

(18) In the step S43 in this invention, the obtaining of the coded packets specifically comprises the following steps:

(19) Step S51 gets the ID codes: in this step, we obtain k ID codes (specific steps to get the ID codes will be expatiated later on). Each ID code contains k elements, which indicate the number of zeros added to the heads of the first packets corresponding to the elements when second packets are generated.

(20) Step S52 add given number of zeros to the heads of the first packets in according with the ID codes: in this step, we firstly choose a certain column of the system matrix, e.g. the first column of S; then we choose an ID code, e.g. the first ID code which contains k elements; we can then obtain k second packets through operating on the chosen column in according with the chosen first ID code. The operating process is as follows: add given number of zeros to the first first packet in the first column where the given number equals to the first element of the chosen ID code; find the maximum value of the elements in all the ID codes, and then subtract the first element of the above ID code from the maximum value to get a certain value; of which we add the number of zeros to the end of the first first packet in the chosen column to get a second packet; the number of zeros added to the end of the packet is r.sub.max−r.sub.i.sup.a, where r.sub.max=max(r.sub.1.sup.a, r.sub.2.sup.a, . . . , r.sub.n.sup.a) is the maximum value of the elements in all the ID codes which is calculated in advance and generally is k−1; r.sub.i.sup.a is the current element in according with the first packet during this operation. Thus we obtain a second packet (e.g. corresponds to the first packet in Column 1 Row 1 of the system matrix S); repeat the above step for the first packets in different rows of the chosen column of the system matrix S to obtain k second packets.

(21) Step S53 operates on the above-obtained k second packets to get the coded packet: in this step, as mentioned above, we obtain k second packets from the chosen column of the system matrix S with an ID code, in this step, we operate on the above-obtained k second packets to get a coded packet. It deserves to be mentioned that operations and additions in this step all denote XORs between each other. Repeat the above steps S52 and S53 for the above-chosen column of S with the remained (k−1) ID codes to get totally k coded packets which are obtained from the same column of the system matrix S with different ID codes. Arrange the k coded packets as one row in according with the corresponding serial numbers of the ID codes to obtain a coded packet set.

(22) Step S54 gets (n−k) coded packet sets: repeat steps S52 and S53 for different columns of the system matrix, till we obtain (n−k) coded packet sets which make the redundancy symbols. Arrange the coded packet sets as one column in according with the column numbers that are used when generating the sets (in general, we choose Column 1, 2, 3, . . . , (n−k) successively in the system matrix S), then we finally obtain the coded matrix P.

(23) Moreover, in this invention case, the process to obtain the coded packets is shown in FIG. 8 which illustrates the conversion relationship among first packets, second packets and coded packets.

(24) FIG. 6 illustrates the specific steps to obtain the coded packets in this invention case, includes:

(25) Step S61 decides whether k is a prime number, if yes, execute step S62, otherwise, S63; in this invention case, k is the one in which we divide the original file into k (k+1)/2 pieces, and it's determined in advance.

(26) Step S62 gets k ID codes according to
(r.sub.1.sup.a,r.sub.2.sup.a, . . . ,r.sub.k.sup.a)=(0,a,2a, . . . ,(n−1)a)mod k, a=1,2, . . . , k:
in this invention case, as above, substitute a=1, 2, . . . , k into the sequence (0, a, 2a, . . . , (n−1)a) with a modulus of k respectively, obtaining kID codes. Substitute each a into the sequence with modulus operation to get an ID code, where the value of a is exactly the serial number of the ID codes.

(27) Step S63 finds the minimum prime number p that is larger than k and get k ID codes according to
(r.sub.1.sup.a,r.sub.2.sup.a, . . . ,r.sub.k.sup.a)=(a−1,2a−1, . . . ,ka−1)mod p, a=1,2, . . . , p−1:
substitute a=1, 2, . . . , p−1 respectively into the sequence (a−1, 2a−1, 2a, . . . , ka−1) with a modulus of p, obtaining k ID codes. Substitute each a into the sequence with modulus operation to get an ID code, where the value of a is exactly the serial number of the ID codes.

(28) There will always be node failures in actual distributed storage systems. At this time, new nodes should be introduced to substitute the failed nodes in order to keep system redundancy in a certain range. The process is called node regeneration. In this invention case, to regenerate a failed node and minimize the repair bandwidth, we can do as follows:

(29) Step S71 confirms the node failure: in this step, we confirm that a certain node fails.

(30) Step S72 decides the type of the failed node: in this step, we decide the type of the failed node, with regard to this invention case, there are two types: one is system node, which stores a row of the system matrix, and there are totally k system nodes; the other is check node, which stores a row of the check matrix P, and there are totally (n−k) check nodes; in this step, we judge that whether the failed node is a system node or a check node, if it's a system node, we should further find out the serial number of the failed node in all the system nodes, for example, if the f-th system node fails, where f values between 1 and k, we next execute step S73; if it's a check node, we should further find out the serial number of the failed node in all the check nodes, for example, if the i-th check node fails, where i values between 1 and (n−k), we next execute step S74. It deserves to be mentioned that the judge method bases on actual disposition of the nodes. For instance, in this invention case, there are totally n storage nodes, among which the first k are system nodes while the subsequent (n−k) are check nodes. When allocating storage nodes or data stored in these nodes, the allocation information will be recorded, which is usually stored in the form of metadata. The system here can be comprehended as a server, which is in charge of scheduling, management and the judging of node failures. Therefore, we can get the information that a storage node is a system node or a check node, and the corresponding original column of the system matrix as well.

(31) Step S73 downloads the f-th packet of each remained surviving system node and data in the corresponding check node to recover data stored in the failed node: in this step, we download the f-th packet stored in each remained valid system node (i.e. the packet of the system node in the f-th column of the system matrix), to get (k−1) packets of the failed node (we assume that the node will be repaired immediately when it's found failed), the parameter f is the column number of the system matrix in which the failed node lies; f=1, 2, . . . , k; operate on the data downloaded from the corresponding check node according to the ID codes, to obtain the all the data of the failed node; store the obtained data in a new node to replace the failed system node. In this step, we can always get the corresponding original column for every failed node, because the related coding information of the failed node can be obtained from the above-mentioned system.

(32) That is to say, in this step, we acquire the row number f of the failed system node, and for all remained surviving system node, download their first packets which lie in the f-th column of the system matrix; download coded packets that stored in the check node which is generated by the f-th column of the system matrix; do inverse operations between the downloaded coded packets according to the ID codes, to get the first packets in the f-th column of the system matrix; then we can obtain the lost first packets through the relation between the columns and rows of the system matrix.

(33) Step S74 gets the column number corresponding to the failed node in the system matrix, and download the column with which we obtain the data stored in the failed check node through coding: in this step, we firstly get the column number e, (e=1, 2, . . . , n−k) that generated the failed node (the coded packet set stored in this failed node is obtained through operations between the packets in the mentioned column of the system matrix according to ID codes), then download a packet from each system node to get the above mentioned column (i.e. the e-th column of the system matrix); code the download data with all the ID codes (the same operations as coding) to get data formerly-stored in the failed node; store the obtained data into a new node to replace the failed node.

(34) Step S75 finishes the repair process: we execute this step after step S73 or S74, substituting the new storage node for the failed node detected in the above step to accomplish node repair.

(35) That is to say, for the failure of check nodes, we need to find out the row number e of the coded matrix that the failed check node lies in, e=1, 2, . . . , n−k; and then get k ID codes (the ID codes are known and stored in advance); then download the e-th first packet from k system nodes respectively, to obtain data of the e-th column of the system matrix; for the obtained e-th column, they will be coded according to the acquired ID codes, to obtain the packets stored in the failed node.

(36) When coding, we firstly get the maximum value of the mentioned ID codes, i.e. r.sub.max=max(r.sub.1.sup.a, r.sub.2.sup.a, . . . , r.sub.n.sup.a); add given number (equal to the y-th element value in the current ID code) of zeros to the head of the y-th first packet in the chosen column, while r.sub.max−r.sub.y.sup.a bits zeros to the end, to get a second packet, where y=1, 2, . . . , k; repeat the above steps for the other (k−1) first packets in the column with a y equal to the packet's row number, to get k second packets; g is the chosen column in the system matrix, and g is among 1, 2, . . . , n−k; add the k second packets together to get a coded packet p.sub.g,j which is generated by the current ID code and denotes the coded packet obtained through operations between the g-th column in the system matrix and the j-th ID code.

(37) In this invention case, the k original packets (each with L bits), might as well be denoted as c.sub.i=b.sub.i,1 b.sub.i,2 . . . b.sub.i,L, i=1, 2, . . . , k. The difficulty lies in successfully finding out (n−k) independent coded packets that make arbitrary k out of n packets (include original packets and coded packets) are linearly independent. We generally call the packets that meet the above conditions (n, k) independent.

(38) The following simple example is based on a file B={c.sub.1,c.sub.2}, which contains two packets c.sub.1, c.sub.2. It's obvious that there are three linear independent packets {c.sub.1, c.sub.2, c.sub.1⊕c.sub.2} if XOR coding is applied. However, this can't meet the requirements of distributed storage systems. If we add one bit ‘0’ to the head and one bit ‘0’ to the end of the packet c.sub.1, denoting the changed packet as c.sub.i(r.sub.i) where r.sub.i is the number of zeros added to the head of the packet c.sub.i, the packets and coded packet after changing are linear independent in terms of the above three packets.

(39) Generally speaking, k original packets (with a length of L bits) are might as well denoted as c.sub.i=b.sub.i,1 b.sub.i,2 . . . b.sub.i,L, i=1, 2, . . . , k, while the coded packet y.sub.a is given by the following: y.sub.a=c.sub.1(r.sub.1)⊕c.sub.2(r.sub.2)⊕ . . . ⊕c.sub.k(r.sub.k). The total number of zeros added to both the head and the end of each packet c.sub.i is r.sub.max=max{r.sub.1, r.sub.2, . . . , r.sub.k}. The unique ID code for the coded packet y.sub.a is ID.sub.a=(r.sub.1.sup.a, r.sub.2.sup.a, . . . , r.sub.k.sup.a). It can be seen that adding r.sub.i redundancy bits to the head of the packet c.sub.i equals to the operation c.sub.i (r.sub.i)=2.sup.r.sup.max.sup.−r.sup.ic.sub.i.

(40) For any prime number k, the unique ID code of the coded packet y.sub.a can be represented as
ID=(r.sub.1.sup.a,r.sub.2.sup.a, . . . ,r.sub.k.sup.a)=(0,a,2a, . . . ,(k−1)a)mod k, a=1,2, . . . , k. (12)
then packets {c.sub.1, c.sub.2, . . . , c.sub.k, y.sub.1, y.sub.2 . . . , y.sub.n−k} obtained from the above coding method are linear independent. For instance, if k=5, the corresponding ID codes are ID.sub.1=(0, 1, 2, 3, 4).sub.1, ID.sub.2=(0, 2, 4, 1, 3).sub.2, ID.sub.3=(0, 3, 1, 4, 2).sub.3, ID.sub.4=(0, 4, 3, 2, 1).sub.4, ID.sub.5=(0, 0, 0, 0, 0).sub.5.

(41) Similarly, for a non-prime positive integer k, we can choose the minimum prime number p which is larger than k. Then the ID code can be denoted as
(r.sub.1.sup.a,r.sub.2.sup.a, . . . ,r.sub.k.sup.a)=(a−1,2a−1, . . . ,ka−1)mod p, a=1,2, . . . ,p−1.
(r.sub.1.sup.p,r.sub.2.sup.p, . . . ,r.sub.k−1.sup.p)=(0,0, . . . ,0).
For instance, when k=4, we choose p=5, then the corresponding ID codes are ID.sub.1=(0, 1, 2, 3).sub.1, ID.sub.2=(1, 3, 0, 2).sub.2, ID.sub.3=(2, 0, 3, 1).sub.3, ID.sub.4=(3, 2, 1, 0).sub.4, ID.sub.5=(0, 0, 0, 0).sub.5.

(42) In summary, for an arbitrary positive integer k: if k is a prime number, we can generate (n, k) linear independent packets by adding (k−1) bits zeros to k original packets. If k is not a prime integer, we can as well generate the (n, k) linear independent packets, while we now add (p−2) bits zeros to the original packets (p is a prime number larger than k).

(43) MBR code with parameters (n, k, d) generally contains n nodes, denoted as {N.sub.1, N.sub.2, . . . , N.sub.n}. BMBR code applies in the system that contains n nodes, each of which stores k packets. We choose any k nodes to store the original file ({S.sub.i}.sub.i=1, 2, . . . , k), which is called system code; then the other (n−k) nodes are usually called check nodes in which coded packets are stored.

(44) The data object of size B is divided into k (k+1)/2 packets of equal size L bits. Let S be a (k×k) symmetric system matrix whose upper-triangle is filled up with the set {c.sub.i}.sub.i=1, 2, . . . , B. Since S is strictly symmetric, the full system matrix S can be represented as

(45) $S = [\begin{matrix} S_{1} \\ S_{2} \\ .Math. \\ S_{k} \end{matrix}] = [\begin{matrix} c_{1} & c_{2} & .Math. & c_{k} \\ c_{2} & c_{k + 1} & .Math. & c_{2 k - 1} \\ .Math. & .Math. & .Math. & .Math. \\ c_{k} & c_{2 k - 1} & .Math. & c_{B} \end{matrix}],$
Similarly, the check matrix P is represented as

(46) $P = [\begin{matrix} P_{1} \\ P_{2} \\ .Math. \\ P_{n - k} \end{matrix}] = [\begin{matrix} p_{1, 1} & p_{1, 2} & .Math. & p_{1, k} \\ p_{2, 1} & p_{2, 2} & .Math. & p_{2, k} \\ .Math. & .Math. & .Math. & .Math. \\ p_{n - k, 1} & p_{n - k, 2} & .Math. & p_{n - k, k} \end{matrix}],$
wherein P.sub.i,j is the coded packet obtained through XOR operations between the packets of a unique ID code (0, i, 2i, . . . , (k−1)i).sub.i mod k. For example, when k=3,

(47) $S = [\begin{matrix} S_{1} \\ S_{2} \\ S_{3} \end{matrix}] = [\begin{matrix} c_{1} & c_{2} & c_{3} \\ c_{2} & c_{4} & c_{5} \\ c_{3} & c_{5} & c_{6} \end{matrix}] .$
If L=3, the packet file to be stored can be denoted as

(48) $M = [\begin{matrix} S \\ P \end{matrix}] .$

(49) If a system node S.sub.i (i=1, 2, . . . , k) fails, we need to introduce a new node to replace it, while we can download a packet from each one of any k nodes for repairing. Specifically, all the chosen nodes transmit the i-th packet to the new node. If a check node P.sub.i (i=1, . . . , n−k) fails, we need to introduce a new node to replace it as well, while we can download one packet from each system node for coding. The coding process adopts the ID code of the failed node and transmits the coded packet to the new node. As can be seen obviously, data stored in the repaired node are the same as the failed node, since the ID code during the repair process is exactly the same as the failed node.

(50) BMBR code can always exactly repair the failed node whether it's a system node or a check node, and satisfy the bandwidth boundary of min-cut. Therefore, BMBR is optimal with regard to repair bandwidth.

(51) BMBR code is as well a kind of MBR code, which satisfies all the MDS properties. That is to say, the original file B can be recovered from any k nodes. The needed bandwidth during the regenerating process is generally k.sup.2, which is obvious not the optimal. We will give the optimal regenerating process which needs the least repair bandwidth as follows.

(52) The optimal regeneration is shown as follows: the data collector (DC) can download any k packets from the first column of the data matrix M, then any (k−1) packets from the second column, any (k−2) packets from the third column, till only one packet from the k-th column. It can be seen from the construction of BMBR, k packets in any column of the matrix M are mutually independent. Meanwhile, since S is symmetric, we can see that the packets chosen for regeneration are from the lower-triangle of S and check matrix P. Therefore, DC can acquire B linearly independent packets to finally decode out the original file.

(53) It's observed from the above regeneration that the amount of data downloaded during the whole process is exactly the size of the original file B, which reaches the theoretical optimal repair bandwidth.

(54) When evaluating the performance of BMBR, we mainly analyze and compare computation complexity of coding, decoding and repairing between BMBR code proposed in this invention case and traditional RGC and RS codes.

(55) Coding Computation Complexity

(56) For BMBR codes, there are totally (n−k) check nodes each stores k coded packets, where each packet is obtained from k original packets through XOR operations. Therefore, coding computation complexity is k(n−k)(k−1) XOR operations.

(57) As for RGC (based on GF(q)), similarly, there are (n−k) check nodes in the system each stores k coded packets. The difference is that the coded packets are obtained by XOR operations between corresponding polynomial coefficients chosen on GF(q) according to k original packets. Thus, coding computation complexity of traditional RGC code is k(n−k)(k−1) XOR operations, additionally with k.sup.2(n−k) multiplications on Galois Field GF(q).

(58) With regard to RS codes, the size of the original file is B=k (k+1)/2 while each node can only store one packet. We generally need to store (k+1)/2 times data that RS (n, k) code needs in order to store the file B. The coding process of RS code is similar to RGC, therefore the coding computation complexity of RS code is (k−1)(k+1)(n−k)/2 XOR operations and k(k+1)(n−k)/2 multiplications on Galois Field GF(q).

(59) Repairing Computation Complexity

(60) For the repair process of BMBR codes, if a system node and a check node fail at the same time, the system node can be regarded to possess a higher priority than the check node. In other words, the failed system node will be repaired before the check node. In order to repair a system node, we at least need one check node while at most k check nodes, thus the computation complexity to repair a system node is at least (k−1), at most k(k−1) XOR operations. It needs k system nodes to repair a check node, thus the computation complexity to repair a check node is k(k−1) XORs.

(61) In order to repair a RGC node, k assistant nodes collect k packets together into the newly introduced node, in which we regenerate the former failed packet through operations on the k packets. Therefore, computation complexity of the whole repair process is at least 2k.sup.2 multiplications on Galois Field and (2k(k−1)) XORs.

(62) For RS codes, in order to repair a failed node we need to download data that has the same size as the original file to reconstruct the original file, after which we generate the packet that is stored in the failed node through coding. Computation complexity of repairing is (k.sup.2(k+1)/2+k) multiplications on Galois Field and (k.sup.2(k+1)/2+k−1) XORs.

(63) Decoding Computation Complexity

(64) In order to recover the original file, BMBR codes need k (k−1)(k+1)/2 XOR operations. Similarly, the decoding computation complexity of RGC codes is k.sup.3 multiplications on Galois Field and k.sup.3 XORs, while that of RS codes is k.sup.2(k+1)/2 multiplications on Galois Field and k.sup.2(k+1)/2 XORs.

(65) We summarize the computation complexity of coding, decoding and repairing of BMBR, traditional RGC and RS codes in the following table:

(66) TABLE-US-00001 Encoding Repairing Decoding Codes Computation Computation Computation BMSR k(k − 1)(n − k) .Math. X k(k − 1) .Math. X k(k.sup.2 − 1)/2 .Math. X RS (k.sup.2 − 1)(n − k)/2 .Math. X (k.sup.2(k + 1)/ k.sup.2(k + 1)/2 .Math. X (k.sup.2 + k)(n − k)/2 .Math. M 2 + k − 1) .Math. X k.sup.2(k + 1)/2 .Math. M (k.sup.2(k + 1)/2 + k) .Math. M RGC k(k − 1)(n − k) .Math. X 2k(k − 1) .Math. X k.sup.3 .Math. X k.sup.2(n − k) .Math. M 2k.sup.2 .Math. M k.sup.3 .Math. M

(67) In the table, BMSR, RGC and RS denote the codes, X denotes XOR operation and M denotes multiplications on Galois Field.

(68) Compared with traditional RGC codes, the biggest advantage of BMBR codes proposed in this invention case lies in that BMSR has greatly decreased the computation complexity during coding and decoding, resulting from the substitution of simple XOR operations for complex operations on Galois Field. The construction of traditional RGC codes is based on Galois Field GF(q), additions, subtractions and multiplications of Galois Field are involved in the coding and decoding processes. Even though research on Galois Field operations is relatively mature, it's very tedious and long time consuming in practice, which obviously cannot meet the fast and reliable design criterions in today's distributed storage systems. However, BMBR code performs different. Its coding and decoding only limit to fast XOR operations, which greatly improves node repair and data reconstruction rate, resulting in a high application value and development potential in practical distributed storage systems.

(69) BMBR (Binary Minimum Bandwidth Regenerating) codes not only decrease the system computation complexity, but also guarantee that the required bandwidth in the repair process is the minimum (i.e. the same size as the original file), without redundant bandwidth. In today when the bandwidth resource is more and more precious, the benefits of BMSR are obvious. BMSR codes can guarantee: the lost coded packet can be repaired with some sub-sets of other surviving coded packet; the lost packet can be repaired with a give number of fragments, where the given number only relates to the amount of lost fragments but have nothing to do with what exactly is lost. At the same time, data stored in the repaired node is the same as that in the original failed node of BMSR codes, which is called exact repair, that has decreased system operation complexity to a great extent (e.g. metadata update, data broadcast after updating).

(70) The above mentioned cases only expressed some modes of execution of this invention, whose description is relatively specific and detailed but cannot consequently be considered as the restrictions of this invention. It should be pointed out that, for the ordinary technical persons in this field, there can be some deformations and improvements under the premise of not departing from the inventive concept which are within the protection scope of this invention. Therefore, the protection scope of this invention patent should be subject to the attached claims.

Construction of MBR (minimum bandwidth regenerating) codes and a method to repair the storage nodes

Assignee

Inventors

Cpc classification

Classification Explorer

H03M13/1515

ELECTRICITY

Classification Explorer

G06F11/1076

PHYSICS

Classification Explorer

H03M13/3761

ELECTRICITY

Classification Explorer

H03M13/373

ELECTRICITY

Classification Explorer

G06F2211/1028

PHYSICS

Classification Explorer

H03M13/616

ELECTRICITY

Classification Explorer

G06F11/1088

PHYSICS

International classification

Classification Explorer

H03M13/00

ELECTRICITY

Classification Explorer

H03M13/15

ELECTRICITY

Classification Explorer

H03M13/37

ELECTRICITY

Classification Explorer

G06F11/10

PHYSICS

Abstract

Claims

Description