Methods of data concurrent recovery for a distributed storage system and storage medium thereof

Abstract

Disclosed is a method of data concurrent recovery for a distributed storage system, that is, a method for synchronous repair of multiple failed nodes with a minimum recovery bandwidth when a node in a distributed storage system fails. First an assistant node is selected to get helper data sub-block, then the repair matrix related to the data block stored in the node to be repaired is constructed, and finally the lost data block is reconstructed by multiplying the repair matrix and the helper data helper data sub-block; the missing data block is reconstructed by decoding, wherein the node to be recovered includes all failed systematical nodes, or all or partly failed parity nodes. The method is applicable to concurrently recover multiple failed nodes at minimal recovery bandwidth, and the nodes to be recovered are selected according to the demand to reduce the recovery bandwidth as much as possible.

Claims

1. A distributed storage system with data concurrent recovery function, the system using minimum storage generating code C(n,k,d) based on interference alignment to encode data needed to be stored, the data being equally divided into k segments and then encoded into n data blocks denoted by b.sub.1, b.sub.2, . . . , b.sub.n each data block containing α sub-blocks, α=d−k+1, d representing repair degree, first k data blocks B=[b.sub.1, b.sub.2, . . . , b.sub.k].sup.T representing systematical blocks, rest m=n−k data blocks v.sub.j C=[c.sub.1, c.sub.2, . . . , c.sub.m].sup.T=[b.sub.k+1, b.sub.k+2, . . . , b.sub.n].sup.T representing parity blocks, then C=GB, G representing encoding matrix, each sub encoding matrix being G.sub.ij=ū.sub.iv.sub.j.sup.T+p.sub.jiI, p.sub.ji being entry of m×m matrix P, ū.sub.i and v.sub.j being sub-vectors of first α elements of u.sub.i and v.sub.j respectively, U=[u.sub.1, u.sub.2, . . . , u.sub.m] and V=[v.sub.1, v.sub.2, . . . , v.sub.m] being m×m invertible matrices, the system comprising multiple storage nodes each holding a data block, the multiple storage nodes residing on one or more computers each comprising a processor and a storage, the multiple storage nodes comprising systematical nodes that store the systematical blocks, parity nodes that store the parity blocks, and a regenerating node that collects helper data from surviving systematical nodes and surviving parity nodes and regenerates lost data blocks on failed nodes, each surviving node that offers helper data sub-blocks being an assistant node, wherein the system is configured to perform a method of data concurrent recovery comprising following steps: step 1: obtaining helper data sub-blocks from an assistant node: letting N={N.sub.i|i=1, . . . , n} denote n storage nodes, t.sub.1 number of failed systematical nodes, t.sub.2 number of failed parity nodes, t=t.sub.1+t.sub.2≥1 total number of fail nodes, N.sub.1, . . . ,N.sub.t.sub.1 and N.sub.k+1, . . . , N.sub.k+t.sub.2 the failed systematical nodes and parity nodes respectively, then selecting assistant nodes from surviving nodes, defining actual repaired nodes to constitute a list of nodes to be recovered, the list containing all or part of t.sub.1 failed systematical nodes and t.sub.2 failed parity nodes, for each missing systematical data block b.sub.i each assistant node N.sub.j encoding its own data block to be h.sub.j,i=b.sub.jv.sub.i, resulting in a vector {right arrow over (h)}.sub.i=[h.sub.t.sub.1.sub.+1,i, . . . , h.sub.k,i, h.sub.k+t.sub.2.sub.+1,i, . . . , h.sub.d+1,i].sup.T form by helper data sub-block, for each missing parity block c.sub.i each assistant node N.sub.j encoding its own data block to be g.sub.j,i=b.sub.jū.sub.i, resulting in a vector {right arrow over (g)}.sub.i=[g.sub.t.sub.1.sub.+1,i, . . . , g.sub.k,i, g.sub.k+t.sub.2.sub.+1,i, . . . , g.sub.d+1,i].sup.T form by helper data sub-block, and sending helper data sub-block to a regenerating node; step 2: setting a sub-repair matrix related to missing data blocks: a) setting a sub-repair matrix related to missing systematical data blocks: setting an α×(tα) matrix: $Φ_{i} = [\begin{matrix} \begin{matrix} \begin{matrix} p_{11} {\overline{v}}_{i}^{T} \\ p_{12} {\overline{v}}_{i}^{T} \end{matrix} \\ .Math. \end{matrix} \\ p_{1 α} {\overline{v}}_{i}^{T} \end{matrix} \begin{matrix} \begin{matrix} \begin{matrix} .Math. \\ .Math. \end{matrix} \\ ⋱ \end{matrix} \\ .Math. \end{matrix} \begin{matrix} \begin{matrix} \begin{matrix} p_{i - 1, 1} {\overline{v}}_{i}^{T} \\ p_{i - 1, 2} {\overline{v}}_{i}^{T} \end{matrix} \\ .Math. \end{matrix} \\ p_{i - 1, α} {\overline{v}}_{i}^{T} \end{matrix} \begin{matrix} \begin{matrix} \begin{matrix} {\overline{u}}_{1}^{T} + p_{i 1} {\overline{v}}_{i}^{T} \\ {\overline{u}}_{2}^{T} + p_{i 2} {\overline{v}}_{i}^{T} \end{matrix} \\ .Math. \end{matrix} \\ {\overline{u}}_{α}^{T} + p_{i α} {\overline{v}}_{i}^{T} \end{matrix} \begin{matrix} \begin{matrix} \begin{matrix} p_{i + 1, 1} {\overline{v}}_{i}^{T} \\ p_{i + 1, 2} {\overline{v}}_{i}^{T} \end{matrix} \\ .Math. \end{matrix} \\ p_{i + 1, α} {\overline{v}}_{i}^{T} \end{matrix} \begin{matrix} \begin{matrix} \begin{matrix} .Math. \\ .Math. \end{matrix} \\ ⋱ \end{matrix} \\ .Math. \end{matrix} \begin{matrix} \begin{matrix} \begin{matrix} p_{_{t_{1}, 1}} {\overline{v}}_{i}^{T} \\ p_{_{t_{1}, 2}} {\overline{v}}_{i}^{T} \end{matrix} \\ .Math. \end{matrix} \\ p_{_{t_{1},} α} {\overline{v}}_{i}^{T} \end{matrix} \underset{\underset{totally t_{2} - {\overline{v}}_{i}^{T}}{︸}}{\begin{matrix} \begin{matrix} \begin{matrix} - {\overline{v}}_{i}^{T} \\ 0 \end{matrix} \\ .Math. \end{matrix} \\ 0 \end{matrix} \begin{matrix} \begin{matrix} \begin{matrix} 0 \\ - {\overline{v}}_{i}^{T} \end{matrix} \\ .Math. \end{matrix} \\ 0 \end{matrix} \begin{matrix} \begin{matrix} \begin{matrix} .Math. \\ .Math. \end{matrix} \\ ⋱ \end{matrix} \\ .Math. \end{matrix} \begin{matrix} \begin{matrix} \begin{matrix} 0 \\ 0 \end{matrix} \\ .Math. \end{matrix} \\ 0 \end{matrix}}]$ where last t.sub.2α columns contain t.sub.2 −v.sub.i.sup.T aligned upper-diagonally, setting an α×(d−t+1) matrix: $Γ_{i} = [\begin{matrix} - p_{t_{1} + 1, 1} & ... & - p_{k, 1} & 0 & ... & 0 \\ .Math. & ⋱ & .Math. & .Math. & ⋱ & .Math. \\ - p_{t_{1} + 1, t_{2} + 1} & ... & - p_{k, t_{2} + 1} & 1 & ... & 0 \\ .Math. & ⋱ & .Math. & .Math. & ⋱ & .Math. \\ - p_{t_{1} + 1, α} & ... & - p_{k, α} & 0 & ... & 1 \end{matrix}]$ where right bottom corner thereof is an (α−t.sub.2)×(α−t.sub.2) identity matrix, b) setting a sub-repair matrix related to missing parity blocks: setting an α×d matrix Ψ.sub.i: $Ψ_{i} = {κU}^{'} [\begin{matrix} - p_{11} & .Math. & - p_{k, 1} & 1 & .Math. & 0 & 0 & .Math. & 0 \\ .Math. & ⋱ & .Math. & .Math. & ⋱ & .Math. & .Math. & ⋱ & .Math. \\ - p_{1, i - 1} & .Math. & - p_{k, i - 1} & 0 & .Math. & 1 & 0 & .Math. & 0 \\ κ^{- 1} p_{1, i} & .Math. & κ^{- 1} p_{k, i} & 0 & .Math. & 0 & 0 & .Math. & 0 \\ - p_{1, i + 1} & .Math. & - p_{k, i + 1} & 0 & .Math. & 0 & 1 & .Math. & 0 \\ .Math. & .Math. & .Math. & ⋱ & .Math. & .Math. & ⋱ & .Math. \\ - p_{1 α} & .Math. & - p_{k, α} & 0 & .Math. & 0 & 0 & .Math. & 1 \end{matrix}] + [\begin{matrix} 1 & .Math. & 0 & .Math. & 0 \\ .Math. & ⋱ & .Math. & ⋱ & .Math. \\ 0 & .Math. & 1 & .Math. & 0 \\ 0 & .Math. & 0 & .Math. & 0 \\ .Math. & ⋱ & .Math. & ⋱ & .Math. \\ 0 & .Math. & 0 & .Math. & 0 \end{matrix}]$ wherein Ψ.sub.i is composed of: a sum of a matrix in a left bracket and a matrix in a right bracket, wherein last α−1 columns of the matrix in left bracket containing two identity matrices, upper-left corner of the matrix in right bracket being a k×k identity matrix and other entries thereof being all zero, setting a sub-repair matrix corresponding to c.sub.i by using Ψ.sub.i: for each j∈J.sub.i, calculating Ω.sub.i,j=−ψ.sub.ijū.sub.i.sup.T, where ψ.sub.ij denotes jth column of Ψ.sub.i, J.sub.i={j|j+1, . . . , t.sub.1, k+1, . . . , k+t.sub.2−1} denotes column indices of Ψ.sub.i corresponding to missing blocks except c.sub.i, then letting Δ.sub.i=[Ω.sub.i,1, . . . , Ω.sub.i,t.sub.1, Ω.sub.i,k+1, . . . , Ω.sub.i,k+i−1, I, Ω.sub.i,k+i, . . . , Ω.sub.i,k+t.sub.2.sub.−1], and denoting Ψ.sub.i.sup.J omitting columns of IDs J.sub.i by Ψ.sub.i.sup.J; and step 3: setting a repair matrix and regenerating missing data blocks: combining sub-repair matrix obtained in step 2: $[\begin{matrix} Φ_{1} \\ .Math. \\ Φ_{t_{1}} \\ Δ_{1} \\ .Math. \\ Δ_{t_{2}} \end{matrix}] [\begin{matrix} b_{1}^{T} \\ .Math. \\ b_{t_{1}}^{T} \\ c_{1}^{T} \\ .Math. \\ c_{t_{2}}^{T} \end{matrix}] = [\begin{matrix} Γ_{1} \\ ⋱ \\ Γ_{t_{1}} \\ Ψ_{1}^{J} \\ ⋱ \\ Ψ_{t_{2}}^{J} \end{matrix}] [\begin{matrix} {\overset{.Math.}{h}}_{1} \\ .Math. \\ {\overset{.Math.}{h}}_{t_{1}} \\ {\overset{.Math.}{g}}_{1} \\ .Math. \\ {\overset{.Math.}{g}}_{t_{2}} \end{matrix}]$ letting Ξ=[Φ.sub.1, . . . , Φ.sub.t.sub.1, Δ.sub.1, . . . , Δ.sub.t.sub.2].sup.T, $Θ = [\begin{matrix} Γ_{1} \\ ⋱ \\ Γ_{t_{1}} \\ Ψ_{1}^{J} \\ ⋱ \\ Ψ_{t_{2}}^{J} \end{matrix}],$ then repair matrix being R=Ξ.sup.−1Θ, and multiplying the repair matrix with the helper data sub-blocks to regenerate missing data blocks.

2. The system of claim 1, wherein the list of nodes to be recovered comprises part nodes of t.sub.1 failed systematical nodes and t.sub.2 failed parity nodes, number of failed nodes prepared to be recovered is t.sub.2′, then according to number of surviving parity nodes, number of assistant nodes needed is d−r+1, where r=t.sub.1+t.sub.2′ or r=t.sub.1+t.sub.2+d+1−n is number of nodes in the list.

3. The system of claim 1, wherein N.sub.1, . . . , N.sub.t.sub.1, and N.sub.k+1, . . . , N.sub.k+t.sub.2 represent failed systematical nodes and failed parity nodes, respectively, which does not limit to scenario where the failed systematical nodes and failed parity nodes are first t.sub.1 systematical nodes and first t.sub.2 parity nodes; the method is applicable to scenario where the failed systematical nodes and failed parity nodes are failed nodes located at any sorted position.

4. The system of claim 1, wherein Vis an identity matrix.

5. The system of claim 1, wherein the repair matrix is applicable to t<k, and when t≥k, missing data blocks are regenerated by decoding in this way: first downloading whole data segment, then decoding to obtain original data, and regenerating missing data blocks with encoding.

6. The system of claim 1, wherein Ξ is an invertible matrix, or Ξ turns to be invertible by adding a node to list of nodes to be recovered, or replacing some nodes to be recovered with other nodes; otherwise missing data is regenerated by decoding.

7. The system of claim 1, wherein the regenerating node is a proxy node in a centralize mode or a new node in a distributed mode.

8. A non-transitory computer-readable storage medium, comprising a program which can be executed by the system according to claim 1.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 is a schematic diagram of system model of data concurrent recovery;

(2) FIG. 2 is a schematic diagram illustrating the computation of sub-repair matrix of parity code;

(3) FIG. 3 is a schematic diagram illustrating the computation of repair matrix; and

(4) FIG. 4 is a schematic flowchart illustrating the algorithm of concurrent failure recovery.

DETAILED DESCRIPTION

(5) The present disclosure will be further described in detail below through specific embodiments in combination with the accompanying drawings. Many details described in the following embodiments are for the purpose of better understanding the present disclosure. However, a person skilled in the art can realize with minimal effort that some of these features can be omitted in different cases or be replaced by other methods. For clarity some operations related to the present disclosure are not shown or illustrated herein so as to prevent the core from being overwhelmed by excessive descriptions. For the person skilled in the art, such operations are not necessary to be explained in detail, and they can fully understand the related operations according to the description in the specification and the general technical knowledge in the field.

(6) In addition, the features, operations or characteristics described in the specification may be combined in any suitable manner to form various embodiments. At the same time, the steps or actions in the described method can also be sequentially changed or adjusted in a manner that can be apparent to those skilled in the art. Therefore, the various sequences in the specification and the drawings are only for the purpose of describing a particular embodiment, and are not intended to be an order of necessity, unless otherwise stated one of the sequences must be followed.

(7) In a method of recovering data concurrently for a distributed storage system proposed in the present disclosure, data recovery is implemented by means of: dividing the storage nodes into systematical nodes and parity nodes, constructing sub-repair matrices in relation to missing data blocks respectively, determining a repair matrix, and multiplying helper data sub-blocks with the repair matrix to obtain the missing data blocks. This method can concurrently recover one or more failed nodes in a distributed storage system.

(8) The policies of data recovery can be divided into two modes in the present disclosure. One is a centralized mode, in which an “agent” node will collect helper data obtained by encoded assistant nodes or data stored by the assistant node itself, then constructing a repair matrix and reconstructing missing data blocks, and finally transmitting the repaired data to a corresponding new node. The other one is a distributed mode, in which a new node itself will collect the helper data obtained by encoded assistant node or the data stored by the assistant node itself, and then construct a repair matrix and regenerate the missing data block. Note that the new node is only responsible for reconstructing the lost data on the failed node that it replaced. These two recovery modes have their own advantages and disadvantages and applicable occasions. The method of recovering data concurrently for a distributed storage system using minimum storage generating code based on interference alignment is applicable to these two recovery modes. In any mode and especially the distributed mode, a regenerating node probably does not need to regenerate all the missing data blocks. The regenerating node may refer to a proxy node in the centralize mode or a new node in the distributed mode. Theoretically, it has been proved that the smaller the number of missing data blocks to be recovered is, the less the required recovery bandwidth is, if there are enough number of surviving nodes. However, in our research, we find that for IA codes, all missing systematical data blocks must be jointly recovered; otherwise, the repair matrix cannot be generated and recovery will fail. Hence, an assistant node selection method is needed to ensure successful failure recovery and meanwhile minimize the recovery bandwidth usage.

(9) To make the disclosure clear and concise, it will first present the system model and notions used throughout herein. An IA code is denoted by C(n,k,d), where n is the total number of storage nodes, k is the number of “systematical nodes” on which original data is stored, and m=n−k is the number of “parity nodes” on which encoded data blocks are located. d is “repair degree” which is the number of assistant nodes needed in recovery when there is only one failed node out of the total n nodes. IA code requires that n≥2k and d≥2k−1.

(10) With IA code construction, a data segment of size B is divided into k strips of equal lengths, and then encoded into n data blocks denoted by b.sub.1, b.sub.2, . . . , b.sub.n with block ID 1, 2, . . . , n respectively. Each data block b.sub.i consists of α data sub-blocks b.sub.i=[b.sub.i1, b.sub.i2, . . . , b.sub.iα]. According to the related theory of MSR code, an IA code satisfied that
α=d−k+1 (1)

(11) Since IA codes are systematical, the first k data blocks B=[b.sub.1, b.sub.2, . . . , b.sub.k].sup.T are original uncoded data, which are also called systematical blocks, and the last m=n−k data blocks are parity blocks denoted by C=[c.sub.1, c.sub.2, . . . , c.sub.m].sup.T=[b.sub.k+1, b.sub.k+2, . . . , b.sub.n].sup.T. Therefore the whole data segment is divided into kα systematical sub-blocks and then encoded into mα parity sub-blocks, resulting in totally nα sub-blocks. The IA code construction formulas and corresponding notions here are
C=GB (2)

(12) where the encoding matrix is

(13) $\begin{matrix} G = [\begin{matrix} G_{1 1} & G_{1 2} & ... & G_{1 k} \\ G_{2 1} & G_{2 2} & ... & G_{2 k} \\ .Math. & ⋱ & .Math. \\ G_{m 1} & G_{m 2} & ... & G_{m k} \end{matrix}] & (3) \end{matrix}$

(14) Note all the computations here are on finite field GF(q). Each sub-encoding matrix is
G.sub.ij=ū.sub.iv.sub.j.sup.T+p.sub.jiI (4)

(15) where p.sub.ji is the entry of m×m matrix P at jth row and ith column, and P can be a Cauchy or Vardermonde matrix such that each sub-encoding matrix is invertible. Let U=[u.sub.1, u.sub.2, . . . , u.sub.m] and V=[v.sub.1, v.sub.2, . . . , v.sub.m] be m×m invertible matrices that
U=κ.sup.−1V′P (5)

(16) where V′=(V.sup.−1).sup.T is the dual basis of V, and κ∈GF(q) satisfying 1−κ.sup.2≠0. ū.sub.i and v.sub.j in formula (4) are sub-vectors consisting of the first α entries of u.sub.i and v.sub.j, respectively. I is an α×α unit vector and thus G.sub.ij in formula (4) is an α×α matrix.

(17) FIG. 1 depicts the system model of concurrent recovery for failed node storage data blocks based on IA code. In the shown distributed storage system, there are totally n nodes denoted by N={N.sub.i|i=1, . . . , n}. Note that the node used herein can be either physical or logical, say, multiple nodes can reside on one or more physical machines, which do not affect the effectiveness of the present disclosure. Each node N.sub.i holds a data block b.sub.i, hence N.sub.1, N.sub.2, . . . , N.sub.k are systematical nodes that store systematical blocks, while N.sub.k+1, . . . , N.sub.n are parity nodes that store parity blocks. Assume that t.sub.1 systematical nodes and t.sub.2 parity nodes fail, where t.sub.1, t.sub.2≥0 and the total number of failed nodes t=t.sub.1+t.sub.2≥1. Based on the equality between systematical nodes and parity nodes, without loss of generality, it shows that the nodes N.sub.1, . . . , N.sub.t.sub.1 and N.sub.k+1, . . . , N.sub.k+t.sub.2 fail in FIG. 1. The regenerating node collects helper data from k−t.sub.1 surviving systematical nodes and m−k.sub.2 surviving parity nodes, then regenerate the lost data blocks on the failed nodes. The surviving nodes that offer helper data sub-blocks are called “assistant node”.

(18) Before transmitting helper data sub-blocks to the regenerating nodes, each assistant node needs to encode its own data blocks, which is accomplished by calculating the inner product between α data sub-blocks and encoding vector on GF(q). If systematical block b.sub.i is lost, the encoding vector is v.sub.i′, where v.sub.i′ consists of the first α elements of the ith column of V′. And if parity block c.sub.i is lost, the corresponding encoding vector is ū.sub.i. If V is an orthogonal matrix, V′=V and v.sub.i′=v.sub.i. Further if V is an identity matrix,
v.sub.i′=v.sub.i=e.sub.i=[0, . . . 0,1,0, . . . ,0].sup.T (6)

(19) where e.sub.i is a unit vector whose ith element is 1 and other elements are all 0. In this case, if the lost block is systematical block, the assistant nodes only need to send their ith data sub-block to the regenerating node without encoding, thus the computation cost can be reduced and implementation is simpler. In actual storage systems, systematical blocks are used more frequently in read and write operations and have high priority in the recovery, hence such V=I codes are especially useful in practice. Therefore, we mainly consider this kind of IA codes in the present disclosure. If there is no special mentioning, all the following embodiments assume that V=I.

(20) In the present embodiment, the regenerating node recovers the lost data blocks through repair matrix which is obtained through the following procedure.

(21) Without loss of generality, we assume systematical blocks b.sub.1, . . . , b.sub.t.sub.1 and parity blocks c.sub.1, . . . , c.sub.t.sub.2 are missing, corresponding indices to the failed nodes are X.sub.t.sub.1.sub.,t.sub.2={1, . . . , t.sub.1, k+1, . . . , k+t.sub.2}. If the first t.sub.1 systematical blocks and the first t.sub.2 parity blocks are actually not the lost data blocks, simply rearrange the number of the data blocks and the rows and columns of the encoding matrix to map the missing systematical data blocks and parity blocks to the most front positions. The actual repaired nodes are defined to constitute the list of nodes to be recovered (which is also referred to as “recovery list” for short). Assuming that r denotes the number of nodes in the recovery list, then r=t when there are all t failed nodes to be recovered. The recovery list includes t.sub.1 failed systematical nodes and t.sub.2 failed parity nodes, the assistant nodes are selected from k−t.sub.1 surviving systematical nodes and α−t.sub.2 surviving parity nodes, where the total number of the assistant nodes is d−t+1. Since the systematical nodes and parity nodes are equal, it is possible to choose which systematical nodes and parity nodes. Without loss of generality, systematical nodes N.sub.t.sub.1.sub.+1, . . . , N.sub.k and parity nodes N.sub.k+t.sub.2.sub.+1, . . . , N.sub.d+1 are chosen to be the assistant nodes. All surviving systematical nodes must be selected as assistant nodes. Later it will show that how to select assistant nodes when the number of failed nodes to be recovered is less than t.

(22) First, we focus on the sub-repair matrix related to the missing systematical data blocks. For a missing systematical data blocks b.sub.i, each assistant node N.sub.j encodes its own data block to be h.sub.j,i=b.sub.jv.sub.i, resulting in a vector {right arrow over (h)}.sub.i=[h.sub.t.sub.1.sub.+1,i, . . . , h.sub.k,i, h.sub.k+t.sub.2.sub.+1,i, . . . , h.sub.d+1,i].sup.T formed by helper data.

(23) Then set a following α×(tα) matrix:

(24) $\begin{matrix} Φ_{i} = [\begin{matrix} p_{11} {\overline{v}}_{i}^{T} & .Math. & p_{i - 1, 1} {\overline{v}}_{i}^{T} & {\overline{u}}_{1}^{T} + p_{i 1} {\overline{v}}_{i}^{T} & p_{i + 1, 1} {\overline{v}}_{i}^{T} & .Math. & p_{t_{1}, 1} {\overline{v}}_{i}^{T} \\ p_{12} {\overline{v}}_{i}^{T} & .Math. & p_{i - 1, 2} {\overline{v}}_{i}^{T} & {\overline{u}}_{2}^{T} + p_{i 2} {\overline{v}}_{i}^{T} & p_{i + 1, 2} {\overline{v}}_{i}^{T} & .Math. & p_{t_{1}, 2} {\overline{v}}_{i}^{T} \\ .Math. & ⋱ & .Math. & .Math. & .Math. & ⋱ & .Math. \\ p_{1 α} {\overline{v}}_{i}^{T} & .Math. & p_{i - 1, α} {\overline{v}}_{i}^{T} & {\overline{u}}_{α}^{T} + p_{i α} {\overline{v}}_{i}^{T} & p_{i + 1, α} {\overline{v}}_{i}^{T} & .Math. & p_{t_{1}, α} {\overline{v}}_{i}^{T} \end{matrix} \underset{\underset{totally t_{2} - {\overline{v}}_{i}^{T}}{︸}}{\begin{matrix} - {\overline{v}}_{i}^{T} & 0 & .Math. & 0 \\ 0 & - {\overline{v}}_{i}^{T} & .Math. & 0 \\ .Math. & .Math. & ⋱ & .Math. \\ 0 & 0 & .Math. & 0 \end{matrix}}] & (7) \end{matrix}$

(25) where the last t.sub.2α columns in formula (7) contain t.sub.2 −v.sub.i.sup.T aligned upper-diagonally.

(26) Set a following α×(d−t+1) matrix:

(27) $\begin{matrix} Γ_{i} = [\begin{matrix} - p_{t_{1} + 1, 1} & .Math. & - p_{k, 1} & 0 & .Math. & 0 \\ .Math. & ⋱ & .Math. & .Math. & ⋱ & .Math. \\ - p_{t_{1} + 1, t_{2} + 1} & .Math. & - p_{k, t_{2} + 1} & 1 & .Math. & 0 \\ .Math. & ⋱ & .Math. & .Math. & ⋱ & .Math. \\ - p_{t_{1} + 1, a} & .Math. & - p_{k, α} & 0 & .Math. & 1 \end{matrix}] & (8) \end{matrix}$

(28) and the right bottom corner of the matrix Γ.sub.i is an (α−t.sub.2)×(α−t.sub.2) identity matrix.

(29) Up to now, all sub-repair matrices for lost systematical blocks have been prepared. Next consider the sub-repair matrix related to missing parity blocks.

(30) For a lost parity block c.sub.i, each assistant node N.sub.j encodes its own data block to be g.sub.j,i=b.sub.jū.sub.i, resulting in a vector {right arrow over (g)}.sub.i=[g.sub.t.sub.1.sub.+1,i, . . . , g.sub.k,i, g.sub.k+t.sub.2.sub.+1,i, . . . , g.sub.d+1,i].sup.T formed by helper data.

(31) Then calculate an α×d matrix Ψ.sub.i as followings:

(32) $\begin{matrix} Ψ_{i} = {κU}^{'} [\begin{matrix} - p_{11} & .Math. & - p_{k, 1} & 1 & .Math. & 0 & 0 & .Math. & 0 \\ .Math. & ⋱ & .Math. & .Math. & ⋱ & .Math. & .Math. & ⋱ & .Math. \\ - p_{1, i - 1} & .Math. & - p_{k, i - 1} & 0 & .Math. & 1 & 0 & .Math. & 0 \\ κ^{- 1} p_{1, i} & .Math. & κ^{- 1} p_{k, i} & 0 & .Math. & 0 & 0 & .Math. & 0 \\ - p_{1, i + 1} & .Math. & - p_{k, i + 1} & 0 & .Math. & 0 & 1 & .Math. & 0 \\ .Math. & .Math. & .Math. & ⋱ & .Math. & .Math. & ⋱ & .Math. \\ - p_{1 α} & .Math. & - p_{k, α} & 0 & .Math. & 0 & 0 & .Math. & 1 \end{matrix}] + [\begin{matrix} 1 & .Math. & 0 & .Math. & 0 \\ .Math. & ⋱ & .Math. & ⋱ & .Math. \\ 0 & .Math. & 1 & .Math. & 0 \\ 0 & .Math. & 0 & .Math. & 0 \\ .Math. & ⋱ & .Math. & ⋱ & .Math. \\ 0 & .Math. & 0 & .Math. & 0 \end{matrix}] & (9) \end{matrix}$

(33) The right side of formula (9) is composed of two parts. The last α−1 columns of the matrix in the left bracket contain two identity matrices: an i−1 dimensional one on the upper-left corner and an α−i dimensional one on the lower-right corner. The upper-left corner of the matrix in the right bracket is a k×k identity matrix while other entries thereof are all 0.

(34) The sub-repair matrix corresponding to c.sub.i is calculated by using Ψ.sub.i, as shown in FIG. 2. Let ψ.sub.ij denote the jth column of Ψ.sub.i, J.sub.i={j|j=1, . . . , t.sub.1, k+1, . . . , k+t.sub.2−1} denote the column indices of Ψ.sub.i corresponding to the missing blocks except c.sub.i, and Ψ.sub.i.sup.J denote Ψ.sub.i omitting the columns of IDs J.sub.i. Calculate
Ω.sub.i,j=−ψ.sub.ijū.sub.i.sup.T (10)

(35) for every j∈J.sub.i, then let
Δ.sub.i=[Ω.sub.i,1, . . . ,Ω.sub.i,t.sub.1,Ω.sub.i,k+1, . . . ,Ω.sub.i,k+i−1,I,Ω.sub.i,k+i, . . . ,Ω.sub.i,k+t.sub.2.sub.−1] (11)

(36) Up to now we have all sub-repair matrices for the lost parity blocks.

(37) Now combine all the sub-repair matrices derived above into the whole repair equations:

(38) $\begin{matrix} [\begin{matrix} Φ_{1} \\ .Math. \\ Φ_{t_{1}} \\ Δ_{1} \\ .Math. \\ Δ_{t_{2}} \end{matrix}] [\begin{matrix} b_{1}^{T} \\ .Math. \\ b_{t_{1}}^{T} \\ c_{1}^{T} \\ .Math. \\ c_{t_{2}}^{T} \end{matrix}] = [\begin{matrix} Γ_{1} \\ ⋱ \\ Γ_{t_{1}} \\ Ψ_{1}^{J} \\ ⋱ \\ Ψ_{t_{2}}^{J} \end{matrix}] [\begin{matrix} {\overset{.Math.}{h}}_{1} \\ .Math. \\ {\overset{.Math.}{h}}_{t_{1}} \\ {\overset{.Math.}{g}}_{1} \\ .Math. \\ {\overset{.Math.}{g}}_{t_{2}} \end{matrix}] & (12) \end{matrix}$

(39) Let Ξ=[Φ.sub.1, . . . , Φ.sub.t.sub.1, Δ.sub.1, . . . , Δ.sub.t.sub.2].sup.T,

(40) $Θ = [\begin{matrix} Γ_{1} \\ ⋱ \\ Γ_{t_{1}} \\ Ψ_{1}^{J} \\ ⋱ \\ Ψ_{t_{2}}^{J} \end{matrix}],$
then if Ξ is invertible, the repair matrix can be obtained as follows:
R=Ξ.sup.−1Θ (13)

(41) and the missing data blocks can be regenerated by multiplying the repair matrix R with the helper data sub-blocks.

(42) If the actual missing data blocks are not the first t.sub.1 systematical blocks and the first t.sub.2 parity blocks, the repair matrix can be calculated in the same way as above through rearranging the number of the data blocks and the rows and columns of the encoding matrix and mapping the missing systematical data blocks and parity blocks to the most front positions.

(43) From the presentation above, it can be seen that with the concurrent recovery method disclosed herein, totally
W=t(d−t+1) (14)

(44) helper data sub-blocks are needed to regenerate t missing data blocks.

(45) Hence the method provided in the present disclosure achieves the lower bound of the network bandwidth of concurrent failure recovery. Besides, the method can save repair bandwidth only when t<min{k, α}. When t≥min{k, α}, downloading the whole data segment from the assistant node and then decoding to obtain original data and regenerating the missing data blocks with encoding is more bandwidth-saving. Since IA codes require that d>2k−1 and α≥k, hence min{k, α}=k. Therefore if it is for the purpose of saving bandwidth, the application scope of the method of the present disclosure is t<k.

(46) For some combinations of failed nodes, Ξ may be irreversible and hence the repair matrix cannot be generated through the formula (13). To solve this problem, one can add one or several nodes which may be failed or even surviving ones to the recovery list to make Ξ invertible, or replace some nodes to be recovered with other nodes which may also be failed or even surviving ones, or use decoding to regenerate missing data.

(47) The repair matrix generating process illustrated in FIG. 3 is summarized as follows:

(48) Step 1: for every i where 1≤i≤t.sub.1:

(49) calculating Φ.sub.i according to formula (7); and

(50) calculating Γ.sub.i according to formula (8);

(51) Step 2: for every i where 1≤i≤t.sub.2:

(52) calculating Ψ.sub.i according to formula (9);

(53) for every j where j∈J.sub.i:

(54) calculating Ω.sub.i,j according to formula (10); and

(55) generating Δ.sub.i and Ψ.sub.i.sup.J according to formula (11).

(56) Step 3: combining all Φ.sub.i, γ.sub.i, Δ.sub.i and Ψ.sub.i.sup.J obtained in above steps to form Ξ and Θ according to formula (12).

(57) Step 4: judging Ξ whether is invertible. If Ξ is invertible, calculate the repair matrix R according to formula (13).

(58) From the presentation above, to recover t failed nodes, only s=d−t+1 assistant nodes are needed, among them k−t.sub.1 are systematical nodes and d−t+1−(k−t.sub.1)=α−t.sub.2 are parity nodes. Since IA codes require that n≥2k and d≥2k−1, there may be more than α−t.sub.2 surviving parity nodes in a system. From the structure of the encoding matrix of IA codes, all parity nodes are equivalent and interchangeable with each other. Therefore, one can choose any α−t.sub.2 out of the n−k−t.sub.2 surviving parity nodes as assistant nodes only if is invertible.

(59) Moreover, in another embodiment, if there are enough surviving parity nodes, it may be no need to recover all the missing parity blocks, instead, one can choose to repair t.sub.2′<t.sub.2 failed parity nodes according to actual requirements. This can reduce the recovery bandwidth cost in some scenarios such as distributed recovery mode. However, unlike parity blocks, the repair matrix cannot be generated with the same way proposed above; that is, all the failed systematical nodes must be jointly recovered when regenerating lost data via the repair matrix with minimum storage generating code based on IA codes. Therefore, if a regenerating node just needs to regenerate t.sub.1′ systematical blocks and t.sub.2′ parity blocks where t.sub.1′≤t.sub.1 and t.sub.2′≤t.sub.2, at least r=t.sub.1+t.sub.2′ data blocks must be concurrently recovered in actual operation, among which t.sub.1 are systematical blocks and t.sub.2′ are parity blocks. Currently there are k−t.sub.1 surviving systematical nodes and m−t.sub.2 surviving parity nodes in the system. Consider the formula (14), d−(t.sub.1+t.sub.2′)+1−(k−t.sub.1)=α−t.sub.2′ surviving parity nodes are needed as the assistant nodes. If there is so many normal parity nodes, i.e. m−t.sub.2<α−t.sub.2′, another α−t.sub.2′−(m−t.sub.2) failed parity nodes have to be added to the list of data blocks to be recovered; and totally r=t.sub.1+t.sub.2′+α+t.sub.2′−(m−t.sub.2)=t.sub.1+t.sub.2+d+1−n nodes need to be concurrently repaired. In either case, d−r+1 assistant nodes are needed and each assistant node should offer r uncoded (for missing systematical data blocks) and/or coded (for missing parity blocks) helper data sub-blocks. Therefore the total number of helper data sub-blocks is shown in formula (15) as follows:

(60) $\begin{matrix} W = {\begin{matrix} (t_{1} + {t_{2}}^{'} (d - t_{1} - {t_{2}}^{'} + 1) & m - t_{2} \geq α - {t_{2}}^{'} and (t_{1} + {t_{2}}^{'}) < k \\ \begin{matrix} (t_{1} + t_{2} + d + 1 - n) \\ (n - t_{1} - t_{2}) \end{matrix} & \begin{matrix} m - t_{2} < α - {t_{2}}^{'} and \\ t_{1} + t_{2} + d + 1 - n < k \end{matrix} \\ k (d - k + 1) & \begin{matrix} other cases than the \\ above two conditions and \end{matrix} \\ not recoverable & otherwise \end{matrix} & (15) \end{matrix}$

(61) Thus if one of the first two conditions of formula (15) is met, the concurrent recovery method proposed above can be applied to save recovery bandwidth, and one should first choose all surviving systematical nodes then appropriate number of surviving parity nodes according to formula (15) as the assistant nodes. If there are fewer failed nodes, when V=I, all the systematical nodes can be chosen as the assistant nodes; then their stored data can be directly sent to the regenerating node without encoding the data, greatly reducing the complexity of the calculation. If the first two conditions of formula (15) are not met, decoding has to be adopted to recover data. Note that, with the concurrent recovery method for IA codes proposed by the present disclosure, the number of data blocks that are actually recovered may be different from that of the blocks the regenerating node needs to recover.

(62) Based on the above presentation, a concurrent failure recovery algorithm using IA codes for multiple data blocks is illustrated in another embodiment of the present disclosure in its entirety. The specific process thereof is shown in FIG. 4 as follows. Note the 4th condition in formula (15) when the failed data is not recoverable is not presented for brevity.

(63) Step 001: check the parameters against formula (15) to judge the applicable condition for data repair,

(64) if the 1.sup.st condition is met:

(65) Step 110: setting r=t.sub.1+t.sub.2′, selecting all k−t.sub.1 surviving systematical nodes and any α−t.sub.2′ surviving parity nodes as assistant nodes;

(66) Step 120: calculating Ξ according to the steps shown in FIG. 3;

(67) Step 130: judging whether Ξ is invertible.

(68) If Ξ is invertible, then go to

(69) Step 131: informing each assistant node to offer r uncoded (for missing systematical data blocks) and/or coded (for missing parity blocks) helper data sub-blocks to the regenerating node;

(70) Step 132: the regenerating node waiting until all helper data sub-blocks are received;

(71) Step 133: calculating the repair matrix according to the steps shown in FIG. 3;

(72) Step 134: regenerating the missing data blocks through multiplying the repair matrix with the helper data sub-block.

(73) If Ξ is not invertible, then go to step 310, or to step 135 or step 136;

(74) Step 135: adding a node to the recovery list, letting r=r+1 and re-calculating Ξ. If Ξ is invertible, go to perform step 131, otherwise go to perform step 310 or 135 or 136.

(75) Step 136: replacing a node in the recovery list, and re-calculating Ξ. If is invertible, go to perform step 131, otherwise go to perform step 310 or 135 or 136.

(76) If all possibilities for r<k have been ergodic during the execution of steps 135, 136, then go to perform step 310.

(77) If the 2.sup.nd condition is met:

(78) Step 210: besides t.sub.1 missing systematical data blocks and t.sub.2′ missing parity blocks, adding another α−t.sub.2′−m+t.sub.2 missing parity blocks to the recovery, and setting r=t.sub.1+t.sub.2+d+1−n;

(79) Step 220: selecting all the surviving nodes as assistant nodes;

(80) Then perform step 120 and step 130 to regenerate the missing data blocks.

(81) If the 3.sup.rd condition is met:

(82) Step 310: selecting k surviving nodes as assistant nodes, the regenerating node informing each of the assistant nodes to offer one uncoded data block;

(83) Step 320: the regenerating node waiting until all the helper data blocks are received;

(84) Step 330: decoding the received helper data blocks to obtain original data segment; and

(85) Step 340: regenerating the missing data blocks through encoding the original data segment.

(86) Those skilled in the art can understand that all or part of the steps of the various methods in the above-mentioned embodiments can be completed by instructing relevant hardware through a program. When all or parts of the functions in the above embodiments are implemented by a computer program, the program may be stored in a computer-readable storage medium, which includes a program that can be executed by the processor. The storage medium may include: a read-only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, etc., and the program is executed by a computer to realize the above functions. For example, by storing the program in the memory of the computer and executing the program in the memory by the processor, all or part of the above functions can be realized. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program may also be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a mobile hard disk, and saved on local device by downloading or copying; or it may update the version of the system of the local device, and when the program in the memory is executed by the processor, all or part of the functions in the foregoing embodiments can be implemented.

(87) The principle and implementation manners present disclosure has been described above with reference to specific embodiments, which are merely provided for the purpose of understanding the present disclosure and are not intended to limit the present disclosure. It will be possible for those skilled in the art to make variations based on the principle of the present disclosure.

Methods of data concurrent recovery for a distributed storage system and storage medium thereof

Assignee

Inventors

Cpc classification

Classification Explorer

G06F11/2094

PHYSICS

Classification Explorer

G06F11/1076

PHYSICS

Classification Explorer

H03M13/373

ELECTRICITY

Classification Explorer

G06F11/0709

PHYSICS

Classification Explorer

G06F11/1464

PHYSICS

Classification Explorer

H03M13/616

ELECTRICITY

Classification Explorer

G06F11/0772

PHYSICS

Classification Explorer

G06F11/1004

PHYSICS

Classification Explorer

G06F11/1469

PHYSICS

International classification

Classification Explorer

G06F11/07

PHYSICS

Classification Explorer

G06F11/10

PHYSICS

Classification Explorer

G06F11/14

PHYSICS

Abstract

Claims

Description