Stripe merging method and system based on erasure codes

11467905 · 2022-10-11

Assignee

Inventors

Cpc classification

International classification

Abstract

A stripe merging method and system based on erasure codes are provided. A StripeMerge-P algorithm is used first to determine alignment information of parity chunks of erasure code stripes based on a preprocessed hash table. Through a greedy strategy, erasure code stripe pairs to be merged are selected for merging. Through the hash table, location information of the parity chunks is directly looked up, so that no additional computing overhead is required, and the overhead of selecting and merging the erasure code stripe pairs is further reduced through the combination with the greedy strategy.

Claims

1. A stripe merging method based on erasure codes, comprising: step S1, pre-processing a plurality of erasure code stripes, determining a storage location of a parity chunk of each of the erasure code stripes, establishing a hash table, wherein the hash table indicates a location of the parity chunk of each of the erasure code stripes and a serial number of each of the erasure code stripes; step S2, determining a set of erasure code stripes that are fully aligned and partially aligned with a location of a parity chunk of a specific erasure code stripe among the erasure code stripes based on the hash table, selecting an erasure code stripe with a relatively low merging overhead with the specific erasure code stripe from the set of erasure code stripes, merging the selected erasure code stripe with the specific erasure code stripe to generate a merged stripe; step S3, deleting the selected erasure code stripe and the specific erasure code stripe from the erasure code stripes to obtain a plurality of updated erasure code stripes; step S4, repeatedly performing step S2 and step S3; step S5, calculating a merging overhead of an erasure code stripe pair formed by any two erasure code stripes among the updated erasure code stripes obtained in step S3 when two erasure code stripes that can be merged not be selected; step S6, selecting erasure code stripe pairs with a lowest merging overhead among all erasure code stripe pairs, merging the selected erasure code stripe pairs to generate a merged stripe; step S7, deleting erasure code stripe pairs related to the erasure code stripe pairs selected in step S6 among all erasure code stripe pairs to obtain all new erasure code stripe pairs, wherein the erasure code stripe pairs related to the erasure code stripe pairs selected in step S6 refer to each of the erasure code stripe pairs containing one of the erasure code stripes of the erasure code stripe pairs selected in step S6; and step S8, repeatedly performing step S6 and step S7 until all erasure code stripe pairs obtained in step S7 are empty.

2. The stripe merging method according to claim 1, wherein the step of selecting the erasure code stripe with a relatively low merging overhead with the specific erasure code stripe from the set of erasure code stripes in step S2 further comprises: selecting an erasure code stripe with a lowest merging overhead or with a merging overhead less than a threshold from the aligned erasure code stripes when the set of erasure code stripes includes the erasure code stripes whose locations of the parity chunks are fully aligned with the location of the parity chunk of the specific erasure code stripe; and selecting the erasure code stripe with the lowest merging overhead or with the merging overhead less than the threshold from the partially aligned erasure code stripes when the set of erasure code stripes does not include the erasure code stripes whose locations of the parity chunks are aligned with the location of the parity chunk of the specific erasure code stripe, wherein the partially aligned erasure code stripes are selected according to an order of numbers of aligned parity chunks from large to small, and each of the numbers of aligned parity chunks refer to the number of aligned parity chunks in two partially aligned erasure code stripes.

3. The stripe merging method according to claim 2, wherein the process of step S5 to step S8 is called a StripeMerge-G algorithm specifically comprising the following steps: step S10, calculating a merging overhead of any two narrow stripes in an inputted set of narrow stripes, storing a triple (c, s.sub.i, s.sub.j) in a set custom character, wherein s.sub.i and s.sub.j represent two different narrow stripes, c represents the merging overhead of s.sub.i and s.sub.j, the set of narrow stripes refers to a set of erasure code stripes, and each of the erasure code stripes corresponds to a narrow stripe; step S20, sorting all triples in the set custom character in an ascending order according to c; step S30, taking out a first triple (c*, s.sub.i*, s.sub.j*) in the set custom character, merging s.sub.i* and s.sub.j*, deleting all triples in the set custom character that contain s.sub.i* and s.sub.j*, wherein s.sub.i* and s.sub.j* represent two different narrow stripes after sorting, and c* represents the merging overhead of s.sub.i* and s.sub.j*; and step S40, repeatedly performing step S30 until the set custom character is empty.

4. The stripe merging method according to claim 3, wherein the process of step S2 to step S4 is called a StripeMerge-P algorithm specifically comprising the following steps: determining the set of narrow stripes and a set of fully aligned or partially aligned parity chucks; sorting narrow stripe pairs in the set of fully aligned or partially aligned parity chunks according to a number of aligned data chunks from large to small, sequentially looking for and merging the narrow stripe pairs with a lowest merging overhead or with a merging overhead less than the threshold; and performing the StripeMerge-G algorithm on the remaining narrow stripes.

5. The stripe merging method according to claim 1, wherein the step of merging the selected erasure code stripe with the specific erasure code stripe in step S2 specifically comprises the following steps: checking locations of data chunks in the erasure code stripes, wherein if two data chunks are stored in a specific node together, one of the data chunks is migrated to an idle node for the current merged stripe; migrating all parity chunks with a same coding coefficient into a same node; and migrating the data chuck if a data chuck is still provided in the node where the parity chunks are stored, wherein a sum of a number of parity chunks and a number of data chunks migrated in the above process is a merging overhead.

6. A stripe merging system based on erasure codes, comprising: an erasure code stripe processing unit configured for performing step S1, wherein step S1 is pre-processing a plurality of erasure code stripes, determining a storage location of a parity chunk of each of the erasure code stripes, establishing a hash table, wherein the hash table indicates a location of the parity chunk of each of the erasure code stripes and a serial number of each of the erasure code stripes; a first merging unit configured for performing sequentially step S2, S3, and step S4, wherein step S2 is determining a set of erasure code stripes that are fully aligned and partially aligned with a location of a parity chunk of a specific erasure code stripe among the erasure code stripes based on the hash table, selecting an erasure code stripe with a relatively low merging overhead with the specific erasure code stripe from the set of erasure code stripes, merging the selected erasure code stripe with the specific erasure code stripe to generate a merged stripe, wherein step S3 is deleting the selected erasure code stripe and the specific erasure code stripe from the erasure code stripes to obtain a plurality of updated erasure code stripes, wherein step S4 is repeatedly performing step S2 and step S3; and the second merging unit configured for performing sequentially step S5, step S6, step S7, and step S8, wherein step S5 is calculating a merging overhead of an erasure code stripe pair formed by any two erasure code stripes among the updated erasure code stripes obtained in step S3 when two erasure code stripes that can be merged cannot be selected, wherein step S6 is selecting erasure code stripe pairs with a lowest merging overhead among all erasure code stripe pairs and merging the selected erasure code stripe pairs to generate a merged stripe, wherein step S7 is deleting erasure code stripe pairs related to the erasure code stripe pairs selected in step S6 among all erasure code stripe pairs to obtain all new erasure code stripe pairs, wherein the erasure code stripe pairs related to the erasure code stripe pairs selected in step S6 refer to each of the erasure code stripe pairs containing one of the erasure code stripes of the erasure code stripe pairs selected in the step S6, where step S8 is repeatedly performing step S6 and step S7 until all erasure code stripe pairs obtained in step S7 are empty.

7. The stripe merging system according to claim 6, wherein the step of selecting the erasure code stripe with a relatively low merging overhead with the specific erasure code stripe from the set of erasure code stripes in the process of performing step S2 by the first merging unit further comprises: selecting an erasure code stripe with a lowest merging overhead or with a merging overhead less than a threshold from the aligned erasure code stripes when the set of erasure code stripes includes the erasure code stripes whose locations of the parity chunks are fully aligned with the location of the parity chunk of the specific erasure code stripe; and selecting an erasure code stripe with a lowest merging overhead or with a merging overhead less than the threshold from the partially aligned erasure code stripes when the set of erasure code stripes does not include the erasure code stripes whose locations of the parity chunks are aligned with the location of the parity chunk of the specific erasure code stripe, wherein the partially aligned erasure code stripes are selected according to an order of numbers of aligned parity chunks from large to small, and each of the numbers of aligned parity chunks refer to the number of aligned parity chunks in two partially aligned erasure code stripes.

8. The stripe merging system according to claim 7, wherein the process of performing step S5 to step S8 by the second merging unit is called a StripeMerge-G algorithm specifically comprising: step S10, calculating a merging overhead of any two narrow stripes in an inputted set of narrow stripes, storing a triple (c, s.sub.i, s.sub.j) in a set custom character, wherein s.sub.i and S.sub.j, represent two different narrow stripes, c represents the merging overhead of S.sub.i and s.sub.j, the set of narrow stripes refers to a set of erasure code stripes, and each of the erasure code stripes corresponds to a narrow stripe; step S20, sorting all triples in the set custom character in an ascending order according to c; step S30, taking out a first triple (c*, s.sub.i*, s.sub.j*) in the set custom character, merging s.sub.i* and s.sub.j*, deleting all triples in the set custom character that contain s.sub.i* and s.sub.j*, wherein s.sub.i* and s.sub.j* represent two different narrow stripes after sorting, and c* represents the merging overhead of s.sub.i* and s.sub.j*; and step S40, repeatedly performing step S30 until the set custom character is empty.

9. The stripe merging system according to claim 8, wherein the process of performing step S2 to step S4 by the first merging unit is called a StripeMerge-P algorithm specifically comprising: determining the set of narrow stripes and a set of fully aligned or partially aligned parity chucks; sorting narrow stripe pairs in the set of fully aligned or partially aligned parity chunks according to a number of aligned data chunks from large to small, sequentially looking for and merging the narrow stripe pairs with a lowest merging overhead or with a merging overhead less than the threshold; and performing the StripeMerge-G algorithm on the remaining narrow stripes.

10. The stripe merging system according to claim 6, wherein the step of selecting erasure code stripe pairs with a lowest merging overhead among all erasure code stripe pairs and merging the selected erasure code stripe pairs in the process of performing step S6 by the second merging unit further comprises: checking locations of data chunks in the erasure code stripes, wherein if two data chunks are stored in a specific node together, one of the data chunks is migrated to an idle node for the current merged stripe; migrating all parity chunks with a same coding coefficient into a same node; and migrating the data chuck if a data chuck is still provided in the node where the parity chunks are stored, wherein a sum of a number of parity chunks and a number of data chunks migrated in the above process is a merging overhead.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification. The drawings illustrate exemplary embodiments of the disclosure and, together with the description, serve to explain the principles of the disclosure.

(2) FIG. 1 is a flow chart of a stripe merging method based on erasure codes according to an embodiment of the disclosure.

(3) FIG. 2 is a schematic diagram of chuck migration steps in a process of merging two narrow stripes into one wide stripe according to an embodiment of the disclosure.

(4) FIG. 3 is a schematic diagram of operation steps of a StripeMerge-P algorithm according to an embodiment of the disclosure.

(5) FIG. 4 is an architecture diagram of a stripe merging system based on erasure codes according to an embodiment of the disclosure.

DESCRIPTION OF THE EMBODIMENTS

(6) To better illustrate the goal, technical solutions, and advantages of the disclosure, the following embodiments accompanied with drawings are provided so that the disclosure are further described in detail. It should be understood that the specific embodiments described herein serve to explain the disclosure merely and are not used to limit the disclosure.

(7) In view of the above defects and improvement requirements of the related art, the disclosure provides a stripe merging method and system based on erasure codes with an aim to minimize generated bandwidths of wide stripes. The algorithms include a calculation method of the merging overhead of two narrow stripes, a StripeMerge-G algorithm and a StripeMerge-P algorithm. The merging overhead of two narrow stripes is one of the important indicators in the process of generating wide stripes. Based on this indicator, the StripeMerge-G algorithm and the StripeMerge-P algorithm are executed to generate wide stripes with less overhead.

(8) FIG. 1 is a flow chart of a stripe merging method based on erasure codes according to an embodiment of the disclosure, and as shown in FIG. 1, the following steps are included.

(9) In step S1, a plurality of erasure code stripes are pre-processed, a storage location of a parity chunk of each of the erasure code stripes is determined, and a hash table is established. The hash table indicates a location of the parity chunk of each of the erasure code stripes and a serial number of each of the erasure code stripes.

(10) In step S2, a set of erasure code stripes that are fully aligned and partially aligned with a location of a parity chunk of a specific erasure code stripe among the erasure code stripes is determined based on the hash table, an erasure code stripe with a relatively low merging overhead with the specific erasure code stripe is selected from the set of erasure code stripes, and the selected erasure code stripe is merged with the specific erasure code stripe to generate a merged stripe.

(11) In step S3, the selected erasure code stripe and the specific erasure code stripe are deleted from the erasure code stripes to obtain a plurality of updated erasure code stripes.

(12) In step S4, step S2 and step S3 are repeatedly performed, and step S5 is performed when two erasure code stripes that can be merged cannot be selected.

(13) In step S5, among the updated erasure code stripes obtained in step S3, a merging overhead of an erasure code stripe pair formed by any two erasure code stripes is calculated.

(14) In step S6, erasure code stripe pairs with a lowest merging overhead among all erasure code stripe pairs are selected, and the selected erasure code stripe pairs are merged to generate a merged stripe.

(15) In step S7, erasure code stripe pairs related to the erasure code stripe pairs selected in step S6 among all erasure code stripe pairs are deleted to obtain all new erasure code stripe pairs. The erasure code stripe pairs related to the erasure code stripe pairs selected in step S6 refer to each of the erasure code stripe pairs containing one of the erasure code stripes of the erasure code stripe pairs selected in step S6.

(16) In step S8, step S6 and step S7 are repeatedly performed until all erasure code stripe pairs obtained in step S7 are empty.

(17) In an optional embodiment, in step S2, the step of selecting the erasure code stripe with a relatively low merging overhead with the specific erasure code stripe from the set of erasure code stripes further includes the following steps.

(18) When the set of erasure code stripes includes the erasure code stripes whose locations of the parity chunks are fully aligned with the location of the parity chunk of the specific erasure code stripe, an erasure code stripe with a lowest merging overhead or with a merging overhead less than a threshold is selected from the aligned erasure code stripes.

(19) When the set of erasure code stripes does not include the erasure code stripes whose locations of the parity chunks are aligned with the location of the parity chunk of the specific erasure code stripe, an erasure code stripe with a lowest merging overhead or with a merging overhead less than the threshold is selected from the partially aligned erasure code stripes. The partially aligned erasure code stripes are selected according to an order of numbers of aligned parity chunks from large to small, and each of the numbers of aligned parity chunks refer to the number of aligned parity chunks in two partially aligned erasure code stripes.

(20) In an optional embodiment, merging of two erasure code stripes specifically includes the following steps.

(21) Locations of data chunks in the erasure code stripes are checked, and if two data chunks are stored in a specific node together, one of the data chunks is migrated to an idle node for the current merged stripe.

(22) All parity chunks with a same coding coefficient are migrated into a same node.

(23) If a data chuck is still provided in the node where the parity chunks are stored, the data chuck is migrated.

(24) A sum of a number of parity chunks and a number of data chunks migrated in the above process is a merging overhead.

(25) In an optional embodiment, the process of step S5 to step S8 is called the StripeMerge-G algorithm specifically including the following steps.

(26) In step S10, in an inputted set of narrow stripes, a merging overhead of any two narrow stripes is calculated, and a triple (c, s.sub.i, s.sub.j) is stored in a set custom character. Herein, s.sub.1 and s.sub.j represent two different narrow stripes, c represents the merging overhead of s.sub.i and s.sub.j, the set of narrow stripes refers to a set of erasure code stripes, and each of the erasure code stripes corresponds to a narrow stripe.

(27) In step S20, all triples in the set custom character are sorted in an ascending order according to c.

(28) In step S30, a first triple (c*, s.sub.i*, s.sub.j*) in the set custom character is taken out, s.sub.i* and s.sub.j* are merged, and all triples in the set custom character that contain s.sub.i* and s.sub.j* are deleted. Herein, s.sub.i* and s.sub.j* represent two different narrow stripes after sorting, and c* represents the merging overhead of s.sub.i* and s.sub.j*.

(29) In step S40, step S30 is repeatedly performed until the set custom character is empty.

(30) In an optional embodiment, the process of step S2 to step S4 is called the StripeMerge-P algorithm specifically including the following steps.

(31) The set of narrow stripes and a set of fully aligned or partially aligned parity chunks are determined.

(32) Narrow stripe pairs in the set of fully aligned or partially aligned parity chunks are sorted according to a number of aligned data chunks from large to small, and the narrow stripe pairs with a lowest merging overhead or with a merging overhead less than the threshold are sequentially looked for and merged.

(33) The StripeMerge-G algorithm is performed on the remaining narrow stripes.

(34) To be more specific, in a specific embodiment, in a storage system based on Reed-solomon (RS) codes, for each stripe, all k pieces of data are encoded by using the RS codes to generate n−k parity chunks, where the RS codes belong to a type of the erasure codes. These n chunks form a stripe, and any k chunks among the n chunks can repair the original data.

(35) FIG. 2 shows an example of chuck migration in the process of merging narrow stripes of (n, k)=(4, 2) into wide stripes of (n, k)=(6, 2), and N.sub.1 to N.sub.6 are 6 nodes.

(36) The stripe merging method based on erasure codes provided by the disclosure includes the following steps.

(37) (F1) Arrangement of data chunks is checked. If two data chunks are stored in a specific node together, one of the data chunks is migrated to an idle node for the current merged wide stripe.

(38) As shown in FIG. 2, a data chuck b and a data chuck c are both provided in the node N.sub.2, and the data chuck c is migrated to the node N.sub.6.

(39) (F2) All parity chunks with a same coding coefficient are migrated into the same node.

(40) As shown in FIG. 2, a parity chunk c+d in the node N.sub.1 and a parity chunk a+b in the node N.sub.3 have the same coding coefficient, so these parity chunks are migrated into the same node.

(41) (F3) If a data chuck is still provided in the node where the parity chunks are stored, the data chuck is migrated.

(42) As shown in FIG. 2, after the first two migrations are completed but the data chuck d is still provided in the node N.sub.3, and the data chuck d is then migrated to the node N.sub.5.

(43) (F4) The number of chunks migrated in this process is the merging overhead.

(44) As shown in FIG. 2, a total of 3 chunks are migrated in this process, so the merging overhead is 3.

(45) FIG. 3 shows an example of merging narrow stripes of (n, k)=(4, 2) into wide stripes of (n, k)=(6, 2).

(46) The stripe merging method based on erasure codes provided by the disclosure includes the following steps.

(47) (T1) A set of narrow stripes and an i-part check alignment set are inputted.

(48) As shown in FIG. 2, a hash table is established according to a distribution of parity chunks in the stripes. The key of the hash table is the locations of the parity chunks P1 and P2, and the value of the hash table is the serial numbers of the stripes whose parity chunks satisfy the corresponding distribution. The i-part check alignment set may be efficiently obtained through the hash table.

(49) Herein, the i-part check alignment set refers to a set formed by stripes having i aligned parity chunks for a specific narrow stripe, and i represents the number of aligned parity chunks.

(50) (T2) According to the order of i from large to small, each i-part check alignment set is selected in turn, each narrow stripe is checked to look for the pairs with the lowest merging overhead in the set, and if these pairs are close to optimal matching, these pairs are aligned and merged.

(51) As shown in FIG. 3, in a 2-part check alignment set, since a stripe 1 and a stripe 5 satisfy optimal matching, the stripe 1 and the stripe 5 are merged. In a 1-part check alignment set, since a stripe 2 and a stripe 3 are close to optimal matching, the stripe 2 and the stripe 3 are merged.

(52) (T3) The StripeMerge-G algorithm is performed on the remaining narrow stripes.

(53) As shown in FIG. 3, the StripeMerge-G algorithm is performed on the remaining narrow stripes 4 and 6.

(54) In general, in the stripe merging method based on erasure codes provided by the disclosure, the overhead of wide stripe generation may be reduced, and efficiency of generation of wide stripes may be improved.

(55) FIG. 4 is an architecture diagram of a stripe merging system based on erasure codes according to an embodiment of the disclosure, and as shown in FIG. 4, the system includes the following.

(56) An erasure code stripe processing unit 410 configured for performing step Si is included. In step S1, a plurality of erasure code stripes are pre-processed, a storage location of a parity chunk of each of the erasure code stripes is determined, and a hash table is established. The hash table indicates a location of the parity chunk of each of the erasure code stripes and a serial number of each of the erasure code stripes.

(57) A first merging unit 420 configured for performing step S2 to step S4 is included. In step S2, a set of erasure code stripes that are fully aligned and partially aligned with a location of a parity chunk of a specific erasure code stripe among the erasure code stripes is determined based on the hash table, an erasure code stripe with a relatively low merging overhead with the specific erasure code stripe is selected from the set of erasure code stripes, and the selected erasure code stripe is merged with the specific erasure code stripe to generate a merged stripe. In step S3, the selected erasure code stripe and the specific erasure code stripe are deleted from the erasure code stripes to obtain a plurality of updated erasure code stripes. In step S4, step S2 and step S3 are repeatedly performed, and when two erasure code stripes that can be merged cannot be selected, a second merging unit is indicated to perform step S5.

(58) A second merging unit 430 configured for performing step S5 to step S8 is included. In step S5, among the updated erasure code stripes obtained in step S3, a merging overhead of an erasure code stripe pair formed by any two erasure code stripes is calculated. In step S6, erasure code stripe pairs with a lowest merging overhead among all erasure code stripe pairs are selected, and the selected erasure code stripe pairs are merged to generate a merged stripe. In step S7, erasure code stripe pairs related to the erasure code stripe pairs selected in step S6 among all erasure code stripe pairs are deleted to obtain all new erasure code stripe pairs. The erasure code stripe pairs related to the erasure code stripe pairs selected in step S6 refer to each of the erasure code stripe pairs containing one of the erasure code stripes of the erasure code stripe pairs selected in step S6. In step S8, step S6 and step S7 are repeatedly performed until all erasure code stripe pairs obtained in step S7 are empty.

(59) It can be understood that the detailed functional implementation of the foregoing units may be found with reference to the introduction in the foregoing method embodiments, and description thereof is not repeated herein.

(60) A person having ordinary skill in the art should be able to easily understand that the above description is only preferred embodiments of the disclosure and is not intended to limit the disclosure. Any modifications, equivalent replacements, and modifications made without departing from the spirit and principles of the disclosure should fall within the protection scope of the disclosure.