DATA REDUCTION IN BLOCK-BASED STORAGE SYSTEMS USING CONTENT-BASED BLOCK ALIGNMENT
20220100385 · 2022-03-31
Inventors
Cpc classification
H03M7/6047
ELECTRICITY
G06F3/0659
PHYSICS
G06F3/0644
PHYSICS
H03M7/3084
ELECTRICITY
International classification
Abstract
A method of data reduction in a block-based data storage system includes selecting a starting position in a block based on a deterministic function of block data content. Then for an unaligned block beginning at the selected starting position, a block digest (e.g., block hash) is generated and compared with stored block digests of stored data blocks. If there is a match, and the stored block matches the unaligned block, then a reference to the stored block is stored in place of the unaligned block, and otherwise the unaligned block and a corresponding digest are stored. The storing of references to already stored blocks, without the constraint of observing aligned-block boundaries, realizes increased savings of physical storage space.
Claims
1. A method of data reduction in a fixed block-based data storage system, comprising: selecting a starting position in a block based on a deterministic function of block data content; for an unaligned block beginning at the selected starting position, generating a fix size block digest and comparing the generated block digest with stored block digests of stored data blocks; and if the comparing results in a match, and the stored block at its specific offset is determined to match the unaligned block, then storing a reference to the stored block in place of the unaligned block, and otherwise storing the unaligned block and a corresponding digest with its offset in the hash table.
2. The method of claim 1, wherein selecting a starting position includes performing a repeated sliding window calculation over successive segments of the block, and making an exit decision based on the sliding window calculation satisfying an exit criterion.
3. The method of claim 2, wherein the sliding window calculation uses a function that produces a semi-random number for each window.
4. The method of claim 3, wherein the window size is in a range from 4 to 8 bytes.
5. The method of claim 2, wherein the sliding window calculations progress through successive windows with a step size substantially less than a size of the block, and the window size is greater than the step size.
6. The method of claim 5, wherein the step size is one byte.
7. The method of claim 2, wherein the exit criterion is statistically well distributed with respect to the block size, promoting the calculation of one hash per block.
8. The method of claim 7, wherein the exit criterion is a match between a sliding window calculation result and a predefined fixed value.
9. The method of claim 1, wherein, once a starting position has been selected for a first block of an extent, subsequent iterations are performed for successive blocks of the extent with each iteration starting at the same relative starting position, thereby promoting identification a set of blocks of the extent matching corresponding stored blocks.
10. The method of claim 9, wherein the iterations at the same relative starting position continue until reaching an unaligned block having to matching stored block, and thereafter returning to select a new starting position in the unaligned block.
11. A data storage system, comprising: physical data storage devices for storing blocks of user data; interface circuitry providing respective interfaces to the physical storage devices and to host computers generating storage commands received and executed by the data storage system; and processing circuitry configured and operative to store and execute computer program instructions to perform a method of data reduction including: selecting a starting position in a block based on a deterministic function of block data content; for an unaligned block beginning at the selected starting position, generating a block digest and comparing the generated block digest with stored block digests of stored data blocks; and if the comparing results in a match, and the stored block is determined to match the unaligned block, then storing a reference to the stored block in place of the unaligned block, and otherwise storing the unaligned block and a corresponding digest.
12. The data storage system of claim 11, wherein selecting a starting position includes performing a repeated sliding window calculation over successive segments of the block, and making an exit decision based on the sliding window calculation satisfying an exit criterion.
13. The data storage system of claim 12, wherein the sliding window calculation uses a function that produces a semi-random number for each window.
14. The data storage system of claim 13, wherein the window size is in a range from 4 to 8 bytes.
15. The data storage system of claim 12, wherein the sliding window calculations progress through successive windows with a step size substantially less than a size of the block, and the window size is greater than the step size.
16. The data storage system of claim 15, wherein the step size is one byte.
17. The data storage system of claim 12, wherein the exit criterion is statistically well distributed with respect to the block size, promoting the calculation of one hash per block.
18. The data storage system of claim 17, wherein the exit criterion is a match between a sliding window calculation result and a predefined fixed value.
19. The data storage system of claim 11, wherein, once a starting position has been selected for a first block of an extent, subsequent iterations are performed for successive blocks of the extent with each iteration starting at the same relative starting position, thereby promoting identification a set of blocks of the extent matching corresponding stored blocks.
20. The data storage system of claim 19, wherein the iterations at the same relative starting position continue until reaching an unaligned block having to matching stored block, and thereafter returning to select a new starting position in the unaligned block.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0004] The foregoing and other objects, features and advantages will be apparent from the following description of particular embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views.
[0005]
[0006]
[0007]
[0008]
[0009]
[0010]
[0011]
DETAILED DESCRIPTION
[0012] Overview
[0013] Modern block-based data storage systems (also called “storage devices” herein) can support various methods to reduce physical storage space requirements and cost ($/GB) accordingly. Known methods include compression and block deduplication, which generally work on storage full page boundaries (e.g., naturally aligned 4 KB or 8 KB pages). Some newer methods may use a sub-page granularity such as a 512B block or sector granularity.
[0014] The files of file systems that are stored on block storage devices typically have much finer granularity, e.g., byte granularity, and an application may produce similar files with only very few or even a single byte offset inside the file data. For example, a word processing application may generate two versions of a document that differ in a single additional character or added/deleted word, so that the two document files are exactly the same up to the point of difference, and also the same after that point except for a small (byte-granular) shift in the contents. Similar results can be obtained for other applications such as spreadsheets, etc.
[0015] The known methods as outlined above would generally view all blocks after the difference point as different, just based on the shift in the contents, and would be unable to realize storage savings based on deduplicating such blocks accordingly. This is due in part to the constraint of operating at page or block/sector granularity.
[0016] The present disclosure provides a more content-aware method of detecting identical data blocks, with a much finer (e.g., byte) granularity, to readily detect and provide data reduction in common scenarios, such as the small-offset shifting of file contents as described above. The technique can enable a block storage device to detect byte granular shift (e.g., small number of bytes) between streams and once detected, apply a method to reduce redundant information and reduce physical storage utilization.
DESCRIPTION OF EMBODIMENTS
[0017]
[0018]
[0019]
[0020]
[0021]
[0022]
[0023] Referring to
[0024] At 60, the block is scanned to select a starting position according to a predefined, deterministic criterion. An example technique is described below. The scanning is at a fine-grain granularity, e.g., byte granularity, and is done in a way that will repeatably select the same starting position for identical data. Once a starting position is selected, the scanning is stopped and processing proceeds (as indicated by Exit) to block 62.
[0025] At 62 the process calculates a block digest (e.g., hash) for a block-size extent (e.g., 4 KB) of data extending from the starting position (e.g., a specific unaligned byte). In general this extent, which is referred to herein as an “unaligned block”, extends across part of the current aligned block and part of the next sequential aligned block. The calculated digest is compared to the stored digests (e.g., by a hash table lookup) to identify a match, if one exists. If so, then at 64 the corresponding unaligned stored block is retrieved with corresponding offset and further processed to confirm an actual match, i.e., the stored unaligned block is compared with the unaligned block for which the digest has been calculated at 60. If the match is confirmed, then the new block can be replaced with a reference to the stored block, along with an offset value indicating the starting position of the unaligned block relative to the stored block being referenced. If at 62 no match is found, then the unaligned block from 60 is a new unique block, and at 66 it is added to the block store 52 along with its offset relative to the preceding aligned block boundary, while its digest/hash is also stored in the collection 50 so that the newly stored block may be discovered in a subsequent iteration of the process for another new block.
[0026] It should be noted that as the process of
[0027] The following is an example process for selecting a starting position in step 60. This example is divided into two steps, a repeated operation referred to as a “sliding window” calculation, and an exit decision [0028] Sliding window calculation: This step makes a calculation on a sliding window of block data, every time advancing the window by a certain step (granularity, e.g., a single byte). The calculation may generally be any function that produces a semi-random number for each window, such as a hash function (not to be confused with the block digests/hashes used to identify block identicality). In one example, a window size may be in a range between 4 and 8 bytes for example. The scanning progresses through successive windows of the block data, each time stepping by the desired granularity (e.g., one byte). In each step, the hash function is calculated for the current window data to obtain a resulting window hash value. The window may also be referred to as a “segment” herein. Generally, the step size will be substantially less than the size of the block (e.g., one byte <<4 KB), and the window size is greater than the step size (e.g., 4B>1B). [0029] Exit criteria: Continue the sliding window calculation until an exit criteria is fulfilled, which may generally be any criteria that is statistically well distributed (e.g., 1 out of 4096). In one embodiment, each window hash value is compared to a predefined fixed value, with a match triggering the exit. For example, this exit trigger may be when the window hash value lower bits equal 0xAAA, representing a 1:4096 chance. As a low probability exit point is statistically selected (e.g., 1:4096 within a 4K block), from a block a single block digest/hash is generated.
[0030] Other exit methods may require calculating all hashes in the block and selecting a single one, e.g., highest or lowest value.
[0031] As a result of the above processing, a hash table 50 is created with hashes that represent blocks with starting points dynamically selected based on block contents (exit criteria) rather than fixed by address alignment. As such, for different data streams with identical but shifted blocks, the starting points can now align. Once the alignment between blocks is known, streams can be scanned (logical-address based forward and backward) and to deduplicate extents of unaligned but otherwise identical blocks.
[0032]
[0033] While various embodiments of the invention have been particularly shown and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention as defined by the appended claims.