SYSTEM AND METHOD FOR AN IMPROVED REAL-TIME ADAPTIVE DATA COMPRESSION
20180300087 · 2018-10-18
Cpc classification
H03M7/30
ELECTRICITY
Abstract
The present invention mainly aims to solve the technical problems existing in the prior art. The present invention relates to compression, and in particular to an improved real-time adaptive data compression for efficient data storage. An aspect of the present disclosure relates to a method for managing data storage in a data storage system. The method includes the steps of determining, by a processor of said data storage system, receipt of one or more blocks of data for storage; identifying, by the processor, a compression technique for storage of said one or more blocks of data; and compressing in-line or post processing, by the processor, if said compression technique is an in-line compression technique for writing the data in a memory, said one or more blocks of data based at least on a resources utilization of said data storage system.
Claims
1. A method for managing data storage in a data storage system, the method comprising: determining, by a processor of said data storage system, receipt of one or more blocks of data for storage; identifying, by the processor, a compression technique for storage of said one or more blocks of data; wherein said method is characterized by comprising a step of: compressing in-line or post processing, by the processor, if said compression technique is an in-line compression technique for writing the data in a memory, said one or more blocks of data based on combination of a CPU utilization, a memory utilization, and/or a number of operations in flight of said data storage system.
2. The method as claimed in claim 1, wherein said resources utilization is further selected from any or combination of, a NVRAM/NVDIMM utilization, a network utilization, a cache utilization, a drive utilization, and a node storage capacity/utilization.
3. The method as claimed in claim 1 further comprises: determining independent probabilities of said resources utilization of said data storage system to derive a probability of compressibility in-line value for automatically selecting compressing in-line or post processing said one or more blocks of data.
4. The method as claimed in claim 1 further comprises: updating a compression flag indicating selection of said compressing in-line for storage of said one or more blocks of data; or updating a post process flag indicating selection of said post processing for storage of said one or more blocks of data.
5. A method of evaluating at least on a resources utilization of a data storage system to determine if a write received by said data storage system should be compressed in-line or post-process in order to maintain a consistent performance with in-line compression and without inline compression, said resources utilization is selected from combination of a CPU utilization, a memory utilization, a number of operations in flight, a NVRAM/NVDIMM utilization, a network utilization, a cache utilization, a drive utilization, and a node storage capacity/utilization.
6. A system for managing data storage in a data storage system, the system comprising a storage processor and a memory configured to: determine, by said storage processor, receipt of one or more blocks of data for storage; identify, by said storage processor, a compression technique for storage of said one or more blocks of data; wherein said system is characterized by comprising a step of: compress in-line or post process, by said storage processor, if said compression technique is an in-line compression technique for writing the data in the memory, said one or more blocks of data based on combination of a CPU utilization, a memory utilization, and/or a number of operations in flight of said data storage system to dynamically scale performance to match the available resources of said data storage system.
7. The system as claimed in claim 6, wherein said resources utilization is further selected from any or combination of a NVRAM/NVDIMM utilization, a network utilization, a cache utilization, a drive utilization, and a node storage capacity/utilization.
8. The system as claimed in claim 6 further configured to: determine independent probabilities of said resources utilization of said data storage system to derive a probability of compressibility in-line value to automatically select compress in-line or post process said one or more blocks of data.
9. The system as claimed in claim 6 further configured to: update a compression flag indicating selection of said compressing in-line for storage of said one or more blocks of data; or update a post process flag indicating selection of said post processing for storage of said one or more blocks of data.
10. A data storage system comprising a data storage, a storage processor and a memory, said data storage system configured to: determine, by said storage processor, receipt of one or more blocks of data for storage; identify, by said storage processor, a compression technique for storage of said one or more blocks of data; wherein said system is characterized by comprising a step of: compress in-line or post process, by said storage processor, if said compression technique is an in-line compression technique for writing the data in the memory, said one or more blocks of data based on combination of a CPU utilization, a memory utilization, and/or a number of operations in flight of said data storage system.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0030] In the FIGURES, similar components and/or features may have the same reference label. Further, various components of the same type may be distinguished by following the reference label with a second label that distinguishes among the similar components. If only the first reference label is used in the specification, the description is applicable to any one of the similar components having the same first reference label irrespective of the second reference label.
[0031]
DETAILED DESCRIPTION
[0032] Systems and methods are disclosed for managing data storage in data storage systems. Embodiments of the present disclosure include various steps, which will be described below. The steps may be performed by hardware components or may be embodied in machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor programmed with the instructions to perform the steps. Alternatively, steps may be performed by a combination of hardware, software, firmware, and/or by human operators.
[0033] Embodiments of the present disclosure may be provided as a computer program product, which may include a machine-readable storage medium tangibly embodying thereon instructions, which may be used to program a computer (or other electronic devices) to perform a process. The machine-readable medium may include, but is not limited to, fixed (hard) drives, magnetic tape, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, semiconductor memories such as ROMs, random access memories (RAMs), programmable read-only memories (PROMs), erasable PROMs (EPROMs), electrically erasable PROMs (EEPROMs), flash memory, magnetic or optical cards, or other types of media/machine-readable media suitable for storing electronic instructions (e.g., computer programming code, such as software or firmware).
[0034] Various methods described herein may be practiced by combining one or more machine-readable storage media containing the code according to the present disclosure with appropriate standard computer hardware to execute the code contained therein. An apparatus for practicing various embodiments of the present disclosure may involve one or more computers (or one or more processors within a single computer) and storage systems containing or having network access to computer program(s) coded in accordance with various methods described herein, and the method steps of the disclosure could be accomplished by modules, routines, subroutines, or subparts of a computer program product.
[0035] If the specification states a component or feature may, can, could, or might be included or have a characteristic, that particular component or feature is not required to be included or have the characteristic.
[0036] Although the present disclosure has been described with the purpose of managing data storage in a data storage system, it should be appreciated that the same has been done merely to illustrate the invention in an exemplary manner and any other purpose or function for which explained structures or configurations can be used, is covered within the scope of the present disclosure.
[0037] Exemplary embodiments will now be described more fully hereinafter with reference to the accompanying drawings, in which exemplary embodiments are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. These embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those of ordinary skill in the art. Moreover, all statements herein reciting embodiments of the invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future (i.e., any elements developed that perform the same function, regardless of structure).
[0038] Thus, for example, it will be appreciated by those of ordinary skill in the art that the diagrams, schematics, illustrations, and the like represent conceptual views or processes illustrating systems and methods embodying this invention. The functions of the various elements shown in the FIGURES may be provided through the use of dedicated hardware as well as hardware capable of executing associated software. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the entity implementing this invention. Those of ordinary skill in the art further understand that the exemplary hardware, software, processes, methods, and/or operating systems described herein are for illustrative purposes and, thus, are not intended to be limited to any particular named.
[0039] The ensuing description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the invention as set forth in the appended claims.
[0040] Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.
[0041] The term machine-readable storage medium or computer-readable storage medium includes, but is not limited to, portable or non-portable storage devices, optical storage devices, and various other mediums capable of storing, containing, or carrying instruction(s) and/or data. A machine-readable medium may include a non-transitory medium in which data can be stored and that does not include carrier waves and/or transitory electronic signals propagating wirelessly or over wired connections. Examples of a non-transitory medium may include, but are not limited to, a magnetic disk or tape, optical storage media such as compact disk (CD) or digital versatile disk (DVD), flash memory, memory or memory devices. A computer-program product may include code and/or machine-executable instructions that may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.
[0042] Furthermore, embodiments may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks (e.g., a computer-program product) may be stored in a machine-readable medium. A processor(s) may perform the necessary tasks.
[0043] Generally, data reduction/efficiency technologies are never zero cost. With compression specifically, if a storage system is able to perform a given number of operations without compression, the number of storage operations that it can perform with in-line compression must be lower. The problem with native in-line compression is that its impact on storage operation performance is fixed and significant. As a result, running compression on different hardware/software yields very different performance profiles with and without compression. For example, system x without compression yields 100 operations per second, and system x with in-line compression yields 80 operations per second, a 20% degradation due to in-line compression. System y without compression yields 200 operations per second, and system y with in-line compression yields 100 operations per second, a 50% degradation due to in-line compression.
[0044] Thus, there is a need for a mechanism that dynamically, in real time, determines per operation whether to compress data in-line or in a post process, so that the impact on storage operation performance is flexible and efficiency improves without affecting the available resources and operations. Further, there is also a need for a mechanism that gives compression on different hardware/software nearly the same performance profile with or without compression.
[0045] The present invention mainly aims to solve the technical problems existing in the prior art. The present invention relates to compression, and in particular to an improved real-time adaptive data compression for efficient data storage.
[0046] In an embodiment, the present invention provides a mechanism that dynamically, in real time, determines per operation whether to compress data in-line or in a post process, so that the impact on storage operation performance is flexible and efficiency improves without affecting the available resources and operations. Further, the present invention provides a mechanism that gives compression on different hardware/software nearly the same performance profile with or without compression.
[0047] An aspect of the present disclosure relates to a method for managing data storage in a data storage system. The method includes the steps of determining, by a processor of said data storage system, receipt of one or more blocks of data for storage; identifying, by the processor, a compression technique for storage of said one or more blocks of data; and compressing in-line or post processing, by the processor, if said compression technique is an in-line compression technique for writing the data in a memory, said one or more blocks of data based at least on a resources utilization of said data storage system.
[0048] In an aspect, said resources utilization is selected from any or combination of a CPU utilization, a memory utilization, a number of operations in flight, a NVRAM/NVDIMM utilization, a network utilization, a cache utilization, a drive utilization, and a node storage capacity/utilization.
[0049] In an aspect, the method further determines independent probabilities of said resources utilization of said data storage system to derive a probability of compressibility in-line value for automatically selecting compressing in-line or post processing said one or more blocks of data.
[0050] In an aspect, the method further updates a compression flag indicating selection of said compressing in-line for storage of said one or more blocks of data or updates a post process flag indicating selection of said post processing for storage of said one or more blocks of data.
[0051] An aspect of the present disclosure relates to a method of evaluating at least on a resources utilization of a data storage system to determine if a write received by said data storage system should be compressed in-line or post-process in order to maintain a consistent performance with in-line compression and without in-line compression, said resources utilization is selected from any or combination of a CPU utilization, a memory utilization, a number of operations in flight, a NVRAM/NVDIMM utilization, a network utilization, a cache utilization, a drive utilization, and a node storage capacity/utilization.
[0052] An aspect of the present disclosure relates to a system for managing data storage in a data storage system. The system includes a storage processor and a memory. The system is configured to determine, by said storage processor, receipt of one or more blocks of data for storage; identify, by said storage processor, a compression technique for storage of said one or more blocks of data; and compress in-line or post process, by said storage processor, if said compression technique is an in-line compression technique for writing the data in the memory, said one or more blocks of data based at least on a resources utilization of said data storage system to dynamically scale performance to match the available resources of said data storage system.
[0053] An aspect of the present disclosure relates to a data storage system comprising data storage, a storage processor and a memory. The data storage system is configured to determine, by said storage processor, receipt of one or more blocks of data for storage; identify, by said storage processor, a compression technique for storage of said one or more blocks of data; and compress in-line or post process, by said storage processor, if said compression technique is an in-line compression technique for writing the data in the memory, said one or more blocks of data based at least on a resources utilization of said data storage system.
[0054]
[0055] In an embodiment, the inputs can be pre-defined/pre-configured; however, in an implementation, their values are dynamically given different weights.
[0056] In an embodiment as shown in
[0057] In an embodiment, the method, according to the present invention, calculates the independent probabilities, chooses the bottleneck probability, and then given that probability, it (probabilistically) determines whether the data should be compressed in-line or in a post process.
[0058] The said resources utilization is selected from any or combination of a CPU utilization, a memory utilization, a number of operations in flight, a NVRAM/NVDIMM utilization, a network utilization, a cache utilization, a drive utilization, and a node storage capacity/utilization.
[0059] In an exemplary implementation, said resources can be pre-defined, but their values are dynamically given different weights.
[0060] For example, if a system had 20% CPU used and 90% memory used, the method would determine that memory is the bottleneck, and the probability of compressing the operation in-line rather than in a post process would be 10%; i.e., 10% of the operations would be compressed in-line while 90% of the operations would be post processed for compression.
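The example above can be sketched in a few lines of Python. This is an illustrative sketch only, not the disclosed implementation: it assumes that each resource's probability is simply one minus its fractional utilization (an assumption that happens to reproduce the 10% figure), and the names `inline_probability` and `choose_inline` are hypothetical.

```python
import random

def inline_probability(utilization):
    """Return (bottleneck resource, in-line probability).

    Assumption: each resource's probability is 1 - utilization, and the
    bottleneck is the resource with the lowest probability.
    """
    probabilities = {name: 1.0 - used for name, used in utilization.items()}
    bottleneck = min(probabilities, key=probabilities.get)
    return bottleneck, probabilities[bottleneck]

def choose_inline(probability, rng=random):
    """Decide one write: True means compress in-line, False means post process."""
    return rng.random() < probability

# 20% CPU used and 90% memory used, as in the example above.
bottleneck, p = inline_probability({"cpu": 0.20, "memory": 0.90})

rng = random.Random(0)  # fixed seed so the run is repeatable
decisions = [choose_inline(p, rng) for _ in range(10_000)]
inline_fraction = sum(decisions) / len(decisions)
print(bottleneck, round(p, 2), inline_fraction)
```

Because the decision is drawn per write, the fraction of writes compressed in-line only approaches 10% over many operations; any single write may go either way.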
[0061] In an exemplary embodiment, said bottleneck probabilities are multiplied for the said resources, thereby factoring in their criticality with a bias towards not compressing in-line/in-band and deferring to out-of-band compression.
[0062] Further, for each of the said resources, the present system divides their utilization into at least three ranges such as a low utilization range, a middle utilization range, and a high utilization range. If the resource usage is in the low utilization range, the present system considers the probability (for that resource) to be 100%. If the resource usage is in the high utilization range, the present system considers the probability (for that resource) to be 0%. If the resource usage is in the middle utilization range, the present system scales the probability linearly (negative slope) so that it matches the two end-points (where the middle utilization range meets the low/high utilization ranges). This per-resource probability is combined with the per-resource probabilities for all the relevant resources to determine the overall probability for the specific write. Given this combined probability, the present system probabilistically decides whether to perform in-band compression or not.
[0063] In an embodiment, the method, according to the present invention, evaluates a number of internal conditions (associated with the resources of the system) to determine if a write received by the storage system should be compressed in-line or post-process in order to maintain consistent performance with in-line compression and without in-line compression, giving data efficiency at seemingly zero performance cost regardless of the hardware.
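The three-range mapping and the combination step described above might be sketched as follows. This is a minimal sketch under stated assumptions: the range boundaries `LOW_END` and `HIGH_START` are hypothetical values (the disclosure gives no numbers), the function names are illustrative, and multiplication is used as one possible way of combining the per-resource probabilities.

```python
import random

# Hypothetical range boundaries; the disclosure does not give numeric values.
LOW_END = 0.50     # utilization at or below this is the "low" range
HIGH_START = 0.90  # utilization at or above this is the "high" range

def resource_probability(utilization, low_end=LOW_END, high_start=HIGH_START):
    """Map one resource's utilization (0.0-1.0) to an in-line probability."""
    if utilization <= low_end:
        return 1.0
    if utilization >= high_start:
        return 0.0
    # Middle range: straight line from 1.0 at low_end down to 0.0 at high_start.
    return (high_start - utilization) / (high_start - low_end)

def combined_probability(utilizations):
    """Combine per-resource probabilities by multiplication (one possible choice)."""
    combined = 1.0
    for used in utilizations.values():
        combined *= resource_probability(used)
    return combined

def compress_inline(utilizations, rng=random):
    """Probabilistic per-write decision: True means in-band compression."""
    return rng.random() < combined_probability(utilizations)
```

With these assumed boundaries, a resource at 70% utilization maps to a probability of 0.5. Multiplying the per-resource values biases the result toward deferring compression, since any single resource in the high range drives the combined probability to zero.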
[0064] In an embodiment, given any software or hardware system that is capable of running compression software, the present invention dynamically scales performance of the system to match the available resources.
[0065] It would be appreciated that the present invention is applicable to any hardware or software instance that can use, or is capable of running, compression software.
[0066] In an embodiment, the system according to the present invention also includes a flag determining whether the data should be compressed in-line or whether the compression should be deferred to a post-process. In an exemplary implementation, a compression flag would be updated indicating selection of said compressing in-line for storage of said one or more blocks of data. In another implementation, a post process flag would be updated indicating selection of said post processing for storage of said one or more blocks of data.
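One way to picture the flag update is the sketch below. It is hypothetical throughout: the `BlockMetadata` record, its field names, and `record_selection` are illustrative only, and it assumes the two flags are mutually exclusive for a given write.

```python
from dataclasses import dataclass
from enum import Enum

class CompressionMode(Enum):
    INLINE = "inline"
    POST_PROCESS = "post_process"

@dataclass
class BlockMetadata:
    """Hypothetical per-block record; field names are illustrative."""
    block_id: int
    compression_flag: bool = False   # set when in-line compression is selected
    post_process_flag: bool = False  # set when compression is deferred

def record_selection(meta, mode):
    """Update the two flags so that only the selected mode is marked."""
    meta.compression_flag = mode is CompressionMode.INLINE
    meta.post_process_flag = mode is CompressionMode.POST_PROCESS
    return meta
```

A later post-process pass could then scan for records whose `post_process_flag` is set and compress those blocks out of band.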
[0067] In an embodiment, by determining compressibility in-line vs. post process, the system and/or the method according to the present invention is able to scale performance and data efficiency based on the available resources in the system.
[0068] In an embodiment, the system and/or the method according to the present invention dynamically determines per operation whether to compress data in-line or post process.
[0069] As used herein, and unless the context dictates otherwise, the term "coupled to" is intended to include both direct coupling (in which two elements that are coupled to each other contact each other) and indirect coupling (in which at least one additional element is located between the two elements). Therefore, the terms "coupled to" and "coupled with" are used synonymously. Within the context of this document, the terms "coupled to" and "coupled with" are also used euphemistically to mean "communicatively coupled with" over a network, where two or more devices are able to exchange data with each other over the network, possibly via one or more intermediary devices.
[0070] It should be apparent to those skilled in the art that many more modifications besides those already described are possible without departing from the inventive concepts herein. The inventive subject matter, therefore, is not to be restricted except in the spirit of the appended claims. Moreover, in interpreting both the specification and the claims, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms comprises and comprising should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced.
[0071] The term "and/or" means that "and" applies to some embodiments and "or" applies to some embodiments. Thus, "A, B, and/or C" can be replaced with "A, B, and C" written in one sentence and "A, B, or C" written in another sentence. "A, B, and/or C" means that some embodiments can include A and B, some embodiments can include A and C, some embodiments can include B and C, some embodiments can include only A, some embodiments can include only B, some embodiments can include only C, and some embodiments can include A, B, and C. The term "and/or" is used to avoid unnecessary redundancy.
[0072] Where the specification or claims refer to at least one of something selected from the group consisting of A, B, C . . . and N, the text should be interpreted as requiring only one element from the group, not A plus N, or B plus N, etc. The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the appended claims.
[0073] While embodiments of the present disclosure have been illustrated and described, it will be clear that the disclosure is not limited to these embodiments only. Numerous modifications, changes, variations, substitutions, and equivalents will be apparent to those skilled in the art, without departing from the spirit and scope of the disclosure, as described in the claims.