Positional indexing for a tiered data storage system
10409516 ยท 2019-09-10
Assignee
Inventors
- Peter Y. Kushner (Fiskdale, MA, US)
- Jonathan I. Krasner (Coventry, RI, US)
- Chakib Ouarraoui (Watertown, MA, US)
Cpc classification
G06F3/0604
PHYSICS
G06F7/06
PHYSICS
G06F3/0605
PHYSICS
G06F7/22
PHYSICS
G06F7/10
PHYSICS
G06F3/0685
PHYSICS
G06F7/20
PHYSICS
G06F7/32
PHYSICS
G06F7/08
PHYSICS
G06F7/14
PHYSICS
H04L67/1097
ELECTRICITY
G06F7/24
PHYSICS
International classification
G06F7/26
PHYSICS
G06F7/24
PHYSICS
G06F7/22
PHYSICS
G06F7/08
PHYSICS
Abstract
The system and methods disclosed herein relate to an improvement in automated data tiering technology. The systems and methods disclosed herein enhance database storage performance characteristics in myriad ways. First, the speed with which data can be relocated from one tier to another in a tiered data storage system is increased by reducing the number of sort cycles required to perform data relocation. In addition, data relocation among the tiers is performed on the backend by an offload engine, which results in uninterrupted access to read/write commands within the data storage system from a user's perspective on the frontend. Third, users are able to adjust the percentages of hot or cold data that are relocated within the database without having to alter the service level agreements. In this way, users can make spontaneous changes to performance characteristics related to the promotion and demotion of data stored within a tiered data storage system.
Claims
1. A method for sorting data in a data storage system comprising a storage subsystem, wherein the storage subsystem comprises a processing unit, a storage subsystem memory, and a plurality of storage tiers having differing performance characteristics, and an offload engine, wherein the offload engine comprises a processing unit and an offload memory, the method comprising: a. receiving a first metric associated with a desired percentage of data stored in a sorted array to be categorized as hot data; b. receiving a second metric associated with a desired percentage of data stored in the sorted array to be categorized as cold data; c. copying the sorted array from the storage subsystem to the offload engine; d. separating the sorted array into first and second partitioned storage sectors using a single write operation, wherein the first partitioned storage sector contains hot data sorted in descending order, and the second partitioned storage sector contains cold data sorted in ascending order; and e. writing the sorted array into the storage subsystem memory.
2. The method of claim 1 wherein the first metric and the second metric are received in the offload engine.
3. The method of claim 1 wherein the first metric and the second metric are received in the storage subsystem.
4. The method of claim 1 wherein the storage subsystem memory further comprises a database.
5. The method of claim 1 wherein the offload engine is external to data storage system.
6. A system comprising: a. a storage subsystem including a processing unit, a storage subsystem memory, and a plurality of storage tiers having different performance characteristics; b. an offload engine including a processing unit and an offload memory communicatively coupled to the storage subsystem; c. computer-executable logic, wherein the computer-executable logic is configured for the execution of: i. receiving a first metric associated with a desired percentage of data stored in the offload engine to be categorized as hot data; ii. receiving a second metric associated with a desired percentage of data stored in the offload engine to be categorized as cold data; iii. copying a sorted array from the storage subsystem to the offload engine; iv. separating the sorted array into first and second partitioned storage sectors using a single write operation, wherein the first partitioned storage sector contains a plurality of hot data sorted in descending order, and the second partitioned storage sector contains a plurality of cold data sorted in ascending order; and v. writing the sorted array into the storage subsystem memory in the same order in which it was copied from the storage subsystem.
7. The system of claim 6 wherein the first metric and the second metric are received in the offload engine.
8. The system of claim 6 wherein the first metric and the second metric are received in the storage subsystem.
9. The system of claim 6 wherein the storage subsystem memory further comprises a database.
10. The system of claim 6 wherein the offload engine is external to the data storage system.
11. The system of claim 6 wherein the computer executable logic is operating on the offload engine.
12. A computer program product for sorting data in a data storage system comprising a storage subsystem, wherein the storage subsystem comprises a processing unit, a storage subsystem memory, and a plurality of storage tiers having differing performance characteristics, and an offload engine, wherein the offload engine comprises a processing unit and an offload memory, the computer program product comprising a non-transitory computer readable medium encoded with computer-executable program code the code configured to enable the execution of: a. receiving a first metric associated with a desired percentage of data stored in a sorted array to be categorized as hot data; b. receiving a second metric associated with a desired percentage of data stored in the sorted array to be categorized as cold data; c. copying the sorted array from the storage subsystem to the offload engine; d. separating the sorted array into first and second partitioned storage sectors using a single write operation, wherein the first partitioned storage sector contains hot data sorted in descending order, and the second partitioned storage sector contains cold data sorted in ascending order; and e. writing the sorted array into the storage subsystem memory a database located within the physical storage pool.
13. The computer program product of claim 12 wherein the first metric and the second metric are received in the offload engine.
14. The computer program product of claim 12 wherein the first metric and the second metric are received in the storage subsystem.
15. The computer program product of claim 12 wherein the storage subsystem memory further comprises a database.
16. The computer program product of claim 12 wherein the offload engine is external to the data storage system.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
DETAILED DESCRIPTION
(6)
(7) Host 110, in some embodiments, is a computer having one or more Central Processing Units 120 and local memory 112. An operating system 127 executes on CPU 120, which enables CPU 120 to execute one or more instances of storage system management application 125. The storage system management application 125 has a set of storage system Application Program Interfaces (APIs) 126 that enables the storage system management application 125 to control execution of applications 145 running within a processing environment of the storage system 130.
(8) The storage system management application 125 has a Command Line Interface 124 that enables users to manually input data to the storage system management application 125. By setting information into the storage system management application 125, the user can control how the storage system management application 125 adjusts operation of the applications 145 running on the storage system 130. Through the use of the CLI 124, a user can set various parameters, such as the relative priority of the various applications 145, the Quality of Service (QoS) each application should receive, IO limits for the applications 145, and other similar types of parameters. The CLI 124 thus allows the management application user to control execution of the applications 145 running on the storage system 130 to ensure that the applications 145 running on storage system 130 are executing to provide access to data in a manner consistent with the user's objectives. In some embodiments, the CLI 124 and API 126 are consolidated into a single software layer. One example CLI 124 is available from Dell EMC and referred to as Solutions Enabler. Other command line interfaces may likewise be created and used to interact with and control execution of the storage system management application 125.
(9) In some embodiments, a Graphical User Interface (GUI) 122 is built on top of the API 126. GUI 122 consolidates operations to facilitate control of applications 145 using a more user-friendly interface than the CLI 124. One example GUI 122 is available from Dell EMC and referred to as Unisphere. Other graphical user interfaces 122 may likewise be created and used to interact with and control execution of the storage system management application 125. In some embodiments, GUI 122 facilitates interaction between the user and storage system management application 125, for example to enable the user to make provisioning decisions, generate reports, and ensure the health of all systems in the storage environment 100. In some implementations, GUI 122 allows the user to remotely create storage resources on the storage system 130, configure and schedule data protection protocols managed by storage system 130, and manage and monitor storage operations of storage system 130.
(10) Although GUI 122 is shown in
(11) In some embodiments, the storage system management software 125 is configured to enable the user to perform service level provisioning to specify the quality of service provided by the applications 145 running on storage system 130. Quality of service may be set on a per application per channel basis, per application for a group of channels, for a Storage Resource Pool (SRP), for a set of SRPs, or for some other measurable basis. As used herein, the term Quality of Service (QoS) will be used to refer to performance parameters of the Storage System 130. Example performance parameters that may be specified as QoS may include response time and other measurable performance related parameters. When a user specifies service level parameters for an application 145, the storage system management application 125 adjusts the priority, IO limits, and other performance aspects of the impacted application 145, and optionally other applications 145 also running on the storage system 130, to enable the storage system 130 to meet the provisioned service level.
(12) As shown in
(13) Storage resources 134 may be implemented using a number of different storage technologies. In some implementations storage resources 134 are configured as tiered storage, as depicted in
(14) Applications 145 execute within storage system 130 to access and operate on storage resources 134. By way of example and without limitation, one application 145 is a Fully Automated Storage Tiering (FAST) application 142, configured to move data between tiers of storage resources 134 within storage system 130. Different groups of storage resources (storage resource pools) may be separately managed by individual FAST applications. Hence a given storage system 130 may have multiple instances of FAST application 142 executing therein at the same time.
(15) Another example application is a Remote Data Forwarding (RDF) application 143, which causes subsets of data stored within storage resources 134 to be mirrored to one or more similar remote storage systems (not shown). The RDF application 143 also allows data stored by other remote storage systems (not shown) to be mirrored to the storage system 130 and stored within the storage resources 134. A given storage system 130 may operate as a primary node or secondary node in many mirroring pairs, and hence multiple RDF applications 143 may simultaneously execute on storage system 130 to control participation of storage system 130 in the mirroring operations.
(16) Another example application that may be executed by CPU 140 may be a snapshot application (SNAP) 144 configured to provide point-in-time data copying. A snapshot is a copy of data as that data existed at a particular point in time. Multiple SNAP applications 144 may be executing on a given storage system 130. Other applications 145 may likewise be executing on storage system 130. The particular set of applications 145 will depend on the particular implementation.
(17)
(18) As noted above, FAST application 142 operates to move data within storage resources 134, such as between storage tiers 134A, 134B, and 134C to optimize access to data stored within storage resources 134. When FAST application 142 is run, it utilizes CPU 140 to determine and flag data segments as candidates for relocation. In addition, CPU 140 performs the necessary processes to effectuate promotion/demotion among the tiers 134A, 134B, and 134C.
(19) Service level provisioning in storage system 130 ensures that storage 130 is provisioned with a set expectation for performance, for example system response time. FAST 142 and other aspects of the Operating System 148 coordinate to achieve QoS targets as established within the service level agreement(s). An additional means of achieving QoS, as discussed previously, is to run automated tiering software on the backend. This, however, has the drawback of consuming CPU 140 resources, which in turn reduces the amount of IO operations the storage system 130 is able to process.
(20) In some embodiments, storage administrators may want to limit the amount of IO (IOPS) or bandwidth (MB/s) a particular application 145 can drive on the storage system. Accordingly, storage system management application 130 enables the user to specify performance targets for the applications 145 as well as to define limits to enforce service level agreements and make application performance more predictable. Host IO limits features allows the user to limit frontend port performance by either IOPS, Host MB per host, or both. The limits may be set for particular storage resource pools, on a per channel basis, or on another basis. Once set by the administrator, the bandwidth and IOs controls are then monitored by the storage system 145 to attempt to meet the defined service level agreements while also ensuring that the applications 145 do not exceed the specified maximum bandwidth or maximum IOPS.
(21) The storage system management application 125 enables the user to set priorities for the applications 145 to realize the overall service level agreements for the applications 145 under control. The storage system management application 125 allows the user to monitor operation of the storage system 130 to determine whether it is achieving the QoS requirements specified in the service level agreement. If the QoS targets are not being met, the user can use the GUI 122 or the CLI 124 to adjust priorities of the applications 145 to better enable the storage system 130 to operate in accordance with its service level agreement. For example, the user may prioritize FAST 142 over other applications 145. Different priorities may be set for different channels, such that each iteration of the applications is adjusted to allow the QoS to be met within the channel.
(22)
(23) As previously stated, storage tiering is the movement of data to different types of storage areas, e.g., disks or cloud-based systems, in accordance with network performance and capacity requirements. In a typical storage environment 100, 95% of application data has little IO activity. Tiering takes this fact into account by allowing the majority of data to reside in slower, cheaper storage areas, while simultaneously ensuring that the most active data are stored on the fastest drives, e.g., Fibre Channel, or flash solid state drives (SSD).
(24) In an exemplary storage system 130, Tier 1 134A could be an extreme performance tier. The extreme performance drives 170A could be comprised of flash technology SSDs, which contain no moving parts. Flash drives have a higher per-GB storage cost, but a lower input/output (10) cost compared with spinning drives. In this embodiment, Tier 1 134 would contain hot data, i.e., data that typically requires fast response time and/or high IO per second.
(25) In this exemplary storage system 130, Tier 2 134B could be designated as the performance tier. The performance tier 134B could be used to store warm data. In Tier 2 134B, it is possible to achieve a combination of performance and capacity. In an embodiment, Tier 2 134B could be a cloud-based storage system. The performance tier 134B offers high, all-around performance with consistent response times, high throughput, and good bandwidth at a mid-level price point.
(26) In this exemplary storage system 130, Tier 3 134C could be used to store cold data. Tier 3 134C could be referred to as a capacity tier. The capacity tier 134C is often used to decrease the cost per GB of data storage. In one embodiment, the capacity tier drives 170C could be RPM Near-Line SAS (NL-SAS) drives. Although NL-SAS drives have a slower rotational speed than the SAS drives used in the exemplary performance tier 134B, the NL-SAS drives significantly reduce energy use and free up capacity in the more expensive, higher performance tiers 134A and 134B. In an alternate embodiment, Tier 3 134C could be a cloud-based storage system.
(27) Data relocations can be done on a per-storage-extent basis, i.e., at the granularity of an individual storage extent. As is known, a storage extent is an increment of contiguous storage, such as a block or a slice. A block is the smallest unit of storage that may be allocated to a data object and may be, for example, 8 KB in size, although block sizes can vary considerably. A slice is the smallest unit of storage that may be provisioned to a data object. Typical slice sizes are 256 MB or 1 GB. As used herein, the term storage extent or extent is intended to cover not only units, like blocks or slices, but also larger structures built from such units, such as LUNs, storage pools, arrays, vectors, and even an entire data set within a database. It should be understood that the definition of storage extent is intended to be flexible. For example, a storage extent may be a physical extent or a logical extent. Also, the particular examples of storage extents provided herein are intended to be merely illustrative.
(28) In terms of user control over automated tiering, users can define a tiering policy. The tiering policy specifies the tier 134 within which new data will be placed as well as how data will be relocated to an alternate tier 134 during scheduled and manually invoked relocation periods. By way of example, a tiering policy could utilize the highest available tier. In a highest available tiering policy, data could be placed in the highest performing tier until that tier was full, at which point, data would be placed in the next highest performance tier having capacity.
(29) An alternative tiering policies is a lowest available tier policy. In a lowest tiering policy, data are placed in the lowest performing tier 134 until that tier is full, at which point, data is placed in the next lowest available tier. A user could define a no data movement tiering policy as well. In this type of tiering policy, data are moved within a tier 134, but not from one tier 134 to another.
(30) Of particular relevance for the present invention is an automatic tiering policy. Auto-tiering policies automatically relocate data to the most appropriate level based on the activity level of each storage extent. Data are relocated based upon the highest available tier 134 and the storage extent's temperature.
(31) As can be seen in
(32)
(33) The storage subsystem 430 is further comprised of a central processing unit 440, Applications 445, which could be FAST, 442, RDF 443 and SNAP 444. Similarly, the CPU also includes an operating system 448. The central processing unit 440 could also be a general processing unit, a purpose-built processor, an ASIC, and the like in alternate embodiments.
(34) The storage subsystem 430 also includes locations for data storage, i.e., storage resources 434, a storage subsystem memory 432, and optionally a database 470. As will be discussed below, storage resources 434 are a tiered data storage system. One of the purposes of storage subsystem memory 432 is for capturing data hierarchies within storage resources 434 to facilitate re-tiering of data as a back-end function. In this way, users experience uninterrupted access to data stored within storage resources 434 while data storage system 400 is performing re-tiering evaluation and relocation. Toward that end, database 470 can be used for redundancy or to increase speed of backend re-tiering calculations and relocation by adding memory capacity to storage subsystem 430.
(35) Similarly, offload engine 460 could be a central processing unit, a general processing unit, a purpose-built processor, an ASIC, or the like. Offload engine 460 further comprises offload memory 433. Although offload engine is pictured as being internal to storage system 400, in alternate embodiments, offload engine 460 could be external to storage system 400. In these embodiments, offload engine 460 and storage system 400 would be communicatively coupled.
(36) Although
(37) Irrespective of the number of tiers contained within the storage resources 434, the performance characteristics typically associated with the differing tiers or slices within tiers could be defined in a service level agreement. In some embodiments, and as stated previously, users can alter performance characteristics through GUI 122 or CLI 124. In addition, in embodiments herein users can gain some level of control over how much hot data or cold data are relocated within the storage system 400 during data relocation or re-tiering cycles.
(38)
(39) From the storage system's 430 perspective, the first metric could be received 510 at the host 110 and transmitted to the storage system 430 or the offload engine 460 via communication pathways 114 and 414. Similarly, the second metric could be received 512 at the host 110 and transmitted to the storage system 430 or the offload engine 460 via communication channels 114 or 414. In typical prior art re-tiering operations, it is necessary for the storage system 430 to perform two operations to construct the arrays that will ultimately contain the hot data to be relocated and cold data, which will also be relocated. Typically, hot data, which are to be relocated, are stored in an array in descending order, with the hottest data entry being located in the first row of the array. In contrast, when the storage system 430 constructs the array containing cold data that will be relocated, the array is typically constructed having the coldest data at the top, or in ascending order, so the extents to be demoted appear at the top of the list.
(40) In order to improve system performance, embodiments herein reduce the computation cycles required to construct the hot and cold relocation arrays to a single sort operation. Referring again to
(41) The sorted array 480 could contain all data entries designated for relocation in descending order, that is, hottest to coldest. Specifically, A could be the hottest data, while A.sub.size is the coldest data. In these embodiments, the sorted array 480 is copied to the offload engine 460 because users may still need access to data that has been flagged for relocation during the time period within which these data are being relocated.
(42) In embodiments, the first partitioned sector 482 could be for hot data designated for relocation. In this embodiment, the hot data could be sorted in descending order, with A.sub.1 in the first row of the first partitioned sector 482; and the cold data in the second portioned sector 484 could be in ascending order with A.sub.size being located in the first row.
(43) The sorted array 480 A is a very large array that has a set size of A.sub.size. Within this array 480 the top A.sub.n elements, where A.sub.n is less than or equal to A.sub.size, contain the real sorted data and equate to the number of rows in the database 470. Given the current number of elements of A.sub.n for the sorted array 480 A, a hot percentage, represented by a first metric, of % h and a cold percentage, represented by a second metric, of % c, hot (H) array 482 and cold (C) array 484 are created of size H.sub.size=(A.sub.n*% h)/100 and C.sub.size=(A.sub.n*% c)/100 respectively. The index, C.sub.start, of the start of the cold data in the sorted array 480 A is calculated as A.sub.nC.sub.size To reduce latency, the following algorithm is performed once the data has been sorted in ascending order by score on the offload engine 460, which could be a GPU:
(44) Copy the sorted array 480 A.sub.G from GPU device, also called offload engine, memory 433 to the similarly sized array A in storage subsystem memory 432 in the storage subsystem 430. For each array element at index i in the sorted array 480 in offload memory 433:
(45) If index i is less than H.sub.size
(46) Copy the i.sup.th element of sorted array 480 A to the i.sup.th element of the hot data array 482 H
(47) Update storage subsystem memory 432 or database 470 or both with the information from the i.sup.th element of sorted array 480 A
(48) If index i is greater or equal to C.sub.start
(49) Calculate the index C.sub.index into the cold array 484 C:
C.sub.index=(C.sub.start1)(iC.sub.start)
(50) Copy the i.sup.th element of sorted array 480A to the C.sub.index.sup.th element of the cold data array 484 C
(51) Update storage subsystem memory 432 or database 470 or both with the information from the i.sup.th element of sorted array 480 A
(52) If index i is greater than or equal to H.sub.size or less than CI, update storage subsystem memory 432 or database 470 or both with the information from the i.sup.th element of sorted array 480 A
(53) Upon completion of the above algorithm, along with storage subsystem memory 432 or database 470 or both being updated, there are now three sorted arrays:
(54) Array A 480 (with size A.sub.size) sorted in descending order
(55) Array H 482 (with size H.sub.size) sorted in descending order
(56) Array C 484 (with size C.sub.size) sorted in ascending order
(57) However, only one sort was actually performed. The other two sorts were performed only by C.sub.size calculations and H.sub.size+C.sub.size comparisons and memory transfers.
(58) Accordingly, embodiments herein improve data storage system performance by increasing speed and reducing computational overhead. In addition, embodiments herein allow users to customize the amount of resources devoted to data movement among the tiers 134. For illustrative purposes, it may be the case that a user wants to ensure that tier locations 134 for hot data are frequently updated. In that situation, the user would increase the value for the first metric. Similarly, if the user wanted to ensure that more cold data were demoted during a relocation cycle, the user could increase the value of the second metric. Those of skill in the art will recognize that the choice of a first metric being associated with hot data and the second metric being associated with cold data is arbitrary and could be swapped.
(59) The methods described herein may be implemented as software configured to execute in control logic such as contained in a Central Processing Unit (CPU) or Graphics Processing Unit (GPU) of an electronic device such as a computer. In particular, the functions described herein may be implemented as sets of program instructions stored on a non-transitory tangible computer readable storage medium. The program instructions may be implemented utilizing programming techniques known to those of ordinary skill in the art. Program instructions may be stored in a computer readable memory within the computer or loaded onto the computer and executed on computer's microprocessor. However, it will be apparent to a skilled artisan that all logic described herein can be embodied using discrete components, integrated circuitry, programmable logic used in conjunction with a programmable logic device such as a Field Programmable Gate Array (FPGA) or microprocessor, or any other device including any combination thereof. Programmable logic can be fixed temporarily or permanently in a tangible computer readable medium such as random-access memory, a computer memory, a disk, or other storage medium. All such embodiments are intended to fall within the scope of the present invention.
(60) The methods described herein improve the operation of storage system 130, in that the methods change how the storage system 130 adjusts execution of applications 145 to cause execution of the applications 145 to meet specified service level objectives. In particular, using weighting and an off-load engine, the method described herein enable the storage system 130 to reliably meet specified quality of service settings.
(61) Throughout the entirety of the present disclosure, use of the articles a or an to modify a noun may be understood to be used for convenience and to include one, or more than one of the modified noun, unless otherwise specifically stated.
(62) Elements, components, modules, and/or parts thereof that are described and/or otherwise portrayed through the figures to communicate with, be associated with, and/or be based on, something else, may be understood to so communicate, be associated with, and or be based on in a direct and/or indirect manner, unless otherwise stipulated herein.
(63) The following reference numbers are used in the drawings: 100 storage environment 110 host 112 host local memory 114 communication channel 120 host CPU 122 host graphical user interface 124 host command line interface 125 storage system management application 126 application program interface 127 host operating system 130 storage system 132 storage system local memory 134 storage system resources or tiered storage 140 storage system central processing unit 142 storage system applicationFully Automated Storage Tiering (FAST) 143 storage system applicationRemote Data Forwarding (RDF) 144 storage system applicationSnapshot Application (SNAP) 145 storage system applications 148 storage system operating system 160 graphics processing unit 170 memory storage drive 310 hot data 320 warm data 330 cold data 400 storage system 414 communication channel 430 storage subsystem 432 storage subsystem memory 433 offload engine memory 434 storage resources 440 central processing unit 442 FAST 443 RDF 444 SNAP 445 applications 448 operating system 460 offload engine 470 database in storage subsystem 480 sorted array 482 first partitioned array 484 second partitioned array 510 receive first metric 512 receive second metric 514 copy sorted array 516 separate sorted array into partitioned storage sectors 518 write sorted array into subsystem memory or database or both
(64) Various changes and modifications of the embodiments shown in the drawings and described in the specification may be made within the spirit and scope of the present invention. Accordingly, it is intended that all matter contained in the above description and shown in the accompanying drawings be interpreted in an illustrative and not in a limiting sense. The invention is limited only as defined in the following claims and the equivalents thereto.