Dynamic resource allocation for distributed cluster-storage network
09792059 · 2017-10-17
Assignee
Inventors
- Carlos F. Fuente (Winchester, GB)
- John E. Lindley (San Jose, CA, US)
- William J. Scales (Winchester, GB)
Cpc classification
G06F3/0619
PHYSICS
International classification
G06F3/00
PHYSICS
G06F12/00
PHYSICS
G01R31/08
PHYSICS
H04L12/28
ELECTRICITY
G06F13/28
PHYSICS
Abstract
An apparatus, method and computer program in a distributed cluster storage network comprises storage control nodes to write data to storage on request from a host; a forwarding layer at a first node to forward data to a second node; a buffer controller at each node to allocate buffers for data to be written; and a communication link between the buffer controller and the forwarding layer at each node to communicate a constrained or unconstrained status indicator of the buffer resource to the forwarding layer. A mode selector selects a constrained mode of operation requiring allocation of buffer resource at the second node and communication of the allocation before the first node can allocate buffers and forward data, or an unconstrained mode of operation granting use of a predetermined resource credit provided by the second to the first node and permitting forwarding of a write request with data.
Claims
1. A method of operating a distributed cluster storage network having a host computer system and a storage subsystem, by a processor device, comprising: receiving at a first of said plurality of storage control nodes a request to write data to storage from said host computer system; forwarding said data by a forwarding layer at said first of a plurality of storage control nodes to a second of said plurality of storage control nodes; allocating buffer resource for data to be written to said storage by a buffer control component at each of said plurality of storage control nodes; and communicating a constrained and unconstrained status indicator of said buffer resource to said forwarding layer, the status indicator communicating an indication of when a particular set of input/output (I/O) operations are being processed expeditiously and when the particular set of I/O operations are under delay.
2. The method according to claim 1, further comprising, responsive to receiving said constrained status indicator at said forwarding layer, selecting a constrained mode of operation of a write, said constrained mode of operation requiring allocation of buffer resource at said second storage control node and communication of said allocation before said first storage control node becomes operable to allocate buffer resource for said data and to forward said data.
3. The method according to claim 1, further comprising, responsive to receiving said unconstrained status indicator at said forwarding layer, selecting an unconstrained mode of operation of a write, said unconstrained mode of operation granting use of a predetermined resource credit provided by said second to said first of said storage control nodes and permitting forwarding of a write request with said data from said first to said second of said storage control nodes.
4. The method according to claim 1, wherein operating said distributed cluster storage network comprises operating a storage virtualization controller.
5. An apparatus operable in a distributed cluster storage network having a host computer system and a storage subsystem having a plurality of storage control nodes each operable to write data to storage responsive to a request from said host computer system, comprising: a forwarding layer at a first of said plurality of storage control nodes operable to forward data to a second of said plurality of storage control nodes; a buffer control component at each of said plurality of storage control nodes operable to allocate buffer resource for data to be written to said storage; and a communication link between said buffer control component and said forwarding layer at each of said plurality of storage control nodes operable to communicate a constrained and unconstrained status indicator of said buffer resource to said forwarding layer, the status indicator communicating an indication of when a particular set of input/output (I/O) operations are being processed expeditiously and when the particular set of I/O operations are under delay.
6. The apparatus according to claim 5, further comprising a mode selector component responsive to receiving said constrained status indicator at said forwarding layer for selecting a constrained mode of operation of a write, said constrained mode of operation requiring allocation of buffer resource at said second storage control node and communication of said allocation before said first storage control node becomes operable to allocate buffer resource for said data and to forward said data.
7. The apparatus according to claim 6, said mode selector component responsive to receiving said unconstrained status indicator at said forwarding layer for selecting an unconstrained mode of operation of a write, said unconstrained mode of operation granting use of a predetermined resource credit provided by said second to said first of said storage control nodes and permitting forwarding of a write request with said data from said first to said second of said storage control nodes.
8. The apparatus according to claim 5, wherein said distributed cluster storage network comprises a storage virtualization controller.
9. A computer program product for operating a distributed cluster storage network in a computing environment by a processor device, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising: an executable portion that receives at a first of said plurality of storage control nodes a request to write data to storage from said host computer system; an executable portion that forwards said data by a forwarding layer at said first of a plurality of storage control nodes to a second of said plurality of storage control nodes; an executable portion that allocates buffer resource for data to be written to said storage by a buffer control component at each of said plurality of storage control nodes; and an executable portion that communicates a constrained and unconstrained status indicator of said buffer resource to said forwarding layer, the status indicator communicating an indication of when a particular set of input/output (I/O) operations are being processed expeditiously and when the particular set of I/O operations are under delay.
10. The computer program product according to claim 9, further comprising an executable portion that, responsive to receiving said constrained status indicator at said forwarding layer, selects a constrained mode of operation of a write, said constrained mode of operation requiring allocation of buffer resource at said second storage control node and communication of said allocation before said first storage control node becomes operable to allocate buffer resource for said data and to forward said data.
11. The computer program product according to claim 9, further comprising an executable portion that, responsive to receiving said unconstrained status indicator at said forwarding layer, selects an unconstrained mode of operation of a write, said unconstrained mode of operation granting use of a predetermined resource credit provided by said second to said first of said storage control nodes and permitting forwarding of a write request with said data from said first to said second of said storage control nodes.
12. The computer program product of claim 9, wherein operating said distributed cluster storage network comprises operating a storage virtualization controller.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) A preferred embodiment of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
(2)
(3)
(4)
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
(5) In a preferred SVC embodiment, the buffer control component responsible for providing buffer resource from non-volatile cache maintains a status at the per-vdisk (host volume) level. This status indicates whether that vdisk is running in ‘constrained resource’ mode, in which resources are known to be depleted, or is permitted to run in ‘unconstrained resource’ mode with respect to allocating resources for new host I/O.
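The per-vdisk status described above might be tracked along the following lines. This is a minimal sketch only: the text does not say how the buffer control component decides that resources are depleted, so the `low_water_mark` threshold, the `BufferController` class and its method names are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class ResourceStatus(Enum):
    CONSTRAINED = "constrained"
    UNCONSTRAINED = "unconstrained"


@dataclass
class BufferController:
    """Keeps one resource status per vdisk (illustrative policy only)."""

    # Assumed threshold: at or below this many free buffers, a vdisk
    # is reported as constrained. The source text gives no such rule.
    low_water_mark: int
    free_buffers: Dict[str, int] = field(default_factory=dict)

    def set_free(self, vdisk: str, count: int) -> None:
        """Record how many buffers are currently free for a vdisk."""
        self.free_buffers[vdisk] = count

    def status(self, vdisk: str) -> ResourceStatus:
        """Return the status that would be communicated to the forwarding layer."""
        # Unknown vdisks are treated conservatively as constrained.
        free = self.free_buffers.get(vdisk, 0)
        if free <= self.low_water_mark:
            return ResourceStatus.CONSTRAINED
        return ResourceStatus.UNCONSTRAINED
```

Treating an unknown vdisk as constrained is a design choice of this sketch, made so that the slower but safer path is the default.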
(6) The SVC clustering infrastructure is used to communicate this status to the forwarding layer on all nodes. Within each node, the forwarding layer uses this status to decide between two completely separate paths for handling write I/O where forwarding is required. (Where the node that received the host I/O is also one of the nodes on which the cache function is able to operate, the I/O is passed to cache without any buffers being allocated by the forwarding layer at all, and the algorithm described here is not required.)
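The decision made by the forwarding layer can be sketched as a three-way choice: no forwarding at all when the receiving node hosts the cache function, otherwise one of the two forwarding paths. The names `WritePath` and `select_write_path`, and the representation of nodes as strings, are assumptions made for illustration.

```python
from enum import Enum
from typing import Set


class WritePath(Enum):
    LOCAL_CACHE = "local cache, no forwarding-layer buffers"
    CONSTRAINED_FORWARD = "constrained forwarding"
    UNCONSTRAINED_FORWARD = "unconstrained forwarding"


def select_write_path(
    receiving_node: str,
    cache_nodes: Set[str],
    vdisk_constrained: bool,
) -> WritePath:
    """Pick the write path for one host I/O (hypothetical helper)."""
    # The receiving node can run the cache function for this vdisk:
    # the I/O goes straight to cache and the algorithm is not needed.
    if receiving_node in cache_nodes:
        return WritePath.LOCAL_CACHE
    # Forwarding is required; the per-vdisk status selects the path.
    if vdisk_constrained:
        return WritePath.CONSTRAINED_FORWARD
    return WritePath.UNCONSTRAINED_FORWARD
```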
(7) Turning to
(8) In ‘constrained resource’ mode, the flow is as follows:
(9) 500. Host transmits I/O write request to first node
(10) 502. First (forwarding) node forwards request to second (forwarded-to) node which contains the cache function able to process I/O for that vdisk
(11) 504. Second node's cache layer decides to process, allocates buffer in which to receive data, and sends request for data to first node
(12) 506. First node allocates buffer, and sends request for data to host
(13) 508. Host transmits data, and data is received in first node in buffer defined at 506
(14) 510. First node is notified of completion of data transfer, and starts data transfer to second node in buffer defined at 504
(15) 512. Second node is notified of data transfer completion, and the cache layer resumes processing of write I/O request using received data
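The ordering of steps 500 to 512 can be illustrated with a small in-memory simulation. It is a sketch under stated assumptions: the `Node` class, the one-buffer-per-I/O accounting and the release of the first node's buffer after the transfer are hypothetical, and real transfers would be asynchronous rather than simple assignments.

```python
from typing import Dict, List, Optional


class Node:
    """A storage control node with a fixed pool of buffers (illustrative)."""

    def __init__(self, name: str, buffers: int) -> None:
        self.name = name
        self.free = buffers
        self.held: Dict[int, Optional[bytes]] = {}

    def allocate(self, io_id: int) -> None:
        if self.free == 0:
            raise RuntimeError(f"{self.name}: no buffer available")
        self.free -= 1
        self.held[io_id] = None

    def release(self, io_id: int) -> None:
        self.held.pop(io_id)
        self.free += 1


def constrained_write(first: Node, second: Node, io_id: int, host_data: bytes) -> List[int]:
    """Run one write through the constrained flow and return the step trace."""
    trace = [500, 502]               # request reaches first node, is forwarded
    second.allocate(io_id)           # 504: second node allocates before anyone else
    trace.append(504)
    first.allocate(io_id)            # 506: only now does the first node allocate
    trace.append(506)
    first.held[io_id] = host_data    # 508: host data lands in first node's buffer
    trace.append(508)
    second.held[io_id] = first.held[io_id]  # 510: transfer into the 504 buffer
    trace.append(510)
    first.release(io_id)             # assumption: forwarding buffer freed after transfer
    trace.append(512)                # 512: cache layer resumes with the data
    return trace
```

Because the second node allocates first, a shortage there stops the write before the first node has committed any buffer, which is the point of the constrained path.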
(16) In ‘unconstrained resource’ mode, there is an additional setup flow before I/O is processed:
(17) 600. Second (forwarded-to) node allocates some buffer resource
(18) 602. Second node transmits credits to first (forwarding) node entitling that node to transmit a defined amount of write data
(19) Then, the following write I/O flow is performed when the I/O is actually received:
(20) 604. Host transmits I/O write request to first node
(21) 606. First node allocates buffer, and sends request for data to host
(22) 608. Host transmits data, and data is received in first node in buffer defined at 606
(23) 610. First (forwarding) node forwards request with data to second (forwarded-to) node which contains the cache function able to process I/O for that vdisk
(24) 612. Second node is notified of receipt of I/O request and data, and cache layer processes I/O request using the received data.
(25) On completion of the I/O request, the freed buffer resource is used to repeat the setup cycle and provide new credit to the forwarding node for future I/O.
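The credit cycle of steps 600 to 612, together with the replenishment on completion, might look like the following sketch. Credits are counted in bytes here, which is an assumption; the text says only "a defined amount of write data". The class and function names are hypothetical, and the text does not say what happens when credit is insufficient, so the sketch simply reports failure to the caller.

```python
from typing import List


class CreditedLink:
    """First-node view of the credit granted by the second node."""

    def __init__(self) -> None:
        self.credit_bytes = 0

    def grant(self, nbytes: int) -> None:
        self.credit_bytes += nbytes

    def try_consume(self, nbytes: int) -> bool:
        if nbytes > self.credit_bytes:
            return False
        self.credit_bytes -= nbytes
        return True


class SecondNode:
    """Forwarded-to node that pre-allocates buffer and hands out credit."""

    def __init__(self, buffer_bytes: int) -> None:
        self.free_bytes = buffer_bytes
        self.received: List[bytes] = []

    def setup(self, link: CreditedLink, nbytes: int) -> None:
        # 600: allocate buffer resource; 602: transmit matching credit.
        if nbytes > self.free_bytes:
            raise RuntimeError("not enough buffer to back the credit")
        self.free_bytes -= nbytes
        link.grant(nbytes)

    def receive(self, data: bytes) -> None:
        # 612: request and data arrive together, cache layer processes them.
        self.received.append(data)

    def complete(self, link: CreditedLink, nbytes: int) -> None:
        # On completion the freed buffer repeats the setup cycle.
        self.free_bytes += nbytes
        self.setup(link, nbytes)


def unconstrained_write(link: CreditedLink, second: SecondNode, data: bytes) -> bool:
    """Steps 604 to 612; host-side buffering at the first node is elided."""
    if not link.try_consume(len(data)):
        return False  # assumption: caller chooses what to do without credit
    second.receive(data)  # 610: request forwarded together with its data
    return True
```

Credit is only ever granted against buffer the second node has already set aside, so a forwarded write that holds credit never arrives at a node with nowhere to put the data.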
(26) Credit messages are best piggy-backed on other messages that flow in the same direction, to minimise the overhead they cause. The resources used by the two flows need to be sufficiently separate to avoid deadlock arising from different paths allocating the same resources in different orders, as would be clear to one of ordinary skill in the art of distributed I/O systems.
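Piggy-backing of credit can be sketched as a sender that accumulates pending credit and attaches it to whatever message is next sent toward the forwarding node. The `Message` fields, the `CreditSender` class and the standalone-message fallback in `flush` are assumptions for illustration; the text does not define a message format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    kind: str
    credit_bytes: int = 0  # hypothetical field carrying piggy-backed credit


class CreditSender:
    """Accumulates credit and attaches it to outgoing messages."""

    def __init__(self) -> None:
        self.pending_credit = 0

    def add_credit(self, nbytes: int) -> None:
        self.pending_credit += nbytes

    def stamp(self, msg: Message) -> Message:
        """Attach all pending credit to a message already going that way."""
        msg.credit_bytes += self.pending_credit
        self.pending_credit = 0
        return msg

    def flush(self) -> Optional[Message]:
        """Fallback: send a dedicated credit message if nothing else is queued."""
        if self.pending_credit == 0:
            return None
        return self.stamp(Message(kind="credit"))
```

Several completions can be folded into one stamped message, so the number of credit-only messages stays low when there is regular traffic in that direction.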
(27) It will be clear to one of ordinary skill in the art that the preferred embodiment of the present invention is industrially applicable in providing advantageous efficiencies in the operation of distributed cluster storage networks.
(28) It will be clear to one of ordinary skill in the art that all or part of the method of the preferred embodiments of the present invention may suitably and usefully be embodied in a logic apparatus, or a plurality of logic apparatus, comprising logic elements arranged to perform the steps of the method and that such logic elements may comprise hardware components, firmware components or a combination thereof.
(29) It will be equally clear to one of skill in the art that all or part of a logic arrangement according to the preferred embodiments of the present invention may suitably be embodied in a logic apparatus comprising logic elements to perform the steps of the method, and that such logic elements may comprise components such as logic gates in, for example a programmable logic array or application-specific integrated circuit. Such a logic arrangement may further be embodied in enabling elements for temporarily or permanently establishing logic structures in such an array or circuit using, for example, a virtual hardware descriptor language, which may be stored and transmitted using fixed or transmittable carrier media.
(30) It will be appreciated that the method and arrangement described above may also suitably be carried out fully or partially in software running on one or more processors (not shown in the figures), and that the software may be provided in the form of one or more computer program elements carried on any suitable data-carrier (also not shown in the figures) such as a magnetic or optical disk or the like. Channels for the transmission of data may likewise comprise storage media of all descriptions as well as signal-carrying media, such as wired or wireless signal-carrying media.
(31) A method is generally conceived to be a self-consistent sequence of steps leading to a desired result. These steps require physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It is convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, parameters, items, elements, objects, symbols, characters, terms, numbers, or the like. It should be noted, however, that all of these terms and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities.
(32) The present invention may further suitably be embodied as a computer program product for use with a computer system. Such an implementation may comprise a series of computer-readable instructions either fixed on a tangible medium, such as a computer readable medium, for example, diskette, CD-ROM, ROM, or hard disk, or transmittable to a computer system, via a modem or other interface device, over either a tangible medium, including but not limited to optical or analogue communications lines, or intangibly using wireless techniques, including but not limited to microwave, infrared or other transmission techniques. The series of computer readable instructions embodies all or part of the functionality previously described herein.
(33) Those skilled in the art will appreciate that such computer readable instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Further, such instructions may be stored using any memory technology, present or future, including but not limited to, semiconductor, magnetic, or optical, or transmitted using any communications technology, present or future, including but not limited to optical, infrared, or microwave. It is contemplated that such a computer program product may be distributed as a removable medium with accompanying printed or electronic documentation, for example, shrink-wrapped software, pre-loaded with a computer system, for example, on a system ROM or fixed disk, or distributed from a server or electronic bulletin board over a network, for example, the Internet or World Wide Web.
(34) In one alternative, the preferred embodiment of the present invention may be realized in the form of a computer implemented method of deploying a service comprising steps of deploying computer program code operable to, when deployed into a computer infrastructure and executed thereon, cause said computer system to perform all the steps of the method.
(35) In a further alternative, the preferred embodiment of the present invention may be realized in the form of a data carrier having functional data thereon, said functional data comprising functional computer data structures to, when loaded into a computer system and operated upon thereby, enable said computer system to perform all the steps of the method.
(36) It will be clear to one skilled in the art that many improvements and modifications can be made to the foregoing exemplary embodiment without departing from the scope of the present invention.