Methods and apparatus for providing hypervisor level data services for server virtualization
11256529 · 2022-02-22
Assignee
Inventors
- Ziv Kedem (Tel Aviv, IL)
- Chen Yehezkel Burshan (Tel Aviv, IL)
- Yair Kuszpet (Netanya, IL)
- Gil Levonai (Tel Aviv, IL)
Cpc classification
G06F3/0665
PHYSICS
G06F2009/4557
PHYSICS
G06F2201/84
PHYSICS
G06F2009/45579
PHYSICS
G06F3/067
PHYSICS
International classification
G06F9/455
PHYSICS
G06F11/20
PHYSICS
G06F11/14
PHYSICS
Abstract
A cross-host multi-hypervisor system, including a plurality of host sites, each site including at least one hypervisor, each of which includes at least one virtual server, at least one virtual disk that is read from and written to by the at least one virtual server, a tapping driver in communication with the at least one virtual server, which intercepts write requests made by any one of the at least one virtual server to any one of the at least one virtual disk, and a virtual data services appliance, in communication with the tapping driver, which receives the intercepted write requests from the tapping driver, and which provides data services based thereon, and a data services manager for coordinating the virtual data services appliances at the site, and a network for communicatively coupling the plurality of sites, wherein the data services managers coordinate data transfer across the plurality of sites via the network.
Claims
1. A cross-host multi-hypervisor system, comprising: a plurality of host sites executed by at least one processor, each site comprising: at least one hypervisor, each of which comprises: at least one virtual server; at least one virtual disk that is read from and written to by said at least one virtual server; a tapping driver installed within a hypervisor kernel of the at least one hypervisor, wherein the tapping driver resides in a software layer between the at least one virtual server and the at least one virtual disk, and the tapping driver in communication with said at least one virtual server, wherein the tapping driver intercepts write requests made by any one of said at least one virtual server to any one of said at least one virtual disk; a virtual data services appliance, in communication with said tapping driver, which receives the intercepted write requests from said tapping driver, and which provides data services based thereon; and a data services manager for coordinating the virtual data services appliance at the site and for communicatively coupling said plurality of host sites via a network, wherein said data services manager coordinates data transfer across said plurality of host sites via said network; wherein said data services manager provides data recovery for the at least one virtual server executing on the at least one hypervisor; and wherein the data services manager pairs a source virtual protection group including at least one virtual server at a first host site of the plurality of host sites with a target virtual protection group including at least one server at a second host site of the plurality of host sites.
2. The system of claim 1, wherein at least one virtual data services appliance at a first of said plurality of host sites transmits intercepted write requests to at least one virtual data services appliance at a second of said plurality of host sites, via said network.
3. The system of claim 2, wherein the at least one virtual data services appliance at the second of said plurality of host sites periodically applies the intercepted write requests received from the first of said plurality of host sites to at least one virtual disk at the second of said plurality of host sites.
4. The system of claim 1, wherein said virtual data services appliance at each hypervisor at the first host site of said plurality of host sites preserves write order fidelity for write requests intercepted from virtual servers in the source virtual protection group.
5. The system of claim 4, wherein said virtual data services appliance at each hypervisor at the first host site of said plurality of host sites transmit, via said network, the write requests intercepted from virtual servers in the source virtual protection group, to virtual data services appliances of the second host site of said plurality of host sites that include virtual servers in the target virtual protection group.
6. The system of claim 5, wherein said virtual data services appliances of the second host site of said plurality of host sites that include virtual servers in the target virtual protection group periodically apply the write requests intercepted from the virtual servers in the source virtual protection group to at least one virtual disk at the second host site of said plurality of host sites.
7. The system of claim 1, wherein the at least one virtual server in the source virtual protection group each belong to a same hypervisor.
8. The system of claim 1, wherein the at least one virtual server in the source virtual protection group belong to different hypervisors at a same host site as the first host site of said plurality of host sites.
9. The system of claim 1, wherein the at least one virtual server in the source virtual protection group each belong to hypervisors at different sub-host sites of the first host site of said plurality of host sites.
10. The system of claim 1, wherein the at least one virtual server in the target virtual protection group each belong to a same hypervisor.
11. The system of claim 1, wherein the at least one virtual server in the target virtual protection group each belong to different hypervisors at a same second host site as the second host site of said plurality of host sites.
12. The system of claim 1, wherein the at least one virtual server in the target virtual protection group each belong to hypervisors at different sub-host sites of the second host site of said plurality of host sites.
13. The system of claim 1, wherein the source virtual protection group comprises a same number of virtual servers as the target virtual protection group.
14. The system of claim 1, wherein the source virtual protection group comprises a different number of virtual servers from the target virtual protection group.
15. The system of claim 1, wherein the source virtual protection group spans a same number of hypervisors as does the target virtual protection group.
16. The system of claim 1, wherein the source virtual protection group spans a different number of hypervisors than the target virtual protection group.
17. The system of claim 1, wherein said respective data services managers provide data recovery for the at least one virtual server in the source virtual protection group from the at least one virtual server in the target virtual protection group.
18. The system of claim 17, wherein said respective data services managers provide data recovery for the virtual servers in the target virtual protection group from the virtual servers in the source virtual protection group.
19. The system of claim 1, wherein said data services manager at each site monitors environmental changes, including movement of a virtual server from one hypervisor to another, movement of a virtual disk from one hypervisor to another, and addition of a virtual server to a hypervisor.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The present invention will be more fully understood and appreciated from the following detailed description, taken in conjunction with the drawings in which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
LIST OF APPENDICES
(9) Appendix I is an application programming interface for virtual replication site controller web services, in accordance with an embodiment of the present invention;
(10) Appendix II is an application programming interface for virtual replication host controller web services, in accordance with an embodiment of the present invention;
(11) Appendix III is an application programming interface for virtual replication protection group controller web services, in accordance with an embodiment of the present invention;
(12) Appendix IV is an application programming interface for virtual replication command tracker web services, in accordance with an embodiment of the present invention; and
(13) Appendix V is an application programming interface for virtual replication log collector web services, in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION
(14) Aspects of the present invention relate to a dedicated virtual data services appliance (VDSA) within a hypervisor, which is used to provide a variety of hypervisor data services. Data services provided by a VDSA include inter alia replication, monitoring and quality of service.
(15) Reference is made to
(16) Hypervisor 100 also includes a tapping driver 150 installed within the hypervisor kernel. As shown in
(17) Hypervisor 100 also includes a VDSA 160. In accordance with an embodiment of the present invention, a VDSA 160 runs on a separate virtual server within each physical hypervisor. VDSA 160 is a dedicated virtual server that provides data services via one or more data services engines 170. However, VDSA 160 does not reside in the actual I/O data path between I/O backend 130 and physical disk 140. Instead, VDSA 160 resides in a virtual I/O data path.
(18) Whenever a virtual server 110 performs I/O on a virtual disk 120, tapping driver 150 identifies the I/O requests that the virtual server makes. Tapping driver 150 copies the I/O requests, forwards one copy via the conventional path to I/O backend 130, and forwards another copy to VDSA 160. In turn, VDSA 160 enables the one or more data services engines 170 to provide data services based on these I/O requests.
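The duplication step described above can be illustrated with a minimal Python sketch. It is not the hypervisor kernel driver itself; the names `WriteRequest`, `TappingDriver`, `io_backend` and `vdsa` are hypothetical and are introduced only for illustration. The sketch assumes that both destinations can be modelled as simple callables.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class WriteRequest:
    """Hypothetical model of one I/O request made by a virtual server."""
    server_id: str
    disk_id: str
    offset: int
    data: bytes


class TappingDriver:
    """Sits between the virtual server and the virtual disk (illustrative only)."""

    def __init__(self,
                 io_backend: Callable[[WriteRequest], None],
                 vdsa: Callable[[WriteRequest], None]) -> None:
        self._io_backend = io_backend
        self._vdsa = vdsa

    def on_write(self, request: WriteRequest) -> None:
        # One copy follows the conventional path to the I/O backend.
        self._io_backend(request)
        # A second, independent copy is handed to the VDSA.
        self._vdsa(replace(request))


# Stand-ins for I/O backend 130 and VDSA 160: plain lists that record requests.
backend_log: List[WriteRequest] = []
vdsa_log: List[WriteRequest] = []

driver = TappingDriver(backend_log.append, vdsa_log.append)
driver.on_write(WriteRequest("110-1", "120-1", offset=0, data=b"hello"))
```

In this sketch the conventional path is served first, so the VDSA is not in the data path between the I/O backend and the physical disk; it only receives a copy.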
(19) Reference is made to
(20) As shown in
(21) A first copy is stored in persistent storage, and used to provide continuous data protection. Specifically, VDSA 160 sends the first copy to journal manager 250, for storage in a dedicated virtual disk 270. Since all I/O requests are journaled on virtual disk 270, journal manager 250 provides recovery data services for virtual servers 110, such as restoring virtual servers 110 to an historical image. In order to conserve disk space, hash generator 220 derives a one-way hash from the I/O requests. Use of a hash ensures that only a single copy of any I/O request data is stored on disk.
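One way to picture how a one-way hash keeps only a single copy of identical I/O data is a content-addressed journal, sketched below in Python. This is an assumption-laden illustration: the specification does not name a hash function, so SHA-256 is chosen here arbitrarily, and the class `DedupJournal` and its fields are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


class DedupJournal:
    """Illustrative journal that stores each distinct data block once."""

    def __init__(self) -> None:
        # digest -> data; identical payloads share one stored copy.
        self.blocks: Dict[str, bytes] = {}
        # Ordered journal entries referring to data by digest only.
        self.entries: List[Tuple[str, int, str]] = []

    def append(self, disk_id: str, offset: int, data: bytes) -> str:
        # SHA-256 is an assumption; any one-way hash would serve the sketch.
        digest = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(digest, data)
        self.entries.append((disk_id, offset, digest))
        return digest

    def data_for(self, index: int) -> bytes:
        """Return the payload of the journal entry at the given position."""
        return self.blocks[self.entries[index][2]]


journal = DedupJournal()
journal.append("120-1", 0, b"A" * 4096)
journal.append("120-1", 4096, b"A" * 4096)  # same payload, stored once
journal.append("120-2", 0, b"B" * 4096)
```

Every request is still journaled as an entry, so historical images can be reconstructed, while the payload storage grows only with distinct data.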
(22) An optional second copy is used for disaster recovery. It is sent via TCP transmitter 230 to remote VDSA 260. As such, access to all data is ensured even when the production hardware is not available, thus enabling disaster recovery data services.
(23) An optional third copy is sent to data analyzer and reporter 240, which generates a report with information about the content of the data. Data analyzer and reporter 240 analyzes data content of the I/O requests and infers information regarding the data state of virtual servers 110. E.g., data analyzer and reporter 240 may infer the operating system level and the status of a virtual server 110.
(24) Reference is made to
(25) In accordance with an embodiment of the present invention, every write command from a protected virtual server in hypervisor 100A is intercepted by tapping driver 150 (
(26) At Site B, the write command is passed to a journal manager 250 (
(27) In addition to write commands being written to the Site B journal, mirrors 110B-1, 110B-2 and 110B-3 of the respective protected virtual servers 110A-1, 110A-2 and 110A-3 at Site A are created at Site B. The mirrors at Site B are updated at each checkpoint, so that they are mirrors of the corresponding virtual servers at Site A at the point of the last checkpoint. During a failover, an administrator can specify that the virtual servers be recovered using the latest data sent from Site A. Alternatively, the administrator can specify an earlier checkpoint, in which case the mirrors on the virtual servers 110B-1, 110B-2 and 110B-3 are rolled back to the earlier checkpoint, and then the virtual servers are recovered to Site B. As such, the administrator can recover the environment to the point before any corruption, such as a crash or a virus, occurred, and ignore the write commands in the journal that were corrupted.
(28) VDSAs 160A and 160B ensure write order fidelity; i.e., data at Site B is maintained in the same sequence as it was written at Site A. Write commands are kept in sequence by assigning a timestamp or a sequence number to each write at Site A. The write commands are sequenced at Site A, then transmitted to Site B asynchronously, then reordered at Site B to the proper time sequence, and then written to the Site B journal.
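The sequencing and reordering described above may be sketched as follows. The sketch assumes that consecutive integer sequence numbers are assigned at Site A (the specification allows either a timestamp or a sequence number), and the class name `ReorderBuffer` is hypothetical. Writes that arrive out of order are held back until every earlier write has been journaled.

```python
import heapq
from typing import List, Tuple


class ReorderBuffer:
    """Illustrative Site B receiver that restores the Site A write order."""

    def __init__(self, first_seq: int = 1) -> None:
        self._next = first_seq                    # next sequence number to journal
        self._heap: List[Tuple[int, bytes]] = []  # writes that arrived early
        self.journal: List[bytes] = []            # writes in Site A order

    def receive(self, seq: int, payload: bytes) -> None:
        if seq < self._next:
            return  # already journaled; ignore a duplicate delivery
        heapq.heappush(self._heap, (seq, payload))
        # Drain every write that is now contiguous with the journal.
        while self._heap and self._heap[0][0] <= self._next:
            ready_seq, ready = heapq.heappop(self._heap)
            if ready_seq == self._next:
                self.journal.append(ready)
                self._next += 1


site_b = ReorderBuffer()
# Asynchronous transmission delivers the writes out of order.
for seq, payload in [(2, b"w2"), (3, b"w3"), (1, b"w1")]:
    site_b.receive(seq, payload)
```

Nothing is journaled until write 1 arrives, after which writes 1, 2 and 3 are appended in sequence.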
(29) The journal file is cyclic; i.e., after a pre-designated time period, the earliest entries in the journal are overwritten by the newest entries.
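A cyclic journal of this kind can be approximated by a time-bounded queue, as in the Python sketch below. The retention period, the class name `CyclicJournal`, and the use of caller-supplied numeric timestamps are assumptions made for illustration; the sketch discards the oldest entries rather than literally overwriting file regions.

```python
from collections import deque
from typing import Deque, List, Tuple


class CyclicJournal:
    """Illustrative journal that retains entries for a fixed time window."""

    def __init__(self, retention_seconds: float) -> None:
        self.retention = retention_seconds
        self._entries: Deque[Tuple[float, str]] = deque()

    def append(self, timestamp: float, payload: str) -> None:
        self._entries.append((timestamp, payload))
        # Entries older than the retention window make room for new ones.
        cutoff = timestamp - self.retention
        while self._entries and self._entries[0][0] < cutoff:
            self._entries.popleft()

    def payloads(self) -> List[str]:
        return [payload for _, payload in self._entries]


journal = CyclicJournal(retention_seconds=60)
journal.append(0, "w0")
journal.append(30, "w30")
journal.append(90, "w90")  # pushes "w0" out of the 60-second window
```

The retention window bounds how far back a recovery can reach, since entries that have left the journal are no longer available.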
(30) It will be appreciated by those skilled in the art that the virtual replication appliance of the present invention operates at the hypervisor level, and thus obviates the need to consider physical disks. In distinction, conventional replication systems operate at the physical disk level. Embodiments of the present invention recover write commands at the application level. Conventional replication systems recover write commands at the SCSI level. As such, conventional replication systems are not fully application-aware, whereas embodiments of the present invention are fully application-aware, and replicate write commands from an application in a consistent manner.
(31) The present invention offers many advantages.
(32) Hardware Agnostic: Because VDSA 160 manages recovery of virtual servers and virtual disks, it is not tied to specific hardware that is used at the protected site or at the recovery site. The hardware may be from the same vendor, or from different vendors. As long as the storage device supports the iSCSI protocol, any storage device, known today or to be developed in the future, can be used.
(33) Fully Scalable: Because VDSA 160 resides in the hypervisor level, architectures of the present invention scale to multiple sites having multiple hypervisors, as described hereinbelow with reference to
(34) Efficient Asynchronous Replication: Write commands are captured by VDSA 160 before they are written to a physical disk at the protected site. The write commands are sent to the recovery site asynchronously, and thus avoid long distance replication latency. Moreover, only delta changes are sent to the recovery site, and not a whole file or disk, which reduces the network traffic, thereby reducing WAN requirements and improving recovery time objective and recovery point objective.
(35) Control of Recovery: An administrator controls when a recovery is initiated, and to what point in time it recovers.
(36) Near-Zero Recovery Point Objective (RPO): VDSA 160 continuously protects data, sending a record of every write command transacted at the protected site to the recovery site. As such, recovery may be performed within a requested RPO.
(37) Near-Zero Recovery Time Objective (RTO): During recovery the mirrors of the protected virtual servers are recovered at the recovery site from VDSA 160B, and synchronized to a requested checkpoint. In accordance with an embodiment of the present invention, during synchronization and while the virtual servers at the recovery site are not yet fully synchronized, users can nevertheless access the virtual servers at the recovery site. Each user request to a virtual server is analyzed, and a response is returned either from the virtual server directly, or from the journal if the information in the journal is more up-to-date. Such analysis of user requests continues until the recovery site virtual environment is fully synchronized.
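The request analysis described above, in which a response comes from the journal whenever the journal holds more up-to-date information than the virtual server, might look like the following sketch. Block-granular addressing, the class name `SyncingDisk`, and its methods are assumptions for illustration only.

```python
from typing import Dict, List, Optional, Tuple


class SyncingDisk:
    """Illustrative recovery-site disk that is readable while synchronizing."""

    def __init__(self, mirror: Dict[int, bytes]) -> None:
        self.mirror = dict(mirror)                  # state at the last checkpoint
        self.pending: List[Tuple[int, bytes]] = []  # journaled writes not yet applied

    def journal_write(self, block_no: int, data: bytes) -> None:
        self.pending.append((block_no, data))

    def read(self, block_no: int) -> Optional[bytes]:
        # The newest pending journal entry for the block wins over the mirror.
        for pending_block, data in reversed(self.pending):
            if pending_block == block_no:
                return data
        return self.mirror.get(block_no)

    def apply_next(self) -> None:
        """Apply the oldest pending write to the mirror (one sync step)."""
        if self.pending:
            block_no, data = self.pending.pop(0)
            self.mirror[block_no] = data


disk = SyncingDisk({0: b"old0", 1: b"old1"})
disk.journal_write(0, b"new0")
during_sync = (disk.read(0), disk.read(1))  # block 0 from journal, block 1 from mirror
disk.apply_next()
after_sync = (disk.read(0), disk.read(1))   # both now served from the mirror
```

Reads return the same data before and after the pending write is applied, which is why users can be served before synchronization completes.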
(38) WAN Optimization between Protected and Recovery Sites: In accordance with an embodiment of the present invention, write commands are compressed before being sent from VDSA 160A to VDSA 160B, with throttling used to prioritize network traffic. As such, communication between the protected site and the recovery site is optimized.
(39) WAN Failover Resilience: In accordance with an embodiment of the present invention, data is cached prior to being transmitted to the recovery site. If WAN 320 goes down, the cached data is saved and, as soon as WAN 320 comes up again, the data is sent to the recovery site and both sites are re-synchronized.
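The cache-then-transmit behavior can be sketched as a small queue that is drained only while the link is up. The class `CachingTransmitter`, the `wan_up` flag and the `send` callable are hypothetical names; a real implementation would detect link state and persist the cache, both of which are outside this sketch.

```python
from collections import deque
from typing import Callable, Deque, List


class CachingTransmitter:
    """Illustrative sender that caches writes while the WAN is down."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send
        self._cache: Deque[str] = deque()
        self.wan_up = True

    def transmit(self, write: str) -> None:
        # Data is always cached first, then sent if the link allows.
        self._cache.append(write)
        self.flush()

    def flush(self) -> None:
        # Remove an item from the cache only after it has been sent.
        while self.wan_up and self._cache:
            self._send(self._cache[0])
            self._cache.popleft()

    def pending(self) -> int:
        return len(self._cache)


received: List[str] = []
tx = CachingTransmitter(received.append)

tx.transmit("w1")
tx.wan_up = False            # WAN 320 goes down
tx.transmit("w2")
tx.transmit("w3")
queued_during_outage = tx.pending()
tx.wan_up = True             # WAN 320 comes back up
tx.flush()                   # cached data is sent and the sites re-synchronize
```

Because the cache is a first-in, first-out queue, writes held during the outage reach the recovery site in their original order.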
(40) Single Point of Control: In accordance with an embodiment of the present invention, both the protected and the recovery site are managed from the same client console.
(41) As indicated hereinabove, the architecture of
(42) The hypervisors are shown in system 300 with their respective VDSA's 160A/1, 160A/2, . . . , and the other components of the hypervisors, such as the virtual servers 110 and virtual disks 120, are not shown for the sake of clarity. An example system with virtual servers 110 is shown in
(43) The sites include respective data services managers 310A, 310B and 310C that coordinate hypervisors in the sites, and coordinate hypervisors across the sites.
(44) The system of
(45) Data services managers 310A, 310B and 310C are control elements. The data services managers at each site communicate with one another to coordinate state and instructions. The data services managers track the hypervisors in the environment, and track health and status of the VDSAs 160A/1, 160A/2, . . . .
(46) It will be appreciated by those skilled in the art that the environment shown in
(47) In accordance with an embodiment of the present invention, the data services managers enable designating groups of specific virtual servers 110, referred to as virtual protection groups, to be protected. For virtual protection groups, write order fidelity is maintained. The data services managers enable designating a replication target for each virtual protection group; i.e., one or more sites, and one or more hypervisors in the one or more sites, at which the virtual protection group is replicated. A virtual protection group may have more than one replication target. The number of hypervisors and virtual servers within a virtual protection group and its replication target are not required to be the same.
(48) Reference is made to
(49) Reference is made to
(50) More generally, the recovery host may be assigned to a cluster, instead of to a single hypervisor, and the recovery datastore may be assigned to a pool of resources, instead of to a single datastore. Such assignments are of particular advantage in providing the capability to recover data in an enterprise internal cloud that includes clusters and resource pools, instead of using dedicated resources for recovery.
(51) The data services managers synchronize site topology information. As such, a target site's hypervisors and datastores may be configured from a source site.
(52) Virtual protection groups enable protection of applications that run on multiple virtual servers and disks as a single unit. E.g., an application that runs on virtual servers may require a web server and a database, each of which runs on a different virtual server than the virtual server that runs the application. These virtual servers may be bundled together using a virtual protection group.
(53) Referring back to
(54) For each virtual server 110 and its target host, each VDSA 160A/1, 160A/2, . . . replicates IOs to its corresponding replication target. The VDSA can replicate all virtual servers to the same hypervisor, or to different hypervisors. Each VDSA maintains write order fidelity for the IOs passing through it, and the data services manager coordinates the writes among the VDSAs.
(55) Since the replication target hypervisor for each virtual server 110 in a virtual protection group may be specified arbitrarily, all virtual servers 110 in the virtual protection group may be replicated at a single hypervisor, or at multiple hypervisors. Moreover, the virtual servers 110 in the source site may migrate across hosts during replication, and the data services manager tracks the migration and accounts for it seamlessly.
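As a rough illustration of how a data services manager might account for migration, the sketch below keeps a placement table that is updated on each movement event, so that the VDSA responsible for a virtual server can always be looked up. The class `PlacementTracker`, its event methods, and the assumption that each hypervisor's VDSA shares the hypervisor's suffix (hypervisor 100A/1 with VDSA 160A/1) are illustrative only.

```python
from typing import Dict


class PlacementTracker:
    """Illustrative record of which hypervisor currently hosts each server."""

    def __init__(self) -> None:
        self._host_of: Dict[str, str] = {}

    def on_server_added(self, server: str, hypervisor: str) -> None:
        self._host_of[server] = hypervisor

    def on_server_moved(self, server: str, new_hypervisor: str) -> None:
        if server not in self._host_of:
            raise KeyError(f"unknown virtual server: {server}")
        self._host_of[server] = new_hypervisor

    def vdsa_for(self, server: str) -> str:
        # Assumed naming: hypervisor "100A/1" is served by VDSA "160A/1".
        return self._host_of[server].replace("100", "160", 1)


tracker = PlacementTracker()
tracker.on_server_added("110A/1-1", "100A/1")
vdsa_before = tracker.vdsa_for("110A/1-1")
tracker.on_server_moved("110A/1-1", "100A/2")  # migration during replication
vdsa_after = tracker.vdsa_for("110A/1-1")
```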
(56) Reference is made to
(57) Site A
(58) Hypervisor 100A/1: virtual servers 110A/1-1, 110A/1-2, 110A/1-3.
(59) Hypervisor 100A/2: virtual servers 110A/2-1, 110A/2-2, 110A/2-3.
(60) Hypervisor 100A/3: virtual servers 110A/3-1, 110A/3-2, 110A/3-3.
(61) Site B
(62) Hypervisor 100B/1: virtual servers 110B/1-1, 110B/1-2, 110B/1-3.
(63) Hypervisor 100B/2: virtual servers 110B/2-1, 110B/2-2, 110B/2-3.
(64) Site C
(65) Hypervisor 100C/1: virtual servers 110C/1-1, 110C/1-2, 110C/1-3, 110C/1-4.
(66) As further shown in
(67) VPG1 (shown with upward-sloping hatching)
(68) Source at Site A: virtual servers 110A/1-1, 110A/2-1, 110A/3-1
(69) Replication Target at Site B: virtual servers 110B/1-1, 110B/1-2, 110B/2-1
(70) VPG2 (shown with downward-sloping hatching)
(71) Source at Site B: virtual servers 110B/1-3, 110B/2-2
(72) Replication Target at Site A: virtual servers 110A/1-2, 110A/2-2
(73) VPG3 (shown with horizontal hatching)
(74) Source at Site A: virtual server 110A/3-3
(75) Replication Target at Site B: virtual server 110B/2-3
(76) Replication Target at Site C: virtual server 110C/1-4
(77) VPG4 (shown with vertical hatching)
(78) Source at Site A: virtual servers 110A/1-3, 110A/2-3, 110A/3-2
(79) Replication Target at Site C: virtual servers 110C/1-1, 110C/1-2, 110C/1-3
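The four virtual protection groups listed above can be written down as plain data, which makes the asymmetry between sources and targets easy to check. The Python representation below is only a sketch: the class `VirtualProtectionGroup` and the helper functions are hypothetical, and the hypervisor of a server is derived from the reference numerals on the assumption that virtual server 110X/n-m runs on hypervisor 100X/n, as in the listing of sites above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VirtualProtectionGroup:
    name: str
    source_site: str
    sources: List[str]             # protected virtual servers
    targets: Dict[str, List[str]]  # target site -> replica virtual servers


VPGS = [
    VirtualProtectionGroup("VPG1", "A", ["110A/1-1", "110A/2-1", "110A/3-1"],
                           {"B": ["110B/1-1", "110B/1-2", "110B/2-1"]}),
    VirtualProtectionGroup("VPG2", "B", ["110B/1-3", "110B/2-2"],
                           {"A": ["110A/1-2", "110A/2-2"]}),
    VirtualProtectionGroup("VPG3", "A", ["110A/3-3"],
                           {"B": ["110B/2-3"], "C": ["110C/1-4"]}),
    VirtualProtectionGroup("VPG4", "A", ["110A/1-3", "110A/2-3", "110A/3-2"],
                           {"C": ["110C/1-1", "110C/1-2", "110C/1-3"]}),
]


def hypervisor_of(server: str) -> str:
    """Map e.g. '110A/1-1' to '100A/1' (assumed numbering convention)."""
    return server.split("-")[0].replace("110", "100", 1)


def source_hypervisors(vpg: VirtualProtectionGroup) -> List[str]:
    return sorted({hypervisor_of(s) for s in vpg.sources})


def target_hypervisors(vpg: VirtualProtectionGroup) -> Dict[str, List[str]]:
    return {site: sorted({hypervisor_of(s) for s in servers})
            for site, servers in vpg.targets.items()}
```

For example, VPG1 spans three hypervisors at Site A but only two at Site B, and VPG3 has replication targets at two sites, consistent with the statement that a virtual protection group and its replication target need not match in the number of hypervisors.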
(80) As such, it will be appreciated by those skilled in the art that the hypervisor architecture of
(81) The present invention may be implemented through an application programming interface (API), exposed as web service operations. Reference is made to Appendices I-V, which define an API for virtual replication web services, in accordance with an embodiment of the present invention.
(82) In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made to the specific exemplary embodiments without departing from the broader spirit and scope of the invention as set forth in the appended claims. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.