System and method for performing an image level snapshot and for restoring partial volume data
09619341 · 2017-04-11
Assignee
Inventors
- Anand PRAHLAD (Bangalore, IN)
- David Ngo (Shrewsbury, NJ, US)
- Prakash Varadharajan (Manalapan, NJ)
- Rahul S. Pawar (Marlboro, NJ, US)
- Avinash Kumar (Sunnyvale, CA, US)
Cpc classification
G06F11/1448
PHYSICS
Y10S707/99953
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
G06F16/128
PHYSICS
Y10S707/99944
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
G06F11/1446
PHYSICS
G06F16/1844
PHYSICS
G06F2201/84
PHYSICS
Y10S707/99955
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
Y10S707/99952
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
International classification
Abstract
The present invention relates to a method for performing an image level copy of an information store. The present invention comprises performing a snapshot of an information store that indexes the contents of the information store, retrieving data associated with the contents of the information store from a file allocation table, copying the contents of the information store to a storage device based on the snapshot, and associating the retrieved data with the copied contents to provide file system information for the copied contents.
Claims
1. A computing system that restores data to an information store, the method comprising: an information store; at least one storage manager in communication with the information store, the at least one storage manager comprising computer hardware, wherein the storage manager directs performance of a first snapshot of the information store at a first point in time, the first snapshot comprising first data about at least one file existing at the first point in time; at least one media agent in communication with the at least one storage manager, the at least one media agent copies at least the first data associated with the first snapshot to one or more remotely-located storage devices that are remotely located from the information store; the at least one media agent stores information about the first data in at least one map, the at least one map associates the first data with the first snapshot, and identifies the location of the first data in the one or more remotely-located storage devices; the storage manager directs performance of a second snapshot of the information store at a second point in time occurring after the first point in time, wherein the second snapshot indexes changed data that changed after the first point in time, wherein the changed data comprises portions of the at least one file that changed since the first point in time, wherein the at least one media agent copies the changed data indexed by the second snapshot to the one or more remotely located storage devices; the at least one media agent stores information about the changed data in the at least one map, wherein the at least one map associates the changed data with the second snapshot, and identifies the location of the changed data in the one or more remotely located storage devices; and the at least one media agent restores to the information store a copy of the at least one file existing at the second point in time using the map in association with the first and second snapshots to restore from the one or more remotely-located storage devices at least a portion of the at least one file existing at the first point in time and the changed data existing at the second point in time.
2. The system of claim 1 wherein the changed data comprises file information obtained from a file access table.
3. The system of claim 1 wherein the information about the changed data comprises a copy of the changed data.
4. The system of claim 1 wherein the map identifies one or more previously saved snapshots where previous copies of the changed blocks were stored.
5. The system of claim 1 wherein the information about changed data comprises one or more file names associated with the changed data.
6. The system of claim 1 wherein the at least one media agent stores copies of the changed data in association with a previous snapshot.
7. The system of claim 1 further comprising a block filter to identify the changed blocks.
8. The system of claim 1 further comprising copy on write to identify the changed blocks.
9. The system of claim 1 wherein restoring the copy of the at least one file existing at the second point in time comprises restoring the copy of the at least one file from the first snapshot and replacing portions of the restored at least one file with the changed data from the second snapshot.
10. The system of claim 1 wherein restoring the copy of the at least one file existing at the second point in time comprises restoring portions of the at least one file from the second snapshot and adding at least a portion of the at least one file existing at the first point in time.
11. A method in a computing system of restoring data to an information store, the method comprising: storing instructions in a non-transitory computer storage which perform the following acts when executed by one or more computing devices; performing a first snapshot of an information store at a first point in time, the first snapshot comprising first data about at least one file existing at the first point in time; copying at least the first data associated with the first snapshot to one or more remotely-located storage devices that are remotely located from the information store; storing information about the first data in at least one map, the at least one map associates the first data with the first snapshot, and identifies the location of the first data in the one or more remotely-located storage devices; performing a second snapshot of the information store at a second point in time occurring after the first point in time, wherein the second snapshot indexes changed data that changed after the first point in time, wherein the changed data comprises portions of the at least one file that changed since the first point in time; copying the changed data indexed by the second snapshot to the one or more remotely located storage devices storing information about the changed data in the at least one map, wherein the at least one map associates the changed data with the second snapshot, and identifies the location of the changed data in the one or more remotely located storage devices; and restoring from the one or more remotely-located storage devices to the information store, a copy of the at least one file existing at the second point in time using the map in association with the first and second snapshots to restore at least a portion of the at least one file existing at the first point in time and the changed data existing at the second point in time.
12. The method of claim 11 wherein the changed data comprises file information obtained from a file access table.
13. The method of claim 11 wherein the information about the changed data comprises a copy of the changed data.
14. The method of claim 11 wherein the map identifies one or more previously saved snapshots where previous copies of the changed blocks were stored.
15. The method of claim 11 wherein the information about the changed data comprises one or more file names associated with the changed data.
16. The method of claim 11 further comprising storing copies of the changed data in association with a previous snapshot.
17. The method of claim 11 wherein identifying the changed blocks uses a block filter to identify changes.
18. The method of claim 11 wherein identifying the changed blocks uses copy on write to identify changes.
19. The method of claim 11, wherein restoring the copy of the at least one file existing at the second point in time comprises restoring the copy of the at least one file from the first snapshot and replacing portions of the restored at least one file with the changed data from the second snapshot.
20. The method of claim 11, wherein restoring the copy of the at least one file existing at the second point in time comprises restoring portions of the at least one file from the second snapshot and adding at least a portion of the at least one file existing at the first point in time.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The invention is illustrated in the figures of the accompanying drawings which are meant to be exemplary and not limiting, in which like references are intended to refer to like or corresponding parts, and in which:
(2)
(3)
(4)
(5)
(6)
(7)
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
(8) With reference to
(9) As shown, the system of
(10) A data agent 95 is a software module that is generally responsible for retrieving data from an information store 90 for copies, snapshots, archiving, migration, and recovery of data stored in an information store 90 or other memory location, e.g., hard disc drive. Each client computer 85 preferably has at least one data agent 95 and the system can support many client computers 85. The data agent 95 provides an interface to an information store 90 to execute copies, snapshots, archiving, migration, recovery and other storage operations on data in conjunction with one or more media agents 105. According to one embodiment, each client 85 runs a number of data agents 95, wherein each data agent is configured to interface with data generated by or from a specific application, e.g., a first data agent to interface with Microsoft Exchange data and a second data agent to interface with Oracle database data. As is explained in greater detail herein, a data agent 95 is in communication with one or more media agents 105 to effect the distributed storage of snapshots on one or more storage devices 115 that are remote from the information store 90 that is the source of the snapshot.
(11) The storage manager 100 is a software module or application that coordinates and controls other components comprising the system, e.g., data and media agents, 95 and 105, respectively. The storage manager 100 communicates with data agents 95 and media agents 105 to control and manage snapshot creation, migration, recovery and other storage operations. According to one embodiment, the storage manager 100 maintains data in a storage manager index cache 120 that instructs a given data agent 95 to work in conjunction with a specific media agent 105 to store snapshots on one or more storage devices 115.
(12) The storage manager 100 maintains a storage manager index cache 120. Data in the storage manager index cache 120, which the storage manager 100 collects from data agents 95, media agents 105, user and other applications, is used to indicate, track and associate: logical relationships and associations between components of the system, user preferences, management tasks, and other data that is useful to the system. For example, the storage manager index cache 120 may contain data that tracks logical associations between media agents 105 and storage devices 115. The storage manager index cache 120 may also contain data that tracks the status of storage operations to be performed, storage patterns such as media use, storage space growth, network bandwidth, service level agreement (SLA) compliance levels, data protection levels, storage policy information, storage criteria associated with user preferences, data retention criteria, storage operation preferences, and other storage-related information.
(13) A media agent 105 is a software module that transfers data in conjunction with one or more data agents 95, as directed by the storage manager 100, between an information store 90 and one or more storage devices 115, such as a tape library, a magnetic media storage device, an optical media storage device, or other storage device. The media agent 105 communicates with and controls the one or more storage devices 115. According to one embodiment, the media agent 105 may communicate with the storage device 115 via a local bus, such as a SCSI adaptor. Alternatively, the storage device 115 may communicate with the media agent 105 via a Storage Area Network (SAN). Other types of communication techniques, protocols and media are contemplated as falling within the scope of the invention.
(14) The media agent 105 receives snapshots, preferably with the changed data that is tracked by the snapshot, from one or more data agents 95 and determines one or more storage devices 115 to which it should write the snapshot. According to one embodiment, the media agent 105 applies load-balancing algorithms to select a storage device 115 to which it writes the snapshot. Alternatively, the storage manager 100 instructs the media agent 105 as to which storage device 115 the snapshot should be written. In this manner, snapshots from a given information store 90 may be written to one or more storage devices 115, ensuring data is available for restoration purposes in the event that the information store fails. Either the media agent or the storage manager 100 records the storage device on which the snapshot is written in a replication volume table 102, thereby allowing the snapshot to be located when required for restoring the information store 90.
(15) A media agent 105 maintains a media agent index cache 110 that stores index data the system generates during snapshot, migration, and restore operations. For example, storage operations for Microsoft Exchange data generate application specific index data regarding the substantive Exchange data. Similarly, other applications may be capable of generating application specific data during a copy or snapshot. This data is generally described as metadata, and may be stored in the media agent index cache 110. The media agent index cache 110 may track data that includes, for example, information regarding the location of stored data on a given volume. The media agent index cache 110 may also track data that includes, but is not limited to, file names, sizes, creation dates, formats, application types, and other file-related information, information regarding one or more clients associated with stored data, information regarding one or more storage policies, storage criteria, storage preferences, compression information, retention related information, encryption related information, and stream related information. Index data provides the system with an efficient mechanism for locating user files during storage operations such as copying, performing snapshots and recovery.
(16) This index data is preferably stored with the snapshot that is backed up to the storage device 115, although it is not required, and the media agent 105 that controls the storage operation may also write an additional copy of the index data to its media agent index cache 110. The data in the media agent index cache 110 is thus readily available to the system for use in storage operations and other activities without having to be first retrieved from the storage device 115.
(17) In order to track the location of snapshots, the system uses a database table or similar data structure, referred to herein as a replication volume table 102. The replication volume table 102, among other advantages, facilitates the tracking of multiple snapshots across multiple storage devices 115. For example, the system might, as directed by a policy or a user, store a first snapshot, t0, on a first storage device A, such as a tape drive or library, and then store subsequent snapshots containing only the changed cluster(s), tn, on a second storage device B, such as an optical drive or library. Alternatively, instructions may be stored within system components, e.g., a storage manager 100 or media agent 105, directing the storage devices 115 used to store snapshots. Information regarding the storage device 115 to which the snapshot is written, as well as other information regarding the snapshot generally, is written to the replication volume table 102. An exemplary structure according to one embodiment is as follows:
(18) TABLE-US-00001
{
    id serial,                      // PRIMARY KEY FOR THIS TABLE
    PointInTime integer,            //
    CreationTime integer,           // Timestamp of RV creation
    ModifyTime integer,             // Timestamp of last RV update
    CurrentState integer,           // Current state of RV
    CurrentRole integer,            // Current role of RV
    PrimaryVolumeId integer,        // FOREIGN KEY FOR SNR Volume TABLE
    PhysicalVolumeID integer,       // FOREIGN KEY FOR SNR Volume TABLE
    ReplicationPolicyId integer,    // FOREIGN KEY FOR Replication Policy TABLE
    RVScratchVolumeId integer,      //
    Flags integer,
    JobId longlong
    SnapVolumeId integer,           // FOREIGN KEY FOR
}
(19) In the exemplary replication volume table, id is a unique identification number assigned by the system to the snapshot; PointInTime represents the date and time that the snapshot was created; CreationTime represents the date and time that the snapshot was completed; ModifyTime is the recorded date and time of the snapshot taken prior to the current snapshot; CurrentState is an identifier used to indicate a current status of the snapshot (e.g., pending, completed, unfinished, etc.); PrimaryVolumeId is the identifier for the information store 90 from which the snapshot is being made; PhysicalVolumeId is a hardware identifier for the information store 90; RVScratchVolumeId is an identifier for a scratch volume, which in some embodiments may be used to buffer additional memory as known to those of skill in the art; Flags contains a 32 bit word for various settings such as whether a snapshot has been taken previously, etc.; JobId stores the identifier for the job as assigned by a storage management module; and SnapVolumeId points to the physical destination storage device 115 to which the snapshot is written.
(20) As each snapshot indexes an information store at a given point in time, a mechanism must be provided that allows the snapshots taken of an information store to be chronologically related so that they are properly used for restoring an information store 90. According to the replication volume table 102, the CurrentRole integer may store a value for the relative position of a given snapshot in a hierarchy of snapshots taken from a given information store 90 (e.g., first (t0), second (t1), t2, t3, etc.).
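The replication volume table and the chronological ordering described above can be illustrated with a minimal sketch. The Python field names mirror the exemplary structure; the `SnapshotState` encoding, the default values, and the helper `snapshots_for_volume` are hypothetical and are not part of the described system. The sketch assumes CurrentRole holds 0 for t0, 1 for t1, and so on.

```python
from dataclasses import dataclass
from enum import IntEnum


class SnapshotState(IntEnum):
    # Hypothetical encoding; the description only lists example states.
    PENDING = 0
    COMPLETED = 1
    UNFINISHED = 2


@dataclass
class ReplicationVolumeRow:
    id: int                   # unique identifier of the snapshot
    point_in_time: int        # time the snapshot was created (epoch seconds assumed)
    current_role: int         # relative position: 0 for t0, 1 for t1, ...
    primary_volume_id: int    # information store the snapshot was taken from
    snap_volume_id: int       # destination storage device holding the snapshot
    creation_time: int = 0
    modify_time: int = 0
    current_state: SnapshotState = SnapshotState.PENDING
    physical_volume_id: int = 0
    replication_policy_id: int = 0
    rv_scratch_volume_id: int = 0
    flags: int = 0
    job_id: int = 0


def snapshots_for_volume(rows, primary_volume_id):
    """Return the rows for one information store, oldest snapshot first."""
    matching = [r for r in rows if r.primary_volume_id == primary_volume_id]
    return sorted(matching, key=lambda r: r.current_role)
```

Sorting on `current_role` rather than on a timestamp follows the text, which names CurrentRole as the value that relates snapshots chronologically.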
(21) In some embodiments, components of the system may reside on and be executed by a single computer. According to this embodiment, a data agent 95, media agent 105 and storage manager 100 are located at the client computer 85 to coordinate and direct local copying, archiving, migration, and retrieval application functions among one or more storage devices 115 that are remote or distinct from the information store 90. This embodiment is further described in U.S. patent application Ser. No. 09/610,738.
(22) One embodiment of a method for using the system of the present invention to perform snapshots is illustrated in the flow diagram of
(23) Advantageously, the snapshot and data copied from the information store may be written to a storage device that is remote or different from the information store, step 302, e.g., local data from a given information store is written to a storage device attached to a network. The selection of a destination storage device for the snapshot may be accomplished using one or more techniques known to those of skill in the art. For example, a fixed mapping may be provided indicating a storage device to which all snapshots and copied or changed data should be written. Alternatively, an algorithm may be implemented to dynamically select a storage device from among a number of storage devices available on a network. For example, a storage manager may select a media agent to handle the transfer of the snapshot and copied data to a specific storage device based on criteria such as available bandwidth, other scheduled storage operations, media availability, storage policies, storage preferences, or other considerations. The snapshot, preferably along with the data from the information store, is written to the selected destination storage device, step 304. According to certain embodiments, the snapshot contains information regarding the files and folders that are tracked by the snapshot. Alternatively, the information regarding the files and folders that are indexed by the snapshot, e.g., file system information, is stored on the storage device.
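The two selection approaches described above, a fixed mapping and a dynamic choice among available devices, can be sketched as follows. This is an illustration only: the `StorageDevice` type, its fields, and the rule of preferring the highest available bandwidth are assumptions, since the description lists criteria without prescribing how they are weighed.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StorageDevice:
    # Hypothetical description of a candidate destination device.
    name: str
    available_bandwidth_mbps: float
    media_available: bool


def select_device(devices: Sequence[StorageDevice],
                  fixed_mapping: Optional[StorageDevice] = None) -> StorageDevice:
    """Pick a destination device for a snapshot.

    A fixed mapping, when provided, always wins. Otherwise the device with
    the most available bandwidth among those with media available is chosen
    (an assumed policy, one of many possible).
    """
    if fixed_mapping is not None:
        return fixed_mapping
    candidates = [d for d in devices if d.media_available]
    if not candidates:
        raise RuntimeError("no storage device has media available")
    return max(candidates, key=lambda d: d.available_bandwidth_mbps)
```

Other criteria named in the text, such as scheduled storage operations or storage policies, could be folded into the `key` function in the same way.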
(24) One embodiment of a snapshot used to track clusters read from the information store to clusters in a snapshot, as well as to map file and folder names corresponding to the snapshot clusters, is illustrated in
(25) The snapshot 350 is used to associate the original cluster numbers from an information store with clusters on a storage device, which in the present embodiment is a magnetic tape. It should be appreciated by those of skill in the art that the present invention is not limited to magnetic tape, and that the systems and methods described herein may be applicable to using snapshots with other storage technologies, e.g., storing disk geometry data to identify the location of a cluster on a storage device, such as a hard disk drive.
(26) The tape offsets 356 for the clusters 372 in the snapshot 370 are mapped to original disk cluster information 352. File and folder names 354 may be scanned from the information store's FAT and also mapped to the tape offsets 356. A file part column 358 in the snapshot tracks the clusters 372 for each file and folder where each file and folder contains an entry for the first cluster 372. For files or folders that are stored in more than one cluster, sometimes not in contiguous clusters, the offset table entry for each further cluster is numbered consecutively 358.
(27) In order to identify the files and folders represented by the stored clusters 372, e.g., changed data, in the snapshot 370, the map may exclude data from columns relating to the original disc clusters 352 and last snapshot 360. In order to keep track of changed versus unchanged clusters, however, the original disk cluster information 352 is stored in the map 350. Other information may also be stored in the map 350, such as timestamps for last edit and creation dates of the files.
(28) For each snapshot, even though only clusters that have been changed or created since a previous snapshot are tracked in a given snapshot after the initial snapshot t0, the snapshot may be provided with the data from all previous snapshots to provide the latest snapshot with folder and file information such that an index of the entire information store is maintained concurrently with each snapshot. Alternatively, this may be bypassed in favor of creating a snapshot that indexes all data at a given point in time in the information store and copying only changed data.
(29) Entries from each snapshot 350 may also contain a last-snapshot field 360 that holds an identifier for the last snapshot containing the cluster indexed by the entry at the time the current snapshot was created. According to an alternative embodiment, e.g., for snapshots that do not store the information from the information store's FAT, the snapshot only tracks clusters stored in the information store with the clusters indexed by the snapshot. For those embodiments, the snapshot 350 contains neither file and folder information 354 nor file part information 358.
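The snapshot map described above can be pictured as a list of entries, one per stored cluster. The following is a minimal sketch under stated assumptions: the `MapEntry` type and the helper `clusters_for_file` are hypothetical names, and the optional fields model the alternative embodiment in which FAT information is not stored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MapEntry:
    original_cluster: int         # original disk cluster information (352)
    tape_offset: int              # location of the cluster on the storage device (356)
    name: Optional[str] = None    # file or folder name (354); None without FAT data
    file_part: Optional[int] = None      # consecutive part number within the file (358)
    last_snapshot: Optional[int] = None  # last snapshot containing this cluster (360)


def clusters_for_file(entries: List[MapEntry], name: str) -> List[MapEntry]:
    """Return the entries of one file in file-part order.

    Assumes every entry that carries a name also carries a file_part.
    """
    parts = [e for e in entries if e.name == name]
    return sorted(parts, key=lambda e: e.file_part)
```

Because a file may occupy non-contiguous clusters, ordering by `file_part` rather than by `original_cluster` or `tape_offset` is what lets the file be reassembled.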
(30) Returning to
(31) For each snapshot, tn, that is taken of the information store, a comparison is performed such that only the clusters which have changed or been created since the last snapshot, tn-1, was taken of that volume are stored, step 310. For example, in some embodiments the data agent employs a block filter or similar construct known to those of skill in the art to compare snapshot tn with tn-1 and thereby detect changed clusters on an information store. Alternatively, the data agent may use other techniques known in the art, such as Copy on Write (COW), to identify changed data on an information store. If a given cluster in the information store has changed since the last snapshot in which the cluster appears, or if the cluster from the information store was created subsequent to the last snapshot, then the cluster is read from the information store and stored with the new snapshot being written to the storage device, step 314.
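The comparison of tn with tn-1 can be sketched as below. This is not the block filter or Copy on Write mechanism named above; it substitutes a simple comparison of per-cluster SHA-256 digests as a stand-in, and the function names and the dictionary representation of a volume (cluster number mapped to cluster bytes) are assumptions made for illustration.

```python
import hashlib
from typing import Dict


def cluster_digests(clusters: Dict[int, bytes]) -> Dict[int, str]:
    """Record one digest per cluster, to be kept alongside snapshot tn-1."""
    return {number: hashlib.sha256(data).hexdigest()
            for number, data in clusters.items()}


def changed_clusters(previous_digests: Dict[int, str],
                     current_clusters: Dict[int, bytes]) -> Dict[int, bytes]:
    """Return clusters that changed or were created since the previous snapshot."""
    changed = {}
    for number, data in current_clusters.items():
        digest = hashlib.sha256(data).hexdigest()
        # A missing digest means the cluster was created after tn-1.
        if previous_digests.get(number) != digest:
            changed[number] = data
    return changed
```

Only the returned clusters would then be read from the information store and stored with the new snapshot, which matches the condition stated for step 314.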
(32) A determination is made regarding the given storage device to which the snapshot and changed data (which may also include newly created data) is to be written, step 316. Techniques such as those described in conjunction with storage of the initial snapshot, steps 302 and 304, may also be employed regarding storage of subsequent snapshots. Advantageously, the initial snapshot and any subsequent snapshot may be written to any storage device available in the network. Furthermore, there is no limitation to the combination of devices used to store the snapshots for a given information store. For example, an initial snapshot may be written to storage device A, second and third snapshots may be written to storage device B, and a fourth snapshot may be written to storage device C. Regardless of the storage device that is selected, step 316, the replication volume table is updated to reflect the location, step 318, allowing snapshots to be located when a user requests to restore the information store from which the snapshots were taken.
(33) System administrators use stored snapshots, in conjunction with the changed data that the snapshot indexes or tracks, to recover lost or corrupted information.
(34) When the user selects a snapshot, the storage manager performs a query of the replication volume table to identify all previous snapshots for an information store from which the selected snapshot was taken, step 404. This may be accomplished by performing a search on the replication volume table for all snapshots with the same PrimaryVolumeId or PhysicalVolumeId. Starting with the selected snapshot, for each snapshot in the query result, loop 406, the storage manager directs a given media agent, in conjunction with a given data agent, to read and restore all clusters of changed data not already restored from clusters indexed by a prior snapshot, e.g., the latest version of each cluster, step 408. According to one embodiment, this is accomplished by restoring the clusters indexed by each of the snapshots in the query result, starting with the original snapshot, and overwriting clusters indexed by the original snapshot with changed clusters indexed by subsequent snapshots up to the snapshot representing the point in time selected by the user or system process. As an alternative, the last snapshot field of the selected snapshot may be utilized to determine the snapshots that should be utilized in the restore operation. The latest version of each cluster, starting with those indexed by the selected snapshot, is then restored, step 408.
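Both restore orders described above can be sketched in a few lines. The sketch assumes each snapshot is represented as a dictionary mapping cluster numbers to the cluster data it indexes, and that the list is already ordered t0, t1, and so on (for example, by CurrentRole); the function names are hypothetical, and cluster deletion is not modeled.

```python
from typing import Dict, List

Snapshot = Dict[int, bytes]  # cluster number -> cluster data indexed by the snapshot


def restore_oldest_first(snapshots: List[Snapshot], selected: int) -> Snapshot:
    """Restore t0, then overwrite with changed clusters up to the selected snapshot."""
    volume: Snapshot = {}
    for snapshot in snapshots[: selected + 1]:
        volume.update(snapshot)  # later snapshots overwrite earlier clusters
    return volume


def restore_newest_first(snapshots: List[Snapshot], selected: int) -> Snapshot:
    """Start with the selected snapshot; restore only clusters not already restored."""
    volume: Snapshot = {}
    for snapshot in reversed(snapshots[: selected + 1]):
        for number, data in snapshot.items():
            volume.setdefault(number, data)  # keep the latest version of each cluster
    return volume
```

The two functions produce the same volume; the newest-first order avoids writing a cluster that a later snapshot would immediately overwrite.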
(35) As discussed above, embodiments of the invention are contemplated wherein FAT information of the information store is stored in conjunction with a given snapshot, e.g. the file and folder information corresponding to the clusters of changed data indexed by a given snapshot. Accordingly, the storage manager may allow the user to select individual files and/or folders to be selected for restoration from a snapshot. With reference to
(36) When the user desires to restore the information store to a given point in time, the user interface allows the user to view the files and folders indexed by a snapshot representing the point in time as if the user were viewing a folder structure on a storage device, step 500. The storage manager retrieves the file and folder information for changed data that is indexed by one or more snapshots for display. Once one or more files and/or folders are selected, step 502, the storage manager selects those snapshots that index the given version of the files and/or folders using the replication volume table, step 502. Each snapshot indexing data for the one or more files to be restored is opened serially, loop 506. The changed data for the selected files and folders that is indexed by the snapshots is restored from clusters indexed by each snapshot, step 508, but not overwriting clusters indexed by prior snapshots.
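The file-level restore described above can be sketched as follows. The representation is an assumption: each snapshot is modeled as a dictionary keyed by (file name, file part) holding the cluster data it indexes, snapshots are ordered t0, t1, and so on, and the function name `restore_file` is hypothetical. Snapshots are opened serially from the selected one backward, and a part already restored is never overwritten by an older snapshot.

```python
from typing import Dict, List, Tuple

FileSnapshot = Dict[Tuple[str, int], bytes]  # (file name, file part) -> cluster data


def restore_file(snapshots: List[FileSnapshot], selected: int, name: str) -> bytes:
    """Reassemble one file as it existed at the selected snapshot."""
    parts: Dict[int, bytes] = {}
    for snapshot in reversed(snapshots[: selected + 1]):
        for (entry_name, part), data in snapshot.items():
            # Keep the newest version of each part; skip older copies.
            if entry_name == name and part not in parts:
                parts[part] = data
    return b"".join(parts[p] for p in sorted(parts))
```

For example, with a first snapshot holding parts 1 and 2 of a file and a second snapshot holding only a changed part 2, restoring at the second snapshot combines part 1 from the first with part 2 from the second.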
(37) While the invention has been described and illustrated in connection with preferred embodiments, many variations and modifications as will be evident to those skilled in this art may be made without departing from the spirit and scope of the invention, and the invention is thus not to be limited to the precise details of methodology or construction set forth above as such variations and modifications are intended to be included within the scope of the invention.