G06F16/125

DYNAMIC SNAPSHOT SCHEDULING USING STORAGE SYSTEM METRICS
20230222094 · 2023-07-13

Dynamic snapshot scheduling techniques are provided using storage system metrics. One method comprises obtaining a schedule for generating snapshots of a portion of a storage system; automatically adjusting snapshot generation parameters in the schedule based on: (i) a current storage pool usage metric, (ii) an input/output metric of at least one storage resource in the portion of the storage system, (iii) a measure of snapshots in a destroying state, and/or (iv) a measure of a number of created snapshots; and initiating a generation of a snapshot of the storage system portion in accordance with the adjusted schedule. A snapshot generation frequency may be increased in response to an increase of: the current storage pool usage metric, the number of snapshots in the destroying state, and/or the number of created snapshots. A snapshot generation frequency may be decreased in response to an increase of the I/O metric of the at least one storage resource.
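The adjustment rule in this abstract can be illustrated with a minimal sketch. The abstract states only the direction of each adjustment, so everything else here is an assumption made for illustration: the `Metrics` fields, the `adjust_interval` function, the fixed 10% step per metric, and the interval bounds.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical metric sample; field names are illustrative only."""
    pool_usage: float            # fraction of the storage pool in use, 0..1
    io_load: float               # normalized I/O metric of the resource, 0..1
    destroying_snapshots: int    # snapshots currently in the destroying state
    created_snapshots: int       # number of created snapshots


def adjust_interval(base_interval_s: float, prev: Metrics, cur: Metrics,
                    step: float = 0.1, min_s: float = 60.0,
                    max_s: float = 86400.0) -> float:
    """Return an adjusted interval (seconds) between snapshots.

    A shorter interval means a higher snapshot generation frequency.
    The step size and bounds are assumptions, not values from the abstract.
    """
    factor = 1.0
    # An increase in any of these three metrics raises the frequency.
    if cur.pool_usage > prev.pool_usage:
        factor -= step
    if cur.destroying_snapshots > prev.destroying_snapshots:
        factor -= step
    if cur.created_snapshots > prev.created_snapshots:
        factor -= step
    # An increase in the I/O metric lowers the frequency.
    if cur.io_load > prev.io_load:
        factor += step
    # Clamp so the schedule never becomes unreasonably fast or slow.
    return min(max_s, max(min_s, base_interval_s * factor))
```

With a one-hour base interval, a rise in pool usage alone shortens the interval, while a rise in I/O load alone lengthens it. A real scheduler would likely weight the metrics rather than apply equal steps.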

Performing secondary copy operations based on deduplication performance

An improved information management system is described herein in which the information management system can evaluate the deduplication performance of secondary copy operations and dynamically adjust the manner in which secondary copy data is created to minimize the negative effects of performing deduplication. Furthermore, the improved information management system can improve deduplication performance by applying different storage policies to different types of applications running on a client computing device. Moreover, the improved information management system can automatically detect the region of a client computing device and apply an appropriate information management policy to the client computing device to avoid inconsistencies or other errors resulting from administrator control.

Optimizing garbage collection based on survivor lifetime prediction
11550712 · 2023-01-10

A predictive method for scheduling operations is described. The method uses data generated by computing an expected lifetime of individual files or objects within a container. The expected lifetime of individual files or objects can be generated using machine learning techniques. Operations such as garbage collection are scheduled at an epoch at which computational efficiencies are realized for performing the operation.

Scalable architectures for reference signature matching and updating

Methods, apparatus, systems and articles of manufacture are disclosed for scalable architectures for reference signature matching and updating. An example method includes accessing site signatures to be compared to reference signatures from a first group of media sources. The example method also includes determining whether a first reference node is an owner of a first one of the site signatures; comparing a neighborhood of site signatures including the first site signature to reference signatures in a first subset of reference signatures when the first reference node is the owner of the first site signature, the first subset of reference signatures being stored in a first memory partition associated with the first reference node; and not comparing the first site signature to reference signatures when the first reference node is not the owner of the first site signature.
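The owner-gated comparison can be sketched as follows. The abstract does not say how ownership is determined or how a neighborhood is defined, so this sketch assumes integer signatures, ownership by modulo over the node count, and a neighborhood defined as a fixed window of adjacent site signatures. The names `owner_of` and `match_on_node` are hypothetical.

```python
def owner_of(signature: int, num_nodes: int) -> int:
    """Assumed ownership rule: modulo over the number of reference nodes."""
    return signature % num_nodes


def match_on_node(node_id: int, num_nodes: int, site_sigs: list,
                  partition: set, radius: int = 1) -> list:
    """Compare site signatures against this node's partition of references.

    Only signatures owned by `node_id` trigger a comparison. For each owned
    signature, the surrounding window of site signatures (the neighborhood)
    is checked against the node's reference partition.
    """
    matches = []
    for i, sig in enumerate(site_sigs):
        if owner_of(sig, num_nodes) != node_id:
            continue  # not the owner: skip the comparison entirely
        neighborhood = site_sigs[max(0, i - radius): i + radius + 1]
        hits = [s for s in neighborhood if s in partition]
        if hits:
            matches.append((i, hits))
    return matches
```

Because each node skips signatures it does not own, the comparison work is divided across nodes without any node scanning another node's memory partition.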

Systems and methods for a specialized computer file system

A computer file system for managing data storage resources is provided. The system comprises a storage server configured to receive at least one data file from a client application, modify the file name to include an expiration stamp, upload the at least one data file to a data storage device, generate a file link associated with the at least one data file, and transmit the file link to the client application, wherein the at least one data file is retrievable by an end user via the file link. A maintenance server is communicatively coupled to the data storage device and is configured to execute an erase operation that autonomously erases the at least one data file from the data storage device based on the expiration stamp.
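One possible way to embed an expiration stamp in a file name and check it later is sketched below. The stamp format (`__exp` followed by a UTC timestamp before the extension) and the function names `stamp_file_name` and `is_expired` are assumptions made for illustration; the abstract does not specify them.

```python
import os
import re
from datetime import datetime, timezone

# Hypothetical stamp format: "__exp" + UTC timestamp, e.g. __exp20240102T030405Z
_STAMP_RE = re.compile(r"__exp(\d{8}T\d{6}Z)")


def stamp_file_name(file_name: str, expires_at: datetime) -> str:
    """Insert an expiration stamp before the file extension.

    `expires_at` is assumed to already be expressed in UTC.
    """
    stem, ext = os.path.splitext(file_name)
    return f"{stem}__exp{expires_at:%Y%m%dT%H%M%SZ}{ext}"


def is_expired(stamped_name: str, now: datetime) -> bool:
    """Return True if the stamp in the name is at or before `now` (UTC)."""
    match = _STAMP_RE.search(stamped_name)
    if match is None:
        return False  # no stamp found: never erase automatically
    expires_at = datetime.strptime(match.group(1), "%Y%m%dT%H%M%SZ")
    return now >= expires_at.replace(tzinfo=timezone.utc)
```

Keeping the expiration in the name lets a maintenance process decide what to erase from a directory listing alone, without opening files or consulting a separate database.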

Utilizing machine learning to determine data storage pruning parameters

A device receives, from a user device, a request to prune a primary database, and receives primary database information associated with the primary database and secondary database information associated with a secondary database that is different than the primary database. The device processes the primary database information and the secondary database information, with a machine learning model, to generate suggested pruning parameters, and provides the suggested pruning parameters to the user device. The device receives selected pruning parameters from the user device, where the selected pruning parameters are selected from the suggested pruning parameters or are input via the user device. The device removes pruned information from the primary database based on the selected pruning parameters, and provides the pruned information to the secondary database based on the selected pruning parameters.

TRACKING DATA LINEAGE AND APPLYING DATA REMOVAL TO ENFORCE DATA REMOVAL POLICIES

A graph tracks the lineage of customer data, including when it was originally extracted from a customer computing system, and any transformation results indicating transformations that were performed on the customer data. The graph is traversed to identify nodes in the graph that have expired based upon data removal policies. The customer data represented by the expired nodes in the graph is deleted and the graph is modified to delete the expired nodes. The modified graph is then stored in persistent memory until data removal is next triggered.
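The traverse-then-delete cycle can be sketched under simple assumptions: a single age-based retention policy, a graph stored as a dictionary of nodes, and a caller-supplied `delete_data` callback. The `Node` class and both function names are hypothetical, and real data removal policies would likely be richer than one retention period.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Node:
    """Hypothetical lineage node: extracted data or a transformation result."""
    node_id: str
    created_at: datetime
    children: list = field(default_factory=list)  # ids of derived results


def find_expired(graph: dict, roots: list, now: datetime,
                 retention: timedelta) -> set:
    """Breadth-first traversal collecting nodes older than the retention period."""
    expired, seen, queue = set(), set(roots), deque(roots)
    while queue:
        node = graph[queue.popleft()]
        if now - node.created_at >= retention:
            expired.add(node.node_id)
        for child in node.children:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return expired


def remove_expired(graph: dict, expired: set, delete_data) -> None:
    """Delete the underlying data, then drop expired nodes and dangling edges."""
    for node_id in expired:
        delete_data(node_id)
        del graph[node_id]
    for node in graph.values():
        node.children = [c for c in node.children if c not in expired]
```

After `remove_expired` returns, the modified graph can be persisted until the next removal trigger, as the abstract describes.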

Method, apparatus and computer program product for improving data indexing in a group-based communication platform

Methods, apparatus and computer program products for improving data indexing in a group-based communication platform are described herein. The group-based communication platform has a computed collection and one or more live collections. The computer-implemented method includes generating a new collection, the new collection being generated at a snapshot time point; associating a collection manager with the new collection; retrieving a plurality of electronic messages from the computed collection and the one or more live collections; writing the plurality of electronic messages to the new collection, the writing being completed at a cut-over time point; synchronizing the new collection with the one or more live collections based on the plurality of electronic messages; and redirecting a read alias and a write alias from the computed collection to the new collection.

Criterion-based retention of data object versions

A method and apparatus for criterion-based retention of data object versions are disclosed. In the method and apparatus, a plurality of keys are sorted in accordance with an ordering scheme, whereby a key of the plurality of keys has an associated version of a data object and a timestamp. The key is inspected in accordance with the ordering scheme to determine based at least in part on the timestamp whether a criterion for performing an action on the associated version of the data object is satisfied. If the criterion is satisfied, a marker key is added to the plurality of keys, whereby the marker key precedes the inspected key according to the ordering scheme and indicates that the criterion is satisfied.
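The marker-key idea can be sketched with tuple keys and ordinary tuple ordering. The key layout `(name, version, kind)`, the age-based criterion, and the `add_markers` function are assumptions made for illustration. The kind value `MARKER = 0` sorts before `VERSION = 1`, so a marker lands immediately before the key it annotates.

```python
import bisect

MARKER, VERSION = 0, 1  # assumed kinds; MARKER sorts before VERSION


def add_markers(keys: list, timestamps: dict, now: float,
                max_age: float) -> list:
    """Insert a marker key before each version key that meets the criterion.

    keys:       sorted list of (name, version, kind) tuples, modified in place
    timestamps: maps (name, version) to the creation time of that version
    The criterion here (age greater than max_age) is only an example.
    """
    for name, version, kind in list(keys):  # iterate over a copy
        if kind != VERSION:
            continue
        if now - timestamps[(name, version)] > max_age:
            marker = (name, version, MARKER)
            i = bisect.bisect_left(keys, marker)
            if i == len(keys) or keys[i] != marker:  # avoid duplicate markers
                keys.insert(i, marker)
    return keys
```

Because the marker precedes its key in the ordering, a later sequential scan meets the marker first and knows the criterion has already been evaluated for the version that follows.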

Data processing systems for generating personal data receipts and related methods

A method comprises identifying one or more pieces of personal data associated with a data subject based at least in part on one or more triggering actions; identifying a storage location of each of the one or more pieces of personal data associated with the data subject; automatically determining that a first portion of the one or more pieces of personal data has one or more legal bases for continued storage; automatically maintaining storage of the first portion of the one or more pieces of personal data; and automatically facilitating deletion of a second portion of the one or more pieces of personal data associated with the data subject.