Patent classifications
G06F16/1752
Source file copying and error handling
Object service receives request to copy file to destination and identifies group identifier for fingerprints group corresponding to sequential segments in file. Object service communicates request for fingerprints group to deduplication service associated with group identifier range including group identifier. Deduplication service communicates fingerprints group, retrieved from fingerprint storage, to object service, which communicates fingerprints group and group identifier to destination. Object service communicates request for file segments, corresponding to fingerprints missing in destination, communicated from destination, to deduplication service, which communicates requested segments, retrieved from source storage, to object service, which communicates requested segments to destination. System identifies generation identifier associated with time of communicating by object service or deduplication service, and generation identifier associated with another time of communicating by object service or deduplication service. If generation identifier associated with time differs from generation identifier associated with other time, object service or deduplication service restarts communication.
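The exchange described above can be sketched roughly as follows. This is only an illustration under stated assumptions: SHA-256 stands in for the fingerprint function, the destination is modeled as a dictionary keyed by fingerprint, and `copy_file`, `read_generation`, and `max_restarts` are hypothetical names that do not appear in the abstract.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    """Assumed fingerprint function: SHA-256 of the segment bytes."""
    return hashlib.sha256(segment).hexdigest()


def copy_file(segments, destination, read_generation, max_restarts=3):
    """Copy only the segments whose fingerprints the destination lacks.

    destination: dict mapping fingerprint -> segment bytes (hypothetical model).
    read_generation: callable returning the current generation identifier.
    Returns the fingerprints group and the number of segments actually sent.
    """
    for _ in range(max_restarts + 1):
        generation_at_start = read_generation()
        # Fingerprints group for the sequential segments of the file.
        group = [fingerprint(s) for s in segments]
        # The destination reports which fingerprints it is missing.
        missing = {fp for fp in group if fp not in destination}
        staged = {fingerprint(s): s for s in segments if fingerprint(s) in missing}
        # A changed generation identifier means a service restarted mid-copy,
        # so the communication is restarted instead of committed.
        if read_generation() != generation_at_start:
            continue
        destination.update(staged)
        return group, len(staged)
    raise RuntimeError("generation identifier kept changing; copy abandoned")
```

Staging the segments and committing them only after the second generation check is a design choice of this sketch, not something the abstract specifies.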
RECOVERING FREE SPACE IN NONVOLATILE STORAGE WITH A COMPUTER STORAGE SYSTEM SUPPORTING SHARED OBJECTS
To identify objects shared by entities and to, in turn, identify free space in nonvolatile storage, a computer system uses a probabilistic data structure which tests whether an element is a member of a set. Such probabilistic data structures are created for entities in the storage system that share objects. The probabilistic data structure for an entity represents the objects that are used by that entity. When an entity is deleted, each object used by that entity is compared to the probabilistic data structures of other entities to determine if there is a likelihood that the object is used by one or more of the other entities. If the likelihood determined for an object is above an acceptable threshold, then the object is not deleted. If the likelihood determined for an object is below the set threshold, then the object can be deleted and the corresponding storage locations can be marked as free.
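A Bloom filter is one well-known probabilistic data structure of this kind, and the deletion check can be sketched with it. The sketch below is a simplification under stated assumptions: it uses a yes/no "might contain" answer instead of the likelihood threshold the abstract mentions, and `BloomFilter`, `reclaimable`, and the sizing parameters are hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # an int used as a bit array

    def _positions(self, item: str):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(item))


def reclaimable(deleted_entity_objects, other_filters):
    """Objects of a deleted entity that no other entity's filter claims."""
    return [
        obj
        for obj in deleted_entity_objects
        if not any(f.might_contain(obj) for f in other_filters)
    ]
```

Because a Bloom filter never produces false negatives, an object that another entity really uses is never reported as reclaimable; a false positive only delays freeing space.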
Optimized client-side deduplication
One example method includes optimizing client-side deduplication. When backing up a client, an overwrite ratio is determined based on a size of actual changes made to a volume and a size indicated by changes in a change log. Client-side deduplication is enabled or disabled based on a value of the overwrite ratio.
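The decision can be illustrated with a small calculation. The abstract does not say which direction of the ratio enables deduplication or what threshold is used, so both are assumptions here, as are the function names.

```python
def overwrite_ratio(actual_changed_bytes: int, logged_changed_bytes: int) -> float:
    """Size of actual changes divided by the size indicated by the change log."""
    if logged_changed_bytes <= 0:
        raise ValueError("change log must indicate a positive size")
    return actual_changed_bytes / logged_changed_bytes


def enable_client_side_dedup(actual_changed_bytes, logged_changed_bytes,
                             threshold=0.5):
    # Assumption: a low ratio (many logged writes hit the same blocks) is taken
    # as the signal to enable deduplication. The abstract only states that the
    # decision depends on the value of the ratio.
    return overwrite_ratio(actual_changed_bytes, logged_changed_bytes) < threshold
```

For example, 40 MiB of actual changes against 100 MiB indicated by the change log gives a ratio of 0.4, which falls below the assumed threshold of 0.5.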
Block-level single instancing
Described in detail herein are systems and methods for single instancing blocks of data in a data storage system. For example, the data storage system may include multiple computing devices (e.g., client computing devices) that store primary data. The data storage system may also include a secondary storage computing device, a single instance database, and one or more storage devices that store copies of the primary data (e.g., secondary copies, tertiary copies, etc.). The secondary storage computing device receives blocks of data from the computing devices and accesses the single instance database to determine whether the blocks of data are unique (meaning that no instances of the blocks of data are stored on the storage devices). If a block of data is unique, the single instance database stores it on a storage device. If not, the secondary storage computing device can avoid storing the block of data on the storage devices.
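The uniqueness check can be sketched with a hash-keyed index. This is a minimal illustration, assuming SHA-256 digests identify blocks and modeling the storage devices as an in-memory list; `SingleInstanceStore` and its members are hypothetical names.

```python
import hashlib


class SingleInstanceStore:
    def __init__(self):
        self.index = {}    # digest -> location; stands in for the single instance database
        self.storage = []  # stored blocks; stands in for the storage devices

    def put(self, block: bytes) -> int:
        """Store the block only if no instance exists; return its location."""
        digest = hashlib.sha256(block).hexdigest()
        if digest in self.index:
            # Not unique: avoid storing a second instance.
            return self.index[digest]
        self.storage.append(block)
        location = len(self.storage) - 1
        self.index[digest] = location
        return location
```

Writing the same block twice returns the same location and leaves only one stored copy.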
SYSTEM AND METHOD FOR DATA COMPACTION AND SECURITY WITH EXTENDED FUNCTIONALITY
A system and method for highly efficient encoding of data that includes extended functionality for asymmetric encoding/decoding and network policy enforcement. In the case of asymmetric encoding/decoding, the original data is encoded by an encoder according to a codebook and sent to a decoder, but the output of the decoder depends on data manipulation rules applied at the decoding stage to transform the decoded data into a different data set from the original data. In the case of network policy enforcement, a behavior appendix is incorporated into the codebook, such that the encoder and/or decoder at each node of the network complies with network behavioral rules, limits, and policies during encoding and decoding.
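Asymmetric encoding/decoding can be illustrated with a toy codebook. This sketch assumes the codebook is a plain mapping from data units to codewords and that manipulation rules are functions applied to each decoded unit; `encode`, `decode`, and `rules` are hypothetical names, not the patent's interfaces.

```python
def encode(data, codebook):
    """Replace each data unit with its codeword from the codebook."""
    return [codebook[unit] for unit in data]


def decode(codewords, codebook, rules=()):
    """Invert the codebook, then apply decoder-side manipulation rules."""
    reverse = {code: unit for unit, code in codebook.items()}
    decoded = [reverse[code] for code in codewords]
    for rule in rules:
        # Each rule transforms the decoded data, so the decoder's output
        # can differ from the data that was originally encoded.
        decoded = [rule(unit) for unit in decoded]
    return decoded


codebook = {"alpha": 0, "beta": 1}
codewords = encode(["alpha", "beta", "alpha"], codebook)
```

With no rules the decoder reproduces the input; passing `rules=(str.upper,)` yields an upper-cased data set from the same codewords.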
Data Storage Arrangement and Method for Anonymization Aware Deduplication
A data storage arrangement includes a memory and a controller, where the controller receives an indication of data to be anonymized. The controller further parses a data element to be stored and generates a copy of one or more data portions to be anonymized. The controller further deletes the one or more data portions to be anonymized to generate a modified data element to be stored. The controller further generates a copy of the modified data element to be stored utilizing deduplication. The data storage arrangement thus takes data anonymization into account during deduplication (i.e., anonymization-aware deduplication).
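The controller's steps can be sketched as follows, assuming a data element is a flat dictionary and the portions to be anonymized are identified by key. The function names and the hash-keyed `store` are hypothetical.

```python
import hashlib
import json


def split_for_anonymization(element: dict, sensitive_keys: set):
    """Copy out the portions to be anonymized and remove them from the element."""
    sensitive_copy = {k: v for k, v in element.items() if k in sensitive_keys}
    modified = {k: v for k, v in element.items() if k not in sensitive_keys}
    return sensitive_copy, modified


def store_deduplicated(modified: dict, store: dict) -> str:
    """Store the modified element once, keyed by a digest of its content."""
    payload = json.dumps(modified, sort_keys=True).encode()
    digest = hashlib.sha256(payload).hexdigest()
    store.setdefault(digest, payload)
    return digest


store = {}
_, first = split_for_anonymization({"name": "Ann", "report": "Q1"}, {"name"})
_, second = split_for_anonymization({"name": "Bob", "report": "Q1"}, {"name"})
digest_a = store_deduplicated(first, store)
digest_b = store_deduplicated(second, store)
```

Once the names are removed, the two elements are identical, so they share one stored copy and the sensitive portions can later be deleted without touching the deduplicated data.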
System and method for data compaction and security with extended functionality
A system and method for highly efficient encoding of data that includes extended functionality for asymmetric encoding/decoding and network policy enforcement. In the case of asymmetric encoding/decoding, the original data is encoded by an encoder according to a codebook and sent to a decoder, but the output of the decoder depends on data manipulation rules applied at the decoding stage to transform the decoded data into a different data set from the original data. In the case of network policy enforcement, a behavior appendix is incorporated into the codebook, such that the encoder and/or decoder at each node of the network complies with network behavioral rules, limits, and policies during encoding and decoding.
System and method for error-resilient data reduction
A system and method for error-resilient data reduction, utilizing a phase detector, a data requestor, a multi-phase trainer, a reconstruction engine, a deconstruction engine, and one or more reference codebooks. A multi-phase trainer may be used to train the reconstruction and deconstruction engines on various phase sourceblocks in order to recover quickly from corrupted data files that cause the sourceblocks to fall out of phase alignment. A phase detector may determine when the sourceblocks get out of phase and when they return to in-phase by checking whether a predetermined threshold probability of correct encoding is met. A data requestor may request retransmission of only the data that was received out of phase.
SELECTIVE DATA DEDUPLICATION IN A MULTITENANT ENVIRONMENT
Computer-implemented methods for selective data deduplication in a multitenant environment are disclosed. Data deduplication of blocks written to a storage area associated with a tenant and redundant copies of the blocks written to other storage areas of other tenants is permitted or prevented based on tagging the storage area associated with the tenant with a particular type of parameter. Responsive to detecting a write operation directed to the storage area tagged with a parameter indicating that deduplication is not permitted, a block to be written to the storage area is modified prior to hashing the block. Responsive to detecting a write operation directed to the storage area tagged with a parameter indicating that deduplication is permitted, a block to be written to the storage area is prevented from being modified prior to hashing the block.
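One way to picture the tagging behavior is to salt a block before hashing when its storage area forbids deduplication. The abstract does not say how the block is modified, so the tenant-specific salt, the tag values, and `block_digest` are assumptions.

```python
import hashlib

NO_DEDUP = "dedup-not-permitted"  # hypothetical tag values
DEDUP_OK = "dedup-permitted"


def block_digest(block: bytes, area_tag: str, tenant_salt: bytes) -> str:
    """Hash a block for the deduplication index, honoring the area's tag."""
    if area_tag == NO_DEDUP:
        # Modify the block before hashing so it never matches another
        # tenant's copy (assumed modification: prepend a per-tenant salt).
        block = tenant_salt + block
    # For DEDUP_OK the block is hashed unmodified, so identical blocks match.
    return hashlib.sha256(block).hexdigest()
```

Identical blocks in areas tagged as permitted produce equal digests and can be deduplicated, while a block in an area tagged as not permitted hashes to a different value.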
SELECTIVE DATA DEDUPLICATION IN A MULTITENANT ENVIRONMENT
A computer-implemented method for dynamic storage pricing in a multitenant environment is disclosed. The computer-implemented method includes dynamically modifying a storage cost for one or more tenants pointing to a block written to a storage area of the multitenant environment based, at least in part, on detecting a change in a number of tenants pointing to the block.
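The effect of a change in the number of tenants pointing to a block can be sketched with an equal-split pricing rule. The equal split is an assumption, since the abstract only states that the cost is modified when the tenant count changes; `per_tenant_cost` is a hypothetical name.

```python
from fractions import Fraction


def per_tenant_cost(block_cost, tenants):
    """Split a block's storage cost equally among the tenants pointing to it."""
    tenants = set(tenants)
    if not tenants:
        return {}
    share = Fraction(block_cost) / len(tenants)
    return {tenant: share for tenant in tenants}


before = per_tenant_cost(6, {"tenant-a", "tenant-b"})
after = per_tenant_cost(6, {"tenant-a", "tenant-b", "tenant-c"})
```

Under this rule, each of two tenants pays 3 for a block costing 6, and the share drops to 2 once a third tenant points to the same block.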