Patent classifications
G06F16/1748
Systems and methods for document search and aggregation with reduced bandwidth and storage demand
Methods and systems comprising a gateway coordinator of a local system that receives a task comprising search criteria, crawls for files on a local data source of the local system, and encounters one or more files of interest. The one or more files of interest may be deNISTed and deduplicated and sent to an upload coordinator of a remote cloud facility. In one or more examples, the gateway coordinator may be a virtual machine.
INLINE DEDUPLICATION BETWEEN NODES IN STORAGE SYSTEMS
Techniques described herein coordinate inline deduplication among nodes in a storage system. The method includes storing, in a page descriptor ring on a node, data and a fingerprint associated with the data in an entry. The method includes determining that a flushing work set (FWS) has been frozen. The node identifies, in the page descriptor ring, entries associated with the frozen FWS and having fingerprints with a parity associated with the node. The node deduplicates the entries based on a fingerprint database on the node. The node synchronizes deduplication of the frozen FWS with a peer node, so as to receive deduplication results concerning entries having fingerprints with a parity associated with the peer node. The node replaces entries in the page descriptor ring with the deduplication results from the peer node, and flushes entries in the frozen FWS to a storage device.
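The parity-based split of work described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Entry` and `Node` types, the use of a truncated SHA-256 digest as the fingerprint, the even/odd ownership rule, and the direct method call standing in for the node-to-peer synchronization are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    fws_id: int             # flushing work set this entry belongs to
    data: Optional[bytes]   # None once the entry is marked as a duplicate
    fingerprint: int


def fingerprint(data: bytes) -> int:
    # Assumption: a truncated SHA-256 digest stands in for the fingerprint.
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class Node:
    def __init__(self, parity: int):
        self.parity = parity         # 0 owns even fingerprints, 1 owns odd
        self.fingerprint_db = set()  # fingerprints this node has already seen

    def dedup_frozen_fws(self, ring, fws_id):
        """Return {ring index: is_duplicate} for the entries this node owns."""
        results = {}
        for idx, entry in enumerate(ring):
            if entry.fws_id == fws_id and entry.fingerprint % 2 == self.parity:
                results[idx] = entry.fingerprint in self.fingerprint_db
                self.fingerprint_db.add(entry.fingerprint)
        return results


def flush_frozen_fws(ring, fws_id, node, peer):
    local = node.dedup_frozen_fws(ring, fws_id)
    remote = peer.dedup_frozen_fws(ring, fws_id)  # stands in for the sync step
    for idx, is_duplicate in {**local, **remote}.items():
        if is_duplicate:
            ring[idx].data = None  # replace the payload with the dedup result
    # "Flush": return the payloads that still need to reach the storage device.
    return [e.data for e in ring if e.fws_id == fws_id and e.data is not None]
```

Because each fingerprint has exactly one owner, the two nodes never make conflicting decisions about the same entry, which is the property the parity rule is meant to provide in this sketch.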
CONTAINER-BASED ERASURE CODING
A repository of replicated chunk files is analyzed to identify chunk files that meet at least a portion of combination criteria. Selected chunk files are associated together under a data protection grouping container. Erasure coding is applied to the data protection grouping container including by utilizing the selected chunk files as different data stripes of the erasure coding and generating one or more parity stripes based on the different data stripes.
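To illustrate how chunk files can serve as data stripes, here is a minimal sketch that uses a single XOR parity stripe. The abstract allows one or more parity stripes and does not name a code, so XOR parity, the zero padding to a common stripe length, and the function names `build_container` and `recover` are assumptions for this example only.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def build_container(chunks):
    """Group chunk files as data stripes and compute one XOR parity stripe."""
    stripe_len = max(len(c) for c in chunks)
    # Pad shorter chunks with zero bytes so all stripes have the same length.
    data = [c.ljust(stripe_len, b"\0") for c in chunks]
    parity = bytes(stripe_len)
    for stripe in data:
        parity = xor_bytes(parity, stripe)
    return {"lengths": [len(c) for c in chunks], "data": data, "parity": parity}


def recover(container, lost: int) -> bytes:
    """Rebuild the stripe at index `lost` from the parity and the survivors."""
    acc = container["parity"]
    for i, stripe in enumerate(container["data"]):
        if i != lost:
            acc = xor_bytes(acc, stripe)
    return acc[: container["lengths"][lost]]  # drop the padding
```

A single XOR parity stripe tolerates the loss of one data stripe. Tolerating more losses would need a stronger code such as Reed-Solomon, which is outside this sketch.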
Constant time updates after memory deduplication
Systems and methods are described for resource-efficient memory deduplication and write-protection. In an example, a method includes receiving, by a computing device having a processor, a request to assess deduplication for a plurality of candidate files. The computing device may perform one or more iterative steps for deduplication. The iterative steps may include: receiving, from the plurality of candidate files, a candidate file that is not write-protected; determining, based on a predetermined Bernoulli distribution, a decision to write-protect the candidate file; rendering the candidate file as a write-protected candidate file; determining, based on a review of other candidate files from the plurality of candidate files, that the write-protected candidate file can be deduplicated; and deduplicating the write-protected candidate file.
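The iterative steps above can be sketched as a single pass over the candidates. This is only an illustration under stated assumptions: files are modeled as an in-memory mapping of name to content, the "review of other candidate files" is limited to files that are already write-protected, and the names `dedup_pass`, `protected`, and `aliases` are hypothetical.

```python
import random


def dedup_pass(files, p, rng):
    """One pass: write-protect each candidate with probability p, then dedupe.

    files: dict mapping file name to content (bytes)
    Returns (protected names, {deduplicated name: name of the kept copy}).
    """
    protected = set()
    aliases = {}
    for name in list(files):
        if name in protected:
            continue
        # Bernoulli(p) decision: protect this candidate or leave it writable.
        if rng.random() >= p:
            continue
        protected.add(name)
        # Assumption: only compare against other write-protected files,
        # since their content can no longer change underneath us.
        for other in protected:
            if other != name and other not in aliases and files[other] == files[name]:
                aliases[name] = other
                break
    return protected, aliases
```

For example, `dedup_pass(files, 0.5, random.Random())` protects roughly half of the candidates per pass, so repeated passes cover the remaining files gradually.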
LARGE OBJECT PACKING FOR STORAGE EFFICIENCY
One example method includes receiving data, partitioning the data according to respective similarity groups, where the similarity groups collectively define a range of similarity groups, deduplicating the data after the partitioning, packing the unique data segments remaining after deduplication into one or more compression regions, compressing the compression regions, and writing an object that includes the compression regions to a durable log. The deduplicating and compressing for a similarity group may be performed by a dedup-compression instance uniquely assigned to that similarity group.
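The pipeline of partitioning, deduplicating, packing, and compressing can be sketched as below. The similarity function (a hash of the first 16 bytes of a segment), the region size limit, the use of `zlib`, and the function names are assumptions chosen for illustration. The returned list stands in for the object written to the durable log.

```python
import hashlib
import zlib
from collections import defaultdict


def similarity_group(segment: bytes, num_groups: int) -> int:
    # Assumption: the group is derived from a hash of the segment's first bytes.
    return hashlib.sha1(segment[:16]).digest()[0] % num_groups


def pack_object(segments, num_groups=4, region_size=64):
    """Return the compressed regions that would make up one object."""
    groups = defaultdict(list)
    for seg in segments:
        groups[similarity_group(seg, num_groups)].append(seg)

    regions = []
    for gid in sorted(groups):          # one dedup-compression pass per group
        seen, unique = set(), []
        for seg in groups[gid]:
            fp = hashlib.sha256(seg).digest()
            if fp not in seen:          # keep only the first copy of a segment
                seen.add(fp)
                unique.append(seg)
        region = b""
        for seg in unique:
            # Start a new region when the current one would exceed the limit.
            if region and len(region) + len(seg) > region_size:
                regions.append(zlib.compress(region))
                region = b""
            region += seg
        if region:
            regions.append(zlib.compress(region))
    return regions
```

Identical segments always land in the same similarity group, so deduplicating within each group is enough to remove exact duplicates in this sketch.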
Management of encryption agents in data storage systems
A method for managing keys and encrypting data is provided. The method includes receiving data to be written to a logical disk, generating an encryption table indicating one or more locations on the logical disk for storing the data and indicating a key used for encrypting the data, encrypting the data to be written to the logical disk, and transmitting the encrypted data and the encryption table to a storage array.
Data Storage Arrangement and Method for Anonymization Aware Deduplication
A data storage arrangement includes a memory and a controller, where the controller receives an indication of data to be anonymized. The controller further parses a data element to be stored and generates a copy of one or more data portions to be anonymized. The controller further deletes the one or more data portions to be anonymized to generate a modified data element to be stored. The controller further generates a copy of the modified data element to be stored utilizing deduplication. The data storage arrangement thus takes data anonymization into account during deduplication (i.e., anonymization-aware deduplication).
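A minimal sketch of the idea follows, assuming data elements are flat dictionaries and the "indication of data to be anonymized" is a set of field names. The class `AnonymizingStore` and its attributes are hypothetical. The point it illustrates is that removing the sensitive portions before hashing lets otherwise identical elements deduplicate.

```python
import hashlib
import json


class AnonymizingStore:
    def __init__(self, sensitive_keys):
        self.sensitive_keys = set(sensitive_keys)  # fields to be anonymized
        self.blocks = {}    # digest -> modified (anonymized) element
        self.private = []   # (digest, copy of the removed portions)

    def put(self, element: dict) -> str:
        # Copy the portions to be anonymized, then delete them from the element.
        removed = {k: v for k, v in element.items() if k in self.sensitive_keys}
        modified = {k: v for k, v in element.items() if k not in self.sensitive_keys}
        payload = json.dumps(modified, sort_keys=True).encode()
        digest = hashlib.sha256(payload).hexdigest()
        self.blocks.setdefault(digest, payload)    # store only if not a duplicate
        self.private.append((digest, removed))
        return digest
```

Two records that differ only in a sensitive field, such as a name, map to the same digest and therefore share a single stored block.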
WORD AWARE CONTENT DEFINED CHUNKING
One example method includes, in a data buffer that includes one or more words and whitespaces, calculating a hash value of data in a window that is movable within the data buffer, comparing the hash value to a mask, and when the hash value matches the mask, identifying a position of the window in the data buffer as a chunk anchor position, searching for a whitespace nearest the chunk anchor position, and designating an offset of the whitespace as a segment boundary.
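The chunking steps above can be sketched as follows. Several details are assumptions: the window hash is recomputed with `zlib.adler32` rather than maintained as a rolling hash, "matches the mask" is interpreted as all masked bits being set, and the nearest whitespace is found by searching outward in both directions from the anchor.

```python
import zlib


def nearest_whitespace(buf: bytes, anchor: int):
    """Return the offset of the whitespace byte closest to `anchor`, or None."""
    for distance in range(len(buf)):
        for i in (anchor - distance, anchor + distance):
            if 0 <= i < len(buf) and buf[i:i + 1].isspace():
                return i
    return None


def word_aware_boundaries(buf: bytes, window: int = 8, mask: int = 0x0F):
    """Return segment boundaries snapped to whitespace offsets."""
    boundaries = []
    for pos in range(window, len(buf) + 1):
        # Hash the bytes currently covered by the sliding window.
        h = zlib.adler32(buf[pos - window:pos])
        if (h & mask) == mask:           # assumed meaning of "matches the mask"
            ws = nearest_whitespace(buf, pos)   # pos is the chunk anchor
            # Keep boundaries strictly increasing so segments never overlap.
            if ws is not None and (not boundaries or ws > boundaries[-1]):
                boundaries.append(ws)
    return boundaries
```

Snapping the anchor to whitespace means a segment boundary never falls inside a word, so a small edit to one word is less likely to shift the boundaries of neighboring segments.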
OBJECT STORAGE-BASED INDEXING SYSTEMS AND METHOD
A file system and a related method are presented. The file system includes an object storage configured to store file data for one or more files and a plurality of namespace entries corresponding to file data and/or metadata of the one or more files as one or more objects. Each namespace entry of the plurality of namespace entries includes an operation type conducted on the file data and/or metadata captured in a particular snapshot and a version number corresponding to the particular snapshot. The file system further includes an indexing system configured to generate the plurality of namespace entries; store the plurality of namespace entries as one or more objects in the object storage; and identify, in response to a search query, one or more files for retrieval from the object storage based on a list of the plurality of namespace entries sorted on the version numbers.
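One way to picture a search over version-sorted namespace entries is the sketch below. The `NamespaceEntry` fields, the operation names (`create`, `modify`, `delete`), and the replay logic in `files_at` are assumptions for illustration. The abstract only states that each entry records an operation type and a snapshot version number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespaceEntry:
    path: str
    op: str        # assumed values: "create", "modify", "delete"
    version: int   # version number of the snapshot that captured the operation


def files_at(entries, version):
    """Return the paths that exist as of the given snapshot version."""
    live = set()
    # Replay the operations in version order, stopping at the requested version.
    for entry in sorted(entries, key=lambda e: e.version):
        if entry.version > version:
            break
        if entry.op == "delete":
            live.discard(entry.path)
        else:
            live.add(entry.path)
    return live
```

Sorting on the version number is what lets the query stop early: every entry after the requested version can be ignored.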
FILE RESTORE PERFORMANCE USING A FILE HANDLER TO DISASSOCIATE PREFETCH AND READ STREAMS
Embodiments of a small file restore process in a deduplication file system, wherein restoration requires issuing a read request within an I/O request to the file system. The process places the files in a prefetch queue such that a combined size of the files meets or exceeds a size of the prefetch queue as defined by a prefetch horizon. A file handler disassociates prefetch streams from read streams. The handler prefetches the read operations and stores them in memory. The stream corresponding to a read opens only when the read request hits the queue processor. As a result, stream usage is very low, since the I/O, worker threads, and read streams are disassociated from each other.