G06F16/174

Method and system of similarity-based deduplication

A method of similarity-based deduplication comprising the steps of: receiving an input data block; computing discrete wavelet transform (DWT) coefficients; extracting feature-related DWT data from the computed DWT coefficients; applying quantization to the extracted feature-related DWT data to obtain keys as results of the quantization; constructing a locality-sensitive fingerprint of the input data block; computing a similarity degree between the locality-sensitive fingerprint of the input data block and a locality-sensitive fingerprint of each data block in a plurality of data blocks in a cache memory; selecting an optimal reference data block as the data block; determining whether differential compression needs to be applied based on the similarity degree between the input data block and the optimal reference data block; and applying the differential compression to the input data block and the optimal reference data block.
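The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the claimed method: the Haar wavelet, the "largest coefficients" feature selection, the quantization step, the Jaccard similarity, and the 0.5 threshold are all choices made here for the example, and the names haar_dwt, fingerprint, similarity, and pick_reference are hypothetical. The differential compression step itself is left out.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Fingerprint = FrozenSet[Tuple[int, int]]


def haar_dwt(signal: List[float]) -> List[float]:
    """Full Haar DWT (averaging form); length must be a power of two."""
    n = len(signal)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(signal)
    while n > 1:
        half = n // 2
        approx = [(out[2 * i] + out[2 * i + 1]) / 2 for i in range(half)]
        detail = [(out[2 * i] - out[2 * i + 1]) / 2 for i in range(half)]
        out[:n] = approx + detail
        n = half
    return out


def fingerprint(block: bytes, top_k: int = 8, step: float = 4.0) -> Fingerprint:
    """Assumed feature choice: the top_k largest-magnitude coefficients."""
    coeffs = haar_dwt([float(b) for b in block])
    ranked = sorted(range(len(coeffs)), key=lambda i: (-abs(coeffs[i]), i))
    # Quantize each selected coefficient; (position, quantized value) is a key.
    return frozenset((i, round(coeffs[i] / step)) for i in ranked[:top_k])


def similarity(a: Fingerprint, b: Fingerprint) -> float:
    """Jaccard similarity of two key sets (an assumed similarity degree)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pick_reference(
    block: bytes, cache: Dict[str, Fingerprint], threshold: float = 0.5
) -> Tuple[Optional[str], float]:
    """Return the best cached block id, or None if below the threshold."""
    fp = fingerprint(block)
    best_id, best = None, 0.0
    for block_id, ref_fp in cache.items():
        score = similarity(fp, ref_fp)
        if score > best:
            best_id, best = block_id, score
    return (best_id, best) if best >= threshold else (None, best)
```

A caller would run differential compression against the returned reference only when pick_reference yields a block id, and fall back to ordinary compression otherwise.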

Concurrent computations operating on same data for CPU cache efficiency

Techniques for CPU cache efficiency may include performing concurrent processing, such as for first and second data operations, in a synchronized manner that prevents loading the same data chunk into the CPU cache more than once. Processing may include synchronizing the first and second data operations with respect to a first data chunk to ensure that processing for both operations has completed before such processing proceeds to a second data chunk. The first and second data operations may be any two of deduplication, encryption, and compression, performed inline as part of the data path. In one embodiment, the first and second data operations for the first data chunk may be performed in parallel or sequentially, where neither data operation proceeds to another data chunk until both operations are complete for the first data chunk.
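The per-chunk synchronization can be illustrated as follows. This is only a sketch of the ordering constraint: Python gives no control over the CPU cache, so the example shows that neither operation moves on to the next chunk until both have finished the current one, and nothing more. The choice of a SHA-256 digest (standing in for deduplication) and zlib compression as the two operations, and the name process_chunks, are assumptions.

```python
import hashlib
import zlib
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Iterable, List, Tuple


def digest(chunk: bytes) -> str:
    """First data operation: a dedup-style content digest."""
    return hashlib.sha256(chunk).hexdigest()


def process_chunks(chunks: Iterable[bytes]) -> List[Tuple[str, bytes]]:
    """Run both operations on each chunk before touching the next chunk."""
    results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        for chunk in chunks:
            digest_future = pool.submit(digest, chunk)
            packed_future = pool.submit(zlib.compress, chunk)
            # Synchronization point: both operations must complete here.
            wait([digest_future, packed_future])
            results.append((digest_future.result(), packed_future.result()))
    return results
```

Replacing the two submit calls with two plain function calls gives the sequential variant mentioned in the abstract; the loop structure still guarantees that both operations finish a chunk before either starts the next.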

Optimizing backup performance with heuristic configuration selection
11513901 · 2022-11-29

Embodiments are described for a heuristic configuration selection process as part of or accessible by the backup management process. This processing component provides a method to automatically determine the configuration parameters needed to obtain optimal performance for a given backup/restore job. This process involves identifying key parameters that determine backup performance and suggesting means to derive and incorporate those configurable parameters into the backup software automatically. Embodiments can be applied to stream-based backups, or other types of backup software as well.

File layer to block layer communication for block organization in storage

A method of storing data, performed by a block-storage server, is described. The method includes (1) receiving, from a remote file server, data blocks to be written to persistent block storage managed by the block-storage server; (2) receiving, from the remote file server, metadata describing a placement of the data blocks in a filesystem managed by the remote file server; and (3) organizing the data blocks within the persistent block storage based, at least in part, on the received metadata. An apparatus, system, and computer program product for performing a similar method are also provided.
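One way to picture step (3) is a layout pass that uses the file-level placement to decide block order. The sketch below assumes, purely for illustration, that the metadata gives a file identifier and an offset for each block and that "organizing" means placing blocks of the same file contiguously in offset order. The names Placement and organize are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Placement(NamedTuple):
    """Assumed shape of the metadata sent by the file server."""
    file_id: int
    offset: int


def organize(blocks: Dict[int, bytes], metadata: Dict[int, Placement]) -> List[int]:
    """Return block ids in the order they would be laid out on storage."""
    return sorted(blocks, key=lambda b: (metadata[b].file_id, metadata[b].offset))


# Blocks 12 and 11 belong to file 1, block 10 to file 2.
layout = organize(
    {10: b"c", 11: b"b", 12: b"a"},
    {10: Placement(2, 0), 11: Placement(1, 4096), 12: Placement(1, 0)},
)
```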

Indexing splitter for any PIT replication
11514002 · 2022-11-29

A method, apparatus, and system for transmitting file system metadata from an indexing splitter running in a virtual machine (VM) to a source-side replication protection appliance (RPA) are disclosed. The operations comprise: capturing one or more file system events in a production VM at an indexing splitter; transmitting file system metadata representing the captured file system events from the indexing splitter to a data splitter, the data splitter being an agent running on a host system hosting the VM; transmitting the file system metadata inside one or more special input/output (I/O) commands associated with a predetermined tag from the data splitter to the source-side RPA alongside regular storage system I/O command data; identifying the special I/O commands at the source-side RPA based on the predetermined tag; and recovering the file system metadata from the special I/O commands at the source-side RPA.
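The tagging scheme can be sketched as a small multiplexing example. Everything concrete here is an assumption made for illustration: the tag value, the JSON encoding of the metadata, and the names IOCommand, wrap_metadata, and split_at_rpa are hypothetical and do not come from the abstract.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

METADATA_TAG = 0x7F  # hypothetical predetermined tag


@dataclass(frozen=True)
class IOCommand:
    payload: bytes
    tag: Optional[int] = None  # None marks a regular storage I/O command


def wrap_metadata(event: dict) -> IOCommand:
    """Data splitter side: carry file system metadata in a tagged command."""
    body = json.dumps(event, sort_keys=True).encode("utf-8")
    return IOCommand(body, METADATA_TAG)


def split_at_rpa(stream: Iterable[IOCommand]) -> Tuple[List[bytes], List[dict]]:
    """RPA side: separate regular I/O data from the tagged metadata."""
    data, metadata = [], []
    for command in stream:
        if command.tag == METADATA_TAG:
            metadata.append(json.loads(command.payload.decode("utf-8")))
        else:
            data.append(command.payload)
    return data, metadata
```

Because the metadata travels in the same command stream as the data, its position relative to the surrounding writes is preserved when the receiving side separates the two.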

Deduplication-adapted CaseDB for edge computing

Disclosed is a data deduplication method for an edge computer. The method is performed in a key-value store, and may include receiving a compaction request issued from the key-value store to a metadata layer, checking whether deduplication for removing duplicated data is required when compaction of a metadata file is performed in response to the received compaction request, and removing the duplicated data based on the result of the check.

HYBRID FILE COMPRESSION MODEL
20220374395 · 2022-11-24

An archive file that includes an archive start point and an archive end point is received to be segmented and compressed. A first set of compression start points to segment the archive file according to a first function and a second set of compression start points to partition the archive file according to a second function are created. The first set of compression start points and the second set of compression start points are combined to create a set of merged compression start points to partition the archive file into portions between the archive start point and the archive end point. Each portion between the archive start point and the archive end point is compressed to create a compressed archive file.
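The merging of two sets of start points can be sketched as follows. The two functions used here, fixed-size boundaries and boundaries after a blank-line marker, are assumptions chosen for the example, as are the 4096-byte size, the use of zlib, and the names fixed_points, marker_points, merge_points, and compress_archive.

```python
import zlib
from typing import List


def fixed_points(data: bytes, size: int = 4096) -> List[int]:
    """First function (assumed): a start point every `size` bytes."""
    return list(range(0, len(data), size))


def marker_points(data: bytes, marker: bytes = b"\n\n") -> List[int]:
    """Second function (assumed): a start point just after each marker."""
    points, pos = [], data.find(marker)
    while pos != -1:
        points.append(pos + len(marker))
        pos = data.find(marker, pos + 1)
    return points


def merge_points(data: bytes, first: List[int], second: List[int]) -> List[int]:
    """Combine both sets, keeping points strictly inside the archive."""
    end = len(data)
    return sorted({0, *(p for p in first + second if 0 < p < end)})


def compress_archive(data: bytes) -> List[bytes]:
    """Compress each portion between consecutive merged start points."""
    starts = merge_points(data, fixed_points(data), marker_points(data))
    bounds = starts + [len(data)]
    return [zlib.compress(data[a:b]) for a, b in zip(bounds, bounds[1:])]
```

Decompressing the portions in order and concatenating them reproduces the original archive, since the merged start points partition it without gaps or overlaps.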

File system warnings application programming interface (API)

The present technology pertains to an organization directory hosted by a synchronized content management system. The organization directory can provide access to user accounts for all members of the organization to all content items in the organization directory on the respective file systems of the members' client devices. Members can reach any content item at the same path as other members relative to the organization directory root on their respective client device. In some embodiments, novel access permissions are granted to maintain path consistency.

Apparatus and method for storing received data blocks as deduplicated data blocks

An apparatus stores received data blocks as deduplicated data blocks. The apparatus is configured to: maintain a plurality of containers, where a reference to a container is unique within the apparatus and each container includes one or more data segments and segment metadata for each data segment, the segment metadata including a segment identifier and a segment reference, where the segment identifier is unique within the container and the segment reference is unique within the apparatus; and maintain a plurality of deduplicated data blocks storing received data blocks, where each deduplicated data block includes a plurality of identified container references, where a container reference identifier is unique within the deduplicated data block, and an ordered list of one or more segment indicators.
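The container and block layout can be sketched with a few data classes. This is an illustration under assumptions: the abstract does not say what a segment indicator contains, so it is modelled here as a pair of (container reference identifier, segment identifier), and the names Segment, Container, DedupBlock, and restore are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Segment:
    segment_id: int   # unique within its container
    segment_ref: str  # unique within the whole apparatus
    data: bytes


@dataclass
class Container:
    container_ref: str  # unique within the apparatus
    segments: Dict[int, Segment] = field(default_factory=dict)

    def add(self, segment_ref: str, data: bytes) -> int:
        """Store a segment and return its container-local identifier."""
        segment_id = len(self.segments)
        self.segments[segment_id] = Segment(segment_id, segment_ref, data)
        return segment_id


@dataclass
class DedupBlock:
    # Block-local container reference identifier -> container reference.
    container_refs: Dict[int, str]
    # Ordered indicators: (container reference identifier, segment id).
    indicators: List[Tuple[int, int]]


def restore(block: DedupBlock, store: Dict[str, Container]) -> bytes:
    """Rebuild the received data block from its ordered indicators."""
    return b"".join(
        store[block.container_refs[cid]].segments[sid].data
        for cid, sid in block.indicators
    )
```

Because the indicators use small identifiers that are unique only within the block or the container, a repeated segment costs one short pair per occurrence rather than a full apparatus-wide reference.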