G06F16/1752

Storage system deduplication with service level agreements

Mechanisms are provided for adjusting a configuration of data stored in a storage system. According to various embodiments, a storage module may be configured to store a configuration of data. A processor may be configured to identify an estimated performance level for the storage system based on the configuration of data stored on the storage system. When the estimated performance level fails to meet a service level objective for the storage system, the processor may also be configured to transmit an instruction to adjust the configuration of data on the storage system so that the service level objective is met.
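
The check described above might look roughly like the following sketch. The performance model, the `fragmentation` field, the 500 MB/s baseline, and the `"defragment"` instruction are all illustrative assumptions; the abstract does not specify how performance is estimated or which adjustments are available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataConfig:
    """Hypothetical summary of how data is laid out on the storage system."""
    fragmentation: float  # 0.0 (sequential) to 1.0 (fully scattered)


def estimate_read_mbps(cfg: DataConfig, base_mbps: float = 500.0) -> float:
    """Toy model (assumption): fragmentation lowers read throughput linearly."""
    return base_mbps * (1.0 - 0.8 * cfg.fragmentation)


def plan_adjustment(cfg: DataConfig, slo_mbps: float) -> Optional[str]:
    """Return an adjustment instruction only if the estimate misses the objective."""
    if estimate_read_mbps(cfg) >= slo_mbps:
        return None          # objective already met, nothing to transmit
    return "defragment"      # hypothetical instruction name
```

For instance, `plan_adjustment(DataConfig(fragmentation=0.5), slo_mbps=400.0)` yields an instruction because the toy estimate is 300 MB/s, below the objective.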

File layer to block layer communication for block organization in storage

A method of storing data, performed by a block-storage server, is described. The method includes (1) receiving, from a remote file server, data blocks to be written to persistent block storage managed by the block-storage server; (2) receiving, from the remote file server, metadata describing a placement of the data blocks in a filesystem managed by the remote file server; and (3) organizing the data blocks within the persistent block storage based, at least in part, on the received metadata. An apparatus, system, and computer program product for performing a similar method are also provided.
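
As one possible reading of step (3), the block-storage server could order blocks so that blocks of the same file sit next to each other in file order. The `BlockMeta` fields and the sort-based layout below are assumptions made for illustration, not the organization the abstract claims.

```python
from typing import Dict, List, NamedTuple


class BlockMeta(NamedTuple):
    file_id: int      # hypothetical: file the block belongs to in the filesystem
    file_offset: int  # hypothetical: block's position within that file


def organize(blocks: Dict[int, bytes], meta: Dict[int, BlockMeta]) -> List[int]:
    """Return block IDs in the order they would be laid out in block storage.

    Blocks are grouped by file and ordered by offset within each file.
    """
    return sorted(blocks, key=lambda b: (meta[b].file_id, meta[b].file_offset))
```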

Systems and methods for automatic backup scheduling based on backup history

Methods and systems for data backup are described. According to some embodiments, the method includes, in response to receiving a request for database instance discovery, retrieving backup history information. The method further includes filtering the backup history information to obtain selected backup information, sending an instance discovery response that includes the selected backup information, and generating one or more protection policies based on the selected backup information.
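
A minimal sketch of the filter-then-generate flow, assuming backup history records are dictionaries with hypothetical `instance`, `status`, `type`, and `start_hour` keys, and assuming a policy is simply a typical interval per backup type. None of these details come from the abstract.

```python
from collections import defaultdict
from statistics import median


def select_backups(history, instance):
    """Filter: keep only successful backups of the requested database instance."""
    return [h for h in history
            if h["instance"] == instance and h["status"] == "ok"]


def propose_policy(selected):
    """Derive a backup interval (hours) per backup type from past start times."""
    by_type = defaultdict(list)
    for h in selected:
        by_type[h["type"]].append(h["start_hour"])
    policy = {}
    for btype, hours in by_type.items():
        hours.sort()
        gaps = [later - earlier for earlier, later in zip(hours, hours[1:])]
        if gaps:  # a single backup gives no interval to learn from
            policy[btype] = median(gaps)
    return policy
```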

Apparatus and method for storing received data blocks as deduplicated data blocks

An apparatus stores received data blocks as deduplicated data blocks. The apparatus is configured to: maintain a plurality of containers, where a reference to a container is unique within the apparatus and each container includes one or more data segments and segment metadata for each data segment, the segment metadata including a segment identifier and a segment reference, where the segment identifier is unique within the container and the segment reference is unique within the apparatus; and maintain a plurality of deduplicated data blocks storing received data blocks, where each deduplicated data block includes a plurality of identified container references, where a container reference identifier is unique within the deduplicated data block, and an ordered list of one or more segment indicators.
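
The relationship between containers and deduplicated blocks can be sketched with two small data structures. This sketch assumes that a segment indicator is a pair of (block-local container reference identifier, segment identifier), which the abstract does not spell out, and it omits the apparatus-wide segment reference for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Container:
    ref: int                                                   # unique within the apparatus
    segments: Dict[int, bytes] = field(default_factory=dict)   # segment id -> data


@dataclass
class DedupBlock:
    container_refs: Dict[int, int]        # block-local identifier -> container ref
    indicators: List[Tuple[int, int]]     # ordered (local identifier, segment id)


def rebuild(block: DedupBlock, containers: Dict[int, Container]) -> bytes:
    """Reassemble the original data block by following each segment indicator."""
    return b"".join(
        containers[block.container_refs[local_id]].segments[seg_id]
        for local_id, seg_id in block.indicators
    )
```

Because the container identifiers inside a block are local, each indicator can stay small even when the apparatus-wide container references are large.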

Processing device configured for efficient generation of data reduction estimates for combinations of datasets
11593313 · 2023-02-28

An apparatus in one embodiment comprises at least one processing device comprising a processor coupled to a memory. The processing device is configured to identify at least first and second datasets to be scanned to generate a data reduction estimate for a prospective combination of the first and second datasets, to designate a scan criterion to be utilized in the scan of each of the datasets, and for each of a plurality of pages of each of the datasets, to scan the page, where scanning the page comprises performing a computation on the page to obtain a page result, determining whether or not the page result satisfies the designated scan criterion, and responsive to the page result satisfying the designated scan criterion, updating a corresponding entry of a data reduction estimate table for the dataset. The processing device merges contents of the data reduction estimate tables, and generates the data reduction estimate based at least in part on the merged contents.
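
The scan-and-merge idea can be illustrated as follows. The sketch assumes the per-page computation is a SHA-256 hash, the scan criterion is "the low bits of the hash are zero", the table maps sampled hashes to counts, and the estimate is a deduplication ratio (sampled pages divided by distinct sampled hashes). These are illustrative choices, not details taken from the abstract.

```python
import hashlib
from collections import Counter


def scan(pages, mask=0x3):
    """Build an estimate table from pages whose hash satisfies the scan criterion."""
    table = Counter()
    for page in pages:
        digest = int.from_bytes(hashlib.sha256(page).digest()[:8], "big")
        if digest & mask == 0:      # criterion: masked low bits are all zero
            table[digest] += 1      # update the entry for this page content
    return table


def estimate_dedup_ratio(*tables):
    """Merge per-dataset tables and estimate the ratio for the combination."""
    merged = Counter()
    for table in tables:
        merged.update(table)        # adds counts for hashes seen in both datasets
    total = sum(merged.values())
    return total / len(merged) if merged else 1.0
```

Since the same criterion is applied to every dataset, a page content sampled in one dataset is also sampled in the other, so the merged table reflects cross-dataset duplicates without rescanning.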

Segmented index for data deduplication
11593327 · 2023-02-28

A deduplication index is generated having multiple entries, each entry storing a digest of a data block that was previously stored in non-volatile data storage together with a pointer to the location in non-volatile storage at which the data block was previously stored. The entries of the disclosed deduplication index are divided into multiple deduplication index segments. A resident subset of the deduplication index segments is stored in memory of the data storage system. A non-resident subset of the deduplication index segments is stored in non-volatile data storage of the data storage system. Data deduplication is performed for each subsequently received data block for which a digest is generated that matches any one of the digests in the entries of the deduplication index segments that are contained in the resident subset of the deduplication index segments.
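
A rough sketch of such an index is shown below. It assumes entries are assigned to segments by the first byte of a SHA-256 digest and that both subsets live in Python dictionaries; in a real system the non-resident segments would be on disk. The class and method names are hypothetical.

```python
import hashlib


class SegmentedIndex:
    """Deduplication index split into segments, only some of them resident."""

    def __init__(self, num_segments, resident):
        self.segments = [dict() for _ in range(num_segments)]
        self.resident = set(resident)   # segment numbers held in memory

    def _segment_of(self, digest):
        # Assumption: segment chosen from the first digest byte.
        return digest[0] % len(self.segments)

    def insert(self, block, location):
        digest = hashlib.sha256(block).digest()
        self.segments[self._segment_of(digest)][digest] = location

    def lookup(self, block):
        """Return the stored location on a resident-segment match, else None."""
        digest = hashlib.sha256(block).digest()
        seg = self._segment_of(digest)
        if seg not in self.resident:
            return None                 # non-resident segments are not consulted
        return self.segments[seg].get(digest)
```

A block whose digest falls in a non-resident segment is simply treated as new, trading some missed duplicates for a bounded memory footprint.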

Distributed storage system data management and security

A system and method for distributing data over a plurality of remote storage nodes. Data are split into segments and each segment is encoded into a number of codeword chunks. None of the codeword chunks contains any of the segments. Each codeword chunk is packaged with at least one encoding parameter and identifier, and metadata are generated for at least one file and for related segments of the at least one file. The metadata contains information to reconstruct from the segments, and information for reconstructing from corresponding packages. Further, metadata are encoded into package(s), and correspond to a respective security level and a protection against storage node failure. A plurality of packages are assigned to remote storage nodes to optimize workload distribution. Each package is transmitted to at least one respective storage node as a function of iteratively accessing and retrieving the packages of metadata and file data.

Apparatus and method for detecting target file based on network packet analysis

An apparatus for detecting a target file includes: an inverse indexing database unit configured to generate at least one file chunk by performing a chunking operation on a target file, and to inversely index each file chunk as a target file code; a network packet receiving unit configured to receive a network packet; a packet chunk processing unit configured to generate at least one packet chunk by performing a chunking operation on the network packet; a chunk query unit configured to generate a packet chunk query word for each packet chunk and to provide the packet chunk query word to the inverse indexing database unit to receive a detection target file code; and a file code determining unit configured to determine the most likely detection target file code in the network packet based on the received detection target file codes.
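
The inverted-index lookup can be sketched as below. The sketch assumes fixed-size 4-byte file chunks, a sliding window over the packet so that chunk boundaries need not align, SHA-1 digests as query words, and a simple vote count to pick the most likely file code. All of these are illustrative assumptions rather than the chunking or scoring the abstract describes.

```python
import hashlib
from collections import Counter, defaultdict

CHUNK = 4  # hypothetical chunk size in bytes


def build_index(files):
    """Map each file-chunk digest to the set of target file codes containing it."""
    index = defaultdict(set)
    for code, data in files.items():
        for i in range(0, len(data) - CHUNK + 1, CHUNK):
            index[hashlib.sha1(data[i:i + CHUNK]).digest()].add(code)
    return index


def detect(index, packet):
    """Return the file code matched by the most packet chunks, or None."""
    votes = Counter()
    for i in range(len(packet) - CHUNK + 1):          # sliding window
        query = hashlib.sha1(packet[i:i + CHUNK]).digest()
        for code in index.get(query, ()):
            votes[code] += 1
    return votes.most_common(1)[0][0] if votes else None
```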

System and methods for bandwidth-efficient cryptographic data transfer

A system and methods for bandwidth-efficient cryptographic data transfer are described. They utilize an encoding endpoint device, a decoding endpoint device, a reference codebook, and data to encode and decode, and may use specific algorithms on top of block cipher encryption to achieve higher data security and ease the burden on users with regard to computational power, complexity, and bandwidth for communication.

Method and apparatus for replicating a target file between devices

A method and apparatus are provided for remote differential compression (RDC) and data deduplication. According to embodiments, when a sending device acquires a new target file, the following steps are performed. First, Jaccard segmentation is performed, followed by identity-based segment deduplication and similarity-based segment deduplication. The target file is then transmitted to the recipient device in deduplicated form. The recipient device can then rebuild the original target file from the deduplicated form, thus replicating at the recipient device the target file originally present at the sending device.
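
The two deduplication stages can be illustrated with a small classifier that first checks for an identical known segment and then for a similar one. The sketch skips the segmentation step entirely and assumes that similarity is measured as the Jaccard index over 4-byte shingles with a 0.5 threshold; these choices, and the function names, are assumptions and not the method claimed.

```python
def shingles(segment, k=4):
    """Set of all k-byte substrings of a segment."""
    return {segment[i:i + k] for i in range(len(segment) - k + 1)}


def jaccard(a, b):
    """Jaccard index of the shingle sets of two segments."""
    sa, sb = shingles(a), shingles(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def classify(segment, known, threshold=0.5):
    """Return ('identical' | 'similar' | 'new', matching known segment or None)."""
    if segment in known:                       # identity-based stage
        return ("identical", segment)
    best = max(known, key=lambda s: jaccard(segment, s), default=None)
    if best is not None and jaccard(segment, best) >= threshold:
        return ("similar", best)               # could be sent as a delta against best
    return ("new", None)                       # must be sent in full
```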