G06F16/1756

Processing device configured for efficient generation of compression estimates for datasets
11609883 · 2023-03-21

An apparatus in one embodiment comprises at least one processing device comprising a processor coupled to a memory. The processing device is configured to identify a dataset to be scanned to generate a compression estimate for that dataset, to designate a scan criterion to be utilized in the scan, and for each of a plurality of pages of the dataset, to scan the page, where scanning the page includes performing a computation on the page to obtain a page result, determining whether or not the page result satisfies the designated scan criterion, and responsive to the page result satisfying the designated scan criterion, updating a corresponding entry of a compression estimate table for the dataset. The processing device generates the compression estimate for the dataset based at least in part on contents of the compression estimate table. The scan criterion may comprise, for example, a designated content-based signature prefix, or a designated subset inclusion characteristic defining a polynomial-based signature subspace.
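
The sampling scan described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the 4 KiB page size, the use of SHA-256 as the content-based signature, zlib as the compressor, and the name `estimate_compression_ratio` are all assumptions made for illustration. Only pages whose signature starts with the designated prefix update the table, so the estimate is built from a sample rather than from the full dataset.

```python
import hashlib
import zlib

PAGE_SIZE = 4096  # assumed page size, not taken from the abstract


def estimate_compression_ratio(data: bytes, prefix: bytes = b"\x00",
                               page_size: int = PAGE_SIZE):
    """Estimate compressed/raw size from pages whose signature matches `prefix`."""
    table = {}  # hypothetical estimate table: signature -> (raw_len, compressed_len)
    for offset in range(0, len(data), page_size):
        page = data[offset:offset + page_size]
        # Computation on the page that yields the page result (assumed: SHA-256).
        signature = hashlib.sha256(page).digest()
        # Scan criterion: a designated content-based signature prefix.
        if not signature.startswith(prefix):
            continue
        # Update the corresponding table entry; duplicate pages count once.
        if signature not in table:
            table[signature] = (len(page), len(zlib.compress(page)))
    raw = sum(r for r, _ in table.values())
    compressed = sum(c for _, c in table.values())
    return compressed / raw if raw else None
```

With the default one-byte prefix roughly 1 page in 256 is sampled; passing `prefix=b""` scans every page, which is useful for checking the sketch on small inputs.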

Compression of localized files

A method for compressing a first application file and a second application file includes accessing the first and second application files, the first application file being in a first language and the second application file being in a second language and being a counterpart of the first application file; decompressing the first and second application files to access their internal files; and comparing a first internal file of the first application file to a second internal file of the second application file. Upon determining that the first internal file is identical to the second internal file, one of the internal files is copied to an output folder. Upon determining that the files are not identical, either both internal files are copied to the output folder, or a differencing procedure is executed on the first and second internal files to identify the differences between them and data about the differences is stored in the output folder. The output folder is then compressed into one output file.
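
A rough Python sketch of the compare-and-copy flow follows. It assumes the application files are ZIP archives held in memory and that internal files are matched by name; the `shared/`, `first/` and `second/` output prefixes and all function names are hypothetical. It implements only the "copy both" branch for non-identical files, not the differencing procedure.

```python
import io
import zipfile


def make_archive(files: dict) -> bytes:
    """Helper for the example: build an in-memory ZIP from name -> bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def read_archive(blob: bytes) -> dict:
    """Decompress an archive into name -> bytes."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


def merge_localized(first: bytes, second: bytes) -> bytes:
    """Combine two localized archives, storing identical internal files once."""
    a, b = read_archive(first), read_archive(second)
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(set(a) | set(b)):
            if name in a and name in b and a[name] == b[name]:
                zf.writestr(f"shared/{name}", a[name])  # identical: keep one copy
                continue
            if name in a:
                zf.writestr(f"first/{name}", a[name])   # differs: keep both
            if name in b:
                zf.writestr(f"second/{name}", b[name])
    return out.getvalue()
```

For example, merging an English and a French archive that share `logo.png` but differ in `strings.txt` yields one shared image and two language-specific string files.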

Method and Apparatus for Duplicated Data Management in Cloud Computing
20170346625 · 2017-11-30

An approach is provided for managing data duplication in cloud computing. A method comprises: sending, from a first device to a data center, data encrypted with a data encryption key, for storage of the encrypted data at the data center; encrypting the data encryption key according to an attribute-based encryption (ABE) scheme by using identity as an attribute in a deduplication policy for the data; and issuing to a second device a personalized secret attribute key derived from a public key of the second device according to the ABE scheme, wherein the personalized secret attribute key is to be used, in combination with the policy, for decrypting the encrypted data encryption key at the second device.

SNAPSHOT CREATION

In one example, an updated snapshot delta value is computed upon occurrence of a new transaction. The new transaction is a data modification operation performed on data blocks of the storage device. The delta value indicates at least one of a volume of data modified since creation of a reference snapshot and a number of transactions performed since that creation. The updated snapshot delta value is then compared with a corresponding threshold value, which is at least one of a predetermined volume of data modified and a predetermined number of transactions. A snapshot action is performed based on the comparison.
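
The threshold comparison can be sketched in Python as below. The class name `SnapshotPolicy`, its fields, and the choice to reset the counters after each snapshot are assumptions for illustration; the abstract only states that a snapshot action follows from the comparison.

```python
from dataclasses import dataclass


@dataclass
class SnapshotPolicy:
    max_bytes: int            # threshold: volume of data modified
    max_transactions: int     # threshold: number of transactions
    bytes_modified: int = 0   # delta since the reference snapshot
    transactions: int = 0
    snapshots_taken: int = 0

    def record_write(self, nbytes: int) -> bool:
        """Update the delta for one transaction; return True if a snapshot is taken."""
        self.bytes_modified += nbytes
        self.transactions += 1
        if (self.bytes_modified >= self.max_bytes
                or self.transactions >= self.max_transactions):
            # Assumed snapshot action: count it and start a new reference point.
            self.snapshots_taken += 1
            self.bytes_modified = 0
            self.transactions = 0
            return True
        return False
```

Either threshold alone is enough to trigger the action here, which matches the "at least one of" wording; a stricter policy could require both.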

Efficient database undo / redo logging

Log records are accessed as part of a database operation in a database. The log records log insert, update, and delete operations in the database and include, for each row, a row position, a fragment identifier (ID), and a row ID. Thereafter, as part of the database operation, rows specified by the log records are located by: using the fragment identifier and the row position within the corresponding record of the log if the fragment with the corresponding fragment identifier is still available, otherwise, using the row identifier within the corresponding record of the log to look up the row position in an index of a corresponding row identifier column. The database operation is then finalized using the located rows. Related apparatus, systems, techniques and articles are also described.
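
The two-path row lookup can be sketched as follows. The record layout (a dict with `fragment_id`, `row_pos` and `row_id` keys), the set of available fragments, and the index shape (row ID mapped to a fragment ID and position) are assumptions chosen to keep the example small.

```python
def locate_row(record: dict, available_fragments: set, row_id_index: dict):
    """Return (fragment_id, row_pos) for the row named by a log record."""
    if record["fragment_id"] in available_fragments:
        # Fast path: the logged fragment still exists, so the logged position is valid.
        return record["fragment_id"], record["row_pos"]
    # Fallback: the fragment is gone (for example, merged away), so look the
    # row up by its row ID in the index of the row ID column.
    return row_id_index[record["row_id"]]
```

The fast path avoids an index lookup for the common case in which the fragment named in the log record has not been reorganized since the record was written.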

Decrypting files for data leakage protection in an enterprise network

Techniques are provided for decrypting an encrypted file within an enterprise network. The techniques include identifying by a password collecting module a password entered during a file encryption procedure performed at a terminal and storing the password; receiving an encrypted file by a data leakage protection (DLP) module; and attempting to decrypt the encrypted file with the password by the DLP module.

Method and system for accelerating data movement using change information concerning difference between current and previous data movements
09773042 · 2017-09-26

According to one embodiment, a first storage system receives a first data stream from a second storage system over a network. The first data stream includes data objects and differential object information identifying at least one data object missing from the first data stream. A difference between the first data stream and a second data stream that has been previously received is determined based on the differential object information, including identifying a data object that has been added, deleted, or modified in view of the second data stream. The first data stream is reconstructed based on the second data stream and the difference between the first data stream and the second data stream, generating a third data stream. The third data stream is stored in a persistent storage device of the first storage system, the third data stream representing a complete first data stream without a missing data object.
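
A simplified Python sketch of the reconstruction step is shown below. It models a data stream as a dict from object ID to bytes and the differential object information as a list of IDs omitted from the transfer because they are unchanged; both representations and the function names are assumptions.

```python
def classify_changes(previous: dict, received: dict, missing_ids: list):
    """Split object IDs into added, modified and deleted relative to `previous`."""
    added = sorted(oid for oid in received if oid not in previous)
    modified = sorted(oid for oid in received if oid in previous)
    deleted = sorted(oid for oid in previous
                     if oid not in received and oid not in missing_ids)
    return added, modified, deleted


def reconstruct_stream(previous: dict, received: dict, missing_ids: list) -> dict:
    """Build the complete stream from the prior stream plus the received objects."""
    # Objects omitted from the transfer are taken from the previous stream.
    complete = {oid: previous[oid] for oid in missing_ids}
    # Added and modified objects come from the newly received stream.
    complete.update(received)
    return complete
```

Objects that appear in neither the received stream nor the missing list are treated as deleted and are left out of the reconstructed stream.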

Data storage system and method

Data item deltas are generated for each of M updates of a plurality of updates, wherein M is greater than or equal to one, and a first first-level combined delta is generated representing N updates of the plurality of updates, wherein N is greater than M, and the N updates comprise the M updates and O=N−M other updates. A first second-level combined delta is generated representing J updates of the plurality of updates, wherein J is greater than N, and the J updates comprise the N updates and K other updates of the plurality of updates, wherein K=J−N. The deltas, the first first-level combined delta and the first second-level combined delta are stored for enabling subsequent reading of at least part of the data by accessing the data item, the first first-level combined delta and the first second-level combined delta.
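
The layering of combined deltas can be illustrated with a small Python sketch. It assumes a data item is a dict of fields and a delta is a dict of overwritten fields, so that combining deltas is a left-to-right merge; the abstract does not specify the delta format, and the concrete sizes (N=2, J=4) are chosen only for the example.

```python
def combine(deltas: list) -> dict:
    """Merge deltas in order; later updates overwrite earlier ones."""
    merged = {}
    for delta in deltas:
        merged.update(delta)
    return merged


def apply_delta(item: dict, delta: dict) -> dict:
    """Return a new item with the delta applied."""
    return {**item, **delta}


base = {"a": 0, "b": 0}
deltas = [{"a": 1}, {"b": 2}, {"a": 3}, {"c": 4}, {"b": 5}]

first_level = combine(deltas[:2])                    # represents N=2 updates
second_level = combine([first_level] + deltas[2:4])  # represents J=4 updates

# Reading the latest state needs the item, one combined delta and one plain delta.
latest = apply_delta(apply_delta(base, second_level), deltas[4])
```

Reading through the second-level combined delta touches two deltas instead of five while producing the same state as replaying every update in order.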

INCREMENTAL BACKUP TO OBJECT STORE

Techniques are provided for incremental backup to an object store. A request may be received from an application to perform a backup from a volume hosted by a node to a backup target within the object store. A set of changed files within the volume since a prior backup of the volume was performed to the backup target is identified, along with metadata associated with the set of changed files. The metadata is utilized to identify changed data blocks comprising data of the set of changed files that was modified since the prior backup. The changed data blocks are backed up to the object store.
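
The two-stage narrowing, first to changed files and then to changed blocks, can be sketched in Python. This is an illustration under several assumptions: the volume is a dict from path to `(mtime, data)`, modification time is the metadata used to find changed files, per-block SHA-256 hashes saved from the prior backup identify changed blocks, the object store is a plain dict, and the block size is tiny so the example stays readable.

```python
import hashlib

BLOCK_SIZE = 4  # deliberately tiny, for illustration only


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Hash each fixed-size block of a file."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def incremental_backup(volume: dict, prior_hashes: dict, last_backup_time: float,
                       object_store: dict, block_size: int = BLOCK_SIZE) -> list:
    """Upload only blocks of changed files that differ from the prior backup."""
    uploaded = []
    for path, (mtime, data) in volume.items():
        if mtime <= last_backup_time:
            continue  # file unchanged since the prior backup
        old = prior_hashes.get(path, [])
        new = block_hashes(data, block_size)
        for idx, digest in enumerate(new):
            if idx >= len(old) or old[idx] != digest:
                start = idx * block_size
                object_store[(path, idx)] = data[start:start + block_size]
                uploaded.append((path, idx))
        prior_hashes[path] = new  # remember state for the next backup
    return uploaded
```

A file-system implementation would more likely read changed block ranges from snapshot metadata than rehash file contents; hashing is used here only because it is self-contained.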

Techniques for compact data storage of network traffic and efficient search thereof

In networked communication systems, a document in a communication (e.g., a response) may be similar across multiple communications involving the same resource, so that duplicate data can be discarded rather than stored by a network storage system. Storing only the differences in network traffic compresses the stored traffic, thereby significantly reducing the amount of data stored. Techniques are disclosed for efficient search and retrieval of the compressed data storage. Network traffic may be compared to communications in previous network traffic to identify differences, if any. Resource templates may be generated for different (e.g., new) resources identified in network traffic. Storage of the different resources identified in network traffic enables compression of network traffic. Similarity matching may be implemented to improve processing performance for compact storage of network traffic, including determining differences in network traffic for storage.
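
Template-plus-difference storage can be sketched in Python with the standard `difflib` module. The design below is an assumption for illustration: the first response seen for a URL becomes that resource's template, and later responses are stored as a list of copy and insert operations against it. The class and function names are hypothetical, and the similarity matching and search features mentioned in the abstract are not modeled.

```python
import difflib


def encode(template: str, document: str) -> list:
    """Describe `document` as copies from `template` plus inserted text."""
    ops = []
    matcher = difflib.SequenceMatcher(a=template, b=document, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # "replace" or "insert": keep only the new text
            ops.append(("insert", document[j1:j2]))
    return ops


def decode(template: str, ops: list) -> str:
    """Rebuild a document from its template and stored operations."""
    parts = []
    for op in ops:
        parts.append(template[op[1]:op[2]] if op[0] == "copy" else op[1])
    return "".join(parts)


class TrafficStore:
    def __init__(self):
        self.templates = {}  # url -> template body
        self.records = []    # (url, ops or None)

    def store(self, url: str, body: str) -> int:
        if url not in self.templates:
            self.templates[url] = body       # new resource: body becomes template
            self.records.append((url, None))
        else:
            self.records.append((url, encode(self.templates[url], body)))
        return len(self.records) - 1

    def load(self, record_id: int) -> str:
        url, ops = self.records[record_id]
        template = self.templates[url]
        return template if ops is None else decode(template, ops)
```

Two responses for the same page that differ only in a user name would then share one template, with the second stored as a handful of short operations.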