H03M7/3091

RELATIONAL METHOD FOR TRANSFORMING UNSORTED SPARSE DICTIONARY ENCODINGS INTO UNSORTED-DENSE OR SORTED-DENSE DICTIONARY ENCODINGS
20200110820 · 2020-04-09

Unsorted sparse dictionary encodings are transformed into unsorted-dense or sorted-dense dictionary encodings. Sparse domain codes have large gaps between codes that are adjacent in order. Unlike sparse codes, dense codes have smaller gaps between adjacent codes; consecutive codes are dense codes that have no gaps between adjacent codes. The techniques described herein are relational approaches that may be used to generate sparse composite codes and sorted codes.
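The gap-closing step can be illustrated with a minimal Python sketch. It is not the relational method the abstract describes: it assumes the dictionary is held as a Python `dict` from sparse code to value, and the function name `to_dense` is hypothetical. Unsorted-dense keeps the existing code order and only removes the gaps; sorted-dense reorders the codes so that they follow value order.

```python
from typing import Dict, List, Tuple

def to_dense(dictionary: Dict[int, str], column: List[int],
             sort_by_value: bool = False) -> Tuple[Dict[int, str], List[int]]:
    """Re-map sparse codes to consecutive codes 0..n-1 (hypothetical helper)."""
    if sort_by_value:
        # Sorted-dense: code order follows the order of the dictionary values.
        ordered = sorted(dictionary, key=lambda code: dictionary[code])
    else:
        # Unsorted-dense: keep the existing code order, just close the gaps.
        ordered = sorted(dictionary)
    remap = {old: new for new, old in enumerate(ordered)}
    dense_dict = {remap[old]: dictionary[old] for old in ordered}
    # Rewrite every encoded value in the column through the remapping.
    dense_column = [remap[code] for code in column]
    return dense_dict, dense_column
```

In a relational setting the same remapping would typically be expressed as a ranking over the dictionary table joined back to the encoded column; the Python version only shows the resulting codes.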

Method and storage device for reducing data duplication

The present disclosure is directed to solutions for performing deduplication by a storage device. In these solutions, according to a duplicate-data locality principle, non-duplicate data blocks whose logical addresses are contiguous are stored at contiguous physical addresses in the order of their logical addresses, and the fingerprints of those blocks are likewise stored at contiguous physical addresses in the same order. In addition, a mapping is established from the logical address of one of these contiguous non-duplicate data blocks to an aggregation address.
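A toy model of this layout may help. The sketch below is an illustration under stated assumptions, not the claimed device: SHA-256 stands in for the fingerprint function, Python lists stand in for the physical data and fingerprint areas, and the names `DedupStore`, `aggregation_map`, and `dup_map` are hypothetical. It ignores overwrites and assumes reads only target logical addresses that were written.

```python
import hashlib
from typing import Dict, List, Optional

class DedupStore:
    """Toy model of contiguous storage for runs of non-duplicate blocks."""

    def __init__(self) -> None:
        self.data_log: List[bytes] = []            # physical data area, appended in order
        self.fp_log: List[bytes] = []              # physical fingerprint area, same order
        self.fp_index: Dict[bytes, int] = {}       # fingerprint -> physical block number
        self.aggregation_map: Dict[int, int] = {}  # first LBA of a run -> physical start
        self.dup_map: Dict[int, int] = {}          # LBA of a duplicate -> physical block

    def write(self, start_lba: int, blocks: List[bytes]) -> None:
        run_start: Optional[int] = None
        for offset, block in enumerate(blocks):
            lba = start_lba + offset
            fp = hashlib.sha256(block).digest()
            if fp in self.fp_index:
                self.dup_map[lba] = self.fp_index[fp]
                run_start = None  # a duplicate ends the current contiguous run
                continue
            phys = len(self.data_log)
            self.data_log.append(block)  # data stays contiguous within a run
            self.fp_log.append(fp)       # fingerprints stored in the same order
            self.fp_index[fp] = phys
            if run_start is None:
                # Only the first block of each run gets a mapping entry.
                run_start = lba
                self.aggregation_map[lba] = phys

    def read(self, lba: int) -> bytes:
        if lba in self.dup_map:
            return self.data_log[self.dup_map[lba]]
        # The block lives in the run whose first LBA is the nearest one at or below it.
        start = max(s for s in self.aggregation_map if s <= lba)
        return self.data_log[self.aggregation_map[start] + (lba - start)]
```

Because each run is stored contiguously, one mapping entry per run is enough to locate every block in it by offset.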

Scalable binning for big data deduplication
10613785 · 2020-04-07

A very efficient computer system is presented to generate all pairs of records that have a certain similarity. Similarity is defined in terms of the textual similarity of the record attributes and/or absolute difference for numeric record attributes. Software assigns each record to a number of bins, and then compares pairs of records that belong to the same bin. This is more efficient than comparing all pairs of records since the number of records compared to each other is much smaller.
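A minimal sketch of binning, under assumptions not taken from the source: each record is a hypothetical `(id, text, number)` tuple, each record goes into one bin per word of its text attribute, textual similarity is the Jaccard overlap of word sets, and the thresholds `min_text_sim` and `max_num_diff` are illustrative.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Record = Tuple[int, str, float]  # (id, text attribute, numeric attribute)

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Textual similarity as the Jaccard overlap of two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def similar_pairs(records: List[Record], min_text_sim: float = 0.5,
                  max_num_diff: float = 5.0) -> Set[Tuple[int, int]]:
    tokens: Dict[int, Set[str]] = {}
    numbers: Dict[int, float] = {}
    bins: Dict[str, Set[int]] = defaultdict(set)
    # Assign each record to one bin per word in its text attribute.
    for rid, text, num in records:
        tokens[rid] = set(text.lower().split())
        numbers[rid] = num
        for tok in tokens[rid]:
            bins[tok].add(rid)
    seen: Set[Tuple[int, int]] = set()
    result: Set[Tuple[int, int]] = set()
    # Compare only pairs that share a bin; skip pairs already compared via another bin.
    for members in bins.values():
        for a, b in combinations(sorted(members), 2):
            if (a, b) in seen:
                continue
            seen.add((a, b))
            if (jaccard(tokens[a], tokens[b]) >= min_text_sim
                    and abs(numbers[a] - numbers[b]) <= max_num_diff):
                result.add((a, b))
    return result
```

Records that share no word are never compared; with any `min_text_sim` above zero they could not qualify anyway, which is what makes this cheaper than comparing all pairs.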

Computer system, storage apparatus, and method of managing data
10606499 · 2020-03-31

A computer system is provided that comprises at least one storage apparatus and a computer. Each storage apparatus manages identification information indicating the specifics of the data it stores. The computer determines whether data to be written to one of the storage apparatuses is duplicate data, that is, data identical to data already stored in any of the storage apparatuses, and transmits the deduplicated data. To ask a storage apparatus whether the data associated with the calculated identification information is already stored, the computer uses individual pieces of identification information, a range of pieces of identification information, or both, depending on how many pieces of identification information appear in succession.
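The choice between individual identifiers and ranges can be sketched as follows. This assumes, purely for illustration, that identification information is integer-valued so that "in succession" means consecutive integers; the function `build_queries` and the cut-off `min_run` are hypothetical.

```python
from typing import List, Tuple, Union

Query = Union[int, Tuple[int, int]]  # a single ID, or an inclusive (low, high) range

def build_queries(ids: List[int], min_run: int = 3) -> List[Query]:
    """Send long runs of consecutive IDs as ranges and short runs as single IDs."""
    ordered = sorted(set(ids))
    queries: List[Query] = []
    i = 0
    while i < len(ordered):
        # Extend j to the end of the run of consecutive IDs starting at i.
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        if j - i + 1 >= min_run:
            queries.append((ordered[i], ordered[j]))
        else:
            queries.extend(ordered[i:j + 1])
        i = j + 1
    return queries
```

For example, `build_queries([7, 1, 2, 3, 4, 10, 11])` yields one range for 1 to 4 and individual queries for 7, 10, and 11, so a long run costs one request entry instead of many.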

Systems and Methods for Version Chain Clustering
20200099392 · 2020-03-26

A system, a method and a computer program product for storing data, which include receiving a data stream having a plurality of transactions that include at least one portion of data, determining whether at least one portion of data within at least one transaction is substantially similar to at least another portion of data within at least one transaction, clustering together at least one portion of data and at least another portion of data within at least one transaction, selecting one of at least one portion of data and at least another portion of data as a representative of at least one portion of data and at least another portion of data in the received data stream, and storing each representative of a portion of data from each transaction in the plurality of transactions, wherein a plurality of representatives is configured to form a chain representing the received data stream.
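A greatly simplified sketch of the clustering idea, with assumptions that are not from the source: portions are strings, "substantially similar" means a `difflib.SequenceMatcher` ratio at or above a hypothetical `threshold`, the first portion of each cluster becomes its representative, and the chain is the list of representative indices in stream order.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

def cluster_versions(portions: List[str],
                     threshold: float = 0.8) -> Tuple[List[str], List[int]]:
    """Greedy clustering: each portion joins the first similar representative."""
    reps: List[str] = []   # one stored representative per cluster
    chain: List[int] = []  # representative index for each portion, in stream order
    for p in portions:
        for idx, r in enumerate(reps):
            if SequenceMatcher(None, r, p).ratio() >= threshold:
                chain.append(idx)
                break
        else:
            # No similar representative exists, so this portion starts a new cluster.
            reps.append(p)
            chain.append(len(reps) - 1)
    return reps, chain
```

Only the representatives need to be stored; the chain records which representative stands in for each portion of the stream.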

DETECTING AND PROTECTING AGAINST RANSOMWARE
20200099699 · 2020-03-26

In a system that replicates a server's data writes to form a local copy at a local production site with local storage and a remote copy at a remote recovery site with remote storage, ransomware is detected by a decrease of more than a predetermined threshold in the compression ratio, the deduplication ratio, or both, over a length of data selected by a sliding time window. Upon detecting ransomware, data writes to the remote storage are stopped to minimize corruption of the remote data.
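The detection rule can be sketched as follows, with several labeled assumptions: only the compression ratio is tracked (zlib stands in for the storage system's compressor, and the deduplication ratio is omitted), the baseline is simply the ratio observed on the first write, and `RansomwareDetector`, `window_seconds`, and `drop_threshold` are hypothetical names. Setting `replicating` to `False` stands in for stopping writes to the remote storage.

```python
import zlib
from collections import deque
from typing import Deque, Optional, Tuple

class RansomwareDetector:
    """Flags a drop in compression ratio over a sliding time window (sketch)."""

    def __init__(self, window_seconds: float, drop_threshold: float) -> None:
        self.window: Deque[Tuple[float, bytes]] = deque()  # (timestamp, written data)
        self.window_seconds = window_seconds
        self.drop_threshold = drop_threshold
        self.baseline: Optional[float] = None
        self.replicating = True

    def _ratio(self) -> float:
        raw = b"".join(data for _, data in self.window)
        return len(raw) / len(zlib.compress(raw))

    def on_write(self, ts: float, data: bytes) -> bool:
        self.window.append((ts, data))
        # Slide the window: drop writes older than window_seconds.
        while self.window and self.window[0][0] < ts - self.window_seconds:
            self.window.popleft()
        ratio = self._ratio()
        if self.baseline is None:
            self.baseline = ratio
        elif self.baseline - ratio > self.drop_threshold:
            self.replicating = False  # stop writes to the remote copy
        return self.replicating
```

Encrypted output is close to incompressible, so a window full of freshly encrypted blocks pushes the ratio toward 1, well below the ratio of typical user data.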

Data compression with redundancy removal across boundaries of compression search engines

Data compression techniques are provided that remove redundancy across the boundary of compression search engines. An illustrative method comprises splitting the data frame into a plurality of sub-chunks; comparing at least two of the plurality of sub-chunks to one another to remove at least one sub-chunk from the plurality of sub-chunks that substantially matches at least one other sub-chunk to generate a remaining plurality of sub-chunks; generating matching sub-chunk information for data reconstruction identifying the at least one removed sub-chunk and the corresponding substantially matched at least one other sub-chunk; grouping the remaining plurality of sub-chunks into sub-units; removing substantially repeated patterns within the sub-units to generate corresponding compressed sub-units; and combining the compressed sub-units with the matching sub-chunk information to generate a compressed data frame. The data frame optionally comprises one or more host pages compressed substantially simultaneously, and the compressed data frame for a plurality of host pages compressed substantially simultaneously comprises a host page address for each host page.
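A minimal sketch of the two stages, under assumptions not stated in the abstract: sub-chunks are fixed-size, only exact duplicates are removed (the abstract allows substantial matches), zlib stands in for the per-sub-unit compression engine, and host page addresses are omitted. The function names and the dictionary layout of the compressed frame are hypothetical.

```python
import zlib
from typing import Dict, List

def compress_frame(frame: bytes, chunk_size: int = 64, chunks_per_unit: int = 4) -> dict:
    chunks = [frame[i:i + chunk_size] for i in range(0, len(frame), chunk_size)]
    first_seen: Dict[bytes, int] = {}
    matches: Dict[int, int] = {}  # removed sub-chunk index -> index of its match
    remaining: List[int] = []
    # Stage 1: remove sub-chunks that repeat an earlier one, recording the match.
    for idx, chunk in enumerate(chunks):
        if chunk in first_seen:
            matches[idx] = first_seen[chunk]
        else:
            first_seen[chunk] = idx
            remaining.append(idx)
    # Stage 2: group the remaining sub-chunks into sub-units and compress each one.
    units = []
    for k in range(0, len(remaining), chunks_per_unit):
        group = remaining[k:k + chunks_per_unit]
        units.append(zlib.compress(b"".join(chunks[i] for i in group)))
    return {"n_chunks": len(chunks), "chunk_size": chunk_size,
            "remaining": remaining, "matches": matches, "units": units}

def decompress_frame(cf: dict) -> bytes:
    data = b"".join(zlib.decompress(u) for u in cf["units"])
    size = cf["chunk_size"]
    # Only the final sub-chunk can be short, and it is always kept and last,
    # so splitting at chunk_size recovers the kept sub-chunks in order.
    pieces = [data[i:i + size] for i in range(0, len(data), size)]
    chunks = dict(zip(cf["remaining"], pieces))
    for idx, src in cf["matches"].items():
        chunks[idx] = chunks[src]  # restore removed sub-chunks from their matches
    return b"".join(chunks[i] for i in range(cf["n_chunks"]))
```

Removing repeated sub-chunks before grouping lets redundancy that spans two sub-units be eliminated, which a compressor working inside each sub-unit alone could not see.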

DYNAMIC SYSTEM LOG PREPROCESSING

A method for more effectively recording information in system logs is disclosed. In one embodiment, such a method includes detecting errors on a system such as a host system or storage system over a specified period of time. The method stores information associated with the errors in a memory buffer. The method further preprocesses the information in the memory buffer to condense the information and remove duplication. In certain embodiments, this preprocessing includes grouping errors by error type and providing a single stack trace or other information per error type. The method then outputs the preprocessed information to a log file. A corresponding system and computer program product are also disclosed.
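The grouping step might look like the following sketch. The `ErrorEvent` fields, the summary line format, and the choice of the first trace seen as the representative trace are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ErrorEvent:
    timestamp: float
    error_type: str
    message: str
    stack_trace: str

def preprocess(buffer: List[ErrorEvent]) -> List[str]:
    """Condense buffered errors: one summary and one stack trace per error type."""
    groups: Dict[str, dict] = {}  # insertion-ordered, so types appear in first-seen order
    for ev in buffer:
        g = groups.setdefault(ev.error_type, {"count": 0, "first": ev.timestamp,
                                              "last": ev.timestamp, "trace": ev.stack_trace})
        g["count"] += 1
        g["first"] = min(g["first"], ev.timestamp)
        g["last"] = max(g["last"], ev.timestamp)
    lines: List[str] = []
    for etype, g in groups.items():
        lines.append(f"{etype}: {g['count']} occurrence(s) between t={g['first']} and t={g['last']}")
        lines.append(f"  representative stack trace: {g['trace']}")
    return lines

def flush_to_log(buffer: List[ErrorEvent], path: str) -> None:
    """Append the condensed summary to the log file and clear the buffer."""
    with open(path, "a", encoding="utf-8") as f:
        for line in preprocess(buffer):
            f.write(line + "\n")
    buffer.clear()
```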

TECHNIQUES FOR ASSESSING DATA REDUCTION EFFICIENCY
20200019310 · 2020-01-16

Techniques for determining data reduction options may include: receiving data reduction statistics for a data set including a first value of a first statistic denoting an amount of data reduction obtained for the data set when compression is enabled, a second value of a second statistic denoting an amount of data reduction obtained for the data set when deduplication is enabled, and a third value of a third statistic denoting an overlap in data reduction contribution when both compression and deduplication are enabled; and determining, in accordance with the data reduction statistics, a first setting denoting a current data reduction option enabled for the data set. A Venn diagram provided on a user interface display may illustrate data reduction benefits for the data set based on the data reduction statistics. Data reduction benefits for the data set may be reassessed to determine whether to modify the current data reduction option.
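One plausible reading of the three statistics is a two-circle Venn diagram: savings with both features enabled are the union, `comp + dedup - overlap`. The sketch below assumes that reading, expresses each statistic as a fraction of the data set saved, and uses a hypothetical `min_gain` cut-off for deciding whether a feature is worth enabling; none of these specifics come from the source.

```python
def choose_option(comp: float, dedup: float, overlap: float,
                  min_gain: float = 0.05) -> str:
    """Pick a data reduction setting from three statistics (illustrative rule).

    comp, dedup: fraction saved with compression only / deduplication only.
    overlap: savings counted by both (the intersection of the two Venn circles).
    """
    both = comp + dedup - overlap  # union of the two circles: savings with both enabled
    best_name, best_value = max((("compression", comp), ("deduplication", dedup)),
                                key=lambda item: item[1])
    if both - best_value >= min_gain:
        return "compression+deduplication"  # the second feature adds enough savings
    if best_value >= min_gain:
        return best_name
    return "none"
```

A large overlap means the two features mostly save the same bytes, so enabling the second one adds little beyond the first.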

INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND DATA STRUCTURE
20200012685 · 2020-01-09

An information processing device includes: a memory; and a processor coupled to the memory and configured to: convert target data into first data by predetermined arithmetic processing; generate second data based on the converted first data and identification information that specifies the file of the target data; and store the target data at an address of the memory corresponding to the generated second data.
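As an illustration only, the "predetermined arithmetic processing" could be a cryptographic hash. The sketch below assumes SHA-256 for both steps, a string file identifier, a fixed number of address slots, and no collision handling; the class `AddressedStore` and its methods are hypothetical.

```python
import hashlib
from typing import Dict

class AddressedStore:
    """Stores data at an address derived from its content and its file ID (sketch)."""

    def __init__(self, num_slots: int = 2 ** 20) -> None:
        self.num_slots = num_slots
        self.slots: Dict[int, bytes] = {}  # address -> stored data (no collision handling)

    def address_for(self, data: bytes, file_id: str) -> int:
        first = hashlib.sha256(data).digest()  # "first data": hash of the target data
        # "Second data": combine the first data with the file's identification information.
        second = hashlib.sha256(first + file_id.encode("utf-8")).digest()
        return int.from_bytes(second[:8], "big") % self.num_slots

    def store(self, data: bytes, file_id: str) -> int:
        addr = self.address_for(data, file_id)
        self.slots[addr] = data
        return addr
```

Identical data written for the same file maps to the same address, so a second write lands on the existing copy; mixing in the file identifier keeps the address file-specific.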