G06F16/152

SELECTIVE DATA DEDUPLICATION IN A MULTITENANT ENVIRONMENT
20220405251 · 2022-12-22

Computer-implemented methods for selective data deduplication in a multitenant environment are disclosed. Deduplication between blocks written to a storage area associated with a tenant and redundant copies of those blocks written to storage areas of other tenants is permitted or prevented based on tagging the tenant's storage area with a particular type of parameter. Responsive to detecting a write operation directed to the storage area tagged with a parameter indicating that deduplication is not permitted, a block to be written to the storage area is modified prior to hashing the block. Responsive to detecting a write operation directed to the storage area tagged with a parameter indicating that deduplication is permitted, a block to be written to the storage area is prevented from being modified prior to hashing the block.
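
The tagging behaviour described in this abstract can be illustrated with a minimal sketch. Everything named here (DedupStore, tag_area, the tenant names) is hypothetical, and the choice to "modify" a block by prefixing an area-specific salt is an assumption; the abstract does not say how the block is modified.

```python
import hashlib


class DedupStore:
    """Toy content-addressed block store shared by several tenant storage areas."""

    def __init__(self):
        self.blocks = {}         # digest -> stored bytes
        self.dedup_allowed = {}  # storage area -> tag (True = deduplication permitted)

    def tag_area(self, area, dedup_allowed):
        self.dedup_allowed[area] = dedup_allowed

    def write(self, area, block):
        # Assumption: modification = prefixing the area name as a salt, so the
        # hash can no longer match an identical block from another tenant.
        if not self.dedup_allowed[area]:
            block = area.encode() + b"\x00" + block
        digest = hashlib.sha256(block).hexdigest()
        # Identical digests share one stored copy (deduplication).
        self.blocks.setdefault(digest, block)
        return digest


store = DedupStore()
store.tag_area("tenant_a", True)
store.tag_area("tenant_b", True)
store.tag_area("tenant_c", False)

d_a = store.write("tenant_a", b"payload")
d_b = store.write("tenant_b", b"payload")
d_c = store.write("tenant_c", b"payload")
```

In this sketch tenant_a and tenant_b share one stored copy, while the block from tenant_c hashes differently and is stored separately.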

SELECTIVE DATA DEDUPLICATION IN A MULTITENANT ENVIRONMENT
20220405789 · 2022-12-22

A computer-implemented method for dynamic storage pricing in a multitenant environment is disclosed. The computer-implemented method includes dynamically modifying a storage cost for one or more tenants pointing to a block written to a storage area of the multitenant environment based, at least in part, on detecting a change in a number of tenants pointing to the block.
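
One possible reading of this abstract is sketched below. The pricing rule (a fixed per-block cost split evenly among the tenants pointing to the block) is an assumption made for illustration, as are the names BlockPricing, add_ref and drop_ref; the abstract only states that the cost changes when the number of referencing tenants changes.

```python
from collections import defaultdict


class BlockPricing:
    """Tracks which tenants point to each block and derives a per-tenant cost."""

    def __init__(self, cost_per_block):
        self.cost_per_block = cost_per_block
        self.refs = defaultdict(set)  # block id -> set of tenants pointing to it

    def add_ref(self, block_id, tenant):
        self.refs[block_id].add(tenant)

    def drop_ref(self, block_id, tenant):
        self.refs[block_id].discard(tenant)

    def tenant_cost(self, tenant):
        # Assumed rule: each block's cost is shared equally by its referrers,
        # so the cost is recomputed whenever the referrer count changes.
        return sum(
            self.cost_per_block / len(tenants)
            for tenants in self.refs.values()
            if tenant in tenants
        )


pricing = BlockPricing(cost_per_block=12.0)
pricing.add_ref("block-1", "tenant_a")
cost_alone = pricing.tenant_cost("tenant_a")
pricing.add_ref("block-1", "tenant_b")
cost_shared = pricing.tenant_cost("tenant_a")
```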

DUPLICATE FILE MANAGEMENT FOR CONTENT MANAGEMENT SYSTEMS AND FOR MIGRATION TO SUCH SYSTEMS

In large installations of document management systems, files are often duplicated. Users may place their own copies of files in convenient locations, or for other reasons files may be unintentionally duplicated. Duplication of files causes many problems for systems reliant on document management, chiefly because the additional (identical) files occupy extra storage space and must be handled like all other files, which results in greater network and resource utilization (with a concomitant increase in processing, search and retrieval times). A tool is disclosed that standardizes the identification of duplicate files (based on their binary contents), as well as the identification of a primary duplicate (the original file) across multiple repositories, in a manner that minimizes the time for identification.
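
Identifying duplicates by binary contents could look roughly like the sketch below. It is not the disclosed tool: the function name find_duplicates, the in-memory (path, created, content) records, the use of SHA-256, and the rule that the primary duplicate is the copy with the earliest creation time are all assumptions.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files):
    """Group files by content hash.

    files: iterable of (path, created, content_bytes) tuples.
    Returns {primary_path: [paths of its duplicates]}.
    """
    groups = defaultdict(list)
    for path, created, content in files:
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append((created, path))

    result = {}
    for members in groups.values():
        if len(members) > 1:
            members.sort()  # earliest creation time first
            primary = members[0][1]  # assumed to be the original file
            result[primary] = [path for _, path in members[1:]]
    return result


files = [
    ("repoA/report.doc", 10, b"same bytes"),
    ("repoB/report-copy.doc", 20, b"same bytes"),
    ("repoA/other.doc", 5, b"different bytes"),
]
duplicates = find_duplicates(files)
```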

Internal key hash directory in table

Provided is a system and method for searching for a target key in a database, the method including populating a hash-offset table of a sorted key table with hash-offset table entries, the hash-offset table entries having a hash-value corresponding to a respective key, and a hash offset, sorting the hash-offset table entries based on the hash-values, searching for a target hash-value of the hash-values corresponding to a target key in the hash-offset table, locating a target key-value pair corresponding to the target key based on the target hash-value, and saving a location of the target key-value pair.
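
The lookup described in this abstract can be sketched as follows. The hash function (CRC-32 via zlib), the use of a list index as the "offset", and the function names are assumptions for illustration only; the abstract does not specify them.

```python
import bisect
import zlib


def build_hash_offset_table(sorted_pairs):
    """Create (hash-value, offset) entries for a sorted key table, sorted by hash."""
    entries = [
        (zlib.crc32(key.encode()), offset)
        for offset, (key, _value) in enumerate(sorted_pairs)
    ]
    entries.sort()
    return entries


def lookup(sorted_pairs, table, target_key):
    """Return the offset of target_key in sorted_pairs, or None if absent."""
    target_hash = zlib.crc32(target_key.encode())
    # Binary search for the first entry carrying the target hash-value.
    i = bisect.bisect_left(table, (target_hash, -1))
    while i < len(table) and table[i][0] == target_hash:
        offset = table[i][1]
        # Compare the actual key to rule out hash collisions.
        if sorted_pairs[offset][0] == target_key:
            return offset
        i += 1
    return None


pairs = [("apple", 1), ("banana", 2), ("cherry", 3)]  # sorted key table
table = build_hash_offset_table(pairs)
location = lookup(pairs, table, "cherry")  # saved location of the key-value pair
```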

Distributed storage device and data management method in distributed storage device
11520745 · 2022-12-06

The number of inter-node communications in inter-node deduplication can be reduced and both performance stability and high capacity efficiency can be achieved. A storage drive of storage nodes stores files that are not deduplicated in the plurality of storage nodes, duplicate data storage files in which deduplicated duplicate data is stored, and cache data storage files in which cache data of duplicate data stored in another storage node is stored, in which when a read access request for the cache data is received, the processors of the storage nodes read the cache data if the cache data is stored in the cache data storage file, and request another storage node to read the duplicate data related to the cache data if the cache data is discarded.
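
The read path in this abstract (serve from the local cache, otherwise ask the node that holds the duplicate data) can be sketched in a few lines. The class and attribute names are hypothetical, in-process method calls stand in for inter-node communication, and re-caching the data after a remote read is an assumption.

```python
class StorageNode:
    """Toy storage node with local duplicate data and a cache of remote data."""

    def __init__(self, name):
        self.name = name
        self.duplicate_data = {}  # block id -> deduplicated data held by this node
        self.cache = {}           # block id -> cached copy of another node's data
        self.remote_reads = 0     # counts simulated inter-node communications

    def read(self, block_id, owner):
        # Serve from the cache data storage file if the entry is still present.
        if block_id in self.cache:
            return self.cache[block_id]
        # Cache entry was discarded: request the data from the owning node.
        self.remote_reads += 1
        data = owner.duplicate_data[block_id]
        self.cache[block_id] = data  # assumption: re-populate the cache
        return data


owner = StorageNode("node-1")
reader = StorageNode("node-2")
owner.duplicate_data["blk"] = b"shared data"

first = reader.read("blk", owner)   # remote read, then cached
second = reader.read("blk", owner)  # served from the local cache
```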

Distributed query execution and aggregation

Computer-implemented methods and systems are disclosed for receiving and indexing a plurality of files for later querying, for dynamically generating scripts to be executed during a query of a data store, and for horizontally distributing a query and aggregating results of the distributed query.

Method and system of similarity-based deduplication

A method of similarity-based deduplication comprising the steps of: receiving an input data block; computing discrete wavelet transform (DWT) coefficients; extracting feature-related DWT data from the computed DWT coefficients; applying quantization to the extracted feature-related DWT data to obtain keys as results of the quantization; constructing a locality-sensitive fingerprint of the input data block; computing a similarity degree between the locality-sensitive fingerprint of the input data block and a locality-sensitive fingerprint of each data block in the plurality of the data blocks in a cache memory; selecting an optimal reference data block as the data block; determining that differential compression is required to be applied based on the similarity degree between the input data block and the optimal reference data block; and applying the differential compression to the input data block and the optimal reference data block.

File system warnings application programing interface (API)

The present technology pertains to an organization directory hosted by a synchronized content management system. The corporate directory can provide access to user accounts for all members of the organization to all content items in the organization directory on the respective file systems of the members' client devices. Members can reach any content item at the same path as other members relative to the organization directory root on their respective client device. In some embodiments, novel access permissions are granted to maintain path consistency.

Apparatus and method for storing received data blocks as deduplicated data blocks

An apparatus stores received data blocks as deduplicated data blocks. The apparatus is configured to: maintain a plurality of containers, where a reference to a container is unique within the apparatus and each container includes one or more data segments and segment metadata for each data segment, the segment metadata including a segment identifier and a segment reference, where the segment identifier is unique within the container and the segment reference is unique within the apparatus; and maintain a plurality of deduplicated data blocks storing received data blocks, where each deduplicated data block includes a plurality of identified container references, where a container reference identifier is unique within the deduplicated data block, and an ordered list of one or more segment indicators.

Browsability of backup files using data storage partitioning

A data storage system includes non-volatile data storage including a container partition and a browsable partition, and control circuitry configured to back up a file in the non-volatile data storage at least in part by receiving the file from a host, the file including a plurality of chunks of data, storing the plurality of chunks of data in the browsable partition of the non-volatile data storage, determining that one or more of the plurality of chunks has been modified, storing the one or more modified chunks in the container partition of the non-volatile data storage, determining a new chunk associated with each of the one or more modified chunks, and storing the one or more new chunks in the browsable partition of the non-volatile data storage.