Patent classifications
G06F16/1724
INCREMENTALLY IMPROVING CLUSTERING OF CROSS PARTITION DATA IN A DISTRIBUTED DATA SYSTEM
Methods and systems are provided for improved access to rows of data in a distributed data system. Each data row is associated with a partition. Data rows are distributed in one or more files, and an impure file includes data rows associated with multiple partitions. A clustering set is generated from a plurality of impure files by selecting a candidate impure file, based on file access activity metrics, and one or more neighbor impure files. Data rows of the impure files included in the clustering set are sorted according to their respective associated partitions. A set of disjoint partition range files is generated based on the sorted data rows of the impure files included in the clustering set. Each file of the set of disjoint partition range files is transferred to a respective target partition.
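The selection-and-split flow above can be sketched as follows. All names and data layouts here are hypothetical illustrations, not the patented implementation: impure files are modeled as lists of (partition, row) pairs, the activity metric as a per-file access count, and "neighbors" as impure files sharing a partition with the candidate.

```python
from collections import defaultdict

def recluster(impure_files, access_counts):
    # impure_files: {file_name: [(partition, row), ...]}  (hypothetical layout)
    # access_counts: {file_name: access count}            (the activity metric)
    candidate = max(impure_files, key=lambda f: access_counts.get(f, 0))
    cand_parts = {p for p, _ in impure_files[candidate]}
    # Neighbors: impure files sharing at least one partition with the candidate.
    clustering_set = [f for f, rows in impure_files.items()
                      if f == candidate or cand_parts & {p for p, _ in rows}]
    # Sort all rows of the clustering set by partition...
    pooled = sorted((pr for f in clustering_set for pr in impure_files[f]),
                    key=lambda pr: pr[0])
    # ...and emit one disjoint partition-range file per target partition.
    out = defaultdict(list)
    for part, row in pooled:
        out[part].append(row)
    return dict(out)
```

Each key of the result is the target partition to which that output file would be transferred.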
Reference set construction for data deduplication
By way of example, a data storage system may comprise a non-transitory storage device storing data blocks in chunks, and a storage logic coupled to the non-transitory storage device that manages storage of data on the storage device. The storage logic is executable to receive a data stream for storage in the non-transitory storage device, the data stream including one or more data blocks, analyze the data stream to determine a domain, retrieve a pre-configured reference set based on the domain, and deduplicate the one or more data blocks of the data stream using the pre-configured reference set.
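A minimal sketch of domain-seeded deduplication, under assumptions not stated in the abstract: blocks are fingerprinted with SHA-256, and the pre-configured reference set for a domain is simply a set of fingerprints of blocks known to recur in that kind of stream (the "vm-images" domain and its contents are invented for illustration).

```python
import hashlib

# Hypothetical pre-configured reference set: fingerprints of blocks that
# commonly appear in streams of the "vm-images" domain (here, a zero block).
REFERENCE_SETS = {
    "vm-images": {hashlib.sha256(b"\x00" * 4096).hexdigest()},
}

def deduplicate(blocks, domain):
    """Return (blocks that must be stored, fingerprint recipe for the stream)."""
    seen = set(REFERENCE_SETS.get(domain, ()))   # seed with the reference set
    stored, recipe = [], []
    for block in blocks:
        fp = hashlib.sha256(block).hexdigest()
        if fp not in seen:        # new content: store it and remember it
            seen.add(fp)
            stored.append(block)
        recipe.append(fp)         # duplicates contribute only a reference
    return stored, recipe
```

Seeding `seen` from the domain's reference set is what lets the first occurrence of a well-known block be deduplicated immediately, rather than stored once per stream.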
DEFRAGMENTATION IN DEDUPLICATION STORAGE SYSTEMS
Disclosed are techniques for defragmentation in deduplication storage systems. Machine logic determines, using deduplication metadata, that at least some of an incoming input/output stream is a duplicate of at least part of a source volume whose stored data is fragmented across physical locations in backend storage. Subsequently, defragmentation is carried out on the stored data by using the incoming input/output stream to write the data into sequential chunks at new physical locations in the backend storage and updating the source volume's location mappings to the new physical locations.
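The core trick, opportunistically defragmenting by piggybacking on a duplicate write, can be sketched as below. Every structure is a hypothetical stand-in: `dedup_index` plays the role of the deduplication metadata (content digest to source-volume offset), and Python's `hash` stands in for a real content fingerprint.

```python
def defragment_via_duplicate_write(stream, dedup_index, source_map, backend):
    """stream: incoming blocks.  dedup_index: content digest -> source-volume
    offset whose stored copy is fragmented.  source_map: source-volume offset
    -> physical location in `backend` (location -> block)."""
    loc = max(backend, default=-1) + 1        # start of a free sequential run
    for block in stream:
        digest = hash(block)                  # stands in for a real fingerprint
        if digest in dedup_index:             # duplicate of source-volume data
            backend[loc] = block              # rewrite at a sequential location
            source_map[dedup_index[digest]] = loc  # repoint the mapping
            loc += 1
```

The incoming write thus does double duty: it lands sequentially anyway, and the source volume's mappings are updated to the new run, so the previously scattered copies can later be reclaimed.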
PRUNING DATA SEGMENTS STORED IN CLOUD STORAGE TO RECLAIM CLOUD STORAGE SPACE
An information management system uses cloud storage resources to store secondary copies of primary data created by client computing devices managed by a storage manager. Deduplication operations are performed on the secondary copies, which results in chunk metadata indices that allow for tracking and faster retrieval of the deduplicated secondary copies. The chunk metadata indices may reference data segments of the deduplicated secondary copies. The data segments may be stored in, and across, one or more sub-files. As the secondary copies are aged out from the cloud storage resources, data segments are identified as being orphaned or non-orphaned. Data segments that are orphaned are pruned to remove their corresponding sub-files from the cloud storage resources, where the sub-files are replaced with new sub-files that do not contain the orphaned data segments.
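The pruning step can be sketched as follows, with invented structures: sub-files are modeled as named lists of segment ids, and the chunk metadata indices are reduced to the set of segment ids they still reference.

```python
def prune_orphans(subfiles, referenced_segments):
    """subfiles: sub-file name -> list of segment ids it contains.
    referenced_segments: ids still reachable from chunk metadata indices.
    Sub-files holding orphaned segments are replaced by new sub-files
    containing only the non-orphaned segments."""
    result = {}
    for name, segments in subfiles.items():
        live = [s for s in segments if s in referenced_segments]
        if live == segments:
            result[name] = segments          # nothing orphaned: keep as-is
        elif live:
            result[name + ".pruned"] = live  # replacement sub-file
        # else: every segment orphaned, so the sub-file is removed outright
    return result
```

Rewriting whole sub-files, rather than punching holes in them, matches how object stores work: cloud objects are typically immutable, so reclaiming space means replacing an object with a smaller one.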
COMPUTER PROGRAM PRODUCT, METHOD, APPARATUS AND DATA STORAGE SYSTEM FOR MANAGING DEFRAGMENTATION IN FILE SYSTEMS
Aspects of managing defragmentation in a data storage system comprising one or more storage apparatuses and a file system server connected to the one or more storage apparatuses and to one or more host computers are described, comprising: providing free space allocation information; allocating, in response to receiving an update request to update data stored in one or more first storage units of a plurality of storage units, one or more second storage units of the plurality of storage units indicated to be free based on the provided free space allocation information for writing update data of the update request; controlling writing of the update data to the allocated one or more second storage units; and controlling swapping of logical addresses associated with the one or more second storage units with respective logical addresses associated with the one or more first storage units.
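The allocate/write/swap sequence can be sketched in a few lines. All names are hypothetical: `fs_map` maps logical addresses to storage units, `free` is the free space allocation information, and `storage` holds unit contents.

```python
def out_of_place_update(fs_map, free, storage, logicals, update_data):
    """fs_map: logical address -> storage unit.  free: set of free units.
    Write the update to freshly allocated (second) units, then swap the
    logical addresses over so they point at the new units, and the old
    (first) units return to the free pool."""
    second = [free.pop() for _ in logicals]       # allocate from free space info
    for unit, data in zip(second, update_data):
        storage[unit] = data                      # write update out of place
    for logical, new_unit in zip(logicals, second):
        free.add(fs_map[logical])                 # first unit becomes free
        fs_map[logical] = new_unit                # swap in the second unit
```

Because the update is written before any mapping changes, a crash mid-update leaves the old data intact; the swap of logical addresses is the commit point.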
SYSTEMS AND METHODS FOR OBFUSCATION OF DATA VIA AN AGGREGATION OF CLOUD STORAGE SERVICES
The present disclosure describes systems and methods for aggregation and management of cloud storage among a plurality of providers via file fragmenting to provide increased reliability and security. In one implementation, fragments or blocks may be distributed among a plurality of cloud storage providers, such that no provider retains a complete copy of a file. Accordingly, even if an individual service is compromised, a malicious actor cannot access the data. In another implementation, file fragmenting may be performed in a non-standard method such that file headers and metadata are divided across separate fragments, obfuscating the original file metadata.
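One way the non-standard fragmenting could look is a round-robin deal of fixed-size fragments across providers, so that a file's header and metadata bytes never land contiguously at any single provider. The fragment size and indexing scheme below are illustrative assumptions, not the disclosed method.

```python
def fragment(data, n_providers, frag_size=4):
    """Split data into fixed-size fragments and deal them round-robin, so
    no single provider holds a contiguous run of the file (header and
    metadata bytes end up divided across separate providers)."""
    pieces = [data[i:i + frag_size] for i in range(0, len(data), frag_size)]
    providers = [[] for _ in range(n_providers)]
    for idx, piece in enumerate(pieces):
        providers[idx % n_providers].append((idx, piece))  # index for reassembly
    return providers

def reassemble(providers):
    # Only a party holding the fragments of ALL providers can rebuild the file.
    ordered = sorted(p for bucket in providers for p in bucket)
    return b"".join(piece for _, piece in ordered)
```

A compromised single provider sees only every n-th fragment, with no file header to identify what it is looking at.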
FILE FRAGMENTATION REMOVAL METHOD AND DEVICE
A file fragmentation removal method includes collecting input/output system call information for a plurality of file input/output system calls issued, against at least one target file, by an application operating on an arbitrary filesystem; generating a file range list of items comprising information on a start point, an end point, and an access count of input/output, based on the input/output system call information; selecting a plurality of fragmentation target items based on the file range list and a predetermined threshold; and selectively removing fragmentation based on whether individual items of the plurality of fragmentation target items are fragmented.
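The range-list and threshold steps can be sketched as below, assuming traced syscalls reduce to (path, offset, length) tuples; the item layout and function names are invented for illustration.

```python
from collections import Counter

def build_range_list(io_calls):
    """io_calls: (path, offset, length) tuples gathered from traced
    read/write system calls.  Each resulting item records the range's
    start point, end point, and how often it was accessed."""
    counts = Counter((path, off, off + length) for path, off, length in io_calls)
    return [(path, start, end, n) for (path, start, end), n in counts.items()]

def select_fragmentation_targets(range_list, threshold):
    # Only ranges accessed at least `threshold` times are worth relocating;
    # each would then be checked for actual on-disk fragmentation (e.g. via
    # an extent map) before any data is moved.
    return [item for item in range_list if item[3] >= threshold]
```

Filtering by access count first means the (expensive) fragmentation check and relocation run only on hot ranges, which is the point of the selective removal.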
OPTIMIZED RECORD PLACEMENT IN GRAPH DATABASE
Methods and systems are disclosed for optimizing record placement in a graph by minimizing fragmentation when writing data. Issues with fragmented data within a graph database are addressed on the record level by placing data that is frequently accessed together contiguously within memory. For example, a dynamic rule set may be developed based on dynamically analyzing access patterns of the graph database, policies, system characteristics and/or other heuristics. Based on statistics regarding normal query patterns, the systems and methods may identify an optimal position for certain types of edges that are often traversed with respect to particular types of nodes.
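A simplified sketch of statistics-driven placement: edges frequently traversed from a node are laid out contiguously after that node's record. The record model and traversal-count input are assumptions for illustration; the actual rule set described is dynamic and policy-driven.

```python
def lay_out_records(nodes, edges, traversal_counts):
    """nodes: node ids.  edges: (src, dst) pairs.  traversal_counts:
    edge -> observed traversal frequency from query statistics.
    Returns a record order placing each node's hottest outgoing edges
    immediately after the node itself."""
    layout = []
    for node in nodes:
        layout.append(("node", node))
        outgoing = [e for e in edges if e[0] == node]
        # Frequently traversed edges first, contiguous with their source node.
        outgoing.sort(key=lambda e: traversal_counts.get(e, 0), reverse=True)
        layout.extend(("edge", e) for e in outgoing)
    return layout
```

Records that a typical query touches together then share pages, so a traversal reads fewer scattered locations.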
KEY VALUE FILE SYSTEM
A file system includes: an application programming interface (API) configured to provide a file system access to an application running on a host computer; a key value file system configured to represent a file or a directory as an inode including one or more key-value pairs; a virtual file system configured to direct a file system call received from the application to the key value file system; and a key value API configured to provide the file system access to data stored in a data storage device. Each key-value pair contained in the inode includes a name of the file or the directory as a key and an identifier of a container that is associated with the file or the directory as a value. The data of the file is stored in the data storage device as being divided into one or more data blocks of a fixed size, and each of the one or more data blocks associated with the data of the file is accessible within the key value file system using the one or more key-value pairs.
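The inode and block layout described can be sketched as a toy in-memory model. The class, its block size, and the container-id scheme are hypothetical; only the structure mirrors the abstract: an inode is a set of {name: container} key-value pairs, and file data is fixed-size blocks keyed within that container.

```python
class KVFileSystem:
    """Toy sketch: each inode is a dict of {name: container_id}; file data
    lives in fixed-size blocks keyed by (container_id, block_number)."""
    BLOCK = 4096

    def __init__(self):
        self.inodes = {0: {}}   # inode 0 acts as the root directory
        self.blocks = {}        # (container_id, block_number) -> bytes
        self.next_container = 1

    def create(self, parent_inode, name):
        cid = self.next_container
        self.next_container += 1
        self.inodes[parent_inode][name] = cid  # the key-value pair in the inode
        return cid

    def write(self, cid, data):
        # Divide the data into fixed-size blocks within the file's container.
        for n in range(0, len(data), self.BLOCK):
            self.blocks[(cid, n // self.BLOCK)] = data[n:n + self.BLOCK]

    def read(self, cid):
        parts, n = [], 0
        while (cid, n) in self.blocks:   # blocks addressable via KV lookups
            parts.append(self.blocks[(cid, n)])
            n += 1
        return b"".join(parts)
```

Every file-system operation bottoms out in key-value gets and puts, which is what lets such a design sit directly on a key-value API of the storage device.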
Information processing device, program, and recording medium
An information processing device includes: a metadata retaining section retaining metadata of a file formed by a plurality of data blocks; a correspondence file retaining section retaining a correspondence file associating information identifying a recording location of a data block with information identifying the metadata retaining section retaining the metadata of the data block; a change processing section changing the recording location of the data block; and an update processing section updating the metadata retained by the metadata retaining section. The update processing section refers to the correspondence file, identifies the metadata retaining section retaining the metadata of the data block whose recording location is changed by the change processing section, and updates the metadata.
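The interaction of the three sections can be sketched in one function; all structures are hypothetical stand-ins (the correspondence file becomes a dict from block id to retainer id).

```python
def change_recording_location(block_id, new_location, locations,
                              correspondence, retainers):
    """locations: block id -> recording location (what the change
    processing section modifies).  correspondence: block id -> id of the
    metadata retaining section holding that block's metadata.
    retainers: retainer id -> its metadata (block id -> location)."""
    locations[block_id] = new_location            # change processing section
    retainer = correspondence[block_id]           # consult the correspondence file
    retainers[retainer][block_id] = new_location  # update only that retainer
```

The correspondence file spares the update processing section from scanning every metadata retaining section to find the one affected by a relocation.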