G06F3/0641

Update of deduplication fingerprint index in a cache memory

In some examples, a system performs data deduplication using a deduplication fingerprint index in a hash data structure comprising a plurality of blocks, wherein a block of the plurality of blocks comprises fingerprints computed based on content of respective data values. The system merges, in a merge operation, updates for the deduplication fingerprint index to the hash data structure stored in a persistent storage. As part of the merge operation, the system mirrors the updates to a cached copy of the hash data structure in a cache memory, and updates, in an indirect block, information regarding locations of blocks in the cached copy of the hash data structure.
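
The merge-and-mirror step described above can be illustrated with a minimal sketch. It is not the claimed implementation: the class name `FingerprintIndex`, the use of SHA-256 as the fingerprint, the in-memory dictionaries standing in for persistent storage and cache memory, and the slot-based indirect block are all assumptions made for illustration.

```python
import hashlib


class FingerprintIndex:
    """Toy fingerprint index with a persistent copy, a cached copy, and an indirect block."""

    def __init__(self, num_blocks=4):
        self.num_blocks = num_blocks
        self.persistent = {}   # stand-in for persistent storage: block number -> {fingerprint: location}
        self.cache_slots = []  # stand-in for cache memory: list of cached blocks
        self.indirect = {}     # indirect block: block number -> slot index in cache_slots
        self.pending = {}      # updates buffered until the next merge

    def _block_of(self, fp):
        # Hash the fingerprint into one of the blocks of the hash data structure.
        return int(fp, 16) % self.num_blocks

    def lookup(self, fp):
        if fp in self.pending:
            return self.pending[fp]
        slot = self.indirect.get(self._block_of(fp))
        if slot is None:
            return None
        return self.cache_slots[slot].get(fp)

    def store(self, data, location):
        """Return the location holding this content, reusing an earlier one if it is a duplicate."""
        fp = hashlib.sha256(data).hexdigest()
        existing = self.lookup(fp)
        if existing is not None:
            return existing
        self.pending[fp] = location
        return location

    def merge(self):
        """Merge pending updates into the persistent copy and mirror them to the cached copy."""
        for fp, location in self.pending.items():
            block = self._block_of(fp)
            self.persistent.setdefault(block, {})[fp] = location
            if block not in self.indirect:
                # Record where this block lives in the cached copy.
                self.indirect[block] = len(self.cache_slots)
                self.cache_slots.append({})
            self.cache_slots[self.indirect[block]][fp] = location
        self.pending.clear()
```

After `merge()`, every block in the persistent copy has an identical cached block that is reachable through the indirect block, so lookups can be served from the cache alone.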

Load balancing across multiple data paths

Multiple data paths may be available to a data management system for transferring data between a primary storage device and a secondary storage device. The data management system may be able to gain operational advantages by performing load balancing across the multiple data paths. The system may use application-layer characteristics of the data when transferring it from a primary storage to a backup storage during a data backup operation, and correspondingly from a secondary or backup storage system to a primary storage system during restoration.

Massively Scalable Object Storage for Storing Object Replicas

An example method for storing data includes providing a plurality of physical storage pools, each storage pool including a plurality of storage nodes coupled to a network. The method also includes mapping a partition of a plurality of partitions to a set of physical storage pools, where each physical storage pool of the set of physical storage pools is located in a different availability zone, and the storage nodes within an availability zone are subject to a correlated loss of access to stored data. The method further includes receiving a data management request over the network, the data management request being associated with a data object. The method also includes identifying a first partition of the plurality of partitions corresponding to the received data management request and manipulating the data object in the physical storage pools mapped to the first partition in accordance with the data management request.
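
The partition mapping described above might be sketched as follows. This is an illustration under stated assumptions, not the method of the disclosure: the function names `build_partition_map` and `partition_for`, the round-robin zone selection, the replica count of three, and the use of SHA-256 to pick a partition are all hypothetical.

```python
import hashlib


def build_partition_map(num_partitions, pools_by_zone, replicas=3):
    """Map each partition to `replicas` pools, each in a different availability zone."""
    zones = sorted(pools_by_zone)
    if replicas > len(zones):
        raise ValueError("need at least as many zones as replicas")
    mapping = {}
    for partition in range(num_partitions):
        chosen = []
        for i in range(replicas):
            # Consecutive offsets modulo the zone count never repeat a zone
            # as long as replicas <= number of zones.
            zone = zones[(partition + i) % len(zones)]
            pools = pools_by_zone[zone]
            chosen.append((zone, pools[partition % len(pools)]))
        mapping[partition] = chosen
    return mapping


def partition_for(object_name, num_partitions):
    """Identify the partition that corresponds to a data object."""
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


pools_by_zone = {
    "zone-a": ["pool-a1", "pool-a2"],
    "zone-b": ["pool-b1"],
    "zone-c": ["pool-c1", "pool-c2"],
    "zone-d": ["pool-d1"],
}
partition_map = build_partition_map(8, pools_by_zone)
targets = partition_map[partition_for("photos/cat.jpg", 8)]
```

Here `targets` lists the pools in which a request for the object would be carried out. Because each pool sits in a different zone, a correlated outage in one zone leaves the other replicas reachable.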

RECOVERING FREE SPACE IN NONVOLATILE STORAGE WITH A COMPUTER STORAGE SYSTEM SUPPORTING SHARED OBJECTS
20180004769 · 2018-01-04

To identify objects shared by entities and, in turn, to identify free space in nonvolatile storage, a computer system uses a probabilistic data structure that tests whether an element is a member of a set. Such probabilistic data structures are created for entities in the storage system that share objects. The probabilistic data structure for an entity represents the objects that are used by that entity. When an entity is deleted, each object used by that entity is compared to the probabilistic data structures of other entities to determine if there is a likelihood that the object is used by one or more of the other entities. If the likelihood determined for an object is above an acceptable threshold, then the object is not deleted. If the likelihood determined for an object is below that threshold, then the object can be deleted and the corresponding storage locations can be marked as free.
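
A Bloom filter is one well-known probabilistic data structure of this kind, and the sketch below uses it to illustrate the idea. The abstract does not name a specific structure, so the choice of a Bloom filter is an assumption, as are the names `BloomFilter` and `reclaimable`. The sketch also simplifies the likelihood threshold to a yes/no membership answer: an object is treated as reclaimable only when no other entity's filter reports it as possibly present.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = 0  # bit array packed into one integer

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def reclaimable(deleted_objects, other_filters):
    """Objects of a deleted entity that no other entity appears to use."""
    return [
        obj for obj in deleted_objects
        if not any(f.might_contain(obj) for f in other_filters)
    ]


# Entity B uses obj-1 and obj-2; entity A (being deleted) used obj-1 and obj-3.
filter_b = BloomFilter()
for obj in ("obj-1", "obj-2"):
    filter_b.add(obj)
free_candidates = reclaimable(["obj-1", "obj-3"], [filter_b])
```

Because a Bloom filter never reports a false negative, a shared object such as `obj-1` is never reclaimed by mistake. A false positive only delays reclamation of an unshared object, which is the safe direction in which to err.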

USE OF PREDEFINED BLOCK POINTERS TO REDUCE DUPLICATE STORAGE OF CERTAIN DATA IN A STORAGE SUBSYSTEM OF A STORAGE SERVER
20180011657 · 2018-01-11

A method and system for eliminating the redundant allocation and deallocation of special data on disk by providing an innovative technique for specially allocating special data of a storage system. Specially allocated data is data that is pre-allocated on disk and stored in memory of the storage system. “Special data” may include any pre-decided data, one or more portions of data that exceed a pre-defined sharing threshold, and/or one or more portions of data that have been identified by a user as special. For example, in some embodiments, a zero-filled data block is specially allocated by a storage system. As another example, in some embodiments, a data block whose contents correspond to a particular type of document header is specially allocated.
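
The zero-filled block example can be sketched as follows. The names `BlockStore` and `ZERO_BLOCK_PTR`, the 4096-byte block size, and the use of a negative number as the reserved pointer are assumptions for illustration and do not come from the disclosure.

```python
BLOCK_SIZE = 4096     # assumed block size
ZERO_BLOCK_PTR = -1   # hypothetical predefined pointer reserved for the zero-filled block


class BlockStore:
    """Toy block store that never allocates space for specially allocated data."""

    def __init__(self):
        # Special data is registered once, up front, and kept in memory.
        self.special = {bytes(BLOCK_SIZE): ZERO_BLOCK_PTR}
        self.special_by_ptr = {ptr: data for data, ptr in self.special.items()}
        self.blocks = []  # stand-in for ordinary on-disk allocation

    def write(self, data):
        ptr = self.special.get(data)
        if ptr is not None:
            return ptr  # reuse the predefined pointer, allocate nothing
        self.blocks.append(data)
        return len(self.blocks) - 1

    def read(self, ptr):
        if ptr in self.special_by_ptr:
            return self.special_by_ptr[ptr]
        return self.blocks[ptr]


store = BlockStore()
p1 = store.write(bytes(BLOCK_SIZE))          # zero-filled block
p2 = store.write(bytes(BLOCK_SIZE))          # written again, still no allocation
p3 = store.write(b"\x01" * BLOCK_SIZE)       # ordinary block, allocated normally
```

Both zero-filled writes resolve to the same predefined pointer, so the store holds a single ordinary block after the three writes.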

DATA MANAGEMENT IN MULTIPLY-WRITEABLE FLASH MEMORIES
20180011632 · 2018-01-11

The present disclosure provides a device and method for mapping management in a flash memory based on partitioning the memory into a main address space and a substitute space, each partition comprising locations in the memory that are denoted by at least three statuses, according to which locations are mapped from the main space to the substitute space while the statuses are responsively modified.

Smart de-fragmentation of file systems inside VMs for fast rehydration in the cloud and efficient deduplication to the cloud

One example method includes chunking a respective disk of each of a plurality of virtual machines (VM) to create a respective plurality of chunks associated with each of the VMs, creating, based on the chunking process, a cluster comprising one or more of the VMs, creating a VM template whose data and disk structure match respective data and disk structures of each of the VMs in the cluster, and in response to a file operation involving a first one of the VM disks, defragmenting the first VM disk so that a disk structure of the first VM disk is the same as a disk structure of the VM template.

DISK USAGE GROWTH PREDICTION SYSTEM

Certain embodiments described herein relate to an improved disk usage growth prediction system. In some embodiments, one or more components in an information management system can determine usage status data of a given storage device, perform a validation check on the usage status data using multiple prediction models, compare validation results of the multiple prediction models to identify the best performing prediction model, generate a disk usage growth prediction using the identified prediction model, and adjust the available space of the storage device according to the disk usage growth prediction.
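
The validate-compare-predict sequence above can be sketched as a simple backtest. The three models, the mean absolute error metric, the holdout length, and the function name `pick_best_model` are assumptions chosen for illustration. The abstract does not specify which prediction models or which validation check are used.

```python
def last_value_model(history):
    """Predict no growth: the next value equals the latest one."""
    return history[-1]


def last_step_model(history):
    """Predict that the most recent step of growth repeats."""
    return history[-1] + (history[-1] - history[-2])


def mean_growth_model(history):
    """Predict that the average historical growth per period continues."""
    steps = [b - a for a, b in zip(history, history[1:])]
    return history[-1] + sum(steps) / len(steps)


def pick_best_model(usage, models, holdout=3):
    """Validate every model on the last `holdout` points and return the best one."""
    errors = {}
    for name, model in models.items():
        total = 0.0
        for i in range(len(usage) - holdout, len(usage)):
            # Predict point i from the points before it, then compare.
            total += abs(model(usage[:i]) - usage[i])
        errors[name] = total / holdout
    best = min(errors, key=errors.get)
    return best, errors


models = {
    "last_value": last_value_model,
    "last_step": last_step_model,
    "mean_growth": mean_growth_model,
}
usage_gb = [10, 20, 30, 40, 50, 60]   # made-up usage samples, one per period
best_name, validation_errors = pick_best_model(usage_gb, models)
predicted_next = models[best_name](usage_gb)
```

The predicted value could then drive the final step in the abstract, for example by provisioning additional space before the predicted usage is reached.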

Technologies for providing edge deduplication

Technologies for providing deduplication of data in an edge network include a compute device having circuitry to obtain a request to write a data set. The circuitry is also to apply, to the data set, an approximation function to produce an approximated data set. Additionally, the circuitry is to determine whether the approximated data set is already present in a shared memory and write, to a translation table and in response to a determination that the approximated data set is already present in the shared memory, an association between a local memory address and a location, in the shared memory, where the approximated data set is already present. Additionally, the circuitry is to increase a reference count associated with the location in the shared memory.
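
The write path described above might look like the following sketch. Rounding each value to one decimal place is an assumed stand-in for the approximation function, and the class name `EdgeDedup` and its fields are hypothetical. The sketch covers only the write path and does not decrement reference counts when a local address is overwritten.

```python
class EdgeDedup:
    """Toy approximate deduplication with a translation table and reference counts."""

    def __init__(self, precision=1):
        self.precision = precision
        self.shared = []        # stand-in for shared memory: one entry per location
        self.location_of = {}   # approximated data set -> location in shared memory
        self.refcount = {}      # location -> number of local addresses mapped to it
        self.translation = {}   # translation table: local address -> shared location

    def approximate(self, data):
        # Assumed approximation function: round each value to a fixed precision.
        return tuple(round(x, self.precision) for x in data)

    def write(self, local_addr, data):
        approx = self.approximate(data)
        loc = self.location_of.get(approx)
        if loc is None:
            # Not present yet: store the approximated data set in shared memory.
            loc = len(self.shared)
            self.shared.append(approx)
            self.location_of[approx] = loc
            self.refcount[loc] = 0
        # Associate the local address with the shared location and count the reference.
        self.translation[local_addr] = loc
        self.refcount[loc] += 1
        return loc


dedup = EdgeDedup()
loc_a = dedup.write(0x10, [1.01, 2.02])
loc_b = dedup.write(0x20, [1.04, 1.98])   # approximates to the same data set
loc_c = dedup.write(0x30, [5.5, 6.5])     # different data set, new location
```

The first two writes differ slightly but approximate to the same data set, so both local addresses map to one shared location whose reference count is 2.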