G06F3/0608

High performance space efficient distributed storage
11550755 · 2023-01-10 · ·

High performance space efficient distributed storage is disclosed. For example, a distributed storage volume (DSV) is deployed on a plurality of hosts, with a first host storing a local cache, and a storage controller executing on a processor of the first host receives a request to store a first file. The first file is stored to the local cache. The DSV is queried to determine whether a second file that is a copy of the first file is stored in the DSV. In response to determining that the DSV lacks the second file, the first file is transferred from the local cache to the DSV and then replicated to a second host of the plurality of hosts. In response to determining that the second file resides in the DSV, a reference to the second file is stored in the DSV and then replicated to the second host.
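
The store path described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Volume` class, the `store_file` function, and the use of a SHA-256 content digest to detect a copy are all assumptions, since the abstract does not say how copies are identified.

```python
import hashlib


class Volume:
    """Hypothetical stand-in for the DSV, or for one host's replica of it."""

    def __init__(self):
        self.blobs = {}  # content digest -> file bytes
        self.refs = {}   # file name -> content digest (a reference)


def store_file(name, data, local_cache, dsv, replica):
    """Store a file via the local cache; return True if it was deduplicated."""
    # 1. The file always lands in the local cache first.
    local_cache[name] = data

    # 2. Query the DSV for an existing copy (assumed: match by content digest).
    digest = hashlib.sha256(data).hexdigest()
    duplicate = digest in dsv.blobs

    if not duplicate:
        # 3a. No copy exists: transfer the file to the DSV, then replicate it.
        dsv.blobs[digest] = local_cache[name]
        replica.blobs[digest] = dsv.blobs[digest]

    # 3b. In both cases a reference is recorded and replicated.
    dsv.refs[name] = digest
    replica.refs[name] = digest
    return duplicate
```

Storing the same bytes under a second name adds only a reference to the DSV, which is where the space saving comes from in this sketch.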

TECHNIQUES FOR DATA STORAGE MANAGEMENT
20230214115 · 2023-07-06 · ·

A data storage system can use non-volatile solid state drives (SSDs) to provide backend storage. The data storage system and SSDs can implement log structured systems (LSSs) experiencing write amplification (WA). The aggregated WA of the LSSs can be minimized when the WAs of both LSSs of the system and SSDs are equal, within a specified tolerance. An amount of storage capacity which the LSS of the data storage system is allowed to use can be limited and vary based on the system’s data capacity denoting the storage capacity with valid data. Pm can denote a percentage of Cs, the advertised capacity of the SSDs, storing valid data. Po can be a percentage of Cs denoting the upper bound of the system’s used capacity. Po and Pm, as well as the utilization and WA of both the data storage system and SSDs, can be evaluated and adjusted adaptively and holistically.

Optimizing garbage collection based on survivor lifetime prediction
11550712 · 2023-01-10 · ·

A predictive method for scheduling operations is described. The method uses data generated by computing the expected lifetime of individual files or objects within a container. The expected lifetimes can be generated using machine learning techniques. Operations such as garbage collection are scheduled at an epoch where computational efficiencies are realized for performing the operation.
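
One way to picture the scheduling step is shown below. It is only a sketch under stated assumptions: the predicted death times are taken as given inputs (the abstract says they may come from machine learning, which is not modelled here), and the rule "run garbage collection at the first epoch where enough objects are expected to be dead" is a hypothetical efficiency criterion, as are the names `schedule_gc` and `min_dead_fraction`.

```python
def schedule_gc(predicted_deaths, epochs, min_dead_fraction=0.8):
    """Pick the earliest candidate epoch at which GC is expected to pay off.

    predicted_deaths: predicted end-of-life time of each object in a container.
    epochs: candidate times at which garbage collection could run.
    Returns the chosen epoch, or None if no candidate meets the threshold.
    """
    if not predicted_deaths:
        return None
    for epoch in sorted(epochs):
        # Count objects whose predicted lifetime has ended by this epoch.
        dead = sum(1 for t in predicted_deaths if t <= epoch)
        if dead / len(predicted_deaths) >= min_dead_fraction:
            return epoch
    return None
```

Waiting until most objects are predicted dead means fewer survivors have to be copied, which is the kind of efficiency the abstract alludes to.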

Efficient transfers between tiers of a virtual storage system

Efficiently transferring data between tiers in a virtual storage system, including: receiving, by the virtual storage system, a request to write data to the virtual storage system; transforming, within storage provided by a first tier of storage of the virtual storage system, the data to generate transformed data; and migrating, from the first tier of storage to a second tier of storage that is more durable than the first tier of storage of the virtual storage system, at least a portion of the transformed data.
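
The write, transform, and migrate sequence can be sketched as below. The dictionaries standing in for the two tiers, the function name `write_and_migrate`, and the choice of zlib compression as the transformation are illustrative assumptions; the abstract does not specify which transformation is applied.

```python
import zlib


def write_and_migrate(key, data, first_tier, second_tier):
    """Write to the first tier, transform there, then migrate to the second."""
    # 1. The incoming write lands in the first (less durable) tier.
    first_tier[key] = data

    # 2. Transform the data within first-tier storage (assumed: compression).
    first_tier[key] = zlib.compress(first_tier[key])

    # 3. Migrate the transformed data to the more durable second tier.
    second_tier[key] = first_tier.pop(key)
    return len(second_tier[key])
```

Transforming before migration means only the reduced form of the data crosses into the second tier in this sketch.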

System and method for error-resilient data reduction

A system and method for error-resilient data reduction, utilizing a phase detector, a data requestor, a multi-phase trainer, a reconstruction engine, a deconstruction engine, and one or more reference codebooks. A multi-phase trainer may be used to train the reconstruction and deconstruction engines on various phase sourceblocks in order to recover quickly from corrupted data files that cause the sourceblocks to become out of phase. A phase detector may determine when the sourceblocks get out of phase and when they return to in-phase by checking whether a predetermined threshold probability of correct encoding is met. The data requestor may request retransmission of only the data that was received out of phase.

Efficiently writing data in a zoned drive storage system
11550481 · 2023-01-10 · ·

A list of available zones across respective SSD storage portions of a plurality of zoned storage devices of a storage system is maintained. Data is received from multiple sources, wherein the data is associated with processing a dataset, the dataset including multiple volumes and associated metadata. Shards of the data are determined such that each shard is capable of being written in parallel with the remaining shards. The shards are mapped to a subset of the available zones, respectively. The shards are written to the subset of the available zones in parallel.
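
The shard, map, and parallel-write steps might look like the following sketch. Everything here is a simplification: zones are dictionary keys rather than device regions, sharding is a plain byte split, and the names `split_into_shards` and `write_shards` are hypothetical. The sketch also assumes non-empty data and enough available zones for the requested shard count.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_shards(data, count):
    """Split data into at most `count` independent, contiguous shards."""
    size = -(-len(data) // count)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def write_shards(data, available_zones, zone_store, shard_count):
    """Map each shard to its own available zone and write them in parallel."""
    shards = split_into_shards(data, shard_count)

    # Map shards to a subset of the available zones, one zone per shard.
    zones = [available_zones.pop(0) for _ in shards]

    def write(pair):
        zone, shard = pair
        zone_store[zone] = shard  # stands in for a zone append on the device

    # Each shard targets a distinct zone, so the writes do not interfere.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        list(pool.map(write, zip(zones, shards)))
    return zones
```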

Memory system and method for controlling nonvolatile memory
11693770 · 2023-07-04 · ·

According to one embodiment, a memory system manages a plurality of management tables corresponding to a plurality of first blocks in a nonvolatile memory. Each management table includes a plurality of reference counts corresponding to a plurality of data in a corresponding first block. The memory system copies a set of data included in a copy-source block for garbage collection and corresponding respectively to reference counts belonging to a first reference count range to a first copy-destination block, and copies a set of data included in the copy-source block and corresponding respectively to reference counts belonging to a second reference count range having a lower limit higher than an upper limit of the first reference count range to a second copy-destination block.
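
The copy step can be illustrated with a small sketch that separates data from a copy-source block by reference count range. The single `threshold` dividing the two ranges, the function name `gc_copy`, and the decision to skip entries with a reference count of zero (treated here as no longer referenced) are assumptions made for illustration.

```python
def gc_copy(source_block, ref_counts, threshold):
    """Copy still-referenced data into two destinations by reference count.

    Data with 1 <= count <= threshold goes to the first destination block;
    data with count > threshold goes to the second. The second range's lower
    limit is therefore higher than the first range's upper limit.
    """
    first_dest, second_dest = [], []
    for data, count in zip(source_block, ref_counts):
        if count == 0:
            continue  # assumed: unreferenced data is not copied
        if count <= threshold:
            first_dest.append((data, count))
        else:
            second_dest.append((data, count))
    return first_dest, second_dest
```

Grouping data with similar reference counts keeps heavily shared data together, so each destination block's management table holds counts from a single range.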

File decay property
11693823 · 2023-07-04 · ·

A computer-implemented method, computer system, and computer program product for deleting a file from a storage medium. A file that is marked for automatic deletion is identified. A deletion time at which the file is to be deleted from the storage medium is identified. A visual indicator associated with the file is displayed. The visual indicator changes over time to provide an indication as to a nearness of the deletion time.

Optimized deduplication based on backup frequency in a distributed data storage system

Disclosed deduplication techniques at a distributed data storage system guarantee that space reclamation will not affect deduplicated data integrity even without perfect synchronization between components. By understanding certain “behavioral” characteristics and schedule cadences of backup operations that generate backup copies received at the distributed data storage system, data blocks that are not re-written by subsequent backup copies are pro-actively aged, while promoting continued retention of data blocks that are re-written. An expiry scheme operates with block-level granularity. Each unique deduplicated data block is given an expiry timeframe based on the block's arrival time at the distributed data storage system (i.e., when a backup copy supplies the block) and further based on backup frequencies of the various virtual disks referencing a unique system-wide identifier of the block, which is based on the block's hash value. Communications between components are kept to an as-needed basis. Cloud-based and multi-cloud configurations are disclosed.
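
The block-level expiry idea can be sketched as follows. The abstract states only that the expiry timeframe depends on the block's arrival time and on the backup frequencies of the virtual disks referencing it; the specific formula used here (arrival time plus a fixed number of cycles of the longest backup interval) and the names `expiry_time`, `is_expired`, and `cycles` are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def expiry_time(arrival, backup_intervals, cycles=2):
    """Return when a deduplicated block may be reclaimed.

    arrival: when a backup copy last supplied (or re-wrote) the block.
    backup_intervals: backup cadence of each virtual disk referencing the block.
    cycles: hypothetical safety margin, in multiples of the slowest cadence.
    """
    return arrival + cycles * max(backup_intervals)


def is_expired(now, arrival, backup_intervals, cycles=2):
    """A block not re-written within its expiry timeframe is eligible to age out."""
    return now >= expiry_time(arrival, backup_intervals, cycles)
```

In this sketch a block that is re-written by a later backup gets a fresh arrival time and so stays retained, while a block that stops being re-written eventually expires.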

MICRO-BATCHING METADATA UPDATES TO REDUCE TRANSACTION JOURNAL OVERHEAD DURING SNAPSHOT DELETION
20230214146 · 2023-07-06 ·

A method for deleting one or more snapshots using micro-batch processing is provided. The method includes receiving a request to delete the one or more snapshots, identifying one or more middle map extents exclusively owned by the one or more snapshots requested to be deleted, wherein metadata for the one or more snapshots is stored in one or more logical maps having logical map extents mapping logical block addresses (LBAs) to middle block addresses (MBAs) and a middle map having middle map extents mapping MBAs to physical block addresses (PBAs) of physical locations where data blocks are written, adding MBAs of the identified one or more middle map extents in a batch, determining a first micro-batch including a first subset of the MBAs in the batch, the first subset of MBAs being MBAs less than a first upper bound MBA, and using a first transaction to delete the middle map extents corresponding to the first subset of MBAs included in the first micro-batch.
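
The micro-batching step can be sketched as below. The middle map is modelled as a plain dictionary keyed by MBA, and the upper bound for each micro-batch is derived from a fixed `span` of addresses; both choices, along with the names `micro_batches` and `delete_extents`, are assumptions, since the abstract does not say how the upper bound MBA is chosen.

```python
def micro_batches(mbas, span):
    """Yield (upper_bound, subset) pairs; each subset holds MBAs < upper_bound."""
    pending = sorted(mbas)
    start = 0
    while start < len(pending):
        upper_bound = pending[start] + span
        end = start
        while end < len(pending) and pending[end] < upper_bound:
            end += 1
        yield upper_bound, pending[start:end]
        start = end


def delete_extents(middle_map, mbas, span):
    """Delete middle map extents, one transaction per micro-batch."""
    transactions = 0
    for _upper_bound, subset in micro_batches(mbas, span):
        # Stand-in for a single journaled transaction covering this subset.
        for mba in subset:
            middle_map.pop(mba, None)
        transactions += 1
    return transactions
```

Because each transaction touches only a narrow range of MBAs, the updates in one micro-batch are likely to fall on nearby metadata, which is the intuition behind the reduced journal overhead.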