G06F3/0664

Write input/output optimization for virtual disks in a virtualized computing system

An example method of handling, at a hypervisor on a host in a virtualized computing system, a write input/output (IO) operation to a file on a storage device having a virtual machine file system (VMFS) is described. The method includes: sorting, at the hypervisor, a scatter-gather array for the write IO operation into sets of scatter-gather elements, each of the sets including at least one scatter-gather element targeting a common file block address; resolving offsets of the sets of scatter-gather elements to identify a first scatter-gather array of transaction-dependent scatter-gather elements; generating logical transactions for the first scatter-gather array having updates to metadata of the VMFS for the file; batching the logical transactions into a physical transaction; and executing the physical transaction to commit the updates to the metadata of the VMFS on the storage device for the file.
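The sort, resolve, and batch steps in this abstract can be illustrated with a minimal sketch. Everything named here is hypothetical: the `SGElement` type, the 1 MiB `BLOCK_SIZE`, the reading of "transaction-dependent" as "targets a file block that is not yet allocated", and the `allocate_block` operation. The sketch also assumes each element fits inside one file block.

```python
from collections import defaultdict
from dataclasses import dataclass

BLOCK_SIZE = 1 << 20  # hypothetical file block size (1 MiB)


@dataclass(frozen=True)
class SGElement:
    offset: int  # byte offset of the write within the file
    length: int  # number of bytes to write


def group_by_file_block(sg_array):
    """Sort elements by offset and group those that target the same file block."""
    sets = defaultdict(list)
    for elem in sorted(sg_array, key=lambda e: e.offset):
        sets[elem.offset // BLOCK_SIZE].append(elem)
    return dict(sets)


def transaction_dependent(sets, allocated_blocks):
    """Keep the sets whose block is unallocated and so needs a metadata update
    (an assumed interpretation of 'transaction-dependent')."""
    return {blk: elems for blk, elems in sets.items() if blk not in allocated_blocks}


def batch_transactions(dependent_sets):
    """Create one logical transaction per block and batch them into a single
    physical transaction, modeled here as a plain dictionary."""
    logical = [{"op": "allocate_block", "block": blk} for blk in sorted(dependent_sets)]
    return {"logical_transactions": logical}


sg = [
    SGElement(3 * BLOCK_SIZE + 10, 512),
    SGElement(0, 4096),
    SGElement(4096, 4096),
    SGElement(3 * BLOCK_SIZE, 10),
]
sets = group_by_file_block(sg)
dependent = transaction_dependent(sets, allocated_blocks={0})
physical = batch_transactions(dependent)
```

Batching the logical transactions means the metadata for all affected blocks is committed in one step rather than once per scatter-gather element.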

Managing lifecycle of virtualization software running in a standalone host

Virtualization software installed in a standalone host is remediated according to a desired state model using a desired image of virtualization software that is used to remediate virtualization software running in hosts that are logically grouped as a cluster of hosts not including the standalone host. The method of remediating the virtualization software installed in the standalone host includes the steps of generating a desired image of the virtualization software of the standalone host from a desired image of the virtualization software of the hosts in the cluster, and, upon detecting a difference between an image of the virtualization software currently running in the standalone host and the desired image of the virtualization software of the standalone host, instructing the standalone host to remediate the image of the virtualization software currently running therein to match the desired image of the virtualization software of the standalone host.
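A desired-state remediation loop of this kind can be sketched as follows. The image is modeled as a plain dictionary of component names to versions; the component names, version strings, and the `overrides` parameter are all hypothetical and do not come from the abstract.

```python
def derive_standalone_image(cluster_image, overrides=None):
    """Generate the standalone host's desired image from the cluster's desired image."""
    image = dict(cluster_image)
    image.update(overrides or {})
    return image


def diff_images(current, desired):
    """Return {component: (current_version, desired_version)} for every mismatch."""
    keys = set(current) | set(desired)
    return {
        k: (current.get(k), desired.get(k))
        for k in keys
        if current.get(k) != desired.get(k)
    }


def remediate(current, desired):
    """Return the image the host should run after remediation, plus the detected drift."""
    drift = diff_images(current, desired)
    if not drift:
        return dict(current), drift  # already compliant, nothing to do
    return dict(desired), drift


cluster_desired = {"hypervisor-base": "2.0", "vendor-addon": "1.2"}
standalone_desired = derive_standalone_image(cluster_desired, {"nic-driver": "3.1"})
running = {"hypervisor-base": "1.9", "vendor-addon": "1.2"}
after, drift = remediate(running, standalone_desired)
```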

Verification of metadata consistency across snapshot copy-on-write (COW) B+tree logical maps
11573860 · 2023-02-07

A method for verifying a consistency of snapshot metadata maintained in an ordered data structure for a plurality of snapshots in a snapshot hierarchy is provided. The method includes identifying a first plurality of nodes maintained in a first ordered data structure for a first snapshot that is a child of a second snapshot; for a first node of the first plurality of nodes, verifying the first node by checking for the first node in a second node map maintained in memory for the second snapshot, wherein the second node map includes a plurality of verified nodes in a second ordered data structure; and based on whether the first node is in the second node map: adding the first node to a first node map maintained in memory for the first snapshot, wherein the first node map includes verified nodes of the first plurality of nodes; or triggering an alarm.
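The parent-map check can be sketched as below. This is an illustration under assumptions, not the claimed method: nodes are modeled as `(node_id, owner_snapshot)` pairs, a node owned by the child itself is accepted directly, and a shared node must already appear in the parent's map of verified nodes. The `ConsistencyAlarm` exception stands in for "triggering an alarm".

```python
class ConsistencyAlarm(Exception):
    """Raised when a shared node is missing from the parent's verified-node map."""


def verify_child_snapshot(child_nodes, child_id, parent_node_map):
    """Build the child's in-memory node map, checking shared nodes against the parent."""
    child_node_map = {}
    for node_id, owner in child_nodes:
        if owner == child_id or node_id in parent_node_map:
            child_node_map[node_id] = owner  # node is verified
        else:
            raise ConsistencyAlarm(f"node {node_id} not found in parent node map")
    return child_node_map


parent_map = {"n1": "snap1", "n2": "snap1"}
child_nodes = [("n1", "snap1"), ("n3", "snap2")]
child_map = verify_child_snapshot(child_nodes, "snap2", parent_map)

try:
    verify_child_snapshot([("n9", "snap1")], "snap2", parent_map)
    alarm_raised = False
except ConsistencyAlarm:
    alarm_raised = True
```

Keeping a map of verified nodes per snapshot lets each child be checked against its parent with a lookup instead of re-walking the parent's tree.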

Methods and systems for storing data in a distributed system using offload components

A method for storing data comprises receiving, by an offload component in a client application node, a request originating from an application executing in an application container on the client application node, wherein the request is associated with data and wherein the offload component is located in a hardware layer of the client application node; and processing, by the offload component, the request by a file system (FS) client and a memory hypervisor module executing in a modified client FS container on the offload component, wherein processing the request results in at least a portion of the data being stored in a location in a storage pool.

Method and systems for storing data in a storage pool using memory semantics with applications utilizing object semantics
20220350545 · 2022-11-03

A method for storing data comprises receiving, by an offload component in a client application node, a first request from a client translation and bridging container, wherein the client translation and bridging container translates a second request originating from an application executing in an application container on the client application node to the first request, wherein the second request is specified using object semantics and the first request is specified using file semantics, wherein the first request is associated with data, and wherein the offload component is located in a hardware layer of the client application node; and processing, by the offload component, the first request by a file system (FS) client and a memory hypervisor module executing in a modified client FS container on the offload component, wherein processing the first request results in at least a portion of the data being stored in a location in a storage pool.
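The translation from object semantics to file semantics might look like the sketch below. The request types, the verb-to-operation table, and the `/bucket/key` path mapping are all hypothetical; the abstract does not specify how the translation and bridging container performs the mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectRequest:
    verb: str    # object-style verb such as "PUT" or "GET"
    bucket: str
    key: str
    data: bytes = b""


@dataclass(frozen=True)
class FileRequest:
    op: str      # file-style operation such as "write" or "read"
    path: str
    offset: int
    data: bytes


# Assumed mapping of object verbs to file operations.
_VERB_TO_OP = {"PUT": "write", "GET": "read"}


def translate(request):
    """Translate an object-semantics request into a file-semantics request."""
    if request.verb not in _VERB_TO_OP:
        raise ValueError(f"unsupported object verb: {request.verb}")
    return FileRequest(
        op=_VERB_TO_OP[request.verb],
        path=f"/{request.bucket}/{request.key}",
        offset=0,  # whole-object writes start at the beginning of the file
        data=request.data,
    )


file_request = translate(ObjectRequest("PUT", "photos", "cat.jpg", b"\x01\x02"))
```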

Virtual disk storage techniques

This document describes techniques for storing virtual disk payload data. In an exemplary configuration, each virtual disk extent can be associated with state information that indicates whether the virtual disk extent is described by a virtual disk file. Under certain conditions the space used to describe a virtual disk extent can be reclaimed and state information can be used to determine how read and/or write operations directed to the virtual disk extent are handled. In addition to the foregoing, other techniques are described in the claims, figures, and detailed description of this document.
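As a rough illustration of per-extent state driving read handling and space reclamation, consider the sketch below. The two states, the 4 KiB extent size, the zero-fill behavior for undescribed extents, and the `trim` method are assumptions made for the example, not details taken from the abstract.

```python
from enum import Enum

EXTENT_SIZE = 4096  # hypothetical extent size in bytes


class ExtentState(Enum):
    MAPPED = "mapped"  # extent is described by the virtual disk file
    ZERO = "zero"      # extent is not described; reads return zeros


class VirtualDisk:
    def __init__(self, extent_count):
        self.state = [ExtentState.ZERO] * extent_count
        self.payload = {}  # extent index -> stored bytes

    def write(self, index, data):
        if len(data) != EXTENT_SIZE:
            raise ValueError("writes must cover exactly one extent in this sketch")
        self.payload[index] = bytes(data)
        self.state[index] = ExtentState.MAPPED

    def read(self, index):
        # The state information decides how the read is handled.
        if self.state[index] is ExtentState.MAPPED:
            return self.payload[index]
        return bytes(EXTENT_SIZE)

    def trim(self, index):
        # Reclaim the space used to describe the extent and record the new state.
        self.payload.pop(index, None)
        self.state[index] = ExtentState.ZERO


disk = VirtualDisk(extent_count=4)
disk.write(1, b"\xab" * EXTENT_SIZE)
before_trim = disk.read(1)
disk.trim(1)
after_trim = disk.read(1)
```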

Incremental restore of a virtual machine

Techniques are provided for incrementally restoring a virtual machine hosted by a computing environment. In response to receiving an indication that the virtual machine is to be incrementally restored, a snapshot of the virtual machine may be created while the virtual machine is shut down into an off state. The snapshot is transmitted to a storage environment as a common snapshot. The snapshot and the common snapshot are common snapshots comprising the same representation of the virtual machine. The common snapshot and a prior snapshot of the virtual machine are evaluated to identify a data difference of the virtual machine between the common snapshot and the prior snapshot. An incremental restore of the virtual machine is performed by transmitting the data difference from the storage environment to the computing environment to restore the virtual machine to a state represented by the prior snapshot.
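The difference-then-apply flow can be sketched with snapshots modeled as dictionaries from block number to block contents. That block-level model, and the function names, are assumptions for illustration only.

```python
def data_difference(common, prior):
    """Compute what must be sent to turn the common snapshot into the prior snapshot."""
    changed = {blk: data for blk, data in prior.items() if common.get(blk) != data}
    removed = [blk for blk in common if blk not in prior]
    return changed, removed


def apply_difference(vm_blocks, changed, removed):
    """Apply the transmitted difference to the virtual machine's current blocks."""
    restored = dict(vm_blocks)
    restored.update(changed)
    for blk in removed:
        restored.pop(blk, None)
    return restored


prior_snapshot = {0: b"boot", 1: b"old-data"}
common_snapshot = {0: b"boot", 1: b"new-data", 2: b"extra"}  # current VM state

changed, removed = data_difference(common_snapshot, prior_snapshot)
restored = apply_difference(common_snapshot, changed, removed)
```

Only block 1 and the removal of block 2 cross the wire here; the unchanged block 0 is never transmitted, which is the point of restoring incrementally.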

Distributed data storage system using erasure coding on storage nodes fewer than data plus parity fragments

A distributed data storage system using erasure coding (EC) provides advantages of EC data storage while retaining high resiliency for EC data storage architectures having fewer data storage nodes than the number of EC data-plus-parity fragments. An illustrative embodiment is a three-node data storage system with EC 4+2. Incoming data is temporarily replicated to ameliorate the effects of certain storage node outages or fatal disk failures, so that read and write operations can continue from/to the storage system. The system is equipped to automatically heal failed EC write attempts in a manner transparent to users and/or applications: when all storage nodes are operational, the distributed data storage system automatically converts the temporarily replicated data to EC storage and reclaims storage space previously used by the temporarily replicated data. Individual hardware failures are healed through migration techniques that reconstruct and re-fragment data blocks according to the governing EC scheme.
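The three-node EC 4+2 case can be worked through with a small sketch. It assumes round-robin fragment placement and a code in which any 4 of the 6 fragments are enough to reconstruct a block (as with Reed-Solomon style codes); neither assumption is stated in the abstract, and the node names are hypothetical.

```python
def place_fragments(data_fragments, parity_fragments, nodes):
    """Distribute data-plus-parity fragment indices across nodes round-robin."""
    placement = {node: [] for node in nodes}
    for i in range(data_fragments + parity_fragments):
        placement[nodes[i % len(nodes)]].append(i)
    return placement


def readable(placement, failed_nodes, data_fragments):
    """A block is readable if at least `data_fragments` fragments survive."""
    surviving = sum(
        len(frags) for node, frags in placement.items() if node not in failed_nodes
    )
    return surviving >= data_fragments


nodes = ["node-a", "node-b", "node-c"]
placement = place_fragments(4, 2, nodes)
one_down = readable(placement, {"node-a"}, 4)
two_down = readable(placement, {"node-a", "node-b"}, 4)
```

With two fragments per node, losing one node leaves exactly four fragments, so reads still succeed; losing two nodes leaves only two fragments, which is below the threshold.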

Technique for replicating oplog index among nodes of a cluster

A technique replicates an index of an operations log (oplog) from a primary node to a secondary node of a cluster in the event of failure. The oplog functions as a staging area to coalesce random write operations directed to a virtual disk (vdisk) stored on a backend storage tier. The oplog temporarily caches write data as well as metadata describing the write data. The metadata includes descriptors of the write data that correspond to offset ranges of the vdisk and are used to identify ranges of write data for the vdisk that are cached in the oplog. To facilitate fast lookup of whether write data is cached in the oplog, an oplog index provides a state of the latest data for offset ranges of the vdisk. The technique enables fast failover of the metadata used to construct the oplog index in memory without downtime or significant metadata replay.
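An index that answers "is the latest data for this vdisk offset cached in the oplog?" can be sketched as follows. The `Descriptor` fields and the linear newest-first scan are assumptions chosen for brevity; a production index would more likely use an interval or range map.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    offset: int        # starting vdisk offset of the cached write
    length: int        # number of bytes covered
    log_position: int  # where the write data sits in the oplog


class OplogIndex:
    def __init__(self):
        self._descriptors = []  # kept in arrival order, newest last

    def record(self, descriptor):
        self._descriptors.append(descriptor)

    def lookup(self, offset):
        """Return the newest descriptor covering `offset`, or None if not cached."""
        for d in reversed(self._descriptors):
            if d.offset <= offset < d.offset + d.length:
                return d
        return None


index = OplogIndex()
index.record(Descriptor(offset=0, length=8192, log_position=0))
index.record(Descriptor(offset=4096, length=4096, log_position=1))

hit_new = index.lookup(5000)
hit_old = index.lookup(100)
miss = index.lookup(9000)
```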

Writing data to compressed and encrypted volumes

A method of volume compressed header identification includes reading, by a processing device of a host, compressible data on a sector of a storage volume of a storage array. The method further includes compressing the compressible data to generate compressed data for the sector. The method further includes adding, by the processing device of the host, metadata associated with the storage volume to the compressed data. The method further includes writing the compressed data, including the added metadata, to the sector of the storage volume of the storage array.
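The compress, add metadata, and write-back sequence can be sketched with the standard `zlib` and `struct` modules. The header layout (a 4-byte magic value, a volume identifier, and the compressed length), the `CMPR` magic, and the 4 KiB sector size are hypothetical choices for this example.

```python
import struct
import zlib

SECTOR_SIZE = 4096                 # hypothetical sector size in bytes
HEADER = struct.Struct("<4sII")    # magic, volume id, compressed length
MAGIC = b"CMPR"                    # hypothetical marker for a compressed sector


def pack_sector(data, volume_id):
    """Compress sector data, prepend volume metadata, and pad to the sector size."""
    compressed = zlib.compress(data)
    record = HEADER.pack(MAGIC, volume_id, len(compressed)) + compressed
    if len(record) > SECTOR_SIZE:
        raise ValueError("data is not compressible enough to leave room for the header")
    return record.ljust(SECTOR_SIZE, b"\x00")


def unpack_sector(sector):
    """Read the header back and decompress the payload."""
    magic, volume_id, length = HEADER.unpack_from(sector)
    if magic != MAGIC:
        raise ValueError("sector does not carry a compressed header")
    start = HEADER.size
    return volume_id, zlib.decompress(sector[start:start + length])


original = b"A" * SECTOR_SIZE
sector = pack_sector(original, volume_id=7)
volume_id, recovered = unpack_sector(sector)
```

Because the header and the compressed payload together still fit in one sector, the write lands on the same sector the data was read from.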