Patent classifications
G06F16/174
Real-time data replication in a multiple availability zone cloud platform
The present disclosure relates to computer-implemented methods, software, and systems for managing data replication. A request associated with storing content of a file is received at a storage service provided in a multiple availability zone cloud platform. A lock request is sent to an in-memory data grid at a first instance of the storage service to lock the file for accessing. An input stream of the file is received at a persistence interface to be read iteratively in portions. A read portion of the file is iteratively stored in a first file system storage associated with instances of the storage service at a first availability zone. The portions of the file are provided iteratively to a replication executor at the first instance of the storage service to request replication of the content of the file into a second file storage of a second availability zone of the cloud platform.
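The flow in this abstract (lock the file, read the stream in portions, store each portion locally, hand each portion to a replication executor) can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `lock_for` and `store_and_replicate` are hypothetical, a per-file `threading.Lock` stands in for the in-memory data grid lock, and the `replicate` callback stands in for the replication executor.

```python
import io
import threading
from typing import BinaryIO, Callable

# Assumption: a process-local lock registry stands in for the in-memory data grid.
_file_locks: dict[str, threading.Lock] = {}
_registry_guard = threading.Lock()


def lock_for(file_name: str) -> threading.Lock:
    """Return the lock for a file name, creating it on first use."""
    with _registry_guard:
        return _file_locks.setdefault(file_name, threading.Lock())


def store_and_replicate(
    file_name: str,
    stream: BinaryIO,
    local_storage: BinaryIO,
    replicate: Callable[[bytes], None],
    portion_size: int = 4,
) -> int:
    """Read the stream in portions, store each locally, and forward each for replication."""
    total = 0
    with lock_for(file_name):  # lock the file for the duration of the write
        while True:
            portion = stream.read(portion_size)
            if not portion:  # end of stream
                break
            local_storage.write(portion)  # first availability zone
            replicate(portion)            # request replication to the second zone
            total += len(portion)
    return total
```

Forwarding each portion as soon as it is read means the replica never has to wait for the full file to be written in the first zone.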
HYBRID INTERMEDIATE STREAM FORMAT
Systems and methods for a hybrid intermediate stream format are provided. The method includes compressing a vertex into a first data block via a first compression method, compressing the vertex into a second data block via a second compression method, determining the smaller of the first data block and the second data block, finalizing compression of the vertex via the compression method, selected from the first compression method and the second compression method, that corresponds to the smaller data block, and transferring the compressed vertex.
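A minimal sketch of the "compress twice, keep the smaller result" idea follows. The abstract does not name the two compression methods, so `zlib` and `bz2` from the Python standard library are assumed here purely for illustration, and the function names are hypothetical.

```python
import bz2
import zlib


def compress_smaller(vertex: bytes) -> tuple[str, bytes]:
    """Compress with two methods and return (method name, smaller data block)."""
    candidates = {
        "zlib": zlib.compress(vertex),  # assumed first compression method
        "bz2": bz2.compress(vertex),    # assumed second compression method
    }
    method = min(candidates, key=lambda name: len(candidates[name]))
    return method, candidates[method]


def decompress(method: str, block: bytes) -> bytes:
    """Reverse compress_smaller using the recorded method name."""
    if method == "zlib":
        return zlib.decompress(block)
    if method == "bz2":
        return bz2.decompress(block)
    raise ValueError(f"unknown compression method: {method}")
```

The method name has to travel with the block so the receiver knows which decompressor to apply.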
SYSTEMS AND METHODS FOR REPLICATION TIME ESTIMATION IN A DATA DEDUPLICATION SYSTEM
Systems and methods for determining a replication time in a deduplicated file system are disclosed. Maximum streams are determined based on a number of allocated streams on a source node and a number of allocated streams on a target node. An available network bandwidth between the source node and the target node is determined. A delta time is estimated based at least on one or more duplicate fingerprints between a logical space unit of the source node and the target node by using at least one source smart filter and at least one target smart filter. The replication time is determined based on the maximum streams, the available network bandwidth between the source and target nodes, the estimated delta time, and a number of unique fingerprints that exist between the logical space unit of the source node and the target node.
SYSTEM FOR ELECTRONIC DATA COMPRESSION BY AUTOMATED TIME-DEPENDENT COMPRESSION ALGORITHM
A system is provided for electronic data compression by automated time-dependent compression algorithm. In particular, the system may track instances in which a particular dataset is used, copied, or accessed over time. For certain datasets (e.g., datasets that have not been accessed for a threshold amount of time), the system may use a time-based compression algorithm that progressively removes the least significant bits of such datasets as time passes. The compression of the datasets may continue until the system detects that further compression would cause the dataset to be unreadable or unrecoverable. In this way, the system may minimize the computing resources allocated to storing datasets that are not frequently accessed.
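One way to picture the time-dependent scheme is a schedule that drops one more least significant bit for each idle period past a threshold. The sketch below is an assumption-laden illustration: the function names, the day-based schedule, and the fixed `max_bits` cap (standing in for the point at which the dataset would become unreadable) are all hypothetical and not taken from the abstract.

```python
def bits_to_drop(idle_days: int, threshold_days: int, days_per_bit: int, max_bits: int) -> int:
    """Number of least significant bits to remove after a given idle time."""
    if idle_days < threshold_days:
        return 0  # dataset accessed recently: leave it untouched
    extra = (idle_days - threshold_days) // days_per_bit
    return min(max_bits, 1 + extra)  # never go past the assumed recoverability cap


def truncate(samples: list[int], bits: int) -> list[int]:
    """Drop the given number of least significant bits from each non-negative sample."""
    return [s >> bits for s in samples]
```

For example, with a 30-day threshold and one extra bit per 30 idle days, a dataset idle for 90 days loses 3 bits per sample, so the value 255 is stored as 31.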
Method and system for content agnostic file indexing
A computer-implemented method for content-agnostic referencing of a binary data file, the method comprising: determining a length of the binary data file, the length comprising the number of bits of the binary data file; for the determined length, generating all permutations of data of the determined length; locating an index within the generated permutations, wherein the index is the starting position of the binary data file within the generated permutations; and using the length and the index to indicate the binary data file.
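The (length, index) reference can be illustrated without materializing every bit pattern. The sketch below assumes the permutations of a given length are enumerated in ascending binary order, so the index of a file is simply its bits read as an unsigned integer. That ordering and the function names are assumptions for illustration, and the abstract may intend a different enumeration.

```python
def to_reference(data: bytes) -> tuple[int, int]:
    """Return (length in bits, index within the ascending enumeration)."""
    length_bits = len(data) * 8
    index = int.from_bytes(data, "big")  # position in ascending binary order
    return length_bits, index


def from_reference(length_bits: int, index: int) -> bytes:
    """Rebuild the binary data from its (length, index) reference."""
    return index.to_bytes(length_bits // 8, "big")
```

Under this assumed ordering, the two-byte file `00 05` is referenced as length 16 and index 5, and the pair is enough to reconstruct it exactly.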
Container management system with a remote sharing manager
Methods, systems, and computer storage media for providing a set of common flat files in a composite image that can be mounted as a container (i.e., a composite container) to support isolation and interoperation of computing resources. Container management is provided for a container management system based on a composite image file system engine that executes composite operations. In particular, a remote sharing manager operates with a composite engine interface to support generating composite images configured for split layer memory sharing, split layer direct access memory sharing, and dynamic base images. In operation, a plurality of files and a selection of a remote sharing configuration for generating a composite image are accessed. The composite image for the plurality of files and the remote sharing configuration is generated. The composite image is communicated to cause sharing of the composite image, where the sharing is based on the remote sharing configuration.
System, method, and computer program product for generating a file structure
Computer-implemented methods may include receiving first report template data. The first report template data may include a first version identifier identifying a first version of a report schema associated with a first report template. The first version of the report schema may be determined to be subsequent to a current version of the report schema based on the first version identifier. First sample export object data associated with the first version of the report schema may be retrieved. First sample file structure data associated with a file structure of the first version of the report schema may be determined based on the first sample export object data. A first file structure may be generated based on the first sample file structure data. The first file structure may be populated with a plurality of report templates including the first report template. Systems and computer program products are also provided.
OPTIMIZING RESOURCES IN A DISASTER RECOVERY CLEANUP PROCESS
In an approach for optimizing resources in a disaster recovery cleanup process, processors are configured for receiving transaction entries represented by transaction identifiers at a source database in communication with target databases via Synchronous-to-Asynchronous Traffic Converters (SATCs). Further, the processors are configured for transmitting a transaction payload from the SATCs to the target databases; identifying completed tracking entries corresponding to tracking entries having a complete status for the SATCs; deleting remaining transaction entries ranging from a transaction entry associated with a highest processed transaction identifier to a transaction entry associated with a lowest processed transaction identifier; providing a list of the remaining transaction entries that were deleted to predecessors of the SATCs; removing the remaining transaction entries from the SATCs if the transaction entries were delivered to all target databases; and detecting a topology change corresponding to one or more additional SATCs integrated with the one or more SATCs.
EFFICIENT REPLICATION OF FILE CLONES
A method for managing replication of cloned files is provided. Embodiments include determining, at a source system, that a first file has been cloned to create a second file. Embodiments include sending, from the source system to a replica system, an address of the first extent and an indication that a status of the first extent has changed from non-cloned to cloned. Embodiments include changing, at the replica system, a status of a second extent associated with a replica of the first file on the replica system from non-cloned to cloned and creating a mapping of the address of the first extent to an address of the second extent on the replica system. Embodiments include creating, at the replica system, a replica of the second file comprising a reference to the address of the second extent on the replica system.
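The replica-side bookkeeping described above can be sketched with two dictionaries: one for extent status and one mapping source extent addresses to replica extent addresses. The class and method names are hypothetical, and integer addresses are an assumption made only to keep the example small.

```python
class ReplicaSystem:
    """Toy replica that tracks extent status and source-to-replica address mappings."""

    def __init__(self) -> None:
        self.extent_status: dict[int, str] = {}   # replica extent address -> status
        self.address_map: dict[int, int] = {}     # source extent address -> replica extent address
        self.files: dict[str, list[int]] = {}     # file name -> replica extent addresses

    def add_file(self, name: str, replica_addrs: list[int]) -> None:
        """Register a replicated file whose extents start out non-cloned."""
        self.files[name] = list(replica_addrs)
        for addr in replica_addrs:
            self.extent_status[addr] = "non-cloned"

    def on_extent_cloned(self, source_addr: int, replica_addr: int) -> None:
        """Handle the source's notice that an extent changed from non-cloned to cloned."""
        self.extent_status[replica_addr] = "cloned"
        self.address_map[source_addr] = replica_addr

    def create_clone(self, name: str, source_addrs: list[int]) -> None:
        """Create the replica of the cloned file by referencing existing replica extents."""
        self.files[name] = [self.address_map[a] for a in source_addrs]
```

Because the clone only references the mapped extent, no file data has to be sent again for the second file.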
Elastic, ephemeral in-line deduplication service
A deduplication service can be provided to a storage domain from a services framework that expands and contracts to both meet service demand and to conform to resource management of a compute domain. The deduplication service maintains a fingerprint database and reference count data in compute domain resources, but persists these into the storage domain for use in the case of a failure or interruption of the deduplication service in the compute domain. The deduplication service responds to service requests from the storage domain with indications of paths in a user namespace and whether or not a piece of data had a fingerprint match in the fingerprint database. The indication of a match guides the storage domain to either store the piece of data into the storage backend or to reference another piece of data. The deduplication service uses the fingerprints to define paths for corresponding pieces of data.
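The request and response shape described in this abstract can be sketched as a small in-memory service that fingerprints each piece of data, reports whether the fingerprint was already known, and derives a path from the fingerprint. SHA-256, the `/dedup/<prefix>/<fingerprint>` path layout, and all names below are assumptions for illustration. Persisting the database into the storage domain is omitted.

```python
import hashlib


class DedupService:
    """Toy in-line deduplication service with a fingerprint database and reference counts."""

    def __init__(self) -> None:
        self.refcounts: dict[str, int] = {}  # fingerprint -> reference count

    def handle(self, data: bytes) -> tuple[str, bool]:
        """Return (path derived from the fingerprint, whether the fingerprint matched)."""
        fingerprint = hashlib.sha256(data).hexdigest()  # assumed fingerprint function
        matched = fingerprint in self.refcounts
        self.refcounts[fingerprint] = self.refcounts.get(fingerprint, 0) + 1
        path = f"/dedup/{fingerprint[:2]}/{fingerprint}"  # hypothetical path layout
        return path, matched
```

A caller would store the data at the returned path when `matched` is false and only record a reference to that path when it is true.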