G06F16/1756

SYSTEM AND METHOD FOR AN ULTRA HIGHLY AVAILABLE, HIGH PERFORMANCE, PERSISTENT MEMORY OPTIMIZED, SCALE-OUT DATABASE

A shared-nothing database system is provided in which parallelism and workload balancing are increased by assigning the rows of each table to “slices”, and storing multiple copies (“duplicas”) of each slice across the persistent storage of multiple nodes of the shared-nothing database system. When the data for a table is distributed among the nodes of a shared-nothing system in this manner, requests to read data from a particular row of the table may be handled by any node that stores a duplica of the slice to which the row is assigned. For each slice, a single duplica of the slice is designated as the “primary duplica”. All DML operations (e.g. inserts, deletes, updates, etc.) that target a particular row of the table are performed by the node that has the primary duplica of the slice to which the particular row is assigned. The changes made by the DML operations are then propagated from the primary duplica to the other duplicas (“secondary duplicas”) of the same slice.
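The routing rule in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the modulo slice assignment, the `SlicedTable` class, the node names, and the synchronous propagation are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

NUM_SLICES = 4  # assumed slice count, not taken from the abstract


def slice_of(row_key: int) -> int:
    # Hypothetical assignment rule: integer key modulo the slice count.
    return row_key % NUM_SLICES


@dataclass
class Duplica:
    node: str
    rows: dict = field(default_factory=dict)


class SlicedTable:
    def __init__(self, placement):
        # placement maps slice id -> node names; the first node listed
        # is assumed to hold the primary duplica of that slice.
        self.duplicas = {
            s: [Duplica(n) for n in nodes] for s, nodes in placement.items()
        }

    def write(self, row_key, value):
        # DML is applied at the primary duplica, then propagated to the
        # secondary duplicas (done synchronously here for simplicity).
        primary, *secondaries = self.duplicas[slice_of(row_key)]
        primary.rows[row_key] = value
        for duplica in secondaries:
            duplica.rows[row_key] = value
        return primary.node

    def read(self, row_key, node):
        # Any node that stores a duplica of the row's slice may serve the read.
        for duplica in self.duplicas[slice_of(row_key)]:
            if duplica.node == node:
                return duplica.rows.get(row_key)
        raise KeyError(f"{node} stores no duplica of slice {slice_of(row_key)}")
```

With a placement such as `{1: ["n2", "n3"], ...}`, a write to row 5 is handled by node `n2`, while a later read of row 5 may be served by either `n2` or `n3`.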

Delivery of digital information to a remote device
11709810 · 2023-07-25

Methods and systems are disclosed for a file distribution scheme in a computer network that distributes files efficiently, reducing, among other things, network traffic. In an embodiment of the invention, a method for updating a file is disclosed. In such a method, unique chunks in a first version of a digital file are identified. For a second version of the digital file, chunks that are the same as in the first version are identified. Recompilation information is generated and stored for these identified chunks. Also, for the second version of the digital file, chunks in the second version that are different from chunks in the first version are identified. Recompilation information is generated and stored for these identified chunks. With this information, the second version of the digital file is completely defined and can be efficiently stored.
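The chunk-based update can be sketched as follows. This is an illustration under stated assumptions: fixed-size chunks, SHA-256 digests as chunk identities, and an ordered digest list standing in for the "recompilation information". None of these specifics come from the abstract.

```python
import hashlib

CHUNK_SIZE = 4  # assumed fixed chunk size, kept tiny for illustration


def chunks(data: bytes):
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


def digest(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


def index_version(data: bytes, store: dict) -> list:
    """Add any chunks not yet in `store`; return the ordered digest list
    (the hypothetical recompilation information) for this version."""
    recipe = []
    for chunk in chunks(data):
        h = digest(chunk)
        store.setdefault(h, chunk)  # chunks shared with earlier versions are stored once
        recipe.append(h)
    return recipe


def rebuild(recipe: list, store: dict) -> bytes:
    # A version is fully defined by its recipe plus the shared chunk store.
    return b"".join(store[h] for h in recipe)
```

Indexing `b"AAAABBBBCCCC"` and then `b"AAAAXXXXCCCC"` against the same store adds only one new chunk for the second version, which is the storage and transfer saving the abstract refers to.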

COMPRESSION OF LOCALIZED FILES
20230021891 · 2023-01-26

A method for compressing a first application file and a second application file includes: accessing the first and second application files, the first application file being in a first language and the second application file being in a second language and being a counterpart of the first application file; decompressing the first and second application files to access their internal files; comparing one of the first internal files to one of the second internal files; upon determining that the first internal file is identical to the second internal file, copying one of the internal files to an output folder; upon determining that the files are not identical, either copying both internal files to the output folder or executing a differencing procedure on the first and second internal files to identify differences between them and storing data about the differences in the output folder; and compressing the output folder into one output file.
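A rough sketch of the compare-and-copy branch of this method follows. It assumes the application files are ZIP archives and uses the hypothetical folder names `common/`, `first/`, and `second/` inside the output archive. The differencing branch is omitted.

```python
import io
import zipfile


def merge_localized(first: bytes, second: bytes) -> bytes:
    """Combine two localized archives, storing identical internal files once."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(first)) as a, \
         zipfile.ZipFile(io.BytesIO(second)) as b, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as z:
        names_a, names_b = set(a.namelist()), set(b.namelist())
        for name in sorted(names_a):
            data_a = a.read(name)
            if name in names_b and b.read(name) == data_a:
                # Identical internal files: keep a single copy.
                z.writestr(f"common/{name}", data_a)
            else:
                # Not identical: keep both copies (no differencing here).
                z.writestr(f"first/{name}", data_a)
                if name in names_b:
                    z.writestr(f"second/{name}", b.read(name))
        for name in sorted(names_b - names_a):
            z.writestr(f"second/{name}", b.read(name))
    return out.getvalue()
```

For two archives that share the same binary but differ in a strings file, the output holds one copy of the binary and both variants of the strings file.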

Information source agent systems and methods for distributed data storage and management using content signatures
11561931 · 2023-01-24

Information source agent systems and methods for distributed content storage and management using content signatures that use file identicality properties are provided. A data management system is provided that includes a content engine for managing the storage of file content, a content signature generator that generates a unique content signature for a file processed by the content engine, a content signature comparator that compares content signatures and a content signature repository that stores content signatures. Information source agents are provided that include content signature generators and content signature comparators. Methods are provided for the efficient management of files using content signatures that take advantage of file identicality properties. Content signature application modules and registries exist within information source clients and centralized servers to support the content signature methods.

DIFFERENCE ENGINE FOR MEDIA CHANGE MANAGEMENT
20230215467 · 2023-07-06

A universal media difference engine generates a change list specifying the edits required to create an edited revision of a media composition from a base version. The difference engine determines the format of the media composition, locates and installs a plug-in corresponding to the format, and uses the plug-in to parse the composition and generate the change list. The supported compositional formats include formats native to specific media editing applications, as well as interoperable formats. The difference engine is able to convert rich change lists expressed in native form to canonical change lists that are compatible with multiple editing applications. Timeline, mixer configuration, and scene graph composition types are supported. Content management system storage requirements are reduced by storing a base version and change lists instead of multiple revisions of the composition. A media composition recreation engine recreates an edited revision by applying a change list to a prior version.

SYSTEMS AND METHODS FOR REPLICATION TIME ESTIMATION IN A DATA DEDUPLICATION SYSTEM

Systems and methods for determining a replication time in a deduplicated file system are disclosed. Maximum streams are determined based on a number of allocated streams on a source node and a number of allocated streams on a target node. An available network bandwidth between the source node and the target node is determined. A delta time is estimated based at least on one or more duplicate fingerprints between a logical space unit of the source node and the target node by using at least one source smart filter and at least one target smart filter. The replication time is determined based on the maximum streams, the available network bandwidth between the source and target nodes, the estimated delta time, and a number of unique fingerprints that exist between the logical space unit of the source node and the target node.

EFFICIENT REPLICATION OF FILE CLONES
20220414064 · 2022-12-29

A method for managing replication of cloned files is provided. Embodiments include determining, at a source system, that a first file has been cloned to create a second file. Embodiments include sending, from the source system to a replica system, an address of the first extent and an indication that a status of the first extent has changed from non-cloned to cloned. Embodiments include changing, at the replica system, a status of a second extent associated with a replica of the first file on the replica system from non-cloned to cloned and creating a mapping of the address of the first extent to an address of the second extent on the replica system. Embodiments include creating, at the replica system, a replica of the second file comprising a reference to the address of the second extent on the replica system.

Analysis of streaming data using deltas and snapshots

Implementations described herein relate to methods, systems, and computer-readable media to obtain snapshots used for analysis of streaming data. In some implementations, a computer-implemented method includes receiving initial data that includes a plurality of identifiers and corresponding timestamps; generating and storing a snapshot based on the initial data, wherein the snapshot includes the identifiers and a corresponding status; and receiving a data stream that includes a subset of the identifiers, activity information for each identifier in the subset, and corresponding timestamps. The method further includes periodically analyzing the data stream to obtain a delta that includes an updated status for each identifier in the subset, and storing the delta separately from the snapshot. The method further includes receiving a request for identifiers that are active in a particular time period, and based on the particular time period, retrieving active identifiers from the data stream, the delta, or the snapshot.
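A minimal sketch of the snapshot, delta, and stream layering described above. The `ActivityStore` class, the boolean status, and the rule that every identifier in the initial data starts as active are assumptions for illustration. Queries that fall between the snapshot time and the last analysis time are answered with the delta's status as of the last analysis, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Event:
    identifier: str
    timestamp: int
    active: bool


class ActivityStore:
    def __init__(self, initial: dict):
        # initial maps identifier -> timestamp; assume all start active.
        self.snapshot = {ident: True for ident in initial}
        self.snapshot_time = max(initial.values(), default=0)
        self.delta = {}          # stored separately from the snapshot
        self.delta_time = self.snapshot_time
        self.stream = []         # events not yet folded into the delta

    def receive(self, event: Event):
        self.stream.append(event)

    def analyze(self, now: int):
        """Periodic step: fold stream events up to `now` into the delta."""
        pending = []
        for e in sorted(self.stream, key=lambda e: e.timestamp):
            if e.timestamp <= now:
                self.delta[e.identifier] = e.active
            else:
                pending.append(e)
        self.stream = pending
        self.delta_time = now

    def active_as_of(self, t: int) -> list:
        # Consult only the layers that the requested time reaches.
        status = dict(self.snapshot)
        if t > self.snapshot_time:
            status.update(self.delta)
        if t > self.delta_time:
            for e in sorted(self.stream, key=lambda e: e.timestamp):
                if e.timestamp <= t:
                    status[e.identifier] = e.active
        return sorted(i for i, is_active in status.items() if is_active)
```

Keeping the delta small and separate means a request for an old time period reads only the snapshot, while a request for the most recent period also consults the delta and the unprocessed tail of the stream.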

Processing device configured for efficient generation of data reduction estimates for combinations of datasets
11593313 · 2023-02-28

An apparatus in one embodiment comprises at least one processing device comprising a processor coupled to a memory. The processing device is configured to identify at least first and second datasets to be scanned to generate a data reduction estimate for a prospective combination of the first and second datasets, to designate a scan criterion to be utilized in the scan of each of the datasets, and for each of a plurality of pages of each of the datasets, to scan the page, where scanning the page comprises performing a computation on the page to obtain a page result, determining whether or not the page result satisfies the designated scan criterion, and responsive to the page result satisfying the designated scan criterion, updating a corresponding entry of a data reduction estimate table for the dataset. The processing device merges contents of the data reduction estimate tables, and generates the data reduction estimate based at least in part on the merged contents.
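The scan, merge, and estimate flow can be sketched as below. The abstract does not specify the computation or the criterion, so this sketch assumes a truncated SHA-256 page hash as the page result and a bit-mask test as the scan criterion (a common sampling approach). The ratio of sampled pages to distinct sampled hashes is likewise only one plausible estimate.

```python
import hashlib
from collections import Counter


def page_result(page: bytes) -> int:
    # Assumed computation: first 8 bytes of the page's SHA-256 digest.
    return int.from_bytes(hashlib.sha256(page).digest()[:8], "big")


def scan_dataset(pages, mask: int = 0x3) -> Counter:
    """Build a per-dataset estimate table: sampled hash -> occurrence count."""
    table = Counter()
    for page in pages:
        result = page_result(page)
        # Assumed scan criterion: the masked low bits of the hash are zero.
        if result & mask == 0:
            table[result] += 1
    return table


def estimate_reduction(tables) -> float:
    """Merge the tables and return sampled pages per distinct sampled hash."""
    merged = Counter()
    for table in tables:
        merged.update(table)  # adds counts for hashes seen in several datasets
    total = sum(merged.values())
    return total / len(merged) if merged else 1.0
```

Because the same criterion is applied to every dataset, a page duplicated across datasets is either sampled in both tables or in neither, so the merged table reflects cross-dataset duplicates without rescanning the combined data.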