G06F16/1858

System and method for analyzing data records

A method processes data records. The method partitions the data records into groups and assigns each group to a respective process of a first plurality of processes, which execute in parallel. For each group, the assigned process extracts information from the data records, applies a script whose information processing commands are applied sequentially to produce intermediate values, stores the intermediate values in a respective intermediate data structure, and updates the status of the group to indicate completion. When a predefined threshold percentage of the data records has been processed, the method assigns each group to a respective second process as a backup. When each group has been completed by at least one process (either the original or the backup), the method executes a second plurality of processes that aggregate the intermediate values from the intermediate data structures to produce output data. The aggregation includes the intermediate values for each group only once.
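A minimal Python sketch of the backup-task scheme, under stated assumptions: threads from `concurrent.futures` stand in for processes, the per-group "script" is a hypothetical word count, and backups are launched only for groups still outstanding when the threshold is crossed. The names `partition`, `run_script`, and `analyze` are illustrative, not from the patent. What it shows is that keeping only the first finished result per group lets primaries and backups race without double counting.

```python
from collections import Counter
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def partition(records, num_groups):
    """Split the records into num_groups groups (round-robin)."""
    return [records[i::num_groups] for i in range(num_groups)]


def run_script(group):
    """Hypothetical script: apply the same commands to each record in turn."""
    counts = Counter()
    for record in group:
        for word in record.split():
            counts[word] += 1
    return counts


def analyze(records, num_groups=4, threshold=0.75):
    groups = partition(records, num_groups)
    total = len(records)
    intermediate = {}  # group index -> first completed intermediate result
    with ThreadPoolExecutor(max_workers=2 * num_groups) as pool:
        # First plurality of workers: one primary task per group.
        pending = {pool.submit(run_script, g): i for i, g in enumerate(groups)}
        backups_started = False
        while len(intermediate) < num_groups:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                i = pending.pop(fut)
                # Keep only the first result per group so each group counts once.
                intermediate.setdefault(i, fut.result())
            finished = sum(len(groups[i]) for i in intermediate)
            if not backups_started and total and finished / total >= threshold:
                # Launch a backup task for every group not yet completed.
                for i, g in enumerate(groups):
                    if i not in intermediate:
                        pending[pool.submit(run_script, g)] = i
                backups_started = True
    # Aggregation: exactly one intermediate result per group.
    output = Counter()
    for i in sorted(intermediate):
        output.update(intermediate[i])
    return output
```

For brevity the aggregation step here is a single loop rather than a second pool of workers; leftover duplicate tasks simply finish and are ignored.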

Performing a closure merge operation

In a method for data management, one or more processors identify a source closure, a target file set, and a previously merged closure, where the source closure is a closure of files that includes changed files to merge into the target file set, and the previously merged closure is a closure of files previously merged into the target file set. The method further includes one or more processors loading the identified source closure, the previously merged closure, and a closure of ancestor files shared by the identified source closure and the previously merged closure into a merge session. The method further includes one or more processors determining one or more file merge conflicts in the merge session by comparing the differences between the source closure and the ancestor closure with the differences between the previously merged closure and the ancestor closure.
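A minimal Python sketch of the conflict test, assuming each closure is modeled as a hypothetical dict from file path to a content version, where a missing path means the file is absent from that closure. A file conflicts when both the source closure and the previously merged closure changed it relative to the ancestor closure but to different results; identical changes on both sides are not conflicts. `find_merge_conflicts` is an illustrative name.

```python
def find_merge_conflicts(source, merged, ancestor):
    """Return sorted paths whose changes in `source` and `merged` disagree.

    Each closure is a hypothetical dict: path -> content version.
    A path missing from a closure is treated as deleted (value None).
    """

    def changes(closure):
        # Paths whose version differs from the ancestor closure.
        paths = closure.keys() | ancestor.keys()
        return {
            p: closure.get(p)
            for p in paths
            if closure.get(p) != ancestor.get(p)
        }

    source_changes = changes(source)
    merged_changes = changes(merged)
    # A conflict needs a change on both sides that does not match.
    return sorted(
        p
        for p in source_changes.keys() & merged_changes.keys()
        if source_changes[p] != merged_changes[p]
    )
```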

Cluster file system with metadata server for storage of parallel log structured file system metadata for a shared file

Data from a group of distributed processes is written to a shared file using a parallel log-structured file system. A metadata server of a cluster file system is configured to communicate with a plurality of object storage servers of the cluster file system over a network. The metadata server is further configured to implement a Parallel Log Structured File System (PLFS) library that coordinates storage, on one or more of the object storage servers, of portions of a shared file generated by a plurality of applications executing on compute nodes of the cluster file system, and to store metadata for those portions of the shared file. Concurrent writes to the shared file are decoupled by writing the portions generated by each application to an independent write stream for that application. The metadata server communicates with the applications executing on the compute nodes over the network to process their metadata requests.
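The decoupling of concurrent writes can be sketched as a toy, in-memory Python model. `SharedFile` and `IndexEntry` are hypothetical names, not the PLFS library API: each application appends to its own log, and an index (the kind of metadata the server described above would keep) records where each logical range landed, so a read can rebuild the shared file.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    logical_offset: int   # position in the logical shared file
    length: int
    writer: str           # which application's log holds the bytes
    physical_offset: int  # position inside that application's log
    seq: int              # global write order, resolves overlapping writes


@dataclass
class SharedFile:
    """Toy PLFS-style container: one append-only log per writer plus an index."""

    logs: dict = field(default_factory=dict)   # writer -> bytearray write stream
    index: list = field(default_factory=list)  # metadata for every written portion
    _seq: int = 0

    def write(self, writer, logical_offset, data):
        log = self.logs.setdefault(writer, bytearray())
        # Append to this writer's own stream, so writers never contend on one file.
        self.index.append(
            IndexEntry(logical_offset, len(data), writer, len(log), self._seq)
        )
        log.extend(data)
        self._seq += 1

    def read(self):
        """Rebuild the logical file by replaying index entries in write order."""
        size = max((e.logical_offset + e.length for e in self.index), default=0)
        out = bytearray(size)  # unwritten holes read as zero bytes
        for e in sorted(self.index, key=lambda e: e.seq):
            chunk = self.logs[e.writer][e.physical_offset:e.physical_offset + e.length]
            out[e.logical_offset:e.logical_offset + e.length] = chunk
        return bytes(out)
```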

Data beacon pulser(s) powered by information slingshot
11487717 · 2022-11-01

Systems and methods for providing data beacons are disclosed. In some embodiments, the system can include a first node and a second node. Each node includes a read queue, a write queue, and a parallel file system. Data is written from the write queue on the first node to the parallel file system on the second node, and from the write queue on the second node to the parallel file system on the first node. The read queue on each node receives data from the parallel file system on that node itself.
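A toy Python model of the two-node arrangement, assuming in-memory queues and a plain list as a stand-in for each node's parallel file system. The class `Node` and the function `pulse` are hypothetical: each node drains its write queue into its peer's file system, then fills its own read queue from its local file system.

```python
from collections import deque


class Node:
    """Hypothetical node with a read queue, a write queue and a local file system."""

    def __init__(self, name):
        self.name = name
        self.read_queue = deque()
        self.write_queue = deque()
        self.pfs = []        # stand-in for this node's parallel file system
        self._read_pos = 0   # how much of the local file system has been read

    def flush_to(self, peer):
        """Drain this node's write queue into the peer's file system."""
        while self.write_queue:
            peer.pfs.append(self.write_queue.popleft())

    def load_reads(self):
        """Fill the read queue with new data from this node's own file system."""
        self.read_queue.extend(self.pfs[self._read_pos:])
        self._read_pos = len(self.pfs)


def pulse(a, b):
    """One exchange: cross-write between the nodes, then each reads locally."""
    a.flush_to(b)
    b.flush_to(a)
    a.load_reads()
    b.load_reads()
```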

PROCESSING STREAMS ON EXTERNAL DATA SOURCES
20220058160 · 2022-02-24

The subject technology receives an operation to perform on an external data source accessible via a network, the external data source being hosted by an external system separate from a network-based database system. The subject technology determines a set of shards corresponding to the external data source. The subject technology determines a set of offsets for each shard in the set of shards. Based on the set of shards and the set of offsets, the subject technology performs the operation on the external data source. The subject technology provides an indication that the operation is complete.
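A minimal Python sketch of offset-based processing, assuming the external source is modeled as a hypothetical dict from shard id to an ordered list of records, and that an offset is the index of the next unread record in a shard. `process_external_stream` is an illustrative name. Persisting the returned offsets lets the next call resume where this one stopped.

```python
def process_external_stream(source, offsets, operation):
    """Apply `operation` to every record past each shard's stored offset.

    source:  hypothetical model of the external data source,
             shard id -> ordered list of records.
    offsets: shard id -> index of the next unread record (default 0).
    Returns (results, new_offsets, status).
    """
    new_offsets = dict(offsets)
    results = []
    for shard in sorted(source):              # determine the set of shards
        start = offsets.get(shard, 0)         # determine this shard's offset
        records = source[shard][start:]
        results.extend(operation(r) for r in records)
        new_offsets[shard] = start + len(records)
    return results, new_offsets, "complete"   # indicate the operation finished
```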

Database migration management

A database to be migrated from a first database system to a second database system is identified. Before the database is migrated from the first database system to the second database system, information associated with the first database system is analyzed to determine a physical design for the database in the second database system.

Data conversion method
09779100 · 2017-10-03

Methods of converting data are provided. In one embodiment, a data conversion method is provided that includes partitioning a data file into a plurality of file segments. The method also includes assigning a plurality of key values to each of the plurality of file segments. The method further includes forming a key value file from the plurality of key values.
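One possible reading of the conversion steps, sketched in Python. The abstract does not say what the key values are, so this sketch assumes (hypothetically) a segment number, byte offset, length, and SHA-256 digest for each fixed-size segment, written as one `key=value` line per segment. The function names are illustrative.

```python
import hashlib


def split_segments(data: bytes, segment_size: int):
    """Partition the file contents into fixed-size segments."""
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


def key_values(index: int, offset: int, segment: bytes) -> dict:
    """Hypothetical key values for one segment."""
    return {
        "segment": index,
        "offset": offset,
        "length": len(segment),
        "sha256": hashlib.sha256(segment).hexdigest(),
    }


def build_key_value_file(data: bytes, segment_size: int) -> str:
    """Form the key value file: one line of key=value pairs per segment."""
    lines = []
    offset = 0
    for i, segment in enumerate(split_segments(data, segment_size)):
        kv = key_values(i, offset, segment)
        lines.append(",".join(f"{k}={v}" for k, v in kv.items()))
        offset += len(segment)
    return "".join(line + "\n" for line in lines)
```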

Lustre file system

A computer-executable method, system, and computer program product of managing I/O received within a Lustre file system, the computer-executable method, system, and computer program product comprising: receiving a data I/O request, wherein the data I/O request relates to data stored within the Lustre file system; processing the data I/O request in a journal stored on a fast data storage device within the Lustre file system; analyzing the journal to make a determination related to the data I/O request; and responding to the data I/O request.
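A toy Python model of the journaling idea, assuming a list stands in for the journal on the fast storage device and a dict for slower backing storage. `JournaledStore` is a hypothetical name, not a Lustre interface. Writes are acknowledged once journaled; reads are answered by analyzing the journal first, so they see the newest write before it reaches backing storage.

```python
class JournaledStore:
    """Toy model: a fast journal in front of slower backing storage."""

    def __init__(self):
        self.journal = []   # (op, key, value) records on the 'fast device'
        self.backing = {}   # stand-in for slower Lustre storage targets

    def handle(self, op, key, value=None):
        if op == "write":
            # Process the request by recording it in the journal, then acknowledge.
            self.journal.append(("write", key, value))
            return "ack"
        if op == "read":
            # Analyze the journal: the newest journaled write for this key wins.
            for rec_op, rec_key, rec_value in reversed(self.journal):
                if rec_op == "write" and rec_key == key:
                    return rec_value
            return self.backing.get(key)
        raise ValueError(f"unsupported op: {op}")

    def flush(self):
        """Apply journaled writes to backing storage in order, then clear the journal."""
        for _, key, value in self.journal:
            self.backing[key] = value
        self.journal.clear()
```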

Parallel streaming
09736202 · 2017-08-15

Embodiments of the present invention set forth techniques for a content player to stream a media file using multiple network connections. To stream the media file, the content player downloads metadata associated with a requested media file, establishes network connections with multiple content servers (or multiple network connections with a single content server, or both), and begins requesting portions of the media file. In response, the requested portions are transmitted to the content player. The content player may employ a predictive multi-connection scheduling approach to determine which network connection to use in downloading a given chunk.
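The abstract does not spell out the predictive scheduler, so the following Python sketch swaps in a simple greedy rule as an assumption: each chunk goes to the connection whose predicted finish time, (queued bytes + chunk size) / estimated throughput, is smallest. The throughput estimates are hypothetical inputs; a real player would update them from measured download rates. `schedule_chunks` is an illustrative name.

```python
def schedule_chunks(chunk_sizes, throughputs):
    """Greedily assign each chunk to the connection predicted to finish it first.

    chunk_sizes: sizes of the chunks to download, in request order.
    throughputs: hypothetical per-connection throughput estimates (bytes/s).
    Returns a list of connection indices, one per chunk.
    """
    queued = [0] * len(throughputs)  # bytes already assigned to each connection
    plan = []
    for size in chunk_sizes:
        # Predicted completion time if this chunk joined connection c's queue.
        best = min(
            range(len(throughputs)),
            key=lambda c: (queued[c] + size) / throughputs[c],
        )
        queued[best] += size
        plan.append(best)
    return plan
```

For example, `schedule_chunks([2, 2, 2], [2.0, 1.0])` returns `[0, 0, 1]`: the faster connection takes the first two chunks, and the third goes to the slower one because it would finish there sooner.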

MARKOV DECISION PROCESS FOR EFFICIENT DATA TRANSFER
20220035855 · 2022-02-03

Techniques are disclosed for improving transfer speed for a plurality of files (e.g., image files) by using a Markov decision process to determine an optimal number of parallel instances of transfer stages and optimal file batch sizes for each instance. The transfer (e.g., import or export) operation involves different stages that are each optimized using the Markov decision process. The stages include a file fetch operation, a file processing operation, and a database update operation. Each stage may have multiple parallel instances to process many files at the same time. The Markov decision process uses a reward structure to determine the optimal number of parallel instances for each stage and the number of files each instance operates on at any given moment. The process is dynamic and adaptable to any system environment because it does not rely on any particular hardware or operating system configuration.
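To make the idea concrete, here is a small value-iteration sketch in Python over a deliberately simplified MDP: the state is only the number of parallel instances of one stage, actions remove, keep, or add one instance, and `reward(n)` is a hypothetical throughput model supplied by the caller. Batch sizes and multiple stages, which the abstract also covers, are left out, and `value_iteration` is an illustrative name rather than the patent's method.

```python
def value_iteration(reward, max_instances, gamma=0.9, tol=1e-9):
    """Solve a small deterministic MDP over the number of parallel instances.

    States are instance counts 1..max_instances; actions add -1, 0 or +1
    instance. reward(n) is a hypothetical throughput for n instances.
    Returns (values, policy) where policy maps state -> best action.
    """
    states = range(1, max_instances + 1)
    actions = (-1, 0, 1)

    def step(n, a):
        # Deterministic transition, clamped to the valid instance range.
        return min(max(n + a, 1), max_instances)

    def q(values, n, a):
        # Immediate reward after the move plus discounted future value.
        nxt = step(n, a)
        return reward(nxt) + gamma * values[nxt]

    values = {n: 0.0 for n in states}
    while True:
        new_values = {n: max(q(values, n, a) for a in actions) for n in states}
        delta = max(abs(new_values[n] - values[n]) for n in states)
        values = new_values
        if delta < tol:
            break
    policy = {n: max(actions, key=lambda a: q(values, n, a)) for n in states}
    return values, policy
```

With `reward = lambda n: 10 * n - n * n` and up to 8 instances, the resulting policy adds instances below 5, holds at 5 (the throughput peak), and removes instances above it.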