Patent classifications
G06F16/1858
ASSIGN PLACEMENT POLICY TO SEGMENT SET
A plurality of segment sets of one or more storage segments of a distributed file system may be created and/or updated. The storage segments may be independently controlled. A placement policy may be assigned to each of the plurality of segment sets. The placement policy may control an initial placement and/or relocation of an object to the one or more storage segments for the assigned segment set.
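The per-set policy described above can be sketched in a few lines; this is a minimal illustration, and the class and method names (`PlacementPolicy`, `SegmentSet`, `assign_policy`) are assumptions, not the patent's actual implementation.

```python
# Hypothetical sketch: each segment set carries its own placement policy,
# which picks a storage segment for initial placement of an object.
from dataclasses import dataclass, field


@dataclass
class PlacementPolicy:
    def place(self, obj_id: str, segments: list) -> str:
        # Simple deterministic placement across the set's segments.
        return segments[sum(obj_id.encode()) % len(segments)]


@dataclass
class SegmentSet:
    name: str
    segments: list = field(default_factory=list)
    policy: PlacementPolicy = field(default_factory=PlacementPolicy)

    def assign_policy(self, policy: PlacementPolicy) -> None:
        # A policy is assigned per segment set, as in the abstract.
        self.policy = policy

    def store(self, obj_id: str) -> str:
        return self.policy.place(obj_id, self.segments)


hot = SegmentSet("hot", segments=["ssd-0", "ssd-1"])
target = hot.store("object-42")
```

Relocation would follow the same path: the set's policy decides the new segment, so changing the policy of one set never affects placement in another.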
Managing I/O operations in a shared file system
A method for managing I/O operations in a shared file system environment. The method includes receiving, for each of a plurality of compute nodes, information associated with I/O accesses to a shared file system and with the applications executing the I/O accesses. The method includes creating application profiles, based, at least in part, on the received information. The method then includes determining execution priorities for the applications, based, at least in part, on the created application profiles.
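The profile-then-prioritize flow can be mimicked in miniature. This is an illustrative sketch only; the report format and the priority rule (less total I/O ranks higher) are assumptions, not taken from the patent.

```python
# Build per-application I/O profiles from per-node reports, then derive
# execution priorities from those profiles (rank 1 = highest priority).
from collections import defaultdict


def build_profiles(reports):
    # reports: iterable of (node, application, bytes_accessed) tuples.
    profiles = defaultdict(int)
    for _node, app, nbytes in reports:
        profiles[app] += nbytes
    return dict(profiles)


def execution_priorities(profiles):
    # Assumed rule: applications with less total I/O get higher priority.
    ranked = sorted(profiles, key=profiles.get)
    return {app: rank + 1 for rank, app in enumerate(ranked)}


reports = [("n1", "appA", 100), ("n2", "appA", 50), ("n1", "appB", 10)]
profiles = build_profiles(reports)
priorities = execution_priorities(profiles)
```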
Parallel file system with metadata distributed across partitioned key-value store
Improved techniques are provided for storing metadata associated with a plurality of sub-files associated with a single shared file in a parallel file system. The shared file is generated by a plurality of applications executing on a plurality of compute nodes. A compute node implements a Parallel Log Structured File System (PLFS) library to store at least one portion of the shared file generated by an application executing on the compute node and metadata for the at least one portion of the shared file on one or more object storage servers. The compute node is also configured to implement a partitioned data store for storing a partition of the metadata for the shared file, wherein the partitioned data store communicates with partitioned data stores on other compute nodes using a message passing interface. The partitioned data store can be implemented, for example, using Multidimensional Data Hashing Indexing Middleware (MDHIM).
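The partitioned metadata store can be sketched as a set of per-node index partitions keyed by logical offset. This is a rough stand-in, not MDHIM or PLFS itself; the range-partitioning scheme and entry format are assumptions.

```python
# Sketch of a partitioned store for shared-file metadata: index entries map a
# logical offset in the shared file to (object server, object, physical offset).
class PartitionedMetadataStore:
    def __init__(self, num_partitions: int, partition_size: int = 1 << 20):
        self.num_partitions = num_partitions
        self.partition_size = partition_size
        self.partitions = [dict() for _ in range(num_partitions)]

    def partition_for(self, logical_offset: int) -> int:
        # Range-partition the metadata by logical offset within the shared file,
        # so each compute node's data store holds a slice of the index.
        return (logical_offset // self.partition_size) % self.num_partitions

    def put(self, logical_offset: int, entry: tuple) -> None:
        self.partitions[self.partition_for(logical_offset)][logical_offset] = entry

    def get(self, logical_offset: int):
        return self.partitions[self.partition_for(logical_offset)].get(logical_offset)


store = PartitionedMetadataStore(num_partitions=4)
store.put(0, ("oss-1", "objA", 0))        # (object storage server, object, phys. offset)
store.put(3 << 20, ("oss-2", "objB", 0))
```

In the patented design the partitions live on different compute nodes and exchange entries over a message passing interface; here the list of dictionaries simply stands in for those remote partitions.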
Optimized object status consistency within clustered file systems
Responsive to receiving an identification of a new state identifier associated with the state of an object within a file from a child node, a master node updates a current state identifier for the object to the new state identifier in the master node. Responsive to a predefined user specification indicating that the new state identifier is to be broadcast to each remaining child node of a subset of child nodes in a plurality of child nodes that have a copy of the object, the master node identifies the subset of child nodes in the plurality of child nodes that have a copy of the object in the master node. The master node then broadcasts the new state identifier to the subset of child nodes that have a copy of the object.
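The update-then-broadcast step can be illustrated with a small master-node model. The data structures and method names here are assumptions for the sketch, not the patent's implementation.

```python
# Minimal sketch: the master updates the current state identifier for an object,
# then broadcasts it only to the child nodes that hold a copy of that object.
class MasterNode:
    def __init__(self):
        self.state = {}      # object -> current state identifier
        self.holders = {}    # object -> set of child nodes holding a copy
        self.sent = []       # (child, object, state_id) broadcasts, for inspection

    def on_new_state(self, child, obj, state_id, broadcast=True):
        self.state[obj] = state_id
        if broadcast:  # the predefined user specification, as a flag
            # Identify the subset of children with a copy, excluding the reporter.
            for peer in self.holders.get(obj, set()) - {child}:
                self.sent.append((peer, obj, state_id))


master = MasterNode()
master.holders["file.txt"] = {"c1", "c2", "c3"}
master.on_new_state("c1", "file.txt", "v2")
```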
Distributed data processing framework
In one aspect, there is provided a system. The system may include at least one data processor and may store instructions that result in operations when executed by the at least one data processor. The operations may include receiving raw transactional data, and collating and reading the raw transactional data from a plurality of data sources. The operations may further include randomly sampling the raw transactional data. The operations may further include transforming the raw transactional data into at least one resilient distributed dataset. The operations may further include mapping the at least one resilient distributed dataset with a corresponding unique key. The operations may further include aggregating the at least one resilient distributed dataset on a key field. The operations may further include iterating over a lookup table. The operations may further include aggregating the data lines corresponding to the unique key associated with the at least one resilient distributed dataset. The operations may further include appending in-memory data lines serially to form a consumer level data string.
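The keyed map-and-aggregate pipeline above resembles a Spark-style flow; this sketch mimics it with plain Python lists standing in for resilient distributed datasets, and the record format is an assumption.

```python
# Collated raw transactional data as (unique key, data line) records.
import random
from collections import defaultdict

raw = [("k1", "line-a"), ("k2", "line-b"), ("k1", "line-c")]

# Random sampling of the raw data (here the whole set, in shuffled order).
sample = random.sample(raw, k=len(raw))

# Map each record to its unique key, then aggregate data lines on the key field.
aggregated = defaultdict(list)
for key, line in sample:
    aggregated[key].append(line)

# Append the in-memory data lines serially to form a consumer-level data string.
consumer_strings = {key: "|".join(sorted(lines)) for key, lines in aggregated.items()}
```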
Parallel processing of changes in a distributed system
Systems and methods include reception of a request for changed data of an object from a subscriber, determination of a logging table associated with the object and comprising a plurality of logging table entries, determination of a pointer to a last-processed entry of the logging table based on the object and the subscriber, definition of a plurality of sub-portions of logging table entries subsequent to the last-processed entry, reconstruction and transfer of first data associated with a first one of the plurality of sub-portions to the subscriber using a first process, and reconstruction and transfer, in parallel with the first process, of second data associated with a second one of the plurality of sub-portions to the subscriber using a second process.
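The splitting and parallel hand-off can be sketched as follows; the function names and the stand-in for "reconstruction and transfer" are illustrative assumptions.

```python
# Split the logging-table entries after the last-processed pointer into
# sub-portions, then process the sub-portions in parallel.
from concurrent.futures import ThreadPoolExecutor


def sub_portions(entries, last_processed, num_portions):
    pending = entries[last_processed + 1:]
    size = -(-len(pending) // num_portions)  # ceiling division
    return [pending[i:i + size] for i in range(0, len(pending), size)]


def reconstruct_and_send(portion):
    # Stand-in for reconstructing the changed data and transferring it
    # to the subscriber.
    return [f"sent:{entry}" for entry in portion]


entries = [f"change-{i}" for i in range(10)]
portions = sub_portions(entries, last_processed=3, num_portions=2)
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(reconstruct_and_send, portions))
```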
Cloning a tracking copy of replica data
Cloning a tracking copy of replica data, including receiving, at a target data repository from a source data repository, metadata describing one or more updates to a dataset stored within the source data repository; generating, based on the metadata describing the one or more updates to the dataset, a tracking copy of replica data on the target data repository; and generating, based on the tracking copy, a cloned image of the dataset that is modifiable without modifying the tracking copy of the replica data.
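The key property above, that the cloned image is modifiable without modifying the tracking copy, can be shown with a tiny stand-in; the block-map representation is an assumption, not the patent's storage layout.

```python
# Sketch: a tracking copy built from metadata updates, plus a clone whose
# writes do not disturb the tracking copy.
class TrackingCopy:
    def __init__(self):
        self.blocks = {}

    def apply_metadata(self, updates):
        # updates: metadata describing changes, e.g. {block_id: content}.
        self.blocks.update(updates)

    def clone(self):
        # The cloned image starts from the same blocks but has its own mapping,
        # so modifying the clone leaves the tracking copy intact.
        image = TrackingCopy()
        image.blocks = dict(self.blocks)
        return image


tracking = TrackingCopy()
tracking.apply_metadata({0: "v1"})          # metadata received from the source
image = tracking.clone()
image.blocks[0] = "modified"                # write to the cloned image only
```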
Method for connecting a relational data store's metadata with Hadoop
A system for sharing a metadata store between a relational database and an unstructured data source is disclosed. The unstructured data source may comprise a Hadoop system with a Hadoop Distributed File System.
System And Method For Analyzing Data Records
Systems and methods for analyzing input data records are provided in which a master process initiates a plurality of concurrent first processes each of which comprises, for each data record in at least a subset of a plurality of input data records, creating a parsed representation of the data record and independently applying a procedural language query to the parsed representation to extract one or more values. A respective emit operator is applied to at least one of the extracted one or more values thereby adding corresponding information to a respective intermediate data structure. The respective emit operator implements one of a predefined set of statistical information processing functions. The master process also initiates a plurality of second processes each of which aggregates information from a corresponding subset of intermediate data structures to produce aggregated data that is, in turn, combined to produce output data.
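The two-phase flow above can be mimicked in miniature: first processes parse records and apply an emit operator into per-shard intermediate structures, and second processes aggregate those structures into the output. The record format and the choice of "count" as the emit operator are assumptions for this sketch.

```python
# First processes: parse each record and emit extracted values into an
# intermediate data structure (here a Counter implementing a count aggregation).
from collections import Counter


def first_process(records):
    intermediate = Counter()
    for record in records:
        parsed = record.split(",")   # parsed representation of the record
        intermediate[parsed[0]] += 1  # extract a value and apply the emit operator
    return intermediate


# Second processes: aggregate the intermediate structures into output data.
def second_process(intermediates):
    output = Counter()
    for table in intermediates:
        output.update(table)
    return output


shards = [["us,1", "de,2"], ["us,3"]]
intermediates = [first_process(shard) for shard in shards]
output = second_process(intermediates)
```

In the described system the first processes run concurrently over subsets of the input records; the list comprehension here simply runs them in sequence for clarity.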
Uniform Model For Distinct Types Of Data Replication
A uniform model for distinct types of data replication, including receiving, at a source data repository, an update to a dataset; generating, based on the update to the dataset, both metadata describing the update to the dataset and also a metadata representation of the dataset; and initiating, based on the same metadata describing the update to the dataset and also based on the same metadata representation of the dataset, either a first type of data replication or a second type of data replication from among a plurality of types of data replication.
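The dispatch from one pair of metadata artifacts to either replication type can be sketched minimally; the type names "snapshot" and "continuous" are illustrative assumptions, not the patent's terminology.

```python
# Both replication paths consume the same update metadata and the same
# metadata representation of the dataset; only the chosen type differs.
def replicate(update_metadata, dataset_metadata, replication_type):
    if replication_type == "snapshot":
        return ("snapshot", dataset_metadata)
    if replication_type == "continuous":
        return ("continuous", update_metadata)
    raise ValueError(f"unknown replication type: {replication_type}")


update_md = {"op": "write", "range": (0, 4096)}
dataset_md = {"size": 1 << 30, "version": 7}
snap = replicate(update_md, dataset_md, "snapshot")
cont = replicate(update_md, dataset_md, "continuous")
```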