G06F16/256

PLATFORM AND SOURCE AGNOSTIC DATA PROCESSING FOR STRUCTURED AND UNSTRUCTURED DATA SOURCES

Data queries that are agnostic to any particular data source may include a data source alias. The data source alias may be replaced with a data source identifier to obtain a data query configured for a target data source. Data processing jobs may be agnostic to any particular data processing platform. A data processing job may include a data processing task that is agnostic to any particular data processing platform. A code library may provide platform-specific code configured to implement a data processing task on a data processing platform. A data query configured for a particular data source and a data processing task configured for a particular data processing platform may be used to create a data processing job. Configurations that restrict execution of a data processing job to execution via an interactive development environment may be removed to allow its execution directly at the data processing platform itself.
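
A minimal sketch of alias binding and job assembly follows. It assumes a `$alias` placeholder syntax handled by Python's `string.Template` and uses a dictionary as a stand-in for the code library. `bind_query`, `build_job`, `CODE_LIBRARY`, and the table names are hypothetical and do not come from the source.

```python
from string import Template

# Hypothetical code library: platform-specific code for a platform-agnostic task.
CODE_LIBRARY = {
    ("dedupe", "spark"): "df.dropDuplicates()",
    ("dedupe", "pandas"): "df.drop_duplicates()",
}


def bind_query(agnostic_query: str, alias_map: dict) -> str:
    """Replace each data source alias with the identifier of the target source.

    Template.substitute raises KeyError for an alias with no mapping, so an
    unbound alias is caught before the query reaches any data source.
    """
    return Template(agnostic_query).substitute(alias_map)


def build_job(query: str, alias_map: dict, task: str, platform: str) -> dict:
    """Combine a source-specific query with platform-specific task code."""
    return {
        "platform": platform,
        "query": bind_query(query, alias_map),
        "task_code": CODE_LIBRARY[(task, platform)],
    }


# "$orders" is a hypothetical data source alias.
AGNOSTIC_QUERY = "SELECT id, amount FROM $orders WHERE amount > 100"

job = build_job(AGNOSTIC_QUERY, {"orders": "sales_prod.orders"}, "dedupe", "pandas")
print(job["query"])  # SELECT id, amount FROM sales_prod.orders WHERE amount > 100
```

The same agnostic query and task can be retargeted by changing only the alias map and the platform key.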

INTELLIGENT CACHE MANAGEMENT FOR MOUNTED SNAPSHOTS BASED ON A BEHAVIOR MODEL

A client computing device receives a behavior model corresponding to a user group associated with a user. The behavior model has been trained with monitored user interactions with one or more files associated with the user group. The client computing device further mounts a snapshot of a file and determines, based on the behavior model, which files of the mounted snapshot to transfer to a locally accessible cache. During use, the client computing device may determine whether the mounted snapshot is accessible. If the mounted snapshot is not accessible, the client computing device may selectively delete, based on the behavior model, one or more of the files stored in the locally accessible cache. If the mounted snapshot is accessible, the client computing device may use monitored user interactions with the mounted snapshot to update the one or more files of the locally accessible cache.
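
The cache policy can be sketched as follows. The sketch assumes the trained behavior model reduces to a per-file access probability and that a snapshot is a mapping from path to contents. `BehaviorModel`, `SnapshotCache`, `capacity`, and `keep_threshold` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class BehaviorModel:
    """Stand-in for a trained model: file path -> predicted access probability.

    Assumption: the trained model is reduced to fixed per-file scores.
    """
    scores: Dict[str, float]

    def score(self, path: str) -> float:
        return self.scores.get(path, 0.0)


@dataclass
class SnapshotCache:
    model: BehaviorModel
    capacity: int                # hypothetical limit on the number of prefetched files
    keep_threshold: float = 0.5  # hypothetical cutoff applied while offline
    cache: Dict[str, bytes] = field(default_factory=dict)

    def prefetch(self, snapshot: Dict[str, bytes]) -> None:
        """Copy the highest-scoring files of the mounted snapshot into the cache."""
        ranked = sorted(snapshot, key=self.model.score, reverse=True)
        for path in ranked[: self.capacity]:
            self.cache[path] = snapshot[path]

    def tick(self, snapshot: Optional[Dict[str, bytes]], touched: Set[str]) -> None:
        """Periodic check; `snapshot` is None when the mounted snapshot is unreachable."""
        if snapshot is None:
            # Snapshot inaccessible: evict cached files the model deems unlikely to be used.
            for path in [p for p in self.cache if self.model.score(p) < self.keep_threshold]:
                del self.cache[path]
        else:
            # Snapshot accessible: copy files the user interacted with into the cache.
            for path in touched.intersection(snapshot):
                self.cache[path] = snapshot[path]


model = BehaviorModel({"a.txt": 0.9, "b.txt": 0.2, "c.txt": 0.7})
snapshot = {"a.txt": b"A", "b.txt": b"B", "c.txt": b"C"}
cache = SnapshotCache(model, capacity=2)
cache.prefetch(snapshot)          # caches a.txt and c.txt
cache.tick(snapshot, {"b.txt"})   # user opened b.txt, so it is cached too
cache.tick(None, set())           # offline: b.txt (score 0.2) is evicted
print(sorted(cache.cache))        # ['a.txt', 'c.txt']
```

To keep the sketch short, capacity is enforced only at prefetch time.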

DECENTRALIZED QUERY EVALUATION FOR A DISTRIBUTED GRAPH DATABASE

The disclosed technologies are capable of decentralized query evaluation for a distributed graph database. In one technique, a query is divided into first and second sets of operations. The query comprises variables and constraints that correspond to at least two nodes and at least one edge of a graph in a graph database. The first set of operations for processing the query is assigned to multiple shards. A limit is communicated to the shards. The second set of operations for processing the query is executed. A list of completed operations is received from each shard. The lists of operations received from the shards are merged into a merged set of operations, which is used to determine whether query processing is finished. If query processing is not finished, then an updated limit is communicated to the shards; otherwise, query results are provided in response to the query.
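
The coordinator loop can be illustrated with a sketch under several assumptions. The query is a single pattern `(a)-[label]->(b)`. Each shard's "first set of operations" matches local edges up to the current limit. The "second set" deduplicates the merged results centrally. The limit doubles each round. `Shard`, `evaluate`, and the doubling policy are hypothetical, and every round restarts from scratch for simplicity.

```python
from dataclasses import dataclass
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (source node, edge label, target node)


@dataclass
class Shard:
    """Hypothetical shard holding a slice of the graph's edges."""
    edges: List[Edge]

    def run(self, label: str, limit: int) -> Tuple[List[Tuple[str, str]], bool]:
        """First set of operations: match (a)-[label]->(b), completing at most `limit`.

        Returns the completed matches and whether this shard has no work left.
        """
        matches = [(src, dst) for src, lbl, dst in self.edges if lbl == label]
        return matches[:limit], len(matches) <= limit


def evaluate(shards: List[Shard], label: str, wanted: int, limit: int = 1) -> List[Tuple[str, str]]:
    """Coordinator loop: send a limit, merge per-shard lists, widen the limit until done."""
    while True:
        merged: List[Tuple[str, str]] = []
        all_finished = True
        for shard in shards:
            completed, finished = shard.run(label, limit)
            merged.extend(completed)
            all_finished = all_finished and finished
        # Second set of operations, run centrally: deduplicate while keeping order.
        results = list(dict.fromkeys(merged))
        if all_finished or len(results) >= wanted:
            return results[:wanted]
        limit *= 2  # updated limit communicated to the shards in the next round


shards = [
    Shard([("a", "knows", "b"), ("b", "knows", "c"), ("c", "likes", "a")]),
    Shard([("d", "knows", "e"), ("e", "knows", "f"), ("f", "knows", "a")]),
]
print(evaluate(shards, "knows", wanted=2))
```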

Techniques and systems for storage and processing of operational data

A system stores data, such as sensor data or other operational data, on a plurality of storage volumes in a sequence so as to allow for interpolations or other approximations of the data using a subset of the storage volumes in response to a request for information regarding that data. For example, a plurality of devices connect to the system to provide operational data, which is then stored in a specified sequence on a specified set of volumes. In response to a request for operational information regarding some or all of the devices, the system reads at least one of the volumes, and approximates the values of the data over a specified period of time. In some embodiments, the data may be buffered prior to storage, and a jitter analyzer determines whether the incoming data is anomalous relative to a baseline, which may be determined using related data sets.
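
A sketch of one possible storage sequence follows. It assumes round-robin striping, where sample `t` goes to volume `t % n`, so that reading a single volume yields every n-th sample and the gaps can be filled by linear interpolation. The jitter check is shown as a simple k-sigma test against a baseline. The striping scheme, the interpolation method, and the names `store`, `approximate`, and `is_anomalous` are assumptions, not the patented method.

```python
import bisect
import statistics
from typing import Dict, List


def store(samples: List[float], n_volumes: int) -> List[Dict[int, float]]:
    """Write sample t to volume t % n_volumes (assumed round-robin sequence)."""
    volumes: List[Dict[int, float]] = [{} for _ in range(n_volumes)]
    for t, value in enumerate(samples):
        volumes[t % n_volumes][t] = value
    return volumes


def approximate(volume: Dict[int, float], start: int, end: int) -> List[float]:
    """Linearly interpolate values for t in [start, end] from one volume's samples.

    Times outside the stored range are clamped to the nearest stored sample.
    """
    if not volume:
        raise ValueError("volume holds no samples")
    times = sorted(volume)
    out = []
    for t in range(start, end + 1):
        i = bisect.bisect_left(times, t)
        if i < len(times) and times[i] == t:
            out.append(volume[t])
        elif i == 0:
            out.append(volume[times[0]])
        elif i == len(times):
            out.append(volume[times[-1]])
        else:
            t0, t1 = times[i - 1], times[i]
            v0, v1 = volume[t0], volume[t1]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out


def is_anomalous(value: float, baseline: List[float], k: float = 3.0) -> bool:
    """Hypothetical jitter check: flag values more than k std devs from the baseline mean."""
    mean = statistics.fmean(baseline)
    spread = statistics.pstdev(baseline)
    return abs(value - mean) > k * spread


samples = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
volumes = store(samples, 3)
print(approximate(volumes[0], 0, 6))  # reads only volume 0: samples at t = 0, 3, 6
```

For data that varies smoothly between stored samples, reading one volume out of three recovers the series closely. More volumes can be read when higher fidelity is needed.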

DETERMINING A DEGREE OF SIMILARITY OF A SUBSET OF TABULAR DATA ARRANGEMENTS TO SUBSETS OF GRAPH DATA ARRANGEMENTS AT INGESTION INTO A DATA-DRIVEN COLLABORATIVE DATASET PLATFORM
20220405292 · 2022-12-22

Various techniques are described, including evaluating ingested data that includes a dataset to identify one or more links to other datasets stored in a graph; using a similarity determination algorithm to identify a degree of similarity between datasets and thereby determine whether ingested datasets can be joined with graph-stored datasets; determining a ratio to decide whether to perform an overlap function or a coverage function; associating a subset of similarity matrices with a subset of graph data joined to the ingested dataset; and forming links, in a column of data, between the dataset of the ingested data and another dataset based on the degree of similarity.
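
The ratio-based choice between an overlap function and a coverage function might look like the sketch below. It compares the distinct values of two columns. When the columns are of comparable size (size ratio at or above a cutoff), it scores Jaccard overlap. Otherwise it scores coverage, meaning the fraction of the smaller column contained in the larger one. The cutoff, the link threshold, and the choice of Jaccard and containment are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def degree_of_similarity(ingested: Set[str], stored: Set[str],
                         ratio_cutoff: float = 0.5) -> Tuple[str, float]:
    """Pick overlap or coverage from the size ratio, then score the column pair."""
    small, large = sorted((ingested, stored), key=len)
    if not small:
        return "overlap", 0.0
    ratio = len(small) / len(large)
    shared = len(ingested & stored)
    if ratio >= ratio_cutoff:
        # Comparable sizes: Jaccard overlap of the two value sets.
        return "overlap", shared / len(ingested | stored)
    # Very different sizes: how much of the smaller column the larger one covers.
    return "coverage", shared / len(small)


def find_links(ingested_cols: Dict[str, Set[str]], stored_cols: Dict[str, Set[str]],
               threshold: float = 0.6) -> List[Tuple[str, str, str, float]]:
    """Propose column links whose similarity meets a hypothetical threshold."""
    links = []
    for name_a, vals_a in ingested_cols.items():
        for name_b, vals_b in stored_cols.items():
            method, score = degree_of_similarity(vals_a, vals_b)
            if score >= threshold:
                links.append((name_a, name_b, method, round(score, 2)))
    return links


ingested = {"state": {"CA", "NY", "TX"}, "zip": {"94105", "10001"}}
stored = {"us_states": {"CA", "NY", "TX", "WA", "FL"}, "country": {"US"}}
print(find_links(ingested, stored))
```

Choosing coverage for mismatched sizes avoids penalizing a small lookup column for being small.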

DECENTRALIZED DATA PLATFORM

Data from data sources may be processed at an edge device. The edge device may generate a local processing result, filter the data, and/or prioritize the data. Data is then transmitted from the edge device to the data platform, where it may be processed further. For example, a local processing result may be processed at the data platform, so that processing does not require all of the data from the data sources. In some examples, at least part of such data may remain at the edge device. The edge device may maintain a manifest of the data it stores. The data platform may generate an aggregated manifest from the manifests of associated edge devices, from which the location of stored data may be determined. As a result, when requested data is determined to be remote from the data platform, the data platform may redirect the request to the associated edge device.
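
Manifest aggregation and request redirection can be sketched as an inverted index. Each edge manifest is assumed to be a set of data keys, and the aggregated manifest maps each key to the edge devices that hold it. The names `aggregate_manifests` and `route_request`, and the key format, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def aggregate_manifests(manifests: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert per-edge manifests (edge id -> data keys) into data key -> edge ids."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for edge_id, keys in manifests.items():
        for key in keys:
            index[key].add(edge_id)
    return dict(index)


def route_request(key: str, platform_store: Dict[str, object],
                  aggregated: Dict[str, Set[str]]) -> Tuple[str, object]:
    """Serve locally if possible; otherwise redirect to an edge device holding the data."""
    if key in platform_store:
        return "local", platform_store[key]
    edges = aggregated.get(key)
    if edges:
        # Deterministic choice for the sketch; a real system might weigh load or proximity.
        return "redirect", sorted(edges)[0]
    raise KeyError(key)


manifests = {
    "edge-a": {"cam1/2024-01-01"},
    "edge-b": {"cam1/2024-01-01", "cam2/2024-01-01"},
}
aggregated = aggregate_manifests(manifests)
platform_store = {"daily-summary": {"events": 42}}  # local processing result kept centrally
print(route_request("cam2/2024-01-01", platform_store, aggregated))  # ('redirect', 'edge-b')
```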

Federated search of multiple sources with conflict resolution

Methods and apparatuses related to federated search of multiple sources with conflict resolution are disclosed. A method may comprise obtaining a set of data ontologies (e.g., types, properties, and links) associated with a plurality of heterogeneous data sources; receiving a selection of a graph comprising a plurality of graph nodes connected by one or more graph edges; and transforming the graph into one or more search queries across the plurality of heterogeneous data sources. A method may also comprise obtaining a first data object as a result of executing a first search query across a plurality of heterogeneous data sources; resolving, based on one or more resolution rules, at least the first data object with a repository data object; and deduplicating data associated with at least the first data object and the repository data object before storing the deduplicated data in a repository that has a particular data model.
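
The resolve-then-deduplicate step can be sketched with resolution rules expressed as tuples of property names. An incoming object resolves to a repository object when every property in some rule is present and equal in both objects. On a match, properties are merged so that existing repository values win on conflict. Both the rule format and this conflict policy are hypothetical.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, ...]


def resolves(a: Dict[str, str], b: Dict[str, str], rule: Rule) -> bool:
    """Hypothetical rule: objects match if every listed property is present and equal."""
    return all(p in a and p in b and a[p] == b[p] for p in rule)


def merge_into_repository(repo: List[Dict[str, str]], incoming: Dict[str, str],
                          rules: List[Rule]) -> List[Dict[str, str]]:
    """Resolve `incoming` against the repository, then store it without duplicates."""
    for existing in repo:
        if any(resolves(incoming, existing, rule) for rule in rules):
            # Deduplicate: keep repository values, add only properties it lacks.
            for key, value in incoming.items():
                existing.setdefault(key, value)
            return repo
    repo.append(dict(incoming))
    return repo


rules = [("type", "email")]
repo = [{"type": "Person", "email": "ann@example.com", "name": "Ann"}]
merge_into_repository(repo, {"type": "Person", "email": "ann@example.com", "phone": "555"}, rules)
merge_into_repository(repo, {"type": "Person", "email": "bob@example.com"}, rules)
print(len(repo))  # 2
```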

Distributed privacy-preserving computing on protected data

The present disclosure relates to techniques for developing artificial intelligence algorithms by distributing analytics to multiple sources of privacy-protected, harmonized data. Particularly, aspects are directed to a computer-implemented method that includes receiving an algorithm and input data requirements associated with the algorithm, identifying data assets as being available from a data host based on the input data requirements, curating the data assets within a data storage structure that is within infrastructure of the data host, and integrating the algorithm into a secure capsule computing framework. The secure capsule computing framework serves the algorithm to the data assets within the data storage structure in a secure manner that preserves privacy of the data assets and the algorithm. The computer-implemented method further includes running the data assets through the algorithm to obtain an inference.

Adaptive resource provisioning for a multi-tenant distributed event data store
11531570 · 2022-12-20

Systems and methods for adaptively provisioning a distributed event data store of a multi-tenant architecture are provided. According to one embodiment, a managed security service provider (MSSP) maintains a distributed event data store on behalf of each tenant of the MSSP. For each tenant, the MSSP periodically determines a provisioning status for a current active partition of the tenant's distributed event data store. When the determination indicates that an under-provisioning condition exists, the MSSP dynamically increases, by a first adjustment ratio, the number of resource provision units (RPUs) to be used for a new partition to be added to the tenant's partitions. Conversely, when the determination indicates that an over-provisioning condition exists, the MSSP dynamically decreases, by a second adjustment ratio, the number of RPUs to be used for subsequent partitions added to the tenant's partitions.
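
The per-tenant adjustment can be sketched as a pure function. The sketch assumes that the provisioning status is derived from utilization of the current active partition, with hypothetical high and low watermarks. It also assumes the first and second adjustment ratios are multiplicative factors. All parameter names and default values are illustrative, not taken from the source.

```python
import math
from typing import Dict, Tuple


def next_partition_rpus(current_rpus: int, utilization: float,
                        high: float = 0.8, low: float = 0.3,
                        grow_ratio: float = 1.5, shrink_ratio: float = 0.75,
                        min_rpus: int = 1) -> int:
    """RPUs for the tenant's next partition, given the active partition's utilization."""
    if utilization > high:
        # Under-provisioned: scale up by the first adjustment ratio (round up).
        return math.ceil(current_rpus * grow_ratio)
    if utilization < low:
        # Over-provisioned: scale down by the second adjustment ratio, keeping a floor.
        return max(min_rpus, math.floor(current_rpus * shrink_ratio))
    return current_rpus


# Hypothetical tenants: (RPUs of the current active partition, observed utilization).
tenants: Dict[str, Tuple[int, float]] = {"tenant-a": (4, 0.92), "tenant-b": (8, 0.12)}
plan = {name: next_partition_rpus(rpus, util) for name, (rpus, util) in tenants.items()}
print(plan)  # {'tenant-a': 6, 'tenant-b': 6}
```

Rounding up when growing guarantees that even a one-RPU partition grows. The `min_rpus` floor prevents shrinking to zero.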

VERSIONED METADATA USING VIRTUAL DATABASES

Described herein are distributed database systems including a plurality of SQL compute nodes that enable such nodes to operate with versioned metadata, even though SQL is only single-version aware. The distributed database system further includes a global logical metadata server to store and manage versions of metadata, to determine which of those versions should be visible at any given point in time, and to enable creation of a virtual database that includes the proper versions of metadata. In an aspect, a central transaction manager manages global transaction identifiers and their associated start, abort, and/or commit times, which enable determination of transaction and metadata version visibility for any point in time. In an aspect, the visible metadata is included in a virtual database that logically overlays a physical database and provides the correct version of metadata in place of the current metadata version stored in the physical database.
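
Visibility and the virtual-database overlay can be sketched as follows. The sketch assumes a metadata version is visible at time T when its creating transaction committed at or before T and did not abort, and that the overlay exposes the newest visible version of each object. `Txn`, `MetadataVersion`, `virtual_database`, and the integer timestamps are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Txn:
    """Global transaction record kept by the (hypothetical) central transaction manager."""
    start: int
    commit: Optional[int] = None
    abort: Optional[int] = None


@dataclass
class MetadataVersion:
    obj: str         # e.g. a table name
    definition: str  # e.g. the table's schema
    txn_id: int      # global transaction that created this version


def visible(version: MetadataVersion, txns: Dict[int, Txn], as_of: int) -> bool:
    t = txns[version.txn_id]
    return t.abort is None and t.commit is not None and t.commit <= as_of


def virtual_database(versions: List[MetadataVersion], txns: Dict[int, Txn],
                     as_of: int) -> Dict[str, str]:
    """Overlay: for each object, the newest version committed at or before `as_of`."""
    view: Dict[str, MetadataVersion] = {}
    for v in versions:
        if visible(v, txns, as_of):
            current = view.get(v.obj)
            if current is None or txns[v.txn_id].commit > txns[current.txn_id].commit:
                view[v.obj] = v
    return {name: v.definition for name, v in view.items()}


txns = {1: Txn(start=0, commit=5), 2: Txn(start=6, commit=10), 3: Txn(start=7, abort=9)}
versions = [
    MetadataVersion("orders", "id INT", 1),
    MetadataVersion("orders", "id INT, total DECIMAL", 2),
    MetadataVersion("orders", "id BIGINT", 3),  # created by an aborted transaction
]
print(virtual_database(versions, txns, as_of=7))   # {'orders': 'id INT'}
print(virtual_database(versions, txns, as_of=12))  # {'orders': 'id INT, total DECIMAL'}
```

A query running at time 7 therefore sees the schema as of that time. This holds even after a later transaction has changed the current schema in the physical database.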