G06F16/24547

DATA PROCESSING METHOD, APPARATUS, AND DEVICE, COMPUTER-READABLE STORAGE MEDIUM, AND COMPUTER PROGRAM PRODUCT
20230100679 · 2023-03-30 ·

This application provides a federated-learning-based data processing method, apparatus, and device, and a computer-readable storage medium. The method includes obtaining data to be processed, the data to be processed comprising multiple object identifiers and a feature value corresponding to each object identifier; binning the data to be processed based on the feature value corresponding to each object identifier to obtain a number of bins; determining multiple target identifier sets from each bin, and transmitting each target identifier set to a label-party device; receiving each piece of set label distribution information corresponding to each target identifier set from the label-party device, and determining bin label distribution information corresponding to each bin based on each piece of set label distribution information; and merging bins based on a binning policy and each piece of bin label distribution information to obtain a final binning result.
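The flow above splits work between a feature-party device (which holds feature values) and a label-party device (which returns label distributions). Below is a minimal single-process Python sketch of that flow, assuming equal-frequency initial binning and a simple merge policy that combines adjacent bins until each holds a minimum number of positive labels; the function names and the merge policy are illustrative stand-ins, not the patent's actual binning policy.

```python
from collections import Counter

def bin_by_feature(data, num_bins):
    """Feature party: split (object_id, feature_value) pairs into
    equal-frequency bins ordered by feature value."""
    ordered = sorted(data, key=lambda pair: pair[1])
    size = max(1, len(ordered) // num_bins)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

def label_distribution(ids, labels):
    """Label party: return (positives, negatives) for a set of object ids."""
    dist = Counter(labels[obj_id] for obj_id in ids)
    return dist[1], dist[0]

def merge_bins(bin_dists, min_positives):
    """Merge adjacent bins until each merged bin has at least
    min_positives positive labels (an illustrative policy)."""
    merged = []
    cur_pos, cur_neg = 0, 0
    for pos, neg in bin_dists:
        cur_pos += pos
        cur_neg += neg
        if cur_pos >= min_positives:
            merged.append((cur_pos, cur_neg))
            cur_pos, cur_neg = 0, 0
    if cur_pos or cur_neg:
        merged.append((cur_pos, cur_neg))
    return merged
```

In a real federated setting the feature party would transmit only the identifier sets, never the feature values, and the label party would return only aggregate distributions.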

BEHAVIORAL BASELINING FROM A DATA SOURCE PERSPECTIVE FOR DETECTION OF COMPROMISED USERS

A method and system are disclosed. The method and system include receiving, at a wrapper, a communication and a context associated with the communication from a client. The communication is for a data source. The wrapper includes a dispatcher and a service. The dispatcher receives the communication and is data agnostic. The method and system also include providing the context from the dispatcher to the service. In some embodiments, the method and system use the service to compare the context to a behavioral baseline for the client. The behavioral baseline incorporates a plurality of contexts previously received from the client.
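The dispatcher/service split described above can be sketched briefly. The sketch below is a toy, assuming the "context" reduces to a single number per request (e.g., rows touched) and that the baseline is a rolling window scored with a z-score test; the class names, window size, and threshold are all assumptions for illustration.

```python
from collections import deque
from statistics import mean, pstdev

class BaselineService:
    """Keeps a rolling window of contexts per client and flags a new
    context that deviates strongly from the client's baseline."""
    def __init__(self, window=100, threshold=3.0):
        self.history = {}
        self.window = window
        self.threshold = threshold

    def check(self, client_id, context):
        hist = self.history.setdefault(client_id, deque(maxlen=self.window))
        anomalous = False
        if len(hist) >= 5:
            mu, sigma = mean(hist), pstdev(hist)
            anomalous = sigma > 0 and abs(context - mu) > self.threshold * sigma
        hist.append(context)
        return anomalous

class Dispatcher:
    """Data agnostic: forwards the communication to the data source
    untouched and hands only the context to the service."""
    def __init__(self, service, data_source):
        self.service = service
        self.data_source = data_source

    def handle(self, client_id, communication, context):
        flagged = self.service.check(client_id, context)
        result = self.data_source(communication)
        return result, flagged
```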

Management of distributed computing framework components in a data fabric service system

Systems and methods are described for establishing and managing components of a distributed computing framework implemented in a data intake and query system. The distributed computing framework may include a master and a plurality of worker nodes. The master may selectively operate on a search head captain that is chosen from the search heads of the data intake and query system. The search head captain may distribute configuration information for the master and the distributed computing framework to the other search heads, which in turn, may distribute that configuration information to indexers of the data intake and query system. Worker nodes may be selectively activated for operation on the indexers based on the configuration information, and the worker nodes may additionally use the configuration information to contact the master and join the distributed computing framework. This approach may provide numerous benefits, including improved security, flexibility in the selection of worker nodes, and redundancy for failures of physical components of the data intake and query system.
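The propagation path (captain → search heads → indexers → worker activation and join) can be modeled compactly. The following Python sketch is a simplification under stated assumptions: configuration is a plain dict carrying a shared join token and a set of worker names, and all class and field names are hypothetical.

```python
class Master:
    """Runs on the search head captain; worker nodes join it by
    presenting the token from the distributed configuration."""
    def __init__(self, token):
        self.token = token
        self.workers = []

    def join(self, worker_id, token):
        if token != self.token:
            raise PermissionError("worker presented an invalid token")
        self.workers.append(worker_id)

class Indexer:
    """Activates a local worker node only if the configuration names it,
    then uses the configuration to contact the master."""
    def __init__(self, name):
        self.name = name
        self.worker_active = False

    def apply_config(self, config, master):
        if self.name in config["worker_nodes"]:
            self.worker_active = True
            master.join(self.name, config["token"])

class SearchHead:
    """Relays configuration received from the captain to its indexers."""
    def __init__(self, indexers):
        self.indexers = indexers

    def receive(self, config, master):
        for indexer in self.indexers:
            indexer.apply_config(config, master)

def distribute_config(config, search_heads, master):
    """Captain pushes configuration to the other search heads."""
    for head in search_heads:
        head.receive(config, master)
```

The token check is one way to realize the "improved security" claim: only nodes holding the distributed configuration can join the framework.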

DATA MIGRATION BY QUERY CO-EVALUATION
20230031659 · 2023-02-02 ·

Techniques are disclosed to migrate data via query co-evaluation. In various embodiments, an input data associated with a source database S and a target schema T to which the input data is to be migrated is received. A set of relational conjunctive queries from target schema T to source database S is received. Query co-evaluation is performed on the received set of relational conjunctive queries to transition data from source database S to target schema T.
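The core idea, evaluating queries written against the target schema T over the source database S to produce the migrated data, can be illustrated with a toy co-evaluation. The sketch below assumes the source is a list of row dicts and each conjunctive query is reduced to a filter predicate plus a column projection; real systems would operate on actual relational conjunctive queries, so this is only a shape-of-the-idea sketch.

```python
def co_evaluate(source, queries):
    """Populate each target table by evaluating its query (a filter
    plus a column-renaming projection) over the source rows."""
    target = {}
    for table, (predicate, projection) in queries.items():
        target[table] = [
            {col: row[src] for col, src in projection.items()}
            for row in source
            if predicate(row)
        ]
    return target
```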

Storing data and parity via a computing system
11609912 · 2023-03-21 ·

A method includes generating a plurality of parity blocks from a plurality of lines of data blocks. The plurality of lines of data blocks are stored in data sections of memory of a cluster of computing devices of the computing system by distributing storage of individual data blocks of the plurality of lines of data blocks among unique data sections of the cluster of computing devices. The plurality of parity blocks are stored in parity sections of memory of the cluster of computing devices by distributing storage of parity blocks of the plurality of parity blocks among unique parity sections of the cluster of computing devices.
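A common way to generate a parity block from a line of data blocks is a bytewise XOR, which also lets any single lost block be recovered from the parity plus the survivors. The sketch below assumes XOR parity and a simple rotating placement across device sections; the placement rule is illustrative, not the patent's distribution scheme.

```python
def parity_block(line):
    """XOR all data blocks in a line into one parity block."""
    parity = bytes(len(line[0]))
    for block in line:
        parity = bytes(a ^ b for a, b in zip(parity, block))
    return parity

def place_blocks(lines, num_devices):
    """Distribute each line's data blocks and its parity block across
    distinct sections of the device cluster (rotating placement)."""
    data_sections = {d: [] for d in range(num_devices)}
    parity_sections = {d: [] for d in range(num_devices)}
    for i, line in enumerate(lines):
        for j, block in enumerate(line):
            data_sections[(i + j) % num_devices].append(block)
        parity_sections[(i + len(line)) % num_devices].append(parity_block(line))
    return data_sections, parity_sections
```

Because XOR is its own inverse, a missing data block equals the XOR of the parity block with the remaining data blocks in its line.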

VOLUME PLACEMENT FAILURE ISOLATION AND REPORTING
20230070038 · 2023-03-09 ·

Systems, methods, and machine-readable media are disclosed for isolating and reporting a volume placement error for a request to place a volume on a storage platform. A volume placement service requests information from a database using an optimized database query to determine an optimal location to place a new volume. The database returns no results. The volume placement service deconstructs the optimized database query to extract a plurality of queries. The volume placement service iterates over the plurality of queries, combining queries in each iteration, to determine a cause for the database to return no results. Based on the results of each iterative database request, the volume placement service determines a cause for the database to return an empty result and provides an indication of that cause.
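The isolation step, re-applying the extracted sub-queries cumulatively until the result set first becomes empty, is easy to sketch. This toy version assumes each extracted query is a named filter predicate over in-memory rows; a real service would re-issue actual database queries.

```python
def find_empty_cause(rows, predicates):
    """Apply predicates cumulatively and return the name of the first
    predicate whose addition makes the result set empty (or None if
    the combined query is satisfiable)."""
    matching = rows
    for name, pred in predicates:
        matching = [r for r in matching if pred(r)]
        if not matching:
            return name
    return None
```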

PLATFORM MANAGEMENT OF INTEGRATED ACCESS OF PUBLIC AND PRIVATELY-ACCESSIBLE DATASETS UTILIZING FEDERATED QUERY GENERATION AND QUERY SCHEMA REWRITING OPTIMIZATION

Various techniques are described for platform management of integrated access of public and privately-accessible datasets utilizing federated query generation and query schema rewriting optimization, including receiving at a dataset access platform a query formatted according to a first data schema, generating a copy of the query, saving the query and the copy to a datastore, parsing the copy of the query in the first schema using an inference engine, determining whether the query comprises data associated with an access control condition associated with accessing the dataset, the access control condition being configured to indicate whether the query is permitted to access the dataset, and rewriting, using a proxy server, the copy of the query in a second schema, and optimizing the rewriting by identifying a database engine to execute the query and including other data converted into another triple associated with an attribute of the query.
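Two of the steps above, checking the access control condition and rewriting the query from the first schema into the second, can be sketched in miniature. The sketch below assumes a query is a dict of column names plus a table name and that schema rewriting is a name mapping; the real system parses full queries with an inference engine and rewrites via a proxy server, so these names and structures are illustrative only.

```python
def rewrite_query(query, column_map, allowed_columns):
    """Enforce the access control condition, then rewrite the query's
    table and column names from the first schema into the second."""
    for col in query["columns"]:
        if col not in allowed_columns:
            raise PermissionError(f"access denied for column {col}")
    return {
        "columns": [column_map[c] for c in query["columns"]],
        "table": column_map[query["table"]],
    }
```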

QUERY ANALYSIS USING A PROTECTIVE LAYER AT THE DATA SOURCE

A method and system for performing query analysis are described. The method and system include receiving a query for a data source at a wrapper. The wrapper includes a dispatcher and a service. The dispatcher receives the query and is data agnostic. The method and system also include providing the query from the dispatcher to the data source and to the service as well as analyzing the query using the service.
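The wrapper's fan-out, the dispatcher handing the query both to the data source and to the analysis service, can be sketched as follows. The analysis shown here is a toy substring check; the patent's service could apply arbitrary analysis, and all names below are assumptions.

```python
def analyze(query):
    """Toy analysis service: flag query text containing patterns a real
    service might score with richer models."""
    suspicious = ("drop table", "union select", ";--")
    return any(p in query.lower() for p in suspicious)

class QueryWrapper:
    """Dispatcher is data agnostic: it forwards the query untouched to
    the data source and separately provides it to the service."""
    def __init__(self, data_source, service):
        self.data_source = data_source
        self.service = service

    def handle(self, query):
        verdict = self.service(query)
        result = self.data_source(query)
        return result, verdict
```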

Metadata classification
11630853 · 2023-04-18 ·

Generating semantic names for a data set is described. An example method can include retrieving data from a data set, the data organized in a plurality of columns. The method may also include generating, for each of the columns, one or more candidate semantic categories, wherein each of the one or more candidate semantic categories has a corresponding probability. The method may further include creating a feature vector for each column from the one or more candidate semantic categories and the corresponding probabilities. Additionally, the method may include selecting, for each column, a column semantic category from the one or more candidate semantic categories using at least the feature vector and a trained machine learning model.
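The candidate-scoring and selection steps can be sketched in a few lines. The sketch below assumes candidate probabilities come from simple per-value detectors, and it stands in a trivial argmax for the trained machine learning model; the detector names and the stand-in selection are assumptions for illustration.

```python
def candidate_categories(column_values, detectors):
    """Score each candidate semantic category for a column as the
    fraction of values its detector matches (a toy probability)."""
    scores = {}
    for name, detect in detectors.items():
        hits = sum(1 for v in column_values if detect(v))
        scores[name] = hits / len(column_values)
    return scores

def select_category(scores, model=None):
    """The per-column feature vector is the ordered list of
    (category, probability) pairs; a trained model would map it to a
    choice. The default stand-in takes the highest-probability one."""
    vector = sorted(scores.items())
    if model is not None:
        return model(vector)
    return max(vector, key=lambda kv: kv[1])[0]
```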

Distinct value estimation for query planning
11663213 · 2023-05-30 ·

The problem of distinct value estimation has many applications, but it is particularly important in the field of database technology, where such information is utilized by query planners to generate and optimize query plans. Introduced is a novel technique for estimating the number of distinct values in a given dataset without scanning all of the values in the dataset. In an example embodiment, the introduced technique includes 1) gathering multiple intermediate probabilistic estimates based on varying samples of the dataset, 2) plotting the multiple intermediate probabilistic estimates against indications of sample size, 3) fitting a function to the plotted data points, and 4) determining an overall distinct value estimate by extrapolating the fitted function to an estimated or known total number of values in the dataset.
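The fit-and-extrapolate idea can be illustrated with one particular functional form. The sketch below assumes the distinct count grows as a saturating curve d(n) = a·n/(b+n), which becomes linear when 1/d is regressed against 1/n; the choice of curve and the closed-form least-squares fit are assumptions, not the specific function used by the introduced technique.

```python
def distinct_estimate(samples, total_n):
    """Fit d(n) = a*n/(b+n) to (sample_size, distinct_count) points by
    linear least squares on (1/n, 1/d), then extrapolate to total_n."""
    xs = [1 / n for n, d in samples]
    ys = [1 / d for n, d in samples]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    # 1/d = intercept + slope/n  =>  d(total_n) = 1/(intercept + slope/total_n)
    return 1 / (intercept + slope / total_n)
```

With points lying exactly on d(n) = 100n/(50+n), extrapolating to a total of 10,000 values recovers an estimate just under the true asymptote of 100 distinct values.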