G06F16/2237

TOP CONTRIBUTOR RECOMMENDATION FOR CLOUD ANALYTICS

A system and method including determining, for a specified target measure column of a first dataset including a plurality of records, the metadata of the first dataset, including a probability distribution for the specified target column and dimension scores for the dimensions for the first dataset conditioned on the specified target measure column, where the first dataset comprises a plurality of columns including the at least one target measure column and a plurality of non-numeric, dimension columns for the records of the first dataset; determining, for a subset of data of the first dataset based on one or more specified variables, dimension scores for the dimensions of the subset of data approximately derived from the determined metadata of the first dataset; and providing recommendations of top contributors based on the approximated dimension scores of dimensions of the subset of data.

IDENTIFICATION OF MATCHED SEGMENTED IN PAIRED DATASETS
20220382730 · 2022-12-01 ·

Disclosed herein are processes that identify segments of a target dataset that match segments of other datasets in a database. A computing server may encode the target dataset to generate a pair of encoded target bitmap sequences based on an encoding scheme. The encoding scheme defines encoding values based on homogeneity between the pair of data value sequences. The computing server may compare the pair of encoded target bitmap sequences with other pairs of encoded bitmap sequences to identify homogeneous mismatched locations. A homogeneous mismatched location may be a location where the target dataset and the other dataset in comparison are both homogeneous but have different types of homogeneity at the location. The computing server may identify a matched segment between the target dataset and one of the other datasets based on the homogeneous mismatched locations identified. The matched segment is contained within two homogeneous mismatched locations.
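
The comparison step can be illustrated with a minimal sketch. It does not reproduce the bitmap encoding of the disclosure; it assumes a hypothetical three-value code per location (0 = not homogeneous, 1 and 2 = two types of homogeneity), and the function names `encode`, `homogeneous_mismatches`, and `matched_segments` are illustrative only. Treating the ends of the sequence as segment boundaries is also an assumption.

```python
from typing import List, Tuple

# Hypothetical encoding values (assumptions, not taken from the abstract):
# 0 = the two values at a location differ (not homogeneous),
# 1 = both values are "A", 2 = both values are anything else.
def encode(seq_a: str, seq_b: str) -> List[int]:
    """Encode a pair of data value sequences by homogeneity at each location."""
    codes = []
    for a, b in zip(seq_a, seq_b):
        if a != b:
            codes.append(0)
        elif a == "A":
            codes.append(1)
        else:
            codes.append(2)
    return codes

def homogeneous_mismatches(target: List[int], other: List[int]) -> List[int]:
    """Locations where both datasets are homogeneous but of different types."""
    return [
        i for i, (t, o) in enumerate(zip(target, other))
        if t != 0 and o != 0 and t != o
    ]

def matched_segments(
    target: List[int], other: List[int], min_length: int = 1
) -> List[Tuple[int, int]]:
    """Half-open (start, end) ranges lying between consecutive mismatches.

    The sequence ends are treated as boundaries as well (an assumption).
    """
    n = min(len(target), len(other))
    bounds = [-1] + homogeneous_mismatches(target, other) + [n]
    segments = []
    for left, right in zip(bounds, bounds[1:]):
        start, end = left + 1, right
        if end - start >= min_length:
            segments.append((start, end))
    return segments
```

For example, with `target = [1, 1, 2, 0, 2, 1, 1]` and `other = [1, 2, 2, 1, 1, 1, 0]`, the mismatched locations are 1 and 4, and `matched_segments(target, other, min_length=2)` returns `[(2, 4), (5, 7)]`.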

PROJECTION-BASED TECHNIQUES FOR UPDATING SINGULAR VALUE DECOMPOSITION IN EVOLVING DATASETS

A system, method, and computer program product are disclosed. The method includes loading a first set of data as an initial matrix and determining a truncated singular value decomposition (SVD) of the initial matrix. The method also includes loading a second set of data as a new matrix, generating a first projection matrix, which approximates k leading left singular vectors of an updated matrix, and generating a second projection matrix, which approximates k leading right singular vectors of the updated matrix. Further, the method includes determining, based on the initial matrix, the new matrix, the SVD of the initial matrix, and the first or second projection matrix, an approximate truncated SVD of the updated matrix.
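
One common way to write a projection-based update of this kind is sketched below. The symbols are assumptions, not taken from the abstract: A is the initial matrix, E the new matrix (taken here to be appended rows), and Z and W are the first and second projection matrices, assumed to have orthonormal columns. The abstract allows either projection matrix to be used alone; this sketch uses both.

```latex
% Assumption: rows are appended, so the updated matrix stacks A on top of E.
A' = \begin{pmatrix} A \\ E \end{pmatrix}, \qquad
Z^{\top} A' W = F \,\Theta\, G^{\top}
\quad \text{(SVD of the small projected matrix)},
\\
% Keep the k leading triplets of the small SVD and map them back.
\hat{U}_k = Z F_k, \qquad \hat{\Sigma}_k = \Theta_k, \qquad \hat{V}_k = W G_k,
\qquad A' \approx \hat{U}_k \hat{\Sigma}_k \hat{V}_k^{\top}.
```

The approximation is only as good as the subspaces spanned by Z and W: the closer they are to the true leading singular subspaces of the updated matrix, the smaller the error.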

Learning system for pangenetic-based recommendations

An embodiment may involve storing, by a computing device and in a database, a set of pangenetic attributes of a set of individuals, wherein the pangenetic attributes of the set are respectively and statistically associated with products; based on the statistical associations between the pangenetic attributes and the products, determining, by the computing device, product recommendations for a second set of individuals; receiving, by the computing device and from the second set of individuals, a plurality of measures of satisfaction with the product recommendations; based on the plurality of measures of satisfaction, learning, by the computing device, an association between a subset of the pangenetic attributes and a particular product; and storing, by the computing device and in the database, the learned association, wherein the learned association provides a basis for subsequent recommendations of the particular product when a subsequent individual exhibits the subset of the pangenetic attributes.

Data governance with custom attribute based asset association

A computer-implemented method includes: reading a vector of a first table in a database, the vector including counts of a plurality of keywords in the first table, the plurality of keywords including a first keyword and a second keyword; determining a first custom attribute describing the first table, the first custom attribute having a vector including counts of at least a first portion of the plurality of keywords in the first table; determining a multiplier of the first custom attribute, the multiplier being a number of other tables that reference the first custom attribute; and revising the vector of the first table based on the first custom attribute.

SECURE MULTI-PARTY REACH AND FREQUENCY ESTIMATION

Systems and methods for generating min-increment counting bloom filters to determine count and frequency of device identifiers and attributes in a networking environment are disclosed. The system can maintain a set of data records including device identifiers and attributes associated with devices in a network. The system can generate a vector comprising coordinates corresponding to counter registers. The system can identify hash functions to update a counting bloom filter. The system can hash the data records to extract index values pointing to a set of counter registers. The system can increment the positions in the min-increment counting bloom filter corresponding to the minimum values of the counter registers. The system can obtain an aggregated public key comprising a public key. The system can encrypt the counter registers using the aggregated public key to generate an encrypted vector. The system can transmit the encrypted vector to a networked worker computing device.
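
The min-increment update rule can be sketched as follows. This is a minimal illustration, not the disclosed system: the class name, the use of salted SHA-256 digests as the hash functions, and the parameter names are all assumptions, and the encryption and transmission steps are omitted.

```python
import hashlib
from typing import List

class MinIncrementCountingBloomFilter:
    """Counting Bloom filter that increments only the minimum-valued registers."""

    def __init__(self, num_registers: int, num_hashes: int) -> None:
        self.registers = [0] * num_registers
        self.num_hashes = num_hashes

    def _indices(self, item: str) -> List[int]:
        # Assumption: salted SHA-256 digests stand in for the hash functions.
        indices = []
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            indices.append(int.from_bytes(digest[:8], "big") % len(self.registers))
        return indices

    def add(self, item: str) -> None:
        # Deduplicate so a register hit twice by one item is incremented once.
        idx = set(self._indices(item))
        low = min(self.registers[i] for i in idx)
        for i in idx:
            if self.registers[i] == low:
                self.registers[i] += 1

    def estimate(self, item: str) -> int:
        # The smallest register an item maps to never undercounts that item.
        return min(self.registers[i] for i in self._indices(item))
```

Incrementing only the registers that currently hold the minimum keeps the overcount caused by hash collisions lower than incrementing every register an item maps to, while `estimate` still never falls below the true frequency.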

TECHNIQUES FOR GENERATING AND PROCESSING HIERARCHICAL REPRESENTATIONS OF SPARSE MATRICES
20220374403 · 2022-11-24 ·

One embodiment sets forth a technique for generating a tree structure within a computer memory for storing sparse data. The technique includes dividing a matrix into a first plurality of equally sized regions. The technique also includes dividing at least one region in the first plurality of regions into a second plurality of regions, where the second plurality of regions includes a first region and one or more second regions that have a substantially equal number of nonzero matrix values and are formed within the first region. The technique further includes creating the tree structure within the computer memory by generating a first plurality of nodes representing the first plurality of regions, generating a second plurality of nodes representing the second plurality of regions, and grouping, under a first node representing the first region, one or more second nodes representing the one or more second regions.

Data Storage Using Roaring Binary-Tree Format
20220374404 · 2022-11-24 ·

Techniques are disclosed relating to managing virtual data sources (VDSs), including creating and using VDSs. A virtual data source manager (VDSM) that is executing on a computer system may receive a request to generate a bitmap index for a dataset. The VDSM may then generate a bitmap index by ingesting the dataset into a data format of the bitmap index. The VDSM may further generate the bitmap index by performing a compression procedure on the ingested dataset to generate a plurality of data containers, where a given data container includes a respective compressed portion of the ingested dataset. After compressing the ingested dataset, the VDSM may then store the plurality of data containers in a set of binary trees (b-trees), where the set of b-trees is usable to respond to data requests for data of the bitmap index.
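
The container step can be sketched under the conventions of the roaring bitmap format, which the title suggests but the abstract does not spell out: values are split on their high 16 bits, and each chunk is held as a sorted array when it has at most 4096 entries and as a bitmap otherwise. A plain dictionary stands in for the b-tree storage here, and all names are illustrative.

```python
from bisect import bisect_left
from typing import Dict, Iterable, List, Set, Union

# Roaring convention: above this cardinality a 65536-bit bitmap is more
# compact than an array of 16-bit values.
ARRAY_LIMIT = 4096

# A container is either a sorted list of low 16-bit values or a Python int
# used as a 65536-bit bitmap.
Container = Union[List[int], int]

def build_containers(values: Iterable[int]) -> Dict[int, Container]:
    """Group non-negative integers by their high 16 bits and compress each group."""
    groups: Dict[int, Set[int]] = {}
    for v in values:
        groups.setdefault(v >> 16, set()).add(v & 0xFFFF)
    containers: Dict[int, Container] = {}
    for key, lows in groups.items():
        if len(lows) <= ARRAY_LIMIT:
            containers[key] = sorted(lows)
        else:
            bits = 0
            for low in lows:
                bits |= 1 << low
            containers[key] = bits
    return containers

def contains(containers: Dict[int, Container], value: int) -> bool:
    """Answer a membership request from the stored containers."""
    container = containers.get(value >> 16)
    if container is None:
        return False
    low = value & 0xFFFF
    if isinstance(container, list):
        i = bisect_left(container, low)
        return i < len(container) and container[i] == low
    return (container >> low) & 1 == 1
```

In a fuller implementation the dictionary would be replaced by an ordered structure keyed on the chunk key, so that range requests can walk neighbouring containers in order.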

STRUCTURAL DATA MATCHING USING NEURAL NETWORK ENCODERS
20230059579 · 2023-02-23 ·

Implementations of the present disclosure include methods, systems, and computer-readable storage media for: receiving first and second data sets, both the first and second data sets including structured data in a plurality of columns; for each of the first data set and the second data set, inputting each column into an encoder specific to a column type of a respective column, the encoder providing encoded data for the first data set and the second data set, respectively; providing a first multi-dimensional vector based on encoded data of the first data set; providing a second multi-dimensional vector based on encoded data of the second data set; and outputting the first multi-dimensional vector and the second multi-dimensional vector to a loss function, the loss function processing the first multi-dimensional vector and the second multi-dimensional vector to provide an output, the output representing matched data points between the first and second data sets.

Aggregating metrics in distributed file systems

Embodiments are directed to managing file systems over a network. A hierarchical index may be provided based on a file system and a plurality of objects stored in the file system. A token index may be generated based on the hierarchical index. Each token may be a portion of the path of the objects. Metric indices may be generated based on the hierarchical index and a plurality of metrics associated with the objects such that the metric indices include one or more rows that correspond to a place position for a metric value. The token index and the metric indices may be employed to generate query results based on the plurality of metrics associated with the objects.
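
One way to read the "place position" rows is as a bit-sliced index, sketched below. This interpretation is an assumption: each row is taken to be a bitmap of the objects whose metric value has a given binary digit set, and each token maps to a bitmap of the objects whose path contains it. The function names are hypothetical, and the hierarchical index itself is not modeled.

```python
from typing import Dict, List

def build_token_index(paths: List[str]) -> Dict[str, int]:
    """Map each path component to a bitmap (Python int) of object ids."""
    index: Dict[str, int] = {}
    for obj_id, path in enumerate(paths):
        for token in filter(None, path.split("/")):
            index[token] = index.get(token, 0) | (1 << obj_id)
    return index

def build_metric_index(values: List[int]) -> List[int]:
    """Row p is a bitmap of the objects whose value has binary digit p set."""
    rows: List[int] = []
    for obj_id, value in enumerate(values):
        if value < 0:
            raise ValueError("metric values are assumed to be non-negative")
        place = 0
        while value:
            if value & 1:
                while len(rows) <= place:
                    rows.append(0)
                rows[place] |= 1 << obj_id
            value >>= 1
            place += 1
    return rows

def sum_metric(token_index: Dict[str, int], metric_rows: List[int], token: str) -> int:
    """Sum the metric over all objects whose path contains the token."""
    selected = token_index.get(token, 0)
    # Each row contributes (number of selected objects with that digit) * 2**place.
    return sum(
        bin(row & selected).count("1") << place
        for place, row in enumerate(metric_rows)
    )
```

With paths `/home/a/x.txt`, `/home/a/y.txt`, and `/home/b/z.txt` and sizes 5, 3, and 10, `sum_metric` for the token `a` returns 8 without visiting the individual objects. Matching a token at any depth of the path is a simplification of this sketch.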