G06F16/24556

Method and apparatus for processing dataset

The present disclosure discloses a method and apparatus for processing a dataset. The method includes: obtaining, from multiple text blocks provided by a target user, a first text set whose texts meet a preset similarity matching condition with a target text; obtaining a second text set from the first text set, in which no text belongs to the same text block as the target text; generating a negative sample set of the target text based on the content of the candidate text block to which each text in the second text set belongs; generating a positive sample set of the target text based on the content of the target text block to which the target text belongs; and generating a dataset of the target user based on the negative sample set and the positive sample set, and training a matching model based on the dataset.
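As an illustration, the sample-generation steps can be sketched in Python. This is a minimal sketch under stated assumptions, not the disclosed implementation. The "preset similarity matching condition" is modeled as token-level Jaccard similarity above a hypothetical `threshold`. Positives are taken to be the other texts in the target's own block. Negatives are every text in a candidate block, meaning a block that contains a similar text but is not the target's block. The function names `jaccard` and `build_dataset` are hypothetical.

```python
from typing import Dict, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity; a hypothetical stand-in for the
    'preset similarity matching condition'."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def build_dataset(blocks: Dict[str, List[str]], target_block: str, target: str,
                  threshold: float = 0.3) -> List[Tuple[str, str, int]]:
    """Return (target, text, label) triples: label 1 = positive, 0 = negative."""
    # First text set: texts other than the target that are similar enough to it.
    first = {(bid, t) for bid, texts in blocks.items() for t in texts
             if t != target and jaccard(t, target) >= threshold}
    # Second text set: similar texts that sit in a different block (hard negatives).
    second = {(bid, t) for bid, t in first if bid != target_block}
    # Negatives: the content of every candidate block a similar text came from.
    candidate_blocks = {bid for bid, _ in second}
    negatives = {t for bid in candidate_blocks for t in blocks[bid]}
    # Positives: the other texts in the target's own block.
    positives = {t for t in blocks[target_block] if t != target}
    return ([(target, p, 1) for p in sorted(positives)]
            + [(target, n, 0) for n in sorted(negatives)])
```

Negatives drawn this way are texts that look similar to the target but were filed under another block. This is presumably why they are more informative for a matching model than randomly drawn texts.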

SYSTEM AND METHOD FOR DISJUNCTIVE JOINS USING A LOOKUP TABLE

Joining data with a disjunctive operator by means of lookup tables is described. An example computer-implemented method can include receiving a query with a set of conjunctive predicates and a set of disjunctive predicates. The method may also include generating a lookup table for each predicate in the sets of conjunctive and disjunctive predicates. For each row in a probe-side table, the method may further include looking up a value associated with that row in each of the lookup tables and adding the row to a results set when there is a match. Additionally, the method may include returning the results set.
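The lookup-table join can be sketched in Python, assuming every predicate is an equality between one build-side column and one probe-side column. Each lookup table maps a build-side value to the set of build rows holding it. A build row matches a probe row only if it satisfies every conjunctive predicate and at least one disjunctive predicate. The names `build_lookup` and `disjunctive_join` and the `(build_column, probe_column)` predicate encoding are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, object]
Pred = Tuple[str, str]  # hypothetical encoding: (build_column, probe_column) equality


def build_lookup(build: Sequence[Row], column: str) -> Dict[object, Set[int]]:
    """Hash lookup table: build-side value -> indices of build rows holding it."""
    table: Dict[object, Set[int]] = defaultdict(set)
    for i, row in enumerate(build):
        table[row[column]].add(i)
    return table


def disjunctive_join(build: Sequence[Row], probe: Sequence[Row],
                     conjunctive: List[Pred], disjunctive: List[Pred]) -> List[Tuple[Row, Row]]:
    # One lookup table per predicate, built once up front.
    conj_tables = [(build_lookup(build, b), p) for b, p in conjunctive]
    disj_tables = [(build_lookup(build, b), p) for b, p in disjunctive]
    results: List[Tuple[Row, Row]] = []
    for row in probe:
        # Conjunctive predicates: a build row must match every one (set intersection).
        candidates = set(range(len(build)))
        for table, col in conj_tables:
            candidates &= table.get(row[col], set())
        # Disjunctive predicates: a build row must match at least one (set union).
        if disj_tables:
            any_match: Set[int] = set()
            for table, col in disj_tables:
                any_match |= table.get(row[col], set())
            candidates &= any_match
        for i in sorted(candidates):
            results.append((row, build[i]))
    return results
```

Because each disjunct gets its own hash table, a query such as `region = region AND (email = email OR phone = phone)` can still be answered with hash lookups rather than a nested-loop join, which an OR condition otherwise tends to force.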

METHODS AND APPARATUS TO ESTIMATE AUDIENCE SIZES OF MEDIA USING DEDUPLICATION BASED ON VECTOR OF COUNTS SKETCH DATA

Methods and apparatus to estimate audience sizes of media using deduplication based on vector of counts sketch data are disclosed. A system includes hardware circuitry to instantiate: coefficient analyzer circuitry to determine coefficient values of a polynomial based on (i) variances in values in a first vector of counts and a second vector of counts, (ii) a first cardinality of the first vector of counts, and (iii) a second cardinality of the second vector of counts; overlap analyzer circuitry to determine a real root of the polynomial and, based on it, estimate a quantity of second subscribers that are duplicates of first subscribers; and report generator circuitry to estimate a deduplicated audience size based on (i) the estimated quantity of second subscribers that are duplicates of the first subscribers and (ii) the first and second cardinalities. The system also includes communication circuitry to transmit a network communication to a third-party entity, the network communication including a report based on the deduplicated audience size.
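The abstract does not give the formula linking the variances and cardinalities to the polynomial coefficients, so the sketch below takes the coefficients as an input. It shows only the last two steps. First, a real root is found in the feasible overlap range `[0, min(n1, n2)]`. That root is then used as the duplicate count in the union formula n1 + n2 − overlap. Bisection is a swapped-in root finder, and the sketch assumes the polynomial changes sign on that range. The function names are hypothetical.

```python
from typing import Sequence


def poly_eval(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a polynomial (coefficients highest degree first) by Horner's rule."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc


def real_root_in(coeffs: Sequence[float], lo: float, hi: float,
                 tol: float = 1e-9, max_iter: int = 200) -> float:
    """Bisection for a real root in [lo, hi]; assumes a sign change on the interval."""
    f_lo, f_hi = poly_eval(coeffs, lo), poly_eval(coeffs, hi)
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("no sign change on [lo, hi]")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        f_mid = poly_eval(coeffs, mid)
        if f_mid == 0 or hi - lo < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                  # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid     # root lies in the upper half
    return (lo + hi) / 2


def deduplicated_audience(coeffs: Sequence[float], n1: int, n2: int) -> float:
    """Union size = n1 + n2 - estimated overlap (the real root in [0, min(n1, n2)])."""
    overlap = real_root_in(coeffs, 0.0, float(min(n1, n2)))
    return n1 + n2 - overlap
```

For example, with hypothetical coefficients `[1, -250, -15000]`, which give (x − 300)(x + 50), and cardinalities 1000 and 500, the only root in `[0, 500]` is 300. The deduplicated audience is then about 1200.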

Method for Dynamic Resource Scheduling of Programmable Dataplanes for Network Telemetry

A method for network dataplane telemetry includes: a) receiving telemetry queries, where each query includes a requested network telemetry task and an associated query result accuracy weight and query result latency weight; b) every epoch, scheduling the telemetry queries to produce a schedule that associates each sub-epoch of the epoch with a subset of the telemetry queries; c) every sub-epoch, reprogramming a programmable dataplane device to execute the scheduled telemetry queries associated with that sub-epoch; d) every sub-epoch, collecting and aggregating intermediate query results from the programmable dataplane device; e) every epoch, returning aggregated results of completed queries; wherein scheduling the telemetry queries uses a multi-objective optimization with multiple objective functions, weighted by the query result accuracy and query result latency weights, to balance the resource requirements of the runtime programmable network switch, query result accuracy, and query result latency.
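The sketch below illustrates step b) with a greedy weighted-sum scalarization, which stands in for the multi-objective optimizer. The abstract does not say how the optimization is solved. Assumptions, all hypothetical: each query occupies `cost` switch resource units in any sub-epoch it runs in, and each sub-epoch has a fixed `capacity`. The accuracy objective has diminishing returns in the number of runs. The latency objective favors earlier sub-epochs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TelemetryQuery:
    name: str
    cost: int               # hypothetical: switch resource units while installed
    accuracy_weight: float
    latency_weight: float


def schedule_epoch(queries: List[TelemetryQuery], n_subepochs: int,
                   capacity: int) -> List[List[str]]:
    """Greedy weighted-sum schedule: one list of query names per sub-epoch."""
    runs: Dict[str, int] = {q.name: 0 for q in queries}
    schedule: List[List[str]] = []
    for s in range(n_subepochs):
        lateness = s / n_subepochs  # 0.0 for the first sub-epoch

        def utility(q: TelemetryQuery) -> float:
            acc = q.accuracy_weight / (1 + runs[q.name])   # diminishing accuracy gain
            lat = q.latency_weight * (1 - lateness)         # earlier = lower latency
            return acc + lat

        # Highest utility first; ties broken by name for determinism.
        ranked = sorted(queries, key=lambda q: (-utility(q), q.name))
        used, chosen = 0, []
        for q in ranked:
            if used + q.cost <= capacity:   # respect the switch resource budget
                chosen.append(q.name)
                used += q.cost
        for name in chosen:
            runs[name] += 1
        schedule.append(chosen)
    return schedule
```

With room for one query per sub-epoch, a latency-sensitive query is placed first. Accuracy-weighted queries fill later sub-epochs, and the diminishing-return term spreads the slots across queries. At run time, steps c) and d) would install `schedule[s]` on the device and collect its results once per sub-epoch.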

Enabling Real-Time Integration of Up-To-Date Siloed Data
20220327129 · 2022-10-13

A system and a method are disclosed for receiving, via a user interface, user input of a first parameter and a second parameter. The system identifies aggregations corresponding to the parameters; each aggregation is updated based on input from its own set of machines, and the aggregations are siloed with respect to one another. The system transmits a first query to the first aggregation, which corresponds to the first parameter, and receives a first response comprising first data. It then transmits a second query to the second aggregation, which corresponds to the second parameter, and receives a second response comprising second data. The system integrates the first data and the second data into integrated data and provides a representation of the integrated data for display via the user interface.
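A minimal sketch of the query-and-integrate flow follows. Each siloed aggregation is modeled as a hypothetical callable that takes a parameter and returns values keyed by a shared entity key, such as a site name. Querying both silos concurrently and joining on the shared key are assumptions, not details from the abstract. `integrate` and `Aggregation` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional

# Hypothetical model of a siloed aggregation: parameter -> {entity_key: value}.
Aggregation = Callable[[str], Dict[str, float]]


def integrate(param1: str, param2: str,
              registry: Dict[str, Aggregation]) -> Dict[str, Dict[str, Optional[float]]]:
    # Identify the aggregation that corresponds to each parameter.
    agg1, agg2 = registry[param1], registry[param2]
    # Query both silos at the same time so neither response waits on the other.
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1, f2 = pool.submit(agg1, param1), pool.submit(agg2, param2)
        first, second = f1.result(), f2.result()
    # Integrate: one row per entity key, None where a silo has no value.
    return {key: {param1: first.get(key), param2: second.get(key)}
            for key in sorted(set(first) | set(second))}
```

The returned mapping is one possible "representation of the integrated data" that a user interface could render as a table.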

DENSE RETRIEVAL OF DOCUMENT TEMPLATES

A system and method are provided for supporting dense retrieval of a template (e.g., a document template) for responding to a query or other textual input. The templates, and the past queries that were responded to using them, are stored. A machine-learning model that matches a new query to the most appropriate template is trained using a selected subset of the stored queries as training queries. For each of one or more training batches or phases, multiple stored templates are selected (e.g., randomly). Then, from among all training queries for which the selected templates were used, the same number of queries is selected (e.g., randomly), so that the selected queries represent the distribution of the training queries among the selected templates. A unique loss function is computed that leverages similarities and differences not only between each selected training query and each selected template, but also among different queries and among different templates.
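One reading of the batch-selection step is sketched below: pick `n_templates` templates at random, then draw the same number of queries uniformly from the pooled queries of those templates. Under this sampling, a template used more often tends to contribute more queries, which reflects the query distribution among the selected templates. `sample_batch` and `queries_by_template` are hypothetical names, and the loss computation is not reproduced.

```python
import random
from typing import Dict, List, Tuple


def sample_batch(queries_by_template: Dict[str, List[str]], n_templates: int,
                 rng: random.Random) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (selected templates, [(query, its template), ...]) for one batch."""
    # Select templates at random (sorted first so a seeded rng is reproducible).
    templates = rng.sample(sorted(queries_by_template), n_templates)
    # Pool every stored training query that was answered with a selected template.
    pool = [(q, t) for t in templates for q in queries_by_template[t]]
    # Draw as many queries as templates, uniformly from the pool, so busier
    # templates are proportionally more likely to appear.
    picked = rng.sample(pool, min(n_templates, len(pool)))
    return templates, picked
```

A batch built this way would yield query-to-template, query-to-query, and template-to-template pairs, which are the three kinds of similarity the abstract says the loss draws on.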

Browser-based aggregation

A system and method for aggregating account data are disclosed; more specifically, a system and method for aggregating financial account data that provides enhanced privacy and security protections by enabling the user to maintain custody of his or her login credentials. A syncing agent, in coordination with a system add-on, handles log-in to a remote system and the storage of session information. The syncing agent uses the session information to retrieve additional information on behalf of the user or to perform other tasks on the remote server.

Read-time relevance-based unseen notification handling

Technologies for unseen notification handling are described. Embodiments select an initial set of notifications, provide the selected initial set of notifications to a client device, store seen notifications in a first data store, maintain sent but unseen notifications in a second data store that is an in-memory online data store, retrieve a set of the sent but unseen notifications from the second data store, create a list of unseen notifications by combining the retrieved set of sent but unseen notifications with a set of unsent and unseen notifications, generate a new set of relevance scores for the list of unseen notifications, create a new version of the list of unseen notifications based on the new set of relevance scores, and provide the new version of the list of unseen notifications to the client device.
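The read-time flow can be sketched with plain in-memory dictionaries in place of both data stores. In a real deployment the seen store would presumably be durable. The relevance model is a caller-supplied `score` function, and `NotificationStores` and its methods are hypothetical names. Scores are recomputed on every read, so a notification sent earlier can move up or down the list before the user sees it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class Notification:
    id: str
    text: str


class NotificationStores:
    def __init__(self) -> None:
        self.seen: Dict[str, Set[str]] = {}                          # first data store
        self.sent_unseen: Dict[str, Dict[str, Notification]] = {}    # second, in-memory store

    def record_sent(self, user: str, notes: List[Notification]) -> None:
        bucket = self.sent_unseen.setdefault(user, {})
        for n in notes:
            bucket[n.id] = n

    def mark_seen(self, user: str, ids: List[str]) -> None:
        # Move notifications from the in-memory unseen store to the seen store.
        self.seen.setdefault(user, set()).update(ids)
        bucket = self.sent_unseen.get(user, {})
        for i in ids:
            bucket.pop(i, None)

    def rerank_unseen(self, user: str, unsent: List[Notification],
                      score: Callable[[Notification], float], limit: int) -> List[Notification]:
        # Combine sent-but-unseen with unsent-and-unseen notifications.
        combined = dict(self.sent_unseen.get(user, {}))
        seen = self.seen.get(user, set())
        for n in unsent:
            if n.id not in seen:
                combined.setdefault(n.id, n)
        # Fresh relevance scores at read time, then a new ranked version of the list.
        scores = {nid: score(n) for nid, n in combined.items()}
        ranked = sorted(combined.values(), key=lambda n: (-scores[n.id], n.id))
        return ranked[:limit]
```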

SYSTEM AND METHOD FOR PRIVACY-PRESERVING ANALYTICS ON DISPARATE DATA SETS

A system and method are disclosed for using k-anonymous groups to analyze disparate data sets through either individual-to-segment or segment-to-segment matching, using modelling or querying approaches. The system and method include creating a common representation across all consumer and producer data sets; training one or more models, or defining one or more queries, optimized to recognize the behavior of the specified subjects within the generated common representation; evaluating those models or executing those queries on the common representation of the producer data set(s) to identify likely candidates for the specified input data subjects in each producer data set; performing actions on the identified subjects for each producer data set; and outputting the analytics result.
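The sketch below shows segment-to-segment matching under a k-anonymity constraint, using the querying approach. Several pieces are hypothetical. The common representation `to_common` maps each record to a 10-year age band and an upper-cased region. Any segment with fewer than `k` members on either side is suppressed before the two sides are matched. The attribute names `age` and `region` are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Segment = Tuple[str, str]


def to_common(record: Dict[str, object]) -> Segment:
    """Hypothetical common representation: 10-year age band plus region."""
    low = int(record["age"]) // 10 * 10
    return f"{low}-{low + 9}", str(record["region"]).upper()


def k_anonymous_counts(records: Iterable[Dict[str, object]], k: int) -> Dict[Segment, int]:
    # Count members per segment and drop segments too small to be k-anonymous.
    counts = Counter(to_common(r) for r in records)
    return {seg: n for seg, n in counts.items() if n >= k}


def segment_match(consumer: Iterable[Dict[str, object]],
                  producer: Iterable[Dict[str, object]],
                  k: int) -> Dict[Segment, Tuple[int, int]]:
    """Segments present (with >= k members) in both data sets, with both counts."""
    c = k_anonymous_counts(consumer, k)
    p = k_anonymous_counts(producer, k)
    return {seg: (c[seg], p[seg]) for seg in sorted(c.keys() & p.keys())}
```

Because only segments with at least `k` members are ever matched, no output row can be traced back to a group smaller than `k` in either data set.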

Aggregation operator optimization during query runtime

The subject technology provides information corresponding to properties of a build side of a join operation to a bloom filter. Based at least in part on the information from the bloom filter, the subject technology determines, during execution of a query plan, at least one property of the join operation, including at least a reduction rate, to decide whether to switch an aggregation operator to a pass-through mode. In response to the reduction rate being below a threshold value, the subject technology switches the aggregation operator to the pass-through mode during runtime of the query plan. While the aggregation operator is in the pass-through mode, an input stream of data goes through the aggregation operator without being analyzed, and the input stream of data matches the output stream of data flowing out of the aggregation operator.
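The runtime switch can be sketched as a streaming partial (pre-)aggregation in Python. The sketch assumes a downstream final aggregation re-combines partial groups. Under that assumption, flushing the groups built so far and then forwarding the remaining rows unchanged does not change the query result. The reduction-rate estimate, which the abstract derives from bloom filter information, is modeled as a hypothetical `current_rate` callable polled every `check_every` rows.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Iterable, Iterator, Tuple

Row = Tuple[str, int]  # (group key, value)


def adaptive_partial_sum(stream: Iterable[Row], current_rate: Callable[[], float],
                         threshold: float, check_every: int = 1000) -> Iterator[Row]:
    """Partial SUM per key that switches to pass-through when the reduction rate drops."""
    sums: DefaultDict[str, int] = defaultdict(int)
    it = iter(stream)
    for i, (key, value) in enumerate(it, start=1):
        sums[key] += value
        # Periodically consult the (hypothetical) bloom-filter-based estimate.
        if i % check_every == 0 and current_rate() < threshold:
            yield from sums.items()   # flush the groups aggregated so far
            yield from it             # pass-through: remaining rows leave unchanged
            return
    yield from sums.items()           # never switched: emit the fully aggregated groups
```

Switching on a low reduction rate avoids paying hash-table costs when few rows would be collapsed anyway. The downstream final aggregation still produces the same totals.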