G06F16/3335

ANONYMIZATION OF QUERY INFORMATION WHILE RETAINING QUERY STRUCTURE AND SIZING INFORMATION
20230367768 · 2023-11-16

When analyzing the operation of an online system, such as an online concierge system, that receives queries from users and returns results in response to those queries, the queries received by the online system may include different numbers of terms. This variation in the number of terms makes it difficult to group queries efficiently, which leads to inefficient query-by-query analysis when different queries are analyzed based on their structure. To analyze different queries more efficiently, the online system groups queries based on the number of terms in each query. In some embodiments, queries are grouped based on a logarithmic scale of the number of terms in the query, while in other embodiments queries are grouped based on a square root of the number of terms in the query.
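The two grouping schemes can be illustrated with a minimal Python sketch. The function names, the whitespace tokenization, the base-2 logarithm, and the use of floor rounding are all assumptions made for illustration; the abstract does not specify them.

```python
import math
from collections import defaultdict


def term_count(query: str) -> int:
    """Count whitespace-separated terms (an assumed tokenization)."""
    return len(query.split())


def log_bucket(n_terms: int) -> int:
    """floor(log2(n_terms)) for n_terms >= 1; queries with 0 or 1 terms share bucket 0."""
    return max(n_terms, 1).bit_length() - 1


def sqrt_bucket(n_terms: int) -> int:
    """floor(sqrt(n_terms)), computed exactly on integers."""
    return math.isqrt(n_terms)


def group_queries(queries, bucket_fn):
    """Group queries by the bucket of their term count."""
    groups = defaultdict(list)
    for query in queries:
        groups[bucket_fn(term_count(query))].append(query)
    return dict(groups)


queries = [
    "milk",
    "organic milk",
    "gluten free bread",
    "fresh organic whole wheat sandwich bread",
]
by_log = group_queries(queries, log_bucket)
by_sqrt = group_queries(queries, sqrt_bucket)
```

Both bucket functions grow slowly, so long queries with similar term counts fall into the same group while short queries stay finely separated.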

Methods and systems for modifying a search query having a non-character-based input
11714809 · 2023-08-01

A method and system are provided for searching a search query having a non-character-based input. The method comprises receiving the search query comprising a first part and a second part. The first part comprises a non-character-based input. The method further comprises identifying a first plurality of keywords associated with the non-character-based input and receiving a selection of at least one of the first plurality of keywords. The method further comprises generating a modified search query comprising the at least one selected keyword and the second part. The method further comprises retrieving search results based on the modified search query and generating for presentation the search results.

CONFIGURABLE, STREAMING HYBRID-ANALYTICS PLATFORM

An analytics platform for the extraction of structured observations from largely narrative sources using a hybrid approach of user configuration and machine learning is provided. The analytics platform collects and normalizes data from public and private sources and applies extractions to the data to create a world view of objects, traits, and relationships of interest and maintains that world view as data and/or extractions are updated. The platform is further configured to apply queries to the extracted world view for a variety of purposes including scoring objects for prioritized attention, generating notifications when specific conditions are met, providing data sets for exploratory analysis, and triggering the automatic collection of enhancing data from external sources.

Systems and methods for micro-credential accreditation
11550832 · 2023-01-10

Systems and methods provide micro-credential accreditation. The systems and methods analyze, using one or more prediction models, text submissions received from applicants via interaction with an applicant device. The prediction model(s) fit one or more micro-credentials to the received text submission, which may collectively or independently qualify the applicant for one or more accreditation credits. By processing the received text submission, the systems and methods allow for consistent and standard output of micro-credentials by the prediction model(s). Furthermore, the systems and methods provide for monitoring the prediction model output(s) to ensure ethical fairness across varying demographic groups of applicants.

EXECUTING SERVICES ACROSS MULTIPLE TRUSTED DOMAINS FOR DATA ANALYSIS

Computing systems of a multi-tenant trusted domain collect metadata describing data stored in data sources of a set of tenant trusted domains. The computing systems of the multi-tenant trusted domain use the metadata to process natural language questions based on data stored in data sources of a tenant trusted domain. The computing systems of the multi-tenant trusted domain identify a set of data sources of the tenant trusted domain that are relevant for processing the natural language question and generate an execution plan for answering the natural language question. The computing systems of the multi-tenant trusted domain send the execution plan to one or more computing systems of the tenant trusted domain. The computing systems of the tenant trusted domain execute the execution plan and send the result of executing the execution plan to a client device that sent the natural language question.

Data processing systems and methods for automatically redacting unstructured data from a data subject access request

Systems and methods are disclosed for analyzing unstructured data in a request for data associated with a data subject to determine whether the unstructured data is relevant to the request. The relevancy of pieces of the unstructured data may be determined by determining a categorization for each such piece of unstructured data and comparing each piece to known personal data associated with the data subject having the same categorization. Pieces of the unstructured data that do not match known personal data having the same categorization are redacted from the request before the request is processed.
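The categorize-compare-redact flow can be sketched roughly as follows. The regular expressions used for categorization, the two categories, the `[REDACTED]` marker, and the choice to leave uncategorized text untouched are illustrative assumptions, not details taken from the abstract.

```python
import re

# Hypothetical categorizers: each pattern assigns a category to a piece of text.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def redact_request(text, known_personal_data):
    """Redact categorized pieces that do not match the data subject's known
    personal data of the same category."""
    for category, pattern in PATTERNS.items():
        allowed = known_personal_data.get(category, set())

        def keep_or_redact(match, allowed=allowed):
            piece = match.group(0)
            return piece if piece in allowed else "[REDACTED]"

        text = pattern.sub(keep_or_redact, text)
    return text


request = (
    "Please send data for ana@example.com; "
    "my neighbor is bob@example.org, phone 555-123-4567."
)
known = {"email": {"ana@example.com"}, "phone": set()}
cleaned = redact_request(request, known)
```

Here the requester's own email survives because it matches known data in the `email` category, while the neighbor's email and the unmatched phone number are removed before the request is processed.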

Identifying and extracting addresses within content

The disclosed embodiments provide a system for processing data. During operation, the system extracts text windows of varying length from text in one or more content items associated with an entity. Next, the system applies a machine learning model to features for the text windows to produce scores representing the likelihoods that the text windows contain addresses. The system then identifies, based on the scores and validation rules applied to the text windows, one of the text windows as an address for the entity. Finally, the system stores the selected text window as the address for the entity.
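A rough Python sketch of the window-score-validate pipeline follows. The `toy_score` function is only a stand-in for the machine learning model, and the cue patterns, the validation rule, the window sizes, and the threshold are hypothetical choices made for illustration.

```python
import re


def text_windows(tokens, min_len, max_len):
    """Yield every contiguous window of min_len..max_len tokens."""
    for size in range(min_len, max_len + 1):
        for start in range(len(tokens) - size + 1):
            yield " ".join(tokens[start:start + size])


def toy_score(window):
    """Placeholder for the model: fraction of address-like cues present."""
    cues = [
        re.search(r"^\d+\s", window),                # starts with a street number
        re.search(r"\b(St|Ave|Rd|Blvd)\b", window),  # contains a street suffix
        re.search(r"\b\d{5}$", window),              # ends with a 5-digit ZIP code
    ]
    return sum(cue is not None for cue in cues) / len(cues)


def is_valid(window):
    """Assumed validation rule: an address must contain at least one digit."""
    return any(ch.isdigit() for ch in window)


def extract_address(text, min_len=3, max_len=8, threshold=0.5):
    """Return the best-scoring valid window, or None if nothing scores well."""
    tokens = text.split()
    candidates = [
        (toy_score(w), w)
        for w in text_windows(tokens, min_len, max_len)
        if is_valid(w)
    ]
    best = max(candidates, key=lambda c: c[0], default=None)
    if best is None or best[0] < threshold:
        return None
    return best[1]


address = extract_address("Visit us at 500 Main St Springfield 62701 for details")
```

In a real system the scorer would be a trained model operating on features of each window; the heuristic here only shows where the scores and validation rules plug in.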

SYSTEMS AND METHODS FOR ASSESSING RISK ASSOCIATED WITH A MACHINE LEARNING MODEL
20220222440 · 2022-07-14

Techniques for assessing risk associated with a machine learning model trained to perform a task. The techniques include: using at least one computer hardware processor to execute software to perform: obtaining natural language text including a plurality of answers to a respective plurality of questions for assessing risk for the machine learning model; identifying, using a second natural language processing (NLP) technique and from among a plurality of topics, the risk report indicating at least one risk associated with the machine learning model and at least one action to perform for mitigating the at least one risk associated with the machine learning model; and outputting the risk report to a user of the software.

Discriminating ambiguous expressions to enhance user experience

Methods and systems are provided for discriminating ambiguous expressions to enhance user experience. For example, a natural language expression may be received by a speech recognition component. The natural language expression may include at least one of words, terms, and phrases of text. A dialog hypothesis set from the natural language expression may be created by using contextual information. In some cases, the dialog hypothesis set has at least two dialog hypotheses. A plurality of dialog responses may be generated for the dialog hypothesis set. The dialog hypothesis set may be ranked based on an analysis of the plurality of the dialog responses. An action may be performed based on ranking the dialog hypothesis set.

Candidate data ranking method using previously selected candidate data
11423071 · 2022-08-23

Various embodiments provide a method for sorting data. The method can include: receiving criteria data from a user regarding an open opportunity; receiving candidate data; storing candidate data in a candidate profile in a candidate database; evaluating the candidate data, wherein evaluating the candidate data comprises: comparing the audio data of the candidate to a database of successfully-selected candidates to determine an N-gram analysis result for the candidate data, wherein the N-gram analysis includes creating a textual transcript of the audio data for a portion of the audio data; removing stop words from the textual transcript; creating an N-gram data set based on the textual transcript with stop words removed; comparing the N-gram data set of the candidate data to an N-gram data set of the successfully-selected candidates; and sorting the candidate data within a plurality of other candidate data using the N-gram analysis result.
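The N-gram steps can be sketched in Python, starting from transcripts that already exist as text (audio transcription is out of scope here). The stop-word list, the choice of bigrams, and Jaccard similarity as the comparison metric are assumptions; the abstract does not say how the two N-gram data sets are compared.

```python
# A small, assumed stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "and", "i", "to", "of", "in", "with", "my"}


def ngrams(transcript, n=2):
    """Build the set of word n-grams after removing stop words."""
    words = [w for w in transcript.lower().split() if w not in STOP_WORDS]
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed comparison metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(transcripts, reference_transcripts, n=2):
    """Sort candidate names by similarity to the pooled n-grams of
    successfully-selected candidates, most similar first."""
    reference = set().union(*(ngrams(t, n) for t in reference_transcripts))
    scores = {
        name: jaccard(ngrams(text, n), reference)
        for name, text in transcripts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


successful = ["I led a team of engineers", "I managed the project budget"]
candidates = {
    "bob": "I like long walks",
    "alice": "I led a team in the project",
}
ranking = rank_candidates(candidates, successful)
```

In this toy data, alice shares the bigram ("led", "team") with the successful pool and so sorts ahead of bob, who shares none.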