G06F16/3347

Restricted access to sensitive content

In one aspect, the present disclosure relates to a method including: receiving, by a client device, a request to access content stored on a remote server; determining, by the client device, that the requested content includes sensitive information based on a user profile associated with the client device; modifying, by the client device, the requested content in response to the determination that the content includes sensitive information; and providing, by the client device, access to the modified content in place of the requested content that includes the sensitive information.
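
The client-side flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PATTERNS` table, the `restricted` profile key, and the `[REDACTED]` marker are hypothetical and do not come from the disclosure, which does not say how sensitive information is detected or modified.

```python
import re

# Hypothetical detectors for sensitive information (assumption, not from the disclosure).
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def provide_content(content, profile):
    """Return the content with every category restricted by the profile masked."""
    modified = content
    for category in profile.get("restricted", []):
        # Modify the content only where the profile marks the category as sensitive.
        modified = PATTERNS[category].sub("[REDACTED]", modified)
    return modified

page = "SSN 123-45-6789, mail a@b.com"
print(provide_content(page, {"restricted": ["ssn"]}))
```

A profile with no restricted categories receives the requested content unchanged.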

Multiscale quantization for fast similarity search

The present disclosure provides systems and methods that include or otherwise leverage use of a multiscale quantization model that is configured to provide a quantized dataset. In particular, the multiscale quantization model can receive and perform vector quantization of a first dataset. The multiscale quantization model can generate a residual dataset based at least in part on a result of the vector quantization. The multiscale quantization model can apply a rotation matrix to the residual dataset to generate a rotated residual dataset that includes a plurality of rotated residuals. The multiscale quantization model can perform reparameterization of each rotated residual in the rotated residual dataset into a direction component and a scale component. The multiscale quantization model can perform product quantization of the direction components of the plurality of rotated residuals, and perform scalar quantization of the scale components of the plurality of rotated residuals.
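
The encoding steps listed above can be traced in a minimal Python sketch. It assumes that the scale component is the Euclidean norm of the rotated residual and the direction component is the corresponding unit vector, and that all codebooks and the rotation matrix are already given; the function names and arguments are hypothetical, and how the codebooks and rotation are learned is outside this sketch.

```python
import math

def nearest(codebook, v):
    """Index of the codeword closest to v (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)))

def encode(x, vq_codebook, rotation, pq_codebooks, scale_levels):
    # 1. Vector quantization: pick the closest coarse center.
    center = nearest(vq_codebook, x)
    # 2. Residual between the input and its center.
    residual = [a - b for a, b in zip(x, vq_codebook[center])]
    # 3. Apply the rotation matrix (a list of rows) to the residual.
    rotated = [sum(r * v for r, v in zip(row, residual)) for row in rotation]
    # 4. Reparameterize into a scale (norm) and a unit direction (assumed form).
    scale = math.sqrt(sum(v * v for v in rotated))
    direction = [v / scale for v in rotated] if scale else rotated
    # 5. Product quantization: quantize each sub-vector of the direction separately.
    width = len(direction) // len(pq_codebooks)
    pq_codes = [nearest(book, direction[j * width:(j + 1) * width])
                for j, book in enumerate(pq_codebooks)]
    # 6. Scalar quantization: snap the scale to the closest allowed level.
    scale_code = min(range(len(scale_levels)),
                     key=lambda i: abs(scale_levels[i] - scale))
    return center, pq_codes, scale_code
```

Each input is then stored as one coarse center index, one code per direction sub-vector, and one scale code.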

Self-evolving knowledge graph

A computer system updates a knowledge graph. A model corresponding to a set of documents is received, wherein the model comprises a plurality of entities, a plurality of entity associations, and a plurality of confidence scores corresponding to the plurality of entity associations. A relevance value is calculated for each entity of the plurality of entities that are present in the set of documents and for each entity of the plurality of entities that are present in a new document. One or more entity associations that are supported by specific portions of the new document are identified. The confidence scores for each of the identified one or more entity associations are updated based on a level of support in the new document. Embodiments of the present invention further include a method and program product for updating a knowledge graph in substantially the same manner described above.
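
As a rough illustration of the confidence update, the sketch below moves each association's score toward the level of support found in the new document. The update rule (a simple interpolation with a `rate` parameter), the function name, and the dictionary layout are all assumptions made for illustration; the abstract does not specify how the level of support is turned into a new score.

```python
def update_confidences(confidences, support, rate=0.5):
    """Return new confidence scores after seeing a new document.

    confidences: dict mapping (entity_a, entity_b) -> score in [0, 1]
    support:     dict mapping (entity_a, entity_b) -> level of support in [0, 1]
                 for associations backed by specific portions of the new document
    rate:        hypothetical step size controlling how far a score moves
    """
    updated = dict(confidences)
    for association, level in support.items():
        old = updated.get(association, 0.0)
        # Move the old score part of the way toward the observed support level.
        updated[association] = old + rate * (level - old)
    return updated

graph = {("aspirin", "headache"): 0.5, ("aspirin", "fever"): 0.25}
print(update_confidences(graph, {("aspirin", "headache"): 1.0}))
```

Associations that the new document does not support keep their previous scores.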

Messaging controller for anonymized communication
11533298 · 2022-12-20

A method may include receiving, from a first client, a first message. The first message may be matched to a second user based on a similarity between a first keyword included in the first message and a second keyword included in a profile of a second user. The first keyword may be determined to be similar to the second keyword based on a distance between a first vector representation of the first keyword and a second vector representation of the second keyword not exceeding a threshold value. In response to the first message being matched with the second user, the first message may be sent to a second client associated with the second user. In response to receiving, from the second client, a second message responsive to the first message, the second message may be sent to the first client. Related systems and articles of manufacture are also provided.
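
The keyword matching step can be sketched as follows. The abstract only requires that the distance between the two vector representations not exceed a threshold; the choice of cosine distance, the threshold value, and the toy `embeddings` table are assumptions made for illustration.

```python
import math

def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def match_users(message_keywords, profiles, embeddings, threshold=0.2):
    """Return users whose profile has a keyword close enough to a message keyword."""
    matches = []
    for user, profile_keywords in profiles.items():
        if any(cosine_distance(embeddings[m], embeddings[p]) <= threshold
               for m in message_keywords for p in profile_keywords):
            matches.append(user)
    return matches

# Toy vectors; a real system would obtain these from a trained embedding model.
embeddings = {"hiking": [1.0, 0.0], "trekking": [0.9, 0.1], "cooking": [0.0, 1.0]}
profiles = {"alice": ["trekking"], "bob": ["cooking"]}
print(match_users(["hiking"], profiles, embeddings))
```

The message would then be forwarded to the client of each matched user, and replies routed back to the sender.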

System and method for question answering with derived glossary clusters

A method, system, and computer-usable medium are disclosed for answering general background questions on a topic from documents with glossary sections. A set of documents with glossaries is received, from which a set of terms and associated glossary entries is extracted, where each term has a corresponding glossary entry. Related glossary entries are associated with one another. The associations are based on a similarity algorithm and form glossary clusters, where each glossary cluster refers to one or more glossary entries. A query with query terms tailored to general information is received. The glossary clusters are ranked by relevance to the query terms to form a ranked set. A set of glossary clusters meeting a high-ranked threshold is selected and provided.
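
A small Python sketch can make the clustering and ranking steps concrete. The abstract does not name the similarity algorithm or the relevance measure, so the Jaccard token overlap, the greedy clustering, and both thresholds used here are assumptions chosen only for illustration.

```python
def tokens(text):
    return set(text.lower().split())

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_entries(glossary, min_sim=0.3):
    """Greedily group terms whose glossary entries are similar enough."""
    clusters = []
    for term, entry in glossary.items():
        for cluster in clusters:
            if any(jaccard(tokens(entry), tokens(glossary[t])) >= min_sim
                   for t in cluster):
                cluster.append(term)
                break
        else:
            clusters.append([term])  # no similar cluster found: start a new one
    return clusters

def select_clusters(clusters, glossary, query, min_score=0.5):
    """Rank clusters by the share of query terms they cover; keep the top ones."""
    q = tokens(query)
    scored = []
    for cluster in clusters:
        words = set()
        for term in cluster:
            words |= tokens(term) | tokens(glossary[term])
        scored.append((len(q & words) / (len(q) or 1), cluster))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cluster for score, cluster in scored if score >= min_score]

glossary = {
    "router": "device that forwards network packets",
    "switch": "device that forwards network frames",
    "yeast": "fungus used in baking bread",
}
clusters = cluster_entries(glossary)
print(select_clusters(clusters, glossary, "network device"))
```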

TECHNOLOGIES FOR RELATING TERMS AND ONTOLOGY CONCEPTS

This disclosure enables various technologies that can (1) learn new synonyms for a given concept without manual curation techniques, (2) relate (e.g., map) some, many, most, or all raw named entity recognition outputs (e.g., “United States”, “United States of America”) to ontological concepts (e.g., ISO-3166 country code: “USA”), (3) account for false positives from a prior named entity recognition process, or (4) aggregate some, many, most, or all named entity recognition results from machine learning or rules-based approaches to provide a best-of-breed hybrid approach (e.g., synergistic effect).

TEXT EXTRACTION METHOD AND DEVICE, COMPUTER READABLE STORAGE MEDIUM AND ELECTRONIC DEVICE
20220398384 · 2022-12-15

A text extraction method and device, computer-readable storage medium, and electronic device are described that relate to the technical field of machine learning. The method includes: acquiring to-be-extracted data and extracting a current trigger word in the to-be-extracted data using a target trigger word extraction model included in a target event extraction model; generating a current query sentence according to the current trigger word; and extracting a current event argument corresponding to the current trigger word according to the current query sentence and a target argument extraction model included in the target event extraction model, wherein the target trigger word extraction model and the target argument extraction model have the same model structure and parameters, and are connected in a cascading manner.

IDENTIFYING A CLASSIFICATION HIERARCHY USING A TRAINED MACHINE LEARNING PIPELINE

Techniques are disclosed for using a trained machine learning (ML) pipeline to identify categories associated with target data items even though the identified categories may not already be present in the hierarchy. The ML pipeline may include trained cluster-based and classification-based machine learning models, among others. If the results of the cluster-based and classification-based machine learning models are the same, then the target data item is assigned to a hierarchical classification consistent with the identical results of the machine learning models. An assigned hierarchical classification may be validated by the operation of subsequent trained ML models that determine whether parent and child categories in the identified classification are properly associated with one another.
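
The agreement check between the two models can be sketched as below. Everything here is hypothetical: `cluster_model` and `classifier` are placeholder callables standing in for the trained models, and the `allowed_edges` lookup is a deliberately simple stand-in for the subsequent trained ML models that validate parent and child categories in the disclosure.

```python
def assign_category(item, cluster_model, classifier, allowed_edges):
    """Return a category path only if both models agree and the path validates."""
    cluster_path = cluster_model(item)
    class_path = classifier(item)
    if cluster_path != class_path:
        return None  # the two models disagree, so no assignment is made
    # Stand-in validation: every (parent, child) pair must be an accepted edge.
    pairs = zip(class_path, class_path[1:])
    if all(pair in allowed_edges for pair in pairs):
        return class_path
    return None

# Placeholder models that always return a fixed path, for illustration only.
cluster_model = lambda item: ("Electronics", "Phones")
classifier = lambda item: ("Electronics", "Phones")
print(assign_category("smartphone case", cluster_model, classifier,
                      {("Electronics", "Phones")}))
```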

EXPERT BOARD CASE SELECTION SYSTEM AND METHOD
20220399130 · 2022-12-15

The present embodiments relate to generation of a docket of feature groups based on a plurality of identified features from a dataset. For instance, the docket can be used by a board of experts to efficiently review and provide insights to a larger number of cases. Data relating to each of a plurality of scenarios can be processed to extract textual features for each of the plurality of scenarios. Feature vectors can be populated with textual features from the plurality of scenarios, and feature groups can be derived by a docket generation model. The docket generation model can generate a docket comprising a listing of the set of feature groups. The listing of the set of feature groups can be arranged by a number of scenarios corresponding to each of the feature groups and/or a number of extracted textual features for each of the set of feature groups.
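
The docket listing can be illustrated with a short sketch. Grouping scenarios by identical feature sets is a simple stand-in for the docket generation model, and ordering the groups from most to fewest scenarios is an assumption; the abstract only says the listing is arranged by these counts. The function name and input layout are hypothetical.

```python
from collections import defaultdict

def build_docket(scenarios):
    """Group scenarios by their extracted features and list the largest groups first.

    scenarios: dict mapping a scenario id to its extracted textual features
    """
    groups = defaultdict(list)
    for scenario_id, features in scenarios.items():
        groups[frozenset(features)].append(scenario_id)
    # Sort by descending number of scenarios, then by feature names for stability.
    ranked = sorted(groups.items(), key=lambda kv: (-len(kv[1]), sorted(kv[0])))
    return [(sorted(features), ids) for features, ids in ranked]

scenarios = {
    "case1": ["feature_a", "feature_b"],
    "case2": ["feature_b", "feature_a"],
    "case3": ["feature_c"],
}
for features, ids in build_docket(scenarios):
    print(features, ids)
```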

TECHNIQUES FOR IMPROVING STANDARDIZED DATA ACCURACY
20220391690 · 2022-12-08

Described herein is a technique for mapping the raw text of a job title of an online job posting to an entity embedding associated with an entity or entry of a title taxonomy. The raw text of the job title is first encoded to generate a multilingual word embedding in a multilingual word embedding space. Then, the vector representation of the job title, as represented in the multilingual word embedding space, is translated, using a neural network, to a vector representation of the job title in the entity embedding space. Finally, a nearest neighbor search is performed to identify an entity embedding associated with an entity or entry in the title taxonomy that has a vector representation that is closest in distance to the vector output by the neural network.
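
The last two steps, translation into the entity embedding space and the nearest neighbor search, can be sketched as follows. The single linear layer is only a stand-in for the neural network, Euclidean distance is an assumed metric, and the weights and entity vectors are toy values rather than anything from the disclosure.

```python
import math

def translate(vector, weights, bias):
    """Stand-in for the neural network: one linear layer into the entity space."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def nearest_entity(query, entity_embeddings):
    """Exhaustive nearest neighbor search by Euclidean distance."""
    return min(entity_embeddings,
               key=lambda name: math.dist(query, entity_embeddings[name]))

# Toy values: a 3-dimensional word embedding mapped to a 2-dimensional entity space.
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
bias = [0.0, 0.0]
entities = {"Software Engineer": [1.0, 0.0], "Nurse": [0.0, 1.0]}

title_vector = [0.9, 0.2, 0.5]  # assumed output of the multilingual encoder
print(nearest_entity(translate(title_vector, weights, bias), entities))
```

For a large taxonomy, the exhaustive search would typically be replaced by an approximate nearest neighbor index.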