G06F40/284

Automatic Synonyms, Abbreviations, and Acronyms Detection
20230039689 · 2023-02-09

A completely unsupervised solution for generating and maintaining a list of lexically similar terms for an e-commerce system is provided. Given a particular electronic collection of items in an e-commerce system, each term in a first item listing is initially paired with each term in a second item listing to form a set of token pairs. The token pairs represent possible candidates for being synonyms. For each token pair, an attempt is made to match the shorter token of the pair to the longer token, character by character. If the match succeeds, the terms in the token pair are automatically labeled as synonyms for the particular electronic collection of items. Some implementations also automatically filter out false positives and/or unrelated token pairs that are unlikely to be synonyms. The solution can be performed at the granularity of a product, category, vertical, or entire catalog.
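The pairing and character-by-character matching steps can be sketched as follows. This is a minimal illustration, not the patented method: it assumes that "matching" means every character of the shorter token appears in the longer token in the same order, with the first characters required to agree. The function names are hypothetical, and the false-positive filtering step is not modeled.

```python
from itertools import product


def token_pairs(listing_a: str, listing_b: str) -> set[tuple[str, str]]:
    """Pair every term of one item listing with every term of another."""
    terms_a = listing_a.lower().split()
    terms_b = listing_b.lower().split()
    # Identical terms are not interesting synonym candidates, so skip them.
    return {(a, b) for a, b in product(terms_a, terms_b) if a != b}


def matches_in_order(short: str, long: str) -> bool:
    """True if every character of `short` occurs in `long` in the same order.

    Assumption: the first characters must also agree, so "qty" matches
    "quantity" but "ty" does not.
    """
    if not short or short[0] != long[0]:
        return False
    remaining = iter(long)
    # `ch in remaining` advances the iterator past the first match of ch,
    # so later characters can only match further along `long`.
    return all(ch in remaining for ch in short)


def label_synonyms(listing_a: str, listing_b: str) -> set[tuple[str, str]]:
    """Label (shorter, longer) token pairs whose characters match in order."""
    labeled = set()
    for a, b in token_pairs(listing_a, listing_b):
        short, long = sorted((a, b), key=len)
        if matches_in_order(short, long):
            labeled.add((short, long))
    return labeled


print(label_synonyms("blue cotton tshirt 2 pk", "blue cotton t-shirt two pack"))
```

For these two listings the sketch labels ("pk", "pack") and ("tshirt", "t-shirt"). It also shows why a filtering stage is useful: a pure subsequence test will accept some unrelated short tokens on larger catalogs.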

IDENTIFYING AND TRANSFORMING TEXT DIFFICULT TO UNDERSTAND BY USER

A computer-implemented method, system, and computer program product for improving the understandability of text for a user. A final word vector is computed for each word in a sentence of a document, for example by averaging a first word vector and a second word vector for that word. Elements of a user portrait are also vectorized. For each vectorized element of the user’s portrait, a distance is computed between that element and the vector of each word in the sentence, and these distances are summed to form an evaluation result for the element. A “final evaluation result” is then generated from the evaluation results of all elements in the user’s portrait. If the final evaluation result indicates that the user does not understand the sentence, the document is transformed.
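The per-element evaluation can be sketched as below. The averaging of two word vectors and the summed distances follow the abstract; the choice of Euclidean distance, the rule that the final evaluation result is the smallest per-element sum, and the `threshold` parameter are assumptions made for illustration only.

```python
import math


def final_word_vector(v1: list[float], v2: list[float]) -> list[float]:
    """Average two vectors for the same word (e.g. from two embedding models)."""
    return [(a + b) / 2 for a, b in zip(v1, v2)]


def evaluation_result(sentence: list[list[float]], element: list[float]) -> float:
    """Sum of Euclidean distances between each word vector and one portrait element."""
    return sum(math.dist(word, element) for word in sentence)


def evaluate(sentence, portrait, threshold):
    """Return per-element results and whether the sentence looks hard for this user.

    Assumption: the final evaluation result is the smallest per-element sum,
    and a value above `threshold` signals a lack of understanding.
    """
    results = {name: evaluation_result(sentence, vec) for name, vec in portrait.items()}
    final = min(results.values())
    return results, final > threshold


sentence = [final_word_vector([0.0, 0.0], [0.0, 0.0]),
            final_word_vector([0.0, 1.0], [0.0, 3.0])]
portrait = {"sports": [0.0, 1.0], "finance": [10.0, 0.0]}
results, needs_transform = evaluate(sentence, portrait, threshold=5.0)
```

Here the sentence lies close to the "sports" element (summed distance 2.0), so under these assumptions it would not be flagged for transformation.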

DATA STRUCTURE MANAGEMENT SYSTEM
20230043217 · 2023-02-09

A computing device generates a first token for first data content that is associated with a first relationship and a second relationship, and a second token for second data content that is associated with the first relationship and a third relationship. Both tokens are generated based on the frequency of use of the data values included in the first and second data content. In response to the first and second tokens matching, the computing device calculates a first similarity score between data values from third data content, which is associated with the second relationship and a fourth relationship, and data values from fourth data content, which is associated with the third relationship and the fourth relationship. Then, in response to the first similarity score satisfying a similarity threshold, the computing device performs a first modification to any of the data content.
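A rough sketch of the token-match-then-compare flow is shown below, under several labeled assumptions: each token is built from the least frequently used data value(s) in a record (a common blocking heuristic), the similarity score is a Jaccard score, `third` and `fourth` stand for the content linked to `first` and `second` respectively, and the "modification" is a merge. None of these choices is stated in the abstract, and all names are hypothetical.

```python
from collections import Counter


def make_token(values: list[str], frequency: Counter, k: int = 1) -> tuple[str, ...]:
    """Token built from the k least frequently used values (ties broken alphabetically)."""
    rarest = sorted(set(values), key=lambda v: (frequency[v], v))[:k]
    return tuple(sorted(rarest))


def jaccard(a: list[str], b: list[str]) -> float:
    """Similarity score: shared values divided by all distinct values."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def reconcile(first, second, third, fourth, frequency, threshold=0.5):
    """If the tokens of `first` and `second` match, compare their related content.

    `third` is assumed to be linked to `first` and `fourth` to `second`.
    Hypothetical modification: merge `third` and `fourth` when similar enough.
    """
    if make_token(first, frequency) != make_token(second, frequency):
        return None
    if jaccard(third, fourth) >= threshold:
        return sorted(set(third) | set(fourth))
    return None


# Hypothetical usage counts of each data value across the whole data store.
frequency = Counter({"Lopez": 40, "A.": 30, "Ana": 25, "ana.lopez@example.com": 2})
first = ["Ana", "Lopez", "ana.lopez@example.com"]
second = ["A.", "Lopez", "ana.lopez@example.com"]
third = ["12 Main St", "Springfield"]
fourth = ["12 Main St", "Springfield", "IL"]
merged = reconcile(first, second, third, fourth, frequency)
```

Building tokens from rare values means that common values such as a frequent surname do not cause spurious matches. Only pairs that share a distinctive value proceed to the more expensive similarity comparison.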

NATURAL LANGUAGE BASED PROCESSOR AND QUERY CONSTRUCTOR
20230042940 · 2023-02-09

An apparatus comprising an interface and a natural language processor. The interface receives a data retrieval request formatted in a natural language, and the natural language processor processes the data retrieval request. Processing the data retrieval request includes identifying database entities, database relations, or any combination thereof based on words in the data retrieval request. It can also include identifying database entity criteria, database relation criteria, or any combination thereof based on words in the data retrieval request. It further includes generating a database query based on the database entities, the database relations, the database entity criteria, the database relation criteria, or any combination thereof, and causing the database query to be applied to a database. Processing the data retrieval request also includes grammatically tagging it using part-of-speech tagging techniques (e.g., grammatical type, grammatical context, semantics, or any combination thereof) and a database ontology.
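A toy version of the request-to-query pipeline is sketched below. It replaces part-of-speech tagging with dictionary and regular-expression lookups. The `ENTITIES` ontology, the two criterion patterns, and the `customers` table are hypothetical, chosen only to show entities and entity criteria being turned into a parameterized query that is then applied to a database.

```python
import re
import sqlite3

# Hypothetical database ontology: words that name entities, mapped to tables.
ENTITIES = {"customer": "customers", "customers": "customers"}


def build_query(request: str) -> tuple[str, list]:
    """Turn a simple English request into a parameterized SQL query."""
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", request)]
    table = next((ENTITIES[w] for w in words if w in ENTITIES), None)
    if table is None:
        raise ValueError("no database entity found in request")

    clauses, params = [], []
    # Entity criterion on age, e.g. "older than 30" / "younger than 25".
    for m in re.finditer(r"\b(older|younger) than (\d+)", request, re.IGNORECASE):
        clauses.append("age > ?" if m.group(1).lower() == "older" else "age < ?")
        params.append(int(m.group(2)))
    # Entity criterion on city, e.g. "in Boston" (capitalized place name).
    for m in re.finditer(r"\bin ([A-Z][a-z]+)", request):
        clauses.append("city = ?")
        params.append(m.group(1))

    # `table` comes from the ontology, never from raw user text, so it is safe to inline.
    sql = f"SELECT name FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, age INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [("Ana", "Boston", 34), ("Ben", "Boston", 22), ("Cy", "Denver", 40)])
sql, params = build_query("Show customers in Boston older than 30")
rows = conn.execute(sql, params).fetchall()
```

Criterion values are passed as query parameters rather than spliced into the SQL string, which keeps user-supplied words out of the query text.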

Text autocomplete using punctuation marks

A dataset comprising text-based messages can be accessed. Tokens can be generated for the words and punctuation marks contained in the text-based messages, with each token corresponding to one word or one punctuation mark. A vector representation can be generated for each of a plurality of the tokens using natural language processing. For each of a plurality of the text-based messages in the dataset, a sequence of tokens corresponding to that message can be generated. The tokens that represent punctuation marks can be identified. An artificial neural network can then be trained to predict the use of punctuation marks in sentence structures. The training uses the generated sequences of tokens and the vector representations of the tokens in those sequences that represent punctuation marks.
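The tokenization and punctuation-identification steps can be sketched as follows. To keep the example small and runnable, the artificial neural network and the token vectors are replaced with a simple count of which punctuation mark follows each word. This is a swapped-in baseline, not the described training method, and all function names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Optional

# One token per word or per punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(message: str) -> list[str]:
    return TOKEN_RE.findall(message.lower())


def is_punctuation(token: str) -> bool:
    return re.fullmatch(r"[^\w\s]", token) is not None


def train(messages: list[str]) -> dict[str, Counter]:
    """Count which punctuation mark follows each word (stand-in for the neural network)."""
    follows: dict[str, Counter] = defaultdict(Counter)
    for message in messages:
        tokens = tokenize(message)
        for prev, nxt in zip(tokens, tokens[1:]):
            if is_punctuation(nxt) and not is_punctuation(prev):
                follows[prev][nxt] += 1
    return follows


def suggest(model: dict[str, Counter], prefix: str) -> Optional[str]:
    """Suggest the punctuation mark most often seen after the last word of `prefix`."""
    tokens = tokenize(prefix)
    if not tokens or tokens[-1] not in model:
        return None
    return model[tokens[-1]].most_common(1)[0][0]


model = train(["Hi, how are you?", "Thanks, see you!", "Where are you?"])
```

With this toy dataset, `suggest(model, "Can you")` proposes "?" because "you" was followed by "?" twice and by "!" once. A trained network would instead condition on the whole token sequence.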

Machine learning based abbreviation expansion
11544457 · 2023-01-03

Techniques are described herein for determining a long-form of an abbreviation using a machine learning based approach that takes into consideration both sequential context and structural context, where the long-form corresponds to a meaning of the abbreviation as used in a sequence of words that form a sentence. In some embodiments, word representations are generated for different words in the sequence of words, and a combined representation is generated for the abbreviation based on a word representation corresponding to the abbreviation, a sequential context representation, and a structural context representation. The sequential context representation can be generated based on word representations for words positioned near the abbreviation. The structural context representation can be generated based on word representations for words that are syntactically related to the abbreviation. The combined representation can be input to a classification neural network trained to output a label representing the long-form of the abbreviation.
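The combined representation described above can be sketched with plain lists. The sketch makes several assumptions: the three parts are concatenated, each context representation is the mean of the relevant word vectors, sequential context is a fixed window around the abbreviation, and structural context uses dependency heads supplied by an external parser. The classification neural network is replaced with a hypothetical linear scorer with hand-set weights.

```python
def mean(vectors: list[list[float]], dim: int) -> list[float]:
    """Element-wise mean; a zero vector when there is no context."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def combined_representation(embeddings, index, heads, window=1):
    """Concatenate word, sequential-context, and structural-context representations.

    embeddings: one vector per word in the sentence.
    heads: dependency-head index for each word (-1 for the root), assumed to
           come from an external parser.
    """
    dim = len(embeddings[index])
    lo, hi = max(0, index - window), min(len(embeddings), index + window + 1)
    sequential = [embeddings[j] for j in range(lo, hi) if j != index]
    structural = [embeddings[j] for j in range(len(embeddings))
                  if j != index and (heads[index] == j or heads[j] == index)]
    return embeddings[index] + mean(sequential, dim) + mean(structural, dim)


def classify(rep, weights):
    """Linear scoring per long form: a stand-in for the classification network."""
    scores = {label: sum(w * x for w, x in zip(ws, rep)) for label, ws in weights.items()}
    return max(scores, key=scores.get)


embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
heads = [1, -1, 3, 1]  # word 1 is the abbreviation and the root
rep = combined_representation(embeddings, 1, heads)
weights = {"machine learning": [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
           "milliliter": [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]}
```

In this example, word 3 is outside the sequential window but is syntactically attached to the abbreviation. It therefore contributes only to the structural part of the representation, which is the distinction the abstract draws between the two kinds of context.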

Generation of text from structured data

Implementations of the subject matter described herein provide a solution for generating text from structured data. In this solution, the structured data is converted into a representation: the structured data comprises a plurality of cells, and its representation comprises a plurality of representations of those cells. A natural language sentence associated with the structured data may then be determined based on this representation, thereby converting the structured data into text.
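The cell-level representation can be illustrated with the minimal sketch below. Each cell is tagged with its row index and column header, which is an assumed encoding. Sentence generation uses a fixed template as a stand-in for the learned model that the abstract implies. The `Cell`, `represent`, and `describe_row` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One cell of a table, tagged with its row and column header."""
    row: int
    column: str
    value: str


def represent(header: list[str], rows: list[list]) -> list[Cell]:
    """Convert structured data into a list of per-cell representations."""
    return [Cell(r, col, str(val))
            for r, row in enumerate(rows)
            for col, val in zip(header, row)]


def describe_row(cells: list[Cell], row: int) -> str:
    """Template-based sentence for one row (stand-in for a learned decoder)."""
    subject, *rest = [c for c in cells if c.row == row]
    if not rest:
        return f"{subject.value}."
    facts = ", ".join(f"{c.column.lower()} {c.value}" for c in rest)
    return f"{subject.value} has {facts}."


cells = represent(["Product", "Color", "Price"],
                  [["Desk lamp", "black", 24.99], ["Stool", "oak", 40]])
print(describe_row(cells, 0))  # Desk lamp has color black, price 24.99.
```

Keeping the header and row position with each cell is what lets a generator relate a value such as "black" to its column. A learned model would consume the same per-cell information as vectors rather than strings.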