G06F40/232

Methods and systems for correcting transcribed audio files
11594211 · 2023-02-28

Methods and systems for correcting transcribed text. One method includes receiving audio data from one or more audio data sources and transcribing the audio data based on a voice model to generate text data. The method also includes making the text data available to a plurality of users over at least one computer network and receiving corrected text data over the at least one computer network from the plurality of users. In addition, the method can include modifying the voice model based on the corrected text data.
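
The correction loop can be sketched in Python. This is an illustration under stated assumptions, not the patented method: `consensus_correction`, `word_substitutions` and `update_voice_model` are hypothetical names, the input from the plurality of users is reduced to a majority vote per segment, and the "voice model" is stood in for by a simple word-substitution lexicon rather than an acoustic or language model that would actually be retrained.

```python
from collections import Counter
from difflib import SequenceMatcher


def consensus_correction(corrections: list[str]) -> str:
    """Majority vote over the corrected texts users submitted for one segment.

    Ties go to the correction seen first (Counter.most_common keeps that order).
    """
    counts = Counter(c.strip() for c in corrections)
    return counts.most_common(1)[0][0]


def word_substitutions(machine: str, corrected: str) -> Counter:
    """Count (recognized_word, corrected_word) pairs where users replaced ASR output."""
    a, b = machine.split(), corrected.split()
    subs: Counter = Counter()
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=a, b=b).get_opcodes():
        # Only equal-length replacements are treated as word-for-word misrecognitions.
        if tag == "replace" and i2 - i1 == j2 - j1:
            for wrong, right in zip(a[i1:i2], b[j1:j2]):
                subs[(wrong, right)] += 1
    return subs


def update_voice_model(lexicon: dict[str, str], subs: Counter, min_count: int = 2) -> dict[str, str]:
    """Hypothetical stand-in for voice-model adaptation: a substitution lexicon.

    A substitution is adopted only once at least `min_count` corrections agree on it.
    """
    updated = dict(lexicon)
    for (wrong, right), n in subs.items():
        if n >= min_count:
            updated[wrong] = right
    return updated


# Usage: two transcribed segments, each corrected by one or more users.
segments = {
    "please call doctor smyth at noon": [
        "please call doctor smith at noon",
        "please call doctor smith at noon",
        "please call doctor smyth at noon",
    ],
    "doctor smyth is running late": ["doctor smith is running late"],
}
total: Counter = Counter()
for machine_text, user_texts in segments.items():
    total += word_substitutions(machine_text, consensus_correction(user_texts))
model = update_voice_model({}, total)
print(model)  # {'smyth': 'smith'}
```

The `min_count` threshold is one simple guard against a single user's mistaken "correction" changing the model; a production system would likely weigh users by reliability instead.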

Method and apparatus for generating context information

A memory stores a document and a plurality of word vectors, which are word embeddings computed for a plurality of words. For one of the words, a processor extracts from the document two or more surrounding words that lie within a prescribed range of one occurrence position of that word, and computes a sum vector by adding the word vectors of those surrounding words. Using a machine learning model, the processor determines a parameter such that the surrounding words can be predicted from the sum vector and the parameter. The processor stores the parameter as context information for that occurrence position, in association with the word vector of the word.
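
The computation can be sketched in Python. This is a minimal illustration under assumptions the abstract does not fix: the "parameter" is taken to be a weight matrix `W` such that `softmax(W · s)` approximates the distribution of the surrounding words, it is fitted by plain gradient descent on cross-entropy, and the two-dimensional embeddings, window size and function names are invented for the example.

```python
import math


def surrounding_words(tokens: list[str], pos: int, window: int) -> list[str]:
    """Words within `window` positions of tokens[pos], excluding the word itself."""
    lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
    return [tokens[i] for i in range(lo, hi) if i != pos]


def sum_vector(words: list[str], vectors: dict[str, list[float]]) -> list[float]:
    """Element-wise sum of the word vectors of the surrounding words."""
    total = [0.0] * len(vectors[words[0]])
    for w in words:
        for k, x in enumerate(vectors[w]):
            total[k] += x
    return total


def softmax(z: list[float]) -> list[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def fit_context_parameter(s: list[float], targets: list[str], vocab: list[str],
                          lr: float = 0.1, steps: int = 200) -> list[list[float]]:
    """Fit W (|vocab| x dim) so that softmax(W s) predicts the surrounding words.

    Gradient descent on cross-entropy against the empirical distribution y of
    the surrounding words; the gradient for row i is (p_i - y_i) * s.
    """
    index = {w: i for i, w in enumerate(vocab)}
    y = [0.0] * len(vocab)
    for t in targets:
        y[index[t]] += 1.0 / len(targets)
    W = [[0.0] * len(s) for _ in vocab]
    for _ in range(steps):
        p = softmax([sum(w * x for w, x in zip(row, s)) for row in W])
        for i, row in enumerate(W):
            g = p[i] - y[i]
            for k in range(len(s)):
                row[k] -= lr * g * s[k]
    return W


tokens = "the cat sat on the mat".split()
vectors = {  # hypothetical 2-d embeddings
    "the": [0.1, 0.3], "cat": [0.5, 0.1], "sat": [0.2, 0.2],
    "on": [0.0, 0.4], "mat": [0.4, 0.0],
}
vocab = sorted(vectors)
context = surrounding_words(tokens, pos=2, window=2)  # ['the', 'cat', 'on', 'the']
W = fit_context_parameter(sum_vector(context, vectors), context, vocab)
# Context information for occurrence position 2 of "sat", kept with its word vector.
context_info = {("sat", 2): {"vector": vectors["sat"], "parameter": W}}
```

Because each occurrence gets its own `W`, two occurrences of the same word in different contexts end up with different stored parameters, which is what makes the result usable as per-occurrence context information.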

DATA ANALYSIS SYSTEM AND DATA ANALYSIS METHOD
20230059693 · 2023-02-23

A data analysis method is provided that optimizes the content of a medical record and inputs the optimized medical record report into an application model, so that the application model can link the report with a diagnosis code and output an accurate recommended diagnosis code. Because the application model assists the search for diagnosis codes, the overall quality of medical care is improved.

METHOD FOR GENERATING TRAINING DATA AND METHOD FOR POST-PROCESSING OF SPEECH RECOGNITION USING THE SAME

Disclosed are a training data construction method and a speech recognition method that uses it. The training data construction method is performed by a computing apparatus including at least one processor and includes converting first text data including a plurality of sentences to first speech data; acquiring second speech data by adding noise to the first speech data; and converting the second speech data to second text data. The second text data includes a sentence corresponding to each of the plurality of sentences included in the first text data.
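
The three conversion steps can be sketched in Python. The text-to-speech and speech-recognition engines are hypothetical callables supplied by the caller, the noise is assumed to be white Gaussian noise at a chosen signal-to-noise ratio, and pairing each recognized sentence with its original as (input, target) for a post-processing model is an assumption about how the data would be used.

```python
import math
import random
from typing import Callable


def add_noise(signal: list[float], snr_db: float, rng: random.Random) -> list[float]:
    """Add white Gaussian noise so the result has roughly the requested SNR (dB)."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def build_training_pairs(
    sentences: list[str],
    tts: Callable[[str], list[float]],  # hypothetical text-to-speech engine
    asr: Callable[[list[float]], str],  # hypothetical speech recognizer
    snr_db: float = 10.0,
    seed: int = 0,
) -> list[tuple[str, str]]:
    rng = random.Random(seed)
    pairs = []
    for sentence in sentences:
        first_speech = tts(sentence)                          # first text -> first speech data
        second_speech = add_noise(first_speech, snr_db, rng)  # second (noisy) speech data
        second_text = asr(second_speech)                      # second text data
        # (recognized, reference) pair for training a post-processing corrector.
        pairs.append((second_text, sentence))
    return pairs


# Usage with stub engines standing in for real TTS and ASR systems.
def fake_tts(text: str) -> list[float]:
    return [math.sin(0.05 * i) for i in range(200 * len(text))]


def fake_asr(audio: list[float]) -> str:
    return "turn on the lifts"


pairs = build_training_pairs(["turn on the lights"], fake_tts, fake_asr)
print(pairs)  # [('turn on the lifts', 'turn on the lights')]
```

Lowering `snr_db` makes the recognizer's errors more frequent, so the same sentence list can yield training pairs at several difficulty levels.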

System and method for unsupervised text normalization using distributed representation of words

A system, method and computer-readable storage devices for providing unsupervised normalization of noisy text using distributed representations of words. The system receives, from a social media forum, a word having a non-canonical spelling in a first language. The system determines a context of the word in the social media forum, identifies the word in a vector space model, and selects the “n-best” vector paths in the vector space model, where the n-best vector paths are neighbors of the word's vector space path based on the context and the non-canonical spelling. The system can then select, based on a similarity cost, a best path from the n-best vector paths and identify the word associated with the best path as the canonical version.
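
One way to read the selection step is sketched below in Python. The similarity cost here is an assumption, not the patent's: candidates are restricted to a hypothetical canonical `lexicon`, the n-best list is taken by cosine similarity to the noisy word's embedding (which, if the embeddings were trained on the forum's own posts, reflects its context), and the final cost mixes cosine distance with a length-normalized Levenshtein distance to account for the non-canonical spelling. The toy vectors are invented for illustration.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def normalize(word: str, vectors: dict[str, list[float]], lexicon: set[str],
              n: int = 3, alpha: float = 0.5) -> str:
    """Canonical form of `word`: take n-best neighbours, then the lowest-cost one."""
    v = vectors[word]
    candidates = [w for w in vectors if w != word and w in lexicon]
    n_best = sorted(candidates, key=lambda w: cosine(v, vectors[w]), reverse=True)[:n]

    def cost(w: str) -> float:
        spelling = edit_distance(word, w) / max(len(word), len(w))
        return alpha * (1 - cosine(v, vectors[w])) + (1 - alpha) * spelling

    return min(n_best, key=cost) if n_best else word


vectors = {  # hypothetical embeddings
    "tmrw": [0.9, 0.1, 0.0],
    "tomorrow": [0.85, 0.15, 0.05],
    "today": [0.8, 0.3, 0.0],
    "pizza": [0.0, 0.1, 0.9],
}
print(normalize("tmrw", vectors, lexicon={"tomorrow", "today", "pizza"}))  # tomorrow
```

Mixing both terms matters here: "today" is almost as close as "tomorrow" in the vector space, and it is the spelling term that separates them.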

Mathematical and scientific expression editor for computer systems
11501055 · 2022-11-15

A method of creating a mathematical or scientific expression on a computer system in which a user interface is provided on a computer display device. Input data comprising a string of alphanumeric characters is received from a keyboard. The input data is matched with one or more verbalisations of a mathematical or scientific term. Each matching term is displayed on the display device. When a user selects one of the displayed matching terms, a corresponding graphical symbol is displayed on the display device. The method allows the user to type a verbalised version of the desired mathematical or scientific expression and have the corresponding mathematical or scientific notation displayed on the display device.
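
The matching step can be illustrated with a short Python sketch. The verbalisation table, the prefix-matching rule and the use of Unicode characters in place of rendered graphical symbols are all assumptions made for illustration.

```python
# Hypothetical verbalisation table: typed/spoken form -> symbol to display.
VERBALISATIONS = {
    "alpha": "α",
    "beta": "β",
    "infinity": "∞",
    "integral": "∫",
    "square root": "√",
    "sum": "∑",
    "less than or equal to": "≤",
}


def match_terms(typed: str) -> list[str]:
    """Terms whose verbalisation starts with what the user has typed so far."""
    key = " ".join(typed.lower().split())  # normalise case and spacing
    if not key:
        return []
    return sorted(term for term in VERBALISATIONS if term.startswith(key))


def select_term(term: str) -> str:
    """Symbol to display once the user picks one of the matching terms."""
    return VERBALISATIONS[term]


print(match_terms("in"))          # ['infinity', 'integral']
print(select_term("integral"))    # ∫
print(match_terms("Less  than"))  # ['less than or equal to']
```

Prefix matching lets the candidate list narrow as each character is typed, which suits a keyboard-driven interface where the user picks from the displayed terms.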

NATURAL LANGUAGE PROCESSING FOR CATEGORIZING SEQUENCES OF TEXT DATA

Disclosed herein are system, method, and computer program product embodiments for categorizing sequences of text extracted from documents using natural language processing. In some embodiments, a categorization system may receive a first document file in a machine-readable format. The categorization system may analyze a sequence of text from the first document file and identify a numeric text string in the sequence. The categorization system may also identify text data in the sequence that matches text data from a second document file. The categorization system may remove the numeric text string and the matching text data from the sequence to generate a trimmed version of the sequence. The categorization system may then apply a vectorization model to the trimmed sequence and apply a trained deep learning model to the resulting vector to identify a corresponding category for the sequence of text.
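
A rough Python sketch of the pipeline follows. Several pieces are assumptions: the numeric-string pattern, token-level matching against the second document, a bag-of-words `Counter` as the vectorization model, and, in place of the trained deep learning model, a nearest-centroid classifier using cosine similarity. That last substitution keeps the example dependency-free; it is not the classifier the abstract describes.

```python
import math
import re
from collections import Counter

# Hypothetical rule for "numeric text strings": amounts, dates, reference numbers.
NUMERIC = re.compile(r"\$?\d[\d.,/:%-]*")


def trim_sequence(sequence: str, second_document: str) -> list[str]:
    """Drop numeric strings and tokens that also occur in the second document."""
    reference = set(second_document.lower().split())
    return [t for t in sequence.lower().split()
            if not NUMERIC.fullmatch(t) and t not in reference]


def vectorize(tokens: list[str]) -> Counter:
    """Bag-of-words stand-in for the vectorization model."""
    return Counter(tokens)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())  # missing keys count as 0
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(examples: list[tuple[list[str], str]]) -> dict[str, Counter]:
    """Nearest-centroid stand-in for the trained deep learning model."""
    centroids: dict[str, Counter] = {}
    for tokens, label in examples:
        centroids.setdefault(label, Counter()).update(tokens)
    return centroids


def categorize(sequence: str, second_document: str, centroids: dict[str, Counter]) -> str:
    vec = vectorize(trim_sequence(sequence, second_document))
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


centroids = train_centroids([
    ("office chairs desk".split(), "furniture"),
    ("flight hotel taxi".split(), "travel"),
])
line = "Acme Corp 03/14/2023 hotel taxi $412.50"
print(trim_sequence(line, "Acme Corp payment terms net 30"))         # ['hotel', 'taxi']
print(categorize(line, "Acme Corp payment terms net 30", centroids))  # travel
```

Trimming first means vendor names and amounts that vary from document to document do not pull the sequence toward the wrong category.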

Systems and methods for managing voice queries using pronunciation information

The system receives a voice query at an audio interface and converts the voice query to text. During conversion, the system can determine pronunciation information and generate metadata that indicates a pronunciation of one or more words of the query, include phonetic information in the text query, or both. A query includes one or more entities, which may be identified more accurately based on pronunciation. The system searches for information, content, or both in one or more databases based on the generated text query, pronunciation information, user profile information, search histories or trends, and optionally other information. The system identifies one or more entities or content items that match the text query and retrieves the identified information to provide to the user.
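
A small Python sketch of the pronunciation idea follows. It swaps in American Soundex as the phonetic encoding, which the abstract does not name; the `annotate_query` and `match_entities` helpers, the dictionary layout of the metadata and the entity list are hypothetical. It shows how an ASR spelling such as "jon lenin" can still reach the entity "John Lennon" when matching runs on phonetic keys instead of spellings.

```python
# American Soundex digit for each consonant; vowels, y, h and w get no digit.
_CODES = {
    ch: digit
    for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                           "4": "l", "5": "mn", "6": "r"}.items()
    for ch in letters
}


def soundex(word: str) -> str:
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0].upper()]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            out.append(code)
        if c not in "hw":  # h and w do not separate letters with the same code
            prev = code
    return ("".join(out) + "000")[:4]


def annotate_query(text: str) -> dict:
    """Text query plus pronunciation metadata (one phonetic key per word)."""
    return {"text": text, "phonetic": [soundex(w) for w in text.split()]}


def match_entities(query: dict, entities: list[str]) -> list[str]:
    """Entities whose every word sounds like some word in the query."""
    keys = set(query["phonetic"])
    return [e for e in entities if all(soundex(w) in keys for w in e.split())]


query = annotate_query("play songs by jon lenin")  # ASR output with misspelt names
print(query["phonetic"])  # ['P400', 'S520', 'B000', 'J500', 'L550']
print(match_entities(query, ["John Lennon", "Joan Baez", "Lenny Kravitz"]))  # ['John Lennon']
```

Soundex is coarse and English-centric ("Joan" and "John" share a key), so a real system would more likely rank candidates using the other signals the abstract lists, such as user profile and search history.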