G06V30/268

UTILIZING MACHINE LEARNING MODELS, POSITION BASED EXTRACTION, AND AUTOMATED DATA LABELING TO PROCESS IMAGE-BASED DOCUMENTS

A device may receive image data that includes an image of a document and lexicon data identifying a lexicon, and may perform an extraction technique on the image data to identify at least one field in the document. The device may utilize form segmentation to automatically generate label data identifying labels for the image data, and may process the image data, the label data, and data identifying the at least one field, with a first model, to identify visual features. The device may process the image data and the visual features, with a second model, to identify sequences of characters, and may process the image data and the sequences of characters, with a third model, to identify strings of characters. The device may compare the lexicon data and the strings of characters to generate verified strings of characters that may be utilized to generate a digitized document.
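The final lexicon comparison step can be illustrated with a minimal sketch. It assumes the third model's output is a plain list of strings and uses fuzzy matching from Python's standard `difflib` as a stand-in for whatever comparison the device performs; the function name `verify_strings`, the sample lexicon, and the `cutoff` value are hypothetical and do not come from the abstract.

```python
from difflib import get_close_matches

def verify_strings(strings, lexicon, cutoff=0.75):
    """Map each recognized string to its closest lexicon entry, or None.

    Assumption: fuzzy string similarity stands in for the comparison
    between the lexicon data and the recognized strings.
    """
    verified = []
    for s in strings:
        matches = get_close_matches(s, lexicon, n=1, cutoff=cutoff)
        verified.append(matches[0] if matches else None)
    return verified

# Hypothetical lexicon and OCR-style output with digit/letter confusions.
lexicon = ["invoice", "total", "amount", "date"]
recognized = ["invo1ce", "tota1", "xyzzy"]
print(verify_strings(recognized, lexicon))
```

Strings with no sufficiently close lexicon entry come back as `None`, so a caller could route them to manual review instead of writing them into the digitized document.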

Cataloging database metadata using a probabilistic signature matching process

A system and computer-implemented method for cataloging database metadata using a probabilistic signature matching process are provided. The method includes receiving an input name to be matched to keys in a data corpus; dividing the received input name into a plurality of text segments; identifying a set of matching keys by matching each of the plurality of text segments against keys in the data corpus; analyzing the set of matching keys to construct a tag; and cataloging the metadata with the matching key as the constructed tag.
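The segment-and-match steps can be sketched roughly as follows. The corpus layout (a key mapped to a set of aliases), the splitting rules, and the coverage score are all assumptions made for illustration; in particular, the coverage ratio is only a crude stand-in for the probabilistic scoring, which the abstract does not specify.

```python
import re

# Hypothetical data corpus: canonical key -> known abbreviations.
CORPUS = {
    "customer": {"cust", "cstmr"},
    "address": {"addr"},
    "postal_code": {"zip", "postcode"},
}

def split_name(name):
    """Split an input name into lowercase segments on _, -, spaces, and camelCase."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return [seg.lower() for seg in re.split(r"[\s_\-]+", spaced) if seg]

def match_keys(segments, corpus):
    """Return the corpus keys matched by each segment, in segment order."""
    matched = []
    for seg in segments:
        for key, aliases in corpus.items():
            if seg == key or seg in aliases:
                matched.append(key)
                break
    return matched

def build_tag(name, corpus):
    """Construct a dotted tag plus the fraction of segments that matched."""
    segments = split_name(name)
    keys = match_keys(segments, corpus)
    coverage = len(keys) / len(segments) if segments else 0.0
    return ".".join(keys), coverage

print(build_tag("custAddr_zip", CORPUS))
```

A low coverage value signals that part of the input name matched nothing in the corpus, which a cataloging step might treat as a reason to withhold the tag.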

COMPUTATIONALLY REACTING TO A MULTIPARTY CONVERSATION

Technology is provided for causing a computing system to extract conversation features from a multiparty conversation (e.g., between a coach and mentee), apply the conversation features to a machine learning system to generate conversation analysis indicators, and apply a mapping of conversation analysis indicators to actions and inferences to determine actions to take or inferences to make for the multiparty conversation. In various implementations, the actions and inferences can include determining scores for the multiparty conversation such as a score for progress toward a coaching goal, instant scores for various points throughout the conversation, conversation impact score, ownership scores, etc. These scores can be, e.g., surfaced in various user interfaces along with context and benchmark indicators, used to select resources for the coach or mentee, used to update coach/mentee matchings, used to provide real-time alerts to signify how the conversation is going, etc.

SYSTEM AND METHOD TO EXTRACT SOFTWARE DEVELOPMENT REQUIREMENTS FROM NATURAL LANGUAGE
20210200515 · 2021-07-01

The disclosure relates to a system and method for extracting software development requirements from natural language information. In one example, the method may include receiving structured text data related to software development and derived from natural language information, extracting a plurality of features for each sentence in the structured text data, and determining a set of requirement classes and a set of confidence scores for each sentence, based on the plurality of features, using a set of classification models. The method may further include deriving a final requirement class and a final confidence score for each sentence based on the set of requirement classes and the set of confidence scores produced for that sentence by the set of classification models, and providing the software development requirements based on the final requirement class and the final confidence score for each sentence.
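One way to picture the step that derives a final class and score from several models is confidence-weighted voting. The sketch below is an assumption: the abstract does not say how the per-model outputs are combined, and the function name `combine_predictions` and the class labels are hypothetical.

```python
from collections import defaultdict

def combine_predictions(predictions):
    """Combine per-model (requirement_class, confidence) pairs for one sentence.

    Assumption: the final class is the one with the largest summed
    confidence, and the final score is that sum averaged over all models.
    """
    if not predictions:
        raise ValueError("at least one model prediction is required")
    totals = defaultdict(float)
    for requirement_class, confidence in predictions:
        totals[requirement_class] += confidence
    final_class = max(totals, key=totals.get)
    final_confidence = totals[final_class] / len(predictions)
    return final_class, final_confidence

# Three hypothetical classifiers scoring the same sentence.
per_model = [("functional", 0.9), ("functional", 0.7), ("non-functional", 0.8)]
print(combine_predictions(per_model))
```

Averaging over all models, rather than only those that agreed, lowers the final confidence when the models disagree.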

Computer system, method and program for performing multilingual named entity recognition model transfer
11030407 · 2021-06-08

A multilingual named-entity recognition system according to an embodiment includes an acquisition unit configured to acquire an annotated sample of a source language and a sample of a target language, a first generation unit configured to generate an annotated named-entity recognition model of the source language by applying Conditional Random Field sequence labeling to the annotated sample of the source language and obtaining an optimum weight for each annotated named entity of the source language, a calculation unit configured to calculate similarity between the annotated sample of the source language and the sample of the target language, and a second generation unit configured to generate a named-entity recognition model of the target language based on the annotated named-entity recognition model of the source language and the similarity.

TEXT CLASSIFICATION FOR INPUT METHOD EDITOR
20210150289 · 2021-05-20

Techniques are disclosed for an improved user interface, such as an input method editor (IME) that selectively collects text input based on text classification and user privacy preference. An example methodology implementing the techniques includes receiving, by the IME, at least one text input made by a user and, responsive to a determination that the at least one text input is a privacy word, causing the at least one text input to not be collected for learning usage habits of the user. The example method may also include, responsive to a determination that the at least one text input is not a privacy word, causing the at least one text input to be collected for learning usage habits of the user.
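The collect-or-skip decision can be sketched as a simple filter. The classifier here is a deliberately naive stand-in (a term list plus a long-digit-run pattern), and the names `PRIVACY_TERMS`, `is_privacy_word`, and `collect_inputs` are hypothetical; the abstract does not describe how the text classification is actually performed.

```python
import re

# Hypothetical privacy indicators; a real IME would use a trained classifier.
PRIVACY_TERMS = {"password", "ssn"}
LONG_DIGIT_RUN = re.compile(r"\d{6,}")

def is_privacy_word(text):
    """Naive stand-in classifier: listed terms or long digit runs are private."""
    return text.lower() in PRIVACY_TERMS or bool(LONG_DIGIT_RUN.search(text))

def collect_inputs(inputs, privacy_filter_on=True):
    """Return only the inputs that may be used for learning usage habits."""
    collected = []
    for text in inputs:
        if privacy_filter_on and is_privacy_word(text):
            continue  # privacy word: do not collect
        collected.append(text)
    return collected

print(collect_inputs(["hello", "4111111111111111", "Password", "world"]))
```

The `privacy_filter_on` flag models the user privacy preference: with it switched off, every input is collected.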

KNOWLEDGE POINT MARK GENERATION SYSTEM AND METHOD THEREOF
20210134298 · 2021-05-06

A knowledge point mark generation system and a method thereof are provided. The system and the method thereof can obtain a label vocabulary by performing an analysis procedure on at least one second candidate vocabulary repeated by sound during class, at least one first candidate vocabulary repeated by text during class, at least one second keyword highlighted by sound during class, and at least one first keyword highlighted by text during class according to their weights, and then set knowledge point marks on a timeline of a video file taken during class according to time periods when the label vocabulary appears. Thus, a learner can know the knowledge points of the class and their video clips in the video file without browsing the entire video file, so that it is convenient for the learner to learn or review the key points of the class.
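A rough sketch of the weighted analysis and timeline marking follows. The weights, the threshold, the source names, and the transcript layout are all hypothetical; the abstract states only that the four kinds of vocabulary are analyzed according to their weights.

```python
from collections import Counter

# Hypothetical weights for the four vocabulary sources named in the abstract.
WEIGHTS = {
    "text_repeat": 1.0,    # first candidate vocabulary (repeated by text)
    "sound_repeat": 1.0,   # second candidate vocabulary (repeated by sound)
    "text_keyword": 2.0,   # first keyword (highlighted by text)
    "sound_keyword": 2.0,  # second keyword (highlighted by sound)
}

def pick_label_vocabulary(observations, threshold):
    """Sum weights per word over (word, source) pairs; keep words at or above threshold."""
    scores = Counter()
    for word, source in observations:
        scores[word] += WEIGHTS[source]
    return {word for word, score in scores.items() if score >= threshold}

def mark_timeline(transcript, labels):
    """Return (start, end, word) marks for segments where a label word appears.

    transcript: list of (start_seconds, end_seconds, text) segments.
    """
    marks = []
    for start, end, text in transcript:
        words = set(text.lower().split())
        for word in sorted(labels & words):
            marks.append((start, end, word))
    return marks

observations = [("entropy", "text_keyword"), ("entropy", "sound_repeat"), ("hello", "text_repeat")]
labels = pick_label_vocabulary(observations, threshold=3.0)
print(mark_timeline([(0, 10, "Hello class"), (10, 25, "Entropy measures uncertainty")], labels))
```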

APPARATUS, METHOD, AND STORAGE MEDIUM FOR SUPPORTING DATA ENTRY
20210099586 · 2021-04-01

When a character required to be corrected is specified in a character string of a character recognition result, a plurality of candidate character strings are generated by using a substitution candidate for the specified character and not using a substitution candidate for a character other than the specified character, and one correct character string is finalized from the plurality of generated candidate character strings.
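The candidate generation described above can be sketched in a few lines: only the specified position is substituted, and every other character is kept as recognized. How the one correct string is finalized is not stated in the abstract, so the `is_valid` callback below is a hypothetical placeholder (for example, a format check or a user's choice).

```python
def generate_candidates(recognized, index, substitutions):
    """Build candidate strings by substituting only the character at `index`."""
    return [recognized[:index] + sub + recognized[index + 1:] for sub in substitutions]

def finalize(candidates, is_valid):
    """Return the first candidate accepted by `is_valid`, or None.

    Assumption: a validity check stands in for the finalization step.
    """
    for candidate in candidates:
        if is_valid(candidate):
            return candidate
    return None

# Recognition result "1O0" where the user flags position 1 (letter O).
candidates = generate_candidates("1O0", 1, ["0", "Q", "D"])
print(candidates)
print(finalize(candidates, str.isdigit))
```

Restricting substitution to the flagged position keeps the candidate count equal to the number of substitution candidates, instead of growing combinatorially with the string length.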

Knowledge point mark generation system and method thereof
10978077 · 2021-04-13

A knowledge point mark generation system and a method thereof are provided. The system and the method thereof can obtain a label vocabulary by performing an analysis procedure on at least one second candidate vocabulary repeated by sound during class, at least one first candidate vocabulary repeated by text during class, at least one second keyword highlighted by sound during class, and at least one first keyword highlighted by text during class according to their weights, and then set knowledge point marks on a timeline of a video file taken during class according to time periods when the label vocabulary appears. Thus, a learner can know the knowledge points of the class and their video clips in the video file without browsing the entire video file, so that it is convenient for the learner to learn or review the key points of the class.

Electronic handwriting processor with convolutional neural networks
10949660 · 2021-03-16

An improved machine learning system is provided. For example, a content management server may provide real-time analysis of a user's handwriting to assess the user's knowledge of a language, including using a convolutional neural network method. The convolutional neural network method may be executed to normalize at least some identified strokes in the user's handwritten user input. Normalization may be performed by translating a window comprising a subset of pixels in a digital representation of the handwritten user input amongst a plurality of pixels in the digital representation.
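The window translation mentioned in the last sentence can be pictured as a sliding window over a pixel grid. This sketch shows only the traversal; the image layout (a list of rows), the square window, and the `stride` parameter are assumptions, and the normalization applied to each window is not described in the abstract and is not attempted here.

```python
def sliding_windows(image, size, stride=1):
    """Yield ((row, col), patch) for each size-by-size window in a 2D pixel grid.

    image: list of equal-length rows of pixel values (assumed layout).
    """
    rows = len(image)
    cols = len(image[0]) if image else 0
    for r in range(0, rows - size + 1, stride):
        for c in range(0, cols - size + 1, stride):
            patch = [row[c:c + size] for row in image[r:r + size]]
            yield (r, c), patch

# A tiny 3x3 grid standing in for the digital representation of handwriting.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
for origin, patch in sliding_windows(image, size=2):
    print(origin, patch)
```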