G06V30/268

Image reader performing character correction
10943108 · 2021-03-09

An image reader includes a document reading unit, and a control unit that functions as an individual image cutting section, a character string detection section, a mismatch detection section, a judgment section, and a correction section. The individual image cutting section cuts out individual images from image data obtained through reading by the document reading unit. The character string detection section detects character strings present on the individual images. The mismatch detection section detects, for the character strings detected by the character string detection section, a mismatching portion by comparing the individual images with one another while treating character strings having identical or similar contents as the same information. The judgment section judges, for the mismatching portion, whether a ratio of majority characters reaches a predefined ratio. Upon judging that the ratio of the majority characters has reached the predefined ratio, the correction section replaces a minority character with the majority character.
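
The majority-based correction described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function name `correct_by_majority`, the `min_ratio` default of 0.6, and the assumption that the matched strings are already aligned and of equal length are all hypothetical choices made for illustration.

```python
from collections import Counter


def correct_by_majority(readings, min_ratio=0.6):
    """Correct aligned OCR readings of the same string by majority vote.

    readings  -- equal-length strings assumed to carry the same information
    min_ratio -- hypothetical stand-in for the "predefined ratio"
    """
    if not readings:
        return []
    length = len(readings[0])
    if any(len(r) != length for r in readings):
        raise ValueError("this sketch assumes readings of equal length")

    corrected = [list(r) for r in readings]
    for pos in range(length):
        counts = Counter(r[pos] for r in readings)
        if len(counts) == 1:
            continue  # no mismatch at this position
        majority_char, majority_count = counts.most_common(1)[0]
        # Replace minority characters only when the majority is strong enough.
        if majority_count / len(readings) >= min_ratio:
            for chars in corrected:
                chars[pos] = majority_char
    return ["".join(chars) for chars in corrected]
```

With three readings where one has `1` in place of `I`, the majority share at that position is 2/3, which exceeds 0.6, so the minority character is replaced. With only two disagreeing readings the share is 0.5 and nothing is changed.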

System and method for cooperative text recommendation acceptance in a user interface

Methods for cooperative text recommendation acceptance of completion options in a user interface are performed by systems and devices. A user provides inputs via a user interface (UI) that are stored in an input buffer. As a portion of a first input is received, completion options for some part of the first input are determined based on statistical probabilities and the portion. A completion option is selected and displayed via the UI as completing the first input in a differentiated manner from the user-entered input. The user then either generates an acceptance command for the completion option or continues providing the first input and the UI adapts the remaining completion option portion. Acceptance commands are accepted as space characters or as alphanumeric characters representing additional input that follows the first input and the completion option. Statistical likelihoods are used to account for typographical errors and misspellings in user inputs.

CONTENT EVALUATION BASED ON MACHINE LEARNING AND ENGAGEMENT METRICS
20210073673 · 2021-03-11

Techniques for machine learning analysis are provided. A machine learning (ML) model is trained to identify appropriate documents based on lexical knowledge of target groups. A level of lexical knowledge of a set of users is determined. Additionally, a first document of a plurality of documents is selected by processing the determined level of lexical knowledge using the ML model. The first document is presented to the set of users. A level of engagement of the set of users is then determined. Upon determining that the level of engagement is below a predefined threshold, a second document of the plurality of documents is selected using the ML model.

Translation Method and Apparatus and Electronic Device
20210209428 · 2021-07-08

A translation method includes acquiring an image, where the image includes a text to be translated; splitting the text to be translated in the image and acquiring a plurality of target objects, where each of the plurality of target objects includes a word or a phrase of the text to be translated; receiving an input operation for the plurality of target objects, acquiring an object to be translated among the plurality of target objects, and translating the object to be translated.

Content evaluation based on machine learning and engagement metrics

Techniques for machine learning analysis are provided. A machine learning (ML) model is trained to identify appropriate documents based on lexical knowledge of target groups. A level of lexical knowledge of a set of users is determined. Additionally, a first document of a plurality of documents is selected by processing the determined level of lexical knowledge using the ML model. The first document is presented to the set of users. A level of engagement of the set of users is then determined. Upon determining that the level of engagement is below a predefined threshold, a second document of the plurality of documents is selected using the ML model.

DATA EXTRACTION AND DUPLICATE DETECTION

A system provides an end-to-end solution for invoice processing which includes reading invoices (both pdfs and images), extracting key relevant information from the face of invoices, organizing the relevant information in a structured template as a key-value pair, and comparing invoices based on the similarities between different invoice fields to identify potential duplicate invoices.
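
The field-by-field comparison step can be sketched as follows. This is only an illustration under stated assumptions: the field names (`vendor`, `number`, `amount`), the use of `difflib.SequenceMatcher` as the similarity measure, the unweighted average, and the 0.9 threshold are all hypothetical and do not come from the abstract.

```python
from difflib import SequenceMatcher

# Hypothetical invoice fields; the abstract does not name specific keys.
FIELDS = ("vendor", "number", "amount")


def field_similarity(a, b):
    """Similarity in [0, 1] between two field values, ignoring case."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_potential_duplicate(inv_a, inv_b, threshold=0.9):
    """Flag two key-value invoices whose average field similarity is high."""
    scores = [field_similarity(inv_a[f], inv_b[f]) for f in FIELDS]
    return sum(scores) / len(scores) >= threshold


first = {"vendor": "Acme Corp", "number": "INV-001", "amount": "100.00"}
second = {"vendor": "ACME Corp.", "number": "INV001", "amount": "100.00"}
other = {"vendor": "Globex", "number": "G-77", "amount": "250.00"}
```

Here `first` and `second` differ only in formatting and are flagged as potential duplicates, while `first` and `other` are not.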

OCR error correction

Implementations of the disclosure are directed to OCR error correction systems and methods. In some implementations, a method comprises: obtaining, at a computing device, optical character recognition (OCR) text extracted from a document image, the text comprising a token; searching, at the computing device, based on a token bigram determined from the token and a mapping between words in a corpus and a corpus bigram set comprised of unique bigrams from the beginning or ending of the words in the corpus, the corpus for a best word to replace the token; and replacing, at the computing device, the token with the best word.
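
A rough sketch of a bigram-keyed lookup of this kind is shown below. It is an assumption-laden illustration rather than the claimed method: the helper names `build_bigram_index` and `best_replacement` are hypothetical, and ranking candidates with `difflib.SequenceMatcher` is a swapped-in similarity measure that the abstract does not specify.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def build_bigram_index(corpus):
    """Map each leading and trailing bigram to the corpus words that have it."""
    index = defaultdict(set)
    for word in corpus:
        if len(word) >= 2:
            index[word[:2]].add(word)   # bigram from the beginning of the word
            index[word[-2:]].add(word)  # bigram from the end of the word
    return index


def best_replacement(token, index):
    """Return the most similar candidate word, or the token if none is found."""
    if len(token) < 2:
        return token
    candidates = index.get(token[:2], set()) | index.get(token[-2:], set())
    if not candidates:
        return token
    # Sorting first makes tie-breaking deterministic.
    return max(sorted(candidates),
               key=lambda w: SequenceMatcher(None, token, w).ratio())


index = build_bigram_index(["invoice", "involve", "voice", "total"])
```

For the OCR token `invo1ce`, the leading bigram `in` and trailing bigram `ce` narrow the search to `invoice`, `involve`, and `voice`, and `invoice` scores highest. The bigram keys keep the candidate set small, so the more expensive similarity comparison runs on only a few words.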

IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM STORING PROGRAM

An image processing device including: a first feature quantity selecting unit configured to select a first feature quantity of a document image that is a character recognition target among first feature quantities that are recorded in advance and represent features of character strings of an item; a character recognition processing unit configured to perform a character recognition process for the document image; a character string selecting unit configured to select a character string of a specific item corresponding to the first feature quantity among the character strings acquired as a result of the character recognition process; and a determination result acquiring unit configured to acquire a determination result indicating whether or not a character string that has been input in advance matches the character string of the specific item in a case in which the character string selecting unit has not selected any one of the character strings.

Method And System For Hierarchical Classification Of Documents Using Class Scoring
20200409982 · 2020-12-31

A method and system for hierarchically classifying text documents, using scoring and ranking. In particular, the present invention provides a system and method for classifying text documents, where terms in the document are associated with a class drawn from a taxonomy and used to calculate a score for each class. In one form, terms are captured for each class and adjustments made to compute a score to classify a document into a class. Using the scores, the top classes in a document are computed. Advantageously, the method and system can explain the classification, including why a class was not considered.
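
The term-to-class scoring idea can be sketched in a few lines. Everything specific here is a hypothetical illustration: the toy taxonomy in `TERM_CLASSES`, the rule of one point per matched term, and the roll-up of scores to ancestor classes are assumptions, and the abstract's score adjustments are not modeled.

```python
from collections import defaultdict

# Hypothetical term-to-class mapping; "/" separates taxonomy levels.
TERM_CLASSES = {
    "goalkeeper": "sports/football",
    "penalty": "sports/football",
    "racket": "sports/tennis",
    "election": "politics",
}


def score_classes(text, term_classes, top_n=2):
    """Score taxonomy classes by matched terms and keep the evidence."""
    scores = defaultdict(int)
    evidence = defaultdict(list)
    for word in text.lower().split():
        cls = term_classes.get(word)
        if cls is None:
            continue
        # Credit the class and each of its ancestors in the taxonomy.
        parts = cls.split("/")
        for depth in range(1, len(parts) + 1):
            node = "/".join(parts[:depth])
            scores[node] += 1
            evidence[node].append(word)
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n], dict(evidence)


top, why = score_classes(
    "The goalkeeper saved a penalty before the election", TERM_CLASSES
)
```

The `why` mapping records which terms contributed to each class, which is one simple way to explain a classification. A class missing from it, such as `sports/tennis` here, was not considered because none of its terms occurred.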

SYSTEM AND METHOD FOR COOPERATIVE TEXT RECOMMENDATION ACCEPTANCE IN A USER INTERFACE
20200410051 · 2020-12-31

Methods for cooperative text recommendation acceptance of completion options in a user interface are performed by systems and devices. A user provides inputs via a user interface (UI) that are stored in an input buffer. As a portion of a first input is received, completion options for some part of the first input are determined based on statistical probabilities and the portion. A completion option is selected and displayed via the UI as completing the first input in a differentiated manner from the user-entered input. The user then either generates an acceptance command for the completion option or continues providing the first input and the UI adapts the remaining completion option portion. Acceptance commands are accepted as space characters or as alphanumeric characters representing additional input that follows the first input and the completion option. Statistical likelihoods are used to account for typographical errors and misspellings in user inputs.