Patent classifications
G06F40/232
METHOD FOR AUTOMATICALLY IDENTIFYING WORD REPETITION ERRORS
A method for automatically identifying word repetition errors includes the following steps: after performing word segmentation on a large-scale training corpus, compiling statistics to obtain the two-tuple and three-tuple structures that include repeated words in the training corpus, together with their repeated combination degrees, left contextual adjacent word information entropy and right contextual adjacent word information entropy in the training corpus; identifying and recording words containing repeated characters in a Chinese dictionary and establishing a repeated word library of the Chinese dictionary; judging the repeated words appearing in the text to be checked for errors based on the repeated words in the Chinese dictionary; and judging the repeated words appearing in the text to be checked for errors based on the repeated combination degrees, left contextual adjacent word information entropy and right contextual adjacent word information entropy obtained from the statistics.
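The left and right contextual adjacent word information entropy mentioned in this abstract can be illustrated with a minimal sketch. It assumes the corpus is already segmented into token lists and uses Shannon entropy in bits; the function names, the `<s>` and `</s>` boundary markers, and the toy tokens are illustrative assumptions, not details from the patent.

```python
import math
from collections import Counter, defaultdict


def neighbor_entropy(counts):
    """Shannon entropy (in bits) of a neighbor-word frequency table."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def repeated_word_entropies(sentences):
    """For every immediately repeated token (w w), collect the words seen
    to its left and right and return {w: (left_entropy, right_entropy)}.

    `sentences` is a list of token lists (assumed already segmented).
    "<s>" and "</s>" are hypothetical sentence-boundary markers.
    """
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    for tokens in sentences:
        for i in range(len(tokens) - 1):
            if tokens[i] == tokens[i + 1]:
                word = tokens[i]
                left[word][tokens[i - 1] if i > 0 else "<s>"] += 1
                right[word][tokens[i + 2] if i + 2 < len(tokens) else "</s>"] += 1
    return {w: (neighbor_entropy(left[w]), neighbor_entropy(right[w])) for w in left}


corpus = [
    ["a", "look", "look", "b"],
    ["c", "look", "look", "d"],
    ["x", "the", "the", "y"],
]
stats = repeated_word_entropies(corpus)
```

In this toy corpus the repetition "look look" appears in two different contexts and so has higher entropy on both sides than "the the", which appears in only one. How such scores are thresholded or combined with the repeated combination degree is not specified in the abstract and is left out here.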
Multi-Modal Learning Based Intelligent Enhancement of Post Optical Character Recognition Error Correction
A mechanism is provided for implementing an optical character recognition (OCR) error correction mechanism for correcting OCR errors. Responsive to receiving a document in which OCR has been performed, the mechanism assesses the document to identify a set of OCR errors generated by an OCR engine that performed the OCR using a set of visual embeddings. Responsive to identifying the set of OCR errors, the mechanism analyzes each character of a plurality of sentences within the document to generate a high-dimensional embedding for the characters of the plurality of sentences within the document. The mechanism then linguistically corrects each OCR error in the set of OCR errors. The mechanism utilizes ground truth information and the set of visual embeddings to verify that the character stream is linguistically correct. Responsive to verifying that the character stream is linguistically correct, the mechanism outputs an OCR error corrected document to a user.
Cursor adjustments
Example implementations relate to cursor adjustments. In some examples, a computing device may include a cursor positioning device. The computing device may include a processor to determine a first input associated with a first cursor path received from the cursor positioning device. The computing device may include a processor to determine a modified output of the first cursor path that is different from the first input. The computing device may include a processor to determine a second cursor path based on the modified output. The computing device may include a processor to determine an adjustment based on a difference between the first cursor path and the second cursor path. The computing device may include a processor to apply the adjustment to a third cursor path.
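The adjustment step described in this abstract (a difference between a first and a second cursor path, applied to a third path) can be sketched as follows. This is only one possible reading: paths are assumed to be equal-length lists of `(x, y)` points, the adjustment is assumed to be the mean per-point offset, and all function names are hypothetical.

```python
def path_adjustment(first_path, second_path):
    """Mean (dx, dy) offset from the first path to the second.

    Both paths are assumed to be equal-length lists of (x, y) points.
    """
    if len(first_path) != len(second_path) or not first_path:
        raise ValueError("paths must be non-empty and of equal length")
    n = len(first_path)
    dx = sum(b[0] - a[0] for a, b in zip(first_path, second_path)) / n
    dy = sum(b[1] - a[1] for a, b in zip(first_path, second_path)) / n
    return dx, dy


def apply_adjustment(path, adjustment):
    """Shift every point of a path by the (dx, dy) adjustment."""
    dx, dy = adjustment
    return [(x + dx, y + dy) for x, y in path]


first = [(0, 0), (1, 1)]    # path as received from the positioning device
second = [(1, 0), (2, 1)]   # path derived from the modified output
adjustment = path_adjustment(first, second)
third = apply_adjustment([(5, 5)], adjustment)
```

The abstract does not say how the modified output is produced or how the difference is measured, so a real implementation could use a very different adjustment model.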
Context and domain sensitive spelling correction in a database
A method of operating a health tracking system is disclosed. The method comprises: receiving a first data record comprising at least a first descriptive string regarding a consumable item, the first descriptive string having at least one word thereof incorrectly spelled; generating a vector using the first descriptive string using a machine learning model; identifying a second descriptive string which corresponds to the consumable item and which has a correct spelling of the at least one incorrectly spelled word by applying the machine learning model to the generated vector; calculating a confidence factor regarding the identified second descriptive string using the machine learning model; and when it is determined that the confidence factor exceeds a predetermined threshold, (i) modifying the first data record by replacing the first descriptive string with the second descriptive string, and (ii) storing the modified first data record in the database.
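The threshold-gated replacement in this abstract can be sketched in a few lines. The patent uses a machine learning model to produce both the candidate string and the confidence factor; the sketch below swaps in the standard library's `difflib.SequenceMatcher` similarity ratio as a stand-in, so it shows only the control flow. The record layout, the `threshold` default, and the function name are assumptions.

```python
from difflib import SequenceMatcher


def correct_record(record, known_descriptions, threshold=0.8):
    """Replace record["description"] with the closest known description
    when the similarity score exceeds `threshold`.

    Returns (possibly modified record, score). The score is a plain
    string-similarity ratio, a stand-in for the model's confidence factor.
    """
    if not known_descriptions:
        return record, 0.0
    original = record["description"]
    best, score = max(
        (
            (cand, SequenceMatcher(None, original.lower(), cand.lower()).ratio())
            for cand in known_descriptions
        ),
        key=lambda pair: pair[1],
    )
    if score > threshold:
        # Build a modified copy; storing it in a database is left to the caller.
        return {**record, "description": best}, score
    return record, score


known = ["chocolate milk", "orange juice"]
fixed, fixed_score = correct_record({"id": 1, "description": "choclate milk"}, known)
kept, kept_score = correct_record({"id": 2, "description": "xyz"}, known)
```

Here the misspelled "choclate milk" is close enough to a known description to be replaced, while "xyz" falls below the threshold and the record is left unchanged.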
Automatic lexical sememe prediction system using lexical dictionaries
Method and apparatus for automatically predicting lexical sememes using a lexical dictionary, comprising inputting a word, retrieving the word's semantic definition and sememes corresponding to the word from an online dictionary, setting each of the retrieved sememes as a candidate sememe, inputting the word's semantic definition and candidate sememe, and estimating the probability that the candidate sememe can be inferred from the word's semantic definition.
METHOD OF CONVERTING BETWEEN AN N-TUPLE AND A DOCUMENT USING A READABLE TEXT AND A TEXT GRAMMAR
Embodiments are directed at processing language content by a method of bi-directional conversion between documents and language content with additional information, using a readable text and a text grammar. A method combines additional information with the language content using punctuation idioms. The combined language content and additional information remains readable by one ordinarily skilled in the art of reading and also remains allowable according to a text grammar; that is, embodiments are rigorous and may be declarative. The document is compliant with a format drawn from a set which comprises SGML, XML, TEI, HTML, DOC, DOCX, ODX, PDF and XPS. The document is publishable in a medium drawn from a set which comprises a book, a magazine, a journal, a newspaper, an article and a web page. A computer-readable memory device and a computing device are also claimed.