G06V30/26

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM
20250077796 · 2025-03-06

Provided is an information processing apparatus including: a character recognition unit configured to perform a character recognition process on an image of a processing target document; a generation unit configured to generate an instruction message based on a result of the character recognition process, the instruction message being a message that causes a large language model to reply with the document type of the processing target document; a transmission unit configured to transmit the instruction message in order to obtain a reply to the instruction message from the large language model; and a reception unit configured to receive the reply to the instruction message from the large language model.
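
The generation step can be pictured with a minimal sketch, assuming the instruction message is a plain-text prompt that embeds the OCR result together with a fixed list of candidate document types. The candidate list, the `max_chars` truncation, and the helper names `build_instruction_message` and `parse_reply` are hypothetical; the transmission and reception units are left out because the abstract does not name a specific model or API.

```python
from typing import Optional, Sequence

# Hypothetical candidate labels; the abstract does not enumerate document types.
DEFAULT_TYPES = ("invoice", "receipt", "quotation", "purchase order")


def build_instruction_message(
    ocr_text: str,
    candidate_types: Sequence[str] = DEFAULT_TYPES,
    max_chars: int = 2000,
) -> str:
    """Build a prompt asking a language model to reply with the document type."""
    # Truncate long OCR output to stay within a hypothetical prompt budget.
    excerpt = ocr_text[:max_chars]
    options = ", ".join(candidate_types)
    return (
        "The following text was extracted by OCR from a scanned document.\n"
        f"Reply with exactly one document type from this list: {options}.\n"
        "Text:\n"
        f"{excerpt}"
    )


def parse_reply(reply: str, candidate_types: Sequence[str] = DEFAULT_TYPES) -> Optional[str]:
    """Map the model's free-text reply back to a known type, or None if unrecognised."""
    normalized = reply.strip().lower()
    for doc_type in candidate_types:
        if doc_type in normalized:
            return doc_type
    return None
```

Constraining the reply to a closed list of types makes it easy to map the model's free-text answer back to a label the rest of the pipeline understands.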

EXTENDED REALITY SYSTEM INCLUDING AI-ASSISTED IMAGE CAPTURE OPTICAL CHARACTER RECOGNITION FOR USE WITH VISUAL AID CORRECTION FOR LOW VISION

An extended reality (XR) system is described herein. The XR system includes a display system mounted to a headset for displaying a display screen that includes computer-generated images, and a controller that includes one or more processors programmed to execute an algorithm for: operating in an optical character recognition (OCR) mode to display a text selection screen by generating an image distortion zone within the text selection screen associated with a non-viewable boundary, and modifying received video images to adjust a viewable image attribute of a portion of the received video images displayed within the image distortion zone; and operating in a scrolling text mode to display a text display screen by generating machine-readable text via OCR and modifying the generated machine-readable text using a lightweight language model for text correction.

Systems and methods for detection and correction of OCR text

OCR-text correction system and method embodiments are described. The OCR-text correction embodiments comprise or cooperate with a transformer-based sequence-to-sequence language model. The model is pretrained to denoise corrupted text and is fine-tuned using OCR-correction-specific examples. Text obtained at least in part through OCR is applied to the fine-tuned pretrained transformer model to detect at least one error in a subset of the text. Responsive to detecting the at least one error, the fine-tuned pretrained transformer model outputs an updated subset of the text to correct the at least one error.
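
A minimal sketch of the detect-then-update step, assuming the fine-tuned sequence-to-sequence model is available as a callable that maps text to corrected text. Here a toy stand-in (`toy_model`, hypothetical) replaces the transformer, and the "subset of the text" judged erroneous is recovered by a word-level diff with Python's `difflib.SequenceMatcher`. The diff-based error localisation is an illustrative choice, not necessarily the method the abstract describes.

```python
import difflib
from typing import Callable, List, Tuple

# (start token, end token, original subset, updated subset)
Correction = Tuple[int, int, str, str]


def detect_and_correct(ocr_text: str, model: Callable[[str], str]) -> Tuple[str, List[Correction]]:
    """Run a correction model on OCR text and report which word spans it changed."""
    corrected = model(ocr_text)
    original_tokens = ocr_text.split()
    corrected_tokens = corrected.split()
    matcher = difflib.SequenceMatcher(a=original_tokens, b=corrected_tokens, autojunk=False)
    corrections: List[Correction] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Each non-equal opcode marks a subset of the text the model treated as an error.
            corrections.append(
                (i1, i2, " ".join(original_tokens[i1:i2]), " ".join(corrected_tokens[j1:j2]))
            )
    return corrected, corrections


def toy_model(text: str) -> str:
    """Hypothetical stand-in for a fine-tuned seq2seq model: fixes one OCR confusion."""
    return text.replace("0CR", "OCR")
```

For example, `detect_and_correct("Scanned 0CR output is noisy", toy_model)` returns the corrected string together with a single correction covering the token `0CR`.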

COMPUTER-READABLE RECORDING MEDIUM, SEARCH PROCESSING METHOD, AND SEARCH PROCESSING DEVICE
20250363153 · 2025-11-27

A non-transitory computer-readable recording medium has stored therein a program that causes a computer to execute a process. The process includes: extracting, by character recognition processing, a first character string included in image data from one or a plurality of pieces of data including the image data; performing a search on the plurality of pieces of data using a keyword and, in the search on the image data, specifying as a search result a second character string included in the first character string based on its similarity with the keyword; and, when displaying a search result in which one or more pieces of data including the second character string are listed, displaying the image data included in those pieces of data in a state where the second character string in the image data is identifiable.
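
The similarity-based search on OCR text can be sketched as follows, assuming similarity is measured with `difflib.SequenceMatcher.ratio()` over sliding windows whose length is close to the keyword's. The 0.8 threshold, the window widths, and the bracket markers used to make the match identifiable are all hypothetical choices. Marking the span in a string stands in for highlighting the corresponding region of the image.

```python
import difflib
from typing import Optional, Tuple


def find_similar_span(ocr_text: str, keyword: str, threshold: float = 0.8) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the substring of ocr_text most similar to keyword, if similar enough."""
    n = len(keyword)
    best: Optional[Tuple[int, int]] = None
    best_score = 0.0
    # Windows of length n-1..n+1 tolerate one dropped or inserted OCR character.
    for width in range(max(1, n - 1), n + 2):
        for start in range(0, len(ocr_text) - width + 1):
            candidate = ocr_text[start:start + width]
            score = difflib.SequenceMatcher(None, candidate.lower(), keyword.lower()).ratio()
            if score > best_score:
                best, best_score = (start, start + width), score
    return best if best_score >= threshold else None


def mark_span(text: str, span: Tuple[int, int], left: str = "[", right: str = "]") -> str:
    """Make the matched second character string identifiable inside the first one."""
    start, end = span
    return f"{text[:start]}{left}{text[start:end]}{right}{text[end:]}"
```

With the keyword `amount`, the misrecognised string `Total am0unt due: 500` still yields a match on `am0unt`, which an exact search would miss.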

SYSTEMS AND METHODS FOR MACHINE LEARNING KEY-VALUE EXTRACTION ON DOCUMENTS
20250371898 · 2025-12-04

A method to improve, post-extraction, the classification accuracy of key-values after a machine-learning model has been applied to documents, according to one embodiment, comprises receiving a collection of document images, creating an input data set from the collection, applying a classification model to the input data set to generate an initial set of entity predictions, and filtering the initial set of entity predictions to generate a revised set of entity predictions. Filtering the initial set of entity predictions further comprises applying at least a plurality of rules to the initial set of entity predictions. The plurality of rules comprises a first rule corresponding to treating each individual entity as unique, and a second rule corresponding to treating a single document as unique.
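
The abstract does not define the two rules precisely, so the sketch below encodes one plausible reading as an explicit assumption. Under rule 1, each entity label is kept at most once per document, using its highest-confidence prediction. Under rule 2, within a single document a given extracted value is claimed by at most one label. The `Prediction` record, the confidence scores, and the order in which the rules are applied are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List


@dataclass(frozen=True)
class Prediction:
    doc_id: str
    label: str        # entity type, e.g. "invoice_number" (hypothetical)
    value: str        # extracted text
    confidence: float


def keep_best(preds: List[Prediction], key: Callable[[Prediction], Hashable]) -> List[Prediction]:
    """Keep only the highest-confidence prediction for each key."""
    best: Dict[Hashable, Prediction] = {}
    for p in preds:
        k = key(p)
        if k not in best or p.confidence > best[k].confidence:
            best[k] = p
    return list(best.values())


def filter_predictions(preds: List[Prediction]) -> List[Prediction]:
    # Rule 1 (assumed reading): each entity label occurs at most once per document.
    stage1 = keep_best(preds, key=lambda p: (p.doc_id, p.label))
    # Rule 2 (assumed reading): within one document, each value belongs to at most one label.
    return keep_best(stage1, key=lambda p: (p.doc_id, p.value))
```

Both rules reduce to "group, then keep the most confident member", which is why a single `keep_best` helper with different grouping keys covers them.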

Image recognition
12475700 · 2025-11-18

In an image recognition method, a unit duration is set according to the actual number of object boxes. External voice information is obtained or a focus event is monitored within the unit duration. One or more target object boxes are selected according to at least one of the external voice information or the focus event. Deduplication processing is performed on target object images respectively contained in the target object boxes.
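
The three steps can be sketched under loudly stated assumptions. The unit duration is modelled as a linear function of the box count with hypothetical constants. A box counts as selected if its label is mentioned in the voice transcript or if it holds focus. Deduplication uses an exact SHA-256 hash of the cropped image bytes, a deliberately simple swap-in where a real system might use a perceptual similarity measure. Waiting for voice input or focus events within the duration is not modelled.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ObjectBox:
    box_id: int
    label: str
    image: bytes  # cropped pixels of the object inside the box (hypothetical encoding)


def unit_duration(num_boxes: int, base_s: float = 1.0, per_box_s: float = 0.5) -> float:
    """Hypothetical rule: allow more selection time when more boxes are present."""
    return base_s + per_box_s * num_boxes


def select_targets(boxes: List[ObjectBox], voice_text: Optional[str], focused_id: Optional[int]) -> List[ObjectBox]:
    """Select boxes named in the voice input or holding the focus."""
    spoken = voice_text.lower() if voice_text else ""
    return [b for b in boxes if (spoken and b.label.lower() in spoken) or b.box_id == focused_id]


def deduplicate(targets: List[ObjectBox]) -> List[ObjectBox]:
    """Drop target boxes whose image content is byte-identical to one already kept."""
    seen = set()
    unique: List[ObjectBox] = []
    for b in targets:
        digest = hashlib.sha256(b.image).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(b)
    return unique
```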

Information processing apparatus for displaying screen for inputting property information, information processing method, and storage medium
12494075 · 2025-12-09

A user interface screen for inputting property information on a scanned image includes an input field into which information is input automatically based on the results of character recognition processing performed on a character area included in the scanned image. When the results of the character recognition processing are a numerical value, the information input automatically into the input field is that numerical value after it has been changed in accordance with a predetermined interpretation rule for numerical value representation.
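
One way to picture a "predetermined interpretation rule" is a sketch in which the rule is reduced to a choice of decimal and thousands separators. This is a hypothetical simplification. Full-width characters are folded with Unicode NFKC normalisation, currency symbols and other non-numeric characters are dropped, and the result is parsed as a `Decimal`. Unparseable input returns `None` so that the field could be left for manual entry.

```python
import re
import unicodedata
from decimal import Decimal, InvalidOperation
from typing import Optional


def interpret_number(ocr_value: str, decimal_sep: str = ".", thousands_sep: str = ",") -> Optional[Decimal]:
    """Convert an OCR'd numeric string into a canonical number under a configurable rule."""
    # NFKC folds full-width digits and separators (e.g. "１，２３４") into ASCII.
    text = unicodedata.normalize("NFKC", ocr_value).strip()
    # Keep only digits, a minus sign and the two separators; drop currency symbols and spaces.
    text = re.sub(r"[^\d\-" + re.escape(decimal_sep + thousands_sep) + "]", "", text)
    # Remove grouping separators, then normalise the decimal separator to ".".
    text = text.replace(thousands_sep, "").replace(decimal_sep, ".")
    try:
        return Decimal(text)
    except InvalidOperation:
        return None
```

The same OCR string can therefore yield different values depending on the rule. For example, `1.234,56` is 1234.56 under a comma-decimal rule.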