Patent classifications
G06V30/153
License Plate Reading System with Enhancements
Systems and methods are disclosed for capturing license plate (LP) information of a vehicle in relative motion to a camera device. In one example, the camera system detects the LP in multiple frames, then aligns and geometrically rectifies the image of the LP by scaling, warping, rotating, and/or performing other functions on the images. The camera system may optimize capturing of the LP information by executing a temporal noise filter on the aligned, geometrically rectified images to generate a composite image of the LP for optical character recognition. In some examples, the camera device may include an image sensor, such as a high dynamic range (HDR) sensor, modified to set long and short exposures of the HDR sensor to capture frames of a vehicle's LP, but without consolidating the images into a composite image. The camera system may set optimal exposure settings based on the detected relative speed of the vehicle.
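The abstract does not say which temporal noise filter is used. As a minimal sketch, assuming the frames are already aligned and rectified to the same size, a per-pixel median across frames is one common choice. The function name `temporal_median` and the nested-list grayscale format are illustrative assumptions, not taken from the patent.

```python
from statistics import median


def temporal_median(frames):
    """Per-pixel median over aligned, same-size grayscale frames.

    Assumption: the median stands in for the unspecified temporal noise
    filter; frames are nested lists of pixel intensities.
    """
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must share the same dimensions")
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


# Three aligned 2x2 crops of the same plate region; the second frame
# carries a transient glare value (255) that the median suppresses.
frames = [
    [[10, 200], [12, 198]],
    [[11, 255], [10, 201]],
    [[9, 202], [11, 199]],
]
composite = temporal_median(frames)
print(composite)
```

A median is robust to single-frame outliers such as glare, whereas a mean would let them bleed into the composite.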
Information processing apparatus and non-transitory computer readable medium storing information processing program
An information processing apparatus includes a processor configured to detect, in response to detection of plural first character strings each representing a first item from a document having writing fields for items, second character strings related to the plural first character strings, respectively, and set at least one of the detected second character strings in setting information as a second item associated with the first item.
UTILIZING MACHINE-LEARNING BASED OBJECT DETECTION TO IMPROVE OPTICAL CHARACTER RECOGNITION
The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately enhancing optical character recognition with a machine learning approach for determining words from reverse text, vertical text, and atypically-sized text. For example, the disclosed systems segment a digital image into text regions and non-text regions utilizing an object detection machine learning model. Within the text regions, the disclosed systems can determine reverse text glyphs, vertical text glyphs, and/or atypically-sized text glyphs utilizing an edge based adaptive binarization model. Additionally, the disclosed systems can utilize respective modification techniques to manipulate reverse text glyphs, vertical text glyphs, and/or atypically-sized glyphs for analysis by an optical character recognition model. The disclosed systems can further utilize an optical character recognition model to determine words from the modified versions of the reverse text glyphs, the vertical text glyphs, and/or the atypically-sized text glyphs.
EXTRACTING KEY INFORMATION FROM DOCUMENT USING TRAINED MACHINE-LEARNING MODELS
Techniques for extracting key information from a document using machine-learning models in a chatbot system are disclosed herein. In one particular aspect, a method is provided that includes receiving a set of data, which includes key fields, within a document at a data processing system that includes a table detection module, a key information extraction module, and a table extraction module. Text information and corresponding location data are extracted via optical character recognition. The table detection module detects whether one or more tables are present in the document and, if applicable, a location of each of the tables. The key information extraction module extracts text from the key fields. The table extraction module extracts each of the tables based on input from the optical character recognition and the table detection module. Extraction results, which include the text from the key fields and each of the tables, can be output.
MACHINE LEARNING MODEL-AGNOSTIC CONFIDENCE CALIBRATION SYSTEM AND METHOD
A method may include extracting, from a document, a first key-value pair including a key and a first value and corresponding to a first confidence score, extracting a second key-value pair including the key and a second value corresponding to a second confidence score, classifying a first match probability for the first key-value pair and a second match probability for the second key-value pair, generating a first calibrated confidence score for the first confidence score and a second calibrated confidence score for the second confidence score by transforming, using precision lookup tables constructed from training records, the first match probability to the first calibrated confidence score and the second match probability to the second calibrated confidence score, selecting, using the first and second calibrated confidence scores, one of the first key-value pair and the second key-value pair, and presenting, in a graphical user interface (GUI), the selected key-value pair.
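The calibration step can be illustrated with a small sketch. It assumes that a precision lookup table is built by binning training records by match probability and taking the fraction of correct extractions in each bin. The bin edges, record layout, and function names below are hypothetical, and the patent may construct its tables differently.

```python
from bisect import bisect_right


def build_precision_table(records, bin_edges):
    """Return the observed precision per match-probability bin.

    records: iterable of (match_probability, was_correct) training pairs.
    bin_edges: sorted interior edges; n edges define n + 1 bins (assumed layout).
    """
    hits = [0] * (len(bin_edges) + 1)
    totals = [0] * (len(bin_edges) + 1)
    for prob, correct in records:
        index = bisect_right(bin_edges, prob)
        totals[index] += 1
        hits[index] += int(correct)
    return [h / t if t else None for h, t in zip(hits, totals)]


def calibrate(prob, table, bin_edges):
    """Map a match probability to the precision of its bin."""
    value = table[bisect_right(bin_edges, prob)]
    # Assumption: fall back to the raw probability when a bin has no data.
    return prob if value is None else value


bin_edges = [0.5, 0.8]
training = [
    (0.90, True), (0.95, True), (0.85, False), (0.85, True),
    (0.60, True), (0.70, False),
    (0.20, False), (0.30, False),
]
table = build_precision_table(training, bin_edges)

# Two candidate values extracted for the same key.
candidates = [
    {"key": "total", "value": "104.50", "match_prob": 0.92},
    {"key": "total", "value": "10450", "match_prob": 0.65},
]
for candidate in candidates:
    candidate["calibrated"] = calibrate(candidate["match_prob"], table, bin_edges)
best = max(candidates, key=lambda c: c["calibrated"])
print(table, best["value"])
```

Because the table is built only from probabilities and outcomes, the same calibration applies regardless of which model produced the scores.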
Text Line Detection
Implementations of the present disclosure provide a solution for text line detection. In this solution, a first text region comprising a first portion of at least a first text element and a second text region comprising a second portion of at least a second text element are determined from an image. A first feature representation is extracted from the first text region and a second feature representation is extracted from the second text region. The first and second feature representations comprise at least one of an image feature representation or a semantic feature representation of the image. A link relationship between the first and second text regions can then be determined based at least in part on the first and second feature representations. The link relationship can indicate whether the first and second portions of the first and second text elements are located in a same text line. In this way, by detecting text regions and determining the link relationship thereof based on their feature representations, the accuracy and efficiency for detecting text lines in various images can be improved.
SCREEN RESPONSE VALIDATION OF ROBOT EXECUTION FOR ROBOTIC PROCESS AUTOMATION
Screen response validation of robot execution for robotic process automation (RPA) is disclosed. Whether text, screen changes, images, and/or other expected visual actions occur in an application executing on a computing system that an RPA robot is interacting with may be recognized. Where the robot has been typing may be determined and the physical position on the screen based on the current resolution of where one or more characters, images, windows, etc. appeared may be provided. The physical position of these elements, or the lack thereof, may allow determination of which field(s) the robot is typing in and what the associated application is for the purpose of validation that the application and computing system are responding as intended. When the expected screen changes do not occur, the robot can stop and throw an exception, go back and attempt the intended interaction again, restart the workflow, or take another suitable action.
Image context processing
Provided is a notification management method of a mobile terminal including generating a screenshot image, determining a category of the screenshot image based on a text or an image included in the screenshot image, and extracting a text or an image related to the category and generating notification information using the extracted text or image. The user equipment and the AI system of this disclosure may be associated with artificial intelligence modules, drones (unmanned aerial vehicles (UAVs)), robots, augmented reality (AR) devices, virtual reality (VR) devices, and devices related to 5G services.
Video processing for embedded information card localization and content extraction
Metadata for one or more highlights of a video stream may be extracted from one or more card images embedded in the video stream. The highlights may be segments of the video stream, such as a broadcast of a sporting event, that are of particular interest. According to one method, video frames of the video stream are stored. One or more information cards embedded in a decoded video frame may be detected by analyzing one or more predetermined video frame regions. Image segmentation, edge detection, and/or closed contour identification may then be performed on identified video frame region(s). Further processing may include obtaining a minimum rectangular perimeter area enclosing all remaining segments, which may then be further processed to determine precise boundaries of information card(s). The card image(s) may be analyzed to obtain metadata, which may be stored in association with at least one of the video frames.
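The "minimum rectangular perimeter area enclosing all remaining segments" can be sketched as the smallest axis-aligned rectangle that contains every segment's bounding box. The `(left, top, right, bottom)` tuple layout and the function name are assumptions made for illustration. The abstract does not specify how segments are represented.

```python
def enclosing_rectangle(boxes):
    """Smallest axis-aligned rectangle containing every input box.

    boxes: non-empty list of (left, top, right, bottom) tuples in pixel
    coordinates (assumed layout, with top < bottom).
    """
    if not boxes:
        raise ValueError("need at least one segment box")
    lefts, tops, rights, bottoms = zip(*boxes)
    return (min(lefts), min(tops), max(rights), max(bottoms))


# Bounding boxes of segments that survived filtering in one frame region.
segments = [(10, 20, 50, 40), (45, 15, 120, 38), (30, 35, 60, 70)]
card_area = enclosing_rectangle(segments)
print(card_area)
```

The result is only a coarse estimate. Per the abstract, further processing would refine it to the precise boundaries of the information card.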
Character recognition of license plate under complex background
A system, method, and computer program product provide a way to separate connected or adhered adjacent characters of a digital image for license plate recognition. As a threshold processing step, the method recognizes character adhesion by obtaining character parameters using an image processor. The parameters include a horizontal max crossing and a ratio of width to height. A first rule-based module is used responsive to the character parameters to distinguish the adhered characters (character adhesions) that are easy to judge, leaving the uncertain part to a character adhesion classifier model for discrimination. Character adhesion data is obtained by data augmentation, including adding a random distance between two single characters to create class-like adhered characters. The character adhesion classifier model is then trained on the single-character data and the character adhesion data. Any uncertain part can be distinguished by the trained character adhesion classifier model.
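The rule-based first pass can be sketched as follows. This is an illustration under stated assumptions, not the patented rules: "horizontal max crossing" is interpreted here as the largest number of background-to-foreground transitions along any row of a binarized glyph, and all thresholds are hypothetical.

```python
def horizontal_max_crossing(glyph):
    """Largest count of 0 -> 1 transitions along any row (assumed definition)."""
    best = 0
    for row in glyph:
        shifted = [0] + row[:-1]  # previous pixel for each position
        crossings = sum(1 for prev, cur in zip(shifted, row) if prev == 0 and cur == 1)
        best = max(best, crossings)
    return best


def aspect_ratio(glyph):
    """Width divided by height of the glyph's pixel grid."""
    return len(glyph[0]) / len(glyph)


def rule_based_adhesion(glyph, max_single_ratio=0.8,
                        min_adhered_ratio=1.2, min_adhered_crossings=4):
    """Return 'single', 'adhered', or 'uncertain' (thresholds are hypothetical)."""
    ratio = aspect_ratio(glyph)
    crossings = horizontal_max_crossing(glyph)
    if ratio >= min_adhered_ratio and crossings >= min_adhered_crossings:
        return "adhered"
    if ratio <= max_single_ratio and crossings < min_adhered_crossings:
        return "single"
    return "uncertain"  # left to the trained classifier


narrow = [[0, 1]] * 4  # tall, thin stroke
wide = [
    [1, 0, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
]
square = [[1, 0, 0, 1]] * 4
print(rule_based_adhesion(narrow), rule_based_adhesion(wide), rule_based_adhesion(square))
```

Only glyphs that clearly satisfy one rule are decided here. Everything else is returned as "uncertain", mirroring the abstract's hand-off to the character adhesion classifier model.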