G06V30/15

EXTRACTION OF TEXTUAL CONTENT FROM VIDEO OF A COMMUNICATION SESSION
20230394861 · 2023-12-07

Methods and systems provide for extraction of textual content from video of a communication session. In one embodiment, the system receives video content of a communication session which includes a number of participants. The system then extracts frames from the video content and classifies them. The system identifies one or more distinguishing frames containing text. For each distinguishing frame containing text, the system detects a title within the frame, crops the title area containing the title, and extracts, via optical character recognition (“OCR”), the title from the cropped title area of the frame. The system extracts, via OCR, textual content from the distinguishing frames containing text, and then transmits the extracted textual content and extracted titles to one or more client devices.

LICENSE PLATE DETECTION AND RECOGNITION SYSTEM
20210264168 · 2021-08-26

A license plate detection and recognition (LPDR) system receives training data comprising images of license plates. The system prepares ground truth data from the training data based on predefined parameters. The system trains a first machine learning algorithm based on the ground truth data to generate a license plate detection model. The license plate detection model is configured to detect one or more regions in the images. Each of the one or more regions contains a candidate for a license plate. The LPDR system generates a bounding box for each region. The LPDR system trains a second machine learning algorithm based on the ground truth data and the license plate detection model to generate a license plate recognition model. The license plate recognition model generates a sequence of alphanumeric characters with a level of recognition confidence for the sequence.

METHOD AND APPARATUS OF IMAGE-TO-DOCUMENT CONVERSION BASED ON OCR, DEVICE, AND READABLE STORAGE MEDIUM

A method of image-to-document conversion based on optical character recognition (OCR) includes obtaining an image to be converted into a target document, and performing layout segmentation on the image according to image content of the image, to obtain n image layouts, each of the n image layouts corresponding to a content type, and n being a positive integer. The method also includes, for each of the n image layouts, processing image content in the respective image layout according to the content type corresponding to the respective image layout, to obtain converted content corresponding to the respective image layout. The method further includes adding the converted content corresponding to the n image layouts to an electronic document, to obtain the target document.
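
The per-layout processing described above can be sketched as a dispatch on content type. This is only an illustration under stated assumptions: the content-type names ("text", "table", "picture"), the `convert_layouts` function, and the `HANDLERS` table are hypothetical, and the handlers are trivial stand-ins for real OCR, table reconstruction, and image embedding.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical representation: each layout is (content_type, raw_content).
Layout = Tuple[str, str]


def convert_layouts(
    layouts: List[Layout],
    handlers: Dict[str, Callable[[str], Any]],
) -> List[Any]:
    """Process each layout with the handler for its content type and
    collect the converted content, in order, as the target document."""
    document: List[Any] = []
    for content_type, raw in layouts:
        handler = handlers.get(content_type)
        if handler is None:
            raise ValueError(f"no handler for content type {content_type!r}")
        document.append(handler(raw))
    return document


# Stand-in handlers (assumptions); a real system would call OCR here.
HANDLERS: Dict[str, Callable[[str], Any]] = {
    "text": lambda raw: raw.strip(),
    "table": lambda raw: [row.split("|") for row in raw.splitlines()],
    "picture": lambda raw: f"[image: {raw}]",
}

if __name__ == "__main__":
    layouts = [("text", "  Hello "), ("table", "a|b\nc|d")]
    print(convert_layouts(layouts, HANDLERS))
```

Keeping the handlers in a lookup table means a new content type can be supported by adding one entry, without changing the conversion loop.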

Neural network-based optical character recognition

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for neural network-based optical character recognition. An embodiment of the system may generate a set of bounding boxes based on reshaped image portions that correspond to image data of a source image. The system may merge any intersecting bounding boxes into a merged bounding box to generate a set of merged bounding boxes indicative of image data portions that likely portray one or more words. Each merged bounding box may be fed by the system into a neural network to identify one or more words of the source image represented in the respective merged bounding box. The one or more identified words may be displayed by the system according to a standardized font and a confidence score.
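
The merging of intersecting bounding boxes described above can be sketched as follows. This is a minimal sketch, not the claimed implementation: it assumes axis-aligned boxes given as (left, top, right, bottom) tuples, treats boxes that merely touch at an edge as non-intersecting, and uses hypothetical function names.

```python
from typing import List, Tuple

# Assumed box format: (left, top, right, bottom), with left < right, top < bottom.
Box = Tuple[int, int, int, int]


def intersects(a: Box, b: Box) -> bool:
    """True if the two boxes overlap with non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a: Box, b: Box) -> Box:
    """Smallest box that encloses both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_intersecting(boxes: List[Box]) -> List[Box]:
    """Merge boxes until no two boxes in the result intersect."""
    merged: List[Box] = []
    for box in boxes:
        current = box
        changed = True
        # A grown box may newly intersect others, so repeat until stable.
        while changed:
            changed = False
            remaining: List[Box] = []
            for other in merged:
                if intersects(current, other):
                    current = union(current, other)
                    changed = True
                else:
                    remaining.append(other)
            merged = remaining
        merged.append(current)
    return merged


if __name__ == "__main__":
    print(merge_intersecting([(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]))
```

Each merged box would then be cropped from the source image and passed to the recognition network; that step is omitted here.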

IMAGE MODERATION METHOD, ELECTRONIC DEVICE, AND STORAGE MEDIUM

The disclosure provides an image moderation method and relates to the fields of deep learning, artificial intelligence, computer vision, and natural language processing. The solution includes: obtaining a target image to be moderated; labeling the target image to obtain multiple pieces of label information of the target image; obtaining at least one service vertical category in which the target image is to be launched; recognizing whether the multiple pieces of label information contain label information excluded by the service vertical category; and determining that the target image is a forbidden image in the service vertical category in a case that the multiple pieces of label information contain the label information excluded by the service vertical category.
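
The exclusion check at the end of this method can be sketched as a set intersection per service vertical. The sketch assumes labels are plain strings and that each vertical has a known set of excluded labels; the function name `forbidden_verticals` and the example vertical and label names are hypothetical, and the labeling step itself is not shown.

```python
from typing import Dict, Iterable, Set


def forbidden_verticals(
    labels: Iterable[str],
    exclusions: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return, for each vertical in which the image is forbidden,
    the image labels that the vertical excludes."""
    label_set = set(labels)
    result: Dict[str, Set[str]] = {}
    for vertical, excluded in exclusions.items():
        hits = label_set & excluded
        if hits:  # at least one excluded label present: forbidden here
            result[vertical] = hits
    return result


if __name__ == "__main__":
    exclusions = {"kids": {"violence", "alcohol"}, "news": {"nudity"}}
    print(forbidden_verticals(["alcohol", "beach"], exclusions))
```

Returning the offending labels, rather than a bare boolean, lets a reviewer see why an image was rejected for a given vertical.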

METHOD AND APPARATUS FOR GENERATING IMAGE

A method and an apparatus for generating an image are provided. The method includes: acquiring a screenshot of a webpage preloaded by a terminal as a source image; recognizing connection areas in the source image, and generating first circumscribed rectangular frames outside outlines of the connection areas; combining, in response to determining that a distance between the connection areas is smaller than a preset distance threshold, the connection areas, and generating a second circumscribed rectangular frame outside outlines of the combined connection areas; and generating, based on a nested relationship between the first circumscribed rectangular frames and the second circumscribed rectangular frames and pictures in the first circumscribed rectangular frames, a target image. The first circumscribed rectangular frames and the second circumscribed rectangular frame are respectively generated by recognizing and combining the connection areas in the source image.

License plate detection and recognition system

A license plate detection and recognition (LPDR) system receives training data comprising images of license plates. The system prepares ground truth data from the training data based on predefined parameters. The system trains a first machine learning algorithm based on the ground truth data to generate a license plate detection model. The license plate detection model is configured to detect one or more regions in the images. Each of the one or more regions contains a candidate for a license plate. The LPDR system generates a bounding box for each region. The LPDR system trains a second machine learning algorithm based on the ground truth data and the license plate detection model to generate a license plate recognition model. The license plate recognition model generates a sequence of alphanumeric characters with a level of recognition confidence for the sequence.

Method and Apparatus for Detecting and Interpreting Price Label Text
20210142092 · 2021-05-13

A method of price text detection by an imaging controller comprises: obtaining, by the imaging controller, an image of a shelf supporting labels bearing price text; generating, by the imaging controller, a plurality of text regions containing candidate text elements from the image; and assigning, by the imaging controller, a classification to each of the text regions, selected from a price text classification and a non-price text classification. Within each of a subset of the text regions having the price text classification, the imaging controller detects a price text sub-region and generates a price text string by applying character recognition to the price text sub-region. The method further includes presenting, by the imaging controller, the locations of the subset of text regions in association with the corresponding price text strings.

SYSTEMS AND METHODS OF INSTANT-MESSAGING BOT FOR ROBOTIC PROCESS AUTOMATION AND ROBOTIC TEXTUAL-CONTENT EXTRACTION FROM IMAGES

Systems and methods of instant-messaging bot for robotic process automation (RPA) and robotic textual-content extraction from digital images include a chatbot application, a software RPA manager, and an instant-messaging (IM) platform, all built for an enterprise. The enterprise IM platform is connected to one or more public IM platforms over the Internet. The RPA manager contains multiple modules of enterprise workflows and receives instructions from the enterprise chatbot for executing individual workflows. The system allows enterprise users connected to the enterprise IM platform, and external users connected to the public IM platforms, to use instant messaging to initiate enterprise workflows that are automated with the help of the enterprise chatbot and delivered via instant messaging. Furthermore, textual-content extraction from digital images is incorporated in the RPA manager as an enterprise workflow, and provides improved convolutional neural network (CNN) methods for textual-content extraction.

Neural Network-based Optical Character Recognition

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for neural network-based optical character recognition. An embodiment of the system may generate a set of bounding boxes based on reshaped image portions that correspond to image data of a source image. The system may merge any intersecting bounding boxes into a merged bounding box to generate a set of merged bounding boxes indicative of image data portions that likely portray one or more words. Each merged bounding box may be fed by the system into a neural network to identify one or more words of the source image represented in the respective merged bounding box. The one or more identified words may be displayed by the system according to a standardized font and a confidence score.