Patent classifications
G06V30/1918
METHOD AND DEVICE FOR TRAINING, BASED ON CROSSMODAL INFORMATION, DOCUMENT READING COMPREHENSION MODEL
A method for training a document reading comprehension model includes: acquiring a question sample and a rich-text document sample, in which the rich-text document sample includes a real answer to the question sample; acquiring text information and layout information of the rich-text document sample by performing OCR processing on image information of the rich-text document sample; acquiring a predicted answer to the question sample by inputting the text information, the layout information, and the image information of the rich-text document sample into a preset reading comprehension model; and training the reading comprehension model based on the real answer and the predicted answer. The method may enhance the reading comprehension model's ability to understand long rich-text documents and save labor cost.
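The training loop described above can be sketched with toy stand-ins. Everything here (the OCR stub, the `ToyReader` span scorer, and its perceptron-style update) is illustrative and not taken from the patent; it only shows the shape of the loop: OCR the document, feed text, layout, and image into the model, and update on the mismatch between predicted and real answers.

```python
def ocr(image):
    """Hypothetical OCR stub: here the 'image' is just text, and the
    layout is a fake (x, y) position per token."""
    tokens = image.split()
    layout = [(i * 10, 0) for i, _ in enumerate(tokens)]
    return tokens, layout

class ToyReader:
    """Stand-in reading comprehension model: scores each token by overlap
    with the question plus a learned per-token bias."""
    def __init__(self):
        self.bias = {}

    def predict(self, question, tokens, layout, image):
        q = set(question.lower().split())
        def score(tok):
            return (tok.lower() in q) + self.bias.get(tok, 0.0)
        return max(tokens, key=score)

    def update(self, real_answer, predicted):
        # Simple perceptron-style correction on a wrong prediction.
        if predicted != real_answer:
            self.bias[real_answer] = self.bias.get(real_answer, 0.0) + 1.0
            self.bias[predicted] = self.bias.get(predicted, 0.0) - 1.0

def train(model, samples, epochs=3):
    for _ in range(epochs):
        for question, doc_image, real_answer in samples:
            tokens, layout = ocr(doc_image)            # text + layout from OCR
            pred = model.predict(question, tokens, layout, doc_image)
            model.update(real_answer, pred)            # loss on real vs predicted

samples = [("What is the total", "Invoice total 42 dollars", "42")]
model = ToyReader()
train(model, samples)
```

After training, the model prefers the supervised answer token over the superficially question-matching one.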
TEXT EXTRACTION METHOD, TEXT EXTRACTION MODEL TRAINING METHOD, ELECTRONIC DEVICE AND STORAGE MEDIUM
A text extraction method and a text extraction model training method are provided. The present disclosure relates to the technical field of artificial intelligence, in particular to the technical field of computer vision. An implementation of the method comprises: obtaining a visual encoding feature of a to-be-detected image; extracting a plurality of sets of multimodal features from the to-be-detected image, wherein each set of multimodal features includes position information of one detection frame extracted from the to-be-detected image, a detection feature in the detection frame and first text information in the detection frame; and obtaining second text information matched with a to-be-extracted attribute based on the visual encoding feature, the to-be-extracted attribute and the plurality of sets of multimodal features, wherein the to-be-extracted attribute is an attribute of text information needing to be extracted.
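The final matching step above can be illustrated with a toy version, assuming each set of multimodal features is a `(box, detection_feature, text)` triple and the attribute match is done with simple regular expressions; the attribute names and patterns are invented for illustration (the real method uses learned encodings, not regexes).

```python
import re

# Hypothetical per-attribute patterns standing in for a learned matcher.
ATTRIBUTE_PATTERNS = {
    "date":  re.compile(r"\d{4}-\d{2}-\d{2}"),
    "total": re.compile(r"\$\d+(\.\d{2})?"),
}

def extract_by_attribute(feature_sets, attribute):
    """Return the second text information matching the to-be-extracted
    attribute, scanning each detection's first text information."""
    pattern = ATTRIBUTE_PATTERNS[attribute]
    for box, detection_feature, text in feature_sets:
        m = pattern.search(text)
        if m:
            return m.group(0)
    return None

detections = [
    ((0, 0, 50, 10),  [0.1, 0.9], "Invoice 2023-05-01"),   # box, feature, text
    ((0, 20, 50, 30), [0.7, 0.3], "Amount due $19.99"),
]
date_value  = extract_by_attribute(detections, "date")
total_value = extract_by_attribute(detections, "total")
```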
INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING DEVICE, NON-TRANSITORY COMPUTER-READABLE MEDIUM, AND INFORMATION PROCESSING METHOD
An information processing system (100) includes a rangefinding and processing unit (101) that generates rangefinding information indicating the distance and direction to each of a plurality of rangefinding targets; an imaging and processing unit (104) that generates image data of a captured image, specifies the distance, direction, and type of an imaged target, and generates imaging information indicating the distance, direction, and type of the imaged target; and a control unit (114) that specifies tentative values indicating the sizes of the rangefinding targets by using the imaging information, specifies a plurality of tentative areas, which are areas where the rangefinding targets are projected, in accordance with the tentative values and the rangefinding information, and calculates a match probability indicating the possibility of the imaged target matching each of the rangefinding targets by using the dimension of an overlap between each of the tentative areas and a target area.
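The overlap-based match probability can be sketched geometrically, assuming tentative areas and the target area are axis-aligned rectangles `(x1, y1, x2, y2)` and the probability is the overlap area normalized by the target area; the exact normalization in the patent may differ.

```python
def overlap_area(a, b):
    """Area of intersection of two rectangles (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)

def match_probabilities(tentative_areas, target_area):
    """Score each tentative area by its overlap with the imaged target's
    area, normalized by the target area."""
    t_area = (target_area[2] - target_area[0]) * (target_area[3] - target_area[1])
    return [overlap_area(t, target_area) / t_area for t in tentative_areas]

tentatives = [(0, 0, 4, 4), (3, 3, 7, 7), (10, 10, 12, 12)]  # projected areas
target = (2, 2, 6, 6)                                        # imaged target area
scores = match_probabilities(tentatives, target)
```

The rangefinding target whose tentative area yields the highest score is the most probable match for the imaged target.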
Method and apparatus for data efficient semantic segmentation
A method and system for training a neural network are provided. The method includes receiving an input image, selecting at least one data augmentation method from a pool of data augmentation methods, generating an augmented image by applying the selected at least one data augmentation method to the input image, and generating a mixed image from the input image and the augmented image.
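The augment-and-mix step can be sketched as follows, assuming grayscale images as nested lists and a toy augmentation pool (the flip and brightness functions are illustrative; real pools use richer transforms, and the blend weight is a free parameter).

```python
def hflip(img):
    """Horizontal-flip augmentation."""
    return [row[::-1] for row in img]

def brighten(img, amount=10):
    """Hypothetical second augmentation: uniform brightness shift."""
    return [[p + amount for p in row] for row in img]

AUGMENTATION_POOL = [hflip, brighten]

def mix(a, b, alpha=0.5):
    """Pixel-wise blend of the input image and its augmented version."""
    return [[alpha * x + (1 - alpha) * y for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]

image = [[0, 100], [200, 50]]
augmented = AUGMENTATION_POOL[0](image)   # select one method from the pool
mixed = mix(image, augmented)             # mixed image from input + augmented
```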
Parsing an ink document using object-level and stroke-level processing
Technology is described herein for parsing an ink document having a plurality of ink strokes. The technology performs stroke-level processing on the plurality of ink strokes to produce stroke-level information, the stroke-level information identifying at least one characteristic associated with each ink stroke. The technology also performs object-level processing on individual objects within the ink document to produce object-level information, the object-level information identifying one or more groupings of ink strokes in the ink document. The technology then parses the ink document into constituent parts based on the stroke-level information and the object-level information. In some implementations, the technology converts the ink stroke data into an ink image. The stroke-level processing and/or the object-level processing may operate on the ink image using one or more neural networks. More specifically, the stroke-level processing can classify pixels in the input image, while the object-level processing can identify bounding boxes containing possible objects.
SYSTEM AND METHOD TO RECOGNISE CHARACTERS FROM AN IMAGE
A system and method to recognise characters from an image are disclosed. The method includes receiving at least one image, pre-processing the at least one image, extracting a plurality of characters from the corresponding at least one image, extracting at least one structure from the corresponding at least one image upon applying an edge detection technique, identifying a template based on the extracted structure, subjecting the plurality of characters to a plurality of ensemble AI models to extract one of a plurality of texts, a plurality of non-textual data, or a combination thereof, comparing the extracted texts, non-textual data, or combination thereof from the corresponding plurality of ensemble AI models with each other, generating a confidence score, and validating one of the plurality of accurate texts, the plurality of accurate non-textual data, or a combination thereof.
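The compare-score-validate step at the end can be sketched with a simple majority vote, assuming the confidence score is the fraction of ensemble models that agree and a fixed validation threshold; both choices are illustrative, not from the patent.

```python
from collections import Counter

def ensemble_vote(predictions, threshold=0.6):
    """Compare the outputs of several models against each other, derive a
    confidence score from their agreement, and validate the majority
    output only if the score clears the threshold."""
    counts = Counter(predictions)
    best, votes = counts.most_common(1)[0]
    confidence = votes / len(predictions)
    validated = best if confidence >= threshold else None
    return best, confidence, validated

# Outputs of three hypothetical ensemble AI models for the same region;
# one model misreads the leading character.
model_outputs = ["INV-001", "INV-001", "1NV-001"]
text, confidence, validated = ensemble_vote(model_outputs)
```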
CROSS MODALITY TRAINING OF MACHINE LEARNING MODELS
There is provided a method, comprising: providing a training dataset including medical images and corresponding text-based reports, and concurrently training a natural language processing (NLP) machine learning (ML) model for generating an NLP category for a target text-based report and a visual ML model for generating a visual finding for a target image, by: training the NLP ML model using the text-based reports of the training dataset and a ground truth comprising the visual finding generated by the visual ML model in response to an input of the images corresponding to the text-based reports of the training dataset, and training the visual ML model using the images of the training dataset and a ground truth comprising the NLP category generated by the NLP ML model in response to an input of the text-based reports corresponding to the images of the training dataset.
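The mutual pseudo-labelling scheme can be sketched with deliberately trivial stand-in models, where the "image" is reduced to a single intensity value and the "NLP model" is a keyword matcher; none of these model choices come from the patent, only the loop structure in which each model's predictions serve as the other's ground truth.

```python
class VisualModel:
    """Toy visual ML model: thresholds a scalar image intensity."""
    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def predict(self, intensity):
        return intensity > self.threshold

    def fit(self, intensities, labels):
        # Place the threshold between pseudo-positive and pseudo-negative cases.
        pos = [i for i, l in zip(intensities, labels) if l]
        neg = [i for i, l in zip(intensities, labels) if not l]
        if pos and neg:
            self.threshold = (min(pos) + max(neg)) / 2

class NLPModel:
    """Toy NLP ML model: flags a report if it shares a word with any
    report that was pseudo-labelled positive."""
    def __init__(self):
        self.positive_words = set()

    def predict(self, report):
        return bool(self.positive_words & set(report.split()))

    def fit(self, reports, labels):
        self.positive_words = set()
        for r, l in zip(reports, labels):
            if l:
                self.positive_words |= set(r.split())

images  = [0.9, 0.8, 0.1, 0.2]                       # paired training data
reports = ["nodule seen", "nodule seen", "clear", "clear"]

visual, nlp = VisualModel(), NLPModel()
for _ in range(2):  # concurrent rounds of cross-modality training
    nlp.fit(reports, [visual.predict(i) for i in images])   # visual findings as NLP ground truth
    visual.fit(images, [nlp.predict(r) for r in reports])   # NLP categories as visual ground truth
```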
Methods of processing data from multiple image sources to provide normalized confidence levels for use in improving performance of a recognition processor
A method comprises receiving from a first data source first recognition results which are associated with the first data source, and receiving from a second data source second recognition results which are associated with the second data source. The method further comprises processing a first set of confidence levels associated with the first recognition results to provide a first set of normalized confidence levels associated with the first data source, and processing a second set of confidence levels associated with the second recognition results to provide a second set of normalized confidence levels associated with the second data source. The method also comprises storing the first set of normalized confidence levels associated with the first data source in a first table of normalized confidence levels and the second set of normalized confidence levels associated with the second data source in a second table of normalized confidence levels.
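The per-source normalization and table storage can be sketched as follows, assuming a simple min-max scaling into a per-source lookup table; the actual mapping the patent uses may differ, but the point is that each source gets its own table so scores on different raw scales become comparable.

```python
def normalize(confidences):
    """Min-max scale one source's raw confidence levels into [0, 1],
    returning a table mapping raw level -> normalized level."""
    lo, hi = min(confidences), max(confidences)
    span = (hi - lo) or 1.0          # avoid division by zero for constant input
    return {c: (c - lo) / span for c in confidences}

source_a = [10, 30, 50]       # raw confidence levels from recognizer A
source_b = [0.2, 0.5, 0.9]    # raw confidence levels from recognizer B

table_a = normalize(source_a)  # first table, for the first data source
table_b = normalize(source_b)  # second table, for the second data source
```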
MULTI-SENSOR CALIBRATION SYSTEM
Techniques for performing multi-sensor calibration on a vehicle are described. A method includes obtaining, from each of at least two sensors located on a vehicle, a sensor data item of a road comprising a lane marker, extracting, from each sensor data item, location information of the lane marker, and calculating extrinsic parameters of the at least two sensors based on determining a difference between the location information of the lane marker from each sensor data item and previously stored location information of the lane marker.
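A much-reduced sketch of the difference calculation, assuming the extrinsic parameter collapses to a 2-D translation per sensor: the offset between where each sensor sees the lane marker and where the stored map says it is. Real extrinsics also include rotation; the sensor names and coordinates below are invented.

```python
def translation_offset(detected, reference):
    """Per-sensor extrinsic offset (dx, dy) that aligns the detected
    lane-marker location with the previously stored map location."""
    return (reference[0] - detected[0], reference[1] - detected[1])

stored_marker = (100.0, 50.0)   # previously stored lane-marker location
camera_marker = (98.5, 50.5)    # lane marker as seen by the camera
lidar_marker  = (100.2, 49.8)   # lane marker as seen by the lidar

camera_extrinsic = translation_offset(camera_marker, stored_marker)
lidar_extrinsic  = translation_offset(lidar_marker, stored_marker)
```

Because both sensors are referenced to the same stored marker, the two offsets also fix the relative calibration between the sensors.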