Patent classifications
G06T2207/30176
Techniques for image content extraction
Embodiments are directed to techniques for image content extraction. Some embodiments include extracting contextually structured data from document images, such as by automatically identifying document layout, document data, document metadata, and/or correlations therebetween in a document image, for instance. Some embodiments utilize breakpoints to enable the system to match different documents with internal variations to a common template. Several embodiments include extracting contextually structured data from table images, such as gridded and non-gridded tables. Many embodiments are directed to generating and utilizing a document template database for automatically extracting document image contents into a contextually structured format. Several embodiments are directed to automatically identifying and associating document metadata with corresponding document data in a document image to generate a machine-facilitated annotation of the document image. In some embodiments, the machine-facilitated annotation may be used to generate a template for the template database.
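The breakpoint idea can be illustrated with a minimal sketch. Here a template is assumed to be an ordered list of anchor strings ("breakpoints"), and the variable-length regions between consecutive anchors are extracted, so documents with internal variations can still align to one common template. The function name and the template representation are illustrative, not taken from the patent.

```python
def split_at_breakpoints(doc_lines, anchors):
    # anchors: ordered landmark strings that mark template breakpoints.
    # Returns the variable-length regions between consecutive anchors,
    # so differing documents can be aligned to one common template.
    positions = []
    start = 0
    for anchor in anchors:
        idx = next(i for i in range(start, len(doc_lines))
                   if anchor in doc_lines[i])
        positions.append(idx)
        start = idx + 1
    return [doc_lines[positions[i] + 1:positions[i + 1]]
            for i in range(len(positions) - 1)]
```

Two invoices with different numbers of line items would both split cleanly at the same anchors, which is the point of matching variable documents to one template.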
Automatic Area Detection
An example computing platform is configured to (i) receive a two-dimensional (2D) image file comprising a construction drawing, (ii) generate, via semantic segmentation, a first set of polygons corresponding to respective areas of the 2D image file, (iii) generate, via instance segmentation, a second set of polygons corresponding to respective areas of the 2D image file, (iv) generate, via unsupervised image processing, a third set of polygons corresponding to respective areas of the 2D image file, (v) based on (a) overlap between polygons in the first, second, and third sets of polygons and (b) respective confidence scores for each of the overlapping polygons, determine a set of merged polygons corresponding to respective areas of the 2D image file, and (vi) cause a client station to display a visual representation of the 2D image file where each merged polygon is overlaid as a respective selectable region of the 2D image file.
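Step (v) can be sketched in simplified form. The sketch below reduces polygons to axis-aligned boxes and resolves overlap between the three candidate sets with greedy IoU-based suppression, keeping the highest-confidence candidate in each overlapping cluster; the actual claim covers arbitrary polygons and does not fully specify the merging rule, so the threshold and the box simplification are assumptions.

```python
def iou(a, b):
    # a, b: (x1, y1, x2, y2) axis-aligned boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def merge_detections(candidates, iou_thresh=0.5):
    # candidates: (box, confidence) pairs pooled from the semantic,
    # instance, and unsupervised pipelines. Greedy non-maximum
    # suppression: keep the highest-confidence box from each cluster
    # of mutually overlapping candidates.
    kept = []
    for box, conf in sorted(candidates, key=lambda c: -c[1]):
        if all(iou(box, k[0]) < iou_thresh for k in kept):
            kept.append((box, conf))
    return kept
```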
MACHINE LEARNING ENABLED DOCUMENT DESKEWING
A method may include determining, based at least on an image of a document, a plurality of text bounding boxes enclosing lines of text present in the document. A machine learning model may be trained to determine, based at least on the coordinates defining the text bounding boxes, the coordinates of a document bounding box enclosing the text bounding boxes. The document bounding box may encapsulate the visual aberrations that are present in the image of the document. As such, one or more transformations may be determined based on the coordinates of the document bounding box. The image of the document may be deskewed by applying the transformations. One or more downstream tasks may be performed based on the deskewed image of the document. Related methods and articles of manufacture are also disclosed.
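The geometric core of the deskewing step can be sketched as follows, assuming the model outputs the four corners of the document bounding box: the skew angle is taken from the top edge, and an inverse rotation about the top-left corner is the transformation applied to deskew. The corner ordering and the single-rotation simplification are assumptions; the patent's transformations may also correct other aberrations.

```python
import math

def deskew_transform(doc_box):
    # doc_box: four (x, y) corners of the predicted document bounding
    # box, ordered top-left, top-right, bottom-right, bottom-left.
    (x0, y0), (x1, y1) = doc_box[0], doc_box[1]
    # Skew angle of the top edge relative to horizontal.
    angle = math.atan2(y1 - y0, x1 - x0)
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    def transform(pt):
        # Rotate pt about the top-left corner by -angle.
        x, y = pt[0] - x0, pt[1] - y0
        return (x * cos_a - y * sin_a + x0, x * sin_a + y * cos_a + y0)
    return angle, transform
```

Applying `transform` to every pixel coordinate (or, in practice, passing the angle to an image-warping routine) yields the deskewed image used by the downstream tasks.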
Object detection and image cropping using a multi-detector approach
Systems, methods and computer program products for detecting objects using a multi-detector are disclosed, according to various embodiments. In one aspect, a computer-implemented method includes defining an analysis profile comprising an initial number of analysis cycles dedicated to each of a plurality of detectors, where each detector is independently configured to detect objects according to a unique set of analysis parameters and/or a unique detector algorithm. The method also includes: receiving digital video data that depicts at least one object; analyzing the digital video data using some or all of the detectors in accordance with the analysis profile, where the analyzing produces an analysis result for each detector used in the analysis. Further, the method includes updating the analysis profile by adjusting the number of analysis cycles dedicated to at least one of the detectors based on the analysis results.
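The profile-update step can be sketched as a budget reallocation. The sketch assumes each analysis result reduces to a per-detector score (e.g. detection count) and shifts cycles toward productive detectors while conserving the total budget and guaranteeing each detector at least one cycle; the real adjustment policy is not specified by the abstract.

```python
def update_profile(profile, results):
    # profile: detector name -> cycles currently dedicated to it.
    # results: detector name -> score from the last analysis pass.
    total = sum(profile.values())
    score_sum = sum(results.values()) or 1
    # Allocate cycles proportionally to scores, minimum one cycle each.
    alloc = {d: max(1, round(total * results[d] / score_sum))
             for d in profile}
    # Repair rounding drift so the cycle budget stays constant.
    while sum(alloc.values()) > total:
        alloc[max(alloc, key=alloc.get)] -= 1
    while sum(alloc.values()) < total:
        alloc[max(results, key=results.get)] += 1
    return alloc
```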
ARTIFICIAL INTELLIGENCE ARCHITECTURES FOR DETERMINING IMAGE AUTHENTICITY
The present disclosure generally relates to systems that include an artificial intelligence (AI) architecture for determining whether an image is manipulated. The architecture can include a constrained convolutional layer, separable convolutional layers, maximum-pooling layers, a global average-pooling layer, and a fully connected layer. In one specific example, the constrained convolutional layer can detect one or more image-manipulation fingerprints with respect to an image and can generate feature maps corresponding to the image. The global average-pooling layer can generate a vector of feature values by averaging the feature maps. The fully connected layer can then generate, based on the vector of feature values, an indication of whether the image was manipulated or not manipulated.
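Two of the described layers are simple enough to sketch in pure Python. The constrained-convolution constraint shown is the widely used Bayar–Stamm form (centre weight fixed to −1, remaining weights normalized to sum to 1), which matches the fingerprint-detection role described here, though the patent may define its constraint differently; the global average pooling matches the abstract directly.

```python
def constrain_filter(weights):
    # Enforce a prediction-error constraint on a conv filter
    # (Bayar & Stamm style): centre weight -1, remaining weights
    # rescaled to sum to 1, so the filter responds to manipulation
    # fingerprints rather than image content. Assumed form.
    h, w = len(weights), len(weights[0])
    ci, cj = h // 2, w // 2
    s = sum(weights[i][j] for i in range(h) for j in range(w)
            if (i, j) != (ci, cj))
    out = [[weights[i][j] / s for j in range(w)] for i in range(h)]
    out[ci][cj] = -1.0
    return out

def global_average_pool(feature_maps):
    # feature_maps: list of 2D lists (one per channel). Averages each
    # map to one value, producing the feature vector that feeds the
    # fully connected layer.
    return [sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
            for fm in feature_maps]
```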
Methods and apparatus to determine the dimensions of a region of interest of a target object from an image using target object landmarks
Methods and apparatus to determine the dimensions of a region of interest of a target object and a class of the target object from an image using target object landmarks are disclosed herein. An example method includes identifying a landmark of a target object in an image based on a match between the landmark and a template landmark; classifying a target object based on the identified landmark; projecting dimensions of the template landmark based on a location of the landmark in the image; and determining a region of interest based on the projected dimensions, the region of interest corresponding to text printed on the target object.
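The projection step can be sketched as follows, assuming each template landmark stores an offset and size for the region of interest relative to the landmark; that data structure is illustrative, not from the patent.

```python
def project_roi(landmark_xy, template):
    # landmark_xy: detected landmark location in the image.
    # template: assumed structure holding the template landmark's
    # offset to the ROI and the ROI's dimensions.
    lx, ly = landmark_xy
    dx, dy = template["roi_offset"]
    w, h = template["roi_size"]
    # Project the template ROI onto the image at the landmark location;
    # the returned box covers the text printed on the target object.
    return (lx + dx, ly + dy, lx + dx + w, ly + dy + h)
```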
Generating synthetic images as training dataset for a machine learning network
A method may include identifying a first image for training a deep learning network, wherein the first image includes at least one target object associated with at least one location in the first image, and wherein the first image is associated with a mask image; determining a set of deformations to create a training set of deformed images, wherein the training set is to be used to train the deep learning network; generating the training set of deformed images by applying the set of deformations to the first image; and generating a set of deformed mask images by applying the set of deformations to the mask image, wherein each deformed image of the training set of deformed images is associated with a respective mask image to identify the location of the at least one target object in each deformed image.
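The key property here, applying each deformation identically to the image and its mask so the target-object location stays aligned in every pair, can be sketched with toy deformations on images represented as nested lists. The specific deformations (flip, rotate) are illustrative examples, not the patent's set.

```python
def flip_h(img):
    # Mirror each row left-to-right.
    return [row[::-1] for row in img]

def rotate90(img):
    # Rotate the 2D grid 90 degrees clockwise.
    return [list(col) for col in zip(*img[::-1])]

def build_training_set(image, mask, deformations):
    # Apply every deformation to BOTH the image and its mask, so each
    # deformed image stays paired with a mask that still marks the
    # target object's location.
    return [(d(image), d(mask)) for d in deformations]
```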
Detection of layout table(s) by a screen reader
Described herein is a system and method for detecting a layout table by a screen reader. Information regarding a document being displayed by an application is received from an application programming interface of a user interface automation system that provides information regarding the application and/or its user interface elements. The information includes an indication that the document comprises a table. A determination is made, using a rule-based heuristic, as to whether the table is a data table or a layout table based upon the received information. When it is determined that the table is a layout table, presentation information associated with the layout table can be skipped over and the cell data content within the layout table provided. Thus, for a determined layout table, the system and method allow the screen reader to act as if the containing table does not exist while still reading its content.
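A rule-based heuristic of the kind described might look like the sketch below. The property names and the specific rules (headers imply data table; a borderless or single-row/column table implies layout) are illustrative assumptions, not the patented rule set.

```python
def is_layout_table(table_info):
    # table_info: dict of table properties reported via the
    # UI-automation API. Returns True for a layout table, False for
    # a data table. Rules are illustrative.
    if table_info.get("has_header_row") or table_info.get("has_header_col"):
        return False  # headers strongly suggest real tabular data
    if table_info.get("rows", 0) <= 1 or table_info.get("cols", 0) <= 1:
        return True   # a single row/column is typically page layout
    if not table_info.get("has_borders", True):
        return True   # borderless grids are commonly layout scaffolding
    return False
```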
Privacy protection in mobile robot
A mobile robot is configured for operation in a commercial or industrial setting, such as an office building or retail store. The mobile robot may include cameras for capturing images and videos and include microphones for capturing audio of its surroundings. To improve privacy by preventing confidential information from being transmitted, the mobile robot may detect text in images and modify the images to make the text illegible before transmitting the images. The mobile robot may also detect human voice in audio and modify audio to make the human voice unintelligible before transmitting the audio.
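The image-redaction step can be sketched minimally: given text regions from a detector, black out each region before transmission so the text is illegible. Representing the image as a 2D list keeps the sketch self-contained; a real implementation would operate on camera frames and might blur rather than zero the pixels.

```python
def redact_regions(image, boxes):
    # image: 2D list of pixel values; boxes: (x1, y1, x2, y2) text
    # regions reported by a text detector. Returns a copy with each
    # region blacked out, leaving the original frame untouched.
    out = [row[:] for row in image]
    for x1, y1, x2, y2 in boxes:
        for y in range(y1, y2):
            for x in range(x1, x2):
                out[y][x] = 0
    return out
```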
IMAGE PROCESSING SYSTEM AND METHOD FOR IMAGE NOISE REMOVAL
A system for removing a noise artifact from an image of a document extracts a first set of features from the image, where the first set of features represents items on the image. The system identifies, from the first set of features, noise artifact features representing pixel values of the noise artifact. The system generates a second set of features by removing the noise artifact features from the first set. Using the second set of features as input, the system generates a test clean image of the document. The system then determines whether the portion of the test clean image that previously displayed the noise artifact corresponds to the counterpart portion of a training clean image of the document. If it does, the system outputs the test clean image.
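The feature-removal and verification steps can be sketched abstractly. The sketch assumes features reduce to a name-to-value map and that verification is a pixel-wise comparison within a tolerance; the real system presumably uses learned feature representations and a learned generator.

```python
def remove_noise_features(features, noise_feature_names):
    # features: name -> value map extracted from the document image.
    # noise_feature_names: the subset identified as the noise artifact.
    # Returns the second feature set used to render the test clean image.
    return {k: v for k, v in features.items()
            if k not in noise_feature_names}

def regions_match(region_a, region_b, tol=0):
    # Pixel-wise comparison between the region of the test clean image
    # that previously displayed the noise artifact and the counterpart
    # region of the training clean image.
    return all(abs(a - b) <= tol
               for row_a, row_b in zip(region_a, region_b)
               for a, b in zip(row_a, row_b))
```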