Patent classifications
G06V30/19173
Using attributes for identifying imagery for selection
A system includes a computing device that includes a memory configured to store instructions. The system also includes a processor to execute the instructions to perform operations that include receiving data representing an image, the image being represented in the data by a collection of visual elements. Operations also include determining whether to select the image for presentation by one or more entities using a machine learning system, the machine learning system being trained using data representing a plurality of training images and data representing one or more attributes regarding image presentation by the one or more entities.
Method and system for converting font of Chinese character in image, computer device and medium
A method and a system for converting a font of a Chinese character in an image, a computer device and a medium are disclosed. A specific implementation of the method includes: acquiring a stroke of a to-be-converted Chinese character in the image and spatial distribution information of the stroke; and generating a Chinese character in a target font that corresponds to the to-be-converted Chinese character in the image according to the stroke of the to-be-converted Chinese character, the spatial distribution information of the stroke and standard stroke information of the target font, to replace the to-be-converted Chinese character.
Object recognition devices, electronic devices and methods of recognizing objects
An object recognition device including an artificial neural network (NN) engine configured to receive learning data and weights, make an object recognition model (ORM) learn by using the received information, and provide selected weight data including weights from the selected portion of the weights, and further configured to receive a feature vector, and apply the feature vector extracted from an object data that constructs the object and the selected weight data to the learned ORM to provide an object recognition result, a nonvolatile memory (NVM) configured to store the learned ORM, and an error correction code (ECC) engine configured to perform an ECC encoding on the selected weight data to generate parity data, provide the selected weight data and the parity data to the NVM, and provide the selected weight data to the NN engine by performing an ECC decoding on the selected weight data based on the parity data.
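The abstract does not say which error correction code the ECC engine uses. As an illustration only, the sketch below assumes a Hamming(7,4) code applied to 4-bit chunks of weight data: parity is generated before storage, and decoding uses that parity to repair a single flipped bit. The function names `hamming_parity` and `hamming_decode` are hypothetical.

```python
def hamming_parity(data):
    """Return 3 parity bits for 4 data bits [d1, d2, d3, d4] (Hamming(7,4))."""
    d1, d2, d3, d4 = data
    return [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4]


def hamming_decode(data, parity):
    """Correct at most one flipped bit in data + parity and return the data bits."""
    # Codeword positions 1..7 are laid out as: p1 p2 d1 p3 d2 d3 d4
    code = [parity[0], parity[1], data[0], parity[2], data[1], data[2], data[3]]
    s1 = code[0] ^ code[2] ^ code[4] ^ code[6]
    s2 = code[1] ^ code[2] ^ code[5] ^ code[6]
    s3 = code[3] ^ code[4] ^ code[5] ^ code[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if syndrome:
        code[syndrome - 1] ^= 1
    return [code[2], code[4], code[5], code[6]]


# Assumed example: one 4-bit chunk of a quantized weight
weight_bits = [1, 0, 1, 1]
parity_bits = hamming_parity(weight_bits)
corrupted = [1, 0, 0, 1]  # third bit flipped while stored
restored = hamming_decode(corrupted, parity_bits)
```

A production memory controller would more likely use a wider code over whole words; the 4-bit chunk size here is chosen only to keep the example readable.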
Method of detecting at least one element of interest visible in an input image by means of a convolutional neural network
A method of detecting at least one element of interest visible in an input image by means of a convolutional neural network (CNN), the method comprising the steps of: (a) extracting, by means of an ascending branch of a first subnetwork of said CNN of feature pyramid network (FPN) type, a plurality of initial feature maps (C1, C2, C3, C4, C5) representative of the input image at different scales, said FPN further comprising a descending branch and lateral connections between the ascending branch and the descending branch, at least one lateral connection comprising an attention module; (b) generating, by means of said descending branch of the FPN, a plurality of enriched feature maps (P2, P3, P4, P5) also representative of the input image at different scales, each enriched feature map (P2, P3, P4, P5) incorporating the information from the initial feature maps (C1, C2, C3, C4, C5) of smaller or equal scale; (d) detecting at least one element of interest visible in the input image, by means of a second subnetwork of said CNN, referred to as the detection network, taking said enriched feature maps (P2, P3, P4, P5) as input.
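The descending branch described above can be sketched in a few lines. This is a simplified illustration, not the patented network: it works on plain 2D lists, omits the convolutions a real FPN applies, produces one enriched map per initial map, and uses a toy self-gating function as a stand-in for the attention module, whose actual design the abstract does not give. All function names are hypothetical.

```python
import math


def upsample2x(feature_map):
    """Nearest-neighbour 2x upsampling of a 2D feature map."""
    out = []
    for row in feature_map:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def toy_attention(feature_map):
    """Assumed stand-in for an attention module: gate each value by its sigmoid."""
    return [[v / (1 + math.exp(-v)) for v in row] for row in feature_map]


def descending_branch(initial_maps, use_attention=True):
    """Build enriched maps from initial maps ordered fine to coarse.

    Each map is assumed to be half the height and width of the previous one.
    """
    enriched = [initial_maps[-1]]  # the coarsest map is passed through unchanged
    for initial in reversed(initial_maps[:-1]):
        lateral = toy_attention(initial) if use_attention else initial
        upsampled = upsample2x(enriched[0])
        merged = [[a + b for a, b in zip(r1, r2)]
                  for r1, r2 in zip(lateral, upsampled)]
        enriched.insert(0, merged)
    return enriched


fine = [[1, 1, 1, 1] for _ in range(4)]
coarse = [[1, 2], [3, 4]]
pyramid = descending_branch([fine, coarse], use_attention=False)
```

Each merged map combines its own lateral input with the upsampled coarser result, which is how every enriched map ends up carrying information from the initial maps of smaller or equal scale.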
System and method for learning sensory media association without using text labels
A computer-implemented method of learning sensory media association includes receiving a first type of nontext input and a second type of nontext input; encoding and decoding the first type of nontext input using a first autoencoder having a first convolutional neural network, and the second type of nontext input using a second autoencoder having a second convolutional neural network; bridging first autoencoder representations and second autoencoder representations by a deep neural network that learns mappings between the first autoencoder representations associated with a first modality and the second autoencoder representations associated with a second modality; and based on the encoding, decoding, and the bridging, generating a first type of nontext output and a second type of nontext output based on the first type of nontext input or the second type of nontext input in either the first modality or the second modality.
Methods and systems for accurately recognizing vehicle license plates
Systems can be configured for detecting license plates and recognizing characters in license plates. In an example, a system can receive an image and identify one or more regions in the image that include a license plate. Character recognition can be performed in the one or more regions to determine contents of a candidate license plate. Location-specific information about a license plate format can be used together with the determined contents of the candidate license plate to determine if the recognized characters are valid.
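The final validation step, checking recognized characters against a location-specific format, might look like the sketch below. The region names and patterns in `PLATE_FORMATS` are invented for illustration; the abstract does not list any actual formats.

```python
import re

# Hypothetical per-region plate formats (assumptions, not taken from the patent)
PLATE_FORMATS = {
    "region_a": re.compile(r"[A-Z]{3}[0-9]{4}"),
    "region_b": re.compile(r"[0-9]{2}[A-Z]{2}[0-9]{3}"),
}


def is_valid_plate(text, region):
    """Return True if the recognized text matches the format known for the region."""
    pattern = PLATE_FORMATS.get(region)
    if pattern is None:
        return False  # no format information for this location
    normalized = re.sub(r"[\s-]", "", text.upper())  # drop spaces and hyphens
    return pattern.fullmatch(normalized) is not None
```

For example, `is_valid_plate("abc 1234", "region_a")` accepts the candidate, while the same string checked against `"region_b"` is rejected.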
Document classification system and non-transitory computer readable recording medium storing document classification program
A document classification system uses an image file as a file of an image serving as a model for classifying a document to classify, by machine learning, an image read from a form as a document by a scanner of an image forming apparatus, and reports a classification failure image as an image of the document when the document is unsuccessfully classified.
Image processing device, image processing method, and image processing system
Provided are: an amodal segmentation unit that generates a set of first amodal masks indicating a probability that a particular pixel belongs to a relevant object for each of objects, with respect to an input image in which a plurality of the objects partially overlap; an overlap segmentation unit that generates an overlap mask corresponding only to an overlap region where the plurality of objects overlap in the input image based on an aggregate mask obtained by combining the set of first amodal masks generated for each of the objects and a feature map generated based on the input image; and an amodal mask correction unit that generates and outputs a second amodal mask, which includes an annotation label indicating a category of each of the objects corresponding to a relevant pixel, for each of pixels in the input image using the overlap mask and the aggregate mask.
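To make the mask terminology concrete, the sketch below combines per-object probability masks into an aggregate mask and flags overlap pixels. In the abstract the overlap mask comes from a segmentation unit that also uses a feature map; the thresholding rule here is a much simpler assumption used only to show what the two masks represent.

```python
def aggregate_mask(amodal_masks):
    """Sum the per-object probability masks pixel by pixel."""
    height, width = len(amodal_masks[0]), len(amodal_masks[0][0])
    return [[sum(mask[r][c] for mask in amodal_masks) for c in range(width)]
            for r in range(height)]


def overlap_mask(amodal_masks, threshold=0.5):
    """Flag pixels that at least two objects claim with probability above threshold."""
    height, width = len(amodal_masks[0]), len(amodal_masks[0][0])
    return [[sum(1 for mask in amodal_masks if mask[r][c] > threshold) >= 2
             for c in range(width)]
            for r in range(height)]


# Two objects on a 1x3 image that overlap at the middle pixel
mask_a = [[1.0, 0.75, 0.0]]
mask_b = [[0.0, 0.75, 1.0]]
combined = aggregate_mask([mask_a, mask_b])
overlap = overlap_mask([mask_a, mask_b])
```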
Cross modality training of machine learning models
There is provided a method, comprising: providing a training dataset including medical images and corresponding text based reports, and concurrently training a natural language processing (NLP) machine learning (ML) model for generating an NLP category for a target text based report and a visual ML model for generating a visual finding for a target image, by: training the NLP ML model using the text based reports of the training dataset and a ground truth comprising the visual finding generated by the visual ML model in response to an input of the images corresponding to the text based reports of the training dataset, and training the visual ML model using the images of the training dataset and a ground truth comprising the NLP category generated by the NLP ML model in response to an input of the text based reports corresponding to the images of the training dataset.
Content capturing system and content capturing method
A content capturing system is suitable for capturing content in an image of a document. The content capturing system includes a processor and a storage device. The processor accesses the program stored in the storage device to implement a cutting module and a processing module. The cutting module receives a corrected image. The content in the corrected image includes a plurality of text areas, and the cutting module inputs the corrected image or a first text area into a convolutional neural network. The convolutional neural network outputs the coordinates of the first text area. The cutting module cuts the first text area according to the coordinates of the first text area. The cutting module inputs the cut first text area into a text recognition system and obtains a plurality of first characters in the first text area from the text recognition system.
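The cutting module's flow (locate a text area, cut it out, pass it to text recognition) can be sketched as follows. The convolutional neural network and the text recognition system are represented by caller-supplied functions, and the `(left, top, right, bottom)` box layout is an assumption; the abstract only says that coordinates are output.

```python
def crop(image, box):
    """Cut a rectangle out of a 2D image; right and bottom are exclusive."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def capture_text(image, locate_text_area, recognize_text):
    """Locate a text area, cut it out, and return the recognized characters.

    locate_text_area and recognize_text are hypothetical stand-ins for the
    convolutional neural network and the text recognition system.
    """
    box = locate_text_area(image)
    return recognize_text(crop(image, box))


image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11]]
patch = crop(image, (1, 0, 3, 2))
```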