Patent classifications
G06V30/19173
Methods and apparatus to determine the dimensions of a region of interest of a target object from an image using target object landmarks
Methods and apparatus to determine the dimensions of a region of interest of a target object and a class of the target object from an image using target object landmarks are disclosed herein. An example method includes identifying a landmark of a target object in an image based on a match between the landmark and a template landmark; classifying a target object based on the identified landmark; projecting dimensions of the template landmark based on a location of the landmark in the image; and determining a region of interest based on the projected dimensions, the region of interest corresponding to text printed on the target object.
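The projection step described above can be sketched as follows. This is a minimal illustration, not the patent's method: the function name, the scale-by-landmark-size rule, and the ROI encoding as an offset-plus-size tuple are all assumptions.

```python
# Hedged sketch: project a template landmark's known ROI onto a detected
# landmark to derive the text region of interest in the image.

def project_roi(landmark_xy, landmark_size, template_landmark_size, template_roi):
    """Scale the template's ROI by the detected-to-template landmark size
    ratio, then anchor it at the detected landmark location.

    landmark_xy: (x, y) of the matched landmark in the image.
    landmark_size: (w, h) of the matched landmark in the image.
    template_landmark_size: (w, h) of the landmark in the template.
    template_roi: (dx, dy, w, h) of the text ROI relative to the template landmark.
    """
    sx = landmark_size[0] / template_landmark_size[0]
    sy = landmark_size[1] / template_landmark_size[1]
    dx, dy, w, h = template_roi
    return (landmark_xy[0] + dx * sx,
            landmark_xy[1] + dy * sy,
            w * sx,
            h * sy)
```

For example, a landmark detected at twice its template size doubles the offset and extent of the projected text region.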
Determining drivable free-space for autonomous vehicles
In various examples, sensor data may be received that represents a field of view of a sensor of a vehicle located in a physical environment. The sensor data may be applied to a machine learning model that computes both a set of boundary points that correspond to a boundary dividing drivable free-space from non-drivable space in the physical environment and class labels for boundary points of the set of boundary points that correspond to the boundary. Locations within the physical environment may be determined from the set of boundary points represented by the sensor data, and the vehicle may be controlled through the physical environment within the drivable free-space using the locations and the class labels.
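The step of turning boundary points into locations in the physical environment can be sketched with a flat-ground pinhole-camera assumption. The camera model, parameter names, and flat-ground simplification are illustrative, not the patent's formulation.

```python
# Hedged sketch: map per-column boundary pixels (u, v) to (x, z) ground-plane
# positions in metres, assuming a camera at height cam_height_m looking along
# +z over flat ground.

def boundary_to_ground(columns_rows, focal_px, cam_height_m, cx, cy):
    """columns_rows: iterable of (u, v) boundary pixels.
    focal_px: focal length in pixels; (cx, cy): principal point."""
    locations = []
    for u, v in columns_rows:
        if v <= cy:           # at or above the horizon: no ground intersection
            continue
        z = focal_px * cam_height_m / (v - cy)   # forward distance
        x = (u - cx) * z / focal_px              # lateral offset
        locations.append((x, z))
    return locations
```

Each location would then carry its boundary point's class label when planning a path through the drivable free-space.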
Convolutional neural network and associated method for identifying basal cell carcinoma
A convolutional neural network (CNN) and associated method for identifying basal cell carcinoma are disclosed. The CNN comprises two convolution layers, two pooling layers and at least one fully-connected layer. The first convolution layer uses initial Gabor filters whose kernel parameters are set in advance based on human professional knowledge. The method uses collagen fiber images as training images and converts doctors' knowledge into the initial Gabor filter settings, computerizing the feature design. The invention provides better training performance in terms of training time and training material overhead.
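Initializing convolution kernels from Gabor filters, rather than random weights, can be sketched as below. The parameterization follows the standard Gabor formula; the exact parameter values a clinician's knowledge would supply are assumptions.

```python
# Hedged sketch: generate the real part of a Gabor filter to seed a CNN's
# first convolution layer with knowledge-driven kernels.
import math

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5):
    """size: odd kernel width/height; wavelength: sinusoid period in pixels;
    theta: orientation in radians; sigma: Gaussian envelope width;
    gamma: spatial aspect ratio."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xp = x * math.cos(theta) + y * math.sin(theta)
            yp = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xp ** 2 + (gamma * yp) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xp / wavelength))
        kernel.append(row)
    return kernel
```

A bank of such kernels at several orientations would initialize the first convolution layer, which training then fine-tunes.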
Systems and methods of instant-messaging bot for robotic process automation and robotic textual-content extraction from images
Systems and methods of instant-messaging bot for robotic process automation (RPA) and robotic textual-content extraction from digital images include a chatbot application, a software RPA manager, and an instant-messaging (IM) platform, all built for an enterprise. The enterprise IM platform is connected to one or more public IM platforms over the Internet. The RPA manager contains multiple modules of enterprise workflows and receives instructions from the enterprise chatbot for executing individual workflows. The system allows enterprise users connected to the enterprise IM platform, and external users connected to the public IM platforms, to use instant messaging to initiate enterprise workflows that are automated with the help of the enterprise chatbot and delivered via instant messaging. Furthermore, textual-content extraction from digital images is incorporated in the RPA manager as an enterprise workflow, and provides improved convolutional neural network (CNN) methods for textual-content extraction.
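The chatbot-to-workflow dispatch can be sketched as a small registry keyed by command name. The command syntax, class name, and the example workflow are all illustrative assumptions, not the disclosed system's interface.

```python
# Hedged sketch: an RPA manager that maps IM chat commands to registered
# enterprise workflow modules.

class RPAManager:
    def __init__(self):
        self._workflows = {}

    def register(self, name, handler):
        self._workflows[name] = handler

    def handle_message(self, text):
        """Parse an IM message of the form '/workflow arg1 arg2 ...' and
        dispatch to the matching workflow handler."""
        parts = text.strip().split()
        if not parts or not parts[0].startswith("/"):
            return "Not a workflow command."
        name = parts[0][1:]
        handler = self._workflows.get(name)
        if handler is None:
            return "Unknown workflow: " + name
        return handler(*parts[1:])

rpa = RPAManager()
rpa.register("extract_text", lambda image_id: "queued OCR for image " + image_id)
```

Textual-content extraction from images would plug in as one more registered workflow, as the abstract describes.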
METHODS FOR IMPROVING THE PERFORMANCE OF NEURAL NETWORKS USED FOR BIOMETRIC AUTHENTICATION
A method of generating a biometric signature of a user for use in authentication using a neural network, the method comprising: receiving (110) a plurality of biometric samples from a user; extracting at least one feature vector using the plurality of biometric samples; using the elements of the at least one feature vector as inputs for a neural network; extracting the corresponding activations from an output layer of the neural network; and generating a biometric signature of the user using the extracted activations, such that a single biometric signature represents multiple biometric samples from the user.
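The claim's pipeline can be sketched with a toy one-layer network. Averaging the samples' feature vectors, the ReLU activation, and the fixed weights are all assumptions standing in for the claimed feature extraction and trained network.

```python
# Hedged sketch: fuse several biometric samples into one signature by
# averaging their feature vectors and taking a small network's output
# activations as the signature.

def relu(v):
    return [max(0.0, x) for x in v]

def forward(features, weights):
    """One dense layer: activation_j = relu(sum_i features[i] * weights[i][j])."""
    n_out = len(weights[0])
    return relu([sum(features[i] * weights[i][j] for i in range(len(features)))
                 for j in range(n_out)])

def biometric_signature(samples, weights):
    """Average per-sample feature vectors, then use the network's output
    activations as a single signature covering all samples."""
    dim = len(samples[0])
    mean = [sum(s[i] for s in samples) / len(samples) for i in range(dim)]
    return forward(mean, weights)
```

Authentication would then compare a fresh sample's activations against this stored signature.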
OPTICAL CHARACTER RECOGNITION TRAINING WITH SEMANTIC CONSTRAINTS
A method, computer system, and a computer program product for optical character recognition training are provided. A text image and plain text labels for the text image may be received. The text image may include words. The plain text labels may include machine-encoded text corresponding to the words. Semantic feature vectors for the words, respectively, may be generated based on the plain text label. The text image, the plain text labels, and the semantic feature vectors may be input together into a machine learning model to train the machine learning model for optical character recognition. The plain text labels and the semantic feature vectors may be constraints for the training.
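The role of the semantic feature vectors as training constraints can be sketched as a combined loss. The squared-distance penalty and the weighting factor are illustrative assumptions, not the patent's specific objective.

```python
# Hedged sketch: total training loss = the usual text-recognition loss plus a
# weighted semantic penalty pulling the model's word representation toward the
# label word's semantic feature vector.

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def combined_loss(text_loss, predicted_semantics, label_semantics, weight=0.1):
    """text_loss: character-level recognition loss (a float).
    predicted_semantics / label_semantics: semantic feature vectors."""
    semantic_loss = squared_distance(predicted_semantics, label_semantics)
    return text_loss + weight * semantic_loss
```

Minimizing this joint objective trains the recognizer to respect both the plain text labels and the semantic constraints together, as the abstract describes.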
IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM
When a pseudo character image is generated by applying deformation processing to a character image, the generation of character images that impede training is suppressed. Based on a condition, associated with a first class, relating to a parameter used for the deformation processing, a parameter of the deformation processing is determined, and the deformation processing is performed on a character image belonging to the first class using the determined parameter. It is then determined whether the deformed character image generated by the deformation processing is similar to a character image belonging to a class different from the first class, and in a case where similarity is determined, the condition associated with the first class is updated.
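The determine-deform-check-update loop can be sketched as below. The deterministic parameter pick, rotation-range condition, and similarity callback are assumptions; the patent leaves these concrete choices open.

```python
# Hedged sketch: deform a glyph within a per-class parameter range, and
# tighten that range whenever the result looks like another class (e.g. a
# rotated '6' resembling a '9').

def generate_with_guard(char_class, conditions, deform, similar_to_other_class):
    """conditions: class -> (lo, hi) allowed deformation range, mutated in place.
    deform(char_class, param) -> pseudo character image (any object).
    similar_to_other_class(image, char_class) -> bool."""
    lo, hi = conditions[char_class]
    param = (lo + hi) / 2.0               # deterministic pick for the sketch
    image = deform(char_class, param)
    if similar_to_other_class(image, char_class):
        # Update the condition: shrink the range toward zero deformation,
        # and reject this sample.
        conditions[char_class] = (lo / 2.0, hi / 2.0)
        return None
    return image
```

Repeated rejections keep narrowing the class's condition until only training-safe deformations remain.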
Method and system for automatically classifying images
A processor of an automatic image classification server may perform a method for automatically classifying images. The method includes receiving partial or entire contents of a plurality of products from an online shopping website, classifying the received contents by product and storing the classified contents, extracting a plurality of product images of one product among the plurality of products from the stored contents, and automatically classifying the extracted product images of the one product into a plurality of categories to generate information for the one product. The information for the one product classifies the plurality of product images of the one product into each of the plurality of categories so that the classified product images can be provided as selectable options.
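The final categorization step can be sketched as grouping one product's images under classifier-assigned category keys. The classifier callback and category names are illustrative assumptions.

```python
# Hedged sketch: group a product's extracted images into display categories so
# each category's images can be offered for selection.

def categorize_images(images, classify):
    """images: list of image identifiers for one product.
    classify(image) -> category name.
    Returns {category: [image, ...]}."""
    by_category = {}
    for image in images:
        by_category.setdefault(classify(image), []).append(image)
    return by_category
```

In the disclosed system, `classify` would be the automatic classifier and the resulting mapping would back the per-category selectable views of the product's images.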
Enhanced training of machine learning systems based on automatically generated realistic gameplay information
Systems and methods for enhanced training of machine learning systems based on automatically generated visually realistic gameplay. An example method includes obtaining electronic game data that includes rendered images and associated annotation information, the annotation information identifying features included in the rendered images to be learned, and the electronic game data being generated by a video game associated with a particular sport. Machine learning models are trained based on the obtained electronic game data, with training including causing the machine learning models to output annotation information based on associated input of a rendered image. Real-world gameplay data is obtained, with the real-world gameplay data being images of real-world gameplay of the particular sport. The obtained real-world gameplay data is analyzed based on the trained machine learning models. Analyzing includes extracting features from the real-world gameplay data using the machine learning models.
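The two-phase flow, train on engine-annotated rendered frames, then extract features from real footage, can be sketched as below. The nearest-neighbor "learner" is a trivial stand-in for the machine learning models; every name here is illustrative.

```python
# Hedged sketch: fit a predictor on (rendered frame features, annotation)
# pairs produced by a game engine, then apply it to real-world frames.

def train_on_game_data(examples):
    """examples: list of (frame_feature, annotation) pairs from the game.
    Returns a predictor answering with the nearest seen feature's annotation."""
    def predict(feature):
        best = min(examples, key=lambda ex: abs(ex[0] - feature))
        return best[1]
    return predict

def analyze_real_gameplay(frames, predictor):
    """Extract annotation-style features from real-world frame features."""
    return [predictor(f) for f in frames]
```

The point of the scheme is that the game engine supplies unlimited labeled training data for free, while the trained model transfers to real broadcast footage of the same sport.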
Systems and methods for processing audiovisual data using latent codes from generative networks and models
Systems and methods for viewing, storing, transmitting, searching, and editing application-specific audiovisual content (or other unstructured data) are disclosed in which edge devices generate content on the fly from a partial set of instructions rather than merely accessing the content in its final or near-final form. An image processing architecture may include a generative model that may be a deep learning model. The generative model may include a latent space comprising a plurality of latent codes and a trained generator mapping. The trained generator mapping may convert points in the latent space to uncompressed data points, which in the case of audiovisual content may be generated image frames. The generative model may be capable of closely approximating (up to noise or perceptual error) most or all potential data points in the relevant compression application, which in the case of audiovisual content may be source images.
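The latent-code-plus-generator idea can be sketched with a linear toy generator: the edge device stores only a short code and reconstructs the full data point through a shared mapping. The linear combination stands in for the trained deep generative model; all names are assumptions.

```python
# Hedged sketch: a generator mapping that converts points in a latent space
# (short coefficient vectors) into full uncompressed data points, so edge
# devices transmit codes rather than final content.

def make_generator(basis):
    """basis: list of 'direction' vectors defining the generator's range.
    Returns generate(latent_code) -> data point (list of floats)."""
    def generate(latent_code):
        dim = len(basis[0])
        return [sum(c * b[i] for c, b in zip(latent_code, basis))
                for i in range(dim)]
    return generate
```

Compression then amounts to finding, for a source data point, the latent code whose generated output approximates it up to an acceptable perceptual error, which is the search the abstract's architecture presupposes.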