Patent classifications
G06V10/7747
Landing tracking control method and system based on lightweight twin network and unmanned aerial vehicle
A landing tracking control method comprises a tracking model training stage and an unmanned aerial vehicle real-time tracking stage. The method uses a modified version of the lightweight feature extraction network Snet, so that feature extraction is faster and better meets the real-time requirement. Weights are allocated according to the importance of channel information, so that effective features are differentiated and utilized more purposefully and the tracking precision is improved. To improve the training effect of the network, the loss function of the RPN network is optimized: the regression precision of the target box is measured with CIOU, the calculation of the classification loss function is adjusted according to CIOU, and the relation between the regression network and the classification network is thereby strengthened.
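The abstract does not disclose the exact loss formulation, but CIOU has a standard published definition (IoU penalized by normalized center distance and an aspect-ratio consistency term). A minimal sketch of a CIOU-based box regression loss, assuming the conventional definition rather than the patent's exact variant:

```python
import math

def ciou(box_a, box_b):
    """Complete IoU between two boxes given as (x1, y1, x2, y2).

    CIoU = IoU - d^2/c^2 - alpha*v, where d is the distance between box
    centers, c the diagonal of the smallest enclosing box, and v a term
    penalizing aspect-ratio mismatch.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection over union.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter)

    # Normalized center distance over the enclosing-box diagonal.
    cx_a, cy_a = (ax1 + ax2) / 2, (ay1 + ay2) / 2
    cx_b, cy_b = (bx1 + bx2) / 2, (by1 + by2) / 2
    ex1, ey1 = min(ax1, bx1), min(ay1, by1)
    ex2, ey2 = max(ax2, bx2), max(ay2, by2)
    d2 = (cx_a - cx_b) ** 2 + (cy_a - cy_b) ** 2
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2

    # Aspect-ratio consistency term.
    v = (4 / math.pi ** 2) * (
        math.atan((ax2 - ax1) / (ay2 - ay1))
        - math.atan((bx2 - bx1) / (by2 - by1))
    ) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return iou - d2 / c2 - alpha * v

def ciou_loss(pred, target):
    """Regression loss: 0 for a perfect box, growing with misalignment."""
    return 1.0 - ciou(pred, target)
```

A perfectly predicted box yields zero loss, while disjoint boxes are still penalized (unlike plain 1 - IoU, which saturates at 1), which is what makes CIOU attractive for measuring regression precision.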
Detecting user interface elements in robotic process automation using convolutional neural networks
Graphical elements in a user interface (UI) may be detected in robotic process automation (RPA) using convolutional neural networks (CNNs). Such processes may be particularly well-suited for detecting graphical elements that are too small to be detected using conventional techniques. The accuracy of detecting graphical elements (e.g., control objects) may be enhanced by providing neural network-based processing that is robust to changes in various UI factors, such as different resolutions, different operating system (OS) scaling factors, different dots-per-inch (DPI) settings, and changes due to UI customization of applications and websites, for example.
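One of the UI factors listed above, OS scaling/DPI, can be neutralized by mapping detections from physical screen pixels back to logical UI coordinates. A trivial sketch of that normalization step; the function name and box convention are assumptions for illustration, not the patent's API:

```python
def to_logical(box, os_scale, dpi_scale=1.0):
    """Map a bounding box (x, y, w, h) from physical screen pixels to
    logical UI coordinates, so that detections of the same control are
    comparable across OS scaling factors and DPI settings.
    """
    factor = os_scale * dpi_scale
    x, y, w, h = box
    return (x / factor, y / factor, w / factor, h / factor)
```

For example, a 40x40-pixel control detected at 200% OS scaling maps back to a 20x20 logical-unit control, matching what would be detected at 100% scaling.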
Deep face recognition based on clustering over unlabeled face data
A computer-implemented method for implementing face recognition includes obtaining a face recognition model trained on labeled face data, separating, using a mixture of probability distributions, a plurality of unlabeled faces corresponding to unlabeled face data into a set of one or more overlapping unlabeled faces that include overlapping identities to those in the labeled face data and a set of one or more disjoint unlabeled faces that include disjoint identities to those in the labeled face data, clustering the one or more disjoint unlabeled faces using a graph convolutional network to generate one or more cluster assignments, generating a clustering uncertainty associated with the one or more cluster assignments, and retraining the face recognition model on the labeled face data and the unlabeled face data to improve face recognition performance by incorporating the clustering uncertainty.
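The abstract does not specify how the clustering uncertainty is computed or incorporated; one natural reading is a normalized entropy over soft cluster assignments, used to down-weight uncertain pseudo-labels during retraining. A minimal sketch under that assumption:

```python
import math

def cluster_uncertainty(soft_assignment):
    """Entropy of a soft cluster-assignment distribution, normalized to
    [0, 1]; 0 means the face is confidently assigned to one cluster,
    1 means the assignment is maximally ambiguous.
    """
    k = len(soft_assignment)
    if k < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in soft_assignment if p > 0)
    return entropy / math.log(k)

def retraining_weight(soft_assignment):
    """Down-weight pseudo-labeled samples with uncertain cluster
    assignments when retraining the face recognition model."""
    return 1.0 - cluster_uncertainty(soft_assignment)
```

A face assigned with probability 1 to a single cluster gets full weight; a face split evenly across clusters gets weight 0 and contributes nothing to retraining.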
Information processing apparatus, information processing method, vehicle, information processing server, and storage medium
An information processing apparatus recognizes a target within an actual image by executing processing of a neural network. The information processing apparatus obtains intermediate outputs which correspond to the actual image and a computer graphics (CG) image and which are from a hidden layer when each of the actual image and the CG image has been separately input to the neural network, and causes the neural network to perform learning with use of an evaluation value based on a first evaluation function and a second evaluation function, the first evaluation function causing the evaluation value to decrease as a difference between a recognition result and training data decreases, the second evaluation function causing the evaluation value to decrease as a difference between the intermediate outputs corresponding to the actual image and the CG image decreases.
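The two evaluation functions combine into a single training objective: a task term plus a term that pulls the hidden-layer activations for a real image and its CG counterpart together. A minimal sketch, with the weighting factor assumed (the abstract does not state how the two terms are balanced):

```python
def mse(a, b):
    """Mean squared difference between two flat activation vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def training_objective(recognition_loss, hidden_real, hidden_cg, weight=1.0):
    """First term: decreases as the recognition result approaches the
    training data. Second term: decreases as the hidden-layer outputs
    for the actual image and the CG image converge, encouraging
    domain-invariant intermediate features.
    """
    return recognition_loss + weight * mse(hidden_real, hidden_cg)
```

Minimizing the second term means the network learns features that do not distinguish real imagery from CG, which is what lets CG training data transfer to real-image recognition.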
Dynamic image recognition and training using data center resources and data
A system, method, and computer-readable medium are disclosed for creating image recognition models that can be operated on a smartphone or similar device. The smartphone captures images of hardware in a data center. The captured images are processed to produce a full set of annotated images. The full set is reduced to a simplified set, which is used to train a mobile image recognition model implemented by the smartphone or similar device.
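The abstract does not say how the full annotated set is reduced; one common heuristic is near-duplicate removal via perceptual-hash distance, sketched below. The threshold and greedy strategy are assumptions for illustration, not the patent's procedure:

```python
def hamming(h1, h2):
    """Number of differing bits between two integer hashes."""
    return bin(h1 ^ h2).count("1")

def minimize_set(hashes, min_distance=8):
    """Greedily keep only images whose (perceptual) hash differs from
    every already-kept hash by at least `min_distance` bits, shrinking
    a full annotated set to a simplified training set.
    """
    kept = []
    for h in hashes:
        if all(hamming(h, k) >= min_distance for k in kept):
            kept.append(h)
    return kept
```

Dropping near-duplicate frames of the same rack or bezel keeps the simplified set small enough to train a model that fits on a mobile device without losing visual diversity.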
SYSTEMS AND METHODS FOR AUTOMATICALLY SOURCING CORPORA OF TRAINING AND TESTING DATA SAMPLES FOR TRAINING AND TESTING A MACHINE LEARNING MODEL
A system and method of curating machine learning training data for improving a predictive accuracy of a machine learning model includes sourcing training data samples based on seeding instructions; returning a corpus of unlabeled training data samples based on a search of data repositories; assigning a distinct classification label to each of the training data samples of the corpus; computing efficacy metrics for an in-scope corpus of labeled training data samples derived from a subset of training data samples of the corpus that have been assigned one of the plurality of distinct classification labels, wherein the efficacy metrics identify whether the in-scope corpus of labeled training data samples is suitable for training a target machine learning model; and routing the in-scope corpus of labeled training data samples based on the efficacy metrics.
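The abstract leaves the efficacy metrics unspecified; plausible stand-ins are corpus size, per-class coverage, and class balance, with routing gated on thresholds. A minimal sketch under those assumptions (metric choices and thresholds are illustrative, not the patent's):

```python
from collections import Counter

def efficacy_metrics(labels):
    """Simple corpus-level metrics: total size, per-class counts, and a
    balance ratio (smallest class over largest class)."""
    counts = Counter(labels)
    return {
        "size": len(labels),
        "coverage": dict(counts),
        "balance": min(counts.values()) / max(counts.values()),
    }

def route_corpus(labels, min_size=100, min_balance=0.1):
    """Route the in-scope corpus to training only if the metrics say it
    is suitable for the target model; otherwise send it back for more
    sourcing and labeling."""
    m = efficacy_metrics(labels)
    if m["size"] >= min_size and m["balance"] >= min_balance:
        return "train"
    return "resample"
```

The gate prevents a badly skewed or undersized corpus from ever reaching the target model, which is the point of computing efficacy before routing.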
METHOD AND SYSTEM FOR FASHION ATTRIBUTE DETECTION
Traditional systems used for fashion attribute detection struggle to generate accurate predictions due to the presence of large intra-class and relatively small inter-class variations in data related to the fashion attributes. The disclosure herein generally relates to image processing, and, more particularly, to a method and system for fashion attribute detection. The method proposes F-AttNet, an attribute extraction network that improves the performance of fine-grained localized fashion attribute recognition. F-AttNet comprises Attentive Multi-scale Feature Encoder (AMF) blocks that encapsulate multi-scale fine-grained attribute information upon adaptive recalibration of channel weights. F-AttNet is designed by hierarchically stacking the AMF encoders to extract deep fine-grained information across multiple scales. A data model used by F-AttNet is trained using a novel γ-variant focal loss function for addressing the class imbalance problem by penalizing wrongly classified examples and incorporating separate importance to positive and negative instances.
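The patent's exact γ-variant is not given in the abstract, but the standard focal loss with separate positive/negative importance weights shows the mechanism it builds on. A minimal sketch, assuming the conventional formulation (the specific γ schedule of the γ-variant is not public, so a fixed γ is used):

```python
import math

def focal_loss(p, y, gamma=2.0, alpha_pos=0.75, alpha_neg=0.25):
    """Binary focal loss with separate importance for positive and
    negative instances. p is the predicted probability of the positive
    class, y the ground-truth label (1 or 0). Setting gamma=0 and
    alpha_pos=alpha_neg=1 recovers plain cross-entropy; larger gamma
    down-weights well-classified examples so hard, wrongly classified
    ones dominate the gradient.
    """
    if y == 1:
        return -alpha_pos * (1 - p) ** gamma * math.log(p)
    return -alpha_neg * p ** gamma * math.log(1 - p)
```

The separate alpha weights address class imbalance directly (rare positive attributes count more), while the γ exponent penalizes confidently wrong predictions, matching the two goals stated in the abstract.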
Method and apparatus for sampling training data and computer server
The present disclosure provides a method and an apparatus for sampling training data and a computer server. The method includes: inputting a video to a target detection model to obtain a detection result for each frame of image; inputting the detection results for all frames of images in the video to a target tracking model, to obtain a tracking result for each frame of image; and for each frame of image in the video: matching the detection result and the tracking result for the frame of image, and when the detection result and the tracking result for the frame of image are inconsistent with each other, determining the frame of image as a sample image to be marked, since processing of that frame by the target detection model is not optimal.
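The matching step can be sketched with a simple IoU comparison per frame: whenever the detection box and the tracking box disagree (low overlap, or a box present on only one side), the frame is flagged for annotation. The threshold and one-box-per-frame shape are assumptions for illustration:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def frames_to_annotate(detections, tracks, iou_threshold=0.5):
    """Flag every frame whose detection and tracking results are
    inconsistent as a sample image to be marked for relabeling."""
    flagged = []
    for idx, (det, trk) in enumerate(zip(detections, tracks)):
        if det is None or trk is None:
            if det is not trk:  # one side has a box, the other does not
                flagged.append(idx)
        elif iou(det, trk) < iou_threshold:
            flagged.append(idx)
    return flagged
```

Frames where detector and tracker agree are skipped, so annotation effort is spent only where the detection model is likely to be wrong.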
ARTIFICIAL INTELLIGENCE FOR CAPTURING FACIAL EXPRESSIONS AND GENERATING MESH DATA
Methods and systems are provided for training a model used for animating a facial expression of a game character. The method includes capturing mesh data of a first human actor using a three-dimensional (3D) camera to generate three-dimensional (3D) depth data of a face of the first human actor. In one embodiment, the 3D depth data is output as mesh files corresponding to a frame captured by the 3D camera. The method includes capturing two-dimensional (2D) point cloud data of the first human actor using a 2D camera. In one embodiment, the 2D point cloud data represents tracked dots present on the face of the first human actor. In another embodiment, the 2D point cloud data is processed to generate training label value files (tLVFs). The method includes processing the mesh data in time coordination with the tLVFs associated with the 2D point cloud data to train the model. The model is configured to receive input mesh files captured from a second human actor and generate as output LVFs corresponding to the input mesh files.
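The "time coordination" between the 3D mesh stream and the 2D-derived tLVFs amounts to pairing frames from two capture devices by timestamp. A minimal sketch of nearest-timestamp alignment; the timestamp representation and tolerance are assumptions, since the abstract only states the streams are processed in time coordination:

```python
def align_frames(mesh_times, label_times, tolerance=1 / 60):
    """Pair each 3D mesh frame with the closest training label value
    file (tLVF) by timestamp, discarding pairs further apart than the
    tolerance (here one 60 fps frame interval). Returns a list of
    (mesh_index, label_index) pairs.
    """
    pairs = []
    for i, t in enumerate(mesh_times):
        j = min(range(len(label_times)), key=lambda k: abs(label_times[k] - t))
        if abs(label_times[j] - t) <= tolerance:
            pairs.append((i, j))
    return pairs
```

Each surviving pair supplies one (mesh input, label output) training example, so the model learns to map mesh files to the LVFs it must later generate for a second actor.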
METHOD AND DEVICE FOR TRAINING A STYLE ENCODER OF A NEURAL NETWORK AND METHOD FOR GENERATING A DRIVING STYLE REPRESENTATION REPRESENTING A DRIVING STYLE OF A DRIVER
A method for training a style encoder of a neural network. Sensory input variables, which represent a movement of a system and surroundings of the system, are compressed to an abstract driving situation representation in at least one portion of a latent space of the neural network, using a trained situation encoder of the neural network. The sensory input variables are compressed to a driving style representation in at least one portion of the latent space, using the untrained style encoder. The driving style representation and the driving situation representation are decompressed from the latent space to output variables, using a style decoder of the neural network. The structure of the style encoder is changed during training until the output variables of the style decoder represent the movement.
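The data flow above splits the latent space into a situation portion (from the frozen, pre-trained encoder) and a style portion (from the encoder being trained), then decodes both together. A structural sketch in which the three networks are placeholder callables and the reconstruction criterion drives the style encoder's training; all names are illustrative assumptions:

```python
def encode_and_decode(x, situation_encoder, style_encoder, style_decoder):
    """Compress the sensory input into the two latent-space portions
    and decompress them back to output variables. situation_encoder is
    trained and frozen; style_encoder is the network being trained.
    """
    z_situation = situation_encoder(x)   # abstract driving situation
    z_style = style_encoder(x)           # driving style representation
    z = z_situation + z_style            # concatenated latent portions
    return style_decoder(z)

def reconstruction_loss(output, target):
    """Training criterion: the style encoder is adapted until the
    decoded output variables represent the observed movement."""
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(target)
```

Because the situation portion is fixed, any information the decoder still needs to reconstruct the movement must be captured by the style portion, which is what makes the learned representation specifically a driving-style representation.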