Patent classifications
G06V10/7753
LEARNING APPARATUS, LEARNING METHOD, AND NON-TRANSITORY COMPUTER-READABLE MEDIUM IN WHICH LEARNING PROGRAM HAS BEEN STORED
A learning apparatus (500) according to the present invention includes: a detection unit (510) that detects, as a candidate region of a learning target, a region that is detected by one of first detection processing (detecting an object region from a predetermined image) and second detection processing (detecting a change region from background image information and the image) but not detected by the other; an output unit (520) that outputs at least a part of the candidate region as a labeling target; and a learning unit (530) that trains a model for performing the first detection processing or a model for performing the second detection processing by using the labeled candidate region as learning data.
Neural style transfer for image varietization and recognition
Systems and methods for image recognition are provided. A style-transfer neural network is trained for each real image to obtain a trained style-transfer neural network. The texture or style features of the real images are transferred, via the trained style-transfer neural network, to a target image to generate styled images which are used for training an image-recognition machine learning model (e.g., a neural network). In some cases, the real images are clustered and representative style images are selected from the clusters.
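The clustering step described above can be sketched as follows. This is an illustrative assumption, not the patent's specified method: style feature vectors, the k-means routine, and the Euclidean distance metric are all stand-ins for whatever the disclosure actually uses.

```python
# Hypothetical sketch: cluster real-image style features and pick one
# representative style image per cluster, as the abstract describes.
# Feature vectors, cluster count, and the distance metric are assumptions.
import math

def dist(a, b):
    # Euclidean distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=10):
    centroids = points[:k]  # deterministic init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[i].append(p)
        centroids = [
            [sum(vals) / len(c) for vals in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

def representatives(points, k):
    # For each cluster, the representative style image is the member
    # closest to the cluster centroid.
    centroids, clusters = kmeans(points, k)
    return [min(c, key=lambda p: dist(p, cen))
            for cen, c in zip(centroids, clusters) if c]
```

Each representative then serves as the style source for generating styled training images.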
SYSTEMS, METHODS, AND APPARATUSES FOR IMPLEMENTING ADVANCEMENTS TOWARDS ANNOTATION EFFICIENT DEEP LEARNING IN COMPUTER-AIDED DIAGNOSIS
Embodiments described herein include systems for implementing annotation-efficient deep learning in computer-aided diagnosis. Exemplary embodiments include systems having a processor and a memory specially configured with instructions for learning annotation-efficient deep learning from non-labeled medical images to generate a trained deep-learning model by applying a multi-phase model training process. The process includes: pre-training a model by executing a one-time learning procedure using an initial annotated image dataset; iteratively re-training the model by executing a fine-tuning learning procedure using newly available annotated images without re-using any images from the initial annotated image dataset; selecting a plurality of the most representative samples related to images of the initial annotated image dataset and the newly available annotated images by executing an active selection procedure based on which of a collection of un-annotated images exhibit either the greatest uncertainty or the greatest entropy; extracting generic image features; updating the model using the extracted generic image features; and outputting the model as the trained deep-learning model for use in analyzing a patient medical image. Other related embodiments are disclosed.
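The entropy-based active selection step above can be sketched as ranking un-annotated samples by predictive entropy and taking the most uncertain ones. The probability vectors, sample identifiers, and batch size here are illustrative assumptions.

```python
# Hypothetical sketch of active selection by predictive entropy:
# rank un-annotated samples and keep the k most uncertain ones.
import math

def entropy(probs):
    # Shannon entropy of a predicted class-probability distribution;
    # higher entropy means the model is less certain.
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(predictions, k):
    # predictions: {sample_id: class-probability vector}
    ranked = sorted(predictions,
                    key=lambda s: entropy(predictions[s]),
                    reverse=True)
    return ranked[:k]
```

For example, a sample predicted at [0.5, 0.5] outranks one predicted at [0.98, 0.02], so it would be sent for annotation first.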
LEVERAGING UNSUPERVISED META-LEARNING TO BOOST FEW-SHOT ACTION RECOGNITION
The disclosure herein describes preparing and using a cross-attention module for action recognition using pre-trained encoders and novel-class fine-tuning. Training video data is transformed into augmented training video segments, which are used to train an appearance encoder and an action encoder. The appearance encoder is trained to encode video segments based on spatial semantics, and the action encoder is trained to encode video segments based on spatio-temporal semantics. A set of hard-mined training episodes is generated using the trained encoders. The cross-attention module is then trained for action-appearance aligned classification using the hard-mined training episodes. Next, support video segments are obtained, wherein each support video segment is associated with video classes. The cross-attention module is fine-tuned using the obtained support video segments and the associated video classes. A query video segment is obtained and classified as a video class using the fine-tuned cross-attention module.
Learning systems and methods
A sequence of images depicting an object is captured, e.g., by a camera at a point-of-sale terminal in a retail store. The object is identified, such as by a barcode or watermark that is detected from one or more of the images. Once the object's identity is known, such information is used in training a classifier (e.g., a machine learning system) to recognize the object from others of the captured images, including images that may be degraded by blur, inferior lighting, etc. In another arrangement, such degraded images are processed to identify feature points useful in fingerprint-based identification of the object. Feature points extracted from such degraded imagery aid in fingerprint-based recognition of objects under real life circumstances, as contrasted with feature points extracted from pristine imagery (e.g., digital files containing label artwork for such objects). A great variety of other features and arrangements—some involving designing classifiers so as to combat classifier copying—are also detailed.
Segmentation to determine lane markings and road signs
Systems and methods for lane marking and road sign recognition are provided. The system aligns image level features between a source domain and a target domain based on an adversarial learning process while training a domain discriminator. The target domain includes one or more road scenes having lane markings and road signs. The system selects, using the domain discriminator, unlabeled samples from the target domain that are far away from existing annotated samples from the target domain. The system selects, based on a prediction score of each of the unlabeled samples, samples with lower prediction scores. The system annotates the samples with the lower prediction scores.
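The two selection steps above can be sketched together: filter unlabeled target samples that lie far from every annotated sample in feature space, then keep those with the lowest prediction scores for annotation. The feature vectors, distance threshold, and score values below are illustrative assumptions, not the patent's specified quantities.

```python
# Hypothetical sketch of the sample-selection logic: keep unlabeled
# target samples far from all annotated samples, then take those with
# the lowest prediction scores as annotation candidates.
import math

def dist(a, b):
    # Euclidean distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_for_annotation(unlabeled, annotated_feats, min_dist, k):
    # unlabeled: list of (feature_vector, prediction_score) pairs
    far = [
        (feat, score) for feat, score in unlabeled
        if all(dist(feat, a) >= min_dist for a in annotated_feats)
    ]
    far.sort(key=lambda fs: fs[1])  # lowest prediction score first
    return far[:k]
```

Samples that survive both filters are the ones the model knows least about, so annotating them yields the most new information.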
Shuffle, attend, and adapt: video domain adaptation by clip order prediction and clip attention alignment
A method for performing video domain adaptation for human action recognition is presented. The method includes using annotated source data from a source video and unannotated target data from a target video in an unsupervised domain adaptation setting, identifying and aligning discriminative clips in the source and target videos via an attention mechanism, and learning spatial-background invariant human action representations by employing a self-supervised clip order prediction loss for both the annotated source data and the unannotated target data.
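The self-supervised clip-order pretext task above can be sketched as follows: shuffle a video's clips and let the permutation index serve as the prediction label. How clips are represented and which permutation set is used are assumptions for illustration.

```python
# Hypothetical sketch of the clip-order prediction pretext task:
# shuffle the clips of a video and use the permutation index as the
# self-supervised label the model must predict.
import itertools
import random

def make_order_example(clips, rng):
    # Enumerate all orderings of the clips; the chosen permutation's
    # index is the classification label for the pretext task.
    perms = list(itertools.permutations(range(len(clips))))
    label = rng.randrange(len(perms))
    shuffled = [clips[i] for i in perms[label]]
    return shuffled, label
```

Because no human labels are needed, this loss can be applied to both the annotated source data and the unannotated target data, as the method describes.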
Process to learn new image classes without labels
Described is a system for learning object labels for control of an autonomous platform. Pseudo-task optimization is performed to identify an optimal pseudo-task for each of one or more source models. An initial target network is trained using the optimal pseudo-task. Source image components are extracted from the source models, and an attribute dictionary is generated from the source image components. Using zero-shot attribution distillation, the unlabeled target data is aligned with the source models most similar to it. The unlabeled target data are then mapped onto attributes in the attribute dictionary. A new target network is generated from the mapping and used to assign an object label to an object in the unlabeled target data. The autonomous platform is controlled based on the object label.
Rapid and accurate modeling of a building construction structure including estimates, detailing, and take-offs using artificial intelligence
Some embodiments relate to generating three-dimensional virtual representations of a building construction structure based on two-dimensional real-world construction plans, such as architectural plans or building plans. Some embodiments further produce autonomous, near real-time, highly accurate, and comprehensive building take-offs, complete construction detailing or estimates, detailed bills of materials, and plan analysis (including detection of a number of non-standardized objects, such as doors or windows), as well as transforming 2D drawings into 3D and/or providing Building Information Modeling (BIM). The two-dimensional real-world architectural plan can include: multivariate non-standardized architectural symbols, which define numerous objects including trees, bathrooms, doors, stairs, windows, and floor finishes; lines, including solid, hollow, dashed, and dotted lines, which define features including internal or external walls, windows, doors, stairs, property boundaries, easements, footpaths, rooflines, driveways, rights of way, paving stones, landscaping, water, power, drainage, and dimensions; shading and patterns, which define materials and areas on the plan; and text, which indicates the purposes of the rooms, dimensions, features, construction methods, and regulatory standards.
DATA CLASSIFICATION AND RECOGNITION METHOD AND APPARATUS, DEVICE, AND MEDIUM
A data classification and recognition method includes: obtaining a first data set including first data and a second data set including second data, the samples in the second data being labeled; performing training using the first data in an unsupervised training mode and the second data in a supervised training mode to obtain a first classification model; obtaining a second classification model; performing distillation training on a model parameter of the second classification model to obtain a data classification model; and performing class prediction on target data by using the data classification model.
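The distillation-training step above can be sketched with the standard soft-target formulation: soften the teacher's logits with a temperature and train the student toward them via KL divergence. The abstract does not specify the loss, so the KL form, the logit values, and the temperature below are illustrative assumptions.

```python
# Hypothetical sketch of distillation training: the second classification
# model acts as teacher and the data classification model as student,
# matched on temperature-softened output distributions.
import math

def softmax(logits, t=1.0):
    # Temperature-scaled softmax; higher t gives softer distributions.
    exps = [math.exp(z / t) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(teacher_logits, student_logits, t=2.0):
    # KL(teacher || student) on the softened distributions.
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

The loss is zero when the student already matches the teacher and grows as their softened predictions diverge, which is the gradient signal used to update the student's parameters.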