Patent classifications
G06V10/763
Automated Video Segmentation
Methods and systems for automated video segmentation are disclosed. A sequence of video frames having video segments of contextually-related sub-sequences may be received. Each frame may be labeled according to segment and segment class. A video graph may be constructed in which each node corresponds to a different frame, and each edge connects a different pair of nodes and is associated with a time between video frames and a similarity metric of the connected frames. An artificial neural network (ANN) may be trained to predict both labels for the nodes and clusters of the nodes corresponding to predicted membership among the segments, using the video graph as input to the ANN, and ground-truth clusters of ground-truth labeled nodes. The ANN may be further trained to predict segment classes of the predicted clusters, using the segment classes as ground truths. The trained ANN may be configured for application to runtime video sequences.
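The video graph described above can be illustrated with a minimal sketch. It assumes that each frame is already represented by a feature vector, that the graph is fully connected, and that cosine similarity serves as the similarity metric; the abstract does not specify any of these, and the names `build_video_graph` and `cosine_similarity` are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_video_graph(frame_features, timestamps):
    """Build a graph with one node per frame and one edge per frame pair.

    Each edge stores the time between the two frames and their similarity.
    Full connectivity is an assumption made for illustration only.
    """
    nodes = list(range(len(frame_features)))
    edges = {}
    for i, j in combinations(nodes, 2):
        edges[(i, j)] = {
            "dt": abs(timestamps[j] - timestamps[i]),
            "similarity": cosine_similarity(frame_features[i], frame_features[j]),
        }
    return nodes, edges


# Toy input: three frames with 2-D features, sampled every 0.5 s.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
times = [0.0, 0.5, 1.0]
nodes, edges = build_video_graph(features, times)
```

In this toy input, frames 0 and 1 are identical and frame 2 is orthogonal to both, so a clustering model fed this graph would have a strong signal to place a segment boundary before frame 2.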
Method for recognizing distribution network equipment based on Raspberry Pi multi-scale feature fusion
Disclosed is a method for recognizing distribution network equipment based on Raspberry Pi multi-scale feature fusion. The method includes obtaining an initial sample data set; constructing an object detection network composed of an EfficientNet-B0 backbone network, a multi-scale feature fusion module, and a regression classification prediction head; training the object detection network by taking the initial sample data set as a training sample; and finally, detecting inspection pictures by using the trained object detection network. The light-weight EfficientNet-B0 backbone network extracts more features of objects. Meanwhile, the introduction of multi-scale feature fusion better adapts the network to small object detection, and the light-weight y_pred regression classification detection head can be effectively deployed on Raspberry Pi embedded equipment with tight resources and limited computing power.
Systems and methods for generating music recommendations
Systems, methods, and non-transitory computer-readable media can be configured to determine a video embedding for a video content item based at least in part on a first machine learning model. A set of music embeddings can be determined for a set of music content items based at least in part on a second machine learning model. The set of music content items can be ranked based at least in part on the video embedding and the set of music embeddings.
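The ranking step can be sketched as follows. This is an illustration only: it assumes the two models have already produced embeddings in a shared vector space and that cosine similarity is the ranking score, neither of which the abstract states. The function name `rank_music` and the track names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (assumed scoring function)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_music(video_embedding, music_embeddings):
    """Return (track, score) pairs sorted from best to worst match."""
    scored = [
        (name, cosine(video_embedding, emb))
        for name, emb in music_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings standing in for the outputs of the two models.
video = [0.9, 0.1, 0.0]
music = {
    "track_a": [1.0, 0.0, 0.0],
    "track_b": [0.0, 1.0, 0.0],
    "track_c": [0.0, 0.0, 1.0],
}
ranking = rank_music(video, music)
```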
System and method for deep learning techniques utilizing continuous federated learning with a distributed data generative model
A computer implemented method is provided. The method includes establishing, via multiple processors, a continuous federated learning framework including a global model at a global site and respective local models derived from the global model at respective local sites. The method also includes retraining or retuning, via the multiple processors, the global model and the respective local models without sharing actual datasets between the global site and the respective local sites but instead sharing synthetic datasets generated from the actual datasets.
Cloud detection on remote sensing imagery
A system for detecting clouds and cloud shadows is described. In one approach, clouds and cloud shadows within a remote sensing image are detected through a three-stage process. In the first stage, a high-precision low-recall classifier is used to identify cloud seed pixels within the image. In the second stage, a low-precision high-recall classifier is used to identify potential cloud pixels within the image. Additionally, in the second stage, the cloud seed pixels are grown into the potential cloud pixels to identify clusters of pixels which have a high likelihood of representing clouds. In the third stage, a geometric technique is used to determine pixels which likely represent shadows cast by the clouds identified in the second stage. The clouds identified in the second stage and the shadows identified in the third stage are then exported as a cloud mask and shadow mask of the remote sensing image.
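The second-stage growing of seed pixels into potential cloud pixels can be sketched as a flood fill. This sketch assumes both classifiers have already produced binary masks and that growth follows 4-connected neighbors; the abstract does not specify the connectivity rule, and `grow_seeds` is a hypothetical name.

```python
from collections import deque


def grow_seeds(seed_mask, candidate_mask):
    """Grow seed pixels through adjacent candidate pixels (breadth-first).

    A candidate pixel is kept only if it is reachable from some seed via
    4-connected candidate pixels (the connectivity is an assumption).
    """
    rows, cols = len(seed_mask), len(seed_mask[0])
    cloud = [[False] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if seed_mask[r][c]:
                cloud[r][c] = True
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and candidate_mask[nr][nc] and not cloud[nr][nc]:
                cloud[nr][nc] = True
                queue.append((nr, nc))
    return cloud


# High-precision seeds (stage one) and high-recall candidates (stage two).
seeds = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
candidates = [[0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
mask = grow_seeds(seeds, candidates)
```

The isolated candidate in the bottom-right corner is discarded because no seed reaches it, which is how the high-precision stage suppresses false positives from the high-recall stage.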
Directed control transfer for autonomous vehicles
Techniques are described for cognitive analysis for directed control transfer for autonomous vehicles. In-vehicle sensors are used to collect cognitive state data for an individual within a vehicle which has an autonomous mode of operation. The cognitive state data includes infrared, facial, audio, or biosensor data. One or more processors analyze the cognitive state data collected from the individual to produce cognitive state information. The cognitive state information includes a subset or summary of cognitive state data, or an analysis of the cognitive state data. The individual is scored based on the cognitive state information to produce a cognitive scoring metric. A state of operation is determined for the vehicle. A condition of the individual is evaluated based on the cognitive scoring metric. Control is transferred between the vehicle and the individual based on the state of operation of the vehicle and the condition of the individual.
Map partition system for autonomous vehicles
In one embodiment, a system identifies a road to be navigated by an autonomous driving vehicle (ADV), the road being captured by one or more point clouds from one or more LIDAR sensors. The system extracts road marking information of the identified road from the point clouds, the road marking information describing one or more road markings of the identified road. The system partitions the road into one or more road partitions based on the road markings. The system generates a point cloud map based on the road partitions, where the point cloud map is utilized to perceive a driving environment surrounding the ADV.
GPU accelerated image segmentation
Data gathered from a continual data source is converted into an image and represented by a rectilinear grid defining a grid of pixels. Each pixel in each axis unit of one axis is examined in parallel using a graphics processing unit (GPU) to determine whether any pixels exceed a predefined threshold. Those pixels that exceed the threshold are identified as positive return pixels. Within each axis unit, groups of positive return pixels are identified based on a first axis epsilon. Adjacent groups of positive return pixels are assembled by merging the axis units based on a second axis epsilon. Groups of positive return pixels grouped together according to the first axis epsilon and the second axis epsilon are classified and reported as a signal.
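The thresholding and two-epsilon grouping can be sketched sequentially on the CPU. This is not the GPU implementation: it treats each image row as one axis unit, processes rows one after another rather than in parallel, and tracks each merged group only as a bounding box. The names `group_row`, `merge_rows`, `eps_x`, and `eps_y` are hypothetical, and the merge rule is a simplifying assumption.

```python
def group_row(values, threshold, eps_x):
    """Group above-threshold pixels of one row into [start, end] runs.

    Hits no more than eps_x pixels apart are joined (first axis epsilon).
    On a GPU, each row could be handled by an independent thread.
    """
    groups = []
    for x, value in enumerate(values):
        if value <= threshold:
            continue  # not a positive return pixel
        if groups and x - groups[-1][1] <= eps_x:
            groups[-1][1] = x
        else:
            groups.append([x, x])
    return groups


def merge_rows(image, threshold, eps_x, eps_y):
    """Merge row groups across rows into signals (second axis epsilon).

    Simplification: a group joins the first signal that is within eps_y
    rows and within eps_x columns of it; signals are never merged later.
    """
    signals = []
    for y, row in enumerate(image):
        for x0, x1 in group_row(row, threshold, eps_x):
            for s in signals:
                near_in_y = y - s["rows"][1] <= eps_y
                near_in_x = x0 <= s["cols"][1] + eps_x and x1 >= s["cols"][0] - eps_x
                if near_in_y and near_in_x:
                    s["rows"][1] = y
                    s["cols"] = [min(s["cols"][0], x0), max(s["cols"][1], x1)]
                    break
            else:
                signals.append({"rows": [y, y], "cols": [x0, x1]})
    return signals


image = [
    [0, 9, 9, 0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0, 0, 0, 9],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [9, 0, 0, 0, 0, 0, 0, 0],
]
signals = merge_rows(image, threshold=5, eps_x=1, eps_y=1)
```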
Radar-Based Gesture Classification Using a Variational Auto-Encoder Neural Network
In an embodiment, a method includes: obtaining one or more positional time spectrograms of a radar measurement of a scene comprising an object; and based on the one or more positional time spectrograms and based on a feature embedding of a variational auto-encoder neural network, predicting a gesture class of a gesture performed by the object.
Methods and systems for face recognition
Systems and methods for face recognition are provided. The systems may perform the methods to obtain a neural network comprising a first sub-neural network and a second sub-neural network; generate a plurality of preliminary feature vectors based on an image associated with a human face, the plurality of preliminary feature vectors comprising a color-based feature vector; obtain at least one input feature vector based on the plurality of preliminary feature vectors; generate a deep feature vector based on the at least one input feature vector using the first sub-neural network; and recognize the human face based on the deep feature vector.