Patent classifications
G06V10/806
Answering Questions During Video Playback
In implementations of answering questions during video playback, a video system can receive a question related to a video at a timepoint of the video during playback of the video, and determine audio sentences of the video that occur within a segment of the video that includes the timepoint. The video system can generate a classification vector from words of the question and the audio sentences, and determine an answer to the question utilizing the classification vector. The video system can obtain answer candidates, each represented by an answer vector, and select the answer to the question as one of the answer candidates by matching the classification vector to one of the answer vectors.
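The final matching step can be sketched as follows. This is a minimal illustration, not the claimed method: the abstract does not say how the classification vector is matched to the answer vectors, so cosine similarity is an assumption here, and the names `cosine_similarity`, `select_answer`, and the toy three-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_answer(classification_vector, answer_candidates):
    """Return the candidate text whose answer vector best matches the classification vector.

    answer_candidates maps candidate text to a (hypothetical) answer vector.
    """
    return max(
        answer_candidates,
        key=lambda text: cosine_similarity(classification_vector, answer_candidates[text]),
    )


# Toy data: in practice the vectors would come from trained encoders.
candidates = {
    "a red car": [1.0, 0.0, 0.0],
    "a blue boat": [0.0, 1.0, 0.0],
    "a green bike": [0.0, 0.0, 1.0],
}
best = select_answer([0.1, 0.9, 0.2], candidates)
print(best)
```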
Devices, systems, and methods for feature encoding
Devices, systems, and methods obtain data in a first modality; propagate the data in the first modality through a neural network, thereby generating network outputs, wherein the neural network includes a first-stage neural network and a second-stage neural network, wherein the first-stage neural network includes two or more layers, wherein each layer of the two or more layers of the first-stage neural network includes a plurality of respective nodes, wherein the second-stage neural network includes two or more layers, one of which is an input layer and one of which is an output layer, and wherein each node in each layer of the first-stage neural network is connected to the input layer of the second-stage neural network; calculate a gradient of a loss function based on the network outputs; backpropagate the gradient through the neural network; and update the neural network based on the backpropagation of the gradient.
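The connectivity and training loop described above can be sketched on a very small scale. The sketch below is illustrative only and rests on several assumptions not stated in the abstract: two tanh layers of two nodes each in the first stage, a second stage whose input layer is the concatenation of both first-stage layers and whose output layer is a single linear node, a squared-error loss, and plain gradient descent. All names and values are hypothetical.

```python
import math


def forward(x, W1, W2, v):
    # First stage: two tanh layers.
    h1 = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    h2 = [math.tanh(sum(w * hi for w, hi in zip(row, h1))) for row in W2]
    # Second-stage input layer: receives every node of every first-stage layer.
    z = h1 + h2
    # Second-stage output layer: one linear node.
    y = sum(vi * zi for vi, zi in zip(v, z))
    return h1, h2, z, y


def train_step(x, target, W1, W2, v, lr=0.1):
    """One forward pass, backpropagation, and in-place update. Returns the loss before the update."""
    h1, h2, z, y = forward(x, W1, W2, v)
    loss = 0.5 * (y - target) ** 2
    dy = y - target                      # dLoss/dy
    grad_v = [dy * zi for zi in z]       # gradient for the output-layer weights
    dz = [dy * vi for vi in v]           # gradient arriving at the second-stage input layer
    n1 = len(h1)
    # Layer 2 receives gradient only through its direct link to the second stage.
    da2 = [dz[n1 + i] * (1.0 - h2[i] ** 2) for i in range(len(h2))]
    # Layer 1 receives gradient through its direct link and through layer 2.
    dh1 = [dz[j] + sum(W2[i][j] * da2[i] for i in range(len(h2))) for j in range(n1)]
    da1 = [dh1[j] * (1.0 - h1[j] ** 2) for j in range(n1)]
    # Update all parameters only after every gradient has been computed.
    for i in range(len(v)):
        v[i] -= lr * grad_v[i]
    for i in range(len(W2)):
        for j in range(n1):
            W2[i][j] -= lr * da2[i] * h1[j]
    for i in range(len(W1)):
        for j in range(len(x)):
            W1[i][j] -= lr * da1[i] * x[j]
    return loss


W1 = [[0.1, -0.2], [0.3, 0.4]]
W2 = [[0.2, 0.1], [-0.1, 0.3]]
v = [0.1, -0.1, 0.2, 0.1]
losses = [train_step([1.0, -0.5], 0.5, W1, W2, v) for _ in range(50)]
print(losses[0], losses[-1])
```

Because each first-stage layer feeds the second stage directly, the earlier layer receives gradient along two paths, which is visible in the two terms of `dh1`.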
Evaluating content on social media networks
A system and method may be used to evaluate content on one or more social media networks. A deep learning model may be stored. A communication may be received, that has been or is to be communicated on a social network. The deep learning model may be applied to the communication to obtain an automated evaluation of the communication. User input may be received, and may include a user evaluation of the communication. The user evaluation may be applied to train the deep learning model. The steps of receiving the communication, applying the deep learning model to obtain the automated evaluation, receiving the user evaluation, and applying the user evaluation to train the model, may be iterated to enhance the accuracy of the automated evaluations.
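The iterate-and-retrain loop can be sketched as below. To keep the example self-contained, a tiny logistic model over word counts stands in for the deep learning model; that substitution, the class name `TinyEvaluator`, the learning rate, and the sample communications are all assumptions made for illustration.

```python
import math


class TinyEvaluator:
    """Stand-in for the stored model: logistic regression over bag-of-words features."""

    def __init__(self, lr=0.5):
        self.weights = {}
        self.bias = 0.0
        self.lr = lr

    def evaluate(self, text):
        """Automated evaluation: a score in (0, 1), higher meaning more objectionable."""
        score = self.bias + sum(self.weights.get(w, 0.0) for w in text.lower().split())
        return 1.0 / (1.0 + math.exp(-score))

    def train(self, text, user_label):
        """Apply one user evaluation (0.0 or 1.0) as a single gradient step."""
        error = self.evaluate(text) - user_label
        for w in text.lower().split():
            self.weights[w] = self.weights.get(w, 0.0) - self.lr * error
        self.bias -= self.lr * error


model = TinyEvaluator()
# Hypothetical communications paired with user evaluations.
feedback = [("buy cheap pills now", 1.0), ("lunch at noon tomorrow", 0.0)]

for _ in range(20):                         # iterate: receive, evaluate, get feedback, train
    for communication, user_evaluation in feedback:
        automated = model.evaluate(communication)   # automated evaluation before feedback
        model.train(communication, user_evaluation)

flagged_score = model.evaluate("cheap pills")
benign_score = model.evaluate("lunch tomorrow")
print(flagged_score, benign_score)
```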
BIDIRECTIONAL COMPACT DEEP FUSION NETWORKS FOR MULTIMODALITY VISUAL ANALYSIS APPLICATIONS
Techniques related to bidirectional compact deep fusion networks for multimodal image inputs are discussed. Such techniques include applying a shared convolutional layer and independent batch normalization layers to input volumes for each modality and fusing features from the resultant output volumes in both directions across the modalities.
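A one-dimensional toy version of this arrangement is sketched below. It is only a rough illustration: real inputs would be multi-channel image volumes, and the abstract does not specify the fusion operator, so the additive cross-modality exchange in `fuse_bidirectional` and its `alpha` weight are assumptions. All names and signal values are hypothetical.

```python
import math


def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation form); the same kernel is shared by both modalities."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) for i in range(len(signal) - k + 1)]


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize with this modality's own statistics and its own scale and shift."""
    mean = sum(values) / len(values)
    var = sum((val - mean) ** 2 for val in values) / len(values)
    return [gamma * (val - mean) / math.sqrt(var + eps) + beta for val in values]


def fuse_bidirectional(feat_a, feat_b, alpha=0.5):
    """Exchange features in both directions: each modality receives a scaled copy of the other."""
    fused_a = [a + alpha * b for a, b in zip(feat_a, feat_b)]
    fused_b = [b + alpha * a for a, b in zip(feat_a, feat_b)]
    return fused_a, fused_b


shared_kernel = [0.25, 0.5, 0.25]                 # one set of convolution weights
rgb = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]              # toy signal for the first modality
depth = [10.0, 10.0, 12.0, 12.0, 14.0, 14.0]      # toy signal for the second modality

rgb_feat = batch_norm(conv1d(rgb, shared_kernel), gamma=1.0, beta=0.0)
depth_feat = batch_norm(conv1d(depth, shared_kernel), gamma=2.0, beta=0.0)
fused_rgb, fused_depth = fuse_bidirectional(rgb_feat, depth_feat)
print(fused_rgb, fused_depth)
```

Sharing the kernel keeps the parameter count low, while separate normalization lets each modality keep its own value range before the features are mixed.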
METHODS AND SYSTEMS FOR AUGMENTING DEPTH DATA FROM A DEPTH SENSOR, SUCH AS WITH DATA FROM A MULTIVIEW CAMERA SYSTEM
Methods of determining the depth of a scene and associated systems are disclosed herein. In some embodiments, a method can include augmenting depth data of a scene captured with a depth sensor with depth data from one or more images of the scene. For example, the method can include capturing image data of the scene with a plurality of cameras. The method can further include generating a point cloud representative of the scene based on the depth data from the depth sensor and identifying a missing region of the point cloud, such as a region occluded from the view of the depth sensor. The method can then include generating depth data for the missing region based on the image data. Finally, the depth data for the missing region can be merged with the depth data from the depth sensor to generate a merged point cloud representative of the scene.
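The identify-and-merge steps can be sketched on small depth maps. This is a simplified illustration under stated assumptions: missing sensor measurements are marked with `None`, the image-derived depth is already available on the same pixel grid, and points are back-projected with a hypothetical pinhole model whose principal point sits at pixel (0, 0). Function names and values are invented for the example.

```python
def find_missing(sensor_depth):
    """Return (row, col) positions where the depth sensor produced no measurement."""
    return [
        (r, c)
        for r, row in enumerate(sensor_depth)
        for c, d in enumerate(row)
        if d is None
    ]


def merge_depth(sensor_depth, image_depth):
    """Keep sensor depth where available; fill missing pixels from image-derived depth."""
    merged = [row[:] for row in sensor_depth]
    for r, c in find_missing(sensor_depth):
        merged[r][c] = image_depth[r][c]
    return merged


def to_point_cloud(depth, focal=1.0):
    """Back-project a depth map to (x, y, z) points, skipping pixels without depth."""
    return [
        (c * d / focal, r * d / focal, d)
        for r, row in enumerate(depth)
        for c, d in enumerate(row)
        if d is not None
    ]


sensor_depth = [[1.0, 1.1, None], [1.0, None, None]]       # None marks the occluded region
image_depth = [[1.05, 1.12, 1.3], [0.98, 1.2, 1.35]]       # e.g. estimated from multiview images

missing = find_missing(sensor_depth)
merged = merge_depth(sensor_depth, image_depth)
cloud = to_point_cloud(merged)
print(missing, len(cloud))
```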
METHOD AND APPARATUS FOR OBSTACLE DETECTION UNDER COMPLEX WEATHER
The present invention discloses a method and an apparatus for obstacle detection under complex weather. The method includes: obtaining an image under a complex weather condition; performing enhanced preprocessing on the image by using a multi-scale retinex with color restoration (MSRCR) algorithm; inputting the preprocessed image into a trained obstacle detection model based on an improved YOLOv3 network; and determining an obstacle detection result under the complex weather according to the output of the obstacle detection model. The improved YOLOv3 network is obtained by replacing the Leaky-ReLU activation function in the convolutional layers of the original YOLOv3 network with an ELU activation function, and the obstacle detection model is trained with the processed data set to obtain the trained obstacle detection model based on the improved YOLOv3 network.
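The activation swap can be illustrated with the standard definitions of the two functions. The negative slope of 0.1 and the ELU `alpha` of 1.0 below are common defaults and are assumptions here; the abstract does not state the values used.

```python
import math


def leaky_relu(x, slope=0.1):
    """Leaky ReLU: identity for positive inputs, a small linear slope for the rest."""
    return x if x > 0 else slope * x


def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, alpha * (exp(x) - 1) for the rest."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


for x in (-3.0, -1.0, 0.0, 2.0):
    print(x, leaky_relu(x), elu(x))
```

For negative inputs ELU saturates smoothly toward `-alpha`, whereas Leaky ReLU keeps decreasing linearly without bound.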
Parking assisting apparatus and control unit
A parking assisting apparatus includes an imaging unit acquiring image information corresponding to an image of surroundings of a vehicle, an image processing section recognizing a feature shape in the image by processing the image information, an obstacle detecting section acquiring positional-relationship information corresponding to a positional relationship between the vehicle and an obstacle present around a parking space, and a manner-of-parking selecting section selecting a manner of parking the vehicle in the parking space from manner candidates including perpendicular parking and parallel parking based on the feature shape and the positional-relationship information. The manner-of-parking selecting section selects the manner of parking from the manner candidates by integrating a likelihood of each of the manner candidates based on the positional-relationship information with a likelihood of the manner candidate based on a recognition result to calculate final likelihoods of the respective manner candidates, and by comparing the calculated final likelihoods.
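The likelihood integration can be sketched as follows. The abstract says only that the two likelihoods are integrated and the results compared; multiplying them and renormalizing is an assumption made for this illustration, and the function name and likelihood values are hypothetical.

```python
def select_manner(obstacle_likelihood, shape_likelihood):
    """Combine per-candidate likelihoods and return (selected manner, final likelihoods).

    Assumes both dicts share the same candidate keys and that at least one
    product is non-zero.
    """
    final = {m: obstacle_likelihood[m] * shape_likelihood[m] for m in obstacle_likelihood}
    total = sum(final.values())
    final = {m: p / total for m, p in final.items()}
    return max(final, key=final.get), final


# Likelihoods from the positional relationship to surrounding obstacles.
from_obstacles = {"perpendicular": 0.6, "parallel": 0.4}
# Likelihoods from the feature shape recognized in the image.
from_shape = {"perpendicular": 0.3, "parallel": 0.7}

manner, final_likelihoods = select_manner(from_obstacles, from_shape)
print(manner, final_likelihoods)
```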
Multi-view image clustering techniques using binary compression
This disclosure relates to improved techniques for performing multi-view image clustering. The techniques described herein utilize machine learning functions to optimize the image clustering process. Multi-view features are extracted from a collection of images. A machine learning function is configured to jointly learn a fused binary representation that combines the multi-view features and one or more binary cluster structures that can be used to partition the images. A clustering function utilizes the fused binary representation and the one or more binary cluster structures to generate one or more image clusters based on the collection of images.
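One practical benefit of binary representations is that cluster assignment reduces to Hamming-distance comparisons. The sketch below shows only that assignment step, assuming the fused binary codes and binary cluster centers have already been learned; the joint learning itself is not shown, and the 8-bit codes and function names are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two binary codes stored as integers."""
    return bin(a ^ b).count("1")


def assign_clusters(codes, centers):
    """Assign each binary code to the index of the nearest binary cluster center."""
    return [
        min(range(len(centers)), key=lambda k: hamming(code, centers[k]))
        for code in codes
    ]


# Toy 8-bit fused codes for four images and two binary cluster centers.
codes = [0b11110000, 0b11100000, 0b00001111, 0b00011111]
centers = [0b11110000, 0b00001111]

labels = assign_clusters(codes, centers)
print(labels)  # [0, 0, 1, 1]
```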
Automated Mapping Information Generation From Inter-Connected Images
Techniques are described for using computing devices to perform automated operations to generate mapping information using inter-connected images of a defined area, and for using the generated mapping information in further automated manners. In at least some situations, the defined area includes an interior of a multi-room building, and the generated information includes a floor map of the building, such as from an automated analysis of multiple panorama images or other images acquired at various viewing locations within the building; in at least some such situations, the generating is further performed without having detailed information about distances from the images' viewing locations to walls or other objects in the surrounding building. The generated floor map and other mapping-related information may be used in various manners, including for controlling navigation of devices (e.g., autonomous vehicles), for display on one or more client devices in corresponding graphical user interfaces, etc.
ROBOT AND METHOD FOR OPERATING SAME
A robot and a method for operating the same according to one aspect of the present disclosure can provide emotion-based services by acquiring data related to a user and recognizing emotional information on the basis of that data, and can automatically generate a character expressing an emotion of the user by generating an avatar in which the recognized emotional information of the user is mapped to face information of the user.