G06V20/00

Method and system for producing story video

A method and a system for producing a story video are provided. According to one embodiment, the method produces a specific story video by determining a story theme suited to the collected videos, then selecting and arranging an appropriate video for each frame of a template associated with that theme.

Normal estimation for a planar surface

Various implementations disclosed herein include devices, systems, and methods for normal estimation using a directional measurement, such as a gravity vector. In various implementations, a device includes a non-transitory memory and one or more processors coupled with the non-transitory memory. In some implementations, a method includes identifying planar surfaces in an environment represented by an image. Each planar surface is associated with a respective orientation. A directional vector associated with the environment is determined. A subset of the planar surfaces that have a threshold orientation relative to the directional vector is identified. For each planar surface in the subset of the planar surfaces, a normal vector for the planar surface is determined based on the orientation of the planar surface and the directional vector.
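
The selection and refinement steps described above can be illustrated with a minimal sketch. It assumes that each planar surface's orientation is available as a rough normal vector, that the "threshold orientation" means lying within a few degrees of the gravity axis (roughly horizontal surfaces such as floors and tabletops), and that the refined normal is simply the gravity axis with its sign chosen to agree with the rough estimate. The function name `refine_normals` and the parameter `max_angle_deg` are hypothetical and do not come from the abstract.

```python
import math


def _unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / norm for c in v)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def refine_normals(rough_normals, gravity, max_angle_deg=10.0):
    """Snap near-horizontal plane normals onto the gravity axis.

    rough_normals: one (x, y, z) orientation estimate per detected plane.
    gravity: directional vector, e.g. an accelerometer reading.
    Returns {plane_index: refined_normal} for the selected subset only.
    """
    g = _unit(gravity)
    cos_threshold = math.cos(math.radians(max_angle_deg))
    refined = {}
    for index, normal in enumerate(rough_normals):
        alignment = _dot(_unit(normal), g)
        # Assumed threshold test: the plane normal is nearly parallel
        # (or anti-parallel) to gravity.
        if abs(alignment) >= cos_threshold:
            sign = 1.0 if alignment >= 0.0 else -1.0
            # Assumed refinement: replace the noisy estimate with the
            # gravity axis, keeping the side the rough normal pointed to.
            refined[index] = tuple(sign * c for c in g)
    return refined


planes = [(0.05, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 0.02, -1.0)]
result = refine_normals(planes, gravity=(0.0, 0.0, -9.81))
```

Here the second plane (a wall-like surface) is left out of the subset, while the first and third receive normals aligned exactly with the gravity axis.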

Analyzing images and videos of damaged vehicles to determine damaged vehicle parts and vehicle asymmetries
11704631 · 2023-07-18

A system may receive video of a damaged vehicle, perform image analysis of the video to determine one or more frames of the video that include a damaged portion of the vehicle, further analyze the one or more frames of the video that include a damaged portion of the vehicle to determine a damaged cluster of parts of the vehicle, determine whether the damaged cluster of parts should be repaired or replaced, map the damaged cluster of parts to one or more parts in a vehicle-specific database of parts, and generate, based on the mapping, a list of parts for repair or replacement.

Providing a response in a session

The present disclosure provides a method and an apparatus for providing a response to a user in a session. At least one message associated with a first object may be received in the session, the session being between the user and an electronic conversational agent. An image representation of the first object may be obtained. Emotion information of the first object may be determined based at least on the image representation. A response may be generated based at least on the at least one message and the emotion information. The response may be provided to the user.

Robotic interactions for observable signs of intent

Described herein are assistant robots that anticipate the needs of one or more people (or animals). The assistant robots may recognize a current activity and take into account knowledge of the person's routines and contextual information. As such, the assistant robots can provide or offer to provide appropriate robotic assistance. The assistant robots can learn users' habits or be provided with knowledge regarding humans in their environment. The assistant robots develop a schedule and a contextual understanding of the persons' behavior and needs. The assistant robots may interact, understand, and communicate with people before, during, or after providing assistance. The robots can combine gesture, clothing, emotional aspect, time, pose recognition, action recognition, and other observational data to understand people's medical condition, current activity, and future intended activities and intents.

System and method for fashion attributes extraction

A system and a method for training an inference model using a computing device. The method includes: providing a text-to-vector converter; providing the inference model and pre-training the inference model using labeled fashion entries; providing non-labeled fashion entries; separating each of the non-labeled fashion entries into a target image and target text; converting the target text into a category vector and an attribute vector using the text-to-vector converter; processing the target image using the inference model to obtain a processed target image and a target image label; comparing the category vector to the target image label; when the category vector matches the target image label, updating the target image label based on the category vector and the attribute vector to obtain an updated label; and retraining the inference model using the processed target image and the updated label.
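
The compare-and-update step can be sketched as follows. This is an illustration under stated assumptions, not the claimed method: the category vector is assumed to be a one-hot list, the attribute vector a multi-hot list, and the target image label a dictionary holding a category index and per-attribute flags. The function name `update_label` and the merge rule (element-wise maximum) are hypothetical.

```python
def update_label(category_vec, attribute_vec, image_label):
    """Return an updated label when text and image agree, else None.

    category_vec: assumed one-hot list derived from the target text.
    attribute_vec: assumed multi-hot list derived from the target text.
    image_label: assumed dict with "category" (int) and "attributes" (list).
    """
    # Index of the category named by the text.
    text_category = max(range(len(category_vec)), key=category_vec.__getitem__)
    if text_category != image_label["category"]:
        # No match: the entry is not used to update the label.
        return None
    # Assumed merge rule: keep an attribute if either source asserts it.
    merged = [max(a, b) for a, b in zip(image_label["attributes"], attribute_vec)]
    return {"category": text_category, "attributes": merged}


updated = update_label([0, 1, 0], [1, 0, 1], {"category": 1, "attributes": [0, 0, 1]})
skipped = update_label([1, 0, 0], [1, 0, 1], {"category": 1, "attributes": [0, 0, 1]})
```

Entries for which the function returns a label would then be paired with the processed target image for retraining.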

Scalable three-dimensional object recognition in a cross reality system

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for scalable three-dimensional (3-D) object recognition in a cross reality system. One of the methods includes maintaining object data specifying objects that have been recognized in a scene. A stream of input images of the scene is received, including a stream of color images and a stream of depth images. A color image is provided as input to an object recognition system. A recognition output that identifies a respective object mask for each object in the color image is received. A synchronization system determines a corresponding depth image for the color image. A 3-D bounding box generation system determines a respective 3-D bounding box for each object that has been recognized in the color image. Data specifying one or more 3-D bounding boxes is received as output from the 3-D bounding box generation system.
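
One simple way to pair a color image with a corresponding depth image is to match capture times. The abstract does not say how the synchronization system works, so the sketch below is only an assumption: both streams carry timestamps in seconds, the depth timestamps are sorted, and the nearest depth frame is chosen. The function name `nearest_depth_index` and the parameter `max_gap` are hypothetical.

```python
import bisect


def nearest_depth_index(depth_timestamps, color_timestamp, max_gap=None):
    """Return the index of the depth frame closest in time, or None.

    depth_timestamps: sorted list of capture times (assumed, in seconds).
    color_timestamp: capture time of the color image.
    max_gap: optional limit; a larger time difference yields None.
    """
    if not depth_timestamps:
        return None
    pos = bisect.bisect_left(depth_timestamps, color_timestamp)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(depth_timestamps)]
    best = min(candidates, key=lambda i: abs(depth_timestamps[i] - color_timestamp))
    if max_gap is not None and abs(depth_timestamps[best] - color_timestamp) > max_gap:
        return None
    return best


depth_times = [0.000, 0.033, 0.066, 0.100]
match = nearest_depth_index(depth_times, 0.040)
no_match = nearest_depth_index(depth_times, 0.500, max_gap=0.05)
```

The matched depth image and the object masks from the color image would then be passed on to the 3-D bounding box generation step.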