Patent classifications
G06V20/41
Intelligent reframing
Intelligent reframing techniques are described in which content (e.g., a movie) can be generated in a different aspect ratio than previously provided. These techniques include obtaining video frames having a first aspect ratio and identifying objects within those frames. The object having the highest degree of importance in a frame can be selected, and a focal point can be calculated based at least in part on that object. A modified version of the content can then be generated in a second aspect ratio, different from the first, using the calculated focal point. In this way, the content can be provided in a different aspect ratio while ensuring that the most important features of each frame still appear in the new version.
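As a rough sketch of the reframing step (not the patent's actual method), the Python below computes a crop window in a target aspect ratio centered on the focal point of the most important detected object; the `Detection` structure and the importance scores are assumed inputs from an upstream detector.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    cx: float          # object center x (pixels)
    cy: float          # object center y (pixels)
    importance: float  # assumed per-object importance score

def reframe_crop(detections, src_w, src_h, target_ratio):
    """Return (x, y, w, h) of a crop in the target aspect ratio
    centered on the focal point of the most important object."""
    focal = max(detections, key=lambda d: d.importance)
    # Fit the largest target-ratio window inside the source frame.
    if src_w / src_h > target_ratio:
        crop_h, crop_w = src_h, src_h * target_ratio
    else:
        crop_w, crop_h = src_w, src_w / target_ratio
    # Center on the focal point, then clamp to the frame bounds.
    x = min(max(focal.cx - crop_w / 2, 0), src_w - crop_w)
    y = min(max(focal.cy - crop_h / 2, 0), src_h - crop_h)
    return x, y, crop_w, crop_h

# Example: reframe a 16:9 frame to 9:16 around the key object.
dets = [Detection(400, 300, 0.2), Detection(1500, 500, 0.9)]
print(reframe_crop(dets, 1920, 1080, 9 / 16))
```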
Event/object-of-interest centric timelapse video generation on camera device with the assistance of neural network input
An apparatus including an interface and a processor. The interface may be configured to receive pixel data generated by a capture device. The processor may be configured to generate video frames in response to the pixel data, perform computer vision operations on the video frames to detect objects, perform a classification of the detected objects based on their characteristics, determine whether the classification corresponds to a user-defined event, and generate encoded video frames from the video frames. The encoded video frames may be communicated to a cloud storage service. The encoded video frames may comprise a first sample of the video frames selected at a first rate when the user-defined event is not detected and a second sample of the video frames selected at a second rate while the user-defined event is detected. The second rate may be greater than the first rate.
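A minimal sketch of the two-rate sampling, assuming per-frame event flags produced by the classification step; the interval values are illustrative placeholders, not the patent's rates.

```python
def select_timelapse_frames(frames, event_flags, base_interval=30, event_interval=3):
    """Pick a sparse sample (every `base_interval` frames) normally, and a
    denser sample (every `event_interval` frames) while a user-defined
    event is detected. `event_flags[i]` marks whether frame i contains
    the event (output of the classification step)."""
    selected = []
    for i, frame in enumerate(frames):
        interval = event_interval if event_flags[i] else base_interval
        if i % interval == 0:
            selected.append(frame)
    return selected

# Example: a 120-frame clip with an event spanning frames 40-79.
flags = [40 <= i < 80 for i in range(120)]
picked = select_timelapse_frames(list(range(120)), flags)
print(len(picked), picked[:10])
```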
Computer vision enabled smart snooze home security cameras
An apparatus including an interface and a processor. The interface may be configured to receive pixel data. The processor may be configured to generate a plurality of video frames in response to the pixel data received from the interface, perform computer vision operations to detect objects in the video frames, extract features of the objects in response to characteristics of the objects determined using the computer vision operations, identify a person in the video frames based on the features, detect an event based on the person identified, and generate a notification in response to detecting the event and a permission status. The notification may be suppressed when the permission status for the identified person corresponds to denying the notification, and sent when it does not.
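The permission gate could look like the toy check below; the status strings and the default for unrecognized people are assumptions for illustration.

```python
def should_notify(person_id, permissions):
    """Suppress the event notification when the identified person's
    permission status denies it (the 'smart snooze' behavior)."""
    # Assumption: unrecognized people default to allowing notifications.
    return permissions.get(person_id, "allow") != "deny"

permissions = {"family_member_1": "deny", "courier": "allow"}
print(should_notify("family_member_1", permissions))  # False: snoozed
print(should_notify("stranger", permissions))         # True: notify
```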
Monitoring
A method comprising: automatically processing recorded first sensor data from a scene to recognise automatically a first user input from a user action in the scene; and, in response to recognition of the first user input, automatically entering a learning state to enable: automatic processing of the first sensor data from the scene to capture an ad-hoc sequence of spatial events in the scene subsequent to the first user input; and automatic processing of subsequently recorded second sensor data from the scene, different from the first sensor data, to recognise automatically a sequence of spatial events in the second sensor data corresponding to the captured sequence of spatial events.
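One way to picture the learning state is the toy state machine below, which records spatial events between two recognized user gestures and later checks whether new sensor events contain that sequence in order; the event names and the gesture-delimited protocol are assumptions, not the claimed method.

```python
class SceneMonitor:
    """Toy state machine: a recognized user gesture enters a learning
    state, subsequent spatial events are recorded as the reference
    sequence, and later sensor data is matched against it."""
    def __init__(self):
        self.state = "idle"
        self.reference = []

    def process(self, event):
        if self.state == "idle" and event == "USER_GESTURE":
            self.state = "learning"
        elif self.state == "learning":
            if event == "USER_GESTURE":   # a second gesture ends learning
                self.state = "armed"
            else:
                self.reference.append(event)

    def matches(self, events):
        # Naive check: does the new event stream contain the learned
        # sequence in order (not necessarily contiguously)?
        it = iter(events)
        return all(e in it for e in self.reference)

m = SceneMonitor()
for e in ["USER_GESTURE", "door_open", "light_on", "USER_GESTURE"]:
    m.process(e)
print(m.matches(["door_open", "tv_on", "light_on"]))  # True
```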
Methods and Systems for Detecting Persons in a Smart Home Environment
The various implementations described herein include methods, devices, and systems for detecting motion and persons. In one aspect, a method is performed at a smart home system that includes a video camera, a server system, and a client device. The video camera captures video and audio and wirelessly communicates the captured data, via the server system, to the client device. The server system: (1) receives and stores the captured data from the video camera; (2) determines whether an event, including detected motion, has occurred; (3) in accordance with a determination that the event has occurred, identifies the video and audio corresponding to the event; and (4) classifies the event. The client device receives information indicative of the identified events, displays a user interface for reviewing the video and audio stored by the server system, and displays at least one classification for the event.
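A schematic of the server-side flow, with stand-ins for the store, classifier, and client notification; the field names and the motion metadata on the clip are hypothetical.

```python
from types import SimpleNamespace

def handle_clip(clip, store, classify, notify_client):
    """Server-side flow: store captured data, check for a motion event,
    slice out the matching video/audio, classify it, and push the event
    info to the client device for review."""
    store.append(clip)
    if not clip.motion_detected:
        return None
    segment = clip.frames[clip.motion_start:clip.motion_end]
    label = classify(segment)            # e.g. "person", "vehicle"
    notify_client({"frames": segment, "classification": label})
    return label

store, inbox = [], []
clip = SimpleNamespace(frames=list(range(100)), motion_detected=True,
                       motion_start=40, motion_end=60)
handle_clip(clip, store, lambda seg: "person", inbox.append)
print(inbox[0]["classification"])  # person
```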
Hazard recognition
Methods, systems, and devices are provided for identifying hazards. According to one aspect, a computer-implemented method can include receiving a plurality of sensor data, including one or more image files, from a mobile device. The method can include generating one or more position and label pairs based on the sensor data, assigning a hazard recognition to each of the position and label pairs, and assigning a score to each of the hazard recognitions. The method can include displaying a result that includes one or more image results based on the image files, the hazard recognitions associated with at least one of the image results, and the score assigned to each of those hazard recognitions.
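The position/label-to-score pipeline might be shaped like the sketch below; the rule table mapping labels to hazard types and base scores is an invented stand-in for the method's actual recognition and scoring steps.

```python
def recognize_hazards(sensor_records, hazard_rules):
    """Turn (position, label) pairs derived from sensor data into scored
    hazard recognitions. `hazard_rules` maps a detected label to a
    (hazard name, base score) pair; both are illustrative assumptions."""
    results = []
    for position, label in sensor_records:
        if label in hazard_rules:
            hazard, score = hazard_rules[label]
            results.append({"position": position, "label": label,
                            "hazard": hazard, "score": score})
    return results

rules = {"exposed_wire": ("electrical", 0.9), "wet_floor": ("slip", 0.6)}
pairs = [((12.0, 4.5), "exposed_wire"), ((3.1, 8.2), "wet_floor")]
for r in recognize_hazards(pairs, rules):
    print(r)
```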
Automated pausing of audio and/or video during a conferencing session
Embodiments include an audio analyzer to analyze audio data received from a user computing system operating as a participant in a conference managed by a conferencing application and to detect one or more audio pause conditions; a video analyzer to analyze video data received from the user computing system and to detect one or more video pause conditions; and a conferencing manager to automatically pause distribution of the audio data to other participants of the conference when the one or more audio pause conditions are detected, and to automatically pause distribution of the video data to the other participants when the one or more video pause conditions are detected.
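A sketch of the conferencing-manager gating, assuming the analyzers are exposed as boolean predicates over media chunks; the example conditions are placeholders, not the embodiments' detectors.

```python
def route_media(audio_chunk, video_frame, audio_conditions, video_conditions):
    """Distribute audio/video to the other participants only when no
    pause condition is detected on that stream; a paused stream is
    replaced with None here for simplicity."""
    send_audio = not any(cond(audio_chunk) for cond in audio_conditions)
    send_video = not any(cond(video_frame) for cond in video_conditions)
    return (audio_chunk if send_audio else None,
            video_frame if send_video else None)

# Example with toy predicates: pause audio on loud bursts, pause video
# when someone walks through the background.
audio_conditions = [lambda a: max(a) > 0.8]
video_conditions = [lambda v: v.get("persons_in_background", 0) > 0]
print(route_media([0.1, 0.95], {"persons_in_background": 0},
                  audio_conditions, video_conditions))  # audio paused
```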
Systems and methods for creating video summaries
Video information defining video content may be accessed. Highlight moments within the video content may be identified. Flexible video segments may be determined based on the highlight moments. Individual flexible video segments may include one or more of the highlight moments and a flexible portion of the video content. The flexible portion of the video content may be characterized by a minimum segment duration, a target segment duration, and a maximum segment duration. A duration allocated to the video content may be determined. One or more of the flexible video segments may be selected based on the duration and one or more of the minimum segment duration, the target segment duration, and/or the maximum segment duration of the selected flexible video segments. A video summary including the selected flexible video segments may be generated.
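A greedy illustration of filling an allocated summary duration from flexible segments: every selected segment first gets its minimum duration, then segments grow toward their target and maximum while budget remains. This is one plausible reading of the selection step, not the patented algorithm.

```python
def build_summary(segments, allocated):
    """Select flexible segments within an allocated duration. Each
    segment is a dict with 'min', 'target', and 'max' durations around
    a highlight moment; 'dur' is the duration finally assigned."""
    chosen, used = [], 0.0
    for seg in sorted(segments, key=lambda s: s["min"]):
        if used + seg["min"] <= allocated:
            chosen.append({**seg, "dur": seg["min"]})
            used += seg["min"]
    for bound in ("target", "max"):   # two expansion passes
        for seg in chosen:
            grow = min(seg[bound] - seg["dur"], allocated - used)
            if grow > 0:
                seg["dur"] += grow
                used += grow
    return chosen, used

segs = [{"min": 2, "target": 4, "max": 6}, {"min": 3, "target": 5, "max": 8}]
summary, total = build_summary(segs, 10)
print(total, [s["dur"] for s in summary])  # 10 [5, 5]
```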
Systems, processes and devices for occlusion detection for video-based object tracking
Processes, systems, and devices for occlusion detection for video-based object tracking (VBOT) are described herein. Embodiments process video frames to compute histogram data and depth-level data for a tracked object, detect a subset of the video frames containing occlusion events, and generate output data identifying each video frame in that subset. Threshold measurement values are used to reduce or eliminate false positives and increase processing efficiency.
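As a toy stand-in for the histogram-based detection (the depth-level data and the embodiments' actual metric are omitted), the sketch below flags frames whose appearance histogram falls below a similarity threshold against the tracked object's reference histogram, with the threshold serving to suppress false positives.

```python
import numpy as np

def detect_occlusions(frame_histograms, reference_hist, threshold=0.8):
    """Flag frame indices whose appearance histogram diverges from the
    tracked object's reference histogram, a common proxy for occlusion
    in VBOT pipelines. Histogram intersection is the similarity here."""
    occluded = []
    ref = reference_hist / reference_hist.sum()
    for i, h in enumerate(frame_histograms):
        h = h / h.sum()
        similarity = np.minimum(ref, h).sum()   # intersection in [0, 1]
        if similarity < threshold:              # thresholding step
            occluded.append(i)
    return occluded

ref = np.ones(8)                              # flat reference appearance
visible = np.ones(8) + 0.1                    # near-identical histogram
occluder = np.array([8.0] + [0.0] * 7)        # occluder dominates one bin
print(detect_occlusions([visible, occluder, visible], ref))  # [1]
```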
End-to-end vehicle perception system training
Techniques for a perception system of a vehicle that can detect and track objects in an environment are described herein. The perception system may include a machine-learned model that includes one or more different portions, such as different components, subprocesses, or the like. In some instances, the techniques may include training the machine-learned model end-to-end such that outputs of a first portion of the machine-learned model are tailored for use as inputs to another portion of the machine-learned model. Additionally, or alternatively, the perception system described herein may utilize temporal data to track objects in the environment of the vehicle and associate tracking data with specific objects in the environment detected by the machine-learned model. That is, the architecture of the machine-learned model may include both a detection portion and a tracking portion in the same loop.
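As a generic illustration of training a detection portion and a tracking portion in the same loop (the architecture below is invented for the sketch, not the described perception system), a small PyTorch example shows how a single loss lets the tracking objective's gradients flow back into the detector:

```python
import torch
import torch.nn as nn

class PerceptionModel(nn.Module):
    """Toy end-to-end model: a detection head produces per-frame object
    embeddings, and a tracking head scores whether two detections from
    consecutive frames are the same object."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.detector = nn.Linear(64, feat_dim)    # stand-in detection backbone
        self.tracker = nn.Linear(feat_dim * 2, 1)  # association scorer

    def forward(self, frame_t, frame_t1):
        det_t = self.detector(frame_t)             # detections at time t
        det_t1 = self.detector(frame_t1)           # detections at time t+1
        pair = torch.cat([det_t, det_t1], dim=-1)
        return torch.sigmoid(self.tracker(pair))   # association probability

model = PerceptionModel()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCELoss()

# One synthetic training step: same object in both frames -> label 1.
frame_t, frame_t1 = torch.randn(8, 64), torch.randn(8, 64)
labels = torch.ones(8, 1)
loss = loss_fn(model(frame_t, frame_t1), labels)
loss.backward()   # gradients flow through tracker *and* detector
opt.step()
print(float(loss))
```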