Patent classifications
G06V20/47
Auto-Capture of Interesting Moments by Assistant Systems
In one embodiment, a method includes accessing, from a client system associated with a first user, sensor signals captured by sensors of the client system, wherein the client system comprises a plurality of sensors, and wherein the sensor signals are accessed from the sensors based on cascading model policies, wherein each cascading model policy utilizes one or more of a respective cost or relevance associated with each sensor, detecting a change in a context of the first user associated with an activity of the first user based on machine-learning models and the sensor signals, wherein the change in the context of the first user satisfies a trigger condition associated with the activity, and responsive to the detected change in the context of the first user, automatically capturing visual data by cameras of the client system.
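As a rough illustration of the cascading-policy idea, the sketch below (all names, thresholds, and the escalation rule are hypothetical, not the patent's implementation) polls cheap, informative sensors first and escalates to costlier ones only while the evidence warrants it; crossing the trigger threshold starts auto-capture.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SensorStage:
    name: str
    cost: float                 # relative power/compute cost of reading this sensor
    relevance: float            # how informative the sensor is for the current activity
    read: Callable[[], float]   # returns a context-change score in [0, 1]

def detect_context_change(stages: List[SensorStage], trigger: float = 0.8) -> bool:
    """Poll sensors cheapest-first; escalate to costlier sensors only while needed."""
    # Order stages by cost-adjusted relevance so cheap, informative sensors run first.
    for stage in sorted(stages, key=lambda s: s.cost / max(s.relevance, 1e-6)):
        score = stage.read()
        if score >= trigger:
            return True    # trigger condition satisfied: start auto-capture
        if score < 0.2:
            return False   # strong negative from a cheap sensor: stop cascading
    return False
```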
Gating model for video analysis
Implementations described herein relate to methods, devices, and computer-readable media to perform gating for video analysis. In some implementations, a computer-implemented method includes obtaining a video comprising a plurality of frames and corresponding audio. The method further includes performing sampling to select a subset of the plurality of frames based on a target frame rate and extracting a respective audio spectrogram for each frame in the subset of the plurality of frames. The method further includes reducing resolution of the subset of the plurality of frames. The method further includes applying a machine-learning based gating model to the subset of the plurality of frames and corresponding audio spectrograms and obtaining, as output of the gating model, an indication of whether to analyze the video to add one or more video annotations.
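A minimal sketch of that gating pipeline, assuming the video is a numpy array of frames, the audio is a 1-D numpy array, and `gating_model` is a stand-in callable returning a scalar score (all of these are assumptions, not the patent's implementation):

```python
import numpy as np

def sample_frames(frames: np.ndarray, src_fps: float, target_fps: float) -> np.ndarray:
    """Select a subset of frames approximating the target frame rate."""
    step = max(int(round(src_fps / target_fps)), 1)
    return frames[::step]

def reduce_resolution(frames: np.ndarray, factor: int = 4) -> np.ndarray:
    """Crude downscaling by pixel striding; assumes frames shaped (T, H, W)."""
    return frames[:, ::factor, ::factor]

def spectrogram(audio: np.ndarray, n_fft: int = 256) -> np.ndarray:
    """Magnitude spectrogram over fixed windows; assumes len(audio) > 2 * n_fft."""
    windows = [audio[i:i + n_fft] for i in range(0, len(audio) - n_fft, n_fft)]
    return np.abs(np.fft.rfft(np.stack(windows), axis=1))

def should_analyze(frames: np.ndarray, audio: np.ndarray, gating_model) -> bool:
    """Gating decision: is full annotation analysis worth running on this video?"""
    subset = reduce_resolution(sample_frames(frames, src_fps=30.0, target_fps=1.0))
    spec = spectrogram(audio)
    return gating_model(subset, spec) > 0.5   # hypothetical scalar model output
```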
Media management system for video data processing and adaptation data generation
In various embodiments, methods and systems for implementing a media management system, for video data processing and adaptation data generation, are provided. At a high level, a video data processing engine relies on different types of video data properties and additional auxiliary data resources to perform video optical character recognition operations for recognizing characters in video data. In operation, video data is accessed to identify recognized characters. A video OCR operation to perform on the video data for character recognition is determined from video character processing and video auxiliary data processing. Video auxiliary data processing includes processing an auxiliary reference object; the auxiliary reference object is an indirect reference object that is a derived input element used as a factor in determining the recognized characters. The video data is processed based on the video OCR operation, and, based on processing the video data, at least one recognized character is communicated.
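To make the "indirect reference object" idea concrete, here is a small hypothetical sketch in which a vocabulary derived from video metadata serves as the auxiliary factor for resolving a raw OCR reading (the function, data, and matching rule are illustrative, not the patent's method):

```python
from difflib import get_close_matches

def resolve_with_auxiliary(ocr_hypothesis: str, auxiliary_vocab: list[str]) -> str:
    """Snap a raw OCR hypothesis to the derived auxiliary vocabulary when close."""
    matches = get_close_matches(ocr_hypothesis, auxiliary_vocab, n=1, cutoff=0.8)
    return matches[0] if matches else ocr_hypothesis  # fall back to the raw reading

# Example: a title taken from video metadata corrects a misread character.
print(resolve_with_auxiliary("EPIS0DE", ["EPISODE", "SEASON"]))  # -> "EPISODE"
```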
Information processing apparatus, information processing system, information processing method, and storage medium
The information processing apparatus includes an acquisition unit configured to acquire a plurality of videos captured by a plurality of imaging apparatuses, an extraction unit configured to extract one or more pieces of object information each representing a trajectory of an object from each of the plurality of videos acquired by the acquisition unit, and a generation unit configured to generate a synopsis video obtained by gathering objects on one background image based on the object information extracted by the extraction unit.
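A minimal sketch of the trajectory-gathering step, assuming each extracted trajectory is a list of (frame_index, bounding_box) observations; objects observed at different times in different source videos are time-shifted onto one short synopsis timeline (the wrapping policy here is an assumption):

```python
def build_synopsis(trajectories, synopsis_length):
    """Shift each trajectory to start at t=0 and wrap it into a short timeline."""
    timeline = {t: [] for t in range(synopsis_length)}
    for traj in trajectories:
        start = traj[0][0]                       # first frame the object was seen
        for frame_index, bbox in traj:
            t = (frame_index - start) % synopsis_length
            timeline[t].append(bbox)             # later: paste object crop on background
    return timeline

# Objects observed minutes apart in different videos share one short clip.
tl = build_synopsis([[(100, "boxA")], [(500, "boxB"), (501, "boxB2")]], 10)
print(tl[0], tl[1])  # ['boxA', 'boxB'] ['boxB2']
```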
Image Processing
An apparatus and method for image processing is disclosed. The method may include receiving an image from a camera sensor, receiving selection of one or more target objects appearing in the image and tracking the one or more target objects over a plurality of subsequently-received images. For the subsequently-received images in turn, the method may include estimating one or more performance metric(s) associated with performing a fill-in processing operation of the one or more tracked target objects and saving the image as an optimised reference image if the respective performance metric(s) indicate an improved performance over that of one or more previously-received images from the time of receiving selection. The method may include performing the fill-in processing operation using one or more of the saved optimised reference images for output to a display screen.
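The reference-selection loop might look like the following sketch, where the fill-in performance metric is left as a caller-supplied callable, since the abstract does not pin one down:

```python
from typing import Any, Callable

class ReferenceSelector:
    """Keeps the best-scoring frame seen so far as the optimised reference image."""

    def __init__(self, metric: Callable[[Any, tuple], float]):
        self.metric = metric            # hypothetical fill-in performance metric
        self.best_score = float("-inf")
        self.best_frame = None

    def observe(self, frame, target_bbox):
        score = self.metric(frame, target_bbox)
        if score > self.best_score:     # improvement over all previously received images
            self.best_score, self.best_frame = score, frame
        return self.best_frame          # reference image to use for the fill-in step
```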
INFORMATION PROCESSING APPARATUS, CONTROL METHOD, AND PROGRAM
An information processing apparatus (2000) includes a summarizing unit (2040) and a display control unit (2060). The summarizing unit (2040) obtains a video (30) generated by each of a plurality of cameras (10). Furthermore, the summarizing unit (2040) performs a summarizing process on the video (30) and generates summary information of the video (30). The display control unit (2060) causes a display system (20) to display the video (30). Here, the display control unit (2060) causes the display system (20) to display the summary information of the video (30) in response to a change in a display state of the video (30) in the display system (20) satisfying a predetermined condition.
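As one illustrative reading (the predetermined condition chosen here, a video re-entering the display, is an assumption, as is the `summary_since_hidden` call), a display controller could react to display-state changes like this:

```python
class DisplayController:
    """Shows a video's summary when its display state changes in a defined way."""

    def __init__(self, summarizer):
        self.summarizer = summarizer    # wraps the summarizing unit (2040)
        self.displayed = set()

    def update_display_state(self, camera_id: str, now_displayed: bool):
        was_displayed = camera_id in self.displayed
        if now_displayed and not was_displayed:
            self.displayed.add(camera_id)
            # Assumed predetermined condition: the video re-entered the display,
            # so show what happened while it was hidden (hypothetical API).
            return self.summarizer.summary_since_hidden(camera_id)
        if not now_displayed:
            self.displayed.discard(camera_id)
        return None
```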
On-line video filtering
Some embodiments relate to a system and method to increase the speed of a computer determination of whether a video contains a particular content. In some embodiments, the quantity of data in the video is first reduced while preserving the searched-for content. Optionally, the size of the data is first reduced by reducing the resolution; for example, resolution may be reduced without searching and/or processing the full data set. Additionally or alternatively, low-quality and/or empty data is removed from the dataset. Additionally or alternatively, redundant data may be searched out and/or removed. Optionally, after data reduction, the reduced dataset is analyzed to determine if it contains the searched-for content. Optionally, an estimate is made of the probability of the full dataset containing the searched-for content.
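A compact numpy sketch of that reduction-then-detection flow (the thresholds, the detector, and the probability aggregation are all hypothetical):

```python
import numpy as np

def reduce_dataset(frames: np.ndarray, factor: int = 4,
                   empty_thresh: float = 5.0, dup_thresh: float = 3.0) -> np.ndarray:
    """Downscale, then drop near-empty and near-duplicate frames."""
    small = frames[:, ::factor, ::factor].astype(np.float32)   # cheap downscaling
    kept, last = [], None
    for f in small:
        if f.std() < empty_thresh:      # low-quality / empty frame: discard
            continue
        if last is not None and np.abs(f - last).mean() < dup_thresh:
            continue                    # redundant near-duplicate frame: discard
        kept.append(f)
        last = f
    return np.stack(kept) if kept else small[:0]

def estimate_probability(reduced: np.ndarray, detector) -> float:
    """Estimate the chance the full video has the content from per-frame scores."""
    if len(reduced) == 0:
        return 0.0
    scores = np.array([detector(f) for f in reduced])
    return float(1.0 - np.prod(1.0 - scores))   # at least one positive frame
```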
Highlight video generated with adaptable multimodal customization
In implementations for highlight video generated with adaptable multimodal customization, a multimodal detection system tracks activities based on poses and faces of persons depicted in video clips of video content. The system determines a pose highlight score and a face highlight score for each of the video clips that depict at least one person, the highlight scores representing a relative level of interest in an activity depicted in a video clip. The system also determines pose-based emotion features for each of the video clips. The system can detect actions based on the activities of the persons depicted in the video clips, and detect emotions exhibited by the persons depicted in the video clips. The system can receive input selections of actions and emotions, and filter the video clips based on the selected actions and emotions. The system can then generate a highlight video of ranked and filtered video clips.
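A minimal sketch of that score-filter-rank flow (the field names, the weighting, and the filter semantics are assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class Clip:
    clip_id: str
    pose_score: float               # relative interest from pose-based analysis
    face_score: float               # relative interest from face-based analysis
    actions: set = field(default_factory=set)
    emotions: set = field(default_factory=set)

def highlight_ranking(clips, wanted_actions=None, wanted_emotions=None, w_pose=0.5):
    """Filter clips by selected actions/emotions, then rank by combined score."""
    def keep(c):
        return ((not wanted_actions or c.actions & wanted_actions) and
                (not wanted_emotions or c.emotions & wanted_emotions))
    scored = [(w_pose * c.pose_score + (1 - w_pose) * c.face_score, c)
              for c in clips if keep(c)]
    return [c for _, c in sorted(scored, key=lambda pair: -pair[0])]
```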
SYSTEMS, DEVICES, AND METHODS EMPLOYING THE SAME FOR ENHANCING AUDIENCE ENGAGEMENT IN A COMPETITION OR PERFORMANCE
Presented herein is an interactive platform for judging an activity by a participant in an event. The platform includes a client application program downloadable to a mobile device. The program may include a database storing a mobile device identifier (ID), a user ID, user information, and location data of the device. The application may further be configured to display one or more events of the activity as well as an input for receiving a score of the activity from the user. The platform may additionally include a server system connected with the client application programs via a communication network. The server system may be configured for receiving the mobile device ID, the user ID, the user information, and the location data from each client application program, and may further be configured to receive the scores from the users and to adjust the scores according to a determined bias of the associated user.
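One simple bias model consistent with the description, sketched below, estimates each user's bias as their mean deviation from the per-event consensus and subtracts it (the adjustment model itself is an assumption; the abstract does not specify one):

```python
from collections import defaultdict
from statistics import mean

def adjust_scores(raw: dict[tuple[str, str], float]) -> dict[tuple[str, str], float]:
    """raw maps (user_id, event_id) -> score; returns bias-corrected scores."""
    by_event = defaultdict(list)
    for (user, event), s in raw.items():
        by_event[event].append(s)
    consensus = {event: mean(vals) for event, vals in by_event.items()}

    deviations = defaultdict(list)
    for (user, event), s in raw.items():
        deviations[user].append(s - consensus[event])
    bias = {user: mean(devs) for user, devs in deviations.items()}

    return {(user, event): s - bias[user] for (user, event), s in raw.items()}
```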
Automated Recording Highlights For Conferences
A transcript of a conference (e.g., a video conference, an audio conference, or a telephone call with two or more participants) is processed to extract a conference summary. Scores are determined for strings of the transcript that are used to select strings for inclusion in the conference summary. Determining the scores includes determining respective sentence vectors for strings. A sentence vector has elements corresponding to words in the transcript that are proportional to occurrences of the word in the string and inversely proportional to occurrences of the word in the transcript. A short video conference summary or a short audio conference summary is then generated using timestamps from the transcript associated with strings (e.g., sentences) that have been selected for inclusion in the conference summary. The short video or audio summary may be presented to users to enable efficient storage and transmission of conference information within a unified communications system.
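The sentence-vector construction lends itself to a short sketch: each element is the word's count in the sentence divided by its count in the whole transcript, and a simple magnitude score (the selection rule here is an assumption) picks sentences for the summary.

```python
from collections import Counter

def sentence_vectors(sentences: list[str]) -> list[dict[str, float]]:
    """Element = count of word in sentence / count of word in whole transcript."""
    transcript = Counter(w for s in sentences for w in s.lower().split())
    return [{w: c / transcript[w] for w, c in Counter(s.lower().split()).items()}
            for s in sentences]

def select_for_summary(sentences: list[str], k: int = 2) -> list[str]:
    """Assumed selection rule: rank sentences by vector magnitude, keep top k."""
    scores = [sum(v.values()) for v in sentence_vectors(sentences)]
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]   # preserve transcript order
```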