G06V20/41

Personalized conversational recommendations by assistant systems

In one embodiment, a method includes receiving a user request from a client system associated with a user, generating a response to the user request which references one or more entities, generating a personalized recommendation based on the user request and the response, wherein the personalized recommendation references one or more of the entities of the response, and sending instructions for presenting the response and the personalized recommendation to the client system.

Automatic trailer detection in multimedia content
11694726 · 2023-07-04

The disclosed computer-implemented method may include accessing media segments that correspond to respective media items. At least one of the media segments may be divided into discrete video shots. The method may also include matching the discrete video shots in the media segments to corresponding video shots in the corresponding media items according to various matching factors. The method may further include generating a relative similarity score between the matched video shots in the media segments and the corresponding video shots in the media items, and training a machine learning model to automatically identify video shots in the media items according to the generated relative similarity score between matched video shots. Various other methods, systems, and computer-readable media are also disclosed.
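The matching and scoring steps can be illustrated with a minimal sketch. It assumes each shot has already been reduced to a numeric feature vector and uses cosine similarity as the only matching factor. The abstract does not specify either choice, and the function names and the definition of "relative" score below are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_shots(segment_shots, item_shots):
    """For each segment shot, return (index of best item shot, relative score).

    Assumed definition: the relative score is the best similarity divided by
    the sum of that segment shot's similarities to all item shots.
    """
    matches = []
    for shot in segment_shots:
        # Negative similarities are clipped to zero so the ratio stays in [0, 1].
        sims = [max(cosine(shot, other), 0.0) for other in item_shots]
        best = max(range(len(sims)), key=sims.__getitem__)
        total = sum(sims)
        matches.append((best, sims[best] / total if total else 0.0))
    return matches
```

In a pipeline like the one described, pairs and scores of this kind would serve as training signal for the machine learning model; the training step itself is not sketched here.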

Generation and usage of semantic features for detection and correction of perception errors

Described is a system for detecting and correcting perception errors in a perception system. In operation, the system generates a list of detected objects from perception data of a scene, which allows for the generation of a list of background classes from backgrounds in the perception data associated with the list of detected objects. For each detected object in the list of detected objects, a closest background class is identified from the list of background classes. Vectors can then be used to determine a semantic feature, which is used to identify axioms. An optimal perception parameter is then generated, which is used to adjust perception parameters in the perception system to minimize perception errors.

Automatic identification of misleading videos using a computer network

Machine-based video classifying to identify misleading videos by training a model using a video corpus, obtaining a subject video from a content server, generating respective feature vectors of a title, a thumbnail, a description, and a content of the subject video, determining first semantic similarities between ones of the feature vectors, determining a second semantic similarity between the title of the subject video and titles of videos in the misleading video corpus in a same domain as the subject video, determining a third semantic similarity between comments of the subject video and comments of videos in the misleading video corpus in the same domain as the subject video, classifying the subject video using the model and based on the first semantic similarities, the second semantic similarity, and the third semantic similarity, and outputting the classification of the subject video to a user.

Video analytics scene classification and automatic camera configuration based automatic selection of camera profile

Example implementations include a method, apparatus and computer-readable medium for configuring profiles for a camera, comprising receiving a first video stream from the camera. The implementations further include classifying a first scene of the first video stream. Additionally, the implementations include determining a first metadata for the first scene. Additionally, the implementations include selecting a first profile for the camera based on the first metadata, wherein the first profile comprises one or more configuration parameters, wherein values of each of the one or more configuration parameters of the first profile are based on the first metadata. Additionally, the implementations include configuring the camera with the first profile.
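As a rough illustration of selecting a profile whose parameter values depend on scene metadata, the sketch below assumes the metadata is a dictionary with hypothetical `scene` and `lux` keys and that a profile carries two hypothetical parameters, exposure time and frame rate. None of these names or thresholds come from the abstract.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Profile:
    name: str
    exposure_ms: float  # hypothetical configuration parameter
    frame_rate: int     # hypothetical configuration parameter

def select_profile(metadata):
    """Derive a camera profile from scene metadata (illustrative rules only)."""
    indoor = metadata.get("scene") == "indoor"
    low_light = metadata.get("lux", 1000.0) < 50.0
    # Assumed rules: longer exposure in low light, lower frame rate indoors.
    exposure = 33.0 if low_light else 8.0
    fps = 15 if indoor else 30
    name = f"{'indoor' if indoor else 'outdoor'}-{'night' if low_light else 'day'}"
    return Profile(name, exposure, fps)
```

Configuring the camera would then amount to pushing the returned parameter values to the device through whatever interface it exposes.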

Systems and methods for counting repetitive activity in audio video content

Repetitive activities can be captured in audio video content. The AV content can be processed in order to predict the number of repetitive activities present in the AV content. The accuracy of the predicted number may be improved, especially for AV content with challenging conditions, by basing the predictions on both the audio and video portions of the AV content.
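One simple way to base a count on both modalities is a reliability-weighted blend of per-modality predictions. The sketch below is an assumption for illustration only: the abstract does not say how the audio and video predictions are combined, and `video_reliability` is a hypothetical input.

```python
def fuse_counts(video_count, audio_count, video_reliability):
    """Blend per-modality repetition counts into one integer estimate.

    video_reliability lies in [0, 1]: 1.0 trusts the video prediction fully,
    0.0 (for example, occluded or very dark footage) trusts the audio fully.
    """
    if not 0.0 <= video_reliability <= 1.0:
        raise ValueError("video_reliability must be in [0, 1]")
    blended = (video_reliability * video_count
               + (1.0 - video_reliability) * audio_count)
    return round(blended)
```

With a low reliability value, the estimate leans on the audio count, which matches the intuition that sound can compensate when the visual signal is degraded.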

Automated Content Segmentation and Identification of Fungible Content

A content segmentation system includes a computing platform having processing hardware and a system memory storing a software code and a trained machine learning model. The processing hardware is configured to execute the software code to receive content, the content including multiple sections each having multiple content blocks in sequence, to select one of the sections for segmentation, and to identify, for each of the content blocks of the selected section, at least one respective representative unit of content. The software code is further executed to generate, using the at least one respective representative unit of content, a respective embedding vector for each of the content blocks of the selected section to provide multiple embedding vectors, and to predict, using the trained machine learning model and the embedding vectors, subsections of the selected section, at least some of the subsections including more than one of the content blocks.
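The grouping of content blocks into subsections can be sketched as follows. Note the substitution: where the abstract uses a trained machine learning model, this sketch swaps in a fixed cosine-similarity threshold between adjacent embedding vectors, purely to show the input and output shapes. The function names and the threshold value are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def split_into_subsections(embeddings, threshold=0.5):
    """Group consecutive block indices into subsections.

    A new subsection starts whenever a block's similarity to the previous
    block falls below the threshold (a stand-in for the trained model).
    """
    if not embeddings:
        return []
    groups = [[0]]
    for i in range(1, len(embeddings)):
        if cosine(embeddings[i - 1], embeddings[i]) < threshold:
            groups.append([i])      # boundary: open a new subsection
        else:
            groups[-1].append(i)    # same subsection as the previous block
    return groups
```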

SYSTEMS AND METHODS FOR STREAMING WORKOUT VIDEO SESSIONS

Disclosed are example embodiments of systems and methods for displaying a workout session. For example, a method for displaying a workout session is disclosed. The method includes determining whether a plurality of participants are eligible to be presented on a wall of live streams of a live workout session. The method also includes displaying a video stream of a first participant from the plurality of participants on the wall of live streams based on the determination.

METHOD AND ELECTRONIC DEVICE FOR A SLOW MOTION VIDEO

A method for generating a slow motion video is disclosed. The method includes segmenting, by an electronic device, objects in the video. Further, the method includes determining, by the electronic device, an interaction between the segmented objects. Further, the method includes clustering, by the electronic device, the segmented objects in the video to generate object clusters based on the interaction. Further, the method includes determining, by the electronic device, a degree of slow motion effect to be applied to each of the object clusters in the video based on a significance score of each of the object clusters. Further, the method includes generating, by the electronic device, the slow motion video by applying the degree of slow motion effect that has been determined to the corresponding object clusters.
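The step of deriving a degree of slow motion from a significance score can be illustrated with a linear mapping. This is a sketch under assumptions: the abstract does not define the score range or the mapping, so scores are taken to lie in [0, 1] and `max_slowdown` is a hypothetical parameter.

```python
def slow_motion_factors(significance, max_slowdown=4.0):
    """Map each cluster's significance score to a playback slowdown factor.

    A factor of 1.0 means normal speed; max_slowdown is the slowest playback.
    Scores outside [0, 1] are clamped.
    """
    factors = {}
    for cluster, score in significance.items():
        clamped = min(max(score, 0.0), 1.0)
        factors[cluster] = 1.0 + clamped * (max_slowdown - 1.0)
    return factors
```

Under this mapping, the most significant cluster is slowed the most while low-significance clusters play at close to normal speed.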

Storage and Processing of Intermediate Features in Neural Networks
20230004742 · 2023-01-05

Systems and methods described herein provide for the use and storage of intermediate layer data within a neural network processing system. A first neural network, such as an object detection neural network, may receive and process raw video image data to generate output utilized for metadata creation. Secondary neural networks may be configured to accept input data from one or more intermediate layers of the first neural network instead of the raw video image data. In this way, the initial data processed by the intermediate layers of the first neural network can be stored and utilized as a shortcut for processing additional features or attributes within the video image data. This alleviates the need to process video image data multiple times in different neural networks. The intermediate layer data can be stored in a different and often cheaper storage system and recalled faster and with fewer resources for future use.
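The reuse pattern can be shown with a toy sketch in which plain functions stand in for neural networks and a dictionary stands in for the separate storage system. All class and function names are hypothetical, and the "intermediate features" here are just per-row pixel means.

```python
class Detector:
    """Toy first network that persists its intermediate output."""

    def __init__(self, store):
        self.store = store          # stand-in for the cheaper storage tier
        self.backbone_runs = 0      # counts passes over raw image data

    def _backbone(self, frame):
        """Stand-in for intermediate layers: reduce each pixel row to its mean."""
        self.backbone_runs += 1
        return [sum(row) / len(row) for row in frame]

    def detect(self, frame_id, frame):
        features = self._backbone(frame)
        self.store[frame_id] = features   # save intermediate features for reuse
        return max(features) > 0.5        # toy "object present" output

def brightness_head(features):
    """Secondary 'network' that consumes stored features, not raw pixels."""
    return sum(features) / len(features)

def contrast_head(features):
    """Another secondary 'network' reading the same stored features."""
    return max(features) - min(features)
```

After one call to `detect`, both secondary heads can be evaluated from `store[frame_id]` without touching the raw frame again, so `backbone_runs` stays at 1 however many secondary heads are added.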