Patent classifications
G06V20/47
Video tagging system and method
An automatic video tagging system which learns from videos, their web context and comments shared on social networks is described. Massive multimedia collections are analyzed by Internet crawling and a knowledge base is maintained that updates in real time with no need of human supervision. As a result, each video is indexed with a rich set of labels and linked with other related contents. Practical applications of video recognition require a label scheme that is appealing to the end-user (i.e. obtained from social curation) and a training dataset that can be updated in real-time to be able to recognize new actions, scenes and people. To create this dataset that evolves in real-time and uses labels that are relevant to the users, a weakly-supervised deep learning approach is utilized combining both a machine-learning pre-processing stage together with a set of keywords obtained from the internet. The resulting tags combined with videos and summaries of videos are used with deep learning to train a neural network in an unsupervised manner that allows the tagging system to go from an image to a set of tags for the image and then to the visual representation of a tag.
Generating video summaries for a video using video summary templates
Video and corresponding metadata are accessed. Events of interest within the video are identified based on the corresponding metadata, and best scenes are identified based on the identified events of interest. A video summary can be generated including one or more of the identified best scenes. The video summary can be generated using a video summary template with slots corresponding to video clips selected from among sets of candidate video clips. Best scenes can also be identified by receiving an indication of an event of interest within video from a user during the capture of the video. Metadata patterns representing activities identified within video clips can be identified within other videos, which can subsequently be associated with the identified activities.
AUTOMATIC CINEMAGRAPH
A system for performing automatic cinemagraph creation is described herein. The system comprises a memory and a processor. The memory is configured to receive a series of images. The processor is coupled to the memory. The processor is to segment the series of images, select the most fitting times and masks, and apply the times and masks to the series of images to generate a cinemagraph.
Utilizing a machine learning model trained to determine subtle pose differentiations to automatically capture digital images
The present disclosure describes systems, non-transitory computer-readable media, and methods for utilizing a machine learning model trained to determine subtle pose differentiations to analyze a repository of captured digital images of a particular user to automatically capture digital images portraying the user. For example, the disclosed systems can utilize a convolutional neural network to determine a pose/facial expression similarity metric between a sample digital image from a camera viewfinder stream of a client device and one or more previously captured digital images portraying the user. The disclosed systems can determine that the similarity metric satisfies a similarity threshold, and automatically capture a digital image utilizing a camera device of the client device. Thus, the disclosed systems can automatically and efficiently capture digital images, such as selfies, that accurately match previous digital images portraying a variety of unique facial expressions specific to individual users.
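The threshold test described in this abstract can be illustrated with a minimal sketch. It assumes each image has already been reduced to an embedding vector by some network, and it uses cosine similarity as a stand-in for the pose/facial expression similarity metric, which the abstract does not define. The function names and the default threshold of 0.9 are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero embedding as matching nothing
    return dot / (norm_a * norm_b)


def should_capture(frame_embedding, reference_embeddings, threshold=0.9):
    """Return True when the viewfinder frame is close enough to any
    previously captured image of the user (hypothetical decision rule)."""
    best = max(
        (cosine_similarity(frame_embedding, ref) for ref in reference_embeddings),
        default=0.0,  # no reference images: never trigger a capture
    )
    return best >= threshold
```

In this sketch a capture is triggered as soon as one reference image clears the threshold; a real system could equally average over several references.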
Personalizing videos with nonlinear playback
A method for personalized playback of a video as performed by a video platform includes parsing a video into segments based on visual and audio content of the video. The platform creates multimodal fragments that represent underlying segments of the video, and then orders the multimodal fragments based on a preference of a target user. The platform thus enables nonlinear playback of the segmented video in accordance with the multimodal fragments.
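The ordering step in this abstract can be sketched as follows. The abstract does not say how a user preference is represented, so this sketch assumes a mapping from topic label to weight and scores each fragment by the summed weights of its topics. The `Fragment` class, its fields, and `order_fragments` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    start: float        # segment start time in seconds
    end: float          # segment end time in seconds
    topics: frozenset   # hypothetical labels derived from visual and audio content


def order_fragments(fragments, preferences):
    """Order fragments for nonlinear playback: highest preference score first,
    ties broken by original position in the video."""
    def score(fragment):
        return sum(preferences.get(topic, 0.0) for topic in fragment.topics)

    return sorted(fragments, key=lambda f: (-score(f), f.start))
```

Breaking ties by start time keeps equally preferred fragments in their original order, which is one reasonable choice among several.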
DELIVERING MEDIA CONTENT TO A CONTENT CONSUMING USER
In the following, a content delivery system delivers a modified version of a media asset to a current content consuming user. Control information identifying a desired attribute of the modified version of the asset is received from the current content consuming user. The media asset is modified based on the control information and audience reaction data associated with the media asset and generated by analyzing at least a previous content consuming user's reactions to the media asset whilst the media asset was supplied to a media output device of the previous content consuming user.
INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM
Provided is an information processing device including: an imaging position information calculation unit that calculates a position and an orientation of an imaging device as imaging device position information on the basis of camera data received from the imaging device in association with video data; and an information display control unit that controls a display of the imaging device position information. Provided is an information processing method performed by a processor, the method including: calculating a position and an orientation of an imaging device as imaging device position information on the basis of camera data received from the imaging device in association with video data; and controlling a display of the imaging device position information.
Video monitoring system
A video transmission apparatus detects a target object to be monitored from images obtained by imaging a monitoring area, tracks the detected target object, obtains existence time from appearance to disappearance of the target object, and transmits a data volume reduced image of the target object to a video reception apparatus. The video reception apparatus analyzes the data volume reduced image transmitted from the video transmission apparatus and transmits a video request of the target object to the video transmission apparatus based on input made according to the result of the analysis. When the video request transmitted from the video reception apparatus is received, the video transmission apparatus generates a monitoring video from the appearance to the disappearance of the target object based on the existence time of the target object and transmits the monitoring video to the video reception apparatus.
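The existence time used in this abstract can be illustrated with a small sketch. It assumes the tracker emits (timestamp, object_id) pairs, which is not stated in the abstract, and records the first and last time each object is seen. Both function names and the request format are hypothetical.

```python
def existence_intervals(detections):
    """Map each tracked object to (first_seen, last_seen) timestamps.

    detections: iterable of (timestamp, object_id) pairs, in any order.
    """
    intervals = {}
    for timestamp, object_id in detections:
        if object_id in intervals:
            first, last = intervals[object_id]
            intervals[object_id] = (min(first, timestamp), max(last, timestamp))
        else:
            intervals[object_id] = (timestamp, timestamp)
    return intervals


def clip_for(intervals, object_id):
    """Describe the monitoring clip spanning an object's appearance to its
    disappearance (hypothetical response to a video request)."""
    first, last = intervals[object_id]
    return {"object_id": object_id, "start": first, "end": last,
            "duration": last - first}
```

For example, an object seen at 1.0 s and again at 4.5 s would yield a clip starting at 1.0 and ending at 4.5.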
Content playback system, server, mobile terminal, content playback method, and recording medium
Selected image data or specific information thereon is stored in association with moving image data as a management marker of a selected image. The selected image data is selected from among still image data extracted from the moving image data. When an output image of the selected image is captured, image analysis is performed on the captured image data to acquire a management marker of a captured image. A management marker of a selected image corresponding to the management marker of the captured image from among management markers of selected images stored in the storage is specified. Digest moving image data is generated by picking out a part of moving image data associated with the specific management marker. Control is performed so that a digest moving image is played back and displayed on the display section.
RECURSIVE NEURAL NETWORKS ON FUTURE EVENT PREDICTION
Systems and methods for training a recursive neural network (RNN) are provided. The method includes generating, by the processor using the RNN, a plurality of embedding vectors based on a plurality of observations, wherein the observations include (i) a subject, (ii) an action taken by the subject, and (iii) an object on which the subject is taking the action, wherein the subject and object are constant. The method further includes generating, by the processor, predictions of one or more future events based on one or more comparisons of at least some of the plurality of embedding vectors. The method also includes initiating, by the processor, based on the predictions, an action to a hardware device to mitigate expected harm to at least one item selected from the group consisting of the hardware device, another hardware device related to the hardware device, and a person related to the hardware device.