G06F16/786

SEARCH APPARATUS, SEARCH METHOD, AND NON-TRANSITORY STORAGE MEDIUM
20200242155 · 2020-07-30

Provided is a search apparatus (10) including: a storage unit (11) that stores video index information, including correspondence information that associates the type of one or more objects extracted from a video with a motion of those objects; an acquisition unit (12) that acquires a search key associating the type of one or more search-target objects with a motion of those objects; and a search unit (13) that searches the video index information on the basis of the search key.
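
The relationship between the index, the search key, and the search step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `IndexEntry` and `VideoIndex`, the exact-match rule, and the sample object types and motions are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IndexEntry:
    """One piece of correspondence information: object types plus a motion."""
    video_id: str
    object_types: frozenset  # e.g. frozenset({"person", "car"})
    motion: str              # e.g. "approach"


@dataclass
class VideoIndex:
    """Hypothetical stand-in for the storage unit holding video index information."""
    entries: list = field(default_factory=list)

    def add(self, video_id, object_types, motion):
        self.entries.append(IndexEntry(video_id, frozenset(object_types), motion))

    def search(self, object_types, motion):
        """Return ids of videos whose entry matches the search key exactly."""
        key_types = frozenset(object_types)
        return [e.video_id for e in self.entries
                if e.object_types == key_types and e.motion == motion]


# Hypothetical sample data: the search key pairs object types with a motion.
index = VideoIndex()
index.add("v1", {"person", "car"}, "approach")
index.add("v2", {"person"}, "run")
index.add("v3", {"person", "car"}, "approach")
matches = index.search({"car", "person"}, "approach")
```

Storing the object types as a frozenset makes the match independent of the order in which the types are listed in the key; a real system would likely also need partial or ranked matching.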

Decomposition of a video stream into salient fragments

The disclosure includes a system and method for decomposing a video into salient fragments and synthesizing a video composition based on the salient fragments. A video decomposition application extracts non-salient portions of a video, extracts a plurality of salient fragments of the video, builds a database of the plurality of salient fragments, receives a query, retrieves, from the database of the plurality of salient fragments, a set of salient fragments based on the query, and synthesizes a video composition based on the set of salient fragments and the non-salient portions of the video.

AGGLOMERATED VIDEO HIGHLIGHTS WITH CUSTOM SPECKLING

Presentation of video highlights is disclosed. A data processing system receives, from multiple users, multimedia files containing user-generated video(s), the multimedia files being produced and enhanced by the users. The data processing system generates a speckle excitement vector of the multimedia files based on identifying feature(s) of the user-generated video(s). The processing and distribution system determines a cognitive state of each of the users based, in part, on the speckle excitement vector of each of the multimedia files. The processing and distribution system alters characteristic(s) of the user-generated video(s) of the multimedia files based on the cognitive state of each of the users, resulting in altered video(s). The processing and distribution system compiles the altered video(s) into a digital file that includes automatically produced multimedia content. The processing and distribution system makes the digital file available for viewing.

Subsumption architecture for processing fragments of a video stream

The disclosure includes a system and method for distributing video segments of a video to one or more brokers based on topics and storing the video segments in a distributed commit log associated with the topics. A video processing application decomposes a video into fragments, groups the fragments into topics based on identifiers associated with the fragments, breaks the fragments into a sequence of segments, distributes the sequence of segments to one or more brokers based on the topics, and stores, by the one or more brokers, the sequence of segments associated with a topic in a distributed commit log while preserving a sequence order of the sequence of segments.
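
The topic-based distribution described above can be sketched as follows. This is a minimal in-memory illustration, assuming one topic per fragment identifier and a hash-based broker assignment; the names `Broker`, `broker_for`, and `distribute`, and the `segment_len` parameter, are hypothetical and do not come from the disclosure.

```python
from collections import defaultdict
import zlib


class Broker:
    """Hypothetical broker holding one append-only log per topic."""

    def __init__(self, name):
        self.name = name
        self.logs = defaultdict(list)

    def append(self, topic, segment):
        """Append a segment to the topic's log and return its offset."""
        self.logs[topic].append(segment)
        return len(self.logs[topic]) - 1


def broker_for(topic, brokers):
    # A stable hash keeps every segment of a topic on the same broker.
    return brokers[zlib.crc32(topic.encode("utf-8")) % len(brokers)]


def distribute(fragments, brokers, segment_len=2):
    """fragments: list of (fragment_id, frames). Segments are appended in order."""
    for fragment_id, frames in fragments:
        topic = f"fragment-{fragment_id}"
        broker = broker_for(topic, brokers)
        for start in range(0, len(frames), segment_len):
            broker.append(topic, frames[start:start + segment_len])


# Hypothetical sample data: integers stand in for video frames.
brokers = [Broker("b0"), Broker("b1")]
distribute([("a", [1, 2, 3, 4, 5]), ("b", [6, 7])], brokers)
log_a = broker_for("fragment-a", brokers).logs["fragment-a"]
```

Because each topic maps to a single append-only list, the order in which segments were produced is the order in which they are stored, which is the property the abstract emphasizes.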

System and method for monitoring a premises based on parsed codec data
10685542 · 2020-06-16

This document describes a monitoring system for detecting conditions at a physical premises. The monitoring system can receive, by a computing system, from a video sensor system deployed at the physical premises, block-based encoded video data encoded with a block-based encoder in the video sensor system. The monitoring system can parse, by the computing system, the block-based encoded video data to extract from the block-based encoded data macroblock arrays that correspond to areas of a frame of video data. The monitoring system can reduce, by the computing system, the macroblock arrays to one or more data clusters. The monitoring system can apply, by the computing system, a pattern recognition algorithm to the one or more data clusters to detect patterns in the one or more data clusters.

SYSTEM AND METHOD OF VIDEO CONTENT FILTERING

An input video sequence from a camera is filtered by a process that comprises detecting temporal tracks of moving image parts from the input video sequence and assigning activity scores to temporal segments of the tracks, using respective predefined track-dependent activity score functions for a plurality of different activity types. Based on this, event scores are computed as a function of time. This computation is controlled by a definition of a temporal sequence of activity types or compound activity types for an event type. Successive intermediate scores are computed, each as a function of time for a respective activity type or compound activity type in the temporal sequence. The successive intermediate score for each respective activity type or compound activity type is computed from a combination of the intermediate score for a preceding activity type or compound activity type in the temporal sequence at a preceding time and activity scores that were assigned to segments of the tracks after the preceding time, for the activity type, or the activity types defined by the compound activity type, in the temporal sequence. One of the computed event scores is selected for a selected time. The computation of the selected event score is traced back to identify intermediate scores that were used to compute the selected one of the event scores and to identify segments of the tracks for which the assigned activity scores were used to compute the identified intermediate scores. An output video sequence and/or video image is generated that selectively includes the image parts associated with the selected segments.

ELECTRONIC APPARATUS, DOCUMENT DISPLAYING METHOD THEREOF AND NON-TRANSITORY COMPUTER READABLE RECORDING MEDIUM

The disclosure relates to an artificial intelligence (AI) system using a machine learning algorithm such as deep learning, and an application thereof. In particular, an electronic apparatus, a document displaying method thereof, and a non-transitory computer readable recording medium are provided. An electronic apparatus according to an embodiment of the disclosure includes a display unit displaying a document, a microphone receiving a user voice, and a processor configured to acquire at least one topic from contents included in a plurality of pages constituting the document, recognize a voice input through the microphone, match the recognized voice with one of the acquired at least one topic, and control the display unit to display a page including the matched topic.

Generating and reviewing motion metadata

Aspects of the instant disclosure relate to methods for generating motion metadata for a newly captured video feed. In some aspects, methods of the subject technology can include steps for recording a video feed using the video capture system, partitioning the image frames into a plurality of pixel blocks, and processing the image frames to detect one or more motion events. In some aspects, the method may further include steps for generating motion metadata describing each of the one or more motion events. Systems and computer-readable media are also provided.
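
The block-partitioning and motion-detection steps can be sketched as below. This is a minimal illustration, assuming grayscale frames given as lists of pixel rows and a simple frame-differencing rule on block means; the function names, the `block` and `threshold` parameters, and the metadata layout are hypothetical rather than taken from the disclosure.

```python
def block_means(frame, block):
    """Partition a 2-D grayscale frame into block x block tiles and
    return {(block_row, block_col): mean intensity}."""
    means = {}
    for r in range(0, len(frame), block):
        for c in range(0, len(frame[0]), block):
            pixels = [frame[y][x]
                      for y in range(r, min(r + block, len(frame)))
                      for x in range(c, min(c + block, len(frame[0])))]
            means[(r // block, c // block)] = sum(pixels) / len(pixels)
    return means


def motion_metadata(frames, block=2, threshold=10.0):
    """Return one record per frame in which at least one block changed
    by more than `threshold` relative to the previous frame."""
    events = []
    prev = block_means(frames[0], block)
    for i in range(1, len(frames)):
        cur = block_means(frames[i], block)
        moved = sorted(k for k in cur if abs(cur[k] - prev[k]) > threshold)
        if moved:
            events.append({"frame": i, "blocks": moved})
        prev = cur
    return events


# Hypothetical sample data: a 4x4 frame in which one pixel brightens.
still = [[0] * 4 for _ in range(4)]
changed = [row[:] for row in still]
changed[0][0] = 255
events = motion_metadata([still, still, changed])
```

Recording only the frame number and the affected block coordinates keeps the metadata far smaller than the video itself, so a later review can scan the records instead of re-decoding frames.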

Scene Level Video Search

In some embodiments, a method trains a first prediction network to predict similarity between images in videos. The training uses boundaries detected in the videos to train the prediction network to predict images in a same scene to have similar feature descriptors. The first prediction network generates feature descriptors that describe library images from videos in a video library offered to users of a video delivery service. A search image is received and the prediction network predicts one or more library images for one or more videos that are predicted to be similar to the received image. The one or more library images for the one or more videos are provided as a search result.
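
The retrieval step, in which a search image is compared against library images, can be sketched as a nearest-neighbor lookup over feature descriptors. This sketch assumes the descriptors have already been produced by some trained network and uses cosine similarity as the comparison; both are assumptions for illustration, and the names `cosine` and `search` and the sample descriptors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, library, top_k=2):
    """library: {(video_id, frame_no): descriptor}. Return the top_k keys
    whose descriptors are most similar to the query descriptor."""
    ranked = sorted(library, key=lambda k: cosine(query, library[k]), reverse=True)
    return ranked[:top_k]


# Hypothetical descriptors; a real system would obtain them from the network.
library = {
    ("A", 10): [1.0, 0.0, 0.0],
    ("A", 50): [0.0, 1.0, 0.0],
    ("B", 5): [0.9, 0.1, 0.0],
}
results = search([1.0, 0.0, 0.0], library, top_k=2)
```

A linear scan is adequate for a handful of descriptors; a library of the size implied by a video delivery service would need an approximate nearest-neighbor index instead.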

Reasoning from surveillance video via computer vision-based multi-object tracking and spatiotemporal proximity graphs

Methods, systems, and apparatuses, among other things, may detect and store activity in videos based on a spatiotemporal graph representation. Spatiotemporal proximity graphs may be built based on one or more received tracks and may include one or more nodes, and each node may include one or more attributes associated with a corresponding entity. One or more spatiotemporal relationships may be identified between the entities based on each spatiotemporal proximity graph, and one or more activities of the entities may be identified based on the spatiotemporal relationships.
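
One way to picture a spatiotemporal proximity graph is sketched below. It is a minimal illustration, assuming tracks given as per-time 2-D positions, a fixed distance radius for edges, and "sustained proximity" as the only relationship of interest; the function names, the track layout, and the thresholds are hypothetical and not drawn from the disclosure.

```python
import math
from itertools import combinations


def proximity_graph(tracks, t, radius):
    """Build the graph for time t: nodes carry entity attributes,
    edges join entities that are within `radius` of each other."""
    nodes = {eid: {"type": tr["type"], "pos": tr["positions"][t]}
             for eid, tr in tracks.items() if t in tr["positions"]}
    edges = {(a, b) for a, b in combinations(sorted(nodes), 2)
             if math.dist(nodes[a]["pos"], nodes[b]["pos"]) <= radius}
    return nodes, edges


def sustained_proximity(tracks, times, radius, min_frames):
    """Return pairs that stay connected for at least min_frames consecutive graphs."""
    run, found = {}, set()
    for t in times:
        _, edges = proximity_graph(tracks, t, radius)
        # Pairs missing from this graph drop out, which resets their run length.
        run = {e: run.get(e, 0) + 1 for e in edges}
        found |= {e for e, n in run.items() if n >= min_frames}
    return found


# Hypothetical tracks: a car drives up to a person and stays close.
tracks = {
    "p1": {"type": "person", "positions": {0: (0, 0), 1: (1, 0), 2: (2, 0)}},
    "c1": {"type": "car", "positions": {0: (5, 0), 1: (2, 0), 2: (2, 1)}},
}
pairs = sustained_proximity(tracks, [0, 1, 2], radius=1.5, min_frames=2)
```

Keeping the entity type as a node attribute lets a later rule distinguish, for example, a person near a car from two people near each other without re-reading the tracks.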