G06V20/47

Machine learning based human activity detection and classification in first and third person videos

An analytics device for monitoring maintenance on an elevator system performed by an individual including: a processor; and a memory including computer-executable instructions that, when executed by the processor, cause the processor to perform operations, the operations including: capturing a first video stream using a first video camera; extracting sequences from at least the first video stream; extracting features from the sequences; and analyzing, using a long short-term memory model, the sequences to determine whether the maintenance performed on the elevator system by the individual is performed correctly.
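The pipeline above (extract per-frame features, then score the sequence with an LSTM) can be sketched as follows. The toy scalar LSTM cell, the weight values, and the feature values are all illustrative assumptions for clarity; the patented system would use a trained, vector-valued model over real video features.

```python
# Minimal sketch, assuming per-frame features have already been extracted
# from the video stream; a scalar LSTM scores whether the recorded
# maintenance sequence was performed correctly. All names and weights here
# are illustrative assumptions, not the patented model.
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, w):
    """One scalar LSTM step: gates computed from input x and hidden state h."""
    f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # candidate cell state
    c = f * c + i * g
    h = o * math.tanh(c)
    return h, c

def score_sequence(features, weights):
    """Run the feature sequence through the LSTM; squash the final state."""
    h = c = 0.0
    for x in features:
        h, c = lstm_step(x, h, c, weights)
    return sigmoid(h)  # probability-like score: maintenance done correctly

# Toy weights (an assumption; a real model is trained on labeled videos).
W = {k: 0.5 for k in ("wf", "uf", "bf", "wi", "ui", "bi",
                      "wo", "uo", "bo", "wg", "ug", "bg")}
features = [0.9, 0.8, 0.7, 0.95]  # e.g. per-frame pose/tool-detection scores
print(round(score_sequence(features, W), 3))
```

A real system would threshold this score to flag incorrectly performed maintenance steps for review.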

METHOD AND APPARATUS FOR GENERATING SYNOPSIS VIDEO AND SERVER
20220415360 · 2022-12-29

A method for generating a synopsis video includes: acquiring a target video and parameter data related to editing of the target video, wherein the parameter data comprises at least a duration parameter of a synopsis video of the target video; extracting a plurality of pieces of image data from the target video, and determining an image label of the image data, wherein the image label comprises at least a visual-type label; determining a type of the target video, and establishing a target editing model for the target video according to the type of the target video, the duration parameter, and a plurality of preset editing technique submodels; and editing the target video according to the image label of the image data in the target video by using the target editing model to obtain the synopsis video of the target video.
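One step of the abstract, fitting labeled segments into the synopsis duration parameter, can be sketched as a greedy selection. The segment format and the label-priority table are assumptions for illustration; the patent's editing-technique submodels are not reproduced here.

```python
# Hedged sketch: pick the highest-priority labeled segments until the
# synopsis duration parameter is met, then restore chronological order.
# Segment tuples and the priority table are illustrative assumptions.
def build_synopsis(segments, max_duration, label_priority):
    """segments: list of (start, end, label); returns chosen segments in order."""
    ranked = sorted(segments,
                    key=lambda s: label_priority.get(s[2], 0), reverse=True)
    chosen, total = [], 0.0
    for start, end, label in ranked:
        length = end - start
        if total + length <= max_duration:
            chosen.append((start, end, label))
            total += length
    return sorted(chosen)  # chronological order for the final edit

segments = [(0, 4, "intro"), (4, 10, "action"),
            (10, 12, "crowd"), (12, 20, "action")]
priority = {"action": 3, "intro": 1, "crowd": 0}
print(build_synopsis(segments, 10, priority))  # → [(0, 4, 'intro'), (4, 10, 'action')]
```

The second "action" segment is skipped because including it would exceed the 10-second duration parameter.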

Video highlights with auto trimming
11538499 · 2022-12-27

A server configured to receive video clips from a mobile device, such as eyewear. The server has an electronic processor configured to execute computer instructions to process the video clips and identify one or more characteristics in the frames of the video clips. The processor selects the video clips having the identified characteristics in the frames and creates a set of those selected video clips. The processor automatically trims the video clips based on the frames that have the identified characteristics to create trimmed video clip segments, and then sends the trimmed video clip segments to the mobile device.
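The auto-trimming step can be sketched as below, under the assumption that an upstream detector has already flagged which frames contain the identified characteristic. The optional context padding is an illustrative assumption, not a detail from the abstract.

```python
# Sketch: trim a clip to the span of frames flagged as containing the
# identified characteristic, keeping a little context on each side.
def trim_clip(flags, pad=0):
    """flags: per-frame booleans; returns (start, end) frame indices or None."""
    hits = [i for i, f in enumerate(flags) if f]
    if not hits:
        return None  # no characteristic found: drop the clip entirely
    start = max(hits[0] - pad, 0)
    end = min(hits[-1] + pad, len(flags) - 1)
    return (start, end)

# Frames 3..6 contain the characteristic; keep one context frame each side.
print(trim_clip([False] * 3 + [True] * 4 + [False] * 5, pad=1))  # → (2, 7)
```

Clips returning `None` would be excluded from the set sent back to the mobile device.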

INFORMATION PROCESSING APPARATUS, REPRODUCTION PROCESSING APPARATUS, AND INFORMATION PROCESSING METHOD

There is provided an information processing apparatus, a reproduction processing apparatus, and an information processing method that improve data transmission efficiency. A preprocessing unit (102) generates, as scene configuration information indicating a configuration of a scene of 6DoF content including a three-dimensional object in a three-dimensional space, dynamic scene configuration information that changes over time and static scene configuration information that does not change over time, the static scene configuration information being scene configuration information different from the dynamic scene configuration information.

FORMULATING NATURAL LANGUAGE DESCRIPTIONS BASED ON TEMPORAL SEQUENCES OF IMAGES
20220405489 · 2022-12-22

Implementations are described herein for formulating natural language descriptions based on temporal sequences of digital images. In various implementations, a natural language input may be analyzed. Based on the analysis, a semantic scope to be imposed on a natural language description that is to be formulated based on a temporal sequence of digital images may be determined. The temporal sequence of digital images may be processed based on one or more machine learning models to identify one or more candidate features that fall within the semantic scope. One or more other features that fall outside of the semantic scope may be disregarded. The natural language description may be formulated to describe one or more of the candidate features.
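The scope-imposition step described above can be sketched as a simple filter: the semantic scope derived from the natural language input keeps in-scope candidate features and disregards the rest. Representing the scope as a set of feature types is an assumption for clarity; the patent leaves the representation open.

```python
# Illustrative sketch, assuming candidate features come typed from upstream
# machine learning models; the semantic scope filters them before the
# natural language description is formulated.
def filter_by_scope(candidates, scope):
    """candidates: list of (feature_type, description); keep in-scope only."""
    return [desc for ftype, desc in candidates if ftype in scope]

# A query like "what did the dog do?" might yield an animal-action scope.
scope = {"animal_action"}
candidates = [
    ("animal_action", "the dog fetched the ball"),
    ("weather", "it started to rain"),          # disregarded: out of scope
    ("animal_action", "the dog shook off water"),
]
print(filter_by_scope(candidates, scope))
```

The surviving descriptions would then be composed into the final natural language description.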

Information processing apparatus, control method, and program
11532160 · 2022-12-20

An information processing apparatus (2000) includes a summarizing unit (2040) and a display control unit (2060). The summarizing unit (2040) obtains a video (30) generated by each of a plurality of cameras (10). Furthermore, the summarizing unit (2040) performs a summarizing process on the video (30) and generates summary information of the video (30). The display control unit (2060) causes a display system (20) to display the video (30). Here, the display control unit (2060) causes the display system (20) to display the summary information of the video (30) in response to a change in a display state of the video (30) on the display system (20) satisfying a predetermined condition.

Electronic device for generating video comprising character and method thereof

An electronic device and method are disclosed. The electronic device includes a display, a processor and memory. The processor may implement the method, including analyzing, by a processor, a first video to identify any characters included in the first video, displaying one or more icons representing one or more characters identified in the first video via a display, receiving, by input circuitry, a first user input selecting a first icon representing a first character from among the one or more icons, based on the first user input, selecting image frames of the first video that include the first character from among image frames included in the first video, and generating, by the processor, a second video including the selected image frames. A second embodiment includes automatically selecting images from a gallery including one or more characters for generation of a video.
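The frame-selection step, keeping only frames that contain the character chosen via the first user input, can be sketched as below. The per-frame detection structure is an illustrative assumption standing in for the device's character-analysis output.

```python
# Minimal sketch: given per-frame character detections (assumed produced by
# the analysis stage) and the user-selected character, keep the frames that
# contain that character for the second video.
def select_frames(frame_characters, chosen):
    """frame_characters: list of sets of character ids, one set per frame."""
    return [i for i, chars in enumerate(frame_characters) if chosen in chars]

detections = [{"alice"}, {"alice", "bob"}, {"bob"}, set(), {"alice"}]
print(select_frames(detections, "alice"))  # → [0, 1, 4]
```

The second video would be assembled from exactly these frame indices.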

Smart summarization, indexing, and post-processing for recorded document presentation
11532333 · 2022-12-20

Systems and methods for providing summarization, indexing, and post-processing of a recorded document presentation are provided. The system accesses a structured document and recordings associated with a recorded presentation given using the structured document. The system analyzes, using machine-trained models, the structured document, audio and video recordings, and recording of operations performed during the presentation. The analyzing comprises generating a transcript of the audio recording, determining context of components of the structured document, and deriving context from the video recordings and recording of operations. Based on the analyzing, the system segments the recorded presentation into a plurality of segments and generates an index of the plurality of segments that is used for post-processing.

Systems and methods for generating comic books from video and images

Techniques for a comic book feature are described herein. A visual data stream of a video may be parsed into a plurality of frames. Scene boundaries may be determined to generate a scene using the plurality of frames where a scene includes a subset of frames. A key frame may be determined for the scene using the subset of frames. An audio portion of an audio data stream of the video may be identified that maps to the subset of frames based on time information. The key frame may be converted to a comic image based on an algorithm. First dimensions and placement for a data object may be determined for the comic image. The data object may include the audio portion for the comic image. A comic panel may be generated for the comic image that incorporates the data object using the determined first dimensions and the placement.
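Two of the steps above, determining scene boundaries and picking a key frame per scene, can be sketched as follows. Reducing each frame to a single brightness value and the cut threshold are assumptions for brevity; a real system would compare histograms or learned embeddings.

```python
# Hedged sketch: detect scene boundaries by thresholding frame-to-frame
# difference, then pick the middle frame of each scene as its key frame
# (the frame later converted into a comic image).
def scene_boundaries(frame_vals, threshold):
    """Return start indices of scenes wherever consecutive frames jump."""
    starts = [0]
    for i in range(1, len(frame_vals)):
        if abs(frame_vals[i] - frame_vals[i - 1]) > threshold:
            starts.append(i)
    return starts

def key_frames(frame_vals, starts):
    """Pick the middle frame of each scene as its key frame."""
    bounds = starts + [len(frame_vals)]
    return [(s + e - 1) // 2 for s, e in zip(bounds, bounds[1:])]

vals = [10, 11, 12, 60, 61, 59, 100, 101]  # two hard cuts, at indices 3 and 6
starts = scene_boundaries(vals, threshold=20)
print(starts, key_frames(vals, starts))  # → [0, 3, 6] [1, 4, 6]
```

Each key frame would then be stylized into a comic image, with the scene's mapped audio portion rendered as a speech-bubble data object in the panel.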

SYSTEM AND METHOD FOR OBJECT TRACKING AND METRIC GENERATION
20220391620 · 2022-12-08

Disclosed herein is a system and method directed to object tracking and metric generation using a plurality of cameras. The system includes the plurality of cameras disposed around a playing surface in a mirrored configuration, where the plurality of cameras are time-synchronized. The system further includes logic that, when executed by a processor, causes performance of operations including: obtaining a sequence of images from the plurality of cameras, continuously detecting an object in image pairs at successive points in time, wherein each image pair corresponds to a single point in time, continuously determining a location of the object within the playing space through triangulation of the object within each image pair, detecting a player and the object within each image of a subset of image pairs of the sequence of images, identifying a sequence of interactions between the object and the player, and storing the sequence of interactions.
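The triangulation step can be sketched in two dimensions: each time-synchronized camera yields a bearing toward the detected object, and the object's location is the intersection of the two rays. The camera poses, the 2D simplification, and the ray representation are assumptions; the patent operates on image pairs from calibrated cameras.

```python
# Illustrative sketch: triangulate an object on the playing surface as the
# intersection of two 2D rays from time-synchronized cameras in a mirrored
# configuration. Poses and directions below are assumed example values.
def intersect_rays(p1, d1, p2, d2):
    """Intersect rays p1 + t*d1 and p2 + s*d2 (2D) via Cramer's rule."""
    det = d1[0] * (-d2[1]) - d1[1] * (-d2[0])
    if abs(det) < 1e-9:
        raise ValueError("rays are parallel; cannot triangulate")
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    t = (rx * (-d2[1]) - ry * (-d2[0])) / det
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])

# Cameras at opposite corners of the playing surface, both sighting the ball.
cam_a, dir_a = (0.0, 0.0), (1.0, 1.0)    # bearing toward (1, 1)
cam_b, dir_b = (10.0, 0.0), (-1.0, 1.0)  # bearing toward (-1, 1)
print(intersect_rays(cam_a, dir_a, cam_b, dir_b))  # → (5.0, 5.0)
```

Repeating this per image pair at successive points in time yields the continuous object track from which player-object interactions are identified.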