G06F16/73

RECORDING AND REPRODUCING APPARATUS AND METHOD THEREOF
20230062925 · 2023-03-02

A recording and reproducing apparatus, and a corresponding method, record image information on a scene obtained through photographing to a predetermined first recording medium, reproduce it from that medium, and can set one or more chapters for each scene. A face recognizing process is executed on a photographed image based on the image information, an importance level is set for each chapter in accordance with the result of the face recognizing process for a very important person (VIP) set by a user, and the chapters having a relevant importance level among the importance levels of the respective chapters are selectively reproduced. A user can therefore find a target chapter and scene quickly and easily.
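
The chapter-level flow in this abstract can be illustrated with a minimal sketch. The abstract does not say how the importance level is computed, so the mapping below (share of sampled frames in which the VIP is recognized, scaled to a level from 0 to 3) is purely an assumption, as are the names `Chapter`, `set_importance`, and `select_for_playback` and the input format of per-frame face IDs.

```python
import math
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Chapter:
    scene_id: int
    start_s: float
    end_s: float
    # Face IDs recognized in each sampled frame (hypothetical input format).
    faces_per_frame: List[Set[str]] = field(default_factory=list)
    importance: int = 0


def set_importance(chapters: List[Chapter], vip_id: str, max_level: int = 3) -> None:
    """Set each chapter's level from the share of frames that show the VIP.

    Assumed mapping: 0 if the VIP never appears, otherwise the share of
    frames scaled to 1..max_level and rounded up.
    """
    for ch in chapters:
        if not ch.faces_per_frame:
            ch.importance = 0
            continue
        hits = sum(1 for faces in ch.faces_per_frame if vip_id in faces)
        ratio = hits / len(ch.faces_per_frame)
        ch.importance = math.ceil(ratio * max_level)


def select_for_playback(chapters: List[Chapter], min_level: int) -> List[Chapter]:
    """Return, in original order, the chapters at or above the requested level."""
    return [ch for ch in chapters if ch.importance >= min_level]
```

Calling `set_importance(chapters, "vip")` and then `select_for_playback(chapters, 2)` would yield only the chapters in which the chosen person appears in a substantial share of frames, which is one way the selective reproduction described above could behave.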

INTELLIGENT AUTOMATED ASSISTANT FOR TV USER INTERACTIONS

Systems and processes are disclosed for controlling television user interactions using a virtual assistant. In an example process, a virtual assistant can interact with a television set-top box to control content shown on a television display. Speech input for the virtual assistant can be received from a device with a microphone. The speech input can comprise a query associated with content shown on the television display. A user intent of the query can be determined based on one or more of the content shown on the television display and a viewing history of media content. A result of the query can be caused to be displayed based on the determined user intent.

Method and apparatus for video searching, terminal and storage medium

Provided are a method and device for video search, a terminal and a storage medium. The method includes: receiving a first event generated by triggering a first control in a video playback page; acquiring, in response to the first event, a current video image frame played in the video playback page when the first event is triggered; acquiring a first to-be-searched target positioned by a second control in the current video image frame and a first display position of the first to-be-searched target in the current video image frame, and displaying the second control on the first display position; and acquiring a first recommendation result corresponding to the first to-be-searched target, and displaying the first recommendation result in a search result page.

SHORT-TERM AND LONG-TERM MEMORY ON AN EDGE DEVICE

Systems and methods are provided for distributed video storage and search over edge computing devices having a short-term memory and a long-term memory. The method may comprise caching a first portion of data on a first device. The method may further comprise determining, at a second device, whether the first device has the first portion of data. The determining may be based on whether the first portion of data satisfies a specified criterion. The method may further comprise sending the data, or a portion of the data, and/or a representation of the data from the first device to a third device.
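
One possible reading of this abstract is sketched below. Everything here is an assumption made for illustration: the `EdgeDevice` class, the oldest-first eviction of short-term memory, the rule that clips meeting the criterion are also kept in long-term memory, and the `likely_held` helper that a second device might use to infer what the first device still holds without querying it.

```python
from collections import OrderedDict
from typing import Callable, Dict, List, Optional

Clip = Dict[str, float]


class EdgeDevice:
    def __init__(self, short_term_capacity: int, criterion: Callable[[Clip], bool]):
        self.short_term: "OrderedDict[int, Clip]" = OrderedDict()  # recent clips
        self.long_term: Dict[int, Clip] = {}  # clips that met the criterion
        self.capacity = short_term_capacity
        self.criterion = criterion

    def cache(self, clip_id: int, clip: Clip) -> None:
        """Cache a clip; keep it long-term if it meets the criterion."""
        self.short_term[clip_id] = clip
        if self.criterion(clip):
            self.long_term[clip_id] = clip
        while len(self.short_term) > self.capacity:
            self.short_term.popitem(last=False)  # evict the oldest clip

    def has(self, clip_id: int) -> bool:
        return clip_id in self.short_term or clip_id in self.long_term

    def representation(self, clip_id: int) -> Optional[Clip]:
        """Return a compact stand-in for the clip (here, just its metadata)."""
        if clip_id in self.short_term:
            return dict(self.short_term[clip_id])
        if clip_id in self.long_term:
            return dict(self.long_term[clip_id])
        return None


def likely_held(clip: Clip, clip_id: int, latest_id: int, capacity: int,
                criterion: Callable[[Clip], bool]) -> bool:
    """Second device's guess: held if the criterion is met or the clip is recent."""
    return criterion(clip) or (latest_id - clip_id) < capacity


def send(first: EdgeDevice, clip_id: int, third_inbox: List[Clip]) -> bool:
    """Forward a representation of the clip to a third device's inbox."""
    rep = first.representation(clip_id)
    if rep is None:
        return False
    third_inbox.append(rep)
    return True
```

In this sketch the second device only needs the clip's metadata and the shared criterion to decide whether a request to the first device is worthwhile, which is one way to interpret the determining step.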

LEVERAGING UNSUPERVISED META-LEARNING TO BOOST FEW-SHOT ACTION RECOGNITION
20230113643 · 2023-04-13

The disclosure herein describes preparing and using a cross-attention model for action recognition using pre-trained encoders and novel class fine-tuning. Training video data is transformed into augmented training video segments, which are used to train an appearance encoder and an action encoder. The appearance encoder is trained to encode video segments based on spatial semantics, and the action encoder is trained to encode video segments based on spatio-temporal semantics. A set of hard-mined training episodes is generated using the trained encoders. The cross-attention module is then trained for action-appearance aligned classification using the hard-mined training episodes. Then, support video segments are obtained, wherein each support video segment is associated with video classes. The cross-attention module is fine-tuned using the obtained support video segments and the associated video classes. A query video segment is obtained and classified as a video class using the fine-tuned cross-attention module.

User interface for labeling, browsing, and searching semantic labels within video

A system for browsing, searching and/or viewing video content includes at least one user device and a server computer operably connected to the at least one user device. The server computer includes at least one processor operably connected to an electronic storage device, and the at least one processor is programmed with computer program instructions that, when executed, cause the server computer to present a first video on a user interface to the at least one user device, wherein the user interface presents scenes of the first video and semantic labels associated with the scenes of the first video, and wherein the user interface further presents confidence parameters associated with the scenes of the first video and the semantic labels. The server computer also obtains, during presentation of a first scene of the first video, a selection of a semantic label from a user of the at least one user device, then causes, during the presentation of the first scene of the first video, a jump from the first scene to a second scene of the first video based on the selection of the semantic label, the second scene being associated with the selected semantic label, and the jump from the first scene to the second scene causing the second scene to be presented on the user interface, and then updates the presentation of the semantic labels and the confidence parameters based on the jump from the first scene to the second scene such that the updated presentation of the semantic labels and the confidence parameters on the user interface are associated with the second scene.
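
The label-driven jump described in this abstract can be sketched as follows. The abstract does not specify how the second scene is chosen, so the forward search with wrap-around, the `min_conf` threshold, and the names `Scene`, `jump_target`, and `panel_for` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Scene:
    start_s: float
    labels: Dict[str, float]  # semantic label -> confidence in [0, 1]


def jump_target(scenes: List[Scene], current: int, label: str,
                min_conf: float = 0.5) -> Optional[int]:
    """Find the next scene (searching forward, wrapping around) with the label.

    The current scene itself is skipped; None means no other scene qualifies.
    """
    n = len(scenes)
    for step in range(1, n):
        i = (current + step) % n
        if scenes[i].labels.get(label, 0.0) >= min_conf:
            return i
    return None


def panel_for(scene: Scene) -> List[Tuple[str, float]]:
    """Labels and confidences to show for a scene, highest confidence first."""
    return sorted(scene.labels.items(), key=lambda kv: -kv[1])
```

After a jump, the interface would seek to `scenes[i].start_s` and redraw the label panel from `panel_for(scenes[i])`, so the displayed labels and confidence parameters always correspond to the scene now playing.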