G06F16/7343

SYSTEMS AND METHODS FOR VIDEO RETRIEVAL AND GROUNDING

Methods and systems are described for performing video retrieval together with video grounding. A word-based query for a video is received and encoded into a query representation using a trained query encoder. One or more similar video representations, each similar to the query representation, are identified from a plurality of video representations. Each similar video representation represents a respective relevant video. A grounding is generated for each relevant video by forward propagating each respective similar video representation together with the query representation through a trained grounding module. The relevant videos, or identifiers of the relevant videos, are outputted together with the grounding generated for each relevant video.
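The retrieval step of this abstract can be sketched as a nearest-neighbour search over precomputed representations. The trained encoder and grounding module are not specified in the abstract; here a plain cosine-similarity ranking over toy vectors stands in for them, and all names (`retrieve_similar`, the 4-dimensional vectors) are illustrative assumptions.

```python
import numpy as np

def retrieve_similar(query_vec, video_vecs, top_k=2):
    """Rank stored video representations by cosine similarity to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    v = video_vecs / np.linalg.norm(video_vecs, axis=1, keepdims=True)
    sims = v @ q                       # one similarity score per video
    order = np.argsort(-sims)[:top_k]  # indices of the most similar videos
    return order, sims[order]

# Toy 4-dimensional representations for three videos and one query.
videos = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.9, 0.1, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])
query = np.array([1.0, 0.05, 0.0, 0.0])
idx, scores = retrieve_similar(query, videos)
```

In the patent's pipeline, each retrieved representation would then be forward-propagated with the query representation through the grounding module; the ranking above only covers the retrieval half.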

Retrieval of video and vehicle behavior for a driving scene described in search text

The retrieval device extracts a feature corresponding to search text by inputting the search text into a pre-trained text feature extraction model. Then, for each of plural combinations stored in a database, each combination associating a text description including plural sentences with a vehicle-view video and with vehicle behavior data representing temporal vehicle behavior, the retrieval device computes a text distance representing the difference between the feature extracted from each sentence of the associated text description and the feature corresponding to the search text. The retrieval device outputs, as the search result, a prescribed number of video and vehicle behavior data pairs in ascending order of text distance.
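The text-distance ranking described above can be illustrated with a small sketch. The abstract leaves the per-sentence aggregation unspecified, so taking the minimum distance over a description's sentences is an assumption here, as are the precomputed `sentence_feats` embeddings standing in for the text feature extraction model.

```python
import numpy as np

def retrieve_pairs(search_feat, database, k=2):
    """Return the k (video, behavior) pairs whose description sentences
    lie closest to the search-text feature, smallest text distance first."""
    scored = []
    for entry in database:
        # Text distance: assumed here to be the minimum over the features
        # of the sentences in the associated text description.
        dist = min(np.linalg.norm(f - search_feat) for f in entry["sentence_feats"])
        scored.append((dist, entry["video"], entry["behavior"]))
    scored.sort(key=lambda t: t[0])
    return [(v, b) for _, v, b in scored[:k]]

db = [
    {"video": "clip_a", "behavior": "braking",
     "sentence_feats": [np.array([0.0, 1.0]), np.array([0.2, 0.9])]},
    {"video": "clip_b", "behavior": "turning",
     "sentence_feats": [np.array([1.0, 0.0])]},
]
result = retrieve_pairs(np.array([0.1, 0.95]), db, k=1)
```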

Validation of documents against specifications for delivery of creatives on a video delivery system

In some embodiments, a method receives a file describing characteristics for delivery of a creative on a video delivery system. The file includes a string written in a structural language that defines characteristics for the delivery of the creative. The file is queried to identify elements in the string that define metadata. The method retrieves tag metadata for tags that define structural elements and validates the tag metadata based on a first specification. Media file metadata is obtained for a media file based on a link to the media file, and the media file metadata is validated based on a second specification. The method outputs a result based on the validations. The creative is eligible for insertion during a break of streaming a main video on the video delivery system when the tag metadata and the media file metadata are validated.
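The two-stage validation can be sketched over a VAST-like XML string. The tag names (`Creative`, `MediaFile`), attributes, and both specifications below are illustrative assumptions, not the actual specifications the patent validates against.

```python
import xml.etree.ElementTree as ET

# Hypothetical VAST-style document describing a creative.
DOC = ('<Ad><Creative duration="15">'
       '<MediaFile type="video/mp4" bitrate="2500">http://cdn/x.mp4</MediaFile>'
       '</Creative></Ad>')

TAG_SPEC = {"max_duration": 30}                             # first specification
MEDIA_SPEC = {"types": {"video/mp4"}, "min_bitrate": 1000}  # second specification

def validate(doc):
    root = ET.fromstring(doc)
    creative = root.find("Creative")
    media = creative.find("MediaFile")
    # Validate tag metadata against the first specification.
    tag_ok = int(creative.get("duration")) <= TAG_SPEC["max_duration"]
    # Validate media file metadata against the second specification.
    media_ok = (media.get("type") in MEDIA_SPEC["types"]
                and int(media.get("bitrate")) >= MEDIA_SPEC["min_bitrate"])
    # Eligible for insertion only when both validations pass.
    return {"tag_ok": tag_ok, "media_ok": media_ok,
            "eligible": tag_ok and media_ok}

result = validate(DOC)
```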

Method and System for Generating Elements of Recorded Information in Response to a Secondary User's Natural Language Input
20210271708 · 2021-09-02 ·

The invention relates to a computerized method and computer-based system for generating elements of recorded information for a secondary user in response to the secondary user's natural language input. The recorded information could be in the form of, for example, video, audio, audiovisual, text files, or other recordable media. The method and system of the invention permit a secondary user to access, in real time, information of an original source (e.g., allows a descendant to obtain a multimedia response stored by or on behalf of an ancestor) via a computer network, with the response being accessible via a television, audio player, Bluetooth or wireless device, or any other electronic and digital system. The access to such information can be initiated by the secondary user's input provided through use of, for example, voice response technology, including speech recognition and natural language software. The ability to access the information as recorded by the original source increases the perceived and, hopefully, the actual level of validity and accuracy, while also simulating, with multiple secondary user communication entries and responses, a ‘face-to-face conversation’ between the secondary user and the original source.

Method and System for Retrieving Video Temporal Segments
20210004605 · 2021-01-07 ·

A method and a system for retrieving video temporal segments are provided. In the method, a video is analyzed to obtain frame feature information of the video; the frame feature information is input into an encoder to output first data relating to temporal information of the video; the first data and a retrieval description for retrieving video temporal segments of the video are input into a decoder to output second data; attention computation training is conducted according to the first data and the second data; video temporal segments of the video corresponding to the retrieval description are determined according to the attention computation training.
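The grounding step, determining which temporal segment of the video matches the retrieval description, can be sketched as thresholding attention weights. The encoder/decoder of the abstract are replaced here by toy frame features and a dot-product score followed by a softmax; the threshold and all feature values are illustrative assumptions.

```python
import numpy as np

def ground_segment(frame_feats, query_feat, threshold=0.5):
    """Score each frame against the query (a stand-in for the decoder's
    attention weights) and return the span of high-scoring frames."""
    scores = frame_feats @ query_feat
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax attention
    hot = weights >= threshold * weights.max()        # keep strongly attended frames
    idx = np.flatnonzero(hot)
    return int(idx[0]), int(idx[-1])                  # [start, end] frame span

# Five toy frame features; frames 2-3 align with the query direction.
frames = np.array([[0.0, 1.0], [0.1, 0.9], [1.0, 0.1], [0.9, 0.2], [0.0, 1.0]])
start, end = ground_segment(frames, np.array([1.0, 0.0]))
```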

MULTI-DETECTOR PROBABILISTIC REASONING FOR NATURAL LANGUAGE QUERIES
20200311072 · 2020-10-01 ·

Systems and methods for solving queries on image data are provided. The system includes a processor device coupled to a memory device. The system includes a detector manager with a detector application programming interface (API) to allow external detectors to be inserted into the system by exposing capabilities of the external detectors and providing a predetermined way to execute the external detectors. An ontology manager exposes knowledge bases regarding ontologies to a reasoning engine. A query parser transforms a natural query into a query directed acyclic graph (DAG). The system includes a reasoning engine that uses the query DAG, the ontology manager and the detector API to plan an execution list of detectors. The reasoning engine uses the query DAG, a scene representation DAG produced by the external detectors and the ontology manager to answer the natural query.
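Planning an execution list of detectors from the query DAG amounts to ordering detector calls so that dependencies run first, which can be sketched with a topological sort (Python's `graphlib`). The query, detector names, and dependency edges below are invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical query DAG for "red car left of person": each node names a
# concept; edges point from a node to the nodes it depends on.
query_dag = {
    "left_of(car, person)": {"red(car)", "person"},
    "red(car)": {"car"},
    "car": set(),
    "person": set(),
}

# Stand-in for the detector API's registry of external detectors.
detectors = {
    "car": "vehicle_detector",
    "person": "person_detector",
    "red(car)": "color_classifier",
    "left_of(car, person)": "spatial_reasoner",
}

# Plan an execution list: every detector runs after its dependencies.
plan = [detectors[node] for node in TopologicalSorter(query_dag).static_order()]
```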

System and method for natural language driven search and discovery in large data sources

Presenting natural-language-understanding (NLU) results can include redundancies and awkward sentence structures. In an embodiment of the present invention, a method includes, responsive to receiving a result to an NLU query, loading a matching template of a plurality of templates stored in a memory. Each template has mask fields associated with at least one property. The method compares the properties of the mask fields of each of the templates to properties of the query and properties of the result, and selects the matching template. The method further completes the matching template by inserting fields of the result into corresponding mask fields of the matching template. The method may further suppress certain mask fields of the matching template to increase brevity and improve the naturalness of the response when appropriate based on the results of the NLU query. The method further presents the completed matching template to a user via a display.
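The template selection and mask-field suppression can be sketched as choosing the most specific template whose required properties are all present in the NLU result; when a property is missing, falling back to a shorter template effectively suppresses its mask field. The templates and property names below are illustrative assumptions.

```python
# Illustrative templates, ordered most specific first. The {field} markers
# are the mask fields; "needs" lists the result properties each requires.
TEMPLATES = [
    {"needs": {"city", "temp", "wind"},
     "text": "In {city} it is {temp} degrees with {wind} km/h winds."},
    {"needs": {"city", "temp"},
     "text": "In {city} it is {temp} degrees."},
]

def present(result):
    # Select the first template whose required properties all appear in
    # the result, then fill its mask fields from the result.
    match = next(t for t in TEMPLATES if t["needs"] <= result.keys())
    return match["text"].format(**result)

# No wind data in the result, so the wind mask field is suppressed.
answer = present({"city": "Oslo", "temp": 12})
```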

VIDEO GENERATION METHOD, APPARATUS, DEVICE, AND STORAGE MEDIUM
20240127859 · 2024-04-18 ·

The present disclosure provides a video generation method, an apparatus, a device, a storage medium, and a program product. The method includes: in response to a first instruction triggered for an input text, generating first video editing data based on the input text, where the first video editing data includes first video clips and an audio clip, and a first target video clip among the first video clips is a vacant clip; displaying the first video clips and the audio clip on a video editing track of a video editor; in response to a second instruction triggered for the first target video clip on the video editor, filling the first target video clip with a target video to obtain second video editing data; and generating a first video based on the second video editing data.
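The editing-data flow, a generated track containing a vacant clip that a second instruction later fills, can be sketched with a small data structure. All class and field names (`Clip`, `EditingData`, `fill_vacant`) are illustrative assumptions, not the patent's actual structures.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Clip:
    source: Optional[str] = None   # None marks a vacant clip
    duration: float = 0.0

@dataclass
class EditingData:
    video_clips: list              # clips on the video editing track
    audio_clip: str

def fill_vacant(data, target_index, video_source):
    """Second instruction: fill the vacant target clip with a chosen video."""
    clip = data.video_clips[target_index]
    if clip.source is not None:
        raise ValueError("target clip is not vacant")
    clip.source = video_source
    return data

# First instruction: editing data with one vacant clip and a generated audio clip.
data = EditingData(video_clips=[Clip(None, 5.0), Clip("intro.mp4", 3.0)],
                   audio_clip="tts_narration.mp3")
# Second instruction: the user picks a video for the vacant slot.
data = fill_vacant(data, 0, "user_pick.mp4")
```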