G06V20/635

PROGRESSIVE LOCALIZATION METHOD FOR TEXT-TO-VIDEO CLIP LOCALIZATION
20230260267 · 2023-08-17 ·

A progressive localization method for text-to-video clip localization. The method comprises: first, extracting features of the two modalities, namely the video modality and the text modality, using different feature extraction methods; then progressively selecting different step sizes and learning the correlation between the video and the text in multiple stages; and finally, training the model in an end-to-end manner based on the correlation loss of each stage. Moreover, the fine-time-granularity stage is fused with information from the coarse-time-granularity stage by means of a condition feature update module and an up-sampling connection, so that the different stages mutually reinforce one another. Different stages attend to clips of different temporal granularities, and through the interrelation between the stages the model can cope with large variations in the length of the target clip.
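As a rough illustration of the coarse-to-fine idea only (not the patented architecture), the sketch below scores candidate clips at two temporal granularities with cosine similarity; the feature dimensions, window sizes, strides, and the additive way the coarse scores condition the fine stage are all assumptions standing in for the condition feature update module and up-sampling connection.

```python
import numpy as np

def score_stage(video_feats, text_feat, window, stride):
    """Score sliding-window clips at one temporal granularity (cosine similarity)."""
    scores, spans = [], []
    for start in range(0, len(video_feats) - window + 1, stride):
        clip = video_feats[start:start + window].mean(axis=0)
        sim = clip @ text_feat / (np.linalg.norm(clip) * np.linalg.norm(text_feat) + 1e-8)
        scores.append(sim)
        spans.append((start, start + window))
    return np.array(scores), spans

# Toy features: 64 frames x 128-dim video features, one 128-dim sentence feature.
rng = np.random.default_rng(0)
video_feats = rng.normal(size=(64, 128))
text_feat = rng.normal(size=128)

# Coarse stage: long windows, large stride.
coarse_scores, coarse_spans = score_stage(video_feats, text_feat, window=16, stride=8)

# Fine stage: short windows, small stride, conditioned on the coarse stage by
# up-sampling its scores to frame resolution and adding them as a prior.
frame_prior = np.repeat(coarse_scores, 8)
frame_prior = np.pad(frame_prior, (0, len(video_feats) - len(frame_prior)), mode="edge")
fine_scores, fine_spans = score_stage(video_feats, text_feat, window=4, stride=2)
fine_scores = fine_scores + frame_prior[[s for s, _ in fine_spans]]

best = fine_spans[int(np.argmax(fine_scores))]
print("predicted clip (frames):", best)
```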

METHOD AND SYSTEM FOR THE IDENTIFICATION AND HANDLING OF EVENTS IN A SYSTEM FOR THE SYNCHRONIZATION AND COMBINED DISPLAY OF INFORMATION
20220133430 · 2022-05-05 ·

A method and system for the identification and handling of events for the synchronization and combined display of information in a sequence of frames relating to the progress of a diagnostic investigation or surgery performed on a patient.

Frames are searched for the occurrence of an event, the search being carried out by comparing identified events with sought-after events stored in an archive. When an event is recognized, a computer program is executed to process the frame and/or a program procedure is activated to signal the occurrence of the event, and the event, or the frame containing it, is stored with the correct temporal reference within the sequence of frames.
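To make the comparison-and-handling step concrete, here is a minimal Python sketch; the event names, the archive structure, and the handler functions are invented for illustration and are not taken from the publication.

```python
from dataclasses import dataclass

@dataclass
class FrameEvent:
    frame_index: int   # temporal reference within the sequence
    name: str          # event recognized in the frame

# Archive of sought-after events mapped to the handler to run on recognition
# (names and handlers are placeholders, not the publication's own event set).
def flag_bleeding(event):
    print(f"ALERT: {event.name} at frame {event.frame_index}")

ARCHIVE = {"bleeding": flag_bleeding, "instrument_in_view": lambda e: None}

def process_sequence(identified_events):
    """Compare identified events with sought-after events and store matches
    in their correct temporal order within the sequence."""
    stored = []
    for event in identified_events:
        handler = ARCHIVE.get(event.name)
        if handler is not None:      # event recognized
            handler(event)           # process the frame / signal the occurrence
            stored.append(event)     # memorize with its temporal reference
    return sorted(stored, key=lambda e: e.frame_index)

events = [FrameEvent(120, "bleeding"), FrameEvent(40, "instrument_in_view"), FrameEvent(300, "glare")]
print(process_sequence(events))
```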

VIDEO PROCESSING

A video processing method and apparatus are provided. The video processing method includes: extracting at least two types of modal information from a received target video; extracting text information from the at least two types of modal information based on extraction methods corresponding to the at least two types of modal information; and matching preset object information of a target object against the text information to determine an object list corresponding to the target object included in the target video.
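The matching step could look roughly like the sketch below, which assumes the per-modality text has already been extracted (e.g. by ASR and OCR) and uses plain keyword containment as a stand-in for whatever matching the method actually performs; the object names and keywords are illustrative only.

```python
def build_object_list(modal_texts, preset_objects):
    """Match preset object information against text extracted from each modality.

    modal_texts: {modality name: extracted text}
    preset_objects: {object id: list of keywords describing the target object}
    """
    matches = []
    for obj_id, keywords in preset_objects.items():
        for modality, text in modal_texts.items():
            if any(kw.lower() in text.lower() for kw in keywords):
                matches.append({"object": obj_id, "modality": modality})
                break
    return matches

# Example: text already extracted from two modalities of the target video.
modal_texts = {
    "speech": "today we review the new Alpha X1 phone",   # from ASR
    "ocr":    "ALPHA X1 - unboxing",                       # from on-screen text
}
preset_objects = {"alpha_x1_phone": ["Alpha X1"], "beta_tablet": ["Beta Tab"]}
print(build_object_list(modal_texts, preset_objects))
```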

Caption Anomaly Detection

Systems, apparatuses, and methods are described for detecting anomalies in closed captioning or other video presentation systems. Anomaly detection may involve comparing detected captions that are delivered to one or more end devices (return captions) with the corresponding scheduled captions. Other types of information may be compared in a similar way, matching the originally scheduled instances of information to be delivered against the actual (return) delivered information. Such other types of information may include, for example, ratings information (such as V-chip ratings and/or flags) and/or content (e.g., advertisement) insertion information such as SCTE-35 signaling.
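A minimal sketch of the scheduled-versus-return comparison is shown below; pairing captions by rounded timestamp, the similarity threshold, and the anomaly categories are assumptions made for illustration, not details from the publication.

```python
from difflib import SequenceMatcher

def caption_anomalies(scheduled, returned, min_similarity=0.8):
    """Compare scheduled captions with return captions reported by end devices.

    Each caption is (timestamp_seconds, text). A caption is flagged if it never
    comes back, or if the returned text differs too much from what was scheduled.
    """
    returned_by_time = {round(t): txt for t, txt in returned}
    anomalies = []
    for t, expected in scheduled:
        actual = returned_by_time.get(round(t))
        if actual is None:
            anomalies.append((t, "missing", expected))
        elif SequenceMatcher(None, expected, actual).ratio() < min_similarity:
            anomalies.append((t, "mismatch", actual))
    return anomalies

scheduled = [(10.0, "Welcome back to the show"), (12.5, "Tonight: election results")]
returned = [(10.1, "Welcome back to the show")]     # second caption never delivered
print(caption_anomalies(scheduled, returned))
```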

User-exhibit distance based collaborative interaction method and system for augmented reality museum

The present invention discloses a user-exhibit distance based collaborative interaction method and system for an augmented reality museum. The method includes: detecting and acquiring dynamic position information of a user, and calculating a sensing distance of the user in real time according to the dynamic position information of the user; establishing a distance model centered on an exhibit according to the sensing distance, and setting interaction authority of the user according to the distance model and the real-time sensing distance of the user; dynamically matching a single-user interaction mode and a multi-user collaborative interaction mode within the user's interaction authority to the user, according to the interaction authority corresponding to the user's sensing distance; executing, by the user, single-user interaction behavior and multi-user collaborative interaction behavior in real time according to the single-user interaction mode and the multi-user collaborative interaction mode; and recommending exhibits between users according to the real-time results of the single-user and multi-user collaborative interaction behaviors. The method and system encourage users to interact with other nearby users in real time, and in an engaging way, while progressively learning about the museum exhibits.
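The distance-model step could be sketched as below; the authority tiers, the radii, and the mapping from authority to interaction mode are invented for illustration and are not the values defined in the publication.

```python
import math

# Distance thresholds (metres) defining an exhibit-centred distance model;
# tiers and radii are illustrative assumptions only.
AUTHORITY_TIERS = [
    (1.5, "full"),           # close: single-user and collaborative interaction
    (4.0, "collaborative"),  # medium: collaborative interaction only
    (8.0, "view"),           # far: viewing and recommendations only
]

def sensing_distance(user_pos, exhibit_pos):
    """Real-time sensing distance from the user's dynamic position."""
    return math.dist(user_pos, exhibit_pos)

def interaction_authority(distance):
    for radius, authority in AUTHORITY_TIERS:
        if distance <= radius:
            return authority
    return "none"

def interaction_mode(authority, nearby_users):
    if authority == "full":
        return "multi_user_collaborative" if nearby_users else "single_user"
    if authority == "collaborative":
        return "multi_user_collaborative"
    return "observe"

user, exhibit = (2.0, 1.0), (0.0, 0.0)
d = sensing_distance(user, exhibit)
auth = interaction_authority(d)
print(d, auth, interaction_mode(auth, nearby_users=2))
```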

APPARATUSES AND METHODS FOR SELECTIVELY INSERTING TEXT INTO A VIDEO RESUME
20230298630 · 2023-09-21 ·

Aspects relate to apparatuses and methods for selectively inserting text into a video resume. An exemplary apparatus includes a processor and a memory communicatively connected to the processor, the memory containing instructions configuring the processor to receive a video resume from a user, divide the video resume into temporal sections, acquire a plurality of textual inputs from the user, wherein the plurality of textual inputs pertains to the same user as the received video resume, classify the plurality of textual inputs to corresponding temporal sections of the received video resume, and display, as a function of the classification, the received video resume with the corresponding plurality of textual inputs.
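One simple way to picture the classification step is the sketch below, which assigns each textual input to the temporal section whose summary it overlaps with most; the bag-of-words overlap, the section summaries, and the example inputs are assumptions for illustration only.

```python
def classify_inputs_to_sections(textual_inputs, section_summaries):
    """Assign each textual input to the temporal section it best matches.

    section_summaries: {(start_s, end_s): summary text for that section}
    """
    assignments = {}
    for text in textual_inputs:
        words = set(text.lower().split())
        best = max(
            section_summaries,
            key=lambda span: len(words & set(section_summaries[span].lower().split())),
        )
        assignments.setdefault(best, []).append(text)
    return assignments

sections = {
    (0, 30):  "introduction name background education",
    (30, 90): "work experience software engineer projects",
}
inputs = ["Led two software projects at Acme", "BSc in computer science"]
for span, texts in classify_inputs_to_sections(inputs, sections).items():
    print(span, texts)   # texts to display alongside that temporal section
```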

WORKFLOW FOR AUTOMATIC MEASUREMENT OF DOPPLER PIPELINE

Workflows for automatic measurement of Doppler data are provided. In various embodiments, a plurality of frames of a medical video are read. A mode label indicative of a mode of each of the plurality of frames is determined. At least one of the plurality of frames is provided to a trained feature generator, the at least one frame having the same mode label. At least one feature vector corresponding to the at least one frame is obtained from the trained feature generator. The at least one feature vector is provided to a trained classifier. A valve label indicative of a valve, corresponding to the at least one frame, is obtained from the trained classifier. One or more measurements indicative of a disease condition are extracted from those of the frames matching a predetermined valve label.
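The pipeline shape can be sketched as below; every stage here (the mode heuristic, the histogram "features", the toy valve classifier, and the intensity-based "measurement") is a stand-in chosen only so the script runs end to end, not the trained models of the publication.

```python
import numpy as np

def mode_label(frame):
    """Stand-in mode detector: a dark lower half is treated as a Doppler strip."""
    lower = frame[frame.shape[0] // 2:]
    return "doppler" if lower.mean() < 0.5 else "b_mode"

def feature_generator(frame):
    """Stand-in for the trained feature generator: a coarse intensity histogram."""
    hist, _ = np.histogram(frame, bins=16, range=(0.0, 1.0))
    return hist / hist.sum()

def valve_classifier(feature_vector):
    """Stand-in for the trained classifier mapping features to a valve label."""
    return "mitral" if feature_vector[:8].sum() > 0.5 else "aortic"

def measure(frame):
    """Stand-in measurement, e.g. a peak-velocity proxy from pixel intensities."""
    return float(frame.max())

rng = np.random.default_rng(1)
video = rng.random((10, 64, 64))      # 10 frames of a toy "medical video"
target_valve = "mitral"               # predetermined valve label

measurements = []
for frame in video:
    if mode_label(frame) != "doppler":
        continue                       # keep only frames with the wanted mode label
    if valve_classifier(feature_generator(frame)) == target_valve:
        measurements.append(measure(frame))
print(measurements)
```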

METHODS AND APPARATUS FOR EFFICIENT MEDIA SEARCH AND ANALYSIS
20220030325 · 2022-01-27 ·

Methods and apparatus for providing attribute-based search of media assets, such as video and audio assets, and associated augmented reality functions, including dynamic provision of relevant secondary content relating to the identified attribute(s). In one embodiment, media or content assets are ingested into a processing system and processed according to one or more attribute detection, identification, and characterization algorithms. Detected attributes may include, for example, the presence of tangible items such as clothing or chattels, particular persons such as celebrities, and/or certain contexts such as sporting activities and musical performances, as rendered within the media asset. In one implementation, the characterized assets are assigned a unique ID and stored so as to permit cross-correlation based on, e.g., the identified and characterized attributes. Secondary content (e.g., advertising or promotional content) is correlated to each identified asset and dynamically served once a user provides some threshold level of interaction with the attribute.
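A minimal ingest-and-correlate sketch is shown below; the keyword-based "detector", the advertisement catalogue, and the UUID identifier scheme are illustrative assumptions standing in for the detection, characterization, and storage components described above.

```python
import uuid

# Toy attribute detector and advertisement catalogue (illustrative assumptions).
AD_CATALOGUE = {"sneakers": "sports-shoe promo", "guitar": "music-store promo"}

def detect_attributes(asset_description):
    """Stand-in for the detection/identification/characterization algorithms:
    pick out catalogue keywords appearing in a textual description of the asset."""
    return [a for a in AD_CATALOGUE if a in asset_description.lower()]

def ingest(asset_description, index):
    asset_id = str(uuid.uuid4())            # unique ID for the characterized asset
    attributes = detect_attributes(asset_description)
    index[asset_id] = {
        "attributes": attributes,
        "secondary_content": [AD_CATALOGUE[a] for a in attributes],
    }
    return asset_id

index = {}
aid = ingest("celebrity wearing red sneakers playing guitar on stage", index)
print(index[aid])   # served when a user interacts with one of these attributes
```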

VIDEO PROCESSING FOR EMBEDDED INFORMATION CARD LOCALIZATION AND CONTENT EXTRACTION
20220027631 · 2022-01-27 ·

Metadata for one or more highlights of a video stream may be extracted from one or more card images embedded in the video stream. The highlights may be segments of the video stream, such as a broadcast of a sporting event, that are of particular interest. According to one method, video frames of the video stream are stored. One or more information cards embedded in a decoded video frame may be detected by analyzing one or more predetermined video frame regions. Image segmentation, edge detection, and/or closed contour identification may then be performed on identified video frame region(s). Further processing may include obtaining a minimum rectangular perimeter area enclosing all remaining segments, which may then be further processed to determine precise boundaries of information card(s). The card image(s) may be analyzed to obtain metadata, which may be stored in association with at least one of the video frames.
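The localization steps map naturally onto common OpenCV operations; the sketch below uses a synthetic frame, a hard-coded "predetermined region", and fixed Canny thresholds purely for illustration, and it stops at the cropped card rather than performing OCR or metadata extraction.

```python
import cv2
import numpy as np

# Synthetic frame with a bright "score bug" in the top-left, standing in for a
# decoded broadcast frame; coordinates and thresholds are assumptions.
frame = np.zeros((360, 640, 3), dtype=np.uint8)
cv2.rectangle(frame, (20, 20), (200, 80), (255, 255, 255), thickness=-1)

region = frame[0:120, 0:320]                     # predetermined frame region to analyze
gray = cv2.cvtColor(region, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)                 # edge detection
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

if contours:
    # Minimum rectangular perimeter enclosing all remaining segments ->
    # candidate boundary of the information card.
    xs, ys, ws, hs = zip(*(cv2.boundingRect(c) for c in contours))
    x0, y0 = min(xs), min(ys)
    x1 = max(x + w for x, w in zip(xs, ws))
    y1 = max(y + h for y, h in zip(ys, hs))
    card = region[y0:y1, x0:x1]                  # crop to pass to OCR / metadata extraction
    print("card boundary:", (x0, y0, x1, y1), "size:", card.shape)
```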

METHOD FOR PROCESSING A VIDEO FILE COMPRISING AUDIO CONTENT AND VISUAL CONTENT COMPRISING TEXT CONTENT

This invention relates to a computer-implemented method (10) for processing a video file, said video file comprising audio content and visual content, the visual content comprising text content, wherein the method comprises: (S11) extracting the text content in the visual content; (S12) generating context information for the audio content based on the text content extracted from said visual content; and (S13) converting the audio content into text by using the context information generated based on the text content extracted from the visual content of the video file.
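One way to picture steps (S11)-(S13) is the sketch below: on-screen text is reduced to a context phrase list that a speech recognizer could use as vocabulary hints. The regex heuristic, the example names, and the placeholder transcription function are assumptions; a real system would pass the phrases to its own ASR engine.

```python
import re

def build_context(on_screen_texts):
    """(S12) Turn text extracted from the visual content (S11) into a context
    phrase list: keep distinct capitalized tokens, e.g. names and product terms."""
    phrases = set()
    for text in on_screen_texts:
        phrases.update(re.findall(r"\b[A-Z][a-zA-Z0-9]+\b", text))
    return sorted(phrases)

def transcribe_with_context(audio_path, context_phrases):
    """Placeholder for step (S13): a real system would pass `context_phrases`
    to its speech recognizer as vocabulary hints / boosted phrases."""
    return f"<transcript of {audio_path} biased toward {context_phrases}>"

frames_text = ["Dr. Evelyn Reed - Cardiology", "Acme MediScan 3000"]   # example OCR output
context = build_context(frames_text)
print(context)
print(transcribe_with_context("lecture.mp4", context))
```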