G06V20/635

Video processing for embedded information card localization and content extraction
11615621 · 2023-03-28

Metadata for one or more highlights of a video stream may be extracted from one or more card images embedded in the video stream. The highlights may be segments of the video stream, such as a broadcast of a sporting event, that are of particular interest. According to one method, video frames of the video stream are stored. One or more information cards embedded in a decoded video frame may be detected by analyzing one or more predetermined video frame regions. Image segmentation, edge detection, and/or closed contour identification may then be performed on identified video frame region(s). Further processing may include obtaining a minimum rectangular perimeter area enclosing all remaining segments, which may then be further processed to determine precise boundaries of information card(s). The card image(s) may be analyzed to obtain metadata, which may be stored in association with at least one of the video frames.
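
A minimal sketch of the described localization steps using OpenCV; the region coordinates, Canny thresholds, and minimum-area filter below are illustrative assumptions rather than values from the patent:

```python
import cv2
import numpy as np

def locate_card(frame: np.ndarray, region: tuple):
    """Search one predetermined frame region for an embedded information card."""
    x, y, w, h = region
    roi = frame[y:y + h, x:x + w]

    # Edge detection on the candidate region (thresholds are illustrative).
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)

    # Closed-contour identification; drop small segments as noise.
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    segments = [c for c in contours if cv2.contourArea(c) > 100.0]
    if not segments:
        return None

    # Minimum rectangular perimeter enclosing all remaining segments,
    # mapped back to full-frame coordinates for further boundary refinement.
    cx, cy, cw, ch = cv2.boundingRect(np.vstack(segments))
    return (x + cx, y + cy, cw, ch)
```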

Optimized reduced bitrate encoding for titles and credits in video content

Embodiments include systems, methods, and computer-readable media for optimized reduced bitrate encoding for text-based content in video frames. Example methods may include determining that a first segment of video content includes a content scene, determining that a second segment of the video content includes text, and determining a first encoder configuration to encode the first segment of video content, where the first encoder configuration includes a first encoding parameter setting. Example methods may include determining a second encoder configuration to encode the second segment of the video content, where the second encoder configuration includes a second encoding parameter setting, encoding the first segment using the first encoder configuration, and encoding the second segment using the second encoder configuration. The first segment may be encoded at a first bitrate that is greater than a second bitrate at which the second segment is encoded.
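
A hedged sketch of the two-configuration idea; the bitrates and the ffmpeg invocation are illustrative assumptions (the abstract does not prescribe a particular encoder or parameter values):

```python
import subprocess

SCENE_KBPS = 4000  # first (content-scene) segments: higher bitrate
TEXT_KBPS = 800    # second (titles/credits) segments: reduced bitrate

def encode_segment(src: str, dst: str, start: float, dur: float, is_text: bool) -> None:
    """Encode one segment with the configuration chosen for its content type."""
    kbps = TEXT_KBPS if is_text else SCENE_KBPS
    subprocess.run([
        "ffmpeg", "-ss", str(start), "-t", str(dur), "-i", src,
        "-c:v", "libx264", "-b:v", f"{kbps}k", dst,
    ], check=True)

encode_segment("movie.mp4", "scene.mp4", 0.0, 120.0, is_text=False)
encode_segment("movie.mp4", "credits.mp4", 120.0, 90.0, is_text=True)
```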

Systems and methods of presenting video overlays

Systems and methods are provided for relocating an overlay that overlaps information in content. The systems and methods may comprise receiving a content item, the content item comprising a video image, and determining a first screen position of an information box (e.g., a score box) in the video image. Determining may be performed with image analysis and/or a machine learning model. The system receives an overlay image (e.g., a channel logo) with a second screen position and determines whether the second screen position (e.g., for the logo) overlaps the first screen position (e.g., for the score). In response to determining that the second screen position (e.g., of the logo) overlaps the first screen position (e.g., the score), the system modifies the second screen position (e.g., for the logo). The system then generates for display the overlay image on the video in the modified screen position. The system may not relocate the overlay if the overlay is high priority.
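
An illustrative sketch of the overlap test and relocation logic; the Rect layout, the candidate anchor positions, and the priority rule are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Rect:
    x: int
    y: int
    w: int
    h: int

def overlaps(a: Rect, b: Rect) -> bool:
    return a.x < b.x + b.w and b.x < a.x + a.w and a.y < b.y + b.h and b.y < a.y + a.h

def place_overlay(info_box: Rect, overlay: Rect, frame_w: int, frame_h: int,
                  high_priority: bool = False) -> Rect:
    """Move the overlay (e.g., a channel logo) off the information box."""
    if high_priority or not overlaps(info_box, overlay):
        return overlay  # high-priority overlays are never relocated
    # Try the four frame corners and keep the first non-overlapping position.
    for cx, cy in [(0, 0), (frame_w - overlay.w, 0),
                   (0, frame_h - overlay.h), (frame_w - overlay.w, frame_h - overlay.h)]:
        candidate = Rect(cx, cy, overlay.w, overlay.h)
        if not overlaps(info_box, candidate):
            return candidate
    return overlay
```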

Acquiring public opinion and training word viscosity model

A public opinion acquisition method and device, a word viscosity model training method and device, a server, and a medium are provided in the present disclosure, which relates to the technical field of artificial intelligence, specifically to image recognition and natural language processing, and which can be used in a cloud platform. A video public opinion acquisition method includes: receiving a public opinion acquisition request, the public opinion acquisition request including a public opinion keyword to be acquired; matching the public opinion keyword to be acquired with video data including a recognition result, wherein the recognition result is obtained by performing predefined content recognition on the video data, the predefined content recognition including text recognition and image recognition; and determining video data that matches the public opinion keyword to be acquired as result video data.
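
A simplified sketch of matching a public opinion keyword against per-video recognition results; the record layout and the substring matching are assumptions:

```python
def match_videos(keyword: str, videos: list) -> list:
    """Return videos whose text- or image-recognition results mention the keyword."""
    kw = keyword.lower()
    return [
        v for v in videos
        if any(kw in item.lower()
               for item in v.get("text_recognition", []) + v.get("image_recognition", []))
    ]

result_video_data = match_videos("product recall", [
    {"id": "v1",
     "text_recognition": ["Breaking: product recall announced"],  # e.g., OCR of captions
     "image_recognition": ["news studio", "ticker"]},
])
```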

Semantically-guided template generation from image content

Techniques for template generation from image content include extracting information associated with an input image. The information comprises: 1) layout information indicating positions of content corresponding to a content type of a plurality of content types within the input image; and 2) text attributes indicating at least a font of text included in the input image. A user-editable template having the characteristics of the input image is generated based on the layout information and the text attributes.
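
A sketch of assembling a user-editable template from the extracted layout information and text attributes; the template schema below is an assumption, not the format from the disclosure:

```python
def build_template(layout: list, text_attrs: dict) -> dict:
    """layout: [{"type": "heading", "bbox": (x, y, w, h)}, ...]"""
    return {
        "placeholders": [
            {"content_type": item["type"], "bbox": item["bbox"], "editable": True}
            for item in layout
        ],
        "text_style": {"font": text_attrs.get("font", "sans-serif")},
    }

template = build_template(
    [{"type": "heading", "bbox": (40, 32, 520, 80)},
     {"type": "body", "bbox": (40, 140, 520, 300)}],
    {"font": "Helvetica"},
)
```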

CONTENT RECOGNITION METHOD AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM

A method for content recognition includes acquiring, from a content for recognition, a text piece and a media piece associated with the text piece, performing a first feature extraction on the text piece to obtain text features, performing a second feature extraction on the media piece associated with the text piece to obtain media features, and determining feature association measures between the media features and the text features. A feature association measure for a first feature in the media features and a second feature in the text features indicates an association degree between the first feature and the second feature. The method further includes adjusting the text features based on the feature association measures to obtain adjusted text features, and performing a recognition based on the adjusted text features to obtain a content recognition result of the content. Apparatus and non-transitory computer-readable storage medium counterpart embodiments are also contemplated.
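
A NumPy sketch of the association-measure idea: text features are adjusted using media features weighted by a softmax over pairwise similarities. The dimensions and the dot-product similarity are assumptions:

```python
import numpy as np

def adjust_text_features(text: np.ndarray, media: np.ndarray) -> np.ndarray:
    """text: (T, d) text-piece features; media: (M, d) media-piece features."""
    scores = text @ media.T                        # (T, M) feature association measures
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    context = weights @ media                      # media context per text feature
    return text + context                          # adjusted text features

rng = np.random.default_rng(0)
adjusted = adjust_text_features(rng.normal(size=(5, 16)), rng.normal(size=(3, 16)))
```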

OBJECT CHARACTERIZATION USING ONE OR MORE NEURAL NETWORKS

Apparatuses, systems, and techniques are presented to detect one or more objects in one or more images. In at least one embodiment, one or more neural networks can be used to detect one or more objects in one or more images based, at least in part, on textual descriptions of the one or more objects.
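
A toy sketch of text-conditioned detection: candidate-region embeddings are matched to a textual-description embedding by cosine similarity. In practice both embeddings would come from neural networks, which are out of scope here, and the threshold is an assumption:

```python
import numpy as np

def match_regions(region_embs: np.ndarray, text_emb: np.ndarray, thresh: float = 0.5):
    """region_embs: (N, d) candidate regions; text_emb: (d,) description embedding."""
    r = region_embs / np.linalg.norm(region_embs, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    return np.nonzero(r @ t >= thresh)[0]  # indices of regions matching the description
```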

PERSONALIZED VR CONTROLS AND COMMUNICATIONS
20230128658 · 2023-04-27

Systems and methods for personalized controls and communications in virtual environments are provided. A virtual reality (VR) profile may be stored in memory for a user. Such a VR profile may specify a cue associated with custom instructions executable to modify one or more virtual display elements. An interactive session associated with a virtual environment in which the user is participating via a user device may be monitored based on the VR profile stored for the user. The cue specified by the VR profile may be detected as being present in the monitored session. The virtual display elements may be modified within a presentation of the virtual environment provided to the user device in accordance with the executable instructions associated with the cue specified by the VR profile of the user.
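
An illustrative sketch of a stored VR profile whose cue, when detected in a monitored session, returns display-element modifications; all field names are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class VRProfile:
    user_id: str
    cue: str                 # e.g., a phrase watched for in session events
    instructions: dict = field(default_factory=dict)  # display-element changes

def monitor_session(profile: VRProfile, session_events: list):
    """Return the display modifications once the profile's cue is detected."""
    for event in session_events:
        if profile.cue in event.lower():
            return profile.instructions
    return None

profile = VRProfile("u42", "score", {"scoreboard": {"visible": True, "scale": 1.5}})
mods = monitor_session(profile, ["chat: what's the score?"])
```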

Systems and methods for augmented reality application for annotations and adding interfaces to control panels and screens

Example implementations described herein involve systems and methods for providing a platform to facilitate augmented reality (AR) overlays, which can involve stabilizing video received from a first device for display on a second device and, for input made to a portion of the stabilized video at the second device, generating an AR overlay on a display of the first device corresponding to that portion of the stabilized video.
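
A sketch of one plausible mapping step: a tap on the stabilized video at the second device is mapped back to the first device's live view through the inverse of the stabilization homography. The matrix here is a toy example, not part of the described system:

```python
import numpy as np

def to_live_coords(pt_stabilized: tuple, H_stab: np.ndarray) -> tuple:
    """H_stab maps live-frame coords to stabilized coords; invert it for input."""
    p = np.array([pt_stabilized[0], pt_stabilized[1], 1.0])
    q = np.linalg.inv(H_stab) @ p
    return (q[0] / q[2], q[1] / q[2])  # where to anchor the AR overlay in the live view

H = np.array([[1.0, 0.0, 12.0],
              [0.0, 1.0, -8.0],
              [0.0, 0.0, 1.0]])  # toy translation-only stabilization
live_pt = to_live_coords((100.0, 50.0), H)
```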

Robust audio identification with interference cancellation

Audio distortion compensation methods that improve the accuracy and efficiency of audio content identification are described; the methods are also applicable to speech recognition. Methods to detect interference from speakers and other sources, and distortion introduced into the audio by the environment and devices, are discussed, along with additional methods to detect distortion in the content after performing search and correlation. The causes of actual distortion at each client are measured, registered, and learned in order to generate rules for determining likely distortion and interference sources. The learned rules are applied at the client: detected likely distortions are compensated, or heavily distorted sections are ignored, at the audio level or at the signature and feature level, depending on the compute resources available. Further methods are described to subtract the likely distortions in the query, both at the audio level and, after processing, at the signature and feature level.
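
A minimal spectral-subtraction sketch of removing a likely interference estimate from a query before identification; the frame size, absence of overlap, and zero-flooring rule are assumptions, and the interference estimate is assumed to be time-aligned and at least as long as the query:

```python
import numpy as np

def subtract_interference(query: np.ndarray, interference: np.ndarray,
                          frame: int = 1024) -> np.ndarray:
    """Subtract the interference magnitude spectrum frame by frame."""
    out = np.copy(query)
    for start in range(0, len(query) - frame + 1, frame):
        Q = np.fft.rfft(query[start:start + frame])
        I = np.fft.rfft(interference[start:start + frame])
        mag = np.maximum(np.abs(Q) - np.abs(I), 0.0)  # floor negative magnitudes at zero
        # Keep the query's phase and resynthesize the cleaned frame.
        out[start:start + frame] = np.fft.irfft(mag * np.exp(1j * np.angle(Q)), n=frame)
    return out
```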