G06V20/635

Learning representations of generalized cross-modal entailment tasks
11250299 · 2022-02-15

A method is provided for determining entailment between an input premise and an input hypothesis of different modalities. The method includes extracting features from the input hypothesis and from both the entirety of, and regions of interest in, the input premise. The method further includes deriving intra-modal relevant information while suppressing intra-modal irrelevant information, based on intra-modal interactions between elementary ones of the features of the input hypothesis and between elementary ones of the features of the input premise. The method also includes attaching cross-modal relevant information from the features of the input premise to the features of the input hypothesis to form a cross-modal representation, based on cross-modal interactions between pairs of different elementary features from different modalities. The method additionally includes classifying a relationship between the input premise and the input hypothesis, based on the cross-modal representation, using a label selected from the group consisting of entailment, neutral, and contradiction.
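The "attaching" step can be pictured as a form of attention. The following is only a minimal sketch under assumptions that the abstract does not state: features are plain lists of floats of equal length, the cross-modal interaction is a dot product, and the premise context is concatenated onto each hypothesis feature. The function names are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attach_cross_modal(hyp_feats, prem_feats):
    """Append an attention-weighted premise summary to each hypothesis feature.

    Assumption: dot-product attention stands in for the unspecified
    cross-modal interaction between pairs of elementary features.
    """
    fused = []
    for h in hyp_feats:
        weights = softmax([dot(h, p) for p in prem_feats])
        context = [
            sum(w * p[i] for w, p in zip(weights, prem_feats))
            for i in range(len(h))
        ]
        fused.append(h + context)  # list concatenation, not element-wise sum
    return fused

# One hypothesis (text) feature and two premise (image region) features.
hypothesis = [[1.0, 0.0]]
premise = [[1.0, 0.0], [0.0, 1.0]]
fused = attach_cross_modal(hypothesis, premise)
```

A classifier over `fused` would then choose among entailment, neutral, and contradiction; that stage is omitted here.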

Subtitle detection for stereoscopic video contents
09762889 · 2017-09-12

A right image and a left image are provided. A disparity estimation map relating to both images is then provided, and a left subtitle detection map for the left image and a right subtitle detection map for the right image are generated. Each map indicates subtitle areas within an image. For said subtitle areas, and based on said disparity estimation map, a subtitle disparity value for the X and Y directions, common to all subtitle areas, is determined. Said left and right subtitle maps and said subtitle X and Y disparity values are used in an image interpolation process.
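One way to picture the common subtitle disparity is as a robust statistic over the disparity values that fall inside the detected subtitle areas. The sketch below is an illustration only: the abstract does not say how the value is determined, so the use of a median, the data layout, the use of only the left detection map, and the function name are all assumptions.

```python
from statistics import median

def common_subtitle_disparity(disparity_map, subtitle_mask):
    """Return a single (dx, dy) shared by all subtitle pixels, or None.

    disparity_map: 2-D list of (dx, dy) tuples, one per pixel.
    subtitle_mask: 2-D list of booleans, True where a subtitle was detected.
    Assumption: the median is used as the common value.
    """
    xs, ys = [], []
    for disp_row, mask_row in zip(disparity_map, subtitle_mask):
        for (dx, dy), is_subtitle in zip(disp_row, mask_row):
            if is_subtitle:
                xs.append(dx)
                ys.append(dy)
    if not xs:
        return None  # no subtitle area detected
    return (median(xs), median(ys))

disparity = [
    [(0, 0), (0, 0), (0, 0)],
    [(4, 1), (5, 1), (4, 1)],
]
mask = [
    [False, False, False],
    [True, True, True],
]
result = common_subtitle_disparity(disparity, mask)
```

Using one value for every subtitle pixel keeps the text at a single depth during interpolation, which is the property the abstract asks for.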

DISPLAY METHOD AND APPARATUS FOR ITEM INFORMATION, DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM
20220239988 · 2022-07-28

This application relates to a method for displaying item information performed by a computer device. The method includes: displaying a live stream image comprising a live stream host performing in a live stream environment and at least one item located in the live stream environment, and the at least one item comprising a target item; in response to an item recognition operation, displaying an item tag corresponding to the at least one item; and in response to receiving a first selection operation on an item tag, displaying an item link region. Because the item tag includes an item keyword, preliminary information corresponding to the item can be directly provided to a viewer of the live stream. When the viewer account initiates a selection operation on the item tag, an item link region including item information is displayed. In this way, the viewer can further interact with the item information.

IMAGE DETECTION APPARATUS AND OPERATION METHOD THEREOF
20210406577 · 2021-12-30

An image detection apparatus includes: a display outputting an image; a memory storing one or more instructions; and a processor configured to execute the one or more instructions stored in the memory to: detect, by using a neural network, an additional information area in a first image output on the display; obtain style information of the additional information area from the additional information area; and detect, in a second image output on the display, an additional information area having style information different from the style information by using a model that has learned an additional information area having new style information generated based on the style information.

SOFTWARE TEST CASE MAINTENANCE
20210397546 · 2021-12-23

Provided herein is technology relating to selecting an element in a software application and particularly, but not exclusively, to systems and methods for identifying a target element for testing a software application using artificial intelligence.

Methods and apparatus to measure brand exposure in media streams

Methods and apparatus to measure brand exposure in media streams are disclosed. An example method to determine brand exposures included in media content disclosed herein comprises determining whether a scene detected from a media stream corresponding to the media content matches a reference scene, identifying an expected region of interest in the detected scene based on information describing the reference scene when the detected scene is determined to match the reference scene and the reference scene is not specified to be a scene of no interest, and determining whether a reference brand identifier associated with the reference scene is included in the expected region of interest identified in the detected scene.
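The decision flow in this abstract can be sketched as follows. This is a simplified illustration, not the patented implementation: scene matching is reduced to comparing a signature string, and `ReferenceScene`, `check_brand_exposure`, and the `brand_in_region` callback are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Region = Tuple[int, int, int, int]  # x, y, width, height

@dataclass
class ReferenceScene:
    signature: str                    # assumed stand-in for scene matching
    of_no_interest: bool
    expected_region: Optional[Region]
    brand_id: Optional[str]

def check_brand_exposure(
    detected_signature: str,
    references: List[ReferenceScene],
    brand_in_region: Callable[[Optional[str], Optional[Region]], bool],
) -> Optional[str]:
    """Return the brand identifier found in the expected region, else None."""
    for ref in references:
        if ref.signature != detected_signature:
            continue  # detected scene does not match this reference
        if ref.of_no_interest:
            return None  # matched, but flagged as a scene of no interest
        if brand_in_region(ref.brand_id, ref.expected_region):
            return ref.brand_id
        return None
    return None  # no reference scene matched

refs = [
    ReferenceScene("stadium-wide", False, (0, 400, 640, 80), "acme"),
    ReferenceScene("studio", True, None, None),
]
# Stub detector: pretends the "acme" logo is visible in the banner region.
detector = lambda brand, region: brand == "acme" and region == (0, 400, 640, 80)

hit = check_brand_exposure("stadium-wide", refs, detector)
skipped = check_brand_exposure("studio", refs, detector)
unknown = check_brand_exposure("crowd", refs, detector)
```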

Robust Audio Identification with Interference Cancellation

Audio distortion compensation methods to improve accuracy and efficiency of audio content identification are described. The method is also applicable to speech recognition. Methods to detect the interference from speakers and sources, and distortion to audio from environment and devices, are discussed. Additional methods to detect distortion to the content after performing search and correlation are illustrated. The causes of actual distortion at each client are measured and registered and learnt to generate rules for determining likely distortion and interference sources. The learnt rules are applied at the client, and likely distortions that are detected are compensated or heavily distorted sections are ignored at audio level or signature and feature level based on compute resources available. Further methods to subtract the likely distortions in the query at both audio level and after processing at signature and feature level are described.

Machine learning for recognizing and interpreting embedded information card content
11373404 · 2022-06-28

Metadata for highlights of a video stream is extracted from card images embedded in the video stream. The highlights may be segments of a video stream, such as a broadcast of a sporting event, that are of particular interest to one or more users. Card images embedded in video frames of the video stream are identified and processed to extract text. The text characters may be recognized by applying a machine-learned model trained with a set of characters extracted from card images embedded in sports television programming contents. The training set of character vectors may be pre-processed to maximize metric distance between the training set members. The text may be interpreted to obtain the metadata. The metadata may be stored in association with the portion of the video stream. The metadata may provide information regarding the highlights, and may be presented concurrently with playback of the highlights.

SYSTEM AND METHOD FOR IDENTIFYING AND DISPLAYING INFORMATION RELATED TO AN OFF SCREEN PLOT ELEMENT OR CHARACTER IN A MEDIA STREAM
20220198141 · 2022-06-23

A method for identifying and displaying one or more of a plot element and a character in a media stream includes identifying a name of the plot element or character in a portion of the media stream, determining if the identified name is a name of an off screen plot element or character, and if the identified name is a name of an off screen plot element or character, displaying information related to the off screen plot element or character on a user terminal.
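A toy version of the three steps (identify a name, test whether it is off screen, look up information to display) might look like the sketch below. It assumes single-word names, a dialogue string taken from the media stream, and a known set of on-screen characters; the function names and the `character_info` table are hypothetical.

```python
import string

def off_screen_mentions(dialogue, known_names, on_screen):
    """Return known names mentioned in the dialogue that are not on screen."""
    words = {w.strip(string.punctuation).lower() for w in dialogue.split()}
    return [
        name for name in known_names
        if name.lower() in words and name not in on_screen
    ]

def info_to_display(dialogue, on_screen, character_info):
    """Map each off-screen name mentioned in the dialogue to its info text."""
    names = off_screen_mentions(dialogue, list(character_info), on_screen)
    return {name: character_info[name] for name in names}

character_info = {
    "Maria": "Tom's sister, last seen in episode 2",
    "Tom": "Protagonist",
}
shown = info_to_display("Have you heard from Maria, Tom?", {"Tom"}, character_info)
```

Here Tom is mentioned but on screen, so only Maria's entry is selected for display on the user terminal.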

DISPLAY DEVICE AND DRIVING METHOD THEREOF

A display device includes: pixels arranged in a display area; a timing controller which generates image data of each frame based on an input image signal of each frame, the timing controller including a logo controller which detects a logo image and a logo area including the logo image from the input image signal of each frame to control luminance of the logo image; and a data driver which generates a data signal based on the image data and supplies the data signal to the pixels. The logo controller generates a first logo map based on an input image signal of a previous frame, generates a second logo map based on an input image signal of a current frame, and determines a similarity between the first logo map and the second logo map to selectively change luminance of a logo image of a next frame.
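The comparison of the two logo maps can be sketched as below. The abstract does not name a similarity measure or say how luminance is changed, so intersection-over-union, the threshold, and the gain value are assumptions chosen for illustration, and the function names are hypothetical.

```python
def logo_map_similarity(prev_map, curr_map):
    """Intersection-over-union of two binary logo maps (2-D lists of 0/1)."""
    inter = union = 0
    for prev_row, curr_row in zip(prev_map, curr_map):
        for p, c in zip(prev_row, curr_row):
            inter += 1 if (p and c) else 0
            union += 1 if (p or c) else 0
    return inter / union if union else 0.0  # no logo in either frame

def next_frame_logo_gain(prev_map, curr_map, threshold=0.8, dim_gain=0.7):
    """Return a luminance gain for the logo area of the next frame.

    Assumption: a stable logo (similar maps) is dimmed, otherwise the
    luminance is left unchanged (gain 1.0).
    """
    if logo_map_similarity(prev_map, curr_map) >= threshold:
        return dim_gain
    return 1.0

previous = [[0, 1, 1], [0, 1, 1]]
current = [[0, 1, 1], [0, 1, 1]]
moved = [[1, 0, 0], [1, 0, 0]]

stable_gain = next_frame_logo_gain(previous, current)
changed_gain = next_frame_logo_gain(previous, moved)
```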