G06V20/635

ROBUST AUDIO IDENTIFICATION WITH INTERFERENCE CANCELLATION

Audio distortion compensation methods that improve the accuracy and efficiency of audio content identification are described. The methods are also applicable to speech recognition. Methods to detect interference from speakers and sources, and distortion to audio from the environment and devices, are discussed. Additional methods to detect distortion to the content after performing search and correlation are illustrated. The causes of actual distortion at each client are measured, registered, and learnt to generate rules for determining likely distortion and interference sources. The learnt rules are applied at the client, and detected likely distortions are compensated, or heavily distorted sections are ignored, at the audio level or at the signature and feature level, depending on the compute resources available. Further methods to subtract the likely distortions in the query, both at the audio level and after processing at the signature and feature level, are described.

SEMANTICALLY-GUIDED TEMPLATE GENERATION FROM IMAGE CONTENT

Techniques for template generation from image content include extracting information associated with an input image. The information comprises: 1) layout information indicating positions of content corresponding to a content type of a plurality of content types within the input image; and 2) text attributes indicating at least a font of text included in the input image. A user-editable template having the characteristics of the input image is generated based on the layout information and the text attributes.

Detection of transitions between text and non-text frames in a video stream

Detecting the start of a credit roll within a video program may allow for the automatic extension of video recordings, among other functions. The start of the credit roll may be detected by determining the number of text blocks within a sequence of frames and identifying a point in the sequence of frames where a difference between the number of text blocks in frames occurring before the point and the number of text blocks in frames occurring after the point is greatest and exceeds a specified threshold. Text blocks may be identified within each frame by partitioning the frame into one or more segments and recording the segments having a pixel of a sufficiently high contrast. Contiguous segments may be merged or combined into single blocks, which may then be filtered to remove noise and false positives. Additional content may be inserted into the credit roll frames.
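The split-point search described above can be sketched as follows. This is only an illustrative reading, not the patented implementation: the function name `find_credit_roll_start` is hypothetical, the per-frame text-block counts are assumed to be already computed, and the "difference" is assumed to be the mean count after the point minus the mean count before it.

```python
from typing import List, Optional


def find_credit_roll_start(text_block_counts: List[int],
                           threshold: float) -> Optional[int]:
    """Return the index of the first frame after the best split point.

    Assumption: the difference is measured as the mean text-block count
    after the split minus the mean count before it. Returns None when no
    split exceeds the threshold.
    """
    n = len(text_block_counts)
    total = sum(text_block_counts)
    best_index: Optional[int] = None
    best_diff = threshold  # a candidate must strictly exceed the threshold
    running = 0
    for split in range(1, n):
        running += text_block_counts[split - 1]
        mean_before = running / split
        mean_after = (total - running) / (n - split)
        diff = mean_after - mean_before
        if diff > best_diff:
            best_diff = diff
            best_index = split
    return best_index
```

Using means rather than raw sums keeps the comparison fair when the split point is far from the middle of the sequence; a different normalization would be an equally plausible reading of the abstract.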

Methods and apparatus to measure brand exposure in media streams

Methods and apparatus to measure brand exposure in media streams are disclosed. Example apparatus disclosed herein are to determine a first histogram based on at least one of luminance components or chrominance components of a first frame of video, and determine a second histogram based on at least one of luminance components or chrominance components of a second frame of the video. Disclosed example apparatus are also to detect a transition in the video based on the first histogram and the second histogram and, responsive to the detection of the transition in the video, to process a region of interest within at least one of the first frame or the second frame to detect a logo in the region of interest.
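A histogram-based transition check of this kind might look like the sketch below. Several details are assumptions made for illustration and are not taken from the abstract: frames are nested lists of 8-bit luminance values, the histograms use 16 bins, the distance is half the L1 difference between normalized histograms, and the 0.5 threshold is arbitrary. The function names are hypothetical.

```python
from typing import List, Sequence


def luminance_histogram(frame: Sequence[Sequence[int]],
                        bins: int = 16) -> List[float]:
    """Normalized histogram of 8-bit luminance values (0..255)."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for value in row:
            counts[min(value * bins // 256, bins - 1)] += 1
            total += 1
    if total == 0:
        return [0.0] * bins
    return [c / total for c in counts]


def is_transition(first_hist: Sequence[float],
                  second_hist: Sequence[float],
                  threshold: float = 0.5) -> bool:
    """Flag a transition when the histograms differ by more than threshold.

    The distance is half the L1 difference, so it ranges from 0.0
    (identical histograms) to 1.0 (no overlap).
    """
    distance = 0.5 * sum(abs(a - b) for a, b in zip(first_hist, second_hist))
    return distance > threshold
```

In a fuller pipeline, logo detection on the region of interest would run only for frame pairs where `is_transition` returns true.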

Display device and driving method thereof

A display device includes: pixels arranged in a display area; a timing controller which generates image data of each frame based on an input image signal of each frame, the timing controller including a logo controller which detects a logo image and a logo area including the logo image from the input image signal of each frame to control luminance of the logo image; and a data driver which generates a data signal based on the image data and supplies the data signal to the pixels. The logo controller generates a first logo map based on an input image signal of a previous frame, generates a second logo map based on an input image signal of a current frame, and determines a similarity between the first logo map and the second logo map to selectively change luminance of a logo image of a next frame.
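As a rough illustration of the logo-map comparison, the sketch below treats each logo map as a binary grid and measures similarity with the Jaccard index. Both choices are assumptions, as are the 0.9 similarity cutoff, the 0.8 luminance gain, and the function names; the abstract does not specify how similarity is computed or how luminance is changed.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[int]]


def logo_map_similarity(first_map: Grid, second_map: Grid) -> float:
    """Jaccard similarity between two binary logo maps (assumed metric)."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(first_map, second_map):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    # Two empty maps are treated as identical.
    return intersection / union if union else 1.0


def dim_logo(next_frame: Grid, logo_map: Grid, similarity: float,
             min_similarity: float = 0.9, gain: float = 0.8) -> List[List[int]]:
    """Scale luminance inside the logo area only if the maps agree."""
    if similarity < min_similarity:
        return [list(row) for row in next_frame]
    return [
        [round(pixel * gain) if in_logo else pixel
         for pixel, in_logo in zip(pixel_row, map_row)]
        for pixel_row, map_row in zip(next_frame, logo_map)
    ]
```

Gating the luminance change on map similarity means a logo that has moved or disappeared between the previous and current frames leaves the next frame untouched.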

Processing method and apparatus, terminal device and medium
11676385 · 2023-06-13

A target video and video description information corresponding to the target video are acquired; salient object information of the target video is determined; a key frame category of the video description information is determined; and the target video, the video description information, the salient object information and the key frame category are input into a processing model to obtain a timestamp of an image corresponding to the video description information in the target video.

Systems and methods of extracting text from a digital image
09830508 · 2017-11-28

A method of extracting text from a digital image is provided. The method of extracting text includes receiving a digital image at an image processor where the digital image includes a textual object and a graphical object. A mask is generated based on the digital image. The mask includes a pattern having a first pattern area associated with the textual object and a second pattern area associated with the graphical object. The mask is applied to the digital image creating a transformed digital image. The transformed digital image includes a portion of the digital image associated with the textual object. Character recognition is performed on the portion of the digital image associated with the textual object of the transformed digital image to create a recognized text output.
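The mask-application step can be pictured with the small sketch below. It assumes a grayscale image stored as nested lists and a binary mask in which 1 marks the textual pattern area and 0 the graphical pattern area; the function name `apply_text_mask` and the white background value are hypothetical. Mask generation and character recognition are outside the scope of the sketch.

```python
from typing import List, Sequence


def apply_text_mask(image: Sequence[Sequence[int]],
                    mask: Sequence[Sequence[int]],
                    background: int = 255) -> List[List[int]]:
    """Keep pixels in the textual area; blank out the graphical area.

    Assumption: mask value 1 marks the textual pattern area and 0 marks
    the graphical pattern area.
    """
    return [
        [pixel if keep else background
         for pixel, keep in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]
```

The transformed image returned here would then be handed to whatever character recognition engine is in use.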

VISUAL DOMAIN DETECTION SYSTEMS AND METHODS

Disclosed is an effective domain name defense solution in which a domain name string may be provided to or obtained by a computer embodying a visual domain analyzer. The domain name string may be rendered or otherwise converted to an image. An optical character recognition function may be applied to the image to read out a text string which can then be compared with a protected domain name to determine whether the text string generated by the optical character recognition function from the image converted from the domain name string is similar to or matches the protected domain name. This visual domain analysis can be dynamically applied in an online process or proactively applied in an offline process to hundreds of millions of domain names.

3D captions with face tracking

Aspects of the present disclosure involve a system comprising a computer-readable storage medium storing at least one program, and a method for performing operations comprising: receiving, by one or more processors that implement a messaging application, a video feed from a camera of a user device; detecting, by the messaging application, a face in the video feed; in response to detecting the face in the video feed, retrieving a three-dimensional (3D) caption; modifying the video feed to include the 3D caption at a position in 3D space of the video feed proximate to the face; and displaying a modified video feed that includes the face and the 3D caption.

APPARATUS, SYSTEMS AND METHODS FOR CONTROL OF MEDIA CONTENT EVENT RECORDING
20170311026 · 2017-10-26

Systems and methods are operable to record a user-specified media content event at a media device. An exemplary embodiment grabs a series of subsequently received image frames from a preceding broadcasting media content event after a monitored real time reaches a closing credits monitor time, wherein the closing credits monitor time is a recording start time less a predefined duration. The embodiment then analyzes each of the image frames to identify an occurrence of text presented in the analyzed image frame, determines that the identified text corresponds to closing credits of the preceding broadcasting media content event if at least one attribute of the identified text matches a corresponding predefined closing credits attribute, and initiates a start of the recording of the user-specified media content event in response to determining that the identified text corresponds to the closing credits of the preceding broadcasting media content event.
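The timing rule and the attribute match can be sketched as below. The function names, the dictionary representation of text attributes, and the example attribute names are all hypothetical and serve only to illustrate the arithmetic "recording start time less a predefined duration" and the "at least one attribute matches" test.

```python
from datetime import datetime, timedelta
from typing import Dict


def closing_credits_monitor_time(recording_start: datetime,
                                 predefined_duration: timedelta) -> datetime:
    """Closing credits monitor time = recording start time less a duration."""
    return recording_start - predefined_duration


def should_grab_frames(now: datetime, recording_start: datetime,
                       predefined_duration: timedelta) -> bool:
    """True once the monitored real time has reached the monitor time."""
    return now >= closing_credits_monitor_time(recording_start,
                                               predefined_duration)


def matches_closing_credits(text_attributes: Dict[str, str],
                            credits_attributes: Dict[str, str]) -> bool:
    """True if at least one identified attribute equals its predefined value."""
    return any(
        text_attributes.get(name) == value
        for name, value in credits_attributes.items()
    )
```

For example, with a recording start of 20:00 and a predefined duration of five minutes, frame grabbing would begin at 19:55, and recording would start as soon as `matches_closing_credits` returns true for text found in a grabbed frame.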