G06V30/262

Method for generating tag of video, electronic device, and storage medium

A method for generating a tag of a video, an electronic device, and a storage medium relate to the fields of natural language processing and deep learning technologies. The implementation includes: obtaining multiple candidate tags and video information of the video; determining first correlation information between the video information and each of the multiple candidate tags; sorting the multiple candidate tags based on the first correlation information to obtain a sort result; and generating the tag of the video based on the sort result.
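The ranking step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the "first correlation information" is stood in for by cosine similarity between an assumed video embedding and assumed per-tag embedding vectors.

```python
# Hypothetical sketch of the tag-ranking step: score each candidate tag
# against the video information, sort by score, keep the top results.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def generate_tags(video_vec, candidates, top_k=2):
    """candidates: dict mapping tag -> embedding vector (assumed given)."""
    scored = [(cosine(video_vec, vec), tag) for tag, vec in candidates.items()]
    scored.sort(reverse=True)                  # sort by correlation score
    return [tag for _, tag in scored[:top_k]]  # tags taken from the sort result

video = [1.0, 0.0, 0.5]
cands = {"sports": [0.9, 0.1, 0.4], "cooking": [0.0, 1.0, 0.0], "news": [0.5, 0.2, 0.3]}
print(generate_tags(video, cands))  # ['sports', 'news']
```

The embeddings here are toy vectors; in practice they would come from the natural language and video models the abstract alludes to.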

System and method for synthetic image generation with localized editing

Embodiments described herein provide a system for generating synthetic images with localized editing. During operation, the system obtains a source image and a target image for image synthesis and selects a semantic element from the source image. The semantic element indicates a semantically meaningful part of an object depicted in the source image. The system then determines feature representations encoding the style information associated with the source and target images. Subsequently, the system generates a synthetic image by transferring the style of the semantic element from the source image to the target image based on the feature representations. In this way, the system can facilitate localized editing of the target image.
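A toy version of localized style transfer can be sketched under strong assumptions: "style" is reduced to per-region mean and standard deviation, and the semantic element is given as a boolean mask. Real systems operate on learned feature maps rather than raw pixels.

```python
# Sketch under assumptions: match the target's statistics to the source's
# inside the masked (semantic) region only, leaving the rest untouched.
import numpy as np

def transfer_region_style(source, target, mask):
    """Re-style the masked region of target with source-region statistics."""
    src, tgt = source[mask], target[mask]
    out = target.astype(float).copy()
    scaled = (tgt - tgt.mean()) / (tgt.std() + 1e-8)  # normalize target region
    out[mask] = scaled * src.std() + src.mean()       # apply source-region style
    return out

src = np.array([[10.0, 10.0], [10.0, 20.0]])
tgt = np.array([[0.0, 1.0], [2.0, 3.0]])
mask = np.array([[True, True], [True, False]])
out = transfer_region_style(src, tgt, mask)
print(out)  # masked pixels take on source statistics; out[1, 1] stays 3.0
```

Only the pixels selected by the mask are edited, which is the "localized" aspect the abstract emphasizes.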

RETOUCHING DIGITAL IMAGES UTILIZING LAYER SPECIFIC DEEP-LEARNING NEURAL NETWORKS
20230058793 · 2023-02-23 ·

The present disclosure relates to an image retouching system that automatically retouches digital images by accurately correcting face imperfections such as skin blemishes and redness. For instance, the image retouching system automatically retouches a digital image by separating the digital image into multiple frequency layers, utilizing a separate corresponding neural network to apply frequency-specific corrections at each frequency layer, and combining the retouched frequency layers into a retouched digital image. As described herein, the image retouching system efficiently utilizes different neural networks to target and correct skin features specific to each frequency layer.
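The frequency-layer pipeline can be illustrated with a minimal sketch, assuming a two-layer split: a blurred (low-frequency) layer carrying broad color variation such as redness, and the residual (high-frequency) layer carrying blemish-scale detail. The per-layer neural networks are stubbed as plain callables.

```python
# Minimal sketch of frequency-layer retouching: split, correct per layer,
# recombine. The per-layer "networks" here are placeholder functions.
import numpy as np

def box_blur(img, k=3):
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def retouch(img, low_correct, high_correct):
    low = box_blur(img)          # low-frequency layer (color, redness)
    high = img - low             # high-frequency layer (fine detail, blemishes)
    return low_correct(low) + high_correct(high)  # recombine retouched layers

img = np.arange(16, dtype=float).reshape(4, 4)
# With identity "networks", recombination reproduces the input image.
out = retouch(img, lambda x: x, lambda x: x)
print(np.allclose(out, img))  # True
```

The identity check demonstrates that the split is lossless; a real system would substitute trained networks for the two lambdas.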

SPECIFICITY RANKING OF TEXT ELEMENTS AND APPLICATIONS THEREOF

Ranking a plurality of text elements, each comprising at least one word, by specificity. For each text element to be ranked, such a method includes computing an embedding vector that locates the text element in an embedding space, and selecting a set of text fragments from reference text. Each of these text fragments contains the text element to be ranked and further text elements. For each text fragment, the method calculates respective distances in the embedding space between the further text elements. The method further includes calculating a specificity score for the text element to be ranked and storing the specificity score. After ranking the plurality of text elements, a text data structure may be processed using the specificity scores to extract data having a desired specificity from the data structure.
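The scoring idea can be sketched as follows. This is a hedged illustration: the specific scoring rule (negated mean pairwise distance between the context words of each fragment, so that tightly clustered contexts score as more specific) is an assumption, and the embeddings are toy 2-D vectors.

```python
# Hypothetical specificity score: for each fragment containing the element,
# measure how far apart its other (context) words sit in embedding space.
import math
from itertools import combinations

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def specificity(fragments):
    """fragments: per-fragment lists of context-word embeddings.
    Tightly clustered contexts -> small distances -> higher score."""
    dists = [dist(a, b) for frag in fragments for a, b in combinations(frag, 2)]
    return -sum(dists) / len(dists)  # negate so higher means more specific

# A specific term's contexts cluster; a generic term's contexts scatter.
specific_frags = [[[0.0, 0.0], [0.1, 0.0]], [[0.0, 0.1], [0.1, 0.1]]]
generic_frags = [[[0.0, 0.0], [5.0, 0.0]], [[0.0, 3.0], [4.0, 4.0]]]
print(specificity(specific_frags) > specificity(generic_frags))  # True
```

Ranking the elements by this score then supports the extraction step the abstract describes: querying a text data structure for entries at a desired specificity level.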

DOCUMENT ANALYSIS TO IDENTIFY DOCUMENT CHARACTERISTICS AND APPENDING THE DOCUMENT CHARACTERISTICS TO A RECORD

In some implementations, a device may receive a document associated with a series of recurring events and associated with an account. The device may analyze, using at least one of an optical character recognition technique or a natural language processing technique, the document to identify one or more characteristics associated with the document. The device may match the document with a record included in a ledger associated with the account based on the one or more characteristics associated with the document, enabling the device to identify that the record is associated with a recurring event of the series of recurring events. The device may modify display information associated with the ledger to append at least one characteristic associated with the document to information associated with the record. The device may transmit, to a user device, the display information to cause the display information to be displayed by the user device.
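The matching-and-append flow can be sketched in a few lines. The particular characteristics used for matching (merchant name and amount) and the appended field are illustrative assumptions; the abstract leaves the characteristics open.

```python
# Illustrative sketch: match an OCR'd/NLP-extracted document to a ledger
# record by shared characteristics, then append a document characteristic
# (here, the billing period) to the matched record's display information.
def match_record(doc, ledger):
    """doc: dict of extracted characteristics; ledger: list of record dicts."""
    for record in ledger:
        if (record["merchant"] == doc["merchant"]
                and abs(record["amount"] - doc["amount"]) < 0.01):
            record["recurring_note"] = doc.get("billing_period")  # append characteristic
            return record
    return None

ledger = [
    {"merchant": "StreamCo", "amount": 9.99},
    {"merchant": "GymPlus", "amount": 29.99},
]
doc = {"merchant": "GymPlus", "amount": 29.99, "billing_period": "monthly"}
matched = match_record(doc, ledger)
print(matched["merchant"], matched["recurring_note"])
```

In the abstract's terms, the appended field marks the record as part of a series of recurring events before the display information is transmitted to the user device.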

Scene understanding and generation using neural networks

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for image rendering. In one aspect, a method comprises receiving a plurality of observations characterizing a particular scene, each observation comprising an image of the particular scene and data identifying a location of a camera that captured the image. In another aspect, the method comprises receiving a plurality of observations characterizing a particular video, each observation comprising a video frame from the particular video and data identifying a time stamp of the video frame in the particular video. In yet another aspect, the method comprises receiving a plurality of observations characterizing a particular image, each observation comprising a crop of the particular image and data characterizing the crop of the particular image. The method processes each of the plurality of observations using an observation neural network to determine a numeric representation as output.
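The common pattern across the three aspects is: encode each (frame, metadata) observation with an observation network, then aggregate the per-observation codes into one numeric representation. The toy encoder and sum aggregation below are stand-ins, not the patented architecture.

```python
# Sketch: each observation pairs a frame (feature list) with metadata
# (camera location, time stamp, or crop data, reduced to one number here).
def encode(observation):
    frame, meta = observation
    return [f + meta for f in frame]   # toy observation "network", an assumption

def scene_representation(observations):
    codes = [encode(obs) for obs in observations]
    return [sum(vals) for vals in zip(*codes)]  # permutation-invariant sum

obs = [([1.0, 2.0], 0.5), ([3.0, 4.0], 1.5)]
print(scene_representation(obs))  # [6.0, 8.0]
```

Summing makes the representation independent of observation order, a property commonly wanted when the set of views of a scene has no natural ordering.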

Image processing device, image processing method, and image processing system

Provided are: an amodal segmentation unit that generates a set of first amodal masks indicating a probability that a particular pixel belongs to a relevant object for each of objects, with respect to an input image in which a plurality of the objects partially overlap; an overlap segmentation unit that generates an overlap mask corresponding only to an overlap region where the plurality of objects overlap in the input image based on an aggregate mask obtained by combining the set of first amodal masks generated for each of the objects and a feature map generated based on the input image; and an amodal mask correction unit that generates and outputs a second amodal mask, which includes an annotation label indicating a category of each of the objects corresponding to a relevant pixel, for each of pixels in the input image using the overlap mask and the aggregate mask.
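The overlap stage can be illustrated with a toy sketch: given per-object amodal probability maps, the overlap region is wherever more than one object is likely present. The thresholding rule is an assumption; the patented unit derives the overlap mask from learned features.

```python
# Toy sketch: combine per-object amodal masks and mark pixels claimed
# by two or more objects as the overlap region.
import numpy as np

def overlap_mask(amodal_masks, thresh=0.5):
    binary = [m > thresh for m in amodal_masks]
    count = np.sum(binary, axis=0)   # how many objects claim each pixel
    return count >= 2                # overlap: pixels claimed by >1 object

m1 = np.array([[0.9, 0.9, 0.1], [0.9, 0.9, 0.1]])  # object 1 (left)
m2 = np.array([[0.1, 0.8, 0.8], [0.1, 0.8, 0.8]])  # object 2 (right)
ov = overlap_mask([m1, m2])
print(ov.astype(int))  # middle column is where both objects overlap
```

The corrected second amodal mask would then assign a category label per pixel using this overlap mask together with the aggregate mask.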

Image identification device, method for performing semantic segmentation, and storage medium
11587345 · 2023-02-21 ·

An image identification device includes an image acquisition unit configured to acquire an image, a feature value extraction unit configured to extract a plurality of feature values of the acquired image, a feature map creation unit configured to create a feature map for each of the plurality of feature values, and a multiplication unit configured to multiply each of the feature maps by a weighting factor that is an arbitrary positive value indicating a degree of importance of a feature.
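The multiplication unit reduces to a broadcasted per-map scaling, sketched below. The channel-first layout and the choice of weights are assumptions for illustration.

```python
# Minimal sketch: scale each feature map by a positive weighting factor
# expressing the importance of its feature.
import numpy as np

def weight_feature_maps(feature_maps, weights):
    """feature_maps: array of shape (C, H, W); weights: C positive factors."""
    w = np.asarray(weights, dtype=float).reshape(-1, 1, 1)
    assert (w > 0).all(), "weights must be arbitrary positive values"
    return feature_maps * w  # broadcast one factor per feature map

maps = np.ones((3, 2, 2))
weighted = weight_feature_maps(maps, [0.5, 1.0, 2.0])
print(weighted[0, 0, 0], weighted[2, 0, 0])  # 0.5 2.0
```

Downstream segmentation then sees features emphasized or attenuated according to their assigned importance.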

METHOD AND APPARATUS FOR DETERMINING USER INTENT
20230044981 · 2023-02-09 ·

The disclosed embodiments describe methods, systems, and apparatuses for determining user intent. A method is disclosed comprising obtaining a session text of a user; calculating, by a processor, a feature vector based on the session text; determining probabilities that the session text belongs to a plurality of intent labels, the probabilities calculated using a multi-level hierarchical intent classification model, the intent labels assigned to levels in the multi-level hierarchical intent classification model; and assigning a user intent to the session text based on the probabilities.
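One common way to realize a multi-level hierarchical classifier, sketched here as an assumption rather than the patented model, is to score each leaf intent as the product of conditional probabilities along its path through the label hierarchy.

```python
# Hedged sketch: per-level conditional probabilities (model outputs are
# stubbed) are multiplied along each leaf's path; the best leaf wins.
def assign_intent(level_probs, hierarchy):
    """level_probs: {label: P(label | parent)}; hierarchy: leaf -> path of labels."""
    scores = {}
    for leaf, path in hierarchy.items():
        p = 1.0
        for label in path:
            p *= level_probs[label]  # chain probabilities down the hierarchy
        scores[leaf] = p
    return max(scores, key=scores.get), scores

level_probs = {"billing": 0.7, "support": 0.3, "refund": 0.8, "invoice": 0.2, "bug": 0.9}
hierarchy = {
    "refund": ["billing", "refund"],
    "invoice": ["billing", "invoice"],
    "bug": ["support", "bug"],
}
intent, scores = assign_intent(level_probs, hierarchy)
print(intent)  # refund
```

In a real system the per-level probabilities would come from classifiers over the feature vector computed from the session text.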

SYSTEM AND METHOD FOR LEARNING SCENE EMBEDDINGS VIA VISUAL SEMANTICS AND APPLICATION THEREOF
20230041472 · 2023-02-09 ·

The present teaching relates to a method, system, and programming for responding to an image-related query. Information related to each of a plurality of images is received, wherein the information represents concepts co-existing in the image. Visual semantics for each of the plurality of images are created based on the information related thereto. Representations of scenes of the plurality of images are obtained via machine learning, based on the visual semantics of the plurality of images, wherein the representations capture concepts associated with the scenes.
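A drastically simplified stand-in for the learned scene representation: encode each image's co-existing concepts as a bag-of-concepts vector over a fixed vocabulary. The vocabulary and overlap measure are assumptions for illustration only.

```python
# Toy sketch: visual semantics as sets of co-existing concepts, scene
# representation as a bag-of-concepts vector over an assumed vocabulary.
def scene_embedding(concepts, vocab):
    return [1.0 if c in concepts else 0.0 for c in vocab]

vocab = ["beach", "umbrella", "sand", "car", "road"]
img1 = {"beach", "umbrella", "sand"}   # one scene's co-existing concepts
img2 = {"car", "road"}                 # a different scene's concepts
e1, e2 = scene_embedding(img1, vocab), scene_embedding(img2, vocab)
overlap = sum(a * b for a, b in zip(e1, e2))
print(overlap)  # 0.0 — the two scenes share no concepts
```

A learned model would replace the binary vector with a dense embedding, but the query-answering idea is the same: scenes with shared concepts land near each other in the representation space.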