Patent classifications
G06V30/274
RETOUCHING DIGITAL IMAGES UTILIZING LAYER SPECIFIC DEEP-LEARNING NEURAL NETWORKS
The present disclosure relates to an image retouching system that automatically retouches digital images by accurately correcting face imperfections such as skin blemishes and redness. For instance, the image retouching system automatically retouches a digital image by separating it into multiple frequency layers, utilizing a separate corresponding neural network to apply frequency-specific corrections at each frequency layer, and combining the retouched frequency layers into a retouched digital image. As described herein, the image retouching system efficiently utilizes different neural networks to target and correct skin features specific to each frequency layer.
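The layer-separation pipeline described in this abstract can be sketched in miniature. This is a toy illustration, not the patented method: a box blur stands in for the learned low-pass separation, plain Python functions stand in for the layer-specific neural networks, and images are nested lists of floats.

```python
# Hypothetical sketch of frequency-layer retouching. The blur kernel and
# the per-layer "networks" (plain callables here) are stand-ins for the
# learned models in the disclosure.

def box_blur(img, k=1):
    """Simple box blur; a proxy for the low-pass filter that separates layers."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[y2][x2]
                    for y2 in range(max(0, y - k), min(h, y + k + 1))
                    for x2 in range(max(0, x - k), min(w, x + k + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out

def split_frequency_layers(img):
    # Low frequencies carry color/redness; high frequencies carry blemish detail.
    low = box_blur(img)
    high = [[img[y][x] - low[y][x] for x in range(len(img[0]))]
            for y in range(len(img))]
    return low, high

def retouch(img, low_net, high_net):
    # Apply a separate "network" per frequency layer, then recombine.
    low, high = split_frequency_layers(img)
    low_r, high_r = low_net(low), high_net(high)
    return [[low_r[y][x] + high_r[y][x] for x in range(len(img[0]))]
            for y in range(len(img))]
```

With identity functions in place of the networks, recombining the layers recovers the original image, which is the invariant that makes per-layer correction safe.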
GENERATING SYNTHESIZED DIGITAL IMAGES UTILIZING A MULTI-RESOLUTION GENERATOR NEURAL NETWORK
This disclosure describes methods, non-transitory computer readable storage media, and systems that generate synthesized digital images via multi-resolution generator neural networks. The disclosed system extracts multi-resolution features from a scene representation to condition a spatial feature tensor and a latent code to modulate an output of a generator neural network. For example, the disclosed system utilizes a base encoder of the generator neural network to generate a feature set from a semantic label map of a scene. The disclosed system then utilizes a bottom-up encoder to extract multi-resolution features and generate a latent code from the feature set. Furthermore, the disclosed system determines a spatial feature tensor by utilizing a top-down encoder to up-sample and aggregate the multi-resolution features. The disclosed system then utilizes a decoder to generate a synthesized digital image based on the spatial feature tensor and the latent code.
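The data flow named in this abstract (base encoder, bottom-up encoder, top-down encoder, decoder) can be traced with toy one-dimensional operations. All four "encoders" below are simple averaging/downsampling stand-ins for learned sub-networks; only the shape of the pipeline follows the abstract.

```python
# Illustrative data-flow sketch of a multi-resolution generator, with
# 1-D lists in place of feature tensors. Names follow the abstract.

def base_encoder(semantic_label_map):
    # Toy "feature set": one value per label (stand-in for conv features).
    return [float(lbl) for lbl in semantic_label_map]

def bottom_up_encoder(features, levels=3):
    # Extract multi-resolution features by repeated 2x downsampling, and
    # derive a global latent code from the coarsest level.
    pyramid = [features]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        pyramid.append([(prev[i] + prev[i + 1]) / 2
                        for i in range(0, len(prev) - 1, 2)])
    latent = sum(pyramid[-1]) / len(pyramid[-1])
    return pyramid, latent

def upsample(xs):
    # Nearest-neighbor 2x upsampling.
    return [v for v in xs for _ in range(2)]

def top_down_encoder(pyramid):
    # Up-sample and aggregate coarse-to-fine into a spatial feature tensor.
    acc = pyramid[-1]
    for level in reversed(pyramid[:-1]):
        acc = [a + b for a, b in zip(upsample(acc), level)]
    return acc

def decoder(spatial, latent):
    # The latent code modulates the spatial tensor to produce the "image".
    return [latent * v for v in spatial]
```

The point of the two-encoder split is that the bottom-up pass summarizes the scene globally (the latent code) while the top-down pass preserves per-location detail (the spatial tensor), and the decoder consumes both.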
Scene understanding and generation using neural networks
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for image rendering. In one aspect, a method comprises receiving a plurality of observations characterizing a particular scene, each observation comprising an image of the particular scene and data identifying a location of a camera that captured the image. In another aspect, the method comprises receiving a plurality of observations characterizing a particular video, each observation comprising a video frame from the particular video and data identifying a time stamp of the video frame in the particular video. In yet another aspect, the method comprises receiving a plurality of observations characterizing a particular image, each observation comprising a crop of the particular image and data characterizing the crop of the particular image. The method processes each of the plurality of observations using an observation neural network to determine a numeric representation as output.
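The shared observation network described above can be sketched as a function applied to each (image, camera-location) pair, with the per-observation outputs aggregated into one scene representation. The aggregation by summation is one common order-invariant choice, assumed here for illustration; the "network" is a hand-made scaling, not a learned model.

```python
# Toy sketch: a shared observation network maps each observation to a
# numeric representation; the representations are summed into a single
# scene representation.

def observation_network(image_vec, camera_loc):
    # Stand-in for a learned network: weight image features by a simple
    # function of the camera location.
    scale = 1.0 / (1.0 + sum(abs(c) for c in camera_loc))
    return [scale * v for v in image_vec]

def scene_representation(observations):
    # observations: iterable of (image_vector, camera_location) pairs.
    reps = [observation_network(img, loc) for img, loc in observations]
    return [sum(vals) for vals in zip(*reps)]
```

Because the same network processes every observation and the aggregation is symmetric, the representation is indifferent to the order in which observations arrive.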
PARAMETERIZED NEIGHBORHOOD MEMORY ADAPTATION
Systems and techniques that facilitate parameterized neighborhood memory adaptation for semantic role labeling are provided. In various embodiments, a system can comprise a receiver component that can access a semantic role labeling model trained on a training dataset. In various aspects, the system can further comprise an execution component that can retrain a labeler of the semantic role labeling model based on a set of neighborhood parameters learned from the training dataset. In various instances, the execution component can execute, after retraining, the semantic role labeling model on an inputted sentence.
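One way to picture neighborhood-based adaptation of a labeler is to interpolate the base model's label probabilities with votes retrieved from the nearest training-set neighbors. The sketch below is a loose analogy only: the distance metric, the neighbor count k, and the mixing weight stand in for the "neighborhood parameters learned from the training dataset", and nothing here reflects the actual model internals.

```python
# Hedged sketch: mix a base labeler's probabilities with k-nearest-neighbor
# votes drawn from a memory of (representation, label) training pairs.

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(rep, base_probs, memory, mix=0.3, k=2):
    # memory: list of (representation, label) pairs from the training set.
    nearest = sorted(memory, key=lambda item: distance(rep, item[0]))[:k]
    votes = {}
    for _, lbl in nearest:
        votes[lbl] = votes.get(lbl, 0.0) + 1.0 / k
    labels = set(base_probs) | set(votes)
    mixed = {l: (1 - mix) * base_probs.get(l, 0.0) + mix * votes.get(l, 0.0)
             for l in labels}
    return max(mixed, key=mixed.get)
```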
Image processing device, image processing method, and image processing system
Provided are: an amodal segmentation unit that generates a set of first amodal masks indicating a probability that a particular pixel belongs to a relevant object for each of objects, with respect to an input image in which a plurality of the objects partially overlap; an overlap segmentation unit that generates an overlap mask corresponding only to an overlap region where the plurality of objects overlap in the input image based on an aggregate mask obtained by combining the set of first amodal masks generated for each of the objects and a feature map generated based on the input image; and an amodal mask correction unit that generates and outputs a second amodal mask, which includes an annotation label indicating a category of each of the objects corresponding to a relevant pixel, for each of pixels in the input image using the overlap mask and the aggregate mask.
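The three mask operations named above can be illustrated on plain per-pixel probability grids. The combination rules below (per-pixel max for the aggregate mask, a two-claims threshold for the overlap mask, argmax for label correction) are illustrative choices, not the patented units.

```python
# Toy sketch of aggregate, overlap, and corrected amodal masks over
# per-object probability grids (lists of rows of floats in [0, 1]).

def aggregate_mask(amodal_masks):
    # Combine per-object amodal masks, e.g. by per-pixel max.
    h, w = len(amodal_masks[0]), len(amodal_masks[0][0])
    return [[max(m[y][x] for m in amodal_masks) for x in range(w)]
            for y in range(h)]

def overlap_mask(amodal_masks, thresh=0.5):
    # Mark pixels claimed by two or more objects.
    h, w = len(amodal_masks[0]), len(amodal_masks[0][0])
    return [[1 if sum(m[y][x] > thresh for m in amodal_masks) >= 2 else 0
             for x in range(w)] for y in range(h)]

def corrected_labels(amodal_masks, labels, thresh=0.5):
    # "Second amodal mask": a per-pixel category label, resolving overlaps
    # by picking the object with the highest probability.
    h, w = len(amodal_masks[0]), len(amodal_masks[0][0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            probs = [m[y][x] for m in amodal_masks]
            if max(probs) > thresh:
                out[y][x] = labels[probs.index(max(probs))]
    return out
```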
Image identification device, method for performing semantic segmentation, and storage medium
An image identification device includes an image acquisition unit configured to acquire an image, a feature value extraction unit configured to extract a plurality of feature values of the acquired image, a feature map creation unit configured to create a feature map for each of the plurality of feature values, and a multiplication unit configured to multiply each of the feature maps by a weighting factor that is an arbitrary positive value indicating a degree of importance of a feature.
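The multiplication unit above amounts to an elementwise scaling of each feature map by its importance weight. In this minimal sketch the weights are fixed positive constants standing in for whatever importance scheme the device uses.

```python
# Sketch of the multiplication unit: scale each feature map (a grid of
# floats) by its positive importance weight.

def weight_feature_maps(feature_maps, weights):
    assert len(feature_maps) == len(weights)
    assert all(w > 0 for w in weights)  # weights are arbitrary positive values
    return [[[w * v for v in row] for row in fmap]
            for fmap, w in zip(feature_maps, weights)]
```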
METHOD AND APPARATUS FOR DETERMINING USER INTENT
The disclosed embodiments describe methods, systems, and apparatuses for determining user intent. A method is disclosed comprising obtaining a session text of a user; calculating, by a processor, a feature vector based on the session text; determining probabilities that the session text belongs to a plurality of intent labels, the probabilities calculated using a multi-level hierarchical intent classification model, the intent labels assigned to levels in the multi-level hierarchical intent classification model; and assigning a user intent to the session text based on the probabilities.
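A common way to score a multi-level hierarchy, assumed here for illustration, is to condition each level's probability on its parent and take the product along the path to each leaf intent. The tree and probabilities below are made up; the patent does not specify this scoring rule.

```python
# Sketch of hierarchical intent assignment: a leaf intent's score is the
# product of conditional probabilities along its path in the intent tree.

def leaf_probabilities(level_probs, tree):
    # level_probs: {node: P(node | parent)}; tree: {node: parent or None}.
    def path_prob(node):
        p = level_probs[node]
        parent = tree[node]
        return p if parent is None else p * path_prob(parent)
    leaves = [n for n in tree if n not in tree.values()]
    return {leaf: path_prob(leaf) for leaf in leaves}

def assign_intent(level_probs, tree):
    scores = leaf_probabilities(level_probs, tree)
    return max(scores, key=scores.get)
```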
SYSTEM AND METHOD FOR LEARNING SCENE EMBEDDINGS VIA VISUAL SEMANTICS AND APPLICATION THEREOF
The present teaching relates to method, system, and programming for responding to an image related query. Information related to each of a plurality of images is received, wherein the information represents concepts co-existing in the image. Visual semantics for each of the plurality of images are created based on the information related thereto. Representations of scenes of the plurality of images are obtained via machine learning, based on the visual semantics of the plurality of images, wherein the representations capture concepts associated with the scenes.
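A deliberately simplified picture of scene representations built from co-existing concepts: treat each image's visual semantics as the set of concepts detected in it, embed that set as a bag-of-concepts vector, and answer an image-related query by nearest-embedding lookup. The learned representations in the disclosure would be far richer; this is only the shape of the idea.

```python
# Toy scene embeddings from co-existing concepts, with query answering
# as a nearest-embedding lookup.

def embed(concepts, vocabulary):
    # Bag-of-concepts vector: a stand-in for the learned representation.
    return [1.0 if c in concepts else 0.0 for c in vocabulary]

def most_similar_scene(query_concepts, scenes, vocabulary):
    # scenes: {name: set of concepts}. Similarity: dot product of embeddings.
    q = embed(query_concepts, vocabulary)
    def score(name):
        s = embed(scenes[name], vocabulary)
        return sum(a * b for a, b in zip(q, s))
    return max(scenes, key=score)
```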
Pedestrian attribute identification and positioning method and convolutional neural network system
A method for pedestrian attribute identification and positioning is provided. The method includes: performing feature extraction on a to-be-detected image at a plurality of different abstraction degrees, to obtain a plurality of first feature maps of a pedestrian attribute; performing convolution on the plurality of first feature maps, to obtain a plurality of second feature maps; mapping each second feature map to a plurality of areas (bins) that overlap each other, and performing max pooling on each bin, to obtain a plurality of high-dimensional feature vectors, where the plurality of bins that overlap each other evenly cover each second feature map; processing the plurality of high-dimensional feature vectors into a low-dimensional vector, to obtain an identification result of the pedestrian attribute; and further obtaining a positioning result of the pedestrian attribute based on the plurality of second feature maps and the plurality of high-dimensional feature vectors.
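The overlapping-bin max pooling step in this method can be shown in one dimension: cover the feature map with bins whose stride is smaller than their width, and let each bin contribute its maximum to the pooled vector. Bin size and stride below are illustrative choices, not values from the patent.

```python
# 1-D sketch of max pooling over bins that overlap each other (stride
# smaller than bin size), so the bins evenly cover the feature map.

def overlapping_max_pool(feature_map, bin_size=4, stride=2):
    pooled = []
    for start in range(0, len(feature_map) - bin_size + 1, stride):
        pooled.append(max(feature_map[start:start + bin_size]))
    return pooled
```

Because adjacent bins share elements, a strong activation near a bin boundary is still captured by at least one bin, which is the usual motivation for overlap.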
Text Line Detection
Implementations of the present disclosure provide a solution for text line detection. In this solution, a first text region comprising a first portion of at least a first text element and a second text region comprising a second portion of at least a second text element are determined from an image. A first feature representation is extracted from the first text region and a second feature representation is extracted from the second text region. The first and second feature representations comprise at least one of an image feature representation or a semantic feature representation of the image. A link relationship between the first and second text regions can then be determined based at least in part on the first and second feature representations. The link relationship can indicate whether the first and second portions of the first and second text elements are located in a same text line. In this way, by detecting text regions and determining the link relationship thereof based on their feature representations, the accuracy and efficiency of detecting text lines in various images can be improved.
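The link-relationship step above can be reduced to its essence: decide whether two regions belong to the same text line by comparing their feature representations. Here cosine similarity with a fixed threshold stands in for the learned link predictor, and the feature vectors are hand-made; both are assumptions for the sketch.

```python
# Sketch of link-relationship prediction between two text regions:
# link them into the same line when their feature representations are
# sufficiently similar.

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def same_line(feat_a, feat_b, threshold=0.9):
    return cosine(feat_a, feat_b) >= threshold
```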