Patent classifications
G06T9/002
Data preprocessing and data augmentation in frequency domain
Methods and systems are provided for implementing preprocessing operations and augmentation operations upon image datasets transformed to frequency domain representations, including decoding images of an image dataset to generate a frequency domain representation of the image dataset; performing a resizing operation based on resizing factors on the image dataset in a frequency domain representation; performing a reshaping operation based on reshaping factors on the image dataset in a frequency domain representation; and performing a cropping operation on the image dataset in a frequency domain representation. The methods and systems may further include performing an augmentation operation on the image dataset in a frequency domain representation. Methods and systems of the present disclosure may free learning models from computational overhead caused by transforming image datasets into frequency domain representations. Furthermore, computational overhead caused by inverse transformation operations is also alleviated.
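As a minimal sketch of the idea (not the patented implementation), the snippet below performs a downsampling-style resize entirely in the frequency domain: the centered FFT spectrum is cropped to the target size and inverse-transformed, so no separate spatial-domain interpolation is needed. The function name and the mean-preserving scale factor are illustrative assumptions.

```python
import numpy as np

def freq_resize(img, out_h, out_w):
    """Resize a grayscale image by cropping its centered FFT spectrum.

    Downsampling keeps only the low-frequency block of the spectrum,
    avoiding an inverse-then-forward transform round trip.
    """
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    top = (h - out_h) // 2
    left = (w - out_w) // 2
    cropped = F[top:top + out_h, left:left + out_w]
    # Rescale so mean intensity is preserved across the size change.
    scale = (out_h * out_w) / (h * w)
    return np.real(np.fft.ifft2(np.fft.ifftshift(cropped))) * scale

img = np.outer(np.linspace(0, 1, 16), np.linspace(0, 1, 16))  # smooth toy image
small = freq_resize(img, 8, 8)
```

A frequency-domain crop of the high-frequency band (rather than the low-frequency center) would analogously serve as an augmentation that discards fine detail.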
Tunable models for changing faces in images
Techniques are disclosed for changing the identities of faces in images. In embodiments, a tunable model for changing facial identities in images includes an encoder, a decoder, and dense layers that generate either adaptive instance normalization (AdaIN) coefficients that control the operation of convolution layers in the decoder or the values of weights within such convolution layers, allowing the model to change the identity of a face in an image based on a user selection. A separate set of dense layers may be trained to generate AdaIN coefficients for each of a number of facial identities, and the AdaIN coefficients output by different sets of dense layers can be combined to interpolate between facial identities. Alternatively, a single set of dense layers may be trained to take as input an identity vector and output AdaIN coefficients or values of weights within convolution layers of the decoder.
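The AdaIN mechanism at the core of the abstract can be sketched as follows, assuming per-channel feature maps and treating the dense-layer outputs as given vectors; the interpolation weight `t` and all tensor shapes are illustrative, not taken from the disclosure.

```python
import numpy as np

def adain(features, gamma, beta, eps=1e-5):
    """Adaptive instance normalization: normalize each channel, then apply
    identity-specific scale (gamma) and shift (beta) coefficients."""
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    normed = (features - mean) / (std + eps)
    return gamma[:, None, None] * normed + beta[:, None, None]

rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 8, 8))          # (channels, H, W) decoder activations
# Hypothetical dense-layer outputs for two facial identities.
g_a, b_a = rng.normal(size=4), rng.normal(size=4)
g_b, b_b = rng.normal(size=4), rng.normal(size=4)
# Blending the coefficient sets interpolates between the two identities.
t = 0.5
out = adain(feat, t * g_a + (1 - t) * g_b, t * b_a + (1 - t) * b_b)
```

Because normalization zeroes each channel's mean, the blended `beta` directly sets the per-channel mean of the output, which is what makes coefficient interpolation a smooth identity mix.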
IMAGE PROCESSING METHOD, METHOD FOR TRAINING IMAGE PROCESSING MODEL, DEVICES AND STORAGE MEDIUM
An image processing method includes: obtaining a first latent code by encoding an image to be edited in the Style (S) space of a Generative Adversarial Network (GAN), in which the GAN is a StyleGAN; obtaining a text code by encoding text description information with a Contrastive Language-Image Pre-training (CLIP) model, and obtaining a second latent code by mapping the text code onto the S space; obtaining a target latent code that satisfies distance requirements by performing distance optimization on the first latent code and the second latent code; and generating a target image based on the target latent code.
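The distance-optimization step can be illustrated with a toy objective, assuming (as a simplification not stated in the abstract) a weighted squared-distance trade-off between staying near the image latent and moving toward the text latent; the latent vectors, `alpha`, and step sizes are all hypothetical.

```python
import numpy as np

def optimize_latent(z_img, z_txt, alpha=0.5, lr=0.1, steps=200):
    """Gradient descent on alpha*||z - z_img||^2 + (1 - alpha)*||z - z_txt||^2:
    alpha keeps the result close to the image latent (preserving content),
    while (1 - alpha) pulls it toward the text latent (applying the edit)."""
    z = z_img.copy()
    for _ in range(steps):
        grad = 2 * alpha * (z - z_img) + 2 * (1 - alpha) * (z - z_txt)
        z -= lr * grad
    return z

z_img = np.array([1.0, 0.0, 2.0])   # hypothetical S-space code of the image
z_txt = np.array([0.0, 1.0, 0.0])   # hypothetical mapped CLIP text code
z_star = optimize_latent(z_img, z_txt)
```

For this quadratic objective the optimum is simply the weighted average of the two latents, so the iterative result can be checked against the closed form.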
MOTION COMPENSATION FOR NEURAL NETWORK ENHANCED IMAGES
A device includes a memory and one or more processors. The memory is configured to store instructions. The one or more processors are configured to execute the instructions to apply a neural network to a first image to generate an enhanced image. The one or more processors are also configured to execute the instructions to adjust at least a portion of a high-frequency component of the enhanced image based on a motion compensation operation to generate an adjusted high-frequency image component. The one or more processors are further configured to execute the instructions to combine a low-frequency component of the enhanced image and the adjusted high-frequency image component to generate an adjusted enhanced image.
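The split-adjust-recombine flow can be sketched with a box blur as the low-pass filter and a whole-frame shift standing in for motion compensation; both choices are assumptions for illustration (a real codec would use per-block motion vectors and a tuned filter).

```python
import numpy as np

def split_frequencies(img, k=3):
    """Box-blur gives the low-frequency component; the residual is high-frequency."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    low = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            low += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    low /= k * k
    return low, img - low

def motion_adjust(high, dy, dx):
    """Stand-in motion compensation: shift the high-frequency component
    by an estimated motion vector."""
    return np.roll(high, (dy, dx), axis=(0, 1))

enhanced = np.arange(36, dtype=float).reshape(6, 6)   # stand-in NN-enhanced frame
low, high = split_frequencies(enhanced)
adjusted = low + motion_adjust(high, 1, 0)            # the adjusted enhanced image
```

Adjusting only the high-frequency band keeps the smooth content of the enhanced frame stable while aligning the detail that is most sensitive to motion.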
ARTIFICIAL INTELLIGENCE FOR SEMI-AUTOMATED DYNAMIC COMPRESSION OF IMAGES
In non-limiting examples of the present disclosure, systems, methods and devices for determining image compression optimums are provided. An image may be processed with a machine learning model that has been trained to identify object types in digital images. A first object and a first object type of the first object may be identified in the image. A first compressed version of the image may be generated, wherein the first compressed version has a first storage size. The first object and the first object type of the first object may be identified in the first compressed version of the image. A second compressed version of the image may be generated based on the identification of the first object and the first object type in the first compressed version of the image. The second compressed version may have a smaller storage size than the first storage size.
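The compress-then-verify loop described above can be sketched as below. The `detect` and `compress` callables and the numeric "image" are loudly hypothetical toy stand-ins for the trained object-recognition model and a real codec; only the control flow reflects the abstract.

```python
def compress_until_object_lost(image, detect, compress, qualities=(90, 70, 50, 30, 10)):
    """Try progressively stronger compression; keep the last version in which
    the detector still identifies the same object, stopping once it is lost."""
    target = detect(image)
    best = image
    for q in qualities:
        candidate = compress(image, q)
        if detect(candidate) != target:
            break  # object no longer recognized; stop compressing further
        best = candidate
    return best

# Toy stand-ins: the "image" is a number, "compression" quantizes it with a
# coarser step at lower quality, and the "detector" checks a threshold.
image = 87.4
compress = lambda img, q: round(img / (100 - q)) * (100 - q)
detect = lambda img: img > 80
best = compress_until_object_lost(image, detect, compress)
```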
IMAGE COMPRESSION AND DECODING, VIDEO COMPRESSION AND DECODING: TRAINING METHODS AND TRAINING SYSTEMS
A computer-implemented method of training an image generative network f_θ for a set of training images, in which an output image x̂ is generated from an input image x of the set of training images non-losslessly, and in which a proxy network is trained for a gradient-intractable perceptual metric that evaluates the quality of an output image x̂ given an input image x, the method of training using a plurality of scales for input images from the set of training images. In an embodiment, a blindspot network b_α is trained which generates an output image x̃ from an input image x. Related computer systems, computer program products and computer-implemented methods of training are disclosed.
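The proxy idea — fitting a differentiable surrogate to a metric whose gradient is unavailable — can be illustrated with a least-squares proxy. The "perceptual metric" here is a deliberately non-differentiable toy (rounded MSE), and the linear proxy over per-dimension errors is an assumption for illustration, far simpler than a trained proxy network.

```python
import numpy as np

rng = np.random.default_rng(1)

def perceptual_metric(x, x_hat):
    """Stand-in gradient-intractable metric (rounding breaks differentiability)."""
    return round(float(np.mean((x - x_hat) ** 2)), 2)

# Collect (feature, metric) samples and fit a linear proxy by least squares;
# the proxy IS differentiable, so its gradient could train the generator.
X, y = [], []
for _ in range(200):
    x = rng.normal(size=8)
    x_hat = x + rng.normal(scale=0.5, size=8)
    X.append((x - x_hat) ** 2)          # per-dimension squared error as features
    y.append(perceptual_metric(x, x_hat))
X, y = np.array(X), np.array(y)
A = np.c_[X, np.ones(len(X))]           # add a bias column
w, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ w                            # proxy's (differentiable) predictions
```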
DYNAMIC ASSIGNMENT OF DOWN SAMPLING INTERVALS FOR DATA STREAM PROCESSING
- Joydeep Ray ,
- Ben Ashbaugh ,
- Prasoonkumar Surti ,
- Pradeep Ramani ,
- Rama Harihara ,
- Jerin C. Justin ,
- Jing Huang ,
- Xiaoming Cui ,
- Timothy B. Costa ,
- Ting Gong ,
- Elmoustapha Ould-Ahmed-Vall ,
- Kumar Balasubramanian ,
- Anil Thomas ,
- Oguz H. Elibol ,
- Jayaram Bobba ,
- Guozhong Zhuang ,
- Bhavani Subramanian ,
- Gokce Keskin ,
- Chandrasekaran Sakthivel ,
- Rajesh Poornachandran
Embodiments are generally directed to compression in machine learning and deep learning processing. An embodiment of an apparatus for compression of untyped data includes a graphical processing unit (GPU) including a data compression pipeline, the data compression pipeline including a data port coupled with one or more shader cores, wherein the data port is to allow transfer of untyped data without format conversion, and a 3D compression/decompression unit to provide for compression of untyped data to be stored to a memory subsystem and decompression of untyped data from the memory subsystem.
Image manipulation by text instruction
A method for generating an output image from an input image and an input text instruction that specifies a location and a modification of an edit applied to the input image using a neural network is described. The neural network includes an image encoder, an image decoder, and an instruction attention network. The method includes receiving the input image and the input text instruction; extracting, from the input image, an input image feature that represents features of the input image using the image encoder; generating a spatial feature and a modification feature from the input text instruction using the instruction attention network; generating an edited image feature from the input image feature, the spatial feature and the modification feature; and generating the output image from the edited image feature using the image decoder.
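One way to picture combining the spatial and modification features with the image feature — a blend sketch of my own, not the disclosed network — is soft attention: the spatial feature says *where* to edit and the modification feature says *what* to write there. All shapes and values below are illustrative.

```python
import numpy as np

def edit_feature(image_feat, spatial, modification):
    """Blend sketch: spatial acts as a soft mask over the image feature,
    and modification supplies the edited content inside the masked region."""
    return image_feat * (1 - spatial) + modification * spatial

rng = np.random.default_rng(2)
image_feat = rng.normal(size=(4, 4))
spatial = np.zeros((4, 4))
spatial[:2, :2] = 1.0                    # "top-left" parsed from the instruction
modification = np.full((4, 4), 5.0)      # the requested change, as a feature value
edited = edit_feature(image_feat, spatial, modification)
```

Outside the attended region the image feature passes through unchanged, which is why the decoder can reconstruct the unedited parts of the input faithfully.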
Apparatus for estimating sameness of point cloud data and system for estimating sameness of point cloud data
A point cloud data sameness estimation apparatus and a point cloud data sameness estimation system are provided that improve the accuracy of evaluating whether two 3-dimensional point cloud datasets originate from the same object. In the present disclosure, a point cloud data sameness estimation apparatus for estimating sameness of objects that are sources of two 3-dimensional point cloud datasets includes a point cloud data acquisition unit configured to acquire first point cloud data and second point cloud data including 3-dimensional point cloud data; a first neural network configured to output a first point cloud data feature, with information about the first point cloud data as an input into the first neural network; a second neural network configured to output a second point cloud data feature, with information about the second point cloud data as an input into the second neural network; and a sameness evaluation unit configured to output an evaluation about sameness of the first point cloud data and the second point cloud data, based on the first point cloud data feature and the second point cloud data feature, wherein a weight is mutually shared by the first neural network and the second neural network.
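The weight-sharing arrangement is essentially a Siamese network. A minimal sketch, assuming a PointNet-style per-point linear map with max pooling as each branch and cosine similarity as the evaluation (both assumptions — the abstract does not specify the architectures):

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(16, 3))   # ONE weight matrix, mutually shared by both branches

def branch(points, W):
    """Shared-weight feature extractor: per-point linear map + max pool,
    so the feature is invariant to the ordering of the points."""
    return np.max(points @ W.T, axis=0)

def sameness(f1, f2):
    """Cosine similarity of the two branch features as the evaluation score."""
    return float(f1 @ f2 / (np.linalg.norm(f1) * np.linalg.norm(f2)))

cloud = rng.normal(size=(100, 3))
# Same object, points listed in a different order: the score is exactly 1.
same_score = sameness(branch(cloud, W), branch(np.random.permutation(cloud), W))
# A differently scaled, independently sampled cloud for comparison.
other_score = sameness(branch(cloud, W), branch(rng.normal(size=(100, 3)) * 5, W))
```

Sharing `W` between the branches is what makes the two features directly comparable; with independent weights the similarity score would be meaningless.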
ADVANCED VIDEO CODING USING A KEY-FRAME LIBRARY
A method including: generating at least one of a key frame and an inter-predicted frame based on a received video stream including at least one video frame, the inter-predicted frame being generated using information from the key frame; determining whether the key frame is stored at a receiving device; selecting video data as one of the key frame or a key frame identifier representing the key frame based on whether the key frame is stored at the receiving device; and communicating at least one of the video data and the inter-predicted frame.
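The selection step can be sketched as a library lookup, assuming (as illustrative choices not taken from the abstract) that the key frame identifier is a content hash and the receiver library is a set of such hashes:

```python
import hashlib

def select_video_data(key_frame: bytes, receiver_library: set):
    """Send only an identifier when the receiver already stores the key frame;
    otherwise send the key frame itself and record it in the library."""
    key_id = hashlib.sha256(key_frame).hexdigest()
    if key_id in receiver_library:
        return ('id', key_id)          # tiny payload: identifier only
    receiver_library.add(key_id)       # receiver caches it for future streams
    return ('frame', key_frame)

library = set()
frame = b'\x00' * 1024                 # stand-in encoded key frame
kind_first, _ = select_video_data(frame, library)    # first time: full frame
kind_second, _ = select_video_data(frame, library)   # repeat: identifier only
```

After the first transmission, every later stream that reuses the same key frame pays only the cost of the identifier, which is the bandwidth saving the method targets.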