Patent classifications
G06T3/00
DIFFERENT ATLAS PACKINGS FOR VOLUMETRIC VIDEO
Methods, devices and a stream are disclosed to encode and decode a scene (such as a point cloud) in the context of a patch-based transmission of volumetric video content. Attributes of points of the scene are projected onto patches. Every point has a geometry attribute. For other attributes, such as a transparency or displacement attribute, some points may have no value. According to the present principles, each attribute is encoded in a different atlas with its own layout. This saves pixel rate in the memory of the renderer.
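The per-attribute packing idea can be illustrated with a minimal sketch. Everything named here (the `Patch` record, the `shelf_pack` row packer, the fixed atlas width) is a hypothetical stand-in chosen for illustration and is not taken from the disclosure; the only point shown is that an attribute carried by fewer patches ends up in a smaller atlas when each attribute has its own layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    patch_id: int
    width: int
    height: int
    attributes: frozenset  # names of the attributes that have values in this patch


def shelf_pack(sizes, atlas_width):
    """Place (patch_id, w, h) rectangles left to right in rows.

    Returns ({patch_id: (x, y)}, atlas_height). A deliberately simple packer,
    used only to make the layouts concrete.
    """
    layout = {}
    x = y = row_height = 0
    for patch_id, w, h in sizes:
        if w > atlas_width:
            raise ValueError("patch wider than the atlas")
        if x + w > atlas_width:  # start a new row
            x, y, row_height = 0, y + row_height, 0
        layout[patch_id] = (x, y)
        x += w
        row_height = max(row_height, h)
    return layout, y + row_height


def pack_per_attribute(patches, atlas_width):
    """Build one independent layout per attribute name."""
    names = sorted({name for p in patches for name in p.attributes})
    return {
        name: shelf_pack(
            [(p.patch_id, p.width, p.height) for p in patches if name in p.attributes],
            atlas_width,
        )
        for name in names
    }


patches = [
    Patch(0, 4, 4, frozenset({"geometry", "transparency"})),
    Patch(1, 4, 4, frozenset({"geometry"})),
    Patch(2, 4, 4, frozenset({"geometry"})),
]
atlases = pack_per_attribute(patches, atlas_width=8)
```

In this toy input only one patch carries transparency values, so the transparency atlas needs a single row while the geometry atlas needs two.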
VIDEO ANALYTICS SYSTEM
A computer-implemented method for sampling and analyzing data from at least one image frame from at least one series of image frames captured by at least one sensor, comprises: defining at least one sampling model, wherein the sampling model is defined in a virtual 3D-vector space and is based on one or more predetermined shapes in the virtual 3D-vector space, applying the at least one sampling model to at least one part of the at least one image frame of the at least one series of image frames, wherein applying of the at least one sampling model defines at least one area of the at least one image frame from which data is to be extracted, extracting data from the at least one area of the at least one image frame defined by the sampling model, and analyzing the extracted data.
MULTI-SCALE TRANSFORMER FOR IMAGE ANALYSIS
The technology employs a patch-based multi-scale Transformer (300) that is usable with various imaging applications. This avoids the constraint of a fixed image input size and predicts quality effectively on a native resolution image. A native resolution image (304) is transformed into a multi-scale representation (302), enabling the Transformer's self-attention mechanism to capture information on both fine-grained detailed patches and coarse-grained global patches. Spatial embedding (316) is employed to map patch positions to a fixed grid, in which patch locations at each scale are hashed to the same grid. A separate scale embedding (318) is employed to distinguish patches coming from different scales in the multi-scale representation. Self-attention (508) is performed to create a final image representation. In some instances, prior to performing self-attention, the system may prepend a learnable classification token (322) to the set of input tokens.
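One way to picture the hashing of patch locations to a fixed grid is the sketch below. The proportional-scaling rule and the name `hash_to_grid` are assumptions made for illustration; the abstract states only that patch locations at every scale are hashed to the same grid.

```python
def hash_to_grid(row, col, n_rows, n_cols, grid_size):
    """Map patch (row, col) in an n_rows x n_cols patch grid to a cell of a
    fixed grid_size x grid_size grid by proportional scaling (assumed rule)."""
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise ValueError("patch index outside the patch grid")
    return row * grid_size // n_rows, col * grid_size // n_cols


# Two scales of the same image: a fine 20 x 30 patch grid and a coarse 5 x 8 one.
# Both are indexed into the same 10 x 10 grid of spatial embeddings.
fine_corner = hash_to_grid(19, 29, 20, 30, grid_size=10)
coarse_corner = hash_to_grid(4, 7, 5, 8, grid_size=10)
```

Because patches from different scales can land in the same grid cell, a separate scale embedding is what lets the model tell them apart.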
VIDEO PROCESSING AND PLAYBACK SYSTEMS AND METHODS
A video processing method for a circular panoramic video recording including an original field of view region at a first resolution and a further peripheral region outside the original field of view at a second, lower resolution, the method including the steps of performing spatial upscaling of the further peripheral region to a resolution higher than the second resolution.
ENDOSCOPE WITH SYNTHETIC APERTURE MULTISPECTRAL CAMERA ARRAY
A method which may effectively provide an endoscope or other surgical instrument with a synthetic multi-camera array may comprise capturing, using one or more cameras located at a distal tip of the surgical instrument, a set of images comprising first and second images. Each image in the set may be captured by a corresponding camera from the one or more cameras when the distal tip of the instrument is located at a corresponding point in space. Such a method may also comprise generating a three-dimensional image based on compositing representations of a structure in the first and second images after applying a non-rigid transformation to one or more of those representations.
METHODS AND SYSTEMS FOR HIGH DEFINITION IMAGE MANIPULATION WITH NEURAL NETWORKS
Methods and systems for high-resolution image manipulation are disclosed. An original high-resolution image to be manipulated is obtained, as well as a driving signal indicating a manipulation result. The original high-resolution image is down-sampled to obtain a low-resolution image to be manipulated. Using a trained manipulation generator, a low-resolution manipulated image and a motion field are generated from the low-resolution image. The motion field represents pixel displacements of the low-resolution image to obtain the manipulation indicated by the driving signal. A high-frequency residual image is computed from the original high-resolution image. A high-frequency manipulated residual image is generated using the motion field. A high-resolution manipulated image is outputted by combining the high-frequency manipulated residual image and a low-frequency manipulated image generated from the low-resolution manipulated image by up-sampling.
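The data flow described above can be sketched on small grayscale images stored as lists of rows. Several choices here are assumptions made only to keep the sketch runnable: block averaging for down-sampling, nearest-neighbour up-sampling, a backward nearest-neighbour warp, scaling the motion field by the sampling factor, and the `generator` callable standing in for the trained manipulation generator.

```python
def downsample(img, f):
    """Average each f x f block (image sides assumed divisible by f)."""
    return [[sum(img[y * f + dy][x * f + dx] for dy in range(f) for dx in range(f)) / (f * f)
             for x in range(len(img[0]) // f)]
            for y in range(len(img) // f)]


def upsample(img, f):
    """Nearest-neighbour up-sampling; works for any per-pixel value."""
    return [[img[y // f][x // f] for x in range(len(img[0]) * f)]
            for y in range(len(img) * f)]


def warp(img, flow):
    """Backward warp: out[y][x] = img[y + dy][x + dx], rounded and clamped."""
    h, w = len(img), len(img[0])
    clamp = lambda v, hi: max(0, min(hi, v))
    return [[img[clamp(y + round(flow[y][x][0]), h - 1)][clamp(x + round(flow[y][x][1]), w - 1)]
             for x in range(w)]
            for y in range(h)]


def manipulate_high_res(original, generator, f):
    low = downsample(original, f)
    # Hypothetical generator: returns the manipulated low-res image and a
    # motion field with one (dy, dx) displacement per low-res pixel.
    low_manip, low_flow = generator(low)
    low_up = upsample(low, f)
    # High-frequency residual: what the low-resolution image cannot represent.
    residual = [[o - l for o, l in zip(ro, rl)] for ro, rl in zip(original, low_up)]
    # Lift the motion field to high resolution (displacements scale with f).
    high_flow = [[(dy * f, dx * f) for dy, dx in row] for row in upsample(low_flow, f)]
    manip_residual = warp(residual, high_flow)
    manip_up = upsample(low_manip, f)
    return [[m + r for m, r in zip(rm, rr)] for rm, rr in zip(manip_up, manip_residual)]


def identity_generator(low):
    """Stand-in generator that changes nothing and reports zero motion."""
    return low, [[(0.0, 0.0)] * len(low[0]) for _ in low]


original = [[float(8 * y + x) for x in range(8)] for y in range(8)]
result = manipulate_high_res(original, identity_generator, f=2)
```

With the identity stand-in, the low-frequency image and the residual recombine into the original image, which is a quick sanity check that the decomposition loses nothing.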
DEEP SALIENCY PRIOR
Techniques for tuning an image editing operator for reducing a distractor in raw image data are presented herein. The image editing operator can access the raw image data and a mask. The mask can indicate a region of interest associated with the raw image data. The image editing operator can process the raw image data and the mask to generate processed image data. Additionally, a trained saliency model can process at least the processed image data within the region of interest to generate a saliency map that provides saliency values. Moreover, a saliency loss function can compare the saliency values provided by the saliency map for the processed image data within the region of interest to one or more target saliency values. Subsequently, one or more parameter values of the image editing operator can be modified based at least in part on the saliency loss function.
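A saliency loss of the kind described could look like the sketch below. The mean squared error form, the single scalar target and the function name `saliency_loss` are assumptions; the abstract says only that saliency values inside the region of interest are compared with one or more target values.

```python
def saliency_loss(saliency_map, mask, target=0.0):
    """Mean squared difference between saliency values inside the mask and a
    target value (assumed loss form). Both inputs are lists of rows of equal
    shape; a truthy mask entry marks a pixel in the region of interest."""
    squared_errors = [
        (s - target) ** 2
        for s_row, m_row in zip(saliency_map, mask)
        for s, m in zip(s_row, m_row)
        if m
    ]
    if not squared_errors:
        raise ValueError("mask selects no pixels")
    return sum(squared_errors) / len(squared_errors)


saliency = [[0.5, 0.2],
            [0.9, 0.1]]
mask = [[1, 0],
        [1, 0]]
loss = saliency_loss(saliency, mask, target=0.0)
```

A target of zero corresponds to driving the distractor's saliency down; in a real system the loss would be differentiated with respect to the operator's parameters rather than evaluated on fixed numbers as it is here.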
MESSAGE DISTRIBUTION SERVICE
A method of distributing, over a messaging system, location-based message contents that are displayable on consumer devices present at associated locations. The method comprises, for each message of a set of messages, obtaining a message content and a message location search term, submitting the message location search term to a web mapping service so that a service application programming interface (API) searches with the message location search term, and receiving a result list including a plurality of message locations corresponding to the message. The method further comprises adding the message content and the plurality of message locations to a message distribution database or set of linked databases that is or are searchable by location. This facilitates the sending of relevant message location(s) to the consumer devices.
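The ingestion steps can be sketched with an in-memory store. The `MessageDatabase` class, its grid-cell index and the `search_locations` callable (a placeholder for the web mapping service API, which the abstract does not name) are all hypothetical; a production system would also look in neighbouring cells rather than only the device's own cell.

```python
import math
from collections import defaultdict


class MessageDatabase:
    """Toy location-searchable store: messages are bucketed by grid cell."""

    def __init__(self, cells_per_degree=100):
        self.cells_per_degree = cells_per_degree
        self._by_cell = defaultdict(list)

    def _cell(self, lat, lon):
        return (math.floor(lat * self.cells_per_degree),
                math.floor(lon * self.cells_per_degree))

    def add(self, content, locations):
        """Store one message content under every location returned for it."""
        for lat, lon in locations:
            self._by_cell[self._cell(lat, lon)].append((content, (lat, lon)))

    def near(self, lat, lon):
        """Return (content, location) pairs stored in the device's grid cell."""
        return list(self._by_cell.get(self._cell(lat, lon), []))


def ingest(messages, search_locations, db):
    """For each (content, search_term), query the mapping service stand-in and
    add the content with all returned locations to the database."""
    for content, search_term in messages:
        db.add(content, search_locations(search_term))


# Made-up result list standing in for a mapping service response.
fake_results = {"Cafe Example": [(51.5007, -0.1246), (48.8584, 2.2945)]}
db = MessageDatabase()
ingest([("10% off coffee", "Cafe Example")], lambda term: fake_results.get(term, []), db)
nearby = db.near(51.5009, -0.1249)
```

One search term yields several locations, so a single message content is stored once per matching location and can be found from any of them.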
ARRANGEMENT FOR PRODUCING HEAD RELATED TRANSFER FUNCTION FILTERS
When three-dimensional audio is produced by using headphones, particular HRTF-filters are used to modify sound for the left and right channels of the headphone. As the morphology of every ear is different, it is beneficial to have HRTF-filters particularly designed for the user of headphones. Such filters may be produced by deriving ear geometry from a plurality of images taken with an ordinary camera, detecting necessary features from images and fitting said features to a model that has been produced from accurately scanned ears comprising representative values for different sizes and shapes. Taken images are sent to a server (52) that performs the necessary computations and submits the data further or produces the requested filter.
MOBILE INFORMATION TERMINAL
When a first user makes a video call with a second user of the other side by using the video call function, a first state is set as a state in which the enclosure is flatly placed on a first surface of an object, and in which a face of the first user is included within a range of an angle of view AV1 of the front camera C1. In the first state, the mobile information terminal 1 detects a first region including the face of the first user from a wide angle image that is captured by the front camera C1, trims a first image corresponding to the first region, creates a transmission image to be transmitted to a terminal of the other side on the basis of the first image, and transmits the transmission image to the terminal of the other side.
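The trimming step can be pictured as a clamped crop of the wide-angle image. The face detector is outside this sketch, so the bounding box is taken as given; the `trim_region` name, the box convention and the optional margin are assumptions for illustration.

```python
def trim_region(image, box, margin=0):
    """Crop `image` (a list of pixel rows) to `box`, optionally grown by `margin`.

    `box` is (top, left, bottom, right) with bottom and right exclusive; the
    grown box is clamped to the image bounds.
    """
    height, width = len(image), len(image[0])
    top, left, bottom, right = box
    top, left = max(0, top - margin), max(0, left - margin)
    bottom, right = min(height, bottom + margin), min(width, right + margin)
    return [row[left:right] for row in image[top:bottom]]


# A 4 x 6 test image whose pixel value encodes its position (10 * row + col).
wide_image = [[10 * y + x for x in range(6)] for y in range(4)]
face_box = (1, 2, 3, 4)  # hypothetical detector output
first_image = trim_region(wide_image, face_box)
```

The transmission image would then be derived from `first_image`, for example after correcting wide-angle distortion, which is not shown here.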