Patent classifications
G06T9/001
UAV video aesthetic quality evaluation method based on multi-modal deep learning
The present disclosure provides a UAV video aesthetic quality evaluation method based on multi-modal deep learning, which establishes a UAV video aesthetic evaluation data set, analyzes the UAV video through a multi-modal neural network, extracts high-dimensional features, and concatenates the extracted features, thereby achieving aesthetic quality evaluation of the UAV video. The method has four steps. Step one: establish a UAV video aesthetic evaluation data set, which is divided into positive samples and negative samples according to the video shooting quality. Step two: use SLAM technology to restore the UAV's flight trajectory and to reconstruct a sparse 3D structure of the scene. Step three: through a multi-modal neural network, extract features of the input UAV video on the image branch, motion branch, and structure branch respectively. Step four: concatenate the features from the multiple branches to obtain the final video aesthetic label and video scene type.
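The fusion in step four can be illustrated with a minimal sketch. It assumes each branch has already produced a feature vector; the linear heads, their weights, and the scene type names are hypothetical placeholders, not values from the disclosure.

```python
# Sketch of step four: concatenate per-branch features, then apply two heads.
# All weights and SCENE_TYPES below are hypothetical placeholders.
SCENE_TYPES = ["mountain", "city", "water"]  # assumed labels, for illustration only


def concatenate(*branches):
    """Join the per-branch feature vectors into one fused vector."""
    fused = []
    for features in branches:
        fused.extend(features)
    return fused


def linear(weights, bias, x):
    """Plain fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def evaluate(image_feat, motion_feat, structure_feat):
    fused = concatenate(image_feat, motion_feat, structure_feat)
    # Aesthetic head: a single score, thresholded at zero (assumed rule).
    score = linear([[1.0, -1.0, 2.0, 0.5]], [-1.0], fused)[0]
    label = "positive" if score > 0 else "negative"
    # Scene head: one score per scene type, pick the largest.
    scene_scores = linear(
        [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0]],
        [0.0, 0.0, 0.0],
        fused,
    )
    scene = SCENE_TYPES[scene_scores.index(max(scene_scores))]
    return fused, label, scene


fused, label, scene = evaluate([1.0, 0.0], [0.5], [2.0])
```

In a real network the branch features would come from trained image, motion, and structure sub-networks; only the concatenate-then-classify shape is taken from the abstract.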
Cost-driven framework for progressive compression of textured meshes
Techniques of compressing level of detail (LOD) data involve sharing a texture image LOD among different mesh LODs for single-rate encoding. That is, a first texture image LOD corresponding to a first mesh LOD may be derived by refining a second texture image LOD corresponding to a second mesh LOD. This sharing is possible when texture atlases of LOD meshes are compatible.
Techniques and apparatus for lossless lifting for attribute coding
A method of point cloud attribute coding includes obtaining an attribute signal corresponding to a point cloud; determining whether lossless lifting is enabled; based on determining that lossless lifting is enabled, modifying at least one from among a plurality of quantization weights and a plurality of lifting coefficients; decomposing the attribute signal into a plurality of detail signals and a plurality of approximation signals based on the modified at least one from among the plurality of quantization weights and the plurality of lifting coefficients; generating a bitstream representing the point cloud based on the plurality of detail signals and the plurality of approximation signals; and transmitting the bitstream.
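To show why a lifting decomposition can be made lossless, the following sketch uses a generic integer predict/update step on a 1D attribute signal. It is a textbook-style integer lifting step chosen for illustration; the specific quantization weights and lifting coefficients of the claimed method are not reproduced here.

```python
# Generic integer lifting step (an illustrative assumption, not the claimed
# coefficients): because every operation stays in integers, the inverse
# recovers the input exactly.
def lift_forward(signal):
    """Split into approximation and detail signals for one lifting level."""
    if len(signal) % 2:
        raise ValueError("sketch assumes an even number of samples")
    even, odd = signal[0::2], signal[1::2]
    detail = [o - e for e, o in zip(even, odd)]            # predict step
    approx = [e + (d >> 1) for e, d in zip(even, detail)]  # update step (floor)
    return approx, detail


def lift_inverse(approx, detail):
    """Undo lift_forward exactly."""
    even = [a - (d >> 1) for a, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    out = []
    for e, o in zip(even, odd):
        out.extend((e, o))
    return out


attrs = [52, 55, 61, 59, 79, 61, 76, 61]
approx, detail = lift_forward(attrs)
restored = lift_inverse(approx, detail)
```

The round trip is exact because the update step reuses the same floored value in both directions; a lossy variant would quantize the detail signal before writing it to the bitstream.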
A METHOD AND APPARATUS FOR ENCODING AND DECODING OF MULTIPLE-VIEWPOINT 3DOF+ CONTENT
A method for encoding a volumetric video content representative of a 3D scene is disclosed. The method comprises obtaining a reference viewing bounding box and an intermediate viewing bounding box defined within the 3D scene. For the reference viewing bounding box, the volumetric video reference sub-content is encoded as a central image and peripheral patches for parallax. For the intermediate viewing bounding box, the volumetric video intermediate sub-content is encoded as intermediate central patches which are differences between the intermediate central image and the reference central image.
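The difference coding of the intermediate central image can be sketched as a per-pixel residual. This assumes both central images are same-sized grids of integer samples; patch packing and the peripheral parallax patches are left out.

```python
# Per-pixel residual between two central images (assumed same size).
def encode_intermediate(intermediate_central, reference_central):
    """Return the difference image: intermediate minus reference."""
    return [
        [i - r for i, r in zip(i_row, r_row)]
        for i_row, r_row in zip(intermediate_central, reference_central)
    ]


def decode_intermediate(difference, reference_central):
    """Rebuild the intermediate central image from the reference and the difference."""
    return [
        [d + r for d, r in zip(d_row, r_row)]
        for d_row, r_row in zip(difference, reference_central)
    ]


reference = [[100, 102], [98, 97]]
intermediate = [[101, 102], [98, 95]]
difference = encode_intermediate(intermediate, reference)
rebuilt = decode_intermediate(difference, reference)
```

When the two viewing bounding boxes are close, most differences are near zero, which is what makes the residual cheaper to encode than the full image.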
INFORMATION PROCESSING DEVICE AND METHOD
There is provided an information processing device and method capable of suppressing a reduction in encoding efficiency. The attribute information of each point of a point cloud, which represents an object having a three-dimensional shape as a set of points, is hierarchized by classifying points into prediction points, for which a difference value between the attribute information and a predicted value of the attribute information is derived, and reference points, which are used for deriving the predicted value, and by recursively repeating this classification on the reference points. In this hierarchization, the reference point is set on the basis of a centroid of points. The present disclosure can be applied to, for example, an information processing device, an image processing device, an encoding device, a decoding device, an electronic device, an information processing method, a program, and the like.
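One level of the classification can be sketched as follows. The abstract states only that the reference point is set on the basis of a centroid; picking the point nearest to the centroid is an assumption made here for illustration.

```python
# One hierarchization level for a group of points.
# Assumption: the reference point is the point closest to the group's centroid.
def centroid(points):
    n = len(points)
    return tuple(sum(p[axis] for p in points) / n for axis in range(3))


def split_reference(points):
    """Return (index of the reference point, indices of the prediction points)."""
    c = centroid(points)

    def dist2(p):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    ref_index = min(range(len(points)), key=lambda i: dist2(points[i]))
    prediction = [i for i in range(len(points)) if i != ref_index]
    return ref_index, prediction


group = [(0, 0, 0), (2, 0, 0), (1, 1, 0), (10, 10, 10)]
ref_index, prediction = split_reference(group)
```

Repeating the same split on the reference points collected from all groups would produce the next layer of the hierarchy.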
Video-based point cloud compression model to world signaling information
Apparatuses, methods, and computer programs are disclosed to implement video-based point cloud compression model to world signaling. An example apparatus includes at least one processor; and at least one non-transitory memory including computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform: provide first signaling information comprising information related to a world domain, wherein the world domain is a point cloud frame that is represented by a number of points in a first volumetric coordinate system; and provide second signaling information comprising information related to a conversion of a model domain to the world domain, wherein the model domain represents the point cloud frame by a number of points in a second volumetric coordinate system.
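A model-to-world conversion of this kind could look like the sketch below. The abstract does not say which parameters are signaled, so the uniform scale and per-axis offset, and the `ModelToWorld` name itself, are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical container for the second signaling information: a uniform
# scale and a per-axis offset (both assumed, not taken from the abstract).
@dataclass(frozen=True)
class ModelToWorld:
    scale: float
    offset: tuple

    def apply(self, point):
        """Map a point from the model coordinate system to the world one."""
        return tuple(self.scale * c + o for c, o in zip(point, self.offset))


conversion = ModelToWorld(scale=0.5, offset=(10.0, 0.0, -2.0))
world_point = conversion.apply((2, 4, 6))
```

A decoder receiving such parameters would apply the same mapping to every reconstructed point of the frame.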
FEATURE-BASED MULTI-VIEW REPRESENTATION AND CODING
Aspects of the disclosure provide a method, an apparatus, and a non-transitory computer-readable storage medium for video decoding. The apparatus includes processing circuitry configured to decode at least one first key picture of pictures from a multi-view bitstream. The pictures correspond to different views. The at least one first key picture corresponds to at least one first view of the different views. The processing circuitry determines first feature information of content in the at least one first key picture. The processing circuitry decodes, based on the multi-view bitstream, a first feature change to the first feature information. The first feature change indicates a content change between a key picture in the at least one first key picture and a first picture. The processing circuitry reconstructs the first picture based on the decoded first feature change, the first feature information, and the key picture in the at least one first key picture.
THREE-DIMENSIONAL DATA ENCODING METHOD, THREE-DIMENSIONAL DATA DECODING METHOD, THREE-DIMENSIONAL DATA ENCODING DEVICE, AND THREE-DIMENSIONAL DATA DECODING DEVICE
A three-dimensional data encoding method of encoding three-dimensional points includes: calculating a predicted value of attribute information of a first three-dimensional point in a prediction mode, using one or more items of attribute information of one or more second three-dimensional points in the vicinity of the first three-dimensional point; calculating a prediction residual that is a difference between the attribute information of the first three-dimensional point and the predicted value; and generating a bitstream including the prediction residual and prediction mode information indicating the prediction mode. The prediction mode is: one prediction mode among two or more prediction modes when a type of the attribute information of the first three-dimensional point is first attribute information including more elements than a predetermined threshold value; and one fixed prediction mode when the type is second attribute information including a number of elements equal to or less than the predetermined threshold value.
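The mode rule can be sketched as follows. The threshold value, the set of modes (neighbor average or copy of one neighbor), and the choice of the mode with the smallest absolute residual are all assumptions made for illustration; the abstract fixes only that multi-element attributes may choose a mode while small attributes use one fixed mode.

```python
THRESHOLD = 1  # hypothetical: attributes with more than one element may pick a mode


def predict(neighbors, mode):
    """Mode 0: element-wise integer average; mode k >= 1: copy neighbor k-1."""
    if mode == 0:
        n = len(neighbors)
        return [sum(vals) // n for vals in zip(*neighbors)]
    return list(neighbors[mode - 1])


def encode_attribute(attr, neighbors):
    """Return (prediction mode, prediction residual) for one point."""
    if len(attr) > THRESHOLD:
        candidates = range(len(neighbors) + 1)  # choose among several modes
    else:
        candidates = [0]                        # fixed prediction mode

    def cost(mode):
        return sum(abs(a - p) for a, p in zip(attr, predict(neighbors, mode)))

    mode = min(candidates, key=cost)
    residual = [a - p for a, p in zip(attr, predict(neighbors, mode))]
    return mode, residual


def decode_attribute(mode, residual, neighbors):
    return [p + r for p, r in zip(predict(neighbors, mode), residual)]


color_mode, color_res = encode_attribute([100, 50, 20], [[98, 52, 20], [10, 10, 10]])
refl_mode, refl_res = encode_attribute([40], [[38], [44]])
```

Here a three-element color attribute selects the closest neighbor, while a single-element reflectance attribute always uses the average.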
DECODING METHODS, ENCODING METHOD, DECODING DEVICE, AND ENCODING DEVICE
A decoding method includes: calculating a first position on a polar coordinate system, using a first residual generated by predicting a position of a first three-dimensional point on the polar coordinate system; calculating a second position by transforming the polar coordinate system of the first position into a Cartesian coordinate system; decoding the position of the first three-dimensional point on the Cartesian coordinate system by adjusting the second position using a second residual of the Cartesian coordinate system; and decoding attribute information of the first three-dimensional point, using the first position before the transforming.
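The two-stage position decoding can be sketched as below. The sketch assumes the polar position is (distance, azimuth, elevation) in radians and that the converted Cartesian position is rounded to integers before the second residual is applied; neither detail is stated in the abstract.

```python
import math


def polar_to_cartesian(r, azimuth, elevation):
    """Assumed convention: azimuth in the x-y plane, elevation from that plane."""
    x = r * math.cos(elevation) * math.cos(azimuth)
    y = r * math.cos(elevation) * math.sin(azimuth)
    z = r * math.sin(elevation)
    return (x, y, z)


def decode_position(predicted_polar, first_residual, second_residual):
    # First position: polar prediction corrected by the first residual.
    first_position = tuple(p + r for p, r in zip(predicted_polar, first_residual))
    # Second position: the first position transformed to Cartesian coordinates.
    second_position = tuple(round(c) for c in polar_to_cartesian(*first_position))
    # Final position: Cartesian correction with the second residual.
    final = tuple(c + r for c, r in zip(second_position, second_residual))
    # The polar first position is returned too, since attribute decoding
    # uses the position before the transform.
    return first_position, final


first_position, final_position = decode_position(
    (9.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0, 1, -1)
)
```

The second residual absorbs the rounding error introduced by the coordinate transform, which is why the Cartesian position can still be decoded exactly.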
Point cloud compression using video encoding with time consistent patches
A system comprises an encoder configured to compress attribute and/or spatial information for a point cloud and/or a decoder configured to decompress compressed attribute and/or spatial information for the point cloud. To compress the attribute and/or spatial information, the encoder is configured to convert a point cloud into an image based representation. Also, the decoder is configured to generate a decompressed point cloud based on an image based representation of a point cloud. In some embodiments, an encoder generates time-consistent patches for multiple versions of the point cloud at multiple moments in time and uses the time-consistent patches to generate image based representations of the point cloud at the multiple moments in time.