Depth-color alignment with assistance metadata for transcoding of volumetric video
12477091 · 2025-11-18
Assignee
Inventors
- Bertrand Chupeau (Rennes, FR)
- Franck Thudor (Rennes, FR)
- Remy Gendrot (Montgermont, FR)
- Gerard Briand (Ploufragan, FR)
CPC classification
H04N13/111
ELECTRICITY
H04N13/161
ELECTRICITY
H04N19/70
ELECTRICITY
H04N13/243
ELECTRICITY
H04N19/597
ELECTRICITY
H04N13/25
ELECTRICITY
International classification
H04N13/161
ELECTRICITY
H04N13/111
ELECTRICITY
Abstract
Methods, devices and a data stream are provided to encode unaligned multi-view plus depth images associated with assistance metadata for generating depth maps aligned with the unaligned color views. At the encoding stage, for an unaligned color view, the contribution of each depth map is evaluated. A subset of the most contributing depth maps is selected, and metadata representative of this subset is encoded in the data stream of the unaligned multi-view plus depth images in association with the unaligned color view. At the decoding stage, to generate a depth map aligned with the unaligned color view, only the depth maps referenced in the assistance metadata are warped.
Claims
1. A method comprising: obtaining a multi-view plus depth image comprising color views and first depth views, wherein at least one of the color views is unaligned with first depth views; for each unaligned color view, determining a subset of the first depth views by: for each first depth view, generating a second depth view warped to the unaligned color view; for each pixel of the unaligned color view, counting a contribution of each second depth view to a new depth view aligned with the unaligned color view; and selecting the subset of the first depth views according to the counted contribution of the corresponding second depth views; and encoding the multi-view plus depth image and metadata representative of the subset of the first depth views for each unaligned color view in a data stream.
2. The method of claim 1, wherein a first depth view is selected if the contribution of the corresponding second depth view is greater than a given contribution value.
3. The method of claim 1, wherein a subset of the first depth views is determined for each unaligned color view of the multi-view plus depth image.
4. A device comprising a memory associated with a processor configured for: obtaining a multi-view plus depth image comprising color views and first depth views, wherein at least one of the color views is unaligned with a first depth view; for each unaligned color view, determining a subset of the first depth views by: for each first depth view, generating a second depth view warped to the unaligned color view; for each pixel of the unaligned color view, counting a contribution of each second depth view to a new depth view aligned with the unaligned color view; and selecting the subset of the first depth views according to the counted contribution of the corresponding second depth views; and encoding the multi-view plus depth image and metadata representative of the subset of the first depth views in a data stream.
5. The device of claim 4, wherein a first depth view is selected if the contribution of the corresponding second depth view is greater than a given contribution value.
6. The device of claim 4, wherein a subset of the first depth views is determined for each unaligned color view of the multi-view plus depth image.
7. A method comprising: decoding from a data stream, a multi-view plus depth image, the multi-view plus depth image comprising color views and depth views, wherein at least one of the color views is unaligned with depth views and decoding metadata representative of subsets of depth views, each subset being associated with an unaligned color view; and for each unaligned color view, generating a depth view aligned with the unaligned color view by warping the depth views of the subset to the unaligned color view.
8. A device comprising a memory associated with a processor configured for: decoding from a data stream, a multi-view plus depth image, the multi-view plus depth image comprising color views and depth views, wherein at least one of the color views is unaligned with a depth view and decoding metadata representative of subsets of depth views, each subset being associated with an unaligned color view; and for each unaligned color view, generating a depth view aligned with the unaligned color view by warping the depth views of the subset to the unaligned color view.
9. A non-transitory processor-readable storage medium carrying data representative of: a multi-view plus depth image, the multi-view plus depth image comprising color views and depth views, wherein at least one of the color views is unaligned with a depth view; and metadata representative of subsets of depth views, each subset being associated with an unaligned color view.
Description
4. BRIEF DESCRIPTION OF DRAWINGS
(1) The present disclosure will be better understood, and other specific features and advantages will emerge upon reading the following description, the description making reference to the annexed drawings wherein:
5. DETAILED DESCRIPTION OF EMBODIMENTS
(7) The present principles will be described more fully hereinafter with reference to the accompanying figures, in which examples of the present principles are shown. The present principles may, however, be embodied in many alternate forms and should not be construed as limited to the examples set forth herein. Accordingly, while the present principles are susceptible to various modifications and alternative forms, specific examples thereof are shown by way of examples in the drawings and will herein be described in detail. It should be understood, however, that there is no intent to limit the present principles to the particular forms disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present principles as defined by the claims.
(8) The terminology used herein is for the purpose of describing particular examples only and is not intended to be limiting of the present principles. As used herein, the singular forms a, an and the are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms comprises, comprising, includes and/or including when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Moreover, when an element is referred to as being responsive or connected to another element, it can be directly responsive or connected to the other element, or intervening elements may be present. In contrast, when an element is referred to as being directly responsive or directly connected to other element, there are no intervening elements present. As used herein the term and/or includes any and all combinations of one or more of the associated listed items and may be abbreviated as /.
(9) It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element without departing from the teachings of the present principles.
(10) Although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that communication may occur in the opposite direction to the depicted arrows.
(11) Some examples are described with regard to block diagrams and operational flowcharts in which each block represents a circuit element, module, or portion of code which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in other implementations, the function(s) noted in the blocks may occur out of the order noted. For example, two blocks shown in succession may, in fact, be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending on the functionality involved.
(12) Reference herein to in accordance with an example or in an example means that a particular feature, structure, or characteristic described in connection with the example can be included in at least one implementation of the present principles. The appearances of the phrase in accordance with an example or in an example in various places in the specification are not necessarily all referring to the same example, nor are separate or alternative examples necessarily mutually exclusive of other examples.
(13) Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims. While not explicitly described, the present examples and variants may be employed in any combination or sub-combination.
(16) In some use cases, the MVD source has to be re-generated at the decoder side and transcoded to an MVD signal with aligned color and depth views. Such transcoding can be required, for example, to feed an autostereoscopic multi-view display. The generation of aligned MVD images may also be required when transcoding, for example, a MIV (MPEG Immersive Video) signal to other 3D compression formats that assume spatially aligned color and depth components, such as 3D-HEVC (High Efficiency Video Coding). The generation of depth maps aligned with the decoded color views involves depth warping, selection and filtering operations similar to those of the visibility stage of a rendering algorithm. However, as not a single viewport depth map but the depth maps corresponding to every source color view are to be computed, the computational demand increases proportionally to the number of color cameras.
(18) According to the present principles, an unaligned MVD image is encoded with assistance metadata that provide the decoder with information to limit the number of depth map warping operations, in order to decrease the complexity, computational burden and memory footprint of transcoding. The larger the number of acquisition cameras (for example, about 30 color cameras around a basketball field when recording a match), the more useful and relevant such assistance metadata are for the transcoding process.
(19) The present principles relate to the transmission of a static or a dynamic 3D scene, ingested as a multi-view plus depth (MVD) representation by a volumetric video encoder (for example an MIV encoder) with spatially unaligned color and depth views, originating from non-colocated color and depth sensors.
(20) According to the present principles, the assistance metadata indicate, for each source color view that is not associated with a corresponding aligned depth map, the subset of source depth views that has to be taken into account to generate a depth map aligned with that color view.
(21) These assistance metadata make it possible to minimize the computational and buffer memory demand of transcoding at the decoder side, by decreasing the number of depth map warping operations (the memory and computational demand is shifted to the encoder side), and to increase the quality of the transcoded depth maps, by decreasing the risk of errors when selecting among candidate depth values originating from different depth sensors, as a smaller subset of more reliable candidates is provided.
(22) According to the present principles, for each color view, a subset of the source depth views that may contribute to the generation of an aligned depth map is determined. The following method is proposed:
(23) TABLE-US-00001
  for each color view unaligned with a depth view:
    initialize to 0 a histogram H[j] of the contributions of the source depth views to the target view depth map;
    for each source depth view D[j]:
      generate a depth map D̃[j] warped to the color view by:
        - de-projecting / re-projecting pixels towards the target view, and
        - splat-based rasterization or triangulation;
    for each pixel (u,v) of the color view:
      select the warped depth map D̃[j_select] which gives the final target depth value D̃[j_select][v][u]
      (e.g. a majority voting process taking into account the weight of each source depth view with respect
      to the target view, which depends on their distance), and increment the corresponding histogram bin
      H[j_select];
    discard the source depth views whose histogram bin is under a given threshold (or keep the n best
    histogram bins).
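The selection loop above can be sketched in Python. This is an illustrative sketch under stated assumptions: the depth maps are taken to be already warped to the target color view (the de-projection/re-projection step is omitted), and the weighted majority vote is simplified to a weighted nearest-depth rule; select_contributing_depth_views and its keep_threshold parameter are hypothetical names, not part of any specification.

```python
# Sketch of the encoder-side analysis stage: for one unaligned color
# view, count how often each warped source depth map "wins" the
# per-pixel selection, then keep only the significant contributors.
import numpy as np

def select_contributing_depth_views(warped_depths, weights, keep_threshold=0.01):
    """warped_depths: list of HxW arrays D~[j] already warped to the
    target color view (np.inf where no source pixel landed).
    weights: per-view reliability weights (e.g. larger for closer cameras).
    Returns the indices of the depth views whose share of per-pixel wins
    reaches keep_threshold (the histogram-thresholding step above)."""
    stack = np.stack(warped_depths)              # (J, H, W)
    valid = np.isfinite(stack)
    # Simplified vote: among valid candidates, pick the view whose
    # weight-scaled depth is smallest (a crude visibility rule standing
    # in for the weighted majority voting of the document).
    score = np.where(valid, stack / np.asarray(weights)[:, None, None], np.inf)
    j_select = np.argmin(score, axis=0)          # (H, W) winning view per pixel
    covered = valid.any(axis=0)
    # Histogram H[j] of per-pixel wins, restricted to covered pixels.
    hist = np.bincount(j_select[covered].ravel(), minlength=len(warped_depths))
    total = max(hist.sum(), 1)
    return [j for j, h in enumerate(hist) if h / total >= keep_threshold]
```

The returned list is exactly the per-view index subset {Idx_k} that the assistance metadata would carry for this color view.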
(24) The output of this analysis stage is, for each source color view Vc_i, i ∈ [1, 9], the number N_i and the indices {Idx_k, k ∈ {1, …, N_i}} of the depth views Vd_j whose depth map may contribute to the depth of Vc_i. In the example layout, N_i is in [0, 4] and Idx_k is in [1, 4].
(25) In a first embodiment, the assistance metadata are provided using a modified 3D transmission format. The considered 3D transmission format is MIV, but the present principles apply to any other 3D format able to ingest an MVD input with unaligned color and depth views.
(26) According to the present principles, a data structure like the miv_view_params_list( ) of MIV, which carries global information on the camera parameters, comprises the assistance metadata. Similar to what is done for the camera extrinsics, camera intrinsics and depth quantization parameters, a dynamic temporal update mechanism can also be implemented for the depth warping parameters.
(27) TABLE-US-00002
                                                        Descriptor
  casps_miv_extension( ) {
    casme_depth_low_quality_flag                        u(1)
    casme_depth_quantization_params_present_flag        u(1)
    casme_vui_params_present_flag                       u(1)
    if( casme_vui_params_present_flag )
      vui_parameters( )
    casme_unaligned_depth_flag                          u(1)
  }
(28) casme_unaligned_depth_flag equal to 1 specifies that, in the view parameters list representing a volumetric frame, some views contain geometry but no texture, while some other views, with different camera parameters, contain texture but no geometry. casme_unaligned_depth_flag equal to 0 specifies that, when video geometry data is present, the views which contain texture also contain geometry.
(29) TABLE-US-00003
                                                        Descriptor
  caf_miv_extension( ) {
    if( nal_unit_type == NAL_CAF_IDR ) {
      miv_view_params_list( )
    } else {
      came_update_extrinsics_flag                       u(1)
      came_update_intrinsics_flag                       u(1)
      if( casme_depth_quantization_params_present_flag )
        came_update_depth_quantization_flag             u(1)
      if( casme_unaligned_depth_flag )
        came_update_depth_warping_flag                  u(1)
      if( came_update_extrinsics_flag )
        miv_view_params_update_extrinsics( )
      if( came_update_intrinsics_flag )
        miv_view_params_update_intrinsics( )
      if( came_update_depth_quantization_flag )
        miv_view_params_update_depth_quantization( )
      if( came_update_depth_warping_flag )
        miv_view_params_update_depth_warping( )
    }
  }
(30) came_update_depth_warping_flag equal to 1 indicates that the miv_view_params_update_depth_warping ( ) syntax structure is present in this syntax structure. came_update_depth_warping_flag equal to 0 indicates that the miv_view_params_update_depth_warping( ) syntax structure is not present in this syntax structure. When not present, the value of came_update_depth_warping_flag is inferred to be equal to 0.
(31) TABLE-US-00004
                                                        Descriptor
  miv_view_params_list( ) {
    mvp_num_views_minus1                                u(16)
    mvp_explicit_view_id_flag                           u(1)
    if( mvp_explicit_view_id_flag )
      for( v = 0; v <= mvp_num_views_minus1; v++ )
        mvp_view_id[ v ]                                u(16)
    for( v = 0; v <= mvp_num_views_minus1; v++ ) {
      camera_extrinsics( v )
      mvp_inpaint_flag[ v ]                             u(1)
    }
    mvp_intrinsic_params_equal_flag                     u(1)
    for( v = 0; v <= mvp_intrinsic_params_equal_flag ? 0 : mvp_num_views_minus1; v++ )
      camera_intrinsics( v )
    if( casme_depth_quantization_params_present_flag ) {
      mvp_depth_quantization_params_equal_flag          u(1)
      for( v = 0; v <= mvp_depth_quantization_params_equal_flag ? 0 : mvp_num_views_minus1; v++ )
        depth_quantization( v )
    }
    mvp_pruning_graph_params_present_flag               u(1)
    if( mvp_pruning_graph_params_present_flag )
      for( v = 0; v <= mvp_num_views_minus1; v++ )
        pruning_parents( v )
    if( casme_unaligned_depth_flag ) {
      mvp_num_texture_only_views_minus1                 u(16)
      for( v = 0; v <= mvp_num_texture_only_views_minus1; v++ ) {
        mvp_texture_only_view_idx[ v ]                  u(16)
        depth_warping( mvp_texture_only_view_idx[ v ] )
      }
    }
  }
(32) mvp_num_texture_only_views_minus1 plus 1 specifies the number of views which contain texture but not geometry. The value of mvp_num_texture_only_views_minus1 shall be in the range of 0 to mvp_num_views_minus1, inclusive.
(33) mvp_texture_only_view_idx[v] specifies the index, in the list of all views representing the volumetric frame, of the view with index v in the sublist of views which contain texture but not geometry. The value of mvp_texture_only_view_idx[v] shall be in the range of 0 to mvp_num_views_minus1, inclusive.
(34) dw_num_depth_views_minus1[i] plus 1 specifies the number of views containing geometry that may contribute to the geometry of the view with index i. The value of dw_num_depth_views_minus1[i] shall be in the range of 0 to mvp_num_views_minus1, inclusive.
(35) dw_depth_view_idx[i][k] specifies the index, in the list of all views representing the volumetric frame, of the view with index k in the sublist of views which contain geometry and may contribute to the geometry of the view with index i. The value of dw_depth_view_idx[i][k] shall be in the range of 0 to mvp_num_views_minus1, inclusive.
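As a non-normative illustration of these semantics, a depth_warping( ) payload can be serialized as a sequence of u(16) fields: the dw_num_depth_views_minus1 count followed by the dw_depth_view_idx entries. The helper names below are hypothetical, and the byte-level packing is an assumption for illustration; a conforming MIV bitstream is produced by a real syntax writer, not by this sketch.

```python
# Hypothetical round-trip serializer for the depth warping assistance
# fields: each field is written as a big-endian 16-bit unsigned integer,
# mirroring the u(16) descriptors in the tables above.
def write_depth_warping(depth_view_indices):
    """Pack dw_num_depth_views_minus1 and the dw_depth_view_idx list."""
    out = bytearray()
    out.extend((len(depth_view_indices) - 1).to_bytes(2, "big"))
    for idx in depth_view_indices:            # dw_depth_view_idx[ i ][ k ]
        out.extend(idx.to_bytes(2, "big"))
    return bytes(out)

def read_depth_warping(buf, pos=0):
    """Inverse parse: returns (list of depth view indices, new position)."""
    num = int.from_bytes(buf[pos:pos + 2], "big") + 1
    pos += 2
    indices = [int.from_bytes(buf[pos + 2 * k:pos + 2 * k + 2], "big")
               for k in range(num)]
    return indices, pos + 2 * num
```

Writing the subset {1, 3} and parsing it back recovers the same indices, which is the invariant the decoder relies on when retrieving the candidate depth views.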
(36) TABLE-US-00005
                                                        Descriptor
  miv_view_params_update_depth_warping( ) {
    mvpudw_num_view_updates_minus1                      u(16)
    for( i = 0; i <= mvpudw_num_view_updates_minus1; i++ ) {
      mvpudw_view_idx[ i ]                              u(16)
      depth_warping( mvpudw_view_idx[ i ] )
    }
  }
(37) mvpudw_num_view_updates_minus1 plus 1 specifies the number of depth_warping(i) syntax structures that are present within this syntax structure. The value of mvpudw_num_view_updates_minus1 shall be in the range of 0 to mvp_num_texture_only_views_minus1, inclusive.
(38) mvpudw_view_idx[i] specifies the view index for which updated depth warping parameters will be signalled. The value of mvpudw_view_idx[i] shall be in the range of 0 to mvp_num_views_minus1, inclusive, and shall be equal to one of the values of mvp_texture_only_view_idx[v] for v in the range of 0 to mvp_num_texture_only_views_minus1, inclusive. It is a requirement of bitstream conformance to this version of this document that the value of mvpudw_view_idx[j] shall not be equal to mvpudw_view_idx[k] for any j != k.
(39) In a second embodiment, the signalling of depth warping assistance metadata is carried by an optional Supplemental Enhancement Information (SEI) message, with similar syntax and semantics as in the first embodiment. The SEI message can be inserted at any composition time in the bitstream and is valid for the remainder of the sequence, as long as the depth warping information is not refreshed by a new SEI message.
(40) TABLE-US-00006
  SEI message                      Persistence scope
  Viewing space                    The remainder of the bitstream or until a new viewing space SEI message
  Viewing space handling           The remainder of the bitstream or until a new viewing space handling SEI message
  Geometry upscaling parameters    The remainder of the bitstream or until a new geometry upscaling parameters SEI message
  Atlas view enabled               Specified by the semantics of the SEI message
  OMAF v1 compatible               The remainder of the sequence
  Geometry assistance              The coded atlas access unit containing the SEI message
  Depth warping assistance         The remainder of the sequence
(41) TABLE-US-00007
                                                        Descriptor
  depth_warping_assistance( payloadSize ) {
    dwa_num_texture_only_views_minus1                   u(16)
    for( v = 0; v <= dwa_num_texture_only_views_minus1; v++ ) {
      dwa_texture_only_view_idx[ v ]                    u(16)
      dwa_num_depth_views_minus1[ v ]                   u(16)
      for( i = 0; i <= dwa_num_depth_views_minus1[ v ]; i++ )
        dwa_depth_view_idx[ v ][ i ]                    u(16)
    }
  }
(42) At the decoder side, the assistance metadata are used to generate depth views aligned with color views. The output depth maps spatially aligned with the color views are computed using a visibility step of a view rendering algorithm. According to the present principles, this method is fed only with the subset of decoded depth views signalled by the metadata.
(43) TABLE-US-00008
  for each decoded color view:
    retrieve from the metadata the view indices j of the depth maps candidate for warping;
    for each decoded source depth view D̂[j]:
      generate a depth map D̃[j] warped to the target view by:
        - de-projecting / re-projecting pixels towards the target view, and
        - splat-based rasterization or triangulation;
    for each pixel (u,v) of the target view:
      select a single warped depth value D̃[j_select][v][u] (e.g. a majority voting process taking into
      account the weight of each source depth view with respect to the target view, which depends on
      their distance);
    clean out the final warped depth map using post median filtering to remove outliers.
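The decoder-side method above can be sketched as follows. Only the depth views listed in the assistance metadata are warped. The warp callable is a hypothetical stand-in for the de-projection/re-projection and rasterization step, the per-pixel selection is simplified to a nearest-depth rule instead of the weighted vote, and generate_aligned_depth is an illustrative name.

```python
# Sketch of decoder-side generation of a depth map aligned with one
# unaligned color view, restricted to the metadata-signalled subset.
import numpy as np

def generate_aligned_depth(decoded_depths, candidate_indices, warp):
    """decoded_depths: dict view_idx -> HxW decoded depth map.
    candidate_indices: the view indices carried by the assistance metadata.
    warp: callable(depth_map, view_idx) -> HxW map warped to the target
    view (np.inf where no pixel landed) -- a hypothetical stand-in for
    the de-projection / splat-based rasterization step."""
    # Warp only the depth views referenced in the metadata.
    warped = np.stack([warp(decoded_depths[j], j) for j in candidate_indices])
    # Per-pixel selection: keep the nearest (smallest) valid depth value,
    # a simple visibility rule standing in for the weighted majority vote.
    target = warped.min(axis=0)
    # Post median filtering (3x3) to clean out isolated outliers.
    padded = np.pad(target, 1, mode="edge")
    windows = np.stack([padded[dy:dy + target.shape[0], dx:dx + target.shape[1]]
                        for dy in range(3) for dx in range(3)])
    return np.median(windows, axis=0)
```

The cost scales with len(candidate_indices) rather than with the total number of decoded depth views, which is precisely the saving the assistance metadata are designed to provide.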
(45) Device 30 comprises the following elements, which are linked together by a data and address bus 31: a microprocessor 32 (or CPU), which is, for example, a DSP (or Digital Signal Processor); a ROM (or Read Only Memory) 33; a RAM (or Random Access Memory) 34; a storage interface 35; an I/O interface 36 for reception of data to transmit, from an application; and a power supply (not represented).
(46) In accordance with an example, the power supply is external to the device. In each of the mentioned memories, the word register used in the specification may correspond to an area of small capacity (some bits) or to a very large area (e.g. a whole program or a large amount of received or decoded data). The ROM 33 comprises at least a program and parameters. The ROM 33 may store algorithms and instructions to perform techniques in accordance with the present principles. When switched on, the CPU 32 loads the program into the RAM and executes the corresponding instructions.
(47) The RAM 34 comprises, in registers, the program executed by the CPU 32 and uploaded after switch-on of the device 30, input data, intermediate data at different stages of the method, and other variables used for the execution of the method.
(48) The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a computer program product, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or a device), the implementation of features discussed may also be implemented in other forms (for example a program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants (PDAs), and other devices that facilitate communication of information between end-users.
(49) Device 30 is linked, for example via bus 31, to a set of sensors 37 and to a set of rendering devices 38. Sensors 37 may be, for example, cameras, microphones, temperature sensors, Inertial Measurement Units, GPS, hygrometry sensors, IR or UV light sensors or wind sensors. Rendering devices 38 may be, for example, displays, speakers, vibrators, heating devices, fans, etc.
(50) In accordance with examples, the device 30 is configured to implement a method according to the present principles, and belongs to a set comprising: a mobile device; a communication device; a game device; a tablet (or tablet computer); a laptop; a still picture camera; a video camera.
(53) Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, particularly, for example, equipment or applications associated with data encoding, data decoding, view generation, texture processing, and other processing of images and related texture information and/or depth information. Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.
(54) Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette (CD), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc), a random access memory (RAM), or a read-only memory (ROM). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.
(55) As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.
(56) A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application.