Image reconstruction method, system, device and computer-readable storage medium
11257283 · 2022-02-22
Assignee
Inventors
CPC classification
H04N21/21805
ELECTRICITY
H04N13/293
ELECTRICITY
G06F3/167
PHYSICS
H04N13/349
ELECTRICITY
G06F2203/04806
PHYSICS
H04N13/282
ELECTRICITY
H04N13/279
ELECTRICITY
G06F3/04842
PHYSICS
G06V20/52
PHYSICS
H04N13/189
ELECTRICITY
G06T3/4038
PHYSICS
H04N13/117
ELECTRICITY
H04N13/172
ELECTRICITY
H04N2013/0081
ELECTRICITY
H04N23/90
ELECTRICITY
International classification
H04N13/117
ELECTRICITY
G06F3/0488
PHYSICS
H04N13/293
ELECTRICITY
G06T3/40
PHYSICS
H04N13/349
ELECTRICITY
H04N13/172
ELECTRICITY
G06F3/0484
PHYSICS
H04N13/282
ELECTRICITY
H04N13/189
ELECTRICITY
H04N13/279
ELECTRICITY
Abstract
Image reconstruction methods, systems, devices, and computer-readable storage media are provided. The method includes: acquiring a multi-angle free-perspective image combination, parameter data of the image combination, and virtual viewpoint position information based on user interaction, where the image combination includes multiple groups of texture images and depth maps that are synchronized at multiple angles and have corresponding relationships; selecting a corresponding group of texture images and depth maps in the image combination at a user interaction moment based on a preset rule according to the virtual viewpoint position information and the parameter data of the image combination; and combining and rendering the selected corresponding group of texture images and depth maps in the image combination at the user interaction moment based on the virtual viewpoint position information and parameter data corresponding to the corresponding group of texture images and depth maps in the image combination at the user interaction moment.
Claims
1. A method implemented by a computing device, the method comprising: acquiring a multi-angle free-perspective image combination, parameter data of the image combination, and virtual viewpoint position information based on user interaction, wherein the image combination includes multiple groups of texture images and depth maps, and the multiple groups of texture images and depth maps are synchronized at multiple angles and have corresponding relationships, wherein the corresponding relationships are indicated in an association relationship field in a stitched image that stores the multiple groups of texture images and depth maps; selecting a corresponding group of texture images and depth maps in the image combination at a user interaction moment based on a preset rule according to the virtual viewpoint position information and the parameter data of the image combination; and combining and rendering the selected corresponding group of texture images and depth maps in the image combination at the user interaction moment based on the virtual viewpoint position information and parameter data corresponding to the corresponding group of texture images and depth maps in the image combination at the user interaction moment, to obtain a reconstructed image corresponding to a virtual viewpoint position at the user interaction moment.
2. The method of claim 1, wherein selecting the corresponding group of texture images and depth maps in the image combination at the user interaction moment based on the preset rule according to the virtual viewpoint position information and the parameter data of the image combination comprises: selecting the corresponding group of texture images and the depth maps in the image combination at the user interaction moment that satisfies a preset positional relationship and/or a preset quantitative relationship with the virtual viewpoint position according to the virtual viewpoint position information and the parameter data of the image combination.
3. The method of claim 2, wherein selecting the corresponding group of texture images and the depth maps in the image combination at the user interaction moment that satisfies the preset positional relationship and/or the preset quantitative relationship with the virtual viewpoint position according to the virtual viewpoint position information and the parameter data of the image combination comprises: selecting a preset number of corresponding groups of texture images and the depth maps that are closest to the virtual viewpoint position in the image combination at the user interaction moment according to the virtual viewpoint position information and the parameter data of the image combination.
4. The method of claim 3, wherein selecting the preset number of corresponding groups of texture images and the depth maps that are closest to the virtual viewpoint position in the image combination at the user interaction moment according to the virtual viewpoint position information and the parameter data of the image combination comprises: selecting texture images and depth maps corresponding to 2 to N capturing devices closest to the virtual viewpoint position according to the virtual viewpoint position information and the parameter data corresponding to the corresponding group of texture images and depth maps in the image combination at the user interaction moment, wherein N is the number of all capturing devices that capture the image combination.
5. The method of claim 1, wherein combining and rendering the selected corresponding group of texture images and depth maps in the image combination at the user interaction moment based on the virtual viewpoint position information and parameter data corresponding to the corresponding group of texture images and depth maps in the image combination at the user interaction moment to obtain the reconstructed image corresponding to the virtual viewpoint position at the user interaction moment comprises: performing forward projection on the selected depth maps of the corresponding group in the image combination at the user interaction moment respectively, to project to the virtual viewpoint position at the user interaction moment; performing post-processing on the depth maps after the forward projection respectively; performing backward projection on the selected texture images of the corresponding group in the image combination at the user interaction moment respectively; and fusing respective virtual texture images generated after the backward projection.
6. The method of claim 5, wherein after fusing the respective virtual texture images generated after the backward projection, the method further comprises: performing inpainting on the fused texture image to obtain the reconstructed image corresponding to the virtual viewpoint position at the user interaction moment.
7. The method of claim 5, wherein performing post-processing on the depth maps after the forward projection respectively comprises at least one of the following: performing foreground padding processing on the depth maps after the forward projection respectively; and performing pixel-level filtering processing on the depth maps after the forward projection respectively.
8. The method of claim 5, wherein fusing respective virtual texture images generated after the backward projection comprises: fusing the respective virtual texture images generated after the backward projection using a global weight determined by a distance between the virtual viewpoint position and a position of a capturing device that captures a corresponding texture image in the image combination, according to the virtual viewpoint position information and the parameter data corresponding to the corresponding group of texture images and depth maps in the image combination at the user interaction moment.
9. The method of claim 1, wherein combining and rendering the selected corresponding group of texture images and depth maps in the image combination at the user interaction moment based on the virtual viewpoint position information and parameter data corresponding to the corresponding group of texture images and depth maps in the image combination at the user interaction moment to obtain the reconstructed image corresponding to the virtual viewpoint position at the user interaction moment comprises: respectively projecting the depth maps of the corresponding group in the image combination at the user interaction moment to the virtual viewpoint position at the user interaction moment according to a spatial geometric relationship, to form a depth map of the virtual viewpoint position; copying from pixel points in the texture images of the corresponding group to the generated virtual texture images corresponding to the virtual viewpoint position according to the projected depth maps, to form the virtual texture images corresponding to the corresponding group in the image combination at the user interaction moment; and fusing the virtual texture images corresponding to the corresponding group in the image combination at the user interaction moment, to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment.
10. The method of claim 9, wherein fusing the virtual texture images corresponding to the corresponding group in the image combination at the user interaction moment to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment comprises: performing weighting processing on pixels in corresponding positions in the virtual texture images corresponding to respective corresponding groups in the image combination at the user interaction moment, to obtain pixel values of the corresponding positions in the reconstructed image of the virtual viewpoint position at the user interaction moment; and for a first pixel where a pixel value is zero in the reconstructed image of the virtual viewpoint position at the user interaction moment, performing inpainting using pixels around the first pixel in the reconstructed image, to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment.
11. The method of claim 9, wherein fusing the virtual texture images corresponding to the corresponding group in the image combination at the user interaction moment to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment comprises: for a first pixel where a pixel value is zero in the virtual texture images corresponding to respective corresponding groups in the image combination at the user interaction moment, performing inpainting on the first pixel using surrounding pixel values respectively; and performing weighting processing on pixel values in corresponding positions in the virtual texture images corresponding to the respective corresponding groups after the inpainting, to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment.
12. The method of claim 1, wherein acquiring a multi-angle free-perspective image combination and parameter data of the image combination comprises: decoding acquired compressed multi-angle free-perspective image data, to obtain the multi-angle free-perspective image combination and the parameter data corresponding to the image combination.
13. A system, comprising: one or more processors; memory; an acquiring unit, stored in the memory and executable by the one or more processors, configured to acquire a multi-angle free-perspective image combination, parameter data of the image combination, and virtual viewpoint position information based on user interaction, wherein the image combination includes multiple groups of texture images and depth maps, and the multiple groups of texture images and depth maps are synchronized at multiple angles and have corresponding relationships, wherein the corresponding relationships are indicated in an association relationship field in a stitched image that stores the multiple groups of texture images and depth maps; a selecting unit, stored in the memory and executable by the one or more processors, configured to select a corresponding group of texture images and depth maps in the image combination at a user interaction moment based on a preset rule according to the virtual viewpoint position information and the parameter data of the image combination; and an image reconstruction unit, stored in the memory and executable by the one or more processors, configured to combine and render the selected corresponding group of texture images and depth maps in the image combination at the user interaction moment based on the virtual viewpoint position information and parameter data corresponding to the corresponding group of texture images and depth maps in the image combination at the user interaction moment, to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction moment.
14. A computer-readable storage medium having computer instructions stored thereon that, when executed by one or more processors of a computing device, cause the one or more processors to perform acts comprising: acquiring a multi-angle free-perspective image combination, parameter data of the image combination, and virtual viewpoint position information based on user interaction, wherein the image combination includes multiple groups of texture images and depth maps, and the multiple groups of texture images and depth maps are synchronized at multiple angles and have corresponding relationships, wherein the corresponding relationships are indicated in an association relationship field in a stitched image that stores the multiple groups of texture images and depth maps; selecting a corresponding group of texture images and depth maps in the image combination at a user interaction moment based on a preset rule according to the virtual viewpoint position information and the parameter data of the image combination; and combining and rendering the selected corresponding group of texture images and depth maps in the image combination at the user interaction moment based on the virtual viewpoint position information and parameter data corresponding to the corresponding group of texture images and depth maps in the image combination at the user interaction moment, to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction moment.
15. The computer-readable storage medium of claim 14, wherein selecting the corresponding group of texture images and depth maps in the image combination at the user interaction moment based on the preset rule according to the virtual viewpoint position information and the parameter data of the image combination comprises: selecting the corresponding group of texture images and the depth maps in the image combination at the user interaction moment that satisfies a preset positional relationship and/or a preset quantitative relationship with the virtual viewpoint position according to the virtual viewpoint position information and the parameter data of the image combination.
16. The computer-readable storage medium of claim 15, wherein selecting the corresponding group of texture images and the depth maps in the image combination at the user interaction moment that satisfies the preset positional relationship and/or the preset quantitative relationship with the virtual viewpoint position according to the virtual viewpoint position information and the parameter data of the image combination comprises: selecting a preset number of corresponding groups of texture images and the depth maps that are closest to the virtual viewpoint position in the image combination at the user interaction moment according to the virtual viewpoint position information and the parameter data of the image combination.
17. The computer-readable storage medium of claim 16, wherein selecting the preset number of corresponding groups of texture images and the depth maps that are closest to the virtual viewpoint position in the image combination at the user interaction moment according to the virtual viewpoint position information and the parameter data of the image combination comprises: selecting texture images and depth maps corresponding to 2 to N capturing devices closest to the virtual viewpoint position according to the virtual viewpoint position information and the parameter data corresponding to the corresponding group of texture images and depth maps in the image combination at the user interaction moment, wherein N is the number of all capturing devices that capture the image combination.
18. The computer-readable storage medium of claim 14, wherein combining and rendering the selected corresponding group of texture images and depth maps in the image combination at the user interaction moment based on the virtual viewpoint position information and parameter data corresponding to the corresponding group of texture images and depth maps in the image combination at the user interaction moment to obtain the reconstructed image corresponding to the virtual viewpoint position at the user interaction moment comprises: performing forward projection on the selected depth maps of the corresponding group in the image combination at the user interaction moment respectively, to project to the virtual viewpoint position at the user interaction moment; performing post-processing on the depth maps after the forward projection respectively; performing backward projection on the selected texture images of the corresponding group in the image combination at the user interaction moment respectively; and fusing respective virtual texture images generated after the backward projection.
19. The computer-readable storage medium of claim 18, wherein after fusing the respective virtual texture images generated after the backward projection, the acts further comprise: performing inpainting on the fused texture image to obtain the reconstructed image corresponding to the virtual viewpoint position at the user interaction moment.
20. The computer-readable storage medium of claim 18, wherein performing post-processing on the depth maps after the forward projection respectively comprises at least one of the following: performing foreground padding processing on the depth maps after the forward projection respectively; and performing pixel-level filtering processing on the depth maps after the forward projection respectively.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) In order to illustrate the example embodiments of the present disclosure more clearly, the drawings used in the description of the example embodiments will be briefly introduced below. Apparently, the drawings in the following description represent some of the example embodiments of the present disclosure, and other drawings may be obtained from these drawings by those skilled in the art without any creative efforts.
DETAILED DESCRIPTION
(47) To enable a person of ordinary skill in the art to better understand the solutions of the present disclosure, hereinafter, technical solutions in the example embodiments of the present disclosure will be clearly and thoroughly described with reference to the accompanying drawings in the example embodiments of the present disclosure. Example embodiments described herein merely represent some of the example embodiments of the present disclosure. Other example embodiments obtained by a person of ordinary skill in the art based on the example embodiments of the present disclosure without making creative efforts should fall within the scope of the present disclosure.
(48) As described above, achieving a multi-degree-of-freedom image currently requires a large amount of data computation. For example, when the multi-degree-of-freedom image is expressed with a point cloud, the point cloud expresses and stores the three-dimensional positions and pixel information of all points in space, so a very large storage capacity is required, and accordingly a very large amount of computation is required during the image reconstruction process. With such an image reconstruction method, if the image reconstruction is performed on the cloud, great processing pressure is put on the reconstruction device on the cloud; if the image reconstruction is performed on the terminal, the terminal's limited processing capability makes it difficult for the terminal to handle such a large amount of data. In addition, there is currently no good standard or industrial software and hardware support for point cloud compression, which is not conducive to promotion and popularization.
(49) For the above technical problems, the example embodiments of the present disclosure provide a solution: acquiring a multi-angle free-perspective image combination, parameter data of the image combination, and virtual viewpoint position information based on user interaction, where the image combination includes multiple groups of texture images and depth maps that are synchronized at multiple angles and have corresponding relationships; selecting a corresponding group of texture images and depth maps in the image combination at a user interaction moment based on a preset rule according to the virtual viewpoint position information and the parameter data of the image combination; and combining and rendering the selected corresponding group of texture images and depth maps based on the virtual viewpoint position information and the parameter data corresponding to the selected group, so that the reconstructed image of the image combination at the user interaction moment is obtained. Because the entire image reconstruction process only needs to combine and render the selected corresponding group of texture images and depth maps in the image combination at the user interaction moment, rather than performing image reconstruction based on the texture images and depth maps of all groups in the image combination, the amount of data computation during image reconstruction may be reduced.
(50) In order to make the above objectives, features, and beneficial effects of the present disclosure more comprehensible, specific example embodiments of the present disclosure will be described in detail hereinafter with reference to the accompanying drawings.
(51) In the example embodiments of the present disclosure, video compression data or image data may be acquired through capturing devices. In order to enable those skilled in the art to better understand and implement the example embodiments of the present disclosure, hereinafter, specific application scenarios are used for description.
(52) As an example embodiment of the present disclosure, the following steps may be included. The first stage is capturing and depth map calculation, which includes three main steps: multi-camera video capturing, camera internal and external parameter calculation (camera parameter estimation), and depth map calculation. For multi-camera capturing, the videos captured by the respective cameras need to be aligned at the frame level. Referring to
(53) In this solution, no special camera, such as a light field camera, is required to capture the video. Similarly, no complicated camera calibration is required before capturing. Positions of multiple cameras may be laid out and arranged to better capture the objects or scenarios that need to be captured. Referring to
(54) After the above three steps are processed, the texture images captured by the multiple cameras, all camera parameters, and the depth map of each camera are obtained. These three pieces of data may be referred to as data files in multi-angle free-perspective video data, and may also be referred to as 6-degrees-of-freedom video data (6DoF video data) 3914. With these pieces of data, the user terminal may generate a virtual viewpoint based on the virtual 6-degrees-of-freedom (6DoF) position, thereby providing a 6DoF video experience.
(55) Referring to
(56) Referring to
(57) Referring to
(58) In an example embodiment implemented during a test, each test example includes 20 seconds of video data. The video data is 30 frames/second with a resolution of 1920×1080. For any one of the 30 cameras, there are 600 frames of data in total. The main folder includes the texture image folder and the depth map folder. Under the texture image folder, the secondary directories from 0 to 599 may be found. These secondary directories respectively represent the 600 frames of content corresponding to the 20-second video. Each secondary directory includes the texture images captured by the 30 cameras, named from 0.yuv to 29.yuv in the yuv420 format. Accordingly, in the depth map folder, each secondary directory includes 30 depth maps calculated by the depth estimation algorithm. Each depth map corresponds to the texture image with the same name. The texture images and corresponding depth maps of the multiple cameras correspond to the same frame moment in the 20-second video.
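A yuv420 frame of width W and height H as described above occupies W×H×3/2 bytes: a full-resolution Y plane followed by quarter-resolution U and V planes. The following Python sketch reads one such texture image from the described folder layout; the function name and the use of NumPy are illustrative assumptions, not part of the test example itself.

```python
import numpy as np

def read_yuv420_frame(path, width=1920, height=1080):
    """Read one yuv420 texture image: a full Y plane followed by
    quarter-resolution U and V planes, W*H*3/2 bytes per frame."""
    y_size = width * height
    uv_size = y_size // 4
    raw = np.fromfile(path, dtype=np.uint8, count=y_size + 2 * uv_size)
    y = raw[:y_size].reshape(height, width)
    u = raw[y_size:y_size + uv_size].reshape(height // 2, width // 2)
    v = raw[y_size + uv_size:].reshape(height // 2, width // 2)
    return y, u, v
```

Under the layout above, frame 0 of camera 5 would then be loaded from a path such as the texture image folder's secondary directory 0, file 5.yuv.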
(59) All depth maps in the test example are generated by a preset depth estimation algorithm. In the test, these depth maps may provide good virtual viewpoint reconstruction quality at the virtual 6DoF position. In one case, a reconstructed image of the virtual viewpoint may be generated directly from the given depth maps. Alternatively, the depth map may also be generated or improved by the depth calculation algorithm based on the original texture image.
(60) In addition to the depth maps and the texture images, the test example also includes a .sfm file, which describes the parameters of all 30 cameras. The data of the .sfm file is written in binary format; the data format is described hereinafter. Considering the adaptability to different cameras, a fisheye camera model with distortion parameters was used in the test. How to read and use the camera parameter data from the file may be understood with reference to the provided DIBR reference software. The camera parameter data includes the following fields:
(61) (1) krt_R is the rotation matrix of the camera;
(62) (2) krt_cc is the optical center position of the camera;
(63) (3) krt_WorldPosition is the three-dimensional space coordinate of the camera;
(64) (4) krt_kc is the distortion coefficient of the camera;
(65) (5) src_width is the width of the calibration image;
(66) (6) src_height is the height of the calibration image; and
(67) (7) fisheye_radius and lens_fov are parameters of the fisheye camera.
(68) In the technical solution implemented by the present disclosure, the user may find the detailed code of how to read the corresponding parameters in the .sfm file from the preset parameter reading function (set_sfm_parameters function).
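As a rough illustration of consuming these fields, the sketch below decodes one camera record from a byte buffer. The byte layout shown (little-endian float32 values in the field order listed above, a 3×3 rotation matrix, five distortion coefficients, and int32 image dimensions, with no padding) is an assumption for illustration only; the authoritative layout is whatever the provided set_sfm_parameters function reads.

```python
import struct

def read_camera_record(buf, offset=0):
    """Decode one camera's parameter record from a .sfm-style byte buffer.

    The field order and types are illustrative assumptions; consult the
    reference software's set_sfm_parameters for the real format."""
    cam = {}
    cam["krt_R"] = struct.unpack_from("<9f", buf, offset); offset += 36   # 3x3 rotation matrix
    cam["krt_cc"] = struct.unpack_from("<3f", buf, offset); offset += 12  # optical center
    cam["krt_WorldPosition"] = struct.unpack_from("<3f", buf, offset); offset += 12
    cam["krt_kc"] = struct.unpack_from("<5f", buf, offset); offset += 20  # distortion coefficients
    cam["src_width"], cam["src_height"] = struct.unpack_from("<2i", buf, offset); offset += 8
    cam["fisheye_radius"], cam["lens_fov"] = struct.unpack_from("<2f", buf, offset); offset += 8
    return cam, offset
```

Reading all 30 cameras would then be a loop that threads the returned offset through successive calls.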
(69) The video reconstruction system or DIBR software used by the example embodiments of the present disclosure receives the camera parameters, the texture images, the depth maps, and the 6DoF position of the virtual camera as inputs, and outputs the generated texture image and depth map at the virtual 6DoF position at the same time. The 6DoF position of the virtual camera is the above 6DoF position determined according to user behavior. The DIBR software may be the software that implements image reconstruction based on the virtual viewpoint in the example embodiments of the present disclosure.
(70) Referring to
(71) Referring to
(72) In the above DIBR software, two cameras closest to the virtual 6DoF position may be selected by default to generate the virtual viewpoint.
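The default camera selection described above can be sketched as a nearest-neighbour search over the capturing devices' positions (e.g., their krt_WorldPosition values); the function below is an illustrative sketch, not the software's actual code.

```python
import math

def select_nearest_cameras(virtual_pos, camera_positions, count=2):
    """Return the indices of the `count` capturing devices closest to the
    virtual 6DoF position, nearest first. Their texture images and depth
    maps are then used to generate the virtual viewpoint."""
    ranked = sorted(range(len(camera_positions)),
                    key=lambda i: math.dist(virtual_pos, camera_positions[i]))
    return ranked[:count]
```

With the default count of 2, this realizes the "two closest cameras" rule; claims 3 and 4 allow any preset number from 2 up to the total number of capturing devices.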
(73) In the postprocessing step of the depth map, the quality of the depth map may be improved by various methods, such as foreground padding, pixel-level filtering, and the like.
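One possible form of the pixel-level filtering mentioned above is a median filter over valid depth values, which suppresses isolated projection noise while preserving edges; this is a sketch of the idea, not the method's prescribed filter.

```python
import numpy as np

def filter_depth(depth):
    """Pixel-level filtering sketch: replace each interior depth value with
    the median of the valid (non-zero) values in its 3x3 neighbourhood.
    Zero is used here as the hole marker, an illustrative convention."""
    h, w = depth.shape
    out = depth.copy()
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = depth[y - 1:y + 2, x - 1:x + 2]
            valid = window[window > 0]
            if valid.size:
                out[y, x] = np.median(valid)
    return out
```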
(74) For the output generated image, a method of fusing the texture images from the two cameras is used. The fusion weight is a global weight determined by the distance of the virtual viewpoint position from the position of the reference camera. When a pixel of the output virtual viewpoint image has a projection from only one camera, the projected pixel may be used directly as the value of the output pixel.
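The global-weight fusion of the two projected texture images, including the single-projection fallback just described, might look like the following sketch (hole pixels are marked with value 0, an illustrative convention):

```python
import numpy as np

def fuse_views(tex_a, tex_b, dist_a, dist_b):
    """Fuse two virtual texture images with one global weight per view,
    inversely related to the virtual viewpoint's distance from each
    reference camera. Where only one view has a projected pixel, that
    pixel is used directly; where neither has, the hole remains."""
    w_a = dist_b / (dist_a + dist_b)   # the closer camera gets the larger weight
    w_b = dist_a / (dist_a + dist_b)
    both = (tex_a > 0) & (tex_b > 0)
    only_a = (tex_a > 0) & (tex_b == 0)
    only_b = (tex_b > 0) & (tex_a == 0)
    out = np.zeros_like(tex_a, dtype=float)
    out[both] = w_a * tex_a[both] + w_b * tex_b[both]
    out[only_a] = tex_a[only_a]
    out[only_b] = tex_b[only_b]
    return out
```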
(75) After the fusion step, if there are still hollow pixels that have not been projected to, an inpainting method may be used to fill the hollow pixels.
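A minimal version of such hole filling averages the valid neighbours of each hollow pixel, repeating until no further pixel can be filled; real inpainting methods are more sophisticated, so this is only a sketch.

```python
import numpy as np

def inpaint_holes(image):
    """Fill each hollow (zero-valued) pixel with the average of its
    non-zero 3x3 neighbours, iterating so that filled pixels can in turn
    seed their neighbours on later passes."""
    out = image.astype(float).copy()
    changed = True
    while changed:
        changed = False
        for y, x in np.argwhere(out == 0):
            window = out[max(0, y - 1):y + 2, max(0, x - 1):x + 2]
            valid = window[window > 0]
            if valid.size:
                out[y, x] = valid.mean()
                changed = True
    return out
```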
(76) For the output depth map, for the convenience of error analysis, a depth map obtained by projecting from one of the cameras to the position of the virtual viewpoint may be used as the output.
(77) Additionally, the 6DoF position of the virtual camera 4520 and the camera parameters 4522 may be used as the input for the camera selection step 4502.
(78) Those skilled in the art may understand that the above example embodiments are merely examples and are not limitations on the implementation manners. The technical solutions in the example embodiments of the present disclosure will be further described hereinafter.
(79) Referring to the schematic diagram of the to-be-viewed area as shown in
(80) For example, referring to
(81) The capturing device may be a camera or a video camera capable of synchronous shooting, for example, through a hardware synchronization line. With multiple capturing devices capturing data in the to-be-viewed area, multiple synchronized images or video streams may be obtained. From the video streams captured by the multiple capturing devices, multiple synchronized frame images may also be obtained as the multiple synchronized images. Those skilled in the art may understand that, ideally, synchronization means correspondence to the same moment, although errors and deviations may be tolerated.
(82) Referring to
(83) In implementations, the process of performing video reconstruction or image reconstruction to obtain a reconstructed image may be implemented by the device 33 that performs displaying, or may be implemented by a device located on a Content Delivery Network (CDN) in an edge computing manner. Those skilled in the art may understand that
(84) The process of video reconstruction based on multi-angle free-perspective data will be described in detail hereinafter with reference to
(85) Referring to
(86) For example, the user may slide on the surface of the screen to switch the virtual viewpoint. In an example embodiment of the present disclosure, referring to
(87) Those skilled in the art may understand that the image viewed before switching may also be a reconstructed image. The reconstructed image may be a frame image in a video stream. In addition, there are various manners to switch the virtual viewpoint according to the user instruction, which is not limited herein.
(88) In implementations, the virtual viewpoint may be represented by 6 degrees of freedom (DoF) coordinates, where the spatial position of the virtual viewpoint may be represented as (x, y, z), and the perspective may be represented as three directions of rotation (θ, φ, γ).
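These 6DoF coordinates can be grouped into a small structure; the class and field names below are illustrative, not from the disclosure.

```python
from dataclasses import dataclass

@dataclass
class VirtualViewpoint:
    """A virtual viewpoint as 6 degrees of freedom: a spatial position
    (x, y, z) plus three rotation angles (theta, phi, gamma)."""
    x: float
    y: float
    z: float
    theta: float  # rotation angles; the axis convention is an assumption
    phi: float
    gamma: float

    def position(self):
        return (self.x, self.y, self.z)

    def orientation(self):
        return (self.theta, self.phi, self.gamma)
```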
(89) The virtual viewpoint is a three-dimensional concept. Three-dimensional information is required to generate the reconstructed image. In an implementation manner, the multi-angle free-perspective data may include the depth data for providing third-dimensional information outside the plane image (the texture image). Compared with other implementation manners, such as providing three-dimensional information through point cloud data, the data amount of the depth data is smaller. Implementation manners of generating multi-angle free-perspective data will be described in detail hereinafter with reference to
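The storage comparison can be made concrete with rough arithmetic. A per-view depth map adds one sample per pixel, while a point cloud stores a 3D position plus color for every point; the byte counts below (8-bit depth samples, 15 bytes per point-cloud point) are illustrative assumptions, as real encodings vary.

```python
def depth_map_bytes(width, height, bytes_per_sample=1):
    """Extra data per view for depth-based reconstruction: one depth
    sample per pixel of the texture image."""
    return width * height * bytes_per_sample

def point_cloud_bytes(num_points, bytes_per_point=15):
    """Rough point-cloud cost: per point, a 3D position (three 4-byte
    floats) plus RGB (3 bytes) = 15 bytes."""
    return num_points * bytes_per_point
```

For a 1920×1080 view the depth map adds about 2 MB, whereas a scene sampled at tens of millions of points reaches hundreds of megabytes under these assumptions.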
(90) In the example embodiments of the present disclosure, the switching of the virtual viewpoint may be performed within a certain range, which is the multi-angle free-perspective range. That is, within the multi-angle free-perspective range, the position of the virtual viewpoint and the perspective may be arbitrarily switched.
(91) The multi-angle free-perspective range is related to the arrangement of the capturing devices. The broader the shooting coverage of the capturing devices is, the larger the multi-angle free-perspective range is. The quality of the picture displayed by the device that performs displaying is related to the number of capturing devices. Generally, the more capturing devices are set, the fewer hollow areas appear in the displayed picture.
(92) Referring to
(93) Referring to
(94) In implementations, if only one row of capturing devices is set, a certain degree of freedom in the vertical direction may also be obtained in the process of image reconstruction, but the multi-angle free-perspective range thereof is smaller than that of the scenario where two rows of capturing devices are set in the vertical direction.
(95) It may be understood by those skilled in the art that the above respective example embodiments and corresponding drawings are merely for illustrative purposes and are not intended to limit the association relationship between the setting of the capturing devices and the multi-angle free-perspective range, nor are they limitations of operation manners or obtained display effects of the device that performs displaying.
(96) Hereinafter, a setting method of capturing devices is further described.
(97)
(98) Step S1002, determining a multi-angle free-perspective range, where virtual viewpoint switching viewing in the to-be-viewed area is supported within the multi-angle free-perspective range.
(99) Step S1004, determining setting positions of the capturing devices according to at least the multi-angle free-perspective range, where the setting positions are suitable for setting the capturing devices to perform data capturing in the to-be-viewed area.
(100) Those skilled in the art may understand that a completely free perspective may refer to a perspective with 6 degrees of freedom. That is, the user may freely switch the spatial position and perspective of the virtual viewpoint on the device that performs displaying, where the spatial position of the virtual viewpoint may be expressed as (x, y, z), and the perspective may be expressed as three directions of rotation (θ, φ, γ). There are 6 degrees of freedom in total, and thus the perspective is referred to as a perspective with 6 degrees of freedom.
(101) As described above, in the example embodiments of the present disclosure, the switching of the virtual viewpoint may be performed within a certain range, which is the multi-angle free-perspective range. That is, within the multi-angle free-perspective range, the position of the virtual viewpoint and the perspective may be arbitrarily switched.
(102) The multi-angle free-perspective range may be determined according to the needs of the application scenario. For example, in some scenarios, the to-be-viewed area may have a core focus, such as the center of the stage, or the center of the basketball court, or the hoop of the basketball court. In such scenarios, the multi-angle free-perspective range may include a planar or three-dimensional area including the core focus. Those skilled in the art may understand that the to-be-viewed area may be a point, a plane, or a three-dimensional area, which is not limited herein.
(103) As described above, the multi-angle free-perspective range may be various areas, and further examples are described hereinafter with reference to
(104) Referring to
(105) Taking the multi-angle free-perspective range as the sector area A.sub.1OA.sub.2 as an example, the position of the virtual viewpoint may be continuously switched in this area. For example, the position of the virtual viewpoint may be continuously switched from A.sub.1 along the arc segment A.sub.1A.sub.2 to A.sub.2. Alternatively, the position of the virtual viewpoint may also be continuously switched along the arc segment L.sub.1L.sub.2. Alternatively, the position is switched in the multi-angle free-perspective range in other manners. Accordingly, the perspective of the virtual viewpoint may also be changed in this area.
(106) Further referring to
(107) Referring to
(108) Further referring to
(109) In the scenario with the core focus, the position of the core focus may be various, and the multi-angle free-perspective range may also be various, which are not listed herein one by one. Those skilled in the art may understand that the above respective example embodiments are merely examples and are not limitations on the multi-angle free-perspective range. Moreover, the shapes shown therein are not limitations on actual scenarios and applications.
(110) In implementations, the core focus may be determined according to the scenario. In a shooting scenario, there may also be multiple core focuses, and the multi-angle free-perspective range may be a superposition of multiple sub-ranges.
(111) In other application scenarios, the multi-angle free-perspective range may also be without the core focus. For example, in some application scenarios, it is necessary to provide multi-angle free-perspective viewing of historic buildings, or to provide multi-angle free-perspective viewing of art exhibitions. Accordingly, the multi-angle free-perspective range may be determined according to the requirements of these scenarios.
(112) Those skilled in the art may understand that the shape of the multi-angle free-perspective range may be arbitrary. Any point within the multi-angle free-perspective range may be used as the position of the virtual viewpoint.
(113) Referring to
(114) In implementations, after the multi-angle free-perspective range is determined, the positions of the capturing devices may be determined according to the multi-angle free-perspective range.
(115) Specifically, the setting positions of the capturing devices may be selected within the multi-angle free-perspective range. For example, the setting positions of the capturing devices may be determined at boundary points of the multi-angle free-perspective range.
(116) Referring to
(117) In implementations, two or more setting positions may be set, and correspondingly, two or more capturing devices may be set. The number of capturing devices may be determined according to the requirements of the quality of the reconstructed image or video. In a scenario with a higher requirement on the picture quality of the reconstructed image or video, the number of capturing devices may be greater. In a scenario with a lower requirement on the picture quality of the reconstructed image or video, the number of capturing devices may be smaller.
(118) Still referring to
(119) Referring to
(120) As described above, in some application scenarios, the to-be-viewed area may include the core focus. Accordingly, the multi-angle free-perspective range includes the area where the perspective is directed to the core focus. In such an application scenario, the setting positions of the capturing devices may be selected from an arc-shaped area whose concave direction (radius direction) points to the core focus.
(121) When the to-be-viewed area includes the core focus, the setting positions are selected in the arc-shaped area whose concave direction points to the core focus, so that the capturing devices are arranged in an arc shape. Because the to-be-viewed area includes the core focus, the perspective points to the core focus. In such a scenario, arranging the capturing devices in the arc shape allows fewer capturing devices to cover a larger multi-angle free-perspective range.
(122) In implementations, the setting positions of the capturing devices may be determined with reference to the perspective range and the boundary shape of the to-be-viewed area. For example, the setting positions of the capturing devices may be determined at a preset interval along the boundary of the to-be-viewed area within the perspective range.
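To illustrate (a hypothetical sketch; the function name and the even angular spacing are assumptions, not the disclosed method), setting positions at a preset interval along an arc whose concave side points to a core focus could be computed as:

```python
import math

def arc_positions(focus, radius, start_deg, end_deg, count):
    """Evenly spaced capturing-device positions along an arc whose
    concave side points to the core focus (all devices aim at the focus)."""
    fx, fy = focus
    positions = []
    for i in range(count):
        # Interpolate the angle between the two boundary points of the arc.
        t = i / (count - 1) if count > 1 else 0.0
        ang = math.radians(start_deg + t * (end_deg - start_deg))
        positions.append((fx + radius * math.cos(ang), fy + radius * math.sin(ang)))
    return positions

# Four devices on a quarter arc of radius 10 around a focus at the origin.
cams = arc_positions(focus=(0.0, 0.0), radius=10.0, start_deg=0.0, end_deg=90.0, count=4)
```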
(123) Referring to
(124) In implementations, the multi-angle free-perspective range may also support viewing from the upper side of the to-be-viewed area, and the upper side is in a direction away from the horizontal plane.
(125) Accordingly, the capturing device may be mounted on a drone so as to be set on the upper side of the to-be-viewed area, or set on the top of the building where the to-be-viewed area is located, the top being the structure of the building in the direction away from the horizontal plane.
(126) For example, the capturing device may be set on the top of the basketball stadium, or may hover on the upper side of the basketball court through the drone carrying the capturing device. The capturing device may be set on the top of the stadium where the stage is located, or may be carried by the drone.
(127) By setting the capturing device on the upper side of the to-be-viewed area, the multi-angle free-perspective range may include the perspective above the to-be-viewed area.
(128) In implementations, the capturing device may be a camera or a video camera, and the captured data may be pictures or video data.
(129) Those skilled in the art may understand that the manner in which the capturing device is set at the setting position may be various. For example, the capturing device may be supported by the support frame at the setting position, or in other setting manners.
(130) In addition, those skilled in the art may understand that the above respective example embodiments are merely examples for illustration, and are not limitations on the setting manner of capturing devices. In various application scenarios, the implementations of determining the setting positions of the capturing devices and setting the capturing devices for capturing according to the multi-angle free-perspective range are all within the protection scope of the present disclosure.
(131) Hereinafter, the method for generating multi-angle free-perspective data is further described.
(132) As described above, still referring to
(133) In implementations, referring to
(134) Step S1902, acquiring multiple synchronized two-dimensional images, where the shooting angles of the multiple two-dimensional images are different;
(135) Step S1904, determining the depth data of each two-dimensional image based on the multiple two-dimensional images;
(136) Step S1906, for each of the two-dimensional images, storing the pixel data of each two-dimensional image in a first field, and storing the depth data in at least a second field associated with the first field.
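The three steps above can be sketched as follows; `estimate_depth` is a hypothetical placeholder for an actual multi-view depth estimation step, and the dictionary keys are illustrative names for the first and second fields:

```python
import numpy as np

def estimate_depth(images, index):
    """Hypothetical placeholder for multi-view depth estimation:
    returns one depth value per pixel of images[index]."""
    h, w, _ = images[index].shape
    return np.zeros((h, w), dtype=np.uint8)

def build_free_perspective_data(images):
    """Steps S1902-S1906 as a sketch: for each synchronized two-dimensional
    image, a first field holds its pixel data and an associated second
    field holds its depth data."""
    data = []
    for i, img in enumerate(images):
        depth = estimate_depth(images, i)
        data.append({"first_field": img, "second_field": depth})
    return data

# Two tiny synchronized images shot from different angles (dummy data).
frames = [np.zeros((4, 4, 3), dtype=np.uint8) for _ in range(2)]
combo = build_free_perspective_data(frames)
```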
(137) The multiple synchronized two-dimensional images may be images captured by the camera or frame images in video data captured by the video camera. In the process of generating the multi-angle free-perspective data, the depth data of each two-dimensional image may be determined based on the multiple two-dimensional images.
(138) The depth data may include depth values corresponding to pixels of the two-dimensional image. The distance from the capturing device to each point in the to-be-viewed area may be used as the above depth value, and the depth value may directly reflect the geometry of the visible surface in the to-be-viewed area. The depth value may be the distance from respective points in the to-be-viewed area along the optical axis of the camera to the optical center, and the origin of the camera coordinate system may be used as the optical center. Those skilled in the art may understand that the distance may be a relative value, as long as the multiple images use the same reference.
(139) Further, the depth data may include depth values corresponding to the pixels of the two-dimensional image on a one-to-one basis. Alternatively, the depth data may be some values selected from a set of depth values corresponding to the pixels of the two-dimensional image on a one-to-one basis.
(140) Those skilled in the art may understand that the two-dimensional image may also be referred to as the texture image, and the set of depth values may be stored in the form of a depth map. In implementations, the depth data may be data obtained by down-sampling the original depth map. The original depth map is the set of depth values corresponding to the pixels of the two-dimensional image (the texture image) on a one-to-one basis, stored according to the arrangement of the pixel points of the two-dimensional image (the texture image).
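For example, a 2x down-sampling of a hypothetical 1080P original depth map, keeping every other row and column, could look like:

```python
import numpy as np

# Hypothetical original depth map: one 16-bit depth value per texture pixel.
original_depth = np.arange(1080 * 1920, dtype=np.uint16).reshape(1080, 1920)

# 2x down-sampling by keeping every other row and column; the depth data
# stored in the second field then has a quarter of the original values.
depth_data = original_depth[::2, ::2]
```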
(141) In implementations, the pixel data of the two-dimensional image stored in the first field may be original two-dimensional image data, such as data obtained from the capturing device, or may be data with a reduced resolution of the original two-dimensional image data. That is, the pixel data of the two-dimensional image may be the original pixel data of the image, or pixel data with reduced resolution. The pixel data of the two-dimensional image may be any one of YUV data and RGB data, or may be other data capable of expressing the two-dimensional image.
(142) In implementations, the amount of the depth data stored in the second field may be the same as or different from the number of pixel points corresponding to the pixel data of the two-dimensional image stored in the first field. The amount may be determined according to the bandwidth limitation of data transmission of the device terminal that processes the multi-angle free-perspective image data. If the bandwidth is small, the amount of data may be reduced in the above manners such as down-sampling or resolution reduction, and the like.
(143) In implementations, for each of the two-dimensional images (the texture images), the pixel data of the two-dimensional image (the texture image) may be sequentially stored in multiple fields in a preset order, and these fields may be consecutive or may be distributed in an interleaving manner with the second field. The fields storing the pixel data of the two-dimensional image (the texture image) may be used as the first fields. Hereinafter, examples are provided for explanation.
(144) For simplicity of description, unless otherwise specified, the images described in
(145) Referring to
(146) Referring to
(147) In implementations, the depth data may be stored in the same order as the pixel data of the two-dimensional image, so that a respective field in the first fields may be associated with a respective field in the second fields, thereby reflecting the depth value corresponding to each pixel.
(148) In implementations, the pixel data and the corresponding depth data of multiple two-dimensional images may be stored in various ways. Hereinafter, examples are provided for further explanation.
(149) Referring to
(150) Those skilled in the art may understand that respective images in the image stream or respective frame images in the video stream that are continuously captured by one capturing device of multiple synchronized capturing devices may be used as the above texture image 1 respectively. Similarly, among the multiple synchronized capturing devices, the two-dimensional image captured in synchronization with texture image 1 may be used as texture image 2. The capturing device may be the capturing device shown in
(151) Referring to
(152) Referring to
(153) Referring to
(154) In summary, the fields storing the pixel data of each two-dimensional image may be used as the first fields, and the fields storing the depth data corresponding to the two-dimensional image may be used as the second fields. For each generated multi-angle free-perspective data, the first fields and the second fields associated with the first fields may be included respectively.
(155) Those skilled in the art may understand that the above respective example embodiments are merely examples, and are not specific limitations on the type, size, and arrangement of the fields.
(156) Referring to
(157) In implementations, both the first fields and the second fields may be pixel fields in the stitched image. The stitched image is used to store the pixel data and the depth data of the multiple images. By using an image format for data storage, the amount of data, the time length of data transmission, and the resource occupation may all be reduced.
(158) The stitched image may be an image in various formats such as BMP format, JPEG format, PNG format, and the like. These image formats may be the compressed format or the uncompressed format. Those skilled in the art may understand that the two-dimensional image in various formats may include fields corresponding to respective pixels, which are referred to as pixel fields. The size of the stitched image, i.e., parameters like the number of pixels and the aspect ratio of the stitched image, may be determined according to needs, for example, may be determined based on the number of the multiple synchronized two-dimensional images, the amount of data to be stored in each two-dimensional image, the amount of the depth data to be stored in each two-dimensional image, and other factors.
(159) In implementations, among the multiple synchronized two-dimensional images, the depth data corresponding to the pixels of each two-dimensional image and the number of bits of the pixel data may be associated with the format of the stitched image.
(160) For example, when the format of the stitched image is the BMP format, the range of the depth value may be 0-255, which is 8-bit data, and the data may be stored as the gray value in the stitched image. Alternatively, the depth value may also be 16-bit data, which may be stored as the gray value at two pixel positions in the stitched image. Alternatively, the gray value may be stored in two channels at one pixel position in the stitched image.
(161) When the format of the stitched image is the PNG format, the depth value may also be 8-bit or 16-bit data. In the PNG format, the depth value of 16-bit may be stored as the gray value of one pixel position in the stitched image.
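As a sketch of the two-channel storage of a 16-bit depth value (illustrative only; the assignment of high and low bytes to channels is an assumption):

```python
import numpy as np

# Hypothetical 16-bit depth values.
depth16 = np.array([[0, 256, 65535]], dtype=np.uint16)

# Split each 16-bit depth value into a high byte and a low byte so it can be
# stored as gray values in two 8-bit channels (or at two pixel positions).
high = (depth16 >> 8).astype(np.uint8)
low = (depth16 & 0xFF).astype(np.uint8)

# Reconstruction on the decoding side.
restored = (high.astype(np.uint16) << 8) | low
```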
(162) Those skilled in the art may understand that the above example embodiments are not limitations on the storage manner or the number of data bits, and other data storage manners that may be implemented by those skilled in the art fall within the protection scope of the present disclosure.
(163) In implementations, the stitched image may be split into a texture image area and a depth map area. The pixel fields of the texture image area store the pixel data of the multiple two-dimensional images, and the pixel fields of the depth map area store the depth data of the multiple images. The pixel fields storing the pixel data of each two-dimensional image in the texture image area are used as the first fields, and the pixel fields storing the depth data of each image in the depth map area are used as the second fields.
(164) In implementations, the texture image area may be a continuous area, and the depth map area may also be a continuous area.
(165) Further, in implementations, the stitched image may be equally split, and the two split parts are used as the texture image area and the depth map area respectively. Alternatively, the stitched image may also be split in an unequal manner according to the amount of the pixel data and the amount of the depth data of the two-dimensional image to be stored.
(166) For example, referring to
(167) Those skilled in the art may understand that
(168) In implementations, the texture image area may include multiple texture image sub-areas. Each texture image sub-area is used to store one of the multiple images. The pixel fields of each texture image sub-area may be used as the first fields. Accordingly, the depth map area may include multiple depth map sub-areas. Each depth map sub-area is used to store the depth data of one of the multiple depth maps. The pixel fields of each depth map sub-area may be used as the second fields.
(169) The number of texture image sub-areas and the number of depth map sub-areas may be equal, both of which are equal to the number of multiple synchronized images. In other words, the number of texture image sub-areas and the number of depth map sub-areas may be equal to the number of cameras described above.
(170) Referring to
(171) With reference to the descriptions above, the pixel data of the synchronized 8 texture images, i.e., perspective 1 texture image to perspective 8 texture image, may be the original images obtained from the cameras, or may be images after the original images are reduced in resolution. The depth data is stored in a partial area of the stitched image and may also be referred to as the depth map.
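A minimal sketch of the equal up/down split, assuming tiny hypothetical textures and single-channel depth maps (depth replicated to three channels so the stitched picture stays three-channel):

```python
import numpy as np

def stitch(textures, depth_maps):
    """Equal split: the upper half holds the texture image sub-areas,
    the lower half holds the depth map sub-areas, in the same order."""
    # Lay the texture images out side by side, one sub-area each.
    top = np.concatenate(textures, axis=1)
    # Depth maps are single-channel; replicate to three channels so the
    # whole stitched image stays a three-channel picture.
    depth3 = [np.repeat(d[:, :, None], 3, axis=2) for d in depth_maps]
    bottom = np.concatenate(depth3, axis=1)
    return np.concatenate([top, bottom], axis=0)

# Eight 2x2 textures (perspective 1 to perspective 8) and their depth maps.
tex = [np.full((2, 2, 3), i, dtype=np.uint8) for i in range(8)]
dep = [np.full((2, 2), i, dtype=np.uint8) for i in range(8)]
stitched = stitch(tex, dep)
```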
(172) As described above, in implementations, the stitched image may also be split in an unequal manner. For example, referring to
(173) Those skilled in the art may understand that
(174) In implementations, the texture image area or the depth map area may also include multiple areas. For example, as shown in
(175) Alternatively, referring to
(176) Alternatively, referring to
(177) In implementations, the pixel data of each texture image may be stored in the texture image sub-areas in the order of the arrangement of pixel points. The depth data of each texture image may also be stored in the depth map sub-areas in the order of the arrangement of pixel points.
(178) Referring to
(179) Referring to
(180) Similarly, when the depth data of texture image 1 is stored into the depth map sub-areas, it may be stored in a similar manner. In the case where the depth values correspond to the pixel values of the texture image on a one-to-one basis, the depth data of texture image 1 may be stored in a manner as shown in
(181) Those skilled in the art may understand that the compression ratio of compressing the image is related to the association of respective pixel points in the image. The stronger the association is, the higher the compression ratio is. Since the captured image corresponds to the real world, the association of respective pixel points is strong. By storing the pixel data and the depth data of the image in the order of the arrangement of pixel points, the compression ratio when compressing the stitched image may be higher. That is, the amount of data after compression may be made smaller if the amount of data before compression is the same.
(182) By splitting the stitched image into the texture image area and the depth map area, multiple texture image sub-areas are adjacent in the texture image area, and multiple depth map sub-areas are adjacent in the depth map area. Although the data stored in the respective texture image sub-areas is obtained from images, or frame images in the videos, taken from different angles of the to-be-viewed area, all the depth maps are stored together in the depth map area, and thus a higher compression ratio may also be obtained when the stitched image is compressed.
(183) In implementations, padding may be performed on all or some of the texture image sub-areas and the depth map sub-areas. The form of padding may be various. For example, taking perspective 1 depth map in
(184) Because the stitched image includes multiple texture images and depth maps, the association between adjacent borders of respective texture images is poor. By performing padding, quality loss of the texture images and the depth maps in the stitched image may be reduced when the stitched image is compressed.
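One possible form of padding (an assumption; the disclosure does not fix a particular padding scheme) is edge replication around a sub-area:

```python
import numpy as np

# Hypothetical 2x2 depth map sub-area; pad a 1-pixel border by replicating
# edge values so compression artifacts at sub-area borders are reduced.
sub_area = np.array([[1, 2],
                     [3, 4]], dtype=np.uint8)
padded = np.pad(sub_area, pad_width=1, mode="edge")
# padded is 4x4; each border pixel copies the nearest interior pixel.
```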
(185) In implementations, the pixel field of the texture image sub-area may store three-channel data, and the pixel field of the depth map sub-area may store single-channel data. The pixel field of the texture image sub-area is used to store the pixel data of any one of the multiple synchronized two-dimensional images. The pixel data is usually three-channel data, such as RGB data or YUV data.
(186) The depth map sub-areas are used to store the depth data of the image. If the depth value is 8-bit binary data, a single channel of the pixel field may be used for storage. If the depth value is 16-bit binary data, two channels of the pixel field may be used for storage. Alternatively, the depth value may also be stored with a larger pixel area. For example, if the multiple synchronized images are all 1920*1080 images and the depth values are 16-bit binary data, the depth values may also be stored in a doubled 1920*1080 image area with a single channel. The stitched image may also be split in combination with the storage manner.
(187) When each channel of each pixel occupies 8 bits, the uncompressed amount of data of the stitched image may be calculated according to the following formula: the number of the multiple synchronized two-dimensional images*(the amount of data of the pixel data of one two-dimensional image+the amount of data of one depth map).
(188) If the original image has a resolution of 1080P, i.e., 1920*1080 pixels, with a progressive scan format, the original depth map may also occupy 1920*1080 pixels, which is the single channel. The amount of data of pixels of the original image is 1920*1080*8*3 bits, and the amount of data of the original depth map is 1920*1080*8 bits. If the number of cameras is 30, the amount of data of pixels of the stitched image is 30*(1920*1080*8*3+1920*1080*8) bits, which is about 237 MB. If not compressed, the stitched image will occupy a lot of system resources and have a large delay. Especially when the bandwidth is small, for example, when the bandwidth is 1 MB/s, the uncompressed stitched image needs about 237 seconds to be transmitted. The real-time performance is poor, and the user experience needs to be improved.
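The calculation above can be reproduced directly:

```python
# Uncompressed size of the stitched image for 30 cameras at 1080P,
# three 8-bit channels per texture pixel and one 8-bit channel per depth value.
cameras = 30
pixels = 1920 * 1080
texture_bits = pixels * 8 * 3   # pixel data of one two-dimensional image
depth_bits = pixels * 8         # one original depth map (single channel)
total_bits = cameras * (texture_bits + depth_bits)

total_mb = total_bits / 8 / 1024 / 1024   # megabytes, about 237 MB
seconds_at_1mb_per_s = total_mb           # about 237 s at 1 MB/s
```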
(189) By one or more manners, such as storing regularly to obtain a higher compression ratio, using pixel data with reduced resolution as the pixel data of the two-dimensional image, or performing down-sampling on one or more of the original depth maps, the amount of data of the stitched image may be reduced.
(190) For example, if the resolution of the original two-dimensional image is 4K, i.e., the pixel resolution of 4096*2160, and the down-sampling has a resolution of 540P, i.e., the pixel resolution of 960*540, the number of pixels of the stitched image is approximately one-sixteenth of the number of pixels before down-sampling. In combination with any one or more of other manners for reducing the amount of data described above, the amount of data may be made smaller.
(191) Those skilled in the art may understand that if the bandwidth is supportive and the decoding capability of the device that performs data processing may support the stitched image with higher resolution, the stitched image with higher resolution may also be generated to improve the image quality.
(192) Those skilled in the art may understand that in different application scenarios, the pixel data and the corresponding depth data of the multiple synchronized two-dimensional images may also be stored in other manners, for example, stored in the stitched image in units of pixel points. Referring to
(193) The pixel data and the depth data of the texture image are stored in a preset order. In implementations, the multiple synchronized two-dimensional images may also be multiple synchronized frame images obtained by decoding multiple videos. The videos may be acquired by multiple cameras, and the settings thereof may be the same as or similar to the cameras that acquire the two-dimensional images as described above.
(194) In implementations, generating the multi-angle free-perspective image data may further include generating the association relationship field, and the association relationship field may indicate the association relationship between the first field and at least one second field. The first field stores the pixel data of one two-dimensional image of the multiple synchronized two-dimensional images, and the second field stores the depth data corresponding to the two-dimensional image, where the first field and the second field correspond to the same shooting angle, i.e., the same perspective. The association relationship between the first field and the second field may be described by the association relationship field.
(195) Taking
(196) The association relationship field may indicate the association relationship between the first field and the second field of each two-dimensional image of the multiple synchronized two-dimensional images in various manners, for example, may be content storage rules of the pixel data and the depth data of the multiple synchronized two-dimensional images, that is, indicating the association relationship between the first field and the second field through indicating the storage manner described above.
(197) In implementations, the association relationship field may only include different mode numbers. The device that performs data processing may learn the storage manner of the pixel data and the depth data in the obtained multi-angle free-perspective image data according to the mode number of the field and the data stored in the device that performs data processing. For example, if the received mode number is 1, the storage manner is parsed as follows. The stitched image is equally split into two areas up and down, where the upper half area is the texture image area, and the lower half area is the depth map area. The texture image at a certain position in the upper half area is associated with the depth map stored at the corresponding position in the lower half area.
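A sketch of how a device that performs data processing might interpret such a mode number (the function and the returned layout description are hypothetical):

```python
def parse_mode(mode_number, stitched_height):
    """Interpret an association relationship field that only carries a
    mode number; the storage rule for mode 1 follows the example above."""
    if mode_number == 1:
        # Equal up/down split: the upper half is the texture image area and
        # the lower half is the depth map area; a texture image at a position
        # in the upper half is associated with the depth map stored at the
        # corresponding position in the lower half.
        half = stitched_height // 2
        return {"texture_area": (0, half), "depth_area": (half, stitched_height)}
    raise ValueError("unknown mode number")

layout = parse_mode(1, stitched_height=2160)
```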
(198) Those skilled in the art may understand that the manner of storing the stitched image in the above example embodiments, for example, the storage manners illustrated in
(199) As described above, the picture format of the stitched image may be any one of the two-dimensional image formats such as BMP, PNG, JPEG, Webp and the like, or other image formats. The storage manner of the pixel data and the depth data in multi-angle free-perspective image data is not limited to the manner of stitched image. The pixel data and the depth data in multi-angle free-perspective image data may be stored in various manners, and may also be described by the association relationship field.
(200) Similarly, the storage manner may also be indicated by a mode number. For example, in the storage manner shown in
(201) Those skilled in the art may understand that storage manners of the pixel data and the depth data of the multiple synchronized two-dimensional images may be various, and expression manners of the association relationship field may also be various. The association relationship field may be indicated by the above mode number or may directly indicate the content. The device that performs data processing may determine the association relationship between the pixel data and the depth data of the two-dimensional image according to the content of the association relationship field with reference to stored data or other priori knowledge such as the content corresponding to each mode number or the specific number of the multiple synchronized images, and the like.
(202) In implementations, generating the multi-angle free-perspective image data may further include calculating and storing parameter data of each two-dimensional image based on the multiple synchronized two-dimensional images, where the parameter data includes data of the shooting position and the shooting angle of the two-dimensional image.
(203) With reference to the shooting position and the shooting angle of each image of the multiple synchronized two-dimensional images, the device that performs data processing may determine the virtual viewpoint in the same coordinate system with reference to the user's needs, and perform the reconstruction of the image based on the multi-angle free-perspective image data, to show the user the expected viewing position and perspective.
(204) In implementations, the parameter data may further include internal parameter data. The internal parameter data includes attribute data of the image capturing device. The above data of the shooting position and shooting angle of the image may also be referred to as external parameter data. The internal parameter data and external parameter data may be referred to as attitude data. With reference to the internal parameter data and external parameter data, factors indicated by internal parameter data such as lens distortion may be taken into account during image reconstruction, and the image of the virtual viewpoint may be reconstructed more accurately.
(205) In implementations, generating the multi-angle free-perspective image data may further include generating a parameter data storage address field, where the parameter data storage address field is used to indicate the storage address of the parameter data. The device that performs data processing may obtain the parameter data from the storage address of the parameter data.
(206) In implementations, generating the multi-angle free-perspective image data may further include generating a data combination storage address field, which is used to indicate the storage address of the data combination, i.e., to indicate the storage addresses of the first field and the second field of each image of the multiple synchronized images. The device that performs data processing may obtain the pixel data and the corresponding depth data of the multiple synchronized two-dimensional images from the storage space corresponding to the storage address of the data combination. From this perspective, the data combination includes the pixel data and the depth data of the multiple synchronized two-dimensional images.
(207) Those skilled in the art may understand that the multi-angle free-perspective image data may include specific data such as the pixel data of the two-dimensional image, the corresponding depth data of the two-dimensional image, and parameter data, and the like, as well as other indicative data such as the above generated association relationship field, and parameter data storage address field, data combination storage address field, and the like. These pieces of indicative data may be stored in the data header file to instruct the device that performs data processing to obtain the data combination, the parameter data, and the like.
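The indicative data described above can be pictured as a small header structure that points a data-processing device at the parameter data and the data combination. The following Python sketch is purely hypothetical: the key names, addresses, mode number, and camera count are illustrative placeholders, not an actual on-disk format defined by the disclosure.

```python
# Hypothetical sketch of a data header file's indicative fields.
# All key names and values are illustrative assumptions.
header = {
    "association_relationship": {"mode": 1},       # mode number for the pixel/depth storage manner
    "parameter_data_storage_address": "0x0400",    # where the camera parameter data is stored
    "data_combination_storage_address": "0x1000",  # where the pixel data and depth data are stored
    "camera_count": 6,                             # number of synchronized capture devices
}

def resolve(header):
    """Return the storage addresses a data-processing device would read
    to obtain the parameter data and the data combination."""
    return (header["parameter_data_storage_address"],
            header["data_combination_storage_address"])

print(resolve(header))  # -> ('0x0400', '0x1000')
```

A device parsing such a header would first resolve the two addresses, then fetch the parameter data and the pixel/depth data combination from the corresponding storage spaces.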
(208) In implementations, the terminology explanations, implementation manners, and beneficial effects involved in respective example embodiments of generating multi-angle free-perspective data may refer to other example embodiments.
(209) Referring to
(210) In order to enable those skilled in the art to better understand and implement the example embodiments of the present disclosure, hereinafter, the image reconstruction method is further described.
(211) Referring to
(212) S3702, acquiring a multi-angle free-perspective image combination, the parameter data of the image combination, and the virtual viewpoint position information based on the user interaction, where the image combination includes multiple groups of texture images and depth maps that are synchronized at multiple angles and have corresponding relationships.
(213) In implementations, as described above, multiple cameras, video cameras, and the like may be used to capture images at multiple angles in a scenario.
(214) The image in the multi-angle free-perspective image combination may be a completely free-perspective image. In implementations, the image may have a 6 degrees of freedom (DoF) perspective. That is, the spatial position and the perspective of the viewpoint may be freely switched. As described above, the spatial position of the viewpoint may be expressed as coordinates (x, y, z), and the perspective may be expressed as three directions of rotation (θ, φ, γ). Thus, the perspective may be referred to as 6DoF.
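A 6DoF viewpoint of this kind can be represented as a simple record holding the three position coordinates and the three rotation angles. The Python sketch below is illustrative only; the class and field names are assumptions, not part of the disclosure.

```python
from dataclasses import dataclass

@dataclass
class Viewpoint6DoF:
    # spatial position of the virtual viewpoint
    x: float
    y: float
    z: float
    # rotation of the perspective about three directions (names illustrative)
    theta: float
    phi: float
    gamma: float

# A virtual viewpoint generated from user interaction might look like:
vp = Viewpoint6DoF(x=1.0, y=2.0, z=0.5, theta=0.0, phi=0.1, gamma=0.0)
```

Switching the spatial position and the perspective freely then amounts to updating the six fields of such a record.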
(215) During the image reconstruction process, the multi-angle free-perspective image combination and the parameter data of the image combination may be acquired first.
(216) In implementations, as described above, multiple groups of synchronized texture images and depth maps that have corresponding relationships in the image combination may be stitched together to form a frame of stitched image, specifically referring to the stitched image structures as shown in
(217) Referring to
(218) For the specific relationship between the multiple groups of texture images and depth maps in the image combination, reference may be made to the description of the above example embodiments, and details are not repeated herein.
(219) In implementations, the texture images in the image combination correspond to the depth maps on a one-to-one basis, where the texture image may use any type of two-dimensional image format, for example, any one of formats of BMP, PNG, JPEG, webp, and the like. The depth map may represent the distance of a respective point in the scenario with respect to the shooting device. That is, each pixel value in the depth map represents the distance between a certain point in the scenario and the shooting device.
(220) As described above, in order to save transmission bandwidth and storage resources, the image combination may be transmitted or stored in a compressed format. For the obtained two-dimensional image in the compressed format, decoding may be performed first, to obtain multiple groups of synchronized two-dimensional images in the corresponding image combination. In implementations, decompression software, decompression hardware, or a decompression apparatus combining software and hardware that can recognize the compressed format may be used.
(221) In implementations, parameter data of images in the combination may be acquired from the attribute information of the images.
(222) As described above, the parameter data may include external parameter data, and may further include internal parameter data. The external parameter data is used to describe the space coordinates, the posture, and the like of the shooting device. The internal parameter data is used to describe the attribute information of the shooting device, such as the optical center, focal length, and the like of the device. The internal parameter data may also include distortion parameter data. The distortion parameter data includes radial distortion parameter data and tangential distortion parameter data. Radial distortion occurs during the transformation from the shooting device coordinate system to the image physical coordinate system. Tangential distortion occurs during the manufacturing process of the shooting device, because the plane of the photosensitive element is not parallel to the lens. Based on the external parameter data, information such as the shooting position, the shooting angle, and the like of the image may be determined. During the image reconstruction process, with reference to the internal parameter data including the distortion parameter data, the determined spatial projection relationship may be more accurate.
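As a rough illustration of how external parameters (position and angle) and internal parameters (optical center, focal length) combine in a spatial projection, the following sketch uses a standard pinhole camera model with made-up values; the distortion coefficients are declared but, for brevity, not applied, and nothing here reflects the disclosure's actual parameter layout.

```python
import numpy as np

# Internal parameters: focal lengths and optical center (illustrative values)
fx, fy, cx, cy = 1000.0, 1000.0, 960.0, 540.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# External parameters: shooting angle (rotation) and position (translation)
R = np.eye(3)
t = np.zeros(3)

# Distortion parameter data (radial and tangential); unused in this sketch
dist = {"radial": (0.0, 0.0), "tangential": (0.0, 0.0)}

def project(point_3d):
    """Project a 3-D world point to pixel coordinates via the pinhole model."""
    cam = R @ point_3d + t          # world -> camera coordinates (external)
    uvw = K @ cam                   # camera -> image coordinates (internal)
    return uvw[:2] / uvw[2]

# A point on the optical axis projects to the principal point (960, 540).
print(project(np.array([0.0, 0.0, 2.0])))
```

Applying the radial and tangential distortion terms between the two steps would, as the text notes, make the projection relationship more accurate.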
(223) In implementations, the image combinations in the above example embodiments may be used as the data file in the example embodiments of the present disclosure. In an application scenario where the bandwidth is limited, the image combination may be split into multiple parts for multiple transmissions.
(224) In implementations, using the 6DoF expression, virtual viewpoint position information based on user interaction may be represented in the form of coordinates (x, y, z, θ, φ, γ), where the virtual viewpoint position information may be generated under one or more preset user interaction manners. For example, the user interaction manner may be coordinates input by user operations, such as manual clicks or gesture paths, or virtual positions determined by voice input, or customized virtual viewpoints provided to the user (for example, the user may enter positions or perspectives in the scenario, such as under the hoop, at the sideline, referee perspective, coach perspective, and the like). Alternatively, based on a specific object, such as a player in the game, an actor, a guest, or a host in the image, the perspective may be switched to the object's perspective after the user clicks the corresponding object. Those skilled in the art may understand that the specific user interaction manner is not limited in the example embodiments of the present disclosure, as long as the virtual viewpoint position information based on the user interaction may be obtained.
(225) S3704, selecting a corresponding group of texture images and depth maps in the image combination at the user interaction moment based on preset rules according to the virtual viewpoint position information and the parameter data of the image combination.
(226) In implementations, according to the virtual viewpoint position information and the parameter data of the image combination, the corresponding group of texture images and depth maps in the image combination at the user interaction moment that meets a preset position relationship and/or quantitative relationship with the virtual viewpoint position may be selected. For example, for a virtual viewpoint position area with a high camera density, the texture images and the corresponding depth maps shot by only the two cameras closest to the virtual viewpoint may be selected, while in a virtual viewpoint position area with a low camera density, the texture images and the corresponding depth maps shot by the three or four cameras closest to the virtual viewpoint may be selected.
(227) In an example embodiment of the present disclosure, according to the virtual viewpoint position information and the parameter data corresponding to the corresponding group of texture images and depth maps in the image combination at the user interaction moment, texture images and depth maps corresponding to 2 to N capturing devices closest to the virtual viewpoint position are selected, where N is the number of all capturing devices that capture the image combination. For example, the texture images and the depth maps corresponding to two capture devices closest to the virtual viewpoint position may be selected by default. In implementations, the user may set the number of the selected capturing devices closest to the virtual viewpoint position, and the maximum number does not exceed the number of the capturing devices corresponding to the image combination in the video frame.
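A minimal sketch of this nearest-camera selection rule, assuming the camera positions are known from the external parameter data; the function and variable names are illustrative, not from the disclosure:

```python
import numpy as np

def select_closest_groups(virtual_pos, camera_positions, count=2):
    """Return the indices of the `count` capture devices closest to the
    virtual viewpoint position; the texture-image/depth-map groups of
    those devices are then used for reconstruction."""
    d = np.linalg.norm(np.asarray(camera_positions, dtype=float)
                       - np.asarray(virtual_pos, dtype=float), axis=1)
    return np.argsort(d)[:count].tolist()

# Four cameras on a line; the virtual viewpoint sits between cameras 1 and 2.
cams = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(select_closest_groups((1.2, 0.0, 0.0), cams, count=2))  # -> [1, 2]
```

Raising `count` toward N trades more computation for more source views, matching the user-configurable upper bound described above.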
(228) With this manner, there is no special requirement on the spatial position distribution of the multiple shooting devices that capture images (for example, the arrangement may be a linear distribution, an arc array arrangement, or any irregular arrangement form). Rather, the actual distribution of the shooting devices is determined according to the obtained virtual viewpoint position information and the parameter data corresponding to the image combination. Then, the corresponding group of texture images and depth maps in the image combination at the user interaction moment may be selected using adaptive strategies. Thus, a higher degree of freedom and flexibility of selection may be provided while reducing the amount of computation and ensuring the quality of the reconstructed image. In addition, installation requirements for the shooting devices that capture images are also lowered, thereby facilitating adaptation to different site requirements and easy installation.
(229) In an example embodiment of the present disclosure, according to the virtual viewpoint position information and the parameter data of the image combination, a preset number of corresponding groups of texture images and depth maps in the image combination at the user interaction moment closest to the virtual viewpoint position are selected.
(230) Those skilled in the art may understand that, in implementations, the corresponding group of texture images and depth maps may also be selected from the image combination using other preset rules, for example, according to the processing capability of the image reconstruction device, the user's requirements for reconstruction speed, or the clarity requirements of the reconstructed image (such as Standard Definition, High Definition, or Ultra High Definition).
(231) S3706, combining and rendering the selected corresponding group of texture images and depth maps in the image combination at the user interaction moment based on the virtual viewpoint position information and parameter data corresponding to the corresponding group of texture images and depth maps in the image combination at the user interaction moment, to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction moment.
(232) In implementations, the corresponding group of texture images and depth maps in the image combination at the user interaction moment may be combined and rendered using various manners, to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction moment.
(233) In an example embodiment of the present disclosure, according to the corresponding group of depth maps in the image combination at the user interaction moment, the pixel points in the corresponding group of texture images are directly copied to the generated virtual texture images, and a reconstructed image corresponding to the virtual viewpoint position at the user interaction moment may be obtained.
(234) In another example embodiment of the present disclosure, the image is reconstructed in the following way: the selected corresponding groups of depth maps in the image combination at the user interaction moment are respectively forwardly projected to the virtual position at the user interaction moment; the selected corresponding groups of texture images in the image combination at the user interaction moment are respectively backwardly projected; and the respective generated virtual texture images are fused after the backward projection.
(235) In implementations, the above fused texture image may be output as the reconstructed image corresponding to the virtual viewpoint position at the user interaction moment.
(236) In implementations, in addition to the texture images, the reconstructed image may also include corresponding depth maps. There may be various manners to obtain the corresponding depth maps. For example, one of the depth maps obtained after post-processing may be randomly selected as the depth map of the reconstructed image. For another example, the depth map closest to the virtual viewpoint position at the user interaction moment may be selected from the depth maps obtained after the post-processing as the depth map of the reconstructed image. If more than one depth map is closest to the virtual viewpoint position, any one of them may be selected. As another example, the post-processed depth maps may be fused to obtain a reconstructed depth map.
(237) In implementations, after fusing respective virtual texture images generated after the backward projection, the inpainting may be performed on the fused texture image to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction moment.
(238) In implementations, various manners may be used to post-process the depth maps respectively after the forward projection. For example, foreground padding processing may be performed on the depth maps after the forward projection, and pixel-level filtering processing may also be performed on the depth maps after the forward projection. A certain post-processing action may be performed individually. Alternatively, multiple post-processing actions may be used simultaneously.
(239) In an example embodiment of the present disclosure, the virtual texture images generated after the backward projection are fused in the following manner: according to the virtual viewpoint position information and the parameter data corresponding to the corresponding group of texture images and depth maps in the image combination at the user interaction moment, using the global weight determined by the distance between the virtual viewpoint position and the position of the capturing device that captures the corresponding texture image in the image combination, the respective virtual texture images generated after the backward projection are fused.
(240) In an example embodiment of the present disclosure, the forward projection may be performed first, and the depth information is used to project a corresponding group of texture images in the image combination into the three-dimensional Euclidean space. That is, the depth maps of the corresponding group are respectively projected to the virtual viewpoint position at the user interaction moment according to the spatial geometric relationship, to form the depth map of the virtual viewpoint position. Next, the backward projection is performed to project the three-dimensional spatial points onto the imaging plane of the virtual camera. That is, pixel points in the texture images of the corresponding group are copied to the generated virtual texture images corresponding to the virtual viewpoint position according to the projected depth maps, to form the virtual texture images corresponding to the corresponding group. Next, the virtual texture images corresponding to the corresponding group are fused to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment. With the above method for reconstructing the image, the sampling accuracy of the reconstructed image may be improved.
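The forward and backward projection steps can be sketched for a deliberately simplified case: rectified cameras whose per-pixel disparity is purely horizontal and inversely proportional to depth. This is an illustrative toy model under that assumption, not the disclosure's actual projection formulas.

```python
import numpy as np

def forward_project_depth(depth, baseline):
    """Forward projection: warp a source depth map to the virtual viewpoint.
    Simplified model: disparity = baseline / depth (rectified cameras).
    Where several source pixels land on the same target, the closer one
    (smaller depth) wins; untouched target pixels stay 0 and mark holes."""
    h, w = depth.shape
    virt = np.full((h, w), np.inf)
    for y in range(h):
        for x in range(w):
            d = int(round(baseline / depth[y, x]))
            if 0 <= x + d < w:
                virt[y, x + d] = min(virt[y, x + d], depth[y, x])
    virt[np.isinf(virt)] = 0.0
    return virt

def backward_project_texture(texture, virt_depth, baseline):
    """Backward projection: for each non-hole virtual pixel, use the
    projected depth to locate the source pixel and copy its texture value."""
    h, w = texture.shape
    virt_tex = np.zeros_like(texture)
    for y in range(h):
        for x in range(w):
            if virt_depth[y, x] > 0:
                sx = x - int(round(baseline / virt_depth[y, x]))
                if 0 <= sx < w:
                    virt_tex[y, x] = texture[y, sx]
    return virt_tex

src_depth = np.full((1, 4), 2.0)                 # constant depth -> 1-pixel disparity
src_tex = np.array([[10.0, 20.0, 30.0, 40.0]])
vd = forward_project_depth(src_depth, baseline=2.0)
vt = backward_project_texture(src_tex, vd, baseline=2.0)
print(vt)  # column 0 remains a hole; the rest is shifted source texture
```

Repeating this for each selected group and fusing the resulting virtual texture images yields the reconstructed image, as the paragraph above describes.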
(241) Before the forward projection is performed, preprocessing may be performed first. Specifically, according to the parameter data corresponding to the corresponding group in the image combination, the depth value of forward projection and the homography matrix of the texture backward projection may be calculated first. In implementations, the Z transformation may be used to convert the depth level into the depth value.
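One common form of such a Z transformation maps the quantized depth level stored in a depth map back to a metric depth value, assuming inverse-depth (disparity-style) quantization between a near plane and a far plane. The sketch below is a plausible instance of that convention, not necessarily the exact conversion used here.

```python
def depth_level_to_value(level, z_near, z_far, levels=255):
    """Convert a quantized depth level (0..levels, e.g. from an 8-bit
    depth map) to a metric depth value, assuming 1/z is linear in the
    level: level `levels` maps to z_near, level 0 maps to z_far."""
    inv = level / levels * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return 1.0 / inv

print(depth_level_to_value(255, z_near=1.0, z_far=100.0))  # -> 1.0 (nearest)
print(depth_level_to_value(0, z_near=1.0, z_far=100.0))    # -> 100.0 (farthest)
```

Precomputing these values for all 256 levels gives a lookup table that makes the per-pixel conversion in the forward projection cheap.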
(242) In the depth map forward projection process, a projection formula may be used to map the depth map of the corresponding group to the depth map of the virtual viewpoint position, and the depth value of the corresponding position is then copied. In addition, the depth map of the corresponding group may be noisy, and sampling artifacts may be introduced during the mapping, so the generated depth map of the virtual viewpoint position may have small noise holes. To solve this problem, median filtering may be used to remove the noise.
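As an illustration of this median-filtering step, a minimal 3×3 median filter (written out by hand rather than taken from a library, so the sketch stays self-contained) removes an isolated noise hole from a projected depth map:

```python
import numpy as np

def median_filter3(depth):
    """Minimal 3x3 median filter to remove single-pixel noise holes
    left in the virtual-viewpoint depth map after forward projection.
    Border pixels are left unchanged for simplicity."""
    h, w = depth.shape
    out = depth.copy()
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = np.median(depth[y - 1:y + 2, x - 1:x + 2])
    return out

depth = np.full((5, 5), 8.0)
depth[2, 2] = 0.0                  # isolated hole introduced by the warp
filtered = median_filter3(depth)
print(filtered[2, 2])              # -> 8.0, the hole is filled
```

Because the hole is a single outlier among eight consistent neighbors, the median restores the local depth while leaving smooth regions untouched.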
(243) In implementations, other postprocessing may also be performed on the depth maps of the virtual viewpoint position obtained after the forward projection according to needs, to further improve the quality of the generated reconstructed image. In an example embodiment of the present disclosure, before the backward projection is performed, the front and back view occlusion relationship of the depth maps of the virtual viewpoint position obtained by the forward projection is processed, so that the generated depth maps may more truly reflect the positional relationship of objects in the scenario viewed at the virtual viewpoint position.
(244) For the backward projection, specifically, the positions of the corresponding group of texture images in the virtual texture images may be calculated according to the depth maps of the virtual viewpoint position obtained by the forward projection. Next, the texture values corresponding to the pixel positions are copied, where holes in the depth maps may be marked as 0 or as having no texture value in the virtual texture images. For the areas marked as holes, hole expansion may be performed to avoid synthesis artifacts.
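The hole-expansion step can be sketched as a one-pixel dilation of the hole mask, so that unreliable texture at hole borders is also treated as a hole; the 4-neighborhood choice here is an assumption made for illustration.

```python
import numpy as np

def expand_holes(hole_mask):
    """Dilate the hole mask by one pixel in the four axis directions,
    marking the border pixels around each hole as holes as well."""
    m = hole_mask.astype(bool)
    expanded = m.copy()
    expanded[1:, :] |= m[:-1, :]   # shift down
    expanded[:-1, :] |= m[1:, :]   # shift up
    expanded[:, 1:] |= m[:, :-1]   # shift right
    expanded[:, :-1] |= m[:, 1:]   # shift left
    return expanded

mask = np.zeros((5, 5), dtype=bool)
mask[2, 2] = True                  # a single-pixel hole
print(expand_holes(mask).sum())    # -> 5: the hole plus its four neighbors
```

The expanded regions are then filled by the later fusion and inpainting stages instead of carrying over suspect texture values.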
(245) Next, the generated corresponding groups of virtual texture images are fused to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment. In implementations, the fusion may also be performed in various manners. The following two example embodiments are used for illustration.
(246) In an example embodiment of the present disclosure, weighting processing is performed first, and then inpainting is performed. Specifically, the weighting processing is performed on pixels in corresponding positions in the virtual texture images corresponding to the respective corresponding groups in the image combination at the user interaction moment, to obtain the pixel values of the corresponding positions in the reconstructed image of the virtual viewpoint position at the user interaction moment. Next, for the positions where the pixel value is zero in the reconstructed image of the virtual viewpoint position at the user interaction moment, the surrounding pixels in the reconstructed image are used to perform inpainting, to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment.
(247) In another example embodiment of the present disclosure, inpainting is performed first, and then weighting processing is performed. Specifically, for the positions where the pixel value is zero in the virtual texture images corresponding to the respective corresponding groups in the image combination at the user interaction moment, the surrounding pixel values are used to perform inpainting. Next, after the inpainting, the weighting processing is performed on the pixel values in corresponding positions in the virtual texture images corresponding to the respective corresponding groups, to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment.
(248) The weighting processing in the above example embodiments may specifically use the weighted average method, or may use different weighting coefficients according to the parameter data of the camera or the positional relationship between the shooting camera and the virtual viewpoint. In an example embodiment of the present disclosure, the weighting is performed according to the reciprocal of the distance between the position of the virtual viewpoint and the positions of the respective cameras; that is, the closer a camera is to the virtual viewpoint position, the greater its weight.
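The reciprocal-of-distance weighting can be sketched as follows; this assumes the virtual viewpoint never coincides exactly with a camera position (which would require a special case to avoid division by zero), and the function name is illustrative.

```python
import numpy as np

def fuse_virtual_textures(textures, cam_positions, virtual_pos):
    """Fuse the virtual texture images of the corresponding groups using
    global weights proportional to 1 / distance(virtual viewpoint, camera):
    the closer the camera, the larger its weight."""
    d = np.linalg.norm(np.asarray(cam_positions, dtype=float)
                       - np.asarray(virtual_pos, dtype=float), axis=1)
    w = 1.0 / d
    w /= w.sum()                           # normalize weights to sum to 1
    stack = np.asarray(textures, dtype=float)
    return sum(wi * t for wi, t in zip(w, stack))

tex_a = np.full((2, 2), 100.0)             # virtual texture from camera A
tex_b = np.full((2, 2), 200.0)             # virtual texture from camera B
fused = fuse_virtual_textures([tex_a, tex_b],
                              cam_positions=[(0, 0, 0), (3, 0, 0)],
                              virtual_pos=(1, 0, 0))
print(fused[0, 0])  # camera A (distance 1) gets weight 2/3 -> 133.33...
```

With distances 1 and 2, the normalized weights are 2/3 and 1/3, so the nearer camera dominates the fused value, as the text prescribes.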
(249) In implementations, the inpainting may be performed with a preset inpainting algorithm according to needs, and details are not described herein again.
(250) Hereinabove, how to combine and render the corresponding group of texture images and depth maps in the image combination at the user interaction moment based on the virtual viewpoint position and the parameter data of the corresponding group in the image combination at the user interaction moment is illustrated with examples. Those skilled in the art may understand that, in implementations, other Depth Image Based Rendering (DIBR) algorithms may be used according to needs, and will not be described one by one.
(251) Referring to the schematic structural diagram of an image reconstruction system in the example embodiment of the present disclosure shown in
(252) The acquiring unit 3810 is adapted to acquire a multi-angle free-perspective image combination, the parameter data of the image combination, and the virtual viewpoint position information based on the user interaction, where the image combination includes multiple groups of texture images and depth maps that are synchronized at multiple angles and have corresponding relationships;
(253) The selecting unit 3812 is adapted to select a corresponding group of texture images and the depth maps in the image combination at the user interaction moment based on preset rules according to the virtual viewpoint position information and the parameter data of the image combination;
(254) The image reconstruction unit 3814 is adapted to combine and render the selected corresponding group of texture images and depth maps in the image combination at the user interaction moment based on the virtual viewpoint position information and parameter data corresponding to the corresponding group of texture images and depth maps in the image combination at the user interaction moment, to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction moment.
(255) With the above image reconstruction system 3800, the corresponding group of texture images and depth maps in the image combination at the user interaction moment is selected according to preset rules based on the virtual viewpoint position information and the acquired parameter data of the image combination. It is then only necessary to combine and render the selected corresponding group based on the virtual viewpoint position information and the parameter data corresponding to that group, to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction moment, without performing image reconstruction based on all groups of texture images and depth maps in the image combination. Thus, the amount of computation during the image reconstruction may be reduced.
(256) In implementations, the selecting unit 3812 may select the corresponding group of texture images and the depth maps in the image combination at the user interaction moment that satisfies preset positional relationships with the virtual viewpoint position according to the virtual viewpoint position information and the parameter data of the image combination, or select the corresponding group of texture images and the depth maps in the image combination at the user interaction moment that satisfies preset quantitative relationships with the virtual viewpoint position, or select the corresponding group of texture images and the depth maps in the image combination at the user interaction moment that satisfies preset positional and quantitative relationships with the virtual viewpoint position.
(257) In an example embodiment of the present disclosure, the selecting unit 3812 may select a preset number of corresponding groups of texture images and the depth maps that are closest to the virtual viewpoint position in the image combination at the user interaction moment according to the virtual viewpoint position information and the parameter data of the image combination.
(258) In implementations, referring to
(259) The forward projection subunit 3816 is adapted to project the corresponding group of depth maps respectively to the virtual viewpoint position at the user interaction moment according to the spatial geometric relationship, to form the depth map of the virtual viewpoint position;
(260) The backward projection subunit 3818 is adapted to copy from the pixel points in the corresponding group of texture images to the generated virtual texture images corresponding to the virtual viewpoint position according to the projected depth map, to form the virtual texture images corresponding to the corresponding group;
(261) The fusing subunit 3820 is adapted to fuse the virtual texture images corresponding to the corresponding group to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment.
(262) In an example embodiment of the present disclosure, the fusing subunit 3820 is adapted to perform weighting processing on pixels in corresponding positions in the virtual texture images corresponding to the corresponding groups in the image combination at the user interaction moment, to obtain the pixel values of corresponding positions in the reconstructed image of the virtual viewpoint position at the user interaction moment; and adapted to use the surrounding pixels in the reconstructed image to perform inpainting for the positions where the pixel value is zero in the reconstructed image of the virtual viewpoint position at the user interaction moment, to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment.
(263) In another example embodiment of the present disclosure, the fusing subunit 3820 is adapted to, for the positions where the pixel value is zero in the virtual texture images corresponding to the respective corresponding groups in the image combination at the user interaction moment, perform inpainting using the surrounding pixel values; and adapted to perform the weighting processing on the pixel values in corresponding positions in the virtual texture images corresponding to the respective corresponding groups after the inpainting, to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment.
(264) In another example embodiment of the present disclosure, in the image reconstruction unit 3814, the forward projection subunit 3816 is adapted to perform forward projection on the selected depth maps of the corresponding group in the image combination at the user interaction moment, respectively, to project them to the virtual position at the user interaction moment; the backward projection subunit 3818 is adapted to perform backward projection on the selected texture images of the corresponding group in the image combination at the user interaction moment, respectively; and the fusing subunit 3820 is adapted to fuse the respective virtual texture images generated after the backward projection.
(265) In implementations, the above fused texture image may be output as the reconstructed image corresponding to the virtual viewpoint position at the user interaction moment.
(266) In implementations, in addition to the texture images, the reconstructed image may also include corresponding depth maps. There may be various manners to obtain the corresponding depth maps. For example, one of the depth maps obtained after post-processing may be randomly selected as the depth map of the reconstructed image. For another example, the depth map closest to the virtual viewpoint position at the user interaction moment may be selected from the depth maps obtained after the post-processing as the depth map of the reconstructed image. If more than one depth map is closest to the virtual viewpoint position, any one of them may be selected. As another example, the post-processed depth maps may be fused to obtain a reconstructed depth map.
(267) In implementations, the image reconstruction unit 3814 may further include a post-processing subunit (not shown), which is adapted to perform post-processing on the depth maps after the forward projection, respectively. For example, the post-processing subunit may perform at least one of foreground padding processing, pixel-level filtering processing, and the like, on the depth maps after the forward projection, respectively.
(268) In implementations, the acquiring unit 3810 may include a decoding subunit (not shown), which is adapted to decode the acquired compressed multi-angle free-perspective image data, to obtain the multi-angle free-perspective image combination, and the parameter data corresponding to the image combination.
(269) Example embodiments of the present disclosure further provide an image reconstruction device that can implement the above image reconstruction methods. The image reconstruction device may include a memory and a processor. The memory stores computer instructions that may run on the processor. When the processor runs the computer instructions, the processor may execute the steps of the image reconstruction method according to any one of the above example embodiments.
(270) In implementations, the image reconstruction device may include a terminal device. After the terminal device completes the image reconstruction using the above example embodiments, the terminal device may output the display through a display interface, for the user to view. The terminal device may be a handheld terminal such as a mobile phone and the like, a tablet computer, a set-top box, and the like.
(271) In implementations, an edge node may also be used to perform the above image reconstruction. After the edge node completes the image reconstruction, the edge node may output the reconstructed image to a display device in communication therewith, for the user to view. The edge node may be a node that performs short-range communication with the display device that displays the reconstructed image and maintains a high-bandwidth, low-latency connection, such as a connection via WiFi, a 5G network, and the like. In implementations, the edge node may be any one of a base station, a router, a home gateway, and an in-vehicle device.
(272) In implementations, in a network, a specific terminal device or edge node device may be selected according to the processing capability of the terminal device and the edge node, or according to the user's selection, or according to the operator's configuration, to perform the image reconstruction process in the example embodiments of the present disclosure. Reference may be made to the specific methods described in the example embodiments of the present disclosure, and details are not described herein again.
(273) Example embodiments of the present disclosure further provide a computer-readable storage medium having computer instructions stored thereon. When the computer instructions are executed, the steps of the image reconstruction method according to any one of the above example embodiments of the present disclosure may be performed. For the image reconstruction method executed by the instructions stored on the computer-readable storage medium, reference may be made to the above example embodiments of the image reconstruction method, and details are not described herein again.
(274) The computer-readable storage medium may be various suitable media, such as an optical disc, a mechanical hard disk, and a solid-state hard disk. The computer-readable storage medium may include volatile or non-volatile, removable or non-removable media, which may achieve storage of information using any method or technology. The information may include a computer-readable instruction, a data structure, a program module, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electronically erasable programmable read-only memory (EEPROM), flash memory or other internal storage technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission media, which may be used to store information that may be accessed by a computing device. As defined herein, the computer-readable storage medium does not include transitory media, such as modulated data signals and carrier waves.
(275) Although the present disclosure has been described as above, the present disclosure is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the scope defined by the claims.
EXAMPLE CLAUSES
(276) Clause 1. An image reconstruction method, comprising: acquiring a multi-angle free-perspective image combination, parameter data of the image combination, and virtual viewpoint position information based on user interaction, wherein the image combination includes multiple groups of texture images and depth maps that are synchronized at multiple angles and have corresponding relationships; selecting a corresponding group of texture images and depth maps in the image combination at a user interaction moment based on a preset rule according to the virtual viewpoint position information and the parameter data of the image combination; and combining and rendering the selected corresponding group of texture images and depth maps in the image combination at the user interaction moment based on the virtual viewpoint position information and parameter data corresponding to the corresponding group of texture images and depth maps in the image combination at the user interaction moment, to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction moment.
(277) Clause 2. The image reconstruction method according to clause 1, wherein selecting the corresponding group of texture images and depth maps in the image combination at the user interaction moment based on the preset rule according to the virtual viewpoint position information and the parameter data of the image combination comprises: selecting the corresponding group of texture images and the depth maps in the image combination at the user interaction moment that satisfies a preset positional relationship and/or a preset quantitative relationship with the virtual viewpoint position according to the virtual viewpoint position information and the parameter data of the image combination.
(278) Clause 3. The image reconstruction method according to clause 2, wherein selecting the corresponding group of texture images and the depth maps in the image combination at the user interaction moment that satisfies the preset positional relationship and/or the preset quantitative relationship with the virtual viewpoint position according to the virtual viewpoint position information and the parameter data of the image combination comprises: selecting a preset number of corresponding groups of texture images and the depth maps that are closest to the virtual viewpoint position in the image combination at the user interaction moment according to the virtual viewpoint position information and the parameter data of the image combination.
(279) Clause 4. The image reconstruction method according to clause 3, wherein selecting the preset number of corresponding groups of texture images and the depth maps that are closest to the virtual viewpoint position in the image combination at the user interaction moment according to the virtual viewpoint position information and the parameter data of the image combination comprises: selecting texture images and depth maps corresponding to 2 to N capturing devices closest to the virtual viewpoint position according to the virtual viewpoint position information and the parameter data corresponding to the corresponding group of texture images and depth maps in the image combination at the user interaction moment, wherein N is the number of all capturing devices that capture the image combination.
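The selection rule of clauses 3 and 4 can be sketched as below. This is an illustrative simplification under stated assumptions: the function name `select_closest_groups` is hypothetical, each group of texture image and depth map is assumed to carry the position of its capturing device, and distance is taken as plain Euclidean distance.

```python
import numpy as np

def select_closest_groups(groups, camera_positions, virtual_position, k):
    """Return the k groups (texture image, depth map) whose capturing
    devices are closest to the virtual viewpoint position.
    Per clause 4, k ranges from 2 to N, where N is the total number of
    capturing devices that capture the image combination."""
    n = len(camera_positions)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of capturing devices")
    virtual_position = np.asarray(virtual_position, dtype=float)
    dists = np.linalg.norm(
        np.asarray(camera_positions, dtype=float) - virtual_position, axis=1)
    order = np.argsort(dists)[:k]          # indices of the k nearest devices
    return [groups[i] for i in order]
```

Reconstructing from only the nearest subset of groups, rather than all N, trades some fusion quality for lower rendering cost.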
(280) Clause 5. The image reconstruction method according to clause 1, wherein combining and rendering the selected corresponding group of texture images and depth maps in the image combination at the user interaction moment based on the virtual viewpoint position information and parameter data corresponding to the corresponding group of texture images and depth maps in the image combination at the user interaction moment to obtain the reconstructed image corresponding to the virtual viewpoint position at the user interaction moment comprises: performing forward projection on the selected depth maps of the corresponding group in the image combination at the user interaction moment respectively, to project to the virtual viewpoint position at the user interaction moment; performing post-processing on the depth maps after the forward projection respectively; performing backward projection on the selected texture images of the corresponding group in the image combination at the user interaction moment respectively; and fusing respective virtual texture images generated after the backward projection.
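The projection-and-fusion pipeline of clause 5 can be sketched as a toy depth-image-based rendering routine. This is an illustrative simplification, not the patent's projection model: cameras are assumed horizontally aligned so that forward projection reduces to a per-view integer column shift (a hypothetical parameter `shifts`), post-processing is omitted, and fusion is a plain average; holes are left at zero for later inpainting.

```python
import numpy as np

def reconstruct_view(textures, depths, shifts):
    """Toy DIBR sketch: forward-project each depth map to the virtual
    viewpoint (here, a column shift with a z-buffer keeping the nearest
    surface), backward-project texture along the same mapping, then fuse
    the per-view virtual textures by averaging."""
    h, w = depths[0].shape
    acc = np.zeros((h, w))
    cnt = np.zeros((h, w))
    for tex, depth, s in zip(textures, depths, shifts):
        warped = np.zeros((h, w))        # virtual texture for this view
        zbuf = np.full((h, w), np.inf)   # forward-projected depth
        for y in range(h):
            for x in range(w):
                xv = x + s               # forward projection of pixel (y, x)
                if 0 <= xv < w and depth[y, x] < zbuf[y, xv]:
                    zbuf[y, xv] = depth[y, x]
                    warped[y, xv] = tex[y, x]   # backward projection: copy texture
        hit = np.isfinite(zbuf)
        acc[hit] += warped[hit]
        cnt[hit] += 1
    out = np.zeros((h, w))
    np.divide(acc, cnt, out=out, where=cnt > 0)  # fuse; holes stay zero
    return out
```

In the claimed method, the column shift would be replaced by a full projection through the camera parameter data, and the averaging by the weighted fusion of clause 8.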
(281) Clause 6. The image reconstruction method according to clause 5, wherein after fusing the respective virtual texture images generated after the backward projection, the method further comprises: performing inpainting on the fused texture image to obtain the reconstructed image corresponding to the virtual viewpoint position at the user interaction moment.
(282) Clause 7. The image reconstruction method according to clause 5, wherein performing post-processing on the depth maps after the forward projection respectively comprises at least one of the following: performing foreground padding processing on the depth maps after the forward projection respectively; and performing pixel-level filtering processing on the depth maps after the forward projection respectively.
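The pixel-level filtering of clause 7 can be sketched with a plain median filter, a common choice for closing the one-pixel cracks that forward projection leaves in a depth map. The patent does not fix the filter type or window size; `median_filter_depth` and the 3x3 window are illustrative assumptions.

```python
import numpy as np

def median_filter_depth(depth, size=3):
    """Pixel-level filtering sketch: a size x size median filter applied to
    a forward-projected depth map, closing isolated projection cracks while
    preserving larger depth discontinuities."""
    pad = size // 2
    padded = np.pad(depth, pad, mode="edge")   # replicate borders
    h, w = depth.shape
    out = np.empty((h, w), dtype=float)
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(padded[y:y + size, x:x + size])
    return out
```

Foreground padding, the other post-processing option the clause names, would instead dilate near-depth (foreground) regions into adjacent holes.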
(283) Clause 8. The image reconstruction method according to clause 5, wherein fusing respective virtual texture images generated after the backward projection comprises: fusing the respective virtual texture images generated after the backward projection using a global weight determined by a distance between the virtual viewpoint position and a position of a capturing device that captures a corresponding texture image in the image combination, according to the virtual viewpoint position information and the parameter data corresponding to the corresponding group of texture images and depth maps in the image combination at the user interaction moment.
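Clause 8's global-weight fusion can be sketched as below. The patent specifies only that the weight is determined by the distance between the virtual viewpoint and the capturing device; the inverse-distance formula, the `eps` guard, and the name `fuse_by_distance` are illustrative assumptions.

```python
import numpy as np

def fuse_by_distance(virtual_textures, camera_positions, virtual_position):
    """Fuse backward-projected virtual textures with one global weight per
    view, here inversely proportional to the distance from the virtual
    viewpoint to the corresponding capturing device."""
    eps = 1e-6  # avoid division by zero when a device coincides with the viewpoint
    vp = np.asarray(virtual_position, dtype=float)
    weights = np.array([1.0 / (np.linalg.norm(np.asarray(p, dtype=float) - vp) + eps)
                        for p in camera_positions])
    weights /= weights.sum()               # normalize to sum to 1
    stacked = np.stack([np.asarray(t, dtype=float) for t in virtual_textures])
    return np.tensordot(weights, stacked, axes=1)
```

A single weight per view (rather than per pixel) is what makes the weight "global" in the clause's sense: nearer views dominate the whole fused image.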
(284) Clause 9. The image reconstruction method according to clause 1, wherein combining and rendering the selected corresponding group of texture images and depth maps in the image combination at the user interaction moment based on the virtual viewpoint position information and parameter data corresponding to the corresponding group of texture images and depth maps in the image combination at the user interaction moment to obtain the reconstructed image corresponding to the virtual viewpoint position at the user interaction moment comprises: respectively projecting the depth maps of the corresponding group in the image combination at the user interaction moment to the virtual viewpoint position at the user interaction moment according to a spatial geometric relationship, to form a depth map of the virtual viewpoint position; and copying from pixel points in the texture images of the corresponding group to the generated virtual texture images corresponding to the virtual viewpoint position according to the projected depth maps, to form the virtual texture images corresponding to the corresponding group in the image combination at the user interaction moment; and fusing the virtual texture images corresponding to the corresponding group in the image combination at the user interaction moment, to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment.
(285) Clause 10. The image reconstruction method according to clause 9, wherein fusing the virtual texture images corresponding to the corresponding group in the image combination at the user interaction moment to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment comprises: performing weighting processing on pixels in corresponding positions in the virtual texture images corresponding to respective corresponding groups in the image combination at the user interaction moment, to obtain pixel values of the corresponding positions in the reconstructed image of the virtual viewpoint position at the user interaction moment; and for a position where a pixel value is zero in the reconstructed image of the virtual viewpoint position at the user interaction moment, performing inpainting using pixels around the pixel in the reconstructed image, to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment.
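Clause 10's order of operations, weighting first and inpainting zero-valued pixels afterwards, can be sketched as follows. The mean-of-nonzero-8-neighbours fill is a simple stand-in for the unspecified inpainting method, and the name `fuse_then_inpaint` is hypothetical.

```python
import numpy as np

def fuse_then_inpaint(virtual_textures, weights):
    """Weight the per-view virtual textures pixel-wise, then inpaint any
    remaining zero-valued pixels of the fused image from the mean of their
    non-zero neighbours, per the order of clause 10."""
    fused = np.zeros_like(np.asarray(virtual_textures[0], dtype=float))
    for tex, w in zip(virtual_textures, weights):
        fused += w * np.asarray(tex, dtype=float)
    out = fused.copy()
    h, w_ = fused.shape
    for y in range(h):
        for x in range(w_):
            if fused[y, x] == 0:                       # hole in the fused image
                neigh = fused[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2]
                nonzero = neigh[neigh != 0]
                if nonzero.size:
                    out[y, x] = nonzero.mean()
    return out
```

Clause 11 reverses the two steps, inpainting each per-view virtual texture before the pixel-wise weighting; the same building blocks apply in the opposite order.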
(286) Clause 11. The image reconstruction method according to clause 9, wherein fusing the virtual texture images corresponding to the corresponding group in the image combination at the user interaction moment to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment comprises: for a position where a pixel value is zero in the virtual texture images corresponding to respective corresponding groups in the image combination at the user interaction moment, performing inpainting using surrounding pixel values respectively; and performing weighting processing on pixel values in corresponding positions in the virtual texture images corresponding to the respective corresponding groups after the inpainting, to obtain the reconstructed image of the virtual viewpoint position at the user interaction moment.
(287) Clause 12. The image reconstruction method according to clause 1, wherein acquiring a multi-angle free-perspective image combination and parameter data of the image combination comprises: decoding acquired compressed multi-angle free-perspective image data, to obtain the multi-angle free-perspective image combination and the parameter data corresponding to the image combination.
(288) Clause 13. An image reconstruction system, comprising: an acquiring unit, adapted to acquire a multi-angle free-perspective image combination, parameter data of the image combination, and virtual viewpoint position information based on user interaction, wherein the image combination includes multiple groups of texture images and depth maps that are synchronized at multiple angles and have corresponding relationships; a selecting unit, adapted to select a corresponding group of texture images and depth maps in the image combination at a user interaction moment based on a preset rule according to the virtual viewpoint position information and the parameter data of the image combination; and an image reconstruction unit, adapted to combine and render the selected corresponding group of texture images and depth maps in the image combination at the user interaction moment based on the virtual viewpoint position information and parameter data corresponding to the corresponding group of texture images and depth maps in the image combination at the user interaction moment, to obtain a reconstructed image corresponding to the virtual viewpoint position at the user interaction moment.
(289) Clause 14. An image reconstruction device, comprising a memory and a processor, the memory storing computer instructions capable of running on the processor, wherein the processor performs steps of the method of any one of clauses 1 to 12 when the processor runs the computer instructions.
(290) Clause 15. The image reconstruction device according to clause 14, wherein the image reconstruction device comprises at least one of a terminal device and an edge node.
(291) Clause 16. A computer-readable storage medium having computer instructions stored thereon, wherein steps of the method of any one of clauses 1 to 12 are performed when the computer instructions are executed.