METHOD FOR PERFORMING VOLUMETRIC RECONSTRUCTION

20230177771 · 2023-06-08

    Abstract

    The present disclosure relates to a method and capturing arrangement for creating a three-dimensional model of a scene. The model comprises a three-dimensional space comprising a plurality of discrete three-dimensional volume elements (V.sub.1,1, V.sub.1,2), each associated with three initial direction-independent color values and an initial opacity value. The method comprises obtaining a plurality of images of said scene and defining a minimization problem, wherein the minimization problem comprises three residuals, one for each color value. Each residual is based on the difference between (a) the color value of each image element of each image and (b) an accumulated direction-independent color value of the volume along the ray path of each image element. The method further comprises creating the three-dimensional model of the scene by solving said minimization problem.

    Claims

    1. A method for creating a three-dimensional model of a scene, the model comprising a three-dimensional space comprising a plurality of discrete three-dimensional volume elements, each three-dimensional volume element being associated with three initial direction-independent color values and an initial opacity value, said method comprising: obtaining a plurality of images of said scene, each image comprising a grid of image elements wherein each image element is associated with three color values and a ray path through said three-dimensional space; defining a minimization problem, wherein the direction-independent color values and opacity value of each volume element form parameters of the minimization problem, and wherein the minimization problem comprises three residuals, one for each color value, wherein each residual is based on the difference between: a) the color value of each image element of each image, and b) an accumulated direction-independent color value, obtained by accumulating the direction-independent color value and opacity value of each volume element along the ray path of each image element of each image (200a, 200b); and creating the three-dimensional model of the scene by solving said minimization problem with a Gauss-Newton solver iteratively adjusting said parameters based on a Jacobian of the residuals being a function of the parameters, so as to reduce each residual below a predetermined threshold.

    2. The method according to claim 1, wherein said minimization problem is a non-linear least squares problem.

    3. The method according to claim 1, wherein solving said minimization problem comprises calculating each residual in parallel and solving said minimization with an iterative parallel solver.

    4. The method according to claim 1, wherein solving said minimization problem by adjusting said parameters comprises: solving said minimization problem for a first resolution of the three-dimensional space and for a second resolution of the three-dimensional space, wherein said second resolution is higher compared to said first resolution.

    5. The method according to claim 4, wherein said minimization problem is solved at the first resolution to obtain a first resolution model of the scene prior to solving the minimization problem at the second resolution, said method further comprising: initializing each volume element of the three-dimensional space at the second resolution with three initial direction-independent color values and an initial opacity value based on the three direction-independent color values and an opacity value of each volume element of the first resolution model of the scene.

    6. The method according to claim 1, further comprising: selecting at least two images of said plurality of images, wherein each of said at least two images comprise an image element with a ray path passing through a same volume element of said created model; and determining for each color value of said same volume element in the created model a spherical function indicative of the direction-dependent color intensity of said volume element based on the corresponding color value of the image element with a ray path passing through the same volume element of said obtained model.

    7. The method according to claim 1, further comprising: defining an auxiliary element for each image element of each image, wherein said auxiliary element is associated with a weighting value and the three color values of the associated image element; wherein each residual for each color is based on the difference between: a) the color value of each image element of each image, and b) a modified accumulated direction-independent color value, wherein the modified accumulated direction-independent color value is based on a weighted sum of the accumulated direction-independent color value and the color value of the auxiliary element, wherein the weights of the weighted sum are based on the weighting value and wherein the weighting value of each auxiliary element is a parameter of the minimization problem, and wherein the minimization problem further comprises a penalty residual for each auxiliary element, the penalty residual being based on the weighting value such that auxiliary elements with weighting values that introduce more of the auxiliary element's color values are penalized more.

    8. The method according to claim 1, wherein said opacity value ranges from a first value, indicating a fully transparent volume element, to a second value, indicating a fully opaque volume element, wherein said minimization problem comprises an opacity residual for each image element based on the accumulated opacity along the ray path of each image element, and wherein the opacity residual is proportional to the smallest one of: the difference between the accumulated opacity and the first value, and the difference between the accumulated opacity and the second value.

    9. The method according to claim 1, further comprising: defining a two-dimensional environment map at least partially surrounding the three-dimensional volume elements, said environment map comprising a plurality of two-dimensional environment map elements wherein each environment map element comprises three color values and an opacity value, initializing each two-dimensional environment map element with three initial direction-independent color values and an initial opacity value; wherein the parameters of the least squares problem further comprise the direction-independent color values and opacity value of the environment map.

    10. A capture arrangement for capturing a three-dimensional model of a scene, the model comprising a three-dimensional space which comprises a plurality of discrete three-dimensional volume elements, each three-dimensional volume element being associated with three initial direction-independent color values and an initial opacity value, said capture arrangement comprising: a plurality of cameras configured to capture a plurality of images of said scene, each image comprising a grid of image elements wherein each image element is associated with three color values and a ray path through said three-dimensional space, and a processing device configured to create the three-dimensional model of the scene by solving a minimization problem with three residuals, one for each color value, using an iterative Gauss-Newton solver for adjusting a set of parameters of the minimization problem based on a Jacobian of the residuals being a function of the parameters so as to reduce each residual below a predetermined threshold, wherein the parameters of the minimization problem comprise the direction-independent color values and opacity value of each volume element and wherein each residual is based on the difference between: a) the color value of each image element of each image, and b) an accumulated direction-independent color value, obtained by accumulating the direction-independent color value and opacity value of each volume element along the ray path of each image element of each image.

    11. A capturing and rendering system comprising the capturing arrangement according to claim 10 and a rendering arrangement, the rendering arrangement comprising: a display, configured to display an image, head position determining means, configured to determine the head position of a user relative a normal of the display, and a rendering unit configured to obtain the three-dimensional model captured by the capturing arrangement, a reference direction indicating a reference view direction of the model, and the head position of the user determined by the head position determining means, wherein the rendering unit is further configured to render an image of the three-dimensional model from a target view direction and provide said image to said display, wherein the displacement of said target view direction from said reference view direction is equal to the displacement of the head position of the user relative the normal of the display.

    12. The capturing and rendering system according to claim 11, wherein said system updates said three-dimensional model and said target view direction periodically to capture and render three-dimensional video as a part of a videoconferencing system.

    13. The capturing and rendering system according to claim 11, wherein said head position determining means comprises at least one camera capturing a respective image of the head of the user.

    14. The capturing and rendering system according to claim 11, wherein said head position determining means comprises: a head position predictor, configured to predict the head position of the user at a future point in time given the head position of the user at a current and at least one previous point in time as determined by the head position determining means, wherein said target view direction is based on the predicted head position of the user relative the display.

    15. A computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method of claim 1.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0046] The present invention will be described in more detail with reference to the appended drawings, showing currently preferred embodiments of the invention.

    [0047] FIG. 1 is a block diagram of a capturing and rendering system according to embodiments of the present invention.

    [0048] FIG. 2 is a flowchart illustrating a method for creating a three-dimensional model of a scene from a plurality of images depicting the scene.

    [0049] FIG. 3a illustrates a plurality of cameras viewing a scene comprising an object.

    [0050] FIG. 3b illustrates a plurality of images according to embodiments of the invention, wherein each image comprises a plurality of image elements.

    [0051] FIG. 3c illustrates ray paths associated with image elements wherein the ray paths travel through the scene comprising the at least one object.

    [0052] FIG. 3d illustrates a three-dimensional space comprising a three-dimensional volume which in turn comprises a model of the object of the scene.

    [0053] FIG. 3e illustrates a created three-dimensional model of the scene.

    [0054] FIG. 4 is a flowchart illustrating a method for solving the minimization problem according to embodiments of the present invention.

    [0055] FIG. 5 depicts a ray path of an image element propagating through the discrete volume elements of the three-dimensional model.

    [0056] FIG. 6 depicts the three-dimensional volume together with a set of environment maps surrounding the three-dimensional volume in accordance with embodiments of the present invention.

    [0057] FIG. 7a depicts a display which displays the three-dimensional model from a target view direction according to embodiments of the present invention.

    [0058] FIG. 7b depicts a three-dimensional model with a reference rendering direction according to embodiments of the present invention.

    [0059] FIG. 7c depicts a user positioned in front of the display viewing the display from a direction substantially normal to the display.

    [0060] FIG. 7d depicts a display which displays the three-dimensional model from another target view direction according to embodiments of the present invention.

    [0061] FIG. 7e depicts a three-dimensional model with a reference rendering direction and a target rendering direction which differs from the reference rendering direction.

    [0062] FIG. 7f depicts a user positioned in front of the display viewing the display from an off-normal viewing direction.

    DETAILED DESCRIPTION OF CURRENTLY PREFERRED EMBODIMENTS

    [0063] FIG. 1 illustrates how the method of the present invention may be implemented to create a three-dimensional model 100 of a scene comprising an object(s) 10, wherein the scene is captured by images from each of a plurality of cameras 20a, 20b. For instance, the scene may comprise one or more objects 10 such as one or more persons which are stationary or moving in front of the cameras 20a, 20b. The remote processor 30 is configured to perform the method of creating a 3D model 100 of the scene which comprises a 3D model 110 of the object(s) 10 present in the scene and visible in the images captured by the cameras 20a, 20b. Accordingly, the 3D model 100 may be provided to a local processor 40 configured to render a 2D image depicting the 3D model 100 from an arbitrary point of view. The local processor 40 performs any suitable form of rendering to obtain such an image, for instance volumetric ray casting (or similar). The arbitrary point of view may be equal to or different from the point of view of the cameras 20a, 20b, providing the local processor 40 with a freedom of choice to render a wide range of 2D images depicting the 3D model 100 from different points of view. The 2D image rendered by the local processor 40 may be provided to a display 50 which displays the 2D image.

    [0064] For instance, the cameras 20a, 20b and the remote processor 30 may be provided on a remote site and used to create a 3D model 100 of the scene which is transmitted to the local processor 40 and display 50 provided in a local site. By periodically capturing new images of the scene object(s) 10 with the cameras 20a, 20b and updating the 3D model 100 accordingly a 3D video stream is obtained which e.g. may be used for teleconferencing purposes. Due to the rapid convergence of the proposed method for creating a 3D model 100 the remote processor 30 may create a new updated 3D model 100 more than 20 times per second, more than 30 times per second or more than 60 times per second allowing the 3D video stream to be as fluid as a conventional 2D video stream. The 3D video stream may be accompanied by a synchronized audio stream e.g. recorded using a microphone on the remote site and transmitted alongside the 3D video. The 3D video stream may also be a two-way stream enabling 3D teleconferencing with 3D video being simultaneously recorded and rendered on a plurality of teleconferencing clients, each client comprising a rendering and capturing arrangement such as a remote processor 30 and local processor 40.

    [0065] In some implementations, the view direction from which the local processor 40 renders the 3D model may be based on the position of a user (or a user's head) in the local site relative the display 50. For instance, a convincing window effect may be established for the user viewing the display 50, wherein the display mimics a window to the 3D scene 100 rather than a mere 2D-projection of the scene. This effect is described in more detail in relation to FIGS. 7a and 7d below and enables e.g. more intuitive interaction between participants of a 3D-video teleconferencing session.

    [0066] With reference to FIG. 2 together with FIGS. 3a to 3e, the method of creating a 3D model 100 from a plurality of images 200a, 200b will now be described in more detail.

    [0067] At step S21 a plurality of images 200a, 200b are obtained. The images may be captured by a respective one of a plurality of cameras 20a, 20b as illustrated in FIG. 3a. Alternatively, the images 200a, 200b may be captured using a same camera moved between different positions within the scene, wherein the object(s) 10 of the scene remain essentially stationary between captures. Accordingly, each image 200a, 200b depicts any object(s) 10 in the scene from a different point of view.

    [0068] As illustrated in FIG. 3b each image 200a, 200b comprises a plurality of 2D image elements 201a, 202a, 203a, 201b, 202b, 203b arranged in a grid. For instance, the 2D image elements 201a, 202a, 203a, 201b, 202b, 203b may be respective pixels of the images 200a, 200b. Alternatively, each 2D image element 201a, 202a, 203a, 201b, 202b, 203b may represent a collection of pixels such that each 2D element 201a, 202a, 203a, 201b, 202b, 203b e.g. is an aggregate of two or more pixels, such as four pixels or nine pixels. That is, the 2D image elements 201a, 202a, 203a, 201b, 202b, 203b may form a downsampled representation of each image 200a, 200b.
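    The aggregation of pixels into image elements described above can be sketched as a simple block average. The following is a minimal illustration; the function name and the choice of a 2×2 block are assumptions, not taken from the disclosure:

```python
import numpy as np

def downsample_image(image, block=2):
    """Aggregate blocks of pixels into single image elements by averaging.

    `image` is an (H, W, 3) array of color values; H and W are assumed
    to be divisible by `block`. Each resulting 2D image element carries
    the mean color of its block, forming a downsampled representation.
    """
    h, w, c = image.shape
    return image.reshape(h // block, block, w // block, block, c).mean(axis=(1, 3))

# A 4x4 RGB image becomes a 2x2 grid of image elements.
img = np.arange(4 * 4 * 3, dtype=np.float64).reshape(4, 4, 3)
elements = downsample_image(img, block=2)
```

    Each aggregated element here averages four pixels, corresponding to the "aggregate of two or more pixels, such as four pixels" mentioned above.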

    [0069] Optionally, the method may proceed to a step comprising determining the relative capture positions of the images 200a, 200b. The person skilled in the art will recognize many alternative methods for determining the relative capture position of a plurality of images depicting a same scene from different points of view. This may involve first determining the so-called intrinsic camera parameters, which correspond to one or more of the focal length, optical center, and radial distortion coefficients of the lens, and then the extrinsic parameters. The relative camera positions and orientations may then be found using, for instance, Structure from Motion (SfM), a known calibration pattern such as a checkerboard, or AI-based solutions. Alternatively, the relative positioning of the cameras 20a, 20b may be known, whereby the relative capture position of the images 200a, 200b is also known. For instance, the cameras 20a, 20b may be fixed in a housing or onto a scaffolding forming a camera system with fixed relative positions.

    [0070] Furthermore, each image element 201a, 202a, 203a, 201b, 202b, 203b is associated with at least one color value. For instance, each image element 201a, 202a, 203a, 201b, 202b, 203b may be associated with three color values forming a color value vector C, wherein the three color values may be RGB color values indicating the relative color intensity of the red, green and blue color of the image element. The color vector C of each 2D image element 201a, 202a, 203a, 201b, 202b, 203b may represent other color models than the RGB color model, for instance an arbitrary color model with three or more color values or a CMY color model.

    [0071] As seen in FIG. 3c each two-dimensional image element 202b, 203a, 203b is further associated with a respective ray path 212b, 213a, 213b through the scene containing an object(s) 10. The ray paths 212b, 213a, 213b may be obtained or determined for each image element. For instance, the images 200a, 200b may be captured using light field cameras (plenoptic cameras) which in addition to capturing color and/or light intensity capture the direction from which the light is incident on the camera sensor, wherein the ray paths 212b, 213a, 213b may be parallel with the incident light direction of each image element. Alternatively, the ray paths 212b, 213a, 213b are determined or approximated for the 2D images. In a simple case, the ray paths 212b, 213a, 213b are normal to the image plane, parallel with each other, and aligned with the viewing direction of the camera used to capture the image. By considering or assuming e.g. a focal length used to capture the image it is possible to instead determine ray paths 212b, 213a, 213b that diverge from or converge toward each other.
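    For the diverging case mentioned above, per-element ray directions can be derived from an assumed focal length and optical center under a simple pinhole model. This is a sketch only; the function and parameter names are illustrative and not taken from the disclosure:

```python
import numpy as np

def pixel_ray_directions(width, height, focal_length, cx=None, cy=None):
    """Return unit ray directions for each image element of a pinhole camera.

    Rays originate at the camera center and diverge through each image
    element; the camera looks along +z. The focal length and optical
    center (cx, cy) are in pixel units; by default the center of the
    image element grid is used.
    """
    cx = (width - 1) / 2.0 if cx is None else cx
    cy = (height - 1) / 2.0 if cy is None else cy
    xs, ys = np.meshgrid(np.arange(width), np.arange(height))
    dirs = np.stack([(xs - cx) / focal_length,
                     (ys - cy) / focal_length,
                     np.ones_like(xs, dtype=np.float64)], axis=-1)
    # Normalize so each ray direction is a unit vector.
    return dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)

rays = pixel_ray_directions(width=4, height=3, focal_length=2.0)
```

    A shorter focal length spreads the rays more widely, i.e. yields stronger divergence between neighboring image elements.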

    [0072] The ray paths 212b, 213a, 213b in FIG. 3c are depicted as being approximately parallel although this is merely an example and the ray paths 212b, 213a, 213b of each image may be diverging or converging. Moreover, the direction of one ray path 212b may be independent of the direction of another ray path 213b of the same image 200b.

    [0073] Moreover, the point in each image element from which each ray path 212b, 213a, 213b originates may be chosen stochastically each time a ray path 212b, 213a, 213b is referenced to mitigate constant sampling artifacts in the created model. For instance, the originating point of each ray path 212b, 213a, 213b may be determined using two random variables indicating the originating point along two respective vectors that span the image element. Additionally, as creating random variables may be computationally expensive to perform each time the ray path 212b, 213a, 213b is referenced, the two random variables may have been created and stored prior to solving the minimization problem, wherein the two random variables may simply be obtained/accessed rather than generated each time the ray path 212b, 213a, 213b is referenced.
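    The precomputation of the two random variables per image element described above can be sketched as follows; the function names and the use of a seeded generator are illustrative assumptions:

```python
import numpy as np

def precompute_jitter(num_elements, seed=0):
    """Draw, once and up front, two uniform random variables per image element.

    Each pair (u, v) in [0, 1) places the ray origin along the two
    vectors that span the image element, so the stored values can simply
    be accessed later instead of generating fresh random numbers every
    time a ray path is referenced.
    """
    rng = np.random.default_rng(seed)
    return rng.random((num_elements, 2))

def ray_origin(element_corner, span_u, span_v, jitter):
    """Offset the element's corner by the precomputed jitter pair."""
    u, v = jitter
    return element_corner + u * span_u + v * span_v

jitter = precompute_jitter(num_elements=8)
origin = ray_origin(np.zeros(3), np.array([1.0, 0, 0]), np.array([0, 1.0, 0]), jitter[0])
```

    Because the offsets are stored, repeated references to the same ray path reuse the same origin at no extra cost.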

    [0074] Turning to FIG. 3d it is illustrated how a three-dimensional volume 100 is defined at step S12. When the minimization problem is solved the three-dimensional volume 100 depicting the scene of the images 200a, 200b will become the three-dimensional model 100. Thus, the three-dimensional volume and the three-dimensional model may represent the same volume elements with different opacity and color values. The three-dimensional volume 100 comprises a plurality of three-dimensional volume elements such as voxels. Although depicted as a cube the three-dimensional volume 100 may be delimited with any shape. The shape may be a predetermined shape or the shape may be determined using the captured images 200a, 200b. For instance, the shape of the three-dimensional volume 100 may be the volume which is delimited by the outermost ray paths of each image 200a, 200b. Similarly, the three-dimensional volume elements which form the three-dimensional volume 100 may be delimited with any suitable shape which describes a finite volume. For instance, the three-dimensional volume elements may be cubes, cuboids, parallelepipeds, prisms or hexagonal prisms.

    [0075] At step S13 the minimization problem is initialized which may comprise initializing each volume element of the three-dimensional volume 100 with an opacity value and three color values. The initial values of each volume element may be any value, e.g. a same or random set of color values is assigned to each volume element and a same or random opacity value is assigned to each volume element.

    [0076] It is understood that the three-dimensional space may comprise the three-dimensional volume and optionally one or more environment maps. When the minimization problem is solved the three-dimensional volume is the three-dimensional model of the scene and the three-dimensional model may optionally further comprise at least one environment map obtained by solving the minimization problem. Accordingly, a model three-dimensional space may be the three-dimensional space but with opacity and color values obtained by solving the minimization problem.

    [0077] The method then goes to step S14 involving solving the minimization problem so as to create a 3D model of the scene wherein the 3D model of the scene comprises a 3D model 110 of an object(s) present in the scene. The minimization problem comprises a plurality of residuals, one for each image element of each image and each of the three color values. For instance, for an image with d×d image elements a total of 3d² residuals may be defined. For each of the three color values the residual of one image element is defined as the difference between the color value of the image element and the accumulated direction-independent color value, obtained by accumulating the direction-independent color value and opacity value of each volume element in the three-dimensional volume 100 along the ray path of the image element. The accumulated color value for each image element of each image 200a, 200b may be obtained using the classical rendering equation


    H(r) = ∫_{t_n}^{t_f} T(t) σ(r(t)) c(r(t)) dt  Eq. 1


    where


    T(t) = exp(−∫_{t_n}^{t} σ(r(s)) ds).  Eq. 2

    [0078] In equations 1 and 2, r is a vector representing each ray path and t indicates the position along the ray, which ranges from a near position t_n to a far position t_f indicating the point along the ray path closest to the image element and the point furthest from the image element respectively. Further, σ indicates the opacity value of a volume element and depends on the ray path and the position t along the ray path. Accordingly, σ(r(t)) is a matrix representing the opacity value of each ray at position t along the rays, wherein σ for each volume element may be represented with a scalar value indicating the degree of opacity. For instance, σ may be a value between 0 and 1 wherein 0 indicates a fully transparent volume element and 1 indicates a fully opaque volume element.

    [0079] The vector c indicates the three color values of each volume element (in contrast to C, which indicates the color vector of each image element) and varies for each ray path and position t along each ray path, making c a function of r(t).

    [0080] Equations 1 and 2 may be discretized using the quadrature rule to form discretized versions of H(r) and T(t), wherein the discretized versions are labelled Ĥ(r) and T̂(t). The discretized versions can be expressed as:


    Ĥ = Σ_{t_n}^{t_f} T̂(t) (1 − e^{−σ(t)δ(t)}) c(t)  Eq. 3


    where


    T̂(t) = exp(−Σ_{t_n}^{t} σ(r(t)) δ(t)).  Eq. 4

    In equations 3 and 4, δ(t) represents a function for obtaining a discretized portion around each position t along each ray path (such as a delta function).

    [0081] Accordingly, by using an iterative solver to change the opacity value σ and each of the three color values represented with the color vector c of the three-dimensional volume elements until the residuals are decreased below a predetermined threshold value, a three-dimensional model of the scene may be created in the form of a three-dimensional volume 100 representing the scene.
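    The discretized accumulation of equations 3 and 4 along a single ray path might be sketched as follows; the array shapes and function name are illustrative assumptions:

```python
import numpy as np

def accumulate_ray(sigma, delta, color):
    """Accumulated direction-independent color along one ray (Eqs. 3 and 4).

    sigma: (N,) opacity values of the volume elements along the ray path.
    delta: (N,) discretized step lengths around each sample position t.
    color: (N, 3) direction-independent color vectors c of those elements.
    Returns the accumulated color H_hat seen by the image element.
    """
    tau = sigma * delta
    # T_hat(t): transmittance accumulated before each sample (Eq. 4).
    transmittance = np.exp(-np.concatenate(([0.0], np.cumsum(tau)[:-1])))
    weights = transmittance * (1.0 - np.exp(-tau))  # per-sample weight (Eq. 3)
    return (weights[:, None] * color).sum(axis=0)

# A fully transparent ray accumulates no color.
H_empty = accumulate_ray(np.zeros(3), np.ones(3), np.ones((3, 3)))
```

    A practically opaque first volume element dominates the sum, since the transmittance of every element behind it is driven toward zero.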

    [0082] Thus, a three-dimensional model 100 comprising a three-dimensional representation 110 of an object in the scene may be obtained at step S15. The obtained three-dimensional model may e.g. be stored or transmitted to a second device, such as a local processor configured to render the scene from a target view direction using the three-dimensional model 100.

    [0083] With reference to FIG. 4 the step of solving the minimization problem will now be described in more detail. The inventors have realized that the rendering equations in equations 1 and 2, or the discretized counterparts in equations 3 and 4, may be used to formulate a minimization problem which converges rapidly to a three-dimensional model representing the scene, due at least in part to a utilization of the color information from each image element of each image.

    [0084] In some implementations, the minimization problem is formulated as a non-linear least squares problem which in general involves minimization of the entity S(x) by modification of the parameters x wherein


    S(x) = α Σ_{i=1}^{m} r_i(x)^2  Eq. 5

    for residuals r_i where i = 1, 2, . . . , m and wherein α is a constant (e.g. α = 0.5). By applying the general non-linear least squares problem formulation from equation 5 to the discretized rendering equations from equations 3 and 4, a least squares problem for the three-dimensional space is acquired as


    F(c(t), σ(t)) = α Σ_{t_n}^{t_f} Σ_{i=1}^{m} (Ĥ_i(c_i(t), σ(t)) − C_i)^2  Eq. 6

    where the residuals are given by the difference between the accumulated color value Ĥ.sub.i and the color value C.sub.i of each image element of each image and the parameters are the color values of each volume element and the opacity value of each volume element. For instance, three color values are used to represent each volume element and image element meaning that m=3 and that the minimization problem comprises three residual types, one for each color value. It is further noted that the three color residuals are added independently to the sum F (c(t), σ(t)) which enables efficient parallelization by enabling optimization of each residual type in parallel.
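    A minimal sketch of the residual assembly of equations 5 and 6, assuming the accumulated colors Ĥ for all ray paths have already been computed (the names and array layout are illustrative assumptions):

```python
import numpy as np

def color_residuals(H_hat, C):
    """Per-channel residuals of the least squares problem (Eq. 6).

    H_hat: (R, 3) accumulated direction-independent colors, one row per
           ray path (i.e. per image element of every image).
    C:     (R, 3) observed color vectors of the same image elements.
    Returns an (R, 3) array; the three color channels are independent
    of each other, which is what permits evaluating them in parallel.
    """
    return H_hat - C

def objective(H_hat, C, alpha=0.5):
    """Scalar objective F = alpha * sum of squared residuals (Eqs. 5/6)."""
    r = color_residuals(H_hat, C)
    return alpha * np.sum(r ** 2)
```

    Since each channel contributes its own independent squared term to the sum, the three residual types can be computed concurrently, as noted above.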

    [0085] At step S34 of FIG. 4 equation 6 in the above is evaluated to determine whether the present set of parameters (σ and c) for the three-dimensional volume (and optionally environment map) results in a sum of residuals below a predetermined (and potentially resolution specific) convergence threshold. In general, the initial parameters x_0, i.e. the initial set of σ and c, will not specify a three-dimensional volume associated with a sum of residuals which is below the predetermined convergence threshold and the method will then go to step S31 comprising calculating a modification (step) to be applied to the parameters x to form a new set of parameters associated with a smaller sum of residuals.

    [0086] The minimization problem may be solved with any suitable iterative solver, for instance the minimization problem may be solved iteratively using the Gauss-Newton algorithm. For some starting value x.sub.0 of the parameters (i.e. a set of σ and c for each volume element) a next set of the parameters may be determined iteratively as


    x_{i+1} = x_i − γ (J_r^T J_r)^{−1} J_r^T r(x_i)  Eq. 7

    wherein J_r is the Jacobian of the residuals, matrix transposition is indicated by T, r(x_i) are the residuals associated with the parameters x_i and γ indicates a step length. The step length may be equal to one but may be adjusted (as will be described in the below) to be smaller than one (in general γ is between 0 and 1). Step S31 relates to calculating a next set of parameters by calculating the modification step with which the current parameters x_i are to be modified. With the Gauss-Newton algorithm the modification step is (J_r^T J_r)^{−1} J_r^T r(x_i) multiplied with γ, which is to be subtracted from x_i to form the next set of parameters x_{i+1}. The Jacobian J_r has the structure

    J_r = [ ∂r_1/∂x_1 … ∂r_1/∂x_n ; ⋮ ; ∂r_m/∂x_1 … ∂r_m/∂x_n ]  Eq. 8

    which means that each row of J_r corresponds to the gradient ∇r_i of the residual r_i, that is

    J_r = [ ∇r_1^T ; … ; ∇r_m^T ].  Eq. 9

    [0087] Accordingly, for each ray path that is defined by each image element of each image the partial derivatives of each residual r.sub.i should be computed with respect to each parameter of each volume element. That is, for a discretized three-dimensional space equation 3 may be applied for a single ray path to obtain

    Ĥ = e^0 (1 − e^{−σ_1δ_1}) c_1 + e^{−σ_1δ_1} (1 − e^{−σ_2δ_2}) c_2 + e^{−(σ_1δ_1 + σ_2δ_2)} (1 − e^{−σ_3δ_3}) c_3 + … + e^{−Σ_{i=1}^{N−1} σ_iδ_i} (1 − e^{−σ_Nδ_N}) c_N  Eq. 10

    where c_1, c_2, c_3, . . . , c_N indicate the color values of each of the N volume elements through which the ray path propagates. The partial derivatives may be obtained efficiently using only two auxiliary variables V and G if equation 10 is traversed backwards. For example, V and G are initialized as V = Σ_{i=1}^{N} σ_iδ_i and G = 0 and the following algorithm (Alg. 1) is performed:

    V = Σ_{i=1}^{N} σ_iδ_i
    G = 0
    for i = N to i = 1 do
        V = V − σ_iδ_i
        ∂Ĥ/∂c_i = e^{−V} (1 − e^{−σ_iδ_i})
        ∂Ĥ/∂σ_i = e^{−V} δ_i e^{−σ_iδ_i} c_i − δ_i G
        G = G + e^{−V} (1 − e^{−σ_iδ_i}) c_i
    end for

    to obtain the partial derivatives

    [00006] ∂Ĥ/∂c_i and ∂Ĥ/∂σ_i

    for each residual with respect to each parameter c.sub.i and σ.sub.i of each volume element encountered along each ray path. The auxiliary variables V and G may be stored using only 16 bytes, e.g. with 4 bytes allocated for V and 4 bytes for each color value of G.
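As an illustrative sketch of equation 10 and Alg. 1 (assuming numpy; the function and variable names are hypothetical, not part of the disclosure), the color along one ray path may be accumulated in a forward pass and the derivatives recovered in a backward pass using only the two auxiliary variables V and G:

```python
import numpy as np

def accumulate_and_gradients(sigma, delta, color):
    """Eq. 10 forward pass and Alg. 1 backward pass (illustrative sketch).

    sigma: (N,) opacity values, delta: (N,) raymarch step lengths,
    color: (N, 3) direction-independent color values along one ray path.
    Returns the accumulated color H (3,), the per-element color weights
    dH/dc_i (N,) and the opacity derivatives dH/dsigma_i (N, 3).
    """
    N = len(sigma)
    tau = sigma * delta                              # per-element optical depth
    alpha = 1.0 - np.exp(-tau)                       # per-element opacity
    T = np.exp(-np.concatenate([[0.0], np.cumsum(tau)]))[:N]  # transmittance before element i
    H = ((T * alpha)[:, None] * color).sum(axis=0)   # Eq. 10

    V = float(tau.sum())                             # auxiliary variable V
    G = np.zeros(3)                                  # auxiliary variable G
    dH_dc = np.zeros(N)
    dH_dsigma = np.zeros((N, 3))
    for i in range(N - 1, -1, -1):                   # traverse Eq. 10 backwards
        V -= tau[i]                                  # V = sum of tau_j for j < i
        dH_dc[i] = np.exp(-V) * (1.0 - np.exp(-tau[i]))
        dH_dsigma[i] = np.exp(-V) * delta[i] * np.exp(-tau[i]) * color[i] - delta[i] * G
        G += np.exp(-V) * (1.0 - np.exp(-tau[i])) * color[i]
    return H, dH_dc, dH_dsigma
```

The returned derivatives can be verified against finite differences of the forward pass, which also illustrates the role of G as the accumulated contribution of the volume elements behind sample i.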

    [0088] To perform an iterative step with the Gauss-Newton solver according to equation 7 a computationally expensive inversion of the square matrix J.sub.r.sup.TJ.sub.r is required. To avoid performing this computationally expensive matrix inversion explicitly, the modification step may be denoted Δ=−(J.sub.r.sup.TJ.sub.r).sup.−1J.sub.r.sup.Tr(x.sub.i), which in turn may be rewritten as


    J.sub.r.sup.TJ.sub.rΔ=−J.sub.r.sup.Tr(x.sub.i)  Eq. 11

    which specifies a linear system of equations of the form


    Ax=b  Eq. 12

    wherein A=J.sub.r.sup.TJ.sub.r, x=Δ and b=−J.sub.r.sup.Tr(x.sub.i). In general, A is a large and sparse matrix of size n×n where n is the total number of parameters of the volume elements in the three-dimensional volume (and optionally environment map). Solving equation 12 for x may be performed using a number of solution techniques. One way of efficiently solving equation 12 in parallel is to perform an iterative preconditioned conjugate gradient algorithm. For instance, the iterative preconditioned conjugate gradient algorithm (Alg. 2) comprises the following steps:

    TABLE-US-00002
      r_0 = b − Ax_0
      z_0 = M^−1r_0
      p_0 = z_0
      k = 0
      while true do
        [00007] α_k = (r_k^T z_k)/(p_k^T A p_k)
        x_k+1 = x_k + α_k p_k
        r_k+1 = r_k − α_k A p_k
        if r_k+1^T r_k+1 < EPS then
          return x_k+1
        end if
        z_k+1 = M^−1r_k+1
        [00008] β_k = (r_k+1^T z_k+1)/(r_k^T z_k)
        p_k+1 = z_k+1 + β_k p_k
        k = k + 1
      end while

    [0089] wherein EPS denotes a predetermined threshold denoting the upper limit for the quantity r.sub.k+1.sup.Tr.sub.k+1 when the algorithm has converged. That is, when r.sub.k+1.sup.Tr.sub.k+1 is lower than the EPS threshold the set of linear equations from equation 12 is solved by x.sub.k+1 with sufficient accuracy. The matrix M is a preconditioner matrix, such as a diagonal Jacobi preconditioner which can be expressed as

    [00009] M = diag(J_r^T J_r) = diag( Σ_i(∂r_i/∂x_1)^2, Σ_i(∂r_i/∂x_2)^2, . . . , Σ_i(∂r_i/∂x_n)^2 )  Eq. 13

    wherein each sum runs over i from 1 to m, wherein m is the number of residuals. The preconditioner matrix M may preferably be extracted in connection with the Jacobian J.sub.r, as the diagonal elements of diag(J.sub.r.sup.TJ.sub.r) are recognized as the squared sums of the columns of the Jacobian J.sub.r from equation 8. Additionally, the computation of Ap in algorithm 2 is preferably not performed explicitly as the matrix A is generally a large matrix. To this end the identity


    Ap=J.sub.r.sup.TJ.sub.rp=Σ.sub.i=1.sup.m∇r.sub.i(∇r.sub.i.sup.Tp)  Eq. 14

    may be used, which enables each residual r.sub.i to contribute independently to Ap and thereby enables efficient parallelization.
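A matrix-free sketch of Alg. 2 (assuming numpy; the names pcg, apply_A and M_diag are hypothetical) accesses A only through a product callback, which in practice may be realized via equation 14, and applies the Jacobi preconditioner of equation 13 as a diagonal:

```python
import numpy as np

def pcg(apply_A, b, M_diag, eps=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient (sketch of Alg. 2).

    apply_A computes the product A @ p without forming A explicitly,
    M_diag holds diag(J_r^T J_r) and eps is the EPS threshold on r^T r.
    """
    x = np.zeros_like(b)
    r = b - apply_A(x)
    z = r / M_diag                       # apply M^-1 (diagonal preconditioner)
    p = z.copy()
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = (r @ z) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap           # updated residual
        if r_new @ r_new < eps:          # converged with sufficient accuracy
            break
        z_new = r_new / M_diag
        beta = (r_new @ z_new) / (r @ z)
        p = z_new + beta * p
        r, z = r_new, z_new
    return x
```

For a small dense system the callback can simply be a matrix product; in the hierarchical solver the callback would instead sum the rank-one per-residual contributions of equation 14 in parallel.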

    [0090] After step S31 has determined x=Δ=−(J.sub.r.sup.TJ.sub.r).sup.−1J.sub.r.sup.Tr(x.sub.i) with sufficient accuracy (e.g. utilizing Alg. 2), the method may go to step S33. Step S33 involves modifying the current parameters x.sub.i to obtain a next set x.sub.i+1 in accordance with equation 7 with e.g. γ=1. Following step S33 the sum of the residuals may again be evaluated at step S34, but this time for the new parameters x.sub.i+1, whereby the iterative process involving step S31, optionally S32, S33 and S34 is repeated until the sum of the residuals is below the predetermined convergence threshold.

    [0091] Optionally, when the computation of the Gauss-Newton step Δ=−(J.sub.r.sup.TJ.sub.r).sup.−1J.sub.r.sup.Tr(x.sub.i) is complete in step S31, the method may go to step S32, which involves running a backtracking algorithm to find a suitable step length γ used to scale the Gauss-Newton step. For instance, a suitable step length γ may be identified by starting with γ=1 and repeatedly reducing this value by a factor α<1, such as α=0.9, while evaluating the sum of the residuals using equation 5, until a higher value of the sum S is acquired. For instance, the step length γ may be determined using an algorithm (Alg. 3) comprising the steps of:

    TABLE-US-00003
      γ = 1
      α = 0.9
      μ_prev = FLT_MAX
      while true do
        μ = Error(γ)
        if μ > μ_prev then
          return γ/α
        end if
        γ = αγ
        μ_prev = μ
      end while

    [0092] wherein FLT_MAX indicates a large value, e.g. the largest finite floating point value which can be represented in the computer environment implementing the algorithm. Moreover, μ=Error(γ) indicates the residual error obtained by evaluating equation 5 or 6 with the parameters x.sub.i modified by −(J.sub.r.sup.TJ.sub.r).sup.−1J.sub.r.sup.Tr(x.sub.i) weighted with γ. For as long as μ decreases the step length is reduced further, until a higher value of μ is acquired, whereby the previous value of γ is returned as the optimal step length, that is, the step length associated with the smallest error.
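The backtracking search of Alg. 3 can be sketched as follows (hypothetical names; the error callable stands in for the evaluation of equation 5 or 6 at the γ-scaled step):

```python
def backtracking_step_length(error, alpha=0.9):
    """Return the step length gamma just before the residual error increases.

    error(gamma) evaluates the sum of the residuals with the parameters
    modified by gamma times the Gauss-Newton step (sketch of Alg. 3).
    """
    gamma = 1.0
    mu_prev = float("inf")        # plays the role of FLT_MAX
    while True:
        mu = error(gamma)
        if mu > mu_prev:          # error started increasing again
            return gamma / alpha  # previous, smaller-error step length
        gamma *= alpha
        mu_prev = mu
```

With a convex error such as error(γ) = (γ − 0.5)², the search walks γ down geometrically and returns the last value before the error turned upward.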

    [0093] When the optimal step length γ is determined the method scales the step calculated at S31 with the step length calculated at S32 to obtain a next modified set of parameters x.sub.i+1 which is passed on to step S34.

    [0094] As mentioned in the above S31, optionally S32, S33 and S34 are repeated until the sum of the residuals is below a predetermined convergence threshold. When a set of parameters x.sub.i+1 associated with residuals below the predetermined convergence threshold is obtained the method may go to step S35 wherein it is determined if the three-dimensional space (i.e. the three-dimensional volume and optionally environment map) is represented with the final resolution. To facilitate faster convergence, the three-dimensional space (volume) is first initialized with a first resolution whereby the minimization problem is solved at the first resolution by finding a set of parameters associated with residuals below the predetermined first resolution convergence threshold. The first resolution solution is used to initialize a three-dimensional space (volume) of a second resolution, which is a higher resolution than the first resolution, whereby the minimization problem is solved at the second resolution by finding a set of parameters associated with residuals below the predetermined second resolution convergence threshold. In this manner, the minimization problem may be solved hierarchically at increasing resolutions. For instance, the minimization problem may utilize two, three, four, five or more resolutions and experiments have shown that using at least two resolutions helps to speed up convergence and reduce the risk of the solver converging to a local minimum as opposed to the global minimum.

    [0095] Accordingly, if it is determined at S35 that the parameters associated with residuals below the predetermined convergence threshold are associated with a three-dimensional space of the final (highest) resolution the method goes to step S36 in which the final complete three-dimensional model is represented with the parameters of the three-dimensional space. If it is determined at S35 that the resolution was not the final resolution the method goes to step S37 comprising initializing a three-dimensional space with the next (higher) resolution.

    [0096] The initial parameters of the next resolution three-dimensional space are based on the parameters of the previous resolution. For instance, the next resolution parameters may be obtained by interpolating the previous resolution parameters. The next resolution three-dimensional space is then used at step S31 to calculate a modification step, whereby iteratively repeating steps S31, optionally S32, S33 and S34 leads to convergence at the next resolution. For instance, a first resolution for the three-dimensional volume of the three-dimensional space may be 64.sup.3, whereby the second resolution is 128.sup.3, the third is 256.sup.3 and the fourth is 512.sup.3.
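The initialization of the next resolution from the previous solution can be sketched as follows (assuming numpy and a hypothetical dense parameter grid; nearest-neighbour duplication is used for brevity, while trilinear interpolation is a natural alternative):

```python
import numpy as np

def upsample_parameters(grid):
    """Double the resolution of a voxel parameter grid (sketch).

    grid: (R, R, R, C) array where C holds e.g. three color values and
    one opacity value per volume element. Returns a (2R, 2R, 2R, C) grid
    in which each coarse voxel initializes its eight children.
    """
    return grid.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2)
```

Applying this three times takes a 64³ solution through 128³ and 256³ to a 512³ initialization, matching the example resolutions above.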

    [0097] In some implementations, an opacity residual L is added for each ray path of each image element of each image, in addition to the three color residuals, to mitigate problems with smoke-like artifacts introduced by solving the minimization problem for scenes with view dependent objects. The purpose of the opacity residual L is to penalize (i.e. provide a high associated residual value for) ray paths which have an accumulated opacity value that deviates from either a value indicating a fully transparent ray path or a value indicating a fully opaque ray path. For instance, if the opacity value of each volume element is a value between 0 and 1, wherein 0 indicates a fully transparent volume element and 1 indicates a fully opaque volume element, the following simple definition of L may be used:


    L=λ(−4(T̂−0.5).sup.2+1)  Eq. 15

    [0098] wherein λ is a factor determining to which extent ray paths should be penalized for an accumulated opacity value deviating from either fully opaque or fully transparent, wherein λ preferably is between 0.05 and 0.15, and most preferably around 0.1. Higher values of λ may result in unwanted artifacts of too opaque or too transparent surfaces. However, upon convergence of the minimization problem with the opacity residual and λ below 0.15, or without the L residual entirely (e.g. with λ=0), λ may be raised to 0.5 or above, preferably to 0.8 or above, or most preferably to around 1.0 for one or more additional iterations to clean up any remaining smoke-like artifacts in the created three-dimensional model of the scene.
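Equation 15 can be sketched directly (hypothetical function name; T̂ denotes the accumulated opacity of a ray path):

```python
def opacity_residual(T_hat, lam=0.1):
    """Opacity residual L of Eq. 15 (illustrative sketch).

    The residual is zero for fully transparent (T_hat = 0) and fully
    opaque (T_hat = 1) ray paths and maximal (lam) at T_hat = 0.5,
    penalizing semi-transparent, smoke-like accumulations.
    """
    return lam * (-4.0 * (T_hat - 0.5) ** 2 + 1.0)
```

The inverted-parabola shape makes the penalty differentiable everywhere, which keeps the residual compatible with the Jacobian-based solver described above.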

    [0099] FIG. 5 illustrates schematically how a ray path 211 of an image element of an image captured by a camera 20 propagates through the three-dimensional volume 100. The three-dimensional volume 100 comprises a plurality of three-dimensional volume elements V.sub.1,1, V.sub.1,2, V.sub.1,3, V.sub.2,1, V.sub.2,2, V.sub.2,3, V.sub.3,1, V.sub.3,2, V.sub.3,3. In this particular example, the ray path 211 propagates through volume elements V.sub.2,3, V.sub.2,2, V.sub.3,2 and V.sub.3,1 meaning that the residuals of the image element associated with the ray path 211 will depend on the accumulated color values and opacity value of each of the volume elements V.sub.2,3, V.sub.2,2, V.sub.3,2 and V.sub.3,1. Also depicted in FIG. 5 are sample points t.sub.1, t.sub.2, t.sub.3 indicating where along the ray path 211 the parameters of the three-dimensional volume 100 are sampled. The sample points t.sub.1, t.sub.2, t.sub.3 may by default be equidistantly placed along the ray path 211 wherein the distance between each neighboring pair of sample points may be referred to as a raymarch step. In some implementations, jittering is applied to the sample points t.sub.1, t.sub.2, t.sub.3 to enable a random placement of each sample point t.sub.1, t.sub.2, t.sub.3 within half of a raymarch step from the respective equidistant default position. As for the random originating point of each ray path 211 in each image element, jittering of the sample points t.sub.1, t.sub.2, t.sub.3 along each ray path 211 mitigates issues with constant sampling in the three-dimensional volume 100. The jittering may be enabled by one random variable indicating the displacement of the sample point from the equidistant default position and a new random position may be used each time the ray path is evaluated. 
To avoid performing the computationally costly procedure of generating new random values for each sample point t.sub.1, t.sub.2, t.sub.3 when solving the minimization problem the random values may have been generated beforehand (and are e.g. shared with the random variables of the image elements described in the above) and merely accessed or obtained when solving the minimization problem.
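The jittered placement of the sample points t.sub.1, t.sub.2, t.sub.3 described above can be sketched as follows (assuming numpy; as noted above, the random offsets could equally well be precomputed once and reused):

```python
import numpy as np

def jittered_sample_points(t_near, t_far, n_samples, rng):
    """Equidistant raymarch sample positions with jitter (sketch).

    Each sample is displaced by a uniform random offset within half a
    raymarch step of its equidistant default position along the ray.
    """
    step = (t_far - t_near) / n_samples                      # raymarch step
    defaults = t_near + (np.arange(n_samples) + 0.5) * step  # default positions
    offsets = rng.uniform(-0.5, 0.5, size=n_samples) * step  # jitter
    return defaults + offsets
```

Each evaluation of a ray path with a fresh (or precomputed) offset vector thus samples slightly different positions in the volume, mitigating constant-sampling artifacts.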

    [0100] FIG. 6 illustrates that the three-dimensional volume 100 may be surrounded by one or more two-dimensional environment maps 300a, 300b, 300c, 300d. Accordingly, the three-dimensional space may comprise the three-dimensional volume 100 and environment map(s) 300a, 300b, 300c, 300d. Each environment map 300a, 300b, 300c, 300d comprises a grid of two-dimensional environment map elements wherein each environment map element comprises three color values and an opacity value much like the volume elements of the three-dimensional volume 100. When solving the minimization problem the residual of each ray path 211a, 211b is based on the accumulated color value of each volume element of the volume 100 and each environment map element of one or more environment maps 300a, 300b, 300c, 300d which the ray path 211a, 211b passes. The environment maps 300a, 300b, 300c, 300d may be provided in a plurality of layers substantially or entirely surrounding the three-dimensional volume 100. While depicted as flat layers, the environment maps 300a, 300b, 300c, 300d may be defined as concave, convex, curved or spherical surfaces. Moreover, at least one environment map may be defined as a cube map with six rectangular surfaces surrounding the three-dimensional volume 100.

    [0101] Accordingly, the opacity and color values of each environment map element are included as parameters in the minimization problem and the environment maps are created alongside the three-dimensional volume 100 comprising a model 110 of one or more objects of the scene. To this end, the environment maps 300a, 300b, 300c, 300d may be initialized with a random, constant or arbitrary predetermined set of parameters similar to the initialization of the volume elements. Moreover, as the resolution of the three-dimensional volume 100 may be increased to enable solving the minimization problem hierarchically, the resolution of the environment maps 300a, 300b, 300c, 300d may also be increased in a corresponding manner.

    [0102] With reference to FIGS. 7a, 7b and 7c there is depicted a rendering arrangement comprising a display 50 connected to a local processor 40 which renders an image 100a of the three-dimensional space (i.e. three-dimensional volume and optionally environment map) from a target view direction. The three-dimensional model 100 may be associated with a reference view direction R. The reference view direction R may be implicitly defined by the three-dimensional volume 100, e.g. the reference view direction R may be based on the arrangement of the volume elements comprised in the three-dimensional volume 100. The reference view direction R may specify the plane of the imaginary window which should be displayed on the display 50 relative the three-dimensional volume 100 and any object(s) 110 therein.

    [0103] In some implementations, the local processor 40 obtains a three-dimensional model 100, or a stream of three-dimensional models 100 of the same scene for 3D video rendering, e.g. over a network. The rendering arrangement further comprises head position determining means 60a, 60b configured to determine the position of the head of a user 70 viewing the display. The head position determining means 60a, 60b may comprise one or more cameras 60a, 60b configured to periodically acquire images of the user 70. The head position determining means 60a, 60b may comprise a processor configured to determine the head position of the user 70 relative a reference direction, e.g. by analyzing the images of the one or more cameras 60a, 60b. The reference direction may e.g. be the normal N of the display 50. Alternatively, the images of the at least one camera 60a, 60b are provided to the local processor 40 wherein the local processor 40 is configured to perform the analysis to determine the position of the user's 70 head with respect to the reference direction. The analysis may involve any suitable method for determining the position of the user's 70 head and may e.g. involve image recognition, triangulation and/or a neural network. For instance, the determination of the position of the user's 70 head relative the reference direction may utilize the fact that the distance between the eyes of a human is approximately 6 cm. With such a reference distance the head position determining means may efficiently establish where the head of the user 70 is located in front of the display 50.

    [0104] With reference to FIGS. 7d, 7e and 7f it is illustrated how information relating to the position of the user's 70 head relative the display 50 may be utilized to simulate a window effect with the display 50. As the user 70 moves away from a reference display view direction—such as the normal N of the display 50—and views the display 50 from an angle α relative the normal N, the target rendering direction T for the local processor 40 is changed accordingly to render the three-dimensional volume 110b from a target direction T which forms approximately an angle α with the reference rendering direction R. Thus, when the user 70 moves relative the display 50 the target rendering direction T changes so as to simulate that the scene comprising object(s) 110 is stationary on the other side of the display 50. In other words the display 50 acts as a window into the three-dimensional volume 100, wherein the portion of the three-dimensional volume 100 that is visible to the user 70 changes depending on the direction from which the user 70 looks through the imaginary window/display.

    [0105] Although the angle α is illustrated as being in the horizontal plane the angle α may comprise both a vertical and horizontal component to describe head positions that move side-to-side and up-and-down in front of the display 50.
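A minimal sketch of deriving the target rendering direction from a tracked head position (hypothetical coordinate convention, not taken from the disclosure: z along the display normal N, x horizontal, y vertical):

```python
import math

def target_view_angles(head_pos, display_center):
    """Horizontal and vertical viewing angles relative to the display normal.

    head_pos and display_center are (x, y, z) tuples; the returned angles
    (in radians) may serve as the target rendering direction T.
    """
    dx = head_pos[0] - display_center[0]
    dy = head_pos[1] - display_center[1]
    dz = head_pos[2] - display_center[2]
    return math.atan2(dx, dz), math.atan2(dy, dz)
```

A head directly in front of the display yields zero angles, while side-to-side or up-and-down motion yields a horizontal or vertical component of α respectively.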

    [0106] Additionally, the steps of determining the head position of the user 70 relative the display, rendering the three-dimensional model 100 from a suitable target viewing direction T and displaying the rendered image on the display 50 will in general take some time to perform. As a result, a distracting delay may occur between the actual head position of the user 70 and the target view direction T which is currently displayed on the display 50. This distracting delay will be more apparent the faster the user 70 moves his/her head in front of the display and alters the angle α.

    [0107] To mitigate this effect, a predictor may be implemented as a part of the head position determining means 60a, 60b and/or the local processor 40, wherein the predictor obtains at least two previous head positions and calculates a future position of the head. For instance, if the steps involving rendering and displaying the three-dimensional volume 100 from a target view direction T require a time t to perform, the predictor may be configured to predict the head position of the user 70 a time t into the future, e.g. by utilizing the latest and previous head positions. The predicted future head position is then provided to the local processor which starts the process of rendering the three-dimensional volume 100 with a target view direction T based on the predicted head position as opposed to the current or latest determined head position. Accordingly, the head of the user 70 will most likely be at the predicted position when the rendered image for that position is displayed on the display 50, which mitigates or removes the distracting delay.

    [0108] The predictor may operate by calculating the future head position using a physics simulation of the motion of the head of the user (e.g. by monitoring and calculating the inertia and acceleration of the head). Alternatively or additionally, the predictor may comprise a neural network trained to predict the future head position given two or more previous head positions. It is noted that the head position determining means may be configured to update the head position of the user 70 with a higher frequency than the local processor renders a novel view of the three-dimensional model and/or with a higher frequency than the rate at which an updated three-dimensional model 100 is provided to the local processor 40.
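A constant-velocity predictor, the simplest variant of the above (hypothetical names; a real system might add acceleration terms or a trained network instead), extrapolates from the two most recent head positions:

```python
def predict_head_position(prev, latest, dt_hist, dt_ahead):
    """Linearly extrapolate the head position dt_ahead seconds ahead.

    prev and latest are (x, y, z) positions measured dt_hist seconds
    apart; the velocity implied by the two samples is assumed constant.
    """
    scale = dt_ahead / dt_hist
    return tuple(b + (b - a) * scale for a, b in zip(prev, latest))
```

Feeding the predicted position into the rendering pipeline a render-latency t ahead of time is what compensates for the delay discussed above.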

    [0109] Although the object 110 is depicted as a stationary geometrical shape, the object 110 may move or otherwise change in the three-dimensional volume 100 whilst the reference rendering direction R is maintained. Particularly, the object 110 may be a person moving and interacting with the user 70 via the display 50 as a part of a 3D video teleconferencing suite.

    [0110] The person skilled in the art realizes that the present invention by no means is limited to the preferred embodiments described above. On the contrary, many modifications and variations are possible within the scope of the appended claims. For example, the position determining means may comprise at least one camera to enable determining the position of the user's 70 head and the at least one camera of the position determining means may be used simultaneously as one of the plurality of cameras used to capture the plurality of images of the scene. Accordingly, in a two-way video conferencing system a plurality of cameras may be provided on both the remote end and the local end, wherein at least one of the cameras on either end is used as a part of the head position determining means.