Co-registration—simultaneous alignment and modeling of articulated 3D shapes

Abstract

Present application refers to a method, a model generation unit and a computer program (product) for generating trained models (M) of moving persons, based on physically measured person scan data (S). The approach is based on a common template (T) for the respective person and on the measured person scan data (S) in different shapes and different poses. Scan data are measured with a 3D laser scanner. A generic personal model is used for co-registering a set of person scan data (S) aligning the template (T) to the set of person scans (S) while simultaneously training the generic personal model to become a trained person model (M) by constraining the generic person model to be scan-specific, person-specific and pose-specific and providing the trained model (M), based on the co registering of the measured object scan data (S).

Claims

1. A model generation unit for generating deformable, non-rigid visual models (M) of physical objects, based on physically measured object scan data (S), comprising: a template interface for providing at least one common template (T) for one of the physical objects; a scanner for scanning said physical objects having respectively different shapes and poses to generate object scan data (S) that corresponds to physical landmarks on surfaces of said physical objects; a database for storing at least one generic object model that corresponds to said object scan data; an initializing interface for providing said object scan data (S) and said template data (T) in initialized form; a co-registration unit for executing a non-linear objective function encompassing both a mesh alignment term and a model term for co-registering a set of ones of said object scan data (S) by executing registering and model generation in a combined manner, namely: repeatedly a) aligning the template data (T) to the object scan data (S) to obtain aligned scans and training one of the models based on the scanned data, and b) constraining the aligning in step a) based on the one of the models (M) being trained; and an output interface for generating said deformable, non-rigid visual models (M).

2. The model generation unit according to claim 1, wherein aligning is executed by deforming the initialized template (T) to all initialized scans (S) of the set of initialized scans (S) in parallel and/or by inferring object shape from incomplete, noisy and/or ambiguous scan data.

3. The model generation unit according to claim 1, wherein co-registration uses data present in another scan (S.sub.o) in order to propagate information learned from the other scan (S.sub.o) to present scan (S).

4. The model generation unit according to claim 1, wherein at least some or all of the steps are executed iteratively so that the generic model may be replaced in the course of process with the trained model (M).

5. A model generation unit according to claim 1, wherein aligning is done by applying a data penalty term for deforming the template (T) to match the scans (S) and by applying a data coupling term for constraining the deforming according to the trained model (M).

6. The model generation unit according to claim 1, wherein the generic object model is a BlendSCAPE model, which is scan-specific, object-specific and pose-specific.

7. The model generation unit according to claim 1, wherein a fit of an aligned template surface (T) to a surface of the initialized object scan (S) is evaluated by: $E_{S} (T; S) = \frac{1}{a_{S}}_{x_{s} S} (\min_{x_{t} T} .Math. x_{s} - x_{t} .Math.) .$

8. The model generation unit according to claim 1, wherein differences between the aligned template and the trained model are penalized by a coupling term, which is defined by: $E_{C} (T,, D, Q) = \underset{f}{.Math.} a_{f} {.Math. T_{f} - B_{f} () D_{f} Q_{f} () T_{f}^{*} .Math.}_{F}^{2} .$

9. The model generation unit according to claim 1, wherein simple regularization terms are used to constrain object shape deformations (D) with regard to spatial smoothness and pose-dependent deformation model (Q).

10. The model generation unit according to claim 1, wherein a result is a set of alignments, wherein one alignment refers to one scan (S), and a set of trained object models (M), wherein one model (M) refers to one physical object.

11. A method for generating deformable, non-rigid visual models (M) of physical objects, based on physically measured object scan data, comprising the following steps: providing at least one common template (T) for one of the physical objects; scanning said physical objects having respectively different shapes and poses to generate object scan data (S) that corresponds to physical locations on surfaces of said physical objects; providing a database that includes at least one generic object model that corresponds to said object scan data; providing said object scan data (S) and said template data (T) in initialized form; co-registering a set of ones of said object scan data (S) by executing a non-linear objective function encompassing both a mesh alignment term and a model term for the steps of registering and model generation in a combined manner, namely: repeatedly, a) aligning the template data (T) to object scan data (S) to obtain aligned scans and training one of the models based on the scanned data, and b) constraining the aligning in step a) based on the one of the models (M) being trained; and generating said deformable visual models (M).

12. The method according to claim 11, wherein all initialized object scans (S) are registered in parallel while simultaneously calculating object shape deformations (D) and a pose-dependent deformation model (Q) across all scans (S).

13. The method according to claim 11, wherein aligning is executed by deforming the initialized template (T) to all initialized scans (S) of the set of initialized scans (S) in parallel and/or by inferring object shape from incomplete, noisy and/or ambiguous scan data.

14. The method according to claim 11, wherein co-registration uses data present in another scan (S.sub.o) in order to propagate information learned from the other scan (S.sub.o) to present scan (S).

15. The method according to claim 11, wherein at least some or all of the steps are executed iteratively so that the generic model may be replaced in the course of process with the trained model (M).

16. The method according to claim 11, wherein aligning is done by applying a data penalty term for deforming the template (T) to match the scans (S) and by applying a data coupling term for constraining the deforming according to the trained model (M).

17. The method according to claim 11, wherein the generic object model is a BlendSCAPE model, which is scan-specific, object-specific and pose-specific.

18. The method according to claim 11, wherein a fit of an aligned template surface (T) to a surface of the initialized object scan (S) is evaluated by: $E_{S} (T; S) = \frac{1}{a_{S}}_{x_{s} S} (\min_{x_{t} T} .Math. x_{s} - x_{t} .Math.) .$

19. The method according to claim 11, wherein differences between the aligned template and the trained model are penalized by a coupling term, which is defined by: $E_{C} (T,, D, Q) = \underset{f}{.Math.} a_{f} {.Math. T_{f} - B_{f} () D_{f} Q_{f} () T_{f}^{*} .Math.}_{F}^{2} .$

20. The method according to claim 11, wherein simple regularization terms are used to constrain object shape deformations (D) with regard to spatial smoothness and pose-dependent deformation model (Q).

21. The method according to claim 11, wherein a result is a set of alignments, wherein one alignment refers to one scan (S), and a set of trained object models (M), wherein one model (M) refers to one physical object.

22. A computer program product operable, when executed on at least one computer, to perform the method according to claim 11.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) Embodiments of the present invention will now be described with reference to the accompanying drawings in which:

(2) FIG. 1 is a schematic overview of relevant units for building a 3D model of a person in prior art,

(3) FIG. 2 shows by way of example a corpus registration procedure and respective learning according to prior art systems,

(4) FIG. 3 shows a schematic overview according to the co-registration procedure according to a preferred embodiment of present invention,

(5) FIG. 4 is a flowchart of a co-registration procedure according to a preferred embodiment of present invention,

(6) FIG. 5 refers to using a SCAPE and BlendSCAPE model according to a preferred embodiment of present invention,

(7) FIG. 6 a comparison of a SCAPE versus BlendSCAPE model according to a preferred embodiment of present invention,

(8) FIG. 7 shows the convergence of quality indicators, according to a preferred embodiment of present invention

(9) FIG. 7a shows a distance from a scan according to a preferred embodiment of present invention

(10) FIG. 7b shows a landmark prediction error according to a preferred embodiment of present invention,

(11) FIG. 8 shows examples from co-registration according to a preferred embodiment of present invention and

(12) FIG. 9 shows registration denoising according to a preferred embodiment of present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

(13) The figures illustrate principles of the invention according to specific embodiments. Thus, it is also possible to implement the invention in other embodiments, so that these figures are only to be construed as examples. Moreover, in the figures, like reference numerals designate corresponding modules or items throughout the different drawings.

(14) FIG. 3 shows all relevant units and processes for model building according to a preferred embodiment of present invention. A 3D sensing device like a high resolution structured light scanner is used to provide a set of scans of an object, particularly a person, in different poses. Thus, a first scan S.sub.1 may refer to the person sitting on a chair, the second scan S.sub.2 may refer to the person lying on a couch, the third scan S.sub.3 may refer to the person standing and a fourth scan S.sub.4 may refer to the person walking. The scans S.sub.1, S.sub.2, . . . S.sub.n serve as input parameters for the co-registration procedure.

(15) Further, a 3D template T is used. Particularly, the template T is a generic 3D wire mesh model for a representation of an object and particularly of an individual person. Thus, a set of scans S, a template mesh T and a rough initialization of the template match or fit (i.e. the alignment) to each of the scans S are given and are used for model generation. According to an aspect the articulated generated 3D model is refined step-by-step iteratively according to the different read-in scans S.

(16) A generic 3D model M of each person is used as input for co-registration. Preferably, an articulated 3D model of each person is crudely estimated using the shape and part segmentation of the generic-looking human-shaped 3D template.

(17) As can be seen in FIG. 3 the 3D scans S, the template T and the untrained, generic model M serve as input parameters for the co-registration procedure.

(18) As can be seen by comparing FIG. 1 and FIG. 3, the co-registration procedure differs from known procedures in that the alignment/registration step and the model building step are combined.

(19) As already mentioned in the general description above previous methods for building articulated models (for example Hasler et al.) have relied on a three phase process: 1. data initialization 2. data registration or alignmenti.e. aligning the raw 3D data with a 3D template. Usually the template is deformed in order to match the scan data by bringing the template into point-to-point correspondence with the respective scan. 3. After the registration process is completed and all data are aligned, model building is initiated. All the aligned data is used to build an articulated model that captures the shape and pose of the person in each scan.

(20) In contrast, the co-registration algorithm according to present invention treats 3D alignment (step 2) and model learning (step 3) as a single problem. These two steps are executed in an interleaved or combined manner. Initially, a generic looking articulated model of the person being aligned is used. This model is used to guide and constrain the alignment algorithm and then the resulting model-guided alignments are used to update, or to refine, the articulated model of each person. This process is repeated many times iteratively. When the entire co-registration process is complete, the result is a set of alignments (one alignment per scan) and a set of articulated 3D models (one model per person). Both the alignments and models have been optimized to be maximally consistent with the originally 3D scan data.

(21) A further significant difference between present invention and state of the art systems refers to the fact that in previous systems aligning the template to the scans has been executed independently, i.e. deformation of the template T according to scan S.sub.1 has been executed independent of the deformation of the template T according to scan S.sub.2 and to the other scans. By contrast, present invention performs the aligning of each scan S of a person dependent on the respective other scan alignment procedures. This approach makes it possible to consider the consistency of each individual's shape across a different poses and the similarities in the deformations of different bodies as they change pose.

(22) As can be seen in FIG. 3 co-registration comprises model training, which is based on real measured 3D scan data of the individual person.

(23) With respect to FIG. 4 a possible flowchart of a co-registration and model building procedure is described below.

(24) After starting the procedure a set of scan data is measured by a 3D laser scanner.

(25) Optionally other acquisition methods may be used to provide two-dimensional representations of a 3D dimensional object. Preferably, the object is an individual person in different poses. Generally, it is possible to measure the data or to access measured data by an interface. The measured data may be stored in a database or another storage unit.

(26) The second step refers to reading in a template T. This may also be done by accessing a respective database.

(27) In the third step a generic model of the person is accessed.

(28) It has to be noted that the sequence of the steps mentioned before may be changed.

(29) In the fourth step the scans S and the template T are initialized. Initialization may be done by using manually placed markers on the scans S and on the template mesh. Alternatively it is also possible to use automated methods for initialization. Initialization may be executed in a preparation phase, preceding the model generation phase, in order to provide already initialized scans S and template T.

(30) As can be seen in FIG. 4, the fifth step refers to the co-registration procedure. A co-registration is implemented as an iteration. Co-registration comprises both: aligning the template to match the set of object scans (this is depicted on the left-hand side in FIG. 4) and estimating a 3D model and training this model according to the aligned scans. This model is used to constrain the aligning procedure.

(31) The co-registration repeatedly re-estimates both models and alignments. Each time the model is re-estimated, it is constrained to fit all the alignments as accurately as possible. More specifically, each model contains a large number of internal parameters which determine the model's overall body shape, as well as how the model deforms to accurately assume a range of poses. These internal parameters are computed so as to best match the alignments of all scans. These internal parameters provide a series of linear constrains that determine how each triangle of the model should change shape as the model changes pose. The parameters also describe the overall body shape of each person being aligned. If scans of enough people are provided, they can also describe how body shape is likely to vary across an entire population. Further statistical methods procedures may be used here. After a co-registration is completed there is provided a trained articulated 3D model M of a person and also for different persons in different poses.

(32) Usually the method ends after providing the trained articulated 3D model M. Alternatively, it is also possible to execute at least a part of the steps repeatedly, for example for an updated set of 3D scans (for example another person).

(33) One key aspect of the present invention is to be seen that all the scans of a person are computed parallel, so that the template may be aligned to match all scans simultaneously and dependent on the other scans. By combining aligning and model building it is possible to wave manual corrections or hand tuning of the alignment algorithm (which sometimes turns out to be necessary in previous registration and model building methods).

(34) Once more referring to FIG. 4 it can be seen that when co-registration is used to simultaneously align many scans S in parallel, simultaneously an articulated 3D model of each person is estimated. This 3D model is used to guide the alignment process, which is also depicted in FIG. 4.

(35) Preferably, all steps of the method mentioned in this application are computer-implemented and may be implemented in software. Particularly, the alignment is an alignment algorithm and the model building is also implemented by an algorithm. By executing these algorithms (aligning and model building) it is possible to ensure that all of the alignments, as well as the model, are maximally consistent in terms of their anatomical point-to-point correspondence across scans.

(36) By learning or training the model of each person, an articulated (i.e. poseable) model of each person is estimated. This model is used to constrain the alignment process. When the model does not provide hard constraints on the shape of the each alignment, each alignment is penalized by how much it disagrees with the model. With other words, if the model associated with person x cannot be posed to closely match the alignment of person x to a given scan, that alignment will be strongly penalized. This penalty encourages the algorithm to further deform the alignment such that it more closely matches the shape of the model.

(37) The co-registration and model building procedure according to present invention may be based on different model generation methods. It is possible to use the SCAPE model or the BlendSCAPE model, which will be described in detail below.

(38) SCAPE and BlendSCAPE:

(39) SCAPE is a model of human body shape learned from registered scans. In this respect it is referred to Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers, J., Davis, J.: SCAPE: Shape completion and animation of people. ACM ToG. 24 (2005) 408-416.

(40) The SCAPE model defines how to deform a human-shaped triangulated template mesh, T*, to take on different poses and identities (body shapes). Let T* be pre-segmented into parts (differently coded in FIG. 5) connected in a kinematic tree structure. The relative rotations between neighboring parts in the tree are represented as Rodrigues vectors. Let be a vector collecting all the relative rotations and R() represent the absolute rotations of the triangles in a part. In SCAPE, every triangle within a part has the same rotation. Let D represent the deformations that transform T* into the shape of a specific person. Finally, let Q() define non-rigid deformations of the triangles of T* that capture shape change as a function of pose .

(41) As can be seen in FIG. 5 a template may be deformed with SCAPE and BlendSCAPE. On the left hand side in FIG. 5 the template T is depicted, and on the right hand side there is shown the deformed templates: Rigid R() Blended B() +Identity D.sup.p +Non-Rigid Q()

(42) The template is deformed in three steps. First T* is decomposed, or unstitched, into disconnected triangles, T.sub.f*. Each unstitched triangle is represented by a pair of its edge vectors, forgetting its location but retaining its shape and orientation.

(43) Second, each unstitched triangle is individually deformed according to a sequence of pose- and shape-dependent 33 linear deformations. Each unstitched triangle T.sub.f* is posed by a rotation R.sub.f() and deformed to represent a person's body shape using a 33 matrix D.sub.f. It is also deformed by a 33 matrix Q.sub.f() that accounts for pose-dependent shape changes like muscle bulging and skin wrinkling and corrects for deviations between the rigidly posed model and the true shape. A transformed triangle is written
T.sub.f=R.sub.f()D.sub.fQ.sub.f()T.sub.f*

(44) These deformed triangles are recomposed, or stitched, to define the vertices of a watertight mesh M(,D,Q). Because triangles are transformed independently, and will disagree at shared edges, we solve for the final vertex locations of the mesh using least-squares.

(45) SCAPE uses a partition of the template triangles into rigid parts to define its model for R. Since each part is independently rotated, the final stitched body surface can collapse, crease or fold near joints (see FIG. 5). Q can be trained to correct these artifacts given artifact-free alignments, but if these artifacts are sufficiently severe they can cause convergence problems during co-registration.

(46) To address this a BlendSCAPE model is introduced, in which each triangle's rotation is a linear blend,

(47) $B_{f} () = \underset{i}{.Math.} w_{fi} R^{i},$
of the rotations, R.sup.i, of the parts, indexed by i, in the kinematic tree. These weights, w.sub.fi can be estimated along with the other parameters of the model, but in this work we define them manually by smoothing our SCAPE segmentation across part boundaries. The template posed with BlendSCAPE is shown in FIG. 5. Clearly B.sub.f() itself does not provide a realistic model of body pose deformation, but rather reduces the work that Q must do to correct its errors. It has been found that this improves model fitting to scans and, consequently, registration (FIG. 6).

(48) FIG. 6 shows a comparison of SCAPE vs BlendSCAPE. Rigid part rotations, R, can introduce significant artifacts early in the fitting process, particularly when scan noise (e.g. holes) coincides with part boundaries of the model. Co-registration has difficulty in eliminating these artifacts. In contrast, B from the BlendSCAPE model introduces smoother, less significant artifacts, which co-registration rapidly corrects.

(49) A corpus of body scans is assumed, containing multiple people, each in multiple poses, and it is known which scans correspond to which people. After co-registration, each scan is modeled by a person-specific D.sup.p that represents that individual's body shape, a scan specific pose, .sup.s, and a pose-dependent Q(.sup.s) for each scan in which the function Q is the same all across people. As in previous work (see Anguelov et al.), the deformation Q is a linear function of the Rodrigues vectors describing the relative orientations of adjacent parts:

(50) $Q () = Q^{0} + \underset{c}{.Math.}_{c} Q^{c}$
where .sub.c is the c.sup.th element of the pose vector , and Q.sup.0,Q.sup.c contain the linear coefficients and are learned from the corpus of registered bodies. This model is constrained so only the orientations of parts near a triangle contribute to its deformation (i.e. Q.sup.c is kept sparse).

(51) Previous SCAPE models have been built using two body scan corpora: one containing people of different shapes in roughly a fixed pose and the other containing one person in many poses. This is in contrast to Hasler et. al. who train a model with correlation between shape and pose using scans of several people in different poses. Present invention describes the first SCAPE pose model, Q, trained from multiple people in multiple poses.

(52) This improves the ability to model the deformations of different people. In summary, a scan in the corpus is approximated with a model M(.sup.s,D.sup.p,Q) that poses the model using B(.sup.s), deforms it to the identity of the person using D.sup.p, and accounts for non-rigid shape changes using Q, which are a function of pose, .sup.s.

(53) Co-Registration:

(54) The process of co-registration is explained in more detail below.

(55) Co-registration aligns a triangulated template mesh to a corpus of 3D scans while simultaneously training a BlendSCAPE model. Below a data penalty term is defined that seeks to deform the template T to match a scan S and a novel coupling term that constrains this deformation to be similar to a learned BlendSCAPE model. Optimization involves solving for both the alignment and the model parameters.

(56) To train the model M, a pose .sup.s for each scan in the corpus must be estimated, a shape D.sup.p for each person in our corpus, and a single linear pose-dependent deformation model Q(). Once co-registration is complete, each scan should be tightly fit by a deformed template mesh and should also closely match the corresponding BlendSCAPE body M(.sup.s,D.sup.p,Q). Note that before training an untrained BlendSCAPE model exists in which D and Q are the identity. At the start of co-registration, the template is roughly aligned by posing and scaling the untrained BlendSCAPE model. For this step a set of landmarks associated with each scan is used. Note, however, during co-registration the landmarks are discarded, in contrast to stat of the art systems (Allen et al.).

(57) Given a scan S, the following data term, E.sub.S, is defined, evaluating the fit of the deformed template T to the surface of the scan S:

(58) $\begin{matrix} E_{S} (T; S) = \frac{1}{a_{S}}_{x_{s} S} (\min_{x_{t} T} .Math. x_{s} - x_{t} .Math.), & (Equation 2) \end{matrix}$
where is the Geman-McClure robust error function

(59) $(x) = \frac{x^{2}}{^{2} + x^{2}},$
S is the scan surface, a.sub.S is the scan's surface area, and T is the surface of the aligned template. The data error is approximated, using a fixed set of locations x.sub.s, uniformly sampled over the surface of the scan S. It is also possible to add a landmark term into E.sub.S that would constrain known locations on the template to be close to measured locations on the scan.

(60) To address the correspondence ambiguity inherent to E.sub.S, a coupling term is added, penalizing differences between the aligned template and the current model:

(61) $\begin{matrix} E_{C} (T,, D, Q) = \underset{f}{.Math.} a_{f} {.Math. T_{f} - B_{f} () D_{f} Q_{f} () T_{f}^{*} .Math.}_{F}^{2}, & (Equation 3) \end{matrix}$
where T.sub.frepresents the pair of edge vectors of the unstitched triangle f of T, B.sub.f()D.sub.fQ.sub.f()T.sub.f* is the corresponding unstitched triangle of M(,D,Q), and a.sub.f is the area of f on the template mesh, T*. The squared Frobenius norm is used to measure the difference between corresponding unstitched triangles of T and M(,D,Q). This is simply the sum of squared distances between corresponding pairs of edge vectors.

(62) Additionally, simple regularization terms are used to constrain the body shape deformations, D, and the pose-dependent deformation model, Q. The first term promotes spatial smoothness of the deformations, D, that map the template mesh to an observed person. The second term penalizes the magnitude of the effect of the pose-dependent deformation model

(63) $\begin{matrix} E_{D} (D) = \underset{adjacent faces i, j}{.Math.} a_{ij} \frac{{.Math. D_{i} - D_{j} .Math.}_{F}^{2}}{h_{ij}^{2}} E_{Q} (Q) = \underset{faces f}{.Math.} a_{f} ({.Math. Q_{f}^{0} - I .Math.}_{F}^{2} + \underset{c}{.Math.} {.Math. Q_{f}^{c} .Math.}_{F}^{2}) & (Equation 4) \end{matrix}$

(64) Here h.sub.ij is the distance between the centroids of template triangles i and j, a.sub.f is the area of triangle f, and

(65) $a_{ij} = \frac{a_{i} + a_{j}}{3}$
is the area of the diamond-shaped region defined by the centroids of triangles i and j and the endpoints of their shared edge.

(66) A weakly informative pose prior, E.sub., is also used, which penalizes deviation from the template pose. This regularizes the pose when the scan provides little useful information.

(67) If D and the function Q were known, a single scan could be reliably aligned by optimizing E.sub.S(T;S)+E.sub.C(T,;D,Q). Since D and Q are not known, co-registration seeks to align all scans in parallel while simultaneously solving for D and Q across scans.

(68) Summing over all scans and adding the model regularization yields the following co-registration optimization problem:

(69) $\begin{matrix} \min_{{T^{k}}, {^{k}}, {D^{p}}, Q} \underset{k}{.Math.} [E_{S} (T^{k}; S^{k}) +_{C} (E_{C} (T^{k},^{k}, D^{p_{k}}, Q) +_{} E_{} (^{k}))] +_{C} [_{D} \underset{p}{.Math.} E_{D} (D^{p}) +_{Q} E_{Q} (Q)] & (Equation 5) \end{matrix}$

(70) Here p indexes people, k indexes scans, and p.sub.k identifies the person in each scan. The 's control the relative influence of terms. .sub.C is particularly important; it controls how much the alignments can deviate from the model.

(71) Optimization:

(72) The objective function is non-linear and the state space of solutions is very high-dimensional. Fortunately its structure admits a tractable alternating optimization scheme. Fixing the shapes D.sup.p and the pose-dependent deformation model Q() decouples the scans. Equation 1 (see above) is minimized by solving one non-linear problem of the form min.sub.T.sub.k.sub.,.sub.kE.sub.S(T.sup.k;S.sup.k)+.sub.C(E.sub.C(T.sup.k,.sup.k;D.sup.p.sup.k,Q)+.sub.E.sub.(.sup.k)) for each scan.

(73) In essence, these subproblems are standard pairwise registration problems with a strong regularization toward the posable model (i.e. min.sub.E.sub.C(,;D.sup.p.sup.k,Q)). These subproblems are solved, using MATLAB's lsqnoniin (MathWorks, Natick Mass.). Solving 8 such problems takes 3 minutes on an 8-core Opteron processor.

(74) With all T.sup.k and Q() fixed, minimization with respect to each person's D.sup.p is an independent linear least squares problem for each person p. Similarly, with all T.sup.k and D.sup.p fixed, minimization with respect to Q.sub.f() is an independent linear least squares problem for each triangle f. These sparse least squares problems can be solved efficiently, thus the method's runtime largely depends on its rate of convergence and our ability to compute registration subproblems in parallel.

(75) Co-registration is initialized by fitting an untrained BlendSCAPE model to each scan using E.sub.S and landmark correspondences. This simple model uses a trivial pose-dependent deformation model Q.sub.f()=I . Pose is allowed to vary freely, but shape varies only by isotropically scaling the template. The model fit to scan S.sup.k initializes T.sup.k and .sup.k. Each person's shape D.sup.p is initialized by averaging the scale of the fits for their scans. Q is initialized to the identity.

(76) It is useful to perform the optimization in stages. Experiments begin with a low coupling weight .sub.C so that the crude initial model provides only a rough guide to the registration. Then, .sub.C is increased from 0.25 to between 1 and 5 over several iterations, tightening the fit of the model to the scans. In each iteration, w.r.t. T.sup.k and , then w.r.t. D and Q is minimized. As .sub.C increases, the estimated model has more influence on the alignments, which enables information from good alignments to inform the registration of noisy scans. In addition, the scale parameter a of the robust error function in E.sub.S, is gradually decreased as is frequently done with non-convex error functions; , starts at 1 meter and decreases to 5 cm, 1 cm, and 5 mm. It is observed that the result are not very sensitive to the precise sequence of values of these parameters, or to whether intermediate optimization steps are run to convergence.

(77) Experiments:

(78) To demonstrate the accuracy and robustness of co-registration, several body scan corpora are registered. Each corpus consists of multiple individuals in a wide range of poses. By visual appraisal, at least 96% of the scans in each corpus are registered well, and high quality models from both corpora are obtained. No scans were excluded due to registration failure.

(79) Quantitative Analysis:

(80) For quantitative evaluation a dataset of 124 scans of two females in a wide range of standing and sitting poses was used. One of the two women was scanned during two separate sessions two years apart with different clothing and different hair styles. For the purpose of co-registration, the corpus was treated as containing three individuals, each with distinct body shapes. This dataset has extremely challenging poses, scans with significant holes, and hand-placed landmarks that allow evaluation.

(81) Initialization (see optimization, mentioned above) used twelve hand-placed landmarks on each scan. Co-registration was then run as described above with respect to detailed description of co-registration without any landmarks. In eight iterations, good registrations were obtained for all but four scans. Hands were sometimes slightly misaligned, as hand scan data was often quite noisy.

(82) FIG. 7 shows plots of two indicators of registration quality over the course of the co-registration process. Iteration 0 corresponds to initialization. Only six iterations are shown as there was no visible change afterward The Distance from scan captures how faithful the alignment is to the scan surface. Statistics of the distance between uniformly sampled points on the scan surface and their nearest points on the aligned template were used. Because an alignment can be close to a scan without having anatomically meaningful correspondence, the Landmark prediction error is used to rate correspondence as well. Twenty-four hand-designated landmark vertices (which were not used in registration) are used to predict the location of anatomical landmarks on the scan. These predictions are compared to the locations of these landmarks marked on each scan by a human annotator.

(83) In the first iteration, the alignment surface snaps to within about 1 mm of the scan, but the alignment-scan gap widens afterward. The alignments are pulled toward shapes representable by the model as the alignment-model coupling constant .sub.C increases between iterations 1 and 3. This results in alignments with better correspondence, as seen by the decrease in landmark prediction error and model to scan error. For evaluation, we withhold 30 scans of the same individuals. The model's ability to fit these held out scans improves with each iteration (see the dashed lines in FIG. 7).

(84) FIG. 8 shows some representative alignments and models according to co-registration by way of example. Sample scans, alignments, and model fits stem from co-registration of 124 scans of three subjects across a range of poses. It is to be noted that many of the scans contain significant amounts of noise and missing data; e.g. the chest and back of the bent-over subject. Co-registration is able to use the data present in a scan and to propagate information learned from other scans.

(85) In order to compare co-registration with existing corpus registration methods, also the corpus of 124 scans was registered, using two algorithms representative of the methods discussed above. In AlgorithmI each scan is registered independently using traditional model-free registration, and then all registrations are used to learn a model using the same optimization performed in the learning stage of co-registration. Model-free registration is performed using scan-to-mesh distance E.sub.S, twelve landmark points, and a nonlinear smoothness regularization from Amberg et al. In AlgorithmII, AlgorithmI is iterated as in Blanz & Vetter. After each iteration, the resulting model is fit to each scan and used to reinitialize a fresh run of AlgorithmI.

(86) All methods yield a registration of the model template and a model fit to each scan. Co-registration alignments give more accurate predictions of the 24 evaluation landmarks, with a mean landmark error of 2.02.1 cm versus 3.02.8 for AlgorithmI and 2.72.7 for AlgorithmII. Co-registration also yields better models. Models trained using co-registration are better able to fit scans, with a mean scan to model-fit distance of 0.250.30 cm on our 30 test scans. Algorithms I and II have distances of 0.380.63 and 0.310.40 respectively. Co-registration models give a mean landmark prediction error of 2.21.8 cm on the 30 test scans, whereas the models generated by I and II have errors of 3.79.3 and 3.46.0.

(87) Large Scale Registration:

(88) To evaluate the method of this invention on a larger corpus with a wider range of body shapes, a publicly available set of scans provided by Hasler et. al. was registered. The dataset contains 337 scans of 34 different women in 35 poses. Hasler et. al. provide alignments as well, which were used to obtain 36 rough landmark vertices on each scan for initialization. Only six bad registrations have been observed, each to a scan of a different woman. Five are in forward bend poses, in which large portions of the face and chest are missing from the scan. These failures do not appear to impact the model's ability to accurately capture the shapes, D, of the six women.

(89) Improving Existing Registrations:

(90) Because co-registration is able to integrate information from multiple scans of the same person and multiple people in different poses, it can be used to improve extant registered meshes without access to the original scans. 4 female subjects with 10 poses each from the Hasler et. al. dataset have been randomly selected. By fitting the model M to a small number of these registrations, a correspondence between their template and present one has been estimated. This correspondence is used, to initialize T.sup.k for everybody and then use co-registration to learn a model and registration to their registered meshes. Registering registered meshes may seem odd, but it has two effects: 1) it denoises the existing alignments and 2) it learns a model from them.

(91) FIG. 9 refers to registration denoising and shows examples of the original registrations and the refinement. Each pair shows one pose for each of four women. The left mesh is the registration from Hasler et al. these have noise and artifacts. The right mesh is the registration according to present invention to the left registration using co-registration.

(92) Further Embodiments

(93) A preferred embodiment of present invention has been described with respect to solving the corpus registration problem by approaching modeling and alignment simultaneously. The algorithm for co-registration incorporates a BlendSCAPE term into the registration objective function. This allows, to optimize over both aligned template meshes and over a shape model, offering a significant advantage over the traditional three-stage approach to model learning. By providing a well-defined, model-based objective function that a collection of registered meshes should minimize, co-registration allows shape information learned from good data to correct for missing data. To demonstrate the effectiveness of co-registration, several collections of 3D scans have been registered. Co-registration results in high quality alignments and a realistic BlendSCAPE model learned from multiple individuals.

(94) A preferred embodiment of present invention focuses on the SCAPE model, it should be understood that other standard graphics models of the body could be used as well. Furthermore, it is common for graphics models to describe not just 3D shape, but other aspects of an object's visual appearance (e.g. color, reflectance). When working with such models, it is straightforward to extend co-registration to account for more than just shape. In this case, the data term E.sub.D and coupling term E.sub.C simply require additional terms, that estimate the visual agreement between triangles on the scan, alignment, and model. This allows co-registration to compute not only a per shape model, but also, for example, a per-person colored texture map to associate with each shape model.

(95) Above a basic co-registration method has been described, which produces a high quality SCAPE model applicable only to the registered individuals. Of course it should be obvious that not just the shape D of each individual can be learned, but also a low dimensional shape space capable of approximating all body shapes. This has been done previously with SCAPE (see Anguelov et al., 2005 and Balan et al.), but only using traditional registration techniques. Additionally, previous attempts to learn a shape space via PCA focus on single scans of individuals. Since the D estimates are learned across multiple scans of a person, they may be more reliable than those learned from a single scan. It is to be noted also that D in the coupling term E.sub.C can easily be replaced with a low-dimensional projection of D. This helps drive shape estimates toward a low-dimensional space. It also helps co-registration work with datasets in which there are only one or two scans of each individual.

(96) One version of the method works with a corpus of a single individual and produces a personalized avatar for that person. Another version produces an avatar that captures the shape of a corpus of many people.

(97) The method according to this invention can be used to align point clouds or meshes. It can also be used to denoise existing registered methods (model-based mesh denoising). The method's ability to learn realistic models from noisy, hole-filled data also makes it well-suited to noisy depth images, such as those output by the XBox kinect.

(98) Here it has been described a single Q model for the whole corpus. It is straightforward to make Q depend also on body shape. For example, if we learn a PCA subspace for the body shape, Q can be made a simple function of these shape parameters. For example, Q can be a linear function of the shape parameters and this can be learned simultaneously during co-registration.

(99) The method has been demonstrated for people in tight clothing but it can be applied to people in clothing as well. In this case if we learn a single D, this captures the mean clothing shape. If D is varied with every scan of a person then a low dimensional subspace of clothing shape variation can be captured. This can further be related to body pose. For example, the current Q can be replaced with a global non-rigid body deformation, constructed from this shape deformation subspace, that is related (e.g. linearly) to pose. This would effectively model non-rigid deformations of clothing with pose. The same approach can be used to model muscle deformations.

(100) The invention has been described, using full body scans but it also works with partial scans. For example, devices like the Microsoft Kinect produce one view of a person.

(101) As the person moves around other views may be recognized. An entire corpus of partial scans like this can be co-registered. As in the examples of holes, mentioned above, the information from good views fills in the information that is missing.

(102) While the focus was set on human bodies, of course this method can be applied to build models of any type of animal or object. Generally, the method and system may be applied for customizing virtual person's clones, like an avatar, according to scan data.

(103) Generally, the example embodiments mentioned above are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by this description.

REFERENCE NUMERALS

(104) T Template S Object Scan S.sub.o Another object scan M Trained object Model D Object shape deformations Q Pose-dependent deformation model

Co-registration—simultaneous alignment and modeling of articulated 3D shapes

Assignee

Inventors

Cpc classification

Classification Explorer

G06T13/40

PHYSICS

Classification Explorer

G06T2213/04

PHYSICS

Classification Explorer

G06T19/20

PHYSICS

Classification Explorer

G06T7/97

PHYSICS

Classification Explorer

G06T2213/12

PHYSICS

Classification Explorer

G06T17/20

PHYSICS

Classification Explorer

G06T7/50

PHYSICS

Classification Explorer

G06T17/00

PHYSICS

International classification

Classification Explorer

G06T13/40

PHYSICS

Classification Explorer

G06T17/00

PHYSICS

Classification Explorer

G06T7/50

PHYSICS

Classification Explorer

G06T7/00

PHYSICS

Classification Explorer

G06T17/20

PHYSICS

Classification Explorer

G06T19/20

PHYSICS

Abstract

Claims

Description