SYSTEM AND A METHOD FOR LEARNING FEATURES ON GEOMETRIC DOMAINS
20170213381 ยท 2017-07-27
Inventors
- Michael Bronstein (Lugano, CH)
- Davide Boscaini (Lugano, CH)
- Jonatan Masci (Como, IT)
- Pierre Vandergheynst (Ecublens, CH)
Cpc classification
G06F18/21345
PHYSICS
G06T19/00
PHYSICS
G06V10/7715
PHYSICS
International classification
Abstract
A method for extracting hierarchical features from data defined on a geometric domain is provided. The method includes applying on said data at least an intrinsic convolution layer, including the steps of applying a patch operator to extract a local representation of the input data around a point on the geometric domain and outputting the correlation of a patch resulting from the extraction with a plurality of templates. A system to implement the method is also described.
Claims
1. A method for extracting hierarchical features from data defined on a geometric domain, comprising applying on said data at least an intrinsic convolution layer, including the steps of applying a patch operator to extract a local representation of the input data around a point on the geometric domain and outputting the correlation of a patch resulting from the extraction with a plurality of templates.
2. The method according to claim 1, further comprising applying at least one of the following layers: a linear layer or fully connected layer, including outputting a weighted linear combination of input data; a non-linear layer, including applying a non-linear function to input data; a spatial pooling layer, including: determining a subset of points on the geometric domain; for each point of said subset, determining the neighbours on the geometric domain; and computing an averaging operation on input data over the neighbours for all the points of said subset; and a covariance layer, including computing a covariance matrix of input data over all the points of the geometric domain; wherein each layer has input data and output data and output data of one layer are given as input data to another layer.
3. The method according to claim 2, wherein two or more of said layers are applied in sequence, and the output data of one layer in the sequence is given as input data to a subsequent layer in the sequence.
4. The method according to claim 1, wherein the geometric domain is one of the following: a manifold; a parametric surface; an implicit surface; a mesh; a point cloud; or a graph.
5. The method according to claim 1, wherein parameters of the applied layer are determined by minimizing a cost function by means of an optimization procedure.
6. The method according to claim 1, wherein the patch operator inputs data on the geometric domain and said point on said domain, and outputs the local representation of said data around said point, wherein the local representation is one or more of the following: data represented in a local intrinsic polar system of coordinates; data transformed by a windowed Fourier transform; and data weighted by anisotropic diffusion kernels.
7. The method according to claim 6, wherein the patch operator outputs said local representation of input data in the local intrinsic polar system of coordinates, and wherein an origin of angular coordinates of the local intrinsic polar system of coordinates is provided as side information extracted from the geometric domain or the data.
8. The method according to claim 7, wherein the geometric domain is a surface and the side information used to determine the origin of the angular coordinate of the local intrinsic polar system of coordinates at each point on the surface is a principle curvature direction at said point.
9. The method according to claim 7, wherein the side information used to determine the origin of the angular coordinate of the local intrinsic polar system of coordinates at each point is a direction of a minimal or maximal absolute change of the data at said point.
10. The method according to claim 6, wherein the patch operator outputs said local representation of input data in the local intrinsic polar system of coordinates, and the Fourier transform is applied with respect to angular coordinates, followed by an absolute value operation.
11. The method according to claim 1, wherein the intrinsic convolution layer includes applying said patch operator to extract a local representation of input data in a local intrinsic polar system of coordinates around a point on the geometric domain, and outputting the correlation of a patch resulting from the extraction with a plurality of templates, the computation of said correlation further comprising the steps of: rotating each template along angular coordinates; computing the correlation of the patch with the rotated template; and taking a maximum operation over all the rotations.
12. The method according to claim 6, wherein the representation of input data in the local polar system of coordinates around the point on the geometric domain further comprises the steps of: computing an intrinsic distance from said point to all the other points on the geometric domain; computing level sets of said intrinsic distance at a plurality of levels; splitting a full angle at said point into a plurality of angular directions; and shooting rays emanating from said point along said directions perpendicular to said level sets.
13. The method according to claim 12, wherein said intrinsic distance is one of the following or an approximation thereof: geodesic distance; diffusion distance; commute time distance; or biharmonic distance.
14. The method according to claim 12, wherein the representation of input data further comprises the steps of computing weights localized around level sets and rays; and computing weighted averages of input data with each of said weights.
15. The method according to claim 6, wherein computing the windowed Fourier transform of input data further comprises: computing localized modulated atoms; and computing inner products of data with said atoms.
16. The method according to claim 15, wherein localized modulated atoms are heat kernels multiplied by Laplacian eigenfunctions.
17. The method according to claim 16, wherein the heat kernels are anisotropic.
18. The method according to claim 6, wherein the computation of local representation of input data around a point on the geometric domain further comprises the steps of computing a plurality of directed anisotropic heat kernels at said point, corresponding to a plurality of anisotropy constants and directions; and computing weighted averages of input data with said directed anisotropic heat kernels.
19. The method according to claim 2, wherein the averaging operation includes one or more of the following: an arithmetic average; a maximum operation; an average weighted by weights dependent on a distance between points; and an average weighted by weights dependent on local volume elements.
20. The method according to claim 5, wherein a plurality of said cost functions are defined, and wherein each of said cost functions is associated to one or more application for which feature extraction is carried out.
21. The method according to claim 5, wherein the input into the optimization procedure is a training set comprising: a positive set of pairs of points on one or more geometric domains; and negative set of pairs points on one or more geometric domains, wherein two identical copies of the hierarchical system configured with the same parameters are fed with pairs of points from said positive and negative sets, and wherein the optimization procedure tries to minimize the distance between the outputs corresponding to positive pairs and maximize the distance between the outputs corresponding to negative pairs.
22. A system for extracting hierarchical features from data defined on a geometric domain, comprising means for applying on said data at least an intrinsic convolution layer, wherein said means are configured to apply a patch operator, to extract a local representation of the input data around a point on the geometric domain, and to output the correlation of a patch resulting from the extraction with a plurality of templates.
23. The system for extracting hierarchical features according to claim 22, further including means for applying one or more of the following layers on said data: an intrinsic convolution layer, to apply a patch operator for extracting a local representation of the data around a point on the geometric domain and to correlate a patch, resulting from the extraction, with a plurality of templates; a linear layer or fully connected layer, to take said data in input and to give in output a weighted linear combination; a non-linear layer, to take said data in input into a non-linear function; a spatial pooling layer, to determine neighbours of a point on the geometric domain and to compute an averaging operation over the neighbours; and a covariance layer, to compute a covariance matrix of said data in input over all the points of the geometric domain.
24. The system according to claim 22, wherein the geometric domain is one of the following: a manifold; a parametric surface; a mesh; a point cloud; or a graph.
25. The system according to claim 22, including an optimization procedure for minimizing a cost function to determine parameters of the layer to be applied.
26. The system according to claim 22, wherein the patch operator is one or more of the following: a local polar system of coordinates; a windowed Fourier transform; and an anisotropic diffusion kernels.
27. The system according to claim 26, wherein the patch operator is configured to output said local representation of input data in the local intrinsic polar system of coordinates, and to provide an origin of angular coordinates of the local intrinsic polar system of coordinates as side information extracted from the geometric domain or the data.
28. The system according to claim 27, configured to take a surface as a geometric domain and to a take a principle curvature direction at said point as the side information used to determine the origin of the angular coordinate of the local intrinsic polar system of coordinates at each point on the surface.
29. The system according to claim 27, configured to take a direction of a minimal or maximal absolute change of the data at said point as the side information used to determine the origin of the angular coordinate of the local intrinsic polar system of coordinates at each point.
30. The system according to claim 26, wherein the patch operator is configured to output said local representation of input data in the local intrinsic polar system of coordinates, and to apply the Fourier transform with respect to angular coordinates, followed by an absolute value operation.
31. The system according to claim 22, wherein said means to apply the intrinsic convolution layer is configured to apply said patch operator to extract a local representation of input data in a local intrinsic polar system of coordinates around a point on the geometric domain, and to output the correlation of a patch resulting from the extraction with a plurality of templates, and is further configured to: rotate each template along angular coordinates; compute the correlation of the patch with the rotated template; and take a maximum operation over all the rotations.
32. The system according to claim 22, wherein said means is further configured to: compute an intrinsic distance from said point to all the other points on the geometric domain; compute level sets of said intrinsic distance at a plurality of levels; split a full angle at said point into a plurality of angular directions; and shoot rays emanating from said point along said directions perpendicular to said level sets.
33. The system according to claim 32, wherein said intrinsic distance is one of the following or an approximation thereof: geodesic distance; diffusion distance; commute time distance; or biharmonic distance.
34. The system according to claim 32, wherein said means is further configured to: compute weights localized around level sets and rays; and compute weighted averages of input data with each of said weights.
35. The system according to claim 26, wherein said means is further configured to compute the windowed Fourier transform of input data by: computing localized modulated atoms; and computing inner products of data with said atoms.
36. The system according to claim 35, wherein said localized modulated atoms are heat kernels multiplied by Laplacian eigenfunctions.
37. The system according to claim 36, wherein the heat kernels are anisotropic.
38. The system according to claim 26, wherein said means is further configured for the computation of local representation of input data around a point on geometric domain by: computing a plurality of directed anisotropic heat kernels at said point, corresponding to a plurality of anisotropy constants and directions; and computing weighted averages of input data with said directed anisotropic heat kernels.
39. The system according to claim 23, wherein the averaging operation includes one or more of the following: an arithmetic average; a maximum operation. an average weighted by weights dependent on a distance between points; and an average weighted by weights dependent on local volume elements.
40. The system according to claim 25, which stores a plurality of said cost functions, wherein each of said cost functions is associated to one or more application for which features may be extracted.
41. The system according to claim 40, including a training set as an input for the optimization procedure, comprising: a positive set of pairs of points on one or more geometric domains; and negative set of pairs points on one or more geometric domains, wherein two identical copies of the hierarchical system configured with the same parameters are fed with pairs of points from said positive and negative sets, and wherein the optimization procedure tries to minimize the distance between the outputs corresponding to positive pairs and maximize the distance between the outputs corresponding to negative pairs.
42. The system according to claim 22, further comprising means for determining similarity between two geometric objects or for determining correspondence between objects from a class of geometric objects and a reference object.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0132]
[0133]
[0134]
[0135]
[0136]
[0137]
[0138]
[0139]
[0140]
[0141]
[0142]
[0143]
[0144]
[0145]
[0146]
DETAILED DISCLOSURE
[0147] According to the idea of solutions given above, methods are hereafter described for extracting hierarchical features from data defined on a geometric domain using an intrinsic convolutional neural network endowed with, at least, one intrinsic convolution layer, according to the present invention. The methods are described with reference to the annexed drawings, provided for exemplary purposes only and without limiting the scope of the present invention.
[0148] The intrinsic convolution layer of the method of the present invention differs from the convolution layer of related art convolutional neural networks by the kind of domains onto which it is applicable. In particular, the convolution layer of related art neural networks is restricted to domains that can be modeled as Euclidean spaces (images, audio signals, etc.). Advantageously, the intrinsic convolution layer of the method of the present invention is adapted to deal with a wider category of domains that can be modeled as non-Euclidean spaces.
[0149] Advantageously, the methods of the subject invention work at an intrinsic level: the patch operator according to the present invention is defined on the geometric domain itself, as opposed to related art methods using a Euclidean representation (embedding) of non-Euclidean domains. For instance,
[0150] Hereafter, definitions of geometric domain, data defined on the geometric domain, patch operator, correlation with templates and intrinsic convolutional neural network according to the methods of the present invention are given.
[0151] A geometric domain is a non-Euclidean space. Therefore, the meaning of the two expressions in the following description is the same. Geometric domains include, but are not limited to, manifolds and graphs.
[0152] An m-dimensional manifold X is a topological space that is locally homeomorphic (topologically equivalent) to a m-dimensional Euclidean space, referred to as the tangent space. For example, the Earth (spherical surface) is locally equivalent to a plane. Additionally, the manifold X can be endowed with an inner product on the tangent space (referred to as Riemannian metric g.sub.X), which provides for a way to measure intrinsic distances on the manifold. In this case X is referred to as a Riemannian manifold.
[0153] What is commonly known as three-dimensional (3D) shapes in computer graphics and vision can be modeled as two-dimensional (2D) manifolds (surfaces), embedded in the 3D Euclidean space. The term 2D in this case refers to the intrinsic dimensionality of the manifold, and 3D to the (extrinsic) dimensionality of the embedding space.
[0154] More generally, an M-dimensional manifold embedded in a D-dimensional Euclidean space (D>M) can be represented through a parametrization y=(x.sub.1, . . . , x.sub.M), : .sup.M.fwdarw.
.sup.D, where represents the parameter space. In the case when M=2, the manifold is referred to as a parametric surface.
[0155] Another way of representing an M-dimensional manifold is in the implicit form as the M-dimensional level set of a function (x.sub.1, . . . , x.sub.D)=0. For example, a 2-dimensional manifold (surface) is represented by a function defined on the 3D Euclidean space. Often, is chosen as the distance from the manifold.
[0156] The previous models are ordered in terms of generality.
[0157] According to the present invention, the geometric domains defined in terms of one of the previous formulas are associated to a discrete approximation.
[0158] For instance, the manifold X can be associated to a discrete approximation X by sampling N points x.sub.1, . . . , x.sub.NX. The discrete approximation can be represented in different ways, including: [0159] a point cloud V.sup.N3 where for each point x.sub.i, i=1, . . . , N, is specified a coordinate in
.sup.3 stored in the ith row of the matrix V. These data are the typical output of 3D acquisition devices such as Microsoft Kinect or Intel RealSense; [0160] a mesh (V,E,F), for instance a manifold triangular mesh with vertices V
.sup.N3, edges E{1, . . . , N}{1, . . . , N}, and faces F{1, . . . , N}{1, . . . , N}{1, . . . , N}, where each interior edge (i,j)E is shared by exactly two triangular faces (i,j,k),(i,j,h)F, as represented in
[0161]
[0162] Graphs, instead, can be modeled as the pair (V,E), where V={1, . . . , N} is a set of N vertices or nodes, and E{1, . . . , N}{1, . . . , N} is the set of edges connecting the vertices. The main difference with the previous models resides in the fact that a graph is a purely topological notion, therefore the vertices are not associated with any coordinate in .sup.3.
[0163] According to the present invention, given the geometric domain X, data on X are defined in terms of functions : X.fwdarw..sup.Q, where Q represents the dimensionality of the data. When the geometric domain X is approximated with one of the previous discrete approximation X, then data can be defined in terms of maps f:V.fwdarw.
.sup.Q, where V are the vertices of the discretization X. If Q=1, than f is the N-dimensional vector f=((x.sub.1), . . . , (x.sub.N)).sup.T, where .sup.T is the transposition operator.
[0164] The steps of the method of the present invention are executed on data defined on the discrete approximation X of the geometric domain X.
[0165] Examples of such functions for defining data are: [0166] geometric information, such as the Gaussian curvature K(x.sub.i)=k.sub.M(x.sub.i)k.sub.M(x.sub.i), where k.sub.m(x.sub.i),k.sub.M(x.sub.i) represent the minimum and maximum curvature of the point x.sub.iX respectively; [0167] color information, such as color texture in the RGB color space f:V.fwdarw..sub.3, x.sub.i.fwdarw.(R.sub.i,G.sub.i,B.sub.i) where R.sub.i,G.sub.i,B.sub.i are scalars associated to the red, green, and blue channels respectively; [0168] semantic information, such as the meaning or the name of some shape parts. If, for instance, we consider the 3D shape of a human body, the function f:
can map a sparse set of vertices
, where the films are labeled with natural numbers) or more generally, any function :V.fwdarw.
.
[0169] Data on the geometric domain X may be defined also through spectral properties of the Laplace-Beltrami operator .sub.X =div.sub.X(.sub.X), a generalization of the classical Laplacian to non-Euclidean spaces.
[0170] The Laplace-Beltrami operator (LBO) is intrinsic, i.e. it does not depend on the embedding of the manifold. As a result, it is invariant to isometric (metric preserving) deformations of the manifold. If, for instance, the manifold X represent a human body, then the LBO and all the derived quantities are invariant to, e.g., pose changes (different positions of arms, legs, . . . ).
[0171] On a compact manifold, the LBO admits an eigendecomposition .sub.X.sub.k=.sub.k.sub.k with real eigenvalues {0=.sub.1.sub.2 . . . }. The eigenvalues set {.sub.k}.sub.k1 is also known as the spectrum of the LBO. The corresponding eigenfunctions {.sub.k}.sub.k1, form an orthonormal basis on L.sup.2(X), which is a generalization of the Fourier basis to non-Euclidean domains.
[0172] Any function L.sup.2(X) can be represented as the Fourier series (x)=.sub.k1,.sub.k
.sub.L.sub.
,.sub.k
.sub.L.sub.
[0173] The generalized convolution of ,gL.sup.2(X) can be defined by analogy to the classical case as the inverse transform of the product of forward transforms, (*g)(x)=.sub.k1{circumflex over ()}.sub.k.sub.k.sub.k(X) and is, in general, non shift-invariant.
[0174] The LBO can be used to describe physical processes on geometric domains. For instance, the heat diffusion on a geometric domain X can be described in the simplest setting through the isotropic heat equation:
where (x,t) denotes the amount of heat at point x and time t and .sub.0(x) is the initial heat distribution. The solution of the previous equation is obtained as (x,t)=h.sub.t(x,x)*.sub.0(x), where h.sub.t(x,x) is the heat kernel. In the spectral domain the heat kernel is expressed as h.sub.t(x,x)=.sub.k1e.sup.t.sup.
[0175] The isotropic heat equation assumes that the heat conduction properties of the manifold are constant at every point. A more general diffusion equation has the form .sub.t(x,t)=div.sub.X(D(x).sub.X(x,t)), where D(x) is the thermal conductivity tensor (in the discrete settings, the operator D(x) can be represented as the 22 matrix D) applied to the gradient in the tangent plane. The thermal conductivity tensor allows modeling heat flow that is position- and direction-dependent; the diffusion equation in this case is called anisotropic.
[0176] The eigendecomposition and the heat kernel associated with the anisotropic LBO
[0177] On a discrete approximation X of the geometric domain, the LBO .sub.X can be defined as an NN matrix L=A.sup.1W, where
where .sub.ij,.sub.ij denotes the angles ikj,ihj of the triangles sharing the edge ij, and A=diag(a.sub.1, . . . , a.sub.N) with a.sub.i=.sub.jk:ijkFA.sub.ijk being the local area element at vertex i and A.sub.ijk denoting the area of the triangle ijk.
[0178] The first KN eigenvectors and eigenvalues of L are computed by performing the generalized eigendecomposition W=A, where =(.sub.1, . . . , .sub.K) is an NK matrix containing as columns the discretized eigenfunctions and =diag(.sub.1, . . . , .sub.K) is the diagonal matrix of the corresponding eigenvalues.
[0179] Through the eigendecomposition of the LBO it is possible to define meaningful features f:V.fwdarw..sup.Q that can be considered input data for the method of the present invention. For instance, global point signature is defined as f(x)=(.sub.1.sup.1/2.sub.1(x), . . . , .sub.Q.sup.1/2.sub.Q(x)).
[0180] Other spectral shape descriptors take a generic form of the diagonal of a parametric kernel diagonalized by the LBO eigenbasis. More specifically, at each point a descriptor may be constructed as
f(x)=.sub.k1(.sub.k).sub.k.sup.2(x),
[0181] where ()=(.sub.1(), . . . , .sub.Q()) represents a bank of transfer functions. By changing the transfer functions () different shape properties are described or emphasized.
[0182] Relevant examples include: [0183] heat kernel signature (HKS) considers low-pass transfer functions .sub.t()=e.sup.t for various values of the parameter t{t.sub.1, . . . , t.sub.Q}. The resulting descriptor gives rise to the autodiffusivity function h.sub.t(x,x) (diagonal of the heat kernel), whose physical interpretation is the amount of heat remaining at x after time t; [0184] wave kernel signature (WKS) considers bans-pass transfer functions .sub.()=e.sup.((log log )/2.sup.
[0186] According to the present invention, the convolution layer is applied to geometric domains by following the interpretation of convolution as correlation with a template. This convolution layer is referred to as intrinsic convolution layer to avoid ambiguities with related art convolution layers, which are limited to be applied on Euclidean domains only and are not invariant to non-rigid deformations.
[0187] The intrinsic convolution layer comprises the steps of: [0188] extracting a local representation of the input data around a point xX. Such local representation is hereafter referred as patch and the operator that extract the patches from the geometric domain X is referred as patch operator; [0189] computing the correlation between the patches with a plurality of templates.
[0190] According to an aspect of the present invention, the patch operator may be defined in different ways. In particular, in the following description, three different patch operators are defined through which, respectively: [0191] data can be represented in a local polar system of coordinates; [0192] data can be transformed by a windowed Fourier transform; [0193] data can be weighted by anisotropic heat kernels.
[0194] The patch operator takes in input data on the geometric domain and a point on said domain, and outputs the local representation of said input data around said point, wherein the local representation, depending on the operator used, is data represented in a local intrinsic polar system of coordinates, data transformed by a windowed Fourier transform, or data weighted by anisotropic diffusion kernels.
[0195] Advantageously, the patch operator to be used is selected depending on the kind of geometric domain on which data are defined. For instance, the construction of a patch as a local polar system of coordinates is limited to meshes only.
[0196] Moreover, depending on which patch operator is used, the step of correlating with a template is adapted. In the following, details of each of the three patch operators and how to correlate the extracted patches with a template are described.
[0197] According to one aspect of the proposed invention, the patch operator computes a local representation of the data around each input vertex of the geometric domain by interpolating the data on a local polar system of coordinates previously extracted. In more detail, given a point xX, such patch operator is constructed following two steps: [0198] construction of a local polar system of coordinates on the geometric domain X, i.e. a bijection (x):B.sub..sub.
[0200] The construction of a local polar system of coordinates includes the following steps: [0201] extraction of intrinsic distance from the point xX to all the other points of the geometric domain X. Examples of intrinsic distances include: [0202] geodesic distance, which measures the length of the shortest path along the geometric domain X between two points x,xX. The main drawbacks of such distance are that it requires a mesh structure and it is sensitive to topological noise (e.g., holes in the geometric domains caused by an imprecise acquisition of the 3D geometry); [0203] diffusion distance, which measures the distance between two points x,xX by the diffusion process d.sub.X.sup.2(x,x)=.sub.k1e.sup.t.sup.
is the Green's function of the biharmonic operator .sup.2. When represented in terms of eigenfunctions and eigenvalues of the LBO, the biharmonic distance formula differs from the commute time distance one only slightly:
[0208] The previous steps are repeated for all the points of the geometric domain X In particular,
[0209] Given the angular and radial coordinates, the interpolation weights (x) can be defined as:
and w.sub.(x,x), w.sub.(x,x) are the radial and angular interpolation weights, respectively.
[0210] For instance, the radial interpolation weights can be defined as a Gaussian w.sub.(x,x)e.sup.(d.sup.
[0211] The angular interpolation weights can be defined as a Gaussian w.sub.(x,x)e.sup.d.sup.
[0212] If the geometric domain is approximated with a (manifold triangular) mesh, the local system of coordinates previously described is sampled at N.sub. angular and N.sub. radial bins, obtaining in this way a discrete local system of coordinates. In more detail, with reference to
[0216] As a consequence, the patch operator can be represented in the discrete domain as a N.sub.N.sub.NN matrix applied to the data f on the mesh and producing the patches at each vertex. Conveniently, the matrix is very sparse since the values of the function at a few nearby vertices only contribute to each local polar bin.
[0217] Once the patch operator (x) is computed, the convolution of L.sup.2(X) with a template a(,) is computed as (*a)(x)=.sub.,ra(+,r)((x))(r,), where a(,) is a template (or filter) applied on the patch (i.e. expressed in the same local representation than the data ). Due to angular coordinate ambiguity, the filter can be rotated by arbitrary angle .
[0218]
[0219] In order to deal with the angular ambiguity, the method of the present invention may consider as side information to determine the origin of the angular coordinate of the local intrinsic polar system of coordinates, one of the following: [0220] principal curvature direction at said point (only if geometric domain is a surface); [0221] direction of minimal/maximal absolute change of the data at said point;
[0222] According to another aspect of the present invention, if no side information is available, it is known that the angular ambiguity can be removed by first applying the Fourier transform with respect to the angular coordinate and then taking the absolute value, i.e. |.sub. e.sup.i((x)(x))(,)|. The Fourier transform translates rotational ambiguity into complex phase ambiguity, which is removed by the absolute value.
[0223] A further aspect of the method removes angular ambiguity by means of an angular max pooling procedure including the steps of: [0224] creating N.sub.1 additional copies of the template a(,) by rotating it along the angular component by the angular bin magnitude, obtaining this way a total of N.sub. templates; [0225] computing the correlation .sub.,ra(+,r)((x))(r,) of the patch with all the N.sub. rotated templates; [0226] taking the maximum operation over all the rotations, which corresponds to selecting the template with the maximum response to the given patch among the N.sub. rotated templates.
[0227] On the basis of the idea that convolution in the spatial domain corresponds to a multiplication in the frequency domain, the method of the present invention defines the intrinsic convolution layer by applying a vertex-frequency analysis or a windowed Fourier transform (WFT) to the input data.
[0228] The computation of the patch operator using the WFT requires two steps: [0229] the computation of the localized modulated atoms; [0230] the computation of inner products with said atoms.
[0231] In the Euclidean domain, classical WFT analyzes the frequency content of a signal that is localized by means of a window. Given a function L.sup.2(), and a window g localized at zero, the WFT is computed as (S).sub.x,=
(x)g(xx)e.sup.ixdx. WFT can also be represented as an inner product with a translated and modulated window, (S).sub.x,=
,M.sub.T.sub.xg
.sub.L.sub.
.sub.), where T.sub.x represent the translation operator and M.sub. the modulation operator, respectively.
[0232] The method of the present invention extends the notion of WFT to geometric domains by defining translation operator as (T.sub.x)(x)=.sub.k1{circumflex over ()}.sub.k.sub.k(x).sub.k(x) and by defining the modulation operators as (M.sub.k)(x)=.sub.k(x)(x), where {circumflex over ()} is the Fourier transform of the input data and {.sub.k}.sub.k1 can be the isotropic LBO eigenfunctions or the anisotropic LBO eigenfunctions.
[0233]
[0237] Combining the two operators together, a modulated and translated window M.sub.T.sub.xg, called atom can be obtained. Accordingly, on geometric domains (S).sub.x,=,M.sub.T.sub.xg
.sub.L.sub.
,.sub.l.sub.k
.sub.L.sub.
[0238] If a discrete approximation of the geometric domain is provided, the WFT can be computed as Sf=(f).sup.TA(
.sup.T), where is the K-dimensional vector representing the window in the frequency domain, f is the N-dimensional vector representing the input function, and (a
B).sub.ij=a.sub.iB.sub.ij denotes the known Hadamard product, i.e. element-wise multiplication of a vector and a matrix, replicating the vector along the second dimension of the matrix. The resulting WFT is a matrix of size KN.
[0239] If the patch operator is provided by means of WFT, the correlation with a template a reduces to a simple multiplication, i.e. (*a)(x)=.sub.p=1.sup.P.sub.k=1.sup.Ka.sub.qpk|(S.sub.p).sub.x,k|, q=1, . . . , Q, where p=1, . . . , P represent the dimension of the input function and the absolute value is considered to reduce the effect of the LBO eigenfunctions sign ambiguities.
[0240] Advantageously, according to the subject invention, the Fourier transform of the window can not only be a fixed filter, but can also be parameterized in some fixed interpolation basis in the frequency domain (e.g. the B-spline basis ={.sub.1, . . . , .sub.Q} on the LBO spectrum), ()=.sub.m=1.sup.Mb.sub.pm.sub.m(), p=1, . . . , P, where the PM matrix (b.sub.pm) of learnable weights defines the windows.
[0241] According to another embodiment of the present invention, the patch operator can be defined through anisotropic heat kernels. In this setting, the main idea is to build a local representation of the given data around the point x by averaging a variety of anisotropic heat kernels at different directions. Such kernels capture local directional structures similarly to the local polar system of coordinates previously described.
[0242] The construction of such local kernels requires two steps: [0243] the computations of the thermal tensor D.sub.(x) which favors the heat diffusion towards a certain direction. The parameter controls the degree of anisotropy; [0244] the computation of the rotated heat kernels D.sub.(x)=R.sub.D.sub.(x)R.sub..sup.T for various angles .
[0245] For instance, D.sub.(x) can be defined as
where
and k.sub.m(x),k.sub.M(x) are the minimum and maximum curvature, respectively. In this situation, D.sub.(x) drives the diffusion in the direction of the maximum curvature k.sub.M(x).
[0246]
[0247]
[0248] This approach offers various advantages over the local polar coordinate system, including: [0249] no angular ambiguity; [0250] no restriction on the diameter of the local representation to guarantee valid topological disks; [0251] no limit to meshes only.
[0252] If the geometric domain is approximated with a triangular mesh X, the anisotropic LBO is defined as
where .sub.kj,.sub.ki
.sub.H=.sub.kj.sup.TH.sub.kj,
and U.sub.ijk=(.sub.M,,{circumflex over (n)}) is an orthonormal reference frame attached to each triangle ijkF. The shear matrix H encodes the anisotropic scaling operated by D.sub. up to an orthogonal basis change. If D.sub.=I,.sub.kj,.sub.ki
.sub.H=cos .sub.ij and the isotropic LBO discretization is obtained.
[0253] In order to allow arbitrary direction, the basis vectors U.sub.ijk is rotated on each triangle around the respective normal {circumflex over (n)} by the angle , equal for all triangles. Denoting as R.sub. the corresponding 33 rotation matrix, this is equivalent to modifying the H-weighted inner product with the directed shear matrix H.sub.=R.sub.HR.sub..sup.T. The resulting weights w.sub.ij are thus obtained by using the inner products .sub.kj,.sub.ki
.sub.H=.sub.kj.sup.TH.sub..sub.kj.
[0254] If the geometric domain is approximated with a point cloud instead, the procedure follows the following steps: [0255] for each point x in the point cloud, a local tangent plane is estimated by considering the points within an -ball B.sub.(x) using known techniques; [0256] the points in B.sub.(x) are projected onto the tangent plane; [0257] a Delaunay triangulation of the points projected on the tangent plane is constructed.
[0258] Once this local mesh triangulation is provided, the previous formulas still hold.
[0259] Following the same construction of known convolutional neural networks, the proposed method combines other known layers with the intrinsic convolution layer either to achieve better performance or to adapt the presented method to the desired application.
[0260] In particular, the proposed method includes several layers that can be applied subsequently, such that the output of the previous layer is used as input into the subsequent one to form a hierarchy of transformations. Other known layers include: [0261] linear layer or fully connected layer, given a P-dimensional input f.sup.in=(.sub.1.sup.in(x), . . . , .sub.P.sup.in(x)) produces a Q-dimensional output f.sup.out=(.sub.1.sup.out(x), . . . , .sub.Q.sup.out(x)) as linear combination of the input channels with a set of learnable parameters w, .sub.q.sup.out(x)=.sub.p=1.sup.Pw.sub.qp.sub.p.sup.in, q=1, . . . , Q; [0262] non-linear layer, optionally it follows a linear layer by applying a non-linear function to the output of the linear layer, i.e. .sub.q.sup.out(x)=(.sub.p.sup.in), q=1, . . . , Q. The non-linear functions are often referred to as activation functions. Most common activation functions include, but are not limited to: [0263] ReLU, (.sub.p.sup.in)=max{0,.sub.p.sup.in}; [0264] logististic,
[0270] computing an averaging operation on input data over the neighbors for all the points on said subset.
[0271] Averaging operations include the known [0272] arithmetic mean,
[0274] and the novel [0275] weighted average with weights depending on an intrinsic distance between points, (x.sub.i)=.sub.j=1.sup.Md.sub.X(x.sub.i,x.sub.i.sup.j)(x.sub.i.sup.j), where d.sub.X(;) is one of the intrinsic distances previously mentioned; [0276] weighted average with weights depending on local volume elements (x.sub.i)=.sub.j=1.sup.Mvol.sub.X(x.sub.i.sup.j)(x.sub.i.sup.j), where vol.sub.X(x.sub.i.sup.j) can be the area of a local patch around the point x.sub.i.sup.j; [0277] covariance layer, it is used in applications such as retrieval where one needs to aggregate the point-wise descriptors to produce a global shape descriptor f.sup.out(x)=.sub.X(f.sup.in(x))(f.sup.in(x)).sup.Tdx, where f.sup.in(x)=(.sub.1.sup.in(x), . . . , .sub.P.sup.in(x)) is a P-dimensional input vector, =.sub.Xf.sup.in(x)dx, and f.sup.out(x) is a PP matrix column-stacked into a P.sup.2-dimensional vector. Typically, it is considered just before the output layer.
[0278] Each of the previous layers has input data and output data, and the output data of one layer in the sequence can be given as input to any subsequent layer, or layers, in the sequence. The present invention does not impose any constraint on the composition of said layers, including any limit on the order or the number of layers to consider.
[0279] Such sequence of customized layers can be thought of as a non-linear hierarchical parametric function .sub.(F), where F=(f(x.sub.1), . . . , f(x.sub.N)) is a PN matrix of input features at all the points of the geometric domain, and denotes the parameters of all the layers, and .sub.(F)=.sub.N.sub.N1 . . . .sub.0(F), where .sub.i indicates one of the aforementioned layers.
[0280] Advantageously, depending on the application in mind, the present invention allows to learn these parameters by minimizing some task specific cost function. The present invention allows learning of the parameters , i.e. intrinsic convolution templates, b-spline interpolation weights, linear combination weights, etc.
[0281] The applications that the proposed invention can deal with, include but are not necessarily limited to: [0282] invariant descriptors, i.e. finding descriptors that are the most similar as possible for corresponding points, or positives, and the most dissimilar as possible at non-corresponding points, or negatives; [0283] correspondences, i.e. labeling each vertex of a query geometric domain X with the index of a corresponding point on some reference geometric domain Y. [0284] shape retrieval, i.e. producing a global shape descriptor that discriminates between shape classes.
[0285] For this purpose, the present invention considers a siamese network configuration, composed of two identical copies of the same intrinsic convolutional neural network model sharing the same parameterization and fed by pairs of knowingly similar or dissimilar samples, and minimize a cost function such as, l()=(1).sub.i=1.sup.|T.sup.
[0286] Other examples of cost functions do not rely on the siamese network construction, therefore they can be minimized by only one intrinsic convolutional neural network. Example of such cost functions are: [0287] multinomial regression loss, l()=.sub.i=1.sup.|T|e.sub.ji log .sub.(f.sub.i), where e.sub.i is a unit vector with one at index i, and T={f(x.sub.i),j.sub.i} represents a set of known correspondences. [0288] regression loss, l()=.sub.i=1.sup.|T|.sub.(f.sub.i)j.sub.i.sub.p, where j.sub.i represents the desired output of the model (teacher signal) and .sub.p denotes the p-norm (usually p=2). As means of example j.sub.i can be the spectral coefficients of some target functions.
[0289] In some embodiments, the methods and processes described herein can be embodied as code and/or data. The software code and data described herein can be stored on one or more machine-readable media (e.g., computer-readable media), which may include any device or medium that can store code and/or data for use by a computer system. When a computer system reads and executes the code and/or data stored on a computer-readable medium, the computer system performs the methods and processes embodied as data structures and code stored within the computer-readable storage medium.
[0290] It should be appreciated by those skilled in the art that machine-readable media (e.g., computer-readable media) include removable and non-removable structures/devices that can be used for storage of information, such as computer-readable instructions, data structures, program modules, and other data used by a computing system/environment. A computer-readable medium includes, but is not limited to, volatile memory such as random access memories (RAM, DRAM, SRAM); and non-volatile memory such as flash memory, various read-only-memories (ROM, PROM, EPROM, EEPROM), magnetic and ferromagnetic/ferroelectric memories (MRAM, FeRAM), and magnetic and optical storage devices (hard drives, magnetic tape, CDs, DVDs); network devices; or other media now known or later developed that is capable of storing computer-readable information/data. Computer-readable media should not be construed or interpreted to include any propagating signals. A computer-readable medium that can be used with embodiments of the subject invention can be, for example, a compact disc (CD), digital video disc (DVD), flash memory device, volatile memory, or a hard disk drive (HDD), such as an external HDD or the HDD of a computing device, though embodiments are not limited thereto. A computing device can be, for example, a laptop computer, desktop computer, server, cell phone, or tablet, though embodiments are not limited thereto.
[0291] In some embodiments, one or more (or all) of the steps performed in any of the methods of the subject invention can be performed by one or more processors (e.g., one or more computer processors). For example, any or all of the means for applying an intrinsic convolution layer, the means for applying one or more layers as described herein, the means for determining similarity between two geometric objects, and the means for determining correspondence between objects from a class of geometric objects and a reference object can include or be a processor (e.g., a computer processor) or other computing device.
[0292] It should be understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application.
[0293] All patents, patent applications, provisional applications, and publications referred to or cited herein are incorporated by reference in their entirety, including all figures and tables, to the extent they are not inconsistent with the explicit teachings of this specification.