SALIENCY PREDICTION METHOD AND SYSTEM FOR 360-DEGREE IMAGE

Abstract

The present disclosure provides a saliency prediction method and system for a 360-degree image based on a graph convolutional neural network. The method includes: firstly, constructing a spherical graph signal of an image of an equidistant rectangular projection format by using a geodesic icosahedron composition method; then inputting the spherical graph signal into the proposed graph convolutional neural network for feature extraction and generation of a spherical saliency graph signal; and then reconstructing the spherical saliency graph signal into a saliency map of an equidistant rectangular projection format by using a proposed spherical crown based interpolation algorithm. The present disclosure further proposes a KL divergence loss function with sparse consistency. The method can achieve excellent saliency prediction performance subjectively and objectively, and is superior to an existing method in computational complexity.

Claims

1. A saliency prediction method for a 360-degree image based on a graph convolutional neural network, comprising: constructing a graph signal of a 360-degree image of an equidistant rectangular projection format by using a geodesic icosahedron projection technology, and generating a spherical graph signal; inputting the generated spherical graph signal into the graph convolutional neural network for feature extraction, and outputting a single-channel saliency spherical graph signal with a size identical to that of the input spherical graph signal; interpolating the output single-channel saliency spherical graph signal by using a spherical crown based interpolation algorithm, so as to convert the single-channel saliency spherical graph signal into an image of an equidistant rectangular projection format, and further reconstruct a 360-degree saliency map of the equidistant rectangular projection format; and predicting saliency of the 360-degree image according to the reconstructed 360-degree saliency map of the equidistant rectangular projection format; wherein the geodesic icosahedron projection technology adopts a geodesic icosahedron composition method in a spherical graph convolutional network (SGCN), and the geodesic icosahedron composition method comprises: firstly, constructing a largest internal icosahedron for the 360-degree image, wherein twelve vertices of the icosahedron are used as a spherical graph signal of level 0; then, taking a midpoint of each edge of the icosahedron and extending a center of a sphere to intersect with a spherical surface through a midpoint structural ray, wherein intersecting nodes are new sampling points; combining the new sampling points with sampling points of level 0 to form a spherical graph signal of level 1; and based on the obtained new sampling points, repeating the process continuously to generate a higher-level spherical graph signal, which is the generated spherical graph signal.

2. The saliency prediction method according to claim 1, wherein the graph convolutional neural network comprises a graph convolutional layer, a graph pooling layer and a graph unpooling layer; the graph convolutional layer adopts a convolution operation in a Chebyshev network (ChebNet) to extract features of a spherical graph signal; the graph pooling layer adopts a rotation equivariant pooling operation in SGCN to down-sample the spherical graph signal; and the graph unpooling layer introduces feature information of neighboring nodes in an unpooling process, and up-samples the spherical graph signal.

3. The saliency prediction method according to claim 2, wherein the graph convolutional neural network adopts an encoder-decoder network structure and comprises: an encoder comprising 5 graph convolutional layers and 4 graph pooling layers, and encoding the input spherical graph signal into a high-dimensional graph signal with the size of 1/256 of an original size; a decoder comprising 5 graph convolutional layers and 4 graph unpooling layers, and decoding the high-dimensional graph signal encoded by the encoder into a one-dimensional graph signal with a same size as the input spherical graph signal to represent saliency distribution; and inputs of first 4 graph convolutional layers of the decoder are respectively constituted by connection of the output of the previous graph convolutional layer with a feature graph with a same number of nodes in the decoder part.

4. The saliency prediction method according to claim 1, wherein the graph convolutional neural network uses a Kullback-Leibler (KL) divergence loss function with a sparse consistency feature for network training; and the KL divergence loss function KL.sub.sc is expressed as: $K L_{s c} = \frac{λ}{1 + λ} K L_{s} + \frac{1}{1 + λ} K L_{h i s t}$ where: $K L_{s} = {.Math.}_{t = 1}^{N} G_{g t} (v_{t}) \log [\frac{G_{g t} (v_{t})}{G_{s} (v_{t})}],$ $K L_{h i s t} = {.Math.}_{i = 1}^{255} h i s t (G_{g t}) (i) \log [\frac{h i s t (G_{g t}) (i)}{h i s t (G_{s}) (i)}]$ where: G.sub.gt(ν.sub.t) and G.sub.s(ν.sub.t) represent a spherical graph signal constructed by a true saliency map and a spherical saliency graph signal predicted by the network, respectively; KL.sub.s represents the traditional KL divergence loss of G.sub.gt(ν.sub.t) and G.sub.s(ν.sub.t); hist( ▪ ) represents histogram solution of a vector, which here means solution of histogram distribution of values of the spherical graph signal; then the histogram distribution of G.sub.gt(ν.sub.t) and G.sub.s(ν.sub.t) is calculated to obtain KL.sub.hist; and finally the loss function KL.sub.sc with sparse consistency is obtained by introducing a weighting λ.

5. The saliency prediction method according to claim 1, wherein the spherical crown based interpolation algorithm comprises: firstly, calculating spherical coordinates of grid points of a standard equidistant rectangular projection format; secondly, constructing a spherical crown with a fixed size on the spherical surface with each grid point as the center; then, counting all the nodes falling on the spherical crown in a single-channel saliency spherical graph signal and calculating an Euclidean distance between the nodes and the center of the spherical crown; and finally, calculating pixel values of the center of the spherical crown, i.e., the grid points of the equidistant rectangular projection format, by inverse distance weighting of all the nodes falling on the spherical crown in the single-channel saliency spherical graph signal, and reconstructing the 360-degree saliency map of the equidistant rectangular projection format.

6. The saliency prediction method according to claim 1, further comprising: smoothing the obtained 360-degree saliency map of the equidistant rectangular projection format by a Gaussian kernel to obtain a smoother saliency map.

7. A saliency prediction system for a 360-degree image based on a graph convolutional neural network, comprising: a graph signal construction module configured to construct a graph signal of a 360-degree image of an equidistant rectangular projection format by using a geodesic icosahedron composition module, and generate a spherical graph signal; a graph convolutional network module configured to input the generated spherical graph signal into the graph convolutional neural network for feature extraction, and output a single-channel saliency spherical graph signal with a size identical to that of the input spherical graph signal; and an interpolation and reconstruction module configured to interpolate the output single-channel saliency spherical graph signal by using a spherical crown based interpolation algorithm, so as to convert the single-channel saliency spherical graph signal into an image of an equidistant rectangular projection format, and further reconstruct a 360-degree saliency map of the equidistant rectangular projection format, and predict saliency of the 360-degree image according to the reconstructed 360-degree saliency map of the equidistant rectangular projection format; wherein the geodesic icosahedron projection technology adopts a geodesic icosahedron composition method in SGCN, and the geodesic icosahedron composition method comprises: firstly, constructing a largest internal icosahedron for the 360-degree image, wherein twelve vertices of the icosahedron are used as a spherical graph signal of level 0; then, taking a midpoint of each edge of the icosahedron and extending a center of a sphere to intersect with a spherical surface through a midpoint structural ray, wherein intersecting nodes are new sampling points; combining the new sampling points with sampling points of level 0 to form a spherical graph signal of level 1; and based on the obtained new sampling points, repeating the process continuously to generate a higher-level spherical graph signal, which is the generated spherical graph signal.

8. The saliency prediction system according to claim 7, wherein the graph convolutional neural network adopts an encoder-decoder network structure, and comprises an encoder comprising 5 graph convolutional layers and 4 graph pooling layers, and encoding the input spherical graph signal into a high-dimensional graph signal with the size of 1/256 of the original size; a decoder comprising 5 graph convolutional layers and 4 graph unpooling layers, and decoding the high-dimensional graph signal encoded by the encoder into a one-dimensional graph signal with the same size as the input spherical graph signal to represent saliency distribution; and inputs of first 4 graph convolutional layers of the decoder are respectively constituted by connection of the output of the previous graph convolutional layer with a feature graph with the same number of nodes in the decoder part.

9. The saliency prediction system according to claim 8, wherein the system is defined as a graph node level regression model; and the saliency prediction result is an objective optimization problem of the regression model, so that: $\min_{θ_{k}} {.Math.}_{t = 0}^{N} d i s t |G I C O P i x (E_{g t}) [v_{t}], N_{G} (G I C O P i x (E_{i})) [v_{t}]|$ where: E.sub.i and E.sub.gt respectively represent the 360-degree image of the equidistant rectangular projection format and a corresponding true saliency map, which are constructed into spherical graph signals with the same number of nodes by the geodesic icosahedron composition module GICOPix(▪); then, the constructed spherical graph signals are input into the graph convolutional neural network N.sub.G( ▪ ) to generate the saliency spherical graph signals predicted by the network; the objective optimization process is to optimize learnable weights θ.sub.k, so that the distance between the saliency spherical graph signal output by the graph convolutional neural network and the true saliency graph signal is as small as possible, thereby realizing training of the regression model.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0033] Other features, objects and advantages of the present disclosure will become more apparent by reading the detailed description of the non-limited embodiments with reference to the following drawings:

[0034] FIG. 1 is a flow chart of a saliency prediction method for a 360-degree image based on a graph convolutional neural network in an embodiment of the present disclosure;

[0035] FIG. 2 is a schematic diagram of spherical graph signal construction in an embodiment of the present disclosure;

[0036] FIG. 3 is a schematic structural diagram of a graph convolutional neural network in an embodiment of the present disclosure;

[0037] FIG. 4 is a schematic diagram of a graph pooling method in an embodiment of the present disclosure;

[0038] FIG. 5 is a schematic diagram of a graph unpooling method in an embodiment of the present disclosure; and

[0039] FIG. 6 is a schematic diagram of a spherical crown based interpolation algorithm in an embodiment of the present disclosure.

DETAILED DESCRIPTION OF THE INVENTION

[0040] The present disclosure will be described in detail with reference to the following specific embodiments. The following embodiments will help those skilled in the art further understand the present disclosure, but will not limit the present disclosure in any way. It should be pointed out that for those skilled in the art, several modifications and improvements can be made without departing from the concept of the present disclosure, which all belong to the scope of protection of the present disclosure.

[0041] An embodiment of the present disclosure provides a saliency prediction method for a 360-degree image based on a graph convolutional neural network. Aiming at the problems of poor prediction performance and high calculation cost or the like in the prior art, the method includes: firstly generating a spherical graph signal from a 360-degree image by a geodesic icosahedron composition method, then using a graph convolutional network to extract features of a spherical image and generate a saliency spherical graph signal, then reconstructing the graph signal into a 360-degree image of an equidistant rectangular projection format by an interpolation algorithm, and finally obtaining a final result by Gaussian kernel smoothing. The present embodiment avoids an interpolation operation of a feature graph in the convolution process to ensure the performance of the method, and greatly reduces calculation cost and improves prediction efficiency.

[0042] The saliency prediction method for a 360-degree image based on a graph convolutional neural network provided by the present embodiment includes the following steps: [0043] step 1, constructing a graph signal of a 360-degree image of an equidistant rectangular projection format to generate a spherical graph signal; [0044] step 2, inputting the spherical graph signal obtained in step 1 into the graph convolutional neural network for feature extraction, and generating a single-channel saliency spherical graph signal with a size identical to that of the input spherical graph signal; and [0045] step 3, reconstructing the saliency spherical graph signal output in step 2 by using a spherical crown based interpolation algorithm into a 360-degree saliency map of the equidistant rectangular projection format.

[0046] As a preferred embodiment, in step 1, geodesic icosahedron projection in a spherical graph convolutional network (SGCN) is used to generate the spherical graph signal. A specific construction method includes: [0047] (1) firstly, constructing a largest internal icosahedron for a sphere, wherein twelve vertices of the icosahedron are used as a spherical graph signal of level 0; [0048] (2) then, taking a midpoint of each edge of the icosahedron and extending a center of a sphere to intersect with a spherical surface through a midpoint structural ray, wherein intersecting nodes are new sampling points; combining the new sampling points with sampling points of level 0 to form a spherical graph signal of level 1; and [0049] (3) repeating the process of (2) iteratively for the spherical graph signal to generate a higher-level spherical graph signal, which is a generated spherical graph signal.

[0050] As a preferred embodiment, in step 2, the graph convolutional neural network includes a graph convolutional layer, a graph pooling layer and a graph unpooling layer: [0051] the graph convolutional layer adopts a convolution operation in a Chebyshev network (ChebNet) to extract features of a spherical graph signal; [0052] the graph pooling layer adopts a rotation-equivariant pooling operation in SGCN to down-sample the spherical graph signal; and [0053] the graph unpooling layer uses a graph unpooling operation, namely, the graph unpooling layer introduces feature information of neighboring nodes in an unpooling process to up-sample the spherical graph signal.

[0054] As a preferred embodiment, in step 2, the graph convolutional neural network adopts an encoder-decoder network structure similar to U-net, and includes: [0055] an encoder including 5 graph convolutional layers and 4 graph pooling layers, and encoding the input spherical graph signal into a high-dimensional graph signal with the size of 1/256 of an original size; and [0056] a decoder including 5 graph convolutional layers and 4 graph unpooling layers, and decoding the high-dimensional graph signal encoded by the encoder into a one-dimensional graph signal with a same size as the input spherical graph signal to represent saliency distribution.

[0057] Particularly, inputs of first 4 graph convolutional layers of the decoder are respectively constituted by connection of the output of the previous graph convolutional layer with a feature graph with a same number of nodes in the decoder part.

[0058] As a preferred embodiment, in step 2, the graph convolutional neural network uses a KL (Kullback-Leibler) divergence loss function with a sparse consistency feature for network training; and the KL divergence loss function KL.sub.sc is expressed as:

[00005] $K L_{s c} = \frac{λ}{1 + λ} K L_{s} + \frac{1}{1 + λ} K L_{h i s t}$

where:

[00006] $K L_{s} = {.Math.}_{t = 1}^{N} G_{g t} (v_{t}) \log [\frac{G_{g t} (v_{t})}{G_{s} (v_{t})}],$

[00007] $K L_{h i s t} = {.Math.}_{i = 1}^{255} h i s t (G_{g t}) (i) \log [\frac{h i s t (G_{g t}) (i)}{h i s t (G_{s}) (i)}]$

where: G.sub.gt(v.sub.t) and G.sub.s(v.sub.t) represent a spherical graph signal constructed by a true saliency map and a spherical saliency graph signal predicted by the network, respectively; KL.sub.s represents the traditional KL divergence loss of G.sub.gt(v.sub.t) and G.sub.s(v.sub.t); hist(.Math.) represents histogram solution of a vector, which here means solution of histogram distribution of values of the spherical graph signal; then the histogram distribution of G.sub.gt(v.sub.t) and G.sub.s(v.sub.t) is calculated to obtain KL.sub.hist; and finally the loss function KL.sub.sc with sparse consistency is obtained by introducing a weighting λ. Specifically, the loss function includes KL divergence directly calculated for the graph signal and the KL divergence of the histogram distribution of the graph signal. For the KL divergence directly calculated for the graph signal, the KL divergence is directly calculated for the graph signal output by the graph convolutional neural network and the graph signal constructed by the true saliency graph. For the KL divergence of the histogram of the graph signal, firstly, the histogram distribution of the graph signal output by the network and the graph signal constructed by the true saliency graph is calculated; and then the KL divergence of the calculated histogram distribution is calculated. Finally, the KL divergence of the two parts is weighted to obtain the final KL divergence loss function with a sparse consistency feature, so as to achieve the similarity of both spatial distribution and numerical distribution.

[0059] As a preferred embodiment, in step 3, the spherical crown based interpolation algorithm is used to realize conversion of the spherical graph signal to the image of the equidistant rectangular projection format. Specifically, the process is divided into the following steps: [0060] (a) calculating spherical coordinates of all grid points of an equidistant rectangular projection format; [0061] (b) constructing a spherical crown area with each grid point as the center; [0062] (c) determining nodes falling on each spherical crown area in the spherical graph signal; [0063] (d) calculating an Euclidean distance between the nodes on each spherical crown area and the center of the spherical crown; [0064] (e) calculating pixel values of the center of the spherical crown by inverse distance weighting of the nodes in each spherical crown area through the distances in (d); and [0065] (f) calculating a grid position of the equidistant rectangular projection format for the center of each spherical crown, wherein the pixel value of the spherical crown is the pixel value of the corresponding grid point.

[0066] As a preferred embodiment, the method further includes: step 4, smoothing the saliency map of the equidistant rectangular projection format obtained in step 3 by a Gaussian kernel to obtain a smoother saliency map.

[0067] The method provided by the present embodiment is further described below with reference to the following drawings.

[0068] As shown in FIG. 1, specific implementation of the method includes the following steps: [0069] 1. Constructing a spherical graph signal for a spherical image [0070] As shown in FIG. 2, an example of spherical graph signal construction is analyzed; and a 360-degree image of an equidistant rectangular projection format is defined as E.sub.i(x, y) ∈ ℝ.sup.W×H×3, where 3 represents a RGB channel. Then, the graph signal is constructed by a composition method based on a geodesic icosahedron; and the specific method specifically includes: Particularly, the relationship between the number of nodes of G.sub.1 and l is N.sub.l = 10 × 2.sup.2l + 2. [0071] (1) firstly, constructing a largest internal icosahedron for a sphere, wherein twelve vertices of the icosahedron are used as a spherical graph signal G.sub.0 of level 0; [0072] (2) then, taking a midpoint of each side of the icosahedron and extending a center of a sphere to intersect with a spherical surface through a midpoint structural ray, wherein intersecting nodes are new sampling points; combining the new sampling points with G.sub.0 to form a spherical graph signal G.sub.1 of level 1; and [0073] (3) repeating the process of (2) continuously to obtain a higher-level spherical graph signal G.sub.1. [0074] 2. Inputting the spherical graph signal into the graph convolutional network to generate a spherical saliency graph signal [0075] As shown in FIG. 3, the graph convolutional neural network structure uses a network structure similar to U-net, wherein a Chebyshev network (ChebNet) is used in graph convolution. Specifically, for the graph signal G.sub.l(V, ε, W) (V, ε, W respectively represent the node set, edge set and adjacency matrix of the graph signal) constructed in 1, a normalized Laplacian matrix thereof can be defined as L = I.sub.N –D.sup.-½WD.sup.-½, in which D represents a degree matrix; and I.sub.N is the N-order identity matrix. Further convolution operation of the Chebyshev network (ChebNet) is defined as: [0076] Where: K represents the order of Chebyshev polynomial; θ.sub.k is the coefficient of Chebyshev polynomial; where λ.sub.max represents the largest eigenvalue of L; and T.sub.k(L) = 2LT.sub.k–1(L) – T.sub.k–2(L), where T.sub.0 = I.sub.N, T.sub.1 = L. [0077] FIG. 4 shows a graph pooling operation of the graph convolutional network. Here, the graph pooling operation in SGCN is directly adopted. Specifically, for the graph signal G.sub.l of level 1, after the graph pooling operation, the newly added nodes from G.sub.l–1 to G.sub.l will be discarded; and only the values of the node positions of G.sub.l–1 will be kept. In addition, N.sub.l/N.sub.l–1 ≈ 4, namely the scaling ratio of traditional pooling operation is roughly realized. FIG. 4 shows the graph signal change relationship after multiple pooling operations in the graph convolutional network, where Layer h (Level l) represents the level l graph signal corresponding to the h layer of the convolutional network; Layer h+1 (Level l-1) represents the level l-1 graph signal corresponding to the h+1 layer of the convolutional network; and Layer h+2 (Level l-2) represents the level l-2 graph signal corresponding to the h+2 layer of the convolutional network. [0078] FIG. 5 shows the proposed graph unpooling operation, which aims to realize the inverse operation of graph pooling operation. Specifically, a method of linear interpolation is used to up-sample the graph signal. V.sub.l is first defined as the node set of G.sub.l, then the unpooling operation can be expressed as follows: [0079] Where: represents the k-th node in G.sub.l; Pos(.Math.) means returning of the rectangular coordinates of the node; and Nei(.Math.) means returning of the two neighboring nodes of the node in G.sub.l-1. The whole interpolation process can be understood as replacing the newly added node with the average value of neighboring nodes thereof while keeping the original node. [0080] In addition, for network training, a KL divergence loss with sparse consistency is used for network training. Specifically, the whole loss function is divided into two parts, one of which is the KL divergence between the spherical graph signal output by the network and the true spherical saliency graph signal: where G.sub.s(ν.sub.t) and G.sub.gt(ν.sub.t) respectively represent the spherical graph signal output by the network and the true spherical saliency graph signal. Then, in order to make G.sub.gt(v.sub.t) and G.sub.s(v.sub.t) have similarity in sparsity, the histogram distributions of G.sub.gt(v.sub.t) and G.sub.s(v.sub.t) are further calculated; and the KL divergence between the histogram distributions is calculated: where hist(.Math.) is the operation of calculating the histogram. [0081] 3. Using the spherical saliency graph signal output in 2 to obtain a 360-degree image of an equidistant rectangular projection format by a spherical crown based interpolation algorithm [0082] As shown in FIG. 6, for an interpolation operation, grid point coordinates (m.sub.i, n.sub.i), 1 ≤ i ≤ W × H of the equidistant rectangular projection format are defined at first, wherein R represents the radius of the sphere; P represents an arbitrary point on the edge of the spherical crown; and α represents the size of the spherical crown, which chooses α = 2π/W–1 here. Then the output E.sub.o(m.sub.i, n.sub.i) of the interpolation algorithm can represent the i-th pixel of the 360-degree image of the equidistant rectangular projection format. Then the rectangular coordinates (x.sub.i, y.sub.i, z.sub.i) of (m.sub.i, n.sub.i) are calculated by the following formula: [0083] Then, a spherical crown area with a height of (R[1 – cos (α) ]) is constructed with (x.sub.i, y.sub.i, z.sub.i) as the center; and the size of the spherical crown area is controlled by α. In the present embodiment, α = 2π/(W–1). The plane at the bottom of the spherical crown can be expressed as follows: [0084] Then, all the nodes in the spherical saliency graph signal output by the network that fall into the spherical crown area are added to the set U. Particularly, the nodes belonging to the spherical crown should satisfy the following relationship: [0085] Finally, all the nodes in U are treated by inverse distance weighting to obtain E.sub.o(m.sub.i, n.sub.i); and all the points in E.sub.o in the process can be traversed to obtain the complete E.sub.o. The inverse distance weighting formula is as follows: [0086] Where: e = 1e – 8.

Implementation Effects

[0087] According to the above steps, the method steps provided by the present embodiment are adopted for implementation. The experiment conducts network training and testing with the head + eye movement data set in Salient360 data set, and realizes comparison with SalNet360, SalGAN360, BMS360, BMS and GBVS360 on subjective and objective levels.

[0088] The method proposed in the present embodiment is at an excellent level in the intuitive level, and has better saliency prediction performance in the high latitude areas (top and bottom) of the image. Meanwhile, on the objective level, the method can achieve approximate objective performance under the condition that the computational complexity is 3 orders of magnitude lower than that of the best performance method SalGAN360.

[0089] Another embodiment of the present disclosure provides a saliency prediction system for a 360-degree image based on a graph convolutional neural network, including: [0090] a graph signal construction module configured to construct a graph signal of a 360-degree image of an equidistant rectangular projection format by using a geodesic icosahedron composition module, and generate a spherical graph signal; [0091] a graph convolutional network module configured to input the generated spherical graph signal into the graph convolutional neural network for feature extraction, and output a single-channel saliency spherical graph signal with a size identical to that of the input spherical graph signal; and [0092] an interpolation and reconstruction module configured to interpolate the output single-channel saliency spherical graph signal by using a spherical crown based interpolation algorithm, so as to convert the single-channel saliency spherical graph signal into an image of an equidistant rectangular projection format, and further reconstruct a 360-degree saliency map of the equidistant rectangular projection format, and predict saliency of the 360-degree image according to the reconstructed 360-degree saliency map of the equidistant rectangular projection format.

[0093] As a preferred embodiment, the geodesic icosahedron composition module adopts a spherical graph signal generated by a geodesic icosahedron composition method in SGCN.

[0094] As a preferred embodiment, the graph convolutional neural network adopts an encoder-decoder network structure, and includes: an encoder including 5 graph convolutional layers and 4 graph pooling layers, and encoding the input spherical graph signal into a high-dimensional graph signal with the size of 1/256 of an original size; a decoder including 5 graph convolutional layers and 4 graph unpooling layers, and decoding the high-dimensional graph signal encoded by the encoder into a one-dimensional graph signal with the same size as the input spherical graph signal to represent saliency distribution; and inputs of first 4 graph convolutional layers of the decoder are respectively constituted by connection of the output of the previous graph convolutional layer with a feature graph with the same number of nodes in the decoder part.

[0095] As a preferred embodiment, the system is defined as a graph node level regression model; and the saliency prediction result is an objective optimization problem of the regression model, so that:

[00019] $\min_{θ_{k}} {.Math.}_{t = 0}^{N} d i s t |G I C O P i x (E_{g t}) [v_{t}], N_{g} (G I C O P i x (E_{i})) [v_{t}]|$

Where: E.sub.i and E.sub.gt respectively represent the 360-degree image of the equidistant rectangular projection format and a corresponding true saliency map, which are constructed into spherical graph signals with the same number of nodes by the geodesic icosahedron composition module GICOPix(.Math.). Then, the constructed spherical graph signals are input into the graph convolutional neural network N.sub.G(.Math.) to generate the saliency spherical graph signals predicted by the network. The objective optimization process is to optimize learnable weights θ.sub.k, so that the distance between the saliency spherical graph signal output by the graph convolutional neural network and the true saliency graph signal is as small as possible, thereby realizing training of the regression model.

[0096] The saliency prediction method and system for the 360-degree image based on the graph convolutional neural network provided by the above embodiments of the present disclosure include: firstly, constructing an image of an equidistant rectangular projection format into a spherical graph signal by using a geodesic icosahedron composition method; then inputting the spherical graph signal into the proposed graph convolutional neural network for feature extraction and generation of a spherical saliency graph signal; and then reconstructing the spherical saliency graph signal into a saliency map of an equidistant rectangular projection format by using a proposed spherical crown based interpolation algorithm. In order to realize effective model training of the method and system, the present disclosure further proposes a KL divergence loss function with sparse consistency. The saliency prediction method and system for the 360-degree image provided by the above embodiments of the present disclosure can achieve excellent saliency prediction performance subjectively and objectively, and are superior to an existing method in computational complexity.

[0097] It should be noted that the steps in the method provided by the present disclosure can be realized by using the corresponding modules, devices, units, etc. in the system; and those skilled in the art can refer to the technical solution of the system to realize the step flow of the method, that is, the embodiments in the system can be understood as the preferred embodiments for realizing the method, which will not be repeated here. The specific embodiments of the present disclosure have been described above. It should be understood that the present disclosure is not limited to the above specific embodiments; and those skilled in the art can make various changes or modifications within the scope of the claims, which will not affect the essential content of the present disclosure.

SALIENCY PREDICTION METHOD AND SYSTEM FOR 360-DEGREE IMAGE

Assignee

Inventors

Cpc classification

Classification Explorer

G06V10/82

PHYSICS

Classification Explorer

G06V10/44

PHYSICS

Classification Explorer

G06V10/454

PHYSICS

Classification Explorer

G06T5/20

PHYSICS

Classification Explorer

G06T5/002

PHYSICS

Classification Explorer

G06V10/462

PHYSICS

Classification Explorer

G06V10/426

PHYSICS

Classification Explorer

G06V20/56

PHYSICS

Classification Explorer

H04N23/698

ELECTRICITY

International classification

Classification Explorer

G06V10/46

PHYSICS

Classification Explorer

G06T5/00

PHYSICS

Classification Explorer

G06T5/20

PHYSICS

Classification Explorer

G06V10/426

PHYSICS

Classification Explorer

G06V10/44

PHYSICS

Classification Explorer

G06V10/82

PHYSICS

Classification Explorer

H04N23/698

ELECTRICITY

Abstract

Claims

Description