CONTOUR SHAPE RECOGNITION METHOD
20230047131 · 2023-02-16
CPC classification
G06V10/44
PHYSICS
G06V10/755
PHYSICS
G06V10/462
PHYSICS
International classification
G06V10/46
PHYSICS
G06V10/74
PHYSICS
Abstract
Provided is a contour shape recognition method, including: sampling and extracting salient feature points of a contour of a shape sample; calculating a feature function of the shape sample at a semi-global scale by using three types of shape descriptors; dividing the scale with a single pixel as a spacing to acquire a shape feature function in a full-scale space; storing feature function values at various scales into a matrix to acquire three types of feature grayscale map representations of the shape sample in the full-scale space; synthesizing the three types of grayscale map representations of the shape sample, as three channels of RGB, into a color feature representation image; constructing a two-stream convolutional neural network by taking the shape sample and the feature representation image as inputs at the same time; and training the two-stream convolutional neural network, and inputting a test sample into a trained network model to achieve shape classification.
Claims
1. A contour shape recognition method, comprising the following steps: step 1, sampling and extracting salient feature points of a contour of a shape sample; step 2, calculating a shape feature function of the shape sample at a semi-global scale by using three types of shape descriptors; step 3, dividing the scale with a single pixel as a spacing to acquire a shape feature function in a full-scale space; step 4, storing shape feature function values at various scales into a matrix to acquire three types of shape feature grayscale map representations of the shape sample in the full-scale space; step 5, synthesizing the three types of shape feature grayscale map representations of the shape sample, as three channels of RGB, into a color feature representation image; step 6, constructing a two-stream convolutional neural network by taking the shape sample and the color feature representation image as inputs at the same time; and step 7, training the two-stream convolutional neural network, and inputting a test sample into a trained network model to achieve classified recognition of the contour shape.
2. The method for recognizing the contour shape according to claim 1, wherein in step 1, extracting the salient feature points of the contour of the shape sample comprises: the contour of each shape sample is composed of a series of sampling points, and for any shape sample S,
$S=\{p_x(i),p_y(i)\mid i\in[1,n]\}$, wherein $p_x(i)$, $p_y(i)$ indicate coordinates of a contour sampling point $p(i)$ in a two-dimensional plane, and $n$ indicates the length of the contour; the salient feature points are extracted by iteratively evolving the contour curve of the shape sample, and during each evolution step the point that contributes least to target recognition is deleted, wherein the contribution of each point $p(i)$ is defined as:
3. The method for recognizing the contour shape according to claim 2, wherein in step 2, a method for calculating the shape feature function of the shape sample at the semi-global scale specifically comprises: using three types of shape descriptors M:
$M=\{s_k(i),l_k(i),c_k(i)\mid k\in[1,m],i\in[1,n]\}$, wherein $s_k$, $l_k$, $c_k$ are three invariants, namely, a normalized area $s$, a normalized arc length $l$, and a normalized barycentric distance $c$, at a scale $k$, $k$ is a scale label, and $m$ is the total number of scales; defining descriptors of the three shape invariants respectively: making a preset circle $C_1(i)$ with an initial radius $r_1$ by taking a contour sampling point $p(i)$, i.e., a target contour point, as a circle center; calculating the area of the part $Z_1(i)$ of the target shape falling within the preset circle,
$s_1^*(i)=\int_{C_1(i)}B(Z_1(i),x)\,dx$,
wherein $B(Z_1(i),x)$ is an indicator function of $Z_1(i)$, and using a ratio of this area to the area of the preset circle as an area parameter $s_1(i)$; defining an arc length parameter $l_1(i)$ analogously from the arc segment of the contour falling within the preset circle; calculating a distance between the target contour point $p(i)$ and the barycenter $w_1(i)$ of the region connected to $p(i)$ within the preset circle,
$c_1^*(i)=\lVert p(i)-w_1(i)\rVert$,
and finally, using a ratio of $c_1^*(i)$ to the radius of the preset circle $C_1(i)$ of the target contour point $p(i)$ as a barycenter parameter $c_1(i)$ of the multiscale invariant descriptor of the target contour point $p(i)$, acquiring the shape feature function at the initial semi-global scale:
$M_1=\{s_1(i),l_1(i),c_1(i)\mid i\in[1,n]\}$.
4. The method for recognizing the contour shape according to claim 3, wherein in step 3, a method for calculating the shape feature function of the shape sample in the full-scale space specifically comprises: selecting a single pixel as the continuous scale change spacing in the full-scale space, since a digital image takes one pixel as the smallest unit; that is, for a $k$-th scale label, setting the radius $r_k$ of the circle $C_k(i)$ by reducing the radius by one pixel per scale step, thereby acquiring the shape feature function in the full-scale space:
$M=\{s_k(i),l_k(i),c_k(i)\mid k\in[1,m],i\in[1,n]\}$.
5. The method for recognizing the contour shape according to claim 4, wherein in step 4, the shape feature functions at various scales are respectively stored into the matrix, and are combined in a continuous scale change order to acquire the three types of shape feature grayscale map representations of the shape sample in the full-scale space:
$G=\{s,l,c\}$, wherein $s$, $l$, $c$ each indicate a grayscale matrix with a size of $m\times n$.
6. The method for recognizing the contour shape according to claim 5, wherein in step 5, the three types of shape feature grayscale map representations of the shape sample are synthesized, as the three channels of RGB, into a color feature representation image, which acts as a tensor representation $T_{m\times n\times 3}$ of the shape sample S, wherein the R, G and B channels of $T$ are the grayscale matrices $s$, $l$ and $c$, respectively.
7. The method for recognizing the contour shape according to claim 6, wherein in step 6, a structure for constructing the two-stream convolutional neural network comprises a two-stream input layer, a pre-training layer, fully connected layers and an output layer, wherein the pre-training layer is composed of the first four modules of a VGG16 network model, parameters acquired after the four modules are trained on the ImageNet data set are used as initialization parameters, and three fully connected layers are connected after the pre-training layer; in the pre-training layer, a first module specifically comprises two convolution layers and one maximum pooling layer, wherein each of the convolution layers has 64 convolution kernels with a size of 3×3, and the pooling layer has a size of 2×2; a second module specifically comprises two convolution layers and one maximum pooling layer, wherein each of the convolution layers has 128 convolution kernels with a size of 3×3, and the pooling layer has a size of 2×2; a third module specifically comprises three convolution layers and one maximum pooling layer, wherein each of the convolution layers has 256 convolution kernels with a size of 3×3, and the pooling layer has a size of 2×2; a fourth module specifically comprises three convolution layers and one maximum pooling layer, wherein each of the convolution layers has 512 convolution kernels with a size of 3×3, and the pooling layer has a size of 2×2; a calculation formula for each convolution layer is:
$C_O=\phi_{relu}(W_C\cdot C_I+\theta_C)$, wherein $\phi_{relu}$ is a ReLU activation function, $\theta_C$ is a bias vector of the convolution layer, $W_C$ is a weight of the convolution layer, $C_I$ is an input of the convolution layer, and $C_O$ is an output of the convolution layer; a module of the fully connected layers specifically comprises three fully connected layers, wherein a first fully connected layer contains 4096 nodes, a second fully connected layer contains 1024 nodes, and a third fully connected layer contains N nodes, with N representing the number of types contained in a sample data set, and a calculation formula for the first two fully connected layers is:
$F_O=\phi_{tanh}(W_F\cdot F_I+\theta_F)$, wherein $\phi_{tanh}$ is a tanh activation function, $\theta_F$ is a bias vector of the fully connected layers, $W_F$ is a weight of the fully connected layers, $F_I$ is an input of the fully connected layers, and $F_O$ is an output of the fully connected layers; the last fully connected layer is an output layer, which has an output calculated with a formula as follows:
$Y_O=\phi_{softmax}(W_Y\cdot Y_I+\theta_Y)$, wherein $\phi_{softmax}$ is a softmax activation function, $\theta_Y$ is a bias vector of the output layer, $W_Y$ is a weight of the output layer, $Y_I$ is an input of the output layer, and $Y_O$ is an output of the output layer; and each neuron of the output layer represents a corresponding shape category.
8. The method for recognizing the contour shape according to claim 7, wherein in step 7, a method for achieving classified recognition of the contour shape specifically comprises: inputting all training samples into the two-stream convolutional neural network to train the two-stream convolutional neural network model; inputting the test sample into the trained two-stream convolutional neural network model; and determining a shape category, corresponding to a maximum value among output vectors, as a shape type of the test sample, thereby achieving the classified recognition of the contour shape.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
[0062] The technical solutions in the embodiments of the present invention will be described clearly and completely below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the embodiments described are merely some, rather than all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
[0064] 1. As shown in the drawings, the contour of the shape sample is sampled to 100 contour points, and the salient feature points are extracted:
$S=\{p_x(i),p_y(i)\mid i\in[1,100]\}$,
[0065] wherein $p_x(i)$, $p_y(i)$ indicate coordinates of a contour sampling point $p(i)$ in a two-dimensional plane.
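As a minimal illustration of this sampling step, the following sketch extracts the outer contour of a binary shape image with OpenCV and resamples it to 100 evenly spaced points. The patent does not prescribe a particular extraction routine; the libraries, function name, and even-arc-length resampling are assumptions.

```python
# Hypothetical sketch: extract a closed contour and resample it to n = 100
# points. The binary_image is assumed to be a uint8 mask of the shape.
import cv2
import numpy as np

def sample_contour(binary_image: np.ndarray, n: int = 100) -> np.ndarray:
    """Return n evenly spaced (x, y) points along the outer contour."""
    contours, _ = cv2.findContours(binary_image, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea).squeeze(1).astype(float)
    # Cumulative arc length along the closed contour.
    closed = np.vstack([contour, contour[:1]])
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    # Resample at n equally spaced arc-length positions.
    targets = np.linspace(0.0, arc[-1], n, endpoint=False)
    xs = np.interp(targets, arc, closed[:, 0])
    ys = np.interp(targets, arc, closed[:, 1])
    return np.stack([xs, ys], axis=1)  # S = {p_x(i), p_y(i) | i in [1, n]}
```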
[0066] 2. As shown in the drawings, the shape feature function of the shape sample at the semi-global scale is calculated by using the three types of shape descriptors.
[0067] A preset circle $C_1(i)$ with an initial radius $r_1$ is made by taking a contour sampling point $p(i)$, i.e., the target contour point, as a circle center, the preset circle being an initial semi-global scale of the target contour point. After the preset circle $C_1(i)$ is acquired according to the above steps, a part of the target shape would necessarily fall within the preset circle, as schematically shown in the drawings; this part is denoted as $Z_1(i)$, and its area is calculated as:
$s_1^*(i)=\int_{C_1(i)}B(Z_1(i),x)\,dx$,
[0068] wherein $B(Z_1(i),x)$ is an indicator function, which is defined as $B(Z_1(i),x)=1$ if the pixel $x$ belongs to $Z_1(i)$, and $B(Z_1(i),x)=0$ otherwise;
[0069] a ratio of the area of $Z_1(i)$ to the area of the preset circle $C_1(i)$ is used as an area parameter $s_1(i)$ for a multiscale invariant descriptor of the target contour point $p(i)$:
$s_1(i)=\dfrac{s_1^*(i)}{\pi r_1^2}$,
[0070] and the value range of $s_1(i)$ should be between 0 and 1.
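A minimal sketch of the area parameter follows, assuming the shape is available as a filled binary mask and that $Z_1(i)$ is the set of shape pixels inside the preset circle; these assumptions, and all names, are illustrative rather than taken from the patent.

```python
# Sketch of the area parameter s_1(i) at one scale.
import numpy as np

def area_parameter(mask: np.ndarray, center: np.ndarray, r: float) -> float:
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Indicator of the preset circle C_1(i) around the target contour point.
    in_circle = (xs - center[0]) ** 2 + (ys - center[1]) ** 2 <= r ** 2
    region_area = np.count_nonzero(mask.astype(bool) & in_circle)  # |Z_1(i)|
    return region_area / (np.pi * r ** 2)   # s_1(i), in [0, 1]
```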
[0071] The barycenter of the region that has a direct connection relationship with the target contour point $p(i)$ is calculated by averaging the coordinate values of all pixel points in the region; the result gives the coordinate values of the barycenter of the region. This process can be expressed as:
$w_1(i)=\dfrac{1}{|R|}\sum_{x\in R}x$,
[0072] wherein $R$ denotes the region directly connected to $p(i)$ within the preset circle, and $w_1(i)$ indicates the barycenter of this region.
[0073] Calculating a distance $c_1^*(i)$ between the target contour point $p(i)$ and the barycenter $w_1(i)$ can be expressed as:
$c_1^*(i)=\lVert p(i)-w_1(i)\rVert$,
[0074] a ratio of $c_1^*(i)$ to the radius of the preset circle $C_1(i)$ of the target contour point $p(i)$ is used as a barycenter parameter $c_1(i)$ of the multiscale invariant descriptor of the target contour point $p(i)$:
$c_1(i)=\dfrac{c_1^*(i)}{r_1}$,
[0075] and the value range of $c_1(i)$ should be between 0 and 1.
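Under the same assumptions, the barycenter parameter can be sketched as follows; here the region connected to $p(i)$ is approximated by all shape pixels inside the circle, and a connected-component step could be added to match the "direct connection" requirement exactly.

```python
# Sketch of the barycenter parameter c_1(i) at one scale.
import numpy as np

def barycenter_parameter(mask: np.ndarray, p: np.ndarray, r: float) -> float:
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    in_circle = (xs - p[0]) ** 2 + (ys - p[1]) ** 2 <= r ** 2
    region = mask.astype(bool) & in_circle
    pts = np.argwhere(region)[:, ::-1]   # (x, y) coordinates; non-empty,
    w1 = pts.mean(axis=0)                # since p(i) lies on the contour
    return np.linalg.norm(p - w1) / r    # c_1(i) = c_1*(i) / r_1
```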
[0076] After the preset circle is acquired according to the above steps, one or more arc segments would necessarily fall within the preset circle after the contour of the target shape is cut by the preset circle, as shown in the drawings; the length of the arc segment directly connected to the target contour point $p(i)$, normalized with respect to the preset circle $C_1(i)$, is used as an arc length parameter $l_1(i)$ of the multiscale invariant descriptor,
[0077] and the value range of $l_1(i)$ should be between 0 and 1.
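The arc length parameter can be sketched by walking along the sampled contour in both directions from $p(i)$ until the contour leaves the circle; normalizing by the circle circumference is an assumption, since the text only states that $l_1(i)$ lies between 0 and 1.

```python
# Sketch of the arc length parameter l_1(i); S is the n-by-2 array of
# sampled contour points from step 1.
import numpy as np

def arc_length_parameter(S: np.ndarray, i: int, r: float) -> float:
    n = len(S)
    p = S[i]
    inside = lambda q: np.linalg.norm(q - p) <= r
    length = 0.0
    for step in (1, -1):                 # walk both ways along the contour
        j = i
        while True:
            k = (j + step) % n
            if not inside(S[k]) or k == i:
                break
            length += np.linalg.norm(S[k] - S[j])
            j = k
    # Assumed normalization: circle circumference, clamped to [0, 1].
    return min(length / (2 * np.pi * r), 1.0)
```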
[0078] Based on the above steps, the feature function of the shape sample S at the semi-global scale having a scale label k=1 and the initial radius $r_1$ is calculated:
$M_1=\{s_1(i),l_1(i),c_1(i)\mid i\in[1,100]\}$.
[0079] The feature functions calculated at this scale are stored into a feature vector.
[0080] 3. As shown in the drawings, a single pixel is selected as the continuous scale change spacing in the full-scale space, since a digital image takes one pixel as the smallest unit.
[0081] That is, in the case of the initial scale k=1, the radius is $r_1$; thereafter, the radius $r_k$ is reduced 99 times in equal steps of one pixel, until the smallest scale k=100 is reached. The feature functions of the shape sample S in the full-scale space are obtained by calculation:
$M=\{s_k(i),l_k(i),c_k(i)\mid k\in[1,100],i\in[1,100]\}$.
[0082] 4. As shown in the drawings, the shape feature functions at the various scales are respectively stored into a matrix and combined in a continuous scale change order to acquire the three types of shape feature grayscale map representations of the shape sample in the full-scale space:
$G=\{s,l,c\}$,
[0083] wherein $s$, $l$, $c$ each indicate a grayscale matrix with a size of $m\times n$ (here $100\times100$).
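Steps 3 and 4 can be sketched together: sweep the radius downward in single-pixel steps and stack the per-scale feature vectors row by row into three $m\times n$ grayscale matrices. The descriptor functions are the sketches above, and the exact radius schedule below is an assumption consistent with the 99 equal one-pixel reductions described.

```python
# Sketch of the full-scale feature maps G = {s, l, c}; area_parameter,
# arc_length_parameter and barycenter_parameter are the sketches above.
import numpy as np

def full_scale_maps(S, mask, r1, m=100):
    n = len(S)
    s = np.zeros((m, n)); l = np.zeros((m, n)); c = np.zeros((m, n))
    # Radii shrink by one pixel per scale, never below one pixel.
    radii = [max(r1 - k, 1.0) for k in range(m)]
    for k, r in enumerate(radii):
        for i in range(n):
            s[k, i] = area_parameter(mask, S[i], r)
            l[k, i] = arc_length_parameter(S, i, r)
            c[k, i] = barycenter_parameter(mask, S[i], r)
    return s, l, c
```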
[0084] 5. As shown in the drawings, the three types of shape feature grayscale map representations of the shape sample are synthesized, as the three channels of RGB, into a color feature representation image, which acts as the tensor representation $T_{m\times n\times 3}$ of the shape sample S,
[0085] wherein the R, G and B channels of $T$ are the grayscale matrices $s$, $l$ and $c$, respectively.
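Step 5 then amounts to stacking the three matrices as color channels; the 0-255 quantization below is an assumption made only to produce a standard image.

```python
# Sketch of step 5: stack the three grayscale maps as the R, G and B
# channels of one color feature representation image T (m x n x 3).
import numpy as np

def to_feature_image(s, l, c):
    T = np.stack([s, l, c], axis=-1)                 # tensor T_{m x n x 3}
    return (T * 255.0).clip(0, 255).astype(np.uint8)
```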
[0086] 6. A two-stream convolutional neural network is constructed, including a two-stream input layer, a pre-training layer, fully connected layers, and an output layer. The present invention normalizes the size of an original contour shape to 100×100. Then, both the original shape and its corresponding feature representation image are simultaneously input into the two-stream convolutional neural network structure model for training. In the present invention, an SGD optimizer is used; the learning rate is set to 0.001; the decay rate is set to 1e-6; cross entropy is selected as the loss function; the weights of the two feature streams are set to 1:1; softmax is selected as the classifier; and 128 is selected as the batch size. The structure of the network is shown in the drawings.
[0087] In the pre-training layer, a first module specifically comprises two convolution layers and one maximum pooling layer, wherein each of the convolution layers has 64 convolution kernels, with a size of 3×3, and the pooling layer has a size of 2×2; a second module specifically comprises two convolution layers and one maximum pooling layer, wherein each of the convolution layers has 128 convolution kernels, with a size of 3×3, and the pooling layer has a size of 2×2; a third module specifically comprises three convolution layers and one maximum pooling layer, wherein each of the convolution layers has 256 convolution kernels, with a size of 3×3, and the pooling layer has a size of 2×2; a fourth module specifically comprises three convolution layers and one maximum pooling layer, wherein each of the convolution layers has 512 convolution kernels, with a size of 3×3, and the pooling layer has a size of 2×2. The calculation formula for each layer of convolution is:
$C_O=\phi_{relu}(W_C\cdot C_I+\theta_C)$,
[0088] wherein $\phi_{relu}$ is a ReLU activation function, $\theta_C$ is a bias vector of the convolution layer, $W_C$ is a weight of the convolution layer, $C_I$ is an input of the convolution layer, and $C_O$ is an output of the convolution layer.
[0089] A module of the fully connected layers specifically includes three fully connected layers, wherein a first fully connected layer contains 4096 nodes, a second fully connected layer contains 1024 nodes, and a third fully connected layer contains 70 nodes (the number of shape categories in the data set of this embodiment). The calculation formula for the first two fully connected layers is:
$F_O=\phi_{tanh}(W_F\cdot F_I+\theta_F)$,
[0090] wherein $\phi_{tanh}$ is a tanh activation function, $\theta_F$ is a bias vector of each of the fully connected layers, $W_F$ is a weight of each of the fully connected layers, $F_I$ is an input of each of the fully connected layers, and $F_O$ is an output of each of the fully connected layers;
[0091] the last fully connected layer is an output layer, which has an output calculated with a formula as follows:
$Y_O=\phi_{softmax}(W_Y\cdot Y_I+\theta_Y)$,
[0092] wherein $\phi_{softmax}$ is a softmax activation function, $\theta_Y$ is a bias vector of the output layer, $W_Y$ is a weight of the output layer, $Y_I$ is an input of the output layer, and $Y_O$ is an output of the output layer; and each neuron of the output layer represents one corresponding shape category.
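A hedged Keras sketch of this two-stream network follows. The fusion by flatten-and-concatenate, the layer renaming, and the replication of the 100×100 shape image to three channels (so that the ImageNet-pretrained VGG16 weights can be reused) are assumptions; the description only states that the two streams are weighted 1:1 and followed by fully connected layers of 4096, 1024 and N (here 70) nodes, with the tanh and softmax activations given above.

```python
# Sketch of the two-stream CNN: two VGG16 trunks (first four modules,
# through block4_pool) plus three fully connected layers.
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

def build_stream(name: str) -> Model:
    """First four VGG16 modules, ImageNet-initialized."""
    base = VGG16(weights="imagenet", include_top=False,
                 input_shape=(100, 100, 3))
    for layer in base.layers:                 # avoid duplicate layer names
        layer._name = f"{name}_{layer.name}"  # when two VGG16 copies coexist
    return Model(base.input, base.get_layer(f"{name}_block4_pool").output,
                 name=name)

def build_two_stream(num_classes: int = 70) -> Model:
    shape_in = layers.Input((100, 100, 3), name="shape_image")
    feat_in = layers.Input((100, 100, 3), name="feature_image")
    a = layers.Flatten()(build_stream("shape_stream")(shape_in))
    b = layers.Flatten()(build_stream("feature_stream")(feat_in))
    x = layers.Concatenate()([a, b])               # 1:1 fusion of streams
    x = layers.Dense(4096, activation="tanh")(x)   # phi_tanh, per the text
    x = layers.Dense(1024, activation="tanh")(x)
    out = layers.Dense(num_classes, activation="softmax")(x)  # phi_softmax
    return Model([shape_in, feat_in], out)
```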
[0093] 7. All training samples are input into the two-stream convolutional neural network to train the two-stream convolutional neural network model; the test sample is input into the trained two-stream convolutional neural network model; and a shape category corresponding to a maximum value among output vectors is determined as a shape type of the test sample, thereby achieving the classified recognition of the shape.
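Training and inference for step 7 can then be sketched with the stated hyperparameters (SGD, learning rate 0.001, decay 1e-6, cross entropy, batch size 128). The epoch count and the prepared arrays (x_shape_*, x_feat_*, y_train) are illustrative assumptions, and on TensorFlow 2.11 or later the decay argument lives in tf.keras.optimizers.legacy.SGD.

```python
# Continuation of the previous sketch: training and test-time prediction.
import tensorflow as tf

model = build_two_stream(num_classes=70)
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.001, decay=1e-6),
    loss="categorical_crossentropy",          # cross entropy, per the text
    metrics=["accuracy"],
)
model.fit([x_shape_train, x_feat_train], y_train,
          batch_size=128, epochs=50)          # epoch count is illustrative

# A test sample is assigned the category with the maximum output value.
probs = model.predict([x_shape_test, x_feat_test])
predicted_class = probs.argmax(axis=1)
```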
[0094] Although the present invention is illustrated in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions recorded in the foregoing embodiments, or make equivalent replacements of some of the technical features therein. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principle of the present invention shall be incorporated within the protection scope of the present invention.