LANDING TRACKING CONTROL METHOD AND SYSTEM BASED ON LIGHTWEIGHT TWIN NETWORK AND UNMANNED AERIAL VEHICLE

20220332415 · 2022-10-20

Inventors

Cpc classification

International classification

Abstract

A landing tracking control method comprises the following contents: a tracking model training stage and an unmanned aerial vehicle real-time tracking stage. The landing tracking control method extracts a network Snet by using a lightweight feature and makes modification, so that an extraction speed of the feature is increased to better meet a real-time requirement. Weight allocation on the importance of channel information is carried out to differentiate effective features more purposefully and utilize the features, so that the tracking precision is improved. In order to improve a training effect of the network, a loss function of an RPN network is optimized, a regression precision of a target frame is measured by using CIOU, and meanwhile, calculation of classified loss function is adjusted according to CIOU, and a relation between a regression network and classification network is enhanced.

Claims

1. A landing tracking control method and system based on a lightweight twin network, the method comprising the following contents: in a tracking model training stage: a1. Extracting a target image in a target template, and extracting a search image in a search area; inputting the target image and the search image into two same lightweight SNet feature extraction modules, and extracting a search feature and a target feature by using a lightweight network SNet in the lightweight SNet feature extraction modules; a2. Adjusting weights of the search feature and the target feature by a feature weight adjusting module to obtain an adjusted search feature and an adjusted target feature; a3. Inputting the adjusted search feature and the adjusted target feature into an enhanced feature module to obtain an enhanced search feature and an enhanced target feature by a feature enhancing operation; a4. Inputting the enhanced search feature and the enhanced target feature into a same RPN network to determine a type and a position of a target; a5. Measuring a regression precision of a target frame by using CIOU, wherein when a CIOU value is relatively great, that is, the target frame is positioned precisely; and as far as a sample with a high frame precision is concerned, a classification loss value and a frame loss value thereof are increased, and otherwise, the classification loss value and the frame loss value thereof are decreased; and a6. Carrying out multiple circuit training according to steps a5-a2-a5 to finally obtain a tracking model; and in an unmanned aerial vehicle real-time tracking stage, b1. Carrying out frame identifying operation on a camera carried by the unmanned aerial vehicle; b2. Introducing frame identifying information into the tracking model and identifying a target; and b3. Judging whether the target is positioned successfully or not, if so, carrying out forecasting search on the target by means of Kalman algorithm and returning to the step b1 to operate; and if not, adjusting and expanding the search range and returning to the step b1 to operate; wherein in the step a1, after a deep separated convoluting operation of the lightweight network SNet, three search features and three target features are obtained; the weight adjusting operation in the step a2 comprises the following contents: compressing a feature pattern by the search feature and the target feature by utilizing global max-pooling first; then training a set of stipulated parameters to represent weight of each channel feature pattern then via full convolutional and nonlinear activation operations, and finally, multiplying features of original channels by weight values obtained by the full convolutional and nonlinear activation operations to obtain an adjusted search feature pattern and an adjusted target feature pattern; the steps a3 and a4 further comprise the following contents: enhancing the extracted adjusted search feature and the adjusted target feature based on a feature pyramid, carrying out feature fusion, and inputting the fused feature into the RPN network to determine the type and the position of the target from three dimensions; the step a5 further comprises the following contents: For an optimization problem of the loss function, measuring a regression accuracy of the target frame by using CIOU, wherein when the CIOU value is relatively great, it is indicated that the target frame is positioned precisely; and as far as a sample with a high frame precision is concerned, the classification loss value and the frame loss value thereof are increased, and otherwise, the classification loss value and the frame loss value thereof are decreased; IOU in the CIOU is specifically as follows: $IOU = \frac{.Math. A .Math. b .Math.}{.Math. A .Math. b .Math.}$ $a = \frac{v}{1 - IOU + v}$ $\begin{matrix} v = \frac{4}{π^{2}} {(arc \tan \frac{w^{gt}}{h^{gt}} - arc \tan \frac{w}{h})}^{2} & (3) \end{matrix}$ $\begin{matrix} CIOU = IOU - \frac{ρ^{2} (b, b^{gt})}{c^{2}} - a * v & (4) \end{matrix}$ $\begin{matrix} {Loss}_{b} = 1 - C_{IOU} & (5) \end{matrix}$ $\begin{matrix} {Loss}_{c} = - [g * \ln p + (1 - g) * \ln (1 - p)] & (6) \end{matrix}$ $\begin{matrix} p = {Pre}_{object} + C_{IOU} * (1 - {Pre}_{object}), & (7) \end{matrix}$ wherein A represents a true frame, B represents a prediction frame, w, h, b, w.sup.gt, h.sup.gt.sup.and b.sup.gt respectively represent widths, heights and centers of the true frame and the prediction frame, ρ.sup.2(b,b.sup.gt) represents an Euclidean distance between the center points of the true frame and prediction frame, c represents a diagonal length of the minimum rectangle containing the true frame and prediction frame true frame and prediction frame, Loss.sub.b represents a frame loss function, Loss.sub.c represents a classification loss function, g represents whether it is the target or not, if so, it is 1, and if not, it is 0; Pre.sub.object represents a classification probability predicted by the RPN network, and when the regression precision of the prediction frame is high, it is considered that classification is relatively reliable, and the classification prediction probability thereof is increased; the step b3 further comprises the following contents: for a re-positioning problem after the target disappears, predicting a next frame position of the target by using Kalman algorithm, and locally detecting the target by taking a prediction result as a center; if the target tracking in some continuous frames are missing, adding one of the length and width of the search area additionally by taking the prediction position result of Kalman filtering as a center, wherein the item is increased along with increment of time before the length and width of the search area exceed a video frame itself, and if the target is not detected for a long time, a final detection area is expanded to a whole image; and the step b3 further comprises the following contents: $\begin{matrix} w_{search} = w_{pre} + a * v_{w} * t * (\frac{w_{frame} - w_{pre}}{v_{w}}) & (8) \end{matrix}$ $\begin{matrix} h_{search} = h_{pre} + a * v_{h} * t * (\frac{h_{frame} - h_{pre}}{v_{h}}) & (9) \end{matrix}$ w.sub.search.sub.and h.sub.search are width and height of the search area, w.sub.pre and h.sub.pre are width and height of a Kalman prediction area, .sub.and rame, h.sub.frame are width and height of the video frame, v.sub.w and v.sub.h are Kalman average moving speeds in transverse and longitudinal directions of the target in a previous n frame image, a is a constant that controls an area expanding speed, and t is a video frame number counted from start of tracking loss.

2. A landing tracking control system, the system comprising a tracking model and a real-time tracking apparatus, wherein the tracking model comprises: a lightweight SNet feature module for facilitating a search feature pattern and a target feature pattern of the lightweight network SNet; a feature weight adjusting module for adjusting weights of the search feature pattern and the target feature pattern to obtain an adjusted search feature pattern and an adjusted target feature pattern; an enhanced feature module for carrying out a feature enhancing operation on the adjusted search feature pattern and the adjusted target feature pattern to obtain an enhanced search feature pattern and an enhanced target feature pattern; an RPN module configured with an RPN network, for determining a type and a position of the target; and a CIOU loss module for measuring a regression precision of the target frame; and the real-time tracking apparatus comprises: a camera carried by the unmanned aerial vehicle for shooting a video of the camera carried by the unmanned aerial vehicle; a video identifying module for carrying out frame identifying operation on the camera carried by the unmanned aerial vehicle; a tracking identifying module, configured with a control software for the landing tracking control method of the unmanned aerial vehicle as claimed in claim 1; a judging module for judging whether the target is positioned successfully or not; a prediction searching module for carrying out prediction search on the target by using Kalman algorithm when the judging module judges that the target is positioned successfully; and a search expanding module for adjusting a search expanding scope when the judging module judges that the target is not positioned successfully.

3. An unmanned aerial vehicle, applying the landing tracking control system as claimed in claim 2.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0024] FIG. 1 is a workflow schematic diagram of a landing tracking control system in an embodiment of the invention.

[0025] FIG. 2 is a schematic diagram of a frame structure of a part of algorithm in the tracking model training stage in an embodiment of the present invention.

[0026] FIG. 3 is a weight adjusting operating low schematic diagram of the feature weight adjusting module in an embodiment of the present invention.

[0027] FIG. 4 is a feature enhancing operating flow schematic diagram of the enhancing feature module in an embodiment of the present invention.

DETAILED DESCRIPTION

[0028] The technical scheme of the present invention is further described through specific embodiments in combination with the drawings.

[0029] It is to be noted that the terms used herein are merely to describe specific implementation modes rather than being intended to limit the exemplary implementation modes according to the application. As used herein, unless otherwise specified in the context, the singular form is further intended to include plural form. In addition, it is to be further understood that when the terms “comprise” and/or “include” are used in the description, it indicates that there are features, steps, operations, apparatuses, assemblies and/or their combinations.

[0030] Unless otherwise specified, relative arrangement, digital expression formulae and numerical values of components and steps illustrated in these embodiments do not limit the scope of the present invention. Meanwhile, it shall be understood that for the convenience of description, sizes of parts shown in the drawings are not drawn according to an actual proportional relationship. Techniques, methods and devices known to those skilled in the prior art may not be discussed in detail. But in a proper circumstance, the techniques, methods and devices shall be regarded as a part of the description. In all the illustrated and discussed examples, any specific value shall be explained as be exemplary merely rather than be restrictive. Thus, other examples of exemplary embodiments may have different values. It is to be noted that similar mark numbers and letters represent similar items in the drawings below, such that once a certain item is defined in a drawing, it is unnecessary to further discuss it in the subsequent drawings.

[0031] A landing tracking control method and system based on A lightweight twin network includes the following contents:

[0032] in a tracking model training stage: a1. A target image is extracted in a target template, and a search image is extracted in a search area; the target image and the search image are input into two same lightweight SNet feature extraction modules, and a search feature and a target feature are extracted by using a lightweight network SNet in the lightweight SNet feature extraction modules; a2. Weights of the search feature and the target feature are adjusted by a feature weight adjusting module to obtain an adjusted search feature and an adjusted target feature; a3. The adjusted search feature and the adjusted target feature are input into an enhanced feature module to obtain an enhanced search feature and an enhanced target feature by a feature enhancing operation; a4. The enhanced search feature and the enhanced target feature are input into a same RPN network to determine a type and a position of a target; a5. A regression precision of a target frame is measured by using CIOU, wherein when a CIOU value is relatively great, that is, the target frame is positioned precisely; and as far as a sample with a high frame precision is concerned, a classification loss value and a frame loss value thereof are increased, and otherwise, the classification loss value and the frame loss value thereof are decreased; and a6. Multiple circuit training is carried out according to steps a5-a2-a5 to finally obtain a tracking model; and

[0033] in an unmanned aerial vehicle real-time tracking stage, b1. Frame identifying operation is carried out on a camera carried by the unmanned aerial vehicle; b2. Frame identifying information is introduced into the tracking model and identifying a target; and b3. Whether the target is positioned successfully or not is judged, if so, forecasting search is carried out on the target by means of Kalman algorithm and returning to the step b1 to operate; and if not, the search range is adjusted and expanded and it is returned to the step b1 to operate.

[0034] For the real-time demand in tracking by the unmanned aerial vehicle, feature extraction on the image by using the modified lightweight network SNet. In order to aim at a specific target better, that is, track the taking off and landing platform, the original network is modified now, and the modified network is as shown in FIG. 2.

[0035] Specifically, in the step a1, after a deep separated convoluting operation of the lightweight network SNet, three search features and three target features are obtained.

[0036] By taking a feature extraction network as an example, assuming that the size of the input image is 448*448, the feature extraction network of the lightweight network Snet is shown in a table below:

TABLE-US-00001 OutputSize Layer Input 448*448 Image Conv1 224*224 3*3, 24, s2 Pool 112*112 3*3maxpool, s2 Stage1 56*56 [132, s2] 56*56 [132, s1]*3 Stage2 28*28 [264, s2] 28*28 [264, s1]*7 Stage3 14*14 [528, s2] 14*14 [528, s1]*3

[0037] Through deep separable convolution, multiple channels can be obtained. Information carried by each channel is nearly decoupled. When related operations are carried out, a same type of objects only has high response in corresponding channels. For single type tracking, the target can be detected better by utilizing the corresponding channels precisely, so that the detection precision can be improved.

[0038] The weight adjusting operation in the step a2 includes the following contents: a feature pattern is compressed by the search feature and the target feature by utilizing global max-pooling first; then a stipulated parameter is trained to represent weight of each channel feature pattern then via full convolutional and nonlinear activation operations, and finally, features of original channels are multiplied by weight values obtained by the full convolutional and nonlinear activation operations to obtain an adjusted search feature pattern and an adjusted target feature pattern. Specifically, as shown in FIG. 3, F1 represents that the feature pattern is compressed by global maximum pooling, F2 represents operations such as full convolution and nonlinear activation, a W parameter is trained to represent the weight of each channel feature pattern, and F3 represents that the feature of the original channel is multiplied with the weight value obtained by F2 to obtain the adjusted feature pattern.

[0039] The shallow feature after image convolution reflects shape, color, edge and the like much, which is favorable to position the target. The deep feature usually has semantic information of higher layer, which is favorable to classify the target. The shallow feature and the deep feature are fused to utilize information represented by both of them at the same time more efficiently.

[0040] Specifically, the steps a3 and a4 further comprise the following contents: enhancing the extracted adjusted search feature and the adjusted target feature based on a feature pyramid, carrying out feature fusion, and inputting the fused feature into the RPN network to determine the type and the position of the target from three dimensions; and the workflow of the feature enhancing module is as shown in FIG. 4.

[0041] More preferably, the Step a5 further includes the following contents: for an optimization problem of the loss function, the performance of the trained network is greatly related to setting of the loss function; a regression accuracy of the target frame is measured by using CIOU, wherein when the CIOU value is relatively great, it is indicated that the target frame is positioned precisely; and as far as a sample with a high frame precision is concerned, the classification loss value and the frame loss value thereof are increased, and otherwise, the classification loss value and the frame loss value thereof are decreased.

[0042] Specifically, IOU in the CIOU is specifically as follows:

[00003] $IOU = \frac{.Math. A .Math. b .Math.}{.Math. A .Math. b .Math.}$ $a = \frac{v}{1 - IOU + v}$ $\begin{matrix} v = \frac{4}{π^{2}} {(arc \tan \frac{w^{gt}}{h^{gt}} - arc \tan \frac{w}{h})}^{2} & (3) \end{matrix}$ $\begin{matrix} CIOU = IOU - \frac{ρ^{2} (b, b^{gt})}{c^{2}} - a * v & (4) \end{matrix}$ $\begin{matrix} {Loss}_{b} = 1 - C_{IOU} & (5) \end{matrix}$ $\begin{matrix} {Loss}_{c} = - [g * \ln p + (1 - g) * \ln (1 - p)] & (6) \end{matrix}$ $\begin{matrix} p = {Pre}_{object} + C_{IOU} * (1 - {Pre}_{object}) & (7) \end{matrix}$

[0043] A represents a true frame, B represents a prediction frame, w, h, b, w.sup.gt, h.sup.gt.sup.and b.sup.gt respectively represent widths, heights and centers of the true frame and the prediction frame, ρ.sup.2(b,b.sup.gt) represents an Euclidean distance between the center points of the true frame and prediction frame, c represents a diagonal length of the minimum rectangle containing the true frame and prediction frame true frame and prediction frame, Loss.sub.b represents a frame loss function, Loss.sub.c represents a classification loss function, g represents whether it is the target or not, if so, it is 1, and if not, it is 0; Pre.sub.object represents a classification probability predicted by the RPN network, and when the regression precision of the prediction frame is high, it is considered that classification is relatively reliable, and the classification prediction probability thereof is increased.

[0044] More preferably, the step b3 further includes the following contents: for a re-positioning problem after the target disappears, predicting a next frame position of the target by using Kalman algorithm, and locally detecting the target by taking a prediction result as a center; if the target tracking in some continuous frames are missing, adding one of the length and width of the search area additionally by taking the prediction position result of Kalman filtering as a center, wherein the item is increased along with increment of time before the length and width of the search area exceed a video frame itself, and if the target is not detected for a long time, a final detection area is expanded to a whole image.

[0045] Specifically,

[00004] $\begin{matrix} w_{search} = w_{pre} + a * v_{w} * t * (\frac{w_{frame} - w_{pre}}{v_{w}}) & (8) \end{matrix}$ $\begin{matrix} h_{search} = h_{pre} + a * v_{h} * t * (\frac{h_{frame} - h_{pre}}{v_{h}}) & (9) \end{matrix}$

[0046] w.sub.search.sub.and h.sub.search are width and height of the search area, w.sub.pre and h.sub.pre are width and height of a Kalman prediction area, .sub.and rame, h.sub.frame are width and height of the video frame, v.sub.w and v.sub.h are Kalman average moving speeds in transverse and longitudinal directions of the target in a previous n frame image, .sup.a is a constant that controls an area expanding speed, and t is a video frame number counted from start of tracking loss.

[0047] A landing tracking control system applying the landing tracking control method includes a tracking model and a real-time tracking apparatus.

[0048] Specifically, the tracking model includes: a lightweight SNet feature module for facilitating a search feature pattern and a target feature pattern of the lightweight network SNet; a feature weight adjusting module for adjusting weights of the search feature pattern and the target feature pattern to obtain an adjusted search feature pattern and an adjusted target feature pattern; an enhanced feature module for carrying out a feature enhancing operation on the adjusted search feature pattern and the adjusted target feature pattern to obtain an enhanced search feature pattern and an enhanced target feature pattern; an RPN module configured with an RPN network, for determining a type and a position of the target; and a CIOU loss module for measuring a regression precision of the target frame.

[0049] The real-time tracking apparatus includes: a camera carried by the unmanned aerial vehicle for shooting a video of the camera carried by the unmanned aerial vehicle; a video identifying module for carrying out frame identifying operation on the camera carried by the unmanned aerial vehicle; a tracking identifying module, configured with a control software for the landing tracking control method; a judging module for judging whether the target is positioned successfully or not; a prediction searching module for carrying out prediction search on the target by using Kalman algorithm when the judging module judges that the target is positioned successfully; and a search expanding module for adjusting a search expanding scope when the judging module judges that the target is not positioned successfully.

[0050] An unmanned aerial vehicle applies the above-mentioned landing tracking control system.

[0051] In considering the real-time demand of tracking by the unmanned aerial vehicle, the unmanned aerial vehicle takes another more lightweight network as the feature extraction network. For the tracking task of the single type of objects, a feature weight adjusting module is designed. The feature channel with greater response is found by network training and is allocated with a larger weight value, so that the feature information is utilized more efficiently. The extracted feature is enhanced based on a feature pyramid, so that multi-dimensional target detection is carried out by inputting the feature into the RPN network. The loss function for classification and frame regression is optimized to enhance relation between the two. A policy of adjusting the search region based on a target moving rule is designed to help re-positioning of the target missing tracking.

[0052] According to the specific implementation mode, the present invention provides a landing tracking control method and system based on a lightweight twin network and an unmanned aerial vehicle. The landing tracking control method extracts a network Snet by using a lightweight feature and makes modification, so that an extraction speed of the feature is increased to better meet a real-time requirement. For fixed point landing of the unmanned aerial vehicle, a taking off and landing platform is usually fixed, that is, the tracking task can be divided into tracking of a single target. A same type of targets is usually represented by a special feature channel. Therefore, a module is designed to allocate the importance of channel information to differentiate effective features more purposefully and utilize the features, so that the tracking precision is improved. In order to improve a training effect of the network, a loss function of a (Region Proposal Network) RPN network is optimized, a regression precision of a target frame is measured by using (Complete-IOU) CIOU, and meanwhile, calculation of classified loss function is adjusted according to CIOU, and a relation between a regression network and classification network is enhanced. In a tracking process, when the target is missing as a result of a certain reason, the algorithm can still expand the search region gradually as time lapses according to the previous moving rule of the target, so that it is further ensured that the target is still in the currently searched region when appearing again.

[0053] The technical principle of the present invention is described in combination with the above-mentioned specific embodiments. The descriptions are merely to explain the principle of the present invention rather than being explained to limit the protective scope of the present invention in any form. Based on explanation herein, those skilled in the art may be associated with other specific implementation modes of the present invention without creative efforts, and these implementation modes shall fall into the protective scope of the present invention.

LANDING TRACKING CONTROL METHOD AND SYSTEM BASED ON LIGHTWEIGHT TWIN NETWORK AND UNMANNED AERIAL VEHICLE

Inventors

Cpc classification

Classification Explorer

B64U2201/00

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

G06T7/246

PHYSICS

Classification Explorer

G06T7/60

PHYSICS

Classification Explorer

G06T2207/10016

PHYSICS

Classification Explorer

G06V10/454

PHYSICS

Classification Explorer

G06V2201/07

PHYSICS

Classification Explorer

G06T7/74

PHYSICS

Classification Explorer

G06T7/277

PHYSICS

Classification Explorer

G06T2207/20076

PHYSICS

Classification Explorer

G06V20/46

PHYSICS

Classification Explorer

G06V10/7715

PHYSICS

Classification Explorer

G06T2207/10032

PHYSICS

Classification Explorer

G06V10/764

PHYSICS

Classification Explorer

G05D1/042

PHYSICS

Classification Explorer

G05D1/0676

PHYSICS

Classification Explorer

G06V20/41

PHYSICS

Classification Explorer

G06V10/774

PHYSICS

Classification Explorer

G06V20/17

PHYSICS

Classification Explorer

G06T2207/10024

PHYSICS

Classification Explorer

B64C39/024

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

G06T7/73

PHYSICS

Classification Explorer

G06V10/7747

PHYSICS

Classification Explorer

G06V10/806

PHYSICS

Classification Explorer

G06T2207/20081

PHYSICS

Classification Explorer

G06V10/766

PHYSICS

International classification

Classification Explorer