IMAGE RECOGNITION METHOD AND UNMANNED AERIAL VEHICLE SYSTEM
20220245938 · 2022-08-04
Assignee
Inventors
CPC classification
G06V10/771
PHYSICS
B64U2101/30
PERFORMING OPERATIONS; TRANSPORTING
G06V10/774
PHYSICS
G06V10/7715
PHYSICS
B64C39/024
PERFORMING OPERATIONS; TRANSPORTING
G06T3/40
PHYSICS
International classification
G06T3/40
PHYSICS
G06V10/77
PHYSICS
G06V10/771
PHYSICS
G06V10/774
PHYSICS
Abstract
An image recognition method and an unmanned aerial vehicle system are provided. A training image marked with a specified range is received, and a plurality of features are extracted from the training image through a basic model to obtain a feature map. Next, a frame selection is performed on each point on the feature map to obtain a plurality of initial detection frames, and a plurality of candidate regions are selected in the initial detection frames based on the specified range. Thereafter, the obtained candidate regions are classified to obtain a target block, feature data corresponding to the target block is extracted from the feature map, and a parameter of the basic model is adjusted based on the extracted feature data. In the disclosure, a higher-resolution image is achieved, time flexibility is provided, and accuracy of image recognition is thereby improved.
Claims
1. An image recognition method, comprising: receiving a training image, wherein a specified range is marked in the training image; extracting a plurality of features from the training image through a basic model to obtain a feature map; performing a frame selection on each point on the feature map to obtain a plurality of initial detection frames and selecting a plurality of candidate regions in the initial detection frames based on the specified range; classifying the obtained candidate regions to obtain a target block; extracting feature data corresponding to the target block from the feature map; and adjusting a parameter of the basic model based on the extracted feature data.
2. The image recognition method according to claim 1, wherein the image recognition method further comprises: receiving an input image; scaling down the input image; and performing data augmentation on the scaled-down input image to obtain a plurality of the training images.
3. The image recognition method according to claim 1, wherein the training image further comprises a specified category corresponding to the marked specified range.
4. The image recognition method according to claim 3, wherein the specified category comprises one of landslides, rivers, and roads.
5. The image recognition method according to claim 1, wherein the basic model is an inception residual network.
6. The image recognition method according to claim 1, wherein the step of performing the frame selection on each point on the feature map comprises: extracting the initial detection frames with shapes corresponding to a plurality of filter panes by treating each point on the feature map as a center point.
7. The image recognition method according to claim 1, wherein after the step of selecting the candidate regions in the initial detection frames based on the specified range is performed, the image recognition method further comprises: filtering the candidate regions by using a non-maximum suppression algorithm and classifying the retained candidate regions.
8. The image recognition method according to claim 1, wherein the image recognition method further comprises: receiving an operation to frame a range on the training image; and adjusting the range to the specified range of a regular shape.
9. The image recognition method according to claim 1, wherein the image recognition method further comprises: verifying a recognition rate of the basic model, which comprises: inputting a plurality of test images to the basic model to obtain a plurality of output results; determining whether intersections over union of the output results and specified ranges marked in the test images are greater than a default value; and determining the output results with the intersections over union greater than the default value to be correct recognition to obtain the recognition rate.
10. An unmanned aerial vehicle system, comprising image capturing equipment, an unmanned aerial vehicle, and a computing apparatus, wherein the unmanned aerial vehicle is equipped with the image capturing equipment, and the computing apparatus is configured for: training a basic model, wherein an image to be recognized is received from the image capturing equipment through a transmission manner, and a target block in the image to be recognized is predicted by using the basic model, wherein the step of training the basic model comprises: receiving a training image from the image capturing equipment, wherein a specified range is marked in the training image; extracting a plurality of features from the training image through a basic model to obtain a feature map; performing a frame selection on each point on the feature map to obtain a plurality of initial detection frames and selecting a plurality of candidate regions in the initial detection frames based on the specified range; classifying the obtained candidate regions to obtain a target block; extracting feature data corresponding to the target block from the feature map; and adjusting a parameter of the basic model based on the extracted feature data.
11. The unmanned aerial vehicle system according to claim 10, wherein the computing apparatus is configured for: receiving an input image; scaling down the input image; and performing data augmentation on the scaled-down input image to obtain a plurality of the training images.
12. The unmanned aerial vehicle system according to claim 10, wherein the training image further comprises a specified category corresponding to the marked specified range.
13. The unmanned aerial vehicle system according to claim 12, wherein the specified category comprises one of landslides, rivers, and roads.
14. The unmanned aerial vehicle system according to claim 10, wherein the basic model is an inception residual network.
15. The unmanned aerial vehicle system according to claim 10, wherein the computing apparatus is configured for: extracting the initial detection frames with shapes corresponding to a plurality of filter panes by treating each point on the feature map as a center point.
16. The unmanned aerial vehicle system according to claim 10, wherein the computing apparatus is configured for: filtering the candidate regions by using a non-maximum suppression algorithm and classifying the retained candidate regions.
17. The unmanned aerial vehicle system according to claim 10, wherein the computing apparatus is configured for: receiving an operation to frame a range on the training image; and adjusting the range to the specified range of a regular shape.
18. The unmanned aerial vehicle system according to claim 10, wherein the computing apparatus is configured for: verifying a recognition rate of the basic model, comprising: inputting a plurality of test images to the basic model to obtain a plurality of output results; determining whether intersections over union of the output results and specified ranges marked in the test images are greater than a default value; and determining the output results with the intersections over union greater than the default value to be correct recognition to obtain the recognition rate.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the disclosure and, together with the description, serve to explain the principles of the disclosure.
[0014]
[0015]
[0016]
DESCRIPTION OF THE EMBODIMENTS
[0017] It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the disclosure. Also, it is to be understood that the phraseology and terminology used herein are for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” or “having” and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. Unless limited otherwise, the terms “connected,” “coupled,” and “mounted,” and variations thereof herein are used broadly and encompass direct and indirect connections, couplings, and mountings.
[0018]
[0019] The computing apparatus 130 is, for example, an electronic apparatus with computing functions such as a server, a personal computer, a tablet computer, or a smart phone, and has a processor and a storage device. The computing apparatus 130 receives image information from the unmanned aerial vehicle 120 through a wireless transmission manner. The wireless transmission manner is a manner known to a person having ordinary skill in the art, and description thereof is thus not provided herein. The processor is, for example, a central processing unit (CPU), a graphics processing unit (GPU), a physics processing unit (PPU), a programmable microprocessor, an embedded control chip, a digital signal processor (DSP), an application specific integrated circuit (ASIC), or other similar devices. The storage device may be implemented as a fixed or a movable random access memory in any form, a read-only memory, a flash memory, a secure digital card, a hard disk, other similar devices, or a combination of the foregoing devices. One or more code segments are stored in the storage device, and the code segments are executed by the processor to complete an image recognition method provided as follows. In the embodiments provided as follows, a faster region-based convolutional neural network (faster R-CNN) framework is adopted, and two-stage detection is used for image recognition. That is, an object position is detected first and classification is then performed.
[0020]
[0021] In addition, data augmentation may be used to obtain the training images. The input image obtained by the image capturing equipment 110 during aerial photography has a high resolution, so directly inputting it to a basic model requires a considerable amount of memory space. Therefore, the input image may first be scaled down proportionally (that is, with its height and width reduced by the same ratio), and data augmentation may then be performed on the scaled-down input image to obtain a plurality of the training images. Data augmentation includes different strategies, such as rotation, color adjustment, mirroring, and translation or deformation of a target region.
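The scale-down and augmentation steps described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the disclosure: the function names `scale_down` and `augment`, the block-averaging method, the integer scale factor, and the brightness factor of 1.2 are all assumptions introduced for the example.

```python
import numpy as np

def scale_down(image: np.ndarray, factor: int) -> np.ndarray:
    """Shrink height and width by the same integer factor via block averaging.

    `image` is assumed to have shape (height, width, channels).
    """
    h, w = image.shape[:2]
    h, w = h - h % factor, w - w % factor  # crop so both sides divide evenly
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor, -1)
    return blocks.mean(axis=(1, 3))

def augment(image: np.ndarray) -> list:
    """Return a few variants: original, mirrored, rotated 90 degrees, brightened."""
    brighter = np.clip(image * 1.2, 0, 255)  # assumes pixel values in 0..255
    return [image, np.fliplr(image), np.rot90(image), brighter]

# Hypothetical 8 x 12 RGB input standing in for an aerial photo.
input_image = np.arange(8 * 12 * 3, dtype=float).reshape(8, 12, 3)
training_images = augment(scale_down(input_image, factor=4))
```

In practice, any specified range marked on an image would have to be transformed in the same way as the image itself, which this sketch does not cover.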
[0022] After obtaining the training images, the processor may further receive an operation through input equipment, so as to perform a frame selection on a range to be marked on a training image and adjust such range to the specified range of a regular shape. Herein, the regular shape is, for example, a square.
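One possible way to adjust a framed range to a square is sketched below. The disclosure does not state how the adjustment is made, so the rule used here (the smallest enclosing square sharing the center of the framed rectangle) and the function name `to_square` are assumptions for illustration only.

```python
def to_square(x_min, y_min, x_max, y_max):
    """Expand a framed rectangle to the smallest enclosing square with the same center."""
    side = max(x_max - x_min, y_max - y_min)
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    half = side / 2
    return (cx - half, cy - half, cx + half, cy + half)

# A 40 x 20 framed range becomes a 40 x 40 square around the same center.
specified_range = to_square(10, 20, 50, 40)
```

The sketch does not clip the square to the image borders; a real tool would likely need to handle that case.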
[0023]
[0024] Next, in step S210, a plurality of features are extracted from the training image through the basic model to obtain a feature map. The choice of the basic model may affect the ability of model feature extraction. Herein, the basic model adopts an inception residual network (inception Resnet) structure. The training image is inputted to the basic model for feature extraction, and the feature map is accordingly obtained. For instance, in the faster R-CNN framework, a feature value of each region in the training image is extracted to act as input, and the corresponding feature map is then extracted through a convolution kernel operation in a convolution layer. Each point on the feature map may be treated as a feature of a corresponding region in the original training image.
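The relation between a point on the feature map and a region of the training image can be illustrated with a single convolution kernel. The sketch below is a deliberately simplified stand-in for the basic model: it applies one kernel to a single-channel image, whereas an inception residual network stacks many layers. The names `conv2d_valid` and `source_region` are hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide one kernel over a 2-D image (no padding) and return the feature map.

    As is usual in CNN frameworks, the kernel is not flipped (cross-correlation).
    """
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

def source_region(i, j, kernel_shape, stride=1):
    """Image region (row0, col0, row1, col1) that produced feature_map[i, j]."""
    kh, kw = kernel_shape
    return (i * stride, j * stride, i * stride + kh, j * stride + kw)

image = np.arange(36.0).reshape(6, 6)
kernel = np.ones((2, 2))
feature_map = conv2d_valid(image, kernel, stride=2)
region = source_region(1, 1, kernel.shape, stride=2)  # rows 2..4, cols 2..4
```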
[0025] Next, in step S215, a frame selection is performed on each point on the feature map to obtain a plurality of initial detection frames, and a plurality of candidate regions are selected in the initial detection frames based on the specified range. Herein, in the faster R-CNN framework, the initial detection frames with shapes corresponding to a plurality of filter panes are extracted by treating each point on the feature map as a center point. It is assumed that the filter panes include 9 types of basic regions (anchors), which are obtained by combining three sizes with three length-to-width ratios and correspond to specified ranges of different shapes. Nine initial detection frames are extracted for each point on the feature map by using the 9 types of the filter panes. After the initial detection frames of all points are obtained, for each specified range marked in the training image, the initial detection frame that is most consistent with that specified range is selected as a candidate region. After the candidate regions are selected, a non-maximum suppression algorithm is used to filter the candidate regions, and subsequent classification is performed on the retained candidate regions.
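The frame selection, candidate selection, and non-maximum suppression of step S215 can be sketched as follows. Several details are assumptions rather than part of the disclosure: the three sizes (128, 256, 512) and three ratios (0.5, 1.0, 2.0) are values commonly used with faster R-CNN, "most consistent" is interpreted as the highest intersection over union, and the 0.5 suppression threshold and all function names are hypothetical.

```python
import numpy as np

def make_anchors(cx, cy, sizes=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return 9 frames (x1, y1, x2, y2) centered on (cx, cy): 3 sizes x 3 ratios."""
    boxes = []
    for size in sizes:
        for ratio in ratios:  # ratio = height / width, area kept near size**2
            w = size / np.sqrt(ratio)
            h = size * np.sqrt(ratio)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)

def area(b):
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])

def iou(box, boxes):
    """Intersection over union between one box and an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    return inter / (area(box) + area(boxes) - inter)

def best_anchor(marked_box, anchors):
    """Index of the initial detection frame that best overlaps the marked range."""
    return int(np.argmax(iou(np.asarray(marked_box, dtype=float), anchors)))

def nms(boxes, scores, threshold=0.5):
    """Keep the highest-scoring boxes, dropping any that overlap a kept box too much."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) <= threshold]
    return keep

anchors = make_anchors(300, 300)                    # 9 frames for one feature-map point
chosen = best_anchor((172, 172, 428, 428), anchors)  # a 256 x 256 marked range
```

In a full detector, the anchors of every feature-map point would be pooled before selection, and the scores passed to `nms` would come from the region proposal stage.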
[0026] In the embodiment, two-stage detection is adopted, and an object position (candidate region) is detected first and is then classified. That is, candidate regions that may contain landslide regions are selected, and these candidate regions are then classified by using the original feature map to determine whether these candidate regions are landslide regions. The advantage of this two-stage approach is its accuracy.
[0027] Thereafter, in step S220, the obtained candidate regions are classified to obtain a target block. With the selected candidate regions, a feature corresponding to each of the candidate regions is extracted from the original feature map. Final region correction and classification are performed on these candidate regions, so as to select the target block among the candidate regions. In terms of landslide detection, landslides, roads, and rivers have similar features. Although the purpose is to recognize landslides, experiments show that the overall accuracy may improve when road and river regions are added as additional classification categories and only the locations of the landslides are selected in the end.
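The idea of classifying candidate regions into several categories and keeping only the landslides can be sketched as below. The class scores are made-up numbers, the "background" class is an assumption (the disclosure only names landslides, rivers, and roads), and `select_landslides` is a hypothetical helper.

```python
import numpy as np

CLASSES = ("background", "landslide", "river", "road")  # assumed label set

def softmax(logits):
    """Convert per-region class scores to probabilities (numerically stable)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def select_landslides(region_logits):
    """Classify every candidate region, then keep only those labeled as landslide."""
    probs = softmax(np.asarray(region_logits, dtype=float))
    labels = probs.argmax(axis=1)
    return [i for i, label in enumerate(labels) if CLASSES[label] == "landslide"]

# Hypothetical classifier scores for four candidate regions.
region_logits = [
    [0.1, 3.0, 0.2, 0.1],  # landslide
    [0.1, 0.2, 2.5, 0.3],  # river
    [0.0, 1.0, 0.9, 2.0],  # road
    [0.2, 4.0, 1.0, 0.5],  # landslide
]
target_blocks = select_landslides(region_logits)
```

Giving rivers and roads their own categories lets a region that looks like a landslide be assigned to a closer match instead of being forced into the landslide class.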
[0028] Next, in step S225, feature data corresponding to the target block is extracted from the feature map. Further, in step S230, a parameter of the basic model is adjusted based on the extracted feature data. In terms of landslide detection, the target block belonging to landslides is found, and the feature data corresponding to the landslides is obtained from the feature map to adjust the parameter of the basic model. The parameter includes at least one of a convolutional layer parameter, a fully-connected layer parameter, and an output layer parameter.
[0029] The purpose of basic model training is that when an image is inputted, the basic model can predict the location of landslides, and such prediction needs to resemble the manually marked data as closely as possible. Herein, a momentum optimizer may be further used to facilitate parameter adjustment, and the training does not stop until the model converges.
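A momentum-based parameter update can be sketched with the classical momentum rule. The learning rate, momentum coefficient, convergence test (gradient norm below a tolerance), and the toy quadratic loss are assumptions for illustration; the disclosure does not specify them.

```python
import numpy as np

def momentum_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """One classical momentum update: accumulate a velocity, then move the parameters."""
    velocity = momentum * velocity - lr * grads
    return params + velocity, velocity

def train(grad_fn, params, lr=0.01, momentum=0.9, tol=1e-6, max_steps=10_000):
    """Repeat updates until the gradient norm falls below `tol` (assumed convergence test)."""
    velocity = np.zeros_like(params)
    for _ in range(max_steps):
        grads = grad_fn(params)
        if np.linalg.norm(grads) < tol:
            break
        params, velocity = momentum_step(params, grads, velocity, lr, momentum)
    return params

# Toy loss (p - 3)^2 standing in for the detection loss; its gradient is 2 * (p - 3).
fitted = train(lambda p: 2.0 * (p - 3.0), np.array([0.0]))
```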
[0030] In addition, after parameter adjustment is completed, a recognition rate of the basic model may be further verified. That is, a plurality of test images are inputted to the basic model to obtain a plurality of output results, and whether the intersections over union of the output results and the specified ranges marked in the test images are greater than a default value is determined. Next, the output results with the intersections over union greater than the default value are determined to be correct recognition, so as to obtain the recognition rate.
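The verification described above can be sketched as follows, assuming each output result and each marked specified range is an axis-aligned box given as (x1, y1, x2, y2) and that the two lists are paired one to one. The function names and the example boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def recognition_rate(outputs, marked, default_value=0.5):
    """Fraction of output results whose IoU with the marked range exceeds the default value."""
    correct = sum(iou(o, m) > default_value for o, m in zip(outputs, marked))
    return correct / len(outputs)

outputs = [(0, 0, 10, 10), (0, 0, 10, 10), (20, 20, 30, 30)]
marked = [(0, 0, 10, 10), (5, 5, 15, 15), (21, 21, 30, 30)]
rate = recognition_rate(outputs, marked)  # two of the three outputs exceed 0.5
```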
[0031] In an embodiment, it is assumed that a total of 968 images are obtained through aerial photography by the image capturing equipment 110, of which 774 images are used as the training images and 194 images are used as the test images. Firstly, the 194 test images are manually marked to obtain the specified range of the corresponding landslide region. Secondly, the 194 test images are inputted to the basic model one by one to obtain the final output result. Next, the output result is compared with the marked specified range. Since it is necessary to check whether the landslide position in the output result is correct, an output result is regarded as a correct identification if its intersection over union (IOU) with the marked specified range is greater than 50%. Further, each test image contains a different number of landslide regions. Therefore, correct detection is further defined as the detection of all landslides on the test image. Even if a landslide is redundantly detected, such detection is still considered correct detection, whereas if even one landslide is missed, the detection is considered an error. Verification results are shown in Table 1.
TABLE 1 Verification Results
Result              Breakdown              Number   Percentage
Correct detection   (total)                175      90%
                    no misjudgment         164      85%
                    redundant detection    11       5%
Missed detection    (total)                19       10%
                    at least one missed    18       9%
                    all missed             1        1%
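The per-image rule behind Table 1 can be sketched as below. The category names follow the table, but how a detection is matched to a marked landslide (any detection with an IoU above 50%) and the treatment of "at least one missed" as "some but not all missed" are interpretations made for this example. `judge_image` and `box_iou` are hypothetical names.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def judge_image(detections, marked, threshold=0.5):
    """Classify one test image into one of the four Table 1 categories."""
    # A marked landslide counts as found if any detection overlaps it enough.
    found = [any(box_iou(d, m) > threshold for d in detections) for m in marked]
    if not all(found):
        return "all missed" if not any(found) else "at least one missed"
    # All landslides found: extra detections are tolerated but reported separately.
    redundant = any(all(box_iou(d, m) <= threshold for m in marked) for d in detections)
    return "redundant detection" if redundant else "no misjudgment"

marked = [(0, 0, 10, 10), (20, 20, 30, 30)]
result = judge_image([(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)], marked)
```

Under this rule, the first two categories count as correct detection and the last two count as missed detection.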
[0032] In view of the foregoing, in the embodiments, the advantage of using the image capturing equipment together with the unmanned aerial vehicle for landslide detection is flexibility. Before and after a disaster, the unmanned aerial vehicle can take off at any time for landslide inspection. Further, the unmanned aerial vehicle itself has a global positioning system (GPS) that can record the location where each photo is taken. When a landslide is detected, the region where the landslide occurred can be known, and an early warning map may thus be established accordingly. A higher-resolution image is achieved with the use of the unmanned aerial vehicle to take photos, time flexibility is provided, and accuracy of image recognition is thereby improved. In addition, through the embodiments, after a collapse occurs, the landslide region may be known as soon as possible, and contingency measures may then be taken.
[0033] The above are exemplary embodiments of the disclosure and should not be construed as limitations to the scope of the disclosure. That is, any simple change or modification made based on disclosure of the claims and specification of the disclosure falls within the scope of the disclosure. Any of the embodiments or any of the claims of the disclosure does not necessarily achieve all of the advantages or features disclosed by the disclosure. Moreover, the abstract and the title are merely used to aid in search of patent files and are not intended to limit the scope of the claims of the disclosure. In addition, terms such as “first” and “second” in the specification or claims are used only to name the elements or to distinguish different embodiments or scopes and should not be construed as the upper limit or lower limit of the number of any element.
[0034] The foregoing description of the preferred embodiments of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form or to exemplary embodiments disclosed. Accordingly, the foregoing description should be regarded as illustrative rather than restrictive. Obviously, many modifications and variations will be apparent to practitioners skilled in this art. The embodiments are chosen and described in order to best explain the principles of the invention and its best mode of practical application, thereby to enable persons skilled in the art to understand the invention for various embodiments and with various modifications as are suited to the particular use or implementation contemplated. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents in which all terms are meant in their broadest reasonable sense unless otherwise indicated. Therefore, the term “the invention”, “the present invention” or the like does not necessarily limit the claim scope to a specific embodiment, and the reference to particularly preferred exemplary embodiments of the invention does not imply a limitation on the invention, and no such limitation is to be inferred. The invention is limited only by the spirit and scope of the appended claims. Moreover, these claims may use “first”, “second”, etc. followed by a noun or element. Such terms should be understood as a nomenclature and should not be construed as giving the limitation on the number of the elements modified by such nomenclature unless a specific number has been given. The abstract of the disclosure is provided to comply with the rules requiring an abstract, which will allow a searcher to quickly ascertain the subject matter of the technical disclosure of any patent issued from this disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. Any advantages and benefits described may not apply to all embodiments of the invention. It should be appreciated that variations may be made in the embodiments described by persons skilled in the art without departing from the scope of the present invention as defined by the following claims. Moreover, no element and component in the present disclosure is intended to be dedicated to the public regardless of whether the element or component is explicitly recited in the following claims.