Method for flood disaster monitoring and disaster analysis based on vision transformer

Abstract

A method for flood disaster monitoring and disaster analysis based on vision transformer is provided. It includes: step (1), constructing a bi-temporal image change detection model based on vision transformer; step (2), selecting bi-temporal remote sensing images to make flood disaster labels; and step (3), performing flood monitoring and disaster analysis according to the bi-temporal image change detection model constructed in the step (1). In combination with the bi-temporal image change detection model based on an advanced vision transformer in deep learning and radar data which is not affected by time and weather and has strong penetration ability, data when floods occur can be obtained and recognition accuracy is improved.

Claims

1. A method for flood disaster monitoring and disaster analysis based on vision transformer, comprising: step (1), constructing a bi-temporal image change detection model, wherein the bi-temporal image change detection model comprises a convolutional neural network (CNN) framework, a semantic marker, and a transformer module comprising an encoder and a decoder; and an implementation process of the step (1) comprises: step (11), semantic feature extraction, which comprises: performing feature extraction by using the CNN framework to obtain changed water body features X.sup.i of bi-temporal remote sensing images; processing the changed water body features X.sup.i by using a spatial attention mechanism A.sup.i to obtain a bi-temporal feature set F.sup.i expressed as follows: F.sup.i=(A.sup.i).sup.TX.sup.i, where i represents a temporal order of remote sensing image, i=1, 2; T represents a transpose operation; and inputting the bi-temporal feature set F.sup.i into the transformer module; step (12), transformer encoding, which comprises: encoding the bi-temporal feature set F.sup.i obtained in the step (11) to construct a rich semantic feature set F′.sup.i with a spatiotemporal relationship; wherein a relationship between an attention mechanism and a softmax activation function is expressed as follows: $Attention (Q, K, V) = softmax (\frac{Q K^{T}}{\sqrt{d}}) V,$
Q=F.sup.(l-1)W.sup.q,
K=F.sup.(l-1)W.sup.k,
V=F.sup.(l-1)W.sup.v, where, the attention mechanism is composed of Q, K and V function keys that control feature weights; l represents a number of layers of the CNN framework, W.sup.q, W.sup.k and W.sup.v are learning parameters, and d represents a channel size; step (13), transformer decoding, which comprises: decoding the rich semantic feature set F′.sup.i encoded in the step (12) by the decoder in a pixel space to optimize the changed water body features X.sup.i to obtain optimized changed water body features X′.sup.i, and then calculating a feature difference image through shallow CNN to thereby obtain a prediction result at a pixel level; step (2), selecting bi-temporal remote sensing images to make flood disaster labels; wherein the bi-temporal remote sensing images with a target spatial range and a target time scale are selected to make the flood disaster labels with a target generalization, and an implementation process of the step (2) comprises: step (21), performing preprocessing of radiometric calibration, geometric correction, and logarithmic transformation on paired bi-temporal remote sensing images containing water bodies before and after a flood period in a previous year; step (22), marking preprocessed bi-temporal remote sensing images by using an image marking tool; and then differentially labelling a changed part and an unchanged part of the water body after the flood period compared with the water body before the flood period by using an image processing software, to thereby obtain an image with a size same as its original size and with truth values of water body change; step (23), segmenting marked bi-temporal remote sensing images into a sample set inputtable into the bi-temporal image change detection model, screening the sample set to remove a sample having no water body change from the sample set, and dividing screened sample set into a training set, a validation set, and a testing set; and step (3), performing flood monitoring and disaster analysis according to the bi-temporal image change detection model constructed in the step (1).

2. The method according to claim 1, wherein the step (3) comprises: training the bi-temporal image change detection model based on the flood disaster labels made in the step (2), and adjusting parameters of the model until a loss curve of the model is fitted and a value thereof approaches to 0 while an accuracy of identification of flood submerged area of remote sensing image reaches 95% or more.

Description

BRIEF DESCRIPTION OF DRAWINGS

(1) FIG. 1 illustrates a schematic diagram of a transformer decoder of the disclosure.

(2) FIG. 2 illustrates a frame diagram of a bi-temporal image change detection model based on vision transformer, which is mainly composed of a convolutional neural network (CNN) framework (i.e., ResNet), a semantic marker, and a transformer module (including an encoder and a decoder).

(3) FIG. 3 illustrates a schematic diagram of a bi-temporal microwave remote sensing flood disaster label made by the disclosure, where the first row and the second row show Sentinel-1 data of two temporals, and the third row shows corresponding target change water body labels.

(4) FIG. 4 illustrates a schematic diagram of flood disaster analysis in Poyang Lake area in the summer of 2020 year.

DETAILED DESCRIPTION OF EMBODIMENTS

(5) The disclosure will be further described in detail below in combination with the accompanying drawings and specific embodiments of description.

(6) The disclosure proposes a bi-temporal image change detection model based on vision transformer for effectively modeling spatiotemporal information of microwave remote sensing images, so that water body change conditions before and after a flood can be efficiently obtained, and the flood disaster can be accurately monitored and analyzed. On the one hand, a vision transformer method is employed as a model framework of the disclosure, in which the transformer uses attention and other mechanisms to increase a receptive field and use the spatiotemporal information, which can effectively learn changes of a target in a high-resolution remote sensing image and make full use of the spatiotemporal information. On the other hand, the disclosure makes bi-temporal flood disaster labels, and takes remote sensing data before and after the occurrence of flood as training data of the model, which can be used to accurately monitor flood events and accurately identify a flood submerged area.

(7) Specific steps are as follows.

(8) Step 1, constructing a bi-temporal image change detection model based on vision transformer.

(9) FIG. 1 illustrates a structural framework diagram of the bi-temporal image change detection model based on vision transformer of the disclosure, which mainly includes a CNN framework (ResNet), a semantic marker, and a transformer module (also referred to as transformer). The transformer may include an encoder and a decoder. An implementation process is as follows.

(10) Step (11), semantic feature extraction

(11) Specifically, a classical deep learning convolution neural network ResNet is used as a framework to extract semantic features of changes of a target water body in bi-temporal remote sensing images, and then a spatial attention mechanism is used to transform the remote sensing image feature mapping of each temporal into a rich semantic feature set.

(12) Feature extraction is performed first by using the CNN framework to obtain changed water body features X.sup.i(i=1, 2) in the bi-temporal remote sensing images, and then a spatial attention mechanism A.sup.i is used to process the changed water body features X.sup.i (i=1, 2) to obtain a bi-temporal feature set F.sup.i, and input the bi-temporal feature set F.sup.i into the transformer module; and the bi-temporal feature set F.sup.i is expressed as follows:
F.sup.i=(A.sup.i).sup.TX.sup.i (1)

(13) where i represents a temporal order of remote sensing images, and T represents a transpose operation of the remote sensing images.

(14) Step (12), transformer encoding

(15) Transformer can make full use of a global semantic relationship of bi-temporal remote sensing images, and therefore after generating rich semantic feature representation for each temporal, the encoder of the transformer is used to model the obtained bi-temporal feature set F.sup.i in spatiotemporal. The encoder encodes information about elements in relative or absolute locations in spatiotemporal because location information facilitates the encoder context modeling.

(16) As shown in FIG. 2, the bi-temporal feature set F.sup.i obtained in the step (11) is encoded, that is, a relationship of bi-temporal features in spatiotemporal is established to construct a rich semantic feature set F′.sup.i with a spatiotemporal relationship. Firstly, an operation of the attention mechanism is performed on the bi-temporal feature set F.sup.i by using multi-head self-attention (MSA), because MSA can connect information of different features from different locations. Then normalization is used to keep the bi-temporal feature set F.sup.i consistent in space, which is convenient for network feature extraction to be more stable. Finally, multi-layer perception (MLP) is used to better link the input bi-temporal feature set F.sup.i and the output rich semantic feature set F′.sup.i, which improves the efficiency and stability of network training. Among them, MSA is an attention mechanism composed of Q, K and V function keys that control feature weights; MLP represents a multi-layer perception, and softmax is an activation function:

(17) $\begin{matrix} Attention (Q, K, V) = softmax (\frac{Q K^{T}}{\sqrt{d}}) V & (2) \end{matrix}$
Q=F.sup.(l-1)W.sup.q (3)
K=F.sup.(l-1)W.sup.k (4)
V=F.sup.(l-1)W.sup.v (5)

(18) where T is the transpose operation, l represents a number of layers of the network, W.sup.q, W.sup.k and W.sup.v are learning parameters, and d represents a channel size.

(19) Step (13), transformer decoding

(20) The context rich semantic feature set F′.sup.i obtained by the encoder is remapped into a pixel space by the twin decoder to enhance original pixel level semantic features. Finally, a feature difference image is calculated from two optimized feature images, and the feature difference image is input into shallow CNN to obtain a prediction result of water body change at a pixel level.

(21) Step 2, flood disaster labels making based on microwave remote sensing of Sentinel-1

(22) At present, flood monitoring is to identify the water body of each remote sensing image, so as to obtain a flood evolution process, which is a heavy workload, time-consuming, and laborious. The disclosure uses the paired bi-temporal remote sensing images to monitor the flood, and can quickly identify the flood submerged area through two remote sensing images of the same area before and after the disaster, which greatly shortens early-stage data processing time and improves the efficiency. In order to meet various application scenarios and monitor flood events in different regions at different times, bi-temporal microwave remote sensing images with a broad spatial scope and a long time scale are selected to make the flood disaster labels with strong generalization. A specific implementation process of the step 2 are as follows.

(23) Step (21), in order to meet the requirement that data samples have good temporal and spatial generalization and can be applied to flood monitoring in different regions and seasons, the necessary preprocessing is performed on paired bi-temporal remote sensing images of Sentinel 1 in middle and lower reaches of the Yangtze River before and after the flood period from 2015 to 2020 years, including radiometric calibration, geometric correction, logarithmic transformation, etc.

(24) Step (22), according to shapes, textures, colors, and other information of the water body in synthetic aperture radar (SAR) images, the SAR images are annotated manually by visual interpretation to obtain a water body change result image of bi-temporal remote sensing images. Use photoshop (PS) and other image processing software to label a changed part of the water body due to flood period (flood) as white and an unchanged part of the water body as black (as shown in FIG. 3), and finally an image with a size same as its original size and with truth values of water body change is obtained.

(25) Step (23), marked bi-temporal remote sensing images (i.e., annotated images in the step (22)) are respectively segmented into a fixed size that can be input into the model: 256*256 to thereby form a sample set. A marked dataset (i.e., the sample set) is segmented according to a channel size of the transformer of the model. In this situation, in order to improve the model training efficiency, samples in the sample set are screened and a sample without water body changes in the images are removed, which can effectively shorten the time of loading images during model training. A total of 6296 pairs of water body samples (a training set: 70%, a validation set: 20%, a testing set: 10%) are produced for Sentinel-1 images, which can be applied to flood monitoring in different regions and different phases.

(26) Step 3, accurate flood monitoring and disaster analysis of Sentinel-1 by the change detection model based on vision transformer

(27) Step (31), the bi-temporal image change detection model based on vision transformer is trained by using the Sentinel-1 water body change dataset obtained from the step 2. By adjusting parameters, changing the loss function, and adjusting the network structure, until a loss curve of the network is fitted and the value is close to 0, the identification accuracy of the changed water body in the remote sensing images can reach 95% or more, which can meet the requirements of efficient and accurate identification of water body changes, so as to be applied to flood monitoring.

(28) Step (32), a flood disaster situation is analyzed by combining a water body distribution image and land cover type data of disaster area obtained by the bi-temporal image change detection model, so as to obtain the expansion trend of the flood and the submerged area of different land surface types, and provide decision-making basis for flood disaster prevention and reduction.

(29) The bi-temporal image change detection model including transformer is trained based on the above-mentioned the water body change dataset, and model parameters are adjusted, which can meet the ability of efficiently and accurately identifying the flood submerged area. According to the rapid and accurate monitoring of flood events, the water body change and the expansion trend of flood before and after the flood can be obtained, which can provide decision-making basis for flood disaster prevention and reduction. Combined with a local land use map, the submerged area and changes of different land surface types in the disaster area can be obtained, so that the damage caused by flood disasters can be quantitatively analyzed.

(30) FIG. 4 illustrates a schematic diagram of flood disaster analysis in Poyang Lake area in a summer of 2020. Combined analysis of flood area and land cover type data identified by the bi-temporal image change detection model, a flood inundation situation of the disaster area is obtained.

(31) The disclosure proposes a deep learning model for bi-temporal image change detection based on vision transformer, and adopts data of active microwave remote sensing satellite (Sentinel-1) capable of surveying an earth's surface through clouds, which can realize high-precision all-weather and all-weather flood disaster accurate monitoring and disaster analysis, and its technical indicators can improve a flood identification accuracy to 95% or more.

Method for flood disaster monitoring and disaster analysis based on vision transformer

Assignee

Inventors

Cpc classification

Classification Explorer

G06N3/0464

PHYSICS

Classification Explorer

G06V10/454

PHYSICS

Classification Explorer

G06N3/0455

PHYSICS

Classification Explorer

G06N3/08

PHYSICS

Classification Explorer

G06V10/7715

PHYSICS

Classification Explorer

G06V10/26

PHYSICS

Classification Explorer

G06N3/048

PHYSICS

Classification Explorer

G06N3/09

PHYSICS

Classification Explorer

G06V20/13

PHYSICS

Classification Explorer

G06V10/82

PHYSICS

Classification Explorer

Y02A10/40

GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS

Classification Explorer

G06V10/774

PHYSICS

Classification Explorer

G06N3/0475

PHYSICS

Classification Explorer

G06V20/182

PHYSICS

Classification Explorer

G06V10/62

PHYSICS

International classification

Classification Explorer

G06V20/10

PHYSICS

Classification Explorer

G06V10/77

PHYSICS

Classification Explorer

G06V10/774

PHYSICS

Classification Explorer

G06V10/26

PHYSICS

Classification Explorer

G06N3/08

PHYSICS

Classification Explorer

G06V10/82

PHYSICS

Classification Explorer

G06V20/13

PHYSICS

Classification Explorer

G06V10/62

PHYSICS

Abstract

Claims

Description