Method and device for flight path planning considering both the flight trajectory and the visual images from air traffic control systems for air traffic controllers

Abstract

The invention discloses a method for flight path planning that considers both the flight trajectory and the visual images (VI) presented to air traffic controllers (ATCOs) by the air traffic control (ATC) system, comprising the following steps: Step 1: acquire the VI and the flight trajectory as the method inputs, and extract features of the VI and the relative position of the aircraft; Step 2: construct reinforcement learning-based methods to support the decision-making for flight path planning, and train the models in the proposed method; Step 3: based on the optimized reinforcement learning-based methods, predict the operation sequence required to guide the flight to the target waypoint. The method of the invention supports safe and efficient flight path planning for air traffic operation and can reduce the workload of air traffic controllers.

Claims

1. A method for flight path planning considering both the flight trajectory and the visual images (VI) from air traffic control (ATC) system for air traffic controllers (ATCOs), comprising the following steps: Step 1: Acquire the VI and the flight trajectory, preprocess the VI, and extract features of the VI and the relative position of the aircraft; Step 2: Construct reinforcement learning-based methods to support the decision-making for flight path planning, and conduct the training procedures of the models in the proposed method; The reinforcement learning-based methods to support the decision-making for flight path planning comprise a route planning module (RPM), an action selection module (ASM) and a fusion module (FM); The RPM is used to predict the route sequence to the target waypoint according to the features of the VI and the relative position of the aircraft acquired in Step 1; The FM is used to generate fused features considering both the flight trajectory and the VI according to the route sequence and the flight trajectory to the target waypoint; The FM comprises fully connected layers and an attention mechanism; The fully connected layers are used to extract features from the route sequence and the flight trajectory with the same shape; The attention mechanism is used to aggregate information by assigning learnable weights to route sequence and flight trajectory to generate fused features H; the mathematical notations are as follows:

$\begin{aligned} Q &= XW_{Q} + b_{Q} \\ K &= XW_{K} + b_{K} \\ V &= XW_{V} + b_{V} \\ H &= \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d}}\right) V \end{aligned}$

Where, Q, K and V are the three weight matrices (query, key and value) in the attention mechanism, X is the matrix composed of processed route sequence and flight trajectory, $W_{Q}$, $W_{K}$ and $W_{V}$ are linear transformation matrices, $b_{Q}$, $b_{K}$ and $b_{V}$ are bias vectors, d is the vector dimension, and T is the transpose of matrix; The ASM is used to obtain the next operation for flight path planning according to the fused features outputted by the FM; The ASM comprises fully connected layers, and the fused features obtained from the FM serve as inputs; the Actor-Critic-based reinforcement learning framework is adopted for training the ASM, and the training process is as follows: Construct a value estimation network Critic (Critic) and an action output network Actor (Actor); the Critic evaluates the action outputted from the Actor; the Actor outputs the decision-making for the ATCO action; the ATCO's decision-making actions are selected from maintaining current state, left turn, right turn, ascent, descent, acceleration, and deceleration; Assign a reward to the traffic situation corresponding to the current flight trajectory by a dedicatedly designed reward function; Train the Critic and Actor to achieve the network convergence; The reward function in the ASM is as follows:

$\mathrm{reward} = \begin{cases} \frac{\deg(\theta_{rel\_waypoint}, \theta_{waypoint}) \cdot \alpha}{\max(\mathrm{dist}, 1) \cdot \beta} \cdot \left(1 - \deg(\theta_{aircraft}, \theta_{waypoint})\right) + \mathrm{abs}(h - \mathrm{gp\_altitude}) & \text{Aircraft flying normally} \\ 1000 + \max\left((\mathrm{limit\_steps} - \mathrm{total\_steps}) \cdot \gamma, 0\right) & \text{Aircraft arriving at the target waypoint} \\ -200 & \text{Aircraft flying unusually (flying out of the airspace or below minimum vectoring altitude)} \end{cases}$

Where, $\theta_{rel\_waypoint}$ is the relative angle between the aircraft and the target waypoint, $\theta_{waypoint}$ is the heading of the target waypoint, $\theta_{aircraft}$ is the aircraft heading, h is the flight altitude, gp_altitude is the altitude where the aircraft starts taxiing, dist is the distance between the aircraft and target waypoint, limit_steps is the limit of steps in one episode of reinforcement learning, total_steps is the number of steps currently performed, deg is a function to calculate the relative angle, α, β and γ are weight coefficients, and abs is the function to return the absolute value; Step 3: Feed the features of the VI and the flight trajectory into the reinforcement learning-based methods to support the decision-making for flight path planning trained in Step 2 to obtain the flight path planning operation to make the flight path control decision.

2. The method for flight path planning considering both the flight trajectory and the visual images from ATC system for air traffic controllers according to claim 1, wherein the pre-processing in Step 1 comprises the following steps: Step 1-1: Acquire the VI, and resize the image to the predefined size; Step 1-2: Downsample the image, and output the downsampled image in the form of a 3D array; Step 1-3: Search for the RGB value of the flight in the 3D array obtained in Step 1-2, and obtain the relative position of the aircraft according to the RGB value.

3. The method for flight path planning considering both the flight trajectory and the visual images from ATC system for air traffic controllers according to claim 1, wherein the RPM in Step 2 comprises a convolutional neural network and fully connected layers; The convolutional neural network takes the VI as inputs, learns VI features, and compresses the data dimension; The fully connected layers take the relative position of the flight as inputs and learn the relative position features; the VI features and relative position features are concatenated and fed into the fully connected layers for further feature extraction.

4. The method for flight path planning considering both the flight trajectory and the visual images from ATC system for air traffic controllers according to claim 3, wherein the Actor-Critic-based reinforcement learning framework is adopted to train the RPM, and the training process is as follows: Construct a value estimation network Critic (Critic) and an action output network Actor (Actor); the Critic evaluates the action outputted from the Actor, while the Actor outputs the route sequence selection; Assign a reward to the traffic situation corresponding to the current flight trajectory by a dedicatedly designed reward function; Train the Critic and Actor to achieve the network convergence.

5. The method for flight path planning considering both the flight trajectory and the visual images from ATC system for air traffic controllers according to claim 4, wherein the reward function in the RPM is as follows:

$\mathrm{reward} = \begin{cases} \alpha \sum_{t=0}^{T} r_{t} & \text{Aircraft arrive at the target waypoint} \\ 0 & \text{Aircraft conflict midway} \end{cases}$

Where, $r_{t}$ is the reward of Step t in the ASM, t is the number of steps for the flight action, T is the maximum number of flight action steps, and α is a constant.

6. The method for flight path planning considering both the flight trajectory and the visual images from ATC system for air traffic controllers according to claim 1, wherein the training process of Critic and Actor is as follows: Let the Critic parameter be $\phi_{targ}$ and the Actor parameter be $\theta_{targ}$; the optimization target of Critic is:

$\phi' = \underset{\phi}{\arg\min} \sum_{t=0}^{T} \left(V_{\phi}(s_{t}) - \hat{R}_{t}\right)^{2}$

The optimization target of Actor is:

$\theta' = \underset{\theta}{\arg\max} \sum_{t=0}^{T} \left(\left(r_{t} + V_{\phi}(s_{t+1}) - V_{\phi}(s_{t})\right) \cdot \pi_{\theta}(a_{t} \mid s_{t})\right)$

Where, $\phi'$ is the updated parameters for Critic network, $V_{\phi}$ is the Critic network, $s_{t}$ is the expression of the processed and fused VI, flight trajectory and selected route sequence in the traffic situation at time t, $\hat{R}_{t}$ is the sum of rewards obtained from 0 to t, $\theta'$ is the updated parameters for Actor network, $r_{t}$ is the reward obtained at t, $s_{t+1}$ is the expression of the VI, flight trajectory and selected route sequence in the traffic situation at t+1, $\pi_{\theta}$ is the Actor network, and $a_{t}$ is the route sequence selection action in RPM or ATC decision-making action in ASM at t; The loss functions of the two networks are calculated in turn, and the network parameters are updated by gradient descent method.

7. A device for adopting the method for flight path planning considering both the flight trajectory and the visual images from ATC system for air traffic controllers according to claim 1, comprising: A visual input module for acquiring the VI and the flight trajectory; An information processing module for pre-processing and considering both the VI and the flight trajectory; A decision-making module for making flight path control decisions based on the processed information.

Description

BRIEF DESCRIPTION OF DRAWINGS

(1) FIG. 1 is a flow diagram of the method in the present invention.

(2) FIG. 2 is a structural diagram of the RPM in the present invention.

(3) FIG. 3 is a structural diagram of the ASM in the present invention.

(4) FIG. 4 is a schematic diagram of reinforcement learning-based methods to support the decision-making for flight path planning in the present invention.

(5) FIG. 5 is a structural diagram of the decision-making device in the present invention.

DETAILED DESCRIPTION OF THE EMBODIMENTS

(6) The present invention is further described with reference to the drawings and embodiments.

(7) As shown in FIG. 1, a method for flight path planning considering both the flight trajectory and the visual images from ATC system for air traffic controllers includes the following steps:

(8) Step 1: Acquire the VI and the flight trajectory, preprocess the VI, and extract features of the VI and the relative position of the aircraft;

(9) The pre-processing process is as follows:

(10) Step 1-1: Run the air traffic simulator BlueSky and store the image rendered by the radarwidget module in BlueSky into the buffer; extract the image from the buffer and resize it to the input format of the reinforcement learning-based methods to support the decision-making for flight path planning;

(11) Step 1-2: Downsample the image to give a pixelated effect and read out the downsampled image in the form of 3D array;

(12) Step 1-3: Search for the RGB value of the flight in the array obtained in Step 1-2, obtain the relative position of the aircraft in the image, and distinguish the aircraft receiving the operation from other aircraft in the control area according to the different RGB values;
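Steps 1-1 to 1-3 can be illustrated with a minimal NumPy sketch. It is not the BlueSky integration itself: the frame is a synthetic array standing in for the buffered radar image, and the function names (`resize_nearest`, `downsample`, `locate_by_rgb`), the 200 x 200 input size, the downsampling factor of 4 and the green marker colour are all assumptions made for illustration. Downsampling is done by strided sampling so that the RGB values of the aircraft symbol are preserved exactly.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Step 1-1 (assumed form): nearest-neighbour resize of an H x W x 3 array."""
    in_h, in_w = image.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols[None, :]]

def downsample(image, factor):
    """Step 1-2 (assumed form): keep every `factor`-th pixel, giving a pixelated 3D array."""
    return image[::factor, ::factor]

def locate_by_rgb(image, rgb):
    """Step 1-3: relative (x, y) in [0, 1] of the pixels whose colour equals `rgb`, or None."""
    mask = np.all(image == np.asarray(rgb, dtype=image.dtype), axis=-1)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    h, w = mask.shape
    return float(xs.mean() / max(w - 1, 1)), float(ys.mean() / max(h - 1, 1))

# Synthetic frame: black background with one green aircraft symbol (hypothetical colour).
CONTROLLED_RGB = (0, 255, 0)
frame = np.zeros((400, 600, 3), dtype=np.uint8)
frame[100:120, 300:320] = CONTROLLED_RGB

small = downsample(resize_nearest(frame, 200, 200), 4)
position = locate_by_rgb(small, CONTROLLED_RGB)
print(small.shape, position)
```

Other aircraft in the control area would be located the same way by calling `locate_by_rgb` with their own colour.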

(13) Step 2: Construct reinforcement learning-based methods to support the decision-making for flight path planning, and conduct the training separately on the RPM and ASM;

(14) As shown in FIG. 4, the reinforcement learning-based methods to support the decision-making for flight path planning include a route planning module, an action selection module and a fusion module;

(15) The RPM is used to predict the route sequence to the target waypoint according to the features of the VI and the relative position of the aircraft acquired in Step 1;

(16) As shown in FIG. 2, the RPM includes a convolutional neural network and fully connected layers;

(17) The convolutional neural network takes the VI as inputs, learns VI features, and compresses the data dimension;

(18) The fully connected layers take the relative position of the flight as inputs and learn the relative position features; the VI features and relative position features are concatenated and fed into the fully connected layers for further feature extraction.
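The data flow of the RPM described above can be sketched as a forward pass in NumPy. This is only a shape-level illustration under stated assumptions: a single convolution layer with global average pooling stands in for the convolutional neural network, the weights are random rather than trained, and the layer sizes, the four candidate routes and the names `conv2d_valid` and `rpm_forward` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def relu(x):
    return np.maximum(x, 0.0)

def conv2d_valid(image, kernels, stride):
    """Valid cross-correlation. image: (H, W, C); kernels: (k, k, C, F)."""
    k = kernels.shape[0]
    # windows has shape (H-k+1, W-k+1, C, k, k)
    windows = sliding_window_view(image, (k, k), axis=(0, 1))[::stride, ::stride]
    return np.einsum("hwcij,ijcf->hwf", windows, kernels)

def rpm_forward(image, position, params):
    """Return a probability for each candidate route sequence."""
    # CNN branch: learn VI features and compress the data dimension.
    vi = relu(conv2d_valid(image, params["conv"], stride=2)).mean(axis=(0, 1))
    # Fully connected branch: relative position features.
    pos = relu(position @ params["fc_pos"])
    # Concatenate both feature vectors and extract joint features.
    hidden = relu(np.concatenate([vi, pos]) @ params["fc_joint"])
    logits = hidden @ params["fc_out"]
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

rng = np.random.default_rng(0)
params = {
    "conv": rng.normal(scale=0.1, size=(3, 3, 3, 8)),   # 3x3 kernels, 3 -> 8 channels
    "fc_pos": rng.normal(scale=0.1, size=(2, 8)),       # (x, y) -> 8 features
    "fc_joint": rng.normal(scale=0.1, size=(16, 16)),
    "fc_out": rng.normal(scale=0.1, size=(16, 4)),      # 4 hypothetical candidate routes
}
image = rng.random((32, 32, 3))
route_probs = rpm_forward(image, np.array([0.52, 0.28]), params)
print(route_probs)
```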

(19) The Actor-Critic-based reinforcement learning framework is adopted to train the RPM until the method can stably obtain the reward exceeding the benchmark, and then it is cascaded with the RPM to work out the reinforcement learning-based methods to support the decision-making for flight path planning.

(20) The training process is as follows:

(21) Construct a value estimation network Critic (Critic) and an action output network Actor (Actor); the Critic evaluates the action outputted from the Actor, while the Actor outputs the route sequence selection;

(22) Assign a reward to the traffic situation corresponding to the current flight trajectory by a dedicatedly designed reward function;

(23) Train the Critic and Actor to achieve the network convergence. The Critic and Actor share a state input head, which is the network structure of this module.

(24) The reward function is as follows:

(25) $\mathrm{reward} = \begin{cases} \alpha \sum_{t=0}^{T} r_{t} & \text{Aircraft arrive at the target waypoint} \\ 0 & \text{Aircraft conflict midway} \end{cases}$

(26) Where, $r_{t}$ is the reward of Step t in the ASM, t is the number of steps for the flight action, T is the maximum number of flight action steps, and α is a constant.
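As a small illustration of the RPM reward, the two cases can be written as a Python function. The function name `rpm_reward` and the default value of α are assumptions; the per-step rewards are taken as already computed by the ASM.

```python
def rpm_reward(asm_rewards, arrived, alpha=1.0):
    """alpha times the sum of the ASM step rewards on arrival, 0 on a midway conflict."""
    if not arrived:
        return 0.0
    return alpha * float(sum(asm_rewards))

print(rpm_reward([1.0, 2.0, 3.0], arrived=True, alpha=0.5))
print(rpm_reward([1.0, 2.0, 3.0], arrived=False, alpha=0.5))
```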

(27) The FM is used to generate fused features considering both the flight trajectory and the VI according to the route sequence and flight trajectory to the target waypoint;

(28) The FM includes fully connected layers and an attention mechanism;

(29) The fully connected layers are used to extract features from the route sequence and the flight trajectory with the same shape;

(30) The attention mechanism is used to aggregate information by assigning learnable weights to route sequence and flight trajectory to generate fused features H; the mathematical notations are as follows:

(31) $\begin{aligned} Q &= XW_{Q} + b_{Q} \\ K &= XW_{K} + b_{K} \\ V &= XW_{V} + b_{V} \\ H &= \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d}}\right) V \end{aligned}$

(32) Where, Q, K and V are the three weight matrices (query, key and value) in the attention mechanism, X is the matrix composed of processed route sequence and flight trajectory, $W_{Q}$, $W_{K}$ and $W_{V}$ are linear transformation matrices, $b_{Q}$, $b_{K}$ and $b_{V}$ are bias vectors, d is the vector dimension, and T is the transpose of matrix.
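The attention step of the FM can be sketched directly from these formulas. In the sketch, X is assumed to have two rows (the route-sequence features and the flight-trajectory features after the fully connected layers have brought them to the same shape), d is taken to be the width of Q, and the weights are random placeholders; the name `fuse` is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the maximum for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse(X, W_q, b_q, W_k, b_k, W_v, b_v):
    """Return the fused features H and the attention weights."""
    Q = X @ W_q + b_q
    K = X @ W_k + b_k
    V = X @ W_v + b_v
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d))   # one row of weights per input row
    return weights @ V, weights

rng = np.random.default_rng(0)
d_model = 8
X = rng.normal(size=(2, d_model))             # row 0: route sequence, row 1: flight trajectory
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
b_q, b_k, b_v = (np.zeros(d_model) for _ in range(3))

H, attn = fuse(X, W_q, b_q, W_k, b_k, W_v, b_v)
print(H.shape, attn.sum(axis=-1))
```

Each row of `attn` sums to 1, so every row of H is a weighted combination of the value vectors of the route sequence and the flight trajectory.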

(33) As shown in FIG. 3, the ASM considers both the flight trajectory and the VI, and is used to obtain the next operation for flight path planning according to the route sequence and flight trajectory to the target waypoint;

(34) The ASM includes fully connected layers; the route sequence and flight trajectory outputted from the RPM are concatenated together and fed into the fully connected layers; the features considering both the flight trajectory and VI are extracted; the training process is as follows:

(35) Construct a value estimation network Critic (Critic) and an action output network Actor (Actor); the Critic evaluates the action outputted from the Actor, receives the decision-making action outputted by the Actor, and outputs a scalar as the value of the decision-making action in the current situation; the Actor outputs the decision-making for the ATCO action;

(36) Assign a reward to the traffic situation corresponding to the current flight trajectory by a dedicatedly designed reward function;

(37) Train the Critic and Actor to achieve the network convergence.

(38) The reward function is as follows:

(39) $\mathrm{reward} = \begin{cases} \frac{\deg(\theta_{rel\_waypoint}, \theta_{waypoint}) \cdot \alpha}{\max(\mathrm{dist}, 1) \cdot \beta} \cdot \left(1 - \deg(\theta_{aircraft}, \theta_{waypoint})\right) + \mathrm{abs}(h - \mathrm{gp\_altitude}) & \text{Aircraft flying normally} \\ 1000 + \max\left((\mathrm{limit\_steps} - \mathrm{total\_steps}) \cdot \gamma, 0\right) & \text{Aircraft arriving at the target waypoint} \\ -200 & \text{Aircraft flying unusually (flying out of the airspace or below minimum vectoring altitude)} \end{cases}$

(40) Where, $\theta_{rel\_waypoint}$ is the relative angle between the aircraft and the target waypoint, $\theta_{waypoint}$ is the heading of the target waypoint, $\theta_{aircraft}$ is the aircraft heading, h is the flight altitude, gp_altitude is the altitude where the aircraft starts taxiing, dist is the distance between the aircraft and target waypoint, limit_steps is the limit of steps in one episode of reinforcement learning, total_steps is the number of steps currently performed, deg is a function to calculate the relative angle, α, β and γ are weight coefficients, and abs is a function to return the absolute value.
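The three cases of the ASM reward can be written out as a short Python function. Several details are assumptions rather than part of the description: `deg` is taken to return the smallest absolute angle between two headings in degrees, scaled to [0, 1]; the abnormal case is checked before the arrival case; and the default values of α, β, γ and limit_steps are placeholders.

```python
def deg(a, b):
    """Assumed form: smallest absolute angle between two headings (degrees), scaled to [0, 1]."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff) / 180.0

def asm_reward(theta_rel_waypoint, theta_waypoint, theta_aircraft, h, gp_altitude,
               dist, total_steps, arrived=False, abnormal=False,
               alpha=1.0, beta=1.0, gamma=1.0, limit_steps=500):
    if abnormal:   # flying out of the airspace or below minimum vectoring altitude
        return -200.0
    if arrived:    # bonus grows with the number of unused steps in the episode
        return 1000.0 + max((limit_steps - total_steps) * gamma, 0.0)
    approach = deg(theta_rel_waypoint, theta_waypoint) * alpha / (max(dist, 1.0) * beta)
    alignment = 1.0 - deg(theta_aircraft, theta_waypoint)
    return approach * alignment + abs(h - gp_altitude)

normal = asm_reward(90.0, 0.0, 0.0, h=3000.0, gp_altitude=3000.0,
                    dist=0.5, total_steps=10, alpha=2.0)
arrival = asm_reward(0.0, 0.0, 0.0, h=3000.0, gp_altitude=3000.0,
                     dist=0.0, total_steps=100, arrived=True, gamma=0.5)
print(normal, arrival)
```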

(41) The decision-making actions outputted by the Actor are discrete ATCO's decision-making actions, including maintaining current state, left turn, right turn, ascent, descent, acceleration and deceleration.
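The discrete action set can be represented as an enumeration, with the Actor output interpreted as one probability per action. The class name `AtcoAction`, the index order and the helper `select_action` are assumptions for illustration; the description only fixes the seven actions themselves.

```python
from enum import IntEnum
import numpy as np

class AtcoAction(IntEnum):
    MAINTAIN = 0
    LEFT_TURN = 1
    RIGHT_TURN = 2
    ASCENT = 3
    DESCENT = 4
    ACCELERATION = 5
    DECELERATION = 6

def select_action(probs, rng=None):
    """Greedy choice when rng is None, otherwise sample from the Actor's distribution."""
    probs = np.asarray(probs, dtype=float)
    if probs.shape != (len(AtcoAction),):
        raise ValueError("expected one probability per action")
    if rng is None:
        return AtcoAction(int(np.argmax(probs)))
    return AtcoAction(int(rng.choice(len(AtcoAction), p=probs)))

actor_output = [0.05, 0.6, 0.05, 0.1, 0.1, 0.05, 0.05]
print(select_action(actor_output).name)
```

Sampling (passing a `numpy.random.Generator` as `rng`) would typically be used during training for exploration, and the greedy choice at inference time.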

(42) The training process of Critic and Actor is as follows:

(43) Let the Critic parameter be $\phi_{targ}$ and the Actor parameter be $\theta_{targ}$; the optimization target of Critic is:

(44) $\phi' = \underset{\phi}{\arg\min} \sum_{t=0}^{T} \left(V_{\phi}(s_{t}) - \hat{R}_{t}\right)^{2}$

(45) The optimization target of Actor is:

(46) $\theta' = \underset{\theta}{\arg\max} \sum_{t=0}^{T} \left(\left(r_{t} + V_{\phi}(s_{t+1}) - V_{\phi}(s_{t})\right) \cdot \pi_{\theta}(a_{t} \mid s_{t})\right)$

(47) Where, $\phi'$ is the updated parameters for Critic network, $V_{\phi}$ is the Critic network, $s_{t}$ is the expression of the processed and fused VI, flight trajectory and selected route sequence in the traffic situation at time t, $\hat{R}_{t}$ is the sum of rewards obtained from 0 to t, $\theta'$ is the updated parameters for Actor network, $r_{t}$ is the reward obtained at t, $s_{t+1}$ is the expression of the VI, flight trajectory and selected route sequence in the traffic situation at t+1, $\pi_{\theta}$ is the Actor network, and $a_{t}$ is the route sequence selection action in RPM or ATC decision-making action in ASM at t;

(48) The loss functions of the two networks are calculated in turn, and the network parameters are updated by gradient descent method.
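The two optimization targets can be evaluated for one recorded episode as follows. The sketch computes only the scalar objectives from arrays of Critic outputs, rewards and Actor probabilities; it follows the definitions given above literally ($\hat{R}_{t}$ as the sum of rewards from 0 to t, and the temporal-difference term multiplied by the action probability), and leaves the gradient step to an automatic differentiation framework. The function names are hypothetical.

```python
import numpy as np

def reward_sum(rewards):
    """R_hat_t as defined above: the sum of rewards obtained from step 0 to step t."""
    return np.cumsum(np.asarray(rewards, dtype=float))

def critic_loss(values, rewards):
    """Sum over t of (V(s_t) - R_hat_t)^2, to be minimised with respect to the Critic."""
    values = np.asarray(values, dtype=float)
    return float(np.sum((values - reward_sum(rewards)) ** 2))

def actor_objective(values, next_values, rewards, action_probs):
    """Sum over t of (r_t + V(s_{t+1}) - V(s_t)) * pi(a_t | s_t), to be maximised."""
    td = (np.asarray(rewards, dtype=float)
          + np.asarray(next_values, dtype=float)
          - np.asarray(values, dtype=float))
    return float(np.sum(td * np.asarray(action_probs, dtype=float)))

rewards = [1.0, 2.0, 3.0]
values = [1.0, 3.0, 6.0]          # Critic outputs V(s_0), V(s_1), V(s_2)
next_values = [3.0, 6.0, 0.0]     # V(s_1), V(s_2), and 0 after the final step (assumed)
action_probs = [0.5, 0.5, 0.5]    # pi(a_t | s_t) of the actions actually taken

print(critic_loss(values, rewards))
print(actor_objective(values, next_values, rewards, action_probs))
```

To update by gradient descent, the Critic loss is minimised directly and the negative of the Actor objective is used as the Actor loss.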

(49) As shown in FIG. 5, a decision-making device for flight path planning considering both the flight trajectory and the VI comprises:

(50) A visual input module for acquiring the VI and the flight trajectory;

(51) An information processing module for pre-processing and considering both the VI and the flight trajectory;

(52) A decision-making module for making flight path control decisions based on the processed information.
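The relationship between the three modules can be sketched as a simple pipeline in Python. The class name `DecisionDevice`, the callable-based interfaces and the stand-in lambdas are assumptions for illustration only; in the device, the processing and decision-making stages would be backed by the pre-processing steps and the trained reinforcement learning-based methods.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

@dataclass
class DecisionDevice:
    # Visual input module: returns the VI and the flight trajectory.
    acquire: Callable[[], Tuple[object, Sequence[float]]]
    # Information processing module: pre-processes and combines both inputs.
    process: Callable[[object, Sequence[float]], Sequence[float]]
    # Decision-making module: maps the processed information to a control decision.
    decide: Callable[[Sequence[float]], str]

    def step(self) -> str:
        vi, trajectory = self.acquire()
        features = self.process(vi, trajectory)
        return self.decide(features)

# Stand-in modules (hypothetical behaviour, used only to show the data flow).
device = DecisionDevice(
    acquire=lambda: ("frame", [0.5, 0.3]),
    process=lambda vi, trajectory: [len(vi) / 10.0, *trajectory],
    decide=lambda features: "left turn" if features[1] >= 0.5 else "maintain",
)
print(device.step())
```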

(53) In the present invention, the decision-making method collects the VI and the flight trajectory and extracts features from this information using the reinforcement learning-based methods constructed from deep neural networks. A stable flight path planning operation is derived to control the flights, so as to achieve end-to-end flight path planning. At the input side of the system, the VI presented to ATCOs can be obtained, making the system more robust, capable of handling more complex air traffic situations, and able to make stable decisions that are more similar to those of human ATCOs. The method can support the flight path planning for air traffic operation in a safe and efficient manner and is able to reduce the workload of air traffic controllers.

Cpc classification

G06N3/0464 (PHYSICS)

G08G5/003 (PHYSICS)

G06N3/092 (PHYSICS)

International classification

G08G5/00 (PHYSICS)

G06N3/092 (PHYSICS)

G06N3/0464 (PHYSICS)