Method and device for flight path planning considering both the flight trajectory and the visual images from air traffic control systems for air traffic controllers
11710412 · 2023-07-25
Abstract
The invention discloses a method to support flight path planning that considers both the flight trajectory and the visual images (VI) that the air traffic control (ATC) system presents to air traffic controllers (ATCOs). The method comprises the following steps: Step 1: acquire the VI and the flight trajectory as the method inputs, and extract features of the VI and the relative position of the aircraft; Step 2: construct reinforcement learning-based methods to support decision-making for flight path planning, and train the models in the proposed method; Step 3: using the optimized reinforcement learning-based methods, predict the operation sequence required to guide the flight to the target waypoint. The method of the invention can support flight path planning for air traffic operations safely and efficiently and can reduce the workload of air traffic controllers.
Claims
1. A method for flight path planning considering both the flight trajectory and the visual images (VI) from the air traffic control (ATC) system for air traffic controllers (ATCOs), comprising the following steps: Step 1: Acquire the VI and the flight trajectory, preprocess the VI, and extract features of the VI and the relative position of the aircraft; Step 2: Construct reinforcement learning-based methods to support decision-making for flight path planning, and train the models in the proposed method; the reinforcement learning-based methods to support decision-making for flight path planning comprise a route planning module (RPM), an action selection module (ASM) and a fusion module (FM); the RPM is used to predict the route sequence to the target waypoint according to the features of the VI and the relative position of the aircraft acquired in Step 1; the FM is used to generate fused features considering both the flight trajectory and the VI, according to the route sequence and the flight trajectory to the target waypoint; the FM comprises fully connected layers and an attention mechanism; the fully connected layers are used to extract features of the same shape from the route sequence and the flight trajectory; the attention mechanism is used to aggregate information by assigning learnable weights to the route sequence and the flight trajectory to generate fused features H; the mathematical notations are as follows:
2. The method for flight path planning considering both the flight trajectory and the visual images from ATC system for air traffic controllers according to claim 1, wherein the pre-processing in Step 1 comprises the following steps: Step 1-1: Acquire the VI, and resize the image to the predefined size; Step 1-2: Downsample the image, and output the downsampled image as a 3D array; Step 1-3: Search the 3D array obtained in Step 1-2 for the RGB value of the flight, and obtain the relative position of the aircraft according to the RGB value.
3. The method for flight path planning considering both the flight trajectory and the visual images from ATC system for air traffic controllers according to claim 1, wherein the RPM in Step 2 comprises a convolutional neural network and fully connected layers; the convolutional neural network takes the VI as inputs, learns VI features, and compresses the data dimension; the fully connected layers take the relative position of the flight as inputs and learn the relative position features; the VI features and relative position features are concatenated and fed into further fully connected layers for feature extraction.
4. The method for flight path planning considering both the flight trajectory and the visual images from ATC system for air traffic controllers according to claim 3, wherein an Actor-Critic-based reinforcement learning framework is adopted to train the RPM, and the training process is as follows: Construct a value estimation network (Critic) and an action output network (Actor); the Critic evaluates the action output by the Actor, while the Actor outputs the route sequence selection; assign a reward to the traffic situation corresponding to the current flight trajectory using a specially designed reward function; train the Critic and Actor until the networks converge.
5. The method for flight path planning considering both the flight trajectory and the visual images from ATC system for air traffic controllers according to claim 4, wherein the reward function in the RPM is as follows:
6. The method for flight path planning considering both the flight trajectory and the visual images from ATC system for air traffic controllers according to claim 1, wherein the training process of the Critic and Actor is as follows: let the Critic parameters be $\phi_{targ}$ and the Actor parameters be $\theta_{targ}$; the optimization target of the Critic is:
7. A device adopting the method for flight path planning considering both the flight trajectory and the visual images from ATC system for air traffic controllers according to claim 1, comprising: a visual input module for acquiring the VI and the flight trajectory; an information processing module for pre-processing and fusing the VI and the flight trajectory; and a decision-making module for making flight path control decisions based on the processed information.
Description
BRIEF DESCRIPTION OF DRAWINGS
DETAILED DESCRIPTION OF THE EMBODIMENTS
(6) The present invention is further described with reference to the drawings and embodiments.
(7) As shown in
(8) Step 1: Acquire the VI and the flight trajectory, preprocess the VI, and extract features of the VI and the relative position of the aircraft;
(9) The pre-processing steps are as follows:
(10) Step 1-1: Run the BlueSky air traffic simulator and store the image rendered by its radarwidget module in a buffer; extract the image from the buffer and resize it to the input format of the reinforcement learning-based methods that support decision-making for flight path planning;
(11) Step 1-2: Downsample the image to give a pixelated effect, and read out the downsampled image as a 3D array;
(12) Step 1-3: Search the array obtained in Step 1-2 for the RGB value of the flight, obtain the relative position of the aircraft in the image, and use the different RGB values to distinguish the aircraft receiving the operation from other aircraft in the control area;
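The pre-processing steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the frame is assumed to be an H x W grid of (R, G, B) tuples, nearest-neighbour resizing and keep-one-pixel-per-block downsampling are assumed choices, and the colours `CONTROLLED` and `OTHER`, the helper names and all sizes are hypothetical. A real radar symbol spans several pixels and survives block sampling; the synthetic aircraft here are placed on kept pixels.

```python
# Sketch of pre-processing Steps 1-1 to 1-3 in plain Python (no imaging library).
# Assumption: a frame is an H x W grid of (R, G, B) tuples, e.g. copied from the
# BlueSky radar widget buffer. Sizes, factor and colours are hypothetical.

def resize_nearest(img, out_h, out_w):
    """Step 1-1: nearest-neighbour resize to the model's input size."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def downsample(img, factor):
    """Step 1-2: keep one pixel per factor x factor block (pixelated effect).
    Returns a 3D array indexed as [row][col][channel]."""
    return [[list(img[r][c]) for c in range(0, len(img[0]), factor)]
            for r in range(0, len(img), factor)]


def find_aircraft(arr, rgb):
    """Step 1-3: normalised (row, col) centroid of the pixels whose colour is
    `rgb`, or None if no pixel has that colour."""
    hits = [(r, c) for r, row in enumerate(arr)
            for c, px in enumerate(row) if tuple(px) == rgb]
    if not hits:
        return None
    mean_r = sum(r for r, _ in hits) / len(hits)
    mean_c = sum(c for _, c in hits) / len(hits)
    return (mean_r / len(arr), mean_c / len(arr[0]))


# Usage: a synthetic 8 x 8 frame with two differently coloured aircraft.
CONTROLLED = (0, 255, 0)   # hypothetical colour of the aircraft under control
OTHER = (255, 255, 255)    # hypothetical colour of other traffic
frame = [[(0, 0, 0)] * 8 for _ in range(8)]
frame[4][6] = CONTROLLED
frame[2][0] = OTHER
small = downsample(resize_nearest(frame, 8, 8), 2)
pos = find_aircraft(small, CONTROLLED)   # (0.5, 0.75)
```

Distinguishing aircraft purely by colour keeps the search a simple equality test on each pixel, which matches the RGB-based separation described in Step 1-3.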
(13) Step 2: Construct reinforcement learning-based methods to support the decision-making for flight path planning, and conduct the training separately on the RPM and ASM;
(14) As shown in
(15) The RPM is used to predict the route sequence to the target waypoint according to the features of the VI and the relative position of the aircraft acquired in Step 1;
(16) As shown in
(17) The convolutional neural network takes the VI as inputs, learns VI features, and compresses the data dimension;
(18) The fully connected layers take the relative position of the flight as inputs and learn the relative position features; the VI features and relative position features are concatenated and fed into further fully connected layers for feature extraction.
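A minimal sketch of the RPM feature extractor's data flow (CNN branch for the VI, fully connected branch for the relative position, concatenation, joint fully connected layer) is shown below. The patent does not give layer sizes or activations here; the single-channel 10 x 10 input, one 3 x 3 kernel, ReLU, 2 x 2 max pooling and the helper names are assumptions, and the weights are random rather than trained.

```python
import random

# Sketch of the RPM feature extractor: CNN branch for the VI, FC branch for the
# relative position, concatenation, then a joint FC layer. Layer sizes, the
# single 3x3 kernel, ReLU and 2x2 max pooling are assumptions.

def conv2d_valid(img, kernel):
    """Single-channel 2D convolution (cross-correlation) without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)] for r in range(out_h)]


def max_pool(fmap, k):
    """Non-overlapping k x k max pooling, which compresses the data dimension."""
    return [[max(fmap[r + i][c + j] for i in range(k) for j in range(k))
             for c in range(0, len(fmap[0]) - k + 1, k)]
            for r in range(0, len(fmap) - k + 1, k)]


def dense(x, weights, bias):
    """Fully connected layer with ReLU: y = max(0, W x + b)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def random_layer(n_out, n_in, rng):
    """Hypothetical random initialisation (untrained weights)."""
    return ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)


def rpm_features(gray_img, rel_pos, params):
    fmap = max_pool(conv2d_valid(gray_img, params["kernel"]), 2)
    vi_feat = [v for row in fmap for v in row]             # flattened VI features
    pos_feat = dense(rel_pos, *params["pos_fc"])           # relative-position features
    return dense(vi_feat + pos_feat, *params["joint_fc"])  # joint features


# Usage: 10x10 image -> 8x8 conv map -> 4x4 pooled map = 16 VI features.
rng = random.Random(0)
params = {
    "kernel": [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)],
    "pos_fc": random_layer(4, 2, rng),
    "joint_fc": random_layer(8, 16 + 4, rng),
}
img = [[rng.random() for _ in range(10)] for _ in range(10)]
features = rpm_features(img, [0.5, 0.75], params)
```

Concatenating before the joint layer lets the final fully connected layer learn interactions between what the radar image shows and where the controlled aircraft sits in it.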
(19) The Actor-Critic-based reinforcement learning framework is adopted to train the RPM until it consistently obtains rewards exceeding the benchmark; the trained RPM is then cascaded with the downstream modules to form the reinforcement learning-based method that supports decision-making for flight path planning.
(20) The training process is as follows:
(21) Construct a value estimation network (Critic) and an action output network (Actor); the Critic evaluates the action output by the Actor, while the Actor outputs the route sequence selection;
(22) Assign a reward to the traffic situation corresponding to the current flight trajectory using a specially designed reward function;
(23) Train the Critic and Actor until the networks converge. The Critic and Actor share a state input head, which has the network structure of the RPM described above.
(24) The reward function is as follows:
(25)
(26) Where $r_t$ is the reward of step $t$ in the ASM, $t$ is the number of flight action steps, $T$ is the maximum number of flight action steps, and $\alpha$ is a constant.
(27) The FM is used to generate fused features considering both the flight trajectory and the VI according to the route sequence and flight trajectory to the target waypoint;
(28) The FM includes fully connected layers and an attention mechanism;
(29) The fully connected layers are used to extract features of the same shape from the route sequence and the flight trajectory;
(30) The attention mechanism is used to aggregate information by assigning learnable weights to route sequence and flight trajectory to generate fused features H; the mathematical notations are as follows:
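(31) $$Q = XW_Q + b_Q,\qquad K = XW_K + b_K,\qquad V = XW_V + b_V,\qquad H = \operatorname{softmax}\!\left(\frac{QK^{T}}{\sqrt{d}}\right)V$$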
(32) Where $Q$, $K$ and $V$ are the query, key and value matrices of the attention mechanism, $X$ is the matrix composed of the processed route sequence and flight trajectory, $W_Q$, $W_K$ and $W_V$ are linear transformation matrices, $b_Q$, $b_K$ and $b_V$ are bias vectors, $d$ is the vector dimension, and the superscript $T$ denotes the matrix transpose.
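The fusion step can be sketched as scaled dot-product attention, assuming the standard form $Q = XW_Q + b_Q$, $K = XW_K + b_K$, $V = XW_V + b_V$ and $H = \operatorname{softmax}(QK^{T}/\sqrt{d})V$, which is consistent with the symbols defined above. The assumption that each row of $X$ is one projected input (route sequence or flight trajectory) and the helper name `fuse` are illustrative, not taken from the patent.

```python
import math


def matmul(a, b):
    """Matrix product of two list-of-rows matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_bias(m, b):
    return [[v + bv for v, bv in zip(row, b)] for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    out = []
    for row in m:
        mx = max(row)
        e = [math.exp(v - mx) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def fuse(X, W_Q, W_K, W_V, b_Q, b_K, b_V):
    """Scaled dot-product attention over the rows of X; returns (H, weights)."""
    Q = add_bias(matmul(X, W_Q), b_Q)
    K = add_bias(matmul(X, W_K), b_K)
    V = add_bias(matmul(X, W_V), b_V)
    d = len(K[0])
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(Q, transpose(K))]
    A = softmax_rows(scores)          # learnable-weight-driven attention weights
    return matmul(A, V), A


# Usage: two projected inputs of the same shape, identity projections, zero bias.
X = [[1.0, 0.0],    # hypothetical projected route-sequence feature
     [0.0, 1.0]]    # hypothetical projected flight-trajectory feature
I2 = [[1.0, 0.0], [0.0, 1.0]]
zero = [0.0, 0.0]
H, A = fuse(X, I2, I2, I2, zero, zero, zero)
```

Because every row of the attention weights sums to one, each fused row of $H$ is a convex combination of the projected route-sequence and trajectory features.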
(33) As shown in
(34) The ASM includes fully connected layers; the route sequence output by the RPM and the flight trajectory are concatenated and fed into the fully connected layers, which extract features considering both the flight trajectory and the VI. The training process is as follows:
(35) Construct a value estimation network (Critic) and an action output network (Actor); the Critic evaluates the action output by the Actor: it receives the decision-making action output by the Actor and outputs a scalar as the value of that action in the current situation; the Actor outputs the ATCO decision-making action;
(36) Assign a reward to the traffic situation corresponding to the current flight trajectory using a specially designed reward function;
(37) Train the Critic and Actor to achieve the network convergence.
(38) The reward function is as follows:
(39)
(40) Where $\theta_{rel\_waypoint}$ is the relative angle between the aircraft and the target waypoint, $\theta_{waypoint}$ is the heading of the target waypoint, $\theta_{aircraft}$ is the aircraft heading, $h$ is the flight altitude, gp_altitude is the altitude at which the aircraft starts taxiing, dist is the distance between the aircraft and the target waypoint, limit_steps is the step limit in one episode of reinforcement learning, total_steps is the number of steps performed so far, deg is a function that calculates the relative angle, $\alpha$, $\beta$ and $\gamma$ are weight coefficients, and abs is a function that returns the absolute value.
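The reward definitions above rely on a relative-angle function deg whose exact definition is not given, so the helper below is a hypothetical sketch. It assumes angles in degrees, wraps differences into [-180, 180) so that headings on either side of north compare correctly, and computes the bearing to a waypoint assuming x points east and y points north. It does not reproduce the reward formula itself.

```python
import math


def wrap_deg(angle):
    """Wrap an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def deg(a, b):
    """Hypothetical relative-angle helper: signed difference a - b, wrapped."""
    return wrap_deg(a - b)


def bearing_deg(aircraft_xy, waypoint_xy):
    """Bearing from aircraft to waypoint in degrees clockwise from north,
    assuming x points east and y points north."""
    dx = waypoint_xy[0] - aircraft_xy[0]
    dy = waypoint_xy[1] - aircraft_xy[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


# Usage: heading-error terms of the kind the reward weighs.
heading_error = abs(deg(350.0, 10.0))   # 20.0: the difference crosses north correctly
rel_waypoint = deg(bearing_deg((0.0, 0.0), (1.0, 1.0)), 90.0)  # bearing 45 vs heading 90
```

Without wrapping, a naive difference of 350 and 10 degrees would report a 340-degree error for aircraft that are nearly aligned, which would distort any reward term built on abs(deg(...)).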
(41) The decision-making actions output by the Actor are discrete ATCO decision-making actions: maintaining the current state, left turn, right turn, climb, descent, acceleration and deceleration.
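Selecting one of the seven discrete ATCO actions from the Actor output might look like the sketch below. It assumes the Actor emits one logit per action, applies a softmax, and either picks the most probable action (for deployment) or samples from the distribution (for exploration during training); the action names and their order in `ACTIONS` are hypothetical labels for the actions listed above.

```python
import math
import random

# Hypothetical labels and order for the seven discrete ATCO actions.
ACTIONS = ("maintain", "turn_left", "turn_right", "climb", "descend",
           "accelerate", "decelerate")


def softmax(logits):
    m = max(logits)
    e = [math.exp(v - m) for v in logits]
    total = sum(e)
    return [v / total for v in e]


def select_action(logits, rng=None):
    """Greedy choice by default; sample from the policy if an RNG is given."""
    probs = softmax(logits)
    if rng is None:
        idx = max(range(len(probs)), key=probs.__getitem__)
    else:
        idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return ACTIONS[idx], probs


# Usage: a hypothetical Actor output that favours the third action.
logits = [0.1, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0]
action, probs = select_action(logits)              # "turn_right"
sampled, _ = select_action(logits, random.Random(1))
```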
(42) The training process of Critic and Actor is as follows:
(43) Let the Critic parameters be $\phi_{targ}$ and the Actor parameters be $\theta_{targ}$; the optimization target of the Critic is:
(44)
(45) The optimization target of Actor is:
(46)
(47) Where $\phi'$ are the updated parameters of the Critic network, $V_\phi$ is the Critic network, $s_t$ is the representation of the processed and fused VI, flight trajectory and selected route sequence in the traffic situation at time $t$, $\hat{R}_t$ is the sum of rewards obtained from 0 to $t$, $\theta'$ are the updated parameters of the Actor network, $r_t$ is the reward obtained at $t$, $s_{t+1}$ is the representation of the VI, flight trajectory and selected route sequence in the traffic situation at $t+1$, $\pi_\theta$ is the Actor network, and $a_t$ is the route sequence selection action in the RPM or the ATC decision-making action in the ASM at $t$;
(48) The loss functions of the two networks are calculated in turn, and the network parameters are updated by gradient descent.
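The optimization targets (44) and (46) are not reproduced in this copy, so the following is only a sketch of a common actor-critic update that uses the symbols defined above, not the patent's exact objectives. It assumes the Critic minimizes $(V_\phi(s_t) - \hat{R}_t)^2$, the Actor follows the policy gradient of $\log \pi_\theta(a_t \mid s_t)$ weighted by the one-step advantage $r_t + \gamma V_\phi(s_{t+1}) - V_\phi(s_t)$, and both networks are replaced by linear models for brevity. The discount factor $\gamma$, the learning rates and the helper names are hypothetical.

```python
import math
from itertools import accumulate


def dot(w, s):
    return sum(wi * si for wi, si in zip(w, s))


def reward_sums(rewards):
    """R_hat_t as described in the text: sum of rewards from step 0 to t."""
    return list(accumulate(rewards))


def critic_step(phi, s_t, r_hat_t, lr):
    """One gradient-descent step on (V_phi(s_t) - R_hat_t)^2 for a linear
    critic V_phi(s) = phi . s (assumed form). Returns (phi', loss)."""
    err = dot(phi, s_t) - r_hat_t
    return [p - lr * 2.0 * err * si for p, si in zip(phi, s_t)], err ** 2


def actor_step(theta, s_t, a_t, r_t, s_next, phi, gamma, lr):
    """One policy-gradient step for a linear-softmax policy pi_theta(a|s),
    where theta[k] weights action k (assumed form), using the one-step
    advantage r_t + gamma * V(s_{t+1}) - V(s_t). Returns (theta', advantage)."""
    adv = r_t + gamma * dot(phi, s_next) - dot(phi, s_t)
    logits = [dot(w, s_t) for w in theta]
    m = max(logits)
    e = [math.exp(z - m) for z in logits]
    total = sum(e)
    probs = [v / total for v in e]
    # d log pi(a_t|s_t) / d theta[k] = (1[k == a_t] - p_k) * s_t; stepping along
    # adv * gradient is gradient descent on the loss -adv * log pi(a_t|s_t).
    new_theta = [[w + lr * adv * ((1.0 if k == a_t else 0.0) - probs[k]) * si
                  for w, si in zip(theta[k], s_t)]
                 for k in range(len(theta))]
    return new_theta, adv


# Usage: one Critic update followed by one Actor update, in turn.
s_t, s_next = [1.0, 0.5], [0.0, 0.0]
phi, loss_before = critic_step([0.0, 0.0], s_t, r_hat_t=2.0, lr=0.1)
_, loss_after = critic_step(phi, s_t, 2.0, 0.1)   # loss falls from 4.0 to 2.25
theta, adv = actor_step([[0.0, 0.0], [0.0, 0.0]], s_t, a_t=1, r_t=1.0,
                        s_next=s_next, phi=phi, gamma=0.9, lr=0.1)
```

With a positive advantage, the update raises the logit of the action actually taken, so the Actor becomes more likely to repeat decisions the Critic judged better than expected.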
(49) As shown in
(50) A visual input module for acquiring the VI and the flight trajectory;
(51) An information processing module for pre-processing and fusing the VI and the flight trajectory;
(52) A decision-making module for making flight path control decisions based on the processed information.
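The three device modules can be composed as a simple pipeline. In the sketch below each module is a hypothetical callable supplied by the caller (for example, a BlueSky frame grabber, the pre-processing and fusion steps, and the trained Actor); the class name, field names and the stub logic in the usage example are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Frame = List[List[Tuple[int, int, int]]]   # RGB image rows
Track = List[Tuple[float, float, float]]   # hypothetical (x, y, altitude) samples


@dataclass
class FlightPathDevice:
    """Hypothetical composition of the three modules of the device."""
    acquire: Callable[[], Tuple[Frame, Track]]           # visual input module
    process: Callable[[Frame, Track], Sequence[float]]   # information processing module
    decide: Callable[[Sequence[float]], str]             # decision-making module

    def step(self) -> str:
        """Acquire inputs, process them, and return one control decision."""
        frame, track = self.acquire()
        state = self.process(frame, track)
        return self.decide(state)


# Usage with stub modules standing in for the real components.
device = FlightPathDevice(
    acquire=lambda: ([[(0, 0, 0)]], [(0.0, 0.0, 3000.0)]),
    process=lambda frame, track: [float(len(frame)), track[-1][2]],
    decide=lambda state: "descend" if state[1] > 2000.0 else "maintain",
)
decision = device.step()   # "descend"
```

Keeping the modules as injected callables lets each one be replaced independently, for instance swapping the stub decision rule for a trained Actor without touching acquisition or pre-processing.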
(53) In the present invention, the decision-making method collects the VI and the flight trajectory and extracts features from them using reinforcement learning-based methods built from deep neural networks. Stable flight path planning operations are derived from these features to control the flights, achieving end-to-end flight path planning. Because the system takes as input the same VI that ATCOs use, it is more robust, can handle more complex air traffic situations, and makes stable decisions that more closely resemble those of human ATCOs. The method can support flight path planning for air traffic operations safely and efficiently and can reduce the workload of air traffic controllers.