SPACE-AIR-GROUND INTEGRATED UAV-ASSISTED IOT DATA COLLECTION METHOD BASED ON AOI
20230239037 · 2023-07-27
CPC classification
H04B7/18508 (ELECTRICITY)
Abstract
A space-air-ground integrated UAV-assisted IoT data collection method based on AoI comprises: constructing a UAV-assisted space-air-ground integrated IoT system, constructing a UAV channel model and an AoI model, establishing an AoI-based UAV-assisted space-air-ground integrated IoT data collection model, transforming a problem into a Markov problem, introducing a neural network to solve a high-dimensional state problem, introducing a deep reinforcement learning algorithm to train UAVs to find optimal collection points, and introducing a matching theory to match the UAVs and IoT devices. To meet the requirement for the timeliness of information collection, the invention finds the optimal configuration of UAV flight parameters and derives the trade-off relations among performance indicators such as AoI, system capacity and energy utilization rate, thus effectively improving the timeliness of information collection, reducing the management and control complexity of the system, and improving the application level of AI in the IoT field.
Claims
1. A space-air-ground integrated UAV-assisted IoT data collection method based on AoI, comprising the following steps: Step 1: constructing a UAV-assisted space-air-ground integrated IoT system; Step 2: constructing a UAV channel model and an AoI model; Step 3: establishing an AoI-based UAV-assisted space-air-ground integrated IoT data collection model; Step 4: transforming a problem into a Markov problem; Step 5: introducing a neural network to solve a high-dimensional state problem; Step 6: introducing a deep reinforcement learning algorithm to train UAVs to find optimal collection points; and Step 7: introducing a matching theory to match the UAVs and IoT devices.
2. The space-air-ground integrated UAV-assisted IoT data collection method based on AoI according to claim 1, wherein in Step 1, the UAV-assisted space-air-ground integrated IoT system is constructed, the UAV-assisted space-air-ground integrated IoT system comprises a low earth orbit satellite, the low earth orbit satellite is connected to multiple UAVs, the multiple UAVs are connected to multiple IoT devices, data generated by the IoT devices is randomly distributed in time, the size of the data follows a Poisson distribution, each UAV flies from an initial location to a preset location to collect data and transmits the collected data to the satellite, and the UAVs are configured in a hovering mode during data collection.
3. The space-air-ground integrated UAV-assisted IoT data collection method based on AoI according to claim 1, wherein in Step 2, data transmission between the UAVs and the IoT devices is based on line-of-sight, and a path loss between the UAV n and the IoT device m is:
L.sub.m,n=20 log.sub.10(4πfd.sub.m,n/c)+η.sub.LoS
wherein, d.sub.m,n indicates a distance from the UAV n to the IoT device m, f represents a center frequency, c represents a speed of light, and η.sub.LoS represents an additive loss caused by shadowing and scattering; a transmission rate from the IoT device m to the UAV n is:
R.sub.m,n=B log.sub.2(1+Γ.sub.m,n)
wherein, B represents a bandwidth and Γ.sub.m,n represents a signal-to-noise ratio from the IoT device m to the UAV n; AoI is introduced to describe the freshness of sensing data received by the UAVs; assume when a first matching IoT device generates data, the UAVs start to fly towards a final location; other matching IoT devices generate data randomly in a UAV flight time; when arriving at a target location, the UAVs start to send data; so, the AoI is composed of the UAV flight time and a transmission time from the IoT devices to the UAVs; the AoI of data received from the IoT device m in a time t is expressed as A.sub.m(t):
A.sub.m(t)=t−u.sub.m(t) u.sub.m(t) represents a time when the IoT device m generates data.
4. The space-air-ground integrated UAV-assisted IoT data collection method based on AoI according to claim 1, wherein in Step 3, a system AoI minimization problem to be solved is summarized as an optimization problem:
5. The space-air-ground integrated UAV-assisted IoT data collection method based on AoI according to claim 1, wherein in Step 4, a framework combining deep reinforcement learning and a matching algorithm is constructed to find data collection positions and matching information of the UAVs; during a matching process, agents are expressed as V={V.sub.k|∀k∈M}, a virtual agent is introduced into each UAV to realize point-to-multipoint data collection, and each agent matches one IoT device; the UAVs fly at a same height, the deep reinforcement learning is used for training positions of the UAVs, and in the deep reinforcement learning, the agents interact with an environment to obtain an optimal strategy; a Markov decision process is composed of a quadruple <S,A,P,R>, where S, A, P and R respectively denote a state space, an action space, a state transition probability and a reward; state: s.sub.t=(x.sub.t.sup.U,y.sub.t.sup.U), s.sub.t∈S, denotes the position of the UAVs at the time t; action: a.sub.t=(d.sub.t,θ.sub.t), a.sub.t∈A, where d.sub.t and θ.sub.t respectively represent a flight distance and a flight direction of the UAVs at the time t, and are discretized to limit selections; reward: r.sub.t is defined as negative AoI at the time t, and r.sub.t=−A.sub.m(t); to minimize overall AoI of the network, minimum AoI between each agent and the corresponding IoT device is explored with an optimal UAV position, so the optimization problem is transformed into a problem of maximizing a cumulative reward:
Q.sup.π(s.sub.t,a.sub.t)=E.sub.π[Σ.sub.k=0.sup.∞γ.sup.kr.sub.t+k|s.sub.t,a.sub.t]
6. The space-air-ground integrated UAV-assisted IoT data collection method based on AoI according to claim 1, wherein in Step 5, deep reinforcement learning is used to solve the problem of a high-dimensional state space of the system model; experience replay and a target network are introduced in a deep Q network, and in the experience replay mechanism, a sequence <s.sub.t,a.sub.t,r.sub.t,s.sub.t+1> of the interaction between the agents and the environment is stored in an experience replay buffer D; during the learning process, a mini-batch sequence is uniformly sampled from D, and the deep Q network is trained by means of stochastic gradient descent to approximate a Q function in high-dimensional state spaces; the neural network is parameterized by θ to approximate the Q function as:
Q*(s.sub.t,a.sub.t)≈{circumflex over (Q)}(s.sub.t,a.sub.t;θ) a loss function of the neural network is defined for the stochastic gradient descent, which is expressed as:
L(θ)=E.sub.D[(r.sub.t+γ max.sub.a.sub.t+1{circumflex over (Q)}(s.sub.t+1,a.sub.t+1;θ.sup.−)−{circumflex over (Q)}(s.sub.t,a.sub.t;θ)).sup.2]
7. The space-air-ground integrated UAV-assisted IoT data collection method based on AoI according to claim 1, wherein in Step 6, to find the optimal collection points of the UAVs, an SAC algorithm is used for training; the SAC algorithm adopts a stochastic strategy, which is implemented by means of maximum entropy, so that no useful behaviors or paths will be neglected; the agents develop more feasible solutions to explore the state space more fully, so as to complete a task with better optimization performance and learning efficiency; an optimal strategy is achieved by using an entropy as:
π*=arg max.sub.πΣ.sub.tE.sub.(s.sub.t.sub.,a.sub.t.sub.)˜ρ.sub.π[r(s.sub.t,a.sub.t)+αH(π(·|s.sub.t))] wherein, H(π(·|s.sub.t))=E.sub.a.sub.t˜π[−log π(a.sub.t|s.sub.t)] represents an entropy of the policy and α represents a temperature parameter; a state value function and an action-state value function are expressed as:
V(s.sub.t)=E.sub.a.sub.t˜π[Q(s.sub.t,a.sub.t)−α log π(a.sub.t|s.sub.t)]
Q(s.sub.t,a.sub.t)=r(s.sub.t,a.sub.t)+γE.sub.s.sub.t+1[V(s.sub.t+1)]
a target value for training a critic network is defined as:
{circumflex over (Q)}(s.sub.t,a.sub.t)=r(s.sub.t,a.sub.t)+γE.sub.s.sub.t+1[V.sub.θ.sup.−(s.sub.t+1)]
a loss function for an actor network is:
J.sub.π(ϕ)=E.sub.s.sub.t˜D[E.sub.a.sub.t˜π.sub.ϕ[α log π.sub.ϕ(a.sub.t|s.sub.t)−Q.sub.θ(s.sub.t,a.sub.t)]]
and a temperature loss for automatically adjusting α is:
J(α)=E.sub.a.sub.t˜π[−α log π(a.sub.t|s.sub.t)−αH.sub.0] wherein, H.sub.0 represents a target entropy.
8. The space-air-ground integrated UAV-assisted IoT data collection method based on AoI according to claim 1, wherein in Step 7, according to received AoI values, the satellite constructs preference lists PL.sub.k.sup.V and PL.sub.m.sup.I for each agent and the corresponding IoT device in an increasing order of the AoI, and then pairs the UAVs and the IoT devices through a GS algorithm; to ensure that the agents of the same UAV share one location, the agent with minimum AoI is selected as a primary agent, and auxiliary agents select the IoT device nearest to a training position of the primary agent; the GS algorithm has a propose rule and a reject rule, which are respectively as follows: definition 1: propose rule: the agent V.sub.k∈V files a connection application with a favorite IoT device in a preference list PL.sub.k.sup.V; definition 2: reject rule: in presence of a better matching candidate, the IoT device I.sub.m∈I receiving the connection application will reject the agent; otherwise, the agent will be reserved as a matching candidate; according to the rules, the GS algorithm comprises the following matching steps: (1) dividing V into a primary agent set V.sup.P and an auxiliary agent set V.sup.A; and (2) filing, by each primary agent, a connection application with a favorite IoT device in its preference list; then selecting, by each IoT device, its most preferred agent among the applicants, and rejecting other agents; each auxiliary agent V.sub.k.sup.A adjusts its preference list according to a distance from a most favorable position to the corresponding primary agent obtained by learning, and then performs the process in Step (2) until stable matching is realized.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE INVENTION
[0074] To expound in detail the technical solutions adopted by the invention to fulfill the desired technical purposes, the technical solutions of the embodiments of the invention will be clearly and completely described below in conjunction with the drawings of the embodiments. Obviously, the described embodiments are merely some, rather than all, of the possible embodiments of the invention, and the technical means or technical features in the embodiments may be substituted or combined without creative effort. The invention will be described in detail below with reference to the accompanying drawings and embodiments.
[0075] The invention provides a space-air-ground integrated UAV-assisted IoT data collection method based on AoI, comprising the following steps:
[0076] Step 1: a UAV-assisted space-air-ground integrated IoT system is constructed.
[0077] As shown in the drawings, the UAV-assisted space-air-ground integrated IoT system comprises a low earth orbit satellite connected to multiple UAVs, and the multiple UAVs are connected to multiple IoT devices; data generated by the IoT devices is randomly distributed in time, and the size of the data follows a Poisson distribution; each UAV flies from an initial location to a preset location to collect data and transmits the collected data to the satellite, and the UAVs are configured in a hovering mode during data collection.
[0078] Step 2: a UAV channel model and an AoI model are constructed.
[0079] Data transmission between the UAVs and the IoT devices is based on line-of-sight, and a path loss between the UAV n and the IoT device m is:
L.sub.m,n=20 log.sub.10(4πfd.sub.m,n/c)+η.sub.LoS
[0080] Wherein, d.sub.m,n indicates a distance from the UAV n to the IoT device m, f represents a center frequency, c represents a speed of light, and η.sub.LoS represents an additive loss due to shadowing and scattering caused by man-made structures;
[0081] A signal-to-noise ratio from the IoT device m to the UAV n is expressed as:
Γ.sub.m,n=p.sub.m10.sup.(−L.sub.m,n.sub./10)/σ.sup.2
[0082] Wherein, p.sub.m represents power from the IoT device m to the UAV n, and σ.sup.2 represents Gaussian white noise power;
[0083] A transmission rate from the IoT device m to the UAV n is calculated by:
R.sub.m,n=B log.sub.2(1+Γ.sub.m,n)
[0084] Wherein, B represents a bandwidth;
[0085] AoI is introduced to describe the freshness of sensing data received by the UAVs; assume when a first matching IoT device generates data, the UAVs start to fly towards a final location; other matching IoT devices generate data randomly in a UAV flight time; when arriving at a target location, the UAVs start to send data; so, the AoI is composed of the UAV flight time (the time of waiting for data transmission) and a transmission time from the IoT devices to the UAVs;
[0086] The AoI of data received from the IoT device m in a time t is expressed as A.sub.m(t):
A.sub.m(t)=t−u.sub.m(t)
[0087] u.sub.m(t) represents a time when the IoT device m generates data.
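Purely for illustration (this code is not part of the claimed method; the constants, units and function names are hypothetical), the Step-2 channel and AoI models above can be sketched as:

```python
import math

C = 3.0e8  # speed of light (m/s)

def path_loss_db(d, f, eta_los):
    """Free-space LoS path loss in dB at distance d (m) and carrier f (Hz),
    plus the additive shadowing/scattering loss eta_los (dB)."""
    return 20 * math.log10(4 * math.pi * f * d / C) + eta_los

def snr(p_tx, loss_db, noise_power):
    """Linear SNR Gamma_mn from transmit power (W), path loss (dB), noise power (W)."""
    return p_tx * 10 ** (-loss_db / 10) / noise_power

def rate(bandwidth, gamma):
    """Transmission rate R_mn = B * log2(1 + Gamma_mn), in bit/s."""
    return bandwidth * math.log2(1 + gamma)

def aoi(t, u_m):
    """Age of Information of device m's data at time t: A_m(t) = t - u_m(t)."""
    return t - u_m
```

With these helpers, the AoI of one matched pair is simply the UAV flight time (waiting) plus the payload size divided by `rate(...)`, as described above.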
[0088] Step 3: an AoI-based UAV-assisted space-air-ground integrated IoT data collection model is established.
[0089] A system AoI minimization problem to be solved is summarized as an optimization problem:
[0090] Wherein, b.sub.m,n.sup.t is a matching variable of the UAV n and the IoT device m at the time t, and x.sub.t.sup.U and y.sub.t.sup.U respectively represent a horizontal coordinate and a vertical coordinate of a flight location of the UAVs at the time t; constraint C2 and constraint C3 represent point-to-multipoint matching between the UAVs and the IoT devices; constraint C4 represents a UAV flying area with radius S.
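As an illustrative sketch of evaluating the Step-3 objective (not the patent's solver; the matrix layout and helper name are assumptions), the system AoI for a given matching b.sub.m,n can be computed as the average of flight time plus transmission time over all matched pairs:

```python
def system_aoi(b, flight_time, tx_time):
    """Average AoI over matched pairs at one time slot.
    b: MxN 0/1 matching matrix (device m, UAV n);
    flight_time[n]: UAV n's flight (waiting) time;
    tx_time[m][n]: transmission time from device m to UAV n."""
    total, pairs = 0.0, 0
    for m, row in enumerate(b):
        for n, matched in enumerate(row):
            if matched:
                # AoI of device m's data = waiting (flight) time + transmission time
                total += flight_time[n] + tx_time[m][n]
                pairs += 1
    return total / pairs if pairs else 0.0
```

An optimizer over the matching variables and UAV positions would minimize this quantity subject to constraints C1 to C4.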
[0091] Step 4: the problem is transformed into a Markov problem.
[0092] A framework combining deep reinforcement learning and a matching algorithm is constructed to find data collection positions and matching information of the UAVs; as shown in the drawings, during a matching process, agents are expressed as V={V.sub.k|∀k∈M}, a virtual agent is introduced into each UAV to realize point-to-multipoint data collection, and each agent matches one IoT device.
[0093] In this scenario, the UAVs fly at a same height, the deep reinforcement learning is used for training positions of the UAVs, and in the deep reinforcement learning, the agents interact with an environment to obtain an optimal strategy so as to maximize a long-term gain.
[0094] A Markov decision process provides a theoretical framework for reinforcement learning and is composed of a quadruple <S,A,P,R>, where S, A, P and R respectively denote a state space, an action space, a state transition probability and a reward;
[0095] State: s.sub.t=(x.sub.t.sup.U,y.sub.t.sup.U), s.sub.t∈S, denotes the position of the UAVs at the time t;
[0096] Action: a.sub.t=(d.sub.t,θ.sub.t), a.sub.t∈A, where d.sub.t and θ.sub.t respectively represent a flight distance and a flight direction of the UAVs at the time t, and are discretized to limit selections;
[0097] Reward: r.sub.t is defined as negative AoI at the time t, i.e., r.sub.t=−A.sub.m(t);
[0098] To minimize overall AoI of the network, minimum AoI between each agent and the corresponding IoT device is explored with an optimal UAV position, so the optimization problem is transformed into a problem of maximizing a cumulative reward:
[0099] Wherein, γ∈[0,1] is a discount factor for future rewards;
[0100] Under a policy π, a Q-value function used for selecting an action a.sub.t in a state s.sub.t is defined as:
Q.sup.π(s.sub.t,a.sub.t)=E.sub.π[Σ.sub.k=0.sup.∞γ.sup.kr.sub.t+k|s.sub.t,a.sub.t]
[0101] An optimal action-state value Q*(s.sub.t,a.sub.t) is defined as the optimal return obtained by taking the action a.sub.t in the state s.sub.t.
[0102] According to a Bellman equation, Q*(s.sub.t,a.sub.t) is expressed as:
Q*(s.sub.t,a.sub.t)=E.sub.s.sub.t+1[r.sub.t+γ max.sub.a.sub.t+1Q*(s.sub.t+1,a.sub.t+1)]
[0103] The optimal strategy is obtained as follows:
π*(s.sub.t)=arg max.sub.a.sub.tQ*(s.sub.t,a.sub.t)
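As a toy illustration of the Bellman recursion and greedy policy extraction described above (not the patent's training algorithm; the states, rewards and hyperparameters are invented), tabular Q-learning on a one-dimensional "flight line" converges to flying toward the lowest-AoI collection point:

```python
import random

ACTIONS = [-1, +1]   # discretized flight direction, unit flight distance
GAMMA = 0.9          # discount factor for future rewards

def step(s, a):
    """Toy environment: positions 0..4; position 4 is the collection point."""
    s2 = min(max(s + a, 0), 4)
    r = 0.0 if s2 == 4 else -1.0   # negative-AoI proxy: -1 per elapsed step
    return s2, r

def q_learning(episodes=1000, alpha=0.5, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(5) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            # epsilon-greedy action selection
            a = rng.choice(ACTIONS) if rng.random() < eps else max(ACTIONS, key=lambda x: q[(s, x)])
            s2, r = step(s, a)
            # Bellman update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            q[(s, a)] += alpha * (r + GAMMA * max(q[(s2, x)] for x in ACTIONS) - q[(s, a)])
            s = s2
            if s == 4:
                break
    return q

q = q_learning()
# greedy policy pi*(s) = argmax_a Q*(s,a)
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(4)}
```

After training, the greedy policy moves toward position 4 from every state, mirroring how the UAV is steered toward the optimal collection point.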
[0104] Step 5: a neural network is introduced to solve a high-dimensional state problem.
[0105] Because traditional reinforcement learning cannot be applied to a large discrete space or a continuous state space, deep reinforcement learning is used to solve the problem of a high-dimensional state space of the system model. Experience replay and a target network are introduced in a deep Q network (DQN). In the experience replay mechanism, a sequence <s.sub.t,a.sub.t,r.sub.t,s.sub.t+1> of the interaction between the agents and the environment is stored in an experience replay buffer D; during the learning process, a mini-batch sequence is uniformly sampled from D, and the deep Q network is trained by means of stochastic gradient descent to approximate a Q function in high-dimensional state spaces; the neural network is parameterized by θ to approximate the Q function as:
Q*(s.sub.t,a.sub.t)≈{circumflex over (Q)}(s.sub.t,a.sub.t;θ)
[0106] A loss function of the neural network is defined for the stochastic gradient descent, which is expressed as:
L(θ)=E.sub.D[(r.sub.t+γ max.sub.a.sub.t+1{circumflex over (Q)}(s.sub.t+1,a.sub.t+1;θ.sup.−)−{circumflex over (Q)}(s.sub.t,a.sub.t;θ)).sup.2]
[0107] Wherein, θ.sup.− and θ respectively represent parameters of a separate target network and an online network.
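The experience-replay mechanism described above can be sketched as follows (an illustrative buffer only; the class name and capacity are hypothetical, and the Q network itself would sit on top of this):

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay buffer D storing interaction tuples <s_t, a_t, r_t, s_{t+1}>."""

    def __init__(self, capacity):
        # deque with maxlen evicts the oldest transition when full
        self.buf = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, batch_size):
        # uniform mini-batch sampling breaks the temporal correlation of transitions
        return random.sample(list(self.buf), batch_size)

    def __len__(self):
        return len(self.buf)

buf = ReplayBuffer(capacity=100)
for t in range(150):                     # overfilling evicts the oldest tuples
    buf.store(t, +1, -float(t), t + 1)
batch = buf.sample(32)
```

Each sampled mini-batch would then be used for one stochastic-gradient-descent step on the loss above, with the target network parameters θ.sup.− updated only periodically.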
[0108] Step 6: a deep reinforcement learning algorithm is introduced to train the UAVs to find optimal collection points.
[0109] The objective is to find the optimal collection points of the UAVs. However, due to the complexity of the change of AoI, multiple local optimal collection points may exist during the training process. In order to prevent the UAVs from falling into local optimization, a soft actor-critic (SAC) algorithm is used for training.
[0110] Compared with traditional deep reinforcement learning algorithms, the SAC algorithm adopts a stochastic strategy, which has more practical advantages than a deterministic strategy during training. The stochastic strategy is implemented by means of maximum entropy, so that no useful behaviors or paths will be neglected. The agents develop more feasible solutions to explore the state space more fully, so as to complete a task with better optimization performance and learning efficiency.
[0111] The optimal strategy is achieved by using an entropy as:
π*=arg max.sub.πΣ.sub.tE.sub.(s.sub.t.sub.,a.sub.t.sub.)˜ρ.sub.π[r(s.sub.t,a.sub.t)+αH(π(·|s.sub.t))]
[0112] Wherein, H(π(·|s.sub.t))=E.sub.a.sub.t˜π[−log π(a.sub.t|s.sub.t)] represents the entropy of the policy in the state s.sub.t, and α represents a temperature parameter that determines the relative importance of the entropy term against the reward.
[0113] A state value function V(s.sub.t) and an action-state value function Q(s.sub.t,a.sub.t) are expressed as:
V(s.sub.t)=E.sub.a.sub.t˜π[Q(s.sub.t,a.sub.t)−α log π(a.sub.t|s.sub.t)]
Q(s.sub.t,a.sub.t)=r(s.sub.t,a.sub.t)+γE.sub.s.sub.t+1[V(s.sub.t+1)]
[0114] The algorithm constructs two action-state value functions Q.sub.θ.sub.1 and Q.sub.θ.sub.2 and uses the minimum of the two during training, so as to mitigate overestimation of the Q value.
[0115] A loss function for the critic network is as follows:
J.sub.Q(θ)=E.sub.(s.sub.t.sub.,a.sub.t.sub.)˜D[½(Q.sub.θ(s.sub.t,a.sub.t)−{circumflex over (Q)}(s.sub.t,a.sub.t)).sup.2]
[0116] Wherein, {circumflex over (Q)}(s.sub.t,a.sub.t) is defined as:
{circumflex over (Q)}(s.sub.t,a.sub.t)=r(s.sub.t,a.sub.t)+γE.sub.s.sub.t+1[V.sub.θ.sup.−(s.sub.t+1)]
[0117] When the policies π.sub.ϕ are trained, a loss function for the actor network is:
J.sub.π(ϕ)=E.sub.s.sub.t˜D[E.sub.a.sub.t˜π.sub.ϕ[α log π.sub.ϕ(a.sub.t|s.sub.t)−Q.sub.θ(s.sub.t,a.sub.t)]]
[0118] Because the temperature parameter α plays an important role in training, an automatic entropy adjustment scheme is employed; in an initial space exploration of the UAVs, α is increased to explore more spaces and is then decreased with the reduction of the space exploration, and a temperature loss is minimized by:
J(α)=E.sub.a.sub.t˜π[−α log π(a.sub.t|s.sub.t)−αH.sub.0]
wherein, H.sub.0 represents a target entropy.
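The automatic entropy adjustment can be sketched as one gradient step on the temperature loss (an illustrative sketch, not the patent's implementation; the function name, learning rate and log-parameterization are assumptions):

```python
import math

def temperature_step(log_alpha, log_probs, target_entropy, lr=0.1):
    """One gradient-descent step on J(alpha) = E[-alpha*log pi(a|s) - alpha*H0].
    alpha is parameterized as exp(log_alpha) so it stays positive."""
    alpha = math.exp(log_alpha)
    # dJ/dalpha = E[-log pi(a|s) - H0]; chain rule through alpha = exp(log_alpha)
    grad = alpha * sum(-lp - target_entropy for lp in log_probs) / len(log_probs)
    return log_alpha - lr * grad
```

When the policy's entropy exceeds the target H.sub.0 the gradient is positive and α shrinks (less exploration); when the policy becomes too deterministic α grows, matching the increase-then-decrease behavior described above.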
[0119] After training, the agents obtain minimum AoI between the UAVs and each IoT device and transmit the minimum AoI to the satellite for matching.
[0120] Step 7: a matching theory is introduced to match the UAVs and IoT devices.
[0121] According to received AoI values, the satellite constructs preference lists PL.sub.k.sup.V and PL.sub.m.sup.I for each agent and the corresponding IoT device in an increasing order of the AoI, and then pairs the UAVs and the IoT devices through a Gale-Shapley (GS) algorithm; to ensure that the agents of the same UAV share one location, the agent with minimum AoI is selected as a primary agent, and the auxiliary agents select the IoT device nearest to the training position of the primary agent;
[0122] The GS algorithm has a propose rule and a reject rule, which are respectively as follows:
[0123] Definition 1: propose rule: the agent V.sub.k∈V files a connection application with a favorite IoT device in a preference list PL.sub.k.sup.V;
[0124] Definition 2: reject rule: in presence of a better matching candidate, the IoT device I.sub.m∈I receiving the connection application will reject the agent; otherwise, the agent will be reserved as a matching candidate;
[0125] According to the rules, the GS algorithm comprises the following matching steps:
[0126] (1) V is divided into a primary agent set V.sup.P and an auxiliary agent set V.sup.A; and
[0127] (2) Each primary agent files a connection application with the favorite IoT device in its preference list; then each IoT device selects its most preferred agent among the applicants and rejects the other agents;
[0128] Each auxiliary agent V.sub.k.sup.A adjusts its preference list according to a distance from a most favorable position to the corresponding primary agent obtained by learning, and then performs the process in Step (2) until stable matching is realized.
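The propose and reject rules above follow the classical Gale-Shapley procedure, which can be sketched as follows (an illustrative implementation; the preference lists in the usage are hypothetical):

```python
def gale_shapley(agent_prefs, device_prefs):
    """agent_prefs: agent -> ordered list of devices (most preferred first).
    device_prefs: device -> ordered list of agents (most preferred first).
    Returns a stable matching as a device -> agent dict."""
    # precompute each device's ranking of agents (lower rank = more preferred)
    rank = {d: {a: i for i, a in enumerate(prefs)} for d, prefs in device_prefs.items()}
    free = list(agent_prefs)                 # agents not yet matched
    next_proposal = {a: 0 for a in agent_prefs}
    match = {}                               # device -> agent
    while free:
        a = free.pop(0)
        # propose rule: apply to the favorite device not yet tried
        d = agent_prefs[a][next_proposal[a]]
        next_proposal[a] += 1
        if d not in match:
            match[d] = a
        elif rank[d][a] < rank[d][match[d]]:
            # reject rule: the device keeps the better candidate
            free.append(match[d])
            match[d] = a
        else:
            free.append(a)
    return match

# hypothetical two-agent, two-device example
match = gale_shapley(
    {"V1": ["I1", "I2"], "V2": ["I1", "I2"]},
    {"I1": ["V2", "V1"], "I2": ["V1", "V2"]},
)
```

Here both agents prefer I1, but I1 prefers V2, so V1 is rejected and settles on I2; iterating the propose/reject rounds in this way always terminates in a stable matching.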
[0129] The above embodiments are merely preferred ones of the invention and are not intended to limit the invention in any form. Although the invention has been disclosed above with reference to the preferred embodiments, these embodiments are not used to limit the invention. Anyone skilled in the art can obtain equivalent embodiments by slightly changing or modifying the technical contents disclosed above without departing from the scope of the technical solutions of the invention. Any simple amendments, equivalent substitutions and improvements made to the above embodiments according to the technical essence of the invention should still fall within the protection scope of the technical solutions of the invention.