METHOD FOR TASK OFFLOADING BASED ON POWER CONTROL AND RESOURCE ALLOCATION IN INDUSTRIAL INTERNET OF THINGS
20220377137 · 2022-11-24
Assignee
Inventors
- Jincheng XU (Shanghai, CN)
- Peng Zhou (Shanghai, CN)
- Bo YANG (Shanghai, CN)
- Cailian Chen (Shanghai, CN)
- Xinping Guan (Shanghai, CN)
Cpc classification
H04L43/0876
ELECTRICITY
H04L67/1008
ELECTRICITY
H04L67/125
ELECTRICITY
H04L67/10
ELECTRICITY
H04L41/145
ELECTRICITY
H04L67/12
ELECTRICITY
International classification
H04L67/1008
ELECTRICITY
Abstract
A method for task offloading based on power control and resource allocation in the Industrial Internet of Things includes establishing a computing model for computation tasks at different offloading locations, constructing communication power control, resource allocation and computation offloading problems as a mixed integer non-linear programming model, solving them using a deep reinforcement learning algorithm to obtain an optimal strategy for offloading of the computation tasks, thus achieving communication power optimization and cross-domain resource allocation.
Claims
1. A method for task offloading based on power control and resource allocation in the Industrial Internet of Things, comprising steps of: Step 1: configuring an Industrial Internet of Things network, wherein the Industrial Internet of Things network comprises a plurality of switches and a plurality of devices, the plurality of switches communicating with one another in a wired fashion, partitioning the Industrial Internet of Things network into a plurality of cluster domains according to communication coverage ranges of the plurality of switches, wherein each of the plurality of cluster domains comprises one edge server and at least one device of the plurality of devices, the at least one device wirelessly communicating with a switch in the cluster domain where it is in, computing capacity of the edge server being f.sub.j.sup.S, computing capacity of each of the at least one device being f.sub.i.sup.L, each of the at least one device configured to generate one computation task Q.sub.i, the computation task Q.sub.i configured to contain a task data volume indicator d.sub.i and a task computational load indicator c.sub.i, configuring offloading locations for the computation task, wherein the offloading locations include a first offloading location, a second offloading location and a third offloading location, wherein the first offloading location is the device itself, the second offloading location is a second edge server, the second edge server comprising the edge server in the cluster domain where the device is in, the device offloading, via the second switch, the computation task to the second edge server for computation, the second switch comprising the switch in the cluster domain where the device is in, the second edge server configured to allocate, to the computation task offloaded to it, computing resources, and the third offloading location is a third edge server, the third edge server comprising the edge server in another cluster domain where the device is not in, the device offloading, via the second switch and a third switch, the computation task to the third edge server for computation, the third switch comprising a switch in the cluster domain where the third edge server is in, the third edge server configured to allocate, to the computation task offloaded to it, computing resources, in case of the computation task being executed at the first offloading location, establishing a first computing model, in case of the computation task being executed at the second offloading location, establishing a second computing model, in case of the computation task being executed at the third offloading location, establishing a third computing model; Step 2: based on the first computing model, the second computing model and the third computing model, establishing a total overhead model for all the computation tasks in the Industrial Internet of Things network, constructing an objective function and constructing a mixed integer non-linear programming problem; Step 3: decomposing the non-linear programming problem in Step 2 into a communication power optimization problem and a computing resource allocation problem, obtaining optimal communication power at an extreme value point or a boundary of a domain of definition of the communication power optimization problem, using a method of Lagrange multipliers and a system of simultaneous equations of KKT conditions to derive an optimal computing resource allocation strategy for computing resource allocation, substituting the optimal communication power and the optimal computing resource allocation strategy into the objective function to obtain an offloading location decision problem model; Step 4: based on the offloading location decision problem model obtained in the Step 3, establishing a reinforcement learning model, using a deep reinforcement learning algorithm to train parameters of a depth neural network so as to maximize a cumulative reward of the reinforcement learning model from multi-step iteration, obtaining an optimal offloading location decision for the computation tasks, obtaining a joint optimization strategy comprising the optimal communication power, the optimal computing resource allocation strategy and the optimal offloading location decision.
2. The method for task offloading based on power control and resource allocation in the Industrial Internet of Things as in claim 1, wherein the first computing model comprises first time consumption, first energy consumption and first overhead.
3. The method for task offloading based on power control and resource allocation in the Industrial Internet of Things as in claim 2, wherein the first time consumption is
4. The method for task offloading based on power control and resource allocation in the Industrial Internet of Things as in claim 3, wherein the first energy consumption is e.sub.i.sup.L=ζ.sub.i(f.sub.i.sup.L).sup.2c.sub.i, where ζ.sub.i is an energy consumption density of a device i for execution of the computation task.
5. The method for task offloading based on power control and resource allocation in the Industrial Internet of Things as in claim 4, wherein the first overhead is u.sub.i.sup.L=α.sub.i.Math.t.sub.i.sup.L+(1−α.sub.i).Math.e.sub.i.sup.L, where α.sub.i is a weight factor and α.sub.i∈(0,1).
6. The method for task offloading based on power control and resource allocation in the Industrial Internet of Things as in claim 5, wherein the second computing model comprises second time consumption, second energy consumption and second overhead.
7. The method for task offloading based on power control and resource allocation in the Industrial Internet of Things as in claim 6, wherein the second time consumption is
8. The method for task offloading based on power control and resource allocation in the Industrial Internet of Things as in claim 7, wherein the second energy consumption is
9. The method for task offloading based on power control and resource allocation in the Industrial Internet of Things as in claim 8, wherein the second overhead is u.sub.i.sup.LS=α.sub.i.Math.t.sub.i.sup.LS+(1−α.sub.i).Math.e.sub.i.sup.LS, where α.sub.i is a weight factor and α.sub.t∈(0,1).
10. The method for task offloading based on power control and resource allocation in the Industrial Internet of Things as in claim 9, wherein the third computing model comprises third time consumption, third energy consumption and third overhead.
11. The method for task offloading based on power control and resource allocation in the Industrial Internet of Things as in claim 10, wherein a data transmission rate between the second switch and the third switch is a constant r.sub.w; the third time consumption is
12. The method for task offloading based on power control and resource allocation in the Industrial Internet of Things as in claim 11, wherein the third energy consumption is
13. The method for task offloading based on power control and resource allocation in the Industrial Internet of Things as in claim 12, wherein the third overhead is u.sub.i.sup.OS=α.sub.i.Math.t.sub.i.sup.OS+(1−α.sub.i).Math.e.sub.i.sup.OS.
14. The method for task offloading based on power control and resource allocation in the Industrial Internet of Things as in claim 13, wherein establishing the total offloading overhead model for all the computation tasks in the Industrial Internet of Things network in Step 2 comprises: defining a first decision variable x.sub.i={0,1}, wherein x.sub.i=0 means the computation task is executed at the first offloading location, while x.sub.i=1 means the computation task is offloaded to the edge server for computation; defining a second decision variable β.sub.i={0,1}, wherein β.sub.i=0 means the computation task is executed at the first offloading location or the second offloading location, while β.sub.i=1 means the computation task is executed at the third offloading location; defining a third decision variable γ.sub.i, wherein γ.sub.i represents the edge server that executes the computation task and γ.sub.i ∈ {1,2, . . . , N}, overhead for the computation task Q.sub.i is
u.sub.i=(1−x.sub.i)u.sub.i.sup.L+x.sub.i(u.sub.i.sup.LS+β.sub.i(u.sub.i.sup.OS−u.sub.i.sup.LS)), total overhead for all the computation tasks in the Industrial Internet of Things network is
15. The method for task offloading based on power control and resource allocation in the Industrial Internet of Things as in claim 14, wherein constructing the mixed integer non-linear programming problem in Step 2 comprises: constructing an objective function
16. The method for task offloading based on power control and resource allocation in the Industrial Internet of Things as in claim 15, wherein Step 3 comprises: according to the offloading location where the computation task is executed, substituting a set of feasible solutions, x.sup.0, β.sup.0 and γ.sup.0, into the objective function to obtain a function of a continuous variable κ.sub.i and a function of a continuous variable p.sub.i; transforming a function of the continuous variable p.sub.i into a communication power optimization problem and solving it to obtain the optimal communication power p*.sub.i; transforming a function of the continuous variable κ.sub.i into a computing resource allocation problem and solving it to obtain the optimal computing resource allocation strategy κ*.sub.i; substituting the optimal communication power p*.sub.i and the optimal computing resource allocation strategy κ*.sub.i into the original objective function to obtain a offloading location decision problem expressed as:
17. The method for task offloading based on power control and resource allocation in the Industrial Internet of Things as in claim 16, wherein the communication power optimization problem is configured to leverage a nature of the function to solve the optimal communication power.
18. The method for task offloading based on power control and resource allocation in the Industrial Internet of Things as in claim 17, wherein the computing resource allocation problem is configured to first leverage a convex optimization theory to make a decision and then use a method of Lagrange multipliers and KKT (Karush-Kuhn-Tucker) conditions to solve the optimal allocation strategy.
19. The method for task offloading based on power control and resource allocation in the Industrial Internet of Things as in claim 18, wherein Step 4 comprises: Step 4.1: defining a state, an action and a reward; Step 4.2: initializing a memory buffer area, the memory buffer area configured to store at least one set of memory data, the memory data comprising a current state, a current action, a current reward and a subsequent state, and initializing a weight of a value function network to make network parameters of the objective function the same as network parameters of the value function; Step 4.3: initializing a state of the value function network, computing total time consumption T and total energy consumption E of all the computation tasks, taking a result of the computation as a state s.sub.1, and inputting the state s.sub.1 to the value function network to obtain corresponding outputs of the value function in response to different actions taken in the state s.sub.1; Step 4.4: selecting a current action using a ε-greedy strategy, after executing the current action, obtaining an immediate reward and proceeding to a next state s.sub.t+1, storing each set of the memory data in the memory buffer area; Step 4.5: stochastically sampling a plurality of the memory data from the memory buffer area, in case of the current state being a final state, configuring a temporal difference target as r.sub.j, in case of the current state not being a final state, inputting each of the plurality of the memory data to the objective function network to compute the temporal difference target, the objective function giving a network output as
20. The method for task offloading based on power control and resource allocation in the Industrial Internet of Things as in claim 19, wherein the state comprises the total time consumption T and the total energy consumption E of all the computation tasks, the total time consumption T being the sum of the first time consumption, the second time consumption and the third time consumption of all the computation tasks, the total energy consumption E being the sum of the first energy consumption, the second energy consumption and the third energy consumption of all the computation tasks; the action comprises a first decision variable vector [x.sub.1,x.sub.2, . . . ,x.sub.m], a second decision variable vector [β.sub.1,β.sub.2, . . . ,β.sub.m] and a third decision variable vector [γ.sub.1,γ.sub.2, . . . ,γ.sub.m], wherein the first decision variable vector is configured to determine whether the computation tasks need to be offloaded, the second decision variable vector is configured to determine whether the computation tasks are computed on the edge servers in the cluster domains where the devices are in, the third decision variable vector is configured to determine the edge servers where the computation tasks are on, an action space of the action is α=[x.sub.1,x.sub.2, . . . ,x.sub.m,β.sub.1,β.sub.2, . . . ,β.sub.m,γ.sub.1,γ.sub.2, . . . ,γ.sub.m]; the reward is configured as a reward function
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0065]
[0066]
[0067]
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0068] Below, the accompanying drawings of this specification are referenced to introduce many preferred embodiments of the present invention so that the techniques thereof become more apparent and readily understood. The present invention may be embodied in many different forms of embodiment, and the protection scope of the invention is not limited only to the embodiments mentioned herein.
[0069] Throughout the accompanying drawings, structurally identical parts are indicated with identical reference numerals, and structurally or functionally similar components are indicated with similar reference numerals. The size and thickness of each component in the drawings are arbitrarily depicted, and the present invention is not limited to any size or thickness of each component. For greater clarity of illustration, the thicknesses of some parts are exaggerated as appropriate somewhere in the drawings.
[0070] Shown in
[0071] A method used thereby for task offloading based on power control and resource allocation includes the steps as follows, with reference to
[0072] Step 1: configure an Industrial Internet of Things network, wherein, the Industrial Internet of Things network includes a plurality of switches and a plurality of devices, and the plurality of switches communicate with one another in a wired fashion. The plurality of devices may be any device used at industrial site, including, but not limited to, machines that produce products, product transportation devices such as AGVs and hoisting machinery, and detection devices.
[0073] According to communication coverage ranges of the plurality of switches, the Industrial Internet of Things network is partitioned into a plurality of cluster domains.
[0074] Each of the plurality of cluster domains includes one edge server and at least one device. The device in each cluster domain communicates with a switch in the cluster domain wirelessly, and through the switch in the cluster domain where it is in, offloads a computation task to an edge server in this cluster domain for computation. Devices cannot directly communicate with switches in different cluster domains, but switches in different cluster domains can communicate with one another in a wired fashion. Therefore, a computation task from a device in a cluster domain can be offloaded by a switch in this cluster domain to the edge server in another cluster domain for computation. Computing capacity of the edge server is denoted as f.sub.j.sup.S, and computing capacity of each device is denoted as f.sub.i.sup.L. Each device is configured to generate one computation task Q.sub.i. The computation task Q.sub.i is configured to contain a data volume indicator d.sub.i of the task and a computational load indicator c.sub.i of the task.
[0075] Offloading locations for the computation task are configured. As shown in
[0076] The offloading locations include a first offloading location, a second offloading location and a third offloading location, wherein the first offloading location is the device 11 itself that generates the computation task. The second offloading location is a second edge server. The second edge server includes an edge server 12 in the first cluster domain 1 where the device 11 is in. The device 11 offloads, via a second switch, the computation task to the second edge server 12 for computation. The second switch includes a switch 13 in the first cluster domain 1 where the device 11 is in. The second edge server is configured to allocate, to the computation task offloaded to it, a certain proportion of computing resources. The third offloading location is a third edge server. The third edge server includes the edge server in another cluster domain where the device 11 is not in, such as an edge server 22 in a second cluster domain 2, or an edge server 23 in a third cluster domain 3. The device 11 offloads, via the second switch and a third switch, the computation task to the third edge server for computation. The third switch includes a switch in another cluster domain where the device 11 is not in, such as a switch 23 in the second cluster domain 2, or an edge server 33 in the third cluster domain 3. The third edge server is configured to allocate, to the computation task offloaded to it, a certain proportion of computing resources.
[0077] In case of the computation task being executed at the first offloading location, a first computing model is established.
[0078] In case of the computation task being executed at the second offloading location, a second computing model is established.
[0079] In case of the computation task being executed at the third offloading location, a third computing model is established.
[0080] Step 2: Based on the first computing model, the second computing model and the third computing model, establish a total overhead model for all the computation tasks in the Industrial Internet of Things network, and construct a mixed integer non-linear programming problem.
[0081] Step 3: Decompose the non-linear programming problem in Step 2 into a communication power optimization problem and a computing resource allocation problem, obtain optimal communication power at an extreme value point or a boundary of a domain of definition of the communication power optimization problem, use a method of Lagrange multipliers and a system of simultaneous equations of KKT conditions to derive an optimal computing resource allocation strategy for computing resource allocation, substitute the optimal communication power and the optimal computing resource allocation strategy into the objective function to obtain an offloading location decision problem model.
[0082] Step 4: Based on the communication power and the computing resource allocation strategy obtained in Step 3, establish a reinforcement learning model, use a deep reinforcement learning algorithm to train parameters of a depth neural network so as to maximize a cumulative reward of the reinforcement learning model from multi-step iteration, obtain an optimal offloading location decision for the computation tasks, obtain a joint optimization strategy including the optimal communication power, the optimal computing resource allocation strategy and the optimal offloading location decision.
[0083] In Step 1, n switches are arranged in an industrial network system, and according to communication coverage ranges of these switches, the network is partitioned into n cluster domains. Aside each switch, there is arranged one edge server with computing capacity of f.sub.j.sup.S. In the network, there are m on-site devices scattered in the n cluster domains. Computing capacity of each on-site device is f.sub.i.sup.L, and each device will generate one computation task Q.sub.i. The task Q.sub.i includes two indicators: data volume size d.sub.i of the task and computational load c.sub.i of the task. Each task is made available with three computation options, which are respectively computation on the specific device, computation on the edge server in the respective cluster domain, and computation on the edge server in another cluster domain.
[0084] Computation on the device (first computing model):
[0085] First time consumption of computation on the device is
[0086] First energy consumption of computation on the device is e.sub.i.sup.L=ζ.sub.i(f.sub.i.sup.L).sup.2c.sub.i, where ζ.sub.i is an energy consumption density for computation on the device i.
[0087] First overhead for computation on the device is u.sub.i.sup.L=α.sub.i.Math.t.sub.i.sup.L+(1−α.sub.i).Math.e.sub.i.sup.L, where α.sub.i is a weight factor and α.sub.i∈(0,1).
[0088] Offloading of the computation task to the edge server in the respective cluster domain where the device is in and computation thereon (second computing model).
[0089] Time consumption of computation on the edge server in the local cluster domain is a task transmission time plus a computation time. At first, a wireless transmission model is established.
[0090] Within the same cluster domain, during a data uploading process from a device to the switch, a single wireless frequency is used. As uplinks of different devices suffer from interference with one another, which affects the transmission rate and transmission quality, a time division multiple access (TDMA) technique is employed. The TDMA technique divides time into periodic, non-overlapping frames, and each TDMA frame is divided into a number of time slots. Devices transmit data in the respectively assigned time slots, enabling many-to-one communication without mutual interference. This increases the utilization of wireless channel resources and, at locations with heavy network load, allows for desirable utilization of wireless channels, ensuring transmission quality and rate and providing tasks with real time guarantees. Since tasks from devices in the cluster domain are offloaded to different locations and have different requirements on channel resources, a dynamic time slot allocation algorithm is adopted, in which only when a device is in need of data transmission, it is allocated with a time slot. This avoids interference with the uplink of another device and increases time slot utilization.
[0091] A TDMA frame is structured to contain header bits, several time slots and tail bits. Each time slot contains information such as synchronization bits, user information bits and guard bits. Among these, data to be transmitted is contained in the user information bits, and is also a main part of TDMA frame transmission. If a total wireless communication bandwidth for the switch j in the cluster domain is B.sub.j, then a total bandwidth that one TDMA frame can allocate will be B.sub.j. It is necessary to, according to the number of devices in need of offloading and data volumes of the tasks in the cluster domain, to carry out time slot and bandwidth allocation for the devices. Assuming that, in the cluster domain of the switch S.sub.j, a corresponding ensemble of devices is
Ψ.sub.j, and that corresponding rules are determined according to the actual topology of the network, then according to the dynamic time slot allocation algorithm, time slot and bandwidth allocation is carried out for devices in need of task offloading. Allocation proportions are determined according to data volume sizes of the tasks. Then, the size of an actual bandwidth allocated to the device i is:
[0092] where x.sub.i is a binary variable. x.sub.i=0 means the task is computed on the device, while x.sub.i=1 means the task is offloaded to the edge server for computation.
[0093] The Shannon formula is used to derive a data transmission rate at which the device transmits the task:
[0094] where B.sub.i is a wireless channel bandwidth allocated to the device i, p.sub.i is communication transmission power of the device, g.sub.i is a channel gain between the device i and the switch, N.sub.0 is a single-sided power spectral density of channel noise. Second time consumption can be obtained as
[0095] where γ.sub.i denotes the location of the edge server for the computation task Q.sub.i and γ.sub.i ∈ {1,2, . . . , N}, κ.sub.i.sup.γ.sup.
[0096] Second overhead for computation in the respective cluster domain is
[0097] Offloading of the computation task to the edge server in another cluster domain and computation thereon (third computing model).
[0098] For offloading the computation task to the edge server in another cluster domain, the transmission path is two-hop, i.e., first from the device to the switch in the local cluster domain, and then from the switch in the local cluster domain to the switch in the destination cluster domain. The switches in different cluster domains are connected in a wired fashion, and a data transmission rate between them is a constant r.sub.w. Time consumption of computation of the computation task on the edge server in the other cluster domain consists of three parts: a transmission time for the device to offload the task to the switch in the respective cluster domain, a transmission time for the switch in the respective cluster domain to offload the task to the switch in the destination cluster domain, and a computation time of the task on the edge server in the destination cluster domain. Therefore, third time consumption is
[0099] Third energy consumption is
[0100] Here, similarly, only energy consumption for the device to transmit the task to the switch in the respective cluster domain is considered.
[0101] Third overhead is u.sub.i.sup.OS=α.sub.i.Math.t.sub.i.sup.OS+(1−α.sub.i).Math.e.sub.i.sup.OS.
[0102] Step 2 specifically involves:
[0103] based on the three computing models established in Step 1, establishing a total overhead model for the offloading of the computation task. First of all, a decision variable x.sub.i={0,1} is defined so that x.sub.i=0 means the computation task is computed on the local device, while x.sub.i=1 means the computation task is offloaded to an edge server for computation. A decision variable β.sub.i={0,1} is defined so that β.sub.i=0 means the computation task is executed on the edge server in the respective cluster domain, while β.sub.i=1 means the computation task is executed on the edge
server in another cluster domain. The location of an edge server is denoted as γ.sub.i, where γ.sub.i ∈ {1,2, . . . , N}. Total overhead for offloading of the computation task Q.sub.i is u.sub.i=(1−x.sub.i)u.sub.i.sup.L+x.sub.i(u.sub.i.sup.LS+β.sub.i(u.sub.i.sup.OS−u.sub.i.sup.LS)). Thus, total overhead for offloading of all the computation tasks in the Industrial Internet of Things network is:
[0104] The following objective function is constructed:
[0105] Constraints of this function are:
p.sub.i≤p.sub.i.sup.max; x.sub.i∈{0,1}, l=1,2, . . . ,m; β.sub.iε{0,1}, l=1,2, . . . ,m; γ.sub.i ∈ {1,2, . . . , N}. Where, O.sub.γ, represents a set of computation tasks processed on the edge server γ.sub.i, the optimization variables κ.sub.i.sup.γ.sup.
[0106] In Step 3, the mixed integer non-linear programming problem constructed in Step 2 is solved.
[0107] In Step 3, a given set of feasible solutions corresponding to the offloading location of the computation task, x.sup.0, β.sup.0 and γ.sup.0, is substituted into the original objective function. Assuming there are l on-site devices choosing to offload the computation tasks to edge servers for processing, in which p devices choose to offload the tasks to the edge servers in other cluster domains for processing and the remaining devices choose local computation, then in the objective function, there remain only the continuous variables κ.sub.i and p.sub.i. The objective function is expressed as:
[0108] The non-linear programming problem is decomposed into a communication power optimization problem and a computing resource allocation problem. The following functions and constraints are obtained:
[0109] with the corresponding constraint p.sub.i≤p.sub.i.sup.max;
[0110] and
[0111] with the corresponding constraint
[0112] For the communication power optimization problem g(p), parameter substitutions are made by letting
q.sub.i=log.sub.2(1+C.Math.p.sub.i). Thus,
and the original function is transformed into
A first-order derivative of this is obtained as
and a second-order derivative as
Let y=ln2.Math.q.sub.i.Math.2.sup.q.sup.
is always true in the domain of definition q.sub.i>0. Therefore, G″(q.sub.i)>0, i.e., the function G(q) is a convex function.
[0113] Since G″(q.sub.i)>0, G′(q.sub.i) monotonically increases in the domain of definition. Let G′(q.sub.i)=0, then we obtain the value of q.sub.i.sup.0 at which first-order derivative is zero. Thus, if q.sub.i∈(0,q.sub.i.sup.0) and G′(q.sub.i)<0, then G(q) monotonically decreases. If q.sub.i∈(q.sub.i.sup.0,∞) and G′(q.sub.i)>0, then G(q) monotonically increases. From a range in which the value of p.sub.i is constrained, a range in which the value of q.sub.i is constrained is obtained as
If
[0114]
then the optimal value will be q.sub.i.sup.0. That is,
Otherwise, the optimal value will be
That is, p*.sub.i=p.sub.i.sup.max.
[0115] Through calculating a Hessian matrix of the function h(κ) over the variable κ, it can be proved that the function h(κ) is a convex function over the variable κ. Moreover, since the constraints are linear, the resource allocation sub-problem is a convex optimization problem over the variable κ. A Lagrange function of h(κ) is established as
Solving it using the KKT conditions results in an optimal computing resource allocation strategy as
[0116] By substituting the optimal communication power, p*.sub.i and optimal computing resource allocation strategy κ*.sub.i into the original objective function, we obtain an offloading location decision problem expressed as:
[0117] This offloading location decision problem is an integer linear programming problem over the variables x.sub.i, β.sub.i and γ.sub.i. In Step 4, a deep reinforcement learning algorithm is adopted for solution. Reinforcement learning has three key components: state, action and reward. For the offloading location decision problem model in this application, the three components are defined as follows.
[0118] State: a state of the system is total time consumption T and total energy consumption E of all the tasks.
[0119] Action: an action of the system consist of three parts, which are respectively the variable [x.sub.1,x.sub.2, . . . ,x.sub.m] determining whether a task needs to be offloaded, the variable [β.sub.1,β.sub.2, . . . ,β.sub.m] determining whether a task is computed on the edge server in the respective cluster domain, and the variable [γ.sub.1,γ.sub.2, . . . ,γ.sub.m] of the edge server location for a computation task. Therefore, an action space is defined as α=[x.sub.1,x.sub.2, . . . ,x.sub.m,β.sub.1,β.sub.2, . . . ,β.sub.mγ.sub.1,γ.sub.2, . . . ,γ.sub.m].
[0120] Reward: when the system in a state s takes an action a to transition to the next state s′, an immediate reward is given as R (s,a). The goal of reinforcement learning is to find a continuous optimal strategy that maximizes the total reward, while the objective function requires computation overhead for all the tasks to be shortest. Therefore, a reward of the system is defined as:
[0121] where U.sub.local is total overhead when all the tasks are computed on the devices, U is total overhead required to complete the tasks when the system takes the current decision. Therefore, the value of the objective function is negatively correlated to the reward function. At the maxima of the reward function, the minima of the objective function can be obtained.
[0122] As shown in
[0123] Step 4.1: defining a state, an action and a reward;
[0124] Step 4.2: initializing a memory buffer area, the memory buffer area configured to store at least one set of memory data, the memory data including a current state, a current action, a current reward and a subsequent state, and initializing a weight of a value function network to make network parameters of the objective function the same as network parameters of the value function;
[0125] Step 4.3: initializing a state of the value function network, computing total time consumption T and total energy consumption E of all the computation tasks, taking a result of the computation as a state s.sub.1, and inputting the state s.sub.1 to the value function network to obtain corresponding outputs of the value function in response to different actions taken in the state s.sub.1;
[0126] Step 4.4: performing action selection using a ε-greedy strategy by stochastically selecting an action α.sub.t at a probability of ε or, otherwise, selecting an action that maximizes the value function, obtaining an immediate reward and proceeding to the next state s.sub.t+1 as a result of executing the action, and storing each set of such memory data (including the state, action, immediate reward and next state) in the memory buffer area for network training;
[0127] Step 4.5: stochastically sampling a plurality of the memory data from the memory buffer area, in case of the current state being a final state, configuring a temporal difference target as r.sub.j, in case of the current state not being a final state, inputting each of the plurality of the memory data to the objective function network to compute the temporal difference target, the objective function giving a network output as
[0128] Step 4.6: taking the network outputs of the value function as estimated values, taking the network output of the objective function as a labeled value, using an SGD (Stochastic Gradient Descent) algorithm to update the network parameters of the value function,
[0129] wherein an expression of the SGD algorithm is
[0130] the network parameters of the value function are configured to be updated according to the formula θ=θ+Δθ;
[0131] Step 4.7: repeating Steps 4.4 to 4.6 until the network parameters of the value function are updated for a fixed number of times, valuing the objective function network with the network parameters of the value function in the current state, outputting an optimal state and an action associated with the optimal state. Moreover, this valuation approach weakens the relevance between successive data and increases stability of the network parameters.
[0132] Preferred specific embodiments have been described in detail above. It should be understood that, those of ordinary skill in the art, without the need for creative effort, can make various modifications and changes, based on the concept of the present invention. Accordingly, all the technical solutions that can be obtained by those skilled in the art by logical analysis, inference or limited experimentation in accordance with the concept of the invention on the basis of the prior art are intended to fall within the protection scope as defined by the claims.