METHOD OF CONTROLLING A WIND FARM USING A REINFORCEMENT LEARNING METHOD
20240183337 · 2024-06-06
Inventors
- Eva BOUBA (RUEIL-MALMAISON CEDEX, FR)
- Donatien DUBUC (RUEIL-MALMAISON CEDEX, FR)
- Jiamin ZHU (RUEIL-MALMAISON CEDEX, FR)
- Claire BIZON MONROC (PARIS CEDEX 12, FR)
- Ana BUSIC (PARIS CEDEX 12, FR)
Cpc classification
F03D7/049
MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
F03D7/0204
MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
F03D7/028
MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
F03D7/046
MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
F05B2270/8042
MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
F05B2270/321
MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
Y02E10/72
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
F05B2270/20
MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
F03D7/048
MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
F05B2270/32
MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
International classification
F03D7/04
MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
Abstract
The present invention is a wind farm control method, wherein a reinforcement learning method (RL) is implemented in a decentralized manner (for each wind turbine). The reward (REC) is calculated according to a wake propagation delay (DEL). Thus, the reward is truly representative of the effect of the last action (previous yaw control for example).
Claims
1-10. (canceled)
11. A method of control of a wind farm comprising wind turbines, each wind turbine of the wind farm comprising an actuator for modifying an operating point including a yaw angle of the turbine, with the yaw angle being an angle formed between a rotor of the turbine and a wind direction, the method comprising steps of: a. acquiring power generated by each wind turbine, a wind speed, a wind direction, and the yaw angle of each turbine; b. for each wind turbine, determining a propagation delay of a wake formed by the turbine, according to the acquired wind speed and wind direction, and a layout of the wind turbines within the wind farm; c. for each wind turbine, determining a value of a reward representing an impact of control of the turbine on a sum of the power generated by all the turbines of the wind farm, wherein the reward is calculated by accounting for the determined wake propagation delay; d. for each wind turbine, applying a reinforcement learning method to determine a target operating point of the turbine according to the determined reward and to a previous yaw angle; and e. controlling the operating point of each wind turbine by applying the determined target operating point using the actuator.
12. The wind farm control method as claimed in claim 11, wherein the wake propagation delay is determined by use of Taylor's frozen turbulence hypothesis.
13. The wind farm control method as claimed in claim 11, wherein the wake propagation delay is determined by accounting for a wake distance limit.
14. The wind farm control method as claimed in claim 12, wherein the wake propagation delay is determined by accounting for a wake distance limit.
15. The wind farm control method as claimed in claim 11, wherein the reinforcement learning method is a Watkins Q-learning method.
16. The wind farm control method as claimed in claim 13, wherein the reinforcement learning method is a Watkins Q-learning method.
17. The wind farm control method as claimed in claim 14, wherein the reinforcement learning method is a Watkins Q-learning method.
18. The wind farm control method as claimed in claim 11, wherein for each wind turbine, the reward is determined when a time elapsed since the last control by the actuator is greater than the determined wake propagation delay.
19. The wind farm control method as claimed in claim 12, wherein for each wind turbine, the reward is determined when a time elapsed since the last control by the actuator is greater than the determined wake propagation delay.
20. The wind farm control method as claimed in claim 13, wherein for each wind turbine, the reward is determined when a time elapsed since the last control by the actuator is greater than the determined wake propagation delay.
21. The wind farm control method as claimed in claim 14, wherein for each wind turbine, the reward is determined when a time elapsed since the last control by the actuator is greater than the determined wake propagation delay.
22. The wind farm control method as claimed in claim 11, wherein the operating point is the yaw angle and the control of the yaw angle is a variation by a fixed pitch of the yaw angle.
23. The wind farm control method as claimed in claim 22, wherein the acquired yaw angle is the previous controlled yaw angle.
24. The wind farm control method as claimed in claim 11, wherein the wind speed and the wind direction are acquired by measuring with one of a LiDAR sensor, an anemometer, and a real-time control and data acquisition system.
25. The wind farm control method as claimed in claim 12, wherein the wind speed and the wind direction are acquired by measuring with one of a LiDAR sensor, an anemometer, and a real-time control and data acquisition system.
26. The wind farm control method as claimed in claim 13, wherein the wind speed and the wind direction are acquired by measuring with one of a LiDAR sensor, an anemometer, and a real-time control and data acquisition system.
27. The wind farm control method as claimed in claim 14, wherein the wind speed and the wind direction are acquired by measuring with one of a LiDAR sensor, an anemometer, and a real-time control and data acquisition system.
28. The wind farm control method as claimed in claim 15, wherein the wind speed and the wind direction are acquired by measuring with one of a LiDAR sensor, an anemometer, and a real-time control and data acquisition system.
29. The wind farm control method as claimed in claim 11, wherein the reward is determined by accounting for an average of the electrical powers generated during a predetermined time interval.
30. A wind farm, wherein each wind turbine of the wind farm comprises an actuator for modifying an operating point including a yaw angle of the turbine, with the yaw angle being an angle formed between a rotor of the turbine and a wind direction, and wherein the wind farm comprises a computer system for implementing the wind farm control method as claimed in claim 11.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0038] Other features and advantages of the method and of the system according to the invention will be clear from reading the description hereafter of embodiments given by way of non-limitative example, with reference to the accompanying figures.
DETAILED DESCRIPTION OF THE INVENTION
[0049] The present invention concerns a method for real-time control of a wind farm. A wind farm, also referred to as wind park or wind power plant, is a group of wind turbines that generate electricity. Each wind turbine of the wind farm comprises an actuator for modifying an operating point of the turbine. An example of an operating point can be the yaw angle of the turbine. Other operating points can notably be speed governing of the turbine, or modifying the power curve of the turbine. The position of the turbines within the wind farm, also referred to as wind turbine layout or wind turbine implementation, is previously known.
[0050] In the rest of the description, only the yaw angle control is described with it being understood that other operating points can however be controlled.
[0051] The method according to the invention comprises the following steps: [0052] 1) Acquiring the powers generated, the wind speed and direction, the yaw angles [0053] 2) Determining the wake propagation delay of each wind turbine [0054] 3) Determining a reward for each wind turbine [0055] 4) Applying a reinforcement learning method for each wind turbine [0056] 5) Controlling each wind turbine.
[0057] Steps 2 to 4 can be carried out by a computing system, notably a computer, a processor or a calculator. The steps are detailed in the description below. The steps are carried out in parallel for each wind turbine so that the method is decentralized. Furthermore, the method uses no physical model, and it is based on the real data measured within the wind farm, which makes it competitive in terms of computing time, representativity and accuracy. These steps are carried out continuously in real time and are repeated at each time step.
[0059] 1) Acquiring the powers generated, the wind speed and direction, the yaw angles
[0060] This step acquires, by use of measuring or estimation means: [0061] the electrical power generated by each wind turbine, [0062] the wind speed, notably the undisturbed wind speed measured at the wind farm inlet, [0063] the wind direction, notably the wind direction at the wind farm inlet, and [0064] the yaw angle of each wind turbine.
[0065] According to an embodiment, the wind speed and direction can be measured, notably by a LiDAR (Light Detection And Ranging) sensor, an anemometer, or measurements performed using a real-time control and data acquisition SCADA (Supervisory Control And Data Acquisition) system, or any similar sensor. A real-time control and data acquisition SCADA system is a large-scale remote control system allowing real-time processing of a large number of remote measurements and remote control of the technical facilities. It is an industrial technology in the field of instrumentation, whose implementations can be considered as instrumentation structures including a middleware type layer. The undisturbed wind speed at the wind farm inlet can be deduced from these measurements with the wind farm inlet being defined as a function of the wind direction.
[0066] According to an implementation, the electrical power generated by each wind turbine can be measured by the real-time control and data acquisition SCADA system, or by current and voltage measurements delivered by each turbine.
[0067] According to an aspect of the invention, the acquired yaw angle of each turbine can be the yaw angle controlled at a previous time. Alternatively, the yaw angle can be measured by a sensor which notably may be an angle sensor.
[0068] 2) Determining the wake propagation delay of each wind turbine
[0069] This step determines, for each wind turbine, a propagation delay for the wake formed by the turbine, as a function of the wind speed and direction (acquired in step 1), and of the implementation of the turbines within the wind farm. The wake effect corresponds to the disturbances and turbulences formed by a wind turbine in the wind flow. The wake formed by a turbine impacts the turbines downstream from this turbine in the direction of the wind. The implementation of the wind turbines corresponds to the layout of the turbines within the wind farm. In other words, the wind turbine implementation corresponds to the relative position of the turbines within the wind farm. The wind turbine implementation and the wind direction allow determination of the turbines downstream and upstream from each turbine. This step accounts for the wake propagation delay in wind park optimization problems.
[0070] According to an embodiment of the invention, the wake propagation delay can be approximated for each wind turbine using Taylor's frozen turbulence hypothesis. For example, the propagation delay $d_{i,j}$ from an upstream turbine i to a downstream turbine j can be determined with the following formula:

$d_{i,j} = \frac{c_{i,j}}{u_\infty}$

with $c_{i,j}$ being the distance between turbines i and j along the wind direction axis, and $u_\infty$ being the undisturbed wind speed measured at the wind farm inlet (obtained in step 1).
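By way of non-limitative illustration, a minimal sketch of this delay computation in Python (the function and argument names are illustrative, not part of the claimed method):

```python
def propagation_delay(c_ij: float, u_inf: float) -> float:
    """Wake propagation delay (s) from turbine i to turbine j under
    Taylor's frozen turbulence hypothesis: the wake is advected at the
    undisturbed wind speed u_inf over the downwind distance c_ij."""
    if u_inf <= 0.0:
        raise ValueError("undisturbed wind speed must be positive")
    return c_ij / u_inf
```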
[0071] According to an implementation of this embodiment of the invention, the propagation delay formula can comprise a multiplying coefficient greater than 1.
[0072] As a variant, the propagation delay formula accounts for induction zones upstream from the rotor, which allows relaxing the frozen turbulence hypothesis with a more realistic duration taking account of the associated decelerations.
[0073] According to an aspect of the invention, determination of the wake propagation delay can account for a wake distance limit. Indeed, the disturbances in the wind field due to a turbine i become less significant as the distance to the rotor thereof increases, and the wind again encounters conditions similar to the free flow at the farm inlet with its wake effect being negligible beyond a given distance. Thus, due to this distance limit, the delay of propagation to any turbine j located beyond this distance may not be taken into account to calculate the reward delivery delay.
[0074] For example, a wake propagation delay matrix $D_{i,j}$ can be determined for each turbine pair (i, j) of the wind farm. This wake propagation delay matrix can be written for all turbine pairs (i, j) with $0 \le i, j \le M$, M being the number of turbines of the wind farm:

$D_{i,j} = \begin{cases} m \, d_{i,j} & \text{if } c_{i,j} \le d_{lim} \\ 0 & \text{otherwise} \end{cases}$

with m being a multiplying coefficient greater than 1, $d_{i,j}$ being the propagation delay (resulting from Taylor's frozen turbulence hypothesis for example) from an upstream turbine i to a downstream turbine j, $c_{i,j}$ being the distance between the two turbines i and j along the wind direction axis, and $d_{lim}$ being a wake distance limit.
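A minimal sketch of the construction of such a delay matrix, assuming the downwind distances have been pre-computed from the turbine layout and the wind direction (the values of m and d_lim are illustrative assumptions; the delays here are in seconds and can be divided by the sampling period to obtain time steps):

```python
import numpy as np

def delay_matrix(c: np.ndarray, u_inf: float, m: float = 1.2,
                 d_lim: float = 3000.0) -> np.ndarray:
    """Wake propagation delay matrix D, in seconds.

    c[i, j] : downwind distance from turbine i to turbine j along the
              wind direction axis (<= 0 if j is not downstream of i).
    m       : multiplying coefficient greater than 1 (illustrative value).
    d_lim   : wake distance limit beyond which the wake is neglected.
    """
    d = m * c / u_inf                           # frozen-turbulence delay
    return np.where((c > 0.0) & (c <= d_lim), d, 0.0)
```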
[0075] 3) Determining a reward for each wind turbine
[0076] This step determines, for each wind turbine, a value of a reward representing the impact of a control of the turbine, notably on the sum of the powers generated by all the turbines of the wind farm. The reward is determined as a function of the wake propagation delay determined in step 2. The reward is a parameter of the reinforcement learning method described in step 4. In other words, the reward is a value of a reward function of the reinforcement learning method described in step 4. The reward associated with a state enables the method to automatically learn an action. If the reward is positive, the action taken previously is favorable, which reinforces this action through machine learning. Given that a modification of the yaw angle of a wind turbine (or of any other operating point) has a delayed impact on the downstream turbines, the reward is determined with a delay that accounts for the wake propagation delay determined in step 2.
[0077] According to an embodiment, the reward is determined by measuring the energy generation of the wind turbines downstream from the turbine considered, at the time of the impact estimated by use of the propagation delay. Thus, the present invention allows reducing uncertainties. Each wind turbine therefore receives a different reward function according to its location in the wind park. Finally, in order to reduce the influence of the rated power on the evaluation of the impact of the various yaws, increases in percentage rather than in gross value can be taken into account. The sign of the measured variation can be used as the reward signal, and a threshold can be applied to filter the noise. For example, a reward $r_{i,k}$ can be defined for each turbine i (ranging between 1 and M, with M being the number of turbines of the wind farm), for each time step k:

$r_{i,k} = \begin{cases} +1 & \text{if } (V_{i,2} - V_{i,1})/V_{i,1} > \epsilon \\ -1 & \text{if } (V_{i,2} - V_{i,1})/V_{i,1} < -\epsilon \\ 0 & \text{otherwise} \end{cases}$

with $\epsilon$ being a positive threshold, $V_{i,1} = \sum_{j=1}^{M} P_{j,k}$ and $V_{i,2} = \sum_{j=1}^{M} P_{j,k+D_{i,j}}$, where $P_{j,k}$ is the power generated by turbine j at time step k and $D_{i,j}$ is the wake propagation delay matrix.

[0078] Furthermore, to take account of the instantaneous power variation due to the wind turbulence, the measurements of the power generated by each wind turbine can be averaged:

$\bar{P}_{j,k} = \frac{1}{T} \sum_{l=0}^{T-1} P_{j,k-l}$

by defining $T \ge 1$ as the size of the averaging window. In other words, the reward can be determined by taking account of an average of the powers generated by the wind turbines during a predetermined time interval (size of the averaging window T).
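A minimal sketch of this reward computation, assuming the power history and a delay matrix expressed in time steps are available (the threshold and window values are illustrative assumptions):

```python
import numpy as np

def reward(i: int, P: np.ndarray, k: int, D: np.ndarray,
           eps: float = 0.01, window: int = 10) -> int:
    """Delayed reward for turbine i at time step k.

    P[j, t] : power generated by turbine j at time step t; the history
              must extend past k + the largest delay in D[i].
    D[i, j] : wake propagation delay from i to j, in time steps.
    eps     : positive threshold filtering turbulence noise.
    window  : size T of the power-averaging window, in time steps.
    """
    def p_bar(t: int) -> np.ndarray:
        # Average each turbine's power over the window ending at t.
        return P[:, max(0, t - window + 1):t + 1].mean(axis=1)

    v1 = p_bar(k).sum()                       # farm power before the impact
    v2 = sum(p_bar(k + int(D[i, j]))[j]       # each turbine at its impact time
             for j in range(P.shape[0]))
    rel = (v2 - v1) / v1                      # percentage rather than gross value
    return 1 if rel > eps else (-1 if rel < -eps else 0)
```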
[0079] According to an implementation of the invention, the reward can be determined for each wind turbine if the time elapsed since the last control is greater than the determined wake propagation delay. Thus, the reward is determined only when enough time has elapsed since the last control to properly observe its impact on the powers generated by the turbines.
[0080] 4) Applying a reinforcement learning method for each wind turbine
[0081] This step applies, for each wind turbine, a reinforcement learning method in order to determine a target yaw angle (or target operating point) of the turbine, according to the reward determined in step 3 and according to the previous yaw angle (or the previous operating point). The target yaw angle corresponds to a yaw angle setpoint for the turbine. The target operating point corresponds to an operating point setpoint. Reinforcement learning is, for an autonomous agent (here a turbine of the wind farm), the learning of the actions to be taken (here a yaw angle modification) from experiments, so as to optimize a quantitative reward over time (here the reward determined in step 3, which takes the wake propagation delay into account). The agent is immersed in an environment (here the wind farm and the wind), and it makes decisions according to its current state (here the current yaw angle). In return, the environment provides the agent with a reward, which may be positive or negative. For the invention, it is a delayed-reward approach. The agent searches for an optimal strategy through past realizations insofar as it maximizes the sum of the rewards over time. This online learning approach is particularly interesting given the modeling uncertainties. Furthermore, this approach enables real-time determination. Such a learning method is applied to each turbine of the wind farm, which allows optimization to be decentralized.
[0082] According to an embodiment of the invention, the reinforcement learning method can be a Watkins Q-learning method. Such a reinforcement learning method is notably described in C. Watkins, P. Dayan, Technical Note - Q-learning, Machine Learning 8, 279-292 (1992), 1992 Kluwer Academic Publishers, Boston, Manufactured in the Netherlands. As a variant, other reinforcement learning methods can be used, which are notably learning methods known as policy-gradient methods of actor-critic type, or any similar learning method. In these approaches, each agent has two elements: an actor trying to predict the optimal control for the current state, and a critic that assesses the efficiency of the control predicted by the actor. Algorithms of this type are for example known as A2C (Advantage Actor Critic), DDPG (Deep Deterministic Policy Gradient), PPO (Proximal Policy Optimization).
[0083] Formally, for the embodiment implementing the reinforcement learning method using the Q-learning method, a Markov Decision Process (MDP) {S, A, r, P} can be defined, with S being the state space, A being an action space, P being the environment transition probability matrix, and $r: S \times A \to \mathbb{R}$ being a reward function. In other words, at time t, an agent situated in a state $s_t$ selects an action $a_t$ and receives an associated reward $r_t$. In the following, the continuous time is denoted by t and the discrete time step is denoted by k.
[0084] The application that associates with each observed state a corresponding action, or a probability distribution over the possible actions in this state, is referred to as a policy $\pi$. It is then said that an agent follows a policy $\pi$. The sequence of states, actions and rewards $\{s_0, a_0, r_0, s_1, a_1, \ldots, s_T, a_T, r_T\}$ observed by the agent when it interacts with the environment is referred to as a trajectory. The trajectory followed depends on the agent policy and on the environment dynamics; for an action taken in a given state, the next state and the reward obtained are not always identical. One then speaks of a probability of transition from one state to another. In the present case for example, exogenous factors such as the wind vagaries can increase or decrease the reward. The agent therefore seeks to maximize its expected return.
[0085] It can be written that A(s) is the subset of actions $a \in A$ available in state s. An agent interacts with the environment by following a stochastic policy $a \sim \pi(\cdot|s)$, $s \in S$, $a \in A(s)$, where $\pi(a|s)$ is the probability of selecting action a when in state s. If the policy is deterministic, there is an action a for which $\pi(a|s) = 1$, and one can directly write $a = \pi(s)$. The goal of the agent is then to find an optimal strategy $\pi^*$ that maximizes the expectation E of the infinite sum of all its discounted rewards, which is also referred to as the discounted return:

$\pi^* = \arg\max_\pi E\left[ G \mid s_0 \right], \quad G = \sum_{k=0}^{\infty} \gamma^k r_k$

with $0 < \gamma < 1$ being a discount factor (in other words, a weighting factor), $s_0$ being an initial state, G being the discounted return, that is the sum of the discounted future rewards, and $\{s_k, a_k\}, k = 0 \ldots \infty$, being the trajectory of the agent in the environment under policy $\pi$. For a policy $\pi$, the state-action value function $Q_\pi$ (or q function) can be defined as the expected discounted return for a state-action pair (s, a): $Q_\pi(s, a) = E[G \mid s_0 = s, a_0 = a]$. As a result, for any state-action pair (s, a), $Q_\pi(s, a)$ is the expected value for an agent that selects action a in state s and then follows policy $\pi$ for the rest of the trajectory. An optimal q function $Q^*$ can be defined such that $\forall (s, a), Q^*(s, a) = \max_\pi Q_\pi(s, a)$.
[0086] To search for the best policy in a given environment, one may directly try to learn $Q^*$. The Watkins Q-learning algorithm maintains estimates of the q values for each pair (s, a); it is said to be tabular. It explores and iteratively updates an estimate $\hat{Q}$ of the optimal q function $Q^*$ at each time step k:

$\hat{Q}_{k+1}(s_k, a_k) = \hat{Q}_k(s_k, a_k) + l_k \cdot TD_k$

where $TD_k$ is the Bellman error estimator, which can be defined as:

$TD_k = r_k + \gamma \max_{a' \in A(s_{k+1})} \hat{Q}_k(s_{k+1}, a') - \hat{Q}_k(s_k, a_k)$
with $\gamma \in (0, 1)$ being a discount factor and $l_k$ being a learning rate at time step k. $\hat{Q}$ then converges with a probability of one towards $Q^*$ under certain reasonably achievable conditions. Following a decentralized approach where each turbine is modeled by an agent, M agents are considered (M being the number of turbines of the wind farm) with state spaces $S_i$ ($1 \le i \le M$): $S_i = Y \times \mathbb{R}^2$ (with Y being an allowable yaw angle space, and $\mathbb{R}$ the set of real numbers) such that the state observed by an agent i at each time step k is defined by:

$s_{i,k} = [\psi_{i,k}, w_k]^T, \quad \psi_L \le \psi_{i,k} \le \psi_U$

with $\psi_L$ and $\psi_U$ respectively being the lower and upper limits of the yaw values, $\psi_{i,k}$ being the yaw angle of turbine i at time step k, and $w_k$ being the acquired wind conditions at time step k, in other words the wind speed and the wind direction. According to a non-limitative example, the action space can be defined as $A = \{-1°, 0°, +1°\}$.
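By way of illustration, a minimal tabular Q-learning update over such a discretized state-action space can be sketched as follows (the discretization into 63 states matches the examples below; the discount factor and learning rate values are assumptions):

```python
import numpy as np

# Illustrative discretization: 63 states and 3 actions corresponding
# to the yaw variations {-1 deg, 0 deg, +1 deg} defined above.
N_STATES, N_ACTIONS = 63, 3
GAMMA, LR = 0.9, 0.1          # discount factor and learning rate (assumed)

Q = np.zeros((N_STATES, N_ACTIONS))

def q_update(s: int, a: int, r: float, s_next: int) -> None:
    """One Watkins Q-learning step: Q <- Q + lr * TD (Bellman error)."""
    td = r + GAMMA * Q[s_next].max() - Q[s, a]
    Q[s, a] += LR * td
```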
[0087] When there is a known delay c between the time when an action is sent to the environment and the time when the associated reward is collected by the agent, the environment is referred to as a delayed-reward environment. The time step delay $c_d$ is further considered so that, with h being the sampling period, i.e. the time in seconds between two iterations of the algorithm,

$c_d = \left\lceil \frac{c}{h} \right\rceil$

is then the number of time steps before the reward becomes available. This delay can be managed by allowing the agent to observe the history of all the states received and actions taken during the delay. As a variant, a modification can be made to the Watkins Q-learning method, referred to as dQ(0): at time step k, instead of updating $\hat{Q}(s_k, a_k)$, updates are made for the action $\check{a}_k$ that takes effect at time step k. This flexible approach, readily adaptable to the decentralized case, enables management of the wake propagation delays in wind parks.
[0088] The present invention can implement a modification of the estimator update method in a reinforcement learning algorithm. For the Q-learning that updated $\hat{Q}(s_k, a_k)$ at time step k, updates can instead be made for the action $a_{k-c_d}$ taken $c_d$ time steps earlier.
[0089] Thus, according to an embodiment, the reward delay in time steps $c_i$ for each turbine i can correspond to the number of time steps corresponding to the greatest estimated delay of wake propagation to another turbine of the park. For this embodiment, the update of the Q-learning method for any wind turbine can be written as follows:

$\hat{Q}_{k+1}(s_{k-c_i}, a_{k-c_i}) = \hat{Q}_k(s_{k-c_i}, a_{k-c_i}) + l_k \left[ r_k + \gamma \max_{a'} \hat{Q}_k(s_{k-c_i+1}, a') - \hat{Q}_k(s_{k-c_i}, a_{k-c_i}) \right]$

with $a_k \sim b(\cdot|s_k)$ being the policy followed by an agent and guaranteeing a certain degree of exploration in the environment, $\gamma \in (0, 1)$ being a discount factor, $l_k$ being a learning rate at time step k, s being the current state, that is the current yaw angle, a being the action, i.e. the yaw angle variation, r being the reward, and $c_i$ being the number of time steps corresponding to the greatest delay of wake propagation to another turbine of the park.
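A minimal sketch of this delayed update, assuming the agent buffers its recent state-action pairs (the buffer handling and names are illustrative):

```python
from collections import deque

def delayed_q_update(Q, history: deque, r: float, c_i: int,
                     lr: float = 0.1, gamma: float = 0.9) -> None:
    """dQ(0)-style update: the reward observed now is credited to the
    state-action pair visited c_i time steps ago (c_i >= 1).

    history : deque of (state, action) pairs, most recent last.
    """
    if c_i < 1 or len(history) <= c_i:
        return                            # not enough history yet
    s_old, a_old = history[-1 - c_i]      # pair whose effect is observed now
    s_next, _ = history[-c_i]             # state that followed that pair
    td = r + gamma * Q[s_next].max() - Q[s_old, a_old]
    Q[s_old, a_old] += lr * td
```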
[0090] Once $\hat{Q}$ is determined, action a (that is the target yaw angle or the target operating point) is deduced. Action a can therefore be selected according to a so-called exploration policy b(a|s) that selects $\arg\max_a \hat{Q}(s, a)$ with the highest probability, and all the other actions with non-zero probabilities.
[0091] According to an embodiment, a Boltzmann exploration policy can be selected:

$b(a|s) = \frac{\exp\left(\hat{Q}(s, a)/\tau\right)}{\sum_{a' \in A(s)} \exp\left(\hat{Q}(s, a')/\tau\right)}$

with $\tau = 0.1$ for example. The greater $\tau$, the more the policy explores in an attempt to find better actions. The smaller $\tau$, the more the trend is to systematically select the best action according to the current estimates. It is also possible to reduce $\tau$ throughout the learning process.
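A minimal sketch of Boltzmann action selection over the tabular estimate:

```python
import numpy as np

def boltzmann_action(q_row: np.ndarray, temperature: float = 0.1) -> int:
    """Sample an action from the Boltzmann (softmax) exploration policy.

    q_row : estimated q values for the current state, one per action.
    """
    logits = q_row / temperature
    logits -= logits.max()                 # numerical stability
    p = np.exp(logits)
    p /= p.sum()
    return int(np.random.choice(len(q_row), p=p))
```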
[0092] 5) Controlling each wind turbine
[0093] This step controls each wind turbine by applying the target yaw angle (or the target operating point) determined in step 4. In this step, for each turbine, the actuator of the operating point of the turbine is controlled. Notably, the yaw angle actuator of the turbine can be controlled.
[0094] According to an embodiment, control of the yaw angle can correspond to a variation by a fixed pitch of the yaw angle. In other words, the yaw angle can be increased by a fixed pitch or decreased by a fixed pitch, or it may remain constant. The fixed pitch can range between 0.5 and 5 degrees, preferably between 0.5 and 2 degrees, and it can be 1 degree. This embodiment allows prevention of sudden yaw angle changes and, therefore, sudden wake changes.
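By way of non-limitative illustration, a minimal sketch of such a fixed-pitch yaw control step (the pitch value and yaw limits are illustrative assumptions):

```python
def apply_yaw_action(yaw_deg: float, action: int, pitch_deg: float = 1.0,
                     yaw_min: float = -30.0, yaw_max: float = 30.0) -> float:
    """Vary the yaw setpoint by a fixed pitch: action in {-1, 0, +1}."""
    new_yaw = yaw_deg + action * pitch_deg
    return min(max(new_yaw, yaw_min), yaw_max)   # keep within allowable range
```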
[0095] As a variant, control of the yaw angle can correspond to a control at a precise value of the yaw angle.
[0096] Furthermore, the invention concerns wind farms. Each turbine of the wind farm comprises an actuator for modifying the yaw angle of the turbine or the operating point of the turbine. Besides, the wind farm comprises computing means, notably a computer, a processor or a calculator, for implementing the control method according to any one of the variants or variant combinations described above. The wind farm is thus controlled by the computing means. In particular, the computing means make it possible to: [0097] acquire the powers generated by the wind turbines; [0098] acquire the yaw angles of each wind turbine; [0099] determine the wake propagation delay of each wind turbine; [0100] determine a target yaw angle or a target operating point for each wind turbine; and [0101] control the yaw angle or the operating point of each wind turbine.
[0102] The computing operation can be centralized: the wind farm comprises a single computer for implementing the steps of the control method, which communicates at a minimum with all the yaw angle actuators. Alternatively, each wind turbine comprises a computer, and all the computers communicate with each other.
[0103] According to an embodiment, the wind farm can comprise a wind sensor, notably a LiDAR sensor or an anemometer.
[0105] According to an aspect of the invention, the wind farm can comprise SCADA measuring devices.
[0106] According to an embodiment option, the wind farm can comprise communication means, notably intended to transmit at least one of the data acquired in step 1 and the target yaw angles to the controllers.
EXAMPLES
[0107] Other features and advantages of the method according to the invention will be clear from reading the application examples hereafter.
[0108] For these examples, two wind farms are considered. The two wind farms are schematically illustrated, by way of non-limitative example, in the accompanying figures.
[0110] We have subsequently applied the control method according to the invention by implementing the Watkins Q-learning method with delayed reward management, as described above, over a period of 600,000 s, corresponding to 230,000 iterations. The action space of each agent is limited to 3 actions: $\{-1°, 0°, +1°\}$, for a corresponding yaw angle increase or decrease (in other words, the target yaw angle can increase by 1°, remain constant or decrease by 1°). For the first example, we thus learn 3 Q-tables of size 63×3, that is 189 parameters. For the second example, we learn 6 Q-tables of size 63×6, which is 378 parameters. A power-averaging window of T = 10 min is used. In order to assess the algorithm performance without any prior knowledge, all the values of the Q-tables are initialized at $q_0 = 0.15$. The yaws are initialized at 0°, which corresponds to a naive and greedy strategy where all the turbines face the wind.
[0117] It can further be noted that, for these two examples, the algorithm converges around 450,000 s, which corresponds to 150,000 iterations. Although the farm has doubled in size between the two examples, the convergence time remains substantially the same, which is a key advantage in relation to centralized approaches or those using turbine locking. Furthermore, as expected, the turbine pairs having the same position in the alignments converge towards similar values. These results validate the management of delayed reward by the present invention for decentralized and delay-sensitive Q-learning, under realistic turbulent wind conditions and with dynamic wake simulation. Besides, these results show that the method according to the invention maximizes the total power generation, adapts to several wind farm configurations, and enables real-time control with fast convergence.