Multi-Objective Real-time Power Flow Control Method Using Soft Actor-Critic
20210367424 · 2021-11-25
Inventors
- Ruisheng Diao (Richland, WA, US)
- Di Shi (San Jose, CA, US)
- Bei Zhang (San Jose, CA, US)
- Siqi Wang (San Jose, CA, US)
- Haifeng Li (Nanjing, CN)
- Chunlei Xu (Nanjing, CN)
- Desong Bian (San Jose, CA, US)
- Jiajun Duan (San Jose, CA, US)
- Haiwei Wu (Nanjing, CN)
CPC Classification
Y04S40/20
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
Y02E40/70
H02J2203/20
ELECTRICITY
Y04S10/50
H02J3/001
ELECTRICITY
Y02E60/00
International Classification
Abstract
Systems and methods are disclosed for controlling voltage profiles, line flows and transmission losses of a power grid by forming an autonomous multi-objective control model with one or more neural networks as a Deep Reinforcement Learning (DRL) agent; training the DRL agent to provide data-driven, real-time and autonomous grid control strategies; and coordinating and optimizing power controllers to regulate voltage profiles, line flows and transmission losses in the power grid with a Markov decision process (MDP) operating with reinforcement learning to solve control problems in dynamic and stochastic environments.
Claims
1. A method to control voltage profiles, line flows and transmission losses of a power grid, comprising: forming an autonomous multi-objective control model with one or more neural networks as a Deep Reinforcement Learning (DRL) agent; training the DRL agent to provide data-driven, real-time and autonomous grid control strategies; and coordinating and optimizing power controllers to regulate voltage profiles, line flows and transmission losses in the power grid with a Markov decision process (MDP) operating with reinforcement learning to solve control problems in dynamic and stochastic environments.
2. The method of claim 1, wherein the DRL agent is trained offline by interacting with offline simulations and historical events which are periodically updated.
3. The method of claim 1, wherein the DRL agent provides autonomous control actions once abnormal conditions are detected.
4. The method of claim 1, wherein after an action is taken in the power grid at a current state, the DRL agent receives a reward from the power grid.
5. The method of claim 1, comprising updating a relationship among actions, states and rewards in the agent's memory.
6. The method of claim 1, comprising solving a coordinated voltage control problem.
7. The method of claim 6, comprising performing a Markov Decision Process (MDP) that represents a discrete-time stochastic control process.
8. The method of claim 6, comprising using a 4-tuple (S, A, P.sub.a, R.sub.a) to formulate the MDP, where S is a vector of system states, A is a list of actions to be taken, P.sub.a(s, s′)=Pr(s.sub.t+1=s′|s.sub.t=s, a.sub.t=a) represents the transition probability from a current state s.sub.t to a new state s.sub.t+1 after taking an action a at time t, and R.sub.a(s, s′) is the reward received after reaching state s′ from a previous state s, quantifying control performance.
9. The method of claim 1, wherein the DRL agent comprises two architecture-identical deep neural networks: a target network and an evaluation network.
10. The method of claim 1, comprising providing sub-second control with an EMS or PMU data stream from a wide area measurement system (WAMS).
11. The method of claim 1, wherein the DRL agent self-learns by exploring control options in a high-dimensional space and moving out of local optima.
12. The method of claim 1, comprising performing voltage control, line flow control and transmission loss control by the DRL agent by considering multiple control objectives and security constraints.
13. The method of claim 1, wherein a reward is determined based on voltage operation zones with voltage profiles, including a normal zone, a violation zone, and a diverged zone.
14. The method of claim 1, comprising applying a decaying ε-greedy method for learning, with a decaying probability of ε.sub.i to make a random action selection at an i.sup.th iteration, wherein ε.sub.i is updated as
15. A method to control voltage profiles, line flows and transmission losses of a power grid, comprising: measuring states of a power grid; determining abnormal conditions and locating affected areas in the power grid; creating representative operating conditions including contingencies for the power grid; conducting power grid simulations in an offline or online environment; training deep-reinforcement-learning-based agents for autonomously controlling power grid voltage profiles, line flows and transmission losses; and coordinating and optimizing control actions of power controllers in the power grid.
16. The method of claim 15, wherein the measuring states comprises measuring from phasor measurement units or energy management systems.
17. The method of claim 15, comprising generating data-driven, autonomous control commands for correcting voltage issues and line flow issues considering contingencies in the power grid.
18. The method of claim 15, comprising presenting expected control outcomes once one or more DRL-based commands are applied to a power grid.
19. The method of claim 15, comprising providing a sub-second control with a phasor measurement unit (PMU) data stream from a wide area measurement system (WAMS).
20. The method of claim 15, comprising providing a platform for data-driven, autonomous control commands for regulating voltages, line flows, or transmission losses in a power network under normal and contingency operating conditions.
Description
BRIEF DESCRIPTIONS OF FIGURES
DETAILED DESCRIPTION OF THE ILLUSTRATED EMBODIMENTS
[0015] Without loss of generality, this embodiment mainly targets deriving real-time corrective operational control decisions for actual system operating conditions at an interval of 5 minutes in a control center. The control objectives include regulating bus voltages within their secure zones and minimizing transmission losses while respecting power flow equations and physical constraints, e.g., line ratings and generator limits. The mathematical formulation of the control problem is given below:
Objective:
[0016]
minimize Σ_(i,j)∈Ω_L Ploss_ij  (1)

Subject to:

[0017]

Σ_n∈G_i P_n^g − Σ_m∈D_i P_m^d − g_i V_i² = Σ_j∈B_i V_i V_j (G_ij cos θ_ij + B_ij sin θ_ij), i∈B  (2)

Σ_n∈G_i Q_n^g − Σ_m∈D_i Q_m^d − b_i V_i² = Σ_j∈B_i V_i V_j (G_ij sin θ_ij − B_ij cos θ_ij), i∈B  (3)

P_n^min ≤ P_n ≤ P_n^max, n∈G  (4)

Q_n^min ≤ Q_n ≤ Q_n^max, n∈G  (5)

V_i^min ≤ V_i ≤ V_i^max, i∈B  (6)

√(P_ij² + Q_ij²) ≤ S_ij^max, (i,j)∈Ω_L  (7)
where Eqs. (2) and (3) represent active and reactive power flow equations, respectively. Eqs. (4) and (5) are active and reactive power output constraints of each generator, respectively. Eqs. (6) and (7) specify bus voltage secure zones and line flow limits of a power system to be controlled, respectively.
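As a concrete illustration of constraints (4)-(7), a minimal Python feasibility check for a single generator, bus, and line might look as follows (all limit values are hypothetical placeholders; in practice the limits come from the grid model):

```python
import math

# Hypothetical per-unit limits; real values come from the EMS grid model.
P_MIN, P_MAX = 0.0, 5.0      # Eq. (4): active power output limits
Q_MIN, Q_MAX = -2.0, 2.0     # Eq. (5): reactive power output limits
V_MIN, V_MAX = 0.97, 1.07    # Eq. (6): secure bus voltage zone
S_MAX = 4.0                  # Eq. (7): line apparent-power rating

def feasible(p_gen, q_gen, v_bus, p_line, q_line):
    """Return True if the operating point satisfies constraints (4)-(7)."""
    if not (P_MIN <= p_gen <= P_MAX):
        return False
    if not (Q_MIN <= q_gen <= Q_MAX):
        return False
    if not (V_MIN <= v_bus <= V_MAX):
        return False
    # Eq. (7): apparent flow sqrt(P^2 + Q^2) must stay within the rating.
    return math.hypot(p_line, q_line) <= S_MAX
```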
A. Overall Flowchart of the Proposed Methodology
[0018] Deriving multi-objective real-time control actions can be formulated as a discrete-time stochastic control process, i.e., an MDP. Among various DRL techniques, the off-policy soft actor-critic (SAC) method is adopted for its fast convergence and robustness; it maximizes the expected reward while exploring as many control actions as possible, improving the chance of finding the global optimum.
[0019] The main flowchart of the proposed methodology is depicted in
B. Training Effective SAC Agents
[0020] To train effective DRL agents for multi-objective real-time power flow control, one needs to carefully define several key elements, including:
1) Episode and Terminating Conditions
[0021] Each episode is defined as a quasi-steady-state operating snapshot, obtained from the EMS system and saved in text files. The termination condition of a training episode can be: i) no more voltage or thermal violations and a reduction of transmission losses reaching a threshold, e.g., 0.5%; ii) the power flow diverges; or iii) the maximum number of control iterations is reached.
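The three terminating conditions can be sketched as a simple predicate (the 0.5% threshold is from the text; the iteration budget of 20 is an illustrative assumption):

```python
def episode_done(voltage_violations, thermal_violations, loss_reduction,
                 power_flow_converged, iteration, max_iters=20,
                 loss_threshold=0.005):
    """Terminate when (i) no violations remain and losses dropped by at
    least the threshold, (ii) the power flow diverges, or (iii) the
    iteration budget is exhausted. max_iters=20 is illustrative only."""
    solved = (voltage_violations == 0 and thermal_violations == 0
              and loss_reduction >= loss_threshold)
    return solved or (not power_flow_converged) or iteration >= max_iters
```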
2) State Space
[0022] The state space is formed by including bus voltage magnitudes, phase angles, and active and reactive power on transmission lines. A batch normalization technique is applied to the different types of variables to maintain consistency and improve model training efficiency.
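A minimal sketch of the per-type normalization idea, assuming the state is assembled from four measurement arrays (the simple standardization below is a stand-in for the batch normalization described above):

```python
import numpy as np

def normalize_state(vm, va, p_flow, q_flow):
    """Standardize each measurement type separately (zero mean, unit
    variance) before concatenating into one state vector, so that
    differently scaled quantities contribute comparably."""
    def z(x):
        x = np.asarray(x, dtype=float)
        std = x.std()
        return (x - x.mean()) / std if std > 0 else x - x.mean()
    return np.concatenate([z(vm), z(va), z(p_flow), z(q_flow)])
```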
3) Control Space
[0023] In this work, conventional generators are used to regulate voltage profiles and transmission line losses. A control vector is then created that includes the voltage set point at each power plant as a continuous value, e.g., within [0.9, 1.1] p.u.
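Since SAC emits tanh-squashed actions in [−1, 1], they must be rescaled to the continuous set-point range; a minimal sketch of that mapping (the linear rescaling is an assumption, not stated in the source):

```python
def action_to_setpoints(raw_actions, v_lo=0.9, v_hi=1.1):
    """Map tanh-squashed SAC outputs in [-1, 1] linearly onto plant
    voltage set points in [v_lo, v_hi] p.u."""
    return [v_lo + (a + 1.0) * 0.5 * (v_hi - v_lo) for a in raw_actions]
```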
4) Reward Definition
[0024] The reward value at each control iteration when training the SAC agent adopts the following logic:

If voltage or flow violation is detected:
else if delta_p_loss < 0:
    reward = 50 − delta_p_loss*1000
else if delta_p_loss >= 0.02:
    reward = −100
else:
    reward = −1 − (p_loss − p_loss_pre)*50

where dev_overflow = Σ_i^N (Sline(i) − Sline_max(i))², N is the total number of lines with thermal violations, Sline is the apparent power of a line, and Sline_max is the limit of line apparent power; vio_voltage = Σ_j^M (Vm(j) − Vmin)*(Vm(j) − Vmax), M is the total number of buses with voltage violations; p_loss is the present transmission loss value and p_loss_pre is the line loss at the base case. The details of training SAC agents are given in Algorithm I, shown below.
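The violation branch of the reward logic is elided in the text; a minimal Python sketch of the remaining logic follows, with two explicit assumptions not taken from the source: delta_p_loss is treated as the relative loss change versus the base case, and the violation branch uses a simple −(dev_overflow + vio_voltage) penalty.

```python
def reward(p_loss, p_loss_pre, dev_overflow, vio_voltage):
    """Sketch of the per-iteration reward. Assumptions (not from the
    source): delta_p_loss is the relative loss change versus the base
    case, and violations are penalized by -(dev_overflow + vio_voltage)."""
    delta_p_loss = (p_loss - p_loss_pre) / p_loss_pre
    if dev_overflow > 0 or vio_voltage > 0:
        return -(dev_overflow + vio_voltage)   # assumed penalty form
    if delta_p_loss < 0:
        return 50 - delta_p_loss * 1000        # losses reduced: bonus
    if delta_p_loss >= 0.02:
        return -100                            # losses grew markedly
    return -1 - (p_loss - p_loss_pre) * 50     # small loss increase
```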
Algorithm I: Soft Actor-Critic Training Process for Multi-Objective Power Flow Control
1. Initialize weights of neural networks, θ and ϕ, for policy π(s, a) and value function V(s), respectively; initialize weights ψ and
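Algorithm I is truncated above. As a rough illustration of two ingredients SAC-style training relies on, the soft Bellman backup and a slowly tracking target network, the following is a toy sketch; the linear value function and all hyperparameters are illustrative assumptions, not the networks or tuned values of Algorithm I:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "networks": V(s) = w . s. ALPHA (entropy weight), TAU (Polyak
# rate), GAMMA (discount) and LR are illustrative assumptions only.
ALPHA, TAU, GAMMA, LR = 0.2, 0.005, 0.99, 1e-2
w_value = rng.normal(size=4)     # evaluation (soft value) network
w_target = w_value.copy()        # target network, updated slowly

def soft_target(r, s_next, log_pi):
    """Soft Bellman backup: reward plus discounted target value with an
    entropy bonus -ALPHA * log_pi for the next action."""
    return r + GAMMA * (w_target @ s_next - ALPHA * log_pi)

def update(s, r, s_next, log_pi):
    """One SGD step on the squared soft TD error, then a Polyak update of
    the target network. Returns the TD error before the step."""
    global w_value, w_target
    err = w_value @ s - soft_target(r, s_next, log_pi)
    w_value = w_value - LR * err * s
    # Polyak averaging keeps the target slowly tracking the evaluation
    # network, which stabilizes the bootstrapped targets.
    w_target = TAU * w_value + (1 - TAU) * w_target
    return err
```

Repeated updates on a fixed transition drive the soft TD error toward zero, which is the stabilizing role the target/evaluation pair plays in the full algorithm.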
[0025] The proposed SAC-based methodology for multi-objective power flow control was developed and deployed in the control center of SGCC Jiangsu Electric Power Company. To demonstrate its effectiveness, the city-level high-voltage (220 kV+) power network is used, which consists of 45 substations, 5 power plants (with 12 generators) and around 100 transmission lines, serving electricity to the city of Zhangjiagang. Massive historical operating snapshots (full-topology node/breaker models for Jiangsu province with ˜1500 nodes and ˜420 generators, at an interval of 5 minutes) were obtained from its EMS system (named the D5000 system), whose AC power flow computational module is used as the grid simulation environment to train SAC agents. The control objectives are set to minimize transmission losses (at least 0.5% reduction) without violating bus voltage limits ([0.97, 1.07] p.u.) and line flow limits (100% of MVA rating). Voltage set points of the 12 generators in the 5 power plants are adjusted by the SAC agent.
[0026] The performance of training and testing SAC agents using a time series of actual system snapshots is illustrated in
[0027] An example of test bed can be found in
[0031] The system supports training effective SAC agents with periodic updating for multi-objective power flow control in a real-time operational environment. The detailed design and flowchart of the proposed methodology are provided for reducing transmission losses without violating voltage and line constraints. Numerical simulations conducted on a real power network in a real-time operational environment demonstrate its effectiveness and robustness.
[0032] Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. As used herein, the term “module” or “component” may refer to software objects or routines that execute on the computing system. The different components, modules, engines, and services described herein may be implemented as objects or processes that execute on the computing system (e.g., as separate threads). While the system and methods described herein may be preferably implemented in software, implementations in hardware or a combination of software and hardware are also possible and contemplated. In this description, a “computing entity” may be any computing system as previously defined herein, or any module or combination of modules running on a computing system. All examples and conditional language recited herein are intended for pedagogical objects to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Although embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.