This paper proposes a hybrid deep actor-critic framework for the optimal operation of a phase-changing soft open point (PCSOP) in an unbalanced distribution network. The framework combines algorithmic features of off-policy reinforcement learning and imitation learning. It comprises a policy-guiding module based on the PCSOP physics and an adaptive dynamic experience replay buffer. The policy-guiding module helps the agent navigate the complex action space of the PCSOP, while the dynamic experience replay accelerates agent training by leveraging expert demonstrations. As part of the design process, the paper also proposes a data-driven linearization of the operational power losses in PCSOPs, which improves the convergence of nonlinear AC optimal power flow calculations without compromising accuracy. The proposed framework is trained and tested on a modified three-phase IEEE 33-bus test feeder. Results show that the framework outperforms three other methods, including the conventional nonlinear AC optimal power flow.
Authors: Shoaib Hussain, Mostafa Farrokhabadi, Hamidreza Zareipour
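
To make the idea of an experience replay that leverages expert demonstrations more concrete, the following is a minimal sketch of one way such a buffer could be organized. It is not the authors' implementation: the class name `MixedReplayBuffer`, the geometric decay of the expert share, and all parameter names are assumptions introduced here for illustration only. The abstract does not specify how the buffer adapts.

```python
import random
from collections import deque


class MixedReplayBuffer:
    """Hypothetical replay buffer mixing expert demonstrations with agent data.

    Assumption: the share of expert samples per batch decays geometrically.
    The paper's actual adaptation rule is not given in the abstract.
    """

    def __init__(self, capacity, expert_transitions,
                 start_ratio=0.5, decay=0.99, min_ratio=0.0, seed=None):
        self.agent = deque(maxlen=capacity)      # oldest agent transitions are evicted first
        self.expert = list(expert_transitions)   # demonstrations are kept for the whole run
        self.ratio = start_ratio                 # target fraction of expert samples per batch
        self.decay = decay
        self.min_ratio = min_ratio
        self.rng = random.Random(seed)

    def add(self, transition):
        """Store one transition collected by the learning agent."""
        self.agent.append(transition)

    def step_ratio(self):
        """Shrink the expert share, e.g. once per training episode."""
        self.ratio = max(self.min_ratio, self.ratio * self.decay)

    def sample(self, batch_size):
        """Draw a batch (with replacement) from both pools."""
        if not self.expert and not self.agent:
            raise ValueError("buffer is empty")
        if not self.agent:
            n_expert = batch_size                # only demonstrations are available
        elif not self.expert:
            n_expert = 0                         # no demonstrations were supplied
        else:
            n_expert = round(self.ratio * batch_size)
        n_agent = batch_size - n_expert
        batch = self.rng.choices(self.expert, k=n_expert) if n_expert else []
        if n_agent:
            batch += self.rng.choices(list(self.agent), k=n_agent)
        self.rng.shuffle(batch)
        return batch
```

In an off-policy actor-critic loop, `sample` would feed the critic and actor updates, and `step_ratio` would gradually shift training from imitation of the expert toward the agent's own experience.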