Deep Reinforcement Learning Models for Traffic Flow Optimization in SDN Architectures

Authors

Sakina Abbasova, Maya Karimova

DOI:

https://doi.org/10.69760/lumin.2025000205

Keywords:

Software-Defined Networking, Deep Reinforcement Learning, Traffic Engineering, DQN, PPO, A3C, Network Optimization

Abstract

Deep reinforcement learning (DRL) has emerged as a promising approach to dynamic traffic engineering in software-defined networks (SDN). In this work, we evaluate three popular DRL agents—Deep Q-Network (DQN), Asynchronous Advantage Actor-Critic (A3C), and Proximal Policy Optimization (PPO)—on simulated SDN routing tasks. Using a Mininet emulated network with a Ryu controller and TensorFlow-based agents, we compare DRL models against traditional baselines (shortest-path routing and equal-cost multi-path (ECMP)). The DRL agents learn to select routes based on observed link loads and flow queues, with rewards reflecting combined throughput, latency, and packet loss. Our simulations show that all DRL methods significantly outperform fixed routing baselines: for example, a PPO-based agent reduced average flow latency by ≈20% and packet loss by ≈25% relative to shortest-path routing. PPO and A3C converged faster and to higher rewards than DQN, likely due to their on-policy and parallel learning designs. We provide a detailed comparison of algorithm characteristics, training stability, and network metric outcomes. The results highlight each model’s strengths: PPO’s stability and sample efficiency, A3C’s parallelism and multi-agent potential, and DQN’s simplicity. We critically discuss limitations such as training overhead and convergence variance. Finally, we outline future directions for improving real-world SDN traffic control with DRL, including transfer learning across topologies, online continual learning, and multi-agent coordination.
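
The reward design described in the abstract (throughput rewarded, latency and packet loss penalized) and the agent's per-flow route choice can be sketched as follows. This is an illustrative sketch only: the weights, units, and epsilon-greedy selection below are assumptions, not the authors' published implementation.

```python
import random

def reward(throughput_mbps, latency_ms, loss_rate,
           w_tp=1.0, w_lat=0.5, w_loss=2.0):
    """Combined reward as described in the abstract: throughput is
    rewarded, latency and packet loss are penalized.
    The weights and the scaling of loss_rate (a fraction in [0, 1])
    to percentage points are assumed for illustration."""
    return w_tp * throughput_mbps - w_lat * latency_ms - w_loss * 100 * loss_rate

def select_route(q_values, epsilon=0.1):
    """DQN-style epsilon-greedy choice over candidate paths: with
    probability epsilon explore a random path, otherwise pick the
    path with the highest estimated Q-value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

In a Mininet/Ryu setup such as the one evaluated here, the controller would compute `reward` from link statistics polled after each routing decision and feed it back to the TensorFlow agent; the on-policy methods (PPO, A3C) would replace the epsilon-greedy step with sampling from a learned policy.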

Author Biographies

  • Sakina Abbasova, Azerbaijan State Oil and Industry University.

    Abbasova, S. Lecturer, Department of Instrument Engineering, Azerbaijan State Oil and Industry University. ORCID: https://orcid.org/0000-0002-9213-5273

  • Maya Karimova, Azerbaijan State Oil and Industry University.

    Karimova, M. Lecturer, Department of Instrument Engineering, Azerbaijan State Oil and Industry University. ORCID: https://orcid.org/0000-0003-4932-7031

References

Dake, D. K., Gadze, J. D., Klogo, G. S., & Nunoo-Mensah, H. (2021). Multi-Agent Reinforcement Learning Framework in SDN-IoT for Transient Load Detection and Prevention. Technologies, 9(3), 44. https://doi.org/10.3390/technologies9030044

He, Y., Xiao, G., Zhu, J., Zou, T., & Liang, Y. (2024). Reinforcement Learning-based SDN Routing Scheme Empowered by Causality Detection and GNN. Frontiers in Computational Neuroscience, 18, Article 1393025.

Lantz, B., Heller, B., & McKeown, N. (2010). A network in a laptop: Rapid prototyping for software-defined networks. Proceedings of HotNets-IX, 1–6.

Li, J., Ye, M., Guo, Z., Yen, C.-Y., & Chao, H. J. (2020). CFR-RL: Critical Flow Rerouting Using Deep Reinforcement Learning for Network-wide Traffic Engineering. IEEE Journal on Selected Areas in Communications (Under Review).

Troia, S., Sapienza, F., Varese, L., & Maier, G. (2020). On Deep Reinforcement Learning for Traffic Engineering in SD-WAN. IEEE Journal on Selected Areas in Communications, 39(7), 1–15.

Wang, S., Song, R., Zheng, X., Huang, W., & Liu, H. (2025). A3C-R: A QoS-Oriented Energy-Saving Routing Algorithm for Software-Defined Networks. Future Internet, 17(4), 158.

Zhang, X., Qiu, L., Xu, Y., Wang, X., Wang, S., Paul, A., & Wu, Z. (2023). Multi-Path Routing Algorithm Based on Deep Reinforcement Learning for SDN. Applied Sciences, 13(22), 12520.

Zhang, J., Ye, M., Guo, Z., Yen, C.-Y., & Chao, H. J. (2020). Deep Reinforcement Learning for Traffic Engineering in SDN (CFR-RL). arXiv:2004.11986.

Zhang, S., Tan, G., Wu, Z., & Wu, J. (2018). DROM: Optimizing the Routing in Software-Defined Networks with Deep Reinforcement Learning. IEEE Access, 6, 64533–64539.

Chiesa, L., Forster, A., Keslassy, I., Freris, N., Vardi, M., & Schapira, M. (2017). Traffic engineering with equal-cost multipath: An algorithmic perspective. IEEE/ACM Transactions on Networking, 25(1), 320–337.

He, H., Sun, Z., & Zhu, Z. (2016). Achieving Fast Convergence in SDN. IEEE Transactions on Network and Service Management, 13(2), 307–321.

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.

Mnih, V., Badia, A. P., Mirza, M., Graves, A., et al. (2016). Asynchronous methods for deep reinforcement learning. Proceedings of ICML.

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv:1707.06347.

Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press.

Published

2025-05-13

How to Cite

Abbasova, S., & Karimova, M. (2025). Deep Reinforcement Learning Models for Traffic Flow Optimization in SDN Architectures. Luminis Applied Science and Engineering, 2(2), 55–63. https://doi.org/10.69760/lumin.2025000205
