HM‑DRL: Enhancing Multi‑Agent Pathfinding with a Heatmap‑Based Heuristic for Distributed Deep Reinforcement Learning

Authors: M. Boumediene, A. Maoudj & A. L. Christensen

Published: 15 July 2025 | Journal: Applied Intelligence, Vol. 55, Art. 873

Abstract

The integration of mobile robots into industrial environments, particularly in warehouse logistics, has created a growing need for efficient Multi-Agent Pathfinding (MAPF) solutions. Such solutions are crucial for coordinating large fleets of robots in complex operational settings, yet many existing methods struggle to scale in densely populated environments. Moreover, many learning-based approaches are computationally expensive and require excessive training time.

In this paper, we propose a Deep Reinforcement Learning (DRL)-based approach that leverages graph-convolutional communication to efficiently coordinate fleets of mobile robots with a limited communication range in partially observable environments. We introduce a novel heatmap-based heuristic that reduces the observation space while retaining the critical information needed for effective path planning and conflict resolution.
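
To make the heatmap idea concrete, below is a minimal Python sketch of one common way to build such a channel. The paper's exact formulation is not reproduced here; the function name goal_heatmap, the BFS construction, and the normalization are illustrative assumptions. The sketch assigns each free cell a value in [0, 1] that decays with shortest-path distance to the agent's goal, so a local crop of the map preserves directional guidance without exposing the full grid.

import numpy as np
from collections import deque

def goal_heatmap(grid, goal):
    """Illustrative sketch (not the paper's exact heuristic): build a
    normalized shortest-path-distance heatmap toward an agent's goal
    via BFS. grid: 2D array, 0 = free cell, 1 = obstacle."""
    h, w = grid.shape
    dist = np.full((h, w), np.inf)
    dist[goal] = 0.0
    queue = deque([goal])
    # 4-connected BFS from the goal over free cells.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and grid[nr, nc] == 0
                    and dist[nr, nc] == np.inf):
                dist[nr, nc] = dist[r, c] + 1
                queue.append((nr, nc))
    # Map distances to [0, 1]: 1 at the goal, decaying with distance;
    # unreachable cells stay at 0.
    reachable = np.isfinite(dist)
    heat = np.zeros((h, w), dtype=np.float32)
    if reachable.any():
        heat[reachable] = 1.0 - dist[reachable] / (dist[reachable].max() + 1.0)
    return heat

Under this interpretation, each agent would observe only the crop of the heatmap centered on its own position, which is how the observation space can shrink while the information needed for path planning is retained.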

Additionally, we use optimized training objectives that cut training time by a factor of up to 3.4 compared to state-of-the-art DRL-based methods, and we employ curriculum learning and distributed training to further improve training efficiency. To enhance performance, we introduce mechanisms for resolving conflicts in constrained scenarios, yielding a higher success rate and improved overall operational efficiency.
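
As a hedged illustration of the curriculum-learning ingredient, the sketch below shows a success-rate-driven schedule that grows the number of agents and the map size as training progresses. The stage list, threshold, and window size are assumptions for illustration, not the schedule used in the paper.

class Curriculum:
    """Assumed success-rate-driven curriculum (illustrative only):
    start with few agents on small maps and advance a stage once the
    rolling success rate clears a threshold."""

    def __init__(self, stages=((2, 10), (4, 20), (8, 30), (16, 40)),
                 threshold=0.9, window=200):
        self.stages = stages        # (num_agents, map_size) per stage
        self.threshold = threshold  # success rate needed to advance
        self.window = window        # episodes in the rolling window
        self.results = []           # rolling record of episode outcomes
        self.stage = 0

    def record(self, success: bool):
        """Log one episode outcome and advance the stage if warranted."""
        self.results.append(success)
        if len(self.results) > self.window:
            self.results.pop(0)
        rate = sum(self.results) / len(self.results)
        if (len(self.results) == self.window and rate >= self.threshold
                and self.stage < len(self.stages) - 1):
            self.stage += 1
            self.results.clear()    # re-measure on the harder stage

    def current(self):
        """Return the (num_agents, map_size) pair for the active stage."""
        return self.stages[self.stage]

A trainer would call record(...) after each episode and query current() when sampling the next training environment.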

Finally, we validate our approach through extensive simulation-based experiments, showing that it outperforms state-of-the-art DRL-based MAPF methods and scales effectively to larger maps and densely populated environments.

Experimental Results

Animation: HM-DRL coordinating 64 agents in an environment with 10% obstacle density.

Success Rate Comparison

We compared HM‑DRL against DHC and PRIMAL by measuring the success rate on a custom 40×40 map (256‑step limit) and a custom 80×80 map (386‑step limit), with 400 randomized scenarios per agent count.

Figure: Success rate comparison of HM-DRL, DHC, and PRIMAL. HM-DRL consistently outperforms both baselines, maintaining >84% success on the 40×40 map and >98% on the 80×80 map even with 64 agents.

Citation

@article{boumediene2025hmdrl,
  author    = {Boumediene, Mouad and Maoudj, Abderraouf and Christensen, Anders Lyhne},
  title     = {HM-DRL: Enhancing Multi‑Agent Pathfinding with a Heatmap‑Based Heuristic for Distributed Deep Reinforcement Learning},
  journal   = {Applied Intelligence},
  volume    = {55},
  pages     = {873},
  day       = {15},
  month     = jul,
  year      = {2025},
  doi       = {10.1007/s10489-025-06747-0},
  url       = {https://doi.org/10.1007/s10489-025-06747-0},
  publisher = {Springer}
}
