Implemented Q-Learning algorithm for a 2D grid world environment in CUDA
- Multiple agents were initialized with random start locations in a 2D grid world
- Q-learning was used to learn the optimal policy, where the objective was to reach the goal state (flag) in the minimum number of steps while avoiding land mines
- A common Q-table was maintained and updated simultaneously by all the agents based on the Bellman optimality equation (the update rule is shown below the list)
- The computation for each agent was run in a separate CUDA thread, parallelizing the Q-learning process across multiple agents (see the kernel sketch at the end of this section)
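For reference, the tabular Q-learning update that each agent applies to the shared table follows from the Bellman optimality equation; the learning rate $\alpha$ and discount factor $\gamma$ below are standard hyperparameters not specified above:

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$

where $s'$ is the state reached after taking action $a$ in state $s$, and $r$ is the reward (e.g. positive at the flag, negative at a mine, and a small step penalty elsewhere).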
Visualization of the Multi-agent Q-Learning for grid navigation
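For illustration, below is a minimal CUDA sketch of the per-agent update loop, assuming one thread per agent, epsilon-greedy action selection, and a row-major Q-table of size `NUM_STATES x NUM_ACTIONS`. All names, grid dimensions, and reward conventions here are hypothetical, and the shared table is updated with `atomicAdd` on the TD increment rather than whatever synchronization the original code used.

```cuda
#include <cuda_runtime.h>
#include <curand_kernel.h>

#define GRID_W      8
#define GRID_H      8
#define NUM_ACTIONS 4                    // up, down, left, right
#define NUM_AGENTS  128
#define NUM_STATES  (GRID_W * GRID_H)

// Deterministic grid transition: move within bounds, otherwise stay put.
__device__ int stepEnv(int state, int action)
{
    int x = state % GRID_W, y = state / GRID_W;
    if (action == 0 && y > 0)          y--;   // up
    if (action == 1 && y < GRID_H - 1) y++;   // down
    if (action == 2 && x > 0)          x--;   // left
    if (action == 3 && x < GRID_W - 1) x++;   // right
    return y * GRID_W + x;
}

// One thread per agent: each thread runs an epsilon-greedy episode from a
// random start state and updates the single shared Q-table.
__global__ void qLearningKernel(float *Q, const int *rewards, int goalState,
                                float alpha, float gamma, float epsilon,
                                int maxSteps, unsigned long long seed)
{
    int agent = blockIdx.x * blockDim.x + threadIdx.x;
    if (agent >= NUM_AGENTS) return;

    curandState rng;
    curand_init(seed, agent, 0, &rng);

    int s = curand(&rng) % NUM_STATES;   // random start location per agent

    for (int t = 0; t < maxSteps && s != goalState; ++t) {
        // Epsilon-greedy action selection from the shared Q-table.
        int a;
        if (curand_uniform(&rng) < epsilon) {
            a = curand(&rng) % NUM_ACTIONS;
        } else {
            a = 0;
            for (int k = 1; k < NUM_ACTIONS; ++k)
                if (Q[s * NUM_ACTIONS + k] > Q[s * NUM_ACTIONS + a]) a = k;
        }

        int sNext = stepEnv(s, a);
        float r = (float)rewards[sNext]; // e.g. +100 flag, -100 mine, -1 step

        // Bellman optimality target: r + gamma * max_a' Q(s', a').
        float maxNext = Q[sNext * NUM_ACTIONS];
        for (int k = 1; k < NUM_ACTIONS; ++k)
            maxNext = fmaxf(maxNext, Q[sNext * NUM_ACTIONS + k]);

        float td = r + gamma * maxNext - Q[s * NUM_ACTIONS + a];
        // All agents update the same Q-table concurrently; atomicAdd keeps
        // increments from being lost (reads of Q may be slightly stale).
        atomicAdd(&Q[s * NUM_ACTIONS + a], alpha * td);

        s = sNext;
    }
}
```

On the host, a launch along the lines of `qLearningKernel<<<(NUM_AGENTS + 127) / 128, 128>>>(dQ, dRewards, goal, 0.1f, 0.95f, 0.1f, 500, seed);` could be repeated over episodes with the Q-table kept resident in device memory; the hyperparameter values shown are purely illustrative.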