multi-agent reinforcement learning and bandit learning