Goal: Stay alive (possibly kill others) Input: Game State(Pixel value) Output: A move (direction, up, left, right)
Grid size is 20 by 20 For our project, we will be creating a tron agent for the ColosseumRL competition trained through reinforcement learning. The goal of our project is to create an agent that will survive the longest in the competition. The input for the agent is a list of valid_actions and an observation. In the observation, we are always player 0. The agent picks “the best” valid action based on the observation. Potential applications include self driving cars in terms of avoiding crash, real time bidding etc.
Self-play,Proximal Policy Optimization(PPO) Policy Gradient
For our project, we will be using self play to train the agent and modify its policy using PPO or Policy Gradient.
Metrics - time it survives, tries to maximize its number of moves We expect the agent’s score to be really low at first, maybe 2-3 moves. But as we continue to train it, we expect its score to increase. We will give the agent a reward for every move it makes. After it is able to survive for 10+ move, we will start to increase its reward for move it continues to make after 10. The longer it stays alive, the more difficult it is to continue making moves because there are less spaces for the agent to move. Ideally, we would like to give the agent a higher reward for blocking off/killing an enemy, but it will be difficult to determine whether or not an action was the direct cause of an enemy dying.
We plan on using self play to teach the agent how to increase its lifespan. We will start by giving the player awards for staying alive longer. But as it learns to not run into itself or into walls, we will start decreasing that reward and increasing the reward for avoid enemy players’ tails. Once our agent gets good enough at avoiding enemies, we can start to increase our reward for blocking off enemy agents. Ultimately we will evaluate by taking part in the UCI RL Tron game competition competing against other agents and hopefully win the competition (moonshot).