AI Snake

Overview

This project attempts to solve the classic Snake game using a tabular reinforcement learning method, Q-learning. The goal of the agent (the snake) is to maximize cumulative reward, which it does by building a table called the Q-table that is used to decide the action for each state. The values in this table are called state-action values, as they represent the value of performing a particular action in a particular state. A state is the information the agent receives about its current situation, which it uses to estimate future outcomes. The policy describes the behavior of the agent: here, the agent uses the Q-table to pick the most suitable action for the current state, and receives a reward after each action.

Algorithms used:
  • Epsilon-greedy policy improvement
  • Q-learning
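The equation images from the original page are not reproduced here. For reference, these are the standard textbook forms of the two updates (α is the learning rate, γ the discount factor, ε the exploration probability):

```latex
% Epsilon-greedy policy improvement: explore with probability epsilon,
% otherwise act greedily with respect to the current Q-table.
\pi(s) =
\begin{cases}
  \text{uniform random action} & \text{with probability } \varepsilon \\
  \arg\max_{a} Q(s, a)         & \text{with probability } 1 - \varepsilon
\end{cases}

% Q-learning update after observing the transition (s, a, r, s'):
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```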

Approach

The snake is assumed to be near-sighted. Its state has four parameters:
  1. The direction of food relative to the snake's head.
  2. Presence of an obstacle or wall immediately to the left of the snake's head.
  3. Presence of an obstacle or wall immediately in front of the snake's head.
  4. Presence of an obstacle or wall immediately to the right of the snake's head.
The action space is defined relative to the head of the snake rather than to the board.
Reward function:
  • +1 for eating the food.
  • -1 for hitting its own body or a wall.
In all other cases, the reward is 0.
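A minimal Python sketch of this tabular setup follows. The exact state encoding, table shape, and hyperparameter values are illustrative assumptions, not the project's actual code:

```python
import random
import numpy as np

# State: (food_dir, danger_left, danger_front, danger_right).
# Assumed encoding: food_dir in {0, 1, 2, 3}, each danger flag in {0, 1};
# three actions relative to the head: turn left, go straight, turn right.
N_FOOD_DIRS, N_ACTIONS = 4, 3
q_table = np.zeros((N_FOOD_DIRS, 2, 2, 2, N_ACTIONS))

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed hyperparameters

def choose_action(state):
    """Epsilon-greedy policy improvement: explore with probability
    EPSILON, otherwise pick the greedy action from the Q-table."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return int(np.argmax(q_table[state]))

def q_update(state, action, reward, next_state, done):
    """Standard Q-learning update rule."""
    target = reward if done else reward + GAMMA * np.max(q_table[next_state])
    q_table[state][action] += ALPHA * (target - q_table[state][action])
```

During training, each transition (state, action, reward, next state) is fed to q_update; at test time, acting greedily (EPSILON = 0) gives the learned policy.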


Result

Training for 5000 episodes
[Plot: total reward per episode during training]
Testing policy
[Plot: total reward per episode while testing the learned policy]
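Curves like these are typically produced by logging the total reward of each episode and plotting the log with Matplotlib, which the project lists among its tools. A small stand-alone example (the episode_rewards values here are dummy data, not the project's results):

```python
import random
import matplotlib.pyplot as plt

# Dummy stand-in for the per-episode rewards logged during training;
# in a real run this list would be filled inside the training loop.
episode_rewards = [random.randint(-1, 10) for _ in range(5000)]

plt.plot(episode_rewards)
plt.xlabel("Episode")
plt.ylabel("Total reward")
plt.title("Training for 5000 episodes")
plt.show()
```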

Applications

After training, the learned policy can be reused in different modes, for example with two or three snakes running in the same environment simultaneously, or in a custom environment with randomly placed obstacles.

Conclusion

1st episode
[Animation: the snake's behavior in the 1st episode]
After 1000 episodes of training
[Animation: the snake's behavior after 1000 episodes of training]

Note

With this approach an optimal policy cannot be obtained, because the snake can dodge only immediate obstacles. There is therefore a high chance that the snake will eventually become entangled in its own body, cornered by obstacles on every side, which leads to its death.
In the testing graph, it can be observed that the rewards are not consistent. This is due to the random position of the food each time, which affects how early the snake becomes entangled.

GitHub Repository

Tools and Libraries used

  • Python
  • Matplotlib
Team Members:
  • Harsh Sharma
  • Om Bhise
  • Poojan Gandhi
  • Vinita Lakhani
  • Ahmed Hamzah
  • Shubham Vishwakarma 


Mentors:
  • Tanmay Pathrabe
  • Akshansh Malviya
  • Aneesh Shetye