
PPO AI directed dungeon generator

As part of my major project dissertation, I created and trained a PPO AI that models player flow from enemy and player interactions and adjusts the next dungeon level to suit the player's skill.

This was a 13-week university project at Falmouth University, created with Unity, C# and Unity's ML-Agents package.

Fig: 1. Oates 2022. Demo of AI


Fig: 2. Oates 2022. Screenshot of tech demo

Research question

My dissertation's research question was: “Could a dungeon generator with Proximal Policy Optimisation (PPO) Artificial Intelligence (AI) maintain flow in the player?” Games that use Procedural Content Generation (PCG), such as procedural dungeons or roguelikes, build rooms that are completely random, which can cause problems for the player because the challenges may not match their skill set. This can in turn disrupt their flow and their enjoyment of the game. A modern example of a poor dungeon generator is the Chalice Dungeons in BloodBorne (FromSoftware Ltd, 2015).

 

I chose PPO because, compared to other Reinforcement Learning algorithms, it is better at recognising rewards in its environment and handles random events better than algorithms such as Deep Q-Network (DQN) (Barros, 2021). This handling of randomness is particularly important, as it could be argued that the player is the most random element in a game. PPO is also the fastest of these Reinforcement Learning systems to train (Holler et al. 2019); however, when learning from existing data it is not as efficient as other algorithms (Larsen et al. 2021).

 

Despite this disadvantage, PPO is more than capable, as demonstrated in the graph below, which shows PPO maintaining 100% progress across different environments compared to other reinforcement learning algorithms (Larsen et al. 2021).


Fig: 3. Larsen et al. ca. 2021. Comparison of reinforcement learning algorithms on a variety of environments

Flow is defined as a psychological state of balance between a person's skills and the challenge they have to complete; ideally, to reach flow, the challenge should be neither too hard nor too easy (Beard and Csikszentmihalyi 2015).
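One way to make this balance concrete in code is to score how closely a challenge value tracks a skill value. The sketch below is a minimal, assumed formulation; the names, ranges and scaling are illustrative rather than taken from the dissertation.

```csharp
using UnityEngine;

// A minimal sketch of a flow score: 1 when the challenge exactly matches the
// player's skill, falling towards 0 as the level becomes too hard or too easy.
// Both inputs are assumed to be normalised to the 0-1 range.
public static class FlowScore
{
    public static float Evaluate(float playerSkill, float challenge)
    {
        return 1f - Mathf.Abs(challenge - playerSkill);
    }
}
```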

AI Artefact

The artefact for my dissertation was a dungeon generator influenced by a PPO AI that builds a model of the player's flow from its environment. The generator created three small dungeon levels with enemies of different difficulty to provide challenge. The first level was generated completely at random with easy rooms and opponents, ensuring the player could complete it while the AI director collected data on them. The second and third levels were then built from rooms picked by the AI based on the player's predicted flow state, which was determined from player and enemy interactions, in order to maintain the player's flow.
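As a sketch of how such a director can be expressed with Unity's ML-Agents API, the agent below observes its current estimates of player skill and challenge and outputs a discrete room-difficulty choice. The observation set, field names and action meaning are assumptions for illustration, not the dissertation's exact agent.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;

// Illustrative ML-Agents agent for the dungeon director (names and
// observations are assumed, not taken from the dissertation's code).
public class DungeonDirectorAgent : Agent
{
    // Estimates built from player and enemy interactions (damage taken,
    // time to clear rooms, deaths, etc.), normalised to the 0-1 range.
    public float playerSkill;
    public float currentChallenge;

    // The difficulty chosen for the next level's rooms:
    // 0 = easy, 1 = medium, 2 = hard.
    public int chosenDifficulty;

    public override void CollectObservations(VectorSensor sensor)
    {
        // The agent sees its model of the player before picking rooms.
        sensor.AddObservation(playerSkill);
        sensor.AddObservation(currentChallenge);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // A single discrete branch selects the room difficulty; the dungeon
        // generator reads this value when building the next level.
        chosenDifficulty = actions.DiscreteActions[0];
    }
}
```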

The goal of the player was to use the in-game compass and make their way through the dungeon to reach a teleporter that would take them to the next level. Within the dungeon, there were enemies to block the player and provide challenge.

Multiple AI agents were trained at once to save time, and the training environments were basic simulations interpreting how the AI's decisions would affect the values representing the player's skill and challenge. This was done to train the AI to model the player's flow accurately and make appropriate decisions.
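The sketch below shows the kind of simulated step such a training environment might use: the room choice shifts a simulated challenge value, a simulated skill value drifts on its own, and the agent is rewarded for keeping the two balanced. It is intended to be called after each of the agent's decisions; the update rules and reward scale are assumptions for illustration, not the dissertation's exact ones.

```csharp
using UnityEngine;

// Illustrative training simulation: no real player is involved, only values
// standing in for skill and challenge.
public class SimulatedTrainingEnvironment : MonoBehaviour
{
    public DungeonDirectorAgent director;

    float simulatedSkill = 0.5f;
    float simulatedChallenge = 0.5f;

    // Advances the simulation one "level" using the agent's latest choice.
    public void Step(int chosenDifficulty)
    {
        // Harder rooms raise the challenge value; the simulated player's
        // skill drifts upward slowly, mimicking improvement over time.
        simulatedChallenge = Mathf.Clamp01(0.25f + 0.25f * chosenDifficulty);
        simulatedSkill = Mathf.Clamp01(simulatedSkill + Random.Range(0f, 0.1f));

        // Reward the director for keeping challenge close to skill (flow).
        float flow = 1f - Mathf.Abs(simulatedChallenge - simulatedSkill);
        director.AddReward(flow);
    }
}
```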

Third-party assets used

Third-party assets were used for the tech demo so that development could focus on the AI itself, its respective systems and the training system. The third-party assets used were:

 

The assets for the player mechanics, enemy AI mechanics and dungeon generator were modified to ensure that they work together and with the custom systems.

Additionally, Unity's Machine Learning Agents (ML-Agents) package was used to provide PPO support when building the AI system. However, the systems that measure flow and the training simulations were all custom features, and the level manager, compass and teleporter were likewise made from scratch.
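For instance, a level manager only needs a small hook into ML-Agents to ask the trained policy for its choices between levels. The sketch below shows one assumed way to wire that up; RequestDecision() is the standard ML-Agents call that triggers CollectObservations and OnActionReceived, while the class and method names are illustrative.

```csharp
using UnityEngine;

// Illustrative level manager hook: when the player finishes a level, the
// manager asks the director agent for a fresh decision before the next
// level is generated (assumed design, not the dissertation's exact code).
public class LevelManager : MonoBehaviour
{
    public DungeonDirectorAgent director;

    // Called when the player steps on the teleporter at the end of a level.
    public void OnLevelCompleted()
    {
        director.RequestDecision();
    }
}
```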

Further enhancements

If this project were taken further, I would use different training simulations to ensure that the AI makes more accurate decisions when modelling flow.

To refine the AI's flow model further, the tech demo's goal would require players to kill a set number of enemies in order to progress. This would avoid the passive playstyle that introduced data poisoning.
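A possible implementation of that requirement is sketched below: the teleporter stays locked until a kill counter reaches a threshold, so a purely passive run cannot feed misleading data into the flow model. The names and threshold are illustrative assumptions.

```csharp
using UnityEngine;

// Illustrative kill-count gate for the teleporter (assumed design, not part
// of the existing tech demo).
public class TeleporterGate : MonoBehaviour
{
    public int requiredKills = 5;   // assumed threshold
    int kills;

    // Called by the enemy scripts when an enemy dies.
    public void RegisterKill()
    {
        kills++;
    }

    // The teleporter checks this before letting the player progress.
    public bool CanTeleport => kills >= requiredKills;
}
```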

For data collection, further playtesting would be conducted over longer periods to gather enough data for accurate results.

Finally, if proven successful, the AI and tech demo would be turned into a polished product, with the AI being used in a small-scale game.

Bibliography

  • BARROS, Pablo, Ana TANEVSKA and Alessandra SCIUTTI. 2021. 'Learning from Learners: Adapting Reinforcement Learning Agents to be Competitive in a Card Game'.

  • BEARD, Karen Stansberry and Mihaly CSIKSZENTMIHALYI. 2015. 'Theoretically Speaking: An Interview with Mihaly Csikszentmihalyi on Flow Theory Development and Its Usefulness in Addressing Contemporary Challenges in Education'. Educational Psychology Review, 27(2), 353-364.

  • BloodBorne. 2015. FromSoftware Ltd.

  • HOLLER, J. et al. 2019. 'Deep Reinforcement Learning for Multi-Driver Vehicle Dispatching and Repositioning Problem'. Beijing, China: IEEE.

  • LARSEN, Thomas Nakken et al. 2021. 'Comparing Deep Reinforcement Learning Algorithms’ Ability to Safely Navigate Challenging Waters'. Frontiers in Robotics and AI, 8.

Figure List

Figure 1: Oates 2022. Demo of AI.

Figure 2: Oates 2022. Screenshot of tech demo.

Figure 3: Larsen et al. ca. 2021. Comparison of reinforcement learning algorithms on a variety of environments.

