The crossroads of AI and gaming is more than just a breeding ground for innovation; it is a testing ground for the future. Our latest venture pushes this boundary by integrating Large Language Models (LLMs) into the heart of role-playing games (RPGs). This project, rooted in open source and community collaboration, puts LLMs to the test in a dynamic RPG universe. Here, LLMs face a world that changes with every decision, challenging them to devise strategies in real time. But this is about more than gaming. It's about showcasing LLMs' capacity for decision-making in complex, evolving environments. This experiment hints at a broader horizon where AI assists humans in navigating not just new worlds, but the intricate decisions of our own.
This is an experiment to explore whether LLMs, or some evolution of them, might transcend their role as content generators or next-token predictors to unravel complex scenarios and devise strategies for navigating uncharted and thrilling territories.
Here is an example of a player controlled by an LLM navigating the RPG world.
A few things to note:
The text at the bottom is the output of the LLM's reasoning about why it chose that direction of movement, along with notes on its future strategy. This helps carry the strategy through successive steps.
The LLM can navigate around hazards and pick up health resources when its health is low.
You will also see how the LLM balances exploring to find the exit portal (the goal) with avoiding obstacles and restoring health.
No path planning or specific directions are provided to the LLM.
The Experiment Setup: A Journey Through Code and Decision
Using Pygame, we engineered a virtual realm filled with dangers, treasures, barriers, and a hidden gateway. The task for the LLM (Dolphin2.2-Mistral, a 7B model) was clear but demanding: traverse this world and find a way out. To accomplish this, the model had to interpret textual clues about its environment and make strategic choices to ensure both survival and exploration.
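The full environment lives in the project code, but a minimal sketch of the kind of world state involved might look like the following; the `Element` and `WorldState` names are illustrative assumptions, not the actual classes from the repository.

```python
# Illustrative sketch only -- Element and WorldState are hypothetical names,
# not the actual classes from the project.
import math
from dataclasses import dataclass, field


@dataclass
class Element:
    kind: str  # "resource", "hazard", "obstacle", or "exit"
    x: int
    y: int


@dataclass
class WorldState:
    player_x: int
    player_y: int
    health: int = 100
    elements: list[Element] = field(default_factory=list)

    def nearby(self, radius: float = 100.0) -> list[Element]:
        """Elements inside the player's 'field of vision' radius."""
        return [
            e for e in self.elements
            if math.hypot(e.x - self.player_x, e.y - self.player_y) <= radius
        ]
```

In a sketch like this, Pygame would render the elements each frame, while something like `nearby()` would feed the textual field of vision described next.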
In this experiment with the LLM, we initially confronted a fundamental challenge: LLMs cannot perceive their environment visually. To bridge this gap, we meticulously crafted textual descriptions to represent the virtual world's visual elements from the LLM's perspective. This included detailing the player's location, nearby resources, obstacles, hazards, and the elusive exit, all within the model's "field of vision." We specified the position, direction, and distance of these elements relative to the player, effectively translating the visual cues of the game into a language the LLM could understand and interact with. This foundational step enabled the LLM to navigate, strategize, and ultimately seek escape within the complex, dynamically rendered RPG landscape.
Example of Field of Vision described in words:
Player Position: [470, 390], Nearby Elements: resource at (493, 426) is to the South East, 42.72 units away; hazard at (478, 438) is to the South, 48.66 units away.
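A minimal sketch of how such a description could be generated from raw coordinates is shown below; the function names are assumptions for illustration, not the project's actual code. Note that Pygame's y-axis grows downward, which is why a positive y offset reads as South.

```python
import math

# Illustrative sketch: turn relative positions into the compass-style text
# shown above. Function names are hypothetical, not the project's actual code.
COMPASS = ["East", "South East", "South", "South West",
           "West", "North West", "North", "North East"]


def compass_direction(dx: float, dy: float) -> str:
    # Pygame's y-axis grows downward, so positive dy reads as "South".
    angle = math.degrees(math.atan2(dy, dx)) % 360
    return COMPASS[int((angle + 22.5) // 45) % 8]


def describe_view(player, elements):
    parts = [f"Player Position: [{player[0]}, {player[1]}], Nearby Elements:"]
    for kind, (ex, ey) in elements:
        dx, dy = ex - player[0], ey - player[1]
        parts.append(f"{kind} at ({ex}, {ey}) is to the "
                     f"{compass_direction(dx, dy)}, {math.hypot(dx, dy):.2f} units away.")
    return " ".join(parts)


print(describe_view((470, 390), [("resource", (493, 426)), ("hazard", (478, 438))]))
```

Running this on the positions above reproduces the example description: the resource to the South East at 42.72 units and the hazard to the South at 48.66 units.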
Key Findings and Technical Insights
Our journey through this digital landscape yielded fascinating insights:
Prompt Engineering: Finding the right prompts was crucial. We iterated over many versions to communicate the task to the LLM effectively, highlighting the importance of clear, concise instructions in AI-driven tasks (a sketch of how such a prompt might be assembled appears after this list).
Understanding Coordinates: A pivotal moment came when we refined our approach to explaining the coordinate system. This ensured the LLM could make informed decisions on direction, significantly improving navigational efficacy.
Memory Integration: Adding memory to the LLM's capabilities was a game-changer. It prevented repetitive behaviors, such as moving back and forth in the same area, enhancing the model's ability to explore efficiently.
Risk and Recovery: Interestingly, the LLM would occasionally choose paths that led to hazards. However, it quickly prioritized finding resources to regain health, demonstrating adaptability in its decision-making process.
Navigational Skills: Without relying on additional navigational algorithms, the LLM showcased impressive skills in finding its way through the RPG world, underscoring the potential of LLMs in spatial reasoning and exploration.
Rationalizing Decisions: We enabled the LLM to articulate reasons behind its directional choices. The insights offered were intriguing, shedding light on the model's thought process and its approach to problem-solving.
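To make the prompt-engineering, coordinate, and memory points concrete, here is a hedged sketch of how these pieces could fit together. The prompt wording, the JSON reply format, and the `query_llm` helper are assumptions for illustration, not the exact prompts or client used in the experiment.

```python
import json

# Hypothetical sketch of prompt assembly with a rolling memory of prior
# decisions. The exact prompts used in the experiment differ.
SYSTEM_PROMPT = (
    "You control a player in a 2D world. The x coordinate increases to the "
    "East and the y coordinate increases to the South. Find the exit portal, "
    "avoid hazards, and pick up resources when health is low. Reply with JSON: "
    '{"direction": "North|South|East|West", "reason": "...", "strategy": "..."}'
)


def query_llm(prompt: str) -> str:
    """Placeholder for the actual model call (e.g. a local Dolphin2.2-Mistral endpoint)."""
    raise NotImplementedError("Wire this up to the LLM of your choice.")


def build_prompt(view_text: str, health: int, memory: list[str]) -> str:
    # Keep only the last few decisions so the prompt stays short.
    memory_text = "\n".join(memory[-5:]) or "No previous decisions."
    return (
        f"{SYSTEM_PROMPT}\n\n"
        f"Current health: {health}\n"
        f"Field of vision: {view_text}\n"
        f"Previous decisions and strategy notes:\n{memory_text}\n"
        f"Choose the next direction."
    )


def decide(view_text: str, health: int, memory: list[str]) -> dict:
    decision = json.loads(query_llm(build_prompt(view_text, health, memory)))
    # Remember the choice and the stated strategy so later steps stay consistent.
    memory.append(f"Moved {decision['direction']}: {decision['strategy']}")
    return decision
```

A game loop would call `decide()` once per step, move the player in the returned direction, and display `decision['reason']` as the on-screen explanation.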
Here is an example where memory of the strategy has not been enabled. You will notice the LLM does a lot of back-and-forth without a clear goal.
Open Source and Community Engagement
In line with our commitment to open-source principles and fostering community engagement, we are excited to share the code for this experiment on GitHub (Link Coming Soon). This not only allows for transparency but also invites collaboration, experimentation, and feedback from the wider community. It's an opportunity for enthusiasts, researchers, and developers to dive into the intricacies of LLM-powered gaming characters and explore the limits of AI-driven content creation.
Reflections and Next Steps
This exploration into the capabilities of open-source LLMs within RPG settings reveals the vast potential of AI in enhancing interactive storytelling, game design, and simulation. The findings underscore the importance of clear communication, memory integration, and the ability to adapt and rationalize within AI models.
As we move forward, our focus remains on deepening our understanding of AI's role in gaming, contributing to the open-source community, and encouraging a collaborative approach to innovation. We invite you to join us in this journey, to experiment, learn, and perhaps even discover new realms of possibility within the fusion of AI and gaming.
Part 2 of this series is here.