# Pokémon Red, fully controlled by an LLM 🤖🎮
This repo contains a minimal, hackable agent that teaches large language models to play Pokémon Red inside the PyBoy Game Boy emulator. It is forked from the excellent portalcorp/ClaudePlaysPokemon and extended with the OpenAI Responses API, so it can run the o3 and o4-mini models alongside Anthropic Claude. Anthropic remains the default provider (see the `--provider` flag below).
Project by Lander Media / Steve Moraco. Initial agent code by o4‑mini.
Features:

- Declarative function-calling interface – the model calls the tools `press_buttons` and `navigate_to` (a path-finding helper, enabled by default).
- Screenshot-based gameplay – what the model “sees” is precisely what is on the screen, delivered as a PNG each step (hex-encoded over WebSocket).
- FastAPI + WebSockets live UI – watch the game, pause, resume, load save states, and inspect the model’s thoughts in real time at `http://localhost:<port>`.
- Automatic log folder per run (frames, model messages, structured game log).
- Context summarisation to keep the conversation within token limits.
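For illustration, the two tools might be declared in Anthropic's tool-use format roughly as below. This is only a sketch: the exact tool names come from the list above, but the parameter names and schemas are assumptions, not the repo's actual definitions.

```python
# Hypothetical tool definitions in Anthropic tool-use format.
# Parameter names ("buttons", "row", "col") are illustrative assumptions.
PRESS_BUTTONS_TOOL = {
    "name": "press_buttons",
    "description": "Press a sequence of Game Boy buttons.",
    "input_schema": {
        "type": "object",
        "properties": {
            "buttons": {
                "type": "array",
                "items": {
                    "type": "string",
                    "enum": ["a", "b", "start", "select",
                             "up", "down", "left", "right"],
                },
            }
        },
        "required": ["buttons"],
    },
}

NAVIGATE_TO_TOOL = {
    "name": "navigate_to",
    "description": "Path-find to a walkable tile on the current map.",
    "input_schema": {
        "type": "object",
        "properties": {
            "row": {"type": "integer"},
            "col": {"type": "integer"},
        },
        "required": ["row", "col"],
    },
}
```

The schemas are what the model sees; the agent dispatches on the returned tool name and arguments each step.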
1. Clone this repository:

   ```bash
   git clone <repo-url>
   cd <repo-directory>
   ```

2. Install Python dependencies (Python ≥3.10 recommended):

   ```bash
   pip install -r requirements.txt
   ```

3. Provide an API key for your preferred provider:

   - Anthropic (default):

     ```bash
     export ANTHROPIC_API_KEY="sk-ant-…"
     ```

   - OpenAI (when running with `--provider openai`):

     ```bash
     export OPENAI_API_KEY="sk-openai-…"
     ```

4. Place a Pokémon Red ROM (`pokemon.gb`) in the project root (or point to it with `--rom`).
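As a quick sanity check of the steps above, a hypothetical pre-flight helper (not part of the repo) could verify the ROM and API key before launch:

```python
import os
from pathlib import Path

# Hypothetical pre-flight check mirroring the setup steps above
# (illustrative only; not part of the repo).
def preflight(rom: str = "pokemon.gb", provider: str = "anthropic") -> list[str]:
    problems = []
    if not Path(rom).is_file():
        problems.append(f"ROM not found: {rom}")
    key = "ANTHROPIC_API_KEY" if provider == "anthropic" else "OPENAI_API_KEY"
    if not os.environ.get(key):
        problems.append(f"{key} is not set")
    return problems
```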
The entry point is `main.py`. It both spins up a FastAPI server and starts the agent. All interaction happens through the web UI – no separate headless mode is needed.

```bash
# Quick start – Anthropic Sonnet playing 1 000 000 steps (~10 weeks), UI on port 3000
python main.py --rom pokemon.gb --steps 1000000

# Use OpenAI o4-mini instead
python main.py --provider openai --model o4-mini
```
Key flags:

- `--rom <file.gb>` – path to the Pokémon Red ROM (default: `pokemon.gb`)
- `--steps <N>` – maximum steps to execute (the agent can be paused / resumed); default `1_000_000` (roughly 10 weeks of play)
- `--port <N>` – port for the FastAPI server / web UI (default: 3000)
- `--save-state <file.state>` – load a PyBoy save state at startup
- `--overlay` – draw the walkable-tile overlay inside the game feed
- `--provider anthropic|openai` – choose the LLM backend (default: `anthropic`)
- `--model <name>` – override the default model for the chosen provider
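The flag list above can be mirrored with a minimal `argparse` parser. This is a sketch of the interface as described, not the repo's actual parser in `main.py`, which may declare the options differently:

```python
import argparse

# Sketch of a CLI parser matching the flags documented above.
# Defaults are taken from the flag descriptions, not from the repo's code.
def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="LLM plays Pokémon Red")
    p.add_argument("--rom", default="pokemon.gb")
    p.add_argument("--steps", type=int, default=1_000_000)
    p.add_argument("--port", type=int, default=3000)
    p.add_argument("--save-state")
    p.add_argument("--overlay", action="store_true")
    p.add_argument("--provider", choices=["anthropic", "openai"],
                   default="anthropic")
    p.add_argument("--model")  # None means "use the provider's default"
    return p
```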
Open `http://localhost:<port>` in a browser to see:
- Game Screen – live 30 FPS video
- Assistant Messages – the model’s tool calls & high‑level reasoning
- Context History – compressed conversation so far
- Controls – Run, Pause, Stop, Load Save State
Each run writes to `logs/run_<timestamp>/`:

- `frames/` – PNG screenshots per step
- `claude_messages.log` – model response logs
- `game.log` – emulator and agent logs
Inside each run folder you will also find `history_saves/`, containing periodic PyBoy `.state` snapshots. These are written automatically:

- whenever the agent summarises the running conversation (~every 50 steps), and
- immediately after the player transitions between major areas (e.g. moves to another floor or map).

You can resume from any snapshot by either:

- supplying `--save-state <file>` on the command line, or
- clicking Load Save in the web UI and selecting a `.state` file.
Global defaults live in `config.py`:

- `MODEL_NAME` – default Anthropic model (CLI `--model` overrides)
- `TEMPERATURE` – sampling temperature passed to the LLM
- `MAX_TOKENS` – hard limit on response size
- `USE_NAVIGATOR` – toggle the higher-level `navigate_to` tool (default: `True`)
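A sketch of what `config.py` might contain. The constant names come from the list above, but every value shown here is a placeholder; check the actual file for the real defaults:

```python
# Illustrative config.py sketch – names match the documented defaults,
# values are placeholders, not the repo's actual settings.
MODEL_NAME = "claude-3-7-sonnet-latest"  # assumed; see config.py for the real default
TEMPERATURE = 1.0                        # sampling temperature passed to the LLM
MAX_TOKENS = 4096                        # hard limit on response size
USE_NAVIGATOR = True                     # enables the navigate_to tool
```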
PRs welcome! Please open issues or pull requests 😊