Initial AGENTS.md and CLAUDE.md with LLM agent instructions for the repo #139388
I expect we'll rename this AGENTS.md in the future once the dust settles around the pile of agentic instruction file naming patterns. Starting here as many of the core team now have Claude Max access.
I recommend naming this AGENTS.md from the start. That seems to be the most common "standard" at the moment. Also, I don't think we should be advertising or endorsing a specific AI vendor/model/tool.
This seems like a very useful agents file for letting AI work on CPython!
I worry about giving AI tools such complete CLI access to a workspace though, e.g. providing access to `gh`, through which they could potentially wreak havoc on our GitHub repo.
Further, this could be seen as a recommendation or endorsement of using AI for CPython development in such a manner, and perhaps that requires more warnings and instructions on how to do this safely, e.g. running all of this in a sandboxed environment.
I have a feeling committing this to the repo would be a bit premature at this point.
That being said, I would very much like to advance our use of AI for CPython development in a way that is safe and thoughtful, if doing so can enable us to be more effective. Perhaps we should start something like a working group of interested core devs? WDYT @gpshead?
This isn't something to worry about from an instructions file. This is something to understand when using agents: you need to control what access you give them. And good agents ask for permission before taking actions, which you can grant once/session/always, etc. You can create limited-scope GitHub API tokens (read-only, specific repos, etc.) for the gh CLI if you are worried about this.
Simplifies a couple other less important statements. (no need to mention graphql and the tmux debugging story works but isn't widely used or well fleshed out yet)
I added a statement up top to make it clearer that we expect a human to be in the loop and to have reviewed the work before making a PR. The existence of a config should not be seen as endorsement; it's basically a "config file" for models. Though the optics of its existence, and being annoyed at the plethora of brand-name.md files before agents.md appeared, are why I didn't push for adding one of these earlier - I needed to know that enough of the core team had meaningful access to such tooling first.
It isn't for us as a project to explain how to do that any more than it is up to us to tell people how to use vi, emacs, pycharm, or vscode. That is out of place for us to say. What level of sandboxing people want is up to them and all of the tooling choices they make (many already offer different sandboxing strategies). We as a project don't control that. Fundamentally working on open source is all about downloading code from the internet from somewhere you think you trust and executing it without having read it. I'd give the same advice to people about sandboxing their human use dev environments away from things with credentials, yet hardly anyone actually does because it isn't the easy thing to do.
What would change your mind on this being premature? Getting a discussion going among existing users first? Many have access to these tools already. The existence of this makes everyone's agentic experience in the repo better by default and provides a place to build up meaningful, commonly needed instructions. Without this, everyone using agent tooling either has to manually recreate their own local versions of these instructions in each clone+worktree they have, or suffer with poorer-performing agents or a frustrating need to repeat explanations of how to work in our repo during every session, with no easy way to share and build upon one another's instructions. Not being able to share is painful. I do think a Discord channel for the topic is worthwhile (I'm following up on that), but I'd love to end the bulk of the sneakernet-style manual copying of repo-specific agent instructions by putting a baseline common set in-repo. A key takeaway from the discussion at the sprint seemed to be that we need to ensure our devguide talks about expectations of contributors when agents are used. I see that as orthogonal to this though. Both would be useful, but I don't think we need to block on that. (I want to follow up with docs folks on getting something in the devguide regardless.)
AI agents are a lot newer than text editors. I think it's certainly worth at least a section in the devguide. The risk profile here is genuinely novel; I think the comparison to using open source code only holds to a limited extent. Secondarily, whatever CPython does or says will be seen and emulated by folks who aren't core devs in projects that aren't CPython (a la PEP 8).
Full speed ahead! Given recent progress, this is the future. Core devs aren't naive about tech, and already seem well aware that AI-generated stuff has to be taken with caution.
Things are indeed moving fast; the concern is not so much about AI-generated stuff as about the process of doing the AI generation, where these days AI agents are able to perform actions in potentially privileged environments. Your comment makes me want to at least note this fact somewhere! :-)
Agreed. We're growing quite a few top-level files and it would be nice to use a dotfile or dotdir, but And looks like Claude only supports
Can we also put a "code comment" at the top to direct humans to our policy?
Can those agent `*.md` files live in their own `.agents/config` files? We already have many files at the root of the repository, and as I'm not using AI at all (like probably many others), I don't think we want even more visible files.
If tools such as `rg` (ripgrep), `gh`, `jq`, or `pre-commit` are not found, ask
the user to install them. ALWAYS prefer using `rg` rather than `find` or `grep`.
I don't have ripgrep nor do I have pre-commit installed, and I don't like to install more tools than I need. I think it is better to rely on find and grep instead of installing the tools when possible.
# Expanding your knowledge

* ALWAYS load a `.agents/pr-{PR_NUMBER}.md` or `.agents/branch-{branch_name_without_slashes}.md` file when you are told a PR number or when the current git branch is not `main`.
My branch names always contain slashes so that I can sort them as if they were directories. Would this make it impossible?
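For illustration only, here is one way `branch_name_without_slashes` could be derived for a branch such as `feature/csv/fix-quoting`. The helper name `agent_notes_path` and the choice of `-` as the replacement character are assumptions; the instructions file does not say how slashes are to be removed.

```python
def agent_notes_path(branch_name: str, replacement: str = "-") -> str:
    """Return a hypothetical per-branch notes path for a git branch name.

    Assumption: slashes are replaced by `replacement` rather than dropped.
    """
    flat = branch_name.replace("/", replacement)
    return f".agents/branch-{flat}.md"


print(agent_notes_path("feature/csv/fix-quoting"))
```

Under this scheme, slash-separated branch names still work, but `a/b` and `a-b` would map to the same notes file.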
## Optional developer guides and PEPs

* If `REPO_ROOT/../devguide/` exists, its `developer-workflow/` and `documentation/` subdirs contain detailed info on how we like to work on code and docs.
Should we allow the AI agent to access anything outside the repository directory? Can't we ask it to read the web pages directly instead, when possible?
# Source code

* The runtime is implemented in C as are many of the core types and extension modules
Maybe each sentence should end with a period (I don't know how sensitive an agent can be).
* The Python standard library (stdlib) itself lives in the `Lib/` tree
* stdlib C extension modules live in the `Modules/` Tree
* builtins, objects, and the runtime itself live in `Objects/` and `Python/`
* NEVER edit files in a `**/clinic/**` subdirectory; those are generated by argument clinic
* NEVER edit files in a `**/clinic/**` subdirectory; those are generated by argument clinic
* NEVER edit files in a `**/clinic/**` subdirectory; those are generated by Argument Clinic
I would say "never edit files specified in `.gitattributes` as generated", because we have many generated files.
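As a rough sketch of what that rule would mean in practice, the patterns marked as generated could be read out of `.gitattributes` like this. The helper name, the attribute name `generated`, and the sample text are all illustrative assumptions rather than a copy of the repo's actual file; quoted patterns containing spaces are not handled.

```python
def generated_patterns(gitattributes_text: str) -> list[str]:
    """Collect path patterns that carry a `generated` attribute.

    Assumption: generated files are tagged with a bare `generated`
    attribute; comment lines and `[attr]` macro definitions are skipped.
    """
    patterns = []
    for raw in gitattributes_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("[attr]"):
            continue
        pattern, *attrs = line.split()
        if "generated" in attrs:
            patterns.append(pattern)
    return patterns


# Made-up sample input, for illustration only.
SAMPLE = """\
# illustrative sample
[attr]generated linguist-generated=true diff=generated
**/clinic/*.c.h generated
Lib/keyword.py generated
*.py text eol=lf
"""

print(generated_patterns(SAMPLE))
```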
ONLY build in a `build/` subdirectory that you create at the repo root.

* Use sub-agents when running configure and make build steps
* `REPO_ROOT` is the root of the cpython git repo
* `REPO_ROOT` is the root of the cpython git repo
* `REPO_ROOT` is the root of the CPython git repo
* Use sub-agents when running configure and make build steps
* `REPO_ROOT` is the root of the cpython git repo
* let `BUILD_DIR=REPO_ROOT/build`
* Setup: `cd BUILD_DIR && ../configure --with-pydebug`
* Setup: `cd BUILD_DIR && ../configure --with-pydebug`
* Setup: `cd BUILD_DIR && mkdir -p "$(pwd)/dist" && ../configure --with-pydebug --prefix="$(pwd)/dist"`
You should also say that if the underlying distribution is openSUSE, for instance, then the configure command should add `--with-platlibdir=lib64`. I wouldn't want a configuration where hitting `make install` actually installs anything in `/usr/local/`.
* `REPO_ROOT` is the root of the cpython git repo
* let `BUILD_DIR=REPO_ROOT/build`
* Setup: `cd BUILD_DIR && ../configure --with-pydebug`
* `make -C BUILD_DIR -j $(nproc)` will rebuild
Would it actually use "BUILD_DIR" as is, or would it make the substitution? Maybe an explicit substitution would be better.
* After editing C code: `make -C BUILD_DIR && BUILT_PY -m test relevant_tests`
* After editing stdlib Python: `BUILT_PY -m test relevant_tests --match specific_test_name_glob` (no rebuild needed)
* After editing .rst documentation: `make -C BUILD_DIR/Doc check`
AFAIK, the `Doc` folder won't be copied, so this step will fail. Checking docs should be done directly at the top level, that is, `make -C REPO_ROOT/Doc check`. Note that we should also say that `make -C REPO_ROOT/Doc venv` is a setup step to be executed once.
* After editing C code: `make -C BUILD_DIR && BUILT_PY -m test relevant_tests`
* After editing stdlib Python: `BUILT_PY -m test relevant_tests --match specific_test_name_glob` (no rebuild needed)
* After editing .rst documentation: `make -C BUILD_DIR/Doc check`
* Before committing: `make -C BUILD_DIR patchcheck && pre-commit run --all-files`
I would instead suggest using `uvx pre-commit ...` here. Installing uv instead of pre-commit globally may be better for the user (I personally have uv installed but I don't have any pre-commit binary).
I'm also not a fan of committing these files to the repo/top level, both (a) as I fear more low-effort LLM driven contributions that might feel emboldened by the presence of this file and (b) as the LLM tool landscape is moving sufficiently quickly that we might want to change this in 3, 6, 12 months, and in general we seek to avoid needless churn.
Could we consider adding this file to the devguide instead, perhaps with a command to run to fetch a copy into one's local clone of the repo? That would meet the goal of having a centrally maintained copy, but would also make it more 'opt-in'.
A
(Disclosure: I currently have gratis access to GitHub's Copilot and Anthropic's Claude Code.)
+1 - I'd either opt to have this referenced in the devguide, or alternatively in `Tools/`, since we seem to have other miscellaneous scripts and experimental feature tooling in there.
* `Lib/test/test_csv.py` are tests for `csv`
* C header files are in the `Include/` tree
* Documentation is written in .rst format in `Doc/` - this is source for the public facing official Python documentation.
* CPython internals are documented for maintainers in @InternalDocs/README.md.
providing file as a context.
* Be consistent with existing nearby code style unless asked to do otherwise.
* NEVER leave trailing whitespace on any line.
* ALWAYS preserve the newline at the end of files.
* We do not autoformat code in this codebase. If the user asks you to run ruff format on a specific file, it can be found in `Doc/venv/bin/ruff`.
* We do not autoformat code in this codebase. If the user asks you to run ruff format on a specific file, it can be found in `Doc/venv/bin/ruff`.
* We do not autoformat code in this codebase. If the user asks you to run `ruff format` on a specific file, it can be found in `Doc/venv/bin/ruff`.
* NEVER leave trailing whitespace on any line.
* ALWAYS preserve the newline at the end of files.
* We do not autoformat code in this codebase. If the user asks you to run ruff format on a specific file, it can be found in `Doc/venv/bin/ruff`.
* NEVER add Python type annotations to Python code in the `Lib/` tree.
Some code is annotated there, like `_pyrepl`. Maybe suggest not adding annotations to files that don't already have them?
* let `BUILD_DIR=REPO_ROOT/build`
* Setup: `cd BUILD_DIR && ../configure --with-pydebug`
* `make -C BUILD_DIR -j $(nproc)` will rebuild
* Check what OS you are running on. Let `BUILT_PY=BUILD_DIR/python` or `BUILD_DIR/python.exe` on macOS (.exe is used to avoid a case insensitive fs name conflict)
* Check what OS you are running on. Let `BUILT_PY=BUILD_DIR/python` or `BUILD_DIR/python.exe` on macOS (.exe is used to avoid a case insensitive fs name conflict)
* Check what OS you are running on. Let `BUILT_PY=BUILD_DIR/python` or `BUILD_DIR/python.exe` on macOS (`.exe` is used to avoid a case insensitive file system name conflict)
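To make the placeholder substitution explicit, the `BUILD_DIR` and `BUILT_PY` rules quoted above could be spelled out roughly as follows. This is a sketch under stated assumptions: the function name `built_python` is hypothetical, and only the macOS and default cases mentioned in the instructions are covered (Windows builds are not addressed).

```python
import sys
from pathlib import Path


def built_python(repo_root: str, platform: str = sys.platform) -> Path:
    """Return the expected path of the freshly built interpreter.

    Mirrors the quoted rules: BUILD_DIR is REPO_ROOT/build, and the
    binary is named python.exe on macOS, python elsewhere.
    """
    build_dir = Path(repo_root) / "build"
    exe_name = "python.exe" if platform == "darwin" else "python"
    return build_dir / exe_name


print(built_python("/src/cpython", "linux"))
print(built_python("/src/cpython", "darwin"))
```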
## Running our built Python and tests

* ALWAYS use sub-agents when running builds or tests
* NEVER use `pytest`. CPython tests are `unittest` based.
* NEVER use `pytest`. CPython tests are `unittest` based.
* NEVER use `pytest`. In CPython, for tests we use an internal tool called `regrtest`, which is `unittest` based.
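A minimal sketch of what a `regrtest` invocation looks like when assembled programmatically, using only the `-m test`, `--match`, and `-j` options that appear in the quoted instructions. The helper name `regrtest_command` and the example arguments are hypothetical.

```python
def regrtest_command(built_py, test_module, match=None, jobs=None):
    """Build an argument list for running CPython tests via regrtest.

    Uses `--match` for test-name globs (not pytest's `-k`).
    """
    cmd = [built_py, "-m", "test", test_module]
    if match is not None:
        cmd += ["--match", match]
    if jobs is not None:
        cmd += ["-j", str(jobs)]
    return cmd


print(regrtest_command("build/python", "test_csv", match="*quoting*"))
```

The resulting list can be handed to `subprocess.run` as is, which avoids shell quoting issues with the glob.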
* Individual test files can be run directly using `BUILT_PY Lib/test/test_csv.py`
* `BUILT_PY -m test test_zipfile -j $(nproc)` will properly run all `Lib/test/test_zipfile` tests
* `BUILT_PY -m test` supports a `--match TESTNAME_GLOB` flag to run specific tests, pass `--help` to learn its other capabilities. NEVER try to pass `-k` as this is not pytest based, use `--match` instead.
* `make -C BUILD_DIR clinic` will regenerate argument clinic generated code. Do this after you've edited a corresponding input .c file in a way that changes a C extension module function signature or docstring
* `make -C BUILD_DIR clinic` will regenerate argument clinic generated code. Do this after you've edited a corresponding input .c file in a way that changes a C extension module function signature or docstring
* `make -C BUILD_DIR clinic` will regenerate argument clinic generated code. Do this after you've edited a corresponding input `.c` file in a way that changes a C extension module function signature or docstring
FYI - I'll respond to specific comments later. I love ❤️ all the feedback here. A few notes:
CLAUDE.md can be moved to a .claude/ subdirectory file, where it would still work and be less obtrusive (I'd still have it just @ the top-level AGENTS.md, which is Claude Code's form of a direct C #include). It has been noted that different tools have different features for these files beyond just the text itself, but the way models work is more squishy, just like us, so I don't think that matters greatly. Any top model is going to figure out the meaning regardless, even if it takes another turn or two (a turn being a round trip to the model and back to a tool use).
All that said: I actually have good feels around the devguide idea and simple command (
Re: some of the comments on specifics in the file - that was entirely expected. These instructions are largely what I wrote two major model generations ago for Claude Sonnet 3.7; some are over-specific, and I made some typos recently (agents get over those, but it's smoother if corrected; will do). Sonnet 4.5 is far better, and learning what to put in these that makes an impact vs. not is a continual process. Instructions can be simplified (which is generally best in the face of disagreement over which approach to take for now).
At a high level, the point of these instructions is to be just enough but not hyper-specific. The agents merely need to know the basics; they already understand how to fill in and discover local details. I.e.: we don't use pytest, here's how to use regrtest, tests live here... They figure out the exceptions to that pattern on their own quite nicely. More later.
Many of the core team now have Claude Max access or similar.
The norm is for these files to be updated over time as people see fit, never to be perfect.
Ex: Someone with a Windows setup can fill in Windows build instructions in the future.