Skip to content

feat: Add pixi environment #534

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Feb 17, 2025
Merged

Conversation

matthewfeickert
Copy link
Contributor

@matthewfeickert matthewfeickert commented Feb 17, 2025

Note

I'm opening this in draft mode now just to get things going, but I'll need to iterate with @rabst on design (pixi.toml vs. pyproject.toml) and get all the other parts of the CI moved over later.

* Add pixi manifest pixi.toml for Linux x86, macOS arm64, Windows 64.
@matthewfeickert matthewfeickert force-pushed the feat/add-pixi-environment branch from 5d35081 to bd81543 Compare February 17, 2025 01:34
* Enable workflow dispatch.
* Add concurrency limits.
* Use pixi for CI workflow.
* Unify to a single workflow for all OS tested
@matthewfeickert matthewfeickert force-pushed the feat/add-pixi-environment branch from bd81543 to b6be9e3 Compare February 17, 2025 01:41
@rasbt
Copy link
Owner

rasbt commented Feb 17, 2025

Oh wow, thanks!

One question, do you actually need the .gitattributes and pixi.lock files here or are they optional? I am asking because I would try as hard as possible to reduce the number of files in the main folder (because I do sometimes get emails from people asking what these files are for.)

@matthewfeickert
Copy link
Contributor Author

matthewfeickert commented Feb 17, 2025

One question, do you actually need the .gitattributes and pixi.lock files here or are they optional? I am asking because I would try as hard as possible to reduce the number of files in the main folder (because I do sometimes get emails from people asking what these files are for.)

@rasbt This is personal choice. If you view this repo as yours for actually developing the llms-from-scratch library then I think I would remove the pixi.lock and just have people solve on install with pixi (still quite fast). If you view this repo as an educational resource (e.g. an application) for people to get up and running then I would for sure keep the lock file.

In that case I'd also keep the autogenerated .gitattributes for the reason it mentions:

$ cat .gitattributes 
# SCM syntax highlighting & preventing 3-way merges
pixi.lock merge=binary linguist-language=YAML linguist-generated=true

Though seems that I'll have to debug (later, sorry) why Windows doesn't pick up the existence of tensorflow-cpu

$ pixi run pytest --ruff setup\02_installing-python-libraries\tests.py
...
============================= test session starts =============================
platform win32 -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0
rootdir: D:\a\LLMs-from-scratch\LLMs-from-scratch
configfile: pyproject.toml
plugins: anyio-4.8.0, nbval-0.11.0, ruff-0.4.1
collected 2 items
setup\02_installing-python-libraries\tests.py .F                         [100%]
================================== FAILURES ===================================
__________________________________ test_main __________________________________
capsys = <_pytest.capture.CaptureFixture object at 0x000001A4037B5420>
    def test_main(capsys):
        main()
        captured = capsys.readouterr()
>       assert "FAIL" not in captured.out
E       AssertionError: assert 'FAIL' not in '[OK] torch ...util 6.1.1\n'
E         
E         'FAIL' is contained here:
E           b 3.10.0
E           [FAIL] tensorflow-cpu 0.0, please install a version matching >=2.18.0
E         ?  ++++
E           [OK] tqdm 4.67.1
E           [OK] numpy 2.0.2
E           [OK] pandas 2.2.3
E           [OK] psutil 6.1.1
setup\02_installing-python-libraries\tests.py:14: AssertionError
=========================== short test summary info ===========================
FAILED setup/02_installing-python-libraries/tests.py::test_main - AssertionError: assert 'FAIL' not in '[OK] torch ...util 6.1.1\n'
  
  'FAIL' is contained here:
    b 3.10.0
    [FAIL] tensorflow-cpu 0.0, please install a version matching >=2.18.0
  ?  ++++
    [OK] tqdm 4.67.1
    [OK] numpy 2.0.2
    [OK] pandas 2.2.3
    [OK] psutil 6.1.1
========================= 1 failed, 1 passed in 5.74s =========================

as it is installed

$ pixi list --environment tests
Environment: tests
Package                        Version                Build                       Size       Kind   Source
...
tensorflow_cpu                 2.18.0                                             17.4 KiB   pypi   tensorflow_cpu-2.18.0-cp310-cp310-win_amd64.whl
...

@rasbt
Copy link
Owner

rasbt commented Feb 17, 2025

Thanks!

And regarding

Though seems that I'll have to debug (later, sorry) why Windows doesn't pick up the existence of tensorflow-cpu

I think it was just a spelling issue that the package is called tensorflow-cpu but is imported as tensorflow_cpu.

@matthewfeickert
Copy link
Contributor Author

matthewfeickert commented Feb 17, 2025

I think it was just a spelling issue that the package is called tensorflow-cpu but is imported as tensorflow_cpu.

Yeah, that's what it looks like from my side too. I can put a sed equivalent form in Powershell in there to fix it.

edit: Ah, your fix is much nicer. :)


[dependencies]
python = "3.10.*"
pytorch-cpu = ">=2.6.0,<3"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I just put in pytorch-cpu for quick debugging here, but obviously pytorch-gpu is the desired target. pytorch-cpu can be used in a tests-ci environment that would be designed for GitHub Actions where you aren't going to have GPUs.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Can we just put torch so it handles it based on the OS similar to what pip does with the wheels?

Copy link
Contributor Author

@matthewfeickert matthewfeickert Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rasbt No, as conda/mamba/pixi don't define things by OS, they define things be environment. If pytorch is asked for and there's no CUDA support already defined in the existing environment or environment definition file the pytorch will resolve to pytorch-cpu. If there's something like cuda-version defined as additional dependencies or a cuda version in a pixi system-requirements table then the solver will resolve pytorch to CUDA enabled pytorch. And if you explicitly ask for pytorch-gpu it will get the additional CUDA dependencies it needs to ensure that the version of PyTorch installed will be GPU compatible.

Small example:

# mkdir pytorch-check
# cd pytorch-check/
# pixi init
✔ Created /tmp/pytorch-check/pixi.toml
# pixi add pytorch
✔ Added pytorch >=2.6.0,<3
# pixi list pytorch
Package  Version  Build                       Size      Kind   Source
pytorch  2.6.0    cpu_mkl_py313_he6a733d_100  26.9 MiB  conda  pytorch
# rm pixi.toml pixi.lock
# pixi init
# pixi add pytorch-gpu
# pixi list pytorch
Package      Version  Build                   Size       Kind   Source
pytorch      1.11.0   cuda111py310h385535d_1  628.1 MiB  conda  pytorch
pytorch-gpu  1.11.0   cuda111py310h3d52e46_1  9.5 KiB    conda  pytorch-gpu
# pixi list cuda
Package       Version  Build        Size       Kind   Source
cuda-version  11.1     hdbd7af8_3   20.5 KiB   conda  cuda-version
cudatoolkit   11.1.1   hb139c0e_13  929.6 MiB  conda  cudatoolkit
# rm pixi.toml pixi.lock
# pixi init
# echo -e '\n[system-requirements]\ncuda = "12"' >> pixi.toml
# pixi add pytorch
# cat pixi.toml
[project]
authors = ["Matthew Feickert <[email protected]>"]
channels = ["conda-forge"]
name = "pytorch-check"
platforms = ["linux-64"]
version = "0.1.0"

[tasks]

[dependencies]
pytorch = ">=2.6.0,<3"

[system-requirements]
cuda = "12"
# pixi list pytorch
Package  Version  Build                           Size      Kind   Source
pytorch  2.6.0    cuda126_mkl_py313_h7373160_300  27.4 MiB  conda  pytorch
# pixi list cuda
Package               Version  Build       Size       Kind   Source
cuda-crt-tools        12.8.61  ha770c72_1  26.6 KiB   conda  cuda-crt-tools
cuda-cudart           12.8.57  h5888daf_1  22.2 KiB   conda  cuda-cudart
cuda-cudart_linux-64  12.8.57  h3f2d84a_1  188.4 KiB  conda  cuda-cudart_linux-64
cuda-cuobjdump        12.8.55  hbd13f7d_0  227.1 KiB  conda  cuda-cuobjdump
cuda-cupti            12.8.57  hbd13f7d_0  1.8 MiB    conda  cuda-cupti
cuda-nvcc-tools       12.8.61  he02047a_1  24.5 MiB   conda  cuda-nvcc-tools
cuda-nvdisasm         12.8.55  hbd13f7d_0  4.9 MiB    conda  cuda-nvdisasm
cuda-nvrtc            12.8.61  hbd13f7d_0  63.1 MiB   conda  cuda-nvrtc
cuda-nvtx             12.8.55  hbd13f7d_0  31 KiB     conda  cuda-nvtx
cuda-nvvm-tools       12.8.61  he02047a_1  23.5 MiB   conda  cuda-nvvm-tools
cuda-version          12.8     h5d125a7_3  20.6 KiB   conda  cuda-version

With conda/mamba/pixi it is better to be more explicit about the environment you want, as they have the ability to actually resolve it correctly.

@rasbt rasbt force-pushed the feat/add-pixi-environment branch from e4da5d2 to 49c5a70 Compare February 17, 2025 15:14
@rasbt rasbt marked this pull request as ready for review February 17, 2025 15:27
Comment on lines +45 to +48
uv python install 3.10
uv add . --dev
uv pip install -r ch05/07_gpt_to_llama/tests/test-requirements-extra.txt
uv add pytest-ruff nbval
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it important to use uv add here, or can you just do

uv pip install --system .
uv pip install --system -r ch05/07_gpt_to_llama/tests/test-requirements-extra.txt
uv pip install --system pytest-ruff nbval

and then skip manually activating the virtual environment for each step?

Given that you do this for pip as well I assume there's a reason(?).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is for testing purposes, too.

In addition, uv add . --dev installs from the pyproject.toml and should be faster. I think uv pip install --system . may be slower as it uses the pip algo.

The reason why I am doing

uv pip install -r ch05/07_gpt_to_llama/tests/test-requirements-extra.txt

though is that this is bonus material 95% of the users don't care about, and so there is no pyproject.toml for this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I don't think it is any faster. If anything it should be slower as it needs to open and edit a file (in the case of uv add pytest-ruff nbval) and then run uv pip install <whatever your uv add target was>. AFAIK, uv doesn't have multiple installer algorithms, it just has uv pip install and then higher level APIs on top of that. Would be interested if I'm wrong though.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's why I thought but someone told me yesterday that uv add is supposed to be faster as it doesn't try to mimic pip legacy behavior. Should check some official docs...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. The docs (https://docs.astral.sh/uv/reference/cli/#uv-add) don't mention that, but maybe @charliermarsh can comment on if there's interesting under the hood behavior.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthewfeickert or maybe I misinterpreted the "Yes" here

Screenshot 2025-02-17 at 2 26 24 PM

@rasbt
Copy link
Owner

rasbt commented Feb 17, 2025

I think it looks good now! Thanks again for all your help here setting this up. Learned a lot!

@rasbt rasbt merged commit a8b8eb4 into rasbt:main Feb 17, 2025
13 checks passed
@matthewfeickert matthewfeickert deleted the feat/add-pixi-environment branch February 17, 2025 20:15
@matthewfeickert
Copy link
Contributor Author

I think it looks good now! Thanks again for all your help here setting this up. Learned a lot!

Sure thing and hope it helps! Thanks also for creating the amazing educational resources that you do, in addition to all the tools you build, as well as the great educational posts and blogs!

I might make some follow up PRs during breaks from my job talk 😛 that can streamline the process a bit more. Also the pyproject.toml is missing a build-system table.

@rasbt
Copy link
Owner

rasbt commented Feb 17, 2025

Thanks! Haha, and I definitely wouldn't mind that.

Regarding streamlining... it may seem inefficient that there are pip/uv/pixi tests now, which seems like a lot. But it's so that I know all 3 things work because users may choose whatever they choose :).

Also, there are the uv-tests that are in separate Windows/Linux/macOS files. That's mainly to get the badge buttons to work properly. I tried doing it differently, but if you have them with a build matrix in the same .yml, you can't have separate badge descriptions based on the things I tried.

Screenshot 2025-02-17 at 2 30 10 PM

@matthewfeickert
Copy link
Contributor Author

Regarding streamlining... it may seem inefficient that there are pip/uv/pixi tests now, which seems like a lot. But it's so that I know all 3 things work because users may choose whatever they choose :).

Oh yeah that was all clear. By "streamlining" I mean "simplifying/speeding up individual processes" not "removing existing workflows".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants