
CUDA archive for 12.7 is not available. 12.8 is unsupported. 12.6.3 is in some random PR. #31

Ph0rk0z opened this issue Feb 24, 2025 · 13 comments


Ph0rk0z commented Feb 24, 2025

NVIDIA Open GPU Kernel Modules Version

565

Operating System and Version

Ubuntu 22.04

Kernel Release

n/a

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Build Command

none

Terminal output/Build Log

none

More Info

If you look at https://developer.nvidia.com/cuda-toolkit-archive, the archive stops at 12.6.3, so the matching drivers cannot be downloaded.

The 12.6.3 driver is still only in a PR. Basically there is no repo to build against if you want up-to-date CUDA. Just pick a random version and cross your fingers?
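For reference, a quick way to compare what a machine currently has against what NVIDIA's apt repo would install (a rough sketch; the `cuda-drivers` and `cuda-toolkit` metapackage names are assumptions and may differ depending on how the repo was set up):

```bash
# Rough sketch: compare the installed driver/toolkit with what the CUDA apt repo offers.
# Assumes Ubuntu with NVIDIA's CUDA network repo configured; metapackage names may differ.
nvidia-smi --query-gpu=driver_version --format=csv,noheader   # driver currently loaded
nvcc --version                                                 # toolkit currently on PATH
apt-cache policy cuda-drivers cuda-toolkit                     # versions the repo would install
```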

@aikitoria

I maintain more up to date branches here: https://github.com/aikitoria/open-gpu-kernel-modules


Ph0rk0z commented Mar 8, 2025

Thanks. Will try it out. My next hurdle is that I also have a 22 GB 2080 Ti. If it doesn't affect the 3090s, I'm golden. If it does, I guess I'll just install the closed-source modules and frown.

Edit: it still doesn't match the CUDA repo drivers, so I have to download the CUDA repo and then the updated driver afterwards. We'll see how that works out. Maybe I'll try my luck with drivers only, since I am using conda for all environments.


Ph0rk0z commented Mar 9, 2025

So I tested it with the provided driver (.04) and it works. Only a slight speed bump, but much lower latency. The worse issue is not having 570.86.10 like the CUDA repo. You can't install its driver packages, and Ubuntu assumes the driver isn't installed. To compile NCCL and run the perf test, I needed to manually copy the libs and set paths. Plus, a more bleeding-edge driver rather than the tested/shipped one in the CUDA repo may not be better for performance overall. The 2080 Ti doesn't seem to affect anything.

I mean, we are using this with CUDA, right? Why make it hard?
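For anyone hitting the same thing, a minimal sketch of the path setup and build that this scenario usually needs, assuming the toolkit ended up under /usr/local/cuda (adjust CUDA_HOME to wherever the libs were actually copied):

```bash
# Rough sketch: point the build and runtime at a manually placed CUDA toolkit.
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"

# Build NCCL against that toolkit, then build and run the perf tests.
git clone https://github.com/NVIDIA/nccl.git && cd nccl
make -j src.build CUDA_HOME="$CUDA_HOME"
cd .. && git clone https://github.com/NVIDIA/nccl-tests.git && cd nccl-tests
make CUDA_HOME="$CUDA_HOME" NCCL_HOME=../nccl/build
./build/all_reduce_perf -b 8 -e 128M -f 2 -g 2   # -g 2 = run across two GPUs
```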

@aikitoria

I'm not sure about Ubuntu since I don't use it; I use Debian testing myself and didn't have any problems installing the 570.124.04 driver and the CUDA 12.8 toolkit there. If you need a specific other version of the driver, you can probably just rebase a close version onto it with no merge conflicts; the change is very simple.
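Something like this, sketched with placeholder branch and tag names (substitute the actual ones for the driver version you need):

```bash
# Rough sketch: rebase a patched branch onto a different upstream driver tag.
# <patched-branch> and <target-driver-tag> are placeholders.
git clone https://github.com/aikitoria/open-gpu-kernel-modules.git
cd open-gpu-kernel-modules
git remote add upstream https://github.com/NVIDIA/open-gpu-kernel-modules.git
git fetch upstream --tags
git checkout <patched-branch>      # the existing branch closest to your target version
git rebase <target-driver-tag>     # the upstream tag for the driver you actually need
```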


Ph0rk0z commented Mar 9, 2025

Upon further inspection, I think the issue might be NVIDIA. The driver versions they put in their kernel repo don't match what they package, so the only way to have it work seamlessly would be to rip out the source that is shipped and apply the patches on top of that.

For instance, on my system, clinfo acts like no OpenCL is present despite the libs being under /usr/local/cuda. Even when they are added to the paths, not everything gets picked up. You can't install all the packages related to these things (like the CUDA runtime), because they pull in their own driver files too. I doubt it's distro-dependent.
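One possible explanation (an assumption on my part, not something confirmed in this thread): clinfo discovers implementations through the OpenCL ICD loader rather than LD_LIBRARY_PATH, so NVIDIA's vendor file has to be registered for it to see anything:

```bash
# Rough sketch: the ICD loader reads vendor files from /etc/OpenCL/vendors,
# not from LD_LIBRARY_PATH, so check that NVIDIA's entry exists.
ls /etc/OpenCL/vendors/                     # should contain nvidia.icd
cat /etc/OpenCL/vendors/nvidia.icd          # normally just: libnvidia-opencl.so.1
ldconfig -p | grep nvidia-opencl            # confirm the library itself is resolvable
# If the vendor file is missing (e.g. because the packaged driver wasn't installed),
# creating it by hand may be enough:
echo libnvidia-opencl.so.1 | sudo tee /etc/OpenCL/vendors/nvidia.icd
```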

PyTorch/conda seems to work just fine as-is, so it's just a matter of fixing things for the rest of it.


LuosjDD commented Mar 9, 2025

> I maintain more up to date branches here: https://github.com/aikitoria/open-gpu-kernel-modules

I installed your branch and did some testing, but I cannot get the simpleP2P test to pass on 2x RTX 5070 Ti. 2x RTX 4070 Ti SUPER works well.
I have confirmed that the RTX 5070 Ti BAR1 size meets the requirements, and the CUDA samples version is 12.8. Do you have any suggestions for this CUDA error? By the way, you should enable the Issues feature on your fork.

```
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 2
Checking GPU(s) for support of peer to peer memory access

Peer access from NVIDIA GeForce RTX 5070 Ti (GPU0) -> NVIDIA GeForce RTX 5070 Ti (GPU1) : Yes
Peer access from NVIDIA GeForce RTX 5070 Ti (GPU1) -> NVIDIA GeForce RTX 5070 Ti (GPU0) : Yes
Enabling peer access between GPU0 and GPU1...
Allocating buffers (64MB on GPU0, GPU1 and CPU Host)...
Creating event handles...
CUDA error at /home/a/test/cuda-samples/Samples/0_Introduction/simpleP2P/simpleP2P.cu:170 code=719(cudaErrorLaunchFailure) "cudaEventSynchronize(stop_event)"
```
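For anyone debugging the same failure, a rough sketch of the driver-side checks that can be done before digging into the sample itself (the `topo -p2p` subcommand is an assumption about recent nvidia-smi builds):

```bash
# Rough sketch: inspect BAR1 sizing and the driver's view of P2P capability.
nvidia-smi -q | grep -A 3 "BAR1 Memory Usage"   # total BAR1 per GPU
nvidia-smi topo -m                              # link topology between the GPUs
nvidia-smi topo -p2p r                          # P2P read capability matrix
```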

@aikitoria

I don't currently have a system with multiple 50-series GPUs, so I'm unable to test it on that generation. It's possible the patch needs further changes. You will be on your own making it work for now.


Panchovix commented Apr 7, 2025

Just wondering, does applying P2P affect graphics workloads? (Not compute.)

When applying aikitoria's patch, P2P works fine on 2 of my 4090s, but when trying to do graphics on either of them, or on a 5090 I have in the same PC, the card doesn't boost as it should (Wine). I can confirm that reinstalling the driver from scratch (without P2P) fixes that issue. Fedora 41.

Also, do we have the .run file with or without the kernel modules?

@aikitoria

Cards are boosting correctly for me when running a compute workload. Do you also get the issue when just installing vanilla custom kernel modules (without the patch, but not signed by NVIDIA)?

@Panchovix

Oh, for compute they boost fine, but for graphics they don't, for some reason. I also tried --no-kernel-modules from the .run file and then applying the tinygrad patch, and no luck either; it still doesn't boost normally on graphics.

I haven't tried vanilla custom kernel modules yet, but it's quite possible the issue actually comes from there and not from the patch.

@Panchovix

Oh, and maybe another factor: on Fedora I have to regenerate the initramfs to make P2P work after applying the patch (sudo dracut --regenerate-all --force), else it doesn't work.
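A short sketch of that step plus a sanity check that the patched module is really the one loaded afterwards (update-initramfs as the Debian/Ubuntu equivalent is an assumption for non-Fedora systems):

```bash
# Rough sketch: regenerate the initramfs so the old module isn't loaded from it.
sudo dracut --regenerate-all --force   # Fedora; Debian/Ubuntu: sudo update-initramfs -u
sudo reboot
# After reboot, confirm the loaded module matches the one on disk:
cat /proc/driver/nvidia/version        # version of the currently loaded kernel module
modinfo -F version nvidia              # version of the module file that would be loaded
```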

@aikitoria

Not sure about Fedora. For me on Debian, the process to install it is always:

  1. uninstall previous driver using the uninstall script made by the last installation
  2. install the new driver version using the .run file provided by nvidia (choosing MIT licensed modules)
  3. run install script in the repo on branch matching the version exactly to replace the modules with modified ones
  4. reboot the computer (otherwise it doesn't work correctly)
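A rough shell sketch of those four steps; the .run file name matches the 570.124.04 version mentioned above, the branch name is a placeholder, and the install.sh invocation assumes the fork's script works as mentioned later in this thread:

```bash
# Rough sketch of the four steps above; file and branch names are placeholders.
# 1. Uninstall the previous driver using the script left by the last installation.
sudo nvidia-uninstall

# 2. Install the new driver from NVIDIA's .run file, choosing the open (MIT/GPL) kernel modules.
sudo sh NVIDIA-Linux-x86_64-570.124.04.run -m=kernel-open

# 3. Replace the modules with the patched ones from the matching branch of the fork.
git clone -b <matching-branch> https://github.com/aikitoria/open-gpu-kernel-modules.git
cd open-gpu-kernel-modules && sudo ./install.sh

# 4. Reboot so the patched modules are the ones that get loaded.
sudo reboot
```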

@Panchovix

Okay, just tried without regenerating the initramfs after install.sh, and it does detect P2P for the 4090s and works fine, but graphics workloads still don't boost correctly :(

It is too late ATM for me so tomorrow I will try vanilla custom kernel modules.
