
CUDA archive for 12.7 is not available. 12.8 is unsupported. 12.6.3 is in some random PR. #31

Ph0rk0z opened this issue Feb 24, 2025 · 13 comments


Ph0rk0z commented Feb 24, 2025

NVIDIA Open GPU Kernel Modules Version

565

Operating System and Version

Ubuntu 22.04

Kernel Release

n/a

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Build Command

none

Terminal output/Build Log

none

More Info

If you look at https://developer.nvidia.com/cuda-toolkit-archive, the archive stops at 12.6.3, so the matching drivers cannot be downloaded.

The 12.6.3 driver is still only in a PR. Basically there is no repo to build against if you want up-to-date CUDA. Just pick a random version and cross your fingers?
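For reference, a quick way to compare what a machine currently has against what NVIDIA's apt repo would install (a rough sketch; the `cuda-drivers` and `cuda-toolkit` metapackage names are assumptions and may differ depending on how the repo was set up):

```bash
# Rough sketch: compare the installed driver/toolkit with what the CUDA apt repo offers.
# Assumes Ubuntu with NVIDIA's CUDA network repo configured; metapackage names may differ.
nvidia-smi --query-gpu=driver_version --format=csv,noheader   # driver currently loaded
nvcc --version                                                 # toolkit currently on PATH
apt-cache policy cuda-drivers cuda-toolkit                     # versions the repo would install
```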

@aikitoria

I maintain more up to date branches here: https://github.com/aikitoria/open-gpu-kernel-modules


Ph0rk0z commented Mar 8, 2025

Thanks. Will try it out. My next hurdle is that I also have a 22 GB 2080 Ti. If it doesn't affect the 3090s, I'm golden. If it does, I guess I'll just install the closed-source modules and frown.

Edit: it still doesn't match the CUDA repo drivers, so I have to download the CUDA repo and then the updated driver afterwards. We'll see how that works out. Maybe I'll try my luck with drivers only, since I am using conda for all environments.


Ph0rk0z commented Mar 9, 2025

So I tested it with the provided driver (.04) and it works. Only a slight speed bump, but much lower latency. The worse issue is not having 570.86.10 like the CUDA repo. You can't install its driver packages, and Ubuntu assumes the driver isn't installed. To compile NCCL and run the perf test, I needed to manually copy the libs and set paths. Plus, a more bleeding-edge driver rather than the tested/shipped one in the CUDA repo may not be better for performance overall. The 2080 Ti doesn't seem to affect anything.

I mean, we are using this with CUDA, right? Why make it hard?
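For anyone hitting the same thing, a minimal sketch of the path setup and build that this scenario usually needs, assuming the toolkit ended up under /usr/local/cuda (adjust CUDA_HOME to wherever the libs were actually copied):

```bash
# Rough sketch: point the build and runtime at a manually placed CUDA toolkit.
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"

# Build NCCL against that toolkit, then build and run the perf tests.
git clone https://github.com/NVIDIA/nccl.git && cd nccl
make -j src.build CUDA_HOME="$CUDA_HOME"
cd .. && git clone https://github.com/NVIDIA/nccl-tests.git && cd nccl-tests
make CUDA_HOME="$CUDA_HOME" NCCL_HOME=../nccl/build
./build/all_reduce_perf -b 8 -e 128M -f 2 -g 2   # -g 2 = run across two GPUs
```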

@aikitoria

I'm not sure about Ubuntu since I don't use it; I use Debian testing myself and didn't have any problems installing the 570.124.04 driver and the CUDA 12.8 toolkit there. If you need a specific other version of the driver, you can probably just rebase a close version onto it with no merge conflicts; the change is very simple.
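Something like this, sketched with placeholder branch and tag names (substitute the actual ones for the driver version you need):

```bash
# Rough sketch: rebase a patched branch onto a different upstream driver tag.
# <patched-branch> and <target-driver-tag> are placeholders.
git clone https://github.com/aikitoria/open-gpu-kernel-modules.git
cd open-gpu-kernel-modules
git remote add upstream https://github.com/NVIDIA/open-gpu-kernel-modules.git
git fetch upstream --tags
git checkout <patched-branch>      # the existing branch closest to your target version
git rebase <target-driver-tag>     # the upstream tag for the driver you actually need
```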


Ph0rk0z commented Mar 9, 2025

Upon further inspection, I think the issue might be NVIDIA. The driver versions they put in their kernel repo don't match what they package, so the only way to have it work seamlessly would be to rip out the source that is shipped and apply the patches on top of that.

For instance, on my system, clinfo acts like no OpenCL is present despite the libs being under /usr/local/cuda. Even when they are added to the paths, not everything gets picked up. You can't install all the packages related to these things (like the CUDA runtime), because they pull in their own driver files too. I doubt it's distro-dependent.
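One possible explanation (an assumption on my part, not something confirmed in this thread): clinfo discovers implementations through the OpenCL ICD loader rather than LD_LIBRARY_PATH, so NVIDIA's vendor file has to be registered for it to see anything:

```bash
# Rough sketch: the ICD loader reads vendor files from /etc/OpenCL/vendors,
# not from LD_LIBRARY_PATH, so check that NVIDIA's entry exists.
ls /etc/OpenCL/vendors/                     # should contain nvidia.icd
cat /etc/OpenCL/vendors/nvidia.icd          # normally just: libnvidia-opencl.so.1
ldconfig -p | grep nvidia-opencl            # confirm the library itself is resolvable
# If the vendor file is missing (e.g. because the packaged driver wasn't installed),
# creating it by hand may be enough:
echo libnvidia-opencl.so.1 | sudo tee /etc/OpenCL/vendors/nvidia.icd
```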

PyTorch/conda seems to work just fine as-is, so it's just a matter of fixing things for the rest of it.


LuosjDD commented Mar 9, 2025

> I maintain more up to date branches here: https://github.com/aikitoria/open-gpu-kernel-modules

I installed your branch and did some testing, but I cannot get the simpleP2P test to pass on 2x RTX 5070 Ti. 2x RTX 4070 Ti SUPER works well.
I have confirmed that the RTX 5070 Ti BAR1 size meets the requirements, and the CUDA samples version is 12.8. Do you have any suggestions for this CUDA error? By the way, you should enable the Issues feature on your fork.

```
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 2
Checking GPU(s) for support of peer to peer memory access

Peer access from NVIDIA GeForce RTX 5070 Ti (GPU0) -> NVIDIA GeForce RTX 5070 Ti (GPU1) : Yes
Peer access from NVIDIA GeForce RTX 5070 Ti (GPU1) -> NVIDIA GeForce RTX 5070 Ti (GPU0) : Yes
Enabling peer access between GPU0 and GPU1...
Allocating buffers (64MB on GPU0, GPU1 and CPU Host)...
Creating event handles...
CUDA error at /home/a/test/cuda-samples/Samples/0_Introduction/simpleP2P/simpleP2P.cu:170 code=719(cudaErrorLaunchFailure) "cudaEventSynchronize(stop_event)"
```
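For anyone debugging the same failure, a rough sketch of the driver-side checks that can be done before digging into the sample itself (the `topo -p2p` subcommand is an assumption about recent nvidia-smi builds):

```bash
# Rough sketch: inspect BAR1 sizing and the driver's view of P2P capability.
nvidia-smi -q | grep -A 3 "BAR1 Memory Usage"   # total BAR1 per GPU
nvidia-smi topo -m                              # link topology between the GPUs
nvidia-smi topo -p2p r                          # P2P read capability matrix
```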

@aikitoria

I don't currently have a system with multiple 50-series GPUs, so I'm unable to test it on that generation. It's possible the patch needs further changes. You will be on your own making it work for now.


Panchovix commented Apr 7, 2025

Just wondering, does applying P2P affect graphics workloads? (Not compute.)

When applying aikitoria's patch, P2P works fine on 2 of my 4090s, but when trying to do graphics on either of them, or on a 5090 I have in the same PC, the card doesn't boost as it should (Wine). I can confirm that reinstalling the driver from scratch (without P2P) fixes that issue. Fedora 41.

Also, do we have the .run file with or without the kernel modules?

@aikitoria

Cards are boosting correctly for me when running a compute workload. Do you also get the issue when just installing vanilla custom kernel modules (without the patch, but not signed by NVIDIA)?

@Panchovix

Oh, for compute they boost fine, but for graphics they don't, for some reason. I also tried --no-kernel-modules from the .run file and then applying the tinygrad patch, and no luck either; it still doesn't boost normally on graphics.

I haven't tried vanilla custom kernel modules yet, but it's quite possible the issue actually comes from there and not from the patch.

@Panchovix

Oh, and maybe another factor: on Fedora I have to regenerate the initramfs to make P2P work after applying the patch (sudo dracut --regenerate-all --force), else it doesn't work.
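A short sketch of that step plus a sanity check that the patched module is really the one loaded afterwards (update-initramfs as the Debian/Ubuntu equivalent is an assumption for non-Fedora systems):

```bash
# Rough sketch: regenerate the initramfs so the old module isn't loaded from it.
sudo dracut --regenerate-all --force   # Fedora; Debian/Ubuntu: sudo update-initramfs -u
sudo reboot
# After reboot, confirm the loaded module matches the one on disk:
cat /proc/driver/nvidia/version        # version of the currently loaded kernel module
modinfo -F version nvidia              # version of the module file that would be loaded
```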

@aikitoria

Not sure about Fedora. For me on Debian, the process to install it is always:

  1. uninstall previous driver using the uninstall script made by the last installation
  2. install the new driver version using the .run file provided by nvidia (choosing MIT licensed modules)
  3. run install script in the repo on branch matching the version exactly to replace the modules with modified ones
  4. reboot the computer (otherwise it doesn't work correctly)
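A rough shell sketch of those four steps; the .run file name matches the 570.124.04 version mentioned above, the branch name is a placeholder, and the install.sh invocation assumes the fork's script works as mentioned later in this thread:

```bash
# Rough sketch of the four steps above; file and branch names are placeholders.
# 1. Uninstall the previous driver using the script left by the last installation.
sudo nvidia-uninstall

# 2. Install the new driver from NVIDIA's .run file, choosing the open (MIT/GPL) kernel modules.
sudo sh NVIDIA-Linux-x86_64-570.124.04.run -m=kernel-open

# 3. Replace the modules with the patched ones from the matching branch of the fork.
git clone -b <matching-branch> https://github.com/aikitoria/open-gpu-kernel-modules.git
cd open-gpu-kernel-modules && sudo ./install.sh

# 4. Reboot so the patched modules are the ones that get loaded.
sudo reboot
```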

@Panchovix

Okay, just tried without regenerating the initramfs after install.sh, and it does detect P2P for the 4090s and works fine, but graphics workloads still don't boost correctly :(

It is too late ATM for me so tomorrow I will try vanilla custom kernel modules.
