Cuda archive for 12.7 is not available. 12.8 is unsupported. 12.6.3 is in some random PR. #31
Comments
I maintain more up-to-date branches here: https://github.com/aikitoria/open-gpu-kernel-modules
Thanks, will try it out. My next hurdle is that I also have a 22 GB 2080 Ti. If it doesn't affect the 3090s, I'm golden. If it does, I guess I just install the closed-source modules and frown. Edit: it still doesn't match the CUDA repo drivers, so I have to download the CUDA repo and then download the updated driver afterwards. We'll see how that works out. Maybe I'll try my luck with the drivers only, since I am using conda for all environments.
So I tested it with the provided driver (.04) and it works. Only a slight speed bump, but much lower latency. The bigger issue is not having 570.86.10 like the CUDA repo. You can't install its driver packages, and Ubuntu assumes the driver is not installed. To compile NCCL and run the perf test, I needed to manually copy the libs and set the paths. Plus, a more bleeding-edge driver, rather than the tested one shipped with CUDA, may not be better for performance overall. The 2080 Ti doesn't seem to affect anything. I mean, we are using this with CUDA, right? Why make it hard?
I'm not sure about Ubuntu; I don't use it. I use Debian testing myself and didn't have any problems installing the 570.124.04 driver and the CUDA 12.8 toolkit there. If you need a specific other version of the driver, you can probably just rebase a close version onto it with no merge conflicts; the change is very simple.
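To make the "rebase a close version" suggestion concrete, here is a minimal Python sketch of picking the nearest available driver version to the one you need. The helper names `parse_version` and `closest_version` are hypothetical, and the list of available versions is made up for illustration; only 570.86.10 and 570.124.04 come from this thread.

```python
def parse_version(text):
    """Turn a driver version such as '570.86.10' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def closest_version(target, available):
    """Pick the available version nearest to the target.

    Versions on the same major branch (e.g. 570.x) are preferred; within
    the candidates, the distance is compared component by component.
    """
    wanted = parse_version(target)
    same_branch = [v for v in available if parse_version(v)[0] == wanted[0]]
    candidates = same_branch or list(available)
    return min(
        candidates,
        key=lambda v: tuple(abs(a - b) for a, b in zip(parse_version(v), wanted)),
    )


if __name__ == "__main__":
    # Hypothetical list of patched versions; substitute the real branch names.
    patched = ["565.57.01", "570.124.04"]
    print(closest_version("570.86.10", patched))
```

This only suggests a starting point for the rebase; whether the patch applies cleanly still has to be checked by hand.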
Upon further inspection, I think the issue might be NVIDIA. The driver versions they put in their kernel repo don't match what they package. So the only way to have it work seamlessly would be to rip the source that is shipped and apply the patches on top of that. For instance, on my system, clinfo acts like no OpenCL is present despite the libs being under /usr/local/cuda. Even when they are added to the paths, not everything gets picked up. You can't install all the packages relating to these things (like the CUDA runtime) because they pull in their own driver files too. I doubt it's distro dependent. PyTorch/conda seem to work just fine as-is, so it's just a matter of fixing things for the rest of it.
I installed your branch and did some testing, but I cannot get the simpleP2P test to pass on the 2x RTX 5070 Ti. The 2x RTX 4070 Ti S works well. `[./simpleP2P] - Starting...`
I don't currently have a system with multiple 50 series GPUs, so I'm unable to test it on that generation. It's possible the patch needs further changes. You will be on your own making it work for now.
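For anyone narrowing down which GPU pairs fail, a minimal Python sketch that lists peer-access results for every ordered pair of devices may help. The helpers `p2p_matrix` and `blocked_pairs` are hypothetical names; the sketch assumes PyTorch is installed and uses `torch.cuda.can_device_access_peer` as the capability query. It only reports what the driver advertises and does not replace running simpleP2P.

```python
from itertools import permutations


def p2p_matrix(device_count, can_access):
    """Map every ordered pair of distinct device indices to a bool."""
    return {
        (src, dst): bool(can_access(src, dst))
        for src, dst in permutations(range(device_count), 2)
    }


def blocked_pairs(matrix):
    """Return the sorted pairs for which peer access is not reported."""
    return sorted(pair for pair, ok in matrix.items() if not ok)


if __name__ == "__main__":
    try:
        import torch  # assumption: PyTorch with CUDA support is available
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        count = torch.cuda.device_count()
        matrix = p2p_matrix(count, torch.cuda.can_device_access_peer)
        for (src, dst), ok in sorted(matrix.items()):
            print(f"GPU {src} -> GPU {dst}: {'P2P' if ok else 'no P2P'}")
    else:
        print("No CUDA devices visible from PyTorch.")
```

On a mixed system, the output shows at a glance whether only the pairs involving the newer cards are missing peer access.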
Just wondering, does applying P2P affect graphics workloads (not compute)? When applying aikitoria's patch, P2P works fine on 2 of my 4090s, but when trying to do graphics on either of them, or on a 5090 I have in the same PC, the card doesn't boost as it should (Wine). I can confirm that reinstalling the driver from scratch (without P2P) fixes that issue. Fedora 41. Also, do we have the .run file with or without the kernel modules?
Cards are boosting correctly for me when running compute workloads. Do you also get the issue when just installing vanilla custom kernel modules (without the patch, but not signed by NVIDIA)?
Oh, for compute they boost fine, but for graphics they don't, for some reason. I have now tried installing from the .run file with --no-kernel-modules and then applying the tinygrad patch, and no luck either: it still doesn't boost normally on graphics. I haven't tried vanilla custom kernel modules yet, but it's quite possible that the issue comes from there and not from the patch.
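To compare boost behaviour between driver installs with numbers rather than by feel, a minimal Python sketch that reads the current and maximum graphics clocks from nvidia-smi could look like this. The helper names and the 0.8 threshold are assumptions for illustration, and the sample values in the usage note are made up. Run it while the graphics workload is active.

```python
import subprocess

# Query fields understood by `nvidia-smi --query-gpu`.
QUERY = "index,name,clocks.current.graphics,clocks.max.graphics"


def parse_clock_rows(csv_text):
    """Parse `--format=csv,noheader,nounits` output into dicts (clocks in MHz)."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, name, current, maximum = [field.strip() for field in line.split(",")]
        rows.append(
            {
                "index": int(index),
                "name": name,
                "current_mhz": int(current),
                "max_mhz": int(maximum),
            }
        )
    return rows


def below_boost(rows, ratio=0.8):
    """Return indices of GPUs running below `ratio` of their maximum clock."""
    return [r["index"] for r in rows if r["current_mhz"] < ratio * r["max_mhz"]]


def read_clocks():
    """Call nvidia-smi and return the parsed rows."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_clock_rows(result.stdout)


if __name__ == "__main__":
    rows = read_clocks()
    print("GPUs below the boost threshold:", below_boost(rows))
```

For example, feeding `parse_clock_rows` the made-up text `"0, NVIDIA GeForce RTX 4090, 2520, 3105\n1, NVIDIA GeForce RTX 5090, 600, 3090"` and passing the result to `below_boost` flags only GPU 1.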
Oh, and possibly another factor: on Fedora I do have to regenerate the initramfs to make P2P work after applying the patch (sudo dracut --regenerate-all --force), else it doesn't work.
Not sure about Fedora. For me on Debian the process to install it is always:
Okay, I just tried without regenerating the initramfs with dracut after install.sh, and it does detect P2P for the 4090s and works fine, but graphics workloads still don't boost correctly :( It is too late for me at the moment, so tomorrow I will try vanilla custom kernel modules.
NVIDIA Open GPU Kernel Modules Version
565
Operating System and Version
Ubuntu 22.04
Kernel Release
n/a
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
Build Command
none
Terminal output/Build Log
none
More Info
If you look at https://developer.nvidia.com/cuda-toolkit-archive, the archive stops at 12.6.3, so the drivers cannot be downloaded.
The 12.6.3 driver is still a PR. Basically, there is no repo to build against if you want up-to-date CUDA. Just pick a random version and cross your fingers?