Dual 4090s report "Cuda failure p2pBandwidthLatencyTest.cu:189: 'mapping of buffer object failed'" on a third-gen Xeon platform with a PCIe Gen4 switch #33

Open
2 tasks done
aosudh opened this issue Mar 7, 2025 · 5 comments
Labels
bug Something isn't working

Comments

@aosudh

aosudh commented Mar 7, 2025

NVIDIA Open GPU Kernel Modules Version

550.90.07-p2p, 565.57.01-p2p

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Ubuntu 22.04.2 LTS, Ubuntu 24.04.2 LTS

Kernel Release

Linux 4090 6.8.0-55-generic #57-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb 12 23:42:21 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Hardware: GPU

RTX4090

Describe the bug

With both the system driver and the P2P kernel driver installed, running nvidia-smi topo -p2p p shows that the devices support P2P.

Image

For comparison, the results obtained with the non-P2P driver are shown in the following figure.

Image

However, both in Docker and on the host, the p2pBandwidthLatencyTest and simpleP2P samples (built with CUDA 12.4 and 12.6 from the matching CUDA samples releases) detect P2P capability correctly, but the P2P tests themselves fail.

Image

Image

To Reproduce

Install the driver with the --no-kernel-module option. Run install.sh. Install CUDA from the runfile. Install the CUDA samples, compile them, and run p2pBandwidthLatencyTest or simpleP2P.

Bug Incidence

Always

nvidia-bug-report.log.gz

nvidia-bug-report.log.gz

More Info

No response

@aosudh aosudh added the bug Something isn't working label Mar 7, 2025
@lambo111-x86

+1

@zvorinji

This is the expected behavior behind a switch. The Linux kernel does not support it yet. There is some work on this topic going on but I think it's probably 6-12 months before it's done if ever. And even so, it's unclear if the p2p driver will work out of the box or ever from those changes.
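
For anyone unsure whether their GPUs actually sit behind a shared switch, here is a minimal Python sketch that compares the sysfs device paths of two PCI devices. It relies only on the fact that Linux encodes the PCI hierarchy in the resolved path under /sys/bus/pci/devices. The helper names and the example addresses are hypothetical, and the reading of the result is a heuristic: bridges shared beyond the first entry (typically the root port) suggest a common switch, but this is not a definitive check.

```python
import os
import re

# Matches a PCI address such as 0000:19:00.0 (domain:bus:device.function).
PCI_ADDR = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def pci_chain(resolved_path):
    """Return the PCI addresses along a resolved sysfs path, root side first."""
    return [part for part in resolved_path.split("/") if PCI_ADDR.match(part)]


def shared_upstream(path_a, path_b):
    """Return the leading PCI addresses that both device paths have in common."""
    shared = []
    for a, b in zip(pci_chain(path_a), pci_chain(path_b)):
        if a != b:
            break
        shared.append(a)
    return shared


def resolve(bdf):
    """Resolve a PCI address to its full sysfs path (Linux only)."""
    return os.path.realpath(os.path.join("/sys/bus/pci/devices", bdf))


if __name__ == "__main__":
    # Hypothetical paths for two GPUs below one switch.
    # On a real machine, use resolve("<address of the GPU>") instead.
    gpu0 = "/sys/devices/pci0000:16/0000:16:00.0/0000:17:00.0/0000:18:08.0/0000:19:00.0"
    gpu1 = "/sys/devices/pci0000:16/0000:16:00.0/0000:17:00.0/0000:18:10.0/0000:1a:00.0"
    print(shared_upstream(gpu0, gpu1))
```

In the hypothetical layout above, the two GPUs share two upstream bridges (a root port and what would be the switch's upstream port) before their paths diverge at the downstream ports. An empty result would mean the devices hang off different root complexes.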

@hyson666

Same problem. Has anyone solved it?

@developer-hq

same problem

@caolonghao

same problem
