nv_drm_atomic_commit can silently fail due to down_interruptible in nvkms_ioctl_common being interrupted, causing the Jay wayland compositor to hang #832

Open
khyperia opened this issue Apr 20, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@khyperia

NVIDIA Open GPU Kernel Modules Version

570.133.07

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Arch Linux

Kernel Release

6.14.2-arch1-1

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Hardware: GPU

NVIDIA GeForce RTX 2080

Describe the bug

When running the Jay wayland compositor, the screen freezes after a few seconds/minutes of use.

To Reproduce

Run the Jay wayland compositor. Wait a few minutes (move the mouse around/etc. until it freezes)

Bug Incidence

Always

nvidia-bug-report.log.gz

nvidia-bug-report.log.gz

More Info

Long story short, this down_interruptible call being interrupted is the cause of the issue. Replacing it with while ((status = down_interruptible(&nvkms_lock)) == -EINTR); or just down(&nvkms_lock); (and removing the status variable) fully fixes the issue for me and makes Jay fully stable (I am writing this post in Jay on nvidia at the moment).

status = down_interruptible(&nvkms_lock);
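
To illustrate the difference between a single interruptible attempt and retrying while the result is -EINTR, here is a minimal user-space sketch. It is not driver code: `mock_sem`, `mock_down_interruptible`, `lock_once`, and `lock_retry` are hypothetical stand-ins invented for this example, and the mock simply reports -EINTR a fixed number of times before succeeding.

```c
#include <errno.h>

/* Hypothetical stand-in for a kernel semaphore: it reports -EINTR for the
 * first `pending_signals` attempts and then lets the caller take the lock. */
struct mock_sem {
    int pending_signals;
    int held;
};

static int mock_down_interruptible(struct mock_sem *s)
{
    if (s->pending_signals > 0) {
        s->pending_signals--;
        return -EINTR;
    }
    s->held = 1;
    return 0;
}

/* Single attempt: an interruption is handed straight back to the caller,
 * who then has to deal with a failed lock acquisition. */
static int lock_once(struct mock_sem *s)
{
    return mock_down_interruptible(s);
}

/* Retry variant: keep trying while the attempt was interrupted, so the
 * caller only ever sees success. */
static int lock_retry(struct mock_sem *s)
{
    int status;

    do {
        status = mock_down_interruptible(s);
    } while (status == -EINTR);

    return status;
}
```

With two simulated interruptions, `lock_once` returns -EINTR and leaves the lock untaken, while `lock_retry` returns 0 with the lock held. In the real kernel, as far as I understand, a retry loop like this may spin while a signal stays pending and the semaphore is contended, which is one reason plain down() looks like the cleaner of the two options.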

The call stack:

In reverse order:

  • down_interruptible is interrupted, returning -EINTR
  • which causes KmsFlip to fail (and print "NVKMS_IOCTL_FLIP ioctl failed"), returning false (instead of bubbling up -EINTR)
  • which causes applyModeSetConfig to return false
  • which makes nv_drm_atomic_apply_modeset_config return -EINVAL
  • causing nv_drm_atomic_commit to fail (and print "Failed to apply atomic modeset. Error code: -22"). However, nv_drm_atomic_commit still returns 0 to userland, success.
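
The chain above can be sketched as a small user-space model that shows where the error code is lost. Every function here is a simplified, hypothetical stand-in named after the corresponding step in the call stack, not the actual driver code; the `interrupted` flag and the `logged_error` out-parameter (which stands in for the kernel log message) are assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for nvkms_ioctl_common: fails with -EINTR when the lock
 * acquisition is interrupted. */
static int mock_nvkms_ioctl(bool interrupted)
{
    return interrupted ? -EINTR : 0;
}

/* Stand-in for KmsFlip: collapses the error code into a bool, so the
 * caller can no longer tell -EINTR from any other failure. */
static bool mock_kms_flip(bool interrupted)
{
    return mock_nvkms_ioctl(interrupted) == 0;
}

/* Stand-in for applyModeSetConfig: passes the bool through. */
static bool mock_apply_modeset_config(bool interrupted)
{
    return mock_kms_flip(interrupted);
}

/* Stand-in for nv_drm_atomic_apply_modeset_config: turns any failure
 * into -EINVAL. */
static int mock_atomic_apply_modeset_config(bool interrupted)
{
    return mock_apply_modeset_config(interrupted) ? 0 : -EINVAL;
}

/* Stand-in for nv_drm_atomic_commit: records the failure (the driver
 * prints it) but still reports success to the caller. */
static int mock_atomic_commit(bool interrupted, int *logged_error)
{
    int ret = mock_atomic_apply_modeset_config(interrupted);

    if (ret != 0) {
        *logged_error = ret;
    }
    return 0;
}
```

Calling `mock_atomic_commit(true, &logged)` returns 0 while `logged` ends up as -EINVAL: the interruption has become an unrelated error code in the log, and the caller is told the commit succeeded.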

Some comments:
In nv_drm_atomic_commit, it states:

/*
 * nv_drm_atomic_commit_internal() must not return failure after
 * calling drm_atomic_helper_swap_state().
 */

(presumably, nv_drm_atomic_commit used to be called nv_drm_atomic_commit_internal). However, after calling drm_atomic_helper_swap_state, it calls nv_drm_atomic_apply_modeset_config, which eventually calls down_interruptible, which can obviously "fail" (be interrupted). My guess is that nvkms_ioctl_common was never intended to be called in such a strict error-handling context, hence the use of down_interruptible rather than down.

Additionally, the author of Jay said that this is likely an interaction with io_uring, with respect to signals and interruption.

I originally filed this issue with Jay. Some comments from the author of Jay and additional context can be found here: mahkoh/jay#425

@khyperia khyperia added the bug Something isn't working label Apr 20, 2025
@AlexGoinsNV

Thank you for the detailed bug report and analysis. This issue is being tracked internally via bug 5236368.

@khyperia
Author

Thank you for looking into this!

Just an update: the author of Jay committed mahkoh/jay#441 to the Jay project, which makes this issue significantly less likely to reproduce (an interrupt from io_uring is less likely to happen), although the issue still needs to be fixed in the driver. So, to reproduce it easily, use a version of Jay from before that commit; I believe 1.10.0 is the latest release before it. Hopefully you can get a repro using your own internal code and not through Jay, though!
