nv_drm_atomic_commit can silently fail due to down_interruptible in nvkms_ioctl_common being interrupted, causing the Jay wayland compositor to hang
#832
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
I confirm that this does not happen with the proprietary driver package.
Operating System and Version
Arch Linux
Kernel Release
6.14.2-arch1-1
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
I am running on a stable kernel release.
Hardware: GPU
NVIDIA GeForce RTX 2080
Describe the bug
When running the Jay wayland compositor, the screen freezes after a few seconds/minutes of use.
To Reproduce
Run the Jay wayland compositor. Wait a few minutes (move the mouse around/etc. until it freezes)
Long story short, the down_interruptible call in nvkms_ioctl_common being interrupted is the cause of the issue. Replacing it with while ((status = down_interruptible(&nvkms_lock)) == -EINTR); or just down(&nvkms_lock); (and removing the status variable) fully fixes the issue for me and makes Jay fully stable (I am writing this post in Jay on nvidia at the moment).
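To illustrate the retry variant of the workaround, here is a minimal userspace sketch. It is not driver code: `mock_sem`, `mock_down_interruptible`, and `lock_with_retry` are hypothetical names made up for this example, and the mock simply pretends that a fixed number of acquisition attempts are interrupted before one succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel semaphore. The next
 * `pending_interrupts` acquisition attempts fail with -EINTR. */
struct mock_sem {
    int pending_interrupts;
    int held;
};

/* Mimics the return convention of down_interruptible():
 * 0 on success, -EINTR if the wait was interrupted. */
int mock_down_interruptible(struct mock_sem *sem)
{
    assert(sem != NULL);
    if (sem->pending_interrupts > 0) {
        sem->pending_interrupts--;
        return -EINTR;
    }
    sem->held = 1;
    return 0;
}

/* Retry until the lock is actually taken, so the caller never sees
 * -EINTR. Returns how many attempts were interrupted. */
int lock_with_retry(struct mock_sem *sem)
{
    int retries = 0;

    while (mock_down_interruptible(sem) == -EINTR)
        retries++;
    return retries;
}
```

With a mock that is interrupted three times, `lock_with_retry` returns 3 and leaves the lock held. Using plain down() in the driver would have the same effect as this loop without the explicit retry.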
The call stack ends in nvkms_ioctl_common, which calls down_interruptible to obtain a lock on the global nvkms_lock.
In reverse order:
down_interruptible is interrupted, returning -EINTR
which causes KmsFlip to fail (and print "NVKMS_IOCTL_FLIP ioctl failed"), returning false (instead of bubbling up -EINTR)
which causes applyModeSetConfig to return false
which makes nv_drm_atomic_apply_modeset_config return -EINVAL
causing nv_drm_atomic_commit to fail (and print "Failed to apply atomic modeset. Error code: -22"). However, nv_drm_atomic_commit still returns 0 (success) to userland.
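The chain above can be modeled in a few lines of userspace C to show where the original error code is lost. This is only a sketch of the behavior described in this report, not the driver's implementation: every `model_*` function is a hypothetical stand-in, and the `interrupted` flag is an assumption used to simulate the interrupted lock acquisition.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the ioctl layer: an interrupted lock acquisition
 * surfaces as -EINTR. */
int model_ioctl(bool interrupted)
{
    return interrupted ? -EINTR : 0;
}

/* Stand-in for KmsFlip: any ioctl error collapses to `false`. */
bool model_kms_flip(bool interrupted)
{
    if (model_ioctl(interrupted) != 0) {
        fprintf(stderr, "NVKMS_IOCTL_FLIP ioctl failed\n");
        return false;
    }
    return true;
}

/* Stand-in for applyModeSetConfig: passes the boolean through. */
bool model_apply_config(bool interrupted)
{
    return model_kms_flip(interrupted);
}

/* Stand-in for nv_drm_atomic_apply_modeset_config: `false` becomes
 * -EINVAL, so -EINTR is no longer recoverable from here on. */
int model_apply_modeset_config(bool interrupted)
{
    return model_apply_config(interrupted) ? 0 : -EINVAL;
}

/* Stand-in for nv_drm_atomic_commit: logs the failure but still
 * reports success to the caller. */
int model_atomic_commit(bool interrupted)
{
    int ret = model_apply_modeset_config(interrupted);

    /* The original -EINTR never reaches this level. */
    assert(ret == 0 || ret == -EINVAL);
    if (ret != 0)
        fprintf(stderr,
                "Failed to apply atomic modeset. Error code: %d\n", ret);
    return 0;
}
```

In this model, `model_atomic_commit(true)` logs both messages and returns 0, so a caller cannot distinguish an interrupted commit from a successful one.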
Some comments:
In nv_drm_atomic_commit, it states:
/*
 * nv_drm_atomic_commit_internal() must not return failure after
 * calling drm_atomic_helper_swap_state().
 */
(presumably, nv_drm_atomic_commit used to be called nv_drm_atomic_commit_internal). However, after calling drm_atomic_helper_swap_state, it calls nv_drm_atomic_apply_modeset_config, which eventually calls down_interruptible, which can obviously "fail" (be interrupted). My guess is that nvkms_ioctl_common was never intended to be called in such a strict error handling context, hence the use of down_interruptible rather than down.
Additionally, the author of Jay said that this is likely an interaction with io_uring, with respect to signals and interruption.
I originally filed this issue with Jay. Some comments from the author of Jay and additional context can be found here: mahkoh/jay#425
Just an update: the author of Jay committed mahkoh/jay#441 to the Jay project, which makes this issue significantly harder to reproduce (an interrupt from io_uring is now less likely), although the underlying bug still needs to be fixed. To reproduce it easily, use a version of Jay from before that commit; I believe 1.10.0 is the latest such release. Hopefully you can get a repro using your own internal code and not through Jay, though!
NVIDIA Open GPU Kernel Modules Version
570.133.07
Bug Incidence
Always
nvidia-bug-report.log.gz
More Info
The down_interruptible call in question: open-gpu-kernel-modules/kernel-open/nvidia-modeset/nvidia-modeset-linux.c, line 1266 in 4159579