MelReyCG
Contributor

@MelReyCG MelReyCG commented Oct 8, 2025

This PR builds on Amandine's work adding an error YAML file to GEOS (PR #3828). It adds detection and management inside GEOS of (1) error signals and (2) external errors from dependencies, so that both can be handled and written to the log and to the error YAML file.
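As an illustration of the signal-detection part, here is a minimal POSIX sketch. It is not the code from this PR: `g_lastSignal`, `recordSignal` and `installHandlers` are hypothetical names, and the handler only records the signal number and writes a fixed message with the async-signal-safe `write`.

```cpp
#include <csignal>
#include <cstring>
#include <vector>
#include <signal.h>
#include <unistd.h>

// Hypothetical global: the last signal number seen by the handler.
volatile std::sig_atomic_t g_lastSignal = 0;

// Hypothetical handler: restricted to async-signal-safe operations.
extern "C" void recordSignal(int signalNumber) {
  g_lastSignal = signalNumber;
  static const char message[] = "error signal caught\n";
  ssize_t written = write(STDERR_FILENO, message, sizeof(message) - 1);
  (void)written;  // nothing useful can be done here if the write fails
}

// Install the handler for every signal in the list; returns false on failure.
bool installHandlers(const std::vector<int>& signals) {
  struct sigaction action;
  std::memset(&action, 0, sizeof(action));
  action.sa_handler = recordSignal;
  sigemptyset(&action.sa_mask);
  for (int signalNumber : signals) {
    if (sigaction(signalNumber, &action, nullptr) != 0) {
      return false;
    }
  }
  return true;
}
```

For fatal signals such as `SIGSEGV`, a real handler should not simply return. A common pattern is to report the error, restore the default action, and re-raise the signal.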

Managing those external errors gives us the opportunity to:

  • detect any kernel / system allocator errors,
  • add the stack-trace of the error,
  • output them reliably in the log, even if stderr is lost or used for another purpose,
  • group (factorize) them with external tools / scripts, thus highlighting which rank(s) are the source of the issue.
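The last point can be sketched as follows, assuming each reported error boils down to a rank and a message. `ErrorRecord` and `groupRanksByMessage` are hypothetical names for illustration; they do not reflect the actual error YAML schema or any existing tool.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical record: one error as reported by one rank.
struct ErrorRecord {
  int rank;
  std::string message;
};

// Group identical messages together and list the ranks that reported each one.
std::map<std::string, std::vector<int>> groupRanksByMessage(
    const std::vector<ErrorRecord>& records) {
  std::map<std::string, std::vector<int>> ranksByMessage;
  for (const ErrorRecord& record : records) {
    ranksByMessage[record.message].push_back(record.rank);
  }
  return ranksByMessage;
}
```

A message reported by a single rank then stands out from one reported by every rank, which is usually the first thing to check when tracking down the source of a failure.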

This PR also prevents the stack trace from being cut by messages from other ranks, which could previously happen on a signal.
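One common way to limit this kind of interleaving, shown here only as a sketch and not necessarily what this PR does, is to build the whole report in memory and emit it with a single `write` call instead of one call per frame. `formatStackTrace` and `emitOnce` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>
#include <unistd.h>

// Build the complete report first, prefixing every line with the rank.
std::string formatStackTrace(int rank, const std::vector<std::string>& frames) {
  const std::string prefix = "Rank " + std::to_string(rank) + ": ";
  std::string report = "***** " + prefix + "stack trace\n";
  for (std::size_t i = 0; i < frames.size(); ++i) {
    report += prefix + "frame " + std::to_string(i) + ": " + frames[i] + "\n";
  }
  return report;
}

// Emit the report with one write() call; returns true if every byte was written.
bool emitOnce(int fileDescriptor, const std::string& report) {
  ssize_t written = write(fileDescriptor, report.data(), report.size());
  return written >= 0 && static_cast<std::size_t>(written) == report.size();
}
```

Whether a single `write` stays contiguous in the output depends on the target (pipe, file, terminal) and on the report size, so this reduces interleaving rather than guaranteeing its absence. Building a `std::string` is also not async-signal-safe, so inside a signal handler the buffer would have to be preallocated.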

We can imagine later adding a tag for each dependency (system, LvArray, Hypre, ...) to quickly identify / filter the source of an issue.


(Replaces #3722)

MelReyCG added 19 commits July 7, 2025 17:46
…o feature/rey/signal-and-external-error-managment
…ate-yaml-file-and-structure' into feature/rey/signal-and-external-error-managment
…o feature/rey/signal-and-external-error-managment
…o feature/rey/signal-and-external-error-managment
…o feature/rey/signal-and-external-error-managment
…file-and-structure-2' into feature/rey/signal-and-external-error-managment
…file-and-structure-2' into feature/rey/signal-and-external-error-managment
…external-error-managment' into feature/rey/signal-and-external-error-managment-2
@MelReyCG MelReyCG self-assigned this Oct 8, 2025
@MelReyCG MelReyCG added the type: bug Something isn't working label Oct 8, 2025
@MelReyCG MelReyCG added ci: run CUDA builds Allows to triggers (costly) CUDA jobs flag: ready for review ci: run integrated tests Allows to run the integrated tests in GEOS CI labels Oct 8, 2025
MelReyCG and others added 3 commits October 10, 2025 13:36
…nto feature/rey/signal-and-external-error-managment-2
…file-and-structure-2' into feature/rey/signal-and-external-error-managment-2
…file-and-structure-2' into feature/rey/signal-and-external-error-managment-2