Description
Version
24.12
On which installation method(s) does this occur?
Docker
Describe the issue
Issue: Validation Loss Not Converging for Bracket Example
We are encountering an issue where the validation loss does not converge during training for the bracket example. We tested three Modulus container versions (24.12, 23.08, and 23.05) and observed different outcomes:
Container Version 24.12:
The validation loss remained unchanged even after 2 million iterations, which suggests that the model is making no progress toward convergence.
Container Version 23.08:
Without restarting from a checkpoint, the results improved somewhat, but the model did not fully converge, especially for the Z components. After 2 million iterations, the validation loss plateaued at 0.7 for the Z components and at 0.3 for the other components.
When restarting from the checkpoint, the loss curve began to oscillate or diverge, preventing further progress.
Container Version 23.05:
The results with this version were similar to those with container version 23.08: convergence remained incomplete, particularly for the Z components.
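To make "unchanged", "plateaued", and "oscillating or diverging" easier to compare across containers, the tail of each loss curve can be classified with a small helper. This is a minimal sketch, assuming the validation losses for one component have already been exported from the training logs as a list of floats; `classify_loss_trend`, `window`, and `tol` are hypothetical names and arbitrary thresholds, not part of Modulus.

```python
from typing import Sequence


def classify_loss_trend(losses: Sequence[float], window: int = 100, tol: float = 1e-3) -> str:
    """Classify the recent trend of a validation-loss history (hypothetical helper).

    Compares the mean of the last `window` values with the mean of the
    `window` values before them:
      - "plateau"   if the relative change is smaller than `tol`
      - "improving" if the loss went down
      - "worsening" if the loss went up (e.g. oscillation or divergence)
    """
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    prev_mean = sum(losses[-2 * window:-window]) / window
    last_mean = sum(losses[-window:]) / window
    # Relative change; guard against division by zero for a zero loss.
    rel_change = (last_mean - prev_mean) / max(abs(prev_mean), 1e-12)
    if abs(rel_change) < tol:
        return "plateau"
    return "improving" if rel_change < 0 else "worsening"


if __name__ == "__main__":
    # Synthetic curves only; these are not the losses from this report.
    print(classify_loss_trend([0.7] * 400))                            # plateau
    print(classify_loss_trend([1.0 - i * 0.001 for i in range(400)]))  # improving
```

A windowed mean is used rather than comparing single points so that step-to-step noise in the validation loss does not dominate the classification.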
ldc_2d_zeroEq Example with Container Version 24.12:
The validation loss converges for the ldc_2d_zeroEq example, so the convergence problem does not appear to affect that example.
Minimum reproducible example
Relevant log output
Environment details
Other/Misc.
No response


