Replies: 1 comment 1 reply
- Introducing randomness may make the training process more unstable, and convergence may be affected?
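
One concrete way this instability could show up, sketched minimally in PyTorch (the vocabulary size and target token id below are illustrative assumptions): generation-style top-k masking sets every non-top-k logit to `-inf`, so if that masking were folded into the training loss, the cross-entropy would become infinite whenever the target token falls outside the top-k.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(1, 50257)   # one position over a GPT-2-sized vocab (illustrative)
target = torch.tensor([123])     # arbitrary target token id (illustrative)

# Generation-style top-k masking: keep the 50 largest logits, set the rest to -inf
top_k = 50
kth_value = torch.topk(logits, top_k).values[..., -1, None]
masked = torch.where(logits < kth_value, torch.tensor(float("-inf")), logits)

print(F.cross_entropy(logits, target))  # finite loss on the raw logits
print(F.cross_entropy(masked, target))  # inf whenever the target falls outside the top-k
```

On top of that, the actual token choice via `torch.multinomial` is a random, non-differentiable operation, so gradients could not flow through the sampling step in the first place.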
- The formula used in the loss calculation and the formula used for the final text generation with temperature, top_k, and sampling differ in how randomness is applied to the final logits, essentially creating two distinct paths: one for training and one for using the model. I am wondering whether incorporating the (modified) logit calculation from `text_generation` into the training loss calculation could benefit the model's performance.
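
For concreteness, here is a minimal sketch of those two paths as they typically look in a GPT-style PyTorch setup (the function names and the exact generation logic below are illustrative assumptions, not this repository's actual code):

```python
import torch
import torch.nn.functional as F

def training_loss(model, input_ids, target_ids):
    # Training path: cross-entropy on the raw logits.
    # No temperature, no top-k, no sampling.
    logits = model(input_ids)                      # (batch, seq, vocab)
    return F.cross_entropy(logits.flatten(0, 1), target_ids.flatten())

@torch.no_grad()
def generate_step(model, input_ids, temperature=1.0, top_k=50):
    # Generation path: the final logits are reshaped by temperature
    # and top-k masking before a random multinomial draw.
    logits = model(input_ids)[:, -1, :]            # logits for the last position
    if top_k is not None:
        kth = torch.topk(logits, top_k).values[..., -1, None]
        logits = torch.where(logits < kth, torch.tensor(float("-inf")), logits)
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)  # random next-token choice
```

In the training path, gradients flow through the full softmax over the raw logits; in the generation path, the top-k masking discards most of the distribution and the multinomial draw is random and non-differentiable, which is part of why the two paths are normally kept separate.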