Elastic resets #2566

t0278611 · 2025-08-14T12:38:15Z

Using model EMA (at least for small-ish models) seems to drastically improve the validation results. I wonder why the EMA weights are never used in training (a bit like the Lookahead optimizer). It should be fairly straightforward to implement "elastic resets", where the online weights are periodically overwritten by the EMA weights. I did not find this feature in the train args; if it exists, please point it out.
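To make the idea concrete, here is a minimal, framework-free sketch of what I mean. Weights are plain Python floats so it runs without any dependencies; with a real model the same two steps would apply to each parameter tensor. All names here (`ema_update`, `elastic_reset`, `train`, `reset_every`) are hypothetical and not existing train args or library functions.

```python
def ema_update(ema, online, decay):
    """In-place EMA step: ema <- decay * ema + (1 - decay) * online."""
    for i, w in enumerate(online):
        ema[i] = decay * ema[i] + (1.0 - decay) * w


def elastic_reset(online, ema):
    """Overwrite the online weights in place with the current EMA weights."""
    online[:] = ema


def train(online, grad_fn, steps, lr=0.1, decay=0.9, reset_every=5):
    """Plain gradient descent with an EMA copy and periodic elastic resets.

    `reset_every` is a hypothetical option: every that many steps the online
    weights are replaced by the EMA weights. Set it to 0 to disable resets.
    """
    ema = list(online)  # the EMA starts as a copy of the initial weights
    resets = 0
    for step in range(1, steps + 1):
        grads = grad_fn(online)
        for i, g in enumerate(grads):
            online[i] -= lr * g  # ordinary optimizer step on the online weights
        ema_update(ema, online, decay)
        if reset_every and step % reset_every == 0:
            elastic_reset(online, ema)
            resets += 1
    return ema, resets


if __name__ == "__main__":
    # Toy problem: minimise 0.5 * (w - 1)^2, whose gradient is (w - 1).
    weights = [0.0]
    ema, resets = train(weights, lambda ws: [w - 1.0 for w in ws], steps=500)
    print(weights, ema, resets)
```

One thing this sketch ignores is optimizer state (momentum, Adam moments): after a reset that state no longer matches the weights, so a real implementation would probably need to decide whether to keep or clear it.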

Replies: 0 comments

Select a reply

Uh oh!

Uh oh!

Elastic resets #2566

Uh oh!

t0278611 Aug 14, 2025

Replies: 0 comments

t0278611
Aug 14, 2025