During training, I first experimented with bf16 precision combined with 8-bit AdamW. Training ran without issues, but the quality of the generated images was noticeably poor. I then followed the setup described in the paper and switched to the standard AdamW optimizer. However, when training in bf16 with standard AdamW, classifier-free guidance (CFG) could not be enabled properly and inference produced entirely blurry outputs. Based on this, I suspect the paper used standard AdamW with fp32 precision, though I'm not entirely sure this assumption is correct.
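For context, the comparison I ran looks roughly like the sketch below. The model, learning rate, and loss are placeholders rather than my actual training script, and the bitsandbytes `AdamW8bit` line is shown commented out as the 8-bit variant I tried:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(1024, 1024).to(device)  # stand-in for the real network

# Setup A: bf16 autocast + 8-bit AdamW (bitsandbytes); trained fine but sampled poorly
# import bitsandbytes as bnb
# optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-4)

# Setup B: what I suspect the paper used: standard AdamW with fp32 master weights
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(10):
    x = torch.randn(16, 1024, device=device)
    # drop the autocast block entirely for a pure fp32 run
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        loss = model(x).pow(2).mean()
    loss.backward()  # bf16 needs no GradScaler (unlike fp16)
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
```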
Additionally, the paper mentions a batch size of 128. I assume this refers to the global batch size—i.e., with 8 GPUs, each GPU uses a batch size of 16. Again, I’m not certain if this interpretation is accurate.
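Concretely, my interpretation is the arithmetic below; only the 128 and the 8-GPU setup come from the paper and my own hardware, the rest is my assumption:

```python
world_size = 8                   # number of GPUs
global_batch_size = 128          # value reported in the paper
per_gpu_batch_size = global_batch_size // world_size
print(per_gpu_batch_size)        # 16, the per-process DataLoader batch size I would use
```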
Your replicated training code is an impressive piece of work! I'm deeply interested in the implementation details of your training code. Would you be willing to share the complete code for educational or collaborative purposes? I'd be happy to discuss compensating you for your work or sponsoring your efforts. Here is my email: [email protected]