During training, I first experimented with bf16 precision combined with 8-bit AdamW. Training ran without issues, but the quality of the generated images was noticeably poor. I then followed the setup described in the paper and switched to the standard AdamW optimizer. However, when training in bf16 with standard AdamW, classifier-free guidance (CFG) could not be enabled properly and inference produced entirely blurry outputs. Based on this, I suspect the paper used standard AdamW with fp32 precision, though I'm not entirely sure this assumption is correct.
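For context, the comparison I ran looks roughly like the sketch below. The model, learning rate, and loss are placeholders rather than my actual training script, and the bitsandbytes `AdamW8bit` line is shown commented out as the 8-bit variant I tried:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(1024, 1024).to(device)  # stand-in for the real network

# Setup A: bf16 autocast + 8-bit AdamW (bitsandbytes); trained fine but sampled poorly
# import bitsandbytes as bnb
# optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-4)

# Setup B: what I suspect the paper used: standard AdamW with fp32 master weights
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(10):
    x = torch.randn(16, 1024, device=device)
    # drop the autocast block entirely for a pure fp32 run
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        loss = model(x).pow(2).mean()
    loss.backward()  # bf16 needs no GradScaler (unlike fp16)
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
```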
Additionally, the paper mentions a batch size of 128. I assume this refers to the global batch size—i.e., with 8 GPUs, each GPU uses a batch size of 16. Again, I’m not certain if this interpretation is accurate.
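Concretely, my interpretation is the arithmetic below; only the 128 and the 8-GPU setup come from the paper and my own hardware, the rest is my assumption:

```python
world_size = 8                   # number of GPUs
global_batch_size = 128          # value reported in the paper
per_gpu_batch_size = global_batch_size // world_size
print(per_gpu_batch_size)        # 16, the per-process DataLoader batch size I would use
```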
Your replicated training code is an impressive piece of work! I'm deeply interested in the implementation details of your training code. Would you be willing to share the complete code for educational or collaborative purposes? I'd be happy to discuss compensating you for your work or sponsoring your efforts. Here is my email: [email protected]