
some details about train #113


wwwjh333 opened this issue Apr 8, 2025 · 1 comment


wwwjh333 commented Apr 8, 2025

During training, I experimented with bf16 precision combined with 8-bit AdamW. The training ran without issues, but the quality of the generated images was noticeably poor. I then followed the setup described in the paper and switched to the standard AdamW optimizer, and found that with bf16 + AdamW, classifier-free guidance (CFG) could not be enabled properly, resulting in entirely blurry outputs at inference. Based on this, I suspect the paper used standard AdamW with fp32 precision, though I'm not entirely sure my assumption is correct.
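For reference, here is a minimal sketch of the setup I suspect the paper used: fp32 master weights with the standard `torch.optim.AdamW`, with bf16 autocast limited to the forward/backward compute. The model and hyperparameters below are placeholders, not the paper's actual configuration:

```python
import torch
import torch.nn as nn

# Minimal sketch (placeholder model, not the paper's architecture):
# fp32 master weights + standard AdamW, with bf16 autocast restricted to
# the forward pass so parameters and optimizer state stay in fp32.
model = nn.Linear(512, 512).cuda().float()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

x = torch.randn(16, 512, device="cuda")
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = model(x).pow(2).mean()   # compute runs in bf16
loss.backward()                     # grads land in fp32 (parameter dtype)
optimizer.step()
optimizer.zero_grad(set_to_none=True)
```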

Additionally, the paper mentions a batch size of 128. I assume this refers to the global batch size, i.e., with 8 GPUs, each GPU uses a per-device batch size of 16. Again, I'm not certain this interpretation is accurate.
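If that interpretation is right, the per-device batch size is just the global batch size divided by the world size. A quick sanity check under `torch.distributed` (a world size of 8 is assumed when not running distributed):

```python
import torch.distributed as dist

GLOBAL_BATCH_SIZE = 128  # as reported in the paper (assumed to be global)

# Under DDP, the per-GPU DataLoader batch size is the global batch size
# divided by the world size: 128 / 8 GPUs = 16 per GPU.
world_size = dist.get_world_size() if dist.is_initialized() else 8
per_gpu_batch_size = GLOBAL_BATCH_SIZE // world_size
assert per_gpu_batch_size * world_size == GLOBAL_BATCH_SIZE
print(per_gpu_batch_size)  # -> 16 with 8 GPUs
```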


all2xii commented Apr 12, 2025

Your replicated training code is an impressive piece of work! I'm deeply interested in its implementation details. Would you be willing to share the complete code for educational or collaborative purposes? I'd be happy to discuss compensating you for your work or sponsoring your efforts. This is my email: [email protected]
