Incorrect bucket sizes are generated

While experimenting with adding native Qwen resolutions to the bucket sizes, I discovered that the function get_bucket_sizes is adding new, incorrect resolutions to the bucket list. I believe this is due to an unintended type promotion. Basically, the mod operations that check for a matching width and height will rarely equal 0 because the operands end up as float values. So the comparison should either use a very small tolerance close to zero, or the values should be cast to int explicitly.
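The two fixes suggested above could look roughly like the sketch below. This is not the code in toolkit/buckets.py: the helper names `is_multiple` and `snap_down`, the tolerance, and the sample width are all assumptions chosen to illustrate the failure mode.

```python
import math


def is_multiple(value: float, divisor: int, tol: float = 1e-6) -> bool:
    """Tolerance-based check: is value a multiple of divisor, within tol?"""
    remainder = math.fmod(abs(value), divisor)
    # A near-multiple leaves a remainder just above 0 or just below divisor.
    return min(remainder, divisor - remainder) <= tol


def snap_down(value: float, divisor: int) -> int:
    """Explicit-int fix: round to the nearest int, then floor to a multiple."""
    as_int = int(round(value))
    return as_int - (as_int % divisor)


# Hypothetical width that picked up float error during scaling.
width = 1023.9999999999

# Strict float comparison: fails even though the width is "really" 1024.
exact_check = (width % 8 == 0)

# int() truncates to 1023 before the mod, so the width drops a full step.
truncated = int(width) - (int(width) % 8)
```

Here the strict check rejects the width, and truncating with int() before the mod lands on 1016 instead of 1024, which is the kind of unexpected new resolution described above. Either the tolerance check or rounding before the integer mod avoids it.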

The result is that this function adds new resolution buckets whose aspect ratios are often wildly different from that of the input image. Images that should in fact match an existing resolution end up in a bucket for the newly added resolution.

I assume this may have some effect on training quality.

https://github.com/ostris/ai-toolkit/blob/ee206cfa18b52f91b8b4cba9395c687f050d2c4e/toolkit/buckets.py#L59

Edit: 

In other words, if I create a dataset where all the images conform to a standard SDXL width and height, they will still get put in random buckets and resized.
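The behavior I would expect can be written as a small check. This is only a sketch under assumptions: `SDXL_BUCKETS` lists commonly used SDXL resolutions, and `find_exact_bucket` is a hypothetical helper, not a function in ai-toolkit.

```python
from typing import Optional, Tuple

# Commonly used SDXL resolutions as (width, height); listed here as an assumption.
SDXL_BUCKETS = [
    (1024, 1024), (1152, 896), (896, 1152), (1216, 832),
    (832, 1216), (1344, 768), (768, 1344), (1536, 640), (640, 1536),
]


def find_exact_bucket(width, height, buckets=SDXL_BUCKETS) -> Optional[Tuple[int, int]]:
    """Return the bucket that matches the image size exactly, or None."""
    # Round to int first so a size carrying float error
    # (e.g. 1215.9999999999) still matches its intended bucket.
    size = (int(round(width)), int(round(height)))
    return size if size in buckets else None
```

With a check like this run first, an image that is already 1216x832 would stay in its own bucket without resizing, and only sizes that return None would fall through to the nearest-bucket logic.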

Incorrect bucket sizes are generated #476
