Open
Description
This is for bugs only
Did you already ask in the discord?
Yes/No
You verified that this is a bug and not a feature request or question by asking in the discord?
Yes/No
Describe the bug
Got the following error while training Qwen-Image-Edit-2509 with batch_size > 1:
========================================
Result:
- 0 completed jobs
- 1 failure
========================================
Traceback (most recent call last):
  File "/data/ai-toolkit/run.py", line 120, in <module>
    main()
  File "/data/ai-toolkit/run.py", line 108, in main
    raise e
  File "/data/ai-toolkit/run.py", line 96, in main
    job.run()
  File "/data/ai-toolkit/jobs/ExtensionJob.py", line 22, in run
    process.run()
  File "/data/ai-toolkit/jobs/process/BaseSDTrainProcess.py", line 2208, in run
    batch = next(dataloader_iterator)
            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/ai-toolkit/venv/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 733, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "/data/ai-toolkit/venv/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 1515, in _next_data
    return self._process_data(data, worker_id)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/ai-toolkit/venv/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 1550, in _process_data
    data.reraise()
  File "/data/ai-toolkit/venv/lib/python3.12/site-packages/torch/_utils.py", line 750, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/data/ai-toolkit/venv/lib/python3.12/site-packages/torch/utils/data/_utils/worker.py", line 349, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
           ^^^^^^^^^^^^^^^^^^^^
  File "/data/ai-toolkit/venv/lib/python3.12/site-packages/torch/utils/data/_utils/fetch.py", line 55, in fetch
    return self.collate_fn(data)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/data/ai-toolkit/toolkit/data_loader.py", line 642, in dto_collation
    batch = DataLoaderBatchDTO(
            ^^^^^^^^^^^^^^^^^^^
  File "/data/ai-toolkit/toolkit/data_transfer_object/data_loader.py", line 306, in __init__
    raise e
  File "/data/ai-toolkit/toolkit/data_transfer_object/data_loader.py", line 180, in __init__
    self.control_tensor = torch.cat([x.unsqueeze(0) for x in control_tensors])
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 1851 but got size 1819 for tensor number 1 in the list.
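For context, the failing `torch.cat` call stacks the per-sample control tensors along a new batch dimension, which only works when every tensor in the batch has the same shape. The following is a minimal pure-Python sketch of that rule, not the toolkit's code. `first_shape_mismatch` and the example shapes are hypothetical; only the sizes 1851 and 1819 come from the error message, and which dimension they belong to is a guess.

```python
from typing import Optional, Sequence, Tuple

Shape = Tuple[int, ...]


def first_shape_mismatch(
    shapes: Sequence[Shape],
) -> Optional[Tuple[int, int, int, int]]:
    """Return (tensor_index, dim, expected, got) for the first shape that
    differs from shapes[0], or None if all shapes could be stacked."""
    if not shapes:
        return None
    reference = shapes[0]
    for index, shape in enumerate(shapes[1:], start=1):
        if len(shape) != len(reference):
            raise ValueError(
                f"tensor {index} has {len(shape)} dims, expected {len(reference)}"
            )
        for dim, (expected, got) in enumerate(zip(reference, shape)):
            if expected != got:
                return index, dim, expected, got
    return None


# Hypothetical (channels, height, width) shapes for two control images.
# Only 1851 and 1819 are taken from the error message; the rest is made up.
control_shapes = [(3, 1024, 1851), (3, 1024, 1819)]
print(first_shape_mismatch(control_shapes))
```

A non-`None` result for the control images of one batch would correspond to the situation reported in the traceback: the second control tensor in the batch (tensor number 1) has a different size from the first.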
The following is my training config:
job: "extension"
config:
  name: "qwen_image_edit_v3.7"
  process:
    - type: "diffusion_trainer"
      training_folder: "/data/ai-toolkit/output"
      sqlite_db_path: "./aitk_db.db"
      device: "cuda"
      trigger_word: null
      performance_log_every: 10
      network:
        type: "lora"
        linear: 32
        linear_alpha: 32
        conv: 16
        conv_alpha: 16
        lokr_full_rank: true
        lokr_factor: -1
        network_kwargs:
          ignore_if_contains: []
      save:
        dtype: "bf16"
        save_every: 250
        max_step_saves_to_keep: 4
        save_format: "diffusers"
        push_to_hub: false
      datasets:
        - folder_path: "/data/ai-toolkit/datasets/target_v2_0"
          mask_path: null
          mask_min_value: 0.1
          caption_ext: "txt"
          caption_dropout_rate: 0.05
          cache_latents_to_disk: false
          is_reg: false
          network_weight: 1
          resolution:
            - 1024
            - 512
            - 768
          controls: []
          shrink_video_to_frames: true
          num_frames: 1
          do_i2v: true
          flip_x: false
          flip_y: false
          control_path_1: "/data/ai-toolkit/datasets/masked_v2_0"
      train:
        batch_size: 8
        bypass_guidance_embedding: false
        steps: 6000
        gradient_accumulation: 1
        train_unet: true
        train_text_encoder: false
        gradient_checkpointing: true
        noise_scheduler: "flowmatch"
        optimizer: "adamw8bit"
        timestep_type: "weighted"
        content_or_style: "balanced"
        optimizer_params:
          weight_decay: 0.0001
        unload_text_encoder: false
        cache_text_embeddings: true
        lr: 0.0001
        ema_config:
          use_ema: false
          ema_decay: 0.99
        skip_first_sample: false
        force_first_sample: false
        disable_sampling: false
        dtype: "bf16"
        diff_output_preservation: false
        diff_output_preservation_multiplier: 1
        diff_output_preservation_class: "person"
        switch_boundary_every: 1
        loss_type: "mse"
      model:
        name_or_path: "Qwen/Qwen-Image-Edit-2509"
        quantize: true
        qtype: "qfloat8"
        quantize_te: true
        qtype_te: "qfloat8"
        arch: "qwen_image_edit_plus"
        low_vram: false
        model_kwargs:
          match_target_res: false
        layer_offloading: false
        layer_offloading_text_encoder_percent: 1
        layer_offloading_transformer_percent: 1
neg: ""
seed: 42
walk_seed: true
guidance_scale: 4
sample_steps: 25
num_frames: 1
fps: 1
meta:
  name: "[name]"
  version: "1.0"
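A possible workaround, not verified against the toolkit's own bucketing code, would be to make sure each batch only contains samples whose control images share the same size, so that stacking them cannot fail. The sketch below shows that grouping idea in plain Python. `Sample`, `batches_by_control_size`, the file names, and the sizes are all hypothetical and are not part of ai-toolkit.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, NamedTuple, Tuple


class Sample(NamedTuple):
    path: str
    control_size: Tuple[int, int]  # (width, height) of the control image


def batches_by_control_size(
    samples: List[Sample], batch_size: int
) -> Iterator[List[Sample]]:
    """Yield batches in which all samples share the same control size."""
    groups: Dict[Tuple[int, int], List[Sample]] = defaultdict(list)
    for sample in samples:
        groups[sample.control_size].append(sample)
    # Split each same-size group into chunks of at most batch_size.
    for group in groups.values():
        for start in range(0, len(group), batch_size):
            yield group[start:start + batch_size]


# Made-up samples; two control images share a size, one does not.
samples = [
    Sample("a.png", (1851, 1024)),
    Sample("b.png", (1819, 1024)),
    Sample("c.png", (1851, 1024)),
]
for batch in batches_by_control_size(samples, batch_size=8):
    print([s.path for s in batch])
```

The trade-off of this approach is that some batches end up smaller than `batch_size` when only a few samples share a given control size.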