[magpietts] added logging of consumed samples at each training step. #13849

XuesongYang · 2025-06-06T06:53:16Z

The Lhotse dataloader applies dynamic batching based on duration buckets, so the number of samples consumed at each training step varies. Tracking the cumulative number of consumed samples therefore enables fairer model comparisons than step counts alone. Megatron models also track consumed samples.
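A minimal pure-Python sketch (not the PR's implementation) of why step counts and sample counts diverge. With duration-bucketed dynamic batching, each rank reports a different local batch size at every step, so the samples consumed per step are the sum of those sizes rather than a fixed `batch_size * world_size`. `ConsumedSamplesCounter` is a hypothetical name and the batch sizes are made-up illustrative values.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ConsumedSamplesCounter:
    """Hypothetical running count of samples consumed across all ranks."""

    consumed_samples: int = 0  # start from a non-zero value when resuming
    steps: int = 0

    def update(self, local_batch_sizes: Sequence[int]) -> int:
        # One entry per rank. With dynamic batching these sizes differ between
        # ranks and between steps, so this step's global batch is their sum.
        self.consumed_samples += sum(local_batch_sizes)
        self.steps += 1
        return self.consumed_samples


# Two runs (two ranks each) with the same step count but different batch sizes.
run_a, run_b = ConsumedSamplesCounter(), ConsumedSamplesCounter()
for sizes_a, sizes_b in [([32, 28], [64, 60]), ([30, 31], [58, 66])]:
    run_a.update(sizes_a)
    run_b.update(sizes_b)

print(run_a.steps, run_a.consumed_samples)  # 2 121
print(run_b.steps, run_b.consumed_samples)  # 2 248
```

Both runs sit at step 2, yet one has seen roughly twice as many samples, which is the comparison problem described above.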

This PR logs the consumed-sample count to the wandb UI and appends it to checkpoint filenames.

Checkpoint filename changes:
MagpieTTS-EN-Lhotse--val_loss=11.7273-step=1350-epoch=26.ckpt -->
MagpieTTS-EN-Lhotse--val_loss=11.7273-step=1350-epoch=26-consumed_samples=96466.ckpt

wandb UI changes


Copilot

Pull Request Overview

This PR adds logging for the number of consumed samples at each training step to facilitate fairer model comparisons and improve checkpoint traceability. Key changes include:

New instance variables and functions added in the MagpieTTS model to track consumed samples.
Updates to log calls during training to propagate the consumed samples count to the UI and checkpoints.
Modifications to YAML configuration files to include consumed_samples in checkpoint filenames.

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.


File	Description
nemo/collections/tts/models/magpietts.py	Implements consumed samples tracking and logging in training and resuming.
examples/tts/conf/magpietts/magpietts_multilingual_v1.yaml	Updates checkpoint filename format to include consumed_samples.
examples/tts/conf/magpietts/magpietts_lhotse_dc_en.yaml	Updates checkpoint filename format to include consumed_samples.
examples/tts/conf/magpietts/magpietts_en.yaml	Updates checkpoint filename format to include consumed_samples.
examples/tts/conf/magpietts/magpietts_dc_en.yaml	Updates checkpoint filename format to include consumed_samples.

Comments suppressed due to low confidence (2)

nemo/collections/tts/models/magpietts.py:277

Consider including the checkpoint path (ckpt_path) in the warning log message to aid in debugging when parsing fails.

init_consumed_samples = int(re.search(r"consumed_samples\=(\d+)", ckpt_path).group(1))
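One way to act on this suggestion, sketched under assumptions: a hypothetical `parse_consumed_samples` helper that names the offending path in the warning and falls back to 0 instead of calling `.group(1)` on a possible `None` match. The fallback value is an assumption, not necessarily what the PR does. The `=` in the pattern needs no escaping.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Matches e.g. "...-epoch=26-consumed_samples=96466.ckpt"
_CONSUMED_SAMPLES_RE = re.compile(r"consumed_samples=(\d+)")


def parse_consumed_samples(ckpt_path: str) -> int:
    """Hypothetical helper: recover consumed_samples from a checkpoint filename.

    Returns 0 and logs which file could not be parsed when the filename
    predates the new naming scheme, instead of raising AttributeError.
    """
    match = _CONSUMED_SAMPLES_RE.search(ckpt_path)
    if match is None:
        logger.warning(
            "Could not parse consumed_samples from checkpoint path %r; starting from 0.",
            ckpt_path,
        )
        return 0
    return int(match.group(1))
```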

examples/tts/conf/magpietts/magpietts_multilingual_v1.yaml:229

[nitpick] Ensure that the updated filename format including 'consumed_samples' is clearly documented for users, such as in the changelog or release notes.

filename: '${name}--{${exp_manager.checkpoint_callback_params.monitor}:.4f}-{step}-{epoch}-{consumed_samples:.0f}'
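For readers unfamiliar with the template syntax: the `${...}` parts are config interpolations resolved before training starts, while the `{key:fmt}` parts are filled in by the checkpoint callback from logged metrics, with the metric name auto-inserted (which is why the example filenames read `val_loss=11.7273`). The hypothetical `format_ckpt_name` below mimics that expansion in plain Python to show how the new field renders. It is not Lightning's implementation, and it assumes `consumed_samples` is logged as a metric so the callback can see it.

```python
import re
from typing import Mapping


def format_ckpt_name(template: str, metrics: Mapping[str, float]) -> str:
    """Expand '{key}' / '{key:fmt}' placeholders into 'key=value' strings.

    Hypothetical stand-in that mimics the auto-inserted metric names seen in
    the checkpoint filenames above; it is not Lightning's implementation.
    """

    def expand(match: re.Match) -> str:
        key, fmt = match.group(1), match.group(2) or ""
        return f"{key}={format(metrics[key], fmt)}"

    return re.sub(r"\{(\w+)(?::([^}]*))?\}", expand, template)


# Template after the ${...} interpolations have been resolved by the config system.
template = "MagpieTTS-EN-Lhotse--{val_loss:.4f}-{step}-{epoch}-{consumed_samples:.0f}"
metrics = {"val_loss": 11.72731, "step": 1350, "epoch": 26, "consumed_samples": 96466.0}
print(format_ckpt_name(template, metrics) + ".ckpt")
# MagpieTTS-EN-Lhotse--val_loss=11.7273-step=1350-epoch=26-consumed_samples=96466.ckpt
```

The `:.0f` format keeps the count integer-looking in the filename even if the logged value is a float.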

nemo/collections/tts/models/magpietts.py

blisc

Need to root cause checkpoint issues

XuesongYang · 2025-06-10T17:02:08Z

Lhotse applies dynamic batching per duration bucket, so each rank proceeds with a different batch size. With this PR, training then hangs right after validation completes. I need to find an alternative way to sum-reduce batch sizes across ranks.

Converting to draft for further investigation.
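The thread does not establish what caused the hang. One common cause in distributed training is a collective (such as an all-reduce) that not every rank calls, or that ranks call a different number of times, so some ranks block waiting for their peers. Below is a pure-Python simulation of one possible alternative, untested against this PR: each rank accumulates its local sample count without communicating, and the sum is reduced only at a point every rank reaches identically (for example a logging interval). `DeferredSampleCounter` and `sync_all` are hypothetical names, and the in-process `sum` stands in for a real sum all-reduce.

```python
from typing import List


class DeferredSampleCounter:
    """Hypothetical per-rank counter that defers the cross-rank reduction.

    On every training step a rank only adds its own local batch size, with no
    communication. The global total is reduced at a sync point that every rank
    reaches the same number of times, so the collective calls line up.
    """

    def __init__(self, init_global: int = 0):
        self.global_consumed = init_global  # total agreed on at the last sync
        self.pending_local = 0  # samples this rank has seen since the last sync

    def on_train_batch(self, local_batch_size: int) -> None:
        self.pending_local += local_batch_size

    def apply_sync(self, summed_pending: int) -> int:
        # summed_pending is the sum of pending_local over all ranks.
        self.global_consumed += summed_pending
        self.pending_local = 0
        return self.global_consumed


def sync_all(counters: List[DeferredSampleCounter]) -> None:
    # In-process stand-in for a sum all-reduce. Under DDP each rank would
    # instead put pending_local in a tensor and call
    # torch.distributed.all_reduce(t, op=torch.distributed.ReduceOp.SUM).
    total = sum(c.pending_local for c in counters)
    for c in counters:
        c.apply_sync(total)


# Two simulated ranks with different per-step batch sizes.
ranks = [DeferredSampleCounter(), DeferredSampleCounter()]
steps = [(32, 28), (30, 31), (29, 33)]
for i, sizes in enumerate(steps, start=1):
    for counter, size in zip(ranks, sizes):
        counter.on_train_batch(size)
    if i % 2 == 0 or i == len(steps):  # e.g. every logging interval
        sync_all(ranks)

print([c.global_consumed for c in ranks])  # [183, 183]
```

The trade-off is that the logged total is exact only at sync points, which is likely acceptable for a progress metric.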

github-actions · 2025-06-28T02:09:38Z

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

github-actions · 2025-07-06T02:12:25Z

This PR was closed because it has been inactive for 7 days since being marked as stale.

github-actions · 2025-07-21T02:12:41Z

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

github-actions · 2025-07-28T02:12:46Z

This PR was closed because it has been inactive for 7 days since being marked as stale.

[magpietts] added computation of consumed samples at each step and log it into wandb UI. The checkpoint filename also appended with it.

1adcb33

Signed-off-by: Xuesong Yang <[email protected]>

XuesongYang requested a review from Copilot June 6, 2025 06:53

github-actions bot added the TTS label Jun 6, 2025

XuesongYang requested a review from blisc June 6, 2025 06:53

Copilot AI reviewed Jun 6, 2025


nemo/collections/tts/models/magpietts.py (resolved)

blisc requested changes Jun 9, 2025


XuesongYang marked this pull request as draft June 10, 2025 17:02

github-actions bot added the stale label Jun 28, 2025

github-actions bot closed this Jul 6, 2025

XuesongYang reopened this Jul 6, 2025

github-actions bot removed the stale label Jul 7, 2025

github-actions bot added the stale label Jul 21, 2025

github-actions bot closed this Jul 28, 2025
