
run_glue.py: inconsistent train/eval metrics with and without PiPPy #376

Open
@pbelevich


The original run_glue.py (no PiPPy):

***** train metrics *****
  epoch                    =        3.0
  train_loss               =     0.4244
  train_runtime            = 0:20:44.02
  train_samples            =       3668
  train_samples_per_second =      8.846
  train_steps_per_second   =      0.277
***** eval metrics *****
  epoch                   =        3.0
  eval_accuracy           =     0.8382
  eval_combined_score     =      0.862
  eval_f1                 =     0.8858
  eval_loss               =      0.412
  eval_runtime            = 0:00:15.68
  eval_samples            =        408
  eval_samples_per_second =     26.015
  eval_steps_per_second   =      3.252

run_glue.py with PiPPy, no splits, 1-stage pipe, with backward:

***** train metrics *****
  epoch                    =        3.0
  train_loss               =     1.1115
  train_runtime            = 0:29:12.92
  train_samples            =       3668
  train_samples_per_second =      6.277
  train_steps_per_second   =      0.197
***** eval metrics *****
  epoch                   =        3.0
  eval_accuracy           =     0.3162
  eval_combined_score     =     0.1581
  eval_f1                 =        0.0
  eval_loss               =      1.115
  eval_runtime            = 0:00:14.51
  eval_samples            =        408
  eval_samples_per_second =     28.109
  eval_steps_per_second   =      3.514
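
To put the two runs above side by side, here is a minimal sketch that recomputes training throughput from the logged values and derives the slowdown of the 1-stage pipe. It assumes that train_samples_per_second is train_samples * epoch / train_runtime, which matches the logged numbers to within rounding; the helper names are hypothetical and not part of run_glue.py or PiPPy.

```python
def to_seconds(runtime: str) -> float:
    """Convert an 'H:MM:SS.ss' runtime string from the log into seconds."""
    hours, minutes, seconds = runtime.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def samples_per_second(samples: int, epochs: float, runtime: str) -> float:
    """Assumed definition: total samples seen over all epochs / wall time."""
    return samples * epochs / to_seconds(runtime)


# Values copied from the two train-metrics blocks above.
sps_original = samples_per_second(3668, 3.0, "0:20:44.02")
sps_one_stage = samples_per_second(3668, 3.0, "0:29:12.92")

# Ratio of wall-clock training times: 1-stage pipe vs. original script.
slowdown = to_seconds("0:29:12.92") / to_seconds("0:20:44.02")

print(f"original:  {sps_original:.3f} samples/s")
print(f"1 stage:   {sps_one_stage:.3f} samples/s")
print(f"slowdown:  {slowdown:.2f}x")
```

Under that assumption the 1-stage pipe trains roughly 1.41x slower than the original script, independent of the accuracy regression.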

run_glue.py with PiPPy, no splits, 1-stage pipe, without backward:

***** train metrics *****
  epoch                    =        3.0
  train_loss               =     1.1115
  train_runtime            = 0:29:58.23
  train_samples            =       3668
  train_samples_per_second =      6.119
  train_steps_per_second   =      0.192
***** eval metrics *****
  epoch                   =        3.0
  eval_accuracy           =     0.3162
  eval_combined_score     =     0.1581
  eval_f1                 =        0.0
  eval_loss               =      1.115
  eval_runtime            = 0:00:15.75
  eval_samples            =        408
  eval_samples_per_second =     25.889
  eval_steps_per_second   =      3.236

run_glue.py with PiPPy, 8-stage pipe, without backward:

***** train metrics *****
  epoch                    =        3.0
  train_loss               =     8.8978
  train_runtime            = 0:20:36.90
  train_samples            =       3668
  train_samples_per_second =      8.896
  train_steps_per_second   =      0.279
***** eval metrics *****
  epoch                   =        3.0
  eval_accuracy           =     0.3162
  eval_combined_score     =     0.1581
  eval_f1                 =        0.0
  eval_loss               =     8.9199
  eval_runtime            = 0:00:16.97
  eval_samples            =        408
  eval_samples_per_second =     24.031
  eval_steps_per_second   =      3.004
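
All three PiPPy runs report exactly the same eval_accuracy (0.3162), eval_f1 (0.0) and eval_combined_score (0.1581). A quick sketch of what a classifier that always predicts the negative class would score is below. It rests on assumptions: that the task is binary, that the eval set has 129 negative and 279 positive samples (the 129 is inferred from 0.3162 * 408, not taken from the logs), and that eval_combined_score is the mean of accuracy and F1, which is consistent with the original run, (0.8382 + 0.8858) / 2 = 0.862. The function name is hypothetical.

```python
def constant_prediction_metrics(n_pos: int, n_neg: int, predict_positive: bool):
    """Accuracy, F1 and their mean for a classifier that always predicts one class."""
    tp = n_pos if predict_positive else 0
    fp = n_neg if predict_positive else 0
    fn = 0 if predict_positive else n_pos
    tn = 0 if predict_positive else n_neg

    accuracy = (tp + tn) / (n_pos + n_neg)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    combined = (accuracy + f1) / 2  # assumed definition of eval_combined_score
    return accuracy, f1, combined


# Assumption: 129 negatives out of 408 eval samples (inferred, see lead-in).
acc, f1, combined = constant_prediction_metrics(
    n_pos=279, n_neg=129, predict_positive=False
)
print(round(acc, 4), round(f1, 4), round(combined, 4))
# prints: 0.3162 0.0 0.1581
```

If those assumptions hold, the PiPPy eval numbers are what an always-negative predictor would produce, so it may be worth checking whether the pipelined model's predictions ever vary across eval samples.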

Labels: bug (something isn't working), huggingface (related to huggingface transformers models)
