Description
Reproducer:
python ./test/local_test_forward_hf_bert.py --cuda 1
Output:
REPLICATE config: 1 -> MultiUseParameterConfig.REPLICATE
/home/ec2-user/actions-runner/_work/PiPPy/PiPPy/build_binary_3.8/lib/python3.8/site-packages/transformers/activations.py:56: UserWarning: Defining your `__torch_function__` as a plain method is deprecated and will be an error in future, please define it as a classmethod. (Triggered internally at ../torch/csrc/utils/python_arg_parser.cpp:289.)
return self.act(input)
Using schedule: 1F1B
Instantiating BERT Pipeline
...
Traceback (most recent call last):
  File "test/local_test_forward_hf_bert.py", line 155, in <module>
    mp.spawn(run_worker, args=(args.world_size, args,), nprocs=args.world_size, join=True)
  File "/home/ec2-user/actions-runner/_work/PiPPy/PiPPy/build_binary_3.8/lib64/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/ec2-user/actions-runner/_work/PiPPy/PiPPy/build_binary_3.8/lib64/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/home/ec2-user/actions-runner/_work/PiPPy/PiPPy/build_binary_3.8/lib64/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/ec2-user/actions-runner/_work/PiPPy/PiPPy/build_binary_3.8/lib64/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/ec2-user/actions-runner/_work/PiPPy/PiPPy/test/local_test_forward_hf_bert.py", line 139, in run_worker
    run_master(args)
  File "/home/ec2-user/actions-runner/_work/PiPPy/PiPPy/test/local_test_forward_hf_bert.py", line 98, in run_master
    torch.testing.assert_close(out['last_hidden_state'], ref_out['last_hidden_state'])
  File "/home/ec2-user/actions-runner/_work/PiPPy/PiPPy/build_binary_3.8/lib64/python3.8/site-packages/torch/testing/_comparison.py", line 1304, in assert_close
    assert_equal(
  File "/home/ec2-user/actions-runner/_work/PiPPy/PiPPy/build_binary_3.8/lib64/python3.8/site-packages/torch/testing/_comparison.py", line 1074, in assert_equal
    raise error_metas[0].to_error()
AssertionError: Tensor-likes are not close!
Mismatched elements: 393212 / 491520 (80.0%)
Greatest absolute difference: 4.328115940093994 at index (9, 12, 223) (up to 1e-05 allowed)
Greatest relative difference: 1.0 at index (4, 0, 0) (up to 1.3e-06 allowed)
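
Note on reading the last three lines: the tolerances reported (1e-05 absolute, 1.3e-06 relative) are the float32 defaults of torch.testing.assert_close, which documents an element as close when |actual - expected| <= atol + rtol * |expected|. Below is a minimal pure-Python sketch of that check on flat lists of floats. It is a simplified illustration, not the library's implementation; `mismatch_report` and its dictionary keys are hypothetical names introduced here.

```python
def mismatch_report(actual, expected, rtol=1.3e-6, atol=1e-5):
    """Summarize element-wise closeness of two equal-length float sequences.

    Hypothetical helper for illustration only. It applies the documented
    closeness criterion |a - e| <= atol + rtol * |e| to each pair.
    """
    if len(actual) != len(expected):
        raise ValueError("sequences must have the same length")

    mismatched = 0
    max_abs_diff = 0.0
    max_rel_diff = 0.0
    for a, e in zip(actual, expected):
        abs_diff = abs(a - e)
        # An element counts as mismatched when it exceeds the combined tolerance.
        if abs_diff > atol + rtol * abs(e):
            mismatched += 1
        max_abs_diff = max(max_abs_diff, abs_diff)
        # Relative difference is taken with respect to the expected value;
        # skipping e == 0 is a simplification made for this sketch.
        if e != 0:
            max_rel_diff = max(max_rel_diff, abs_diff / abs(e))

    return {
        "mismatched": mismatched,
        "total": len(actual),
        "max_abs_diff": max_abs_diff,
        "max_rel_diff": max_rel_diff,
    }


# Toy data, unrelated to the BERT outputs above: one of five elements differs.
report = mismatch_report([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 2.0, 3.0, 4.0])
print(report)
```

With a greatest absolute difference above 4 against an allowed 1e-05, the failure reported above is far outside anything loosening rtol/atol would reasonably cover.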