@GCS-ZHN GCS-ZHN commented Oct 26, 2022

This PR builds on the previous PR #165 by @cainmagi.

Main changes:

  • Automatically detect and filter out non-array-like elements in a forward output list/tuple/dict. For example, the MultiHeadAttention module returns a tuple that contains a NoneType value as a placeholder for the attention weights.
  • If the filtered output contains no elements, raise a ValueError to notify the user, instead of the original AttributeError on NoneType.
  • Replace -1 with batch_size in dict/list/tuple output shapes, because I believe this is more appropriate.
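
The three changes above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeTensor`, `is_array_like`, and `collect_output_shapes` are hypothetical names, and "array-like" is assumed here to mean "has a `shape` attribute".

```python
from collections.abc import Mapping


class FakeTensor:
    """Hypothetical stand-in for a tensor; it only exposes a ``shape`` tuple."""

    def __init__(self, *shape):
        self.shape = tuple(shape)


def is_array_like(obj):
    # Assumption: an object counts as array-like if it has a ``shape`` attribute.
    return hasattr(obj, "shape")


def collect_output_shapes(output, batch_size):
    """Return the shapes of array-like outputs, with dim 0 set to batch_size."""
    if isinstance(output, Mapping):
        candidates = list(output.values())
    elif isinstance(output, (list, tuple)):
        candidates = list(output)
    else:
        candidates = [output]

    # Drop placeholders such as None that carry no shape information.
    kept = [o for o in candidates if is_array_like(o)]
    if not kept:
        # Fail early with a clear message instead of an AttributeError later.
        raise ValueError("forward output contains no array-like element")

    # Report batch_size in the first dimension rather than -1.
    return [[batch_size] + list(o.shape[1:]) for o in kept]


# A tuple shaped like (attention output, None placeholder for the weights).
print(collect_output_shapes((FakeTensor(4, 10, 32), None), batch_size=2))
```

With a real model, the check would presumably test for tensor instances rather than a `shape` attribute; the duck-typed version is used here only to keep the sketch free of dependencies.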

cainmagi and others added 6 commits February 27, 2021 01:21
1. Fix the bug of parameter number calculation when there are more than one output variables, including both sequence case and dict case.
2. Split multiple output variables across multiple lines.
3. Remove the last line break of summary_string().
4. Enable argument "device" to accept both str and torch.device.
5. Fix a bug when the model requires "batch_size" to be a specific number.
6. Fix a bug caused by multiple input case when "dtypes=None".
7. Add text auto wrap when the layer name is too long.
8. Add docstring.
Support counting all parameters instead of only `weight` and `bias`.
Using numpy sum/prod to calculate the total size may cause an overflow. This modification drops numpy and uses Python built-in methods to calculate the size.
Fix the bug caused by layers with dict input values.
Fix the data type of the output params_info from torch.tensor to int.
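
The overflow concern in the commit messages above can be illustrated with a short sketch. NumPy integer arrays use fixed-width types, so a large product can wrap around silently, while Python's built-in `int` has arbitrary precision. The commit does not say which built-ins it uses; `math.prod` (Python 3.8+) and `sum` are assumptions here, and `total_size` is a hypothetical helper name.

```python
import math


def total_size(shapes):
    """Sum the element counts of several shapes using Python ints only."""
    # math.prod and sum operate on arbitrary-precision ints, so the result
    # cannot wrap around the way a fixed-width integer type can.
    return sum(math.prod(shape) for shape in shapes)


# Element counts of 2**63 and 2**64, both beyond the signed 64-bit range.
shapes = [(2**21, 2**21, 2**21), (2**32, 2**32)]
print(total_size(shapes) == 2**63 + 2**64)  # True
```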