It seems that the counter in the vocabulary is counting tokens with a trailing newline character as separate tokens.
For example, in vocabulary.pkl for the java-small dataset, I can find
'return': 6020684,
and
'return\n': 33290,
separately.
I personally fixed this problem by stripping the path context in Vocabulary._process_raw_sample,
but I'm a little confused about whether this behavior (a '\n' mixed into tokens) is intended.
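
To illustrate what I mean, here is a minimal sketch of the effect, not the actual repository code. The line layout (a label followed by space-separated `start,path,end` contexts) and the helper name `count_tokens` are my assumptions for the example only.

```python
from collections import Counter


def count_tokens(raw_lines, strip_newline):
    """Count start/end tokens of every path context (hypothetical helper)."""
    counter = Counter()
    for line in raw_lines:
        if strip_newline:
            # The fix: drop the trailing '\n' before splitting the line.
            line = line.strip()
        # Assumed layout: "<label> <ctx> <ctx> ...", each ctx is "start,path,end".
        _label, *path_contexts = line.split(" ")
        for ctx in path_contexts:
            start_token, _path, end_token = ctx.split(",")
            counter.update([start_token, end_token])
    return counter


# Two made-up samples; the last token of each line still carries '\n'.
raw_lines = [
    "get|name this,P1,return\n",
    "set|name value,P2,return x,P3,y\n",
]

without_strip = count_tokens(raw_lines, strip_newline=False)
with_strip = count_tokens(raw_lines, strip_newline=True)

print(without_strip["return"], without_strip["return\n"])
print(with_strip["return"], with_strip["return\n"])
```

Without stripping, only the last token of a line ends up with the '\n' attached, which would explain why the 'return\n' count is much smaller than the 'return' count. Note that calling `split()` with no arguments would also drop the trailing newline, so the effect in this sketch depends on splitting on a literal space.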
Thank you!