Description
If I understand correctly, there may be performance considerations when using mappers such as `scale_to_z_score_per_key`, depending on whether `key_vocabulary_filename` is set or left unset. The documentation makes it sound like the choice is simply a matter of whether the keys fit into memory. Please correct me if I am wrong, but it seems that if you leave `key_vocabulary_filename=None`, the lookups are done in memory via `map_per_key_reductions`, which can be very inefficient when there are more than a handful of keys. On the other hand, setting `key_vocabulary_filename` creates a `StaticHashTable`, and lookups are much more efficient.
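For reference, the two configurations being compared could be sketched roughly as follows. The feature names `x` and `key`, the output name `x_scaled`, the helper `make_preprocessing_fn`, and the file name `x_key_vocab` are hypothetical placeholders. The sketch only shows how the argument is passed; it does not demonstrate the performance difference itself, which is still the open question here.

```python
def make_preprocessing_fn(key_vocabulary_filename=None):
    """Build a preprocessing_fn that z-scores "x" within each value of "key".

    key_vocabulary_filename=None -> per-key results are assumed to fit in memory.
    key_vocabulary_filename="..." -> per-key results are written to a file.
    """

    def preprocessing_fn(inputs):
        # Imported here so this sketch can be loaded without TFT installed.
        import tensorflow_transform as tft

        return {
            "x_scaled": tft.scale_to_z_score_per_key(
                inputs["x"],
                key=inputs["key"],
                key_vocabulary_filename=key_vocabulary_filename,
            )
        }

    return preprocessing_fn


# Variant 1: argument left unset (the in-memory path described above).
in_memory_fn = make_preprocessing_fn()

# Variant 2: argument set (hypothetical file name; as I read the docs, it
# should be unique within a given preprocessing function).
file_backed_fn = make_preprocessing_fn("x_key_vocab")
```

Either function would then be passed as the `preprocessing_fn` of a Transform pipeline, so switching between the two paths is a one-argument change.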
If my understanding is correct, it would be good to note this in the docs so that other people can decide what is best for their use case.