Clarify performance considerations for mappers like scale_to_z_score_per_key and key_vocabulary_filename #251

Open
@cyc

If I understand correctly, mappers such as scale_to_z_score_per_key can perform quite differently depending on whether key_vocabulary_filename is set or left unset. The documentation makes it sound like the choice depends only on whether the keys fit into memory. Please correct me if I am wrong, but it seems that if you leave key_vocabulary_filename=None, the lookups are done in memory via map_per_key_reductions, which can be very inefficient when there are more than a handful of keys. Setting key_vocabulary_filename, on the other hand, creates a StaticHashTable, and lookups are much more efficient.

If my understanding is correct, it would be good to note this in the docs so that other people can decide what is best for their use case.
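
To make the concern concrete, here is a rough pure-Python sketch of the two lookup strategies as I understand them. It is not the library's code: the function names, the per-key statistics, and the assumption that the in-memory path effectively compares each key against the whole vocabulary are all mine, for illustration only.

```python
from typing import Dict, List, Sequence


def z_score_by_scan(values: Sequence[float], keys: Sequence[str],
                    vocab: Sequence[str], means: Sequence[float],
                    stds: Sequence[float]) -> List[float]:
    """Assumed in-memory path: locate each key by scanning the whole vocabulary."""
    out = []
    for value, key in zip(values, keys):
        index = -1
        # Every element is compared against every vocabulary entry,
        # so the work grows as len(values) * len(vocab).
        for i, candidate in enumerate(vocab):
            if candidate == key:
                index = i
        if index < 0:
            raise KeyError(key)
        out.append((value - means[index]) / stds[index])
    return out


def z_score_by_hash(values: Sequence[float], keys: Sequence[str],
                    vocab: Sequence[str], means: Sequence[float],
                    stds: Sequence[float]) -> List[float]:
    """Assumed vocabulary-file path: build a hash table once, then look up each key."""
    table: Dict[str, int] = {key: i for i, key in enumerate(vocab)}
    out = []
    for value, key in zip(values, keys):
        index = table[key]  # expected constant-time lookup per element
        out.append((value - means[index]) / stds[index])
    return out


# Hypothetical data: two keys with made-up per-key means and standard deviations.
values = [1.0, 3.0, 10.0]
keys = ["a", "a", "b"]
vocab = ["a", "b"]
means = [2.0, 10.0]
stds = [1.0, 2.0]

scan_result = z_score_by_scan(values, keys, vocab, means, stds)
hash_result = z_score_by_hash(values, keys, vocab, means, stds)
```

Both functions return the same scaled values; only the cost per lookup differs. If the real implementations behave roughly like this, the in-memory path would scale with the number of keys for every element, which is the trade-off I think the docs should mention.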
