Clarify performance considerations for mappers like scale_to_z_score_per_key and key_vocabulary_filename

If I understand correctly, mappers such as `scale_to_z_score_per_key` may perform quite differently depending on whether `key_vocabulary_filename` is set or left unset. The documentation makes it sound like the choice is simply a matter of whether the keys fit into memory. Please correct me if I am wrong, but it seems that with `key_vocabulary_filename=None` the lookups are done in memory via `map_per_key_reductions`, which can be very inefficient when there are more than a handful of keys. Setting `key_vocabulary_filename`, on the other hand, creates a `StaticHashTable`, and lookups are much more efficient.

If my understanding is correct, it would be good to note this in the docs so that other people can decide what is best for their use case.
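For reference, here is a minimal sketch of the two configurations being compared. The feature names `price` and `store_id` and the file name `price_by_store` are made up for illustration, and the comments only restate what the docs say about each setting. Whether the second variant is actually faster is the open question of this issue, not something the snippet demonstrates.

```python
def preprocessing_fn(inputs):
    """Z-score `price` per `store_id` with and without a key vocabulary file.

    `price` and `store_id` are hypothetical feature names.
    """
    # Imported here so the sketch can be loaded without TFT installed;
    # a real pipeline would normally import this at module level.
    import tensorflow_transform as tft

    x = inputs["price"]
    key = inputs["store_id"]
    return {
        # Default: keys are assumed to fit in memory and the analyzer
        # result is not stored in a file.
        "price_z_default": tft.scale_to_z_score_per_key(
            x, key=key, key_vocabulary_filename=None),
        # File-backed: the per-key analyzer result is written to a file.
        # The name is hypothetical and should be unique within the
        # preprocessing function.
        "price_z_file": tft.scale_to_z_score_per_key(
            x, key=key, key_vocabulary_filename="price_by_store"),
    }
```

Timing the transform with each output on a dataset with many distinct keys would be one way to confirm or refute the difference described above.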

Clarify performance considerations for mappers like scale_to_z_score_per_key and key_vocabulary_filename #251
