One thing I have done a number of times, manually:
- Download a video dataset such as ASL Citizen, usually directly from the source so I have the .mp4 files, rather than with this library.
- Run pose estimation on them all: foo1.mp4, foo2.mp4, ...
- Put those through SignCLIP, saving off the embeddings as foo1-embedded-using-asl-citizen-model.npy, foo1-embedded-using-sem-lex-model.npy, etc.
- Back up those files somewhere (see the sketch after this list).
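For concreteness, here is roughly what that manual loop looks like. This is only a sketch: it assumes pose estimation has already produced one .pose file per video (e.g. with pose-format's video_to_pose CLI), and `embed_with_signclip` is a hypothetical placeholder for whatever SignCLIP inference entry point is used; the directory names are made up. The point is the file-naming convention for the saved .npy embeddings.

```python
from pathlib import Path

import numpy as np
from pose_format import Pose

POSE_DIR = Path("asl_citizen_poses")      # foo1.pose, foo2.pose, ... from pose estimation
OUT_DIR = Path("asl_citizen_embeddings")
OUT_DIR.mkdir(exist_ok=True)


def embed_with_signclip(pose: Pose, model: str) -> np.ndarray:
    # Placeholder for the actual SignCLIP inference code (not part of this library).
    raise NotImplementedError


for pose_path in sorted(POSE_DIR.glob("*.pose")):
    with open(pose_path, "rb") as f:
        pose = Pose.read(f.read())
    for model in ("asl-citizen-model", "sem-lex-model"):
        embedding = embed_with_signclip(pose, model=model)
        # e.g. foo1-embedded-using-asl-citizen-model.npy
        np.save(OUT_DIR / f"{pose_path.stem}-embedded-using-{model}.npy", embedding)
```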
It would be nice to have a consistent, documented way to bring all this into the sign-language-datasets
ecosystem. Is there a standardized way to save the embeddings, load them back in, etc.?
Perhaps something like...
ds = tfds.load("asl-citizen")
# if they're hosted somewhere and the dataloader knows it
ds_with_embeddings = tfds.load("asl-citizen", embeddings="signclip_asl_citizen")
# if they're hosted locally
ds_with_embeddings = tfds.load("asl-citizen", embeddings="/path/to/folder/with/embeddings")
See also: https://www.tensorflow.org/datasets/catalog/sift1m, which is a TFDS dataset that ships pretrained embeddings.
See also: https://www.tensorflow.org/datasets/catalog/laion400m
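For what it's worth, in sift1m the embeddings are just part of the feature dict, so they come along with a plain tfds.load. A quick way to see that (I haven't double-checked the split or feature names here):

```python
import tensorflow_datasets as tfds

ds_dict, info = tfds.load("sift1m", with_info=True)
print(info.features)   # the pretrained embeddings are regular features
print(list(ds_dict))   # available splits
```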