New Preprocessing Feature - Deduplication [Request]

**Is your feature request related to a problem? Please describe.**
Many large datasets contain a lot of duplicated examples, and it has been shown that deduplicating a dataset can improve performance during training and give a more faithful evaluation at test time.

A feature that makes it easy to deduplicate a dataset would be very useful!

**Describe the solution you'd like**
We can define a key function and, for each distinct value it returns, keep only the first (or last) data point that yields that value.
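
To make the idea concrete, here is a minimal sketch in plain Python. It is not an existing API of any library: the name `deduplicate`, the `key` and `keep` parameters, and the example rows are all hypothetical and only illustrate the behaviour described above.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def deduplicate(
    rows: Iterable[T],
    key: Callable[[T], Hashable],
    keep: str = "first",
) -> List[T]:
    """Keep one row per distinct key value (hypothetical helper, not a library API)."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")

    # Maps each key value to the (position, row) pair chosen to represent it.
    chosen: Dict[Hashable, Tuple[int, T]] = {}
    for index, row in enumerate(rows):
        k = key(row)
        # "first": only record a key the first time it is seen.
        # "last": always overwrite, so the latest row wins.
        if keep == "last" or k not in chosen:
            chosen[k] = (index, row)

    # Return the survivors in their original relative order.
    return [row for _, row in sorted(chosen.values(), key=lambda pair: pair[0])]


# Example rows (made up for illustration): ids 1 and 3 differ only in case
# and trailing whitespace, so a normalising key treats them as duplicates.
rows = [
    {"id": 1, "text": "hello"},
    {"id": 2, "text": "world"},
    {"id": 3, "text": "Hello "},
]


def normalised_text(row: dict) -> str:
    return row["text"].strip().lower()


first_kept = deduplicate(rows, key=normalised_text, keep="first")
last_kept = deduplicate(rows, key=normalised_text, keep="last")
```

Because the key is a user-supplied function, the same helper covers exact matching (return the raw field) and looser matching (return a normalised or hashed form). The key values must be hashable, and the sketch holds one row per distinct key in memory.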

**Describe alternatives you've considered**
The obvious alternative is to repeat the same boilerplate every time someone wants to deduplicate a dataset.


New Preprocessing Feature - Deduplication [Request] #4448
