fix(webdataset): don't .lower() field_name #7726

YassineYousfi · 2025-08-05T16:57:09Z

This fixes cases where field keys contain uppercase characters.

YassineYousfi · 2025-08-08T16:56:57Z

lhoestq · 2025-08-13T13:12:21Z

src/datasets/packaged_modules/webdataset/webdataset.py

            else:
-                data_extension = field_name.split(".")[-1]
+                data_extension = field_name.split(".")[-1].lower()
            if data_extension in cls.DECODERS:
                current_example[field_name] = cls.DECODERS[data_extension](current_example[field_name])


We need it lowercased to check whether it's in cls.DECODERS, no?

YassineYousfi · 2025-08-14

Yes, data_extension is lowercased, but field_name is not in the proposed fix.
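To illustrate the distinction above, here is a minimal sketch of the decoding step. It assumes a hypothetical DECODERS table keyed by lowercase extensions and a hypothetical decode_fields helper; neither is the actual datasets implementation. Only the extension used for the lookup is lowercased, while the field name keeps its original casing, so the decoded value is written back under the same key the raw bytes were stored under.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical decoder table keyed by lowercase extensions (illustration only;
# the real cls.DECODERS in datasets is different).
DECODERS: Dict[str, Callable[[bytes], Any]] = {
    "json": lambda data: json.loads(data),
    "txt": lambda data: data.decode("utf-8"),
}


def decode_fields(example: Dict[str, Any]) -> Dict[str, Any]:
    """Decode raw fields in place without changing the casing of field names.

    Hypothetical helper: the extension is lowercased only for the DECODERS
    lookup, so "Caption.TXT" stays "Caption.TXT" but is decoded as "txt".
    """
    for field_name in list(example):
        if field_name.startswith("__"):
            continue  # skip metadata such as __key__ / __url__
        # Lowercase only the lookup key, never the field name itself.
        data_extension = field_name.split(".")[-1].lower()
        if data_extension in DECODERS:
            example[field_name] = DECODERS[data_extension](example[field_name])
    return example


sample = {"__key__": "000", "Caption.TXT": b"a cat", "META.json": b'{"id": 1}'}
print(decode_fields(sample))
```

If the raw bytes were instead stored under field_name.lower() but read back with the original field_name, a field such as "Caption.TXT" would raise a KeyError, which is the failure mode described in the PR.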

lhoestq · 2025-08-18

LGTM :) Can you just run make style before we merge?

This will fix the code formatting for the CI.

HuggingFaceDocBuilderDev · 2025-08-18T15:41:17Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

lhoestq · 2025-08-20T16:35:52Z

CI failures are unrelated, merging :)

wds: lower everywhere · cdec70f

YassineYousfi changed the title from "webdataset: consistent .lower() for keys" to "fix(webdataset): consistent .lower() for keys" Aug 5, 2025

better: just use lower for checks · 1bbebf5

YassineYousfi changed the title from "fix(webdataset): consistent .lower() for keys" to "fix(webdataset): don't .lower() for keys" Aug 5, 2025

YassineYousfi changed the title from "fix(webdataset): don't .lower() for keys" to "fix(webdataset): don't .lower() field_name" Aug 5, 2025

YassineYousfi mentioned this pull request Aug 8, 2025

webdataset: key errors when field_name has upper case characters #7732 (Open)

lhoestq reviewed Aug 13, 2025

lhoestq approved these changes Aug 18, 2025

YassineYousfi added 2 commits August 19, 2025 23:13

make style · 8265d7b

Merge branch 'main' into wds-lower · f5aa7a1

lhoestq merged commit 896616c into huggingface:main Aug 20, 2025
5 of 14 checks passed
