Skip to content

FolderBase Dataset automatically resolves under current directory when data_dir is not specified #6152

Closed
@npuichigo

Description

@npuichigo

Describe the bug

FolderBase Dataset automatically resolves under current directory when data_dir is not specified.

For example:

load_dataset("audiofolder")

takes long time to resolve and collect data_files from current directory. But I think it should reach out to this line for error handling

if not self.config.data_files:
raise ValueError(f"At least one data file must be specified, but got data_files={self.config.data_files}")

Steps to reproduce the bug

load_dataset("audiofolder")

Expected behavior

Error report

Environment info

  • datasets version: 2.14.4
  • Platform: Linux-5.15.0-78-generic-x86_64-with-glibc2.17
  • Python version: 3.8.15
  • Huggingface_hub version: 0.16.4
  • PyArrow version: 12.0.1
  • Pandas version: 1.5.3

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions