
Segfault when using ffmpeg to decode MP3s #758

Open
@jjedele

Description

I'm trying to use tfio.IOTensor.from_ffmpeg to build a simple data input pipeline that reads MP3s, but it results in a segfault after processing a certain number of files. I ran several tests to verify that it really is the number of files that matters, not a corrupt file.

Code to reproduce:

```python
import tensorflow as tf
import tensorflow_io as tfio


def ds_from_tsv(label_file, src_path):
    ds = tf.data.experimental.CsvDataset(
        label_file,
        [tf.string, tf.string],
        select_cols=[1, 2],
        field_delim="\t",
        use_quote_delim=False,
        header=True
    )

    ds = ds.map(lambda p, _: tf.strings.join([src_path, p], "/"))

    return ds


def read_mp3():
    # apparently ffmpeg only works in eager mode
    def ffmpeg_decode(path):
        print(path)

        ffmpeg_io = tfio.IOTensor.from_ffmpeg(path)
        audio_io = ffmpeg_io("a:0")
        audio_tensor = audio_io.to_tensor()
        audio_tensor = tf.squeeze(audio_tensor)
        return audio_tensor
    return lambda p: tf.py_function(ffmpeg_decode, [p], tf.int16)


commonvoice_root = "/home/ubuntu/commonvoice_de"

ds = ds_from_tsv(
    commonvoice_root + "/train.tsv",
    commonvoice_root + "/clips"
)

ds = ds.map(read_mp3())

n = 0
for r in ds:
    n += 1
    print(n, r.shape)
```

I'm working with the German part of the Mozilla CommonVoice dataset. For me the segfault happens after 1020 files. My machine has 20GB RAM, and the memory consumption of the process looks fine to me, with no sign of a memory leak.
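One thing worth noting: 1020 is suspiciously close to the default per-process open-file limit of 1024 on Ubuntu, so a possible (unconfirmed) hypothesis is that file descriptors are being leaked, one per decoded file, until the process runs out. A quick way to check the limit from Python:

```python
import resource

# Query the per-process file-descriptor limit (RLIMIT_NOFILE).
# On a stock Ubuntu 18.04 install the soft limit defaults to 1024,
# which is close to the 1020 files decoded before the crash.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open-file soft limit:", soft, "hard limit:", hard)
```

On Linux the number of descriptors currently held by the process can also be watched during the loop with `ls /proc/<pid>/fd | wc -l`; if that count climbs by one per file, a descriptor leak would explain the crash point.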

I'm running Ubuntu 18.04.3 LTS and ffmpeg 7:3.4.6-0ubuntu0.18.04.1.
