[wip] Update VideoDecoder init #799

Dan-Flores · 2025-08-01T05:51:33Z

This PR enables custom_frame_mappings to be used in the Python VideoDecoder.

It implements read_custom_frame_mappings(), which parses a JSON string or JSON file to extract all_frames, is_key_frame, and duration.
It switches the seek_mode when custom_frame_mappings is passed in, so that initialization skips the scan that exact mode performs.
It adds tests that check the frames are decoded correctly.
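
As a rough illustration of the parsing step, here is a minimal sketch of what such a reader could look like. It is not the PR's implementation: parse_frame_mappings is a hypothetical name, it returns plain lists rather than tensors, and it assumes the JSON has the layout produced by ffprobe -show_frames -of json, with a top-level "frames" list whose entries carry pts, key_frame, and duration.

```python
import json
from pathlib import Path


def parse_frame_mappings(source):
    """Hypothetical stand-in for read_custom_frame_mappings().

    Accepts a JSON string or a path to a JSON file and returns three
    parallel lists: all_frames (pts), is_key_frame, duration.
    """
    text = str(source)
    # Assumption: anything that does not start with "{" is a file path.
    if not text.lstrip().startswith("{"):
        text = Path(text).read_text()
    try:
        frames = json.loads(text)["frames"]
    except (json.JSONDecodeError, KeyError) as e:
        raise ValueError(
            "expected ffprobe JSON with a top-level 'frames' list"
        ) from e

    # Build one list per field, keeping the order of the input frames.
    all_frames = [int(f["pts"]) for f in frames]
    is_key_frame = [bool(int(f["key_frame"])) for f in frames]
    duration = [int(f["duration"]) for f in frames]
    return all_frames, is_key_frame, duration
```

Returning parallel lists keeps the sketch free of a torch dependency; converting each list with torch.tensor() would be the natural next step before handing the data to the decoder.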

Benchmarking

I wrote a short benchmarking script to compare the initialization time of custom_frame_mappings against exact mode.

custom_frame_mappings was faster than exact mode only when the ffprobe command that generated the frame-mapping JSON used the -show_entries option to reduce the JSON size.

With this optimization, the speedup grew with longer videos.
Without it, using the full ffprobe output, exact mode was faster.
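
If a full ffprobe dump already exists, the same reduction can be approximated after the fact. The sketch below is an assumption-laden illustration, not part of the PR: trim_frame_mappings is a hypothetical helper, and the sample frame uses made-up values with field names that ffprobe emits by default.

```python
import json

# Fields kept by `-show_entries frame=pts,key_frame,duration`.
NEEDED_FIELDS = ("pts", "key_frame", "duration")


def trim_frame_mappings(full_json: str) -> str:
    """Drop every per-frame field except the three listed above."""
    frames = json.loads(full_json)["frames"]
    trimmed = [{k: f[k] for k in NEEDED_FIELDS} for f in frames]
    return json.dumps({"frames": trimmed})


# Made-up single-frame example in the shape of a full ffprobe dump.
full = json.dumps({"frames": [{
    "media_type": "video", "stream_index": 0, "key_frame": 1,
    "pts": 0, "pts_time": "0.000000", "duration": 1001,
    "pkt_pos": "48", "pkt_size": "1024", "width": 1920, "height": 1080,
    "pix_fmt": "yuv420p", "pict_type": "I",
}]})
small = trim_frame_mappings(full)
print(len(full), "->", len(small))
```

Generating the reduced JSON directly with -show_entries is still preferable, since it avoids writing and re-reading the full dump at all.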

The results on nasa_13013.mp4 (median +- standard deviation over 100 runs):

exact:
med = 3.38ms +- 2.54
custom_frame_mappings:
med = 3.02ms +- 0.86

The results on a generated video, mandelbrot_1920x1080_120s.mp4:

exact:
med = 29.51ms +- 7.56
custom_frame_mappings:
med = 16.69ms +- 9.38

The benchmarking code:

import subprocess
import torch
from time import perf_counter_ns

from torchcodec.decoders._video_decoder import VideoDecoder


def bench(f, *args, num_exp=100, warmup=0, **kwargs):
    """Call f(*args, **kwargs) num_exp times and return the per-call times in ns."""
    for _ in range(warmup):
        f(*args, **kwargs)

    times = []
    for _ in range(num_exp):
        start = perf_counter_ns()
        f(*args, **kwargs)
        end = perf_counter_ns()
        times.append(end - start)
    return torch.tensor(times).float()

def report_stats(times, unit="ms"):
    mul = {
        "ns": 1,
        "µs": 1e-3,
        "ms": 1e-6,
        "s": 1e-9,
    }[unit]
    times = times * mul
    std = times.std().item()
    med = times.median().item()
    print(f"{med = :.2f}{unit} +- {std:.2f}")
    return med

def main() -> None:
    """Benchmarks the init of VideoDecoder with different seek_modes"""
    resources_dir = "/Users/danielflores3/torchcodec/test/resources/"

    # video = resources_dir + "nasa_13013.mp4"
    video = resources_dir + "mandelbrot_1920x1080_120s.mp4"

    # Generate the frame mappings with ffprobe. -show_entries keeps only the
    # fields the decoder needs, which keeps the JSON small.
    mappings_json = subprocess.run(
        [
            "ffprobe",
            "-i",
            video,
            "-select_streams",
            "0",
            "-show_frames",
            "-show_entries",
            "frame=pts,key_frame,duration",
            "-of",
            "json",
        ],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    
    print("exact:")
    report_stats(bench(VideoDecoder, source=video, seek_mode="exact", stream_index=0))

    print("custom_frame_mappings:")
    report_stats(bench(VideoDecoder, source=video, custom_frame_mappings=mappings_json, stream_index=0))


if __name__ == "__main__":
    main()


NicolasHug · 2025-08-01T15:14:43Z

Thanks for the benchmarks! That's quite interesting. It would also be worth using the core API instead of the VideoDecoder in the benchmarks, so that we can measure the time for add_video_stream() specifically when passing the frame mapping as tensors.

If that's faster than add_video_stream(seek_mode="exact"), as we would expect, then it may be that what's slowing down the frame-mappings is the io or the json parsing parts?

Dan-Flores · 2025-08-01T15:23:24Z

"... then it may be that what's slowing down the frame-mappings is the io or the json parsing parts?"

I believe this is it. I updated the PR description with my updated benchmarking script and results: reducing the JSON to contain only the necessary information significantly improves the performance of custom_frame_mappings.
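
One way to check the JSON-parsing hypothesis in isolation is to time json.loads() alone on payloads of different sizes. The sketch below uses synthetic data only: median_parse_ms and synthetic_mappings are hypothetical helpers, the filler fields merely imitate the extra per-frame entries of a full ffprobe dump, and the numbers it prints are not measurements from this PR.

```python
import json
from statistics import median
from time import perf_counter_ns


def median_parse_ms(payload: str, num_exp: int = 20) -> float:
    """Median wall-clock time of json.loads(payload), in milliseconds."""
    times = []
    for _ in range(num_exp):
        start = perf_counter_ns()
        json.loads(payload)
        times.append(perf_counter_ns() - start)
    return median(times) * 1e-6


def synthetic_mappings(num_frames: int, extra_fields: int) -> str:
    """Build ffprobe-shaped JSON; extra_fields pads each frame with filler keys."""
    frames = []
    for i in range(num_frames):
        frame = {"pts": i * 1001, "key_frame": int(i % 30 == 0), "duration": 1001}
        frame.update({f"filler_{j}": "0.000000" for j in range(extra_fields)})
        frames.append(frame)
    return json.dumps({"frames": frames})


reduced = synthetic_mappings(num_frames=3000, extra_fields=0)
padded = synthetic_mappings(num_frames=3000, extra_fields=25)
print(f"reduced: {len(reduced)} chars, {median_parse_ms(reduced):.2f}ms")
print(f"padded:  {len(padded)} chars, {median_parse_ms(padded):.2f}ms")
```

Comparing these parse times against the gap between the two modes gives a rough sense of how much of the slowdown the parsing step alone could explain.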

Edit: Here's my alternative code for benchmarking the core API by passing the frame mapping as tensors. It shows a similar performance improvement to reducing the JSON, so this could be another viable design approach.

import subprocess
import torch
from time import perf_counter_ns

from torchcodec import _core
from torchcodec.decoders._video_decoder import read_custom_frame_mappings


def bench(f, *args, num_exp=100, warmup=0, **kwargs):
    """Call f(*args, **kwargs) num_exp times and return the per-call times in ns."""
    for _ in range(warmup):
        f(*args, **kwargs)

    times = []
    for _ in range(num_exp):
        start = perf_counter_ns()
        f(*args, **kwargs)
        end = perf_counter_ns()
        times.append(end - start)
    return torch.tensor(times).float()

def report_stats(times, unit="ms"):
    mul = {
        "ns": 1,
        "µs": 1e-3,
        "ms": 1e-6,
        "s": 1e-9,
    }[unit]
    times = times * mul
    std = times.std().item()
    med = times.median().item()
    print(f"{med = :.2f}{unit} +- {std:.2f}")
    return med

def main() -> None:
    """Benchmarks _core.add_video_stream() in exact mode vs. with pre-parsed custom frame mappings"""
    resources_dir = "/Users/danielflores3/torchcodec/test/resources/"

    # video = resources_dir + "nasa_13013.mp4"
    video = resources_dir + "mandelbrot_1920x1080_120s.mp4"

    # Generate the frame mappings with ffprobe. -show_entries keeps only the
    # fields the decoder needs, which keeps the JSON small.
    mappings_json = subprocess.run(
        [
            "ffprobe",
            "-i",
            video,
            "-select_streams",
            "0",
            "-show_frames",
            "-show_entries",
            "frame=pts,key_frame,duration",
            "-of",
            "json",
        ],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    
    # Parse the JSON once, outside the timed region, then pass the resulting
    # tensors straight to the core API.
    custom_frame_mappings_data = read_custom_frame_mappings(mappings_json)
    args = [("exact", None), ("custom_frame_mappings", custom_frame_mappings_data)]
    for seek_mode, frame_mappings_data in args:
        # Time decoder creation plus add_video_stream() for each seek mode.
        print(f"{seek_mode=}")
        report_stats(bench(init_add_stream, video, seek_mode, frame_mappings_data))

def init_add_stream(video, seek_mode, frame_mappings_data):
    """Create a decoder and add video stream 0; this is the timed unit."""
    decoder = _core.create_from_file(str(video), seek_mode)
    _core.add_video_stream(
        decoder,
        stream_index=0,
        dimension_order="NCHW",
        num_threads=1,
        device="cpu",
        custom_frame_mappings=frame_mappings_data,
    )

if __name__ == "__main__":
    main()

meta-cla bot added the CLA Signed label Aug 1, 2025

Daniel Flores added 4 commits August 1, 2025 17:17

Update VideoDecoder init

1d59870

Update seek_mode passed to C++ when frame_mappings used

b107435

Update test match string

6fa070e

update json error message

0d661f5

Dan-Flores force-pushed the init_with_frame_mappings branch from 897e15d to 0d661f5 Compare August 1, 2025 21:21

Daniel Flores added 4 commits August 1, 2025 17:26

lints

aeac69b

Fix type annotations for linter

526eda0

more annotation fixes

d668dbc

type annotation

e220d14
