[wip] Update VideoDecoder init #799

Draft. Wants to merge 8 commits into base: main.

Conversation

Dan-Flores
Contributor

@Dan-Flores Dan-Flores commented Aug 1, 2025

This PR enables custom_frame_mappings to be used in the Python VideoDecoder.

  • Implements read_custom_frame_mappings(), which parses a JSON string or JSON file to extract all_frames, is_key_frame, and duration.
  • Updates the seek_mode when custom_frame_mappings is passed in, so that initialization avoids the scan performed in exact mode.
  • Adds tests that check the frames are decoded correctly.
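
As a rough illustration of the parsing step (a minimal sketch, not the implementation in this PR): the hypothetical helper parse_frame_mappings below takes a JSON string shaped like ffprobe's -show_frames -of json output and returns three plain lists. The key names pts, key_frame and duration come from the -show_entries option used in the benchmark; the top-level "frames" key, the sort by pts, and the use of lists instead of tensors are assumptions of this sketch.

```python
import json


def parse_frame_mappings(mappings_json: str):
    """Hypothetical helper: extract per-frame fields from ffprobe-style JSON.

    Assumes a top-level "frames" list whose entries carry "pts",
    "key_frame" and "duration".
    """
    frames = json.loads(mappings_json)["frames"]
    # Assumption: sort by presentation timestamp so indices follow display order.
    frames = sorted(frames, key=lambda frame: int(frame["pts"]))
    all_frames = [int(frame["pts"]) for frame in frames]
    is_key_frame = [bool(int(frame["key_frame"])) for frame in frames]
    duration = [int(frame["duration"]) for frame in frames]
    return all_frames, is_key_frame, duration


# Made-up sample data, deliberately out of order.
sample = json.dumps(
    {
        "frames": [
            {"key_frame": 1, "pts": 0, "duration": 1001},
            {"key_frame": 0, "pts": 2002, "duration": 1001},
            {"key_frame": 0, "pts": 1001, "duration": 1001},
        ]
    }
)
all_frames, is_key_frame, duration = parse_frame_mappings(sample)
print(all_frames)
print(is_key_frame)
print(duration)
```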

Benchmarking

I wrote a short benchmarking script to test the initialization times of custom_frame_mappings versus exact.

custom_frame_mappings mode was quicker than exact mode, but only when the ffprobe command that generates the frame-mapping JSON uses the -show_entries option to reduce the JSON size.

With this optimization, the performance improvement grows with longer videos.
Without it (using the full ffprobe output), exact mode was faster.
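
To give a feel for why -show_entries matters, here is a small sketch that compares the size of a full per-frame entry with a reduced one. The field names follow ffprobe's frame section, but the values and the frame count are made up for illustration, and the real full output contains more fields than shown here.

```python
import json

# Illustrative per-frame entry; values are made up.
full_frame = {
    "media_type": "video",
    "stream_index": 0,
    "key_frame": 1,
    "pts": 0,
    "pts_time": "0.000000",
    "duration": 1001,
    "duration_time": "0.033367",
    "pkt_pos": "48",
    "pkt_size": "12345",
    "width": 1920,
    "height": 1080,
    "pix_fmt": "yuv420p",
    "pict_type": "I",
}

# The three fields kept by -show_entries frame=pts,key_frame,duration.
KEPT_FIELDS = ("pts", "key_frame", "duration")
reduced_frame = {name: full_frame[name] for name in KEPT_FIELDS}

num_frames = 1000  # arbitrary frame count for the comparison
full_json = json.dumps({"frames": [full_frame] * num_frames})
reduced_json = json.dumps({"frames": [reduced_frame] * num_frames})

print(f"full: {len(full_json)} chars, reduced: {len(reduced_json)} chars")
```

Less text to read and parse means less work before the decoder is even created, which is consistent with the timings reported here.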

The results on nasa_13013.mp4:

exact:
med = 3.38ms +- 2.54
custom_frame_mappings:
med = 3.02ms +- 0.86

The results on a generated video, mandelbrot_1920x1080_120s.mp4:

exact:
med = 29.51ms +- 7.56
custom_frame_mappings:
med = 16.69ms +- 9.38

The benchmarking code:

import subprocess
import torch
from time import perf_counter_ns

from torchcodec.decoders._video_decoder import VideoDecoder


def bench(f, *args, num_exp=100, warmup=0, **kwargs):

    for _ in range(warmup):
        f(*args, **kwargs)

    times = []
    for _ in range(num_exp):
        start = perf_counter_ns()
        f(*args, **kwargs)
        end = perf_counter_ns()
        times.append(end - start)
    return torch.tensor(times).float()

def report_stats(times, unit="ms"):
    mul = {
        "ns": 1,
        "µs": 1e-3,
        "ms": 1e-6,
        "s": 1e-9,
    }[unit]
    times = times * mul
    std = times.std().item()
    med = times.median().item()
    print(f"{med = :.2f}{unit} +- {std:.2f}")
    return med

def main() -> None:
    """Benchmarks the init of VideoDecoder with different seek_modes"""
    resources_dir = "/Users/danielflores3/torchcodec/test/resources/"

    # video = resources_dir + "nasa_13013.mp4"
    video = resources_dir + "mandelbrot_1920x1080_120s.mp4"

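    # Restrict ffprobe's output to the three fields the decoder needs;
    # with the full per-frame output, exact mode was faster.
    mappings_json = subprocess.run(
        [
            "ffprobe",
            "-i",
            video,
            "-select_streams",
            "0",
            "-show_frames",
            "-show_entries",
            "frame=pts,key_frame,duration",
            "-of",
            "json",
        ],
        check=True,
        capture_output=True,
        text=True,
    ).stdout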
    
    print("exact:")
    report_stats(bench(VideoDecoder, source=video, seek_mode="exact", stream_index=0))

    print("custom_frame_mappings:")
    report_stats(bench(VideoDecoder, source=video, custom_frame_mappings=mappings_json, stream_index=0))


if __name__ == "__main__":
    main()

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Aug 1, 2025
@NicolasHug
Member

Thanks for the benchmarks, that's quite interesting! It would also be worth using the core API instead of VideoDecoder in the benchmarks, so that we can measure the time for add_video_stream() specifically when passing the frame mapping as tensors.

If that's faster than add_video_stream(seek_mode="exact"), as we would expect, then it may be that what's slowing down the frame mappings is the I/O or the JSON parsing?
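
One way to check this is to time each stage on its own. The sketch below uses a hypothetical time_stage helper and a synthetic payload as a stand-in for the ffprobe output; none of these names come from torchcodec, and only the JSON-decoding stage is shown.

```python
import json
from statistics import median
from time import perf_counter_ns


def time_stage(fn, *args, num_exp=20):
    """Hypothetical helper: median wall-clock time of fn(*args), in ms."""
    samples = []
    for _ in range(num_exp):
        start = perf_counter_ns()
        fn(*args)
        samples.append((perf_counter_ns() - start) * 1e-6)
    return median(samples)


# Synthetic stand-in for the ffprobe output (values are made up).
payload = json.dumps(
    {
        "frames": [
            {"pts": i, "key_frame": int(i % 30 == 0), "duration": 1}
            for i in range(5000)
        ]
    }
)

parse_ms = time_stage(json.loads, payload)
print(f"JSON parsing: {parse_ms:.3f} ms")
```

The same helper could wrap the file read and the add_video_stream() call separately, so that the cost of each stage shows up as its own number.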

@Dan-Flores
Contributor Author

Dan-Flores commented Aug 1, 2025

... then it may be that what's slowing down the frame mappings is the I/O or the JSON parsing?

I believe this is it. I updated the PR description with my revised benchmarking script and results. By reducing the JSON to contain only the necessary information, the performance of custom_frame_mappings improves significantly.

Edit: Here's my alternative code for benchmarking the core API by passing the frame mapping as tensors. It shows a similar performance improvement to reducing the JSON, so this could be another viable design approach.

import subprocess
import torch
from time import perf_counter_ns

from torchcodec import _core
from torchcodec.decoders._video_decoder import read_custom_frame_mappings


def bench(f, *args, num_exp=100, warmup=0, **kwargs):

    for _ in range(warmup):
        f(*args, **kwargs)

    times = []
    for _ in range(num_exp):
        start = perf_counter_ns()
        f(*args, **kwargs)
        end = perf_counter_ns()
        times.append(end - start)
    return torch.tensor(times).float()

def report_stats(times, unit="ms"):
    mul = {
        "ns": 1,
        "µs": 1e-3,
        "ms": 1e-6,
        "s": 1e-9,
    }[unit]
    times = times * mul
    std = times.std().item()
    med = times.median().item()
    print(f"{med = :.2f}{unit} +- {std:.2f}")
    return med

def main() -> None:
    """Benchmarks _core.add_video_stream() with different seek modes"""
    resources_dir = "/Users/danielflores3/torchcodec/test/resources/"

    # video = resources_dir + "nasa_13013.mp4"
    video = resources_dir + "mandelbrot_1920x1080_120s.mp4"

    mappings_json = subprocess.run(
        [
            "ffprobe",
            "-i",
            video,
            "-select_streams",
            "0",
            "-show_frames",
            "-show_entries",
            "frame=pts,key_frame,duration",
            "-of",
            "json",
        ],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    
    # Parse the JSON once, outside the timed region, then pass the resulting
    # tensors directly to the core function.
    custom_frame_mappings_data = read_custom_frame_mappings(mappings_json)
    args = [("exact", None), ("custom_frame_mappings", custom_frame_mappings_data)]
    for seek_mode, frame_mappings_data in args:
        # Time create_from_file() + add_video_stream() for this seek mode.
        print(f"{seek_mode=}")
        report_stats(bench(init_add_stream, video, seek_mode, frame_mappings_data))

def init_add_stream(video, seek_mode, frame_mappings_data):
    decoder = _core.create_from_file(str(video), seek_mode)
    _core.add_video_stream(
        decoder,
        stream_index=0,
        dimension_order="NCHW",
        num_threads=1,
        device="cpu",
        custom_frame_mappings=frame_mappings_data,
    )

if __name__ == "__main__":
    main()

@Dan-Flores Dan-Flores force-pushed the init_with_frame_mappings branch from 897e15d to 0d661f5 Compare August 1, 2025 21:21