
Add blender dataset + alpha training support #573


Status: Open · wants to merge 1 commit into main

Conversation

@thomakah commented Mar 4, 2025

This pull request adds the following:

  • Support for the commonly used synthetic Blender dataset
  • Support for source images with alpha channels, and small changes to training and evaluation to accommodate this (more info below)

The latter is the main purpose of the pull request; the former allows convenient testing on a public dataset with transparent images. Building on this work, users can train models from matted photos to produce splats only for the object of interest; however, that would require a small additional change to the COLMAP parsing to load the alpha channel, and is not included in this pull request.
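
For reference, the synthetic Blender dataset stores per-split camera metadata in transforms_{train,val,test}.json alongside RGBA PNGs. Below is a minimal parsing sketch of that standard layout; the dataset class in this PR may be structured differently, and the helper name load_blender_split is purely illustrative:

```python
import json
import math
import numpy as np
import imageio.v2 as imageio

def load_blender_split(root: str, split: str = "train"):
    """Parse the standard NeRF-synthetic transforms_{split}.json layout."""
    with open(f"{root}/transforms_{split}.json") as f:
        meta = json.load(f)
    images, camtoworlds = [], []
    for frame in meta["frames"]:
        # Images are RGBA PNGs; keep the alpha channel for compositing later.
        rgba = imageio.imread(f"{root}/{frame['file_path']}.png") / 255.0
        images.append(rgba.astype(np.float32))
        camtoworlds.append(np.array(frame["transform_matrix"], dtype=np.float32))
    H, W = images[0].shape[:2]
    # Shared pinhole intrinsic derived from the horizontal field of view.
    focal = 0.5 * W / math.tan(0.5 * meta["camera_angle_x"])
    return np.stack(images), np.stack(camtoworlds), focal
```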

Alpha is handled in the following way (a compositing sketch follows this list):

  • For evaluation, both the render and the ground-truth photo are composited onto a fixed background (defined by a new configuration parameter, defaulting to white).
  • For training:
    • if the (existing but slightly repurposed) configuration setting random_bkgd is True, the same random background color is applied to both the photo and the render on each iteration
    • otherwise, the fixed background color is used
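
A minimal sketch of this compositing logic, assuming PyTorch tensors; the function names and the fixed white default here are illustrative and may not match the trainer code exactly:

```python
import torch

def composite(rgb: torch.Tensor, alpha: torch.Tensor, bkgd: torch.Tensor) -> torch.Tensor:
    """Straight-alpha 'over' compositing onto a solid background color.
    rgb: [H, W, 3], alpha: [H, W, 1], bkgd: [3], all in [0, 1]."""
    return rgb * alpha + bkgd * (1.0 - alpha)

def training_background(random_bkgd: bool, background_color=(1.0, 1.0, 1.0), device="cpu"):
    """Pick the background color for one iteration: random if requested, else fixed."""
    if random_bkgd:
        return torch.rand(3, device=device)
    return torch.tensor(background_color, device=device)

# Example for one iteration: the same background is applied to GT and render,
# so the loss penalizes differences in the foreground and in alpha, not the backdrop.
H, W = 4, 4
gt_rgb, gt_alpha = torch.rand(H, W, 3), torch.rand(H, W, 1)
render_rgb, render_alpha = torch.rand(H, W, 3), torch.rand(H, W, 1)
bkgd = training_background(random_bkgd=True)
loss = torch.nn.functional.l1_loss(composite(render_rgb, render_alpha, bkgd),
                                   composite(gt_rgb, gt_alpha, bkgd))
```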

Using the random background in training encourages better alpha consistency and reduces floaters. However, the final evaluation metrics are marginally worse than when using a fixed background. For this reason, I have not enforced random_bkgd = True when Blender data is used.

The metrics are included below. alpha_iou is the intersection over union comparing the rendered alpha to the source image's alpha channel (thresholded at > 127). nerfbaselines has also assessed the Blender dataset on an earlier version of gsplat (patching in the dataset compatibility on their end). Compared to their results, six scenes show marginally better PSNR while two (materials and ficus) have marginally worse PSNR.
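
For clarity, the alpha_iou metric described above can be computed roughly as follows (a sketch, not necessarily the exact evaluation code behind the numbers below):

```python
import torch

def alpha_iou(pred_alpha: torch.Tensor, gt_alpha: torch.Tensor, thresh: float = 127 / 255) -> float:
    # pred_alpha, gt_alpha: [H, W] in [0, 1]; binarize, then IoU of the foreground masks.
    pred = pred_alpha > thresh
    gt = gt_alpha > thresh
    intersection = (pred & gt).sum().float()
    union = (pred | gt).sum().float()
    return (intersection / union.clamp(min=1)).item()
```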

| scene | strategy | background | PSNR | SSIM | LPIPS | alpha_iou |
|---|---|---|---|---|---|---|
| chair | default | fixed_bkgd | 35.823 | 0.987 | 0.008 | 0.991 |
| chair | default | random_bkgd | 35.718 | 0.987 | 0.008 | 0.997 |
| chair | mcmc | fixed_bkgd | 36.398 | 0.989 | 0.008 | 0.993 |
| chair | mcmc | random_bkgd | 36.149 | 0.989 | 0.009 | 0.997 |
| drums | default | fixed_bkgd | 26.121 | 0.954 | 0.039 | 0.982 |
| drums | default | random_bkgd | 26.024 | 0.953 | 0.040 | 0.989 |
| drums | mcmc | fixed_bkgd | 26.209 | 0.956 | 0.040 | 0.973 |
| drums | mcmc | random_bkgd | 26.127 | 0.955 | 0.042 | 0.989 |
| ficus | default | fixed_bkgd | 34.360 | 0.987 | 0.010 | 0.952 |
| ficus | default | random_bkgd | 33.639 | 0.987 | 0.012 | 0.965 |
| ficus | mcmc | fixed_bkgd | 34.957 | 0.988 | 0.008 | 0.958 |
| ficus | mcmc | random_bkgd | 34.023 | 0.987 | 0.011 | 0.967 |
| hotdog | default | fixed_bkgd | 37.668 | 0.985 | 0.014 | 0.981 |
| hotdog | default | random_bkgd | 37.417 | 0.985 | 0.013 | 0.998 |
| hotdog | mcmc | fixed_bkgd | 38.373 | 0.988 | 0.011 | 0.762 |
| hotdog | mcmc | random_bkgd | 37.717 | 0.988 | 0.010 | 0.997 |
| lego | default | fixed_bkgd | 35.286 | 0.981 | 0.012 | 0.996 |
| lego | default | random_bkgd | 35.046 | 0.981 | 0.013 | 0.997 |
| lego | mcmc | fixed_bkgd | 35.732 | 0.984 | 0.011 | 0.994 |
| lego | mcmc | random_bkgd | 35.431 | 0.984 | 0.012 | 0.997 |
| materials | default | fixed_bkgd | 29.885 | 0.960 | 0.019 | 0.997 |
| materials | default | random_bkgd | 29.791 | 0.960 | 0.021 | 0.997 |
| materials | mcmc | fixed_bkgd | 30.577 | 0.965 | 0.018 | 0.997 |
| materials | mcmc | random_bkgd | 30.464 | 0.965 | 0.018 | 0.997 |
| mic | default | fixed_bkgd | 35.335 | 0.991 | 0.006 | 0.992 |
| mic | default | random_bkgd | 35.256 | 0.992 | 0.006 | 0.994 |
| mic | mcmc | fixed_bkgd | 37.226 | 0.994 | 0.004 | 0.992 |
| mic | mcmc | random_bkgd | 36.522 | 0.993 | 0.005 | 0.994 |
| ship | default | fixed_bkgd | 30.764 | 0.906 | 0.088 | 0.955 |
| ship | default | random_bkgd | 30.324 | 0.906 | 0.088 | 0.994 |
| ship | mcmc | fixed_bkgd | 31.437 | 0.911 | 0.075 | 0.829 |
| ship | mcmc | random_bkgd | 30.739 | 0.910 | 0.077 | 0.994 |

@thomakah (Author) commented Mar 6, 2025

To better illustrate the benefit of the random background, here is a validation render from training with a fixed background (using MCMC, though we see similar effects with the default strategy). The figures show the ground truth image on the left, and the corresponding render on the right.

[image: val_step29999_0167_fixed (validation render, fixed background)]

Here is the validation render using a random background:
[image: val_step29999_0167_random (validation render, random background)]

Note that the large patch of white floaters below the object is no longer present after using the random background in training.

@f-dy commented Mar 13, 2025

Related issue (closed a bit quickly): #89

I knew it didn't work by looking at the nerfbaselines outputs, which had lots of distant floaters:

All the Blender examples in https://nerfbaselines.github.io/m-gsplat have that issue, which is NOT reflected in the PSNR/SSIM numbers (@jkulhanek).

Original 3DGS doesn't have the issue: https://nerfbaselines.github.io/m-gaussian-splatting

Splatfacto isn't in nerfbaselines, but I think it doesn't have that issue.

Looking at the code, it seemed clear that something was missing. Thank you @thomakah for fixing it.

@weihan1 commented Mar 30, 2025

Hey @thomakah, I've implemented your pull request and I get similar results on the Blender scenes with a transparent background. I was wondering if you have tried Blender scenes with a non-transparent background (e.g. an HDRI background)? My NVS results for those scenes seem to be a lot worse (blurry background, floaters), despite using a fairly simple camera trajectory and dense supervision.

@thomakah (Author) commented

@weihan1 No, I've only used a transparent background.

By HDRI background, do you mean you composite the render against an HDRI background? Is the background rotated properly, as it would be if it were a real 3D environment? Are you re-rendering from the original Blender files to properly light the objects?

I personally have not seen radiance fields used on synthetic data with a composited background. I think you would at least need to adjust parameters; for example, the random point initialization needs to extend far enough to cover the background. But it seems even more complex than that: I can't easily picture how the camera models would apply. For example, the HDRI would have a different effective focal length than the renders, right?
