rerun-io/vistadream

Unofficial Implementation of VistaDream: Sampling multiview consistent images for single-view scene reconstruction

VistaDream is a novel framework for reconstructing 3D scenes from single-view images using Flux-based diffusion models. This implementation combines image outpainting, depth estimation, and 3D Gaussian splatting for high-quality 3D scene generation, with integrated visualization using Rerun.

Uses Rerun for 3D visualization, Gradio for interactive UI, Flux for diffusion-based outpainting, and Pixi for easy installation.

VistaDream 3D scene reconstruction

Overview

VistaDream addresses the challenge of 3D scene reconstruction from a single image through a novel two-stage pipeline:

  1. Coarse 3D Scaffold Construction: Creates a global scene structure by outpainting image boundaries and estimating depth maps
  2. Multi-view Consistency Sampling (MCS): Uses iterative diffusion-based RGB-D inpainting with multi-view consistency constraints to generate high-quality novel views

The framework integrates multiple state-of-the-art models:

  • Flux diffusion models for high-quality image outpainting and inpainting
  • 3D Gaussian Splatting for efficient 3D scene representation
  • Rerun for real-time 3D visualization and debugging

Installation

Prerequisites

  • Linux with an NVIDIA GPU (CUDA 12.8); other platforms are not supported
  • Pixi package manager

Using Pixi

git clone https://github.com/rerun-io/vistadream.git
cd vistadream
pixi run example

This downloads the required models automatically and runs the example on the included office image.

Usage

Full VistaDream Pipeline - 3D Scene Reconstruction ⚠️ Under Construction

Generate a complete 3D scene from a single image with outpainting, depth estimation, and Gaussian splatting:

pixi run python tools/run_vistadream.py --image-path data/office/IMG_4029.jpg --expansion-percent 0.2 --n-frames 10

Note: The full 3D reconstruction pipeline is currently under active development. Some features may be experimental or incomplete.

Single Image Processing

Process a single image with depth estimation and basic 3D reconstruction:

pixi run python tools/run_single_img.py --image-path data/office/IMG_4029.jpg
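
To make the "depth estimation and basic 3D reconstruction" step more concrete, the following is a minimal sketch of how a depth map can be lifted to 3D points with a pinhole camera model. It is an illustration only and is not taken from tools/run_single_img.py; the function name backproject_depth and the intrinsics fx, fy, cx, cy are assumptions introduced for this example.

```python
def backproject_depth(depth, fx, fy, cx, cy):
    """Lift a depth map to camera-space 3D points using a pinhole model.

    depth is a list of rows, each row a list of depth values.
    fx, fy, cx, cy are hypothetical pinhole intrinsics (focal lengths and
    principal point, in pixels). Pixels with non-positive depth are skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # treat zero or negative depth as invalid
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # Tiny 2x2 depth map with one invalid pixel.
    toy_depth = [[2.0, 2.0], [0.0, 4.0]]
    for point in backproject_depth(toy_depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5):
        print(point)
```

A real pipeline would typically do this with array operations on the GPU; the loop form is used here only to keep the geometry easy to follow.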

Flux Outpainting Only

Run just the outpainting component with Rerun visualization:

pixi run python tools/run_flux_outpainting.py --image-path data/office/IMG_4029.jpg --expansion-percent 0.2
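
As a rough illustration of what an expansion value such as 0.2 could mean, the sketch below computes an enlarged canvas and a binary mask marking the border region to be generated. It assumes the expansion is applied as a fraction of each dimension on every side; that interpretation, and the helper names expanded_canvas and outpaint_mask, are assumptions made for this example and may not match the behavior of tools/run_flux_outpainting.py.

```python
def expanded_canvas(width, height, expansion_percent):
    """Return (new_width, new_height, pad_x, pad_y) for a padded canvas.

    Assumption: expansion_percent is the fraction of each dimension that is
    added on every side of the original image.
    """
    if expansion_percent < 0:
        raise ValueError("expansion_percent must be non-negative")
    pad_x = round(width * expansion_percent)
    pad_y = round(height * expansion_percent)
    return width + 2 * pad_x, height + 2 * pad_y, pad_x, pad_y


def outpaint_mask(width, height, expansion_percent):
    """Build a mask over the padded canvas: 1 = generate, 0 = keep original."""
    new_w, new_h, pad_x, pad_y = expanded_canvas(width, height, expansion_percent)
    mask = [[1] * new_w for _ in range(new_h)]
    for y in range(pad_y, pad_y + height):
        for x in range(pad_x, pad_x + width):
            mask[y][x] = 0  # original pixels are preserved
    return mask


if __name__ == "__main__":
    print(expanded_canvas(100, 50, 0.2))
```

Under this assumption, a 100x50 image with an expansion of 0.2 becomes a 140x70 canvas with the original image centered in it.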

Multi-Image Pose & Depth Pipeline (VGGT + MoGe)

Given a directory of images, estimate camera intrinsics and extrinsics, per-image depth, and confidence masks, then fuse them into a colored point cloud (optionally downsampled). Results stream live to a Rerun viewer.

pixi run python tools/run_multi_img.py --image-dir /path/to/image_folder

Connect to an already running Rerun viewer (instead of spawning a new one):

pixi run python tools/run_multi_img.py --rr-config.connect --image-dir /path/to/image_folder

Notes:

  • Supported image extensions: .png, .jpg, .jpeg
  • Automatically orients & recenters camera poses ("up" orientation heuristic) and logs a consolidated point cloud plus per‑view RGB, depth, filtered depth, MoGe depth, and confidence.
  • Uses VGGT (Visual Geometry Grounded Transformer) for joint pose & depth, robust depth confidence filtering, MoGe for refined monocular depth, and voxel downsampling to target a manageable point count.
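
The voxel downsampling mentioned in the notes above can be sketched as follows. This is a generic, pure-Python illustration of the technique rather than the code used in this repository: points falling in the same grid cell are averaged, and the cell size is grown until the cloud fits a point budget. The function names, the default voxel_size, and the growth factor are all assumptions chosen for the example.

```python
import math
from collections import defaultdict


def voxel_downsample(points, colors, voxel_size):
    """Average all points (and their colors) that fall into the same voxel."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    if not points:
        return [], []
    # Anchor the grid at the minimum corner so a large enough voxel holds everything.
    origin = [min(p[i] for p in points) for i in range(3)]
    buckets = defaultdict(list)
    for p, c in zip(points, colors):
        key = tuple(math.floor((p[i] - origin[i]) / voxel_size) for i in range(3))
        buckets[key].append((p, c))
    out_points, out_colors = [], []
    for members in buckets.values():
        n = len(members)
        out_points.append(tuple(sum(m[0][i] for m in members) / n for i in range(3)))
        out_colors.append(tuple(sum(m[1][i] for m in members) / n for i in range(3)))
    return out_points, out_colors


def downsample_to_target(points, colors, max_points, voxel_size=0.01, growth=1.5):
    """Grow the voxel size until at most max_points points remain."""
    if max_points < 1:
        raise ValueError("max_points must be at least 1")
    if growth <= 1:
        raise ValueError("growth must be greater than 1")
    pts, cols = list(points), list(colors)
    while len(pts) > max_points:
        # Always downsample the original cloud, not the previous result.
        pts, cols = voxel_downsample(points, colors, voxel_size)
        voxel_size *= growth
    return pts, cols
```

Downsampling from the original cloud at each step, instead of repeatedly downsampling the previous result, avoids compounding averaging errors at the cost of some extra work.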

Gradio Web Interface

Launch an interactive web interface for experimenting with the models:

pixi run python tools/gradio_app.py

Key Features

  • Single Image to 3D: Complete pipeline from single image to navigable 3D scene
  • Multi-Image Geometry: Batch multi-view camera & depth estimation with fused colored point cloud export
  • Memory Efficient: Model offloading support for GPU memory management
  • Real-time Visualization: Integrated Rerun viewer for 3D scene inspection
  • Training-free: No fine-tuning required for existing diffusion models
  • High Quality: Multi-view consistency sampling ensures coherent 3D reconstruction

Project Structure

├── src/vistadream/
│   ├── api/                 # High-level pipeline APIs
│   │   ├── flux_outpainting.py    # Outpainting-only pipeline
│   │   ├── multi_image_pipeline.py # Multi-image pose & depth fusion (VGGT + MoGe)
│   │   └── vistadream_pipeline.py # Full 3D reconstruction pipeline
│   ├── flux/                # Flux diffusion model integration
│   │   ├── cli_*.py         # Command-line interfaces
│   │   ├── model.py         # Flux transformer architecture
│   │   ├── sampling.py      # Diffusion sampling logic
│   │   └── util.py          # Model loading and configuration
│   └── ops/                 # Core operations
│       ├── flux.py          # Flux model wrappers
│       ├── gs/              # Gaussian splatting implementation
│       ├── trajs/           # Camera trajectory generation
│       └── visual_check.py  # 3D scene validation tools
└── tools/                   # Standalone applications
    ├── gradio_app.py        # Web interface
    ├── run_flux_outpainting.py
    ├── run_multi_img.py     # Multi-image pose & depth (VGGT + MoGe)
    ├── run_vistadream.py    # Main 3D pipeline
    └── run_single_img.py    # Single image processing

Model Checkpoints

Models are automatically downloaded from Hugging Face on first run. Manual download:

pixi run huggingface-cli download pablovela5620/vistadream --local-dir ckpt/

Expected structure:

ckpt/
├── flux_fill/
│   ├── flux1-fill-dev.safetensors
│   └── ae.safetensors
├── vec.pt
├── txt.pt
└── txt_256.pt
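
After a manual download, a quick check like the one below can confirm that the files listed above are in place. This is a small convenience sketch, not a script shipped with the repository; the name missing_checkpoints is hypothetical, and the file list simply mirrors the expected structure shown above.

```python
from pathlib import Path

# Relative paths copied from the expected ckpt/ layout above.
EXPECTED_FILES = [
    "flux_fill/flux1-fill-dev.safetensors",
    "flux_fill/ae.safetensors",
    "vec.pt",
    "txt.pt",
    "txt_256.pt",
]


def missing_checkpoints(ckpt_dir="ckpt"):
    """Return the expected checkpoint files that are not present in ckpt_dir."""
    root = Path(ckpt_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoint files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected checkpoint files are present.")
```

Run it from the repository root so that the relative ckpt/ path resolves correctly.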

Citation

Thanks to the original authors! If you use VistaDream in your research, please cite:

Original Repo

@inproceedings{wang2025vistadream,
  title={VistaDream: Sampling multiview consistent images for single-view scene reconstruction},
  author={Wang, Haiping and Liu, Yuan and Liu, Ziwei and Wang, Wenping and Dong, Zhen and Yang, Bisheng},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}

Acknowledgements

This project builds upon several outstanding works:

Related Work

  • ASUKA - Enhanced image inpainting for mitigating unwanted object insertion
  • MoGe - Accurate monocular geometry estimation for open-domain images
