Abstract
Background
Novel view synthesis has grown significantly in popularity recently thanks to the introduction of implicit 3D scene representations such as NeRF. These techniques enable many downstream applications in fields ranging from robotics to entertainment. However, most of them are limited by the number of training views required: many need up to 200 views, which can be prohibitive for real-world use. Prior work has tackled this problem with sparse-view techniques, but these focus almost entirely on forward-facing scenes and suffer from the long training and inference times of NeRFs.
Approach
We introduce a technique for real-time 360° sparse view synthesis by leveraging 3D Gaussian Splatting. The explicit nature of this scene representation allows us to reduce sparse-view artifacts with techniques that operate directly and adaptively on the representation itself. Combined with depth-based constraints, we are able to render high-quality novel views and depth maps for unbounded scenes.
Our proposed pipeline integrates depth and diffusion constraints, along with a floater pruning technique, to enhance few-shot novel view synthesis. During training, we render the alpha-blended depth, denoted d_alpha, and use a Pearson correlation loss to align it with the monocularly estimated depth d_pt. Furthermore, we impose a score distillation sampling (SDS) loss on novel viewpoints to encourage naturally-appearing images. At predetermined intervals, we perform floater pruning as described in Section 3 of our paper. In the pipeline illustration, the new components we introduce are highlighted in color, while the foundational 3D Gaussian Splatting pipeline is depicted in grey.
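For concreteness, below is a minimal PyTorch-style sketch of the Pearson-correlation depth constraint described above. The function name, tensor shapes, and epsilon are illustrative assumptions, not the exact implementation.

```python
import torch

def pearson_depth_loss(rendered_depth: torch.Tensor,
                       mono_depth: torch.Tensor,
                       eps: float = 1e-6) -> torch.Tensor:
    """Penalize low Pearson correlation between a rendered depth map
    (e.g. the alpha-blended depth d_alpha) and a monocular depth
    estimate d_pt for the same training view.

    Both inputs are (H, W) depth maps. Pearson correlation is scale-
    and shift-invariant, so the monocular estimate does not need to
    share the renderer's metric scale.
    """
    x = rendered_depth.reshape(-1)
    y = mono_depth.reshape(-1)
    # Center both depth maps, then compute the correlation as the
    # cosine similarity of the centered vectors.
    x = x - x.mean()
    y = y - y.mean()
    corr = (x * y).sum() / (x.norm() * y.norm() + eps)
    # Maximizing correlation corresponds to minimizing (1 - corr).
    return 1.0 - corr
```

In practice this term would be added to the standard photometric loss with a tunable weighting coefficient.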
Key Ideas
- Use off-the-shelf depth estimation models to regularize novel view outputs
- Apply a softmax function to Gaussian depth weights for better control of depth gradients (see the sketch after this list)
- Leverage the explicit gaussian representation to directly remove “floaters”
- Reconstruct regions with low coverage in training views with diffusion-model guidance
- Use depth warping to create more training views
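The softmax depth idea from the list above can be sketched as follows. The exact formulation in the paper may differ; here we simply assume the per-pixel blending weights from the rasterizer are passed through a softmax before compositing depth, and the function names and `temperature` parameter are our own.

```python
import torch

def alpha_blended_depth(depths: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """Plain alpha-blended depth d_alpha for one pixel: sum_i w_i * d_i.

    depths:  (N,) front-to-back depths of the N Gaussians covering the pixel.
    weights: (N,) alpha-blending weights T_i * alpha_i from the rasterizer.
    """
    return (weights * depths).sum()

def softmax_depth(depths: torch.Tensor,
                  weights: torch.Tensor,
                  temperature: float = 1.0) -> torch.Tensor:
    """Assumed softmax variant: pass the blending weights through a softmax
    before compositing, which concentrates mass on the dominant Gaussian
    along the ray instead of averaging over all of them.
    """
    p = torch.softmax(weights / temperature, dim=0)
    return (p * depths).sum()
```

Because the softmax concentrates weight on the dominant Gaussian along a ray, the composited depth behaves more like a mode than a mean, which can give better-behaved gradients when supervising against monocular depth.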
Table 1: Ablation study on pipeline components
Model | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
---|---|---|---|
Base 3DGS | 15.38 | 0.442 | 0.506 |
Base + Alpha-blending Depth Loss | 15.67 | 0.456 | 0.500 |
Base + Softmax Depth Loss | 16.52 | 0.587 | 0.438 |
↑ + SDS Loss | 16.79 | 0.585 | 0.452 |
↑ + Depth Warping | 16.93 | 0.598 | 0.438 |
↑ + Floater Pruning | 17.18 | 0.602 | 0.437 |
Floater Removal Example
Table 2: Sparse-view baseline comparisons on the MipNeRF360 dataset
Model | PSNR ↑ | SSIM ↑ | LPIPS ↓ | Runtime* (h) | Render FPS |
---|---|---|---|---|---|
SparseNeRF | 11.5638 | 0.3206 | 0.6984 | 4 | 1/120 |
RegNeRF | 11.7379 | 0.2266 | 0.6892 | 4 | 1/120 |
Mip-NeRF 360 | 17.1044 | 0.4660 | 0.5750 | 3 | 1/120 |
Base 3DGS | 15.3840 | 0.4415 | 0.5061 | 0.5 | 30 |
ViP-NeRF | 11.1622 | 0.2291 | 0.7132 | 4 | 1/120 |
SparseGS (Ours) | 17.2558 | 0.5067 | 0.4738 | 0.75 | 30 |
We use 12 images for each scene. *Runtimes are measured on a single RTX 3090.
Table 3: Sparse-view baseline comparisons on the DTU dataset
Model | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
---|---|---|---|
SparseNeRF | 19.55 | 0.769 | 0.201 |
RegNeRF | 18.89 | 0.745 | 0.190 |
DSNeRF | 16.90 | 0.570 | 0.450 |
Base 3DGS | 14.18 | 0.628 | 0.301 |
SparseGS (Ours) | 18.89 | 0.702 | 0.229 |
While our method does not surpass some prior methods specifically designed for forward-facing scenes on the DTU dataset, it significantly improves upon the original 3DGS, achieving high visual fidelity and competitive metrics.
More Pictures
More Videos (Updating)
Citation
@article{xiong2023sparsegs,
author = {Xiong, Haolin and Muttukuru, Sairisheek and Upadhyay, Rishi and Chari, Pradyumna and Kadambi, Achuta},
title = {SparseGS: Real-Time 360° Sparse View Synthesis using Gaussian Splatting},
journal = {arXiv preprint},
year = {2023},
}