GitHub - pcl3dv/OV-NeRF: [IEEE TCSVT24] OV-NeRF: Open-vocabulary Neural Radiance Fields with Vision and Language Foundation Models for 3D Semantic Understanding

OV-NeRF: Open-vocabulary Neural Radiance Fields with Vision and Language Foundation Models for 3D Semantic Understanding

Guibiao Liao¹,², Kaichen Zhou³, Zhenyu Bao¹,², Kanglin Liu²,*, Qing Li²,*

¹Peking University ²Pengcheng Laboratory ³University of Oxford

*Corresponding author

Paper

Abstract: The development of Neural Radiance Fields (NeRFs) has provided a potent representation for encapsulating the geometric and appearance characteristics of 3D scenes. Enhancing the capabilities of NeRFs in open-vocabulary 3D semantic perception tasks has been a recent focus. However, current methods that extract semantics directly from Contrastive Language-Image Pretraining (CLIP) for semantic field learning encounter difficulties due to the noisy and view-inconsistent semantics provided by CLIP. To tackle these limitations, we propose OV-NeRF, which exploits the potential of pre-trained vision and language foundation models to enhance semantic field learning through single-view and cross-view strategies. First, from the single-view perspective, we introduce Region Semantic Ranking (RSR) regularization, which leverages 2D mask proposals derived from Segment Anything (SAM) to rectify the noisy semantics of each training view, facilitating accurate semantic field learning. Second, from the cross-view perspective, we propose a Cross-view Self-enhancement (CSE) strategy to address the challenge raised by view-inconsistent semantics. Rather than invariably using the inconsistent 2D semantics from CLIP, CSE uses the 3D-consistent semantics generated by the well-trained semantic field itself for semantic field training, aiming to reduce ambiguity and enhance overall semantic consistency across different views. Extensive experiments validate that OV-NeRF outperforms current state-of-the-art methods, achieving significant mIoU improvements of 20.31% and 18.42% on Replica and ScanNet, respectively. Furthermore, our approach exhibits consistently superior results across various CLIP configurations, further verifying its robustness.
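To give a feel for the single-view idea, here is a minimal sketch of region-level label smoothing. It is not the paper's RSR formulation, which this README does not detail; it is a simplified majority-vote stand-in that assumes per-pixel class labels and boolean region masks (such as those a mask-proposal model might produce). The function name and the toy arrays are hypothetical.

```python
import numpy as np

def region_majority_vote(labels, masks):
    """Replace the labels inside each mask with that mask's most frequent label.

    labels: (H, W) array of non-negative integer class ids (noisy per-pixel labels).
    masks:  iterable of (H, W) boolean arrays, one per region proposal.
    """
    refined = labels.copy()
    for mask in masks:
        region = labels[mask]          # labels are always read from the noisy input
        if region.size == 0:
            continue                   # skip empty proposals
        refined[mask] = np.bincount(region).argmax()
    return refined

# Toy example: two regions (left half, right half) with a few noisy pixels.
noisy = np.array([[0, 0, 1, 1],
                  [0, 2, 1, 1],
                  [0, 0, 1, 0]])
left = np.zeros_like(noisy, dtype=bool)
left[:, :2] = True
right = ~left

refined = region_majority_vote(noisy, [left, right])
print(refined)
```

In this toy case the stray labels inside each region are overwritten by the dominant label of that region, which is the intuition behind using region proposals to clean up noisy per-pixel semantics.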

Qualitative Result

Replica

ScanNet

3DOVS

Quantitative Result

Replica

ScanNet

3DOVS

Data Preparation

We provide the preprocessed datasets. You can download them through the following links: Google Drive | BaiduWangpan

Installation

Tested on Ubuntu 18.04 + PyTorch 1.12.1+cu116

By default, run the following commands to install the required packages:

conda create -n ovnerf python=3.9
conda activate ovnerf
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install ftfy regex tqdm scikit-image opencv-python configargparse lpips imageio-ffmpeg kornia tensorboard
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/facebookresearch/segment-anything.git
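After installation, a quick check like the one below can confirm that the packages are visible in the active environment. This is an optional sketch, not part of the repository; the helper name `find_missing` is hypothetical, and the list maps each pip package above to its import name (for example, opencv-python is imported as `cv2`). It only locates the modules and does not import them, so it does not verify CUDA support.

```python
import importlib.util

# Import names for the packages installed above (some differ from the pip name).
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio", "ftfy", "regex", "tqdm",
    "skimage",           # scikit-image
    "cv2",               # opencv-python
    "configargparse", "lpips",
    "imageio_ffmpeg",    # imageio-ffmpeg
    "kornia", "tensorboard",
    "clip",              # openai/CLIP
    "segment_anything",  # facebookresearch/segment-anything
]

def find_missing(modules):
    """Return the module names that cannot be located in the current environment."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing

if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    print("All packages found." if not missing else f"Missing: {', '.join(missing)}")
```

Run it inside the `ovnerf` conda environment; any name it reports as missing points to the pip command that needs to be rerun.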

Training

1. Train original TensoRF

This step reconstructs the TensoRF for each scene. Modify datadir and expname in configs/resonstruction/$scene_name.txt to specify the dataset path and the experiment name. By default, datadir is set to data/$scene_name and expname to $scene_name. You can then train the original TensoRF by:

bash scripts/reconstruction.sh [GPU_ID]

The reconstructed TensoRF will be saved in log/$dataset/$scene_name.
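If you need to change datadir and expname for many scenes, a small helper can rewrite them instead of editing each file by hand. The sketch below assumes the config files use plain `key = value` lines (the syntax configargparse reads); that layout, the helper name `set_config_values`, and the sample keys and values are assumptions for illustration, so check them against the actual files under configs before use.

```python
import re

def set_config_values(text, updates):
    """Rewrite `key = value` lines for the given keys; other lines stay untouched.

    Keys that do not already appear in the text are not added.
    """
    out = []
    for line in text.splitlines():
        m = re.match(r"^(\s*)(\w+)(\s*=\s*)", line)
        if m and m.group(2) in updates:
            # Keep the original indentation and spacing around '='.
            line = f"{m.group(1)}{m.group(2)}{m.group(3)}{updates[m.group(2)]}"
        out.append(line)
    return "\n".join(out)

# Hypothetical config content; real files may contain other keys.
example = "dataset_name = replica\ndatadir = data/room0\nexpname = room0\n"
updated = set_config_values(example, {"datadir": "/mnt/data/room0", "expname": "room0_v2"})
print(updated)
```

To apply it to a file, read the file's text, pass it through `set_config_values`, and write the result back.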

2. Train segmentation

We provide the training configs for our datasets under configs, named $scene_name.txt. You can train the segmentation by:

bash scripts/segmentation.sh [CONFIG_FILE] [GPU_ID]

The trained model will be saved in log_seg/$dataset/$scene_name.

3. Evaluate reconstruction

bash scripts/test_reconstruction.sh

4. Evaluate segmentation

bash scripts/test_segmentation.sh
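The segmentation results in the abstract are reported in mIoU. For reference, here is a minimal sketch of how mean intersection-over-union is commonly computed from flattened label lists. It is not the repository's evaluation code, which may treat ignored labels or absent classes differently; the function name and the toy inputs are hypothetical.

```python
def mean_iou(pred, target, num_classes):
    """Mean of per-class IoU, skipping classes absent from both pred and target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, 3)
print(score)
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5.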

Pre-trained Models & Outputs

We provide the pre-trained models and outputs of our method. You can download them through the following link: BaiduWangpan

TODO list

release the code of the training
release the code of the evaluation
update the arxiv link
release the preprocessed dataset
release the pretrained model
release the code of preprocessing

Acknowledgements

Some code is borrowed from TensoRF, SAM, and 3DOVS. We thank all the authors for their great work.

Citation

Cite below if you find this repository helpful to your project:

@article{liao2024ov,
  title={OV-NeRF: Open-vocabulary neural radiance fields with vision and language foundation models for 3D semantic understanding},
  author={Liao, Guibiao and Zhou, Kaichen and Bao, Zhenyu and Liu, Kanglin and Li, Qing},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  volume={34},
  pages={12923--12936},
  year={2024},
  publisher={IEEE}
}
