Research project for the Deep Learning for Computer Vision (20060) MSc course at Bocconi University.
Built on O-MaMa with support for multiple backbone architectures (DINOv2, DINOv3, ResNet-50) and precomputed feature extraction for accelerated training.
Read the official report.
## Project Structure

```
exo-ego-correspondence/
├── config/                  # Environment configuration
│   └── requirements.txt     # Python dependencies
├── data/                    # Dataset storage (gitignored)
│   ├── raw/                 # Raw EgoExo4D videos
│   ├── root/                # Processed data for O-MaMa
│   ├── casa_gio/            # Custom hand-made dataset
│   └── annotations/         # Relation annotations
├── docs/                    # Project documentation
│   ├── BOTTLENECK_ANALYSIS.md
│   ├── DATA_PIPELINE_GUIDE.md
│   ├── RELATION_DATA_GUIDE.md
│   ├── report_towards_exo-ego_correspondence.pdf
│   └── presentation_towards_exo-ego_correspondence.pdf
├── notebooks/               # Jupyter notebooks
├── results/                 # Experiment outputs
│   ├── training_run_*/      # Training logs & checkpoints
│   ├── evaluation_*_run_*/  # Evaluation metrics
│   └── timing_profile_*/    # Performance benchmarks
└── src/                     # Source code
    ├── O-MaMa/              # Core model implementation
    ├── scripts/             # Data processing & utilities
    ├── fastsam_extraction/  # FastSAM mask extraction
    └── dinov3-main/         # DINOv3 backbone setup
```
## Installation

```bash
# Clone the repository
git clone https://github.com/your-username/ego-exo-correspondence.git
cd ego-exo-correspondence

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r config/requirements.txt
```

## Data Preparation

```bash
# Download and process EgoExo4D data
cd src/scripts
python download_and_process_data.py --scenario health

# Generate ego-exo pairs
python create_pairs.py --data_dir ../../data/root --scenario health

# Extract FastSAM masks
cd ../fastsam_extraction
python extract_masks_FastSAM.py
```

## Feature Precomputation

Precomputing features gives roughly 10x faster training by caching backbone outputs:
```bash
cd src/scripts

# DINOv3 (default, 384-dim features)
python precompute_features_dinov3.py --root ../../root

# DINOv2 (768-dim features)
python precompute_features_dinov2.py --root ../../root

# ResNet-50 (2048-dim features)
python precompute_features_resnet50.py --root ../../root
```
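These scripts cache each frame's backbone features on disk so later training epochs can skip the forward pass entirely. A minimal sketch of that caching pattern (the function name, the toy backbone, and the `.npy` layout here are illustrative, not the repo's actual code):

```python
import hashlib
import os
import tempfile
import numpy as np

def cached_features(frame: np.ndarray, backbone, cache_dir: str) -> np.ndarray:
    """Run the (frozen) backbone at most once per unique frame.

    `backbone` is any callable mapping a frame to a feature array; the real
    scripts use DINOv2/DINOv3/ResNet-50, but the caching idea is the same:
    pay the forward pass once, then reload from disk.
    """
    key = hashlib.sha1(frame.tobytes()).hexdigest()
    path = os.path.join(cache_dir, f"{key}.npy")
    if os.path.exists(path):      # cache hit: skip the forward pass
        return np.load(path)
    feats = backbone(frame)       # cache miss: compute once
    np.save(path, feats)          # persist for later epochs
    return feats

# Demo with a stand-in "backbone" that just counts how often it runs.
calls = []
def toy_backbone(frame):
    calls.append(1)
    return frame.mean(axis=(0, 1))  # (H, W, C) -> (C,) "features"

with tempfile.TemporaryDirectory() as cache_dir:
    frame = np.ones((8, 8, 3), dtype=np.float32)
    first = cached_features(frame, toy_backbone, cache_dir)
    second = cached_features(frame, toy_backbone, cache_dir)  # served from cache
```

The second call never touches the backbone, which is where the speedup comes from when the "backbone" is a large ViT rather than a mean over pixels.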
## Training

```bash
cd src/O-MaMa

# Train with DINOv3 features
python main_precomputed.py \
    --root ../../root \
    --reverse \
    --patch_size 16

# Train with DINOv2 features
python main_precomputed.py \
    --root ../../root \
    --reverse \
    --patch_size 14 \
    --dino_feat_dim 768

# Train with ResNet-50 features
python main_precomputed.py \
    --root ../../root \
    --reverse \
    --dino_feat_dim 2048
```
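Note that `--patch_size` should match the ViT patch size of the backbone whose features were precomputed (16 for DINOv3, 14 for DINOv2), since it determines the resolution of the token grid, and `--dino_feat_dim` should match the cached feature width. A quick sanity check of the grid arithmetic (the image sizes below are illustrative defaults, not flags from this repo):

```python
def token_grid(image_size: int, patch_size: int) -> int:
    """Side length of the ViT token grid for a square input image."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be divisible by the patch size")
    return image_size // patch_size

# DINOv3-style backbone: 16px patches, so a 224px image yields a 14x14 grid
dinov3_grid = token_grid(224, 16)
# DINOv2-style backbone: 14px patches, so a 518px image yields a 37x37 grid
dinov2_grid = token_grid(518, 14)
```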
## Evaluation

```bash
cd src/O-MaMa

# Evaluate trained model
python main_eval_precomputed.py \
    --root ../../root \
    --reverse \
    --patch_size 16 \
    --checkpoint_dir train_output/run_XXX/model_weights/best_IoU_run_XXX.pt

# Evaluate baseline (no fine-tuning)
python main_eval_precomputed.py \
    --root ../../root \
    --reverse \
    --patch_size 16
```

## Results

Results are organized by experiment type:
| Directory | Contents |
|---|---|
| `training_run_*` | Training logs, loss curves, model checkpoints |
| `evaluation_baseline_run_*` | Baseline (pretrained) model metrics |
| `evaluation_finetuned_run_*` | Fine-tuned model metrics |
| `timing_profile_*` | Performance benchmarks |
| `casa_gio_*` | Custom dataset evaluation |

Each evaluation produces:

- `results_metrics_run_*.json` — per-sample IoU scores and predictions
- `evaluation_run_*.log` — aggregate metrics (mean IoU, accuracy)
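The per-sample scores can be aggregated with a few lines of Python; the JSON schema assumed here (a list of records, each with an `iou` field) is a guess for illustration, not the file's documented format:

```python
import json
import tempfile

def mean_iou_from_metrics(path: str) -> float:
    """Average per-sample IoU scores from a results JSON file.

    Assumes the file holds a list of records each carrying an "iou" field;
    the real results_metrics_run_*.json layout may differ.
    """
    with open(path) as f:
        records = json.load(f)
    ious = [r["iou"] for r in records]
    return sum(ious) / len(ious) if ious else 0.0

# Demo with a fabricated two-sample metrics file.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump([{"iou": 0.5}, {"iou": 0.7}], f)
    metrics_path = f.name

avg = mean_iou_from_metrics(metrics_path)
```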
## Key Components

| Component | Description |
|---|---|
| O-MaMa | Object Matching with Masked Attention model for correspondence |
| FastSAM | Fast Segment Anything for proposal mask extraction |
| DINOv2/v3 | Self-supervised vision transformers for feature extraction |
| ResNet-50 | CNN backbone alternative (DINO pretrained) |
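The IoU metric reported by the evaluation scripts compares predicted masks against ground truth; for reference, the standard definition on boolean mask arrays (a generic implementation, not code from this repo):

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two boolean segmentation masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(pred, gt).sum() / union)

# Two 4x4 masks: 4-pixel and 3-pixel regions overlapping on 2 pixels,
# so IoU = 2 / (4 + 3 - 2) = 0.4.
a = np.zeros((4, 4), dtype=bool)
a[0, :] = True
b = np.zeros((4, 4), dtype=bool)
b[0, 2:] = True
b[1, 0] = True
```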
## Documentation

- `docs/DATA_PIPELINE_GUIDE.md` — end-to-end data preparation
- `docs/RELATION_DATA_GUIDE.md` — EgoExo4D annotation format
- `docs/BOTTLENECK_ANALYSIS.md` — performance optimization notes
## License

This project is for academic purposes. See LICENSE for details.
