Research project for the Deep Learning for Computer Vision (20060) MSc course at Bocconi University.
Built on O-MaMa with support for multiple backbone architectures (DINOv2, DINOv3, ResNet-50) and precomputed feature extraction for accelerated training.
Read the official report.
## Project Structure

```
exo-ego-correspondence/
├── config/                  # Environment configuration
│   └── requirements.txt     # Python dependencies
├── data/                    # Dataset storage (gitignored)
│   ├── raw/                 # Raw EgoExo4D videos
│   ├── root/                # Processed data for O-MaMa
│   ├── casa_gio/            # Custom hand-made dataset
│   └── annotations/         # Relation annotations
├── docs/                    # Project documentation
│   ├── BOTTLENECK_ANALYSIS.md
│   ├── DATA_PIPELINE_GUIDE.md
│   ├── RELATION_DATA_GUIDE.md
│   ├── report_towards_exo-ego_correspondence.pdf
│   └── presentation_towards_exo-ego_correspondence.pdf
├── notebooks/               # Jupyter notebooks
├── results/                 # Experiment outputs
│   ├── training_run_*/      # Training logs & checkpoints
│   ├── evaluation_*_run_*/  # Evaluation metrics
│   └── timing_profile_*/    # Performance benchmarks
└── src/                     # Source code
    ├── O-MaMa/              # Core model implementation
    ├── scripts/             # Data processing & utilities
    ├── fastsam_extraction/  # FastSAM mask extraction
    └── dinov3-main/         # DINOv3 backbone setup
```
## Installation

```bash
# Clone the repository
git clone https://github.com/your-username/ego-exo-correspondence.git
cd ego-exo-correspondence

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r config/requirements.txt
```

## Data Preparation

```bash
# Download and process EgoExo4D data
cd src/scripts
python download_and_process_data.py --scenario health

# Generate ego-exo pairs
python create_pairs.py --data_dir ../../data/root --scenario health

# Extract FastSAM masks
cd ../fastsam_extraction
python extract_masks_FastSAM.py
```

## Feature Precomputation

Precomputing features gives roughly 10x faster training by caching backbone outputs:
```bash
cd src/scripts

# DINOv3 (default, 384-dim features)
python precompute_features_dinov3.py --root ../../root

# DINOv2 (768-dim features)
python precompute_features_dinov2.py --root ../../root

# ResNet-50 (2048-dim features)
python precompute_features_resnet50.py --root ../../root
```
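These scripts cache each frame's backbone features on disk so later training epochs can skip the forward pass entirely. A minimal sketch of that caching pattern (the function name, the toy backbone, and the `.npy` layout here are illustrative, not the repo's actual code):

```python
import hashlib
import os
import tempfile
import numpy as np

def cached_features(frame: np.ndarray, backbone, cache_dir: str) -> np.ndarray:
    """Run the (frozen) backbone at most once per unique frame.

    `backbone` is any callable mapping a frame to a feature array; the real
    scripts use DINOv2/DINOv3/ResNet-50, but the caching idea is the same:
    pay the forward pass once, then reload from disk.
    """
    key = hashlib.sha1(frame.tobytes()).hexdigest()
    path = os.path.join(cache_dir, f"{key}.npy")
    if os.path.exists(path):      # cache hit: skip the forward pass
        return np.load(path)
    feats = backbone(frame)       # cache miss: compute once
    np.save(path, feats)          # persist for later epochs
    return feats

# Demo with a stand-in "backbone" that just counts how often it runs.
calls = []
def toy_backbone(frame):
    calls.append(1)
    return frame.mean(axis=(0, 1))  # (H, W, C) -> (C,) "features"

with tempfile.TemporaryDirectory() as cache_dir:
    frame = np.ones((8, 8, 3), dtype=np.float32)
    first = cached_features(frame, toy_backbone, cache_dir)
    second = cached_features(frame, toy_backbone, cache_dir)  # served from cache
```

The second call never touches the backbone, which is where the speedup comes from when the "backbone" is a large ViT rather than a mean over pixels.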
## Training

```bash
cd src/O-MaMa

# Train with DINOv3 features
python main_precomputed.py \
    --root ../../root \
    --reverse \
    --patch_size 16

# Train with DINOv2 features
python main_precomputed.py \
    --root ../../root \
    --reverse \
    --patch_size 14 \
    --dino_feat_dim 768

# Train with ResNet-50 features
python main_precomputed.py \
    --root ../../root \
    --reverse \
    --dino_feat_dim 2048
```
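Note that `--patch_size` should match the ViT patch size of the backbone whose features were precomputed (16 for DINOv3, 14 for DINOv2), since it determines the resolution of the token grid, and `--dino_feat_dim` should match the cached feature width. A quick sanity check of the grid arithmetic (the image sizes below are illustrative defaults, not flags from this repo):

```python
def token_grid(image_size: int, patch_size: int) -> int:
    """Side length of the ViT token grid for a square input image."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be divisible by the patch size")
    return image_size // patch_size

# DINOv3-style backbone: 16px patches, so a 224px image yields a 14x14 grid
dinov3_grid = token_grid(224, 16)
# DINOv2-style backbone: 14px patches, so a 518px image yields a 37x37 grid
dinov2_grid = token_grid(518, 14)
```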
## Evaluation

```bash
cd src/O-MaMa

# Evaluate trained model
python main_eval_precomputed.py \
    --root ../../root \
    --reverse \
    --patch_size 16 \
    --checkpoint_dir train_output/run_XXX/model_weights/best_IoU_run_XXX.pt

# Evaluate baseline (no fine-tuning)
python main_eval_precomputed.py \
    --root ../../root \
    --reverse \
    --patch_size 16
```

## Results

Results are organized by experiment type:
| Directory | Contents |
|---|---|
| `training_run_*` | Training logs, loss curves, model checkpoints |
| `evaluation_baseline_run_*` | Baseline (pretrained) model metrics |
| `evaluation_finetuned_run_*` | Fine-tuned model metrics |
| `timing_profile_*` | Performance benchmarks |
| `casa_gio_*` | Custom dataset evaluation |

Each evaluation produces:

- `results_metrics_run_*.json` — per-sample IoU scores and predictions
- `evaluation_run_*.log` — aggregate metrics (mean IoU, accuracy)
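The per-sample scores can be aggregated with a few lines of Python; the JSON schema assumed here (a list of records, each with an `iou` field) is a guess for illustration, not the file's documented format:

```python
import json
import tempfile

def mean_iou_from_metrics(path: str) -> float:
    """Average per-sample IoU scores from a results JSON file.

    Assumes the file holds a list of records each carrying an "iou" field;
    the real results_metrics_run_*.json layout may differ.
    """
    with open(path) as f:
        records = json.load(f)
    ious = [r["iou"] for r in records]
    return sum(ious) / len(ious) if ious else 0.0

# Demo with a fabricated two-sample metrics file.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump([{"iou": 0.5}, {"iou": 0.7}], f)
    metrics_path = f.name

avg = mean_iou_from_metrics(metrics_path)
```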
## Key Components

| Component | Description |
|---|---|
| O-MaMa | Object Matching with Masked Attention model for correspondence |
| FastSAM | Fast Segment Anything for proposal mask extraction |
| DINOv2/v3 | Self-supervised vision transformers for feature extraction |
| ResNet-50 | CNN backbone alternative (DINO pretrained) |
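The IoU metric reported by the evaluation scripts compares predicted masks against ground truth; for reference, the standard definition on boolean mask arrays (a generic implementation, not code from this repo):

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two boolean segmentation masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(pred, gt).sum() / union)

# Two 4x4 masks: 4-pixel and 3-pixel regions overlapping on 2 pixels,
# so IoU = 2 / (4 + 3 - 2) = 0.4.
a = np.zeros((4, 4), dtype=bool)
a[0, :] = True
b = np.zeros((4, 4), dtype=bool)
b[0, 2:] = True
b[1, 0] = True
```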
## Documentation

- `docs/DATA_PIPELINE_GUIDE.md` — end-to-end data preparation
- `docs/RELATION_DATA_GUIDE.md` — EgoExo4D annotation format
- `docs/BOTTLENECK_ANALYSIS.md` — performance optimization notes
## License

This project is for academic purposes. See LICENSE for details.
