OpenEEGBench should evaluate EEG foundation models on private held-out data. The current risk is leakage: many submitted or pretrained models may already have seen the public datasets used for testing. The goal here is to identify new testing data that is not released to participants during model development.
- Mirror the task categories used by the OpenEEGBench leaderboard.
- Use private held-out datasets where labels and raw recordings are not available to model developers.
- Run evaluation as code submission only: participants submit models, and the organizer runs them on hidden data.
- Release only aggregate scores until the benchmark is frozen.
- Add more private datasets to cover categories that are not yet represented.
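The code-submission rule above can be sketched as a small organizer-side harness. This is a minimal illustration, not the OpenEEGBench implementation: the `Predictor` interface, the `evaluate_submission` function, and the toy data are all hypothetical names chosen for this example. The point is that the participant's code only ever receives trials, and only aggregate numbers come back out.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a submission is any callable that maps one
# EEG trial (here a flat list of samples) to a predicted class label.
Predictor = Callable[[Sequence[float]], int]


def evaluate_submission(
    predict: Predictor,
    hidden_trials: List[Sequence[float]],
    hidden_labels: List[int],
) -> Dict[str, float]:
    """Run a submitted model on hidden data and return aggregate scores only."""
    if not hidden_trials or len(hidden_trials) != len(hidden_labels):
        raise ValueError("hidden trials and labels must be non-empty and aligned")
    correct = sum(
        1
        for trial, label in zip(hidden_trials, hidden_labels)
        if predict(trial) == label
    )
    # Per-trial predictions and labels never leave this function.
    return {
        "n_trials": float(len(hidden_labels)),
        "accuracy": correct / len(hidden_labels),
    }


# Toy stand-ins for the organizer's hidden data and a participant's model.
hidden_trials = [[0.1, 0.2], [0.9, 1.1], [0.0, -0.2], [1.5, 0.7]]
hidden_labels = [0, 1, 0, 0]


def toy_model(trial: Sequence[float]) -> int:
    # Predict class 1 when the mean amplitude exceeds 0.5.
    return int(sum(trial) / len(trial) > 0.5)


report = evaluate_submission(toy_model, hidden_trials, hidden_labels)
print(report)
```

In a real deployment the submitted model would run in an isolated environment without network access, so that hidden recordings cannot be copied out. That isolation is outside the scope of this sketch.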
OpenEEGBench Categories to Cover
- Motor Imagery
- Sleep Staging
- Pathology Detection
- Seizure Detection
- Emotion Recognition
- Speech Imagery
- Event Classification
- Cognitive Load
| Category | Private dataset candidate | Target task | Notes |
|---|---|---|---|
| Sleep Staging | Onton 36-channel sleep validation data | Sleep stage classification | Use as private test data. The related public dataset is OpenNeuro ds006695 / NEMAR, "Validation of Sleep Staging with Forehead EEG Patch," which contains curated EEG recordings for validating sleep staging from a three-electrode forehead patch. The planned private data are a different 36-channel dataset from the same protocol, not the public three-electrode patch release. They include manual 30-second epoch labels in `EEG.VisualHypnogram`: Wake, REM, N1, N2, N3, and unknown/movement. Models could be trained to classify sleep stages with about 85% cross-validation accuracy. |
| Cognitive Load / Behavioral Prediction | HBN reaction-time prediction data | Predict Contrast Change Detection (CCD) response time | Source: papers/2506.19141v2.pdf. The HBN-EEG challenge uses 128-channel EEG from over 3,000 subjects across six tasks, and the dataset was validated for use in the 2025 NeurIPS competition. For the transfer challenge, models predict CCD response time from Surround Suppression and pre-trial EEG. The intended private test set is the withheld HBN Release 12, evaluated by normalized RMSE. |
| Motor Imagery / BCI Commands | Planned 2026 EEG/EMG BCI competition | Three-class BCI command decoding | Source: papers/NeurIPS_2026___EEG_EMG_competition.pdf. Track 2 uses cued mental tasks: kinesthetic motor imagery, mental calculation, and word/letter association. The new BCI command dataset contains 20 subjects recorded across six sessions separated by multiple days, balanced over Graz and BrainHero runs. Hidden sessions are used for within-subject cross-session testing, with balanced accuracy as the headline metric. |
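The two headline metrics named in the table can be sketched in plain Python. Both definitions here are assumptions to check against the competition papers: normalized RMSE is taken as RMSE divided by the standard deviation of the true values, and balanced accuracy as the mean of per-class recall. The function names and the toy values are illustrative only.

```python
import math
from collections import defaultdict
from typing import Hashable, Sequence


def normalized_rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """RMSE divided by the standard deviation of y_true (assumed normalization)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    mean_true = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    variance = sum((t - mean_true) ** 2 for t in y_true) / n
    if variance == 0:
        raise ValueError("y_true is constant, so the normalization is undefined")
    return math.sqrt(mse) / math.sqrt(variance)


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recall, so each class counts equally."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy response times in seconds: always predicting the mean scores 1.0.
rt_true = [0.4, 0.6, 0.8, 1.0]
rt_mean = sum(rt_true) / len(rt_true)
nrmse_baseline = normalized_rmse(rt_true, [rt_mean] * len(rt_true))

# Toy three-class labels: class 2 is never predicted correctly.
cmd_true = [0, 0, 0, 0, 1, 2]
cmd_pred = [0, 0, 0, 0, 1, 1]
bal_acc = balanced_accuracy(cmd_true, cmd_pred)

print(nrmse_baseline, bal_acc)
```

Under this normalization, a model that always predicts the mean response time scores 1, so values below 1 indicate that the model explains some variance. In the three-class toy example, plain accuracy would be 5/6, while balanced accuracy is 2/3 because the missed class counts as much as the frequent one.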
More private data are still needed. Current candidates do not adequately cover:
- Speech Imagery
- Event Classification
- Pathology Detection
- Seizure Detection
- Emotion Recognition
- Additional private Motor Imagery subjects and sessions beyond the BCI command data
- Additional private Sleep Staging data if the Onton 36-channel set is too small
References
- OpenEEGBench Space: https://huggingface.co/spaces/braindecode/OpenEEGBench#/
- OpenEEGBench benchmark paper: Toward OpenEEG-Bench: A Live Community-Driven Benchmark for EEG Foundation Models
- Related Onton/NEMAR dataset from the same protocol, not the planned private 36-channel dataset: https://nemar.org/dataexplorer/detail?dataset_id=ds006695
- HBN reaction-time challenge paper: papers/2506.19141v2.pdf
- NeurIPS 2026 EEG/EMG BCI command competition paper: papers/NeurIPS_2026___EEG_EMG_competition.pdf