BASELINE
Neural signals only become meaningful when interpreted relative to themselves.
Why EEG fails in the wild
Most EEG research optimizes for benchmark accuracy on controlled lab datasets. Deployment reveals the gap between academic performance and real-world robustness.
"Subject-specific covariance normalization can partially stabilize feature distributions across sessions without requiring retraining of downstream models."
Inter-subject Variability
Resting alpha power can differ by 3× across individuals. Population-level models trained on pooled data are structurally biased against everyone they claim to represent.
Longitudinal Drift
EEG features shift gradually across sessions due to electrode impedance changes, circadian effects, and cognitive load history. Static calibrations become unreliable within hours.
Wearable Constraints
Consumer EEG devices offer 2–4 channels, noisy signals, and dry, gel-free electrodes. Preprocessing pipelines designed for 64-channel clinical lab setups fail in deployment.
Limited Calibration
Real-world users will not sit through 30-minute calibration sessions. Systems that need large labeled datasets per user are commercially non-viable.
Studying the stability–utility gap
The central question is not whether alignment works, since it measurably does, but whether improved stability translates to improved decoding performance. This evaluation framework tests three alignment strategies under identical wearable-constrained conditions and measures both drift reduction and classification accuracy independently.
From a calibration session, estimate the subject-specific mean vector μ and regularized covariance matrix Σ, capturing the individual's neural feature distribution at a reference point in time.
New sessions are projected into a covariance-normalized feature space. Mahalanobis deviation from the calibration baseline quantifies how far the current session has drifted from the subject's reference distribution.
Slow drift is tracked via exponential moving-average recalibration. The estimated baseline gradually follows the subject's evolving distribution, at the cost of potentially over-adapting to transient states.
Aligned features replace raw features as input to a downstream classifier. The research question: does reducing distributional drift improve decoding performance, or are the two quantities dissociable?
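The calibration, projection, and deviation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluated implementation: the shrinkage-to-identity regularizer, its default weight, the function names, and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np

def fit_baseline(X_calib, shrinkage=0.1):
    """Estimate the subject mean and a shrinkage-regularized covariance.

    X_calib: (n_trials, n_features) calibration features, n_features >= 2.
    shrinkage: hypothetical regularization weight in [0, 1].
    """
    mu = X_calib.mean(axis=0)
    S = np.cov(X_calib, rowvar=False)
    # Shrink toward a scaled identity so Sigma stays well-conditioned.
    target = np.eye(S.shape[0]) * np.trace(S) / S.shape[0]
    sigma = (1.0 - shrinkage) * S + shrinkage * target
    return mu, sigma

def whitening_matrix(sigma):
    """Return W = Sigma^(-1/2) via eigendecomposition (symmetric PD input)."""
    evals, evecs = np.linalg.eigh(sigma)
    return evecs @ np.diag(evals ** -0.5) @ evecs.T

def align(X, mu, W):
    """Project features into the covariance-normalized space."""
    return (X - mu) @ W  # W is symmetric, so no transpose is needed

def mahalanobis_deviation(X_session, mu, W):
    """Mahalanobis distance of the session mean from the calibration baseline."""
    z = (X_session.mean(axis=0) - mu) @ W
    return float(np.linalg.norm(z))

# Synthetic stand-in for 4-channel features (not real EEG).
rng = np.random.default_rng(0)
calib = rng.normal(size=(200, 4)) @ np.diag([1.0, 2.0, 0.5, 3.0])
later = calib + np.array([0.5, -1.0, 0.2, 1.5])  # artificially shifted session

mu, sigma = fit_baseline(calib)
W = whitening_matrix(sigma)
print(mahalanobis_deviation(calib, mu, W))  # baseline session: essentially zero
print(mahalanobis_deviation(later, mu, W))  # shifted session: clearly positive
```

The aligned features returned by align would then be passed to the downstream classifier in place of the raw features.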
"Does subject-specific covariance normalization stabilize EEG feature distributions across sessions, and if so, does that stability translate to improved downstream decoding performance under wearable deployment constraints?"
The answer found here: yes to the first, no to the second. The dissociation between these two outcomes is the principal finding and the motivation for further investigation.
What the data shows
The evaluation uses real EEG data from BNCI Horizon 2020 (dataset 001-2014), subject A01. Models were trained on session A01T and evaluated on the later session A01E. A 4-channel wearable-constrained subset was used to simulate realistic deployment conditions.
"Covariance normalization substantially reduced longitudinal feature instability, though reduced drift did not necessarily improve downstream decoding accuracy."
Feature shift is measured as the L₂ distance between session-mean feature vectors (train A01T vs test A01E). Covariance whitening reduced longitudinal instability by 94.6%, from 4.06 to 0.22. Moving-average adaptation achieved a smaller reduction (86.5%). This is the clearest finding in the evaluation.
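As a check on the arithmetic, the shift metric and the reported percentage can be reproduced in a few lines. The function names are illustrative only; the two shift values are the ones reported above.

```python
import math

def l2_shift(mean_a, mean_b):
    """Euclidean (L2) distance between two session-mean feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mean_a, mean_b)))

def percent_reduction(before, after):
    """Relative reduction in shift, expressed as a percentage."""
    return 100.0 * (before - after) / before

raw_shift, whitened_shift = 4.06, 0.22  # values reported in the text
reduction = percent_reduction(raw_shift, whitened_shift)
print(f"{reduction:.1f}%")  # 94.6%
```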
Raw feature shift of 4.06 confirms longitudinal instability under wearable constraints. This is the core problem Baseline is designed to address.
Covariance normalization reduced shift from 4.06 to 0.22, a 94.6% reduction, demonstrating effective statistical alignment.
Feature stabilization did not translate to decoding gains. One reading of this dissociation is that the downstream model was already insensitive to the variance that alignment removes, so removing it changed little for the classifier.
Four views into one problem
Each module is a different lens on the same phenomenon: longitudinal distribution shift in EEG systems. All simulations are deterministic and run in the browser, with no backend required. Seed-controlled for reproducibility.
Drift Geometry Lab
Visualize session-to-session feature space drift and the limits of alignment
Alignment reduces centroid drift. It does not improve class separability.
Across all four views, the same structural finding emerges: EEG feature distributions shift across time in ways that are measurable and partially correctable. Correction and improvement are not the same thing.
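One mechanism by which correction and improvement can come apart: whitening is an invertible affine map, and the two-class Fisher criterion is invariant under such maps, so centering and rescaling a session can remove centroid drift without changing how separable the classes are within it. The sketch below shows this on synthetic data. Per-session whitening, the synthetic shift, and all names are assumptions for illustration. It is not the evaluated pipeline and does not by itself explain the reported result.

```python
import numpy as np

def fisher_separability(X, y):
    """Two-class Fisher criterion d^T Sw^-1 d, with Sw the summed within-class covariance."""
    X0, X1 = X[y == 0], X[y == 1]
    d = X1.mean(axis=0) - X0.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    return float(d @ np.linalg.solve(Sw, d))

def whiten(X):
    """Session-wise whitening: center, then multiply by Sigma^(-1/2)."""
    mu = X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return (X - mu) @ W

rng = np.random.default_rng(42)
y = np.repeat([0, 1], 100)
class_offset = np.outer(y, [0.8, 0.0, 0.0, 0.0])
train = rng.normal(size=(200, 4)) + class_offset
# Synthetic "later session": same class structure, rescaled and shifted.
later = (rng.normal(size=(200, 4)) + class_offset) * [1.0, 2.0, 0.5, 1.5] + [2.0, -1.0, 0.5, 3.0]

drift_raw = np.linalg.norm(train.mean(axis=0) - later.mean(axis=0))
drift_aligned = np.linalg.norm(whiten(train).mean(axis=0) - whiten(later).mean(axis=0))
sep_raw = fisher_separability(later, y)
sep_aligned = fisher_separability(whiten(later), y)

print(drift_raw, drift_aligned)  # centroid drift collapses after whitening
print(sep_raw, sep_aligned)      # separability is unchanged up to rounding
```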

Building infrastructure for minds at the margin of measurement.
I'm interested in the gap between laboratory neuroscience and real-world neurotechnology, specifically why EEG systems that perform well in controlled environments fail to remain stable across time, users, and deployment conditions.
My current work centers on BASELINE, a research investigation into longitudinal distribution shift in wearable EEG systems. The core question: can subject-specific statistical alignment stabilize feature distributions across sessions, and if it can, does that stability actually improve decoding performance? The answer, as the data shows, is that these two things are dissociable.
My interests sit at the intersection of EEG signal processing, statistical learning, and computational neuroscience infrastructure, with a focus on the systems-level constraints that govern real-world biosignal deployment.
Themes that cut across the technical work. The questions that make the engineering decisions feel necessary.
Personal Baseline Deviation vs. Population Classification for Wearable EEG Stress Detection: A Pilot Study
Comparing unsupervised subject-specific personalisation against supervised cross-subject gradient boosting under two-electrode temporal constraints · SAM40 dataset · n = 40
This study tests whether an unsupervised personal baseline deviation model outperforms a supervised population classifier for EEG stress detection under wearable-constrained conditions, using two temporal electrodes (T7/T8) to simulate consumer behind-ear devices. On the SAM40 dataset (40 subjects, 32-channel EEG, 128 Hz), the personal baseline model significantly outperformed the population classifier in both the wearable condition (accuracy: 0.611 vs 0.538, p=0.025, r=0.355) and the full 32-channel condition (accuracy: 0.693 vs 0.619, p=0.044, r=0.318). SHAP analysis identified temporal alpha differential entropy as the dominant stress biomarker, 2.4× more important than any other band. Alpha suppression occurred in 27/40 subjects; the remaining 13/40 showed enhancement and represent a subpopulation for whom directionally rigid models fail. These results constitute an upper bound on real wearable performance, because preprocessing used all 32 channels before electrode extraction.
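The abstract does not state which test produced the effect sizes. As a consistency check only, the sketch below assumes the common convention r = Z/√N for a paired rank-based test with N = 40 subjects and a two-sided normal approximation for p. Under that assumption, the reported r values imply p-values close to the reported ones. The function names are hypothetical.

```python
import math

def z_from_effect_size(r, n):
    """Invert the assumed convention r = Z / sqrt(N) to recover Z."""
    return r * math.sqrt(n)

def two_sided_p(z):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))

n_subjects = 40
p_wearable = two_sided_p(z_from_effect_size(0.355, n_subjects))
p_full = two_sided_p(z_from_effect_size(0.318, n_subjects))
print(f"{p_wearable:.3f} {p_full:.3f}")  # compare with the reported 0.025 and 0.044
```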
Psychological stress is a growing public health concern, and its objective measurement through physiological signals has attracted significant research attention. Electroencephalography (EEG) is an effective tool for identifying stress because it detects the cognitive aspects of stress before peripheral reactions such as changes in heart rate or skin conductance emerge [1]. Although EEG wearables have made ambulatory brain monitoring increasingly accessible, two fundamental problems prevent reliable deployment.
The first is inter-subject variability: resting alpha power, peak frequency, and spectral distributions vary significantly among individuals [1, 2]. Population classifiers trained on patterns averaged across many subjects learn a mean that may not represent any individual accurately, causing systematic misclassification in subjects whose baseline deviates from the group mean. The second is the electrode constraint: consumer devices such as the Emotiv MN8 use only two electrodes, placed behind the ear at temporal positions T7 and T8. Detection algorithms are typically developed for 32-channel laboratory systems, and existing results do not indicate how far performance deteriorates under this constraint.
This study addresses both problems simultaneously. We compare an unsupervised personal baseline deviation model against a supervised gradient-boosting population classifier under T7/T8-restricted and full 32-channel conditions. Previous work has benchmarked cross-subject classifiers on public datasets including DEAP [10], but whether the personalisation advantage persists under two-channel wearable constraints has not been quantified with proper leave-one-subject-out evaluation. This is Layer 1 of a three-layer research program: pilot analysis on public data (this study), original data collection with validated stress induction (Layer 2), and real wearable hardware validation (Layer 3).
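For readers unfamiliar with leave-one-subject-out evaluation, the splitting logic can be sketched as below. This shows the general technique only. The helper name and toy subject labels are hypothetical and are not taken from the study's code.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_idx, test_idx), holding out one subject per fold."""
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx

# Toy example: 3 subjects with 2 trials each (the study itself has 40 subjects).
subject_ids = [1, 1, 2, 2, 3, 3]
folds = list(leave_one_subject_out(subject_ids))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Because every trial from the held-out subject is excluded from training, the population classifier is always scored on a person it has never seen, which is the condition a deployed cross-subject model would face.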
Primary literature informing the evaluation design and contextualising the findings within EEG adaptation research.
Limitations, caveats, and what comes next
Research-grade work requires honest accounting of its boundaries. These notes document where Baseline's assumptions hold, where they break down, and what the real evaluation revealed.
On the scope of this evaluation
Experiments use real EEG data from BNCI Horizon 2020 (001-2014), single subject A01, evaluated across two sessions (A01T to A01E). Single-subject evaluation is a known limitation. The observed drift reduction may not generalize across subjects, devices, or tasks. These results are proof-of-concept, not deployment-ready benchmarks.
Method limitations
Covariance whitening assumes the training session covariance is representative of the long-run statistic. In practice, a single session may under-sample the distribution. Moving-average adaptation can over-fit to short-term transient states if the decay rate is too aggressive relative to the actual drift timescale.
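The decay-rate trade-off can be made concrete with a small sketch of exponential moving-average recalibration. The update rule shown is the standard EMA form. The parameter name alpha, the two rates, and the synthetic session means are assumptions chosen for illustration.

```python
def ema_baseline(session_means, alpha):
    """Track a drifting baseline: b_t = (1 - alpha) * b_(t-1) + alpha * m_t."""
    baseline = list(session_means[0])
    history = [list(baseline)]
    for m in session_means[1:]:
        baseline = [(1 - alpha) * b + alpha * x for b, x in zip(baseline, m)]
        history.append(list(baseline))
    return history

# Synthetic scalar feature: slow upward drift with one transient spike at index 3.
means = [[0.0], [0.1], [0.2], [3.0], [0.4], [0.5]]
slow = ema_baseline(means, alpha=0.1)  # conservative adaptation
fast = ema_baseline(means, alpha=0.9)  # aggressive adaptation

print(slow[3][0], fast[3][0])  # the aggressive baseline jumps toward the spike
```

With the aggressive rate, the baseline chases the transient almost completely, which is the over-adaptation risk described above. The conservative rate barely moves, at the cost of lagging behind genuine slow drift.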
What this system does not claim
Baseline is not a classifier. It makes no clinical claims about cognitive state, mental health, or neurological function. The dissociation between feature stability and decoding accuracy observed here should be interpreted as a constraint on what statistical alignment alone can provide.
Open questions
Why did covariance whitening fail to improve accuracy despite 94.6% drift reduction? Is the discriminative signal for motor imagery orthogonal to the high-variance drift directions? Can Riemannian alignment outperform covariance whitening on multi-subject longitudinal evaluation? What is the minimum viable calibration protocol for wearable BCI deployment?
Future directions
The current evaluation establishes a methodological baseline. The deeper question it opens: how should future neurotechnology systems adapt to humans as continuously evolving biological distributions rather than static users?
Systems capable of updating personal neural representations longitudinally without requiring explicit recalibration sessions — treating alignment as a persistent background process rather than an upfront cost.
Wearable EEG systems designed around passive adaptation and minimal user burden. Practical deployment imposes hard constraints on setup time; the alignment layer should absorb longitudinal variation silently.
Long-term subject-specific representation spaces capable of tracking gradual cognitive and physiological change across months or years. Whether such embeddings remain discriminative at that timescale is an open empirical question.
Adaptive systems that estimate signal reliability and drift in real time, flagging when alignment has degraded rather than silently producing uncertain predictions under noisy real-world conditions.
Wearable neurotechnology designed jointly with adaptive alignment infrastructure, rather than treating signal instability purely as a post-processing residual. Electrode placement, signal conditioning, and adaptation as a unified system.
Privacy-preserving personalization allowing wearable neural devices to improve longitudinally without centralized storage of raw neural data — a necessary constraint for any deployment at population scale.
Whether future AI systems interacting with biological users require temporally adaptive representations rather than fixed assumptions about users. BASELINE examines one constrained, measurable piece of that larger question.