Brain AI models show that data standardization, not just scale, unlocks neuroscience

Machines don't apprentice. The implicit must become explicit.
Why neuroscience's traditional knowledge-transfer methods fail when data needs to be integrated across labs.

The emergence of foundation models for brain science reveals a quiet truth: artificial intelligence did not outpace biology — it finally caught up to decades of painstaking human agreement. Across thirty years, neuroscientists built shared languages for their data, and only where that work matured have AI models begun to generalize meaningfully. The lesson is not that machines grew smarter, but that knowledge must be made legible before it can be learned — and that the unglamorous labor of standardization is, in the end, the precondition for discovery.

  • Brain AI models like MICrONS and TRIBE v2 are arriving now not because of AI breakthroughs, but because neuroscience spent thirty years quietly agreeing on how to store, label, and share its data.
  • A single uncorrected voltage offset — the liquid junction potential — can shift membrane measurements by up to 15 millivolts, silently corrupting biophysical models across entire circuits when labs don't report their calibration choices.
  • The assumption that scale alone will wash out methodological noise is only partly true: without overlapping data, documented covariates, and biology that isn't entangled with technique, AI models learn the lab, not the brain.
  • Most neuroscience data is still produced outside large consortia, in individual labs where tacit knowledge passes through apprenticeship — a system that works for humans but leaves machines with no context and no ground truth.
  • The field is moving toward frameworks like FAIR² that demand data travel with its full provenance and protocol history, but neuroscience still lacks a shared infrastructure comparable to the Protein Data Bank.
  • The path forward requires every lab — not just coordinated consortia — to make the implicit explicit, committing to standardized protocols and operational transparency before the next generation of models can reach behavior and clinical domains.

The first foundation models for brain science did not arrive because artificial intelligence grew more powerful. They arrived because neuroscience spent three decades doing work that rarely makes headlines: agreeing on data formats, standardizing experimental procedures, and recording what measurements actually mean.

This is the inverse of the story AI hype tells. Since the early 1990s, when the first Human Brain Project recognized that individual labs could never build shared infrastructure alone, major initiatives — the BRAIN Initiative, the European Human Brain Project, the Allen Institute — along with standards like NWB and BIDS, have done that slow, coordinating work. Foundation models are now appearing precisely where that work has matured.

In April 2025, the MICrONS consortium published a model trained on calcium-imaging data from roughly 135,000 neurons in mouse visual cortex, capable of generalizing to new animals and linking function to structural connectivity. In March 2026, Meta released TRIBE v2, predicting human fMRI responses across visual, auditory, and language stimuli from over 1,000 hours of standardized data. Both followed the AlphaFold pattern: standardization first, foundation model after.

A widespread belief in AI holds that scale will dissolve methodological variation — that a model trained on enough heterogeneous data will factor out lab-to-lab differences the way large language models handled the open web. This is partly true, but only under conditions neuroscience rarely meets: covariates rich enough to specify what varies, overlapping data across conditions, and biology not hopelessly entangled with technique. Without those conditions, models learn the lab, not the brain.

The precision required is more extreme than it appears from outside. The liquid junction potential — a small voltage at the pipette-bath interface — shifts membrane measurements by 10 to 15 millivolts when uncorrected. Whether to correct it varies by lab and often goes unreported. Because voltage-gated channels operate in narrow windows, even a small miscalibration can systematically distort models of excitability and threshold, with errors compounding across channels, cells, and circuits. This is one parameter among dozens. The same problem recurs in imaging, behavioral measurement, and transcriptomics.

These discrepancies persist not from carelessness but from culture. Methodology in neuroscience has always traveled tacitly, passed through apprenticeship. That works between humans. Machines don't apprentice. A decade ago, Shreejoy Tripathy and colleagues demonstrated this empirically: after back-modeling methodological covariates across thousands of published reports, classification accuracy of neuron types rose from 48 to 81 percent. The variance was always there — it just hadn't been recorded.

Frameworks like FAIR² are trying to define what must travel with a measurement beyond the data file itself: protocol context, provenance, assumptions, and reuse constraints. But most neuroscience data is still produced outside large consortia, in individual labs running experiments one at a time. For that data to be broadly useful — for validation, hypothesis testing, cross-study comparison — it must be produced with standardized protocols and explicit documentation of technical conditions. The field still lacks a shared integrative infrastructure comparable to the Protein Data Bank. The question now is whether neuroscience as a whole will commit to building it.

The first foundation models for brain science did not arrive because artificial intelligence got smarter. They arrived because neuroscience spent three decades doing unglamorous work: agreeing on how to store data, standardizing how experiments are performed, and recording what measurements actually mean.

This is the opposite lesson from what AI hype suggests. The field has been struggling to integrate its diverse data since the early 1990s, when the first Human Brain Project launched on the premise that individual labs could never build the infrastructure needed to make their data talk to each other. Over the past thirty years, major initiatives like the BRAIN Initiative, the European Human Brain Project, and the Allen Institute, along with coordinating consortia and shared standards like NWB and BIDS, have done that slow work. The foundation models are now appearing in the areas where that work has matured.

In April 2025, Eric Wang and colleagues in the MICrONS consortium published a foundation model trained on calcium-imaging recordings from roughly 135,000 neurons across mouse visual cortex. It generalizes to new mice, predicts responses to novel stimuli, and links function to structural connectivity. In March 2026, Meta released TRIBE v2, a model that predicts human functional MRI responses to visual, auditory, and language stimuli, trained on more than 1,000 hours of fMRI data from about 720 participants. Both efforts followed the path that AlphaFold took for protein structure: the standardized work came first, the foundation model came after. Building the MICrONS corpus took half a decade of coordinating in-vivo functional imaging and electron microscopy across labs. TRIBE v2 was made possible largely by the Brain Imaging Data Structure, which gave researchers shared data formats, and by the Human Connectome Project and UK Biobank, which used BIDS to establish large, standardized fMRI datasets over more than a decade.

A common view in AI holds that scale will solve the integration problem: a foundation model trained on enough heterogeneous brain data will factor out methodological variation the way large language models learned to handle the open web. This is partly right. Such methods already recover biological signal across single-cell sequencing batches. But they work only under specific conditions: methodological covariates rich enough to specify what varies, data that overlap across conditions, and biology not confounded with methodology beyond what conditioning can untangle. For most of neuroscience today, none of those conditions reliably holds. The result is models that predict well on familiar data but fail when asked to generalize to a new lab, a new perturbation, or a new mechanism.

The detail required to integrate data accurately is more extreme than people outside the lab assume. Take the liquid junction potential, a small voltage at the interface between pipette and bath solutions. Uncorrected, it shifts membrane voltage measurements by 10 to 15 millivolts. The convention on whether to correct for it varies across laboratories and often goes unreported. The same cell, recorded in two labs, ends up measured against different zeros. Because voltage-gated channels operate in narrow voltage windows, a biophysical model built on even a small miscalibration can be systematically wrong about excitability, integration, and threshold. The errors compound when channels interact in a cell, and again when cells interact in a circuit. And this is just one parameter among dozens; others include temperature, electrode type, and filter settings. The same problem recurs in two-photon imaging, behavioral measurement, and transcriptomics.
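To make "different zeros" concrete, here is a minimal Python sketch that tries to put three hypothetical recordings of the same cell onto a common reference. The `Recording` fields, the lab labels, and every voltage are illustrative assumptions, and the sign convention used (corrected = measured minus junction potential) is one common whole-cell convention, not a universal rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recording:
    lab: str                        # hypothetical lab label
    v_rest_mv: float                # resting potential as reported (mV)
    ljp_corrected: Optional[bool]   # None means the convention was not reported
    ljp_mv: Optional[float] = None  # junction potential for the solutions used (mV)


def to_common_zero(rec: Recording) -> Optional[float]:
    """Return the LJP-corrected potential, or None if it cannot be recovered."""
    if rec.ljp_corrected is None:
        return None                 # unreported convention: the zero is unknown
    if rec.ljp_corrected:
        return rec.v_rest_mv        # already on the corrected scale
    if rec.ljp_mv is None:
        return None                 # known to be uncorrected, but by how much?
    # Assumed convention: corrected = measured - LJP
    return rec.v_rest_mv - rec.ljp_mv


# Made-up values for illustration only.
recordings = [
    Recording("lab_a", -65.0, ljp_corrected=True),
    Recording("lab_b", -52.0, ljp_corrected=False, ljp_mv=13.0),
    Recording("lab_c", -52.0, ljp_corrected=None),
]

for rec in recordings:
    print(rec.lab, rec.v_rest_mv, "->", to_common_zero(rec))
```

In this toy case, labs A and B agree once the offset is applied. Lab C's value cannot be placed on the same scale at all, which is the situation an unreported convention creates.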

These discrepancies persist for cultural and structural reasons, not carelessness. Methodology in neuroscience has always been carried tacitly, passed from senior to junior through apprenticeship. That works for human-to-human reuse. Machines don't apprentice. If data are to travel, the implicit must become explicit; otherwise variability has no context and becomes noise. A decade ago, Shreejoy Tripathy and colleagues showed this empirically: after they back-modeled methodological covariates across thousands of literature reports, the classification accuracy of new recordings against canonical neuron types rose from 48 to 81 percent. The variance hadn't gone anywhere. Most of it was methodology that hadn't been recorded.
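The general idea of recovering signal by modeling a recorded covariate can be shown with a toy example. The sketch below is not Tripathy and colleagues' analysis, and its accuracies have nothing to do with the 48 and 81 percent figures; the cell types, voltages, and canonical values are all synthetic assumptions. It estimates the effect of a single recorded covariate (whether the junction potential was corrected) from group means, then removes it before classifying.

```python
from statistics import mean

# Synthetic data: (true_type, measured_mv, ljp_corrected). All values are made up.
CELLS = [
    ("A", -71.0, True), ("A", -69.0, True), ("B", -61.0, True), ("B", -59.0, True),
    ("A", -59.0, False), ("A", -57.0, False), ("B", -49.0, False), ("B", -47.0, False),
]
CENTROIDS = {"A": -70.0, "B": -60.0}  # hypothetical canonical value per type


def classify(v_mv: float) -> str:
    """Assign the type whose canonical value is nearest."""
    return min(CENTROIDS, key=lambda t: abs(v_mv - CENTROIDS[t]))


def accuracy(values) -> float:
    """Fraction of cells whose nearest-centroid label matches the true type."""
    hits = sum(classify(v) == cell[0] for v, cell in zip(values, CELLS))
    return hits / len(CELLS)


raw = [v for _, v, _ in CELLS]

# Estimate the covariate's effect as a difference of group means. This is only
# valid because both groups contain the same mix of cell types (overlap).
offset = mean(v for _, v, c in CELLS if not c) - mean(v for _, v, c in CELLS if c)

# Remove the estimated offset from the uncorrected group only.
adjusted = [v if c else v - offset for _, v, c in CELLS]

print("estimated offset (mV):", offset)
print("accuracy, raw:", accuracy(raw))
print("accuracy, adjusted:", accuracy(adjusted))
```

The adjustment works here only because the covariate was recorded for every cell and both groups sample the same cell types. Drop the `ljp_corrected` column, or let one lab record only type A, and the same arithmetic can no longer separate method from biology.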

Efforts like FAIR², which builds on the FAIR principles by requiring that data also be AI-ready, responsibly governed, and context-rich, try to define what should travel with a measurement beyond the file itself: protocol context, provenance, assumptions, reuse constraints, and the evidence needed to interpret the data. This matters beyond foundation model training. Most neuroscience data is produced outside corpus-building consortia, in individual labs running experiments one at a time. For those data to be broadly useful—for validating predictions, testing hypotheses, comparing results across studies—they have to be produced with standardized protocols where possible and explicit recording of technical conditions where that is not. To date, the field also lacks a common integrative infrastructure akin to the Protein Data Bank for structural biology. Efforts are emerging, but neuroscience does not yet have that infrastructure at the scale and breadth the field needs. The question now is whether the rest of neuroscience will commit to doing the same kind of work.
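What "context that travels with a measurement" might look like at the level of a single file can be sketched as a simple completeness check. The field names below are hypothetical and loosely follow the categories named above; they are not an actual FAIR² schema, and the example metadata is invented.

```python
from typing import Any, Dict, List

# Hypothetical field names, loosely following the categories named in the text.
REQUIRED_CONTEXT = ("protocol", "provenance", "assumptions", "reuse_constraints")


def missing_context(metadata: Dict[str, Any]) -> List[str]:
    """List required context fields that are absent or empty."""
    return [key for key in REQUIRED_CONTEXT if not metadata.get(key)]


# Invented example: protocol and provenance are filled in, the rest is not.
example = {
    "protocol": {"temperature_c": 34.0, "ljp_corrected": False, "ljp_mv": 13.0},
    "provenance": {"lab": "lab_b", "rig": "rig_2"},
    "assumptions": [],  # present but empty, so it counts as missing
}

print(missing_context(example))
```

A check like this says nothing about whether the recorded context is correct, only whether it exists. It is the minimum a downstream user, human or model, would need before deciding how to treat the data.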

A foundation model can learn useful biological structure only when it can separate biological differences from methodological ones.
— The author, on the core challenge of brain AI
The variance hadn't gone anywhere; most of it was methodology that hadn't been recorded.
— Shreejoy Tripathy's empirical finding on unreported methodological variation
The Hearth Conversation: Another perspective on the story
Inventor

Why did these foundation models appear now, and not five years ago? The AI was already capable.

Model

Because the data wasn't ready. You can't train a model to separate biological signal from methodological noise if the noise isn't documented. MICrONS spent half a decade just coordinating how different labs would measure the same things.

Inventor

But couldn't the model learn to filter that out on its own, given enough data?

Model

Only if the variation is random and well-distributed. But it's not. A lab's choice to correct—or not correct—the liquid junction potential shifts every voltage measurement by 10 to 15 millivolts. That's systematic. A model trained on that data learns the artifact, not the biology.

Inventor

So standardization is the bottleneck, not computing power.

Model

Exactly. And it's not glamorous work. It's negotiating cell-type definitions across labs, deciding whether to report temperature, agreeing on stimulus protocols. But without it, you're just fitting noise.

Inventor

What happens to all the data produced outside these consortia—the thousands of labs doing their own experiments?

Model

It's largely isolated. Useful within that lab, but it can't be compared or integrated because nobody recorded the context. The field needs infrastructure like the Protein Data Bank, but for brains.

Inventor

Is that infrastructure being built now?

Model

Starting to. BICAN and others are emerging. But we're nowhere near the scale neuroscience needs. Most labs still aren't recording the technical details that would make their data reusable.
