Researchers push for transparency in protein AI models before real-world deployment

We risk building powerful tools that we cannot fully trust
A researcher warns that protein AI models are advancing faster than our ability to understand how they work.

At a moment when artificial intelligence can design proteins that have never existed in nature, scientists are asking a question older than any algorithm: do we understand what we have built? Researchers at the Centre for Genomic Regulation have published a call in Nature Machine Intelligence urging the biotechnology community to demand explainability from protein language models before these opaque systems become further entrenched in decisions that affect medicine, industry, and the environment. The concern is not that the tools lack power, but that relying on powerful tools without transparency means extending trust that has not yet been earned.

  • Protein language models can now design novel enzymes for carbon capture and industrial catalysis, yet the reasoning behind their predictions remains hidden from the researchers who rely on them.
  • The opacity creates a quiet crisis: biologists cannot verify whether a model's conclusions are grounded in genuine biological logic or in statistical artifacts and dataset biases.
  • A team led by Dr. Noelia Ferruz has mapped four extraction points for explanations—training data, input sequences, internal architecture, and behavioral testing—offering a concrete framework for opening the black box.
  • Most of the field currently uses explainability only to confirm what scientists already know, leaving its most transformative role—uncovering biological principles humans have never recognized—almost entirely unexplored.
  • The path forward requires community-wide benchmarks, open-source tools, and a firm commitment that any AI-derived insight must be validated in the laboratory before it shapes real-world design decisions.

Protein language models have reached a strange crossroads. They can now generate enzymes that have never existed in nature: molecules that might strip carbon dioxide from the atmosphere or remake industrial manufacturing by cutting energy use and toxic waste. The potential is real. But as these systems begin shaping decisions in biotechnology labs worldwide, they remain largely opaque, offering predictions without explanations and conclusions without traceable reasoning.

This gap has moved a group of scientists at the Centre for Genomic Regulation to publish a comprehensive analysis in Nature Machine Intelligence, arguing that the field must prioritize explainability before these tools become further embedded in discovery work. Unlike older physics-based models that could be traced step by step, today's neural networks have traded transparency for raw capability. A researcher submits a protein sequence and receives a prediction, but the path the model traveled to reach that answer stays hidden.

Dr. Noelia Ferruz, who led the research, identifies four places where explanations can be extracted: the training data and its potential blind spots, the specific sequence features driving a prediction, the model's internal architecture, and behavioral testing that watches how outputs shift when inputs are nudged. Yet when the team surveyed existing literature, they found the field using these tools in narrow ways—mostly as backward-looking verification, confirming patterns biologists already recognize rather than revealing new ones.

The most ambitious possibility, which the researchers call the "Teacher" role, remains almost entirely unrealized. A true Teacher model would surface biological principles that humans have not yet discovered—new rules of folding or catalysis that could reshape how medicines and materials are made. Ferruz describes the ultimate destination as controllable protein design: a model that not only generates a candidate sequence but explains why it should work and why alternatives would fail, tracing its reasoning down to specific molecular interactions.

Getting there will require deliberate effort. The researchers call for robust benchmarks that test whether an explanation genuinely reflects a model's reasoning rather than a plausible-sounding story, open-source tools that make explainability comparable across laboratories, and a firm rule that AI-derived insights must be confirmed through experiment. The field stands at a threshold between deploying ever-more-powerful black boxes and building the transparency infrastructure that would allow these tools to become trusted partners in discovery.

Protein language models have arrived at a peculiar moment in their development. These artificial intelligence systems can now design proteins that have never existed in nature—enzymes that might pull carbon dioxide from the air, catalysts that could transform industrial manufacturing by cutting energy use and toxic waste. The potential is genuine and vast. Yet as these tools begin to influence real decisions in biotechnology labs around the world, they remain largely opaque. Researchers cannot easily see how the models reach their conclusions, whether they harbor hidden biases, or whether their predictions are actually safe to trust.

This gap between capability and understanding has prompted a group of scientists at the Centre for Genomic Regulation to publish a comprehensive analysis in Nature Machine Intelligence, calling for the field to prioritize explainability before these systems become further embedded in discovery and design work. The core problem is straightforward: protein language models operate as black boxes. A researcher feeds in a protein sequence and receives a prediction about its structure or properties, but the path the model took to reach that answer remains hidden. Unlike older physics-based approaches, which could be traced step by step, these neural networks have sacrificed transparency for power.

Dr. Noelia Ferruz, who led the work, frames the stakes clearly. The models are advancing rapidly, but our understanding of fundamental biology—how proteins fold, how they catalyze reactions—has not kept pace. In some ways, she argues, we have actually lost ground. "Without better ways to explain what these models learn and how they make decisions, we risk building powerful tools that we cannot fully trust," she says. The research team identified four critical points where explanations can be extracted. The first is the training data itself: what proteins did the model learn from, and does that dataset reflect human genetic diversity or introduce systematic blind spots? The second is the specific protein sequence being analyzed—which amino acids or regions actually drove the model's prediction? The third is the model's internal architecture, the artificial neurons and their connections, which can be inspected like an engine. The fourth is behavioral testing: nudging the input slightly and watching how the output changes.
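The fourth extraction point, behavioral testing, can be illustrated with a minimal sketch. This is not the researchers' method or any real model's API: `toy_model` is a hypothetical stand-in that scores a sequence by counting hydrophobic residues and adding a bonus when the third residue is a histidine, and `mutation_scan` is an illustrative helper. The scan substitutes every alternative amino acid at each position and records the largest change in the score, which is one simple way of "nudging the input and watching the output".

```python
from typing import Callable, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWY")


def toy_model(sequence: str) -> float:
    """Hypothetical stand-in for a protein language model's score.

    Assumption for illustration only: one point per hydrophobic residue,
    plus a bonus of five if the residue at index 2 is histidine.
    """
    score = sum(1.0 for aa in sequence if aa in HYDROPHOBIC)
    if len(sequence) > 2 and sequence[2] == "H":
        score += 5.0
    return score


def mutation_scan(model: Callable[[str], float], sequence: str) -> List[float]:
    """Return, per position, the largest absolute score change
    caused by any single amino-acid substitution at that position."""
    baseline = model(sequence)
    sensitivity = []
    for pos, original in enumerate(sequence):
        max_shift = 0.0
        for aa in AMINO_ACIDS:
            if aa == original:
                continue  # skip the unmutated residue
            mutant = sequence[:pos] + aa + sequence[pos + 1:]
            max_shift = max(max_shift, abs(model(mutant) - baseline))
        sensitivity.append(max_shift)
    return sensitivity


if __name__ == "__main__":
    seq = "GAHKL"
    shifts = mutation_scan(toy_model, seq)
    for pos, (aa, shift) in enumerate(zip(seq, shifts)):
        print(f"position {pos} ({aa}): max score shift {shift}")
```

In this toy case the scan singles out the histidine at index 2 as the residue the score depends on most. With a real model, a flagged position would only be a hypothesis about what the model has learned, and it would still need to be checked against known biology and confirmed experimentally.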

When the researchers surveyed the existing literature on explainability in protein science, they found the field using these tools in surprisingly limited ways. Most studies employ explainability as an "Evaluator"—a verification mechanism to check whether the model has learned patterns that biologists already recognize, like binding sites or structural motifs. Some go further, acting as a "Multitasker" to annotate new proteins or predict additional properties. But these two roles dominate the landscape, meaning explainability is largely a backward-looking tool, confirming what is already known rather than driving discovery. Fewer studies use explainability as an "Engineer" or "Coach," redesigning the model's architecture to improve performance. And the most ambitious role—the "Teacher"—remains almost entirely unrealized.

A Teacher model would do something fundamentally different. It would reveal biological principles that humans have not yet recognized, the way AlphaZero uncovered novel chess strategies or AI systems deciphered damaged ancient texts by finding patterns invisible to human eyes. In protein science, this would mean AI systems helping researchers discover new rules of folding, catalysis, or molecular interaction that could reshape how medicines and materials are designed. Ferruz describes the ultimate goal as "controllable protein design." Imagine telling a model: design a protein with this shape, active at this pH. The model would not only generate a candidate sequence but explain why that design should work and why alternatives would fail—perhaps by noting that a particular mutation would disrupt a critical hydrogen-bonding network. That level of mechanistic transparency would transform protein language models from impressive pattern generators into genuinely reliable design partners.

Reaching that milestone will not happen by accident. Today's models are sophisticated pattern recognizers, but they often rely on statistical correlations rather than true mechanistic understanding. The researchers call for the community to establish robust benchmarks and evaluation frameworks that test whether an explanation genuinely reflects the model's reasoning, not just a plausible story. They advocate for open-source tools that make explainability accessible and comparable across different laboratories. Most importantly, any insight derived from the AI must ultimately be validated in the laboratory, converting mathematical patterns into experimentally confirmed biological knowledge. The field stands at a threshold: it can continue deploying increasingly powerful black boxes, or it can pause to build the transparency and validation infrastructure that would allow these tools to become trusted partners in discovery rather than impressive but opaque oracles.

Explainability must not be an afterthought if we want protein language models to become a reliable partner in discovery and design.
— Andrea Hunklinger, first author of the paper
The Hearth Conversation: Another perspective on the story
Inventor

Why does it matter if we can't see inside these models? If they're making good predictions, isn't that enough?

Model

It's not just about predictions. If a model tells you to design a protein a certain way, you need to know whether it's basing that on real biological principles or just statistical noise. You might build something that looks good on paper but fails in the lab. Or worse, you might miss entirely different designs that would work better.

Inventor

So the concern is that we're trusting tools we don't understand.

Model

Exactly. And there's a second layer: these models learn from training data. If that data is skewed—say, mostly human proteins from certain populations—the model will inherit those biases. You won't know it unless you look.

Inventor

The paper mentions five different roles explainability can play. Why does it matter that most research only uses two of them?

Model

Because we're using explainability as a safety check, not as a discovery engine. We're asking, "Did the model learn what we already know?" instead of asking, "What is the model seeing that we don't?" That's the difference between verification and insight.

Inventor

And that "Teacher" role—the one that's barely being used—that's where the real breakthrough would be?

Model

That's where you move from having a tool to having a collaborator. Imagine an AI system that doesn't just generate protein sequences but explains the biological logic behind them in ways that teach us something new about how life actually works. That's the holy grail.

Inventor

What's stopping us from getting there?

Model

Partly it's technical: we need better benchmarks and validation frameworks. But it's also cultural. The field is moving fast, and explainability feels like it slows things down. The paper is essentially saying: slow down now, or you'll be forced to slow down far more later, when you realize you've built something you can't trust.

Want the full story? Read the original at News-Medical.