AI Systems Show Promise but Reveal Language's Limits in Scientific Discovery

Language alone cannot do science
Two new AI systems for drug discovery succeeded at identifying candidates but revealed that words and statistical patterns cannot replace structured scientific reasoning.

Two artificial intelligence systems, Robin and Co-Scientist, have advanced the discovery of drug candidates for acute myeloid leukemia and macular degeneration, but in doing so they revealed a frontier that language alone cannot cross. However sophisticated the neural networks that process text, science demands structures that exist beyond words: genomic sequences, molecular geometries, quantitative measurements. What these systems illuminate is not only what AI can do, but what it is not yet, and what the next generation will need to become.

  • The AI systems identified dozens of promising drug candidates, but relied on human scientists at every critical validation step.
  • The central limitation emerged clearly: language models navigate statistical relationships between words, not the actual behavior of living biological systems.
  • Researchers had to formulate the right questions, filter predictions, and decide which hypotheses deserved investigation; without that mediation, the systems would not have advanced.
  • The next generation of models is already taking shape, integrating genomic data, protein structures, and cellular images with the conceptual understanding that language provides.
  • The field is moving toward systems that use language as one tool among many, not as the foundation of scientific reasoning.

Two artificial intelligence systems recently presented in the journal Nature, Robin and Co-Scientist, delivered results that were at once encouraging and revealing. Both managed to identify promising drug candidates and synthesize vast bodies of scientific literature. But their successes exposed something their creators did not anticipate: language alone, even when processed by the most sophisticated neural networks, cannot do science.

Co-Scientist, developed by Google DeepMind, operates as a multi-agent system that simulates abstract cognitive work, with reflection agents that critique hypotheses and ranking agents that stage internal tournaments between competing ideas. Robin, created by the nonprofit Future House, was built with a narrower focus: repositioning existing drugs to treat new diseases. Taking on acute myeloid leukemia, Co-Scientist nominated thirty candidates; oncologists selected five for laboratory testing, and three showed activity. Robin, facing age-related macular degeneration, arrived at two promising drugs after multiple rounds of analysis and dialogue with human scientists.

This dependency on human scientists points to a deeper limitation. Communication conducted purely through language carries imprecision that science does not tolerate. A scientist who reads about protein folding understands not just the words but the three-dimensional structures they describe. A language model, however large, navigates statistical relationships between terms, and when it has to model the actual behavior of living systems, it falters. A gene sequence is not a description of a gene sequence: it is the gene itself, encoded in a four-letter alphabet.

The next generation of systems is already announcing itself. These models will integrate quantitative, structured data with the conceptual frameworks that underpin scientific facts, anchoring reasoning in genomic data, protein structures, and cellular images. Until that transition is complete, systems like Robin and Co-Scientist will remain what they are now: valuable assistants able to accelerate certain phases of research, but unable to replace the judgment of scientists who understand both the questions and the world those questions describe.

Two artificial intelligence systems designed to accelerate scientific discovery have delivered results that are simultaneously encouraging and humbling. Robin and Co-Scientist, both unveiled recently in papers published in Nature, can identify promising drug candidates and synthesize vast bodies of research literature. Yet their successes have exposed something their creators did not anticipate: language alone—even language processed by the most sophisticated neural networks—cannot do science.

Google DeepMind built Co-Scientist as a multi-agent system that simulates abstract cognitive work. It deploys what the team calls a reflection agent to critique hypotheses like a skeptical peer reviewer, and ranking agents that stage internal tournaments where competing ideas debate each other. Future House, a nonprofit research organization, constructed Robin with a narrower focus: repositioning existing drugs to treat new diseases. Robin's agents specialize in selecting experiments and parsing complex biomedical datasets.
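The tournament idea can be made concrete with a small sketch. It assumes an Elo-style rating update, which the article does not specify, and the hypothesis labels, the `toy_judge` function, and the constant `K` are hypothetical stand-ins for whatever a real ranking agent would use to decide a debate between two ideas.

```python
import itertools

K = 32  # assumed rating step size, chosen only for illustration


def expected_score(rating_a, rating_b):
    """Probability that A beats B under a standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def run_tournament(hypotheses, judge, start=1200.0):
    """Pit every pair of hypotheses against each other once and rank them."""
    ratings = {h: start for h in hypotheses}
    for a, b in itertools.combinations(hypotheses, 2):
        score_a = 1.0 if judge(a, b) == a else 0.0
        exp_a = expected_score(ratings[a], ratings[b])
        # Winner gains what the loser gives up, scaled by how surprising the result was.
        ratings[a] += K * (score_a - exp_a)
        ratings[b] -= K * (score_a - exp_a)
    return sorted(ratings, key=ratings.get, reverse=True)


def toy_judge(a, b):
    # Hypothetical judge: prefers the longer label. A real system would
    # replace this with a critique or debate step between the two ideas.
    return a if len(a) >= len(b) else b


ranked = run_tournament(["idea-a", "idea-bb", "idea-ccc"], toy_judge)
print(ranked)
```

Note that the ranking here is only as good as the judge: the tournament orders ideas by repeated pairwise preference and does not test any of them against the physical world.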

When Co-Scientist tackled acute myeloid leukemia, it nominated thirty drug candidates. Oncologists then filtered that list, selected five for laboratory testing, and found three showed activity. One displayed particularly encouraging signs. Robin approached age-related macular degeneration differently, proposing thirty candidates and, after multiple rounds of analysis and conversation with human scientists, identifying two drugs worth pursuing further. Both systems stopped before moving into direct experimental validation. Both leaned heavily on human researchers to pose the actual scientific questions, to reject or accept predictions, to decide which hypotheses deserved deeper investigation.

This dependency points to a deeper constraint. The researchers themselves acknowledged it: communication conducted purely through language, however natural it feels, carries imprecision and ambiguity that science cannot tolerate. A scientist reading a paper about protein folding understands not just the words but the three-dimensional structures those words describe. A language model, no matter how large, navigates the statistical relationships between words. It can traverse decades of accumulated scientific literature with remarkable fluency. But when asked to model the actual behavior of living systems—the systems those papers merely describe—it falters.

Science operates on structured data, quantitative measurements, and causal relationships that extend far beyond the statistical associations between terms. A gene sequence is not a description of a gene sequence; it is the gene itself, encoded in a four-letter alphabet. A protein's three-dimensional shape determines its function in ways that no amount of linguistic description can fully capture. A microscope image of a cell contains information that no summary can replace. Language models excel at finding patterns in text. They struggle when the phenomenon they need to understand exists in a different domain entirely—in numbers, in molecular geometry, in the physical world.
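A minimal sketch may help show the difference between describing a sequence and computing on it. The sequence below is made up for illustration, and the function names are hypothetical; the point is only that these operations act on the four-letter string itself, not on prose about it.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read in the conventional direction."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of bases in the sequence that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


seq = "ATGGCC"  # illustrative sequence, not taken from any real gene
rc = reverse_complement(seq)
gc = gc_content(seq)
print(rc, round(gc, 3))
```

A sentence such as "a short GC-rich fragment" loses exactly the information these two functions depend on.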

The next generation of AI systems is already taking shape. These models will integrate quantitative, structured data with the conceptual frameworks and relationships that underpin scientific facts. They will anchor reasoning not in word sequences but in genomic data, protein structures, and cellular images. They will use language as one tool among many, not as the foundation of understanding. Until that transition occurs, AI systems like Robin and Co-Scientist will remain what they are now: valuable assistants that can accelerate certain phases of research but cannot replace the judgment of scientists who understand both the questions being asked and the world those questions describe.

Communication based solely on language carries imprecisions and ambiguities incompatible with the rigor science demands
— Researchers involved in the study
The Hearth Conversation: Another perspective on the story
Inventor

These systems found drug candidates that actually showed promise in the lab. Why is that not enough? Why do we need to say they revealed a limitation?

Model

Because they didn't find those candidates alone. Humans had to ask the right questions first, then filter the results, then decide which ones were worth testing. The AI generated possibilities from existing literature, but it couldn't validate them or understand why they might work.

Inventor

But that's collaboration, isn't it? Humans and machines working together?

Model

It is, but the collaboration has a ceiling. The AI is working entirely in language—in the patterns of scientific papers. It's not actually modeling how a drug molecule interacts with a protein, or how a cell responds. It's finding statistical correlations between words that describe those things.

Inventor

So the problem is that language is too abstract?

Model

Exactly. A protein's shape matters more than any description of it. A gene sequence is information that language can only approximate. The AI can read about these things brilliantly, but it can't reason about them the way a scientist does—by holding the actual structure in mind.

Inventor

What changes with the next generation?

Model

They'll work directly with the data itself—the sequences, the structures, the images. Language will still be useful for explaining findings, but it won't be the foundation of the reasoning. The AI will think in the same domain as the phenomenon it's studying.

Inventor

And then the AI becomes a real scientist?

Model

No. It becomes a much more capable tool. But the scientist—the human asking which questions matter and why—that remains irreplaceable.

Want the full story? Read the original at O Cafezinho.