AI Agent Reaches 91% Accuracy in Oncology Decision Support, Promising Clinical Tool

These agents are designed to support clinicians, but by no means to replace them.
A leading researcher emphasizes the tool's role as an assistant, not a substitute for medical judgment.

At the intersection of artificial intelligence and oncology, researchers at TU Dresden have built a system that may one day serve as medicine's most tireless collaborator: not a replacement for the physician's judgment, but an extension of human knowledge in moments when complexity threatens to overwhelm it. By equipping GPT-4 with specialized medical tools and access to roughly 6,800 documents drawn from clinical guidelines, the team demonstrated that an AI agent could work through 20 realistic simulated cancer cases with 91% accuracy, correctly citing oncology guidelines in more than three out of four responses. The work is a proof of concept, not a clinical product, but it points toward a future in which the burden of keeping pace with an ever-expanding body of medical knowledge need not rest on any single human mind.

  • Oncologists face a mounting cognitive burden — cancer cases now involve overlapping genetic mutations, multimodal imaging, and treatment guidelines that can change within weeks of a diagnosis.
  • The AI agent tested at TU Dresden achieved 91% accuracy across 20 simulated cancer cases, and its use of specialized tools dramatically reduced the dangerous 'hallucinations' that make unchecked AI a liability in clinical settings.
  • The system integrates radiology software, genetic mutation prediction, and a library of roughly 6,800 oncology guideline documents, and it is designed to reason through cases the way a clinician would, but without fatigue.
  • Researchers and clinicians alike are tempering enthusiasm: 20 simulated cases is a starting point, not a finish line, and the road to deployment runs through regulatory approval, data privacy compliance, and clinician training.
  • The underlying vision is augmentation, not automation — a tool that handles research and synthesis so that the oncologist remains fully in command of the decision that matters most.

An oncologist reviewing a complex cancer case confronts a familiar problem: multiple genetic mutations, imaging showing spread across several organs, and treatment guidelines updated just weeks ago. The decision tree branches in a dozen directions. Researchers at TU Dresden have built something close to the tireless research assistant that moment demands.

Their system takes GPT-4 and equips it with a suite of specialized medical tools — software that reads MRI and CT scans, algorithms that predict genetic mutations from tissue slides, and direct access to PubMed, Google, and OncoKB, a comprehensive cancer genomics database. The system was also given roughly 6,800 documents drawn from official oncology guidelines, creating a constantly updated reference library it can reason from in real time.

When presented with a patient scenario, the agent decides which tools it needs, retrieves the relevant information, and reasons through to a clinical conclusion while citing its sources. Tested on 20 realistic simulated cases, it reached the correct conclusion 91% of the time and accurately cited oncology guidelines in more than 75% of its responses. Crucially, the specialized tools significantly reduced hallucinations — the confident, plausible-sounding falsehoods that make unguided AI dangerous in medicine.

Study author Dyke Ferber describes the tool as a time-saver and knowledge-keeper, one that could help doctors stay current on evolving treatment recommendations and support the identification of personalized care. But the researchers are careful not to oversell. Professor Jakob Kather, an oncologist and clinical AI researcher at TU Dresden, is direct about what remains: larger validation datasets, integration into hospital systems, data privacy compliance, regulatory approval as a medical device, and clinician training on how to work alongside AI without surrendering their own judgment.

'These agents are designed to support clinicians, but by no means to replace them,' Kather emphasizes. The team envisions similar systems eventually serving cardiology, neurology, and other fields where complex data and evolving guidelines outpace what any single human can hold in mind — but only after the practical, regulatory, and educational challenges between a promising result and a working clinical tool have been honestly confronted and solved.

An oncologist sitting down to review a complex cancer case faces a familiar problem: the patient's tumor involves multiple genetic mutations, imaging shows spread to three organs, and the latest treatment guidelines were updated just weeks ago. The decision tree branches in a dozen directions. What if the doctor had a tireless research assistant—one that could instantly cross-reference medical images, genetic data, patient records, and the entire landscape of current treatment protocols?

Researchers at TU Dresden have built something close to that. They took GPT-4, the large language model behind ChatGPT, and equipped it with a suite of specialized medical tools: software that reads MRI and CT scans, algorithms that predict genetic mutations from tissue slides, and direct access to PubMed, Google, and OncoKB—a comprehensive database of cancer genomics. They also fed the system roughly 6,800 documents drawn from official oncology guidelines and clinical resources, creating what amounts to a constantly updated reference library.

The result is an autonomous AI agent designed to think through cancer cases the way a human oncologist does, but faster and without fatigue. When given a patient scenario, the system first decides which tools it needs—maybe it pulls the radiology report, maybe it analyzes the pathology slide, maybe it searches for the latest evidence on a particular drug combination. It then retrieves the relevant information and reasons through to a clinical conclusion, citing its sources as it goes.
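The loop described above (pick tools, gather findings, answer with citations) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: every function name, tool name, and return value here is hypothetical, and a simple rule stands in for the language model's tool-selection step.

```python
from dataclasses import dataclass
from typing import Callable

# All tools below are hypothetical placeholders, not the tools or outputs
# used in the TU Dresden study.


@dataclass
class Finding:
    tool: str    # which tool produced this finding
    text: str    # what the tool reported
    source: str  # citation the agent can attach to its answer


def read_radiology(case: dict) -> Finding:
    return Finding("radiology", f"Imaging summary: {case['imaging']}", "radiology report")


def check_mutations(case: dict) -> Finding:
    listed = ", ".join(case["mutations"])
    return Finding("genomics", f"Mutations noted: {listed}", "genomics database lookup")


def search_guidelines(case: dict) -> Finding:
    return Finding("guidelines", f"Guideline passages on {case['tumor_type']}", "guideline library")


TOOLS: dict[str, Callable[[dict], Finding]] = {
    "radiology": read_radiology,
    "genomics": check_mutations,
    "guidelines": search_guidelines,
}


def choose_tools(case: dict) -> list[str]:
    """Stand-in for the model's tool-selection step (rule-based here)."""
    chosen = []
    if case.get("imaging"):
        chosen.append("radiology")
    if case.get("mutations"):
        chosen.append("genomics")
    chosen.append("guidelines")  # always consult the guideline library
    return chosen


def run_agent(case: dict) -> dict:
    """Call each chosen tool and return the evidence with its citations."""
    findings = [TOOLS[name](case) for name in choose_tools(case)]
    return {
        "evidence": [f.text for f in findings],
        "citations": [f.source for f in findings],
    }
```

In a real agent the selection step would be made by the language model itself, and the final answer would be written by the model from the gathered evidence; the sketch only shows how tool calls and citations can travel together.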

To test whether this actually works, the researchers ran the system through 20 realistic, simulated patient cases. Human medical experts then reviewed each answer for accuracy, completeness, and whether the AI had correctly cited the guidelines it claimed to use. The results were striking: the AI reached the correct clinical conclusion in 91 percent of cases. In more than 75 percent of its responses, it accurately cited relevant oncology guidelines. Perhaps more importantly, the specialized tools and access to medical knowledge dramatically reduced what researchers call "hallucinations"—those moments when an AI confidently states something plausible but entirely false, a particular hazard in medicine.
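For readers curious how expert reviews of this kind turn into headline percentages, here is a small Python sketch. It assumes each reviewed answer is reduced to two yes/no judgments; the `Review` fields and the `summarize` function are hypothetical, and the example uses toy data, not the study's ratings.

```python
from dataclasses import dataclass


@dataclass
class Review:
    conclusion_correct: bool  # reviewer judged the clinical conclusion correct
    citation_accurate: bool   # reviewer judged the cited guideline accurate


def summarize(reviews: list[Review]) -> dict[str, float]:
    """Return the share of answers judged correct and accurately cited."""
    if not reviews:
        raise ValueError("at least one review is required")
    n = len(reviews)
    return {
        "accuracy": sum(r.conclusion_correct for r in reviews) / n,
        "citation_rate": sum(r.citation_accurate for r in reviews) / n,
    }


# Toy data for illustration only.
toy_reviews = [
    Review(True, True),
    Review(True, False),
    Review(True, True),
    Review(False, False),
]
toy_summary = summarize(toy_reviews)
```

With so few cases, a single answer moves each rate by a wide margin, which is one reason the researchers call for validation on far larger datasets.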

Dyke Ferber, the study's first author, frames the tool as a time-saver and a knowledge-keeper. "AI tools are designed to support medical professionals, freeing up valuable time for patient care," Ferber said. "They could help in daily decision-making processes and support doctors to stay updated on the latest treatment recommendations, contributing to the identification of optimal personalized care for cancer patients."

But the researchers are careful not to oversell. The study tested the system on only 20 simulated cases—a proof of concept, not a finished product ready for the clinic. Jakob Kather, a professor of clinical artificial intelligence at TU Dresden and an oncologist himself, acknowledges the real obstacles ahead. The system needs validation on far larger datasets. It needs to be integrated into existing hospital computer systems without breaking them. It needs to comply with data privacy laws, which means running on local servers rather than cloud systems. It needs regulatory approval as a medical device. And perhaps most fundamentally, doctors need training on how to work with it—how to trust it without abdicating their own judgment.

"These agents are designed to support clinicians, but by no means to replace them," Kather emphasizes. The vision is not a robot making decisions, but a tool that handles the research and synthesis, leaving the oncologist in full command. The team also imagines adapting similar systems for other medical fields—cardiology, neurology, infectious disease—anywhere that complex multimodal data and constantly evolving guidelines demand more than any single human can hold in mind at once. But that future depends on solving the practical, regulatory, and educational challenges that still stand between a promising lab result and a tool that actually changes how doctors work.

AI tools are designed to support medical professionals, freeing up valuable time for patient care and helping doctors stay updated on the latest treatment recommendations.
— Dyke Ferber, first author of the study
These agents are designed to support clinicians, but by no means to replace them. Full authority over final clinical decision-making must remain with medical professionals.
— Prof. Jakob N. Kather, TU Dresden
The Hearth Conversation: Another perspective on the story
Inventor

Why does oncology specifically need this kind of AI support? Isn't cancer treatment already pretty standardized?

Model

It's the opposite. Cancer is incredibly heterogeneous—the same tumor type in two patients can have completely different genetic drivers, and the treatment landscape shifts constantly. An oncologist has to synthesize imaging, pathology, genomics, and guidelines that get updated every few months. It's not that treatment is simple; it's that it's overwhelmingly complex.

Inventor

So the AI is basically a very fast literature review machine?

Model

It's more than that. It's not just retrieving information—it's reasoning through a case the way a doctor would, deciding which tools to use and in what order. It's the synthesis that matters. A doctor could spend an hour researching; the AI does it in seconds.

Inventor

The 91% accuracy number—does that mean it's better than doctors?

Model

No. It means on these 20 simulated cases, it reached the right conclusion 91% of the time. Real practice is messier. And the point isn't to replace doctors; it's to catch what they might miss and free them to focus on the human parts of care.

Inventor

What's a hallucination in this context?

Model

When the AI confidently states something that sounds true but is completely false. In medicine, that's dangerous. If an AI invents a drug interaction or misquotes a guideline, a tired doctor might not catch it. The specialized tools and access to real data dramatically reduced that problem.

Inventor

Why does it need 6,800 guideline documents if it's already trained on the internet?

Model

Because the internet is noisy and outdated. Official guidelines are curated, current, and authoritative. The AI needs to cite them correctly, not just sound plausible. That's the difference between a chatbot and a clinical tool.

Inventor

What happens next?

Model

More testing on real cases, integration into hospital systems, regulatory approval, and—crucially—teaching doctors how to work with it. The technology is promising, but the implementation is the hard part.
