You see islands instead of a landscape
Beneath the surface of what we can see lies a vast landscape of biological complexity that has long resisted human understanding. Researchers at Utah State University have developed RF-PHATE, a machine learning tool that translates high-dimensional biological data into forms the human mind can interpret and act upon. Published in Nature Computational Science, the work emerges from a timeless scientific impulse — to find pattern within chaos — and carries immediate consequence for patients whose diseases, like multiple sclerosis, resist simple categorization. In making the invisible legible, the tool quietly reframes what personalized medicine might one day mean.
- Modern biology generates datasets of staggering complexity — hundreds of thousands of data points per patient — and existing visualization tools have struggled to show how different groups of data connect rather than simply diverge.
- RF-PHATE enters this gap by fusing random forest algorithms with heat-diffusion mathematics, preserving the relational structure between data clusters that earlier methods like t-SNE and UMAP tended to distort or sever.
- The stakes sharpened immediately in testing: the tool surfaced a multiple sclerosis subtype long suspected but never clearly confirmed, a finding with direct implications for matching patients to treatments that will actually work for them.
- Validation across COVID-19 and lung cancer datasets signals that the tool's utility is not narrow, and its capacity to produce interpretable AI predictions positions it at the frontier of the broader AI-for-Science movement.
- A collaboration spanning eight institutions across North America and Europe now carries this work forward, reflecting a growing consensus that the hardest scientific questions demand not just better instruments, but fundamentally better ways of seeing.
Our eyes work in two dimensions, yet we navigate a three-dimensional world through the brain's quiet translation. A similar problem confronts researchers studying the sub-microscopic processes of disease: how do you see what no conventional instrument can show you? At Utah State University, a team spanning mathematics, engineering, and statistics has built a tool to answer that question.
They call it RF-PHATE (Random Forest-Potential of Heat-diffusion for Affinity-based Trajectory Embedding), a machine learning system designed to render high-dimensional biological data into forms researchers can actually interpret. Published June 30 in Nature Computational Science, the work is led by Kevin Moon, director of Utah State's Data Science and Artificial Intelligence Center.
The team tested RF-PHATE on clinical data from multiple sclerosis patients, datasets tracking disease progression at the cellular level alongside treatment records and outcomes. Where existing tools like PHATE, t-SNE, and UMAP tended to overstate the separation between data groups — treating them as isolated islands — RF-PHATE preserves the connective tissue between them, offering a truer picture of what the data is actually saying.
The payoff was immediate. The tool identified a previously suspected MS subtype that had never been clearly confirmed — a finding that matters because multiple sclerosis manifests differently across patients, and knowing a patient's subtype shapes which treatments are likely to help. The team also applied RF-PHATE to COVID-19 blood plasma samples and lung cancer cell data, with promising results.
Moon sees the tool's reach extending well beyond biology. RF-PHATE can support more interpretable AI systems — ones that not only make predictions but can explain their reasoning — and fits within the growing international AI-for-Science movement aimed at accelerating discovery through machine learning. Collaborators from eight institutions across North America and Europe are part of the effort.
The mathematics is abstract, but the human dimension is not. Somewhere inside those datasets are patients whose treatment paths will be shaped by a clearer understanding of their disease. That is the quiet purpose behind the algorithms: not just to see the data, but to see the people within it.
Our eyes work in two dimensions but our brains construct three. We judge distance, sense our place in space, examine the world around us—all because of this translation. But beneath the visible world lies another realm entirely: the sub-microscopic structures and high-dimensional biological processes that no conventional microscope can capture. For researchers trying to make sense of that hidden landscape, the challenge has always been the same. How do you see what your instruments cannot show you?
At Utah State University, a team of mathematicians, engineers, and statisticians has built a new tool to answer that question. They call it RF-PHATE (Random Forest-Potential of Heat-diffusion for Affinity-based Trajectory Embedding), and it is, in essence, a machine learning system designed to translate the invisible into the visible. The work, published June 30 in Nature Computational Science, represents a significant step forward in how scientists can explore and understand the massive, tangled datasets that modern biology produces.
Kevin Moon, director of Utah State's Data Science and Artificial Intelligence Center, leads the effort. His team tested the tool on clinical data from multiple sclerosis patients—datasets containing hundreds of thousands of individual data points tracking disease progression at the cellular level, alongside treatment records and patient outcomes. The problem they were solving is fundamental to modern medicine: when you have that much information, how do you find the patterns that matter? How do you see the relationships between different groups of data without losing sight of how those groups connect to each other?
Existing visualization methods—tools with names like PHATE, t-SNE, and UMAP—had already made progress on this problem. But each carried limitations. Some tended to overstate the differences between groups of data, treating them as isolated islands rather than as parts of a connected landscape. RF-PHATE takes a different approach. By combining random forest algorithms with heat-diffusion mathematics, it preserves the actual structure of how different data groups relate to one another. The result is a clearer picture of what the data is actually saying.
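To make the "islands versus landscape" idea concrete, here is a toy sketch of the diffusion step alone. It is not the team's implementation: the five samples and their affinity values are invented for illustration, and the hand-written `affinity` matrix merely stands in for the similarities a random forest would supply. The helper names (`row_normalize`, `matmul`, `diffuse`) are hypothetical, and the published method involves further steps not shown here.

```python
# Toy illustration only: a hand-written affinity matrix stands in for the
# random-forest-derived similarities described in the article.

def row_normalize(matrix):
    """Turn affinities into random-walk probabilities (each row sums to 1)."""
    return [[value / sum(row) for value in row] for row in matrix]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def diffuse(transition, steps):
    """Probabilities of a random walk after `steps` steps (matrix power)."""
    result = transition
    for _ in range(steps - 1):
        result = matmul(result, transition)
    return result

# Five hypothetical samples: 0 and 1 form group A, 3 and 4 form group B,
# and sample 2 sits between the groups as a bridge.
affinity = [
    [1.0, 0.9, 0.3, 0.0, 0.0],
    [0.9, 1.0, 0.3, 0.0, 0.0],
    [0.3, 0.3, 1.0, 0.3, 0.3],
    [0.0, 0.0, 0.3, 1.0, 0.9],
    [0.0, 0.0, 0.3, 0.9, 1.0],
]

transition = row_normalize(affinity)
diffused = diffuse(transition, steps=4)

direct = transition[0][4]      # 0.0: a single step shows no link from A to B
via_bridge = diffused[0][4]    # positive: four steps reach B through sample 2
within_group = diffused[0][1]  # still larger than via_bridge

print(f"direct={direct:.3f} via_bridge={via_bridge:.3f} within_group={within_group:.3f}")
```

Looked at one step at a time, the two groups appear to be separate islands. After a few diffusion steps the link through the bridging sample becomes visible, while samples in the same group remain closer to each other than to the other group.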
The practical payoff emerged quickly. Using RF-PHATE on the multiple sclerosis data, Moon's team found evidence of a disease subtype that researchers had long suspected but never clearly identified. This matters because multiple sclerosis is not one disease—it manifests differently in different patients, and the specific subtype a person has shapes which treatments will work best for them. Identifying those subtypes with precision could mean the difference between a treatment plan that helps and one that misses the mark.
The team tested RF-PHATE on other datasets as well: blood plasma samples from COVID-19 patients and lung cancer cells treated with antioxidants. But Moon emphasizes that the tool's reach extends far beyond biology. RF-PHATE can be applied across disciplines, and it can also help researchers build more interpretable artificial intelligence systems—tools that not only make predictions but can explain how they arrived at those predictions. This is still an active frontier in his lab's work.
The research sits within a larger movement called AI for Science, an international effort to harness machine learning and artificial intelligence to accelerate discovery, process massive datasets, and simulate systems too complex for traditional analysis. Moon's collaborators span eight institutions across North America and Europe, from Brigham Young University to McGill to the University of Lausanne. That breadth reflects a simple truth: the biggest scientific questions now require not just better instruments, but better ways of seeing what those instruments reveal.
The human dimension is quiet but real. Somewhere in those datasets are patients whose treatment decisions will be guided by clearer understanding of their disease. The tool itself is abstract—mathematics, algorithms, computational geometry. But it exists because people get sick, and because understanding how they get sick, and why they get sick differently from one another, remains one of medicine's most pressing challenges.
Notable Quotes
"MS affects each patient differently, and knowing the specific type guides treatment decisions." (Kevin Moon, director of USU's Data Science and Artificial Intelligence Center)
"RF-PHATE does a much better job of preserving the structure of how [data groups] relate to each other." (Kevin Moon, on how the tool improves over existing visualization methods)
The Hearth Conversation: Another angle on the story
Why does it matter that RF-PHATE preserves relationships between data groups better than other methods?
Because in biology, nothing exists in isolation. A disease subtype doesn't exist by itself—it exists in relation to other subtypes, to healthy tissue, to how treatments affect different populations. If your visualization tool exaggerates the differences and ignores the connections, you miss the actual structure of what's happening. You see islands instead of a landscape.
The paper mentions multiple sclerosis specifically. Why is identifying subtypes so clinically important?
MS affects every patient differently. Two people with the same diagnosis might need completely different treatments. If you can identify which subtype a patient has, you can match them to the therapy most likely to work for them. That's the difference between precision medicine and guessing.
This tool works on biological data, but you said it extends to other fields. What would that look like?
Any field dealing with high-dimensional data—finance, climate science, materials engineering, social networks. Anywhere you have thousands of data points and you need to find meaningful patterns without losing sight of how those patterns connect to each other.
Is this a finished product, or is there more work ahead?
It's a proof of concept that works. But Moon's lab is still actively researching how to use it to build more interpretable AI systems—tools that don't just predict but explain themselves. That's the frontier now.
What makes this different from just having a better computer or more data?
More data without better ways to see it is just noise. This is about translation—taking what your instruments measure and converting it into something the human mind can actually understand and act on.