Study identifies 32 causal genes for Long COVID, revealing three distinct disease subtypes

Long COVID affects millions worldwide with persistent symptoms lasting months after SARS-CoV-2 infection, impacting respiratory, neurological, cardiovascular, and gastrointestinal systems.
Long COVID is not one disease—it's at least three, each with distinct biology.
Researchers identified three patient subtypes with different gene expression patterns and symptom profiles, suggesting personalized treatment approaches.

Months after infection ends, millions remain caught in the prolonged aftermath of SARS-CoV-2 (fatigued, foggy, struggling to breathe or digest or sleep), yet medicine has lacked the genetic map to explain why. A research team has now charted that territory, using computational tools borrowed from both biology and engineering to identify 32 genes likely driving Long COVID, and in doing so found that the condition is not one illness but three. The work, published in PLOS Computational Biology, offers what the authors describe as the first comprehensive genetic framework for a disease that has resisted understanding, and opens a path toward treatments tailored not to Long COVID in general, but to the specific molecular reality each patient inhabits.

  • Millions of people remain trapped in Long COVID's aftermath — fatigued, cognitively impaired, and physiologically disrupted — while medicine has had no clear genetic map to guide treatment.
  • A computational framework combining causal inference and network control theory has pinpointed 32 genes likely responsible for the condition, 13 of which have never before been linked to Long COVID.
  • The most disruptive finding is that Long COVID is not a single disease — three molecularly distinct subtypes have been identified, each with its own symptom profile, gene expression signature, and potential therapeutic logic.
  • Treatments effective for one subtype — say, anti-inflammatory approaches for the respiratory-sleep cluster — could be irrelevant or harmful for patients in the psychological or gastrointestinal clusters.
  • The researchers have released a free, open-source online tool so that scientists and clinicians worldwide can explore the data and identify new treatment targets as understanding continues to evolve.

Millions of people know Long COVID from the inside — the fatigue, the brain fog, the breathing difficulties, the digestive disruptions that persist long after the infection itself has passed. Medicine has recognized the condition as real and widespread, yet its genetic architecture has remained largely unmapped, leaving clinicians without a clear sense of which biological mechanisms to target.

A research team has now built that map. Using a computational framework that fuses two analytical approaches — one establishing direct causal links between gene expression and disease risk, another borrowed from engineering to identify regulatory hubs within biological networks — they identified 32 genes likely to cause Long COVID. Nineteen had already appeared in prior COVID-19 research, validating the method. Thirteen were entirely new discoveries. The work draws on genome-wide association studies, RNA sequencing from patient tissues, and protein interaction maps, integrating these layers to distinguish genes that directly drive disease from those that act as critical control points whose disruption cascades through the entire biological system.

Several genes stood out across multiple analyses — MORN4, CDC26, and EIF5A for their consistent causal signal; TP53, CREBBP, and SMAD3 for their extensive downstream influence. Enrichment analysis placed these genes within pathways governing immune regulation, viral response, cell cycle control, and metabolic adaptation, with the TGF-beta signaling pathway appearing prominently — consistent with the persistent inflammation and tissue remodeling clinicians observe in Long COVID patients.

The most consequential finding, however, was structural: Long COVID is not one disease. When patients were grouped by the expression patterns of these 32 genes, three distinct subtypes emerged. The first, and largest, cluster was defined by respiratory and sleep disturbances, with elevated expression of inflammatory and stress-response genes. The second was marked by psychological symptoms — anxiety, depression — and dental problems, with a gene signature tied to cell cycle regulation and mood modulation. The third, smallest cluster presented primarily gastrointestinal and metabolic disturbances, with mitochondrial gene activity suggesting disrupted energy production. Crucially, these groupings were not explained by age, sex, or pre-existing conditions — they appeared to reflect genuine molecular differences producing different clinical realities.

The implications are direct: a therapy suited to one subtype may be useless or harmful for another, making personalized medicine not an aspiration but a clinical necessity. To extend the work's reach, the team released a free online tool allowing researchers and clinicians to explore the data and identify new therapeutic targets. The authors are candid about limitations — the genetic data skews toward European ancestry, and the subtypes require validation in independent cohorts before entering clinical practice. But the framework they have constructed, integrating causal inference with network analysis across multiple biological data types, represents a meaningful advance toward understanding not just that Long COVID happens, but why — and what, precisely, might be done about it.

Millions of people have experienced Long COVID—the constellation of symptoms that persists or emerges months after a SARS-CoV-2 infection ends. Fatigue, breathing problems, brain fog, heart palpitations, digestive distress. The condition is real and widespread, yet medicine has struggled to explain why it happens or how to treat it. Doctors can identify patients with Long COVID, but the genetic architecture underlying the disease has remained largely opaque, leaving researchers without a clear map of which genes drive the condition and where to intervene.

A team of researchers has now filled that gap. Using a computational framework that combines two complementary analytical approaches—one that identifies genes with direct causal effects on disease, and another that maps critical control points within biological networks—they have identified 32 genes likely to cause Long COVID. Nineteen of these genes had already been linked to COVID-19 or Long COVID in prior studies, validating the approach. The remaining thirteen represent entirely novel discoveries, genes never before connected to the condition. The work appears in PLOS Computational Biology and represents, the authors argue, the first comprehensive genetic framework for understanding Long COVID.

The researchers drew on multiple layers of biological data: genome-wide association studies that track genetic variants across populations, expression quantitative trait loci that show how genetic variants influence gene activity, RNA sequencing data from patient tissues, and maps of how proteins interact within cells. They applied Transcriptome-Wide Mendelian Randomization to establish causal relationships between gene expression and disease risk, then layered on control theory—a mathematical framework borrowed from engineering—to identify which genes act as regulatory hubs, the nodes whose disruption would ripple through the entire biological network. By varying a parameter that balanced these two approaches, they generated a spectrum of candidate genes, from those with the strongest direct causal evidence to those with the most critical network positions.
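
The balancing step described above can be pictured with a small sketch. The article does not give the study's actual formula, so everything here is an assumption made for illustration: the parameter name `lam`, the min-max normalization, the simple weighted sum, and the placeholder gene names and scores. With `lam = 1.0` the ranking follows causal evidence alone; with `lam = 0.0` it follows network position alone.

```python
from typing import Dict, List


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale scores to [0, 1] so the two evidence types are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {g: (s - lo) / span if span else 0.0 for g, s in scores.items()}


def rank_genes(causal: Dict[str, float],
               network: Dict[str, float],
               lam: float) -> List[str]:
    """Rank genes by lam * causal + (1 - lam) * network (hypothetical blend)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    c, n = normalize(causal), normalize(network)
    genes = sorted(set(c) & set(n))  # only genes scored by both methods
    blended = {g: lam * c[g] + (1 - lam) * n[g] for g in genes}
    # Highest blended score first; ties broken alphabetically.
    return sorted(genes, key=lambda g: (-blended[g], g))


# Placeholder genes and made-up scores, not values from the study.
causal_scores = {"gene_A": 0.9, "gene_B": 0.2, "gene_C": 0.5}
network_scores = {"gene_A": 0.1, "gene_B": 0.8, "gene_C": 0.6}

for lam in (0.0, 0.5, 1.0):
    print(lam, rank_genes(causal_scores, network_scores, lam))
```

Sweeping `lam` from 0 to 1 produces the kind of spectrum the article describes: a gene that is moderately strong on both measures can top the list at intermediate settings even though it leads neither extreme.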

Among the 32 genes identified, several stood out for their consistency across analyses. MORN4, CDC26, and EIF5A ranked highly across multiple parameter settings, suggesting they exert strong influence on disease susceptibility. Other genes—TP53, CREBBP, EP300, SMAD3, GRB2, YWHAG—emerged as network drivers, genes with extensive connections to downstream processes. The enrichment analysis revealed that these genes cluster in pathways related to immune regulation, viral response, cell cycle control, and metabolic adaptation. The transforming growth factor beta signaling pathway appeared prominently, a finding that aligns with what clinicians observe: Long COVID patients often show signs of persistent inflammation and tissue remodeling.

But perhaps the most striking finding was that Long COVID is not one disease. When the researchers clustered patients based on the expression patterns of their 32 causal genes, three distinct subtypes emerged, each with a coherent symptom profile and underlying biology.

Cluster 1, comprising 65 patients, was dominated by respiratory and sleep problems. These patients showed elevated expression of genes involved in inflammatory response and stress adaptation: CREBBP, GRB2, MAPK1, SMAD2. Nearly half reported sleep disturbances; nearly a third reported increased mucus production.

Cluster 2, with 53 patients, was characterized by psychological symptoms and dental problems. Anxiety and depression affected 38 percent of this group; cavities and tooth problems affected 19 percent. The gene expression signature here involved CDC26, CDKN1A, ESR1, and YWHAG, genes tied to cell cycle regulation and mood modulation.

Cluster 3, the smallest group at 36 patients, presented primarily gastrointestinal and metabolic disturbances. Nearly 39 percent experienced nausea, diarrhea, or vomiting; 47 percent reported changes in appetite. The gene expression pattern showed downregulation of HDAC1, SRC, and TP53, alongside upregulation of NDUFA6, a mitochondrial gene critical for energy production.
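
For readers curious what "clustering patients by expression pattern" looks like mechanically, here is a minimal sketch. The article does not say which clustering algorithm the researchers used; k-means is swapped in here purely as a familiar stand-in. The six toy "patients", the two-gene expression vectors, and the deterministic farthest-point initialization are all assumptions for illustration, not the study's data or method.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points: List[Vector], k: int, iters: int = 50) -> List[int]:
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [-1] * len(points)
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels


# Toy expression values (e.g. z-scores for two genes); not study data.
patients = [
    [2.0, 0.1], [1.8, -0.1],     # first gene high
    [-0.1, 2.1], [0.2, 1.9],     # second gene high
    [-2.0, -1.9], [-1.8, -2.1],  # both genes low
]

labels = kmeans(patients, k=3)
print(labels)
```

In the study each patient would be represented by expression values for all 32 genes rather than two, but the idea is the same: patients whose expression profiles sit close together end up sharing a label, and symptoms are compared across labels only afterward.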

These subtypes were not artifacts of demographic variation. Age, sex, smoking status, and most pre-existing conditions were evenly distributed across the three clusters. The symptom differentiation appeared to reflect genuine molecular distinctions—different underlying biology producing different clinical manifestations. This finding has immediate practical implications. If Long COVID consists of three distinct disease subtypes, then a treatment effective for one subtype might be ineffective or even harmful for another. The respiratory-sleep subtype might benefit from therapies targeting inflammatory pathways; the psychological subtype might require interventions addressing stress-related signaling; the gastrointestinal subtype might need metabolic support. Personalized medicine becomes not just an aspiration but a necessity.
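
One common way to check that a trait such as sex is "evenly distributed" across groups is a chi-square test of independence. The article does not say which test the authors ran, so the sketch below is an assumption. The cluster sizes (65, 53, 36) come from the article, but the split within each cluster is invented for illustration. The p-value shortcut `exp(-x/2)` holds only for 2 degrees of freedom, which is what a 2-by-3 table has.

```python
import math
from typing import List


def chi_square_stat(table: List[List[int]]) -> float:
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df2(stat: float) -> float:
    """Chi-square survival function, valid for exactly 2 degrees of freedom."""
    return math.exp(-stat / 2)


# Rows: two sex categories. Columns: clusters 1, 2, 3.
# Column totals match the article (65, 53, 36); the splits are made up.
balanced = [[33, 26, 18],
            [32, 27, 18]]

stat = chi_square_stat(balanced)
p = p_value_df2(stat)
print(f"chi-square = {stat:.3f}, p = {p:.3f}")
```

A large p-value, as with these invented counts, means there is no evidence that cluster membership tracks the trait. A small one would suggest the clusters might partly reflect demographics rather than biology.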

To make their findings accessible, the researchers built a free online tool—a Shiny application—that allows other scientists and clinicians to explore the data, adjust parameters, and identify new therapeutic targets. The code is open-source and available on GitHub. This democratization of the analysis means that as new data emerges, as new eQTL datasets become available, as understanding of Long COVID deepens, researchers can refine and validate these findings without starting from scratch. The authors acknowledge limitations: the genetic data they used came primarily from people of European ancestry, a bias that could obscure population-specific susceptibility patterns. They note that the three subtypes should be viewed as testable hypotheses rather than fixed diagnostic categories. They emphasize that external validation in independent cohorts will be essential before these findings translate into clinical practice. Yet the framework they have built—combining causal inference with network analysis, integrating multiple data types, stratifying patients by biology rather than by arbitrary symptom lists—represents a significant step toward understanding why some people develop Long COVID and how to treat them.

Long COVID consists of three distinct subtypes, each with distinct symptoms and underlying biology.
— Study authors, PLOS Computational Biology
The identified genetic overlaps suggest that variants in these genes may influence individual susceptibility to persistent post-viral symptoms, as they do in other chronic conditions.
— Study authors, discussing shared genetic basis with autoimmune and metabolic disorders
The Hearth Conversation: Another angle on the story
Inventor

Why does Long COVID seem to affect different people so differently? Some people can't breathe; others can't think straight; still others can barely eat.

Model

That's the central mystery the study is trying to solve. The researchers found that Long COVID isn't one disease—it's at least three. The gene expression patterns that distinguish them are real and reproducible. When you cluster patients by how their causal genes are expressed, you get three groups, and each group has a distinct symptom signature.

Inventor

But genes are the same in everyone, right? So how can the same genes produce different diseases?

Model

Everyone carries essentially the same set of genes, but their expression (how active they are, how much protein they make) varies enormously between people and between tissues. The study identified 32 genes that appear to cause Long COVID, but they're not all active in the same way in every patient. In one group, inflammatory genes are highly expressed; in another, metabolic genes are dysregulated. Same genes, different expression patterns, different diseases.

Inventor

So if I'm in cluster one with respiratory problems, a drug that works for cluster two's psychological symptoms might not help me?

Model

Exactly. And it might even make things worse if the underlying biology is fundamentally different. That's why the researchers are calling these subtypes endotypes—they're defined by their molecular biology, not just their symptoms. It's the difference between treating a symptom and treating the disease.

Inventor

How confident are they that these three subtypes are real and not just statistical artifacts?

Model

They did several robustness checks. The clustering was stable across resampling. The symptom differences between clusters were statistically significant. Demographic factors like age and sex were evenly distributed, so the differences weren't driven by demographic confounding. But they're honest about limitations: this is one cohort, one time point. They need external validation in other populations before these subtypes become clinical reality.

Inventor

What about the 13 genes they found that nobody had linked to Long COVID before? How do they know those are real discoveries and not false positives?

Model

That's a fair question. They validated 19 of their 32 genes against existing literature, which is a good sign. For the 13 novel genes, they looked at the biological pathways those genes participate in—nucleotide metabolism, lipid synthesis, cell cycle regulation, iron-sulfur cluster assembly. The pathways make sense for a post-viral condition. But you're right to be skeptical. Those 13 genes need experimental validation, ideally in cell culture or animal models, before anyone should bet on them as drug targets.
