Uncovering genetic clues to long COVID: Insights from a global GWAS

Long COVID genetics

Although long COVID cases have surged in recent years, the genetic factors that influence who develops the condition remain poorly understood. Traditional genome-wide association studies have struggled to identify reproducible genetic signals in long COVID, in part because of limited cohort sizes and inconsistent phenotype definitions.

The Sano GOLD dataset offers a valuable resource for addressing these gaps. It combines genotype and phenotype data from 1,996 individuals who have experienced long COVID. In a new study published in Nature Genetics, investigators conducted a genome-wide association study (GWAS) and replication using data from 33 cohorts, including the Sano GOLD cohort, spanning 19 countries. In this blog post, we cover key insights from the study.

Key Takeaways

  • Genetic Risk Factor: A major genome-wide association was identified in the FOXP4 locus, linking specific variants to increased long COVID risk.
  • Global Scale: The study analyzed data from 15,950 long COVID cases and 1.8 million controls across 33 global cohorts.
  • Protective Factors: Vaccination was strongly associated with a decreased risk of developing long COVID.
  • Ancestry Variations: The risk variant frequency varies significantly by ancestry, ranging from 1.6% in non-Finnish Europeans to 36% in East Asians.
  • Lung Pathology: Higher expression of FOXP4 in lung cells (type 2 alveolar cells) suggests a biological link between lung function and long COVID.

The World Health Organization characterizes long COVID as symptoms that begin within 3 months of infection and persist for at least 2 months. Key figures on the condition and the study include:

  • Estimated Prevalence: 10% to 70% of infected individuals.
  • Study Size: 15,950 individuals with long COVID.
  • Control Group: Approximately 1.8 million individuals.
  • Collaborative Effort: Part of the COVID-19 Host Genetics Initiative (COVID-19 HGI).

The main finding of the study was a genome-wide significant association at the FOXP4 locus, where certain variants were associated with an increased risk of long COVID. In line with previous studies, variants in the FOXP4 region were also associated with the severity of COVID-19. Importantly, vaccination was associated with a decreased risk of long COVID. Identifying specific genetic loci associated with long COVID is a meaningful step toward biomarker-driven approaches that could help distinguish long COVID from other post-viral syndromes and inform more targeted prevention and treatment strategies.

The frequency of the risk variant differed widely across ancestries, from 1.6% in non-Finnish Europeans to 36% in East Asians. Because most individuals in the cohorts were of European ancestry, this variation underscores the importance of diverse representation in genomic studies. In a separate replication study that also drew on the Sano GOLD dataset, over 88% of genes identified in an initial analysis were confirmed in a US cohort with a different ancestry composition, suggesting that these genetic signals hold across populations.
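To give a feel for what that frequency gap means in practice, the short sketch below estimates the share of people expected to carry at least one copy of the risk variant in each group. It is an illustration only, not part of the study's analysis: it assumes the reported figures are allele frequencies and that genotypes follow Hardy-Weinberg proportions, neither of which is stated in the paper summary above. The function name carrier_fraction is hypothetical.

```python
def carrier_fraction(allele_freq: float) -> float:
    """Expected fraction of people carrying at least one copy of an allele.

    Assumption (not from the study): genotypes follow Hardy-Weinberg
    proportions, so non-carriers make up (1 - p)^2 of the population.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    return 1.0 - (1.0 - allele_freq) ** 2


# Frequencies as quoted in this post; treated here as allele frequencies.
REPORTED_FREQUENCIES = {
    "non-Finnish European": 0.016,
    "East Asian": 0.36,
}

for ancestry, freq in REPORTED_FREQUENCIES.items():
    carriers = carrier_fraction(freq)
    print(f"{ancestry}: allele frequency {freq:.1%}, "
          f"expected carriers {carriers:.1%}")
```

Under these assumptions, roughly 3% of non-Finnish Europeans and about 59% of East Asians would be expected to carry at least one copy, which shows why a cohort drawn mostly from one ancestry can under-sample carriers.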

Analysis of blood samples showed that FOXP4 levels were higher in non-acute COVID cases, and that higher levels were associated with an increased risk of long COVID in non-acute samples but not in acute samples. Under normal conditions, FOXP4 expression was high in type 2 alveolar cells and granulocytes (a type of immune cell) in the lung, indicating that its presence in the lungs was not an effect of infection. This lung-specific expression pattern is consistent with a growing body of research exploring how genetic factors drive persistent pulmonary vascular pathology in long COVID.

Most FOXP4 variants associated with long COVID lie within active enhancers or transcription factor binding sites. This means the variants likely affect how much FOXP4 is expressed rather than altering the structure of the protein itself. Furthermore, one of the risk alleles for long COVID was also associated with lung cancer in Biobank Japan samples.

By combining data across 33 cohorts worldwide, including the Sano GOLD dataset, researchers identified genetic variants associated with long COVID and their role in lung pathology. These findings are now informing the development of precision medicine approaches for long COVID, from biomarker-driven diagnostics to targeted therapeutic strategies. The Sano GOLD dataset has now contributed to multiple published studies on long COVID genetics, demonstrating how well-curated, longitudinal patient datasets can generate compounding scientific value across research programs.
