Key Takeaways
- Genetic Risk Factor: A genome-wide significant association was identified at the FOXP4 locus, linking specific variants to increased long COVID risk.
- Global Scale: The study analyzed data from 15,950 long COVID cases and 1.8 million controls across 33 global cohorts.
- Protective Factors: Vaccination was associated with a decreased risk of developing long COVID.
- Ancestry Variations: The risk variant frequency varies widely by ancestry, ranging from 1.6% in non-Finnish Europeans to 36% in East Asians.
- Lung Pathology: Higher expression of FOXP4 in lung cells (type 2 alveolar cells) suggests a biological link between lung function and long COVID.
According to the World Health Organization, long COVID is characterized by symptoms that begin within 3 months of infection and persist for at least 2 months. Key figures include:
- Estimated Prevalence: 10% to 70% of infected individuals.
- Study Size: 15,950 individuals with long COVID.
- Control Group: Approximately 1.8 million individuals.
- Collaborative Effort: Part of the COVID-19 Host Genetics Initiative (COVID-19 HGI).
The main finding of the study was a genome-wide significant association at the FOXP4 locus, where certain variants were associated with increased risk of long COVID. In line with previous studies, variants in the FOXP4 region were also associated with COVID-19 severity. Importantly, vaccination was associated with decreased risk of long COVID. Identifying specific genetic loci associated with long COVID is a meaningful step toward biomarker-driven approaches that could help distinguish long COVID from other post-viral syndromes and inform more targeted prevention and treatment strategies.
Interestingly, the frequency of the risk variant differed widely across ancestries, from 1.6% in non-Finnish Europeans to 36% in East Asians. Because most individuals in the cohorts were of European ancestry, this variation underscores the importance of diverse representation in genomic studies. In a separate replication study that also drew on the Sano GOLD dataset, over 88% of genes identified in an initial analysis were confirmed in a US cohort with a different ancestry composition, suggesting that these genetic signals hold across populations.
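To give a feel for what those two frequencies mean at the population level, the short sketch below converts an allele frequency into the share of people who carry at least one copy of the variant. It rests on two assumptions that the study summary does not state: that the quoted percentages are allele frequencies, and that genotypes follow Hardy-Weinberg proportions. The function name `carrier_frequency` is purely illustrative.

```python
def carrier_frequency(allele_freq: float) -> float:
    """Share of people with at least one copy of the allele.

    Assumes Hardy-Weinberg proportions: the chance of carrying zero
    copies is (1 - p)^2, so carriers make up 1 - (1 - p)^2.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    return 1.0 - (1.0 - allele_freq) ** 2


# Frequencies quoted in the text, treated here as allele frequencies (assumption).
reported = {
    "non-Finnish European": 0.016,
    "East Asian": 0.36,
}

for ancestry, p in reported.items():
    print(f"{ancestry}: allele frequency {p:.1%}, "
          f"carriers about {carrier_frequency(p):.1%}")
```

Under these assumptions, roughly 3% of non-Finnish Europeans and roughly 59% of East Asians would carry at least one copy, which illustrates why a cohort dominated by one ancestry can give a very different picture of how common a risk variant is.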
Blood sample analysis showed that FOXP4 levels were higher in non-acute COVID cases. Higher FOXP4 expression was associated with increased risk of long COVID in non-acute samples but not in acute samples. Under normal conditions, FOXP4 is highly expressed in type 2 alveolar cells and granulocytes (a type of immune cell) in the lung, indicating that its presence in the lungs is not an effect of infection. This lung-specific expression pattern is consistent with a growing body of research exploring how genetic factors drive persistent pulmonary vascular pathology in long COVID.
Most FOXP4 variants associated with long COVID lie within active enhancers or transcription factor binding sites. This means the variants likely affect how much FOXP4 is expressed rather than altering the structure of the protein itself. Furthermore, one of the long COVID risk alleles was also associated with lung cancer in Biobank Japan samples.
By combining data across 33 cohorts worldwide, including the Sano GOLD dataset, researchers identified genetic variants associated with long COVID and linked them to lung pathology. These findings are now informing the development of precision medicine approaches for long COVID, from biomarker-driven diagnostics to targeted therapeutic strategies. The Sano GOLD dataset has now contributed to multiple published studies on long COVID genetics, demonstrating how well-curated, longitudinal patient datasets can generate compounding scientific value across research programs.