Podcast recap: Ryan Dhindsa and Caleb Lareau on using biobank genomics to measure EBV persistence

The Genetics Podcast featuring Ryan Dhindsa and Caleb Lareau

In a recent episode of The Genetics Podcast, Patrick Short spoke with Dr. Ryan Dhindsa, Assistant Professor at Baylor College of Medicine and Investigator at the Jan and Dan Duncan Neurological Research Institute at Texas Children’s Hospital, and Dr. Caleb Lareau, Assistant Professor and Investigator at Memorial Sloan Kettering Cancer Center. They discuss their recent study, which used UK Biobank whole-genome sequencing to quantify Epstein-Barr virus (EBV) persistence in blood and linked that signal to autoimmune disease risk and host genetics.

Turning “unmapped reads” into a persistence trait

Ryan and Caleb describe how their team reused short-read whole-genome sequencing data to create a new molecular phenotype. Instead of focusing only on reads that map to the human genome, they pulled out the unmapped reads and aligned them to the EBV genome. This produced a quantitative snapshot of EBV DNA in each participant's blood at the time they enrolled in UK Biobank.

A key step was validation. UK Biobank has EBV serology for a subset of participants, which allowed the team to check whether the computational signal tracked true infection status. Early versions of the approach were noisy because repetitive regions in the EBV genome can attract spurious alignments. Once those regions were filtered out, the readout agreed with serology and became reliable enough for downstream analyses.
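To make the quantification and validation steps concrete, here is a toy Python sketch. It assumes reads have already been pulled from the unmapped fraction and aligned to an EBV reference. One way to do the extraction is `samtools view -f 4`, which keeps reads flagged as unmapped. This is not the authors' pipeline. The masked intervals, the reads-per-million normalization, the detection threshold, and the sample data are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical repeat regions on the EBV reference (0-based, half-open).
# Placeholder coordinates for illustration only, NOT the regions filtered in the study.
MASKED_INTERVALS: List[Tuple[int, int]] = [(12_000, 15_000), (35_000, 38_000)]


@dataclass
class Sample:
    sample_id: str
    total_reads: int              # all sequenced reads for this participant
    ebv_read_starts: List[int]    # start positions of reads aligned to EBV
    seropositive: Optional[bool]  # None if no serology was measured


def is_masked(pos: int, masked: List[Tuple[int, int]]) -> bool:
    """True if a read start falls inside any masked interval.

    Simplification: a real filter would also handle reads that only overlap a repeat.
    """
    return any(start <= pos < end for start, end in masked)


def ebv_per_million(sample: Sample, masked: List[Tuple[int, int]] = MASKED_INTERVALS) -> float:
    """EBV reads outside masked regions, per million total reads."""
    kept = sum(1 for pos in sample.ebv_read_starts if not is_masked(pos, masked))
    return kept / sample.total_reads * 1_000_000


def detection_rate_by_serostatus(samples: List[Sample], threshold: float = 0.0) -> dict:
    """Fraction of samples with EBV signal above threshold, split by serostatus.

    Samples without serology are skipped.
    """
    rates = {}
    for status in (True, False):
        group = [s for s in samples if s.seropositive is status]
        if group:
            detected = sum(ebv_per_million(s) > threshold for s in group)
            rates[status] = detected / len(group)
    return rates


# Toy data: sample B's only EBV read sits in a masked repeat, so it is
# treated as a spurious alignment rather than as evidence of infection.
samples = [
    Sample("A", 800_000_000, [1_000, 13_500, 50_000], True),
    Sample("B", 750_000_000, [36_000], False),
    Sample("C", 900_000_000, [], True),
    Sample("D", 820_000_000, [2_500], None),
]

for s in samples:
    print(s.sample_id, round(ebv_per_million(s), 6))
print(detection_rate_by_serostatus(samples))
```

If you call `ebv_per_million(samples[1], masked=[])` without the mask, seronegative sample B gets a nonzero score. That kind of discordance with serology is what a validation step like this is meant to catch. The sketch does not expect every seropositive person to carry detectable EBV DNA, so it reports detection rates per serostatus group rather than a single agreement score.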

What the phenotype points to in autoimmunity

With a persistence trait in hand, Ryan and Caleb scanned UK Biobank clinical phenotypes and found particularly strong links to autoimmune conditions such as lupus and rheumatoid arthritis. They also discuss an important null result. Despite strong evidence that EBV infection is a prerequisite for multiple sclerosis, the persistence phenotype showed no meaningful association with multiple sclerosis comparable to those seen for lupus and rheumatoid arthritis.

Their interpretation is mechanistic. Some EBV-related outcomes may be driven by chronic antigen exposure, in which persistently infected cells keep stimulating the immune system over time. Others may be driven by an earlier trigger, in which infection sets off an immune trajectory that can continue even without a high measurable viral burden later. The distinction matters because it suggests that the right biomarker depends on the disease model being tested.

Host genetics shapes viral control

Ryan highlights that EBV persistence has a measurable genetic architecture. Their genome-wide association analysis identified multiple loci. The strongest signal fell in the major histocompatibility complex (MHC) region, which contains the human leukocyte antigen (HLA) genes. This is consistent with antigen presentation playing a central role in viral control.

Caleb adds a practical advantage of this setting. Many signals from immune-related genome-wide association studies (GWAS) are hard to interpret because the relevant antigen is unknown. Here, the EBV proteome is known, so researchers can model predicted interactions between HLA alleles and viral peptides to connect association signals to a plausible immune mechanism.
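A hedged sketch of how such an analysis might start follows. It tiles a viral protein into short peptides of the lengths typically presented by HLA class I molecules, roughly 8 to 11 amino acids. It then counts how many peptides an external binding predictor scores as likely binders for each allele. The binding predictor itself is not implemented here. Several pieces are hypothetical stand-ins, not the study's method: the assumed output format (a percentile rank per allele-peptide pair, where lower means stronger predicted binding), the cutoff, the protein sequence, and the scores.

```python
from typing import Dict, Iterator, Tuple


def tile_peptides(protein_seq: str, lengths=(8, 9, 10, 11)) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start, peptide) for every window of each requested length."""
    for k in lengths:
        for start in range(len(protein_seq) - k + 1):
            yield start, protein_seq[start:start + k]


def binders_per_allele(predictions: Dict[Tuple[str, str], float],
                       rank_cutoff: float = 2.0) -> Dict[str, int]:
    """Count peptides per allele with predicted percentile rank <= rank_cutoff.

    `predictions` maps (allele, peptide) -> percentile rank from some external
    HLA binding predictor (hypothetical input; lower rank = stronger binding).
    Alleles with no predicted binders are reported with a count of 0.
    """
    counts = {allele: 0 for allele, _ in predictions}
    for (allele, _peptide), rank in predictions.items():
        if rank <= rank_cutoff:
            counts[allele] += 1
    return counts


# Made-up protein sequence (the 20 standard amino acids), not a real EBV protein.
toy_protein = "ACDEFGHIKLMNPQRSTVWY"
nine_mers = list(tile_peptides(toy_protein, lengths=(9,)))
print(len(nine_mers), nine_mers[0])

# Toy percentile ranks standing in for a predictor's output; allele names are illustrative.
toy_predictions = {
    ("HLA-A*02:01", "ACDEFGHIK"): 0.4,
    ("HLA-A*02:01", "CDEFGHIKL"): 5.0,
    ("HLA-B*07:02", "ACDEFGHIK"): 12.0,
}
print(binders_per_allele(toy_predictions))
```

Per-allele counts like these could then be set alongside association results at HLA alleles. This is one possible way to ask whether a genetic signal is consistent with differences in how viral peptides are presented.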

They also note several non-genetic factors clearly associated with higher EBV levels, including age, smoking status, and socioeconomic measures.

A template that can extend beyond EBV

Ryan notes that the same approach can be applied to other DNA viruses that have blood tropism and establish latency, and he mentions extensions to other herpesviruses. Both guests also point to features that would make future population resources more powerful: access to raw sequencing data, orthogonal assays such as serology, and longitudinal sampling that would let researchers measure persistence as a dynamic exposure rather than at a single time point.

Biobank design considerations highlighted by this work

Ryan and Caleb emphasize that discoveries like this depend heavily on how population genomics resources are structured.

Access to raw sequencing data is foundational. Viral signals often reside in reads that do not map to the human reference genome, and standard pipelines typically discard these reads. When only processed variant calls are available, the opportunity to define new molecular phenotypes from unmapped or poorly characterized sequences is lost. Making raw data formats accessible, within appropriate governance frameworks, substantially expands the range of questions that can be asked over time.

The presence of orthogonal measurements can determine whether a derived phenotype is credible. In this study, EBV serology in a subset of UK Biobank participants provided an external validation layer that was essential for quality control. Even partial serology panels can anchor computational traits to biological ground truth.

The lack of longitudinal sampling is another limitation they discuss. A single time point provides only a snapshot of viral burden. Repeated measurements would allow estimation of cumulative exposure or persistence dynamics, which may better reflect immune consequences over time.

The biological scope of a biobank is also shaped by the analytes and tissues it collects. Whole-genome sequencing of blood enables the study of latent DNA viruses with blood tropism. Expanding to RNA-based assays or additional tissues would broaden the range of infectious agents and host responses that can be studied at scale.

Finally, ancestral diversity remains central. Most large-scale datasets are still disproportionately of European ancestry, while EBV-associated outcomes vary across populations. Broader representation is needed to disentangle host genetics, viral strain variation, and environmental factors across global cohorts.
