Sano blog

Analyzing genomic data with AI

Written by Sano Marketing Team | Jul 31, 2024 1:57:22 PM

Progress in genome sequencing has catalyzed a significant transformation in digital biology. Genomics programs across the world are gaining momentum as the cost of high-throughput, next-generation sequencing has dropped dramatically over the past decade. Whole genome sequencing is now becoming a fundamental step in clinical workflows and drug discovery, especially for critical-care patients with rare diseases and in population-scale genetics research. However, traditional methods for analyzing genomic data are struggling to cope with the resulting explosion of bioinformatics data.

In contrast, AI-driven approaches have been used to accelerate genomic analysis, helping researchers extract as much as they can from DNA while reducing the scope for human error. For example, tools such as the Broad Institute's GATK (Genome Analysis Toolkit), which includes machine learning-based steps for scoring and filtering variant calls, are making the identification of variants simpler than ever. These tools find differences between a patient's sample and a reference genome, which is a critical step in determining the causes of genetic diseases and can lead to the identification of new drug targets.
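
To make the idea of "finding differences between a sample and a reference" concrete, here is a minimal Python sketch. It is a toy illustration only, not how GATK works: it assumes the two sequences are already aligned and of equal length, and it only reports single-base substitutions. The function name `find_snvs` and the example sequences are hypothetical.

```python
from typing import List, Tuple


def find_snvs(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Toy assumption: both sequences are pre-aligned and have the same length,
    so insertions and deletions are out of scope.
    """
    if len(reference) != len(sample):
        raise ValueError("this toy comparison assumes sequences of equal length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


# Hypothetical example sequences, invented for illustration
reference = "ACGTTAGCCA"
sample = "ACGTCAGCTA"

variants = find_snvs(reference, sample)
for pos, ref_base, alt_base in variants:
    print(f"position {pos}: {ref_base} -> {alt_base}")
```

Real variant callers work from millions of short, error-prone reads rather than one clean sequence, which is why statistical and machine learning models are needed to separate true variants from sequencing noise.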

AI, and more specifically deep learning, has become increasingly important in genomic analysis, including gene expression analysis. This is particularly evident in tools that use neural network architectures to improve the analysis of sequence data. Here's how these technologies contribute to the field of genomics:

  • Detecting and characterizing genetic variations: In the context of genomic analysis, convolutional neural networks (CNNs) are used to analyze patterns in sequence data. These networks are adept at processing data with a grid-like topology, such as images in computer vision. In genomics, sequence data can be represented in the same way, for example by encoding each nucleotide as a row in a matrix, so CNNs can identify and learn patterns within nucleotide sequences. This capability enhances the detection and characterization of genetic variations, making CNNs particularly useful in variant identification and in understanding complex genetic relationships.
  • Identifying dependencies and patterns in a DNA sequence: Recurrent neural networks (RNNs) are a class of neural networks particularly suited to sequential data, which is a core aspect of genomic sequences. They process a sequence one element at a time, such as the nucleotides in a DNA sequence, while carrying forward a memory of what came before. This allows them to identify dependencies and patterns across the sequence, which is crucial in understanding gene expression and regulation. RNNs were designed to model behavior that unfolds over time; in genomics, position along the sequence plays the role of time, making them useful in predicting how changes in the DNA sequence might impact gene expression.
  • Enhancing accuracy and speed of sequencing and analysis: Established aligners such as STAR (Spliced Transcripts Alignment to a Reference) and BWA-MEM (Burrows-Wheeler Aligner's Maximal Exact Matches) map sequencing reads to a reference genome, providing the input for variant detection. These aligners rely on classical indexing algorithms rather than AI. Deep learning-based tools, such as Google's DeepVariant, leverage CNN architectures to improve the accuracy of variant calling, while NVIDIA's Parabricks provides GPU-accelerated versions of alignment and variant calling tools to significantly speed up these processes. This is crucial in large-scale genomic studies where the amount of data can be overwhelming.
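
The grid-like view of sequence data described in the CNN point above can be sketched in a few lines of Python. This is a simplified illustration, not the architecture of any particular tool: it one-hot encodes a DNA string into a matrix and slides a single hand-written filter along it, which is the core operation of a one-dimensional convolutional layer. In a real CNN the filter weights are learned from data; here the filter, the helper names `one_hot` and `conv1d`, and the example sequence are all assumptions made for illustration.

```python
from typing import List

NUCLEOTIDES = "ACGT"


def one_hot(sequence: str) -> List[List[float]]:
    """Encode a DNA string as a len(sequence) x 4 matrix (columns: A, C, G, T).

    Any other character (for example N) becomes a row of zeros.
    """
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in sequence]


def conv1d(matrix: List[List[float]], kernel: List[List[float]]) -> List[float]:
    """Slide the kernel along the sequence axis and return one score per window."""
    width = len(kernel)
    scores = []
    for start in range(len(matrix) - width + 1):
        score = sum(
            matrix[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(len(NUCLEOTIDES))
        )
        scores.append(score)
    return scores


# Hand-written filter that responds most strongly to the motif "TATA".
# A trained CNN would learn many such filters instead of being given them.
motif_filter = one_hot("TATA")

sequence = "GCTATAAGC"  # hypothetical example sequence
scores = conv1d(one_hot(sequence), motif_filter)
best = max(range(len(scores)), key=scores.__getitem__)

print(scores)
print(f"strongest match starts at index {best}: {sequence[best:best + 4]}")
```

Each score counts how many positions in a window match the filter, so the window containing the full motif scores highest. Stacking many learned filters and layers is what lets a CNN pick up more subtle sequence patterns.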

Sequencing an individual's whole genome generates large amounts of raw data, often exceeding 100 gigabytes. With the cost of sequencing decreasing, the volume of data available is growing rapidly. While traditional methods of genomic analysis struggle to keep pace with this data explosion, AI-driven approaches have emerged as powerful solutions. For example, by quickly interpreting the image and signal data produced by sequencing instruments, they help base calling run as fast and as accurately as possible, supporting the promise of understanding genetic diseases and developing novel therapeutics.
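
A rough back-of-the-envelope calculation shows where a figure of this size comes from. The numbers below are assumptions chosen for illustration, not measurements: a human genome of about 3.1 billion bases, a sequencing depth of 30x, and roughly two bytes of uncompressed storage per sequenced base (one for the base call, one for its quality score). Read names and other metadata are ignored.

```python
GENOME_SIZE_BASES = 3.1e9  # assumption: approximate length of the human genome
COVERAGE = 30              # assumption: each position is sequenced about 30 times
BYTES_PER_BASE = 2         # assumption: one byte for the base, one for its quality score


def raw_data_gigabytes(genome_size: float, coverage: float, bytes_per_base: float) -> float:
    """Estimate uncompressed raw read data in gigabytes (1 GB = 1e9 bytes)."""
    return genome_size * coverage * bytes_per_base / 1e9


estimate = raw_data_gigabytes(GENOME_SIZE_BASES, COVERAGE, BYTES_PER_BASE)
print(f"Estimated raw data per genome: about {estimate:.0f} GB")
```

Under these assumptions the estimate comes to roughly 186 GB before compression. Compressed files are smaller, but the result is still on the order of 100 GB per genome.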

To learn more, download our whitepaper, “Data-driven healthcare: How artificial intelligence and machine learning are transforming genomics.”