What are the major sources of genetic data for drug discovery?

Genetics plays a principal role in health and disease, and is becoming increasingly important for drug development. With genetic data, the focus is shifting from traditional trial-and-error approaches towards precision medicine. This data helps us to understand the genetic underpinnings of diseases, enabling the identification of potential drug targets and more personalised treatment strategies. In this blog post, we explore the major sources of genetic data that drive advancements in drug discovery, ranging from genomic databases to cutting-edge sequencing technologies.

Genomic databases and the Human Genome Project

The Human Genome Project marked a turning point in genetic research as the first project to sequence almost the entire human genome. Led by an international group of researchers between 1990 and 2003, the project set out to comprehensively study all of the DNA of a select set of organisms, including humans. This work provided fundamental information about the human blueprint, allowing scientists to understand how mutations in genes affect people and learn more about inherited diseases. The project was also one of the main steps towards developing genetically targeted medicines and more personalised treatments.

Key genomic databases such as GenBank, dbSNP, and Ensembl have since played a pivotal role in cataloguing genetic information. GenBank stores genetic sequences from a diverse range of organisms, providing a vast resource for researchers to explore. Meanwhile, dbSNP focuses on single nucleotide polymorphisms, highlighting genetic variations associated with diseases. Ensembl provides a centralised resource for researchers studying the genomes of humans, other vertebrates, and model organisms. Other genomic databases contribute to our broader understanding of the genetic and biomarker underpinnings of disease.

Next-generation sequencing (NGS) 

Next-generation sequencing (NGS) and other genetic analysis technologies help researchers better understand the genetic variants associated with various diseases. NGS offers rapid sequencing of entire genomes and allows researchers to precisely pinpoint disease-associated mutations. Currently, NGS plays a critical role in identifying potential drug targets and informing treatment strategies tailored to an individual's genetic makeup. It's also used for multi-analyte tumour analysis and to discover new ways to monitor cancer treatment and recurrence.

Biobanks, patient registries, and population studies

Biobanks serve as repositories for genetic samples sourced from diverse populations. These collections facilitate large-scale genetic analyses that shed light on genetic diversity, disease prevalence, and treatment responses. Population studies also help to identify genetic variations associated with specific diseases across different ethnic groups.

Genetic patient registries, by contrast, are databases designed to house comprehensive genetic and clinical information about patients with specific medical conditions. They act as purpose-driven collections of data, organised to serve predetermined scientific, clinical, or policy objectives. A registry can capture patients' clinical statuses, medical histories, laboratory results, and more. Examples include registries focused on rare diseases, cancer subtypes, or primary immune deficiencies. They're used to:

  • Assist healthcare practitioners in formulating optimal treatment strategies for individual patients or specific groups
  • Provide data to develop therapeutics or to learn about population behaviour patterns and their association with disease development
  • Help develop research hypotheses
  • Support quality healthcare and personalised treatments

Pharmacogenomics and precision medicine

Genetic differences between people can influence how individuals metabolise and respond to medications. Pharmacogenomics explores the link between these genetic variations and drug responses. This field of study has paved the way for precision medicine, where treatments are tailored to an individual's genetic profile. By understanding how genetic variations impact drug efficacy and safety, pharmacogenomics can increase the success rates of drug therapies and minimise the risk of adverse reactions. It's estimated that only 50% of patients respond positively to their medications, and adverse drug reactions can be very serious, so identifying genetic factors that may predispose a patient to a negative reaction could substantially improve outcomes for people living with a condition.

Some pharmacogenomic tests are already available, including HercepTest, which received approval from the FDA's Center for Devices and Radiological Health in 2001 to detect HER2 protein overexpression in breast cancer tissue. More advanced multigene solutions for breast cancer diagnosis are now emerging, such as the FDA-cleared 70-gene MammaPrint test. These tests guide long-term management decisions and help create tailored treatment plans. However, cancer treatment isn't the only place where pharmacogenomic testing is proving beneficial. In fact, the FDA now includes pharmacogenomic information on the labels of around 200 medications. This information can help doctors tailor drug prescriptions for individual patients by providing guidance on dose, possible side effects, or differences in effectiveness for people with certain gene variants.

Examples of common medicines that have pharmacogenetic tests include:

  • Abacavir: an HIV treatment
  • Carbamazepine: an epilepsy treatment
  • Tamoxifen: a breast cancer treatment
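To make the idea of label-based guidance concrete, here is a minimal Python sketch of how a prescribing system might flag a drug against a patient's genetic results. The gene-drug pairings (HLA-B*57:01 with abacavir, HLA-B*15:02 with carbamazepine, CYP2D6 with tamoxifen) are commonly cited ones, but the table, the `needs_review` function, and the patient record are all hypothetical and purely illustrative.

```python
# Illustrative table only: drug -> (gene, finding that usually prompts a review).
PGX_TABLE = {
    "abacavir": ("HLA-B", "*57:01 carrier"),
    "carbamazepine": ("HLA-B", "*15:02 carrier"),
    "tamoxifen": ("CYP2D6", "poor metaboliser"),
}


def needs_review(drug, patient_results):
    """Return True if the patient's results include the flagged finding for this drug."""
    entry = PGX_TABLE.get(drug.lower())
    if entry is None:
        return False  # drug not covered by this toy table
    gene, flagged_finding = entry
    return flagged_finding in patient_results.get(gene, set())


# Hypothetical patient record (made-up values): gene -> set of reported findings.
patient = {"HLA-B": {"*57:01 carrier"}, "CYP2D6": {"normal metaboliser"}}

for drug in ("Abacavir", "Carbamazepine", "Tamoxifen"):
    status = "review before prescribing" if needs_review(drug, patient) else "no flag in this record"
    print(f"{drug}: {status}")
```

A real system would draw on curated guidelines and far richer genotype data; the sketch only shows the basic lookup pattern.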

Genome-wide association studies (GWAS)

Genome-wide association studies (GWAS) scan the genomes of large populations to identify genetic markers associated with diseases. For example, some GWAS have identified single nucleotide polymorphisms (SNPs) associated with several complex conditions including diabetes, heart disease, Parkinson's disease, and Crohn's disease. These insights provide critical starting points for drug discovery efforts.
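As a rough illustration of what "scanning for associated markers" means, the Python sketch below runs a simple allelic chi-square test for each SNP, comparing allele counts in cases and controls. This is one basic form of association test, not the method of any particular study. The SNP names and counts are made up, and the 5e-8 cut-off is the conventional genome-wide significance threshold.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts (case_alt, case_ref, control_alt, control_ref); not real data.
snps = {
    "snp_A": (320, 680, 200, 800),
    "snp_B": (250, 750, 248, 752),
}
GENOME_WIDE_THRESHOLD = 5e-8  # conventional genome-wide significance level

results = {name: allelic_chi_square(*counts) for name, counts in snps.items()}
for name, (chi2, p) in results.items():
    flag = "significant" if p < GENOME_WIDE_THRESHOLD else "not significant"
    print(f"{name}: chi2={chi2:.2f}, p={p:.2e} ({flag})")
```

Real GWAS pipelines test millions of variants and adjust for factors such as ancestry and relatedness, which this sketch leaves out.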

Several GWAS resources are available online. One of the largest is the GWAS Catalog, a structured repository of summary statistics for a wide variety of traits. Other useful resources include LD Hub, GWAS summary statistics from the UK Biobank, and dbGaP, which gives authorised users access to individual-level genomic data.

Functional genomics and CRISPR-Cas9

Functional genomics aims to understand how genes operate and how they contribute to biological processes. Its goal is to work out the relationship between an organism's genome and its phenotype. Several technologies are available to study functional genomics, and one of the most powerful is a gene-editing technology called CRISPR-Cas9, named after Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and the CRISPR-associated protein 9 (Cas9). This tool gives researchers the ability to change an organism's DNA by adding, removing, or altering genetic material at particular locations in the genome. With the ability to “edit” genes, it is now possible to create libraries of CRISPR reagents covering the activation or deletion of every gene in the genome to help find the specific genes involved with conditions.
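To give a feel for how "particular locations" are chosen, the Python sketch below scans a DNA sequence for candidate Cas9 target sites. It assumes the commonly used Streptococcus pyogenes Cas9, which needs an "NGG" motif (the PAM) immediately after a roughly 20-base target. The function name and the example sequence are hypothetical, and only the forward strand is checked.

```python
import re


def find_guide_candidates(sequence, guide_length=20):
    """Return (start, guide, pam) for each guide_length-mer followed by an NGG PAM."""
    sequence = sequence.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_length}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(sequence)]


# Made-up sequence for illustration only (forward strand).
example = "GACCTTGACGTACCGATTAGCAGGTCATGGAACTGCATCGATCGTAGG"

for start, guide, pam in find_guide_candidates(example):
    print(f"position {start}: guide {guide} PAM {pam}")
```

Building a genome-wide library involves much more, such as scanning the reverse strand and screening for off-target matches, but the core search looks something like this.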

Single-cell sequencing

Single-cell sequencing allows researchers to analyse individual cells and understand their unique gene expression profiles. It can also help them understand diseases better and identify suitable preclinical models for specific disease subtypes. Combining single-cell sequencing with CRISPR (scCRISPR screening) speeds up the process of confirming targets and provides more information on how the targets work. Similarly, single-cell sequencing allows researchers to learn more about how compounds affect specific cell types and uncover any unintended effects. In clinical development, single-cell sequencing helps:

  • Find biomarkers to group patients
  • Understand how drugs work
  • Track how diseases and drug responses change


Genetic data is improving success rates, reducing costs, and shortening timelines in drug development, and collaborative use of genomic databases, sequencing technologies, biobanks, and functional genomics has made this possible. By using data from these sources throughout drug trials and discovery, researchers are uncovering new drug targets, tailoring treatments to individual genetic profiles, and working towards a future of more personalised medicine. To learn more about the role of genomics in drug discovery, download our whitepaper, “Unravelling the complexities of genomics-driven drug discovery” below.

