Combining machine learning with genetic and medical data in UC


Sano Genetics and BenevolentAI have successfully completed the first phase of their research collaboration. 

Sano and BenevolentAI have worked closely over the past year to generate a linked genetic and medical record dataset for patients with ulcerative colitis (UC). The project, partly funded by a grant from the UK government’s innovation agency, uses a study design that doesn’t require patients to travel to specific clinical sites. This places patients at the heart of the process, enables ‘real-world’ data to be collected, and creates more interactions between patients and researchers.

Read on to learn more about UC, the project goals, the participants’ perspective, and the outcomes and next steps.



UC is one of the two most common types of inflammatory bowel disease (IBD), the other being Crohn’s disease. UC affects the large intestine, causing ulcers to develop along the colon’s lining and inflammation of the colon and rectum, with patients suffering from stomach pain and other debilitating symptoms.

IBD has a large genetic component, with over 200 genetic risk factors identified to date.[1] Some of the proteins encoded by these genes are involved in forming the epithelial lining of the colon, but most are part of the immune system, and much is still unknown about the pathogenesis of the disease.



In 2017, 6.8 million cases of IBD were recorded globally,[2] and the number of cases continues to grow.

Unfortunately, many who suffer with UC find available treatments ineffective. The reasons why some people experience mild symptoms while others progress rapidly are also poorly understood. 

This is why building a unique database incorporating genetic, clinical, and patient-reported data should prove vital in discovering novel drug targets and drivers of disease progression, and in identifying biomarkers that inform a patient’s response to treatment.





This research collaboration brings together BenevolentAI’s expertise in machine learning and IBD with Sano’s expertise in collecting and linking patient genetic and medical data. 

The goal of the project is to generate a linked genetic and medical record database that could be used in machine-learning applications to accelerate drug and biomarker discovery in UC. This aligns with BenevolentAI’s commitment to developing an oral, small-molecule treatment with disease-modifying efficacy and improved safety for patients with UC who do not respond to current standard-of-care options.

At the time of writing, we have enrolled 619 people into the database that forms the basis of this project. In the first phase, medical records were successfully retrieved for 399 participants. Of these 399, 68% completed a genetic test, resulting in a linked dataset of 272 participants.

The first phase of the study was conducted in the United States, where medical records are fragmented and often challenging to access. Despite this, we were still able to retrieve both electronic and paper medical records for more than 60% of participants.
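
The enrolment figures above can be checked with a few lines of arithmetic. The Python sketch below uses only the three counts quoted in the text. The variable and function names are illustrative, and the overall linked rate is a derived figure that is not stated in the original.

```python
# Counts quoted in the text above.
enrolled = 619            # people enrolled into the database
records_retrieved = 399   # participants with medical records retrieved
linked = 272              # participants with both records and a genetic test


def pct(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of enrolled participants whose medical records were retrieved
# (the "more than 60%" figure).
record_retrieval_rate = pct(records_retrieved, enrolled)

# Share of those participants who also completed a genetic test
# (the "68%" figure).
genetic_test_rate = pct(linked, records_retrieved)

# Derived figure, not stated in the text: share of all enrolled
# participants who ended up in the linked dataset.
overall_linked_rate = pct(linked, enrolled)

print(record_retrieval_rate, genetic_test_rate, overall_linked_rate)
```

On these counts, roughly 64.5% of enrolled participants had records retrieved and roughly 68.2% of those completed a genetic test, which matches the rounded figures quoted above.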





The study was entirely decentralised, with participants providing the following via Sano Genetics’ online platform:

  • electronic consent
  • detailed information about their symptoms
  • medical history
  • permission for medical record linkage

In parallel with the medical record linkage process, participants were sent a simple, at-home saliva sample collection kit. 

The saliva samples were then genotyped, and additional DNA was stored to enable exome sequencing or whole genome sequencing in the future.
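
As a rough illustration of what "linking" means here, the sketch below joins two record sets on a shared participant identifier and keeps only participants present in both. It is a minimal sketch and not Sano's actual pipeline: the function name, the identifiers, and the field names are all hypothetical placeholders.

```python
def link_records(medical, genotypes):
    """Inner join on participant ID: keep only participants with both record types."""
    shared_ids = sorted(medical.keys() & genotypes.keys())
    return {
        pid: {"medical": medical[pid], "genotype": genotypes[pid]}
        for pid in shared_ids
    }


# Hypothetical placeholder data, invented purely for illustration.
medical = {
    "P001": {"diagnosis": "UC"},
    "P002": {"diagnosis": "UC"},
    "P003": {"diagnosis": "UC"},
}
genotypes = {
    "P001": {"sample": "saliva", "genotyped": True},
    "P003": {"sample": "saliva", "genotyped": True},
}

linked = link_records(medical, genotypes)
print(sorted(linked))  # participants with both medical and genetic data
```

A participant with medical records but no returned saliva kit (P002 in this made-up example) drops out of the linked set, which is why the linked dataset is smaller than the set of participants with retrieved records.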

All participants now have access to the Sano Platform, where they are:

  • alerted to new research opportunities they’re eligible to participate in
  • able to provide up-to-date information via questionnaires
  • able to access free, personalised genetic trait reports
  • engaged through expert educational materials via the ‘Sano Virtual Waiting Room’.



BenevolentAI uses machine learning algorithms to integrate large volumes of scientific literature with patient-level data, such as genetic data and clinical or medical records. The aim is to identify potential new drug targets and opportunities for precision medicine approaches, such as patient stratification, across a wide variety of diseases.

By exploring large, well-annotated patient-level datasets in which clinical details are linked to genetic information, we can better understand the efficacy of different treatment regimens and try to identify patients who are likely to respond (or not respond) to a given treatment for a given disease.

BenevolentAI has worked closely with Sano to guide the collection, formatting and ingestion of study data in a way that ensures it can be used efficiently in machine learning applications. As the database of enrolled patients grows, BenevolentAI plans to use this valuable resource to further explore precision medicine approaches, such as defining subgroups of UC patients who have experienced either effective or ineffective treatment regimens. 



The successful completion of the first phase is a major milestone for Sano Genetics and BenevolentAI. By uniting genetics and clinical data through a patient-focused siteless study design, the project has created a powerful new tool for precision medicine and genomics research and a framework that can be applied to a range of diseases.

By allowing people to take part entirely from home, this approach also reduces barriers to participation and aligns with Sano’s and BenevolentAI’s commitments to developing more diverse and representative datasets.

The lessons from this phase are being applied to expand the programme, reach a greater number of participants, and explore testing for additional biomarkers that may provide insight into progression, treatment response, and novel drug targets for UC.





Discover Sano Genetics.

We’re accelerating the transition to precision medicine, and equipping precision medicine development teams with the technology to work faster at a fraction of the cost, without compromising on quality.  

If you would like to find out more, simply complete the form below and a member of our team will be in touch to arrange a free, no-obligation call. 




[2] The global, regional, and national burden of inflammatory bowel disease in 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. The Lancet Gastroenterology & Hepatology.
