Building better data ecosystems for rare and ultra-rare conditions

Written by Joy N. Ismail, PhD | Apr 10, 2025 6:31:18 PM

In rare and ultra-rare diseases, data availability is one of the most significant constraints in research and drug development. With thousands of rare conditions each affecting very small populations, the data needed to characterize these diseases is often fragmented, inconsistent, or inaccessible. Addressing this requires a combination of international collaboration, interoperable technology, and sustained patient trust. This article examines how better data practices, from standardization to secure sharing, can help researchers, clinicians, and patients advance precision medicine more effectively.

Key Takeaways

Data Scarcity: Each rare disease affects roughly 1 in 200,000 people or fewer, making international data aggregation essential for statistically meaningful research.
Interoperability Challenges: Disconnected repositories for imaging, biomarker, and genomic data hinder their integration with medical records for precision medicine research.
FAIR Principles: Adopting Findable, Accessible, Interoperable, and Reusable standards is critical for global scientific stewardship.
Economic Impact: Poor data accessibility and interoperability cost EU member states approximately €10.2bn annually.
Tech Solutions: FHIR APIs, federated architectures, and wearable devices are key to balancing data security with collaborative research.

Characterizing patient populations

Precision medicine initiatives in rare or ultra-rare conditions face a fundamental constraint: insufficient data to accurately characterize these small populations. Current estimates suggest there are more than 10,000 distinct rare diseases, each affecting approximately 1 in 200,000 people or fewer. The vast majority of these conditions have a genetic basis, making genomic data central to disease characterization and therapy development. Collecting statistically meaningful data is an ongoing challenge across rare disease research, and it is particularly acute in precision therapy development. With limited data available for any individual condition, companies must rely on a combination of natural history and claims data when informing study design and interpreting results.

These constraints reinforce the need for international collaboration. Rare diseases collectively affect hundreds of millions of people globally, yet for any single condition, data is dispersed across geographies, institutions, and health systems. Many patients also face extended diagnostic journeys, during which clinical observations and test results accumulate across disconnected systems and are rarely consolidated into reusable datasets.

Building aggregated, cross-border datasets that better represent these populations is essential to accelerating treatment development. Equally, patient-centric approaches that earn trust and embed a culture of data sharing and research participation are foundational to making this collaboration sustainable.
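
To make the scale problem concrete, the short Python sketch below multiplies the 1 in 200,000 prevalence figure cited above by a few population sizes. The population figures are hypothetical round numbers chosen for illustration, not real statistics, and the function name is an assumption of this example.

```python
from fractions import Fraction

# Prevalence figure quoted in this article (treated here as an upper bound).
PREVALENCE = Fraction(1, 200_000)

# Hypothetical round-number populations, for illustration only.
populations = {
    "single hospital catchment": 1_000_000,
    "mid-sized country": 10_000_000,
    "large country": 100_000_000,
    "multi-country consortium": 500_000_000,
}


def expected_patients(population: int, prevalence: Fraction = PREVALENCE) -> int:
    """Expected number of affected people, rounded down to a whole person."""
    return int(population * prevalence)


for name, population in populations.items():
    print(f"{name:>28}: ~{expected_patients(population):,} expected patients")
```

Under these assumptions, a catchment of one million people yields about five patients, and even a 500-million-person consortium yields about 2,500, before accounting for diagnosis rates or willingness to participate.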

Interoperability and data sharing

Pharma and healthcare organizations have made progress in digitizing and improving secure access to medical records. However, data linking remains a persistent barrier. Physicians and researchers can now access individual medical records more readily, but integrating associated data types, such as imaging, biomarker results, and genomic data, is far more difficult when information sits in disconnected repositories or incompatible formats.

For precision medicine research, this fragmentation is not just inconvenient. It limits the ability to build the coordinated, multi-modal datasets that modern trial design requires. Structuring data so it can be integrated and securely shared across systems and organizations is critical to both research progress and the improvement of frontline patient care.
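
As a rough illustration of what linking looks like in practice, the Python sketch below groups records from three separate sources under a shared patient identifier and then lists the patients who have data in every modality. All records, field names, and function names are hypothetical, and the sketch assumes a common pseudonymous identifier already exists across sources, which is often the hard part.

```python
from collections import defaultdict

# Hypothetical extracts from three repositories that do not talk to each other.
clinical = [
    {"patient_id": "P001", "diagnosis": "condition A"},
    {"patient_id": "P002", "diagnosis": "condition A"},
]
imaging = [{"patient_id": "P001", "modality": "MRI"}]
genomic = [
    {"patient_id": "P001", "variant": "hypothetical-variant-1"},
    {"patient_id": "P002", "variant": "hypothetical-variant-2"},
]


def link_records(**sources):
    """Group rows from every source under a shared patient identifier."""
    linked = defaultdict(dict)
    for source_name, rows in sources.items():
        for row in rows:
            payload = {k: v for k, v in row.items() if k != "patient_id"}
            linked[row["patient_id"]].setdefault(source_name, []).append(payload)
    return dict(linked)


def multimodal_patients(linked, required):
    """Patients with at least one record in every required source."""
    return sorted(pid for pid, rec in linked.items() if set(required).issubset(rec))


linked = link_records(clinical=clinical, imaging=imaging, genomic=genomic)
complete = multimodal_patients(linked, ["clinical", "imaging", "genomic"])
print(complete)
```

Here only P001 has clinical, imaging, and genomic data, so the usable multi-modal cohort is half the size of the clinical one. In a rare disease setting, losses of this kind can decide whether a study is feasible.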

Established rare disease databases and registries, including resources such as NORD, GARD, and Orphanet, already capture significant disease-level knowledge. However, the data they hold often remains siloed from clinical trial workflows and genomic datasets. Adopting universal data sharing standards at an international scale is essential to connecting these resources, enabling the pooling of fragmented datasets, and improving the likelihood of trial success.

The impact of poor data practices is significant:

Economic Cost: The EU estimates that poor data accessibility and interoperability cost member states €10.2bn per year.
Opportunity: Improving these practices offers substantial trickle-down cost savings for both businesses and patients.

FAIR guiding principles

The FAIR Guiding Principles for scientific data management and stewardship, published in 2016, represent the most widely referenced framework for universal data sharing standards. These principles aim to improve the Findability, Accessibility, Interoperability, and Reuse of scientific data. The framework establishes the core requirements that data standardization should meet to foster large-scale collaboration and support patient benefit.

For sponsors running genetically stratified programs, alignment with these principles has direct operational implications. Standardized, interoperable data reduces the friction of patient identification across registries and health systems, shortens the time between genetic confirmation and eligibility determination, and creates the conditions for patient recontact across successive studies in the same therapeutic area.
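
One way to picture what FAIR alignment asks of a dataset is a simple metadata check, sketched below in Python. This is a deliberately crude simplification: the published principles are more detailed than four yes/no tests, and every field name, the placeholder identifier, and the example record are assumptions made for illustration.

```python
# Each check maps one FAIR principle to the metadata fields that would support it.
# The field names are hypothetical, not part of any official FAIR schema.
FAIR_CHECKS = {
    "Findable": lambda m: bool(m.get("identifier")) and bool(m.get("title")),
    "Accessible": lambda m: bool(m.get("access_protocol")),
    "Interoperable": lambda m: bool(m.get("vocabularies")),
    "Reusable": lambda m: bool(m.get("license")) and bool(m.get("provenance")),
}


def fair_report(metadata: dict) -> dict:
    """Return a pass/fail flag per principle for one dataset description."""
    return {principle: check(metadata) for principle, check in FAIR_CHECKS.items()}


# Hypothetical registry extract with no stated license.
registry_dataset = {
    "identifier": "doi:10.0000/example-registry",  # placeholder, not a real DOI
    "title": "Example rare disease registry extract",
    "access_protocol": "https",
    "vocabularies": ["ORPHAcode", "HPO"],
    "license": None,
    "provenance": "Collected by a hypothetical registry",
}

report = fair_report(registry_dataset)
for principle, passed in report.items():
    print(f"{principle:>13}: {'ok' if passed else 'missing metadata'}")
```

In this example the dataset can be found, accessed, and mapped to shared vocabularies, but it fails the reuse check because no license is recorded, so another team could not be sure it is permitted to use the data.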

Some key tenets of collaborative data sharing practices include:

Standardizing metadata: Using frameworks such as the Fast Healthcare Interoperability Resources (FHIR) Application Programming Interfaces (APIs) to harmonize genomic and clinical data and provide a common language for health data exchange. Such approaches enable the safe exchange of EHR data, genomic repository contents, and data recorded by wearables, as well as automated category standardization.
Federated architectures: Prioritizing architectures that centralize metadata while keeping data storage local, which balances flexible, collaborative data sharing with adherence to ethics and governance regulations. Data stay at their respective institutions, while the federated interface makes them findable across multiple organizations.
Investment in tech: Investing in digital tools such as electronic patient-reported outcomes (ePRO) and wearable devices reduces data entry errors, improves real-time data quality, and supports more reliable capture and analysis.
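
The tenets above can be sketched together in a few lines of Python. The example below shapes a wearable heart-rate reading as a minimal FHIR-style Observation, stores it at the institution that collected it, and answers a cross-site question using only a metadata catalog and aggregate counts. It is an in-memory toy under stated assumptions, not a real FHIR server or federation product: the Site class, its methods, the site names, and the patient references are all hypothetical, and a real deployment would add authentication, consent checks, and governance controls.

```python
from dataclasses import dataclass, field


def heart_rate_observation(patient_ref: str, bpm: int) -> dict:
    """Minimal FHIR-style Observation for a wearable heart-rate reading."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{"system": "http://loinc.org",
                             "code": "8867-4", "display": "Heart rate"}]},
        "subject": {"reference": patient_ref},
        "valueQuantity": {"value": bpm, "unit": "beats/minute",
                          "system": "http://unitsofmeasure.org", "code": "/min"},
    }


@dataclass
class Site:
    """One institution. Its resources are only read by its own methods."""
    name: str
    resources: list = field(default_factory=list, repr=False)

    def store(self, resource: dict) -> None:
        self.resources.append(resource)

    def catalog_entry(self) -> dict:
        """Metadata only: the part a central index is allowed to see."""
        codes = sorted({c["code"] for r in self.resources
                        for c in r["code"]["coding"]})
        return {"site": self.name, "codes": codes}

    def count_patients(self, code: str, min_value: float) -> int:
        """Run the query locally and return only an aggregate count."""
        patients = {
            r["subject"]["reference"]
            for r in self.resources
            if any(c["code"] == code for c in r["code"]["coding"])
            and r["valueQuantity"]["value"] >= min_value
        }
        return len(patients)


def federated_count(sites, code: str, min_value: float) -> dict:
    """Use the catalog to find relevant sites, then collect local counts."""
    holding = {e["site"] for e in (s.catalog_entry() for s in sites)
               if code in e["codes"]}
    return {s.name: s.count_patients(code, min_value)
            for s in sites if s.name in holding}


site_a = Site("Hospital A")
site_a.store(heart_rate_observation("Patient/a1", 112))
site_a.store(heart_rate_observation("Patient/a2", 71))
site_b = Site("Hospital B")
site_b.store(heart_rate_observation("Patient/b1", 105))
site_c = Site("Hospital C")  # holds no heart-rate data

result = federated_count([site_a, site_b, site_c], "8867-4", 100)
print(result)
```

The design choice worth noticing is that the coordinating function never touches individual records: it sees which codes each site holds and how many patients match, while the underlying observations remain with the institution that collected them.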

Conclusion

Building effective data ecosystems for rare and ultra-rare conditions is not a single technical problem. It requires coordinated progress across data standardization, cross-border interoperability, and sustained patient trust. Without this foundation, the datasets needed to characterize small populations, design effective trials, and develop precision therapies will remain fragmented and underutilized.

For sponsors and research teams working in rare disease, the path forward involves adopting shared standards, investing in platforms that connect data across systems, and maintaining transparent, patient-centric engagement. Each of these steps compounds the value of the next, creating the kind of durable data infrastructure that precision medicine requires.

To learn how Sano Genetics supports data-driven precision medicine programs across recruitment, genetic testing, and long-term engagement, get in touch.
