Strengthening rare-disease pharmacovigilance with curated real-world data
Estimated reading time: 7 minutes


Traditional pharmacovigilance (PV) assumptions—that volume will reveal signal or that statistical thresholds will separate noise from true safety issues—break down fast in rare-disease settings. When a global patient population measures in the dozens rather than the thousands, every single case carries outsized weight, natural history is often incomplete and routine thresholds become meaningless. That reality forces sponsors to rethink signal detection: if we cannot rely on volume, we must rely on fit-for-purpose data, clinical context and human oversight.
Carefully curated real-world data (RWD)—structured datasets drawn from registries, electronic health records (EHRs), natural history studies and patient-reported outcomes (PROs)—can materially improve safety monitoring in small, diverse populations.
When combined with advanced analytics and artificial intelligence (AI), these datasets support hypothesis generation and signal validation, offering a window into long-term treatment effects, off-label use and rare adverse events. These insights often surface months or years before traditional PV systems would catch them.
So, how can we build an RWD strategy and get usable data into PV workflows?
The key is selecting the right data, ensuring quality and consistency, and integrating these sources into safety workflows in a scalable, regulatory-aligned manner.
Why curated RWD matters more than quantity
Data quality, completeness and context trump sheer volume in rare-disease PV. You can learn far more about causality and plausibility from a harmonised registry entry or a structured EHR extract with relevant timestamps, concomitant medications and clinician notes than from inconsistent spontaneous reports.
Curated sources help improve safety monitoring in three concrete ways: contextualisation, hypothesis generation and signal validation.
Curated datasets capture elements of natural history—age at onset, progression milestones, comorbidities and concurrent interventions—that help determine whether an event is disease-related or treatment-related. As new therapies extend survival, what once looked abnormal in a child may be expected in an adult with a particular rare disease, and only quality contextual data clarifies that shift.
Pattern recognition across harmonised datasets can also generate plausible hypotheses about adverse events early, without relying on disproportionality statistics that fail when denominators are tiny. Rather than leaning on speculative, data-hungry statistical tests, curated feeds enable focused clinician review.
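To make the denominator problem concrete, here is a minimal Python sketch of a proportional reporting ratio (PRR), one common disproportionality measure, calculated on purely illustrative counts; none of the numbers come from a real safety database.

```python
import math

def prr_with_ci(a, b, c, d):
    """Proportional reporting ratio (PRR) from a 2x2 contingency table.

    a: reports of the event of interest for the therapy of interest
    b: reports of all other events for the therapy of interest
    c: reports of the event of interest for all other therapies
    d: reports of all other events for all other therapies
    Returns the PRR with the bounds of a Wald-style 95% confidence interval.
    """
    prr = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return prr, prr * math.exp(-1.96 * se_log), prr * math.exp(1.96 * se_log)

# Purely illustrative counts of the kind a rare-disease programme might see:
# 2 reports of the event on therapy, 15 other reports on therapy,
# 1 background report of the event, 40 other background reports.
prr, low, high = prr_with_ci(a=2, b=15, c=1, d=40)
print(f"PRR = {prr:.2f}, 95% CI {low:.2f} to {high:.2f}")
# Prints roughly: PRR = 4.82, 95% CI 0.47 to 49.7
```

With so few cases, the interval stretches from below one to roughly fifty-fold, far too wide to separate signal from noise, which is exactly why clinician-led, context-rich review has to carry the weight instead.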
Additionally, when an unusual event appears, curated data allows targeted adjudication—comparing carefully selected historical and contemporary cases, reviewing clinical notes and following up with patients or carers in a way that respects burden and ethics. That depth of information supports regulatory dialogue and increases the credibility of any safety assessment.
Ultimately, it’s not how much data you have; it’s whether the data you do have is organised, interpretable and anchored to clinical reality.
Building a fit-for-purpose RWD strategy
RWD sources must be chosen for relevance, reliability and feasibility. Although typical high-value inputs are disease registries, structured EHR extracts, natural history cohorts and PROs, the final mix depends on the disease, geography and patient-care pathway.
When selecting data partners, sponsors and marketing authorisation holders (MAHs) should look for those who can demonstrate governance and practical capture workflows that collect data in ways that respect patient burden and real-world constraints. Equally, data sources, capture methods and workflows should align with the therapy area and patient population of interest. Data-sharing agreements should also be explicit about permitted uses (for example, signal detection versus research), data security and retention policies, responsibilities for de-identification and expectations for update cadence. These operational details determine whether a dataset is usable for routine PV or only for ad hoc analyses.
Curating RWD also requires disciplined preprocessing. Data cleaning and normalisation are not optional ‘tidying’ steps; they determine whether downstream analyses can be trusted at all.
Clear records—including cautious test cases, trial runs and expert clinical review—help build regulator trust in an approach that has to work differently from the usual large-population statistical methods.
So, sponsors and MAHs need to agree on the key pieces of data to collect, make sure terms and definitions are consistent, and keep records showing where each piece of information came from so that every analytic result can be traced back to source. Regulators want to see transparent decision-making; it’s important to explain why bespoke thresholds were chosen, how curated sources were assembled and validated, and how having a ‘human-in-the-loop’ can mitigate AI-model biases.
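As an illustration of that traceability, each curated observation can carry its provenance alongside the value itself. The sketch below is hypothetical; the field names are assumptions rather than a prescribed schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class CuratedObservation:
    """One curated data element with the provenance needed to trace
    an analytic result back to its source (field names are illustrative)."""
    patient_id: str          # pseudonymised identifier
    element: str             # agreed data element, e.g. "serum_creatinine"
    value: float
    unit: str
    observed_on: date
    source_system: str       # e.g. "disease_registry", "ehr_extract"
    source_record_id: str    # pointer back to the originating record
    curation_note: str = ""  # e.g. unit conversion or clinician adjudication applied

obs = CuratedObservation(
    patient_id="PT-0042",
    element="serum_creatinine",
    value=88.4,
    unit="umol/L",
    observed_on=date(2024, 3, 12),
    source_system="ehr_extract",
    source_record_id="lab-7731",
    curation_note="converted from mg/dL",
)
print(obs)
```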
Mapping and coding are also essential.
When datasets are small and drawn from diverse sources, mapping them into recognised common models or controlled vocabularies ensures that information is interpretable across studies, usable for regulatory submissions and comparable over time. Without this step, data can’t be reliably integrated or analysed, and regulators may question its validity.
For example, the OMOP Common Data Model helps organise real-world health data in a consistent way, the CDISC SDTM format is used for structuring clinical trial results and MedDRA is the standard terminology for coding adverse events.
Each one serves a different purpose—research, regulatory submissions or safety monitoring—and the mapping needs to be done carefully so that no important details are lost.
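As a small illustration of the care involved, the sketch below codes verbatim clinical terms to preferred terms and routes anything it cannot map to a human coder. The mapping entries are illustrative placeholders, not an extract from MedDRA or any other licensed terminology.

```python
# Illustrative verbatim-to-preferred-term mapping; the entries are placeholders.
VERBATIM_TO_PT = {
    "raised alt": "Alanine aminotransferase increased",
    "fits": "Seizure",
    "low platelets": "Platelet count decreased",
}

def code_verbatim(term: str) -> tuple[str, bool]:
    """Return (coded term, needs_human_review). Unmapped or ambiguous
    verbatims are flagged for a coder rather than silently guessed."""
    key = term.strip().lower()
    if key in VERBATIM_TO_PT:
        return VERBATIM_TO_PT[key], False
    return term, True  # keep the original wording and route to a human coder

print(code_verbatim("Raised ALT"))   # ("Alanine aminotransferase increased", False)
print(code_verbatim("funny turn"))   # ("funny turn", True) -> human review
```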

Integrating RWD with existing PV systems and workflows
Curated feeds must slot into existing safety systems so that they become part of everyday decision-making rather than an occasional add-on.
This integration is just as much about processes and people as it is about technology.
Practical integration steps include defining accountable use cases, setting pragmatic thresholds and always keeping a human in the loop.
Sponsors and MAHs should start by narrowing the scope of RWD to clearly identified needs, such as spotting new patterns of symptoms early or tracking lab results over time. Each use case should have a direct link to specific decisions—whether that is triaging cases for further review, following up with patients or deciding when to escalate findings to regulators. Building processes around these defined purposes ensures that data is being used efficiently and meaningfully.
Setting pragmatic thresholds for decision-making is also crucial. In rare-disease settings, traditional statistical triggers for safety signals often do not work because the numbers are too small. Instead, sponsors and MAHs should create tailored decision rules suited to the available data. These should be backed by a clear explanation of why they were chosen, along with a cautious approach to testing and validation. This transparency allows regulators to understand the reasoning and trade-offs involved, while also helping the PV team apply the same standards consistently across reporting periods.
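A pre-specified rule of that kind might look like the following sketch. The event categories, case counts and actions are illustrative assumptions, not recommended thresholds.

```python
# Bespoke, pre-specified decision rules for a small population; all values
# here are illustrative assumptions and would need clinical justification.
SINGLE_CASE_TRIGGER = {"acute liver failure", "anaphylaxis"}   # escalate on one case
CUMULATIVE_TRIGGER = 3                                          # clinician review after n cases

def triage(event: str, cumulative_count: int) -> str:
    """Apply the agreed decision rule to one incoming event."""
    if event.lower() in SINGLE_CASE_TRIGGER:
        return "escalate: single-case review and regulator notification assessment"
    if cumulative_count >= CUMULATIVE_TRIGGER:
        return "clinician-led aggregate review"
    return "log and monitor at next periodic review"

print(triage("Anaphylaxis", cumulative_count=1))
print(triage("Headache", cumulative_count=2))
```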
Even with strong automation in place, human oversight remains critical. Automation can handle many repetitive or time-consuming tasks, such as combining datasets, generating comparative summaries or performing an initial triage of cases. However, the final judgement about whether a treatment caused an adverse event should rest with trained clinicians. Automated tools should be designed to provide clear, concise and reliable summaries that make the clinician’s work easier—not to replace their expertise in making complex safety decisions.
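One way to keep that boundary explicit is for the automation to assemble a structured briefing and a review queue while leaving any causality judgement out of the data model entirely, as in this hypothetical sketch.

```python
from dataclasses import dataclass, field

@dataclass
class CaseSummary:
    """Automation-generated briefing for a clinician (field names are illustrative).
    There is deliberately no causality field: that judgement stays with the reviewer."""
    case_id: str
    event_term: str
    time_to_onset_days: int
    concomitant_meds: list[str]
    similar_historical_cases: int
    data_sources: list[str] = field(default_factory=list)

def render_briefing(s: CaseSummary) -> str:
    """Produce a concise, factual summary for clinician review."""
    meds = ", ".join(s.concomitant_meds) or "none recorded"
    return (f"{s.case_id}: {s.event_term}, onset {s.time_to_onset_days} days after start; "
            f"concomitant: {meds}; {s.similar_historical_cases} similar historical case(s); "
            f"sources: {', '.join(s.data_sources)}.")

clinician_queue: list[CaseSummary] = []

def auto_triage(summary: CaseSummary) -> None:
    """Combine, summarise and queue the case; never decide it."""
    clinician_queue.append(summary)
    print(render_briefing(summary))

auto_triage(CaseSummary(
    case_id="CASE-017",
    event_term="Platelet count decreased",
    time_to_onset_days=21,
    concomitant_meds=["prednisolone"],
    similar_historical_cases=2,
    data_sources=["disease_registry", "ehr_extract"],
))
```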
It’s also worth remembering that patients and advocacy groups are often the subject-matter experts—especially when it comes to rare diseases. Involve them.
Effectively integrating RWD with existing PV systems and workflows means asking what is feasible and meaningful for patients. For example, asking for daily weights may be unrealistic, but periodic pharmacy touchpoints, physiotherapy records or text-based check-ins may be practical and high value. Sponsors and MAHs should, therefore, co-design data collection with patient communities to reduce burden, increase retention and preserve trust.
When data is scarce, the quality, completeness and context of each contribution from patients and caregivers can make the difference between spotting a safety signal early and missing it altogether.

Volume is not on our side for rare-disease PV.
But sponsors and MAHs that combine curated RWD with appropriate AI tools will find they can shorten the time from signal emergence to clinician-led assessment—enabling them to make better safety decisions for patients who often have nowhere else to turn.