How Do We Audit an AI That Changes Itself?


As artificial intelligence becomes more deeply embedded in clinical care, a new dilemma is quietly emerging: what happens when the model doesn’t stay the same?

Unlike traditional medical devices or fixed software tools, a growing number of AI systems in healthcare are continuously learning, meaning they retrain on new data over time, often without human oversight. These “adaptive algorithms” are designed to get smarter. But as they evolve, so do the risks.

And it raises a fundamental, uneasy question: How do you audit a medical tool that is constantly rewriting itself?

For years, the AI tools deployed in healthcare followed a predictable lifecycle. A model was trained on a dataset, tested, validated, approved, and then integrated into clinical workflows. Once deployed, it remained “locked,” meaning no additional training occurred unless a new version was created and revalidated.

This static approach made oversight manageable. But as hospitals and health systems began to accumulate enormous volumes of real-world data, developers saw an opportunity: What if AI could retrain continuously, using the latest data to fine-tune its predictions?

These continuously learning systems are called adaptive AI or lifelong learning models. They ingest new patient records, clinician notes, imaging data, or lab results to improve accuracy over time. In theory, this makes them more responsive to local populations, emerging diseases, or shifting clinical patterns.

But it also means they are no longer the same model that was initially tested and approved.

Regulators have not ignored this shift. In 2019, the U.S. Food and Drug Administration (FDA) issued a discussion paper outlining the foundational principles for regulating AI/ML-based Software as a Medical Device (SaMD). But real momentum arrived between 2021 and 2025.

In December 2024, the FDA finalized its guidance on Predetermined Change Control Plans (PCCPs). This framework allows AI developers to submit models for approval with predefined guardrails for how the system may adapt over time. A PCCP includes:

  • Expected data types for retraining
  • Parameters that the model is allowed to change
  • Performance benchmarks that must be maintained
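
To make this concrete, here is a minimal sketch of how a PCCP’s guardrails might be captured in machine-readable form. The field names, parameters, and thresholds below are illustrative assumptions, not an FDA-prescribed schema.

```python
# Hypothetical example of PCCP guardrails expressed as configuration.
# All field names and thresholds are illustrative, not an FDA-prescribed schema.
PCCP_EXAMPLE = {
    "model": "sepsis-risk-model",                  # hypothetical model name
    "allowed_retraining_data": [
        "vital_signs", "lab_results", "clinician_notes"
    ],
    "modifiable_parameters": {
        "classifier_weights": True,                # weights may be re-fit on new data
        "input_feature_set": False,                # features may NOT be added or removed
        "alert_threshold_range": (0.3, 0.7),       # threshold may move only within this band
    },
    "performance_benchmarks": {
        "min_auroc": 0.80,                         # updates below this are rejected
        "min_sensitivity": 0.85,                   # measured on a held-out validation set
    },
}
```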

In January 2025, the FDA expanded its position further with a draft guidance requiring:

  • Version tracking
  • Real-time monitoring for performance drift
  • Transparent reporting of model updates
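
As a rough illustration of what version tracking can buy you, the sketch below keeps an append-only log of model updates so a prior version can be restored if monitoring flags a problem. The class and method names are hypothetical, not a real product or an FDA-mandated design.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical version log with rollback; all names are illustrative only.
@dataclass
class ModelVersion:
    version: str
    retraining_window: str   # description of the data the update was trained on
    validation_auroc: float  # performance recorded at release time
    released_at: datetime

class ModelRegistry:
    def __init__(self):
        self._history: list[ModelVersion] = []

    def register(self, update: ModelVersion) -> None:
        """Append-only: every update is logged, nothing is overwritten."""
        self._history.append(update)

    def current(self) -> ModelVersion:
        return self._history[-1]

    def rollback(self) -> ModelVersion:
        """Retire the latest update and restore the previous version."""
        if len(self._history) < 2:
            raise RuntimeError("No earlier version to roll back to")
        retired = self._history.pop()
        print(f"Rolled back {retired.version} -> {self.current().version}")
        return self.current()
```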

This evolving guidance marks a major shift: from one-time validation to lifecycle oversight.

While the promise of adaptive AI is compelling, real-world implementation has exposed its dangers. At one major U.S. hospital, an AI-powered sepsis alert system was designed to evolve with local data. Initially, it outperformed static systems.

But during the COVID-19 pandemic, lab values and diagnostic protocols changed rapidly. The adaptive system began to “learn” from the pandemic-era norms, misclassifying patients and triggering an increase in false alarms. Clinicians reported alarm fatigue and, in some cases, missed true positives.

There was no rollback system in place. The model had evolved to be less effective, without anyone realizing until harm had occurred.

This is not a theoretical risk. It’s a lived reality that underscores the stakes of deploying adaptive AI in high-pressure clinical environments.

As adaptive models evolve, another uncomfortable question surfaces: If an AI makes a faulty decision based on a post-deployment update, who is responsible?

  • The software vendor, who designed the evolving algorithm?
  • The hospital, which deployed it?
  • The clinician, who relied on its recommendation?

Currently, there’s no consensus. And as models become more autonomous, this ambiguity could lead to legal and ethical quagmires. Without clear lines of accountability, the very promise of adaptive AI could become a liability landmine.

To responsibly integrate adaptive AI into healthcare, we must move from compliance events (e.g., initial approval) to continuous governance. That includes:

  • Drift detection: Flagging when a model’s predictions deviate from historical norms
  • Statistical post-deployment monitoring: Using hypothesis testing frameworks to detect performance degradation (a sketch follows this list)
  • Human-in-the-loop validation: Ensuring clinicians can clearly question or override AI recommendations
  • Version disclosure to patients and staff: Transparency about when and how models have changed
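
For the statistical monitoring piece, one simple starting point is a two-sample hypothesis test comparing recent model outputs against a frozen baseline window. The sketch below uses SciPy’s Kolmogorov-Smirnov test on risk scores as a drift proxy; the significance level, window sizes, and synthetic data are assumptions for illustration, and a true performance-degradation check would also compare predictions against observed outcomes once labels arrive.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_score_drift(baseline_scores, recent_scores, alpha=0.01):
    """Two-sample Kolmogorov-Smirnov test on model output distributions.

    A small p-value suggests the recent scores have shifted away from the
    baseline and should trigger human review, not a silent retrain.
    """
    statistic, p_value = ks_2samp(baseline_scores, recent_scores)
    return {"statistic": statistic, "p_value": p_value, "drift_flagged": p_value < alpha}

# Synthetic data standing in for real risk scores (illustration only).
rng = np.random.default_rng(42)
baseline = rng.beta(2, 5, size=5000)   # score distribution at validation time
recent = rng.beta(2, 3, size=1000)     # scores after an upstream data shift
print(detect_score_drift(baseline, recent))
```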

Academic researchers are pushing in the same direction: a 2025 study published in npj Digital Medicine, for example, argues for mandatory statistical drift audits, including error thresholds, reproducibility standards, and fail-safe triggers.

In a symbolic shift, the FDA itself began using generative AI internally in 2024. After pilots showed productivity improvements in regulatory review, the agency rolled out a tool called ELSA across all of its centers in May 2025.

ELSA is now used to:

  • Summarize scientific documentation
  • Generate statistical code
  • Detect anomalies in drug data
  • Draft regulatory correspondence

While not adaptive in the clinical sense, ELSA’s deployment marks the first time the FDA has used AI to shape its own operations, even as it tries to regulate similar technologies in industry.

This dual role deepens the urgency: Regulators are becoming users. The call for robust governance is no longer just external; it’s internal, too.

The FDA’s final guidance on AI lifecycle management is expected by late 2025, and it will likely formalize many of today’s “best practices” into regulatory mandates. In the meantime, hospitals and AI vendors face a simple but profound imperative:

If your AI changes, your oversight must too.

Deploying adaptive AI without a robust audit framework is no longer just risky. It may soon be non-compliant.

Until then, the question remains: Can we build medical AI systems that learn in ways we can still understand?

Because in medicine, uncertainty is always the enemy, and adaptive AI, without guardrails, risks making certainty harder, not easier, to come by.

