Three Ways AI-Oncology Companies Will Fail at the FDA

And the Clinical Expertise Gap That Causes It

Originally published 2026-05-12 · Also on LinkedIn. Followed up 2026-05-20 by "How I Would Know If I'm Wrong": six conditions for falsifying this thesis, with a locked seed list.

In 2025, artificial intelligence accounted for nearly half of all venture capital deployed into healthcare: an estimated $18 to $21 billion in a single year, with AI-drug-discovery companies alone absorbing roughly $3.8 billion across 135 deals, according to SVB and PitchBook. Oncology, as always, captured the largest share. The capital flowing into AI-driven cancer drug discovery has never been larger, and much of it will be set on fire.

I do not say that as a critic of the technology. I say it as a medical oncologist who spent three decades inside drug development — at the bench, in the clinic, across the table from FDA reviewers, and at the bedside of patients on Phase I/II/III trials. I have watched superb molecules die because the people building them did not understand the regulatory and clinical reality they were walking into. The AI-oncology wave is now setting up the same failure modes at industrial scale, and the agencies are actively tightening the screws.

Within 12 to 18 months, a meaningful share of today's well-funded AI-oncology companies will hit the FDA wall. Most of these failures will not be dramatic — they will surface as Complete Response Letters, clinical holds that extend and extend, pre-IND meetings that end without an agreed endpoint, programs quietly deprioritized. The public failures will be the visible edge of a larger pattern. The cause will be the same across the distribution: not that the models were bad, but that no one on the team was calibrated to what the agency was actually going to ask.

Here are the three failure modes I expect to see, and the gap that produces all of them.

A note on scope: this piece concerns biotech-shaped AI-oncology companies — those running their own programs toward an IND under their own name, not platform companies selling tools and services into pharma. The failure modes apply differently to the two business models, and the engagement model I describe at the end is built for the first.


Failure Mode 1: The Endpoint That Never Was a Drug

AI-discovered targets and AI-prioritized candidates routinely come wrapped in elegant computational endpoints — binding affinities, off-target predictions, multi-omic signatures, in silico response curves. Beautiful science. Not a drug.

A drug is something that changes a patient's clinical course in a way the FDA will recognize. Overall survival. Progression-free survival in a defined population. A validated surrogate with regulatory precedent. A symptomatic benefit measured by an instrument the agency has already accepted.

I have sat in pre-IND meetings where a sponsor presented a stunning translational story and walked out without an endpoint the agency would accept. The FDA does not reward the sophistication of your model. It rewards the rigor of your endpoint and the discipline of your trial design.

AI-oncology teams led by computational scientists and engineers — with no one inside the room who has run an oncology trial — keep designing programs around what their models can predict, instead of what the agency can review. That gap will not close itself. It is closed by people who have lived in both worlds.

Failure Mode 2: The Patient Who Was Never Going to Enroll

The second failure mode is more brutal: the trial that cannot enroll, or enrolls the wrong patient.

AI platforms are excellent at identifying biologically attractive subpopulations. They are usually terrible at predicting which patients oncologists will actually refer, which sites will activate, what the standard of care is in the geographies where you can still recruit, and which lines of therapy leave a window for an investigational agent.

This is not a hypothetical risk; it is the modal outcome. Roughly half of oncology trials fail to meet their enrollment targets on schedule, and a non-trivial share terminate for accrual reasons alone — the highest rate of any therapeutic area. The pattern is most visible in the structures AI platforms are most likely to recommend: precision-medicine basket and umbrella trials whose narrow biomarker arms close without reaching target, and combination regimens layered on top of standard-of-care backbones that the community-oncology referral base will not actually prescribe. The NCI-MATCH experience and the early KRAS-combination programs are the public version of a failure pattern that plays out, less visibly, in dozens of sponsor-led programs every year.

I have seen pristine trial designs collapse on contact with the clinic. A biomarker-defined population that exists in the database but not in the referral pattern. An eligibility criterion that excludes 80% of the patients who actually present with the disease.

The FDA notices. Slow enrollment is not just an operational problem — it is a signal of design failure. It triggers tougher questions at the next interaction, and it kills the credibility a sponsor needs to negotiate accelerated pathways.

Failure Mode 3: The Black Box That Cannot Be Audited

The third failure mode is the one the agency itself is now telegraphing loudly: AI-derived components of a submission that cannot be explained, validated, or reproduced to regulatory standard.

In January 2025, the FDA published draft guidance on the use of AI to support regulatory decision-making for drug and biological products. One year later, in January 2026, the FDA and EMA jointly issued Guiding Principles of Good AI Practice in Drug Development, a ten-principle framework across the full drug lifecycle. The direction is now explicit and transatlantic: models used in target identification, patient selection, dose finding, or endpoint adjudication must demonstrate provenance, validation, fitness for purpose, and a plan for monitoring drift. The documentation burden is risk-tiered (exploratory use carries a lighter footprint than a model driving patient selection in the registrational trial), but the expectation that the sponsor can defend the model's role in the regulatory decision is now baseline. CDER alone has published its experience with over 500 drug and biological product submissions containing AI components since 2016, and CBER's count is higher still. Reviewers know what they are looking at.

"We trained a transformer on multi-omic data" is not a regulatory answer.

The failure here is rarely that the engineering is missing. Serious teams do have provenance, validation, and drift discipline — those are standard MLOps practice. The failure is that the engineering has not been translated into the language and format a CDER review division will accept. A submission can contain every validation artifact and still walk into a pre-IND meeting unable to answer basic agency questions, because the artifacts have not been mapped to the agency's specific framework for the model's context of use. I have watched the equivalent happen in earlier waves — biomarker assays that were analytically sound but unvalidated for the claim being made, companion diagnostics that fell apart under regulatory scrutiny for the same translation reason. The pattern is repeating, with higher stakes and a more explicit regulatory framework.


The Underlying Gap: Calibration Between Computational and Clinical-Regulatory Judgment

These are not three separate problems. They are three symptoms of the same missing input: calibration — the reliable human sense of when the answer is actually right, in a domain where the stakes are high, the signal is noisy, and the consequences of error are irreversible. AI accelerates everything in drug development except this. In fact, the more a team delegates judgment to its models, the faster calibration atrophies — and the agencies are now explicitly asking for the human reasoning the models cannot supply.

The current generation of AI-oncology companies was built, correctly, around computational and engineering talent. That was the right starting move; without it there is no platform. But it produced a structural blind spot: founders and senior teams who have never personally taken a molecule through an IND, never sat across from a division at the FDA, never managed a Phase I dose-escalation in oncology patients, and never explained to a family why a trial closed early. Credentials they may have. Calibration is something else.

Investors and boards understand this. Series A AI-oncology companies are now structuring senior clinical-regulatory hires as co-founder-shaped roles, with equity that reflects the asymmetry of the risk those hires retire. They are not buying a name on a slide. They are buying the calibration that prevents the three failures above.

The companies that will survive the credibility reckoning are the ones that close this gap before the first major FDA interaction — not after the rejection.


What I Look For

I am writing this because I am selective about where I engage, and I want to be visible to the small number of teams where the fit is real.

I am most useful to AI-oncology companies that are:

  • Pre-IND or in Phase I, with Series A or B closed in the last 18 months.
  • Led by strong computational and scientific founders who already know clinical-regulatory judgment is the next hire, not the last.
  • Working on a target or modality where I can directly shape endpoint strategy, trial design, and FDA interactions — not endorse decisions already made.
  • Open to a fractional CMO or co-founder structure with equity, not a name-on-deck advisory retainer.

The right shape is simple: a small number of companies, deep involvement, real ownership, and direct accountability for the outcomes that determine whether the molecule lives or dies.

What I bring to those companies is calibrated clinical-regulatory judgment — the accuracy of confidence, not the volume of opinion — applied to the specific decisions that determine whether the molecule reaches patients. To be clear about the scope: a fractional role retires near-term decision risk. It does not build a long-term clinical-regulatory culture in 90 days — no one does. If what you need is the right call on the next eight to twelve decisions that determine whether the molecule reaches Phase II, that is what I am for. If what you need is a culture, you need a different hire, on a different clock.


Why Now

The window is narrow. AI-oncology is in a funding surge today; the credibility reckoning is 12 to 18 months away. After the first wave of public failures, the market will flood with post-mortem consultants — the wrong moment to start a relationship. The right moment is now, while there is still time to redesign the program and walk into the agency with answers instead of apologies.

If you are building an AI-oncology company at this stage, and the description above sounds like the seat you have not yet filled, the conversation is worth having. I take a small number of these engagements per year. I am taking inquiries for one of them now.


Jesús Gómez-Navarro, M.D., is a board-certified medical oncologist with 30+ years in oncology drug development at Pfizer, Millennium, and Takeda. As VP, Head of Clinical R&D at Takeda Oncology (2011–2020) and Takeda's inaugural Distinguished R&D Fellow (2020–2022), his strategic and technical leadership contributed to seven anticancer approvals across FDA, EMA, PMDA, and NMPA (NINLARO, ADCETRIS, ALUNBRIG, ICLUSIG, ZEJULA, CABOMETYX, and EXKIVITY) and to advancing the anti-CTLA-4 antibody tremelimumab from first-in-human through Phase III (later approved as IMJUDO). He founded OncAdios, LLC in 2022 to bring calibrated clinical-regulatory judgment to AI-oncology and biotech companies as fractional CMO, co-founder, or board director.
