Follow-up to the Three Ways thesis

How I Would Know If I'm Wrong

Six Conditions for Falsifying the Three Ways Thesis

Published 2026-05-20 · Also on LinkedIn. Companion artifact: the locked seed list of fifteen biotech-shaped AI-oncology companies that anchor Condition 1.

Two weeks ago I published a thesis arguing that a meaningful share of today's well-funded AI-oncology companies will hit the FDA wall in the next twelve to eighteen months — and that the cause will not be that the models were bad, but that no one on the team was calibrated to what the agency was actually going to ask. The piece named three failure modes — the endpoint that never was a drug, the patient who was never going to enroll, the black box that cannot be audited — and traced all three back to a single missing input: calibration.

I sell calibrated judgment for a living. So it would be intellectually dishonest of me to publish a thesis without also publishing the conditions under which I would know I was wrong. And it would be cheap to publish those conditions without committing to test them on a calendar.

I therefore commit to two specific public assessments:

Twelve months from publication — a public post-mortem on the six conditions named below: which were met, which were partially met, which were unmet. With what I learned and what I would do differently.
Twenty-four months from publication — the final assessment, on the same conditions, on the same standard.

The post-mortems will publish whether the thesis held or failed. They will publish on time.

These are the six conditions.

1. Volume

For this falsifier I track a locked universe of biotech-shaped AI-oncology companies — those pre-IND or in Phase I, with Series A or B closed between January 2024 and this piece's publication date, with AI load-bearing in target discovery, candidate prioritization, patient selection, or endpoint adjudication, and public enough to be observable. Fifteen companies meet all four criteria. The seed list is locked on publication date and published in full alongside this piece. Companies can exit the universe (acquired, IPO'd into a different category, ceased operations); they cannot be added retrospectively to dilute or inflate the rate.

If, eighteen months from publication, fewer than one-third of the companies in that universe have hit one of the three failure patterns — Complete Response Letters, clinical-hold extensions that keep extending, pre-IND meetings that ended without an agreed endpoint, programs quietly deprioritized — then the rate is overstated. The directional claim might survive; the twelve-to-eighteen-month urgency framing would not, and the call to act now would need to be retired.

2. Distribution

If observed failures cluster heavily on a single failure mode — almost all endpoint failures, or almost all enrollment failures, or almost all translation failures — then the "three modes from one root" framing is too tidy. The root cause would be narrower than the calibration argument requires, and the engagement model I propose should be sharpened around the dominant mode rather than offered as a general defense across all three.

This is the most judgment-laden of the six conditions. I have not pre-registered a specific clustering threshold because the failure modes are not cleanly separable in practice — a program can fail Mode 1 and Mode 3 simultaneously, or fail Mode 2 in a way that is downstream of Mode 1. Rather than pretend to a precision I cannot defend, I will report the full distribution at each post-mortem and let readers judge whether the "three from one root" framing held. The cost is interpretive latitude; the alternative would be false precision.

3. Causation

If AI-oncology companies that fail in the predicted patterns turn out to have had senior, calibrated clinical-regulatory operators embedded and still failed, then the missing-input thesis is wrong. The cause was something else — financing, science, market, talent — and the calibration framing is right as a concept but wrong as the explanatory variable for the predicted failures.

This is the hardest falsifier to test publicly, because it requires knowing the internal composition of failed companies and whether they had calibrated clinical-regulatory judgment embedded. I will use a public-information proxy: failed companies in the tracked universe whose disclosed leadership team included at least one senior clinical-regulatory operator with prior IND filing experience and at least one prior FDA oncology approval involvement. The proxy is imperfect — calibration is more than credentials — but it is testable from public sources, and I will report against it transparently rather than waving the difficulty away.

4. Closure

If a meaningful share of AI-oncology companies clear the FDA wall without an embedded fractional CMO or co-founder — relying instead on heavy advisory boards, big-pharma partnerships, or late-stage regulatory consultants brought in after the first agency interaction — then the engagement model I proposed is wrong even if the diagnosis is right. The advisory-board route, which I argued against in the original piece, would need to be re-examined honestly. Some sponsors will get there through different paths than the one I am offering. If enough of them do, my path is not the path.

A clarification, since I do offer a board director and scientific advisor structure as one of three engagement options: the "advisory-board route" I argued against is the name-on-deck retainer model — many advisors, low cadence, no decision ownership, no accountability for outcomes. The board director or scientific advisor engagement I offer is different in kind: defined cadence, specific decision support, embedded in the program rather than adjacent to it, equity-anchored, with explicit accountability. Both that engagement and a fractional CMO engagement count as "embedded" for the purposes of this falsifier. The falsifier still holds: if companies are clearing the FDA wall with the model I disfavor and without the one I propose, my path is not the path.

5. Timing

If the reckoning extends past twenty-four months without arriving, the model may still be right but the urgency framing — the narrow window, the credibility reckoning twelve to eighteen months away — is wrong. The go-to-market premise of the work I am offering would need to be rebuilt around a longer horizon. A pattern that takes thirty-six or forty-eight months to materialize is not a pattern sponsors should reorganize themselves around today.

Condition 1 tests the rate at eighteen months. This condition tests whether the urgency framing survives even if the rate eventually arrives. The two windows are not redundant: a thesis can be right about what happens and wrong about when, and the call to act now depends on both being right.

6. Calibrated-departure traction

If the reasoned departures I help shape — proposals into agency-acknowledged territory of uncertainty: accelerated approval on early response, novel surrogate endpoints, tumor-agnostic indications, external controls — do not get traction at pre-IND or in the subsequent agency interactions, then the calibrated-departure disposition is overstated. The offer should be narrowed to fitting the published guidance well, and the disposition language I have used recently should be retired.

This is the falsifier whose outcome I most directly affect. It is therefore the one I have the strongest obligation to track and to disclose honestly. I have placed it last for that reason.

This is also the falsifier most constrained by client confidentiality. I cannot, and will not, disclose details of specific engagements. Instead, the post-mortems will report aggregate, anonymized outcomes across all engagements I had during each window: of the reasoned departures I helped shape, what fraction reached agreement with the agency at pre-IND, what fraction were accepted in subsequent interactions, and what fraction were rejected. The aggregates will be reported across every relevant engagement, not selectively. The cost of confidentiality is loss of resolution; the cost of selectivity would be loss of credibility. I will accept the first.

A second honest constraint: the resolution of this falsifier scales with the number of engagements I take during each window. I am seeking one engagement, at most two, at this point in the franchise — by design, not by accident. That means the denominator for this condition will be small, and a post-mortem reporting "two of two reached pre-IND agreement" or "one of two was rejected" is a different kind of evidence than the aggregate rates the other five conditions will produce. I will report what is testable at the resolution it is testable at, and I will not pretend to more.

A pattern I have observed across thirty years in oncology drug development: the people who get strategic judgment right over time are the ones who publish their predictions, hold themselves to them, and publish the post-mortem regardless of which way it ran. The discipline matters more than the result.

The two post-mortems will publish on schedule. Their conclusions will not be known in advance.

Jesús Gómez-Navarro, M.D., is a medical oncologist and drug development executive, and founder of OncAdios LLC. His original thesis, "Three Ways AI-Oncology Companies Will Fail at the FDA — and the Clinical Expertise Gap That Causes It", was published on May 12, 2026.

← Back to Writing · Seed list (Condition 1 denominator) →