
One simple way to decide which data to generate, which models to trust, and when to stop spending before Phase II turns uncertainty into sunk cost.
Most leadership debates about program risk are framed as categories: efficacy, safety, commercial, operations. It sounds orderly. It is also misleading. A program usually fails because the exposure you need and the exposure you can tolerate sit closer together than you admitted, and the uncertainty around both is wider than your plan assumes. Call it what it is: an overlap problem. Phase II is where overlap becomes a write-off.
A third view: shift, squeeze, spread
Brian’s argument this month is that drug development behaves like math whether we like it or not. Nick’s argument is that failure rarely has a single cause and the causes bleed into each other. Put those together and you get a more useful operating view:
Every failure mode does one of three things.
Shift. It moves the benefit threshold or the harm threshold. Target tissue exposure issues, weak target engagement, and sloppy endpoints shift the benefit threshold in the wrong direction. Off-target pharmacology, time-dependent liabilities, and susceptibility shift the harm threshold in the wrong direction.
Squeeze. It collapses the usable band by adding time or operational constraints. Late-emerging toxicity, slow recruitment, endpoint noise, and budget compression all shrink your room to maneuver.
Spread. It increases variability. Metabolism differences, drug-drug interactions, comorbidity, adherence, site effects, and protocol drift widen the distributions until the average stops being a decision tool.
That framing is not a slogan. It forces a concrete question at every gate: are we narrowing the overlap between “needs” and “cannot tolerate,” or are we just collecting more artifacts that feel scientific? Once you accept that, Brian’s quantitative framing stops being an analogy and starts being a design requirement.
Extending Brian: stop treating toxicity as a snapshot
Brian’s most decision-relevant point is not that benefit and harm can be quantified. Everyone nods at that and then returns to the comfort of categories. The point is that toxicity is often a trajectory, not an event.
The operational implication is straightforward and, in most organizations, still underused: if a liability emerges after an adaptive phase, then your early work should be built to detect the bend in the curve, not just the end state.
That means three changes.
First, instrument time, not just dose. In vitro systems and animal work should be designed with repeated functional and mechanistic readouts that let you see when compensation is failing. The decision is not “is there an effect.” The decision is “at what exposure and after what duration does the system stop coping.”
Second, pick a transition marker that can travel. Brian’s example of mitochondrial disruption in human cardiomyocytes is instructive because it forces you to look upstream of organ failure. The point is not to collect mechanistic readouts because they are interesting. The point is to identify a small set of markers you can measure early, that correlate with later functional decline, and that you can plausibly monitor in people. If you use AI here, use it for a bounded purpose: integrate the time-series so you can set an exposure cap or a monitoring trigger that you would otherwise hand-wave. That is the decision. Not “deploy AI.”
Third, turn the trajectory into a gate. If you cannot describe what pattern would cause you to slow escalation, change formulation, or stop a program, then your “early warning” system is ornamental. The decision owners should agree, in advance, what the trajectory has to look like to earn another month of spend.
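One way to make that agreement concrete is to write the gate down before the data arrives. The sketch below is illustrative only. The function names, the equally spaced time points, the "higher is worse" convention, and the mapping from pattern to action are my assumptions, not a validated method. A real package would also need replicates and noise handling.

```python
from typing import Sequence


def classify_trajectory(readouts: Sequence[float], tolerance: float = 0.0) -> str:
    """Label a marker time-series as 'accelerating' or 'not accelerating'.

    Assumes equally spaced time points and that higher values mean more
    burden. Both are simplifying assumptions for illustration.
    """
    if len(readouts) < 3:
        raise ValueError("need at least three time points to see a bend")
    # Change between consecutive time points.
    increments = [b - a for a, b in zip(readouts, readouts[1:])]
    # Change in the change: positive means the curve is bending upward.
    bends = [b - a for a, b in zip(increments, increments[1:])]
    mean_bend = sum(bends) / len(bends)
    return "accelerating" if mean_bend > tolerance else "not accelerating"


# Hypothetical mapping from pattern to action, agreed before the readout.
GATE_ACTIONS = {
    "accelerating": "pause escalation and review",
    "not accelerating": "continue to next month of spend",
}


def gate_decision(readouts: Sequence[float]) -> str:
    """Turn a trajectory into the pre-agreed action."""
    return GATE_ACTIONS[classify_trajectory(readouts)]
```

The code is trivial on purpose. What matters is that the decision owners fill in the action table before anyone sees the curve.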
This is how you take Brian’s math and force it into portfolio behavior.
Now, zoom out to Nick’s map of why programs fail.
Reframing Nick: more data is not the same as more decision power
Nick is right to highlight that failure reasons blur together. Dosing constraints are rarely separable from efficacy, and “strategic” failures often arrive because the science did not reduce uncertainty fast enough to justify the next check. Where I would tighten his framing is on the implied remedy. When the handoff from models to patients is hard, the reflex is to conclude we have a data generation problem. Organoids, microphysiological systems, imaging, registries, real-world data, digital endpoints. All of these can help. They also routinely fail to help, for a simple reason: data that does not change a decision is just overhead with a scientific accent.
So here is the rule I wish more leadership teams enforced. Before you commission a new data stream, answer two questions in plain language.
Which of the three moves is this data supposed to make: shift, squeeze, or spread?
What decision will change if the data comes back “bad,” and who has the authority to act on it?
If you cannot answer both, do not blame the maturity of the modality. Blame governance.
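The two questions are simple enough to encode as a checklist. This is a minimal sketch, assuming a hypothetical record per data stream. The field names and example entries are invented for illustration and do not describe any real program.

```python
from dataclasses import dataclass
from typing import List, Optional

MOVES = {"shift", "squeeze", "spread"}


@dataclass(frozen=True)
class DataStream:
    """A proposed data stream and the answers to the two questions."""
    name: str
    move: Optional[str]             # which of the three moves it targets
    decision_if_bad: Optional[str]  # what changes if the result is bad
    owner: Optional[str]            # who has authority to act on it


def governance_gaps(stream: DataStream) -> List[str]:
    """Return the unanswered questions; an empty list means fundable."""
    gaps = []
    if stream.move not in MOVES:
        gaps.append("no move named (shift, squeeze, or spread)")
    if not stream.decision_if_bad:
        gaps.append("no decision changes on a bad result")
    if not stream.owner:
        gaps.append("no owner with authority to act")
    return gaps
```

A stream that returns an empty list has earned a budget conversation. One that returns three gaps is the overhead described above.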
This is also the quiet trap with EHR and registry analyses. They are powerful for feasibility and for hypothesis support, but they are not automatically causal. If you are using real-world patterns to claim “target validation,” the constraint is confounding. You need pre-specified checks that would convince you that you are seeing biology, not healthcare behavior. Otherwise you will move forward with a false sense of certainty, and the overlap will still be waiting for you.
Nick’s instinct to ask for new types of data is correct. The fix is to make that instinct conditional on decision impact.
A concrete workflow: the Three Curves Test
Here is a realistic scenario.
A mid-size biotech has an oral immunology asset with clean potency and a plausible mechanism. In animals, the program shows mild liver signals that appear late and resolve after washout. Early human PK suggests wide variability. The clinical team wants to move fast into Phase II because the competitive landscape is heating up. I have watched teams handle this two ways. They either bury the program under more studies, or they “manage risk” with the word “monitor” and hope the spreadsheet cooperates. Neither is a plan.
A better move is to run the Three Curves Test in the Phase II planning meeting.
Mental model: The Three Curves Test
Curve 1, Benefit. What exposure in the target tissue is required for a clinically meaningful effect, and how confident are you?
Curve 2, Burden. What exposure and duration produce an unacceptable liability, and what is the earliest marker that predicts it?
Curve 3, Variability. How wide is the distribution of exposure and response in the intended population, and what are the drivers (metabolism, interactions, comorbidity, adherence)?
Decision rule. If the uncertainty bands around Curve 1 and Curve 2 overlap once you apply Curve 3, you do not have a Phase II problem. You have a measurement problem.
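The decision rule can be sketched in a few lines. This is one possible reading of “overlap once you apply Curve 3,” not a validated pharmacometric method. The `Band` type, the `exposure_spread` ratio, and all numbers are assumptions chosen for illustration, and the sketch ignores duration, which a real analysis of Curve 2 would need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    """Uncertainty range for an exposure threshold, in arbitrary units."""
    low: float
    high: float


def overlap_margin(benefit: Band, burden: Band, exposure_spread: float) -> float:
    """Return the margin between Curves 1 and 2 after applying Curve 3.

    exposure_spread is the assumed ratio of the highest to the lowest
    plausible patient exposure at one dose. The lowest-exposed patient
    must still reach the pessimistic benefit threshold (benefit.high).
    At that dose the highest-exposed patient sits at
    benefit.high * exposure_spread, which must stay under the pessimistic
    burden threshold (burden.low). A margin below 1 means overlap.
    """
    if exposure_spread < 1:
        raise ValueError("exposure_spread is a ratio and must be >= 1")
    if benefit.high <= 0:
        raise ValueError("benefit threshold must be positive")
    return burden.low / (benefit.high * exposure_spread)


def three_curves_test(benefit: Band, burden: Band, exposure_spread: float) -> str:
    """Apply the decision rule: overlap means a measurement problem."""
    margin = overlap_margin(benefit, burden, exposure_spread)
    return "separated" if margin >= 1 else "overlap: measurement problem"
```

With a hypothetical benefit band of 10 to 20 and a burden band of 80 to 120, a threefold exposure spread leaves the curves separated, while a fivefold spread produces overlap. The bands did not move. Variability alone closed the window.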
Now apply it.
The team commissions a short, deliberately designed human liver system that is stable long enough to capture the delayed signal and that includes functional readouts, not just cell death. The explicit decision is to determine whether the late signal is an adaptive response that plateaus, or a progression that accelerates with time. In parallel, instead of asking AI to “find patterns” in a pile of data, the team uses analytics for one bounded decision: identify which baseline features predict high exposure in humans, so inclusion criteria and drug-drug interaction guidance can be tightened before Phase II enrollment ramps. Then the Phase II protocol links monitoring to action. Not “we will monitor liver enzymes.” A specific threshold that changes dosing or stops treatment in a subgroup, with a rationale that ties back to the preclinical trajectory work.
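An action rule of that kind fits in a dozen lines. The sketch below is hypothetical. The marker name and the threshold values in the usage note are placeholders, not clinical guidance, and a real protocol would derive them from the preclinical trajectory work and regulatory input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionRule:
    """Links one monitored marker to pre-specified actions."""
    marker: str
    reduce_dose_at: float  # placeholder threshold, in multiples of baseline
    stop_at: float         # placeholder threshold, in multiples of baseline

    def __post_init__(self) -> None:
        # The stop threshold must sit above the dose-reduction threshold.
        if self.stop_at <= self.reduce_dose_at:
            raise ValueError("stop_at must exceed reduce_dose_at")

    def action(self, value: float) -> str:
        """Return the pre-agreed action for an observed marker value."""
        if value >= self.stop_at:
            return "stop treatment"
        if value >= self.reduce_dose_at:
            return "reduce dose"
        return "continue"
```

For example, `ActionRule("liver enzyme", reduce_dose_at=2.0, stop_at=4.0).action(2.5)` returns `"reduce dose"`. The placeholder numbers are arbitrary. The point is that every observed value maps to exactly one action.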
This is also where regulators become unexpectedly practical. They do not need you to be omniscient. They need you to be explicit. If you want to lean on a novel marker, or replace a legacy step with a different package, show the action rule and show what you will do when it breaks. Ambiguity is what gets punished.
Same budget class. Different intent. You are spending to narrow the overlap, not to look thorough.
Brian is pushing us to treat benefit and harm as a quantitative balance over time. Nick is pushing us to recognize that failures emerge from a web of causes, many of them operational. The actionable synthesis is simple:
Manage overlap under uncertainty. Do not wait for the after-the-fact label.
So here is the decision. Will you keep funding data that makes you feel informed, or will you fund only data that narrows the overlap enough to change what you do next? Pick one program this month. Run the Three Curves Test. If you cannot fill it with numbers, ranges, and owners, you have your answer.
I hope you will take these actions away from this letter:
Require every program entering a major gate to present the Three Curves Test on one slide, including uncertainty ranges and the next decision it will change.
Redesign one high-risk liability package to measure the trajectory and the transition marker, not just the end state.
Audit your current “innovation” data streams and cancel at least one dataset that cannot be tied to shift, squeeze, or spread and a named decision owner.
Replace vague monitoring language in Phase I and early Phase II protocols with one explicit action threshold tied to an early marker.
Assign ownership for variability. If nobody owns exposure distribution and its drivers, your average-patient assumptions will keep failing in real populations.