This is not an argument against AI tools. ChatGPT, Perplexity, and Claude are genuinely powerful for synthesising what is known. This is an argument about something more specific: why the market size number your BD deck is built on — if it came from an AI tool — is structurally more likely to be wrong than right, and why that matters more in biotech than in almost any other sector.
Start with what AI market sizing tools actually do. They read published sources — analyst reports, academic literature, government health data, industry databases, clinical trial registries — and synthesise what those sources say about market size, growth trajectory, patient population, and comparable treatment penetration. They do this at extraordinary speed and present the output with a confidence and formatting that reads like rigorous research.
The problem is not the tool. The problem is what the tool is reading. And what it is reading is, by definition, backward-looking.
The Structural Problem with AI-Generated TAM in Biotech
Market sizing in biotech is not a literature synthesis exercise. It is a primary data problem. The variables that determine whether a market size estimate is commercially useful — real-world treatment adherence, actual payer coverage rates, prescriber intent in the specific indication, patient identification rates in the real healthcare system — are not published anywhere. They exist in the lived behaviour of patients, physicians, and payer decision-makers. And that means no AI tool reading published sources can access them.
This is not a limitation of current AI. It is a structural property of how market knowledge is produced and distributed. Here is what that looks like in practice across the five variables that matter most in biotech commercial modelling.
Variable 1: Real-World Treatment Adherence
Clinical trial adherence data — the figure that anchors most market size models — is produced in controlled conditions: monitored patients, reimbursed access, motivated sites, and active follow-up. These conditions do not exist in the commercial market. Published real-world studies consistently document that actual patient adherence in commercial settings is substantially lower than trial adherence, with the gap widening further in markets with restricted payer coverage or cost-sharing requirements.
What AI gives you: A market size built on trial adherence rates or on published real-world studies from 12 to 24 months ago in a comparable indication.
What is actually true: Adherence rates vary by payer type, geography, patient population, and the competitive treatment options available in your specific launch window. Those variables are dynamic and indication-specific. The only way to establish them with the precision a commercial model requires is primary research with the patients and physicians who will drive adherence in your market.
Variable 2: Payer Coverage Rates
For any new biotech therapy, the commercial market size is not all patients with the indication. It is the patients whose access is reimbursed at a level where the therapy can be prescribed and sustained. Payer coverage rates — the percentage of relevant insurance plans that will cover the therapy for your indication at your planned price point — are the single most important variable in translating patient prevalence into a commercially accessible market.
What AI gives you: Coverage rate data for comparable therapies in comparable indications, drawn from published managed care studies, PBM formulary analyses, and publicly available health plan data.
What is actually true: Coverage decisions are made indication by indication, price point by price point, and modality by modality, by specific formulary committees at specific payers. The coverage rate your therapy will receive is not the coverage rate a comparable therapy received 18 months ago. It is determined by the current internal thinking of payer decision-makers who have not yet made that decision, and that information exists nowhere in any published source.
Variable 3: Prescriber Intent in Your Specific Indication
Treatment guidelines describe approved pathways. They do not describe what physicians will actually prescribe when faced with a new therapy in your indication, at your anticipated price point, against the competitive options that will exist at your launch date. The gap between what guidelines say and what physicians do is well-documented in every therapeutic area — and it is different for every new therapy based on the specific clinical context and the specific physician population being targeted.
What AI gives you: A synthesis of published treatment guidelines, formulary data, and real-world prescribing studies — most of which reflect the pre-existing treatment landscape, not the one your therapy will enter.
What is actually true: Prescriber intent for a new therapy is established only by asking prescribers directly — what patient profiles they would treat, what prior therapy exposure they require, what evidence threshold they need before adopting, and how they weigh your therapy against the alternatives that exist at the time of their decision. A primary survey of 300 to 500 relevant specialists in your indication produces data that no secondary source contains and no AI tool can generate.
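To get a rough sense of what a sample of that size buys, the sketch below computes the 95% margin of error on a surveyed proportion using the standard normal approximation. It assumes simple random sampling from a large specialist population, which real physician panels only approximate. The 40% intent figure and the function name are hypothetical placeholders, not findings.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical: 40% of surveyed specialists say they would prescribe.
stated_intent = 0.40

for n in (300, 500):
    moe = margin_of_error(stated_intent, n)
    print(f"n={n}: {stated_intent:.0%} +/- {moe:.1%}")
```

Under these assumptions, a sample of 300 to 500 specialists pins the estimate down to roughly four to six percentage points either way.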
Variable 4: Patient Identification Rates in the Real Healthcare System
Prevalence data, the denominator in every total addressable market (TAM) calculation, is derived from epidemiological studies, disease registries, and administrative health data. These sources count patients who have been diagnosed and recorded in a system. In rare disease, in under-diagnosed conditions, and in emerging indications where clinical awareness is evolving, the gap between the true patient population and the diagnosed, recorded population is substantial.
What AI gives you: Published prevalence figures, disease registry data, and epidemiological study estimates.
What is actually true: The patient identification rate — the percentage of prevalent patients actually finding their way to the physicians who will prescribe your therapy — varies by geography, healthcare system, specialty penetration, and diagnostic awareness in the current clinical community. In rare disease, this is often the single largest variable in the commercial model, and it is almost never accurately represented in published prevalence data.
Variable 5: Competitive Treatment Landscape at Launch
Market size models are built at a point in time. By the time a therapy reaches commercial launch, the competitive landscape has often changed materially. Competing therapies have progressed in clinical development. Biosimilar competition has emerged. Off-label use of adjacent therapies has reshaped standard of care. Treatment algorithms have been updated by clinical societies. None of these developments are visible in a market model built 24 months before launch.
What AI gives you: The competitive landscape as it exists in currently published trial registries, approved label databases, and analyst pipeline trackers.
What is actually true: The competitive treatment landscape your therapy will enter is only partly visible from published sources. The invisible part includes unpublished late-stage data, evolving prescriber perceptions of current standard of care, and payer formulary strategies that have not yet been formalised.
What AI Market Sizing Actually Produces — and Why It Looks Right
The output of an AI-generated biotech TAM model is not wrong in the way that a calculation error is wrong. It is wrong in a more dangerous way: it is internally consistent, plausible, and well-referenced, and it is systematically biased upward, which is the direction commercial teams want it to be biased.
A Worked Illustration: Why the Same Therapy Gets Different Numbers
Consider a therapy targeting a metabolic or cardiovascular indication with a published patient prevalence of 30 million adults in the United States. An AI-generated TAM model applied to this population might produce a market size in the tens of billions of dollars — consistent with published analyst estimates and based on reasonable assumptions about treatment penetration, pricing, and persistence.
Primary research on the same indication reveals a different picture at every step.
At each step, the primary research finding is directionally consistent with the general pattern secondary data describes — but meaningfully different in magnitude. When compounded across a full commercial model, the addressable market the therapy can realistically access in its first three years is substantially smaller than the TAM an AI tool reading published sources would produce.
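The compounding effect can be sketched as a simple funnel. Only the 30 million prevalence figure comes from the illustration above. The step names and every rate in the code are hypothetical placeholders chosen to show the arithmetic, not research findings.

```python
# Published prevalence from the illustration above; every rate below is a
# hypothetical placeholder, not a research finding.
PREVALENCE = 30_000_000

# (step, secondary-data assumption, primary-research finding)
STEPS = [
    ("identified by a prescriber", 0.60, 0.50),
    ("covered by payer",           0.70, 0.55),
    ("prescribed the therapy",     0.30, 0.22),
    ("adherent at 12 months",      0.75, 0.55),
]

def addressable_patients(prevalence: int, rates: list[float]) -> float:
    """Multiply prevalence through each funnel step."""
    patients = float(prevalence)
    for rate in rates:
        patients *= rate
    return patients

secondary = addressable_patients(PREVALENCE, [s for _, s, _ in STEPS])
primary = addressable_patients(PREVALENCE, [p for _, _, p in STEPS])

print(f"Secondary-data funnel: {secondary:,.0f} patients")
print(f"Primary-data funnel:   {primary:,.0f} patients")
print(f"Ratio: {primary / secondary:.2f}")
```

In this made-up example no single step is lower by more than about 27 percent, yet the compounded addressable population is roughly a third of the secondary-data figure.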
What Primary Validation Changes
The purpose of primary market research in commercial model validation is not to replace AI tools. It is to test the assumptions that AI tools cannot generate — and to confidence-score the output of the commercial model with the actual data that determines whether each assumption is correct.
The commercially useful output of this process is not a different number. It is a number with a known confidence architecture — where the commercial team understands which assumptions are robust, which are uncertain, and specifically which uncertain assumptions, if wrong, would change the strategic decision most materially.
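One way to make that idea concrete is a one-at-a-time sensitivity check: hold every assumption at its base value, move one across its plausible range, and rank assumptions by how far the output swings. The sketch below is illustrative only. The assumption names, base values, and ranges are hypothetical, with a wide range standing in for an assumption that has not been tested with primary research.

```python
# All values are hypothetical placeholders. A wide range stands for an
# assumption that has not been tested with primary research.
PREVALENCE = 30_000_000  # published prevalence from the worked illustration

BASE = {"identification": 0.50, "coverage": 0.55, "intent": 0.22, "adherence": 0.55}
RANGES = {
    "identification": (0.45, 0.55),
    "coverage":       (0.30, 0.70),
    "intent":         (0.18, 0.26),
    "adherence":      (0.40, 0.65),
}

def patients(assumptions: dict[str, float]) -> float:
    """Addressable patients: prevalence multiplied by every assumption."""
    result = float(PREVALENCE)
    for rate in assumptions.values():
        result *= rate
    return result

def swing(name: str) -> float:
    """Change in addressable patients when one assumption moves low to high."""
    low, high = RANGES[name]
    return patients({**BASE, name: high}) - patients({**BASE, name: low})

# Largest swing first: these are the assumptions most worth validating.
ranked = sorted(RANGES, key=swing, reverse=True)
for name in ranked:
    print(f"{name:<15} swing = {swing(name):>10,.0f} patients")
```

With these placeholder ranges the payer coverage assumption moves the output most, so it would be the first candidate for primary validation. A different set of ranges would produce a different ranking.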
That is intelligence. An AI-generated TAM is information. In biotech, the difference between those two things is the difference between a commercial strategy that works and one that fails in the market before anyone has understood why.
The Three Questions Your TAM Model Should Be Able to Answer
These are the tests of whether a biotech market size model is built on a foundation that will survive commercial reality.
1. What is the confidence score on your payer access assumption? If the answer is "we used comparable therapy coverage curves", that is a secondary data assumption that has not been tested against the payer decision-makers who will determine your specific coverage. A primary survey of 20 to 30 formulary directors in your key payer segments can resolve this before your Biologics License Application (BLA), not after your launch.
2. At what out-of-pocket cost threshold does your modelled adherence rate break? If the commercial model has a single adherence assumption that does not vary by cost-sharing exposure, it is not modelling the real market. It is modelling an idealised one. A primary patient survey establishes the cost sensitivity function that determines where your adherent population sits under each payer architecture scenario.
3. What proportion of the physicians in your target specialty will prescribe your therapy within the first 12 months of launch, and what is the specific barrier that stops the rest? If the answer is derived from historical launch curves rather than from direct prescriber intent surveys, you are planning a launch against a prescriber population you have not spoken to. The question, asked directly of 300 to 500 relevant specialists, yields the intelligence that shapes early launch investment and sales force deployment more than any other single data point.
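The second question above can be illustrated with a small sketch of a cost sensitivity function: adherence looked up as a step function of monthly out-of-pocket cost, then blended across a payer mix. Every threshold, adherence rate, payer segment, and share below is a hypothetical placeholder of the kind a patient survey would supply, not real data.

```python
from bisect import bisect_right

# Hypothetical cost-sensitivity curve from a patient survey:
# monthly out-of-pocket cost thresholds (USD) and the adherence rate
# that applies from each threshold up to the next. Not real data.
THRESHOLDS = [0, 25, 75, 150]
ADHERENCE = [0.80, 0.70, 0.50, 0.30]

def adherence_at(out_of_pocket: float) -> float:
    """Step-function lookup of adherence for a given monthly cost."""
    if out_of_pocket < 0:
        raise ValueError("out-of-pocket cost cannot be negative")
    index = bisect_right(THRESHOLDS, out_of_pocket) - 1
    return ADHERENCE[index]

# Hypothetical payer mix: (share of patients, monthly out-of-pocket cost).
PAYER_MIX = {
    "commercial, preferred tier": (0.40, 20),
    "commercial, specialty tier": (0.35, 100),
    "high-deductible plan":       (0.25, 200),
}

# Weight each segment's adherence by its share of patients.
blended = sum(share * adherence_at(cost) for share, cost in PAYER_MIX.values())
print(f"Blended adherence: {blended:.1%}")
```

In this invented scenario, a single flat adherence assumption of 80% would overstate the adherent population by roughly 40 percent relative to the blended figure, which is the kind of gap a model with one adherence number cannot show.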
