AI Drug Discovery & Wet Lab Startups: Market Analysis

01 The Incyte–Genesis Deal

On May 20, 2026, Incyte (Nasdaq: INCY) and Genesis Molecular AI announced a major expansion of their strategic collaboration—one of the first pharma-AI partnerships to feed large-scale foundation model training with a partner’s proprietary experimental data.

Deal Structure

Total upfront: $120M ($80M cash + $40M equity investment in Genesis)

Potential milestones: $1B+ across five initial collaboration targets

Per-program milestone ceiling: $232M (preclinical through commercial)

New collaboration targets: 5+, with an option to nominate more over time
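
As a quick check on how the headline figures fit together, the sketch below recomputes them from the terms listed above. It assumes every one of the five initial targets reaches the full per-program ceiling, which is a best case rather than a forecast. The variable names are illustrative.

```python
# Deal terms as stated above, in millions of USD.
UPFRONT_CASH_M = 80
UPFRONT_EQUITY_M = 40
PER_PROGRAM_CEILING_M = 232
INITIAL_TARGETS = 5

total_upfront_m = UPFRONT_CASH_M + UPFRONT_EQUITY_M
# Best case (assumption): every initial target pays out its full milestone ceiling.
max_milestones_m = PER_PROGRAM_CEILING_M * INITIAL_TARGETS

print(f"Total upfront: ${total_upfront_m}M")
print(f"Maximum milestones on initial targets: ${max_milestones_m / 1000:.2f}B")
```

Five programs at the $232M ceiling come to $1.16B, which is consistent with the "$1B+" headline.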

Why This Deal Matters

Genesis’ GEMS platform (Genesis Exploration of Molecular Space) includes foundation models for protein-ligand structure and property prediction. The company is backed by investors including a16z and NVIDIA. The deal creates what Evan Feinberg, Genesis CEO, calls an “industrial-scale flywheel of AI-enabled design-make-test cycles.”

“High-quality proprietary data is among the most valuable inputs for advancing molecular AI. This expanded collaboration will enable both companies and patients to benefit from an industrial-scale flywheel.”
— Evan Feinberg, Ph.D., Founder & CEO, Genesis Molecular AI

The deal builds on an initial collaboration from February 2025. Incyte will share proprietary experimental data with Genesis to enhance the models, a critical differentiator. Incyte retains exclusive rights to develop and commercialize all resulting compounds, while Genesis receives recurring research funding for compute workloads. Programs beyond the initial five could yield “several billion dollars” in additional milestone payments.

02 Market Landscape

Market Sizing

The AI drug discovery market is at an inflection point. Multiple data sources converge on a consistent story: rapid growth from a relatively small base.

AI Drug Discovery: ~$2.9B in 2026 → $13.8B by 2033 (Grand View Research)
Broader forecast: $3.6B (2024) → ~$50B by 2034 (ChemLex/industry analysts)
Drug Discovery Informatics: $7.5B in 2025, 12.5% CAGR through 2030 (Technavio)
Digital Biology (inclusive): $45B in 2024 → $125B by 2033, 12.5% CAGR
Lab Robotics: $1.2B in 2024 → $3.2B by 2033, 10.8% CAGR
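
Because the forecasts above use different base years and horizons, it can help to put them on a common footing. The sketch below derives the compound annual growth rate implied by each pair of endpoints. The function name and the choice to measure the horizon as end year minus start year are assumptions made for illustration.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1


# (label, start in $B, end in $B, start year, end year), as quoted above.
forecasts = [
    ("AI drug discovery (Grand View Research)", 2.9, 13.8, 2026, 2033),
    ("Broader forecast", 3.6, 50.0, 2024, 2034),
    ("Digital biology", 45.0, 125.0, 2024, 2033),
    ("Lab robotics", 1.2, 3.2, 2024, 2033),
]

for label, start, end, first_year, last_year in forecasts:
    rate = implied_cagr(start, end, last_year - first_year)
    print(f"{label}: {rate:.1%} per year")
```

On these endpoints the digital biology and lab robotics rows imply roughly 12.0% and 11.5% per year, a little off the quoted 12.5% and 10.8%. The gap may reflect different base years or rounding in the underlying reports.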

Global pharma R&D deals hit $86.7B in 2025—up 49% YoY—with AI driving a shift toward fewer, larger, more targeted partnerships averaging $1.16B each (IQVIA).

Deal Flow Is Accelerating

The Incyte–Genesis deal joins a cascade of nine- and ten-figure AI pharma partnerships. Companies are concentrating capital on AI-driven platforms that appear to offer a better probability of clinical success, rather than spreading bets across many smaller collaborations.

Partnership	Value	Year	Focus
Sanofi × Exscientia	$5.2B	2022	Oncology, immunology (15 targets)
Isomorphic × Eli Lilly	$1.7B	2024	Small-molecule discovery (multi-target)
Isomorphic × Novartis	$1.2B	2024	Small-molecule discovery (3 targets)
NVIDIA × Eli Lilly	$1.0B	2025	Next-gen AI research lab
AstraZeneca × CSPC Pharma	$5.3B	2025	AI platform-driven pipeline
Incyte × Genesis	$120M upfront + $1B+ milestones	2026	Foundation model + proprietary data flywheel
Almirall × Absci	$650M (milestones)	2026	AI-designed biologics for dermatology

03 Key Players & Competitive Map

Company	Approach	Wet Lab?	Key Milestones
Recursion (Public / RXRX)	Phenomics + computer vision foundation models; merged with Exscientia (2024)	Yes: millions of experiments/week, NVIDIA BioHive-2 supercomputer	$1.6B accumulated R&D spend; most comprehensive AI drug discovery stack
Isomorphic Labs (Alphabet / Private)	AlphaFold-derived structure prediction; small-molecule design	Minimal: primarily computational, partners provide validation	$2.7B raised (Series B from Thrive Capital); $3B in pharma deals with Lilly + Novartis + J&J
Insilico Medicine (Private)	Generative AI (GANs); Pharma.AI platform; end-to-end from target to clinic	Yes: LifeStar 2 automated lab; MMAI Gym for Science	Rentosertib: first AI-designed drug to reach Phase 2a; 30 preclinical candidates at 12-18 month pace
Absci (Public / ABSI)	Generative AI + synthetic biology for de novo antibody design	Yes: screens billions of cells/week; AI-to-validated-candidate in 6 weeks	Deals with AstraZeneca, Almirall ($650M), AMD ($20M strategic); ABS-201 in Phase 2
Genesis Molecular AI (Private)	GEMS foundation models for protein-ligand prediction; design-make-test loops	Via partner (Incyte data)	$120M Incyte deal; backed by a16z, NVIDIA
ChemLex (Private)	Self-driving chemistry lab; 24/7 autonomous synthesis	Yes: core differentiator is physical automation	$45M raised (Dec 2025); 70+ customers including 6 of top 10 pharma
Chan Zuckerberg Biohub (Non-Profit)	ESM protein world models; open-source AI for protein biology	Partner labs provide validation	ESM generation 4 models launched May 2026; validated in cancer + immune targets

Big Pharma Adopters

Sanofi, Novartis, Bayer, GSK, Eli Lilly, and AstraZeneca have all committed to deep AI integration. Lilly’s strategy is emblematic: a $1B lab partnership with NVIDIA, a $1.7B Isomorphic deal, and a $350M Innovent collaboration—all within 18 months.

Tech Disruptors

Google/DeepMind (via Isomorphic and AlphaFold), Microsoft (Novartis alliance, BioGPT), NVIDIA (BioNeMo platform, compute partnerships), and AWS (cloud lab infrastructure) are all building the picks-and-shovels layer. The CZ Biohub’s open-source ESM4 models, released in late May 2026, could democratize protein understanding in the same way AlphaFold did for structure prediction.

04 The Wet Lab Angle: Why Physical Matters

This is where the real startup opportunity lives. The biggest lesson from the last decade of AI drug discovery is stark: computational predictions without physical validation are insufficient.

The Data Flywheel Problem

AI models are only as good as the data they train on. The companies winning the biggest deals (Genesis, Recursion, Absci) share one trait: access to proprietary experimental data at scale, whether generated in-house (Recursion, Absci) or supplied by a partner (Genesis, via Incyte). This creates a compounding advantage:

AI predicts candidate molecules or biological targets
Wet lab validates (or falsifies) those predictions at high throughput
Results feed back into the AI model, improving its next predictions
Faster cycles mean more data, which means better models, which means faster cycles…

This is what Genesis calls “an industrial-scale flywheel of AI-enabled design-make-test cycles.” Companies without access to wet lab capabilities, whether their own or a partner’s, can’t close this loop.
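
The loop described above can be sketched as a toy simulation. Everything in it is a stand-in chosen for illustration: `wet_lab_assay` replaces a physical experiment with a hidden scoring function, `predict` is a deliberately crude nearest-neighbor model, and candidates are plain integers rather than molecules. It is not a description of how Genesis or any other company implements its platform.

```python
def wet_lab_assay(candidate: int) -> float:
    """Stand-in for a physical experiment (hypothetical ground truth, best at 70)."""
    return float(-((candidate - 70) ** 2))


def predict(candidate: int, measured: dict[int, float]) -> float:
    """Toy model: score of the nearest tested candidate, discounted by distance."""
    nearest = min(measured, key=lambda tested: abs(tested - candidate))
    return measured[nearest] - abs(nearest - candidate)


def run_flywheel(cycles: int, batch_size: int = 4) -> dict[int, float]:
    # Seed data: a few measurements available before the first cycle.
    measured = {c: wet_lab_assay(c) for c in (0, 50, 99)}
    for _ in range(cycles):
        untested = [c for c in range(100) if c not in measured]
        # Design: the model ranks untested candidates by predicted score.
        ranked = sorted(untested, key=lambda c: predict(c, measured), reverse=True)
        # Make and test: the lab measures the top batch and the results
        # feed back into the data the model predicts from.
        for c in ranked[:batch_size]:
            measured[c] = wet_lab_assay(c)
    return measured


for n in (1, 3, 6):
    data = run_flywheel(n)
    best = max(data, key=data.get)
    print(f"after {n} cycles: best candidate {best}, {len(data)} measurements")
```

In this toy setup the best measured candidate moves toward the hidden optimum as cycles accumulate. The model only improves because each batch of measurements is fed back into it, so a team that cannot run the assay step has nothing to iterate on.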

Why Purely Computational Approaches Hit a Wall

Clinical failures: Insilico’s Phase 2a results for Rentosertib showed safety but fell short on statistically significant efficacy. Recursion’s first clinical trial showed no reportable efficacy. BenevolentAI and Deep Genomics have both struggled. The gap between “looks good in silico” and “works in patients” remains enormous.
Data quality bottleneck: Public datasets are noisy, incomplete, and already exhausted by every competitor. Proprietary wet lab data is the new moat.
Biology is messy: Protein structure predictions (AlphaFold) are necessary but not sufficient. Drug behavior in actual cells (metabolism, toxicity, off-target effects) requires physical testing.

The Self-Driving Lab Thesis

Recursion collaborates with HighRes Biosolutions on self-driving, high-throughput labs using robotic perception, digital twins, and natural language-driven lab orchestration. ChemLex runs a 24/7 autonomous chemistry system that compresses months of synthesis into days. Emerald Cloud Lab pioneered the cloud-accessible lab concept with Carnegie Mellon.

The convergence of robotics, AI, and miniaturized biology means the cost of running a wet lab experiment is falling quickly while the value of the data it produces keeps rising. This is the core startup opportunity.

05 White Space & Startup Opportunities

Opportunity Map

Opportunity	Gap	Why Now
Lab-as-a-Service for AI Biotechs	Most AI-native companies (Genesis, Isomorphic) lack their own wet lab. They depend on partners or CROs that aren’t designed for rapid iteration.	AI-first biotechs need 10x faster turnaround than traditional CROs provide
Automated Assay Development	Setting up new biological assays is still artisanal. Each target requires custom protocols, cell lines, and readouts.	LLMs can now read protocols and suggest optimizations; robotics costs have dropped 60% in 5 years
Data-Quality-as-a-Service	Pharma companies have petabytes of experimental data but it’s siloed, inconsistently formatted, and hard to use for model training.	Foundation models require standardized, high-quality training data—there is no good middleware for this
Niche Therapeutic Area Vertical	Big players target cancer and immunology. Rare diseases, neglected tropical diseases, and agricultural biotech are underserved.	Smaller data requirements for rare diseases; regulatory incentives (orphan drug status); less competition
Biologics / ADC Design Platform	Most AI drug discovery focuses on small molecules. Antibody-drug conjugates (ADCs) and biologics are a $300B+ market with limited AI tooling.	Absci proved the model works; the ADC market is exploding (10+ new approvals since 2023)
AI-Native Contract Research	Traditional CROs (Covance, Charles River) are slow to adopt AI. A born-digital CRO could compress timelines 5-10x.	The design-make-test cycle is the bottleneck; whoever runs it fastest wins

06 Viable Business Models

A. Platform Licensing (the Genesis Model)

Build AI models, license them to pharma partners. Revenue = upfront payments + milestones + royalties. Pros: Capital-efficient, scales well, pharma absorbs clinical risk. Cons: You don’t own the drugs; value capture is capped by royalty rates (typically 1-5%).
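
To make the royalty cap concrete, the sketch below applies the 1-5% range to a single approved drug. The $1B annual sales figure is a hypothetical chosen for round numbers, not a figure from any deal discussed here, and the function name is illustrative.

```python
# Hypothetical annual net sales for one approved drug, in millions of USD (assumption).
ANNUAL_SALES_M = 1_000


def royalty_income_m(annual_sales_m: float, royalty_rate: float) -> float:
    """Licensor's yearly royalty income; the partner keeps the remainder."""
    return annual_sales_m * royalty_rate


for rate in (0.01, 0.05):
    income = royalty_income_m(ANNUAL_SALES_M, rate)
    print(f"{rate:.0%} royalty: ${income:.0f}M to the platform, "
          f"${ANNUAL_SALES_M - income:.0f}M of sales retained by the partner")
```

Under that assumption the platform's share runs from about $10M to $50M a year, which is why upfront payments and milestones carry most of the value in this model.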

B. AI-First CRO (the ChemLex Model)

Run physical experiments on behalf of clients using your AI + automated lab stack. Revenue = fee-for-service + data licensing. Pros: Recurring revenue from day one; you accumulate proprietary data from every experiment. Cons: Capital-intensive (lab buildout); operational complexity.

C. Hybrid: Platform + Internal Pipeline (the Recursion/Absci Model)

Use your platform for partner-funded programs (generates cash) while advancing your own drug candidates (captures upside). Pros: Partners de-risk early operations; internal pipeline captures full drug value. Cons: Requires significantly more capital; execution risk on two fronts.

D. Data Flywheel Company

Build and operate automated wet labs purpose-built to generate high-quality biological data. Sell data and trained models, not drugs. Pros: Avoids clinical trial risk entirely; every customer’s experiments make your models better. Cons: Harder to command premium pricing; depends on network effects materializing.

Recommended for a New Entrant

Start with Model B or D. The AI-first CRO or Data Flywheel model lets you generate revenue and proprietary data from day one, without requiring $100M+ to fund an internal drug pipeline. Once you have traction, you can selectively advance internal programs (evolving toward Model C) using the data advantage you’ve built.

ChemLex reached 70+ customers including 6 of the top 10 pharma companies in just 3.5 years on $45M. That velocity is instructive.

07 Risks & Moats

Key Risks

The clinical valley of death is real. As of late 2024, no end-to-end AI-designed drug had demonstrated clear Phase 2 efficacy. Insilico’s Rentosertib showed early signs in Phase 2a but fell short of statistical significance, and the field is still awaiting a definitive proof point. This matters even for non-pipeline companies because pharma’s willingness to pay premium prices depends on demonstrated clinical impact.

Hype cycle correction: AI drug discovery experienced significant skepticism in 2024 after clinical stumbles. Brendan Frey of Deep Genomics stated: “AI has really let us all down in the last decade when it comes to drug discovery.” Market sentiment can shift quickly.
Incumbency advantage: Recursion has spent $1.6B and a decade building its platform. Isomorphic has $2.7B in funding and Google’s AI talent. Competing head-to-head on general-purpose discovery is suicidal.
Regulatory uncertainty: The FDA is beginning to engage (launching “Elsa” AI tools across all centers), but regulatory frameworks for AI-designed therapeutics are still evolving.
Talent scarcity: The intersection of ML expertise + wet lab biology + drug development is extremely narrow. Recursion just granted ~3M RSUs to attract 33 new employees.
Capital requirements: Even a “lean” wet lab buildout runs $5-15M. A competitive automated facility is $30-80M.

Defensible Moats

Proprietary data: The single strongest moat. Each experiment generates data that improves your models. This compounds over time and cannot be replicated by a well-funded competitor without doing the actual physical work.
Lab-model integration: Tight coupling between your AI and your physical experiments creates switching costs. Once a pharma partner is running their programs through your flywheel, migration is painful.
Speed: If you can consistently deliver validated candidates in 6 weeks (Absci’s benchmark) vs. 6 months (traditional CRO), the value proposition sells itself.
Vertical specialization: Owning a specific disease area or modality (ADCs, peptides, gene therapy vectors) deeply is more defensible than being a general-purpose platform competing with Recursion.

08 Recommended Startup Positioning

The Play

AI-Native CRO with a Proprietary Data Flywheel

Build an automated wet lab optimized for AI-speed iteration. Serve AI-first drug discovery companies (Genesis, Isomorphic, smaller biotechs) who have great models but no physical lab. Also serve mid-size pharma who want to experiment with AI-driven discovery without building internal capability.

Why This Positioning Wins

You’re the arms dealer, not the soldier. You profit whether Isomorphic, Genesis, or the next unknown AI company wins the discovery race. Every one of them needs physical validation.
Revenue from day one. Fee-for-service + milestone bonuses. You don’t need to wait 10 years for a drug approval to see returns.
Data moat builds automatically. Every client experiment improves your assay protocols, robotics calibration, and (optionally) your own predictive models.
The market is creating itself. As AI drug discovery platforms proliferate, the demand for fast, AI-compatible physical validation will scale proportionally.

Execution Playbook

Phase	Timeline	Focus	Capital
Seed	Months 0-6	Pick a niche modality (e.g., peptide therapeutics or ADCs). Build an MVP automated assay pipeline. Sign 2-3 design partners.	$3-5M
Series A	Months 6-18	Scale the lab. Build the data layer (structured experimental results as a product). Hit 10+ paying customers.	$15-25M
Series B	Months 18-36	Launch proprietary AI models trained on your accumulated data. Optionally start 1-2 internal drug programs using your own data advantage.	$40-80M
Growth	Year 3+	Platform licensing + internal pipeline + CRO flywheel = the Absci/Recursion playbook, but starting from revenue instead of from $1.6B in burn.	Revenue-funded + opportunistic raises
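
For a sense of total capital at stake, the sketch below adds up the three funded rounds in the playbook above. It assumes each round is raised in full within its stated range and ignores any revenue earned along the way.

```python
# (phase, low $M, high $M) for the three funded rounds in the playbook above.
rounds = [("Seed", 3, 5), ("Series A", 15, 25), ("Series B", 40, 80)]

cumulative_low = cumulative_high = 0
for phase, low, high in rounds:
    cumulative_low += low
    cumulative_high += high
    print(f"Through {phase}: ${cumulative_low}-{cumulative_high}M raised")
```

On these ranges, cumulative capital through Series B comes to $58-110M, a small fraction of the $1.6B cited for Recursion.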

Key Hires (First 10)

Lab Automation Engineer (robotics + liquid handling; this person is your CTO equivalent)
Computational Biologist / ML Engineer (assay design + data pipeline)
Wet Lab Biologist (2-3; hands-on assay development and protocol optimization)
Data Engineer (structured data capture from experiments; LIMS integration)
BD / Pharma Partnerships (someone who has sold into pharma before)
Medicinal Chemist (if targeting small molecules; swap for protein engineer if biologics)

Where to Set Up

San Francisco / South San Francisco (AI talent + biotech ecosystem density), Boston / Cambridge (pharma proximity + Kendall Square network), or Singapore (ChemLex’s playbook: government subsidies, access to Asian pharma, growing hub). Salt Lake City (Recursion’s BioHive ecosystem) is an emerging dark horse with lower costs.

The insight that makes this work: the limiting reagent in AI drug discovery is not compute or algorithms. It’s high-quality, standardized, rapid-turnaround experimental data. If you own the fastest path from AI prediction to physical validation, you own the bottleneck. Every AI company is a potential customer.