There is a specific kind of failure mode in this industry that doesn't make it into press releases. A company raises a Series A on the strength of computational results — beautiful binding predictions, clean ADMET profiles, a hit rate in virtual screens that looks exceptional on a slide. Then they run the wet lab validation. And the results don't hold.

It's not fraud. It's not even usually bad science. It's a structural problem in how most AI-first drug discovery companies are built, and understanding it matters if you're trying to do this work seriously.

The Validation Gap

Computational drug discovery generates hypotheses. The wet lab tests them. That has always been the relationship. What's changed in the last several years is that the hypothesis-generation step has gotten dramatically faster and more sophisticated, while the experimental validation step has gotten more expensive relative to the compute. The result is an expanding gap between what you can claim computationally and what you can actually prove experimentally before running out of runway.

Most AI startups are staffed heavily on the computational side. This makes sense: ML engineers, structural biologists, and cheminformatics specialists are where the differentiation lives, and they're the people you need to build the platform. But drug discovery isn't a purely computational problem. A model that predicts binding affinity is trained on experimental binding data. A model that predicts solubility is trained on experimental solubility data. When you apply those models to novel chemical space, the predictions are only as reliable as the training distribution allows.

Novel AI-generated scaffolds are, almost by definition, outside the chemical space the models were trained on. And that's where the experimental reality diverges most sharply from the predicted one.
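
To make that concrete, the simplest sanity check is to ask how similar a generated compound actually is to anything the model was trained on. The sketch below does this with RDKit Morgan fingerprints and Tanimoto similarity; the tiny training_smiles list, the query compound, and the 0.4 threshold are all illustrative assumptions, not a validated applicability-domain method.

```python
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.DataStructs import BulkTanimotoSimilarity

def nearest_training_similarity(query_smiles, training_smiles, radius=2, n_bits=2048):
    """Max Tanimoto similarity of a query compound to any training compound.
    Low values suggest the model is extrapolating outside its training space."""
    query_fp = AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(query_smiles), radius, nBits=n_bits)
    train_fps = [
        AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), radius, nBits=n_bits)
        for s in training_smiles
    ]
    return max(BulkTanimotoSimilarity(query_fp, train_fps))

# Illustrative only: a real training set runs to thousands of compounds.
training_smiles = ["CC(=O)Oc1ccccc1C(=O)O", "CN1CCC[C@H]1c1cccnc1"]
sim = nearest_training_similarity("O=C(Nc1ccc2ncccc2c1)C1CC1", training_smiles)
if sim < 0.4:  # a rule-of-thumb cutoff, not a validated one
    print(f"max similarity to training set is {sim:.2f}: treat the prediction as an extrapolation")
```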

Three Specific Points of Failure

The first is assay artifact. Pan-assay interference compounds (PAINS) are a well-documented problem in high-throughput screening, but AI-generated compounds can fail the same way through different routes: reactive functionalities that don't look reactive under standard filters, compounds that aggregate at assay concentrations, compounds that interfere with the detection chemistry rather than acting on the target itself. If your validation pipeline doesn't include counter-screens and orthogonal assay formats early, you can chase false positives for months.
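
For reference, the kind of standard filter I mean is a substructure catalog like the PAINS set that ships with RDKit; a minimal pass looks roughly like the sketch below (the example SMILES are illustrative). The point stands, though: aggregators and readout-interfering compounds will often sail straight through this kind of check, which is why the early counter-screens matter.

```python
from rdkit import Chem
from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams

# Build a catalog of the published PAINS substructure filters bundled with RDKit.
params = FilterCatalogParams()
params.AddCatalog(FilterCatalogParams.FilterCatalogs.PAINS)
catalog = FilterCatalog(params)

def pains_flags(smiles):
    """Return the names of any PAINS filters a compound matches (empty list if clean).
    Passing this check says nothing about aggregation or detection interference."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return ["unparseable SMILES"]
    return [entry.GetDescription() for entry in catalog.GetMatches(mol)]

# Azobenzene carries an aryl azo group, a classic PAINS motif; aspirin is a clean control.
for smi in ["c1ccc(cc1)N=Nc1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]:
    print(smi, pains_flags(smi))
```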

The second is cell permeability. A biochemical IC50 is measured against isolated protein in solution. What you actually need is activity in a cellular context, which requires the compound to cross a membrane, avoid efflux pumps, remain stable in the cytoplasm, and reach the target at the right concentration. AI models for permeability prediction are decent for compounds similar to known drugs. They are less reliable for structurally novel molecules, and genuinely novel molecules (the ones you'd hope differentiate an AI-first approach) are precisely where the models have the least data.
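
For a sense of what the simplest end of permeability triage looks like, here is a descriptor-based sketch with RDKit. The TPSA and cLogP cutoffs are rough rules of thumb I'm assuming for illustration, and nothing here replaces a Caco-2 or MDCK measurement on a genuinely novel scaffold.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

def permeability_red_flags(smiles, tpsa_cutoff=140.0, logp_range=(0.0, 5.0)):
    """Crude descriptor-based screen for passive-permeability risk.
    Cutoffs are rules of thumb, not predictions of cellular activity."""
    mol = Chem.MolFromSmiles(smiles)
    flags = []
    tpsa = Descriptors.TPSA(mol)
    logp = Descriptors.MolLogP(mol)
    if tpsa > tpsa_cutoff:
        flags.append(f"TPSA {tpsa:.0f} above {tpsa_cutoff:.0f}")
    if not logp_range[0] <= logp <= logp_range[1]:
        flags.append(f"cLogP {logp:.1f} outside {logp_range}")
    if Descriptors.NumHDonors(mol) > 5:
        flags.append("more than 5 hydrogen-bond donors")
    return flags

print(permeability_red_flags("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin: expect no flags
```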

The third failure point is synthetic accessibility. This one is more mundane but probably more common. You generate a beautifully designed compound, it scores excellently in silico, you hand it to synthetic chemistry, and they tell you it would take six months and require a protecting-group strategy that adds four steps and a purification nightmare. The compound was never going to get made. The time spent on it was wasted. Models that claim to score synthetic accessibility are improving, but they are not yet reliable enough to trust without medicinal chemistry review of every compound you plan to synthesize.
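
For a sense of what those automated scores look like, the sketch below uses the Ertl and Schuffenhauer SA score that ships in RDKit's Contrib directory. It's an older fragment-based heuristic rather than one of the newer learned models, and the same caveat applies: it's a triage signal, it knows nothing about protecting groups or purification, and it's no substitute for a chemist looking at the route.

```python
import os
import sys

from rdkit import Chem
from rdkit.Chem import RDConfig

# The SA_Score module lives in RDKit's Contrib directory, not the main package.
sys.path.append(os.path.join(RDConfig.RDContribDir, "SA_Score"))
import sascorer  # noqa: E402

def sa_score(smiles):
    """Ertl/Schuffenhauer synthetic accessibility score: roughly 1 (easy) to 10 (hard)."""
    return sascorer.calculateScore(Chem.MolFromSmiles(smiles))

print(sa_score("CC(=O)Oc1ccccc1C(=O)O"))           # aspirin: near the easy end
print(sa_score("CC12CCC3c4ccc(O)cc4CCC3C1CCC2O"))  # fused steroid scaffold: scores higher
```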

What Actually Works

The companies that are making the wet-lab transition successfully — and there are some — are almost always doing three things. They have experimental scientists embedded in the loop at every stage, not downstream of it. They have tight feedback cycles where experimental results are fed back into model retraining within weeks, not quarters. And they choose their initial programs carefully — targets with validated assay formats, strong structural data, and enough known actives to train reliable models.
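
The middle item is less about tooling than discipline, but for what it's worth, the loop itself doesn't have to be elaborate. The sketch below is a deliberately toy version of one design-make-test cycle, with random features standing in for real descriptors and a synthetic function standing in for the assay; the batch size, cadence, and scikit-learn regressor are illustrative choices, not a description of anyone's actual stack.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def retraining_cycle(model, X_train, y_train, X_candidates, run_assay, batch_size=8):
    """One cycle: retrain on everything measured so far, pick the top-scoring
    candidates, send them to the assay, and fold the results back into training."""
    model.fit(X_train, y_train)
    picks = np.argsort(model.predict(X_candidates))[::-1][:batch_size]
    y_new = run_assay(X_candidates[picks])                 # wet-lab data, not a prediction
    X_train = np.vstack([X_train, X_candidates[picks]])
    y_train = np.concatenate([y_train, y_new])
    X_candidates = np.delete(X_candidates, picks, axis=0)
    return model, X_train, y_train, X_candidates

# Toy stand-ins: random features and a synthetic "assay" with a little noise.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(50, 32)), rng.normal(size=50)
X_candidates = rng.normal(size=(500, 32))
run_assay = lambda X: X[:, 0] + rng.normal(scale=0.1, size=len(X))

model = RandomForestRegressor(n_estimators=200, random_state=0)
for week in range(4):  # the point is weeks, not quarters
    model, X_train, y_train, X_candidates = retraining_cycle(
        model, X_train, y_train, X_candidates, run_assay)
```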

We made a deliberate choice early on to keep wet lab validation in-house for our most critical early data points, and to use CRO partnerships for scale once we had confidence in the chemistry series. That meant lower capital efficiency in the early stages, but the data quality was meaningfully better than what we'd seen when relying entirely on CRO-generated validation data without our own scientists reviewing it in real time.

The Honest Assessment

AI drug discovery is not a solved problem. The computational tools genuinely help: they reduce the search space, they flag liabilities early, they surface structure-activity relationship (SAR) hypotheses that would take much longer to generate experimentally. But the experiments are still rate-limiting. They're still expensive. And a model that performs well on a benchmark does not automatically translate to a clinical candidate.

The failure rate in early clinical trials has not meaningfully improved despite a decade of machine learning investment in the industry. That's a data point worth sitting with. The tools are better. The process still needs to be better.