The Protein Folding Revolution and What It Means for Small Molecule Design

Before 2020, if you wanted to do structure-based drug design on a target without a crystal structure, you were guessing. Homology modeling helped when you had a close structural relative. When you didn't, you were working off binding data alone, which is a bit like trying to design a key without seeing the lock.

The availability of high-quality structure predictions for most human proteins changed the geometry of that problem. The question now isn't whether you can get a structural model — it's whether that model is good enough to guide small molecule design, and the answer depends heavily on what you're asking it to do.

What High-Confidence Predictions Actually Give You

For well-folded globular domains — kinases, GPCRs in their inactive conformations, nuclear receptors — predicted structures from current-generation models are good enough to run structure-based virtual screening against. Binding pocket geometry tends to be reliable when the confidence scores are high. Residue positions within the core of the protein are accurate enough that you can identify potential pharmacophore anchors, estimate pocket volume, and filter compounds on basic shape complementarity.
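One way to make "when the confidence scores are high" concrete is to check confidence for the pocket residues specifically rather than for the whole chain. The sketch below is illustrative, not a description of our pipeline. It assumes the predicted model is a PDB-format file with per-residue confidence written into the B-factor column (as AlphaFold-style outputs do). The function names and the cutoff of 70 are assumptions chosen for the example.

```python
def residue_confidence(pdb_text):
    """Map (chain, residue number) -> confidence read from the B-factor column.

    Assumes fixed-width PDB ATOM records and one score per residue,
    taken from the alpha-carbon line.
    """
    scores = {}
    for line in pdb_text.splitlines():
        # PDB columns (1-based): atom name 13-16, chain 22,
        # residue number 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66])
    return scores


def pocket_confidence(scores, pocket_residues, threshold=70.0):
    """Return the mean confidence over the pocket and the residues below threshold.

    `threshold` is an illustrative cutoff, not a validated one.
    """
    values = [scores[res] for res in pocket_residues]
    mean = sum(values) / len(values)
    low = [res for res in pocket_residues if scores[res] < threshold]
    return mean, low
```

A pocket with a high mean but one or two low-confidence residues lining the binding site is worth inspecting by hand before a docking campaign is built on it.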

What you get is a structural hypothesis. You can run docking against it, look at predicted binding modes, and generate SAR hypotheses. Models trained on experimental binding data can then augment this by tying predicted structural features to measured activity. This is useful. It's not the same as a 1.8-angstrom crystal structure with a bound ligand, but it gets you to a better starting point than nothing, and gets you there faster.

The practical impact for us has been most pronounced with targets that were considered structurally intractable three years ago. CAI-031, our undisclosed autoimmune program, is targeting a protein-protein interaction interface where experimental structure determination has been challenging due to the complex's transient nature. We ran structure prediction on both binding partners, used ensemble sampling to model the interface geometry in multiple conformational states, and identified a shallow but druggable groove that we wouldn't have found without a structural model to work from.

The Parts That Still Require Caution

Predicted structures deserve more skepticism in two areas of small molecule design: binding pocket flexibility and water networks.

Crystal structures capture one or a small number of protein conformations. Predictions usually capture fewer still: typically a single conformation resembling a low-energy ground state. Proteins that undergo significant conformational change upon ligand binding (induced fit) can generate false negatives in virtual screening because the apo structure simply doesn't have a pocket of the right shape. You score a compound as inactive, synthesize nothing, and miss a perfectly good series. This has bitten us once, on an earlier internal program, and we now routinely run molecular dynamics simulations to generate conformational ensembles before any structure-based docking campaign.
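The value of docking against an ensemble rather than a single structure can be sketched in a few lines. The example below assumes docking has already been run and that each compound has one score per conformer, with lower scores meaning better predicted binding. The function names, the dictionary layout, and the cutoff are hypothetical and chosen only for illustration. It keeps each compound's best score across the ensemble and lists the compounds that a single-structure screen would have discarded.

```python
def ensemble_best_scores(scores_by_conformer):
    """Return {compound: (best_score, conformer)} across all conformers.

    scores_by_conformer: {conformer_id: {compound_id: docking_score}},
    where a lower score means better predicted binding.
    """
    best = {}
    for conformer, scores in scores_by_conformer.items():
        for compound, score in scores.items():
            if compound not in best or score < best[compound][0]:
                best[compound] = (score, conformer)
    return best


def rescued_compounds(scores_by_conformer, reference, cutoff):
    """Compounds that fail the cutoff on the reference structure
    but pass it on at least one other conformer."""
    best = ensemble_best_scores(scores_by_conformer)
    ref_scores = scores_by_conformer[reference]
    return sorted(
        compound
        for compound, (score, _) in best.items()
        if score <= cutoff and ref_scores.get(compound, float("inf")) > cutoff
    )
```

Taking the best score over conformers is the simplest aggregation and it inflates the false positive rate as the ensemble grows, so the rescued list is a set of candidates for closer inspection, not a hit list.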

Water networks inside binding pockets are another persistent problem. Predicted structures contain no explicit waters, so they can't tell you which water molecules are structural and which are displaceable. Displacing a structural water that makes a key hydrogen bond to the protein backbone can kill potency entirely. You have to model these waters explicitly or, more often in practice, rely on the experimental SAR to reveal their presence after you've already made a series of disappointing analogs.

Intrinsically Disordered Targets

The hard limit of structure-based design is the class of targets that simply don't fold into a defined structure under physiological conditions. Intrinsically disordered proteins (IDPs) account for a significant fraction of the oncology targets long considered undruggable. MYC, for example, lacks a stably folded domain that would support small molecule binding through conventional occupancy-based mechanisms. Structure prediction doesn't solve this; you can't predict the structure of something that doesn't have one.

There's active work on targeting IDPs through molecular glues, through proximity-induced degradation, and through binding to transiently folded states that flicker in and out of existence on microsecond timescales. Some of this work is promising. None of it is yet routine, and none of it relies on structure prediction in the way structure-based design does for ordered proteins.

The Net Effect on a Discovery Program

We ran a rough comparison across programs that had structural data available from the start versus those that didn't. Programs where we could ground the design in a structural model — even a predicted one — reached first active hits around 40% faster on average. That's not controlled for target difficulty, so the number is directional at best. But the qualitative experience is consistent: having a structural hypothesis to argue against, even an imperfect one, focuses chemistry effort in ways that pure ligand-based work doesn't.

The protein folding tools are genuinely useful. They are not magic. A structural model of your target is the beginning of a structure-based design campaign, not the end of one. The experiments still have to happen. The chemistry still has to be done. What changes is where you start, and sometimes that turns out to matter quite a lot.

