Retrosynthesis Planning With AI - Making Impossible Molecules Practical

Drug candidates die in synthesis more often than the field acknowledges. You design a compound that looks excellent on paper — good predicted binding, clean ADMET profile, novel scaffold — and then a medicinal chemist with twenty years of experience looks at the structure and quietly tells you that the key bond formation would require a protecting group strategy that adds six steps and a column you can't run at scale. The compound gets deprioritized. A perfectly good idea evaporates because no one thought carefully about how to make it.

Computer-assisted retrosynthesis planning has existed for decades. What's changed recently is that the tools have become practical enough to use routinely, rather than as a last resort after the chemist is already stuck.

How Retrosynthesis Planning Actually Works

The problem in retrosynthesis is straightforward to state and hard to solve: given a target molecule, identify a sequence of known chemical reactions that can build it from available starting materials. Working backwards from the target, you break bonds according to known transforms — a Suzuki coupling, an amide bond formation, a reductive amination — until you reach commercially available precursors. Each step in the retrosynthetic tree represents a reaction that needs to work with the specific substitution pattern of your actual molecule, not just the generic transform.
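To make the backward search concrete, here is a minimal Python sketch of a depth-limited search over a retrosynthetic tree. The transform table, the molecule labels and the `STOCK` set are hypothetical placeholders. Real tools generate candidate disconnections from reaction templates or learned models and operate on actual structures, not string labels.

```python
from typing import Optional

# Hypothetical transform table: each product maps to alternative disconnections,
# given as (reaction name, precursors). Molecules are plain labels in this sketch.
TRANSFORMS: dict[str, list[tuple[str, list[str]]]] = {
    "target": [("amide coupling", ["acid_A", "amine_B"])],
    "acid_A": [("Suzuki coupling", ["aryl_bromide", "boronic_acid"])],
    "amine_B": [("reductive amination", ["aldehyde_C", "ammonia"])],
}

# Hypothetical catalogue of commercially available starting materials.
STOCK = {"aryl_bromide", "boronic_acid", "aldehyde_C", "ammonia"}


def plan(molecule: str, max_depth: int = 5) -> Optional[dict]:
    """Search backwards from `molecule` until every branch reaches stock.

    Returns a nested route tree, or None if no route reaches purchasable
    precursors within max_depth disconnections.
    """
    if molecule in STOCK:
        return {"molecule": molecule, "purchasable": True}
    if max_depth == 0:
        return None
    # Try each known disconnection; accept the first whose precursors all resolve.
    for reaction, precursors in TRANSFORMS.get(molecule, []):
        subtrees = [plan(p, max_depth - 1) for p in precursors]
        if all(s is not None for s in subtrees):
            return {"molecule": molecule, "reaction": reaction, "precursors": subtrees}
    return None


def count_steps(tree: dict) -> int:
    """Total reactions in the route tree (not the longest linear sequence)."""
    if tree.get("purchasable"):
        return 0
    return 1 + sum(count_steps(p) for p in tree["precursors"])


route = plan("target")
print(count_steps(route))  # 3
```

A real planner would explore all alternatives and score them rather than stopping at the first complete route; the depth limit stands in for the step budget a chemist would accept.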

Modern AI retrosynthesis tools are trained on millions of published reactions from the chemical literature and patent databases. The model learns which bond disconnections are most likely to succeed based on the substrate, which reaction conditions are typically required, and what functional group incompatibilities to avoid. Given a target structure, it generates a ranked list of retrosynthetic routes, estimates the number of steps for each, and flags potential problem areas.
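One common way to turn per-step predictions into a ranked list is to score each route by the product of its per-step plausibility scores, so that every extra step counts against it. The sketch below assumes such scores exist (the numbers are invented) and reports the weakest step as a crude stand-in for a "problem area" flag. It is an illustration of the idea, not the scoring scheme of any particular tool.

```python
import math
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    # Hypothetical per-step plausibility scores in (0, 1], e.g. a model's estimate
    # that each disconnection works on this particular substrate.
    step_scores: list[float]

    @property
    def log_score(self) -> float:
        # Sum of logs = log of the product: avoids underflow on long routes,
        # and an additional step never raises the score.
        return sum(math.log(s) for s in self.step_scores)

    @property
    def weakest_step(self) -> int:
        # Index of the least plausible step, used here as a simple problem flag.
        return min(range(len(self.step_scores)), key=self.step_scores.__getitem__)


def rank_routes(routes: list[Route]) -> list[Route]:
    """Order routes from most to least plausible overall."""
    return sorted(routes, key=lambda r: r.log_score, reverse=True)


routes = [
    Route("A", [0.9, 0.8, 0.95]),
    Route("B", [0.95, 0.95]),
    Route("C", [0.99, 0.6, 0.9, 0.9]),
]
for r in rank_routes(routes):
    print(r.name, len(r.step_scores), f"{math.exp(r.log_score):.3f}", "weakest step:", r.weakest_step)
```

Route C is ranked last despite a strong first step, because its one doubtful disconnection drags the product down.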

The output is a tree of proposed routes, not a single answer. The chemist still decides which route to attempt based on reagent cost, step count, expected yield at each step, and practical considerations such as whether any intermediate requires chromatographic purification that would be impractical at scale. What the tool does is narrow a combinatorially vast search space to a shortlist a chemist can actually evaluate.
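The chemist's side of that decision can be sketched as a set of hard filters followed by a sort. The fields, thresholds and candidate routes below are hypothetical. In practice cost and yield estimates are rough, and judgment about a specific intermediate rarely reduces to a boolean.

```python
import math
from dataclasses import dataclass


@dataclass
class CandidateRoute:
    name: str
    reagent_cost_usd: float            # hypothetical estimated cost per batch
    expected_step_yields: list[float]  # fractional yield expected for each step
    needs_chromatography: bool         # any intermediate needs a column to purify

    @property
    def n_steps(self) -> int:
        return len(self.expected_step_yields)

    @property
    def overall_yield(self) -> float:
        # Treats the steps as one linear sequence.
        return math.prod(self.expected_step_yields)


def shortlist(routes, max_steps=12, max_cost=5000.0, min_yield=0.02, for_scale_up=True):
    """Apply hard practical filters, then sort survivors by expected overall yield."""
    keep = [
        r for r in routes
        if r.n_steps <= max_steps
        and r.reagent_cost_usd <= max_cost
        and r.overall_yield >= min_yield
        # Chromatography only disqualifies a route when it must run at scale.
        and not (for_scale_up and r.needs_chromatography)
    ]
    return sorted(keep, key=lambda r: r.overall_yield, reverse=True)


candidates = [
    CandidateRoute("R1", 2000.0, [0.8] * 8, False),
    CandidateRoute("R2", 1500.0, [0.9] * 6, True),
    CandidateRoute("R3", 8000.0, [0.95] * 5, False),
    CandidateRoute("R4", 1000.0, [0.7] * 10, False),
]
print([r.name for r in shortlist(candidates)])  # ['R1', 'R4']
```

Dropping the scale-up constraint (`for_scale_up=False`) lets the short, high-yielding R2 back in at the top, which mirrors how the same route can be fine for a first batch and unacceptable for a campaign.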

A Concrete Example From CAI-022

CAI-022, our rare disease program targeting a lysosomal storage disorder, required an unusual bicyclic scaffold — a dihydroimidazo-pyrimidinone core — that our modeling work identified as a promising fit for the active site geometry. The scaffold isn't uncommon in published literature, but the specific substitution pattern we needed — a chiral center at C-4, an aryl group at C-6, and a pendant hydroxymethyl group at N-3 — had no close precedent in the synthesis literature.

Running the target through our retrosynthesis planning tool generated 14 distinct proposed routes. Seven were eliminated immediately because they depended on a direct C-H arylation at C-6 under oxidative conditions. That step is mechanistically reasonable, but its published substrate scope is narrow and includes no examples with our substitution pattern. The tool flagged this automatically, based on substrate scope limitations represented in its training data.
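The elimination step amounts to partitioning the proposed routes by whether any step uses a transform flagged for this substrate. In this sketch the flag set and the route contents are hypothetical stand-ins that simply reproduce the 14-route, 7-eliminated split; how the real tool derives its flags is not described here.

```python
# Hypothetical: transforms whose published substrate scope does not cover the
# target's substitution pattern. A real tool would derive this from scope data
# or model confidence rather than a hand-written set.
FLAGGED = {"oxidative C-H arylation"}


def split_by_flags(routes: dict[str, list[str]], flagged: set[str]):
    """Partition routes into (viable, eliminated); eliminated maps name -> offending steps."""
    viable: dict[str, list[str]] = {}
    eliminated: dict[str, list[str]] = {}
    for name, steps in routes.items():
        hits = [s for s in steps if s in flagged]
        if hits:
            eliminated[name] = hits
        else:
            viable[name] = steps
    return viable, eliminated


# Toy data: 14 routes, every other one relying on the flagged step.
routes = {
    f"route_{i + 1}": [
        "amide coupling",
        "oxidative C-H arylation" if i % 2 == 0 else "Suzuki coupling",
        "cyclization",
    ]
    for i in range(14)
}
viable, eliminated = split_by_flags(routes, FLAGGED)
print(len(viable), len(eliminated))  # 7 7
```

Keeping the offending steps for each eliminated route makes the decision auditable: a chemist who knows of newer precedent for a flagged step can reinstate those routes.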

Of the remaining seven routes, three were selected for parallel execution by our synthetic chemistry CRO. The first route succeeded on the fourth analog attempt, after the CRO chemist suggested a modification to the cyclization step that the tool hadn't proposed. The total synthesis from commercial starting materials ran to eleven steps with an overall yield of approximately 6%, acceptable for a first-generation synthesis of a novel scaffold.
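The overall yield also implies an average per-step yield. Assuming the eleven steps form a single linear sequence (the text doesn't say whether the route is convergent), the geometric mean works out to roughly 77% per step:

```python
import math


def overall_yield(step_yields: list[float]) -> float:
    """Overall yield of a linear sequence: the product of the step yields."""
    return math.prod(step_yields)


def mean_step_yield(overall: float, n_steps: int) -> float:
    """Uniform per-step yield that would give `overall` across n_steps (geometric mean)."""
    return overall ** (1.0 / n_steps)


avg = mean_step_yield(0.06, 11)
print(f"{avg:.1%}")                        # 77.4%
print(f"{overall_yield([avg] * 11):.1%}")  # 6.0%
```

Actual step yields will vary around this figure; the geometric mean only says what uniform yield would produce the same overall result.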

Where the Tools Still Fall Short

Two areas where current AI retrosynthesis tools are genuinely unreliable: reactions involving novel reagents or conditions published in the last two years, and stereoselective synthesis of complex chiral molecules.

The training data has a cutoff. Recent methodology, such as new photoredox transformations, novel asymmetric organocatalysts, and advances in C-H functionalization, may not be represented. A chemist who follows the primary literature will sometimes propose a route the tool doesn't consider, because the key reaction was published after the training data cutoff.

Stereoselective synthesis is more fundamentally limited. Retrosynthesis tools are generally better at planning the carbon skeleton than at proposing how to set stereocenters reliably. Predicting the facial selectivity of a specific substrate in an asymmetric reaction requires detailed knowledge of transition state geometry that most tools don't model explicitly. For molecules with multiple stereocenters — which are common in natural product-inspired drug design — the synthesis plan may be incomplete in critical ways.
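A short calculation shows why facial selectivity is hard to predict. Under kinetic control, the major:minor product ratio relates to the free-energy gap between the competing transition states by ratio = exp(ΔΔG‡/RT), so even high selectivities correspond to small energy differences. The sketch also shows how selectivity compounds across several stereocenters, under the simplifying assumption that each center is set independently (in real sequences, later centers are often set under substrate control relative to earlier ones).

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def ddg_from_ee(ee: float, temp_k: float = 298.15) -> float:
    """Transition-state free-energy gap (kJ/mol) implied by an enantiomeric excess.

    Assumes kinetic control: major/minor ratio = exp(ddG / (R*T)).
    """
    ratio = (1 + ee) / (1 - ee)  # e.g. 90% ee -> 95:5 -> 19
    return R_KJ * temp_k * math.log(ratio)


def desired_isomer_fraction(selectivities: list[float]) -> float:
    """Fraction of product with every stereocenter correct, treating each center's
    selectivity (fraction in the desired configuration) as independent."""
    return math.prod(selectivities)


print(f"{ddg_from_ee(0.90):.1f} kJ/mol")             # 7.3 kJ/mol (about 1.7 kcal/mol)
print(f"{ddg_from_ee(0.99):.1f} kJ/mol")             # 13.1 kJ/mol
print(f"{desired_isomer_fraction([0.95] * 3):.1%}")  # 85.7%
```

The gap between a mediocre and an excellent asymmetric step is a few kJ/mol, which is why getting it right requires explicit transition-state modelling rather than pattern matching on the carbon skeleton.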

The Practical Value

Running retrosynthesis planning early in the design cycle changes the compounds that get proposed. When chemists know that the design tool is integrated with synthetic feasibility scoring, they think differently about what to propose. Exotic scaffolds that would previously have been casually included in a designed library get flagged before anyone spends resources on them. The overall synthetic tractability of compound libraries improves.

More concretely: since we began tracking synthetic accessibility scores for designed compounds before committing to synthesis, the fraction of compounds that require route development rather than straightforward synthesis has fallen from roughly 35% in our earliest programs to under 15% in current programs. Whether that's attributable to better planning tools or to chemists learning to think computationally is hard to disentangle. Probably both.
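A feasibility gate of this kind can be sketched as a threshold on a synthetic accessibility score, with the flagged fraction tracked per design batch. The score scale (1 = easy, 10 = hard), the threshold and the example values are all hypothetical. A score-based fraction is also only a proxy for the metric in the text, which counts compounds that actually required route development.

```python
from dataclasses import dataclass

# Hypothetical cut-off on a 1 (easy) to 10 (hard) synthetic accessibility scale.
ROUTE_REVIEW_THRESHOLD = 6.0


@dataclass
class Design:
    compound_id: str
    sa_score: float  # hypothetical synthetic accessibility score, 1 = easy, 10 = hard


def triage(designs: list[Design], threshold: float = ROUTE_REVIEW_THRESHOLD):
    """Split designs into (routine synthesis, needs route development)."""
    routine = [d for d in designs if d.sa_score < threshold]
    needs_route = [d for d in designs if d.sa_score >= threshold]
    return routine, needs_route


def route_development_fraction(designs: list[Design], threshold: float = ROUTE_REVIEW_THRESHOLD) -> float:
    """Share of a design batch flagged for route development before synthesis."""
    if not designs:
        return 0.0
    _, needs_route = triage(designs, threshold)
    return len(needs_route) / len(designs)


batch = [Design(f"cmpd_{i}", s) for i, s in enumerate([2.1, 3.4, 6.8, 4.0, 7.5, 2.9, 5.2])]
routine, needs_route = triage(batch)
print([d.compound_id for d in needs_route])       # ['cmpd_2', 'cmpd_4']
print(f"{route_development_fraction(batch):.0%}")  # 29%
```

Logging this fraction per program, alongside the real outcome, is what would let a team check whether the score is a useful early warning.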

Synthesis-Aware Molecular Design