Target identification and target validation sound like consecutive steps in the same process. In most presentations about drug discovery, they're discussed as if one leads naturally into the other, maybe six months apart. In practice, they're completely different scientific problems, and the distinction matters when you're deciding where to invest in AI tools.

Target identification answers the question: which molecular entities play a causal role in disease biology? Target validation answers: does modulating this specific target with a drug-like molecule produce a therapeutic effect in a relevant biological system? The first is primarily a data analysis problem. The second is fundamentally an experimental one.

Where AI Works Well: Target Identification

Target identification is exactly the kind of problem that computation handles well. You have large, multi-dimensional datasets — genomics, transcriptomics, proteomics, metabolomics, clinical genetics — and you're trying to find patterns that implicate specific proteins or pathways in specific diseases. The signal-to-noise ratio is low. The dimensionality is high. And the number of potential targets vastly exceeds what anyone could evaluate manually.

Our target identification workflow integrates genome-wide association study data, disease-relevant expression datasets from public repositories, Mendelian randomization analysis for causal inference, and protein-protein interaction networks to surface targets with multiple orthogonal lines of evidence. A target that appears in GWAS hits, shows dysregulated expression in affected tissue, has human genetic validation through loss-of-function variants with relevant phenotypes, and sits in a pathway with other validated disease genes gets a composite score that ranks it against every other potential target in the indication.
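
To make that composite scoring step concrete, here's a minimal sketch in Python. The field names, the weights, and the simple weighted sum are assumptions for illustration, not a description of our production scorer:

```python
from dataclasses import dataclass

# Illustrative sketch only: field names, weights, and the linear
# combination are assumptions, not the production scoring model.

@dataclass
class TargetEvidence:
    gwas_score: float          # GWAS association strength, normalized to [0, 1]
    expression_score: float    # dysregulation in affected tissue, [0, 1]
    genetics_lof_score: float  # human LoF variants with relevant phenotype, [0, 1]
    network_score: float       # proximity to validated disease genes in the PPI network, [0, 1]

WEIGHTS = {
    "gwas_score": 0.30,
    "expression_score": 0.20,
    "genetics_lof_score": 0.35,  # human genetics weighted highest
    "network_score": 0.15,
}

def composite_score(evidence: TargetEvidence) -> float:
    """Weighted sum over orthogonal evidence lines; higher ranks better."""
    return sum(w * getattr(evidence, name) for name, w in WEIGHTS.items())

# Rank every candidate target in the indication by composite score.
candidates = {
    "TARGET_A": TargetEvidence(0.8, 0.6, 0.9, 0.5),
    "TARGET_B": TargetEvidence(0.4, 0.7, 0.1, 0.8),
}
ranked = sorted(candidates, key=lambda name: composite_score(candidates[name]), reverse=True)
print(ranked)
```

The specific weights matter less than the structure: a target needs multiple orthogonal evidence lines to reach the top of the ranking, which is exactly the property described above.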

For our neurodegeneration program (CAI-007), the tau aggregation target came through exactly this process. The tau connection to Alzheimer's disease isn't novel, but the specific mechanism we're targeting, an allosteric site that influences post-translational modification-dependent aggregation dynamics, emerged from network analysis of phosphoproteomics data in patient-derived neurons. Without the computational target identification layer, we wouldn't have found it.

Where AI Doesn't Help: Target Validation

Target validation requires answering a hard biological question with hard experimental data. The question is: if you pharmacologically inhibit (or activate) this target in a relevant cell type, organ, or whole animal model, do you see a disease-relevant phenotypic change consistent with therapeutic benefit and without unacceptable mechanism-based toxicity?

No AI system predicts this from first principles. You can use genetic models — knockout mice, CRISPR-mediated knockdowns in patient-derived cell lines, or human genetics data from populations with natural loss-of-function variants — to get evidence. You can use tool compounds from the literature. But at some point, you're doing wet lab biology, and the results of that biology are what you're validating against, not a model prediction.

This matters because it sets a realistic expectation for what target identification tools can and cannot guarantee. A target that scores highly in a computational identification workflow is a well-supported hypothesis. It is not a validated target. Companies that conflate "high confidence identification" with "validated for drug discovery" are making claims their data don't support.

The Human Genetics Shortcut

The strongest form of target validation available without running your own animal studies is human genetics. A protein encoded by a gene where loss-of-function variants are associated with reduced disease risk is validated, in a meaningful sense, by nature's own experiment. If people who naturally make less of the protein have lower disease burden, inhibiting the protein pharmacologically should theoretically phenocopy that effect. PCSK9 is the canonical example: carriers of loss-of-function variants have lower LDL cholesterol and markedly reduced coronary disease risk, and PCSK9 inhibitors reproduced that benefit in the clinic.

This logic has driven a substantial shift in how target identification is done. Human genetics data is abundant, public, and increasingly analyzed with sophisticated causal inference tools. Mendelian randomization — which uses genetic variants as instrumental variables to estimate causal effects of biomarkers on disease outcomes — has become a standard part of computational target evaluation. Targets with human genetic validation have historically shown higher clinical success rates than those without, though the difference is modest and the confounding factors are significant.
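
For readers unfamiliar with the mechanics, here's a minimal sketch of the simplest MR estimator: the fixed-effect inverse-variance-weighted (IVW) combination of per-variant Wald ratios. The numbers are placeholders, not real summary statistics:

```python
import numpy as np

# Minimal sketch of a fixed-effect inverse-variance-weighted (IVW)
# Mendelian randomization estimate. The numbers are placeholders; a real
# analysis uses harmonized GWAS summary statistics and tests the
# instrumental-variable assumptions (relevance, independence, no pleiotropy).

beta_exposure = np.array([0.12, 0.08, 0.15])    # variant effects on the biomarker
beta_outcome = np.array([0.030, 0.018, 0.041])  # the same variants' effects on disease
se_outcome = np.array([0.010, 0.009, 0.012])    # standard errors of the outcome effects

# Per-variant Wald ratio: the causal effect each instrument implies on its own.
wald_ratios = beta_outcome / beta_exposure

# IVW pools the ratios, weighting each by the precision it contributes.
weights = beta_exposure**2 / se_outcome**2
ivw_estimate = np.sum(weights * wald_ratios) / np.sum(weights)
ivw_se = np.sqrt(1.0 / np.sum(weights))

print(f"IVW causal estimate: {ivw_estimate:.3f} (SE {ivw_se:.3f})")
```

Each Wald ratio is the causal effect a single variant implies; IVW pools them by precision. Sensitivity methods such as MR-Egger exist precisely because the no-pleiotropy assumption often fails in practice.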

The limitation is that human genetic validation only exists where variation in the target gene is common enough to study. Targets whose functional variants are too rare to power an association, genes where loss of function is lethal, and targets relevant only to somatic disease processes not captured in germline genetics don't benefit from this approach.

The Practical Workflow

Our current workflow treats computational target identification as a way to rank and prioritize candidates for experimental validation, not to replace it. The output of target identification is a shortlist — typically five to ten candidates per indication — that goes into a structured validation program. Each candidate gets a defined set of validation experiments: expression confirmation in disease tissue, functional knockdown phenotyping in a relevant cell model, and where possible, interrogation of available human genetics data.
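
For concreteness, here's a hypothetical sketch of how a per-candidate validation plan could be represented. The three checks mirror the experiments listed above; the schema itself is illustrative, not our actual tracking system:

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical schema for tracking a per-candidate validation plan.
# The experiment names mirror the prose; statuses and structure are
# illustrative.

class Status(Enum):
    PENDING = "pending"
    PASSED = "passed"
    FAILED = "failed"

@dataclass
class ValidationPlan:
    target: str
    expression_confirmed: Status = Status.PENDING  # expression in disease tissue
    knockdown_phenotype: Status = Status.PENDING   # functional knockdown in a cell model
    human_genetics: Status = Status.PENDING        # genetic evidence, where available
    failure_reason: str | None = None              # annotated when any experiment fails

    def validated(self) -> bool:
        # Human genetics is supporting evidence rather than a gate,
        # since it only exists for some targets.
        required = (self.expression_confirmed, self.knockdown_phenotype)
        return all(status is Status.PASSED for status in required)
```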

Targets that pass validation become candidates for chemistry programs. Targets that fail get annotated with the specific reason for failure — wrong expression pattern, phenotype not disease-relevant, mechanism-based safety signal — and those annotations feed back into the scoring model to improve future target identification. Over time, the model learns what kinds of targets are likely to validate and what patterns are red flags. That feedback loop is more valuable than any single target identification run.
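
As a sketch of what that feedback loop might look like mechanically: validation outcomes become training labels, and a classifier relearns the evidence weights. A scikit-learn logistic regression stands in for the scoring model here purely for illustration, with fabricated example rows:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch of the feedback loop. Logistic regression stands in for the
# scoring model as an assumption for illustration; the rows below are
# fabricated examples of the pattern, not real program results.

# Rows: previously validated or failed targets.
# Columns: the same evidence features used at identification time
# (GWAS, expression, human genetics, network).
X = np.array([
    [0.8, 0.6, 0.9, 0.5],  # passed validation
    [0.4, 0.7, 0.1, 0.8],  # failed: phenotype not disease-relevant
    [0.7, 0.2, 0.8, 0.3],  # passed validation
    [0.3, 0.9, 0.0, 0.9],  # failed: mechanism-based safety signal
])
y = np.array([1, 0, 1, 0])  # 1 = validated, 0 = failed

model = LogisticRegression().fit(X, y)

# The learned coefficients show which evidence lines actually predict
# validation success, gradually replacing hand-tuned weights.
print(dict(zip(
    ["gwas", "expression", "human_genetics", "network"],
    model.coef_[0].round(2),
)))
```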