End-to-End Virtual Screening Campaign Guide

Filtering Mega Chemical Spaces into Experimental-Ready Compound Sets

This guide provides a complete, engine-by-engine protocol for running a computational drug discovery virtual screening campaign, from initial library ingestion through final compound selection for wet lab testing. It covers static docking, flexible docking, ensemble docking, Boltz2 co-folding, chemical space analysis, and diversity-driven final selection.
Follow the funnel in order. Each phase feeds the next. Do not skip phases.

1. Campaign Philosophy & Funnel Architecture

Virtual screening campaigns operate as a staged triage funnel. The fundamental trade-off at every stage is compute speed vs. accuracy. Faster, lower-accuracy methods process large libraries to remove obvious non-binders, while slower, higher-accuracy methods are reserved for the smaller, pre-enriched pools that survive each cut.

The Three-Phase Funnel

  • Phase 1 — Static Docking: Process millions of compounds cheaply. Hard-filter on binding affinity. Carry forward the top 2–5%.
  • Phase 2 — Flexible Docking: Introduce receptor flexibility + CNN rescoring. Multi-dimensional filter with physical plausibility gates. Output ~3,000–4,000 compounds.
  • Phase 3 — Ensemble Docking: Use molecular dynamics-derived protein conformers. Aggregate across snapshots. CNN affinity + pose filters. Output ~500 compounds.
Alternative Funnel (use when calibration benchmarks are performing well and 3–10k compounds still require filtering):
  • Phase 4 (optional) — Boltz2 Co-folding: Deepest structural predictions. Gate on TM score, iPTM, pIC50 or affinity probability. Output top 50–100 candidates.

Master Decision Summary — Three-Phase Docking Pipeline

| Engine | Library Size | Primary Metric | Key Threshold | Output / Next Stage |
| --- | --- | --- | --- | --- |
| Static Docking | Full library (1M–2M) | Best Affinity (kcal/mol) | < −8 kcal/mol | Top 2–5% → Flexible Docking |
| Flexible Docking | ~40–50k filtered | CNN Affinity (desc.) | Intramol ≤ 0; CNN Pose ≥ 0.6 | Top 3–4k → Ensemble Docking |
| Ensemble Docking | 3–5k | CNN Affinity (aggregated) | 80th/20th percentile filters | Top 500 → MD/FEP |

Alternative Pipeline

| Engine | Library Size | Primary Metric | Key Threshold | Output / Next Stage |
| --- | --- | --- | --- | --- |
| Boltz2 Co-folding | 500–1,000 | pIC50 / affinity_prob | TM ≥ 0.5; iPTM > 0.6 | Top 50–100 → Diversity + Wet Lab |
Library sizes above are representative; scale thresholds based on your compute budget and target class. Calibration R² always drives engine selection — see Section 3.

2. Pre-Screening: Library Preparation

2.1 Source & Deduplication

Before any docking begins, prepare the compound library to ensure clean, unique chemistry.
  • Consolidate all library sources (sub-batches, vendors, internal plates) into a single file.
  • Deduplicate on canonical SMILES string — not on compound identifier or name.
  • Record: total input count, unique count, duplicate count. This becomes your baseline.
  • Log the deduplication statistics explicitly. They matter for tracking compound attrition across the funnel.
Deduplication by SMILES ensures unique chemistry evaluation. Duplicates skew frequency-based rankings and waste compute.
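A minimal sketch of the deduplication bookkeeping described above, in plain Python. It assumes every SMILES string has already been canonicalized by a single cheminformatics toolkit with one set of settings (RDKit is one common choice, not something this guide prescribes). The function name, record layout, and example identifiers are illustrative.

```python
def deduplicate_by_smiles(records):
    """Keep the first record seen for each canonical SMILES string.

    `records` is an iterable of (compound_id, canonical_smiles) pairs.
    Identifiers are ignored on purpose: only the structure string
    decides whether two entries are the same compound.
    """
    unique = {}
    total = 0
    for compound_id, smiles in records:
        total += 1
        unique.setdefault(smiles, compound_id)  # first occurrence wins
    stats = {
        "total_input": total,
        "unique": len(unique),
        "duplicates": total - len(unique),
    }
    return unique, stats


# Hypothetical merged library: two vendors list the same structure.
library = [
    ("VENDOR_A-001", "CCO"),
    ("VENDOR_B-117", "CCO"),
    ("PLATE_3-B07", "c1ccccc1"),
    ("VENDOR_A-002", "CC(=O)O"),
]
unique, stats = deduplicate_by_smiles(library)
print(stats)  # log this as the attrition baseline
```

The returned `stats` dictionary holds the three counts the list above asks you to record, so it can be written straight into the campaign log.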

2.2 ADMET & Physicochemical Pre-Filtering (Optional Pre-Screen)

For very large libraries (>500k), applying lightweight physicochemical filters before docking can reduce noise and runtime. These are not required but are recommended when the library source is broad (e.g., a general commercial collection rather than a focused set).
  • Lipinski Ro5: MW ≤ 500, logP ≤ 5, H-bond donors ≤ 5, H-bond acceptors ≤ 10
  • QED score (Quantitative Estimate of Drug-likeness): QED ≥ 0.4 for lead-like filtering
  • TPSA: ≤ 140 Ų (general oral bioavailability proxy)
  • Rotatable bonds: ≤ 10 (conformational complexity filter)
  • Pan-assay interference compounds (PAINS): flag and optionally remove reactive, promiscuous scaffolds
You can use RevADMET for this task.
Apply these only if your library is chemically unfiltered. For targeted collections (e.g., Enamine US/UA stock), these filters add marginal value and may remove valid hits.
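The thresholds above can be applied as a simple pass/fail check once descriptors are available. The sketch below assumes the descriptors were computed upstream by whatever tool you use and are stored in a dictionary; the key names (`mw`, `logp`, `pains_flag`, and so on) and the example values are hypothetical.

```python
# Upper limits copied from the list above; QED is a lower limit.
LIMITS = {"mw": 500.0, "logp": 5.0, "hbd": 5, "hba": 10, "tpsa": 140.0, "rot_bonds": 10}
MIN_QED = 0.4


def passes_prefilter(desc, drop_pains=False):
    """Return (passed, reasons) for one compound's descriptor dict."""
    reasons = [key for key, limit in LIMITS.items() if desc[key] > limit]
    if desc["qed"] < MIN_QED:
        reasons.append("qed")
    # PAINS hits are only flagged unless removal is requested explicitly.
    if drop_pains and desc.get("pains_flag", False):
        reasons.append("pains")
    return (not reasons), reasons


compounds = {
    "cmpd_ok": {"mw": 342.4, "logp": 2.9, "hbd": 2, "hba": 5,
                "tpsa": 78.0, "rot_bonds": 4, "qed": 0.71},
    "cmpd_heavy": {"mw": 612.7, "logp": 5.8, "hbd": 3, "hba": 9,
                   "tpsa": 121.0, "rot_bonds": 12, "qed": 0.22},
}
for name, desc in compounds.items():
    print(name, passes_prefilter(desc))
```

Returning the list of failed criteria, rather than a bare boolean, makes it easier to see which filter is responsible if a targeted collection loses more compounds than expected.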

2.3 Protein Target Preparation

The quality of the protein structure drives the quality of every downstream docking run. Do not skip this step.
  • Source a high-resolution crystal structure for your target (from PDB or AlphaFold2/3 if no crystal exists).
  • Remove all co-crystallized ligands, water molecules, and ions that are not part of the binding site.
  • Retain metal cofactors that are known to participate in binding (e.g., zinc in coordination sites).
  • Verify the intended chain(s) are present; strip extraneous chains not needed for the screen.
  • Define the docking box: center on the known binding site or active site. A 30 × 30 × 30 Å box is typical for most pockets; adjust based on pocket size.
  • Run a short MD simulation (10 ns minimum) to check protein stability in solution before running ensemble docking downstream. Monitor RMSD, RMSF, and radius of gyration for convergence.

3. Calibration Strategy

Calibration is the most important step that most campaigns skip. Before screening thousands or millions of compounds, you must establish which scoring function actually correlates with experimental potency for your specific target. R² from IC50 calibration dictates every engine and readout selection decision downstream.

3.1 What to Calibrate

  • Use a set of compounds with known experimental activity (EC50, IC50, Ki) against your target.
  • 10–20 diverse compounds with at least a 10-fold range in potency is the minimum. Wider is better.
  • Include both active and inactive compounds if available.
  • Run this calibration set through every docking modality you plan to use (static, flexible, ensemble, Boltz2).

3.2 Calibration Metrics to Compare

| Method / Readout | Interpretation |
| --- | --- |
| Ensemble Docking: CNN Affinity | Highest accuracy; preferred primary readout |
| Ensemble Docking: CNN Pose Score | Strong geometry validation |
| Flexible Docking: CNN Affinity | Good fallback; used when ensemble not run |
| Ensemble Docking: Best Affinity | Moderate; supplement with CNN metrics |
| Flexible Docking: Best Affinity | Weaker; use only as supporting signal |
| Boltz2: pIC50 / log₁₀(IC50) | Exploratory; gate with TM score / iPTM |
| Boltz2: Affinity Probability | Weak; use as tiebreaker only |
Compare readouts using Pearson correlation, Spearman rank correlation, and RMSE/MAE. Interpret each metric in light of the number of compounds in your calibration set (n sensitivity): with only 10–20 compounds, a single outlier can shift a correlation substantially.
  • R² > 0.7 is a reliable readout for ranking — use as primary score.
  • R² 0.4–0.7 is a usable supporting signal — combine with a stronger readout.
  • R² < 0.4 should not be used as a standalone primary metric.
Always re-run calibration on your specific target — values will vary by protein class, binding site character, and library chemistry.
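A sketch of how one readout could be scored against experimental potency using only the standard library. It assumes R² is taken as the squared Pearson correlation of a linear relationship, which is one common convention and may differ from how your platform reports it. RMSE is only meaningful when the readout is already in potency units (for example a predicted pIC50). The example numbers are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rmse(predicted, observed):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def readout_tier(r2):
    """Map R² onto the three tiers listed above."""
    if r2 > 0.7:
        return "primary"
    if r2 >= 0.4:
        return "supporting"
    return "not standalone"


def calibrate(scores, experimental_pic50):
    r = pearson(scores, experimental_pic50)
    return {"n": len(scores), "pearson": r, "spearman": spearman(scores, experimental_pic50),
            "r2": r * r, "tier": readout_tier(r * r)}


# Hypothetical calibration set: CNN affinity vs. measured pIC50.
cnn_affinity = [5.1, 5.8, 6.4, 7.0, 7.9, 8.3]
measured_pic50 = [5.0, 5.5, 6.8, 6.9, 7.7, 8.6]
print(calibrate(cnn_affinity, measured_pic50))
```

For readouts where lower is better (such as Best Affinity in kcal/mol), the Pearson coefficient against pIC50 will be negative for a good readout; squaring it removes the sign, so check the direction separately.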

3.3 Calibration Decision Rule

  • Aim for R² > 0.8: Calibrate across static, flexible, ensemble docking, and co-folding. If a readout reaches R² > 0.8, use it as your primary guide for downstream filtering.
  • If R² is above 0.6 but below 0.8: Try other engines that may represent the biology better, such as ensemble docking.
  • If primary calibration metrics are weak: Fall back to affinity_probability as tiebreaker, gated by TM score and iPTM.
Calibration is not a one-time activity. Re-calibrate whenever you change the binding site definition, protein model, or add flexibility to new residues.

4. Phase 1 — Static (Rigid) Docking

Static docking treats both the protein and ligand as rigid bodies. It is the fastest method, and because it runs on GPUs it is the only practical choice for screening libraries in the millions. The purpose here is rapid triage: eliminate obvious non-binders rather than find the best poses.

4.1 Setup Parameters

  • Protein: rigid receptor, prepared as described in Section 2.3
  • Ligand: rigid SMILES input; do not generate flexible conformers at this stage
  • Exhaustiveness: 8 for full library triage; increase to 16–32 for smaller batches if time allows; 200 only for calibration compounds
  • Docking box: 30 × 30 × 30 Å centered on binding site (adjust per target)
  • Batch processing: split large libraries into batches of 100k–250k for parallelization and fault tolerance

4.2 Key Metrics & Thresholds

| Metric | Threshold | Notes |
| --- | --- | --- |
| Binding Affinity | < −8 kcal/mol (strict) | Hard cutoff. Compounds weaker than −8 kcal/mol are deprioritized for downstream runs |
| Binding Affinity (strong hits) | < −10 to −15 kcal/mol | Compounds in this range should be forwarded preferentially to flexible/ensemble docking |
| Number of poses | ≥ 9 poses generated | Inspect top 3 poses for pharmacophoric match to known binding residues |
| Exhaustiveness | 8 (screening) → 200 (calibration) | Use low exhaustiveness for full library triage; increase to 200 only for calibration compounds |

4.3 Filtering Logic

  1. Remove all rows with missing or null affinity scores.
  2. Sort by Best Affinity ascending (most negative = strongest binding).
  3. Apply hard cutoff: Best Affinity < −8 kcal/mol.
  4. From the passing compounds, take the top 2–5% by affinity for the next phase.
  5. For targets where known active compounds cluster at −10 to −15 kcal/mol, set the threshold accordingly.
Distribution sanity check:
  • Compute mean, median, and standard deviation of Best Affinity across the full screened set.
  • If mean affinity is weaker than −8 kcal/mol, your library may not contain quality binders for this pocket, or the pocket definition needs adjustment.
  • Validate that your known calibration hits fall in the top 5–10% of the distribution.
Static docking will produce false positives. The purpose of this stage is speed-based enrichment only. All static docking hits must be validated through flexible or ensemble docking.
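The filtering steps and the distribution sanity check above can be sketched as follows. The row layout (`id`, `best_affinity`) is an assumption about how results are exported, and the top fraction is applied to the compounds that pass the cutoff, as in step 4.

```python
import math
import statistics


def static_triage(rows, cutoff=-8.0, top_fraction=0.05):
    """Apply steps 1-4: drop nulls, sort, hard cutoff, keep the top fraction."""
    scored = [r for r in rows
              if r["best_affinity"] is not None and not math.isnan(r["best_affinity"])]
    scored.sort(key=lambda r: r["best_affinity"])  # most negative first
    passing = [r for r in scored if r["best_affinity"] < cutoff]
    n_keep = math.ceil(top_fraction * len(passing)) if passing else 0
    return scored, passing[:n_keep]


def affinity_summary(scored):
    """Mean, median and standard deviation over the full scored set (needs >= 2 rows)."""
    values = [r["best_affinity"] for r in scored]
    return {"mean": statistics.mean(values),
            "median": statistics.median(values),
            "stdev": statistics.stdev(values)}


# Hypothetical results; a large top_fraction is used only because the demo is tiny.
rows = [
    {"id": "lig_A", "best_affinity": -9.5},
    {"id": "lig_B", "best_affinity": None},
    {"id": "lig_C", "best_affinity": -7.0},
    {"id": "lig_D", "best_affinity": -10.2},
    {"id": "lig_E", "best_affinity": -8.4},
    {"id": "lig_F", "best_affinity": -12.1},
]
scored, carried = static_triage(rows, top_fraction=0.5)
print([r["id"] for r in carried])
print(affinity_summary(scored))
```

Keeping `scored` separate from the carried-forward list lets you compute the summary statistics on the whole screened set, which is what the sanity check calls for.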

4.4 Chemical Space Check (Post Phase 1)

After extracting your top compounds, plot a UMAP or PCA of the filtered set using Morgan fingerprints (ECFP4, radius 2, 2048 bits). Verify:
  • The shortlisted compounds are not all clustered in one scaffold region (confirms chemical diversity in your carry-forward set).
  • Known active compounds, if available, fall within or near the dense regions of the top-scoring set.
  • Isolated outliers in chemical space are not artifacts of the library (check their raw docking scores).
  • You can sample across the distribution of chemical space to carry a more diverse set into further screening. The campaign ultimately feeds the wet lab to establish primary SAR before lead optimization, so broader chemical diversity gives you more shots on target before you optimize within constrained spaces.

5. Phase 2 — Flexible Docking with CNN Rescoring

Flexible docking allows specified protein side chains (residues in the binding site) to move during the docking calculation, and incorporates a Convolutional Neural Network (CNN) to re-score poses based on geometric and energetic realism. This dramatically improves accuracy over static docking at moderate compute cost.

5.1 Setup Parameters

  • Protein: same structure as Phase 1, but with designated flexible residues enabled
  • Flexible residues: select residues with known pharmacophoric roles (from co-crystal data, mutagenesis, or MD RMSF analysis). Typically 2–5 residues. Do not make the entire protein flexible.
  • CNN re-scoring: enabled; adds a geometry-aware neural network on top of classical docking scoring
  • Exhaustiveness: 8 (default for flexible screen); increase to 32 for highest-priority batches
  • Input: top-scoring compounds from Phase 1 static screen

5.2 Outputs Produced

  • Best Affinity (kcal/mol): classical empirical binding energy; lower = more favorable
  • Best Intramol (kcal/mol): intramolecular strain energy of the ligand in the docked pose; higher = more strained
  • Best CNN Pose Score (0–1): CNN-based assessment of pose geometry and physical realism; higher = more realistic
  • Best CNN Affinity: CNN-predicted binding affinity (pK metric); higher = stronger predicted binding

5.3 Multi-Dimensional Filtering Pipeline

The flexible docking filter is a sequential QC funnel, not a single cutoff. Apply in order:
| Filter / Gate | Threshold | Purpose |
| --- | --- | --- |
| Intramol Strain Veto | Best Intramol ≤ 0 kcal/mol | Removes physically strained (implausible) conformations |
| CNN Pose Quality Gate | Best CNN Pose Score ≥ 0.6 | Validates geometric realism of the docked pose (0–1 scale) |
| Binding Energy Floor | Best Affinity ≤ −10 kcal/mol | Ensures minimum thermodynamic favorability for the pocket |
| Primary Ranking | Best CNN Affinity (desc.) | Final sort: highest CNN affinity compounds advance |
Final output selection:
  • After all gates pass, sort by Best CNN Affinity descending.
  • Select top N compounds (typically 3,000–4,000 as input to ensemble docking).
  • Retain a backup pool (top 4,000) if downstream ensemble docking yields insufficient hits.
CNN Pose Score ≥ 0.6 is a geometry threshold, not a strict binary. If your target class shows systematically lower pose scores (e.g., allosteric or shallow sites), adjust downward — but document the change.
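One way to express the sequential gates is as an ordered list of named predicates, which also yields the attrition count after each gate. This is a sketch: the field names are hypothetical, and the thresholds are exposed as arguments so a documented adjustment (for example a lower pose-score gate) is a one-line change.

```python
def flexible_funnel(rows, max_intramol=0.0, min_pose=0.6, max_affinity=-10.0, top_n=4000):
    """Apply the three gates in order, then rank survivors by CNN affinity."""
    gates = [
        ("intramol_veto", lambda r: r["best_intramol"] <= max_intramol),
        ("cnn_pose_gate", lambda r: r["cnn_pose_score"] >= min_pose),
        ("affinity_floor", lambda r: r["best_affinity"] <= max_affinity),
    ]
    survivors = list(rows)
    attrition = [("input", len(survivors))]
    for name, keep in gates:
        survivors = [r for r in survivors if keep(r)]
        attrition.append((name, len(survivors)))
    survivors.sort(key=lambda r: r["cnn_affinity"], reverse=True)  # highest first
    return survivors[:top_n], attrition


# Hypothetical flexible-docking results.
poses = [
    {"id": "lig_A", "best_intramol": -0.4, "cnn_pose_score": 0.82, "best_affinity": -11.2, "cnn_affinity": 7.1},
    {"id": "lig_B", "best_intramol": 0.9, "cnn_pose_score": 0.90, "best_affinity": -12.0, "cnn_affinity": 7.9},
    {"id": "lig_C", "best_intramol": -0.1, "cnn_pose_score": 0.45, "best_affinity": -10.8, "cnn_affinity": 6.5},
    {"id": "lig_D", "best_intramol": -0.6, "cnn_pose_score": 0.71, "best_affinity": -9.1, "cnn_affinity": 6.9},
    {"id": "lig_E", "best_intramol": -0.2, "cnn_pose_score": 0.66, "best_affinity": -10.4, "cnn_affinity": 7.6},
]
selected, attrition = flexible_funnel(poses)
print([r["id"] for r in selected])
print(attrition)
```

In this toy set `lig_B` has the highest CNN affinity but is removed by the strain veto, which illustrates why the gates run before the ranking rather than after it.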

5.4 Metric Definitions Reference

  • Best Affinity: Additive, empirical energy calculation. Reflects thermodynamic favorability of the interaction. Lower (more negative) is better.
  • Best Intramol: Minimum intramolecular energy of the ligand in its docked conformation. Values > 0 indicate steric clashes or physically impossible geometries. Acts as a hard veto.
  • Best CNN Pose Score: Binary-style geometric validation (0–1 scale) assessing whether the pose looks physically realistic based on thousands of known crystal structures. Values ≥ 0.6 indicate plausible poses.
  • Best CNN Affinity: Neural network-predicted binding affinity derived from pose geometry and energetics. This is the highest-quality readout from flexible docking and the primary ranking signal.

6. Phase 3 — Ensemble Docking

Ensemble docking accounts for protein conformational dynamics by docking compounds against multiple representative protein structures, each from a different point in a molecular dynamics trajectory. This captures the protein’s natural flexibility beyond individual side chains and substantially reduces false positives.

6.1 Generating the Protein Ensemble

  • Run an MD simulation of the apo or holo protein for at least 10 ns (100 ns preferred for full equilibration).
  • Confirm equilibration: RMSD should plateau; RMSF should show stable core with defined flexible loops; radius of gyration should level off.
  • Extract representative snapshots at regular intervals (e.g., every 2 ns for a 10 ns simulation = 5 conformers; every 10 ns for 100 ns = 10 conformers).
  • Optionally exclude outlier snapshots where known binding site geometry is disrupted.
  • Run docking against each conformer independently, then aggregate scores per compound.
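The snapshot arithmetic in the list above can be checked with a small helper. This sketch assumes one snapshot is taken at the end of each interval and that outlier frames are excluded by their time stamp; both are illustrative choices rather than requirements of the protocol.

```python
def snapshot_times(total_ns, interval_ns, exclude=()):
    """Return the simulation times (ns) at which conformers are extracted."""
    n_snapshots = int(total_ns // interval_ns)
    times = [interval_ns * (i + 1) for i in range(n_snapshots)]
    # Optionally drop frames where the binding site geometry is disrupted.
    return [t for t in times if t not in set(exclude)]


print(snapshot_times(10, 2))         # 5 conformers from a 10 ns run
print(len(snapshot_times(100, 10)))  # 10 conformers from a 100 ns run
```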

6.2 Score Aggregation per Compound

For each unique compound (identified by SMILES), aggregate across all conformers:
  • Best CNN Affinity → take the maximum across all conformers
  • Best Affinity → take the minimum (most negative) across all conformers
  • Best Intramol → take the minimum across all conformers
This aggregation captures the best observed interaction of the compound with any accessible protein conformation. It is more informative than any single-conformer score.
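A minimal sketch of per-compound aggregation across conformers. For Best Affinity it keeps the most negative value, on the reading that "best observed interaction" means the most favorable energy (lower is more favorable, per Section 5.2); swap `min` for `max` on that line if your pipeline defines the aggregate differently. Field names are hypothetical.

```python
from collections import defaultdict


def aggregate_per_compound(rows):
    """Collapse one row per (compound, conformer) into one row per compound."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["smiles"]].append(row)

    aggregated = []
    for smiles, group in grouped.items():
        aggregated.append({
            "smiles": smiles,
            "cnn_affinity": max(r["cnn_affinity"] for r in group),    # higher = stronger
            "best_affinity": min(r["best_affinity"] for r in group),  # more negative = stronger
            "best_intramol": min(r["best_intramol"] for r in group),  # lower = less strained
            "n_conformers": len(group),
        })
    return aggregated


# Hypothetical ensemble results: one compound docked to two conformers, another to one.
ensemble_rows = [
    {"smiles": "CCO", "conformer": 1, "cnn_affinity": 6.8, "best_affinity": -9.9, "best_intramol": -0.3},
    {"smiles": "CCO", "conformer": 2, "cnn_affinity": 7.4, "best_affinity": -10.6, "best_intramol": -0.1},
    {"smiles": "c1ccccc1", "conformer": 1, "cnn_affinity": 5.9, "best_affinity": -8.7, "best_intramol": 0.2},
]
for compound in aggregate_per_compound(ensemble_rows):
    print(compound)
```

Note that the three aggregated values for a compound can come from different conformers, so the aggregated row describes the compound's best case per metric rather than a single pose.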

6.3 Filtering Pipeline

| Step | Operation | Rationale |
| --- | --- | --- |
| 1. Dedup / Aggregate | Per SMILES: CNN Affinity → max; Best Affinity → min (most negative); Intramol → min | Collapses multiple poses per compound to single representative scores |
| 2. CNN Affinity Filter | ≥ 80th percentile of set | Selects top-scoring compounds by neural network affinity prediction |
| 3. Best Affinity Filter | ≤ 20th percentile (within subset) | Confirms thermodynamic favorability via classical scoring within the CNN-filtered pool |
| 4. Intramol Veto | Best Intramol < 0 kcal/mol | Hard exclusion of strained structures |
| 5. Final Rank | CNN Affinity (desc.) | Sort and take top N |
Percentile-based thresholds (80th/20th) are relative to the screened set. Re-compute percentiles after aggregation, not from pre-aggregation raw scores.
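The percentile steps can be sketched on already-aggregated, one-row-per-compound data as below. The percentile helper uses linear interpolation between ranks, which is an assumption: other conventions give slightly different cut points on small sets. The second percentile is computed within the CNN-filtered subset, as the table specifies.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in 0-100) of a non-empty list."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def ensemble_filter(compounds, top_n=500):
    """Steps 2-5: CNN percentile, affinity percentile, strain veto, final rank."""
    if not compounds:
        return []
    cnn_cut = percentile([c["cnn_affinity"] for c in compounds], 80)
    pool = [c for c in compounds if c["cnn_affinity"] >= cnn_cut]
    # Recompute the affinity cut inside the CNN-filtered pool, not the full set.
    affinity_cut = percentile([c["best_affinity"] for c in pool], 20)
    pool = [c for c in pool if c["best_affinity"] <= affinity_cut]
    pool = [c for c in pool if c["best_intramol"] < 0]
    pool.sort(key=lambda c: c["cnn_affinity"], reverse=True)
    return pool[:top_n]


# Hypothetical aggregated scores for ten compounds.
aggregated = [
    {"id": f"lig_{i}", "cnn_affinity": float(i), "best_affinity": -(i + 2.0), "best_intramol": -1.0}
    for i in range(1, 11)
]
print([c["id"] for c in ensemble_filter(aggregated)])
```

Because the two percentile filters are nested, the surviving fraction is small (roughly 20% of 20% before the strain veto), so check that the pool entering this step is large enough to yield the intended top N.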

6.4 Why Ensemble Docking Outperforms Flexible Docking

  • Flexible docking moves only designated side chains. Ensemble docking samples backbone movements and global conformational states that flexible docking cannot reach.
  • CNN Affinity R² typically improves by 15–25% going from flexible to ensemble docking in a well-calibrated system.
  • Ensemble docking is significantly more compute-intensive. Reserve it for the pre-filtered pool (3k–5k compounds) from Phase 2, not the full library.

7. Alternative Pipeline — Boltz2 Co-folding

Boltz2 is a structure prediction model that co-folds a protein–ligand complex from sequence and SMILES, generating predicted binding poses and associated confidence metrics. It is the most computationally expensive per-compound method and is reserved for the top 500–1,000 candidates from ensemble docking.

7.1 Key Outputs from Boltz2

| Readout | Good Range / Threshold | Interpretation / Notes |
| --- | --- | --- |
| pIC50 | > 5.0 (i.e. IC50 < 10 μM) | Predicted potency in log scale. Use as primary ranking when IC50 calibration R² > 0.5 |
| Affinity Probability | > 0.5 (higher = better) | Model confidence in a binding event. Use as tiebreaker or when IC50 calibration is weak |
| Predicted TM Score | ≥ 0.5 | Structural reliability of the co-folded complex. Gate: reject compounds below threshold regardless of affinity |
| iPTM (interface pTM) | > 0.6 preferred | Interface quality score. High iPTM with low TM = good binding pose but uncertain overall fold; still useful |
| Confidence Score | Use only as supporting signal | Low predictive correlation with experimental IC50 on its own; context-dependent |

7.2 Ranking Strategy — Choosing the Right Readout

Use the calibration R² from Section 3 to select your primary readout:
  • If IC50 calibration R² ≥ 0.5 for pIC50: rank by pIC50 descending; gate by TM score ≥ 0.5 and iPTM > 0.6
  • If IC50 calibration R² < 0.5: use affinity_probability as tiebreaker, with TM score and iPTM as mandatory gates
  • In all cases: compute predicted_ln(ic50)_nM = log₁₀(predicted_ic50_nM) and include it in the export for downstream reference. Despite the "ln" in the column name, the value is a base-10 logarithm, not a natural log.
  • Always apply confidence gates before using any potency ranking — a high pIC50 with a low TM score is not a trustworthy prediction

7.3 Gated Confidence Filtering

  • Gate 1 (Hard): TM Score ≥ 0.5. Predictions below this threshold indicate the model failed to produce a reliable fold and should be excluded.
  • Gate 2 (Soft): iPTM > 0.6. Interface quality. Values below this suggest the binding interface is not well-modeled; flag but do not necessarily exclude if other metrics are strong.
  • Gate 3 (Context-dependent): Affinity Probability > 0.5 as a supporting filter when potency data is unavailable.
Boltz2 calibrates better for some target classes than others. Co-folding performance is weakest for large allosteric sites, covalent binders, and metal-coordinated ligands. Use with appropriate skepticism and weight against docking data.
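The gating and readout selection from Sections 7.2 and 7.3 can be sketched as follows. The dictionary keys are placeholders and not the actual Boltz2 output schema. The sketch treats the TM score as a hard gate and iPTM as a soft flag, following Section 7.3, and converts IC50 in nM to pIC50 via pIC50 = 9 − log₁₀(IC50 in nM).

```python
import math


def pic50_from_ic50_nm(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L) = 9 - log10(IC50 in nM)."""
    return 9.0 - math.log10(ic50_nm)


def rank_cofolding(predictions, calibration_r2, tm_min=0.5, iptm_min=0.6):
    """Gate on confidence first, then rank by the readout calibration supports."""
    gated = [dict(p) for p in predictions if p["tm_score"] >= tm_min]  # Gate 1 (hard)
    for p in gated:
        p["pic50"] = pic50_from_ic50_nm(p["predicted_ic50_nM"])
        p["log10_ic50_nM"] = math.log10(p["predicted_ic50_nM"])
        p["iptm_flag"] = p["iptm"] <= iptm_min  # Gate 2 (soft): flag, do not drop
    sort_key = "pic50" if calibration_r2 >= 0.5 else "affinity_probability"
    gated.sort(key=lambda p: p[sort_key], reverse=True)
    return gated, sort_key


# Hypothetical co-folding predictions.
predictions = [
    {"id": "lig_A", "predicted_ic50_nM": 50.0, "affinity_probability": 0.62, "tm_score": 0.81, "iptm": 0.74},
    {"id": "lig_B", "predicted_ic50_nM": 5.0, "affinity_probability": 0.55, "tm_score": 0.42, "iptm": 0.80},
    {"id": "lig_C", "predicted_ic50_nM": 800.0, "affinity_probability": 0.91, "tm_score": 0.66, "iptm": 0.55},
]
ranked, used = rank_cofolding(predictions, calibration_r2=0.6)
print(used, [(p["id"], round(p["pic50"], 2), p["iptm_flag"]) for p in ranked])
```

Here `lig_B` has the lowest predicted IC50 but never reaches the ranking because it fails the TM gate, which is the behavior the last bullet of Section 7.2 asks for.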

7.4 Hybrid Boltz2 + Ensemble Approach (Exploratory)

As an exploratory check, you can merge Boltz2 outputs with ensemble docking scores:
  • Apply the same ensemble pose filters (Section 6.3) to the merged set.
  • Rank by predicted_pic50 descending.
  • Use this as a comparison list, not as the primary deliverable.
  • If Boltz2 IC50 calibration is valid, the Boltz2-primary ranking supersedes the hybrid for final compound selection.

8. Chemical Space Analysis

Chemical space visualization is performed at two key points in the campaign: (1) after Phase 1 to verify diversity of the carry-forward pool, and (2) before final compound selection to ensure the final set covers the activity landscape. If you skip this step, you risk selecting 50 near-identical compounds built on a single scaffold, which undermines the main output of a primary screen: SAR.

8.1 Fingerprinting

All chemical space analysis uses Morgan fingerprints (ECFP) as the molecular representation:
  • Morgan ECFP4: radius 2, 2048-bit vector. Standard default.
  • Morgan ECFP6: radius 3 for finer resolution of large, complex libraries.
  • Generate fingerprints for the full screened set and the shortlisted subset simultaneously for direct comparison.

8.2 Dimensionality Reduction Methods

| Method | What It Captures | When to Use | Interpretation Tips |
| --- | --- | --- | --- |
| PCA | Global structural diversity (variance-maximizing axes) | First pass: gauge overall chemical space breadth | Tight clusters = structurally similar compounds; spread = diverse library |
| tSNE | Local neighborhoods and cluster relationships | When you need to identify sub-families or scaffold clusters | Not comparable across runs; don’t read into global distances |
| UMAP | Both global and local structure simultaneously | Default for visualizing screened sets; best middle ground | Clusters with high CNN affinity/color encoding = activity hotspots to prioritize |
Recommended workflow:
  1. Run UMAP as your default visualization. It balances global structure and local clusters.
  2. Overlay activity scores (CNN affinity, binding energy) as a color dimension to identify activity hotspots.
  3. Run PCA as a secondary check to validate that diversity metrics are not UMAP artifact-driven.
  4. Run tSNE only when you need to investigate specific scaffold families or local cluster composition.

8.3 What to Look For

Good diversity indicators:
  • Top-scoring compounds (color-coded) are spread across multiple UMAP regions, not concentrated in one cluster.
  • High-affinity regions have some overlap with known active scaffolds but also extend into novel chemical space.
  • The shortlisted set (e.g., top 3k from flexible docking) covers most of the dense regions of the full filtered pool.
Red flags:
  • 70% of top compounds fall within one tight UMAP cluster — indicates scaffold enrichment, not diverse chemistry.
  • Known active scaffolds are completely absent from the top-scoring set — may indicate a pocket definition problem.
  • The screened set collapses into a single dense region after filtering — likely because a narrow binding energy range was used; consider relaxing the threshold.

8.4 Centralized Clusters Correlated with High Activity

It is normal and expected for some chemical clusters to correspond to high CNN affinity or binding activity. This does not disqualify them. The goal is not to eliminate clusters, but to ensure that your final compound list is not exclusively drawn from one cluster while missing other potentially active regions. A compound list of 50 from one scaffold provides weak SAR; 50 compounds spanning 5–10 distinct scaffolds provides a robust SAR foundation.

9. Final Diversity-Driven Compound Selection

After all computational filters have been applied, you will typically have 100–500 compounds that pass potency and quality thresholds. Final selection to the experimental batch size (typically 50–100) requires balancing hit quality with chemical diversity.

9.1 Seed Set Definition

Start by anchoring the selection to confirmed high-quality compounds:
  • Extract the top 10 compounds by highest CNN affinity (or pIC50 if Boltz2 was primary).
  • Extract the top 10 compounds by lowest binding energy (most negative Best Affinity).
  • Deduplicate to produce a seed set. This ensures the final list retains the best-scoring compounds regardless of diversity algorithm outcome.
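The seed set construction can be sketched as two sorted slices merged with duplicates removed. Field names and example values are hypothetical; pass `potency_key="pic50"` if Boltz2 was the primary engine.

```python
def build_seed_set(candidates, n_each=10, potency_key="cnn_affinity"):
    """Union of the top-n by potency readout and the top-n by binding energy."""
    by_potency = sorted(candidates, key=lambda c: c[potency_key], reverse=True)[:n_each]
    by_energy = sorted(candidates, key=lambda c: c["best_affinity"])[:n_each]  # most negative first
    seed, seen = [], set()
    for compound in by_potency + by_energy:
        if compound["smiles"] not in seen:  # deduplicate on structure
            seen.add(compound["smiles"])
            seed.append(compound)
    return seed


# Hypothetical candidates; n_each is reduced because the demo is tiny.
candidates = [
    {"smiles": "CCO", "cnn_affinity": 8.1, "best_affinity": -9.0},
    {"smiles": "CCN", "cnn_affinity": 7.9, "best_affinity": -12.5},
    {"smiles": "CCC", "cnn_affinity": 6.0, "best_affinity": -13.1},
    {"smiles": "CCCl", "cnn_affinity": 7.0, "best_affinity": -10.0},
    {"smiles": "CCBr", "cnn_affinity": 5.5, "best_affinity": -8.5},
]
print([c["smiles"] for c in build_seed_set(candidates, n_each=2)])
```

With the default `n_each=10` the seed set contains between 10 and 20 compounds, depending on how much the two top-10 lists overlap.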

9.2 MaxMin Selection

MaxMin (Maximum Minimum Distance) is the standard diversity selection algorithm for compound libraries. It maximizes structural spread across the chemical space.
  1. Compute Morgan ECFP4 fingerprints for all candidate compounds.
  2. Compute pairwise Tanimoto distance = 1 − Tanimoto similarity for all compound pairs.
  3. Initialize with the seed set (best-scoring compounds from 9.1).
  4. Iteratively add the compound that is farthest from all currently selected compounds (maximizes minimum pairwise distance).
  5. Continue until target selection size is reached.
Why MaxMin works here: By seeding with the top-affinity compounds, you guarantee the best hits are included. MaxMin then fills the remainder with maximally diverse chemistry, ensuring coverage of activity hotspots across the chemical space rather than oversampling one scaffold family.
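A minimal pure-Python sketch of seeded MaxMin, assuming fingerprints are stored as sets of on-bit indices (the bit sets below are toy values, not real ECFP4 output). Cheminformatics toolkits ship optimized pickers, for example RDKit's `MaxMinPicker`, which are preferable for large pools; this version is meant only to make the steps above concrete.

```python
def tanimoto_distance(fp_a, fp_b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0
    return 1.0 - len(fp_a & fp_b) / union


def maxmin_select(fingerprints, seed_indices, n_select):
    """Grow the seed set by repeatedly adding the compound farthest from it."""
    if not seed_indices:
        raise ValueError("seeded MaxMin needs at least one seed compound")
    selected = list(seed_indices)
    chosen = set(selected)
    # Distance from each unselected compound to its nearest selected compound.
    min_dist = {
        i: min(tanimoto_distance(fingerprints[i], fingerprints[s]) for s in selected)
        for i in range(len(fingerprints)) if i not in chosen
    }
    while len(selected) < n_select and min_dist:
        pick = max(min_dist, key=min_dist.get)  # largest nearest-neighbor distance
        selected.append(pick)
        del min_dist[pick]
        for i in min_dist:  # only the new pick can shrink the remaining distances
            d = tanimoto_distance(fingerprints[i], fingerprints[pick])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy fingerprints: indices 0, 1 and 3 are close analogs; 2 and 4 are distinct.
fingerprints = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}, {10, 11}]
print(maxmin_select(fingerprints, seed_indices=[0], n_select=3))
```

Updating only against the newest pick keeps each iteration linear in the number of remaining candidates, rather than recomputing distances to the whole selected set.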

9.3 Verification of Final Set

Before finalizing, verify the selected set using UMAP and PCA:
  • Plot seed compounds (blue), MaxMin-selected compounds (red), and the full candidate pool (grey).
  • Confirm seed and MaxMin compounds are distributed across the major UMAP clusters.
  • Verify no single cluster accounts for more than 30–40% of the final selected set (unless target biology demands scaffold specificity).
  • If diversity is inadequate, relax the seed set size or broaden the input candidate pool.
Chemical diversity in the experimental set is not just aesthetically preferable — it directly determines the SAR value of the data returned from the wet lab. Redundant scaffolds give you redundant data.
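The cluster-share check above can be automated once each selected compound has a cluster label. How the labels are obtained (for example by clustering the UMAP embedding or grouping by scaffold) is left open here; the labels and the 40% limit in this sketch are illustrative.

```python
from collections import Counter


def cluster_shares(labels):
    """Fraction of the selected set that falls in each cluster."""
    counts = Counter(labels)
    return {cluster: count / len(labels) for cluster, count in counts.items()}


def dominant_cluster(labels, max_share=0.4):
    """Return (cluster, share, exceeds_limit) for the most populated cluster."""
    shares = cluster_shares(labels)
    cluster, share = max(shares.items(), key=lambda item: item[1])
    return cluster, share, share > max_share


# Hypothetical cluster labels for a 10-compound final set.
final_set_labels = ["A", "A", "A", "A", "A", "B", "B", "C", "C", "D"]
print(dominant_cluster(final_set_labels))
```

If the flag comes back true, the remedies are the ones listed above: shrink the seed set or widen the candidate pool, then rerun the selection.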

10. Post-Selection: MD Simulation Validation

The top 50–100 compounds from the final selection should undergo individual molecular dynamics simulation before ordering. This step provides the highest-confidence computational validation and filters the list once more before incurring synthesis or procurement costs.

10.1 Short MD Run (0.1 ns) — Screening Mode

  • Run all final candidates at 0.1 ns in complex with the target.
  • Compute MMPBSA/MMGBSA binding free energy as an approximation.
  • Note: 0.1 ns simulations are pre-equilibrium. Energies will be overestimates but are useful for relative ranking and early flag-raising.
  • Flag any compound showing immediate ligand displacement or catastrophic pose degradation.

10.2 Full MD Run (1–10 ns) — Final Validation

  • Take the top 20–30 compounds from the 0.1 ns screen and run for 1–10 ns.
  • Monitor RMSD of the ligand in the binding site: stable RMSD < 2–3 Å indicates retained binding.
  • Monitor RMSF of key binding residues: unexpected rigidification or large fluctuations signal poor complementarity.
  • SASA (Solvent Accessible Surface Area): ligand binding should reduce solvent exposure in the pocket.
  • Compute final MMPBSA/MMGBSA energies. Reference: compounds with −15 to −20 kcal/mol binding free energies are strong candidates.

10.3 Binding Free Energy Benchmarks

| Potency Category | IC50 Range | MMPBSA Range |
| --- | --- | --- |
| Tight binders | Sub-100 nM IC50 | −15 to −20 kcal/mol |
| Moderate binders | 1–10 μM IC50 | −8 to −15 kcal/mol |
| Weak / threshold binders | > 10 μM | Weaker than −8 kcal/mol (less negative) |
Proceed with weak/threshold binders only with strong computational evidence from other engines.
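The benchmark bands can be turned into a small labeling helper for triaging MD results. This sketch reads the weak band as energies less negative than −8 kcal/mol, assigns the shared −15 kcal/mol boundary to the tight band, and treats anything more negative than −20 kcal/mol as tight as well; all three are assumptions, since the benchmarks give ranges rather than exact rules.

```python
def mmpbsa_category(delta_g):
    """Map an MMPBSA/MMGBSA binding free energy (kcal/mol) to a benchmark band."""
    if delta_g <= -15.0:
        return "tight"
    if delta_g <= -8.0:
        return "moderate"
    return "weak/threshold"


# Hypothetical energies from the full MD runs.
for name, energy in [("lig_A", -17.3), ("lig_B", -11.0), ("lig_C", -6.2)]:
    print(name, energy, mmpbsa_category(energy))
```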

11. Complete Campaign Checklist

Pre-Screening

  1. Library sourced and deduplicated by canonical SMILES
  2. ADMET pre-filters applied (if applicable)
  3. Protein structure prepared, cleaned, metal cofactors retained
  4. Docking box defined and validated against known ligand binding site
  5. MD simulation run; equilibration confirmed

Calibration

  1. Calibration compounds with experimental IC50/EC50 values assembled
  2. All docking modalities run on calibration set
  3. R² table computed; primary readout for each phase selected
  4. Engine selection documented

Phase 1 — Static Docking

  1. Full library screened; batched as needed
  2. Best Affinity < −8 kcal/mol filter applied
  3. Top 2–5% of library carried forward
  4. UMAP diversity check of carry-forward pool completed

Phase 2 — Flexible Docking

  1. Flexible residues designated based on SAR/MD analysis
  2. CNN rescoring enabled
  3. Intramol ≤ 0 gate applied
  4. CNN Pose Score ≥ 0.6 gate applied
  5. Binding Energy ≤ −10 kcal/mol gate applied
  6. Final sort by CNN Affinity; top 3–4k selected

Phase 3 — Ensemble Docking

  1. MD ensemble generated; snapshots extracted
  2. Docking run against each conformer
  3. Scores aggregated per SMILES
  4. 80th/20th percentile filters applied
  5. Top 500 compounds selected

Phase 4 — Boltz2 (Optional)

  1. Co-folding run on top 500–1,000
  2. TM Score ≥ 0.5 gate applied
  3. iPTM > 0.6 gate applied
  4. Primary readout selected based on calibration (pIC50 or affinity_probability)
  5. Top 50–100 candidates identified

Final Selection

  1. Seed set defined (top 10 CNN affinity + top 10 binding energy)
  2. Morgan ECFP4 fingerprints computed
  3. Tanimoto distances computed
  4. MaxMin selection run to target N (50–100)
  5. UMAP/PCA of final set verifies diversity
  6. 0.1 ns MD validation run on all final candidates
  7. Full MD validation (1–10 ns) on top 20–30 candidates
  8. Final compound list prepared for procurement or synthesis

Your final experimental compound set should represent diverse scaffolds, pass physical plausibility filters, show strong predicted binding energies across multiple engines, and include both the best-scoring hits and structurally distinct representatives from each major activity cluster.

12. Glossary

| Term | Definition |
| --- | --- |
| Best Affinity (kcal/mol) | Classical empirical binding energy from docking. Lower (more negative) = stronger predicted binding. |
| Best Intramol (kcal/mol) | Intramolecular strain energy of the ligand in its docked conformation. Values > 0 indicate physically implausible geometry (hard veto). |
| CNN Affinity | Convolutional neural network-predicted binding affinity. Trained on crystal structures; captures geometric and energetic features beyond classical scoring. |
| CNN Pose Score (0–1) | Binary-style geometric validation of the docked pose. ≥ 0.6 = physically plausible geometry. |
| Exhaustiveness | Docking algorithm search parameter. Higher = more thorough search but slower. Use 8 for triage; 200 for calibration. |
| pIC50 | Negative log₁₀ of IC50 in molar units. Higher = more potent. pIC50 = 9 corresponds to IC50 = 1 nM. |
| TM Score | Template Modeling score from Boltz2. Measures structural similarity of predicted complex to a plausible reference. ≥ 0.5 indicates reliable fold prediction. |
| iPTM | Interface predicted TM score. Measures quality of the binding interface specifically. > 0.6 preferred. |
| Affinity Probability | Boltz2 model confidence in a binding event occurring. Supplements pIC50 as a tiebreaker. |
| MaxMin Selection | Diversity selection algorithm. Iteratively adds compounds maximally distant from the current set by Tanimoto distance. |
| Tanimoto Similarity | Fingerprint-based pairwise molecular similarity metric (0–1). Tanimoto Distance = 1 − similarity. |
| Morgan / ECFP | Extended Connectivity FingerPrint. Circular fingerprint encoding chemical neighborhoods around each atom. ECFP4 (radius 2) is standard. |
| UMAP | Uniform Manifold Approximation and Projection. Dimensionality reduction for chemical space visualization capturing both global and local structure. |
| MMPBSA/MMGBSA | Molecular Mechanics Poisson–Boltzmann / Generalized Born Surface Area. Free energy calculation methods for estimating binding affinity from MD trajectories. |
| R² (Calibration) | Coefficient of determination between predicted docking scores and experimental IC50 values. Primary metric for engine selection. |
| SMILES | Simplified Molecular Input Line Entry System. String representation of molecular structure. Used as canonical identifier for deduplication. |