End-to-End Virtual Screening Campaign Guide
Filtering Mega Chemical Spaces into Experimental-Ready Compound Sets
This guide provides a complete, engine-by-engine protocol for running a computational drug discovery virtual screening campaign — from initial library ingestion through final compound selection for wet lab testing. It covers static docking, flexible docking, ensemble docking, Boltz2 co-folding, chemical space analysis, and diversity-driven final selection.
Follow the funnel in order. Each phase feeds the next. Do not skip phases.
1. Campaign Philosophy & Funnel Architecture
Virtual screening campaigns operate as a staged triage funnel. The fundamental trade-off at every stage is compute speed vs. accuracy. Faster, lower-accuracy methods process large libraries to remove obvious non-binders, while slower, higher-accuracy methods are reserved for the smaller, pre-enriched pools that survive each cut.
The Three-Phase Funnel
- Phase 1 — Static Docking: Process millions of compounds cheaply. Hard-filter on binding affinity. Carry forward the top 2–5%.
- Phase 2 — Flexible Docking: Introduce receptor flexibility + CNN rescoring. Multi-dimensional filter with physical plausibility gates. Output ~3,000–4,000 compounds.
- Phase 3 — Ensemble Docking: Use molecular dynamics-derived protein conformers. Aggregate across snapshots. CNN affinity + pose filters. Output ~500 compounds.
Alternative Funnel (use if benchmarks perform well and 3–10k compounds still require filtration):
- Phase 4 (optional) — Boltz2 Co-folding: Deepest structural predictions. Gate on TM score, iPTM, pIC50 or affinity probability. Output top 50–100 candidates.
Master Decision Summary — Three-Phase Docking Pipeline
| Engine | Library Size | Primary Metric | Key Threshold | Output / Next Stage |
|---|---|---|---|---|
| Static Docking | Full library (1M–2M) | Best Affinity (kcal/mol) | < −8 kcal/mol | Top 2–5% → Flexible Docking |
| Flexible Docking | ~40–50k filtered | CNN Affinity (desc.) | Intramol ≤ 0; CNN Pose ≥ 0.6 | Top 3–4k → Ensemble Docking |
| Ensemble Docking | 3–5k | CNN Affinity (aggregated) | 80th/20th percentile filters | Top 500 → MD/FEP |
Alternative Pipeline
| Engine | Library Size | Primary Metric | Key Threshold | Output / Next Stage |
|---|---|---|---|---|
| Boltz2 Co-folding | 500–1,000 | pIC50 / affinity_prob | TM ≥ 0.5; iPTM > 0.6 | Top 50–100 → Diversity + Wet Lab |
Library sizes above are representative; scale thresholds based on your compute budget and target class. Calibration R² always drives engine selection — see Section 3.
2. Pre-Screening: Library Preparation
2.1 Source & Deduplication
Before any docking begins, prepare the compound library to ensure clean, unique chemistry.
- Consolidate all library sources (sub-batches, vendors, internal plates) into a single file.
- Deduplicate on canonical SMILES string — not on compound identifier or name.
- Record: total input count, unique count, duplicate count. This becomes your baseline.
- Log the deduplication statistics explicitly. They matter for tracking compound attrition across the funnel.
Deduplication by SMILES ensures unique chemistry evaluation. Duplicates skew frequency-based rankings and waste compute.
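The deduplication and bookkeeping steps above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the function name `deduplicate_by_smiles` and the placeholder canonicalizer are assumptions made for the example. In practice the canonicalizer would come from a cheminformatics toolkit (for example, RDKit's `Chem.MolToSmiles(Chem.MolFromSmiles(s))`), which the guide itself does not name.

```python
from typing import Callable, Iterable


def deduplicate_by_smiles(
    smiles_list: Iterable[str],
    canonicalize: Callable[[str], str],
) -> tuple[list[str], dict[str, int]]:
    """Return unique canonical SMILES (first-seen order) plus attrition stats."""
    seen: dict[str, None] = {}
    total = 0
    for raw in smiles_list:
        total += 1
        seen.setdefault(canonicalize(raw), None)
    unique = list(seen)
    stats = {
        "total_input": total,
        "unique": len(unique),
        "duplicates": total - len(unique),
    }
    return unique, stats


# Placeholder canonicalizer for illustration only: it just strips whitespace.
# A real campaign needs a toolkit canonicalizer so that equivalent strings
# such as "OCC" and "CCO" collapse to one canonical form.
def strip_only(smiles: str) -> str:
    return smiles.strip()


library = ["CCO", "CCO ", "c1ccccc1", "CC(=O)O", "c1ccccc1"]
unique, stats = deduplicate_by_smiles(library, strip_only)
print(stats)
```

The returned `stats` dictionary is the baseline record to log before any docking starts.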
2.2 ADMET & Physicochemical Pre-Filtering (Optional Pre-Screen)
For very large libraries (>500k), applying lightweight physicochemical filters before docking can reduce noise and runtime. These are not required but are recommended when the library source is broad (e.g., a general commercial collection rather than a focused set).
- Lipinski Ro5: MW ≤ 500, logP ≤ 5, H-bond donors ≤ 5, H-bond acceptors ≤ 10
- QED score (Quantitative Estimate of Drug-likeness): QED ≥ 0.4 for lead-like filtering
- TPSA: ≤ 140 Ų (general oral bioavailability proxy)
- Rotatable bonds: ≤ 10 (conformational complexity filter)
- Pan-assay interference compounds (PAINS): flag and optionally remove reactive, promiscuous scaffolds
You can use RevADMET for this task.
Apply these only if your library is chemically unfiltered. For targeted collections (e.g., Enamine US/UA stock), these filters add marginal value and may remove valid hits.
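As a rough sketch of how the thresholds listed in this section could be applied, the example below filters compounds whose descriptors have already been computed elsewhere. The `Descriptors` record layout, the function names, and the descriptor values are all hypothetical. The example also assumes every threshold is applied strictly, with no allowance for a single Ro5 violation.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties for one compound (hypothetical layout)."""
    compound_id: str
    mw: float        # molecular weight, Da
    logp: float
    hbd: int         # H-bond donors
    hba: int         # H-bond acceptors
    qed: float
    tpsa: float      # topological polar surface area, square angstroms
    rot_bonds: int
    pains: bool = False


def failed_checks(d: Descriptors) -> list[str]:
    """Names of the physicochemical checks this compound fails."""
    checks = {
        "MW <= 500": d.mw <= 500,
        "logP <= 5": d.logp <= 5,
        "HBD <= 5": d.hbd <= 5,
        "HBA <= 10": d.hba <= 10,
        "QED >= 0.4": d.qed >= 0.4,
        "TPSA <= 140": d.tpsa <= 140,
        "RotB <= 10": d.rot_bonds <= 10,
    }
    return [name for name, passed in checks.items() if not passed]


def prefilter(compounds, drop_pains=False):
    """Keep compounds passing every check; PAINS removal is optional."""
    kept = []
    for d in compounds:
        if failed_checks(d):
            continue
        if drop_pains and d.pains:
            continue
        kept.append(d)
    return kept


# Made-up descriptor values, for illustration only.
ok = Descriptors("ok", 320.0, 2.1, 2, 5, 0.7, 80.0, 4)
heavy = Descriptors("heavy", 640.0, 4.0, 3, 9, 0.5, 120.0, 14)
flagged = Descriptors("flagged", 300.0, 3.0, 1, 4, 0.6, 60.0, 3, pains=True)
print([d.compound_id for d in prefilter([ok, heavy, flagged], drop_pains=True)])
```

Keeping `drop_pains` as a switch mirrors the "flag and optionally remove" wording above.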
2.3 Protein Target Preparation
The quality of the protein structure drives the quality of every downstream docking run. Do not skip this step.
- Source a high-resolution crystal structure for your target from the PDB; if no crystal structure exists, use an AlphaFold2/3 predicted model.
- Remove all co-crystallized ligands, water molecules, and ions that are not part of the binding site.
- Retain metal cofactors that are known to participate in binding (e.g., zinc in coordination sites).
- Verify the intended chain(s) are present; strip extraneous chains not needed for the screen.
- Define the docking box: center on the known binding site or active site. A 30 × 30 × 30 Å box is typical for most pockets; adjust based on pocket size.
- Run a short MD simulation (10 ns minimum) to check protein stability in solution before running ensemble docking downstream. Monitor RMSD, RMSF, and radius of gyration for convergence.
3. Calibration Strategy
Calibration is the most important step that most campaigns skip. Before screening thousands or millions of compounds, you must establish which scoring function actually correlates with experimental potency for your specific target. R² from IC50 calibration dictates every engine and readout selection decision downstream.
3.1 What to Calibrate
- Use a set of compounds with known experimental activity (EC50, IC50, Ki) against your target.
- 10–20 diverse compounds with at least a 10-fold range in potency is the minimum. Wider is better.
- Include both active and inactive compounds if available.
- Run this calibration set through every docking modality you plan to use (static, flexible, ensemble, Boltz2).
3.2 Calibration Metrics to Compare
| Method / Readout | Interpretation |
|---|---|
| Ensemble Docking — CNN Affinity | Highest accuracy; preferred primary readout |
| Ensemble Docking — CNN Pose Score | Strong geometry validation |
| Flexible Docking — CNN Affinity | Good fallback; used when ensemble not run |
| Ensemble Docking — Best Affinity | Moderate; supplement with CNN metrics |
| Flexible Docking — Best Affinity | Weaker; use only as supporting signal |
| Boltz2 — pIC50 / log₁₀(IC50) | Exploratory; gate with TM score / iPTM |
| Boltz2 — Affinity Probability | Weak; use as tiebreaker only |
Compare readouts using Pearson correlation, Spearman rank correlation, and RMSE/MAE against the experimental values. Interpret each metric in light of the calibration set size: with only 10–20 compounds, a single outlier can shift a correlation substantially (n sensitivity).
- R² > 0.7 is a reliable readout for ranking — use as primary score.
- R² 0.4–0.7 is a usable supporting signal — combine with a stronger readout.
- R² < 0.4 should not be used as a standalone primary metric.
Always re-run calibration on your specific target — values will vary by protein class, binding site character, and library chemistry.
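The calibration comparison can be sketched with the standard library alone. The example below is one possible reading rather than the guide's definitive procedure: it takes R² to be the squared Pearson correlation (the R² of a simple linear fit), and it reports RMSE/MAE only where predictions and measurements share units. The function names and the numbers are invented for illustration.

```python
import math
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


def calibration_report(predicted, measured):
    """Assumes both lists are in the same units (e.g. pK vs. pIC50)."""
    r = pearson(predicted, measured)
    errors = [p - m for p, m in zip(predicted, measured)]
    return {
        "n": len(predicted),
        "pearson_r": r,
        "r2": r * r,  # assumption: R² as squared Pearson r
        "spearman_rho": spearman(predicted, measured),
        "rmse": math.sqrt(mean(e * e for e in errors)),
        "mae": mean(abs(e) for e in errors),
    }


def readout_tier(r2):
    """Map R² onto the three tiers listed above."""
    if r2 > 0.7:
        return "primary"
    if r2 >= 0.4:
        return "supporting"
    return "not standalone"


# Made-up calibration data: predicted CNN affinity (pK) vs. measured pIC50.
cnn_affinity = [5.1, 6.0, 6.4, 7.2, 7.9]
measured_pic50 = [5.0, 5.8, 6.9, 7.0, 8.1]
report = calibration_report(cnn_affinity, measured_pic50)
print(report["n"], round(report["r2"], 2), readout_tier(report["r2"]))
```

Reporting `n` alongside each metric keeps the sample-size caveat visible when readouts are compared.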
3.3 Calibration Decision Rule
- Aim for R² > 0.8: Calibrate across static, flexible, ensemble docking, and co-folding. If any readout exceeds 0.8, use it as your primary guide for downstream filtering.
- If the best R² is above 0.6 but below 0.8: Try other engines that represent the biology better, such as ensemble docking.
- If primary calibration metrics are weak: Fall back to affinity_probability as tiebreaker, gated by TM score and iPTM.
Calibration is not a one-time activity. Re-calibrate whenever you change the binding site definition, protein model, or add flexibility to new residues.
4. Phase 1 — Static (Rigid) Docking
Static docking treats both the protein and ligand as rigid bodies. It is the fastest method, and because it runs on GPUs it is the only practical choice for screening libraries in the millions. The purpose here is rapid triage: eliminate obvious non-binders rather than find the best poses.
4.1 Setup Parameters
- Protein: rigid receptor, prepared as described in Section 2.3
- Ligand: rigid SMILES input; do not generate flexible conformers at this stage
- Exhaustiveness: 8 for full library triage; increase to 16–32 for smaller batches if time allows; 200 only for calibration compounds
- Docking box: 30 × 30 × 30 Å centered on binding site (adjust per target)
- Batch processing: split large libraries into batches of 100k–250k for parallelization and fault tolerance
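The batching step can be illustrated with a small generator. This is only a sketch: the helper name `batched` is an assumption, and the tiny batch size in the demo stands in for the 100k–250k range suggested above.

```python
from itertools import islice


def batched(items, batch_size=100_000):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Tiny demo; a real run would stream SMILES from the library file.
demo_batches = list(batched(range(7), batch_size=3))
print(demo_batches)
```

Because the generator consumes its input lazily, a multi-million-compound file never has to be held in memory at once, and a failed batch can be rerun on its own.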
4.2 Key Metrics & Thresholds
| Metric | Threshold | Notes |
|---|---|---|
| Binding Affinity | < −8 kcal/mol (strict) | Hard cutoff. Compounds weaker than −8 kcal/mol are deprioritized for downstream runs |
| Binding Affinity (strong hits) | −10 to −15 kcal/mol | Compounds in this range should be forwarded preferentially to flexible/ensemble docking |
| Number of poses | ≥ 9 poses generated | Inspect top 3 poses for pharmacophoric match to known binding residues |
| Exhaustiveness | 8 (screening) → 200 (calibration) | Use low exhaustiveness for full library triage; increase to 200 only for calibration compounds |
4.3 Filtering Logic
- Remove all rows with missing or null affinity scores.
- Sort by Best Affinity ascending (most negative = strongest binding).
- Apply hard cutoff: Best Affinity < −8 kcal/mol.
- From the passing compounds, take the top 2–5% by affinity for the next phase.
- For targets where known active compounds cluster at −10 to −15 kcal/mol, set the threshold accordingly.
Distribution sanity check:
- Compute mean, median, and standard deviation of Best Affinity across the full screened set.
- If mean affinity is weaker than −8 kcal/mol, your library may not contain quality binders for this pocket, or the pocket definition needs adjustment.
- Validate that your known calibration hits fall in the top 5–10% of the distribution.
Static docking will produce false positives. The purpose of this stage is speed-based enrichment only. All static docking hits must be validated through flexible or ensemble docking.
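A minimal sketch of the Phase 1 filtering logic and distribution check follows. It assumes results arrive as `(compound_id, best_affinity)` pairs and that the top fraction is taken from the compounds passing the cutoff, as in the filtering steps above. The function name and the sample scores are hypothetical.

```python
import math
from statistics import mean, median, stdev


def static_triage(results, cutoff=-8.0, top_fraction=0.05):
    """Drop null scores, apply the hard cutoff, keep the top fraction."""
    scored = [
        (cid, aff) for cid, aff in results
        if aff is not None and not math.isnan(aff)
    ]
    scored.sort(key=lambda row: row[1])  # most negative (strongest) first
    affinities = [aff for _, aff in scored]
    summary = {
        "n_scored": len(scored),
        "mean": mean(affinities) if affinities else None,
        "median": median(affinities) if affinities else None,
        "stdev": stdev(affinities) if len(affinities) > 1 else None,
    }
    passing = [row for row in scored if row[1] < cutoff]
    n_keep = math.ceil(len(passing) * top_fraction)
    return passing[:n_keep], summary


# Made-up scores; top_fraction is inflated so the tiny demo keeps something.
demo = [("A", -9.5), ("B", None), ("C", -7.0),
        ("D", -11.2), ("E", -8.4), ("F", float("nan"))]
carried, summary = static_triage(demo, top_fraction=0.5)
print(carried, summary["n_scored"])
```

The summary statistics are computed over every scored compound, not just those passing the cutoff, so they describe the full screened distribution.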
4.4 Chemical Space Check (Post Phase 1)
After extracting your top compounds, plot a UMAP or PCA of the filtered set using Morgan fingerprints (ECFP4, radius 2, 2048 bits). Verify:
- The shortlisted compounds are not all clustered in one scaffold region (confirms chemical diversity in your carry-forward set).
- Known active compounds, if available, fall within or near the dense regions of the top-scoring set.
- Isolated outliers in chemical space are not artifacts of the library (check their raw docking scores).
- Sample across the distribution of chemical space to carry a more diverse set into further screening. These compounds are ultimately triaged into the wet lab to generate primary SAR before lead optimization, so broader chemical diversity gives you more shots on target before you optimize within a constrained space.
5. Phase 2 — Flexible Docking with CNN Rescoring
Flexible docking allows specified protein side chains (residues in the binding site) to move during the docking calculation, and incorporates a Convolutional Neural Network (CNN) to re-score poses based on geometric and energetic realism. This dramatically improves accuracy over static docking at moderate compute cost.
5.1 Setup Parameters
- Protein: same structure as Phase 1, but with designated flexible residues enabled
- Flexible residues: select residues with known pharmacophoric roles (from co-crystal data, mutagenesis, or MD RMSF analysis). Typically 2–5 residues. Do not make the entire protein flexible.
- CNN re-scoring: enabled; adds a geometry-aware neural network on top of classical docking scoring
- Exhaustiveness: 8 (default for flexible screen); increase to 32 for highest-priority batches
- Input: top-scoring compounds from Phase 1 static screen
5.2 Outputs Produced
- Best Affinity (kcal/mol): classical empirical binding energy; lower = more favorable
- Best Intramol (kcal/mol): intramolecular strain energy of the ligand in the docked pose; higher = more strained
- Best CNN Pose Score (0–1): CNN-based assessment of pose geometry and physical realism; higher = more realistic
- Best CNN Affinity: CNN-predicted binding affinity (pK metric); higher = stronger predicted binding
5.3 Multi-Dimensional Filtering Pipeline
The flexible docking filter is a sequential QC funnel, not a single cutoff. Apply in order:
| Filter / Gate | Threshold | Purpose |
|---|---|---|
| Intramol Strain Veto | Best Intramol ≤ 0 kcal/mol | Removes physically strained (implausible) conformations |
| CNN Pose Quality Gate | Best CNN Pose Score ≥ 0.6 | Validates geometric realism of the docked pose (0–1 scale) |
| Binding Energy Floor | Best Affinity ≤ −10 kcal/mol | Ensures minimum thermodynamic favorability for the pocket |
| Primary Ranking | Best CNN Affinity (desc.) | Final sort: highest CNN affinity compounds advance |
Final output selection:
- After all gates pass, sort by Best CNN Affinity descending.
- Select top N compounds (typically 3,000–4,000 as input to ensemble docking).
- Retain a backup pool (top 4,000) in case downstream ensemble docking yields insufficient hits.
CNN Pose Score ≥ 0.6 is a geometry threshold, not a strict binary. If your target class shows systematically lower pose scores (e.g., allosteric or shallow sites), adjust downward — but document the change.
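The sequential gates in the table above might be chained as in the sketch below. The dictionary keys (`best_intramol`, `best_cnn_pose`, `best_affinity`, `best_cnn_affinity`) are hypothetical column names rather than the output schema of any particular docking engine, and the rows are invented.

```python
def flexible_docking_funnel(rows, top_n=4000, pose_min=0.6, affinity_max=-10.0):
    """Apply the gates in order and record attrition after each one."""
    counts = {"input": len(rows)}

    pool = [r for r in rows if r["best_intramol"] <= 0.0]       # strain veto
    counts["after_intramol"] = len(pool)

    pool = [r for r in pool if r["best_cnn_pose"] >= pose_min]  # pose quality
    counts["after_pose"] = len(pool)

    pool = [r for r in pool if r["best_affinity"] <= affinity_max]  # energy floor
    counts["after_affinity"] = len(pool)

    # Primary ranking: highest CNN affinity first.
    pool.sort(key=lambda r: r["best_cnn_affinity"], reverse=True)
    return pool[:top_n], counts


def row(cid, intramol, pose, affinity, cnn):
    return {"id": cid, "best_intramol": intramol, "best_cnn_pose": pose,
            "best_affinity": affinity, "best_cnn_affinity": cnn}


demo_rows = [
    row("r1", -0.3, 0.8, -11.0, 7.5),
    row("r2", 0.4, 0.9, -12.0, 8.0),   # fails the strain veto
    row("r3", -0.1, 0.5, -11.5, 7.9),  # fails the pose gate
    row("r4", 0.0, 0.6, -9.0, 7.0),    # fails the energy floor
    row("r5", -1.0, 0.9, -10.0, 8.1),
]
selected, counts = flexible_docking_funnel(demo_rows)
print([r["id"] for r in selected], counts)
```

Exposing `pose_min` as a parameter makes it easy to lower the pose gate for shallow or allosteric sites, and the `counts` dictionary documents the effect of any such change.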
5.4 Metric Definitions Reference
- Best Affinity: Additive, empirical energy calculation. Reflects thermodynamic favorability of the interaction. Lower (more negative) is better.
- Best Intramol: Minimum intramolecular energy of the ligand in its docked conformation. Values > 0 indicate steric clashes or physically impossible geometries. Acts as a hard veto.
- Best CNN Pose Score: Binary-style geometric validation (0–1 scale) assessing whether the pose looks physically realistic based on thousands of known crystal structures. Values ≥ 0.6 indicate plausible poses.
- Best CNN Affinity: Neural network-predicted binding affinity derived from pose geometry and energetics. This is the highest-quality readout from flexible docking and the primary ranking signal.
6. Phase 3 — Ensemble Docking
Ensemble docking accounts for protein conformational dynamics by docking compounds against multiple representative protein structures, each from a different point in a molecular dynamics trajectory. This captures the protein’s natural flexibility beyond individual side chains and substantially reduces false positives.
6.1 Generating the Protein Ensemble
- Run an MD simulation of the apo or holo protein for at least 10 ns (100 ns preferred for full equilibration).
- Confirm equilibration: RMSD should plateau; RMSF should show stable core with defined flexible loops; radius of gyration should level off.
- Extract representative snapshots at regular intervals (e.g., every 2 ns for a 10 ns simulation = 5 conformers; every 10 ns for 100 ns = 10 conformers).
- Optionally exclude outlier snapshots where known binding site geometry is disrupted.
- Run docking against each conformer independently, then aggregate scores per compound.
6.2 Score Aggregation per Compound
For each unique compound (identified by SMILES), aggregate across all conformers:
- Best CNN Affinity → take the maximum across all conformers
- Best Affinity → take the minimum (most negative) across all conformers
- Best Intramol → take the minimum across all conformers
This aggregation captures the best observed interaction of the compound with any accessible protein conformation. It is more informative than any single-conformer score.
6.3 Filtering Pipeline
| Step | Operation | Rationale |
|---|---|---|
| 1. Dedup / Aggregate | Per SMILES: CNN Affinity → max; Best Affinity → min; Intramol → min | Collapses multiple poses per compound to single representative scores |
| 2. CNN Affinity Filter | ≥ 80th percentile of set | Selects top-scoring compounds by neural network affinity prediction |
| 3. Best Affinity Filter | ≤ 20th percentile (within subset) | Confirms thermodynamic favorability via classical scoring within the CNN-filtered pool |
| 4. Intramol Veto | Best Intramol < 0 kcal/mol | Hard exclusion of strained structures |
| 5. Final Rank | CNN Affinity (desc.) | Sort and take top N |
Percentile-based thresholds (80th/20th) are relative to the screened set. Re-compute percentiles after aggregation, not from pre-aggregation raw scores.
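The aggregation and percentile steps can be sketched as follows. Two assumptions are baked in and should be checked against your own pipeline: the per-compound "best" Best Affinity is taken as the most negative value across conformers (since lower is more favorable), and percentiles use simple linear interpolation. Row keys and function names are hypothetical.

```python
from collections import defaultdict


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    data = sorted(values)
    if not data:
        raise ValueError("percentile of empty list")
    pos = (len(data) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def aggregate(rows):
    """Collapse per-conformer rows to one row per SMILES."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[r["smiles"]].append(r)
    return [
        {
            "smiles": smiles,
            "cnn_affinity": max(h["cnn_affinity"] for h in hits),
            # Assumption: most negative classical score = best interaction.
            "best_affinity": min(h["best_affinity"] for h in hits),
            "best_intramol": min(h["best_intramol"] for h in hits),
        }
        for smiles, hits in grouped.items()
    ]


def ensemble_filter(rows, top_n=500):
    agg = aggregate(rows)
    # Percentiles are computed after aggregation, on the aggregated scores.
    cnn_cut = percentile([r["cnn_affinity"] for r in agg], 80)
    pool = [r for r in agg if r["cnn_affinity"] >= cnn_cut]
    aff_cut = percentile([r["best_affinity"] for r in pool], 20)
    pool = [r for r in pool if r["best_affinity"] <= aff_cut]
    pool = [r for r in pool if r["best_intramol"] < 0]
    pool.sort(key=lambda r: r["cnn_affinity"], reverse=True)
    return pool[:top_n]


# Made-up per-conformer rows: compound A docked twice, compound B once.
demo_rows = [
    {"smiles": "A", "cnn_affinity": 6.0, "best_affinity": -9.0, "best_intramol": -0.5},
    {"smiles": "A", "cnn_affinity": 7.0, "best_affinity": -10.5, "best_intramol": -0.2},
    {"smiles": "B", "cnn_affinity": 5.0, "best_affinity": -8.0, "best_intramol": -0.1},
]
aggregated = aggregate(demo_rows)
survivors = ensemble_filter(demo_rows)
print(aggregated[0], [r["smiles"] for r in survivors])
```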
- Flexible docking moves only designated side chains. Ensemble docking samples backbone movements and global conformational states that flexible docking cannot reach.
- CNN Affinity R² typically improves by 15–25% going from flexible to ensemble docking in a well-calibrated system.
- Ensemble docking is significantly more compute-intensive. Reserve it for the pre-filtered pool (3k–5k compounds) from Phase 2, not the full library.
7. Alternative Pipeline — Boltz2 Co-folding
Boltz2 is a structure prediction model that co-folds a protein–ligand complex from sequence and SMILES, generating predicted binding poses and associated confidence metrics. It is the most computationally expensive per-compound method and is reserved for the top 500–1,000 candidates from ensemble docking.
7.1 Key Outputs from Boltz2
| Readout | Good Range / Threshold | Interpretation / Notes |
|---|---|---|
| pIC50 | > 5.0 (i.e. IC50 < 10 μM) | Predicted potency in log scale. Use as primary ranking when IC50 calibration R² > 0.5 |
| Affinity Probability | > 0.5 (higher = better) | Model confidence in a binding event. Use as tiebreaker or when IC50 calibration is weak |
| Predicted TM Score | ≥ 0.5 | Structural reliability of the co-folded complex. Gate: reject compounds below threshold regardless of affinity |
| iPTM (interface pTM) | > 0.6 preferred | Interface quality score. High iPTM with low TM = good binding pose but uncertain overall fold; still useful |
| Confidence Score | Use only as supporting signal | Low predictive correlation with experimental IC50 on its own; context-dependent |
7.2 Ranking Strategy — Choosing the Right Readout
Use the calibration R² from Section 3 to select your primary readout:
- If IC50 calibration R² ≥ 0.5 for pIC50: rank by pIC50 descending; gate by TM score ≥ 0.5 and iPTM > 0.6
- If IC50 calibration R² < 0.5: use affinity_probability as tiebreaker, with TM score and iPTM as mandatory gates
- In all cases: compute predicted_ln(ic50)_nM = log₁₀(predicted_ic50_nM) and include it in the export for downstream reference (note that, despite "ln" in the name, this is a base-10 logarithm)
- Always apply confidence gates before using any potency ranking — a high pIC50 with a low TM score is not a trustworthy prediction
7.3 Gated Confidence Filtering
- Gate 1 (Hard): TM Score ≥ 0.5. Predictions below this threshold indicate the model failed to produce a reliable fold and should be excluded.
- Gate 2 (Soft): iPTM > 0.6. Interface quality. Values below this suggest the binding interface is not well-modeled; flag but do not necessarily exclude if other metrics are strong.
- Gate 3 (Context-dependent): Affinity Probability > 0.5 as a supporting filter when potency data is unavailable.
Boltz2 calibrates better for some target classes than others. Co-folding performance is weakest for large allosteric sites, covalent binders, and metal-coordinated ligands. Use with appropriate skepticism and weight against docking data.
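The gating and ranking rules in Sections 7.2 and 7.3 could be combined roughly as shown below. The field names (`tm_score`, `iptm`, `predicted_ic50_nm`, `affinity_probability`) are hypothetical and do not reflect the actual Boltz2 output schema. The sketch treats TM score as a hard gate and iPTM as a soft flag, following Section 7.3, and uses the identity pIC50 = 9 − log₁₀(IC50 in nM).

```python
import math


def pic50_from_ic50_nm(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L) = 9 - log10(IC50 in nM)."""
    return 9.0 - math.log10(ic50_nm)


def rank_boltz2(preds, pic50_calibration_r2, tm_min=0.5, iptm_min=0.6):
    """Hard-gate on TM score, flag weak interfaces, then rank."""
    gated = [dict(p) for p in preds if p["tm_score"] >= tm_min]
    for p in gated:
        p["iptm_flag"] = p["iptm"] <= iptm_min  # soft gate: flag, keep
        p["log10_ic50_nm"] = math.log10(p["predicted_ic50_nm"])
        p["pic50"] = pic50_from_ic50_nm(p["predicted_ic50_nm"])
    # Choose the ranking readout from the calibration result.
    key = "pic50" if pic50_calibration_r2 >= 0.5 else "affinity_probability"
    gated.sort(key=lambda p: p[key], reverse=True)
    return gated


# Made-up predictions for illustration.
preds = [
    {"id": "X", "tm_score": 0.4, "iptm": 0.8,
     "predicted_ic50_nm": 10.0, "affinity_probability": 0.9},
    {"id": "Y", "tm_score": 0.7, "iptm": 0.5,
     "predicted_ic50_nm": 100.0, "affinity_probability": 0.55},
    {"id": "Z", "tm_score": 0.8, "iptm": 0.7,
     "predicted_ic50_nm": 1000.0, "affinity_probability": 0.8},
]
by_potency = rank_boltz2(preds, pic50_calibration_r2=0.6)
by_probability = rank_boltz2(preds, pic50_calibration_r2=0.3)
print([p["id"] for p in by_potency], [p["id"] for p in by_probability])
```

Compound X is dropped despite having the lowest predicted IC50, which illustrates why confidence gates are applied before any potency ranking.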
7.4 Hybrid Boltz2 + Ensemble Approach (Exploratory)
As an exploratory check, you can merge Boltz2 outputs with ensemble docking scores:
- Apply the same ensemble pose filters (Section 6.3) to the merged set.
- Rank by predicted_pic50 descending.
- Use this as a comparison list, not as the primary deliverable.
- If Boltz2 IC50 calibration is valid, the Boltz2-primary ranking supersedes the hybrid for final compound selection.
8. Chemical Space Analysis
Chemical space visualization is performed at two key points in the campaign: (1) after Phase 1 to verify diversity of the carry-forward pool, and (2) before final compound selection to ensure the final set covers the activity landscape. If you skip this step, you risk selecting 50 near-identical compounds built on a single scaffold, which yields little of what a primary screen is meant to deliver: SAR.
8.1 Fingerprinting
All chemical space analysis uses Morgan fingerprints (ECFP) as the molecular representation:
- Morgan ECFP4: radius 2, 2048-bit vector. Standard default.
- Morgan ECFP6: radius 3 for finer resolution of large, complex libraries.
- Generate fingerprints for the full screened set and the shortlisted subset simultaneously for direct comparison.
8.2 Dimensionality Reduction Methods
| Method | What It Captures | When to Use | Interpretation Tips |
|---|---|---|---|
| PCA | Global structural diversity (variance-maximizing axes) | First pass: gauge overall chemical space breadth | Tight clusters = structurally similar compounds; spread = diverse library |
| tSNE | Local neighborhoods and cluster relationships | When you need to identify sub-families or scaffold clusters | Not comparable across runs; don’t read into global distances |
| UMAP | Both global and local structure simultaneously | Default for visualizing screened sets; best middle ground | Clusters with high CNN affinity/color encoding = activity hotspots to prioritize |
Recommended workflow:
- Run UMAP as your default visualization. It balances global structure and local clusters.
- Overlay activity scores (CNN affinity, binding energy) as a color dimension to identify activity hotspots.
- Run PCA as a secondary check to validate that diversity metrics are not UMAP artifact-driven.
- Run tSNE only when you need to investigate specific scaffold families or local cluster composition.
8.3 What to Look For
Good diversity indicators:
- Top-scoring compounds (color-coded) are spread across multiple UMAP regions, not concentrated in one cluster.
- High-affinity regions have some overlap with known active scaffolds but also extend into novel chemical space.
- The shortlisted set (e.g., top 3k from flexible docking) covers most of the dense regions of the full filtered pool.
Red flags:
- More than 70% of top compounds fall within one tight UMAP cluster, which indicates scaffold enrichment rather than diverse chemistry.
- Known active scaffolds are completely absent from the top-scoring set — may indicate a pocket definition problem.
- The screened set collapses into a single dense region after filtering — likely because a narrow binding energy range was used; consider relaxing the threshold.
It is normal and expected for some chemical clusters to correspond to high CNN affinity or binding activity. This does not disqualify them. The goal is not to eliminate clusters, but to ensure that your final compound list is not exclusively drawn from one cluster while missing other potentially active regions. A compound list of 50 from one scaffold provides weak SAR; 50 compounds spanning 5–10 distinct scaffolds provides a robust SAR foundation.
9. Final Diversity-Driven Compound Selection
After all computational filters have been applied, you will typically have 100–500 compounds that pass potency and quality thresholds. Final selection to the experimental batch size (typically 50–100) requires balancing hit quality with chemical diversity.
9.1 Seed Set Definition
Start by anchoring the selection to confirmed high-quality compounds:
- Extract the top 10 compounds by highest CNN affinity (or pIC50 if Boltz2 was primary).
- Extract the top 10 compounds by lowest binding energy (most negative Best Affinity).
- Deduplicate to produce a seed set. This ensures the final list retains the best-scoring compounds regardless of diversity algorithm outcome.
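The seed set construction can be sketched as below. The candidate record keys and the function name are hypothetical, and the demo uses `n_each=2` instead of 10 so the overlap between the two top lists is easy to see.

```python
def build_seed_set(candidates, n_each=10):
    """Union of top-n by CNN affinity and top-n by most negative Best Affinity."""
    by_cnn = sorted(candidates, key=lambda c: c["cnn_affinity"], reverse=True)[:n_each]
    by_energy = sorted(candidates, key=lambda c: c["best_affinity"])[:n_each]
    seeds = {}
    for c in by_cnn + by_energy:
        seeds.setdefault(c["smiles"], c)  # deduplicate on SMILES, keep first
    return list(seeds.values())


# Made-up candidates for illustration.
candidates = [
    {"smiles": "a", "cnn_affinity": 8.0, "best_affinity": -9.0},
    {"smiles": "b", "cnn_affinity": 7.5, "best_affinity": -12.0},
    {"smiles": "c", "cnn_affinity": 7.0, "best_affinity": -11.0},
    {"smiles": "d", "cnn_affinity": 6.0, "best_affinity": -8.0},
]
seed_set = build_seed_set(candidates, n_each=2)
print([c["smiles"] for c in seed_set])
```

Because the two top lists usually overlap, the seed set will often contain fewer than 20 compounds.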
9.2 MaxMin Selection
MaxMin (Maximum Minimum Distance) is the standard diversity selection algorithm for compound libraries. It maximizes structural spread across the chemical space.
- Compute Morgan ECFP4 fingerprints for all candidate compounds.
- Compute pairwise Tanimoto distance = 1 − Tanimoto similarity for all compound pairs.
- Initialize with the seed set (best-scoring compounds from 9.1).
- Iteratively add the candidate whose minimum Tanimoto distance to the already-selected compounds is largest (this maximizes the minimum pairwise distance).
- Continue until target selection size is reached.
Why MaxMin works here: By seeding with the top-affinity compounds, you guarantee the best hits are included. MaxMin then fills the remainder with maximally diverse chemistry, ensuring coverage of activity hotspots across the chemical space rather than oversampling one scaffold family.
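A dependency-free sketch of seeded MaxMin is shown below. It assumes each fingerprint is supplied as a set of on-bit indices (for example, extracted upstream from ECFP4 bit vectors), and the identifiers and tiny fingerprints are invented. Cheminformatics toolkits typically ship optimized pickers, which would be preferable for large candidate pools.

```python
def tanimoto_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def maxmin_select(fingerprints, seed_ids, n_select):
    """Grow the seed set by repeatedly adding the most distant candidate."""
    selected = list(dict.fromkeys(seed_ids))  # keep seeds, drop repeats
    chosen = set(selected)
    # For every unselected compound, track its distance to the nearest pick.
    min_dist = {}
    for cid, fp in fingerprints.items():
        if cid in chosen:
            continue
        min_dist[cid] = min(
            (tanimoto_distance(fp, fingerprints[s]) for s in selected),
            default=float("inf"),
        )
    while len(selected) < n_select and min_dist:
        pick = max(min_dist, key=min_dist.get)  # largest minimum distance
        selected.append(pick)
        del min_dist[pick]
        for cid in min_dist:
            d = tanimoto_distance(fingerprints[cid], fingerprints[pick])
            min_dist[cid] = min(min_dist[cid], d)
    return selected


# Toy fingerprints as sets of on-bit indices (illustrative only).
fps = {
    "s1": {1, 2, 3, 4},
    "a": {1, 2, 3, 5},
    "b": {10, 11, 12},
    "c": {1, 2, 10, 11},
}
picked = maxmin_select(fps, seed_ids=["s1"], n_select=3)
print(picked)
```

Caching each candidate's nearest-pick distance means every iteration compares candidates only against the newest pick instead of recomputing all pairwise distances.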
9.3 Verification of Final Set
Before finalizing, verify the selected set using UMAP and PCA:
- Plot seed compounds (blue), MaxMin-selected compounds (red), and the full candidate pool (grey).
- Confirm seed and MaxMin compounds are distributed across the major UMAP clusters.
- Verify no single cluster accounts for more than 30–40% of the final selected set (unless target biology demands scaffold specificity).
- If diversity is inadequate, relax the seed set size or broaden the input candidate pool.
Chemical diversity in the experimental set is not just aesthetically preferable — it directly determines the SAR value of the data returned from the wet lab. Redundant scaffolds give you redundant data.
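The cluster-share check can also be run numerically alongside the plots. The sketch below assumes each selected compound already carries a cluster label (from whatever clustering you applied to the embedding or fingerprints), and it uses 0.4 as the default ceiling, the upper end of the 30–40% guideline. Function names and labels are hypothetical.

```python
from collections import Counter


def cluster_shares(cluster_labels):
    """Fraction of the selected set that falls in each cluster."""
    total = len(cluster_labels)
    if total == 0:
        return {}
    return {label: n / total for label, n in Counter(cluster_labels).items()}


def dominant_clusters(cluster_labels, max_share=0.4):
    """Clusters whose share of the final set exceeds max_share."""
    return {
        label: share
        for label, share in cluster_shares(cluster_labels).items()
        if share > max_share
    }


# Made-up labels for a 10-compound selection.
labels = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
print(dominant_clusters(labels))
```

An empty result means no cluster exceeds the ceiling. A non-empty result is a prompt to revisit the seed set or candidate pool, unless the target biology justifies the concentration.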
10. Post-Selection: MD Simulation Validation
The top 50–100 compounds from the final selection should undergo individual molecular dynamics simulation before ordering. This step provides the highest-confidence computational validation and filters the list once more before incurring synthesis or procurement costs.
10.1 Short MD Run (0.1 ns) — Screening Mode
- Run all final candidates at 0.1 ns in complex with the target.
- Compute MMPBSA/MMGBSA binding free energy as an approximation.
- Note: 0.1 ns simulations are pre-equilibrium. Energies will be overestimates but are useful for relative ranking and early flag-raising.
- Flag any compound showing immediate ligand displacement or catastrophic pose degradation.
10.2 Full MD Run (1–10 ns) — Final Validation
- Take the top 20–30 compounds from the 0.1 ns screen and run for 1–10 ns.
- Monitor RMSD of the ligand in the binding site: stable RMSD < 2–3 Å indicates retained binding.
- Monitor RMSF of key binding residues: unexpected rigidification or large fluctuations signal poor complementarity.
- SASA (Solvent Accessible Surface Area): ligand binding should reduce solvent exposure in the pocket.
- Compute final MMPBSA/MMGBSA energies. Reference: compounds with −15 to −20 kcal/mol binding free energies are strong candidates.
10.3 Binding Free Energy Benchmarks
| Potency Category | IC50 Range | MMPBSA Range |
|---|---|---|
| Tight binders | < 100 nM | −15 to −20 kcal/mol |
| Moderate binders | 1–10 μM | −8 to −15 kcal/mol |
| Weak / threshold binders | > 10 μM | Weaker (less negative) than −8 kcal/mol |
Proceed with weak/threshold binders only with strong computational evidence from other engines.
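As a small illustration, the benchmark ranges can be turned into a labeling helper, together with the short-MD shortlist step from Section 10.1. This is a sketch under stated assumptions: energies less negative than −8 kcal/mol are treated as weak/threshold, a value of exactly −15 kcal/mol is assigned to the tight category, and the names and energies are invented.

```python
def potency_category(binding_free_energy):
    """Label an MMPBSA/MMGBSA energy (kcal/mol) using the benchmark ranges."""
    if binding_free_energy <= -15.0:
        return "tight"
    if binding_free_energy <= -8.0:
        return "moderate"
    return "weak/threshold"


def shortlist_for_full_md(energies, n=30):
    """Rank compounds by most negative short-MD energy and keep the top n."""
    ranked = sorted(energies, key=energies.get)
    return ranked[:n]


# Made-up short-MD energies in kcal/mol.
energies = {"x": -12.0, "y": -18.5, "z": -6.0}
print({cid: potency_category(e) for cid, e in energies.items()})
print(shortlist_for_full_md(energies, n=2))
```

Since 0.1 ns energies are pre-equilibrium, the shortlist relies only on their relative order, not on the absolute category labels.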
11. Complete Campaign Checklist
Pre-Screening
- Library sourced and deduplicated by canonical SMILES
- ADMET pre-filters applied (if applicable)
- Protein structure prepared, cleaned, metal cofactors retained
- Docking box defined and validated against known ligand binding site
- MD simulation run; equilibration confirmed
Calibration
- Calibration compounds with experimental IC50/EC50 values assembled
- All docking modalities run on calibration set
- R² table computed; primary readout for each phase selected
- Engine selection documented
Phase 1 — Static Docking
- Full library screened; batched as needed
- Best Affinity < −8 kcal/mol filter applied
- Top 2–5% of library carried forward
- UMAP diversity check of carry-forward pool completed
Phase 2 — Flexible Docking
- Flexible residues designated based on SAR/MD analysis
- Intramol ≤ 0 gate applied
- CNN Pose Score ≥ 0.6 gate applied
- Binding Energy ≤ −10 kcal/mol gate applied
- Final sort by CNN Affinity; top 3–4k selected
Phase 3 — Ensemble Docking
- MD ensemble generated; snapshots extracted
- Docking run against each conformer
- Scores aggregated per SMILES
- 80th/20th percentile filters applied
- Top 500 compounds selected
Phase 4 — Boltz2 (Optional)
- Co-folding run on top 500–1,000
- TM Score ≥ 0.5 gate applied
- Primary readout selected based on calibration (pIC50 or affinity_probability)
- Top 50–100 candidates identified
Final Selection
- Seed set defined (top 10 CNN affinity + top 10 binding energy)
- Morgan ECFP4 fingerprints computed
- Tanimoto distances computed
- MaxMin selection run to target N (50–100)
- UMAP/PCA of final set verifies diversity
- 0.1 ns MD validation run on all final candidates
- Full MD validation (1–10 ns) on top 20–30 candidates
- Final compound list prepared for procurement or synthesis
Your final experimental compound set should represent diverse scaffolds, pass physical plausibility filters, show strong predicted binding energies across multiple engines, and include both the best-scoring hits and structurally distinct representatives from each major activity cluster.
12. Glossary
| Term | Definition |
|---|---|
| Best Affinity (kcal/mol) | Classical empirical binding energy from docking. Lower (more negative) = stronger predicted binding. |
| Best Intramol (kcal/mol) | Intramolecular strain energy of the ligand in its docked conformation. Values > 0 indicate physically implausible geometry (hard veto). |
| CNN Affinity | Convolutional neural network-predicted binding affinity. Trained on crystal structures; captures geometric and energetic features beyond classical scoring. |
| CNN Pose Score (0–1) | Binary-style geometric validation of the docked pose. ≥ 0.6 = physically plausible geometry. |
| Exhaustiveness | Docking algorithm search parameter. Higher = more thorough search but slower. Use 8 for triage; 200 for calibration. |
| pIC50 | Negative log₁₀ of IC50 in molar units. Higher = more potent. pIC50 = 9 corresponds to IC50 = 1 nM. |
| TM Score | Template Modeling score from Boltz2. Measures structural similarity of predicted complex to a plausible reference. ≥ 0.5 indicates reliable fold prediction. |
| iPTM | Interface predicted TM score. Measures quality of the binding interface specifically. > 0.6 preferred. |
| Affinity Probability | Boltz2 model confidence in a binding event occurring. Supplements pIC50 as a tiebreaker. |
| MaxMin Selection | Diversity selection algorithm. Iteratively adds compounds maximally distant from the current set by Tanimoto distance. |
| Tanimoto Similarity | Fingerprint-based pairwise molecular similarity metric (0–1). Tanimoto Distance = 1 − similarity. |
| Morgan / ECFP | Extended Connectivity FingerPrint. Circular fingerprint encoding chemical neighborhoods around each atom. ECFP4 (radius 2) is standard. |
| UMAP | Uniform Manifold Approximation and Projection. Dimensionality reduction for chemical space visualization capturing both global and local structure. |
| MMPBSA/MMGBSA | Molecular Mechanics Poisson–Boltzmann / Generalized Born Surface Area. Free energy calculation methods for estimating binding affinity from MD trajectories. |
| R² (Calibration) | Coefficient of determination between predicted docking scores and experimental IC50 values. Primary metric for engine selection. |
| SMILES | Simplified Molecular Input Line Entry System. String representation of molecular structure. Used as canonical identifier for deduplication. |