End-to-End Virtual Screening Campaign Guide

Filtering Mega Chemical Spaces into Experimental-Ready Compound Sets

This guide provides a complete, engine-by-engine protocol for running a computational drug discovery virtual screening campaign, from initial library ingestion through final compound selection for wet lab testing. It covers static docking, flexible docking, ensemble docking, Boltz2 co-folding, chemical space analysis, and diversity-driven final selection.

Follow the funnel in order. Each phase feeds the next. Do not skip phases.

1. Campaign Philosophy & Funnel Architecture

Virtual screening campaigns operate as a staged triage funnel. The fundamental trade-off at every stage is compute speed vs. accuracy. Faster, lower-accuracy methods process large libraries to remove obvious non-binders, while slower, higher-accuracy methods are reserved for the smaller, pre-enriched pools that survive each cut.

The Three-Phase Funnel

Phase 1 — Static Docking: Process millions of compounds cheaply. Hard-filter on binding affinity. Carry forward the top 2–5%.
Phase 2 — Flexible Docking: Introduce receptor flexibility + CNN rescoring. Multi-dimensional filter with physical plausibility gates. Output ~3,000–4,000 compounds.
Phase 3 — Ensemble Docking: Use molecular dynamics-derived protein conformers. Aggregate across snapshots. CNN affinity + pose filters. Output ~500 compounds.

Alternative Funnel (use when calibration benchmarks are performing well and you still have 3–10k compounds that require filtration):

Phase 4 (optional) — Boltz2 Co-folding: Deepest structural predictions. Gate on TM score, iPTM, pIC50 or affinity probability. Output top 50–100 candidates.

Master Decision Summary — Three-Phase Docking Pipeline

Engine	Library Size	Primary Metric	Key Threshold	Output / Next Stage
Static Docking	Full library (1M–2M)	Best Affinity (kcal/mol)	< −8 kcal/mol	Top 2–5% → Flexible Docking
Flexible Docking	~40–50k filtered	CNN Affinity (desc.)	Intramol ≤ 0; CNN Pose ≥ 0.6	Top 3–4k → Ensemble Docking
Ensemble Docking	3–5k	CNN Affinity (aggregated)	80th/20th percentile filters	Top 500 → MD/FEP

Alternative Pipeline

Engine	Library Size	Primary Metric	Key Threshold	Output / Next Stage
Boltz2 Co-folding	500–1,000	pIC50 / affinity_prob	TM ≥ 0.5; iPTM > 0.6	Top 50–100 → Diversity + Wet Lab

Library sizes above are representative; scale thresholds based on your compute budget and target class. Calibration R² always drives engine selection — see Section 3.

2. Pre-Screening: Library Preparation

2.1 Source & Deduplication

Before any docking begins, prepare the compound library to ensure clean, unique chemistry.

Consolidate all library sources (sub-batches, vendors, internal plates) into a single file.
Deduplicate on canonical SMILES string — not on compound identifier or name.
Record: total input count, unique count, duplicate count. This becomes your baseline.
Log the deduplication statistics explicitly. They matter for tracking compound attrition across the funnel.

Deduplication by SMILES ensures unique chemistry evaluation. Duplicates skew frequency-based rankings and waste compute.
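A minimal sketch of the bookkeeping described above, assuming every SMILES string has already been canonicalized by your cheminformatics toolkit (the snippet does no canonicalization itself). The function name `dedup_stats` and the example library are illustrative only.

```python
from collections import Counter

def dedup_stats(canonical_smiles):
    """Return unique SMILES (first-seen order) plus the attrition baseline."""
    counts = Counter(canonical_smiles)  # keeps first-insertion order
    unique = list(counts)
    total = len(canonical_smiles)
    stats = {
        "total_input": total,
        "unique": len(unique),
        "duplicates": total - len(unique),
    }
    return unique, stats

# Made-up example: strings are assumed to be canonical already.
library = ["CCO", "c1ccccc1", "CCO", "CC(=O)O", "c1ccccc1"]
unique, stats = dedup_stats(library)
print(stats)
```

Logging the returned dictionary alongside each later filter's counts gives you the attrition trail for the whole funnel.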

2.2 ADMET & Physicochemical Pre-Filtering (Optional Pre-Screen)

For very large libraries (>500k), applying lightweight physicochemical filters before docking can reduce noise and runtime. These are not required but are recommended when the library source is broad (e.g., a general commercial collection rather than a focused set).

Lipinski Ro5: MW ≤ 500, logP ≤ 5, H-bond donors ≤ 5, H-bond acceptors ≤ 10
QED score (Quantitative Estimate of Drug-likeness): QED ≥ 0.4 for lead-like filtering
TPSA: ≤ 140 Å² (general oral bioavailability proxy)
Rotatable bonds: ≤ 10 (conformational complexity filter)
Pan-assay interference compounds (PAINS): flag and optionally remove reactive, promiscuous scaffolds

You can use RevADMET for this task.

Apply these only if your library is chemically unfiltered. For targeted collections (e.g., Enamine US/UA stock), these filters add marginal value and may remove valid hits.
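The thresholds above can be applied as a simple pass/fail check. This is a sketch under the assumption that descriptors are already computed by an upstream tool; the dictionary keys (`mw`, `logp`, `pains_flag`, and so on) are hypothetical names, and the two compounds are made-up values.

```python
# Upper limits taken from the list above; map the keys to whatever
# your descriptor calculator actually emits.
UPPER_LIMITS = {"mw": 500, "logp": 5, "hbd": 5, "hba": 10,
                "tpsa": 140, "rot_bonds": 10}
MIN_QED = 0.4

def passes_prefilter(desc, drop_pains=True):
    """Return True when precomputed descriptors satisfy every threshold."""
    if any(desc[key] > limit for key, limit in UPPER_LIMITS.items()):
        return False
    if desc["qed"] < MIN_QED:
        return False
    # PAINS removal is optional in this guide, so it is a switch here.
    if drop_pains and desc.get("pains_flag", False):
        return False
    return True

compounds = [
    {"id": "cpd-1", "mw": 342.4, "logp": 2.9, "hbd": 2, "hba": 5,
     "tpsa": 78.0, "rot_bonds": 4, "qed": 0.71, "pains_flag": False},
    {"id": "cpd-2", "mw": 612.7, "logp": 6.3, "hbd": 4, "hba": 11,
     "tpsa": 151.0, "rot_bonds": 13, "qed": 0.22, "pains_flag": False},
]
kept = [c["id"] for c in compounds if passes_prefilter(c)]
print(kept)
```

Setting `drop_pains=False` keeps flagged compounds in the pool so they can be reviewed rather than silently removed.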

2.3 Protein Target Preparation

The quality of the protein structure drives the quality of every downstream docking run. Do not skip this step.

Source a high-resolution crystal structure for your target from the PDB, or use an AlphaFold2/3 model if no crystal structure exists.
Remove all co-crystallized ligands, water molecules, and ions that are not part of the binding site.
Retain metal cofactors that are known to participate in binding (e.g., zinc in coordination sites).
Verify the intended chain(s) are present; strip extraneous chains not needed for the screen.
Define the docking box: center on the known binding site or active site. A 30 × 30 × 30 Å box is typical for most pockets; adjust based on pocket size.
Run a short MD simulation (10 ns minimum) to check protein stability in solution before running ensemble docking downstream. Monitor RMSD, RMSF, and radius of gyration for convergence.

3. Calibration Strategy

Calibration is the most important step that most campaigns skip. Before screening thousands or millions of compounds, you must establish which scoring function actually correlates with experimental potency for your specific target. R² from IC50 calibration dictates every engine and readout selection decision downstream.

3.1 What to Calibrate

Use a set of compounds with known experimental activity (EC50, IC50, Ki) against your target.
10–20 diverse compounds with at least a 10-fold range in potency is the minimum. Wider is better.
Include both active and inactive compounds if available.
Run this calibration set through every docking modality you plan to use (static, flexible, ensemble, Boltz2).

3.2 Calibration Metrics to Compare

Method / Readout	Interpretation
Ensemble Docking — CNN Affinity	Highest accuracy; preferred primary readout
Ensemble Docking — CNN Pose Score	Strong geometry validation
Flexible Docking — CNN Affinity	Good fallback; used when ensemble not run
Ensemble Docking — Best Affinity	Moderate; supplement with CNN metrics
Flexible Docking — Best Affinity	Weaker; use only as supporting signal
Boltz2 — pIC50 / log₁₀(IC50)	Exploratory; gate with TM score / iPTM
Boltz2 — Affinity Probability	Weak; use as tiebreaker only

Compare readouts using Pearson and Spearman correlation coefficients together with RMSE/MAE. Also account for the number of compounds in your calibration set (sensitivity to n): with only 10–20 compounds, correlation estimates are noisy.

R² > 0.7 is a reliable readout for ranking — use as primary score.
R² 0.4–0.7 is a usable supporting signal — combine with a stronger readout.
R² < 0.4 should not be used as a standalone primary metric.

Always re-run calibration on your specific target — values will vary by protein class, binding site character, and library chemistry.
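A minimal, dependency-free sketch of the calibration statistics, assuming predicted and experimental values are on the same scale (for example, both as pIC50). R² is taken here as the squared Pearson correlation, which matches the coefficient of determination of a simple linear fit; the function names and the example numbers are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))

def readout_tier(r2):
    """Map R² onto the three usage tiers listed above."""
    if r2 > 0.7:
        return "primary"
    if r2 >= 0.4:
        return "supporting"
    return "not standalone"

# Made-up calibration set (predicted vs. experimental pIC50).
predicted = [6.1, 7.0, 5.2, 8.3, 6.8]
experimental = [5.9, 7.4, 5.0, 8.1, 6.5]
r2 = pearson(predicted, experimental) ** 2
print(readout_tier(r2), spearman(predicted, experimental), rmse(predicted, experimental))
```

Running the same functions on every engine's readout gives you the R² table used for engine selection.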

3.3 Calibration Decision Rule

Aim for R² > 0.8: Calibrate across static, flexible, ensemble docking, and co-folding. If a readout clears that bar, use it as your primary guide for downstream filtering.
If R² is above 0.6 but short of 0.8: Try other engines that represent the biology better, such as ensemble docking.
If primary calibration metrics are weak: Fall back to affinity_probability as a tiebreaker, gated by TM score and iPTM.

Calibration is not a one-time activity. Re-calibrate whenever you change the binding site definition, protein model, or add flexibility to new residues.

4. Phase 1 — Static (Rigid) Docking

Static docking treats both the protein and ligand as rigid bodies. It is the fastest method, making it the only practical choice for screening libraries in the millions. The purpose here is rapid triage: eliminate obvious non-binders, not find the best poses. It is GPU-driven, which is what lets it move this fast.

4.1 Setup Parameters

Protein: rigid receptor, prepared as described in Section 2.3
Ligand: rigid SMILES input; do not generate flexible conformers at this stage
Exhaustiveness: 8 for full library triage; increase to 16–32 for smaller batches if time allows; 200 only for calibration compounds
Docking box: 30 × 30 × 30 Å centered on binding site (adjust per target)
Batch processing: split large libraries into batches of 100k–250k for parallelization and fault tolerance
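Batch splitting can be sketched as a small generator. The default batch size of 100,000 sits at the low end of the range above, and `batches` is an illustrative name rather than part of any docking tool.

```python
def batches(items, size=100_000):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Toy example with a tiny "library" and batch size.
library = [f"cpd-{i}" for i in range(10)]
for n, batch in enumerate(batches(library, size=4)):
    print(n, len(batch))
```

Submitting each batch as its own job means a failed run only costs you that batch, not the whole library.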

4.2 Key Metrics & Thresholds

Metric	Threshold	Notes
Binding Affinity	< −8 kcal/mol (strict)	Hard cutoff. Compounds weaker than −8 kcal/mol are deprioritized for downstream runs
Binding Affinity (strong hits)	−10 to −15 kcal/mol	Compounds in this range should be forwarded preferentially to flexible/ensemble docking
Number of poses	≥ 9 poses generated	Inspect top 3 poses for pharmacophoric match to known binding residues
Exhaustiveness	8 (screening) → 200 (calibration)	Use low exhaustiveness for full library triage; increase to 200 only for calibration compounds

4.3 Filtering Logic

Remove all rows with missing or null affinity scores.
Sort by Best Affinity ascending (most negative = strongest binding).
Apply hard cutoff: Best Affinity < −8 kcal/mol.
From the passing compounds, take the top 2–5% by affinity for the next phase.
For targets where known active compounds cluster at −10 to −15 kcal/mol, set the threshold accordingly.
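The filtering steps above can be sketched as follows, assuming results arrive as dictionaries with a `best_affinity` key (a hypothetical column name). The top fraction is taken from the compounds that pass the hard cutoff, as the list describes, and at least one compound is kept whenever any pass.

```python
import math

def phase1_filter(rows, cutoff=-8.0, top_fraction=0.05):
    """Drop null scores, apply the hard cutoff, keep the top fraction."""
    scored = [
        r for r in rows
        if r.get("best_affinity") is not None
        and not math.isnan(r["best_affinity"])
    ]
    # Ascending sort: most negative (strongest predicted binding) first.
    scored.sort(key=lambda r: r["best_affinity"])
    passing = [r for r in scored if r["best_affinity"] < cutoff]
    if not passing:
        return []
    keep = max(1, int(len(passing) * top_fraction))
    return passing[:keep]

# Made-up results; top_fraction is inflated so the toy set returns something.
results = [
    {"smiles": "A", "best_affinity": -9.0},
    {"smiles": "B", "best_affinity": None},
    {"smiles": "C", "best_affinity": -7.5},
    {"smiles": "D", "best_affinity": -11.2},
    {"smiles": "E", "best_affinity": -8.4},
    {"smiles": "F", "best_affinity": -10.1},
]
print([r["smiles"] for r in phase1_filter(results, top_fraction=0.5)])
```

For targets whose known actives score between −10 and −15 kcal/mol, pass a stricter `cutoff` instead of editing the function.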

Distribution sanity check:

Compute mean, median, and standard deviation of Best Affinity across the full screened set.
If mean affinity is weaker than −8 kcal/mol, your library may not contain quality binders for this pocket, or the pocket definition needs adjustment.
Validate that your known calibration hits fall in the top 5–10% of the distribution.

Static docking will produce false positives. The purpose of this stage is speed-based enrichment only. All static docking hits must be validated through flexible or ensemble docking.

4.4 Chemical Space Check (Post Phase 1)

After extracting your top compounds, plot a UMAP or PCA of the filtered set using Morgan fingerprints (ECFP4, radius 2, 2048 bits). Verify:

The shortlisted compounds are not all clustered in one scaffold region (confirms chemical diversity in your carry-forward set).
Known active compounds, if available, fall within or near the dense regions of the top-scoring set.
Isolated outliers in chemical space are not artifacts of the library (check their raw docking scores).
You can also sample across the distribution of chemical space to carry a more diverse set into further screening. Everything selected here is ultimately triaged into the wet lab to establish primary SAR before lead optimization, so broader chemical diversity gives you more shots on target before you optimize within a constrained space.

5. Phase 2 — Flexible Docking with CNN Rescoring

Flexible docking allows specified protein side chains (residues in the binding site) to move during the docking calculation, and incorporates a Convolutional Neural Network (CNN) to re-score poses based on geometric and energetic realism. This dramatically improves accuracy over static docking at moderate compute cost.

5.1 Setup Parameters

Protein: same structure as Phase 1, but with designated flexible residues enabled
Flexible residues: select residues with known pharmacophoric roles (from co-crystal data, mutagenesis, or MD RMSF analysis). Typically 2–5 residues. Do not make the entire protein flexible.
CNN re-scoring: enabled; adds a geometry-aware neural network on top of classical docking scoring
Exhaustiveness: 8 (default for flexible screen); increase to 32 for highest-priority batches
Input: top-scoring compounds from Phase 1 static screen

5.2 Outputs Produced

Best Affinity (kcal/mol): classical empirical binding energy; lower = more favorable
Best Intramol (kcal/mol): intramolecular strain energy of the ligand in the docked pose; higher = more strained
Best CNN Pose Score (0–1): CNN-based assessment of pose geometry and physical realism; higher = more realistic
Best CNN Affinity: CNN-predicted binding affinity (pK metric); higher = stronger predicted binding

5.3 Multi-Dimensional Filtering Pipeline

The flexible docking filter is a sequential QC funnel, not a single cutoff. Apply in order:

Filter / Gate	Threshold	Purpose
Intramol Strain Veto	Best Intramol ≤ 0 kcal/mol	Removes physically strained (implausible) conformations
CNN Pose Quality Gate	Best CNN Pose Score ≥ 0.6	Validates geometric realism of the docked pose (0–1 scale)
Binding Energy Floor	Best Affinity ≤ −10 kcal/mol	Ensures minimum thermodynamic favorability for the pocket
Primary Ranking	Best CNN Affinity (desc.)	Final sort: highest CNN affinity compounds advance

Final output selection:

After all gates pass, sort by Best CNN Affinity descending.
Select top N compounds (typically 3,000–4,000 as input to ensemble docking).
Retain a backup pool (top 4,000) in case downstream ensemble docking yields insufficient hits.

CNN Pose Score ≥ 0.6 is a geometry threshold, not a strict binary. If your target class shows systematically lower pose scores (e.g., allosteric or shallow sites), adjust downward — but document the change.
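A sketch of the sequential gates from Section 5.3, recording how many compounds survive each one. The row keys (`best_intramol`, `best_cnn_pose`, `best_affinity`, `best_cnn_affinity`) are hypothetical column names, and the example rows are made up.

```python
# Gates in the order given in Section 5.3.
GATES = [
    ("intramol_veto", lambda r: r["best_intramol"] <= 0.0),
    ("cnn_pose_gate", lambda r: r["best_cnn_pose"] >= 0.6),
    ("affinity_floor", lambda r: r["best_affinity"] <= -10.0),
]

def phase2_select(rows, top_n=4000):
    """Apply each gate in sequence, log attrition, rank by CNN affinity."""
    attrition = [("input", len(rows))]
    pool = list(rows)
    for name, keep in GATES:
        pool = [r for r in pool if keep(r)]
        attrition.append((name, len(pool)))
    pool.sort(key=lambda r: r["best_cnn_affinity"], reverse=True)
    return pool[:top_n], attrition

rows = [
    {"id": "a", "best_intramol": -0.4, "best_cnn_pose": 0.82,
     "best_affinity": -11.3, "best_cnn_affinity": 7.1},
    {"id": "b", "best_intramol": 0.6, "best_cnn_pose": 0.90,
     "best_affinity": -12.0, "best_cnn_affinity": 8.0},
    {"id": "c", "best_intramol": -0.1, "best_cnn_pose": 0.41,
     "best_affinity": -11.0, "best_cnn_affinity": 7.5},
    {"id": "d", "best_intramol": -0.2, "best_cnn_pose": 0.66,
     "best_affinity": -9.2, "best_cnn_affinity": 6.9},
    {"id": "e", "best_intramol": -0.3, "best_cnn_pose": 0.90,
     "best_affinity": -10.5, "best_cnn_affinity": 7.8},
]
selected, attrition = phase2_select(rows)
print([r["id"] for r in selected], attrition)
```

Keeping the gates in a list makes a documented threshold change, such as a lower pose score for a shallow site, a one-line edit.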

5.4 Metric Definitions Reference

Best Affinity: Additive, empirical energy calculation. Reflects thermodynamic favorability of the interaction. Lower (more negative) is better.
Best Intramol: Minimum intramolecular energy of the ligand in its docked conformation. Values > 0 indicate steric clashes or physically impossible geometries. Acts as a hard veto.
Best CNN Pose Score: Binary-style geometric validation (0–1 scale) assessing whether the pose looks physically realistic based on thousands of known crystal structures. Values ≥ 0.6 indicate plausible poses.
Best CNN Affinity: Neural network-predicted binding affinity derived from pose geometry and energetics. This is the highest-quality readout from flexible docking and the primary ranking signal.

6. Phase 3 — Ensemble Docking

Ensemble docking accounts for protein conformational dynamics by docking compounds against multiple representative protein structures, each from a different point in a molecular dynamics trajectory. This captures the protein’s natural flexibility beyond individual side chains and substantially reduces false positives.

6.1 Generating the Protein Ensemble

Run an MD simulation of the apo or holo protein for at least 10 ns (100 ns preferred for full equilibration).
Confirm equilibration: RMSD should plateau; RMSF should show stable core with defined flexible loops; radius of gyration should level off.
Extract representative snapshots at regular intervals (e.g., every 2 ns for a 10 ns simulation = 5 conformers; every 10 ns for 100 ns = 10 conformers).
Optionally exclude outlier snapshots where known binding site geometry is disrupted.
Run docking against each conformer independently, then aggregate scores per compound.

6.2 Score Aggregation per Compound

For each unique compound (identified by SMILES), aggregate across all conformers:

Best CNN Affinity → take the maximum across all conformers
Best Affinity → take the minimum (most negative) across all conformers
Best Intramol → take the minimum across all conformers

This aggregation captures the best observed interaction of the compound with any accessible protein conformation. It is more informative than any single-conformer score.
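Per-compound aggregation can be sketched with a plain dictionary. The record keys are hypothetical names. Note one assumption: Best Affinity is reduced with `min`, on the reading that the most negative value is the most favorable (Section 5.2); swap in `max` if your engine reports affinity on a higher-is-better scale.

```python
from collections import defaultdict

def aggregate_by_smiles(records):
    """Collapse per-conformer rows into one best-case row per SMILES."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["smiles"]].append(rec)
    aggregated = []
    for smiles, recs in grouped.items():
        aggregated.append({
            "smiles": smiles,
            "cnn_affinity": max(r["cnn_affinity"] for r in recs),
            # Assumption: lower (more negative) energy is better.
            "best_affinity": min(r["best_affinity"] for r in recs),
            "best_intramol": min(r["best_intramol"] for r in recs),
        })
    return aggregated

# Made-up rows: one compound docked against two conformers, another against one.
records = [
    {"smiles": "CCO", "conformer": 1, "cnn_affinity": 6.2,
     "best_affinity": -9.1, "best_intramol": -0.3},
    {"smiles": "CCO", "conformer": 2, "cnn_affinity": 6.9,
     "best_affinity": -10.4, "best_intramol": -0.1},
    {"smiles": "CCN", "conformer": 1, "cnn_affinity": 5.5,
     "best_affinity": -8.0, "best_intramol": 0.2},
]
print(aggregate_by_smiles(records))
```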

6.3 Filtering Pipeline

Step	Operation	Rationale
1. Dedup / Aggregate	Per SMILES: CNN Affinity → max; Best Affinity → min (most negative); Intramol → min	Collapses multiple poses per compound to single representative scores
2. CNN Affinity Filter	≥ 80th percentile of set	Selects top-scoring compounds by neural network affinity prediction
3. Best Affinity Filter	≤ 20th percentile (within subset)	Confirms thermodynamic favorability via classical scoring within the CNN-filtered pool
4. Intramol Veto	Best Intramol < 0 kcal/mol	Hard exclusion of strained structures
5. Final Rank	CNN Affinity (desc.)	Sort and take top N

Percentile-based thresholds (80th/20th) are relative to the screened set. Re-compute percentiles after aggregation, not from pre-aggregation raw scores.
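A sketch of steps 2–5 on already-aggregated rows. The percentile here uses linear interpolation between sorted values, which is one common convention and an assumption on my part; the second percentile is computed inside the CNN-filtered subset, as the table specifies. Key names and data are illustrative.

```python
def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between sorted values."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def ensemble_filter(aggregated, top_n=500):
    """CNN affinity >= 80th pct, then affinity <= 20th pct, veto, rank."""
    if not aggregated:
        return []
    cnn_cut = percentile([c["cnn_affinity"] for c in aggregated], 80)
    pool = [c for c in aggregated if c["cnn_affinity"] >= cnn_cut]
    # Recomputed within the CNN-filtered subset, not the full set.
    aff_cut = percentile([c["best_affinity"] for c in pool], 20)
    pool = [c for c in pool if c["best_affinity"] <= aff_cut]
    pool = [c for c in pool if c["best_intramol"] < 0]
    pool.sort(key=lambda c: c["cnn_affinity"], reverse=True)
    return pool[:top_n]

# Made-up aggregated set of ten compounds.
aggregated = [
    {"smiles": f"S{i}", "cnn_affinity": float(i),
     "best_affinity": -2.0 - i, "best_intramol": -0.5}
    for i in range(1, 11)
]
print([c["smiles"] for c in ensemble_filter(aggregated)])
```

Because both cuts are relative, the number of survivors scales with the input pool; check the count against your target N before moving on.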

6.4 Why Ensemble Docking Outperforms Flexible Docking

Flexible docking moves only designated side chains. Ensemble docking samples backbone movements and global conformational states that flexible docking cannot reach.
CNN Affinity R² typically improves by 15–25% going from flexible to ensemble docking in a well-calibrated system.
Ensemble docking is significantly more compute-intensive. Reserve it for the pre-filtered pool (3k–5k compounds) from Phase 2, not the full library.

7. Alternative Pipeline — Boltz2 Co-folding

Boltz2 is a structure prediction model that co-folds a protein–ligand complex from sequence and SMILES, generating predicted binding poses and associated confidence metrics. It is the most computationally expensive per-compound method and is reserved for the top 500–1,000 candidates from ensemble docking.

7.1 Key Outputs from Boltz2

Readout	Good Range / Threshold	Interpretation / Notes
pIC50	> 5.0 (i.e. IC50 < 10 μM)	Predicted potency in log scale. Use as primary ranking when IC50 calibration R² > 0.5
Affinity Probability	> 0.5 (higher = better)	Model confidence in a binding event. Use as tiebreaker or when IC50 calibration is weak
Predicted TM Score	≥ 0.5	Structural reliability of the co-folded complex. Gate: reject compounds below threshold regardless of affinity
iPTM (interface pTM)	> 0.6 preferred	Interface quality score. High iPTM with low TM = good binding pose but uncertain overall fold; still useful
Confidence Score	Use only as supporting signal	Low predictive correlation with experimental IC50 on its own; context-dependent

7.2 Ranking Strategy — Choosing the Right Readout

Use the calibration R² from Section 3 to select your primary readout:

If IC50 calibration R² ≥ 0.5 for pIC50: rank by pIC50 descending; gate by TM score ≥ 0.5 and iPTM > 0.6
If IC50 calibration R² < 0.5: use affinity_probability as tiebreaker, with TM score and iPTM as mandatory gates
In all cases: compute predicted_ln(ic50)_nM = log₁₀(predicted_ic50_nM) (a base-10 logarithm, despite the "ln" in the field name) and include it in the export for downstream reference
Always apply confidence gates before using any potency ranking — a high pIC50 with a low TM score is not a trustworthy prediction

7.3 Gated Confidence Filtering

Gate 1 (Hard): TM Score ≥ 0.5. Predictions below this threshold indicate the model failed to produce a reliable fold and should be excluded.
Gate 2 (Soft): iPTM > 0.6. Interface quality. Values below this suggest the binding interface is not well-modeled; flag but do not necessarily exclude if other metrics are strong.
Gate 3 (Context-dependent): Affinity Probability > 0.5 as a supporting filter when potency data is unavailable.
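The gates and the readout choice from Section 7.2 can be combined in a short sketch. The dictionary keys (`tm_score`, `iptm`, `pic50`, `affinity_probability`) are placeholder names, not the literal field names of any Boltz2 output file, and the predictions are made up. The helper `pic50_from_ic50_nm` uses the identity pIC50 = 9 − log₁₀(IC50 in nM).

```python
import math

def pic50_from_ic50_nm(ic50_nm):
    """pIC50 = -log10(IC50 in M) = 9 - log10(IC50 in nM)."""
    return 9.0 - math.log10(ic50_nm)

def gate_predictions(preds, min_tm=0.5, min_iptm=0.6):
    """Gate 1 excludes low-TM predictions; Gate 2 only flags low iPTM."""
    kept = []
    for p in preds:
        if p["tm_score"] < min_tm:
            continue  # hard gate: unreliable fold
        kept.append({**p, "iptm_flag": p["iptm"] <= min_iptm})
    return kept

def rank_predictions(preds, calibration_r2):
    """Rank gated predictions by the readout the calibration supports."""
    key = "pic50" if calibration_r2 >= 0.5 else "affinity_probability"
    return sorted(gate_predictions(preds), key=lambda p: p[key], reverse=True)

preds = [
    {"id": "a", "tm_score": 0.8, "iptm": 0.7, "pic50": 6.5,
     "affinity_probability": 0.6},
    {"id": "b", "tm_score": 0.4, "iptm": 0.9, "pic50": 8.0,
     "affinity_probability": 0.9},
    {"id": "c", "tm_score": 0.6, "iptm": 0.5, "pic50": 7.2,
     "affinity_probability": 0.4},
]
print([p["id"] for p in rank_predictions(preds, calibration_r2=0.6)])
```

Compound `b` has the highest predicted potency but is dropped by the TM gate, which is exactly the case the hard gate exists for.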

Boltz2 calibrates better for some target classes than others. Co-folding performance is weakest for large allosteric sites, covalent binders, and metal-coordinated ligands. Use with appropriate skepticism and weight against docking data.

7.4 Hybrid Boltz2 + Ensemble Approach (Exploratory)

As an exploratory check, you can merge Boltz2 outputs with ensemble docking scores:

Apply the same ensemble pose filters (Section 6.3) to the merged set.
Rank by predicted_pic50 descending.
Use this as a comparison list, not as the primary deliverable.
If Boltz2 IC50 calibration is valid, the Boltz2-primary ranking supersedes the hybrid for final compound selection.

8. Chemical Space Analysis

Chemical space visualization is performed at two key points in the campaign: (1) after Phase 1 to verify diversity of the carry-forward pool, and (2) before final compound selection to ensure the final set covers the activity landscape. Skip this step and you risk selecting 50 near-identical compounds built on a single scaffold, which defeats the main purpose of a primary screen: generating SAR.

8.1 Fingerprinting

All chemical space analysis uses Morgan fingerprints (ECFP) as the molecular representation:

Morgan ECFP4: radius 2, 2048-bit vector. Standard default.
Morgan ECFP6: radius 3 for finer resolution of large, complex libraries.
Generate fingerprints for the full screened set and the shortlisted subset simultaneously for direct comparison.

8.2 Dimensionality Reduction Methods

Method	What It Captures	When to Use	Interpretation Tips
PCA	Global structural diversity (variance-maximizing axes)	First pass: gauge overall chemical space breadth	Tight clusters = structurally similar compounds; spread = diverse library
tSNE	Local neighborhoods and cluster relationships	When you need to identify sub-families or scaffold clusters	Not comparable across runs; don’t read into global distances
UMAP	Both global and local structure simultaneously	Default for visualizing screened sets; best middle ground	Clusters with high CNN affinity/color encoding = activity hotspots to prioritize

Recommended workflow:

Run UMAP as your default visualization. It balances global structure and local clusters.
Overlay activity scores (CNN affinity, binding energy) as a color dimension to identify activity hotspots.
Run PCA as a secondary check to validate that diversity metrics are not UMAP artifact-driven.
Run tSNE only when you need to investigate specific scaffold families or local cluster composition.

8.3 What to Look For

Good diversity indicators:

Top-scoring compounds (color-coded) are spread across multiple UMAP regions, not concentrated in one cluster.
High-affinity regions have some overlap with known active scaffolds but also extend into novel chemical space.
The shortlisted set (e.g., top 3k from flexible docking) covers most of the dense regions of the full filtered pool.

Red flags:

70% or more of top compounds fall within one tight UMAP cluster: this indicates scaffold enrichment, not diverse chemistry.
Known active scaffolds are completely absent from the top-scoring set — may indicate a pocket definition problem.
The screened set collapses into a single dense region after filtering — likely because a narrow binding energy range was used; consider relaxing the threshold.

8.4 Centralized Clusters Correlated with High Activity

It is normal and expected for some chemical clusters to correspond to high CNN affinity or binding activity. This does not disqualify them. The goal is not to eliminate clusters, but to ensure that your final compound list is not exclusively drawn from one cluster while missing other potentially active regions. A compound list of 50 from one scaffold provides weak SAR; 50 compounds spanning 5–10 distinct scaffolds provides a robust SAR foundation.

9. Final Diversity-Driven Compound Selection

After all computational filters have been applied, you will typically have 100–500 compounds that pass potency and quality thresholds. Final selection to the experimental batch size (typically 50–100) requires balancing hit quality with chemical diversity.

9.1 Seed Set Definition

Start by anchoring the selection to confirmed high-quality compounds:

Extract the top 10 compounds by highest CNN affinity (or pIC50 if Boltz2 was primary).
Extract the top 10 compounds by lowest binding energy (most negative Best Affinity).
Deduplicate to produce a seed set. This ensures the final list retains the best-scoring compounds regardless of diversity algorithm outcome.

9.2 MaxMin Selection

MaxMin (Maximum Minimum Distance) is a standard diversity selection algorithm for compound libraries. It maximizes structural spread across the chemical space.

Compute Morgan ECFP4 fingerprints for all candidate compounds.
Compute pairwise Tanimoto distance = 1 − Tanimoto similarity for all compound pairs.
Initialize with the seed set (best-scoring compounds from 9.1).
Iteratively add the compound that is farthest from all currently selected compounds (maximizes minimum pairwise distance).
Continue until target selection size is reached.

Why MaxMin works here: By seeding with the top-affinity compounds, you guarantee the best hits are included. MaxMin then fills the remainder with maximally diverse chemistry, ensuring coverage of activity hotspots across the chemical space rather than oversampling one scaffold family.
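A dependency-free sketch of seeded MaxMin, assuming each fingerprint is represented as a Python set of on-bit indices (you would derive these from your ECFP4 vectors). It is a straightforward greedy implementation for illustration, not the optimized picker found in cheminformatics toolkits, and it requires a non-empty seed set.

```python
def tanimoto_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of on-bit indices."""
    union = len(a | b)
    similarity = len(a & b) / union if union else 1.0
    return 1.0 - similarity

def maxmin_select(fps, seed_idx, n_select):
    """Grow the seed set by repeatedly adding the farthest candidate."""
    selected = list(seed_idx)
    chosen = set(selected)
    # Each candidate's distance to its nearest already-selected compound.
    min_dist = {
        i: min(tanimoto_distance(fps[i], fps[s]) for s in selected)
        for i in range(len(fps)) if i not in chosen
    }
    while len(selected) < n_select and min_dist:
        nxt = max(min_dist, key=min_dist.get)  # maximizes the minimum distance
        selected.append(nxt)
        del min_dist[nxt]
        for i in min_dist:
            d = tanimoto_distance(fps[i], fps[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected

# Toy fingerprints: index 0 is the seed (best-scoring compound).
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}, {3, 7, 11}]
print(maxmin_select(fps, seed_idx=[0], n_select=3))
```

Caching each candidate's nearest-selected distance avoids recomputing all pairwise distances on every iteration, which matters once the candidate pool reaches a few hundred compounds.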

9.3 Verification of Final Set

Before finalizing, verify the selected set using UMAP and PCA:

Plot seed compounds (blue), MaxMin-selected compounds (red), and the full candidate pool (grey).
Confirm seed and MaxMin compounds are distributed across the major UMAP clusters.
Verify no single cluster accounts for more than 30–40% of the final selected set (unless target biology demands scaffold specificity).
If diversity is inadequate, relax the seed set size or broaden the input candidate pool.
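The cluster-share check can be automated once each selected compound has a cluster label. This sketch assumes labels come from whatever clustering you ran on the UMAP embedding or fingerprints; the labels and the 0.4 limit (the upper end of the 30–40% guideline) are illustrative.

```python
from collections import Counter

def max_cluster_share(cluster_labels):
    """Return the most populated cluster and its fraction of the set."""
    counts = Counter(cluster_labels)
    label, n = counts.most_common(1)[0]
    return label, n / len(cluster_labels)

# Made-up labels for a 10-compound final set.
labels = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
label, share = max_cluster_share(labels)
if share > 0.4:
    print(f"cluster {label} holds {share:.0%} of the final set; rebalance")
```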

Chemical diversity in the experimental set is not just aesthetically preferable — it directly determines the SAR value of the data returned from the wet lab. Redundant scaffolds give you redundant data.

10. Post-Selection: MD Simulation Validation

The top 50–100 compounds from the final selection should undergo individual molecular dynamics simulation before ordering. This step provides the highest-confidence computational validation and filters the list once more before incurring synthesis or procurement costs.

10.1 Short MD Run (0.1 ns) — Screening Mode

Run all final candidates at 0.1 ns in complex with the target.
Compute MMPBSA/MMGBSA binding free energy as an approximation.
Note: 0.1 ns simulations are pre-equilibrium. Energies will be overestimates but are useful for relative ranking and early flag-raising.
Flag any compound showing immediate ligand displacement or catastrophic pose degradation.

10.2 Full MD Run (1–10 ns) — Final Validation

Take the top 20–30 compounds from the 0.1 ns screen and run for 1–10 ns.
Monitor RMSD of the ligand in the binding site: stable RMSD < 2–3 Å indicates retained binding.
Monitor RMSF of key binding residues: unexpected rigidification or large fluctuations signal poor complementarity.
SASA (Solvent Accessible Surface Area): ligand binding should reduce solvent exposure in the pocket.
Compute final MMPBSA/MMGBSA energies. Reference: compounds with −15 to −20 kcal/mol binding free energies are strong candidates.

10.3 Binding Free Energy Benchmarks

Potency Category	IC50 Range	MMPBSA Range
Tight binders	< 100 nM	−15 to −20 kcal/mol
Moderate binders	1–10 μM	−8 to −15 kcal/mol
Weak / threshold binders	> 10 μM	> −8 kcal/mol (weaker than −8)

Proceed with weak/threshold binders only with strong computational evidence from other engines.

11. Complete Campaign Checklist

Pre-Screening

Library sourced and deduplicated by canonical SMILES

ADMET pre-filters applied (if applicable)

Protein structure prepared, cleaned, metal cofactors retained

Docking box defined and validated against known ligand binding site

MD simulation run; equilibration confirmed

Calibration

Calibration compounds with experimental IC50/EC50 values assembled

All docking modalities run on calibration set

R² table computed; primary readout for each phase selected

Engine selection documented

Phase 1 — Static Docking

Full library screened; batched as needed

Best Affinity < −8 kcal/mol filter applied

Top 2–5% of library carried forward

UMAP diversity check of carry-forward pool completed

Phase 2 — Flexible Docking

Flexible residues designated based on SAR/MD analysis

CNN rescoring enabled

Intramol ≤ 0 gate applied

CNN Pose Score ≥ 0.6 gate applied

Binding Energy ≤ −10 kcal/mol gate applied

Final sort by CNN Affinity; top 3–4k selected

Phase 3 — Ensemble Docking

MD ensemble generated; snapshots extracted

Docking run against each conformer

Scores aggregated per SMILES

80th/20th percentile filters applied

Top 500 compounds selected

Phase 4 — Boltz2 (Optional)

Co-folding run on top 500–1,000

TM Score ≥ 0.5 gate applied

iPTM > 0.6 gate applied

Primary readout selected based on calibration (pIC50 or affinity_probability)

Top 50–100 candidates identified

Final Selection

Seed set defined (top 10 CNN affinity + top 10 binding energy)

Morgan ECFP4 fingerprints computed

Tanimoto distances computed

MaxMin selection run to target N (50–100)

UMAP/PCA of final set verifies diversity

0.1 ns MD validation run on all final candidates

Full MD validation (1–10 ns) on top 20–30 candidates

Final compound list prepared for procurement or synthesis

Your final experimental compound set should represent diverse scaffolds, pass physical plausibility filters, show strong predicted binding energies across multiple engines, and include both the best-scoring hits and structurally distinct representatives from each major activity cluster.

12. Glossary

Term	Definition
Best Affinity (kcal/mol)	Classical empirical binding energy from docking. Lower (more negative) = stronger predicted binding.
Best Intramol (kcal/mol)	Intramolecular strain energy of the ligand in its docked conformation. Values > 0 indicate physically implausible geometry (hard veto).
CNN Affinity	Convolutional neural network-predicted binding affinity. Trained on crystal structures; captures geometric and energetic features beyond classical scoring.
CNN Pose Score (0–1)	Binary-style geometric validation of the docked pose. ≥ 0.6 = physically plausible geometry.
Exhaustiveness	Docking algorithm search parameter. Higher = more thorough search but slower. Use 8 for triage; 200 for calibration.
pIC50	Negative log₁₀ of IC50 in molar units. Higher = more potent. pIC50 = 9 corresponds to IC50 = 1 nM.
TM Score	Template Modeling score from Boltz2. Measures structural similarity of predicted complex to a plausible reference. ≥ 0.5 indicates reliable fold prediction.
iPTM	Interface predicted TM score. Measures quality of the binding interface specifically. > 0.6 preferred.
Affinity Probability	Boltz2 model confidence in a binding event occurring. Supplements pIC50 as a tiebreaker.
MaxMin Selection	Diversity selection algorithm. Iteratively adds compounds maximally distant from the current set by Tanimoto distance.
Tanimoto Similarity	Fingerprint-based pairwise molecular similarity metric (0–1). Tanimoto Distance = 1 − similarity.
Morgan / ECFP	Extended Connectivity FingerPrint. Circular fingerprint encoding chemical neighborhoods around each atom. ECFP4 (radius 2) is standard.
UMAP	Uniform Manifold Approximation and Projection. Dimensionality reduction for chemical space visualization capturing both global and local structure.
MMPBSA/MMGBSA	Molecular Mechanics Poisson–Boltzmann / Generalized Born Surface Area. Free energy calculation methods for estimating binding affinity from MD trajectories.
R² (Calibration)	Coefficient of determination between predicted docking scores and experimental IC50 values. Primary metric for engine selection.
SMILES	Simplified Molecular Input Line Entry System. String representation of molecular structure. Used as canonical identifier for deduplication.
