Production VS with 2M+ Compounds

Overview

This tutorial covers production-scale high-throughput virtual screening (HTS) in RevBind’s RevDock/RevScreen engine. Unlike small-batch static, flexible, or ensemble docking runs, this workflow is designed for screening multi-million compound libraries efficiently using GPU-enabled parallel batch execution.

High Throughput Screening with RevDock and RevScreen — Watch Video

When to Use This Workflow

This workflow is for production high-throughput screening, as opposed to running static, flexible, or ensemble docking on a small compound set. Use it when:

You have a large compound library (hundreds of thousands to millions of compounds)
You want to maximize GPU throughput by running multiple pipelines in parallel
You are executing a primary screen before downstream hit filtering and validation

For smaller sets or higher-accuracy modes, see RevScreen - Static & Flexible Docking and RevScreen - Ensemble Docking.

Key Parameters

Parameter	Recommended Value	Notes
Batch size	~150,000 compounds	Optimized for GPU-enabled throughput
Exhaustiveness	8	Do not increase — compute scales disproportionately
Parallelization	One pipeline per batch	Run all batches simultaneously

Step-by-Step Walkthrough

Navigate to RevDock and RevScreen

From the Revilico OS dashboard, open RevBind, then navigate to RevDock → RevScreen. This is the production HTS interface, distinct from the static, flexible, and ensemble docking modes.

Select Your Target Protein Structure

Load the protein structure you want to screen against. Confirm the correct structure is selected before proceeding — for example, the apo (no inhibitor) form of your target receptor.

If you haven’t yet characterized the binding site, run a RevPocket analysis first to identify the druggable pocket and obtain the coordinates you’ll need for box placement.

Define the Binding Site and Docking Box

Review the target site and define the screening box:

Determine where the box should be placed based on prior structural analysis
Identify the key amino acids in the target binding site
Calculate the grid center for the docking region
Adjust box position until it is centered correctly on the pocket

For production runs, box placement is typically based on prior RevPocket output or guidance from your computational team.
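The grid-center calculation above can be sketched in a few lines of Python. This is a minimal illustration, not RevScreen's own box-placement logic: the function name `box_from_residue_atoms`, the 4 Å padding, and the sample coordinates are all assumptions made for the example. It takes the atom coordinates of the key binding-site residues and returns the center and edge lengths of an axis-aligned box that encloses them.

```python
def box_from_residue_atoms(coords, padding=4.0):
    """Return (center, size) of an axis-aligned box enclosing the given atoms.

    coords:  iterable of (x, y, z) tuples in angstroms.
    padding: margin in angstroms added on every side (an assumed default).
    """
    xs, ys, zs = zip(*coords)
    lows = (min(xs), min(ys), min(zs))
    highs = (max(xs), max(ys), max(zs))
    # The center is the midpoint of the bounding box along each axis.
    center = tuple((lo + hi) / 2 for lo, hi in zip(lows, highs))
    # Each edge spans the atoms plus the padding on both sides.
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(lows, highs))
    return center, size


# Hypothetical coordinates for atoms of the key pocket residues.
pocket_atoms = [(10.0, 4.0, -2.0), (14.0, 8.0, 2.0), (12.0, 6.0, 0.0)]
center, size = box_from_residue_atoms(pocket_atoms)
print(center)  # (12.0, 6.0, 0.0)
print(size)    # (12.0, 12.0, 12.0)
```

Using the bounding-box midpoint rather than the mean of all atoms keeps the box centered on the pocket even when one residue contributes many more atoms than the others.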

Set Exhaustiveness to 8

Set Exhaustiveness = 8 for high-throughput mode. Do not increase this value — compute cost scales disproportionately at higher settings. The goal is efficient throughput with usable screening results, not maximum conformational sampling.

Raising exhaustiveness above 8 for a 2M+ compound screen will cause a significant and non-linear increase in compute time and cost. Reserve higher exhaustiveness values for smaller, late-stage confirmation runs.

Prepare Your Compound Batches

For large libraries, you will receive compounds pre-split into batches of approximately 150,000 compounds each. For a 2 million compound library, this yields about 14 batches (2,000,000 / 150,000 ≈ 13.3, rounded up). If your library is not already batched, you can:

Ask the Revilico team to split it for you at no cost
Use Claude Code or RevAgent to batch it yourself

Each batch should be a separate file (e.g., Batch_1.sdf, Batch_2.sdf, etc.).
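If you batch the library yourself, the split can be sketched as follows. This is a minimal example under stated assumptions: the library is a single SDF file in which every record ends with a line containing only `$$$$` (the standard SDF record delimiter), and the helper names `count_batches`, `split_sdf`, and `write_batches` are hypothetical, not part of any Revilico tool.

```python
import math
from pathlib import Path


def count_batches(n_compounds, batch_size=150_000):
    """Number of batch files needed, counting a final partial batch."""
    return math.ceil(n_compounds / batch_size)


def split_sdf(lines, batch_size):
    """Yield lists of lines, each holding at most batch_size SDF records."""
    batch, n_records = [], 0
    for line in lines:
        batch.append(line)
        if line.strip() == "$$$$":  # end of one SDF record
            n_records += 1
            if n_records == batch_size:
                yield batch
                batch, n_records = [], 0
    # Emit the final partial batch, ignoring trailing blank lines.
    if any(line.strip() for line in batch):
        yield batch


def write_batches(sdf_path, out_dir, batch_size=150_000):
    """Write Batch_1.sdf, Batch_2.sdf, ... into out_dir and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with open(sdf_path) as handle:
        for i, batch in enumerate(split_sdf(handle, batch_size), start=1):
            path = out_dir / f"Batch_{i}.sdf"
            path.write_text("".join(batch))
            paths.append(path)
    return paths


print(count_batches(2_000_000))  # 14
```

The generator holds one batch in memory at a time, so peak memory is bounded by the batch size rather than by the size of the full library.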

Different compound libraries (e.g., Enamine REAL, custom sets) can be screened separately using the same workflow. Each library gets its own set of batch pipelines.

Configure and Launch Batch 1

Select Batch 1 as your compound library. Confirm:

The correct compound library batch is loaded
The correct protein structure is selected
The box and grid settings match those defined in the Define the Binding Site and Docking Box step

Click Run Pipeline. Once the pipeline is created successfully, Batch 1 begins processing.

Parallelize All Remaining Batches

This is the key step that makes production HTS efficient. Rather than waiting for each batch to complete sequentially, run all batches in parallel:

Clear Batch 1 from the configuration
Load Batch 2, confirm settings, click Run Pipeline
Repeat immediately for Batch 3, Batch 4, and so on

Each batch gets its own pipeline. All pipelines run simultaneously across the GPU cluster, compressing total wall-clock time dramatically compared to a single sequential job.

The box placement, protein structure, and exhaustiveness settings remain identical across all batches. The only thing that changes is the compound library file.
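One way to keep the shared settings consistent is to derive every batch configuration from a single template. The sketch below is illustrative only: `ScreenConfig` and its fields are hypothetical names, not the RevScreen configuration schema, and the protein file name and box values are placeholders. It shows the invariant this step relies on, namely that only the compound library file differs between pipelines.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScreenConfig:
    # Hypothetical fields for illustration; not the RevScreen schema.
    protein: str
    box_center: tuple
    box_size: tuple
    exhaustiveness: int
    library_file: str


def configs_for_batches(template, n_batches):
    """Copy the template once per batch, changing only the library file."""
    return [
        replace(template, library_file=f"Batch_{i}.sdf")
        for i in range(1, n_batches + 1)
    ]


template = ScreenConfig(
    protein="target_apo.pdb",      # placeholder file name
    box_center=(12.0, 6.0, 0.0),   # placeholder box values
    box_size=(20.0, 20.0, 20.0),
    exhaustiveness=8,
    library_file="Batch_1.sdf",
)
configs = configs_for_batches(template, 14)

# Sanity check: with the library file blanked out, all configs are identical.
shared = {replace(c, library_file="") for c in configs}
assert len(shared) == 1

print(len(configs), configs[-1].library_file)  # 14 Batch_14.sdf
```

A check like the one on `shared` is cheap insurance before launching many pipelines, because a box or exhaustiveness mismatch in a single batch makes its scores incomparable with the rest of the screen.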

Download Results and Run Downstream Analytics

Once all pipelines complete, your screening data becomes available for download. Export the results and run filtration and analytics using your preferred AI agent or analysis workflow. A follow-up tutorial will cover the full post-screen analytics step in detail.
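As a starting point for that filtration, the per-batch results can be merged and ranked in a single pass. The sketch below rests on assumptions that should be checked against your actual export: each batch result is a CSV file with `compound_id` and `score` columns, and a lower (more negative) score indicates a better predicted binder. The function name `top_hits` is hypothetical.

```python
import csv
import heapq
import io


def top_hits(result_files, n):
    """Return the n best (score, compound_id) pairs across all batch files.

    result_files: iterable of open text handles, one per batch.
    Assumes lower scores are better.
    """
    def rows():
        for handle in result_files:
            for row in csv.DictReader(handle):
                yield float(row["score"]), row["compound_id"]

    # nsmallest streams the rows, so the full result set is never sorted.
    return heapq.nsmallest(n, rows())


# Tiny in-memory stand-ins for two exported batch result files.
batch_1 = io.StringIO("compound_id,score\nA,-7.2\nB,-9.1\n")
batch_2 = io.StringIO("compound_id,score\nC,-8.4\nD,-6.0\n")
print(top_hits([batch_1, batch_2], 2))  # [(-9.1, 'B'), (-8.4, 'C')]
```

Ranking across all batches at once matters because the best compounds are not evenly distributed; taking the top N from each batch separately would keep weak hits from poor batches and drop strong ones from rich batches.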

Parallelization Strategy

The core performance principle of this workflow is parallelization across batches:

A single 2M compound job run sequentially is slow and resource-inefficient
Splitting into ~150K batches and running all pipelines simultaneously compresses wall-clock time significantly
GPU utilization stays high across the cluster rather than being bottlenecked by a single large job

This is why batch size (~150K) is tuned to GPU throughput capacity, and exhaustiveness is kept at 8 — the combination maximizes screening velocity without sacrificing result quality.
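The effect on wall-clock time can be estimated with simple arithmetic. The sketch below is a rough model, not a benchmark: the throughput of 50,000 compounds per pipeline-hour is a made-up figure, and the model assumes that per-pipeline throughput does not drop when pipelines run concurrently. It also treats the final partial batch as a full one, so the result is an upper bound.

```python
import math


def wall_clock_hours(n_compounds, batch_size, rate_per_hour, concurrent_pipelines):
    """Estimate wall-clock hours when batches run in waves of concurrent pipelines."""
    n_batches = math.ceil(n_compounds / batch_size)
    hours_per_batch = batch_size / rate_per_hour
    # Batches that cannot start immediately wait for the next wave.
    waves = math.ceil(n_batches / concurrent_pipelines)
    return waves * hours_per_batch


rate = 50_000  # hypothetical compounds per pipeline-hour
sequential = wall_clock_hours(2_000_000, 150_000, rate, concurrent_pipelines=1)
parallel = wall_clock_hours(2_000_000, 150_000, rate, concurrent_pipelines=14)
print(sequential, parallel)  # 42.0 3.0
```

Under these assumptions the total compute is unchanged, but the elapsed time falls from the sum of all batch times to the time of a single batch, provided the cluster can host every pipeline at once.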

Next Steps

After your HTS run completes, the standard hit progression workflow is:

RevAnalytics — Filter by docking score, select top-ranked compounds, analyze interaction fingerprints
Flexible Docking — Re-dock your top hits with flexible residues and CNN rescoring for higher-confidence rankings
Ensemble Docking — Run your highest-priority compounds against MD-sampled protein conformations for maximum accuracy
RevFEP — Compute rigorous binding free energies for your final shortlist
