Optimizing Molecular Properties

The Problem You Are Trying to Solve
“I have a moderately active molecule, and I want to optimize it for higher potency, better selectivity, and improved developability.”

At this stage of hit-to-lead, the goal is no longer just to find binders, but to systematically improve a known chemical series. This is challenging because gains in one dimension (potency) often come at the expense of others (selectivity, solubility, safety, or synthetic feasibility). Traditional optimization relies on slow, iterative medicinal chemistry cycles supported by assays and structural reasoning, which can be costly and time-consuming. With modern AI-assisted workflows, we can propose and evaluate many optimization hypotheses in parallel, but only if generation, scoring, and validation are tightly integrated and interpretable. This enables multi-parameter optimization of every designed molecule before it is synthesized and sent to the lab.

Solution
This workflow enables users to iteratively optimize a moderately active molecule using Revilico’s integrated generative, analytical, and structure-based engines, with Molecular Optimization as the core driver. At a high level, the workflow:

Generates improved analogs of the starting molecule
Scores and prioritizes them using fast, complementary signals
Validates top candidates with higher-fidelity methods
Repeats the loop as needed until clear lead candidates emerge

The emphasis is on a controlled optimization loop, rather than one-off generation.

What Data Do I Need to Provide?
Required

Starting molecule(s) as SMILES (moderately active compounds)
Target protein structure (experimental or predicted)
Clear optimization intent (e.g. “increase potency without increasing lipophilicity”)

Optional (highly recommended)

Known liabilities or constraints (avoid motifs, MW limits, polarity ranges)
Selectivity context (off-targets or related proteins)
Property priorities (potency vs PK vs safety tradeoffs)
Historical activity or ADMET data (for QSAR guidance)
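
As an illustration only, the required and optional inputs listed above could be collected into a single request object before a run. The class name, field names, and file name below are hypothetical and are not part of any Revilico API; the sketch simply checks that the three required inputs are present.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OptimizationRequest:
    """Hypothetical container for the workflow inputs (not a Revilico API)."""
    smiles: List[str]                 # starting molecule(s) as SMILES
    target_structure: str             # path or ID of the protein structure
    intent: str                       # free-text optimization intent
    max_mol_weight: Optional[float] = None                 # optional MW limit (Da)
    avoid_motifs: List[str] = field(default_factory=list)  # motifs to penalize
    off_targets: List[str] = field(default_factory=list)   # selectivity context

    def missing_required(self) -> List[str]:
        """Return the names of required inputs that are empty."""
        missing = []
        if not self.smiles:
            missing.append("smiles")
        if not self.target_structure:
            missing.append("target_structure")
        if not self.intent.strip():
            missing.append("intent")
        return missing


request = OptimizationRequest(
    smiles=["CCO"],                   # placeholder molecule
    target_structure="target.pdb",    # placeholder file name
    intent="increase potency without increasing lipophilicity",
    max_mol_weight=500.0,
)
```

Calling `request.missing_required()` returns an empty list when all required inputs are supplied.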

Workflow

Establish an Optimization Baseline

Before generating new molecules, establish what “better” means. On the Revilico Operating System, users typically:

Review Static Docking, Flexible Docking, or Ensemble Docking results (and/or experimental data) for the starting molecule to understand binding modes and pose stability
Use Pharmacophore Analysis and QSAR Modeling to identify key interactions, liabilities, and regions of the molecule that drive activity or risk
Decide whether optimization should be conservative (close analogs via Molecular Optimization with tight similarity constraints) or more exploratory (relaxed similarity, scaffold or substituent changes)

This baseline informs how aggressively the Molecular Optimization Engine should push the chemistry and which constraints or objectives should guide the optimization loop. Typically, the docking and activity engines establish baseline structure-activity relationships (SAR) that can be leveraged in downstream rounds of compound generation.
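
The choice between conservative and exploratory optimization can be pictured as a similarity threshold against the starting molecule. The sketch below uses the Tanimoto coefficient on fingerprints represented as sets of bit indices. The fingerprints and thresholds are made-up values for illustration, and how the Molecular Optimization Engine actually measures similarity is not specified here.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def within_similarity_window(parent_fp: set, analog_fp: set,
                             min_similarity: float) -> bool:
    """True if the analog stays at least `min_similarity` close to the parent."""
    return tanimoto(parent_fp, analog_fp) >= min_similarity


# Illustrative fingerprints (assumed values, not derived from real molecules).
parent = {1, 4, 7, 9, 12}
close_analog = {1, 4, 7, 9, 15}     # shares 4 of 6 distinct bits
scaffold_hop = {1, 4, 20, 21, 22}   # shares 2 of 8 distinct bits

CONSERVATIVE = 0.6   # tight constraint: close analogs only
EXPLORATORY = 0.2    # relaxed constraint: allows larger changes
```

Under the conservative threshold only `close_analog` passes; under the exploratory threshold both analogs do.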

Generate Optimized Analogs

This is the core step of the workflow. Using Revilico’s Molecular Optimization Engine, users input their moderately active molecule(s) and define optimization goals. The engine then generates new “siblings” of the starting compound, iteratively improving them under a multi-objective scoring framework. This can be done through a “global iteration”, which allows the engine to modify any portion of the compound, or through a more conservative scaffold decoration, which preserves the core motifs driving activity while varying the side chains or R-groups around them. Typical optimization goals include:

Improving predicted binding or docking performance
Staying within desirable physicochemical ranges
Penalizing known liabilities or unstable motifs
Maintaining similarity to the active series (or relaxing it, if needed)
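
One common way to combine goals like these is a weighted average of per-objective desirability values, each scaled to the range 0 to 1. The sketch below is a generic illustration under that assumption. The property names, thresholds, and weights are invented for the example, and the engine's actual scoring framework may differ.

```python
def clamp01(x: float) -> float:
    """Clamp a value to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def lower_is_better(value: float, good: float, bad: float) -> float:
    """1.0 at or below `good`, 0.0 at or above `bad`, linear in between."""
    return clamp01((bad - value) / (bad - good))


def in_range(value: float, low: float, high: float, tolerance: float) -> float:
    """1.0 inside [low, high], decaying linearly to 0.0 over `tolerance`."""
    if low <= value <= high:
        return 1.0
    distance = low - value if value < low else value - high
    return clamp01(1.0 - distance / tolerance)


def composite_score(mol: dict, weights: dict):
    """Weighted average of desirabilities; returns (score, per-objective breakdown)."""
    components = {
        # Docking scores: more negative is treated as better (assumed scale).
        "docking": lower_is_better(mol["docking_score"], good=-10.0, bad=-6.0),
        # Physicochemical window (assumed logP range).
        "logp": in_range(mol["logp"], low=1.0, high=3.0, tolerance=2.0),
        # Known liability flagged upstream: hard penalty.
        "liability": 0.0 if mol["has_liability"] else 1.0,
        # Similarity to the active series, already on a 0-1 scale.
        "similarity": clamp01(mol["similarity"]),
    }
    total_weight = sum(weights[name] for name in components)
    score = sum(weights[name] * components[name] for name in components) / total_weight
    return score, components


weights = {"docking": 2.0, "logp": 1.0, "liability": 1.0, "similarity": 1.0}
candidate = {"docking_score": -8.0, "logp": 2.0, "has_liability": False, "similarity": 0.7}
score, breakdown = composite_score(candidate, weights)
```

Returning the per-objective breakdown alongside the total keeps the score interpretable: it shows which objective raised or lowered a molecule's ranking.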

The engine works by repeatedly:

Generating candidate molecules
Scoring them against defined objective functions (the engines that predict each property)
Updating the generator model through reinforcement learning to favor chemistry that the scoring functions predict to be better

This balances exploitation (refining what works) with exploration (avoiding local SAR traps). The output is a ranked set of optimized molecular hypotheses, with clear scoring summaries explaining why each molecule was favored.
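
The generate, score, and update cycle described above can be sketched with a deliberately tiny stand-in: a generator that picks one substituent from a fixed list, and a REINFORCE-style policy-gradient update on its preferences. This is a toy under stated assumptions, not the engine's actual model. The substituent list and scores are invented, and a real generator operates on whole molecules.

```python
import math
import random


def softmax(logits):
    """Convert preference logits into sampling probabilities."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def optimize(choices, score_fn, rounds=200, batch_size=16,
             learning_rate=0.5, seed=0):
    """Toy generate -> score -> update loop over a categorical 'generator'."""
    rng = random.Random(seed)
    logits = [0.0] * len(choices)          # start with no preference
    for _ in range(rounds):
        probs = softmax(logits)
        # 1. Generate: sample a batch of candidates from the current policy.
        batch = rng.choices(range(len(choices)), weights=probs, k=batch_size)
        # 2. Score: evaluate each candidate with the objective function.
        scores = [score_fn(choices[i]) for i in batch]
        baseline = sum(scores) / len(scores)   # batch mean reduces variance
        # 3. Update: REINFORCE gradient of log-probability w.r.t. the logits.
        grads = [0.0] * len(choices)
        for i, s in zip(batch, scores):
            advantage = s - baseline
            for j in range(len(choices)):
                indicator = 1.0 if j == i else 0.0
                grads[j] += advantage * (indicator - probs[j])
        logits = [l + learning_rate * g / batch_size
                  for l, g in zip(logits, grads)]
    return dict(zip(choices, softmax(logits)))


# Invented scores for four hypothetical substituents.
toy_scores = {"H": 0.2, "F": 0.6, "OMe": 0.9, "NO2": 0.1}
policy = optimize(list(toy_scores), toy_scores.get)
best = max(policy, key=policy.get)
```

Sampling from probabilities rather than always taking the current best option is what keeps some exploration alive while the update step shifts weight toward higher-scoring choices.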

Rapid Structural and Statistical Triage

Once a new batch of optimized analogs is generated, users typically apply fast screening layers to prioritize which molecules are worth deeper analysis in other Revilico Engines, depending on which properties are being optimized (activity, in this case).

Docking

Used to evaluate binding modes and relative affinity trends across static, flexible, and ensemble conditions
Helps eliminate obvious false positives
Provides structural intuition for SAR decisions

QSAR Modeling

Uses data-driven patterns to predict activity, selectivity, or developability signals
Scales well across larger libraries
Complements docking by capturing non-structural trends

At this stage, the goal is down-selection, not final validation.
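
A simple way to use docking and QSAR as complementary signals during down-selection is to rank molecules under each signal and keep those with the best average rank. The sketch below assumes docking scores where more negative is better and QSAR predictions where higher is better. The molecule names and values are invented for illustration.

```python
def rank_positions(scores: dict, higher_is_better: bool) -> dict:
    """Map each molecule to its 1-based rank under one scoring signal."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {name: position for position, name in enumerate(ordered, start=1)}


def consensus_shortlist(docking: dict, qsar: dict, keep: int) -> list:
    """Keep the `keep` molecules with the best mean rank across both signals."""
    dock_rank = rank_positions(docking, higher_is_better=False)
    qsar_rank = rank_positions(qsar, higher_is_better=True)
    shared = docking.keys() & qsar.keys()      # only molecules scored by both
    mean_rank = {m: (dock_rank[m] + qsar_rank[m]) / 2 for m in shared}
    # Sort by mean rank, breaking ties alphabetically for reproducibility.
    return sorted(shared, key=lambda m: (mean_rank[m], m))[:keep]


# Invented values: docking score (more negative = better) and
# QSAR-predicted activity (higher = better).
docking = {"A": -9.1, "B": -7.0, "C": -8.2, "D": -8.8}
qsar = {"A": 7.6, "B": 6.9, "C": 7.5, "D": 5.0}
shortlist = consensus_shortlist(docking, qsar, keep=2)
```

Here molecule D docks well but has the weakest QSAR prediction, so it drops out of the shortlist even though docking alone would have ranked it second.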

Validate Top Candidates with Dynamics and Energetics (Optional)

For the most promising candidates, users can integrate more expensive but higher-confidence methods.

Protein-Ligand Molecular Dynamics

Tests binding stability over biologically relevant time scales
Reveals water effects, flexibility, and pose robustness
Helps eliminate unstable or over-fit docking poses
This engine is also equipped with snapshot-based end-point free energy calculations (MM/PBSA and MM/GBSA) to obtain more refined binding energy readouts.
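
In end-point methods such as MM/PBSA and MM/GBSA, the binding energy for each trajectory snapshot is the energy of the complex minus the energies of the receptor and the ligand, and the estimate is the average over snapshots. The sketch below shows only that arithmetic. The per-snapshot energies are invented, and the dictionary layout is an assumption rather than the engine's output format.

```python
from math import sqrt
from statistics import mean, stdev


def endpoint_binding_energy(snapshots):
    """Average per-snapshot binding energy and its naive standard error.

    Each snapshot is a dict with 'complex', 'receptor', and 'ligand'
    energies in the same units (for example kcal/mol).
    """
    per_frame = [
        s["complex"] - s["receptor"] - s["ligand"] for s in snapshots
    ]
    average = mean(per_frame)
    # Naive standard error: treats frames as independent, which MD frames
    # usually are not, so this tends to understate the true uncertainty.
    error = stdev(per_frame) / sqrt(len(per_frame)) if len(per_frame) > 1 else 0.0
    return average, error


# Invented energies for three snapshots.
snapshots = [
    {"complex": -5030.0, "receptor": -4950.0, "ligand": -50.0},
    {"complex": -5035.0, "receptor": -4952.0, "ligand": -51.0},
    {"complex": -5028.0, "receptor": -4951.0, "ligand": -49.0},
]
average, error = endpoint_binding_energy(snapshots)
```

Reporting the spread alongside the mean makes it easier to tell whether two candidates differ by more than the snapshot-to-snapshot noise.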

Free Energy Perturbation (RBFE / ABFE)

Provides quantitative ranking within a focused chemical series
Particularly useful when choosing which compounds to synthesize next
Often used as a final filter before experimental commitment
Can resolve the individual contributors to binding energy, giving a more detailed understanding of ligand-protein engagement

These steps are optional but powerful when optimization decisions become costly and the chemical space narrows.
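
For relative binding free energy (RBFE), the standard thermodynamic cycle gives the difference in binding free energy between two ligands as the free energy of transforming A into B in the complex minus the same transformation in solvent. The sketch below applies that relation and converts the result into a predicted fold change in affinity. The two leg values are invented, and the function names are illustrative.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872  # kcal/(mol*K)


def relative_binding_free_energy(dg_complex_leg: float,
                                 dg_solvent_leg: float) -> float:
    """ddG_bind(A->B) = dG(A->B in complex) - dG(A->B in solvent), kcal/mol."""
    return dg_complex_leg - dg_solvent_leg


def affinity_fold_change(ddg: float, temperature_k: float = 298.15) -> float:
    """Predicted Kd(A) / Kd(B); values above 1 mean B binds more tightly."""
    return math.exp(-ddg / (GAS_CONSTANT_KCAL * temperature_k))


# Invented leg free energies (kcal/mol) for a hypothetical A -> B change.
ddg = relative_binding_free_energy(dg_complex_leg=-3.2, dg_solvent_leg=-1.8)
fold = affinity_fold_change(ddg)
```

A negative `ddg` means the modification is predicted to improve binding. At room temperature, roughly 1.4 kcal/mol corresponds to about a tenfold change in affinity, which is why small energy differences matter when ranking a focused series.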

Iterate the Optimization Loop

Based on what you learn:

Refine scoring objectives
Adjust similarity constraints
Introduce new penalties or priorities
Re-run Molecular Optimization with updated guidance

Most hit-to-lead campaigns cycle through this loop multiple times, progressively narrowing toward a small set of strong lead candidates. This saves considerable time and money that would otherwise be spent synthesizing compounds that eventually fail.

Results

Iteratively improved compound series
Clear rationale for why each optimization step was taken
Reduced uncertainty before synthesis or experimental testing
A small, prioritized set of lead-like candidates ready for the next stage

When improvements are consistently supported across generation, docking, QSAR, and (optionally) physics-based validation, users can proceed with much higher confidence.

Now what? I’ve identified optimized candidates, but what’s next?

After identifying these candidates and optimizing their activity and other properties using generative chemistry, you can move forward with Retrosynthesis to design and explore viable synthetic routes.
If you’d like to re-analyze the compound set for different key properties that were also flagged for optimization, the rest of the operating system suite can be used for this as well.

Why Revilico?
Revilico enables a closed-loop optimization workflow where molecule generation, scoring, and validation are tightly integrated. Rather than relying on a single signal, users can hedge decisions across generative chemistry, structure-based modeling, and data-driven analytics, allowing optimization to move faster without sacrificing scientific control or interpretability.