Generating 3D Structures

The Problem You Are Trying to Solve:
“I have a molecular target (protein) without an experimentally resolved structure, and I want to obtain a 3D structural hypothesis suitable for analysis and for structure-based drug design and discovery.”

This problem has traditionally been difficult to solve: many targets lack experimentally resolved structures, the structures that do exist may be incomplete, and a predicted structure is only valuable in downstream drug discovery steps if its confidence is properly interpreted. Traditional experimental methods include NMR, X-ray crystallography, and cryo-electron microscopy. With recent developments in AI, it is becoming increasingly common to use prediction engines for biological structures that are hard to resolve experimentally.

Solution
This workflow enables users to a) determine whether a resolved structure exists, or b) generate a high-confidence AI-predicted structural model using a variety of protein folding and co-folding algorithms available on Revilico’s Operating System Platform. We also offer a variety of engines for predicting co-crystal-like structures of compounds bound to protein targets, along with DNA, RNA, and protein co-folding options, so that you can properly represent your biological system computationally. Ligand conformations and protein structure confidences are calibrated using a model-hedging method to help negate single-model biases: each algorithm has distinct training data and inductive assumptions, so comparing and averaging across them reduces the resulting errors and increases confidence in the generated structures.

What Data Do I Need to Provide?

Protein name or identifier (Required if you are searching databases for representative structures)
Protein amino acid sequence (Required to use protein folding algorithms)
Known ligands, DNA/RNA sequences, or binding partners (Optional, but required for algorithms that co-fold binding partners to protein targets)
Desired binding pocket or conformation (Optional, but strongly recommended for co-folding ligands)
Other protein sequences (Optional, but required if you are looking at protein-protein interactions)
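
The input requirements above can be checked before any engine is run. The following is a minimal sketch, assuming a hypothetical `StructureRequest` container and hypothetical step names; none of these identifiers come from the Revilico platform, and the sequence check only covers the 20 standard amino acids.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# One-letter codes of the 20 standard amino acids.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass
class StructureRequest:
    """Hypothetical container for the inputs listed above (not a Revilico API)."""
    protein_id: Optional[str] = None          # name or database identifier
    sequence: Optional[str] = None            # amino acid sequence
    partners: List[str] = field(default_factory=list)        # ligands, DNA/RNA, other proteins
    pocket_residues: List[int] = field(default_factory=list)  # desired binding pocket


def invalid_residues(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based position, character) for every non-standard character."""
    cleaned = "".join(sequence.split()).upper()  # ignore whitespace and case
    return [(i, ch) for i, ch in enumerate(cleaned, start=1) if ch not in STANDARD_AA]


def available_steps(req: StructureRequest) -> List[str]:
    """List which workflow steps the supplied inputs are sufficient for."""
    steps = []
    if req.protein_id:
        steps.append("database_search")
    if req.sequence and not invalid_residues(req.sequence):
        steps.append("folding")
        if req.partners:
            steps.append("co_folding")
    return steps
```

A request that carries only an identifier can be used for the database search step, while folding additionally needs a clean sequence and co-folding needs at least one binding partner.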

Workflow

Identify Existing Experimental Structures

Determine whether a resolved structure already exists. To do this on Revilico, users can query structural databases (such as the PDB or UniProt) and evaluate resolution, coverage, and biological relevance with RevilicoGPT and Revilico Agent.

Sample Query: “I am evaluating AXL for Triple Negative Breast Cancer and want a scope of all the publicly available protein structures I can use for structure-based drug design. Existing inhibitor co-crystal structures would be optimal.”

RevilicoGPT and Revilico Agent will output experimental structure files (if available) and their corresponding coverage and quality annotations. If a suitable structure exists, users can proceed with downstream analysis using it. If no suitable structure exists, the user can proceed to AI structure prediction.

Usually, we look for target structures that already have well-known inhibitor co-crystal structures so that we can use them as a benchmark for our computational screening. We then take the structure into PyMOL or ChimeraX and remove the waters, the ligand, and ions to clean and prepare it for downstream docking and analysis. Public databases sometimes also contain full-length or multimeric structures; for proteins like kinases, where a single kinase domain is sufficient, we extract just that portion of the protein (with the active ATP binding site) and a co-crystallized ligand (if available) to use as our baseline structure.
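
The cleaning and domain-extraction step described above is normally done interactively in PyMOL or ChimeraX. As a rough illustration of what that step does to the file, here is a minimal sketch in plain Python that relies only on the fixed-column PDB format. The function name `clean_structure` and its parameters are hypothetical, and the sketch assumes a single-model PDB file. Because it drops every HETATM record, it would also drop modified residues stored as HETATM (for example selenomethionine), so it is not a substitute for a proper preparation tool.

```python
def clean_structure(pdb_text, chain=None, first_residue=None, last_residue=None):
    """Return PDB text containing protein ATOM records only.

    HETATM records (waters, ligands, ions) and all header or connectivity
    records are dropped. Optionally keep one chain and one residue range,
    for example an isolated kinase domain.
    """
    kept = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skips HETATM, HEADER, CONECT, ANISOU, etc.
        chain_id = line[21]                 # column 22: chain identifier
        residue_number = int(line[22:26])   # columns 23-26: residue number
        if chain is not None and chain_id != chain:
            continue
        if first_residue is not None and residue_number < first_residue:
            continue
        if last_residue is not None and residue_number > last_residue:
            continue
        kept.append(line)
    kept.append("END")
    return "\n".join(kept) + "\n"
```

Calling `clean_structure(text, chain="A", first_residue=a, last_residue=b)` keeps only residues `a` to `b` of chain A. The domain boundaries themselves have to come from the target's annotation. If the co-crystallized ligand is needed as a docking benchmark, it should be saved separately before the HETATM records are discarded.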

Predict Structure from Sequence

Generate a de novo 3D structural hypothesis. Taking a protein amino acid sequence as input, the AlphaFold, Boltz1/2, and OpenFold engines produce predicted 3D structure(s), per-residue confidence scores, and model provenance and metadata. To understand how to read the different metrics, refer to the engine-specific documentation: AlphaFold, Boltz1/2, OpenFold. It is important to note that the outputs of AlphaFold, Boltz1/2, and OpenFold are labeled as hypotheses, not experimentally resolved structures. Assess the structure itself along with the per-residue pLDDT, the global pTM, and any other confidence scores the engine reports to gain confidence in your generated hypothesis.
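
As an illustration of reading per-residue confidence, the sketch below assumes an AlphaFold-style PDB file, in which pLDDT (0 to 100) is stored in the B-factor column. Other engines may report confidence in separate files or on a different scale, so check the engine documentation before reusing it. The function names are hypothetical. The default threshold of 70 follows the commonly used AlphaFold convention that treats pLDDT below 70 as low confidence.

```python
def plddt_per_residue(pdb_text):
    """Map (chain, residue number) -> pLDDT, read from the B-factor of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))   # chain ID, residue number
            scores[key] = float(line[60:66])     # columns 61-66: B-factor
    return scores


def low_confidence_segments(scores, threshold=70.0, min_length=3):
    """Return (chain, first, last) for runs of consecutive residues below threshold."""
    segments = []
    run = []

    def flush():
        if len(run) >= min_length:
            segments.append((run[0][0], run[0][1], run[-1][1]))

    for key in sorted(scores):
        chain, number = key
        below = scores[key] < threshold
        contiguous = bool(run) and run[-1] == (chain, number - 1)
        if below and (not run or contiguous):
            run.append(key)      # start or extend the current run
            continue
        flush()                  # run ended: keep it if long enough
        run = [key] if below else []
    flush()
    return segments
```

Segments returned by `low_confidence_segments` are candidates for flexible loops or disordered regions and deserve a closer look before the model is used for docking, especially if they line the binding pocket.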

Ligand-DNA-RNA or Conformation-Specific Refinement (Optional)

This step serves a specific case: a binding pocket is known and a ligand or interacting protein is available. Users can input the protein structure or sequence, together with the ligand or binding partner, into the Boltz or BoltzGen co-folding engines to produce protein-ligand or protein-protein complex structures with pocket-specific conformational hypotheses. Boltz2 co-folding is specifically geared towards generating ligand-protein binding hypotheses and activity values. OpenFold is mainly utilized for protein-protein, DNA-protein, RNA-protein, and similar biological pairs to get a more nuanced biological model for analysis.

Results

Versioned 3D structural hypothesis
Confidence and provenance metadata
Structures ready for docking, simulation, or analysis
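
The confidence metadata listed above is most useful when compared across engines. The following is a minimal sketch of such a cross-model comparison, assuming each engine's per-residue confidence has already been brought onto a common 0 to 100 scale. It illustrates the general idea of averaging and checking agreement; it is not Revilico's calibration method, and the function names, engine names, and the `max_spread` cutoff are hypothetical.

```python
from statistics import mean, pstdev


def consensus_confidence(per_engine):
    """Combine per-residue confidences from several engines.

    per_engine: {engine_name: {residue_number: confidence}}
    Returns {residue_number: (mean, spread)} for residues scored by every
    engine, where spread is the population standard deviation.
    """
    if not per_engine:
        return {}
    tables = list(per_engine.values())
    shared = set(tables[0]).intersection(*tables[1:])  # residues all engines scored
    result = {}
    for residue in sorted(shared):
        values = [table[residue] for table in tables]
        result[residue] = (mean(values), pstdev(values))
    return result


def disputed_residues(consensus, max_spread=10.0):
    """Residues where the engines disagree by more than max_spread."""
    return [res for res, (_, spread) in consensus.items() if spread > max_spread]
```

For example, with two engines scoring residue 1 at 90 and 80, the consensus is a mean of 85 with a spread of 5. Residues returned by `disputed_residues` are positions where a single model's output should not be trusted on its own.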

If several models agree on parameters and confidences, you can move forward with your computational campaign with greater trust in the SBDD foundations. Look at protein domain- and region-specific confidence scores to help validate assumptions about your target (e.g., flexible loops will usually have lower confidence, while more static or rigid regions should have higher predicted confidence).

Now what? I have my structure and want to run an SBDD campaign!

Still pending: List of all the solutions that take in a protein 3D structure for SBDD

Why Revilico?
This Revilico workflow enables users to query the availability of experimental data before proceeding to AI structure prediction. All 3D structural hypotheses include transparent quality scoring, and can be seamlessly integrated with downstream discovery engines and workflows in a multi-modal way to help hedge all of your results against one another.