
Why Use This Engine?

In the documentation below, we will use Revilico’s AlphaFold Engine and OpenFold Engine to predict protein structures with high confidence for downstream applications (e.g. pocket identification, docking, and molecular dynamics simulations). Most computational chemistry work begins only after you have identified a target and generated its structure. Protein folding algorithms are a major breakthrough because they make structure-based drug discovery practical, replacing a previously painstaking guess-and-check process with a more targeted engineering approach.
Protein Folding Workflow

Background

Protein structure determines function, binding sites, and druggability. A protein’s 3D shape dictates which molecules it can bind, which reactions it catalyzes, and whether it can be targeted by drugs. Without structural information, drug discovery relies on trial and error rather than rational, structure-based design. Historically, determining a protein’s structure required experimental methods such as X-ray crystallography, a process that is costly, time-consuming, and has a fairly high failure rate. This is where AlphaFold comes in. AlphaFold is a deep learning system that predicts a protein’s 3D structure from its amino acid sequence by learning evolutionary patterns and physical constraints from experimentally solved structures, producing accurate predictions at a fraction of the cost of determining the structure experimentally. In this guide, we will learn how to run AlphaFold, understand the theory behind it, and build the intuition needed to derive novel insights from this pipeline.

We will cover two protein folding engines on our platform: AlphaFold and OpenFold. OpenFold is a reimplementation of AlphaFold with greater configurability, suited to research groups that need fine-grained control over the prediction process rather than to production use. In short, the AlphaFold engine runs AlphaFold2, whose core model is open source, while the OpenFold engine resembles AlphaFold3, a model traditionally reserved for enterprise use.

Structure Generation Workflows

To run the AlphaFold Engine, we (1) name our pipeline (e.g. Pipeline #1), (2) upload the protein sequence as a CSV file or enter it manually, and (3) configure the following parameters: Num Relax, Template Mode, MSA Mode, and Pair Mode. We then run the pipeline and open the results once it has finished. You can find your results in the Central Hub on the Command Center.

The following workflow runs on the backend. The first step is sequence validation.
It takes the amino acid sequence input and standardizes it (e.g. checking for valid amino acid codes, removing whitespace, and validating sequence length).

The validated sequence is then passed to MSA generation, which searches sequence databases, aligns homologous sequences residue by residue, and produces a Multiple Sequence Alignment (MSA). The MSA reveals conservation (e.g. the amino acid at a particular position stays the same across all homologous sequences) and co-evolution (e.g. if position 10 mutates to a positively charged residue, position 85 might compensate by mutating to a negatively charged one). A deep MSA, containing many homologous sequences, generally yields a high-confidence structure because the evolutionary signal is consistent across many sequences; conversely, a shallow MSA generally yields lower confidence because fewer related sequences are available.

If enabled, the template search step follows the MSA step. It searches the PDB for structures with similar sequences and extracts distance constraints from them. These templates act as a soft hint rather than a hard constraint (i.e. a suggestion for how the protein should be structured rather than a command that fixes the structure).

Using the context from the MSA and template search, features are then extracted and fed into the neural network, which generates 5 ranked models. Each model comes with per-residue confidence (pLDDT) and inter-residue confidence (Predicted Aligned Error, PAE).

Finally, the Num Relax parameter sets how many of the top-ranked structures (e.g. 0, 1, or 5) are relaxed with the AMBER99SB force field to fix geometric issues (e.g. removing atom overlaps, correcting bond angles, and optimizing side-chain positions).
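The sequence validation step can be sketched as follows. This is a minimal illustration, not the engine's actual code: the function name `validate_sequence`, the FASTA-header handling, and the length limits are hypothetical choices for the example, and the platform may accept additional codes (e.g. `X` for unknown residues) that this sketch rejects.

```python
import re

# The 20 standard amino acids (one-letter codes).
# Assumption: non-standard or ambiguous codes (B, X, Z, ...) are rejected here.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def validate_sequence(raw: str, min_len: int = 16, max_len: int = 2500) -> str:
    """Normalize a protein sequence and reject invalid input.

    min_len and max_len are hypothetical limits chosen for illustration.
    """
    # Drop any FASTA header lines, join the rest, strip whitespace, uppercase.
    body = "".join(ln for ln in raw.splitlines() if not ln.startswith(">"))
    seq = re.sub(r"\s+", "", body).upper()

    # Check for invalid amino acid codes.
    invalid = sorted(set(seq) - STANDARD_AA)
    if invalid:
        raise ValueError(f"invalid residue codes: {''.join(invalid)}")

    # Check sequence length.
    if not (min_len <= len(seq) <= max_len):
        raise ValueError(f"sequence length {len(seq)} outside [{min_len}, {max_len}]")
    return seq
```

For example, `validate_sequence(">my_protein\nmkt ayiakq\nrqisfvks")` returns `"MKTAYIAKQRQISFVKS"`, while a sequence containing `B` or one shorter than 16 residues raises `ValueError`.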
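Conservation in an MSA can be made concrete by measuring how much each alignment column varies. The sketch below scores columns with Shannon entropy, a common and simple conservation measure; it is a generic illustration rather than what AlphaFold does internally (the network learns directly from the alignment features), and `column_entropy` and the toy alignment are hypothetical. It does not capture co-evolution, which requires comparing pairs of columns.

```python
import math
from collections import Counter


def column_entropy(msa: list[str]) -> list[float]:
    """Shannon entropy (bits) of each alignment column, ignoring gaps ('-').

    0.0 means the column is perfectly conserved; higher values mean more variation.
    """
    if len({len(s) for s in msa}) != 1:
        raise ValueError("all aligned sequences must have the same length")

    entropies = []
    for column in zip(*msa):
        residues = [r for r in column if r != "-"]
        total = len(residues)
        if total == 0:
            # An all-gap column carries no signal; scored 0.0 by convention here.
            entropies.append(0.0)
            continue
        probs = [count / total for count in Counter(residues).values()]
        entropies.append(sum(p * math.log2(1 / p) for p in probs))
    return entropies


# Toy four-sequence alignment (hypothetical); '-' marks a gap.
msa = ["MKTAY", "MKSAY", "MRTGY", "MK-AY"]
print([round(h, 2) for h in column_entropy(msa)])  # [0.0, 0.81, 0.92, 0.81, 0.0]
```

The first and last positions are identical in every sequence and score 0.0, while the variable middle positions score higher. A real alignment with hundreds or thousands of sequences gives these per-column estimates far more statistical support than four sequences can, which is one way to see why a deep MSA tends to produce more confident predictions.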

Interactive AlphaFold Viewer

Explore AlphaFold protein structure prediction results in an interactive 3D viewer. View predicted structures colored by confidence (pLDDT), compare ranked models, and analyze per-residue quality metrics.
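Because the viewer colors structures by pLDDT, it can help to compute the same per-residue summary yourself when comparing ranked models. The sketch below assumes the engine returns standard AlphaFold-style PDB files, in which each residue's pLDDT is written into the B-factor column; the function names are hypothetical, and the confidence bands follow the convention used by the AlphaFold Protein Structure Database (very high above 90, confident 70 to 90, low 50 to 70, very low below 50).

```python
from collections import Counter


def per_residue_plddt(pdb_text: str) -> dict[tuple[str, int], float]:
    """Return {(chain_id, residue_number): pLDDT} from an AlphaFold-style PDB file.

    Assumes pLDDT is stored in the B-factor field; reads it from C-alpha atoms only.
    """
    scores: dict[tuple[str, int], float] = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB fields (1-based): atom name cols 13-16, chain col 22,
        # residue number cols 23-26, B-factor cols 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            scores[key] = float(line[60:66])
    return scores


def confidence_band(plddt: float) -> str:
    """Bucket a pLDDT score; boundary values fall into the lower band."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def summarize(scores: dict[tuple[str, int], float]) -> dict[str, object]:
    """Mean pLDDT plus the number of residues in each confidence band."""
    if not scores:
        raise ValueError("no C-alpha atoms found")
    values = list(scores.values())
    return {
        "mean_plddt": sum(values) / len(values),
        "bands": Counter(confidence_band(v) for v in values),
    }
```

Running `summarize(per_residue_plddt(text))` on each ranked model gives a quick side-by-side comparison. Residues in the very low band often indicate flexible or disordered regions, so they are worth inspecting before using the structure for pocket identification or docking.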

Interactive OpenFold Viewer

Explore OpenFold protein structure prediction results in an interactive 3D viewer. OpenFold provides greater configurability for research groups needing fine-grained control over the prediction process.