Why Use This Engine?

In the documentation below, we will use Revilico’s RevPhospho engine to analyze phosphoproteomics data, infer kinase activity from phosphosite changes, map samples against cancer subtype signatures, and prioritize druggable kinase targets. This engine bridges the gap between raw mass spectrometry output and actionable drug discovery hypotheses by linking observed phosphorylation changes to the upstream kinases driving them.

Figure: RevPhospho Workflow

Background

Post-translational modifications (PTMs) are chemical modifications to proteins that occur after translation, dramatically expanding the functional diversity of the proteome beyond what is encoded in the genome. Phosphorylation, the addition of a phosphate group to a serine, threonine, or tyrosine residue by a kinase, is the most extensively studied PTM. It acts as a molecular switch that controls protein activity, localization, stability, and protein-protein interactions, making kinases among the most important and frequently targeted protein families in drug discovery.

Phosphoproteomics is the large-scale measurement of phosphorylation events across the proteome using mass spectrometry. It produces lists of quantified phosphosites with associated fold-change values between experimental conditions, providing a snapshot of signaling network activity. However, the direct interpretation of thousands of individual site-level changes is challenging. RevPhospho addresses this by using kinase-substrate databases to aggregate site-level signals into per-kinase activity scores, identifying which kinases are most active in the sample, comparing these activity profiles against cancer subtype reference signatures, and ranking kinases by their combined activity, druggability, and clinical evidence.

Input Formats and Data Loading

RevPhospho accepts three input formats, either auto-detected or specified by the user:
  • MaxQuant: Output from the MaxQuant proteomics software suite. The engine reads columns for gene names, amino acid, position, localization probability, and normalized ratio values. Reverse database hits and common contaminants (marked with "+" in the corresponding columns) are filtered out before analysis. Phosphosites with a localization probability below 0.75 are excluded because their assignment to a specific residue is insufficiently confident.
  • Generic CSV/TSV: Any tabular file with columns for gene symbol, site (e.g. S473), log2 fold-change, and p-value. The separator (comma or tab) is auto-detected.
  • PhosphoSitePlus: .txt exports from the PhosphoSitePlus database with gene and modification residue columns (e.g. S15-p) and header comment lines.

All three formats are normalized into a common internal representation with fields for gene symbol, site identifier, PTM type, PTM code, amino acid, log2 fold-change, p-value, and localization score. The supported PTM types are phosphorylation (p), acetylation (ac), methylation (me), and ubiquitination (ub).
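
As a rough illustration of the MaxQuant filtering rules described above, the following sketch drops reverse hits, contaminants, and poorly localized sites using only the Python standard library. The column names (`Gene names`, `Amino acid`, `Position`, `Localization prob`, `Reverse`, `Potential contaminant`) follow typical MaxQuant site tables and are an assumption here, as is the function name `filter_maxquant_rows`; this is not RevPhospho's own parser.

```python
import csv
import io

MIN_LOCALIZATION_PROB = 0.75  # threshold stated in the documentation


def filter_maxquant_rows(handle):
    """Yield simplified site records that pass the three filters.

    Assumed (hypothetical) column names are based on typical MaxQuant output.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # Reverse database hits and contaminants are marked with "+".
        if row.get("Reverse") == "+" or row.get("Potential contaminant") == "+":
            continue
        try:
            prob = float(row.get("Localization prob"))
        except (TypeError, ValueError):
            continue  # no usable localization probability
        if prob < MIN_LOCALIZATION_PROB:
            continue
        yield {
            "gene": row["Gene names"],
            "site": f'{row["Amino acid"]}{row["Position"]}',
            "localization": prob,
        }


# Toy input for illustration only.
demo = (
    "Gene names\tAmino acid\tPosition\tLocalization prob\tReverse\tPotential contaminant\n"
    "AKT1\tS\t473\t0.99\t\t\n"
    "EGFR\tY\t1068\t0.60\t\t\n"
    "KRT1\tS\t21\t0.95\t\t+\n"
    "TP53\tS\t15\t0.80\t+\t\n"
)
kept = list(filter_maxquant_rows(io.StringIO(demo)))
print(kept)
```

In this toy input only the AKT1 row survives: the EGFR site falls below the localization threshold, and the other two rows are flagged as a contaminant and a reverse hit.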

Quality Control

Before any analysis, the engine computes the following QC metrics from the parsed input: total number of sites in the file, number of sites with a valid quantified log2 fold-change value, number of unique protein-coding genes represented, count of upregulated sites (log2FC greater than 0.5), count of downregulated sites (log2FC less than -0.5), and median absolute log2 fold-change across all quantified sites. These statistics are reported in the QC summary, and the dataset is flagged if it appears sparse or has an unusual fold-change distribution.
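
The QC metrics listed above could be computed along these lines. This is a minimal sketch: the record layout (dicts with `gene` and `log2fc` keys, where `None` means unquantified) and the function name `qc_summary` are assumptions for illustration, not the engine's internal representation.

```python
from statistics import median


def qc_summary(sites, threshold=0.5):
    """Compute the QC counts described in the text from a list of site dicts."""
    fcs = [s["log2fc"] for s in sites if s["log2fc"] is not None]
    return {
        "total_sites": len(sites),
        "quantified_sites": len(fcs),
        "unique_genes": len({s["gene"] for s in sites}),
        "up_sites": sum(fc > threshold for fc in fcs),
        "down_sites": sum(fc < -threshold for fc in fcs),
        "median_abs_log2fc": median(abs(fc) for fc in fcs) if fcs else None,
    }


# Toy records for illustration only.
sites = [
    {"gene": "AKT1", "site": "S473", "log2fc": 1.2},
    {"gene": "AKT1", "site": "T308", "log2fc": 0.8},
    {"gene": "EGFR", "site": "Y1068", "log2fc": -0.9},
    {"gene": "TP53", "site": "S15", "log2fc": None},
    {"gene": "MYC", "site": "S62", "log2fc": 0.1},
]
summary = qc_summary(sites)
print(summary)
```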

Kinase Substrate Enrichment Analysis

Kinase Substrate Enrichment Analysis (KSEA) is the core algorithm of RevPhospho. It translates site-level phosphorylation fold-changes into kinase-level activity scores by aggregating the fold-changes of all known substrates of each kinase and testing whether they are collectively shifted relative to the background distribution of all measured sites.

KSEA Z-Score

For each kinase $k$, the activity Z-score is computed as:

$$Z_k = \frac{\overline{\text{FC}}_k - \overline{\text{FC}}_{\text{all}}}{\sigma_{\text{all}}} \cdot \sqrt{n_k}$$

Where $\overline{\text{FC}}_k$ is the mean log2 fold-change of the quantified substrates of kinase $k$, $\overline{\text{FC}}_{\text{all}}$ is the mean log2 fold-change across all quantified phosphosites, $\sigma_{\text{all}}$ is the standard deviation of log2 fold-changes across all sites, and $n_k$ is the number of quantified substrates matched for kinase $k$. A positive Z-score indicates that the kinase’s substrates are collectively upregulated relative to the background, implying kinase activation. A negative Z-score indicates that substrates are collectively downregulated, implying kinase inhibition or reduced activity.

Statistical Testing

In addition to the Z-score, a one-sample t-test is applied per kinase to assess whether the distribution of substrate fold-changes is significantly different from the global mean. All resulting p-values are corrected for multiple testing using the Benjamini-Hochberg false discovery rate procedure with a default significance threshold of FDR 0.05. Kinases with fewer than min_substrates matched quantified substrates (default 3) are excluded from reporting to prevent low-confidence estimates from sparse substrate coverage.

The kinase-substrate reference database integrates annotations from PhosphoSitePlus, NetworKIN, and SIGNOR, covering over 150 human kinases. Each kinase entry includes its gene symbol, full name, kinase family (RTK, SFK, AGC, PIKK, CMGC, and others), and its curated list of known substrates.
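
To make the Z-score formula and the Benjamini-Hochberg step concrete, here is a small pure-Python sketch. Several details are assumptions rather than documented behavior: the population standard deviation is used for the background, sites are keyed by a hypothetical `GENE_SITE` string, and the kinase-substrate mapping is a toy example rather than the engine's database. The per-kinase t-test is omitted; only the p-value adjustment is shown.

```python
from math import sqrt
from statistics import mean, pstdev


def ksea_z_scores(site_fc, kinase_substrates, min_substrates=3):
    """Return {kinase: Z} following the Z-score formula in the text."""
    background = list(site_fc.values())
    mu_all = mean(background)
    sigma_all = pstdev(background)  # assumption: population SD of all sites
    scores = {}
    for kinase, substrates in kinase_substrates.items():
        matched = [site_fc[s] for s in substrates if s in site_fc]
        if len(matched) < min_substrates:
            continue  # too few quantified substrates to report
        scores[kinase] = (mean(matched) - mu_all) / sigma_all * sqrt(len(matched))
    return scores


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data for illustration only.
site_fc = {
    "GSK3B_S9": 1.5, "FOXO3_T32": 1.0, "TSC2_S939": 2.0,
    "RB1_S780": -1.0, "LMNA_S22": -0.5, "MAPT_S202": 0.0,
}
kinase_substrates = {
    "AKT1": ["GSK3B_S9", "FOXO3_T32", "TSC2_S939"],
    "CDK1": ["RB1_S780", "LMNA_S22"],  # only 2 matches, so it is skipped
}
scores = ksea_z_scores(site_fc, kinase_substrates)
fdr = benjamini_hochberg([0.01, 0.04, 0.03])
print(scores, fdr)
```

Multiplying by $\sqrt{n_k}$ means that a kinase with many consistently shifted substrates scores higher than one with the same mean shift across only a few sites.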

Cancer Signature Scoring

Beyond identifying which kinases are active, RevPhospho maps the sample’s kinase activity profile against 12 reference cancer subtype signatures derived from CPTAC (Clinical Proteomic Tumor Analysis Consortium) phosphoproteomic datasets. This enables researchers to determine which cancer subtype the sample’s signaling landscape most closely resembles.

Cosine Similarity

For each cancer signature $s$, the similarity to the sample’s kinase Z-score vector is computed using cosine similarity:

$$\text{similarity}_s = \frac{\mathbf{z} \cdot \mathbf{v}_s}{\|\mathbf{z}\| \cdot \|\mathbf{v}_s\|}$$

Where $\mathbf{z}$ is the vector of kinase Z-scores from the sample and $\mathbf{v}_s$ is the reference activity direction vector for signature $s$. The similarity is normalized to a [0, 1] score:

$$\text{score}_s = \frac{\text{similarity}_s + 1}{2}$$

Scores are classified into three confidence levels: High (score above 0.65), Medium (0.45 to 0.65), and Low (below 0.45). The 12 signatures span five cancer types across their major molecular subtypes:

| Cancer Type | Subtypes |
| --- | --- |
| Breast (BRCA) | HER2-enriched, Luminal A, Triple-Negative |
| Lung Adenocarcinoma (LUAD) | EGFR-mutant, KRAS-mutant, ALK/ROS1-fusion |
| Colorectal (COAD) | Chromosomal Instability, MSI-Hypermutated |
| Glioblastoma (GBM) | Classical (EGFR-amplified), Mesenchymal |
| Uterine (UCEC) | Copy-Number High (Serous-like), POLE-ultramutated |
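
The cosine similarity, the [0, 1] rescaling, and the confidence thresholds described above can be sketched as follows. The function names are hypothetical, the two vectors are assumed to be aligned on the same kinase order, and returning 0 for an all-zero vector is an assumption made here to avoid division by zero.

```python
from math import sqrt


def cosine_similarity(z, v):
    """Cosine similarity between two equal-length, kinase-aligned vectors."""
    dot = sum(a * b for a, b in zip(z, v))
    norm_z = sqrt(sum(a * a for a in z))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_z == 0 or norm_v == 0:
        return 0.0  # assumption: an all-zero vector carries no direction
    return dot / (norm_z * norm_v)


def signature_score(z, v):
    """Rescale cosine similarity from [-1, 1] to [0, 1]."""
    return (cosine_similarity(z, v) + 1) / 2


def confidence(score):
    """Map a score to the three confidence levels given in the text."""
    if score > 0.65:
        return "High"
    if score >= 0.45:
        return "Medium"
    return "Low"


# Toy vectors for illustration only.
z = [2.0, 1.0, -1.0]
aligned = signature_score(z, [1.0, 0.5, -0.5])     # same direction
opposed = signature_score(z, [-1.0, -0.5, 0.5])    # opposite direction
orthogonal = signature_score([1.0, 0.0], [0.0, 1.0])
print(confidence(aligned), confidence(opposed), confidence(orthogonal))
```

Because the rescaling maps a cosine of 0 to a score of 0.5, a sample whose profile is unrelated to a signature lands in the Medium band rather than Low under these thresholds.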

Drug Target Prioritization

The drug target prioritization module combines the kinase activity Z-score, curated druggability, and cancer relevance into a single ranked score to identify the most actionable therapeutic targets in the sample.

Combined Score

$$\text{score}_k = \frac{|Z_k|}{Z_{\max}} \cdot 4.0 + D_k \cdot 4.0 + R_k \cdot 2.0$$

Where $|Z_k| / Z_{\max}$ is the normalized absolute activity score (between 0 and 1, contributing 0 to 4 points after weighting), $D_k$ is the curated druggability score for kinase $k$ on a scale of 0 to 1 (reflecting the existence of approved drugs, clinical-stage compounds, or published tool compounds), and $R_k$ is the cancer relevance weight derived from the cancer signature scores (1.0 for a high confidence match, 0.5 for medium, 0.1 for low). Kinases with negative Z-scores (inhibited kinases) receive a 15% score penalty, as activated kinases are generally higher-priority therapeutic targets than inhibited ones. The top 25 kinases by combined score are returned as the prioritized target list.
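
A worked sketch of the combined score follows. Two points are assumptions, since the text does not spell them out: $Z_{\max}$ is taken to be the largest absolute Z-score among the kinases being ranked, and the 15% penalty is applied multiplicatively to the whole score. The druggability and relevance values are made-up illustration data.

```python
def combined_score(z, z_max, druggability, relevance, inhibited_penalty=0.15):
    """Combine activity, druggability, and cancer relevance into one score."""
    activity = abs(z) / z_max if z_max > 0 else 0.0
    score = activity * 4.0 + druggability * 4.0 + relevance * 2.0
    if z < 0:
        # Assumption: the penalty scales the total score.
        score *= 1.0 - inhibited_penalty
    return score


def rank_targets(kinases, top_n=25):
    """Rank {kinase: (Z, D, R)} by combined score, highest first."""
    # Assumption: Z_max is the largest |Z| among the supplied kinases.
    z_max = max((abs(z) for z, _, _ in kinases.values()), default=0.0)
    scored = {
        name: combined_score(z, z_max, d, r)
        for name, (z, d, r) in kinases.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical inputs: (Z-score, druggability, relevance weight).
kinases = {
    "AKT1": (1.5, 0.5, 0.5),
    "GSK3B": (-1.5, 0.5, 0.5),
    "EGFR": (3.0, 1.0, 1.0),
}
ranking = rank_targets(kinases)
print(ranking)
```

With these inputs, AKT1 and GSK3B have identical magnitudes, but GSK3B ranks lower because its negative Z-score triggers the penalty.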

PTM Site Annotation

Each phosphosite quantified in the input is cross-referenced against a curated annotation database covering over 50 high-value cancer-relevant phosphosites across 10 key oncoproteins including TP53, AKT1, EGFR, SRC, BRCA1, CTNNB1, MYC, STAT3, CDK2, and MDM2. For each annotated site, the engine reports the known biological function (e.g. activation loop phosphorylation, degron phosphorylation), the known effect (activating, inhibitory, modulatory, or DNA damage response), the kinases known to phosphorylate that site, and associated disease contexts. Sites not present in the curated database are passed through without annotation.

PTM Crosstalk Detection

The crosstalk detection module identifies regulatory relationships between pairs of phosphosites on the same protein that are both measured in the input dataset. Crosstalk events are classified into four mechanistic types:
  • Priming: Phosphorylation of site A creates a recognition motif that enables a second kinase to phosphorylate site B. For example, CK1-mediated phosphorylation of CTNNB1 S45 creates the priming event that allows GSK3 to phosphorylate S33, S37, and T41 in the beta-catenin destruction complex.
  • Cooperative: Co-phosphorylation of both sites A and B is required for full protein activation. For example, AKT1 requires phosphorylation at both T308 (by PDK1) and S473 (by mTORC2) for maximal kinase activity.
  • Competitive: Sites A and B compete for the same binding domain or reader protein, such that phosphorylation of one site can displace binding at the other.
  • Reader: Phosphorylation of site A recruits a reader protein that then modifies or regulates the phosphorylation state of site B.

The crosstalk database covers over 20 curated events. For each event where both sites are present in the input data, the engine reports the protein, both site identifiers, the crosstalk type, a mechanistic description, supporting literature, and the observed log2 fold-change at each site.
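
The matching rule described above (report an event only when both of its sites were measured) can be sketched as a simple lookup. The two events below are the examples given in the text; the record layout and the function name `detect_crosstalk` are assumptions for illustration, not the engine's curated database or API.

```python
# Two example events taken from the text; the real database is larger.
CROSSTALK_EVENTS = [
    {"protein": "CTNNB1", "site_a": "S45", "site_b": "S33", "type": "priming"},
    {"protein": "AKT1", "site_a": "T308", "site_b": "S473", "type": "cooperative"},
]


def detect_crosstalk(measured, events=CROSSTALK_EVENTS):
    """Return events whose two sites are both present in `measured`.

    `measured` maps (protein, site) tuples to observed log2 fold-changes.
    """
    hits = []
    for event in events:
        key_a = (event["protein"], event["site_a"])
        key_b = (event["protein"], event["site_b"])
        if key_a in measured and key_b in measured:
            hits.append({
                **event,
                "log2fc_a": measured[key_a],
                "log2fc_b": measured[key_b],
            })
    return hits


# Toy measurements: both AKT1 sites present, only one CTNNB1 site.
measured = {
    ("AKT1", "T308"): 1.1,
    ("AKT1", "S473"): 1.4,
    ("CTNNB1", "S45"): -0.7,
}
hits = detect_crosstalk(measured)
print(hits)
```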

Protein Search and Structure Visualization

In addition to the phosphoproteomics analysis pipeline, RevPhospho provides a protein-centric search view. Users can search for any human protein by gene symbol, protein name, or synonym. For proteins in the curated database, the engine returns all annotated PTM sites with residue-level detail including known kinases, functional descriptions, literature support counts from low-throughput and high-throughput studies, and disease associations. For proteins outside the curated set, annotations are retrieved from UniProt via the public API. Protein 3D structures are rendered interactively in the NGL Viewer using AlphaFold v4 models retrieved from the EBI API.

Running the Engine

Inputs

| Parameter | Default | Description |
| --- | --- | --- |
| ptm_file | Optional | Phosphoproteomics file in MaxQuant, Generic, or PhosphoSitePlus format |
| input_format | maxquant | File format: maxquant, generic, or phosphositePlus |
| organism | human | human or mouse |
| min_substrates | 3 | Minimum matched substrates per kinase for KSEA inclusion |
| ptm_types | p | Comma-separated PTM codes to analyze: p, ac, me, ub |

A built-in demo dataset simulating a lung adenocarcinoma phosphoproteome is available for immediate exploration without uploading a file.
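
For readers scripting around these inputs, the parameter table can be mirrored as a plain dictionary with basic validation. This is only a sketch: `build_params` is a hypothetical helper, not part of any RevPhospho client, and the allowed values are copied from the table above.

```python
# Defaults and allowed values copied from the parameter table.
DEFAULTS = {
    "ptm_file": None,  # optional input file
    "input_format": "maxquant",
    "organism": "human",
    "min_substrates": 3,
    "ptm_types": "p",
}
ALLOWED = {
    "input_format": {"maxquant", "generic", "phosphositePlus"},
    "organism": {"human", "mouse"},
}
PTM_CODES = {"p", "ac", "me", "ub"}


def build_params(**overrides):
    """Merge overrides into the defaults and validate them (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    for name, allowed in ALLOWED.items():
        if params[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    if int(params["min_substrates"]) < 1:
        raise ValueError("min_substrates must be at least 1")
    codes = [code.strip() for code in params["ptm_types"].split(",")]
    bad = [code for code in codes if code not in PTM_CODES]
    if bad:
        raise ValueError(f"unsupported PTM code(s): {bad}")
    return params


params = build_params(input_format="generic", ptm_types="p,ac")
print(params)
```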

Outputs

Upon completion, the engine returns results across six modules:
  • QC summary: Site counts, protein counts, upregulated and downregulated site counts, median absolute fold-change
  • KSEA results: Per-kinase Z-score, substrate fold-change mean, matched substrate count, p-value and FDR, kinase family, and list of matched substrate site IDs
  • Cancer signatures: Per-signature similarity score and confidence level across all 12 CPTAC-derived subtypes
  • Drug targets: Top 25 ranked kinases with combined priority score, druggability, cancer relevance weight, and Z-score component
  • PTM site annotations: Functional annotations, known effect, kinase assignments, and disease associations for each recognized site
  • Crosstalk events: Detected inter-site regulatory relationships with mechanistic classification and both site fold-changes