Why Use This Engine?
In the documentation below, we will use Revilico’s RevGRN engine to infer gene regulatory networks from single-cell, bulk, or spatial transcriptomic data and simulate the stable cellular states that emerge from those regulatory interactions. RevGRN identifies which transcription factors are driving gene expression changes in the dataset, constructs a network of TF-target regulatory edges, and then uses Boolean dynamical simulation to identify the attractor states that the network converges to, enabling prediction of how genetic perturbations such as knockouts or overexpression events alter the cellular phenotype.
Background
Gene regulatory networks describe the control logic by which transcription factors bind to gene promoters and enhancers to activate or repress downstream target genes. These networks determine which genes are expressed in each cell type and how expression patterns change in response to developmental signals, disease mutations, or drug perturbations. Inferring GRNs from transcriptomic data is challenging because correlation in expression does not imply causation, and regulatory relationships must be distinguished from indirect co-expression driven by shared upstream regulators. RevGRN addresses this by using machine-learning-based importance scoring (GRNBoost2) to prioritize likely direct TF-target regulatory edges, filtering the result to the most informative subset of the network, and then converting the inferred network into a Boolean dynamical model to simulate cellular states. Boolean models represent each gene as either ON or OFF and apply the regulatory logic iteratively until the system converges to stable attractor states. These attractors are interpreted as distinct biological phenotypes (cell types, disease states, drug-response signatures), and perturbation simulations reveal how knocking out or overexpressing a gene redirects the network toward different attractors.
Input Data Loading
RevGRN accepts expression matrices in CSV, TSV, H5AD, and LOOM formats. The engine auto-detects whether rows represent cells and columns represent genes, or vice versa, by analyzing the index patterns: Ensembl IDs (ENSG prefix), gene symbols, or cell barcode patterns are detected, and the matrix is transposed if necessary. For matrices indexed by Ensembl IDs with a gene_name column, symbols are resolved via the MyGeneInfo API.
Quality Control and Normalization
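Cells with fewer than min_genes_per_cell expressed genes (default 200) are removed, as are genes detected in fewer than min_cells_per_gene cells (default 3). The filtered matrix is then normalized by library size and log-transformed:
$$x_{ij} = \log\left(1 + s \cdot \frac{c_{ij}}{\sum_{k} c_{ik}}\right)$$
Where $c_{ij}$ is the raw count for gene $j$ in cell $i$, the sum runs over all genes $k$ in that cell, and $s$ is a fixed scale factor (the target library size). Highly variable genes are then selected by ranking genes on their dispersion ratio:
$$d_j = \frac{\sigma_j^2}{\mu_j}$$
Where $\mu_j$ and $\sigma_j^2$ are the mean and variance of gene $j$'s normalized expression across cells. The top 2,000 genes by dispersion (subject to an adaptive cap) are retained for network inference.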
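The loading and preprocessing steps above (orientation check, QC filtering, normalization, and dispersion ranking) can be sketched in plain Python as follows. This is a minimal illustration on a dense cells x genes list of lists, not RevGRN's code: the label patterns (human ENSG, mouse ENSMUSG, and rat ENSRNOG Ensembl IDs; 10x-style 16-nucleotide barcodes), the 0.5 majority threshold, the scale factor of 10^4, the sample-variance estimate, and all function names are assumptions. Gene-symbol detection and the MyGeneInfo lookup are omitted.

```python
import math
import re

# Assumed label patterns: human (ENSG), mouse (ENSMUSG), and rat (ENSRNOG)
# Ensembl gene IDs with an optional version suffix, and 10x-style barcodes.
ENSEMBL_RE = re.compile(r"^ENS(MUS|RNO)?G\d{11}(\.\d+)?$")
BARCODE_RE = re.compile(r"^[ACGT]{16}(-\d+)?$")


def fraction_matching(labels, pattern):
    """Fraction of labels that match the given regex."""
    labels = list(labels)
    return sum(1 for s in labels if pattern.match(s)) / len(labels) if labels else 0.0


def orient_cells_by_genes(matrix, row_labels, col_labels, threshold=0.5):
    """Return (matrix, cells, genes) with cells as rows, transposing if needed."""
    rows_are_genes = (fraction_matching(row_labels, ENSEMBL_RE) >= threshold
                      or fraction_matching(col_labels, BARCODE_RE) >= threshold)
    if rows_are_genes:
        return [list(c) for c in zip(*matrix)], list(col_labels), list(row_labels)
    return [list(r) for r in matrix], list(row_labels), list(col_labels)


def qc_filter(counts, genes, min_genes_per_cell=200, min_cells_per_gene=3):
    """Drop sparse cells first, then genes detected in too few remaining cells."""
    cells = [row for row in counts
             if sum(1 for c in row if c > 0) >= min_genes_per_cell]
    keep = [j for j in range(len(genes))
            if sum(1 for row in cells if row[j] > 0) >= min_cells_per_gene]
    return [[row[j] for j in keep] for row in cells], [genes[j] for j in keep]


def log_normalize(counts, scale=1e4):
    """x_ij = log(1 + scale * c_ij / sum_k c_ik); scale=1e4 is an assumption."""
    out = []
    for row in counts:
        total = sum(row)
        out.append([math.log1p(scale * c / total) if total > 0 else 0.0 for c in row])
    return out


def top_dispersion_genes(x, genes, n_top=2000):
    """Rank genes by variance / mean across cells and keep at most n_top."""
    n_cells = len(x)
    scored = []
    for j, gene in enumerate(genes):
        col = [row[j] for row in x]
        mean = sum(col) / n_cells
        var = sum((v - mean) ** 2 for v in col) / max(n_cells - 1, 1)
        scored.append((var / mean if mean > 0 else 0.0, gene))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [gene for _, gene in scored[:min(n_top, len(genes))]]
```

A majority threshold, rather than requiring every label to match, tolerates a few control features or malformed barcodes in the index. A production pipeline would work on sparse matrices instead of nested lists.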
GRN Inference
GRNBoost2 (Primary Method)
GRNBoost2 uses an ensemble of gradient-boosted regression trees to estimate the regulatory importance of each transcription factor for each target gene. For each target gene, a gradient-boosting regression model is trained to predict that gene's expression from the expression of all TF genes. The feature importances of the trained model quantify how much each TF reduces prediction error, providing a directed importance score for each TF-target pair. GRNBoost2 is fast, parallelizable, and captures non-linear regulatory relationships.
Mutual Information (Alternative)
The mutual information between TF expression $T$ and target gene expression $G$ is computed as:
$$I(T; G) = H(G) - H(G \mid T)$$
Where $H(G)$ is the marginal entropy of target gene expression and $H(G \mid T)$ is the conditional entropy given TF expression. Mutual information captures non-linear associations that linear correlation misses.
Spearman Correlation (Fallback)
When neither the GRNBoost2 nor the mutual information libraries are available, Spearman rank correlation is used as a fallback, with the absolute correlation value serving as the regulatory importance score.
After inference, the top max_network_edges edges (default 50) ranked by importance score are retained as the final network.
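The alternative and fallback scoring methods, together with the top-edge cut, can be sketched as below; GRNBoost2 itself is not reproduced. Estimating mutual information on equal-width bins (5 by default) is an assumption, since the binning scheme is not specified here. The Spearman implementation averages ranks over ties, and the function names and the expr layout (a dictionary from gene name to per-cell values) are hypothetical.

```python
import math
from collections import Counter


def discretize(values, n_bins=5):
    """Equal-width binning into n_bins integer labels (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy in bits of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(tf, target, n_bins=5):
    """I(T; G) = H(G) - H(G | T) on binned expression values."""
    t, g = discretize(tf, n_bins), discretize(target, n_bins)
    n = len(g)
    h_cond = 0.0
    for b, count in Counter(t).items():
        g_in_bin = [gi for ti, gi in zip(t, g) if ti == b]
        h_cond += (count / n) * entropy(g_in_bin)
    return entropy(g) - h_cond


def ranks(values):
    """1-based ranks, averaging over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def top_edges(expr, tfs, targets, score_fn, max_network_edges=50):
    """Score every TF-target pair and keep the highest-scoring edges."""
    edges = [(tf, tg, score_fn(expr[tf], expr[tg]))
             for tf in tfs for tg in targets if tf != tg]
    edges.sort(key=lambda e: e[2], reverse=True)
    return edges[:max_network_edges]
```

Switching methods is then a matter of passing mutual_information, or lambda a, b: abs(spearman(a, b)) for the fallback, as score_fn.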
Boolean Simulation
The inferred network is converted to a Boolean dynamical model. Each gene is represented as a binary variable (ON = 1, OFF = 0). At each simulation step, the state of every gene is updated synchronously according to the regulatory logic:
$$x_j(t+1) = \begin{cases} 1 & \text{if } \sum_i w_{ij}\, x_i(t) > 0 \\ 0 & \text{otherwise} \end{cases}$$
Where $w_{ij}$ is the regulatory edge weight from TF $i$ to gene $j$ (positive for activation, negative for repression, using the sign of the inferred importance). The simulation starts from a random binary initial state and is iterated until it converges (reaches a state that maps to itself) or hits a limit of 50 steps. Each terminal state is recorded as an attractor. The simulation is repeated from 200 different random initial states, and the basin of attraction of each attractor is estimated as the fraction of initial states that converge to it.
Perturbation Analysis
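A minimal sketch of the Boolean simulation, including the knockout and overexpression pinning described under Perturbation Analysis, is shown below. It assumes a strict threshold of zero (so a gene with no incoming edges switches OFF), stores edges as a dictionary mapping (regulator, target) pairs to signed weights, and measures the perturbation effect as the total variation distance between basin distributions. That metric, the seed argument, and the function names are assumptions, not RevGRN's documented behavior.

```python
import random
from collections import Counter


def update(state, genes, weights, pinned=None):
    """One synchronous step: gene j is ON iff sum_i w_ij * x_i(t) > 0.

    Genes in pinned (gene -> 0 or 1) are held at that value regardless of inputs.
    A gene with no incoming edges has a total of 0 and therefore switches OFF.
    """
    idx = {g: k for k, g in enumerate(genes)}
    totals = [0.0] * len(genes)
    for (reg, tgt), w in weights.items():
        totals[idx[tgt]] += w * state[idx[reg]]
    new = [1 if t > 0 else 0 for t in totals]
    for gene, value in (pinned or {}).items():
        new[idx[gene]] = value  # knockout -> 0, overexpression -> 1
    return tuple(new)


def basins(genes, weights, pinned=None, n_init=200, max_steps=50, seed=0):
    """Fraction of random initial states ending in each terminal state."""
    pinned = pinned or {}
    idx = {g: k for k, g in enumerate(genes)}
    rng = random.Random(seed)
    endpoints = Counter()
    for _ in range(n_init):
        start = [rng.randint(0, 1) for _ in genes]
        for gene, value in pinned.items():
            start[idx[gene]] = value
        state = tuple(start)
        for _ in range(max_steps):
            nxt = update(state, genes, weights, pinned)
            if nxt == state:  # fixed point: the state maps to itself
                break
            state = nxt
        endpoints[state] += 1  # fixed point, or last state at the step limit
    return {s: c / n_init for s, c in endpoints.items()}


def basin_shift(baseline, perturbed):
    """Total variation distance between two attractor distributions (assumed metric)."""
    keys = set(baseline) | set(perturbed)
    return 0.5 * sum(abs(baseline.get(k, 0.0) - perturbed.get(k, 0.0)) for k in keys)
```

For example, in a two-gene toggle switch where each gene activates itself (weight 1) and represses the other (weight -2), the baseline has three fixed points, (1, 0), (0, 1), and (0, 0), while pinning the first gene to 1 sends every initial state to (1, 0).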
In knockout simulations, the target gene is pinned to state 0 (OFF) throughout the simulation regardless of its regulatory inputs; in overexpression simulations, it is pinned to state 1 (ON). The resulting attractor distribution is compared with the baseline to predict how the perturbation redirects network dynamics. Genes whose knockout or overexpression produces the largest shift in attractor basin fractions are the candidate regulatory hubs most relevant to phenotypic control.
Running the Engine
Inputs
| Parameter | Default | Description |
|---|---|---|
| Expression matrix | Required | CSV, TSV, H5AD, or LOOM (cells × genes; orientation is auto-detected) |
| Data type | scrna | scrna, bulk, or spatial |
| Organism | human | human, mouse, or rat for TF annotation |
| Min genes per cell | 200 | QC threshold for sparse cell removal |
| Min cells per gene | 3 | QC threshold for sparse gene removal |
| Highly variable genes | 2000 | Number of HVGs for network inference |
| Max network edges | 50 | Top TF-target edges to retain |
| Inference method | grnboost2 | grnboost2, mutual_info, or correlation |
| Simulation type | attractor_landscape | attractor_landscape, gene_knockout, or gene_overexpression |
| Knockout/overexpression gene | None | Gene symbol to perturb |
Outputs
- Network: Ranked list of TF-target regulatory edges with importance scores
- TF hub ranking: Top transcription factors by total regulatory influence (sum of outgoing edge importance)
- Attractors: Stable cellular states with active gene sets and basin of attraction fractions
- Perturbation comparison: Baseline vs. perturbed attractor distribution (if perturbation mode selected)
- Network visualization: Interactive force-directed graph with TFs as blue diamonds and target genes as gray nodes, edge width proportional to importance
- QC summary: Cell counts, gene counts, HVG count, network edge count, TF count, and attractor count
- Downloads: Network CSV (TF, target, importance), attractors CSV (active genes and basin fraction), full results JSON
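The TF hub ranking and the network CSV export can be sketched as follows, assuming edges are held as (TF, target, importance) tuples. The column headers follow the Downloads description above; the function names and the top_n default are hypothetical.

```python
import csv
import io
from collections import defaultdict


def tf_hub_ranking(edges, top_n=10):
    """Rank TFs by total outgoing importance. edges: (tf, target, importance) tuples."""
    totals = defaultdict(float)
    for tf, _target, importance in edges:
        totals[tf] += importance
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


def network_csv(edges):
    """Serialize edges with TF, target, and importance columns."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["TF", "target", "importance"])
    writer.writerows(edges)
    return buf.getvalue()
```

Summing outgoing importance favors TFs with many moderately weighted edges as well as those with a few strong ones, which matches the "total regulatory influence" definition in the list above.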

