RevMethyl - Revilico

Why Use This Engine?

In the documentation below, we will use Revilico’s RevMethyl engine to compute epigenetic biological age from DNA methylation beta values using four validated published clocks. This engine enables researchers and clinicians to measure how fast a sample is aging biologically, independent of its chronological age, and to quantify the degree of age acceleration or deceleration relative to population norms.

Background

DNA methylation (DNAm) is a heritable epigenetic modification in which a methyl group is added to the cytosine of a CpG dinucleotide. Methylation levels are quantified as beta values, representing the proportion of cells in a sample in which a given CpG site is methylated, ranging continuously from 0 (fully unmethylated) to 1 (fully methylated). DNAm patterns change predictably with age across a large fraction of the genome, and this property was exploited to build regression models, known as epigenetic clocks, that predict chronological or biological age from a weighted linear combination of CpG site beta values. Epigenetic clocks are among the most accurate and reproducible biomarkers of biological aging available. They capture aspects of aging that are not reflected by chronological age alone, including accumulated cellular damage, lifestyle exposures, disease burden, and mortality risk. RevMethyl implements four widely used and validated clocks: the Horvath 2013 pan-tissue clock, the Hannum 2013 blood clock, the PhenoAge 2018 phenotypic age clock, and the GrimAge 2019 mortality clock. Each clock was trained on different biological outcomes and covers a distinct number of CpG probes, and their results together provide a multi-dimensional picture of epigenetic aging.

Input and Data Loading

RevMethyl accepts a two-column CSV or TSV file where the first column contains Illumina 450k or EPIC array probe identifiers (e.g. cg16867657) and the second column contains the corresponding beta value for each probe. Gzip-compressed files (.csv.gz, .tsv.gz) are also accepted. A header row is auto-detected and skipped if the first cell does not begin with a “cg” prefix. Beta values are clipped to the valid range [0, 1] prior to computation, and any out-of-range values are flagged in the QC statistics. Alternatively, raw IDAT files (Red and Green channel) can be uploaded directly. The engine processes the IDAT pair through array annotation to derive probe-level beta values before passing them to the clock computation step. The array type (450k, EPIC, or EPICv2) can be specified or detected automatically.
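The two-column file handling described above can be sketched roughly as follows. This is a minimal illustration, not RevMethyl's actual loader: the function name `load_betas`, the choice to infer the delimiter from the file name, and the return shape are all assumptions made for the example.

```python
import csv
import gzip
from pathlib import Path


def load_betas(path):
    """Read a two-column probe/beta file into a dict, clipping betas to [0, 1].

    Returns (betas, n_out_of_range). Hypothetical helper for illustration only.
    """
    path = Path(path)
    name = path.name.lower()
    # Assumption: the delimiter is inferred from the file name.
    delimiter = "\t" if name.endswith((".tsv", ".tsv.gz")) else ","
    # Gzip-compressed files are opened transparently in text mode.
    opener = gzip.open if name.endswith(".gz") else open

    betas = {}
    n_out_of_range = 0
    with opener(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        for row_number, row in enumerate(reader):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            probe_id = row[0].strip()
            # Header auto-detection: skip the first row if it is not a "cg" probe.
            if row_number == 0 and not probe_id.startswith("cg"):
                continue
            value = float(row[1])
            if value < 0.0 or value > 1.0:
                n_out_of_range += 1  # counted so it can be flagged in QC
                value = min(1.0, max(0.0, value))
            betas[probe_id] = value
    return betas, n_out_of_range
```

Calling `load_betas("sample.csv.gz")` on a file of this shape would return the clipped beta values keyed by probe ID, together with the number of values that had to be clipped.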

Quality Control

Before computing any clock, the engine collects the following QC statistics from the input beta matrix: total number of probes present, mean beta value across all probes, standard deviation of beta values, and minimum and maximum observed beta values. These statistics serve as a data sanity check. For each clock, the engine also reports the number of probes matched from the clock definition, the number of probes that were absent in the input file and required imputation, and the resulting probe coverage fraction.
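As a rough sketch of these checks, the statistics could be computed as below. The function names and dictionary keys are hypothetical, the input is assumed to be a non-empty mapping of probe ID to beta value, and the use of the population standard deviation is an assumption (the text above does not say which estimator the engine uses).

```python
from statistics import fmean, pstdev


def qc_summary(betas):
    """Global QC statistics for a non-empty {probe_id: beta} mapping."""
    values = list(betas.values())
    return {
        "n_probes": len(values),
        "mean_beta": fmean(values),
        "sd_beta": pstdev(values),  # assumption: population standard deviation
        "min_beta": min(values),
        "max_beta": max(values),
    }


def clock_coverage(betas, clock_probes):
    """Per-clock probe matching statistics.

    clock_probes is the collection of probe IDs that a clock defines.
    """
    total = len(clock_probes)
    matched = sum(1 for probe in clock_probes if probe in betas)
    return {
        "matched": matched,
        "imputed": total - matched,  # absent probes that need imputation
        "coverage": matched / total if total else 0.0,
    }
```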

Epigenetic Clock Computation

All four clocks share the same underlying computation structure. Each clock defines a set of CpG probes with associated regression coefficients and a scalar intercept. The raw clock score is computed as:

\text{score} = \beta_0 + \sum_{i=1}^{n} w_i \cdot x_i

where $\beta_0$ is the clock-specific intercept, $w_i$ is the regression coefficient for probe $i$, and $x_i$ is the observed beta value for probe $i$. After this linear combination, each clock either applies a transformation or returns the raw score directly as DNAm age, depending on the clock’s original training procedure. The final age estimate is clipped to the biologically plausible range of [0, 120] years.

Missing Probe Imputation

Not all input datasets will contain every probe defined by a given clock. When a clock probe is absent from the input file, the engine imputes its beta value using the mean beta of all probes that were successfully matched for that clock. This simple fallback ensures that the score degrades gracefully with decreasing array coverage rather than failing entirely. Each missing probe contributes $w_i \cdot \bar{x}_{\text{matched}}$ to the raw score, where $\bar{x}_{\text{matched}}$ is the mean of the matched probe beta values.

Horvath 2013

The Horvath clock is a pan-tissue clock trained on 51 different tissue and cell types using penalized regression (elastic net) with 353 CpG probes. It is the most broadly applicable of the four clocks and was designed to predict age consistently regardless of tissue origin. The clock uses an intercept of 0.6955. Because the clock was trained with a non-linear age transformation applied to the outcome variable (compressing the age scale in childhood), the raw score must be inverted through a piecewise anti-transformation before reporting DNAm age:

\text{DNAm age} = \begin{cases} 21 \cdot e^{\text{score}} - 1 & \text{if score} < 0 \\ 21 \cdot (\text{score} + 1) - 1 & \text{if score} \geq 0 \end{cases}
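The piecewise mapping above translates directly into code. The following is a small sketch with a hypothetical function name, not the engine's own implementation:

```python
import math


def horvath_anti_transform(score):
    """Map a raw Horvath clock score to DNAm age in years."""
    if score < 0:
        # Exponential branch for scores below zero.
        return 21 * math.exp(score) - 1
    # Linear branch for scores at or above zero.
    return 21 * (score + 1) - 1
```

The two branches meet at a score of 0, where both evaluate to 20 years, so the mapping is continuous.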

The negative branch uses an exponential to expand compressed childhood ages, while the non-negative branch uses a linear mapping for adult ages. The result is a DNAm age estimate in years.

Hannum 2013

The Hannum clock is a blood-specific linear clock trained on whole-blood methylation data with chronological age as the outcome. It uses 71 CpG probes selected by elastic net regression, has an intercept of 0.0, and returns the raw score directly as DNAm age without any transformation. Its small probe set makes it the fastest clock to compute, and because it is blood-specific it is most accurate when the input sample is blood-derived.

PhenoAge 2018

PhenoAge uses 513 CpG probes and was trained not on chronological age directly but on a composite phenotypic age score. That score is derived from clinical biomarkers including albumin, creatinine, glucose, C-reactive protein, lymphocyte percentage, mean cell volume, red cell distribution width, alkaline phosphatase, and white blood cell count, combined with mortality risk modeling. The result is a DNAm age estimate that reflects morbidity and mortality risk more strongly than calendar age. PhenoAge uses an intercept of 60.664 and returns the raw score directly. A higher PhenoAge relative to chronological age indicates elevated biological aging associated with chronic disease burden.

GrimAge 2019

GrimAge is the most complex of the four clocks, using 1,030 CpG probes. Rather than being trained directly on age or phenotypic age, GrimAge is a composite of DNAm-based surrogate scores for several plasma proteins and smoking pack-years. These surrogates were each individually trained using elastic net regression, and their weighted combination was then optimized to predict time-to-death from all causes. GrimAge uses an intercept of 25.0 and returns the raw composite score directly as DNAm age. Among the four clocks, GrimAge is the strongest predictor of mortality and disease onset, making it the most clinically informative metric for assessing biological age acceleration.
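The shared computation described in this section (linear score, mean imputation of missing probes, optional transformation, and clipping) can be sketched as follows. The function names are hypothetical, the probe IDs and weights in the usage note are invented toy values rather than real clock coefficients, and raising an error when no clock probe is matched is an assumption, since that case is not described here.

```python
def clock_score(betas, intercept, weights):
    """Raw clock score: intercept plus the weighted sum of beta values.

    betas   -- {probe_id: beta} from the input file
    weights -- {probe_id: coefficient} defined by the clock
    Probes missing from betas are imputed with the mean beta of the
    probes that were matched for this clock.
    """
    matched = [betas[probe] for probe in weights if probe in betas]
    if not matched:
        # Assumption: with nothing matched there is no mean to impute from.
        raise ValueError("no clock probes found in the input")
    fill = sum(matched) / len(matched)
    return intercept + sum(
        coefficient * betas.get(probe, fill)
        for probe, coefficient in weights.items()
    )


def dnam_age(betas, intercept, weights, transform=None):
    """Apply the optional clock-specific transform, then clip to [0, 120]."""
    score = clock_score(betas, intercept, weights)
    age = transform(score) if transform is not None else score
    return min(120.0, max(0.0, age))
```

For example, with toy weights `{"cg_a": 10.0, "cg_b": -4.0, "cg_c": 2.0}`, an intercept of 30.0, and input betas `{"cg_a": 0.5, "cg_b": 0.25}`, the missing probe `cg_c` is imputed with the matched mean 0.375, giving a score of 30 + 5 - 1 + 0.75 = 34.75.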

Age Acceleration

If the user provides the sample’s chronological age, the engine computes age acceleration for each clock:

\text{Age Acceleration} = \text{DNAm Age} - \text{Chronological Age}

Positive age acceleration indicates that the sample is epigenetically older than its calendar age, reflecting faster biological aging. Negative age acceleration indicates slower biological aging. Age acceleration is the primary metric used to identify samples with atypical aging trajectories relative to population norms.

Running the Engine

Inputs

Parameter	Required	Description
`methylation_file`	Yes	Two-column CSV or TSV with probe IDs and beta values
`chronological_age`	No	Decimal age of the sample for acceleration calculation
`clocks`	No	Comma-separated subset of clocks to run (default: all four)

Outputs

Upon completion, the engine returns the following for each clock:

DNAm age: Estimated biological age in years
Age acceleration: DNAm age minus chronological age (if chronological age provided)
Probe coverage: Fraction of clock-defined probes matched in the input file
Matched probes: Count of probes found in the input
Imputed probes: Count of probes absent from the input and imputed with mean beta

Summary statistics across all clocks are also returned: mean DNAm age, mean age acceleration, mean probe coverage, and number of clocks computed. QC metrics include total probes in the input file, mean and standard deviation of beta values, and minimum and maximum beta values observed.
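To make the relationship between the per-clock outputs and the summary statistics concrete, here is a minimal sketch of how they could be assembled. The function name, argument names, and dictionary keys are hypothetical and do not describe the engine's actual response schema; the ages in the usage note are made-up values.

```python
from statistics import fmean


def summarize(dnam_ages, coverages, chronological_age=None):
    """Combine per-clock results into per-clock records and a summary.

    dnam_ages -- {clock_name: DNAm age in years}
    coverages -- {clock_name: probe coverage fraction}
    """
    per_clock = {}
    for name, age in dnam_ages.items():
        per_clock[name] = {
            "dnam_age": age,
            # Age acceleration is only defined when chronological age is given.
            "age_acceleration": (
                None if chronological_age is None else age - chronological_age
            ),
            "probe_coverage": coverages[name],
        }

    summary = {
        "mean_dnam_age": fmean(dnam_ages.values()),
        "mean_age_acceleration": (
            None
            if chronological_age is None
            else fmean(r["age_acceleration"] for r in per_clock.values())
        ),
        "mean_probe_coverage": fmean(coverages[name] for name in dnam_ages),
        "n_clocks": len(per_clock),
    }
    return per_clock, summary
```

With illustrative DNAm ages of 52.0, 50.0, 56.0, and 58.0 for the four clocks and a chronological age of 50.0, the per-clock age accelerations are 2.0, 0.0, 6.0, and 8.0, the mean DNAm age is 54.0, and the mean age acceleration is 4.0.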
