Why Use RevScan?

RevScan converts images of chemical structures into machine-readable SMILES strings using OSRA (Optical Structure Recognition Application). This is useful when extracting structures from publication figures, patent images, scanned laboratory notebooks, or any other source where the chemical information exists as a raster image rather than a digital structure file. The resulting SMILES can be used immediately as input to any Revilico engine.

[Figure: RevScan Interface]

Background

Chemical structure diagrams in the scientific literature are typically embedded as images in PDFs or scanned documents. Re-entering these structures manually into drawing tools is time-consuming and error-prone. Optical structure recognition is a computer vision approach that parses the geometry, bond angles, atom labels, and ring systems in a structure diagram image and reconstructs the corresponding connection table, from which a SMILES string is generated. RevScan uses OSRA (Optical Structure Recognition Application), an open-source tool developed at NCI/NIH and designed for processing 2D chemical structure depictions from publications and patents.
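
To make the term "connection table" concrete, here is a minimal sketch in Python. It is not OSRA's or RevScan's actual code. It stores a molecule as a list of atom symbols plus a list of bonds and walks the structure depth-first to write a SMILES string. It assumes a connected, acyclic molecule and does not produce canonical output; the names `to_smiles`, `atoms`, and `bonds` are illustrative only.

```python
from collections import defaultdict

# Bond order -> SMILES bond symbol (single bonds are implicit).
BOND_SYMBOL = {1: "", 2: "=", 3: "#"}


def to_smiles(atoms, bonds, start=0):
    """Write a (non-canonical) SMILES string for an acyclic connection table.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j, order) tuples indexing into atoms
    """
    # A connected acyclic molecule has exactly one bond fewer than atoms.
    # This sketch does not handle rings, so reject anything else.
    if len(bonds) != len(atoms) - 1:
        raise ValueError("sketch only supports connected, acyclic molecules")

    neighbors = defaultdict(list)
    for i, j, order in bonds:
        neighbors[i].append((j, order))
        neighbors[j].append((i, order))

    visited = set()

    def walk(idx):
        visited.add(idx)
        out = atoms[idx]
        children = [(n, o) for n, o in neighbors[idx] if n not in visited]
        for pos, (n, order) in enumerate(children):
            piece = BOND_SYMBOL[order] + walk(n)
            # Every child except the last is written as a parenthesised branch.
            out += piece if pos == len(children) - 1 else "(" + piece + ")"
        return out

    return walk(start)


# Ethanol and acetic acid as hand-written connection tables.
ethanol = to_smiles(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
acetic_acid = to_smiles(["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
```

A real recognition pipeline also has to handle rings, aromaticity, stereochemistry, charges, and canonical atom ordering, all of which this sketch leaves out.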

How It Works

The user uploads an image containing one or more 2D chemical structure drawings. RevScan passes the image through the OSRA recognition pipeline, which identifies atom positions, bond types (single, double, triple, aromatic), stereocenters, ring systems, and atom labels, including heteroatoms and charges. The recognized connection table is then converted to a canonical SMILES string.

Tips for best results:
  • Use clean, high-resolution images with clear bond lines and legible atom labels
  • Images with dark structures on white backgrounds produce the most accurate recognition
  • Avoid images with text overlapping structural elements
  • Simple structures are recognized more accurately than complex polycyclic or organometallic structures
  • Always verify the output SMILES against the source image, because recognition can introduce errors, especially for complex or low-quality inputs
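
The verification step in the last tip can be partly automated. As a minimal sketch using only the Python standard library, the hypothetical helper `smiles_looks_well_formed` below checks that parentheses, square brackets, and ring-closure labels in an output SMILES are balanced. It catches truncated or garbled strings but does not check valence or chemistry, so it complements a visual comparison with the source image rather than replacing it.

```python
def smiles_looks_well_formed(smiles: str) -> bool:
    """Return True if branches, bracket atoms, and ring closures are balanced."""
    if not smiles:
        return False
    depth = 0            # number of currently open '(' branches
    in_bracket = False   # True while inside a [...] bracket atom
    open_rings = set()   # ring-closure labels seen an odd number of times
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if in_bracket:
            # Digits inside brackets are isotopes, H counts, or charges,
            # not ring closures, so they are skipped here.
            if ch == "[":
                return False
            if ch == "]":
                in_bracket = False
        elif ch == "[":
            in_bracket = True
        elif ch == "]":
            return False
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch == "%":
            # Two-digit ring-closure label, e.g. %12.
            label = smiles[i + 1:i + 3]
            if len(label) != 2 or not label.isdigit():
                return False
            open_rings ^= {label}
            i += 2
        elif ch.isdigit():
            # Single-digit ring closure: first sight opens, second closes.
            open_rings ^= {ch}
        i += 1
    return depth == 0 and not in_bracket and not open_rings
```

A string that fails this check is a strong hint that recognition went wrong. A string that passes may still describe the wrong molecule, so the comparison with the source image remains necessary.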

Running the Engine

Inputs

Input | Description
Chemical structure image | JPEG, PNG, TIFF, or GIF file containing one or more 2D structure drawings

Files can also be dragged and dropped directly from the Data Engineering environment.
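
Before uploading, a file can be checked against the four accepted formats. The sketch below is an illustration and not part of RevScan: it identifies the format from the file's leading magic bytes using only the Python standard library. The helper names `detect_image_format` and `is_accepted_file` are hypothetical.

```python
from typing import Optional

# Leading byte signatures of the formats listed as accepted inputs.
SIGNATURES = {
    "JPEG": (b"\xff\xd8\xff",),
    "PNG": (b"\x89PNG\r\n\x1a\n",),
    "GIF": (b"GIF87a", b"GIF89a"),
    "TIFF": (b"II*\x00", b"MM\x00*"),  # little-endian and big-endian
}


def detect_image_format(header: bytes) -> Optional[str]:
    """Return 'JPEG', 'PNG', 'GIF', or 'TIFF' based on magic bytes, else None."""
    for name, prefixes in SIGNATURES.items():
        if header.startswith(prefixes):
            return name
    return None


def is_accepted_file(path: str) -> bool:
    """Read the first bytes of a file and check them against SIGNATURES."""
    with open(path, "rb") as fh:
        return detect_image_format(fh.read(8)) is not None
```

Checking magic bytes rather than the file extension avoids being misled by a renamed file, for example a PDF saved with a `.png` suffix.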

Outputs

  • SMILES string: Canonical SMILES representation of the recognized chemical structure
  • The output SMILES can be copied directly into any Revilico engine input field or saved to a review session