Why Use RevScan?
RevScan converts images of chemical structures into machine-readable SMILES strings using optical structure recognition, performed by the open-source OSRA tool. This is useful when extracting structures from publication figures, patent images, scanned laboratory notebooks, or any other source where the chemical information exists as a raster image rather than a digital structure file. The resulting SMILES can be used immediately as input to any Revilico engine.
Background
Chemical structure diagrams in the scientific literature are typically embedded as images in PDFs or scanned documents. Re-entering these structures manually in a drawing tool is time-consuming and error-prone. Optical structure recognition is a computer vision approach that parses the geometry, bond angles, atom labels, and ring systems in a structure diagram image, reconstructs the corresponding connection table, and produces a valid SMILES string from it. RevScan uses OSRA (Optical Structure Recognition Application), an open-source tool developed at NCI/NIH that is optimized for processing 2D chemical structure depictions from publications and patents.
How It Works
The user uploads an image containing one or more 2D chemical structure drawings. RevScan passes the image through the OSRA recognition pipeline, which identifies atom positions, bond types (single, double, triple, aromatic), stereocenters, ring systems, and atom labels, including heteroatoms and charges. The recognized connection table is then converted to a canonical SMILES string.
Tips for best results:
- Use clean, high-resolution images with clear bond lines and legible atom labels
- Images with dark structures on white backgrounds produce the most accurate recognition
- Avoid images with text overlapping structural elements
- Simple structures are recognized more accurately than complex polycyclic or organometallic structures
- Always verify the output SMILES against the source image, because recognition errors become more likely with complex or low-quality inputs
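Before comparing a recognized SMILES against the source image by eye, a cheap syntax pre-check can catch strings that were obviously truncated or garbled. The following is a minimal sketch under stated assumptions: the helper name `smiles_syntax_issues` is hypothetical and not part of RevScan or OSRA, and it only checks that brackets, branches, and ring-closure labels pair up. It does not check valence, aromaticity, or stereochemistry, so a string that passes can still be the wrong molecule.

```python
from typing import List

DIGITS = "0123456789"


def smiles_syntax_issues(smiles: str) -> List[str]:
    """Return syntax problems found in a SMILES string (empty list if none).

    Hypothetical helper: checks only bracket, branch, and ring-closure
    pairing. It is not a chemistry validator.
    """
    if not smiles or any(ch.isspace() for ch in smiles):
        return ["empty string or embedded whitespace"]

    issues: List[str] = []
    depth = 0            # currently open branch parentheses
    in_bracket = False   # True while inside a [...] atom
    open_rings = set()   # ring-closure numbers opened but not yet closed

    def toggle(ring: int) -> None:
        # First occurrence opens a ring bond, second occurrence closes it.
        if ring in open_rings:
            open_rings.remove(ring)
        else:
            open_rings.add(ring)

    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if in_bracket:
            # Digits inside brackets are isotopes, H counts, or charges,
            # so they are not treated as ring closures.
            if ch == "]":
                in_bracket = False
            elif ch == "[":
                issues.append("nested '['")
        elif ch == "[":
            in_bracket = True
        elif ch == "]":
            issues.append("unmatched ']'")
        elif ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                issues.append("unmatched ')'")
            else:
                depth -= 1
        elif ch == "%":
            # Two-digit ring-closure label, for example %12.
            label = smiles[i + 1:i + 3]
            if len(label) == 2 and all(c in DIGITS for c in label):
                toggle(int(label))
                i += 2
            else:
                issues.append("malformed '%' ring label")
        elif ch in DIGITS:
            toggle(int(ch))
        i += 1

    if in_bracket:
        issues.append("unclosed '['")
    if depth > 0:
        issues.append("unclosed '('")
    if open_rings:
        issues.append(f"unclosed ring labels: {sorted(open_rings)}")
    return issues


if __name__ == "__main__":
    for candidate in ["CC(=O)Oc1ccccc1C(=O)O", "c1ccccc", "CC(=O"]:
        print(candidate, smiles_syntax_issues(candidate))
```

A string that returns an empty list still needs the visual comparison described above. The check only filters out outputs that cannot be a complete SMILES string.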
Running the Engine
Inputs
| Input | Description |
|---|---|
| Chemical structure image | JPEG, PNG, TIFF, or GIF file containing one or more 2D structure drawings |
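A file can be checked locally against the four listed formats before upload. The sketch below is an illustration only: the names `sniff_image_format` and `is_supported_upload` are hypothetical, and this page does not state whether RevScan itself validates uploads this way. It inspects the leading bytes of the file rather than trusting the extension, and covers classic TIFF but not BigTIFF.

```python
from typing import Optional

# Leading byte signatures for the four formats listed in the Inputs table.
_SIGNATURES = (
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),  # little-endian classic TIFF
    (b"MM\x00*", "TIFF"),  # big-endian classic TIFF
)


def sniff_image_format(header: bytes) -> Optional[str]:
    """Return 'JPEG', 'PNG', 'GIF', or 'TIFF' based on leading bytes, else None."""
    for signature, name in _SIGNATURES:
        if header.startswith(signature):
            return name
    return None


def is_supported_upload(path: str) -> bool:
    """Read the first 8 bytes of a file and check them against the list above."""
    with open(path, "rb") as handle:
        return sniff_image_format(handle.read(8)) is not None
```

Calling `is_supported_upload("figure_3.png")` on a local file (the file name here is a placeholder) returns `True` only when the content matches one of the four formats, which catches cases such as a PDF or WebP image saved with a `.png` extension.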
Outputs
- SMILES string: Canonical SMILES representation of the recognized chemical structure
- The output SMILES can be copied directly into any Revilico engine input field or saved to a review session

