sORFdb

sORFdb is a dedicated database for short open reading frames (sORFs) and small proteins in bacteria. It integrates sequences from annotated genomes, curated protein resources, and experimentally supported small proteins, applies strict quality filtering, and offers an interactive web interface to search, browse, and download high-confidence small-protein data.

Key Benefits
  • Comprehensive coverage of bacterial small proteins, built from multiple high-quality data sources.
  • Clustered small-protein families with alignments and HMMs for consistent annotation.
  • Rich metadata and functional hints, including predicted properties and domain signatures.
  • Flexible search tools (text, taxonomy, filters, sequence similarity) with fully downloadable datasets.
Features
  • High-confidence sORFs and small proteins (≤100 aa), filtered to remove artefacts.
  • Clustering into small-protein families, including HMMs and family-level metadata.
  • Functional and domain predictions where available, plus physicochemical properties.
  • Browsing by taxonomy, annotation, sequence features, or similarity search.
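
The ≤100 aa length cutoff named in the feature list can be illustrated with a short Python sketch. This is not sORFdb's own filtering pipeline, which applies further quality criteria not described here. It only shows how a length threshold could be applied to protein sequences in FASTA format. The function names and the assumption that input arrives as FASTA lines are hypothetical and chosen for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Length threshold in amino acids, taken from the feature list above.
MAX_SMALL_PROTEIN_LENGTH = 100


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_small_proteins(
    records: Iterable[Tuple[str, str]],
    max_length: int = MAX_SMALL_PROTEIN_LENGTH,
) -> Dict[str, str]:
    """Keep non-empty sequences of at most max_length residues.

    A trailing '*' (stop symbol) is stripped before measuring length.
    This is a hypothetical helper, not part of any sORFdb tooling.
    """
    kept = {}
    for header, seq in records:
        seq = seq.rstrip("*")
        if 0 < len(seq) <= max_length:
            kept[header] = seq
    return kept
```

To use it on a local file, open the file and pass the file object to `parse_fasta`, then pass the result to `filter_small_proteins`. Length alone does not make a sequence a high-confidence small protein; it is only the first, simplest criterion.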

Applications
  • Comparative genomics across bacterial taxa.
  • Enhancing genome re-annotation workflows.
  • Discovery and annotation of previously overlooked small proteins.
  • Prioritizing small proteins for experimental validation.
 
Intended Use

Ideal for microbiome researchers, genome annotators, and molecular microbiologists who study bacterial coding potential and want reliable, structured data on small proteins.

 

Website

No de.NBI funding