About

Genomic Distribution of structural Superfamilies (GenDiS) is a repository of sequence homologues of SCOP structural superfamily members, identified in sequence space (NR, NCBI).

This version of GenDiS is in direct correspondence with SCOPe 2.08, PASS2.4 and PASS2.7, NR, and UniProt. We used DELTA-BLAST to identify homologues of over 15,000 SCOP members from 2060 superfamilies. Hits were validated using the structure-based sequence alignments of superfamilies provided in PASS2.4 and PASS2.7; single-query HMMs generated from SCOPe 2.08 members were also used. Sequence searches identified 151,622,506 hits (30%) among the 504,094,943 sequences in NR (NCBI), of which 116,393,303 sequences (23% of NR) passed the validation.

The database reports the analysis of the hits at the superfamily, fold, and class levels, together with their domain architecture (DA) and taxonomic occurrence. We computed the DA of the hits at the Pfam level, identified the correspondence between the Pfam and SCOP domain definitions of the hits, and classified all Pfam and SCOP DAs for the hits of each superfamily.
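As a rough illustration of what classifying hits by domain architecture can look like, the sketch below orders each hit's domains from N- to C-terminus, joins their names into a DA string, and counts how many hits share each DA. The sequence identifiers, the domain names `DomA` and `DomB`, the `~` separator, and the function names are hypothetical and chosen for illustration only; this is not the GenDiS pipeline itself.

```python
from collections import Counter

def architecture(domains):
    """Build a DA string from (start_position, domain_name) pairs.

    Domains are ordered by start position (N- to C-terminus) and joined
    with '~'. The separator is an arbitrary choice for this sketch.
    """
    return "~".join(name for _, name in sorted(domains))

def classify(hits):
    """Count how many hits share each domain architecture."""
    return Counter(architecture(domains) for domains in hits.values())

# Hypothetical hits: sequence id -> list of (start, domain name)
hits = {
    "seq1": [(120, "DomB"), (5, "DomA")],
    "seq2": [(10, "DomA"), (150, "DomB")],
    "seq3": [(8, "DomA")],
}

counts = classify(hits)
for da, n in counts.most_common():
    print(da, n)
```

Here `seq1` and `seq2` fall into the same architecture even though their domains are listed in a different order, because the DA is defined by position along the sequence rather than by input order.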

Users may navigate the database at different levels of the SCOP hierarchy (class, fold, and superfamily), by code or by description.
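SCOP codes (sccs strings) encode these levels directly: in a code such as `c.37.1`, the letter is the class, the first number the fold, and the second number the superfamily. A minimal sketch of splitting a code into the three navigable levels follows; the `ScopNode` type and `parse_sccs` function are hypothetical names, not part of the GenDiS interface.

```python
from typing import NamedTuple

class ScopNode(NamedTuple):
    scop_class: str   # e.g. "c"
    fold: str         # e.g. "c.37"
    superfamily: str  # e.g. "c.37.1"

def parse_sccs(sccs: str) -> ScopNode:
    """Split an sccs code into class, fold, and superfamily codes.

    A trailing family number (as in 'c.37.1.8') is accepted and ignored,
    since navigation here stops at the superfamily level.
    """
    parts = sccs.strip().split(".")
    if len(parts) < 3:
        raise ValueError(f"expected at least class.fold.superfamily, got {sccs!r}")
    cls, fold_no, sf_no = parts[:3]
    return ScopNode(cls, f"{cls}.{fold_no}", f"{cls}.{fold_no}.{sf_no}")

node = parse_sccs("c.37.1.8")
print(node.scop_class, node.fold, node.superfamily)
```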

If you find our database or methodology useful, please cite us:

Please contact us for further clarifications and for running the pipeline on multiple sequences.

Statistics

| Database | Level | Total | Covered | % Coverage |
|---|---|---|---|---|
| Pfam | Families | 16230 | 9853 | 60.71 |
| SCOP v1.75 | Superfamilies | 1962 | 1961 | 99.94 |
| SCOPe v2.06 | Superfamilies | 2008 | 1961 | 97.66 |
| NR db | Sequences | 504,094,943 | 116,393,303 | 30.02 |
| PDB | Protein structures | 130,536 | 81,026 | 62.07 |
| UniProt | Sequences | 28,827,044 | 9,730,213 | 29.62 |
| Taxonomy db, NCBI | Organisms | 164,890 | 67,377 | 40.86 |
| SwissProt | Sequences | 557,275 | 298,699 | 53.60 |
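
The coverage column can be checked by recomputing 100 × Covered / Total for each row. The sketch below does this with the figures listed above and reports rows whose listed percentage differs from the recomputed one by more than a small tolerance. The tolerance of 0.02 and the function names are assumptions made for this example. With these inputs, the NR and UniProt rows are flagged (recomputed values of about 23.09 and 33.75); the page does not state which denominator was used for the listed values.

```python
# (total, covered, listed % coverage), copied from the statistics table
rows = {
    "Pfam": (16230, 9853, 60.71),
    "SCOP v1.75": (1962, 1961, 99.94),
    "SCOPe v2.06": (2008, 1961, 97.66),
    "NR db": (504_094_943, 116_393_303, 30.02),
    "PDB": (130_536, 81_026, 62.07),
    "UniProt": (28_827_044, 9_730_213, 29.62),
    "Taxonomy db, NCBI": (164_890, 67_377, 40.86),
    "SwissProt": (557_275, 298_699, 53.60),
}

def coverage(total, covered):
    """Percentage of entries covered."""
    return 100.0 * covered / total

def mismatches(rows, tol=0.02):
    """Names of rows whose listed coverage differs from covered/total by more than tol."""
    return [
        name
        for name, (total, covered, listed) in rows.items()
        if abs(coverage(total, covered) - listed) > tol
    ]

for name, (total, covered, listed) in rows.items():
    print(f"{name}: listed {listed:.2f}, recomputed {coverage(total, covered):.2f}")
print("rows to re-check:", mismatches(rows))
```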