About

GenDiS (Genomic Distribution of structural Superfamilies) is a repository of sequence homologues of SCOP structural superfamily members, identified in sequence space (the NCBI non-redundant database, NR).
This version of GenDiS corresponds directly to SCOPe 2.08, PASS2.4 and PASS2.7, NR, and UniProt. We used DELTA-BLAST to identify homologues of over 15,000 SCOP members from 2060 superfamilies. Hits were validated using the structure-based sequence alignments of superfamilies provided in PASS2.4 and PASS2.7; single-query HMMs generated from SCOPe 2.08 members were also used. Sequence searches identified 151,622,506 hits (30% of NR) among the 504,094,943 sequences in NR, NCBI, of which 116,393,303 sequences (23% of NR) passed validation.
The database reports the analysis of the hits at the superfamily, fold and class levels, along with their domain architecture (DA) and taxonomic occurrence. We have computed the DA of the hits at the Pfam level, identified the correspondence between the Pfam and SCOP domain definitions of the hits, and classified all Pfam and SCOP DAs for the hits of each superfamily.
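As an illustration of what a domain architecture is, the minimal sketch below orders the domains assigned to a sequence from N- to C-terminus and counts how many sequences share each architecture. The `DomainHit` record, the `~` separator, the function names and the example sequences are assumptions made for this sketch; they do not describe the actual GenDiS pipeline, and the Pfam accessions are used only as example labels.

```python
from collections import Counter
from typing import NamedTuple


class DomainHit(NamedTuple):
    """One domain assignment on a protein sequence (hypothetical record layout)."""
    domain: str  # domain identifier, e.g. a Pfam accession
    start: int   # 1-based start position on the sequence
    end: int     # inclusive end position on the sequence


def domain_architecture(hits: list[DomainHit], sep: str = "~") -> str:
    """Return the domains ordered from N- to C-terminus, joined by `sep`."""
    ordered = sorted(hits, key=lambda h: (h.start, h.end))
    return sep.join(h.domain for h in ordered)


def classify_architectures(proteins: dict[str, list[DomainHit]]) -> Counter:
    """Count how many proteins share each domain architecture."""
    return Counter(domain_architecture(hits) for hits in proteins.values())


# Made-up example sequences; the hits are deliberately given out of order.
proteins = {
    "seqA": [DomainHit("PF07714", 300, 550), DomainHit("PF00017", 120, 200)],
    "seqB": [DomainHit("PF00017", 10, 95), DomainHit("PF07714", 150, 400)],
    "seqC": [DomainHit("PF00069", 5, 260)],
}
da_counts = classify_architectures(proteins)
for architecture, n in da_counts.most_common():
    print(architecture, n)
```

Sorting by start position makes the architecture independent of the order in which the domain hits were reported, so sequences with the same domain order fall into the same class.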
Users can navigate the database at different levels of the SCOP hierarchy: class, fold, and superfamily, by code or description.
If you find our database or methodology useful, please cite us:
- Iyer MS, Joshi AG and Sowdhamini R. Genome-wide survey of remote homologues for protein domain superfamilies of known structure reveals unequal distribution across structural classes. Mol. Omics 2018, 14, 266-280
- Joshi AG, Raghavender US and Sowdhamini R. Improved performance of sequence search approaches in remote homology detection. F1000Research 2014, 2:93
- Pugalenthi G, Bhaduri A and Sowdhamini R. GenDiS: Genomic Distribution of Protein Structural domain Superfamilies. Nucleic Acids Research 2005, 33, D252-D255
Please contact us for further clarification or to run the pipeline on multiple sequences.
Statistics

Database | Level | Total | Covered | Coverage (%) |
---|---|---|---|---|
Pfam | Families | 16,230 | 9,853 | 60.71 |
SCOP v1.75 | Superfamilies | 1,962 | 1,961 | 99.94 |
SCOPe v2.06 | Superfamilies | 2,008 | 1,961 | 97.66 |
NR db | Sequences | 504,094,943 | 116,393,303 | 23.09 |
PDB | Protein structures | 130,536 | 81,026 | 62.07 |
UniProt | Sequences | 28,827,044 | 9,730,213 | 29.62 |
Taxonomy db, NCBI | Organisms | 164,890 | 67,377 | 40.86 |
SwissProt | Sequences | 557,275 | 298,699 | 53.60 |
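
The coverage column is the covered count expressed as a percentage of the total. The short sketch below recomputes it from the Total and Covered columns, which is a quick consistency check when counts and percentages are updated at different times. The function name, the row labels and the two-decimal rounding are assumptions made for this sketch.

```python
def coverage_percent(covered: int, total: int) -> float:
    """Return covered/total as a percentage, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= covered <= total:
        raise ValueError("covered must lie between 0 and total")
    return round(100.0 * covered / total, 2)


# (total, covered) pairs copied from the Total and Covered columns of the table above.
counts = {
    "Pfam families": (16230, 9853),
    "SCOP v1.75 superfamilies": (1962, 1961),
    "SCOPe v2.06 superfamilies": (2008, 1961),
    "NR sequences": (504094943, 116393303),
    "PDB protein structures": (130536, 81026),
    "UniProt sequences": (28827044, 9730213),
    "NCBI Taxonomy organisms": (164890, 67377),
    "SwissProt sequences": (557275, 298699),
}

coverage = {
    name: coverage_percent(covered, total)
    for name, (total, covered) in counts.items()
}
for name, pct in coverage.items():
    print(f"{name}: {pct:.2f}%")
```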