Genomic Distribution of Structural Superfamilies (GenDiS+) is a repository of sequence homologues of SCOP structural superfamily members, identified in sequence space using the NCBI non-redundant (NR) database.

This version of GenDiS+ corresponds directly to SCOP 1.75, PASS2.4, NR and UniProt. We used remote homology detection methods such as PHI-BLAST and CS-BLAST to identify homologues of over 11,000 SCOP members from 1,961 superfamilies (Iyer MS, Joshi AG and Sowdhamini R, 2018). Hits were validated using the structure-based sequence alignments of superfamilies provided in PASS2.4. Sequence searches identified 18,325,278 hits among the 67,289,356 sequences in NCBI NR (27.2%), of which 14,495,466 sequences (21.5% of NR) passed validation. Considering all proteins predicted to have at least one structural domain, GenDiS+ covers about 61% of Pfam families, a higher coverage than that of existing methods.
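
The Pfam coverage figure can be read as set arithmetic: a Pfam family counts as covered if at least one of its member proteins is predicted to carry a SCOP structural domain. The sketch below assumes that definition; the function name and the family and protein identifiers are hypothetical toy data, not GenDiS+ output.

```python
from typing import Dict, Set


def pfam_family_coverage(
    family_members: Dict[str, Set[str]],
    proteins_with_scop_domain: Set[str],
) -> float:
    """Percentage of Pfam families with at least one member protein that has
    a predicted SCOP domain (an assumed definition of coverage)."""
    if not family_members:
        return 0.0
    # A family is covered if its member set overlaps the proteins with domains.
    covered = sum(
        1 for members in family_members.values()
        if members & proteins_with_scop_domain
    )
    return 100.0 * covered / len(family_members)


# Hypothetical toy data: family label -> member protein IDs
families = {
    "fam_A": {"prot1", "prot2"},
    "fam_B": {"prot3"},
    "fam_C": {"prot4", "prot5"},
}
# Hypothetical proteins predicted to have at least one structural domain
with_domain = {"prot2", "prot4"}

print(f"{pfam_family_coverage(families, with_domain):.2f}%")  # 66.67%
```

Here fam_A and fam_C each contain a protein with a predicted domain, so two of the three families are covered.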

The database reports analyses of the hits at the superfamily, fold and class levels, their domain architectures (DAs) and their taxonomic occurrence. Compared with the previous version (Pugalenthi G, Bhaduri A and Sowdhamini R, 2005), we have now computed the DA of each hit at the Pfam level, identified the correspondence between Pfam and SCOP domain definitions of the hits, and classified all Pfam and SCOP DAs of the hits for each superfamily.
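
A domain architecture can be treated as the ordered sequence of domains along a protein. The following sketch assumes that representation: each hypothetical hit record carries (start residue, domain label) pairs, the DA is the labels sorted by start position, and hits are tallied per superfamily by DA. The record layout and function names are illustrative assumptions, and the SCOP codes serve only as example labels.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical hit record: (sequence ID, superfamily queried, domain hits),
# where each domain hit is (start residue, domain label).
Hit = Tuple[str, str, List[Tuple[int, str]]]


def domain_architecture(domains: List[Tuple[int, str]]) -> Tuple[str, ...]:
    """Order domain labels by start position (N- to C-terminus) to form the DA."""
    return tuple(label for _, label in sorted(domains))


def classify_das(hits: List[Hit]) -> Dict[str, Counter]:
    """For each superfamily, count how many hits share each DA."""
    per_superfamily: Dict[str, Counter] = defaultdict(Counter)
    for _seq_id, superfamily, domains in hits:
        per_superfamily[superfamily][domain_architecture(domains)] += 1
    return dict(per_superfamily)


# Hypothetical hits for one superfamily
hits: List[Hit] = [
    ("seq1", "c.37.1", [(120, "c.37.1"), (10, "a.4.5")]),
    ("seq2", "c.37.1", [(5, "a.4.5"), (200, "c.37.1")]),
    ("seq3", "c.37.1", [(1, "c.37.1")]),
]

for da, n in classify_das(hits)["c.37.1"].most_common():
    print("-".join(da), n)
# a.4.5-c.37.1 2
# c.37.1 1
```

Sorting by start position makes seq1 and seq2 share the same DA even though their domains were listed in different orders.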

Users can navigate the database at three levels:

1. SCOP hierarchy: SCOP class, fold and superfamily codes and descriptions

2. Taxonomic hierarchy: NCBI taxonomy IDs (taxids) and descriptions

3. SCOP and Pfam domain architecture: based on SCOP superfamily codes/descriptions and Pfam family codes/descriptions
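
SCOP labels each superfamily with a concise classification string (sccs) of the form class.fold.superfamily, for example c.37.1. Assuming superfamily codes in that form, the sketch below nests them into a class, fold and superfamily index of the kind needed to browse the SCOP hierarchy; `build_scop_index` and the input list are hypothetical, not part of GenDiS+.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_scop_index(sccs_codes: List[str]) -> Dict[str, Dict[str, Set[str]]]:
    """Nest superfamily codes under their SCOP class and fold.

    Family-level codes (e.g. a.4.5.28) are truncated to the superfamily.
    """
    index: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for code in sccs_codes:
        parts = code.split(".")
        if len(parts) < 3:
            raise ValueError(f"not a superfamily-level sccs: {code!r}")
        scop_class = parts[0]
        fold = ".".join(parts[:2])
        superfamily = ".".join(parts[:3])
        index[scop_class][fold].add(superfamily)
    return index


index = build_scop_index(["a.4.5", "a.4.1", "c.37.1", "b.1.1"])
print(sorted(index["a"]["a.4"]))  # ['a.4.1', 'a.4.5']
```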

If you find our database or methodology useful, please cite:

Iyer MS, Joshi AG and Sowdhamini R. Genome-wide survey of remote homologues for protein domain superfamilies of known structure reveals unequal distribution across structural classes. Mol. Omics 2018, 14, 266-280.
Joshi AG, Raghavender US and Sowdhamini R. Improved performance of sequence search approaches in remote homology detection. F1000Research 2014, 2, 93.
Pugalenthi G, Bhaduri A and Sowdhamini R. GenDiS: Genomic Distribution of Protein Structural Domain Superfamilies. Nucleic Acids Research 2005, 33, D252-D255.

Please contact us for further clarification or to run the pipeline on multiple sequences.

Statistics

Database          | Level              | Total      | Covered    | % Coverage
Pfam              | Families           | 16,230     | 9,853      | 60.71
SCOP v1.75        | Superfamilies      | 1,962      | 1,961      | 99.94
SCOPe v2.06       | Superfamilies      | 2,008      | 1,961      | 97.66
NR db             | Sequences          | 67,289,356 | 18,325,278 | 27.2
PDB               | Protein structures | 130,536    | 81,026     | 62.07
UniProt           | Sequences          | 28,827,044 | 9,730,213  | 29.62
Taxonomy db, NCBI | Organisms          | 164,890    | 67,377     | 40.86
SwissProt         | Sequences          | 557,275    | 298,699    | 53.60
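
Each % Coverage value is Covered / Total × 100. A minimal sketch recomputing it for a few rows, with values copied from the table above; `coverage` is an illustrative helper name:

```python
# (database, level, total, covered) for a subset of rows in the table above
rows = [
    ("Pfam", "Families", 16_230, 9_853),
    ("SCOPe v2.06", "Superfamilies", 2_008, 1_961),
    ("NR db", "Sequences", 67_289_356, 18_325_278),
    ("PDB", "Protein structures", 130_536, 81_026),
]


def coverage(total: int, covered: int) -> float:
    """Percentage of covered entries, covered / total * 100, to two decimals."""
    return round(100.0 * covered / total, 2)


for database, level, total, covered in rows:
    print(f"{database} {level}: {coverage(total, covered):.2f}%")
# Pfam Families: 60.71%
# SCOPe v2.06 Superfamilies: 97.66%
# NR db Sequences: 27.23%
# PDB Protein structures: 62.07%
```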