Genomic Distribution of structural Superfamilies (GenDiS+) is a repository of sequence homologues of SCOP structural superfamily members, identified in sequence space (NR, NCBI). This version of GenDiS+ corresponds directly to SCOP 1.75, PASS2.4, NR, and UniProt.

We used remote homology detection methods such as PHI-BLAST and CS-BLAST to identify homologues of over 11,000 SCOP members from 1961 superfamilies (Iyer M, Joshi A and Sowdhamini R, 2018). Hits were validated using the structure-based sequence alignments of superfamilies provided in PASS2.4. Sequence searches identified 18,325,278 hits (27.2%) among the 67,289,356 sequences in NR (NCBI), of which 14,495,466 sequences (21.5% of NR) passed validation. Taking all proteins predicted to have at least one structural domain, the hits cover 61% of Pfam families, which is higher than the coverage achieved by existing methods.

The database reports analyses of the hits at the superfamily, fold and class levels, their domain architecture (DA), and their taxonomic occurrence. Compared to the previous version (Pugalenthi G, Bhaduri A and Sowdhamini R, 2005), we have now computed the DA of the hits at the Pfam level, identified the correspondence between the Pfam and SCOP domain definitions of the hits, and classified all Pfam and SCOP DAs for the hits of each superfamily.

The user may navigate the database at different levels:
1. SCOP hierarchies: SCOP class, fold and superfamily code and description
2. Taxonomic hierarchies: NCBI taxid and description
3. SCOP and Pfam domain architecture: based on SCOP superfamily code/description and Pfam family code/description

If you find our database or the methodology useful, please cite us:
Iyer MS, Joshi AG and Sowdhamini R. Genome-wide survey of remote homologues for protein domain superfamilies of known structure reveals unequal distribution across structural classes. Mol. Omics, 2018, 14, 266-280
Joshi AG, Raghavender US and Sowdhamini R. Improved performance of sequence search approaches in remote homology detection. F1000Research 2014, 2:93
Pugalenthi G, Bhaduri A and Sowdhamini R (2005) GenDiS: Genomic Distribution of Protein Structural domain Superfamilies. Nucleic Acids Research Vol. 33, D252-D255

Please contact us for further clarifications and for running the pipeline on multiple sequences.

Statistics
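The percentages quoted in the description can be reproduced from the raw counts. The following is a minimal sketch in Python, not part of the GenDiS+ pipeline. The variable and function names are chosen here for illustration, and the last figure (validated hits as a share of all hits) is derived from the quoted counts rather than stated in the description.

```python
# Counts quoted in the database description.
NR_SEQUENCES = 67_289_356    # sequences in NR (NCBI) that were searched
SEARCH_HITS = 18_325_278     # hits identified by the sequence searches
VALIDATED_HITS = 14_495_466  # hits that passed the PASS2.4-based validation


def percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Both quoted percentages are expressed relative to the full NR set.
hit_share_of_nr = percent(SEARCH_HITS, NR_SEQUENCES)
validated_share_of_nr = percent(VALIDATED_HITS, NR_SEQUENCES)

# Derived here (not quoted in the description): share of hits that survive validation.
validated_share_of_hits = percent(VALIDATED_HITS, SEARCH_HITS)

print(f"Hits as a share of NR:             {hit_share_of_nr}%")
print(f"Validated hits as a share of NR:   {validated_share_of_nr}%")
print(f"Validated hits as a share of hits: {validated_share_of_hits}%")
```

The first two values match the 27.2% and 21.5% given in the description. The third shows that roughly four in five search hits are retained after validation.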