Gendis Central Help
Genomic Distribution of structural Superfamilies (GenDiS; Pugalenthi G, Bhaduri A and Sowdhamini R, 2005) identifies and classifies homologues of the structural members defined at the SCOP superfamily level. This version, GenDiS+, corresponds directly to SCOP 1.75, PASS2.4, NR (May 2015) and UniProt (2016). GenDiS aims to associate the primary structure of proteins with the three-dimensional homologous superfamilies.
The database reports the analysis of the hits obtained at the superfamily, fold and class level, together with their domain architecture (DA) and taxonomic occurrence. We also provide a library of the DA observed in the homologues, at the level of both SCOP and Pfam, for each superfamily and organism. Compared to the previous version, we have now computed the DA of the hits at the Pfam level, identified the correspondence between the Pfam and SCOP domain definitions of the hits, and classified all the Pfam and SCOP DA for the hits of each superfamily. Users can also compare superfamily homologues from different organisms, and different DA within the same organism.

Queries were obtained from PASS2.4, which lists all SCOP (v1.75) superfamily members that are domains of known structure and share less than 40% sequence identity with each other. The search was carried out against the NCBI NR database (May 2015) with an e-value and inclusion threshold of 10^-3, for 20 iterations or until convergence. Two different sequence search approaches, as described in Joshi et al, 2013, were used to identify more sequence homologues:
All PASS2.4 superfamily members were used as queries.
A PSI-BLAST search (e-value and inclusion threshold 10^-3, 20 iterations) was carried out using all PASS2.4 members as queries. The member which picked up all other SCOP members and had the highest number of true positive hits (please see the section on validation of hits) was taken to be the BRS.
A stringent PSI-BLAST search (e-value and inclusion threshold) was carried out to identify closely related homologues of the BRS of a superfamily. The hits having 60-90% identity with the query were aligned using ClustalW2.0. The MSA was used to generate patterns in PROSITE format using an in-house tool, MOTIFS (available upon request). If more than three different residues occurred at a position, the position was denoted as X (any amino acid).
Searches were carried out using the BRS and each pattern as queries.
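The pattern-generation rule described above can be illustrated with a minimal sketch. MOTIFS is an in-house tool, so this is not its implementation; the sketch only applies the stated rule (a column with more than three distinct residues becomes a wildcard) to a toy alignment. The function name `prosite_pattern`, the treatment of gap-containing columns as wildcards, and the use of lowercase `x` with hyphen separators (the usual PROSITE notation) are assumptions made for illustration.

```python
from typing import Sequence


def prosite_pattern(aligned: Sequence[str], max_residues: int = 3) -> str:
    """Build a PROSITE-style pattern from equal-length aligned sequences.

    Hypothetical helper: columns with more than `max_residues` distinct
    residues, or containing a gap ('-'), are written as 'x' (assumption).
    """
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    elements = []
    for column in zip(*aligned):          # iterate over alignment columns
        residues = sorted(set(column))
        if "-" in residues or len(residues) > max_residues:
            elements.append("x")          # any amino acid
        elif len(residues) == 1:
            elements.append(residues[0])  # fully conserved position
        else:
            elements.append("[" + "".join(residues) + "]")
    return "-".join(elements)


toy_alignment = ["ACDK", "ACEK", "GCFK", "SCGK"]
print(prosite_pattern(toy_alignment))
```

In the toy alignment, the first column holds three distinct residues and is kept as a bracketed set, while the third column holds four and collapses to `x`.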
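The data is arranged in three categories of superfamilies: single-member superfamilies (SMS), two-member superfamilies (TMS) and multi-member superfamilies (MMS).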

Each hit was validated using structure-based sequence alignments from PASS2.4.
A structure-based sequence alignment is available for all superfamilies with two or more members in PASS2.4. We also used the sequences of all PASS2.4 superfamily members to create HMM libraries for each superfamily.
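SF-HMM: built from the alignment of all PASS2.4 members of a superfamily.
SQ-HMM: built from the sequences of the PASS2.4 members of a superfamily.
Single-member superfamilies (SMS) have only SQ-HMM libraries.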
The following table shows the statistics of the SQ-HMM and SF-HMM components of the superfamily HMM libraries:
| Type of HMM library | Number of SMS HMMs | Number of TMS HMMs | Number of MMS HMMs | Total |
|---|---|---|---|---|
| SF-HMM | - | 366 | 714 | 1,180 |
| SQ-HMM | 864 | 732 | 8,973 | 10,569 |
| Total | 864 | 1,098 | 9,687 | 11,794 |
HMMSCAN from HMMER3.1 was used with an e-value of 0.01. All domain matches with an independent e-value (i e-value) of at most 10^-2 and an HMM model coverage of at least 0.7 were extracted and are provided on the superfamily page, under the structural domains tab.
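As an illustration of this filtering step, the sketch below parses lines in the per-domain tabular format that hmmscan writes with its `--domtblout` option and keeps the matches that pass both thresholds. It is not the GenDiS+ pipeline code. The names `DomainHit`, `parse_domtblout` and `filter_hits` are hypothetical, and model coverage is assumed to be the matched HMM span divided by the model length; the source does not state the exact definition.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class DomainHit:
    model: str         # HMM (target) name
    query: str         # query sequence name
    model_length: int  # length of the HMM (tlen column)
    i_evalue: float    # independent e-value of this domain
    hmm_from: int
    hmm_to: int
    ali_from: int
    ali_to: int

    @property
    def coverage(self) -> float:
        # Assumed definition: fraction of the HMM covered by the match.
        return (self.hmm_to - self.hmm_from + 1) / self.model_length


def parse_domtblout(lines: Iterable[str]) -> List[DomainHit]:
    """Parse whitespace-separated domtblout lines, skipping comments."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split()
        hits.append(DomainHit(
            model=f[0], query=f[3], model_length=int(f[2]),
            i_evalue=float(f[12]),
            hmm_from=int(f[15]), hmm_to=int(f[16]),
            ali_from=int(f[17]), ali_to=int(f[18]),
        ))
    return hits


def filter_hits(hits, max_i_evalue=1e-2, min_coverage=0.7):
    """Keep hits with i e-value <= threshold and coverage >= threshold."""
    return [h for h in hits
            if h.i_evalue <= max_i_evalue and h.coverage >= min_coverage]


# Toy records (made-up values, not real GenDiS+ output).
toy = [
    "# comment line",
    "SF_A - 100 seq1 - 250 1e-30 100.0 0.1 1 1 1e-32 2e-30 99.5 0.1 5 95 10 100 8 102 0.95 toy",
    "SF_B - 200 seq1 - 250 1e-5 30.0 0.1 1 1 1e-6 1e-5 29.0 0.1 1 80 120 200 118 202 0.90 toy",
    "SF_C - 100 seq1 - 250 0.5 5.0 0.1 1 1 0.4 0.5 4.0 0.1 1 90 1 90 1 92 0.80 toy",
]
kept = filter_hits(parse_domtblout(toy))
print([h.model for h in kept])
```

Exposing the two thresholds as parameters mirrors the option, described for the sequence upload tool, of viewing results at different values of i e-value and HMM model coverage.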
Assigning a superfamily domain to a region:

DA was computed at the structural (SCOP) and sequence (Pfam) level for the hits which passed the above validation.
Pfam DA was calculated as follows:
To understand the diversity of homologues for a given superfamily, the domain regions identified by SF-HMM and SQ-HMM matches were extracted and aligned using CLUSTAL Omega. The domain alignments can be downloaded for each superfamily.
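A minimal sketch of the extraction step is given below, assuming that domain boundaries are available as 1-based, inclusive coordinates on each hit. The helper names `extract_region` and `to_fasta`, and the `id/start-end` header style, are hypothetical. The resulting FASTA file could then be aligned with Clustal Omega, for example with `clustalo -i domains.fasta -o domains.aln`.

```python
from typing import Iterable, Tuple


def extract_region(sequence: str, start: int, end: int) -> str:
    """Return the subsequence for 1-based, inclusive coordinates."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError(f"invalid region {start}-{end} for length {len(sequence)}")
    return sequence[start - 1:end]


def to_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format (header, sequence) pairs as FASTA text with wrapped lines."""
    lines = []
    for header, seq in records:
        lines.append(f">{header}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy example: one hit with a domain matched at positions 3-6.
hit_id, hit_seq = "hit1", "MKTAYIAKQR"
domain = extract_region(hit_seq, 3, 6)
print(to_fasta([(f"{hit_id}/3-6", domain)]), end="")
```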
There are 314 superfamilies with a single domain architecture, of which 310 have a single-domain DA. For the other 1646 superfamilies, the diversity of the homologues at the level of structural (SCOP) DA was checked. A distance matrix of the DA was computed using the Alignment-free Domain Architecture Similarity Search (ADASS) tool (available upon request). DAD trees were constructed using Neighbor Joining (NJ) from PHYLIP and are available for download.
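To show the shape of this step, the sketch below computes a pairwise distance matrix between domain architectures and writes it in the square distance-matrix format read by PHYLIP's `neighbor` program. ADASS is not publicly described here, so a plain Jaccard distance on the set of domains is swapped in as a stand-in; it ignores domain order and repeats, and is not the ADASS measure. The function names and the toy architectures are hypothetical.

```python
from typing import Dict, Sequence


def jaccard_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Stand-in DA distance: 1 - |shared domains| / |all domains|."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def phylip_matrix(architectures: Dict[str, Sequence[str]]) -> str:
    """Write a square PHYLIP distance matrix.

    Names are cut or padded to 10 characters, so they must remain
    unique after truncation.
    """
    names = list(architectures)
    lines = [f"{len(names):5d}"]
    for n in names:
        row = " ".join(
            f"{jaccard_distance(architectures[n], architectures[m]):.4f}"
            for m in names
        )
        lines.append(f"{n[:10]:<10} {row}")
    return "\n".join(lines) + "\n"


# Toy architectures, each a list of SCOP superfamily codes (made up).
toy_das = {
    "DA1": ["a.1.1", "b.2.1"],
    "DA2": ["a.1.1"],
    "DA3": ["c.3.1"],
}
print(phylip_matrix(toy_das), end="")
```

The printed matrix can be saved to a file and supplied as input to `neighbor` to build an NJ tree.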
From the BLAST results, alternate accession identifiers having identical sequences were extracted. We also extracted taxonomic details for all the hits. The different accessions for the hits were retained to gather information about the different strains and variants that contained the homologue of the superfamily domain.
For each hit, we provide NCBI accession identifiers and identifiers from other databases such as SCOP v1.75, SCOPe v2.06, UniProt and InterPro, taken from the mapping files provided by UniProt and SCOP. PDB details were extracted from the sequence defline as provided by NCBI.
Users can browse the data using:
We have organised our database according to the SCOP (Structural Classification of Proteins) database hierarchy. The hierarchy is:
Protein families sharing common structural features are categorised under the same superfamily. As the superfamily is the first level of the hierarchy, the superfamily browsing page displays a list of all the protein superfamilies along with a short description and various other details. Internal (GenDiS+) links to each superfamily's details are also provided.
Links for the PASS2 HMM files, extracted domain regions, and full-length sequences are also available for the user to download. Two different types of HMM files are available for each superfamily: alignment of all PASS2 superfamily members (SF-HMM) and sequences of all PASS2 members (SQ-HMM).
Protein superfamilies sharing common structural features are categorised under the same fold. When browsing by fold, users can view the fold under which each superfamily falls, along with the fold description and the class to which that fold belongs. External links to the SCOP database are also provided for each fold and class.
Folds sharing common structural features are categorised under the same class. When browsing by class, users can view the class under which each fold falls, along with the class description and the superfamilies that belong to that class. External links to the SCOP database are also provided for each class and fold.
Users can browse the database on the basis of NCBI taxonomy. The three levels are:
On the browse by taxonomy-genome page, users will find a search bar with various associated parameters for advanced search. Clicking on a specific parameter and entering a complete or partial query in the search bar redirects the user to the results page.
The users may search the entire database at different levels:
Here's a short guide to using the global search tool:
For ease of searching, a local search bar is provided with each table, in addition to the global search tool.
For each superfamily, the following are provided: SCOP superfamily, fold and class code and description, number of organisms in which the homologues are found, number of hits, number of true positives, total number of accession ids, number of SCOP DA and associated domains, number of Pfam DA and average domain size. Users can click on the taxonomy and DA tabs to obtain detailed information. The DAD tree, SF-HMM and SQ-HMM, domain sequences and domain alignments are available for download.
Users can upload a sequence, and the structural domain prediction results will be displayed, showing all the matching domains. The results can be viewed at different values of i e-value and HMM model coverage.
Please contact us if you would like to run the GenDiS+ pipeline for a large number of sequences.
The user can view domain alignments from two different genomes for the user-specified superfamily.
Users can align their sequences with homologues of a superfamily from different genomes.
Please refer to the Downloads tab for a detailed description of the files for download. The user can download the following files: