PASS2 Search facility
The database can be browsed efficiently using several search facilities. It can be queried by superfamily code, SCOP domain code, PDB code, and keywords from superfamily or fold names.
- 1. To search by superfamily code, enter a valid SCOP superfamily code (a 5- or 6-digit identifier as used in SCOP). For example, to search the "globin-like" superfamily, enter its SCOP code "46458".
- 2. To search by domain code, enter a valid SCOP domain code (a 7-character identifier). For example, enter "d1cqxa1" to find the globin-like domain of the flavohaemoprotein (a multi-domain protein) of Ralstonia eutropha.
- 3. To search by protein code, enter a valid PDB ID (the 4-character code used by the PDB). This lists the different domains present in a single protein. For example, using the protein from point 2, enter the PDB ID "1cqx" (a multi-domain protein) to retrieve more than one domain.
- 4. To search by keywords from superfamily or fold names, enter a full or partial term. For example, entering the partial name "globin" or the full name "globin-like" retrieves the matching entries.
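The four query types above differ mainly in their input format. As a minimal sketch (not PASS2's own validation code), the hypothetical helper `query_type` below guesses which search a query string belongs to, assuming SCOP superfamily codes of 5 or 6 digits, 7-character SCOP domain codes such as "d1cqxa1", and 4-character PDB IDs that start with a digit; anything else is treated as a keyword.

```python
import re

# Hypothetical patterns inferred from the formats described above;
# they are assumptions, not the patterns PASS2 itself uses.
SUPERFAMILY_CODE = re.compile(r"\d{5,6}")
# 'd' + 4-character PDB ID + chain ('.' or '_' allowed) + domain number ('_' allowed)
SCOP_DOMAIN_CODE = re.compile(r"d[1-9][a-z0-9]{3}[a-z0-9._][a-z0-9_]", re.IGNORECASE)
PDB_ID = re.compile(r"[1-9][a-z0-9]{3}", re.IGNORECASE)


def query_type(query: str) -> str:
    """Guess which PASS2 search a query string corresponds to."""
    q = query.strip()
    if SUPERFAMILY_CODE.fullmatch(q):
        return "superfamily_code"
    if SCOP_DOMAIN_CODE.fullmatch(q):
        return "domain_code"
    if PDB_ID.fullmatch(q):
        return "pdb_id"
    # Full or partial superfamily/fold names fall through to keyword search
    return "keyword"


for q in ["46458", "d1cqxa1", "1cqx", "globin", "globin-like"]:
    print(q, query_type(q))
```

Checking superfamily codes first is safe because a 5- or 6-digit string can never match the 4-character PDB pattern.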
Distantly related proteins adopt and retain similar structural scaffolds despite length variations that can be as large as two-fold in some protein superfamilies. CUSP is an algorithm that examines multi-membered PASS2 superfamily alignments to identify indel regions automatically. It examines protein domain structural alignments to distinguish regions of conserved structure common to related proteins from structurally unconserved regions that vary in length and type of structure.
Consecutive positions with high scores are merged into structurally conserved blocks, which are thereby distinguished from indels. Each block is assigned an average score, which is used to annotate the alignment, distinguishing indel regions (USB) from 'core' regions (SSB), and to grade the degree of conservation as 'high', 'medium' or 'poor'.
Conserved structural blocks (H, E and C) are graded by block score as:
(a) High (block score 4.5-5.0)
(b) Medium (block score 3-4.5)
(c) Poor (block score less than 3)
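The block-merging and grading steps can be sketched as follows. This is a minimal illustration, not CUSP's implementation: it assumes per-position conservation scores on a 0-5 scale are already available, and the `cutoff` separating conserved positions from indel positions is a hypothetical parameter. Because the published ranges share their endpoints (4.5 and 3), scores exactly on a boundary are placed in the higher class here, which is also an assumption.

```python
from dataclasses import dataclass
from itertools import groupby
from statistics import mean


@dataclass
class Block:
    start: int     # first alignment column (0-based, inclusive)
    end: int       # last alignment column (0-based, inclusive)
    kind: str      # "SSB" (conserved core) or "USB" (indel region)
    score: float   # average per-position score over the block
    grade: str     # "high", "medium" or "poor"


def grade(score: float) -> str:
    # Ranges from the PASS2 text; boundary values go to the higher class (assumption)
    if score >= 4.5:
        return "high"
    if score >= 3.0:
        return "medium"
    return "poor"


def segment(scores: list[float], cutoff: float = 2.0) -> list[Block]:
    """Merge consecutive positions on the same side of `cutoff` into blocks.

    `cutoff` is hypothetical, not CUSP's published threshold.
    """
    blocks = []
    pos = 0
    for conserved, run in groupby(scores, key=lambda s: s >= cutoff):
        run = list(run)
        avg = mean(run)
        blocks.append(Block(pos, pos + len(run) - 1,
                            "SSB" if conserved else "USB", avg, grade(avg)))
        pos += len(run)
    return blocks


for b in segment([4.8, 4.6, 5.0, 1.0, 0.5, 3.5, 3.2, 4.0]):
    print(b.kind, b.start, b.end, b.grade)
# SSB 0 2 high
# USB 3 4 poor
# SSB 5 7 medium
```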
Sankaran Sandhya, Barah Pankaj, Madabosse Kande Govind, Bernard Offmann, Narayanaswamy Srinivasan and Ramanathan Sowdhamini. CUSP: an algorithm to distinguish structurally conserved and unconserved regions in protein domain alignments and its application in the study of large length variations. BMC Struct Biol 2008; 8: 28.
The SMotif program identifies structural motifs in protein structures. Such motifs, found among structurally aligned members of protein superfamilies, are recognized by the conservation of amino acid preference and solvent inaccessibility, and are then examined for the conservation of other features such as secondary structure content, hydrogen bonding and residue packing.
A structural feature is considered conserved at an alignment position if it is present in all, or all but one, members of the alignment. The identified structural motifs are mapped onto the alignment using different color codes, and the motifs are ranked by the extent of conservation of their structural features. The three-dimensional orientation of the structural motifs is illustrated through graphic displays and spatial orientation matrices.
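The "all, or all but one, members" rule can be expressed directly. In this minimal sketch each aligned member is represented by a string encoding one per-residue feature; the encoding ('b' = buried, 'e' = exposed, '-' = gap) and the function name `conserved_positions` are hypothetical, chosen only for illustration.

```python
def conserved_positions(feature_strings: list[str], symbol: str = "b") -> list[int]:
    """Return 0-based columns where `symbol` is present in all, or all but one, members.

    feature_strings: one aligned string per member (hypothetical encoding:
    'b' = buried, 'e' = exposed, '-' = gap). Gaps count as "feature absent".
    """
    lengths = {len(s) for s in feature_strings}
    if len(lengths) != 1:
        raise ValueError("all members must have the same aligned length")
    conserved = []
    # zip(*...) walks the alignment column by column
    for col, column in enumerate(zip(*feature_strings)):
        absent = sum(1 for c in column if c != symbol)
        if absent <= 1:
            conserved.append(col)
    return conserved


print(conserved_positions(["bbee", "bbbe", "beee", "bbbb"]))  # [0, 1]
```

Column 1 is still reported as conserved because only one of the four members lacks the feature there.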
Ganesan Pugalenthi, P. N. Suganthan, R. Sowdhamini and Saikat Chakrabarti. SMotif: a server for structural motifs in proteins. Bioinformatics 2007; 23(5): 637-638.
Alistat reads a multiple sequence alignment from the file alignfile in any supported format and reports a number of simple statistics about it: the name of the format, the number of sequences, the total number of residues, the average and range of the sequence lengths, and the alignment length (i.e. including gap characters).
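The basic statistics can be sketched for an alignment already held in memory. This is not alistat itself: it skips file parsing and format detection, takes a dict of name to aligned sequence, and assumes '-' and '.' are the only gap characters.

```python
GAP_CHARS = frozenset("-.")  # assumed gap symbols


def alignment_stats(seqs: dict[str, str]) -> dict:
    """Simple alistat-style statistics for an in-memory alignment."""
    aligned_lengths = {len(s) for s in seqs.values()}
    if len(aligned_lengths) != 1:
        raise ValueError("sequences must all have the same aligned length")
    # Unaligned length of each sequence: residues only, gaps removed
    lengths = [sum(c not in GAP_CHARS for c in s) for s in seqs.values()]
    total = sum(lengths)
    return {
        "num_seqs": len(seqs),
        "total_residues": total,
        "avg_length": total / len(seqs),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "alignment_length": aligned_lengths.pop(),  # columns, gaps included
    }


print(alignment_stats({"a": "ACD-EF", "b": "AC--EF", "c": "ACDGEF"}))
```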
The percent pairwise identity of two aligned sequences is defined as idents / MIN(len1, len2), expressed as a percentage, where idents is the number of exact identities and len1, len2 are the unaligned lengths of the two sequences. The "average percent identity", "most related pair" and "most unrelated pair" of the alignment are the average, maximum and minimum over all N(N-1)/2 pairs, respectively. The "most distant seq" is found by taking, for each of the N sequences, its maximum pairwise identity (its best relative), and then the minimum of these N values; this identifies the most outlying sequence.
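The identity definitions above translate into a short sketch. It follows the stated formula (identities divided by the shorter unaligned length) but is not alistat's source: the gap characters, case-insensitive matching and function names are assumptions, and at least two non-empty sequences are required.

```python
from itertools import combinations

GAP_CHARS = frozenset("-.")  # assumed gap symbols


def ungapped_length(seq: str) -> int:
    return sum(c not in GAP_CHARS for c in seq)


def percent_identity(s1: str, s2: str) -> float:
    """100 * idents / MIN(len1, len2), with len1, len2 the unaligned lengths."""
    idents = sum(1 for a, b in zip(s1, s2)
                 if a not in GAP_CHARS and a.upper() == b.upper())
    return 100 * idents / min(ungapped_length(s1), ungapped_length(s2))


def identity_summary(seqs: dict[str, str]) -> dict:
    """Average, max and min identity over all N(N-1)/2 pairs, plus the most distant seq."""
    pids = {pair: percent_identity(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    values = list(pids.values())
    # Best relative: each sequence's highest identity to any other sequence
    best = {name: max(v for pair, v in pids.items() if name in pair) for name in seqs}
    return {
        "average_identity": sum(values) / len(values),
        "most_related_pair": max(values),
        "most_unrelated_pair": min(values),
        "most_distant_seq": min(best, key=best.get),
    }


print(identity_summary({"a": "ACDEF", "b": "ACDEG", "c": "AWWEG"}))
```

Here the pair identities are 80 (a, b), 40 (a, c) and 60 (b, c), so "c" has the weakest best relative (60) and is reported as the most distant sequence.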
Outliers are PDB entries that could not be brought into the superfamily alignment. These protein domains were originally considered part of the superfamily but could not be included in the alignment for the following reasons:
- 1. High root mean square deviation (RMSD)
- 2. Initial equivalences could not be obtained when the entry was included in the core alignment
- 3. Consistent difficulties for multiple structure alignment programs in handling the entries