This version of the SSToSS database contains structural templates for the single-member superfamilies of the PASS2 database. Structural templates are identified by considering the overlap of high sequence conservation with structural features. The SMoS database provides structural motifs for multi-member superfamilies, where motif regions are identified from conserved sequence similarity together with structural features such as solvent inaccessibility, secondary structure content and hydrogen bonding. The SSToSS database, in turn, provides sequence-structural template regions for the single structural representative of each superfamily. Sequentially conserved parts are identified from the alignment of homologous sequences, and these conserved regions are then mapped onto the structural parameters listed above. Regions with high sequence similarity as well as high structural feature content are selected as sequence-structural templates for the particular protein in each superfamily. These sequence-structural templates can thus be regarded as both sequentially and structurally important segments of the superfamily member.
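The selection step described above can be illustrated with a minimal sketch. It assumes that each residue already has a conservation score and a combined structural feature score, both scaled to the range 0 to 1. The function name `find_template_regions`, the cut-off values and the minimum region length are hypothetical choices for illustration and are not the parameters used by SSToSS.

```python
from typing import List, Sequence, Tuple


def find_template_regions(
    conservation: Sequence[float],
    structural: Sequence[float],
    cons_cut: float = 0.7,    # hypothetical conservation threshold
    struct_cut: float = 0.5,  # hypothetical structural-feature threshold
    min_len: int = 3,         # hypothetical minimum region length
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (0-based, inclusive) of contiguous
    stretches where both scores reach their thresholds."""
    if len(conservation) != len(structural):
        raise ValueError("score lists must have the same length")

    regions: List[Tuple[int, int]] = []
    start = None
    for i, (c, s) in enumerate(zip(conservation, structural)):
        passes = c >= cons_cut and s >= struct_cut
        if passes and start is None:
            start = i                      # open a new candidate region
        elif not passes and start is not None:
            if i - start >= min_len:       # keep only long enough regions
                regions.append((start, i - 1))
            start = None
    # close a region that runs to the end of the sequence
    if start is not None and len(conservation) - start >= min_len:
        regions.append((start, len(conservation) - 1))
    return regions


conservation = [0.9, 0.8, 0.9, 0.2, 0.9, 0.9, 0.9, 0.9]
structural = [0.6, 0.7, 0.8, 0.9, 0.1, 0.6, 0.6, 0.6]
regions = find_template_regions(conservation, structural)
```

In this toy input, position 3 fails on conservation and position 4 fails on structural content, so two candidate regions are reported on either side of them.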
Templates are marked on the protein sequence using different color codes. The same color code is maintained in the dendrogram and in the spatial representation of the templates in three-dimensional coordinates.
For each set of templates, the three-dimensional orientation is represented in terms of spatial distances. The distance and absolute angle of each motif with respect to the centre of mass of the protein are calculated, and average values for the superfamily members are presented in tabular format in the spatial orientation section. Inter-motif distances and torsional angles are also calculated and presented in matrix format.
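The distance part of this description can be sketched as follows. The sketch assumes one coordinate per residue (for example the C-alpha atom) and approximates the centre of mass by the unweighted centroid of those coordinates. Angles are left out because the text does not specify how they are defined. The names `centroid` and `spatial_summary` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def centroid(points: Sequence[Point]) -> Point:
    """Unweighted mean position (assumption: equal mass per residue)."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def spatial_summary(
    coords: Sequence[Point],
    motifs: Dict[str, Tuple[int, int]],
):
    """coords: one (x, y, z) per residue.
    motifs: name -> (start, end) residue indices, 0-based and inclusive.
    Returns the motif-to-centre distances, the motif names in matrix order,
    and the inter-motif distance matrix."""
    centre = centroid(coords)
    motif_centres = {
        name: centroid(coords[start:end + 1])
        for name, (start, end) in motifs.items()
    }
    to_centre = {
        name: math.dist(c, centre) for name, c in motif_centres.items()
    }
    names: List[str] = list(motif_centres)
    matrix = [
        [math.dist(motif_centres[a], motif_centres[b]) for b in names]
        for a in names
    ]
    return to_centre, names, matrix


# Toy example: four residues on a line, two motifs of two residues each.
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
to_centre, names, matrix = spatial_summary(coords, {"M1": (0, 1), "M2": (2, 3)})
```

Here both motifs lie 2 units from the centre and 4 units from each other, which is what the returned table and matrix contain.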
The occurrence of several conserved motifs is often more informative than the presence of a single motif. Multiple-motif-based search tools have been found to be useful in the past (Bailey and Gribskov, 1997; Grundy et al., 1997; Chakrabarti et al., 2005). In the SSToSS database we have incorporated a related-sequence search option based on multiple-pattern matching combined with a search for statistically significant sequence similarity (Chakrabarti et al., 2005). The specificity of the search engine is increased by utilizing the inter-motif spacing and pairwise global alignment of the query and hits.
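To illustrate why inter-motif spacing adds specificity, the following sketch scans a sequence for an ordered set of motifs and accepts a hit only if the gap between consecutive motifs falls inside an allowed range. It is not the SCANMOT implementation: motifs are written as regular expressions, the statistical significance and global alignment steps are omitted, and the names `find_hits` and `scan_sequence` are hypothetical.

```python
import re
from typing import List, Optional, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) with end exclusive


def find_hits(pattern: "re.Pattern[str]", seq: str) -> List[Span]:
    """All matches of the pattern, including overlapping ones."""
    hits = []
    for pos in range(len(seq)):
        m = pattern.match(seq, pos)   # match anchored at this position
        if m and m.end() > m.start():
            hits.append((m.start(), m.end()))
    return hits


def scan_sequence(
    seq: str,
    motifs: Sequence[str],
    spacing: Sequence[Tuple[int, int]],
) -> Optional[List[Span]]:
    """motifs: regular expressions in their expected order.
    spacing: (min_gap, max_gap) allowed between consecutive motifs.
    Returns the first chain of motif positions that satisfies every
    spacing constraint, or None if there is no such chain."""
    if len(spacing) != len(motifs) - 1:
        raise ValueError("need one spacing range per consecutive motif pair")
    hits = [find_hits(re.compile(m), seq) for m in motifs]

    def extend(idx: int, prev_end: int) -> Optional[List[Span]]:
        if idx == len(hits):
            return []                          # every motif has been placed
        lo, hi = spacing[idx - 1]
        for start, end in hits[idx]:
            if lo <= start - prev_end <= hi:   # gap within allowed range
                rest = extend(idx + 1, end)
                if rest is not None:
                    return [(start, end)] + rest
        return None

    for start, end in hits[0]:
        rest = extend(1, end)
        if rest is not None:
            return [(start, end)] + rest
    return None


seq = "AAGKSTLLLLDEADQQ"
motifs = ["G[KR][ST]", "DE[AV]D"]
accepted = scan_sequence(seq, motifs, [(2, 6)])
rejected = scan_sequence(seq, motifs, [(0, 3)])
```

Both motifs occur in the toy sequence, but the second call rejects it because the five-residue gap between them lies outside the allowed range.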
The SSToSS database also uses an alignment algorithm (Chakrabarti et al., 2004) to provide a multiple alignment of the similar sequences identified by the multiple-motif-based database search. It gives the user control over the alignment by accepting sequence-structure template regions as input to the alignment program, so as to achieve a more structurally relevant and functionally useful alignment of protein sequences. The algorithm keeps local conserved regions of the sequences fixed and aligns the rest by normal progressive alignment. The chances of global misalignment are thereby reduced and the possibility of obtaining a better overall alignment is increased (Chakrabarti et al., 2004).
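The idea of fixing conserved regions and aligning only the remainder can be shown with a simplified pairwise sketch. It is not the published algorithm, which produces a multiple alignment by progressive alignment. Here two sequences are aligned, the anchors are supplied by the caller, and the stretches between anchors are aligned with a plain Needleman-Wunsch procedure. The scoring values and the names `needleman_wunsch` and `anchored_align` are assumptions.

```python
from typing import List, Sequence, Tuple


def needleman_wunsch(a: str, b: str, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment with a linear gap penalty (toy scoring)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one best alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def anchored_align(a: str, b: str, anchors: Sequence[Tuple[int, int, int]]):
    """anchors: (start_in_a, start_in_b, length), ordered, non-overlapping.
    Anchored segments are paired column by column without gaps; only the
    stretches between them are aligned freely."""
    out_a: List[str] = []
    out_b: List[str] = []
    pos_a = pos_b = 0
    for start_a, start_b, length in anchors:
        if start_a < pos_a or start_b < pos_b:
            raise ValueError("anchors must be ordered and non-overlapping")
        seg_a, seg_b = needleman_wunsch(a[pos_a:start_a], b[pos_b:start_b])
        out_a += [seg_a, a[start_a:start_a + length]]
        out_b += [seg_b, b[start_b:start_b + length]]
        pos_a, pos_b = start_a + length, start_b + length
    seg_a, seg_b = needleman_wunsch(a[pos_a:], b[pos_b:])
    out_a.append(seg_a)
    out_b.append(seg_b)
    return "".join(out_a), "".join(out_b)


# The shared "GKS" segment is fixed as an anchor in both sequences.
aligned_a, aligned_b = anchored_align("MKTGKSAAL", "MTGKSL", [(3, 2, 3)])
```

Because the anchor columns are never moved, a poor alignment in one flanking stretch cannot shift the conserved region out of register, which is the effect the paragraph above attributes to fixing local conserved regions.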
We also provide three-dimensional model structures of the similar sequences identified by the multiple-motif-based similarity search as potential members of each SSToSS superfamily. The 3D models are built using the program MODELLER (Sali and Blundell, 1993), based on the structure of the superfamily member protein.
Figure 1 provides a flowchart of the identification scheme of sequence-structural templates.
Bailey, T.L. and Gribskov, M. (1997). Score distributions for simultaneous matching to multiple motifs. J Comput Biol. 4, 45–59.
Chakrabarti, S., Bhardwaj, N., Anand, P.A. and Sowdhamini, R. (2004). Improvement of alignment accuracy utilizing sequentially conserved motifs. BMC Bioinformatics. 5, 167–179.
Chakrabarti, S., Anand, A.P., Bhardwaj, N., Pugalenthi, G. and Sowdhamini, R. (2005). SCANMOT: searching for similar sequences using a simultaneous scan of multiple sequence motifs. Nucleic Acids Res. 33, W274–W276.
Grundy, W.N., Bailey, T.L., Elkan, C.P. and Baker, M.E. (1997). Meta-MEME: motif-based hidden Markov models of biological sequences. Comput Appl Biosci. 13, 397–406.
Sali, A. and Blundell, T.L. (1993). Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol. 234, 779–815.