SMotif

SMotif is a server for identifying sets of structural motifs in protein structures. Motifs among structurally aligned proteins are recognized by the conservation of amino acid preference and solvent inaccessibility, and are then examined for the conservation of other important structural features such as secondary structure content, hydrogen bonding pattern and residue packing.

Input options

Input 1: Paste Alignment (PIR/FASTA format)

This option should be used if you already have a reliable alignment and want to identify structurally conserved regions within it.

The input alignment should be in PIR or FASTA format. Please have a look at the example input page for more information.

Input 2: Paste sequence of your structure(s) in PIR/FASTA format

This option should be used if the users possess only the sequence(s) of the query protein structure(s).

i). Multiple sequences as input:

A multiple alignment is created by superimposing the 3-D structures of the query proteins using the program STAMP (Russell and Barton, 1994). Structural motifs are recognized from this alignment by the conservation of amino acid preference and solvent inaccessibility, and are examined for the conservation of other features such as secondary structure content, hydrogen bonding and residue packing.

ii). Single sequence as input:
A) Homologous sequences with available 3-D structures are retrieved by running a PSI-BLAST (Altschul et al., 1997) search against a database of proteins with known structures (PDB; Berman et al., 2000). Hits with an E-value lower than 0.001 and an alignment length of at least 70% of the query length are selected for further analysis. A multiple alignment is generated by superimposing the structures. Structural motifs are recognized from this alignment by the conservation of amino acid preference and solvent inaccessibility, and are examined for the conservation of other features such as secondary structure content, hydrogen bonding and residue packing.

B) If no homologous sequence with a known 3-D structure is available, sequence homologues are retrieved from a PSI-BLAST search (E-value lower than 0.001 and alignment length of at least 70% of the query length) against the SWISSPROT sequence database (Apweiler et al., 1997) and/or the Non-Redundant (NR) sequence database obtained from NCBI. A multiple sequence alignment of these sequences is obtained using the program MALIGN (Johnson et al., 1993). Sequentially conserved regions are identified from this alignment and mapped back onto the query structure. The motifs are filtered further by examining the content of important structural features in each sequentially conserved region. The identification of sequence-structural templates in this scheme is similar to the method described for the SSToSS database (Chakrabarti et al., 2006).
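The hit selection used in schemes A and B can be sketched as follows. This is a minimal illustration of the two stated criteria only (E-value lower than 0.001, alignment length at least 70% of the query length); the Hit class, its field names and the example identifiers are hypothetical and do not describe the server's actual code or the PSI-BLAST output format.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 0.001   # hits must have an E-value lower than this
MIN_COVERAGE = 0.70      # alignment length as a fraction of the query length


@dataclass
class Hit:
    # Hypothetical container for one PSI-BLAST hit (names are assumptions).
    subject_id: str
    e_value: float
    alignment_length: int


def select_hits(hits, query_length):
    """Keep hits that satisfy both the E-value and the coverage criterion."""
    selected = []
    for hit in hits:
        coverage = hit.alignment_length / query_length
        if hit.e_value < E_VALUE_CUTOFF and coverage >= MIN_COVERAGE:
            selected.append(hit)
    return selected


# Made-up example hits for a query of 100 residues.
hits = [
    Hit("hitA", 1e-30, 95),   # passes both criteria
    Hit("hitB", 1e-30, 60),   # alignment too short (60% of the query)
    Hit("hitC", 0.5, 98),     # E-value too high
]
print([h.subject_id for h in select_hits(hits, query_length=100)])
```

Only the first hit passes both filters in this made-up example.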

1. The input sequence should be in PIR/FASTA format.
2. The first line should contain the PDB code. Chain information is default.
3. You may submit a single sequence or multiple sequences.

Please have a look at the example input page for more information.

References:

Altschul, S.F. et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389-3402.

Apweiler, R. et al. (1997) Protein sequence annotation in the genome era: the annotation concept of SWISS-PROT, TREMBL. Proceedings of the 5th International Conference on ISMB, pp. 33-43.

Berman, H.M. et al. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235-242.

Chakrabarti, S. et al. (2006) SSToSS - Sequence-Structural Templates of Single-member Superfamilies. In Silico Biol., 6, 0029.

Johnson, M.S. et al. (1993) Alignment and searching for common protein folds using a data bank of structural templates. J. Mol. Biol., 231, 735-752.

Russell, R.B. and Barton, G.J. (1994) Structural features can be unconserved in proteins with similar folds. An analysis of side-chain to side-chain contacts, secondary structure and accessibility. J. Mol. Biol., 244, 332-350.

Input 3: Upload your structure(s) in PDB format

This option should be used if you have the query protein structure(s) and want to identify structural motifs in them. A multiple alignment is created by superimposing the 3-D structures of the query proteins using the program STAMP (Russell and Barton, 1994). Structural motifs are recognized from this alignment by the conservation of amino acid preference and solvent inaccessibility, and are examined for the conservation of other features such as secondary structure content, hydrogen bonding and residue packing.
1. Upload your structure in PDB format
2. Specify name of the structure (optional)

Input 4: Enter PDB code, chain identifier and residue numbers

This option should be used if you want to identify structural motifs for query structure(s) specified by PDB code. Provide the PDB code and chain identifier for each query structure. The protein structure corresponding to the given PDB code is retrieved automatically from our local PDB structure database.

1. Enter the 4-letter PDB code.
2. Enter the chain identifier. If there is no chain identifier, you may use "-".
3. Enter the start and end residue numbers.

Please have a look at the example input page for more information.
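The selection implied by a chain identifier and a residue range can be sketched as below. This is an illustration only, not the server's code: it relies on the fixed-column layout of PDB ATOM records (chain identifier in column 22, residue number in columns 23-26) and assumes that the "-" convention corresponds to a blank chain field. The function names and the demo coordinates are hypothetical.

```python
def atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a fixed-column ATOM record (helper for the demo data only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:3s} {chain}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def select_residues(pdb_lines, chain_id, start, end):
    """Return ATOM records of one chain with residue numbers in [start, end]."""
    # Assumption: "-" (no chain identifier) maps to a blank chain field.
    wanted_chain = " " if chain_id == "-" else chain_id
    selected = []
    for line in pdb_lines:
        if not line.startswith("ATOM") or len(line) < 26:
            continue
        chain = line[21]             # column 22: chain identifier
        res_seq = int(line[22:26])   # columns 23-26: residue sequence number
        if chain == wanted_chain and start <= res_seq <= end:
            selected.append(line)
    return selected


# Made-up records: two residues in chain A, one in chain B, one without chain.
lines = [
    atom_line(1, " CA", "ALA", "A", 5, 11.104, 13.207, 2.100),
    atom_line(2, " CA", "GLY", "A", 12, 12.560, 14.001, 3.250),
    atom_line(3, " CA", "SER", "B", 7, 20.000, 5.500, 1.000),
    atom_line(4, " CA", "LEU", " ", 7, 30.000, 6.500, 2.000),
]
print(len(select_residues(lines, "A", 1, 10)))
print(len(select_residues(lines, "-", 1, 10)))
```

Insertion codes and HETATM records are ignored in this sketch.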

Motif identification for multiple structures

Structural motifs are identified by the presence of at least three consecutive solvent-buried (inaccessible) residues that have high amino acid exchange scores. The conservation of further structural parameters, namely secondary structure content, hydrogen bonding and residue packing (Ooi number; Nishikawa and Ooi, 1986), is also examined among the structurally aligned proteins. A structural feature is considered conserved at an alignment position if it is present in all, or all but one, of the members of the alignment.
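The rule above can be sketched as a scan for runs of qualifying alignment columns. This is a minimal illustration under stated assumptions: the help text does not give the cutoff for a "high" exchange score, so min_score is a hypothetical parameter, and the function names and input layout (one list of per-protein buried flags per column) are chosen for the example only.

```python
def is_conserved(flags):
    """A feature is conserved if present in all or all but one members."""
    return sum(flags) >= len(flags) - 1


def find_motifs(buried_by_column, score_by_column, min_score, min_length=3):
    """Return (start, end) column indices, inclusive, of candidate motifs.

    A column qualifies when burial is conserved and its exchange score is
    at least min_score (a hypothetical cutoff). A motif is a run of at
    least min_length consecutive qualifying columns.
    """
    n = len(buried_by_column)
    motifs = []
    run_start = None
    for col in range(n):
        qualifies = (is_conserved(buried_by_column[col])
                     and score_by_column[col] >= min_score)
        if qualifies and run_start is None:
            run_start = col
        elif not qualifies and run_start is not None:
            if col - run_start >= min_length:
                motifs.append((run_start, col - 1))
            run_start = None
    if run_start is not None and n - run_start >= min_length:
        motifs.append((run_start, n - 1))
    return motifs


# Made-up alignment of three proteins over eight columns.
buried = [
    [True, True, True],
    [True, True, False],
    [True, False, True],
    [False, False, True],
    [True, True, True],
    [True, True, True],
    [True, False, False],
    [True, True, True],
]
scores = [5, 6, 4, 9, 7, 1, 8, 8]
print(find_motifs(buried, scores, min_score=3))
```

In this made-up example only columns 0 to 2 form a run of three qualifying positions; column 5 is buried in all members but fails the score cutoff.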

In the SMotif algorithm, solvent accessibility is measured using the PSA program from the JOY4.0 suite (Mizuguchi et al., 1998). Residues with an accessible surface area of less than 7% are treated as solvent buried (inaccessible). At every alignment position, all possible pairs of proteins and their observed amino acids are scored using a standard 20x20 substitution matrix (Johnson and Overington, 1993) derived from structure-based sequence alignments of homologous protein families. The SSTRUC program, part of the JOY4.0 suite, is used to identify secondary structure positions. The HBOND program, also part of the JOY4.0 suite, is used to identify hydrogen bonds. Residue packing is measured in terms of the Ooi number, which gives the number of residues surrounding the Cα atom of each residue in a protein. A higher Ooi number indicates that the residue is in a well-packed environment.
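The three quantities described above can be sketched as follows. This is not the JOY or SMotif implementation. The matrix values are toy numbers rather than the Johnson and Overington matrix, averaging over pairs is an assumption (the text only says that all pairs are scored), and the Ooi radius is left as a required parameter because it is not stated here.

```python
import math
from itertools import combinations

BURIED_CUTOFF = 7.0  # percent accessibility below which a residue is buried


def is_buried(accessibility_percent):
    """Apply the 7% accessibility cutoff."""
    return accessibility_percent < BURIED_CUTOFF


def column_score(residues, matrix):
    """Average substitution score over all residue pairs in one column."""
    pairs = list(combinations(residues, 2))
    if not pairs:
        return 0.0
    total = sum(matrix[tuple(sorted(pair))] for pair in pairs)
    return total / len(pairs)


def ooi_numbers(ca_coords, radius):
    """Count, for each C-alpha atom, the other C-alpha atoms within radius."""
    return [
        sum(1 for j, other in enumerate(ca_coords)
            if j != i and math.dist(centre, other) <= radius)
        for i, centre in enumerate(ca_coords)
    ]


# Toy symmetric matrix keyed by sorted residue pairs (values are made up).
toy_matrix = {
    ("I", "I"): 4, ("I", "L"): 2, ("L", "L"): 4,
    ("D", "D"): 5, ("D", "I"): -3, ("D", "L"): -3,
}
print(column_score(["L", "I", "L"], toy_matrix))
print(column_score(["L", "D", "I"], toy_matrix))
print(ooi_numbers([(0, 0, 0), (3, 0, 0), (20, 0, 0)], radius=8.0))
```

A column of similar hydrophobic residues scores higher than a mixed column, and the isolated third atom has no neighbours within the chosen radius.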

References:

Johnson, M.S. and Overington, J.P. (1993) A Structural Basis for Sequence Comparisons. An Evaluation of Scoring Methodologies. J. Mol. Biol., 233, 716-738.

Mizuguchi, K. et al. (1998) JOY: protein sequence-structure representation and analysis. Bioinformatics, 14, 617-623.

Nishikawa, K. and Ooi, T. (1986) Radial locations of amino acid residues in a globular protein: correlation with the sequence. J. Biochem. (Tokyo), 100, 1043-1047.

Motif identification for single structure

The SMotif server also accepts the sequence of a single structure. SMotif retrieves structural homologs by running PSI-BLAST against the PDB database. The homologous structures are superimposed using the STAMP program, and the resulting alignment is used to identify motif regions.

If there are no structural homologs, sequence homologs are obtained from the SWISSPROT database and aligned using the MALIGN program. Each sequentially conserved region is mapped onto the query structure and filtered by its structural feature content score. The underlying assumption is that regions that are both highly conserved in sequence and rich in important structural features would also be conserved in structure.

Spatial Distance between motifs

The structural motif regions are transformed into a vector representation by the least-squares fit method (Chou et al., 1984; Srinivasan et al., 1991). Spatial distances between all pairs of motifs are calculated and presented as a matrix.
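One way to picture this step is the sketch below. It fits a least-squares line through the Cα coordinates of each motif (centroid plus principal direction, found here by power iteration) and reports distances between the fitted centres. This is an assumption-laden illustration, not the cited method as implemented by the server: the choice of the centroid as the reference point for the distance, the function names and the coordinates are all hypothetical.

```python
import math


def centroid(points):
    """Mean position of a list of 3-D points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def fit_axis(points, iterations=100):
    """Least-squares line through 3-D points: (centre, unit direction)."""
    c = centroid(points)
    centred = [tuple(p[k] - c[k] for k in range(3)) for p in points]
    # 3x3 scatter matrix of the centred coordinates.
    s = [[sum(q[i] * q[j] for q in centred) for j in range(3)]
         for i in range(3)]
    # Power iteration converges to the dominant eigenvector, which is the
    # direction minimising the summed squared distances to the line.
    # The first-to-last vector is the start guess and fixes the orientation.
    v = tuple(centred[-1][k] - centred[0][k] for k in range(3))
    for _ in range(iterations):
        w = tuple(sum(s[i][j] * v[j] for j in range(3)) for i in range(3))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = tuple(x / norm for x in w)
    return c, v


def distance_matrix(motifs):
    """Pairwise distances between the fitted centres of the motifs."""
    centres = [fit_axis(m)[0] for m in motifs]
    return [[math.dist(a, b) for b in centres] for a in centres]


# Two made-up motifs of three C-alpha atoms each.
motif_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
motif_2 = [(0.0, 3.0, 4.0), (1.0, 3.0, 4.0), (2.0, 3.0, 4.0)]
for row in distance_matrix([motif_1, motif_2]):
    print(row)
```

The resulting matrix is symmetric with zeros on the diagonal; the two example motifs lie 5 units apart.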

Average Torsion Angle between the Motifs

The structural motif regions are transformed into a vector representation by the least-squares fit method (Chou et al., 1984; Srinivasan et al., 1991). Virtual torsion angles between all pairs of motifs are calculated and presented as a matrix.
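A virtual torsion angle between two motif vectors can be sketched as a dihedral angle over four points. The definition used here is an assumption made for illustration: the angle is taken over (end of vector A, centre of A, centre of B, end of B), so that parallel vectors give 0 degrees and antiparallel vectors give 180 degrees. The function names and coordinates are hypothetical, and each motif is assumed to be already reduced to a (start, end) pair of axis points.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def midpoint(a, b):
    return tuple((x + y) / 2.0 for x, y in zip(a, b))


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 axis."""
    u1, u2, u3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n2 = cross(u2, u3)
    y = math.sqrt(dot(u2, u2)) * dot(u1, n2)
    x = dot(cross(u1, u2), n2)
    return math.degrees(math.atan2(y, x))


def virtual_torsion(axis_a, axis_b):
    """Torsion between two (start, end) axes about the line joining centres."""
    (a_start, a_end), (b_start, b_end) = axis_a, axis_b
    return dihedral(a_end, midpoint(a_start, a_end),
                    midpoint(b_start, b_end), b_end)


def torsion_matrix(axes):
    """Pairwise virtual torsion angles; the diagonal is set to zero."""
    return [[0.0 if i == j else virtual_torsion(a, b)
             for j, b in enumerate(axes)]
            for i, a in enumerate(axes)]


# Made-up axes: A along x, B parallel to A, C perpendicular to A.
axis_a = ((-1.0, 0.0, 0.0), (1.0, 0.0, 0.0))
axis_b = ((-1.0, 0.0, 1.0), (1.0, 0.0, 1.0))
axis_c = ((0.0, -1.0, 1.0), (0.0, 1.0, 1.0))
for row in torsion_matrix([axis_a, axis_b, axis_c]):
    print(row)
```

Unlike the distance matrix, this matrix is antisymmetric in sign for non-trivial angles, because swapping the two motifs reverses the sense of rotation.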

References:

Chou, K.C., Némethy, G. and Scheraga, H.A. (1984) J. Am. Chem. Soc., 106, 3161-3170.

Srinivasan, N., Sowdhamini, R., Ramakrishnan, C. and Balaram, P. (1991) In Balaram, P. and Ramaseshan, S. (eds), Molecular Conformation and Biological Interactions. Indian Academy of Sciences, Bangalore, pp. 59-73.

View Motifs on the structures

Chime view

1. Click on the link "Chime view".
2. The Chime window will display the 3-D structure of the domain, colored according to protein chains.
3. Right-click on the Chime window to access the other Chime options.

Rasmol view

* Click on the link "Rasmol".
* Save file to disk.
* Make sure that the file (for example, rasmol.cgi) has been saved properly.

How to view (Linux)

First type rasmol on the command line. The RasMol prompt will appear on the screen. Now type script rasmol.cgi at the prompt.
$ rasmol
RasMol> script rasmol.cgi

Contact:

Dr. R. Sowdhamini (mini@ncbs.res.in)

Dr. Saikat Chakrabarti

Dr. PN. Suganthan

G. Pugalenthi