Protocol for rigorous structure-based sequence alignment of distantly related proteins

The alignment procedure, for either pairwise or multiple alignment, can be divided into three phases:
  (I) Initial alignment phase
 (II) Final alignment phase
(III) Alignment assessment phase



----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Note: All the commands should be performed inside the directory where you have the protein structure of your interest, unless otherwise specified. ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

INITIAL ALIGNMENT PHASE

1| Obtain an initial pairwise structural alignment with MINRMS by following option A, or an initial multiple structural alignment with STAMP by following option B.

(A) Steps to run MINRMS to do pairwise structural alignment

(i) Run the MINRMS program with the following command line:
   /$path/minrms -HS first.pdb second.pdb
(The "-HS" option can be used when the PDB files provide no information about secondary structures.)
$path is the location where you installed the program.
For example, if you installed the minrms executable in "/usr/local/minrms/bin/",
then $path=/usr/local/minrms/bin/.

(ii) MINRMS produces a number of aligned MSF files from which the user selects the best alignment. These result files can be loaded into CHIMERA so that the best alignment can be chosen graphically. Usually the best alignment is the one with the highest log-P value at which the longest distance can be met graphically.

(iii) Following this logic, we used a script, "bestalign", to retrieve the best alignment file automatically without the graphical viewer, and then cross-checked the choice in the graphical window. We suggest using the "bestalign" script to obtain the best alignment file if you plan to run many pairwise alignments.
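The selection rule in step (ii) can be sketched in Python. This is not the authors' "bestalign" script. It assumes that the log-P value of each MINRMS output file has already been extracted, because the MINRMS output format is not described here and parsing is left to the user. It only ranks candidates by log-P, so the graphical cross-check in CHIMERA is still needed.

```python
from typing import Dict, List


def rank_alignments(log_p_by_file: Dict[str, float]) -> List[str]:
    """Rank candidate MINRMS alignment files, highest log-P first.

    Assumption: `log_p_by_file` maps each output file name to the log-P
    value reported for it; extracting those values is left to the user.
    Files with equal log-P keep their input order (sorted() is stable,
    also with reverse=True).
    """
    if not log_p_by_file:
        raise ValueError("no candidate alignment files given")
    return sorted(log_p_by_file, key=log_p_by_file.__getitem__, reverse=True)


def best_alignment(log_p_by_file: Dict[str, float]) -> str:
    """Return the file with the highest log-P value (the step (ii) criterion)."""
    return rank_alignments(log_p_by_file)[0]


if __name__ == "__main__":
    # Hypothetical file names and log-P values, for illustration only.
    candidates = {"align_1.msf": 12.4, "align_2.msf": 15.0, "align_3.msf": 9.8}
    print(best_alignment(candidates))  # align_2.msf
```

Ranking the whole list, rather than returning only the maximum, makes it easy to inspect the next few candidates graphically when the top-scoring one looks wrong.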



Example:

To align two proteins named d1hb6a- and d1hbka- with MINRMS, enter the following command in the shell:
/usr/local/minrms/bin/minrms -HS d1hb6a-.pdb d1hbka-.pdb

This is an example output from the previous command

(B) Steps to run STAMP to do initial multiple structural alignment

  (i) Create an input query file using the command "/$path/" with the extension ".database"

 (ii) Run STAMP with "/$path/stamp -l query_file -n 2 -s -slide 5 -prefix query_name -d database_file".

where '-l' gives the input file, '-n' the number of fits, '-s' switches on scan mode, '-slide' the number of residues by which the query slides against each database sequence, '-prefix' the prefix of the output files, and '-d' the database file.

(iii) Run SORTTRANS with "/pathname/sorttrans -f query_name.scan -s Sc 2.0 > query_name.sorted".

(iv) Run TRANSFORM with "/pathname/transform -f query_name.sorted -g", where "-g" requests graphical output.

 (v) Run POSTSTAMP with "/pathname/poststamp -f query_name -min 0.5" to check whether each position in the structural alignment is structurally equivalent across all members of the alignment. It also counts the pairwise comparisons whose Pij value is greater than or equal to the cutoff given by "-min" (here 0.5).

(vi) Run STAMP_CLEAN with "/pathname/stamp_clean query_name.post 3 > query_name.clean" to remove structurally meaningless regions from the alignment.

(vii) Run ACONVERT with "/pathname/aconvert -in b -out p < query_name.clean > query_name.ali" to convert the STAMP block file into the INITIAL STRUCTURAL ALIGNMENT in clustalW or MSF format.
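Steps (ii) to (vii) can be chained in a small Python driver. This is a sketch and not part of STAMP. It reproduces only the commands and options given above, and it assumes that all STAMP executables sit in one directory (the hypothetical `bin_dir`). The shell redirections (`<`, `>`) are performed through `subprocess` file handles rather than a shell, so `stamp_clean` receives `3` as an ordinary argument. Step (i) is not included here because its command is not given and must be done beforehand.

```python
import subprocess
from contextlib import ExitStack
from pathlib import Path
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    argv: List[str]
    stdin: Optional[str] = None   # file fed on standard input ("<"), if any
    stdout: Optional[str] = None  # file receiving standard output (">"), if any


def stamp_steps(bin_dir: str, query_file: str, database_file: str,
                name: str) -> List[Step]:
    """Build STAMP steps (ii)-(vii) with the options used in the protocol."""
    b = Path(bin_dir)
    return [
        Step([str(b / "stamp"), "-l", query_file, "-n", "2", "-s",
              "-slide", "5", "-prefix", name, "-d", database_file]),
        Step([str(b / "sorttrans"), "-f", f"{name}.scan", "-s", "Sc", "2.0"],
             stdout=f"{name}.sorted"),
        Step([str(b / "transform"), "-f", f"{name}.sorted", "-g"]),
        Step([str(b / "poststamp"), "-f", name, "-min", "0.5"]),
        # "3" is an argument to stamp_clean; output goes to the .clean file.
        Step([str(b / "stamp_clean"), f"{name}.post", "3"],
             stdout=f"{name}.clean"),
        Step([str(b / "aconvert"), "-in", "b", "-out", "p"],
             stdin=f"{name}.clean", stdout=f"{name}.ali"),
    ]


def run_steps(steps: List[Step], workdir: str = ".") -> None:
    """Run each step in `workdir`; stop at the first non-zero exit status."""
    wd = Path(workdir)
    for step in steps:
        with ExitStack() as stack:
            fin = stack.enter_context(open(wd / step.stdin)) if step.stdin else None
            fout = (stack.enter_context(open(wd / step.stdout, "w"))
                    if step.stdout else None)
            subprocess.run(step.argv, cwd=wd, stdin=fin, stdout=fout, check=True)
```

A call would look like `run_steps(stamp_steps("/usr/local/stamp/bin", "query_file", "database_file", "query_name"))`, where the installation path is only an example. Building the command list separately from running it lets you print and check the commands before executing anything.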



FINAL ALIGNMENT PHASE

2| Accessory files. Accessory files containing information such as solvent accessibility, secondary structure and hydrogen bonds can be obtained as separate files with the following single command from the JOY-5 package: "/pathname/joy filename.ali"


3| Initial equivalences

(A) JOY-4v. If you are using the JOY-4v package, obtain the initial equivalences with the command "/pathname/joy -m filename.ali". The result file "filename.mnt" is created automatically and must then be renamed from "filename.mnt" to "mnf1.inp".

(B) SSTEQ. If you are using our SSTEQ script, run it from the directory just outside the working directory as "perl SSTEQ.pl directory-name". The result file "mnf1.inp" is created automatically inside the directory, so no renaming is needed for the following steps.


4| Steps to run COMPARER to obtain the final alignment.

The initial equivalences obtained above are fed to the COMPARER package as a steering input file in the following steps.

(A) First stage. Before running the following steps, create two files: "codes.nam", which lists the input structures, and "codes.tre", which describes the relationships between the input structures.

  (i) PREMNF. Run PREMNF to perform pairwise least-squares superposition with the command "/pathname/pmnfc mnf1.inp", using the steering input file mnf2.inp, which can be copied from the example directory of your COMPARER installation.
 (ii) MNFC. Run the command "/pathname/mnfc" with the steering input file mnfc.inp, which is created as an output file by the previous step.
(iii) HPB2. Run the command "/pathname/hpb2" to obtain hydrophobic contacts in the output file "filename.hpc".
 (iv) HBOND. Run "/pathname/hbond filename.pdb" to obtain side-chain hydrogen bonds in the output file "filename.shb".


(B) Second stage (simulated annealing)

  (i) PANN9. Run the command "/pathname/pann9" with the input file pann9.inp, which can be copied from the example directory of the COMPARER package. This program produces, for each protein, a file that defines all the relationships of the selected type in that protein.
 (ii) SPLITTER. Optionally, run the command "/pathname/splitter" to produce separate relationship tables from "mixed relationship" files.
(iii) PREANN. Run the command "/pathname/preann" to construct the steering data file for the ANN9 program.
 (iv) ANN9. Run the command "/pathname/ann9" to produce several pairwise alignments.
 (v) POSTANN. Run the command "/pathname/postann" to transform the filename.ann files into AM13 format.


(C) Third stage (final alignment)

  (i) PRDGP. Run the command "/pathname/prdgp" to obtain gap penalties.
 (ii) AM13. Run the command "/pathname/am13" to obtain the final alignment in COMPARER format. AM13 uses various parameter files and the output files from the previous steps.
(iii) ALNPAP. Run the command "/pathname/alnpap" to obtain the alignment in PIR format.
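The order of the COMPARER programs in stages A to C can also be captured in a driver sketch. The sketch is hypothetical. It assumes that each program reads its steering files (codes.nam, codes.tre, mnf1.inp, mnf2.inp, pann9.inp and the files produced by earlier programs) from the working directory, as the text implies. It passes only the arguments shown above, and it runs HBOND once per structure, which is also an assumption. The optional SPLITTER step is included and can be removed from the list.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence

# Files the protocol names as inputs that must exist before the run
# (mnf2.inp and pann9.inp are copied from COMPARER's example directory).
REQUIRED_INPUTS = ["codes.nam", "codes.tre", "mnf1.inp", "mnf2.inp", "pann9.inp"]

# Programs after HBOND, in protocol order; run without arguments.
LATER_PROGRAMS = ["pann9", "splitter", "preann", "ann9", "postann",
                  "prdgp", "am13", "alnpap"]


def comparer_commands(bin_dir: str, pdb_files: Sequence[str]) -> List[List[str]]:
    """Return the COMPARER command lines for stages A-C in protocol order."""
    b = Path(bin_dir)
    cmds = [[str(b / "pmnfc"), "mnf1.inp"],  # (A)(i)   PREMNF
            [str(b / "mnfc")],               # (A)(ii)  MNFC
            [str(b / "hpb2")]]               # (A)(iii) HPB2
    # (A)(iv) HBOND: assumed to be run once per input structure.
    cmds += [[str(b / "hbond"), pdb] for pdb in pdb_files]
    cmds += [[str(b / prog)] for prog in LATER_PROGRAMS]  # stages B and C
    return cmds


def missing_inputs(workdir: str) -> List[str]:
    """List the required input files that are absent from `workdir`."""
    return [f for f in REQUIRED_INPUTS if not (Path(workdir) / f).is_file()]


def run_comparer(workdir: str, bin_dir: str, pdb_files: Sequence[str]) -> None:
    """Check the inputs, then run every program, stopping at the first failure."""
    missing = missing_inputs(workdir)
    if missing:
        raise FileNotFoundError("missing input files: " + ", ".join(missing))
    for argv in comparer_commands(bin_dir, pdb_files):
        subprocess.run(argv, cwd=workdir, check=True)
```

Checking for the steering files before starting avoids a run that fails halfway through the simulated-annealing stage because pann9.inp was never copied.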


5| Final equivalences. Calculate the equivalences in the final alignment using the same procedure as in step 3.

6| Superposed coordinates. Superposed coordinates can be obtained with JOY-3.2v by running the command "/pathname/mnyfit -f" with the steering input file obtained in the previous step.


This is an example output (IL-8-like members) from the previous steps.

ALIGNMENT ASSESSMENT PHASE

Alignments derived purely from sequences or from structure-based properties can be assessed, at the level of superfamily relationships, for structural deviation after rigid-body superposition and for secondary-structure equivalence.

7| Mean RMSD. The mean root-mean-square deviation (RMSD) of each structure can be measured one-against-all within the group of structures being compared. From the analysis of carefully curated alignments in a previous version of the database (ref. 10), we found that, despite distant relationships, this value is generally less than 5.5 Å. Therefore, any superfamily member in the derived alignment with a mean RMSD greater than 5.5 Å is best treated as an outlier and removed.
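The mean-RMSD filter can be sketched as follows. Assumptions: the structures have already been superposed (step 6), the RMSD for each pair is computed over that pair's equivalenced positions (for example, Cα atoms), and coordinates are plain (x, y, z) tuples, so reading PDB files is left out. The 5.5 Å cutoff is the value quoted above.

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between equivalenced, already superposed atoms (no fitting here)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(a, b))
    return math.sqrt(total / len(a))


def mean_rmsd(members: Sequence[str],
              pair_rmsd: Dict[FrozenSet[str], float]) -> Dict[str, float]:
    """Mean RMSD of each member against all the others (one-against-all)."""
    if len(members) < 2:
        raise ValueError("need at least two members")
    return {m: sum(pair_rmsd[frozenset((m, o))] for o in members if o != m)
               / (len(members) - 1)
            for m in members}


def outliers(means: Dict[str, float], cutoff: float = 5.5) -> List[str]:
    """Members whose mean RMSD exceeds the cutoff (in Angstrom)."""
    return sorted(m for m, v in means.items() if v > cutoff)


if __name__ == "__main__":
    # Hypothetical pairwise RMSD values (Angstrom) for three members.
    pairs = {frozenset(("A", "B")): 1.0, frozenset(("A", "C")): 7.0,
             frozenset(("B", "C")): 9.0}
    means = mean_rmsd(["A", "B", "C"], pairs)
    print(means)            # {'A': 4.0, 'B': 5.0, 'C': 8.0}
    print(outliers(means))  # ['C']
```

Keying pairs by `frozenset` makes the lookup order-independent, so each pairwise RMSD only needs to be stored once.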

8| Secondary structure equivalences. Superfamily-level relationships imply high structural similarity and secondary-structure equivalence. Therefore, one can count the alignment positions at which the same secondary structure is retained in more than 75% of the members, and normalise this count by the mean number of non-gap positions per member over the entire alignment. From the analysis of carefully curated alignments in a previous version of the database, we found that this normalised secondary-structure equivalence is of the order of 30%. This value can be adopted as a threshold: a superfamily alignment whose value falls below it is likely to be significantly poorly aligned.
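The normalised secondary-structure equivalence can be sketched in Python. Two assumptions are not stated in the protocol. First, each member's secondary structure is given as a string aligned to the alignment, with one character per column and '-' for gaps. Second, only helix ('H') and strand ('E') states count as equivalent secondary structure. Adjust `regular` if your assignment uses other codes. The 75% and 30% figures are the values quoted above.

```python
from collections import Counter
from typing import Iterable, Sequence


def ss_equivalence(ss_rows: Sequence[str], min_fraction: float = 0.75,
                   regular: Iterable[str] = ("H", "E"), gap: str = "-") -> float:
    """Normalised secondary-structure equivalence of an alignment.

    Counts alignment columns in which the most common regular state occurs
    in more than `min_fraction` of all members, then divides that count by
    the mean number of non-gap positions per member.
    """
    n = len(ss_rows)
    if n == 0 or len({len(r) for r in ss_rows}) != 1:
        raise ValueError("need at least one row, all of the same length")
    states = set(regular)
    conserved = 0
    for column in zip(*ss_rows):
        counts = Counter(c for c in column if c in states)
        if counts and counts.most_common(1)[0][1] / n > min_fraction:
            conserved += 1
    mean_non_gap = sum(len(r) - r.count(gap) for r in ss_rows) / n
    if mean_non_gap == 0:
        raise ValueError("alignment contains only gaps")
    return conserved / mean_non_gap


def poorly_aligned(ss_rows: Sequence[str], threshold: float = 0.30) -> bool:
    """Flag an alignment whose normalised equivalence falls below the threshold."""
    return ss_equivalence(ss_rows) < threshold


if __name__ == "__main__":
    # Hypothetical secondary-structure strings for four aligned members.
    rows = ["HHHE-", "HHHEC", "HHCEC", "HHHE-"]
    print(round(ss_equivalence(rows), 3))  # 0.667
```

In the example, column 3 has helix in exactly 3 of 4 members (75%), which is not *more than* 75%, so only columns 1, 2 and 4 count. Dividing those 3 positions by the mean non-gap length of 4.5 gives about 0.667.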

-------------
END
-------------