Validation of PSI-BLAST results:

The SCANMOT algorithm is employed to validate and characterize PSI-BLAST output in order to extract true homologues. The following figure gives the number of homologous proteins identified by PSI-BLAST runs in which each member of the nine families (list) is used as the query sequence. Homologues are searched for in a sequence database in which each SCOP (Murzin et al., 1995) protein entry of known structure is augmented with its closely related sequences; this augmented database provides the expected number of true positives. SCANMOT is applied to every PSI-BLAST output, and homologous sequences are identified on the basis of the presence of motifs characteristic of the query protein family. Proteins containing at least 50% of the total number of identified motifs are regarded as significant homologues.

The number of significant homologues identified by the SCANMOT motif-scanning procedure is very close to the actual number of true positives. The number of false positives increases with higher E-values, in agreement with earlier studies (Muller et al., 1999). It has also been shown that the inclusion threshold, or H-value, has little effect on the number of true positives, which remains very similar across different H-values. With increasing E-values, the number of hits obtained by PSI-BLAST increases by 20-25%, whereas the percentage increase in true positives remains low (5-7%). The number and the inclusion rate (3-8%) of the significant homologues characterized by the presence of motifs are also very similar to those of the true positives.
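
The 50% motif criterion described above can be illustrated with a minimal Python sketch. It is not the SCANMOT implementation: motifs are assumed here to be simple regular expressions, whereas the actual motif-matching procedure may score matches differently. The function names, motif patterns, and hit sequences are all hypothetical and serve only to show how a motif-fraction threshold separates significant homologues from other hits.

```python
import re


def motif_fraction(sequence, motifs):
    """Return the fraction of motif patterns found at least once in the sequence.

    Assumption: each motif is a regular expression; this stands in for the
    real motif-matching step, which is not specified here.
    """
    if not motifs:
        return 0.0
    found = sum(1 for pattern in motifs if re.search(pattern, sequence))
    return found / len(motifs)


def significant_homologues(hits, motifs, min_fraction=0.5):
    """Return the IDs of hits containing at least min_fraction of the motifs."""
    return [
        hit_id
        for hit_id, sequence in hits.items()
        if motif_fraction(sequence, motifs) >= min_fraction
    ]


# Hypothetical family motifs and PSI-BLAST hits (illustrative data only).
motifs = ["G.G..G", "HRD", "DFG", "APE"]
hits = {
    "hitA": "MKGAGSFGKVHRDLKAADFGLA",  # contains 3 of 4 motifs
    "hitB": "MSTHRDLLPAPEKK",          # contains 2 of 4 motifs
    "hitC": "MKKLLDFGTTT",             # contains 1 of 4 motifs
}

print(significant_homologues(hits, motifs))
```

With these illustrative data, hitA and hitB meet the threshold (3/4 and 2/4 motifs, respectively) and hitC does not. The threshold is kept as a parameter so that the effect of a stricter or looser motif requirement can be examined.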