3PFDB - Database of Best Representative PSSM Profiles of Protein Families:
Sensitive sequence search techniques play a pivotal role in the post-genome era. The huge volume of sequence data generated by high-throughput sequencing experiments needs to be annotated rapidly and effectively using sensitive sequence search methods, as a first step towards understanding the biological implications of individual sequences. Because it is not practical to validate experimentally the function of every sequence from the available genome projects, bioinformatics tools are widely used to improve the functional annotation of sequence data with robust sequence annotation programs. The BLAST suite of programs is the first choice for such annotation of individual protein sequences. Position-Specific Iterative BLAST (PSI-BLAST) is one of the most sensitive programs in the BLAST suite; it searches using Position-Specific Scoring Matrices (PSSMs). PSI-BLAST finds protein sequences similar to a query sequence, constructs a PSSM from the resulting alignment, and can save the PSSM built over its iterations. It can also be used effectively to measure residue conservation in a set of sequences.
3PFDB provides a collection of profiles generated with PSI-BLAST and assessed with the FASSM method. The FASSM (Function Association using Sequence & Structure Motifs) algorithm can be used to assess how well each individual sequence in a given sequence family generates a PSSM profile. The method is especially useful for detecting difficult relationships across protein families. It has been shown that FASSM can assign function to sequences belonging to difficult categories such as discontinuous domains, small domains and circular permutations in domains. FASSM has been demonstrated to perform accurate family associations at sequence identities as low as 15%.
We have created a database of PSSM profiles rigorously assessed by coverage analysis based on the FASSM score. We generated approximately 1.8 million profiles, and the profiles with good coverage scores were stored in a browsable database of PSSM profiles, 3PFDB.
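To make the notion of a PSSM concrete, the following is a minimal sketch of how a position-specific log-odds matrix can be derived from an ungapped alignment. It is a simplification for illustration only: it assumes a uniform background frequency and a fixed pseudocount, and it does not reproduce the sequence weighting or pseudocount scheme that PSI-BLAST actually applies. The names `build_pssm` and `score_sequence` are hypothetical and are not part of 3PFDB or BLAST.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column.

    Assumptions (illustrative only): uniform background frequencies and
    a flat pseudocount; gap or unknown characters are ignored.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        # Count the residues observed in this column.
        counts = Counter(
            seq[col] for seq in alignment if seq[col] in AMINO_ACIDS
        )
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        # Log-odds of the smoothed column frequency against the background.
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_sequence(pssm, sequence):
    """Sum the column scores of a sequence of the same length as the PSSM."""
    if len(sequence) != len(pssm):
        raise ValueError("sequence length must match the PSSM length")
    # Residues outside the 20 standard amino acids contribute 0.
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, sequence))


toy_alignment = ["ACD", "ACE", "ACD"]
toy_pssm = build_pssm(toy_alignment)
```

In this toy alignment the conserved residues score positively at their own positions and negatively elsewhere, so a family-like sequence such as `ACD` scores higher against `toy_pssm` than an unrelated one such as `WWW`.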
3PFDB - Methodology: Pfam-based family-specific profiles were generated by considering each individual sequence in a family as a reference sequence. The FASSM method and a coverage analysis score based on FASSM are used as the filtering step to identify the best hit among the different members belonging to the seed or full category. We have considered full-length and domain sequences in different situations to generate the final profile for a given family.
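The selection step described above can be pictured as a simple loop: build one profile per candidate reference sequence, score each profile by its coverage of the family, and keep the best. The sketch below shows only that control flow. Everything in it is an assumption made for illustration: `pick_best_reference`, `toy_build_profile` and `toy_coverage` are hypothetical names, the "profile" is just the reference sequence itself, and the coverage measure (fraction of family members with at least 50% identity to the reference) is a stand-in, not the FASSM-based coverage score used for 3PFDB.

```python
def pick_best_reference(family, build_profile, coverage_score):
    """Return (reference_id, score, profile) for the best-covering reference.

    family: dict mapping sequence id -> sequence.
    build_profile / coverage_score: caller-supplied functions (hypothetical
    placeholders for PSI-BLAST profile generation and FASSM-based scoring).
    Ties are resolved in favour of the first reference encountered.
    """
    best = None
    for ref_id, ref_seq in family.items():
        profile = build_profile(ref_seq)
        score = coverage_score(profile, family)
        if best is None or score > best[1]:
            best = (ref_id, score, profile)
    return best


def toy_build_profile(reference_sequence):
    # Stand-in: a real pipeline would run PSI-BLAST here.
    return reference_sequence


def toy_coverage(profile, family, min_identity=0.5):
    # Stand-in: fraction of members sharing >= min_identity with the profile.
    def identity(a, b):
        matches = sum(1 for x, y in zip(a, b) if x == y)
        return matches / max(len(a), len(b))

    covered = sum(
        1 for seq in family.values() if identity(profile, seq) >= min_identity
    )
    return covered / len(family)


toy_family = {"s1": "AAAA", "s2": "AAAC", "s3": "CCCC"}
best_id, best_score, best_profile = pick_best_reference(
    toy_family, toy_build_profile, toy_coverage
)
```

Passing the profile builder and the scoring function as arguments keeps the loop independent of how profiles are actually generated and assessed.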
3PFDB - Database Statistics:
Number of Pfam families in the current release of Pfam (Pfam 22.0, July 2007) : 9318 families
Number of profiles created using individual reference sequences and assessed using coverage analysis of the FASSM score for development of 3PFDB : 10, 85588
Number of Pfam families with a representative in 3PFDB : 8,524 families
Number of Pfam families without representative PSSMs in 3PFDB : 794 families
3PFDB - Database Features:
Best representative PSSM profile
FASSM based Coverage Analysis Results
PSIMOT motifs extracted using the PSIMOT routine of FASSM
PSIMOT Motifs marked on PSSM
Sequence-based PCA plot of the Protein Family
Alignment of Protein Family
Download PSSM, HMM Model and alignment
Details about Pfam Families
3PFDB - References:
K. Gaurav, N. Gupta and R. Sowdhamini (2005) FASSM: Enhanced Function Association in whole genome analysis using Sequence and Structural Motifs. In Silico Biology, 5, 0040.
R.D. Finn et al. (2008) The Pfam protein families database. Nucleic Acids Res., 36, D281–D288.
S.F. Altschul et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
S. Eddy (1998) Profile hidden Markov models. Bioinformatics, 14, 755–763.
3PFDB - Team : Prof. R. Sowdhamini (Contact : mini@ncbs.res.in)
K. Shameer, P. Nagarajan, Gaurav Kumar