Help Page: 3dswap-pred - Prediction of 3D domain swapping from protein sequence:
3dswap-pred is a web server for predicting the structural phenomenon "3D domain swapping" from protein sequence data. It uses a machine learning approach based on the ensemble classifier "Random Forest" for the prediction. The server provides an a priori method for studying 3D domain swapping, supporting approaches that scan sequences to identify putative members that may be involved in 3D domain swapping.
The positive sequence dataset was obtained from a curated database of protein structures reported to be involved in 3D domain swapping. This dataset is currently being compiled as a database of 3D domain swapping in proteins (3Dswap: Knowledgebase of 3D Domain swapping in proteins). Based on literature curation and structure analyses, protein structures with well-defined "hinge regions" and "swapped regions" were included in the positive dataset. A total of 805 sequences were extracted from 299 structures using custom Perl scripts. Redundant sequences were removed using CD-HIT at a 40% cut-off.
The negative dataset was derived using a novel data mining approach. To add diversity to the negative dataset and to avoid potential bias within it, we retrieved the representative sequence of one structure from each SCOP superfamily. We considered only the four major structural classes: all-β, all-α, α+β and α/β. From this large pool of negative sequences, we removed representative superfamily members that are present in the positive dataset. We then removed remaining redundant sequences using CD-HIT at a 40% cut-off. As a validation step for the negative dataset, we used the DIAL server to scan the structural coordinates of the proteins in it, to ensure that it contains only non-swapped cases. Only single continuous domains reported by the DIAL server were considered in the final dataset used for training and testing the ensemble classifier.
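The negative-set filtering described above can be sketched in Python as follows. This is a minimal illustration and not the scripts used to build the actual dataset: the function name, the SCOP-style identifiers and the sequences are hypothetical placeholders, and the CD-HIT and DIAL steps are not shown. It assumes the standard SCOP class letters (a = all-α, b = all-β, c = α/β, d = α+β).

```python
# SCOP class letters for the four major structural classes:
# a = all-alpha, b = all-beta, c = alpha/beta, d = alpha+beta
MAJOR_CLASSES = {"a", "b", "c", "d"}


def filter_negative_pool(pool, positive_superfamilies):
    """Keep one representative per superfamily, restricted to the four
    major classes, and drop superfamilies seen in the positive dataset.

    pool maps a SCOP-style superfamily identifier (class.fold.superfamily)
    to a representative sequence.
    """
    excluded = set(positive_superfamilies)
    kept = {}
    for superfamily, sequence in pool.items():
        scop_class = superfamily.split(".")[0]
        if scop_class in MAJOR_CLASSES and superfamily not in excluded:
            kept[superfamily] = sequence
    return kept


# Hypothetical identifiers and placeholder sequences, for illustration only.
pool = {
    "a.1.1": "MKTAYIAKQR",
    "b.1.1": "GSHMLEDPVA",
    "d.2.1": "MADEEKLPPG",
    "f.1.1": "MLLAVLYCLV",  # class f lies outside the four major classes
}
positives = ["b.1.1"]  # superfamilies already present in the positive dataset

filtered = filter_negative_pool(pool, positives)
print(sorted(filtered))
```

In this toy pool, "b.1.1" is dropped because it also occurs in the positive dataset and "f.1.1" is dropped because it is outside the four major classes.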
Input Method : Paste Sequence File
Step 1 : Paste a protein sequence in FASTA format into the text area on the 3dswap-pred server page
Step 2 : Click the "3dswap-pred" button to submit the FASTA sequence for prediction
Step 3 : Results are returned by the server within 1-2 minutes, depending on the server load
Output / Prediction results from 3dswap-pred Server:
For a successfully submitted protein sequence in FASTA format, the server returns either "Domain-swap" or "non domain-swap", according to the prediction from the Random Forest based model.
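Because the server expects a protein sequence in FASTA format, it can help to check the input locally before pasting it. The Python sketch below is one possible check and does not reproduce the server's own validation. It assumes a single record and the 20 standard amino acid letters. The server may accept more than that, and the function name and example record are hypothetical.

```python
# Assumption: only the 20 standard amino acid one-letter codes are allowed.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def parse_single_fasta(text):
    """Return (header, sequence) for a single-record FASTA string.

    Raises ValueError if the header is missing, more than one record is
    present, the sequence is empty, or unexpected characters occur.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    if any(line.startswith(">") for line in lines[1:]):
        raise ValueError("expected exactly one sequence record")
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("no sequence found after the header")
    invalid = set(sequence) - VALID_RESIDUES
    if invalid:
        raise ValueError("unexpected characters: " + "".join(sorted(invalid)))
    return lines[0][1:], sequence


# Hypothetical example record, for illustration only.
example = ">query_protein hypothetical example\nMKTAYIAKQR\nQISFVKSHFS\n"
header, sequence = parse_single_fasta(example)
print(header, len(sequence))
```

Raising an error with a specific message, rather than silently cleaning the input, makes it clear which part of the record needs fixing before submission.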