About insectOR
The insect olfactory receptor (OR) gene family contains divergent proteins that differ from one insect order to another and also across subfamilies. Discovering these genes in newly sequenced non-model genomes is therefore a great challenge. As no good one-stop method for detecting and curating this family is available yet, scientists do it manually, navigating through thousands of homology-based alignments.
This tool simplifies this task. Usually, multiple insect OR queries are used to provide the sensitivity needed to detect these highly diversified proteins. As a result, various queries may align at the same gene location on the provided genome, but with different gene and intron-exon boundaries. This tool filters such overlapping alignments (together called an alignment cluster) to give the best possible gene model for every unique alignment cluster location on the genome. It also joins consecutive partial gene models arising from the same protein query to provide a better composite gene model. InsectOR takes as input thousands of Exonerate alignments of well-curated OR proteins against the genome of interest and filters them down to fewer, more accurate gene models. Submit your query alignment here.
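The filtering idea described above can be illustrated with a minimal Python sketch. It is not insectOR's actual implementation: the `Alignment` record, the choice to cluster each strand separately, and the use of the highest alignment score as the "best" criterion are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for one query-to-genome alignment."""
    query: str    # name of the OR protein query
    target: str   # scaffold or chromosome name
    strand: str   # "+" or "-"
    start: int    # 1-based inclusive start on the target
    end: int      # 1-based inclusive end on the target
    score: float  # alignment score (assumed ranking criterion)


def cluster_overlapping(alignments):
    """Group alignments that overlap on the same target and strand."""
    ordered = sorted(alignments, key=lambda a: (a.target, a.strand, a.start))
    clusters = []
    for _, group in groupby(ordered, key=lambda a: (a.target, a.strand)):
        current, current_end = [], None
        for aln in group:
            if current and aln.start <= current_end:
                # Overlaps the running cluster: extend it.
                current.append(aln)
                current_end = max(current_end, aln.end)
            else:
                # No overlap: close the previous cluster and start a new one.
                if current:
                    clusters.append(current)
                current, current_end = [aln], aln.end
        if current:
            clusters.append(current)
    return clusters


def best_per_cluster(alignments):
    """Keep one alignment per cluster (assumption: the highest score wins)."""
    return [max(c, key=lambda a: a.score) for c in cluster_overlapping(alignments)]
```

In this sketch, two queries aligning to overlapping coordinates on the same scaffold and strand collapse into one cluster, and only the higher-scoring alignment is kept as the representative gene model for that location.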
Exonerate provides good-quality, intron-aware alignments in a relatively short time, so its output is used as the input for this server. InsectOR expects protein2genome Exonerate alignments along with their alignment boundaries on the target in GFF format. These can be generated with a minimal command such as either of the following:
exonerate --model protein2genome --maxintron <~2000 or less> <protein queries> <genome sequence> --showtargetgff TRUE
exonerate --model protein2genome --maxintron <~2000 or less> <protein queries> <genome sequence> --showtargetgff TRUE -p pam250
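To check an Exonerate output file before submitting it, the gene-level boundaries can be pulled out of the GFF section with a short Python sketch like the one below. It assumes the usual nine tab-separated GFF columns and that `gene` features carry the query name in a `sequence` attribute; verify both against your own Exonerate output. The function names are hypothetical and are not part of insectOR.

```python
def parse_attributes(field):
    """Parse a GFF2-style attribute column such as 'gene_id 1 ; sequence X'."""
    attrs = {}
    for part in field.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip()
    return attrs


def read_gene_features(lines):
    """Return one dict per 'gene' feature found in mixed Exonerate output."""
    genes = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        # Skip comments, alignment text and any non-GFF lines.
        if line.startswith("#") or len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = parse_attributes(cols[8])
        genes.append({
            "target": cols[0],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "score": float(cols[5]),
            "strand": cols[6],
            # Assumed attribute name for the protein query.
            "query": attrs.get("sequence"),
        })
    return genes
```

A typical use would be `read_gene_features(open("alignments.out"))`, where `alignments.out` is a placeholder name for the file holding the redirected Exonerate output.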
Insect OR GWS can join consecutive alignments on the genome that arise from consecutive regions of the same protein query, or that share a small portion of the query at both locations. Hence a --maxintron value of 2000 or less works with almost the same efficiency. Avoid a higher --maxintron cutoff, as it may stitch two consecutive genes into a single alignment and return wrong gene models.
(Please note that a --maxintron value of 2000 or less was found to work best for finding insect ORs in many bee species (see references). It can be changed for distantly related insect species if the average intron lengths of their ORs differ significantly from those of bees.)
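The joining of consecutive partial alignments described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the server's code: the `Partial` record, the `max_gap` and `max_query_overlap` thresholds, and the restriction to forward-strand alignments are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Hypothetical partial alignment (forward strand only in this sketch)."""
    query: str    # protein query name
    q_start: int  # start of the aligned region on the query
    q_end: int    # end of the aligned region on the query
    target: str   # scaffold or chromosome name
    t_start: int  # start on the target
    t_end: int    # end on the target


def can_join(left, right, max_gap=2000, max_query_overlap=20):
    """True if `right` continues `left`; both thresholds are assumed values."""
    return (
        left.query == right.query
        and left.target == right.target
        # The second piece lies downstream, within the allowed genomic gap.
        and 0 <= right.t_start - left.t_end <= max_gap
        # It extends the query coverage ...
        and right.q_end > left.q_end
        # ... and may share only a small portion of the query with the first.
        and right.q_start >= left.q_end - max_query_overlap
    )


def join_partials(partials, **thresholds):
    """Merge runs of joinable partial alignments into composite models."""
    ordered = sorted(partials, key=lambda p: (p.query, p.target, p.t_start))
    merged = []
    for p in ordered:
        if merged and can_join(merged[-1], p, **thresholds):
            prev = merged[-1]
            merged[-1] = Partial(prev.query, prev.q_start, p.q_end,
                                 prev.target, prev.t_start, p.t_end)
        else:
            merged.append(p)
    return merged
```

Keeping the genomic gap threshold small mirrors the advice on --maxintron: pieces that lie far apart on the genome are more likely to belong to neighbouring genes than to one gene, so the sketch leaves them unjoined.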
The output displayed on insectOR has a maximum of 6 tabs. Details are explained below.
If you use any of the TMH prediction methods, the 7tm_6 hmmsearch, the MAST motif search tool or the Dalliance genome viewer, kindly cite the related articles listed in the references section.
You might find our other work interesting -
Snehal Karpe, Murugavel Pavalam, Vikas Tiwari & R. Sowdhamini
https://github.com/sdk15/insectOR
Prof. R. Sowdhamini
National Centre for Biological Sciences (NCBS),
Tata Institute of Fundamental Research (TIFR),
Bangalore - 560065
India
mini@ncbs.res.in