STIFDB2 [Stress Responsive Transcription Factor Database] - Help Page:


1. Introduction:

STIFDB2 (Stress Responsive Transcription Factor Database) is a specialized database that provides information about stress responsive genes and stress inducible transcription factors from Arabidopsis thaliana and Oryza sativa L.
STIFDB2 is an online resource on abiotic stress gene regulation: a comprehensive collection of abiotic stress responsive genes in Arabidopsis thaliana and Oryza sativa L., with options to identify probable Transcription Factor Binding Sites (TFBS) in their promoters. In the response to abiotic stresses such as ABA, drought, dehydration, cold, salinity, high light, heat and heavy metals, ten specific families of transcription factors are known to be involved in Arabidopsis and five in Oryza sativa L. HMM-based models are used to identify binding sites of transcription factors belonging to these families. We have also consulted literature reports to cross-validate the Transcription Factor Binding Sites predicted by the method. Transcriptional regulation of genes in response to these stresses, and to others such as oxidative stress, is an emerging area of plant research.


Stress responsive transcription factors in Arabidopsis are known to belong to the AP2/EREBP, ABI3/VP1, ARF, bHLH, bZIP, HB, HSF, MYB, NAC and WRKY families; those in Oryza sativa L. are known to belong to the ERF/AP2CBF/DREB, bHLH, bZIP, MYB and NAC families. Transcription factors of different families recognize specific core sequences in the promoters of stress responsive functional genes, bind to them, and activate transcription of these target genes. The core binding sites (cis elements) to which members of each transcription factor family bind have been characterized. Scanning the abiotic stress responsive promoters of Arabidopsis and Oryza sativa L. for these cis elements could therefore be of interest in studies on the abiotic stress responses of plants.


STIFDB2 Data Curation:

The STIFDB2 collection of stress responsive genes has been compiled from microarray expression data extracted from public microarray databases such as NCBI-GEO. We have used the STIF method to identify all possible abiotic stress responsive transcription factor binding sites. The database provides an option to scan the 100bp and 1000bp promoters of known stress responsive genes, along with their 5'UTR, for the presence of cis elements recognized by stress responsive transcription factors.
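To make the scanned region concrete, the following is a minimal sketch of how a "promoter + 5'UTR" window could be cut out of a chromosome sequence. It is an illustration only, not the STIFDB2 pipeline: the function name promoter_with_utr, its parameters, the 1-based TSS coordinate and the assumption of a contiguous (intron-free) 5'UTR are all hypothetical.

```python
# Hypothetical helper, not part of STIFDB2: cut out the upstream promoter
# together with the 5'UTR of a gene from a chromosome sequence.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def promoter_with_utr(chrom_seq, tss, utr5_len, upstream=100, strand="+"):
    """Return `upstream` bases before the TSS plus the 5'UTR.

    tss      -- 1-based position of the first transcribed base (assumption)
    utr5_len -- length of the 5'UTR, assumed contiguous (no introns)
    upstream -- promoter length, e.g. 100 or 1000
    strand   -- '+' or '-'; minus-strand regions are reverse complemented
                so that the result always reads 5' to 3' along the gene
    """
    if strand == "+":
        start = max(0, tss - 1 - upstream)
        end = min(len(chrom_seq), tss - 1 + utr5_len)
        return chrom_seq[start:end]
    if strand == "-":
        start = max(0, tss - utr5_len)
        end = min(len(chrom_seq), tss + upstream)
        return reverse_complement(chrom_seq[start:end])
    raise ValueError("strand must be '+' or '-'")
```

Calling the function with upstream=100 or upstream=1000 corresponds to the two promoter lengths offered by the database; regions that would run past the end of the sequence are simply truncated.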




2. Algorithm:

A computational method, STIF, has been developed to search for potential binding sites of stress-specific transcription factors, starting from Hidden Markov Models (HMMs) of the nucleotide patterns of cis-elements that are well known to respond to stress in plants. The 19 models of cis-elements, based on abiotic stress transcription factor families, were built as HMMs and validated by jackknifing. We applied our HMM-based search algorithm, STIF, to the 100 base pairs upstream of each gene together with its 5'UTR. We identified 60 abiotic stress genes from well-known microarray databases based on their high stress-induced expression profiles. These genes are known to be upregulated during stress, and validated TFBS information is available for them. To evaluate the method further, we also searched the 1000 base pairs upstream together with the 5'UTR.


Flowchart of STIF algorithm:


STIF approach for construction of a Hidden Markov Model of transcription factor binding sites given the experimentally observed nucleotide patterns :



3. Database Content:


TFmap:

TFmap is a pictorial representation of the upstream regions of the stress genes in Arabidopsis, on which the predicted and validated Transcription Factor Binding Sites are marked along with their Z-scores.



TAIR ID:

The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana. The TAIR ID is used in STIFDB2 to access the gene-based contents. Users can query the database using a TAIR ID.
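Before submitting a query, it can help to check that an identifier has the shape of a TAIR locus ID such as AT4G23600 ("AT", a chromosome code of 1 to 5, C or M, the letter "G", then five digits). The sketch below is one possible client-side check; the function name parse_tair_id is hypothetical, and how STIFDB2 itself validates queries is not described on this page.

```python
import re

# TAIR (AGI) locus IDs: "AT" + chromosome (1-5, C = chloroplast,
# M = mitochondrion) + "G" + five digits, e.g. AT4G23600.
_AGI_PATTERN = re.compile(r"^AT([1-5CM])G(\d{5})$")


def parse_tair_id(query):
    """Return the chromosome code of a well-formed TAIR ID, else None.

    Hypothetical helper for illustration; surrounding whitespace and
    letter case are ignored.
    """
    match = _AGI_PATTERN.match(query.strip().upper())
    if match is None:
        return None
    return match.group(1)
```

For example, parse_tair_id("AT4G23600") yields the chromosome code "4", while a string that does not follow the pattern yields None.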


TIGR ID:

The Institute for Genomic Research (TIGR) maintains a database of genetic and molecular biology data for the model higher plant Oryza sativa subsp. indica. The TIGR ID is used in STIFDB2 to access the gene-based contents. Users can query the database using a TIGR ID.


RAPDB ID:

The Rice Annotation Project (RAP) maintains a database of genetic and molecular biology data for the model higher plant Oryza sativa ssp. japonica cv. Nipponbare. The RAPDB ID is used in STIFDB2 to access the gene-based contents. Users can query the database using a RAPDB ID.


Gene Names [Including Aliases]:

Users can access STIFDB2 using standard gene names or their aliases as reported in the TAIR database. For example, TAIR ID AT4G23600 refers to a single entry in the database with the aliases CORI3, CORONATINE INDUCED 1, JASMONIC ACID RESPONSIVE 2 and JR2.


Chromosome Position:

Chromosome Position refers to the exact location of the given stress gene on one of the 5 Arabidopsis or 12 Oryza sativa L. chromosomes.


References to Publication and Related Resources:

References to publications and related resources are provided on each gene-related information page.


Transcription Factor Family Name:

This refers to the Transcription Factor Family whose binding site sequence has been located or predicted on a given promoter sequence. The database identifies binding sites of the stress responsive transcription factor families (ten in Arabidopsis, five in Oryza sativa L.) and their subfamilies.


Binding Site Information:

Binding site refers to the core binding sequence to which the transcription factor binds. The binding site sequences have been characterized in literature reports and the references are provided.


Orientation of Binding Sites:

Orientation of Binding Sites refers to the DNA strand on which the Transcription Factor Binding Site has been located. It can be either on the Forward DNA strand or on the Reverse DNA strand.


Stress Signals:

Stress Signal refers to the type of stress, which according to literature reports, regulates the transcription factor. Most of the transcription factors dealt with here are regulated by various abiotic stress signals like drought, cold, heat, light etc.


Z-Score:


Z = (Score - Mean) / Standard Deviation

Where Z = Z-score
Score = HMM score of the hit
Mean = Mean of the scores of all sliding windows over the query sequence (the window size depends on the transcription factor binding site)
Standard Deviation = Standard deviation of the scores of all sliding windows over the query sequence
This algorithm has been validated against an experimental data set of stress genes. Based on that validation, we suggest a Z-score threshold of 2.0 for the 100bp + 5'UTR regions and of 1.5 for the 1000bp + 5'UTR regions; hits scoring above the threshold can be considered.
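As a worked illustration of the definitions above, the following sketch turns a list of per-window HMM scores into Z-scores and keeps the hits above a threshold. The window scores are assumed to be already computed; the function names are hypothetical, and the use of the population standard deviation is an assumption, since this page does not say which estimator STIF uses.

```python
from statistics import mean, pstdev


def z_scores(window_scores):
    """Convert per-window scores to Z-scores: (score - mean) / std. dev.

    Uses the population standard deviation (an assumption). If all
    scores are identical, every Z-score is reported as 0.0.
    """
    mu = mean(window_scores)
    sigma = pstdev(window_scores)
    if sigma == 0:
        return [0.0 for _ in window_scores]
    return [(score - mu) / sigma for score in window_scores]


def hits_above(window_scores, threshold):
    """Return (window index, Z-score) pairs whose Z-score exceeds threshold."""
    return [(index, z)
            for index, z in enumerate(z_scores(window_scores))
            if z > threshold]
```

With the suggested thresholds, hits_above(scores, 2.0) would be used for a 100bp + 5'UTR region and hits_above(scores, 1.5) for a 1000bp + 5'UTR region.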


Normalization Score:

The normalization score describes the distribution of a particular TFBS (Transcription Factor Binding Site) across the whole dataset of stress genes. A low normalization score means that the TFBS is well distributed across the dataset.



Validation:

Statistical validation of the binding sites identified by the STIF method is performed by assessing Coverage, Sensitivity and Specificity. These statistical parameters are calculated as follows:


Coverage:



Sensitivity:



Specificity:



1000bp Validation:

No. of genes used for validation: 48

Click on the TAIR Gene IDs for more information about Genes


Statistics:

The values for 1000bp + 5'UTR at a Z-score threshold of 1.5 are:

Coverage - 71.4%, Specificity - 20.4%, Sensitivity - 71.4%

AT1G01470
AT1G02920
AT1G02930
AT1G05680
AT1G07890
AT1G20440
AT1G20450
AT1G27730
AT1G46768
AT1G51090
AT1G52400
AT1G67090
AT1G77120
AT2G14610
AT2G15320
AT2G15970
AT2G17840
AT2G21330
AT2G33380
AT2G39810
AT2G40880
AT2G42530
AT2G42540
AT2G46270
AT3G02480
AT3G04720
AT3G15500
AT3G24190
AT3G46640
AT3G50960
AT3G50970
AT4G00340
AT4G01120
AT4G02380
AT4G14000
AT4G15910
AT4G24960
AT4G33070
AT4G35300
AT4G37070
AT4G38580
AT5G04340
AT5G15960
AT5G15970
AT5G17460
AT5G44420
AT5G51070
AT5G52310

100bp Validation:

No. of genes used for validation: 12

Click on the TAIR Gene IDs for more information about Genes


Statistics:

The values for 100bp + 5'UTR at a Z-score threshold of 2.0 are:

Coverage - 85%, Specificity - 54%, Sensitivity - 85%

AT1G02930
AT1G20450
AT2G14960
AT2G15970
AT2G33380
AT2G40880
AT2G42540
AT2G46270
AT4G01120
AT4G37070
AT5G15970
AT5G51070


Contact:

Prof. R. Sowdhamini

STIFDB TEAM:

Prof. R. Sowdhamini | Shameer Khader
Mahantesha Naika B. N. | Oommen K. M.

Last Updated:

15th Oct, 2012