stifdb [Stress Gene Transcription Factor Database] - Help Page :
1. Introduction:
2. Algorithm:
3. Database Content:
4. Validation
5. Statistics

1. Introduction:

STIFDB (Stress responsive TranscrIption Factor Database) is a specialized database of stress responsive genes and stress inducible transcription factors from Arabidopsis thaliana.
STIFDB is an online resource on abiotic stress gene regulation in Arabidopsis: a comprehensive collection of abiotic stress responsive genes in Arabidopsis thaliana, with options to identify probable Transcription Factor Binding Sites in their promoters. Transcriptional regulation of genes in response to abiotic stresses like drought, cold, salinity, high light, heat, ABA, oxidative stress etc. is an emerging area of plant research, and ten specific families of transcription factors are known to be involved in these responses. HMM-based models are used to identify binding sites of transcription factors belonging to these families. We have also consulted literature reports to cross-validate the Transcription Factor Binding Sites predicted by the method.


Stress responsive transcription factors in Arabidopsis are known to belong to AP2/EREBP, ABI3/VP1, ARF, bHLH, bZIP, HB, HSF, MYB, NAC and WRKY families of factors. Transcription factors belonging to different families recognize specific core sequences on the promoters of various stress responsive functional genes, for binding and further transcriptional activation of these target genes. The core binding sites/cis elements to which members of a transcription factor family bind have been characterized. Scanning the abiotic stress responsive promoterome of Arabidopsis for the presence of these cis elements could be of interest in studies on the abiotic stress responses of plants.
STIFDB Data Curation:

STIFDB, the database of stress responsive genes, has been compiled from microarray expression data extracted from public microarray databases like NASC Array, DRASTIC, RARGE-MAEDA etc. We have used the STIF method to identify all possible abiotic stress responsive transcription factor binding sites. The database provides an option to scan the 100bp and 1000bp promoters of known stress responsive genes, along with their 5'UTR, for the presence of cis elements recognized by stress responsive transcription factors.
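The scanned region (promoter plus 5'UTR) can be illustrated with a minimal sketch. It assumes a forward-strand gene, 0-based coordinates, and a chromosome held as a plain string; the function name `upstream_with_utr` and its parameters are hypothetical and are not part of STIFDB.

```python
def upstream_with_utr(chromosome, tss, cds_start, upstream=100):
    """Return `upstream` bases before the transcription start site (TSS)
    followed by the 5'UTR, i.e. the bases from the TSS up to the CDS start.

    Assumptions (not taken from STIFDB): forward-strand gene, 0-based
    coordinates, `cds_start` exclusive end of the 5'UTR.
    """
    begin = max(0, tss - upstream)  # clip at the chromosome start
    return chromosome[begin:cds_start]


# Toy chromosome: 10 bases of promoter, 5 bases of 5'UTR, 5 bases of CDS
toy = "A" * 10 + "C" * 5 + "G" * 5
region = upstream_with_utr(toy, tss=10, cds_start=15, upstream=4)
print(region)
```

Passing `upstream=1000` instead of `upstream=100` would give the longer of the two regions described above.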


2. Algorithm:

A computational method, STIF, has been developed to search for potential binding sites of stress-specific transcription factors, starting from Hidden Markov Models of the nucleotide patterns of cis-elements that are well known to respond to stress in plants. The 19 models of cis-elements, based on abiotic stress transcription factor families, were built as Hidden Markov Models and validated using the jackknife method. We applied our HMM-based search algorithm, STIF, to search the 100 base pairs upstream of each gene together with its 5’UTR. We identified 60 abiotic stress genes from well-known microarray databases based on their high stress-induced expression profiles. These genes are known to be upregulated during stress, and validated TFBS information is available for them. To evaluate the method further, we also searched the 1000 base pairs upstream of each gene together with its 5’UTR.
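The idea of scanning a promoter with a model built from known binding sites can be sketched as follows. This is a simplified stand-in, not the STIF implementation: it assumes a profile with match states only (no insertions or deletions), a uniform background, sequences containing only A, C, G and T, and made-up example sites. The names `build_profile` and `window_scores` are hypothetical.

```python
import math

BASES = "ACGT"


def build_profile(sites, pseudocount=1.0):
    """Estimate per-position emission probabilities from aligned,
    equal-length example sites (pseudocounts avoid zero probabilities)."""
    profile = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        profile.append({base: counts[base] / total for base in BASES})
    return profile


def window_scores(sequence, profile, background=0.25):
    """Log-odds score of every window of the profile's width."""
    width = len(profile)
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        scores.append(sum(math.log2(profile[i][base] / background)
                          for i, base in enumerate(window)))
    return scores


# Toy sites and promoter, for illustration only
sites = ["ACGTG", "ACGTG", "ACGTC", "ACGTG"]
profile = build_profile(sites)
promoter = "TTGACGTGGCAAT"
scores = window_scores(promoter, profile)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, promoter[best:best + len(profile)])
```

Each window receives one score, which is the quantity that is later converted into a Z-score.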

Flowchart of STIF algorithm:



STIF approach for construction of a Hidden Markov Model of transcription factor binding sites given the experimentally observed nucleotide patterns :


3. Database Content:

TFmap:
TFmap is a pictorial representation of the upstream regions of the stress genes in Arabidopsis, in which the predicted and validated Transcription Factor Binding Sites are marked along with their Z-scores.



TAIR ID:
The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana. TAIR ID is used in STIFDB to access the gene-based contents. Users can query the database using TAIR ID.


Gene Names [Including Aliases]:
Users can access STIFDB using standard gene names or their aliases reported in the TAIR database. For example, TAIR ID AT4G23600 refers to a single entry in the database with the aliases CORI3, CORONATINE INDUCED 1, JASMONIC ACID RESPONSIVE 2 and JR2.


Chromosome Position:
Chromosome Position refers to the exact location of the given stress gene among the 5 Arabidopsis chromosomes.


References to Publication and Related Resources:
References to publications and related resources are provided on each gene-related information page.


Transcription Factor Family Name:
This refers to the Transcription Factor Family whose binding site sequence has been located/predicted on a given promoter sequence. This database identifies binding sites of the ten stress responsive transcription factor families and their subfamilies.

Binding Site Information:
Binding site refers to the core binding sequence to which the transcription factor binds. The binding site sequences have been characterized in literature reports and the references are provided.

Orientation of Binding Sites:
Orientation of Binding Sites refers to the DNA strand on which the Transcription Factor Binding Site has been located. It can be either on the Forward strand or on the Reverse DNA strand.
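Searching both strands can be sketched by matching a core sequence and its reverse complement against the forward-strand sequence. This sketch uses exact matching of a made-up core rather than an HMM, and the names `reverse_complement` and `find_sites` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(sequence, core):
    """Return (start, strand) for each exact occurrence of `core`.

    Positions are 0-based on the forward strand. A palindromic core
    (identical to its reverse complement) is reported once, as "+".
    """
    rc_core = reverse_complement(core)
    width = len(core)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if window == core:
            hits.append((start, "+"))
        elif window == rc_core:
            hits.append((start, "-"))
    return hits


hits = find_sites("TTACGTGAACACGTAA", "ACGTG")
print(hits)
```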

Stress Signals:
Stress Signal refers to the type of stress which, according to literature reports, regulates the transcription factor. Most of the transcription factors dealt with here are regulated by various abiotic stress signals like drought, cold, heat, light etc.

Z-Score:

Z = (Score - Mean) / Standard Deviation

Where Z = Z-score
Score = HMM score of the hit
Mean = Mean of the scores of all window slides of the query sequence; the window size depends on the transcription factor binding site
Standard Deviation = Standard deviation of the scores of all window slides of the query sequence
This algorithm has been validated with an experimental data set of stress genes. Based on that validation, we suggest considering hits with a Z-score above 2.0 for the 100bp + 5’UTR regions and above 1.5 for the 1000bp + 5’UTR regions.
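The Z-score of a hit, as defined above, is its score minus the mean of all window scores, divided by their standard deviation. The sketch below assumes the population standard deviation (the page does not say whether the population or sample form is used); `z_scores` and `significant_hits` are hypothetical names, and the window scores are toy values.

```python
import statistics


def z_scores(scores):
    """Z = (score - mean) / standard deviation over all window scores."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)  # assumption: population standard deviation
    if sd == 0:
        return [0.0] * len(scores)  # all windows score the same
    return [(score - mean) / sd for score in scores]


def significant_hits(scores, threshold):
    """Return (window index, Z-score) for windows above the threshold."""
    return [(i, z) for i, z in enumerate(z_scores(scores)) if z > threshold]


# Toy window scores: nine background windows and one strong hit
toy_scores = [0, 0, 0, 0, 0, 0, 0, 0, 0, 10]
hits = significant_hits(toy_scores, threshold=2.0)
print(hits)
```

With the suggested thresholds, `threshold=2.0` would apply to the 100bp + 5’UTR scan and `threshold=1.5` to the 1000bp + 5’UTR scan.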

Normalization Score:
The normalization score describes the distribution of a particular TFBS (Transcription Factor Binding Site) across the whole dataset of stress genes. A low normalization score means that the TFBS is well distributed among the data set.


4. Validation:
Statistical validation of the binding sites identified by the STIF method is performed by assessing parameters like Coverage, Sensitivity and Specificity. Statistical parameters are calculated as follows:
Coverage:
Sensitivity:
Specificity:
1000bp Validation:
Number of genes used for validation: 29
Click on the TAIR Gene IDs for more information about Genes
Statistics:
The values for 1000bp + 5'UTR at a Z-score threshold of 1.5 are:
Coverage - 71.4%, Specificity - 20.4%, Sensitivity - 71.4%
AT1G02920
AT1G02930
AT1G05680
AT1G07890
AT1G20440
AT1G20450
AT1G52400
AT1G67090
AT1G77120
AT2G14610
AT2G14960
AT2G15970
AT2G21330
AT2G33380
AT2G40880
AT2G42540
AT2G46270
AT3G02480
AT3G04720
AT3G15500
AT4G00340
AT4G01120
AT4G02380
AT4G23130
AT4G37070
AT5G15970
AT5G44420
AT5G51070
AT5G52310

100bp Validation:
Number of genes used for validation: 12
Click on the TAIR Gene IDs for more information about Genes
Statistics:
The values for 100bp + 5'UTR at a Z-score threshold of 2.0 are:
Coverage - 85%, Specificity - 54%, Sensitivity - 85%
AT1G02930
AT1G20450
AT2G14960
AT2G15970
AT2G33380
AT2G40880
AT2G42540
AT2G46270
AT4G01120
AT4G37070
AT5G15970
AT5G51070




Contact:
Prof. R. Sowdhamini [mini@ncbs.res.in]
Prof. M. Udayakumar

STIFdb Team:
Prof. R. Sowdhamini
Prof. M. Udayakumar
K. Shameer
S. Ambika
Susan Mary Varghese