Supplementary material
Tables 1a and 1b list the DIAL0 problematic entries. The problematic entries are those protein chains for which the
number of protein domains identified by the scheme does not agree with graphical inspection or with the number reported in the
crystal structure report. A general reduction in the number of problematic entries, with no loss in accuracy for the
controls, reflects improvement in the algorithm for this test dataset. Table 1c lists the DIAL0 non-problematic entries.
The domain boundary definitions given by DIAL4 were compared with those given by SCOP, 3Dee, CATH, crystallographers,
DALI and Protein Domain Parser (PDP) for the 55-protein dataset (Jones et al., 1998), and the results are provided in Tables 2 and 3.
Tables 4a and 4b compare detailed domain boundaries and the associated overlap scores between DIAL4 and the various resources for the 55-protein dataset.
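The two kinds of comparison mentioned above (agreement in the number of domains, and overlap between domain boundaries) can be sketched as follows. This is only an illustrative sketch: the exact score definitions used for Tables 3 and 4 are not given in this section, so both formulas here are assumptions. The function names (count_agreement, expand, overlap_score), the input layout and the chain identifiers are hypothetical and are not part of DIAL.

```python
from itertools import permutations


def count_agreement(counts_a, counts_b):
    """Fraction of shared chains for which both methods give the same
    number of domains (assumed definition of an agreement score)."""
    shared = set(counts_a) & set(counts_b)
    if not shared:
        raise ValueError("the two methods have no chains in common")
    same = sum(1 for chain in shared if counts_a[chain] == counts_b[chain])
    return same / len(shared)


def expand(domains):
    """Convert each domain, given as a list of inclusive (start, end)
    residue segments, into a set of residue numbers."""
    return [
        {res for start, end in segments for res in range(start, end + 1)}
        for segments in domains
    ]


def overlap_score(domains_a, domains_b):
    """Fraction of residues placed in matching domains, maximised over
    all one-to-one pairings of domains (assumed definition)."""
    a, b = expand(domains_a), expand(domains_b)
    all_residues = set().union(*a, *b)
    if not all_residues:
        raise ValueError("no residues were assigned to any domain")
    if len(a) > len(b):
        a, b = b, a  # pair each domain of the shorter list with one of the longer
    best = 0
    for pairing in permutations(range(len(b)), len(a)):
        matched = sum(len(a[i] & b[j]) for i, j in enumerate(pairing))
        best = max(best, matched)
    return best / len(all_residues)


# Hypothetical inputs, for illustration only (not taken from the tables).
method_a_counts = {"chainX": 2, "chainY": 2, "chainZ": 2}
method_b_counts = {"chainX": 2, "chainY": 2, "chainZ": 1}
agreement = count_agreement(method_a_counts, method_b_counts)

# Two-domain definitions whose boundary differs by ten residues.
shifted = overlap_score([[(1, 100)], [(101, 200)]],
                        [[(1, 110)], [(111, 200)]])

# One method reports a single domain, the other splits it in two.
split = overlap_score([[(1, 200)]], [[(1, 100)], [(101, 200)]])
```

Trying every pairing of domains keeps the sketch independent of the order in which each resource lists its domains; this is affordable because a chain rarely has more than a handful of domains.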
Table 1a: DIAL0 single-domain problematic cases from the 40-protein dataset.
Table 1b: DIAL0 multi-domain problematic cases from the 40-protein dataset.
Table 1c: DIAL0 non-problematic proteins used as controls from the 40-protein dataset (both single- and multi-domain proteins).
Table 2: Comparison of the number of domains identified by DIAL4 against those of SCOP, 3Dee, CATH, DALI, crystallographers and Protein Domain Parser (PDP) for the 55-protein dataset (Jones et al., 1998).
Table 3: Agreement scores (comparing the number of domains) between DIAL4 and SCOP, 3Dee, CATH, DALI, crystallographers and Protein Domain Parser (PDP) for the 55-protein dataset (Jones et al., 1998).
Table 4a: Comparison of DIAL4 domain boundary definitions with those given by SCOP, 3Dee and CATH for the 55-protein dataset (Jones et al., 1998).
Table 4b: Comparison of DIAL4 domain boundary definitions with those given by crystallographers, DALI and Protein Domain Parser (PDP) for the 55-protein dataset (Jones et al., 1998).
Figure: Illustrative examples of agreement in the number of domains
between automatic domain identification by DIAL4 and manually curated
resources such as SCOP (Murzin et al., 1995). Different domains in a
polypeptide are distinguished by different colours. Protein Data Bank (PDB)
codes (Berman et al., 2000) are provided in all four examples.
(A), (B): Domain boundaries defined for rhodanese (PDB code, 1rhd) by DIAL4
and SCOP, respectively.
(C), (D): Same as (A), (B), but for ferredoxin reductase (PDB code, 1fnr).
1rhd and 1fnr are examples where the number of domains and domain boundaries
agree very well between DIAL4 and SCOP.
(E), (F): Domain definition in p-hydroxybenzoate hydroxylase. DIAL4 predicts
more domains than SCOP. An extra small domain (shown in green) is
recognized by DIAL4.
(G), (H): Domain definition of variant surface glycoprotein. The protein is
defined as two domains by DIAL4 but is treated as a single-domain fold by
SCOP.