Supplementary material


Tables 1a and 1b list the DIAL0 problematic entries. Problematic entries are those protein chains for which the number of domains identified by the scheme does not agree with the number obtained by graphical inspection or reported in the crystal structure report. A general reduction in the number of problematic entries, with no loss in accuracy for the controls, reflects an improvement in the algorithm for this test dataset. Table 1c lists the DIAL0 non-problematic entries.
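The criterion for a problematic entry can be sketched in a few lines of Python. This is only a minimal illustration and not the DIAL implementation: the `ChainEntry` record, its field names and the two helper functions are hypothetical, and the reference domain count is assumed to have been obtained beforehand from graphical inspection or from the crystal structure report.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ChainEntry:
    """Hypothetical record for one protein chain in the test dataset."""
    chain_id: str            # illustrative label for the chain
    predicted_domains: int   # number of domains reported by the algorithm
    reference_domains: int   # number from graphical inspection or the crystal structure report


def is_problematic(entry: ChainEntry) -> bool:
    """An entry is problematic when the two domain counts disagree."""
    return entry.predicted_domains != entry.reference_domains


def split_entries(entries: List[ChainEntry]) -> Tuple[List[ChainEntry], List[ChainEntry]]:
    """Separate problematic entries from non-problematic ones (the controls)."""
    problematic = [e for e in entries if is_problematic(e)]
    controls = [e for e in entries if not is_problematic(e)]
    return problematic, controls
```

Running `split_entries` on the output of two versions of the algorithm would give the two list sizes whose change is discussed above; the sketch compares domain counts only and says nothing about domain boundaries.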

The domain boundary definitions given by DIAL4 were compared with those given by SCOP, 3Dee, CATH, the crystallographers, DALI and Protein Domain Parser (PDP) for the 55-protein dataset (Jones et al., 1998). The results are provided in Tables 2 and 3.

Tables 4a-b compare detailed domain boundaries and the associated overlap scores between DIAL4 and various resources for the 55 protein dataset.

Table 1a: DIAL0 Single-Domain Problematic Cases from the 40-Protein Dataset.

Table 1b: DIAL0 Multi-Domain Problematic Cases from the 40-Protein Dataset.

Table 1c: DIAL0 Non-Problematic Proteins as Controls from the 40-Protein Dataset (Both Single- and Multi-Domain Proteins).

Table 2: Comparison of the number of domains identified by DIAL4 against those of SCOP, 3Dee, CATH, DALI, the crystallographers and Protein Domain Parser (PDP) for the 55 dataset proteins (Jones et al., 1998).

Table 3: Agreement scores (comparing the number of domains) between DIAL4 and SCOP, 3Dee, CATH, DALI, the crystallographers and Protein Domain Parser (PDP) for the 55 dataset proteins (Jones et al., 1998).

Table 4a: Comparison of DIAL4 domain boundary definitions with those given by SCOP, 3Dee and CATH for the 55-protein dataset (Jones et al., 1998).

Table 4b: Comparison of DIAL4 domain boundary definitions with those given by the crystallographers, DALI and Protein Domain Parser (PDP) for the 55-protein dataset (Jones et al., 1998).

Figure: Illustrative examples comparing the number of domains identified automatically by DIAL4 with the number given by manually curated resources such as SCOP (Murzin et al., 1995). Different domains in a polypeptide are distinguished by different colours. Protein Data Bank (PDB) codes (Berman et al., 2000) are provided for all four examples.

(A), (B): Domain boundaries defined for rhodanese (PDB code, 1rhd) by DIAL4 and SCOP, respectively.

(C), (D): Same as (A), (B), but for ferredoxin reductase (PDB code, 1fnr). 1rhd and 1fnr are examples where both the number of domains and the domain boundaries agree very well between DIAL4 and SCOP.

(E), (F): Domain definition in p-hydroxybenzoate hydroxylase. DIAL4 predicts more domains than SCOP: an extra small domain (shown in green) is recognized by DIAL4.

(G), (H): Domain definition of variant surface glycoprotein. The protein is defined as two domains by DIAL4 but is treated as a single-domain fold by SCOP.