Substitution matrices are a key component of
protein alignment methods. We developed a new structural alphabet substitution matrix (SASM) for 3D-BLAST. The
SASM plays a role analogous to that of BLOSUM62 in BLAST for protein sequence searches. The 23 × 23
SASM offers insight into the substitution preferences of 3D segments between homologous
structures with low sequence identity. The highest substitution score in this
matrix, 11, is for aligning the alphabet "W" with "W". The shape of the
representative segment of "W" resembles a β-turn, a motif that allows the
peptide backbone to fold back and is of great significance in protein structure and function.
Most segments assigned to "W" (95.25%) are β-turns, as identified by the tool
PROMOTIF. Substitution scores are high when two identical structural alphabets
are aligned (the diagonal entries); for example, the scores are high when "I"
is aligned with "I" and "S" with "S". Most substitution scores are also
positive when two structural alphabets of the same category, e.g., the helix
alphabets (A, Y, B, C, and D), are aligned, because the shapes of their
representative segments are similar.
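The qualitative rules above can be written down as a minimal Python sketch. It encodes only what the text states: the helix alphabets A, Y, B, C, and D, the strand alphabets E, F, and H, high scores on the diagonal, and mostly positive scores within a category. The names `CATEGORIES`, `category`, and `expected_favourable` are hypothetical, and the function predicts a trend rather than an actual SASM value; letters the text does not categorize (such as W) are left uncategorized.

```python
from typing import Optional

# Categories exactly as listed in the text; the remaining SASM letters
# (23 in total) are deliberately left uncategorized in this sketch.
CATEGORIES = {
    "helix": frozenset("AYBCD"),
    "strand": frozenset("EFH"),
}


def category(letter: str) -> Optional[str]:
    """Return 'helix', 'strand', or None if the text does not categorize the letter."""
    for name, letters in CATEGORIES.items():
        if letter in letters:
            return name
    return None


def expected_favourable(a: str, b: str) -> bool:
    """True if the text's rules predict a high or mostly positive score for a vs. b.

    Hypothetical helper: it encodes qualitative rules, not SASM values.
    Identical letters (diagonal entries) score high; letters from the same
    category usually score positive.
    """
    if a == b:
        return True
    cat = category(a)
    return cat is not None and cat == category(b)


print(expected_favourable("A", "D"))  # True: both are helix alphabets
```

Pairs for which this returns False are simply not covered by the two rules; the function does not claim their scores are negative.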
In contrast, the lowest substitution score in the SASM, -15, is for aligning
"Y" (a helix alphabet) with "E" (a strand alphabet). All substitution scores
are low when helix alphabets (A, Y, B, C, and D) are aligned with strand
alphabets (E, F, and H). These relationships are well known, showing that the
SASM embodies conventional knowledge about secondary structure conservation in
proteins.
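To make the score scale concrete, here is a minimal Python sketch that scores an ungapped alignment of two structural-alphabet strings by summing per-position substitution scores, the way a BLOSUM62-style matrix is used to score an ungapped segment pair in BLAST. It contains only the two SASM entries quoted in the text (W/W = 11 and Y/E = -15). The names `KNOWN_SASM`, `sasm_score`, and `ungapped_score` are hypothetical, the symmetry of the matrix is our assumption, and gaps, which real 3D-BLAST alignments involve, are ignored.

```python
from typing import Dict, Tuple

Matrix = Dict[Tuple[str, str], int]

# Only the two SASM entries quoted in the text; every other pair is unknown here.
KNOWN_SASM: Matrix = {
    ("W", "W"): 11,   # highest score in the matrix
    ("Y", "E"): -15,  # lowest score: helix alphabet Y vs. strand alphabet E
}


def sasm_score(a: str, b: str, matrix: Matrix = KNOWN_SASM) -> int:
    """Look up a substitution score, ASSUMING the matrix is symmetric."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    if (b, a) in matrix:
        return matrix[(b, a)]
    raise KeyError(f"no SASM entry for {a}/{b} in this partial matrix")


def ungapped_score(s1: str, s2: str, matrix: Matrix = KNOWN_SASM) -> int:
    """Sum position-wise substitution scores of two equal-length alphabet strings."""
    if len(s1) != len(s2):
        raise ValueError("an ungapped alignment needs strings of equal length")
    return sum(sasm_score(a, b, matrix) for a, b in zip(s1, s2))


print(ungapped_score("WY", "WE"))  # 11 + (-15) = -4
```

A single helix/strand mismatch is enough to cancel the strongest positive match in this example, which reflects how strongly the SASM penalizes aligning helix alphabets with strand alphabets.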
Structural alphabet substitution matrix (SASM) of 3D-BLAST. Scores are high
when similar alphabets are aligned, e.g., helix alphabets (A, Y, B, C, and D)
aligned with helix alphabets. In contrast, scores are low when helix alphabets
are aligned with strand alphabets.