James H. Collier,
Arthur M. Lesk,
Maria Garcia de la Banda, and
Arun S. Konagurthu
(ECCB) J. Bioinformatics, 30(17), pp.i512-i518, Sept. 2014.
Motivation: Progress in protein biology depends on the
reliability of results from a handful of computational techniques,
structural alignment being one of them. Recent reviews have highlighted substantial
inconsistencies among the alignments produced
by the ever-growing stock of structural alignment programs.
The lack of consensus on how the quality of structural alignments should
be assessed has been identified as the main cause of the
observed inconsistencies. Current methods assess structural alignment quality
by constructing a scoring function that attempts to balance conflicting
criteria, mainly alignment coverage and fidelity of structures under
superposition. This traditional approach to measuring alignment quality,
the subject of considerable literature, has failed to solve the problem.
Further development along the same lines is unlikely to rectify the
current deficiencies in the field.
Results: This paper proposes a new statistical framework to
assess structural alignment quality and significance based on lossless
information compression. This is a radical departure from the traditional
approach of formulating scoring functions. It links the structural alignment
problem to the general class of statistical inductive inference problems,
solved using the information-theoretic criterion of minimum message length.
Based on this, we developed an efficient and reliable measure of structural
alignment quality, I-value. The performance of I-value is demonstrated in
comparison with a number of popular scoring functions, on a large
collection of competing alignments. Our analysis shows that I-value provides
a rigorous and reliable quantification of structural alignment quality,
addressing a major gap in the field.
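
Illustration: the compression idea behind the framework can be sketched with a toy calculation. The sketch below is not the I-value defined in the paper. It is a minimal example under stated assumptions: the two structures are given as already-superposed lists of (x, y, z) coordinates, coordinates are stated to a fixed precision under zero-mean Gaussians, and the alignment is charged a crude flag-plus-index cost. All names and parameters (coordinate_bits, compression_gain, sigma_null, sigma_fit, eps) are hypothetical. It compares the number of bits needed to state structure T on its own (the null message) with the number needed to state an alignment plus T given S and that alignment. An alignment that captures a real structural relationship should shorten the message.

```python
import math

def coordinate_bits(x, sigma, eps=0.001):
    """Bits to state x to precision eps under a zero-mean Gaussian (assumed model)."""
    nats = x * x / (2 * sigma * sigma) + math.log(sigma * math.sqrt(2 * math.pi)) - math.log(eps)
    return nats / math.log(2)

def compression_gain(s, t, pairs, sigma_null=10.0, sigma_fit=1.0):
    """Bits saved by explaining t through s and an alignment, versus stating t alone.

    s, t  : lists of (x, y, z) tuples, assumed already superposed
    pairs : list of (i, j) index pairs matching s[i] with t[j]
    """
    # Null message: every coordinate of t stated independently.
    null_bits = sum(coordinate_bits(c, sigma_null) for point in t for c in point)

    # Alignment message (crude assumption): one matched/unmatched flag per
    # residue of t, plus the index of the partner in s for each matched residue.
    alignment_bits = len(t) + len(pairs) * math.log2(len(s))

    # Data message: matched residues are stated as deviations from their
    # partner in s; unmatched residues fall back to the null encoding.
    partner = {j: i for i, j in pairs}
    data_bits = 0.0
    for j, point in enumerate(t):
        if j in partner:
            ref = s[partner[j]]
            data_bits += sum(coordinate_bits(a - b, sigma_fit) for a, b in zip(point, ref))
        else:
            data_bits += sum(coordinate_bits(c, sigma_null) for c in point)

    return null_bits - (alignment_bits + data_bits)

# Toy structures: four points spaced 3.8 apart, and a slightly perturbed copy.
s = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
t = [(x + 0.1, y - 0.1, z + 0.05) for x, y, z in s]

good = compression_gain(s, t, [(0, 0), (1, 1), (2, 2), (3, 3)])
bad = compression_gain(s, t, [(0, 3), (1, 2), (2, 1), (3, 0)])
print(f"gain for the correct alignment:  {good:.1f} bits")
print(f"gain for the reversed alignment: {bad:.1f} bits")
```

In this toy setting the correct pairing yields a positive gain and the reversed pairing a negative one, so competing alignments can be ranked on a single scale of bits without hand-tuning a trade-off between coverage and fit.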