An Information Measure for the String to String Correction Problem with Applications. 

L. Allison & C. S. Wallace, ACSC-17, pp. 659-668, 19-21 Jan. 1994, isbn:047302313X.

NB. A related paper appeared as:
L. Allison & C. S. Wallace,
The Posterior Probability Distribution of Alignments and
Its Application to Parameter Estimation of Evolutionary Trees and
to Optimization of Multiple Alignments.

Algorithms for approximate string matching are widely used to infer a relation between two strings in situations where strings mutate, evolve or contain noise, e.g. DNA or protein strings. Such a relation is (just) a hypothesis. It has been shown how to calculate a probability for such hypotheses. Unfortunately the matching algorithms do not directly extend to more than two strings because of rapidly increasing complexity. However, a stochastic method is one way around this difficulty. The probabilities of competing hypotheses are used to generate random alignments of two or more strings. The alignments are sampled from the correct probability distribution. Averaging over many such alignments gives good estimates of how closely the strings are related and in what way. (Mediocrity is useful.) In addition, sampling in an increasingly selective way gives a simulated annealing search for an optimal multiple-alignment.
This work was partly supported by Australian Research Council grant A49030439.
Keywords and Phrases:
approximate string matching,
edit distance,
evolutionary tree,
hidden Markov model,
inductive inference,
Gibbs sampling,
Monte Carlo method,
multiple alignment,
simulated annealing,
stochastic,
string to string correction problem.
Algorithms on strings are much studied in computer science. Exact matching algorithms are ruled out in many real-world problems by the properties of the data: DNA strings evolve and mutate, speech is uttered at different rates, computer images often contain noise, there may be experimental error and so on. Approximate matching algorithms based on Levenshtein's (1966) metric, Sellers' (1974) edit-distance, Wagner and Fischer's (1974) string to string correction problem, the longest common subsequence and other related notions can cope with this reality (Sankoff & Kruskal 1983). The idea is that characters can be changed, inserted or deleted and that a "good" relation between two or more strings will involve a "small" number of such character mutations. Note that other models of string mutation are possible, for example Tichy's (1984) block-moves model, and may be more appropriate in certain circumstances.
string A: ACGTAATACGT
string B: ACGTATAACCT

One possible alignment, (i) (column 6 is ins_{A}, column 9 is ins_{B}, column 11 is a mismatch):

    ACGTAATA-CGT
    ACGTA-TAACCT

Generation instruction sequence equivalent to alignment (i):
    match(A); match(C); match(G); match(T); match(A); ins_{A}(A);
    match(T); match(A); ins_{B}(A); match(C); mismatch(G,C); match(T)

Mutation instruction sequence equivalent to alignment (i):
    copy; copy; copy; copy; copy; del; copy; copy; ins(A); copy;
    change(C); copy

Figure 1: Alignment Examples.
Given K>=2 strings, we are interested in the following kinds of question: are the strings related, and if so are they related by a given evolutionary tree, and if so which parts of the strings are related to each other? Many programs exist for comparing multiple strings but most use more or less ad hoc weights or penalties to score relations, and employ heuristics to search for one good relation. The aim here is to treat the problems as inductive inference problems: an answer is just a hypothesis and an estimate of a parameter of a model is just that, an estimate. Weights or parameters should be inferred from the data. Probabilities, error bounds or other figures of merit should be given for hypotheses. There are efficient inference algorithms for K=2 strings. Bishop et al (1987) first cast the two string problem as a maximum likelihood problem. Thorne et al (1991, 1992) extended it to models that were more realistic for DNA sequences. Allison and Yee (1990) used a minimum message length (MML) approach and Allison et al (1992a) examined finite-state models or hidden Markov models of string evolution. Wallace and Freeman (1987) gave the statistical foundations of the MML approach to inductive inference.
The next section summarises the inductive inference approach to two strings as a basis for the remainder of the paper. The naive extensions of algorithms for the K=2 case to several strings are infeasible. Instead, they are used as the basis of a stochastic method for comparing K>=2 strings. Note that if the "correct" evolutionary tree to use is not known, various methods, such as Hein's (1989) or Thorne and Kishino's (1992), could be used to suggest one or more candidate trees to be evaluated in turn. The paper is framed in terms of DNA strings over the alphabet of four bases, {A, C, G, T} but the general stochastic technique is applicable to other strings and alphabets.
An alignment is a way of displaying a relation between the parts of two or more strings. Some or all of the strings are padded out with a special null character `-' so that all strings have the same length. The resulting strings are written out one above another. The columns of the alignment are tuples of proper characters and `-', one component coming from each padded string. The only restriction is that the all-null tuple is not allowed. Some examples of alignments are given in figure 1. The number of possible alignments grows exponentially with string length and one often wants a "good" alignment. Each tuple can be given a cost (or a score) in various ways and the cost (score) of an alignment is the sum of the costs (scores) of its tuples. Tuples with matching characters generally have a low cost (high score). Attention is restricted to K=2 in this section.
If we are given two strings, A and B, it is sometimes convenient to think of A as the ancestor and B as the descendant. In that case we relate A and B by a mutation machine which reads A and executes a sequence of mutation instructions to output B:
    copy:      copy a character from A to B
    change(y): read a character x but output y
    del:       read a character from A but output nothing
    ins(x):    output x (reads no character of A)
We can just as well consider B to be the ancestor and A the descendant by interchanging insert and delete and reversing changes. Insertions and deletions are called indels.
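The mutation machine can be sketched directly. The following is an illustrative Python fragment (the paper's own program is in Pascal); the tuple encoding of instructions is an assumption of this sketch:

```python
# Illustrative sketch of the mutation machine: it reads ancestor A and
# executes instructions to emit descendant B. Instructions are encoded
# here as tuples: ('copy',), ('change', y), ('del',), ('ins', x).

def mutate(a, instructions):
    """Apply a mutation instruction sequence to string a, returning b."""
    out, i = [], 0
    for ins in instructions:
        op = ins[0]
        if op == 'copy':        # copy a[i] unchanged
            out.append(a[i]); i += 1
        elif op == 'change':    # read a character, output a different one
            out.append(ins[1]); i += 1
        elif op == 'del':       # read a character, output nothing
            i += 1
        elif op == 'ins':       # output a character, reading nothing
            out.append(ins[1])
    assert i == len(a), "instruction sequence must consume all of A"
    return ''.join(out)

# The mutation sequence of figure 1 applied to string A of figure 1:
seq = [('copy',)] * 5 + [('del',), ('copy',), ('copy',), ('ins', 'A'),
       ('copy',), ('change', 'C'), ('copy',)]
print(mutate('ACGTAATACGT', seq))   # ACGTATAACCT, i.e. string B
```

Running the figure 1 instruction sequence on string A does indeed reproduce string B.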
It is sometimes convenient to think of A and B as having equal standing, particularly if they are both descended from an unknown ancestor. They can then be related by a generation machine which outputs A and B by executing a sequence of generation instructions:
    match(x):      output x both in A and in B
    mismatch(x,y): output x in A and y in B
    ins_{A}(x):    output x in A only
    ins_{B}(y):    output y in B only
It is clear that there is a correspondence between an alignment, a mutation sequence and a generation sequence, and they are used interchangeably as convenient. For example, the pair of padded strings in alignment (i) of figure 1, the generation instruction sequence and the mutation instruction sequence shown there all describe the same relation between A and B.
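The correspondence can be made concrete: a generation instruction sequence determines the two padded rows of the alignment. A minimal sketch (the tuple encoding of instructions is an assumption, and `-' stands for the null character):

```python
# Convert a generation instruction sequence into the two padded rows of
# the corresponding alignment ('-' is the null character).

def generation_to_alignment(instructions):
    row_a, row_b = [], []
    for ins in instructions:
        op = ins[0]
        if op == 'match':       # same character in both strings
            row_a.append(ins[1]); row_b.append(ins[1])
        elif op == 'mismatch':  # x in A aligned with y in B
            row_a.append(ins[1]); row_b.append(ins[2])
        elif op == 'ins_A':     # character in A only
            row_a.append(ins[1]); row_b.append('-')
        elif op == 'ins_B':     # character in B only
            row_a.append('-'); row_b.append(ins[1])
    return ''.join(row_a), ''.join(row_b)

# Generation sequence of figure 1, alignment (i):
gen = ([('match', c) for c in 'ACGTA'] + [('ins_A', 'A')] +
       [('match', 'T'), ('match', 'A'), ('ins_B', 'A'), ('match', 'C'),
        ('mismatch', 'G', 'C'), ('match', 'T')])
a_row, b_row = generation_to_alignment(gen)
print(a_row)   # ACGTAATA-CGT
print(b_row)   # ACGTA-TAACCT
```

Stripping the null characters from the two rows recovers strings A and B of figure 1.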
Boundary Conditions:
    D[0, 0] = v
    D[i, 0] = f( D[i-1, 0], g(A[i], `-') ),  i = 1..|A|
    D[0, j] = f( D[0, j-1], g(`-', B[j]) ),  j = 1..|B|

General Step, i = 1..|A|, j = 1..|B|:
    D[i, j] = h( f(D[i-1, j-1], g(A[i], B[j])),   -- match/mismatch
                 f(D[i-1, j  ], g(A[i], `-' )),   -- delete A[i]
                 f(D[i,   j-1], g(`-',  B[j])) )  -- insert B[j]

Return: D[|A|, |B|]

(i) longest common subsequence (LCS):
    v = 0;  f = +;  h = max
    g(x,x) = 1,  g(x,y) = g(x,`-') = g(`-',y) = 0

(ii) edit-distance:
    v = 0;  f = +;  h = min
    g(x,x) = 0,  g(x,y) = g(x,`-') = g(`-',y) = 1

(iii) most probable alignment:
    v = 1;  f = *;  h = max
    g(x,x) = P(<x,x>),  g(x,y) = P(<x,y>),
    g(x,`-') = P(<x,->),  g(`-',y) = P(<-,y>)

(iv) r-theory, P(A&B):
    v, f and g as for (iii);  h = +

Figure 2: Versions of the Dynamic Programming Algorithm.
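The generic algorithm of figure 2 is easily rendered in Python (an illustrative fragment, not the paper's Pascal program); plugging in v, f, g and h gives versions (i) and (ii):

```python
# Generic DPA of figure 2, parameterised by v, f, g and h.
import operator

def dpa(a, b, v, f, g, h):
    rows, cols = len(a) + 1, len(b) + 1
    d = [[None] * cols for _ in range(rows)]
    d[0][0] = v
    for i in range(1, rows):                      # boundary: delete a[1..i]
        d[i][0] = f(d[i-1][0], g(a[i-1], '-'))
    for j in range(1, cols):                      # boundary: insert b[1..j]
        d[0][j] = f(d[0][j-1], g('-', b[j-1]))
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = h(f(d[i-1][j-1], g(a[i-1], b[j-1])),  # match/mismatch
                        f(d[i-1][j],   g(a[i-1], '-')),     # delete a[i]
                        f(d[i][j-1],   g('-',   b[j-1])))   # insert b[j]
    return d[rows-1][cols-1]

def lcs(a, b):            # figure 2(i): v=0, f=+, h=max
    g = lambda x, y: 1 if x == y != '-' else 0
    return dpa(a, b, 0, operator.add, g, lambda *xs: max(xs))

def edit_distance(a, b):  # figure 2(ii): v=0, f=+, h=min
    g = lambda x, y: 0 if x == y else 1
    return dpa(a, b, 0, operator.add, g, lambda *xs: min(xs))

print(lcs('ACGTAATACGT', 'ACGTATAACCT'))            # 9
print(edit_distance('ACGTAATACGT', 'ACGTATAACCT'))  # 3
```

On the strings of figure 1, the LCS has length 9 and the edit-distance is 3 (one change, one insert, one delete), agreeing with alignment (i).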
It turns out that we cannot easily answer the question, are A and B related? We can however answer the nearly equivalent question, are A and B related by a generation machine M? The machine is defined by the probabilities of its instructions. Given M, the probability of A and B being generated by the sequence of instructions S_{1}; S_{2}; ...; S_{L} is
P(L)*P(S_{1})*P(S_{2})*...*P(S_{L})
The probability of the length, P(L), presents a nasty technical problem. If any good prior for lengths is known then it should certainly be used. Failing that, Rissanen's (1983) universal prior for integers or similar can be used. In any case, most reasonable priors on length behave as ~1/L^{1+delta} (delta>=0) and, as the lengths of all plausible alignments are similar, the probabilities of those lengths are very similar. Now, one instruction sequence represents one hypothetical relation between A and B. If the question of its length is ignored, a most-probable alignment, conditional on M, can be found by a third version of the dynamic programming algorithm, figure 2(iii), which deals in probabilities.
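One common concrete choice for such a universal code (an illustration, not necessarily the paper's exact coding) is Rissanen's log-star code, whose length in bits is log2(n) + log2(log2(n)) + ... (positive terms only) plus log2(c), with c approximately 2.865 the normalising constant:

```python
# Sketch of a universal code length for an integer length L, after
# Rissanen (1983); c ~= 2.865 normalises the implied probabilities.
from math import log2

def log_star_bits(n):
    assert n >= 1
    bits, x = log2(2.865), float(n)
    while True:
        x = log2(x)
        if x <= 0:          # keep only the positive terms of the sum
            break
        bits += x
    return bits

print(round(log_star_bits(16) - log_star_bits(1), 9))   # 7.0 (= 4 + 2 + 1)
```

As required of a length prior, the implied probability falls off only slightly faster than 1/L.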
M's instruction probabilities are called its parameters. They can be estimated from an alignment if they are not known a priori. This leads to an iterative method of estimating the parameters: first, initial estimates are guessed and an initial alignment is found, then the parameters are re-estimated from the alignment, a new alignment is found and the process repeated until convergence. Convergence is guaranteed, and four to eight iterations are usually sufficient, but it is possible to get stuck in local optima for distantly related strings. Machine instructions are drawn from a multi-state distribution and Wallace and Boulton (1968) show with what accuracy such probabilities can be legitimately inferred.
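A toy sketch of one re-estimation step: instruction frequencies are counted from the current alignment and turned into probabilities. The +0.5 per state is a small-sample adjustment in the spirit of Wallace and Boulton's multi-state estimate (an assumption of this sketch, not necessarily the paper's exact formula):

```python
# One re-estimation step for the generation machine's parameters.
from collections import Counter

STATES = ('match', 'mismatch', 'ins_A', 'ins_B')

def estimate(instructions):
    counts = Counter(op for op, *_ in instructions)
    total = sum(counts[s] for s in STATES) + 0.5 * len(STATES)
    return {s: (counts[s] + 0.5) / total for s in STATES}

# e.g. the generation sequence of figure 1 has 9 matches, 1 mismatch,
# 1 ins_A and 1 ins_B:
gen = [('match',)] * 9 + [('mismatch',), ('ins_A',), ('ins_B',)]
print(estimate(gen)['match'])   # (9 + 0.5) / 14, about 0.679
```

In the full method this step alternates with re-alignment under the new estimates until convergence.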
We do sometimes want a most-probable alignment, but it is too detailed an answer for the question, are A and B related by M, i.e. in some unspecified way? This is called the r-theory. To calculate P(A&B|M) we need to sum the probability over all of M's instruction sequences that give A and B. Fortunately this can be done by yet another version of the DPA, figure 2(iv). If the parameters are to be estimated from the strings, the previous iterative method can be used and a weighted average of the observed frequencies of machine instructions is calculated as the DPA proceeds. Simulation results from Yee and Allison (1993) confirm that a single optimal alignment gives biased estimates of machine parameters and that the r-theory gives unbiased estimates. The reason is clear: mutation is random and does not follow an optimal path. Mutations can compensate for each other so estimates based on a single optimal alignment are literally over-optimistic and tend to underestimate systematically the amount of change that occurred. (We do not yet know how to model any pressure of selection.)
It is convenient for the last two DPAs to deal with -log_{2} of probabilities, also known as message lengths, which are measured in bits, rather than with probabilities directly, as the final values of the latter can be very small.
Lastly, there is a natural null-theory, that the strings are not related. When the -log_{2} of its probability is taken, this amounts to approximately two bits per character for the case of DNA. Comparison with the null-theory gives a hypothesis test. Note it can easily be the case that no single alignment is an acceptable hypothesis yet the r-theory, that the strings are related by M, is acceptable.
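A sketch of the r-theory DPA, figure 2(iv) (f = *, h = +), together with the null-theory comparison. The factorisation of tuple probabilities below (uniform characters at 1/4 each, and a mismatched character coded as one of the 3 unequal partners) and the parameter values are assumptions of this sketch:

```python
# r-theory: sum the probability over all alignments, report bits;
# null-theory: two strings coded independently at ~2 bits/character.
import math

def r_theory_bits(a, b, p_match, p_mismatch, p_insA, p_insB):
    g_mm = lambda x, y: (p_match * 0.25 if x == y
                         else p_mismatch * 0.25 / 3)
    g_del, g_ins = p_insA * 0.25, p_insB * 0.25
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0.0] * cols for _ in range(rows)]
    d[0][0] = 1.0
    for i in range(1, rows):
        d[i][0] = d[i-1][0] * g_del
    for j in range(1, cols):
        d[0][j] = d[0][j-1] * g_ins
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = (d[i-1][j-1] * g_mm(a[i-1], b[j-1])   # h = +
                       + d[i-1][j] * g_del + d[i][j-1] * g_ins)
    return -math.log2(d[rows-1][cols-1])     # message length in bits

def null_bits(a, b):     # null-theory, DNA: ~2 bits per character
    return 2 * (len(a) + len(b))

a, b = 'ACGTAATACGT', 'ACGTATAACCT'
print(r_theory_bits(a, b, 0.85, 0.05, 0.05, 0.05) < null_bits(a, b))  # True
```

For the closely related strings of figure 1 the r-theory beats the null-theory, so "related" is the acceptable hypothesis.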
An alignment of K>=2 strings is formed as above by padding out the strings with `-' so that they have the same length and by writing the resulting strings one above another. This gives a formal language of multiple-alignments but its semantics must also be specified. A very natural interpretation of a multiple-alignment is conditional upon an evolutionary tree over the strings: each tuple of the alignment represents the descendants, over all possible histories consistent with the tree, of an ancestral character or an inserted character. We consider unrooted binary trees here, the position of the ancestral string being arbitrary. Each edge of the tree is modelled by a mutation machine. Unrooted trees are natural unless a nearly constant rate of evolution can be assumed and binary trees are sufficient for most purposes, although the general methods could be applied easily to rooted trees and to n-ary trees. The main problems considered here are to find an optimal multiple-alignment given a tree and to estimate the edges (machines) of the tree.
An unrooted tree over K strings has 2K-3 edges or machines. If all their parameters are known, the machines can be combined into an equivalent generation machine for the K strings. The probability of a K-tuple of characters can be calculated during an O(K)-time traversal of the tree; the details are given by Felsenstein (1981, 1983), who considered inferring an evolutionary tree from a fixed alignment, which is the complementary problem to the multiple-alignment problem considered here.
For example, given the tree ((1 2) (3 4)) and assuming that all edges are similar and that P(change) is small, the 4-tuple <x,x,x,y> is most probably explained by a single change, on the edge leading to string 4.
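The traversal can be sketched as Felsenstein-style pruning on the tree ((1 2) (3 4)). The symmetric change model below (P(change) shared equally among the three other bases), the uniform root distribution and the rooting of the tree at an arbitrary internal point are assumptions of this illustration:

```python
# Probability of one K-tuple of characters by an O(K) tree traversal.
from math import prod

BASES = 'ACGT'

def cond_prob(x, y, p_change):
    """P(child has character y | parent has x), for one edge/machine."""
    return 1.0 - p_change if x == y else p_change / 3.0

def tuple_prob(tree, tup, p_change):
    """tree: a leaf index or a pair of subtrees, e.g. ((0, 1), (2, 3))."""
    def lik(node):
        # dict: x -> P(observed leaf characters below node | node = x)
        if isinstance(node, int):
            return {x: 1.0 if x == tup[node] else 0.0 for x in BASES}
        kids = [lik(k) for k in node]
        return {x: prod(sum(cond_prob(x, y, p_change) * k[y]
                            for y in BASES) for k in kids)
                for x in BASES}
    root = lik(tree)
    return sum(0.25 * root[x] for x in BASES)   # uniform root character

t = ((0, 1), (2, 3))
print(tuple_prob(t, 'AAAA', 0.1) > tuple_prob(t, 'AAAC', 0.1))   # True
```

As expected, an all-equal tuple is more probable than one needing a change, which in turn is more probable than a tuple of four distinct characters.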
An optimal multiple-alignment can be useful in its own right but it is likely to give biased estimates of the tree edges as does optimal pairwise alignment. The solution used for K=2 of averaging over all alignments is infeasible for larger K, but a Gibbs sampling or Monte Carlo method (Hastings 1970) is described which averages over many, if not all, multiple-alignments to give a good approximation.
The key to the stochastic methods to be described is the generation of random alignments sampled from the correct probability distribution. The general step of the DPA (figure 2(iv)) calculates P(A[1..i]&B[1..j]|M) by adding the probabilities of its three immediate predecessors:
    diagonal:   P(A[1..i-1] & B[1..j-1] | M)
    vertical:   P(A[1..i-1] & B[1..j]   | M)
    horizontal: P(A[1..i]   & B[1..j-1] | M)
although A and B now represent alignments of sets of strings. If at the same time a random choice is made of one of these directions, in proportion to their probabilities, then a trace-back of these choices will give a random alignment sampled from the probability distribution of alignments. This requires O(|A|*|B|) space which can be reduced to O(|B|) by using Hirschberg's (1975) technique.
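A minimal sketch of this sampling step in Python (the paper's program is Pascal), assuming the tuple probabilities are supplied as functions g_mm, g_del and g_ins, whose illustrative values below are assumptions:

```python
# Run the summing DPA forward, then trace back, choosing each predecessor
# in proportion to its contribution; the result is one alignment sampled
# from the posterior distribution of alignments.
import random

def sample_alignment(a, b, g_mm, g_del, g_ins, rng=random):
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0.0] * cols for _ in range(rows)]
    d[0][0] = 1.0
    for i in range(1, rows):
        d[i][0] = d[i-1][0] * g_del(a[i-1])
    for j in range(1, cols):
        d[0][j] = d[0][j-1] * g_ins(b[j-1])
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = (d[i-1][j-1] * g_mm(a[i-1], b[j-1])   # diagonal
                       + d[i-1][j] * g_del(a[i-1])          # vertical
                       + d[i][j-1] * g_ins(b[j-1]))         # horizontal
    i, j, cols_out = rows - 1, cols - 1, []
    while i > 0 or j > 0:
        w = []
        if i > 0 and j > 0:
            w.append(('diag', d[i-1][j-1] * g_mm(a[i-1], b[j-1])))
        if i > 0:
            w.append(('vert', d[i-1][j] * g_del(a[i-1])))
        if j > 0:
            w.append(('horiz', d[i][j-1] * g_ins(b[j-1])))
        moves, weights = zip(*w)
        move = rng.choices(moves, weights)[0]   # random, by probability
        if move == 'diag':
            cols_out.append((a[i-1], b[j-1])); i -= 1; j -= 1
        elif move == 'vert':
            cols_out.append((a[i-1], '-')); i -= 1
        else:
            cols_out.append(('-', b[j-1])); j -= 1
    return cols_out[::-1]       # list of alignment columns

# illustrative machine parameters (assumptions of this sketch):
g_mm  = lambda x, y: 0.85 * 0.25 if x == y else 0.05 * 0.25 / 3
g_del = lambda x: 0.05 * 0.25
g_ins = lambda y: 0.05 * 0.25
aln = sample_alignment('ACGTAATACGT', 'ACGTATAACCT', g_mm, g_del, g_ins)
```

Every sampled alignment projects back onto the two input strings; repeated calls give different alignments with the correct frequencies.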
The process to sample random multiple-alignments repeatedly chooses a random edge at which to break the tree. The current multiple-alignment is projected onto two sub-alignments for the resulting partition of the strings. A new random two-way realignment of the two sub-alignments is then generated as above. The initial multiple-alignment is formed by a suboptimal process. The search space has many degrees of freedom. The use of sub-alignments amounts to keeping most of them fixed at each step, but the realignment is chosen from the conditional posterior probability distribution which is sufficient for a Gibbs sampling or Monte Carlo method. Parameters are estimated from each multiple-alignment and averaged to give final estimates. Standard deviations are calculated and give an indication of accuracy. A working estimate from "recent" alignments is used during the sampling process.
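The bookkeeping of the sampling run can be sketched as follows: parameter estimates taken from each sampled multiple-alignment are averaged, and standard deviations are reported as an indication of accuracy (as in table 1). The dict layout of the estimates is an assumption of this sketch:

```python
# Average per-sample parameter estimates and report (mean, SD) per parameter.
from math import sqrt

def average_estimates(per_sample):
    out = {}
    for k in per_sample[0]:
        vals = [s[k] for s in per_sample]
        m = sum(vals) / len(vals)
        out[k] = (m, sqrt(sum((v - m) ** 2 for v in vals) / len(vals)))
    return out

# three illustrative per-alignment estimates for one edge:
samples = [{'copy': .80, 'change': .11, 'indel': .09},
           {'copy': .78, 'change': .10, 'indel': .12},
           {'copy': .82, 'change': .12, 'indel': .06}]
mean, sd = average_estimates(samples)['copy']
print(round(mean, 3), round(sd, 4))   # 0.8 0.0163
```

In the full method the per-sample estimates come from the sampled multiple-alignments, one set per tree edge.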
[Unrooted binary tree: ancestor s1 (= s3); internal nodes s2..s7; leaves s8..s15; edges s2-s3, s2-s4, s2-s5, s3-s6, s3-s7, s4-s8, s4-s9, s5-s10, s5-s11, s6-s12, s6-s13, s7-s14, s7-s15.]

During Evolution: Ancestor, s1 = s3, length(s1) = 500, and for each edge
P(copy) = 0.8; P(change) = 0.1; P(insert) = P(delete) = 0.05

Figure 3: Full Tree.
The Gibbs sampling process generates random alignments by the repeated choice of directions in proportion to their probabilities. The choice can be influenced in a useful way if the probabilities are raised to a power, q>=0. When q=0 the choice is uniform. When q=1 the choice is in proportion to probability. When q>1 the choice is biased towards more probable alignments and when q is large the bias is to optimal alignments. Increasing q can therefore be used to implement the cooling of a simulated annealing search for an optimal multiple-alignment. Recall that the programs actually deal with -log_{2} probabilities, so multiplying these quantities by q is equivalent to raising probabilities to the power q. This strategy is very different from that of Ishikawa et al (1992) which repeatedly makes a small perturbation to the multiple-alignment and either accepts the result or rejects it.
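The annealing bias amounts to a one-line change to each random choice, sketched here:

```python
# Choose a direction with weights p_i**q instead of p_i: q = 0 is a
# uniform choice, q = 1 samples in proportion to probability, and a
# large q approaches the greedy choice of the most probable direction.
import random

def choose(moves, probs, q, rng=random):
    return rng.choices(moves, [p ** q for p in probs])[0]
```

Since the program stores message lengths m = -log2 p, weighting by 2^(-q*m) gives the same bias, i.e. the message lengths are simply multiplied by q before the choice.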
A Pascal program has been written to analyse DNA strings using the methods described above. A hash table is used to store the probabilities of tuples when they are met for the first time in the DPA so that later meetings do not need recalculation. The program also uses Hirschberg's technique. Two alignments of four strings of length 500 characters can be aligned to give an alignment of eight strings in 15 to 30 seconds on a Sparcstation. 1000 random alignments of eight strings can be sampled in a few hours. The running time is affected by the similarity of the strings which influences both the lengths of alignments and how many different kinds of tuple are evaluated. The present implementation embodies the simplest one-state mutation machines and assumes that all characters are equally likely, and that for each edge all insertions are equally likely, all changes are equally likely and P(ins) = P(del). These assumptions are not unreasonable for DNA and they could be relaxed. Each edge of the tree has a different mutation machine.
edge                                          Simulated Ann'     Gibbs Sampling 1000x
s2-s3 : 40chng 28ins 27del (55indel) ED= 85   .816 .169 .015   .799(.023) .107(.020) .094(.024)
s2-s4 : 51chng 29ins 25del (54indel) ED= 96   .825 .119 .056   .748(.018) .092(.015) .160(.026)
s2-s5 : 50chng 22ins 18del (40indel) ED= 85   .845 .119 .036   .817(.019) .083(.017) .100(.026)
s3-s6 : 60chng 29ins 23del (52indel) ED=103   .828 .164 .008   .816(.014) .141(.015) .043(.013)
s3-s7 : 50chng 26ins 31del (57indel) ED= 97   .860 .080 .060   .759(.021) .109(.018) .132(.032)
s4-s8 : 48chng 25ins 24del (49indel) ED= 90   .842 .116 .042   .810(.018) .099(.009) .092(.019)
s4-s9 : 37chng 22ins 26del (48indel) ED= 80   .852 .084 .064   .837(.020) .079(.010) .085(.020)
s5-s10: 51chng 29ins 35del (64indel) ED=108   .827 .062 .111   .763(.023) .068(.013) .169(.032)
s5-s11: 53chng 23ins 17del (40indel) ED= 92   .830 .139 .031   .817(.015) .111(.013) .071(.012)
s6-s12: 60chng 23ins 29del (52indel) ED= 99   .795 .145 .060   .773(.014) .138(.012) .089(.019)
s6-s13: 53chng 21ins 20del (41indel) ED= 92   .861 .078 .061   .835(.011) .096(.014) .069(.017)
s7-s14: 54chng 18ins 23del (41indel) ED= 89   .825 .122 .053   .798(.012) .109(.009) .093(.013)
s7-s15: 58chng 30ins 28del (58indel) ED=108   .826 .097 .077   .785(.013) .100(.010) .115(.015)

Mean:                                         .833 .115 .052   .797       .102       .101

-log_{2} probabilities or message lengths (bits): 7705   9336(±274) mean(SD)   Null = 8088

Key:
edge: actual #changes #inserts #deletes (#indels) edit-distance
      P(copy) P(chng) P(indel)              -- simulated annealing
      P(copy)(SD) P(chng)(SD) P(indel)(SD)  -- Gibbs sampling
Evolution, each edge: P(copy)=0.8; P(chng)=0.1; P(ins)=P(del)=0.05
Length(s1=s3) = 500

Table 1: Inferred Parameters.
The current program has been used on both artificial and real DNA data, the evolutionary history of artificial strings being known.
Figure 3 shows a tree used in one test. The ancestral string is s1, the first generation descendants s2 and s3, and so on. Strings s1 and s3 are identical because we are dealing with unrooted trees. The machines on all edges were similar during evolution, causing 20% mutation on average: P(copy) = 0.8, P(change) = 0.1 and P(ins) = P(del) = 0.05. The process was random so there was variation around these figures. These parameter values were not known to the analysis program which was given only the tree topology and the third generation descendants s8 to s15. 20% mutation per edge may seem a small amount but brothers, such as s8 and s9 say, are separated by two edges, cousins such as s8 and s10 by four edges, and s8 and s12 by five edges. This data is at quite a high level of mutation for DNA where the small character set causes many spurious matches.
Table 1 summarises the results of the test. It shows the edge parameters estimated from the putative optimal alignment found by simulated annealing and those estimated by averaging over 1000 alignments from Gibbs sampling. The following values from the evolution of the strings are also given for each edge: actual numbers of changes, inserts and deletes, and the simple edit-distance of the two strings at the ends of the edge. Note that every one of these edit-distances is less than the sum of the corresponding mutations as a "better" explanation than the true events can be found. The optimal(?) multiple-alignment underestimates P(indel) in particular and this is a general tendency that is greatest for internal edges of the tree and increases with mutation level.
[Two alternative unrooted trees over CATQ, CATB, CATD, CATP and CCOLI, with estimated edge parameters shown as (P(copy), P(change), P(indel)), e.g. (.81, .19, .00) and (.67, .32, .00) on two of the edges.]

-log_{2} probabilities or message lengths (bits):

Tree                                 optimal   Gibbs mean(SD)
(((CATB CATQ)(CATD CATP))CCOLI)       4369     4555(±40)
((CATB CATQ)(CATD(CATP CCOLI)))       4376     4502(±26)
Null:                                 6449

Data: Genbank-id  organism  designation
m55620  Clostridium perfringens  CATQ   chloramphenicol acetyltransf...
m28717  Clostridium perfringens  CATP   chloramphenicol-resistance p...
x15100  Clostridium difficile    CATD   gene for chloramphenicol ace...
m93113  Clostridium butyricum    CATB   chloramphenicol acetyltransf...
m35190  Campylobacter coli       CCOLI  chloramphenicol acetyltransf...

Figure 4: Two Alternative Trees for Five Bacterial Sequences.
The mean machine from simulated annealing further illustrates this bias. The estimates from Gibbs sampling do not show bias and the discrepancy between the "true" and estimated values of P(indel) is less than two standard deviations, except for the edge s3-s6, and is often much less. The standard deviations of the estimates are large, as is to be expected for this level of mutation, but a single (optimal) alignment gives no indication of this inherent uncertainty. The optimal alignment beats the null-theory on message length and so is an acceptable hypothesis but an "average" alignment fails to be acceptable by a wide margin. However, such alignments collectively dominate the search space and the total probability, which gives an indication of how many of them there are; only a tiny fraction is sampled by Gibbs sampling.
Many similar tests were done, with balanced and unbalanced trees and with different levels of mutation, some being recorded by Allison and Wallace (1993). At low levels of mutation, <10% per edge, the deterministic heuristic is rarely beaten by simulated annealing in the search for an optimal alignment and such an alignment can give reasonable edge estimates. At greater levels of mutation, >=15% per edge, simulated annealing invariably beats the deterministic heuristic and optimal alignments give increasingly biased edge estimates. These effects would be modified for a different alphabet. Much experimentation remains to be done; in particular the tuning of simulated annealing algorithms is something of a black art.
Five bacterial DNA sequences for chloramphenicol acetyltransferase (CAT)-encoding resistance determinant (Huggins et al 1992) provide an interesting case summarised in figure 4. The coding regions or open reading frames were compared; these are about 640 characters long. It is clear from comparisons of four sequences that the (unrooted) tree topology ((CATB CATQ) (CATD CATP)) is strongly preferred over the two other possible topologies. Where CCOLI fits in is less clear. The tree topology (((CATB CATQ) (CATD CATP)) CCOLI) is slightly preferred on the basis of an optimal alignment, but the tree topology ((CATB CATQ) (CATD (CATP CCOLI))) is preferred on the basis of average alignment probabilities. This suggests some genuine uncertainty. It is resolved with more information: Huggins et al performed an analysis using protein sequences, including sequences from more organisms, and bringing experts' knowledge to the problem, and (((CATB CATQ) (CATD CATP)) CCOLI) is a subtree of their preferred tree.
Deterministic algorithms to estimate the edges of an evolutionary tree and to find an optimal alignment of more than two strings are infeasible in general. Stochastic processes provide one way around this problem. Averaging over many random alignments, sampled from the correct probability distribution, gives good estimates of the tree's edges. Secondly, sampling in an increasingly selective way gives a simulated annealing search for an optimal alignment. Although valuable for some purposes, an optimal alignment gives biased estimates of tree edges, the effect increasing with the level of mutation and being worst for internal edges of the tree. A program based on these ideas has been implemented for DNA strings using a simple one-state model or mutation machine. Current work includes generalising the model of mutation that is used.
The stochastic techniques could be applied to other kinds of strings and to related problems. It is not essential for probabilities to be calculated exactly for the simulated annealing method (only) to work; it is enough that the rank-ordering of the better alignments be correct at low temperature. This would allow simulated annealing to be applied to star and all-pairs costs (Altschul & Lipman 1989) and to multiple LCS and edit-distance problems in a direct way.
The processes described are good candidates for both fine-grain and coarse-grain parallel implementations. Collins and Coulson (1984) and Dautricourt et al (1991) have implemented fine-grain parallel versions of DPAs for two strings and this could be done for the central step of our stochastic processes. Dautricourt et al report that, for some algorithms, 1 hour's processing on a Sparcstation could be done in 27 seconds on a Maspar. Independent copies of the stochastic processes could also be run as a coarse-grain parallel program.
Allison L. (1993) A fast algorithm for the optimal alignment of three strings. J. Theor. Biol. 164(2) 261-269.
Allison L. & C. S. Wallace. (1993) The posterior probability distribution of alignments and its application to parameter estimation of evolutionary trees and to optimisation of multiple alignments. TR 93/188, July 1993, Dept. Comp. Sci., Monash University.
Allison L., C. S. Wallace & C. N. Yee. (1992a) Finite-state models in the alignment of macromolecules. J. Mol. Evol. 35(1) 77-89. Also see TR 90/148.
Allison L., C. S. Wallace & C. N. Yee. (1992b) Minimum message length encoding, evolutionary trees and multiple alignment. 25^{th} Hawaii Int. Conf. Sys. Sci. 1 663-674.
Allison L. & C. N. Yee. (1990) Minimum message length encoding and the comparison of macromolecules. Bull. Math. Biol. 52(3) 431-453.
Altschul S. F. & D. J. Lipman. (1989) Trees, stars and multiple biological sequence alignment. SIAM J. Appl. Math. 49(1) 197-209.
Bishop M. J., A. E. Friday & E. A. Thompson. (1987) Inference of evolutionary relationships. In Nucleic Acid and Protein Sequence Analysis, M. J. Bishop & C. J. Rawlings (eds), IRL Press, Oxford 359-385.
Carrillo H. & D. J. Lipman. (1988) The multiple sequence alignment problem in biology. SIAM J. Appl. Math. 48 1073-1082.
Collins J. F. & A. F. W. Coulson. (1984) Applications of parallel processing algorithms for DNA sequence analysis. Nucl. Acids Res. 12(1) 181-192.
Dautricourt J.-P., D. Brutlag, B. Moxon & J. Fier. (1991) Similarity searching algorithm comparisons using massively parallel computing hardware. Genome Sequencing Conf. III.
Felsenstein J. (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17 368-376.
Felsenstein J. (1983) Inferring evolutionary trees from DNA sequences. In Statistical Analysis of DNA Sequence Data, B. S. Weir (ed), Marcel Dekker 133-150.
Hastings W. K. (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 97-109.
Hein J. (1989) A tree reconstruction method that is economical in the number of pairwise comparisons used. Mol. Biol. & Evol. 6 669-684.
Hirschberg D. S. (1975) A linear space algorithm for computing maximal common subsequences. CACM 18(6) 341-343.
Huggins A. S., T. L. Bannam & J. I. Rood. (1992) Comparative sequence analysis of the catB gene from Clostridium butyricum. Antimicrobial Agents & Chemotherapy 36(11) 2548-2551.
Ishikawa M., T. Toya, M. Hoshida, K. Nitta, A. Ogiwara & M. Kanehisa. (1992) Multiple sequence alignment by parallel simulated annealing. Institute for New Generation Computing (ICOT) TR-730.
Lake J. A. (1991) The order of sequence alignment can bias the selection of tree topology. Mol. Biol. Evol. 8(3) 378-385.
Levenshtein V. I. (1966) Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady 10(8) 707-710.
Rissanen J. (1983) A universal prior for integers and estimation by minimum description length. Ann. Stat. 11(2) 416-431.
Sankoff D. & J. B. Kruskal (eds). (1983) Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley.
Sellers P. H. (1974) On the theory and computation of evolutionary distances. SIAM J. Appl. Math. 26(4) 787-793.
Thorne J. L. & H. Kishino. (1992) Freeing phylogenies from artifacts of alignment. Molec. Biol. & Evol. 9(6) 1148-1162.
Thorne J. L., H. Kishino & J. Felsenstein. (1991) An evolutionary model for maximum likelihood alignment of DNA sequences. J. Mol. Evol. 33 114-124.
Thorne J. L., H. Kishino & J. Felsenstein. (1992) Inching towards reality: an improved likelihood model of sequence evolution. J. Mol. Evol. 34 3-16.
Tichy W. F. (1984) The string to string correction problem with block moves. ACM Trans. Comp. Sys. 2(4) 309-321.
Ukkonen E. (1983) On approximate string matching. In Proc. Int. Conf. on Foundations of Computation Theory, M. Karpinski (ed), Springer-Verlag LNCS 158 487-495.
Wagner R. A. & M. J. Fischer. (1974) The string to string correction problem. JACM 21(1) 168-173.
Wallace C. S. & D. M. Boulton. (1968) An information measure for classification. Comp. J. 11(2) 185-194.
Wallace C. S. & P. R. Freeman. (1987) Estimation and inference by compact coding. J. Royal Statist. Soc. B 49(3) 240-265.
Yee C. N. & L. Allison. (1993) Reconstruction of strings past. Comp. Appl. BioSci. 9(1) 1-7.
Note, 17th Annual Computer Science Conference, ACSC17, Christchurch, New Zealand, Proceedings are published in Australian Computer Science Communications Vol 16 (!) No 1, 1994, isbn:047302313X, ed. G. Gupta.