program MMLAlign( input, output, LogSumData );

{L. Allison, Dept. Computer Science, Monash University, Australia 3800.
             (later Faculty of Information Technology, as of 2006)
 lloyd.allison AT infotech.monash.edu.au
 http://www.allisons.org/ll/Bioinformatics/Multiple/

 Prototype MML Evolutionary Tree and Multiple-Alignment   25 May 1993.
 Based on an underlying 1-state model of string mutation.
 MML (Minimum Message Length) Inference is a Bayesian method.
 
 (You also need to fetch file 'LogSumData'.)

 This code available under GNU copyleft, General Public Licence, GPL.

 Definition of "Prototype":
    has some good ideas but should be thrown away and rewritten.

-------------------------------------------------------------------------------

If you use this program, bits of the code, or ideas in it, please reference:

[1] L.Allison, C.S.Wallace & C.N.Yee
    Minimum message length encoding, evolutionary trees and multiple-alignment.
    27th Hawaii Int Conf Sys Sci (HICSS-27), Vol.1, pp.663-674, IEEE Jan 1992.

[2] L. Allison & C. S. Wallace.
    The Posterior Probability Distribution of Alignments and its
    Application to Estimation of Evolutionary Trees and Optimisation of
    Multiple Alignments.
    Jrnl. Molec. Evol., Vol.39, pp.418-430, 1994.
    http://dx.doi.org/10.1007/BF00160274

definitely, and maybe also:

[3] L.Allison, C.S.Wallace & C.N.Yee
    Finite-state models in the alignment of macro-molecules.
    Jrnl. Molec. Evol., Vol.35, pp.77-89, 1992.
    http://dx.doi.org/10.1007/BF00160262

[4] C. N. Yee & L. Allison.
    Reconstruction of strings past.
    J. Bioinformatics (was CABIOS), 9(1), pp.1-7, Feb 1993.
    http://bioinformatics.oxfordjournals.org/cgi/content/abstract/9/1/1
-------------------------------------------------------------------------------

Output:
-------
The output is moderately voluminous as information is given during the
iterative alignment-search process.
One Alignment is given per tree.

The `bit' quantities are bits of information.

And `params' refers to the number of bits spent in specifying the estimated
parameters of the edges of the tree.
This gives an indication of the number of significant figures in the estimates.

The parameters are inferred from the data.
Each edge has 3 parameters : p(match), p(change), p(indel).
These sum to 1.0 so there are 2 degrees of freedom per edge.

As a ROUGH guide ONLY,  -log(1-p(match)) can be taken as a kind
of distance on strings, especially if p(match) is high.
See [1] for how to combine, or add, 2 edges properly.

A difference in message lengths can be taken as a posterior -log odds-ratio
on 2 hypotheses, eg. 2 trees.
(I would not bet any $money$ on the strength of a few bits!)

Limitations:
------------
The program has a certain statistical model[4] of evolution.
The results must *** NOT *** be relied upon if the model is inappropriate:

 . No gap penalty [this is a hard problem in multiple alignment],
   so do NOT compare strings of very different lengths.

 . All substitutions are equally probable
   so the program is unsuitable for proteins.

A single optimal alignment is used which can be expected to bias edge
estimates, especially for distant strings, see [4]. We would prefer to average
over all (or most) alignments, but this is hard for several strings.

The alignment search-heuristic is not guaranteed to find the optimal alignment.
It grows less reliable as the strings grow more dissimilar.

The `hash-table' (see below) is really a lookup-table of size 5**K
(for DNA/RNA).
K <= 8 is possible, depending on the memory available.
Beyond this a real hash-table would be needed [not too hard].

Input:
------
K                             -- number of strings
>shortname                    -- each string has a short id     |
long description on one line  --                                |
acgtacgtacgt                  -- sequence on 1 or more lines    | 1st string
acgtacgt                      -- bad characters give a warning  |
acgt acgt                     -- digits and spaces ignored.     |
                              -- terminated by blank line       |
>shortname2                   --
description 2                 --
acgt...                       --

((1 2)(3 4))                  -- say, i.e. one or more trees
((1 3)(2 4))                  -- NB they are really UNrooted trees.
<eof>                         -- end of file

You must specify the trees that are to be tried.
The intention is that any pair-wise distance matrix method be used to
suggest a few likely trees.
-------------------------------------------------------------------------------

 K strings, average length N

 Time  is O(K * N**2)   - per tree.
 Space is O(N**2)  bits - if packed obeyed
 except that the HashTable is (|Alphabet|+1)**K - could easily
 be rather more compact
===============================================================================
}
label 99; {finish}

const AlphabetSize = 4;        {DNA -> 4}
      Null         = 0;        {Null "character"}
      Infinity     = 1.0E+8;   {at least big! bigger can give "underflow"}

      Kmax         = 8;        {max # of real strings}
      HypMax       = 14;       {max real or hyp' string #, = 2*Kmax-2}
      McMax        = 13;       {max machine #, = 2*Kmax-3}
      TupleMax     = 390624;   {(AlphabetSize+1)**Kmax - 1}

      Nmax         = 2000;     {max string length}
      Nmax1        = 2001;     {Nmax+1}

      Convergence  = 1.0;                            {convergence limit (bits)}

type  alfa         = packed array [1..10] of char;

      Alphabet     = 0..AlphabetSize; {0 is Null}
      AlphabetSet  = set of Alphabet;
      AlphaProb    = array [Alphabet] of real; {a fuzzy char}
      StringIndex  = 0..Nmax1;
      String       = record name :alfa;
                            description :packed array [1..80] of char;
                            N :StringIndex;                   {string's length}
                            c :array [StringIndex] of char     {chars as input}
                     end;
      StringNum    = 0..Kmax;                 {real, present-day, leaf strings}
      StringSet    = set of StringNum;
      Hypothetical = 0..HypMax;                     {real or ancestral strings}

      Tuple        = array [StringNum] of Alphabet;  {a column of an alignment}
      TupleNum     = 0..TupleMax;

      MachineNum   = 0..McMax;
      Operation    = (NoOp, Copy, Change, Indel);      {NB. NoOp == Non Op !!!}
      Machine      = array [Operation] of real;
      Machines     = array [MachineNum] of Machine;

      Tree         = ^ node;  {NB. "real" tree is UNrooted, top node is unused}
      node         = record left, right :Tree;
                            s :Hypothetical; m :MachineNum  {machine "above" s}
                     end;

      Direction    = (Horizontal, Vertical, Diagonal);
      Alignment = record K :StringNum;         {# of strings in the alignment}
                         s :array [StringNum] of StringNum; {which ones in A'}
                         N :StringIndex;                    {alignment length}
                         t :array [StringIndex] of TupleNum; {"fuzzy" "chars"}
                         i :array [StringNum, StringIndex] of StringIndex
                  end;
{ A.K = the number of strings in an alignment A.
  A.s[x] is the string number of the x-th string in A, 1<=x<=A.K.
  A.N = the number of columns in A.
  A.t[x] is the x-th column of the alignment as number of K digits in
     base |Alphabet|+1, corresponding to A.K as most sig' digit down to 1.
     eg. for DNA, "AA_C" gives 1102 base 5.
  A.i[x, y] is an index into string A.s[x] corr' to column y of A, 1<=y<=A.N
     The index is zero if a Null is indicated.
  S[ A.s[x] ].c[ A.i[x, y] ] gets a char of the x-th string in alignment A
     corresponding to the y-th column of A. (Fake 0-th char if index 0.)
}
      SearchMode = (Deterministic, NonDeterministic); {manner of (re)aligning}

var   T :Tree; K :StringNum;
      S :array [StringNum] of String;    str :StringNum;
      Mc, Est :Machines;  M :MachineNum;
      A :Alignment;
      StrML :real; {message length of characters of strings only}

      CharToAlpha :array [char] of Alphabet;  ch :char;     {conversion tables}
      AlphaToChar :array [Alphabet] of char;
      A2toOp :array [Alphabet, Alphabet] of Operation;
      x, y :Alphabet;
      ioffset, joffset :array [Direction] of integer;
      seed :record lo, hi :integer end; { seed needed by generator }
      LogSumArr  :array[0..959] of real;  {for LogSum}
      LogSumData :text;

{-----------------------------------------------------------------------------}

procedure Error( m :alfa );
   begin writeln('Error: ', m); goto 99 end;

procedure Assertion( b :boolean; m :alfa );
   begin if not b then Error(m) end;

function min( i, j :integer ) :integer;
   begin if i<j then min := i else min := j end;

function log2( x :real ) :real; begin log2 := ln(x) / ln(2.0) end;
{ x = 2^log2(x) = e^ln(x) = (e^ln(2))^log2(x) = e^(ln(2)*log2(x)) }

function exp2( x :real ) :real; begin exp2 := exp(x * ln(2.0)) end;
{ 2^x = (e^ln(2))^x = e^(ln(2)*x) }

function random(i : integer) : integer;
{30 bit linear-conguential pseudo random number generator }
{Guarantees no integer overflow                           }
const K32 = 32768 {= 2**15};
var   l, h : integer;
begin l := seed.lo * 20077 + 12345;
      h := seed.hi * 20077 + seed.lo * 908;
      seed.lo := l mod K32;
      seed.hi := (h + l div K32) mod K32;
      random  := (seed.hi + seed.lo * K32) mod i;
{On the algorithm - it is the same as the following in 30 bit unsigned
 arithmetic. swap is the seed with the upper and lower 15 bits swapped.
 The simpler formula below requires silent integer overflow, of course.
        seed := seed * 29755421 + 12345;
        swap := (seed mod K32) * K32 + seed div K32;
        random := swap mod i;
}
end {random};

function NTrees( Leaves :integer ) :integer;
   var i, NT :integer; {number of unrooted binary trees over `Leaves'}
begin NT := 1;
      for i := 1 to (2*Leaves-5) div 2 do NT := NT * (2*i+1);
      NTrees := NT
end;

function logstar( x :real ) :real;
begin if x>1 then begin x := log2(x); logstar := x+logstar(x) end
             else logstar := 0
end;

function U( L :real ) :real; { Code length for integers  L >= 1 ; Rissanen}
   begin U := logstar(L) + log2(2.865) end;

procedure NullTheory;
{P(len1, len2, ... lenK | len1+...+lenK) =          K_omial distribution
    (len1+...+lenK)! / (len1!*...*lenK! K**(len1+..+lenK) }
   var str :StringNum; i, TotalLength :integer;
       TotLenML, LengthsML, StrML :real;
begin TotalLength := 0;
      for str := 1 to K do TotalLength := TotalLength + S[str].N;
      TotLenML := U( TotalLength );{should this be the average instead ??? ***}

      LengthsML := TotalLength * log2( K );
      for i := 2 to TotalLength do LengthsML := LengthsML - log2(i);
      for str :=1 to K do
         for i := 2 to S[str].N do
            LengthsML := LengthsML + log2(i);

      StrML := TotalLength * log2(AlphabetSize); {assume equi-probable}

      writeln( 'Null Theory=', U(K)+TotLenML+LengthsML+StrML :6:1, ' bits',
               '=', U(K):4:1, '(K)',
               '+', TotLenML :6:1, '(tot len)',
               '+', LengthsML:6:1, '(diff len''s)',
               '+', StrML    :6:1, '(chars)' )
end {NullTheory};

function paramcost( fcopy, fchange, findel :real ) :real;
{ see: Boulton and Wallace  or  Wallace and Freeman }
   const NumOps = 3; {the "alphabet" of operations}
   var   fsum, pcopy, pchange, pindel, V, deltaVsq :real;
begin fsum := fcopy + fchange + findel;
      pcopy := fcopy / fsum;
      pchange := fchange / fsum;
      pindel := findel / fsum;
      deltaVsq := pcopy * pchange * pindel / exp((NumOps-1)*ln(fsum/12));
      V := 1/2 { the param-space volume, = 1/(NumOps-1)! };
      paramcost := ln( 1 + V*V/deltaVsq) / 2 / ln(2){make bits}
end;

function errorcost :real;
{ delta increase in msg length through using finite approx to param values}
   const NumOps = 3;
begin errorcost := (NumOps - 1)/2 * log2(exp(1))
end;

function Expn( x :real; n :integer ) :real;
begin if n < 0 then Expn := 1/Expn( x, -n )
      else if n = 0 then Expn := 1
      else if odd(n) then Expn := x*sqr(Expn(x, n div 2))
      else Expn := sqr(Expn(x, n div 2))
end;

function sum( P :AlphaProb ) :real;
   var s :real; x :Alphabet;
begin s := 0;
      for x := Null to AlphabetSize do s := s + P[x];
      sum := s
end;

{-----------------------------------------------------------------------------}

function LogSum(a, b :real):real;
{        Overall error is less than 10**-6
         Table of polynomial coefficients for the piecewize approximation
         of the function
                 log (1 + 2 ** -x)        (log is to base 2)        (x .ge. 0)
         To use the table, form y = 8 * x.  If y .ge. 239.5, fun = 0
         Otherwise, form k = nint(y), z = y-k.  Then k is in 0 ... 239,
         z is in -0.5 ... 0.5.
         Fun is approx given by the poly in z whose coeffs are in d()
         d(4k+3) is the constant term, d(4k) is the coeff of z ** 3
         Chris Wallace 1989
         converted to Pascal, L.Allison 1993
}
var x, y, abm, apr :real; i, k :integer;
begin if LogSumArr[0] <> 0.0 then {must initialise LogSumArr[]}
      begin reset(LogSumData);
            for i := 0 to 959 do
            begin read(LogSumData, k); Assertion(i=k, 'LogSumInit');
                  readln(LogSumData, LogSumArr[i])
      end   end;

      x := (a - b);
        if x >= 0.0 then abm := a
        else begin abm := b; x := -x end;
      { abm is max (a,b), x is abs (a-b) }

        y := 8.0 * x;
        k := round(y);
        if k >= 240 then LogSum := abm
        else
        begin y := y - k;
              k := k * 4;
              apr := LogSumArr[k];
              for i := 1 to 3 do
              begin k := k + 1;
                    apr := apr * y + LogSumArr[k]
              end;
              LogSum := abm + apr
        end
end {LogSum};

{-----------------------------------------------------------------------------}

procedure NormaliseMachines( var Mc :Machines );
   var M :MachineNum; Op :Operation; Optotal :real;
begin for M := 1 to 2*K-3 do
      begin Optotal := 0.0;
            for Op := Copy to Indel do
               Optotal := Optotal+Mc[M, Op];
            for Op := Copy to Indel do
               Mc[M, Op] := Mc[M, Op]/Optotal
end   end;

procedure ShowMachines( Mc :Machines );
   var M :MachineNum; Op :Operation;
begin for M := 1 to 2*K-3 do
      begin write( 'mc', M:1, ': ');
            for Op := Copy to Indel do
               write(Mc[M, Op] :6:4, ' ');
            writeln
end   end;

{-----------------------------------------------------------------------------}

procedure ShowInfixTree( T :Tree );
begin if T^.left = nil then write(T^.s:1)
      else begin write('('); ShowInfixTree(T^.left); write(' ');
                 ShowInfixTree(T^.right); write(')')
           end
end;

procedure ShowTree( T :Tree ); {print tree down the page}

   procedure ST(T :Tree; depth :integer);

      procedure PadOut;
         var i :integer;
         begin for i := 1 to depth do write(' ':14, '|') end;

   begin if T <> nil then
         begin ST(T^.right, depth+1);
               PadOut; writeln('<--(m', T^.m:2, ')-->s', T^.s:2, '|');
               ST(T^.left, depth+1)
         end
   end {ST};

begin ST(T^.right, 0);
      writeln('^'); writeln('|'); writeln('v');
      ST(T^.left,  0);
      writeln('NB. UNrooted tree')
end {ShowTree};

{-----------------------------------------------------------------------------}

function ParseTree :Tree;
   type symbol = (open, close, numeral, eoT);
   var sy :symbol; ch :char; n :integer;
       StrSeen :StringSet;

   procedure Error(m :alfa);
   begin writeln;
         writeln('Error: ', m, ' sy=', ord(sy):1, ' ch=', ch, ' n=', n:1);
         while not eof do
         begin while not eoln do begin read(ch); write(ch) end;
               readln; writeln
         end;
         writeln; goto 99
   end;

   procedure getch;
   begin if eof then ch := '#'
         else if eoln then ch := '#'
         else read(ch)
   end;

   procedure insymbol;
   begin while ch = ' ' do getch;
         if ch in ['(', ')'] then
         begin case ch of
                  '(': sy := open;
                  ')': sy := close
               end;
               getch
         end
         else if ch in ['0'..'9'] then
         begin sy := numeral; n:=0;
               while ch in ['0'..'9'] do
               begin n := n*10+ord(ch)-ord('0'); getch
               end {ch not in ['0'..'9']}
         end
         else if ch='#' then sy:=eoT
         else Error('bad symbol')
   end {insymbol};

   procedure check(s :symbol; m :alfa);
   begin if sy=s then insymbol else Error(m) end;

   function PT :Tree;
      var P :Tree;
   begin
      new(P);
      if sy = open then
      begin insymbol;
            P^.s := 0; P^.left := PT; P^.right:=PT;
            check(close, 'no )      ')
      end
      else if sy = numeral then
      begin Assertion((n>=1)and(n<=K)and not (n in StrSeen), 'string #  ');
            P^.s := n; StrSeen := StrSeen+[n]; insymbol;
            P^.left:=nil; P^.right:=nil
      end
      else Error('bad Tree  ');
      PT := P
   end {PT};

begin StrSeen := []; ch := ' '; insymbol;
      ParseTree := PT;
      check(eoT, 'end of T  '); Assertion(StrSeen = [1..K], 'string #s ');
      readln
end {ParseTree};

{. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .}

procedure Name( T :Tree ); {Name internal nodes}
   var dontcare :integer;

   function N( T :Tree; newname :integer ) :integer;
   begin if T^.left <> nil then {not a leaf}
         begin T^.s := N(T^.right, N(T^.left, newname));
               N := T^.s + 1
         end
         else {leaf is already named} N := newname;
         T^.m := min(T^.s, 2*K-3)
   end;

begin T^.m := 0;{unrooted}
      dontcare := N( T^.right, N(T^.left, K+1) );
      T^.right^.m := min(T^.left^.m, T^.right^.m);
      T^.left^.m  := T^.right^.m
end {Name};

{-----------------------------------------------------------------------------}

procedure ReadString( var S :String );
   var ch :char; i :integer; finished :boolean;
       BadChars :set of char; NBadChars :integer;
begin Assertion(input^ = '>', 'missing > ');
      read(ch); S.name := '?         ';
      i:=0;
      while not eoln do {name}
         begin i:=i+1; read(ch); if i<=10 then S.name[i]:=ch end;
      readln;

      for i := 1 to 80 do S.description[i] := ' ';
      i := 0;
      while not eoln do {description}
         begin i:=i+1; read(ch); if i<=80 then S.description[i]:=ch end;
      readln;

      S.N := 0; S.c[0] := '?'; finished:=false; BadChars := []; NBadChars := 0;
      while not finished do
      begin
         while not eoln do
         begin read( ch );
               if not( ch in ['0'..'9', ' '] ) then {skip numbering and space}
               begin
                  if ch in ['a','c','g','t','u','A','C','G','T','U'] then
                       begin Assertion( S.N < Nmax, 'too long  ');
                             S.N := S.N+1; S.c[S.N] := ch
                       end
                  else begin BadChars := BadChars + [ch];
                             NBadChars := NBadChars + 1
                       end
               end
         end;{eoln}
         readln;
         if eof then finished:=true                    {terminated with eof or}
            else if eoln then begin finished:=true; readln end  {blank line or}
               else if input^ = '>' then finished:=true           {next string}
      end;{finished}
      Assertion( S.N > 0, 'Str empty ');

      if NBadChars > 0 then
      begin write(' WARNING: ', NBadChars:1, ' bad chars skipped: [');
            for ch := chr(0) to chr(127) do if ch in BadChars then write(ch);
            writeln(']')
      end
end;

procedure StrToAlign( str :StringNum; var A :Alignment );
{convert S[str] to a trivial alignment of 1 string}
   var i :StringIndex;
begin A.N := S[str].N; A.K := 1; A.s[1] := str; A.t[0] := 0; A.i[1,0] := 0;
      for i := 1 to S[str].N do
      begin A.t[i] := CharToAlpha[ S[str].c[i] ]; A.i[1,i] := i
      end
end;

procedure Project( A :Alignment; ss :StringSet; var X, Y :Alignment );
{project an alignment A onto 2 subalignments X and Y as determined by}
{the strings in ss. Those in ss -> X, others -> Y.}
   var i, inA :StringIndex; str, strN :StringNum; strX, strY :integer;

   procedure RemoveZeros( var A :Alignment );{remove all-null tuples}
      var i, N :StringIndex; str :StringNum;
   begin N := A.N; A.N := 0;
         for i := 1 to N do
            if A.t[i] <> 0 then {remove zero tuples}
            begin A.N := A.N + 1;
                  A.t[A.N] := A.t[i];
                  for str := 1 to A.K do A.i[str, A.N] := A.i[str, i]
            end
   end;

begin X.N := A.N; X.K := 0; X.t[0] := 0;
      Y.N := A.N; Y.K := 0; Y.t[0] := 0;
      for str := 1 to A.K do
         if A.s[str] in ss then
              begin X.K := X.K+1; X.s[X.K] := A.s[str]; X.i[X.K,0] := 0 end
         else begin Y.K := Y.K+1; Y.s[Y.K] := A.s[str]; Y.i[Y.K,0] := 0 end;
      Assertion(X.K+Y.K = A.K, 'project   ');

      for i := 1 to A.N do
      begin strX := X.K+1; strY := Y.K+1;
            X.t[i] := 0; Y.t[i] := 0;
            for str := A.K downto 1 do {up a column of A}
            begin strN := A.s[str];
                  inA  := A.i[str, i];
                  if strN in ss then {one for X}
                  begin strX := strX - 1;
                        X.i[strX,i] := inA;
                        X.t[i] := X.t[i] * (AlphabetSize+1) +
                                  CharToAlpha[ S[strN].c[inA] ]
                  end
                  else {one for Y}
                  begin strY := strY - 1;
                        Y.i[strY,i] := inA;
                        Y.t[i] := Y.t[i] * (AlphabetSize+1) +
                                  CharToAlpha[ S[strN].c[inA] ]
                  end;
            end {str}
      end {i};

      RemoveZeros( X ); RemoveZeros( Y )
end {Project};

procedure ShowNames( A :Alignment );
   var str :StringNum; i :integer;
begin for str := 1 to A.K do
      begin write( 's', A.s[str] :1, ': ', S[A.s[str]].name, ': ' );
            for i := 1 to 60 do write(S[A.s[str]].description[i]);
            writeln
      end
end;

procedure ShowNumbers( A :Alignment );
   var str :StringNum;
   begin for str := 1 to A.K do write(' s', A.s[str]:1 ) end;

procedure ShowAlignment( A :Alignment );
   const cols = 50;
   var i, j :StringIndex; str, strN, temp :StringNum;
       Perm :array [StringNum] of StringNum;
       col :1..cols;
       Letters :array [1..cols] of AlphabetSet;  L :Alphabet;
       Letter  :array [1..cols] of Alphabet;
       NLetters:array [1..cols] of integer;
begin
   write('Alignment:');
   for str := 1 to A.K do Perm[str] := str;
   for str := 1 to A.K-1 do {Bubblesort}
   {inv: Perm[1..str-1] sorted and before Perm[str..A.K]}
      for strN := A.K-1 downto str do
         if A.s[Perm[strN]] > A.s[Perm[strN+1]] then
         begin temp := Perm[strN];
               Perm[strN] := Perm[strN+1]; Perm[strN+1] := temp
         end;

   for i := 0 to (A.N - 1) div cols do {each slab}
   begin writeln;
         for col := 1 to cols do begin Letters[col]:=[]; NLetters[col]:=0 end;

         for temp := 1 to A.K do {each string in A}
         begin str  := Perm[temp]; strN := A.s[str];
               j := i*cols + 1; {find next non-zero index for str}
               while (j < A.N) and (A.i[str, j] = 0) do j:=j+1;
               {j=A.N or A.i[str,j]<>0}
               write( 's', strN :2, ' ', S[strN].name,
                      '[', A.i[str,j] :4, '.. ]: ');

               for j := i*cols+1 to min((i+1)*cols, A.N) do
               begin
                  col := j-i*cols;
                  L := CharToAlpha[ S[strN].c[ A.i[str,j] ] ];
                  if not(L in Letters[ col ]) then
                  begin Letters[ col ] := Letters[ col ] + [L];
                       NLetters[ col ] :=NLetters[ col ] + 1;
                        Letter[  col ] := L
                  end;
                  if A.i[str, j] > 0 then write( S[strN].c[A.i[str,j]] )
                  else write('-')
               end;
               writeln
         end;

         write('    CONSENSUS [', i*cols+1 :4, '.. ]: ');
         for col := 1 to cols do
            if NLetters[col] = 1 then write( AlphaToChar[ Letter[col] ] )
            else if NLetters[col] = 2 then write('+')
            else write('.');
         writeln
   end;
   writeln; ShowNames( A ); writeln('**********')
end {ShowAlignment};

{-----------------------------------------------------------------------------}

procedure Align( A, B :Alignment; var R :Alignment;
                 Mode :SearchMode; Multiplier :real;
                 T :Tree; Mc :Machines; var Freq :Machines; var StrML :real );
{Multiplier multiplies msg len's ie. -log(probs) ie. raise probs to that power,
 NonDet & Multiplier=1 => Gibbs Sampling
          Multiplier>1, increasing => Simulated Annealing}

  type TwoRows = array[0..1, StringIndex] of real;
  var EDir :packed array [StringIndex, StringIndex] of Direction;  {N**2 space}
      EML  :TwoRows;
 {EML[,] = values of Msg Len of partial alignments; EDir for traceback.}

      ET :real; {expected # tuples / char of (arbitrary) "ancestor"}
      leftshift :integer;
      HashTable :array [TupleNum] of
            record occurs :integer;
                   ML :real
            end;
      HTentries :integer; Tn :TupleNum; Tp :Tuple; str :StringNum;
      TotalProb :real;

      Transition :array [MachineNum, Alphabet, Alphabet] of real; {Op probs}
      x, y :Alphabet;  Pcopy, Pchange, Pinsert, Pdelete :real;
      M :MachineNum; Op :Operation; Optotal :real;

      Estimating {new mc params in Freq} :boolean;
      ParamsML : real;

{ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . }

   procedure Explain( Tp :Tuple; Tn :TupleNum );

      var P, Left, Right :array [Hypothetical] of AlphaProb; {global  Up, Down}
          Sleft, Sright :Hypothetical; x :Alphabet;
          Downleft, Downright :AlphaProb;
          CharBelow :array [Hypothetical] of integer; {real chars}
          NRealChar :integer; i :StringNum;
          TupleProb :real;

      function Allowed( A1, A2 :Alphabet; NR :integer ) :boolean;
      { A1<---------------------->A2   can only insert A1->A2 if no real
      NRealChar-NR                NR   chars below A1 & similarly A2->A1 ... }
      begin if not(Null in [A1,A2]) then Allowed := true
            else if [A1,A2]=[Null]  then Allowed := true
            else if (A1=Null) and (NRealChar-NR = 0) then Allowed := true
            else if (A2=Null) and           (NR = 0) then Allowed := true
            else Allowed := false
      end;

      procedure Up( T :Tree ); {post-order}
         var str, Sleft, Sright :Hypothetical;
             Mleft, Mright :MachineNum;
             Tleft, Tright :Tree;
             x, y :Alphabet;
      begin
         str := T^.s; Tleft := T^.left; Tright := T^.right;
         if Tleft = nil then {leaf}
         begin for x := Null to AlphabetSize do
                  P[str, x] := 0;
               P[str, Tp[str]] := 1.0;
               CharBelow[str] := ord( Tp[str] <> Null )
         end
         else {not a leaf}
         begin Up( Tleft ); Up( Tright );
               CharBelow[str] := CharBelow[Tleft^.s] + CharBelow[Tright^.s];
               Sleft := Tleft^.s; Sright := Tright^.s;
               Mleft := Tleft^.m; Mright := Tright^.m;
               for x := Null to AlphabetSize do
                  begin Left[str,x] := 0; Right[str,x] := 0 end;
               for x := Null to AlphabetSize do
                  for y := Null to AlphabetSize do
                  begin if Allowed(x, y, CharBelow[Tleft^.s]) then
                           Left[str,x] := Left[str,x]
                              + P[Sleft,y]*Transition[Mleft,x,y];
                        if Allowed(x, y, CharBelow[Tright^.s]) then
                           Right[str,x] := Right[str,x]
                              + P[Sright,y]*Transition[Mright,x,y]
                  end;
               for x := Null to AlphabetSize do
                  P[str,x] := Left[str,x]*Right[str,x]
         end
      end {Up};
{              |above
               |
               |      NB. real tree is unrooted so directions are "arbitrary"
               v
            T^:*      P[T^.s] contains probs of value of char at this node
              ^ ^
             .   .
            .     .
       left.       .right
}
      procedure Down( T :Tree; above :AlphaProb; TopRight :boolean );{preorder}
         var abv, DL, DR :AlphaProb;
             str :Hypothetical; mc :MachineNum; x, y :Alphabet;
             Mcne :Machine {the one "above" str, T^.m}; Op :Operation;
             delta:real;
      begin
         str := T^.s; mc := T^.m;
         for Op := NoOp to Indel do Mcne[Op] := 0.0;
         for x  := Null to AlphabetSize do abv[x] := 0;
         for x {above} := Null to AlphabetSize do
            for y {me} := Null to AlphabetSize do
            begin if Allowed(x, y, CharBelow[str]) then
                     delta :=  above[x]*Transition[mc, x, y]
                  else delta := 0;
                  abv[y] := abv[y] + delta;
                  Op := A2toOp[ x, y ];
                  Mcne[Op] := Mcne[Op] + P[str, y] * delta
            end;
         for x := Null to AlphabetSize do
            P[str, x] := P[str,x] * abv[x];

         if Estimating then
         begin
            if not TopRight then
            begin
               delta := 0.0;   {normalise}
               for Op := NoOp to Indel do delta := delta + Mcne[Op];
               for Op := NoOp to Indel do
                  Freq[mc, Op] := Freq[mc, Op]
                                + HashTable[Tn].occurs * Mcne[Op] / delta
            end;

            if T^.left <> nil then {not a leaf}
            begin for x := Null to AlphabetSize do
                  begin DL[x] := Right[str,x]*abv[x];
                        DR[x] := Left [str,x]*abv[x]
                  end;
                  Down( T^.left, DL, false ); Down( T^.right, DR, false )
            end
         end {Estimating}
      end {Down};

   begin {Explain}
      NRealChar := 0;
      for i := 1 to K do if Tp[i]<>Null then NRealChar := NRealChar+1;

      Up( T^.left ); Up( T^.right );

      Sleft := T^.left^.s; Sright := T^.right^.s;
      Downleft[Null] := P[Sright, Null];
      Downright[Null]:= P[Sleft,  Null];
{***following could be changed to reflect base (letter) frequencies}
      for x := 1 {sic} to AlphabetSize do
      begin Downleft[x] := P[Sright, x] / AlphabetSize; {assume equi-probable}
            Downright[x]:= P[Sleft,  x] / AlphabetSize
      end;

      Down( T^.left, Downleft, false ); Down( T^.right, Downright, true );

      if HashTable[Tn].occurs < 0 then
      begin HTentries := HTentries + 1;
            HashTable[Tn].occurs := 0 {present};
            TupleProb := sum( P[Sleft] ) / ET{per tuple};
            HashTable[Tn].ML := -log2( TupleProb );
            TotalProb := TotalProb + TupleProb {total only over tuples seen}
      end
   end {Explain};

{ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . }

   function cost( i, j :StringIndex ) :real;        {Msg Len of a single tuple}
      var Tn :TupleNum; Tp :Tuple; str, strN :StringNum;
   begin {cost}
      Tn := A.t[i] * leftshift + B.t[j];        {NB. A.t[0] = B.t[0] = 0}

      if HashTable[Tn].occurs < 0 then {absent}
      begin for str := 1 to K   do Tp[str] := Null;
            for str := 1 to B.K do
            begin strN := B.s[str];
                  Tp[strN] := CharToAlpha[ S[strN].c[B.i[str,j]] ]
            end;
            for str := 1 to A.K do
            begin strN := A.s[str];
                  Tp[strN] := CharToAlpha[ S[strN].c[A.i[str,i]] ]
            end;
            Explain( Tp, Tn )
      end;

      cost := HashTable[Tn].ML
   end {cost};

{ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . }

   procedure DPA(var EML :TwoRows;          {The Dynamic Programming Algorithm}
                 Alo,Astep{+-1},Ahi,
                 Blo,Bstep{+-1},Bhi :StringIndex;
                 rightTerminal :boolean);
      var ML :array[Direction] of real;
          choice :Direction;
          i, j :StringIndex;
          NewRow, OldRow :0..1;
   begin {DPA}
   {Top/Bottom Boundary Conditions:}
      NewRow := (Alo-Astep) mod 2; {0 or 1}
      EML [NewRow,    Blo-Bstep] := 0.0;
 {NB. EDir[Alo-Astep, Blo-Bstep] is NOT assigned or changed!}

      j := Blo;
      while j*Bstep <= Bhi*Bstep do {Boundary conditions}
      begin if (Mode=NonDeterministic) and not rightTerminal and (Bstep<0) then
                 EML[NewRow, j] := Infinity {NB. breaks after A[?]}
            else EML[NewRow, j] := EML[NewRow,j-Bstep] + cost(0,j);
            EDir[Alo-Astep,j] := Horizontal;
            j := j+Bstep
      end;

   {Main Loops:}
      i := Alo;
      while i*Astep <= Ahi*Astep do
      begin OldRow := NewRow; NewRow := i mod 2;
      {Left/Right Boundary Conditions:}
         EML [NewRow, Blo-Bstep] := EML[OldRow, Blo-Bstep] + cost(i,0);
         EDir[i,      Blo-Bstep] := Vertical;

         j := Blo;
         while j*Bstep <= Bhi*Bstep do {along a row}
         begin
   {General Step:}
            if (Mode=NonDeterministic) and (Astep>0) and (i=Ahi) then
                 ML[Horizontal] := Infinity{must end with <A[Ahi],?>}
            else ML[Horizontal] := EML[NewRow,j-Bstep] + cost(0,j);
            ML[Vertical]   := EML[OldRow,j      ] + cost(i,0);
            ML[Diagonal]   := EML[OldRow,j-Bstep] + cost(i,j);

            case Mode of
            Deterministic:
            begin
               if ML[Horizontal] < ML[Vertical] then{min cost, max probability}
                  if ML[Horizontal] < ML[Diagonal] then choice := Horizontal
                  else choice := Diagonal
               else {MLvert<=MLhoriz}
                  if ML[Vertical] < ML[Diagonal] then choice := Vertical
                  else choice := Diagonal;

               EML [NewRow,j] := ML[choice];
               EDir[i,     j] := choice;
            end;

            NonDeterministic:
               EML [NewRow,j] := {-log2(sum of probabilities)}
                  -LogSum(-ML[Diagonal],LogSum(-ML[Vertical],-ML[Horizontal]))
            end {case};

            j := j+Bstep
         end {j};

         i := i+Astep
      end {j} {Main Loop};
   end {DPA};

{ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . }

procedure ChooseAlignment( Alo,Ahi, Blo,Bhi :StringIndex );
   const ten6 = 1000000;
{Reminiscent of Hirschberg's divide and conquer, but has another purpose!}
   var FwdRows, RevRows :TwoRows; {for DPA fwd, DPA rev and Choose}
       str :StringNum;

   procedure Step( i, j :StringIndex );
      var str :StringNum;
   begin {Step}
      R.N := R.N+1; {next column of alignment}
      Assertion(R.N <= Nmax, 'Aln length');
      R.t[R.N] := 0;{init' column's tuple number}
      if i>0 then R.t[R.N] := R.t[R.N]+A.t[i]*leftshift; {A is most sig' posn}
      if j>0 then R.t[R.N] := R.t[R.N]+B.t[j];           {B is least sig'    }
      HashTable[R.t[R.N]].occurs := HashTable[R.t[R.N]].occurs+1;
      StrML := StrML + HashTable[R.t[R.N]].ML {msg length of new alignment R};

      for str := 1 to B.K do R.i[str,     R.N] := B.i[str, j]*ord(j>0);
      for str := 1 to A.K do R.i[str+B.K, R.N] := A.i[str, i]*ord(i>0);
   end {Step};

   procedure CA( Alo,Ahi, Blo,Bhi :StringIndex; rightTerminal :boolean );
      var Amid, Blo2, Bmid, i, j :StringIndex;
          FwdR, RevR :0..1; insB, ML :real;

      function Choose( Blo, Bhi :StringIndex ) :StringIndex;
         var j, Bmid, Blast :StringIndex;
             rand, MinVal, ML, Prob, ProbTotal :real;
      {choose Bmid randomly with prob ~ exp2(-FwdRows[FwdR,Bmid])}
      {POST: Blo <= Bmid <= Bhi; return(Bmid)}
      begin {Choose}
         MinVal := FwdRows[FwdR,Blo];
         for j := Blo+1 to Bhi do     {find minimum value}
         begin ML := FwdRows[FwdR, j];
               if ML < MinVal then MinVal := ML
         end;

         ProbTotal := 0.0;
         for j := Blo to Bhi do   {for normalisation}
         begin ML := FwdRows[FwdR, j]-MinVal { >= 0 };
               if ML <= 20 {bits, say} then
                  begin Prob := exp2(-ML){0<Prob<=1.0}; Blast := j end
               else Prob := 0.0 {to all intents & purposes};
               FwdRows[FwdR, j] := Prob;
               ProbTotal := ProbTotal + Prob
         end; {Typically, ProbTotal between 1.0 and (approx) 2}

         rand := ProbTotal*(1+random(ten6))/ten6; {0.0 < rand <= ProbTotal}
         FwdRows[FwdR, Blast] := FwdRows[FwdR, Blast]+1.0; {guard v. arith}
         Bmid := Blo;
         while rand > 0.0 do                    {make the "random" choice}
            if rand <= FwdRows[FwdR, Bmid] then rand := -1.0 {stop !}
            else begin rand := rand-FwdRows[FwdR, Bmid]; Bmid := Bmid+1 end;
         Assertion((Blo<=Bmid)and(Bmid<=Bhi), 'Choose    ');

         Choose := Bmid { >= Blo & <= Bhi}
      end {Choose};

   begin {CA}
      if Alo > Ahi then  {A null}
            for j := Blo to Bhi do Step(0, j){insert B}

      else {A ~null} if Blo>Bhi then {B null}
            for i := Alo to Ahi do Step(i, 0){delete A}

      else {A ~null and B ~null} if Alo = Ahi then {A length = 1}
      begin insB := 0.0;{eventually cost to insert all of Blo..Bhi}
            for j := Blo to Bhi do
            begin ML := cost(0, j){insert};
                  FwdRows[0, j] := ML; insB := insB+ML
            end;

            if rightTerminal then Blo2:=Blo
            else Blo2:=Bhi{break alignment JUST after A[Alo]};

            for j := Blo2 to Bhi do
               FwdRows[1, j] :=
                  (insB-FwdRows[0,j]+cost(Alo, j){1 match/change})*Multiplier;
            FwdRows[1,Blo2-1] := (insB+cost(Alo,0){del})*Multiplier;
            if rightTerminal then
               FwdRows[1,Blo2-1]:=FwdRows[1,Blo2-1]-log2(Bhi-Blo+2){all ways};
            FwdR := 1{for Choose!};
            Bmid := Choose(Blo2-1, Bhi);
            if Bmid=Blo2-1 then {choose a,b1,b2 | b1,a,b2 | b1,b2,a  say}
            begin if rightTerminal then
                     Bmid:=Blo+random(Bhi-Blo+2){uniform; Blo<=Bmid<=Bhi+1}
                  else Bmid:=Bhi+1 {A[Alo] comes LAST};
                  for j := Blo to Bmid-1 do Step(0, j); {insert 0 or more}
                  Step(Alo, 0);                         {delete}
                  for j := Bmid to Bhi do   Step(0, j)  {insert 0 or more}
            end
            else {Blo2 <= Bmid <= Bhi; eg. -b1,a|b2,-b3 }
            begin for j := Blo to Bmid-1 do Step(0, j); {insert}
                  Step(Alo, Bmid);                      {match/change}
                  for j := Bmid+1 to Bhi do Step(0, j)  {insert}
      end   end

      else {A length >= 2 & B not null}
      begin Amid := (Alo+Ahi)div 2;
            DPA( FwdRows, Alo, 1,Amid,   Blo, 1,Bhi, false);
            DPA( RevRows, Ahi,-1,Amid+1, Bhi,-1,Blo, rightTerminal);

            FwdR := Amid mod 2; RevR := 1-FwdR;
            for i := Blo-1 to Bhi do {MsgLengths of partitions of B}
               FwdRows[FwdR,i]:=(FwdRows[FwdR,i]+RevRows[RevR,i+1])*Multiplier;
            Bmid := Choose(Blo-1, Bhi);
            {NB. Alo < Amid < Ahi and Blo-1 <= Bmid <= Bhi}

            CA( Alo,   Amid,  Blo,   Bmid, false );          {Divide and ...}
            CA( Amid+1,Ahi,   Bmid+1,Bhi,  rightTerminal  )  {Conquer}
      end
   end {CA};

begin {ChooseAlignment}
   StrML := 0.0; {init' alignment msg length}
   R.N := 0;     {clear alignment R}
   R.K := A.K+B.K;
   R.t[0] := 0;
   for str := 1 to B.K do R.s[str    ] := B.s[str];
   for str := 1 to A.K do R.s[str+B.K] := A.s[str];
   for str := 1 to R.K do R.i[str,0]   := 0;

   CA( Alo, Ahi, Blo, Bhi, true ) {does the work}
end {ChooseAlignment};

{ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . }

   procedure ExtractAlignment;

      procedure path( i, j :StringIndex );
         var dir :Direction; str :StringNum;
      begin if i+j > 0 {either > 0} then
            begin dir := EDir[i,j];
                  path( i-ioffset[dir], j-joffset[dir] );           {recursion}
                  R.N := R.N + 1;

                  {A forms most sig' digits of R.t[], B least sig' digits}
                  R.t[R.N] := A.t[i] * leftshift * ioffset[dir]
                            + B.t[j] * joffset[dir];
                  HashTable[R.t[R.N]].occurs := HashTable[R.t[R.N]].occurs + 1;

                  for str := 1 to B.K do {B becomes least sig digits of R}
                     R.i[str    , R.N] := B.i[str,j] * joffset[dir];
                  for str := 1 to A.K do {A most sig digits of R}
                     R.i[str+B.K, R.N] := A.i[str,i] * ioffset[dir];
           end
           else { i = j = 0; arrived at origin }
           begin R.N := 0;
                 R.K := A.K + B.K;
                 R.t[0] := 0;
                 for str := 1 to B.K do R.s[str    ] := B.s[str];
                 for str := 1 to A.K do R.s[str+B.K] := A.s[str];
                 for str := 1 to R.K do R.i[str,0]   := 0;
           end
      end {path};

   begin path(A.N, B.N)
   end {ExtractAlignment};

{. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .}

begin {Align}
   write('Align ');
   if Mode=NonDeterministic then write('Non');
   write('Det Mult=', Multiplier:5:2);
   ShowNumbers(A); write(' with'); ShowNumbers(B); writeln;

   Estimating := false;
   leftshift := round( Expn( AlphabetSize + 1, B.K ) );

   TotalProb := 0; {of seen tuples}
   ET := 1.0; {Expected # visible & invisible Tuples / char of (arb') ancestor}
   for M := 1 to 2*K-3 do
   begin Pcopy   := Mc[M][Copy];
         Pchange := Mc[M][Change];
         Pinsert := Mc[M][Indel] / 2; Pdelete := Pinsert {could change this};
         ET := ET + Pinsert/(1-Pinsert); {expected # tuples / char}
{***Here is where one would alter the rates for transitions/transversions:}
         for x := 1 to AlphabetSize do
            for y :=1 to AlphabetSize do                   {NB. per char rates}
               Transition[M, x, y] := Pchange/(AlphabetSize - 1)/(1-Pinsert);
{***Insert rate could be changed to reflect base (letter) frequencies:}
         for x := 1 to AlphabetSize do
         begin Transition[M, Null, x] := Pinsert/AlphabetSize/(1-Pinsert);
               Transition[M, x, Null] := Pdelete/(1-Pinsert);
               Transition[M, x, x]    := Pcopy/(1-Pinsert)
         end;
         Transition[M, Null, Null] := 1.0 {!!!}
   end;

   HTentries := 0;
   for Tn := 0 to round( Expn( AlphabetSize+1, A.K+B.K ) ) - 1 do
      HashTable[Tn].occurs := - 1 {absent};

{now get alignment:}
   case Mode of
   Deterministic: begin DPA( EML, {A:}1,1{fwd},A.N, {B:}1,1{fwd},B.N, true);
                        StrML := EML[ A.N mod 2, B.N ];
                        ExtractAlignment;
                  end;
   NonDeterministic: ChooseAlignment( {A:}1,A.N,  {B:}1,B.N )
   end{case};

   write('Hash Table     = ',
         100*HTentries/(Expn( AlphabetSize+1, R.K )-1) :7:2, '%, ',
         ' P(seen tuples)=', TotalProb:6:4);
   if (TotalProb<0) or (TotalProb>1.00001) then write(' warning ***');
   writeln;

   ParamsML := 0;
   for M := 1 to 2*K-3 do
      ParamsML := ParamsML +            {this may be a slight overestimate ***}
         paramcost( Mc[M,Copy]*R.N, Mc[M,Change]*R.N, Mc[M,Indel]*R.N );
   write(  'MsgL = ',
      U(K)+log2(NTrees(K))+U(R.N)+ParamsML+errorcost+StrML :7:1, ' bits, ');
   ShowInfixTree( T ); writeln;
   writeln('     =', U(K) :4:1, '(K)',
                '+', log2(NTrees(K)) :4:1, '(tree)',
                '+', U(R.N) :5:1,   '(length)',
                '+', ParamsML :5:1, '(params)',
                '+', errorcost :3:1,'(error)',
                '+', StrML :6:1, '(algn)');

{Estimate new parameters:}
   Estimating := R.K=K;
   if Estimating then
   begin
      for M := 1 to 2*K-3 do for Op := NoOp to Indel do Freq[M, Op] := 1/3;
      for Tn := 1 to round( Expn(AlphabetSize+1, R.K) ) - 1 do
         if HashTable[Tn].occurs > 0 then
         begin for str := 1 to R.K do
                  Tp[ R.s[str] ] := Tn div round(Expn(AlphabetSize+1,str-1))
                                       mod (AlphabetSize+1);
               Explain( Tp, Tn ) {set Freq}
         end;

      writeln; writeln('next estimate of parameters:');
      for M := 1 to 2*K-3 do
      begin Optotal := 0.0;
            for Op := Copy {!} to Indel do Optotal := Optotal + Freq[M, Op];
            write( 'mc', M:1, ': ');
            for Op := Copy {!} to Indel do
            begin Freq[M, Op] := Freq[M, Op] / Optotal;
                  write(Freq[M,Op]:6:4, ' ')
            end;
            writeln('; ', paramcost(Freq[M,Copy]*R.N, Freq[M,Change]*R.N,
                                    Freq[M,Indel]*R.N) :5:1, ' bits')
      end
   end

end {Align};

{-----------------------------------------------------------------------------}

procedure RecAlign( var R :Alignment;
                    Mc :Machines; var Est :Machines;
                    var StrML :real );
{Get an initial alignment, not guaranteed optimal}
{Time is (2*K-3)*Time(Align)  ie.  (2*K-3)*N**2  or O(K*N**2) }

   procedure RA( var R :Alignment; T :Tree );  { Bottom-up alignment }
      var Left, Right :Alignment;
   begin if T^.left = nil then StrToAlign( T^.s, R )
         else
         begin RA( Left,  T^.left  );
               RA( Right, T^.right );
               Align(Right,Left,R, Deterministic,0, {local}T, Mc,Est, StrML )
         end
   end {RA};

begin RA( R, T )
end {RecAlign};

{-----------------------------------------------------------------------------}

procedure Improve( var R :Alignment;
                   Mc :Machines; var Est :Machines;
                   var StrML :real);
{attempt to improve the alignment}
   var A, B :Alignment; ss :StringSet;  {BEWARE:  A, B, R are global to Im}
       StrML1 :real;

   procedure Im( Tlocal :Tree; var ss :StringSet );
      var ss1, ss2 :StringSet;
   begin if Tlocal^.left <> nil then { Tlocal not a leaf }
         begin Im( Tlocal^.left,  ss1 );
               Im( Tlocal^.right, ss2 ); ss := ss1 + ss2;
               Project(R, ss1, A, B);
               Align(A, B, R, Deterministic,0, {whole}T, Mc, Est, StrML);
               Mc := Est;
               if ss <> [1..K] then
                  begin Project(R, ss2, A, B);
                        Align(A,B,R, Deterministic,0, T{whole}, Mc,Est, StrML);
                        Mc := Est
                  end
         end
         else {leaf} ss := [ Tlocal^.s ]
   end {Im};

begin repeat StrML1 := StrML;
             Im(T, ss);
      until StrML >= StrML1 - Convergence;
      ShowAlignment( R )
end   {Improve};

{. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .}

procedure Sample( var R :Alignment;
                  Mc :Machines; var Avg :Machines;
                  Nsamples :integer);
var A, B :Alignment; M :MachineNum; Mc2, StdDev :Machines;
    Op :Operation; sample :integer;
    MinML, MaxML, AvgML, StdDevML, Sum1ML, Sum2ML, Optotal :real;

   function find( m:MachineNum; T:Tree ):Tree;
      var T1:Tree;
   begin if T=nil then find:=nil
         else if T^.m=m then find:=T
         else begin T1:=find(m, T^.left);
                    if T1<>nil then find:=T1 else find:=find(m, T^.right)
   end        end;

   function getstrings(T:Tree):StringSet;
   begin if T=nil then getstrings:=[]
         else getstrings:=getstrings(T^.left)+getstrings(T^.right)+[T^.s]
   end;

begin {Sample}
      for M := 1 to 2*K-3 do
         for Op := NoOp to Indel do
            begin Avg[M,Op]:=0.0; StdDev[M,Op]:=0.0 end;
      AvgML := 0.0; StdDevML := 0.0; MinML := Infinity; MaxML := 0.0;
      Sum1ML := Infinity; Sum2ML := Infinity;

      for sample := 1 to Nsamples do
      begin M := random(2*K-3)+1;
            write('break tree on mc', M:1, '; ');
            Project(R, getstrings(find(M,T)), A, B);  {break Tree at an edge}
            Align(A, B, R,
                  NonDeterministic, {mul:}1.0 {+5*sample/Nsamples{>1=>Sim Ann'},
                  T, Mc, Mc2, StrML);  {re-align}
            if StrML < MinML then MinML := StrML;
            if StrML > MaxML then MaxML := StrML;
            AvgML := AvgML+StrML; StdDevML := StdDevML+sqr(StrML);
            Sum1ML := -LogSum( -Sum1ML, -StrML);
            if sample >= Nsamples div 5 then Sum2ML := -LogSum(-Sum2ML,-StrML);
{ShowAlignment( R );}
            for M := 1 to 2*K-3 do
               for Op := Copy to Indel do
               begin Avg[M, Op] := Avg[M, Op]+Mc2[M, Op];
                     StdDev[M, Op]:= StdDev[M,Op] + sqr(Mc2[M,Op]);
                     Mc [M, Op] := Mc[M, Op] + 0.1 * Mc2[M, Op] {lagged};
                     NormaliseMachines( Mc )
               end
      end;
ShowAlignment( R );

      AvgML    := AvgML/Nsamples;
      StdDevML := StdDevML/Nsamples-sqr(AvgML);
      if StdDevML >= 0 then StdDevML := sqrt(StdDevML) else StdDevML := 0;

      writeln('After ', Nsamples:1, ' samples:',
             ' AvgMsgL (algn)=', AvgML:7:1, '(+-', StdDevML:6:1, ')',
             MinML:7:1, ' ..', MaxML:7:1);
      writeln('LogSum 100% sample (algn)=', Sum1ML :7:1,
              ',  LogSum 80% sample =', Sum2ML :7:1);
      NormaliseMachines( Avg );
      for M := 1 to 2*K-3 do
      begin write('mc', M:1, ': ');
            for Op := Copy to Indel do
            begin StdDev[M, Op] := StdDev[M, Op]/Nsamples-sqr(Avg[M,Op]);
                  if StdDev[M,Op] >= 0 then StdDev[M,Op] := sqrt(StdDev[M,Op])
                  else StdDev[M,Op] := 0.0;
                  write( Avg[M, Op]:6:4, '(+-', StdDev[M,Op]:5:3, ') ')
            end;
            writeln
      end;
      write('Avg: ');
      for Op := Copy to Indel do
      begin Optotal:=0.0;
            for M := 1 to 2*K-3 do Optotal:=Optotal+Avg[M,Op];
            write(Optotal/(2*K-3):6:4, ' ':10)
      end; writeln
end {Sample};

{-----------------------------------------------------------------------------}

begin {main}
    writeln('(c) L.Allison, Dept. Comp. Sci, Monash University, Australia:');
    writeln('Minimum Message Length Evolutionary-Tree and Multiple-Alignment');
    writeln('Version: 1-state model, 25 May 1993');
    Assertion(Nmax1  =  Nmax+1,  'Nmax1     ');
    Assertion(HypMax = 2*Kmax-2, 'HypMax    ');
    Assertion(McMax  = 2*Kmax-3, 'McMax     ');
    Assertion(TupleMax = Expn(AlphabetSize+1, Kmax)-1, 'TupleMax  ');

    seed.lo := 13; seed.hi := 107;{for random number generator}
    LogSumArr[0] := -99; {flag it needs init'}
{Initialise constants:}
    for ch := chr(0) to chr(127) do CharToAlpha[ch] := 0;
    CharToAlpha['a'] := 1; CharToAlpha['A'] := 1;
    CharToAlpha['c'] := 2; CharToAlpha['C'] := 2;
    CharToAlpha['g'] := 3; CharToAlpha['G'] := 3;
    CharToAlpha['t'] := 4; CharToAlpha['T'] := 4;
    CharToAlpha['u'] := 4; CharToAlpha['U'] := 4; {for RNA}

    AlphaToChar[0] := '-'; {standard output representation}
    AlphaToChar[1] := 'A';
    AlphaToChar[2] := 'C';
    AlphaToChar[3] := 'G';
    AlphaToChar[4] := 'T';

    for x := 1 to AlphabetSize do
       for y := 1 to AlphabetSize do
          A2toOp[x,y] := Change;

    for x := 1 to AlphabetSize do
    begin A2toOp[Null, x] := Indel;
          A2toOp[x, Null] := Indel;
          A2toOp[x, x]    := Copy
    end;

    A2toOp[Null, Null] := NoOp;

    ioffset[Diagonal]   := 1; joffset[Diagonal]   := 1;
    ioffset[Horizontal] := 0; joffset[Horizontal] := 1;
    ioffset[Vertical]   := 1; joffset[Vertical]   := 0;

{Default Machines:}
    for M := 1 to 2*Kmax-3 do
    begin Mc[M][Copy]   := 0.9;
          Mc[M][Change] := 0.05;
          Mc[M][Indel]  := 1.0 - Mc[M][Copy] - Mc[M][Change]
    end;

{Get Strings:}
    writeln('Type # of strings >= 2 and <= ', Kmax:1); readln(K);
    Assertion( (K>=2) and (K<=Kmax), 'come on!  ');
    writeln('Type ', K:1, ' strings');
    for str := 1 to K do
       ReadString( S[str] );

{Do Business:}
    writeln('Type Tree, eg. ((1 2)(3 4))');
    while not eof do
    begin T := ParseTree; Name(T); ShowTree(T);

          RecAlign( A, Mc, Est, StrML );
          Improve(  A, {initial} Est, {better} Est, StrML );
          NullTheory;
          Sample( A, Est, Est, 1000{samples});
          writeln('********** end of tree **********');
          writeln('Type next tree? or <eof>')
    end;

99: writeln('********** finish **********')

{Lloyd ALLISON, Dept.Comp.Sci. (now FIT), Monash University, Australia 3800}

end.
