
Markov Models
A Markov model is a probabilistic process over a finite set,
{S_{1}, ..., S_{k}},
usually called its states.
Each statetransition generates a character from the alphabet
of the process.
We are interested in matters such as
the probability of a given state coming up next,
pr(x_{t}=S_{i}),
and this may depend on the prior history to t1.
In computing, such processes,
if they are reasonably complex and interesting, are usually called
Probabilistic Finite State Automata (PFSA) or
Probabilistic Finite State Machines (PFSM)
because of their close links to deterministic and nondeterministic
finite state automata as used in formal language theory.
Examples are (hidden) Markov Models of
biased coins and dice, formal languages, the weather, etc.;
Markov models and Hidden Markov Models (HMM) are used in Bioinformatics
to model DNA and protein sequences.
Order 0 Markov Models
An order 0 Markov model has no "memory":
pr(x_{t} = S_{i}) =
pr(x_{t'} = S_{i}),
for all points t and t' in a sequence.
An order 0 Markov model is equivalent to
a multinomial probability distribution.
The issue of the accuracy
with which the model's parameters should be stated, and
hence the model's complexity, was investigated by
Wallace and Boulton (1968, appendix).
Order 1 Markov Models
An order 1 (firstorder) Markov model has a memory of size 1.
It is defined by a table of probabilities
pr(x_{t}=S_{i}  x_{t1}=S_{j}),
for i = 1..k & j = 1..k.
You can think of this as k order 0 Markov models,
one for each S_{j}.
Order m Markov Models
The order of a Markov model of fixed order, is the
length of the history or context
upon which the probabilities of
the possible values of the next state depend.
For example, the next state of
an order 2 (secondorder) Markov Model depends
upon the two previous states.
Example
Biased Coins
A large biased coin, `A',
is described by the order 0 Markov model
pr(H)=0.6, pr(T)=0.4.
A small biased coin, `B', pr(h)=0.3, pr(t)=0.7.
A third biased coin, 'C', pr(H)=0.8, pr(T)=0.2.
3 Coins
Consider the following process:
 Let X equal one of the coins `A' or `B' (above).
 repeat
 Throw the coin X and write down the result.
 Toss coin `C'.
 if `C' shows T then
change X to the other one of {`A',`B'},
otherwise leave X unchanged.
 until enough
 We get the following firstorder Markov model over
{H,T,h,t}:

x_{t1} 
H  T 
h  t 
pr(x_{t}=H) 
0.6×0.8  0.4×0.8  0.3×0.2  0.7×0.2 
pr(x_{t}=T) 
0.6×0.8  0.4×0.8  0.3×0.2  0.7×0.2 
pr(x_{t}=h) 
0.6×0.2  0.4×0.2  0.3×0.8  0.7×0.8 
pr(x_{t}=t) 
0.6×0.2  0.4×0.2  0.3×0.8  0.7×0.8 
Below: The process can be drawn as a diagram of states (nodes)
and transitions (edges), which makes it clear
that it is equivalent to a finite state machine with
probabilities associated with each transition:
The probabilities on the arcs out of a given state must
sum to 1.
e.g. output
HTHHTHHttthtttHHTHHHHtthtthttht...
Hidden Markov Models
A Hidden Markov Model (HMM) is simply a Markov Model
in which the states are hidden.
For example, suppose we only
had the sequence of throws from the 3coin example above,
and that the uppercase v. lowercase information had been lost.
HTHHTHHTTTHTTTHHTHHHHTTHTTHTTHT...
We can never be absolutely sure which coin was used at a given
point in the sequence but we can calculate the probability.
Variations
The are a number of variations on HMM problems, e.g.
 The number of states and transition probabilities are known
 Given data,
find the optimum (most likely) position of the change points;
this is sometimes called the `segmentation problem' or
the `cutpoint problem'.
 How precisely should the points be stated?
 In some problems the runlengths of the use of a component model
(i.e. one of the coins `A' or `B' above) is generalised
and modelled by some appropriate probability distribution.
 The number of states is known, but the transition probabilities are not
 Estimate the transition probabilities.
 What accuracy is appropriate for the estimates?
 The number of states, and the architecture, are unknown.
 Find a HMM / automaton which models the data "well".
 The simplest model has one state,
the most complex model has one state per data value;
almost certainly neither extreme is justified.
Quantifying model complexity is therefore a crucial issue.
 NB. The cutpoint positions are nuisance parameters for this problem.
Automata (Variable Order)
A probabilistic finite state automaton consists of states
(drawn as nodes) and transitions (arcs, edges).
Each transition causes a character to be output.
The automaton moves from
state to state outputing characters from its alphabet.
Each state can be thought of as an order 0 Markov model,
i.e. a multinomial probability distribution,
over its outgoing transitions.
On the other hand, two or more states may have outgoing transitions
labelled with the same set of characters
but having different probability distributions.
The paths into such states can be thought of as
contexts of varying lengths which govern the probability distribution
of the character to be output next.
Example
Above: The best probabilistic finite state automaton,
in terms of model complexity and fit to the data,
for the data
`CAAAB/ BBAAB/ CAAB/ BBAB/ CAB/ BBB/ CB '.
The character `/' is a delimiter,
known a priori to cause a return to the start state.
The problem was posed by Gaines (1976), the solution is from
[TR32] (1983).
The examples start with
C or BB ,
then there are a few A 's, and
finally B/ .
The solution shows this.
The probabilities of the transitions can be estimated
from the observed frequencies.
Inference
If the state transitions of the automaton are known,
all that remains is to infer the probabilities of
the transitions out of each state, which can be done
using the observed transition frequencies (counting).
Sometimes the architecture of the model and
the sequence of output characters are known,
but not the sequence of state transitions.
In general, neither the architecture of the automaton
nor the transitions are known  they are hidden.
One must search for an automaton that explains the data well.
It is necessary to weigh the tradeoff between
the complexity of the model and its fit to the data.
Applications
MML and (hidden) finite state models have be used to:
 Infer the grammar of simple languages:
 C. S. Wallace & M. P. Georgeff.
A General Objective for Inductive Inference.
[TR32 (HTML)],
March 1983,
Department of Computer Science, Monash University, Australia.
Also:
M. P. Georgeff & C. S. Wallace.
A General Selection Criterion for Inductive Inference.
European Conference on Artificial Intelligence (ECAI, ECAI84),
Pisa, pp.473482, September 1984.
 Infer the class structure in a timeseries or other sequence
of multivariate data:
 Model the relationship between pairs of DNA sequences:
 Align pairs of sequences where each sequence is nonrandom,
e.g. modelled by a finitestate model, or by any "left to right" model:
 D. R. Powell, L. Allison, T. I. Dix, D. L. Dowe.
Alignment of low information sequences.
Australian Comp. Sci. Theory Symp.
(CATS98),
Perth, pp.215230, Springer Verlag
isbn:9813083921, Feb. 1998
 L.Allison.
InformationTheoretic Sequence Alignment.
[98/14 (HTML)]
CSSE Monash University, 1998
 L. Allison, D. Powell & T. I. Dix.
Compression and Approximate Matching,
Computer Journal, 42(1), pp.110, 1999
 D. R. Powell, L. Allison, T. I. Dix.
ModellingAlignment for NonRandom Sequences,
AI2004, Springer Verlag, LNCS/LNAI 3339, pp.203214, Dec. 2004
 Model the relationship between several sequences given
an evolutionary (phylogenetic) tree:
 Discover weak and approximate patterns in DNA sequences:
 L. Allison, T. Edgoose, T. I. Dix.
Compression of Strings with Approximate Repeats.
Intelligent Systems in Molecular Biology ISMB98
pp.816, Montreal, 28 June  1 July 1998
 L. Allison, L. Stern, T. Edgoose & T. I. Dix.
Sequence Complexity for Biological Sequence Analysis.
Computers and Chemistry 24(1), pp.4355, Jan. 2000
 L. Stern, L. Allison, R. L. Coppel, T. I. Dix.
Discovering patterns in Plasmodium falciparum genomic DNA.
Molecular and Biochemical Parasitology, 118(2) pp.175186, 2001
 See
MML,
Bioinformatics

