The Message Paradigm


Tracy T. Transmitter and Richard R. Receiver get together and select a set of hypotheses, {H0, H1, ... }, to describe data, and design a code book to transmit two-part messages, where each message consists of (i) an hypothesis and (ii) a data-set given the hypothesis. This allows T and R to write encoder and decoder programs P and P-1. Naturally T and R want to use short code words in a message but, at this stage, any data are purely hypothetical and so they must design the code book based on expected data.

Then T and R move apart and the following happens . . .

T gets an actual data-set, D.
T chooses an H from the set.
T runs the encoder, P, on some UTM_T and transmits the two-part message H;D to R,
where |msgLen| = |part1| + |part2|, part1 = code(H), part2 = code(D|H).
R runs the decoder, P-1, on some UTM_R and receives H;D.
R now knows the data-set, D, and also T's opinion, H, of D.
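The round trip above can be sketched numerically. The priors pr(H) and likelihoods pr(D|H) below are made-up values, purely for illustration; T's sensible strategy is to transmit the H that minimises the total two-part message length:

```python
import math

# Toy code book (hypothetical numbers). Shannon: |code(X)| = -log2(pr(X)) bits.
hypotheses = {          # pr(H), agreed in the code book
    "H0": 0.5,
    "H1": 0.3,
    "H2": 0.2,
}
likelihood = {          # pr(D|H) for the actual data-set D (assumed values)
    "H0": 0.01,
    "H1": 0.20,
    "H2": 0.05,
}

def msg_len(h):
    # |part1| + |part2| = -log2 pr(H) - log2 pr(D|H)
    return -math.log2(hypotheses[h]) - math.log2(likelihood[h])

for h in hypotheses:
    print(h, round(msg_len(h), 3), "bits")

best = min(hypotheses, key=msg_len)
print("T transmits:", best)
```

Here H0 is a priori cheapest (part 1 is only one bit) but fits D poorly, so H1 wins on total message length.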

UTM : A universal Turing machine.
Shannon, |code(X)| = -log(pr(X)), and
Bayes, |code(H&D)| = |code(H)| + |code(D|H)| = |code(D)| + |code(H|D)|,
give  - log(pr(H|D)) = |code(H)| + |code(D|H)| - |code(D)|,  i.e., |code(H)| + |code(D|H)| up to a constant, |code(D)|, that does not depend on H.
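The Shannon–Bayes identity above can be checked numerically; the probabilities below (pr(H), pr(D|H), and the marginal pr(D)) are assumed values chosen so that pr(D) is consistent:

```python
import math

# |code(H&D)| = |code(H)| + |code(D|H)| = |code(D)| + |code(H|D)|
pr_H, pr_D_given_H = 0.3, 0.2
pr_H_and_D = pr_H * pr_D_given_H          # 0.06
pr_D = 0.15                               # assumed marginal pr(D)
pr_H_given_D = pr_H_and_D / pr_D          # Bayes: 0.4

lhs = -math.log2(pr_H) - math.log2(pr_D_given_H)   # |code(H)| + |code(D|H)|
rhs = -math.log2(pr_D) - math.log2(pr_H_given_D)   # |code(D)| + |code(H|D)|
print(lhs, rhs)   # both equal |code(H&D)| = -log2(0.06)
```

So -log2 pr(H|D) differs from |code(H)| + |code(D|H)| only by the constant |code(D)|, which is why minimising message length maximises the posterior.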
The selection of {H0, H1, ... }, and the issue of what data each Hi best covers, must be considered together in the design of the code book.
Being very sensible, T will select an H that is a good model of D. A less sensible individual might choose poorly, yet R could still recover D; the message would just be longer:
- log(pr(Hi|D) / pr(Hj|D)) = |code(Hi)|+|code(D|Hi)| - (|code(Hj)|+|code(D|Hj)|),   -- negative log posterior-odds ratio.
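To give the odds ratio a concrete feel, here are hypothetical code lengths, in bits, for a good choice Hi and a poorer choice Hj:

```python
# Negative log posterior-odds ratio as a difference of two-part
# message lengths (made-up code lengths, in bits):
len_Hi, len_D_given_Hi = 1.0, 2.5   # Hi models D well: short part 2
len_Hj, len_D_given_Hj = 3.0, 6.0   # Hj models D poorly: longer message

diff = (len_Hi + len_D_given_Hi) - (len_Hj + len_D_given_Hj)
# diff < 0 favours Hi; the posterior odds are pr(Hi|D)/pr(Hj|D) = 2**(-diff)
print("odds for Hi over Hj:", 2 ** -diff)
```

Every extra bit of message length for Hj doubles the posterior odds in favour of Hi.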
Note, depending on the application area, a data-set could be a single thing, e.g., a genome.

↑ © L. Allison,   (or as otherwise indicated).
