Compression 
The information gained in learning that an event E, of probability pr(E), has occurred is -log_{2}(pr(E)) bits. For example, if the four DNA bases {A,C,G,T} each occurred 1/4 of the time, an optimal code would be a fixed-length 2-bit code such as A=00, C=01, G=10, T=11.
Note that -log_{2}(0.25)=2:
each base would be worth 2 bits of information.
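A minimal Python sketch of the equal-probability case: it evaluates -log_{2}(pr(E)) for each base and encodes a short sequence with one possible 2-bit code. The codeword assignment, the sequence GATTACA and the helper names information_bits and encode are illustrative choices, not taken from the text.

```python
import math

# Equal probabilities for the four DNA bases.
probs = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# One possible optimal code: every base gets a 2-bit codeword.
code = {"A": "00", "C": "01", "G": "10", "T": "11"}


def information_bits(p: float) -> float:
    """Information content, in bits, of an event with probability p."""
    return -math.log2(p)


def encode(seq: str, code: dict[str, str]) -> str:
    """Concatenate the codewords of the symbols in seq."""
    return "".join(code[s] for s in seq)


for base, p in probs.items():
    print(base, information_bits(p), code[base])

bits = encode("GATTACA", code)
print(bits, len(bits))  # 7 bases x 2 bits = 14 bits
```

Because every base carries exactly 2 bits of information, no code can do better on average than the fixed 2 bits per base used here.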
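However, if the probabilities of the bases were unequal, a variable-length code that gives shorter codewords to the more probable bases would be optimal; a base of probability 1/8, for example, would get a 3-bit codeword, since -log_{2}(1/8)=3, etc. In this case the average code length can be less than the 2 bits per base needed before.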
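As an illustration, assume (hypothetically) the probabilities A 1/2, C 1/4, G 1/8, T 1/8. Every probability is then a power of 1/2, so a prefix code whose codeword lengths equal -log_{2}(p) exists, and its average length equals the average information content. The distribution, the codewords and the function names below are assumptions made for this sketch.

```python
import math

# Hypothetical skewed distribution (an assumption for this sketch).
probs = {"A": 1/2, "C": 1/4, "G": 1/8, "T": 1/8}

# A prefix code with codeword lengths -log2(p): 1, 2, 3 and 3 bits.
code = {"A": "0", "C": "10", "G": "110", "T": "111"}


def average_code_length(probs, code):
    """Expected codeword length in bits per symbol."""
    return sum(p * len(code[s]) for s, p in probs.items())


def entropy(probs):
    """Average information content, sum of p * -log2(p), in bits per symbol."""
    return sum(-p * math.log2(p) for p in probs.values() if p > 0)


def decode(bits, code):
    """Decode a bit string; this works because no codeword is a prefix of another."""
    inverse = {w: s for s, w in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


print(average_code_length(probs, code))  # 1.75, versus 2 bits for the fixed code
print(entropy(probs))                    # 1.75
encoded = "".join(code[s] for s in "GATTACA")
print(encoded, decode(encoded, code))
```

The prefix property is what lets a decoder split the concatenated bit string back into symbols without separators.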
In general the probability of the next symbol, S[i], of a sequence, S,
may depend on previous symbols, and then we
deal with conditional probabilities, pr(S[i] | S[1..i-1]).

Information content can be used to discover
patterns, repeats, gene duplications and the like in sequences.
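One simple way to obtain such conditional probabilities is an adaptive order-k Markov model that estimates pr(S[i] | previous k symbols) from counts seen so far, with add-one smoothing. This is a hypothetical choice for the sketch (much better DNA models exist); the point it shows is that a sequence containing repeats costs far fewer bits than 2 bits per base. The names information_content and ALPHABET are illustrative.

```python
import math
import random
from collections import defaultdict

ALPHABET = "ACGT"


def information_content(seq: str, k: int = 2) -> float:
    """Total bits to encode seq symbol by symbol under an adaptive order-k
    Markov model. pr(S[i] | context) is estimated from how often each base
    has followed the same context earlier in seq, with +1 smoothing so that
    unseen bases keep a non-zero probability. The first k symbols use the
    shorter contexts that are available."""
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0))
    total = 0.0
    for i, sym in enumerate(seq):
        context = seq[max(0, i - k):i]
        c = counts[context]
        p = (c[sym] + 1) / (sum(c.values()) + len(ALPHABET))
        total += -math.log2(p)
        c[sym] += 1  # update the model after "transmitting" sym
    return total


rng = random.Random(1)
random_seq = "".join(rng.choice(ALPHABET) for _ in range(100))
repeat_seq = "ACGT" * 25

# The repetitive sequence should need far fewer than 2 bits per base.
print(information_content(repeat_seq), information_content(random_seq))
```

Because the model adapts as it goes, the encoder and a decoder can keep identical counts, so no model parameters need to be transmitted separately.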
It can also give a distance between DNA sequences or protein sequences,
for classification or for inference of phylogenetic (evolutionary) trees,
without aligning the sequences.
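The text does not specify a distance measure. A widely used alignment-free stand-in is the normalized compression distance (NCD), which substitutes a general-purpose compressor (here Python's zlib) for a purpose-built sequence model: if y compresses well after x, the two share information. The sketch below is that swapped-in technique, not necessarily the method meant here; the helper names and the synthetic test sequences are assumptions.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of data after zlib compression at level 9."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance: near 0 for related, near 1 for unrelated."""
    cx = compressed_size(x.encode())
    cy = compressed_size(y.encode())
    cxy = compressed_size((x + y).encode())
    return (cxy - min(cx, cy)) / max(cx, cy)


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Replace each base, with probability rate, by a different random base."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([x for x in "ACGT" if x != base]))
        else:
            out.append(base)
    return "".join(out)


rng = random.Random(0)
a = "".join(rng.choice("ACGT") for _ in range(2000))
b = mutate(a, 0.02, rng)                               # close relative of a
c = "".join(rng.choice("ACGT") for _ in range(2000))   # unrelated to a

print(ncd(a, b), ncd(a, c))  # the related pair should score lower
```

zlib is not tuned for DNA, so on short sequences its header and window overheads blur the distance; it serves only to show the idea of measuring similarity by shared information rather than by alignment.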
"Costing" the symbols in an alignment according to their information content also gives an alignment algorithm.
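A minimal sketch, assuming a simple hypothetical model in which each alignment operation (match, change, insert, delete) has a fixed probability and all bases are equally likely. Each operation is costed at -log_{2} of its probability plus the bits for the bases it carries, and dynamic programming finds the single alignment of least total information, that is, the most probable one under this model. The probabilities in P and the names cost and align are illustrative assumptions; summing over all alignments instead of taking the best one would be a different design choice.

```python
import math

# Hypothetical probabilities of the alignment operations (they sum to 1).
P = {"match": 0.8, "change": 0.1, "insert": 0.05, "delete": 0.05}
BASE_BITS = 2.0  # -log2(1/4): every base assumed equally likely


def cost(op: str) -> float:
    """Bits to state the operation plus the base(s) it emits."""
    bits = -math.log2(P[op]) + BASE_BITS
    if op == "change":
        bits += math.log2(3)  # which of the 3 other bases appears in b
    return bits


def align(a: str, b: str):
    """Return (total bits, aligned a, aligned b) for a least-cost alignment."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # match or change: consume one base of each
                op = "match" if a[i - 1] == b[j - 1] else "change"
                c = d[i - 1][j - 1] + cost(op)
                if c < d[i][j]:
                    d[i][j], back[i][j] = c, (i - 1, j - 1)
            if i > 0:  # delete: base only in a
                c = d[i - 1][j] + cost("delete")
                if c < d[i][j]:
                    d[i][j], back[i][j] = c, (i - 1, j)
            if j > 0:  # insert: base only in b
                c = d[i][j - 1] + cost("insert")
                if c < d[i][j]:
                    d[i][j], back[i][j] = c, (i, j - 1)
    # Trace back from (n, m) to (0, 0), writing "-" for gaps.
    top, bottom = [], []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        top.append(a[pi] if pi < i else "-")
        bottom.append(b[pj] if pj < j else "-")
        i, j = pi, pj
    return d[n][m], "".join(reversed(top)), "".join(reversed(bottom))


total, top, bottom = align("ACGTACGT", "ACGTTACGT")
print(round(total, 2))
print(top)
print(bottom)
```

Since the costs are information contents, the total is directly comparable with the cost of encoding the two sequences separately, which gives a natural test of whether they are related at all.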

