Completely unrelated but handy: here is some info grabbed from a page found in an AltaVista search, preserved here in case the page goes away.

Example. To illustrate these concepts, the classic example is the "coding" (encryption) method
of replacing each letter with another one (or with another arbitrary symbol).
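
As a rough illustration of the letter-replacement scheme just described, here is a minimal Python sketch. The names `ALPHABET`, `make_key` and `substitute`, the seed, and the sample message are all made up for this example; they do not come from the original page.

```python
import random
import string

# Symbols covered by this sketch: the space plus the 26 uppercase letters.
ALPHABET = " " + string.ascii_uppercase


def make_key(seed):
    """Return a random one-to-one mapping of the alphabet onto itself."""
    shuffled = list(ALPHABET)
    random.Random(seed).shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))


def substitute(text, key):
    """Replace each symbol using key; symbols not in the key pass through."""
    return "".join(key.get(ch, ch) for ch in text)


key = make_key(seed=1)                     # the seed value is arbitrary
inverse = {v: k for k, v in key.items()}   # decoding uses the inverse mapping

message = "ATTACK AT DAWN"                 # hypothetical sample message
coded = substitute(message, key)
decoded = substitute(coded, inverse)
print(coded)
print(decoded)
```

Because the mapping is one-to-one, every occurrence of a given letter becomes the same coded symbol, so the coded text keeps the frequency pattern of the original.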

This was essentially the style of cryptology known up to World War II.

In practice, this kind of code is easily broken by taking the probability distribution
of the symbols occurring in the coded messages and comparing it with the known probability
distribution of letters in the original language.
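
A minimal sketch of that comparison, in Python: rank the coded symbols by how often they occur and pair them with English symbols ranked the same way. The function name `guess_key` and the sample string are invented for this example, and the frequency order is the one given in the table in this note. Rank matching like this only yields a first guess; it needs a long message to work well and usually has to be corrected by hand.

```python
from collections import Counter

# English symbols from most to least frequent (the first one is the space),
# following the order of the frequency table in this note.
ENGLISH_BY_FREQUENCY = " ETAOINSRHLDUCFMWYPGBVKXQJZ"


def guess_key(coded_text):
    """Pair the i-th most common coded symbol with the i-th most common
    English symbol and return the resulting decoding guess."""
    ranked = [symbol for symbol, _ in Counter(coded_text).most_common()]
    return dict(zip(ranked, ENGLISH_BY_FREQUENCY))


coded = "XQXYXZXYZY"     # hypothetical coded text: X x4, Y x3, Z x2, Q x1
guess = guess_key(coded)
print(guess)
```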

The next table shows the letters of the alphabet, along with their approximate probabilities
of occurrence in the English language.

(the letters are listed in decreasing order of frequency, reading down each column)

Letter  Probability    Letter  Probability    Letter  Probability
Space   0.1859         H       0.0467         P       0.0152
E       0.1031         L       0.0321         G       0.0152
T       0.0796         D       0.0317         B       0.0127
A       0.0642         U       0.0228         V       0.0083
O       0.0632         C       0.0218         K       0.0049
I       0.0575         F       0.0208         X       0.0013
N       0.0574         M       0.0198         Q       0.0008
S       0.0514         W       0.0175         J       0.0008
R       0.0484         Y       0.0164         Z       0.0005

The binary entropy of the English alphabet is approximately:

4.07991 bits

This means that when we pick one letter at random from a text, the uncertainty about
the result is 4.07991 bits.
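
The figure can be checked with a few lines of Python. This is only a sketch: the names `PROBABILITIES` and `entropy_bits` are made up here, and N is taken as 0.0574, the value for which the probabilities add up to 1 and the result lands close to the 4.07991 bits quoted above.

```python
import math

# Symbol probabilities for English text, space included.
# Assumption: N = 0.0574, so that the probabilities sum to 1.
PROBABILITIES = {
    " ": 0.1859, "E": 0.1031, "T": 0.0796, "A": 0.0642, "O": 0.0632,
    "I": 0.0575, "N": 0.0574, "S": 0.0514, "R": 0.0484, "H": 0.0467,
    "L": 0.0321, "D": 0.0317, "U": 0.0228, "C": 0.0218, "F": 0.0208,
    "M": 0.0198, "W": 0.0175, "Y": 0.0164, "P": 0.0152, "G": 0.0152,
    "B": 0.0127, "V": 0.0083, "K": 0.0049, "X": 0.0013, "Q": 0.0008,
    "J": 0.0008, "Z": 0.0005,
}


def entropy_bits(probs):
    """Entropy in bits: minus the sum of p * log2(p) over all symbols."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


total = sum(PROBABILITIES.values())
h = entropy_bits(PROBABILITIES)
print(f"sum of probabilities = {total:.4f}")
print(f"entropy = {h:.5f} bits")
```

For comparison, 27 equally likely symbols would give log2(27), about 4.75 bits, so the uneven letter frequencies are what pull the uncertainty down.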

But if the outcome turns out to be the letter 'M', then:

x = 'M'

and so,

p(x) =  0.0198

the information quantity of 'M' is:

-log2(p(x))  = -log2( 0.0198 ) = 5.6584 bits
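
The same arithmetic as a small Python sketch (the helper name `information_bits` is invented for this example). The value for 'E' is added only for comparison, to show that a more common letter carries less information.

```python
import math


def information_bits(p):
    """Information quantity, in bits, of an outcome with probability p."""
    return -math.log2(p)


info_m = information_bits(0.0198)   # 'M', from the table
info_e = information_bits(0.1031)   # 'E', the most common letter
print(f"M: {info_m:.4f} bits")
print(f"E: {info_e:.4f} bits")
```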

The probability of obtaining this information quantity is the same as the
probability of occurrence of 'M', that is, p(x).

For that reason, entropy can be seen as the expected value of the
information obtained by sampling the source.

It is, after all, a weighted sum of information quantities,
where the weights are their respective probabilities.
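
To see the "expected value" reading in action, the following Python sketch draws symbols at random with the table's probabilities and averages their information quantities; the average should settle near the weighted sum. The names, the seed, and the sample size are arbitrary choices for this example, and N is again assumed to be 0.0574 so that the probabilities sum to 1.

```python
import math
import random

# Symbols from most to least frequent (the first one is the space),
# with their probabilities in the same order. Assumption: N = 0.0574.
SYMBOLS = " ETAOINSRHLDUCFMWYPGBVKXQJZ"
PROBS = [
    0.1859, 0.1031, 0.0796, 0.0642, 0.0632, 0.0575, 0.0574, 0.0514, 0.0484,
    0.0467, 0.0321, 0.0317, 0.0228, 0.0218, 0.0208, 0.0198, 0.0175, 0.0164,
    0.0152, 0.0152, 0.0127, 0.0083, 0.0049, 0.0013, 0.0008, 0.0008, 0.0005,
]

# Information quantity of each symbol, in bits.
INFO = {s: -math.log2(p) for s, p in zip(SYMBOLS, PROBS)}

# Entropy as a weighted sum: each information quantity times its probability.
weighted_sum = sum(p * INFO[s] for s, p in zip(SYMBOLS, PROBS))

# Sample the source and average the information actually obtained.
rng = random.Random(0)                      # fixed seed, arbitrary value
samples = rng.choices(SYMBOLS, weights=PROBS, k=200_000)
sample_mean = sum(INFO[s] for s in samples) / len(samples)

print(f"weighted sum = {weighted_sum:.4f} bits")
print(f"sample mean  = {sample_mean:.4f} bits")
```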