How many bits of data does it take to encode an [3v9 $~$n$~$-message]? Naively, the answer is $~$\lceil \log_2(n) \rceil$~$ ([n_message_bit_length why?]): For example, it takes 5 bits to encode a 21-message, because 4 bits are only enough to encode 16 different messages, while 5 bits are enough to encode 32. The use of the ceiling function implies an inefficiency: 2 bits are required to encode a 3-message, even though 2 bits are enough to distinguish between four different possibilities, so one of those possibilities is wasted. That inefficiency can be reduced by encoding multiple $~$n$~$-messages at the same time. For example, while an individual 3-message requires 2 bits to encode, a series of 10 3-messages requires at most 16 bits to encode, because $~$3^{10} = 59049 < 65536 = 2^{16}.$~$
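To see the ceiling inefficiency shrink as more 3-messages are grouped together, here is a minimal Python sketch (the helper name `bits_for_block` is ours, not from the text):

```python
from math import ceil, log2

def bits_for_block(n, k):
    """Bits needed to encode k n-messages as a single n**k-message."""
    return ceil(log2(n ** k))

# Encoding 3-messages one at a time costs 2 bits each.
# Encoding them in larger blocks drives the cost per message toward log2(3) ~= 1.585.
for k in (1, 10, 100):
    total = bits_for_block(3, k)
    print(f"{k:>3} three-messages: {total} bits total, {total / k:.3f} bits per message")
```

With blocks of 10 the cost drops from 2 bits per 3-message to 1.6, and with blocks of 100 to 1.59, approaching $~$\log_2(3) \approx 1.585$~$ from above.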
Why is it that encoding ten 3-messages together (using bits) is cheaper than encoding ten 3-messages separately? Naively, there are three different factors that allow the combined encoding to be shorter than the sum of the separate encodings: The messages could have different likelihoods ([expected_compression allowing the combined message to be compressed in expectation]); the messages could be dependent on each other ([compressing_dependent_messages meaning they can be compressed]); and the mismatch between bits and 3-messages gets washed out as we put more 3-messages together (see How many bits to a trit?).
In fact, the first two factors are equivalent: 10 3-messages are equivalent to one $~$3^{10}$~$-message, and in general, [nkmessages $~$k$~$ $~$n$~$-messages are equivalent to one $~$n^k$~$-message]. If the individual $~$n$~$-messages are dependent on each other, then the different $~$n^k$~$-messages have different likelihoods. For example, if message 3 never follows message 2, then any combined message in which a 2 is immediately followed by a 3 has probability zero, and the remaining combined messages are correspondingly more likely.
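As an illustration of that equivalence, here is a small Python sketch (the function names `pack` and `unpack` are ours) that treats a list of ten 3-messages as the digits of a single base-3 number, i.e. as one $~$3^{10}$~$-message:

```python
def pack(messages, n):
    """Treat a list of n-messages (each in range(n)) as one base-n numeral,
    i.e. a single message out of n**len(messages) possibilities."""
    combined = 0
    for m in messages:
        combined = combined * n + m
    return combined

def unpack(combined, n, k):
    """Recover the k individual n-messages from the combined n**k-message."""
    messages = []
    for _ in range(k):
        combined, m = divmod(combined, n)
        messages.append(m)
    return list(reversed(messages))

seq = [0, 2, 1, 1, 0, 2, 2, 0, 1, 2]     # ten 3-messages
combined = pack(seq, 3)                  # one number in range(3**10)
assert 0 <= combined < 3 ** 10
assert unpack(combined, 3, 10) == seq
print(combined, f"fits in {combined.bit_length()} bits (at most 16)")
```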
Thus, there are two different ways that an encoding of $~$k$~$ $~$n$~$-messages can be shorter than $~$k$~$ times the encoding of a single $~$n$~$-message: The various combined messages can have different likelihoods, and the efficiency of the encoding can increase as messages are grouped together. To study the effect of different likelihoods on the encoding length in isolation, we can [assume_maximum_efficiency assume that the encodings are maximally efficient] and see how much additional [compression] the different likelihoods get us. To study code efficiency in isolation, we can [assume_equal_likelihood_messages assume each message is equally likely] and see how much additional compression we get as we put more $~$n$~$-messages together. In practice, real compression involves using both techniques at once.
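For a quantitative feel of the first technique, here is a hedged Python sketch (the helper name `count_constrained` is ours) that assumes a maximally efficient fixed-length code and asks how many combined messages remain possible under the dependency described above, and hence how many bits are needed:

```python
from math import ceil, log2

def count_constrained(n, k, forbidden_pairs):
    """Count length-k sequences of n-messages in which no (prev, next)
    pair from forbidden_pairs ever occurs."""
    # counts[c] = number of valid sequences so far that end in message c
    counts = [1] * n
    for _ in range(k - 1):
        counts = [
            sum(counts[prev] for prev in range(n) if (prev, nxt) not in forbidden_pairs)
            for nxt in range(n)
        ]
    return sum(counts)

# Example from the text: message 3 never follows message 2
# (messages are labelled 0, 1, 2 here, so the forbidden transition is 1 -> 2).
valid = count_constrained(3, 10, {(1, 2)})
print(valid, "possible combined messages")   # 17711, versus 3**10 = 59049 unconstrained
print(ceil(log2(valid)), "bits suffice")     # 15, versus 16 for the unconstrained case
```

Under this particular dependency, only 17711 of the 59049 combined messages can actually occur, so a maximally efficient fixed-length code needs 15 bits rather than 16.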
Comments
Eric Rogstad
Did you mean this to link to How many bits to a trit??
Eric Rogstad
Is what follows the colon meant to be justification for what precedes it? I'm not following.