734 THE BELL SYSTEM TECHNICAL JOURNAL, JULY 1952 



Ordinary PCM is statistically matched only to a random message source
with flat distribution.



If the messages from a source are characterized by frequent long runs
of symbols of the same type (e.g., long runs of zeros), an obvious saving
is possible by sending the value of the symbol only once, together with
a code group which gives the length of the run. This is commonly known
as run length coding. The remaining sections of the message (between
runs) may then either be sent directly (i.e., merely remapped by a
non-statistical process) or they may be encoded by some other statistical
process, if this seems warranted. In the latter case we have a mixed
coding procedure. The codes representing run lengths must either be set
apart from the remainder of the signal by "punctuating" codes, or be
identifiable by some distinguishing characteristic.
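
As an illustration only, run length coding as described above might be sketched in Python as follows. The function names, the token layout, and the `min_run` threshold are assumptions made for this sketch and do not come from the paper. The leading tag on each token stands in for the "punctuating" code that separates run-length groups from directly sent symbols.

```python
from itertools import groupby


def run_length_encode(symbols, min_run=3):
    """Encode a sequence as ('run', symbol, length) or ('lit', symbol) tokens.

    min_run is a hypothetical threshold: shorter runs are sent directly.
    """
    tokens = []
    for symbol, group in groupby(symbols):
        length = sum(1 for _ in group)
        if length >= min_run:
            # Send the symbol value once, together with the run length.
            tokens.append(("run", symbol, length))
        else:
            # Between runs, symbols are sent directly, one token each.
            tokens.extend(("lit", symbol) for _ in range(length))
    return tokens


def run_length_decode(tokens):
    """Rebuild the original sequence from the tokens."""
    out = []
    for token in tokens:
        if token[0] == "run":
            _, symbol, length = token
            out.extend([symbol] * length)
        else:
            out.append(token[1])
    return out


message = [0, 0, 0, 0, 0, 0, 7, 3, 3, 0, 0, 0, 0]
encoded = run_length_encode(message)
assert run_length_decode(encoded) == message
```

Here the thirteen-symbol message reduces to five tokens. Whether that is a saving in practice depends on how many digits the tags and run lengths themselves cost.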



Run length coding may be generalized to take care of other common
sequences besides runs of a single symbol. Any commonly occurring
sequence of symbols may be considered a "run" and treated in the same
fashion. More complicated code groups will be required to specify the
type of run, if a large variety is accommodated this way. Ultimately,
the distinction between this type of coding and Shannon-Fano coding
becomes rather nebulous, especially if a fixed maximum length of run
is permitted, for then all possible messages of this length may be
considered "runs" and simply encoded by the Shannon-Fano code.
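
The generalization to arbitrary common sequences can be sketched the same way. The following is a minimal illustration, assuming a fixed table of common sequences known to both sender and receiver and a greedy longest-match rule. Neither the table nor the matching rule is specified in the paper, and all names are hypothetical. The table index plays the part of the code group that specifies the type of run.

```python
def encode_with_sequence_table(symbols, table):
    """Greedy longest-match coding against a table of common sequences.

    Emits ('seq', index) for a table hit and ('lit', symbol) otherwise.
    """
    if any(len(entry) == 0 for entry in table):
        raise ValueError("table entries must be non-empty")
    # Try longer entries first so that the longest match wins.
    order = sorted(range(len(table)), key=lambda i: -len(table[i]))
    tokens = []
    pos = 0
    while pos < len(symbols):
        for i in order:
            entry = table[i]
            if tuple(symbols[pos:pos + len(entry)]) == tuple(entry):
                tokens.append(("seq", i))
                pos += len(entry)
                break
        else:
            # No table entry starts here: send the symbol directly.
            tokens.append(("lit", symbols[pos]))
            pos += 1
    return tokens


def decode_with_sequence_table(tokens, table):
    """Expand table indices back into their sequences."""
    out = []
    for kind, value in tokens:
        if kind == "seq":
            out.extend(table[value])
        else:
            out.append(value)
    return out


table = ["the", "th", "ing"]  # assumed common sequences
tokens = encode_with_sequence_table("the thing", table)
assert "".join(decode_with_sequence_table(tokens, table)) == "the thing"
```

If the table were enlarged to hold every possible sequence of some fixed length, every token would be a table index, which is the limiting case in which the scheme merges with block coding.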



No optimal general solution of the coding problem is known. That is,
one cannot say in all cases exactly what coding procedure one should
use with a given message source to produce the most efficient encoding
for a given complexity of apparatus. Several procedures have been
devised which seem suitable for certain types of messages and these are
discussed in the following sections.



n-GRAMMING



The application of the Shannon-Fano code to a block of k symbols of
a message in an ℓ-letter alphabet requires that ℓ^k different codes be
used. The receiver must be able to recognize each of these and to
regenerate the proper message block when a particular code is received.
If ℓ is on the order of 10 to 100, as is typically the case, we very
quickly run out of room to house the receiver and money to build it
with. On the other hand, if k is small, say on the order of 1 to 3,
considerable statistical information between blocks is ignored. These
considerations led to the development of a class of encoders known as
n-grammers. The name stems from the fact that they operate on the
n-gram statistics of the



