


The next two theorems show that $H$ and $H'$ can be determined by limiting operations directly from the statistics of the message sequences, without reference to the states and transition probabilities between states.

Theorem 5: Let $p(B_i)$ be the probability of a sequence $B_i$ of symbols from the source. Let

$$G_N = -\frac{1}{N} \sum_i p(B_i) \log p(B_i)$$

where the sum is over all sequences $B_i$ containing $N$ symbols. Then $G_N$ is a monotonic decreasing function of $N$ and



$$\lim_{N \to \infty} G_N = H.$$
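As a numerical illustration (not from the paper; the two-state Markov source and all function names below are assumptions for the sketch), $G_N$ can be estimated directly from the block frequencies of a long sample sequence:

```python
# A minimal sketch: estimate G_N = -(1/N) * sum_i p(B_i) log p(B_i)
# from the empirical frequencies of N-symbol blocks in a sample.
import random
from collections import Counter
from math import log2

def G(seq, N):
    """Per-symbol entropy of N-blocks, with p(B_i) taken from frequencies."""
    counts = Counter(tuple(seq[k:k + N]) for k in range(len(seq) - N + 1))
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values()) / N

def markov_sample(n, p01=0.1, p10=0.5):
    """Two-state Markov source (illustrative parameters, not Shannon's)."""
    s, out = 0, []
    for _ in range(n):
        out.append(s)
        if s == 0:
            s = 1 if random.random() < p01 else 0
        else:
            s = 0 if random.random() < p10 else 1
    return out

random.seed(0)
sample = markov_sample(200_000)
for N in (1, 2, 3, 4):
    print(N, round(G(sample, N), 4))
# G_N decreases monotonically toward H (about 0.5575 bits/symbol here).
```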



Theorem 6: Let $p(B_i, S_j)$ be the probability of sequence $B_i$ followed by symbol $S_j$ and $p_{B_i}(S_j) = p(B_i, S_j)/p(B_i)$ be the conditional probability of $S_j$ after $B_i$. Let

$$F_N = -\sum_{i,j} p(B_i, S_j) \log p_{B_i}(S_j)$$

where the sum is over all blocks $B_i$ of $N - 1$ symbols and over all symbols $S_j$. Then $F_N$ is a monotonic decreasing function of $N$,



$$F_N = N G_N - (N - 1) G_{N-1},$$

$$F_N \leq G_N,$$

and

$$\lim_{N \to \infty} F_N = H.$$
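Continuing the sketch above (same assumed source, reusing `G`, `sample`, `Counter`, and `log2` from the previous block), $F_N$ can be computed directly from the joint block frequencies, and the identity $F_N = N G_N - (N - 1) G_{N-1}$ checked numerically:

```python
# Sketch continued: F_N = -sum_{i,j} p(B_i, S_j) log p_{B_i}(S_j),
# estimated from N-block counts and their (N-1)-symbol prefix counts.
def F(seq, N):
    joint = Counter(tuple(seq[k:k + N]) for k in range(len(seq) - N + 1))
    prefix = Counter()
    for block, c in joint.items():
        prefix[block[:-1]] += c          # marginal count of the (N-1)-prefix
    total = sum(joint.values())
    return -sum((c / total) * log2(c / prefix[b[:-1]]) for b, c in joint.items())

for N in (2, 3, 4):
    lhs = F(sample, N)
    rhs = N * G(sample, N) - (N - 1) * G(sample, N - 1)
    print(N, round(lhs, 4), round(rhs, 4), lhs <= G(sample, N) + 1e-12)
# lhs and rhs agree up to edge effects of the finite sample, F_N <= G_N
# holds, and F_N settles near H already at N = 2 for this first-order source.
```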



These results are derived in Appendix III. They show that a series of approximations to $H$ can be obtained by considering only the statistical structure of the sequences extending over $1, 2, \dots, N$ symbols. $F_N$ is the better approximation. In fact $F_N$ is the entropy of the $N^{\text{th}}$ order approximation to the source of the type discussed above. If there are no statistical influences extending over more than $N$ symbols, that is if the conditional probability of the next symbol knowing the preceding $(N - 1)$ is not changed by a knowledge of any before that, then $F_N = H$. $F_N$ of course is the conditional entropy of the next symbol when the $(N - 1)$ preceding ones are known, while $G_N$ is the entropy per symbol of blocks of $N$ symbols.
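For concreteness, take the illustrative two-state source used in the sketches above (an assumption for illustration, not an example from the paper), with transition probabilities $p_0(1) = 0.1$ and $p_1(0) = 0.5$, hence stationary probabilities $P_0 = 5/6$, $P_1 = 1/6$. Its entropy is

$$H = \sum_i P_i \sum_j p_i(j) \log_2 \frac{1}{p_i(j)} = \frac{5}{6}\left(0.9 \log_2 \frac{1}{0.9} + 0.1 \log_2 \frac{1}{0.1}\right) + \frac{1}{6} \log_2 2 \approx 0.5575 \text{ bits per symbol},$$

and since its statistical influence extends only one symbol back, $F_N = H$ for every $N \geq 2$, while $G_N$ approaches $H$ only in the limit.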



The ratio of the entropy of a source to the maximum value it could have while still restricted to the same symbols will be called its relative entropy. This is the maximum compression possible when we encode into the same alphabet. One minus the relative entropy is the redundancy. The redun-



