


dancy of ordinary English, not considering statistical structure over greater 

 distances than about eight letters, is roughly 50%. This means that when

 we write English, half of what we write is determined by the structure of the

 language and half is chosen freely. The figure 50% was found by several 

 independent methods which all gave results in this neighborhood. One is 

 by calculation of the entropy of the approximations to English. A second

 method is to delete a certain fraction of the letters from a sample of English 

 text and then let someone attempt to restore them. If they can be re- 

 stored when 50% are deleted the redundancy must be greater than 50%. 

 A third method depends on certain known results in cryptography. 
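
As a rough illustration of the first method, the sketch below estimates a per-letter entropy from the N-gram frequencies of a sample and converts it to a redundancy, taken here as one minus the ratio of the entropy to its maximum value for the same alphabet. The function names, the 27-symbol alphabet (26 letters and the space), and the simple block-entropy estimator are assumptions made for this illustration. A short sample gives only a crude estimate and should not be expected to reproduce the 50% figure.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # assumed 27-symbol alphabet


def normalize(text):
    """Lowercase, map every non-letter to a space, and collapse runs of spaces."""
    chars = [c if c in ALPHABET else " " for c in text.lower()]
    return " ".join("".join(chars).split())


def block_entropy(text, n):
    """Entropy in bits of the empirical distribution of overlapping n-grams."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return -sum(k / total * math.log2(k / total) for k in grams.values())


def conditional_entropy(text, n):
    """Estimated bits per letter given the preceding n - 1 letters (n >= 1).

    Computed as the difference of two block entropies; on small samples
    this estimate is biased and can even come out slightly negative.
    """
    if n == 1:
        return block_entropy(text, 1)
    return block_entropy(text, n) - block_entropy(text, n - 1)


def redundancy(entropy_per_letter, alphabet_size=len(ALPHABET)):
    """One minus the relative entropy; the maximum is log2(alphabet_size)."""
    return 1.0 - entropy_per_letter / math.log2(alphabet_size)
```

With a large sample held in a string `sample`, `redundancy(conditional_entropy(normalize(sample), n))` for increasing `n` gives a sequence of estimates corresponding to successive approximations to the language; the sample must be long compared with the number of possible N-grams for the estimate to mean anything.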



Two extremes of redundancy in English prose are represented by Basic

 English and by James Joyce's book "Finnegans Wake." The Basic English

 vocabulary is limited to 850 words and the redundancy is very high. This 

 is reflected in the expansion that occurs when a passage is translated into 

 Basic English. Joyce, on the other hand, enlarges the vocabulary and is

 alleged to achieve a compression of semantic content. 



The redundancy of a language is related to the existence of crossword 

 puzzles. If the redundancy is zero any sequence of letters is a reasonable 

 text in the language and any two-dimensional array of letters forms a

 crossword puzzle. If the redundancy is too high the language imposes too

 many constraints for large crossword puzzles to be possible. A more de- 

 tailed analysis shows that if we assume the constraints imposed by the 

 language are of a rather chaotic and random nature, large crossword puzzles 

 are just possible when the redundancy is 50%. If the redundancy is 33%, 

 three-dimensional crossword puzzles should be possible, etc.
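
One way to see where figures such as 50% and 33% could come from is a simple counting heuristic. It is an assumption of this sketch and not necessarily the detailed analysis referred to above. Take an array of N letters from an alphabet of A symbols, and suppose that requiring the lines in one direction to read as text keeps a random fraction A^(-RN) of all arrays, where R is the redundancy. If the d directions act independently, the expected number of valid arrays is A^(N(1 - dR)), which grows with N only when R is less than 1/d. The function names and the 27-symbol alphabet are hypothetical.

```python
import math


def log2_expected_arrays(n_letters, redundancy, dims, alphabet_size=27):
    """log2 of the expected number of valid arrays under the heuristic.

    Assumes each of the `dims` directions independently keeps a fraction
    alphabet_size ** (-redundancy * n_letters) of all letter arrays.
    """
    return n_letters * (1.0 - dims * redundancy) * math.log2(alphabet_size)


def max_redundancy(dims):
    """Largest redundancy at which large puzzles remain possible: 1 / dims."""
    return 1.0 / dims


if __name__ == "__main__":
    for d in (2, 3, 4):
        print(d, max_redundancy(d))
```

Under these assumptions the threshold is 1/2 for two dimensions and 1/3 for three, in agreement with the figures quoted in the text.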



8. Representation of the Encoding and Decoding Operations 



We have yet to represent mathematically the operations performed by 

 the transmitter and receiver in encoding and decoding the information. 

 Either of these will be called a discrete transducer. The input to the 

 transducer is a sequence of input symbols and its output a sequence of out- 

 put symbols. The transducer may have an internal memory so that its 

 output depends not only on the present input symbol but also on the past 

 history. We assume that the internal memory is finite, i.e. there exists 

 a finite number m of possible states of the transducer and that its output is 

 a function of the present state and the present input symbol. The next

 state will be a second function of these two quantities. Thus a transducer 

 can be described by two functions: 



$y_n = f(x_n, \alpha_n)$

$\alpha_{n+1} = g(x_n, \alpha_n)$
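
A transducer of this kind can be sketched in a few lines of code: one function gives the output from the present input and state, and a second gives the next state from the same two quantities. The class name `DiscreteTransducer`, the method `run`, and the differential encoder used as an example are assumptions chosen for illustration and do not appear in the text.

```python
class DiscreteTransducer:
    """Finite-state transducer described by an output function f and a
    next-state function g, both taking (input symbol, present state)."""

    def __init__(self, f, g, initial_state):
        self.f = f
        self.g = g
        self.initial_state = initial_state

    def run(self, inputs):
        """Yield one output symbol per input symbol."""
        state = self.initial_state
        for x in inputs:
            y = self.f(x, state)      # output from present input and state
            state = self.g(x, state)  # next state from the same two quantities
            yield y


def differential_encoder():
    """Hypothetical two-state example on binary symbols: the state is the
    previous input, and the output is the present input XOR the state."""
    return DiscreteTransducer(
        f=lambda x, a: x ^ a,
        g=lambda x, a: x,
        initial_state=0,
    )


if __name__ == "__main__":
    print(list(differential_encoder().run([1, 1, 0, 1, 0, 0])))
    # prints [1, 0, 1, 1, 1, 0]
```

The example has m = 2 states, since the only thing it remembers is the previous binary input. Note that both functions are evaluated on the state held before the update, so the output at each step depends on the present input and the past history only.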



