Information Processing Theory 43 



a word never before used in this sample will be produced. Thus, we wish to attenuate slightly the probabilities attached to the association process.
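The birth rule described here can be sketched as a simulation. With some small probability a word never before used is produced; otherwise an earlier word is repeated, with probability proportional to how often it already occurs, attenuated by the new-word probability. This is a minimal sketch under stated assumptions: it models a *growing* sample with no death process, and the function names and the parameter `beta` (the new-word probability) are hypothetical, chosen for illustration.

```python
import random
from collections import Counter


def simulate_births(n_tokens: int, beta: float, seed: int = 0) -> list[int]:
    """Grow a sample token by token under the birth rule alone (no deaths; an assumption).

    With probability `beta` a brand-new word is produced; otherwise an earlier
    token is copied, so each existing word is chosen with probability
    proportional to its current frequency.
    """
    rng = random.Random(seed)
    tokens = [0]  # the sample starts with a single word, id 0
    next_id = 1
    for _ in range(n_tokens - 1):
        if rng.random() < beta:
            tokens.append(next_id)  # birth of a word never before used
            next_id += 1
        else:
            # A uniform choice over tokens is a frequency-weighted choice over words.
            tokens.append(tokens[rng.randrange(len(tokens))])
    return tokens


def frequency_of_frequencies(tokens: list[int]) -> Counter:
    """Map i -> number of distinct words that occur exactly i times."""
    return Counter(Counter(tokens).values())


if __name__ == "__main__":
    sample = simulate_births(200_000, beta=0.1, seed=1)
    fof = frequency_of_frequencies(sample)
    # Share of distinct words seen exactly once, and ratio of twice-seen to once-seen words.
    print(fof[1] / sum(fof.values()), fof[2] / fof[1])
```

Copying a uniformly chosen earlier token is a simple way to get frequency-proportional selection without maintaining explicit weights.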



The death process, though intuitively less satisfactory, would be accurate if all occurrences of a given word were closely grouped, as if associated only with the topic of the moment. In that case, if a page is removed from the sample, it is likely that all occurrences of every word on it will be removed.



Since we are discussing a sample of constant size, we may write an equation stating that total births are balanced by total deaths. Furthermore, we are concerned with the so-called steady state of this stochastic mechanism: the situation that persists when the sample is large enough that the statistical distribution remains invariant. Thus, the number of words leaving the category of words that occur exactly $i$ times (relative frequency $f^*(i)$) must be just balanced by the number of words entering that category, which is the number of births among words that occur $i-1$ times (relative frequency $f^*(i-1)$). These requirements enable us to write the following equation:



$$\text{births in } (i-1) \;-\; \text{births in } (i) \;-\; \text{deaths in } (i) \;=\; 0,$$

that is,

$$(1-\beta)(i-1)\,f^*(i-1) \;-\; (1-\beta)\,i\,f^*(i) \;-\; f^*(i) \;=\; 0,$$



which may be rewritten as 



$$f^*(i) = \frac{(1-\beta)(i-1)}{1+(1-\beta)\,i}\,f^*(i-1),$$



thereby defining the desired quantity recursively, starting from $f^*(1)$. This function has the required properties.
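The recursion $f^*(i) = \frac{(1-\beta)(i-1)}{1+(1-\beta)i} f^*(i-1)$ can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical and the starting value $f^*(1)$ is an assumed input. It computes the steady-state values, checks that they satisfy the balance equation, and estimates the slope on log-log axes. Since each ratio $f^*(i)/f^*(i-1)$ equals $(i-1)/(i+\rho)$ with $\rho = 1/(1-\beta)$, the product behaves like $i^{-(1+\rho)}$ for large $i$, so the local slope should approach $-(1+\rho)$.

```python
import math


def steady_state(beta: float, n_max: int, f1: float = 1.0) -> list[float]:
    """Return [f*(1), ..., f*(n_max)] from
    f*(i) = (1 - beta)(i - 1) / (1 + (1 - beta) i) * f*(i - 1).

    List index k holds f*(k + 1); f1 is an assumed value for f*(1).
    """
    f = [f1]
    for i in range(2, n_max + 1):
        f.append((1 - beta) * (i - 1) / (1 + (1 - beta) * i) * f[-1])
    return f


def balance_residual(f: list[float], beta: float, i: int) -> float:
    """births in (i-1) - births in (i) - deaths in (i); zero at the steady state."""
    f_i, f_prev = f[i - 1], f[i - 2]
    return (1 - beta) * (i - 1) * f_prev - (1 - beta) * i * f_i - f_i


if __name__ == "__main__":
    beta = 0.1
    f = steady_state(beta, 20_000)
    print(max(abs(balance_residual(f, beta, i)) for i in range(2, 200)))
    # Local log-log slope between i = 10,000 and i = 20,000.
    slope = (math.log(f[19_999]) - math.log(f[9_999])) / math.log(2)
    rho = 1 / (1 - beta)
    print(slope, -(1 + rho))
```

Working with ratios rather than a closed form keeps the sketch faithful to the recursive definition in the text.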



In the example of the stochastic theory, then, the assumptions are probabilistic decision rules and the deductions are made analytically.



The third description of language production is due to Mandelbrot (11) and exemplifies the application of information-theoretic concepts. Superficially, this approach is similar to that of Zipf, for Mandelbrot derives the equation for the standard curve by minimizing the cost of coding the speaker's ideas into words, subject to the constraint of a fixed amount of information transmitted per word. However, Mandelbrot is quite specific as to what he means by both information and cost.
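The optimization described above can be sketched as a constrained minimization with Lagrange multipliers. This is a minimal sketch, not Mandelbrot's own formulation. It assumes that words are indexed by rank $r$ with probabilities $p_r$, that the information per word is the Shannon entropy $H$, and that each word carries a coding cost $c_r$. The logarithmic cost in the last line is an illustrative assumption.

```latex
\begin{aligned}
&\text{minimize } C = \sum_r p_r c_r
  \quad\text{subject to}\quad \sum_r p_r = 1, \qquad -\sum_r p_r \log p_r = H, \\[4pt]
&\mathcal{L} = \sum_r p_r c_r
  + \lambda\Big(\sum_r p_r \log p_r + H\Big)
  + \mu\Big(\sum_r p_r - 1\Big), \\[4pt]
&\frac{\partial \mathcal{L}}{\partial p_r} = c_r + \lambda(\log p_r + 1) + \mu = 0
  \;\Longrightarrow\;
  p_r = \frac{e^{-c_r/\lambda}}{\sum_s e^{-c_s/\lambda}}, \\[4pt]
&\text{assuming } c_r = \log(r + m) \text{ for a constant } m \ge 0:
  \quad p_r \propto (r + m)^{-B}, \qquad B = 1/\lambda .
\end{aligned}
```

Under these assumptions the cost-minimizing probabilities fall off as a power of rank, the general shape the standard curve describes.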



