374 



POPULAR SCIENCE MONTHLY 



when he is so very, very far from ever having made one. But serious he seems to have been, and perhaps never more so than when he declares that the 'average word length alone . . . would, in general, be indicative of the nature of the curve.' This is equivalent to saying that the form of a curve is known when its mean ordinate is known, and is a statement which, to those who are accustomed to the graphic representation of variables, will betray an almost immeasurable unfamiliarity with problems of this kind. Among other evidences of this state of mind which might be cited, the construction of a 'typical word-curve of extreme light dialogue', from a count of 5,000 words from Swift's 'Polite Conversation', is not the least convincing. To produce this, Swift's curve is 'corrected' by the suppression of certain words of seven or eight letters, for no assigned or imaginable reason, except that perhaps Dr. Moritz thinks that Swift ought to have known better than to have used them. The curve of this expurgated edition of 5,000 words from Swift is interesting in form, but if it be the 'typical word-curve of extreme light dialogue' in the English language, as declared by Dr. Moritz, those who have dabbled, even a very little, in word-counting of modern comedy and humorous story writers will be saddened by the thought that the art of composing 'extreme light dialogue' must have long ago become extinct.
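The objection that an average does not fix the form of a curve can be illustrated with a short sketch. It is not taken from either paper: the function names `word_length_curve` and `mean_word_length`, the scaling to occurrences per 1,000 words, and the two made-up samples are all assumptions chosen for illustration. The samples are built to share an average word length of three letters while their curves have no word length in common.

```python
import re
from collections import Counter


def word_length_curve(text, per=1000):
    """Return {word length: occurrences per `per` words} for a text sample.

    Assumption: a "word" is any unbroken run of ASCII letters, so
    apostrophes and hyphens split words. A real count would need a
    more careful rule.
    """
    words = re.findall(r"[A-Za-z]+", text)
    counts = Counter(len(w) for w in words)
    total = len(words)
    return {n: per * c / total for n, c in sorted(counts.items())}


def mean_word_length(curve, per=1000):
    """Average word length implied by a curve built with the same `per`."""
    return sum(n * f for n, f in curve.items()) / per


# Two invented samples, not drawn from any author.
sample_a = "the cat and the dog ran"   # six words of three letters each
sample_b = "a house a horse"           # words of one and five letters

curve_a = word_length_curve(sample_a)
curve_b = word_length_curve(sample_b)

print(curve_a, mean_word_length(curve_a))
print(curve_b, mean_word_length(curve_b))
```

Both samples average three letters per word, yet the first curve is a single spike at three letters and the second has equal spikes at one and five, so the average alone cannot indicate which curve produced it.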



It seems impossible to avoid the conclusion that Dr. Moritz, perhaps as a result of a somewhat hasty examination of the subject, has failed to grasp in its entirety the fundamental principle on which the whole doctrine (if so dignified a term may be used) of 'characteristic curves of composition' is based, and a brief exposition of its most important propositions may not be out of place.



The notion that every author, however voluminous, must necessarily be restricted in his use of words to a vocabulary which would remain sensibly constant after his productive period had been reached, and which, in its character and extent, would be one of the personal 'qualities' of that author and thus offer a means of identification, is due, as is stated in the paper of twenty years ago, to Augustus De Morgan, who suggested that vocabularies might differ so much among different authors as to make it possible to differentiate them by means of the simple average number of letters in a word. In making some tests of this proposition it immediately became evident, as might have been anticipated, that vocabularies might differ indefinitely and enormously and at the same time agree in average word-length. The scheme for the graphic display of variations in the average frequency of occurrence of words of different lengths, as explained in the papers under discussion, was then devised and proved to be a vastly more powerful means of revealing peculiarities in composition. As to the value of this method of treatment, which is the one original feature of the whole, there seems to be no question, as even my critic has paid me the highest compliment in his power in making continued and apparently confiding use of it. The point at issue is, rather: Was De Morgan right in assuming that the personal element enters into the vocabulary of any author to such an extent as to furnish a means of identifying his writing? He evidently believed that it played so large a part as to determine the average length of words used; the theory of 'characteristic curves' implies that personality may determine the way in which words are used rather than their average length, and it furnishes a method for revealing peculiarities, such as persist in the long run (this is the kernel of the thing) in the relative frequencies of words of different numbers of letters, syllables, etc., of sentences of different lengths or of any 'qualities' that may be treated numerically. Because of simplicity, ease



