130 NUCLEOTIDE SEQUENCE IN DEOXYRIBONUCLEIC ACIDS 



on p. 326 of a previous survey^ and applied by us to preparations 

 from calf thymus^^ and Escherichia coli protoplasts^^. 



c. The problem of statistical sequence analysis 



The possibility that no two nucleic acid molecules within the 

 same nucleus are entirely identical offers "a prospect that would 

 seem to condemn us to forced statistics for life"^^. It is this more

 than anything else that has made work on the nucleotide sequence 

 in nucleic acids appear so unattractive. Who would, after all, 

 undertake to read a book that has been passed through a grinder? 

 Nevertheless, being rather modest in what I expected to gain 

 from a perusal of the nucleic acid text, I have never been able to 

 share these apprehensions completely. I knew that a great deal 

 can be learned about an unknown language through a study of 

 its phonemes, their frequency, distribution density, and

 allophonic relationships. 



If the total deoxyribonucleic acid of a given species represents 

 a text, it is made up of "words" — the individual molecules — that 

 are composed of a singularly meagre alphabet: four or five letters. 

 But the words so spelled out are 10,000-letter words, each of 

 which could occur in a fantastically great number of positional 

 isomers: between 10^^^^ and 10^^^^, according to how many 

 restrictions on neighbors are admitted^. The situation facing us 

 in examining a nucleic acid preparation comprising a large

 number of isomers or homologues would, then, be comparable to one 

 in which all the words in a dictionary are lined up end to end in 

 a continuous, and essentially arrhythmic and aperiodic, sequence. 
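
To convey the order of magnitude involved, the following minimal sketch counts the linear sequences of a given length over a small alphabet. It is an added illustration, not a reconstruction of the figures quoted above. It assumes one hypothetical neighbor restriction, namely that no letter may follow itself, and it counts all sequences rather than the isomers of one fixed composition, so the results are only rough upper bounds. The function name and its parameters are hypothetical.

```python
import math


def log10_sequence_count(length, alphabet_size, forbid_identical_neighbors=False):
    """Return log10 of the number of linear sequences of `length` letters.

    Unrestricted: alphabet_size ** length sequences.
    Assumed restriction (no letter may follow itself):
    alphabet_size * (alphabet_size - 1) ** (length - 1) sequences.
    """
    if length < 1 or alphabet_size < 2:
        raise ValueError("need length >= 1 and alphabet_size >= 2")
    if not forbid_identical_neighbors:
        return length * math.log10(alphabet_size)
    # First letter is free; every later letter has one choice fewer.
    return math.log10(alphabet_size) + (length - 1) * math.log10(alphabet_size - 1)


if __name__ == "__main__":
    for letters in (4, 5):
        free = log10_sequence_count(10_000, letters)
        restricted = log10_sequence_count(10_000, letters, forbid_identical_neighbors=True)
        print(f"{letters} letters: 10^{free:.0f} unrestricted, 10^{restricted:.0f} restricted")
```

For a 10,000-letter word spelled with four letters this gives on the order of 10^6020 unrestricted sequences, and on the order of 10^4771 under the assumed restriction; which restrictions actually apply is left open here.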



It is quite clear that the first attempt at unraveling such a 

 clutter will have to be based on statistics and that it must limit 

 itself to the description of tendencies or trends of arrangement. 

 To give an example: running together the thirteen words making 

 up the first sentence of King Lear I obtain a monster word of 

 fifty-seven letters of which twenty-one are vowels. On this word 

 a number of determinations can be made: (a) the ratio of

 consonants to vowels; (b) the nature of the individual consonants 

 and vowels; (c) the relative frequency of each constituent. If I 



