Thus, the basic advantage offered by coding chemical-biological information and placing the coded information on a medium from which it can be recognized and sorted mechanically, such as IBM punched cards or electronic tape or wire, is that many indexes can be composited in a single file, economizing on information storage space. The Appendix will show that, despite using mechanical equipment and punching all information (i.e., all the indexing criteria) in a single file, the CBCC established several separate punched card files for separate indexes (test organism, host, anatomical part, etc.). While this might seem to contradict the claims just made for conservation of storage space and the advantage of machines, it actually reflects only a complication arising from the size of the information collection and the limitations of the machines used. The Appendix should be consulted for an explanation; under the general section describing the CBCC files, see "Biology IBM Punched Card Files".
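As a minimal sketch of the compositing idea, assuming an in-memory list stands in for a card deck and that each record carries several index fields, a single file can be sorted or selected on any one of its fields. The field names and sample values below are hypothetical illustrations, not actual CBCC codes.

```python
from dataclasses import dataclass


# Hypothetical one-card-per-test record; field names are illustrative only.
@dataclass(frozen=True)
class Card:
    chemical_serial: str   # stands in for the 8-unit CBCC Chemical Serial Number
    test_organism: str
    host: str
    anatomical_part: str


def select(cards, **criteria):
    """Return cards whose fields match every given criterion (like a sorter pass)."""
    return [c for c in cards if all(getattr(c, k) == v for k, v in criteria.items())]


def sort_by(cards, field):
    """Order the single composite file on any one index field (stable sort)."""
    return sorted(cards, key=lambda c: getattr(c, field))


# Hypothetical sample deck.
deck = [
    Card("00012345", "Plasmodium", "chick", "blood"),
    Card("00067890", "Staphylococcus", "mouse", "skin"),
    Card("00012345", "Staphylococcus", "mouse", "blood"),
]

# One deck serves as a host index, an anatomical-part index, and so on.
by_host = sort_by(deck, "host")
blood_tests = select(deck, anatomical_part="blood")
```

The separate files the CBCC actually kept would correspond to maintaining one pre-sorted copy per index; the sketch shows only why, in principle, one composite file suffices.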



A number of the fields of the Code are not expected to be used frequently, if ever, as indexes for retrieval of coded information. Fields T-1, V, and R (the action of the test compound [increasing, decreasing, initiating, antagonizing, etc.], the time to evaluation of the effect, and the time of treatment with the test compound relative to the time of inoculation, respectively) are indexing criteria that should probably be regarded as essentially useless for retrieval. Fields A, B, C, F, G, K, L, O, P, Q, S, and U (refer to the Code for their descriptions) are more likely to be used as indexes, yet in terms of frequency of use they are of minor importance compared with the remaining fields.



Since a number of the coding fields can be regarded as of little importance as general retrieval criteria, they must be considered from other standpoints. First, an index that is seldom used for general retrieval of information in the CBCC coded files may be of critical importance when studying and correlating a mass of chemical-biological information of a circumscribed nature, such as all anti-malarial chemotherapy tests, all rodent repellency tests, or all tumor-inhibiting tests, since the ability to separate information according to such a detail may be expected to become more important as the study of the information becomes more specialized. Second, it may be advantageous to have available from the mechanized file as many details of a test as possible, in code language, making the coded information as self-sufficient as possible; this would be especially true of a system in which no reference to the original data or to a written abstract was intended. The CBCC has regarded coding in the light of both of these aspects, and while perhaps none of the coding fields should be considered totally valueless as an index, some of them may be regarded as more significant as a means of expressing in code language all aspects of chemical-biological tests.
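To illustrate how a rarely used field can matter inside a specialized study, here is a minimal sketch, assuming each coded test is reduced to a hypothetical test-type code plus a value standing in for field R (time of treatment relative to inoculation). The codes and records are invented for illustration: the deck is first restricted to one circumscribed class of tests and then separated on field R.

```python
from collections import defaultdict

# Hypothetical coded tests: (test_type, field_r). All codes are invented.
tests = [
    ("antimalarial", "before"),
    ("antimalarial", "after"),
    ("rodent_repellency", "n/a"),
    ("antimalarial", "before"),
    ("tumor_inhibition", "after"),
]


def group_subset(records, test_type):
    """Restrict to one class of tests, then count its records by field R value."""
    groups = defaultdict(int)
    for kind, field_r in records:
        if kind == test_type:
            groups[field_r] += 1
    return dict(groups)


print(group_subset(tests, "antimalarial"))  # {'before': 2, 'after': 1}
```

Across the whole file, field R would rarely be a useful retrieval key; within the anti-malarial subset it becomes a meaningful way to separate results.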



The Organization of the Biology Code as Related to the Machines Used for Handling Coded Information:



To return to the matter of determining the number of information categories (coding "fields") to be included in the CBCC Biology Code and the limits of each, it must be recognized that the CBCC was committed to using standard business machines. The limitations of this equipment, as well as its advantages, were impressed on the structure of the Biology Code. The same can be said of every mechanical system: when coded chemical-biological information is placed on other media, such as electronic tape, the limitations and advantages of that system will likewise be impressed on the code used, influencing such decisions as the number of information categories.



After studying the IBM equipment and methods, the CBCC decided that handling masses of coded information about individual chemical-biological tests on IBM punched cards was practical only if the information about any given test was restricted to a single IBM punched card. Having made this decision, the CBCC was limited to fewer than 80 IBM punched card columns for coding information about the test. Eight columns are needed for the eight units of the CBCC Chemical Serial Number identifying the chemical used in the test; nine more columns are needed for the six units of the Code Sheet Number, the two units of the Code Line Number, and the single unit of the IBM Punched Card File Number. Thus, only 63 of the 80 columns are actually available for information coded by the Biology Code. These 63 columns cannot be used to code 63 independent types of information, since a few of the types of information require more than one IBM column. The CBCC therefore apportioned the available IBM columns among the categories of information it considered most essential, in order to convert the record of the test into code language as completely as possible and, by the same token, to index the information so as to provide the greatest possible retrieval and correlative facility.
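The column arithmetic above can be checked with a short sketch. It assumes only the widths stated in the text for the identifier fields; the `remaining_columns` helper and the example Biology Code field widths (including the field name `dose`) are hypothetical, chosen to show how multi-column fields eat into the 63-column budget.

```python
CARD_COLUMNS = 80

# Identifier widths as stated in the text (in card columns).
IDENTIFIER_WIDTHS = {
    "chemical_serial_number": 8,
    "code_sheet_number": 6,
    "code_line_number": 2,
    "card_file_number": 1,
}

RESERVED = sum(IDENTIFIER_WIDTHS.values())       # 8 + 6 + 2 + 1 = 17
BIOLOGY_CODE_COLUMNS = CARD_COLUMNS - RESERVED   # 80 - 17 = 63


def remaining_columns(field_widths: dict[str, int]) -> int:
    """Hypothetical helper: columns left after allocating the given code fields.

    Raises ValueError if the layout would not fit on a single card.
    """
    used = sum(field_widths.values())
    if used > BIOLOGY_CODE_COLUMNS:
        raise ValueError(f"needs {used} columns, only {BIOLOGY_CODE_COLUMNS} available")
    return BIOLOGY_CODE_COLUMNS - used


# Hypothetical widths: most fields take one column, a few need more.
example_layout = {"A": 1, "B": 1, "T-1": 1, "dose": 3}
print(RESERVED, BIOLOGY_CODE_COLUMNS, remaining_columns(example_layout))  # 17 63 57
```

Because any multi-column field consumes several of the 63 columns, the number of distinct fields that fit on one card is necessarily smaller than 63.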



The Key description of each of the code fields generally explains why a given type of information can be adequately coded using only a single IBM punched card column or why more columns must be






