APPENDIX IV 
rich than that of the Research Notebooks, with the possible exception of 
the subject note cards. There was a tremendous amount of variation in the 
contents of the note cards, with some cards containing nothing but a name 
or subject number and some cards containing detailed clinical notes about a 
named individual. 
All of these documents were single-coded, meaning that only one person 
coded any given document. Each coder’s work was regularly audited for 
faithful recording of data from the source documents, appropriate application
of the coding scheme, and consistency.
Subject Database Quality Control 
Once all sources were coded, separate databases were combined into one master 
Subject Database, with over 30,000 lines of data on over 3,000 individual 
subjects. The database was checked for obvious coding errors (e.g., a name 
where a date should be, mode of inoculation where a test result should be) and 
corrected where necessary. The Subject Database was saved and archived. 
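
As an illustration only, a check for obvious coding errors of this kind could be sketched as follows. The field names ("date", "test_result"), the date format, and the list of inoculation modes are assumptions made for the example; they are not taken from the Subject Database or its coding scheme.

```python
from datetime import datetime

# Hypothetical vocabulary for the example; the real coding scheme is not reproduced here.
INOCULATION_MODES = {"injection", "topical", "oral"}


def is_valid_date(value, fmt="%Y-%m-%d"):
    """Return True if value parses as a date in the assumed format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except (ValueError, TypeError):
        return False


def find_obvious_errors(rows):
    """Return (row_index, field, value) for values that are clearly in the wrong field."""
    problems = []
    for i, row in enumerate(rows):
        date = row.get("date")
        # A name (or any non-date text) where a date should be.
        if date and not is_valid_date(date):
            problems.append((i, "date", date))
        result = row.get("test_result")
        # A mode of inoculation where a test result should be.
        if result and result.lower() in INOCULATION_MODES:
            problems.append((i, "test_result", result))
    return problems
```

A sketch like this only flags candidate lines; each flagged line would still need to be compared with the source document before it is corrected.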
In order to identify the total number of subjects involved in the studies, as 
well as information about inoculation and treatment, further quality control 
of the database began. The first step was to review the names, where possible, 
in an effort to ensure that any given individual was counted only once. A
new column was created (“Full Name Clean”) to hold the best assessment 
of an individual’s name in cases where ambiguity existed. A paradigmatic 
case is one in which one or more lines include information on A. Gomes and
one or more other lines include data on A. Gomez. In this instance, if a
second piece of information (subject number, date, population, age, or
experiment number) matched across the two sets of lines, those lines of
data were assumed to refer to the same individual. All lines with information
on that individual were then assigned either the majority name or the most 
logical name — in our example, all lines would be assigned the name of “A. 
Gomez” in the Full Name Clean column. The First Name and Last Name 
columns were always left untouched; the name was changed only in the Full
Name Clean column.
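
The name-matching rule described above can be sketched in code, under several stated assumptions. The field names, the use of a string-similarity threshold of 0.85 to detect variant spellings, and the treatment of identical names as the same individual are all hypothetical choices made for the example. The sketch applies only the majority-name rule; the "most logical name" judgment described above is not something this example attempts to automate.

```python
from collections import Counter
from difflib import SequenceMatcher

# Second pieces of information that can corroborate a match (field names are assumed).
SECOND_FIELDS = ("subject_number", "date", "population", "age", "experiment_number")


def names_similar(a, b, threshold=0.85):
    """Treat two names as variant spellings if they are close enough (assumed threshold)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def shares_second_field(row_a, row_b):
    """Return True if at least one corroborating field is present and equal in both lines."""
    return any(
        row_a.get(f) is not None and row_a.get(f) == row_b.get(f)
        for f in SECOND_FIELDS
    )


def assign_clean_names(rows):
    """Return copies of rows with a 'full_name_clean' field; 'full_name' is left untouched."""
    cleaned = [dict(r) for r in rows]
    parent = list(range(len(cleaned)))  # union-find over line indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(cleaned)):
        for j in range(i + 1, len(cleaned)):
            a, b = cleaned[i]["full_name"], cleaned[j]["full_name"]
            same = a == b  # assumption: identical names are the same individual
            variant = names_similar(a, b) and shares_second_field(cleaned[i], cleaned[j])
            if same or variant:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(cleaned)):
        groups.setdefault(find(i), []).append(i)

    for members in groups.values():
        # Majority name within the group of lines judged to be one individual.
        majority = Counter(cleaned[i]["full_name"] for i in members).most_common(1)[0][0]
        for i in members:
            cleaned[i]["full_name_clean"] = majority
    return cleaned
```

In this sketch, a line for "A. Gomes" that shares a subject number with a line for "A. Gomez" is grouped with every "A. Gomez" line, and the whole group receives the majority spelling in the clean-name field only.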
If a line of data could not be assigned to a unique individual, the data were 
not included in our subsequent calculations. 
