FINCH: THE MECCA PROJECT 



APPENDIX 



TECHNIQUES USED TO RECOVER DEFECTIVE RECORDS 



While editing the Market Facts data, in preparation for creating the MECCA data base, 13,377 records were identified which qualified for inclusion, but which had critical information missing. The missing information was found to be confined to four data elements:



1. Number of individuals in family.

2. Number of meals eaten away from home.

3. Weight of fish purchased.

4. Item code (species identification).



The majority of the 13,377 records had two or more of these items missing. Because of certain redundant information in each record, it was felt that many of these records could be recovered without substantial distortion of the true distribution of fish consumption.



The most easily recoverable data item was "number of individuals in family." For each record with this defect, the file of good records was searched until a match on family identity was obtained. The number of individuals shown in the good record was then inserted into the defective record.
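The family-size recovery amounts to a lookup keyed on family identity. The sketch below is a minimal modern illustration, not the original program: it assumes each record is a Python dict with hypothetical `family_id` and `family_size` fields (the actual MECCA record layout is not given here). Instead of searching the good file once per defective record, it indexes the good file once; `setdefault` keeps the first match, which is what a sequential search would find.

```python
def fill_family_size(good: list[dict], defective: list[dict]) -> list[dict]:
    """Copy the family size from the first good record with the same family identity."""
    # Index good records by (hypothetical) family_id. setdefault keeps the first
    # match, which is what a sequential search of the good file would return.
    size_by_family: dict[str, int] = {}
    for rec in good:
        size_by_family.setdefault(rec["family_id"], rec["family_size"])

    recovered = []
    for rec in defective:
        if rec.get("family_size") is None and rec["family_id"] in size_by_family:
            # Build a copy with the missing field filled in; the input is not mutated.
            rec = {**rec, "family_size": size_by_family[rec["family_id"]]}
        # Records with no matching family pass through unchanged in this sketch.
        recovered.append(rec)
    return recovered


good = [{"family_id": "F001", "family_size": 4}, {"family_id": "F002", "family_size": 2}]
bad = [{"family_id": "F002", "family_size": None}, {"family_id": "F999", "family_size": None}]
print(fill_family_size(good, bad))
# -> [{'family_id': 'F002', 'family_size': 2}, {'family_id': 'F999', 'family_size': None}]
```

The text does not say what was done when no matching family existed; the sketch simply leaves such records defective.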



Records of meals eaten away from home were supposed to contain the number of meals eaten on that occasion (not necessarily equal to the number in the family). To recover records in which this information was missing, the file of good records was read and two tallies were kept: the number of "meals away from home" records, and the total number of meals eaten on those occasions. From those two figures the average number of meals eaten away from home per occasion was computed (1.7). This value was then inserted into the defective records.
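The two tallies and the inserted average can be sketched as follows. This is an illustration under assumed names only: the `kind` and `meals` fields are hypothetical, and the toy data gives an average of 2.0 rather than the 1.7 obtained from the actual good records.

```python
def average_meals_per_occasion(good: list[dict]) -> float:
    """Average number of meals per 'meals away from home' record in the good file."""
    occasions = 0    # tally of "meals away from home" records
    total_meals = 0  # total meals eaten on those occasions
    for rec in good:
        if rec["kind"] == "away":
            occasions += 1
            total_meals += rec["meals"]
    if occasions == 0:
        raise ValueError("no away-from-home records to average")
    return total_meals / occasions


def fill_meals(defective: list[dict], average: float) -> list[dict]:
    """Insert the same average into every away-from-home record lacking a meal count."""
    return [
        {**rec, "meals": average}
        if rec["kind"] == "away" and rec.get("meals") is None
        else rec
        for rec in defective
    ]


good = [
    {"kind": "away", "meals": 1},
    {"kind": "away", "meals": 3},
    {"kind": "away", "meals": 2},
    {"kind": "home", "meals": 4},  # not an away-from-home record, so not tallied
]
avg = average_meals_per_occasion(good)  # (1 + 3 + 2) / 3 = 2.0 on this toy data
print(fill_meals([{"kind": "away", "meals": None}], avg))
# -> [{'kind': 'away', 'meals': 2.0}]
```

Because every recovered record receives the same constant, this step leaves the overall average unchanged but gives all recovered records an identical meal count.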



Records which had only the above two defects were reintroduced into the file of good records following recovery. The two remaining types of defects presented a more difficult recovery problem. Not only was there no redundant information of any use, but it was also determined that nearly all of the remaining defective records had neither purchase weight nor species identification. The only usable data in the record, other than family-specific items, was an indication of fish category such as "fresh or frozen finfish," "specialty items," etc. There were 10 such categories. Balancing the desire to account for all fish products consumed against the possibility of distorting the distribution of consumption, it was decided to create two separate data bases: one containing only the good records and the records already recovered, and the other including all records, following the recovery of these more serious defects.



These remaining records were recovered in the following manner:



1. The file of good records was first sorted by fish category.

2. The percentage of each category represented by each item code (species identification) was calculated.

3. A cumulative percentage table was constructed for each category, giving the relative frequencies of each item code among the good records on a scale of 1 to 10,000.

4. Each defective record without an item code was read sequentially. A random number in the range 1 to 10,000 was generated and used as the subscript to the appropriate cumulative percentage table. The corresponding item code was then assigned to the record.
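Steps 1 through 4 can be sketched in Python as below. This is a hedged reconstruction, not the original program: the `category` and `item_code` field names are hypothetical, the ordering of codes within a table (alphabetical here) is an assumption that does not affect the frequencies, and Python's `random.Random` stands in for whatever generator was actually used. Each category's table has 10,000 slots, and each code occupies the slots up to its rounded cumulative share, so a random subscript selects codes in proportion to their frequency among the good records.

```python
import random
from collections import Counter, defaultdict

SCALE = 10_000  # cumulative percentages expressed on a scale of 1 to 10,000


def build_tables(good: list[dict]) -> dict[str, list[str]]:
    """For each fish category, a 10,000-slot table indexed by a random subscript."""
    # Steps 1-2: group good records by category and count each item code.
    counts: dict[str, Counter] = defaultdict(Counter)
    for rec in good:
        counts[rec["category"]][rec["item_code"]] += 1

    tables: dict[str, list[str]] = {}
    for category, code_counts in counts.items():
        total = sum(code_counts.values())
        table: list[str] = []
        cumulative = 0
        # Step 3: each code fills the slots up to its rounded cumulative share
        # of 10,000, so slot counts are proportional to relative frequency.
        for code, n in sorted(code_counts.items()):
            cumulative += n
            upper = round(SCALE * cumulative / total)
            table.extend([code] * (upper - len(table)))
        tables[category] = table  # the last upper bound is exactly SCALE
    return tables


def assign_item_codes(defective: list[dict], tables: dict[str, list[str]],
                      rng: random.Random) -> list[dict]:
    """Step 4: draw a subscript in 1..10,000 and copy the item code found there."""
    out = []
    for rec in defective:
        if rec.get("item_code") is None:
            r = rng.randint(1, SCALE)  # randint includes both endpoints
            rec = {**rec, "item_code": tables[rec["category"]][r - 1]}  # 1-based subscript
        out.append(rec)
    return out


good = [{"category": "finfish", "item_code": c} for c in ("A", "A", "B")]
tables = build_tables(good)
print(tables["finfish"].count("A"), tables["finfish"].count("B"))  # 6667 3333
rng = random.Random(1)
print(assign_item_codes([{"category": "finfish", "item_code": None}], tables, rng))
```

The explicit 10,000-slot table mirrors the subscripting described in step 4. In current Python, `random.choices(codes, weights=counts)` would draw with the same relative frequencies without materializing the table.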



This processing resulted in the random assignment of item codes to the defective records with essentially the same relative frequency found in the good records.
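The phrase "essentially the same relative frequency" can be made precise under two assumptions: that the random subscript $R$ is uniform on 1 to 10,000, and that the table boundaries are the rounded cumulative shares. The symbols below are introduced for illustration only: for one category with $K$ item codes, $n_k$ is the number of good records with code $k$, $N = \sum_k n_k$, and $C_k$ is the upper slot boundary of code $k$.

```latex
C_0 = 0, \qquad
C_k = \operatorname{round}\!\left( 10\,000 \cdot \frac{1}{N} \sum_{j=1}^{k} n_j \right), \qquad
C_K = 10\,000

\text{code } k \text{ is assigned} \iff C_{k-1} < R \le C_k

P(\text{code } k) = \frac{C_k - C_{k-1}}{10\,000},
\qquad
\left| \, P(\text{code } k) - \frac{n_k}{N} \, \right| \le \frac{1}{10\,000}
```

Each rounded boundary differs from its exact value by at most half a slot, so each assignment probability is within one part in 10,000 of the good-record share $n_k / N$. Among $D$ defective records in the category, the expected number assigned code $k$ is $D \cdot P(\text{code } k)$.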



The final phase of the recovery process involved assigning a "consumption weight" to each record without a "purchase weight" entry. The good records had already been processed to convert "weight purchased" to "weight consumed" by use of ratio figures supplied by NMFS. The good records were sorted by item code, and the average consumption weight for






