PAPERS ON PSYCHOLOGY AND EDUCATION 493 



Each reasoning test provided a single measure of the
arithmetical-reasoning ability of each pupil. On the
assumption that the average of several expert attempts to
measure an object or reaction is more reliable than a
single attempt, the average score of a pupil from several
tests was regarded as approximating more closely the
true measure of the ability in question than the
measurement derived from a single test.



The average score was obtained after each test series
had been transmuted into values on a percentile scale
and thus made comparable. The method of transmutation
was shown to have satisfactory reliability by the high
correlation between each original series of scores and
the corresponding derived series. The average coefficient
(product-moment) was .99. The five tests which have
but one set of problems for the four grades in which the
experiment was conducted were employed in deriving the
average or composite score.
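
The transmutation and averaging described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the exact percentile formula, so the midrank convention used here (per cent of scores below, plus half of the ties) is an assumption, and the test names and raw scores are hypothetical.

```python
from statistics import mean


def percentile_ranks(scores):
    """Transmute raw scores into percentile values (0 to 100).

    Assumed convention: per cent of scores below a given score,
    plus half of the scores tied with it (midrank).
    """
    n = len(scores)
    ranks = []
    for s in scores:
        below = sum(1 for t in scores if t < s)
        equal = sum(1 for t in scores if t == s)
        ranks.append(100.0 * (below + 0.5 * equal) / n)
    return ranks


def composite_scores(tests):
    """Average each pupil's percentile values over all tests.

    `tests` maps a test name to a list of raw scores, with pupils
    in the same order in every list.
    """
    transmuted = [percentile_ranks(series) for series in tests.values()]
    return [mean(pupil_values) for pupil_values in zip(*transmuted)]


# Hypothetical raw scores for four pupils on two tests scored on
# different scales; the percentile step makes them comparable.
raw = {
    "test_a": [3, 7, 5, 9],
    "test_b": [40, 55, 70, 85],
}
composite = composite_scores(raw)
```

Because each series is reduced to the same 0 to 100 scale before averaging, no test dominates the composite merely because its raw scores run higher.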



The average coefficient of correlation of each test series
with the composite in each grade was found, and the
several tests were ranked according to closeness of
agreement with the composite.
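
A minimal sketch of this ranking step is given here, assuming the product-moment coefficient is computed within each grade and then averaged over the grades. The function names, the two grades, and all figures are hypothetical and serve only to show the procedure.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_by_composite(grades):
    """Rank tests by their average correlation with the composite.

    `grades` is a list of (tests, composite) pairs, one per grade,
    where `tests` maps a test name to that grade's score series.
    """
    names = list(grades[0][0])
    avg_r = {
        name: sum(pearson(tests[name], comp) for tests, comp in grades)
        / len(grades)
        for name in names
    }
    ranking = sorted(names, key=avg_r.get, reverse=True)
    return ranking, avg_r


# Hypothetical data for two grades (not taken from the study).
grade_six = (
    {"test_x": [1, 2, 3, 4, 5], "test_y": [2, 1, 4, 3, 5], "test_z": [5, 4, 3, 2, 1]},
    [1, 2, 3, 4, 5],
)
grade_seven = (
    {"test_x": [1, 2, 3, 4], "test_y": [2, 1, 4, 3], "test_z": [4, 3, 2, 1]},
    [1, 2, 3, 4],
)
ranking, avg_r = rank_by_composite([grade_six, grade_seven])
```

The test standing first in `ranking` is the one whose scores agree most closely, on the average, with the composite.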



For the purpose of verifying the results obtained by
the use of the composite scores, all the possible inter-test
correlation coefficients were computed for the sixth and
seventh grades. The method of composite scores showed
which test agreed most closely with the average result of
five of the tests. The method of inter-test correlation
showed which test on the average agreed most closely
with the rest of the tests. The results by the two methods
should be closely similar.
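
The method of inter-test correlation may be illustrated by the following sketch, which computes every pairwise coefficient and then averages, for each test, its coefficients with all the others. The three tests and their scores are hypothetical, and the helper names are introduced here for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Product-moment correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_intertest_r(tests):
    """Average correlation of each test with every other test."""
    names = list(tests)
    totals = {name: 0.0 for name in names}
    # Each unordered pair is computed once and credited to both tests.
    for a, b in combinations(names, 2):
        r = pearson(tests[a], tests[b])
        totals[a] += r
        totals[b] += r
    return {name: totals[name] / (len(names) - 1) for name in names}


# Hypothetical scores of five pupils on three tests.
scores = {
    "test_a": [1, 2, 3, 4, 5],
    "test_b": [2, 1, 4, 3, 5],
    "test_c": [1, 2, 3, 5, 4],
}
mean_r = mean_intertest_r(scores)
intertest_ranking = sorted(mean_r, key=mean_r.get, reverse=True)
```

Comparing `intertest_ranking` with a ranking obtained from the composite scores gives a simple check on whether the two methods place the tests in a similar order.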



Table I presents the ranking of the tests according to
the two different methods of estimating their validity.



