PAPERS ON CHEMISTRY AND PHYSICS 277 



take in copying, he puts down the wrong answer. Should he not get some credit for it? Many questions, again, can not be answered properly by the use of several words. Suppose the full answer is "Simple harmonic motion" and the student writes "Harmonic motion." It is neither completely right nor completely wrong. It has therefore been my custom to give two for correct answers and one for those partly right. Elaborate methods have been worked out for giving weighted scores, the values of which are proportional to the percentage of pupils failing on each question. It is doubtful if such a scheme is worth the trouble it takes. Better recast the whole test if there is any great difference in the difficulty of the questions.
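The scoring rule just described (two for a correct answer, one for a partly right answer) and the weighted alternative can be sketched in a few lines of Python. This is only an illustration: the mark labels `"right"`, `"partial"` and `"wrong"`, the function names, and the sample papers are assumptions made for the example, and the weight is taken simply as the percentage of pupils marked wrong on each question.

```python
# Hypothetical mark labels; the text specifies only the point values 2 and 1.
POINTS = {"right": 2, "partial": 1, "wrong": 0}


def score_paper(marks):
    """Total for one paper: two per correct answer, one per partly right answer."""
    return sum(POINTS[m] for m in marks)


def failure_weights(papers):
    """Weight per question, taken as the percentage of pupils who failed it.

    Assumption: only a "wrong" mark counts as a failure.
    """
    n_pupils = len(papers)
    n_questions = len(papers[0])
    return [
        100.0 * sum(1 for paper in papers if paper[q] == "wrong") / n_pupils
        for q in range(n_questions)
    ]


# Invented sample data: four pupils, three questions.
papers = [
    ["right", "partial", "wrong"],
    ["right", "right", "wrong"],
    ["right", "wrong", "partial"],
    ["partial", "right", "wrong"],
]
scores = [score_paper(p) for p in papers]
weights = failure_weights(papers)
```

A wide spread in `weights` is the "great difference in the difficulty of the questions" that, by the argument above, is better cured by recasting the test than by weighting.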



Having scored the papers, we must next study them for certain necessary information. To find whether the test is too hard or too easy, we plot a curve, using the scores as abscissas, and the number of pupils making each score as ordinates. An approximation to the normal probability curve shows that we have a test reasonably suited to the capacity of our class; if the curve is skewed so that the scores bunch toward the right (the high scores), the test is too easy; if toward the left, it is too hard.
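The tabulation behind such a curve, and a rough numerical check on its shape, might look as follows in Python. This is a sketch under stated assumptions: the third standardized moment is used as the measure of skewness, and the cut-off `threshold=0.5` is an arbitrary illustrative value, not something given in the text. Note that scores bunched at the high end leave a tail toward the low scores, which in the usual modern convention is a negative skewness.

```python
from collections import Counter
from statistics import fmean


def frequency_table(scores):
    """(score, number of pupils) pairs in order of score: abscissa, ordinate."""
    return sorted(Counter(scores).items())


def moment_skewness(scores):
    """Third standardized moment; zero for a symmetric distribution."""
    mean = fmean(scores)
    m2 = fmean([(s - mean) ** 2 for s in scores])
    m3 = fmean([(s - mean) ** 3 for s in scores])
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def judge_test(scores, threshold=0.5):
    """Classify a test from the shape of its score distribution.

    The threshold is an assumed, illustrative cut-off.
    """
    g = moment_skewness(scores)
    if g < -threshold:
        return "too easy"  # scores bunched at the high end
    if g > threshold:
        return "too hard"  # scores bunched at the low end
    return "reasonably suited"


# Invented sample score lists.
balanced = [3, 4, 4, 5, 5, 5, 6, 6, 7]
easy = [10, 10, 10, 9, 9, 9, 9, 8, 8, 5, 2]
hard = [0, 0, 0, 1, 1, 1, 1, 2, 2, 5, 8]
```

With a class of ordinary size the curve will be ragged, so a check of this kind is better treated as a guide to the eye than as a verdict.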



We may next study the relative difficulty of the questions by recording the number of times each question has been missed. If any question has been missed by no one, it is too easy and should be dropped. On the contrary, if any question is missed by all, it is too hard; the form is ambiguous, or the content is too difficult for the pupils' comprehension, or the subject has not been taught properly. It should be dropped or reshaped. If the percentage of correct answers, question by question, runs from ninety-five down to five, we may conclude that we have a good set.
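The question-by-question tally described above can be sketched in Python as follows. The data layout (`results[p][q]` is `True` when pupil `p` answered question `q` correctly), the function name, and the verdict strings are assumptions made for the example; so is the "review" label for a question that falls outside the five to ninety-five percent range without being missed by all or by none, a case the text does not address.

```python
def item_analysis(results):
    """Return (question number, times missed, percent correct, verdict) per question."""
    n_pupils = len(results)
    n_questions = len(results[0])
    report = []
    for q in range(n_questions):
        missed = sum(1 for row in results if not row[q])
        pct_correct = 100.0 * (n_pupils - missed) / n_pupils
        if missed == 0:
            verdict = "too easy: drop"
        elif missed == n_pupils:
            verdict = "too hard: drop or reshape"
        elif 5.0 <= pct_correct <= 95.0:
            verdict = "keep"
        else:
            verdict = "review"  # assumed label; not a rule from the text
        report.append((q + 1, missed, pct_correct, verdict))
    return report


# Invented sample data: four pupils, three questions.
results = [
    [True, True, False],
    [True, False, False],
    [True, True, False],
    [True, False, False],
]
report = item_analysis(results)
```

Here the first question is missed by no one and the third by everyone, so only the second would survive unchanged.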



The next step is to revise the questions, eliminating some, inserting others, restating some, and putting the harder questions toward the end. Another trial on the same or a similar class should be much more satisfactory; that is, it should give a nearly normal distribution in which the failures should be much more numerous in the latter part of the test than in the first part.
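Putting the harder questions toward the end, and checking that failures then fall mostly in the latter part, can be illustrated with a short Python sketch. The miss counts are invented sample data and the function names are hypothetical. For a test with an odd number of questions, this sketch counts the middle question with the latter part.

```python
def order_by_difficulty(miss_counts):
    """Question numbers (1-based), least-missed first, most-missed last."""
    return sorted(range(1, len(miss_counts) + 1), key=lambda q: miss_counts[q - 1])


def failures_by_half(miss_counts_in_test_order):
    """Total failures in the first part and in the latter part of the test."""
    half = len(miss_counts_in_test_order) // 2
    first = sum(miss_counts_in_test_order[:half])
    latter = sum(miss_counts_in_test_order[half:])
    return first, latter


# Invented miss counts for questions 1 to 4 on the first trial.
miss_counts = [7, 2, 9, 4]
new_order = order_by_difficulty(miss_counts)
reordered_misses = [miss_counts[q - 1] for q in new_order]
first_part, latter_part = failures_by_half(reordered_misses)
```

On a second trial the same comparison would be made with the fresh miss counts, since the difficulty observed on one class is only an estimate for the next.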



We are now ready to compare the grades on this particular set of questions with a criterion. About the only



