October 31, 1913] SCIENCE 633



example, if the marks assigned by 75 out of 100 teachers to a given paper lie between 80 and 90, then the unit of our scale should be ten points. Any smaller division would have little or no objective significance. Of course, almost indefinitely small differences in merit can be measured if an indefinite number of independent estimates is made.



Now what are the actual facts with regard to the size of distinguishable steps in the marking scale? We have seen above that the mean variation of the estimates of a teacher in matching his own marks, after eliminating his own change in standard, is 1.75 points. According to our principle that a unit is to be large enough in range to include three fourths of all his estimates of the same quantity, the smallest distinguishable step that can be used with reasonable validity is 2¾ times the mean variation (1.75) or probable error, which would be 4.8, or roughly 5 points.°
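The arithmetic of this paragraph can be sketched in a few lines of Python. Only the factor 2¾ and the mean variation of 1.75 points come from the text; the function name `smallest_step` is a hypothetical label chosen for illustration.

```python
from fractions import Fraction

# Factor stated in the text: a unit 2 3/4 times the mean variation
# is taken to include about three fourths of the estimates.
FACTOR = Fraction(11, 4)


def smallest_step(mean_variation: float) -> float:
    """Return 2 3/4 times the given mean variation (hypothetical helper)."""
    return float(FACTOR) * mean_variation


# Mean variation of a teacher matching his own marks, as given in the text.
step = smallest_step(1.75)
print(round(step, 1))  # 4.8
print(round(step))     # 5
```

The product is 4.8125, which the text reports as 4.8 and rounds to a step of 5 points.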



Hence our marking scale, instead of being 100, 99, 98, 97, 96, 95, etc., should be 100, 95, 90, 85, 80, etc. These are the smallest divisions that can be used with reasonable confidence by a teacher in grading his own pupils. This means that on a scale of passing grades of 70 to 100 only seven division points are distinguishable. This substantially confirms the scheme followed in many institutions that the marking scale should be A+, A-, B+, B-, C+, C-, D+, D- and failure. No medium A, B, C or D may be used. Letters or symbols are perhaps preferable to such designations as Excellent, Good, Fair and Poor because of the moral implication in the latter.
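The count of seven division points can be reproduced with a short sketch. The helper `division_points` is a hypothetical name, and the assumption that the passing range runs from 70 to 100 inclusive in equal 5-point steps is a reading of the text rather than a stated procedure.

```python
from typing import List


def division_points(top: int, bottom: int, step: int) -> List[int]:
    """List scale marks from `top` down to `bottom` (inclusive) in equal steps."""
    return list(range(top, bottom - 1, -step))


# Passing grades of 70 to 100, marked in the 5-point steps proposed in the text.
points = division_points(100, 70, 5)
print(points)       # [100, 95, 90, 85, 80, 75, 70]
print(len(points))  # 7
```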



Even as fine a scale as this might perhaps better be replaced by a coarser one computed on the mean variation of 4.3 points, which is the mean variation of different teachers in the same department and institution after the effect of the personal standard has been eliminated. See Table II. On this basis the range of a division on the scale should be 4.3 times 2¾ or approximately 12 points. The reason for this larger step would be that this is as closely as different competent teachers agree on the evaluation of the same papers. One teacher may be as much in the right for grading a paper 80 as another for grading it 90. The only ultimate criterion is the consensus or average of estimates. This coarser scale would allow for only three divisions of passable grades, A, B and C. But the finer scale proposed above can be used with reasonable accuracy by a teacher in grading his own pupils in the light of his own viewpoint.

° To those who may be interested in the basis of this computation I may say that a range twice the size of the probable error includes one half of the series of estimates, and a range 2¾ times the mean variation or 3 times the probable error includes approximately three fourths of the series of estimates. In practise the mean variation and the probable error are used interchangeably, but the former is usually a trifle larger than the latter.
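The coarser step and the footnote's rule of thumb can be checked with a rough sketch. It rests on two assumptions that the text does not state: that the estimates are normally distributed, and that "range" means a total width centered on the mean. All function and constant names are hypothetical.

```python
import math

# Under the normality assumption, expressed in units of the standard deviation:
MEAN_VARIATION = math.sqrt(2 / math.pi)  # mean absolute deviation, about 0.798
PROBABLE_ERROR = 0.6745                  # half-width that contains 50% of cases


def coverage(total_width: float) -> float:
    """Fraction of a normal distribution inside a centered range of this width."""
    half_width = total_width / 2
    return math.erf(half_width / math.sqrt(2))


# Coarser step from the text: 2 3/4 times the 4.3-point mean variation.
coarse_step = 2.75 * 4.3
print(round(coarse_step, 1))  # 11.8, which the text rounds to 12

# Footnote: twice the probable error should include one half of the estimates.
half = coverage(2 * PROBABLE_ERROR)

# Footnote: 3 times the probable error, or 2 3/4 times the mean variation,
# should include approximately three fourths of the estimates.
by_probable_error = coverage(3 * PROBABLE_ERROR)
by_mean_variation = coverage(2.75 * MEAN_VARIATION)
print(half, by_probable_error, by_mean_variation)
```

Under these assumptions the first range covers one half of the estimates almost exactly, while the two "three fourths" ranges cover roughly 0.69 and 0.73. That agrees with the footnote's "approximately", and with its remark that the mean variation is a trifle larger than the probable error.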



Of course, any one may use as fine a scale as he pleases provided one recognizes the range of the probable error of the scale used. The fine scale, if conscientiously used, probably tends to stimulate the making of finer distinctions than a coarse scale does. However, the chief objections to a very fine scale are: (1) An illusion of accuracy, (2) injustice to the student of supposed differences where there is no appreciable difference or where the relative merit might be just reversed, (3) embarrassment to the teacher due to this injustice.



If we admit the soundness of our reasoning it may seem to many teachers that even the finer scale of five point steps is rather crude and that the evaluation of a pupil's attainment is very coarse. But not so. As a matter of fact, the steps of the proposed scale are very fine and the measurement of achievement would be fairly accurate.



Apropos of this point we may compare the accuracy of making measurements of a similar type in an entirely different field. A mechanic through constant use has acquired a fairly definite mental image of an inch or a foot. Yet a mechanic's estimate of the length of a rod is not an iota more accurate than a teacher's estimate of an examination paper. I tested this problem by having eleven experienced carpenters estimate in inches as closely as they could the length of five rods varying



