ON THE MEASUREMENT OF THE INFLUENCE OF 
"BROAD CATEGORIES" ON CORRELATION. 
By KARL PEARSON, F.R.S. 
(1) By a "broad category" I understand one of a finite small number of groups 
into which we class a variable. For example: we may divide General Health into the 
categories Very Robust, Robust, Normally Healthy, Rather Delicate, Delicate and 
Very Delicate. These categories may be verbally defined or have their boundaries 
determined by quantitative limits as when we state that the limits of the Delicate 
coincide with so many weeks of sickness or of absence from work in the year. 
Again we may put into four to six classes the competitors in an examination, and 
the boundaries to these classes may be really percentages of marks gained. Such 
broad categories are very common not only in social investigations, but also in 
psychological records, and quite recently Dr G. A. Jaederholm, a Swedish 
psychologist, wrote to me asking what was the correlation between the true 
quantitative value of a variate in any individual and that individual's category 
or class-mark. The answer is an obvious one, but I do not know that I have 
seen it stated or any discussion of it given. It of course assumes that at the back 
of the categorical classification a true quantitative value lies. 
Suppose a population of $N$ individuals divided into $p$ classes, and that $G_s$ is 
the class-mark of an individual in the $s$th category, whose true variate is $x$. 
The problem is what is the correlation of $x$ and the class-mark. Let $\bar{x}_s$ be the 
mean variate of the group of $n_s$ individuals who fall into the $s$th class. Then it 
is just as reasonable to call $\bar{x}_s$ the class-mark as $G_s$, for given one the other is fixed. 
We really want then the correlation of $x$ and $\bar{x}_s$. 
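The quantity just defined can be sketched numerically. The following is a minimal illustration, not part of the original argument: it assumes a simulated normal variate and arbitrarily chosen class boundaries, and the names `class_mean_correlation`, `pearson_r` and `boundaries` are hypothetical. Each individual is given the mean of its own class as class-mark, and the product-moment correlation between the true variate and that mark is then computed.

```python
import math
import random


def pearson_r(a, b):
    """Product-moment correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / n
    var_a = sum((u - mean_a) ** 2 for u in a) / n
    var_b = sum((v - mean_b) ** 2 for v in b) / n
    return cov / math.sqrt(var_a * var_b)


def class_mean_correlation(values, boundaries):
    """Return (r, marks): marks[i] is the mean of the class holding values[i],
    and r is the correlation between the values and those class means."""

    def class_index(x):
        # Class number = how many boundaries the value reaches or exceeds.
        return sum(x >= b for b in boundaries)

    # Collect the members of each class.
    groups = {}
    for x in values:
        groups.setdefault(class_index(x), []).append(x)

    # Mean variate of each class, used as that class's mark.
    class_means = {s: sum(g) / len(g) for s, g in groups.items()}
    marks = [class_means[class_index(x)] for x in values]
    return pearson_r(values, marks), marks


# Assumed illustration: 10,000 normal deviates sorted into five broad classes.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(10_000)]
r, marks = class_mean_correlation(x, boundaries=[-1.5, -0.5, 0.5, 1.5])
print(f"correlation of variate with class mean: {r:.3f}")
```

With fewer or wider classes the printed correlation falls further below unity, which is the loss of information that broad categories entail.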
We may either find this directly * or indirectly, and the latter is the easier 
course. Clearly if $\bar{x}_{\bar{x}_s}$ be the mean value of $x$ for a given class, then 
$\bar{x}_{\bar{x}_s} = \bar{x}_s$, 
or $\bar{x}_{\bar{x}_s} - \bar{x} = \bar{x}_s - \bar{x}$, 
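* Let $x$ within the class $= \bar{x}_s + x_s'$, $S$ = sum for classes and $\Sigma$ = sum within class. Then 
$S\,\Sigma\,(n_x\, x\, \bar{x}_s) = S\,\Sigma\,\{n_x\, \bar{x}_s\,(\bar{x}_s + x_s')\} = S\,(n_s\, \bar{x}_s^2) + S\,\{\bar{x}_s\, \Sigma\,(n_x\, x_s')\}$, 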
