where the last term on the right is the so-called "correction term", which
is then used to obtain the sum of squares of the computed Ys:

    SSYcomp = SSYraw - Ȳ ΣYi.



Thus, the proportion of the total sum of squares attributable to
linear regression, expressed as a percentage, can now be computed directly
as 100 SSYcomp / SSYobs. For example, if SSYobs = 250 and SSYcomp = 150,
then the linear relation "accounted for" 60% of the total sum of squares
of Yobs.



The quantity 100 SSYcomp / SSYobs is informally referred to as the
"percent reduction in the sum of squares", inasmuch as in the above example,
if the total sum of squares of Y is 250, the 150 units attributable to
Ycomp leave, in effect, only 40% of the original variability "unaccounted
for". Hence, the reference to a percent reduction in SSYobs means that the
regression relation has in effect "reduced" the initial variability by 60%.
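The arithmetic above can be sketched in a few lines. The function names below are illustrative, not from the report; the correction term Ȳ ΣYi is written in its algebraically equivalent form (ΣY)²/n.

```python
def corrected_ss(values):
    """Sum of squares about the mean: the raw sum of squares minus
    the correction term (sum Y)^2 / n, equivalent to Ybar * sum(Y)."""
    n = len(values)
    total = sum(values)
    raw = sum(v * v for v in values)
    return raw - total * total / n

def percent_reduction(ssy_comp, ssy_obs):
    """Percent of the total sum of squares accounted for by the regression."""
    return 100.0 * ssy_comp / ssy_obs

# The worked example from the text: SSYobs = 250, SSYcomp = 150.
accounted = percent_reduction(150.0, 250.0)   # 60.0 percent "accounted for"
unaccounted = 100.0 - accounted               # 40.0 percent left unexplained
```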

The sum of squares criterion is generally most useful when the number of
samples (or times of observation of process elements) is several times as
great as the number of process elements measured. Thus, for six Xs the
minimum number of times of observation should be 15 to 18. As the number
of Xs measured approaches the number of observations, the sum of squares
reduction tends to be forced closer to 100%, and may give an impression
of explaining a greater part of the variability in Y than is correct.
(The unusually high percent reductions in the lower part of table B,
described later, may be attributable to such effects.)
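This inflation is easy to demonstrate numerically. The sketch below (an illustration, not the report's computation) regresses a pure-noise Y on nested sets of random predictors: because the models are nested, the percent reduction can only rise as predictors are added, and it is forced to 100% when the parameter count reaches the number of observations.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
X = rng.normal(size=(n, 9))   # 9 candidate predictors, unrelated to Y
y = rng.normal(size=n)        # pure-noise response

def pct_reduction(X, y):
    """Percent reduction in SSY from a least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ssy_obs = np.sum((y - y.mean()) ** 2)
    return 100.0 * (1.0 - np.sum(resid ** 2) / ssy_obs)

# Percent reduction using the first p predictors, p = 1..9.
reductions = [pct_reduction(X[:, :p], y) for p in range(1, 10)]
# With 9 predictors plus an intercept and only 10 observations the fit
# is exact, so the "reduction" approaches 100% even though Y is noise.
```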



The reduction of the sum of squares is a measure of the mathematical
association between variables, and is not necessarily the measure of a
physical relationship. Where the independent variable has physical
meaningfulness in the problem, however, it is not unreasonable to infer
that the strength of the mathematical relation is also a measure of the
strength of the physical relation.



We return to the output for this example, as given in table B. It
is interesting to note the wide range in the percentage of the sum of
squares of Y accounted for by the various combinations of Xs. In pre-
computer days, when only a few variables could be handled feasibly, it is
not surprising that different researchers, using different sets of variables,
might well have made different inferences, depending upon the particular
two or three variables that happened to be used.



The material in table B can be conveniently arranged as in table
C, which lists only the three strongest combinations for each computer
loop. There seems to be no doubt in the present subset of data that
mean particle size is by far the strongest single variable in
slope response, even though mean grain size is itself influenced by the
process elements. For Xs taken two at a time, it is noteworthy that mean
grain size (X1) is consistently an element of each pair, and that the
combination of mean grain size and wave height (X4) is the strongest pair.






