Multicollinearity and Ridge Regression 



Multicollinearity results when one independent variable is highly correlated with another,
or with a linear combination of other independent variables. While the model is still an
unbiased estimator, the effect of multicollinearity is to produce highly imprecise estimates
of each regression coefficient (Kmenta 1971). This high degree of imprecision is due to the
multicollinearity between independent variables causing the correlation matrix to approach
singularity (Farrar and Glauber 1967). One possible effect is that some of the regression
coefficients may have signs opposite to what was expected (Hoerl and Kennard 1970a, 1970b).



One method used for reducing multicollinearity effects is ridge regression. This sacrifices
unbiasedness to obtain estimates that, when compared to their unbiased least squares counter-
parts, are often more interpretable and have a smaller mean square error. In terms of solving
for the standardized regression coefficients,¹ the method consists of adding a small constant
value, K, to the diagonal elements of the correlation matrix and then solving in the usual
manner for the regression coefficients. When K is zero, the ordinary least squares regression
estimates result. While K can be any positive value, it usually lies between zero and one.
Also, the larger the value of K, the larger the bias. Details of ridge regression can be
found in Hoerl and Kennard (1970a, 1970b), Brown and Beattie (1975), Marquardt and Snee (1975),
or Hocking (1976).
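
A minimal sketch of this computation in Python follows. It is not the program used in the
study; the function name, use of NumPy, and data layout are assumptions made only to
illustrate adding K to the diagonal of the correlation matrix before solving.

```python
import numpy as np

def ridge_coefficients(X, y, k):
    """Standardized ridge regression coefficients for a given ridge constant k.

    X : (n, p) array of independent variables
    y : (n,) array of the dependent variable
    k : constant added to the diagonal elements of the correlation matrix
    """
    # Standardize the variables so that X'X/(n-1) is the correlation matrix.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    ys = (y - y.mean()) / y.std(ddof=1)

    n = X.shape[0]
    R = (Xs.T @ Xs) / (n - 1)   # correlation matrix of the independent variables
    r = (Xs.T @ ys) / (n - 1)   # correlations of each independent variable with y

    # Ridge solution: add k to the diagonal and solve in the usual manner.
    # k = 0 reproduces the ordinary least squares estimates.
    return np.linalg.solve(R + k * np.eye(R.shape[1]), r)
```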



Numerous methods have been proposed for determining which value of K is "best"; Hocking
(1976) summarizes these methods. One technique common to many of these methods is the
development of a ridge trace, which was first proposed by Hoerl and Kennard (1970a, 1970b).
The ridge trace is usually a plot of the resulting standardized regression coefficients over
a range of K values, usually from zero to one. Using the ridge trace, Hoerl and Kennard
(1970a) suggested four items to consider when deciding upon a value of K:



1. At a certain value of K, the system will stabilize and have the general characteristics 

 of an orthogonal system. 



2. Coefficients will not have unreasonable absolute values with respect to the factors
 for which they represent rates of change.



3. Coefficients with apparently incorrect signs at K = 0 will have changed to have the
 proper sign.



4. The residual sum of squares will not have been inflated to an unreasonable value. It 

 will not be large relative to the minimum residual sum of squares or large relative to what 

 would be a reasonable variance for the process generating the data. 



Because of the high degree of multicollinearity in the data and the unreasonable signs on
the coefficients of some of the independent variables, I decided to try ridge regression to
determine if it could improve the estimates of the regression coefficients. I wrote a program
to compute regular and standardized regression coefficients for K values ranging from zero to
0.95 in intervals of 0.05. These values could then be plotted to produce the ridge trace.
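
A sketch of how such a trace could be produced, using the ridge_coefficients function
sketched above, is shown below. Again, this is an illustrative reconstruction rather than
the original program, and the choice of plotting library is an assumption.

```python
import numpy as np
import matplotlib.pyplot as plt

def ridge_trace(X, y, k_values=np.arange(0.0, 1.0, 0.05)):
    """Plot standardized ridge coefficients against K (the ridge trace).

    The default grid runs from zero to 0.95 in intervals of 0.05.
    """
    coefs = np.array([ridge_coefficients(X, y, k) for k in k_values])

    for j in range(coefs.shape[1]):
        plt.plot(k_values, coefs[:, j], label=f"b*{j + 1}")
    plt.axhline(0.0, color="gray", linewidth=0.5)
    plt.xlabel("K")
    plt.ylabel("standardized regression coefficient")
    plt.legend()
    plt.show()
```

The value of K is then chosen by inspecting where the plotted coefficients stabilize,
following the four items listed above.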



¹Standardized regression coefficients, b_i*, and normal regression coefficients, b_i, are
related in the following manner:

    b_i* = b_i * sqrt(s_i / s_y)

where

    s_y = the variance of the dependent variable,
    s_i = the variance of the ith independent variable.
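
As a small numerical illustration of this relationship (the values are made up for the
example and are not from the study's data):

```python
import numpy as np

# Hypothetical values: coefficient 2.0, variance of the independent
# variable 4.0, variance of the dependent variable 16.0.
b_i, s_i, s_y = 2.0, 4.0, 16.0

b_i_star = b_i * np.sqrt(s_i / s_y)       # standardized coefficient: 1.0
b_i_back = b_i_star * np.sqrt(s_y / s_i)  # recovers the original 2.0
```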






