could be printed as an additional column and row in the 

 table. Hence, a model that utilized the row and column 

 sampling errors was included in this study. 



The models investigated were: 



$SE(T_{ij}) = SE(T_{..})(T_{..}/T_{ij})^{.5}$ (1)



$SE(T_{ij}) = (b_0 + b_1(1/T_{ij}))^{.5}$ (2)



$SE(T_{ij}) = b_0 + b_1(1/T_{ij})^{.5}$ (3)



$SE(T_{ij}) = b_1 T_{ij}^{b_2} T_{i.}^{b_3} T_{.j}^{b_4}$ (4)



$SE(T_{ij}) = b_0 + b_1 SE(T_{i.})(T_{i.}/T_{ij})^{.5} + b_2 SE(T_{.j})(T_{.j}/T_{ij})^{.5}$ (5)



Model 1 is used by some FIA projects (Hines and Vissage, 1988) and has no unknown parameters that can be estimated from the data in the table. It assumes that the sampling error is inversely proportional to the square root of the cell value. It is also a special case of (3) with $b_0 = 0.0$ and $b_1 = SE(T_{..})(T_{..})^{.5}$.
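The relationship between Model 1 and Model 3 can be checked numerically. The following is a minimal sketch, assuming sampling errors are expressed in percent; the function names and the table values are hypothetical and chosen only for illustration.

```python
import math

def se_model1(se_total, t_total, t_cell):
    """Model 1: SE(T_ij) = SE(T..) * (T.. / T_ij) ** 0.5."""
    return se_total * math.sqrt(t_total / t_cell)

def se_model3(b0, b1, t_cell):
    """Model 3: SE(T_ij) = b0 + b1 * (1 / T_ij) ** 0.5."""
    return b0 + b1 * math.sqrt(1.0 / t_cell)

# Hypothetical table: grand total of 10,000 units with a 2 percent
# sampling error, and one cell holding 400 units.
se_total, t_total, t_cell = 2.0, 10000.0, 400.0

m1 = se_model1(se_total, t_total, t_cell)
# Model 3 with b0 = 0 and b1 = SE(T..) * sqrt(T..) reproduces Model 1.
m3 = se_model3(0.0, se_total * math.sqrt(t_total), t_cell)
print(m1, m3)
```

Both calls give a cell sampling error of about 10 percent, five times that of the grand total, because the cell holds one twenty-fifth of the total.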



Model 2 has been used, in a related form, in large surveys such as the Current Population Survey and the National Health Interview Survey (Valliant, 1987). Model 3 is the general form of (1). It allows for an intercept and lets the data determine the slope coefficient rather than forcing $b_1$ to equal a specific value.
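Because Model 3 is linear in $(1/T_{ij})^{.5}$, its two coefficients can be estimated by simple linear regression. The sketch below assumes ordinary least squares, which the text does not state explicitly; `fit_model3` and the data are hypothetical.

```python
import math

def fit_model3(cell_totals, cell_ses):
    """Least-squares fit of SE(T_ij) = b0 + b1 * (1 / T_ij) ** 0.5.

    Returns (b0, b1). Assumes at least two distinct cell totals.
    """
    x = [1.0 / math.sqrt(t) for t in cell_totals]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(cell_ses) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, cell_ses))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical cells generated exactly from b0 = 1 and b1 = 200.
totals = [100.0, 400.0, 2500.0, 10000.0]
ses = [21.0, 11.0, 5.0, 3.0]
b0, b1 = fit_model3(totals, ses)
print(round(b0, 6), round(b1, 6))
```

With these made-up cells the fit recovers an intercept near 1 and a slope near 200, whereas Model 1 would fix the intercept at zero and take the slope from the grand total.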



Model 4 was derived to account for nonlinear relationships between the variables $T_{ij}$, $T_{i.}$, and $T_{.j}$ and the dependent variable $SE(T_{ij})$. Replacing $b_2$ with $-.5$ and eliminating $T_{i.}$ and $T_{.j}$, the model becomes:

$SE(T_{ij}) = b_1 T_{ij}^{-.5}$ (6)

and is identical to Model 3 if $b_0$ is equal to zero.
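The reduction of Model 4 to equation (6) can be illustrated with a short sketch. It assumes that "eliminating" the row and column totals means setting their exponents to zero; the function names and numbers are hypothetical.

```python
import math

def se_model4(b1, b2, b3, b4, t_cell, t_row, t_col):
    """Model 4: SE(T_ij) = b1 * T_ij**b2 * T_i.**b3 * T_.j**b4."""
    return b1 * t_cell ** b2 * t_row ** b3 * t_col ** b4

def se_eq6(b1, t_cell):
    """Equation (6): SE(T_ij) = b1 * T_ij ** -0.5."""
    return b1 * t_cell ** -0.5

# Hypothetical cell, row, and column totals.
t_cell, t_row, t_col = 400.0, 5000.0, 8000.0

full = se_model4(200.0, -0.5, 0.0, 0.0, t_cell, t_row, t_col)
reduced = se_eq6(200.0, t_cell)
print(full, reduced)
```

With zero exponents the row and column totals drop out, so both expressions return the same value regardless of the marginal totals supplied.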



Model 5 was derived to incorporate as much information as 

 possible about the variability of the cell sampling errors 

 without printing all of the sampling errors. It uses 

 information from the sum of the rows and columns as well 

 as the marginal sampling errors. The sampling errors of the 

 row and column totals must be published for this model to 

 be used. 
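As an illustration of how a table user might apply Model 5, the sketch below predicts a cell sampling error from published marginal totals and their sampling errors. The coefficient values are hypothetical, and the pairing of $b_1$ with the row margin and $b_2$ with the column margin is an assumption.

```python
import math

def se_model5(b0, b1, b2, t_cell, t_row, se_row, t_col, se_col):
    """Model 5: intercept plus scaled row-margin and column-margin terms."""
    row_term = se_row * math.sqrt(t_row / t_cell)  # SE(T_i.) * (T_i./T_ij)**0.5
    col_term = se_col * math.sqrt(t_col / t_cell)  # SE(T_.j) * (T_.j/T_ij)**0.5
    return b0 + b1 * row_term + b2 * col_term

# Hypothetical published values: a cell of 100 units in a row totaling 400
# (5 percent error) and a column totaling 900 (4 percent error).
predicted = se_model5(0.0, 0.5, 0.5,
                      t_cell=100.0,
                      t_row=400.0, se_row=5.0,
                      t_col=900.0, se_col=4.0)
print(predicted)
```

Each marginal term has the same form as Model 1, applied to a row or column instead of the grand total, so in this example the prediction is the average of a 10 percent row-based estimate and a 12 percent column-based estimate.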



Tests 



The models were evaluated on a portion of the tables 

 produced by the Northeastern Forest Experiment Station's 

 FIA unit for the 1989 Kentucky inventory. Fifteen tables 

 were divided into two types: those that contained 

 continuous variables such as board-foot and cubic-foot 

 volumes and those that were from multinomial distributions. 

An example of a multinomial variable is forest ownership in acres. Each plot either falls in a given ownership category or it does not. The weighted proportion of plot occurrences in a category is multiplied by the total acreage in that county (assumed not to have error). Although acreage is usually considered a continuous variable, the total acreage is treated here as a constant. Other area tables in this study were derived in a similar manner.



The coefficient of determination ($R^2$) was the measure of performance. For Model 1 it was calculated as if it were a linear regression model; that is:

$y_i = SE(T_{..})(T_{..}/T_{ij})^{.5}$

$R^2 = 1 - \mathrm{var}(SE(T_{ij}) - y_i)/\mathrm{var}(SE(T_{ij}))$.
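The Model 1 calculation can be sketched as follows, assuming population variances are used in both numerator and denominator (the ratio is the same with sample variances). The function name and data are hypothetical.

```python
import math
from statistics import pvariance

def r2_model1(cell_totals, cell_ses, t_total, se_total):
    """R^2 = 1 - var(SE(T_ij) - y_i) / var(SE(T_ij)) for Model 1 predictions y_i."""
    preds = [se_total * math.sqrt(t_total / t) for t in cell_totals]
    residuals = [s - y for s, y in zip(cell_ses, preds)]
    return 1.0 - pvariance(residuals) / pvariance(cell_ses)

# Hypothetical table: grand total 10,000 with a 2 percent sampling error.
totals = [100.0, 400.0, 2500.0]

# Observed errors that track the predictions (20, 10, 4) up to a constant shift.
r2_good = r2_model1(totals, [21.0, 11.0, 5.0], 10000.0, 2.0)

# Observed errors that barely vary while the predictions vary a great deal.
r2_poor = r2_model1(totals, [10.0, 12.0, 11.0], 10000.0, 2.0)
print(r2_good, r2_poor)
```

Because the variance of the residuals is taken about their mean, a constant offset between observed and predicted errors is not penalized. The statistic is not bounded below by zero: when the residuals vary more than the observed sampling errors themselves, as in the second case, it is negative.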



Similarly, the $R^2$ values for the nonlinear models were:

$R_i^2 = 1 - \text{SS corrected}/\text{MSE}$, $i = 2, 4$



where SS is the sum of squares and MSE is the mean 

 square error. 



Sampling errors greater than or equal to 100 percent were 

 eliminated from the regressions since they represent cells 

 with a single non-zero observation. However, we believe 

 that the models can be applied to these cells, resulting in 

 an improved estimate of the variance. 
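The screening step can be sketched as a simple filter applied before fitting. The data layout (pairs of cell total and percent sampling error) and the function name are assumptions made for illustration.

```python
def split_cells(cells, cutoff=100.0):
    """Separate cells used in the regressions from those set aside.

    `cells` is a list of (cell_total, sampling_error_percent) pairs.
    Cells at or above the cutoff are excluded from fitting but kept,
    so that a fitted model can still be applied to them afterward.
    """
    kept = [c for c in cells if c[1] < cutoff]
    set_aside = [c for c in cells if c[1] >= cutoff]
    return kept, set_aside

# Hypothetical cells; the last two have sampling errors of 100 percent or more.
cells = [(2500.0, 5.0), (400.0, 11.0), (12.0, 100.0), (3.0, 141.0)]
kept, set_aside = split_cells(cells)
print(len(kept), len(set_aside))
```

Keeping the excluded cells in a separate list reflects the suggestion that the fitted models may still be applied to them.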



Results 



The proportion of variation explained, $R^2$, for each of the five models is shown in Table 1. Model 1 consistently explained less variation than the other models, and for the continuous tables it actually introduced more error than it explained. For those tables, Table 1 shows $R^2$ values of zero.



Model 2 performed well with the area tables, producing $R^2$ values between 0.81 and 0.99. This is the only model for which a theoretical basis has been shown to exist for the two-stage sampling used in population surveys (Valliant, 1987).

 Valliant also shows how this model should do well for 

 continuous variables exhibiting gamma distribution of X(b,1) 

 where b and 1 are population parameters, so long as the 

 variables meet certain conditions. One condition is that the 

 second parameter must be constant across all strata in the 

 population, an improbability with tables composed of two or 

 more sampling intensities. 



Consequently, Model 2 did not perform well with the other variables. The $R^2$ values dropped to between 0.49 and 0.84 for the board-foot, cubic-foot, and number-of-stems tables.



With one exception, Model 3 performed as well as or better 

 than Model 2 and equaled Model 4 for several of the tables. 

 Its lowest $R^2$ was 0.87 for the area tables, but Model 4 was

 considerably better for tables involving number of stems 

 per acre. 



The range of $R^2$ values for the area tables under Model 4 was 0.90 to 0.99. It dropped to 0.67 to 0.92 for the continuous tables.

 The overall performance of Model 4 was second only to 






