Because the variance of the p's is critical to the shape of the CFD, consider the variance
of p's computed from three sources in the experiment outlined above: 1) the true
data, 2) a krig estimate based on a sample from the true data, and 3) conditionally 
simulated data based on a krig estimate of 2). To enhance our understanding of this 
comparison, the variance of the p's is discussed for two cases for each source. The
first case assumes complete independence in the base simulation and does not use 
interpolation to estimate proportion of area out of compliance. This simplification 
allows us to easily infer the behavior of the CFD using analytical methods. The 
second case introduces an unknown spatial dependence in the base simulation and 
uses interpolated data to estimate the proportion of area out of compliance. These 
additional complexities make it difficult to implement analytical inference but 
conclusions may still be inferred by analogy to the simple independent case. 
Consider the sequence of sources where the base simulations are generated under the
simple constraints of constant mean, constant variance, and independent errors for each
cell of the grid. For this case the exceedance probability is:
$p = 1 - \Phi\left((x_s - \mu - C)/\sigma\right)$
where: $C$ is the criterion threshold,
$x_s$ is the data at location $s$,
$\mu$ is the mean used in the simulation,
$\sigma^2$ is the variance used in the simulation, and
$\Phi$ is the standard normal Cumulative Distribution Function.
The distribution of the true p's computed from all 300 cells of the 5x60 simulation
grid would behave like that of an independent binomial with N = 300 and variance
(p(1-p)/300). From these independent data, draw a sample of size 40. Using only
the proportion of the sample that is out of compliance to estimate the p's, the distribution
of the p's would be that of an independent binomial with N = 40 and variance
(p(1-p)/40). Clearly the p's estimated from the sample of 40 have much larger variance
(by a factor of 300/40 = 7.5) than p's from the base simulation with 300 cells. Thus the true CFD computed
using data from 300 cells will be steeper than the sample CFD computed from 40 
data points. This pattern is illustrated by comparing the true CFD (line 1) and the 
estimated CFD (curve 2) in Figure 5.6. This increase in the variance of the p's due to 
small sample size is the kernel of the sample size problem with the CFD. Now 
consider the behavior of p's computed from conditional simulations based on the 
sample. Compute $\bar{x}$ and $s$ as estimates of $\mu$ and $\sigma$ from the sample of 40 in the usual
way. The conditional simulation is done by populating the 5x60 grid with data from
a normal distribution with mean $\bar{x}_i$ and variance $s_i^2$. The exceedance probability for
these simulated data for the i-th month is
$p'_i = 1 - \Phi\left((xs_s - \bar{x}_i - C)/s_i\right)$
where: $xs_s$ is the simulated data at location $s$,
$\bar{x}_i$ is the estimated mean used in the conditional simulation, and
$s_i$ is the estimated standard deviation used in the conditional simulation.
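
The sample size effect described for the independent case can be checked numerically. The following is a minimal sketch, not the procedure used in this study: it assumes a hypothetical exceedance probability of 0.3, treats each cell as an independent trial, and compares the empirical variance of the out-of-compliance proportion for 300 cells and for 40 cells with the binomial values p(1-p)/300 and p(1-p)/40. The function name, the number of repetitions, and the random seed are illustrative choices only.

```python
import random
import statistics


def proportion_variance(p, n, trials, rng):
    """Empirical variance of the out-of-compliance proportion.

    Each trial draws n independent cells, each exceeding the criterion
    with probability p, and records the fraction that exceed.
    """
    proportions = []
    for _ in range(trials):
        exceed = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(exceed / n)
    return statistics.pvariance(proportions)


rng = random.Random(0)   # fixed seed so the sketch is repeatable
p = 0.3                  # hypothetical exceedance probability (assumption)
trials = 2000            # hypothetical number of repetitions (assumption)

var_300 = proportion_variance(p, 300, trials, rng)  # all 300 cells of the grid
var_40 = proportion_variance(p, 40, trials, rng)    # sample of 40 cells

print("N=300: empirical", var_300, "binomial", p * (1 - p) / 300)
print("N=40:  empirical", var_40, "binomial", p * (1 - p) / 40)
```

Under these assumptions the variance for the sample of 40 should come out close to 7.5 times the variance for 300 cells, which is the flattening of the sample CFD relative to the true CFD described above.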
If the p' were constant over months, the variance of the p's estimated by conditional 
simulation would be (p'(1-p')/300). The sample size component of this variance has
been standardized to 300 which is the same as the sample size component of the true 
p’s, but the variability of conditionally simulated p's will be greater than that of true 
appendix a 
The Cumulative Frequency Diagram Method for Determining Water Quality Attainment 
