tainty described for step 3 propagates into computing the percent of compliance for 
a segment. In addition, the estimated values for interpolator cells have a
complex dependence structure that rules out a simple binomial model, so the rules
governing the uncertainty of this step are also complex. The number of interpolator
cells, N, is relatively constant, and under an independent binomial model the variance
of the proportion of cells not in compliance, p, would be p(1 - p)/N. Intuitively, one
expects the variance of p to decrease as the number of data points feeding the
interpolation increases. This expectation has been confirmed by simulation, but the
mathematical tools for modeling this propagation of error are yet to be developed.
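As a point of reference only, the independent binomial variance p(1 - p)/N mentioned above can be evaluated directly. The sketch below is a baseline that ignores the dependence among interpolator cells, so it should not be read as the variance of this step; the function name and the example values (p = 0.2, N = 400) are hypothetical.

```python
import math


def binomial_variance(p, n_cells):
    """Variance of a proportion under an independent binomial model: p(1 - p)/N.

    This ignores the dependence among interpolator cells, so it is only a
    naive baseline, not the variance of the interpolated proportion.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return p * (1.0 - p) / n_cells


# Hypothetical example: 20% of 400 cells out of compliance.
variance = binomial_variance(0.2, 400)
standard_error = math.sqrt(variance)
print(f"variance = {variance:.6f}, standard error = {standard_error:.4f}")
```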
Step 5 - Percent of Time. While the percent of space coordinate of the CFD has a
simple interpretation as the percent of the segment out of compliance on a given
date, the percent of time coordinate is not simply the percent of time out of
compliance at a given point. Instead, the percent of time coordinate has an interpretation
similar to that of a cumulative distribution function. The percent of time coordinate
is the percent of time that the associated spatial percent of noncompliance is
exceeded. For example, if the (percent space, percent time) coordinates for a point
on the CFD are (90,10), one would say that the spatial percent of noncompliance is
greater than or equal to 90% about 10% of the time.
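The interpretation above can be sketched in a few lines. The example below is a minimal illustration, assuming the simplest convention in which the percent of time is the share of sample dates whose spatial percent of noncompliance is greater than or equal to a given value; the function name and the ten sample values are hypothetical.

```python
def cfd_points(percent_space):
    """Return (percent space, percent time) pairs for a list of per-date values.

    Percent time is the share of dates on which the spatial percent of
    noncompliance is greater than or equal to the given value (an assumed
    R/N-style convention, chosen here for simplicity).
    """
    n_dates = len(percent_space)
    if n_dates == 0:
        raise ValueError("at least one sample date is required")
    points = []
    for value in sorted(set(percent_space), reverse=True):
        n_at_or_above = sum(1 for v in percent_space if v >= value)
        points.append((value, 100.0 * n_at_or_above / n_dates))
    return points


# Hypothetical spatial percents of noncompliance on ten sample dates.
sample = [90, 50, 30, 20, 10, 5, 0, 0, 0, 0]
for space, time in cfd_points(sample):
    print(f"({space}, {time:.0f})")
```

With these values the first point is (90, 10): the spatial percent of noncompliance is at least 90% on one date in ten.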
This step is very similar to computing an empirical distribution function, which is an
estimator of a cumulative distribution function. Because of this similarity, one
immediately thinks of statistical inference tools associated with empirical distribution
functions, such as the Kolmogorov-Smirnov, Shapiro-Wilk, Anderson-Darling, or
Cramer-von Mises procedures, as candidates for inference about the CFD. These procedures
model uncertainty as a function of sample size only; in this case, the number of
sample dates. That they do not incorporate the uncertainty discussed in the
previous steps seems unsatisfactory.
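To illustrate what "a function of sample size only" means, the sketch below evaluates the standard large-sample approximation to the two-sided Kolmogorov-Smirnov critical value, sqrt(-ln(alpha/2)/2) / sqrt(n). It assumes independent sample dates and is offered only as an illustration, not as a method endorsed here; the function name and the sample sizes are hypothetical.

```python
import math


def ks_band_half_width(n_dates, alpha=0.05):
    """Approximate two-sided Kolmogorov-Smirnov critical value for n_dates.

    Uses the large-sample approximation sqrt(-0.5 * ln(alpha / 2)) / sqrt(n).
    The result depends only on the number of sample dates and alpha; it
    carries no information about sampling, interpolation, or spatial error.
    """
    if n_dates <= 0:
        raise ValueError("n_dates must be positive")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n_dates)


# Hypothetical numbers of sample dates.
for n in (12, 36, 100):
    print(n, round(ks_band_half_width(n), 3))
```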
A quick review of probability plotting will reveal several methods for estimating the
percent of time coordinate in step 5. Formulae found in the literature include R/N,
(R - 0.5)/(N - 1), and (R - 0.375)/(N + 0.5), where R is rank and N is sample size.
These generally fall into a family given by (R - A)/(N - 2A + 1) for various values
of A. They are approximately equal, but one choice should be fixed for a rule.
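The family (R - A)/(N - 2A + 1) is simple to evaluate, which makes it easy to see how little the choice of A matters for moderate N. The sketch below is illustrative only; the function name, the sample size of 10, and the particular values of A are assumptions made for the example.

```python
def plotting_position(rank, n, a=0.0):
    """Plotting position (R - A)/(N - 2A + 1) for rank R in a sample of size N."""
    if not 1 <= rank <= n:
        raise ValueError("rank must satisfy 1 <= rank <= n")
    denominator = n - 2.0 * a + 1.0
    if denominator <= 0:
        raise ValueError("n - 2a + 1 must be positive")
    return (rank - a) / denominator


# Compare a few illustrative values of A for N = 10 sample dates.
n_dates = 10
for a in (0.0, 0.375, 0.5):
    positions = [plotting_position(r, n_dates, a) for r in range(1, n_dates + 1)]
    print(a, [round(100.0 * p, 1) for p in positions])
```

For example, A = 0 gives R/(N + 1) and A = 0.5 gives (R - 0.5)/N; the resulting percent of time coordinates differ by a few percentage points at most for this sample size.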
Step 6 - Plotting the CFD. Even the plotting of the points is subject to variation, 
although these variations are somewhat minor compared to the larger issue of 
assessing the uncertainty of the assessment curve. The simple approach used in the 
figures above is to connect the points by line segments. In the statistical literature, it
is more common to use a step function. If the graph represents an empirical
distribution function, each horizontal line segment is closed on the left and open on the
right. Because the CFD is an inversion of an EDF, it would be appropriate for these
line segments to be closed on the right and open on the left.
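The difference between the two step-function conventions can be checked numerically. The sketch below is a minimal illustration, assuming the EDF counts values less than or equal to x and the CFD counts values greater than or equal to x; the function names and the four sample values are hypothetical.

```python
def edf(values, x):
    """Empirical distribution function: fraction of values <= x."""
    return sum(1 for v in values if v <= x) / len(values)


def cfd_time_fraction(values, x):
    """CFD-style step function: fraction of values >= x."""
    return sum(1 for v in values if v >= x) / len(values)


# Hypothetical spatial percents of noncompliance on four sample dates.
values = [10, 30, 50, 90]
eps = 1e-9

# EDF: the value at a data point matches the segment to its right,
# so each horizontal segment is closed on the left, open on the right.
print(edf(values, 30 - eps), edf(values, 30), edf(values, 30 + eps))

# CFD: the value at a data point matches the segment to its left,
# so each horizontal segment is closed on the right, open on the left.
print(cfd_time_fraction(values, 30 - eps),
      cfd_time_fraction(values, 30),
      cfd_time_fraction(values, 30 + eps))
```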
Step 7 - Comparing the Curves. It is at the point of comparing the assessment 
curve to the reference curve that the issue of uncertainty becomes most important. 
From the preceding discussion it is clear that uncertainty in the assessment curve is 
an accumulation of uncertainty generated in and propagated through the preceding six
steps. If the reference curve is biologically based, it is derived under the same system 
appendix a 
The Cumulative Frequency Diagram Method for Determining Water Quality Attainment 
