In each simulation run, a subset of the reference good sites for each habitat was 
selected at random, and the B-IBI threshold for this subset was determined (i.e., the 
IBI score at the 5th percentile, or the maximum score for the reference degraded 
samples). The scores of the assessment data for each habitat were then compared to 
the threshold to estimate the proportion of sites below the threshold. By repeating 
this process 5,000 times, we were able to estimate the variance 
in the proportion of sites below the threshold from the bootstrap estimates. This 
variance reflects variability in the thresholds as well as sampling variability in the 
assessment data. 
In the final step of the method, segments were declared impaired if the proportion of 
sites below the threshold (i.e., degraded area) was significantly higher than expected 
under the null hypothesis. Under the null hypothesis, a small number of sites 
(defined as 5% of the sites) would be expected to have low IBI scores even if all sites 
in a segment were in good condition (i.e., no low dissolved oxygen, contaminant, or 
nutrient enrichment problems). This is because of natural variability in the benthic 
communities, the effects of natural stressors, and sampling and methodological error. 
For a segment to be declared as impaired, the lower bound of the 95% confidence 
interval of the estimate had to be higher than 5% (the expected proportion under the 
null hypothesis), with a minimum sample size of 10. A 5% level was used in 
agreement with standard statistical practice. 
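The decision rule in this step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `is_impaired` and its parameters are hypothetical, and the interval is built as a normal approximation (mean of the bootstrap estimates minus 1.96 times their standard deviation), which the text does not specify.

```python
from statistics import mean, stdev

def is_impaired(p_boot, n_sites, p0=0.05, min_n=10, z=1.96):
    """Hypothetical sketch of the impairment decision for one segment.

    p_boot  : bootstrap estimates of the proportion of sites below threshold
    n_sites : number of assessment samples in the segment
    p0      : proportion expected under the null hypothesis (5%)
    min_n   : minimum sample size required to declare impairment (10)
    z       : ASSUMPTION: normal quantile for a two-sided 95% interval
    """
    if n_sites < min_n:
        # Too few samples: the segment cannot be declared impaired.
        return False
    p_hat = mean(p_boot)      # estimated degraded proportion
    se = stdev(p_boot)        # bootstrap standard error (assumed form)
    lower_bound = p_hat - z * se
    # Impaired only if the lower confidence bound exceeds p0.
    return lower_bound > p0
```

For example, bootstrap estimates clustered near 0.30 from a segment with 12 samples would give a lower bound well above 0.05, whereas the same estimates from a segment with only 8 samples would not qualify.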
The steps described above are summarized below and in Appendix A: 
1. Thresholds are set for each of seven benthic habitats in Chesapeake Bay. 
2. The threshold is set as the smaller of two values: the 5th percentile IBI score for 
the good reference sites or the maximum observed IBI score for the degraded 
reference sites. 
3. The 5th percentile score and its variance are estimated by bootstrap simulations. 
4. For each iteration of the bootstrap simulation, a subset (of the same sample size) 
of the good reference sites for each habitat is selected at random (with 
replacement), and the 5th percentile score is determined. 
5. At each iteration, the threshold is set according to #2. 
6. At each iteration, the assessment data are compared to the reference data to 
estimate the proportion of sites (P) with scores below the threshold. This is 
done for each of one or more habitats within a segment. 
7. P is averaged over all the iterations. 
8. Under the null hypothesis, 5% of the sites (Po) would be expected to have low 
IBI scores, even if all sites in a segment were in good condition. 
9. Segments are declared impaired if P - Po > 0 (greater than expected under the 
null hypothesis, with 95% confidence) (see Schenker and Gentleman 2001). 
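Steps 2 through 6 can be sketched for a single habitat as follows. This is a minimal illustration, not the authors' implementation: the function name and arguments are hypothetical, the percentile interpolation method is an assumption, "below the threshold" is taken as strictly less than, and only the reference sites are resampled (how sampling variability in the assessment data enters the variance is not detailed in the steps above).

```python
import random
from statistics import quantiles

def bootstrap_degraded_proportion(good_ref, degraded_ref, assessment,
                                  n_iter=5000, seed=0):
    """Return one estimate of P per bootstrap iteration (single habitat).

    good_ref     : IBI scores of the good reference sites
    degraded_ref : IBI scores of the degraded reference sites
    assessment   : IBI scores of the assessment sites
    """
    rng = random.Random(seed)
    max_degraded = max(degraded_ref)
    p_boot = []
    for _ in range(n_iter):
        # Step 4: resample good reference sites with replacement, same size.
        sample = rng.choices(good_ref, k=len(good_ref))
        # 5th percentile = first of 20 quantile cut points.
        # ASSUMPTION: "inclusive" interpolation; the source does not say.
        pct5 = quantiles(sample, n=20, method="inclusive")[0]
        # Steps 2 and 5: threshold is the smaller of the two values.
        threshold = min(pct5, max_degraded)
        # Step 6: proportion of assessment sites scoring below the threshold.
        below = sum(score < threshold for score in assessment)
        p_boot.append(below / len(assessment))
    return p_boot
```

Averaging the returned list gives P as in step 7, and its spread gives a bootstrap estimate of the variability in P. A segment spanning several habitats would need this repeated per habitat and the results combined, which the sketch leaves out.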
3.3. WILCOXON 
A stratified Wilcoxon rank sum test was applied as described in Llanso et al. (2003) 
using Proc-StatXact 5 software (Cytel Software Corporation 2002). B-IBI scores 
were grouped into three ordered condition categories (1.0-2.0, 2.1-2.9, 3.0-5.0) and 
appendix k 
2006 303(d) Assessment Methods for Chesapeake Bay Benthos 
