affecting the outputs, resulting in too many areas being characterized as “good” when they were clearly 
unsatisfactory. In modifications applied to the indicator calculations from 1998-2000, the benchmark and 
status data sets (3-year windows) were log-transformed prior to analysis to address data skewness 
that compromised comparisons of the data distributions. It was noted “that for water quality parameters the log and square root transformations are about equal 
in effecting a normal distribution of the data, and more effective than inverse transformations or 
using untransformed data” (Olson 2009). 
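The comparison of transformations described above can be illustrated by measuring sample skewness (values near 0 indicate a more symmetric, closer-to-normal distribution) before and after each transformation. This is a minimal sketch on synthetic lognormal data, not the Olson (2009) data set; the seed, distribution parameters, and sample size are arbitrary assumptions.

```python
import numpy as np
from scipy.stats import skew

# Synthetic, right-skewed "concentration" data (lognormal).
# Illustrative assumption only; this is NOT the Olson (2009) data.
rng = np.random.default_rng(42)
values = rng.lognormal(mean=1.0, sigma=1.0, size=5000)

# Transformations named in the quoted passage
transforms = {
    "untransformed": values,
    "log": np.log(values),
    "square root": np.sqrt(values),
    "inverse": 1.0 / values,
}

# Sample skewness of each transformed series; closer to 0 means more symmetric
skewness = {name: float(skew(x)) for name, x in transforms.items()}
for name, s in skewness.items():
    print(f"{name:>13}: skewness = {s:6.2f}")
```

Because the synthetic data are exactly lognormal, the log transform removes nearly all skewness here; real water quality data are not exactly lognormal, which is why the log and square root transformations can perform about equally in practice.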
U.S. EPA (2007b) extended the published analyses of Harding (1994) and Harding and Perry 
(1997), modeling historical chlorophyll a data using a Generalized Linear Model (GLM) for log10 
(chlorophyll a). In deriving reference chlorophyll a criteria thresholds for Chesapeake Bay, 
EPA recommended that the thresholds be derived from a model of the desired mean level of 
chlorophyll a in log space (U.S. EPA 2007b, page 17). Tables III-2 and III-3 (U.S. EPA 2007b, 
page 18) present the reference condition recommendations both as log-transformed mean 
chlorophyll a and as back-transformed means. The recommendations for harmful algal bloom-based 
chlorophyll a criteria in the tidal fresh and oligohaline waters of Chesapeake Bay likewise 
relied on log-transformed chlorophyll a analyses (U.S. EPA 2007b). 
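The distinction between a mean computed in log space and its back-transformed value can be shown with a small calculation: averaging log10 values and raising 10 to that average yields the geometric mean, which is never larger than the arithmetic mean of the raw concentrations. The three concentrations below are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical chlorophyll a concentrations (ug/L), for illustration only
chla = np.array([1.0, 10.0, 100.0])

log_mean = np.mean(np.log10(chla))   # mean in log10 space
back_transformed = 10.0 ** log_mean  # back-transformed (geometric) mean
arithmetic = np.mean(chla)           # ordinary arithmetic mean, for contrast

print(log_mean, back_transformed, arithmetic)  # prints 1.0 10.0 37.0
```

The gap between 10 and 37 ug/L in this toy case shows why it matters whether a threshold is stated as a log-space mean or as an arithmetic mean of raw values.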
James River Focused Analyses of Log-transformed Chlorophyll a Data for Normality 
Tidal James River chlorophyll a data (1991-2000, n = 828) were log-transformed using natural 
logarithms. A Generalized Linear Modeling (GLM) approach was used to test the chlorophyll a 
data for normality, and the analysis was performed in SAS (Statistical Analysis System) software. 
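The EPA (2007b) analyses above were carried out on the log10 scale, while this analysis uses natural logarithms. The two scales differ only by the constant factor ln(10) ≈ 2.3026, as this short sketch shows; the function names and the example concentration are illustrative assumptions.

```python
import math

# ln(x) = log10(x) * ln(10); the two log scales differ by this constant factor
LN10 = math.log(10.0)

def log10_to_ln(value_log10: float) -> float:
    """Convert a value on the log10 scale to the natural-log scale (hypothetical helper)."""
    return value_log10 * LN10

def ln_to_log10(value_ln: float) -> float:
    """Convert a value on the natural-log scale to the log10 scale (hypothetical helper)."""
    return value_ln / LN10

chla = 12.5  # hypothetical chlorophyll a concentration, ug/L
print(math.log(chla), log10_to_ln(math.log10(chla)))  # the two values agree
```

Because the conversion is a constant multiplier, model residuals on one scale are simply rescaled on the other, so normality diagnostics such as the Shapiro-Wilk statistic are the same on either scale.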
Seven Chesapeake Bay segments were included in the analysis: Mouth of Chesapeake Bay 
(CB8PH), Mouth to mid-Elizabeth River (ELIPH), Southern Branch Elizabeth River (SBEMH), 
Mouth of the James River (JMSPH), Lower James River (JMSMH), Middle James River 
(JMSOH) and Upper James River (JMSTF). Segments were grouped into one of four groups 
according to the similarity of their variances: 
  SegGrp = 1: "JMSPH" 
  SegGrp = 2: "JMSMH", "SBEMH" 
  SegGrp = 3: "JMSOH", "CB8PH", "ELIPH" 
  SegGrp = 4: "JMSTF" 
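The segment-to-group mapping above is a simple lookup table. One possible use of such variance groups, sketched here as an assumption rather than as the procedure used in this analysis, is to standardize model residuals by the standard deviation of each group to correct for heterogeneous variance. The helper name `standardize_by_group`, the group standard deviations, and the synthetic residuals are all hypothetical.

```python
import numpy as np

# Segment-to-group mapping, as listed above
SEG_GROUP = {
    "JMSPH": 1,
    "JMSMH": 2, "SBEMH": 2,
    "JMSOH": 3, "CB8PH": 3, "ELIPH": 3,
    "JMSTF": 4,
}

def standardize_by_group(residuals, segments):
    """Divide each residual by the sample SD of its variance group (hypothetical helper)."""
    residuals = np.asarray(residuals, dtype=float)
    groups = np.array([SEG_GROUP[s] for s in segments])
    out = np.empty_like(residuals)
    for g in np.unique(groups):
        mask = groups == g
        out[mask] = residuals[mask] / residuals[mask].std(ddof=1)
    return out

# Synthetic residuals with a different (hypothetical) spread in each group
rng = np.random.default_rng(0)
segments = list(SEG_GROUP) * 20                  # 7 segments x 20 observations
group_sd = {1: 0.5, 2: 0.8, 3: 1.0, 4: 1.5}
resid = np.array([rng.normal(0.0, group_sd[SEG_GROUP[s]]) for s in segments])
z = standardize_by_group(resid, segments)        # each group now has SD 1
```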
The GLM model was $\ln(\text{chlorophyll } a) = \text{year} + \text{segment}$ (Equation 1). 
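A model of the form in Equation 1 can be sketched as an ordinary least-squares fit with an intercept and dummy-coded main effects for year and segment, which is how a general linear model with class effects is typically fit. Treating year as a categorical (class) effect is an assumption, since the text does not say whether year entered as a class or continuous term. The data below are synthetic, and the segment offsets are hypothetical; this is not the James River data set or the SAS code used in the analysis.

```python
import numpy as np

def fit_main_effects(y, year, segment):
    """Least-squares fit of y = intercept + year effect + segment effect.
    Returns fitted values and raw residuals."""
    y = np.asarray(y, dtype=float)
    columns = [np.ones_like(y)]  # intercept
    for factor in (np.asarray(year), np.asarray(segment)):
        # Drop-first (reference) dummy coding keeps the design matrix full rank
        for level in np.unique(factor)[1:]:
            columns.append((factor == level).astype(float))
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return fitted, y - fitted

# Synthetic ln(chlorophyll a) data, for illustration only
rng = np.random.default_rng(1)
n = 300
year = rng.integers(1991, 2001, size=n)                   # years 1991-2000
segment = rng.choice(["JMSPH", "JMSMH", "JMSTF"], size=n)
seg_effect = {"JMSPH": 1.5, "JMSMH": 2.0, "JMSTF": 2.8}   # hypothetical offsets
ln_chla = (np.array([seg_effect[s] for s in segment])
           + 0.02 * (year - 1991) + rng.normal(0.0, 0.6, size=n))

fitted, resid = fit_main_effects(ln_chla, year, segment)
```

The raw residuals `resid` are the quantities whose distribution the normality diagnostics examine.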
Data were analyzed by season. Spring was defined as March, April, and May, and summer 
as July, August, and September. Normality diagnostics were reviewed for the raw residuals. 
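The seasonal definitions and the residual normality check can be sketched as follows: months are mapped to seasons exactly as defined above (June and the remaining months fall outside both seasons), and the Shapiro-Wilk test from `scipy.stats.shapiro` is applied to the raw residuals within each season. The helper names are hypothetical, and the residuals here are synthetic normal draws rather than the James River GLM residuals.

```python
import numpy as np
from scipy.stats import shapiro

# Season definitions from the text: spring = Mar-May, summer = Jul-Sep
SEASONS = {3: "spring", 4: "spring", 5: "spring",
           7: "summer", 8: "summer", 9: "summer"}

def season_of(month):
    """Return 'spring' or 'summer', or None for months outside both seasons."""
    return SEASONS.get(int(month))

def shapiro_by_season(residuals, months):
    """Shapiro-Wilk (W, p-value) for the raw residuals in each season (hypothetical helper)."""
    residuals = np.asarray(residuals, dtype=float)
    labels = np.array([season_of(m) for m in months], dtype=object)
    results = {}
    for season in ("spring", "summer"):
        stat, p = shapiro(residuals[labels == season])
        results[season] = (float(stat), float(p))
    return results

# Synthetic residuals and sampling months, for illustration only
rng = np.random.default_rng(7)
months = rng.integers(1, 13, size=600)
resid = rng.normal(0.0, 0.6, size=600)
results = shapiro_by_season(resid, months)
print(results)
```

Values of W close to 1 indicate residuals consistent with a normal distribution, which is the sense in which the seasonal statistics reported below are interpreted.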
For the spring and summer seasons within the tidal James River, even without standardizing for 
heterogeneous variance, the ln(chla) residuals from the GLM fairly closely approximate a 
normal distribution. The normal probability plot shows very high 
concordance between the expected and observed residuals, except for two outlier 
points in the extreme tails of the sample. These outliers probably reflect a failure of the simple 
model to capture some extreme event rather than a failure of log normality. The Shapiro-Wilk 
statistics of 0.994 (spring) and 0.988 (summer) show that the residuals are very highly correlated 
with the values expected under a normal distribution (see Appendix C). The 
