# Metal Mining Technical Guidance for Environmental Effects Monitoring

## Chapter 8

### 8. Data Assessment and Interpretation

**8.2 Understanding the Definition of Effect, and Meaning of Data Interpretation, within EEM**

**8.3 Data Assessment and Interpretation for the Fish Study**

- 8.3.1 Preparing the Analyses
- 8.3.2 Summary Statistics
- 8.3.3 Analysis of Variance (ANOVA) and Analysis of Covariance (ANCOVA)
- 8.3.4 Transformations
- 8.3.5 Level of Replication
- 8.3.6 Effect and Supporting Endpoints
- 8.3.7 Statistical Analysis for Non-lethal Sampling
- 8.3.8 Data Quality Assurance / Quality Control and Analysis (Errors and Outliers)

**8.4 Effects on Usability of Fisheries Resources**

**8.5 Data Assessment and Interpretation for the Benthic Invertebrate Community Study**

- 8.5.1 Study Design and Statistical Procedures
- 8.5.2 Data Treatment
- 8.5.3 Reference Condition Approach
- 8.5.4 Supporting Endpoints

**8.6 The Role of Power Analysis, α, β and Critical Effect Size in Determining Effects**

- 8.6.1 Setting α and β
- 8.6.2 Power Analysis: Determination of Required Sample Size, Power and Appropriate Critical Effect Size

**8.8 Statistical Considerations for Mesocosm Studies**

**Appendix 1: Step-by-Step Guidance through Statistical Procedures**

**Appendix 2: Graphical and Tabular Representation of Data**

**Appendix 3: Case study – ANCOVA and Power Analysis for Fish Survey**

### List of Tables

- Table 8-1: Required fish survey measurements, expected precision and summary statistics
- Table 8-2: Fish survey effect indicators and endpoints for various study designs and the appropriate statistical analyses
- Table 8-3: Supporting endpoints to be used for supporting analyses
- Table 8-4: Summary of effect endpoints analyzed using ANCOVA
- Table 8-5: Fish Tissue effect and supporting endpoints and statistical procedures
- Table 8-6: Statistical procedure used to determine an effect for each of the seven study designs
- Table 8-7: Sample sizes required to detect difference of ± 2 SD

# 8. Data Assessment and Interpretation

## 8.1 Overview

As part of the environmental effects monitoring (EEM) requirements under the *Metal Mining Effluent Regulations* (MMER), after biological monitoring studies are conducted, an interpretative report shall be prepared (MMER, Schedule 5, section 17). The owner or operator shall submit to the Authorization Officer reports of the results of the studies in writing. The role of the interpretative report within the EEM program is to summarize study results (including difficulties or confounding factors encountered), conduct applicable spatial analyses (and when sufficient data are available, temporal trend analyses), specify any identified “effects,” and make recommendations for subsequent EEM program monitoring. Data interpretation or the role of the report does *not* include determining the ecological, economic or social significance of results. The content of the interpretative report is available in Chapter 10 of this document and in the MMER.

The purpose of this chapter is to provide general guidance on how to assess and interpret EEM data, specifically:

- which effect endpoints to use and report;
- the statistical (or other) approach to use for each effect endpoint in order to determine the presence or absence of an effect; and
- the role of power analysis, α, β and critical effect size (CES) in determining effects.

EEM involves iterative phases of monitoring and reporting. For each phase it is required to report the results of the data assessment made under Schedule 5, s. 16. The report must include the identification of any effects on fish populations, fish tissue or the benthic invertebrate community, the overall conclusions of the biological monitoring studies based on the results of the statistical analysis, and a summary of the results of previous monitoring. More specifically, the data generated for each mine should be analyzed to determine whether there are significant differences in certain effect indicators between reference and exposure areas or along an exposure gradient (i.e., determination of effect). In addition to the within-phase (spatial) analysis, a comparison of effects between phases (temporal comparisons) is recommended in order to determine whether any effects identified previously are lessening or worsening.

For EEM purposes, only specified data (the effect indicators) generated from the fish survey, benthic invertebrate community survey and fish usability studies are used to assess the presence of effects. Other EEM data are only used to help interpret effects on fish and benthos (e.g., effluent characterization and water quality monitoring) or to help characterize any changes in effluent quality over time (e.g., sublethal toxicity testing). The tables in the following sections summarize the recommended data analysis procedures for the effect indicators for each monitoring requirement (Tables 8-2, 8-3 and section 8.5). Also, refer to the relevant sections of this chapter for further details. Many of the data interpretation issues are the same for the fish survey, fish usability and benthic invertebrate community sections that follow (e.g., assumptions and interpretation of statistical techniques common to more than one of these sections). Several of these common issues are discussed in the fish section below, and are not repeated in the following sections on fish usability and benthic invertebrate communities.

## 8.2 Understanding the Definition of Effect, and Meaning of Data Interpretation, within EEM

Understanding 1) the types of data analyses that are relevant and 2) what is meant by “interpretation” is integral to the EEM program, particularly when writing an interpretative report. In order to address both issues, it is important to define “effect.”

Within EEM, an effect is defined generally as a statistically significant difference in fish or benthic invertebrate community effect indicators measured between an area exposed to effluent and a reference area, or a statistically significant difference in these effect indicators within an exposure area along a gradient of effluent concentrations. For fish tissue analysis (which is conducted to determine the usability of fisheries resources), an effect is defined as measurements of concentrations of total mercury that exceed 0.5 µg/g wet weight in fish tissue taken in an exposure area and that are statistically different from and higher than the measurements of concentrations of total mercury in fish tissue taken in a reference area (Schedule 5, Interpretation, s. 1). In cases where it is not feasible to examine wild fish or field distribution of benthic invertebrates in areas exposed to effluent and reference areas, an alternative monitoring approach for fish or fish habitat may be used to determine if the effluent is causing an effect (Chapter 9).

Given the above definition of effect, it is important to recognize that not all effects identified in EEM represent damage to fish, fish habitat or the usability of fisheries resources. However, effects as defined above do represent scientifically defensible differences or gradients that may reflect changes to the ecosystem associated with the effluent. As a result, detailed information on the effects, including the magnitude, geographic extent and possible cause of the effect, may contribute to the understanding of the ecosystem and could be used in the management of the aquatic resources.

## 8.3 Data Assessment and Interpretation for the Fish Study

The data collected during the fish population study will include indicators of growth, reproduction, condition and survival (when it is possible to obtain data to establish the indicators), that include the length, total body weight and age of the fish, the weight of its liver or hepatopancreas, and, if the fish are sexually mature, the egg weight, fecundity and gonad weight of the fish (MMER Schedule 5, s. 16).

The overall procedure that should be followed and reported can be divided into the following stages: 1) preparing the analyses, 2) initial summary statistics, 3) analysis of variance (ANOVA) analyses, 4) analysis of covariance (ANCOVA) analyses, and 5) power analyses. Appendix 1 provides step-by-step guidance through the statistical procedures for the fish survey.

The required fish survey measurements, expected precision, and summary statistics are described in Table 8-1. Table 8-2 outlines the effect indicators for various study designs and the appropriate statistical analyses that are applicable for the fish population study. Table 8-3 outlines the supporting endpoints.

**Table 8-1: Required fish survey measurements, expected precision and summary statistics**

Measurement Requirement (MMER Schedule 5, Part 2, s. 16) | Expected Precision*** | Reporting of Summary Statistics (MMER Schedule 5, Part 2, s. 16) and other general reporting |
---|---|---|
Length (fork or total or standard)* | +/- 1 millimetre (mm) | Mean, median, standard deviation (SD), standard error (SE), minimum and maximum values for sampling areas |
Total body weight (fresh) | +/- 1.0% | Mean, median, SD, SE, minimum and maximum values for sampling areas |
Age | +/- 1 year (10% to be independently confirmed) | Mean, median, SD, SE, minimum and maximum values for sampling areas |
Gonad weight (if fish are sexually mature) | +/- 0.1 grams (g) for large-bodied fish species and 0.001 g for small-bodied fish species | Mean, median, SD, SE, minimum and maximum values for sampling areas |
Egg size (if fish are sexually mature) | +/- 0.001 g | Weight (recommended minimum sub-sample sizes of 100 eggs), mean, SE, minimum and maximum values for sampling areas |
Fecundity** (if fish are sexually mature) | +/- 1.0% | Total number of eggs per female, SE, minimum and maximum values for sampling areas |
Weight of liver or hepatopancreas | +/- 0.1 g for large-bodied fish species and 0.001 g for small-bodied fish species | Mean, median, SD, SE, minimum and maximum values for sampling areas |
External condition | n/a | Presence of any lesions, tumours, parasites or other abnormalities |
Sex | n/a | |

\* If the caudal fin is forked, use fork length (from the anterior-most part to the fork of the tail). Otherwise, use total length, and report the type of length measurement conducted for each species. In cases where fin erosion is prevalent, standard length should be used.

\*\* Fecundity can be calculated by dividing total ovary weight by the weight of individual eggs. Individual egg weight can be estimated by weighing a sub-sample and counting the number of eggs it contains. The sub-sample should contain at least 100 eggs.

\*\*\* For small-size fish weights, use at least a 3-decimal scale.

**Table 8-2: Fish survey effect indicators and endpoints for various study designs and the appropriate statistical analyses**

Effect endpoint and statistical procedure, by study design:

Effect Indicator | Standard Survey | Non-lethal Sampling | Wild Molluscs |
---|---|---|---|
Growth (Energy Use) | Size at age (body weight against age) (ANCOVA) | Size (length and weight) of young of the year (YOY; age 0+) at end of growth period (ANOVA) | Whole-animal wet weight (ANOVA) |
Reproduction (Energy Use) | Relative gonad size (gonad weight against body weight) (ANCOVA) | Relative abundance of YOY (% composition of YOY) (See Chapter 3, section 3.4.2.2) | Relative gonad size (gonad weight against body weight) (ANCOVA) |
Condition (Energy Storage) | Body weight relative to length; relative liver weight (liver weight against body weight) (ANCOVA) | Body weight relative to length (ANCOVA) | Whole-animal dry weight, dry shell or soft tissue weight related to shell length (ANCOVA) |
Survival | Age (ANOVA) | Length frequency distribution (2-sample Kolmogorov-Smirnov test) | Length frequency analysis (2-sample Kolmogorov-Smirnov test) |

**Table 8-3: Supporting endpoints to be used for supporting analyses**

Effect Indicator | Supporting Endpoint | Statistical Procedure |
---|---|---|
Energy Use | Body weight (whole) | ANOVA |
Energy Use | Length | ANOVA |
Energy Use | Size-at-age (length against age) | ANCOVA |
Energy Use | Relative gonad size (gonad weight against length) | ANCOVA |
Energy Use | Relative fecundity (# of eggs/female against body weight) | ANCOVA |
Energy Use | Relative fecundity (# of eggs/female against length) | ANCOVA |
Energy Use | Relative fecundity (# of eggs/female against age) | ANCOVA |
Energy Use | YOY survival | See Chapter 3, section 3.4.2.2 |
Energy Storage | Relative liver size (liver weight against length) | ANCOVA |
Energy Storage | Relative egg size (mean egg weight against body weight) | ANCOVA |
Energy Storage | Relative egg size (mean egg weight against age) | ANCOVA |

Note: these analyses are for informational purposes, and significant differences between exposure and reference areas are not necessarily used to designate an effect.

¹ For the ANCOVA analyses, the first term in parentheses is the endpoint (dependent variable, Y) that is analyzed for an effluent effect. The second term in parentheses is the covariate, X (age, weight or length).

### 8.3.1 Preparing the Analyses

Upon completion of the field and laboratory measurements, the data should be promptly entered into a computer spreadsheet and quality assurance / quality control (QA/QC) should be conducted. Values entered into the spreadsheet should be double-checked against the original handwritten data sheet to prevent typographical errors. A data matrix with the location identifier (area), variables in columns, and observations in rows operates as the fundamental working unit. In this spreadsheet, include a column for comments on the physical condition and any abnormalities noticed during the sampling process. These comments may prove to be useful in identifying unusual observations and help to determine whether data should be removed from an analysis. The location identifier chosen for an area or site should be easily distinguished as reference or exposure. This will allow for easier interpretation by others who are not familiar with the location identifier codes. If an insufficient number of fish were collected at an exposure site but a sufficient number were collected at the reference site, be sure to make special note of this.

Failure to identify transcription errors can invalidate further analyses. Once the data have been entered correctly, the data needed for interpretation should be summarized, screened for erroneous values and outliers, assessed for normality, and transformed if necessary. Any significant confounding factors should also be summarized.

Differences between sexes in growth rate, body weight, condition factor, gonad size and liver size are common, due to differences in overall energetic requirements between male and female fish. Therefore, for all parameters, sexes should initially be treated separately when conducting the analyses. In addition, sexually immature fish should not be mixed with sexually mature fish for analyses.

#### 8.3.1.1 Immature Fish

It should be confirmed that all fish assumed to be adults are undergoing gonadal development for the next spawning season. The inclusion of immature fish in statistical analyses can produce misleading results. Immature fish devote proportionally more energy toward growth, so the body size-gonad relationship for immature fish differs from that of adult fish. For data analysis, fish identified as immature in the spreadsheet should be removed. The gonadosomatic index, GSI = (gonad weight / body weight) × 100, can be useful in identifying immature fish. As a general rule, for many fish species, immature fish can be categorized as having a GSI of < 1%, although there are some notable exceptions, such as guarding species like the Brown Bullhead. A plot of gonad weight vs. body weight, used together with this general rule for GSI, can be most useful in identifying immature fish. Comments from the field observations may also assist in identifying unusual observations that are suspected to be immature (e.g., comments such as “weighed only one testis”). The sampling period has to be adjusted to the biology (life history) of the species to avoid capturing fish prior to gonadal development for the upcoming reproductive season. However, when non-lethal sampling is to be carried out and age-frequency distributions are used to assess reproductive success, the timing of sampling is less important. Data analysis on immature and mature fish should be conducted separately, except, for obvious reasons, when comparing the proportion of non-spawning fish among sites.
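The GSI screening rule described above can be illustrated with a short sketch. The record layout, the function names and the example values are hypothetical; the 1% threshold is only the general rule of thumb mentioned in the text and does not hold for every species, so flagged fish would still need to be reviewed against plots and field comments.

```python
def gonadosomatic_index(gonad_weight_g, body_weight_g):
    """Return GSI (%) = gonad weight / body weight x 100."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return gonad_weight_g / body_weight_g * 100.0


def flag_possible_immature(fish, gsi_threshold=1.0):
    """Return the records whose GSI falls below the screening threshold.

    The 1% default is a rule of thumb, not a species-specific criterion,
    so flagged records are candidates for review, not automatic removals.
    """
    return [
        f for f in fish
        if gonadosomatic_index(f["gonad_g"], f["body_g"]) < gsi_threshold
    ]


# Hypothetical records (illustrative values only, not survey data).
fish = [
    {"id": "R01", "gonad_g": 12.0, "body_g": 400.0},  # GSI = 3.0%
    {"id": "R02", "gonad_g": 2.0, "body_g": 400.0},   # GSI = 0.5%
    {"id": "E01", "gonad_g": 8.0, "body_g": 320.0},   # GSI = 2.5%
]

flagged = flag_possible_immature(fish)
print([f["id"] for f in flagged])
```

Only the second record falls below the threshold in this example; whether it is truly immature would be confirmed from the gonad weight vs. body weight plot.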

### 8.3.2 Summary Statistics

The descriptive statistics (mean, median, standard deviation [SD], standard error [SE]) and the minimum and maximum values will be determined, when it is possible to obtain data, to establish the indicators of growth, reproduction, condition and survival that include the length, total body weight and age of the fish, the weight of its liver or hepatopancreas, and, if the fish are sexually mature, the egg size, fecundity and gonad weight of the fish (MMER Schedule 5, s. 16). The fish survey measurements to determine effects in fish growth, reproduction, condition and survival, the expected precision, and summary statistics are described in Chapter 3.

The summary statistics should be calculated by species and sex for each area being summarized (e.g., reference area and exposure area). Before calculating summary statistics, the data should be graphed using box plots for examination of extreme outliers. The summary statistics should be presented in graphical and tabular format for all variables. The data should be examined for normality and equality of variances (basic statistical assumptions). Note that slopes and adjusted means and associated error terms should also be reported for ANCOVA, as outlined below.

Visual screening techniques such as box and whisker plots, normal probability plots, and stem-and-leaf diagrams can be used to identify extreme values (true outliers and/or data entry errors). Most statistical software packages provide data summary modules capable of generating appropriate summary statistics and graphics. These summary statistics are usually needed for presentation, and aberrantly high or low values can indicate errors. Extreme values or outliers should not be removed from the data set (unless they are obvious sampling, measurement or data entry errors) (Grubbs 1969; Green 1979), because mistakenly removing valid data will result in the loss of statistical power in the fish survey. Instead, extreme values should be identified in the report and the influence of the extreme value(s) on the results should be determined by reanalyzing the data without the extreme value(s).
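As a minimal sketch of the summary statistics listed in Table 8-1, the following uses only the Python standard library. The function name, the data layout and the fork lengths are hypothetical; in practice the calculation would be repeated by species, sex and area for every measured variable.

```python
import math
import statistics


def summarize(values):
    """Return mean, median, SD, SE, minimum and maximum for one variable."""
    n = len(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return {
        "n": n,
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": sd,
        "se": sd / math.sqrt(n),  # standard error of the mean
        "min": min(values),
        "max": max(values),
    }


# Hypothetical fork lengths (mm) for one species and sex, grouped by area.
lengths_mm = {
    "reference": [102, 110, 98, 105, 115],
    "exposure": [95, 101, 99, 90, 105],
}

summary = {area: summarize(vals) for area, vals in lengths_mm.items()}
for area, stats in summary.items():
    print(area, stats)
```

Reporting the minimum and maximum alongside the mean makes aberrant values visible in the table itself, which complements the box plots recommended above.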

### 8.3.3 Analysis of Variance (ANOVA) and Analysis of Covariance (ANCOVA)

In addition to descriptive statistics, an analysis of the results must be conducted to determine if there is a statistical difference between the sampling areas (MMER Schedule 5, s. 16(*c*)). This is usually conducted using ANOVA or ANCOVA. However, in some instances, other statistical procedures (e.g., non-parametric methods) may be used. The analyses (for ANOVA and ANCOVA) that are used to determine whether statistically significant effects have occurred should follow these three steps of data inspection, analysis and interpretation (Appendix 1 provides a step-by-step guidance through the statistical procedures for the fish survey):

- The data should be inspected to see whether they satisfy the assumptions of ANOVA or ANCOVA. These procedures are robust enough to allow for moderate violations of some assumptions and, in some cases, data transformation will help to remedy departures from the assumptions. In cases where data transformations do not sufficiently rectify departures from the assumptions, it may be necessary to use non-parametric procedures, in which case the methods of power analysis discussed in section 8.6 would not apply. These issues are further discussed below, and the standard statistical texts (e.g., Sokal and Rohlf 1995) should be consulted for a more complete discussion.
- Following inspection of the data and any necessary transformations, the actual statistical comparisons are carried out.
- After the statistical comparisons are made, key results for the effect indicators (Table 8-1) should be presented in a clear fashion so as to indicate whether there have been effects and, if so, the nature of the effects (including the direction and magnitude of the effects). An effect is declared if the p-value is less than the α value set a priori, as outlined in section 8.6.

#### 8.3.3.1 ANOVA

ANOVA is used to test for site differences in length, weight and age. The assumptions for ANOVA are that:

- the data for reference and exposure populations are normally distributed;
- the variances are equal between the reference and exposure populations; and
- the error terms are independently distributed.

A one-factor ANOVA is used to test for differences in the mean response (length, weight or age) using the factor site (e.g., reference or exposure). A residual plot can be useful in identifying outliers. Observations with studentized residuals with a magnitude greater than 4 typically warrant investigation. Non-parametric alternatives for ANOVA include the Kruskal-Wallis test, or, if comparing two sites, the Mann-Whitney test (non-parametric alternative to the two-sample t-test).
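The one-factor ANOVA described above reduces to a ratio of between-site to within-site mean squares. The sketch below computes only the F statistic and its degrees of freedom; the function name and the ages are hypothetical, and the p-value (not computed here) would be obtained from the F distribution in a statistical package and compared with the α set a priori.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-factor ANOVA.

    `groups` maps a site label to a list of observations.
    """
    all_values = [v for vals in groups.values() for v in vals]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total

    ss_between = 0.0
    ss_within = 0.0
    for vals in groups.values():
        site_mean = sum(vals) / len(vals)
        ss_between += len(vals) * (site_mean - grand_mean) ** 2
        ss_within += sum((v - site_mean) ** 2 for v in vals)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical ages (years) for one species and sex.
ages = {
    "reference": [3, 4, 5],
    "exposure": [4, 5, 6],
}

f_stat, df_between, df_within = one_way_anova(ages)
print(f_stat, df_between, df_within)
```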

##### 8.3.3.1.1 Normality and Homogeneity of Variances

The assumptions of normality and homogeneity of variance should be assessed before applying most parametric procedures. However, most univariate normal distribution-based statistical methods are quite robust and can support moderate violations of the assumptions. Transformation of original data will help normalize the data or homogenize the variances. Logarithmic transformations are often preferred because most biological measures are considered to operate on a log or exponential scale (Peters 1983) and such a transformation is biologically meaningful. It should be noted that for the purposes of the fish EEM survey, 1 should not be added to values before logging because it has undesirable effects on the calculated variances when changing measurement units. If the transformations are unable to produce data that meet the assumptions, a plot of the residuals may reveal problematic data points that may warrant investigation. Most of the univariate statistical methods are robust under moderate violations of assumptions, with some exceptions such as analyses with small and unequal samples. For serious violations, non-parametric statistics can be considered.
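The caution against adding 1 before taking logarithms can be checked numerically. Because log10(1000·w) = 3 + log10(w), changing units only shifts log-transformed values by a constant and leaves their variance unchanged; adding 1 first breaks this property. The weights below are hypothetical and the helper name is an assumption made for illustration.

```python
import math
import statistics

weights_g = [0.5, 1.2, 2.0, 3.5, 8.0]         # hypothetical body weights (g)
weights_mg = [w * 1000.0 for w in weights_g]  # the same fish, expressed in mg


def log_variance(values, add_one=False):
    """Sample variance of log10(value), optionally of log10(value + 1)."""
    offset = 1.0 if add_one else 0.0
    return statistics.variance([math.log10(v + offset) for v in values])


var_g = log_variance(weights_g)
var_mg = log_variance(weights_mg)
var_g_plus1 = log_variance(weights_g, add_one=True)
var_mg_plus1 = log_variance(weights_mg, add_one=True)

# Plain log: the variance is the same whichever unit is used.
print(var_g, var_mg)
# log(x + 1): the variance depends on the unit of measurement.
print(var_g_plus1, var_mg_plus1)
```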

##### 8.3.3.1.2 Independence (Pseudo-replication)

When designing experiments, it is desirable to ensure that replicates are randomly allocated to different treatment levels, such that the responses of each replicate are independent of other replicates. This element of randomness provides some assurance that observed differences in responses among treatments result from treatment effects and not from other factors.

Lack of independence can occur when, for example, one person collects all the data from the exposure area while another person collects data from the reference area. This can bias the data if the two individuals consistently use slightly different sampling or sorting protocols. Generally, these kinds of problems can only be remedied by changing the method of conducting the sampling so as to remove the sources of bias.

Randomly allocating replicates to different treatment levels is a relatively easy procedure when conducting manipulative experiments (e.g., controlled laboratory tests), but is less obvious for observational field studies. Observational studies, such as environmental impact studies (e.g., single-stressor EEM studies) or environmental assessments (i.e., multiple stressors), test hypotheses about the presence and magnitude of effects. However, the strength of inferences from these types of experiments is limited, for two reasons (Paine et al. 1998):

- the stressor (e.g., mine outfall, hydroelectric dam) cannot be reproduced; and
- stressors cannot be applied randomly to replicates.

What this means is that the stressor or treatment is always partly or wholly confounded with space or time, and that the observed effects may or may not be caused by the stressor of interest. For example, when investigating whether effluent from an industrial plant is having an effect on downstream fish populations, it is not possible to replicate the treatment of effluent exposure (i.e., there is only one plant and outfall), or to randomly assign fish populations to the different treatment levels (reference vs. exposed). As such, when significant differences are observed between reference and exposed fish populations, one can conclude that there are differences between these two populations, but not necessarily that the differences were caused by effluent exposure. Interpreting significant differences as treatment effects when either treatment is not replicated or replicates are not independent is referred to as pseudo-replication (Hurlbert 1984).

Before attributing cause to any specific stressor, it is critical that observations be confirmed, through replication over time, and that some effort be expended to confirm that the stressors of interest are involved in the responses.

#### 8.3.3.2 ANCOVA

ANCOVA is used to test for site differences in condition, relative gonad weight, relative liver weight, weight-at-age, size-at-age, and relative fecundity. A summary of these analyses is provided below.

**Table 8-4: Summary of effect endpoints analyzed using ANCOVA**

Effect Endpoint | Response Variable | Covariate |
---|---|---|
Condition | Body weight | Length |
Relative liver weight | Liver weight | Body weight |
Relative gonad weight | Gonad weight | Body weight |
Weight-at-age | Body weight | Age |
Size-at-age | Length | Age |
Relative fecundity | Eggs/female | Body weight |

The assumptions for ANCOVA are that:

- the relationship between the response and covariate is linear;
- the slopes of regression lines among sites are parallel;
- the covariate is fixed and measured without error; and
- the residuals are normally and independently distributed with zero mean and a common variance.

It should be noted that ANCOVA is basically a two-step procedure consisting of:

- determining whether the slopes are approximately parallel; and
- if the slopes are parallel, going on to determine whether the elevations of the regressions are significantly different. This procedure is discussed more fully below.

ANCOVA is used to test for differences in a response among sites while taking into account the variability in test subjects by including a covariate in the analysis. This inclusion of a covariate in the analysis decreases the error term (by accounting for the variability explained by the regression of the response variable on the covariate) and thus increases the power of the test (Huitema 1980).

It has been suggested that the range of the independent variable (covariate) should be approximately the same for each site. This will be difficult to assure in practice, but the violation of this should be considered when interpreting results from such cases. If there is reason to believe that there are issues with the overlap of the range of covariate values, perform a single-factor ANOVA on the covariate values between sites. If the covariate means do not significantly differ between sites, the results of the ANCOVA will probably be reliable (Quinn and Keough 2002). A significant difference in the mean covariate values between sites is a significant effect. In interpreting differences in the covariate means or ranges observed, take into consideration the consistency of sampling gear between sampling sites and the selection of samples. It may be appropriate to provide an analysis of a subset of the data, omitting unusually high or low covariate values in order to provide a reliable analysis.

The range of covariate values for the weight-at-age effect endpoint must be considered before performing an ANCOVA. For several small-bodied fish species, the range of the covariate (age) might only be between 2 and 3 or 2 and 4. An ANCOVA with only 2 or 3 values of the covariate can provide misleading results. In these cases it may be appropriate to perform a one-factor ANOVA on body weight, using site as the factor for each age group.

##### 8.3.3.2.1 Analysis of Residuals

The preferred method of examining the residuals is to use graphical methods rather than relying on formal tests to assess normality and equality of variance. In fact, Day and Quinn (1989) have recommended against using formal tests. A good discussion of this topic can be found in Miller (1986). Draper and Smith (1981) review various methods of examining residuals, particularly residuals from regressions. Most statistical software packages also provide modules for examination of residuals. These methods are usually graphical, although diagnostic statistics are available as well. The primary advantage of these methods, compared to formal tests, is that they can identify the cause of violations of normality or equality of variances.

##### 8.3.3.2.2 Independent Variable

The assumption that the independent variable is fixed is frequently violated, and Draper and Smith (1981) discuss the consequences of this violation. A non-fixed independent variable is likely to prove problematic, mainly in situations where the range of the independent variable is very small, i.e., when the range in size (or age) of the fish included in the regression is very small. In this case (very narrow size or age range), there is little to be gained by using ANCOVA with size or age as a covariate, and the data would be better analyzed as a simple ANOVA comparison of the exposure to reference area (i.e., no need to factor out the influence of the covariate).

##### 8.3.3.2.3 Linear Regression

The assumption of a linear relationship can be tested for samples with multiple observations at different values of the independent variable. This may be possible for discrete variables such as age, but not for continuous independent variables such as body weight. At a minimum, linearity should be verified by visual inspection. Linearity can often be improved by transformation (e.g., the log-log transformation is used very widely for this purpose for the EEM fish ANCOVA analyses). The regression plots should also be inspected to ensure that the slopes are not unduly influenced by outliers. Scatter plots help identify outliers and unusual data. For example, when reproductive data are analyzed for fish, the plots aid in identifying potential “immature” fish that could affect the results. The scatter plots should be included in the interpretative report.

##### 8.3.3.2.4 Slopes of the Regression Lines

A key assumption of ANCOVA is that the slopes of the regression lines for the reference vs. exposure areas are approximately equal. Therefore, the first part of an ANCOVA analysis is to test for differences in slopes between areas. A significant interaction term in the ANCOVA for covariate X vs. area (e.g., age*area or size*area) indicates significantly different slopes. In cases where the slopes are not significantly different (i.e., interaction term not significant), this indicates that the regression lines are approximately parallel to each other. Using the weight-at-age ANCOVA as an example, parallel slopes would indicate that weight gain over age is similar for both areas. The next step in this example is to proceed with the ANCOVA model, and test for differences in adjusted means (elevation) to investigate whether fish are proportionately heavier at any age in one area than in another.
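The two-step logic (test the slopes first, then compare elevations) can be sketched with ordinary sums of squares. This is an illustrative outline under stated assumptions, not a replacement for a statistical package: the function names and data are hypothetical, the values are assumed to be already transformed where needed, and the F statistics would still have to be converted to p-values using the F distribution. Step 2 is only meaningful when step 1 gives no evidence of different slopes.

```python
def _sums(x, y):
    """Return (n, mean_x, mean_y, Sxx, Sxy, Syy) for one set of points."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    return n, mx, my, sxx, sxy, syy


def ancova(groups):
    """Two-step ANCOVA for groups = {area: (covariate_list, response_list)}."""
    per_area = {a: _sums(x, y) for a, (x, y) in groups.items()}
    k = len(per_area)
    n_total = sum(s[0] for s in per_area.values())

    # Step 1: separate slopes vs. one common slope (the interaction test).
    sse_separate = sum(s[5] - s[4] ** 2 / s[3] for s in per_area.values())
    sxx_w = sum(s[3] for s in per_area.values())
    sxy_w = sum(s[4] for s in per_area.values())
    syy_w = sum(s[5] for s in per_area.values())
    common_slope = sxy_w / sxx_w
    sse_parallel = syy_w - sxy_w ** 2 / sxx_w
    df_separate = n_total - 2 * k
    f_slopes = ((sse_parallel - sse_separate) / (k - 1)) / (sse_separate / df_separate)

    # Step 2: parallel lines vs. a single line (the elevation test).
    all_x = [xi for x, _ in groups.values() for xi in x]
    all_y = [yi for _, y in groups.values() for yi in y]
    _, grand_mx, _, sxx_t, sxy_t, syy_t = _sums(all_x, all_y)
    sse_single = syy_t - sxy_t ** 2 / sxx_t
    df_parallel = n_total - k - 1
    f_elevation = ((sse_single - sse_parallel) / (k - 1)) / (sse_parallel / df_parallel)

    # Adjusted means: each area's mean response moved to the grand mean of X.
    adjusted = {
        a: s[2] - common_slope * (s[1] - grand_mx) for a, s in per_area.items()
    }
    return {
        "f_slopes": f_slopes, "df_slopes": (k - 1, df_separate),
        "common_slope": common_slope,
        "f_elevation": f_elevation, "df_elevation": (k - 1, df_parallel),
        "adjusted_means": adjusted,
    }


# Hypothetical data: covariate (e.g., age) and response (e.g., body weight).
data = {
    "reference": ([1, 2, 3, 4], [1.1, 1.9, 3.1, 3.9]),
    "exposure": ([1, 2, 3, 4], [2.1, 2.9, 4.1, 4.9]),
}
result = ancova(data)
print(result)
```

In this constructed example the two areas have identical slopes, so the slope test statistic is essentially zero and the whole difference between areas appears as a difference in adjusted means.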

It is possible that the slopes of regressions may differ. For example, fish from the reference area may be gaining weight more rapidly with increasing age (steeper slope) than fish from the exposure area. If the slopes of the regressions are significantly different, the ANCOVA cannot be completed. In this case, using the weight-at-age example, the effect would not be a proportional difference in weight at any age; rather, the rate of weight gain with increasing age would be significantly different among areas. This is considered a statistically significant EEM effect for the fish survey. That is, an effect would be determined as a significant difference in slope among areas rather than a significant difference in elevation. For this situation, it is also a good idea to plot separate regression lines to obtain a better qualitative understanding of the weight-at-age relationship for each area over the entire data range of the X covariate (e.g., where do the lines intersect?). It should be noted that, even when the slopes of the regressions significantly differ among areas, it is still possible to make further comparisons over a particular range of values for the X covariate (i.e., a particular age or size range) (Sokal and Rohlf 1995). This kind of comparison would be appropriate if that age or size range is judged to be of particular concern.

It is also preferable that the range of the independent variable be approximately the same for each “treatment” (i.e., area). This may be difficult to assure in practice, but any violation of this should be considered when interpreting results from such cases. For example, if the size range used as the X covariate for the reference area does not show much overlap with the size range for the exposure area, use of the ANCOVA results requires the assumption that the regression slopes would still be parallel for overlapping size ranges and may not be appropriate in this situation.

##### 8.3.3.2.5 Options for Non-parallel Regression Slopes

When the assumption of parallel regression slopes is not met, ANCOVA cannot proceed, because adjusted treatment means cannot be correctly interpreted. In this case there is a covariate by treatment interaction, and differences in the response variable among treatments vary at different values of the covariate. There are a few options for dealing with non-parallel regression slopes in ANCOVA. These are discussed below in the order that the methods should be applied to data sets with non-parallel slopes. The first two options provide mechanisms by which the slopes can be treated as being parallel, thus allowing a full ANCOVA and comparison of adjusted means. The third option provides an alternative methodology for calculating measured effects when the slopes cannot be treated as being parallel, even after applying options 1 and 2.

*1. Influential Points (from Barrett et al. 2010)*

Influential points are observations with high leverage (outliers in the covariate space) that have the potential to dominate conclusions by producing substantial influence on the regression coefficients (Fox 1997). If one or more points is highly influencing the slope of a regression line and causing non-parallel slopes, removal of this (these) point(s) may remove the evidence against fitting the data to the parallel model. Influence can be assessed using the Cook’s distance statistic (Cook 1977, 1979), which is incorporated into many statistical software packages. It is calculated using studentized residuals (outliers in the response variable) and a measure of leverage called “hat values” (outliers in the predictor variable) as a measure of impact for each observation (Fox 1997). A plot of Cook’s distance vs. the covariate is most useful in identifying high-influence observations. A numerical cut-off of 4 / (*n*-*k*-1), where *n* is the total number of observations and *k* is the number of predictors in the regression model, can also be used to assess high-influence observations (Fox 1997).
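As a rough illustration of the screening step, the sketch below computes Cook's distance for a simple (one-covariate) regression and flags observations above the 4 / (*n*-*k*-1) cut-off. The function names and the data are hypothetical, and the closed-form leverage used here applies only to a single predictor; most statistical packages provide a general implementation.

```python
def cooks_distance(x, y):
    """Cook's distance for each observation in a one-predictor regression."""
    n = len(x)
    k = 1  # number of predictors (the covariate only)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - k - 1)
    distances = []
    for a, e in zip(x, residuals):
        leverage = 1 / n + (a - mean_x) ** 2 / sxx  # "hat value"
        distances.append(e * e / ((k + 1) * mse) * leverage / (1 - leverage) ** 2)
    return distances


def flag_influential(x, y, k=1):
    """Indices of observations whose Cook's distance exceeds 4 / (n - k - 1)."""
    cutoff = 4 / (len(x) - k - 1)
    return [i for i, d in enumerate(cooks_distance(x, y)) if d > cutoff]


# Made-up data: the last fish is far out in covariate space and off the trend.
covariate = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
response = [2, 4, 6, 8, 10, 12, 14, 16, 18, 15]
flagged = flag_influential(covariate, response)
print(flagged)
```

Flagged observations are candidates for closer inspection, not automatic removal; the slope test would be re-run after any justified removal.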

*2. Coefficients of Determination (from Barrett et al. 2010)*

The coefficient of determination (R^{2}) expresses the proportion of the total variability in the response variable that is explained by its linear relationship with the independent variable, and is a measure of the association between the two variables (Quinn and Keough 2002). When the regression slopes are found to be non-parallel, the R^{2} of the full regression model (model with the interaction term included) can be compared to the R^{2} of the reduced regression model (model with the interaction term removed). When the R^{2} of the parallel (reduced) model is high (greater than 0.8) and only slightly (less than 0.02) lower than that of the full model, the parallel model can provide a sufficient representation of the data and can be used to proceed with the analysis.
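The two criteria can be expressed as a short check. In this sketch the function name and the sums of squares are hypothetical; it assumes that both R^{2} values are computed against the same total sum of squares of the response variable.

```python
def parallel_model_adequate(sst, sse_full, sse_reduced,
                            min_r2=0.8, max_drop=0.02):
    """Compare R^2 of the full (interaction) and reduced (parallel) models.

    sst: total sum of squares of the response about its grand mean.
    sse_full / sse_reduced: residual sums of squares of the two models.
    Returns (r2_full, r2_reduced, adequate).
    """
    r2_full = 1 - sse_full / sst
    r2_reduced = 1 - sse_reduced / sst
    adequate = r2_reduced > min_r2 and (r2_full - r2_reduced) < max_drop
    return r2_full, r2_reduced, adequate


# Made-up sums of squares for two hypothetical data sets.
case_a = parallel_model_adequate(sst=50.0, sse_full=4.0, sse_reduced=4.6)
case_b = parallel_model_adequate(sst=50.0, sse_full=4.0, sse_reduced=6.0)
print(case_a, case_b)
```

In the first case the parallel model loses about 0.012 in R^{2} and would be retained; in the second the loss is about 0.04, so the slopes would continue to be treated as non-parallel.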

*3. Estimating Effects for Different-sized Fish (from Lowell and Kilgour 2008)*

When the above two methods cannot be applied to the data set (i.e., when the slopes remain non-parallel even after applying the above two methods), the following method can be used to estimate measured effects for smaller (or younger) and larger (or older) fish. First determine the minimum and maximum values of the covariate within the range of covariate overlap for the two regressions (reference and exposure areas). Then, determine the predicted values of the response variable for each area regression line at these two covariate values (minimum and maximum). An estimate for the effect at the minimum covariate value (i.e., the effect on smaller or younger fish) will be the difference in predicted values, calculated as exposure-predicted value minus reference-predicted value, expressed as a percentage of the reference-predicted value. If the data were log-transformed, the predicted values must be anti-logged (i.e., x expressed as 10^{x}) before calculating the percent difference. The calculation is the same for larger (or older) fish, but using the maximum value of the covariate where the ranges for each area overlap. Each of these two measured effects (percent differences for small/young fish and large/old fish) can then be compared to CESs in the same way as is done for measured effects calculated from means (from ANOVA) or adjusted means (from ANCOVA).
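The calculation can be sketched as follows. The function name, the `(intercept, slope)` representation of each area's regression line, and the example values are all hypothetical; the `log10_data` flag stands for the case where the response was log10-transformed before regression.

```python
def effect_at_overlap_limits(ref_fit, exp_fit, ref_x, exp_x, log10_data=True):
    """Percent difference (exposure vs. reference) at both ends of the
    covariate range shared by the two areas.

    ref_fit / exp_fit: (intercept, slope) of each area's regression line.
    ref_x / exp_x: covariate values observed in each area (same scale as
    used in the regression).
    """
    low = max(min(ref_x), min(exp_x))    # minimum of the overlap
    high = min(max(ref_x), max(exp_x))   # maximum of the overlap
    if low >= high:
        raise ValueError("covariate ranges do not overlap")

    def predict(fit, x):
        intercept, slope = fit
        value = intercept + slope * x
        # Anti-log the prediction if the response was log10-transformed.
        return 10 ** value if log10_data else value

    effects = {}
    for label, x in (("small_or_young", low), ("large_or_old", high)):
        ref_pred = predict(ref_fit, x)
        exp_pred = predict(exp_fit, x)
        effects[label] = 100 * (exp_pred - ref_pred) / ref_pred
    return effects


# Made-up, untransformed example: lines cross at the low end of the overlap.
effects = effect_at_overlap_limits(
    ref_fit=(0.0, 2.0), exp_fit=(1.0, 1.5),
    ref_x=[1, 4, 10], exp_x=[2, 6, 12], log10_data=False)
print(effects)
```

Here the overlap runs from 2 to 10, giving no difference for the smallest fish and a 20% lower predicted value for the largest fish in the exposure area.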

##### 8.3.3.2.6 Non-parametric Alternatives to ANCOVA

ANCOVA is robust to violations of the assumptions of the test when sample sizes are approximately equal (Huitema 1980; Hamilton 1977). When assumptions are seriously violated and sample sizes are unequal, non-parametric alternatives to ANCOVA could be considered. Several different non-parametric techniques using ranks have been proposed. Iman and Conover (1982) proposed a non-parametric alternative in which the response and covariate are replaced by their ranks. The analysis is the same as the parametric ANCOVA using the ranks as data, and is the simplest non-parametric alternative. Groups of tied ranks are replaced by the average rank for that grouping. Some other non-parametric alternatives are discussed in Shirley (1981) and Quade (1967).
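The rank replacement step, including the averaging of tied ranks, might look like the sketch below. The function name is hypothetical, and the sketch assumes that ranks are assigned across the pooled observations from all areas before the usual ANCOVA is run on the ranked response and ranked covariate.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


# Made-up gonad weights pooled over both areas; two fish are tied at 20.
ranked = average_ranks([10, 20, 20, 5])
print(ranked)
```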

### 8.3.4 Transformations

Transformations of the data can often help improve normality and homogenize variances (reduce some violations), and an examination of the relationship between the means and variances can help identify the most appropriate transformation (see Green 1979). Taylor’s Power Law (Taylor 1961), which examines the relationship between treatment means and variances, can be used to determine the specific transformations in order to normalize data or homogenize variances (Green 1979). Logarithmic transformations are often preferred because biological measures are frequently considered to operate on a logarithmic or exponential scale (Peters 1983). It should be noted that 1 should not be added to values before logging for the purposes of the fish EEM survey, because it has undesirable effects on the calculated variances when changing measurement units. If the transformations are unable to produce data that approximately meet the assumptions, it may be necessary to use non-parametric statistics.
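The caution about adding 1 before logging can be illustrated numerically. In the sketch below (made-up weights, hypothetical function name), the same fish are expressed in grams and in milligrams. The variance of log10(x) is unchanged by the change of units, whereas the variance of log10(x + 1) is not.

```python
import math
from statistics import variance


def var_log10(values, add_one=False):
    """Sample variance of log10(x), or of log10(x + 1) when add_one is True."""
    shift = 1 if add_one else 0
    return variance([math.log10(v + shift) for v in values])


weights_g = [0.5, 1.2, 2.0, 3.5, 8.0]        # made-up gonad weights in grams
weights_mg = [w * 1000 for w in weights_g]   # the same fish in milligrams

plain_diff = abs(var_log10(weights_g) - var_log10(weights_mg))
plus_one_diff = abs(var_log10(weights_g, add_one=True)
                    - var_log10(weights_mg, add_one=True))
print(plain_diff, plus_one_diff)
```

Because a change of units only adds a constant on the log scale, the first difference is zero up to rounding error; adding 1 breaks that property.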

### 8.3.5 Level of Replication

For each of the ANOVA and ANCOVA analyses, the level of replication (sample size, n) is the number of individual fish. The minimum sample size recommended is 20 sexually mature fish per sex (and an additional 20 sexually immature fish if small-bodied fish species are being sampled) for each of the 2 sentinel fish species in both the reference and exposure area. A power analysis should be conducted to determine sample size if the appropriate data are available.

### 8.3.6 Effect and Supporting Endpoints

#### 8.3.6.1 Size-at-Age

Rates of growth are commonly described by the relationship of size (as weight or length) to age. Over the entire lifespan of a fish, this relationship is curvilinear, with the rate of increase declining as fish approach the limit of their lifespan (Ricker 1975). As only adult fish are often sampled, classical growth rates cannot be calculated. Nevertheless, for the purposes of the EEM program, fish growth can be inferred from size-at-age estimates determined for each area using ANCOVA. This calculation assumes that the relationship between size and age for adult fish is approximately log-linear (log size vs. log age) (Bartlett et al. 1984).

Size-at-age may be estimated by calculating the regression relationship between body size (weight or length) and age for each sampling area (reference and exposure). It is recommended that both length and weight be used to calculate size-at-age, in order to determine which provides the best fit and tightest regression.

#### 8.3.6.2 Gonad Weight, Liver Weight, Condition and Fecundity

Relative gonad and liver size (and fecundity) are obtained by regression and analyzed using ANCOVA, using body weight as the covariate. Likewise, condition is obtained by regressing body weight against body length, and essentially describes how “fat” fish are at each area.

A variety of indices have been used in fisheries biology to describe the condition of fish (Bolger and Connolly 1989). Many of them are derived by calculating the ratio of one variable to another. Examples of a few common indices are:

- condition factor (k) = 100 (body weight / length^{3});
- gonadosomatic index (GSI) = 100 (gonad weight / body weight); and
- liver somatic index (LSI) = 100 (liver weight / body weight).
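For presentation purposes, the three indices can be computed directly from the formulas listed above. The function names and example values below are hypothetical, and the units (e.g., grams and centimetres) are an assumption, since they depend on how the measurements were recorded.

```python
def condition_factor(body_weight, length):
    """k = 100 * body weight / length^3."""
    return 100 * body_weight / length ** 3


def gonadosomatic_index(gonad_weight, body_weight):
    """GSI = 100 * gonad weight / body weight."""
    return 100 * gonad_weight / body_weight


def liver_somatic_index(liver_weight, body_weight):
    """LSI = 100 * liver weight / body weight."""
    return 100 * liver_weight / body_weight


# Made-up fish: 125 g body weight, 50 cm length, 5 g gonad, 2 g liver.
k = condition_factor(125.0, 50.0)
gsi = gonadosomatic_index(5.0, 125.0)
lsi = liver_somatic_index(2.0, 125.0)
print(k, gsi, lsi)
```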

In general, however, investigators have become cautious about using derived variables and ratios because they may have undesirable statistical properties (Green 1979; Jackson et al. 1990). Although these indices may be used for presentation purposes, it is preferable statistically to estimate (and analyze) the parameters from regressions of original variables (i.e., ANCOVA) rather than from ratios (Gibbons et al. 1993).

#### 8.3.6.3 Mean Age

Calculation of mean age is meant as a gross reflection of the age distribution of adult fish collected from each area. Variability in mean age of fish can be estimated using ANOVA. The mean square error from the model is the best estimate of variability. Site difference in length and weight can also be analyzed in this fashion. It is essential that the sampling gear be consistent between the sampling areas, because most sampling methods select for certain age classes.

#### 8.3.6.4 Age-at-Maturity

Age-at-maturity is a commonly used parameter in fisheries biology. However, few methods of calculation incorporate a measure of statistical confidence or variability. Therefore, it is recommended that age-at-maturity be estimated by traditional probit analysis, as is commonly used for determining median lethal concentration (LC50) in toxicity tests. By determining the proportion (%) of mature individuals in each adult age class, and converting these data to probits (or plotting the data on probit paper), a straight-line relationship is generated (probit vs. log age) that allows one to estimate the age where 50% of the fish sampled are sexually mature. An estimate of variability in age-at-maturity among individual fish can be obtained from the slope of the line. The slope estimates 1/SD. Therefore, the SD is estimated by 1/slope. Using data collected over several phases, confidence limits can be calculated as an estimate of precision and statistical comparison of area values. Most statistical software packages can convert percentages to probits, and several small, independent packages are designed to conduct LC50/probit analysis and generate the confidence limits. For more detailed information on conducting probit analysis, refer to Hubert (1980). For a discussion of factors to be considered when using probit analysis and other techniques for estimating age-at-maturity, refer to Trippel and Harvey (1991).
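The probit calculation can be sketched as follows, with several simplifying assumptions. The function name and data are hypothetical. Probits are taken as standard normal deviates (without the constant of 5 added in traditional probit tables), so 50% maturity corresponds to a probit of zero. Age classes with 0% or 100% mature fish are dropped because their probits are undefined. The line is fitted by unweighted least squares rather than the weighted or maximum-likelihood fit used by dedicated probit software.

```python
import math
from statistics import NormalDist


def age_at_maturity(ages, n_mature, n_total):
    """Estimate the age at 50% maturity from a probit vs. log10(age) line.

    Returns (age_50, sd_log10_age), where the SD is 1 / slope and is
    expressed on the log10(age) scale.
    """
    xs, ys = [], []
    for age, mature, total in zip(ages, n_mature, n_total):
        p = mature / total
        if 0 < p < 1:  # probit is undefined at exactly 0 or 1
            xs.append(math.log10(age))
            ys.append(NormalDist().inv_cdf(p))
    if len(xs) < 2:
        raise ValueError("need at least two age classes with 0 < p < 1")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    age_50 = 10 ** (-intercept / slope)  # probit = 0 at 50% mature
    return age_50, 1 / slope


# Made-up counts: 20 fish per age class; 10%, 50% and 90% are mature.
age_50, sd_log_age = age_at_maturity(
    ages=[2, 4, 8], n_mature=[2, 10, 18], n_total=[20, 20, 20])
print(age_50, sd_log_age)
```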

### 8.3.7 Statistical Analysis for Non-lethal Sampling

For non-lethal sampling, length-frequency distributions should be compared using a 2-sample Kolmogorov-Smirnov test. Gray et al. (2002) analyzed young-of-the-year fish separately, in order to assess age-specific variability in growth rates.

The Kolmogorov-Smirnov test is a robust analysis to determine if two data sets differ significantly, and can be used to look at relative distributions of data. This is a non-parametric, distribution-free test that assesses the similarity of two cumulative distribution functions of two data sets (Sokal and Rohlf 1995):

H_{0}: F(X) = F(Y); H_{1}: F(X) ≠ F(Y)

Differences are considered significant at p < 0.05.
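A minimal sketch of the two-sample test is given below, using hypothetical function names and made-up lengths. It computes the statistic D as the largest gap between the two empirical cumulative distributions and then an asymptotic p-value from the Kolmogorov distribution. The asymptotic approximation is an assumption of this sketch and can be imprecise for small samples; statistical packages typically offer exact methods.

```python
import math


def ks_two_sample(sample_a, sample_b):
    """Return (D, approximate p) for the two-sample Kolmogorov-Smirnov test."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every value <= x.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        return d, 1.0  # the series below is effectively 1 in this range
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Made-up fork lengths (mm) from a reference and an exposure area.
reference = list(range(40, 60))
exposure = list(range(140, 160))
d_same, p_same = ks_two_sample(reference, reference)
d_diff, p_diff = ks_two_sample(reference, exposure)
print(d_same, p_same, d_diff, p_diff)
```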

ANOVAs can be performed on length and weight. Data may need to be transformed. If appropriate, a post hoc analysis of differences between sites can be conducted using the Tukey Honestly Significant Difference test.

ANCOVAs should be performed for size-at-age (if possible) and condition factor (length vs. weight by site). The analyses should examine whether there were significant regressions, and if there was a significant interaction between areas. If slopes were equal, the data should be examined for a difference between areas, which area had the greatest values, what is the percentage area difference, and what was the p for slope or adjusted mean differences. If there is an interaction, the data should be plotted to see if the data are interpretable.

### 8.3.8 Data Quality Assurance / Quality Control and Analysis (Errors and Outliers)

Guidance on QA/QC for data analysis is provided below. The importance of ensuring data quality cannot be overemphasized. Each applicable chapter provides further guidance on QA/QC for study design, consistency of methods and measurements, and definitions of protocols and procedures.

Common errors include data entry errors, recording the wrong species, missing or moved decimal places, and recording the wrong sex or stage of maturity. It is critical to examine the data for errors and outliers before starting the analysis. Entry errors, transcription errors and invalid data are impossible to detect in final reports.

Data that have been entered incorrectly can sometimes be easily detected using scatter plots of length vs. weight, weight vs. gonad weight and weight vs. liver weight to look for points that are obviously different. Data entry errors are relatively easy to correct and can be re-entered. If the error cannot be reconciled because of obvious errors or omissions in the original data sheet, the fish (data point) should be removed from the data set.

Errors and extreme observations inflate the variance and reduce the power to detect significant differences in the data set. Evaluation of outliers includes consideration of the raw data, the field conditions, and the data collection process. Data points that are different, but are not due to entry errors, can arise for a number of reasons. For example, fish may appear sick or damaged, the fish may be an outlier for no apparent reason, or the outlier may represent an important phenomenon that is part of the response to the stressors under study.

In the first case, there can be a small number of fish that are obviously sick or were damaged (in a manner unrelated to the stressors under consideration) and should not be considered part of the data set for interpretation. These usually appear as single points that are separate from the main data set. Examples of these include fish that are missing their tail due to predation wounds, fish that have a jaw deformity or injury that has affected their feeding, or fish that are blinded through injury and are thinner than other fish. In these cases, the fish should be removed from the comparison.

If there is no obvious reason for the presence of rare outliers, the analysis should be conducted with and without the suspect observation, to determine how much influence it has on the conclusions. If it has an impact on whether a relationship is significant or not, statistics textbooks should be consulted for advice on how to evaluate whether the measurement can be removed.

In the third case, there can be several fish that are obviously different but possibly part of the relationship being examined. In other cases, fish can have a delay in sexual maturity associated with environmental stressors. In this case, several fish would appear as outliers. As noted above, the analyses should be conducted both with the outliers (to see if there are differences between sites) and without the outliers (to see if the fish with gonadal development are showing normal levels of gonadal development).

There may be cases when some fish within a population are different, for example in situations where some fish may skip a year of spawning. If one is evaluating impacts on spawning, the analysis should consider the potential impacts on spawners and non-spawners independently. Individuals that skip reproductive seasons can usually be identified as negative outliers in a plot of gonad weight vs. body weight, i.e., plots of residuals from ANCOVA will be skewed left, and will not be normally distributed. These individuals should be excluded from analyses of reproduction, and possibly all variables. The reductions in variance achieved will usually compensate for any loss of power from reduced sample sizes. If females skipping reproductive years are excluded, that exclusion should be made objectively (Environment Canada 1997). Also, the frequency of such individuals in reference vs. exposed areas should be provided, in case skipping reproductive years is related to exposure. It is much more difficult to identify males that might skip reproductive years, if in fact that ever occurs.

## 8.4 Effects on Usability of Fisheries Resources

The purpose of examining the usability of fisheries resources is to determine whether the effluent has altered fish in such a way as to limit the resources’ use by humans. Fish usability can be affected by altered appearance, altered flavour, or odour (tainting), or tissue contaminant levels that exceed consumption guidelines for human health and levels found in the reference area. Table 8-5 outlines the effect and supporting endpoints and appropriate statistics (or guideline levels) that are applicable for usability of fisheries resources.

| | Variable | Statistical Procedure |
|---|---|---|
| Effect Endpoint^{1} | Contaminants in fish tissue (mercury) | ANOVA, and evaluate against tissue guideline levels |
| Supporting Endpoints^{2} | Physical abnormalities | Chi-square (separate test done for each class of abnormality; number of tests will depend on how many classes of abnormalities are present in the fish collected) |
| | Tainting | ANOVA |

^{1} Effect endpoint to be used for determining “effects” as designated by exceedance of tissue guideline levels. Statistically significant differences between exposure and reference areas may also be relevant (MMER Schedule 5, s. 9(c)).

^{2} These analyses are for informational purposes, and significant differences between exposure and reference areas are not necessarily used to designate an effect.

### 8.4.1 Mercury in Fish Tissue

One of the methods for evaluating fish usability is by measuring concentrations of contaminants of concern in tissue from fish collected from the exposure and the reference areas. Contaminants may be identified as a concern if they are present in the effluent and there are applicable human health consumption guidelines for those contaminants. Local consumption and commercial fisheries should guide which fish species and edible tissues (e.g., liver, kidney, bones, flesh, or even entire fish) should be analyzed. Chapter 3 provides further guidance on methods for determining which (if any) contaminants should be included in the analyses. This determination depends, in part, on previously collected data on contaminant levels in fish tissue and the effluent.

Mines are required to measure levels of mercury in fish tissue if mercury is detected in the effluent (during effluent characterization – Chapter 5) above 0.10 µg/L. An effect in fish tissue, as defined in the MMER, means measurements of concentrations of total mercury that exceed 0.5 µg/g wet weight in fish tissue taken in an exposure area and that are statistically different from and higher than the measurements of concentrations of total mercury in fish tissue taken in a reference area (MMER, Schedule 5, section 1). Other potential metal mine-related contaminants of concern on a site-specific basis include copper, zinc, manganese, cyanide, radium, and uranium.

Chapter 3 recommends that tissue analyses be performed on five composite samples (each composed of at least eight individual fish) of a single species (preferably one sex) for each of both areas. That is, the sample size (n) for the ANOVA is five. This would be sufficient replication to detect an effect size of ±2 SD at power = 0.9, if α and β are set at 0.1 (see Section 3.0). Thus, careful consideration should be given to the appropriate effect size to use for the particular contaminant of concern and whether increased replication may be justified. If lesser effect sizes (i.e., less than 2 SD) or greater power levels are decided to be more appropriate for the contaminant, it will be necessary to increase sample size by analyzing more composite samples.
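The relationship between effect size and replication can be illustrated with a common normal-approximation formula for comparing two means, n = 2 (z_{α/2} + z_{β})^{2} / d^{2}, where d is the effect size in SD units. This is a simplified sketch: the function name is hypothetical, a two-sided test is assumed, and the normal approximation tends to give slightly smaller sample sizes than a calculation based on the t distribution, so it does not replace a full power analysis.

```python
import math
from statistics import NormalDist


def samples_per_area(effect_size_sd, alpha=0.10, beta=0.10):
    """Approximate number of composite samples needed in each area.

    effect_size_sd: difference to detect, in units of SD.
    Assumes a two-sided comparison of two means with equal variances.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)
    z_beta = z(1 - beta)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_sd ** 2)


n_for_2sd = samples_per_area(2.0)   # effect size of 2 SD
n_for_1sd = samples_per_area(1.0)   # smaller effect size needs more samples
print(n_for_2sd, n_for_1sd)
```

Under these assumptions, an effect size of 2 SD with α = β = 0.1 gives five samples per area, and halving the effect size roughly quadruples the requirement.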

Percent lipid and percent moisture should also be reported for each tissue sample. This is for informational purposes only to aid in data interpretation. Statistical differences in percent lipid or moisture do not constitute an effect.

### 8.4.2 Physical Abnormalities

Fish usability can be affected by altered appearance of fish. The data collected during the biological monitoring studies shall be used to identify the sex of the fish sampled and the presence of any lesions, tumours, parasites or other abnormalities (MMER Schedule 5, s. 16(*b*)). Obvious abnormalities may include:

- tumours and/or lesions on the body surface (including the eyes, lips, snout, gills);
- spinal column malformations;
- eroded, frayed or hemorrhagic fins;
- other physical malformations; or
- obvious parasites.

For each class of abnormality that has been noted, a comparison between reference- and exposure-area fish should then be done using a chi-square goodness-of-fit test for relative frequencies. This information is used to help interpret effects, although, for EEM purposes, a significant difference does not necessarily signify an effect. The number of statistical tests that are necessary will depend on the number of classes of abnormalities that are noted in the collected fish. Sample size will have been determined by the number of fish collected for the fish survey. Cohen (1988) provides guidance on the power of a chi-square test that would result from that level of replication.
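For a single class of abnormality, the comparison of relative frequencies between two areas can be sketched as a Pearson chi-square test on a 2 × 2 table (abnormal vs. normal, reference vs. exposure). The function name and counts are hypothetical. No continuity correction is applied, and the p-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal deviate. The approximation is unreliable when expected counts are small.

```python
import math


def chi_square_2x2(ref_abnormal, ref_total, exp_abnormal, exp_total):
    """Pearson chi-square (1 df) comparing abnormality frequencies."""
    table = [
        [ref_abnormal, ref_total - ref_abnormal],
        [exp_abnormal, exp_total - exp_abnormal],
    ]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][c] + table[1][c] for c in range(2)]
    if 0 in col_totals or 0 in row_totals:
        raise ValueError("test is undefined when a row or column total is zero")
    n = sum(row_totals)
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (table[r][c] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, 1 df
    return chi2, p_value


# Made-up counts: fin erosion in 2 of 40 reference and 10 of 40 exposure fish.
chi2, p_value = chi_square_2x2(2, 40, 10, 40)
print(chi2, p_value)
```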

## 8.5 Data Assessment and Interpretation for the Benthic Invertebrate Community Study

The data collected during the benthic invertebrate community survey shall be used to determine the following effect indicators (MMER Schedule 5, s. 16(*a*)(*iii*)):

- total benthic invertebrate density;
- the evenness index;
- taxa richness; and
- the similarity index (referred to in this document as Bray-Curtis Index).

The above effect indicators are to be used for determining statistically significant differences between exposure and reference areas or along an exposure gradient. See Chapter 4 for additional information on these effect indicators. The mean, median, SD, SE, and minimum and maximum values are determined for each effect endpoint for the sampling areas. In addition, an analysis of the results shall be used to determine if there is a statistical difference between the sampling areas for each of the effect indicators (MMER Schedule 5, s. 16(c)).
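As an illustration only, the four indicators might be computed from taxon counts as sketched below. Chapter 4 remains the authority on the exact definitions. This sketch assumes Simpson's evenness, E = 1 / (S Σ p_{i}^{2}), and the standard Bray-Curtis dissimilarity between two samples; the function names, taxa and counts are hypothetical.

```python
def density(counts, sampled_area_m2):
    """Total number of organisms per square metre."""
    return sum(counts.values()) / sampled_area_m2


def taxa_richness(counts):
    """Number of taxa present."""
    return sum(1 for c in counts.values() if c > 0)


def simpson_evenness(counts):
    """E = 1 / (S * sum(p_i^2)), where p_i is the proportion of taxon i."""
    total = sum(counts.values())
    dominance = sum((c / total) ** 2 for c in counts.values())
    return 1 / (taxa_richness(counts) * dominance)


def bray_curtis(counts_a, counts_b):
    """Dissimilarity from 0 (identical) to 1 (no taxa in common)."""
    taxa = set(counts_a) | set(counts_b)
    diff = sum(abs(counts_a.get(t, 0) - counts_b.get(t, 0)) for t in taxa)
    total = sum(counts_a.get(t, 0) + counts_b.get(t, 0) for t in taxa)
    return diff / total


# Made-up station counts.
station_ref = {"Chironomidae": 50, "Baetidae": 50}
station_exp = {"Tubificidae": 10}
print(density(station_ref, 0.5), taxa_richness(station_ref),
      simpson_evenness(station_ref), bray_curtis(station_ref, station_exp))
```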

### 8.5.1 Study Design and Statistical Procedures

Table 8-6 outlines the appropriate statistical procedures that are applicable for analysis for each of the recommended study designs. See Chapter 4 for additional information on these study designs. In contrast to the fish survey, the statistical procedure used to determine whether there has been an effect is dependent on which of the seven study designs is employed. For a given study, all four effect indicators are analyzed using the same study-design-determined statistical procedure. The one exception is the Reference Condition Approach, which uses a different set of statistical procedures that do not require inter-area comparisons of these four indicators, unless accompanied by ANOVAs; the procedures for this study design are outlined below and in Chapter 4.

| Study Design | Statistical Procedure |
|---|---|
| Control-Impact (C-I) | ANOVA |
| Multiple Control-Impact (MC-I) | ANOVA |
| Before/After Control-Impact (BACI) | ANOVA |
| Simple Gradient (SG) | Regression/ANOVA |
| Radial Gradient (RG) | Regression/ANOVA |
| Multiple Gradient (MG) | ANCOVA |
| Reference Condition Approach (RCA) | Multivariate/ANOVA |

Note: Multivariate analyses can be performed on data collected using any of the designs in Table 8-6, to look for patterns that may be useful for highlighting potential areas of concern. Under certain circumstances, ANCOVAs may also be appropriate for any of these designs (e.g., to factor out the effect of a potentially confounding environmental variable).

Although it is possible to use ANOVA to analyze data collected under most of the study designs listed in Table 8-6, ANOVA is most applicable to the control-impact (C-I) and multiple control-impact (MC-I) designs. The simplest of these study designs is the C-I (or reference/exposure) design. In rivers, for example, this consists of one (usually upstream) reference area and one or more downstream exposure areas. Chapter 4 provides guidance on the different ways that C-I designs can be laid out. This type of study design employs ANOVA comparisons between reference and exposure areas, with a significant difference signifying an effect.

The MC-I design is similar to the C-I design, except that it employs additional reference areas that are located in adjacent watersheds or bays where the sampled habitat is comparable to that found within the exposure area. This type of design helps to reduce problems with confounding factors (e.g., when a single reference area differs from an exposure area with respect to several environmental variables in addition to the point-source effluent). Analogous to a C-I design, a significant difference between an exposure area and the mean of the reference areas, as determined by ANOVA, would represent an effect.

ANCOVA can also be used for both C-I and MC-I designs to factor out covariates that may create “noise” that makes it difficult to make simple ANOVA comparisons of reference to exposure areas. For example, without the use of ANCOVA, differences in depth among stations within the reference and exposure areas may mask effluent-related differences that may exist between those areas. This may occur when the benthic invertebrate indicators change along a continuum of increasing depth, and when it is not possible to take all samples at identical depths. In this example, ANCOVA can be used to factor out the effect of the depth covariate so as to focus on the effect of effluent exposure. The same approach can be used for other covariates that influence the benthic invertebrate indicators along a continuum.

An improvement to the above C-I and MC-I designs is possible when data can be collected both before and after initiation of effluent discharge into the receiving water area. This kind of monitoring design has been termed a before/after control-impact (BACI) design (Schmitt and Osenberg 1996). Use of a BACI design helps to distinguish effluent effects from natural differences between reference and exposure areas that may have existed before the initiation of effluent discharge.

In its simplest form, a BACI design entails collecting monitoring data at least once, both before and after initiation of effluent discharge in both a reference and exposure area, with the data analyzed using an area-by-time factorial ANOVA (Green 1979). In this situation, evidence for an effluent effect is inferred when the area-by-time interaction term in the ANOVA is significant. When the reference and exposure areas have been sampled repeatedly during both the before and after periods, it is possible to use a BACI paired series analysis, in which case the potential effects are investigated by testing for a change in delta (difference between reference and exposure) from the before to after period (Schmitt and Osenberg 1996). The design can be further improved by incorporating multiple reference areas (Schmitt and Osenberg 1996; Underwood 1997).
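The paired series idea can be sketched as follows. The function name and data are hypothetical. Delta is taken here as exposure minus reference on each sampling date (the opposite sign convention works equally well if applied consistently), and the change in mean delta is summarized with an unequal-variance (Welch-type) t statistic. Obtaining the p-value from the t distribution is left to a statistical package.

```python
import math
from statistics import mean, variance


def baci_delta_change(ref_before, exp_before, ref_after, exp_after):
    """Change in mean delta (exposure - reference) from before to after.

    Each list holds one value per sampling date; reference and exposure
    lists for the same period must be paired by date.
    Returns (change_in_mean_delta, t_statistic).
    """
    delta_before = [e - r for r, e in zip(ref_before, exp_before)]
    delta_after = [e - r for r, e in zip(ref_after, exp_after)]
    change = mean(delta_after) - mean(delta_before)
    std_error = math.sqrt(variance(delta_before) / len(delta_before)
                          + variance(delta_after) / len(delta_after))
    return change, change / std_error


# Made-up taxa richness on three dates before and three after discharge began.
change, t_stat = baci_delta_change(
    ref_before=[10, 12, 11], exp_before=[11, 12, 12],
    ref_after=[10, 11, 12], exp_after=[6, 6, 9])
print(change, t_stat)
```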

In contrast to the C-I and MC-I designs, the simple gradient (SG) and radial gradient (RG) designs are more amenable to regression analysis. The assumptions for regression analysis are applicable to the analysis of the benthic invertebrate community data, and have already been outlined in the section 8.3.3.2 discussion on ANCOVA (regression is one component of ANCOVA).

For additional information on study designs, refer to chapters 2 and 4.

### 8.5.2 Data Treatment

As for the fish survey, the data should be reported in both graphical and tabular format for each area (reference and exposure area(s)) being summarized. The reported data will include the descriptive statistics (mean, median, SD, SE, and minimum and maximum values) as well as the sample sizes. Gradient data should be presented graphically as scatter plots of variable vs. distance from the effluent outfall. For gradient designs with no discrete “areas,” tabular presentation prior to the main analysis would be applicable to station-by-station summary statistics, with the sampling unit being field sub-samples rather than stations. Station-by-station summary statistics are also applicable to C-I–type designs in cases where field sub-samples are not pooled prior to taxon enumeration, although the key summary statistics are those that are calculated for whole areas (to help with interpreting significant differences [“effects”] among areas).

The same three main analysis steps outlined in section 8.3.3 should be followed to determine whether statistically significant “effects” have occurred:

- The data should be inspected to see whether they satisfy the assumptions of the statistical test or procedure being used (ANOVA, ANCOVA, regression or multivariate analyses).
- The appropriate statistical procedure would be performed following data inspection and any necessary transformations (or non-parametric alternative).
- The key results for the effect indicators should then be presented to clearly indicate whether there has been an effect, with details on the nature of the effect (including direction and magnitude). Again, an effect is declared if the p-value is less than the a priori α value determined, as outlined in section 8.6.

The same considerations and constraints discussed in section 8.3.3 for conducting ANOVA and ANCOVA analyses apply to benthic invertebrate community analyses using those two statistical procedures. Thus, data inspection, analysis and interpretation when using ANOVA or ANCOVA for the benthic invertebrate community survey should follow the generic recommendations provided in section 8.3.3.

Gradient designs are particularly useful for 1) situations where rapid effluent dilution precludes the selection of an exposure area that is comparatively homogeneous in terms of effluent concentration and 2) determining how far along an effluent path the effects are observed (i.e., determining the geographical extent of “effects”). The geographic extent of “effects” can be determined graphically by plotting the response variable(s) against distance from the effluent outfall, and inspecting the data for an inflection point where the response variable asymptotes to the reference condition. Data from sampling stations arrayed in this manner could also be used, together with measured physicochemical data, in a multivariate analysis (e.g., ordination or clustering) that is used to identify which more distant stations tend to group with reference stations and which tend to group with clearly affected stations.

Both of these approaches (graphical plotting and multivariate analysis) look for patterns in the data to qualitatively determine the approximate geographic extent of an effect. That is, they do not necessarily entail hypothesis testing, and therefore, in the context of the EEM program, are not used to designate an effect sufficient to warrant follow-up action, but rather are used for informational purposes.

Nevertheless, statistical tests are possible for some gradients. In the simplest case, an effect would be declared if the slope of the regression of the variable against distance from the effluent source is significantly different from zero, or if the correlation coefficient is statistically significant (data transformations may be necessary to satisfy assumptions of linearity). In this case, the effect is a relatively uniform gradient of variable values away from the point source, rather than an effect in a given discrete area.
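The simplest gradient test described above can be sketched as follows. This is only an illustration under stated assumptions: it uses a permutation test of the Pearson correlation as a stand-in for the parametric regression or correlation test the text refers to, and the station data, the function names (`pearson_r`, `gradient_permutation_p`) and the seed are all hypothetical.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def gradient_permutation_p(distance, response, n_perm=5000, seed=1):
    """Two-sided permutation p-value for the distance-response correlation.

    Swapped-in technique (assumption): the response values are shuffled
    among stations to build the null distribution of |r|.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(distance, response))
    shuffled = list(response)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(distance, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: taxon richness at 10 stations along a gradient.
distance_km = [0.2, 0.5, 1, 2, 3, 5, 8, 12, 16, 20]
richness = [6, 7, 9, 10, 12, 13, 15, 15, 17, 18]
alpha = 0.1  # a priori significance level (see section 8.6)

p_value = gradient_permutation_p(distance_km, richness)
print(f"r = {pearson_r(distance_km, richness):.2f}, p = {p_value:.4f}")
print("gradient effect declared:", p_value < alpha)
```

A transformation of distance (for example a logarithm) could be applied before the test if the relationship is clearly non-linear, in line with the note on linearity above.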

An effect can also be signified by a significant exposure vs. reference ANOVA difference when comparing a group of stations along the gradient close to the mine to “reference” stations along the gradient far from the mine. This is analogous to the C-I approach, and assumes some degree of uniformity in exposure within the exposure group of stations and within the “reference” group of stations. Furthermore, the two groups of stations would need to be far enough apart to represent clear differences in exposure, and a sufficient number of stations would need to be available for each group to attain the desired level of power. Based on the power analysis discussion in the following section, an initial recommendation is to have at least five fairly uniform stations relatively close to the mine (high effluent exposure area) and five fairly uniform stations far enough from the mine to approximate a “reference” area (i.e., minimally affected by the effluent). Providing intermediate stations would likely necessitate a total of at least 15 gradient stations overall.

Regardless of the method of analysis, overall statistical power is usually improved by emphasizing station replication at the two ends of the gradient. Again, emphasis should also be placed on extending the gradient sufficiently far from the mine (as much as is feasible) to allow sampling of stations that are as minimally affected as possible (and that serve as approximate “reference” stations).

Given sufficient sub-samples per station, it is also possible to use ANOVA to determine the presence or absence of an effect for a given station. This would entail using field sub-samples as replicates (treating stations as areas) and making station-by-station ANOVA comparisons between the more highly exposed stations along the gradient and the more distant reference stations. This method of analysis could be used to determine where along the gradient an effect disappears at the given α level of significance. This approach may, however, require extensive sampling effort, depending upon the number of stations along the gradient and the number of field sub-samples per station required by power analysis.

In cases where these kinds of statistical tests are not adequate for a given gradient design, a redesign of the monitoring program will be necessary to enable an appropriate statistical test during the next monitoring study. The redesign may entail increased replication focused on the key exposure and reference areas (or stations) that are to be compared (e.g., increased replication in the area of greatest effluent exposure and in the area with the lowest effluent exposure that best represents reference conditions).

In some cases, it may be necessary to compare exposure vs. reference gradients. This would be the case when a co-occurring (non–mine-related) environmental gradient (i.e., covariate) confounds effluent effects in the exposure area. By using a multiple gradient (MG) design, it may be possible to make statistical comparisons of the exposure area gradient to a similar (non–mine-related) environmental gradient in an unexposed reference area. The reference gradient should be as similar as possible in depth and habitat to the exposure area gradient. Potential effluent “effects” would be tested for by using ANCOVA to compare reference to exposure area regression elevations (or adjusted means), while factoring out the influence of the co-occurring environmental covariate.

For example, if the gradient in effluent exposure away from the mine was confounded by a co-occurring increase in depth, an ANCOVA comparison might be made to a reference area where the depth gradient is the same. If the slopes for the reference and exposure area regressions against the covariate (X = depth) are approximately equal, a significant difference in adjusted means would indicate an effect of the effluent on the effect indicator Y (e.g., taxon richness). Again, section 8.3 provides further guidance for ANCOVA analyses and the different ways these analyses can be used to indicate an effect.

### 8.5.3 Reference Condition Approach

The reference condition approach (RCA) is a study design that combines inspection of multivariate patterns in the data with assessments of whether exposure stations fall outside a given ordination probability ellipse for reference stations. The fundamental concept of the RCA is to establish a database of stations that represent unimpaired conditions (reference stations) at which biological and environmental attributes are measured. This database is used to develop predictive models that match a set of environmental variables to biological conditions. These predictive models then allow a set of environmental measurements to be made at a new station and used in the model to predict the expected biological condition at the new station. An assessment of whether there has been an effect at the exposure station is enabled by a comparison of the actual biological condition at the new (exposure area) station with conditions at the reference stations to which the new station is predicted as belonging.

The reference condition database is established by an initial standardized sampling program at a wide variety of spatial scales. The same benthic macroinvertebrate sampling protocol is used in as many ecoregions and stream orders or lakes as are available in a catchment. A number of environmental variables are measured in conjunction with invertebrate sampling. The data are then subjected to a 3-step multivariate analysis in which:

- a number of invertebrate groups are formed based on similarity of community structure;
- biological data are correlated with environmental attributes, and an optimal set of environmental variables is identified that can be used to predict group membership; and
- the biological condition of test (exposure) stations is assessed by using the optimal set of environmental variables to predict group membership.

How the test station fits, relative to the group to which it is predicted to belong, establishes whether and to what degree the station is different from the reference group. A station or group of stations that falls outside the statistically determined ordination probability ellipse for the reference stations signifies the presence of an effect. The boundaries of the reference ellipse should be set a priori, based on some of the considerations discussed in section 8.6. A more complete discussion of the assumptions, procedures and interpretation of the RCA is available in Reynoldson et al. (1995, 2000) and Bailey et al. (2003).
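The ellipse check described above can be illustrated with a minimal sketch. It assumes a two-axis ordination, approximately bivariate-normal reference scores and a large-sample chi-square boundary; none of these choices is specified in the text, and the function name, the `prob` parameter and the scores are hypothetical. The cited RCA literature should be followed for an actual assessment.

```python
import math

def outside_reference_ellipse(ref_scores, test_score, prob=0.90):
    """Return True if test_score lies outside the `prob` probability ellipse
    fitted to 2-D reference ordination scores (assumed bivariate normal)."""
    n = len(ref_scores)
    mx = sum(x for x, _ in ref_scores) / n
    my = sum(y for _, y in ref_scores) / n
    # Sample variances and covariance of the reference scores.
    sxx = sum((x - mx) ** 2 for x, _ in ref_scores) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in ref_scores) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in ref_scores) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = test_score[0] - mx, test_score[1] - my
    # Squared Mahalanobis distance of the test station from the reference centroid.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    # Chi-square quantile with 2 df has the closed form -2 ln(1 - prob).
    threshold = -2.0 * math.log(1.0 - prob)
    return d2 > threshold

# Hypothetical ordination scores (axis 1, axis 2) for one reference group.
reference = [(0.1, 0.3), (-0.4, 0.2), (0.5, -0.1), (0.0, -0.5),
             (-0.2, -0.3), (0.3, 0.4), (-0.3, 0.0), (0.2, -0.2)]
print(outside_reference_ellipse(reference, (0.1, 0.0)))   # station near the centroid
print(outside_reference_ellipse(reference, (2.5, -2.0)))  # station far from the group
```

The probability level of the ellipse plays the role of 1-α here, which is why the text asks for it to be fixed a priori.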

It should be further noted that, depending on the timing and locations of an RCA sampling program, it may also be possible to use the resulting database to make ANOVA comparisons between reference and exposure areas in order to determine whether there has been an effect. This latter kind of analysis would be analogous to an MC-I design.

To summarize, an overall procedure similar to that outlined in section 8.3 should also be followed (with appropriate modifications) for the benthic invertebrate community survey. However, the power analysis is not applicable to graphical approaches and the RCA. Consequently, RCA studies should be designed in a way that provides an accurate and precise determination of reference conditions so as to maximize the likelihood of detecting departures from reference conditions at exposure stations, when they exist. The following elements may be included as part of an RCA study:

- Preparing the analyses: QA/QC (including checks for data entry errors), summary of confounding factors, description of the sampling design and taxonomic level used, clear identification of the sampling units used for statistical comparisons (e.g., stations rather than field sub-samples), ensuring equivalence of sampling substrata, and sampling techniques among different reference and exposure areas being compared
- Summary statistics (graphical and tabular presentation of means, etc., as described above)
- Statistical analyses (hypothesis testing) to determine “effects” (ANOVA, ANCOVA, regression)
- Graphical approaches (e.g., inspection of the shape of regression lines, which is used for inspecting patterns in the data rather than determining “effects”)
- Multivariate statistical analyses used for determining a) patterns in the data and b) the position in multivariate space of exposure stations relative to reference ordination probability ellipses; only b) is used to determine “effects”
- Power analyses (not applicable to graphical approaches and RCA)

### 8.5.4 Supporting Endpoints

The following benthic invertebrate community supporting endpoints should also be reported, including means, medians, SDs, SEs, minimum and maximum values, and sample sizes:

- Simpson’s diversity
- taxon (e.g., family) density
- taxon (e.g., family) proportion
- taxon (e.g., family) presence/absence

Unlike the effect endpoints (total benthic invertebrate density, the evenness index, taxa richness and the similarity index), the above-listed variables are included as supporting endpoints and are not statistically analyzed to determine “effects.” They may, however, be used to interpret effects at later stages (e.g., determining the magnitude and causes of “effects”). These should be reported in both graphical and tabular format for each area (reference and exposure area(s)) being summarized. It should be noted that there may be other descriptors that may also be useful for the interpretation of monitoring data, on a site-specific basis (see Resh et al. 1995 for a review).
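As a rough illustration of how the supporting endpoints listed above might be computed for one station, the sketch below uses hypothetical family counts and a hypothetical sampled area. It assumes the common form of Simpson's diversity, D = 1 - Σ(p_i)^2, where p_i is the proportion of taxon i; if the study uses a different form of the index, that form should be substituted.

```python
def taxon_proportions(counts):
    """Proportion of total abundance contributed by each taxon."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()}

def simpsons_diversity(counts):
    """Simpson's diversity, assumed here to be 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in taxon_proportions(counts).values())

def taxon_density(counts, area_m2):
    """Individuals per square metre for each taxon."""
    return {taxon: c / area_m2 for taxon, c in counts.items()}

def presence_absence(counts, taxa):
    """True/False for each taxon in a master list of taxa."""
    return {taxon: counts.get(taxon, 0) > 0 for taxon in taxa}

# Hypothetical family-level counts from one station (0.1 m^2 sampled).
station_counts = {"Chironomidae": 120, "Baetidae": 45,
                  "Hydropsychidae": 30, "Tubificidae": 5}
master_list = ["Chironomidae", "Baetidae", "Hydropsychidae",
               "Tubificidae", "Heptageniidae"]

print(round(simpsons_diversity(station_counts), 3))
print(taxon_density(station_counts, area_m2=0.1))
print(presence_absence(station_counts, master_list))
```

Station-level values such as these would then be summarized per area (mean, median, SD, SE, minimum, maximum and n) as described above.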

## 8.6 The Role of Power Analysis, α, β and Critical Effect Size in Determining Effects

### 8.6.1 Setting α and β

In testing whether exposure areas differ significantly from reference areas, a low probability of a Type I error (α) is usually allowed so that a normal population or community will not be mistaken for an affected one. However, the monitoring program should also be designed to provide a reasonably high probability of statistically detecting a predetermined critical effect size (CES) if it has occurred, i.e., the power of the test should be high. Power is 1-β, where β is the Type II error (see below).

Type I error is partially kept in check by setting a broad margin for variation around what is considered “healthy.” Sufficient sampling effort should also be expended to reduce Type II error, taking into account the low probability allowed for Type I error. Thus, to determine what sampling effort is required, the CES and the Type I and Type II error will all be taken into account and set a priori. That is, decisions should be made about the magnitude of Type I and Type II errors that are acceptable for determining power and thus the sampling effort required to detect the recommended CES.

Type I error occurs (at probability α) if the null hypothesis that there is no effect is rejected when in fact it is true (e.g., an exposure area is declared as being different from reference when it is not).

Type II error occurs (at probability β) if the null hypothesis is accepted when it is false (e.g., the exposure area is declared as not being significantly different from reference when it is actually impaired). Therefore, α is the risk to industry and β is the risk to the environment.

The power of a statistical test is 1-β, the probability associated with correctly rejecting the null hypothesis when it is false (e.g., the probability associated with correctly identifying an impaired area). In a well-designed, properly replicated monitoring program, the goal is to keep α and β low and power high.

As can be seen from the equation given later in this section, one way to increase power, given a fixed sampling effort (i.e., sample size), is to increase α, i.e., there are trade-off decisions to be made when setting α and β. Traditionally, α has been set at 0.05 for experimental studies where, in many cases, the cost of a Type II error is not particularly high. That is, an α of 0.05 is typically used in situations where the primary concern is to have maximal confidence that a statistically significant effect is real. On the other hand, there is much less consensus and available literature on what is an appropriate level for β. Some studies have suggested using a minimal power of 0.8 (i.e., β = 0.2) (Alldredge 1987; Cohen 1988; Burd et al. 1990; Osenberg et al. 1994; Keough and Mapstone 1995).

In many cases, this “rule of thumb” can be traced back to Cohen’s seminal work on power analysis (see Cohen 1988), which is primarily geared toward applications in the behavioural sciences. For those types of applications, Cohen contended that Type I errors were likely to be more serious than Type II errors for cases where the biggest concern is to not propagate erroneous conclusions based on incorrect declarations of significant differences. Specifically, he suggested that, if Type I errors were to be considered four times more serious, it might be reasonable to set α at the traditional (in terms of experimental studies) 0.05 and β at 4 × 0.05 = 0.2. He cautioned, however, that this rule of thumb should be ignored for other types of studies where these assumptions are not applicable.

This latter caveat applies to environmental monitoring studies where, because of the potentially high cost (both ecological and monetary) of failing to detect negative impacts, many researchers in the field of biomonitoring argue that α should be set at least to the same level as β (e.g., Alldredge 1987; Underwood 1993; Mapstone 1995). That is, the argument has been widely made that, barring extenuating circumstances, the risk to the environment should not be set greater than that to industry. This suggests that the most reasonable starting point is to set α = β, and this position has been adopted by the EEM program. On a site-specific basis, it may sometimes be decided to 1) set α > β if it can be shown that the risk to the environment is of greater concern than the risk to industry, or to 2) set α < β if it can be shown that the risk to industry is of greater concern.

After deciding to set α = β, it is necessary to make a decision on an appropriate value for α and β. In many cases, this decision will be made within the context of the desired power of the test, the CES that the program is to be designed to detect, and the implications for sampling effort. This decision-making process can be illustrated using Table 8-7 for the benthic invertebrate survey, where the effects on sample size of setting α and β at different levels were examined for detecting a CES of ± 2 SD by using the following power analysis equation, which yields an approximate sample size (n) in one step for the most basic C-I ANOVA design (see also the discussion in the next section for further details) (Guenther 1981; Alldredge 1987):

n = (2(Z_{α} + Z_{β})^{2}(SD/CES)^{2}) + 0.25(Z_{α})^{2}

where:

- n = sample size
- Z_{α} = standard normal deviate for α significance level (Type I error)
- Z_{β} = standard normal deviate for β significance level (Type II error)
- SD = standard deviation
- CES = critical effect size

Table 8-7: Sample sizes required to detect a difference of ± 2 SD

| α | 1-β = 0.99 | 1-β = 0.95 | 1-β = 0.90 | 1-β = 0.80 |
|---|---|---|---|---|
| 0.01 | 14 | 11 | 10 | 8 |
| 0.05 | 11 | 8 | 7 | 5 |
| 0.10 | 9 | 7 | 5 | 4 |
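The entries of Table 8-7 can be recomputed from the one-step equation given above. The sketch below assumes that Z_{α} is taken as a two-tailed deviate, that Z_{β} is one-tailed, and that the result is rounded up to the next whole sample; these conventions are not spelled out in the text, but under them the computed values agree with the table. The function name and its arguments are illustrative.

```python
import math
from statistics import NormalDist

def one_step_sample_size(alpha, power, sd_over_ces=0.5):
    """Approximate n from n = 2(Z_a + Z_b)^2 (SD/CES)^2 + 0.25 Z_a^2.

    sd_over_ces = 0.5 corresponds to a CES of 2 SD.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed (assumption)
    z_beta = NormalDist().inv_cdf(power)           # one-tailed, power = 1 - beta
    n = 2 * (z_alpha + z_beta) ** 2 * sd_over_ces ** 2 + 0.25 * z_alpha ** 2
    return math.ceil(n)                            # round up (assumption)

powers = (0.99, 0.95, 0.90, 0.80)
table = {alpha: [one_step_sample_size(alpha, p) for p in powers]
         for alpha in (0.01, 0.05, 0.10)}

for alpha, row in table.items():
    print(f"alpha = {alpha:.2f}: {row}")
```

Changing `sd_over_ces` shows how quickly the required sample size grows when the CES is smaller than 2 SD.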

Using Table 8-7 for guidance (and the recommendation that the benthic invertebrate community survey should minimally have sufficient power to detect a CES of ±2 SD), the benthic invertebrate working group recommended α and β be initially set at 0.1. This implied that, in most cases, the sampling effort would require a sample size of 5, which is within the range used in many benthic surveys (Resh and McElravy 1993). Basic ANOVA power analysis calculations also indicate that α and β can be set equal to 0.1 for the fish survey effect endpoints as well, with very little effect (relative to α = 0.05, β = 0.2) on the sample size required to achieve the resulting level of power (1-β). The use of an α or β level other than 0.1 would require appropriate justification by either the proponent or the Authorization Officer (e.g., setting a more rigorous, lower Type II error (β) when the risk to the environment is judged to be of greater concern). Consultation with the Authorization Officer may also be required in cases where power analysis recommends the use of unreasonably high sample sizes.

It should also be noted from Table 8-7 that, by increasing sample size, it is possible to obtain lower Type I and II errors (lower α and β) while maintaining α equal to β. For example, α and β can both be set at 0.05, resulting in 95% power to detect a CES of ±2 SD, by increasing sample size to 8 (see Table 8-7). The same argument applies to the other components of EEM (e.g., the fish survey and fish usability components) for different desired CESs, although the required sample sizes will be different. Thus, setting α equal to β provides an economic incentive to carrying out a well-designed, well-replicated monitoring program, because providing sufficient replication will help reduce the probability of Type I errors (i.e., α is kept low), thereby reducing the probability of unnecessary follow-up studies. Furthermore, since α is linked to β, the power of the monitoring program to detect real effects will also be increased. This improvement in monitoring design helps to ensure a better understanding of what types of effects, if any, are occurring.

### 8.6.2 Power Analysis: Determination of Required Sample Size, Power and Appropriate Critical Effect Size

Power analysis is used for two major purposes during EEM:

- at the beginning of a monitoring study (a priori), to calculate the sampling effort (sample size) that will be required to detect a given CES at a given level of power; and
- following a recently completed monitoring study (post hoc), to determine the level of power that was actually achieved.

Both of these uses of power analysis are briefly reviewed here to help clarify the relationship between the two.

#### 8.6.2.1 A Priori Power Calculations

During the initial design phase of an EEM study, power analysis can be used to determine the sample size required to achieve a test adequate to detect an effect equal to a predetermined CES prior to sampling. Using the CES, the probability of Type I error “α,” the probability of Type II error “β,” an estimate of reference variability (e.g., SD for the reference area), and making some assumptions about the distribution of the data being evaluated, a scientifically defensible sampling strategy can be devised. The discussion below outlines the most basic (i.e., C-I ANOVA or ANCOVA) procedure for determining required sample size. Sample size refers to the number of fish for the fish survey and the number of stations for the benthic invertebrate community survey. In cases where the required sample size calculated for one effect endpoint (e.g., invertebrate density/condition) is greater than that calculated for another (e.g., invertebrate taxon richness / relative gonad weight), the greater sample size should be used (unless, as discussed above, consultation with the Authorization Officer confirms that this would result in excessively high sample sizes).

Once the CES has been determined, the levels of α and β have been selected, and the SD for the particular mine location in question has been estimated, they are entered into the power analysis equation to calculate the sample size required to detect an impact of magnitude CES between or among areas at a given power level. For the case where CES is set at ± 2 SD, the SD terms cancel (SD/CES = 1/2), so an estimate of SD is not required for the power analysis, and Table 8-7 above gives pre-calculated sample sizes for various values of α and β.

It should be noted that determination of required sample size assumes that the variability among replicates for the exposure area is similar to that for the reference area. Although ANOVAs are fairly robust with respect to violation of normality assumptions, if the variance within an exposure area is much higher (or lower) than within the reference area, ANOVA comparisons may not be appropriate unless the variances can be made homogeneous by transformation. For the case where the exposure and reference variances remain significantly different following transformation, the power analysis outlined here may overestimate or underestimate the number of sampling stations required. Non-parametric tests may be used in this case; non-parametric power analyses would then be required to estimate required sampling effort (Thomas and Krebs 1997).

For a basic C-I ANOVA or ANCOVA design, the estimated sample size required to detect a given CES at a given power level can be calculated by arranging the standard power analysis equation as follows (Green 1989):

n = 2(t_{α} + t_{β})^{2} (SD/CES)^{2}

where:

n = sample size

t_{α} = value of Student’s t statistic (two-tailed) with (n-1) degrees of freedom (df) at a significance level of α

t_{β} = value of Student’s t statistic (one-tailed) with (n-1) df at a significance level of β

SD = standard deviation

CES = critical effect size, represented in the measurement units of the response variable

The equation is solved iteratively by choosing an approximate value of n (usually 20 for the fish survey) to look up t_{α} and t_{β} and then using the solution to find a more accurate n; the procedure is repeated until arriving at a final estimate for n (see section A1-8 of Appendix 1). Alternatively, the equation given in section 8.6.1 can be used to approximately solve for n in one step. Pre-calculated tables of n (expanding upon Table 8-7) are available for a variety of values of α, β and CES (Alldredge 1987; Cohen 1988).

The reader is referred to the appropriate literature (e.g., Cohen 1988) for guidance on power analysis and tables for determining sample size for regression (simple gradient, radial gradient) and chi-square (analysis of physical abnormalities in fish) monitoring designs. A number of software programs are also available for conducting power analyses for a variety of statistical designs (Thomas and Krebs 1997). As for a basic C-I design, power analysis for these other designs will also require an a priori decision on an appropriate magnitude for CES. For regression analyses, Cohen (1988) gives a table for converting CESs from SD units to a correlation coefficient (r), and in some cases it may be acceptable to use this r to look up the approximate sample size required for a regression-type gradient design. For example, given certain assumptions, he shows that using a CES of 2 SD is equivalent to using r = 0.707 (or r^{2} = 0.5). Although the exact equivalency depends on the assumptions involved, it may be acceptable to use this conversion (possibly with a correction factor) to obtain an approximate CES appropriate for use in regression-type analyses. Tables are provided in Cohen (1988) for looking up required sample sizes for various values of r, α and β.
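The equivalence between a CES of 2 SD and r = 0.707 quoted above can be checked with Cohen's conversion for two groups of equal size, r = d / (d^2 + 4)^{1/2}, where d is the difference expressed in SD units. The short sketch below only illustrates that conversion; the function name is hypothetical, and the equal-group-size assumption is one of the "certain assumptions" the text alludes to.

```python
import math

def d_to_r(d):
    """Convert a standardized difference d (in SD units) to a correlation r,
    assuming two groups of equal size (Cohen 1988)."""
    return d / math.sqrt(d * d + 4)

r = d_to_r(2.0)
print(round(r, 3), round(r * r, 3))  # 0.707 0.5
```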

CESs for the fish survey are percentages of the reference mean and are not represented in the measurement units of the response variable, as these effect sizes would vary for different studies. Therefore, the coefficient of variation (COV), expressed as a percentage of the reference mean (COV = SD / reference mean × 100), is used as a measure of variability in sample size calculations. For a basic fish survey C-I ANOVA design with untransformed data (e.g., as used for the age effect endpoint), the estimated sample size required to detect a given effect size at a given power level can be calculated by using a different version of the equation above. This equation is as follows (Green 1989):

n = 2(t_{α} + t_{β})^{2} (COV/CES)^{2}

where:

COV = coefficient of variation (expressed as a percentage using reference site data)

CES = critical effect size (expressed as a percentage of the reference mean)

For a basic C-I ANCOVA design using log-transformed data (e.g., as used for the relative gonad weight effect endpoint), the estimated sample size required to detect a given CES at a given power level can also be calculated by using a different version of the equation above. This equation is as follows (Green 1989):

n = 2(t_{α} + t_{β})^{2} (SD_{Z}/CES_{Z})^{2}

where:

SD_{Z} = standard deviation of the residuals using log-transformed data

CES_{Z} = log(*f* + 1), where *f* = CES represented as a fraction of the reference mean (e.g., for a CES of 25% ⇒ *f* = 0.25)

For both of the above equations, sample size must be solved iteratively by choosing an approximate value of n to start with as discussed above.
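The iterative solution can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the procedure of Appendix 1: it uses (n-1) degrees of freedom as stated above, computes Student's t values numerically with hypothetical helper functions (`t_cdf`, `t_upper`), and searches for the smallest n that satisfies the equation instead of repeatedly substituting n, because rounding can make repeated substitution alternate between two adjacent values. All input values are hypothetical. The same function serves the SD/CES, COV/CES and SD_{Z}/CES_{Z} forms, since only the ratio of variability to CES enters the equation.

```python
import math

def t_cdf(t, df, steps=400):
    """Student's t CDF: 0.5 plus a Simpson's-rule integral of the density from 0 to t."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    s = pdf(0.0) + pdf(t) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 + s * h / 3

def t_upper(p_tail, df):
    """t value whose upper-tail probability is p_tail, found by bisection."""
    lo, hi = -200.0, 200.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > p_tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def required_n(variability, ces, alpha=0.1, beta=0.1, n_max=1000):
    """Smallest n with n >= 2 (t_alpha + t_beta)^2 (variability/ces)^2, df = n - 1."""
    for n in range(2, n_max + 1):
        t_alpha = t_upper(alpha / 2, n - 1)  # two-tailed
        t_beta = t_upper(beta, n - 1)        # one-tailed
        if n >= 2 * (t_alpha + t_beta) ** 2 * (variability / ces) ** 2:
            return n
    raise ValueError("no n <= n_max satisfies the equation")

# Hypothetical untransformed endpoint: COV = 20% of the reference mean, CES = 25%.
n_cov = required_n(variability=20.0, ces=25.0)

# Hypothetical log10-transformed endpoint: residual SD_Z = 0.10, f = 0.25.
n_log = required_n(variability=0.10, ces=math.log10(1 + 0.25))

print("fish per area (COV form):", n_cov)
print("fish per area (log form):", n_log)
```

For the log-transformed form, the base of the logarithm does not affect the result as long as SD_{Z} and CES_{Z} are computed in the same base, because only their ratio is used.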

#### 8.6.2.2 Post Hoc Power Analyses

After completion of a sampling program, if a non-significant result has been obtained, a post hoc power analysis can be used to calculate the actual power that was available to detect an effect and the minimum CES that could be detected for a given power (Quinn and Keough 2002). This is particularly important if any of the relevant parameters that could affect power (i.e., n, α, CES, SD) have changed since the beginning of the study. In addition, these calculations should be used to make sample size recommendations for the subsequent monitoring study. The post hoc power calculations can be performed by rearranging the formulas above to solve for t_{β} or the CES. For example, to calculate power for the previous two equations, we obtain:

t_{β} = (n/2)^{1/2} (CES/COV) - t_{α}

and

t_{β} = (n/2)^{1/2} (CES_{Z}/SD_{Z}) - t_{α}

Power can then be obtained from the calculated value of t_{β}.
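A post hoc calculation of this kind can be sketched as follows. The example solves the sample size equation for t_{β}, that is, t_{β} = (n/2)^{1/2} (CES/variability) - t_{α}, and converts it to power as the probability that a Student's t variable with (n-1) degrees of freedom falls below t_{β}. The helper functions and all input values are hypothetical, and the t distribution is evaluated numerically so that the sketch needs only the standard library.

```python
import math

def t_cdf(t, df, steps=400):
    """Student's t CDF: 0.5 plus a Simpson's-rule integral of the density from 0 to t."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    s = pdf(0.0) + pdf(t) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 + s * h / 3

def t_upper(p_tail, df):
    """t value whose upper-tail probability is p_tail, found by bisection."""
    lo, hi = -200.0, 200.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > p_tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def post_hoc_power(n, variability, ces, alpha=0.1):
    """Power achieved with n samples per area; variability and ces share units."""
    df = n - 1
    t_alpha = t_upper(alpha / 2, df)                    # two-tailed
    t_beta = math.sqrt(n / 2) * ces / variability - t_alpha
    return t_cdf(t_beta, df)                            # power = 1 - beta

def min_detectable_ces(n, variability, alpha=0.1, power=0.9):
    """Smallest CES detectable with the given n, alpha and power."""
    df = n - 1
    t_sum = t_upper(alpha / 2, df) + t_upper(1 - power, df)
    return variability * t_sum * math.sqrt(2 / n)

# Hypothetical completed study: 12 fish per area, COV = 20%, CES = 25%.
print(f"achieved power: {post_hoc_power(12, 20.0, 25.0):.2f}")
print(f"detectable CES at 90% power: {min_detectable_ces(12, 20.0):.1f}%")
```

The same functions apply to the log-transformed form by passing SD_{Z} as `variability` and CES_{Z} as `ces`.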

## 8.7 Critical Effect Sizes

To ensure that increased monitoring efforts are focused in the appropriate areas, Environment Canada has developed CESs for key fish and benthic invertebrate survey effect endpoints. See Chapter 1 for the table on CESs and for additional information.

## 8.8 Statistical Considerations for Mesocosm Studies

Some considerations would be unique to a mesocosm-type study. For example, control over experimental considerations would likely result in lower levels of variability within reference and exposure treatments, as compared with field data. This may make it possible to attain equivalent levels of statistical power using smaller sample sizes than used in the field. In the same vein, it may be possible to attain higher power levels or to detect smaller effect sizes while using the same sample sizes as used in the field. In fact, it may be desirable to have sufficient power to detect smaller effect sizes in mesocosm studies than in field surveys, due to the shorter exposure times typical of mesocosm studies. That is (using hypothetical numbers), a 10% effluent-induced change over a 30-day exposure period in a mesocosm study may be equivalent to a 25% change over a much longer lifetime exposure in the field.

In addition, due to the possibility of caging artifacts, it may be necessary to switch from using individual fish as the sampling unit for replication (as in the field) to using individual experimental enclosures (mesocosms) as sampling units. Using two mesocosm units (one for reference and one for exposure) with 20 fish each may not be valid, because it may not be possible to separate effluent effects from the effects due to subtle differences in the experimental enclosures. This is an example of the potential for confounding effects due to pseudo-replication (Hurlbert 1984).

In comparison to the fish survey, it may be even more straightforward to substitute mesocosm studies for benthic invertebrate community field monitoring, at least in terms of statistical design and analysis. As for the fish survey, the same steps outlined for data preparation, presentation and analysis would apply. Furthermore, due to comparatively fast turnaround times for changes in invertebrate community structure within mesocosms, it may be possible to use the same effect endpoints as used in the invertebrate field survey (section 8.5). The most likely study design would be analogous to the C-I design (Table 8-6), with ANOVA comparisons being made between replicated reference and exposure mesocosms. The sampling units would be the individual mesocosms (equivalent to “stations” in the field survey). As for fish mesocosms, control over variability under experimental conditions may make it possible to attain greater statistical power or to detect smaller effect sizes (in terms of percentage change) using the same sample sizes as typically used in the field. This increase in *precision* is one of the most frequently cited advantages of using mesocosms in place of field sampling, and is weighed against the disadvantage of a potential decrease in *accuracy* due to using a (hopefully realistic) simulation of actual field conditions.

Chapter 9 provides more extensive discussion on data assessment and interpretation for alternative methods.

## 8.9 References

Alldredge JR. 1987. Sample size for monitoring of toxic chemical sites. Environ Monit Assess 9:143-154.

Bailey RC, Norris RH, Reynoldson TB. 2003. Bioassessment of freshwater ecosystems: using the reference condition approach. Boston (MA): Kluwer Academic Publishers.

Barrett TJ, Tingley MA, Munkittrick KR, Lowell RB. 2010. Dealing with heterogeneous regression slopes in analysis of covariance: new methodology applied to environmental effects monitoring fish survey data. Environ Monit Assess 166(1-4):279-291.

Bartlett JR, Randerson PF, Williams R, Ellis DM. 1984. The use of analysis of covariance in the back-calculation of growth in fish. J Fish Biol 24:201-213.

Bligh EG, Dyer W. 1959. A rapid method of total lipid extraction and purification. Can J Biochem Physiol 37:911-917.

Bolger T, Connolly PL. 1989. The selection of suitable indices for the measurement and analysis of fish condition. J Fish Biol 34:171-182.

Burd BJ, Nemec A, Brinkhurst RO. 1990. The development and application of analytical methods in benthic marine infaunal studies. Adv Mar Biol 26:169-247.

Cohen J. 1988. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale (NJ): Lawrence Erlbaum Associates.

Cook RD. 1977. Detection of influential observation in linear regression. Technometrics 19:15-18.

Cook RD. 1979. Influential observations in linear regression. J Amer Stat Assoc 74:169-174.

Day RW, Quinn GP. 1989. Comparisons of treatments after an analysis of variance in ecology. Ecol Monogr 59:433-463.

Draper NR, Smith H. 1981. Applied regression analysis. 2nd ed. New York (NY): John Wiley & Sons, Inc.

Environment Canada. 1997. Fish survey expert working group report. EEM/1997/6. Ottawa (ON): Environment Canada.

Folch J, Lees M, Sloane Stanley GH. 1957. A simple method for the isolation and purification of total lipides from animal tissues. J Biol Chem 226:497-509.

Fox J. 1997. Applied regression analysis, linear models, and related methods. Thousand Oaks (CA): Sage Publications Inc.

Gibbons DW, Reid JB, Chapman RA. 1993. The New Atlas of Breeding Birds in Britain and Ireland: 1988–1991. London (UK): Poyser.

Gray MA, Curry AR, Munkittrick KR. 2002. Non-lethal sampling methods for assessing environmental impacts using a small-bodied sentinel fish species. Water Qual Res J Can 37:195-211.

Green RH. 1979. Sampling design and statistical methods for environmental biologists. New York (NY): Wiley-Interscience.

Green RH. 1989. Power analysis and practical strategies for environmental monitoring. Environ Res 50:195-205.

Grubbs F. 1969. Procedures for detecting outlying observations in samples. Technometrics 11:1-21.

Guenther WC. 1981. Sample size formulas for normal theory *T* tests. Am Stat 35:243-244.

Hamilton BL. 1977. An empirical investigation of the effects of heterogeneous regression slopes in analysis of covariance. Educ Psychol Meas 37:701-712.

Hubert JJ. 1980. Bioassay. Dubuque (IA): Kendall/Hunt Publishing.

Huitema BE. 1980. The analysis of covariance and alternatives. New York (NY): John Wiley & Sons, Inc.

Hurlbert SH. 1984. Pseudoreplication and the design of ecological field experiments. Ecol Monogr 54:187-211.

Iman RL, Conover WJ. 1982. A distribution-free approach to inducing rank correlation among input variables. Commun Stat Simul Comput 11:311-334.

Jackson DA, Harvey HH, Somers KM. 1990. Ratios in aquatic sciences: statistical shortcomings with mean depth and the morphoedaphic index. Can J Fish Aquat Sci 47:1788-1795.

Keough MJ, Mapstone BD. 1995. Protocols for designing marine ecological monitoring associated with BEK mills. (Technical Report Series 11). National Pulp Mills Research Program. Canberra (AU): Commonwealth Scientific and Industrial Research Organisation.

Lowell RB, Kilgour BW. 2008. Interpreting effluent effects on fish when the magnitude of effect changes with size or age of fish. In: Kidd KA, Allen Jarvis R, Haya K, Doe K, Burridge LE, editors. Proceedings of the 34th Annual Aquatic Toxicity Workshop: September 30 to October 3, 2007, Halifax, Nova Scotia. Can Tech Rep Fish Aquat Sci 2793:82-83.

Mapstone BD. 1995. Scalable decision rules for environmental impact studies: effect size, Type I and Type II errors. Ecol Appl 5:401-410.

Miller RG. 1986. Beyond ANOVA: Basics of applied statistics. New York (NY): John Wiley & Sons, Inc.

Osenberg CW, Schmitt RJ, Holbrook SJ, Abu-Saba KE, Flegal AR. 1994. Detection of environmental impacts: natural variability, effect size and power analysis. Ecol Appl 4:16-20.

Paine RT, Tegner MJ, Johnson EA. 1998. Compounded perturbations yield ecological surprises. Ecosystems 1:535–545.

Peters RH. 1983. The ecological implication of body size. New York (NY): Cambridge University Press. 329 pp.

Quade D. 1967. Rank analysis of covariance. J Amer Stat Assoc 62:1187-1200.

Quinn GP, Keough MJ. 2002. Experimental design and data analysis for biologists. Cambridge (UK): Cambridge University Press.

Randall R, Lee II H, Ozretich R, Lake J, Pruell J. 1991. Evaluation of selected lipid methods for normalizing pollutant bioaccumulation. Environ Toxicol Chem 10:1431-1436.

Resh VH, McElravy EP. 1993. Contemporary quantitative approaches to biomonitoring using benthic macroinvertebrates. In: Rosenberg DM, Resh VH, editors. Freshwater biomonitoring and benthic macroinvertebrates. New York (NY): Chapman and Hall. p. 159-194.

Resh VH, Norris RH, Barbour MT. 1995. Design and implementation of rapid assessment approaches for water resource monitoring using benthic macroinvertebrates. Austral J Ecol 20:108-121.

Reynoldson TB, Bailey RC, Day KE, Norris RH. 1995. Biological guidelines for freshwater sediment based on benthic assessment of sediment (the BEAST) using a multivariate approach for predicting biological state. Austral J Ecol 20:198-219.

Reynoldson TB, Day KE, Pascoe T. 2000. The development of the BEAST: a predictive approach for assessing sediment quality in the North American Great Lakes. In: Wright JF, Sutcliffe DW, Furse MT, editors. Assessing the biological quality of fresh waters: RIVPACS and other techniques. Ambleside (UK): Freshwater Biological Association. p. 165-180.

Ricker WE. 1975. Computation and interpretation of biological statistics of fish populations. Bull Fish Res Board Can (23)2:519–529.

Schmitt RJ, Osenberg CW. 1996. Detecting ecological impacts: concepts and applications in coastal marine habitats. San Diego (CA): Academic Press. 401p.

Shirley EAC. 1981. A distribution-free method for analysis of covariance based on ranked data. Appl Stat 30:158-162.

Sokal RR, Rohlf FJ. 1995. Biometry. 3rd ed. New York (NY): W.H. Freeman.

Thomas L, Krebs CJ. 1997. A review of statistical power analysis software. Bull Ecol Soc Am 78:126-139.

Trippel EA, Harvey HH. 1991. Comparison of methods used to estimate age and length of fishes at sexual maturity using populations of white sucker (*Catostomus commersoni*). Can J Fish Aquat Sci 48:1446-1495.

Underwood AJ. 1993. The mechanics of spatially replicated sampling programmes to detect environmental impacts in a variable world. Austral J Ecol 18:99-116.

Underwood AJ. 1997. Experiments in ecology: their logical design and interpretation using analysis of variance. Cambridge (UK): Cambridge University Press. 504 pp.

**Appendix 1: Step-by-Step Guidance through Statistical Procedures**

- A1.1 Identifying Immature Fish
- A1.2 Summary Statistics
- A1.3 Analysis of Variance
- A1.4 Analysis of Covariance (ANCOVA)
- A1.5 Non-parallel Slopes in Analysis of Covariance
- A1.6 Non-parametric ANCOVA
- A1.7 Issues with the Range of the Covariate
- A1.8 A priori Power Analyses
- A1.9 Post hoc Power Analyses

## Appendix 1: Step-by-Step Guidance through Statistical Procedures

The following provides statistical background and step-by-step guidance through the statistical procedures required for the environmental effects monitoring (EEM) fish survey. This background material and the step-by-step procedures are meant as general guidance, and can be adapted to the particular statistical software package procedures that are being used. Examples are taken from different data sets from pulp and paper EEM cycles to illustrate concepts where possible.

Analysis of covariance (ANCOVA) can be performed as multiple linear regression with indicator variables to represent sites. In an analysis with a reference (ref.) and an exposure (exp.) site, data can be fit to the regression model

*y* = *β*_{0} + *β*_{1}*x*_{1} + *β*_{2}*x*_{2} + *β*_{3}(*x*_{1} · *x*_{2})    (1)

where *y* is the response, *x*_{1} is the covariate, *x*_{2} is an indicator variable for treatment (e.g., 0 for reference and 1 for exposure), and *x*_{1} · *x*_{2} is a covariate-by-treatment interaction term equal to the product of the covariate and the indicator variable for each observation. This model fits the data to two regression lines with distinct intercepts and slopes, namely *y* = *β*_{0} + *β*_{1}*x*_{1} for the reference site and *y* = (*β*_{0} + *β*_{2}) + (*β*_{1} + *β*_{3})*x*_{1} for the exposure site. A test for parallel regression slopes is equivalent to testing the significance of the coefficient of the *x*_{1} · *x*_{2} interaction term (i.e., a test of whether *β*_{3} = 0). If this coefficient is not significant (at the α = 0.05 level of significance), the data can be described by two parallel lines with distinct intercepts. This model is

*y* = *β*_{0} + *β*_{1}*x*_{1} + *β*_{2}*x*_{2}    (2)

The test for differences in the response between treatments can proceed with (2). This test is equivalent to testing whether the two regression lines have equal intercepts (i.e., a test of whether *β*_{2} = 0). If there is no significant difference in response between treatments, the data can be represented by a single regression line without the *β*_{2} term.

Thus, analyzing data using ANCOVA is equivalent to fitting the data to (1) to assess parallel slopes, and testing for differences among sites is equivalent to testing the significance of *β*_{2} in (2). Comparisons to critical effect sizes are made by comparing the percentage difference in adjusted means (mean response adjusted to factor out differences in the covariate values) to predetermined critical effect sizes. This percentage difference can be easily calculated from (2). The coefficient *β*_{2} in (2) is the vertical distance between the two regression lines (i.e., the difference in intercepts) and can be converted into a percentage difference in the response variable as

% difference = (10^{*β*_{2}} − 1) · 100%    (3)

when the response variable is log-transformed (base 10). If desired, each adjusted mean can be calculated by evaluating (2) at the grand mean of the covariate (the average covariate value over all sites) for *x*_{1}, using the appropriate indicator value for *x*_{2}.
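The parallel-slope model (2) and the conversion in (3) can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the procedure of any particular statistics package: the function names and the data are hypothetical, the data are assumed to be log10-transformed already, and the fit relies on the fact that, for a two-site model with a 0/1 indicator, the least-squares slope *β*_{1} equals the pooled within-site slope.

```python
def fit_parallel_ancova(x_ref, y_ref, x_exp, y_exp):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 (x2 = 0 ref, 1 exp)."""
    def centred_sums(x, y):
        mean_x = sum(x) / len(x)
        mean_y = sum(y) / len(y)
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        return mean_x, mean_y, sxx, sxy

    mx_r, my_r, sxx_r, sxy_r = centred_sums(x_ref, y_ref)
    mx_e, my_e, sxx_e, sxy_e = centred_sums(x_exp, y_exp)
    b1 = (sxy_r + sxy_e) / (sxx_r + sxx_e)   # common (pooled) slope
    b0 = my_r - b1 * mx_r                    # reference intercept
    b2 = (my_e - b1 * mx_e) - b0             # exposure minus reference intercept
    return b0, b1, b2


def percent_difference(b2):
    """Equation (3): valid when the response was log10-transformed."""
    return (10 ** b2 - 1) * 100


def adjusted_means(b0, b1, b2, x_ref, x_exp):
    """Evaluate model (2) at the grand mean of the covariate."""
    grand_mean = (sum(x_ref) + sum(x_exp)) / (len(x_ref) + len(x_exp))
    return b0 + b1 * grand_mean, b0 + b1 * grand_mean + b2


# Hypothetical log10 data: both sites share slope 3; exposure is shifted by 0.04.
x_ref = [1.00, 1.10, 1.20, 1.30]
y_ref = [3 * x - 2 for x in x_ref]
x_exp = [1.05, 1.15, 1.25, 1.35]
y_exp = [3 * x - 2 + 0.04 for x in x_exp]

b0, b1, b2 = fit_parallel_ancova(x_ref, y_ref, x_exp, y_exp)
adj_ref, adj_exp = adjusted_means(b0, b1, b2, x_ref, x_exp)
print(round(percent_difference(b2), 2))  # percent difference, exposure vs. reference
```

The sketch only estimates the coefficients. Significance tests on *β*_{2} and *β*_{3} still need to come from the statistical software package being used.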

### A1.1 Identifying Immature Fish

- Calculate the gonadosomatic index: GSI = (gonad weight / body weight) × 100. Immature fish can typically be identified as those with GSI < 1%.
- Plot gonad weight vs. body weight. Immature fish can usually be quickly identified.
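The GSI screen described above can be sketched in a few lines of Python. The function names and fish weights below are hypothetical, and the 1% threshold is the typical value quoted above rather than a fixed rule.

```python
def gonadosomatic_index(gonad_weight, body_weight):
    """GSI as a percentage of body weight (both weights in the same units)."""
    return gonad_weight / body_weight * 100


def is_immature(gonad_weight, body_weight, threshold=1.0):
    """Flag fish whose GSI falls below the threshold (1% by default)."""
    return gonadosomatic_index(gonad_weight, body_weight) < threshold


# Hypothetical (gonad weight, body weight) pairs in grams.
fish = [(12.0, 800.0), (3.5, 650.0), (45.0, 900.0)]
flags = [is_immature(gonad, body) for gonad, body in fish]
print(flags)  # [False, True, False]
```

Flagged fish should still be checked against the gonad weight vs. body weight plot before being excluded.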

Figure A1-1 illustrates a data set with several immature fish. A line representing GSI = 1% is added to help identify immature fish.

**Figure A1-1:** A plot of gonad weight vs. body weight for female *Catostomus macrocheilus*. Line represents GSI = 1% (text description)

Some fish species do not spawn every year, so some individuals will not invest energy in reproduction in a given year. These species can be easily identified from plots of gonad weight vs. body weight, where the data form two different groups corresponding to the spawning and non-spawning fish. When a line of GSI = 1% is added to the plot, the spawning and non-spawning fish can be easily distinguished. See Figure A1-2.

**Figure A1-2:** A plot of gonad weight vs. body weight for female *Lota lota*. Line represents GSI = 1% (text description)

### A1.2 Summary Statistics

- Separate data by species, sex and site (e.g., reference or exposure).
- Plot each data set using a box plot and examine for obvious data entry errors or any unusual observations.

Box plots for the length variable for female *Catostomus commersoni* are shown in Figure A1‑3. The box plot in panel A reveals an unusually long fish at the exposure site. A review of field notes and comments in the spreadsheet indicates that this fish was considerably longer than all other fish. This observation lies far outside the range of values for “Length” and may be considered an outlier.

**Figure A1-3:** Box plots for female *Catostomus commersoni* by site

A. Outlier detected in exposure site. B. Outlier is removed. (text description)

- Calculate and present summary statistics in a table.

Species | Sex | Site | N | Mean | SD* | SE** | Min | Max |
---|---|---|---|---|---|---|---|---|
Catostomus commersoni | F | Exp | 39 | 437.49 | 24.57 | 3.93 | 395 | 496 |
Catostomus commersoni | F | Ref | 40 | 432.18 | 31.46 | 4.97 | 357 | 510 |
Catostomus commersoni | M | Exp | 39 | 405.36 | 19.72 | 3.16 | 367 | 448 |
Catostomus commersoni | M | Ref | 39 | 405.00 | 18.00 | 2.88 | 369 | 448 |
Etheostoma exile | F | Exp | 33 | 3.7492 | 0.349 | 0.0440 | 3.0 | 4.5 |
Etheostoma exile | F | Ref | 31 | 3.7129 | 0.556 | 0.0999 | 2.8 | 5.2 |
Etheostoma exile | M | Exp | 37 | 3.5973 | 0.295 | 0.0485 | 3.0 | 4.1 |
Etheostoma exile | M | Ref | 26 | 3.5346 | 0.277 | 0.0543 | 3.1 | 4.1 |

* Standard deviation

** Standard error
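As an illustration of how the columns of such a table are obtained, the following sketch groups records by species, sex and site and computes N, mean, SD, SE, minimum and maximum. The records are hypothetical and the helper name `summarize` is an assumption. The SD is the sample standard deviation and the SE is SD divided by the square root of N.

```python
import math
import statistics
from collections import defaultdict


def summarize(values):
    """Summary statistics for one species/sex/site group (needs N >= 2)."""
    n = len(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    return {
        "N": n,
        "Mean": statistics.mean(values),
        "SD": sd,
        "SE": sd / math.sqrt(n),
        "Min": min(values),
        "Max": max(values),
    }


# Hypothetical length records: (species, sex, site, length).
records = [
    ("Catostomus commersoni", "F", "Exp", 420.0),
    ("Catostomus commersoni", "F", "Exp", 440.0),
    ("Catostomus commersoni", "F", "Exp", 460.0),
    ("Catostomus commersoni", "F", "Ref", 410.0),
    ("Catostomus commersoni", "F", "Ref", 430.0),
    ("Catostomus commersoni", "F", "Ref", 450.0),
]

groups = defaultdict(list)
for species, sex, site, length in records:
    groups[(species, sex, site)].append(length)

table = {key: summarize(values) for key, values in groups.items()}
for key, row in table.items():
    print(key, row)
```

The same relationship can be used to cross-check reported values. For example, the first row of the table above gives 24.57 / √39 ≈ 3.93, which matches the tabulated SE.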

### A1.3 Analysis of Variance

- Test all variables for normality.
- Test all variables for homogeneity of variances.
- Provide the statistical tests used and the p-value of the tests.
- If statistical assumptions are seriously violated, or are violated when sample sizes are unequal, consider using a non-parametric alternative to analysis of variance (ANOVA) (e.g., Kruskal-Wallis test).
- Provide means (and medians if using non-parametrics) and pooled SD, as well as the test p-value.
- Plot residuals and check for outliers. Observations with studentized residuals of magnitude greater than 4 warrant investigation and potential removal. If any outliers are removed, provide both an analysis with all data and one with outlier(s) removed.
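The quantities reported for a one-way ANOVA (group means, F-ratio and pooled SD) can be sketched as follows. This is a minimal illustration with hypothetical data and a hypothetical function name. It does not compute a p-value, which would normally come from the statistical software package or an F table, and it does not test the normality or homogeneity-of-variance assumptions.

```python
import math


def one_way_anova(groups):
    """F-ratio, degrees of freedom and pooled SD for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "means": means,
        "F": ms_between / ms_within,
        "df": (df_between, df_within),
        "pooled_sd": math.sqrt(ms_within),  # square root of the error mean square
    }


# Hypothetical responses for a reference and an exposure site.
result = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
print(result)
```

For this small example the between-site mean square is 1.5 and the within-site mean square is 1.0, giving F = 1.5 on (1, 4) degrees of freedom and a pooled SD of 1.0.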

**“Weight” - female Catostomus commersoni**

Normality (tested using Anderson-Darling test)

*Catostomus commersoni* exposure fish

*Catostomus commersoni* reference fish

Homogeneity of variances (tested using Levene’s test)

*Catostomus commersoni* fish

Statistical assumptions are met, therefore proceed with analysis of variance

Response: Weight

Results:

Pooled SD = 248.6

**“Age” - female Catostomus commersoni**

Normality (tested using Anderson-Darling test)

*Catostomus commersoni* exposure fish

*Catostomus commersoni* reference fish

Homogeneity of variances (tested using Levene’s test)

*Catostomus commersoni* fish

The assumption of normality was not met for reference fish. Sample sizes are 40 (ref) and 39 (exp). The sample sizes are approximately equal and the assumptions are not seriously violated, so either the parametric ANOVA or a non-parametric alternative to ANOVA may be used. Here we use the non-parametric Kruskal-Wallis test.

Results:

**“Length” - female Catostomus commersoni**

Residual plot – studentized residuals vs. order (order data are entered in spreadsheet)

Outliers are typically regarded as observations whose studentized residuals have a magnitude greater than 4, and they can be easily identified in this plot.

**Figure A1-4:** A plot of studentized residual vs. observation order (in spreadsheet) for the ANOVA on length for female *Catostomus commersoni* (text description)

### A1.4 Analysis of Covariance (ANCOVA)

- Plot the response variable vs. covariate for all sites.
- Inspect plot for a linear trend and appropriate overlap of covariate values.
- Inspect plot for outliers; calculate studentized residuals from the ANCOVA model.
- Consider removing outliers with magnitude > 4 (studentized residual).
- Test residuals for normality (each regression line).
- Test residuals for homogeneity of variances (among regression lines).
- Test homogeneity of regression slopes: fit data to regression model with interaction term and test significance of interaction term. Provide coefficient of determination “R^{2}” for the regression model.
- Test for differences in the response: fit data to regression model without interaction term and test significance of the site (treatment) term. Provide coefficient of determination “R^{2}” for the regression model and the pooled SD (of the residuals).
- Provide adjusted means for each site. Also take the anti-log of the mean if log‑transformed data were used.
- Calculate the percent difference as a percent of the reference site (using anti-logs of adjusted means).

**“Condition” - male Rhinichthys cataractae**

**Figure A1-5:** A plot of log(body weight) vs. log(length) for male *Rhinichthys cataractae*. Data are fit to two distinct regression lines, one for each site (text description)

Overlap of covariate values seems appropriate and there is a linear trend.

Normality (tested using Anderson-Darling test)

*Rhinichthys cataractae* exposure residuals

*Rhinichthys cataractae* reference residuals

Homogeneity of variances (tested using Levene’s test)

*Rhinichthys cataractae* residuals

Homogeneity of regression slopes

*y* = *β*_{0} + *β*_{1}*x*_{1} + *β*_{2}*x*_{2} + *β*_{3}(*x*_{1} · *x*_{2})

R^{2} = 0.9212

*β*_{3} not significant (p-value = 0.337), thus there is no evidence of non-parallel slopes.

Test for differences in the response

*y* = *β*_{0} + *β*_{1}*x*_{1} + *β*_{2}*x*_{2}

R^{2} = 0.9199

*β*_{2} is significant (p-value = 0.0001), thus there is a significant difference in weight between sites.

Adjusted mean for reference weight: 1.3113 g

Adjusted mean for exposure weight: 1.4496 g

(Means are anti-logged to obtain original units when log-transformed; the anti-log of x is 10^{x} if the transformation was log base 10.)

Pooled SD = 0.0420164

Percent difference = 10.54% (calculated as percent of reference using adjusted means)

### A1.5 Non-parallel Slopes in Analysis of Covariance

**Method 1**

**“Relative gonad weight” – male *Catostomus commersoni***

**Figure A1-6:** A plot of log(gonad weight) vs. log(body weight) for male *Catostomus commersoni*. Data are fit to two distinct regression lines, one for each site (text description)

Overlap of covariate values seems appropriate and there is a linear trend. One observation warrants investigation in the exposure group.

A plot of the studentized residuals does not reveal any observations with extremely large magnitudes. See Figure A1-7.

**Figure A1-7:** A plot of studentized residual vs. log(body weight) for male *Catostomus commersoni* data fit to the interaction model *y* = *β*_{0} + *β*_{1}*x*_{1} + *β*_{2}*x*_{2} + *β*_{3}(*x*_{1} · *x*_{2}) (text description)

Normality (tested using Anderson-Darling test)

*Catostomus commersoni* exposure residuals

*Catostomus commersoni* reference residuals

Homogeneity of variances (tested using Levene’s test)

*Catostomus commersoni* residuals

Homogeneity of regression slopes

*y* = *β*_{0} + *β*_{1}*x*_{1} + *β*_{2}*x*_{2} + *β*_{3}(*x*_{1} · *x*_{2})

R^{2} = 0.7710

*β*_{3} significant (p-value = 0.014), thus there is evidence of non-parallel slopes.

Assess influence by plotting Cook’s distance vs. the covariate.

**Figure A1-8:** A plot of Cook’s distance vs. log(body weight) for male *Catostomus commersoni* data fit to the interaction model *y* = *β*_{0} + *β*_{1}*x*_{1} + *β*_{2}*x*_{2} + *β*_{3}(*x*_{1} · *x*_{2}) (text description)

One observation in the exposure group has a large Cook’s distance. Remove and test assumptions again.

Normality (tested using Anderson-Darling test)

*Catostomus commersoni* exposure residuals

*Catostomus commersoni* reference residuals

Homogeneity of variances (tested using Levene’s test)

*Catostomus commersoni* residuals

Homogeneity of regression slopes

*y* = *β*_{0} + *β*_{1}*x*_{1} + *β*_{2}*x*_{2} + *β*_{3}(*x*_{1} · *x*_{2})

R^{2} = 0.7841

*β*_{3} not significant (p-value = 0.205), thus there is no evidence of non-parallel slopes.

Continue with procedure.

**Method 2**

**“Condition” - male *Catostomus catostomus***

**Figure A1-9:** A plot of log(body weight) vs. log(length) for male *Catostomus catostomus*. Data are fit to two distinct regression lines, one for each site (text description)

*y* = *β*_{0} + *β*_{1}*x*_{1} + *β*_{2}*x*_{2} + *β*_{3}(*x*_{1} · *x*_{2})

R^{2} = 0.8530

*β*_{3} significant (p-value = 0.036), thus there is evidence of non-parallel slopes, but R^{2} > 0.8, thus fit parallel model and compare coefficients of determination.

*y* = *β*_{0} + *β*_{1}*x*_{1} + *β*_{2}*x*_{2}

R^{2} = 0.8450

R^{2} for the parallel model is also > 0.8 and is within 0.02 (i.e., 2 percentage points) of R^{2} for the interaction model. Thus, use the parallel model to describe the data and continue with the analysis.
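The Method 2 comparison can be written as a small decision rule. The sketch below is only an illustration of the logic described above: the function name and argument names are hypothetical, and the thresholds (R^{2} > 0.8 and a drop of less than 0.02) are taken from this example.

```python
def can_use_parallel_model(r2_interaction, r2_parallel, r2_min=0.8, max_drop=0.02):
    """Method 2 check when the interaction term is significant.

    Returns True when both models fit well (R^2 > r2_min) and dropping the
    interaction term lowers R^2 by less than max_drop.
    """
    both_fit_well = r2_interaction > r2_min and r2_parallel > r2_min
    small_loss = (r2_interaction - r2_parallel) < max_drop
    return both_fit_well and small_loss


# Values from the male Catostomus catostomus condition example.
print(can_use_parallel_model(0.8530, 0.8450))  # True
```

When the check returns False (for example, when R^{2} for the interaction model is below 0.8), Method 2 does not apply and another method has to be considered.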

**Method 3**

**Figure A1-10a:** A plot of log(gonad weight) vs. log(body weight) for male *Catostomus catostomus*. Data are fit to two distinct regression lines, one for each site (text description)

Homogeneity of regression slopes

*y* = *β*_{0} + *β*_{1}*x*_{1} + *β*_{2}*x*_{2} + *β*_{3}(*x*_{1} · *x*_{2})

R^{2} = 0.4695

*β*_{3} is significant (p-value = 0.036), so there is evidence of non-parallel slopes. A plot of Cook’s distance vs. the covariate reveals no influential points, and R^{2} < 0.8, so Method 2 cannot be applied. Instead:

- Determine the minimum and maximum values of the range of overlap of the covariate between sites.
- Calculate the predicted values of the response for each site (regression line) at these two values of the covariate.
- Calculate a percentage difference (calculated as exposure – reference, expressed as a percentage of reference) at the two values of the covariate.

**Figure A1-10b:** The data from Figure A1-10a but with the minimum and maximum values of the range of overlap of the covariate between sites identified (text description)

Covariate values 2.4314 and 2.7782

For 2.4314: predicted values for the response are 1.0949 (ref) and 1.1544 (exp).

For 2.7782: predicted values for the response are 1.4963 (ref) and 1.4883 (exp).

Thus percent differences are calculated to be (after taking the anti-log of the predicted response values) 14.69% and –1.84% for the covariate values of 2.4314 and 2.7782, respectively. These will be the estimates of the effects for smaller and larger fish, respectively, and can be compared to a critical effect size.
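The percent differences quoted above can be approximately reproduced from the predicted values with the following sketch. The function name is hypothetical and the predicted values are assumed to be on a log10 scale. Because the published predicted values are rounded to four decimal places, the results agree with 14.69% and –1.84% only to within about 0.02 percentage points.

```python
def percent_difference_from_logs(pred_ref, pred_exp):
    """Percent difference (exposure - reference) / reference after anti-logging.

    Both predictions are assumed to be log10 values of the response.
    """
    return (10 ** pred_exp / 10 ** pred_ref - 1) * 100


# Predicted log10 responses at the two ends of the covariate overlap.
at_minimum = percent_difference_from_logs(1.0949, 1.1544)  # covariate = 2.4314
at_maximum = percent_difference_from_logs(1.4963, 1.4883)  # covariate = 2.7782
print(round(at_minimum, 1), round(at_maximum, 1))
```

Each of the two values is then compared with the critical effect size, one as the estimate for smaller fish and the other for larger fish.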

### A1.6 Non-parametric ANCOVA

**“Relative gonad weight” – female Catostomus commersoni**

**Figure A1-11:** A plot of log(gonad weight) vs. log(body weight) for female *Catostomus commersoni*. Data are fit to two distinct regression lines, one for each site (text description)

The distributions of covariate values for the two sites are not very similar.

Sample sizes = 26 (ref) and 25 (exp)

*Catostomus commersoni* exposure residuals

*Catostomus commersoni* reference residuals

Homogeneity of variances (tested using Levene’s test)

*Catostomus commersoni* residuals

- Only the assumption of homogeneity of variances is not met. Because the sample sizes are almost equal, either the parametric ANCOVA or the non-parametric ANCOVA on the ranks of the data could be used.

**Non-parametric ANCOVA on the ranks**

Response: Gonad weight ranks

Covariate: Body weight ranks

*y* = *β*_{0} + *β*_{1}*x*_{1} + *β*_{2}*x*_{2} + *β*_{3}(*x*_{1} · *x*_{2})

R^{2} = 0.8299

*β*_{3} not significant (p-value = 0.364), thus there is no evidence of non-parallel slopes.

Test for differences in the response

*y* = *β*_{0} + *β*_{1}*x*_{1} + *β*_{2}*x*_{2}

R^{2} = 0.8268

*β*_{2} is significant (p-value < 0.0001), thus there is a significant difference in gonad weight between sites.

Comparisons to critical effect sizes can sometimes be made by calculating a percentage difference using the adjusted mean ranks. This percent difference is simply the difference in adjusted mean ranks, calculated as exposure – reference, expressed as a percentage of the reference adjusted mean rank. The adjusted mean ranks can be calculated by evaluating Equation 2 (in the statistical background discussion at the beginning of this appendix) using the mean covariate rank for *x*_{1} and using the appropriate indicator value for *x*_{2} if the regression approach to ANCOVA is being used.

Adjusted mean for reference gonad weight rank: 30.8660

Adjusted mean for exposure gonad weight rank: 20.9394

Pooled SD = 6.31325 (ranks)

Percent difference = -32.16% (calculated as percent of reference using adjusted mean ranks: (20.9394 – 30.8660) / 30.8660 · 100)

Note: parametric ANCOVA will give a percent difference of -28.90%.
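The rank transformation that precedes the non-parametric ANCOVA can be sketched as follows. This is an illustration under stated assumptions: observations from both sites are pooled before ranking, tied values receive the average of their ranks, and the function name and data are hypothetical. The ranked response and ranked covariate are then analyzed with the same regression models as in the parametric case.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


# Hypothetical gonad weights: reference fish first, then exposure fish.
gonad_ref = [10.0, 20.0, 20.0]
gonad_exp = [5.0, 15.0]
pooled_ranks = average_ranks(gonad_ref + gonad_exp)
rank_ref = pooled_ranks[:len(gonad_ref)]
rank_exp = pooled_ranks[len(gonad_ref):]
print(rank_ref, rank_exp)  # [2.0, 4.5, 4.5] [1.0, 3.0]
```

The covariate (body weight) would be ranked in the same way before fitting the models to the ranks.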

### A1.7 Issues with the Range of the Covariate

**Range of covariate values not similar between sites**

- Look for a subset of the data where there is good overlap in the covariate values for each site. For example, consider the data set for male *Pleuronectes americanus* relative gonad weight in Figure A1-12a. The ranges of the covariate for the reference and exposure sites are quite different: the reference site has several smaller fish. We can take a subset of the data (excluding fish with log(length) < 1.375) and obtain a data set with similar ranges of the covariate and good overlap. The analysis can be performed on this subset of the data (illustrated in Figure A1-12b). An analysis with all the data may be performed for comparison purposes, but caution should be used in interpreting its results.
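One way to find a candidate subset is to compute the range of covariate values shared by both sites and keep only the fish inside it. The sketch below is an assumption-laden illustration, not the procedure used for the figures: in the example above the cut-off (log(length) of 1.375) was chosen by inspecting the plot, whereas this sketch derives the limits automatically. The function names and data are hypothetical.

```python
def overlap_range(x_ref, x_exp):
    """Lowest and highest covariate values covered by both sites."""
    low = max(min(x_ref), min(x_exp))
    high = min(max(x_ref), max(x_exp))
    return low, high


def subset_to_range(records, low, high):
    """Keep (covariate, response) pairs whose covariate lies in [low, high]."""
    return [(x, y) for x, y in records if low <= x <= high]


# Hypothetical (log(length), log(body weight)) pairs.
reference = [(1.30, 2.10), (1.40, 2.35), (1.50, 2.62)]
exposure = [(1.38, 2.31), (1.45, 2.50), (1.52, 2.70)]

low, high = overlap_range([x for x, _ in reference], [x for x, _ in exposure])
reference_subset = subset_to_range(reference, low, high)
exposure_subset = subset_to_range(exposure, low, high)
print(low, high, len(reference_subset), len(exposure_subset))
```

A plot of the subset should still be inspected, because a shared range does not by itself guarantee that the covariate values are similarly distributed within it.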

**Figure A1-12a:** A plot of log(body weight) vs. log(length) for male *Pleuronectes americanus*. Data are fit to two distinct regression lines, one for each site (text description)

**Figure A1-12b:** A plot of log(body weight) vs. log(length) for male *Pleuronectes americanus*. A subset of the data in Figure A1-12a using only fish with log(length) > 1.375 (text description)

**Covariate observed only at a few values**

- Figure A1-13 is an example of a data set where the covariate is observed at only a few values. Such data sets are typical for weight-at-age analyses for small-bodied fish but may arise with other data sets. ANCOVA may be inappropriate.
- Perform a one-way ANOVA on body weight (factor: site) for fish aged 1.
- Perform a one-way ANOVA on body weight (factor: site) for fish aged 2.
- If sample sizes for an age group are too small for analysis, provide means and sample sizes.

**Figure A1-13:** A plot of log(body weight) vs. age for female *Fundulus heteroclitus*. Data are fit to two distinct regression lines, one for each site (text description)

### A1.8 A priori Power Analyses

**“Age” female Perca flavescens**

We would like to determine what sampling effort is required to detect a 25% difference in age for female *Perca flavescens*. The following data are available from the fish survey from the previous cycle at the same mill. See section 8.6.2.1 for further explanations and definition of terms.

Species | Sex | Site | N | Mean | SD | SE | Min | Max |
---|---|---|---|---|---|---|---|---|
Perca flavescens | F | Exposure | 30 | 4.100 | 1.094 | 0.200 | 3 | 8 |
Perca flavescens | F | Reference | 29 | 3.759 | 1.300 | 0.241 | 2 | 7 |

- Suppose the probability of type I error (α) and the probability of type II error (β) are chosen to be 0.05 and 0.2, respectively (this is for illustrative purposes only; for most cases in the EEM program, type I and type II error should be set equal; α=β).
- The coefficient of variation (COV) for the reference site can be calculated to be COV = 1.300/3.759 · 100 = 34.58%.
- Our critical effect size (CES) is 25%.
- We will start with n = 20 and solve the following iteratively for the estimated n:

  n = 2(t_{α} + t_{β})^{2}(COV/CES)^{2}

  [t_{α} calculated as two-tailed with (n − 1) df, t_{β} calculated as one-tailed with (n − 1) df]
- Using n = 20, α = 0.05, and β = 0.2 we obtain t_{α} = 2.093 and t_{β} = 0.861, so n = 2(2.093 + 0.861)^{2}(34.58/25)^{2} = 33.39, which rounds up to 34.
- Using n = 34, α = 0.05, and β = 0.2 we obtain t_{α} = 2.035 and t_{β} = 0.853, so n = 2(2.035 + 0.853)^{2}(34.58/25)^{2} = 31.9, which rounds up to 32.
- Using n = 32, α = 0.05, and β = 0.2 we obtain t_{α} = 2.040 and t_{β} = 0.853, so n = 2(2.040 + 0.853)^{2}(34.58/25)^{2} = 32.03 ≈ 32.
- The estimate has stabilized at n = 32.

Approximately 32 female *Perca flavescens* will be needed from each site (reference and exposure) to detect a difference of 25% in age.

**“Relative gonad weight” female Perca flavescens**

We would like to determine what sampling effort is required to detect a 25% difference in relative gonad weight for female *Perca flavescens*. The following results are available from the ANCOVA from the previous cycle at the same mill. See section 8.6.2.1 for further explanations and definition of terms.

- Sample sizes: 29 (ref), 30 (exp)
- Pooled SD (of residuals) using log-transformed data = 0.0743033 (this is also equal to the square root of the mean square error term obtained from fitting the data to the parallel-slope ANCOVA model).
- Suppose the probability of type I error (α) and the probability of type II error (β) are chosen to be 0.05 and 0.2, respectively (this is for illustrative purposes only; for most cases in the EEM program, type I and type II error should be set equal; α=β).
- SD_{Z} = 0.0743033
- CES_{Z} = log(0.25 + 1) = log(1.25) = 0.09691
- We will start with n = 20 and solve the following iteratively for the estimated n:

  n = 2(t_{α} + t_{β})^{2}(SD_{Z}/CES_{Z})^{2}

  [t_{α} calculated as two-tailed with (n − 1) df, t_{β} calculated as one-tailed with (n − 1) df]
- Using n = 20, α = 0.05, and β = 0.2 we obtain t_{α} = 2.093 and t_{β} = 0.861, so n = 2(2.093 + 0.861)^{2}(0.0743033/0.09691)^{2} = 10.26, which rounds up to 11.
- Using n = 11, α = 0.05, and β = 0.2 we obtain t_{α} = 2.228 and t_{β} = 0.879, so n = 2(2.228 + 0.879)^{2}(0.0743033/0.09691)^{2} = 11.34, which rounds up to 12.
- Using n = 12, α = 0.05, and β = 0.2 we obtain t_{α} = 2.201 and t_{β} = 0.876, so n = 2(2.201 + 0.876)^{2}(0.0743033/0.09691)^{2} = 11.13, which rounds up to 12.
- The estimate has stabilized at n = 12.

Approximately 12 female *Perca flavescens* will be needed from each site (reference and exposure) to detect a difference of 25% in relative gonad weight.
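The iterative calculations in the two examples above can be sketched in Python using only the standard library. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and because the standard library has no Student's t quantile function, the quantiles are obtained numerically (Simpson's rule on the t density, followed by bisection). In practice a statistical package would normally supply t_{α} and t_{β}.

```python
import math


def t_cdf(t, df, steps=1000):
    """Student's t CDF by Simpson's rule on the density (steps must be even)."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    const /= math.sqrt(df * math.pi)

    def pdf(u):
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_quantile(p, df):
    """Quantile of Student's t for 0.5 <= p < 1, found by bisection."""
    low, high = 0.0, 50.0
    for _ in range(60):
        mid = (low + high) / 2
        if t_cdf(mid, df) < p:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def estimate_n(n, sd_over_ces, alpha=0.05, beta=0.2):
    """One pass of n = 2 (t_alpha + t_beta)^2 (SD/CES)^2 with (n - 1) df."""
    df = n - 1
    t_alpha = t_quantile(1 - alpha / 2, df)  # two-tailed
    t_beta = t_quantile(1 - beta, df)        # one-tailed
    return 2 * (t_alpha + t_beta) ** 2 * sd_over_ces ** 2


def iterate(n_start, sd_over_ces, rounds=3):
    """Repeat the calculation, rounding each estimate up to the next integer."""
    n, history = n_start, []
    for _ in range(rounds):
        estimate = estimate_n(n, sd_over_ces)
        history.append((n, round(estimate, 2)))
        n = math.ceil(estimate)
    return history


print(iterate(20, 34.58 / 25))           # age example (COV/CES)
print(iterate(20, 0.0743033 / 0.09691))  # relative gonad weight example (SD_Z/CES_Z)
```

The sketch always rounds up. In the age example the final estimate of 32.03 is taken as 32, so deciding when the estimate has stabilized involves some judgment rather than a strict rule.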

### A1.9 Post hoc Power Analyses

**“Condition” female Catostomus commersoni**

In this example, a non-significant result is obtained for the condition effect endpoint for female *Catostomus commersoni*. A post hoc power analysis is performed to determine the power of the test to detect the CES. We are given the following output from the ANCOVA using log(body weight) as the response variable and log(length) as a covariate. See section 8.6.2.2 for further explanations and definition of terms.

Source | Sum-of-Squares | Degrees of Freedom | Mean-Square | F-Ratio | p-value |
---|---|---|---|---|---|
log(length) | 0.121190 | 1 | 0.119427 | 119.32 | <0.001 |
Site | 0.000046 | 1 | 0.000046 | 0.05 | 0.831 |
Error | 0.027025 | 27 | 0.001001 | | |
Total | 0.148261 | 29 | | | |

- The CES for condition is 10% of the reference mean (converted to CES_{Z} below), and the probability of type I error (α) initially used for the above ANCOVA in this example was 0.05 (0.831 is greater than 0.05, so the exposure vs. reference comparison was declared as being non-significant).
- The power calculation (see section 8.6.2.2 for explanations and definition of terms) uses the following values:
- CES_{Z} = log(*f* + 1), where *f* = CES represented as a fraction of the reference mean. So CES_{Z} = log(0.1 + 1) = log(1.1) = 0.0413926
- n = 15 for each site, thus t_{α} = 2.145
- t_{β} = 1.538 corresponds to β = 0.1486
- Power = 1 - β = 0.8514

The test had a moderate level of power (Power = 0.8514) to detect a difference of 10%. However, the type II error (β = 0.1486) was not low enough to equal the type I error (α = 0.05), and the EEM program recommends that α be set equal to β (risk to industry set equal to risk to the environment). Thus, a higher α value should preferably have been used for this ANCOVA before declaring non-significance, so that α = β. In this particular case, the ANCOVA p-value of 0.831 was quite high, so the exposure vs. reference comparison would still have been declared non-significant even if α had been set as high as β = 0.1486 (p = 0.831 > 0.1486). Rerunning the power analysis at higher α levels would result in lower β levels, so further post hoc power analysis is not necessary in this case to be confident in declaring non-significance. Future monitoring efforts at this facility should use some combination of greater sample sizes and higher α values to ensure sufficiently high power to detect the CES of interest. Thus, the study proposal for the next round of monitoring should include appropriate a priori power analyses.

**Appendix 2: Graphical and Tabular Representation of Data**

**List of Figures:**

- **Figure A2-1:** Decisional flow chart outlining the various processes data should go through for fish and benthic effect endpoints and linking these to tabular and graphical examples present in this appendix
- **Figure A2-2:** Box plots of descriptive statistics for age by fish species and sex
- **Figure A2-3:** Analysis of Variance (ANOVA) results of mean age of fish taken from reference and exposure areas (mean and standard error)
- **Figure A2-4:** Linear regression of fish liver weight at body weight as an example of effect summary for liver weight or gonad size
- **Figure A2-5:** Descriptive statistics for benthic invertebrate total density using a control/impact design
- **Figure A2-6:** Analysis of Variance (ANOVA) results of benthic invertebrate total density using a control/impact design
- **Figure A2-7:** Plot of benthic invertebrate total density vs. distance from diffuser using a simple gradient design

**List of Tables:**

- **Table A2-1:** Descriptive statistics for age by fish species and sex
- **Table A2-2:** Analysis of Variance (ANOVA) results for fish age by species and sex
- **Table A2-3:** Analysis of Covariance (ANCOVA) results for liver weight at body weight by sex and by species
- **Table A2-4:** Fish result summary table
- **Table A2-5:** Descriptive statistics for total benthic invertebrate density
- **Table A2-6:** Analysis of Variance (ANOVA) results for benthic invertebrate total density
- **Table A2-7:** Summary of all benthic invertebrate descriptive statistics
- **Table A2-8:** Summary table of all benthic invertebrate results
- **Table A2-9:** Overall summary of site effects

## Appendix 2: Graphical and Tabular Representation of Data

**Figure A2-1:** Decisional flow chart outlining the various processes data should go through for fish and benthic effect endpoints and linking these to tabular and graphical examples present in this appendix (text description)

**Figure A2-2:** Box plots of descriptive statistics for age by fish species and sex (text description)

**Figure A2-3:** Analysis of Variance (ANOVA) results of mean age of fish taken from reference and exposure areas (mean and standard error) (text description)

Note: Bars with different letters **are** significantly different. The vertical bar is a mean and the horizontal variance bars represent the standard errors.

**Figure A2-4:** Linear regression of fish liver weight at body weight as an example of effect summary for liver weight or gonad size – male *Catostomus* sp. (text description)

**Figure A2-5:** Descriptive statistics for benthic invertebrate total density using a control/impact design (text description)

**Figure A2-6:** Analysis of Variance (ANOVA) results of benthic invertebrate total density using a control/impact design (text description)

Note: Bars with the same letters are **not** significantly different. Values reported are means and associated standard errors.

**Example: Simple Gradient Design**

**Figure A2-7:** Plot of benthic invertebrate total density vs. distance from diffuser using a simple gradient design (text description)

**Table A2-1:** Descriptive statistics for age by fish species and sex

| Location | Mean | SD* | SE** | (n) | Max. | Min. |
|---|---|---|---|---|---|---|
| Reference | 4.23 | 1.16 | 0.19 | 39 | 8.00 | 3.00 |
| Exposure | 4.93 | 0.93 | 0.14 | 46 | 6.00 | 3.00 |

* Standard deviation

** Standard error

**Table A2-2:** Analysis of Variance (ANOVA) results for fish age by species and sex

| Source of Variation | Sum of Squares (SS) | Degrees of freedom (df) | Mean Square (MS) | F-Ratio | p-value | sig. at p < 0.05 |
|---|---|---|---|---|---|---|
| Between groups | 0.072624 | 1 | 0.072624 | 11.25004 | 0.001202 | Yes |
| Within groups | 0.535802 | 83 | 0.006455 | | | |
| Total | 0.608426 | 84 | | | | |
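As a consistency check on an ANOVA table such as the fish age table above, the mean squares and the F-ratio can be recomputed from the sums of squares and degrees of freedom. The following is a minimal Python sketch; the function name `anova_f_ratio` is illustrative and not part of the guidance. The p-value is not recomputed because that needs an F-distribution routine, which is outside the Python standard library.

```python
def anova_f_ratio(ss_between, df_between, ss_within, df_within):
    """Return (MS_between, MS_within, F) for a one-way ANOVA table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# Sums of squares and degrees of freedom from the fish age ANOVA table
ms_b, ms_w, f_ratio = anova_f_ratio(0.072624, 1, 0.535802, 83)
print(f"MS between = {ms_b:.6f}, MS within = {ms_w:.6f}, F = {f_ratio:.2f}")
```

The recomputed F-ratio agrees with the tabulated value to within the rounding of the reported sums of squares.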

**Table A2-3:** Analysis of Covariance (ANCOVA) results for liver weight at body weight by sex and by species

| Area | N | Slope (log-transformed) | Slope SD | R-Squared (R^{2}) | Slopes different? (p-value) | Slopes sig. at p < 0.05 | Least Squares Mean (LSM, log-transformed) | LSM SD | Means different? (p-value) | Means sig. at p < 0.05 | Antilog LSM | Magnitude difference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Reference | 39 | 1.3 | 0.0547 | 0.727 | - | - | 0.95 | 0.0624 | - | - | 8.93 | - |
| Exposure | 38 | 1.03 | 0.0632 | 0.5135 | 0.212 | no | 1.04 | 0.0616 | 0.001 | yes | 10.96 | 23% |

**Table A2-4:** Fish result summary table

| Trophic Level | Species | Sex | Response | Effect Endpoint | Effect? | Direction | Magnitude |
|---|---|---|---|---|---|---|---|
| Fish | *Catostomus catostomus* (Longnose Sucker) | F | Survival | Age | NA | | |
| | | | Energy Use | Weight-at-age | NA | | |
| | | | | Relative gonad weight | No | | |
| | | | Energy Storage | Condition | Yes | ref < exp | 7%^{2} |
| | | | | Relative liver weight | Yes | ref < exp | 21%^{2} |
| | | M | Survival | Age | NA | | |
| | | | Energy Use | Weight-at-age | NA | | |
| | | | | Relative gonad weight | No | | |
| | | | Energy Storage | Condition | Yes | ref < exp | 6%^{2} |
| | | | | Relative liver weight | Yes | ref < exp | 23%^{2} |
| | *Cottus ricei* (Spoonhead Sculpin) | F | Survival | Age | Yes | ref < exp | 8%^{2} |
| | | | Energy Use | Weight-at-age | Yes | ref < exp | 52%^{1} |
| | | | | Relative gonad weight | Yes | ref < exp | 57%^{2} |
| | | | Energy Storage | Condition | Yes | ref < exp | 31%^{1} |
| | | | | Relative liver weight | Yes | ref < exp | 62%^{2} |
| | | M | Survival | Age | Yes | ref < exp | 8%^{2} |
| | | | Energy Use | Weight-at-age | Yes | ref < exp | 106%^{1} |
| | | | | Relative gonad weight | Yes | ref < exp | 11%^{2} |
| | | | Energy Storage | Condition | Yes | ref < exp | 18%^{2} |
| | | | | Relative liver weight | Yes | ref < exp | 52%^{2} |

^{1} ANCOVA is done and the slopes **are** significantly different. See Appendix 1 for guidance on calculating magnitude of effect.

^{2} Magnitude calculated by comparing the adjusted means between reference and exposed sites (if data were log-transformed, magnitude is calculated on the antilog of the adjusted means). In this case, the slopes **are not** significantly different and so the adjusted means can be compared directly. (The equation is: [(exposed adjusted mean – reference adjusted mean) / reference adjusted mean] x 100.)
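The magnitude equation in footnote 2 can be sketched in a few lines of Python. The function name is hypothetical. The inputs are the log10 least squares means reported for male *Catostomus* sp. liver weight in the ANCOVA results table (0.95 for reference, 1.04 for exposure). Because those LSMs are rounded to two decimals, the back-transformed reference mean comes out near 8.91 rather than the tabulated 8.93, but the resulting magnitude still rounds to the tabulated 23%.

```python
def magnitude_percent(lsm_ref_log10, lsm_exp_log10):
    """Percent difference between exposure and reference adjusted means.

    Inputs are adjusted means on the log10 scale, so each is
    back-transformed (antilog) before applying
    [(exposed - reference) / reference] x 100.
    """
    ref = 10 ** lsm_ref_log10
    exp = 10 ** lsm_exp_log10
    return (exp - ref) / ref * 100


# Rounded log10 LSMs from the liver weight ANCOVA table
mag = magnitude_percent(0.95, 1.04)
print(f"Magnitude of difference: {mag:.0f}%")
```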

**Table A2-5:** Descriptive statistics for total benthic invertebrate density

| Location | Mean | SD | SE | (n) | Max. | Min. |
|---|---|---|---|---|---|---|
| Reference | 4986.85 | 2011.21 | 899.44 | 5 | 8062.02 | 2442.73 |
| Near-field* | 8062.73 | 2135.30 | 954.94 | 5 | 10360.31 | 5535.88 |
| Far-field* | 7685.04 | 3205.63 | 1433.60 | 5 | 11027.65 | 2717.00 |

* Near-field = high effluent exposure; Far-field = low effluent exposure

**Table A2-6:** Analysis of Variance (ANOVA) results for benthic invertebrate total density

| Source of Variation | SS | df | MS | F | p-value | Sig. at p < 0.05 |
|---|---|---|---|---|---|---|
| Between Groups | 2.81E+07 | 2 | 1.41E+07 | 2.236 | 0.15 | No |
| Within Groups | 7.55E+07 | 12 | 6.29E+06 | | | |
| Total | 1.04E+08 | 14 | | | | |

**Table A2-7:** Summary of all benthic invertebrate descriptive statistics

| Effect Endpoint | Location | Mean | SD | SE | (n) | Max. | Min. |
|---|---|---|---|---|---|---|---|
| Taxa | Ref | 19.60 | 1.52 | 0.68 | 5 | 21 | 18 |
| | Near-field* (NF) | 21.20 | 1.48 | 0.66 | 5 | 23 | 19 |
| | Far-field* (FF) | 20.00 | 1.87 | 0.84 | 5 | 23 | 18 |
| Density | Ref | 4986.85 | 2011.21 | 899.44 | 5 | 8062.02 | 2442.73 |
| | NF | 8062.73 | 2135.30 | 954.94 | 5 | 10360.31 | 5535.88 |
| | FF | 7685.04 | 3205.63 | 1433.60 | 5 | 11027.65 | 2717.00 |
| Simpson’s Evenness | Ref | 0.77 | 0.03 | 0.014 | 5 | 0.82 | 0.75 |
| | NF | 0.81 | 0.03 | 0.015 | 5 | 0.86 | 0.78 |
| | FF | 0.67 | 0.04 | 0.017 | 5 | 0.71 | 0.63 |
| Bray-Curtis | Ref | 0.24 | 0.11 | 0.05 | 5 | 0.42 | 0.14 |
| | NF | 0.37 | 0.10 | 0.05 | 5 | 0.48 | 0.24 |
| | FF | 0.44 | 0.09 | 0.04 | 5 | 0.55 | 0.34 |

* Near-field = high effluent exposure; Far-field = low effluent exposure

**Table A2-8:** Summary table of all benthic invertebrate results

| Trophic Level | Effect Endpoint | Effect? | Direction | Magnitude^{1} (%) | Magnitude^{1} (SD) |
|---|---|---|---|---|---|
| Benthos | Density | No | | | |
| | Number of taxa | No | | | |
| | Simpson’s Evenness | Yes | ref > FF | 13% | 3.33 |
| | | Yes | NF > FF | 17% | 4.67 |
| | Bray-Curtis | Yes | ref < FF | 83% | 1.82 |

^{1} For a control/impact design, magnitude of effect should be reported both as the % difference from the reference area, [(exposure mean – reference mean) / reference mean] x 100, and standardized for the SD of the reference area, (exposure mean – reference mean) / reference SD.
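The two magnitude measures for a control/impact design can be checked against the benthic descriptive statistics with a short Python sketch. The function name is illustrative. The signs returned here indicate direction (negative means the exposure value is lower than reference), whereas the summary table reports absolute values with the direction in a separate column.

```python
def benthic_magnitude(ref_mean, exp_mean, ref_sd):
    """Return (% difference from reference, difference in reference SDs)."""
    diff = exp_mean - ref_mean
    return diff / ref_mean * 100, diff / ref_sd


# Reference vs far-field means and reference SD from the descriptive statistics
bc_pct, bc_sd = benthic_magnitude(ref_mean=0.24, exp_mean=0.44, ref_sd=0.11)
ev_pct, ev_sd = benthic_magnitude(ref_mean=0.77, exp_mean=0.67, ref_sd=0.03)

print(f"Bray-Curtis:        {bc_pct:.0f}% ({bc_sd:.2f} SD)")
print(f"Simpson's Evenness: {ev_pct:.0f}% ({ev_sd:.2f} SD)")
```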

**Table A2-9:** Overall summary of site effects

| Trophic Level | Species | Sex | Response | Effect Endpoint | Effect? | Direction | Magnitude^{3} (%) | Magnitude^{3} (SD) |
|---|---|---|---|---|---|---|---|---|
| Fish | *Catostomus catostomus* (Longnose Sucker) | F | Survival | Age | NA | | | |
| | | | Energy use | Weight-at-age | NA | | | |
| | | | | Relative gonad weight | No | | | |
| | | | Energy storage | Condition | Yes | ref < exp | 7%^{2} | |
| | | | | Relative liver weight | Yes | ref < exp | 21%^{2} | |
| | | M | Survival | Age | NA | | | |
| | | | Energy use | Weight-at-age | NA | | | |
| | | | | Relative gonad weight | No | | | |
| | | | Energy storage | Condition | Yes | ref < exp | 6%^{2} | |
| | | | | Relative liver weight | Yes | ref < exp | 23%^{2} | |
| | *Cottus ricei* (Spoonhead Sculpin) | F | Survival | Age | Yes | ref < exp | 8%^{2} | |
| | | | Energy use | Weight-at-age | Yes | ref < exp | 52%^{1} | |
| | | | | Relative gonad weight | Yes | ref < exp | 57%^{2} | |
| | | | Energy storage | Condition | Yes | ref < exp | 31%^{1} | |
| | | | | Relative liver weight | Yes | ref < exp | 62%^{2} | |
| | | M | Survival | Age | Yes | ref < exp | 8%^{2} | |
| | | | Energy use | Weight-at-age | Yes | ref < exp | 106%^{1} | |
| | | | | Relative gonad weight | Yes | ref < exp | 11%^{2} | |
| | | | Energy storage | Condition | Yes | ref < exp | 18%^{2} | |
| | | | | Relative liver weight | Yes | ref < exp | 52%^{2} | |
| Benthos | | | | Density | No | | | |
| | | | | Number of taxa | No | | | |
| | | | | Simpson’s Evenness | Yes | ref > FF | 13% | 3.33 |
| | | | | | Yes | NF > FF | 17% | 4.67 |
| | | | | Bray-Curtis | Yes | ref < FF | 83% | 1.82 |

^{1} ANCOVA is done and the slopes **are** significantly different. See Appendix 1 for guidance on calculating magnitude of effect.

^{2} Magnitude calculated by comparing the adjusted means between reference and exposed sites (if data were log-transformed, magnitude is calculated on the antilog of the adjusted means). In this case, the slopes **are not** significantly different and so the adjusted means can be compared directly. (The equation is: [(exposed adjusted mean – reference adjusted mean) / reference adjusted mean] x 100.)

^{3} For benthic invertebrate community surveys following a control/impact design, magnitude of effect should be reported both as the % difference from the reference area, [(exposure mean – reference mean) / reference mean] x 100, and standardized for the SD of the reference area, (exposure mean – reference mean) / reference SD.

## Appendix 3: Case study – ANCOVA and Power Analysis for Fish Survey

A case example is provided to demonstrate the application of some of the methods recommended above. The data were collected during a previous adult fish survey at a Canadian pulp mill. In this particular example, the mill was a bleached-kraft operation and discharged effluent into a lake receiving environment. The reference area was an adjacent bay of the lake that exhibited natural habitat characteristics similar to those of the near-field (high effluent exposure) area and did not receive any allochthonous discharges. The sentinel fish species selected for the survey was White Sucker (*Catostomus commersoni*). The sample sizes approximated those recommended for the fish survey, with the exception of males at the near-field area:

| | Near-field (high effluent exposure) area | Reference area |
|---|---|---|
| Males | 12 | 22 |
| Females | 26 | 24 |

The data set included information on length, weight, liver weight, sex, gonad weight, and age of male and female adult sucker. Fecundity estimates were not available.

After the initial step of ensuring the data set was free of transcription errors, the mean and standard deviation of each variable were calculated per sex and sampling area (Table A3‑1). Mathematical procedures were conducted separately for males and females. Normal probability plots were generated for each variable (per sex and area) to identify extreme outliers and to assess normality of the data. Examination of these plots did not indicate obvious extreme outliers with the exception of one male from the near-field area. Residual plots from the Analysis of Variance (ANOVA) / ANCOVA models can also be used to inspect the data.

*Near-field = High effluent exposure

As previously outlined, most of the results were derived using ANCOVA. For the purposes of illustration, a detailed description is provided for the parameter size (length)‑at-age for female White Sucker (one of the supporting endpoints for the fish survey). For these calculations, both length and age were log_{10}-transformed.

The first step is to conduct the preliminary test of equality of slopes. The model statement for this analysis of size-at-age is:

*log(length) = constant + area + log(age) + area*log(age)*,

where the interaction term *area*log(age)* represents the test for equality of slopes of the area regression lines, and *log(age)* is the covariate. From the ANCOVA table, it is evident that the interaction term, *area*log(age)*, is not significant (P=0.376) (Table A3-2a). This tells us that the slopes of the regression lines for each area can be treated as being approximately parallel. It also tells us that the interaction term can be dropped from the model, and we can proceed to the ANCOVA model:

*log(length) = constant + area + log(age)*,

where *area* represents the test for differences in adjusted means. The mean square error (MSE) from the resulting ANCOVA table provides the estimate of variability (MSE = 0.00033) for length-at-age (Table A3-2b). While conducting the above analyses, the residuals from the preliminary and ANCOVA models can be saved for the purpose of assessing the assumptions of normality and homogeneity of variance.

**Table A3-2a:** Preliminary analysis (test of equality of slopes)

| Source | Sum-of-Squares (SS) | Degrees of Freedom (df) | Mean-Square (MS) | F-Ratio | p-Value |
|---|---|---|---|---|---|
| Area | 0.00086 | 1 | 0.00086 | 2.62639 | 0.11193 |
| Log(age) | 0.02126 | 1 | 0.02126 | 64.58474 | < 0.0001 |
| Area*Log(age) | 0.00026 | 1 | 0.00026 | 0.79780 | 0.37640 |
| Error | 0.01514 | 46 | 0.00033 | | |

**Table A3-2b:** ANCOVA (test of adjusted means)

| Source | SS | df | MS | F-Ratio | p-Value |
|---|---|---|---|---|---|
| Area | 0.01240 | 1 | 0.01240 | 37.57576 | < 0.0001 |
| Log(age) | 0.02167 | 1 | 0.02167 | 65.66667 | < 0.0001 |
| Error | 0.01541 | 47 | 0.00033 | | |
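The two-step procedure (test of equality of slopes, then test of adjusted means) can be sketched in plain Python by comparing residual sums of squares between nested regression models. This is an illustrative extra-sum-of-squares formulation under stated assumptions, not the statistical package output used for the case study. The function names and the small data set are hypothetical, and p-values are omitted because the standard library has no F-distribution. With two areas and 50 fish, the error degrees of freedom computed this way are 46 and 47, matching the two ANCOVA tables.

```python
def centred_sums(points):
    """Return (n, Sxx, Sxy, Syy) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return n, sxx, sxy, syy


def ancova_two_step(groups):
    """groups maps an area name to a list of (covariate, response) pairs."""
    k = len(groups)
    per_area = [centred_sums(pts) for pts in groups.values()]
    n_total = sum(s[0] for s in per_area)

    # Model 1: separate intercept and slope per area (interaction model).
    sse_separate = sum(syy - sxy ** 2 / sxx for _, sxx, sxy, syy in per_area)
    # Model 2: separate intercepts, one common slope (ANCOVA model).
    sxx_w = sum(s[1] for s in per_area)
    sxy_w = sum(s[2] for s in per_area)
    syy_w = sum(s[3] for s in per_area)
    sse_parallel = syy_w - sxy_w ** 2 / sxx_w
    # Model 3: a single regression line for all areas (no area term).
    pooled = [p for pts in groups.values() for p in pts]
    _, sxx_t, sxy_t, syy_t = centred_sums(pooled)
    sse_single = syy_t - sxy_t ** 2 / sxx_t

    df_separate = n_total - 2 * k      # error df, slopes test
    df_parallel = n_total - k - 1      # error df, adjusted-means test
    f_slopes = ((sse_parallel - sse_separate) / (k - 1)) / (sse_separate / df_separate)
    f_area = ((sse_single - sse_parallel) / (k - 1)) / (sse_parallel / df_parallel)
    return {
        "f_slopes": f_slopes, "df_slopes_error": df_separate,
        "f_area": f_area, "df_area_error": df_parallel,
        "mse": sse_parallel / df_parallel,
    }


# Hypothetical (covariate, response) pairs; NOT the case-study data.
data = {
    "reference": [(1, 1.0), (2, 2.1), (3, 2.9), (4, 4.0)],
    "near_field": [(1, 1.5), (2, 2.6), (3, 3.4), (4, 4.5)],
}
result = ancova_two_step(data)
print(result)
```

In this made-up data set the two lines are exactly parallel, so the slopes F-ratio is essentially zero and the `mse` entry is the value that would be carried forward into a sample size calculation.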

**Calculation of Sample Size**

To calculate sample size, the Z-value power equation described earlier can be used. As a reminder, the equation is:

*n* = 2 (Z_{α} + Z_{β})^{2} (SD/CES)^{2} + 0.25Z_{α}^{2}

The square root of the mean square error from the ANCOVA model substitutes for the SD in the power equation. The critical effect size (CES) refers to the effect or difference in the parameter that one wishes to detect. For the purpose of this example and the remaining parameters of the case study, sample sizes were calculated for a CES of 5, 10, 20, 50 and 100% (i.e., differences between areas).

Many of the parameters calculated for the fish survey are typically log-normally distributed and require log transformations. To calculate sample sizes, SD and CES should be expressed in logarithms. Note, however, that for the purposes of the fish environmental effects monitoring (EEM) survey, 1 should not be added to values before taking logarithms, because doing so has undesirable effects on the calculated variances when measurement units are changed. A difference in logarithms is equivalent to multiplying or dividing by some factor. For example, if the difference in log length between two areas is 0.301, then the fish from one area are twice the length (antilog 0.301 = 2) of the fish from the other area. In the following table, CES has been expressed in logarithms with the corresponding antilog; these values of CES correspond roughly with those used for untransformed data:

| Critical effect size (logarithm) | 0.0212 | 0.0414 | 0.0792 | 0.176 | 0.301 |
|---|---|---|---|---|---|
| Critical effect size (antilog) | 1.05 | 1.10 | 1.20 | 1.50 | 2.00 |
| % increase* | 5 | 10 | 20 | 50 | 100 |
| % decrease^{†} | 5 | 9 | 17 | 33 | 50 |

* In exposure area vs reference

^{†} In exposure area vs reference
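The conversions in the table above can be reproduced with a short Python sketch, assuming base-10 logarithms as used elsewhere in this case study. The function name `ces_row` is illustrative. A multiplicative factor of 1.5, for example, corresponds to a 50% increase in one direction and a decrease of about 33% in the other, which is why the two percentage rows differ.

```python
import math


def ces_row(pct_increase):
    """Return (log10 CES, antilog factor, equivalent % decrease)."""
    factor = 1 + pct_increase / 100           # antilog of the CES
    ces_log = math.log10(factor)              # CES on the log10 scale
    pct_decrease = (1 - 1 / factor) * 100     # same factor applied as a decrease
    return ces_log, factor, pct_decrease


for pct in (5, 10, 20, 50, 100):
    ces_log, factor, pct_decrease = ces_row(pct)
    print(f"{pct:>3}% increase: log CES = {ces_log:.4f}, "
          f"antilog = {factor:.2f}, % decrease = {pct_decrease:.0f}")
```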

Therefore for length-at-age (log_{10} data):

- SD = (mean square error)^{0.5} = (0.00033)^{0.5} = 0.01817
- Z_{α(2)} (2-tailed test) = 1.96
- Z_{β} (1-tailed test) = 1.282
- CES = 5%, i.e., 0.0212 on the log scale (see above table)

*n* = 2 (1.96 + 1.282)^{2} (0.01817/0.0212)^{2} + 0.25(1.96)^{2}

*n* = 16.4 (or, rounding up, n=17)

Similarly for the remaining effect sizes, the estimated sample sizes (i.e., number of fish to be sampled per area) would be:

| CES | 5% | 10% | 20% | 50% |
|---|---|---|---|---|
| n | 17 | 5 | 3 | 2 |
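The Z-value sample size calculation above can be sketched in Python as follows. This is a minimal illustration, not the program used to produce the tabulated values: the function name `sample_size` and its parameter names are assumptions, and it uses unrounded normal quantiles and an unrounded log10 CES instead of the rounded constants 1.96, 1.282 and 0.0212. The defaults (α = 0.05 two-tailed, power = 0.90) follow the values stated for this case study.

```python
import math
from statistics import NormalDist


def sample_size(mse, pct_effect, alpha=0.05, power=0.90):
    """Fish per area needed to detect a given % difference (log10 data).

    mse        -- mean square error from the ANCOVA/ANOVA on log10 data
    pct_effect -- critical effect size as a % increase (e.g. 5 for 5%)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)        # two-tailed
    z_beta = z.inv_cdf(power)                 # one-tailed
    sd = math.sqrt(mse)                       # SD = sqrt(MSE)
    ces = math.log10(1 + pct_effect / 100)    # CES on the log10 scale
    return 2 * (z_alpha + z_beta) ** 2 * (sd / ces) ** 2 + 0.25 * z_alpha ** 2


# Female length-at-age, CES = 5%
n = sample_size(mse=0.00033, pct_effect=5)
print(f"n = {n:.1f}, rounded up to {math.ceil(n)}")
```

Because the mean square error is reported to only two significant figures, a recomputed value that falls very close to a whole number (as happens for this endpoint at CES = 10%) can differ by one fish from the tabulated sample size.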

The estimate of variability and sample size calculations for gonad weight, liver weight and condition for female and male sucker were calculated in the same fashion as described for length-at-age (Table A3-3). In all but one case, the slopes of the reference/near‑field regression lines were equal and the mean square errors from the ANCOVA model were used as the estimate of variability. For male White Sucker, the slopes of the regressions of log(weight) on log(length) (i.e., condition) were not equal among areas (P=0.0068). To investigate whether the one possible outlier (male, near-field) influenced the ANCOVA, it was rerun without this data point. In this case, the regressions were homogeneous between areas. This was partially a consequence of the low sample size (i.e., an increased influence of an outlier on the regression) and should be noted when reporting the data.

For mean age, the mean square error from the one-way ANOVA was used to estimate the variability (Table A3-3).

The final results of the sample size calculations (Table A3-3) indicate that the maximum numbers of fish needed to be collected from each area were approximately 703 males and 738 females to detect a 5% difference between areas (CES), 185 males and 194 females to detect a 10% difference, 52 males and 54 females to detect a 20% difference, and 12 males and 12 females to detect a 50% difference. Among all the parameters, mean age was the most variable and required the highest sample size to detect differences.

**Table A3-3:** Example survey: numbers of fish needed to detect significant differences in fish endpoints among areas, using the model mean square error as the estimate of variability

| Endpoint | Sex | Model | Mean Square Error (log data) | Estimated Sample Size (number of fish/area), CES=5% | CES=10% | CES=20% | CES=50% |
|---|---|---|---|---|---|---|---|
| Length-at-age | Male | ANCOVA | 0.00014 | 8 | 3 | 2 | 2 |
| | Female | ANCOVA | 0.00033 | 17 | 5 | 3 | 2 |
| Weight-at-age | Male | ANCOVA | 0.00211 | 100 | 27 | 9 | 3 |
| | Female | ANCOVA | 0.00295 | 139 | 38 | 11 | 3 |
| Condition | Male | ANCOVA^{1} | N/A | - | - | - | - |
| | Female | ANCOVA | 0.00100 | 48 | 14 | 5 | 2 |
| Liver Weight | Male | ANCOVA | 0.00994 | 466 | 123 | 35 | 8 |
| | Female | ANCOVA | 0.00626 | 294 | 78 | 22 | 6 |
| Gonad Weight | Male | ANCOVA | 0.00881 | 413 | 110 | 31 | 7 |
| | Female | ANCOVA | 0.01013 | 475 | 126 | 35 | 8 |
| Mean Age | Male | ANOVA | 0.01499 | 703 | 185 | 52 | 12 |
| | Female | ANOVA | 0.01574 | 738 | 194 | 54 | 12 |

^{1} Preliminary analysis (test of slopes) conducted as first step to ANCOVA was significant (i.e., slopes not parallel).

## Figures and Tables

**Table 8-1** outlines the expected precision and summary statistics of required fish survey measurements. Measurement requirements to be assessed include length, total body weight, age, gonad weight, egg size, fecundity, weight of liver or hepatopancreas, abnormalities, and sex. Each measurement requirement is accompanied by its expected precision, and a reporting of summary statistics.

**Table 8-2** outlines the fish survey effect indicators and endpoints for various study designs. The primary effect indicators include growth, reproduction, condition and survival. Each effect indicator is accompanied by the identification of supporting endpoints in the case of each of the three study designs: standard survey, non-lethal sampling, and a study based on wild molluscs.

**Table 8-3** provides the supporting endpoints to be used for supporting analyses. The effect indicators (energy use and energy storage) are aligned with their supporting endpoints and the necessary statistical procedures.

**Table 8-4** provides a summary of effect endpoints analyzed using ANCOVA. The primary effect endpoints include condition, relative liver weight, relative gonad weight, weight-at-age, size-at-age, and relative fecundity. Each effect endpoint is aligned with a response variable and a covariate.

**Table 8-5** exhibits the Fish Tissue effect with supporting endpoints and statistical procedures. Variables and statistical procedures of the effect endpoint and supporting endpoints are identified.

**Table 8-6** outlines the statistical procedure used to determine an effect for each of the seven study designs. Each study design is aligned accordingly with its statistical procedure.

**Table 8-7** provides the sample sizes required to detect a difference of plus or minus two standard deviations for given values of α (0.01, 0.05 and 0.10) and 1-*β* (0.99, 0.95, 0.90 and 0.80).

**Figure A1-1** is a scatter plot showing gonad weight vs. body weight for female Catostomus macrocheilus. The X axis represents body weight, while the Y axis represents gonad weight. The line in the graph represents GSI = 1%

**Figure A1-2** is a scatter plot illustrating gonad weight vs. body weight for female Lota lota. The X axis represents body weight, while the Y axis represents gonad weight. The line in the graph represents GSI = 1%.

**Figure A1-3** displays box plots for female Catostomus commersoni by site. Image A shows an outlier detected in the exposure site, while in image B, the outlier is removed.

**Table A1-1** outlines the summary statistics for “length”. Each species is aligned with different factors, including sex, site, number, mean length, standard deviation, standard error, minimum length, and maximum length.

**Figure A1-4** is a scatter plot showing studentized residual vs. observation order for the ANOVA on length for the female Catostomus commersoni. The X axis represents the observation order, while the Y axis represents the studentized residual.

**Figure A1-5** is a scatter plot illustrating body weight vs. length for male Rhinichthys cataractae. The X axis represents length, while the Y axis represents body weight. Data are fit to two distinct regression lines, one for each site.

**Figure A1-6** is a scatter plot displaying gonad weight vs. body weight for male Catostomus commersoni. The X axis represents body weight, while the Y axis represents gonad weight. Data are fit to two distinct regression lines, one for each site.

**Figure A1-7** is a scatter plot illustrating studentized residual vs. body weight for male Catostomus commersoni data fit to the interaction model *y* = *β*_{0} + *β*_{1}*x*_{1} + *β*_{2}*x*_{2} + *β*_{3}(*x*_{1} · *x*_{2}). The X axis represents body weight, while the Y axis represents studentized residual.

**Figure A1-8** is a scatter plot showing Cook’s distance vs. body weight for male Catostomus commersoni data fit to the interaction model *y* = *β*_{0} + *β*_{1}*x*_{1} + *β*_{2}*x*_{2} + *β*_{3}(*x*_{1} · *x*_{2}). The X axis represents body weight, while the Y axis represents Cook’s distance.

**Figure A1-9** is a scatter plot illustrating body weight vs. length for male Catostomus catostomus. The X axis represents length, while the Y axis represents body weight. Data are fit to two distinct regression lines, one for each site.

**Figure A1-10a** is a scatter plot showing gonad weight vs. body weight for male Catostomus catostomus. The X axis represents body weight, while the Y axis represents gonad weight. Data are fit to two distinct regression lines, one for each site.

**Figure A1-10b** is a scatter plot showing the data from Figure A1-10a but with the minimum and maximum values of the range of overlap of the covariate between sites identified.

**Figure A1-11** is a scatter plot illustrating gonad weight vs. body weight for female Catostomus commersoni. The X axis represents body weight, while the Y axis represents gonad weight. Data are fit to two distinct regression lines, one for each site.

**Figure A1-12a** is a scatter plot showing body weight vs. length for male Pleuronectes americanus. The X axis represents length, while the Y axis represents body weight. Data are fit to two distinct regression lines, one for each site.

**Figure A1-12b** is a scatter plot illustrating body weight vs. length for male Pleuronectes americanus, providing a subset of the data in Figure A1-12a using only fish with a length greater than 1.375. The X axis represents length, while the Y axis represents body weight.

**Figure A1-13** is a scatter plot showing body weight vs. age for female Fundulus heteroclitus. The X axis represents age, while the Y axis represents body weight. Data are fit to two distinct regression lines, one for each site.

**Figure A2-1** is a decisional flow chart outlining the various processes data should go through for fish and benthic effect endpoints. The flowchart links these processes to tabular and graphical examples present in this appendix.

**Figure A2-2** shows box plots of descriptive statistics for age by fish species and sex. The box plot range is between the 30th and 70th percentiles, while the error bar range is between the 10th and 90th percentiles. The mean age is represented by a dashed line, and the median age is represented by a full line.

**Figure A2-3** is a graph illustrating the Analysis of Variance (ANOVA) results of mean age of fish taken from reference and exposure areas. The vertical bar represents the mean age, while the horizontal variance bars represent the standard errors.

**Figure A2-4** is a graph showing the linear regression of fish liver weight at body weight as an example of effect summary for liver weight or gonad size, using the example of a male Catostomus. The X axis represents log of fish weight, while the Y axis represents log of liver weight.

**Figure A2-5** is a graph providing descriptive statistics for benthic invertebrate total density using a control/impact design. The box plot range is between the 30th and 70th percentiles, while the error bar range is between the 10th and 90th percentiles. The mean density is represented by a dashed line, and the median density is represented by a full line.

**Figure A2-6** is a graph illustrating the Analysis of Variance (ANOVA) results of benthic invertebrate total density using a control/impact design. Values reported are means and associated standard errors.

**Figure A2-7** is a scatter plot displaying benthic invertebrate total density vs. distance from diffuser using a simple gradient design. The X axis represents the distance from the diffuser (in kilometres), while the Y axis represents the total density (number of individuals per square meter).

**Table A2-1** outlines descriptive statistics for age by fish species and sex. Using the example of the female Cottus sp. species, information on location, mean age, standard deviation, standard error, number of specimens, maximum age and minimum age is provided.

**Table A2-2** outlines the Analysis of Variance (ANOVA) results for fish age by species and sex. Sources of variation include between groups, within groups, and the total. Other information, such as the sum of squares, degrees of freedom, the mean square, the F-ratio, p-value and significance at p smaller than 0.05 is provided.

**Table A2-3** outlines the Analysis of Covariance (ANCOVA) results for liver weight at body weight by sex and by species. Using male Catostomus sp. as an example, ANCOVA results are provided for the reference area and the exposure area. For each area, the number of specimens, the slope of the regression line (log-transformed data), the standard deviation, and R-squared are provided. To answer the question of whether the slopes are different, the reported p-value is compared to 0.05. The least squares means (LSM) and standard deviation for each of the two areas are also provided, as are the antilog of each LSM and the magnitude of the difference as a percentage. The second question, whether the means (LSM) are different, is answered by comparing the reported p-value to 0.05.

**Table A2-4** is a fish result summary table. Information provided includes trophic level (fish), species, sex, response, effect endpoint, effect, direction, and magnitude.

**Table A2-5** outlines descriptive statistics for total benthic invertebrate density. Primary descriptive statistics include location, mean, standard deviation, standard error, number of samples, maximum density, and minimum density.

**Table A2-6** provides the Analysis of Variance (ANOVA) results for benthic invertebrate total density. Sources of variation include between groups, within groups, and the total. Other information, such as the sum of squares, degrees of freedom, the mean square, the F-ratio, the p-value and significance of reported p value compared to 0.05 is provided.

**Table A2-7** outlines a summary of all benthic invertebrate descriptive statistics. Primary descriptive statistics include effect endpoint, location, standard deviation, standard error, and number of samples. The means, maximums and minimums are provided for taxa, density, Simpson’s Evenness, and Bray-Curtis.

**Table A2-8** illustrates a summary table of all benthic invertebrate results. The trophic level (benthos), effect endpoint, effect, direction, and magnitude are provided. Effect endpoints include density, number of taxa, Simpson’s Evenness, and Bray-Curtis.

**Table A2-9** provides an overall summary of site effects. The trophic level (first fish, then benthos), for fish: species, sex, response, effect endpoint, effect, direction, and magnitude are provided; for benthos: effect endpoint, effect, direction and magnitude are provided.

**Table A3-1** outlines the mean, standard deviation (SD) and sample size (n) of measurements recorded on White Sucker (Catostomus commersoni) during the example survey. Values are reported for each sex and area, and the recorded measurements include fork length, body weight, gonad weight, liver weight, and age.

**Table A3-2** illustrates the ANCOVA of size-at-age (length vs. age) for female White Sucker. The information is provided in two tables. Table A outlines a preliminary analysis of the equality of slopes. Sources include area, log (age), area multiplied by log (age), and error. Information identified for each source includes sum-of-squares, degrees of freedom, mean-square, F-ratio, and p-value. Table B is the ANCOVA table (test of adjusted means). Sources include area, log (age), and error. As in Table A, each source in Table B is aligned with its sum-of-squares, degrees of freedom, mean-square, F-ratio, and p-value.

**Table A3-3** provides an example survey of the numbers of fish needed to detect significant differences in fish endpoints among areas using the model mean square error as the estimate of variability. Sample sizes were calculated for a range of CESs with power=0.90 and α=0.05. All data were log-transformed.
