Learning Objectives
- Review Type I and Type II error in experimental design
- Introduce commonly used functions in the
pwrpackage
- Use power analysis to calculate Type II error rate and statistical power
- Use power analysis to determine the number of experimental replicates needed to reduce Type I and Type II error to acceptable levels
Power analysis is an important tool in the early stages of experimental design. It allows the researcher to determine the power of an experiment to detect a treatment effect of a given size. It also permits the researcher to determine the number of experimental replicates that will be needed to detect a treatment effect with a reasonable degree of confidence.
In many cases, a power analysis may help to eliminate experimental designs that are unfeasible (would require too many replicates) or unlikely to succeed (lack the statistical power to detect a treatment effect).
We know that the following four components of an experiment are related:
1. Sample size (n)
2. Effect size
3. Type I error rate (\(\alpha\))
4. Power (\(1 - \beta\))
Given any three of these terms, you can use power analysis to calculate the fourth. For example, if you know the relative effect size you would like to detect, and you have determined acceptable levels of Type I and Type II error, you can calculate the sample size that would be needed to detect a treatment effect.
Log in to Google Drive Desktop, Open your
Environmental Data Analysis project in RStudio, and create
a new script named “A4_PowerAnalysis”.
Use the library() function to load the pwr
package.
# Load pwr package
library(pwr)
The pwr package was developed by St\(\acute{e}\)phane Champely, following best
practices outlined in Cohen (1988). It includes the following
functions:
| Function | Test |
|---|---|
| pwr.2p.test | two proportions (equal n) |
| pwr.2p2n.test | two proportions (unequal n) |
| pwr.anova.test | balanced one way ANOVA |
| pwr.chisq.test | chi-square test |
| pwr.f2.test | general linear model |
| pwr.p.test | proportion (one sample) |
| pwr.r.test | correlation |
| pwr.t.test | t-tests (one sample, 2 sample, paired) |
| pwr.t2n.test | t-test (two samples with unequal n) |
The argument sig.level defaults to 0.05 in all of these
functions, but you can use sig.level = x.xx to specify a
different value for \(\alpha\). If you
wish to calculate (\(\alpha\)), you can
include the argument sig.level = NULL.
The following table provides values that Cohen (1988) suggests for “small,” “medium,” and “large” effect sizes for common statistical tests.
| Test | small | medium | large |
|---|---|---|---|
| tests for proportions (p) | 0.20 | 0.50 | 0.80 |
| tests for means (t) | 0.20 | 0.50 | 0.80 |
| chi-square tests (chisq) | 0.10 | 0.30 | 0.50 |
| correlation test (r) | 0.10 | 0.30 | 0.50 |
| anova (anov) | 0.10 | 0.25 | 0.40 |
| general linear model (f2) | 0.02 | 0.15 | 0.35 |
You can also return a suggested effect size using the function
cohen.ES(). For example, the following code will return a
suggested “small” effect size for a t-test:
cohen.ES(test = "t", size = "small")
##
## Conventional effect size from Cohen (1982)
##
## test = t
## size = small
## effect.size = 0.2
In this exercise, we will focus on power analyses for t-tests, ANOVA, and linear regression.
The t-test is one of the simplest statistical tests. It tests the null hypothesis that the mean of a group is equal to the mean of another group (or a value).
For t-tests, use the function:
pwr.t.test(n = , d = , sig.level = , power = , type = c("two.sample", "one.sample", "paired"))
If your groups include different numbers of samples, you should use the
function:
pwr.t2n.test(n1 = , n2= , d = , sig.level =, power = )
The alternate hypothesis for a two-tailed t-test
does not make any assumption about the direction of effect. For example,
my alternate hypothesis could be that an experimental treatment group
will differ from a control group, but I may not want to assume that the
mean of my treatment group will be greater than or less than the control
group. The functions for t-tests in R, as well as for power analyses for
t-tests, will always default to a two-tailed test. The alternate
hypothesis for a one-tailed t-test does assume a
direction of effect. For example, my alternate hypothesis could be that
an experimental treatment group will have a greater mean value for some
variable than the control group. You can specify that you wish to run a
one-tailed test by adding an argument for alternative. In
my example, I would add the argument
alternative = "greater".
The effect size for a t-test is calculated as
\(d = \frac{|\mu_{1}-\mu_{2}|}{\sigma}\)
where
\(\mu_1\) = mean of group 1
\(\sigma\) = standard deviation
\(\mu_2\) = mean of group 2
Let’s say we’re interested in whether there is a difference in body mass between male and female smallmouth bass in Lake Champlain. We think it would be reasonable to catch 30 individuals of each sex. Previous studies have detected differences of ~0.5 kg between sexes, so we would like to be able to detect that size difference. We also need to make a guess about the population standard deviation to calculate an effect size. If you have no idea, one rule of thumb is to calculate the difference in the total range and divide by 4. If previous studies have noted that the typical range of adult fish mass is ~1-5 kg, we can estimate that the population standard deviation is ~(5-1)/4. How powerful is our sampling plan?
# Calculate effect size
# d = difference/standard deviation
d <- 0.5/((5-1)/4)
# Calculate power of sampling plan
pwr.t.test(n = 30, d = d, sig.level = 0.05)
##
## Two-sample t test power calculation
##
## n = 30
## d = 0.5
## sig.level = 0.05
## power = 0.4778965
## alternative = two.sided
##
## NOTE: n is number in *each* group
Our power to detect a significant difference in mass between male and female fish at \(\alpha = 0.05\) is 0.48, or <50%. That’s not great!
How many individuals would we need to catch to achieve a power (or detection probability) of 0.80?
# Calculate effect size needed for power = 0.80
pwr.t.test(d = d, sig.level = 0.05, power = 0.80)
##
## Two-sample t test power calculation
##
## n = 63.76561
## d = 0.5
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
This tells us that we would need to catch at least 64 fish of each sex to have an 80% chance of detecting a 0.5 kg difference in fish mass, given our estimates of the population variance.
A one-sample t-test tests the null hypothesis that the mean of a group is equal to some value. For example, I may want to test whether the mean value of denitrification (nitrogen removal) in my wetland is greater than 0 (a one-tailed test). How many measurements would I need to achieve a 90% chance to detect a “small” difference from zero?
# Determine effect size for "small" difference
cohen.ES(test = "t", size = "small")
##
## Conventional effect size from Cohen (1982)
##
## test = t
## size = small
## effect.size = 0.2
# Calculate effect size needed for power = 90%
pwr.t.test(d = 0.2, sig.level = 0.05, power = 0.90, alternative = "greater", type = "one.sample")
##
## One-sample t test power calculation
##
## n = 215.4562
## d = 0.2
## sig.level = 0.05
## power = 0.9
## alternative = greater
According to this analysis, I would need to collect ~215 samples. I know that this is not feasible, so I may need to change the goal of my study or adjust my standards for Type I error. How does my result change if I accept a slightly less conservative Type I error rate?
# Calculate effect size needed for power = 90%, with alpha = 0.10
pwr.t.test(d = 0.2, sig.level = 0.10, power = 0.90, alternative = "greater", type = "one.sample")
##
## One-sample t test power calculation
##
## n = 165.0649
## d = 0.2
## sig.level = 0.1
## power = 0.9
## alternative = greater
165 samples is more reasonable, but still too many. What if I also relaxed my expectation with respect to the effect size I would like to detect.
#Calculate effect size needed for power = 90%, alpha = 0.10, medium effect size
pwr.t.test(d = 0.4, sig.level = 0.10, power = 0.90, alternative = "greater", type = "one.sample")
##
## One-sample t test power calculation
##
## n = 41.90565
## d = 0.4
## sig.level = 0.1
## power = 0.9
## alternative = greater
This analysis reveals a sample size of 42 would be needed, which I know would be feasible, but also time-consuming and expensive. I now have to decide whether the sampling would be worth the effort, given my detection probability and chance of committing a Type I error. Power analysis always involves a trade-off among the effect size of interest, acceptable risk of a Type I versus Type II error, and feasibility of sampling.
Challenge 1: Two-sample t-test
Determine the probability of detecting a “medium” difference in height between male and female students in the classroom if you have only 15 male students and 8 female students to measure. Assume that you are willing to accept a Type I error rate of \(\alpha = 0.10\).
# Determine what a "medium" effect size would be
cohen.ES(test = "t", size = "medium")
##
## Conventional effect size from Cohen (1982)
##
## test = t
## size = medium
## effect.size = 0.5
# Calculate power
pwr.t2n.test(n1 = 15, n2 = 8, d = 0.5, sig.level = 0.10)
##
## t test power calculation
##
## n1 = 15
## n2 = 8
## d = 0.5
## sig.level = 0.1
## power = 0.2977344
## alternative = two.sided
Challenge 2: One-sample t-test
I would like to determine whether the poverty level for census tracts that contain a waste incinerator exceed the national poverty level of 12.3%. How many census tracts would I need to sample to have an 80% chance to detect a difference of 2% if I know that the values will range ~2-25%. Assume that you are willing to accept a Type I error rate of \(\alpha = 0.05\).
Calculate d.
Determine sample size. (Round your answer up to the nearest whole number.)
# Calculate d
d <- 2/((25-2)/4)
# Determine sample size
pwr.t.test(d = d, sig.level = 0.05, power = 0.80, alternative = "greater", type = "one.sample")
##
## One-sample t test power calculation
##
## n = 52.48232
## d = 0.3478261
## sig.level = 0.05
## power = 0.8
## alternative = greater
An Analysis of Variance (ANOVA) tests the null hypothesis of no difference among groups. Or it tests the null hypothesis that partitioning samples into groups will NOT help explain variance among samples.
For ANOVA, use the function:
pwr.anova.test(k = , n = , f = , sig.level = , power = )
where k = number of groups and f = effect
size.
The effect size for ANOVA is calculated as
\(f = \sqrt{\frac{\sum_{i=1}^{k} p_i * (\mu_{i}-\mu)^{2}}{\sigma^{2}}}\)
where
\(p_i\) = \(n_i / N\)
\(n_i\) = number of observations in group i
\(N\) = total number of observations
\(\mu_i\) = mean of group i
\(\mu\) = grand mean
\(\sigma^{2}\) = within-groups error variance
Let’s return to our example of fish survivorship in 4 different salt concentrations. ANOVA will test the null hypothesis that mean survivorship does not differ among groups. How many experimental replicates would we need per treatment to have an 80% chance of detecting a “medium” effect, if we accept a Type I error rate of \(\alpha = 0.05\)?
# Determine what a "medium" effect would be
cohen.ES(test = "anov", size = "medium")
##
## Conventional effect size from Cohen (1982)
##
## test = anov
## size = medium
## effect.size = 0.25
# Determine sample size needed
pwr.anova.test(k = 4, f = 0.25, sig.level = 0.05, power = 0.80)
##
## Balanced one-way analysis of variance power calculation
##
## k = 4
## n = 44.59927
## f = 0.25
## sig.level = 0.05
## power = 0.8
##
## NOTE: n is number in each group
We would need 45 experimental replicates per group, 180 total! That’s certainly not feasible given the number of mesocosms at our disposal. But what if we wanted to make a first approximation so we can apply for money to run a larger experiment? Let’s say we set up two groups and increase the doseage in the treatment group to create a “large” effect. Let’s say we also accept a less conservative Type I error rate of \(\alpha = 0.10\).
# Determine what a "large" effect would be
cohen.ES(test = "anov", size = "large")
##
## Conventional effect size from Cohen (1982)
##
## test = anov
## size = large
## effect.size = 0.4
# Determine sample size needed
pwr.anova.test(k = 2, f = 0.4, sig.level = 0.1, power = 0.80)
##
## Balanced one-way analysis of variance power calculation
##
## k = 2
## n = 20.03178
## f = 0.4
## sig.level = 0.1
## power = 0.8
##
## NOTE: n is number in each group
Now we would only need 20 experimental replicates per group, 40 total. We may be able to manage that if we can borrow some mesocosms. Otherwise, we may need to reexamine the goals of our experiment.
Challenge 3: Single-factor ANOVA
You wish to compare growth in the first month after germination for oak seeds collected from four different geographic regions in a common-garden experiment at 30\(^\circ\)C. How many oak seeds would you need to collect from each region to have an 80% chance of detecting a “small” difference in growth at \(\alpha = 0.05\)? Round your answer up to the nearest whole number.
# Determine what a "small" effect would be
cohen.ES(test = "anov", size = "small")
##
## Conventional effect size from Cohen (1982)
##
## test = anov
## size = small
## effect.size = 0.1
# Determine sample size needed
pwr.anova.test(k = 4, f = 0.1, sig.level = 0.05, power = 0.80)
##
## Balanced one-way analysis of variance power calculation
##
## k = 4
## n = 273.5429
## f = 0.1
## sig.level = 0.05
## power = 0.8
##
## NOTE: n is number in each group
A linear regression tests the null hypothesis that there is no relationship between one continuous variable and another continuous variable. Or that variation in one continuous variable will not explain variation in another continuous variable.
For linear models, use the function:
pwr.f2.test(u =, v = , f2 = , sig.level = , power = )
where u and v are numerator and denominator
degrees of freedom. In other words, u refers to the number
of coefficients you are calculating (minus the intercept). If your model
has only one independent variable, then you are only calculating the
slope, and \(u = 1\). If your model has
more than one independent variable, then \(u
>1\). \(v = n - u - 1\).
Therefore, if your model includes only one independent variable, then
\(v = n - 2\).
The effect size for linear models is calculated as
\(f^{2} = \frac{R^{2}}{1 - R^{2}}\)
where \(R^{2}\) = squared multiple correlation
Let’s say that we wish to determine whether annual precipitation can help us predict annual net primary production (ANPP) for the National Bison Range in Montana. We expect that it should explain ~50% of the variance in ANPP (\(R^{2} = 0.5\)). How many years worth of data would we need to have a 90% chance to detect such a relationship if we accept a Type I error rate of \(\alpha = 0.01\)?
# Determine effect size
f2 <- 0.5 / (1 - 0.5)
# Determine denominator degrees of freedom v
pwr.f2.test(u = 1, f2 = f2, sig.level = 0.01, power = 0.90)
##
## Multiple regression power calculation
##
## u = 1
## v = 16.50303
## f2 = 1
## sig.level = 0.01
## power = 0.9
# Calculate n = v + 2
17 + 2
## [1] 19
So we would need at least 19 years worth of data to detect that strong of a relationship. That’s a relatively short period of time. Obviously, if the relationship is less strong \(R^2 < 0.50\) or the variation in precipitation among years is too small, then we would need to include a greater number of years. Power analysis gives you a place to start, but you may also need to factor in knowledge of variation in the system you are studying.
Challenge 4: Linear regression
You wish to determine if you can use the population density of humans in the watershed to predict discharge of nitrate in river water. You have reliable data from 20 watersheds in North America. You expect that population density should explain ~70% of the variation in nitrate discharge. What is your power to detect a significant relationship between these variables? Assume that you are willing to accept a Type I error rate of \(\alpha = 0.01\).
# Determine effect size
f2 <- 0.7 / (1 - 0.7)
# Determine denominator degrees of freedom (v)
v <- 20 - 2
# Calculate power
pwr.f2.test(u = 1, v = v, f2 = f2, sig.level = 0.01)
##
## Multiple regression power calculation
##
## u = 1
## v = 18
## f2 = 2.333333
## sig.level = 0.01
## power = 0.9998269
Cohen, J. 1988. Statistical Power Analysis for the Behavioral Sciences (2nd ed.). LEA.
Ford, C. 2018. Getting started with the pwr package. https://cran.r-project.org/web/packages/pwr/vignettes/pwr-vignette.html
Kabacoff, R.I. 2017. Power Analysis. Quick-R by DataCamp. https://www.statmethods.net/stats/power.html