Learning Objectives
- Review major goals of multivariate analysis and ordination, with a focus on analysis of ecological communities.
- Introduce primary ordination tools in the
veganpackage: Principal components Analysis (PCA), Canonical Correspondence Analysis (CCA), and Non-metric Multidimensional Scaling (NMDS).
- Use environmental variables to explain differences in community structure.
- Construct biplots to visualize the results of ordinations.
Ordination is a process by which multidimensional data are projected in fewer dimensions. Today we will focus on ordination as a tool of community ecology, but the same techniques can be used to reduce the number of dimensions for many kinds of data. Communities consist of many species, and we may want to describe changes in community structure, or compare the structures of communities to each other, in ways that are easy to visualize.
We begin with some measure of abundance of each species as a different axis of variation in your data. Points that are close together along these axes are more similar in terms of the abundance of that species, and points that are further are more different in terms of the abundance of that species. Ordination processes rotate and reshape your cloud of data points such that the maximum amount of variation is shown in the fewest number of dimensions (typically 2 or 3 dimensions).
Ordination as a projection of 3D data onto 2D surface. Source: http://ordination.okstate.edu/overview.htm
vegan packageThe vegan package is not the only package available to
perform multivariate analyses, but it is one of the most commonly used,
especially in the field of community ecology. It is also one of the most
extensively tested and documented, and many packages that have been
developed for graphing multivariate data rely on vegan
objects. For all of these reasons, we will focus on tools in the
vegan package today.
Open your “Environmental Data Analysis” project and begin a new script. Save it as something sensible like “A13 Multivariate Analysis.”
Load the packages you will need today. We will use the
vegan package for data analysis. We will use the
tidyverse package to reshape data into the form that
vegan requires. Computers in Hudson 032 should
have all of these packages installed. If you are working on your
personal computer, you will need to install the vegan
package.
library(tidyverse)
library(vegan)
Today we will use the Portal small-mammal survey data that we have
used throughout the course. I have added a column and rearranged the
variables to facilitate our activity today. Download the
surveys_complete_v2.csv from the course Brightspace page to
your data folder, and use read.csv() to save the data as
object surveys.
# Import data
surveys <- read.csv("data/surveys_complete_v2.csv")
# View structure of data
str(surveys)
## 'data.frame': 30463 obs. of 14 variables:
## $ record_id : int 845 1164 1261 1756 1818 1882 2133 2184 2406 3000 ...
## $ species_name : chr "Neotoma albigula" "Neotoma albigula" "Neotoma albigula" "Neotoma albigula" ...
## $ genus : chr "Neotoma" "Neotoma" "Neotoma" "Neotoma" ...
## $ species : chr "albigula" "albigula" "albigula" "albigula" ...
## $ taxa : chr "Rodent" "Rodent" "Rodent" "Rodent" ...
## $ month : int 5 8 9 4 5 7 10 11 1 5 ...
## $ day : int 6 5 4 29 30 4 25 17 16 18 ...
## $ year : int 1978 1978 1978 1979 1979 1979 1979 1979 1980 1980 ...
## $ plot_id : int 2 2 2 2 2 2 2 2 2 2 ...
## $ species_id : chr "NL" "NL" "NL" "NL" ...
## $ sex : chr "M" "M" "M" "M" ...
## $ hindfoot_length: int 32 34 32 33 32 32 33 30 33 31 ...
## $ weight : int 204 199 197 166 184 206 274 186 184 87 ...
## $ plot_type : chr "Control" "Control" "Control" "Control" ...
The vegan package requires an input matrix with each
species in the community as a different column and each observation as a
different row. I will first use group_by() to group the
data by species_name and my environmental variables of
interest. Then I will use count() to count the number of
individuals of each species, and use the spread() function
to reshape my data into the correct format.
matrix <- surveys %>%
# group by species and by environmental factors of interest
group_by(species_name, plot_id, plot_type) %>%
# count the number of individuals of each species
count() %>%
# spread the data such that each species has its own column
spread(key = species_name, value = n, fill = 0)
*Note that I added the fill = 0 argument in the spread
function. This indicates that I want missing values to be recorded as
absences, or 0s, rather than NAs.
Examine the first few rows of the matrix data frame to
make sure it looks the way you expect.
head(matrix)
| plot_id | plot_type | Chaetodipus baileyi | Chaetodipus penicillatus | Dipodomys merriami | Dipodomys ordii | Dipodomys spectabilis | Neotoma albigula | Onychomys leucogaster | Onychomys torridus | Perognathus flavus | Peromyscus eremicus | Peromyscus maniculatus | Reithrodontomys fulvescens | Reithrodontomys megalotis | Sigmodon hispidus |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Spectab exclosure | 109 | 211 | 565 | 460 | 196 | 50 | 48 | 96 | 28 | 11 | 15 | 0 | 23 | 0 |
| 2 | Control | 167 | 175 | 528 | 289 | 119 | 169 | 59 | 180 | 14 | 137 | 35 | 1 | 66 | 11 |
| 3 | Long-term Krat Exclosure | 332 | 347 | 95 | 80 | 13 | 82 | 60 | 149 | 114 | 65 | 38 | 5 | 194 | 31 |
| 4 | Control | 95 | 113 | 1010 | 56 | 229 | 5 | 51 | 69 | 124 | 1 | 4 | 0 | 18 | 2 |
| 5 | Rodent Exclosure | 0 | 55 | 406 | 90 | 54 | 24 | 70 | 103 | 38 | 33 | 47 | 5 | 98 | 3 |
| 6 | Short-term Krat Exclosure | 238 | 192 | 188 | 201 | 85 | 14 | 39 | 140 | 100 | 66 | 26 | 7 | 79 | 8 |
Now you want to separate your matrix into a species matrix and a separate data frame of environmental variables.
# Select species columns, columns 4-16 of the matrix data frame
spp <- matrix[,4:16]
# Select environmental columns, columns 1-3 of the matrix data frame
env <- matrix[,1:3]
Principle Components Analysis (PCA) is a simple rotation of your data
in space, where distances among data points are preserved. The PCA will
rotate your data such that the maximum amount of variation is displayed
on axis 1, the maximum remaining variation that is uncorrelated to axis
1 is shown on axis 2, and so on. The “factor loadings”
will tell you the weighted combination of species that contribute to
each axis. We will use the function rda() to run a PCA on
our matrix of species.
# Create PCA object
pca <- rda(spp, scale = T)
# Plot results of pca as biplot of axis 1 and axis 2
plot(pca)
This plot was generated in using the default settings. We can make it
look a little nicer by adding type = "n" to our plot code
and adding the two components separately, using points to show the
different sites and text to show the different species.
# Set base plot
plot(pca, type = "n")
# Add sites to plot as points
points(pca, display = "sites")
# Add species names to plot
text(pca, display = "species")
This plot contains a lot of information.
Sites (points) that are far apart from each other horizontally differ
along the first principle component axis.
Sites that are far apart vertically differ along the second principle
component.
Species that are in opposite directions tend not to co-occur, while
species that group in the same direction do tend to co-occur.
For linear transformations like PCA (e.g., rotations of the
data), you can also visualize species as vectors
(arrows) pointing from the center of the ordination. To create that
plot, we will use the biplot() function.
biplot(pca, scaling = "symmetric", type = c("text", "points"))
The length of the vector indicates the strength of the species’
“loading” on an axis. Very short vectors mean that variation in the
occurrence of those species cannot be visualized well along these two
axes. To visualize variation in those species, you may need to choose a
different projection.
Vectors pointing in opposite directions indicate that species tend to be
negatively correlated on that axis. Arrows pointing in the same
direction indicate that species tend to be positively correlated on that
axis.
Challenge 1
Based on the results of the PCA biplot, which of the following statements is/are true regarding relationships between Dipodomys (kangaroo rat) captures and captures of individuals of other species?
a. Locations where Dipodomys mice were captured also tended to capture individuals of the genus Reithrodontomys.
b. Locations where Dipodomys mice were captured also tended to capture individuals of the genus Peromyscus.
c. Locations where Dipodomys mice were captured also tended to capture individuals of the genus Onychomys.
d. Variation in the abundance of Dipodomys is not well represented in this biplot.
Find any pair of genera that tend to co-occur in the same trapping plots. Google those genera to learn a little more about them (briefly, do not spend more than ~5 min doing this). Why do you think these two genera tended to co-occur in the dataset?
The PCA has helped you to visualize the structure of your community. However, you may also want to investigate whether environmental factors may help explain differences in community structure. This set of questions is analogous to using a linear model to test the effect of one variable on another, only in this case, your response variable is a community matrix rather than a single variable.
The benefit to using CCA over PCA is that the rotation of your data is “constrained” such that it optimally describes variance in the community and the environmental variables at the same time. A PCA describes variance in the community and then superimposes the environmental variables.
The formula for a CCA is similar to a linear model
- Linear model formula:
mod <- lm(DepVar~IndepVar, data = data)
- CCA model formula:
ord <- cca(DepVarMatrix~IndepVar, data = data)
Let’s use CCA to test the hypothesis that community structure will differ among the various plot types.
# First I define my model with the species matrix as the dependent variable
# and plot type as the independent variable
cca <- cca(spp~plot_type, data = env)
# Call results of model
summary(cca)
##
## Call:
## cca(formula = spp ~ plot_type, data = env)
##
## Partitioning of scaled Chi-square:
## Inertia Proportion
## Total 0.6094 1.0000
## Constrained 0.3433 0.5633
## Unconstrained 0.2661 0.4367
##
## Eigenvalues, and their contribution to the scaled Chi-square
##
## Importance of components:
## CCA1 CCA2 CCA3 CCA4 CA1 CA2 CA3
## Eigenvalue 0.2555 0.0714 0.008849 0.007542 0.08325 0.06938 0.04099
## Proportion Explained 0.4193 0.1172 0.014521 0.012376 0.13661 0.11385 0.06726
## Cumulative Proportion 0.4193 0.5365 0.550974 0.563350 0.69996 0.81381 0.88107
## CA4 CA5 CA6 CA7 CA8 CA9
## Eigenvalue 0.02580 0.01509 0.01050 0.007897 0.004811 0.003623
## Proportion Explained 0.04234 0.02476 0.01723 0.012958 0.007894 0.005944
## Cumulative Proportion 0.92340 0.94817 0.96540 0.978354 0.986248 0.992192
## CA10 CA11 CA12
## Eigenvalue 0.002309 0.001511 0.0009379
## Proportion Explained 0.003789 0.002479 0.0015390
## Cumulative Proportion 0.995982 0.998461 1.0000000
##
## Accumulated constrained eigenvalues
## Importance of components:
## CCA1 CCA2 CCA3 CCA4
## Eigenvalue 0.2555 0.0714 0.008849 0.007542
## Proportion Explained 0.7443 0.2080 0.025775 0.021969
## Cumulative Proportion 0.7443 0.9523 0.978031 1.000000
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
##
##
## Species scores
##
## CCA1 CCA2 CCA3 CCA4 CA1
## Chaetodipus penicillatus 0.31888 0.38714 -0.0126186 -0.11485 -0.17601
## Dipodomys merriami -0.41536 -0.10032 0.0771726 -0.02062 0.28224
## Dipodomys ordii -0.42062 -0.03984 -0.1857826 -0.07096 -0.35995
## Dipodomys spectabilis -0.46513 0.04062 -0.1621712 0.10599 0.22989
## Neotoma albigula 0.07580 0.07664 0.1076133 0.14505 -0.20920
## Onychomys leucogaster -0.02057 0.04218 -0.0348863 0.09295 0.24958
## Onychomys torridus 0.11634 0.15477 0.0537211 0.08161 -0.01676
## Perognathus flavus 0.56391 0.68460 0.0007958 0.10577 0.18462
## Peromyscus eremicus 0.66292 -0.43029 0.0147822 0.12819 -0.42604
## Peromyscus maniculatus 0.94824 -0.34986 -0.0709096 -0.02145 -0.36388
## Reithrodontomys fulvescens 0.80648 -0.39626 -0.1440087 0.49770 -0.42119
## Reithrodontomys megalotis 0.96707 -0.34614 -0.0189596 -0.06498 -0.44210
## Sigmodon hispidus 0.73863 0.64428 0.2906644 -0.39622 -0.10609
## CA2
## Chaetodipus penicillatus -0.147158
## Dipodomys merriami 0.035062
## Dipodomys ordii -0.491868
## Dipodomys spectabilis 0.046923
## Neotoma albigula -0.242248
## Onychomys leucogaster -0.145283
## Onychomys torridus -0.149976
## Perognathus flavus 0.187732
## Peromyscus eremicus 0.006578
## Peromyscus maniculatus 0.513197
## Reithrodontomys fulvescens 0.512437
## Reithrodontomys megalotis 0.554232
## Sigmodon hispidus -0.149857
##
##
## Site scores (weighted averages of species scores)
##
## CCA1 CCA2 CCA3 CCA4 CA1 CA2
## row1 -0.86982 0.30089 -4.56241 -1.99659 -0.83639 -0.72364
## row2 -0.22654 -0.20707 -0.46241 1.18599 -1.46004 -0.64505
## row3 1.26583 1.55841 0.32728 -2.01080 0.08548 -1.24631
## row4 -0.96072 0.33999 2.11716 0.69267 1.38073 0.83009
## row5 -0.04887 -0.52818 0.69537 0.67392 2.39221 -1.99047
## row6 0.30199 0.99450 -3.19205 -0.05102 -0.61179 -1.00587
## row7 2.73264 -2.86473 -1.50283 -1.99035 -2.14179 1.22723
## row8 -0.89126 0.50246 -5.02388 -2.42896 -0.91650 -1.37743
## row9 -1.07577 -0.26544 0.08036 -0.13545 0.84282 0.72921
## row10 3.13500 -4.09568 -3.31507 -1.64945 -2.09194 3.94301
## row11 -0.66007 -0.29282 1.37423 -1.25795 -0.02382 0.18207
## row12 -0.56887 -0.17809 1.52717 -0.30714 -0.41660 -0.17649
## row13 0.19463 0.60637 0.49328 2.94117 0.66686 -0.02304
## row14 -0.97503 -0.58345 4.68813 -0.19360 1.17521 0.66275
## row15 1.69602 1.73965 0.83102 0.36548 0.19794 -0.18943
## row16 1.74604 -2.37210 -1.39852 -3.23920 -0.55758 0.71204
## row17 -0.58692 -0.08597 1.38834 -0.47651 -0.29285 -0.02175
## row18 0.54446 0.84086 -0.03019 2.14133 0.37268 0.86662
## row19 2.29324 0.79824 -0.79291 -1.20354 -0.52060 1.31696
## row20 0.62459 0.61370 0.16806 3.52947 -0.37895 0.29837
## row21 2.09386 2.10880 1.02687 -1.86170 0.22398 1.15186
## row22 -0.93468 0.18294 0.01046 3.88735 1.27292 1.01307
## row23 2.87115 -3.91128 -1.72975 -1.12700 -1.93999 2.85809
## row24 0.99612 -2.48964 1.76890 3.01679 0.24076 -1.08022
##
##
## Site constraints (linear combinations of constraining variables)
##
## CCA1 CCA2 CCA3 CCA4 CA1 CA2
## row1 -0.9724 0.01881 -2.2499 -1.0695831 -0.83639 -0.72364
## row2 -0.7037 -0.05852 0.7744 -0.0552729 -1.46004 -0.64505
## row3 1.7422 1.52956 0.3156 -1.2859821 0.08548 -1.24631
## row4 -0.7037 -0.05852 0.7744 -0.0552729 1.38073 0.83009
## row5 1.4054 -2.21651 -0.1817 0.0005864 2.39221 -1.99047
## row6 0.4101 0.77013 -0.7124 2.0743174 -0.61179 -1.00587
## row7 1.4054 -2.21651 -0.1817 0.0005864 -2.14179 1.22723
## row8 -0.7037 -0.05852 0.7744 -0.0552729 -0.91650 -1.37743
## row9 -0.9724 0.01881 -2.2499 -1.0695831 0.84282 0.72921
## row10 1.4054 -2.21651 -0.1817 0.0005864 -2.09194 3.94301
## row11 -0.7037 -0.05852 0.7744 -0.0552729 -0.02382 0.18207
## row12 -0.7037 -0.05852 0.7744 -0.0552729 -0.41660 -0.17649
## row13 0.4101 0.77013 -0.7124 2.0743174 0.66686 -0.02304
## row14 -0.7037 -0.05852 0.7744 -0.0552729 1.17521 0.66275
## row15 1.7422 1.52956 0.3156 -1.2859821 0.19794 -0.18943
## row16 1.4054 -2.21651 -0.1817 0.0005864 -0.55758 0.71204
## row17 -0.7037 -0.05852 0.7744 -0.0552729 -0.29285 -0.02175
## row18 0.4101 0.77013 -0.7124 2.0743174 0.37268 0.86662
## row19 1.7422 1.52956 0.3156 -1.2859821 -0.52060 1.31696
## row20 0.4101 0.77013 -0.7124 2.0743174 -0.37895 0.29837
## row21 1.7422 1.52956 0.3156 -1.2859821 0.22398 1.15186
## row22 -0.7037 -0.05852 0.7744 -0.0552729 1.27292 1.01307
## row23 1.4054 -2.21651 -0.1817 0.0005864 -1.93999 2.85809
## row24 1.4054 -2.21651 -0.1817 0.0005864 0.24076 -1.08022
##
##
## Biplot scores for constraining variables
##
## CCA1 CCA2 CCA3 CCA4 CA1
## plot_typeLong-term Krat Exclosure 0.6525 0.572900 0.11822 -0.4816675 0
## plot_typeRodent Exclosure 0.5342 -0.842531 -0.06908 0.0002229 0
## plot_typeShort-term Krat Exclosure 0.1737 0.326267 -0.30183 0.8787877 0
## plot_typeSpectab exclosure -0.3636 0.007033 -0.84130 -0.3999433 0
## CA2
## plot_typeLong-term Krat Exclosure 0
## plot_typeRodent Exclosure 0
## plot_typeShort-term Krat Exclosure 0
## plot_typeSpectab exclosure 0
##
##
## Centroids for factor constraints
##
## CCA1 CCA2 CCA3 CCA4 CA1 CA2
## plot_typeControl -0.7037 -0.05852 0.7744 -0.0552729 0 0
## plot_typeLong-term Krat Exclosure 1.7422 1.52956 0.3156 -1.2859821 0 0
## plot_typeRodent Exclosure 1.4054 -2.21651 -0.1817 0.0005864 0 0
## plot_typeShort-term Krat Exclosure 0.4101 0.77013 -0.7124 2.0743174 0 0
## plot_typeSpectab exclosure -0.9724 0.01881 -2.2499 -1.0695831 0 0
The cca output gives you a lot of information. I want to point out just a few things. The Inertia is similar to variance in a linear model. In this case, the total inertia is 0.6094, and the constrained inertia (or the inertia explained by your model) is 0.3433, or 56% of the total. The eigenvalues are similar to slopes, but in this case, for a whole axis. You generally hope that your eigenvalues, and thus the relative importance of your axes, will decrease in magnitude after the first 2 or 3 axes, which seems to be the case here. The species scores tell your the relative contribution of each species to each axis.
You can run an ANOVA on the results of your CCA to test the
significance of your grouping variable using the function
anova(). In this case, the p-values are calculated using
permutation tests. A permutation test randomly
reshuffles your data a set number of times and comes up with a frequency
distribution; it then uses this distribution to test how likely your
data are, given the combinations that were possible.
# Run anova on cca object
ano <- anova(cca)
# Call results of anova
ano
## Permutation test for cca under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: cca(formula = spp ~ plot_type, data = env)
## Df ChiSquare F Pr(>F)
## Model 4 0.34332 6.1283 0.001 ***
## Residual 19 0.26611
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
You can also plot the results of your CCA. We will use base plotting
for most of today’s activity, and then I will provide a brief
introduction to how you can create similar plots using
ggplot2.
# We can plot the results of the cca
plot(cca)
As you can see, it’s hard to see what is going on in the default plot. Let’s try substituting points for text and adding ellipses to show differences among groups.
# Set up base plot
plot(cca, type = "p")
# Add ellipses to show differences among plot_type groups
ordiellipse(cca, env$plot_type, col=1:5, draw="polygon", label = T)
The default is to plot the ellipses as the standard deviation of each group. Therefore, the relative size of the ellipses tells you the relative amount of variation within each group.
Challenge 2
Pull up the help file for the
ordiellipse()function by typing?ordiellipse().
Use this help file to figure out how to:
1. Change the ellipses so that they enclose all of the observations of each group.
2. Plot the ellipses as lines rather than polygons.
What does this plot tell you about how well these groups are represented in this 2D projection? Are any groups not well represented?
So far all of the techniques we have used have been linear transformations, where distances between points are preserved when projecting into fewer dimensions. This is equivalent to running a parametric test (or linear model) like ANOVA or regression. As such, these techniques have all the same assumptions of parametric tests.
Assumptions of Linear Ordination
- Multivariate normal distribution
- Homogeneity of variances
- Linearity
- Independence
Data likely never meet all of these assumptions (particularly the multivariate normal distribution), but like parametric tests, linear multivariate tests are robust to minor violations of these assumptions. However, some data are not well suited to linear transformations. In these cases, it may be more appropriate to use a non-linear transformation to visualize changes in community structure. The most common of these is non-metric multidimensional scaling, or NMDS. NMDS uses a distance matrix and projects the data such that distances among points are maintained as much as possible. As with other ordinations, points that are close together are more similar with respect to species composition, while points that are further apart are more different with respect to species composition, and the axes represent a weighted combination of species abundances.
We will use the function metaMDS() to run an NMDS
ordination. This is actually a wrapper that uses the function
vegdist() to compute Bray-Curtis similarity values for your
sites, rotates the data using several random starting points, and stops
when it finds a similar low-stress projection twice. Since this is an
iterative process, it may take the function a bit to run.
# Run NMDS
nmds <- metaMDS(spp)
# View output of NMDS
nmds
##
## Call:
## metaMDS(comm = spp)
##
## global Multidimensional Scaling using monoMDS
##
## Data: wisconsin(sqrt(spp))
## Distance: bray
##
## Dimensions: 2
## Stress: 0.08133011
## Stress type 1, weak ties
## Best solution was repeated 5 times in 20 tries
## The best solution was from try 0 (metric scaling or null solution)
## Scaling: centring, PC rotation, halfchange scaling
## Species: expanded scores based on 'wisconsin(sqrt(spp))'
# Plot NMDS projection
plot(nmds)
# Add ellipses representing plot types
plot(nmds)
ordiellipse(nmds, env$plot_type, col=1:5, kind = "ehull", draw="polygon", label = T)
## Error in Fortran routine computing the spanning ellipsoid,
## probably collinear data
text(nmds, display = "species")
ggplotYou can extract output from an ordination object to plot your results using ggplot. Let’s run through an example from our nmds analysis. We will extract two data frames from the nmds object: 1) x and y coordinates that describe relative distances among sites on your nmds axes, and 2) species scores on each axis. This will put the nmds information we need into a format that ggplot can read.
# extract NMDS scores (x and y coordinates)
datascores <- as.data.frame(scores(nmds)$sites)
# add plot_type variable to datascores data frame
datascores$plot_type <- env$plot_type
# extract NMDS scores for species
speciesscores <- as.data.frame(scores(nmds, "species"))
# create column for species based on row names of NMDS data frame
speciesscores$species <- row.names(speciesscores)
Once we have those data frames, we can use the ggplot()
function to plot components of the nmds as separate layers. This is
similar to code we used to plot map layers in our mapping workshop.
ggplot() +
# add points to NMDS axes and color by plot type
geom_point(data = datascores, aes(x = NMDS1, y = NMDS2, color = plot_type)) +
# add species names to plot, and reduce text size
geom_text(data = speciesscores, aes(x = NMDS1, y = NMDS2, label = species), size = 3) +
# scale plot so that distances are the same on the two axes,
# and adjust limits so that species names are readable
coord_fixed(ratio = 1, xlim = c(-0.75, 1)) +
# remove background shading
theme_minimal() +
# change legend label
labs(color = "Plot Type")
Note the species names are overlapping. Just as we did in the mapping
unit, you can load the ggrepel package and substitute
geom_text_repel() for geom_text() so that
species names are in the same relative position but not overlapping.
# load ggrepel package
library(ggrepel)
# create same nmds plot
ggplot() +
geom_point(data = datascores, aes(x = NMDS1, y = NMDS2, color = plot_type)) +
# substitute geom_text_repel
# add segment.color = NA to remove segment lines
geom_text_repel(data = speciesscores, aes(x = NMDS1, y = NMDS2, label = species),
size = 3, segment.color = NA) +
coord_fixed(ratio = 1, xlim = c(-0.75, 1)) +
theme_minimal() +
labs(color = "Plot Type")
If you download and load the package ggalt from the CRAN
repository, you can also add a geom_encircle() layer to add
a polygon that groups your points.
# load ggalt package
library(ggalt)
# create same nmds plot
ggplot() +
geom_point(data = datascores, aes(x = NMDS1, y = NMDS2, color = plot_type)) +
# plot polygon including points within each plot_type group
geom_encircle(data = datascores, aes(x = NMDS1, y = NMDS2, color = plot_type),
s_shape = 1, expand = 0) +
geom_text_repel(data = speciesscores, aes(x = NMDS1, y = NMDS2, label = species),
size = 3, segment.color = NA) +
coord_fixed(ratio = 1, xlim = c(-0.75, 1)) +
theme_minimal() +
labs(color = "Plot Type")
At this point, you have a publication-ready graph that can be used in association with the “stress” value and any other statistical results from your ordination. If you are running a PCA or CCA, you may also want to include the amount of variation explained by each axis in the axis labels.
If you use ordinations a lot in your research, you may also want to
look into the ggvegan package. This user-created package
has the ability to add other features including vector lines and
ellipses. Unfortunately, this package is not available in the CRAN
repository, so it takes a bit of work to download and is outside the
scope of today’s lesson. If you are performing multivariate analyses for
your research, I would recommend consulting the documentation for
installing and using ggvegan. Documentation for
ggvegan (including instructions on how to install the
package) is available at https://github.com/gavinsimpson/ggvegan.
Challenge 3
How do the results of the NMDS compare to the results of the CCA?
Was this ordination more/less successful in projecting sites into 2D space?
How would you describe the differences in community composition among the different plot types?
The vegan package also includes a function for detrended
correspondence analysis (DCA), decorana(). This function is
commonly used in community ecology, but I cannot in good conscience
recommend that anyone ever use it.
“Detrending” is a more general statistical tool, and sometimes it makes sense. For example, let’s say that you wish to examine climate variation due to the North Atlantic Oscillation; there will be variation in your temperature data due to the oscillation and due to the long-term climate trends. You can “detrend” the variation by removing the long-term trend, allowing you to focus on the variation due to the climate oscillation. That kind of detrending makes sense because you know what variation you are removing from your data and why.
An example of detrending that makes sense. Temperature anomolies due to the Atlantic Multidecadal Oscillation, after detrending to remove long-term warming. Source: Wikipedia Commons, Marsupilami Rosentod
Detrended correspondence analysis is often used where species-rich ecological communities change gradually along a gradient. One example could be an elevational gradient. The extreme ends of this gradient tend to be different relative to the middle. When you plot the community using CCA, the data plot as an arch or a parabola. Ecologists wishing to describe the gradient find this annoying because it’s difficult to fit the data to linear models. DCA “solves” this problem by collapsing one axis of variation.
Comparison of arch community data in CCA versus the “solution” in DCA. Source: Wikipedia Commons
Users of this method argue that the variation was an “artifact” of the gradient and should be removed. I want to emphasize that this method is not a transformation. It is simply hiding variation in the data. For this reason, many statisticians find this method to be ethically questionable. The better solution would simply be to retain the additional axis of variation, or to focus on explaining variation in the first coordinate axis.
Oksanen, Jari. 31 August 2019. Vegan: an introduction to ordination. processed with vegan 2.5-6 in R version 3.6.1 (2019-07-05). https://cran.r-project.org/web/packages/vegan/vignettes/intro-vegan.pdf