Learning Objectives

  • Review major goals of multivariate analysis and ordination, with a focus on analysis of ecological communities.
  • Introduce primary ordination tools in the vegan package: Principal components Analysis (PCA), Canonical Correspondence Analysis (CCA), and Non-metric Multidimensional Scaling (NMDS).
  • Use environmental variables to explain differences in community structure.
  • Construct biplots to visualize the results of ordinations.

Ordination

Ordination is a process by which multidimensional data are projected in fewer dimensions. Today we will focus on ordination as a tool of community ecology, but the same techniques can be used to reduce the number of dimensions for many kinds of data. Communities consist of many species, and we may want to describe changes in community structure, or compare the structures of communities to each other, in ways that are easy to visualize.

We begin with some measure of abundance of each species as a different axis of variation in your data. Points that are close together along these axes are more similar in terms of the abundance of that species, and points that are further are more different in terms of the abundance of that species. Ordination processes rotate and reshape your cloud of data points such that the maximum amount of variation is shown in the fewest number of dimensions (typically 2 or 3 dimensions).

Ordination as a projection of 3D data onto 2D surface. Source: http://ordination.okstate.edu/overview.htm

The vegan package

The vegan package is not the only package available to perform multivariate analyses, but it is one of the most commonly used, especially in the field of community ecology. It is also one of the most extensively tested and documented, and many packages that have been developed for graphing multivariate data rely on vegan objects. For all of these reasons, we will focus on tools in the vegan package today.

Getting started

Open your “Environmental Data Analysis” project and begin a new script. Save it as something sensible like “A13 Multivariate Analysis.”

Packages

Load the packages you will need today. We will use the vegan package for data analysis. We will use the tidyverse package to reshape data into the form that vegan requires. Computers in Hudson 032 should have all of these packages installed. If you are working on your personal computer, you will need to install the vegan package.

library(tidyverse)
library(vegan)

Import Data

Today we will use the Portal small-mammal survey data that we have used throughout the course. I have added a column and rearranged the variables to facilitate our activity today. Download the surveys_complete_v2.csv from the course Brightspace page to your data folder, and use read.csv() to save the data as object surveys.

# Import data
surveys <- read.csv("data/surveys_complete_v2.csv")
# View structure of data
str(surveys)
## 'data.frame':    30463 obs. of  14 variables:
##  $ record_id      : int  845 1164 1261 1756 1818 1882 2133 2184 2406 3000 ...
##  $ species_name   : chr  "Neotoma albigula" "Neotoma albigula" "Neotoma albigula" "Neotoma albigula" ...
##  $ genus          : chr  "Neotoma" "Neotoma" "Neotoma" "Neotoma" ...
##  $ species        : chr  "albigula" "albigula" "albigula" "albigula" ...
##  $ taxa           : chr  "Rodent" "Rodent" "Rodent" "Rodent" ...
##  $ month          : int  5 8 9 4 5 7 10 11 1 5 ...
##  $ day            : int  6 5 4 29 30 4 25 17 16 18 ...
##  $ year           : int  1978 1978 1978 1979 1979 1979 1979 1979 1980 1980 ...
##  $ plot_id        : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ species_id     : chr  "NL" "NL" "NL" "NL" ...
##  $ sex            : chr  "M" "M" "M" "M" ...
##  $ hindfoot_length: int  32 34 32 33 32 32 33 30 33 31 ...
##  $ weight         : int  204 199 197 166 184 206 274 186 184 87 ...
##  $ plot_type      : chr  "Control" "Control" "Control" "Control" ...

Reshaping the Data

The vegan package requires an input matrix with each species in the community as a different column and each observation as a different row. I will first use group_by() to group the data by species_name and my environmental variables of interest. Then I will use count() to count the number of individuals of each species, and use the spread() function to reshape my data into the correct format.

matrix <- surveys %>% 
  # group by species and by environmental factors of interest
  group_by(species_name, plot_id, plot_type) %>% 
  # count the number of individuals of each species
  count() %>% 
  # spread the data such that each species has its own column
  spread(key = species_name, value = n, fill = 0) 

*Note that I added the fill = 0 argument in the spread function. This indicates that I want missing values to be recorded as absences, or 0s, rather than NAs.

Examine the first few rows of the matrix data frame to make sure it looks the way you expect.

head(matrix)
plot_id plot_type Chaetodipus baileyi Chaetodipus penicillatus Dipodomys merriami Dipodomys ordii Dipodomys spectabilis Neotoma albigula Onychomys leucogaster Onychomys torridus Perognathus flavus Peromyscus eremicus Peromyscus maniculatus Reithrodontomys fulvescens Reithrodontomys megalotis Sigmodon hispidus
1 Spectab exclosure 109 211 565 460 196 50 48 96 28 11 15 0 23 0
2 Control 167 175 528 289 119 169 59 180 14 137 35 1 66 11
3 Long-term Krat Exclosure 332 347 95 80 13 82 60 149 114 65 38 5 194 31
4 Control 95 113 1010 56 229 5 51 69 124 1 4 0 18 2
5 Rodent Exclosure 0 55 406 90 54 24 70 103 38 33 47 5 98 3
6 Short-term Krat Exclosure 238 192 188 201 85 14 39 140 100 66 26 7 79 8

Creating species and environmental matrices

Now you want to separate your matrix into a species matrix and a separate data frame of environmental variables.

# Select species columns, columns 4-16 of the matrix data frame
spp <- matrix[,4:16]

# Select environmental columns, columns 1-3 of the matrix data frame
env <- matrix[,1:3]

Principle Components Analysis

Principle Components Analysis (PCA) is a simple rotation of your data in space, where distances among data points are preserved. The PCA will rotate your data such that the maximum amount of variation is displayed on axis 1, the maximum remaining variation that is uncorrelated to axis 1 is shown on axis 2, and so on. The “factor loadings” will tell you the weighted combination of species that contribute to each axis. We will use the function rda() to run a PCA on our matrix of species.

# Create PCA object
pca <- rda(spp, scale = T)
# Plot results of pca as biplot of axis 1 and axis 2
plot(pca)

This plot was generated in using the default settings. We can make it look a little nicer by adding type = "n" to our plot code and adding the two components separately, using points to show the different sites and text to show the different species.

# Set base plot
plot(pca, type = "n")
# Add sites to plot as points
points(pca, display = "sites")
# Add species names to plot
text(pca, display = "species")

This plot contains a lot of information.
Sites (points) that are far apart from each other horizontally differ along the first principle component axis.
Sites that are far apart vertically differ along the second principle component.
Species that are in opposite directions tend not to co-occur, while species that group in the same direction do tend to co-occur.

For linear transformations like PCA (e.g., rotations of the data), you can also visualize species as vectors (arrows) pointing from the center of the ordination. To create that plot, we will use the biplot() function.

biplot(pca, scaling = "symmetric", type = c("text", "points"))

The length of the vector indicates the strength of the species’ “loading” on an axis. Very short vectors mean that variation in the occurrence of those species cannot be visualized well along these two axes. To visualize variation in those species, you may need to choose a different projection.
Vectors pointing in opposite directions indicate that species tend to be negatively correlated on that axis. Arrows pointing in the same direction indicate that species tend to be positively correlated on that axis.

Challenge 1

Based on the results of the PCA biplot, which of the following statements is/are true regarding relationships between Dipodomys (kangaroo rat) captures and captures of individuals of other species?
a. Locations where Dipodomys mice were captured also tended to capture individuals of the genus Reithrodontomys.
b. Locations where Dipodomys mice were captured also tended to capture individuals of the genus Peromyscus.
c. Locations where Dipodomys mice were captured also tended to capture individuals of the genus Onychomys.
d. Variation in the abundance of Dipodomys is not well represented in this biplot.
Find any pair of genera that tend to co-occur in the same trapping plots. Google those genera to learn a little more about them (briefly, do not spend more than ~5 min doing this). Why do you think these two genera tended to co-occur in the dataset?

Canonical Correspondence Analysis

The PCA has helped you to visualize the structure of your community. However, you may also want to investigate whether environmental factors may help explain differences in community structure. This set of questions is analogous to using a linear model to test the effect of one variable on another, only in this case, your response variable is a community matrix rather than a single variable.

The benefit to using CCA over PCA is that the rotation of your data is “constrained” such that it optimally describes variance in the community and the environmental variables at the same time. A PCA describes variance in the community and then superimposes the environmental variables.

The formula for a CCA is similar to a linear model

  • Linear model formula: mod <- lm(DepVar~IndepVar, data = data)
  • CCA model formula: ord <- cca(DepVarMatrix~IndepVar, data = data)

Let’s use CCA to test the hypothesis that community structure will differ among the various plot types.

# First I define my model with the species matrix as the dependent variable
# and plot type as the independent variable
cca <- cca(spp~plot_type, data = env)
View CCA Output
# Call results of model
summary(cca)
## 
## Call:
## cca(formula = spp ~ plot_type, data = env) 
## 
## Partitioning of scaled Chi-square:
##               Inertia Proportion
## Total          0.6094     1.0000
## Constrained    0.3433     0.5633
## Unconstrained  0.2661     0.4367
## 
## Eigenvalues, and their contribution to the scaled Chi-square 
## 
## Importance of components:
##                         CCA1   CCA2     CCA3     CCA4     CA1     CA2     CA3
## Eigenvalue            0.2555 0.0714 0.008849 0.007542 0.08325 0.06938 0.04099
## Proportion Explained  0.4193 0.1172 0.014521 0.012376 0.13661 0.11385 0.06726
## Cumulative Proportion 0.4193 0.5365 0.550974 0.563350 0.69996 0.81381 0.88107
##                           CA4     CA5     CA6      CA7      CA8      CA9
## Eigenvalue            0.02580 0.01509 0.01050 0.007897 0.004811 0.003623
## Proportion Explained  0.04234 0.02476 0.01723 0.012958 0.007894 0.005944
## Cumulative Proportion 0.92340 0.94817 0.96540 0.978354 0.986248 0.992192
##                           CA10     CA11      CA12
## Eigenvalue            0.002309 0.001511 0.0009379
## Proportion Explained  0.003789 0.002479 0.0015390
## Cumulative Proportion 0.995982 0.998461 1.0000000
## 
## Accumulated constrained eigenvalues
## Importance of components:
##                         CCA1   CCA2     CCA3     CCA4
## Eigenvalue            0.2555 0.0714 0.008849 0.007542
## Proportion Explained  0.7443 0.2080 0.025775 0.021969
## Cumulative Proportion 0.7443 0.9523 0.978031 1.000000
## 
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## 
## 
## Species scores
## 
##                                CCA1     CCA2       CCA3     CCA4      CA1
## Chaetodipus penicillatus    0.31888  0.38714 -0.0126186 -0.11485 -0.17601
## Dipodomys merriami         -0.41536 -0.10032  0.0771726 -0.02062  0.28224
## Dipodomys ordii            -0.42062 -0.03984 -0.1857826 -0.07096 -0.35995
## Dipodomys spectabilis      -0.46513  0.04062 -0.1621712  0.10599  0.22989
## Neotoma albigula            0.07580  0.07664  0.1076133  0.14505 -0.20920
## Onychomys leucogaster      -0.02057  0.04218 -0.0348863  0.09295  0.24958
## Onychomys torridus          0.11634  0.15477  0.0537211  0.08161 -0.01676
## Perognathus flavus          0.56391  0.68460  0.0007958  0.10577  0.18462
## Peromyscus eremicus         0.66292 -0.43029  0.0147822  0.12819 -0.42604
## Peromyscus maniculatus      0.94824 -0.34986 -0.0709096 -0.02145 -0.36388
## Reithrodontomys fulvescens  0.80648 -0.39626 -0.1440087  0.49770 -0.42119
## Reithrodontomys megalotis   0.96707 -0.34614 -0.0189596 -0.06498 -0.44210
## Sigmodon hispidus           0.73863  0.64428  0.2906644 -0.39622 -0.10609
##                                  CA2
## Chaetodipus penicillatus   -0.147158
## Dipodomys merriami          0.035062
## Dipodomys ordii            -0.491868
## Dipodomys spectabilis       0.046923
## Neotoma albigula           -0.242248
## Onychomys leucogaster      -0.145283
## Onychomys torridus         -0.149976
## Perognathus flavus          0.187732
## Peromyscus eremicus         0.006578
## Peromyscus maniculatus      0.513197
## Reithrodontomys fulvescens  0.512437
## Reithrodontomys megalotis   0.554232
## Sigmodon hispidus          -0.149857
## 
## 
## Site scores (weighted averages of species scores)
## 
##           CCA1     CCA2     CCA3     CCA4      CA1      CA2
## row1  -0.86982  0.30089 -4.56241 -1.99659 -0.83639 -0.72364
## row2  -0.22654 -0.20707 -0.46241  1.18599 -1.46004 -0.64505
## row3   1.26583  1.55841  0.32728 -2.01080  0.08548 -1.24631
## row4  -0.96072  0.33999  2.11716  0.69267  1.38073  0.83009
## row5  -0.04887 -0.52818  0.69537  0.67392  2.39221 -1.99047
## row6   0.30199  0.99450 -3.19205 -0.05102 -0.61179 -1.00587
## row7   2.73264 -2.86473 -1.50283 -1.99035 -2.14179  1.22723
## row8  -0.89126  0.50246 -5.02388 -2.42896 -0.91650 -1.37743
## row9  -1.07577 -0.26544  0.08036 -0.13545  0.84282  0.72921
## row10  3.13500 -4.09568 -3.31507 -1.64945 -2.09194  3.94301
## row11 -0.66007 -0.29282  1.37423 -1.25795 -0.02382  0.18207
## row12 -0.56887 -0.17809  1.52717 -0.30714 -0.41660 -0.17649
## row13  0.19463  0.60637  0.49328  2.94117  0.66686 -0.02304
## row14 -0.97503 -0.58345  4.68813 -0.19360  1.17521  0.66275
## row15  1.69602  1.73965  0.83102  0.36548  0.19794 -0.18943
## row16  1.74604 -2.37210 -1.39852 -3.23920 -0.55758  0.71204
## row17 -0.58692 -0.08597  1.38834 -0.47651 -0.29285 -0.02175
## row18  0.54446  0.84086 -0.03019  2.14133  0.37268  0.86662
## row19  2.29324  0.79824 -0.79291 -1.20354 -0.52060  1.31696
## row20  0.62459  0.61370  0.16806  3.52947 -0.37895  0.29837
## row21  2.09386  2.10880  1.02687 -1.86170  0.22398  1.15186
## row22 -0.93468  0.18294  0.01046  3.88735  1.27292  1.01307
## row23  2.87115 -3.91128 -1.72975 -1.12700 -1.93999  2.85809
## row24  0.99612 -2.48964  1.76890  3.01679  0.24076 -1.08022
## 
## 
## Site constraints (linear combinations of constraining variables)
## 
##          CCA1     CCA2    CCA3       CCA4      CA1      CA2
## row1  -0.9724  0.01881 -2.2499 -1.0695831 -0.83639 -0.72364
## row2  -0.7037 -0.05852  0.7744 -0.0552729 -1.46004 -0.64505
## row3   1.7422  1.52956  0.3156 -1.2859821  0.08548 -1.24631
## row4  -0.7037 -0.05852  0.7744 -0.0552729  1.38073  0.83009
## row5   1.4054 -2.21651 -0.1817  0.0005864  2.39221 -1.99047
## row6   0.4101  0.77013 -0.7124  2.0743174 -0.61179 -1.00587
## row7   1.4054 -2.21651 -0.1817  0.0005864 -2.14179  1.22723
## row8  -0.7037 -0.05852  0.7744 -0.0552729 -0.91650 -1.37743
## row9  -0.9724  0.01881 -2.2499 -1.0695831  0.84282  0.72921
## row10  1.4054 -2.21651 -0.1817  0.0005864 -2.09194  3.94301
## row11 -0.7037 -0.05852  0.7744 -0.0552729 -0.02382  0.18207
## row12 -0.7037 -0.05852  0.7744 -0.0552729 -0.41660 -0.17649
## row13  0.4101  0.77013 -0.7124  2.0743174  0.66686 -0.02304
## row14 -0.7037 -0.05852  0.7744 -0.0552729  1.17521  0.66275
## row15  1.7422  1.52956  0.3156 -1.2859821  0.19794 -0.18943
## row16  1.4054 -2.21651 -0.1817  0.0005864 -0.55758  0.71204
## row17 -0.7037 -0.05852  0.7744 -0.0552729 -0.29285 -0.02175
## row18  0.4101  0.77013 -0.7124  2.0743174  0.37268  0.86662
## row19  1.7422  1.52956  0.3156 -1.2859821 -0.52060  1.31696
## row20  0.4101  0.77013 -0.7124  2.0743174 -0.37895  0.29837
## row21  1.7422  1.52956  0.3156 -1.2859821  0.22398  1.15186
## row22 -0.7037 -0.05852  0.7744 -0.0552729  1.27292  1.01307
## row23  1.4054 -2.21651 -0.1817  0.0005864 -1.93999  2.85809
## row24  1.4054 -2.21651 -0.1817  0.0005864  0.24076 -1.08022
## 
## 
## Biplot scores for constraining variables
## 
##                                       CCA1      CCA2     CCA3       CCA4 CA1
## plot_typeLong-term Krat Exclosure   0.6525  0.572900  0.11822 -0.4816675   0
## plot_typeRodent Exclosure           0.5342 -0.842531 -0.06908  0.0002229   0
## plot_typeShort-term Krat Exclosure  0.1737  0.326267 -0.30183  0.8787877   0
## plot_typeSpectab exclosure         -0.3636  0.007033 -0.84130 -0.3999433   0
##                                    CA2
## plot_typeLong-term Krat Exclosure    0
## plot_typeRodent Exclosure            0
## plot_typeShort-term Krat Exclosure   0
## plot_typeSpectab exclosure           0
## 
## 
## Centroids for factor constraints
## 
##                                       CCA1     CCA2    CCA3       CCA4 CA1 CA2
## plot_typeControl                   -0.7037 -0.05852  0.7744 -0.0552729   0   0
## plot_typeLong-term Krat Exclosure   1.7422  1.52956  0.3156 -1.2859821   0   0
## plot_typeRodent Exclosure           1.4054 -2.21651 -0.1817  0.0005864   0   0
## plot_typeShort-term Krat Exclosure  0.4101  0.77013 -0.7124  2.0743174   0   0
## plot_typeSpectab exclosure         -0.9724  0.01881 -2.2499 -1.0695831   0   0

The cca output gives you a lot of information. I want to point out just a few things. The Inertia is similar to variance in a linear model. In this case, the total inertia is 0.6094, and the constrained inertia (or the inertia explained by your model) is 0.3433, or 56% of the total. The eigenvalues are similar to slopes, but in this case, for a whole axis. You generally hope that your eigenvalues, and thus the relative importance of your axes, will decrease in magnitude after the first 2 or 3 axes, which seems to be the case here. The species scores tell your the relative contribution of each species to each axis.

Significance tests

You can run an ANOVA on the results of your CCA to test the significance of your grouping variable using the function anova(). In this case, the p-values are calculated using permutation tests. A permutation test randomly reshuffles your data a set number of times and comes up with a frequency distribution; it then uses this distribution to test how likely your data are, given the combinations that were possible.

# Run anova on cca object
ano <- anova(cca)
# Call results of anova
ano
## Permutation test for cca under reduced model
## Permutation: free
## Number of permutations: 999
## 
## Model: cca(formula = spp ~ plot_type, data = env)
##          Df ChiSquare      F Pr(>F)    
## Model     4   0.34332 6.1283  0.001 ***
## Residual 19   0.26611                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Plotting Ordinations

You can also plot the results of your CCA. We will use base plotting for most of today’s activity, and then I will provide a brief introduction to how you can create similar plots using ggplot2.

# We can plot the results of the cca
plot(cca)

As you can see, it’s hard to see what is going on in the default plot. Let’s try substituting points for text and adding ellipses to show differences among groups.

# Set up base plot
plot(cca, type = "p")
# Add ellipses to show differences among plot_type groups
ordiellipse(cca, env$plot_type, col=1:5, draw="polygon", label = T)

The default is to plot the ellipses as the standard deviation of each group. Therefore, the relative size of the ellipses tells you the relative amount of variation within each group.

Challenge 2

Pull up the help file for the ordiellipse() function by typing ?ordiellipse().
Use this help file to figure out how to:
1. Change the ellipses so that they enclose all of the observations of each group.
2. Plot the ellipses as lines rather than polygons.
What does this plot tell you about how well these groups are represented in this 2D projection? Are any groups not well represented?

Non-metric Multidimensional Scaling

So far all of the techniques we have used have been linear transformations, where distances between points are preserved when projecting into fewer dimensions. This is equivalent to running a parametric test (or linear model) like ANOVA or regression. As such, these techniques have all the same assumptions of parametric tests.

Assumptions of Linear Ordination

  • Multivariate normal distribution
  • Homogeneity of variances
  • Linearity
  • Independence

Data likely never meet all of these assumptions (particularly the multivariate normal distribution), but like parametric tests, linear multivariate tests are robust to minor violations of these assumptions. However, some data are not well suited to linear transformations. In these cases, it may be more appropriate to use a non-linear transformation to visualize changes in community structure. The most common of these is non-metric multidimensional scaling, or NMDS. NMDS uses a distance matrix and projects the data such that distances among points are maintained as much as possible. As with other ordinations, points that are close together are more similar with respect to species composition, while points that are further apart are more different with respect to species composition, and the axes represent a weighted combination of species abundances.

We will use the function metaMDS() to run an NMDS ordination. This is actually a wrapper that uses the function vegdist() to compute Bray-Curtis similarity values for your sites, rotates the data using several random starting points, and stops when it finds a similar low-stress projection twice. Since this is an iterative process, it may take the function a bit to run.

# Run NMDS
nmds <- metaMDS(spp)
# View output of NMDS
nmds
## 
## Call:
## metaMDS(comm = spp) 
## 
## global Multidimensional Scaling using monoMDS
## 
## Data:     wisconsin(sqrt(spp)) 
## Distance: bray 
## 
## Dimensions: 2 
## Stress:     0.08133011 
## Stress type 1, weak ties
## Best solution was repeated 5 times in 20 tries
## The best solution was from try 0 (metric scaling or null solution)
## Scaling: centring, PC rotation, halfchange scaling 
## Species: expanded scores based on 'wisconsin(sqrt(spp))'
# Plot NMDS projection
plot(nmds)
# Add ellipses representing plot types
plot(nmds)
ordiellipse(nmds, env$plot_type, col=1:5, kind = "ehull", draw="polygon", label = T)
## Error in Fortran routine computing the spanning ellipsoid,
##  probably collinear data
text(nmds, display = "species")

Plotting ordinations with ggplot

You can extract output from an ordination object to plot your results using ggplot. Let’s run through an example from our nmds analysis. We will extract two data frames from the nmds object: 1) x and y coordinates that describe relative distances among sites on your nmds axes, and 2) species scores on each axis. This will put the nmds information we need into a format that ggplot can read.

# extract NMDS scores (x and y coordinates)
datascores <- as.data.frame(scores(nmds)$sites)
# add plot_type variable to datascores data frame
datascores$plot_type <- env$plot_type

# extract NMDS scores for species
speciesscores <- as.data.frame(scores(nmds, "species"))
# create column for species based on row names of NMDS data frame
speciesscores$species <- row.names(speciesscores)

Once we have those data frames, we can use the ggplot() function to plot components of the nmds as separate layers. This is similar to code we used to plot map layers in our mapping workshop.

ggplot() +
  # add points to NMDS axes and color by plot type
  geom_point(data = datascores, aes(x = NMDS1, y = NMDS2, color = plot_type)) +
  # add species names to plot, and reduce text size
  geom_text(data = speciesscores, aes(x = NMDS1, y = NMDS2, label = species), size = 3) +
  # scale plot so that distances are the same on the two axes,
  # and adjust limits so that species names are readable
  coord_fixed(ratio = 1, xlim = c(-0.75, 1)) +
  # remove background shading
  theme_minimal() +
  # change legend label
  labs(color = "Plot Type") 

Note the species names are overlapping. Just as we did in the mapping unit, you can load the ggrepel package and substitute geom_text_repel() for geom_text() so that species names are in the same relative position but not overlapping.

# load ggrepel package
library(ggrepel)

# create same nmds plot
ggplot() +
  geom_point(data = datascores, aes(x = NMDS1, y = NMDS2, color = plot_type)) +
  # substitute geom_text_repel
  # add segment.color = NA to remove segment lines
  geom_text_repel(data = speciesscores, aes(x = NMDS1, y = NMDS2, label = species), 
                  size = 3, segment.color = NA) +
  coord_fixed(ratio = 1, xlim = c(-0.75, 1)) +
  theme_minimal() +
  labs(color = "Plot Type") 

If you download and load the package ggalt from the CRAN repository, you can also add a geom_encircle() layer to add a polygon that groups your points.

# load ggalt package
library(ggalt)

# create same nmds plot
ggplot() +
  geom_point(data = datascores, aes(x = NMDS1, y = NMDS2, color = plot_type)) +
  # plot polygon including points within each plot_type group
  geom_encircle(data = datascores, aes(x = NMDS1, y = NMDS2, color = plot_type), 
                s_shape = 1, expand = 0) +
  geom_text_repel(data = speciesscores, aes(x = NMDS1, y = NMDS2, label = species), 
                  size = 3, segment.color = NA) +
  coord_fixed(ratio = 1, xlim = c(-0.75, 1)) +
  theme_minimal() +
  labs(color = "Plot Type") 

At this point, you have a publication-ready graph that can be used in association with the “stress” value and any other statistical results from your ordination. If you are running a PCA or CCA, you may also want to include the amount of variation explained by each axis in the axis labels.

If you use ordinations a lot in your research, you may also want to look into the ggvegan package. This user-created package has the ability to add other features including vector lines and ellipses. Unfortunately, this package is not available in the CRAN repository, so it takes a bit of work to download and is outside the scope of today’s lesson. If you are performing multivariate analyses for your research, I would recommend consulting the documentation for installing and using ggvegan. Documentation for ggvegan (including instructions on how to install the package) is available at https://github.com/gavinsimpson/ggvegan.

Challenge 3

How do the results of the NMDS compare to the results of the CCA?
Was this ordination more/less successful in projecting sites into 2D space?
How would you describe the differences in community composition among the different plot types?

Detrended Correspondence Analysis

The vegan package also includes a function for detrended correspondence analysis (DCA), decorana(). This function is commonly used in community ecology, but I cannot in good conscience recommend that anyone ever use it.

“Detrending” is a more general statistical tool, and sometimes it makes sense. For example, let’s say that you wish to examine climate variation due to the North Atlantic Oscillation; there will be variation in your temperature data due to the oscillation and due to the long-term climate trends. You can “detrend” the variation by removing the long-term trend, allowing you to focus on the variation due to the climate oscillation. That kind of detrending makes sense because you know what variation you are removing from your data and why.

An example of detrending that makes sense. Temperature anomolies due to the Atlantic Multidecadal Oscillation, after detrending to remove long-term warming. Source: Wikipedia Commons, Marsupilami Rosentod

Detrended correspondence analysis is often used where species-rich ecological communities change gradually along a gradient. One example could be an elevational gradient. The extreme ends of this gradient tend to be different relative to the middle. When you plot the community using CCA, the data plot as an arch or a parabola. Ecologists wishing to describe the gradient find this annoying because it’s difficult to fit the data to linear models. DCA “solves” this problem by collapsing one axis of variation.

Comparison of arch community data in CCA versus the “solution” in DCA. Source: Wikipedia Commons

Users of this method argue that the variation was an “artifact” of the gradient and should be removed. I want to emphasize that this method is not a transformation. It is simply hiding variation in the data. For this reason, many statisticians find this method to be ethically questionable. The better solution would simply be to retain the additional axis of variation, or to focus on explaining variation in the first coordinate axis.

References

Oksanen, Jari. 31 August 2019. Vegan: an introduction to ordination. processed with vegan 2.5-6 in R version 3.6.1 (2019-07-05). https://cran.r-project.org/web/packages/vegan/vignettes/intro-vegan.pdf