Analyses#
This page uses several terms and mathematical symbols that may be unfamiliar. The Glossary explains them.
The seal analyse subcommand ingests a taskfile to perform the analyses requested therein.
Alternatively, the seal.analyse() method in combination with seal.Config.from_taskfile() does the same.
The provided dataset, specified by the key encounters, must be a comma-delimited CSV.
Note that some tools export CSVs with semicolons as the delimiter by default; such files need to be converted first.
Example:
seal analyse --taskfile ./tasks/example-task.toml
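Since the dataset must be comma-delimited, it can help to check an exported file before running the command. The following is a minimal sketch using only the Python standard library; it is not part of seal, and the helper names `detect_delimiter` and `to_comma_delimited` are hypothetical. It assumes the header line contains no quoted delimiters and does not handle decimal commas.

```python
import csv
import io


def detect_delimiter(header_line: str) -> str:
    """Guess whether a CSV header line is delimited by ';' or ','."""
    # Assumption: the header contains no quoted delimiters.
    return ";" if header_line.count(";") > header_line.count(",") else ","


def to_comma_delimited(text: str) -> str:
    """Return CSV text rewritten with commas as the delimiter."""
    if not text:
        return text
    delimiter = detect_delimiter(text.splitlines()[0])
    if delimiter == ",":
        return text  # already in the expected format
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

The converted text can then be written to a new file and referenced from the taskfile.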
Currently supported analyses are:
a1 - Overview#
This analysis calculates the per-quadrat number of species and number of encountered individuals for each level. The resulting plots are heatmaps of these data.
The auxiliary data contain a general description of the given dataset and various smaller statistics for each level, for example the mode, mean, median, and minimum of each data column.
This helps to become familiar with the data and serves as a basic check of how suitable the data are for the chosen levels-creating strategy.
Output columns: coord_x,coord_y,n_species,n_individuals,level
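The per-quadrat tally can be illustrated with a short sketch. This is not seal's implementation: the function name `overview` and the input shape (one `(coord_x, coord_y, species)` tuple per encountered individual) are assumptions made for the example, and the `level` column is omitted.

```python
from collections import defaultdict


def overview(encounters):
    """Count species and individuals per quadrat.

    encounters: iterable of (coord_x, coord_y, species) tuples,
    one tuple per encountered individual (assumed input shape).
    """
    species = defaultdict(set)
    individuals = defaultdict(int)
    for x, y, sp in encounters:
        species[(x, y)].add(sp)
        individuals[(x, y)] += 1
    return [
        {
            "coord_x": x,
            "coord_y": y,
            "n_species": len(species[(x, y)]),
            "n_individuals": individuals[(x, y)],
        }
        for (x, y) in sorted(species)
    ]
```

Each returned row corresponds to one heatmap cell.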
a2 - Species-area relationship#
Note
This analysis shows the influence of grain and extent on the species-area relationship and thus on perceived species richness. It is the core analysis of seal.
The species-area curve is calculated for each level by accumulating quadrats and tallying the number of species. Since this method is sensitive to the ordering of the quadrats, the number of permutations may be specified (default = 200); the arithmetic means across permutations are plotted. The second plot shows the extremes of these permutations.
The auxiliary data contain statistics of the calculated results.
The greater the influence of scale, the further apart the lines representing each level are. The number of species at the lower bound of each line segment (each segment representing a level) is the average number of species in a single quadrat of that level. If the influence of scale is high, more species are found per unit of area when smaller quadrats are placed further apart than when larger ones are placed closer together (provided they cover the same total area). The upper bound of the curve for the highest level is the total number of species in the study grid, since the whole grid is covered only during analysis of the highest level (this does not apply to the repeated transect merging strategy).
Output columns: 0,1,...,n_permutations,min_acc,max_acc,area,level
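The permutation-based accumulation described above can be sketched as follows. This is an illustration under assumptions, not seal's code: the function names, the representation of quadrats as sets of species names, and the fixed random seed are all hypothetical choices.

```python
import random


def accumulation_curves(quadrats, n_permutations=200, seed=0):
    """Return one species-accumulation curve per random quadrat ordering.

    quadrats: list of sets of species names (assumed representation).
    """
    rng = random.Random(seed)
    curves = []
    for _ in range(n_permutations):
        order = list(quadrats)
        rng.shuffle(order)
        seen = set()
        curve = []
        for quadrat in order:
            seen |= quadrat          # accumulate species found so far
            curve.append(len(seen))  # richness after adding this quadrat
        curves.append(curve)
    return curves


def summarise(curves):
    """Mean, minimum and maximum richness at each accumulation step."""
    per_step = list(zip(*curves))
    return {
        "mean": [sum(step) / len(step) for step in per_step],
        "min_acc": [min(step) for step in per_step],
        "max_acc": [max(step) for step in per_step],
    }
```

Every curve ends at the same value (the total richness of the accumulated quadrats), so the permutations differ only in how quickly they get there.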
a3 - Species turnover#
This analysis computes \(|Q_1 \setminus Q_2|\) and \(|Q_1 \cup Q_2|\) for every pair of quadrats, along with \(d(q_1,q_2)\), where \(d()\) is the chosen distance function, for each level. Simply put, for each pair it records the number of species found in the first quadrat but not in the second, the number of all species found in either quadrat (their cumulative species richness), and the distance between the two quadrats.
Distance can (and, with the exception of Chebyshev’s, likely should) be binned using the interval parameter in the taskfile.
The resulting plots depict the richness difference and the union, respectively, as a function of the distance between quadrats. Line plots show the relationship for point-based distance, while box plots show the values for binned distance.
Output columns: distance,abs_diff,spp_union,distance_bin,level
Used notation: Q_{3,2}
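A minimal sketch of the pairwise computation follows. It is illustrative only: the function names, the dictionary input, the use of unordered pairs, and the mapping of \(|Q_1 \setminus Q_2|\) to the `abs_diff` column are assumptions, and binning is left out.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Straight-line distance between two quadrat coordinates."""
    return math.dist(a, b)


def chebyshev(a, b):
    """Chebyshev distance: the largest coordinate difference."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def turnover(quadrats, distance=euclidean):
    """Compute difference, union and distance for each pair of quadrats.

    quadrats: dict mapping (x, y) -> set of species (assumed shape).
    Pairs are unordered here; the first quadrat is the one whose
    coordinates sort first (an assumption of this sketch).
    """
    rows = []
    for c1, c2 in combinations(sorted(quadrats), 2):
        q1, q2 = quadrats[c1], quadrats[c2]
        rows.append(
            {
                "distance": distance(c1, c2),
                "abs_diff": len(q1 - q2),   # |Q1 \ Q2|
                "spp_union": len(q1 | q2),  # |Q1 U Q2|
            }
        )
    return rows
```

Passing `distance=chebyshev` yields integer distances on a regular grid.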
a4 - Cumulative species richness#
This analysis aims to depict the relationship between extent and total richness for various grains. It selects a subset of quadrats \(S\) whose elements form the vertices (“corners”) of a rectangle with varying gaps (measured in quadrats) between them, where the maximum valid gap \(g\) is limited by the side of the study grid with the fewest quadrats.
Afterwards \(|\bigcup_{q_i \in S} Q_i|\) is computed for every possible pattern with gap \(g\).
For grids containing a single zone (\(z = 1\)), the union is limited to two elements, which form the endpoints of a line segment with \(g\) quadrats/transects in between.
The resulting plots depict the mean cumulative species richness of the four (or two) quadrats per extent. One plot shows this information as a box plot, the other as a line chart.
The lower bound of each curve corresponds to the situation where the four quadrats/transects abut, meaning that together they represent a quadrat/transect of the next level. Typically, extent has only a minor influence, as the lines run roughly parallel to the x-axis. The influence of grain, on the other hand, is clearly visible, as the curves are separate from each other.
Output columns: spp_total,extent,spp_mean,level
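The corner selection can be sketched as below for the four-quadrat case. This is a simplified illustration, not seal's implementation: the function name is hypothetical, quadrats are assumed to be stored in a dictionary keyed by integer grid coordinates, and the same gap is assumed in both directions.

```python
def corner_unions(quadrats, n_cols, n_rows, gap):
    """Cumulative richness of four corner quadrats for every placement.

    quadrats: dict mapping (x, y) -> set of species (assumed shape).
    gap: number of quadrats between neighbouring corners; gap 0 means
    the four quadrats abut.
    """
    step = gap + 1
    totals = []
    for x in range(n_cols - step):
        for y in range(n_rows - step):
            corners = [(x, y), (x + step, y), (x, y + step), (x + step, y + step)]
            union = set()
            for corner in corners:
                union |= quadrats.get(corner, set())
            totals.append(len(union))
    return totals
```

When the gap is too large for the shorter side of the grid, no placement fits and the result is empty.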
a5 - Ratio of observed and expected richness variance in subgrids#
This analysis calculates the ratio \(\frac{V_o}{V_e}\) in every valid subgrid \(S\), with \(V_o\) being the observed richness variance and \(V_e\) the expected richness variance. Subgrid selection is explained below in more detail.
The expected variance is calculated as \(\sum P_i \times (1 - P_i)\), with \(P_i\) being the proportion of quadrats in the subgrid occupied by species \(i\). The summation runs over all species in the study grid.
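As a worked illustration of the ratio, the sketch below computes \(V_o\) and \(V_e\) for one subgrid. The function name and input shapes are hypothetical, and the use of the population variance (dividing by the number of quadrats) for \(V_o\) is an assumption; the text above does not say which variance estimator seal uses.

```python
def variance_ratio(quadrats, all_species):
    """Ratio of observed to expected richness variance in one subgrid.

    quadrats: list of sets of species, one set per quadrat in the subgrid.
    all_species: iterable of all species in the study grid.
    """
    n = len(quadrats)
    richness = [len(q) for q in quadrats]
    mean = sum(richness) / n
    # Observed variance of per-quadrat richness (population variance: assumption).
    v_obs = sum((r - mean) ** 2 for r in richness) / n
    # Expected variance: sum of P_i * (1 - P_i) over all species.
    v_exp = 0.0
    for sp in all_species:
        p = sum(sp in q for q in quadrats) / n
        v_exp += p * (1 - p)
    return v_obs / v_exp if v_exp else float("nan")
```

For example, four quadrats where species a and b always occur together in half of them give a ratio of 2, while the same two species spread independently give a ratio of 1.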
Subgrids (and results) are created as follows:
1. \(g = 0\)
2. select \(S\) (that hasn’t been selected before) of \(4 \times 4\) quadrats with \(g\)-sized gap between them
3. compute \(\frac{V_o}{V_e}\), write it down along with \(g\)
4. unless every viable \(S\) is processed, go to step 2
5. \(g = g+1\)
6. go to step 2
Note
In other words, a set of 16 quadrats arranged in a 4x4 grid is chosen. \(V_o\) and \(V_e\) in this subgrid are calculated and the gap between the quadrats (measured in quadrats) is noted. The process is repeated until all possible subgrids of quadrats of the given level with the given gap have been used. The process is then repeated for the next gap in the same level.
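The selection procedure can be sketched as a generator. This is an illustration under assumptions: the function name is hypothetical, quadrats are addressed by integer grid coordinates, and the loop stops once a subgrid with the current gap no longer fits in the study grid (a stopping condition the steps above do not state explicitly).

```python
def subgrids(n_cols, n_rows, size=4):
    """Yield (gap, coords) for every size x size subgrid, gap by gap.

    coords is the list of (x, y) quadrat coordinates in the subgrid.
    """
    gap = 0
    while True:
        step = gap + 1
        span = (size - 1) * step  # offset from first to last selected quadrat
        if span >= n_cols or span >= n_rows:
            break  # assumption: stop when no subgrid with this gap fits
        for x0 in range(n_cols - span):
            for y0 in range(n_rows - span):
                coords = [
                    (x0 + i * step, y0 + j * step)
                    for i in range(size)
                    for j in range(size)
                ]
                yield gap, coords
        gap += 1
```

On a 7x7 grid this yields 16 subgrids with gap 0 and one subgrid with gap 1.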
The resulting graph depicts the ratio of observed to expected variance in the number of species per level.
Typically, the plot shows that the observed variance differs more from the expected one at larger extents. If this holds true, the biodiversity of the sample tends to occur in “hotspots”: while some quadrats/transects have many species, others have few or none.
Output columns: variance_ratio,gap,level
a7 - Jaccard dissimilarity#
This analysis calculates Jaccard dissimilarity with regard to species between all possible pairs of quadrats.
Note
While we realize there are many ways to calculate dissimilarity, we used Jaccard’s dissimilarity for its straightforwardness. Users are encouraged to modify the calculation to the dissimilarity type of their liking in the analyse.py file.
The resulting plot depicts dissimilarity against the distance between quadrats.
The analysis gives another look into turnover and distance decay, typically showing that plots become less similar as the distance between them increases. The effect is typically more pronounced in plots of higher levels, as they are less prone to effects of randomness. On the other hand, the dissimilarity is usually clearer between smaller quadrats, as they contain a smaller sample of the community and are less likely to contain individuals of the same species by chance.
Output columns: jaccard_dissimilarity,log_jaccard_dissimilarity,distance,level
Used notation: Q_{3,2}
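For reference, Jaccard dissimilarity between two quadrats is one minus the ratio of shared species to all species found in either quadrat. The sketch below shows the calculation on sets of species names; the function name and the value returned for two empty quadrats are assumptions of this example, not taken from analyse.py.

```python
def jaccard_dissimilarity(a, b):
    """Jaccard dissimilarity between two sets of species."""
    union = a | b
    if not union:
        return 0.0  # assumption: two empty quadrats count as identical
    return 1 - len(a & b) / len(union)
```

Identical species lists give 0 and completely disjoint lists give 1.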
a8 - Species abundance distribution#
This analysis calculates the number of individuals per species. This allows an assessment of how (un)evenly species are distributed in a community at each level. The results are optionally plotted in two types of plots that provide visual guidance.
The first plot serves as a quick visual overview of species abundance, with species names included to highlight which are the most and the least common.
The second plot displays the dominance-diversity relationship using a CDF plot or, to be technically precise, an empirical complementary cumulative distribution function (ECCDF) plot, sometimes called an exceedance curve, especially in forecasting. This type of curve shows the probability that the abundance of a sampled species exceeds a certain value.
The results of this analysis help provide an overview of the species abundance distribution.
The first plot shows the most and least common species and their representation in the sampled community. As the plot shows raw data, it is the purest representation of community evenness.
The second plot helps illustrate the differences in species abundance distributions between analysed levels without binning or other data manipulation that could hide naturally occurring patterns.
Output columns: species,individuals,level
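The ECCDF used in the second plot can be computed as in the following sketch. It is illustrative only: the function name and the input (a list with one abundance value per species) are assumptions, and plotting is left out.

```python
from collections import Counter


def eccdf(abundances):
    """Empirical complementary CDF of species abundances.

    abundances: list of individual counts, one per species.
    Returns sorted (value, P(abundance > value)) pairs.
    """
    n = len(abundances)
    counts = Counter(abundances)
    points = []
    remaining = n  # number of species with abundance above the current value
    for value in sorted(counts):
        remaining -= counts[value]
        points.append((value, remaining / n))
    return points
```

For abundances of 1, 1, 2 and 5, half of the species exceed an abundance of 1, a quarter exceed 2, and none exceed 5.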