Analyses
========

This page describes the available analyses and uses several terms and mathematical symbols that may be unfamiliar.
The :doc:`../glossary` explains these terms.

The :option:`seal analyse <analyse>` subcommand reads a
:seal-repo-file:`taskfile <tasks/example-task.toml>`
and performs the analyses requested therein.

Alternatively, the :meth:`seal.analyse` method in combination with :meth:`seal.Config.from_taskfile` does the same.

The dataset, specified by the ``encounters`` key, must be a comma-delimited CSV file.
Note that some tools export CSV files with semicolons as the delimiter by default.
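
If an export uses a different delimiter, it can be converted beforehand. The following is a minimal sketch using only Python's standard ``csv`` module. It is not part of :program:`seal`, the name ``to_comma_delimited`` is hypothetical, and it assumes the original delimiter is a comma, a semicolon, or a tab.

.. code-block:: python

   import csv
   import io


   def to_comma_delimited(text: str) -> str:
       """Rewrite CSV text so that it uses commas as the delimiter.

       The original delimiter is guessed with csv.Sniffer, restricted to
       comma, semicolon and tab (an assumption about typical exports).
       """
       dialect = csv.Sniffer().sniff(text, delimiters=",;\t")
       rows = csv.reader(io.StringIO(text), dialect)
       out = io.StringIO()
       csv.writer(out, lineterminator="\n").writerows(rows)
       return out.getvalue()


   # Usage with a small, made-up semicolon-delimited export:
   print(to_comma_delimited("coord_x;coord_y;species\n0;0;Poa annua\n"))

Fields containing commas are quoted automatically by ``csv.writer``, so the converted file stays valid.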

Example:

.. code-block:: console

   seal analyse --taskfile ./tasks/example-task.toml

Currently supported analyses are:


a1 - Overview
-------------

This analysis calculates the number of species and the number of
encountered individuals in each quadrat for each level. The resulting plots
are heatmaps of these values.

The auxiliary data contain a general description of the given dataset and
various summary statistics for each level, such as the mode, mean, median,
and minimum of each data column.

|a1 species|
|a1 individuals|

This helps the user get familiar with the data and serves as a basic check
of whether the data suit the chosen level-creating strategy.

Output columns: ``coord_x,coord_y,n_species,n_individuals,level``
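
The per-quadrat tallies can be sketched as follows. This is an illustration, not :program:`seal`'s implementation: it assumes (hypothetically) that every encounter is one individual, given as a dict with the keys ``coord_x``, ``coord_y``, ``species``, and ``level``, and ``overview`` is a made-up name.

.. code-block:: python

   from collections import defaultdict


   def overview(encounters):
       """Per-quadrat species and individual counts for each level.

       Assumes every record is one encountered individual given as a dict
       with keys coord_x, coord_y, species and level (hypothetical format).
       """
       species = defaultdict(set)
       individuals = defaultdict(int)
       for rec in encounters:
           key = (rec["coord_x"], rec["coord_y"], rec["level"])
           species[key].add(rec["species"])
           individuals[key] += 1
       # One row per quadrat and level, mirroring the a1 output columns.
       return [
           {"coord_x": x, "coord_y": y,
            "n_species": len(species[(x, y, lvl)]),
            "n_individuals": individuals[(x, y, lvl)],
            "level": lvl}
           for (x, y, lvl) in sorted(species)
       ]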


a2 - Species-area relationship
------------------------------

.. note::
   This analysis shows the influence of grain and extent on the species-area relationship and thus on perceived species richness. It is the core analysis of :program:`seal`.


The species-area curve is calculated for each level by accumulating quadrats
and tallying the number of species. Since this method is sensitive to the
order of the quadrats, the number of permutations may be specified (default: 200), and
the arithmetic mean over these permutations is plotted. The second plot shows the extremes (minimum and maximum) of these permutations.
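
Accumulation over random permutations can be sketched as below. It is a sketch under stated assumptions rather than :program:`seal`'s code: the quadrats of one level are given as a list of species sets, and ``species_accumulation`` is a hypothetical name.

.. code-block:: python

   import random
   from statistics import mean


   def species_accumulation(quadrats, n_permutations=200, seed=None):
       """Mean, minimum and maximum species accumulation over permutations.

       quadrats is a list of species sets, one per quadrat of one level.
       Element k of each returned list corresponds to k + 1 quadrats.
       """
       rng = random.Random(seed)
       curves = []
       for _ in range(n_permutations):
           order = quadrats[:]
           rng.shuffle(order)  # a new random ordering of the quadrats
           seen, curve = set(), []
           for q in order:
               seen |= q
               curve.append(len(seen))
           curves.append(curve)
       # Transpose so that each tuple holds all permutations for one step.
       steps = list(zip(*curves))
       return ([mean(s) for s in steps],
               [min(s) for s in steps],
               [max(s) for s in steps])

The first list corresponds to the mean curve, the other two to the extremes.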

The auxiliary data contain statistics of the calculated results.

|a2 main|
|a2 extremes|

The greater the influence of scale, the further apart the lines representing the individual levels are.
The number of species at the lower bound of each line segment (representing a level)
is the average number of species in a single :term:`quadrat` of that level. If the influence of scale is high, more
species are found per unit of area when smaller quadrats are placed further apart than when larger ones are placed closer together (provided both cover the same total area).
The upper bound of the curve for the highest level is the total number of species in the study grid, since the whole grid is
covered only during the analysis of the highest level (this does not apply to the :ref:`repeated transect merging <repeated transect merging>` strategy).

Output columns: ``0,1,...,n_permutations,min_acc,max_acc,area,level``


a3 - Species turnover
---------------------

For each level, this analysis computes :math:`|Q_1 \setminus Q_2|` and :math:`|Q_1 \cup Q_2|` for every pair of quadrats,
along with :math:`d(q_1,q_2)`, where :math:`d` is the chosen distance function. Simply put, the number of species found in the first quadrat
but not in the second one is noted.
Then the number of all species found in either quadrat (their cumulative species richness) is noted.
Finally, the distance between the quadrats is noted.

Distance can (and, with the exception of Chebyshev's, likely should) be binned using the ``interval`` parameter in the taskfile.
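
The pairwise quantities and the binning can be sketched as follows. Several choices here are assumptions rather than confirmed behaviour of :program:`seal`: quadrats are a dict mapping ``(x, y)`` coordinates to species sets, ordered pairs are used because :math:`|Q_1 \setminus Q_2|` is asymmetric, and each distance is binned by its lower bin edge. The names ``turnover`` and ``chebyshev`` are hypothetical.

.. code-block:: python

   import math
   from itertools import permutations


   def chebyshev(a, b):
       """Chebyshev distance between two (x, y) coordinates."""
       return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


   def turnover(quadrats, distance=math.dist, interval=None):
       """Species difference, union and distance for every ordered pair.

       quadrats maps (x, y) coordinates to a set of species. If interval
       is given, distances are binned by their lower bin edge (assumed).
       """
       rows = []
       for (p1, q1), (p2, q2) in permutations(quadrats.items(), 2):
           d = distance(p1, p2)
           rows.append({
               "distance": d,
               "abs_diff": len(q1 - q2),   # |Q1 \ Q2|
               "spp_union": len(q1 | q2),  # |Q1 union Q2|
               "distance_bin": math.floor(d / interval) * interval if interval else None,
           })
       return rows

``math.dist`` gives the Euclidean distance; passing ``chebyshev`` instead yields integer grid distances, which is why binning matters less for that metric.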

The resulting plots depict the richness difference and the union, respectively, as a function of the distance between quadrats.
Line plots show the relationship for point-based (unbinned) distance, while box plots show the values for binned distance.

|a3 union|
|a3 union bin|
|a3 difference|
|a3 difference bin|

Output columns: ``distance,abs_diff,spp_union,distance_bin,level``

Used notation: :term:`Q_{3,2}`


a4 - Cumulative species richness
--------------------------------

This analysis aims to depict the relationship between extent and total
richness for various grains. It selects a subset of quadrats :math:`S` such that its elements
form the vertices ("corners") of a rectangle with varying gaps (measured in quadrats) between
them, where the maximum valid gap :math:`g` is limited by the side of the study grid with
the fewest quadrats.

Afterwards, :math:`|\bigcup_{q_i \in S} Q_i|` is computed for every possible pattern
with gap :math:`g`.

For grids containing a single :term:`zone` (:math:`z = 1`), :math:`S` is
limited to two elements, which form the endpoints of line segments with :math:`g` quadrats/transects in between.
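
A sketch of the corner patterns follows. It rests on several assumptions: quadrats are a dict mapping ``(x, y)`` to species sets on a ``width`` by ``height`` grid, the same gap is used in both directions, and a grid of height 1 stands in for the single-zone case. The function name ``cumulative_richness`` is hypothetical, and the sketch returns only the mean per gap.

.. code-block:: python

   from statistics import mean


   def cumulative_richness(quadrats, width, height):
       """Mean richness of the union over corner patterns, per gap g.

       quadrats maps (x, y) -> set of species on a width x height grid.
       A grid with height == 1 stands in for the single-zone case, where
       only the two endpoints are combined (an assumption of this sketch).
       """
       results = {}
       g = 0
       while True:
           step = g + 1  # g quadrats lie between adjacent corners
           ys = range(1) if height == 1 else range(height - step)
           totals = []
           for x in range(width - step):
               for y in ys:
                   if height == 1:
                       corners = [(x, y), (x + step, y)]
                   else:
                       corners = [(x, y), (x + step, y),
                                  (x, y + step), (x + step, y + step)]
                   totals.append(len(set().union(*(quadrats[c] for c in corners))))
           if not totals:  # no pattern with this gap fits into the grid
               break
           results[g] = mean(totals)
           g += 1
       return results

The loop stops at the first gap for which no pattern fits, which is where the shorter side of the grid limits :math:`g`.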

The resulting plots depict the mean cumulative species richness of the four (or two) quadrats per extent.
One plot shows this information as a box plot, the other as a line chart.

|a4 total|
|a4 mean|

The lower bound of each curve depicts the situation where the four quadrats/transects are adjacent, meaning that together they
represent a quadrat/transect of the next level. Typically, extent has only a minor influence, as the lines
run nearly parallel to the x-axis. On the other hand, the influence of grain is clearly visible, as the curves are well separated from each other.

Output columns: ``spp_total,extent,spp_mean,level``

Used notation: :term:`Q_{3,2}`, :term:`z`


a5 - Ratio of observed and expected richness variance in subgrids
-----------------------------------------------------------------

This analysis calculates the ratio :math:`\frac{V_o}{V_e}` in every valid subgrid :math:`S`, where :math:`V_o` is the observed richness variance
and :math:`V_e` the expected richness variance. Subgrid selection is explained in more detail below.

The expected variance is calculated as :math:`V_e = \sum_i P_i \times (1 - P_i)`,
where :math:`P_i` is the proportion of quadrats in the subgrid occupied by
species :math:`i` and the summation runs over all species in the study grid :cite:p:`Schluter1984Variance`.
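
For a single subgrid, the ratio can be sketched as below. The observed variance :math:`V_o` is taken here as the population variance of per-quadrat richness (dividing by the number of quadrats); that estimator, the input format, and the name ``variance_ratio`` are assumptions, not confirmed details of :program:`seal`.

.. code-block:: python

   import math
   from statistics import pvariance


   def variance_ratio(quadrats, all_species):
       """Observed-to-expected richness variance ratio V_o / V_e.

       quadrats: list of species sets, one per quadrat of the subgrid.
       all_species: set of all species in the study grid.
       """
       n = len(quadrats)
       # V_o: population variance of per-quadrat richness (assumed estimator)
       v_obs = pvariance([len(q) for q in quadrats])
       # P_i: proportion of the subgrid's quadrats occupied by species i
       proportions = [sum(sp in q for q in quadrats) / n for sp in all_species]
       v_exp = sum(p * (1 - p) for p in proportions)
       # V_e is zero when every species occupies all or none of the quadrats
       return v_obs / v_exp if v_exp else math.nan

Species of the study grid that are absent from the subgrid have :math:`P_i = 0` and contribute nothing to :math:`V_e`.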

Subgrids (and results) are created as follows:

#. :math:`g = 0`
#. select a subgrid :math:`S` of :math:`4 \times 4` quadrats with a gap of :math:`g` quadrats between adjacent quadrats that has not been selected before
#. compute :math:`\frac{V_o}{V_e}` and record it along with :math:`g`
#. unless every viable :math:`S` has been processed, go to step 2
#. :math:`g = g+1`
#. if a :math:`4 \times 4` subgrid with gap :math:`g` still fits into the study grid, go to step 2; otherwise stop

.. note::
   In other words, a set of 16 quadrats arranged in a 4x4 grid is chosen. :math:`V_o` and :math:`V_e` are
   calculated for this subgrid, and the gap between the quadrats (measured in quadrats) is noted. This is repeated
   until all possible subgrids of the given level with the given gap have been used.
   The process is then repeated for the next gap in the same level.
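
The subgrid enumeration can be sketched as a generator. It assumes that every placement counts as a separate subgrid, overlapping ones included, and the name ``subgrids`` is hypothetical; each yielded set of coordinates would then be passed to the :math:`\frac{V_o}{V_e}` computation.

.. code-block:: python

   def subgrids(width, height, size=4):
       """Yield (g, coordinates) for every size x size subgrid with gap g."""
       g = 0
       while True:
           step = g + 1  # g quadrats are skipped between selected ones
           span = (size - 1) * step + 1  # quadrats covered along one side
           if span > width or span > height:
               return  # no subgrid with this gap fits; stop
           for x0 in range(width - span + 1):
               for y0 in range(height - span + 1):
                   yield g, [(x0 + i * step, y0 + j * step)
                             for i in range(size) for j in range(size)]
           g += 1

For a 7 by 7 grid this yields 16 subgrids with :math:`g = 0` and a single one with :math:`g = 1`.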

The resulting graph depicts the ratio of observed to expected variance in the number of species for each level.

|a5 vove|

Typically, the plot shows that the observed variance deviates more from the expected one as the extent increases.
If this holds, the biodiversity of the sample tends to be concentrated in "hotspots": some quadrats/transects have many species,
while others have few or none.

Output columns: ``variance_ratio,gap,level``


a6 - Ratios of shared and unique species
----------------------------------------

This analysis calculates several ratios for every pair of quadrats and plots them against their distance.

#. .. math:: \frac{|Q_i \cap Q_j|}{|\bigcup_{k=1}^{n}Q_k|}; \forall i,j

   (ratio of species shared by the two quadrats to all species in the study grid of the given level)

#. .. math:: \frac{|Q_i \cap Q_j|}{|Q_i \cup Q_j|}; \forall i,j

   (ratio of species shared by the two quadrats to their union)

#. .. math:: \frac{|Q_i \cap Q_j|}{|Q_i \oplus Q_j|}; \forall i,j

   (ratio of species shared by the two quadrats to the species found in only one of them)
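
The three ratios can be sketched as below. Quadrats are assumed to be a non-empty list of non-empty species sets of one level; the sketch uses unordered pairs, omits the distance column for brevity, and maps the ratios to the output column names by our own reading. The name ``shared_ratios`` is hypothetical.

.. code-block:: python

   from itertools import combinations


   def shared_ratios(quadrats):
       """Three shared-species ratios for every unordered pair of quadrats.

       quadrats is a list of non-empty species sets of one level. Pairs
       whose symmetric difference is empty get None for the last ratio.
       """
       grid_total = set().union(*quadrats)
       rows = []
       for qi, qj in combinations(quadrats, 2):
           shared = len(qi & qj)
           exclusive = len(qi ^ qj)  # symmetric difference Q_i (+) Q_j
           rows.append({
               "shared_gridtotal": shared / len(grid_total),
               "shared_cumulative": shared / len(qi | qj),
               "shared_exclusive": shared / exclusive if exclusive else None,
           })
       return rows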

The resulting plots show these ratios against the distance between quadrats.

|a6 gridtotal|
|a6 union|
|a6 xor|

These ratios help explore species turnover in more depth.

Output columns: ``shared_exclusive,shared_gridtotal,shared_cumulative,distance,level``

Used notation: :term:`Q_{3,2}`


a7 - Jaccard dissimilarity
--------------------------

This analysis calculates the Jaccard dissimilarity of species composition
between all possible pairs of quadrats.

.. note::
   While there are many ways to calculate dissimilarity, we chose Jaccard dissimilarity for its simplicity.
   Users are encouraged to replace it with a dissimilarity measure of their choice in the ``analyse.py`` file.

.. math:: J(q_1, q_2) = 1 - \frac{|Q_1 \cap Q_2|}{|Q_1 \cap Q_2| + |Q_1 \setminus Q_2| + |Q_2 \setminus Q_1|}
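
The formula translates directly into set operations, since the denominator equals :math:`|Q_1 \cup Q_2|`. In this sketch the function name is hypothetical, and treating two empty quadrats as identical is our own assumption.

.. code-block:: python

   def jaccard_dissimilarity(q1, q2):
       """J = 1 - |Q1 & Q2| / (|Q1 & Q2| + |Q1 - Q2| + |Q2 - Q1|)."""
       shared = len(q1 & q2)
       denominator = shared + len(q1 - q2) + len(q2 - q1)  # equals |Q1 | Q2|
       if denominator == 0:
           return 0.0  # assumption: two empty quadrats count as identical
       return 1 - shared / denominator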

The resulting plot depicts dissimilarity against the distance between quadrats.

|a7 jaccard|

The analysis offers another view of turnover and distance decay, typically showing that quadrats become less similar
as the distance between them grows. The effect is usually more pronounced for quadrats of higher levels, as they are less prone to
random effects. On the other hand, dissimilarity is usually higher between smaller quadrats, as they contain a
smaller sample of the community and are less likely to include individuals of the same species by chance.

Output columns: ``jaccard_dissimilarity,log_jaccard_dissimilarity,distance,level``

Used notation: :term:`Q_{3,2}`


a8 - Species abundance distribution
-----------------------------------

This analysis calculates the number of individuals per species.
This allows assessing how (un)evenly the species are represented in the community at each level.
Optionally, the results are plotted in two types of plots that provide visual guidance.

The first plot serves as a quick visual overview of species abundance,
with species names included to highlight which are the most and the least common.

The second plot displays the dominance-diversity relationship using an empirical complementary
cumulative distribution function (ECCDF) plot, sometimes called an exceedance curve, especially in forecasting.
This type of curve shows the probability that the abundance of a sampled species exceeds a given value.
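
Both plots can be derived from simple counts, as in the sketch below. It assumes the input is one species name per encountered individual; the name ``abundance_and_eccdf`` is hypothetical. The ECCDF value for an abundance :math:`a` is the fraction of species whose abundance strictly exceeds :math:`a`.

.. code-block:: python

   from bisect import bisect_right
   from collections import Counter


   def abundance_and_eccdf(species_records):
       """Individuals per species (ranked) and the ECCDF of abundances.

       species_records holds one species name per encountered individual
       (an assumed input format).
       """
       abundance = Counter(species_records)
       values = sorted(abundance.values())
       n = len(values)
       # Fraction of species whose abundance strictly exceeds a.
       eccdf = [(a, (n - bisect_right(values, a)) / n) for a in sorted(set(values))]
       # most_common() gives the rank-abundance order, most common first.
       return abundance.most_common(), eccdf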

|a8 rankabundance|
|a8 eccdf|

The results of this analysis help provide an overview of the species abundance distribution.

The first plot shows the most and least common species and their representation in the sampled community.
As the plot shows raw data, it is the most direct representation of community evenness.

The second plot helps illustrate the differences in species abundance distributions between analysed levels
without binning or other data manipulation that could hide naturally occurring patterns.

Output columns: ``species,individuals,level``


.. |a1 individuals| image:: img/analysis/a1-individuals.svg
.. |a1 species| image:: img/analysis/a1-species.svg
.. |a2 extremes| image:: img/analysis/a2-extremes.svg
.. |a2 main| image:: img/analysis/a2-main.svg
.. |a3 union| image:: img/analysis/a3-union.svg
.. |a3 union bin| image:: img/analysis/a3-union-bin.svg
.. |a3 difference| image:: img/analysis/a3-difference.svg
.. |a3 difference bin| image:: img/analysis/a3-difference-bin.svg
.. |a4 total| image:: img/analysis/a4-total.svg
.. |a4 mean| image:: img/analysis/a4-mean.svg
.. |a5 vove| image:: img/analysis/a5-vove.svg
.. |a6 gridtotal| image:: img/analysis/a6-gridtotal.svg
.. |a6 union| image:: img/analysis/a6-union.svg
.. |a6 xor| image:: img/analysis/a6-xor.svg
.. |a7 jaccard| image:: img/analysis/a7-jaccard.svg
.. |a8 rankabundance| image:: img/analysis/a8-rankabundance.svg
.. |a8 eccdf| image:: img/analysis/a8-eccdf.svg