| Title: | Community Niche Position and Width Estimation Tools |
|---|---|
| Description: | Provides methods for estimating species niche position and niche breadth under continuous environmental gradients. The package implements canonical correspondence analysis (CCA), partial CCA (pCCA), generalized additive models (GAM), and Levins' niche breadth metrics for species-level and community-level analyses. Methods are based on ter Braak (1986) <doi:10.2307/1938672>, Okie et al. (2015) <doi:10.1098/rspb.2014.2630>, Feng et al. (2020) <doi:10.1111/mec.15441>, Wood (2017) <doi:10.1201/9781315370279>, and Levins (1968, ISBN:978-0691080628). |
| Authors: | Shuotao Zhou [aut, cre], Kai Feng [aut], Ye Deng [aut] |
| Maintainer: | Shuotao Zhou <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.2 |
| Built: | 2026-05-10 07:49:25 UTC |
| Source: | https://github.com/yedeng-lab/econiche |
Evaluates the relationship between the environmental gradient and the aggregated niche position of sites.
cca_calc_gradient(env, site_pos, var, make_plot = TRUE, galaxy_colnum = TRUE)
env |
A sample-by-environment data.frame (rows = samples). |
site_pos |
Named numeric vector of site scores on the position axis. |
var |
Environmental variable to plot against (name or index). |
make_plot |
Logical; if |
galaxy_colnum |
Logical; for numeric indices, whether to treat them as 1-based indices. |
A list containing:
Data frame used for plotting (ENV, NichePosition).
The ggplot object.
Computes aggregated niche width statistics at the sample level and group level.
Supports two calculation modes via choice: "all" (global context)
or group-specific context.
cca_calc_group(otu, site_width, group, choice = "all")
otu |
OTU/species matrix (rows = taxa, columns = samples). |
site_width |
Site scores for width axis (usually matrix). |
group |
A named vector or factor defining groups. Names must match sample IDs. |
choice |
Calculation mode. If |
A list containing:
Data frame of sample-level statistics (ID, AverageValue, StandardDeviation/Error).
Data frame of group-level statistics.
Computes the niche width and niche position for each species using the site scores from Step 2. Generates a plot of Niche Width vs. Niche Position.
cca_calc_species(otu, site_width, site_pos, method = c("lm", "loess"), make_plot = TRUE, top_node = 10000)
otu |
OTU/species matrix (rows = taxa, columns = samples). |
site_width |
Site scores for width axis (usually matrix from partial CCA). |
site_pos |
Site scores for position axis (usually vector from CCA axis 1). |
method |
Smoothing method for the plot ("lm" or "loess"). |
make_plot |
Logical; if |
top_node |
Integer; maximum number of most abundant species to use for loess smoothing calculation (default 10000). Points are plotted for all valid species. |
A list containing:
Data frame of species traits (NichePosition, NicheWidth).
The ggplot object (if make_plot = TRUE).
Performs Constrained Correspondence Analysis (CCA) and Partial CCA to obtain site scores for niche position and niche width calculations.
cca_fit_ordination(otu, env, sel, covariates, standardize = TRUE, galaxy_colnum = TRUE)
otu |
An OTU/species matrix or data frame (rows = taxa, columns = samples). |
env |
A data.frame of environmental variables (rows = samples). |
sel |
Variables (names or indices) defining the width axis (used in partial CCA). |
covariates |
Variables (names or indices) defining the position axis (used in CCA and as covariates in partial CCA). |
standardize |
Logical; if |
galaxy_colnum |
Logical; for numeric indices, whether to treat them as Galaxy-style 1-based indices (skipping column 1). |
A list containing:
Matrix of site scores from the partial CCA (width axes).
Named vector of site scores from the first axis of the CCA (position axis).
Matrix of all site scores from the CCA.
The CCA model object.
The Partial CCA model object.
Visualizes the relationship between environmental gradients and niche width at the sample or group level.
cca_plot_group_env(env, sample_summary, var, group = NULL, plot_type = c("sample", "group", "both"), method = c("lm", "loess"), make_plot = TRUE, galaxy_colnum = TRUE, show_ci = TRUE, annotate_stats = TRUE)
env |
A sample-by-environment data.frame. |
sample_summary |
Output from |
var |
Environmental variable to plot against (name or index). |
group |
Named vector/factor of group labels (required for group plots). |
plot_type |
Type of plot: "sample", "group", or "both". |
method |
Smoothing method ("lm" or "loess"). |
make_plot |
Logical; if |
galaxy_colnum |
Logical; for numeric indices, whether to treat them as 1-based indices. |
show_ci |
Logical; whether to show confidence intervals on the smooth line. |
annotate_stats |
Logical; whether to annotate R-squared and P-value on the plot. |
A list containing data frames and ggplot objects for samples and/or groups.
Performs standardization and Principal Component Analysis (PCA) on environmental variables. This step assists in selecting constrained variables and covariates for downstream CCA or partial-CCA.
cca_prep_env(env, sel, constrain = NULL, standardize = TRUE, galaxy_colnum = TRUE)
env |
A data.frame of environmental variables. Rows must be sample IDs and columns must be environmental variables. |
sel |
A vector of variable names or column indices to be used as constrained variables. |
constrain |
Optional. A vector of variable names or column indices to be used as covariates. (Note: in the legacy workflow this parameter corresponds to 'covariates'). |
standardize |
Logical; if |
galaxy_colnum |
Logical; if |
A list containing:
The standardized environmental data frame.
Indices of selected variables.
Indices of constrained covariates.
The PCA result object (from prcomp).
A data frame combining SampleID, PCA axes, and constrained variables.
A matrix/table of correlations between PCA axes and original variables, including significance levels.
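The standardization, PCA, and axis-to-variable correlation steps described for cca_prep_env can be sketched in base R. This is a minimal illustration, not the package's implementation: the simulated `env` table and the helper `axis_cor` are hypothetical, and only `scale()`, `prcomp()`, and `cor.test()` from base R are used.

```r
set.seed(1)
# Simulated environmental table (hypothetical data; rows = samples)
env <- data.frame(
  Temp = rnorm(25, 15, 3),
  pH   = rnorm(25, 6.5, 0.4),
  SOC  = rlnorm(25, 2, 0.3)
)
rownames(env) <- paste0("S", 1:25)

# Standardize each variable to zero mean and unit variance
env_std <- as.data.frame(scale(env))

# PCA on the already standardized variables
pca <- prcomp(env_std, center = FALSE, scale. = FALSE)

# Hypothetical helper: correlation and p-value between each PCA axis
# and each original variable, returned in long format
axis_cor <- function(scores, vars) {
  out <- expand.grid(axis = colnames(scores), variable = colnames(vars),
                     stringsAsFactors = FALSE)
  tests <- Map(function(a, v) cor.test(scores[, a], vars[[v]]),
               out$axis, out$variable)
  out$r <- vapply(tests, function(t) unname(t$estimate), numeric(1))
  out$p <- vapply(tests, function(t) t$p.value, numeric(1))
  out
}
cor_table <- axis_cor(pca$x, env_std)
head(cor_table)
```

Variables that correlate strongly with the leading axes are natural candidates for `sel`, while the remaining ones can be considered as covariates.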
A controller function that orchestrates the CCA-based niche analysis workflow.
It dispatches the execution to either the gradient workflow (Steps 1-4)
or the group workflow (Steps 1-2, 5-6) based on the specified mode.
cca_workflow(mode = c("gradient", "group"), ...)
mode |
A character string specifying the analysis mode.
Must be one of |
... |
Additional arguments passed to the specific workflow functions
( |
A list containing the results of the executed workflow steps.
See cca_workflow_gradient or cca_workflow_group for structure details.
set.seed(1)
otu <- matrix(rpois(20*25, 5), nrow = 20)
rownames(otu) <- paste0("OTU", 1:20)
colnames(otu) <- paste0("S", 1:25)
env <- data.frame(
  Temp = rnorm(25, 15, 3),
  pH = rnorm(25, 6.5, 0.4),
  SOC = rlnorm(25, 2, 0.3)
)
rownames(env) <- colnames(otu)
res <- cca_workflow(
  mode = "gradient",
  otu = otu, env = env,
  sel = c("Temp", "pH"),
  covariates = "SOC",
  make_plot = FALSE,
  top_node = 20
)
str(res, max.level = 1)
Runs the complete environmental gradient analysis pipeline:
1. Prepare env data (PCA)
2. Fit CCA/Partial-CCA
3. Calculate species niche traits
4. Analyze gradient vs niche position
cca_workflow_gradient(otu, env, sel, covariates, standardize = TRUE, method = c("lm", "loess"), var = sel[1], galaxy_colnum = TRUE, make_plot = TRUE, top_node = 10000)
otu |
An OTU/species matrix or data frame (rows = taxa, columns = samples). |
env |
A data.frame of environmental variables. Rows must be sample IDs and columns must be environmental variables. |
sel |
A vector of variable names or column indices to be used as constrained variables. |
covariates |
Variables (names or indices) defining the position axis (used in CCA and as covariates in partial CCA). |
standardize |
Logical; if |
method |
Smoothing method for the plot ("lm" or "loess"). |
var |
Environmental variable for gradient plotting (passed to Step 4). |
galaxy_colnum |
Logical; if |
make_plot |
Logical; if |
top_node |
Integer; maximum number of most abundant species to use for loess smoothing calculation (default 10000). Points are plotted for all valid species. |
A named list containing outputs of steps 1 through 4.
Runs the complete group-based analysis pipeline:
1. Prepare env data (PCA)
2. Fit CCA/Partial-CCA
5. Calculate group niche widths
6. Plot group/sample niche width vs gradient
cca_workflow_group(otu, env, sel, group, covariates, standardize = TRUE, method = c("lm", "loess"), var = sel[1], choice = "all", galaxy_colnum = TRUE, make_plot = TRUE, plot_type = c("sample", "group", "both"), show_ci = TRUE, annotate_stats = TRUE)
otu |
An OTU/species matrix or data frame (rows = taxa, columns = samples). |
env |
A data.frame of environmental variables. Rows must be sample IDs and columns must be environmental variables. |
sel |
A vector of variable names or column indices to be used as constrained variables. |
group |
A named vector or factor defining groups. Names must match sample IDs. |
covariates |
Variables (names or indices) defining the position axis (used in CCA and as covariates in partial CCA). |
standardize |
Logical; if |
method |
Smoothing method ("lm" or "loess"). |
var |
Environmental variable for gradient plotting (passed to Step 6). |
choice |
Calculation mode. If |
galaxy_colnum |
Logical; if |
make_plot |
Logical; if |
plot_type |
Type of plot: "sample", "group", or "both". |
show_ci |
Logical; whether to show confidence intervals on the smooth line. |
annotate_stats |
Logical; whether to annotate R-squared and P-value on the plot. |
A named list containing outputs of steps 1, 2, 5, and 6.
Computes the abundance-weighted mean niche breadth for each sample, given an
OTU-by-sample table and OTU-level niche breadth estimates (e.g., from gam_fit_model).
The formula is:
$$Bw_j = \frac{\sum_i a_{ij} \, b_i}{\sum_i a_{ij}}$$
where $a_{ij}$ is the abundance weight of OTU $i$ in sample $j$ and $b_i$ is the niche breadth of OTU $i$.
gam_calc_sitewidth(otu, niche_df, otu_col = "OTU", width_col = "breadth50", weight_mode = c("auto", "counts", "relative"))
otu |
An OTU-by-sample matrix or data frame. |
niche_df |
A data frame containing at least columns for OTU IDs and niche width values. |
otu_col |
Character string; the column name in |
width_col |
Character string; the column name in |
weight_mode |
Character string; how to handle abundance weights. One of |
A data frame with columns sample and Bw50_abundance_weighted.
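The abundance-weighted mean described for gam_calc_sitewidth can be illustrated with a few lines of base R. This is a sketch under stated assumptions rather than the package's code: the toy `otu` matrix and `niche_df` values are invented, and the handling of OTUs without a breadth estimate (dropped here) is an assumption.

```r
# Hypothetical OTU-by-sample counts (rows = OTUs, columns = samples)
otu <- matrix(c(10,  0,  5,
                 0, 20,  5,
                30,  0, 10),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("OTU", 1:3), paste0("S", 1:3)))

# Hypothetical OTU-level niche breadths, using the default column names
niche_df <- data.frame(OTU = c("OTU1", "OTU2", "OTU3"),
                       breadth50 = c(2, 4, 6))

# Align breadths to OTU rows; OTUs without a breadth estimate are dropped
# (assumption about missing-value handling)
keep <- intersect(rownames(otu), niche_df$OTU[!is.na(niche_df$breadth50)])
b <- niche_df$breadth50[match(keep, niche_df$OTU)]
a <- otu[keep, , drop = FALSE]

# Weighted mean per sample: sum_i(a_ij * b_i) / sum_i(a_ij)
bw <- unname(as.vector(crossprod(a, b)) / colSums(a))
site_width <- data.frame(sample = colnames(a), Bw50_abundance_weighted = bw)
site_width
```

Because the weights are divided by their per-sample sum, raw counts and relative abundances give the same result in this sketch.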
Fits a Generalized Additive Model (GAM) response curve for each OTU along a single environmental gradient. This function handles both count data (using Negative Binomial or Poisson families) and relative abundance data (using logit-transformed Gaussian models). It estimates the niche optimum and the 50% niche breadth (breadth50) for each taxon.
gam_fit_model(otu, env, env_var, data_type = c("auto", "count", "relative"), count_family = c("nb", "poisson"), use_offset = TRUE, lib_size = NULL, min_prev = 0.1, min_total = 100, min_mean = 1e-05, k_spline = 5, n_grid = 200, verbose = TRUE)
otu |
An OTU-by-sample matrix or data frame. Rows should be OTUs/taxa and columns should be samples. Values can be counts or relative abundances. |
env |
A sample-by-environment data frame. Rows must be samples and columns must be environmental variables. |
env_var |
A character string specifying the column name in |
data_type |
Character string specifying the data type. One of |
count_family |
Character string specifying the error distribution family for
count data. One of |
use_offset |
Logical; whether to use an offset term (log library size) in the
GAM for count data. Default is |
lib_size |
Optional named numeric vector of library sizes (sequencing depth)
for each sample. If |
min_prev |
Minimum prevalence threshold (proportion of samples with abundance > 0). OTUs below this threshold are skipped. Default is 0.10. |
min_total |
Minimum total abundance threshold. Relevant for count data. Default is 100. |
min_mean |
Minimum mean abundance threshold. Relevant for relative abundance data. Default is 1e-5. |
k_spline |
Integer; the upper limit on the spline basis dimension for the
smooth term |
n_grid |
Integer; number of grid points along the gradient used to estimate optimum and niche breadth. Default is 200. |
verbose |
Logical; whether to print progress messages. Default is |
A data frame with one row per OTU and the following columns:
OTU identifier.
Number of samples with non-zero abundance.
Proportion of samples with non-zero abundance.
Total counts (for count data) or NA.
Adjusted R-squared (for relative abundance models) or NA.
Proportion of deviance explained.
Effective degrees of freedom of the smooth term.
Test statistic (F or Chi-sq) for the smooth term.
Approximate p-value for the smooth term.
Environmental value at the peak of the fitted curve.
Lower environmental bound where fitted abundance >= 50% of peak.
Upper environmental bound where fitted abundance >= 50% of peak.
Niche breadth (env50_max - env50_min).
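The optimum and 50% breadth reported by gam_fit_model are derived from a fitted curve evaluated on a grid. The sketch below shows that derivation in base R. As a loudly stated assumption, a Gaussian-shaped curve stands in for the GAM predictions so that the example runs without fitting a model; the variable names are illustrative.

```r
# Prediction grid along the gradient (200 points, matching the n_grid default)
grid <- seq(0, 10, length.out = 200)

# Stand-in for GAM-fitted abundance on the response scale (assumption:
# a Gaussian-shaped curve with optimum 4 and standard deviation 1.5)
fit <- 50 * exp(-(grid - 4)^2 / (2 * 1.5^2))

# Optimum: gradient value at the peak of the fitted curve
peak_idx <- which.max(fit)
optimum <- grid[peak_idx]

# 50% bounds: outermost grid points where fitted abundance >= 50% of the peak
above <- which(fit >= 0.5 * fit[peak_idx])
env50_min <- grid[min(above)]
env50_max <- grid[max(above)]
breadth50 <- env50_max - env50_min

c(optimum = optimum, env50_min = env50_min,
  env50_max = env50_max, breadth50 = breadth50)
```

The resolution of these estimates is limited by the grid spacing, so a larger `n_grid` gives finer bounds. For a multimodal curve, taking the outermost points would span any trough between peaks; how the package treats that case is not stated here.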
Fits a GAM response curve for a specific OTU and plots the results. Includes options for confidence intervals and different color palettes.
gam_plot_species(otu, env, env_var, otu_id, data_type = c("auto", "count", "relative"), count_family = c("nb", "poisson"), use_offset = TRUE, lib_size = NULL, min_mean = 1e-05, min_prev = 0.1, k_spline = 5, n_grid = 200, add_ci = TRUE, palette = c("blue", "orange", "green", "purple", "viridis"), point_alpha = 0.85)
otu |
An OTU-by-sample matrix or data frame. |
env |
A sample-by-environment data frame. |
env_var |
Character string; the environmental variable to use as the gradient. |
otu_id |
Character string; the ID of the OTU to plot (must exist in |
data_type |
Character string; |
count_family |
Character string; |
use_offset |
Logical; whether to use library size offset for count data. |
lib_size |
Optional named vector of library sizes. |
min_mean |
Minimum mean abundance filter (for relative mode). |
min_prev |
Minimum prevalence filter. |
k_spline |
Spline basis dimension. |
n_grid |
Number of grid points for prediction. |
add_ci |
Logical; whether to plot the 95% confidence interval. |
palette |
Character string; color palette name ( |
point_alpha |
Numeric; transparency alpha for observed data points. |
A list containing:
Data frame of observed values (env, y).
Data frame of fitted values along the gradient.
The GAM model object.
Estimated niche optimum.
Lower bound of 50% niche breadth.
Upper bound of 50% niche breadth.
Niche breadth.
The ggplot object.
Computes OTU-level Levins niche breadth along a continuous composite axis (e.g., CCA1)
by discretizing the axis into bins (states). Abundance is first aggregated within bins
(mean or sum), converted to within-bin proportions $p_k$, and then used to compute:
$$B = \frac{1}{\sum_{k=1}^{K} p_k^2}$$
A standardized breadth is also reported:
$$B_{std} = \frac{B - 1}{K - 1}$$
where $K$ is the number of bins.
levins_calc_binned(otu, env, axis_var, nbin = 8L, bin_method = c("equal_freq", "equal_width"), agg_fun = c("mean", "sum"), otu_mode = c("auto", "count", "relative"), min_occ = 3L, min_abund = 5)
otu |
An OTU-by-sample matrix or data frame. |
env |
A sample-by-environment data frame with row names as sample IDs. |
axis_var |
Character string; column name in |
nbin |
Integer; number of bins used to discretize the axis. Default is 8. |
bin_method |
Character string; binning strategy. |
agg_fun |
Character string; aggregation method within bins. |
otu_mode |
Character string; input data type. |
min_occ |
Minimum number of samples with abundance > 0 required to keep an OTU. |
min_abund |
Minimum total abundance required to keep an OTU. |
A data frame with one row per OTU and the following columns:
OTU identifier.
Name of the gradient axis used.
Number of bins (K).
Raw Levins niche breadth.
Standardized Levins niche breadth.
Number of samples where the OTU is present.
Total abundance of the OTU.
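The binned Levins calculation in levins_calc_binned can be traced by hand for a single OTU. The following base-R sketch uses equal-width bins and mean aggregation; the axis values, abundances, and variable names are hypothetical, and the package's own binning details (for example, its equal-frequency default) may differ.

```r
# Hypothetical axis values (e.g., CCA1 scores) and abundances for one OTU
axis_val <- c(-2.0, -1.5, -0.5, -0.25, 0.5, 1.0, 1.5, 2.0)
abund    <- c(   0,    0,    5,     5,   5,   5,   0,   0)
nbin <- 4L

# Equal-width bins over the observed axis range
breaks <- seq(min(axis_val), max(axis_val), length.out = nbin + 1L)
bin <- cut(axis_val, breaks = breaks, include.lowest = TRUE)

# Aggregate abundance within bins (mean), then convert to proportions p_k
agg <- tapply(abund, bin, mean)
agg[is.na(agg)] <- 0          # empty bins contribute zero
p <- agg / sum(agg)

# Levins breadth B = 1 / sum(p_k^2) and standardized B_std = (B - 1) / (K - 1)
K <- nbin
B <- 1 / sum(p^2)
B_std <- (B - 1) / (K - 1)
c(B = B, B_std = B_std)
```

Here the OTU occupies two of four bins equally, so the raw breadth equals 2 and the standardized breadth equals 1/3.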
Computes the abundance-weighted mean Levins width for each sample, using OTU-level Levins widths and sample-level OTU abundances. Optionally plots the relationship between this community metric and an environmental gradient.
levins_calc_group(otu, env, levins_df, grad, width_col = "levins_Bstd", method = c("lm", "loess"), make_plot = TRUE)
otu |
An OTU-by-sample matrix or data frame. |
env |
A sample-by-environment data frame. |
levins_df |
A data frame of OTU-level Levins results containing at least
columns |
grad |
The environmental gradient to plot against (column name or index in |
width_col |
Character string; column name in |
method |
Character string; smoothing method for the trend line. |
make_plot |
Logical; whether to return a ggplot object. |
The community-mean width for sample $j$ is:
$$W_j = \sum_i p_{ij} \, B_i$$
where $p_{ij}$ is the relative abundance of OTU $i$ in sample $j$
and $B_i$ is the standardized Levins width (levins_Bstd).
A list containing:
Data frame with columns Sample, ENV, and CommLevinsWidth.
The ggplot object (if make_plot=TRUE).
Merges species niche optima (from any method such as CCA, GAM, or weighted average) with standardized Levins niche width, and plots the position-width relationship showing the correlation (R-squared) and P-value.
levins_plot_pos_width(pos_df, levins_df, id_col, pos_col = "NichePosition", width_col = "levins_Bstd", method = c("lm", "loess"), make_plot = TRUE)
pos_df |
A data frame containing species niche positions. Must include |
levins_df |
A data frame containing Levins widths. Must include |
id_col |
Character string; column name of the species/OTU ID shared by both tables (e.g., |
pos_col |
Character string; column name of the niche position in |
width_col |
Character string; column name of the Levins width in |
method |
Character string; fitting method for the trend line. |
make_plot |
Logical; whether to return a ggplot object. |
A list containing:
Merged data frame used for plotting.
The ggplot object (if make_plot=TRUE).
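The merge-and-fit step behind levins_plot_pos_width can be approximated in base R. This is a sketch with invented data: the two small tables are hypothetical, and the statistics are taken from an ordinary `lm()` fit, which is an assumption about how R-squared and the P-value are obtained for the "lm" method.

```r
# Hypothetical species tables sharing an "OTU" identifier column
pos_df <- data.frame(OTU = paste0("OTU", 1:6),
                     NichePosition = c(-1.2, -0.5, 0.0, 0.4, 0.9, 1.5))
levins_df <- data.frame(OTU = paste0("OTU", c(2, 3, 4, 5, 6, 7)),
                        levins_Bstd = c(0.30, 0.42, 0.47, 0.61, 0.66, 0.50))

# Keep only species present in both tables
merged <- merge(pos_df, levins_df, by = "OTU")

# Linear trend of width on position; extract R-squared and slope p-value
fit <- lm(levins_Bstd ~ NichePosition, data = merged)
sm <- summary(fit)
r2 <- sm$r.squared
p_value <- coef(sm)["NichePosition", "Pr(>|t|)"]
c(n = nrow(merged), r2 = r2, p = p_value)
```

Species found in only one table (OTU1 and OTU7 here) drop out of the merge, so the number of plotted points can be smaller than either input.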
Computes OTU-level niche breadth by treating each sample as a discrete state.
For OTU $i$, let $p_{ij}$ be the proportional abundance of OTU $i$ in sample $j$
(i.e., abundance of OTU $i$ normalized to sum to 1 across all samples).
niche_width_calc(otu, env = NULL, min_occ = 3L, min_abund = 5, standardize = TRUE, method = c("levins", "shannon", "both"))
otu |
An OTU-by-sample matrix or data frame (rows = OTUs, columns = samples). Values should be non-negative (counts or relative abundance). |
env |
Optional sample metadata data frame with row names as sample IDs.
If provided, samples are aligned by the intersection of |
min_occ |
Minimum number of samples with abundance > 0 required to keep an OTU. Default is 3. |
min_abund |
Minimum total abundance required to keep an OTU (sum across samples). Default is 5. |
standardize |
Logical; if |
method |
Character; which niche width index to compute. One of
|
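Two niche-width indices are supported via method:
Levins breadth (Levins):
$$B_i = \frac{1}{\sum_{j=1}^{N} p_{ij}^2}$$
Optionally, a standardized breadth ranging from 0 to 1 is returned:
$$B_{i,std} = \frac{B_i - 1}{N - 1}$$
where $N$ is the total number of samples (states).
Shannon breadth (Shannon):
$$H_i = -\sum_{j=1}^{N} p_{ij} \ln p_{ij}$$
Terms with $p_{ij} = 0$ are excluded from the sum to avoid $\ln(0)$.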
A data frame with one row per OTU. Columns include:
OTU identifier.
Number of states (samples) used in calculation.
Number of samples where the OTU is present.
Total abundance of the OTU across samples.
Raw Levins niche breadth. Present if method is "levins" or "both".
Standardized Levins niche breadth (if standardize=TRUE).
Present if method is "levins" or "both".
Shannon niche breadth (entropy). Present if method is "shannon" or "both".
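Both sample-as-state indices from niche_width_calc can be computed for a whole table with vectorized base R. The sketch below is illustrative only: the toy `otu` matrix and the output column names are hypothetical, and the occurrence and abundance filters (`min_occ`, `min_abund`) are omitted.

```r
# Hypothetical OTU-by-sample table (rows = OTUs, columns = samples)
otu <- matrix(c( 5, 5, 5, 5,    # evenly spread across all samples
                20, 0, 0, 0,    # confined to one sample
                 8, 8, 0, 0),   # split evenly between two samples
              nrow = 3, byrow = TRUE,
              dimnames = list(c("even", "single", "half"), paste0("S", 1:4)))

# p_ij: abundance of each OTU normalized to sum to 1 across samples
p <- otu / rowSums(otu)
N <- ncol(otu)

# Levins breadth and its standardized form
B <- 1 / rowSums(p^2)
B_std <- (B - 1) / (N - 1)

# Shannon breadth; entries with p_ij = 0 are replaced by 0 so that
# log(0) never enters the sum
H <- -rowSums(ifelse(p > 0, p * log(p), 0))

res <- data.frame(B = B, B_std = B_std, H = H)
res
```

The three rows cover the extremes: an OTU spread evenly over all samples reaches the maximum standardized breadth of 1, while an OTU confined to one sample has a standardized breadth and a Shannon breadth of 0.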