This repository contains the work for my M.Sc. on the identification of beta diversity hotspots using species distribution models (SDMs). The results are part of a manuscript available as a preprint on EcoEvoRxiv. A separate repository is dedicated to the manuscript itself.
This project is implemented in Julia v1.6.1. The required packages and their versions are listed in Project.toml; to install them, run the first lines of src/required.jl. Some steps are also implemented in R v4.1.0, with packages and versions tracked by renv. More details are given below.
The data used in this project comes from the eBird Basic Dataset from June 2019. For now, the project focuses on all warbler species (Parulidae family) in North America (Canada, the United States, and Mexico).
eBird Basic Dataset. Version: EBD_relJun-2019. Cornell Lab of Ornithology, Ithaca, New York. Jun 2019.
Note, however, that the data is not hosted in this remote repository because of size limitations.
The repository is organized as follows:
- assets/ contains the pre-coarsened Copernicus land cover data (downloaded and coarsened in src/00c_data_landcover-copernicus.jl).
- data/ is used to store the data:
  - jld2/ contains exported Julia .jld2 elements, such as SDM predictions. Earlier versions relied heavily on these, but because they were too large to be version controlled, they have now been replaced by raster files in raster/. Some .jld2 files are still exported by a few scripts but are not central to the analyses.
  - proc/ contains processed CSV data. Importantly, the prepared eBird data and some BART predictions are stored here locally but are not version controlled because of their size.
  - raster/ contains raster files with species distributions (observed and predicted) and environmental data layers. Raster files are now central to the workflow and are used to save and reload data between scripts.
  - raw/ contains the raw CSV datasets from eBird (not version controlled).
  - rdata/ contains .RData files used as backups in the R scripts; these are not essential and are not version controlled.
- fig/ contains the figures produced, organized by outcome (bart for figures based on predicted data and raw for figures based on observed data).
- src/ contains all the scripts used in the project. The ordered scripts in this directory represent the main steps of the analyses, while subfolders contain scripts with a more specific use:
  - lib/ is the library of custom functions used in the main scripts.
  - others/ contains useful scripts that are not part of the main analyses.
  - shell/ contains Bash scripts used for some operations.
All analysis scripts are in src/. main.jl can be used to run all the analyses and produce the figures, and required.jl loads all the required packages and library functions.
Otherwise, the general workflow of the analyses is as follows:
- 00a_ebd_extraction.jl extracts the warbler data from the complete EBD to data/raw (not version controlled).
- 00b_ebd_preparation.jl prepares the warbler data in data/raw for the analyses, then saves the results in data/proc (not version controlled).
- 00c_landcover.jl prepares the land cover data from Copernicus and exports the environmental data as CSV files in data/proc and as TIFF files in data/raster.
- 01_distributions assembles the species distributions from the raw data as layers, then exports them to data/raster. It also produces examples of single-species maps.
- 02_training_bart.R trains BARTs (Bayesian Additive Regression Trees) in R (package embarcadero) on the distribution and environmental rasters, then predicts the species distributions (exported as CSV files, which are not version controlled).
- 03_predictions_bart.jl assembles the predicted distributions as layers and exports them as raster files.
- 04_full-extent.jl performs the main analysis steps on the full spatial extent: computing species richness and local contributions to beta diversity (LCBD) per site, and examining the relationship between the two. These steps can be performed on either the observed or the predicted distributions.
- 05_subareas.jl reapplies the analyses on smaller regions and investigates the effect of spatial scale on the results.
- 06_moving-windows.jl investigates the effect of the proportion of rare species on the relationship between species richness and LCBD values at varying scales.
- 07_comparison_data.jl re-runs the main analysis steps on both the observed and predicted data and prepares the results for comparison in the following scripts.
- 08_comparison_glm.jl performs GLMs in R to compare the observed and predicted results, and saves the results to be plotted in the next script.
- 09_comparison_plots.jl produces plots comparing the observed and predicted results. The comparison is made either directly on the results (difference plots) or on the GLM residuals produced in the previous script (residual plots).
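The main analysis step computes species richness and LCBD values per site. The standalone Julia sketch below illustrates that calculation on a plain sites × species presence-absence matrix rather than on SimpleSDMLayer objects. It follows the total-variance approach of Legendre & De Cáceres (2013), with an optional Hellinger transformation. The function names richness and lcbd are hypothetical, and the actual functions in src/lib/betadiv.jl may differ in their inputs and details.

```julia
using Statistics

# Species richness: number of species recorded at each site (row) of a
# sites × species presence-absence matrix.
richness(Y::AbstractMatrix) = vec(sum(Y .> 0; dims=2))

# LCBD values and total beta diversity (BDtotal), computed from the total
# variance of the community matrix (Legendre & De Cáceres, 2013).
# Hypothetical helper, not the project's own implementation.
function lcbd(Y::AbstractMatrix{<:Real}; transform::Bool=true)
    Yf = float.(Y)
    if transform
        # Hellinger transformation: square root of relative values per site.
        # max(..., eps()) avoids 0/0 for sites without any species.
        Yf = sqrt.(Yf ./ max.(sum(Yf; dims=2), eps()))
    end
    # Squared deviations from the species (column) means
    S = (Yf .- mean(Yf; dims=1)) .^ 2
    SStotal = sum(S)
    BDtotal = SStotal / (size(Yf, 1) - 1)
    # Each site's share of the total sum of squares; the values sum to 1
    LCBD = vec(sum(S; dims=2)) ./ SStotal
    return LCBD, BDtotal
end

# Usage: three sites, three species
Y = [1 0 0; 1 1 0; 0 0 1]
richness(Y)               # [1, 2, 1]
lcbd(Y; transform=false)  # ([1/6, 1/3, 1/2], 1.0) up to rounding
```

Because LCBD values sum to 1 across sites, they indicate how unusual each site's composition is relative to the others. This makes them suitable for locating beta diversity hotspots and for relating them to species richness.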
This code is built around the package SimpleSDMLayers.jl and its SimpleSDMLayer types, which are used to store the environmental variables and the species distributions. The custom functions in src/lib/ are organized as follows:
- analysis.jl contains the functions to perform the main analyses.
- bart.R contains utility functions for the BART analyses.
- betadiv.jl contains functions to compute beta diversity statistics.
- csvdata.jl contains functions to prepare the data extracted from CSV files.
- landcover.jl contains functions to extract and prepare the land cover data (in the same way as the other data sources in SimpleSDMLayers.jl).
- plotting.jl contains a function to simplify plotting of SimpleSDMLayer elements.
- presence-absence.jl contains the function to convert the raw data into a presence-absence layer.
- shapefiles.jl contains a function to download the background shapefiles for plotting, and a function to clip them so that they overlap with a SimpleSDMLayer.
- version-control.jl contains the list of important files that are too large to be version controlled, and a set of custom functions to track their changes. See the additional notes for details.
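presence-absence.jl converts the raw observations into a presence-absence layer. The sketch below shows one way such a conversion could work on a plain regular grid. Its assumptions are not taken from the project:

- Records are (species, longitude, latitude) tuples.
- A cell where any record exists but the target species was never seen counts as an absence.
- Unsampled cells stay missing.

The function name presence_absence and its keyword arguments are hypothetical.

```julia
# Hypothetical sketch: build a presence-absence grid for one target species.
# Assumptions: records are (species, lon, lat) tuples; a cell with any record
# but no record of the target is an absence; unsampled cells stay `missing`.
# Row 1 is the southernmost row (rows increase with latitude).
function presence_absence(records, target::AbstractString;
                          left, right, bottom, top, nlon::Int, nlat::Int)
    grid = Matrix{Union{Missing,Bool}}(missing, nlat, nlon)
    for (sp, lon, lat) in records
        # Skip records outside the bounding box
        (left <= lon <= right && bottom <= lat <= top) || continue
        # Map coordinates to a cell; clamp so points on the top/right edge stay inside
        col = clamp(floor(Int, (lon - left) / (right - left) * nlon) + 1, 1, nlon)
        row = clamp(floor(Int, (lat - bottom) / (top - bottom) * nlat) + 1, 1, nlat)
        if sp == target
            grid[row, col] = true
        elseif ismissing(grid[row, col])
            grid[row, col] = false  # sampled, but target not recorded so far
        end
    end
    return grid
end

# Usage: a 2 × 2 grid over a small bounding box
records = [("Setophaga petechia", -75.5, 45.5),
           ("Setophaga ruticilla", -74.5, 45.5),
           ("Setophaga ruticilla", -75.5, 45.5)]
grid = presence_absence(records, "Setophaga petechia";
                        left=-76, right=-74, bottom=45, top=47, nlon=2, nlat=2)
```

Keeping unsampled cells as missing, rather than as absences, avoids treating cells without any observations as places where the species is known to be absent.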
Additional notes:

- For each important file that is too large to be version controlled, a version-controlled placeholder file was created (for example, data/proc/ebd_warblers_prep_placeholder.csv) to record the time at which the large file was last updated on purpose. The placeholder is updated whenever the file changes, and the functions then trigger a warning prompting the user to make sure the change was made on purpose. If it was, the placeholder should be re-committed with the new modification time. If the file was overwritten without any change (and the user is sure of it), the placeholder change can be discarded.
- This project is based on a previous proof of concept by @tpoisot, my M.Sc. advisor, at https://gitlab.com/tpoisot/BioClim.
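The placeholder mechanism can be sketched with Julia's built-in mtime function. The function below is a hypothetical illustration, not the code in version-control.jl. It makes two assumptions:

- The placeholder stores the large file's modification time as plain text.
- The placeholder name adds _placeholder before the extension, as in the example above.

When the recorded time differs from the current one, the function warns and rewrites the placeholder, which git then reports as a change to commit or discard.

```julia
# Hypothetical sketch of the placeholder check (not the project's own code).
# Assumes the placeholder file stores the large file's modification time as text.
function check_placeholder(file::AbstractString;
                           placeholder::AbstractString =
                               replace(file, r"(\.\w+)$" => s"_placeholder\1"))
    isfile(file) || error("Large file not found: $file")
    current = mtime(file)
    # Read the recorded modification time, if a placeholder already exists
    recorded = isfile(placeholder) ?
        parse(Float64, strip(read(placeholder, String))) : nothing
    changed = recorded === nothing || recorded != current
    if changed
        @warn "$file was modified; make sure the change was intended, then commit $placeholder"
        # Rewriting the placeholder makes the change visible to git
        write(placeholder, string(current))
    end
    return changed
end
```

Calling check_placeholder("data/proc/ebd_warblers_prep.csv") would use data/proc/ebd_warblers_prep_placeholder.csv as its placeholder. The name of the large file is itself an assumption inferred from the placeholder name.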


