Generalised Additive Model : BCCVL (Sandpit)

Introduction

Generalized Additive Models (GAMs) extend Generalized Linear Models (GLMs) by allowing some predictor variables to be modeled non-parametrically, alongside linear and polynomial terms for other predictors. GAMs are therefore useful when the relationship between the variables is expected to take a more complex form that is not easily fitted by standard linear or non-linear models, or where there is no a priori reason for using a particular model.

Like GLMs, GAMs have three important components: 1) the probability distribution of the response variable, 2) the linear predictor (LP), which is a combination of the predictor variables, and 3) the link function, which describes how the mean of the response depends on the linear predictor. In GAMs, however, each coefficient-times-predictor term in the LP is replaced by a smooth function of that predictor. The model fits a smooth curve to each predictor variable and then combines the results additively. The GAM algorithm in BCCVL uses a cubic spline smoother.
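
Written out, the difference between the two model classes lies in the linear predictor. The notation below is a sketch using symbols that do not appear in the text above: \mu is the expected response, g the link function, \beta_j the coefficients, f_j the smooth functions and p the number of predictors. The logit link is shown as an assumption, being the usual choice for binomial presence/absence data.

```latex
% GLM: each predictor enters through a single coefficient
g(\mu) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j

% GAM: each coefficient term is replaced by a smooth function of the predictor
g(\mu) = \beta_0 + \sum_{j=1}^{p} f_j(x_j)

% Logit link (assumed here) for a binomial response
g(\mu) = \log\frac{\mu}{1 - \mu}
```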

The model coefficients are estimated by maximum likelihood estimation (MLE), which maximizes the "agreement" of the predicted species occurrences with the observed data. In other words, MLE finds the values of the coefficients that result in a model under which you would be most likely to get the observed results. As with GLMs, GAMs are fitted using the iteratively reweighted least squares (IWLS) method for MLE.
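
To make the IWLS idea concrete, here is a minimal sketch for the simplest case: a binomial GLM with a logit link and a single predictor. It is not the fitting code used by mgcv or BCCVL, and a GAM fit additionally penalizes the wiggliness of the smooth terms. The function name and the toy data are made up for illustration.

```python
import math

def fit_logistic_iwls(x, y, max_iter=25, tol=1e-8):
    """Fit p = logistic(b0 + b1 * x) by iteratively reweighted least squares."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Current linear predictor and fitted probabilities.
        eta = [b0 + b1 * xi for xi in x]
        mu = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        # IWLS weights and working response for the logit link.
        w = [m * (1.0 - m) for m in mu]
        z = [e + (yi - m) / wi for e, yi, m, wi in zip(eta, y, mu, w)]
        # Weighted least squares of z on x (closed form for one predictor).
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        zbar = sum(wi * zi for wi, zi in zip(w, z)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxz = sum(wi * (xi - xbar) * (zi - zbar) for wi, xi, zi in zip(w, x, z))
        new_b1 = sxz / sxx
        new_b0 = zbar - new_b1 * xbar
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Illustrative presence (1) / absence (0) data along one environmental gradient.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic_iwls(x, y)
print(b0, b1)
```

Each pass solves an ordinary weighted least-squares problem; the weights and the working response are recomputed from the current fit until the coefficients stop changing. At convergence the coefficients satisfy the likelihood equations, which is why the procedure yields the maximum likelihood estimates.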

Advantages

Able to deal with non-linear and non-monotonic relationships between the response and the predictor variables.
Able to deal with categorical predictors.

Limitations

More susceptible to overfitting than GLMs. To avoid this, it is good practice to compare the fit of a GLM with the fit of a GAM and evaluate whether the added complexity of the GAM is necessary to obtain a satisfactory fit to the data. If the fit of the GLM and the GAM is comparable, it is advisable to use the GLM.
Harder to interpret than GLMs.
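
One possible way to carry out the GLM versus GAM comparison is with an information criterion such as AIC, which trades goodness of fit against model complexity. The sketch below assumes AIC as the measure and a minimum improvement of 2 AIC units as the threshold; neither is prescribed by BCCVL. The predicted probabilities and parameter counts are invented purely for illustration.

```python
import math

def binomial_log_likelihood(observed, predicted):
    """Log-likelihood of 0/1 observations given predicted probabilities."""
    return sum(
        math.log(p) if obs == 1 else math.log(1.0 - p)
        for obs, p in zip(observed, predicted)
    )

def aic(log_lik, n_params):
    """Akaike information criterion: lower is better."""
    return 2.0 * n_params - 2.0 * log_lik

def choose_model(aic_glm, aic_gam, min_gain=2.0):
    """Prefer the simpler GLM unless the GAM lowers AIC by at least min_gain."""
    return "GAM" if aic_glm - aic_gam >= min_gain else "GLM"

# Invented example values, not results from a real model run.
observed = [0, 0, 1, 1, 0, 1]
glm_pred = [0.20, 0.30, 0.60, 0.70, 0.40, 0.80]
gam_pred = [0.15, 0.30, 0.65, 0.70, 0.35, 0.80]

# For a GAM, the effective degrees of freedom of the smooths play the
# role of the parameter count; 4 is an assumed value here.
aic_glm = aic(binomial_log_likelihood(observed, glm_pred), n_params=2)
aic_gam = aic(binomial_log_likelihood(observed, gam_pred), n_params=4)
choice = choose_model(aic_glm, aic_gam)
print(choice)
```

In this toy case the GAM fits slightly better but not by enough to offset its extra complexity, so the simpler GLM is retained.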

Assumptions

No assumptions are made about the distributions of the environmental variables. However, they should not be highly correlated with one another because this could cause problems with the estimation.

Requires absence data

Yes

Configuration options

BCCVL uses the ‘gam’ function in the ‘mgcv’ package, implemented in biomod2.

Configuration option	Description
Weights:	allows particular observations to be given more or less weight. If this option is left at NULL (default), each observation (presence or absence) has the same weight, regardless of the number of presences and absences. If the value is 0.5, absences are weighted equally to presences (i.e. the weighted sum of presences equals the weighted sum of absences). If the value is set below 0.5, absences are given more weight; if it is set above 0.5, presences are given more weight.
Resampling:	the number of permutations used to estimate the importance of each variable. If this value is >0, the algorithm produces an object called 'variableImportance.Full.csv', in which a high value means that the predictor variable has high importance, whereas a value close to 0 corresponds to no importance.
Interaction level:	the number of interactions between predictor variables that need to be considered.
Family:	the description of the error distribution of the response variable and the link function used in the model. For binary data such as presence/absence of species, the binomial family is used (default in BCCVL).
Ridge regression penalty:
Epsilon:
Maximum MLE iterations:	the maximum number of IWLS iterations to find the maximum likelihood estimates.
Convergence tolerance:
Number of halvings:
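
The Weights and Resampling options above can be illustrated with a small sketch. It is not the biomod2 or BCCVL implementation. The weighting rule keeps presences at weight 1 and rescales absences, and the importance score is taken as 1 minus the correlation between predictions before and after shuffling a variable; both formulations are assumptions. All function names, the toy model and the data are hypothetical.

```python
import math
import random

def prevalence_weights(observations, prevalence=None):
    """Per-observation weights for a list of 0/1 presence-absence records."""
    if prevalence is None:
        # Default: every observation counts the same.
        return [1.0] * len(observations)
    n_pres = sum(1 for o in observations if o == 1)
    n_abs = len(observations) - n_pres
    # Presences keep weight 1; absences are rescaled so that presences
    # make up the fraction `prevalence` of the total weight.
    w_abs = n_pres * (1.0 - prevalence) / (prevalence * n_abs)
    return [1.0 if o == 1 else w_abs for o in observations]

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

def permutation_importance(predict, rows, variable, n_permutations, seed=0):
    """Mean of 1 - cor(original predictions, predictions after shuffling)."""
    rng = random.Random(seed)
    reference = [predict(r) for r in rows]
    scores = []
    for _ in range(n_permutations):
        shuffled = [r[variable] for r in rows]
        rng.shuffle(shuffled)
        permuted = [dict(r, **{variable: v}) for r, v in zip(rows, shuffled)]
        scores.append(1.0 - pearson(reference, [predict(r) for r in permuted]))
    return sum(scores) / len(scores)

def toy_model(row):
    # Hypothetical fitted model: depends on temperature only, ignores rain.
    return 1.0 / (1.0 + math.exp(-(0.8 * row["temp"] - 8.0)))

rain = [30, 10, 50, 20, 40, 60, 15, 35, 25, 45]
rows = [{"temp": float(t), "rain": float(r)} for t, r in zip(range(5, 15), rain)]

weights = prevalence_weights([1, 1, 0, 0, 0, 0, 0, 0], prevalence=0.5)
imp_temp = permutation_importance(toy_model, rows, "temp", n_permutations=5)
imp_rain = permutation_importance(toy_model, rows, "rain", n_permutations=5)
print(weights, imp_temp, imp_rain)
```

With a value of 0.5, the two presences and six absences end up with the same total weight. Shuffling a variable the model ignores leaves the predictions unchanged, so its importance is 0, while shuffling a variable the model relies on lowers the correlation and raises the score.
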

