Boosted Regression Trees combine decision trees and boosting, using a stagewise procedure in which many simple trees are fitted iteratively to random subsets of the data.
Introduction
Boosted Regression Tree (BRT) models are a combination of two techniques: decision tree algorithms and boosting methods. Like Random Forest models, BRTs repeatedly fit many decision trees to improve the accuracy of the model. While Random Forest models use the bagging method, which means that each occurrence has an equal probability of being selected in subsequent samples, BRTs use the boosting method, in which occurrences get weighted probabilities for subsequent samples. The weights are applied in such a way that occurrences that were poorly modelled by previous trees have a higher probability of being selected in the new tree. This sequential approach is unique to boosting. Final fitted values are based on the entire dataset.
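To make the stagewise procedure concrete, below is a minimal, illustrative Python sketch of stochastic gradient boosting for a gaussian response. It is not the tool's own implementation; the function and parameter names (boost_sketch, bag_fraction, tc) are made up for this example. In gradient boosting, the reweighting towards poorly modelled occurrences is realised by fitting each new tree to the residuals of the current model.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_sketch(X, y, n_trees=1000, lr=0.01, tc=2, bag_fraction=0.5, seed=0):
    """Stagewise boosting: each new tree is fitted to the residuals
    left by all previous trees, on a random subset of the data."""
    rng = np.random.default_rng(seed)
    pred = np.full(len(y), y.mean())   # start from the mean response
    trees = []
    for _ in range(n_trees):
        # stochastic step: draw a bag-fraction subset without replacement
        idx = rng.choice(len(y), size=int(bag_fraction * len(y)),
                        replace=False)
        residuals = y - pred           # what earlier trees failed to explain
        tree = DecisionTreeRegressor(max_depth=tc)
        tree.fit(X[idx], residuals[idx])
        pred += lr * tree.predict(X)   # shrink each tree's contribution
        trees.append(tree)
    return trees, pred
```

Because every tree is fitted to the residuals of the current ensemble, observations that remain poorly predicted keep producing large residuals and so dominate later trees, which is exactly the boosting behaviour described above.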
Advantages
- Can be used with a variety of response types (binomial, gaussian, poisson).
- Stochastic, which improves predictive performance.
- The best fit is automatically detected by the algorithm.
- Model represents the effect of each predictor after accounting for the effects of the other predictors.
- Robust to missing values and outliers.
Limitations
- Needs at least 2 predictor variables to run.
Assumptions
No formal distributional assumptions (non-parametric).
Requires absence data?
Yes
Configuration options
Configuration option | Description
--- | ---
Tree complexity (tc) | Controls whether interactions between predictor variables are fitted. A value of 1 fits an additive model without interactions between the predictor variables, a value of 2 fits a model with up to two-way interactions, and so on. Generally, a larger 'tc' needs to be combined with a smaller 'lr': doubling 'tc' should be matched with halving 'lr'.
Learning rate (lr) | Determines the contribution of each tree to the growing model. Smaller learning rates increase the number of trees (n.trees) required; in general, a smaller 'lr' and larger 'n.trees' are preferable. Together, the learning rate and the tree complexity determine the total number of trees in the final model. The aim is to find the combination of parameters that results in the minimum error for predictions; as a rule of thumb, this combination should yield a model with at least 1000 trees. The optimal 'tc' and 'lr' values depend on the size of your dataset. For datasets with <500 occurrence points, it is best to model simple trees ('tc' = 2 or 3) with learning rates small enough to allow the model to grow at least 1000 trees (see the sketch after this table).
Bag fraction | Specifies the proportion of observations selected at each step. A value of 0.5 means that at each iteration 50% of the data is randomly selected, without replacement. Fractions between 0.5 and 0.75 give the best results for presence-absence data (Elith et al. 2008).
n folds | Number of subsets used for cross-validation.
prev stratify | Whether cross-validation subsets should be stratified, i.e. selected so that the mean response value is approximately equal in all subsets. For binomial data, each subset then contains roughly the same proportion of each data class, for example presence/absence.
Family | Distribution of the response variable. Select 'bernoulli' for binary responses such as presence/absence data.
Number of trees (n.trees) | Number of trees to fit initially and to add to the model at each cycle. For example, with the default of 50, the model starts by fitting 50 trees using recursive binary partitioning of the data; the residuals from this initial fit are then fitted with another set of 50 trees, and so forth, so that the process focuses increasingly on occurrences that were poorly modelled by previous sets of trees.
Max trees | Maximum number of trees to fit before stopping.
Tolerance method | Method used to decide when to stop adding trees. If set to 'fixed', the threshold is the value given in 'tolerance value'; if set to 'auto', the threshold is 'tolerance value * total mean deviance'.
Tolerance value | Threshold used by the 'tolerance method' to decide when to stop.
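The option names above follow the conventions of R's gbm/dismo packages (n.trees, bag fraction, prev stratify). As a rough, hedged illustration of how these options map onto another gradient-boosting implementation, the scikit-learn sketch below uses max_depth as an analogue of 'tc', learning_rate for 'lr', subsample for 'bag fraction', and loss="log_loss" (named "deviance" in scikit-learn versions before 1.1) for the bernoulli family. The data and parameter values are invented for the example; this is not the tool's own fitting procedure.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# made-up presence/absence data: 4 environmental predictors, binary response
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(
    loss="log_loss",     # bernoulli deviance, for presence/absence responses
    max_depth=2,         # rough analogue of tree complexity (tc)
    learning_rate=0.01,  # lr: small values require more trees
    n_estimators=2000,   # analogue of 'max trees': an upper bound to search
    subsample=0.5,       # bag fraction: 50% of the data drawn at each step
    random_state=0,
)
model.fit(X_train, y_train)

# staged_predict_proba yields predictions after 1, 2, ... trees, so the
# tree count with minimum held-out deviance can be read off directly
deviance = [log_loss(y_val, p) for p in model.staged_predict_proba(X_val)]
best_n = int(np.argmin(deviance)) + 1
print(f"trees at minimum held-out deviance: {best_n}")
```

Note that this sketch uses a single holdout split only to keep the example short; the procedure described above instead estimates the minimum-error number of trees by cross-validation over the 'n folds' subsets.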
References
Elith J, Leathwick JR, Hastie T (2008) A working guide to boosted regression trees. Journal of Animal Ecology, 77(4), 802-813.
Franklin J (2010) Mapping species distributions: spatial inference and prediction. Cambridge University Press.